The Operating System Language Compiler


This page describes the variant subset of Java that we will be using in the Operating Systems course. I have given it the descriptive but not very entertaining name "Operating System Language" (OSL for short).

Download OSLcompiler [Check the date on this; the G drive version is sometimes more current]
 

Contents:

Known Bugs
Language Overview
  Data Types
  Operators
  Statements
Generated Code
  System Calls
Class Library
  Hardware
 

Using the Compiler

To use the compiler, drag it from the G drive ("G:\CLASS\Computer Science\CIS4433\OSLcompiler.exe") onto your hard drive, and double-click it. It will put up an open-file dialog; navigate to your source code ("SomeSource.osl"). When it finishes, you must explicitly exit (and even then it's been known not to quit and may require a control-alt-delete termination). The object file will be a text file with the same name as the source, but with a ".ftx" (for "Floppy TeXt") extension, which the emulator knows how to read and convert to yet another (binary) file ".dsk" which the emulator knows how to boot from.
 

Known Bugs

A "feature" is a bug that's unlikely to get fixed. Some of these bugs are "features".

FEATURE The compiler gets confused over comments of the form /*...*/; use the // form, which seem to be OK. Start your program with a comment; it messes up if you begin immediately with code in the first character.

FEATURE Be careful in calling functions to initialize variables; variable allocations are re-ordered to do initialized variables first, then all uninitialized variables. This has the unfortunate effect that if you declare an array (most of which is uninitialized) in the same scope as a variable that is initialized by a function call like Read() with the side effect that it fills the array, the array will not be properly set up, and you could smash some data. sigh  Workaround: don't use function calls that have side effects to initialize variables; call them from a separate statement.

FIXED Dynamic structures like String now get allocated and deallocated OK by the compiler when you assign them to a variable or pass them as parameters to a function or return them as a function result. If you do something goofy like using the pointers in expressions (without saving them in variables), they won't be properly disposed. Don't do that. Of course the operating system is responsible for actual allocation and deallocation.

FIXED The code generator for unary minus -n and for right-shift >> has been fixed. PTL! I finally figured out what the C compiler bug was, and stopped doing that unnecessary timesaver.

FIXED The compiler now generates correct code for switch-case statements. MiniOS has been revised to use it.

FIXED The compiler now seems to generate correct code for records ("classes" with no methods). I have not yet tried putting class objects into arrays, nor arrays into objects, but they work as local and global variables. Pointers to other objects inside of these records probably won't be properly garbage-collected. Objects declared final are allocated on the stack (no memory manager calls); all others use the memory manager SysCall ops 2 and 3.

A number of little obscure bugs keep getting fixed.
 

Language Overview

Java is a pretty big language (but not as big as C++); not all of these features are needed to write general-purpose software like an operating system. On the other hand there are some language capabilities you do need, which have been purposefully omitted from Java. These have been added back in, mostly as functiions in a package to be imported.

OSL is somewhat more strongly typed than Java. That means, for example, that you cannot assign characters to integer variables, and statements do not have a value that you can re-use, for example in another assignment. Like Java and unlike C, arrays are bounds-checked.

Due to time constraints, there is a lot of Java that is simply not there, the most important being Objects. What that means is that all methods are static functions, declared outside the class structures, and there is no inheritance nor overloading. You can write in the OO paradigm, you just have to pass the object as a parameter to the function and embed superclass structures within the class structures of their subclasses. That's what the OO language compilers do for you anyway, only now it's exposed.

Because this is a language for writing operating systems, and memory management is a service provided by the operating system, OSL needs a way to allocate static arrays and data structures that do not make memory manager calls. OSL does it both ways: If you use the keyword new, OSL calls the memory allocator to allocate space; if you declare a final object, or an array with fixed array size (a constant between the brackets), OSL allocates space for it in the stack frame and does not call the memory manager. There is sample code showing both types of array declaration.
 

Data Types

OSL has the usual int, char, and boolean primitive data types, and they work correctly, except that all integers are 2 bytes (-32768 to +32767). Characters and integers are different types. To do numerical operations on character data, you must use the explicit built-in type-casting functions CHR(), which takes an integer parameter and returns the character whose ASCII value is that number, and ORD(), which takes a character parameter and returns its ASCII value. You can also apply ORD() to boolean values and get back 0 (false) or 1 (true). Although the OSL compiler is capable of compiling code to access byte data, it is clunky, so all three data types are 2 bytes.

OSL supports usual primitive type literals: (signed) integers including hexadecimal constants, characters within apostrophes (single-quotes), and the reserved words true and false. OSL knows about the 95 standard ASCII characters and these six escaped characters:

\t  tab = CHR(9)
\n  newline = CHR(10)
\r  return = CHR(13)
\"  quote
\'  apostrophe
\\  back-slant
Declaring a primitive type variable final makes it a compile-time constant for more efficient code. You must initialize it in the declaration, and you cannot assign to it later. The initialization code may make use of arithmetic operators and the built-in type-casting functions ORD() and CHR(), provided that the compiler can figure out what constant value it evaluates to. Thus the following line is valid, and initializes the variable xyz to the lower-case letter "b":
char xyz = CHR(ORD('A')+ORD('!')); // goofy (but valid) constant
You can build up arrays of the primitive types, and perhaps eventually (it has not yet been tested, and may prove too hard to get working in the time we have) arrays of other types. Class structures are in the compiler, but not yet tested. They should be working by the time you need them.

Array types are declared with the brackets to the left of the variable names, thus:

int[] dynAry = new int[15]; // dynamic array, with allocator
int[10] statAry; // static array of 10 integer words
The compiler will not let you put the brackets on the other side.

The built-in type String is identical to a dynamic array of characters; it is not an Object. You can assign quoted strings to String variables, and pass them as arguments to String parameters.

You can also pre-fill a final array with appropriate values. This builds a constant table, which does not call the memory allocator:

final int[] myTable = {3, 1, 4, 1, 5, 9, 2};
You can use the built-in function LENGTH() to obtain the upper bound of any array, including constant tables and String variables.
 

Operators

The usual Java operators are all there and functional, with these notable exceptions:

Because there are no objects and subclasses, there is no need for an instanceof operator.

The boolean operators && and || can only be used on boolean values. They also do not short-circuit yet, so if you need to test a pointer for null before dereferencing it, use nested if statements.

The integer bit-logical operators &, | and ^ cannot be used on boolean values, so they have a higher precedence than the relational operators in OSL (it is lower in C and Java). If you try to assume Java precedence, you will probably get a compiler error, but if in doubt, use parentheses.

The right bit-shift operator >> is unsigned (it is signed in Java). The Java unsigned right bit-shift operator >>> is obviously not needed.
 

Statements

All variable declarations must precede the statements in the same block. Global variables must be declared before the functions in your program. The compiler will re-order the declarations to put initialized variables before uninitialized variables, but all of them are implicitly initialized to zero. Eventually the compiler will use that fact to dispose dynamic variables that are given new values or go out of scope, but the code for that is not in yet.

The OSL compiler recognizes and correctly compiles simple (primitive or pointer) assignments, if else, while, do while, break, continue, and return statements. Code for switch case is in, but not yet known to work. It should be completely functional by the time you need it. Arrays and other larger structures must be assigned or copied element-by-element. There are no for loops (use while). Exceptions may be added at a later time, but I doubt it. Functions may take any number of 2-byte parameters (including pointers to dynamic structures) and return either a 2-byte result or void.

Functions must be declared before they are used. That means your voidmain() will generally be the last function in your program. If you need mutual recursion (which should not be necessary in a well-designed operating system), use C-style prototypes, which are in the compiler but as yet untested. They should be fully functional before you need them.
 

Generated Code

The OSL compiler generates code for the SandBox virtual machine. Most of the code sequences are pretty obvious in terms of what code gets generated for what source code. There are not (yet) very many optimizations -- indeed not many are possible. The compiler makes some assumptions about the host operating system that you will need to know about, mostly in terms of dynamic memory and system calls. Some other code generation details you might find helpful are described more fully in connection with the library functions they are associated with.
 

Dynamic Memory Blocks

When compiled OSL code creates a new dynamic array or object, it issues a SysCall#2 (see list below) with the number of bytes requested, and expects a return value consisting of a pointer to the new memory block. The sample code MiniOS supports this call. It further assumes that the two bytes immediately preceding this block are the logical size of the block (not including itself); the compiler uses this size for array bounds checking. The compiler also (but somewhat indirectly) also assumes that each dynamic block has a reference count in the byte before the size. This reference count is incremented each time a copy is made of the pointer, and decremented when a copy is re-assigned or goes out of scope; when the count is decremented to zero the block is assumed to be disposed. Note that the sample code MiniOS does not actually dispose anything, but it does properly decrement the reference count. Incrementing the reference count is done by the compiler-generated subroutine "_Up" #1.
 

System Calls

The current OSL compiler generates three defined SysCalls. Others may be added from time to time to support built-in class libraries.

#1 Array Bounds Check. This system call is used when a program makes an out of bounds access to an array, generally as tested by the aix instruction, or tries to dereference a null pointer, such as an unallocated dynamic array. The compiler generates only one such system call for each function, then jumps to it in the case of a fault from every such test. Your operating system will not be able to tell which access failed, but it might be able to tell which function was executing when it failed, if you want to look in the code base table. There is no recovery from an array bounds check fault; your system must kill the process.

#2 Allocate Memory. This system call has a second parameter (on the stack), which is the number of bytes requested. The system is expected to allocate the memory and return a pointer to the front of that many bytes, or else return null. It is expected that the 4 bytes in front of that block contain some overhead information not part of the block itself, namely a reference count (initially zero) and a size (the value requested). The reference count is used by SysCall#3Deallocate to determine if this is the last reference or not.

#3 Dereference-Count and Deallocate Memory. This system call takes one additional parameter, a pointer to the memory block. It assumes that behind this block is the usual size and reference count. A count of 16 or more (or else negative) is assumed to be permanent and never deallocated. Tables are given a negative reference count so not to confuse the memory manager. If the count is positive but less than 16, it is decremented; if it is decremented to or past zero, the block should be deallocated.
 

Class Library

Class Library? What class library? Who has time to write a class library?

The OSL compiler has a small but growing number of built-in packages that you can import and use their functions, constants, and class variables. There is no separate compilation facility, so you cannot add your own packages at this time.
 

Files

This package gives you function names for basic file handling, implemented as system calls; the operating system must appropriately handle these calls for them to work:
type FileId;
This is an opaque 2-byte type that your file system can use for anything that makes sense. It is returned by the Open function, and is required as a parameter to the other file handling functions. You can define it internally as a pointer to a file control block, or as an index into an array of file control blocks.
FileId Open(String filename);
This function is assumed to open a file and return some token reference to it as a FileId. It is implemented as SysCall#4.
void Close(FileId f);
This function is assumed to close the open file f. It is implemented as SysCall#5.
int Read(FileId f, array data);
This function is assumed to read from the open file f and place the data into the array. It should get the number of bytes to read from the array bounds word. The integer returned can be an error code or the number of bytes actually read, or both. It is implemented as SysCall#6.
int Write(FileId f, array data);
This function is assumed to write the array contents to the open file f. It should get the number of bytes to write from the array bounds word. The integer returned can be an error code or the number of bytes actually written, or both. It is implemented as SysCall#7.
void Seek(FileId f, int sector);
This function is assumed to move the current read/write offset in the open file f to a new value specified by sector. It only makes sense for disk files. Because this is a 16-bit word size, you must either limit your files to 65535 maximum bytes, or else assume that the number given is a sector count (not a byte count). It is implemented as SysCall#8.
int FileInfo(FileId f);
This function is assumed to return some kind of integer information from the open file f, presumably its permissions or size or some such. It is implemented as SysCall#9.
 

Semaphores

This package gives you safe function names for the two built-in semaphore instructions; the operating system must appropriately handle their interrupts for them to work:
type Semaphore;
This is an opaque 2-byte type representing a semaphore. It normally is the actual semaphore (initially zero). You might want to provide a system call to initialize it to other values.
void Signal(Semaphore theSem);
void Wait(Semaphore theSem);
These functions generate the appropriate signal or wait instructions. For more information on how semaphores work, see the Process Manager document.
 

Dangerous

Although OSL is strongly typed, importing this package gives you access to the C-like features you need to build a functional operating system. At this time it has four functions and one unary operator:
 
int PEEK(int addr)
int mmPEEK(int addr)
This returns the 2-byte integer stored in memory locations addr and addr+1. The virtual machine is "Big-Endian" so the byte  at addr+1 is returned in the lower half of the number. This can look anywhere in the 64K address space, at whatever is there, without protection. If there is no memory there, you will get a memory fault interrupt. If the memory manager is turned on, the address is the virtual address as mapped, not the real memory address. If it is turned off by virtue of being in an interrupt handler, then the address is real and centered on real address 0000, unless you use the mmPEEK version, which forces it to use the MMU if it points to a real table (non-zero value in MMU.address). In user mode (interrupts turned on and active), there is no difference between these two forms, it only makes a difference in system code.
 
void POKE(int addr, int datum)
void mmPOKE(int addr, int datum)
This stores the 2-byte integer datum into memory locations addr and addr+1. If there is no memory there, you will get a memory fault interrupt. Like PEEK, the mmPOKE version uses the MMU if MMU.address has been set to a non-zero value. In user mode (interrupts turned on and active), there is no difference between these two forms.
 
int FORK(int addr)
This is a grungy function for starting up a new process. Unlike the Unix function with a similar name, to use this function you must do all the allocation and setup work, and you must get it exactly right. There are examples in the sample code. Basically, you must allocate the new task's stack frame, with addr (the red arrow in the diagram) pointing to where the new process stack pointer will be when it starts. At that location should be the value addr+6, which is where this function stores a program counter value. At addr+4 should be the frame pointer for the new process, which you need to set up. The compiler will store the current Code Pointer in addr+2, so if the process is using shared code it will work properly (NOTE, this is an improvement on the original compiler, which assumed it did not know how to find the CB). See the example code. Note that you must allocate exactly as much space in the new frame for local variables are you have declared in the calling function; otherwise you will get strange results.

After this new process frame buffer is initialized, call FORK(); it returns twice, once in the new process and then again in the old process when the new process executes a COCALL() as described below. Your program can tell the difference by the return value. The new process will receive the stack pointer of the calling process as its return value. It is responsible for reaching into the calling process stack frame to set a different return value for it; we recommend zero, as in the example code.
 

void COCALL(int addr)
This function takes the address of a saved stack pointer from some other process, and performs a context switch through that address, leaving the calling process stack pointer in that memory location, and waking up whatever process was suspended there. That process can return by calling COCALL again on the same address.

The C-like unary & operator takes the address of whatever operand variable (or table name) follows it, as an integer. OSL does not (yet) have reference parameters, so this is the only to get the address of data into the I/O system, and other places where it is needed. This is a magical operator, in that you can only use it after importing the package Dangerous.
 

int SysOp0(int opC)
int SysOp1(int opC, int opnd1)
int SysOp2(int opC, int opnd1, int opnd2)
These functions take an zero, one, or two additional integer parameters, and execute the SysCall opcode to invoke a system operation that we may not have a library name for yet. For example, Read(fi,data), where fi is a FileId and data is an array could also be called with SysOp2(6,PEEK(&fi),&data) and get exactly the same compiled code, but without the type-safe protection.
 

Hardware

This package encapsulates and gives names to the Input/Output hardware addresses in the SandBox virtual machine. Eight class variables, one array, and a number of video data constants have been defined:
int[8] iVector
  // iVector[0] is MMU
  // iVector[1] is Clock
  // iVector[2] is Disk
  // iVector[3] is KeyIn
  // iVector[4] is Print
  // iVector[5] is SysCall
  // iVector[6] is Signal
  // iVector[7] is Wait
This is the interrupt handler address vector in the first 16 bytes of memory. You need to store in each item here the address of the context switch variable for its respective interrupt. See the example code.
 
class MMU {
  int fault;  // after a memory fault interrupt, this holds the virtual address being accessed
  int address;  // this is the memory address of the MMU map table currently active
  int CB;  // these three registers are used in case of a double fault (unmapped stack)
  int FP;  //   to save the processor state, which could not be pushed on the stack
  int PC;  // this register is set to 0000 if the state was successfully pushed on the stack
  }
class SignalWait {
  int address;  // this is the memory address of the semaphor that invoked the trap
  }
class Enables {
  int datum;  // low bit turns interrupts on; next 2 bits tell you if flop/printer are on
  }
final int IntOnBit = 1; // =0 if interrupts are disabled
final int PrintOnBit = 2; // =0 if printer is offline
final int FlopInBit = 4; // =0 if no floppy is mounted
final int DiskPendBit = 8; // =1 when a disk interrupt is pending

class Print {
  int datum;  // put a byte here to print it
  }
class KeyIn {
  int datum;  // get the key byte here after an interrupt
  }
class Mouse {
  int datum;  // the sign bit of this word represents the state of the (left) mouse button
  int H;  // the cursor location, relative to the left edge of the screen
  int V;  // the cursor location, relative to the top of the screen
  }
class Date {
  int hours; // the number of hours since midnight Jan 1, 2000
  int seconds; // the number of seconds in this hour
class Disk {
  int sector; // the sector address to read or write
  int address; // the memory address to read into or write from
  int count; // the number of bytes to read or (if negative) to write
  }
class Video {
  int top, left, bottom, right; // the containing rectangle to draw in, in screen pixels
  int datum; // the command and color
  int address; // the address of the font or icon
  }
final int Blit = 0; // the command to draw an icon (add line width)
final int FillRect = 16384; // the command to draw a rectangle (add color)
final int DoText = 32768; // the command to draw a text character (add color)
final int ClipRect = -3; // the command to set visible clip rectangle
final int HideRect = -1; // the command to punch a hole in visible region
final int black = 0; // the minimum color
final int white = 215; // the maximum color
final int minRed = 1; // the minimum (dark) red
final int minGreen = 6; // the minimum (dark) green
final int minBlue = 36; // the minimum (dark) blue
final int maxColor = 5; // a multiplier to turn the minimum color into its max


Other packages, classes and constants will be added as they are needed.

Revised:  2004 March 24