Itty Bitty Stack Machine

A Virtual Computer for Operating System Development


This document describes the 32-bit Itty Bitty Stack Machine (IBSM), hopefully in sufficient detail as to enable a competent programmer to implement it, the virtual computer we use for operating system research and development. There are two objectives served in this design:

1.  The virtual computer must be fast, that is, it should be capable of emulation at a raw speed not slower than 10% of the host native hardware.

2.  It must be simple, so that not very much host code is needed to implement it.

The initial IBSM implementation, written in HyperTalk compiled to 68000 code emulated in the MacOS PowerPC emulator, is neither fast nor simple. However, it does provide a lot of debugging facilities that would not be present in the production implementation.

The first implementation had a very clumsy tagged address mechanism to support multiple address segments. That has been removed and simplified, so all memory addresses are now offset from a single Base Register. There is now only one load and one store instruction, for local frame-based access, and a couple conversion operators for converting that to Base-Register form and back.
 

Architectural Overview

The IBSM is a 32-bit word-addressed stack-based computer with segmented addressing. There are seven registers, some used for program execution and the rest for memory address mapping and bounds checking. There are a dozen or so vectored interrupts, and up to 1024 words of memory-mapped Input/Output space.

The peripheral hardware is idealized to simplify the system management of it:

The Keyboard provides keypress interrupts with ASCII data and the real-time state of modifier keys.

The Mouse provides button interrupts and real-time position information, relative to the virtual screen.

The Screen maintains its own buffer and supports four high-level graphical operations, plus the ability to set rectangle-based clipping regions.

The virtual Disk drive is sector-mapped direct-memory transfer, with interrupt on completion.

There is a Host interface for sending a variety of operations out to the host platform, and for receiving results (and interrupts) back.

There is a place for Serial interfaces, not presently implemented (these are all handled by the Host).

Because the architecture is word-addressed, Endianness is not normally a concern to the programmer. However, character strings are stored in sequential bytes in the host-defined endian order. The IBSM has a special shift operation that extracts and inserts bytes in the correct position for the underlying hardware, completely transparently to the software. The software can of course test for endianness, but except for importing cross-platform program files, this is not necessary.

The IBSM packs up to six (or sometimes seven) 5-bit instructions into a 32-bit instruction word. This allows for 31 5-bit frequently used instructions, and 32 10-bit less-frequently used instructions, of which the first 5-bit nibble is an escape from the first set. These are described below in detail. There is no requirement to fill the instruction word; unused bits filled with zero are no-operations, skipped by the emulator. However, denser code puts less of a burden on the memory bandwidth, resulting in faster execution. Furthermore, because the multiple-op instruction word is not normally interruptable in the middle, it runs faster in emulation due to fewer tests for interrupts.

Memory addresses in memory are relative to a base register, so that both program code and data can be dynamically relocated by the operating system while preserving the addresses of data stored in there. The base register is not normally accessible to programmers in application programs, but is maintined by the system software for memory management.
 

Registers

There are three address control registers, one defining relocation, and two setting upper and lower limits to that region.
BR  is the Base Register, which relocates everything. Code if offset negative from BR, and the stack is offset positive.

HL  is the Heap Lower bound register, typically below the code.

SX  is the Stack eXtent, the upper limit of memory allocated for function stack growth.

Except for system software (with BR=SX=0), programs are not permitted to access memory outside the HL-SX range.

There are three registers which control program operation:

FP  is the  current local variable frame set on entering a function, and restored to its previous value on exit. It is always GL-relative.

PC  is the program counter, the address of the next instruction fetch. It is always CB-relative.

SP  is the current stack pointer, the place in memory where data is pushed and popped. The SP always points to the word most recently pushed, the top of the stack. The stack grows upward. When pushed onto the stack, SP is GL-relative; when stored in connection with an interrupt, it is always absolute.


During interrupts, the registers are popped off the stack in the order given above (BR,SX,FP,PC) and pushed in the reverse order. The HL register is reloaded from memory offset from BR. The SP stored (and retrieved) in an interrupt context switch always points to the pushed BR value thus:
 

SP Æ BR
SX-BR
FP-BR
PC-BR
other data

Semaphores

The IBSM supports thread synchronization by means of a hardware-assisted counting semaphore. A semaphore word counts in the negative direction; positive values are deemed to be process identifiers (typically an absolute address in memory, although the IBSM hardware doesn't care), which are handled by an appropriate interrupt service routine.

There are two instructions to operate on semaphores: Signal and Wait. If the semaphore has a (negative) count, Wait increases the value and continues executing; otherwise it triggers the interrupt. If a semaphore value is positive, Signal triggers its interrupt; otherwise it counts down (in the negative direction) and continues execution.

The operating system is expected to handle the interrupts and effect appropriate process management for the processes that block (Wait) on a semaphore with no (negative) counts, or conversely to prepare for continued execution processes already blocked on a semaphore that is Signalled.

A reader-writer pair of processes on a circular buffer can set up a pair of semaphores, Fill and Take, with Fill pre-counted with the buffer size. The reader process Waits on the Take semaphore, and when unblocked, advances its own buffer pointer and takes the data, and finally Signals the Fill semaphore. Conversely, the writer process Waits on the Fill semaphore, then advances its own buffer pointer and inserts a new datum, and finally Signals the Take semaphore. No deadlock, buffer overrun, nor race condition is possible in this simple example.
 

Process Context Switching

The IBSM is designed to facilitate context exchange by automatically storing most of the process context on its own stack, leaving only a few things to be taken care of by the system. There are four kinds of causes for context switching:
A.  Process timeout (timer interrupt).
B.  I/O pending or completion (I/O interrupt).
C.  Semaphore Signal/Wait (semaphore interrupt).
D.  System Call request by the software (SysCall interrupt).
In all cases, the context switch is triggered by some kind of interrupt, which has already stacked the process state and stored its SP in the interrupt vector designated address. The service routine, upon determining that context switch is called for, need only save the SP value in the current process information block, do whatever associated housekeeping is needed, then replace it with the next process SP when it resumes.
 

Interrupts

Interrupts are a special form of context switch. The hardware stores the four stackable registers, then gets an address from the interrupt vector and exchanges the interrupted process SP with the value there, then pops off the four registers from this new stack. The HL register is fetched implicitly from the address BR-2. A Cocall instruction does the same, but specifies the address of the SP exchange location, typically the same as the corresponding interrupt vector entry. Thus Cocall instructions can be used to implement coroutines, but their most frequent use is interrupt service exit.

The following are the interrupt vector assignments, which is the absolute address of that vector entry:

 0  Illegal operation
 1  PC=0 (program exit)
 2  Stack overflow
 3  System Call
 4  Signal
 5  Wait
 6  Timer
 7  Disk I/O complete
 8  Mouse click
 9  Keypress
10  Host event
11  Serial I/O
15  Debugger trap
The IBSM will not interrupt system software (BR=SX=0) for a pending I/O interrupt (6-15), but it will, if still pending, be activated upon completion of the first instruction word after resuming user program. One of the words in the I/O space has a bit for each of these interrupt numbers, and that bit can be turned off by the software to unpend the currently pending interrupt. Only the debugger trap bit can be turned on in software. To single-step a user program in a software debugger, the interrupt service handler need only set the pending bit and Cocall out.
 

Input/Output Space

The following addresses are defined for virtual hardware input and output. A store into one of these locations outputs the corresponding data, and reading the value gets the corresponding input. Some of these are delayed, possibly until a particular output word triggers the action, or the input may be sampled at intervals less frequently than continuously. The software needs to accomodate the output order requirements, but it can safely ignore the input delays, which will typically be signalled by an interrupt.

All I/O space addresses are in low memory:

10     Interrupt pending bits

11     Video command; a store initates the action.
12     Video memory address
13     Video datum/color
14     Video top bound
15     Video left bound
16     Video bottom bound
17     Video right bound

18     Disk datum/command; a store here initiates the action.
19     Disk word count
1A     Disk sector address
1B     Disk memory address

1C     Serial datum/command
1D     Serial port address
1E     Serial word count
1F     Serial memory address

20     Keypress datum, modifiers in high half

21     Mouse button(s) in high bits
22     Mouse vertical coordinate
23     Mouse horizontal coordinate

24     Current time in seconds from 12:00am 2000 Jan 1.
25     Current time in milliseconds from sometime recent (last boot? midnight?)

26     Host Command; a store here initiates the action.
27-2F  additional Host parameters
 

Video Commands

The idealized IBSM video controls are intended to simplify the operation of screen management in the operating system. Real hardware is much more complex. Seven commands are currently defined:

0.  Screen Rectangle.  Returns the boundary rectangle of the screen hardware or host window.

1.  Text bits. Beginning at the specified address, each word of data represents one pixel column, initially aligned with the top/left coordinates supplied, and continuing one word per pixel column until the right bound is reached. The least significant bit is the top pixel, as shown:

In this diagram, the displayed glyph is on the left. By imagining it rotated clockwise 90o, each column of pixels is one word of bits (here shown divided at the byte boundary, although IBSM does not make that distinction). Note that the 1-bits are the pixels colored by this command; the 0 bits are left unchanged on the display. The tallest character that can be displayed in a single command is 32 pixels, but taller characters can be constructed from 32-pixel swaths. This command could also be used to build up an icon from successive applications of colors.

2.  Fill rectangle. The boundary rectangle is filled with the specified color; the memory address is ignored. Horizontal and vertical lines can be programmed from 1-pixel-wide rectangles.

4.  Blit image. Consecutive words are copied to the screen, four pixels per word. The image is defined by the boundary rectangle, but the source data at the memory address can be arranged in lines of a different length that the specified width. The source line width (in 4-pixel words) is given in the datum.

5.  Capture image. This is the same as Blit, but the pixels go the other direction.

8.  Clip rectangle. This sets a clipping rectangle in screen coordinates, outside which drawing will not happen. It is initially by default defined to be the whole screen.

9.  Hide rectangle. This removes from the clipping area the specified rectangle. There is an implementation defined limit to the number of Hides that can take chunks out of a single Clip, but it is at least 20. The example below can be programmed by one Clip (10,20,50,90) and two Hides (10,65,25,90) and (35,20,50,70):

The normal use for Clip/Hide is to Clip an application window for drawing, then remove from it by Hide all the rectangles of windows partially overlapping it. The current implementation has enhanced the clipping region handling somewhat.

Colors

The IBSM simplified screen is limited to 216 colors specified by a single byte each pixel, where the color is made up of one of six values for each primary, thus:
 
Blue Green Red
36 
72  12 
108  18 
144  24
180  30 

Summing the color values for each of the primaries gives a reasonable distribution of colors. Black is 0, white is 215, and gray is any multiple of 43. Using color values greater than 215 may have unpredictable results. This will not produce beautiful JPEG photographs, but it works for software development. Future implementations will probably define a larger color space.
 

Instruction Codes

There are 32 possible primary instructions, one of which is an escape: enabling a set of 32 secondary instructions. The secondary instruction code must be in the same word as its escape, but otherwise packing is arbitrary, whatever will fit. Instructions in the range 0 to 3 can actually fit in as a 7th instruction in the word.

One of the primary instructions captures the next 10 bits as a signed constant in the range [-512,+511], and pushed onto the stack. The constant must be in the same instruction word, but if it extends off the end it is merely limited to [0,+3] or [0,+127], depending on how many bits are available.

Another primary instruction pushes a whole word constant onto the stack. The word comes from the next instruction word pointed to by the PC, which is then incremented over it. You can push up to six consecutive constants this way in one instruction word, followed by the six constant values. The next instruction is taken from the word following the sixth constant.

There are no addressing modes in the instructions themselves; all memory addresses are pushed onto the stack as constants (or calculated), and then used by one- or two-nibble opcodes.

 0  NOP  -- no operation, also word filler
 1  BZ   -- branch if false: pop 2 words, add the top word to PC if the next =0
 2  CALL -- push PC; load PC from subroutine in popped word
 3  SYS  -- software interrupt
 4  ZERO -- push 0
 5  ONE  -- push 1
 6  TWO  -- push 2
 7  THRE -- push 3
 8  MTWO -- push -2
 9  MONE -- push -1
10  PSH  -- 10-bit constant in instruction word
11  PUSH -- 32-bit constant (next word)
12  LDF  -- replace top of stack from FP-based word it addresses
13  STF  -- pop 2 words, top is FP-based address, next is word to store
14  GFR  -- globalize frame ref, then tag as GL-relative
15  NEG  -- negate top of stack
16  AND  -- push the bitwise logical AND of the top two words
17  ADD  -- push the sum of the top two words
18  MPY  -- push the product of the top two words
19  DIV  -- pop 2 words, divide top into next, push quotient then remainder
20  EQU  -- push a boolean result, =1 if the top two words are equal
21  LSS  -- push a boolean result, =1 if the top word is greater than the next
22  ROT3 -- remove the top word and insert it into the stack under the next two
23  SWAP -- exchange the top two words
24  POP  -- remove the top word
25  DUPE -- push a copy of the top word
26  RNG  -- replace top word on the stack with a boolean, =1 if next is in-range
27  GLOB -- tag address as GL-rel
28  MZERO -- push 80000000
29  SOS  -- pop top word, then use it as in index into stack, to swap with the new top
30  SHFT -- pop top word, shift next left or (negative) right
31  ESC  -- enables the secondary set...

 0' ERRZ   -- pop top word, take illegal interrupt if =0
 1' EXIT   -- function exit
 2' COCALL -- cocall/interrupt exit
 3' ENTER  -- function entry
 4' SIGNAL -- signal
 5' WAIT   -- wait
 6'        -- (reserved for future use)
 7'        -- (reserved for future use)
 8' DEBUG  -- enter debugger (not yet fully defined)
 9' COPY   -- pop 3 words: word count, destination and source addresses; copy words
10' BYCPY  -- same, but each address is 2 words, including a byte offset, for a byte copy
11'        -- (reserved for future use)
12' LDD    -- based off FP, float-double load
13' STD    -- based off FP, float-double store
14'        -- (reserved for future use)
15' FNEG   -- floating negative
16'        -- (reserved for future use)
17' FADD   -- floating add
18' FMPY   -- floating multiply
19' FDIV   -- floating divide
20' FEQU   -- push a boolean result, =1 if the top two float doubles are equal
21' FLSS   -- push a boolean result, =1 if the top float doubles is greater than the next
22' FLEQ   -- push a boolean result, =1 if the top float doubles is greater than or equal to the next
23' BES    -- swap top two words if Big-Endian, else do nothing
24'        -- (reserved for future use)
25'        -- (reserved for future use)
26'        -- (reserved for future use)
27'        -- (reserved for future use)
28' pSP    -- push SP-BR, from the value SP was before push
29' pPC    -- push PC-BR
30' pFP    -- push FP-BR
31' pBR    -- push BR

Some of these operations require further explanation.

BZ  There is only one branch instruction. It pops two words, an offset to be added to the PC if the next word is zero. To get an unconditional branch, just push a zero before the jump offset. To get an indexed jump, calculate the offset instead of pushing a constant. To jump to an absolute address, push the address, then push the PC, convert it to absolute, and subtract it (depending on whether you get this all into a single word, you may need an adjustment).

CALL  All subroutine calls are indexed negatively in a BR-based table by number. The subroutine number is popped off the stack, the return address is pushed, then the PC is replaced by the offset found in the indexed table (BR-index). Any instructions remaining in the instruction word are discarded.

ENTER  This pops one word off the stack, which it takes to be the number of local variable words to allocate, added to the SP after pushing the FP and setting the new FP to point to the old one on the stack. Thus the first variable is at offset +1, and the last argument pushed before the CALL is at offset -2.

EXIT  This pops one word off the stack, which it takes to be the number of function parameters words to discard, after restoring the FP that was saved by ENTER and reloading the PC from where it was saved by CALL.

STF pops an address off the top of the stack, then the data to store.

COPY  This is either a multi-word copy, or a multi-word constant fill. Push three values, a source value or address, then a destination address, then the number of words. If the word count is positive, that number of words is copied from source to destination; if negative, then the source value is replicated to fill that many words. Overlapping operand are copied properly with no collision. If the opcode is alone in its instruction word, then if an interrupt occurs during a long copy, the stack is restored to reflect the work already done, the interrupt is taken, and the copy resumes upon return. If there are other instructions in the instruction word, then it cannot be interrupted. BYCPY allows for character string copy.

RNG  This is a generalized array bounds test. The top number is taken as the array upper bound, and the next word as the index. The index is left on the stack and the top word is replaced by a boolean, 1 means that the index is less than the upper bound and not negative. A special case occurs if the upper bound is negative: the next word below the index is takesn as a pointer to an array with the length stored at that address -1; that is the upper bound used instead of the given -1. This assumes that all index calculation has already been done.

GLOB  There is no GL-based load or store; to access (global) variables at a fixed offset from the GL register, push the offset constant, then apply the GLOB instruction to subtract the FP from that offset. Subsequent LDF or STF instructions will get the right address.

GFR  To pass a local variable address as an address, it needs to be tagged in such a way that the subroutine using it will not be confused into using its own FP as a base register. GFR adds the FP to an offset and tags it as GL-relative, which will work anywhere in this program, even if the entire data frame is subsequently relocated. To pass the address to another program requires absolutizing it (see UAA).

NEG, ADD  There is no subtract instruction. Just negate the top word and add.

DIV  The integer divide operation produces both a quotient and a remainder. POP the one you don't want.

SHF  The top word on the stack is a shift count. For simple shifts up to 31 bits in either direction (negative shifts right), just push the bit count. If the count is in the range -64 to -68, magic happens: the low 8 bits of the word to be shifted are shifted into the position where they belong for the proper endianness of the underlying hardware, or conversely the byte is shifted down to the low 8-bit position. There are built-in compiler functions to access these operations properly, they are too hard to describe. They make strings work right. Aren't virtual computers wonderful? You can do anything you want.
 

First Draft 2004 July 31
Rev. 2005 March 11