This document describes the 32-bit Itty Bitty Stack Machine (IBSM), hopefully in sufficient detail as to enable a competent programmer to implement it, the virtual computer we use for operating system research and development. There are two objectives served in this design:
1. The virtual computer must be fast, that is, it should be capable of emulation at a raw speed not slower than 10% of the host native hardware.
2. It must be simple, so that not very much host code is needed to implement it.
The initial IBSM implementation, written in HyperTalk compiled to 68000 code emulated in the MacOS PowerPC emulator, is neither fast nor simple. However, it does provide a lot of debugging facilities that would not be present in the production implementation.
The first implementation had a very clumsy tagged address mechanism
to support multiple address segments. That has been removed and simplified,
so all memory addresses are now offset from a single Base Register. There
is now only one load and one store instruction, for local frame-based access,
and a couple conversion operators for converting that to Base-Register
form and back.
The peripheral hardware is idealized to simplify the system management of it:
The Keyboard provides keypress interrupts with ASCII data and the real-time state of modifier keys.Because the architecture is word-addressed, Endianness is not normally a concern to the programmer. However, character strings are stored in sequential bytes in the host-defined endian order. The IBSM has a special shift operation that extracts and inserts bytes in the correct position for the underlying hardware, completely transparently to the software. The software can of course test for endianness, but except for importing cross-platform program files, this is not necessary.
The Mouse provides button interrupts and real-time position information, relative to the virtual screen.
The Screen maintains its own buffer and supports four high-level graphical operations, plus the ability to set rectangle-based clipping regions.
The virtual Disk drive is sector-mapped direct-memory transfer, with interrupt on completion.
There is a Host interface for sending a variety of operations out to the host platform, and for receiving results (and interrupts) back.
There is a place for Serial interfaces, not presently implemented (these are all handled by the Host).
The IBSM packs up to six (or sometimes seven) 5-bit instructions into a 32-bit instruction word. This allows for 31 5-bit frequently used instructions, and 32 10-bit less-frequently used instructions, of which the first 5-bit nibble is an escape from the first set. These are described below in detail. There is no requirement to fill the instruction word; unused bits filled with zero are no-operations, skipped by the emulator. However, denser code puts less of a burden on the memory bandwidth, resulting in faster execution. Furthermore, because the multiple-op instruction word is not normally interruptable in the middle, it runs faster in emulation due to fewer tests for interrupts.
Memory addresses in memory are relative to a base register, so that
both program code and data can be dynamically relocated by the operating
system while preserving the addresses of data stored in there. The base
register is not normally accessible to programmers in application programs,
but is maintined by the system software for memory management.
BR is the Base Register, which relocates everything. Code if offset negative from BR, and the stack is offset positive.Except for system software (with BR=SX=0), programs are not permitted to access memory outside the HL-SX range.
HL is the Heap Lower bound register, typically below the code.
SX is the Stack eXtent, the upper limit of memory allocated for function stack growth.
There are three registers which control program operation:
FP is the current local variable frame set on entering a function, and restored to its previous value on exit. It is always GL-relative.
PC is the program counter, the address of the next instruction fetch. It is always CB-relative.
SP is the current stack pointer, the place in memory where data is pushed and popped. The SP always points to the word most recently pushed, the top of the stack. The stack grows upward. When pushed onto the stack, SP is GL-relative; when stored in connection with an interrupt, it is always absolute.
During interrupts, the registers are popped off the stack in the order given above (BR,SX,FP,PC) and pushed in the reverse order. The HL register is reloaded from memory offset from BR. The SP stored (and retrieved) in an interrupt context switch always points to the pushed BR value thus:
There are two instructions to operate on semaphores: Signal and Wait. If the semaphore has a (negative) count, Wait increases the value and continues executing; otherwise it triggers the interrupt. If a semaphore value is positive, Signal triggers its interrupt; otherwise it counts down (in the negative direction) and continues execution.
The operating system is expected to handle the interrupts and effect appropriate process management for the processes that block (Wait) on a semaphore with no (negative) counts, or conversely to prepare for continued execution processes already blocked on a semaphore that is Signalled.
A reader-writer pair of processes on a circular buffer can set up a
pair of semaphores, Fill and Take, with Fill pre-counted
with the buffer size. The reader process Waits on the Take semaphore,
and when unblocked, advances its own buffer pointer and takes the data,
and finally Signals the Fill semaphore. Conversely, the writer
process Waits on the Fill semaphore, then advances its own buffer
pointer and inserts a new datum, and finally Signals the Take
semaphore. No deadlock, buffer overrun, nor race condition is possible
in this simple example.
A. Process timeout (timer interrupt).In all cases, the context switch is triggered by some kind of interrupt, which has already stacked the process state and stored its SP in the interrupt vector designated address. The service routine, upon determining that context switch is called for, need only save the SP value in the current process information block, do whatever associated housekeeping is needed, then replace it with the next process SP when it resumes.
B. I/O pending or completion (I/O interrupt).
C. Semaphore Signal/Wait (semaphore interrupt).
D. System Call request by the software (SysCall interrupt).
The following are the interrupt vector assignments, which is the absolute address of that vector entry:
0 Illegal operationThe IBSM will not interrupt system software (BR=SX=0) for a pending I/O interrupt (6-15), but it will, if still pending, be activated upon completion of the first instruction word after resuming user program. One of the words in the I/O space has a bit for each of these interrupt numbers, and that bit can be turned off by the software to unpend the currently pending interrupt. Only the debugger trap bit can be turned on in software. To single-step a user program in a software debugger, the interrupt service handler need only set the pending bit and Cocall out.
1 PC=0 (program exit)
2 Stack overflow
3 System Call
7 Disk I/O complete
8 Mouse click
10 Host event
11 Serial I/O
15 Debugger trap
All I/O space addresses are in low memory:
10 Interrupt pending bits
11 Video command; a store initates the action.
12 Video memory address
13 Video datum/color
14 Video top bound
15 Video left bound
16 Video bottom bound
17 Video right bound
18 Disk datum/command; a store here initiates the action.
19 Disk word count
1A Disk sector address
1B Disk memory address
1C Serial datum/command
1D Serial port address
1E Serial word count
1F Serial memory address
20 Keypress datum, modifiers in high half
21 Mouse button(s) in high bits
22 Mouse vertical coordinate
23 Mouse horizontal coordinate
24 Current time in seconds from 12:00am 2000 Jan 1.
25 Current time in milliseconds from sometime recent (last boot? midnight?)
26 Host Command; a store here initiates the action.
27-2F additional Host parameters
0. Screen Rectangle. Returns the boundary rectangle of the screen hardware or host window.
1. Text bits. Beginning at the specified address, each word of data represents one pixel column, initially aligned with the top/left coordinates supplied, and continuing one word per pixel column until the right bound is reached. The least significant bit is the top pixel, as shown:
2. Fill rectangle. The boundary rectangle is filled with the specified color; the memory address is ignored. Horizontal and vertical lines can be programmed from 1-pixel-wide rectangles.
4. Blit image. Consecutive words are copied to the screen, four pixels per word. The image is defined by the boundary rectangle, but the source data at the memory address can be arranged in lines of a different length that the specified width. The source line width (in 4-pixel words) is given in the datum.
5. Capture image. This is the same as Blit, but the pixels go the other direction.
8. Clip rectangle. This sets a clipping rectangle in screen coordinates, outside which drawing will not happen. It is initially by default defined to be the whole screen.
9. Hide rectangle. This removes from the clipping area the specified rectangle. There is an implementation defined limit to the number of Hides that can take chunks out of a single Clip, but it is at least 20. The example below can be programmed by one Clip (10,20,50,90) and two Hides (10,65,25,90) and (35,20,50,70):
The normal use for Clip/Hide is to Clip an application window for drawing, then remove from it by Hide all the rectangles of windows partially overlapping it. The current implementation has enhanced the clipping region handling somewhat.
Summing the color values for each of the primaries gives a reasonable
distribution of colors. Black is 0, white is 215, and gray is any multiple
of 43. Using color values greater than 215 may have unpredictable results.
This will not produce beautiful JPEG photographs, but it works for software
development. Future implementations will probably define a larger color
One of the primary instructions captures the next 10 bits as a signed constant in the range [-512,+511], and pushed onto the stack. The constant must be in the same instruction word, but if it extends off the end it is merely limited to [0,+3] or [0,+127], depending on how many bits are available.
Another primary instruction pushes a whole word constant onto the stack. The word comes from the next instruction word pointed to by the PC, which is then incremented over it. You can push up to six consecutive constants this way in one instruction word, followed by the six constant values. The next instruction is taken from the word following the sixth constant.
There are no addressing modes in the instructions themselves; all memory addresses are pushed onto the stack as constants (or calculated), and then used by one- or two-nibble opcodes.
0 NOP -- no operation, also word fillerSome of these operations require further explanation.
1 BZ -- branch if false: pop 2 words, add the top word to PC if the next =0
2 CALL -- push PC; load PC from subroutine in popped word
3 SYS -- software interrupt
4 ZERO -- push 0
5 ONE -- push 1
6 TWO -- push 2
7 THRE -- push 3
8 MTWO -- push -2
9 MONE -- push -1
10 PSH -- 10-bit constant in instruction word
11 PUSH -- 32-bit constant (next word)
12 LDF -- replace top of stack from FP-based word it addresses
13 STF -- pop 2 words, top is FP-based address, next is word to store
14 GFR -- globalize frame ref, then tag as GL-relative
15 NEG -- negate top of stack
16 AND -- push the bitwise logical AND of the top two words
17 ADD -- push the sum of the top two words
18 MPY -- push the product of the top two words
19 DIV -- pop 2 words, divide top into next, push quotient then remainder
20 EQU -- push a boolean result, =1 if the top two words are equal
21 LSS -- push a boolean result, =1 if the top word is greater than the next
22 ROT3 -- remove the top word and insert it into the stack under the next two
23 SWAP -- exchange the top two words
24 POP -- remove the top word
25 DUPE -- push a copy of the top word
26 RNG -- replace top word on the stack with a boolean, =1 if next is in-range
27 GLOB -- tag address as GL-rel
28 MZERO -- push 80000000
29 SOS -- pop top word, then use it as in index into stack, to swap with the new top
30 SHFT -- pop top word, shift next left or (negative) right
31 ESC -- enables the secondary set...
0' ERRZ -- pop top word, take illegal interrupt if =0
1' EXIT -- function exit
2' COCALL -- cocall/interrupt exit
3' ENTER -- function entry
4' SIGNAL -- signal
5' WAIT -- wait
6' -- (reserved for future use)
7' -- (reserved for future use)
8' DEBUG -- enter debugger (not yet fully defined)
9' COPY -- pop 3 words: word count, destination and source addresses; copy words
10' BYCPY -- same, but each address is 2 words, including a byte offset, for a byte copy
11' -- (reserved for future use)
12' LDD -- based off FP, float-double load
13' STD -- based off FP, float-double store
14' -- (reserved for future use)
15' FNEG -- floating negative
16' -- (reserved for future use)
17' FADD -- floating add
18' FMPY -- floating multiply
19' FDIV -- floating divide
20' FEQU -- push a boolean result, =1 if the top two float doubles are equal
21' FLSS -- push a boolean result, =1 if the top float doubles is greater than the next
22' FLEQ -- push a boolean result, =1 if the top float doubles is greater than or equal to the next
23' BES -- swap top two words if Big-Endian, else do nothing
24' -- (reserved for future use)
25' -- (reserved for future use)
26' -- (reserved for future use)
27' -- (reserved for future use)
28' pSP -- push SP-BR, from the value SP was before push
29' pPC -- push PC-BR
30' pFP -- push FP-BR
31' pBR -- push BR
BZ There is only one branch instruction. It pops two words, an offset to be added to the PC if the next word is zero. To get an unconditional branch, just push a zero before the jump offset. To get an indexed jump, calculate the offset instead of pushing a constant. To jump to an absolute address, push the address, then push the PC, convert it to absolute, and subtract it (depending on whether you get this all into a single word, you may need an adjustment).
CALL All subroutine calls are indexed negatively in a BR-based table by number. The subroutine number is popped off the stack, the return address is pushed, then the PC is replaced by the offset found in the indexed table (BR-index). Any instructions remaining in the instruction word are discarded.
ENTER This pops one word off the stack, which it takes to be the number of local variable words to allocate, added to the SP after pushing the FP and setting the new FP to point to the old one on the stack. Thus the first variable is at offset +1, and the last argument pushed before the CALL is at offset -2.
EXIT This pops one word off the stack, which it takes to be the number of function parameters words to discard, after restoring the FP that was saved by ENTER and reloading the PC from where it was saved by CALL.
STF pops an address off the top of the stack, then the data to store.
COPY This is either a multi-word copy, or a multi-word constant fill. Push three values, a source value or address, then a destination address, then the number of words. If the word count is positive, that number of words is copied from source to destination; if negative, then the source value is replicated to fill that many words. Overlapping operand are copied properly with no collision. If the opcode is alone in its instruction word, then if an interrupt occurs during a long copy, the stack is restored to reflect the work already done, the interrupt is taken, and the copy resumes upon return. If there are other instructions in the instruction word, then it cannot be interrupted. BYCPY allows for character string copy.
RNG This is a generalized array bounds test. The top number is taken as the array upper bound, and the next word as the index. The index is left on the stack and the top word is replaced by a boolean, 1 means that the index is less than the upper bound and not negative. A special case occurs if the upper bound is negative: the next word below the index is takesn as a pointer to an array with the length stored at that address -1; that is the upper bound used instead of the given -1. This assumes that all index calculation has already been done.
GLOB There is no GL-based load or store; to access (global) variables at a fixed offset from the GL register, push the offset constant, then apply the GLOB instruction to subtract the FP from that offset. Subsequent LDF or STF instructions will get the right address.
GFR To pass a local variable address as an address, it needs to be tagged in such a way that the subroutine using it will not be confused into using its own FP as a base register. GFR adds the FP to an offset and tags it as GL-relative, which will work anywhere in this program, even if the entire data frame is subsequently relocated. To pass the address to another program requires absolutizing it (see UAA).
NEG, ADD There is no subtract instruction. Just negate the top word and add.
DIV The integer divide operation produces both a quotient and a remainder. POP the one you don't want.
SHF The top word on the stack is a shift count. For simple
shifts up to 31 bits in either direction (negative shifts right), just
push the bit count. If the count is in the range -64 to -68, magic happens:
the low 8 bits of the word to be shifted are shifted into the position
where they belong for the proper endianness of the underlying hardware,
or conversely the byte is shifted down to the low 8-bit position. There
are built-in compiler functions to access these operations properly, they
are too hard to describe. They make strings work right. Aren't virtual
wonderful? You can do anything you want.
First Draft 2004 July 31
Rev. 2005 March 11