Using the Turk/2 Compiler


Most of this applies to my Turk68 compiler, which generates native 68K Mac applications. There is a T2C compiler which generates C code for translation to PC Win32 applications from the same source code, but there have been improvements in T68 that have not yet filtered back to the C version. The T2 language is more or less defined here, but that spec is not formally complete.

Turkish Demitasse (T2 or Turk/2) is a dialect of Java. There are both extensions and deletions, so writing a program in T2 that also compiles without change in Java is unlikely. Most notably, the language has been tightened so it is more strongly typed than Java. I also made some changes for ease of programming, and so it can be compiled down to efficient native code. These are mostly described in the T2 White Paper. On-going user interface library development is documented in "MOS API".

Like C/C++, all names must be declared before use. Like Java, objects and strings are dynamic; there are no pointer types, and (with a little care) dynamic memory is automatically reclaimed when it goes out of scope. Java does this by stopping for garbage collection from time to time, which is unacceptable in a system programming environment; T2 does it with reference counts, but structures with circular links could fail to be disposed. The programmer is responsible for ensuring circular link chains are broken before objects go out of scope.
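The circular-link hazard can be illustrated with a small model of reference counting. This is only a sketch in Python, not T2 code and not the T68 runtime; the names Cell, retain, release, set_link and leaked_cells are hypothetical and exist only for this illustration.

```python
class Cell:
    """A heap object with a manual reference count and one outgoing link."""
    live = 0  # number of cells not yet disposed

    def __init__(self):
        self.count = 0
        self.link = None
        Cell.live += 1


def retain(cell):
    """Record one more reference to cell (None is ignored)."""
    if cell is not None:
        cell.count += 1
    return cell


def release(cell):
    """Drop one reference; dispose the cell when the count reaches zero."""
    if cell is None:
        return
    cell.count -= 1
    if cell.count == 0:
        Cell.live -= 1
        child, cell.link = cell.link, None
        release(child)  # disposing a cell drops the reference it held


def set_link(owner, target):
    """Point owner.link at target, adjusting both reference counts."""
    retain(target)
    release(owner.link)
    owner.link = target


def leaked_cells(break_cycle):
    """Build a two-cell cycle, let both variables go out of scope,
    and report how many cells were never disposed."""
    before = Cell.live
    a = retain(Cell())  # reference held by variable a
    b = retain(Cell())  # reference held by variable b
    set_link(a, b)
    set_link(b, a)      # a and b now refer to each other
    if break_cycle:
        set_link(a, None)  # programmer breaks the chain first
    release(a)          # variable a goes out of scope
    release(b)          # variable b goes out of scope
    return Cell.live - before
```

With the cycle left intact, each cell still holds a reference to the other when both variables are gone, so neither count reaches zero. Breaking one link first lets the ordinary release cascade dispose of both.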

C/C++ uses header files to link separate compiles into a single program, where several separate files can refer to each other mutually; Java does something similar. The current version of T2 requires a strict dependency hierarchy. I use method overrides to call functions in the other direction. It's slightly slower than plain function calls, but not much. Separate files are compiled as "package"s and "import"ed pretty much as in Java, except that you are not limited to one class per package.

You can also put several different packages into a single file, then use a special form of the package declaration at the front to tell the compiler which one you want to compile. The form is:

package "pkgname"
...
package pkgname; // package code follows..

Each compilation is explicitly terminated by a standalone dot after the last declaration (usually a function or class).

If there is a main program (including its imports) less than 1000 bytes from the front, and if there are empty quotes in the initial package line, the compiler will look at what the main program imports, then compile all those packages it finds in that file before compiling the main. If the first instance of the word "import" is in a line comment, this feature is disabled. Only line comments (not block comments) are recognized by this hack, so judicious use of comments can adjust what gets compiled. Real Soon Now (in Jerry Pournelle's sense of "Don't hold your breath waiting for it") I hope to have a full IDE up which can look at file and package time-stamps to do this automatically.

The entire source file must fit in memory, and each compiled package is limited to about 500K. Functions greater than 32K probably cannot be compiled because of the 16-bit branch address limit (for compatibility, the new 68020 long branches are not used). Literals and constant tables and class method dispatch tables are indexed off the back of the code; if this exceeds 32K, the access for the more distant tables will require more code (slightly larger and slower, but otherwise safe). String literals are indexed off this same code end, and distinguished from other strings by being a small (<2K) integer. You are thus limited to not more than 2040 distinct string literals in one compilation unit.

If you need to give the compiler more memory using the Finder Get-Info dialog, make sure the number of K you give it ends in the decimal digits "-013". The startup code for applications compiled by T68 (including the compiler itself) looks at the low four bits of the "SIZE" resource to decide how much space to allocate to the stack vs the heap. If the number of K is odd, the next three bits fix a boundary proportionally, so that the default "-001" is all heap + tiny stack, and "-015" is tiny heap + all the rest stack. The compiler is deeply recursive and needs a lot of stack (3/4 of the total = 0xD = "-013", where any multiple of 2000 is an exact multiple of 16, so that the low 4 bits can be known from the last two decimal digits). Giving it more memory without this allocation will fail. I need to make that automatic RSN.
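The arithmetic behind the "-013" rule can be checked with a few lines of Python. This is a sketch of the bit layout as described above, not the actual startup code; the function name stack_field is hypothetical, and treating an even size as "no setting" is an assumption, since the text only describes the odd case.

```python
def stack_field(size_k):
    """Return the 3-bit stack setting (0..7) encoded in a memory size
    given in K, or None when the size is even (case not described)."""
    low4 = size_k & 0xF          # only the low four bits matter
    if low4 & 1 == 0:
        return None              # flag bit clear: no proportional setting
    return low4 >> 1             # the next three bits above the flag


# 8000 is a multiple of 2000, hence of 16, so the low bits are just 13 = 0xD.
compiler_size = 8013
field = stack_field(compiler_size)   # 0xD = binary 1101: flag 1, field 110 = 6
```

A field of 6 out of 8 steps matches the "3/4 of the total" quoted above. A size such as 3013 does not work the same way: 3000 is not a multiple of 16, so its low four bits come out as 5 rather than 13.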

Most T2 right braces can be optionally named according to what kind of block is being closed. This allows the compiler to check your intentions, and often it can report a brace-match error much earlier (and therefore easier to find) than when this feature is unavailable or not used. Real Soon Now I hope to add an indentation checker also.

The 68K T2 implementation does variables of the String type as Mac "handles", which can be moved around in memory to minimize fragmentation, and String literals as Mac "P-strings" starting with a length byte. This limits the length of string literals to a maximum of 255 characters, but making a literal even that long requires very long source lines because the compiler does not support \-cuts. You get some of the benefit of \-cuts by concatenation operators between literals, which are joined into a single string literal at compile time (so long as the length of the result does not exceed 255). Mac handles are not particularly fast, so there are some library functions for working with short strings implemented as P-strings within integer arrays. Because there is no memory allocation involved until the string result is extracted, this goes much faster. The framework distinguishes handle-based Strings from P-strings because the length byte pushes the address of a P-string to be odd, whereas handles are always even. The array-based P-strings can thus be used without conversion to handles, if you are careful not to expect them to outlast the integer array. The compiler offers no protection from this error, but if you are using them, you are expected to know what you are doing. The defined String type is safe.
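The P-string layout and the 255-character ceiling on joined literals can be modeled as follows. This is a Python sketch of the general Pascal-string format (one length byte followed by the characters), not the compiler's own code; the function names are hypothetical, and the use of the Mac Roman encoding is an assumption.

```python
MAX_PSTRING = 255  # the length must fit in the single leading byte


def to_pstring(text):
    """Encode text as a length byte followed by its characters."""
    data = text.encode("mac_roman")
    if len(data) > MAX_PSTRING:
        raise ValueError("literal longer than 255 characters")
    return bytes([len(data)]) + data


def from_pstring(buf):
    """Decode a P-string, reading only as many bytes as the length says."""
    return buf[1:1 + buf[0]].decode("mac_roman")


def join_literals(*parts):
    """Model joining concatenated literals into one literal at compile time."""
    return to_pstring("".join(parts))
```

For example, to_pstring("Turk") yields the five bytes 04 'T' 'u' 'r' 'k', and join_literals succeeds only while the combined length stays at or below 255.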

The "private" attribute can be used on global variables and functions declared in a package, to make them invisible to other packages that might import that package. This works essentially as it does within classes. The "protected" attribute is not yet implemented. Real Soon Now.

You can write code that takes the address of a variable (or function) and does arithmetic on it as an integer, but this is patently unsafe, so it is forbidden by the compiler unless you "import Dangerous;". The built-in Dangerous package also contains special functions for generating native code and doing other (ahem) dangerous things that system code needs to do. It also includes an opaque data type "X1NT" and functions for casting it to and from integers; I use this type to make functions visible to other packages that import Dangerous, but not usable to the general public. Because you must import Dangerous to use the prefix "&" operator, there are several language extensions using the same operator in strange places to inform the compiler of system-programming intentions. These should not be a concern for normal code.

Previous versions of T2 did not support floating point, as that is essentially unnecessary for systems software development and not available on a pure 68000. However, I recently found myself implementing an ad-hoc fixed-point data type (which executes at integer speed in a target environment that has no native floating-point hardware), so I added the 'float' data type to the latest T68 compiler. If you simply use the 'float' data type, the compiler will (probably) notice and "import FloatLib;" for you, but you can thwart that by failing to "import System;" with exactly one space between the two words, and then "import RatLib;" (the fixed-point version, or else your own implementation) instead. You can only use one architecture at a time in a program, but mostly the usage is transparent -- except that adds and compares and multiplies are inlined native code, and everything is much faster in fixed-point.

The IEEE 754 compatible floating point package "FloatLib" only runs on 68030-compatible hardware, as it uses the long multiply instruction not available on 68000s. I do not know a safe way to detect which hardware things are running on, so you must tell the compiler: In the "TAGC Prefs" resource file in the Preferences folder, you need to add a new string resource (use ResEdit) containing any number greater than 0 (I use "68030"). Then all compiles will generate in-line long multiplies and divides where appropriate, instead of a system call. If you use if-tests to constrain your operands to 16 bits, then inline hardware multiplies and divides will also be used in 68000 mode. The current T68 release was compiled with the 68030 switch turned on (because it uses more memory than is available in 68000 hardware). You can use the new library function "int GetCPU();" to determine whether the switch was on or off when the program containing the call was compiled.

Also new in this version of T68 is an AssertRange(a,b,c) pseudo-function, which tests the parameters for being well-ordered (a<=b<=c) at compile-time, and stops the compile if not. Normally you would use 'final' constants which are controlling other things like conditional code, but it also works with variables that have been constrained with &-AND or if-tests against constants. For example:

if (a<3) if (b>5) ... AssertRange(a,b); // compiles, but
if (a<b) ... AssertRange(a,b); // probably errors off
final int x = 3; ... AssertRange(3,x,3); // verifies at compile-time that x=3
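
One way to picture why the first and third lines can be verified while the second cannot is to track a constant range for each value. The following Python sketch is only a model of that idea, under the assumption that the check works on known constant bounds; it is not how the T68 compiler is implemented, and the names Bounds and provably_ordered are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Bounds:
    """Closed integer range known for a value at compile time."""
    lo: float = -math.inf   # no known lower bound by default
    hi: float = math.inf    # no known upper bound by default


def provably_ordered(*values):
    """True when each value's upper bound is at or below the next
    value's lower bound, so the ordering holds for every possible case."""
    return all(left.hi <= right.lo
               for left, right in zip(values, values[1:]))


# if (a<3) if (b>5): tests against constants give constant bounds.
a = Bounds(hi=2)
b = Bounds(lo=6)
first_case = provably_ordered(a, b)

# if (a<b): a test between two variables gives no constant bounds.
second_case = provably_ordered(Bounds(), Bounds())

# final int x = 3: a constant has identical lower and upper bounds.
three = Bounds(3, 3)
third_case = provably_ordered(three, Bounds(3, 3), three)
```

In this model the first and third cases are provable and the second is not, which agrees with the comments on the three example lines.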

Optimizer

The current Turk68 compiler precalculates constant integer and boolean expressions, and within limits also joins concatenated string literals into a single literal, as noted above. Blocks of code controlled by a conditional based on a constant false boolean expression are deleted from the compiled code. This is not quite as convenient as C's "#if" conditional compilation, because the deleted code must still be valid T2 code. Except within packages and classes (where a subsequent compilation might need them) unused functions are similarly deleted from the object file.

All T2 array accesses range-check the index, and all dynamic arrays and object accesses also check the pointer for null. This is normally done with a system call, which is not overly fast, but the compiler is able to recognize in many cases when the programmer already did that test, and therefore eliminates the system call in those cases. This analysis happens in the single source-code pass, so it cannot know about global variables or anything that might happen later in a loop. I always insert code to re-check pointers and subscripts after every loop, and also inside each loop, preferably just before they are used. The compiler knows about the LENGTH() of arrays in such tests. The compiler is not able to propagate variable range-check information across variable assignments -- if you test variable x, then copy it to y, the compiler will not know about y. It also does not do much with expressions involving the index variable. With a lot more effort, I could make it smarter, but this is sufficient to greatly speed up array access.

There is a system function "BoundsNullError()" which if you call it within a function, tells the compiler to generate a compile-time error anywhere in that function after its call, if it generates a system call for array-bounds or null-pointer checking. These system calls are not really errors, but they do slow down execution. Including this function call informs you at compile time if that problem is in this code.
 

Icon

It's really rather a hack, but you can define icons for your application (and in the Mac, its text and Binary files) as a sequence of comment lines using letters and other symbols to graphically define the pixels in a monospace font as is common in program editors. The first line of a 32x32 icon starts with the keyword "/// MagicApIcon." The next 32 lines must each be at least 80 characters long, with pixel information, one pixel in each pair of characters beginning after 16. I use dots along the four sides to visually align the pixels to the icon cell. In the T2C version of the compiler, you must also supply a 16x16 icon with the keyword "/// Magic16Icon." followed by 16 lines of pixels (at least 48 characters in length), but it is optional on the Mac. The TAG compiler will look for similar lines (comment lines starting with "{// " instead of "/// ") and copy them over to the generated T2 files.

The Mac has 34 defined icon colors (which I reverse-engineered from a color table in ResEdit); the PC ".ico" file usually includes a table of 16 colors, to which I gave the following symbols as useful:

 
sp  white              #  black
R   red                K  dark red
G   green              O  dark green
B   blue               I  indigo/dark blue
C   cyan/light blue    S  darker cyan or light green
M   magenta/pink       P  purple
Y   yellow             N  brown
*   gray               .  light gray

Here for example are the first few lines of the TAG Compiler 16x16 icon (see the source file for the rest). I hope it looks better in your browser than in mine. It has a dark blue border, shading to a light blue interior, with black text:
/// Magic16Icon.+. . . . . . . !+. . . . . . . .+.
///            .              I                  .+
///            .            I B I                .
///            .          I B C B I              .
///            .        I B C C C B I            .
///            .      I B C C C C C B I          .
///            .    I B C C C C C C C B I        .
///            .  I B # # # C # C C # # B I      .
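
The row layout described above (pixel data after the first 16 characters, one pixel per pair of characters) can be sketched as a small decoder. This is a Python illustration, not the compiler's parser; the function name decode_icon_row is hypothetical, and taking the non-blank character of each pair as the pixel symbol is an assumption, since the text does not say which character of the pair carries the color.

```python
PREFIX_WIDTH = 16  # pixel pairs begin after the first 16 characters


def decode_icon_row(line, width):
    """Return one color symbol per pixel for a single icon comment line."""
    needed = PREFIX_WIDTH + 2 * width   # 48 for 16x16, 80 for 32x32
    if len(line) < needed:
        raise ValueError("icon line shorter than %d characters" % needed)
    pixels = []
    for i in range(width):
        start = PREFIX_WIDTH + 2 * i
        pair = line[start:start + 2].strip()
        # Assumption: a blank pair is a white pixel; otherwise the
        # non-blank character in the pair names the color.
        pixels.append(pair[0] if pair else " ")
    return "".join(pixels)


# Build a sample row: "///", padding, a border dot in column 16,
# then sixteen pixel pairs.
symbols = "      IBCBI     "
sample = "///".ljust(15) + "." + "".join(s + " " for s in symbols)
decoded = decode_icon_row(sample, 16)
```

The length check reflects the minimum line lengths given earlier: 16 + 2x16 = 48 characters for the small icon and 16 + 2x32 = 80 for the large one.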


T2C makes a ".ico" file, which you must manually include in your build. The Mac version builds code directly, with the icon embedded.

The T68 compiler makes a guess at what your memory requirements might be, but it's wrong more often than right. If you include the string "MemoryMeg = 99" in the "Magic16Icon" line, where "99" is the number of Mbytes you want your app to be allocated, the compiler will take your advice over its own guess. If you include also the string "Stack% = 3" on the same line, where "3" is any numeral between 1 and 7, the compiler will take it as advice on how much of the memory to allocate to stack (1 is minimum, 7 is maximum). If the stack percentage is important to you, and if somebody -- perhaps yourself, without thinking about it -- might change the memory allocation in the Finder, I added a system call to compare the new "SIZE" resource #0 to the number you gave the compiler, and correct it if necessary, but you must restart before it takes effect. I use a line like this near the front of my main() to do that:

if (FixBadStackPercent(true)) return; // false if % is OK

Files

All the files for compiling TAGs to Turk/2 and Turk/2 to native 68K Mac applications are in this download:
Turk68.hqx
This includes two programs ready to run (the version suffix may be different in the current distribution):
TAG2T2J23 -- The TAG compiler, to Turk/2, and
Turk68J23 -- The T2 compiler, to MacOS 68K
and these source files:
TAG2Turk.tag -- source TAG for TAG2T2a30
Turk68.tag -- source TAG for Turk68m3
SysLibs.t2 -- source code (in T2) for MOS API library, requires also sKernel68.t2 to run
sKernel68.t2 -- source code (in T2) to interface to the MacOS, must be compiled after SysLibs.t2
TagLibs.t2 -- source code (in T2) for TAG compiler library code, to be compiled after SysLibs.t2
Libry.BTL -- an (almost) empty library file to get started
I use this on a PPC Mac running MacOS/9. It needs 160MB to compile itself, so it won't run on anything with less memory than that (it can be virtual, but it will run slower). First you need to build the libraries. Once upon a time I always began by inserting "Nothing" in the quotes at the beginning of SysLibs.t2, which builds a file "Libry.BTL" containing a null library. It appears to work without this step, but fails later. Alternatively, you can compile the file "Nothing.t2" (also included in this build). Now I just start with the minimal library ("Libry.BTL") included in this build.

The first time you run it, it asks where the library file is, and fails if you can't give it a file (an empty file works); it saves the file location in the Preferences/Registry for subsequent runs. If this doesn't work (I have not exercised it very much), look in the Preferences folder for a file "TAGC Prefs" and open it in ResEdit. Look for a "STR " resource by the name "\TAGC\TAGC\LibryFile" and give that resource the full path to whatever file you want to be using for the compiler's library (it should end in ".BTL"). I suggest you use the supplied "Libry.BTL" file, and if you do, you don't need to compile the "Nothing.t2" initialization package.

The ".BTL" suffix is used for a file format that encodes resources in the data fork. There is a resource viewer in SysLibs for examining the contents of such a library. The T68 compiler libraries also use the resource fork to hold (Mac) package "code" resources, which you can examine in ResEdit. Various of these code resources get included into the built application by the "import" command.

Then compile the original SysLibs.t2 file as distributed. If you use the supplied Libry.BTL file, you can omit the initialization step and just compile SysLibs.t2 as is. Then compile the sKernel68.t2 and the TagLibs.t2 files (in that order). Save a copy of the updated Libry.BTL file for subsequent compiles. You can save a copy before compiling the TagLibs.t2 file for faster compiles of things other than compilers. Give the two ".tag" grammars to TAG2T2 to produce ".t2" files, which can then be compiled in Turk68 to build the respective applications.

Most of these same source files are part of the Turk2C build, which makes C++ files that can be compiled in Microsoft's VisualStudio to run on a PC.

You are encouraged to experiment with writing your own grammars, and/or make modifications to these two. See "How To Write a Transformational Attribute Grammar" for help getting started doing that.
 

Bugs

The current 2017 compiler has some bugs that we need to program around until I get them fixed.

This was designed for an 8-bit color model. It splatters pixels all over the windows in 32-bit. RSN.

MacOS/9 is pretty far down the slope of its useful life. One of the bugs (it turns up more often in 9.2 than 9.0) causes the compiler to hang at the get-go. If it doesn't seem to be showing new status updates every second or two, Quit it and start over.

OSX Classic is much further down the slope of functionality; it no longer supports DragonDrop. The TAG compiler has no event loop; it accepts one file dropped on it by the (working, pre-OSX) Finder, compiles it, then quits. So this release might run in OSX Classic if Apple had not bungled it. Maybe I will fix it to work around the Apple bug, or maybe I'll just release a Finder replacement that works.

The compiler does not catch and enforce violations of the letter-case rule for identifiers. This is apparently due to a bug in the TAG compiler that I have not yet identified and fixed. If you write correct T2 code, you won't notice this bug. If you try to make identifiers differing only in the case of some or all letters, your program will break when the compiler is fixed.

In packages other than the main, final arrays of string literals do not always compile properly. The workaround is to put all such tables in the main segment, then copy them to other tables as needed.

Most framework function calls invoke functions defined in other packages; these work fine. Some defined framework calls generate inline code that directly calls on the kernel; these also mostly work properly, but if they return a String value, and this value is used to initialize a variable declaration inside a function, the variable is doubly reference-counted, so that the string is not properly disposed when it goes out of scope. The work-around is to not initialize variables from those kinds of system calls, or if necessary, to do so through a library function "Stx(String theStr)", which merely returns its argument.

Cross-package static method calls crash. The work-around is to use ordinary functions.

A sub-launched program must clean up its own dynamic variables (objects, arrays, and strings) when quitting, otherwise they remain in memory after the program is gone and cannot be deleted. My work-around is to surround each Quit() call (and the end of main()) with some code to assign null or the empty string "" to all such variables as appropriate.

Large-value named globals may forget their value in a very large compile. Some very complex expressions fail to compile correctly, and may need to be broken up into separate statements.

2017 August 24a