Using the Turk/2 Compiler


Most of this applies to my Turk68 compiler, which generates native 68K Mac applications. There is a T2C compiler which generates C code for translation to PC Win32 applications from the same source code. The T2 language is more or less defined here, but that spec is not formally complete.

Turkish Demitasse (T2 or Turk/2) is a dialect of Java. There are both extensions and deletions, so a program written in T2 is unlikely to compile unchanged in Java. Most notably, the language has been tightened so that it is more strongly typed than Java. I also made some changes for ease of programming, and so that it can be compiled down to efficient native code. These are mostly described in the T2 White Paper. Ongoing user interface library development is documented in "MOS API".

Like C/C++, all names must be declared before use. Like Java, objects and strings are dynamic; there are no pointer types, and (with a little care) dynamic memory is automatically reclaimed when it goes out of scope. Java does this by stopping for garbage collection from time to time, which is unacceptable in a system programming environment. T2 does it with reference counts, but the counts in a structure with circular links never reach zero, so such a structure could fail to be disposed. The programmer is responsible for ensuring circular link chains are broken before objects go out of scope.
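
To illustrate why circular links defeat reference counting, here is a minimal sketch in Python (not T2, and not the actual T2 runtime). It is a toy model only: the Node class and the retain, release, set_link, and demo functions are hypothetical names made up for this example.

```python
class Node:
    """Toy reference-counted object (hypothetical; models the idea only)."""
    live = 0  # number of nodes created and not yet disposed

    def __init__(self, name):
        self.name = name
        self.count = 0      # reference count
        self.link = None    # one outgoing reference to another Node
        Node.live += 1


def retain(node):
    """Record one more reference to node and return it."""
    if node is not None:
        node.count += 1
    return node


def release(node):
    """Drop one reference; dispose the node when its count reaches zero."""
    if node is None:
        return
    node.count -= 1
    if node.count == 0:
        Node.live -= 1
        child, node.link = node.link, None
        release(child)  # disposing a node releases what it points to


def set_link(node, target):
    """Point node.link at target, keeping both counts correct."""
    retain(target)
    release(node.link)
    node.link = target


def demo(break_cycle):
    """Build a two-node cycle, then let both local variables go out of scope."""
    Node.live = 0
    a = retain(Node("a"))
    b = retain(Node("b"))
    set_link(a, b)
    set_link(b, a)          # a -> b -> a is now a circular chain
    if break_cycle:
        set_link(b, None)   # the programmer breaks the chain first
    release(a)              # 'a' goes out of scope
    release(b)              # 'b' goes out of scope
    return Node.live        # nodes that were never disposed
```

In this model, demo(False) leaves both nodes undisposed because each still holds a count of one from the other, while demo(True) disposes both once the chain is broken.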

C/C++ uses header files to link separate compiles into a single program, where several separate files refer to each other mutually; Java does something similar. The current version of T2 requires a strict dependency hierarchy. I use method overrides to call functions in the other direction. It's slightly slower than plain function calls, but not much. Separate files are compiled as "package"s and "import"ed pretty much as in Java, except that you are not limited to one class per package.

You can also put several different packages into a single file, then use a special form of the package declaration at the front to tell the compiler which one you want to compile. The form is:

    package "pkgname"
    ...
    package pkgname; // package code follows..

Each compilation is explicitly terminated by a standalone dot after the last declaration (usually a function or class).

If there is a main program (including its imports) less than 1000 bytes from the front, and if there are empty quotes in the initial package line, the compiler will look at main program imports, then compile all those packages it finds in that file before compiling the main. If the first instance of the word "import" is in a line comment, this feature is disabled. Only line comments (not block comments) are recognized by this hack, so judicious use of comments can adjust what gets compiled. Real Soon Now I hope to have a full IDE up which can look at file and package time-stamps to do this automatically.

The entire source file must fit in memory, and each compiled package is limited to about 500K. Functions greater than 32K probably cannot be compiled because of the 16-bit branch address limit (for compatibility, the new 68020 long branches are not used). Literals and constant tables and class method dispatch tables are indexed off the back of the code; if this exceeds 32K, the access for the more distant tables will require more code (slightly larger and slower, but otherwise safe). String literals are indexed off this same code end, and distinguished from other strings by being a small (<2K) integer. You are thus limited to not more than 2040 distinct string literals in one compilation unit.

Most T2 right braces can be optionally named according to what kind of block is being closed. This allows the compiler to check your intentions, and often it can report a brace-match error much earlier (and therefore easier to find) than when this feature is unavailable or not used. Real Soon Now I hope to add an indentation checker also.

The 68K T2 implementation represents variables of the String type as Mac "handles", which can be moved around in memory to minimize fragmentation, and String literals as Mac "P-strings" starting with a length byte. This limits string literals to a maximum of 255 characters, but making a literal even that long requires very long source lines because the compiler does not support \-cuts. You get some of the benefit of \-cuts by using concatenation operators between literals, which are joined into a single string literal at compile time (so long as the length of the result does not exceed 255).

Mac handles are not particularly fast, so there are some library functions for working with short strings implemented as P-strings within integer arrays. Because there is no memory allocation involved until the string result is extracted, this goes much faster. The framework distinguishes handle-based Strings from P-strings because the length byte forces the address of a P-string to be odd, whereas handles are always even. The array-based P-strings can thus be used without conversion to handles, if you are careful not to expect them to outlast the integer array. The compiler offers no protection from this error, but if you are using them, you are expected to know what you are doing. The defined String type is safe.
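
The P-string layout and its 255-character limit can be sketched in a few lines of Python. This is an illustration only, not the framework's code: the function names are hypothetical, and the Mac Roman encoding is my assumption, since the text above does not name a character set.

```python
MAX_PSTRING = 255  # a single length byte cannot count higher than this


def to_pstring(text):
    """Encode text as a length byte followed by the characters."""
    data = text.encode("mac_roman")  # assumed encoding, see lead-in
    if len(data) > MAX_PSTRING:
        raise ValueError("P-string limited to 255 characters")
    return bytes([len(data)]) + data


def from_pstring(raw):
    """Decode a P-string, reading only as many characters as the length byte says."""
    length = raw[0]
    return raw[1:1 + length].decode("mac_roman")


def join_literals(*parts):
    """Model compile-time joining of concatenated literals into one P-string."""
    return to_pstring("".join(parts))


def looks_like_pstring(address):
    """Odd address: P-string inside an integer array. Even address: handle."""
    return address % 2 == 1
```

For example, to_pstring("Turk") yields the five bytes 04 'T' 'u' 'r' 'k', and join_literals raises an error as soon as the joined result would pass 255 characters, which mirrors the compile-time limit described above.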

The "private" attribute can be used on global variables and functions declared in a package, to make them invisible to other packages that import that package. This works essentially as it does within classes. The "protected" attribute is not yet implemented; it is coming Real Soon Now (meaning: don't hold your breath waiting).

You can write code that takes the address of a variable (or function) and does arithmetic on it as an integer, but this is patently unsafe, so it is forbidden by the compiler unless you "import Dangerous;". The built-in Dangerous package also contains special functions for generating native code and doing other (ahem) dangerous things that system code needs to do. It also includes an opaque data type "X1NT" and functions for casting it to and from integers; I use this type to make functions visible to other packages that import Dangerous, but not usable by the general public. Because you must import Dangerous to use the prefix "&" operator, there are several language extensions using the same operator in strange places to inform the compiler of system-programming intentions. These should not be a concern for normal code.
 

Optimizer

The current Turk68 compiler precalculates constant integer and boolean expressions, and within limits also joins concatenated string literals into a single literal, as noted above. Blocks of code controlled by a conditional based on a constant false boolean expression are deleted from the compiled code. This is not quite as convenient as C's "#if" conditional compilation, because the deleted code must still be valid T2 code. Except within packages and classes (where a subsequent compilation might need them), unused functions are similarly deleted from the object file.
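
The two optimizations described here, precalculating constant expressions and deleting blocks guarded by a constant false condition, can be sketched as follows. This is a minimal Python model under my own assumptions, not the Turk68 implementation: the tuple-based expression format and the names fold and prune are invented for the example.

```python
def fold(expr):
    """Return expr with constant subexpressions precalculated.

    Expressions are tuples: ("const", value), ("var", name),
    or (op, left, right) with op one of "+", "*", "<", "and".
    """
    kind = expr[0]
    if kind in ("const", "var"):
        return expr
    op, left, right = kind, fold(expr[1]), fold(expr[2])
    if left[0] == "const" and right[0] == "const":
        a, b = left[1], right[1]
        if op == "+":
            return ("const", a + b)
        if op == "*":
            return ("const", a * b)
        if op == "<":
            return ("const", a < b)
        if op == "and":
            return ("const", a and b)
    return (op, left, right)  # not fully constant: keep the operation


def prune(statements):
    """Drop 'if' blocks whose folded condition is the constant False."""
    kept = []
    for stmt in statements:
        if stmt[0] == "if":
            cond = fold(stmt[1])
            if cond[0] == "const" and cond[1] is False:
                continue  # dead code: emit nothing for this block
            kept.append(("if", cond, prune(stmt[2])))
        else:
            kept.append(stmt)
    return kept
```

Note that prune works on statements that have already been parsed, which matches the remark above that the deleted code must still be valid.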

All T2 array accesses range-check the index, and all dynamic array and object accesses also check the pointer for null. This is normally done with a system call, which is not overly fast, but the compiler is able to recognize in many cases when the programmer has already done that test, and it eliminates the system call in those cases. This analysis happens in the single source-code pass, so it cannot know about global variables or anything that might happen later in a loop. I always insert code to re-check pointers and subscripts after every loop, and also inside each loop, preferably just before they are used. The compiler knows about the LENGTH() of arrays in such tests, but there's a bug that prevents it from doing that in the for-loop initialization. The compiler is not able to propagate variable range checks across variable assignments -- if you test variable x, then copy it to y, the compiler will not know about y. It also does not do much with expressions involving the index variable. With a lot more effort, I could make it smarter, but this is sufficient to greatly speed up array access.
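
The limits of this single-pass analysis can be modeled with a small Python sketch. It is a toy model of the behavior described above, not the compiler's algorithm: the statement tuples and the function name count_runtime_checks are hypothetical, and the model assumes that all knowledge is dropped at the start of a loop.

```python
def count_runtime_checks(statements):
    """Count the array accesses that still need a runtime check.

    Statements are tuples:
      ("test", var, array)   programmer tested var against the array bounds
      ("assign", var)        var receives a new value
      ("loop",)              a loop begins
      ("index", array, var)  array[var] is accessed
    """
    known = set()  # (var, array) pairs already tested by the programmer
    checks = 0
    for stmt in statements:
        kind = stmt[0]
        if kind == "test":
            known.add((stmt[1], stmt[2]))
        elif kind == "assign":
            # Any fact mentioning the assigned name is no longer trusted.
            name = stmt[1]
            known = {f for f in known if name not in f}
        elif kind == "loop":
            known.clear()  # one pass cannot see what the loop body changes
        elif kind == "index":
            if (stmt[2], stmt[1]) not in known:
                checks += 1  # no earlier test seen: emit the system call
    return checks
```

In this model a test of x followed by an access through x costs nothing, but copying x to y and indexing with y still costs one check, and so does an access inside a loop unless the test is repeated inside that loop.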

There is a system function "BoundsNullError()". If you call it within a function, the compiler reports a compile-time error at any point later in that function where it has to generate a system call for array-bounds or null-pointer checking. These system calls are not really errors, but they do slow down execution. Including this function call tells you at compile time whether the code has that problem.
 

Icon

It's really rather a hack, but you can define icons for your application (and on the Mac, for its text and binary files) as a sequence of comment lines, using letters and other symbols to draw the pixels in a monospace font, as is common in program editors. The first line of a 32x32 icon starts with the keyword "/// MagicApIcon." The next 32 lines must each be at least 80 characters long and carry the pixel information, one pixel in each pair of characters, beginning after the first 16 characters. I use dots along the four sides to visually align the pixels to the icon cell. In the T2C version of the compiler, you must also supply a 16x16 icon with the keyword "/// Magic16Icon." followed by 16 lines of pixels (at least 48 characters in length), but it is optional on the Mac. The TAG compiler will look for similar lines (comment lines starting with "{// " instead of "/// ") and copy them over to the generated T2 files.

The Mac has 34 defined icon colors (which I reverse-engineered from a color table in ResEdit); the PC ".ico" file usually includes a table of 16 colors, to which I assigned the following symbols:

 
    Symbol  Color                   Symbol  Color
    sp      white                   #       black
    R       red                     K       dark red
    G       green                   O       dark green
    B       blue                    I       indigo/dark blue
    C       cyan/light blue         S       darker cyan or light green
    M       magenta/pink            P       purple
    Y       yellow                  N       brown
    *       gray                    .       light gray

Here, for example, are the first few lines of the TAG Compiler 16x16 icon (see the source file for the rest). I hope it looks better in your browser than in mine. It has a dark blue border, shading to a light blue interior, with black text:

/// Magic16Icon.+. . . . . . . !+. . . . . . . .+.
///            .              I                  .+
///            .            I B I                .
///            .          I B C B I              .
///            .        I B C C C B I            .
///            .      I B C C C C C B I          .
///            .    I B C C C C C C C B I        .
///            .  I B # # # C # C C # # B I      .
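
To make the pixel layout concrete, here is a minimal Python sketch of how such comment lines could be decoded. It is not the compiler's code. The function names are hypothetical, and two details are my assumptions: that "sp" in the color table means the space character, and that the symbol for each pixel is the first character of its pair, with the first pair starting right after the 16-character prefix.

```python
# Symbol-to-color table copied from the 16-color list in this section.
# Assumption: "sp" in that list stands for the space character.
SYMBOLS = {
    " ": "white", "#": "black",
    "R": "red", "K": "dark red",
    "G": "green", "O": "dark green",
    "B": "blue", "I": "indigo",
    "C": "cyan", "S": "darker cyan",
    "M": "magenta", "P": "purple",
    "Y": "yellow", "N": "brown",
    "*": "gray", ".": "light gray",
}

PREFIX = 16  # characters before the first pixel pair


def decode_row(line, width):
    """Return the color names for one comment line of pixel data."""
    if len(line) < PREFIX + 2 * width:
        raise ValueError("pixel line too short")
    # Assumption: the first character of each pair carries the symbol.
    return [SYMBOLS[line[PREFIX + 2 * i]] for i in range(width)]


def decode_icon(lines, keyword="/// Magic16Icon.", size=16):
    """Find the keyword line and decode the `size` pixel rows that follow it."""
    for start, line in enumerate(lines):
        if line.startswith(keyword):
            rows = lines[start + 1:start + 1 + size]
            if len(rows) < size:
                raise ValueError("icon is missing pixel rows")
            return [decode_row(row, size) for row in rows]
    raise ValueError("keyword line not found")
```

For a 32x32 icon the same sketch would be called with the keyword "/// MagicApIcon." and a size of 32, which is where the 80-character minimum comes from (16 prefix characters plus two per pixel). A symbol outside the 16-color table raises a KeyError in this model.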


T2C makes a ".ico" file, which you must manually include in your build. The Mac version builds code directly, with the icon embedded.
 

Files

All the files for compiling TAGs to Turk/2 and Turk/2 to native 68K Mac applications are in this download:

    Turk68.hqx

This includes two programs ready to run (the version suffix may be different in the current distribution):

    TAG2T2m7 -- The TAG compiler, to Turk/2, and
    Turk68m3 -- The T2 compiler, to MacOS 68K

and these source files:

    TAG2Turk.tag -- source TAG for TAG2T2a30
    Turk68.tag -- source TAG for Turk68m3
    SysLibs.t2 -- source code (in T2) for MOS API library, requires also sKernel68.t2 to run
    sKernel68.t2 -- source code (in T2) to interface to the MacOS, must be compiled after SysLibs.t2
    TagLibs.t2 -- source code (in T2) for TAG compiler library code, to be compiled after SysLibs.t2
    Libry.BTL -- an (almost) empty library file to get started

I use this on a PPC Mac running MacOS/9. It needs 160MB to compile itself, so it won't run on anything with less memory than that.

First you need to build the libraries. I always begin by inserting "Nothing" in the quotes at the beginning of SysLibs.t2, which builds a file "Libry.BTL" containing a null library. It appears to work without this step, but fails later. The first time you run it, it asks where the library file is, and fails if you can't give it a file (an empty file works); it saves the file location in the Preferences/Registry for subsequent runs. Then remove the "Nothing" (leaving empty quotes, as distributed) to compile the same SysLibs.t2 file again. If you use the supplied Libry.BTL file, you can omit this step and just compile SysLibs.t2 as is.

Then compile the sKernel68.t2 and the TagLibs.t2 files (in that order). Save a copy of the updated Libry.BTL file for subsequent compiles. You can save a copy before compiling the TagLibs.t2 file for faster compiles of things other than compilers.

Give the two ".tag" grammars to TAG2T2 to produce ".t2" files, which can then be compiled in Turk68 to build the respective applications.

Most of these same source files are also part of the Turk2C build (not yet running at this time), which makes C++ files that can be compiled in Microsoft's Visual Studio to run on a PC.

You are encouraged to experiment with writing your own grammars, and/or make modifications to these two. See "How To Write a Transformational Attribute Grammar" for help getting started doing that.
 

Bugs

The current 2013 compiler has some bugs that we need to program around until I get them fixed.

The compiler does not catch and enforce violations of the letter-case rule for identifiers. This is apparently due to a bug in the TAG compiler that I have not yet identified and fixed. If you write correct T2 code, you won't notice this bug. If you try to make identifiers differing only in the case of some or all letters, your program will break when the compiler is fixed.

In packages other than the main, final arrays of string literals do not always compile properly. The workaround is to put all such tables in the main segment, then copy them to other tables as needed.

Most framework function calls invoke functions defined in other packages; these work fine. Some defined framework calls generate inline code that directly calls on the kernel; these also mostly work properly, but if they return a String value, and this value is used to initialize a variable declaration inside a function, the variable is doubly reference-counted, so that the string is not properly disposed when it goes out of scope. The workaround is to not initialize variables from those kinds of system calls, or if necessary, to do so through a library function "Stx(String theStr)", which merely returns its argument.

Cross-package static method calls crash. The workaround is to use ordinary functions.

A sub-launched program must clean up its own dynamic variables (objects, arrays, and strings) when quitting; otherwise they remain in memory after the program is gone and cannot be deleted. My workaround is to surround each Quit() call (and the end of main()) with some code to assign null or the empty string "" to all such variables as appropriate.

2014 March 8