Very early C compilers and language

FileSizeDate
last1120cdir 
prestructcdir 

Much of this comes from Dennis Ritchie's web page about these early C compilers, and is quoted below with Dennis' permission.

As described in the C History paper, 1972-73 were the truly formative years in the development of the C language: this is when the transition from typeless B to weakly typed C took place, mediated by the (Neanderthal?) NB language, of which no source seems to survive. It was also the period in which Unix was rewritten in C.

Two tapes are present here; the first is labeled "last1120c", the second "prestruct-c". I know from distant memory what these names mean: the first is a saved copy of the compiler preserved just as we were abandoning the PDP-11/20, which did not have multiply or divide instructions, but instead a separate, optional unit that did these operations (and also shifts) by storing the operands into memory locations.

The earlier compiler does not know about structures at all: the string "struct" does not appear anywhere. The second tape has a compiler that does implement structures in a way that begins to approach their current meaning. Their declaration syntax seems to use () instead of {}, but . and -> for specifying members of a structure itself and members of a pointed-to structure are both there.

Neither compiler yet handled the general declaration syntax of today or even K&R I, with its compound declarators like the one in int **ipp; . The compilers have not yet evolved the notion of compounding of type constructors ("array of pointers to functions", for example). These would appear, though, by 5th or 6th edition Unix (say 1975), as described (in Postscript) in the C manual a couple of years after these versions.

Instead, pointer declarations were written in the style int ip[];. A fossil from this era survives even in modern C, where the notation can be used in declarations of arguments. On the other hand, the later of the two does accept the * notation, even though it doesn't use it. (Evolving compilers written in their own language are careful not take advantage of their own latest features.)

It's interesting to note that the earlier compiler has a commented-out preparation for a "long" keyword; the later one takes over its slot for "struct." Implementation of long was a few years away.

Aside from their small size, perhaps the most striking thing about these programs is their primitive construction, particularly the many constants strewn throughout; they are used for names of tokens, for example. This is because the preprocessor didn't exist at the time.

A second, less noticeable, but astonishing peculiarity is the space allocation: temporary storage is allocated that deliberately overwrites the beginning of the program, smashing its initialization code to save space. The two compilers differ in the details in how they cope with this. In the earlier one, the start is found by naming a function; in the later, the start is simply taken to be 0. This indicates that the first compiler was written before we had a machine with memory mapping, so the origin of the program was not at location 0, whereas by the time of the second, we had a PDP-11 that did provide mapping. (See the Unix History paper). In one of the files (prestruct-c/c10.c) the kludgery is especially evident.

It is possible to get these compilers to recompile themselves, via the use of a user-mode PDP-11 simulator. Such a simulator called Apout is available. The packaged source code for these compilers and a README can be found nearby.