Make a compiler that supports a subset of the ANSI-C programming language
I’ve decided to go on a compiler writing journey. In the past I’ve written some assemblers, and I’ve written a simple compiler for a typeless language. But I’ve never written a compiler that can compile itself. So that’s where I’m headed on this journey.
As part of the process, I’m going to write up my work so that others can follow along. This will also help me to clarify my thoughts and ideas. Hopefully you, and I, will find this useful!
Here are my goals, and non-goals, for the journey:
The choice of a target language is difficult. If I choose a high-level language like Python, Go etc., then I’ll have to implement a whole pile of libraries and classes, because they are built into the language.
I could write a compiler for a language like Lisp, but compilers for such languages are comparatively easy to write.
Instead, I’ve fallen back on the old standby and I’m going to write a compiler for a subset of C, enough to allow the compiler to compile itself.
C is just a step up from assembly language (for some subset of C, not C18), and this will help make the task of compiling the C code down to assembly somewhat easier. Oh, and I also like C.
The job of a compiler is to translate input in one language (usually a high-level language) into a different output language (usually a lower-level language than the input). The main steps are:
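Recognise the lexical elements of the input. For example, = is different to ==, so you can’t just read a single =. We call these lexical elements tokens.
Recognise the syntax of the input. For example, in one language you might write:
if (x < 23) {
  print("x is smaller than 23\n");
}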
but in another language you might write:
if (x < 23):
  print("x is smaller than 23\n")
This is also the place where the compiler can detect syntax errors, such as a missing semicolon at the end of the first print statement.
Consider a sentence of the form <subject> <verb> <adjective> <object>. The following two sentences have the same structure, but completely different meaning:
David ate lovely bananas.
Jennifer hates green tomatoes.
There are a lot of compiler resources out on the Internet. Here are the ones I’ll be looking at.
If you want to start with some books, papers and tools on compilers, I’d highly recommend this list:
While I’m going to build my own compiler, I plan on looking at other compilers for ideas and probably also borrowing some of their code. Here are the ones I’m looking at:
In particular, I’ll be using a lot of the ideas, and some of the code, from the SubC compiler.
Assuming that you want to come along on this journey, here’s what you’ll need. I’m going to use a Linux development environment, so download and set up your favourite Linux system: I’m using Lubuntu 18.04.
I’m going to target two hardware platforms: Intel x86-64 and 32-bit ARM. I’ll use a PC running Lubuntu 18.04 as the Intel target, and a Raspberry Pi running Raspbian as the ARM target.
On the Intel platform, we are going to need an existing C compiler. So, install this package (I give the Ubuntu/Debian commands):
$ sudo apt-get install build-essential
If there are any more tools required for a vanilla Linux system, let me know.
Finally, clone a copy of this GitHub repository.
In the next part of our compiler writing journey, we will start with the code to scan our input file and find the tokens that are the lexical elements of our language.