Make a compiler that supports a subset of the ANSI-C programming language
In this part of our compiler writing journey, I’ve implemented the ternary operator. This is one of the really natty operators in the C language which can reduce lines of code in your source file. The basic syntax is:
ternary_expression:
logical_expression '?' true_expression ':' false_expression
;
We evaluate the logical expression. If this is true, we then evaluate only the true expression. Otherwise, we only evaluate the false expression. The result of either the true or the false expression becomes the result of the whole expression.
One subtlety here is that, for example, in:
x= y != 5 ? y++ : ++y;
If y != 5
then x= y++
, otherwise x= ++y
. Either way, y
is only
incremented once.
We can rewrite the above code as an IF statement:
if (y != 5)
x= y++;
else
x= ++y;
However, the ternary operator is an expression, so we can also do:
x= 23 * (y != 5 ? y++ : ++y) - 18;
This can’t be easily converted into an IF statement now. However, we can borrow some of the mechanics from the IF code generator to use for the ternary operator.
We already have ‘:’ in our grammar as a token; now we need to add the ‘?’ token. This is going to be treated as an operator, so we’d better set its precedence.
According to this list of C operators, the ‘?’ operator has precedence just above the assignment operators.
The way we have designed our precedence, our operator tokens must be in precedence order and the AST operators must correspond to the tokens.
Thus, in defs.h
, we now have:
// Token types
enum {
T_EOF,
// Binary operators
T_ASSIGN, T_ASPLUS, T_ASMINUS,
T_ASSTAR, T_ASSLASH,
T_QUESTION, // The '?' token
...
enum {
A_ASSIGN = 1, A_ASPLUS, A_ASMINUS, A_ASSTAR, A_ASSLASH,
A_TERNARY, // The ternary AST operator
...
And in expr.c
, we now have:
static int OpPrec[] = {
0, 10, 10, // T_EOF, T_ASSIGN, T_ASPLUS,
10, 10, 10, // T_ASMINUS, T_ASSTAR, T_ASSLASH,
15, // T_QUESTION
...
As always, I will leave you to browse the changes in scan.c
for the
new T_QUESTION token.
Even though the ternary operator isn’t a binary operator, because it
has precedence, we need to implement it in binexpr()
with the binary
operators. Here’s the code:
struct ASTnode *binexpr(int ptp) {
struct ASTnode *left, *right;
struct ASTnode *ltemp, ...
switch (ASTop) {
case A_TERNARY:
// Ensure we have a ':' token, scan in the expression after it
match(T_COLON, ":");
ltemp= binexpr(0);
// Build and return the AST for this statement. Use the middle
// expression's type as the return type. XXX We should also
// consider the third expression's type.
return (mkastnode(A_TERNARY, right->type, left, right, ltemp, NULL, 0));
...
}
...
}
When we hit the A_TERNARY case, we have the AST tree of the logical expression
stored in left
, the true expression in right
and we have parsed the ‘?’
token. Now we need to parse the ‘:’ token and the false expression.
With all three tokens parsed, we can now build an AST node to hold all three. One problem is how to determine the type of this node. As you can see, it’s easy to choose the type of the middle token. To do it properly, I should see, of the true and false expressions, which one is wider and choose that one. I’ll leave it for now and revisit it.
Generating the assembly code for the ternary operator is very similar to that of the IF statement: we evaluate a logical expression. If true, we evaluate one expression; if false, we evaluate the other. We are going to need some labels, and we are going to have to insert jumps to these labels as required.
I did actually try to modify the genIF()
code in gen.c
to do both IF
and the ternary operator, but it was easier just to write another function.
There is one wrinkle to the generation of the assembly code. Consider:
x= (y > 4) ? 2 * y - 18 : y * z - 3 * a;
We have three expressions, and we need to allocate registers to evaluate each one. After the logical expression is evaluated and we have jumped to the correct next section of code, we can free all the registers used in the evaluation. For the true and false expressions, we can free all the registers except one: the register holding the expression’s rvalue.
We also can’t predict which register this will be, because each expression has different operands and operators; thus, the number of registers used will differ, and the (last) register allocated to hold the result may be different.
But we need to know which register does hold the result of both the true and false expressions, so when we jump to the code that will use this result, it knows which register to access.
Thus, we need to do three things:
We already have a function to free all registers, freeall_registers()
,
which takes no arguments. Our registers are numbered zero upwards. I’ve
modified this function to take, as an argument, the register we want to
keep. And, in order to free all registers, we pass it NOREG which is defined to be the number -1
:
// Set all registers as available.
// But if reg is positive, don't free that one.
void freeall_registers(int keepreg) {
int i;
for (i = 0; i < NUMFREEREGS; i++)
if (i != keepreg)
freereg[i] = 1;
}
Throughout the compiler, you will now see freeall_registers(-1)
to
replace what used to be freeall_registers()
.
We now have a function in gen.c
to deal with the ternary operator.
It gets called from the top of genAST()
:
// We have some specific AST node handling at the top
// so that we don't evaluate the child sub-trees immediately
switch (n->op) {
...
case A_TERNARY:
return (gen_ternary(n));
Let’s have a look at the function in stages.
// Generate code for a ternary expression
static int gen_ternary(struct ASTnode *n) {
int Lfalse, Lend;
int reg, expreg;
// Generate two labels: one for the
// false expression, and one for the
// end of the overall expression
Lfalse = genlabel();
Lend = genlabel();
// Generate the condition code followed
// by a jump to the false label.
genAST(n->left, Lfalse, NOLABEL, NOLABEL, n->op);
genfreeregs(-1);
This is pretty much exactly the same as the IF generating code. We pass the logical expression sub-tree, the
false label and the A_TERNARY operator into genAST()
. When genAST()
sees
this, it knows to generate a jump if false to this label.
// Get a register to hold the result of the two expressions
reg = alloc_register();
// Generate the true expression and the false label.
// Move the expression result into the known register.
expreg = genAST(n->mid, NOLABEL, NOLABEL, NOLABEL, n->op);
cgmove(expreg, reg);
// Don't free the register holding the result, though!
genfreeregs(reg);
cgjump(Lend);
cglabel(Lfalse);
With the logical expression done, we can now allocate the register to
hold both the true and false expression results. We call genAST()
to
generate the true expression code, and we get back the register with the
result. We now have to move this register’s value into the known register.
With this done, we can free all registers except the known register. If
we did the true expression, we now jump to the end of the ternary
assembly code.
// Generate the false expression and the end label.
// Move the expression result into the known register.
expreg = genAST(n->right, NOLABEL, NOLABEL, NOLABEL, n->op);
cgmove(expreg, reg);
// Don't free the register holding the result, though!
genfreeregs(reg);
cglabel(Lend);
return (reg);
}
And the code to evaluate the false expression is very similar. Either way, execution will get to the end label and, once we get here, we know that the ternary result is in the known register.
I was worried about nested ternary operators, which I’ve used quite a bit in other code. The ternary operator is right associative, which means we bind the ‘?’ to the right more tightly than to the left.
Fortunately, as we greedily seek out the ‘:’ token and the false expression once we have parsed the ‘?’ token, our parser is already treating the ternary operator as right associative.
tests/input121.c
is an example of a nested ternary operator:
#include <stdio.h>
int x;
int y= 3;
int main() {
for (y= 0; y < 10; y++) {
x= (y < 4) ? y + 2 :
(y > 7) ? 1000 : y + 9;
printf("%d\n", x);
}
return(0);
}
If y<4
, then x
becomes y+2
. If not, we evaluate the second ternary
operator. If y>7
, x
becomes 1000, otherwise it becomes y+9
.
The effect is to do y+2
for y
values 0 to 3, y+9
for y
values
4 to 7, and 1000 for higher y values:
2
3
4
5
13
14
15
16
1000
1000
Like a few of the steps so far, I was apprehensive to tackle the ternary operator because I thought it would be very difficult. I did have problems putting it into the IF generating code, so I stepped back a bit. Actually, I went out to see a movie with my wife and this gave me a chance to mull things over. I realised that I had to free all but one registers, and I should write a separate function. After that, writing the code was straight forward. It’s always good to step away from the keyboard now and then.
In the next part of our compiler writing journey, I will feed the compiler to itself, look at the parse errors I get and choose one or more of them to fix.
P.S. We’ve reached 5,000 lines of code and 90,000 words in the Readme files. We must be nearly there!