Code Generation

Beginning with Parse Tree
Abstract Interpretation
Interpreter (take a parse tree and "run" it)
Machine Abstraction
Machine Instructions (what kind of machine to "run" the abstract program)
Code Generation

Parse Tree

data struction: list consists of operator and operand*

Operator
binary, unary, n-ary, function call

Operand
constant, local variable, global variable (include vector)

Illustrative example

from the source language
a = b + c
an output parse tree looks like:

=
/   \
a    +
    / \
   b    c

Assume we use a list to represent a tree, the first element is an operator, the rest is the operands.

(= a (+ b c ))

This data structure is built from a number of cells. A cell contains two fields, head and tail. There are two types of cell: atom and dot-pair. An atom is the end of a branch (a kind of leaf node). A dot-pair is a pointer to another list. The above example can be draw like this:

representation of a
list

The cell that pointed to = and a are atoms. The cell pointed to the list (+ b c) is a dot-pair. How to distinguish a cell and a dot-pair depends on the choice of implementation of the data structure.

To help you to understand this structure a bit more, we will write a program to copy a parse tree.

copy(a)
if( isatom(a) ) return newatom(copyof(a))
else return cons( copy(head(a)), copy(tail(a)) )

where isatom() checks whether a is an atom. newatom() creates a new atom. head() extracts the head field of a cell. tail() extracts the tail field of a cell. cons() is a constructor, it creates a new dot-pair with the head and tail instantiated by the first and second arguments.

Interpreter

virtual machine
traverse a parse tree and execute the operator. The function go() traverses the parse tree.

do_atom(a)
switch typeof(a)
    NUM:
    GLOBAL:
    LOCAL:
    ...

go(e)
if (e == nil) stop
if isatom(e) do_atom(e)
// then e is a list (operator operand*)
a = head(e)
b = tail(e)
switch typeof(a)
    PLUS: go(head(b)) + go(second(b))
    MINUS:
    ...

Machine

stack-based (First In First Out, FIFO)
the main runtime data structure to support function call is "activation record"
activation record stores the state of computation: Instruction pointer, Old activation record, Local variables (registers)

Machine Instructions

S-code is a stack-based instruction set. It is somewhat different from common processor instruction (typify by Intel instruction set). While a common instruction set today is a three-address instruction, S-code is zero-address instruction. The main working storage for S-code is a stack. It is called "evaluation stack" and is comparable to the registers in a processor. A stack has two operations on its data: push and pop. So, it is not necessary to "address" the stack (hence the name "zero-address").

For example, to perform an add operation, two arguments are pushed to the stack and the operator "add" is executed. The operator "add" takes two arguments from the stack, adding them and pushes the result back. To store local variables, S-code employs a data structure called "activation record". It is a structure that exists only when a function is called and it is deleted when a function is terminated (that is why it is a "local" data). An evaluation stack is "global", it exists from the beginning of the execution of a program until the program terminates.

Read detailed description of S-code here:

S-code

(based on Som v2.0) stack-based, fix 32-bit width

instruction format: argument:24-bit opcode:8-bit

arith: add, sub, mul, div, mod

logic: band, bor, bxor, shl, shr, not,

eq, ne, lt, le, gt, ge

data: ld, st, ldx, stx, lit, get, put

control: call, ret, case, fun, jt, jf, jmp

other: inc, dec, callt, sys

36 instructions

Example

local var a,b,c

RZ : a = b + c
S-code :

get.2
get.3
add
put.1

global var a,b,c

RZ: a = b + c
S-code:

ld.b
ld.c
add
st.a

control: local var a,b,c

RZ: if( a == b ) c = 1 else c = 2

S-code:

get.1
get.2
eq
jf xx
lit.1
put.3
jmp yy
:xx
lit.2
put.3
:yy

function call

RZ:

mysum(a,b){ return a + b;
          }

S-code:

fun.x
get.1
get.2
add
ret.y

RZ: main(){ mysum(3,4); }
S-code:

fun.x
lit.3
lit.4
call.mysum
ret.y

Example of code generation

Source
sum(n, m){
if( n == m ) return m;
else return n + sum( n+1, m);
}

main(){
print(sum(1,10));
}

Parse Tree
(fun main
   (print (call sum 1 10 )))
(fun sum
   (if-else
       (== #1 #2 )
       (return #2 )
       (return (+ #1 (call sum (+ #1 1 )#2 )))))

Machine code
:main
    fun.1
    lit.1
    lit.10
    call.sum
    sys.1
    ret.1
:sum
    fun.1
    get.2
    get.1
    eq
    jf.L18
    get.1
    ret.3
    jmp.L26
:L18
    get.2
    get.2
    lit.1
    add
    get.1
    call.sum
    add
    ret.3
:L26
    ret.3

last update 16 August 2014