Code Generation


Beginning with Parse Tree
Abstract Interpretation
   Interpreter  (take a parse tree and "run" it)
Machine Abstraction
  Machine Instructions  (what kind of machine to "run" the abstract program)
Code Generation

Parse Tree

  data structure:  list,    consists of  operator and operand*

Operator
   binary, unary, n-ary, function call

Operand
  constant, local variable, global variable (include vector)

Illustrative example

from the source language
    a = b + c
an output parse tree looks like:
  =
/   \
a    +
    /  \
   b    c

Assume we use a list to represent a tree, the first element is an operator, the rest is the operands.

    (= a (+ b c ))

This data structure is built from a number of cells. A cell contains two fields, head and tail.  There are two types of cell: atom and dot-pair.  An atom is the end of a branch (a kind of leaf node).  A dot-pair is a pointer to another list.  The above example can be draw like this:

representation of a
      list

The cell that pointed to = and a  are atoms.  The cell pointed to the list (+ b c) is a dot-pair.  How to distinguish a cell and a dot-pair depends on the choice of implementation of the data structure.

To help you to understand this structure a bit more, we will write a program to copy a parse tree.

copy(a)
  if( isatom(a) ) return newatom(copyof(a))
  else return cons( copy(head(a)), copy(tail(a)) )

where isatom() checks whether a is an atom.  newatom() creates a new atom.  head() extracts the head field of a cell.  tail() extracts the tail field of a cell. cons() is a constructor, it creates a new dot-pair with the head and tail instantiated by the first and second arguments.

Interpreter

  virtual machine
  traverse a parse tree and execute the operator.  The function eval() traverses the parse tree.

do_atom(a)
  switch typeof(a)
    NUM:
    GLOBAL:
    LOCAL:
    ...

eval(e)
  if (e == nil) stop
  if isatom(e) do_atom(e)
  //  then e is a list  (operator operand*)
  a = head(e)
  b = tail(e)
  switch typeof(a)
    PLUS: ret eval(arg1(b)) + eval(arg2(b)) 
    MINUS:
    ...
    IF:   if( eval(arg1(b)) )
		ret eval(arg2(b))
	  ret nil  
...
 

Machine

  stack-based  (First In First Out, FIFO)
  the main runtime data structure to support function call is "activation record"
  activation record stores the state of computation:  Instruction pointer, Old activation record, Local variables (registers)

Activation Record

Machine Instructions

S-code is a stack-based instruction set.  It is somewhat different from common processor instruction (typify by Intel instruction set).  While a common instruction set today is a three-address instruction, S-code is zero-address instruction.  The main working storage for S-code is a stack.  It is called "evaluation stack" and is comparable to the registers in a processor. A stack has two operations on its data: push and pop.  So, it is not necessary to "address" the stack (hence the name "zero-address").  

For example, to perform an add operation, two arguments are pushed to the stack and the operator "add" is executed.  The operator "add" takes two arguments from the stack, adding them and pushes the result back. To store local variables, S-code employs a data structure called "activation record". It is a structure that exists only when a function is called and it is deleted when a function is terminated (that is why it is a "local" data).  An evaluation stack is "global", it exists from the beginning of the execution of a program until the program terminates.

Read detailed description of S-code here:

S-code

(based on Som v2.0) stack-based, fix 32-bit width
instruction format:    argument:24-bit  opcode:8-bit

arith:     add, sub, mul, div, mod
logic:     band, bor, bxor, shl, shr, not,
           eq, ne, lt, le, gt, ge
data:      ld, st, ldx, stx, lit, get, put
control:   call, ret, case, fun, jt, jf, jmp
other:     inc, dec, callt, sys


36 instructions

Example

local var a,b,c

RZ :   a = b + c
S-code :

get.2
get.3
add
put.1

global var a,b,c

RZ:   a = b + c
S-code:

ld.b
ld.c
add
st.a

control: local var a,b,c

RZ:  if( a == b ) c = 1 else c = 2

S-code:
get.1
get.2
eq
jf xx
lit.1
put.3
jmp yy
:xx
lit.2
put.3
:yy

function call

RZ:    mysum(a,b){ return a + b; }
S-code:
fun.x
get.1
get.2
add
ret.y

RZ:    main(){ mysum(3,4); }
S-code:
fun.x
lit.3
lit.4
call.mysum
ret.y

Code Generator

The code generator works very much like an interpreter.  However, instead of "run" the parse tree, it "generates" the machine code that will give the same result as "run".  Here is the pseudo code of the code generator.  Please notice that it is almost the same as the interpreter.

eval(e)
  if (e == nil) stop
  if isatom(e) do_atom(e)
  //  then e is a list  (operator operand*)
  a = head(e)
  b = tail(e)
  switch(typeof(a))
    PLUS:
      eval(arg1(b))
      eval(arg2(b))
      out(ADD)
        ...
    IF:
      eval(arg1(b))
      out(JF,0)
      ads = CP-1
      eval(arg2(b))
      patch(ads,CP-ads)


Where out will output the machine code.  The if..then statement will also be transformed into a linear code (with proper jump).  The first out(JF,0) will spare a place holder for the location of jump and in the end patch() will fill it with appropriate address. (CP is the current place in the machine code).  See the following example:

source
if (a == b) return b;

parse tree
   (if
       (== #1 #2 )
       (return #2 ))
machine code   
    get.1
    get.2
    eq
    jf.L18
    get.2
    ret.3
:L18

Example of code generation

Source

sum(n, m){
  if( n == m ) return m;
  else return n + sum( n+1, m);
}


main(){
  print(sum(1,10));
}


Parse Tree

(fun main
   (print (call sum 1 10 )))
(fun sum
   (if-else
       (== #1 #2 )
       (return #2 )
       (return (+ #1 (call sum (+ #1 1 )#2 )))))


Machine code

:main
    fun.1
    lit.1
    lit.10
    call.sum
    sys.1
    ret.1
:sum
    fun.1
    get.2
    get.1
    eq
    jf.L18
    get.1
    ret.3
    jmp.L26
:L18
    get.2
    get.2
    lit.1
    add
    get.1
    call.sum
    add
    ret.3
:L26
    ret.3



last update  20 Oct 2016