Code Generation


Beginning with Parse Tree
Abstract Interpretation
   Interpreter  (take a parse tree and "run" it)
Machine Abstraction
  Machine Instructions  (what kind of machine to "run" the abstract program)
Code Generation

Parse Tree

  data struction:  list    consists of    operator and operand*

Operator
   binary, unary, n-ary, function call

Operand
  constant, local variable, global variable (include vector)

Illustrative example

from the source language
    a = b + c
an output parse tree looks like:
  =
/   \
a    +
    /  \
   b    c

Assume we use a list to represent a tree, the first element is an operator, the rest is the operands.

    (= a (+ b c ))

This data structure is built from a number of cells. A cell contains two fields, head and tail.  There are two types of cell: atom and dot-pair.  An atom is the end of a branch (a kind of leaf node).  A dot-pair is a pointer to another list.  The above example can be draw like this:

representation of a
      list

The cell that pointed to = and a  are atoms.  The cell pointed to the list (+ b c) is a dot-pair.  How to distinguish a cell and a dot-pair depends on the choice of implementation of the data structure.

To help you to understand this structure a bit more, we will write a program to copy a parse tree.

copy(a)
  if( isatom(a) ) return newatom(copyof(a))
  else return cons( copy(head(a)), copy(tail(a)) )

where isatom() checks whether a is an atom.  newatom() creates a new atom.  head() extracts the head field of a cell.  tail() extracts the tail field of a cell. cons() is a constructor, it creates a new dot-pair with the head and tail instantiated by the first and second arguments.

Interpreter

  virtual machine
  traverse a parse tree and execute the operator.  The function go() traverses the parse tree.

do_atom(a)
  switch typeof(a)
    NUM:
    GLOBAL:
    LOCAL:
    ...

go(e)
  if (e == nil) stop
  if isatom(e) do_atom(e)
  //  then e is a list  (operator operand*)
  a = head(e)
  b = tail(e)
  switch typeof(a)
    PLUS: go(head(b)) + go(second(b)) 
    MINUS:
    ...
 

Machine

  stack-based  (First In First Out, FIFO)
  the main runtime data structure to support function call is "activation record"
  activation record stores the state of computation:  Instruction pointer, Old activation record, Local variables (registers)

Activation Record

Machine Instructions

s-code

(based on Som v2.0) stack-based, fix 32-bit width

 

  arg:24   op:8

 

arith:   add, sub, mul, div, mod

logic:   band, bor, bxor, shl, shr, not,

         eq, ne, lt, le, gt, ge

data:    ld, st, ldx, stx, lit, get, put

control: call, ret, case, fun, jt, jf, jmp

other:   inc, dec, callt, sys

 

36 instructions

 

Example

local var a,b,c

  a = b + c

get.2
get.3
add
put.1


global var a,b,c

  a = b + c
 
ld.b
ld.c
add
st.a

local var a,b,c

if( a == b ) c = 1 else c = 2

get.1
get.2
eq
jf xx
lit.1
put.3
jmp yy
:xx
lit.2
put.3
:yy

mysum(a,b){ return a + b; }

fun.x
get.1
get.2
add
ret.y

main(){ mysum(3,4); }

fun.x
lit.3
lit.4
call.mysum
ret.y


Example of code generation

Source
sum(n, m){
  if( n == m ) return m;
  else return n + sum( n+1, m);
}

main(){
  print(sum(1,10));
}

Parse Tree
(fun main
   (print (call sum 1 10 )))
(fun sum
   (if-else
       (== #1 #2 )
       (return #2 )
       (return (+ #1 (call sum (+ #1 1 )#2 )))))


Machine code
:main
    fun.1
    lit.1
    lit.10
    call.sum
    sys.1
    ret.1
:sum
    fun.1
    get.2
    get.1
    eq
    jf.L18
    get.1
    ret.3
    jmp.L26
:L18
    get.2
    get.2
    lit.1
    add
    get.1
    call.sum
    add
    ret.3
:L26
    ret.3


last update  16 August 2012