Code Generation


Beginning with Parse Tree
Abstract Interpretation
   Interpreter  (take a parse tree and "run" it)
Machine Abstraction
  Machine Instructions  (what kind of machine to "run" the abstract program)
Code Generation

Parse Tree

  data struction:  list    consists of    operator and operand*

Operator
   binary, unary, n-ary, function call

Operand
  constant, local variable, global variable (include vector)

Illustrative example

from the source language
    a = b + c
an output parse tree looks like:
  =
/   \
a    +
    /  \
   b    c

Assume we use a list to represent a tree, the first element is an operator, the rest is the operands.

    (= a (+ b c ))

This data structure is built from a number of cells. A cell contains two fields, head and tail.  There are two types of cell: atom and dot-pair.  An atom is the end of a branch (a kind of leaf node).  A dot-pair is a pointer to another list.  The above example can be draw like this:

representation of a
      list

The cell that pointed to = and a  are atoms.  The cell pointed to the list (+ b c) is a dot-pair.  How to distinguish a cell and a dot-pair depends on the choice of implementation of the data structure.

To help you to understand this structure a bit more, we will write a program to copy a parse tree.

copy(a)
  if( isatom(a) ) return newatom(copyof(a))
  else return cons( copy(head(a)), copy(tail(a)) )

where isatom() checks whether a is an atom.  newatom() creates a new atom.  head() extracts the head field of a cell.  tail() extracts the tail field of a cell. cons() is a constructor, it creates a new dot-pair with the head and tail instantiated by the first and second arguments.

Interpreter

  virtual machine
  traverse a parse tree and execute the operator.  The function go() traverses the parse tree.

do_atom(a)
  switch typeof(a)
    NUM:
    GLOBAL:
    LOCAL:
    ...

go(e)
  if (e == nil) stop
  if isatom(e) do_atom(e)
  //  then e is a list  (operator operand*)
  a = head(e)
  b = tail(e)
  switch typeof(a)
    PLUS: go(head(b)) + go(second(b)) 
    MINUS:
    ...
 

Machine

  stack-based  (First In First Out, FIFO)
  the main runtime data structure to support function call is "activation record"
  activation record stores the state of computation:  Instruction pointer, Old activation record, Local variables (registers)

Activation Record

Machine Instructions

S-code is a stack-based instruction set.  It is somewhat different from common processor instruction (typify by Intel instruction set).  While a common instruction set today is a three-address instruction, S-code is zero-address instruction.  The main working storage for S-code is a stack.  It is called "evaluation stack" and is comparable to the registers in a processor. A stack has two operations on its data: push and pop.  So, it is not necessary to "address" the stack (hence the name "zero-address").  

For example, to perform an add operation, two arguments are pushed to the stack and the operator "add" is executed.  The operator "add" takes two arguments from the stack, adding them and pushes the result back. To store local variables, S-code employs a data structure called "activation record". It is a structure that exists only when a function is called and it is deleted when a function is terminated (that is why it is a "local" data).  An evaluation stack is "global", it exists from the beginning of the execution of a program until the program terminates.

Read detailed description of S-code here:

S-code

(based on Som v2.0) stack-based, fix 32-bit width

 instruction format:    argument:24-bit  opcode:8-bit

 

arith:      add, sub, mul, div, mod

logic:       band, bor, bxor, shl, shr, not,

                eq, ne, lt, le, gt, ge

data:       ld, st, ldx, stx, lit, get, put

control:  call, ret, case, fun, jt, jf, jmp

other:     inc, dec, callt, sys

 

36 instructions

 

Example

local var a,b,c

RZ :   a = b + c
S-code :

get.2
get.3
add
put.1



global var a,b,c


RZ:   a = b + c
S-code:

ld.b
ld.c
add
st.a


control: local var a,b,c

RZ:  if( a == b ) c = 1 else c = 2

S-code:
get.1
get.2
eq
jf xx
lit.1
put.3
jmp yy
:xx
lit.2
put.3
:yy

function call

RZ:    mysum(a,b){ return a + b; }
S-code:
fun.x
get.1
get.2
add
ret.y

RZ:    main(){ mysum(3,4); }
S-code:
fun.x
lit.3
lit.4
call.mysum
ret.y


Example of code generation

Source
sum(n, m){
  if( n == m ) return m;
  else return n + sum( n+1, m);
}

main(){
  print(sum(1,10));
}

Parse Tree
(fun main
   (print (call sum 1 10 )))
(fun sum
   (if-else
       (== #1 #2 )
       (return #2 )
       (return (+ #1 (call sum (+ #1 1 )#2 )))))


Machine code
:main
    fun.1
    lit.1
    lit.10
    call.sum
    sys.1
    ret.1
:sum
    fun.1
    get.2
    get.1
    eq
    jf.L18
    get.1
    ret.3
    jmp.L26
:L18
    get.2
    get.2
    lit.1
    add
    get.1
    call.sum
    add
    ret.3
:L26
    ret.3


last update  16 August 2014