Code Generation (s2)


Beginning with Parse Tree
Abstract Interpretation
   Interpreter  (take a parse tree and "run" it)
Machine Abstraction
  Machine Instructions  (what kind of machine to "run" the abstract program)
Code Generation

Parse Tree

Data structure:  list    consists of    operator and operand*

Operator
   binary, unary, n-ary, function call

Operand
  constant, local variable, global variable (include vector)

Illustrative example

from the source language
    a = b + c
an output parse tree looks like:
  =
/   \
a    +
    /  \
   b    c

Assume we use a list to represent a tree, the first element is an operator, the rest is the operands.

    (= a (+ b c ))

This data structure is built from a number of cells. A cell contains two fields, head and tail.  There are two types of cell: atom and dot-pair.  An atom is the end of a branch (a kind of leaf node).  A dot-pair is a pointer to another list.  The above example can be draw like this:

representation of a
      list

The cell that pointed to = and a  are atoms.  The cell pointed to the list (+ b c) is a dot-pair.  How to distinguish a cell and a dot-pair depends on the choice of implementation of the data structure.

To help you to understand this structure a bit more, we will write a program to copy a parse tree.

copy(a)
  if( isatom(a) ) return newatom(a)
  else return cons( copy(head(a)), copy(tail(a)) )

where isatom() checks whether a is an atom.  newatom() creates a new atom.  head() extracts the head field of a cell.  tail() extracts the tail field of a cell. cons() is a constructor, it creates a new dot-pair with the head and tail instantiated by the first and second arguments.

Interpreter

  virtual machine
  traverse a parse tree and execute the operator.  The function eval() traverses the parse tree (evaluate).

do_atom(a)
  switch typeof(a)
    NUM:
    GLOBAL:
    LOCAL:
    ...

eval(e)
  if (e == nil) stop
  if isatom(e) do_atom(e)
  //  then e is a list  (operator operand*)
  a = head(e)
  b = tail(e)
  switch typeof(a)
    PLUS: eval(head(b)) + eval(second(b)) 
    MINUS:
    ...
 

Machine Code

 The main runtime data structure to support function call is "activation record". Activation record stores the state of computation:  Instruction pointer, Old activation record, Local variables (registers).  Frame pointer (FP) stores a pointer to current activation record. Stack pointer (SP) stores the pointer to the current stack.

Activation Record

Machine Instructions

S2 is a typical instruction set in modern processors. It has three-address instruction format.


Example of compiling a source to S2 instructions

source (Rz 3.6)

// factorial

fac(n)
  if( n == 0 ) return 1
  else return n * fac(n-1)

main()
  print(fac(7))


Run the compiler:  (output  parse tree and assembly code)

C:\rz36-3\test>rz36 fac.txt

Parse tree

(fun fac (else (== #1 0 )(return 1 )(return (* #1 (call fac (- #1 1 ))))))
(fun main (print (call fac 7 )))


Assembly of S2
.symbol
...
.code 0
 mov fp #3500
 mov sp #3000
 jal rads main
 trap r0 #0

:fac
st r1 @1 fp
...
pop sp r1
eq r2 r1 #0
jf r2 L102
mov retval #1
jmp L101
:L102
sub r3 r1 #1
push sp r3
jal rads fac
mul r2 r1 r28
mov retval r2
jmp L101
:L101
...
ld r1 @1 fp
ret rads

:main
st r1 @1 fp
...
mov r1 #7
push sp r1
jal rads fac
trap r28 #1
...
ld r1 @1 fp
ret rads

.data 200
.end


last update 23 Sept 2017