Beginning
with Parse Tree
Abstract Interpretation
Interpreter (take a parse tree and "run" it)
Machine Abstraction
Machine Instructions (what kind of machine to "run" the
abstract program)
Code Generation
Parse Tree
data structure: list, consists of operator
and operand*
Operator
binary, unary, n-ary, function call
Operand
constant, local variable, global variable (include vector)
Illustrative example
from the source language a = b + c
an output parse tree looks like:
= / \ a + / \ b c
Assume we use a list to represent a tree, the first element is an operator,
the rest is the operands.
(= a (+ b c ))
This data structure is built from a number of cells. A cell contains two
fields, head and tail. There are two types of cell: atom and
dot-pair. An atom is the end of a branch (a kind of leaf node).
A dot-pair is a pointer to another list. The above example can be draw
like this:
The cell that pointed to = and a
are atoms. The cell pointed to the list (+
b c) is a dot-pair. How to distinguish a cell and a dot-pair
depends on the choice of implementation of the data structure.
To help you to understand this structure a bit more, we will write a program
to copy a parse tree.
where isatom() checks whether
a is an atom. newatom()
creates a new atom. head()
extracts the head field of a cell. tail()
extracts the tail field of a cell. cons()
is a constructor, it creates a new dot-pair with the head and tail
instantiated by the first and second arguments.
Interpreter
virtual machine
traverse a parse tree and execute the operator. The function eval() traverses the parse tree.
eval(e) if (e == nil) stop if isatom(e) do_atom(e) // then e is a
list (operator operand*) a = head(e) b = tail(e) switch typeof(a) PLUS: ret
eval(arg1(b)) + eval(arg2(b)) MINUS: ...
IF: if( eval(arg1(b)) )
ret eval(arg2(b))
ret nil ...
Machine
stack-based (First In First Out, FIFO)
the main runtime data structure to support function call is
"activation record"
activation record stores the state of computation: Instruction
pointer, Old activation record, Local variables (registers)
Machine Instructions
S-code
is a stack-based instruction set. It is somewhat different from
common processor instruction (typify by Intel instruction set).
While a common instruction set today is a three-address instruction,
S-code is zero-address instruction. The main working storage for
S-code is a stack. It is called "evaluation stack" and is
comparable to the registers in a processor. A stack has two operations
on its data: push and pop. So, it is not necessary to "address"
the stack (hence the name "zero-address").
For
example, to perform an add operation, two arguments are pushed to the
stack and the operator "add" is executed. The operator "add" takes
two arguments from the stack, adding them and pushes the result back. To
store local variables, S-code employs a data structure called
"activation record". It is a structure that exists only when a function
is called and it is deleted when a function is terminated (that is why
it is a "local" data). An evaluation stack is "global", it exists
from the beginning of the execution of a program until the program
terminates.
The code generator works very much like an interpreter. However,
instead of "run" the parse tree, it "generates" the machine code that will
give the same result as "run". Here is the pseudo code of the code
generator. Please notice that it is almost the same as the
interpreter.
eval(e) if (e == nil) stop if isatom(e) do_atom(e) // then e is a list
(operator operand*) a = head(e) b = tail(e) switch(typeof(a))
PLUS:
eval(arg1(b))
eval(arg2(b))
out(ADD)
...
IF:
eval(arg1(b))
out(JF,0)
ads = CP-1
eval(arg2(b))
patch(ads,CP-ads)
Where out will output the machine code. The if..then
statement will also be transformed into a linear code (with proper
jump). The first out(JF,0) will spare a place holder for
the location of jump and in the end patch() will fill it with
appropriate address. (CP is the current place in the machine
code). See the following example: