Compilation scheme
A source program is compiled into postfix codes. The control flow
becomes jumps.
notation: op:arg /label
if
e1 e2 e3 => e1 jf:a e2 jmp:b /a e3 /b
while e1 e2
=> jmp:a /b e2 /a e1 jt:b
for e1 e2 e3
=> .... [*** for loop]
case e0 ecase
=> case lit:low lit:hi jmp:else jump-table ecase
v = e
=> e st:v (global var)
=> e put:v (local var)
v
=>
ld:v (global var)
=> get:v (local var)
f arg*
=> arg* call:f
n
=> lit:n
e1 binop e2
=> e1 e2 binop
unaryop e
=> e unaryop
array index
v[i]
= e => ld/get:v i e
stx
... =
v[i] => ld/get:v i ldx ...
How to compile for
"for i start end body" means
i
= start
while i <= end
body
i = i + 1
A new local variable is created to be used to store "end".
In terms of speed, it is faster because the "end" is evaluated and
stored in a local variable just once in the initialization, so testing
"if i <= end" is faster than evaluate the expression every time
around the loop.
For the parser, most work is easy (writing a grammar for a parser
generator). The code generation is simple too. A new local
is created to store "stop" value, let's call it "end". The simple form
is: (assuming i is local)
lit start
put i
lit stop
put end
/loop
get i
get end
le
jf exit
body
inc i
jmp loop
/exit
This is inefficient as it executes two jumps (jf, jmp) per loop. The
following form requires only one jump in the loop.
lit start
put i
lit stop
put end
jmp test
/loop
body
inc i
/test
get i
get end
le
jt loop
This is also applicable to while loop.
How to compile case
grammar
'enum
'{ [num] label1 ... labeln '}
'case ex0 '{ caselist '}
caselist ->
label ': ex caselist |
'else ': ex
|
nil
'else case must be the last case (if present)
Example
enum
{ 10 add sub mul div }
case op
add : doadd
sub : dosub
...
else : error
To use symbolic constant, lex() grabs var and looks it up, if its type
is "constant" then lex returns NUM.
Compiling 'case generates table-lookup instruction of the form:
<case,
else, n, value1, goto1, ... valuen, goton>
follows by the code of each caseitem. The code for each caseitem
is ended with jump to exit. (perhaps using "break" which is transformed
into jump)
where n is the number of caselist, else is the default goto, value is
the case label. The jump list is sorted according to value to
enable the matching (searching) in log n time using for example binary
search. This is very similar to lookup instruction in JVM.
The above compare-and-jump table is the most general form. It can
handle all cases of "case". However it is not the most efficient
method. A jump table can be constructed that take constant time
in searching. When the case label is densed, which is usually the
case, the index can be used to access the jump table directly without
search.
<case2,
low, hi, else, goto1...goton>
where low, hi is the range of label. If the index is out of range
then use "else". hi-low+1 is the size of jumptable.
index-lo is used to access the gotos. This is the form we used in
Som compiler. Each case action is compiled into a normal sequence of
code ended with jmp:end. The entry into the jumptable is the code
"jmp" to the corresponding ex.
case
e0 ecase =>
e0
case
lit:low
lit:hi
jmp:else
jmp:1
...
jmp:n // jump table
/1
e1
jmp:end // each case action
/2
...
/n
en
jmp:end
/else
e-else // default action
/end
Code generation for
case
Using 'case required 'enum to generate label. However, enum is
processed by the parser and code generator does not have to know about
it. Generating code for case is a bit complicate but executing it by an
interpreter is a simple O(1) (indexed table look up) operation.
Assume the whole case statement has already been parsed and the number
of caselist is known, low and high labels are known. The jump table is
filled with all entries between low and hi.
algorithm gencase
1 genex(ex0)
2 outs(icCase)
3 look into the block and get low, hi labels.
4 outa(icLit,low)
5 outa(icLit,hi)
6 mark address here a1 (jumbtable starts at a1+1)
7 outa(icJmp,0) this is the jump to else case
now generate an empty jumptable from low to hi
8 for i=low i<=hi i++
outa(icJmp,0)
generate each ex in caselist except exelse
each ex
8.1 get num, patch jumptable at num-low to
jump to this address
8.2 genex(ex)
8.3 outa(icJmp,backchain)
backchain is used to patch its location to eloc in the end.
9 genex(exelse)
10 patch jump to exelse at a1
11 patch any entry in the jumbtable that has not been update (ads = 0)
to jump to exelse.
12 here is eloc, backpatch the backchain to jump to eloc.
End of gencase
Labels are generated from enum. enum is processed entirely in the
parser using symbol table to hold the enum labels as global symbols
with enumerated values (unique integers). Jump table is filled with the
entries from smallest label to largest label so to make it a direct
jump with a key without searching. If label is not densed the
size of jump table will be large. Another alternative which might
make code generation easier is to use "label-jump" pair in the jump
table and use search (linear or binary) to find a matched label.
the code sequence is:
ex0
case2
<jump table>
label1, jmp ex1
...
labeln, jmp exn
-1, jmp exelse
<end jump table>
ex1, jmp exit
...
exn, jmp exit
exelse
<exit>
algorithm gencase2
1 genex(ex0)
2 outs(icCase2)
3 mark this a1
4 count no. of case
5 make room for jump table
6 for each ex
7 get label, update jump table
8 putcode(a1,icLit,label)
9 putcode(a1+1,icJmp,here)
10 a1 = a1 + 2
11 genex(ex)
12 outa(icJmp,exit)
13 exit is not known yet
14 if last ex is not else
15 update jump table with -1, jmp exit
16 here is exit
17 backpatch jmp exit in the code
To eliminate backpatch loop, exit can be fixed in the beginning:
ex0
case2
<exit>
jmp exit
<jump table>
...
All jmp exit in the ex body will jump to this location. After
code improvement "improv2", the code will be shortcut.
interactive mode
To make the programming system interactive, that is, the input source
program can be typed in and the compilation translates the source into
S-codes which then are executed immediately, all expressions must be
stored in the code segment whether they are defined functions or
not. We need some markers in the code segment to mark out the
defined function so that when a S-code file is loaded it is possible to
know which section is to be executed immediately and which section is
the function definition. The symbol table is not required to
execute the S-code, however the presence of a symbol table facilitates
the debugging.
A defined function is marked in CS by "fun m" and ended with "ret n".
The layout in CS is like this:
to
name arg* e => fun m, ... body ..., ret n.
Toplevel
At the toplevel, evaluating an expression returns a value which may or
may not be used. The unused values accumulate, they are purged when
"ret" is performed.
e
=> e end
The action at the toplevel is as follows. Once S-code is loaded
(or having been translated into CS), the execution begins. The
state of computation is created causing changes in DS and SS. The
"end" will return the control back to the console.
[to write more on the compiler itself]
27 Sept 2004