Som4 : a compiler for SM chip
A standard Som compiler is modified to include some extra instructions,
mostly declaration, to facilitate conversion of S-code to SM
code. It also ouputs an object file . The executable file
runs in batch mode. To change the interactive mode in Som to an
executable file, all the immediate expressions; create global variable,
initialise variables etc. are collected into a "main" function that
will be called at the end of loading all files. The first line of
executable code is simply "call main end".
Extra S-code
The extra S-code instructions are:
Ads x
42 similar to lit x, ( -- x)
Lda
43 load absolute, TOS = M[ads], (ads -- M[ads])
Sta
44 store absolute, M[ads] = x, (ads x -- )
These instructions are used to hint to the converter program how to
generate code for SM. The following example illustrates why the
extra S-code is necessary.
Assume a, b are global
a
= b
is compiled to S-code:
ld
b, st a
but the code for SM looks like:
xlit
a, xlit b, xld, xst
The store instruction causes difficulty as the converter must know
where to put the address before the store instruction. To
facilitate that, som4 uses extra instructions and generates:
Ads
a, Ads b, Lda, Sta
Translation of these S-codes to SM code is straight forward.
Array
Another matter of concern is "array". SM does not have dynamic
array (no run-time system). Programming for SM chip must assume
static array where array declaration is immutable and is global.
There is a need for compile time allocation for array in DS (static) as
opposed to the current runtime allocation (from heap). For a
static allocation, the declaration of array is an immediate expression:
N
= array ex . . .
will be compiled to S-code
ex
. . ., array, st N
for example
N
= array 10 ==> lit 10, array, st N
To create a static allocation, the compiler compile this immediate
expression into CS, and scan for "array", if it exists, the compiler
will evaluate this sequence. The restriction is that this
immediate expression must be completely known at compile time.
The result value, which is the size of the array, then is used to
allocate DS appropriately. This initialise sequence after being
evaluated will not be included for run-time execution in the "main"
function in order not to re-evaluate it.
Static array
allocation
Because the current implementation of SM chip does not have run-time
support system, smc does not implement dynamic memory allocation.
the "array" instruction can not be executed.
Som4 solves this problem by executing all immediate expression
including the array allocation and finds out the actual address of the
array, then assigns the constant (the address of the array) to the
array variable. See the example:
data
= array maxdata (in bubble.txt)
normally, this is compiled to
ld
maxdata, array, st data
if this sequence is executed, the value of address will be stored in
data. The following sequence should be generated for SM:
lit
address, st data
finally, as "data" is a constant, its value never change. Every
"ld data" can be replaced with "lit address". For example,
y
= data[i]
normally it is compiled to:
ld
data, get i, ldx, st y
this sequence can be changed to:
lit
"address", get i, ldx, st y
this saves one memory access.
Because Som4 is targeted to generate a code is that easily converted to
SM code, it does not use ldx, stx. It generates a sequence to
calculate ads + index explicitly.
a
= b[i]
normally S-code is:
ld
b, get i, ldx, st a
Som4 generates S-code:
Ads
a, Ads b, get i, add, lda, sta
which is easily converted into SM code:
xlit
a, xlit b, xget i, xadd, xld, xst
Why don't do it at conversion time? It is difficult to get "stx"
to know where to calculate the index properly because the address
occurs much earlier in the sequence. It is easier to generate the
correct sequence by the compiler. See the following example:
a[k]
= b
S-code:
ld
a, get k, get b, stx
Extra s-code:
Ads
a, get k, add, get b, sta
SM-code:
xlit
a, xget k, xadd, xget b, xst
Without the compiler generated "add" at the right place, it is
difficult for a converter to figure it out.
Example
from quick.txt (quick sort)
These are the immediate expressions:
<at the beginning>
N = 20
a = array N
<at the end>
inita
quicksort 0 (N - 1)
When the program is compile, the "main" is created. The following
is S-code for "main":
139 Fun _main
140 Ads N
141 Lit
20
142
sta // N = 20
143 Ads a
144 Ads 1002
145
sta // a = array N
146 Call inita
147 Lit 0
148 Ads N
149 Lda
150 Lit 1
151 Sub
152 Call
quicksort
153 Ret 1
Notice at 144 Ads 1002,
it is the real address of array N. The compiler knows where the
array is allocated. The array access is shown:
to
swap i j | t =
t = a[i]
a[i] = a[j]
a[j] = t
The line "t = a[i]" is
compiled into S-code:
43 Fun
swap
44 Ads a
45
Lda // &a
46 Get
3 // i
47
Add
48
Lda // a[i]
49 Put 1
50 Ads a
51 Lda
52 Get 3
53
Add // &a[i]
54 Ads a
55 Lda
56 Get 2
57
Add
58
Lda // a[j]
59 sta
Which is then converted to SM code with static array:
84 xFun swap
86 xLit a[]
// &a
89 xGet
3 // i
91 xAdd
92 xLd
// a[i]
93 xPut 1
95 xLit a[]
98 xGet 3
100 xAdd
// &a[i]
101 xLit a[]
104 xGet 2
106 xAdd
107 xLd
// a[j]
108 xSt
Notice the sequence
44
Ads a, Lda => 86 xLit a[]
shows the reduction from getting the static base address to a constant
value. Instead of using "stx" for "a[i] = a[j]", Som4 generates
"53 add" and "59 sta" to store to a[i]. This is easily converted
to SM-code "100 xadd" "108 xst".
8 October 2004
P. Chongstitvatana