Som4 : a compiler for SM chip



A standard Som compiler is modified to include some extra instructions, mostly declaration, to facilitate conversion of S-code to SM code.  It also ouputs an object file .  The executable file runs in batch mode. To change the interactive mode in Som to an executable file, all the immediate expressions; create global variable, initialise variables etc. are collected into a "main" function that will be called at the end of loading all files.  The first line of executable code is simply "call main end".

Extra S-code

The extra S-code instructions are:

Ads x  42   similar to lit x, ( -- x)
Lda    43   load absolute, TOS = M[ads], (ads -- M[ads])
Sta    44   store absolute, M[ads] = x, (ads x -- )

These instructions are used to hint to the converter program how to generate code for SM.  The following example illustrates why the extra S-code is necessary.

Assume a, b are global

a = b

is compiled to S-code:

ld b, st a

but the code for SM looks like:

xlit a, xlit b, xld, xst

The store instruction causes difficulty as the converter must know where to put the address before the store instruction.  To facilitate that, som4 uses extra instructions and generates:

Ads a, Ads b, Lda, Sta

Translation of these S-codes to SM code is straight forward.

Array

Another matter of concern is "array".  SM does not have dynamic array (no run-time system).  Programming for SM chip must assume static array where array declaration is immutable and is global.  

There is a need for compile time allocation for array in DS (static) as opposed to the current runtime allocation (from heap).  For a static allocation, the declaration of array is an immediate expression:

N = array ex . . .

will be compiled to S-code

ex . . ., array, st N

for example

N = array 10  ==>  lit 10, array, st N

To create a static allocation, the compiler compile this immediate expression into CS, and scan for "array", if it exists, the compiler will evaluate this sequence.  The restriction is that this immediate expression must be completely known at compile time.  The result value, which is the size of the array, then is used to allocate DS appropriately.  This initialise sequence after being evaluated will not be included for run-time execution in the "main" function in order not to re-evaluate it.

Static array allocation

Because the current implementation of SM chip does not have run-time support system, smc does not implement dynamic memory allocation.  the "array" instruction can not be executed.  

Som4 solves this problem by executing all immediate expression including the array allocation and finds out the actual address of the array, then assigns the constant (the address of the array) to the array variable.  See the example:

data = array maxdata (in bubble.txt)

normally, this is compiled to

ld maxdata, array, st data

if this sequence is executed, the value of address will be stored in data.  The following sequence should be generated for SM:

lit address, st data

finally, as "data" is a constant, its value never change.  Every "ld data" can be replaced with "lit address".  For example,

y = data[i]

normally it is compiled to:

ld data, get i, ldx, st y

this sequence can be changed to:

lit "address", get i, ldx, st y

this saves one memory access.

Because Som4 is targeted to generate a code is that easily converted to SM code, it does not use ldx, stx.  It generates a sequence to calculate ads + index explicitly.

a = b[i]

normally S-code is:

ld b, get i, ldx, st a

Som4 generates S-code:

Ads a, Ads b, get i, add, lda, sta

which is easily converted into SM code:

xlit a, xlit b, xget i, xadd, xld, xst

Why don't do it at conversion time?  It is difficult to get "stx" to know where to calculate the index properly because the address occurs much earlier in the sequence.  It is easier to generate the correct sequence by the compiler.  See the following example:

a[k] = b

S-code:

ld a, get k, get b, stx

Extra s-code:

Ads a, get k, add, get b, sta

SM-code:

xlit a, xget k, xadd, xget b, xst

Without the compiler generated "add" at the right place, it is difficult for a converter to figure it out.

Example

from quick.txt (quick sort)

These are the immediate expressions:

<at the beginning>

    N = 20
    a = array N

<at the end>

    inita quicksort 0 (N - 1)

When the program is compile, the "main" is created.  The following is S-code for "main":

    139 Fun _main
    140 Ads N
    141 Lit 20        
    142 sta        // N = 20
    143 Ads a
    144 Ads 1002
    145 sta        // a = array N
    146 Call inita
    147 Lit 0
    148 Ads N
    149 Lda
    150 Lit 1
    151 Sub
    152 Call quicksort
    153 Ret 1

Notice at 144 Ads 1002, it is the real address of array N.  The compiler knows where the array is allocated.  The array access is shown:

to swap i j | t =
  t = a[i]
  a[i] = a[j]
  a[j] = t

The line "t = a[i]" is compiled into S-code:

     43 Fun swap
     44 Ads a
     45 Lda       // &a
     46 Get 3     // i
     47 Add       
     48 Lda       // a[i]
     49 Put 1

     50 Ads a
     51 Lda
     52 Get 3
     53 Add      // &a[i]
     54 Ads a
     55 Lda
     56 Get 2
     57 Add      
     58 Lda      // a[j]
     59 sta

Which is then converted to SM code with static array:

   84 xFun swap
   86 xLit a[]       // &a
   89 xGet 3      // i
   91 xAdd
   92 xLd      // a[i]
   93 xPut 1

   95 xLit a[]
   98 xGet 3
  100 xAdd      // &a[i]
  101 xLit a[]
  104 xGet 2
  106 xAdd
  107 xLd      // a[j]
  108 xSt

Notice the sequence

44 Ads a, Lda =>  86 xLit a[]

shows the reduction from getting the static base address to a constant value.  Instead of using "stx" for "a[i] = a[j]", Som4 generates "53 add" and "59 sta" to store to a[i].  This is easily converted to SM-code "100 xadd" "108 xst".

8 October 2004
P. Chongstitvatana