n-code to s-code generator


The s-code virtual machine

  S-code is a stack-based instruction set.  It is based on the intermediate language for Som language.  The virtual machine used in Som v 2.0.  This version contains only the s-code VM.  The Som compiler itself is in the form of s-code (som.obj) which we do not use.

Explain s-code (from som/report/s-code.htm)

Mapping from n-code to s-code


xADD e1 e2          e1 e2 icAdd
xLIT.a              icLit.a
xGET.a              icGet.a
xPUT.a              icPut.a
xLD.a               icLd.a
xST.a e             e icSt.a
xLDX.a e            e icGet.a icLdx
xSTX.a e v          e v icGet.a icStx
xLDY.a e            e icLd.a icLdx
xSTY.a e v          e v icLd.a icStx
xFUN.a.v e          icFun.d e icRet.g  d = v-a+1, g = v+1
xCALL.a             icCall.a

xIF e1 e2 e3
 
  e1
  icJf <F>
  e2
  icJmp <E>
F: e3
E:

xWHILE e1 e2

L: e1
   icJf <E>
   e2
   icJmp <L>
E:

or better

   icJmp <I>
L: e2
I: e1
   icJt <L>

xDO e1 e2 ...

   e1
   e2

format of s-code (som-v2) object file

magic
start end   (end inclusive)
code*       (code segment)
start end
data*        (data segment)

for som-v2 som-in-som magic = 5678920

The nut-som code generator: n-code to s-code generator (gen.txt)


Input of the code generator is an n-code object.  Output is the s-code object.

Let give one example

the source

(def add1 x () (+ x 1))

(def main () ()
  (sys 1 (add1 22)))

is compiled into n-code object

add1
(fun.1.1 (+ get.1 lit.1 ))
main
(fun.0.0 (sys.1 (call.80 lit.22 )))

22 22
2 1 16 1 0
4 1 14 1 2
6 1 6 0 4
8 0 0 6 0
10 1 19 257 8
12 1 16 22 0
14 1 13 10 12
16 0 0 14 0
18 1 20 1 16
20 0 0 18 0
22 1 19 0 20
0

The s-code generator takes this n-code object and outputs s-code object.

5678920
1 12
2080 23 294 280 287 1 532 294
5663 800 292 276
1000 999

Which means the following:

      1 Call 8
      2 End
      3 Fun 1
      4 Get 1
      5 Lit 1
      6 Add
      7 Ret 2
      8 Fun 1
      9 Lit 22
     10 Call 3
     11 Sys 1
     12 Ret 1

Example session

The above example comes from the following sequence of commands:

compile the nut-compiler:

c:>nut31 < nut.txt

and use nut-compiler to compile the example, "t3.txt", the output goes to "t3.obj":

c:>nsim31 < t3.txt > t3.obj

edit t3.obj to get rid of the listing at the begining.  Now compile the s-code generator, "gen.txt":

c:>nut31 < gen.txt

and use it to generate the final s-code object:

c:>nsim31 < t3.obj > t3s.obj

We can use the s-code virtual machine, somv2.exe to run it and to generate a "readable" s-code. 

to generate a readable s-code from a s-code object:

c:>somv2 -l < t3s.obj

to run it:

c:>somv2 -x < t3s.obj

About som v 2.0  (see som/som-v2.htm)

It is the "pure" s-code virtual machine.  This VM is exactly as appear on the Som website.  I did some editing of the C source so that somv2 can output the s-code listing.  If you want to see the s-code virtual machine, you can find it <here>.
Because of the text editing, this version is 100 lines shorter than the original version posted, however the meaning is exactly the same.

som-v2.zip   (the edited version)

How the code generator work?

It uses "eval".  In other words, the generator reads the input n-code object and traverses the n-code.  Instead of "executing" it, the generator outputs the corresponding s-code.  The mapping between n-code and s-code is simple.  Most of the code is one-to-one mapping.  However, the "address" is different.  This is handled using the "associative" list of n-code address to s-code address.

eval  ; s-code generator

 ...
 DO
   while e1
     eval head e1
     e1 = tail e1
 ADD
   eval head e1
   eval arg2 e1
   out icAdd
 LIT
   out icLit arg
 GET
   out icGet arg
 FUN
   insertLable ads XP      ; ads is n-code address
                           ; XP is s-code address
   lv = arg & 255
   arity = arg >> 8        ; decode a.v
   out icFun (lv-arity+1)
   eval head e1
   out icRet (lv+1)
 CALL
   while e1                ; generate all arguments
      eval head e1
      e1 = tail e1
   out icCall (assoc arg)  ; map address to s-code

The associative list has two operations:

insertLable n s    
  insert (n-ads, s-ads)
assoc n            
  output s-ads associated with n-ads, 0 if no match

For the control flow, n-code has the next link but s-code is a linear code so it requires the "jump" instruction and explicit addresses.

  IF  e = (cond true false)
    eval head e           ; gen cond
    out icJf 0            ; <1>
    ads = XP - 1          ; mark s-code ads
    eval arg2 e           ; gen true
    if (arg3 e) = NIL
      patch ads (XP-ads)  ; patch jf at <1>
    else
      out icJmp 0         ; <2>
      patch ads (XP-ads)
      ads = XP - 1        ; mark s-code ads
      eval arg3 e         ; gen false
      patch ads (XP-ads)  ; path jmp at <2>

As all jumps in s-code are relative addressing, the only instruction that need to relocate its argument is "call".  The "xSTR" is also relative to data segment hence needs not to be relocated.

29 June 2006