n-code to s-code generator

The s-code virtual machine
- Mapping from n-code to s-code
format of s-code (som-v2) object file
The nut-som code generator: n-code to s-code generator (gen.txt)
Example session
About som v 2.0 (see som/som-v2.htm)
How the code generator work?

The s-code virtual machine

S-code is a stack-based instruction set. It is based on the intermediate language for Som language. The virtual machine came from Som system version 2.0. We will use only its VM. We will compile a Nut program to n-code and generate s-code from that n-code. To run it Som VM is used to read s-code and evaluate it.

Explain s-code (from som/report/s-code.htm)

Mapping from n-code to s-code

xADD e1 e2         e1 e2 icAdd
xLIT.a              icLit.a
xGET.a              icGet.a
xPUT.a              icPut.a
xLD.a               icLd.a
xST.a e             e icSt.a
xLDX.a e            e icGet.a icLdx
xSTX.a e v         e v icGet.a icStx
xLDY.a e            e icLd.a icLdx
xSTY.a e v         e v icLd.a icStx
xFUN.a.v e         icFun.d e icRet.g d = v-a+1, g = v+1
xCALL.a             icCall.a

xIF e1 e2 e3

e1
icJf <F>
e2
icJmp <E>
F: e3
E:

xWHILE e1 e2

L: e1
   icJf <E>
   e2
   icJmp <L>
E:

or better

   icJmp <I>
L: e2
I: e1
   icJt <L>

xDO e1 e2 ...

   e1
   e2

format of s-code (som-v2) object file

magic
start end   (end inclusive)
code*       (code segment)
start end
data*        (data segment)

for som-v2 som-in-som magic = 5678920

The nut-som code generator: n-code to s-code generator (gen.txt)

Input of the code generator is an n-code object. Output is the s-code object.

Let give one example

the source

(def add1 x () (+ x 1))

(def main () ()
(sys 1 (add1 22)))

is compiled into n-code object

add1
(fun.1.1 (+ get.1 lit.1 ))
main
(fun.0.0 (sys.1 (call.80 lit.22 )))

22 22
2 1 16 1 0
4 1 14 1 2
6 1 6 0 4
8 0 0 6 0
10 1 19 257 8
12 1 16 22 0
14 1 13 10 12
16 0 0 14 0
18 1 20 1 16
20 0 0 18 0
22 1 19 0 20
0

The s-code generator takes this n-code object and outputs s-code object.

5678920
1 12
2080 23 294 280 287 1 532 294
5663 800 292 276
1000 999

Which means the following:

      1 Call 8
      2 End
      3 Fun 1
      4 Get 1
      5 Lit 1
      6 Add
      7 Ret 2
      8 Fun 1
      9 Lit 22
     10 Call 3
     11 Sys 1
     12 Ret 1

Example session

The above example comes from the following sequence of commands:

compile the example source:

c:>nut31 < t3.txt

rename the object code "a.obj" to "t3.obj" to be used later. Then, compile the code-generator "gen.txt":

c:>nut31 < gen.txt

and use it ("a.obj") to generate the final s-code object:

c:>nsim31 < t3.obj > t3s.obj

We can use the s-code virtual machine, somv2.exe to run it and to generate a "readable" s-code.

to generate a readable s-code from a s-code object:

c:>somv2 -l t3s.obj

to run it:

c:>somv2 -x t3s.obj

About som v 2.0 (see som/som-v2.htm)

It is the "pure" s-code virtual machine. This VM is exactly as appear on the Som website. I modified the source so that somv2 can output the s-code listing. The source of modified Som vm is here som-v2x.zip (the edited version)

How the code generator work?

It uses "eval". In other words, the generator reads the input n-code object and traverses the n-code. Instead of "executing" it, the generator outputs the corresponding s-code. The mapping between n-code and s-code is simple. Most of the code is one-to-one mapping. However, the "address" is different. This is handled using the "associative" list of n-code address to s-code address.

main
readinfile
loadobj
init
genall
outobj

genall
while not end
    get op
    if op == FUN eval
    next word
out CALL start
out END

outobj
print magic
print 1 end
print all code
print DP DP+Dend
print data segment

eval

It is similar to virtual machine evaluator
but instead of executing the code, the generator outputs corresponding s-code.

eval ; s-code generator

...
DO
   while e1
     eval head e1
     e1 = tail e1
ADD
   eval head e1
   eval arg2 e1
   out icAdd
LIT
   out icLit arg
GET
   out icGet arg
FUN
   insertLable ads XP      ; ads is n-code address
                           ; XP is s-code address
   lv = arg & 255
   arity = arg >> 8        ; decode a.v
   out icFun (lv-arity+1)
   eval head e1
   out icRet (lv+1)
CALL
   while e1                ; generate all arguments
      eval head e1
      e1 = tail e1
   out icCall (assoc arg) ; map address to s-code

The associative list has two operations: (more explanation later)

insertLable n s
insert (n-ads, s-ads)
assoc n
output s-ads associated with n-ads, 0 if no match

For the control flow, n-code has the next link but s-code is a linear code so it requires the "jump" instruction and explicit addresses.

IF e = (cond true false)
    eval head e           ; gen cond
    out icJf 0            ; <1>
    ads = XP - 1          ; mark s-code ads
    eval arg2 e           ; gen true
    if (arg3 e) = NIL
      patch ads (XP-ads) ; patch jf at <1>
    else
      out icJmp 0         ; <2>
      patch ads (XP-ads)
      ads = XP - 1        ; mark s-code ads
      eval arg3 e         ; gen false
      patch ads (XP-ads) ; path jmp at <2>

As all jumps in s-code are relative addressing, the only instruction that need to relocate its argument is "call". The "xSTR" is also relative to data segment hence needs not to be relocated.

Associative table has two fields: n-ads s-ads. searching sequentially. insert by appending at the end

; search for n1 (n-ads) return associated s-ads
assoc n1
i = 2
flag = 1
end = numLab * 2 + 2
while i < end & flag
    if atab[i] == n1
       flag = 0
    else
       i = i + 2
if flag
    return 0
else
    return atab[i+1]

; insert table n-ads s-ads at the end
insertLab n1 n2
i = numLab * 2 + 2
atab[i] = n1
atab[i+1] = n2
numLab++
if numLab > MAXLAB
    error "lable table full"

2 July 2007