n-code to s-code generator
The s-code virtual
machine
S-code is a stack-based instruction set. It is based on
the intermediate language for Som language. The virtual machine
came from Som system version 2.0. We will use only its
VM. We will compile a Nut program to n-code and generate s-code
from that n-code. To run it Som VM is used to read s-code and
evaluate it.
Explain s-code (from
som/report/s-code.htm)
Mapping from n-code to
s-code
xADD
e1 e2 e1 e2 icAdd
xLIT.a
icLit.a
xGET.a
icGet.a
xPUT.a
icPut.a
xLD.a
icLd.a
xST.a e
e icSt.a
xLDX.a e
e icGet.a icLdx
xSTX.a e v
e v icGet.a icStx
xLDY.a e
e icLd.a icLdx
xSTY.a e v
e v icLd.a icStx
xFUN.a.v e
icFun.d e icRet.g d =
v-a+1, g = v+1
xCALL.a
icCall.a
xIF e1 e2 e3
e1
icJf <F>
e2
icJmp <E>
F: e3
E:
xWHILE e1 e2
L: e1
icJf <E>
e2
icJmp <L>
E:
or better
icJmp <I>
L: e2
I: e1
icJt <L>
xDO e1 e2 ...
e1
e2
format of s-code
(som-v2) object file
magic
start end (end
inclusive)
code*
(code segment)
start end
data*
(data segment)
for som-v2 som-in-som magic = 5678920
The nut-som code
generator: n-code to s-code generator (gen.txt)
Input of the code generator is an n-code object. Output is the
s-code object.
Let give one example
the source
(def
add1 x () (+ x 1))
(def main () ()
(sys 1 (add1 22)))
is compiled into n-code object
add1
(fun.1.1 (+ get.1 lit.1 ))
main
(fun.0.0 (sys.1 (call.80 lit.22
)))
22 22
2 1 16 1 0
4 1 14 1 2
6 1 6 0 4
8 0 0 6 0
10 1 19 257 8
12 1 16 22 0
14 1 13 10 12
16 0 0 14 0
18 1 20 1 16
20 0 0 18 0
22 1 19 0 20
0
The s-code generator takes this n-code object and outputs s-code object.
5678920
1 12
2080 23 294 280 287 1 532 294
5663 800 292 276
1000 999
Which means the following:
1
Call 8
2
End
3
Fun 1
4
Get 1
5
Lit 1
6
Add
7
Ret 2
8
Fun 1
9
Lit 22
10 Call 3
11 Sys 1
12 Ret 1
Example session
The above example comes from the following sequence of commands:
compile the example source:
c:>nut31
< t3.txt
rename the object code "a.obj" to "t3.obj" to be used
later. Then, compile the code-generator "gen.txt":
c:>nut31
< gen.txt
and use it ("a.obj") to generate the final s-code object:
c:>nsim31
< t3.obj > t3s.obj
We can use the s-code virtual machine, somv2.exe to run it and to
generate a "readable" s-code.
to generate a readable s-code from a s-code object:
c:>somv2
-l t3s.obj
to run it:
c:>somv2
-x t3s.obj
It is the "pure" s-code virtual machine. This VM is exactly as
appear on the Som website. I modified the source so
that somv2 can output the s-code listing. The source of modified
Som vm is here som-v2x.zip
(the edited
version)
How the code
generator work?
It uses "eval". In other words, the generator reads the input
n-code object and traverses the n-code. Instead of "executing"
it, the generator outputs the corresponding s-code. The mapping
between n-code and s-code is simple. Most of the code is
one-to-one mapping. However, the "address" is different.
This is handled using the "associative" list of n-code address to
s-code address.
main
readinfile
loadobj
init
genall
outobj
genall
while not end
get op
if op ==
FUN eval
next word
out CALL start
out END
outobj
print magic
print 1 end
print all code
print DP DP+Dend
print data segment
eval
It is similar to virtual machine evaluator
but instead of executing the code, the generator outputs corresponding
s-code.
eval
; s-code generator
...
DO
while e1
eval
head e1
e1 =
tail e1
ADD
eval head e1
eval arg2 e1
out icAdd
LIT
out icLit arg
GET
out icGet arg
FUN
insertLable ads
XP ; ads is n-code address
; XP is s-code address
lv = arg & 255
arity = arg >>
8 ; decode a.v
out icFun
(lv-arity+1)
eval head e1
out icRet (lv+1)
CALL
while
e1
; generate all arguments
eval head e1
e1
= tail e1
out icCall (assoc
arg) ; map address to s-code
The associative list has two operations: (more explanation later)
insertLable
n s
insert (n-ads, s-ads)
assoc
n
output s-ads associated
with n-ads, 0 if no match
For the control flow, n-code has the next link but s-code is a linear
code so it requires the "jump" instruction and explicit addresses.
IF e = (cond true false)
eval head
e ; gen cond
out icJf
0 ;
<1>
ads = XP -
1 ; mark s-code
ads
eval arg2
e ; gen true
if (arg3 e) =
NIL
patch ads (XP-ads) ; patch jf at <1>
else
out icJmp 0 ; <2>
patch ads (XP-ads)
ads = XP - 1 ; mark s-code ads
eval arg3 e ; gen false
patch ads (XP-ads) ; path jmp at <2>
As all jumps in s-code are relative addressing, the only instruction
that need to relocate its argument is "call". The "xSTR" is also
relative to data segment hence needs not to be relocated.
Associative table has two fields: n-ads s-ads. searching
sequentially. insert by appending at the end
;
search for n1 (n-ads) return associated s-ads
assoc n1
i = 2
flag = 1
end = numLab * 2 + 2
while i < end & flag
if atab[i] ==
n1
flag = 0
else
i = i + 2
if flag
return 0
else
return
atab[i+1]
; insert table n-ads s-ads at the
end
insertLab n1 n2
i = numLab * 2 + 2
atab[i] = n1
atab[i+1] = n2
numLab++
if numLab > MAXLAB
error "lable
table full"
2 July 2007