U-code


This is for som 4.0 virtual machine. The aim
  1. compact instruction set
  2. fast
  3. clean semantic
Toward these goals, the instruction set is:
  1. less than 50 instructions
  2. one-addressing with a few two-addressing
  3. fully decoded format
  4. accumulator based

The instruction set

bop    :  add sub mul div band bor bxor mod
          eq ne lt le gt ge
bim    :  addi subi shli shri
data   :  ld st get put lit ads
vector :  ldx/ stx/ ldy/ sty/
control:  jmp jt jf jle/ case/ call callt ret
extra  :  fun/ sys inc dec not push pusha

total  43 instructions 
(but fun is not really an instruction.  It is a marker) The two-address instructions are marked with /.  To stop the execution, use "sys 13" instead of "end".

instruction encoding

1  add sub mul div band bor bxor mod eq
10 ne lt le gt ge addi subi shli shri inc
20 dec lit ads sys not push pusha get put ld
30 st call ret callt jmp jt jf jle case ldx
40 stx ldy sty fun

Format

  one-address  op:32  arg:32
 two-address  op:32  a:24,b:8

Semantic

v is M[fp+v]    Note the positive order of local variables                    

bop v      ::  AC = AC op v
bim a      ::  AC = AC op a

data

lit a      ::  AC = a
ads a      ::  AC = a    use to mark a as an address (pointer)
ld a       ::  AC = M[a]
st a       ::  M[a] = AC
get v      ::  AC = v
put v      ::  v = AC

vector

ldy a,v    ::  AC = M[M[a]+v]
sty a,v    ::  M[M[a]+v] = AC
ldx v1,v2  ::  AC = M[v1+v2]
stx v1,v2  ::  M[v1+v2] = AC

control

jt a       ::  if AC != 0 pc = a
jf a       ::  if AC == 0 pc = a
jle a,v    ::  if AC <= v pc = a
call a     :: 
if arity(a) > 0 pusha passing the last parameter if any
               create a new activation record
ret size   ::  delete current activation record
case lo,v  ::  if AC >= v >= lo skip 2+2*(v-lo)

extra

inc v      ::  v++, AC = v

dec v      ::  v--, AC = v
push v     ::  sp++, M[sp] = v
pusha      ::  sp++, M[sp] = AC
fun arity,size   a place holder for arity,size

usage

for loop
...
inc i
jle loop,v

case
lit hi
case lo,v
jmp else
jmp case1
...
jmp casen

comment

Two-addressing alleviates the need for Base register in vector instructions.  "jle" is the variation of "efor".  It is simpler but need a modified "inc v" to work. It does not required to "decrement" the initial index and there is no "hidden" adjacent variable. "case" is a long process of refinement.  I think I got a good compromise here, using AC for "hi" value and keep "jmp" instruction in the jump table.  This design trades off the size of the code for simplicity in semantic. 

Activation record

U-code stack frame has "positive" order of local variables (v is M[fp+v] ) unlike S-code.  The "usual" (s-code style) structure of activation record is as follows:

   hi

...     <- sp
retads
fp'     <- fp
v1
...
vn   

   lo

v is M[fp-v].  This backward order required renaming of local variables.  The rename process scans the code after the code body is completely generated as there may be some additional local variable allocated during the code generation. If the order is forward then renaming is not necessary.

   hi

...     <- sp
retads
fp'
vn
...
v1   
        <- fp
   lo

Note that the order of local variables is from 1 to n (instead of the reverse n to 1 in S-code).  To know where fp' and retads are, the size of the activation record must be known.  It is record as the argument of the "ret" instruction.  To create a new activation record, two arguments are used: arity, and the number of local variables. They are recorded as two arguments of the "fun" instruction.

Code generation for an accumulator machine

Without a stack, a temporary register must be allocated to store the intermediate result.  For example
 
  a = b + c - d

Going from left to right, b + c must be stored in t, then t - d and put a. This is done in genbop.

Parameter passing

Passing parameters to a function requires special treatment. The space in SS[] (where it stores the activation records), is used as a "virtual stack" to pass parameters to a new frame.  The new instructions are created for this task: push, pusha. With pushing parameters to a virtual stack, a tail-call instruction (callt) is revived as it is appropriate.  It is far simpler than trying to generate codes to pass parameters back to the old frame and do a jump.  Callt is faster too (it is another "big" instruction according to our philosophy of trying to create big instruction, a "big" instruction does more in one instruction). The last parameter is passed through AC, occassionally saving one push instruction.


last update 24 June 2010