T-code

T-code specification

T-code is the instruction for virtual machine to host Som system. It is the next generation of s-code that aims for performance. S-code is a stack-based instruction set, aimed for simplicity and tends toward minimalist. In contrast, T-code is a register-based instruction set and has a richer set of operations.

From the experience of designing various instruction set for real chips using mostly stack-based instruction set, many variations on instruction set design have fewer number of instruction executed. The one-address, smx, or the stack-register, xs and sr, their instruction set reduces the number of instruction executed by almost half. This observation supports the argument that using register-based instruction set improve performance. In implementing a VM, the main instruction dispatch has a high cost. Therefore reducing the number of instruction executed helps reducing this cost. Executing each instruction for a register-based instruction may be faster due to the ability to access many arguments in one instruction. However, the cost of decoding multiple fields in an instruction may be higher than stack-based.

The guide line for instruction set design for performance:
1) limit the range of argument, so that each instruction contains more information. It is fetched faster.
2) do more in one instruction, however, there is a balance between having too many specialised (rarely used but do a lot) instructions and too few of them resulted in very general instructions that are used very often (because they are so general) but do only basic simple operations. Some instruction is very general, hence being used a lot, and also does heavy work, such as "call". One must invent more of this kind.

Instruction set format

To maximise performance, I decide on:

1) three-address format.
2) to address variable, only 8-bit range (0..255)
3) address space is 24 bits

a3:8 a2:8 a1:8 op:8 a1 = a2 op a3
n:24 op:8
d:16 a1:8 op:8

The 8-bit range of variables needs explanation most. The number of local variables need never exceed 64, therefore the numbering of variables is 1..63 locals, 64..255 globals. Accessing a variable does not require a 32-bit address, a variable can be mapped to a register for direct access. The full range of address space (32 bits) can be access indirectly (using indexing addressing mode). Therefore it is necessary to have only a small finite number of registers as globals. 8-bit seems to fit nicely into a 32-bit instruction. Because there are abundant of available registers (64) for locals, allocating a register for intermediate result is never difficult.

Instruction set

arithmetic/logic

add sub mul div mod
and or xor not shl shr
eq ne lt le gt ge

(inc dec, use addi/subi instead)

control transfer

call callt ret jmp efor jt jf

data

mov ldx stx lit ads

special

fun sys

Meaning of instruction

binary op: add, sub...

op a1 a2 a3 a1 = a2 op a3

argument v1..v255, v1..v63 locals, v64..v255 globals. In immediate mode, argument can be small integer 0..255. Locals and globals are accessing M[v]. M[1..255] are specially allocated for v1..v255. Data segment should start elsewhere.

binary op immediate: addi, subi, ...

op a1 a2 n a1 = a2 op n

for large constant, the value is first load into v64 (a global) and subsequently is used, for example

lit k, add v64 ...

generalised mov

mov a1 a2           a1 = a2
movi a1 n:16        a1 = n        n 16-bit
movd a1 a2          a1 = M[a2]   ld/st indirect (defer)
ldx a1 a2 a3     a1 = M[a2+a3] index
stx a1 a2 a3        M[a2+a3] = a1
pass k a2           pass a2 as k th parameter of a func call
passi k n:16        pass n as k th paramter of a func call

control transfer

jmp d               pc = d
jt a d              if a != 0 pc += d
jf a d              if a == 0 pc += d
efor a d           a++, if a <= adj(a) pc += d
case a d            follow by lit lo, lit hi
                    if lo <= a <= hi jump table else pc += d

"increment" can be done with "addi a1 a2 n", and so is dec.

The combined jump, test and jump is not possible as there is not enough room for two argument and one displacement. So, the test is done through logical operators and subsequently the result is tested and jumped via jt/jf.

function call/ret

There is no stack in T-machine, all results are stored in registers. The callee must saved all registers it used into a stack frame. A frame pointer, FP, is used to point to a stack frame which resided in the stack segment (SS[.]).

hi    return ads   <- fp
      vn
      ...
      v1
lo

The number of locals is implicit in the function header and in the "ret" instruction. There is no need for explicitly storing size and old FP in the stack frame.

function header 0:8 nlocal:8 arity:8 op:8

call a
fetch fun header, get arity, nlocal
save locals, return address
pass parameters
goto a+1

ret v n
get return address
pass return value v
knowing n, restore registers
goto return address

by convention, to return a value to caller, it must be returned via v64.

Parameter passing

When passing parameters before a call, another call may occur, such as:

f( a, g(b))

=>
pass 1 a
pass 1 b **
call g
pass 2 retval
call f

The line ** must not overwrite the first "pass 1 a" causes incorrect parameter passing. Therefore parameter passing must be recursivable. For example, instead of passing to an absolute place, let's pass to a stack.

=>
push a
push b
call g
push retval
call f

The stack segment is already existed. It is used to save/restore registers when a call occurred. We can use the space "after" the most recent frame to pass parameters. A call will adjust fp properly as it knows the arity of the function call. The SS view from the above sequence is:

   ...y
     ^ fp

   pass a      =>    y a
   pass b      =>    y a b
   call g      =>    y a [ frame of g ] => y a
   pass retval =>    y a x
   call f      =>    y [ frame of f ] => y

The call instruction passes its parameters and saves caller's locals as follows:

   fp = fp - arity           // fp is before 1st param
   for(i=1; i<=nlocal; i++){
      if( i <= arity ){
        t = SS[fp+i]
        SS[fp+i] = M[i]     // save local i
        M[i] = t             // pass parameter i
      }else{
         SS[fp+i] = M[i]     // just save local i
      }
    }
    fp = fp + nlocal + 1     // create new frame
    SS[fp] = retads          // save return address

It is possible to do function call/ret and parameter passing with "activation record" storing locals in the stack frame. But the present method is faster in terms of accessing variables (it is slower in doing call/ret). In the present method, the first argument of the instruction "pass k a2" and "passi k n" is not used. However, it can be useful when using a different implementation.

15 Jan 2006