T-code

the next generation of s-code

T-code is the instruction for virtual machine to host Som system.  It is the next generation of s-code that aims for performance.  S-code is a stack-based instruction set, aimed for simplicity and tends toward minimalist.  In contrast, T-code is a register-based instruction set and has a richer set of operations.

From the experience of designing various instruction set for real chips using mostly stack-based instruction set, many variations on instruction set design have fewer number of instruction executed.  The one-address, smx, or the stack-register, xs and sr, their instruction set reduces the number of instruction executed by almost half.  This observation supports the argument that using register-based instruction set improve performance.  In implementing a VM, the main instruction dispatch has a high cost.  Therefore reducing the number of instruction executed helps reducing this cost.  Executing each instruction for a register-based instruction may be faster due to the ability to access many arguments in one instruction.  However, the cost of decoding multiple fields in an instruction may be higher than stack-based.

The guide line for instruction set design for performance:

1) limit the range of argument, so that each instruction contains more information.  It is fetched faster.
2) do more in one instruction, however, there is a balance between having too many specialised (rarely used but do a lot) instructions and too few of them resulted in very general instructions that are used very often (because they are so general) but do only basic simple operations.  Some instruction is very general, hence being used a lot, and also does heavy work, such as "call".  One must invent more of this kind.

Instruction set format

To maximise performance, I decide on:

1) three-address format. 
2) to address variable, only 8-bit range (0..255)
3) address space is 24 bits

op:8 a1:8 a2:8 a3:8           a1 = a2 op a3
op:8 n:24
op:8 a1:8 d:16

The 8-bit range of variables needs explanation most.  The number of local variables need never exceed 64, therefore the numbering of variables is 0..63  locals, 64..255 globals.  Accessing a variable does not require a 32-bit address, a variable can be mapped to a register for direct access, the full range of address space (32 bits) can be access indirectly (using indexing addressing mode).  Therefore it is necessary to have only a small finite number of registers as globals (the number of locals is no limit).  8-bit seems to fit nicely into a 32-bit instruction.  Because there are abundant of available registers (64) for locals, allocating a register for intermediate result is never difficult.  

Instruction set

arithmetic/logic

add sub mul div mod 
and or xor not shl shr
eq ne lt le gt ge 
(inc dec, use addi/subi instead)

control transfer

call callt ret jmp efor jt jf

data

mov ldx stx lit ads

special

fun sys 

meaning

binary op: add, sub...

op a1 a2 a3		a1 = a2 op a3

argument v1..v255, v1..v63 locals, v64..v255 globals.  In immediate mode, argument can be small integer 0..255. Locals and globals are accessing M[v].  M[1..255] are specially allocated for v1..v255.  Data segment should start elsewhere.

binary op immediate: addi, subi, ...

op a1 a2 n   		a1 = a2 op n

shl, shr are natural to be immediate

for large constant, the value is first load into v64 (a global) and subsequently is used, for example  

lit k, add v64 ...

generalised mov

mov a1 a2		a1 = a2
movi a1 n:16		a1 = n     	n 16-bit
movd a1 a2		a1 = M[a2]	ld/st indirect
ldx a1 a2 a3   		a1 = M[a2+a3]	index
stx a1 a2 a3 		M[a2+a3] = a1

control transfer

jmp d			pc = d
jt a d			if a != 0 pc += d
jf a d			if a == 0 pc += d
efor a d			a++, if a <= adj(a) pc += d

"increment" can be done with "addi a1 a2 n", and so is dec.

The combined jump, test and jump is not possible as there is not enough room for two argument and one displacement.  So, the test is done through logical operators and subsequently the result is tested and jumped via jt/jf.

function call/ret

There is no stack in T-machine, all results are stored in registers.  The callee must saved all registers it used into a stack frame. A frame pointer, FP, is used to point to a stack frame which resided in the stack segment (SS[.]).  

hi    return ads	<- fp
      vn
      ...
      v1
lo

The number of locals is implicit in the function header and in the "ret" instruction.  There is no need for explicitly storing size and old FP in the stack frame.

function call

function header  op:8 arity:8 nlocal:8 0:8
(any other useful information? such as recursive call)

call a
  keep return address
  fetch fun header, get arity, nlocal
  save locals
  pass parameters
  goto a+1

ret v n
  get return address
  pass return value v
  knowing n, restore registers
  goto return address

by convention, to return a value to caller, it must be returned via v64.

22 December 2005


