T-code
T-code specification
T-code is the instruction for virtual machine to host Som system.
It is the next generation of s-code that aims for performance.
S-code is a stack-based instruction set, aimed for simplicity and tends
toward minimalist. In contrast, T-code is a register-based
instruction set and has a richer set of operations.
From the experience of designing various instruction set for real chips
using mostly stack-based instruction set, many variations on
instruction set design have fewer number of instruction executed.
The one-address, smx, or the stack-register, xs and sr, their
instruction set reduces the number of instruction executed by almost
half. This observation supports the argument that using
register-based instruction set improve performance. In
implementing a VM, the main instruction dispatch has a high cost.
Therefore reducing the number of instruction executed helps reducing
this cost. Executing each instruction for a register-based
instruction may be faster due to the ability to access many arguments
in one instruction. However, the cost of decoding multiple fields
in an instruction may be higher than stack-based.
The guide line for instruction set design for performance:
1) limit the range of argument, so that each instruction contains more
information. It is fetched faster.
2) do more in one instruction, however, there is a balance between
having too many specialised (rarely used but do a lot) instructions and
too few of them resulted in very general instructions that are used
very often (because they are so general) but do only basic simple
operations. Some instruction is very general, hence being used a
lot, and also does heavy work, such as "call". One must invent
more of this kind.
Instruction set format
To maximise performance, I decide on:
1) three-address format.
2) to address variable, only 8-bit range (0..255)
3) address space is 24 bits
a3:8
a2:8 a1:8
op:8 a1 =
a2 op a3
n:24 op:8
d:16 a1:8 op:8
The 8-bit range of variables needs explanation most. The number
of local variables need never exceed 64, therefore the numbering of
variables is 1..63 locals, 64..255 globals. Accessing a
variable does not require a 32-bit address, a variable can be mapped to
a register for direct access. The full range of address space (32 bits)
can be access indirectly (using indexing addressing mode).
Therefore it is necessary to have only a small finite number of
registers as globals. 8-bit seems to fit nicely into a 32-bit
instruction. Because there are abundant of available registers
(64) for locals, allocating a register for intermediate result is never
difficult.
Instruction set
arithmetic/logic
add
sub mul div mod
and or xor not shl shr
eq ne lt le gt ge
(inc dec, use addi/subi instead)
control transfer
call
callt ret jmp efor jt jf
data
mov
ldx stx lit ads
special
fun
sys
Meaning of instruction
binary op: add, sub...
op
a1 a2 a3 a1 = a2 op a3
argument v1..v255, v1..v63 locals, v64..v255 globals. In
immediate mode, argument can be small integer 0..255. Locals and
globals are accessing M[v]. M[1..255] are specially allocated for
v1..v255. Data segment should start elsewhere.
binary op immediate: addi,
subi, ...
op
a1 a2 n a1 = a2 op n
for large constant, the value is first load into v64 (a global) and
subsequently is used, for example
lit
k, add v64 ...
generalised mov
mov
a1 a2 a1 = a2
movi a1 n:16
a1 = n
n 16-bit
movd a1 a2
a1 = M[a2] ld/st
indirect (defer)
ldx a1 a2 a3
a1 = M[a2+a3] index
stx a1 a2 a3
M[a2+a3] = a1
pass k a2
pass a2 as k th parameter of a func call
passi k n:16 pass n as k th
paramter of a func call
control transfer
jmp
d
pc = d
jt a d
if a != 0 pc += d
jf a d
if a == 0 pc += d
efor a d
a++, if a <= adj(a) pc
+= d
case a d follow
by lit lo, lit hi
if lo <= a <= hi jump table else pc += d
"increment" can be done with "addi a1 a2 n", and so is dec.
The combined jump, test and jump is not possible as there is not enough
room for two argument and one displacement. So, the test is done
through logical operators and subsequently the result is tested and
jumped via jt/jf.
function call/ret
There is no stack in T-machine, all results are stored in
registers. The callee must saved all registers it used into a
stack frame. A frame pointer, FP, is used to point to a stack frame
which resided in the stack segment (SS[.]).
hi
return ads <- fp
vn
...
v1
lo
The number of locals is implicit in the function header and in the
"ret" instruction. There is no need for explicitly storing size
and old FP in the stack frame.
function header 0:8
nlocal:8 arity:8 op:8
call
a
fetch fun header, get
arity, nlocal
save locals, return address
pass parameters
goto a+1
ret v n
get return address
pass return value v
knowing n, restore
registers
goto return address
by convention, to return a value to caller, it must be returned via v64.
Parameter passing
When passing parameters before a call, another call may occur, such as:
f( a, g(b))
=>
pass 1 a
pass 1 b **
call g
pass 2 retval
call f
The line ** must not overwrite the first "pass 1 a" causes incorrect
parameter passing. Therefore parameter passing must be
recursivable. For example, instead of passing to an absolute
place, let's pass to a stack.
=>
push a
push b
call g
push retval
call f
The stack segment is already existed. It is used to save/restore
registers when a call occurred. We can use the space "after" the
most recent frame to pass parameters. A call will adjust fp
properly as it knows the arity of the function call. The SS view
from the above sequence is:
...y
^ fp
pass
a => y a
pass
b => y a b
call
g => y a [ frame of
g ] => y a
pass retval
=> y a x
call
f => y [ frame of f
] => y
The call instruction passes its parameters and saves caller's locals as
follows:
fp = fp - arity
// fp is before 1st param
for(i=1;
i<=nlocal; i++){
if( i <= arity ){
t = SS[fp+i]
SS[fp+i] = M[i] // save local i
M[i] =
t
// pass parameter i
}else{
SS[fp+i] = M[i] // just save local i
}
}
fp = fp +
nlocal + 1 // create new frame
SS[fp] =
retads // save return
address
It is possible to do function call/ret and parameter passing with
"activation record" storing locals in the stack frame. But the
present method is faster in terms of accessing variables
(it is slower in doing call/ret). In the present method, the
first argument of the instruction "pass
k a2" and "passi k n"
is not used. However, it can be useful when using a different
implementation.
15 Jan 2006