Som version 4.1 Birthday release
Som v.4.1 uses an improved instruction set over the u-code of som v.4.0.
The new instruction set includes the immediate mode and a few AC-arg
instructions. An AC-arg instruction uses AC as the argument for the next
instruction, mostly load index. The expectation is to reduce the noi
(number of instructions executed) by 10-20% and also to reduce the running
time compared to som v.4.0. The compiler has gone through several
improvements and is better than the previous version (it is faster and
produces better code).
Upon analysing the v4 and v31 compilers, many possible improvements to
u-code come to mind. The first is the immediate mode. The second
is to use a "cascade" AC to reduce "put" (using AC as the argument in some
instructions). However, adding everything would make u-code
unattractively large. The aim of including more instructions in
u-code is to improve performance without an undue increase in
the complexity of the instruction set.
The guidelines to include an instruction are:
1) Add as little as possible
2) Must have "generality" that simplifies the code generator (not having tons of special cases)
3) Clean semantics
4) Have broad impact on performance
u-code improved
From many analyses, the obvious additional instructions are (u2):
immediate, including only the smallest set: addi, subi, bandi, bori, eqi, lti, lei
using AC as index: ldxa, ldya
Others are not used often: muli, divi, bxori, modi. Some logical ops can
be obtained as an "inverse": nei, gti, gei. These instructions are not
included. shl and shr are also included to make the instruction set
complete. Two instructions are retired. ads is discarded because it is
used only to relocate the data segment, which can be achieved by other
means. pusha is deemed surplus as it can be replaced by push 0; the
argument zero is used to signify AC as the argument. So, 9 instructions
are added to u-code and 2 instructions are removed. This makes the number
of instructions 50.
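The effect of the immediate mode and of using AC as an index can be illustrated with a toy accumulator machine. This is only a sketch: the opcode names (lit, get, put, add, addi, ldx, ldxa) follow the text, but their exact semantics, the two-operand form of ldx, and the memory layout are assumptions made for illustration, not the actual u-code definition.

```python
# Toy accumulator machine. Opcode names follow the text; semantics are ASSUMED.
def run(program, local_vars, memory):
    """Execute a list of instruction tuples and return (AC, noi)."""
    ac, noi = 0, 0
    for instr in program:
        op, args = instr[0], instr[1:]
        noi += 1
        if op == "lit":        # AC = constant
            ac = args[0]
        elif op == "get":      # AC = local variable
            ac = local_vars[args[0]]
        elif op == "put":      # local variable = AC
            local_vars[args[0]] = ac
        elif op == "add":      # AC = AC + local variable
            ac += local_vars[args[0]]
        elif op == "addi":     # AC = AC + constant (immediate mode)
            ac += args[0]
        elif op == "ldx":      # AC = memory[base + index], both in locals (assumed form)
            ac = memory[local_vars[args[0]] + local_vars[args[1]]]
        elif op == "ldxa":     # AC = memory[base + AC] (AC used as the index)
            ac = memory[local_vars[args[0]] + ac]
        else:
            raise ValueError("unknown opcode: " + op)
    return ac, noi

memory = [0] * 16
memory[4:8] = [10, 20, 30, 40]      # hypothetical array a stored at address 4

def fresh_locals():
    return {1: 4, 2: 1, 3: 0}       # local 1: base of a, local 2: i = 1, local 3: temp

# a[i+1] without immediate mode or AC-as-index: the sum goes through a temp via "put"
old_style = [("lit", 1), ("add", 2), ("put", 3), ("ldx", 1, 3)]
# a[i+1] with addi and ldxa: the sum stays in AC and feeds the load directly
new_style = [("get", 2), ("addi", 1), ("ldxa", 1)]

old_result = run(old_style, fresh_locals(), memory)
new_result = run(new_style, fresh_locals(), memory)
print(old_result, new_result)
```

Both sequences load the same element; in this toy model the second one needs one instruction fewer because the "put" to a temporary disappears.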
som41 performance
First, the number of instructions executed (noi) is measured. This is a
reliable metric as it does not depend on the platform used to run the
benchmark programs. These are the usual benchmarks that have been used
previously (quick sorts 20 items). The compiler benchmark compiles the
som v.3.1 source.
noi
benchmark   v41       v4        v31
bubble      5792      6172      6152
hanoi       1403      1403      1650
matmul2     10436     10564     --
queen2      227309    236237    227221
quick       2071      2071      2385
sieve       7937      7997      --
aes4        17865     19961     18357
compiler    5236396   6174071   5093134
In terms of noi, v.4.1 is better than or equal to v.4.0 on every
benchmark, and better than v.3.1 on most of them. Only in the compiler
benchmark is v.4.1 clearly worse than v.3.1 (queen2 is marginally higher,
227309 vs 227221). However, the two compilers are different and it is
difficult to compare them: even when the inputs are the same, they produce
different outputs because of the difference in the instruction sets.
Improve the compiler
The compiler was run and its profile carefully studied. The details of the improvement can be found
here. The most improved part of the compiler is the parser.
Changing from "if tok == xxx ..." to "case" speeds up the parser by up to 5
times. Therefore, the next major improvement to the compiler should be to
rewrite the parser generator.
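The reason a "case" dispatch beats a chain of "if tok == xxx" tests can be sketched as follows. The token names and handler names are hypothetical, and a Python dict stands in for the jump table that "case" compiles to; this is an illustration of the idea, not the Som parser itself.

```python
# Hypothetical token set; the real Som parser's tokens are not shown in the text.
TOKENS = ["if", "while", "for", "case", "return", "print", "ident", "number"]

def parse_if_chain(tok):
    """Models 'if tok == a ... else if tok == b ...': one test per candidate."""
    tests = 0
    for candidate in TOKENS:
        tests += 1
        if tok == candidate:
            return "parse_" + candidate, tests
    return "error", tests

# Stand-in for the jump table generated by "case": one indexed lookup.
JUMP_TABLE = {name: "parse_" + name for name in TOKENS}

def parse_case(tok):
    """Models 'case tok': a single table lookup regardless of the token."""
    return JUMP_TABLE.get(tok, "error"), 1

chain_result = parse_if_chain("number")
case_result = parse_case("number")
print(chain_result, case_result)
```

For the last token in the chain the if-version performs as many tests as there are tokens, while the table lookup always costs one step. The actual speedup depends on the token distribution.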
With this improved compiler, on the compiler benchmark, som v.4.1 is
faster than som v.4.0 by 20% (in terms of noi) and is now also
faster than som v.3.1 by 3.4%. However, the other benchmarks do
not change much.
benchmark   v41       v4        (v4-v41)/v4   v31
queen2      227305    236233    3.8%          227221
aes4        17825     19961     10.7%         18357
compiler    4919878   6174071   20.3%         5093134
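The percentages quoted for the improved compiler can be recomputed from the noi figures. The short sketch below uses only the numbers from this section; the dictionary layout and the function name are illustrative choices.

```python
# noi figures measured with the improved compiler (from the table in this section)
noi = {
    "queen2":   {"v41": 227305,  "v4": 236233,  "v31": 227221},
    "aes4":     {"v41": 17825,   "v4": 19961,   "v31": 18357},
    "compiler": {"v41": 4919878, "v4": 6174071, "v31": 5093134},
}

def reduction(base, new):
    """Relative noi reduction of 'new' against 'base', in percent."""
    return 100.0 * (base - new) / base

# (v4 - v41) / v4 for each benchmark, rounded to one decimal place
vs_v4 = {name: round(reduction(r["v4"], r["v41"]), 1) for name, r in noi.items()}

# (v31 - v41) / v31 for the compiler benchmark
compiler_vs_v31 = round(
    reduction(noi["compiler"]["v31"], noi["compiler"]["v41"]), 1
)

print(vs_v4)
print(compiler_vs_v31)
```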
Conclusion
On the general benchmarks, u2 does not improve much over the u-code of som
v.4.0: it has only 6% less noi, and the running time speedup is only
8.7%.
However, most of the improvement comes from analysing the compiler; to this
end, the compiler benchmark is much improved. The noi of the v4.1
compiler is 20% less than v4. The running time speedup is insignificant
(maybe because it is heavily i/o bound?). Som v.4.1 is a gentle
refinement of u-code, adding a mere 9 instructions. These instructions
allow more improvement in the compiler.
9 Aug 2008
bug fix for code generator "case"
The som v.4.1 vm short-cuts the jump in the "case" instruction: it
uses the target address of the jump instruction directly. Therefore the
"improv" function must NOT optimise "jmp" to a "ret" instruction in the
jump table of "case". This is fixed in the update release.
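The constraint can be sketched with a small peephole pass. Only the name "improv" comes from the text; its signature, the (opcode, argument) instruction tuples, and the way jump tables are described by (start, length) pairs are all assumptions made for this illustration.

```python
def improv(code, case_tables):
    """Replace 'jmp L' by 'ret' when address L holds 'ret',
    but leave the entries of every 'case' jump table untouched.

    code        : list of (opcode, argument) tuples (hypothetical format)
    case_tables : list of (start, length) pairs marking jump-table entries
    """
    protected = set()
    for start, length in case_tables:
        protected.update(range(start, start + length))

    out = list(code)
    for addr, (op, arg) in enumerate(code):
        if op == "jmp" and addr not in protected and code[arg][0] == "ret":
            out[addr] = ("ret", 0)
    return out

code = [
    ("case", 0),   # 0: hypothetical case instruction, table follows
    ("jmp", 5),    # 1: jump-table entry
    ("jmp", 6),    # 2: jump-table entry whose target is a 'ret'
    ("jmp", 6),    # 3: ordinary jump whose target is a 'ret'
    ("nop", 0),    # 4
    ("lit", 1),    # 5
    ("ret", 0),    # 6
]

optimised = improv(code, case_tables=[(1, 2)])
print(optimised)
```

In this model the vm reads the argument of a table entry as an address instead of executing the entry, so rewriting entry 2 to "ret" would destroy the address it needs. The ordinary jump at address 3 can still be optimised safely.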
17 Aug 2008