Som version 4.1 Birthday release

Som v.4.1 uses an improved instruction set over the u-code of Som v.4.0. The new instruction set includes the immediate mode and a few AC-arg instructions, which use AC as the argument for the next instruction, mostly load index. The expectation is to reduce noi (the number of instructions executed) by 10-20% and also to reduce the running time compared to Som v.4.0. The compiler has gone through several improvements and is better than the previous version (it is faster and produces better code).

Upon analysing the v4 and v31 compilers, many possible improvements to u-code come to mind. The first is the immediate mode. The second is to use "cascade" AC to reduce "put" (using AC as the argument in some instructions). However, adding everything would make u-code unattractively large. The aim of including more instructions in u-code is to improve performance without an undue increase in the complexity of the instruction set.

The guidelines for including an instruction are:

1) Add as little as possible
2) Must have "generality" that simplifies the code generator (not having tons of special cases)
3) Clean semantics
4) Broad impact on performance

u-code improved

From many analyses, the obvious additional instructions (u2) are:

immediate mode, including only the smallest set:
addi, subi, bandi, bori, eqi, lti, lei
using AC as index: ldxa, ldya

Others are not used often: muli, divi, bxori, modi. The logical ops nei, gti, gei can be obtained as the "inverse" of ops already in the set. These instructions are not included. shl and shr are also included to make the instruction set complete.

Two instructions are retired. ads is discarded because it is used only to relocate the data segment, which can be achieved by other means. pusha is deemed surplus as it can be replaced by push 0, where the argument zero signifies AC as the argument. So 9 instructions are added to u-code and 2 instructions are removed. This brings the number of instructions to 50.
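To illustrate why an immediate form can lower noi, here is a minimal sketch of a toy accumulator machine. It is not the Som VM: the opcode semantics, the variable names and the temporary "t" are all assumptions made for this example, and only the name addi is taken from the list above.

```python
# Toy accumulator machine, NOT the real Som VM: the semantics of every
# opcode below are assumptions made for illustration only.
def run(program, variables):
    """Execute a list of (opcode, argument) pairs; return (variables, noi)."""
    ac = 0    # the accumulator (AC)
    noi = 0   # number of instructions executed
    for op, arg in program:
        noi += 1
        if op == "lit":       # AC = constant
            ac = arg
        elif op == "get":     # AC = variable
            ac = variables[arg]
        elif op == "put":     # variable = AC
            variables[arg] = ac
        elif op == "add":     # AC = AC + variable
            ac += variables[arg]
        elif op == "addi":    # AC = AC + constant (immediate mode)
            ac += arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return variables, noi

# x = x + 1 without immediate mode: the constant goes through a temporary.
without_imm = [("lit", 1), ("put", "t"), ("get", "x"), ("add", "t"), ("put", "x")]
# The same statement using the immediate form.
with_imm = [("get", "x"), ("addi", 1), ("put", "x")]

vars_a, noi_a = run(without_imm, {"x": 41})
vars_b, noi_b = run(with_imm, {"x": 41})
```

In this toy setting both programs leave the same value in x, but the immediate version executes fewer instructions because the constant no longer needs a "put" to a temporary. The real saving in u-code depends on how the Som code generator handles constants, which is not shown here.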

som41 performance

First, the number of instructions executed is measured. This is a reliable metric as it does not depend on the platform used to run the benchmark programs. These are the usual benchmarks that have been used previously (quick sorts 20 items). The compiler benchmark compiles the Som v.3.1 source.

            noi
            v41       v4        v31
bubble      5792      6172      6152
hanoi       1403      1403      1650
matmul2     10436     10564     --
queen2      227309    236237    227221
quick       2071      2071      2385
sieve       7937      7997      --
aes4        17865     19961     18357
compiler    5236396   6174071   5093134

In terms of noi, v.4.1 is better than or equal to v.4.0 on every benchmark and better than v.3.1 on most. Only on the compiler benchmark is v.4.1 clearly worse than v.3.1 (on queen2 it is marginally worse, by 88 instructions). However, the two compilers are different and difficult to compare: even when the inputs are the same, they produce different outputs because of the difference in the instruction sets.

Improve the compiler

The compiler was run and its profile carefully studied. The details of the improvement can be found here. The most improved part of the compiler is the parser. Changing from "if tok == xxx ..." to "case" speeds up the parser by up to 5 times. Therefore, the next major improvement to the compiler should be to rewrite the parser generator.
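The difference between a chain of "if tok == xxx ..." tests and a "case" can be sketched as follows. This is only an illustration under assumptions: the token names are hypothetical (the real Som token set is not given here), and a dictionary lookup stands in for the indexed jump that a "case" could compile to.

```python
# Hypothetical token kinds; the real Som token set is not given in the text.
TOKENS = ["if", "while", "for", "return", "ident", "number", "lparen", "rparen"]

def classify_chain(tok):
    """Linear chain of tests, like "if tok == xxx ...".

    Returns (token index, number of comparisons made).
    """
    tests = 0
    for i, name in enumerate(TOKENS):
        tests += 1
        if tok == name:
            return i, tests
    return -1, tests

# Table built once; stands in for the jump table behind a "case".
TABLE = {name: i for i, name in enumerate(TOKENS)}

def classify_case(tok):
    """Single table lookup, whichever token it is."""
    return TABLE.get(tok, -1), 1

chain_result = classify_chain("rparen")
case_result = classify_case("rparen")
```

With the chain, a token that appears late in the list costs one comparison per earlier alternative, while the table costs one lookup for any token. The sketch does not reproduce the measured speedup; it only shows where the saving comes from.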

With this improved compiler, on the compiler benchmark, Som v.4.1 is faster than Som v.4.0 by 20% (in terms of noi) and is now also faster than Som v.3.1 by 3.4%. On all other benchmarks, however, it does not change much.

            v41       v4        (v4-v41)/v4   v31
queen2      227305    236233    3.8%          227221
aes4        17825     19961     10.7%         18357
compiler    4919878   6174071   20.3%         5093134
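The percentages can be rechecked from the raw noi figures. The short script below copies the numbers from the table above; the helper name reduction is just a label chosen for this sketch.

```python
# noi figures copied from the table above (improved compiler).
noi = {
    "queen2":   {"v41": 227305,  "v4": 236233,  "v31": 227221},
    "aes4":     {"v41": 17825,   "v4": 19961,   "v31": 18357},
    "compiler": {"v41": 4919878, "v4": 6174071, "v31": 5093134},
}

def reduction(old, new):
    """Relative reduction in noi going from old to new, in percent."""
    return 100.0 * (old - new) / old

# (v4 - v41) / v4 for each benchmark, rounded to one decimal place.
vs_v4 = {name: round(reduction(r["v4"], r["v41"]), 1) for name, r in noi.items()}

# (v31 - v41) / v31 for the compiler benchmark.
vs_v31 = round(reduction(noi["compiler"]["v31"], noi["compiler"]["v41"]), 1)
```

Here vs_v4 reproduces the (v4-v41)/v4 column, and vs_v31 gives the 3.4% improvement over Som v.3.1 on the compiler benchmark.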

Conclusion

On the general benchmarks, u2 does not improve much over the u-code of Som v.4.0: noi is only 6% less and the running time speedup is only 8.7%. However, most of the improvement comes from analysing the compiler, and as a result the compiler benchmark is much improved. The noi of the v4.1 compiler is 20% less than that of v4. The running time speedup is insignificant (maybe because it is heavily I/O bound?). Som v.4.1 is a gentle refinement of u-code, adding a mere 9 instructions. These instructions allow more improvement in the compiler.

9 Aug 2008

bug fix for code generator "case"

The Som v.4.1 VM short-cuts the jump in the "case" instruction: it uses the address of the jump instruction directly. Therefore the "improv" function must NOT optimise "jmp" to "ret" in the jump table of "case". This is fixed in the update release.
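The constraint can be sketched on a toy code representation. Everything here is an assumption made for illustration: the layout (a "case" followed directly by its table of "jmp" entries, with the entry count as its argument), the absolute jump targets, and the function improve, which only stands in for the real "improv" and is not its actual code.

```python
# Toy code representation, NOT the real Som code generator: a program is a
# list of [opcode, argument]; "case" is followed by `argument` jmp entries.
def improve(code):
    """Replace a jmp whose target is a ret by ret, except in a case jump table."""
    protected = set()
    for i, (op, arg) in enumerate(code):
        if op == "case":
            # The toy VM is assumed to read the target straight out of these
            # jmp entries, so they must stay as jmp.
            protected.update(range(i + 1, i + 1 + arg))
    out = []
    for i, (op, arg) in enumerate(code):
        if op == "jmp" and i not in protected and code[arg][0] == "ret":
            out.append(["ret", 0])
        else:
            out.append([op, arg])
    return out

code = [
    ["case", 2],   # 0: jump table with two entries follows
    ["jmp", 4],    # 1: table entry whose target is a ret
    ["jmp", 5],    # 2: table entry
    ["jmp", 4],    # 3: ordinary jump to a ret, safe to rewrite
    ["ret", 0],    # 4
    ["add", 0],    # 5
]
improved = improve(code)
```

In this sketch the ordinary jump at index 3 becomes a ret, while the table entry at index 1 is left alone even though it also targets a ret. If it were rewritten, a VM that reads the table entry's argument as a jump target would pick up the wrong address.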

17 Aug 2008