24 July 2008

som v.4.1  (to be released on 9 Aug 2008, Birthday release)

This release improves the instruction set by adding immediate mode and a few instructions that take AC as an argument (to do "cascade").  The expectation is to reduce the noi (number of instructions executed) by 10-20%, and also the running time, compared to som v.4.0.

Upon analysing the v4 vs v31 compilers (see som-v4-bm/doc/v4-vs-v31-compile.txt), many possible improvements to u-code come to mind.  The first is immediate mode.  The second is to "cascade" AC, that is, to use AC as an argument in some instructions, which reduces the number of "put" instructions.  However, adding everything would make u-code unattractively large.  The aim of adding instructions to u-code is to improve performance without an undue increase in the complexity of the instruction set.

The guidelines for including an instruction are:
1) Add as little as possible
2) It must have "generality" that simplifies the code generator (not tons of special cases)
3) Clean semantics
4) Broad impact on performance

u-code v.2  (improved u-code)
The design of the additional instructions is discussed in doc\u2-code.txt

From many analyses, the obvious additional instructions are:

immediate, including only the smallest set:
  addi, subi, bandi, bori, eqi, lti, lei

the others are not used often: muli, divi, bxori, modi
and the remaining comparison ops can be obtained by "inverse" of the ones above: nei, gti, gei

using AC as index:  ldxa, ldya
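
To see why immediate mode and AC-as-index reduce noi, here is a minimal sketch in Python of a toy accumulator machine.  The tuple encoding, the "lit" and "ldx" instructions, and the exact semantics given to "addi" and "ldxa" are assumptions made for illustration only; the real definitions are in doc\u2-code.txt.

```python
# Toy accumulator machine. The encoding and semantics below are assumed
# for illustration only; they are not the real u-code definitions.

def run(program, local_vars, mem=()):
    """Execute (op, a, b) tuples; return (locals, AC, noi)."""
    v = dict(local_vars)
    ac = 0
    noi = 0                      # number of instructions executed
    for op, a, b in program:
        noi += 1
        if op == "lit":          # AC = constant a
            ac = a
        elif op == "put":        # local a = AC
            v[a] = ac
        elif op == "add":        # AC = local a + local b
            ac = v[a] + v[b]
        elif op == "addi":       # AC = local a + constant b (immediate mode)
            ac = v[a] + b
        elif op == "ldx":        # AC = mem[local a + local b]
            ac = mem[v[a] + v[b]]
        elif op == "ldxa":       # AC = mem[local a + AC] (AC used as index)
            ac = mem[v[a] + ac]
        else:
            raise ValueError("unknown op: " + op)
    return v, ac, noi

# i = i + 1, without and with immediate mode
inc_plain = [("lit", 1, None), ("put", "t", None),
             ("add", "i", "t"), ("put", "i", None)]
inc_imm = [("addi", "i", 1), ("put", "i", None)]

# AC = a[i + 1], where local "a" holds the base address of the array
idx_plain = [("addi", "i", 1), ("put", "t", None), ("ldx", "a", "t")]
idx_ac = [("addi", "i", 1), ("ldxa", "a", None)]

if __name__ == "__main__":
    mem = [10, 20, 30, 40]
    print("i = i + 1 :", run(inc_plain, {"i": 5})[2], "vs", run(inc_imm, {"i": 5})[2])
    print("a[i + 1]  :", run(idx_plain, {"i": 1, "a": 0}, mem)[2],
          "vs", run(idx_ac, {"i": 1, "a": 0}, mem)[2])
```

In this toy model the immediate form removes the "lit"/"put" pair that only exists to hold the constant, and the AC-index form removes the "put" that only exists to hand the index to the next instruction.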

som41  performance

noi of each benchmark; speedup is the reduction in noi, (v4-v41)/v4

benchmark   v41       v4        speedup   v31
bubble      5792      6172      6%        6152
hanoi       1403      1403      0         1650
matmul2     10436     10564     1%        --
queen2      227309    236237    3.8%      227221
quick       2071      2071      0         2385
sieve       7937      7997      0.8%      --
aes4        17865     19961     10.5%     18357
compiler    5236396   6174071   15%       5093134
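
The speedup column can be recomputed from the noi counts.  A small sketch in Python, using four rows from the table; the function and variable names are only for illustration:

```python
# noi counts copied from the table: benchmark -> (v41, v4)
NOI = {
    "bubble":   (5792, 6172),
    "queen2":   (227309, 236237),
    "aes4":     (17865, 19961),
    "compiler": (5236396, 6174071),
}

def reduction(v41, v4):
    """Percentage reduction in noi of v41 relative to v4: (v4 - v41) / v4."""
    return 100.0 * (v4 - v41) / v4

if __name__ == "__main__":
    for name, (v41, v4) in NOI.items():
        print("%-9s %5.1f%%" % (name, reduction(v41, v4)))
```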

Now we have a som v.4.1 that is faster than v.3.1.  Som v.4.1 has both lower noi and faster running time, except on the compiler benchmark.  However, the two compilers are different, so it is difficult to compare them: even when the inputs are the same, they produce different outputs because of the difference in the instruction sets.

On the general benchmarks, u2 does not improve much over the u-code of som v.4.0.  The noi is only 6% lower and the running time speedup is only 8.7%.  However, most of the improvement was made by analysing the compiler; as a result, the compiler benchmark is much improved.  The noi of v4.1 is 21% less than v4.  The running time speedup is insignificant (perhaps because the benchmark is heavily i/o bound?).

improve the compiler

As expected, the most improved part of the compiler is the parser.  Changing from "if tok == xxx ..." to "case" speeds up the parser by up to 5 times.  Therefore, the next major improvement to the compiler should be to rewrite the parser generator.
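
The difference between the two dispatch styles can be sketched as follows.  This is an illustration in Python, not the som parser: the token names are hypothetical, and a dictionary lookup stands in for the jump table of "case".  It only counts comparisons and does not try to reproduce the measured speedup.

```python
# Hypothetical token kinds; the real parser has its own set.
TOKENS = ["tkIF", "tkWHILE", "tkFOR", "tkCASE", "tkIDENT", "tkNUMBER",
          "tkLPAREN", "tkRPAREN"]

def dispatch_chain(tok):
    """'if tok == xxx ...' style: test each token kind in turn.
    Returns (handler index, number of comparisons made)."""
    comparisons = 0
    for i, t in enumerate(TOKENS):
        comparisons += 1
        if tok == t:
            return i, comparisons
    return -1, comparisons

# 'case' style: one indexed lookup, whatever the token is.
JUMP_TABLE = {t: i for i, t in enumerate(TOKENS)}

def dispatch_case(tok):
    """Returns (handler index, number of lookups made)."""
    return JUMP_TABLE.get(tok, -1), 1

if __name__ == "__main__":
    for tok in ("tkIF", "tkRPAREN"):
        print(tok, dispatch_chain(tok), dispatch_case(tok))
```

The cost of the chain grows with the position of the token in the chain, while the cost of the table lookup stays constant.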

With this improved compiler, on the compiler benchmark, som v.4.1 is faster than som v.4.0 by 20%, and it is now also faster than som v.3.1 by 3.4%.  However, on all other benchmarks it does not change much.

            v41       v4      (v4-v41)/v4     v31
queen2     227305    236233     3.8%         227221
aes4        17825     19961    10.7%          18357
compiler  4919878   6174071    20.3%        5093134

3 Aug 2008

bug fix for code generator "case"

The som v.4.1 vm short-cuts the jump in the "case" instruction: it uses the address held in the jump instruction directly.  Therefore the "improv" function must NOT optimise a "jmp" into a "ret" instruction inside the jump table of "case".  This is fixed in the update release.
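
A minimal sketch of the fix, in Python.  The code layout is an assumption made for illustration: instructions are (op, arg) pairs, a "case" instruction is followed immediately by its jump table of "jmp" entries, and the argument of "case" is the number of entries.  The function "improve" is a stand-in, not the actual "improv" from the code generator.

```python
def improve(code):
    """Replace 'jmp L' by 'ret' when the instruction at L is 'ret',
    but leave the jump-table entries of 'case' untouched, because the
    vm reads the target address out of those 'jmp' entries."""
    out = list(code)
    in_table = 0                       # jump-table entries still to skip
    for pc, (op, arg) in enumerate(code):
        if op == "case":
            in_table = arg             # assumed: arg = number of table entries
            continue
        if in_table > 0:
            in_table -= 1              # table entry: must stay a 'jmp'
            continue
        if op == "jmp" and code[arg][0] == "ret":
            out[pc] = ("ret", 0)
    return out

EXAMPLE = [
    ("case", 2),   # 0: followed by a table of 2 entries
    ("jmp", 4),    # 1: table entry
    ("jmp", 5),    # 2: table entry whose target is a 'ret'
    ("jmp", 5),    # 3: ordinary jump to a 'ret', safe to optimise
    ("add", 0),    # 4
    ("ret", 0),    # 5
]

if __name__ == "__main__":
    for pc, ins in enumerate(improve(EXAMPLE)):
        print(pc, ins)
```

In this sketch the entry at address 2 keeps its "jmp" even though its target is a "ret"; only the ordinary jump at address 3 is rewritten.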

17 Aug 2008
