som41  performance

som41     noi                speedup          
          v41       v4       (v4-v41)/v4   v31
bubble    5792      6172      6%           6152  
hanoi     1403      1403      0            1650
matmul2   10436     10564     1%           --
queen2    227309    236237    3.8%         227221
quick     2071      2071      0            2385
sieve     7937      7997      0.8%         --
aes4      17865     19961     10.5%        18357
compiler  5236396   6174071   15%          5093134

Now we have som v.4.1, which is faster than v.3.1.  Som v.4.1 has both fewer noi and a shorter running time, except for the compiler (and queen2, where v41 is marginally higher: 227309 against 227221).  However, the two compilers are different and are difficult to compare: even when the inputs are the same, they produce different outputs because the instruction sets differ.
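The speedup column can be rechecked with a few lines of Python.  This is only a sketch: the noi figures are copied from the table above, and the names "noi", "speedup" and "speedups" are mine, not part of any Som tool.

```python
# noi figures copied from the table above, as (v41, v4) pairs.
noi = {
    "bubble":   (5792, 6172),
    "hanoi":    (1403, 1403),
    "matmul2":  (10436, 10564),
    "queen2":   (227309, 236237),
    "quick":    (2071, 2071),
    "sieve":    (7937, 7997),
    "aes4":     (17865, 19961),
    "compiler": (5236396, 6174071),
}

def speedup(v41, v4):
    """Relative reduction in noi, (v4 - v41) / v4, as a percentage."""
    return 100.0 * (v4 - v41) / v4

speedups = {name: speedup(v41, v4) for name, (v41, v4) in noi.items()}

# Print one line per benchmark, one decimal place.
for name, pct in speedups.items():
    print(f"{name:<9} {pct:5.1f}%")
```

The v31 column is left out because the speedup is defined against v4 only.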

compile som3.txt  (som v31 compiler) size (Kbyte)

        .lst   .obj  CS size (word)        DS
som31    73     47    5437 x 2 = 10874    1229
som40    82     50    11927               1229
som41    74     45    10771               976

Analysing compiler

Compared to the som v.3.1 compiler, v41 is faster (fewer noi) in:

lex1, lex, fprints, strpack, token, genex, hash, install

newcell and ypush are slower because there is no stack (an extra "put" must be used).

The conclusion is that the code generator is quite good and the u2 instruction set is reasonable.  The som v.4.1 code generator produces code as efficient as that of som v.3.1.  The remaining difference is due to the way the compiler is written.  The two compilers differ because they produce different code.

To improve the compiler, one must look into how it is written and change the way the compiler works.  The best place to start is the profile.

improvement

start with the most frequently used function:

1) fprints (string-s.txt), change to use nested if.  noi 5231285
2) strpack (string-s.txt), change to use nested if. (1)+(2) noi 5215634
x) newcell (list-s.txt), no way to improve
3) lex1 (token-s.txt), use macro and/or in hot spots. 1..3 noi 5162367
4) outM (icode-s.txt), opt. eqi 0, jf -> jt. 1..4 noi 5133416
5) prCode (icode-s.txt), use macro to decode arg, 1..5 noi 5100835
x) token (token-s.txt), no way to improve
x) lex (token-s.txt), no way to improve
6) bop (parse.som), this is an interesting case.  This function is generated by a parser generator, so hand-writing part of it is not advisable (a new one can be generated from a new grammar).  However, to see the effect of using "case" to replace a long sequence of "if tok == xxx ...", which should give a dramatic speedup, I hand-coded just this function.  A better way is to write a new parser generator in Som that generates a "case" list. 1..6 noi 4957654
  As expected, bop (cnt 3747) noi 174543 is reduced to a mere 31896, a reduction of more than 5 times!
x) genex (gencode-s.txt), is too complex to be tampered with.
x) hash (symtab-s.txt), no way to improve
7) streq (string-s.txt), opt. jmp in macro, jx to lit 0, jt, $z => jx.z, 1..7 noi 4969571.  This is strange: it is slower than 1..6!  It should not be, as "improv", where the additional code for this opt. is appended, is called only 294 times (noi 63343).  However, the profile revealed that streq improved a bit (noi -1226) but improv got worse (noi +4582), so there is a net loss.  I rewrote "improv" (there was a bug on cascade or); now noi is 4956131.
x) cons (list-s.txt), no way to improve
x) reloc (main-s.txt), no way to improve
x) install (symtab-s.txt), has a few put.x/get.x that could be optimised.  However, I am not certain that it is totally safe to do so.  Therefore, the opt. is not done.
8) term (parse.som), rewritten to use case; its noi (cnt 4243) drops from 71156 to 34903 (about 1/2). 1..8 noi 4919878
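
Step 4 above (eqi 0, jf -> jt) is a peephole rewrite, and can be sketched in Python as follows.  This is my reading of the note, not the code in icode-s.txt.  It assumes that "eqi 0" tests a value against 0, that "jf" jumps when that test is false and "jt" jumps when a value is true (non-zero), so the pair can be replaced by a single "jt" on the original value.  The tuple representation, the label "L1" and the surrounding "get"/"lit" operands are for illustration only.

```python
def peephole(code):
    """Rewrite each ('eqi', 0), ('jf', L) pair into a single ('jt', L).

    code is a list of (opcode, argument) tuples with symbolic labels,
    an assumed representation used only for this sketch.
    """
    out = []
    i = 0
    while i < len(code):
        op, arg = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if op == "eqi" and arg == 0 and nxt is not None and nxt[0] == "jf":
            # "jump if not (x == 0)" is the same as "jump if x"
            out.append(("jt", nxt[1]))
            i += 2
        else:
            out.append((op, arg))
            i += 1
    return out

before = [("get", "x"), ("eqi", 0), ("jf", "L1"), ("lit", 0)]
after = peephole(before)
print(after)
```

A real pass would also have to check that no jump lands on the "jf" itself and fix up any numeric offsets after removing an instruction; the symbolic labels here avoid that problem.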

Conclusion

As expected, the most improved part of the compiler is the parser.  Changing from "if tok == xxx ..." to "case" speeds up the parser functions by up to 5 times.  Therefore, the next major improvement to the compiler should be to rewrite the parser generator.  On the code generator side, an obvious candidate is the "put/get" sequence, but it requires careful analysis of its side effects.
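
To see why "case" helps, the two dispatch styles can be compared by counting tests.  The sketch below is in Python, not Som.  The operator names and token codes are hypothetical (the real token set of parse.som is not listed here), and it assumes that "case" is compiled into an indexed jump, which this note does not state.

```python
# Hypothetical operator tokens, coded 0..9 by position in the list.
OPS = ["plus", "minus", "star", "slash", "eq", "ne", "lt", "le", "gt", "ge"]

def classify_if_chain(tok):
    """Sequential tests, like 'if tok == xxx ...'.  Returns (name, tests)."""
    tests = 0
    for code, name in enumerate(OPS):
        tests += 1
        if tok == code:
            return name, tests
    return None, tests

def classify_case(tok):
    """Indexed dispatch standing in for 'case': one range check, one lookup."""
    if 0 <= tok < len(OPS):
        return OPS[tok], 1
    return None, 1

# Total number of tests when every token occurs exactly once.
chain_cost = sum(classify_if_chain(t)[1] for t in range(len(OPS)))
case_cost = sum(classify_case(t)[1] for t in range(len(OPS)))
print(chain_cost, case_cost)
```

The if-chain cost grows with the position of the matching token, while the indexed form stays constant.  The actual gain in the compiler depends on how often each token occurs, so this count is not meant to reproduce the measured noi.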

With the improved compiler, som v.4.1 is 20% faster than som v.4.0 on the compiler benchmark.  On all other benchmarks, however, the figures do not change much.

          v41       v4       (v4-v41)/v4   v31
compiler  4919878   6174071   20%          5093134

30 July 2008

