Analysing Som v.4.1 compiler

Here is the profile of running the compiler benchmark.  Only the functions that use more than 70K instructions are shown, sorting by the noi used. (for the full profile see /doc/som41-comp-rep.txt)

      name      (ref)         cnt        noi
           lex1 (2112)        9098     713828
        fprints (214)         7950     434521
        newcell (644)        20944     397936
        strpack (308)         5315     271787
           outM (6814)           2     203510
         prCode (6572)        5589     201848
          token (1818)        6195     177840
            bop (5728)        3747     174543
            lex (2778)       10019     162493
          genex (8676)        5526     148805
           hash (1064)        5129     147210
          streq (60)          5265     138446
          reloc (10338)          1     125774
           cons (746)        11200     123200
        install (1208)        5129     107172
           outa (6000)        5741     103338
       showCode (6766)           1      95020
          ypush (3318)        8621      94831
           ypop (3346)        8621      94831
        newatom (708)         7770      85470
           term (5368)        4243      71156
 

compare to som v.3.1 compiler, v41 is faster (less noi) in:
lex1, lex, fprints, strpack, token, genex, hash, install

newcell, ypush are slower due to no-stack (must use extra "put").

The conclusion is that the code generator is quite good and the u2 instruction set is reasonable.  The som v.4.1 code generator produces as efficient code as som v.3.1.  The other difference is due to the way the compiler is written.  Both compilers are different because they produce different codes.

To improve the compiler, one must look into how it is written and change the way the compiler work. The best place to start is to look at the profile.

improvement

Start with the most frequently used function: (starting with noi 5236307)

1) fprints (string-s.txt), change to use nested if.  noi 5231285
2) strpack (string-s.txt), change to use nested if. (1)+(2) noi 5215634
x) newcell (list-s.txt), no way to improve
3) lex1 (token-s.txt), use macro and/or in hot spots. 1..3 noi 5162367
4) outM (icode-s.txt), opt. eqi 0, jf -> jt. 1..4 noi 5133416
5) prCode (icode-s.txt), use macro to decode arg, 1..5 noi 5100835
x) token (token-s.txt), no way to improve
x) lex (token-s.txt), no way to improve
6) bop (parse.som), this is an interesting case.  This function is generated by a parser generator. Hand writing some part of it is not advisable (as the new one can be generated from a new grammar). However, to see the effect of using "case" to replace a long sequence of "if tok == xxx ..." which should have a dramatic speed up, I hand code just this function.  A better way is to write a new parser generator in Som to generate a "case" list for it. 1..6 noi 4957654
  As expected, bop cnt 3747 noi 174543 is reduced to mere 31896, 5 times reduction!
x) genex (gencode-s.txt), is too complex to be tampered with.
x) hash (symtab-s.txt), no way to improve
7) streq (string-s.txt), opt. jmp in macro, jx to lit 0, jt, $z => jx.z, 1..7 noi 4969571.  This is strange, it is slower than 1..6! It should not be so as "improv" where additional code for opt. is appended, is used only 294 times (noi 63343).  However the profile revealed that, streq is improved a bit (noi -1226) but improv is worsen (noi +4582) so there is net loss. I rewrite "improv" (there is a bug on cascade or), now noi is 4956131.
x) cons (list-s.txt), no way to improve
x) reloc (main-s.txt), no way to improve
x) install (symtab-s.txt), has a few put.x/get.x, can be optimised.  However, I am not certain that it is totally save to do so.  Therefore, the opt. is not done.
8) term (parse.som), rewrite to use case, its noi drops from (cnt 4243) 71156 to 34903 (1/2), 1..8 noi 4919878

Here is the profile after all  improvements above: (see the full text at /doc/som44-rep.txt)

total noi 4919789

    name       (ref)         cnt        noi
           lex1 (2112)        9098     642395
        fprints (214)         7944     417746
        newcell (644)        20944     397936
        strpack (308)         5315     256136
           outM (6848)           2     190918
          token (1818)        6195     177840
         prCode (6606)        5583     169045
            lex (2778)       10019     161959
          genex (8736)        5526     148805
           hash (1064)        5129     147210
          streq (60)          5265     137220
          reloc (10396)          1     125645
           cons (746)        11200     123200
        install (1208)        5129     107172
           outa (5996)        5741     103338
       showCode (6800)           1      94918
          ypush (3306)        8621      94831
           ypop (3334)        8621      94831
       
So the result is 4919878 instructions versus 5236396 in the beginning.  This is 6% reduction in noi. from the effort of rewriting the source code (an afternoon job) and a bit of code generator improvement. Because the aim is to improve it until it is comparable to som v.3.1 (noi 5093134) , the effort is stopped here.

5 Aug 2008