som v3.1

This is som v3 with an improved vm.  The new vm uses fully decoded opcodes and arguments.  It also employs a few techniques to speed up the execution of sx-code.  (I expected it to be about 30% faster.)

1.  op and arg are fully decoded (no decoder)
2.  use tos register (need a few special codes)
3.  all jumps are absolute
4.  faster access to local variables
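
To make the four points above concrete, here is a minimal sketch in C of a dispatch loop over two parallel arrays with the top of stack cached in a variable.  It is not the som v3.1 vm: the opcode names, the stack and local sizes, and the function run() are all made up for illustration, and the flat local[] array is only a stand-in for whatever the real vm does for local variables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration only; not the real som v3.1 set. */
enum { OP_LIT, OP_GET, OP_PUT, OP_ADD, OP_LT, OP_JMP, OP_JF, OP_END };

enum { STACK_SIZE = 64, NUM_LOCALS = 8 };

/* Run code held in two parallel arrays (opcode and argument already
   separated, so there is no decode step) and return the final top of stack.
   Well-formed code is assumed: pops and local indexes are not checked. */
int run(const int xop[], const int xarg[], size_t n)
{
    int stack[STACK_SIZE];        /* everything below the top of stack */
    int local[NUM_LOCALS] = {0};  /* assumed flat array of local variables */
    int sp = 0;                   /* next free slot in stack[] */
    int tos = 0;                  /* top of stack cached in a variable */
    size_t ip = 0;

    for (;;) {
        assert(ip < n && sp >= 0 && sp < STACK_SIZE);
        int op = xop[ip];
        int arg = xarg[ip];
        ip++;
        switch (op) {
        case OP_LIT: stack[sp++] = tos; tos = arg;        break;
        case OP_GET: stack[sp++] = tos; tos = local[arg]; break;
        case OP_PUT: local[arg] = tos;  tos = stack[--sp]; break;
        case OP_ADD: tos = stack[--sp] + tos;             break;
        case OP_LT:  tos = stack[--sp] < tos;             break;
        case OP_JMP: ip = (size_t)arg;                    break; /* absolute target */
        case OP_JF:                                       /* jump if false */
            if (!tos)
                ip = (size_t)arg;
            tos = stack[--sp];
            break;
        case OP_END: return tos;
        default:     assert(0 && "unknown opcode"); return 0;
        }
    }
}
```

Binary operators such as OP_ADD touch stack[] only once because one operand is already in tos, and a jump is a plain assignment to ip with no offset arithmetic.  For example, the four instructions LIT 2, LIT 3, ADD, END leave 5 on top.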

The disadvantage is that the size of the code segment doubles (as two arrays, xop[] and xarg[], replace cs[]).  The object file format has changed.  The object code version is som 3.1 (magic = 5678931).
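
The size trade-off can be sketched as follows.  The packed layout used here (opcode in the low 8 bits, signed argument in the upper 24 bits) is purely an assumption for illustration, as are the names pack() and decode_all(); the actual cs[] encoding is not described in this note.  The point is only that one packed word per instruction becomes two words once opcode and argument are stored separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED packed format: (arg << 8) | op, with a signed 24-bit arg. */
uint32_t pack(int op, int32_t arg)
{
    return ((uint32_t)arg << 8) | ((uint32_t)op & 0xffu);
}

/* Split n packed words from cs[] into xop[] and xarg[] once, at load time,
   so that the execution loop never has to decode. */
void decode_all(const uint32_t cs[], int32_t xop[], int32_t xarg[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t a = (int32_t)(cs[i] >> 8);   /* 24-bit field, 0..0xFFFFFF */
        if (a & 0x800000)
            a -= 0x1000000;                  /* sign-extend the 24-bit value */
        xop[i] = (int32_t)(cs[i] & 0xffu);
        xarg[i] = a;
    }
}
```

With this layout, n instructions take n words in cs[] but 2n words in xop[] plus xarg[], which is the doubling mentioned above.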

The base vm is developed from som-v16s, with all the experiments on the new vm in the directory som-v16t.  The bootstrapping steps (converging the som code) are recorded in som-v16s/som1 ... som2.. som3.. som4 there.

Present limitations

1. I feel that the compiler does too much unnecessary work (especially when converting to executable code).  I have no better idea yet.  The listing is better, as it does not need to search for the symbol (because at the time of listing, the argument is the index into the symbol table).

2. Macros are tyMAC only.  There are no full macros.  Forward declarations cause a lot of complexity, so it is easier not to allow full macros for the time being.  (The compiler does not use them anyway.)

Benchmark result

Compared to som-v3 (from som-v16t/doc/fast-result.txt),
it is about 1.4x faster on average.

(speedup % = time_v3/time_v31 - 1)

quick                 48.1%
hanoi                 28.7%
bubble                48.9%
queen2                59.7%
compiler              26.8% 
compiler (no output)  42.5%
                  avg 42.4%
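
The speedup formula used for the table above can be written as a small C helper.  The function names are made up for illustration, and no timings are given here because the raw times are not part of this note.

```c
#include <assert.h>

/* speedup % = (time_v3 / time_v31 - 1) * 100 */
double speedup_percent(double time_v3, double time_v31)
{
    assert(time_v31 > 0.0);
    return (time_v3 / time_v31 - 1.0) * 100.0;
}

/* Inverse view: how many times faster a given speedup % is. */
double factor_from_percent(double percent)
{
    return 1.0 + percent / 100.0;
}
```

By this definition a speedup of 100% means running twice as fast, and the 42.4% average corresponds to a time ratio of about 1.42.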

21 Aug 2007
