Som v 5.1

The previous release (Som v 5.0) introduced t2-code, the fastest Som VM to date.  t2-code has quite a wide instruction: 96 bits.  The aim of this version is simple: to redesign the instructions to fit into 64 bits, and to do so without sacrificing performance.  64 bits is a more natural size for today's machines (year 2010).

The design of t2-64 code is straightforward.  It is similar to t2-code; only the format is changed.  The new format fits two arguments into the first 32-bit word and one argument into the second 32-bit word.  To give the two argument fields as many bits as possible, the first word is divided into 16-bit, 10-bit and 6-bit (opcode) fields.  An argument that is too large to fit into 16 or 10 bits must be handled by an extra "mov" instruction, which has larger argument fields: 26-bit and 32-bit.
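
The format described above can be sketched in C as follows.  This is only an illustration, not the actual Som source: the field widths (16, 10, 6 and 32 bits) come from the text, but the order of the fields inside the first word, the unsigned treatment of arguments, and all names (Instr64, encode, fits, get_op, ...) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the first word: [arg1:16][arg2:10][op:6].
   The text gives only the widths, not the order of the fields. */
enum { OP_BITS = 6, ARG2_BITS = 10, ARG1_BITS = 16 };

typedef struct {
    uint32_t w0;  /* arg1 (16 bits), arg2 (10 bits), opcode (6 bits) */
    uint32_t w1;  /* third argument, full 32 bits */
} Instr64;

/* True if v fits in an unsigned field of the given width (width < 32).
   Arguments are treated as unsigned here, which is an assumption. */
static inline int fits(uint32_t v, unsigned bits) {
    return v < (UINT32_C(1) << bits);
}

/* Pack one instruction; the caller must make sure the fields fit. */
static inline Instr64 encode(uint32_t op, uint32_t arg1,
                             uint32_t arg2, uint32_t arg3) {
    assert(fits(op, OP_BITS));
    assert(fits(arg1, ARG1_BITS));
    assert(fits(arg2, ARG2_BITS));
    Instr64 i;
    i.w0 = (arg1 << (ARG2_BITS + OP_BITS)) | (arg2 << OP_BITS) | op;
    i.w1 = arg3;
    return i;
}

/* Field extraction, the inverse of encode(). */
static inline uint32_t get_op(Instr64 i) {
    return i.w0 & ((UINT32_C(1) << OP_BITS) - 1);
}
static inline uint32_t get_arg2(Instr64 i) {
    return (i.w0 >> OP_BITS) & ((UINT32_C(1) << ARG2_BITS) - 1);
}
static inline uint32_t get_arg1(Instr64 i) {
    return i.w0 >> (ARG2_BITS + OP_BITS);
}
static inline uint32_t get_arg3(Instr64 i) {
    return i.w1;
}
```

In a code generator built along these lines, a check such as fits(arg, ARG1_BITS) would decide whether an argument can be encoded directly or needs the extra "mov" instruction first.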

The result:  the executables for all benchmarks are 30% smaller than with t2-code (not surprising!).  In terms of execution speed, measured by the number of instructions executed (noi), t2-64 is 1% slower on the small benchmarks and 10% slower on the medium benchmarks.  In terms of wall clock time, t2-64 is 12% slower, averaged over all benchmarks.  See below for the hard data.

The main difficulty in doing this cross-compilation (from som v5 to som v5.1) is, as always, the intricacy of immediate execution (especially "loadfile").  I should rethink how to make this "cross" simpler, or at least easier to understand.

Performance Data

The performance of Som v 5.1 is compared to v 5.0.  To measure pure performance, the number of instructions executed (noi) is used.  The last column, 1-(v5/v51), is positive when v5.1 executes more instructions than v5.0.
                      noi
                   v5       v51    1-(v5/v51)
bubble           3385      3363    -0.007
matmul           6009      5918    -0.015
queen2         131394    141993     0.07
quick           18338     18337     0.000
avg small                           0.013

aes4            10101      9892    -0.021
lexer          212746    242628     0.123
parser         593632    686615     0.135
som-v2        1475754   1771187     0.167
avg medium                          0.10
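
The last column and the averages can be recomputed with a few lines of C.  This is a minimal sketch; the function names rel_change and avg_rel_change are made up for the example, and the average is assumed to be the plain arithmetic mean of the per-benchmark values.

```c
#include <assert.h>
#include <stddef.h>

/* Relative change as used in the table: 1 - (v5 / v51).
   Positive means v5.1 needs more instructions than v5.0. */
static inline double rel_change(double v5, double v51) {
    assert(v51 > 0.0);
    return 1.0 - v5 / v51;
}

/* Arithmetic mean of the per-benchmark relative changes
   (assumed to be how the "avg" figures are formed). */
static inline double avg_rel_change(const double *v5, const double *v51,
                                    size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += rel_change(v5[k], v51[k]);
    return sum / (double)n;
}
```

For example, calling avg_rel_change on the four small benchmarks (bubble, matmul, queen2, quick) with the noi values from the table gives a value close to the 0.013 shown.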

The runtime is measured with the clock( ) function of the C library while running the Som virtual machine.  The running time (in ms) is measured on an HP nc6400 notebook (Core 2 T7200, 2 GHz, 1 GB RAM).  To measure runtime, each benchmark program is executed a number of times (the multiplier shown after its name).  Three measurements with consistent results are averaged.
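
A measurement of this kind might look like the sketch below.  It is not the harness used for the numbers here: time_ms, mean3 and the placeholder workload count_call are hypothetical names, and a real run would call the virtual machine on a benchmark instead.  Note that in standard C, clock( ) reports the processor time used by the program.

```c
#include <assert.h>
#include <time.h>

/* Run fn(arg) `repeats` times and return the elapsed time in ms,
   as reported by the standard clock() function. */
static inline double time_ms(void (*fn)(void *), void *arg, int repeats) {
    assert(repeats > 0);
    clock_t start = clock();
    for (int k = 0; k < repeats; k++)
        fn(arg);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}

/* Mean of three measurements, as in "three measurements are averaged". */
static inline double mean3(double a, double b, double c) {
    return (a + b + c) / 3.0;
}

/* Placeholder workload (hypothetical): it only counts its calls.
   A real measurement would run one benchmark on the VM here. */
static inline void count_call(void *arg) {
    int *calls = arg;
    (*calls)++;
}
```

With these helpers, a line such as "bubble x1000" would correspond to time_ms(run_bubble, vm, 1000), repeated three times and passed through mean3, where run_bubble and vm stand for whatever actually starts the benchmark.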

                    runtime (ms)
                     v5    v51    1-(v5/v51)
bubble x1000        114    125     0.088
matmul x1000        188    172    -0.093
queen2 x100         287    307     0.065
quick x1000         323    386     0.163
aes4 x1000          234    245     0.045
lexer x10           161    193     0.166
parser x1            31     47     0.340
som-v2 x1           151    183     0.175
avg all                            0.12

Summary 

noi: on small benchmarks v5.1 is 1% slower than v5.0; on medium benchmarks v5.1 is 10% slower than v5.0.
wall clock time: v5.1 is 12% slower than v5.0, averaged over all benchmarks.

25 December 2010
Merry Christmas