Som v 5.1

The previous release (Som v 5.0) introduced t2-code, the fastest Som VM to date. A t2-code instruction is quite wide: 96 bits. The aim of this version is simple: to redesign the instructions to fit into 64 bits without sacrificing performance. 64 bits is a more natural size for today's machines (year 2010).

The design of t2-64 code is straightforward. It is similar to t2-code; only the format changes. The new format fits two arguments into the first 32-bit word and one argument into the second 32-bit word. To give the two argument fields as many bits as possible, the first word is divided into a 16-bit argument, a 10-bit argument and a 6-bit opcode. An argument that is too large to fit into 16 or 10 bits has to be handled by an extra "mov" instruction, which has large argument fields: 26-bit and 32-bit.
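The format described above can be sketched in C as follows. This is only a minimal sketch, not the actual Som source: the text gives the field widths but not their positions, so the bit layout, the unsigned treatment of arguments, and every name used here (T2Instr, t2_pack, t2_pack_wide, t2_fits and the accessors) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the first 32-bit word (the text gives only the widths):
   [31..16] arg1 (16 bits) | [15..6] arg2 (10 bits) | [5..0] opcode (6 bits) */
typedef struct {
    uint32_t w1;   /* arg1, arg2 and opcode */
    uint32_t w2;   /* third argument, a full 32-bit word */
} T2Instr;

/* Pack a normal instruction; every field must already fit its width. */
static T2Instr t2_pack(unsigned op, unsigned a1, unsigned a2, uint32_t a3) {
    T2Instr i;
    assert(op < 64u && a1 < 65536u && a2 < 1024u);
    i.w1 = ((uint32_t)a1 << 16) | ((uint32_t)a2 << 6) | (uint32_t)op;
    i.w2 = a3;
    return i;
}

/* Wide "mov" form (assumed reading of the text): the 16-bit and 10-bit
   fields merge into one 26-bit argument, the second word stays 32-bit. */
static T2Instr t2_pack_wide(unsigned op, uint32_t a26, uint32_t a32) {
    T2Instr i;
    assert(op < 64u && a26 < (1u << 26));
    i.w1 = (a26 << 6) | (uint32_t)op;
    i.w2 = a32;
    return i;
}

/* Field accessors for the assumed layout. */
static unsigned t2_op(T2Instr i)    { return i.w1 & 0x3Fu; }
static unsigned t2_arg1(T2Instr i)  { return (i.w1 >> 16) & 0xFFFFu; }
static unsigned t2_arg2(T2Instr i)  { return (i.w1 >> 6) & 0x3FFu; }
static uint32_t t2_arg3(T2Instr i)  { return i.w2; }
static uint32_t t2_arg26(T2Instr i) { return i.w1 >> 6; }

/* 1 if v fits an unsigned field of the given width, else 0
   (0 means the code generator would need the extra "mov"). */
static int t2_fits(uint32_t v, unsigned bits) {
    return bits >= 32u || v < ((uint32_t)1 << bits);
}
```

Under these assumptions, a code generator would call t2_fits(arg, 16) or t2_fits(arg, 10) for each of the first two arguments and emit the wide "mov" form only when the check fails, which is where the extra instructions in the noi figures would come from.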

The result: the executable size of every benchmark is 30% smaller than with t2-code (not surprising!). In terms of the number of instructions executed (noi), t2-64 is slower by 1% on the small benchmarks and by 10% on the medium benchmarks. In terms of wall clock time, t2-64 is 12% slower averaged over all benchmarks. The hard data is given below.

The main difficulty in doing this cross-compilation (from Som v5 to Som v5.1) is, as always, the intricacy of immediate execution (especially "loadfile"). I should rethink how to make this "cross" simpler, or at least easier to understand.

Performance Data

The performance of Som v 5.1 is compared to v 5.0. To measure pure performance, the number of instructions executed (noi) is used. In the last column, a positive value means v 5.1 executes more instructions than v 5.0.
                    noi
               v5        v51    1-(v5/v51)
bubble        3385       3363     -0.007
matmul        6009       5918     -0.015
queen2      131394     141993      0.07
quick        18338      18337      0.000    avg small  0.013

aes4         10101       9892     -0.021
lexer       212746     242628      0.123
parser      593632     686615      0.135
som-v2     1475754    1771187      0.167    avg medium 0.10
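The last column and the group averages can be recomputed from the raw counts. The following is a small sketch of that arithmetic; the function names slowdown and mean are hypothetical and are not part of Som.

```c
#include <stddef.h>

/* Relative slowdown of v5.1 against v5.0: 1 - (v5 / v51).
   Positive: v5.1 needs more instructions (or time) than v5.0. */
static double slowdown(double v5, double v51) {
    return 1.0 - v5 / v51;
}

/* Arithmetic mean of n values, used for the "avg" entries. */
static double mean(const double *x, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}
```

For example, slowdown(212746, 242628) reproduces the 0.123 shown for lexer, and the mean of the four values for bubble, matmul, queen2 and quick rounds to the 0.013 reported as "avg small".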

The runtime is measured with the clock() function of the C library while running the Som virtual machine. The running time (in ms) is measured on an HP nc6400 notebook (Core 2 T7200, 2 GHz, 1 GB RAM). To measure runtime, each benchmark program is executed a number of times (the multiplier next to its name in the table). Three measurements with consistent results are averaged.
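A measurement of this kind can be sketched in C as below. This is an assumed reconstruction of the procedure, not the code actually used: the Workload type, time_ms, average3 and dummy_workload are hypothetical names, and dummy_workload merely stands in for running one benchmark on the VM. Note that the C standard defines clock() as processor time used by the program, although some C runtimes report elapsed time instead.

```c
#include <time.h>

/* A benchmark run, represented as a function taking no arguments. */
typedef void (*Workload)(void);

/* Run the workload reps times and return the elapsed clock() time in ms. */
static double time_ms(Workload run, int reps) {
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        run();
    }
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* Average of three measurements with consistent results. */
static double average3(double a, double b, double c) {
    return (a + b + c) / 3.0;
}

/* Placeholder workload (hypothetical): a short busy loop. */
static void dummy_workload(void) {
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 100000UL; i++) {
        x += i;
    }
}
```

A row such as "bubble x1000" would then correspond to calling time_ms with reps set to 1000 three times and passing the three results to average3.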

                    runtime (ms)
                  v5     v51    1-(v5/v51)
bubble x1000      114    125      0.088
matmul x1000      188    172     -0.093
queen2 x100       287    307      0.065
quick x1000       323    386      0.163
aes4 x1000        234    245      0.045
lexer x10         161    193      0.166
parser x1          31     47      0.340
som-v2 x1         151    183      0.175    avg all 0.12

Summary

noi, small benchmarks: v5.1 is 1% slower than v5.0; medium benchmarks: v5.1 is 10% slower than v5.0.
wall clock time: v5.1 is 12% slower than v5.0, averaged over all benchmarks.

25 December 2010
Merry Christmas