Som v 5.1
The previous release (Som v5.0) introduced t2-code, making it the fastest
Som VM to date. However, t2-code has quite a wide instruction: 96
bits. The aim of this version is simple: to redesign the
instructions to fit into 64 bits, and to achieve that without
sacrificing performance. 64 bits is a more natural size for today's
machines (year 2010).
The design for t2-64 code is
straightforward. It is similar to t2-code; only the instruction format
is changed. The new format fits two arguments
into the first 32-bit word and one argument into the second 32-bit word.
To give as many bits as possible to the two argument fields, the first
word is divided into a 16-bit field, a 10-bit field, and the 6-bit
opcode. An argument too large to fit into the 16-bit or 10-bit field
must first be "mov"ed into place by an extra "mov" instruction, which
has larger argument sizes: 26 bits and 32 bits.
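The layout above can be sketched as a pair of 32-bit words. This is only an illustration, assuming one particular ordering of the fields within the first word; the text fixes only the field widths (6-bit opcode, 16-bit and 10-bit arguments, one argument in the second word), not their positions.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the t2-64 instruction layout. The bit positions are an
 * assumption; only the field widths come from the description above. */
typedef struct {
    uint32_t w0;  /* 6-bit opcode | 16-bit argument | 10-bit argument */
    uint32_t w1;  /* one 32-bit argument */
} T2_64;

T2_64 t2_64_encode(uint32_t op, uint32_t a16, uint32_t a10, uint32_t a32) {
    /* Arguments must fit their fields; larger values would need an
     * extra "mov" with a 26-bit or 32-bit argument first. */
    assert(op < (1u << 6) && a16 < (1u << 16) && a10 < (1u << 10));
    T2_64 i = { op | (a16 << 6) | (a10 << 22), a32 };
    return i;
}

uint32_t t2_64_op (T2_64 i) { return i.w0 & 0x3fu; }
uint32_t t2_64_a16(T2_64 i) { return (i.w0 >> 6) & 0xffffu; }
uint32_t t2_64_a10(T2_64 i) { return (i.w0 >> 22) & 0x3ffu; }
uint32_t t2_64_a32(T2_64 i) { return i.w1; }
```

Decoding is then three shifts and masks on the first word, which is part of what keeps the format cheap to dispatch.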
The result: the executable size for all benchmarks is about 30% smaller
than with t2-code (not surprising!). In terms of execution speed (noi),
t2-64 is 1% slower on the small benchmarks and 10% slower on the medium
benchmarks. In terms of wall clock time, t2-64 is 12% slower averaged
over all benchmarks. See below for all the hard data.
The main difficulty in doing this cross-compilation (from Som v5 to Som
v5.1) is, as always, the intricacy of immediate execution (especially
"loadfile"). I should rethink how to make this "cross" simpler, or
at least easier to understand.
Performance Data
The performance of Som v5.1 is compared to v5.0. To measure pure
performance, the number of instructions executed (noi) is used.
noi          v5        v51       1-(v5/v51)
bubble       3385      3363      -0.007
matmul       6009      5918      -0.015
queen2       131394    141993     0.07
quick        18338     18337      0.000
avg small                         0.013
aes4         10101     9892      -0.021
lexer        212746    242628     0.123
parser       593632    686615     0.135
som-v2       1475754   1771187    0.167
avg medium                        0.10
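The last column in both tables is the relative slowdown 1-(v5/v51): positive means v5.1 executed more instructions (or took longer) than v5.0. A minimal sketch of that metric:

```c
/* Relative slowdown as tabulated: 1 - (v5 / v51).
 * Positive means v5.1 is slower than v5.0. */
double slowdown(double v5, double v51) {
    return 1.0 - v5 / v51;
}
/* e.g. the bubble row above: slowdown(3385, 3363) is about -0.007 */
```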
The runtime
is measured by the function clock() in the C library while running the
Som virtual machine. The running time (in ms) is measured on an HP
nc6400 notebook: Core 2 T7200, 2 GHz, 1 GB RAM. To measure runtime,
each benchmark program is executed a number of
times. Three measurements with consistent results are averaged.
runtime (ms)   v5    v51   1-(v5/v51)
bubble x1000   114   125    0.088
matmul x1000   188   172   -0.093
queen2 x100    287   307    0.065
quick x1000    323   386    0.163
aes4 x1000     234   245    0.045
lexer x10      161   193    0.166
parser x1       31    47    0.340
som-v2 x1      151   183    0.175
avg all                     0.12
Summary
noi, small benchmarks: v5.1 is 1% slower than v5.0; medium benchmarks:
v5.1 is 10% slower than v5.0.
Wall clock time: v5.1 is slower than v5.0 by 12% averaged over all
benchmarks.
25 December 2010
Merry Christmas