Happy birthday Som 2008

Today is marked as Som's birthday. It has been good many years (almost five years) that Som language and system was continuingly developed. Now it is a good time to look back and contemplate what has been accomplished.

History

4 Dec 2004	som-v1	first public release
31 Dec 2004	som-v2	second public release, with som-in-som
26 June 2005	som-v1.5	with macro and tail-call
5 Jan 2006	som-v1.7	with new VM, T-code
23 Dec 2006	som-v1.8	bug fixed v 1.7, T-code
12 Jan 2007	som-v2.4	(som-in-som for 2007) Children-day release
9 Mar 2007	som-v3	som-in-som with sx-code vm
19 Aug 2007	som-v3.1	fast sx-code vm
2 July 2008	som-v4.0	fast u-code vm
9 Aug 2008	som-v4.1	(improve u-code and compiler) Birthday release

Som project started her life in late 2003. She is based on my earlier work on many language interpreters. The basis is the stack-oriented instruction set, the very simple s-code. The whole 2004 is the developmental year to get the code base to crystallise. The first public release is on 4 December 2004. Som v.1.0 is all written in C. It took Som v2.0 so that everything is written in Som herself. The next year, 2005, we saw the development of macro (v.1.5), code optimisation, constant array and file i/o. These improvements are included in the next release (v.1.7) with the new instruction set, T-code on the new year day of 2006. With the complex t-code, Som v.2.3 which attemps to use t-code as her instruction set, is never complete. The bug fixed for v.1.7 is released by the end of the year (v.1.8). The year 2007 is the update of Som v.2.4 that bring all the updates into Som written in Som. This year also is the year of improving the execution speed of the virtual machine. Many instruction set formats have been experimented with. Som v.3.0 with the extended s-code (sx-code) is released in March. The fully decoded instruction format is released as v.3.1 in August. This version employs a lot of improvement to make the virtual machine as fast as possible. See the release history for more details.

Due to my health reason, the development was stopped for six months. In mid 2008, a new refinement is released, som v.4.0. It uses a stack-less instruction set, u-code. It is a simplification of sx-code. And today, the latest Som is released, som v.4.1. It is a gentle refinement of u-code and a lot of improvement in the compiler. To celebrate the birthday. I did benchmarking all som versions to record the development is a quantitative term.

Another view of Som development

Benchmark

The benchmark programs are chosen so that all versions can compile and run them. They are:

bubble sorts 20 items from 20..1 to 1..20.
matmul performs 8x8 matrix multiply.
queen solves all solutions of 8-queen problem.
queen2 incorporate macro (som v.1 and som v.2 do not have macro).
quick sorts 100 items 100..1 to 1..100.

These benchmarks indicate the performance of the instruction set (measuring the number of instruction executed) of various format (s-code, t-code, sx-code and u-code) plus the quality of the code generators. The running time measured the performance of the virtual machines.

Another benchmark is performed on the compiler. The source of the compiler of som v.2.0 is used (around 2000 lines of Som). All modules are concatenated into one file to be the input of the compiler. This benchmark indicates the performance of the compilers. How fast it is to compile one program. For the compiler benchmark, the performance is relative to Som v.2.0.

The number of instruction executed (noi) is a reliable metric because it is not dependent on the machine that runs the benchmark. But noi alone can not compare the quality of the implementation of the virtual machines. The running time is tricky to collect and highly variable. The results are presented as a relative measure, or the speedup, calculated by (1 - t2/t1) * 100 where t1 is the running time of som v.1.0 and t2 is the version to compare with it (v1.0 is supposed to be the slowest one). The running times are measured using the time function in C , time.h and clock( ), instrumented into the virtual machine. The program is run 3 times and the data averaged. The machine used to run all benchmarks is Dell D500, a laptop with Pentium M 1.3GHz and 1Gbytes of memory running Windows XP (SP2). All hard data can be found here.

Results

General benchmarks

Figure 1 General benchmark: the relative number of instruction executed compared to som v.1.0 (vx/v1)

The number of instruction executed (noi) of v.1.0, v.1.5, v.2.0 and v.2.4 are the same. They are s-code. The t-code of v.1.8 is less than half, so t-code is very effective. The sx-code is also fast, its noi from v.3.0 is only 62% and v.3.1 is even better at 59%. The serie 4, u-code, is similarly effective compared to sx-code.

Figure 2 General benchmark: the speeup of running time relative to som v.1.0 (1-vx/v1)(0.73 is around 4x)

When look at the running time, t-code is 64% faster than v.1.0 (around 2.7x). The virtual machine of serie 2, v.2.0 and v.2.4 (two vm are the same), is fast. It is around 40% faster than v.1.0, or 1.6x. The serie 3 sx-code has impressive result. The virtual machine of v.3.0 is 49% faster, or almost 2x. The new virtual machine for v.3.1 (with separate op and arg and top-of-stack register) is the fastest. It is 74% faster, or 3.8x. The serie 4, the virtual machine of u-code v.4.0, is 69% faster and the refined u-code of v.4.1 is as fast as the best v.3.1 at 73%, or 3.8x.

Compiler benchmark

First, the size (measuring in line of codes) of the compilers and the virtual machines. These figures indicate the complexity of the programs. The v.1.8 compiler is the largest at 3700 lines whereas the recent ones (v.4 and v4.1) are at around 2500 lines. Perhaps this reflects the complexity of t-code versus sx-code. The size of virtual machines also have this trend but they are less different.

Figure 3 The size (lines of code) of the compilers

Figure 4 The size (line of code) of the virtual machines

Figure 5 Compiler benchmark: the compiler performance relative to som v.2.0, the number of instruction executed and the speedup. noi is vx/v2. speedup is 1-vx/v2 (0.66 is around 3x)

In terms of noi (the number of instruction executed to compile the program), the newer compilers are better with the latest one (v.4.1) is only 18% of v.2.0 (it is very impressive, just 1/5). A lot of code improvement has been done on these compilers. The runtime speedup is also reflexed these improvements, with v.4.1 at 66% faster, or almost 3x. The compiler does a lot of i/o so in term of the performance of a virtual machine, it may be better than this figure.

To give some absolute number on the performance of the virtual machine, the runtime for general benchmark is reported as Million instructions per second (noi/running_time). The machine that run these benchmark is Dell D500 laptop with 1.3GHz Pentium M (single cpu) with 1Gbyte memory running Windows XP (SP2). The compiler is lcc-win32 (version Oct 2007) with no code optimisation. These figures reflexed the effect of the instruction set and the implementation of its virtual machine.

Figure 6 The performance of virtual machines (Million instruction per second)

The jump from v.3 to v.3.1 (35.7 Mips to 57.5) is the result of the engineering of the virtual machine. It is interesting to note that the serie 4 (v4 and v4.1), which employs u-code, the figure is not as good as v.3.1 (sx-code). However, in terms of running time, they are similar. It may be just the variance in the time measurement (the variance is high).

9 August 2008

PS Fig.6 has been corrected (previously Kilo instruction per second, now Million instruction per second) 24 Nov 2009