The origin of Som
(as gleaned from early-som and a1-all)
Som is a continuation of the development of the A1 interpreter
family. All of these interpreters trace back to my days in Edinburgh
(1990-1991), when I wrote the F language (F is from FORTH-like). F is a
postfix language, and I added local variables to avoid explicit
manipulation of values on the evaluation stack (which is the hallmark
of FORTH). To improve the performance of the interpreter (computers
were slow then, 1980s era), the source language is translated into
executable intermediate code. This started the series of designs of
intermediate codes and their interpreters (virtual machines).
Here is an example from F1 (the first of the F family):
;; sum from start to end use an accumulator variable
;; a start, b end, s accum
: sum3 x 3 var
a b = if
s ret ;
a 1 + b a s + tro
end
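The translate-then-interpret idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the F1 implementation: the mini postfix language, the functions `translate` and `run`, and the use of the opcode names "lit", "get", "add", "sub", "mul" and "ret" are all hypothetical choices made for this sketch.

```python
# Hypothetical sketch: a postfix expression with named local variables is
# translated once into intermediate code, which a small stack machine runs.

def translate(tokens, local_names):
    """Translate postfix tokens into a list of (opcode, argument) pairs."""
    slots = {name: i for i, name in enumerate(local_names)}
    binary = {"+": "add", "-": "sub", "*": "mul"}
    code = []
    for tok in tokens:
        if tok in binary:
            code.append((binary[tok], 0))
        elif tok in slots:
            code.append(("get", slots[tok]))   # push a local variable
        else:
            code.append(("lit", int(tok)))     # push a literal number
    code.append(("ret", 0))
    return code


def run(code, local_values):
    """Execute intermediate code on an evaluation stack."""
    stack = []
    pc = 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == "lit":
            stack.append(arg)
        elif op == "get":
            stack.append(local_values[arg])
        elif op in ("add", "sub", "mul"):
            b = stack.pop()
            a = stack.pop()
            if op == "add":
                stack.append(a + b)
            elif op == "sub":
                stack.append(a - b)
            else:
                stack.append(a * b)
        elif op == "ret":
            return stack.pop()
        else:
            raise ValueError("unknown opcode: " + op)


# Usage: compute (a + 1) * b with a = 2 and b = 5.
code = translate("a 1 + b *".split(), ["a", "b"])
result = run(code, [2, 5])
```

Because locals are addressed by slot number, the program never needs stack-shuffling words to reach a value, and the name lookup is done once at translation time rather than on every execution.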
I worked with a master's student (around 1997) on a concurrent language
called R1. This work became the starting point of the written
(published) record of my work on language and instruction set design. The R1
virtual machine is based on byte-code instructions executed on a
stack-based machine (zero addressing). The R1 language is a C-like
language without types. Later, R1 was simplified by stripping out all the
concurrency support, and it became Rz. I used Rz for teaching
in several computer architecture classes, where students were
shown a working compiler. Here is an example of the R1 language, a bubble
sort function:
sort() {
  i = maxdata;
  while(i) {
    j = 1;
    while(j < i) {
      if( data[j] < data[j+1]) swap(j,j+1);
      j = j+1;
    }
    i = i-1;
  }
}
Around that time, I studied language implementation in Kamin's textbook
(* Prof. Samuel Kamin of the Department of Computer Science, University
of Illinois at Urbana-Champaign, Programming Languages: An
Interpreter-Based Approach, Addison-Wesley, 1990). I rewrote Kamin's
interpreter in C (the original is written in Pascal) and started to
think about a language design based on infix syntax rather than
Kamin's LISP-like language, which requires a lot of parentheses. The
result is the A language series (2002). Here is an early example
of the language. This is a bubble sort function:
: put:at a i v = store a+i v
: get:at a i = fetch a+i
: sort | i j =
  [i = 0
  while i < maxdata
    [j = 0
    while j < (maxdata - 1)
      [if (get:at data j+1) < (get:at data j)
        swap j j+1
        syscall 3
      j = j + 1]
    i = i + 1]]
Please note the syntax of this language. [ ] are used to group
statements. Array access is via the "store" and "fetch"
operators. The "if" operator must have an else-clause; in this
example, "syscall 3" is a no-op operator that fills that role. The
intermediate codes of the earliest design are:
add sub mul div band bor
bxor eq lt ldi sti ret array
add1 sub1 eq0 eq1 ret0
ret1 end get put ld st
jmp jt jf lit fun call
calli callt inc jeq jne jlt jge
0jmp 1jmp sys (40)
The instruction format is a fixed 32 bits: 8 bits for the operation code and
24 bits for an optional argument ( arg:24 op:8 ). This is a simple
design with good efficiency: fetching and decoding an instruction are
fast. This is an improvement over the byte-code instructions of R1.
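To make the fixed 32-bit format concrete, here is a minimal sketch of packing and unpacking one instruction word. It rests on assumptions not stated in the text: that "arg:24 op:8" means the argument occupies the upper 24 bits and the operation code the lower 8 bits, and that the argument is a signed value. The names `encode` and `decode` and the opcode numbers in the usage lines are hypothetical.

```python
# Hypothetical sketch of a fixed 32-bit instruction word: arg:24 op:8.
# Assumed layout: argument in the upper 24 bits, opcode in the lower 8 bits.

OP_BITS = 8
ARG_BITS = 24
OP_MASK = (1 << OP_BITS) - 1       # 0xFF
ARG_MASK = (1 << ARG_BITS) - 1     # 0xFFFFFF
ARG_SIGN = 1 << (ARG_BITS - 1)     # sign bit of the 24-bit argument


def encode(op, arg=0):
    """Pack an opcode and an optional signed 24-bit argument into one word."""
    if not 0 <= op <= OP_MASK:
        raise ValueError("opcode does not fit in 8 bits")
    if not -ARG_SIGN <= arg < ARG_SIGN:
        raise ValueError("argument does not fit in 24 bits")
    return ((arg & ARG_MASK) << OP_BITS) | op


def decode(word):
    """Unpack a 32-bit word into (opcode, signed argument)."""
    op = word & OP_MASK
    arg = (word >> OP_BITS) & ARG_MASK
    if arg & ARG_SIGN:             # sign-extend negative arguments
        arg -= 1 << ARG_BITS
    return op, arg


# Usage: opcode numbers 5 and 7 are made up for the example.
word = encode(5, 1000)
op, arg = decode(word)             # op == 5, arg == 1000
back = decode(encode(7, -3))       # (7, -3)
```

Decoding takes one mask and one shift, with no variable-length operand to scan, which is the kind of cost the text contrasts with a byte-code stream.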
The Som project started in 2003. It is a refinement of the A language and
its instruction set. The most notable changes are the inclusion of "for"
and a more familiar syntax for array access. Indentation is used
for grouping statements to reduce the use of "parentheses". Also, [ ]
are now used for array indexing. Here is the bubble sort function in Som
syntax:
to sort | i j =
  for i 0 maxdata-1
    for j 0 maxdata-2
      if data[j+1] < data[j]
        swap j j+1
The instruction set has been streamlined and named "S-code". The
original set is:
add sub mul div band bor
bxor not eq ne lt le ge gt
shl shr mod ldx stx ret
retv array end get put ld st
jmp jt jf lit call callt
inc dec sys case fun
The arithmetic and logic operators are more complete than those of the
intermediate code of the A language. There are no test-and-jump or other
combined codes. The instruction "case" was introduced to improve
the efficiency of the multi-way branch used in writing the virtual machine
itself. The first implementation was written in C. It was released to
the public by the end of 2004 as Som version 1. The Som system was then
refined and rewritten in the Som language and released on New Year 2005 as
Som version 2.
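The remark about "case" can be illustrated with a small comparison. How S-code actually encodes "case" is not described here, so the sketch below simply assumes a jump table indexed by the selector value and contrasts it with a chain of compare-and-jump steps. The functions `branch_by_chain` and `branch_by_case` and the target addresses are hypothetical.

```python
# Hypothetical sketch: multi-way branch as a chain of tests versus one
# indexed lookup (assumed here to be what a "case" instruction provides).

def branch_by_chain(selector, targets):
    """Test the selector against 0, 1, 2, ... in turn.

    Returns (target, number_of_tests); target is None if nothing matches.
    """
    tests = 0
    for key, target in enumerate(targets):
        tests += 1
        if selector == key:
            return target, tests
    return None, tests


def branch_by_case(selector, targets):
    """Jump-table dispatch: one bounds check, then one indexed lookup."""
    if 0 <= selector < len(targets):
        return targets[selector], 1
    return None, 1


# Usage: four made-up branch targets, selecting the last one.
targets = [10, 20, 30, 40]
chain = branch_by_chain(3, targets)   # (40, 4): four tests to reach it
case = branch_by_case(3, targets)     # (40, 1): a single lookup
```

An interpreter's main loop selects among dozens of opcodes on every instruction, so replacing a chain whose cost grows with the number of alternatives by a constant-time lookup matters most exactly there.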
last update 7 Jan 2011