The origin of Som

(as gleaned from early-som and a1-all)

Som continues the development of the A1 interpreter family.  All of these interpreters trace back to my time in Edinburgh (1990-1991), when I wrote the F language (F stands for FORTH-like). F is a postfix language, to which I added local variables to avoid explicit manipulation of values on the evaluation stack (the hallmark of FORTH). To improve the performance of the interpreter (computers were slow then, 1980s era), the source language is translated into executable intermediate code.  This started a series of designs of intermediate codes and their interpreters (virtual machines).

Here is an example from F1 (the first of the F family):

;; sum from start to end use an accumulator variable
;;    a start, b end, s accum
: sum3 x 3 var
  a b = if
    s ret ;
  a 1 +  b  a s +  tro  
end

I worked with a master's student (around 1997) on a concurrent language called R1. This work became the starting point of the published record of my work on language and instruction set design.  The R1 virtual machine is based on byte-code instructions executed on a stack-based machine (zero addressing).  The R1 language is a C-like language without types. Later, R1 was simplified by stripping out all the concurrency support, and it became Rz.  I used Rz for teaching in several computer architecture classes, where students were shown a working compiler. Here is an example of the R1 language, a bubble sort function:

sort() {
  i = maxdata;
  while(i) {
    j = 1;
    while(j < i) {
      if ( data[j] < data[j+1]) swap(j,j+1);
      j = j+1;
    }
    i = i-1;
  }
}

Around that time, I studied language implementation in Kamin's textbook (Samuel Kamin, Department of Computer Science, University of Illinois at Urbana-Champaign, Programming Languages: An Interpreter-Based Approach, Addison-Wesley, 1990). I rewrote Kamin's interpreter in C (the original is written in Pascal) and started to think about a language design based on infix syntax rather than Kamin's LISP-like language, which requires a lot of parentheses. The result was the A series of languages (2002).  Here is an early example of the language, a bubble sort function:

: put:at a i v = store a+i v
: get:at a i = fetch a+i

: sort | i j =
  [i = 0
   while i < maxdata
     [j = 0
      while j < (maxdata - 1)
        [if (get:at data j+1) < (get:at data j)
           swap j j+1
           syscall 3
         j = j + 1]
      i = i + 1]]

Please note the syntax of this language. Square brackets [ ] group statements. Arrays are accessed through the "store" and "fetch" operators.  The "if" operator must have an else-clause; in this example, "syscall 3" is a no-op that serves as the else-clause. The intermediate codes of the earliest design are:

  add sub mul div band bor bxor eq lt ldi sti ret array
  add1 sub1 eq0 eq1 ret0 ret1 end get put ld st
  jmp jt jf lit fun call calli callt inc jeq jne jlt jge
  0jmp 1jmp sys (40)

The instruction format is a fixed 32 bits: 8 bits for the operation code and 24 bits for an optional argument ( arg:24 op:8 ). This is a simple design with good efficiency: fetching and decoding an instruction are fast. It is an improvement over the byte-code instructions of R1.
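
The packing described above can be sketched in C as follows. This is only an illustration: it assumes the argument sits in the high 24 bits and the operation code in the low 8 bits (one reading of "arg:24 op:8"), and it assumes the argument is a signed value. The names encode, decode_op and decode_arg are hypothetical and are not taken from the A or Som sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: | arg (24 bits) | op (8 bits) |, with op in the low byte.
   The actual bit order used by the A language interpreter may differ. */
#define OP_BITS  8
#define OP_MASK  0xFFu
#define ARG_MASK 0xFFFFFFu

/* Pack an opcode and a signed 24-bit argument into one 32-bit word. */
static uint32_t encode(uint32_t op, int32_t arg) {
    assert(op <= OP_MASK);                      /* opcode must fit in 8 bits  */
    assert(arg >= -8388608 && arg <= 8388607);  /* argument must fit in 24    */
    return (((uint32_t)arg & ARG_MASK) << OP_BITS) | op;
}

/* Decoding the opcode is a single mask. */
static uint32_t decode_op(uint32_t instr) {
    return instr & OP_MASK;
}

/* Decoding the argument is a shift followed by sign extension from 24 bits. */
static int32_t decode_arg(uint32_t instr) {
    uint32_t field = instr >> OP_BITS;          /* 0 .. 0xFFFFFF */
    return (int32_t)(field ^ 0x800000u) - 0x800000;
}
```

Because every instruction has the same width, the interpreter can fetch one word and split it with a mask and a shift, without the variable-length operand reads that a byte-code format needs.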

The Som project started in 2003. It is a refinement of the A language and its instruction set. The most notable changes are the inclusion of "for" and a more familiar syntax for array access. Indentation is used to group statements, which reduces the use of grouping brackets, and [ ] are now used for array indexing. Here is the bubble sort function in Som syntax:

to sort | i j =
  for i 0 maxdata-1
    for j 0 maxdata-2
      if data[j+1] < data[j]
        swap j j+1

The instruction set has been streamlined and named "S-code".  The original set is:

  add sub mul div band bor bxor not eq ne lt le ge gt
  shl shr mod ldx stx ret retv array end get put ld st
  jmp jt jf lit call callt inc dec sys case fun

The arithmetic and logic operators are more complete than those of the intermediate code of the A language. There are no test-and-jump or other combined codes.  The "case" instruction was introduced to improve the efficiency of the multi-way branch used in writing the virtual machine itself. The first implementation was written in C and released to the public by the end of 2004 as Som version 1.  The Som system was then refined and rewritten in the Som language, and released on New Year 2005 as Som version 2.
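
To illustrate the kind of multi-way branch found at the heart of a virtual machine, here is a minimal dispatch loop in C. It is a sketch under several assumptions, not the Som implementation: the opcode numbers are made up, only the names lit, add, mul and end come from the S-code list, their stack semantics are assumed, and the instruction word is assumed to hold the argument in the high 24 bits and the opcode in the low 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode numbers; only the names come from the S-code list. */
enum { OP_LIT = 1, OP_ADD = 2, OP_MUL = 3, OP_END = 4 };

/* Assumed word layout: argument in the high 24 bits, opcode in the low 8. */
#define INSTR(op, arg) ((((uint32_t)(arg) & 0xFFFFFFu) << 8) | (uint32_t)(op))

#define STACK_SIZE 64

/* Execute code until OP_END and return the value on top of the stack.
   Arguments are treated as non-negative in this sketch. */
static int32_t run(const uint32_t *code) {
    int32_t stack[STACK_SIZE];
    int sp = 0;                               /* index of the next free slot */
    for (const uint32_t *ip = code; ; ip++) {
        uint32_t op  = *ip & 0xFFu;
        int32_t  arg = (int32_t)(*ip >> 8);
        switch (op) {                         /* the multi-way branch */
        case OP_LIT:                          /* push a literal */
            assert(sp < STACK_SIZE);
            stack[sp++] = arg;
            break;
        case OP_ADD:                          /* pop two values, push the sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:                          /* pop two values, push the product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_END:                          /* stop and report the result */
            return sp > 0 ? stack[sp - 1] : 0;
        default:                              /* unknown opcode: stop */
            return 0;
        }
    }
}
```

For example, running the six-instruction sequence lit 2, lit 3, add, lit 4, mul, end through run evaluates (2 + 3) * 4. A C compiler typically turns a dense switch like this into a jump table; a "case" instruction gives a virtual machine written in Som a comparable single-step dispatch instead of a chain of compare-and-jump instructions.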

last update 7 Jan 2011