som development
how to write som v 3.1 (in som)

At present, whenever I write a new version of Som, I first improve som-c from the previous version and then implement som-som based on the C version.  The reason for doing som-c first is the ease of developing programs in C with a good IDE, which provides many facilities such as finding a function and, most importantly, makes debugging easy.

Therefore the som-som version will always be "second" to som-c.  It never develops a Som style. Because most debugging relies on the C IDE, debugging aids for Som have never been developed.  Is it possible to do the development entirely in Som? How can a previous version of Som alone be used to develop the next version?

If the change does not affect the virtual machine (the eval function) then it is not difficult.  All programs under development are executable under the previous som virtual machine.  Any change can be made in the som source.

If the virtual machine is changed then, as the virtual machine must be written in C, some support for developing the virtual machine in C must exist. The development stage will be more complex. What is needed to support the development of a new virtual machine?  Correct object code to feed the new vm is necessary.  If most of the instruction set does not change, or the change is only in the format of the object code, then the task is simple.  A previous version of the som compiler can be modified to generate the new object code to feed the new vm.

If the instruction set is also changed, then a new code generator must be co-developed with the new vm. This is more complicated to debug. However, the code generator can be compiled and run with the previous som compiler and virtual machine.  The development steps are lengthened.

the case of som-v3.1

The aim is to write a new compiler, improved from som-v3.0, to run on a new virtual machine that employs decoded instructions and top-of-stack speedup (around 40% faster).  The instruction set does not change so the old compiler is largely intact.  The changes to the compiler will be:

1)  do callchain while compiling, because it is needed to execute the immediate code with the new vm.
2)  improve access to the symbol table via references, using the "end" and "retv" fields to improve several processing steps that currently do a sequential search.
3)  output the new object code format, with op and arg separated.
4)  output the new symbol table, including "end" and "retv".
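
A minimal sketch of item 2, written in C only for illustration (the real symbol table lives in the som data segment). The struct layout, the field "start", and the names sym_add, sym_find and sym_codesize are assumptions; only the "end" and "retv" fields come from the notes above. The idea shown is that a name is searched sequentially once, and afterwards the returned reference (an index) gives direct access to the attributes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical symbol table entry; only "end" and "retv" are named in the notes. */
typedef struct {
    const char *name;
    int start;   /* assumed: address of the first instruction of the function */
    int end;     /* address of the last instruction of the function */
    int retv;    /* nonzero if the function returns a value */
} Sym;

enum { MAXSYM = 64 };          /* hypothetical capacity */
static Sym symtab[MAXSYM];
static int nsym = 0;

/* Add an entry and return its reference (an index into symtab). */
int sym_add(const char *name, int start, int end, int retv)
{
    assert(nsym < MAXSYM);
    symtab[nsym].name = name;
    symtab[nsym].start = start;
    symtab[nsym].end = end;
    symtab[nsym].retv = retv;
    return nsym++;
}

/* Sequential search by name, done once; returns -1 if the name is absent. */
int sym_find(const char *name)
{
    for (int i = 0; i < nsym; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return i;
    return -1;
}

/* With a reference in hand, the code size needs no scan of the code. */
int sym_codesize(int ref)
{
    assert(ref >= 0 && ref < nsym);
    return symtab[ref].end - symtab[ref].start + 1;
}
```

Once a reference is stored with a call site, later steps can read symtab[ref].retv or the code bounds directly instead of searching again.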

The changes to the new vm are as follows:

1)  before executing sx-code, a code conversion must be done: analysing the code and putting in the "do*" special instructions, converting the arguments of locals, and changing jump displacements to absolute addresses.
2)  access to the symbol table from two modes: compiling mode and the special conversion-before-execution mode.
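
One part of the conversion in item 1, changing jump displacements to absolute addresses, can be sketched as below. This is only an illustration under assumptions: the opcode names and numbers are made up (they are not the real sx-code), op and arg are held in separate parallel arrays, and a displacement is assumed to be relative to the address of the jump instruction itself.

```c
#include <assert.h>

/* Hypothetical opcodes, for illustration only; not the real Som instruction set. */
enum { OP_LIT = 1, OP_JMP = 2, OP_JF = 3, OP_END = 4 };

/* Rewrite the argument of every jump from a displacement into an absolute
   code address. op[] and arg[] are parallel arrays of n instructions.
   Assumption: the displacement is relative to the jump's own address. */
void jumps_to_absolute(const int op[], int arg[], int n)
{
    for (int i = 0; i < n; i++) {
        if (op[i] == OP_JMP || op[i] == OP_JF) {
            int target = i + arg[i];
            assert(target >= 0 && target < n);  /* target must stay inside the code */
            arg[i] = target;
        }
    }
}
```

After this pass the vm can load the jump argument straight into its instruction pointer without adding the current address.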

Steps of work

1)  prepare the new vm to accept the new object code format.
2)  there will be two symbol tables: one from som, and one from the vm.  How to reconcile these two?
3)  let som-v3 be the base som vm. Let som0.obj be the base som-compiler. Let som3 be the new som-compiler and som-v31 be the new vm.
4)  som-v3 + som0.obj will not be changed. Denote them as base-som.
5)  som3 is initially copied directly from the source of som0.obj (som-v3 in som). It will gradually be modified to its final stage, the new compiler.
6)  modify som3 to do callchain and output the new symbol table and the new object code format (using base-som to compile and run it to generate the object code).
7)  use the som-v31 vm to test the new object code.
8)  improve som3 until it becomes the new compiler.
9)  use som-v31 to compile som3; let its object be som3.obj.
10) use som-v31 + som3.obj as the new system.

The result should be a new som system that is at least 40% faster because of its new vm.  However, the compiler will be a bit more complex (doing the callchain analysis and accessing the symbol table by reference).  Compilation should be faster due to more efficient algorithms and faster access to the data structures when getting to the code and the attributes of a function.

A small question: how can the system be simplified?  How much larger will the compiler be?
   
technical 

symbol table

The symbol table is a data structure that exists in the data segment and is accessible to the som language. It is used during compilation.  On the other hand, the virtual machine requires a code conversion before execution.  This process needs to access the symbol table for some information.  In execution mode, the symbol table can be read from the object file.  Therefore there are two symbol tables, one in the som language, the other in the C language.  Is it possible to reconcile the two?

execution of immediate code

The code conversion must be done before execution.  If we use only one virtual machine, then we have to deal with two versions of the code: one not yet converted, the other already converted.  As the execution of immediate code is done in compilation mode, the code is not yet converted at that point.  This unconverted code is what is output to the object file.  A simple solution is to have two virtual machines, one for non-converted code, the other for converted code. Another solution is to have only the converted code.

representing the executable code

Let the code segment of the virtual machine be CS' and the data segment DS'.  Let the code and data segments of the som-compiler be CS and DS. The current implementation has DS' = DS = M[.] and CS = M[MAXDS]. The remaining question is where to put CS'.  In the virtual machine, xop[.] is 8-bit, so it is stored as (char *). However, the only data type in som is int, hence xop in the som-compiler must be int in M[.]. It is most logical to store CS' in M[.]. The som-compiler will have direct access to it. There is no "extra" processing to do to execute it.  It will be direct and simple.  The only disadvantage is that the size occupied by the executable will be doubled.  This is a trade-off between "space" and "speed".
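
A rough sketch of keeping the executable code in M[.] as plain ints follows. It is written under assumptions: the sizes are made up, the code region is assumed to start at M[MAXDS], and each instruction is assumed to take two int cells (op, then arg). The helper names cs_cell, put_ins, get_op and get_arg are hypothetical.

```c
#include <assert.h>

enum { MAXDS = 1000, MAXCS = 500 };   /* hypothetical sizes */

/* One memory for everything: data in M[0..MAXDS-1], code from M[MAXDS] on. */
static int M[MAXDS + 2 * MAXCS];

/* Address in M of the first cell of instruction k (assumed: two cells each). */
static int cs_cell(int k)
{
    assert(k >= 0 && k < MAXCS);
    return MAXDS + 2 * k;
}

/* Store instruction k as two ints: the op, then its arg. */
void put_ins(int k, int op, int arg)
{
    int a = cs_cell(k);
    M[a] = op;
    M[a + 1] = arg;
}

int get_op(int k)  { return M[cs_cell(k)]; }
int get_arg(int k) { return M[cs_cell(k) + 1]; }
```

Because the op is a full int cell instead of an 8-bit char, the code takes more space than a packed form, but both the compiler and the vm read it with ordinary indexing of M.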

29 July 2007  (Ah-saraha-bucha day)

If we adopt "one" code (i.e. converted) then there will be no conversion in execute mode.  Hence, there is no need to access the symbol table in execute mode.  This solves the problem of two symbol tables. All code must then be converted at compile time (this is necessary because the virtual machine needs converted code to run immediate code). Forward calls are complicated to handle: the retv status of a forward function is not yet known, so conversion is not possible and must be done later, when the forward function is defined. This means it is necessary to come back later to do the conversion. Except for this case, conversion of a function during compilation is straightforward. It can be done at the end of compiling a function.
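
The "come back later" step for forward calls could be handled with a pending list, sketched here in C. Everything named in it is an assumption for illustration: the opcodes OP_CALL and OP_CALLV (standing for whatever converted forms depend on retv), the table sizes, and the functions convert_call and define_function. A call to a function whose retv is unknown is recorded; when the function is defined, its recorded call sites are converted.

```c
#include <assert.h>

/* Hypothetical converted forms of a call; the real "do*" codes may differ. */
enum { OP_CALL_PENDING = 0, OP_CALL = 1, OP_CALLV = 2 };
enum { RETV_UNKNOWN = 0, RETV_NO = 1, RETV_YES = 2 };
enum { MAXFN = 16, MAXPEND = 64, MAXCODE = 256 };   /* hypothetical sizes */

static int op[MAXCODE];          /* converted opcodes, indexed by code address */
static int fn_retv[MAXFN];       /* retv status per function, initially unknown */
static int pend_site[MAXPEND];   /* call sites waiting for conversion */
static int pend_fn[MAXPEND];     /* the function each waiting site calls */
static int npend = 0;

/* Convert the call at address site, or defer it if retv is not yet known. */
void convert_call(int site, int fn)
{
    assert(site >= 0 && site < MAXCODE && fn >= 0 && fn < MAXFN);
    if (fn_retv[fn] == RETV_UNKNOWN) {
        assert(npend < MAXPEND);
        pend_site[npend] = site;
        pend_fn[npend] = fn;
        npend++;
        op[site] = OP_CALL_PENDING;
    } else {
        op[site] = (fn_retv[fn] == RETV_YES) ? OP_CALLV : OP_CALL;
    }
}

/* Called at the end of compiling fn: record retv, then convert waiting sites. */
void define_function(int fn, int retv)
{
    assert(fn >= 0 && fn < MAXFN && retv != RETV_UNKNOWN);
    fn_retv[fn] = retv;
    int keep = 0;
    for (int i = 0; i < npend; i++) {
        if (pend_fn[i] == fn) {
            op[pend_site[i]] = (retv == RETV_YES) ? OP_CALLV : OP_CALL;
        } else {                 /* still waiting on another function */
            pend_site[keep] = pend_site[i];
            pend_fn[keep] = pend_fn[i];
            keep++;
        }
    }
    npend = keep;
}
```

If npend is not zero at the end of compilation, some forward function was never defined, which is a natural place to report an error.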

30 July 2007 
