som v 3.5  symbol table

The symbol table does not follow som v 3.1, which needs more fields to support run-time conversion of code. The number of fields remains the same as in som v 3.0: name, type, ref, arity, lv. The hash table is separated from the attributes (htab[], symtab[]). This design saves space and is more flexible: htab is big but narrow (2003 x 1), while symtab is small but wide (1000 x 5), so only the narrow table carries the empty slots a hash table needs (2003 + 1000 x 5 cells instead of 2003 x 5). An entry in the hash table is never deleted, which makes every symbol in the table unique. A symbol has no duplicate except when a local name temporarily shadows a global name. The string of each symbol name is stored in a separate array (nmstr). Because symbols are unique, no string appears twice in this array. This property could be exploited in several ways, but it is not used in this implementation.
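A minimal C sketch of the split layout, offered as an illustration rather than the som source. It assumes open addressing with linear probing (the note does not say how collisions are resolved), a string hash chosen only for the example, and hypothetical names install, hash, MAXSTR, nsym and nstr. The name field holds an offset into nmstr[], and lv records the htab[] slot at insertion, as the next paragraph describes.

```c
#include <assert.h>
#include <string.h>

#define HSIZE  2003      /* htab[]: big but narrow */
#define MAXSYM 1000      /* symtab[]: small but wide */
#define MAXSTR 16000     /* hypothetical size of the nmstr[] pool */

enum { NEW = 1 };        /* hypothetical type code; other types omitted */

typedef struct {
    int name;    /* offset of the name string in nmstr[] */
    int type;
    int ref;
    int arity;
    int lv;      /* set to the htab[] slot ("back pointer") on insertion */
} Sym;

static int  htab[HSIZE];     /* 0 = empty, else an index into symtab[] */
static Sym  symtab[MAXSYM];  /* row 0 unused, so a valid index is never 0 */
static char nmstr[MAXSTR];   /* every distinct name is stored exactly once */
static int  nsym = 1;        /* next free symtab[] row (hypothetical) */
static int  nstr = 0;        /* next free byte of nmstr[] (hypothetical) */

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % HSIZE;
}

/* Look s up; if absent, insert it as NEW. htab[] entries are never
   deleted, so a name always maps to the same slot and row. */
int install(const char *s) {
    unsigned h = hash(s);
    while (htab[h] != 0) {                       /* linear probing */
        if (strcmp(&nmstr[symtab[htab[h]].name], s) == 0)
            return htab[h];
        h = (h + 1) % HSIZE;
    }
    int len = (int)strlen(s) + 1;                /* include the '\0' */
    assert(nsym < MAXSYM && nstr + len <= MAXSTR);
    memcpy(&nmstr[nstr], s, (size_t)len);
    int i = nsym++;
    symtab[i] = (Sym){ .name = nstr, .type = NEW, .lv = (int)h };
    nstr += len;
    htab[h] = i;
    return i;
}
```

Because nothing is ever removed from htab[], calling install twice with the same name returns the same row, and the name is copied into nmstr[] only once.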

When a symbol is inserted into the table, its type is set to NEW. The field Lv stores a pointer back to the symbol's entry in the hash table, called the "back pointer" (bp). The bp is used to change the hash table. To allow local symbols to be removed, a list of their indexes (lvlis) is built as they are inserted into the symbol table. A local symbol (of type local variable) can shadow a global symbol (of type global variable) of the same name. To do this, a new entry is created in symtab. A local symbol needs only the fields name, type and ref, so its arity field is used to store a pointer to the shadowed global symbol, allowing the global to be restored. The hash table entry is set to point to this new entry; it is found through the bp of the global symbol. The undo process is simple: for a shadow, the hash table entry (again located through the global's bp) is pointed back at the global; otherwise the local's type is reset to NEW. To tell whether a local is a shadow, its arity field (the pointer back to the global symbol) is inspected: it is non-zero for a shadow, because an index into symtab[] is never zero.
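The shadow-and-undo scheme can be sketched in C as follows. This is an illustration under stated assumptions, not the som source: the type codes GVAR and LVAR, the functions install, declareLocal and removeLocals, and the MAXLV capacity are hypothetical, and for brevity names are kept as plain pointers (which must stay valid) rather than nmstr[] offsets.

```c
#include <assert.h>
#include <string.h>

#define HSIZE  2003
#define MAXSYM 1000
#define MAXLV  100                 /* hypothetical capacity of lvlis[] */

enum { NEW = 1, GVAR, LVAR };      /* hypothetical type codes */

typedef struct { const char *name; int type, ref, arity, lv; } Sym;

static int htab[HSIZE];            /* 0 = empty, else an index into symtab[] */
static Sym symtab[MAXSYM];         /* row 0 unused: a valid index is never 0 */
static int nsym = 1;               /* rows are allocated top down, never freed */
static int lvlis[MAXLV];           /* rows of the locals currently declared */
static int nlv = 0;

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % HSIZE;
}

/* Find s, or insert it as NEW with lv = its htab[] slot (back pointer). */
int install(const char *s) {
    unsigned h = hash(s);
    for (; htab[h] != 0; h = (h + 1) % HSIZE)
        if (strcmp(symtab[htab[h]].name, s) == 0) return htab[h];
    assert(nsym < MAXSYM);
    symtab[nsym] = (Sym){ .name = s, .type = NEW, .lv = (int)h };
    htab[h] = nsym;
    return nsym++;
}

/* Declare s as a local variable with the given ref (e.g. a frame offset). */
int declareLocal(const char *s, int ref) {
    int g = install(s), i = g;
    assert(nlv < MAXLV);
    if (symtab[g].type == GVAR) {                /* shadow the global */
        assert(nsym < MAXSYM);
        i = nsym++;
        symtab[i] = (Sym){ .name = symtab[g].name, .type = LVAR, .ref = ref,
                           .arity = g };         /* arity remembers the global */
        htab[symtab[g].lv] = i;                  /* redirect via the global's bp */
    } else {
        assert(symtab[g].type == NEW);           /* sketch: no other kinds handled */
        symtab[i].type = LVAR;
        symtab[i].ref = ref;
        symtab[i].arity = 0;                     /* 0 = not a shadow */
    }
    lvlis[nlv++] = i;
    return i;
}

/* Undo all locals, e.g. at the end of a function body. */
void removeLocals(void) {
    while (nlv > 0) {
        int i = lvlis[--nlv];
        int g = symtab[i].arity;
        if (g != 0)
            htab[symtab[g].lv] = g;              /* shadow: restore the global */
        else
            symtab[i].type = NEW;                /* plain local: row stays, reusable */
    }
}
```

Redeclaring a name that was a plain local before removeLocals returns the same row, while an abandoned shadow row is never handed out again.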

A removed local symbol can be reused: its hash entry is unchanged, so the next occurrence of the same name reaches the entry and finds that its type is NEW. However, in this implementation a shadow local cannot be reused, because new entries in symtab[] are allocated top down and never given back (very simple and efficient). This is not a problem, as shadowing should not occur frequently. If reuse is important, new entries must be allocated from a linked list of free entries instead. That has the disadvantage of initialising the list, which takes time when the list is long.
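If shadow entries did have to be reused, a free list of symtab[] rows could look like the hypothetical sketch below (allocRow, freeRow, initFree and nextfree[] are illustrative names, not part of som). initFree shows the cost the note mentions: every row must be linked once before the first allocation.

```c
#include <assert.h>

#define MAXSYM 1000

/* nextfree[i] links free row i to the next free row; 0 ends the list,
   which works because row 0 of symtab[] is never used. */
static int nextfree[MAXSYM];
static int freehead = 0;

/* Link rows 1..MAXSYM-1 so that row 1 is handed out first: O(MAXSYM). */
void initFree(void) {
    freehead = 0;
    for (int i = MAXSYM - 1; i >= 1; i--) {
        nextfree[i] = freehead;
        freehead = i;
    }
}

/* Take the row at the head of the list. */
int allocRow(void) {
    assert(freehead != 0);        /* table full */
    int i = freehead;
    freehead = nextfree[i];
    return i;
}

/* Give a row back, e.g. a shadow local when its function ends. */
void freeRow(int i) {
    nextfree[i] = freehead;
    freehead = i;
}
```

With this scheme, removing a shadow local would call freeRow on its row instead of abandoning it.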

The Lv field (the back pointer) is used for another purpose when the symbol is a function. However, a function can never be shadowed, so losing the back pointer does no harm; only a global can be shadowed. The Lv field of a global is also reused, to store its base address when it is a static array. That happens at the end of code generation, just before the symbol table is output, so it does not interfere.

Finding a name by reference

To create a listing of the object code, a symbolic name must be retrieved from its reference and its type. The simple way is to search symtab[] sequentially, but this is inefficient: O(nm), where n is the size of symtab[] and m is the number of lookups. A lookup is needed for every instruction, so efficiency matters. A memoised version was tried in som v 3.0 with good efficiency, O(nk) with k < m, because once a symbol has been retrieved, subsequent retrievals are O(1). This implementation goes a bit further: it uses another hash table and improves the efficiency to O(n+m).

The hash table (hSym[]) is keyed by reference and points into symtab[], from which the type and reference can be checked. The listing is created after all code generation is complete, so all symbols are already defined. Instead of memoising lookups one by one, all exported symbols are hashed into this table once at the beginning (the hashSym function), O(n). Each retrieval is then simply a lookup (the findSym function), O(1).
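hashSym and findSym might look roughly like the C sketch below. Their signatures, the HSYM size, the type codes, linear probing, and the rule that every row whose type is not NEW counts as exported are all assumptions made for illustration. findSym checks both ref and type, since a function and a variable could share a reference number.

```c
#include <assert.h>
#include <stddef.h>

#define MAXSYM 1000
#define HSYM   2003                 /* hypothetical size: prime, over 2 x MAXSYM */

enum { NEW = 1, GVAR, FUNC };       /* hypothetical type codes */

typedef struct { const char *name; int type, ref, arity, lv; } Sym;

static Sym symtab[MAXSYM];          /* rows 1..nsym-1 in use */
static int nsym = 1;
static int hSym[HSYM];              /* keyed by ref; 0 = empty, else a symtab[] row */

static unsigned hashRef(int ref) { return (unsigned)ref % HSYM; }

/* Run once after code generation: insert every exported row, O(n). */
void hashSym(void) {
    assert(nsym <= MAXSYM);         /* keeps hSym[] at most half full */
    for (int h = 0; h < HSYM; h++) hSym[h] = 0;
    for (int i = 1; i < nsym; i++) {
        if (symtab[i].type == NEW) continue;     /* assumed: NEW rows are not exported */
        unsigned h = hashRef(symtab[i].ref);
        while (hSym[h] != 0) h = (h + 1) % HSYM; /* linear probing */
        hSym[h] = i;
    }
}

/* Name of the symbol with this ref and type, or NULL; expected O(1). */
const char *findSym(int ref, int type) {
    for (unsigned h = hashRef(ref); hSym[h] != 0; h = (h + 1) % HSYM) {
        const Sym *s = &symtab[hSym[h]];
        if (s->ref == ref && s->type == type) return s->name;
    }
    return NULL;
}
```

Building the table once costs O(n), and each findSym call then inspects only a short probe sequence, which gives the O(n+m) total for a listing with m lookups.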

31 August 2007
