new symtab in som v3.1

separate symtab into htab[] and symtab[]
htab[] store hash and link to symtab[]  size 1009
symtab[] is the same as the original    size 500  x 9
lvtab[] is eliminated

Observe that in a hash table, not more than half of the entry is used (load factor 50%).  Therefore by separating the attributes from the hash, half of the space can be saved.

An entry in htab will never be deleted (to simplify maintenance) hence the name string in nmstr[] will be unique (this is good).  For a local name to shadow a global name, a new slot in symtab is created and htab is pointed to this one. A list of locals is kept so their entries can be reused by nullify them (set type to NEW).  Some attributes must be kept in the local entry so that it can undo the global entry.  

local entry is 
  (name,type,ref, idx, gv, next,-,-,-)

where
  idx is the index of htab (back pointer)
  gv is the pointer to the entry in symtab of gv
  next is the link to next local entry

The index to htab is a bit tricky.  It is created at the time a symbol is entered into the hash table.  The back pointer is stored in "Callee" field of symtab[].  Hence, this is a unique link, one entry symtab can associated with one entry in htab. Therefore once a local entry which shadowed a global entry is nullified its slot can not be reused (wasted) because there already is a global entry associated with htab.  To save this space, the function to allocate a new entry of symtab can be implmented as a linked list.  Presently, it simply gets a new slot.  This is acceptable as shadowing does not occur frequently.

Using som3.txt (the compiler) as a benchmark, I count the number of distintc symbols, 494 (all symbols in nmstr[] are unique).  This is nearly filled up 500 entries of the symtab.  The size of the symtab is increased to 1000.  I also measure the effectiveness of the hash table.  With the hash table size 1009, the total access is 5427 with probing 1977 times.  This is bad.  So, the size of the hash table is increased to 2003.  The total access remains the same 5427 with probing significantly reduced to 663 times.  The hash function seems to do a good job.  No need for further experiment.

the compiler benchmark takes 5.51 M instructions.  (with old symbol table it took 5.61 M).

24 Aug 2007


