som v41 w

The aim is to test making "lex" a primitive instead of writing it in the Som language. This is an exercise to rethink the Som compiler and to try a new way to write a lexical analyser, and perhaps a parser generator.

"lex" will become a system call: r-e-p calls "lex" to get the next token from the source. This will reduce the amount of "low level" code in the Som compiler source and, hopefully, speed up the compiler significantly.

Som 4.1 performance (after improving the compiler)

compiler benchmark
total 4919878

The top ten functions, ranked by noi (last column), are:

  lex1 (2112)       9098     642395
  fprints (214)     7944     417746
  newcell (644)    20944     397936
  strpack (308)     5315     256136
  outM (6848)          2     190918
  token (1818)      6195     177840
  prCode (6606)     5583     169045
  lex (2778)       10019     161959
  genex (8736)      5526     148805
  hash (1064)       5129     147210

The top one is "lex1", which accounts for about 13% of the total noi (642395 of 4919878).

Here is the pseudo code:

lex
  skipblank
  case char
    double       ret               try double first (needs one char lookahead)
    single       ret
    num 0..9     collectNum ret
    hex #        collectHex ret
    string "     collectString ret
    otherwise    collectName ret
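A minimal C sketch of this dispatch follows. It is illustrative only and assumes the whole source sits in memory as one NUL-terminated string. The token kinds (T_NAME and so on), struct lexer, and the helpers take, at and is_double are hypothetical names, not Som's own. skipblank is reduced to plain whitespace here. The sketch tries the double operators before the single ones so that "<=" is not read as "<". An unknown character is consumed and reported as T_ERR rather than collected as a name, because a name collector would otherwise return an empty token without making progress.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; Som's actual token codes may differ. */
enum tok { T_EOF, T_OP, T_NUM, T_HEX, T_STR, T_NAME, T_ERR };

struct lexer {
    const char *src;   /* NUL-terminated source text */
    size_t pos;        /* index of the current character */
    char tok[64];      /* tokstring: text of the last token */
    size_t len;        /* length of tokstring */
};

/* "getchar": append the current char to tokstring (dropped if the
   buffer is full) and advance to the next char. */
static void take(struct lexer *lx)
{
    if (lx->len + 1 < sizeof lx->tok)
        lx->tok[lx->len++] = lx->src[lx->pos];
    lx->pos++;
}

/* Current char as an unsigned value, safe for the ctype.h functions. */
static int at(const struct lexer *lx)
{
    return (unsigned char)lx->src[lx->pos];
}

/* Does p start one of the double operators from the notes? */
static int is_double(const char *p)
{
    static const char *dbl[] = { "//", "<<", ">>", ">=", "<=", "==", "!=" };
    size_t i;
    for (i = 0; i < sizeof dbl / sizeof dbl[0]; i++)
        if (p[0] == dbl[i][0] && p[1] == dbl[i][1])
            return 1;
    return 0;
}

/* Return one token and leave its text in lx->tok. */
enum tok lex(struct lexer *lx)
{
    enum tok kind = T_ERR;
    int c;

    while (isspace(at(lx)))                  /* simplified skipblank */
        lx->pos++;
    lx->len = 0;
    c = at(lx);

    if (c == '\0') {
        kind = T_EOF;
    } else if (is_double(lx->src + lx->pos)) {   /* double: two chars */
        take(lx);
        take(lx);
        kind = T_OP;
    } else if (strchr("+-*/%<>=!^&|:", c)) {     /* single */
        take(lx);
        kind = T_OP;
    } else if (isdigit(c)) {                     /* num */
        while (isdigit(at(lx)))
            take(lx);
        kind = T_NUM;
    } else if (c == '#') {                       /* hex, '#' not kept */
        lx->pos++;
        while (isxdigit(at(lx)))
            take(lx);
        kind = T_HEX;
    } else if (c == '"') {                       /* string */
        lx->pos++;                               /* skip opening quote */
        while (at(lx) != '"' && at(lx) != '\0')
            take(lx);
        if (at(lx) == '"') {                     /* drop closing quote */
            lx->pos++;
            kind = T_STR;
        }                                        /* else unterminated: T_ERR */
    } else if (isalpha(c) || c == '_') {         /* name */
        while (isalnum(at(lx)) || at(lx) == '_')
            take(lx);
        kind = T_NAME;
    } else {
        take(lx);          /* unknown char: consume it so lex makes progress */
    }
    lx->tok[lx->len] = '\0';
    return kind;
}
```

Each call returns one token, which matches how r-e-p is described as pulling tokens one at a time.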

collectNum             in the end tokstring is num only
  while isnum 0..9
    getchar
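As a sketch, collectNum could look like this in C. It assumes the source is a NUL-terminated string indexed by a hypothetical position variable, and that tokstring is a caller-supplied buffer of at least one byte. The name collect_num is illustrative.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical C version of collectNum: copies the run of digits
   starting at src[*pos] into tok, advances *pos past them, and
   returns the number of digits stored. Requires cap >= 1. Digits
   beyond the buffer capacity are consumed but not stored. */
size_t collect_num(const char *src, size_t *pos, char *tok, size_t cap)
{
    size_t n = 0;
    while (isdigit((unsigned char)src[*pos])) {
        if (n + 1 < cap)          /* keep room for the terminating NUL */
            tok[n++] = src[*pos];
        (*pos)++;                 /* "getchar": consume one character */
    }
    tok[n] = '\0';                /* tokstring now holds digits only */
    return n;
}
```

The loop stops at the first non-digit and leaves it unconsumed, so the next call to lex starts on it.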

collectHex             in the end tokstring is hex only
  while ishex 0..9 A..F a..f
    getchar
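The same pattern works for hex literals, which lex recognises by a leading '#'. The C sketch below makes a few assumptions. It assumes src[*pos] is that '#'. It uses the standard isxdigit (exactly 0..9 A..F a..f) and strtol to also compute the numeric value. Returning the value is an addition for illustration, not something the notes specify, and collect_hex is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of collectHex, assuming src[*pos] is '#'.
   Skips the '#', copies the hex digits into tok (cap >= 1), stores
   their value in *value, and returns the number of digits stored. */
size_t collect_hex(const char *src, size_t *pos, char *tok, size_t cap,
                   long *value)
{
    size_t n = 0;
    (*pos)++;                                  /* skip the leading '#' */
    while (isxdigit((unsigned char)src[*pos])) {
        if (n + 1 < cap)
            tok[n++] = src[*pos];
        (*pos)++;
    }
    tok[n] = '\0';                             /* tokstring is hex only */
    *value = strtol(tok, NULL, 16);            /* base-16 conversion */
    return n;
}
```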

collectString
  while not "
    getchar
  delete the last "
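A C sketch of collectString follows. It assumes src[*pos] is the opening quote. Here the closing quote is consumed but never stored, which has the same effect as "delete the last "". The sketch also reports an unterminated string with -1, a case the pseudocode does not handle. collect_string is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical sketch of collectString, assuming src[*pos] is the
   opening '"'. Copies the characters up to the closing '"' into tok
   (cap >= 1) and returns their count, or -1 if the input ends before
   the closing quote. */
int collect_string(const char *src, size_t *pos, char *tok, size_t cap)
{
    size_t n = 0;
    (*pos)++;                              /* skip the opening quote */
    while (src[*pos] != '"') {
        if (src[*pos] == '\0') {           /* unterminated string */
            tok[n] = '\0';
            return -1;
        }
        if (n + 1 < cap)
            tok[n++] = src[*pos];
        (*pos)++;
    }
    (*pos)++;                              /* consume the closing quote */
    tok[n] = '\0';
    return (int)n;
}
```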

collectName
  while isAlphaNum
    getchar
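collectName relies on isAlphaNum. The sketch below defines it from the character classes in these notes: alpha is A..Z a..z _, and num is 0..9. It uses explicit ASCII ranges rather than the locale-dependent ctype.h functions. The function names are hypothetical.

```c
#include <stddef.h>

/* alpha is A..Z a..z _ (ASCII only, as in the notes) */
static int is_alpha(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* isAlphaNum: alpha or num 0..9 */
static int is_alnum(int c)
{
    return is_alpha(c) || (c >= '0' && c <= '9');
}

/* Hypothetical sketch of collectName: copies the run of alphanumeric
   characters starting at src[*pos] into tok (cap >= 1) and returns
   its length. */
size_t collect_name(const char *src, size_t *pos, char *tok, size_t cap)
{
    size_t n = 0;
    while (is_alnum((unsigned char)src[*pos])) {
        if (n + 1 < cap)
            tok[n++] = src[*pos];
        (*pos)++;
    }
    tok[n] = '\0';
    return n;
}
```

Because lex reaches collectName in the "otherwise" case, a character that is neither an operator nor alphanumeric (for example '@') gives a zero-length name and does not advance. A real lexer needs to treat that as an error.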

num is 0..9
hex is 0..9 A..F a..f
alpha is A..Z a..z _
alphanum is alpha or num
single is  + - * / % < > = ! ^ & | :
double is  // << >> >= <= == !=
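The operator classes can be encoded as small tables. This C sketch returns the length of the operator at a given position. It tries double first, because every double operator starts with a character that is also a single operator. op_len, SINGLE and DOUBLE are illustrative names.

```c
#include <string.h>

/* Operator classes from the notes, as C tables. */
static const char *SINGLE = "+-*/%<>=!^&|:";
static const char *DOUBLE[] = { "//", "<<", ">>", ">=", "<=", "==", "!=" };

/* Returns 2 if p starts a double operator, 1 for a single operator,
   0 otherwise. Doubles are tried first so "<=" is not split into
   "<" and "=". */
int op_len(const char *p)
{
    size_t i;
    for (i = 0; i < sizeof DOUBLE / sizeof DOUBLE[0]; i++)
        if (p[0] == DOUBLE[i][0] && p[1] == DOUBLE[i][1])
            return 2;
    /* strchr would match the string's terminator, so exclude NUL */
    if (p[0] != '\0' && strchr(SINGLE, p[0]) != NULL)
        return 1;
    return 0;
}
```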

skipblank             assume there is one char already read
  loop
    if is eol
      line++, col = 1, getchar
    else if isSpace
      getchar
    else if is / and lookahead is /
      while not eol
        getchar
    else
      return
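A C sketch of skipblank with line and column tracking follows. It checks eol, blanks and // comments in a single loop, so a comment is skipped even when no blank precedes it, for example at the very start of the file. The struct scan type and its fields are hypothetical. '\n' is assumed to be the eol character, and the column resets to 1 after it.

```c
#include <stddef.h>

/* Hypothetical scanner state; the field names are illustrative. */
struct scan {
    const char *src;   /* whole source text, NUL-terminated */
    size_t pos;        /* index of the current character */
    int line, col;     /* position for error messages */
};

/* "getchar": advance one character and bump the column. */
static void advance(struct scan *s)
{
    s->pos++;
    s->col++;
}

/* Skip blanks, end-of-lines and // comments, keeping line/col up to
   date. Stops at the first character of the next token, or at the
   terminating NUL. */
void skip_blank(struct scan *s)
{
    for (;;) {
        char c = s->src[s->pos];
        if (c == '\n') {                           /* eol */
            s->pos++;
            s->line++;
            s->col = 1;
        } else if (c == ' ' || c == '\t' || c == '\r') {
            advance(s);
        } else if (c == '/' && s->src[s->pos + 1] == '/') {
            while (s->src[s->pos] != '\n' && s->src[s->pos] != '\0')
                advance(s);                        /* skip to eol */
        } else {
            return;
        }
    }
}
```

The comment loop stops on the eol without consuming it, so the eol branch on the next iteration does the line counting.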

25 Aug 2009
End

Base code is Som 4.1.