Indentation

Use indentation to group block statements.

Grouping is done in the scanner because the scanner must know when a group starts and ends. To simplify the implementation, assume the source contains no tabs (tabs are converted to spaces).

The rule for recognising block-begin and block-end is:

check the column of the first token on the new line
if col == previous   proceed as normal
   col >  previous   it is block-begin, push col
   col <  previous   it is block-end, pop col
                     and repeat the check to match
                     block-begin
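
The rule above can be sketched in a few lines of Python. This is only an illustration, not the scanner's actual code: the function name indent_events and the handling of blank lines are assumptions, and a dedent to a column that was never pushed is not reported as an error.

```python
def indent_events(lines):
    """Apply the column rule to each line; return (text, events) pairs.

    events holds the "{" / "}" markers that precede the line's first token.
    """
    colstack = [1]              # columns are 1-based; 1 is the outermost block
    out = []
    for line in lines:
        text = line.strip()
        if not text:
            continue            # assumption: blank lines do not affect grouping
        col = len(line) - len(line.lstrip(" ")) + 1
        events = []
        if col > colstack[-1]:
            colstack.append(col)            # block-begin, push col
            events.append("{")
        else:
            while col < colstack[-1]:       # block-end, pop col and re-check
                colstack.pop()
                events.append("}")
        out.append((text, events))
    return out
```

For example, a line that drops back two indentation levels yields two block-ends before its first token.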

The complication lies in the state of lex(). lex() must sometimes return block-begin or block-end instead of a source token, especially when several block-ends occur in a row. Care must be taken to keep the state of lex() synchronised.

Doing lookahead in exas() proves to be a source of difficulty, as it backtracks lex() and confuses lexstate. An easy fix is not to use lex() for lookahead; scanline() and lookforEQ() are made for this purpose.

lexstate

state	description	condition	action	next state
N	normal (do lex)	old line	out tok	N
N		new line and ==	out tok	N
N		new line and >	out { push	F
N		new line and <	out } pop	B
F	forward		out old tok	N
B	backward	==	out old tok	N
B		<	out } pop	B

lex() uses a FSM to control its state. A transition occurs on each call to lex(). Starting in N, lex() returns tok as normal. On a newline event, the column position of the first token is compared with the previous start column. There are three possibilities:

1) equal: return tok;
2) greater: return block-begin and mark the column by pushing it onto colstack; the next state is F;
3) less: return block-end and pop the previous mark; the next state is B.

block-begin and block-end are inserted by lex(). The token scanned from the source is kept and output later, on the transition back to N.

In F, the only thing to do is output the saved tok and go to N. In B, block-end is output, popping colstack on each call to lex(), until the column matching a block-begin is found; then go to N. At eof, care must be taken to output a block-end for every remaining block-begin, popping colstack on each call to lex() until col == 1.
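
The state machine can be sketched as follows. This is a minimal model under stated assumptions, not the real lex(): the class name Lexer, the helper all_tokens, and the pre-tokenised input of (text, col, first_on_line) triples standing in for lex1() are all hypothetical. It also assumes eof behaves like a new line at column 1, which is one way to get the remaining block-ends flushed.

```python
EOF = "eof"

class Lexer:
    def __init__(self, toks):
        # toks: list of (text, col, first_on_line); a stand-in for lex1()
        self.toks = list(toks)
        self.pos = 0
        self.state = "N"
        self.colstack = [1]
        self.saved = None           # token held back while { or } is output

    def lex1(self):
        if self.pos < len(self.toks):
            tok = self.toks[self.pos]
            self.pos += 1
            return tok
        return (EOF, 1, True)       # assumption: eof acts as a new line at col 1

    def lex(self):
        if self.state == "N":
            text, col, first = self.lex1()
            if not first or col == self.colstack[-1]:
                return text                     # old line, or new line and ==
            self.saved = (text, col)
            if col > self.colstack[-1]:         # new line and >
                self.colstack.append(col)
                self.state = "F"
                return "{"
            self.colstack.pop()                 # new line and <
            self.state = "B"
            return "}"
        if self.state == "F":                   # forward: output the old tok
            self.state = "N"
            return self.saved[0]
        text, col = self.saved                  # state B: backward
        if col < self.colstack[-1]:
            self.colstack.pop()
            return "}"
        self.state = "N"                        # matched (or overshot): old tok
        return text

def all_tokens(lexer):
    """Call lex() until eof and collect everything it returns."""
    out = []
    while True:
        tok = lexer.lex()
        out.append(tok)
        if tok == EOF:
            return out
```

Each call to lex() returns exactly one token, so a dedent across several levels takes several calls in state B before the saved token is finally released.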

All the original work of lex is done in lex1(), and lex() becomes a FSM that controls the output. Other token-processing work, such as searching for the identifier in the symbol table, is pulled into lex1().

Lookahead for '=' is done in lookforEQ() instead of lex(), to avoid the complex interaction with the new lex FSM during backtracking (saving and restoring the lex state). The search for '=' is confined to a single source line and uses scanline() (instead of skipblank()). scanline() scans character by character to find '=', skipping over matching [] (used for indexing), until the end of the line; // is treated as the end of the line and terminates the scan. Using lookforEQ() simplifies saving and restoring the lex state: only cp and col need to be saved and restored.
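
A character-level scan of this kind might look like the sketch below. The signatures are assumptions made for illustration (the note does not give them), and the Python names scanline and look_for_eq only mirror the originals. The sketch does not distinguish '=' from a longer operator that starts with '=', and it ignores string literals.

```python
def scanline(line, cp):
    """Return the index of the first '=' at or after cp outside [], else -1."""
    depth = 0                           # nesting depth of [] seen so far
    i = cp
    while i < len(line):
        if line.startswith("//", i):
            break                       # a comment ends the line
        ch = line[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth > 0:
                depth -= 1
        elif ch == "=" and depth == 0:
            return i
        i += 1
    return -1

def look_for_eq(line, cp):
    """Lookahead only: reports whether '=' follows, without consuming input."""
    return scanline(line, cp) >= 0
```

Because the scan works on a local index and never calls lex(), the lex FSM state is untouched; the caller only has to keep its own position.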

24 Oct 2003
P. Chongstitvatana