Indentation
Indentation is used to group the statements of a block.
Grouping is done in the scanner because it is the scanner that must
know when a group starts and ends. To simplify the implementation,
the source is assumed to contain no tabs (tabs are converted to spaces).
The rule to recognise block-begin and block-end is:
  check the column of the first token on the new line
  if col == previous   proceed as normal
     col >  previous   it is block-begin, push col
     col <  previous   it is block-end, pop col and repeat the
                       check until col matches a block-begin
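The rule above can be sketched as a small function. This is only an
illustration written in Python, not the scanner's actual code: the name
classify_line, the use of a list as the column stack, and the base
column of 1 are assumptions made for the example. The note does not say
what happens when a dedent lands between two recorded columns, so the
sketch simply stops popping in that case.

```python
def classify_line(col, colstack):
    """Return the block tokens implied by a new line starting at col.

    colstack holds the start columns of the open blocks, so colstack[-1]
    plays the role of 'previous'. The list is updated in place.
    (Hypothetical helper, not part of the original scanner.)
    """
    out = []
    if col > colstack[-1]:
        # Deeper indentation: block-begin, remember the new column.
        colstack.append(col)
        out.append("{")
    else:
        # Shallower indentation: one block-end per popped column,
        # repeating the check until col is no longer smaller.
        while col < colstack[-1]:
            colstack.pop()
            out.append("}")
    return out


stack = [1]                      # column 1 is the outermost level
print(classify_line(5, stack))   # one block-begin
print(classify_line(9, stack))   # a nested block-begin
print(classify_line(1, stack))   # two block-ends, back to column 1
```

A single dedent can therefore yield several block-ends at once, which is
why lex() needs extra state to hand them out one call at a time.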
The complication lies in the state of lex(). lex() must sometimes
return block-begin or block-end instead of a scanned token, and a
single new line may require several block-ends in a row. Care must be
taken to keep the state of lex() synchronised.
Lookahead in exas() proved to be a source of difficulty because it
backtracks lex() and confuses lexstate. An easy fix is not to use
lex() for lookahead; scanline() and lookforEQ() were written for
this purpose.
lexstate

state | description     | condition       | action      | next state
------+-----------------+-----------------+-------------+-----------
N     | normal (do lex) | old line        | out tok     | N
N     |                 | new line and == | out tok     | N
N     |                 | new line and >  | out { push  | F
N     |                 | new line and <  | out } pop   | B
F     | forward         |                 | out old tok | N
B     | backward        | ==              | out old tok | N
B     |                 | <               | out } pop   | B
lex() has an FSM to control its state. A transition occurs at each
call to lex(). Starting at N, lex() returns tok as normal. At a
newline event the column position is compared to the previous start
column. There are three possibilities:
1) equal: returns tok,
2) >: returns block-begin and marks the column (pushing it onto
   colstack); the next state is F,
3) <: returns block-end and pops the previous mark; the next state is B.
block-begin and block-end are inserted by lex(); the token that was
scanned from the source is kept and is output on the transition back
to N.
In F, the only thing to do is to output the saved tok and go to
N. In B, a block-end is output on each call to lex(), popping colstack
each time, until the position matching a block-begin is found; lex()
then outputs the saved tok and goes to N. At eof, care must be taken
to output block-ends to match the remaining block-begins by popping
colstack until col == 1, one pop per call to lex().
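The N/F/B transitions described above can be sketched as follows. This
is a minimal illustration in Python under several assumptions, not the
original code: the Lexer class, the (text, col, newline) token tuples,
and the stand-in lex1() that reads from a prepared list are all
hypothetical. The sketch also assumes eof is reported as a token at
column 1 on a new line, which is one way to make the B state close
every open block before eof is returned.

```python
class Lexer:
    def __init__(self, tokens):
        # tokens: list of (text, col, newline) tuples; newline is True
        # when the token is the first one on its line (assumed format).
        self.tokens = iter(tokens)
        self.state = "N"
        self.colstack = [1]   # column 1 is the outermost level
        self.saved = None     # token held back while { or } is output

    def lex1(self):
        # Stand-in for the real scanner; eof behaves like a token at
        # column 1 on a new line (assumption).
        return next(self.tokens, ("eof", 1, True))

    def lex(self):
        if self.state == "F":              # forward: out old tok
            self.state = "N"
            return self.saved[0]
        if self.state == "B":              # backward
            if self.saved[1] < self.colstack[-1]:
                self.colstack.pop()        # still too deep: out } pop
                return "}"
            self.state = "N"               # matched: out old tok
            return self.saved[0]
        # state N: scan a real token
        tok = self.lex1()
        text, col, newline = tok
        if newline and col > self.colstack[-1]:
            self.colstack.append(col)      # out { push
            self.saved, self.state = tok, "F"
            return "{"
        if newline and col < self.colstack[-1]:
            self.colstack.pop()            # out } pop
            self.saved, self.state = tok, "B"
            return "}"
        return text                        # old line, or new line and ==


def run(tokens):
    """Call lex() until eof and collect everything it returns."""
    lx, out = Lexer(tokens), []
    while True:
        t = lx.lex()
        out.append(t)
        if t == "eof":
            return out


print(run([("a", 1, True), ("b", 5, True), ("c", 9, True), ("d", 1, True)]))
```

Each call to lex() returns exactly one token, so the inserted
block-begin and block-end tokens never disturb the one-token-per-call
contract that the parser relies on.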
All the original work of lex() is now done in lex1(), and lex() becomes
an FSM that controls the output. Other token processing, namely
searching for the identifier in the symbol table, is also pulled into
lex1().
Lookahead for '=' is done in lookforEQ() instead of using lex(), to
avoid the complex interaction with the new lex FSM during
backtracking (saving and restoring the lex state). The search for
'=' is confined to a single source line, using scanline() (instead of
skipblank()). scanline() scans character by character to find '=',
skipping matching [] (used for indexing), until the end of line; //
is treated as the end of line and terminates the scan. Using
lookforEQ() simplifies saving and restoring the lex state: only
cp and col need to be saved and restored.
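A rough sketch of this lookahead is given below, in Python and for
illustration only. The Scanner class and its fields are assumptions;
only the names scanline, lookforEQ, cp and col come from the note. The
sketch makes no attempt to tell '=' apart from a two-character operator
such as '==', since the note does not say how that case is handled.

```python
class Scanner:
    def __init__(self, src):
        self.src = src
        self.cp = 0     # index of the current character (assumed meaning)
        self.col = 1    # column of the current character, 1-based

    def scanline(self):
        """Advance to the first '=' outside [...] on the current line.

        Returns True if one is found before the end of line or a //.
        """
        depth = 0
        while self.cp < len(self.src):
            ch = self.src[self.cp]
            if ch == "\n" or self.src.startswith("//", self.cp):
                return False            # end of line terminates the scan
            if ch == "[":
                depth += 1              # entering an index expression
            elif ch == "]" and depth > 0:
                depth -= 1              # leaving it
            elif ch == "=" and depth == 0:
                return True
            self.cp += 1
            self.col += 1
        return False

    def lookforEQ(self):
        """Look ahead for '=' without disturbing the scanner position."""
        cp, col = self.cp, self.col     # the only state to save
        found = self.scanline()
        self.cp, self.col = cp, col     # restore
        return found


s = Scanner("a[i = 1] = 2 // x = 3\nb")
print(s.lookforEQ(), s.cp, s.col)
```

Because the lookahead never calls lex(), the N/F/B state and colstack
are untouched, and restoring two integers is enough to undo it.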
24 Oct 2003
P. Chongstitvatana