This note explains an implementation of a naive top-down parser in C. I use list as main data structure. The way it works is more like a LISP programming in C (if you know LISP). I think it is quite natural this way. The reason I do not use "Artificial Intelligence" language such as LISP, Prolog or Python (as recommended by Peter Norvig) is that it takes too much time to introduce a new programming language to a class that is already crammed with AI contents.
A list contains variable number of elements (which can also be another
list). For illustration I will use only "name" (a string) and "number"
(integer) as the basic data element (or called it an atom as opposed to a
list). The function name("...") creates a string element and number(..)
creates a number element. NIL is a special atom denoting an empty
element. Construction of a list is by "cons"ing elements together. The
last element of a list is a NIL.
a = cons(number(1), cons(number(2),NIL))
(1 2)
b = cons(number(3), cons(name("abc"),NIL))
(3 "abc")
c = cons(a,cons(b,NIL))
c = ((1 2) (3 "abc"))
Selector picks a list apart. head() picks the first element. tail() produces the list without the first element.
head(a) is 1
tail(a) is (2)
Remember this invariant. If x is a list,
cons(head(x), tail(x)) == x
Two new kinds of atoms are introduced: rule, lex. makerule(S) produces
S as an atom. makelex(V) creates an atom V. (where S, V are defined as
constants in C).
A rule is stored as a list of (left-hand-side right-hand-side). When
rhs has more than one rule, it is a list of rules.
S -> NP VP is (S (NP VP))
A grammar is a list of rule.
S -> NP VP
NP -> ART Nis ((S (NP VP)) (NP (ART N)))
A lexicon is a list of lex and name.
N -> dogs, man is (N ("dogs" "man"))
An example of grammar
S -> NP VP
NP -> ART N
NP -> ART ADJ N
VP -> V
VP -> V NP
cried : VThis above grammar and lexicon are stored as an association list. An association is a pair of key and attribute. For a grammar, key is the rule name. attribute is its right-hand-side. For a lexicon, key is lexicon name, attribute is list of words.
dogs: N, V
the: ART
old: ADJ, N
man: N, V
asso(key, ls)We recurse on the rest of the asso list. input the grammar above
if ls == NIL return NIL
a = head(ls) // a is the first pair
if head(a) == key return second(a)
asso(key, tail(ls))
asso(NP, grammar) returns ((ART N)(ART ADJ N))
asso(ART,lexicon) returns ("the")
NP -> ART NIn the grammar it is stored like this.
NP -> ART ADJ N
(NP (ART N) (ART ADJ N))If we have a state:
(NP VP)We substitute NP by its rhs. The output is:
subs( state )
rhs = asso(head(state), grammar)
return subs2(rhs, tail(state))
subs2( rules, state )
if rules == NIL return NIL
return cons(merge(head(rules),state),
subs2(tail(rules),state))
state is (NP VP)
subs( (NP VP ) )
rhs of NP is ((ART N) (ART ADJ N))
subs2( rhs, (VP) )
a = merge( (ART N), (VP) )
b = subs2( ((ART ADJ N)), (VP))
cons(a,b)
output: ((ART N VP) (ART ADJ N VP))
where merge( L1, L2) produces (L1 L2)
merge( (ART N), (VP )) => (ART N VP)
1. if states == NIL return NOWe repeat these steps until YES or NO.
2. s1 = head(states) // s1 is the current state
3. if success(s1) return YES
// let t by the first symbol
4. if t is a lexicon
5. rhs = asso(t,lexicon)
6. if( match(rhs,head(input)) )
7. states = cons(...,tail(states))
8. else
states = tail(states) // back up, do alternatives
9. else if t is a rule
10. rhs = asso(t, grammar)
11. states = merge(subs2(rhs,tail(rule),input),tail(states))
12. else
states = tail(states) // all fail, back up
states = (((S )("the" "dogs" "cried" )))Line 11, merge(subs2(..),.. ) with tail(states) which is currently NIL. The subs2(rhs,NIL,("the"..)) rewrites S -> (NP VP). The result is the states (((NP VP )("the" "dogs" "cried" ))) and then repeat from Line 1 again.
parse( states )
2> s1 = ((S )("the" "dogs" "cried" ))
t = S
t is a rule
10> rhs of S is ((NP VP))
11> states = merge(
subs2((NP VP), NIL, ("the" "dogs" ..)), NIL))
10> rhs: NP -> ((ART N)(ART ADJ N))The states are now (((ART N VP )("the" "dogs" "cried" ))((ART ADJ N VP )("the" "dogs" "cried" )))
11> states = merge(
subs2(((ART N)(ART ADJ N)),(VP),("the"...)),NIL)
5> rhs: ART ("the")The states are now (((N VP )("dogs" "cried" ))((ART ADJ N VP )("the" "dogs" "cried" )))
6> match(("the"),"the")
7> generate the next states