Mini SQL Where Clause Parser


This is a parser for a subset of SQL where clause.  It is used as a study of parser construction.  The critical part of the construction is writing the grammar. Then, the building of data structure (parse-tree) that can be used in the output generation phase.  This parser does not perform type checking.

Here is the sample of the expressions which we expect to parse.

table1.a1 = 100 and table1.a2 = '678'
table4.a3 = 1000 or table4.a4 = '22333'
table5.a5 is not null and table5.a7 not in ('11122','22234')
table6.a6 in ('33344','56678')
table7.a3 <> 30
table8.a8 between to_date('01-12-2015','DD-MM-YYYY') and to_date('10-12-2015','DD-MM-YYYY')
table9.type = '5555' and upper(table9.name) like 'xyz%' and upper(table9.name) not like 'abc%'


Here is its grammar of the language that the parser accept:

clause :=  clause and clause
clause :=  clause or clause
clause :=  ( clause )

clause := name operator value
clause := name like 'string'
clause := name not like 'string'
clause := name between value and value
clause := name not between value and value
clause := name in list
clause := name not in list
clause := name is null
clause := name is not null

name := iden | fun ( iden )
operator := = | <> | < | <= | > | >=
list := ( values )
values := value | value , values
value := number | 'string' | fun list

iden 
identifier
fun  
function name

Here is the explanation how to change the above grammar into one that is suitable for LL(1) parsing.
The grammar which is suitable for LL(1) parser:

<clause> := <expr> | EOF
<expr> := <term> <exprs>
<exprs> := <logic> <term> <exprs> | NIL
<logic> := AND | OR
<term> := <bop> | FUN ( IDEN ) <bop> | ( <expr> )

<bop> :=
  =  <value> | <> <value> |
  < <value> | <=  <value> |
  >  <value> | >= <value> |
  LIKE 'STRING' |
  BETWEEN <value> AND <value> |
  IN <vlist> |
  IS <isnull> |
  NOT <xop>

<xop> :=
  LIKE 'STRING' |
  BETWEEN <value> AND <value> |
  IN <list>

<value> := NUMBER | 'STRING' | FUN <list>
<isnull> := NULL | NOT NULL
<list> := ( <value> <values>
<values> := , <value> <values> |  )


The parse-tree is represented by a list of  the form: (op term term)

Example

table1.name = 'prabhas' and table1.count = 20
its parse-tree is

(and (= table1.name 'prabhas') (= table1.count 20))

By traversing this parse tree, a linear form of output is generated.  This form makes it easy to perform line-by-line linear scan to collect any desired attributes.

LB (
logic and
name table1.name
op =
value 'prabhas'
name table1.count
op =
value 20
RB )

Source code

The first version (cannot parse function() ) is here.  sql0.zip 
Updated version with fun(.)  sql1.zip

To use the parser, go to directory /test.  There is two test files:  test.txt, cond-file.txt

> sql test.txt

The parser accepts only one clause at a time. So the input-file must contains only one clause.  If you want to change the output form, please check the source code at the directory /comp  and look in to the file gencode.c.  

The built-in function list is in the file lex.c .  To change the list, please update the array fun_name[.].

Reference

SQL at wiki
quick reference for SQL syntax (at w3school)

last update 16 Feb 2016