Aim:  We learn how a compiler work. We study theories behind the
      translation (compile) methods. We write a specification of a simple
      language (a grammar). We build a simple compiler from prefabricated parts
      (my compiler
        here). 
      
      Theory of Grammar, especially Context-Free Grammar ( ppt
        )
      Parser, top down, LL(1)  ( ppt )
      Write a grammar
      Compiler package  rz37ds-1.zip
      S2
        processor and its instruction set
      
    
      The first run, compile the whole compiler and test that it works.
      1)  unzip the rz37ds-1.zip package.
      2)  the compiler is in the directory /comp
      3)  use your tools (C compiler) to compile it.  Let the compiler
      executable be "rz37".
      4)  this compiler compiles an input file of Rz language
      
      see what is Rz
        language here
      
      and it produces an assembly language output (to be translate to machine
      codes and run on an appropriate processor).
    
Rz input file:
      
      // a test file, Rz language
      
      ax[10]
      
      findmax(a)
        max = a[0]
        i = 1
        while( i < 10 )
          if( max < a[i] )
             max = a[i]
          i = i + 1
        return max
      
      main()
        print(findmax(&ax))
      
      // -------------------
      
      run this command
      
      c>test/rz37 max.txt
      
      and you will get this output on the screen. It is the assembly language of
      S2 processor.  You can read about S2
        processor and its instruction set here.
      
      .symbol
         ...
         ax 5000
        .code 100
        ...
         st r1 20
         jal rads main
         trap r0 #0
        ...
        :findmax
        st r1 @1 fp
        ...
        pop sp r1
        ; (= #2 (vec #1 0 ))
        mov r5 #0
        ld r4 +r1 r5
        mov r2 r4
        ; (= #3 1 )
        mov r3 #1
        ; while ; (< #3 10 )
        jmp L102
        :L103
        ; if ; (< #2 (vec #1 #3 ))
        ld r5 +r1 r3
        lt r4 r2 r5
        jf r4 L104
        ; (= #2 (vec #1 #3 ))
        ld r4 +r1 r3
        mov r2 r4
        :L104
        ; (= #3 (+ #3 1 ))
        add r3 r3 #1
        :L102
        lt r4 r3 #10
        jt r4 L103
        ; end while
        ; (return #3 )
        mov retval r3
        ...
        ld r1 @1 fp
        ret rads
        
        ; function
        :main
        st r1 @1 fp
        add fp fp #2
        st rads @0 fp
        ; (print (call findmax (& ax )))
        mov r1 #ax
        push sp r1
        jal rads findmax
        trap r28 #1
        ld rads @0 fp
        sub fp fp #2
        ld r1 @1 fp
        ret rads
        .data 5000
        .end
        
        ; ---------------------------
    
 To understand the compiler, we start with the first phase of
      compilation, lexical analyser (scanner).  Scanner reads the input stream
      and recognises the words of the language (words are defined by Regular
      Expression).  It returns "token", an internal representation (just integer
      in our compiler). 
    
You try the scanner to see how it works by running this function. mylex() is the function that reads input and returns token.
void testlex(void){
            while(tok != tkEOF){
                prtoken(tok);
                printf(" '%d ",tok);
                mylex();
            }
        }
int main( int argc, char *argv[] ){
      ...
          readinfile(source);
          testlex();
          return 0;
      }This function is inserted into the main compiler file "compile.c".
      Rebuild the compiler and run it on an input file (above). This is the
      output on the screen. The word is displayed follow by its integer
      representation.
    
D:\prabhas\bag\rz\rz37ds\test>rz37 max.txt
        ax '10 [ '74 10 '11 ] '75 ; '71 findmax '10 ( '72 a '10 ) '73 { '76 max
        '10 = '5
        4 a '10 [ '74 0 '11 ] '75 ; '71 i '10 = '54 1 '11 ; '71 while '80 ( '72
        i '10 <
        '62 10 '11 ) '73 { '76 if '78 ( '72 max '10 < '62 a '10 [ '74 i '10 ]
        '75 ) '73
        { '76 max '10 = '54 a '10 [ '74 i '10 ] '75 ; '71 } '77 i '10 = '54 i
        '10 + '53
        1 '11 ; '71 } '77 return '81 i '10 ; '71 } '77 main '10 ( '72 ) '73 {
        '76 print
        '82 ( '72 findmax '10 ( '72 & '56 ax '10 ) '73 ) '73 ; '71 } '77
        D:\prabhas\bag\rz\rz37ds\test>
    
The next exploration is to change the parser so that it can read a new input language. Let's start with defining the language. Our language consists of just an expression (similar to expression in an ordinary programming language). These are examples of sentences in this language.
a + b;
a + 1 + b;
To define a language, we write a Context Free Grammar of that language.
pass = stmt pass | tkEOF         ;  parse each statement until end
        of file
        stmt = expr tkSEMI                 ;  each stmt is expression follows by
        ;
        expr = term exprs                   ;  each expr is a term or a term,
        binary operator, another term
        exprs = bop term exprs | nil     
        term = tkIDEN | tkNUMBER     ;  each term is an identifier or a number
        bop = tkPLUS                         ;  operator is +
    
We use a parser generator (directory /pgen-c/pgen.exe) to generate a parser in C language from this grammar. The input grammar needs a few additional details, you can see in the file "grammar.txt". The parser generator reads input grammar from the file "grammar.txt". The output ("parse.c") looks like this:
D:\prabhas\bag\rz\rz37ds\pgen-c\ds>pgen > parse.c
    
// parser generated from grammar
      ...
      int pass(void){
      loop:
      if( stmt() ){
      goto loop;}
      if( tok == tkEOF ){
      mylex();
      return 1; }
      return 0;
      }
      int stmt(void){
      ...
      }
    
Now, we rebuild the compiler by replacing the original "parse.c" by this new "parse.c". We have to change mylex() in "lex.c" (to skip the preprocessing step). Change main() and rebuild the compiler and use it to compile our language input.
file "lex.c"
void mylex(void){
            prtoken(tok);
            printf(" ");
            mylex2();
        }
    
file "compile.c"
    
int main( int argc, char *argv[] ){
        ...
            readinfile(source);
            f = pass();
            if( f )
                printf("yes\n");
            return 0;
        }
    
Here is the output on the screen:
D:\prabhas\bag\rz\rz37ds\test>rz37 t1.txt
        a + 1 + b ; eof yes
    
< to be continued >
    
last update 11 Aug 2013