Parsing a Nut program into N-code


First, the parser converts the input source program into an intermediate form called intermediate code. In Nut, the intermediate code is called N-code, and its structure resembles that of the source program. An important rule of thumb for a compiler is that whatever can be done at compile time should be done then, to save effort at run time and make the program run as fast as possible. This is a reasonable strategy because a program, once compiled, may be run many times.

N-code is explained in detail in chapter 2 of my textbook (pages 41-45). Here is a summary of its instruction set:
-----------------------
  N-code instruction set

The N-code instruction set defines the internal representation of the Nut language. Its instructions follow from the Nut language, plus some extra instructions that implement the precise operational semantics of Nut. The instruction set is divided into four groups: control, value, arithmetic and system. Each instruction has the form of an atom with a 7-bit opcode and a 24-bit argument.

opcode encoding

xIF 1   xWHILE 2  xDO 3     --       xNEW 5
xADD 6  xSUB 7    xMUL 8    xDIV 9   xEQ 10
xLT 11  xGT 12    xCALL 13  xGET 14  xPUT 15
xLIT 16 xLDX 17   xSTX 18   xFUN 19  xSYS 20
--      --        --        --       xLD 25
xST 26  xLDY 27   xSTY 28   --       --
--      xSTR 32   xBAND 33  xSHR 34  xSHL 35

In total there are 27 instructions in the N-code instruction set. Only value instructions have arguments, written as “op.arg”. In addition, “fun” has special arguments (explained below), and “call” takes as its argument a pointer to the body (the N-code) of the called function.
-----------------------
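The opcode table and the atom format can be sketched in Python. This is only an illustration under assumptions the summary does not state: the opcode is placed in the bits just above the 24-bit argument, the argument is treated as an unsigned number, and the lowercase names (taken from the xNAME labels) are chosen for readability. The dictionary copies the opcode numbers from the table above and checks that there are 27 of them.

```python
# Opcode numbers copied from the N-code encoding table.
OPCODES = {
    "if": 1, "while": 2, "do": 3, "new": 5,
    "add": 6, "sub": 7, "mul": 8, "div": 9, "eq": 10,
    "lt": 11, "gt": 12, "call": 13, "get": 14, "put": 15,
    "lit": 16, "ldx": 17, "stx": 18, "fun": 19, "sys": 20,
    "ld": 25, "st": 26, "ldy": 27, "sty": 28,
    "str": 32, "band": 33, "shr": 34, "shl": 35,
}
assert len(OPCODES) == 27                  # matches the count in the summary
assert max(OPCODES.values()) < 2 ** 7      # every opcode fits in 7 bits

OPCODE_NAMES = {code: name for name, code in OPCODES.items()}
ARG_BITS = 24
ARG_MASK = (1 << ARG_BITS) - 1

def encode(op, arg=0):
    """Pack an instruction into one word (assumed layout: opcode above a 24-bit arg)."""
    if not 0 <= arg <= ARG_MASK:
        raise ValueError("argument does not fit in 24 bits")
    return (OPCODES[op] << ARG_BITS) | arg

def decode(word):
    """Split a word back into (opcode name, argument)."""
    return OPCODE_NAMES[word >> ARG_BITS], word & ARG_MASK

print(encode("lit", 10))          # 268435466
print(decode(encode("get", 1)))   # ('get', 1)
```

With 7 + 24 = 31 bits, one instruction fits comfortably in a 32-bit word under this layout.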

Note: N-code has no instructions for the operators !=, <=, >=, and, or and not. To use them, define them as Nut functions:
(def != (a b) () (if (= a b) 0 1))
(def >= (a b) () (if (< a b) 0 1))
(def <= (a b) () (if (> a b) 0 1))
(def and (a b) () (if a b 0))
(def or (a b) () (if a 1 b))
(def not a () (if a 0 1))
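The definitions above can be modelled in Python to check their truth tables. This is a hypothetical model, not Nut code: eq, lt, gt and if_ stand in for the N-code primitives, with 1 for true, 0 for false, and any non-zero value treated as true by if_. Because these are ordinary functions, both arguments are evaluated before the call, so and_ and or_ do not short-circuit; note also that and_ returns b itself rather than 1.

```python
# Stand-ins for the N-code primitives: comparisons yield 1 (true) or 0 (false).
def eq(a, b): return 1 if a == b else 0
def lt(a, b): return 1 if a < b else 0
def gt(a, b): return 1 if a > b else 0

def if_(cond, then, other):
    """Model of (if c t e): any non-zero condition selects `then`."""
    return then if cond != 0 else other

# Direct transcriptions of the Nut definitions.
def ne(a, b): return if_(eq(a, b), 0, 1)     # (def != (a b) () (if (= a b) 0 1))
def ge(a, b): return if_(lt(a, b), 0, 1)     # (def >= (a b) () (if (< a b) 0 1))
def le(a, b): return if_(gt(a, b), 0, 1)     # (def <= (a b) () (if (> a b) 0 1))
def and_(a, b): return if_(a, b, 0)          # (def and (a b) () (if a b 0))
def or_(a, b): return if_(a, 1, b)           # (def or (a b) () (if a 1 b))
def not_(a): return if_(a, 0, 1)             # (def not a () (if a 0 1))

print(ge(3, 3), le(4, 3), and_(1, 5))        # 1 0 5
```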

xFUN has two arguments encoded into its argument field, written fun.a.s, where a is the arity of the function and s is the size of its activation record, measured as the number of local variables of the function (its arguments plus its local variables).
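A small Python helper makes the arithmetic concrete. It is only a sketch of how the two numbers follow from a definition (def name params locals body); how a and s are packed into the 24-bit field is not described here, so the sketch produces the textual label only. The parameter and local lists come from the example programs in this section.

```python
def fun_header(params, local_vars):
    """Build the fun.a.s label: a = arity, s = arguments + local variables."""
    arity = len(params)
    size = len(params) + len(local_vars)
    return "fun.%d.%d" % (arity, size)


# (name, parameters, locals) of the example programs in this section.
examples = [
    ("sq", ["x"], []),
    ("assign", [], ["a", "b"]),
    ("parseControl", [], ["i", "j", "k"]),
    ("simple", [], ["a", "b"]),
]
for name, params, local_vars in examples:
    print(name, fun_header(params, local_vars))
# sq fun.1.1
# assign fun.0.2
# parseControl fun.0.3
# simple fun.0.2
```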

Here are some examples of parsing Nut programs into N-code.
1) Simple function definition. Source code in file "sq.txt":
(def sq x () (* x x ))

e:>nut32 < sq.txt
sq
(fun.1.1 (* get.1 get.1 ))
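The translation of this first example can be sketched in Python. This is not the nut32 implementation: the reader, the function names and the output spacing (normalized, without the extra blank before ")") are illustrative, and numbering parameters first and then locals, starting at 1, is an assumption that agrees with the examples here. The sketch handles only bodies that read local variables and constants; set is left to a later sketch.

```python
import re


def read(src):
    """Read one s-expression into nested lists of token strings."""
    tokens = re.findall(r"[()]|[^\s()]+", src)

    def parse(pos):
        if tokens[pos] == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1           # skip the closing ")"
        return tokens[pos], pos + 1         # an atom

    tree, _ = parse(0)
    return tree


def as_list(x):
    # A bare name such as x in (def sq x () ...) is treated as a one-name list.
    return x if isinstance(x, list) else [x]


def translate(expr, slots):
    if isinstance(expr, list):              # (op arg ...) keeps its operator
        op, *args = expr
        return "(" + " ".join([op] + [translate(a, slots) for a in args]) + ")"
    if expr.lstrip("-").isdigit():          # integer constant -> lit.n
        return "lit." + expr
    return "get.%d" % slots[expr]           # local variable -> get.i


def compile_def(src):
    """Return (name, N-code text) for a def whose body only reads locals."""
    _, name, params, local_vars, body = read(src)
    names = as_list(params) + as_list(local_vars)
    slots = {n: i + 1 for i, n in enumerate(names)}   # assumed: params, then locals
    header = "fun.%d.%d" % (len(as_list(params)), len(names))
    return name, "(%s %s)" % (header, translate(body, slots))


print(compile_def("(def sq x () (* x x ))"))
# ('sq', '(fun.1.1 (* get.1 get.1))')
```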

2) Assignment statement. Source code in file "assign.txt":
(def assign () (a b) (set a (+ b 1)))

e:>nut32 < assign.txt
assign
(fun.0.2 (put.1 (+ get.2 lit.1 )))
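The assignment rule can be sketched in Python: a set on a local variable becomes put.i, a read becomes get.i and a constant becomes lit.n. To stay short, this hypothetical sketch takes the program already read into nested lists (names as strings, constants as ints) and only handles local variables; any other form, such as +, do, while or if, keeps its operator and has its arguments translated.

```python
def translate(expr, slots):
    """Translate a parsed Nut expression that uses only local variables.

    expr is a nested list (an operator application), a str (a variable name)
    or an int (a constant); slots maps each local name to its index 1..n.
    """
    if isinstance(expr, int):                 # constant -> lit.n
        return "lit.%d" % expr
    if isinstance(expr, str):                 # read a local -> get.i
        return "get.%d" % slots[expr]
    op, *args = expr
    if op == "set":                           # (set v e) -> (put.i e')
        var, value = args
        return "(put.%d %s)" % (slots[var], translate(value, slots))
    # Any other form (+, <, do, while, if, ...) keeps its operator.
    return "(%s %s)" % (op, " ".join(translate(a, slots) for a in args))


# (def assign () (a b) (set a (+ b 1))): no parameters, locals a -> 1, b -> 2
slots = {"a": 1, "b": 2}
body = ["set", "a", ["+", "b", 1]]
ncode = "(fun.0.2 %s)" % translate(body, slots)
print(ncode)  # (fun.0.2 (put.1 (+ get.2 lit.1)))
```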

3) Control statement. Source code in file "control.txt":
(def parseControl () (i j k)
  (do
    (while (< i 10)
       (set i (+ i 1)))
    (if (= j 2)
       (set k 20)
       ; else
       (set k 10))))

e:>nut32 < control.txt
parseControl
(fun.0.3 (do
(while (< get.1 lit.10 )(put.1 (+ get.1 lit.1 )))
(if (= get.2 lit.2)(put.3 lit.20 )(put.3 lit.10 ))))
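To see what the N-code for parseControl does, it can be run by a tiny tree-walking interpreter. This Python sketch is hypothetical: N-code nodes are written as tuples such as ("put", 1, ...), comparisons return 1 or 0, any non-zero value counts as true, statements return None, and the local variables are given initial values explicitly because the example does not say how they are initialised.

```python
import operator

# Binary operators; comparisons produce 1 (true) or 0 (false).
BINOP = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "=": lambda a, b: int(a == b),
    "<": lambda a, b: int(a < b),
    ">": lambda a, b: int(a > b),
}


def run(node, frame):
    """Evaluate one N-code node; frame[i] holds local variable i (slot 0 unused)."""
    op = node[0]
    if op == "lit":                       # lit.n -> the constant n
        return node[1]
    if op == "get":                       # get.i -> read local i
        return frame[node[1]]
    if op == "put":                       # put.i e -> write local i
        frame[node[1]] = run(node[2], frame)
        return None
    if op == "do":                        # evaluate statements in order
        for stmt in node[1:]:
            run(stmt, frame)
        return None
    if op == "while":                     # repeat body while condition is non-zero
        while run(node[1], frame) != 0:
            run(node[2], frame)
        return None
    if op == "if":                        # (if c then else); else is optional here
        if run(node[1], frame) != 0:
            return run(node[2], frame)
        return run(node[3], frame) if len(node) > 3 else None
    left, right = run(node[1], frame), run(node[2], frame)
    return BINOP[op](left, right)


# The N-code of parseControl, transcribed as tuples.
PARSE_CONTROL = ("do",
    ("while", ("<", ("get", 1), ("lit", 10)),
              ("put", 1, ("+", ("get", 1), ("lit", 1)))),
    ("if", ("=", ("get", 2), ("lit", 2)),
           ("put", 3, ("lit", 20)),
           ("put", 3, ("lit", 10))))

frame = [None, 0, 2, 0]   # i = 0, j = 2, k = 0 (values chosen for the demo)
run(PARSE_CONTROL, frame)
print(frame[1:])          # [10, 2, 20]
```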

4) Global variable declaration. Source code in file "global.txt":
(let arrayA g)
(def simple () (a b)
  (do
    (set g 1000)                ; global
    (set a g)
    (set b 11)
    (set arrayA (new 10))
    (setv arrayA 1 20)          ; arrayA[1] = 20
    (set b (vec arrayA 1))))    ; b = arrayA[1]

e:>nut32 < global.txt
arrayA
g
simple
(fun.0.2 (do (st.1 lit.1000 )
(put.1 ld.1 )(put.2 lit.11 )
(st.0 (new lit.10 ))
(sty.0 lit.1 lit.20 )
(put.2 (ldy.0 lit.1 ))))

Note: the compiler replaces local variable names with indices 1..n and global variable names with static addresses 0..m. The actual addresses are determined when the real machine code is generated.
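The naming scheme in this note can be sketched in Python for the global.txt example. The sketch is hypothetical: it takes the program already read into nested lists (names as strings, constants as ints), assigns global addresses in declaration order (which reproduces arrayA at 0 and g at 1 in the output above), and handles setv and vec only for global arrays, as in this example. Output spacing is normalized.

```python
def translate(expr, local_slots, global_addr):
    """Lower a parsed Nut expression, resolving locals to 1..n and globals to 0..m."""
    def t(e):
        return translate(e, local_slots, global_addr)

    if isinstance(expr, int):                          # constant -> lit.n
        return "lit.%d" % expr
    if isinstance(expr, str):                          # variable read
        if expr in local_slots:
            return "get.%d" % local_slots[expr]        # local -> get.i
        return "ld.%d" % global_addr[expr]             # global -> ld.addr
    op, *args = expr
    if op == "set":
        var, value = args
        if var in local_slots:
            return "(put.%d %s)" % (local_slots[var], t(value))
        return "(st.%d %s)" % (global_addr[var], t(value))
    if op == "setv":                                   # (setv arr i v) -> (sty.addr i v)
        arr, index, value = args
        return "(sty.%d %s %s)" % (global_addr[arr], t(index), t(value))
    if op == "vec":                                    # (vec arr i) -> (ldy.addr i)
        arr, index = args
        return "(ldy.%d %s)" % (global_addr[arr], t(index))
    return "(%s %s)" % (op, " ".join(t(a) for a in args))


# (let arrayA g): addresses assigned in declaration order.
global_addr = {name: addr for addr, name in enumerate(["arrayA", "g"])}
local_slots = {"a": 1, "b": 2}                         # locals of simple
body = ["do",
        ["set", "g", 1000],
        ["set", "a", "g"],
        ["set", "b", 11],
        ["set", "arrayA", ["new", 10]],
        ["setv", "arrayA", 1, 20],
        ["set", "b", ["vec", "arrayA", 1]]]
ncode = "(fun.0.2 %s)" % translate(body, local_slots, global_addr)
print(ncode)
```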

last update 23 June 2010