Code generation

accessing variables
expression
assignment
control flow
function call
header
example
The code generation refers to the generation of S2 assembly language from the input source rz language.  The code generation employs different method depends on types of statement in the source code.  It can be categorised as:
  1. expression
  2. assignment
  3. control flow
  4. function call
The actual mechanism for rz to S2 code generation is based on the intermediate code (ic, byte-code) that rz compiler generated but it is not necessary to discuss the intermediate code (ic) in details.

Accessing variables

There are two types of variable: local and global.  Local variables reside in the activation record (usually implemented in the stack segment).  Global variables have absolute addresses in the data segment.  Accessing a local variable is performed using a "base pointer" (bp) and offset.  The base pointer is the pointer to the current activation record.  For example, the activation record stored locals and a return address, it grows from lo memory to hi memory with the base pointer at the end (Fig. 1  below).
| lv n   |  Lo
|--------|
| ...    |
|--------|
| lv 1   |
|--------|
| rads   | <- bp
|--------|  hi
Figure 1  an activation record

Accessing a local variable n can be done by
 ld   r1 @-n bp
 st   @-n bp r1

Accessing a global variable in done by absolute addressing:
 ld   r1 var
 st   var r1

Expression

An expression has been transformed into a post-fix form so that the code generation can be done based on left to right scanning.  The intermediate result is stored in the register.  The number of register used depends on the depth of stacking the intermediate result.  Assuming all variables are local, the expression

rz:  i * n + j
 
is transformed into a stack-based operations

ic:  rval i
     rval n
     mul
     rval j
     add

where rval n  means getting the value of a local variable n and push it onto the evaluation stack.  The following S2 code is generated

s2:  ld   r1 @-1 bp
     ld   r2 @-2 bp
     mul  r1 r1 r2
     ld   r2 @-3 bp
     add  r1 r1 r2

where i, n, j are local 1, 2, 3.  bp is already set by prior instructions (it is updated by a function call sequence).  The result is kept in r1.  For a logical expression, if the result is required explicitly (such as in an expression involved logical expression) the result will be a boolean value which is realised as 0/1 integer.  If the logical expression is part of conditional expression in an if-statement then the result is combined with the jump conditional and the explicit result is not required.  Suppose a result is required:

rz:  i < n

ic:  rval i
     rval n
     lt

s2:  ld   r1 @-1 bp
     ld   r2 @-2 bp
     sub  rz r1 r2
     ld   r1 #1
     jmp  lt skip_6
     ld   r1 #0
:skip_6 ...

Assignment

An assignment involved storing a value to a variable.  For a global variable, an absolute address is used:

rz:  level = 1

ic:  lvalg 10
     lit 1
     set

s2:  ld  r1 #1
     st  2010 r1

where lvalg n denotes the left-value (address) of a variable n (n is an offset into a data segment). Assuming the data segment started at 2000, the absolute address of "level" is 2010.  For a local variable, bp is used:

rz:  i = 1

ic:  lval 1
     lit 1
     set

s2:  ld  r1 #1
     st  @-1 bp r1

Control flow

There are two control statements in RZ:  if, while.  They are compiled into jumps.

rz:  if( e1 ) e2 else e3

     e1
     jump-if-false :else
     e2
     jump :out
:else e3
:out  ...

The logical expression in the conditional is combined with jump-if-false.  Assuming e1 is "i < n"

ic:  rval i
     rval n
     lt
     jz :else
     e2
     jmp :out
:else e3
:out  ...

s2:  ld   r1 @-1 bp
     ld   r2 @-2 bp
     sub  rz r1 r2
     jmp  ge label_else
     e2
     jmp  label_out
:label_else  e3
:lable_out ...

The "i < n" is combined with "jum-if-false" and becomes

s2:  sub ...  ;; to affect flags
     jmp ge   ;; not less than

The "while" is compiled into test and jump and jump to loop at the end.

rz:   while( e1 ) e2

:loop e1
     jump-if-false :out
     e2
     jump :loop
:out . . .

s2:
:label_loop  e1
     jmp  ... lable_out
     e2
     jmp  label_loop
:label_out ...

This sequence results in executing two jumps, one for entry into the loop and one at the end to loop back, for every iteration round the loop.  An alternative sequence can reduce it to just one jump by placing the test at the end.

     jump :in
:loop e2
:in   e1
     jump-if-true :loop
        ...

Function call

A function call passes parameters, creates an activation record, save registers and jump to the code of function body.  The actual parameters are passed to the callee activation record.  The location of local variables in the callee are known at compile time.  The activation record stored local variables and the computation state.  The computation state saved the previous state before the call occur so that it can be restored upon returning from the call.  Usually the return address and the previous base pointer are saved.  However, as the size of the current activation record is known at the compile time, it is not necessary to keep the previous base pointer.  A code sequence can be generated to delete an activation record knowing its size.  The call is performed by "jal" using a register "rads" to store the return address.  For example

rz:  swap(j,j+1)

ic:  rval j
     rval j
     lit 1
     add
     call :swap

s2:  ld   r1 @-2 bp
     add  r1 r1 #1
     st   @1 bp r1
     ld   r1 @-2 bp
     st   @2 bp r1
     jal  rads :swap
 

|   ...  |  Lo
|--------|
|   j    |
|--------|  caller AR
|   i    |
|--------|
| rads   | <- bp
|--------|
|  j+1   |
|--------|  callee AR
|   j    |
|--------|
| rads'  | <- bp'
|--------|  Hi
Figure 2   Passing actual parameters to a function

If there is a value returned by the function call, it is kept in a register "retv" by convention. An example on returning a value from a function call:

rz:  c = f(x)
 
s2:  jal  rads :f
     or   r1 retv rz   ;;  move return value to r1
     ...

In the function body, to create an activation record, bp is adjusted, follows by saving the registers which will be used in the code of body.  The number of register used is known at compile time.  The registers are saved in the stack segment, using a stack pointer "sp".   Suppose two registers are used,

s2:  :func
     add  bp bp #s
     st   @0 bp rads
     ;; ---- save regs
     st   @1 sp r1
     st   @2 sp r2
     add  sp sp #2

where s is the size of the activation record.  s = num. of locals + 1,  num. of locals = passed variables + other local variables.  All known at compile time.  If this function calls another function then the current return address is saved in the activation record.  To return to the caller, the current activation record is removed and the saved registers are restored and the control is transfered back the caller by "jr rads".

s2:  ld   rads @0 bp  ;; restore rads if necessary
     sub  bp bp #s    ;; remove the current AR
     ld   r1 @-1 sp   ;; ---- restore regs
     ld   r2 @0 sp
     sub  sp sp #2
     jr   rads

Header

There are a number of values that must be set initially such as bp, sp.  They are done at the beginning of the code section.  By convention, the code segment started at 0, the statck segement at 1000, the data segment at 2000, the activation record at 3000.

.s          ;; --- define symbols
 rz 0       ;; regiter 0
 retv 28    ;; reg return value
 bp 29      ;; reg base pointer
 rads 30    ;; reg return address
 sp 31      ;; reg stack pointer
.a 0        ;; start code section at 0
.c
   ld sp #1000 ;; initialise sp
   ld bp #3000 ;; initialise bp
   ...

Example

one long example (excerpt from matmul )

;;  N is global; i,j are locals

    ...
    j = 0;
    while( j<N ){
      b[index(i,j)] = j;
      j = j + 1;
    }

   . . .
   st   @-1 bp r0         ;; j = 0
:label_144                ;; while ( j < N )
   ld   r1 @-1 bp         ;;   rval 1
   ld   r2 2001           ;;   rvalg 1
   sub  rz r1 r2          ;;   lt
   jmp  ge label_185      ;;   jz 185  exit
   ld   r1 @-1 bp
   st   @2 bp r1          ;;   pass i
   ld   r1 @-2 bp
   st   @1 bp r1          ;;   pass j
   jal  rads label_200    ;;   call index(i,j)
   or   r1 retv rz
   ld   r2 #2018          ;;   ads of b[.]
   add  r1 r2 r1          ;;   ads of b[index(i,j)]
   ld   r2 @-1 bp         ;;   rval j
   st   @0 r1 r2          ;;   b[index(i,j)] = j
   ld   r1 @-1 bp
   add  r1 r1 #1
   st   @-1 bp r1         ;;   j = j + 1
   jmp always label_144   ;;   loop back
:label_185 ...            ;; exit loop

    index(i,j){
      return i*N + j;
    }

:label_200                ;; index(i,j)
   add  bp bp #3          ;; create AR
   st   @1 sp r1          ;; save registers
   st   @2 sp r2
   add  sp sp #2
   ld   r1 @-2 bp         ;;   i
   ld   r2 2001           ;;   N
   mul  r1 r1 r2          ;;   i * N
   ld   r2 @-1 bp         ;;   j
   add  r1 r1 r2          ;;   (i*N) + j
   or   retv r1 rz        ;;   return i*N + j;
   sub  bp bp #3          ;; remove AR
   ld   r1 @-1 sp         ;; restore regs
   ld   r2 @0 sp
   sub  sp sp #2
   jr   rads              ;; return

See more examples in the sub-directory \bm

24 December 2002
P. Chongstitvatana