Rz language version 3.7

(This compiler is still a work in progress.)
Examples
System call
Dereference
Data structure
Macro
Download

Rz is a descendant of R1, a concurrent language for small control applications. (The full report and implementation are available from my research work web page.) Rz is intended as a teaching language for system programming and computer architecture courses. It is deliberately small so that it can illustrate all the "inner" working parts of a computer system (compilation, code generation, ISA simulation); in other words, it lets students "play" with the system. Rz simplifies R1 by removing all the real-time concurrency features and keeping only the most basic language constructs. In a way, Rz "looks like" C (without types).

Short description

The language is a small C look-alike. It has no types (or, equivalently, only one type, "int"). Global variables must be declared, but local variables are automatic. A variable is either a scalar or an array. There are no user-defined data types. Arrays are one-dimensional. The Rz language can be summarised as follows:
For C programmers, please note: there is no for, break, or do, and many operators are missing, notably ++ and --. The syntax looks clean because it uses indentation instead of {} and a newline instead of ';' to terminate a statement.

Examples

The easiest way to learn most of the syntax is to look at an example. Here is an example of Rz:

    //  find max in an array
    a[10], N

    init()

      i = 0
      while ( i < N )
        a[i] = i
        i = i + 1

    main()
      N = 10
      init()
      max = a[0]
      i = 1
      while( i < N )
        if( max < a[i] ) max = a[i]
        i = i + 1
      print(max)

The variables a[] and N are globals; max and i are locals. The size of an array must be known at compile time. (A note for C users: there is no ++ or --, no "break", and "print" is not "printf".) "print" handles only integers and strings. The size of the basic unit (integer) depends on the target machine.
// sum array

ax[10]

sum()
  i = 0
  s = 0
  while( ax[i] != 0 )
    s = s + ax[i]
    i = i + 1
  return s

main()
  ax[0] = 11
  ax[1] = 22
  ax[2] = 33
  ax[3] = 44
  ax[4] = 0
  print(sum())
Call by reference can be achieved using the * and & operators, just like in C. In short, you can think of Rz syntax as C without type declarations.

increment(x)
  *x = *x + 1 

gv  // global variable
main()
  gv = 3
  increment(&gv)
  print(gv)

Recursion works naturally
// factorial

fac(n)
  if( n == 0 ) return 1
  else return n * fac(n-1)

main()
  print(fac(6))

With version 3.7, the compiler generates machine code for S2 version 3 (s23). A special syntax is introduced to enable low-level code generation:

asm("...x ")
where x is an assembly statement. The compiler writes this statement to the output file. Using this feature requires an understanding of the S2.3 assembler.
s23 assembly language

System call

To implement system-dependent operations (such as input/output), Rz uses "syscall( number, argument )", where number is the system call number (it selects the function) and argument is an optional input to the function. This is compiled into processor-dependent machine code (for S2.3 it is "trap r num"). For the available functions, please see the S2.3 simulator document.

    syscall(1, a)

Dereference operator

The * (deref) and & (address) operators can be used as follows.

*var works for both local and global variables. On the right-hand side it dereferences var to get the value. On the left-hand side it stores indirectly through var. See this example.

inc(v)
  *v = *v + 1


is compiled into

pop sp v
ld r @0 v
add r r #1
st r @0 v


&var works only on a global variable on the right-hand side and has no meaning on the left-hand side. To use the above example, let gv be a global variable.

gv

main()
  gv = 3
  inc(&gv)
  print(gv)


&gv yields the address of the global variable gv. & cannot be used with a local variable because our compilation scheme maps local variables to registers, and there is no way to get the address of a register. (Under a different scheme, it could be made to work.)

To do indirect addressing, we can also use array-indexing notation. The following are equivalent:

    *v == v[0]
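
One way to picture these operators is to treat memory as a flat array of integers in which an address is simply an index. The Python sketch below is only an illustration under that assumption. The names mem, GV, deref, store, inc and demo are invented here and are not part of Rz or its compiler, and the address chosen for gv is arbitrary.

```python
# Toy model: memory is a flat list of integers and an address is an index.
mem = [0] * 16

GV = 5  # hypothetical address assigned to the global gv, so &gv is 5


def deref(v):
    """Right-hand-side *v: fetch the value stored at address v."""
    return mem[v + 0]  # the same cell that v[0] names


def store(v, value):
    """Left-hand-side *v = value: store indirectly through address v."""
    mem[v + 0] = value


def inc(v):
    """Model of the Rz function inc(v), whose body is *v = *v + 1."""
    store(v, deref(v) + 1)


def demo():
    mem[GV] = 3     # gv = 3
    inc(GV)         # inc(&gv)
    return mem[GV]  # the value that print(gv) would show
```

In this model demo() returns 4, which matches what the Rz example with inc(&gv) is expected to print.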

Here is how the compiler generates code for * and &.

lv = local variable
gv = global variable
d  = destination register

term        left-hand-side       right-hand-side
*lv         st d @0 lv           ld d @0 lv
*gv         ld r1 gv             ld r1 gv
            st d @0 r1           ld d @0 r1
&lv         illegal              illegal
&gv         illegal              constant (ads of gv)
*lv[idx]    ld r1 +lv idx        ld r1 +lv idx
            st d @0 r1           ld d @0 r1
*gv[idx]    ld r1 @gv idx        ld r1 @gv idx
            st d @0 r1           ld d @0 r1
&lv[idx]    illegal              illegal
&gv[idx]    add r1 idx #gv       add d idx #gv
            st d @0 r1

Pointer to function

The address operator "&" can also be used to get the address of a function, so a pointer to a function can be implemented (for example, a table of pointers to functions can be used to build a "switch" control structure).  Here is an example:
show()
  print("hello")

main()
  ads = &show
  print(&show)

Note: I did not show how to use the pointer to function.  It is not usable in the Rz language itself without some assembly language.
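
To make the "switch" idea concrete, here is a small sketch written in Python rather than Rz, since Rz cannot call through the pointer without assembly. It only illustrates the general technique of dispatching through a table of function references. The handler names, the table and the switch helper are all hypothetical.

```python
def on_start():
    return "start"


def on_stop():
    return "stop"


def on_other():
    return "other"


# Each entry plays the role of a stored function address
# (what &on_start and &on_stop would give in Rz).
TABLE = [on_start, on_stop]


def switch(selector):
    """Pick a handler by index; fall back to a default when out of range."""
    if 0 <= selector < len(TABLE):
        return TABLE[selector]()
    return on_other()
```

A selector that is outside the table takes the default branch, which corresponds to the "default" case of a C switch.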

How to access a structure?

In Rz, we mostly use arrays to store compound data.  Compare this to C (a linked list cell).

typedef struct {
  int data;
  int next;
} acell;

acell *node;

node->data = 10;


In Rz, we use an array.  Assume we have "malloc".

mynode = malloc(2)
mynode[0] = 10       // data
mynode[1] = ...      // next

*mynode == mynode[0]


Because the syntax allows only one level of indexing, accessing a complex data structure may require an intermediate step. Assume ax is an array of structures with 2 members ( id, data ).

in C

b = ax[20].data

in Rz

def  data  1

ads = &ax[20]
b = ads[data]
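
The address arithmetic behind this two-step access can be sketched with a flat array. The Python model below is an illustration only. The base address AX, the helper names and the packed two-cell record layout (id at offset 0, data at offset 1) are assumptions made for the sketch, not something taken from the Rz compiler.

```python
ID, DATA = 0, 1   # field offsets, in the spirit of "def data 1"
RECORD_SIZE = 2   # assumed: each record occupies two consecutive cells

mem = [0] * 64
AX = 8            # hypothetical base address of the global array ax


def address_of(base, index):
    """Model of &base[index]: the address of one cell."""
    return base + index


def field(ads, offset):
    """Model of ads[offset]: one-level indexing from a computed address."""
    return mem[ads + offset]


def record_address(base, k):
    """Start of the k-th record when records are packed back to back."""
    return address_of(base, k * RECORD_SIZE)


def demo():
    mem[AX + 20 + DATA] = 99   # put a value in the data field
    ads = address_of(AX, 20)   # ads = &ax[20]
    return field(ads, DATA)    # b = ads[data]
```

Under the packed layout assumed here, the record that starts at cell 20 is record number 10, since record_address(AX, 10) equals address_of(AX, 20).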


Simple Macro

The macro in Rz has two uses: 
1  Define symbolic constant
2  Define simple expression with textual substitution

Define symbolic constant

     def  MYMAX   100

Define simple expression

def getRef(ref)
    return record[ref]

def setRef(ref,x)
    record[ref] = x
main()
  a = MYMAX + 10
  c = getRef(a+1)
  setRef(a, a+c)
A macro in Rz is not like a C macro.  Defining a simple expression has the same structure as defining a function, but the body cannot contain any non-free variable.  The macro performs textual substitution of its free variables, so the only variables allowed in the body of the definition are globals and formal parameters. (Any other local variable cannot be substituted and is therefore illegal.)

Because a macro is compiled (it is not handled by a preprocessor), a macro that defines a right-hand-side expression needs a "return" to be syntactically correct. However, the "return" is not substituted into the target.  For the above example, the "output" of the macro substitution is:
main()
  a = 100 + 10
  c = record[a+1]
  record[a] = a + c
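
The substitution step can be imitated in a few lines of Python. This is a rough sketch of textual parameter substitution, not the Rz compiler's actual algorithm. The MACROS and CONSTANTS tables and the substitute and expand helpers are hypothetical, the "return" is assumed to be dropped already, and argument splitting is naive (no nested calls or commas inside arguments).

```python
import re

# name -> (formal parameters, body with the leading "return" already dropped)
MACROS = {
    "getRef": (["ref"], "record[ref]"),
    "setRef": (["ref", "x"], "record[ref] = x"),
}
CONSTANTS = {"MYMAX": "100"}


def substitute(body, formals, actuals):
    """Replace each formal parameter (whole word) with its actual text."""
    mapping = dict(zip(formals, actuals))
    pattern = r"\b(" + "|".join(map(re.escape, formals)) + r")\b"
    # A single pass, so text inserted for one formal is never rescanned.
    return re.sub(pattern, lambda m: mapping[m.group(1)], body)


def expand(line):
    """Expand symbolic constants, then the first call of each macro."""
    for name, value in CONSTANTS.items():
        line = re.sub(r"\b" + re.escape(name) + r"\b", value, line)
    for name, (formals, body) in MACROS.items():
        call = re.search(re.escape(name) + r"\(([^()]*)\)", line)
        if call:
            actuals = [a.strip() for a in call.group(1).split(",")]
            text = substitute(body, formals, actuals)
            line = line[:call.start()] + text + line[call.end():]
    return line
```

For instance, expand("c = getRef(a+1)") gives "c = record[a+1]". The sketch copies argument text verbatim, so setRef(a, a+c) becomes "record[a] = a+c" without the extra spaces.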

Current state of implementation

The output of the compiler is s2.3 assembly language. It can be assembled and run under the s2.3 simulator.

Session example

Here is a hands-on example of how to use the compiler.  Compile the "sum array" program above. The screen will show:
c:> rz37 sum.txt
.symbol
 fp 30
 sp 29
 retval 28
 rads 27
 ax 2000
.code 0
 mov fp #4000
 mov sp #3000
 jal rads main
 trap r0 #0

:sum
  st r1 @1 fp
  st r2 @2 fp
  st r3 @3 fp
  st r4 @4 fp
  add fp fp #5
  st rads @0 fp
  mov r1 #0
  mov r2 #0
  jmp L102
:L103
  ld r3 @ax r1
  add r4 r2 r3
  mov r2 r4
  add r1 r1 #1
...
:main
...
  ld r1 @1 fp
  ret rads
.data 200
.end

Download

rz37.zip   compiler source that generates s2.3 assembly code
rz37-1.zip    updated compiler with deref and macro
rz37-2.zip    updated compiler with pointer to function
 
last update 28 Feb 2013