Rz
language version 3.7
(This compiler is still a work in progress.)
Rz is a descendant of R1, a concurrent language for small control
applications. (See the full report and implementation on my research
work web page.)
Rz is intended as a teaching language for system programming and computer
architecture courses. It emphasises a small language that can be used
to illustrate all the "inner" working parts of a computer system
(compilation, code generation, ISA simulation); in other words, it allows
students to "play" with the system. Whereas R1 is a concurrent language, Rz
eliminates all the real-time concurrency features and retains only the
most basic language constructs. In a way, Rz "looks like" C (without types).
Short description
The language is a small C look-alike. It has no types (or, equivalently,
only one type, "int"). Global variables must be declared, but local
variables are automatic. A variable is either a scalar or an array. There
are no user-defined data types, and an array has one dimension. The Rz
language can be summarised as follows:
- Integer is the only primitive data type. (The natural size depends on
the implementation.)
- Global variables must be declared before use. Local variables
are automatic (they need not be declared).
- Global variables can be arrays. The size of an array must be known at
compile time. An array has only one dimension. Local variables can
only be scalars.
- Reserved words are: if, else, while, return, print.
- Operators are: +, -, *, /, ==, !=, <, <=, >, >=, !, &&, ||,
* (dereference), & (address).
For C programmers, please note: there is no for, break, or do, and many
operators are missing, especially ++ and --. The syntax looks clean because
it uses indentation instead of {} and a newline instead of ';' to terminate
a statement.
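Because blocks are delimited by indentation rather than braces, a front end has to recover the block structure from leading whitespace. The Python sketch below shows one common way to do this, by emitting INDENT and DEDENT markers. It is only an illustration: the function name and the marker names are hypothetical and are not taken from the rz37 source.

```python
def block_tokens(source):
    """Turn indentation into explicit INDENT/DEDENT markers (illustrative only)."""
    tokens = []
    stack = [0]                      # indentation widths of the open blocks
    for line in source.splitlines():
        if not line.strip():
            continue                 # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:        # deeper than the current block: open a new one
            stack.append(width)
            tokens.append("INDENT")
        while width < stack[-1]:     # shallower: close blocks until widths match
            stack.pop()
            tokens.append("DEDENT")
        tokens.append(line.strip())  # the newline-terminated statement itself
    while len(stack) > 1:            # close whatever is still open at end of input
        stack.pop()
        tokens.append("DEDENT")
    return tokens


program = """\
main()
  i = 0
  while( i < 3 )
    i = i + 1
  print(i)
"""
tokens = block_tokens(program)
```

The sketch does not check for inconsistent indentation (a dedent to a width that was never opened); a real front end would report that as an error.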
Examples
The easiest way to learn most of the syntax is to look at an example. Here
is an example of Rz:
// find max in an array
a[10], N

init()
  i = 0
  while ( i < N )
    a[i] = i
    i = i + 1

main()
  N = 10
  init()
  max = a[0]
  i = 1
  while( i < N )
    if( max < a[i] ) max = a[i]
    i = i + 1
  print(max)
The variables a[] and N are globals; max and i are locals. For an array,
the size must be known at compile time. (A note for C users: there is no
++ or --, no "break", and "print" is not "printf".) "print" accepts only
integers and strings. The size of the basic unit (integer) depends on the
target machine.
// sum array
ax[10]

sum()
  i = 0
  s = 0
  while( ax[i] != 0 )
    s = s + ax[i]
    i = i + 1
  return s

main()
  ax[0] = 11
  ax[1] = 22
  ax[2] = 33
  ax[3] = 44
  ax[4] = 0
  print(sum())
Call by reference can be achieved using the * and & operators, just
as in C. In short, you can think of Rz syntax as C without type
declarations.
increment(x)
  *x = *x + 1

gv // global variable

main()
  gv = 3
  increment(&gv)
  print(gv)
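To see why passing an address gives call by reference, it can help to model data memory as a flat list of integers in which an address is just an index. The Python sketch below mirrors the increment example under that model. The memory size and the address chosen for gv are arbitrary assumptions made for the illustration.

```python
memory = [0] * 16        # flat data memory; an address is just an index
GV = 5                   # assumed address of the global gv (arbitrary choice)

def increment(x):
    # *x = *x + 1 : x holds an address, so read and write memory through it
    memory[x] = memory[x] + 1

def main():
    memory[GV] = 3       # gv = 3
    increment(GV)        # increment(&gv): pass the address, not the value
    return memory[GV]    # the value that print(gv) would show

result = main()
```

Passing the value 3 instead of the address would leave gv unchanged, which is the difference between call by value and call by reference.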
Recursion works naturally:
// factorial
fac(n)
  if( n == 0 ) return 1
  else return n * fac(n-1)

main()
  print(fac(6))
With version 3.7, the compiler generates machine code for S2 version 3
(s23). A special syntax is provided for emitting low-level code
directly:
asm("...x ")
where x is an assembly statement. The compiler copies this statement to
the output file. Using this feature requires an understanding of the S2.3
assembler.
s23 assembly language
System call
To implement system-dependent operations (such as input/output), Rz uses
"syscall( number, argument )", where number is the system call number
(which selects the function) and argument is an optional input to the
function. The call is compiled into processor-dependent machine code (for
S2.3 it is "trap r num"). For the available functions, please see the
S2.3 simulator document.
syscall(1, a)
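On the simulator side, a system call of this shape is typically handled by dispatching on the call number. The Python sketch below shows that idea only. The call numbers, their meanings, and all function names are hypothetical; the real numbering is defined in the S2.3 simulator document.

```python
output = []                              # collected instead of writing to a console

def sys_print_int(arg):
    output.append(str(arg))

def sys_print_char(arg):
    output.append(chr(arg))

# Hypothetical numbering, for illustration only.
SYSCALLS = {1: sys_print_int, 2: sys_print_char}

def syscall(number, argument=0):
    """Dispatch on the call number; the argument is optional."""
    handler = SYSCALLS.get(number)
    if handler is None:
        raise ValueError(f"unknown system call {number}")
    handler(argument)

syscall(1, 42)
syscall(2, 65)
```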
Dereference operator
The * (deref) and & (address) operators can be used as follows.
*var works for both local and global variables. On the right-hand side it
dereferences var to get the value; on the left-hand side it stores to the
location that var points to. See this example.
inc(v)
  *v = *v + 1
is compiled into
pop sp v
ld r @0 v
add r r #1
st r @0 v
&var works only on a global variable on the right-hand side and has no
meaning on the left-hand side. To use the above example, let gv be a global
variable.
gv

main()
  gv = 3
  inc(&gv)
  print(gv)
&gv yields the address of the global variable gv. & cannot be used with
a local variable because, in our compilation scheme, local variables are
mapped to registers, and there is no way to get the address of a register.
(Under a different scheme, it could be made to work.)
For indirect addressing, we can also use array indexing notation. The
following are equivalent:
*v == v[0]
Here is how the compiler generates code for * and &.
lv = local variable
gv = global variable
d = destination register
term     | left-hand-side  | right-hand-side
---------+-----------------+----------------------
*lv      | st d @0 lv      | ld d @0 lv
*gv      | ld r1 gv        | ld r1 gv
         | st d @0 r1      | ld d @0 r1
&lv      | illegal         | illegal
&gv      | illegal         | constant (ads of gv)
*lv[idx] | ld r1 +lv idx   | ld r1 +lv idx
         | st d @0 r1      | ld d @0 r1
*gv[idx] | ld r1 @gv idx   | ld r1 @gv idx
         | st d @0 r1      | ld d @0 r1
&lv[idx] | illegal         | illegal
&gv[idx] | add r1 idx #gv  | add d idx #gv
         | st d @0 r1      |
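The table can be read as a lookup from (term, side) to an instruction sequence. The Python sketch below transcribes the rows that are instruction sequences or illegal and selects the code for a given term. The function and dictionary names are hypothetical, this is not the compiler's actual code generator, and the &gv row is left out because its right-hand side is a constant rather than an instruction.

```python
# term -> (left-hand-side code, right-hand-side code); None means illegal.
# {v} is the variable name and {i} is the index operand.
CODE = {
    "*lv":      (["st d @0 {v}"], ["ld d @0 {v}"]),
    "*gv":      (["ld r1 {v}", "st d @0 r1"], ["ld r1 {v}", "ld d @0 r1"]),
    "&lv":      (None, None),
    "*lv[idx]": (["ld r1 +{v} {i}", "st d @0 r1"], ["ld r1 +{v} {i}", "ld d @0 r1"]),
    "*gv[idx]": (["ld r1 @{v} {i}", "st d @0 r1"], ["ld r1 @{v} {i}", "ld d @0 r1"]),
    "&lv[idx]": (None, None),
    "&gv[idx]": (["add r1 {i} #{v}", "st d @0 r1"], ["add d {i} #{v}"]),
}

def gen(term, side, v, i=None):
    """Return the instruction list for a term on the 'lhs' or 'rhs'."""
    lhs, rhs = CODE[term]
    template = lhs if side == "lhs" else rhs
    if template is None:
        raise SyntaxError(f"{term} is illegal on the {side}")
    return [line.format(v=v, i=i) for line in template]

load_code = gen("*gv", "rhs", "gv")
address_code = gen("&gv[idx]", "rhs", "ax", "r2")
```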
Pointer to function
The address operator "&" can also be used to get the address of a function,
so a pointer to a function can be implemented (for example, a table of
pointers to functions can be used to build a "switch" control structure).
Here is an example:
show()
  print("hello")

main()
  ads = &show
  print(&show)
Note: I did not show how to use the pointer to a function. It cannot be
used from within the Rz language without some assembly language construction.
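The "switch" idea mentioned above can be illustrated in Python, where functions are first-class values and a list of them plays the role of a table of function addresses. This is only a sketch of the technique with hypothetical function names; in Rz, calling through such a table still needs the assembly language construction noted above.

```python
def show_zero():
    return "zero"

def show_one():
    return "one"

def show_other():
    return "other"

# Plays the role of a table filled with &show_zero and &show_one.
TABLE = [show_zero, show_one]

def switch(n):
    # Index the table when n is in range, otherwise take the default case.
    handler = TABLE[n] if 0 <= n < len(TABLE) else show_other
    return handler()
```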
How to access a structure?
In Rz, we mostly use arrays to store compound data. Compare this to C (a
linked list cell).
typedef struct {
    int data;
    int next;
} acell;

acell *node;
node->data = 10;
In Rz, we use an array instead. Assume we have "malloc".
mynode = malloc(2)
mynode[0] = 10 // data
mynode[1] = ... // next
*mynode == mynode[0]
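The same layout can be modelled in Python with a flat list standing in for memory, which makes the "cell = two consecutive words" idea concrete. The allocator below is a hypothetical bump allocator written only for this sketch (Rz itself does not provide "malloc"), and reserving address 0 to mean "no next cell" is an assumption.

```python
memory = [0] * 32
next_free = 1            # address 0 is reserved so that 0 can mean "no next cell"

def malloc(n):
    """Hypothetical bump allocator: hand out n consecutive words."""
    global next_free
    ads = next_free
    next_free += n
    if next_free > len(memory):
        raise MemoryError("out of memory")
    return ads

DATA, NEXT = 0, 1        # field offsets within a cell

def cons(value, rest):
    cell = malloc(2)
    memory[cell + DATA] = value   # mynode[0] = value
    memory[cell + NEXT] = rest    # mynode[1] = rest
    return cell

def total(cell):
    # Walk the list by following the "next" field until it is 0.
    s = 0
    while cell != 0:
        s += memory[cell + DATA]
        cell = memory[cell + NEXT]
    return s

lst = cons(10, cons(20, cons(30, 0)))
```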
Because only one level of indexing is allowed in the syntax, accessing a
complex data structure may require an intermediate step. Assume ax is
an array of structures with 2 members (id, data).
in C
b = ax[20].data
in Rz
def data 1
ads = &ax[20]
b = ads[data]
Simple Macro
The macro in Rz has two uses:
1. Define a symbolic constant
2. Define a simple expression with textual substitution
Define symbolic constant
def MYMAX 100
Define simple expression
def getRef(ref)
  return record[ref]

def setRef(ref,x)
  record[ref] = x

main()
  a = MYMAX + 10
  c = getRef(a+1)
  setRef(a, a+c)
The Rz macro differs from the C macro. A simple-expression macro is defined
with the same structure as a function, but it works by textual substitution
of its formal parameters, so the only variables allowed in the body of the
definition are globals and formal parameters. (Any other local variable has
nothing to be substituted with, and is therefore illegal.)
Because a macro is compiled (it is not handled by a preprocessor), a
"return" is needed when defining a right-hand-side expression to make it
syntactically correct. However, the "return" is not substituted into the
target. For the example above, the "output" of the macro substitution is:
main()
  a = 100 + 10
  c = record[a+1]
  record[a] = a + c
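The substitution step can be sketched in Python. The code below expands the three macros from the example line by line using regular expressions. It is a simplified illustration and not how rz37 does it: the table and function names are hypothetical, the "return" has already been dropped from the getRef body, and arguments are assumed to contain no nested parentheses or commas.

```python
import re

# name -> (formal parameters or None for a constant, body text)
MACROS = {
    "MYMAX":  (None, "100"),
    "getRef": (["ref"], "record[ref]"),
    "setRef": (["ref", "x"], "record[ref] = x"),
}

def substitute(body, params, args):
    """Replace each formal parameter in body with the matching argument text."""
    mapping = dict(zip(params, args))
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: mapping.get(m.group(0), m.group(0)), body)

def expand(line):
    for name, (params, body) in MACROS.items():
        if params is None:
            # symbolic constant: replace the bare name
            line = re.sub(rf"\b{name}\b", body, line)
        else:
            # simple expression: replace name(args) with the substituted body
            def repl(m):
                args = [a.strip() for a in m.group(1).split(",")]
                return substitute(body, params, args)
            line = re.sub(rf"\b{name}\(([^()]*)\)", repl, line)
    return line

source = ["a = MYMAX + 10", "c = getRef(a+1)", "setRef(a, a+c)"]
expanded = [expand(line) for line in source]
```

The argument text is copied as written, so the last line comes out as "record[a] = a+c", which differs from the listing above only in spacing.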
Current state of implementation
The output of the compiler is s2.3 assembly language. It can be assembled
and run under the s2.3 simulator.
Session example
Here is a hands-on example of how to use the compiler. Compile the "sum
array" program above. The screen will show:
c:> rz37 sum.txt
.symbol
fp 30
sp 29
retval 28
rads 27
ax 2000
.code 0
mov fp #4000
mov sp #3000
jal rads main
trap r0 #0
:sum
st r1 @1 fp
st r2 @2 fp
st r3 @3 fp
st r4 @4 fp
add fp fp #5
st rads @0 fp
mov r1 #0
mov r2 #0
jmp L102
:L103
ld r3 @ax r1
add r4 r2 r3
mov r2 r4
add r1 r1 #1
...
:main
...
ld r1 @1 fp
ret rads
.data 200
.end
Download
rz37.zip    compiler source that generates s2.3 assembly code
rz37-1.zip  update compiler with deref and macro
rz37-2.zip  update compiler with pointer to function
last update 28 Feb 2013