S2 version 1: 32-bit Processor
This is a typical simple 32-bit CPU. It has a three-address
instruction architecture with 32 registers and load/store instructions. This
document presents only S21 assembly language view. It does not give details
about microarchitecture (such as pipeline).
The register bank (32 registers) has two output ports connected to ALU
through multiplexors. The output of ALU is written back to the input
port of the register bank. The fundamental cycle of this processor is
read two registers, perform ALU operation, and write back, so called
"read-modify-write" cycle. This can be realised in circuits that
complete it in one clock (dependent on what kind of ALU operation).
This register bank is considered as a local memory which has limited size.
The larger, global memory is external to the processor. The data can
be read and write from Memory to Register, so called "load/store"
operations. The maximum size of the global memory, the address space,
is determined by the format of the instruction. S21 has 22 bits
absolute address so it can directly address 4M words of memory. The indirect
address (via register) can reach 32 bits, 4G words. The instruction is
fetch from the global memory to Instruction Register (IR). The output
of IR is connected to Control Unit. The location of the instruction is
tracked by Program Counter (PC). The flow of control of instruction
can be changed through setting the value of PC.
S21 has three-address instruction set. A general format of an
instruction (register to register operations) is:
op
r1 r2 r3 means R[r1] = R[r2] op R[r3]
To pass values between memory and registers, load/store instructions are
used. Load means transfer memory to register. Store means transfer
register to memory. There are three addressing modes: absolute, index and
displacement.
ld
r1 ads R[r1] =
M[ads] absolute
ld r1 +r2 r3
R[r1] = M[R[r2]+R[r3]] index
ld r1 @d r2 R[r1] = M[d +
R[r2]] displacement
st r1 ads M[ads] =
R[r1] absolute
st r1 +r2 r3
M[R[r2]+R[r3]] = R[r1] index
st r1 @d r2 M[d + R[r2]] =
R[r1] displacement
In assembly language, these addressing modes are written using the
convention op dest <- source.
Instruction type
arithmetic:
add sub mul div mod
logic:
and or xor eq ne lt le gt ge shl shr
control:
jmp jt jf jal ret
data:
ld st push pop mov
Instruction meaning
false == 0
true != 0
op r1 r2 r3
is R[r1] = R[r2] op R[r3]
op r1 r2 #n
is R[r1] = R[r2] op n
Data
d is 17-bit sign extended
m is 22-bit sign extended
ld r1
ads is R[r1] =
M[ads] absolute
ld r1 +r2 r3
is R[r1] = M[R[r2]+R[r3]] index
ld r1 @d r2
is R[r1] = M[d + R[r2]] displacement
st r1
ads is M[ads] =
R[r1] absolute
st r1 +r2 r3
is M[R[r2]+R[r3]] = R[r1] index
st
r1 @d r2 is M[d + R[r2]] =
R[r1] displacement
mov r1 #m is R[r1] = m
move constant to register
mov r1 r2 is R[r1] =
R[r2] move value between registers
Control
jmp
ads
is pc = ads
jt r1 ads is if
R[r1] != 0 pc = ads jump if
r1 is true
jf r1
ads is if R[r1]
== 0 pc = ads jump if r1 is false
jal r1
ads is R[r1] = PC; PC
= ads jump and link
ret
r1
is PC = R[r1]
return
Arithmetic
two-complement integer arithmetic
n is 17-bit sign extended
add r1 r2 r3
R[r1] = R[r2] + R[r3]
add r1 r2 #n
R[r1] = R[r2] + n
sub r1 r2 ... R[r1]
= R[r2] - R[r3]
mul r1 r2 ... R[r1]
= R[r2] * R[r3]
div r1 r2 ... R[r1]
= R[r2] / R[r3] integer division
mod r1 r2 ... R[r1]
= R[r2] % R[r3] modulo
Logic (bitwise)
n is 17-bit sign extended
and r1 r2 r3
R[r1] = R[r2] bit-and R[r3]
and r1 r2 #n
R[r1] = R[r2] bit-and n
or r1 r2
r3 R[r1] = R[r2] bit-or R[r3]
or r1 r2 #n R[r1] = R[r2] bit-or n
xor r1 r2 r3
R[r1] = R[r2] bit-xor R[r3]
xor r1 r2 #n R[r1] = R[r2] bit-xor n
eq r1 r2
r3 R[r1] = R[r2] == R[r3]
eq r1 r2
#n R[r1] = R[r2] == n
ne ...
lt ...
le ...
gt ...
ge ...
shl r1 r2 r3 R[r1] = R[r2] << R[r3]
shl r1 r2 #n
R[r1] = R[r2] << n
shr r1 r2 r3
R[r1] = R[r2] >> R[r3]
shr r1 r2 #n
R[r1] = R[r2] >> n
Stack operation
To facilitate passing the parameters to a subroutine and also to save state
(link register) for recursive call, two stack operations are defined: push,
pop.
push r1
r2 is R[r1]++; M[R[r1]] =
R[r2]
push r2, r1 as stack pointer
pop r1
r2 is R[r2] = M[R[r1]];
R[r1]--
pop to r2, r1 as stack pointer
Pseudo
Pseudo instructions are unlike other instructions, they are used mainly to
control the simulator and to perform input/output. "trap
"
instruction have two arguments. The second argument is the trap number
designated the operation.
trap r1 #n special instruction, n is
in r
2-field.
trap r1
#0
stop
simulation
trap r1
#1
print
integer in R[r1]
trap r2 #2
print
character in R[r2]
Instruction format
L-format
op:5
rd1:5 ads:22
D-format op:5
rd1:5 rs2:5 disp:17
X-format op:5
rd1:5 rs2:5 rs3:5 xop:12
(rd dest, rs source, ads 22-bit, disp 17-bit sign extended)
Instructions are fixed length at 32 bits. There are 32 registers. The
address space is 32-bit. Access to memory is always on word boundary
(no byte-access). Direct (absolute address) is 22-bit or the first 4M
words. Index and indirect access can reach the whole 32-bit address
space. Immediate value (d) is 17-bit. It is sign extended.
The jump instructions (jmp, jt, jf)
have 22-bit address.
ISA and opcode encoding
opcode format
0 L nop
no operation
1 L ld r1
ads
(ads 22-bit)
2 D ld r1 @d r2 (d 17-bit sign
extended)
3 L st r1
ads (ads 22-bit)
4 D st r1 @d r2 (d 17-bit sign extended)
5 L mov r #m (m 22-bit sign
extended)
6 L jmp ads
7 L jal
r1 ads
8 L
jt r1 ads
9 L
jf r1 ads
10 D add r1 r2 #n (n 17-bit
sign extended)
11 D
sub r1 r2 #n ...
12 D
mul r1 r2 #n
13 D
div r1 r2 #n
14 D
and r1 r2 #n
15 D
or r1 r2 #n
16 D xor r1 r2 #n
17 D
eq r1 r2 #n
18 D
ne r1 r2 #n
19 D
lt r1 r2 #n
20 D
le r1 r2 #n
21 D
gt r1 r2 #n
22 D
ge r1 r2 #n
23 D
shl r1 r2 #n
24 D
shr r1 r2 #n
25 D
mod r1 r2 #n
26..30 undefined
31 xop
- X
xop
0 X add r1 r2 r3
1 X sub r1 r2 r3
2 X
mul r1 r2 r3
3 X
div r1 r2 r3
4 X
and r1 r2 r3
5 X
or r1 r2 r3
6 X
xor r1 r2 r3
7 X
eq r1 r2 r3
8 X
ne r1 r2 r3
9 X
lt r1 r2 r3
10 X
le r1 r2 r3
11 X
gt r1 r2 r3
12 X
ge r1 r2 r3
13 X
shl r1 r2 r3
14 X
shr r1 r2 r3
15 X mod r1 r2 r3
16 x mov r1 r2
17 X ld r1 +r2 r3
18 X st r1 +r2 r3
19 X ret r1
20 X trap n use n at r1 n = 0..31
21 X push r1 r2
use r1 as stack pointer
22 X pop r1 r2
use r1 as stack pointer
23 X
not r1 r2
24..4095
undefined
Historical fact
S21 is an extension of S2 from 2007. This design is a result of my
collective experience in teaching assembly language to novice programmers
for many years. S2 has been used for teaching since 2001. S2
itself is an "extended" version of S1 (a 16-bit processor) which is used for
teaching since 1997.
To improve understandability of S2 assembly language, flags are not
used. Instead, new logical instructions that have 3-address are
introduced. They are similar to existing arithmetic
instructions. The result (true/false) is stored in a register.
Two conditional jumps are introduced "jt", "jf". They make use of the result
from logical instructions. Indirect addressing caused difficulty
to many students (similar to pointers in C language). However, array
index is simple to understand (may be aided by students' experience in high
level language programming). So, the use of index is preferred.
The opcode format and assembly language format for S2 follow the tradition
"dest = source1 op source2" from classic machines such as PDP-11, VAX and
IBM S360.
Assembly language programming
1. Add 1 to 10
s = 0
i = 0
while i <= 10
s = s + i
i = i + 1
use r1 as s, r2 as i, r3 as flag
.code 0
mov r1 #0 ; s = 0
mov r2 #0 ; i = 0
:while
le r3 r2 #10 ; i <= 10
jf r3 exit
add r1 r1 r2 ; s = s + i
add r2 r2 #1 ; i = i + 1
jmp while
:exit
trap r0 #0 ; stop
.end
2. Sum all elements in an array. Assume the array is at 100,
size of array is n.
s = 0
i = 0
while i < n
s = s + ax[i]
i = i + 1
use r1 as s, r2 as i, r3 as base, r4 as ax[i], r5 as flag
.code 0
mv r1 #0 ; s = 0
mv r2 #0 ; i = 0
mv r3 #100 ; set base address
:while
lt r5 r2 #5 ; i < n
jf r5 exit
ld r4 +r3 r2 ; ax[i]
add r1 r1 r4 ; s = s + ax[i]
add r2 r2 #1 ; i = i + 1
jmp while
:exit
trap 0 ;
stop
.data 100
11 22 33 44 55 ; ax[] at 100
.end
3. Write "pointer chasing" program in assembly language.
Goal, to test if a number is in a list.
Data structure (singly linked list)
A list is made out of an array of nodes. A node has two fields: the
first one contains the number, the second one contains an index to the next
node. The last node has 0 in the next field, signified the end of the list.
For example, this is a list of number (11,22,33). The list start at
index 100.
address: 100 101 102 103
104 105 106
data: 11 102
22 104 33
0 ...
Pseudo code
Let L be a list, x is a number.
member(x, L)
if head(L) == x return yes
while tail(L) != 0
if head(L) == x return yes
L = tail(L)
return no
We can improve this program a bit.
member2(x, L)
while L != 0
if head(L) == x return yes
L = tail(L)
return no
It can also be written in a recursive form.
memberrec(x, L)
if head(L) == x return yes
if tail(L) == 0 return no
return memberrec(x, tail(L))
We write the second version of program in s21 assembly language. The program
runs "member2(22,L)" where L = 100. That is, it
checks whether 22 is in the list L.
How to access head(L) and tail(L)? We use index addressing,
assume base address of L is in r2, then
ld r4 @0 r2
will load M[0+R[r2]] to r4 which is exactly head(L). For tail(L) we
need offset 1, then
ld r4 @1 r2
will load M[1+R[r2]] to r4, tail(L). Use r2 base, r3 x, r4 value, r5
flag.
; member of list
.code 0
:member2
mov r2 #100 ;
base address L
mov r3 #22 ;
x = 22
:loop
jf r2
no
; while L != 0
ld r4 @0 r2 ;
head(L)
eq r5 r4 r3
; head(L) == x ?
jt r5 yes
ld r2 @1 r2
; L = tail(L)
jmp loop
:yes
mov r5 #1
jmp exit
:no
mov r5 #0
:exit
trap 0
.data 100
11 102 22 104 33 0 ;; list L at 100
.end
Put this code into a text file "mem2.txt". Assemble it using:
c:> as21 mem2.txt
The assembler produces a listing file and an object file. The
simulator sim21.exe
takes an object file and run it.
c:> sim21 mem2.obj
Using the command "t" (single step), you will see:
c:> sim21 mem2.obj
load program, last address 106
>t
PC 0 mov r1 #1
r1:1 r2:0 r3:0 r4:0 r5:0 r28:0 r29:0 r31:0
>t
PC 1 mov r2 #100 r1:1
r2:100 r3:0 r4:0 r5:0 r28:0 r29:0 r31:0
>t
PC 2 mov r3 #22 r1:1
r2:100 r3:22 r4:0 r5:0 r28:0 r29:0 r31:0
>t
PC 3 jf r2
11 r1:1 r2:100 r3:22
r4:0 r5:0 r28:0 r29:0 r31:0
>t
PC 4 ld r4 @0 r2 r1:1 r2:100
r3:22 r4:11 r5:0 r28:0 r29:0 r31:0
>t
PC 5 eq r5 r4 r3 r1:1
r2:100 r3:22 r4:11 r5:0 r28:0 r29:0 r31:0
>t
PC 6 jt r5
9 r1:1 r2:100
r3:22 r4:11 r5:0 r28:0 r29:0 r31:0
>t
PC 7 ld r2 @1 r2 r1:1 r2:102 r3:22
r4:11 r5:0 r28:0 r29:0 r31:0
>t
PC 8 jmp
3
r1:1 r2:102 r3:22 r4:11 r5:0 r28:0 r29:0 r31:0
>t
PC 3 jf r2
11 r1:1 r2:102 r3:22
r4:11 r5:0 r28:0 r29:0 r31:0
>t
PC 4 ld r4 @0 r2 r1:1 r2:102 r3:22
r4:22 r5:0 r28:0 r29:0 r31:0
>t
PC 5 eq r5 r4 r3 r1:1
r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0
>t
PC 6 jt r5
9 r1:1 r2:102
r3:22 r4:22 r5:1 r28:0 r29:0 r31:0
>t
PC 9 mov r5 #1
r1:1 r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0
>t
PC 10 jmp
12 r1:1
r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0
>t
stop, execute 16 instructions
PC 12 trap
0 r1:1
r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0
>
if you change x to 44. Reassemble mem2.txt, and run mem2.obj again.
c:> sim21 mem2.obj
load program, last address 106
>g
stop, execute 24 instructions
>r
r0 :0 r1 :1 r2 :0 r3
:44 r4 :33 r5 :0 r6
:0 r7 :0
r8 :0 r9 :0 r10:0
r11:0 r12:0 r13:0
r14:0 r15:0
r16:0 r17:0 r18:0
r19:0 r20:0 r21:0
r22:0 r23:0
r24:0 r25:0 r26:0
r27:0 r28:0 r29:0
r30:0 r31:0
>
This time we let it run through using the command "g". The simulator
stops and we list registers by "r", to see the result in r5 = 0 (No).
4. Function abstraction
Calling a function is done by "jal r1 ads
" instruction.
Before jumping to the function, the return address of PC is saved in
r1. The return is done by "ret r1
", which restore PC from
r1. This register is called "link" register.
There are two things that we have to concern here: 1) how to
create local variables 2) how to pass parameters. Local
variables are mapped to registers, therefore we need to save any registers
that the function is going to use and at the end, before returning to the
caller, these registered must be restored. The place to store "local
context" is called "frame". We use stack data structure to
save/restore registers.
Two stack operations are available:
push sp r1
pop sp r2
For example, suppose the function uses r1, r2. Here is the sequence
:fun
push fp r1 ; frame is at fp
push fp r2
; do fun body
; at the return
pop fp r2
pop fp r1
ret r27 ; assume r27
saved return PC
Similarly for passing the parameter we use a different stack to pass a value
from the caller to a function. To return a value from a function, assume we
return only one value, then we need to designate a "global" place (in a
register) for this purpose.
In summary, there are four global registers to be used in constructing
function calls: frame, stack, return value, return address (fp, sp, retv,
rads). We must assigned four registers specially for this purpose. Let
us see some real code:
pseudo code
main
y = double(10)
print(y)
double x
ret x + x
assembly
:main
; y is r1
mov r1 #10
push sp r1
; pass #10
jal rads double
mov r1 retv ; y =
double(10)
trap r1 #1
; print y
trap r0 #0
:double
; use r1
push fp r1
; save r1
pop sp
r1 ; get x
add retv r1 r1 ; ret x + x
pop fp
r1 ; restore
r1
ret rads
Of course, we must initialise values of the global registers before we
run our program.
.symbol
fp 27 ; r27
sp 28 ; r28
retv 29 ; r29
rads 30 ; r30
.main
mov fp #1000 ; frame at M[1000]
mov sp #2000 ; stack at M[2000]
....
Source code for S21
The source code of assembler and simulator (in C)
s21-0.zip
s21-3.zip update 24 Jan 2017
last update 24 Jan 2017