S2 version 1: 32-bit Processor

This is a typical simple 32-bit CPU. It has a three-address instruction architecture with 32 registers and load/store instructions. This document presents only S21 assembly language view. It does not give details about microarchitecture (such as pipeline).

The register bank (32 registers) has two output ports connected to ALU through multiplexors. The output of ALU is written back to the input port of the register bank. The fundamental cycle of this processor is read two registers, perform ALU operation, and write back, so called "read-modify-write" cycle. This can be realised in circuits that complete it in one clock (dependent on what kind of ALU operation). This register bank is considered as a local memory which has limited size. The larger, global memory is external to the processor. The data can be read and write from Memory to Register, so called "load/store" operations. The maximum size of the global memory, the address space, is determined by the format of the instruction. S21 has 22 bits absolute address so it can directly address 4M words of memory. The indirect address (via register) can reach 32 bits, 4G words. The instruction is fetch from the global memory to Instruction Register (IR). The output of IR is connected to Control Unit. The location of the instruction is tracked by Program Counter (PC). The flow of control of instruction can be changed through setting the value of PC.

s21 diagram

S21 has three-address instruction format. A general format of an instruction (register to register operations) is:

op r1 r2 r3 means R[r1] = R[r2] op R[r3]

To pass values between memory and registers, load/store instructions are used. Load means transfer memory to register. Store means transfer register to memory. There are three addressing modes: absolute, index and displacement.

ld
          r1 ads          R[r1] =
          M[ads]          absolute

        ld r1 +r2 r3  
              R[r1] = M[R[r2]+R[r3]]  index

          ld r1 @d r2        R[r1] = M[d +
          R[r2]]    displacement 

          st r1 ads          M[ads] =
          R[r1]          absolute

        st r1 +r2 r3  
              M[R[r2]+R[r3]] = R[r1]  index

          st r1 @d r2        M[d + R[r2]] =
          R[r1]    displacement

In assembly language, these addressing modes are written using the convention op dest <- source.

Instruction type

arithmetic:            add sub mul div mod
logic:                  and or xor eq ne lt le gt ge shl shr
control:            jmp jt jf jal ret
data:                   ld st push pop mov

Arithmetic: integer arithmetic operations, + - * / %. div and mod make it possible to get both quotient (div) and remainder (mod). Logic operators are bit-wise, and or xor shl shr. The exception is 'not' which returns True/False. To do conditional expression which returns True/False, and or xor not can be used in combination. For branching the comparison operators are: eq, ne, lt, le, gt, ge. They return True/False. Control operators are: jmp (uncondition), jt (jump if true), jf (jump if false), jal (jump and link), ret (return from call). Data operators: ld st. They also include push pop which operate on stack (with stack pointer). 'mov' moves data between registers.

Instruction meaning

false == 0    false is zero
true != 0     true is not false

op r1 r2 r3     is    R[r1] = R[r2] op R[r3]
op r1 r2 #n     is    R[r1] = R[r2] op n

Data

d is 17-bit sign extended
m is 22-bit sign extended

ld r1 ads       is    R[r1] = M[ads]          absolute
ld r1 +r2 r3 is    R[r1] = M[R[r2]+R[r3]] index
ld r1 @d r2     is R[r1] = M[d + R[r2]]    displacement
st r1 ads       is    M[ads] = R[r1]          absolute
st r1 +r2 r3 is    M[R[r2]+R[r3]] = R[r1] index
st r1 @d r2     is M[d + R[r2]] = R[r1]    displacement
mov r1 #m       is    R[r1] = m               move constant to register
mov r1 r2       is    R[r1] = R[r2] move value between registers

Control

jmp
          ads        
          is    pc = ads

          jt r1 ads       is    if
          R[r1] != 0  pc = ads    jump if
        r1 is true

        jf r1
          ads       is    if R[r1]
          == 0  pc = ads    jump if r1 is false 

        jal r1
          ads      is    R[r1] = PC; PC
          = ads       jump and link

        ret
          r1         
          is    PC = R[r1]             
           return

Arithmetic

two-complement integer arithmetic
n is 17-bit sign extended

add r1 r2 r3     R[r1] = R[r2] + R[r3]
add r1 r2 #n     R[r1] = R[r2] + n
sub r1 r2 ...    R[r1] = R[r2] - R[r3]
mul r1 r2 ...    R[r1] = R[r2] * R[r3]
div r1 r2 ...    R[r1] = R[r2] / R[r3] integer division
mod r1 r2 ...    R[r1] = R[r2] % R[r3] modulo

Logic (bitwise)

n is 17-bit sign extended

and r1 r2 r3     R[r1] = R[r2] bit-and R[r3]
and r1 r2 #n     R[r1] = R[r2] bit-and n
or r1 r2 r3      R[r1] = R[r2] bit-or R[r3]
or r1 r2 #n      R[r1] = R[r2] bit-or n
xor r1 r2 r3     R[r1] = R[r2] bit-xor R[r3]
xor r1 r2 #n     R[r1] = R[r2] bit-xor n
eq r1 r2 r3      R[r1] = R[r2] == R[r3]
eq r1 r2 #n      R[r1] = R[r2] == n
ne ...
lt ...
le ...
gt ...
ge ... shl r1 r2 r3     R[r1] = R[r2] << R[r3]
shl r1 r2 #n     R[r1] = R[r2] << n
shr r1 r2 r3     R[r1] = R[r2] >> R[r3]
shr r1 r2 #n     R[r1] = R[r2] >> n
not r1 r2        R[r1] = R[r2] == 0

Stack operation

To facilitate passing the parameters to a subroutine and also to save state (link register) for recursive call, two stack operations are defined: push, pop.

push r1 r2 is R[r1]++; M[R[r1]] = R[r2] push r2, r1 as stack pointer
pop r1 r2 is R[r2] = M[R[r1]]; R[r1]-- pop to r2, r1 as stack pointer

Pseudo

Pseudo instructions are unlike other instructions, they are used mainly to control the simulator and to perform input/output. "trap" instruction have two arguments. The second argument is the trap number designated the operation.



        

        trap r1 #n       special instruction, n is
        in r

2-field.

trap r1 #0                 stop simulation
trap r1 #1                 print integer in R[r1]
trap r2 #2                 print character in R[r2]

Instruction format

L-format    op:5 rd1:5 ads:22
D-format    op:5 rd1:5 rs2:5 disp:17
X-format    op:5 rd1:5 rs2:5 rs3:5 xop:12

(rd dest, rs source, ads 22-bit, disp 17-bit sign extended)

Instructions are fixed length at 32 bits. There are 32 registers. The address space is 32-bit. Access to memory is always on word boundary (no byte-access). Direct (absolute address) is 22-bit or the first 4M words. Index and indirect access can reach the whole 32-bit address space. Immediate value (d) is 17-bit. It is sign extended. The jump instructions (jmp, jt, jf) have 22-bit address.

ISA and opcode encoding

opcode   format
0 L    nop                      no operation
1 L   ld r1 ads    (ads 22-bit) 2 D    ld r1 @d r2   (d 17-bit sign extended) 3 L   st r1 ads     (ads 22-bit) 4 D    st r1 @d r2 (d 17-bit sign extended) 5 L    mov r #m    (m 22-bit sign extended)
6 L    jmp ads
7 L   jal r1 ads
8 L    jt r1 ads
9 L    jf r1 ads
10 D   add r1 r2 #n (n 17-bit sign extended)
11 D    sub r1 r2 #n ...
12 D    mul r1 r2 #n
13 D    div r1 r2 #n
14 D    and r1 r2 #n
15 D    or r1 r2 #n
16 D    xor r1 r2 #n
17 D    eq r1 r2 #n
18 D    ne r1 r2 #n
19 D    lt r1 r2 #n
20 D    le r1 r2 #n
21 D    gt r1 r2 #n
22 D    ge r1 r2 #n
23 D    shl r1 r2 #n
24 D    shr r1 r2 #n
25 D    mod r1 r2 #n
26..30 undefined
31 xop -     X

xop
0 X   add r1 r2 r3
1 X   sub r1 r2 r3
2 X   mul r1 r2 r3
3 X   div r1 r2 r3
4 X   and r1 r2 r3
5 X   or r1 r2 r3
6 X   xor r1 r2 r3
7 X   eq r1 r2 r3
8 X   ne r1 r2 r3
9 X lt r1 r2 r3
10 X le r1 r2 r3
11 X gt r1 r2 r3
12 X ge r1 r2 r3
13 X shl r1 r2 r3

14 X  
        shr r1 r2 r3

15 X   mod r1 r2 r3
16 x   mov r1 r2
17 X   ld r1 +r2 r3
18 X st r1 +r2 r3
19 X ret r1
20 X trap r1 n     use n at r2 n = 0..31
21 X   push r1 r2    use r1 as stack pointer
22 X   pop r1 r2     use r1 as stack pointer

23 X  
        not r1 r2

24..4095 undefined

Historical fact

S21 is an extension of S2 from 2007. This design is a result of my collective experience in teaching assembly language to novice programmers for many years. S2 has been used for teaching since 2001. S2 itself is an "extended" version of S1 (a 16-bit processor) which is used for teaching since 1997.

To make S2 assembly language easier to understand, flags are not used. Instead, new logical instructions that have 3-address are introduced. They are similar to existing arithmetic instructions. The result (true/false) is stored in a register. Two conditional jumps are introduced "jt", "jf". They make use of the result from logical instructions. Indirect addressing caused difficulty to many students (similar to pointers in C language). However, array index is simple to understand (may be aided by students' experience in high level language programming). So, the use of index is preferred.

The opcode format and assembly language format for S2 follow the tradition "dest = source1 op source2" from classic machines such as PDP-11, VAX and IBM S360.

Assembly language programming

1. Add 1 to 10

s = 0
i = 0
while i <= 10
s = s + i
i = i + 1

use r1 as s, r2 as i, r3 as flag

.code 0 mov r1 #0      ; s = 0
mov r2 #0      ; i = 0
:while
le r3 r2 #10   ; i <= 10
jf r3 exit
add r1 r1 r2   ; s = s + i
add r2 r2 #1   ; i = i + 1
jmp while
:exit trap r0 #0     ; stop .end

2. Sum all elements in an array. Assume the array is at 100, size of array is n.

s = 0
i = 0
while i < n
s = s + ax[i]
i = i + 1

use r1 as s, r2 as i, r3 as base, r4 as ax[i], r5 as flag

.code 0 mov r1 #0        ; s = 0
mov r2 #0        ; i = 0
mov r3 #100    ; set base address
:while
lt r5 r2 #5     ; i < n
jf r5 exit
ld r4 +r3 r2    ; ax[i]
add r1 r1 r4    ; s = s + ax[i]
add r2 r2 #1    ; i = i + 1
jmp while
:exit trap 0          ; stop .data 100 11 22 33 44 55 ; ax[] at 100 .end

3. Write "pointer chasing" program in assembly language.

Goal, to test if a number is in a list.

Data structure (singly linked list)

A list is made out of an array of nodes. A node has two fields: the first one contains the number, the second one contains an index to the next node. The last node has 0 in the next field, signified the end of the list. For example, this is a list of number (11,22,33). The list start at index 100.

address: 100  101   102   103  
      104   105   106

      data:    11   102  
      22    104   33   
      0     ...

Pseudo code

Let L be a list, x is a number.

member(x, L)
if head(L) == x return yes
while tail(L) != 0
if head(L) == x return yes
L = tail(L)
return no

We can improve this program a bit.

member2(x, L)
while L != 0
if head(L) == x return yes
L = tail(L)
return no

It can also be written in a recursive form.

memberrec(x, L)
if head(L) == x return yes
if tail(L) == 0 return no
return memberrec(x, tail(L))

We write the second version of program in s21 assembly language. The program runs "member2(22,L)" where L = 100. That is, it checks whether 22 is in the list L.

How to access head(L) and tail(L)? We use index addressing, assume base address of L is in r2, then

ld r4 @0 r2

will load M[0+R[r2]] to r4 which is exactly head(L). For tail(L) we need offset 1, then

ld r4 @1 r2

will load M[1+R[r2]] to r4, tail(L). Use r2 base, r3 x, r4 value, r5 flag.

; member of list

.code 0
:member2
mov r2 #100         ; base address L
mov r3 #22          ; x = 22
:loop
jf r2 no            ; while L != 0
ld r4 @0 r2       ; head(L)
eq r5 r4 r3         ; head(L) == x ?
jt r5 yes
ld r2 @1 r2         ; L = tail(L)
jmp loop
:yes
mov r5 #1
jmp exit
:no
mov r5 #0
:exit
trap 0

.data 100
11 102 22 104 33 0   ;; list L at 100
.end

Put this code into a text file "mem2.txt". Assemble it using:
c:> as21 mem2.txt
The assembler produces a listing file and an object file. The simulator sim21.exe takes an object file and run it.
c:> sim21 mem2.obj
Using the command "t" (single step), you will see:

c:> sim21 mem2.obj

      load program, last address 106

      >t

      PC   0 mov r1 #1        
      r1:1 r2:0 r3:0 r4:0 r5:0 r28:0 r29:0 r31:0

      >t

      PC   1 mov r2 #100       r1:1
      r2:100 r3:0 r4:0 r5:0 r28:0 r29:0 r31:0

      >t

      PC   2 mov r3 #22        r1:1
      r2:100 r3:22 r4:0 r5:0 r28:0 r29:0 r31:0

      >t

      PC   3 jf r2
      11          r1:1 r2:100 r3:22
      r4:0 r5:0 r28:0 r29:0 r31:0

      >t

      PC   4 ld r4 @0 r2       r1:1 r2:100
      r3:22 r4:11 r5:0 r28:0 r29:0 r31:0

      >t

      PC   5 eq r5 r4 r3       r1:1
      r2:100 r3:22 r4:11 r5:0 r28:0 r29:0 r31:0

      >t

      PC   6 jt r5
      9           r1:1 r2:100
      r3:22 r4:11 r5:0 r28:0 r29:0 r31:0

      >t

      PC   7 ld r2 @1 r2       r1:1 r2:102 r3:22
      r4:11 r5:0 r28:0 r29:0 r31:0

      >t

      PC   8 jmp
      3            
      r1:1 r2:102 r3:22 r4:11 r5:0 r28:0 r29:0 r31:0

      >t

      PC   3 jf r2
      11          r1:1 r2:102 r3:22
      r4:11 r5:0 r28:0 r29:0 r31:0

      >t

      PC   4 ld r4 @0 r2       r1:1 r2:102 r3:22
      r4:22 r5:0 r28:0 r29:0 r31:0

      >t

      PC   5 eq r5 r4 r3       r1:1
      r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0

      >t

      PC   6 jt r5
      9           r1:1 r2:102
      r3:22 r4:22 r5:1 r28:0 r29:0 r31:0

      >t

      PC   9 mov r5 #1        
      r1:1 r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0

      >t

      PC  10 jmp
      12            r1:1
      r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0

      >t

      stop, execute 16 instructions

      PC  12 trap
      0            r1:1
      r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0

      >

if you change x to 44. Reassemble mem2.txt, and run mem2.obj again.

c:> sim21 mem2.obj

      load program, last address 106

      >g

      stop, execute 24 instructions

      >r

      

      r0 :0    r1 :1    r2 :0    r3
      :44   r4 :33   r5 :0    r6
      :0    r7 :0

      r8 :0    r9 :0    r10:0   
      r11:0    r12:0    r13:0   
      r14:0    r15:0

      r16:0    r17:0    r18:0   
      r19:0    r20:0    r21:0   
      r22:0    r23:0

      r24:0    r25:0    r26:0   
      r27:0    r28:0    r29:0   
      r30:0    r31:0

      >

This time we let it run through using the command "g". The simulator stops and we list registers by "r", to see the result in r5 = 0 (No).

4. Function abstraction

Calling a function is done by "jal r1 ads" instruction. Before jumping to the function, the return address of PC is saved in r1. The return is done by "ret r1", which restore PC from r1. This register is called "link" register.

There are two things that we have to concern here: 1) how to create local variables 2) how to pass parameters. Local variables are mapped to registers, therefore we need to save any registers that the function is going to use and at the end, before returning to the caller, these registered must be restored. The place to store "local context" is called "frame". We use stack data structure to save/restore registers.

Two stack operations are available:

   push sp r1

         pop  sp r2

For example, suppose the function uses r1, r2. Here is the sequence

:fun
push fp r1 ; frame is at fp
push fp r2
; do fun body
; at the return
pop fp r2
pop fp r1

  ret r27       ; assume r27
        saved return PC

Similarly for passing the parameter we use a different stack to pass a value from the caller to a function. To return a value from a function, assume we return only one value, then we need to designate a "global" place (in a register) for this purpose.

In summary, there are four global registers to be used in constructing function calls: frame, stack, return value, return address (fp, sp, retv, rads). We must assigned four registers specially for this purpose. Let us see some real code:

pseudo code

main
y = double(10)
print(y)

double x
ret x + x

assembly

:main                
        ;  y is r1

          mov r1 #10

          push sp r1         
        ; pass #10

          jal rads double

          mov r1 retv         ; y =
        double(10)

          trap r1 #1         
        ; print y 

          trap r0 #0   

         

        :double              
        ; use r1

          push fp r1         
        ; save r1

          pop sp
        r1           ; get x

          add retv r1 r1      ; ret x + x

          pop fp
        r1           ; restore
        r1

          ret rads

Of course, we must initialise values of the global registers before we run our program.

.symbol
fp   27   ; r27
sp   28   ; r28
retv 29   ; r29
rads 30   ; r30

.main

          mov fp #1000     ; frame at M[1000]

          mov sp #2000     ; stack at M[2000]

....

Source code for S21

The source code of assembler and simulator (in C)
s21-0.zip
s21-3.zip update 24 Jan 2017

The source code of assembler and simulator (in Python)

s2-python.zip update 28 Jan 2025

last update 31 Jan 2025