S21 A Hypothetical 32-bit Processor
    
    This is a typical simple 32-bit CPU.  It has a three-address
    instruction architecture, with 32 registers and load/store instructions.
    This document presents only S21 assembly language view. It does not give
    details about microarchitecture (such as pipeline).
    
    S21 has three-address instruction set.  A general format of an
    instruction (register to register operations) is:
    
    op r1
        r2 r3     means  R[r1] = R[r2] op R[r3]
    
    
    To pass values between memory and registers, load/store instructions are
    used.   Load means transfer memory to register. Store means transfer
    register to memory. There are two addressing mode: absolute and index. 
      
    ld r1
        ads          R[r1] =
        M[ads]          absolute
      ld r1 +r2 r3  
            R[r1] = M[R[r2]+R[r3]]  index
      st ads r1
                 M[ads] =
        R[r1]          absolute
      st +r2 r3 r1  
            M[R[r2]+R[r3]] = R[r1]  index
    
    
    In assembly language, these addressing modes are written using the
    convention op dest <- source.
    Instruction type
    
    arithmetic:    
              add sub mul div 
      logic:            
           and or xor eq ne lt le gt ge shl shr
    control:       
              jmp jt jf jal ret
    data:
                   
          ld st push pop
    Instruction meaning
    
    false == 0
    true  != 0
    R[0] always zero 
    
    op r1 r2 r3    
      is    R[r1] = R[r2] op R[r3]
    op r1 r2 #n    
      is    R[r1] = R[r2] op n
    Data
    
    ld r1
      ads       is    R[r1] =
      M[ads]          absolute
    ld r1 +r2 r3   
      is    R[r1] = M[R[r2]+R[r3]]  index
    st ads
      r1       is    M[ads] =
      R[r1]          absolute
    st +r2 r3 r1   
      is    M[R[r2]+R[r3]] = R[r1]  index
    Control
    
    jmp
        ads         is   
        pc = ads
      jt r1
        ads       is    if R[r1] !=
        0  pc = ads 
      jf r1
        ads       is    if R[r1] ==
        0  pc = ads 
      jal r1
        ads      is    R[r1] = PC; PC =
        ads    ;  jump and link
      ret
        r1         
        is    PC = R[r1]             
      ;  return 
    Arithmetic
    two-complement integer arithmetic
    
    add r1 r2 r3    
      R[r1] = R[r2] + R[r3]
    add r1 r2 #n    
      R[r1] = R[r2] + sign extended n (n is 17 bits)
    sub r1 r2 ...    R[r1]
      = R[r2] - R[r3]
     mul r1 r2 ...    R[r1]
      = R[r2] * R[r3]
     div r1 r2 ...    R[r1]
      = R[r2] / R[r3]  integer division
    Logic (bitwise)
    
    and r1 r2 r3    
      R[r1] = R[r2] bit-and R[r3]
    and r1 r2 #n    
      R[r1] = R[r2] bit-and n
      (sign extended)
    or r1 r2
      r3      R[r1] = R[r2] bit-or  R[r3]
    or r1 r2 #n
           R[r1] = R[r2] bit-or  n (sign extended)
    xor r1 r2 r3    
      R[r1] = R[r2] bit-xor R[r3]
    xor r1 r2
      #n     R[r1] = R[r2] bit-xor n
      (sign extended)
    eq r1 r2
      r3      R[r1] = R[r2] == R[r3]
    eq r1 r2
      #n      R[r1] = R[r2] == n
      (sign extended)
      ne ...
      lt ...
      le ...
      gt ...
      ge ...
     shl r1 r2
      r3     R[r1] = R[r2] << R[r3]
    shl r1 r2 #n    
      R[r1] = R[r2] << #n
    shr r1 r2 r3    
      R[r1] = R[r2] >> R[r3]
    shr r1 r2 #n    
      R[r1] = R[r2] >> #n
    Stack operation
    
    To facilitate passing the parameters to a subroutine and also to save state
    (link register) for recursive call, two stack operations are defined: push,
    pop.
    
    push r1
      r2      is    R[r1]++; M[R[r1]] =
      R[r2]  // push r2, r1 as
      stack pointer   
    pop  r1
      r2      is    R[r2] = M[R[r1]];
      R[r1]--  // pop to r2,
      r1 as stack pointer
    Pseudo
    Pseudo instructions are unlike other instructions, they are used mainly to
    control the simulator and to perform input/output.  Some pseudo
    instructions are just like short-hand mnemonics, they are synthesized from
    other real instructions.
    
    trap
      n           special
      instruction, n is in r1-field.
    
    trap
    0                  
stop
    simulation
    trap
    1                  
print
    integer in R[30]
    trap
    2                  
print
    character in R[30]
    
    R0 is always zero, many instructions can be synthesized using r0.
    
    mv r1 r2    
      is    or r1 r2 r0     
      mv r1 #n     is    add r1 r0 #n 
      (n is 17-bit)
    
    Indirect address can be synthesized from index with r0.
    
    ld r5 +r2 r0  is   load r5 with indirect
      r2       R[5] = M[R[2]]
      
     To complement a register, xor with 0xFFFFFFFF (-1) can be used.
    
    not r1 r2    
      is    xor r1 r2 #-1      
      R[r1] = complement R[r2]
    Instruction format
    
    L-format   
op:5
        rd1:5 ads:22
      D-format    op:5
        rd1:5 rs2:5 disp:17
      X-format    op:5
        rd1:5 rs2:5 rs3:5 xop:12
    
    
    (rd dest, rs source, ads 22-bit,  disp 17-bit sign extended)
    
    Instructions are fixed length at 32 bits.  There are 32 registers, with
    R[0] always returns zero. The address space is 32-bit.  Access to
    memory is always on word boundary (no byte-access). Direct (absolute
    address) is 22-bit or the first 4M words.  Index and indirect access
    can reach the whole 32-bit address space.  Immediate value (d) is
    17-bit.  It is sign extended.  The  jump instructions (jmp,
    jt, jf) have 22-bit address. 
    ISA and opcode encoding
    
    opcode   format 
    
    0  -    nop
    1  L    ld r1
        ads    
      (ads is absolute 22-bit)
      3  L    st ads
        r1  
    6  L    jmp
        ads   
    7  L    jal r1
        ads       
    8  L   
      jt r1 ads 
    9  L   
      jf r1 ads  
      10 D    add r1 r2 #n  (n 17-bit
      is sign extended)
    11 D   
        sub r1 r2 #n  ...
    12 D   
      mul r1 r2 #n
    13 D   
      div r1 r2 #n
    14 D   
      and r1 r2 #n
    15 D   
      or r1 r2 #n
    16 D    xor r1 r2 #n
    17 D   
      eq r1 r2 #n
    18 D   
      ne r1 r2 #n
    19 D   
      lt r1 r2 #n
    20 D   
      le r1 r2 #n
    21 D   
      gt r1 r2 #n
    22 D   
      ge r1 r2 #n
    23 D   
      shl r1 r2 #n
    24 D   
      shr r1 r2 #n
    25..30 undefined
    31 xop   
      -     X
    
    xop
    0  X   add r1 r2 r3
    1  X   sub r1 r2 r3
    2  X  
      mul r1 r2 r3
    3  X  
      div r1 r2 r3
    4  X  
      and r1 r2 r3
    5  X  
      or r1 r2 r3
    6  X  
      xor r1 r2 r3
    7  X  
      eq r1 r2 r3
    8  X  
      ne r1 r2 r3
    9  X  
      lt r1 r2 r3
    10 X  
      le r1 r2 r3
    11 X  
      gt r1 r2 r3
    12 X  
      ge r1 r2 r3
    13 X  
      shl r1 r2 r3
    14 X  
      shr r1 r2 r3
    16 X   ld r1 +r2 r3
    17 X   st +r2 r3 r1 
    18 X   ret r1
    19 X   trap #n       use #n at r1 n = 0..31
    20 X   push r1 r2
         use r1 as stack pointer
    21 X   pop r1 r2
          use r1 as stack pointer
    23..4095 undefined
    
    Historical fact
    
    S21 is an extension of S2 from 2007. This design is a result of my
    collective experience in teaching assembly language to novice programmers
    for many years.  S2 has been used for teaching since 2001.  S2
    itself is an "extended" version of S1 (a 16-bit processor) which is used for
    teaching since 1997.
    
    To improve understandability of S2 assembly language, flags are not
    used.  Instead, new logical instructions that have 3-address are
    introduced.  They are similar to existing arithmetic
    instructions.  The result (true/false) is stored in a register. 
    Two conditional jumps are introduced "jt", "jf". They make use of the result
    from logical instructions.  To avoid the confusion between absolute
    addressing and moving between registers, a "pseudo" instruction "mv" is
    introduced and to make ld/st symmetry, "ld r1 #n" is eliminated. Indirect
    addressing caused difficulty to many students (similar to pointers in C
    language).  However, array index is simple to understand (may be aided
    by students' experience in high level language programming). So, the use of
    index instead of indirect is enforced here. 
    
    The opcode format and assembly language format for S2 follow the tradition
    "dest = source1 op source2" from classic machines such as PDP-11, VAX and
    IBM S360. 
    Example programs
    1.  Add 1 to 10
    s = 0
      i = 0
      while i <= 10
        s = s + i
        i = i + 1
    
    use r1 as s, r2 as i, r3 as flag
    .code 0
          or r1 r0 r0    ;; s = 0
        or r2 r0 r0    ;; i = 0
      :while
        le r3 r2 #10   ;; i <= 10
        jf r3 exit
        add r1 r1 r2   ;; s = s + i
        add r2 r2 #1   ;; i = i + 1
        jmp while
      :exit
          trap 0         ;; stop
        .end
       
    2.  Sum all elements in an array.  Assume the array is at 100,
    size of array is n.
    s = 0
      i = 0
      while i < n
        s = s + ax[i]
        i = i + 1
    
    use r1 as s, r2 as i, r3 as base, r4 as ax[i], r5 as flag
    .code 0
          or r1 r0 r0     ;; s = 0
        or r2 r0 r0     ;; i = 0
        add r3 r0 #100  ;; set base address
      :while
        lt r5 r2 #5     ;; i < n
        jf r5 exit
        ld r4 +r3 r2    ;; ax[i]
        add r1 r1 r4    ;; s = s + ax[i]
        add r2 r2 #1    ;; i = i + 1
        jmp while
      :exit
          trap 0          ;;
        stop
        
        .data 100
          11 22 33 44 55  ;; ax[.] at 100
        .end
       
    3.  Write "pointer chasing" program in assembly language.
    
    Goal, to test if a number is in a list.
    
    Data structure  (singly linked list)
    
    A list is made out of an array of nodes.  A node has two fields: the
    first one contains the number, the second one contains an index to the next
    node. The last node has 0 in the next field, signified the end of the list.
    For example, this is a list of number <11,22,33>.  The list start
    at index 100.
    
    address: 100  101   102   103  
      104   105   106
      data:    11   102  
      22    104   33   
      0     ...
    
    Pseudo code
    
    Let L be a list, x is a number.
    member(x, L)
        if head(L) == x return yes
        while tail(L) != 0
          if head(L) == x return yes
          L = tail(L)
        return no
    
    We can improve this program a bit.
    member2(x, L)
        while L != 0
          if head(L) == x return yes
          L = tail(L)
        return no
    
    It can also be written in a recursive form.
    memberrec(x, L)
        if head(L) == x return yes
        if tail(L) == 0 return no
        return memberrec(x, tail(L))
    
    We write the second version of program in s21 assembly language. The program
    runs  "member2(22,L)"   where L = 100.  That is, it
    checks whether 22 is in the list L.
    
    How to access head(L) and tail(L)?  We use index addressing, 
    assume base address of L is in r2, then 
    
      ld r4 +r0 r2   
    
    will load M[0+r2] to r4 which is exactly head(L).  For tail(L) we need
    offset 1.  Let assume r1 is 1, then
    
      ld r4 +r1 r2
    
    will load M[1+r2] to r4, tail(L).  Use r1 for c1, r2 base, r3 x, r4
    value, r5 flag.
    ;; member of list
      
      .code 0
      :member2
        mv r1
        #1            ;;
        constant 1
        mv r2
        #100          ;; base
        address L
        mv r3 #22
                  ;; x = 22
      :loop
        jf r2
        no           
        ;;  while L != 0
        ld r4 +r0 r2       
        ;;  head(L)
        eq r5 r4 r3        
        ;;  head(L) == x ?
        jt r5 yes
        ld r2 +r1 r2       
        ;;  L = tail(L)
        jmp loop
      :yes
        mv r5 #1
        jmp exit
      :no 
        mv r5 #0
      :exit
        trap 0
      
      .data 100 
        11 102 22 104 33 0   ;; list L at 100
      .end  
      
    Put this code into a text file "mem2.txt".  Assemble it using:
    c:> as21 mem2.txt
    The assembler produces a listing file and an object file.  The
    simulator s21.exe  takes an object file and run it.
    c:> s21 mem2.obj
    Using the command "t"  (single step), you will see:
    
    D:\prabhas\bag\s21\test>s21 mem2.obj
      load program, last address 106
      >t
      PC   0 mv r1
      #1          r1:1 r2:0 r3:0
      r4:0 r5:0 r28:0 r29:0 r31:0
      >t
      PC   1 mv r2 #100        r1:1
      r2:100 r3:0 r4:0 r5:0 r28:0 r29:0 r31:0
      >t
      PC   2 mv r3 #22        
      r1:1 r2:100 r3:22 r4:0 r5:0 r28:0 r29:0 r31:0
      >t
      PC   3 jf r2
      11          r1:1 r2:100 r3:22
      r4:0 r5:0 r28:0 r29:0 r31:0
      >t
      PC   4 ld r4 +r0 r2      r1:1 r2:100
      r3:22 r4:11 r5:0 r28:0 r29:0 r31:0
      >t
      PC   5 eq r5 r4 r3       r1:1
      r2:100 r3:22 r4:11 r5:0 r28:0 r29:0 r31:0
      >t
      PC   6 jt r5
      9           r1:1 r2:100
      r3:22 r4:11 r5:0 r28:0 r29:0 r31:0
      >t
      PC   7 ld r2 +r1 r2      r1:1 r2:102
      r3:22 r4:11 r5:0 r28:0 r29:0 r31:0
      >t
      PC   8 jmp
      3            
      r1:1 r2:102 r3:22 r4:11 r5:0 r28:0 r29:0 r31:0
      >t
      PC   3 jf r2
      11          r1:1 r2:102 r3:22
      r4:11 r5:0 r28:0 r29:0 r31:0
      >t
      PC   4 ld r4 +r0 r2      r1:1 r2:102
      r3:22 r4:22 r5:0 r28:0 r29:0 r31:0
      >t
      PC   5 eq r5 r4 r3       r1:1
      r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0
      >t
      PC   6 jt r5
      9           r1:1 r2:102
      r3:22 r4:22 r5:1 r28:0 r29:0 r31:0
      >t
      PC   9 mv r5
      #1          r1:1 r2:102 r3:22
      r4:22 r5:1 r28:0 r29:0 r31:0
      >t
      PC  10 jmp
      12            r1:1
      r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0
      >t
      stop, execute 16 instructions
      PC  12 trap
      0            r1:1
      r2:102 r3:22 r4:22 r5:1 r28:0 r29:0 r31:0
      >
    
    if you change x to 44.  Reassemble mem2.txt, and run mem2.obj again.
    
    D:\prabhas\bag\s21\test>s21 mem2.obj
      load program, last address 106
      >g
      stop, execute 24 instructions
      >r
      
      r0 :0    r1 :1    r2 :0    r3
      :44   r4 :33   r5 :0    r6
      :0    r7 :0
      r8 :0    r9 :0    r10:0   
      r11:0    r12:0    r13:0   
      r14:0    r15:0
      r16:0    r17:0    r18:0   
      r19:0    r20:0    r21:0   
      r22:0    r23:0
      r24:0    r25:0    r26:0   
      r27:0    r28:0    r29:0   
      r30:0    r31:0
      >
    
    This time we let it run through using the command "g".  The simulator
    stops and we list registers by "r", to see the result in r5 = 0 (No).
    
    
    last update 17 Dec 2012