S2 version 2.1  assembly language


Syntax of the assembly language

Meta commands  (.symbol .code .data .end)

There are three sections which can occur in any sequence: define symbols, code section, data section.  Each section starts with a meta command: .symbol for symbol section, .code for code section, .data for data section.  Each section ends with any meta command. End the assembly file by .end, it must be the last line.

The ';;' starts the comment to the end of the current line.  Comments are not interpreted by the assembler.

In symbol definition section, symbols are defined with their associated values.  The data section defined constant values.  Lables
can be defined in code section and they can be referred to by other assembly instructions.

;; comment
.symbol        ;; define symbol
  name value
  . . .
.code ads      ;; code segment start at ads
  [:label] op opr1 opr2 ...
  . . .    
.data ads      ;; data segment start at ads
  v v ...      ;; v is number or sym
.end           ;; end of program

.symbol .code .data  can occur in any sequence.  .end is the last line of program.

S2 Assembly language

The convention for operand ordering is: op dest source.  The operands are written in such a way to simplify the assembler using prefix to identify the addressing mode.

ld r1, 10(r2)  is written as  ld r1 @10 r2
ld r1, (r2+r3)   "            ld r1 +r2 r3
add r1, r2, r3   "            add r1 r2 r3
add r1, r2, #20               add r1 r2 #20

The assembler does not check for all possible illegal combination of opcode, addressing mode and operands.  The forms of assembly language for each S2 instruction are:

ld rd ads           ;; load absolute
ld rd @d rs         ;; load indirect
ld rd +rs1 rs2      ;; load index
st ads rd           ;; store absolute
st @d rd rs         ;; store indirect
st +rd1 rd2 rs      ;; store index
op rd rs1 rs2       ;; three arguments
op rd rs #n         ;; immediate
op rd rs            ;; two arguments
mv rd #n            ;; move immediate
mv rd rs            ;; move reg-reg
jmp dest            ;; unconditional jump
jt rd dest          ;; jump if rd == TRUE
jal rd dest         ;; jump and link, call subroutine
ret rd              ;; return
trap num            ;; trap 0, stop

Trap is special instructions,
trap 0      stop the simulation
trap 1      print R[30]  as integer
trap 2      print R[30] as character

S2 instruction format  (field:length)

L-format  op:5 r1:5 ads:22
D-format  op:5 r1:5 r2:5 disp:17
X-format  op:5 r1:5 r2:5 r3:5 xop:12

The object code:

L op num num
D op num num num
X op num num num xop

ads and disp will be sign extended to 32-bit.

Example

;; sum array A[0..9]
;; i = 0
;; s = 0
;; while i < 10
;;    s = s + A[i]
;;    i = i + 1

;; let r1 = i, r2 = s, r3 = test, r4 = A[i]

.symbol
    A 50           ;; array A start at 50
.code 0
    mv r1 #0
    mv r2 #0
:loop   
    lt r3 r1 #10
    jf r3 exit
    ld r4 @A r1
    add r2 r2 r4
    add r1 r1 #1
    jmp loop
:exit   
    trap 0
.data 50
    1 2 3 4 5 6 7 8 9 10
.end

How the assembler work

The assembler works in two passes:  
  pass1
    input scanning, collect symbols, generate token list
  pass2
    generate object code from the token list

pass 1

  collect symbols and resolve reference
  build symbol table
  store token list

token list is an array of token.  Each token stores type, mode, reference and line number (refer to source code line number).  line
number is used in reporting error and listing.  Type is: sym num op dot.  Mode is addressing mode: absolute, indirect, index, immediate, reg-reg, reg-imm, special.

For example ld r1 @d base  will generate the list of four tokens: ( notation : {type,mode,ref} )
{ {op,indirect,ld}, {sym,reg,r1}, {sym,disp,d}, {sym,reg,base} }

pass 2

  generate code from token list.  output format is suitable for a loader of the simulator

a num              set address
{L,D,X} num+       instruction
w num              defined word
e                  end of file

Extended instructions

To enable creation of new instructions, three extended instructions aer provided: xl, xd, xx, associated with three instruction formats: L, D, X.  The assembly language can not have the notation of addressing as usual because the meaning of instruction will be defined by users.  Therefore the operands of the instruction have to be written out without any decoration:

xl  op r1 disp:22
xd  op r1 r2 disp:17
xx  op r1 r2 r3

where op are user defined, disp can be a symbol.  

Example  To add a new instruction "inc r1 r2 value" using D-format, where inc is assigned the opcode number 14, it can be written:

.symbol
  inc 23
  value 1
.code 0
  xd inc r1 r2 value
.end

The generated object code will be:

D 23 1 2 1

The simulator must be extended accordingly to interpret this new instruction.  See more example on assembly form of extended
instruction in the file "testx.txt".

16 Jan 2007