How the assembler works

Meta commands
S2 Assembly language
S2 instruction format
Pass1
Pass2
Extended instructions

Syntax of the assembly language

Meta commands  (.s .a .c .w .e)

An assembly file has three kinds of sections, which can occur in any sequence: the symbol definition section, the code section and the data section.  Each section starts with a meta command: .s for the symbol section, .c for the code section, .w for the data section.  A section ends where the next meta command begins.  The other meta commands are .a, which sets the current address, and .e, which ends the assembly file; .e must be the last line.  ';;' starts a comment that runs to the end of the current line.  Comments are not interpreted by the assembler.

In the symbol definition section, symbols are defined with their associated values.  The data section defines constant values.  Labels can be defined in any section and can be referred to by other assembly instructions.

;; comment
.s           ;; define symbol
symbol n     ;; n is value
. . .
.a n         ;; set address to n
.c           ;; code segment
:label op opr1 opr2 ...
. . .
.w           ;; data segment
v v ...      ;; v is number or sym
.e           ;; end of program
The meta commands .s .a .c .w can occur in any sequence.  .e is the last line of the program.
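
The layout above can be illustrated with a small Python sketch that strips ';;' comments and groups the remaining tokens under the meta command that opens each section.  This is not the assembler's own scanner: the function name split_sections and the returned list of (meta command, tokens) pairs are assumptions made for illustration only.

```python
META = {".s", ".a", ".c", ".w", ".e"}

def split_sections(source):
    """Group tokens by the meta command that opens their section (illustrative only)."""
    sections = []                            # list of (meta command, tokens)
    for raw in source.splitlines():
        line = raw.split(";;", 1)[0]         # a comment runs to the end of the line
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] in META:
            # a meta command closes the previous section and opens a new one
            sections.append((tokens[0], tokens[1:]))
            if tokens[0] == ".e":            # .e is the last line of the program
                break
        elif sections:
            sections[-1][1].extend(tokens)
        else:
            raise ValueError("text before the first meta command: " + raw)
    return sections
```

Applied to a file with a .s section, a .c section and a closing .e, it returns one entry per meta command, in source order.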

S2 Assembly language

op opr1 opr2 ...
  where
    opr -> v | #v | @v | +v
    v -> n | sym
The convention for operand ordering is: op dest source.  Each operand carries a prefix that identifies its addressing mode, which keeps the assembler simple.

ld r1, 10(r2)    is written as  ld r1 @10 r2   ;; displacement
ld r1, (r2+r3)        "         ld r1 +r2 r3   ;; index
ld r1, #200           "         ld r1 #200     ;; immediate
add r1, r2, r3        "         add r1 r2 r3   ;; reg-reg
add r1, r2, #20       "         add r1 r2 #20  ;; reg-immediate
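
The prefix rule can be sketched in a few lines of Python.  The names PREFIX_MODE and classify_operand are hypothetical, and returning None for an operand without a prefix is an assumption: how a bare operand is read (register or absolute address) depends on the opcode and is not decided here.

```python
# prefix character -> addressing mode, as in the examples above
PREFIX_MODE = {"#": "immediate", "@": "displacement", "+": "index"}

def classify_operand(text):
    """Return (mode, value); mode is None when the operand has no prefix."""
    if text and text[0] in PREFIX_MODE:
        return PREFIX_MODE[text[0]], text[1:]
    return None, text
```

Because the mode is visible from the first character of the operand, the assembler does not need to parse forms such as 10(r2) or (r2+r3).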

The assembler does not check for all possible illegal combinations of opcode, addressing mode and operands.  The forms of assembly language for each S2 instruction are:

ld rd source
st source rd
aop rd rs1 rs2
aop rd rs #n
sop rd rs
jmp cond dest
jal rd dest
jr  rs
trap num rs
where
rd is r1..r31
rs is r0..r31
source -> absolute | disp | index | immediate    (as shown above)
aop -> add | sub | mul | div | and | or | xor    (ALU op)
sop -> shl | shr     (shift op)
cond -> always | eq | neq | le | lt | gt | ge    (conditional)
dest -> label | number
 

S2 instruction format  (field:length)

L-format  op:5 r1:5 ads:22
D-format  op:5 r1:5 r2:5 disp:17
X-format  op:5 r1:5 r2:5 r3:5 xop:12

The object code:

l op num num
d op num num num
x op num num num xop

ads and disp are sign-extended to 32 bits.
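
As an illustration of the field widths, the following Python sketch packs a D-format instruction into one 32-bit word and sign-extends a field.  The text above gives only the field lengths, so placing the first field (op) in the most significant bits is an assumption, and the helper names pack, sign_extend and encode_d are hypothetical.

```python
def pack(fields):
    """Pack (value, width) pairs into one 32-bit word.

    Assumption: the first field occupies the most significant bits.
    """
    word = 0
    total = 0
    for value, width in fields:
        word = (word << width) | (value & ((1 << width) - 1))
        total += width
    if total != 32:
        raise ValueError("field widths must add up to 32 bits")
    return word

def sign_extend(value, width):
    """Read the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def encode_d(op, r1, r2, disp):
    """D-format: op:5 r1:5 r2:5 disp:17."""
    return pack([(op, 5), (r1, 5), (r2, 5), (disp, 17)])
```

A negative displacement is stored in 17 bits by encode_d and recovered by sign_extend(word, 17), which is the same operation, at width 22, that applies to the ads field of the L-format.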
 

The assembler

The assembler works in two passes:
  pass1
    input scanning, collect symbols, generate token list
  pass2
    generate object code from the token list
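
A reason for using two passes is that a label may be used before it is defined.  The Python sketch below is a much simplified illustration of that idea, not the assembler's code: it assumes one instruction per line and one word per instruction, and the function names pass1 and pass2 are hypothetical.

```python
def pass1(lines, start=0):
    """Collect labels and keep the remaining tokens of each line."""
    symbols = {}
    token_lines = []
    address = start
    for number, line in enumerate(lines, 1):
        tokens = line.split(";;", 1)[0].split()
        if tokens and tokens[0].startswith(":"):
            symbols[tokens[0][1:]] = address   # a label names the current address
            tokens = tokens[1:]
        if tokens:
            token_lines.append((number, tokens))
            address += 1                       # assumption: one word per line
    return symbols, token_lines

def pass2(symbols, token_lines):
    """Replace every known symbol by its value; other tokens are left as they are."""
    return [[str(symbols.get(t, t)) for t in tokens] for _, tokens in token_lines]
```

In the input "jmp always end" followed later by ":end trap 0 r0", the value of end is not known when the jmp is scanned; it is known by the time pass2 runs.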

input scanning

symbol table

The predefined symbols are the opcodes, the registers r0..r31 and the conditionals.  The opcodes are ld st jmp jal jr add sub mul div and or xor shl shr trap.  The conditionals are always eq neq lt le ge gt.

pass 1

  collect symbols and resolve reference
  build symbol table
  store token list

The token list is an array of tokens.  Each token stores a type, a mode, a reference and the line number of the source line it came from; the line number is used when reporting errors.  The type is one of: sym num op dot.  The mode is the addressing mode: absolute, displacement, index, immediate, reg-reg, reg-imm, special.

For example, ld r1 @lv1 base generates a list of four tokens:
( notation : {type,mode,ref} )
{ {op,disp,ld}, {sym,reg,r1}, {sym,disp,lv1}, {sym,reg,base} }
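
The token record can be pictured as a small Python data class.  This only restates the example above as a data structure; keeping the reference as the symbol's name (rather than, say, an index into the symbol table) is an assumption, and the names Token and example_tokens are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Token:
    type: str   # sym, num, op or dot
    mode: str   # addressing mode, written as in the example above
    ref: str    # assumption: the symbol's name stands in for the reference
    line: int   # source line number, used in error reports

def example_tokens(line_number):
    """Token list for: ld r1 @lv1 base"""
    return [
        Token("op", "disp", "ld", line_number),
        Token("sym", "reg", "r1", line_number),
        Token("sym", "disp", "lv1", line_number),
        Token("sym", "reg", "base", line_number),
    ]
```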
 

pass 2

  generate code from token list

The output format is suitable for a loader of the simulator:

a num              set address
{l,d,x} num+       instruction
w num              defined word
e                  end of file
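
A loader for this format could be sketched in Python as follows.  This is not the simulator's loader: the name load_object is hypothetical, and two assumptions are made, namely that every instruction or word record occupies one address and that the address advances by one after each record.

```python
def load_object(text):
    """Read object records into a dict: address -> (record kind, numbers)."""
    memory = {}
    address = 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind = parts[0]
        nums = [int(p) for p in parts[1:]]
        if kind == "e":                      # end of file
            break
        if kind == "a":                      # set address
            address = nums[0]
        elif kind in ("l", "d", "x", "w"):   # instruction or defined word
            memory[address] = (kind, nums)
            address += 1                     # assumption: one address per record
        else:
            raise ValueError("unknown record: " + line)
    return memory
```

The fields are kept as separate numbers here; turning an l, d or x record into a 32-bit word is left to the simulator.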

4 December 2001
 

Extended instructions

To enable the creation of new instructions, three extended instructions are provided: xl, xd and xx, associated with the three instruction formats L, D and X.  The assembly language cannot use the usual addressing notation here, because the meaning of the instruction is defined by the user.  Therefore the operands of the instruction have to be written out without any decoration:

XL  op r1 disp:22
XD  op r1 r2 disp:17
XX  op r1 r2 r3 xop:12

where op and xop are user defined, and disp can be a symbol.

If a new instruction needs a format different from the existing three, users can use .w to put a 32-bit value directly into the code section.

Example: to add a new instruction "inc r1 r2 value" using the D-format, where inc is assigned the opcode number 14, write:

.s
inc 14
value 1
.c
xd inc r1 r2 value
.e
The generated object code will be:

d 14 1 2 1
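
The translation of this example can be traced with a short Python sketch.  It is an illustration under assumptions, not the assembler's code: the names FORMAT, value_of and assemble_extended are hypothetical, registers are assumed to be written r0..r31, and every other operand is assumed to be a defined symbol or a decimal number.

```python
# extended mnemonic -> (object record letter, number of operands)
FORMAT = {"xl": ("l", 3), "xd": ("d", 4), "xx": ("x", 5)}

def value_of(token, symbols):
    """Resolve a user symbol, a register name r0..r31, or a decimal number."""
    if token in symbols:
        return symbols[token]
    if token[0] == "r" and token[1:].isdigit():
        return int(token[1:])
    return int(token)

def assemble_extended(line, symbols):
    """Turn one xl/xd/xx line into its object code line."""
    tokens = line.split()
    letter, count = FORMAT[tokens[0]]
    operands = tokens[1:]
    if len(operands) != count:
        raise ValueError("wrong number of operands: " + line)
    return " ".join([letter] + [str(value_of(t, symbols)) for t in operands])
```

With the symbols inc = 14 and value = 1 from the .s section, the line "xd inc r1 r2 value" becomes "d 14 1 2 1".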

The simulator must be extended accordingly to interpret this new instruction.  See more examples of the assembly form of extended instructions in the file "as2\testx.txt":

;;  test extended instruction
.s
inc 14
ldd 15
addx 16
addx2 17
.a 10
.c
     xd inc r1 r2 1          ;; new instruction D-format
     xl ldd r7 data          ;; L-format
     xx addx r3 r4 r5 addx2  ;; X-format
 .w 48230 ;; raw 32-bit
 .c                          ;; back to code
     add r1 r3 #4
     add r1 r2 r3
:data                        ;; data segment
.w 11 22 33
.e
21 December 2002
Prabhas Chongstitvatana