How the assembler works

Syntax of the assembly language

Meta commands (.s .a .c .w .e)

There are three kinds of section, which can occur in any sequence: the symbol definition section, the code section and the data section. Each section starts with a meta command: .s for the symbol section, .c for the code section and .w for the data section. A section ends at the next meta command. The other meta commands are .a, which sets the current address, and .e, which ends the assembly file. .e must be the last line.

';;' starts a comment that runs to the end of the current line. Comments are ignored by the assembler.

In the symbol definition section, symbols are defined together with their associated values. The data section defines constant values. Labels can be defined in any section, and they can be referred to by other assembly instructions.

    ;; comment
    .s                  ;; define symbols
    symbol n            ;; n is the value
    . . .
    .a n                ;; set address to n
    .c                  ;; code segment
    :label op opr1 opr2 ...
    . . .
    .w                  ;; data segment
    v v ...             ;; v is a number or a symbol
    .e                  ;; end of program

.s .a .c .w can occur in any sequence. .e is the last line of the program.

S2 Assembly language

    op opr1 opr2 ...

where

    opr -> v | #v | @v | +v
    v   -> n | sym

The operand order is: op dest source. To keep the assembler simple, each operand carries a prefix that identifies its addressing mode: # for immediate, @ for displacement and + for index.

    usual notation        written as            mode
    ld r1, 10(r2)         ld r1 @10 r2          ;; displacement
    ld r1, (r2+r3)        ld r1 +r2 r3          ;; index
    ld r1, #200           ld r1 #200            ;; immediate
    add r1, r2, r3        add r1 r2 r3          ;; reg-reg
    add r1, r2, #20       add r1 r2 #20         ;; reg-immediate

The assembler does not check every possible illegal combination of opcode, addressing mode and operands.
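The prefix rule can be sketched in a few lines. The Python below is a minimal, hypothetical illustration, not the original assembler: the names classify_operand, resolve, classify_line and REGISTERS are invented here, it assumes a register symbol such as r2 stands for register number 2, and it handles single instruction lines only (no labels, no meta commands).

```python
# Hypothetical sketch: classify S2 operands by their prefix.
# None of these names come from the original assembler.

# Prefix -> addressing mode, following the examples in the table above.
PREFIX_MODES = {"#": "immediate", "@": "displacement", "+": "index"}

# r0..r31 are predefined symbols; r<i> is assumed to mean register number i.
REGISTERS = {f"r{i}": i for i in range(32)}


def resolve(v, symbols):
    """v -> n | sym: return a register number, an integer literal or a symbol value."""
    if v in REGISTERS:
        return REGISTERS[v]
    try:
        return int(v)
    except ValueError:
        return symbols[v]  # KeyError means the symbol is undefined


def classify_operand(opr, symbols):
    """Return (mode, value) for one operand written as v, #v, @v or +v."""
    mode = PREFIX_MODES.get(opr[0])
    if mode is not None:
        return mode, resolve(opr[1:], symbols)
    if opr in REGISTERS:
        return "reg", REGISTERS[opr]
    return "absolute", resolve(opr, symbols)


def classify_line(line, symbols):
    """Split one code line 'op opr1 opr2 ...' and classify each operand."""
    text = line.split(";;", 1)[0]  # drop a trailing ';;' comment
    op, *oprs = text.split()
    return op, [classify_operand(o, symbols) for o in oprs]


if __name__ == "__main__":
    symbols = {"lv1": 100}
    print(classify_line("ld r1 @10 r2   ;; displacement", symbols))
    print(classify_line("ld r1 +r2 r3", symbols))
    print(classify_line("add r1 r2 #20", symbols))
    print(classify_line("ld r1 lv1", symbols))
```

For instance, "ld r1 @10 r2" classifies as ('ld', [('reg', 1), ('displacement', 10), ('reg', 2)]). Keeping the mode in the prefix means the scanner can decide the addressing mode from a single character, without looking at the opcode.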
The assembly-language form of each S2 instruction is:

    ld rd source
    st source rd
    aop rd rs1 rs2
    aop rd rs #n
    sop rd rs
    jmp cond dest
    jal rd dest
    jr rs
    trap num rs

where

    rd     is r1..r31
    rs     is r0..r31
    source -> absolute | disp | index | immediate   (as shown above)
    aop    -> add | sub | mul | div | and | or | xor  (ALU op)
    sop    -> shl | shr                              (shift op)
    cond   -> always | eq | neq | le | lt | gt | ge  (conditional)
    dest   -> label | number

S2 instruction format (field:length in bits)

    L-format   op:5  r1:5  ads:22
    D-format   op:5  r1:5  r2:5  disp:17
    X-format   op:5  r1:5  r2:5  r3:5  xop:12

The object code for the three formats is:

    l op num num
    d op num num num
    x op num num num xop

The ads and disp fields are sign-extended to 32 bits.

The assembler works in two passes:

    pass 1   scan the input, collect symbols, generate the token list
    pass 2   generate object code from the token list

Input scanning

Symbol table

The predefined symbols are the opcodes, r0..r31 and the conditionals. The opcodes are ld st jmp jal jr add sub mul div and or xor shl shr trap. The conditionals are always eq neq lt le ge gt.

Pass 1

    collect symbols and resolve references
    build the symbol table
    store the token list

The token list is an array of tokens. Each token stores a type, a mode, a reference and a line number. The line number refers to the source code line and is used when reporting errors. The type is one of: sym, num, op, dot. The mode is the addressing mode: absolute, displacement, index, immediate, reg-reg, reg-imm, special. For example,

    ld r1 @lv1 base

generates a list of four tokens (notation: {type,mode,ref}):

    { {op,disp,ld}, {sym,reg,r1}, {sym,disp,lv1}, {sym,reg,base} }

Pass 2

Pass 2 generates code from the token list. The output format is suitable for the loader of the simulator:

    a num            set address
    {l,d,x} num+     instruction
    w num            defined word
    e                end of file

4 December 2001

Extended instructions

To enable the creation of new instructions, three extended instructions are provided: xl, xd and xx, associated with the three instruction formats L, D and X.
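Because xl, xd and xx map directly onto the L, D and X formats, it may help to see how those field widths fill a 32-bit word. The sketch below is an illustration, not the simulator's loader: it assumes the fields are laid out from the most significant bit in the order given in the format table, it treats every field other than ads and disp as unsigned, and the names pack, unpack, sign_extend and object_line are hypothetical. It also shows the sign extension of ads and disp.

```python
# Hypothetical sketch: pack and unpack S2 instruction fields in a 32-bit word.
# Field widths come from the S2 format table; laying the fields out from the
# most significant bit in the listed order is an assumption.

FORMATS = {
    "l": [("op", 5), ("r1", 5), ("ads", 22)],
    "d": [("op", 5), ("r1", 5), ("r2", 5), ("disp", 17)],
    "x": [("op", 5), ("r1", 5), ("r2", 5), ("r3", 5), ("xop", 12)],
}
SIGNED = {"ads", "disp"}  # fields that are sign-extended to 32 bits


def sign_extend(raw, width):
    """Read the low `width` bits of raw as a two's-complement number."""
    raw &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return (raw ^ sign_bit) - sign_bit


def pack(fmt, *values):
    """Pack field values for format 'l', 'd' or 'x' into one 32-bit word."""
    fields = FORMATS[fmt]
    if len(values) != len(fields):
        raise ValueError(f"{fmt}-format needs {len(fields)} fields")
    word = 0
    for (name, width), v in zip(fields, values):
        if name in SIGNED:
            lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
        else:
            lo, hi = 0, (1 << width) - 1  # assumed unsigned
        if not lo <= v <= hi:
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | (v & ((1 << width) - 1))
    return word


def unpack(fmt, word):
    """Split a 32-bit word back into named fields, sign-extending ads/disp."""
    fields, shift, out = FORMATS[fmt], 32, {}
    for name, width in fields:
        shift -= width
        raw = (word >> shift) & ((1 << width) - 1)
        out[name] = sign_extend(raw, width) if name in SIGNED else raw
    return out


def object_line(fmt, *values):
    """Format a textual object-code line such as 'd 14 1 2 1'."""
    return " ".join([fmt, *map(str, values)])


if __name__ == "__main__":
    w = pack("d", 14, 1, 2, -1)
    print(object_line("d", 14, 1, 2, -1), hex(w), unpack("d", w))
```

Here pack("d", 14, 1, 2, -1) stores disp as seventeen one-bits, and unpack recovers -1 through sign extension. The textual object code keeps the fields separate, so under this reading the packing would be the loader's job.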
The assembly language cannot use the usual addressing notation for these instructions, because their meaning is defined by the user. Therefore the operands are written out without any decoration:

    xl op r1 disp:22
    xd op r1 r2 disp:17
    xx op r1 r2 r3 xop:12

where op and xop are user defined, and disp can be a symbol. If a new instruction needs a format different from these three, the user can use .w to put a 32-bit value directly into the code section.

Example

To add a new instruction "inc r1 r2 value" in D-format, where inc is assigned the opcode number 14, write:

    .s
    inc 14
    value 1
    .c
    xd inc r1 r2 value
    .e

The generated object code is:

    d 14 1 2 1

The simulator must be extended accordingly to interpret this new instruction. See more examples of the assembly form of extended instructions in the file "testx.txt".

21 December 2002
Prabhas Chongstitvatana