2110254 Digital Design and Verification 2005

Announcement
Part 2 Assembly level programming
cp8299x
Assembly language programming
"low level" assembly programming
Exercises
Tools (NEW assembler and simulator)

Course description

Processor design at instruction set level and register transfer level, hardware description language (HDL); functional verification of HDL models; microprocessors; control unit; memory unit; adders; I/O device interfaces.

Announcement

Instead of holding the examination for this class at the end of the semester, we have decided (for the students' benefit) to administer the test around the beginning of February. The test will be on-line using JLAB. I will integrate the assembler and simulator into a special version of JLAB and hope to release it for your use as soon as it is done. Meanwhile, practice your assembly programming skills using the off-line version posted here. Any bug report is most welcome.

31 December 2005 The "call" section is updated. This is a correction for 16-bit PC. It is different from the lecture in class.
4 January 2006 Tools available: assembler and simulator that run under JLAB, and a stand-alone version

Part 2 Assembly level programming

lecturer   Prabhas Chongstitvatana
office     room 18-13, Engineering building 4, floor 18.
tel        02-2186982
contact    prabhas at chula dot ac dot th

Aim: Learn programming at assembly level
Method: Use the instruction set of the chip used in the laboratory sessions (CP8299) to do simple assembly level programming.

previous lecture (last year 2004)

cp8299x

We modify cp8299 for use in the assembly language programming class. The modification is minimal, as the intention of the cp8299 design is to make it as easy as possible to realise. To enable access to data structures, "indirect addressing" is added to the lda/sta instructions. This is the only modification to cp8299.

As cp8299 is an 8-bit processor, indirect addressing has a range of only 8 bits. Addresses 0..255 are designated for data and the code starts at 256 onward. The stack starts at 512 and grows toward high memory.

memory map

0..255 data segment
256..511 code segment
512.... stack segment

Assembler syntax

.org xxx

specify the address xxx

sym: is the label of data or code, for data it is used as a variable name, for code it is used as label for control transfer destination (target of jumps or calls).

Addressing mode

lda sym      direct address   ac <- M[sym]
lda (sym)    indirect address ac <- M[M[sym]]
lda #n       n is a constant 8-bit   ac <- n
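
The three modes can be mimicked with a small Python model. This is only a sketch, not the actual simulator: the list M stands for the data segment, and the addresses 5 and 9 and their contents are made-up values for illustration.

```python
# Toy model of the three lda addressing modes (not the real simulator).
M = [0] * 256          # data segment, addresses 0..255
M[5] = 9               # a variable at the hypothetical address 5 holds 9
M[9] = 42              # address 9 holds 42

def lda_direct(ads):
    """ac <- M[ads]"""
    return M[ads]

def lda_indirect(ads):
    """ac <- M[M[ads]]: the byte stored at ads is itself used as an address."""
    return M[M[ads]]

def lda_immediate(n):
    """ac <- n, kept to 8 bits."""
    return n & 0xFF

direct = lda_direct(5)          # 9
indirect = lda_indirect(5)      # 42
immediate = lda_immediate(-1)   # 255, the 8-bit pattern of -1
```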

assumption about the meaning of programs

1 stack grows towards the hi memory.
2 "add" is 2's complement arithmetic (so the range of result is -128..127, not 0..255).
3 jsr label does the following:

push return address (the address of next instruction after jsr)
jump to label
and the return "rts" pops the stack and transfers control to the popped return address.

4 the main code starts at 256.

Explanation of instruction set

There are 16 instructions in cp8299. They are grouped into four groups:

1 lda/sta              ac <-> M/n
2 add/and/ora/xor      arithmetic/logic ac <- ac op M/n
3 jmp/jpc/jpz/jsr/rts control transfer
4 psh/pop/ror/rol/lds others

lda/sta have 3 addressing modes: direct, indirect and immediate. Direct and indirect require a 16-bit argument (the address); immediate takes an 8-bit argument. The arithmetic/logic instructions have 2 addressing modes: direct and immediate; immediate is 8-bit. The control transfer instructions take a 16-bit argument, which is an absolute address (I will ignore the relative mode).

Each instruction has this format:

op:4 mode:4 [argument]

argument is optional, it is either 8-bit (immediate) or 16-bit (address).
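
To make the format concrete, here is a hedged Python sketch that packs op and mode into one byte and appends the optional argument. The opcode and mode numbers below are hypothetical (these notes do not list the real values), and putting op in the high nibble is my assumption from the order "op:4 mode:4". The 16-bit argument is emitted Hi byte first, following the Big Endian convention of the simulator.

```python
# Hypothetical numbering: the notes do not give the real opcode or mode values.
OP_LDA = 0x1
MODE_DIRECT, MODE_INDIRECT, MODE_IMMEDIATE = 0x0, 0x1, 0x2

def encode(op, mode, arg=None, arg_bits=0):
    """Return the bytes of one instruction: op:4 mode:4 [argument]."""
    first = ((op & 0xF) << 4) | (mode & 0xF)   # assumption: op in the high nibble
    if arg_bits == 0:
        return bytes([first])
    if arg_bits == 8:                          # immediate
        return bytes([first, arg & 0xFF])
    if arg_bits == 16:                         # address, Hi byte first
        return bytes([first, (arg >> 8) & 0xFF, arg & 0xFF])
    raise ValueError("argument must be 0, 8 or 16 bits")

def decode_first(byte):
    """Split the first byte back into (op, mode)."""
    return (byte >> 4) & 0xF, byte & 0xF

lda_direct = encode(OP_LDA, MODE_DIRECT, 0x0100, 16)   # 3 bytes
lda_immediate = encode(OP_LDA, MODE_IMMEDIATE, 5, 8)   # 2 bytes
```

With this layout an instruction is 1, 2 or 3 bytes long, which matters when you work out the addresses of labels by hand.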

Meaning of each instruction

lda ads         ac <- M[ads]
lda (ads)       ac <- M[M[ads]]
lda #n          ac <- n
sta ads         M[ads] <- ac
sta (ads)       M[M[ads]] <- ac
note: there is no sta #n mode!
add ads        ac <- ac + M[ads]
add #n         ac <- ac + n
similarly for and/ora/xor
jmp ads        jmp to ads, pc = ads
jpc ads        if carry flag == 1 jmp to ads
jpz ads        if zero flag == 1 jmp to ads
jsr ads        jmp to subroutine ads
rts            return from subroutine
psh            push ac to stack, sp++, M[sp] <- ac
pop            pop stack to ac, ac <- M[sp], sp--
rol            rotate ac left (through carry)
ror            rotate ac right (through carry)
lds ads        load stack pointer, sp = ads

note: we never use lds; in the simulator, sp starts at 512 automatically.
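
The stack and rotate instructions are the easiest to get wrong, so here is a minimal Python model of psh, pop, rol and ror as described above. It is a sketch under assumptions: the class name Cpu and the memory size 1024 are mine, and the rotate is the usual 9-bit rotation in which the carry flag acts as the ninth bit.

```python
class Cpu:
    """Toy model of ac, sp and the carry flag (not the real simulator)."""

    def __init__(self):
        self.M = [0] * 1024   # assumed memory size, large enough for the stack
        self.ac = 0
        self.sp = 512         # the notes say sp starts at 512
        self.carry = 0

    def psh(self):
        # sp++, M[sp] <- ac  (stack grows toward high memory)
        self.sp += 1
        self.M[self.sp] = self.ac

    def pop(self):
        # ac <- M[sp], sp--
        self.ac = self.M[self.sp]
        self.sp -= 1

    def rol(self):
        # bit 7 goes to carry, old carry enters bit 0
        out = (self.ac >> 7) & 1
        self.ac = ((self.ac << 1) | self.carry) & 0xFF
        self.carry = out

    def ror(self):
        # bit 0 goes to carry, old carry enters bit 7
        out = self.ac & 1
        self.ac = (self.ac >> 1) | (self.carry << 7)
        self.carry = out

cpu = Cpu()
cpu.ac = 4
cpu.psh()
cpu.ac = 5
cpu.psh()
cpu.pop()
first_popped = cpu.ac     # 5, last in first out
cpu.pop()
second_popped = cpu.ac    # 4

cpu.ac, cpu.carry = 0x81, 0
cpu.rol()
after_rol = (cpu.ac, cpu.carry)   # (0x02, 1)
cpu.ror()
after_ror = (cpu.ac, cpu.carry)   # (0x81, 0), ror undoes rol
```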

Assembly language programming

we will approach programming from the view of translating a high level language to assembly language.

- assignment
- control transfer
- call

Assignment

a = b + c

where a,b,c are variables

In cp8299 there is only one register, ac, so all variables are kept in memory. We declare the location of the variables by:

.org 0
a: 0
b: 10
c: 20

these lines will declare variables a, b, c to be at the locations 0, 1, 2 consecutively with the values 0, 10 and 20. (the simulator will instantiate these values in the memory).

the code for a = b + c is:

lda b
add c
sta a

the whole program is:

.org 0
jmp begin
a: 0
b: 10
c: 20
.org 100H
begin:
lda b
add c
sta a
.end

On reset, the cpu pc starts at 0. "jmp begin" transfers control to the code segment, which starts at 100H (256) because the data segment occupies addresses 0..0FFH (0..255). ".end" denotes the end of the assembly program; the simulator knows where the program ends and will stop execution there.

another example

a = b + c - d

As there is no subtract instruction, we convert d to (-d) and use "add". To form (-d) in 2's complement we invert every bit (xor with -1, i.e. all ones) and add one.

.org 0
jmp begin
a: 0
b: 10
c: 20
d: 5
tmp: 0
.org 100H
begin:
lda b
add c
sta tmp ;; tmp = b + c
lda d
xor #-1 ;; invert all bits of d
add #1 ;; ac = (-d)
sta d ;; note: d now holds (-d)
lda tmp
add d ;; (b + c) + (-d)
sta a
.end
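
The "invert and add one" trick can be checked quickly in Python. This is a small sketch of 8-bit 2's complement arithmetic, not the simulator itself; the helper names neg8, add8 and to_signed are mine.

```python
def neg8(x):
    """8-bit 2's complement negation: invert all bits (xor 0xFF), then add one."""
    return ((x ^ 0xFF) + 1) & 0xFF

def add8(x, y):
    """8-bit addition; anything beyond bit 7 is dropped."""
    return (x + y) & 0xFF

def to_signed(x):
    """Read an 8-bit pattern as a number in -128..127."""
    return x - 256 if x >= 128 else x

b, c, d = 10, 20, 5
minus_d = neg8(d)                 # 251, which reads as -5
a = add8(add8(b, c), minus_d)     # 25 = 10 + 20 - 5
```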

Control transfer

the if-then-else construct can be translated as follows:

if ex1 then ex2 else ex3

ex1
jump-if-false else
ex2
jump exit
else:
ex3
exit:

a concrete example:

if a == 0 then b = 1 else b = 2

.org 0
jmp begin
a: 0
b: 0
.org 100H
begin:
lda a
;; we don't have jump-if-not-zero
;; so we jump to "then" instead
jpz then
else:
lda #2
sta b
jmp exit
then:
lda #1
sta b
exit:
.end

The instruction set has only the "positive" sense of conditional jumps (jump-if-carry, jump-if-zero). When the "negative" sense is needed, we "swap" the destinations.

the "while" construct is translated as follows:

while cond
body

loop:
cond
jump-if-false exit
body
jmp loop
exit:

a concrete example

i = 0
while i < 10
i = i + 1

.org 0
jmp begin
i: 0
.org 100H
begin:
lda #0
sta i
loop:
;; to do i < 10
;; as we do i + (-10)
;; and check if carry flag is set
;; it signifies negative result
lda i
add #-10
jpc doit
jmp exit
doit:
lda i
add #1
sta i
jmp loop
exit:
.end

To test i < 10, we compute i + (-10) and ask whether the result is negative, using the carry flag to indicate the negative result. As there is no jump-if-not-carry, we "swap" the jump destinations. Other loops, such as for, can be translated in the same way as the while loop.
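
The arithmetic behind the comparison can be checked with a short Python sketch. It only verifies that i + (-10), read as an 8-bit 2's complement number, is negative exactly when i < 10; it does not model how the simulator sets its flags. The helper names are mine.

```python
def add8(x, y):
    """8-bit addition, result kept in 0..255."""
    return (x + y) & 0xFF

def is_negative(x):
    """In 2's complement, bit 7 set means the value is negative."""
    return (x & 0x80) != 0

MINUS_TEN = -10 & 0xFF    # 246, the 8-bit pattern of -10

# which i in 0..19 satisfy i + (-10) < 0 ?
smaller = [i for i in range(20) if is_negative(add8(i, MINUS_TEN))]

# the while loop of the example, written with the same test
i = 0
passes = 0
while is_negative(add8(i, MINUS_TEN)):
    i = add8(i, 1)
    passes += 1
```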

Subroutine call and parameter passing

See the following example of a code snippet.

sum(a,b)
return a+b

main
c = sum(4,5)

Calling a subroutine is done by the "jsr" instruction. "jsr ads" implicitly pushes the return address (the address of the next instruction after jsr) onto the stack (pointed to by the stack pointer, sp) before transferring to the destination address. There are two questions:
1) How are the actual parameters (4 and 5) bound to the formal parameters (a and b)?
2) How does the subroutine return the value (a+b) to the caller?

There are many ways to do parameter passing. The simplest way is to declare the formal parameters as global variables. The caller just sets the values and transfers control, and the subroutine gets the values from those global variables. However, this method precludes recursive subroutines (because it uses global variables and therefore has side effects). We opt for the alternative of using the stack to pass parameters. The simulator implements a Big Endian representation (Hi byte first). When pushing a 16-bit value onto the stack, the Lo byte is pushed first, then the Hi byte (so that the number in the data segment and stack segment will be ordered in the same way). The return address is 16-bit and is saved on the stack when doing a "call".

sum(4,5) is translated to

lda #4
psh
lda #5
psh
jsr sum

the picture of stack is:

hi
retads <-- sp
retads2
5
4
lo

The subroutine must "unstack" the stack to get its actual parameters. Let us declare four variables for the subroutine (retads, retads2, a, b) and store the values popped from the stack there.

.org 0
jmp main
retads: 0
retads2: 0
a: 0
b: 0
c: 0
.org 100H
main:
lda #4
psh
lda #5
psh
jsr sum
...

Now we must do the last piece: returning a value to the caller. We will also use the stack to pass the value back. We must arrange the stack so that when "rts" executes, the return address is at the top of the stack. The return value sits "under" this return address. See the picture of the stack before return:

hi

  retads  <-  sp

  retads2 

  a+b

lo

This is done by pushing the return value, THEN pushing the return address back, and then executing "rts".

;; subroutine
sum:
pop
sta retads
pop
sta retads2
pop
sta b
pop
sta a
;; do a+b
lda a
add b
psh ;; push a+b
lda retads2
psh
lda retads
psh
rts

The caller simply pops the return value from stack and uses it.

main:
...
lda #4
psh
lda #5
psh
jsr sum
pop
sta c
.end
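
The whole calling sequence can be traced with a Python list standing in for the stack (the end of the list is the top). This is only a sketch of the convention described above: the return address 0x010B is a made-up value, and the function names jsr, sum_subroutine and rts are mine.

```python
stack = []   # end of the list is the top of the stack

def jsr(ret_hi, ret_lo):
    """Push the 16-bit return address: Lo byte first, then Hi byte."""
    stack.append(ret_lo)
    stack.append(ret_hi)

def sum_subroutine():
    """Mirror of the assembly subroutine sum."""
    retads = stack.pop()     # preamble: save the return address
    retads2 = stack.pop()
    b = stack.pop()          # parameters come off in reverse order
    a = stack.pop()
    stack.append((a + b) & 0xFF)   # return value goes under the return address
    stack.append(retads2)          # postamble: put the return address back
    stack.append(retads)

def rts():
    """Pop the return address and rebuild the 16-bit value."""
    hi = stack.pop()
    lo = stack.pop()
    return (hi << 8) | lo

# caller: c = sum(4,5)
stack.append(4)
stack.append(5)
jsr(0x01, 0x0B)          # hypothetical return address 0x010B
sum_subroutine()
return_to = rts()        # 0x010B, control goes back to the caller
c = stack.pop()          # 9, and the stack is empty again
```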

We leave one topic unresolved: if this subroutine is recursive, we must "save" the values of its local variables (a and b) before calling recursively. How to do that? It is a bit complicated and beyond this introductory class. I will leave the curious students to work it out by themselves.

Accessing data structures

So far we have used only scalar values. Accessing a scalar is simply lda/sta to a variable. To access a data structure we need a "pointer". lda/sta in "indirect" addressing mode access the value pointed to by a pointer.

lda (ads) ac <- M[M[ads]]
sta (ads) M[M[ads]] <- ac

To access an array element, we calculate its "effective" address by loading the base address of the array and adding the index (assuming the element size is one; if the size is other than one, we must also calculate the right "offset"). The element can then be accessed through "indirect" addressing.

let a be an array a[10], the base address &a[0] is at 40.

c = a[2] is translated to:

.org 38 ;; ea at 38, c at 39, so that a starts at 40
ea: 0 ;; use a temp var to store an effective address
c: 0
a: ...

.org 100H
...
lda #40
add #2
sta ea ;; this is the effective address
lda (ea)
sta c
...
.end

Please note that, as our processor (cp8299x) is an 8-bit machine, its alu is 8-bit, hence it can perform only 8-bit arithmetic when calculating the effective address. This limits the range of indirect addressing to 0..255. In contrast, direct addressing is 16-bit. This characteristic is due to our choice of modification to the cp8299. Other designs where the addressing is flat 0..65535 are possible (but the question is how you are going to calculate the effective address?).
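
A short Python sketch shows both the effective address calculation and the 8-bit limit. The array contents (100, 101, ...) are made-up test values and the helper name effective_address is mine; the base address 40 is the one used in the example.

```python
M = [0] * 512
BASE = 40                         # &a[0], as in the example
for k in range(10):
    M[BASE + k] = 100 + k         # made-up contents: a[k] = 100 + k

def effective_address(base, index):
    """base + index computed by an 8-bit alu, so the result wraps at 256."""
    return (base + index) & 0xFF

ea = effective_address(BASE, 2)   # 42
c = M[ea]                         # c = a[2]

# an address past 255 cannot be formed: 250 + 10 wraps around to 4
wrapped = effective_address(250, 10)
```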

Another example comes from a basic data structure, the linked list. Assume a "cell" consists of 2 bytes: the first byte is the information and the second byte is the address of the next cell (with only 0..255, the pointer is only 8-bit). We can access a cell using "indirect" addressing.

let m be a list (3 4 5) of the following structure: a cell is represented as 2 bytes: [ads:info, ads:next]

we can represent (3 4 5 ) as:

[10:3, 11:20] [20:4, 21:24] [24:5, 25:0]

A null pointer is 0. Accessing m.info is similar to m[0], and m.next is m[1].

m = 10
p = m.next
m.info = 6

...
lda #10
sta m
lda m
add #1 ;; m.next
sta ea
lda (ea)
sta p
lda #6
sta (m) ;; m.info
...
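
The same list can be laid out in a Python list that plays the role of memory, which makes it easy to see what each access does. This is a sketch only; the helper names info, next_cell and to_list are mine.

```python
M = [0] * 256
# the list (3 4 5): [10:3, 11:20] [20:4, 21:24] [24:5, 25:0]
for ads, value in [(10, 3), (11, 20), (20, 4), (21, 24), (24, 5), (25, 0)]:
    M[ads] = value

def info(p):
    """p.info is the byte at the cell address, like lda (p)."""
    return M[p]

def next_cell(p):
    """p.next is the byte after it: form ea = p + 1, then lda (ea)."""
    return M[(p + 1) & 0xFF]

def to_list(p):
    """Follow next pointers until the null pointer 0."""
    items = []
    while p != 0:
        items.append(info(p))
        p = next_cell(p)
    return items

m = 10
before = to_list(m)    # [3, 4, 5]
p = next_cell(m)       # p = m.next, which is 20
M[m] = 6               # m.info = 6, like sta (m)
after = to_list(m)     # [6, 4, 5]
```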

Example of assembly programs

Armed with these basics, we now show some assembly language programs. "jsr 1001" is a pseudo-instruction that stops the simulator.

Ex 1 storing a value into the whole array

let a be an array of 10 elements

i = 0
while i < 10
a[i] = 8
i = i + 1

.org 0
jmp begin
i: 0
aa: &a
ea: 0
a: 0 0 0 0 0 0 0 0 0 0
.org 100H
begin:
lda #0
sta i
loop:
lda i
add #-10
jpc body ;; test i < 10
jmp exit
body:
lda aa
add i
sta ea ;; &a[i]
lda #8
sta (ea) ;; a[i] = 8
lda i
add #1
sta i ;; i = i + 1
jmp loop
exit:
jsr 1001
.end

The base address of array a is stored in "aa". If "jmp begin" is relative, its size is two bytes and "&a" will be the address 5; with a 16-bit absolute address it occupies three bytes and "&a" will be 6. The effective address is kept in a temp var "ea".
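
Working out where each label lands is simple bookkeeping, sketched here in Python. This is not the assembler: the function layout and the item lists are my own illustration, assuming each variable takes one byte, the array takes ten, and the jump takes either two bytes (relative) or three bytes (one op:mode byte plus a 16-bit address).

```python
def layout(items, start=0):
    """Assign consecutive addresses; items is a list of (label, size in bytes)."""
    table = {}
    ads = start
    for label, size in items:
        if label is not None:     # None marks an unlabelled instruction
            table[label] = ads
        ads += size
    return table

variables = [("i", 1), ("aa", 1), ("ea", 1), ("a", 10)]

relative = layout([(None, 2)] + variables)   # "jmp begin" as a 2-byte jump
absolute = layout([(None, 3)] + variables)   # "jmp begin" as a 3-byte jump

address_if_relative = relative["a"]   # 5
address_if_absolute = absolute["a"]   # 6
```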

Ex 2 searching a linked list.

Let x be a list of numbers. This list is represented by a linked list of cells [ads:info, ads:next], each of size two bytes, with a null pointer 0. Let d be an input number; we want to check if d is in the list x.

search(x, d) checks if d is in the list x; it returns 1 if found, otherwise 0.

search( x, d )
flag = 0
while x != nil
    if x.info == d
      flag = 1
      break
    else
      x = x.next
return flag

Test data: the list x is (7, 8, 9). We write search() as a subroutine with two parameters.

;; search

.org 0
jmp main
flag: 0
retads: 0
retads2: 0
ea: 0
ax: 10   ;; &x
d: 0
xp: 0
x: 7 12 8 14 9 0
md: 0
c: 0
;; list x ( [10:7, 11:12] [12:8, 13:14] [14:9, 15:0] )

.org 100H
main:
lda ax
psh
lda #8
psh
jsr search
pop
sta c       ;; c = search(x, 8)
jmp exit
;; search is written as a subroutine with two parameters
search:
pop
sta retads
pop
sta retads2
pop
sta d
pop
sta xp     ;; x is pointer
lda d
xor #-1
add #1
sta md ;; do (-d) for comparison
lda #0
sta flag
loop:
lda xp
jpz ret   ;; test x == nil
sta ea    ;; x is already in ac, do &x
lda (ea) ;; get x.info
add md ;; test x.info == d
jpz then
else:
lda xp
add #1
sta ea    ;; &(x.next)
lda (ea) ;; x.next
sta xp    ;; x = x.next
jmp loop
then:
lda #1
sta flag
ret:
lda flag ;; return flag
psh
lda retads2
psh
lda retads
psh
rts        ;; return
exit:
jsr 1001
.end

We wrote search() as a subroutine. In main, search is called as search(x,8) and the result is stored in c (where we can see the result by inspecting the memory content). Search has the preamble: pop retads, pop retads2, pop d, pop x; then the body of the function; then the postamble to return the result: push the return value, push retads2, push retads, and rts. The body uses the linked list access functions described in the previous section.
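
The body of search can be checked against a Python model that uses the same memory image (x at addresses 10..15) and the same comparison trick, adding (-d) and testing for zero. This is a sketch, not the simulator; the stack preamble and postamble are left out and the helper neg8 is mine.

```python
M = [0] * 256
M[10:16] = [7, 12, 8, 14, 9, 0]   # list x: [10:7, 11:12] [12:8, 13:14] [14:9, 15:0]

def neg8(x):
    """8-bit 2's complement negation: xor with all ones, add one."""
    return ((x ^ 0xFF) + 1) & 0xFF

def search(xp, d):
    """Return 1 if d is in the list starting at address xp, otherwise 0."""
    md = neg8(d)                        # (-d), computed once before the loop
    flag = 0
    while xp != 0:                      # test x == nil
        if (M[xp] + md) & 0xFF == 0:    # x.info + (-d) == 0 means x.info == d
            flag = 1
            break
        xp = M[(xp + 1) & 0xFF]         # x = x.next
    return flag

c = search(10, 8)          # 1, since 8 is in (7, 8, 9)
not_found = search(10, 3)  # 0
```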

"low level" assembly programming

In a real chip, resources are always limited, so some operations are not available directly as instructions; for example, cp8299x does not have "subtract". To perform such an operation, a sequence of existing instructions can be used. This is what I mean by "low level" assembly. You have seen previously how we perform "subtract" using "xor" and "add":

a = b - c

lda c
xor #-1
add #1
sta mc ;; mc = (-c)
lda b
add mc
sta a

To give you some idea of what a programmer must face when writing programs at the machine level, I will show how an 8-bit processor performs 16-bit addition. We use two bytes to store a 16-bit number and access it byte by byte: add the two low bytes, then use the carry flag when adding the two high bytes.

Let bd, cd be two input numbers, let ad be the result, we use the "big endian" convention in storing these numbers in the memory.

.org 0
jmp begin
ad: 0 0
bd: 0 4
cd: 0 5
ap: &ad
bp: &bd
cp: &cd
ea: 0
tmp: 0
carry: 0
.org 100H
begin:
lda bp
add #1    ;; access the lo byte
sta ea
lda (ea) ;; get lo(bd)
sta tmp
lda cp
add #1
sta ea
lda (ea) ;; get lo(cd)
add tmp   ;; lo(cd)+lo(bd)
sta tmp
jpc docarry
lda #0    ;; carry = 0
sta carry
jmp addnext
docarry:
lda #1    ;; carry = 1
sta carry
addnext:
lda ap
add #1
sta ea
lda tmp
sta (ea) ;; lo(cd+bd)->lo(ad)
lda bp
sta ea    ;; access the hi byte
lda (ea) ;; get hi(bd)
sta tmp
lda cp
sta ea    ;; access the hi byte
lda (ea) ;; get hi(cd)
add tmp
add carry ;; hi(cd+bd)+carry
sta tmp
lda ap
sta ea
lda tmp
sta (ea) ;; hi(cd+bd)->hi(ad)
exit:
jsr 1001
.end

It is a bit tedious but not difficult. Please note that we use "pointers" to the numbers (ap, bp, cp) to access them (because each number becomes a data structure, an array of bytes). The variable "carry" stores the carry bit that will be added to the next digit (byte).
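
The same byte-by-byte addition, written as a short Python sketch. Each number is a two-element list [hi, lo] in the big endian order used above; the function name add16_bytes is mine, and a result that overflows 16 bits simply loses the final carry.

```python
def add16_bytes(b, c):
    """Add two 16-bit numbers stored big endian as [hi, lo]; return [hi, lo]."""
    lo = b[1] + c[1]                     # add the low bytes first
    carry = 1 if lo > 0xFF else 0        # carry out of the low byte
    hi = (b[0] + c[0] + carry) & 0xFF    # add the high bytes plus the carry
    return [hi, lo & 0xFF]

ad = add16_bytes([0, 4], [0, 5])                     # [0, 9], the example above
with_carry = add16_bytes([0x01, 0xFF], [0x00, 0x01]) # [0x02, 0x00]
bigger = add16_bytes([0x12, 0x34], [0x0F, 0xCD])     # [0x22, 0x01]
```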

Obviously, programmers are not going to write out all these details every time they just want to add two 16-bit numbers. This code belongs in a library. In assembly language vocabulary, the old-timers called this kind of thing "macro" programming. Now you have probably started to see why, in some circumstances, assembly language programming is necessary.

Don't worry if you do not yet appreciate what is going on at this level. The "low level" topic will not be included in the assessment.

Exercises

Please try your hand at these exercises. They will help hone your skills so that you will be comfortable doing assembly level programming in your forthcoming laboratory sessions (and in the on-line exam!).

1 Write a program to find the maximum of an array of 8-bit numbers.
2 Write a program to compute sum(a,b) where a,b are 8-bit numbers.
3 Write a program to reverse the order of an array of 8-bit numbers.
4 Write the fibonacci function that you have seen in the lecture and try it out on the simulator.
5 How fast can you clear a block of memory? How many instructions per byte of work?
6 Write a subroutine to "multiply" x,y. Remember that we have only an 8-bit alu, so the result must not exceed 127.
7 Write a subroutine to "insert" one element into the middle of an array.

Remember to write out pseudo-code (in some kind of high level language) first. Then allocate space for the variables. Only then should you start to write the assembly language. This is, from my experience of teaching this subject, the easiest way to get it right the first time.

Well, after reading these questions you should be able to invent questions of your own along these lines.

Good luck and enjoy programming!

< I can write more examples, please let me know what example you like to see>

Tools

assembler and simulator for cp8299x from Aj. Prabhas: stand-alone version, runs on XP and DOS
How to use the assembler; how to use the simulator for cp8299x
assembler and simulator from Aj. Thit: runs under JLAB
NOTE: the two versions of the tools (Aj. Prabhas and Aj. Thit) may differ in assembly syntax

Prabhas Chongstitvatana

last update 4 January 2006