- Announcement
- Part 2 Assembly level programming
- cp8299x
- Assembly language programming
- "low level" assembly programming
- Exercises
- Tools (NEW assembler and simulator)

31 December 2005 The "call" section is updated. This is a correction for 16-bit PC. It is different from the lecture in class.

4 January 2006 Tools available: an assembler and a simulator that run under JLAB, and a stand-alone version.

office room 18-13, Engineering building 4, floor 18.

tel 02-2186982

contact prabhas at chula dot ac dot th

Aim: Learn programming at assembly level

Method: Use the instruction set of the chip used in the laboratory session (CP8299) to do simple assembly level programming.

previous lecture (last year 2004)

As cp8299 is an 8-bit processor, indirect addressing has a range of only 8 bits. Addresses 0..255 are designated for data and the code starts at 256 onward. The stack starts at 512 and grows toward high memory.

0..255 data segment

256..511 code segment

512.... stack segment

specify the address xxx

sym: is the label of data or code, for data it is used as a variable name, for code it is used as label for control transfer destination (target of jumps or calls).

lda sym direct address ac <- M[sym]

lda (sym) indirect address ac <- M[M[sym]]

lda #n n is a constant 8-bit ac <- n

assumption about the meaning of programs

1 stack grows towards the hi memory.

2 "add" is 2's complement arithmetic (so the range of the result is -128..127, not 0..255).

3 jsr label does the following:

push return address (the address of the next instruction after jsr)

jump to label

and "rts" pops the stack, uses the popped value as the return address, and transfers to that address.

4 the main code starts at 256.

four groups:

1 lda/sta ac <-> M/n

2 add/and/ora/xor arithmetic/logic ac <- ac op M/n

3 jmp/jpc/jpz/jsr/rts control transfer

4 psh/pop/ror/rol/lds others

lda/sta have 3 addressing modes: direct, indirect, immediate (sta has no immediate mode). Direct and indirect require a 16-bit argument (the address); immediate takes an 8-bit argument. Arithmetic/logic instructions have 2 addressing modes: direct (16-bit address) and immediate (8-bit constant). Control transfer takes a 16-bit argument, which is an absolute address (I will ignore the relative mode).

Each instruction has this format:

op:4 mode:4 [argument]

argument is optional, it is either 8-bit (immediate) or 16-bit (address).
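The format above fixes the size of an instruction once its addressing mode is known. Here is a minimal Python sketch of that idea. It assumes the op field is the high nibble of the first byte, and the mode numbers are made up for illustration (the real cp8299x encodings are not given in these notes).

```python
# Hypothetical mode numbers, for illustration only; the real cp8299x
# encodings are not listed in these notes.
MODE_NONE = 0       # no argument (e.g. psh, pop, rts)
MODE_IMMEDIATE = 1  # 8-bit constant, e.g. lda #n
MODE_DIRECT = 2     # 16-bit address, e.g. lda ads
MODE_INDIRECT = 3   # 16-bit address, e.g. lda (ads)

# Number of argument bytes that follow the op/mode byte.
ARG_BYTES = {MODE_NONE: 0, MODE_IMMEDIATE: 1, MODE_DIRECT: 2, MODE_INDIRECT: 2}


def first_byte(op, mode):
    """Pack op:4 and mode:4 into one byte (assumes op is the high nibble)."""
    if not (0 <= op <= 15 and 0 <= mode <= 15):
        raise ValueError("op and mode must each fit in 4 bits")
    return (op << 4) | mode


def instruction_length(mode):
    """Total size in bytes: one op/mode byte plus the optional argument."""
    return 1 + ARG_BYTES[mode]
```

Under these assumptions an instruction with a 16-bit address argument occupies three bytes and one with an immediate constant occupies two.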

lda ads ac <- M[ads]

lda (ads) ac <- M[M[ads]]

lda #n ac <- n

sta ads M[ads] <- ac

sta (ads) M[M[ads]] <- ac

note: there is no sta #n mode!

add ads ac <- ac + M[ads]

add #n ac <- ac + n

similarly for and/ora/xor

jmp ads jmp to ads, pc = ads

jpc ads if carry flag == 1 jmp to ads

jpz ads if zero flag == 1 jmp to ads

jsr ads jmp to subroutine ads

rts return from subroutine

psh push ac to stack, sp++, M[sp] <- ac

pop pop stack to ac, ac <- M[sp], sp--

rol rotate ac left (through carry)

ror rotate ac right (through carry)

lds ads load stack pointer, sp = ads

note: we never use lds; in the simulator sp starts at 512 automatically.
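The psh/pop rules (sp++ then store, load then sp--) can be checked with a few lines of Python. This is only a sketch of the rules as written in the table above; the class name and the memory size are arbitrary choices, not part of the simulator.

```python
class Stack8:
    """Model of the stack rules: psh does sp++, M[sp] <- ac; pop does ac <- M[sp], sp--."""

    def __init__(self, sp=512, size=1024):
        self.mem = [0] * size   # memory size is an arbitrary choice for this sketch
        self.sp = sp            # the notes say sp starts at 512 in the simulator

    def psh(self, ac):
        self.sp += 1                    # stack grows toward high memory
        self.mem[self.sp] = ac & 0xFF   # only 8 bits are stored

    def pop(self):
        ac = self.mem[self.sp]
        self.sp -= 1
        return ac


s = Stack8()
s.psh(4)
s.psh(5)
top = s.pop()      # the last value pushed comes back first
below = s.pop()
```

Because sp is incremented before the store, the first value pushed lands at address 513 and sp is back at 512 once everything has been popped.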

- assignment

- control transfer

- call

a = b + c

where a,b,c are variables

in cp8299, the only register is ac, so all variables are declared in memory. we declare the location of a variable by:

.org 0

a: 0

b: 10

c: 20

these lines declare variables a, b, c at the consecutive locations 0, 1, 2 with the values 0, 10 and 20 (the simulator will instantiate these values in memory).

the code for a = b + c is:

lda b

add c

sta a

the whole program is:

.org 0

jmp begin

a: 0

b: 10

c: 20

.org 100H

begin:

lda b

add c

sta a

.end

At the beginning, when reset, the cpu pc starts at 0. "jmp begin" transfers to the code segment, which starts at 100H (256), as the data segment occupies addresses 0..0FFH (0..255). ".end" denotes the end of the assembly program. the simulator knows where the program ends and will stop the execution there.

another example

a = b + c - d

as there is no instruction to subtract, we have to convert d to (-d) and use "add". to convert to (-d) we invert all the bits and add one (2's complement).

.org 0

jmp begin

a: 0

b: 10

c: 20

d: 5

tmp: 0

.org 100H

begin:

lda b

add c

sta tmp

lda d

xor #-1

add #1

sta d

lda tmp

add d

sta a

.end
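The "invert and add one" trick can be checked outside the simulator. The Python sketch below mirrors the program above on 8-bit values; the helper names (to_signed, negate, add8) are mine, and unlike the assembly it keeps d unchanged instead of overwriting it with (-d).

```python
def to_signed(byte):
    """Interpret an 8-bit pattern as a 2's complement number (-128..127)."""
    return byte - 256 if byte >= 128 else byte


def negate(byte):
    """Mirror 'xor #-1' then 'add #1': invert all bits, add one, keep 8 bits."""
    return ((byte ^ 0xFF) + 1) & 0xFF


def add8(x, y):
    """8-bit add: the result is truncated to 8 bits, as in the accumulator."""
    return (x + y) & 0xFF


b, c, d = 10, 20, 5
tmp = add8(b, c)          # lda b / add c / sta tmp
minus_d = negate(d)       # lda d / xor #-1 / add #1
a = add8(tmp, minus_d)    # lda tmp / add (-d) / sta a
```

Here minus_d holds the bit pattern 251, which reads as -5 when interpreted as a signed byte, and a ends up as 25.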

if ex1 then ex2 else ex3

ex1

jump-if-false else

ex2

jump exit

else:

ex3

exit:

a concrete example:

if a == 0 then b = 1 else b = 2

.org 0

jmp begin

a: 0

b: 0

.org 100H

begin:

lda a

;; we don't have jump-if-not-zero

;; so we jump to "then" instead

jpz then

else:

lda #2

sta b

jmp exit

then:

lda #1

sta b

exit:

.end

the instruction set has only the "positive" sense of conditional jumps (jump-if-carry, jump-if-zero). for the "negative" sense, we can "swap" the destinations.

the "while" construct is translated as follows:

while cond

body

loop:

cond

jump-if-false exit

body

jmp loop

exit:

a concrete example

i = 0

while i < 10

i = i + 1

.org 0

jmp begin

i: 0

.org 100H

begin:

lda #0

sta i

loop:

;; to do i < 10

;; as we do i + (-10)

;; and check if carry flag is set

;; it signifies negative result

lda i

add #-10

jpc doit

jmp exit

doit:

lda i

add #1

sta i

jmp loop

exit:

.end

to test i < 10, we compute i + (-10) and ask whether the result is < 0, using the carry flag to indicate a negative result. As there is no jump-if-not-carry, we "swap" the jump destinations. Other loops such as "for" can be done similarly to the while loop.

sum(a,b)

return a+b

main

c = sum(4,5)

Calling a subroutine is done by the "jsr" instruction. "jsr ads" implicitly pushes the return address (the address of the next instruction after jsr) onto the stack (pointed to by the stack pointer, sp) before transferring to the destination address. There are two questions:

1) How are the actual parameters (4 and 5) bound to the formal parameters (a and b)?

2) How does the subroutine return the value (a+b) to the caller?

There are many ways to do parameter passing. The simplest way is to declare the formal parameters as global variables. The caller just instantiates the values and transfers control. The subroutine gets the values from those global variables. However, this method precludes recursive subroutines (because it uses global variables and therefore has side effects). We opt for the alternative of using the stack to pass parameters. The simulator implements a Big Endian representation (Hi byte first). When pushing a 16-bit value onto the stack, the Lo byte is pushed first, then the Hi byte (so that numbers in the data segment and the stack segment are ordered in the same way). The return address is 16-bit and is saved on the stack when doing a "call".

sum(4,5) is translated to

lda #4

psh

lda #5

psh

jsr sum

the picture of stack is:

hi

retads <-- sp

retads2

5

4

lo

The subroutine must "unstack" the stack to get its actual parameters. Let us declare four variables as local to the subroutine (retads, retads2, a, b) and store the values from the stack there.

.org 0

jmp main

retads: 0

retads2: 0

a: 0

b: 0

c: 0

.org 100H

main:

lda #4

psh

lda #5

psh

jsr sum

...

Now we must do the last piece, returning a value back to the caller. We will also use stack to pass a value back. We must arrange the value in the stack so that at return by the instruction "rts", the return address must be properly placed at the top of stack. The return value will be "under" this return address. See the picture of stack before return:

hi

retads <- sp

retads2

a+b

lo

This is done by pushing the return value, THEN pushing the return address back, and then doing "rts".

;; subroutine

sum:

pop

sta retads

pop

sta retads2

pop

sta b

pop

sta a

;; do a+b

lda a

add b

psh ;; push a+b

lda retads2

psh

lda retads

psh

rts

The caller simply pops the return value from stack and uses it.

main:

...

lda #4

psh

lda #5

psh

jsr sum

pop

sta c

.end
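To see that the stack really ends up in the right shape, the whole call can be traced in Python. This is only a sketch of the calling convention described above: the list "stack" stands in for the stack segment (the end of the list is the top), and the return address is an arbitrary value passed in by the test, not one computed by the assembler.

```python
stack = []   # end of the list is the top of the stack


def push16(value):
    """Push a 16-bit value as the notes describe: lo byte first, then hi byte."""
    stack.append(value & 0xFF)          # lo byte
    stack.append((value >> 8) & 0xFF)   # hi byte ends up on top


def call_sum(x, y, return_address):
    # caller: lda #x / psh / lda #y / psh / jsr sum
    stack.append(x)
    stack.append(y)
    push16(return_address)      # done implicitly by jsr

    # subroutine preamble: unstack into the variables
    retads = stack.pop()        # hi byte of the return address
    retads2 = stack.pop()       # lo byte of the return address
    b = stack.pop()
    a = stack.pop()

    # body and postamble: result first, then the return address back on top
    stack.append((a + b) & 0xFF)
    stack.append(retads2)
    stack.append(retads)

    # rts pops the return address; the caller then pops the result
    hi = stack.pop()
    lo = stack.pop()
    resumed_at = (hi << 8) | lo
    result = stack.pop()
    return result, resumed_at
```

After the call the stack is empty again, the caller resumes at the address that jsr saved, and the popped result is a+b.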

We still leave one topic unresolved: if this subroutine is recursive, we must "save" the values of the local variables (a and b) before we call it recursively. How to do that? It is a bit too complicated for an introductory class. I will leave the curious students to work that out by themselves.

lda (ads) ac <- M[M[ads]]

sta (ads) M[M[ads]] <- ac

To access an array element, we calculate the "effective" address of the element by loading the base address of the array and adding the index (assuming the size of an element is one; if the size is other than one, we must also calculate the right "offset"). Then the value of that element can be accessed by "indirect" addressing.

let a be an array a[10], the base address &a[0] is at 40.

c = a[2] is translated to:

.org 40

ea: 0 ;; use a temp var to store an effective address

c: 0

a: ...

.org 100H

...

lda #40

add #2

sta ea ;; this is the effective address

lda (ea)

sta c

...

.end

Please note that, as our processor (cp8299x) is an 8-bit machine, its alu is 8-bit, hence it can perform only 8-bit arithmetic in calculating the effective address. This limits the range of indirect addressing to 0..255. In contrast, direct addressing is 16-bit. This characteristic is due to our choice of modification to the cp8299. Other designs where the addressing is flat 0..65535 are possible (but the question is how you are going to calculate the effective address?).
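The 8-bit limit on the effective address is easy to see in a small Python model. This is a sketch only: the array contents and the location chosen for the temp var ea are made up, and memory is modelled as a plain list of 256 bytes.

```python
def effective_address(base, index):
    """8-bit alu: the sum wraps around, so the result is always in 0..255."""
    return (base + index) & 0xFF


def load_indirect(mem, ea_location):
    """lda (ea): ac <- M[M[ea]]."""
    return mem[mem[ea_location]]


mem = [0] * 256
mem[40:50] = [11, 22, 33, 44, 55, 66, 77, 88, 99, 110]  # made-up contents of a[0..9]
EA = 30                                   # hypothetical location of the temp var ea
mem[EA] = effective_address(40, 2)        # lda #40 / add #2 / sta ea
c = load_indirect(mem, EA)                # lda (ea) / sta c

wrapped = effective_address(250, 10)      # 260 does not fit in 8 bits
```

With these values c receives a[2], and wrapped shows what happens when base plus index passes 255: the address wraps around to the start of the data segment.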

Another example comes from a basic data structure, linked list. If we assume a "cell" consists of 2 bytes, the first byte is the information, the second byte is the address of the next cell (with only 0..255, the pointer is only 8-bit). We can access a cell using "indirect" addressing.

let m be a list (3 4 5) of the following structure: a cell is represented as 2 bytes: [ads:info, ads:next]

we can represent (3 4 5 ) as:

[10:3, 11:20] [20:4, 21:24] [24:5, 25:0]

A null pointer is 0. Accessing m.info is similar to m[0], and m.next is m[1].

m = 10

p = m.next

m.info = 6

...

lda #10

sta m

lda m

add #1 ;; m.next

sta ea

lda (ea)

sta p

lda #6

sta (m) ;; m.info

...
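The same cell layout can be tried in Python, with memory as a list of bytes. This is only a sketch of the representation described above; the helper to_list is mine and is there just to show the list contents before and after the update.

```python
mem = [0] * 256

# list (3 4 5) laid out as in the notes: [10:3, 11:20] [20:4, 21:24] [24:5, 25:0]
for address, info, next_cell in [(10, 3, 20), (20, 4, 24), (24, 5, 0)]:
    mem[address] = info
    mem[address + 1] = next_cell


def to_list(mem, head):
    """Follow the next pointers from head until the null pointer (0)."""
    items = []
    while head != 0:
        items.append(mem[head])     # cell.info is at the cell address
        head = mem[head + 1]        # cell.next is one byte after it
    return items


m = 10
before = to_list(mem, m)
p = mem[m + 1]      # p = m.next   (lda m / add #1 / sta ea / lda (ea) / sta p)
mem[m] = 6          # m.info = 6   (lda #6 / sta (m))
after = to_list(mem, m)
```

Here p becomes 20 (the address of the second cell) and the list changes from (3 4 5) to (6 4 5).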

i = 0

while i < 10

a[i] = 8

i = i + 1

.org 0

jmp begin

i: 0

aa: &a

ea: 0

a: 0 0 0 0 0 0 0 0 0 0

.org 100H

begin:

lda #0

sta i

loop:

lda i

add #-10

jpc body ;; test i < 10

jmp exit

body:

lda aa

add i

sta ea ;; &a[i]

lda #8

sta (ea) ;; a[i] = 8

lda i

add #1

sta i ;; i = i + 1

jmp loop

exit:

jsr 1001

.end

The base address of array a is stored in "aa". If "jmp begin" is relative, its size is two bytes and "&a" will be the address 5. The effective address is kept in a temp var "ea".

search(x, d) checks whether d is in the list x. It returns 1 if found, otherwise 0.

search( x, d )

flag = 0

while x != nil

if x.info == d

flag = 1

break

else

x = x.next

return flag

The test data is the list x = (7 8 9). search() is written as a subroutine with two parameters.

;; search

.org 0

jmp main

flag: 0

retads: 0

retads2: 0

ea: 0

ax: 10 ;; &x

d: 0

xp: 0

x: 7 12 8 14 9 0

md: 0

c: 0

;; list x ( [10:7, 11:12] [12:8, 13:14] [14:9, 15:0] )

.org 100H

main:

lda ax

psh

lda #8

psh

jsr search

pop

sta c ;; c = search(x, 8)

jmp exit

;; search is written as a subroutine with two parameters

search:

pop

sta retads

pop

sta retads2

pop

sta d

pop

sta xp ;; x is pointer

lda d

xor #-1

add #1

sta md ;; do (-d) for comparison

lda #0

sta flag

loop:

lda xp

jpz ret ;; test x == nil

sta ea ;; x is already in ac, do &x

lda (ea) ;; get x.info

add md ;; test x.info == d

jpz then

else:

lda xp

add #1

sta ea ;; &(x.next)

lda (ea) ;; x.next

sta xp ;; x = x.next

jmp loop

then:

lda #1

sta flag

ret:

lda flag ;; return flag

psh

lda retads2

psh

lda retads

psh

rts ;; return

exit:

jsr 1001

.end

We wrote search() as a subroutine. In main, it is called as search(x,8) and the result is stored in c (we can see the result by inspecting the memory content). Search has a preamble (pop retads, pop d, pop x), then the body of the function, then a postamble to return the result (push return value, push retads, and rts). The body uses the linked list access functions described in the previous section.
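The body of search can be checked against a Python version that works on the same memory layout. This is a sketch, not the simulator: it leaves out the stack preamble and postamble, and it does the comparison the same way as the assembly, by adding (-d) and testing for zero.

```python
def search(mem, x, d):
    """Return 1 if d is stored in the list starting at address x, else 0."""
    minus_d = ((d ^ 0xFF) + 1) & 0xFF        # md = (-d), as in the assembly
    while x != 0:                            # jpz ret: the null pointer ends the list
        if (mem[x] + minus_d) & 0xFF == 0:   # x.info + (-d) == 0 means x.info == d
            return 1
        x = mem[x + 1]                       # x = x.next
    return 0


mem = [0] * 256
mem[10:16] = [7, 12, 8, 14, 9, 0]   # list x ( [10:7, 11:12] [12:8, 13:14] [14:9, 15:0] )

found = search(mem, 10, 8)          # c = search(x, 8)
missing = search(mem, 10, 5)
```

With the test data above, found is 1 and missing is 0.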

a = b - c

lda c

xor #-1

add #1

sta mc ;; mc = (-c)

lda b

add mc

sta a

To give you some idea of what a programmer must face when writing programs at the machine level, I will show how an 8-bit processor performs 16-bit addition. We use two bytes to store a 16-bit number and access it byte by byte. We add the two lo bytes first, then use the carry flag when adding the two hi bytes.

Let bd, cd be two input numbers, let ad be the result, we use the "big endian" convention in storing these numbers in the memory.

.org 0

jmp begin

ad: 0 0

bd: 0 4

cd: 0 5

ap: &ad

bp: &bd

cp: &cd

ea: 0

tmp: 0

carry: 0

.org 100H

begin:

lda bp

add #1 ;; access the lo byte

sta ea

lda (ea) ;; get lo(bd)

sta tmp

lda cp

add #1

sta ea

lda (ea) ;; get lo(cd)

add tmp ;; lo(cd)+lo(bd)

sta tmp

jpc docarry

lda #0 ;; carry = 0

sta carry

jmp addnext

docarry:

lda #1 ;; carry = 1

sta carry

addnext:

lda ap

add #1

sta ea

lda tmp

sta (ea) ;; lo(cd+bd)->lo(ad)

lda bp

sta ea ;; access the hi byte

lda (ea) ;; get hi(bd)

sta tmp

lda cp

sta ea ;; access the hi byte

lda (ea) ;; get hi(cd)

add tmp

add carry ;; hi(cd+bd)+carry

sta tmp

lda ap

sta ea

lda tmp

sta (ea) ;; hi(cd+bd)->hi(ad)

exit:

jsr 1001

.end

It is a bit tedious but not difficult. Please note that we use pointers to the numbers (ap, bp, cp) to access them (because each number becomes a data structure, an array of bytes). The variable "carry" stores the carry bit that will be added to the next digit (byte).
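The byte-by-byte addition can be written in a few lines of Python to check the idea. This is a sketch under one assumption: each byte is treated as an unsigned value 0..255 and the carry is the ninth bit of the lo byte sum. The helper names are mine.

```python
def add16_bytewise(b, c):
    """Add two 16-bit numbers stored big endian as [hi, lo] byte pairs."""
    lo = b[1] + c[1]                 # lo(bd) + lo(cd)
    carry = 1 if lo > 0xFF else 0    # what "jpc docarry" records in "carry"
    hi = b[0] + c[0] + carry         # hi(bd) + hi(cd) + carry
    return [hi & 0xFF, lo & 0xFF]    # a carry out of the hi byte is dropped


def to_bytes(n):
    """Split a 16-bit number into [hi, lo]."""
    return [(n >> 8) & 0xFF, n & 0xFF]


def from_bytes(pair):
    """Join [hi, lo] back into a 16-bit number."""
    return (pair[0] << 8) | pair[1]


ad = add16_bytewise([0, 4], [0, 5])                  # bd: 0 4, cd: 0 5
big = from_bytes(add16_bytewise(to_bytes(300), to_bytes(500)))
```

The first call uses the data of the program above and needs no carry. The second one (300 + 500) produces a carry out of the lo byte, which is the case the "carry" variable is there for.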

Obviously, programmers are not going to write out all these details every time they just want to add two 16-bit numbers. This code belongs in a library. In assembly language vocabulary, the old timers called this kind of thing "macro" programming. Now you can probably start to see why, in some circumstances, assembly language programming is necessary.

Don't worry that you do not appreciate what is going on at this level. The "low level" topic will not be included in the assessment.

1 Write a program to find maximum of an array of 8-bit numbers.

2 Write a program to add sum(a,b) where a,b are 8-bit numbers.

3 Write a program to reverse order of an array of 8-bit numbers.

4 Write fibonacci function that you have seen in the lecture, try it out on the simulator.

5 How fast can you clear a block of memory? How many instructions per byte of work?

6 Write a subroutine to "multiply" x,y. Remember that we have only 8-bit alu, so the result must not exceed 127.

7 Write a subroutine to "insert" one element into the middle of an array.

Remember to write out pseudo-code (in a kind of high level language) first. Then allocate space for the variables. Only then can you start to write the assembly language. This is, I have found from my experience of teaching this subject, the easiest way to get it right the first time.

Well, after reading these questions you should be able to invent questions of your own along these lines.

Good luck and enjoy programming!

< I can write more examples, please let me know what example you like to see>

How to use assembler How to use simulator cp8299x

assembler and simulator from Aj. Thit run under JLAB

NOTE: the two versions of tools (Aj. Prabhas and Aj. Thit) may differ in assembly syntax

Prabhas Chongstitvatana

last update 4 January 2006