2110254 Digital Design and Verification 2005
Course description
Processor design at instruction set level and register transfer level,
hardware description language (HDL); functional verification of HDL
models; microprocessors; control unit; memory unit; adders; I/O device
interfaces.
Announcement
Instead of holding the examination for this class at the end of the
semester, it has been decided (for the students' benefit) that we will
administer the test around the beginning of February. The test will
be on-line using JLAB. I will integrate the assembler and simulator
into a special version of JLAB. I hope to release this version for
your use as soon as it is done. Meanwhile, practice your assembly
programming skill using the off-line version posted here. Any bug
report is most welcome.
31 December 2005: The "call" section is updated. This is a correction
for 16-bit PC. It is different from the lecture in class.
4 January 2006: Tools available: assembler and simulator run under
JLAB, and stand-alone version
Part 2
Assembly level programming
lecturer Prabhas Chongstitvatana
office room 18-13, Engineering building 4,
floor 18.
tel 02-2186982
contact prabhas at chula dot ac dot th
Aim: Learn programming at assembly level
Method: Use the instruction set of the chip for the laboratory
sessions (CP8299) to do simple assembly level programming.
previous lecture (last year 2004)
cp8299x
Modify cp8299 to be used for the assembly language programming
class. The modification is minimal, as the intention of the
cp8299 design is to make it as easy as possible to realise. To enable
access to data structures, "indirect addressing" is added to the
lda/sta instructions. This is the only modification to cp8299.
As cp8299 is an 8-bit processor, indirect addressing has a range of only
8 bits. The addresses 0..255 are designated for data, and the
code starts at 256 onward. The stack starts at 512 and
grows toward high memory.
memory map
0..255 data segment
256..511 code segment
512.... stack segment
Assembler syntax
.org xxx    specify the address xxx
sym:        the label of data or code. For data it is used as a
            variable name; for code it is used as the label of a
            control transfer destination (target of jumps or calls).
Addressing mode
lda sym      direct address            ac <- M[sym]
lda (sym)    indirect address          ac <- M[M[sym]]
lda #n       n is an 8-bit constant    ac <- n
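The three modes can be modelled in a few lines of Python. This is only a sketch: the memory is a plain list, and the helper names (lda_direct, lda_indirect, lda_immediate), the memory size and the sample addresses are made up for illustration; they are not part of the assembler or the simulator.

```python
# Sketch of the three lda addressing modes; M models the memory.
M = [0] * 1024          # hypothetical memory size
M[5] = 20               # a variable "sym" at address 5 holding 20
M[20] = 99              # the cell that sym points to

def lda_direct(ads):
    """lda sym : ac <- M[sym]"""
    return M[ads]

def lda_indirect(ads):
    """lda (sym) : ac <- M[M[sym]]"""
    return M[M[ads]]

def lda_immediate(n):
    """lda #n : ac <- n (kept to 8 bits)"""
    return n & 0xFF

print(lda_direct(5), lda_indirect(5), lda_immediate(7))   # 20 99 7
```

Indirect mode reads memory twice: once to fetch the pointer and once to fetch the value it points to.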
Assumptions about the meaning of programs
1 The stack grows towards high memory.
2 "add" is 2's complement arithmetic (so the range of the result is
-128..127, not 0..255).
3 "jsr label" does the following:
push the return address (the address of the
next instruction after jsr)
jump to label
The return "rts" pops the stack, uses the popped value as the return
address, and transfers to that address.
4 The main code starts at 256.
Explanation of instruction set
There are 16 instructions in cp8299. They are grouped into
four groups:
1 lda/sta                                   ac <-> M/n
2 add/and/ora/xor       arithmetic/logic    ac <- ac op M/n
3 jmp/jpc/jpz/jsr/rts   control transfer
4 psh/pop/ror/rol/lds   others
lda/sta have 3 addressing modes: direct, indirect and
immediate. Direct and indirect require a 16-bit argument
(the address); immediate takes an 8-bit argument. Arithmetic/logic
instructions have 2 addressing modes: direct and immediate (8-bit).
Control transfer instructions have a 16-bit argument, which is an
absolute address (I will ignore the relative mode).
Each instruction has this format:
op:4 mode:4 [argument]
The argument is optional; it is either 8-bit (immediate) or 16-bit
(address).
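A sketch of how such a first byte could be split into its two 4-bit fields. The text gives the field order (op:4 mode:4) but not the bit positions or the numeric codes, so placing op in the high nibble is an assumption, and the byte value 0x3A and both function names are invented for illustration.

```python
def split_first_byte(byte):
    """Return (op, mode), assuming op is the high nibble and mode the low nibble."""
    op = (byte >> 4) & 0xF
    mode = byte & 0xF
    return op, mode

def instruction_length(arg_bits):
    """Total bytes: 1 for op/mode plus 0, 1 or 2 argument bytes."""
    return 1 + arg_bits // 8

print(split_first_byte(0x3A))    # (3, 10)
print(instruction_length(0))     # 1  no argument
print(instruction_length(8))     # 2  immediate
print(instruction_length(16))    # 3  address
```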
Meaning of each instruction
lda ads      ac <- M[ads]
lda (ads)    ac <- M[M[ads]]
lda #n       ac <- n
sta ads      M[ads] <- ac
sta (ads)    M[M[ads]] <- ac
note: there is no sta #n mode!
add ads      ac <- ac + M[ads]
add #n       ac <- ac + n
similarly for and/ora/xor
jmp ads      jump to ads, pc = ads
jpc ads      if carry flag == 1 jump to ads
jpz ads      if zero flag == 1 jump to ads
jsr ads      jump to subroutine ads
rts          return from subroutine
psh          push ac to stack, sp++, M[sp] <- ac
pop          pop stack to ac, ac <- M[sp], sp--
rol          rotate ac left (through carry)
ror          rotate ac right (through carry)
lds ads      load stack pointer, sp = ads
note: we never use lds; in the simulator sp starts at 512
automatically.
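The psh/pop definitions above can be traced with a small Python model. It is a sketch only: the list M, the memory size and the function names mirror the notation of the table and are not taken from the simulator source.

```python
M = [0] * 1024
sp = 512                 # the simulator starts sp at 512
ac = 0

def psh():
    """psh : sp++, M[sp] <- ac"""
    global sp
    sp += 1
    M[sp] = ac

def pop():
    """pop : ac <- M[sp], sp--"""
    global sp, ac
    ac = M[sp]
    sp -= 1

ac = 4
psh()
ac = 5
psh()
pop()
first = ac               # 5, last in first out
pop()
second = ac              # 4
print(first, second, sp) # 5 4 512
```

Because sp is incremented before the store, the stack grows toward high memory and sp always points at the top element.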
Assembly language programming
We will approach programming from the view of translating a high level
language to assembly language.
- assignment
- control transfer
- call
Assignment
a = b + c
where a, b, c are variables
In cp8299 there is only ac, so all variables are declared to be in
memory. We declare the locations of the variables by:
.org 0
a: 0
b: 10
c: 20
These lines declare variables a, b, c at the consecutive locations 0, 1,
2 with the values 0, 10 and 20 (the simulator will
instantiate these values in memory).
The code for a = b + c is:
lda b
add c
sta a
The whole program is:
.org 0
jmp begin
a: 0
b: 10
c: 20
.org 100H
begin:
lda b
add c
sta a
.end
At reset, the cpu pc starts at 0. "jmp begin" transfers
to the code segment, which starts at 100H (256), as the data segment
occupies the addresses 0..0FFH (0..255). ".end" denotes the end
of the assembly program. The simulator knows where the program
ends and will stop the execution there.
Another example
a = b + c - d
As there is no instruction to subtract, we have to convert d to (-d)
and use "add". To convert to (-d) we invert every bit (xor with -1)
and add one.
.org 0
jmp begin
a: 0
b: 10
c: 20
d: 5
tmp: 0
.org 100H
begin:
lda b
add c
sta tmp
lda d
xor #-1 ;; invert every bit
add #1 ;; ac = (-d)
sta d ;; note: d now holds (-d)
lda tmp
add d
sta a
.end
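The "invert and add one" rule can be checked on 8-bit patterns in Python. This is a sketch; neg8 and to_signed are hypothetical helper names, and 0xFF plays the role of the constant -1 used with xor above.

```python
def neg8(d):
    """(-d) as an 8-bit pattern: invert every bit, then add one."""
    return ((d ^ 0xFF) + 1) & 0xFF

def to_signed(x):
    """Read an 8-bit pattern as a 2's complement number (-128..127)."""
    return x - 256 if x >= 128 else x

b, c, d = 10, 20, 5
a = (b + c + neg8(d)) & 0xFF      # a = b + c - d using only "add"
print(neg8(d), to_signed(neg8(d)), a)   # 251 -5 25
```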
Control transfer
The if-then-else construct can be translated as follows:
if ex1 then ex2 else ex3

ex1
jump-if-false else
ex2
jump exit
else:
ex3
exit:
A concrete example:
if a == 0 then b = 1 else b = 2
.org 0
jmp begin
a: 0
b: 0
.org 100H
begin:
lda a
;; we don't have jump-if-not-zero
;; so we jump to "then" instead
jpz then
else:
lda #2
sta b
jmp exit
then:
lda #1
sta b
exit:
.end
The instruction set has only the "positive" sense (jump-if-carry,
jump-if-zero). For the "negative" sense, we can "swap" the
destinations, as done above with "then" and "else".
The "while" construct is translated as follows:
while cond body

loop:
cond
jump-if-false exit
body
jmp loop
exit:
a concrete example
i = 0
while i < 10
i = i + 1
.org 0
jmp begin
i: 0
.org 100H
begin:
lda #0
sta i
loop:
;; to do i < 10
;; we do i + (-10)
;; and check if carry flag is set
;; it signifies negative result
lda i
add #-10
jpc doit
jmp exit
doit:
lda i
add #1
sta i
jmp loop
exit:
.end
To test i < 10, we compute i + (-10) and ask whether the result is
negative, using the carry flag to indicate the negative result. As
there is no jump-if-not-carry, we "swap" the jump destinations. Other
loops, such as for, can be done similarly to the while loop.
Subroutine call and parameter passing
See the following example of a code snippet.
sum(a,b)
return a+b
main
c = sum(4,5)
Calling a subroutine is done by the "jsr" instruction. "jsr ads"
implicitly pushes the return address (the next instruction after jsr)
onto the stack (pointed to by the stack pointer, sp) before transferring
to the destination address. There are two questions:
1) How are the actual parameters (4 and 5) bound to the formal
parameters (a and b)?
2) How does the subroutine return the value (a+b) to the caller?
There are many ways to do parameter passing. The simplest way is
to declare the formal parameters as global variables. The
caller just sets the values and transfers control. The
subroutine gets the values from those global variables. However, this
method precludes recursive subroutines (because it uses
global variables and therefore has side effects). We opt for the
alternative of using the stack to pass parameters. The simulator
implements a Big Endian representation (Hi byte first). When
pushing a 16-bit value onto the stack, the Lo byte is pushed first, then
the Hi byte (so that the number in the data segment and stack segment
will be ordered in the same way). The
return address is 16-bit and is saved on the stack when doing a "call".
sum(4,5) is translated to
lda #4
psh
lda #5
psh
jsr sum
the picture of stack is:
hi
retads <-- sp
retads2
5
4
lo
The subroutine must "unstack" the stack to get its actual parameters.
Let us declare four variables (retads, retads2, a, b) as local to the
subroutine and store the values from the stack there.
.org 0
jmp main
retads: 0
retads2: 0
a: 0
b: 0
c: 0
.org 100H
main:
lda #4
psh
lda #5
psh
jsr sum
...
Now we must do the last piece, returning a value to the
caller. We will also use the stack to pass the value back. We
must arrange the values on the stack so that when the instruction
"rts" executes, the return address is properly placed at the
top of the stack. The return value will be "under" this return
address. See the picture of the stack before return:
hi
retads <- sp
retads2
a+b
lo
This is done by pushing the return value, THEN pushing the return
address back, and doing "rts".
;; subroutine
sum:
pop
sta retads
pop
sta retads2
pop
sta b
pop
sta a
;; do a+b
lda a
add b
psh ;; push a+b
lda retads2
psh
lda retads
psh
rts
The caller simply pops the return value from stack and uses it.
main:
...
lda #4
psh
lda #5
psh
jsr sum
pop
sta c
.end
We still leave one topic unresolved: if this subroutine is recursive,
we must "save" the values of the local variables (a and b) before we
call recursively. How to do that? It is a bit complicated and beyond
an introductory class. I will leave the curious students to work
that out by themselves.
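The whole call sequence for c = sum(4,5) can be traced with a Python list standing in for the stack (append plays the role of psh, pop of pop). This is a sketch under the conventions stated in this section, Lo byte of the return address pushed first; the return address 0x0107 and the function names are made up for illustration.

```python
stack = []

def jsr(return_address):
    """Push the 16-bit return address, Lo byte first, then Hi byte."""
    stack.append(return_address & 0xFF)
    stack.append(return_address >> 8)

def rts():
    """Pop the 16-bit return address (the Hi byte is on top)."""
    hi = stack.pop()
    lo = stack.pop()
    return (hi << 8) | lo

def sum_subroutine():
    retads = stack.pop()          # unstack the return address first
    retads2 = stack.pop()
    b = stack.pop()               # parameters come out in reverse order
    a = stack.pop()
    stack.append((a + b) & 0xFF)  # return value goes "under" the return address
    stack.append(retads2)
    stack.append(retads)

stack.append(4)                   # lda #4, psh
stack.append(5)                   # lda #5, psh
jsr(0x0107)                       # jsr sum (made-up return address)
sum_subroutine()
back_to = rts()
c = stack.pop()                   # pop, sta c
print(c, hex(back_to), stack)     # 9 0x107 []
```

The stack is empty again at the end, which is a quick check that every push has a matching pop.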
Accessing data structures
So far we have used only scalar values. Accessing a
scalar is simply lda/sta to a variable. To access a data
structure we need to use a "pointer". The lda/sta instructions are
used in "indirect" addressing mode to access a value pointed to by a
pointer.
lda (ads)    ac <- M[M[ads]]
sta (ads)    M[M[ads]] <- ac
To access an array element, we calculate the "effective" address of the
element by loading the base address of the array and adding the index
(assuming the size of an element is one; if the size is other than one,
we must also calculate the right "offset"). Then, the value of that
element can be accessed by "indirect" addressing.
Let a be an array a[10] whose base address &a[0] is 40.
c = a[2] is translated to:
.org 40
a: ... ;; the array, a[0] is at address 40
ea: 0 ;; use a temp var to store an effective address
c: 0
.org 100H
...
lda #40
add #2
sta ea ;; this is the effective address
lda (ea)
sta c
...
.end
Please note that, as our processor (cp8299x) is an 8-bit machine, its
alu is 8-bit, hence it can perform only 8-bit arithmetic in calculating
the effective address. This limits the range of indirect
addressing to 0..255. In contrast, direct addressing is
16-bit. This characteristic is due to our choice of modification
to the cp8299. Other designs where the addressing is flat 0..65535
are possible (but the question is how you are going to calculate the
effective address?).
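A Python sketch of the effective address calculation, including the 8-bit limit. The array contents (a[k] = 100 + k) and the name effective_address are invented for illustration; the wrap-around at 256 models the 8-bit alu described above.

```python
M = [0] * 1024
BASE = 40                          # &a[0]
for k in range(10):
    M[BASE + k] = 100 + k          # made-up contents: a[k] = 100 + k

def effective_address(base, index):
    """Base plus index on an 8-bit alu: the sum wraps around at 256."""
    return (base + index) & 0xFF

ea = effective_address(BASE, 2)    # lda #40, add #2, sta ea
c = M[ea]                          # lda (ea), sta c
print(ea, c)                       # 42 102
print(effective_address(250, 10))  # 4, the address has wrapped around
```

The last line shows why a data structure reached through indirect addressing has to sit entirely inside 0..255.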
Another example comes from a basic data structure, the linked list.
We assume a "cell" consists of 2 bytes: the first byte is the
information, the second byte is the address of the next cell (with only
0..255, the pointer is only 8-bit). We can access a cell using
"indirect" addressing.
let m be a list (3 4 5) of the following structure: a cell is
represented as 2 bytes: [ads:info, ads:next]
we can represent (3 4 5 ) as:
[10:3, 11:20] [20:4, 21:24] [24:5, 25:0]
A null pointer is 0. Accessing m.info is similar to m[0],
and m.next is m[1].
m = 10
p = m.next
m.info = 6
...
lda #10
sta m
lda m
add #1 ;; m.next
sta ea
lda (ea)
sta p
lda #6
sta (m) ;; m.info
...
Examples of assembly programs
Armed with these basics, we will now show you some assembly
language programs. "jsr 1001" is a pseudo instruction to stop the
simulator.
Ex 1 storing a value into the whole array
let a be an array of size 10
i = 0
while i < 10
  a[i] = 8
  i = i + 1
.org 0
jmp begin
i: 0
aa: &a
ea: 0
a: 0 0 0 0 0 0 0 0 0 0
.org 100H
begin:
lda #0
sta i
loop:
lda i
add #-10
jpc body ;; test i < 10
jmp exit
body:
lda aa
add i
sta ea ;; &a[i]
lda #8
sta (ea) ;; a[i] = 8
lda i
add #1
sta i ;; i = i + 1
jmp loop
exit:
jsr 1001
.end
The base address of array a is stored in "aa". If "jmp begin" is
relative, its size is two bytes, so i is at address 2, aa at 3, ea at 4,
and "&a" is the address 5.
The effective address is kept in a temp var "ea".
Ex 2 searching a linked list
Let x be a list of numbers. This list is represented by a linked list
with the structure [ads:info, ads:next], each cell of size two bytes,
with a null pointer 0. Let d be an input number; we want to check if d
is in the list x.
search(x, d) checks if d is in the list x; it returns 1 if found,
otherwise 0.
search(x, d)
  flag = 0
  while x != nil
    if x.info == d
      flag = 1
      break
    else
      x = x.next
  return flag
Test data: the list x is (7, 8, 9). We wrote search() as a subroutine
with two parameters.
;; search
.org 0
jmp main
flag: 0
retads: 0
retads2: 0
ea: 0
ax: 10 ;; &x
d: 0
xp: 0
x: 7 12 8 14 9 0
md: 0
c: 0
;; list x ( [10:7, 11:12] [12:8, 13:14] [14:9, 15:0] )
.org 100H
main:
lda ax
psh
lda #8
psh
jsr search
pop
sta c ;; c = search(x, 8)
jmp exit
;; search is written as a subroutine with two parameters
search:
pop
sta retads
pop
sta retads2
pop
sta d
pop
sta xp ;; x is pointer
lda d
xor #-1
add #1
sta md ;; do (-d) for comparison
lda #0
sta flag
loop:
lda xp
jpz ret ;; test x == nil
sta ea ;; x is already in ac, do &x
lda (ea) ;; get x.info
add md ;; test x.info == d
jpz then
else:
lda xp
add #1
sta ea ;; &(x.next)
lda (ea) ;; x.next
sta xp ;; x = x.next
jmp loop
then:
lda #1
sta flag
ret:
lda flag ;; return flag
psh
lda retads2
psh
lda retads
psh
rts ;; return
exit:
jsr 1001
.end
We wrote search() as a subroutine. In main, search is called as
search(x,8) and the result is stored in c (where we can see the result
by inspecting the memory content). Search has the preamble (pop
retads, pop d, pop x), then the body of the function, then the
postamble to return the result (push return value, push retads and
rts). The body uses the access functions of the linked list as
described in the previous section.
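For comparison, here is the same search over the same memory layout in Python. It is a sketch of the pseudo-code above, not of the simulator: the list M models memory, with the cells placed at the addresses given in the program's comment ([10:7, 11:12] [12:8, 13:14] [14:9, 15:0]).

```python
M = [0] * 256
M[10:16] = [7, 12, 8, 14, 9, 0]   # list x = (7 8 9), two bytes per cell

def search(x, d):
    """Return 1 if d is in the list starting at address x, else 0."""
    flag = 0
    while x != 0:                 # 0 is the null pointer
        if M[x] == d:             # x.info
            flag = 1
            break
        x = M[x + 1]              # x = x.next
    return flag

print(search(10, 8), search(10, 5))   # 1 0
```

M[x] corresponds to "lda (ea)" with ea = x, and M[x + 1] to the same access after "add #1".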
"low level" assembly
programming
In a real chip, resources are always limited, so some
operations are not available directly as an instruction; for example,
cp8299x does not have "subtract". In order to perform such an
operation, a sequence of existing instructions can be used. This is
what I mean by "low level" assembly. You have seen previously how
we perform "subtract" using "xor" and "add":
a = b - c
lda c
xor #-1
add #1
sta mc ;; mc = (-c)
lda b
add mc
sta a
To give you some idea of what a programmer must face when writing
programs at the machine level, I will show how an 8-bit processor
performs 16-bit addition. We use two bytes to store a 16-bit number and
then access it byte by byte. We add the two lo bytes, then use
the carry flag when adding the two hi bytes together.
Let bd, cd be the two input numbers and let ad be the result. We use the
"big endian" convention in storing these numbers in memory.
.org 0
jmp begin
ad: 0 0
bd: 0 4
cd: 0 5
ap: &ad
bp: &bd
cp: &cd
ea: 0
tmp: 0
carry: 0
.org 100H
begin:
lda bp
add #1 ;; access the lo byte
sta ea
lda (ea) ;; get lo(bd)
sta tmp
lda cp
add #1
sta ea
lda (ea) ;; get lo(cd)
add tmp ;; lo(cd)+lo(bd)
sta tmp
jpc docarry
lda #0 ;; carry = 0
sta carry
jmp addnext
docarry:
lda #1 ;; carry = 1
sta carry
addnext:
lda ap
add #1
sta ea
lda tmp
sta (ea) ;; lo(cd+bd)->lo(ad)
lda bp
sta ea ;; access the hi byte
lda (ea) ;; get hi(bd)
sta tmp
lda cp
sta ea ;; access the hi byte
lda (ea) ;; get hi(cd)
add tmp
add carry ;; hi(cd+bd)+carry
sta tmp
lda ap
sta ea
lda tmp
sta (ea) ;; hi(cd+bd)->hi(ad)
exit:
jsr 1001
.end
It is a bit tedious but not difficult. Please note that we use
"pointers" to the numbers (ap, bp, cp) to access them (because each
number becomes a data structure, an array of bytes). The variable
"carry" is used to store the carry bit that will be added to the next
digit (byte). Obviously, programmers are not going to write out all
these details every time they just want to add two 16-bit numbers. This
code belongs in a library. In the assembly language vocabulary,
old-timers called this kind of thing "macro" programming. Now you have
probably started to see why, in some circumstances, assembly language
programming is necessary.
Don't worry if you do not appreciate what is going on at this
level. The "low level" topic will not be included in the
assessment.
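The byte-by-byte addition can be summarised in Python. This is a sketch that assumes the carry flag is the unsigned carry out of the lo-byte addition; the function name add16_bytewise is made up, and each number is passed as (hi, lo) to match the big endian layout of ad, bd and cd.

```python
def add16_bytewise(b_hi, b_lo, c_hi, c_lo):
    """Add two 16-bit numbers using only 8-bit additions."""
    lo_sum = b_lo + c_lo
    carry = 1 if lo_sum > 0xFF else 0     # carry out of the lo byte
    a_lo = lo_sum & 0xFF
    a_hi = (b_hi + c_hi + carry) & 0xFF   # carry into the hi byte
    return a_hi, a_lo

print(add16_bytewise(0, 4, 0, 5))              # (0, 9)
print(add16_bytewise(0x01, 0xFF, 0x00, 0x01))  # (2, 0)
```

The second call is 01FFH + 0001H = 0200H: the lo bytes overflow, and the carry is what moves the hi byte from 1 to 2.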
Exercises
Please try your hand at these examples. They will help to hone
your skill a bit so that you will be comfortable doing assembly
level programming in your forthcoming laboratory sessions (and in the
on-line exam!).
1 Write a program to find the maximum of an array of 8-bit numbers.
2 Write a program to add sum(a,b) where a, b are 8-bit numbers.
3 Write a program to reverse the order of an array of 8-bit numbers.
4 Write the Fibonacci function that you have seen in the lecture; try
it out on the simulator.
5 How fast can you clear a block of memory? How many
instructions per byte of work?
6 Write a subroutine to "multiply" x, y. Remember that we
have only an 8-bit alu, so the result must not exceed 127.
7 Write a subroutine to "insert" one element into the middle of
an array.
Remember to write out pseudo-code (in a kind of high level language)
first. Then allocate space for the variables. Only then
should you start to write the assembly language. This is, I have found
from my experience of teaching this subject, the easiest way to get it
right the first time.
Well, after reading these questions you should be able to invent
questions of your own along these lines.
Good luck and enjoy programming!
< I can write more examples; please let me know what
examples you would like to see >
Tools
assembler and simulator for cp8299x from Aj. Prabhas: stand-alone
version, runs on XP and DOS
How to use assembler
How to use simulator
cp8299x assembler and simulator from Aj. Thit: runs under JLAB
NOTE: the two versions of the tools (Aj. Prabhas and Aj. Thit)
may differ in assembly syntax
Prabhas Chongstitvatana
last update 4 January 2006