2110254 Digital Design and
Processor design at instruction set level and register transfer level,
hardware description language (HDL); functional verification of HDL
models; microprocessors; control unit; memory unit; adders; I/O device
Instead of holding the examination for this class at the end of the
semester, it has been decided (for the students' benefit) that we will
administer the test around the beginning of February. The test will
be on-line using JLAB. I will integrate the assembler and simulator
into a special version of JLAB. I hope to release this version for
your use as soon as it is done. Meanwhile, practice your assembly
programming skills using the off-line version posted here. Any bug
report is most welcome.
31 December 2005 The "call" section is
updated. This is a correction for 16-bit PC. It is
different from the lecture in class.
4 January 2006 Tools
available assembler and simulator run under JLAB, and
Assembly level programming
lecturer Prabhas Chongstitvatana
office room 18-13, Engineering building 4,
contact prabhas at chula dot ac dot th
Aim: Learn programming at assembly level
Method: Use the instruction set of the chip for the laboratory
session (CP8299) to do simple assembly level programming.
previous lecture (last year 2004)
The cp8299 is modified for use in the assembly language programming
class. The modification is minimal, as the intention of the
cp8299 design is to make it as easy as possible to realise. To enable
access to data structures, "indirect addressing" is added to the
lda/sta instructions. This is the only modification to cp8299.
As cp8299 is an 8-bit processor, indirect addressing has a range of only
8 bits. The addresses 0..255 are designated for data and the
code starts at 256 onward. The stack starts at 512 and
grows toward high memory.
0..255 data segment
256..511 code segment
512.... stack segment
specify the address xxx
the label of data or code, for data it is used as a variable name, for
code it is used as label for control transfer destination (target of
jumps or calls).
sym      direct address           ac <- M[sym]
(sym)    indirect address         ac <- M[M[sym]]
#n       n is an 8-bit constant   ac <- n
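The three operand forms can be modelled in a few lines of Python. This is only a sketch of the meaning given above, not the simulator's code; the function name `load` and the sample addresses are made up for illustration.

```python
# Model of the three operand forms: sym, (sym), #n.
# 'load' and the sample memory contents are illustrative assumptions.
def load(mem, mode, operand):
    """Return the value that would be placed in ac."""
    if mode == "immediate":   # #n      ac <- n
        return operand & 0xFF
    if mode == "direct":      # sym     ac <- M[sym]
        return mem[operand]
    if mode == "indirect":    # (sym)   ac <- M[M[sym]]
        return mem[mem[operand]]
    raise ValueError("unknown addressing mode: " + mode)

mem = [0] * 256
mem[5] = 40    # address 5 holds a pointer
mem[40] = 99   # the cell it points to
```

With this memory, a direct load from 5 gives the pointer itself, and an indirect load through 5 gives the value stored at 40.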
Assumptions about the meaning of programs:
1 the stack grows towards high memory.
2 "add" is 2's complement arithmetic (so the range of the result is
-128..127, not 0..255).
3 "jsr label" does the following:
push the return address (the address of
the next instruction after jsr)
jump to label
and the return, "rts", pops the stack, uses that as the return address and
transfers to that address.
4 the main code starts at 256.
There are 16 instructions in cp8299. They are grouped into
ac <-> M/n
add/and/ora/xor arithmetic/logic ac
<- ac op M/n
lda/sta has 3 addressing modes: direct, indirect and
immediate. Direct and indirect require a 16-bit argument
(the address); immediate is 8-bit. Arithmetic/logic has 2
addressing modes: direct and immediate; immediate is 8-bit. Control
transfer has a 16-bit argument. The argument is an absolute address (I
will ignore the relative mode).
Each instruction has this format:
argument is optional, it is either 8-bit (immediate) or 16-bit
Meaning of each
lda ads      ac <- M[ads]
lda (ads)    ac <- M[M[ads]]
sta ads      M[ads] <- ac
sta (ads)    M[M[ads]] <- ac
note: there is no sta
add ads      ac <- ac + M[ads]
add #n       ac <- ac + n
similarly for and/ora/xor
jmp ads      jmp to ads, pc = ads
jpc ads      if carry flag == 1 jmp to ads
jpz ads      if zero flag == 1 jmp to ads
jsr ads      push return address, jmp to ads
rts          return from subroutine
psh          push ac to stack, sp++, M[sp] <- ac
pop stack to ac, ac <- M[sp], sp--
rotate ac left (through carry)
rotate ac right (through carry)
lds ads      stack pointer, sp = ads
note: we never use lds; in the simulator sp will start at 512
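The push and pop rules above (sp++ then store, load then sp--) can be checked with a small Python model. This is a sketch only; the class name and the memory size are assumptions, and only the start value 512 comes from the note above.

```python
# Model of the stack rules: push is sp++, M[sp] <- ac;
# pop is ac <- M[sp], sp--. The stack grows towards high memory.
class StackModel:
    def __init__(self, size=1024, sp=512):
        self.mem = [0] * size   # memory size is an assumption
        self.sp = sp            # the simulator starts sp at 512

    def push(self, ac):
        self.sp += 1
        self.mem[self.sp] = ac & 0xFF

    def pop(self):
        ac = self.mem[self.sp]
        self.sp -= 1
        return ac
```

Pushing two bytes moves sp from 512 to 514, and popping returns them in reverse order.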
we will approach programming from the view of translating a high level
language to assembly language.
- control transfer
a = b + c
where a,b,c are variables
in cp8299, there is only ac, so all variables are declared to be in the
memory. we declare the location of variable by:
these lines will declare variables a, b, c to be at the locations 0, 1,
2 consecutively with the values 0, 10 and 20. (the simulator will
instantiate these values in the memory).
the code for a = b + c is:
the whole program is:
At the beginning, when reset, cpu pc starts at 0, "jmp begin" transfers
to code segment which started at 100H (256) as the data segment
occupied the addresses 0..0FFH (0..255). ".end" denotes the end
of the assembly program. the simulator knows where the program
ends and will stop the execution there.
a = b + c - d
As there is no instruction to subtract, we have to convert d to (-d)
and use "add". To convert to (-d) we invert d and add 1 (2's complement).
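The arithmetic behind this trick is easy to verify in Python. The sketch below models 8-bit values with a mask; the function names are illustrative only, and the inversion is written as xor with 255, as the xor instruction would do it.

```python
# 8-bit negate-and-add, modelled with masks.
def neg8(d):
    """2's complement of d: invert all bits (xor 255), then add 1."""
    return ((d ^ 0xFF) + 1) & 0xFF

def add8(x, y):
    """8-bit add; any carry out of bit 7 is dropped here."""
    return (x + y) & 0xFF

def b_plus_c_minus_d(b, c, d):
    return add8(add8(b, c), neg8(d))
```

For example, with b = 10, c = 20 and d = 5, adding the 2's complement of d gives the same 8-bit result as subtracting d.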
The if-then-else construct can be translated as follows:
if ex1 then ex2 else ex3
a concrete example:
if a == 0 then b = 1 else b = 2
;; we don't have jump-if-not-zero
;; so we jump to "then" instead
the instruction set has only the "positive" sense (jump-if-carry,
jump-if-zero); for the "negative" sense, we can "swap" the
jump destinations.
the "while" construct is translated as follows:
a concrete example
i = 0
while i < 10
i = i + 1
;; to do i < 10
;; as we do i + (-10)
;; and check if carry flag
;; it signifies negative
To test i < 10, we compute i + (-10), ask whether the result is < 0, and use the carry flag to
indicate the negative result. As there is no jump-if-not-carry,
we "swap" the jump destinations. Other loops, such as for, can be
translated in the same way as the while loop.
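The comparison can be tried out in Python. How the simulator derives its carry flag from this sum is not shown here, so the sketch below simply looks at the sign bit of the 8-bit result of i + (-10); that is an assumption of the sketch, and the function names are illustrative.

```python
# Test i < 10 by computing i + (-10) in 8-bit 2's complement.
MINUS_TEN = ((10 ^ 0xFF) + 1) & 0xFF   # 246, the 8-bit pattern of -10

def less_than_10(i):
    """True when i + (-10) is negative; valid for 0 <= i <= 127."""
    result = (i + MINUS_TEN) & 0xFF
    return (result & 0x80) != 0        # sign bit set means negative

def count_up():
    """The while loop from the text: i = 0; while i < 10: i = i + 1."""
    i = 0
    while less_than_10(i):
        i = (i + 1) & 0xFF
    return i
```

The loop leaves i at 10, as the high level version would.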
Subroutine call and
See the following example of a code snippet.
c = sum(4,5)
Calling a subroutine is done by the "jsr" instruction. "jsr ads"
implicitly pushes the return address (the next instruction after jsr)
onto the stack (pointed to by the stack pointer, sp) before transferring to the
destination address. There are two questions:
1) How are the actual parameters (4 and 5) bound to the formal
parameters (a and b)?
2) How does the subroutine return the value (a+b) back to the caller?
There are many ways to do parameter passing. The simplest way is
to declare the formal parameters as global variables. The
caller just sets the values and transfers control. The
subroutine gets the values from those global variables. However, this
method precludes recursive subroutines (because it uses
global variables and therefore has side effects). We opt for the
alternative of using the stack to pass parameters. The simulator
implements a Big Endian representation (Hi byte first). When
pushing a 16-bit value onto the stack, the Lo byte is pushed first, then
the Hi byte (so that the number in the data segment and stack segment
will be ordered in the same way). The
return address is 16-bit and is saved on the stack when doing a "call".
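The push order for a 16-bit value can be sketched in Python, with a list standing in for the stack (the end of the list is the top). The function names are illustrative and the return address 0x0123 used in the check is an arbitrary example.

```python
# Push a 16-bit value as two bytes: Lo byte first, then Hi byte.
def push16(stack, value):
    stack.append(value & 0xFF)          # Lo byte is pushed first
    stack.append((value >> 8) & 0xFF)   # Hi byte ends up on top

def pop16(stack):
    hi = stack.pop()                    # Hi byte comes off first
    lo = stack.pop()
    return (hi << 8) | lo
```

Because the Hi byte is pushed last, it is the first byte to be popped.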
sum(4,5) is translated to
the picture of stack is:
retads <-- sp
The subroutine must "unstack" the stack to get its actual parameters.
Let us declare four variables as local in the subroutine and store
the values from the stack there.
Now we must do the last piece, returning a value back to the
caller. We will also use stack to pass a value back. We
must arrange the value in the stack so that at return by the
instruction "rts", the return address must be properly placed at the
top of stack. The return value will be "under" this return
address. See the picture of stack before return:
retads <- sp
This is done by pushing the return value, THEN pushing the return
address back and doing "rts".
The caller simply pops the return value from stack and uses it.
;; do a+b
psh ;; push a+b
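The whole call sequence for c = sum(4,5) can be traced in Python, again with a list standing in for the stack. This is a sketch of the convention described above, not the lost assembly listing. The push order of the parameters (a first, then b) and the return address value are assumptions made for illustration.

```python
# Trace of the stack-based calling convention for c = sum(a, b).
def call_sum(a, b, retads=0x0123):
    stack = []
    # caller: push the actual parameters (order is an assumption)
    stack.append(a)
    stack.append(b)
    # jsr: push the 16-bit return address, Lo byte then Hi byte
    stack.append(retads & 0xFF)
    stack.append((retads >> 8) & 0xFF)
    # subroutine preamble: unstack into four local variables
    ret_hi = stack.pop()
    ret_lo = stack.pop()
    local_b = stack.pop()
    local_a = stack.pop()
    # body: do a+b
    result = (local_a + local_b) & 0xFF
    # postamble: push the return value, THEN the return address
    stack.append(result)
    stack.append(ret_lo)
    stack.append(ret_hi)
    # rts: pop the return address and transfer there
    hi = stack.pop()
    lo = stack.pop()
    returned_to = (hi << 8) | lo
    # caller: pop the return value
    c = stack.pop()
    return c, returned_to, stack
```

After the call the stack is empty again, the return value is in c, and control is back at the saved return address.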
We still leave one topic unresolved: if this subroutine is recursive,
we must "save" the values of the local variables (a and b) before we call
recursively. How to do that? It is a bit complicated and beyond
an introductory class. I will leave curious students to work
that out by themselves.
From the beginning we have used only scalar values. Accessing a
scalar is simply lda/sta to a variable. To access a data
structure we need to use a "pointer". The instruction lda/sta is
used in "indirect" addressing mode to access a value pointed to by a
pointer.
lda (ads)    ac <- M[M[ads]]
sta (ads)    M[M[ads]] <- ac
To access an array element, we calculate the "effective" address of the
element by loading the base address of the array and adding the index
(assuming the size of an element is one; if the size is other than one, we
must also calculate the right "offset"). Then the value of that
element can be accessed by "indirect" addressing.
let a be an array a, the base address &a is at 40.
c = a is
ea: 0 ;; use a temp var to store an effective address
sta ea ;; this is the effective address
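The effective address calculation can be modelled in Python. The sketch keeps the effective address in a memory cell, as the temp var "ea" does above; the address chosen for that cell (3), the array contents and the function name are assumptions for illustration. Only the base address 40 comes from the text.

```python
# c = a[i]: compute ea = &a + i with 8-bit arithmetic, then load (ea).
EA = 3   # hypothetical address of the temp var "ea"

def load_element(mem, base, i):
    mem[EA] = (base + i) & 0xFF   # 8-bit alu: the sum wraps at 256
    return mem[mem[EA]]           # indirect load, ac <- M[M[ea]]

mem = [0] * 256
mem[40:45] = [11, 22, 33, 44, 55]   # array a with base address 40
```

The mask on the sum is what limits indirect addressing to the range 0..255.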
Please note that, as our processor (cp8299x) is an 8-bit machine, its
alu is 8-bit, hence it can perform only 8-bit arithmetic in calculating
the effective address. This limits the range of indirect
addressing to 0..255. In contrast, direct addressing is
16-bit. This characteristic is due to our choice of modification
to the cp8299. Other designs where the addressing is flat 0..65535
are possible (but the question is how you are going to calculate the
Another example comes from a basic data structure, linked list.
If we assume a "cell" consists of 2 bytes, the first byte is the
information, the second byte is the address of the next cell (with only
0..255, the pointer is only 8-bit). We can access a cell using
let m be a list (3 4 5) of the following structure: a cell is
represented as 2 bytes: [ads:info, ads:next]
we can represent (3 4 5 ) as:
[10:3, 11:20] [20:4, 21:24] [24:5, 25:0]
A null pointer is 0. Accessing m.info is similar to m,
and m.next is m.
p = m.next
m.info = 6
add #1 ;;
sta (m) ;;
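The cell layout can be tried out in Python using the memory picture given above for the list (3 4 5). This is a sketch; the function names are illustrative, and the info byte is taken to be at the cell address with the next pointer one byte after it, as in [ads:info, ads:next].

```python
# Linked list cells of 2 bytes: info at p, next pointer at p + 1.
def info(mem, p):
    return mem[p]

def next_cell(mem, p):
    return mem[(p + 1) & 0xFF]

def to_list(mem, p):
    """Follow the next pointers until the null pointer 0."""
    out = []
    while p != 0:
        out.append(info(mem, p))
        p = next_cell(mem, p)
    return out

mem = [0] * 256
# [10:3, 11:20] [20:4, 21:24] [24:5, 25:0]
for ads, val in [(10, 3), (11, 20), (20, 4), (21, 24), (24, 5), (25, 0)]:
    mem[ads] = val
```

With m = 10, p = m.next is 20, and storing 6 into m.info changes the list to (6 4 5).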
Example of assembly
Armed with these basics, we will now show you some assembly
language programs. "jsr 1001" is a pseudo code to stop the
1 storing a value into the whole array
let a be an array 10
while i < 10
a[i] = 8
i = i + 1
a: 0 0 0 0 0 0 0 0 0 0
jpc body ;; test i
sta ea ;;
sta (ea) ;; a[i] = 8
i ;; i = i + 1
The base address of array a is stored in "aa". If "jmp begin" is
relative, its size is two bytes and "&a" will be the address 5.
The effective address is in a temp var "ea".
Ex 2 searching
a linked list.
Let x be a list of numbers; this list is represented by a linked list of
the structure [ads:info, ads:next] of size two bytes, with a null
pointer 0. Let d be an input number; we want to check if d is in
the list x.
search( x, d ) will check if d is in the list x; it returns 1 if found,
search( x, d )
flag = 0
while x != nil
if x.info == d
flag = 1
test data, x list is (7, 8, 9). we wrote search() as a subroutine
with two parameters.
ax: 10 ;; &x
x: 7 12 8 14 9 0
;; list x ( [10:7, 11:12] [12:8, 13:14] [14:9, 15:0] )
sta c ;; c = search(x, 8)
;; search is written as a subroutine with two parameters
sta xp ;; x is pointer
sta md ;; do (-d) for comparison
jpz ret ;; test x == nil
sta ea ;; x is already in ac, do &x
lda (ea) ;; get x.info
add md ;; test x.info == d
sta ea ;; &(x.next)
lda (ea) ;; x.next
sta xp ;; x = x.next
lda flag ;; return flag
rts ;; return
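The logic of search() can be checked in Python against the same test data. This is a sketch of the algorithm only, not the assembly listing. It compares by adding (-d) and testing for zero, as the comments above suggest, and it keeps walking to the end of the list after a match; whether the original code stops early is not known.

```python
# search(x, d): walk the list, test x.info == d by x.info + (-d) == 0.
def search(mem, x, d):
    md = ((d ^ 0xFF) + 1) & 0xFF        # md = -d, for the comparison
    flag = 0
    while x != 0:                       # test x == nil
        if (mem[x] + md) & 0xFF == 0:   # x.info + (-d) is zero when equal
            flag = 1
        x = mem[(x + 1) & 0xFF]         # x = x.next
    return flag

mem = [0] * 256
# list x ( [10:7, 11:12] [12:8, 13:14] [14:9, 15:0] )
mem[10:16] = [7, 12, 8, 14, 9, 0]
```

Calling search(mem, 10, 8) corresponds to c = search(x, 8) in the main program.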
We wrote search() as a subroutine. In main, search is called with
search(x,8) and the result is stored in c (where we can see the result
by inspecting the memory content). Search has the preamble: pop
retads, pop d, pop x, and the body of function, then the postamble to
return result: push return value, push retads and rts. The body
uses the access function of linked list as described in the previous
"low level" assembly
In a real chip, resources are always limited, so some
operations are not available directly as instructions; for example,
cp8299x does not have "subtract". To perform such an
operation, a sequence of existing instructions can be used. This is
what I mean by "low level" assembly. You have seen previously how
we perform "subtract" using "xor" and "add":
a = b - c
sta mc ;; mc =
To give you some idea of what a programmer must face when writing programs
at the machine level, I will show how an 8-bit processor performs 16-bit
addition. We use two bytes to store a 16-bit number and
access it byte by byte. We add the lo bytes together, then use
the carry flag when adding the hi bytes together.
Let bd, cd be the two input numbers and ad be the result; we use the "big
endian" convention in storing these numbers in the memory.
ad: 0 0
bd: 0 4
cd: 0 5
;; access the lo byte
lda (ea) ;; get
lda (ea) ;; get
add tmp ;;
;; carry = 0
;; carry = 1
sta (ea) ;;
;; access the hi byte
lda (ea) ;; get
;; access the hi byte
lda (ea) ;; get
add carry ;;
sta (ea) ;;
It is a bit tedious but not difficult. Please note that we use
"pointers" to the numbers (ap, bp, cp) to access them (because a number
becomes a data structure, an array of bytes). The variable "carry" is used to
store the carry bit that will be added to the next digit (byte).
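The byte by byte addition can be checked in Python. The sketch below stores each number as a [hi, lo] pair, following the big endian convention above, and carries from the lo byte into the hi byte; the function name is illustrative.

```python
# 16-bit addition using only 8-bit adds and a carry bit.
def add16_bytes(b, c):
    """b and c are [hi, lo] byte pairs; returns the [hi, lo] sum."""
    lo_sum = b[1] + c[1]                 # add the lo bytes
    carry = 1 if lo_sum > 0xFF else 0    # carry out of the lo byte
    lo = lo_sum & 0xFF
    hi = (b[0] + c[0] + carry) & 0xFF    # add the hi bytes plus carry
    return [hi, lo]
```

With the data above (bd = 0 4, cd = 0 5) the result is 0 9. A case such as 0 255 plus 0 1 shows the carry moving into the hi byte.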
Obviously, programmers are not going to write out all the details every time
they just want to add two 16-bit numbers. This code belongs in
a library. In assembly language vocabulary, the old-timers
called this kind of thing "macro" programming. Now you have probably started
to see why, in some circumstances, assembly language programming is
Don't worry that you do not appreciate what is going on at this
level. The "low level" topic will not be included in the
Please try your hands on these examples. They will help to hone
your skill a bit so that you will be comfortable doing the assembly
level programming for your forthcoming laboratory sessions. (and in the
1 Write a program to find maximum of an array of 8-bit numbers.
2 Write a program to add sum(a,b) where a,b are 8-bit numbers.
3 Write a program to reverse order of an array of 8-bit numbers.
4 Write fibonacci function that you have seen in the lecture, try
it out on the simulator.
5 How fast can you clear a block of memory? How many
instructions per byte of work?
6 Write a subroutine to "multiply" x,y. Remember that we
have only 8-bit alu, so the result must not exceed 127.
7 Write a subroutine to "insert" one element into the middle of
Remember to write out pseudo-code (in some kind of high level language)
first. Then allocate space for the variables. Only then
can you start to write the assembly language. This is, I have found
from my experience of teaching this subject, the easiest way to get it
right the first time.
Well, after reading these questions you should be able to invent
questions of your own along these lines.
Good luck and enjoy programming!
< I can write more examples, please let me know what
example you like to see>
assembler and simulator for cp8299x from
Aj. Prabhas stand-alone version run on XP and DOS
How to use assembler How to use simulator cp8299x
assembler and simulator from Aj. Thit run
NOTE: the two versions of tools (Aj. Prabhas and Aj. Thit)
may differ in assembly syntax
last update 4 January 2006