2110254  Digital Design and Verification 2005
Course description
Processor design at instruction set level and register transfer level,
hardware description language (HDL); functional verification of HDL
models; microprocessors; control unit; memory unit; adders; I/O device
interfaces.
Announcement
Instead of holding the examination for this class at the end of the
semester, we have decided (for the students' benefit) to administer
the test around the beginning of February. The test will be on-line
using JLAB.  I will integrate the assembler and simulator into a
special version of JLAB, and I hope to release this version for your
use as soon as it is done.  Meanwhile, practice your assembly
programming skills using the off-line version posted here.  Bug
reports are most welcome.
31 December 2005     The "call" section is updated.  This is a
correction for the 16-bit PC.  It is different from the lecture in class.
4 January 2006     Tools available: assembler and simulator that run
under JLAB, and a stand-alone version
Part 2   
Assembly level programming
lecturer   Prabhas Chongstitvatana
office     room 18-13, floor 18, Engineering building 4.
tel        02-2186982
contact    prabhas at chula dot ac dot th
Aim:  Learn programming at assembly level
Method:  Use the instruction set of the chip for the laboratory
sessions (CP8299) to do simple assembly level programming.
previous lecture (last year 2004)
cp8299x
The cp8299 is modified for use in the assembly language programming
class.  The modification is minimal, as the intention of the
cp8299 design is to make it as easy as possible to realise. To enable
access to data structures, "indirect addressing" is added to the
lda/sta instructions.  This is the only modification to cp8299.
As cp8299 is an 8-bit processor, indirect addressing has a range of
only 8 bits.  Addresses 0..255 are designated for data, and the
code starts at 256 onward.  The stack starts at 512 and
grows toward high memory.
memory map
  0..255     data segment
  256..511   code segment
  512..      stack segment
Assembler syntax
  .org xxx    specify the address xxx
  sym:        the label of data or code. For data it is used as a
              variable name; for code it is used as the label of a
              control transfer destination (target of jumps or calls).
Addressing mode
  lda sym     direct address      ac <- M[sym]
  lda (sym)   indirect address    ac <- M[M[sym]]
  lda #n      immediate, n is an 8-bit constant    ac <- n
Assumptions about the meaning of programs
1  The stack grows towards high memory.
2  "add" is 2's complement arithmetic (so the range of the result is
-128..127, not 0..255).
3  "jsr label" does the following:
     push the return address (the address of the next instruction after jsr)
     jump to label
   and the return, "rts", pops the stack and transfers control to the
   popped address.
4  The main code starts at 256.
Explanation of instruction set
There are 16 instructions in cp8299.  They are grouped into
four groups:
1  lda/sta               ac <-> M/n
2  add/and/ora/xor       arithmetic/logic   ac <- ac op M/n
3  jmp/jpc/jpz/jsr/rts   control transfer
4  psh/pop/ror/rol/lds   others
lda/sta have 3 addressing modes: direct, indirect and immediate.
Direct and indirect require a 16-bit argument (the address); immediate
takes an 8-bit argument. The arithmetic/logic instructions have 2
addressing modes: direct and immediate (8-bit). Control transfer
instructions take a 16-bit argument, which is an absolute address (I
will ignore the relative mode).
Each instruction has this format:
  op:4 mode:4 [argument]
The argument is optional; it is either 8-bit (immediate) or 16-bit
(address).
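The format above implies that an instruction occupies one header byte plus 0, 1 or 2 argument bytes. The following Python sketch illustrates this. The mode names and the choice of putting op in the high nibble are assumptions made for illustration only; the real cp8299x opcode and mode numbers are not given here.

```python
# Sketch only: mode names and nibble order are assumptions, not the real encoding.
ARG_BYTES = {
    "none": 0,       # e.g. rts, psh, pop: no argument
    "immediate": 1,  # 8-bit constant
    "direct": 2,     # 16-bit address
    "indirect": 2,   # 16-bit address of the pointer
}

def pack_header(op: int, mode: int) -> int:
    """Pack a 4-bit op and a 4-bit mode into one byte (op in the high nibble, assumed)."""
    if not (0 <= op <= 15 and 0 <= mode <= 15):
        raise ValueError("op and mode must each fit in 4 bits")
    return (op << 4) | mode

def instruction_size(mode_name: str) -> int:
    """Total size in bytes: one header byte plus the optional argument."""
    return 1 + ARG_BYTES[mode_name]
```

For example, an absolute "jmp" (16-bit argument) is 3 bytes long under this reading, which matters when you work out by hand the address of the first variable placed after "jmp begin".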
Meaning of each instruction
  lda ads      ac <- M[ads]
  lda (ads)    ac <- M[M[ads]]
  lda #n       ac <- n
  sta ads      M[ads] <- ac
  sta (ads)    M[M[ads]] <- ac
  note: there is no sta #n mode!
  add ads      ac <- ac + M[ads]
  add #n       ac <- ac + n
  similarly for and/ora/xor
  jmp ads      jump to ads, pc = ads
  jpc ads      if carry flag == 1, jump to ads
  jpz ads      if zero flag == 1, jump to ads
  jsr ads      jump to subroutine ads
  rts          return from subroutine
  psh          push ac to stack: sp++, M[sp] <- ac
  pop          pop stack to ac: ac <- M[sp], sp--
  rol          rotate ac left (through carry)
  ror          rotate ac right (through carry)
  lds ads      load stack pointer, sp = ads
  note: we never use lds; in the simulator sp starts at 512
  automatically.
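The psh/pop definitions above can be modelled in a few lines of Python. This is only a sketch of the stated semantics (sp++ then store, load then sp--, sp starting at 512); the class name and the memory size of 1024 bytes are assumptions for illustration.

```python
class StackModel:
    """Minimal model of the cp8299x stack as described in the text."""

    def __init__(self, size=1024, sp=512):
        self.mem = [0] * size  # memory size is an assumption
        self.sp = sp           # the simulator starts sp at 512

    def psh(self, ac):
        # psh: sp++, M[sp] <- ac
        self.sp += 1
        self.mem[self.sp] = ac & 0xFF

    def pop(self):
        # pop: ac <- M[sp], sp--
        ac = self.mem[self.sp]
        self.sp -= 1
        return ac
```

Note that with this definition the first pushed byte lands at address 513, because sp is incremented before the store.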
Assembly language programming
We will approach programming from the view of translating a high level
language to assembly language.
- assignment
- control transfer
- call
Assignment
  a = b + c
where a, b, c are variables.
In cp8299 the only working register is ac, so all variables are
declared in memory.  We declare the locations of the variables by:
  .org 0
  a: 0
  b: 10
  c: 20
These lines declare variables a, b, c at the consecutive locations 0, 1,
2 with the values 0, 10 and 20 (the simulator will
instantiate these values in memory).
The code for a = b + c is:
  lda b
  add c
  sta a
The whole program is:
.org 0
  jmp begin
a: 0
b: 10
c: 20
.org 100H
begin:
  lda b
  add c
  sta a
.end
At reset, the cpu's pc starts at 0.  "jmp begin" transfers control
to the code segment, which starts at 100H (256), since the data segment
occupies addresses 0..0FFH (0..255).  ".end" denotes the end
of the assembly program; the simulator knows where the program
ends and will stop execution there.
Another example:
  a = b + c - d
As there is no subtract instruction, we convert d to (-d)
and use "add".  To form (-d) in 2's complement we invert every bit
(xor with -1) and add one.
.org 0
  jmp begin
a: 0
b: 10
c: 20
d: 5
tmp: 0
.org 100H
begin:
  lda b
  add c
  sta tmp    ;; tmp = b + c
  lda d
  xor #-1    ;; invert all bits of d
  add #1     ;; ac = (-d), d itself is left unchanged
  add tmp    ;; ac = (b + c) + (-d)
  sta a
.end
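The "xor #-1, add #1" sequence can be checked with ordinary 8-bit arithmetic. The Python sketch below models ac as an integer masked to 8 bits; the helper names are hypothetical and the values are the ones from the example (b = 10, c = 20, d = 5).

```python
def neg8(d):
    """2's complement negation in 8 bits: invert every bit, then add one."""
    return ((d ^ 0xFF) + 1) & 0xFF

def add8(x, y):
    """8-bit addition; the result wraps around at 256."""
    return (x + y) & 0xFF

def to_signed(x):
    """Read an 8-bit pattern as a 2's complement number in -128..127."""
    return x - 256 if x >= 128 else x

minus_d = neg8(5)                 # bit pattern of -5
a = add8(add8(10, 20), minus_d)   # a = b + c - d
```

The bit pattern of -5 is 251 (0FBH), and adding it to 30 wraps around to 25, which is b + c - d.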
Control transfer
The if-then-else construct can be translated as follows:
  if ex1 then ex2 else ex3

  ex1
  jump-if-false else
  ex2
  jump exit
else:
  ex3
exit:
A concrete example:
  if a == 0 then b = 1 else b = 2
.org 0
  jmp begin
a: 0
b: 0
.org 100H
begin:
  lda a
;; we don't have jump-if-not-zero
;; so we jump to "then" instead
  jpz then
else:
  lda #2
  sta b
  jmp exit
then:
  lda #1
  sta b
exit:
.end
The instruction set has only the "positive" sense of conditional jump
(jump-if-carry, jump-if-zero). When the "negative" sense is needed, we
can "swap" the destinations.
The "while" construct is translated as follows:
  while cond
    body

loop:
  cond
  jump-if-false exit
  body
  jmp loop
exit:
A concrete example:
  i = 0
  while i < 10
    i = i + 1
.org 0
  jmp begin
i: 0
.org 100H
begin:
  lda #0
  sta i
loop:
;;  to test i < 10
;;  we do i + (-10)
;;  and check whether the carry flag is set;
;;  it signifies a negative result
  lda i
  add #-10
  jpc doit
  jmp exit
doit:
  lda i
  add #1
  sta i
  jmp loop
exit:
.end
To test i < 10, we compute i + (-10) and ask whether the result is
negative, using the carry flag to indicate the negative result.  As
there is no jump-if-not-carry, we "swap" the jump destinations.  Other
loops, such as "for", can be done similarly to the while loop.
Subroutine call and parameter passing
See the following example of a code snippet.
  sum(a,b)
    return a+b
  main
    c = sum(4,5)
Calling a subroutine is done by the "jsr" instruction.  "jsr ads"
implicitly pushes the return address (the address of the next
instruction after jsr) onto the stack (pointed to by the stack pointer,
sp) before transferring to the destination address.  There are two
questions:
1) How are the actual parameters (4 and 5) bound to the formal
parameters (a and b)?
2) How does the subroutine return the value (a+b) to the caller?
There are many ways to do parameter passing.  The simplest way is
to declare the formal parameters as global variables.  The
caller just sets their values and transfers control; the
subroutine reads the values from those global variables. However, this
method precludes recursive subroutines (a recursive call would
overwrite the global variables still in use by the pending call).  We
opt for the alternative of using the stack to pass parameters.  The
simulator implements a Big Endian representation (Hi byte first).  When
pushing a 16-bit value onto the stack, the Lo byte is pushed first, then
the Hi byte (so that the number in the data segment and in the stack
segment is ordered in the same way).  The
return address is 16-bit and is saved on the stack when doing a "call".
sum(4,5) is translated to
  lda #4
  psh
  lda #5
  psh
  jsr sum
The picture of the stack is:
hi
  retads  <-- sp
  retads2
  5
  4
lo
The subroutine must "unstack" the stack to get its actual parameters.
Let us declare four variables (retads, retads2, a, b) for the
subroutine and store the values from the stack there.
 
.org 0
  jmp main
retads: 0
retads2: 0
a: 0
b: 0
c: 0
.org 100H
main:
  lda #4
  psh
  lda #5
  psh
  jsr sum
  ...
Now we must do the last piece: returning a value to the
caller.  We will also use the stack to pass the value back.  We
must arrange the stack so that, when the instruction "rts" executes,
the return address is properly placed at the
top of the stack.  The return value will be "under" this return
address.  See the picture of the stack before return:
hi
  retads  <-- sp
  retads2
  a+b
lo
This is done by pushing the return value, THEN pushing the return
address back, and doing "rts".
;; subroutine
sum:
  pop
  sta retads
  pop 
  sta retads2
  pop
  sta b
  pop
  sta a
;;  do a+b
  lda a
  add b
  psh   ;; push a+b
  lda retads2
  psh
  lda retads
  psh
  rts
The caller simply pops the return value from stack and uses it.
main:
  ...
  lda #4
  psh
  lda #5
  psh
  jsr sum
  pop
  sta c
.end
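The whole calling convention (push the parameters, jsr, unstack in the subroutine, push the result under the return address, rts, pop the result) can be traced with a small Python model. This is a sketch of the protocol described above, not of the simulator itself; the class and function names and the return address 0109H are hypothetical.

```python
class Machine:
    def __init__(self):
        self.mem = [0] * 1024   # memory size is an assumption
        self.sp = 512           # stack starts at 512 and grows upward

    def psh(self, v):
        self.sp += 1
        self.mem[self.sp] = v & 0xFF

    def pop(self):
        v = self.mem[self.sp]
        self.sp -= 1
        return v

def jsr(m, ret):
    """Push the 16-bit return address: Lo byte first, then Hi byte."""
    m.psh(ret & 0xFF)
    m.psh((ret >> 8) & 0xFF)

def rts(m):
    """Pop the return address (Hi byte is on top) and return it."""
    hi = m.pop()
    lo = m.pop()
    return (hi << 8) | lo

def sum_sub(m):
    retads = m.pop()      # Hi byte of return address
    retads2 = m.pop()     # Lo byte of return address
    b = m.pop()
    a = m.pop()
    m.psh(a + b)          # return value goes under the return address
    m.psh(retads2)
    m.psh(retads)
    return rts(m)

def call_sum(x, y, ret=0x0109):   # ret is a made-up return address
    m = Machine()
    m.psh(x)
    m.psh(y)
    jsr(m, ret)
    back = sum_sub(m)
    c = m.pop()           # caller pops the result
    return c, back, m.sp
```

After the call the stack pointer is back at 512, which is a useful check that the pushes and pops in the subroutine are balanced.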
We still leave one topic unresolved: if this subroutine is recursive,
we must "save" the values of the local variables (a and b) before we
call recursively.  How to do that?  It is a bit complicated and beyond
this introductory class.  I will leave curious students to work
that out by themselves.
Accessing data structures
Up to now we have used only scalar values.  Accessing a
scalar is simply lda/sta to a variable.  To access a data
structure we need to use a "pointer".  The lda/sta instructions are
used in "indirect" addressing mode to access the value pointed to by a
pointer.
  lda (ads)    ac <- M[M[ads]]
  sta (ads)    M[M[ads]] <- ac
To access an array element, we calculate the "effective" address of the
element by loading the base address of the array and adding the index
(assuming the size of an element is one; if the size is other than one,
we must also calculate the right "offset").  Then the value of that
element can be accessed by "indirect" addressing.
Let a be an array a[10] whose base address &a[0] is 40.
c = a[2] is translated to:
.org 40
a: ...     ;; a[0]..a[9] occupy 40..49
ea: 0      ;; a temp var to store an effective address
c: 0
  .org 100H
  ...
  lda #40    ;; base address &a[0]
  add #2     ;; index
  sta ea     ;; this is the effective address
  lda (ea)
  sta c
  ...
.end
Please note that, as our processor (cp8299x) is an 8-bit machine, its
alu is 8-bit, hence it can perform only 8-bit arithmetic in calculating
the effective address.  This limits the range of indirect
addressing to 0..255.  In contrast, direct addressing is
16-bit.  This characteristic is due to our choice of modification
to the cp8299.  Other designs where the addressing is flat 0..65535
are possible (but the question is how you are going to calculate the
effective address?).
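The 8-bit limit on the effective address can be made concrete with a short Python sketch. The function name is hypothetical, and the multiplication by the element size is only for illustration (cp8299x has no multiply instruction, so a real program would add the offset step by step).

```python
def effective_address(base, index, elem_size=1):
    """Effective address as computed by an 8-bit alu: the sum wraps at 256."""
    return (base + index * elem_size) & 0xFF

# The example from the text: &a[0] = 40, so &a[2] = 42.
ea = effective_address(40, 2)

# An array placed near the top of the data segment wraps around:
wrapped = effective_address(250, 10)
```

In the second case the true address 260 does not fit in 8 bits, so the computed pointer is 4. This is why all data reached through pointers must live in 0..255.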
Another example comes from a basic data structure, the linked list.
Assume a "cell" consists of 2 bytes: the first byte is the
information and the second byte is the address of the next cell (with
only 0..255 to address, the pointer is only 8 bits).  We can access a
cell using "indirect" addressing.
Let m be a list (3 4 5) with the following structure:  a cell is
represented as 2 bytes, [ads:info, ads:next].
We can represent (3 4 5) as:
  [10:3, 11:20] [20:4, 21:24] [24:5, 25:0]
A null pointer is 0.  Accessing m.info is similar to m[0],
and m.next is similar to m[1].
  m = 10
  p = m.next
  m.info = 6

  ...
  lda #10
  sta m
  lda m
  add #1     ;; &(m.next)
  sta ea
  lda (ea)
  sta p      ;; p = m.next
  lda #6
  sta (m)    ;; m.info = 6
  ...
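The cell layout can be tried out in Python by treating memory as a flat list of bytes. This sketch uses the same addresses as the list (3 4 5) above; the helper names are hypothetical.

```python
def make_memory():
    """256-byte data segment holding the list (3 4 5) at the addresses in the text."""
    mem = [0] * 256
    for ads, val in {10: 3, 11: 20, 20: 4, 21: 24, 24: 5, 25: 0}.items():
        mem[ads] = val
    return mem

def info(mem, cell):
    return mem[cell]                 # m.info is like m[0]

def next_cell(mem, cell):
    return mem[(cell + 1) & 0xFF]    # m.next is like m[1]

def to_list(mem, head):
    """Follow the next pointers until the null pointer 0."""
    out = []
    cell = head
    while cell != 0:
        out.append(info(mem, cell))
        cell = next_cell(mem, cell)
    return out
```

With m = 10, next_cell gives p = 20, and storing 6 at mem[10] corresponds to m.info = 6.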
Examples of assembly programs
Armed with these basics, we now proceed to show some assembly
language programs.  "jsr 1001" is a pseudo-instruction used to stop the
simulator.
Ex 1    storing a value into the whole array
Let a be an array of 10 elements.
  i = 0
  while i < 10
    a[i] = 8
    i = i + 1
.org 0
  jmp begin
i: 0
aa: &a
ea: 0
a: 0 0 0 0 0 0 0 0 0 0
.org 100H
begin:
  lda #0
  sta i
loop:
  lda i
  add #-10
  jpc body   ;; test i < 10
  jmp exit
body:
  lda aa
  add i
  sta ea     ;; &a[i]
  lda #8
  sta (ea)   ;; a[i] = 8
  lda i
  add #1
  sta i      ;; i = i + 1
  jmp loop
exit:
  jsr 1001
  .end
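The base address of array a is stored in "aa". If "jmp begin" is
assembled in relative mode, its size is two bytes and "&a" is
address 5; in the 3-byte absolute form (one op/mode byte plus a 16-bit
address) "&a" is address 6.
The effective address is kept in a temp var "ea".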
Ex 2  searching a linked list
Let x be a list of numbers, represented as a linked list of cells
[ads:info, ads:next] of size two bytes, with a null
pointer 0.  Let d be an input number; we want to check whether d is in
the list x.
search(x, d) checks whether d is in the list x; it returns 1 if found,
otherwise 0.
  search(x, d)
    flag = 0
    while x != nil
      if x.info == d
        flag = 1
        break
      else
        x = x.next
    return flag
The test data: the list x is (7, 8, 9).  We write search() as a
subroutine with two parameters.
;; search
.org 0
  jmp main
flag: 0
retads: 0
retads2: 0
ea: 0
ax: 10   ;; &x
d: 0
xp: 0
x: 7 12 8 14 9 0
md: 0
c: 0
;; list x ( [10:7, 11:12] [12:8, 13:14] [14:9, 15:0] )
.org 100H
main:
  lda ax
  psh 
  lda #8
  psh
  jsr search
  pop
  sta c       ;; c = search(x, 8)
  jmp exit
;; search is written as a subroutine with two parameters
search:
  pop
  sta retads
  pop
  sta retads2
  pop 
  sta d
  pop
  sta xp     ;;  x is pointer
  lda d
  xor #-1
  add #1
  sta md  ;; do (-d) for comparison
  lda #0
  sta flag
loop:
  lda xp
  jpz ret   ;; test x == nil
  sta ea    ;; x is already in ac, do &x
  lda (ea)  ;; get x.info
  add md  ;; test x.info == d
  jpz then
else:
  lda xp
  add #1
  sta ea    ;; &(x.next)
  lda (ea)  ;; x.next
  sta xp    ;; x = x.next
  jmp loop
then:
  lda #1
  sta flag
ret:
  lda flag  ;; return flag
  psh
  lda retads2
  psh
  lda retads 
  psh
  rts        ;; return
exit:
  jsr 1001
  .end
We wrote search() as a subroutine.  In main, it is called as
search(x,8) and the result is stored in c (where we can see the result
by inspecting the memory content).  Search has a preamble (pop
retads, pop d, pop x), then the body of the function, then a postamble
to return the result (push the return value, push retads, and rts).
The body uses the linked-list access functions described in the
previous section.
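As a cross-check of the expected result, here is a Python sketch of the same search over the test data, with the list x stored at addresses 10..15 as in the listing. It mirrors the assembly's comparison trick of adding (-d) and testing for zero; the function names are hypothetical.

```python
def neg8(d):
    """2's complement negation in 8 bits (xor #-1, add #1)."""
    return ((d ^ 0xFF) + 1) & 0xFF

def search(mem, x, d):
    """Return 1 if d is in the linked list starting at address x, else 0."""
    md = neg8(d)                       # md = (-d), computed once
    flag = 0
    while x != 0:                      # x == 0 is the null pointer
        if (mem[x] + md) & 0xFF == 0:  # x.info + (-d) == 0 means x.info == d
            flag = 1
            break
        x = mem[x + 1]                 # x = x.next
    return flag

mem = [0] * 256
mem[10:16] = [7, 12, 8, 14, 9, 0]      # [10:7, 11:12] [12:8, 13:14] [14:9, 15:0]
```

With this data, search(mem, 10, 8) is 1, so after running the assembly program the variable c should hold 1.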
"low level" assembly
programming
In a real chip resources are always limited, so some
operations are not available directly as instructions; for example,
cp8299x does not have "subtract".  To perform such an
operation, a sequence of existing instructions can be used.  This is
what I mean by "low level" assembly.  You have seen previously how
we perform "subtract" using "xor" and "add":
  a = b - c
  lda c
  xor #-1
  add #1
  sta mc     ;; mc = (-c)
  lda b
  add mc
  sta a
To give you some idea of what a programmer must face when writing
programs at the machine level, I will show how an 8-bit processor
performs 16-bit addition.  We use two bytes to store a 16-bit number
and access it byte by byte.  We add the two low bytes, then use
the carry flag when adding the two high bytes.
Let bd, cd be the two input numbers and let ad be the result. We use
the "big endian" convention in storing these numbers in memory.
 
.org 0
  jmp begin
ad: 0 0
bd: 0 4
cd: 0 5
ap: &ad
bp: &bd
cp: &cd
ea: 0
tmp: 0
carry: 0
  .org 100H
begin:
  lda bp
  add #1      ;; access the lo byte
  sta ea
  lda (ea)    ;; get lo(bd)
  sta tmp
  lda cp
  add #1
  sta ea
  lda (ea)    ;; get lo(cd)
  add tmp     ;; lo(cd)+lo(bd)
  sta tmp
  jpc docarry
  lda #0      ;; carry = 0
  sta carry
  jmp addnext
docarry:
  lda #1      ;; carry = 1
  sta carry
addnext:
  lda ap
  add #1
  sta ea
  lda tmp
  sta (ea)    ;; lo(cd+bd) -> lo(ad)
  lda bp
  sta ea      ;; access the hi byte
  lda (ea)    ;; get hi(bd)
  sta tmp
  lda cp
  sta ea      ;; access the hi byte
  lda (ea)    ;; get hi(cd)
  add tmp
  add carry   ;; hi(cd)+hi(bd)+carry
  sta tmp
  lda ap
  sta ea
  lda tmp
  sta (ea)    ;; hi(cd+bd) -> hi(ad)
exit:
  jsr 1001
  .end
It is a bit tedious but not difficult.  Please note that we use
"pointers" to the numbers (ap, bp, cp) to access them (because a number
becomes a data structure, an array of bytes).  The variable "carry"
stores the carry bit that will be added to the next digit (byte).
Obviously, programmers are not going to write out all these details
every time they just want to add two 16-bit numbers.  This code belongs
in a library.  In assembly language vocabulary, old timers
call this kind of thing "macro" programming.  Now you have probably
started to see why, in some circumstances, assembly language
programming is necessary.
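The byte-by-byte addition can be written compactly in Python to check the result of the assembly program. This is a sketch with a hypothetical function name; numbers are given as [hi, lo] pairs, following the big endian convention used above.

```python
def add16_bytes(b, c):
    """Add two 16-bit numbers stored as [hi, lo] bytes, one byte at a time."""
    b_hi, b_lo = b
    c_hi, c_lo = c
    lo_sum = b_lo + c_lo
    carry = 1 if lo_sum > 0xFF else 0   # carry out of the low byte
    hi_sum = b_hi + c_hi + carry        # any carry out of the high byte is dropped
    return [hi_sum & 0xFF, lo_sum & 0xFF]

ad = add16_bytes([0, 4], [0, 5])        # bd = 4, cd = 5 as in the listing
```

For the data in the listing the result is [0, 9]. Trying [0, 200] plus [0, 100] gives [1, 44], which shows the carry moving into the high byte.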
Don't worry if you do not appreciate what is going on at this
level.  The "low level" topic will not be included in the
assessment.
Exercises
Please try your hand at these exercises.  They will help hone
your skills so that you will be comfortable doing assembly
level programming in your forthcoming laboratory sessions (and in the
on-line exam!).
1  Write a program to find the maximum of an array of 8-bit numbers.
2  Write a program to compute sum(a,b) where a, b are 8-bit numbers.
3  Write a program to reverse the order of an array of 8-bit numbers.
4  Write the Fibonacci function that you have seen in the lecture and
try it out on the simulator.
5  How fast can you clear a block of memory?  How many
instructions per byte of work?
6  Write a subroutine to "multiply" x, y.  Remember that we
have only an 8-bit alu, so the result must not exceed 127.
7  Write a subroutine to "insert" one element into the middle of
an array.
Remember to write out pseudo-code (in some kind of high level language)
first.  Then allocate space for the variables.  Only then should
you start to write the assembly language.  This is, I have found
from my experience of teaching this subject, the easiest way to get it
right the first time.
Well, after reading these questions you should be able to invent
questions of your own along these lines.
Good luck and enjoy programming!
< I can write more examples; please let me know what
examples you would like to see >
Tools
assembler and simulator for cp8299x from Aj. Prabhas: stand-alone
version, runs on XP and DOS
How to use assembler    How to use simulator
cp8299x assembler and simulator from Aj. Thit: runs under JLAB
NOTE:  the two versions of the tools (Aj. Prabhas and Aj. Thit)
may differ in assembly syntax.
Prabhas Chongstitvatana
last update  4 January 2006