Instruction set design is an important part of computer design. An instruction set is the programmer-visible part of a processor: it exposes the processor's resources, such as functional units, registers, and flags, together with the operations that can manipulate those resources.
An instruction set abstracts away the technology-dependent parts of a processor, such as the frequency of the master clock and implementation details like the number of pipeline stages and the size of the cache memory. An instruction set also defines the architecture of a processor; that is, an ISA defines the function of a processor.
In this chapter we discuss instruction set design issues. An introduction to assembly language is illustrated using the Motorola 6800. The IBM System/360 instruction set is then studied as an example of one of the longest-lived ISAs; the S/360 ISA defines a family of computers and holds a unique position in computer history. Another approach to ISA design, the stack-based ISA, is also discussed. Finally, one of the most revolutionary ideas in ISA design of recent decades, the reduced instruction set computer (RISC), is explored.
Design issues
The designer of an instruction set must consider the following issues:
Types of operations
An instruction set consists of several types of operations. Most of these types must be present in a general-purpose processor.
Types of data
A sequence of bits in memory can represent many types of data: addresses, numbers, characters, logical values {True, False}. These data types are interpreted by the instructions; each instruction requires the correct type of data to produce a meaningful result. The choice of data types in each ISA is heavily influenced by the intended workload, such as binary-coded decimal (BCD) for business applications and floating point for scientific computing. The differences in design reflect the differences in intended use.
Example
The Intel Pentium processor has the following data types: byte, word, double word, quadword, integer, unsigned integer, BCD, packed BCD, near pointer, bit field, byte string, floating-point.
The IBM PowerPC processor has the following data types: byte, halfword, word, doubleword, unsigned byte, unsigned halfword, signed halfword, unsigned word, signed word, unsigned doubleword, byte string, single float, double float (IEEE 754).
Endianness (byte ordering, bit ordering)
Because memory is arranged in linear order, the order of the bits and bytes of a data item must be specified to give a consistent interpretation. There are two schools of thought: big-endian and little-endian. A big-endian layout places the data in memory from the most significant to the least significant "digit"; little-endian does the reverse. Neither has an absolute advantage over the other. In the past, the issue of endianness caused compatibility problems when data had to be transferred between two machines of different endianness. Today, many processors implement both orderings and allow software to switch the mode, which reduces the data-translation problem. The ordering is considered at two levels: bit ordering and byte ordering.
Bit ordering: This refers to whether the least significant bit is the leftmost or the rightmost bit. It matters when data is shifted out serially, as in serial communication applications. However, it is not an architectural problem, since most processors have instructions to shift either the leftmost or the rightmost bit out.
Byte ordering: Suppose a 32-bit value is 12345678 (hex). A big-endian machine stores it as 12,34,56,78 (ordered from low address to high address in memory); a little-endian machine stores it as 78,56,34,12.
Different processors adopt different endianness. Little-endian machines include the Intel 80x86, Pentium, and VAX. Big-endian machines include the IBM 370, Motorola 680x0, and most RISC machines. Some machines are bi-endian: the endianness can be set in a processor status bit, as in the PowerPC and MIPS.
Example: To illustrate the difference between the two orderings, consider how the following C structure is mapped in memory.
struct {
int a; //0x1112_1314 word
int pad;
double b; //0x2122_2324_2526_2728 doubleword
char* c; //0x3132_3334 word
char d[7]; //'A','B','C','D','E','F','G' byte array
short e; //0x5152 halfword
int f; //0x6162_6364 word
} s;
Big-endian address mapping (byte address):

address:   00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
contents:  11 12 13 14 .. .. .. .. 21 22 23 24 25 26 27 28

address:   10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
contents:  31 32 33 34  A  B  C  D  E  F  G .. 51 52 .. ..

address:   20 21 22 23
contents:  61 62 63 64

(".." denotes a padding byte.)
Little-endian address mapping (byte address):

address:   00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
contents:  14 13 12 11 .. .. .. .. 28 27 26 25 24 23 22 21

address:   10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
contents:  34 33 32 31  A  B  C  D  E  F  G .. 52 51 .. ..

address:   20 21 22 23
contents:  64 63 62 61

(".." denotes a padding byte. Note that the bytes of the character array d appear in the same order in both mappings.)
Figure 2.1 Example of a C data structure and its endian maps [IBM94]
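The two byte orderings in Figure 2.1 can be checked directly with Python's struct module, which can pack the same 32-bit value both ways (a quick sketch using the value of field a from the structure above):

```python
import struct

value = 0x11121314  # the field 'a' from the structure above

big = struct.pack(">I", value)     # '>' = big-endian, 'I' = unsigned 32-bit
little = struct.pack("<I", value)  # '<' = little-endian

print(big.hex())     # 11121314 : most significant byte at the lowest address
print(little.hex())  # 14131211 : least significant byte at the lowest address
```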
Instruction formats
An instruction operates on its "operands". The number of operands varies from instruction to instruction, but many instructions share the same number of operands. The number of operands determines the "format" of an instruction. Instruction formats can be classified into 3-, 2-, 1-, and 0-operand instructions.
A 3-operand instruction has the form "op A B C", meaning A = B op C.
A 2-operand instruction has the form "op A B", meaning A = A op B.
A 1-operand instruction has the form "op A", meaning it operates on A.
A 0-operand instruction has the form "op", meaning it has no explicit operand; the operands are implicit on the stack.
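The formats express the same computation in different ways. A small sketch (hypothetical mini-interpreters, not any real ISA) makes the distinction concrete:

```python
# 3-operand: add A, B, C  means  A = B + C
regs = {"A": 0, "B": 7, "C": 5}
regs["A"] = regs["B"] + regs["C"]           # destination named explicitly

# 2-operand: add A, B  means  A = A + B (destination doubles as a source)
regs2 = {"A": 7, "B": 5}
regs2["A"] = regs2["A"] + regs2["B"]

# 0-operand: operands are implicit on a stack
stack = [7, 5]                              # push B; push C
stack.append(stack.pop() + stack.pop())     # add: pop two, push the sum

print(regs["A"], regs2["A"], stack[-1])     # all three compute 12
```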
The operands of an instruction can be memory locations, registers, or constant values, and the choice affects the size of the encoding. A register operand needs a much smaller encoding than a memory operand, because the number of registers in a machine is far smaller than the addressable memory space. The combinations of operand types give rise to the different categories of architecture.
Compare the register-register format with the memory-memory format. Assume the operation code is 8 bits, an operand address is 16 bits, a register field is 4 bits, and each operand is 32 bits. Let I be the size of the executed instructions, D the size of the executed data, and M = I + D the total memory traffic (in bits). The table below shows (I, D, M) for each sequence of operations.
Table 2.1 Comparing register-register and memory-memory instruction formats; (I, D, M) - I the size of instructions, D the size of data, M total memory traffic in bits

operations             register-register               memory-memory
A = B + C              ld rB,B ; ld rC,C ;             add B,C,A
                       add rA,rB,rC ; st A,rA          (56, 96, 152)
                       (104, 96, 200)
A = B + C; B = A + C;  add rA,rB,rC ; add rB,rA,rC ;   add B,C,A ; add A,C,B ;
D = D - B              sub rD,rD,rB                    sub D,B,D
(operands already      (60, 0, 60)                     (168, 288, 456)
in registers)
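The numbers in Table 2.1 follow directly from the assumed field sizes; a quick sketch to recompute the first row (assuming 8-bit opcodes, 16-bit addresses, 4-bit register fields, and 32-bit data, as in the text):

```python
OP, ADDR, REG, DATA = 8, 16, 4, 32  # field sizes in bits

# memory-memory: one 3-address instruction, three memory operands
I_mm = OP + 3 * ADDR            # 56 bits of instruction fetched
D_mm = 3 * DATA                 # 96 bits of data moved
print(I_mm, D_mm, I_mm + D_mm)  # 56 96 152

# register-register: ld rB,B ; ld rC,C ; add rA,rB,rC ; st A,rA
I_rr = 2 * (OP + REG + ADDR) + (OP + 3 * REG) + (OP + REG + ADDR)
D_rr = 3 * DATA
print(I_rr, D_rr, I_rr + D_rr)  # 104 96 200
```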
Processor design is strongly tied to instruction set design. In the past there were many diverse computer designs and hence many different instruction sets. As technology progressed, however, analysis of workloads - the actual running programs - guided instruction selection and led to a convergence of instruction set architectures. Most instruction set architectures today belong to one of three classes:
The load-store architecture has a 3-address format and mostly a 32-bit instruction size. It is the most popular among current microprocessor designs, including the HP PA-RISC, IBM RS/6000, Sun SPARC, MIPS R4000, and DEC Alpha. All data to or from memory must be loaded or stored through a register first; an operation takes its operands from registers and stores the result back into a register. This instruction format simplifies decoding and implementation, and because most operations are performed on registers, they are fast. However, since registers are used so extensively, register allocation becomes important: deciding which variables reside in registers affects the performance of this class of machine, and the allocation is done by compilers.
The register-memory architecture has a 2-address format and 16/32/64-bit instruction sizes. An instruction can operate on registers, with one of the operands possibly in memory. This is the "classical" ISA and is used by some of the longest-lived ISAs of today, the IBM S/360 and the Intel x86 family of processors.
The register-plus-memory architecture is the most flexible in its use of operands: an operand can be a register or a memory location, and the instruction size varies byte by byte. This flexibility comes at a price, namely complexity of implementation. The type is typified by the VAX family of computers, designed in an era when there was a drive to provide high-level-language semantics in the instruction set, to "close the gap" between high-level language and machine language. Because it combines memory and register operands, it allows variables to be kept flexibly in memory and does not need a large number of registers to achieve a high level of performance.
Addressing modes
An addressing mode is the way an instruction calculates the addresses of its operands. The "effective" address can be computed from the values of registers and/or the value of some field in the instruction itself. To access an array, an index is necessary, and the index is usually stored in a register. Indirect addressing is used to represent the "pointer" type and to access a value via a pointer. Many complicated addressing modes have their uses when translating high-level-language constructs into machine instructions. Table 2.2 shows some of the most frequently used addressing modes found in most processors.
Table 2.2 Various addressing modes

addressing mode     example              meaning
register            add r4,r3            r4 = r4 + r3
immediate           add r4,#3            r4 = r4 + 3
based               add r4,100(r1)       r4 = r4 + M[100 + r1]
register indirect   add r4,(r1)          r4 = r4 + M[r1]
indexed             add r3,(r1+r2)       r3 = r3 + M[r1 + r2]
direct              add r1,(1001)        r1 = r1 + M[1001]
memory indirect     add r1,@(r3)         r1 = r1 + M[M[r3]]
auto-increment      add r1,(r2)+         r1 = r1 + M[r2]; r2 = r2 + d
auto-decrement      add r1,-(r2)         r2 = r2 - d; r1 = r1 + M[r2]
scaled              add r1,100(r2)[r3]   r1 = r1 + M[100 + r2 + r3*d]
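The effective-address calculations of Table 2.2 can be mimicked with a small sketch (a toy model: the dictionary M stands in for memory, the register contents and d, the operand size, are arbitrary values chosen for illustration):

```python
M = {108: 7, 8: 3, 1001: 11}   # toy memory: address -> value
r1, r2, r3 = 8, 4, 2           # register contents
d = 4                          # operand size in bytes, used for scaling

based        = M[100 + r1]     # operand of add r4,100(r1)
reg_indirect = M[r1]           # operand of add r4,(r1)
direct       = M[1001]         # operand of add r1,(1001)
scaled_addr  = 100 + r2 + r3 * d  # address part of add r1,100(r2)[r3]

print(based, reg_indirect, direct, scaled_addr)  # 7 3 11 112
```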
Assembly language
In this section we introduce assembly language. Assembly language is the "lingua franca" for talking to the underlying hardware. An example of a real microprocessor's assembly language is illustrated in relation to a high-level language.
Why assembly language is needed
It is becoming less and less necessary for a programmer to program in assembly language: high-level languages have made programs portable and programming more productive. There are, however, situations where assembly language is necessary, such as programming very near the hardware level. A programmer who writes a compiler, a device driver in an OS, or an embedded control program needs assembly language. Assembly language is the language for describing operations on the bare hardware; for a computer architect, it is the "interface" to the hardware functions. During this course we will talk about the innards of computers, their organisation, and how each unit works, and all of this follows from the kind of assembly language a computer has. A computer architect must therefore be able to read and write assembly language well. Every working unit inside a computer performs according to some sequence of instructions.
To study computer architecture, we need to understand assembly language. This introduction concentrates on the principles of assembly language programming; the aim is to enable students to read a subset of an assembly language and understand its operational semantics. We use a real CPU, the Motorola 6800, as our example. It was designed more than 20 years ago, has a simple instruction set, and is easy to understand. We use a real CPU because it shows the complexity of a real device, and we choose a subset of the instruction set that is just large enough to write some small programs.
Instruction set of MC6800
The machine model of the Motorola 6800 shows the resources in the view of an assembly language programmer. The microprocessor has two 8-bit accumulators, ACCA and ACCB, and two 16-bit registers that can perform indexing: X and SP. The condition flags reside in the 8-bit condition code register. The address space is 64K bytes (16-bit addresses). In general, the instructions can be grouped into five categories.
Among the instructions that manipulate the index register is:
LDX - load the index register
Programmer model of the 6800: accumulators A and B (8 bits each); index register X and stack pointer SP (16 bits each); an 8-bit condition code register.
Memory model of the 6800: 64K bytes; addresses 00-FF in short (direct) form, 0000-FFFF in long (extended) form.
Example: P = M + N
Let P = $100, M = $101, N = $102.
ldaa $101 ; A = M
adda $102 ; A = A + N
staa $100 ; P = A
Example: add 1 to 10
In a high-level language:
i = 1; sum = i
while i <= 10
    sum = sum + i
    i = i + 1
In assembly language, let sum = $100 and i = $101.
Example: find the maximum in an array AR[i], i = 1..10 (.org h'100).
Directives
An assembler recognises directives such as .ORG, .END, .DB (define byte), and .DW (define word). In an assembler program, assembly language directives help improve the readability of the program by providing symbolic names. Directives are special instructions: pseudo-instructions that are not translated into any actual machine instruction. Mostly they provide names and constant values stored in memory. ORG sets the location counter, EQU defines a symbol, and DB and DW reserve storage. To simplify register allocation, variables are kept in memory (using DB, DW or EQU). Although it is sometimes laborious to move variables between registers and memory, it is straightforward and easy to understand. Symbolic names can be used to make a program easier to read. From the last example:
.org 0
ldx #1
ldaa AR,x
staa max ; max = AR[1]
ldx #2 ; i = 2
...
An assembler translates a source file into a machine code file (in some format, such as Motorola S-format). The machine code file can be loaded into a simulator and executed. A simulator allows students to execute a program and monitor its effect step by step; it shows the values of all registers and can display memory values. The a68 assembler and the sim68 simulator are available for download from the web page of this book.
IBM System/360 ISA
IBM System/360 [AMD64] is one of the longest-lived instruction sets to date; the architecture was introduced in 1964. The goal of this family of computers was a compatible instruction set across a performance range of 50. The designers' task was a difficult one: the machine was aimed at both scientific and data processing applications. Scientific applications are dominated by floating-point operations; data processing applications involve movement of long strings. The architecture's long life brought to light a now-classical problem in instruction set design: the shortage of address space. As applications grow, the requirement for address space increases very quickly, and a design whose address space is adequate at the time of introduction finds itself lacking just a few years later. To quote Bell and Strecker [BEL76]:
"There is only one mistake . . . that is difficult to recover from - not providing enough address bits . . . "
The S/360 is byte addressable; the smallest addressable unit is the byte. Addresses are "real", referring to physical locations in main memory. Its successor, System/370 [CAS78], introduced a major advance, the "virtual" address, in which an address does not refer directly to a physical location in main memory but is mapped to one by a dynamic address translation mechanism.
The S/360 has 16 32-bit registers, R0 to R15. R2 to R12 are general purpose; R0, R1, R13, R14, and R15 are special purpose and are used in subroutine linkage (Table 2.3). For floating-point operations the registers are paired into four 64-bit floating-point registers, numbered 0, 2, 4, 6.
Table 2.3 S/360 special purpose registers

register   caller                           callee
R0         return value from subroutine     return value
R1         send parameters to subroutine    receive parameters
R13        register save area               save and restore registers
R14        return address                   return address
R15        the address of the subroutine    --
Addressing mode
It has five addressing modes: register-register (RR), register-indexed storage (RX), register-storage (RS), storage-immediate (SI), and storage-storage (SS). The instruction format for each mode is (field:length in bits):

RR   op:8   R1:4   R2:4
RX   op:8   R1:4   X:4    B:4    D:12
RS   op:8   R1:4   R3:4   B:4    D:12
SI   op:8   I:8    B:4    D:12
SS   op:8   L1:4   L2:4   B1:4   D1:12   B2:4   D2:12

RR  register to register: R[R1] = R[R1] op R[R2]
RX  register to indexed storage: R[R1] = R[R1] op M[R[X] + R[B] + D]
RS  register to storage: R[R1] = M[R[B] + D] op R[R3]
SI  storage and immediate: M[R[B] + D] op I
SS  storage to storage: M[R[B1] + D1]:L1 op M[R[B2] + D2]:L2, where L1 and L2 are the lengths of the operands
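The fixed bit fields make S/360 instruction words straightforward to assemble and disassemble. A sketch of packing and unpacking the 32-bit RX format, with field widths taken from the table above (the example opcode 0x5A is the RX-form add; the register and displacement values are arbitrary):

```python
def encode_rx(op, r1, x, b, d):
    """Pack an RX instruction: op:8 | R1:4 | X:4 | B:4 | D:12 = 32 bits."""
    assert 0 <= op < 256 and 0 <= d < 4096
    return (op << 24) | (r1 << 20) | (x << 16) | (b << 12) | d

def decode_rx(word):
    """Unpack a 32-bit RX instruction word back into its five fields."""
    return (word >> 24, (word >> 20) & 0xF, (word >> 16) & 0xF,
            (word >> 12) & 0xF, word & 0xFFF)

w = encode_rx(0x5A, r1=2, x=0, b=12, d=0x100)
print(hex(w))         # 0x5a20c100
print(decode_rx(w))   # (90, 2, 0, 12, 256)
```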
It is byte-addressable: a full word is 32 bits and a double word is 64 bits, with 32 bits as the natural size. For arithmetic data it has decimal, packed decimal, and floating-point numbers in single precision (32-bit) and double precision (64-bit). It also has strings and characters, encoded in EBCDIC (extended binary coded decimal interchange code).
Types of operations
The S/360 was built to accommodate many types of basic functions, with decimal data, binary data, and floating-point data, and arithmetic instructions for each type. The instruction format for decimal addition is not the same as that for binary addition, because decimal addition does not use the general registers. Floating-point arithmetic uses its own set of registers and its own register-numbering conventions. The S/360 instructions can be classified as follows.
Load and store instructions:
L     load
LP    load positive
LN    load negative
LC    load complement
LA    load address
ST    store

Branch instructions:
B     branch
BC    branch on condition, testing the condition code (CC bits), with the following extended mnemonics:
      BZ  branch on zero          BNZ  branch on not zero
      BP  branch on positive      BNP  branch on not positive
      BM  branch on minus         BNM  branch on not minus
      BO  branch on overflow      BNO  branch on not overflow
      The addressing mode can be either RR (the destination address is in a register) or RX (the destination address is calculated from base + index + displacement).
BCT   branch on count: subtracts one from the operand (RR-type) and branches when the result is not 0
BXLE  branch on index low or equal
BXH   branch on index high
BAL   branch and link (RR, RX): the return address is loaded into op1 and control branches to the destination address in op2
BR r  branch to the address in register r, which stores the return address; used in a pair with BAL

Arithmetic and logical instructions (the operands can be RR, RX, SS or SI):
A     add
S     subtract
M     multiply
D     divide
C     compare
CL    compare logical
N     and
O     or
X     xor
TM    test under mask
SL    shift left arithmetic/logical
SR    shift right arithmetic/logical

String and decimal instructions:
MVC   move characters
CLC   compare logical characters
TR    translate: character translation through a table
TRT   translate and test: table look-up, used for string searching
CVB   convert from packed decimal to binary
CVD   convert from binary to packed decimal
PACK  convert from zoned decimal to packed decimal
UNPK  convert from packed decimal to zoned decimal
ED    edit: convert packed decimal to zoned form for display
EDMK  edit and mark: similar to edit but uses a pattern to insert a currency symbol such as $

Example of a program to perform W = X + Y - Z. Assume W, X, Y, Z are in memory.
PROGRAM  START 0
         BALR  12,0       SET UP THE BASE REGISTER
         USING *,12
         L     2,X        R2 = M(X)
         A     2,Y        R2 = R2 + M(Y)
         S     2,Z        R2 = R2 - M(Z)
         ST    2,W        M(W) = R2
         BR    14         RETURN
X        DC    F'10'      FULLWORD CONSTANT 10
Y        DC    F'3'       FULLWORD CONSTANT 3
Z        DC    F'4'       FULLWORD CONSTANT 4
W        DS    F          RESERVE A FULLWORD
         END
Stack-based ISA
In contrast to an ordinary processor of contemporary design, which uses registers, a stack machine uses a stack. A stack is a LIFO (last in, first out) storage with two abstract operations, push and pop: push puts an item on the top of the stack, and pop retrieves the item at the top of the stack.
Calculation using a stack
Because a stack is LIFO, any operation must access its data at the top. A stack needs no "addressing": the location is implicit in the operators that use the stack. Any expression can be transformed into postfix order, and the stack can then be used to evaluate the expression without explicitly locating any variable. For example,
B + C - D ==> B C + D - (postfix) ==> push B; push C; add; push D; sub
add takes the top two items from the stack, adds them, and pushes the result back onto the stack; the sub operator works similarly. store takes one value and one address from the stack and stores the value at the address. Let us compare the above expression with the calculation using registers.
A = B + C - D ==>
r0 = B
r0 = r0 + C
r0 = r0 - D
A = r0
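The stack evaluation described above can be simulated in a few lines (a sketch using a Python list as the stack; the variable values are arbitrary):

```python
B, C, D = 7, 5, 2
stack = []

# postfix: B C + D -
stack.append(B)                            # push B
stack.append(C)                            # push C
stack.append(stack.pop() + stack.pop())    # add: pop two, push B + C
stack.append(D)                            # push D
d, bc = stack.pop(), stack.pop()           # sub: pop two...
stack.append(bc - d)                       # ...push (B + C) - D

A = stack.pop()                            # store the result
print(A)  # 10
```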
One can see that the main difference is that registers must be allocated; for example, r0 is used to store the temporary result, while in a stack machine the temporary storage is implicit. An ISA based on a stack has an advantage over a register-based ISA in that it is very compact: as most instructions take their arguments implicitly, an instruction is very short, usually one byte. Only the few instructions that need an argument, such as jump and push literal, require more than one byte.
Example of a stack ISA
We now illustrate an ISA based on a stack machine. To simplify the presentation, we ignore local variables (and thereby the complications of an activation record). We need load, store, arithmetic operators, call, return, and the conventional jump and branch for flow of control.
Notation: TOP is the item on top of the stack, and NEXT is the item below TOP (so we can talk about two operands on the stack as TOP and NEXT); M[ads] is the value of memory at address ads. "pop a" moves TOP into a; "pop2" pops two items off the stack.
lit #a pushes the immediate value a.
Note that except for lit #a and jz #a, which take #a as an argument, all other instructions have their arguments implicit on the stack. The state of the computation consists of a stack pointer and a program counter. If we have two stacks, one for computation and one for activation records (called the control stack), we need only store the program counter (return address) in the activation record, and nothing needs to be done to the computation stack on subroutine calls. Calling a subroutine just pushes the current program counter (return address) onto the control stack; returning just pops the control stack and restores the previous program counter.
Reduced Instruction Set Computer
As high-level languages became popular and started to replace assembly languages, the design of the instruction set took centre stage. The ISA designs of that period (circa 1970) emphasised support for high-level languages, using instructions that perform complex operations, such as moving a block of characters, and providing various addressing modes to accommodate the data structures of high-level languages. The intention was that with these complex instructions the "level" of the assembly language would be lifted nearer to the high-level languages (the difference between high-level languages and assembly languages is called the "semantic gap" [ILI82]), thus simplifying the construction of compilers, which were among the most complex programs of the day. The ISA designs also emphasised small executable code size, the reasoning being that a smaller program runs faster; one obvious factor is that fewer instructions must be fetched from memory.
However, because of their complexity, the complex instructions required many cycles to execute. The control unit became more difficult to design, and "microprogramming" became the standard engineering technique for battling this complexity. The complexity of a control unit can be measured by the size of its microprogram; the DEC VAX 11/780, which has one of the most complex ISAs, has 5140 x 96 bits of microprogram [LEV89]. This complexity resulted in a longer cycle time. The other negative aspect of a complex ISA is that pipeline scheduling is not very effective and the cost of a stall is very high.
Studies of the dynamic execution of instructions in programs written in high-level languages [PAT82] [LUN77] [HUC83] showed that 1) the most frequently used instructions are the simple ones, and 2) compilers do not make much use of the complex instructions, because it is difficult to match the context (conditions) of language statements to specialised instructions; the compiled code therefore consists mostly of simple instructions. Table 2.4 shows results from [PAT82].
Armed with these findings, a movement in a new direction of instruction set design began [PAT82] [PAT85] [STA88]. In contrast with the earlier ISAs, the new ISAs emphasised 1) making the simple instructions run fast and 2) making pipeline efficiency the main concern. This led to the effort to make every instruction run in one cycle. The main techniques were a load/store instruction set and the use of a large number of registers to store local values and to pass parameters between call and return. The visible characteristic is that the new ISAs have a simplified instruction set (which does not mean the number of instructions is reduced): for example, the number of addressing modes is restricted, complex instructions that cannot complete in one cycle are abandoned, and some complex operations are achieved by a sequence of simple instructions instead.
Table 2.4 Weighted relative dynamic frequency of high-level language operations

          Dynamic occurrence    Machine-instruction    Memory-reference
                                weighted               weighted
          Pascal      C         Pascal      C          Pascal      C
ASSIGN    45          38        13          13         14          15
LOOP      5           3         42          32         33          26
CALL      15          12        31          33         44          45
IF        29          43        11          21         7           13
GOTO      -           3         -           -          -           -
OTHER     6           1         3           1          2           1
The other main departure from previous ISA design is the emphasis on using compilers to schedule efficient code. Many techniques in the new ISAs require compiler sophistication; for example, the delayed branch requires the compiler to fill in the delay slot. Fortunately, software technology had advanced to the point where writing such sophisticated compilers became possible. With a simplified instruction set, compilation techniques achieve a good deal of efficiency; it is easier to generate good code for a simplified ISA than for a complex one. The result of this new thinking is that the CPI of the processor approaches 1.0, the control unit is simplified to the point that a hardwired circuit is practical, and the cycle time is reduced.
The complex instruction sets were named "Complex Instruction Set Computer" (CISC), in contrast to the simplified instruction sets, which were then called "Reduced Instruction Set Computer" (RISC). The years 1980-1990 became the golden age of the RISC philosophy, as the microelectronics industry matured and it became possible to produce a high-performance processor on a single chip. The RISC design dominated the market and became synonymous with high performance. Because of the regularity inherent in RISC designs, computer-aided design (CAD) tools can be applied easily to the design and test process, which accelerates the time to market of new processors. However, compatibility with old software keeps the complex instruction sets alive, notably in the Intel family of microprocessors, the 80x86 and later the Pentium family.
The first three columns below indicate decode complexity; the remaining four indicate pipelining difficulty.

Processor     No. of       Max. inst.    No. of        Indirect     Load/store with   Max. no. of      Unaligned
              inst. sizes  size (bytes)  addr. modes   addressing   arithmetic        memory operands  addressing
MIPS R2000    1            4             1             no           no                1                no
SPARC         1            4             2             no           no                1                no
HP PA         1            4             10            no           no                1                no
IBM RS/6000   1            4             4             no           no                1                yes
IBM 3090      4            8             2             no           yes               2                yes
Intel 80486   12           12            15            no           yes               2                yes
MC68040       11           22            44            yes          yes               2                yes
VAX           56           56            22            yes          yes               6                yes
Figure 2.2 Characteristics of some processors
Figure 2.2 shows characteristics of some processors that illustrate the difference between CISC and RISC designs. The first four processors, the MIPS R2000, SPARC, HP PA and RS/6000, are RISC: they have one fixed instruction size, a small number of addressing modes, no indirect addressing, no load/store combined with arithmetic instructions, and at most one memory operand. The other four processors, the IBM 3090, Intel 80486, MC68040 and VAX, are CISC. These examples were chosen to contrast the two schools of thought; the division between them is not black and white, however, and many ISAs fall in between.
The evolution of ideas in ISA design across both generations (CISC and RISC) reflects change driven by technological forces: CISC succeeded because of the microprogramming technique, just as RISC succeeded because of single-chip processor technology. The success of both ideas is a good example of how a particular tradeoff is reached, and the lessons learned will be applicable to future ISA designs, which will surely be affected by technologies yet to come (such as DNA computing and nanoelectronics).
Current designs use both ideas in the implementation of a processor [HEN91] [FLY98] [FLY99]. The control is divided into two parts: 1) the execution of basic instructions and 2) the execution of complex instructions. The basic instructions complete in one cycle and are multiply issued; the complex instructions have a very deep pipeline (for example, one model of the Intel Pentium has a 14-stage pipeline). Complex instructions can also be translated at run time into wide internal micro-operations, which simplifies the multicycle pipeline, especially for floating-point operations. As Flynn said in one of his articles [FLY97]:
"Tradeoffs between computer design cost-performance and programmer accessible functionality are as current a problem today as they were in 1953."
and, concerning the debate over whether CISC or RISC is better:
" ... Actual performance differences in instruction set efficiency are slight, but these differences still stir passions among hardware designers. Within the past few years, there has been a continuing (and generally unproductive) debate over the cost-performance benefits of the so-called RISC instruction sets over earlier instruction sets labeled CISC."
No doubt the instruction set design of future processors will see another idea as revolutionary as RISC was over CISC in the past.
References
[AMD64] Amdahl, G., Blaauw, G., and Brooks, F., "Architecture of the IBM System/360", IBM Journal of Research and Development, April 1964.
[BEL76] Bell, C., and Strecker, W., "Computer structures: What we have learned from the PDP-11", Proc. of 3rd annual symposium on computer architecture, (1976): 1-14.
[CAS78] Case, R., and Padegs, A., "Architecture of the IBM System/370", Communications of the ACM, 21 (1978): 73-96.
[FLY97] Flynn, M., "Introduction to: Influence of Programming Techniques on the Design of Computers", Proceedings of the IEEE, Volume 85, no. 3, March 1997, pp. 467-469.
[FLY98] Flynn, M., "Computer engineering 30 years after the IBM Model 91", Computer, Volume 31, no. 4, April 1998, pp. 27 -31.
[FLY99] Flynn, M., Hung, P., Rudd, K., "Deep submicron microprocessor design issues", IEEE Micro, Volume 19, no. 4, July-Aug. 1999, pp. 11-22.
[HEN91] Hennessy, J., Jouppi, N., "Computer technology and architecture: an evolving interaction", Computer, Volume 24, no.9 , Sept. 1991, pp. 18-29.
[HUC83] Huck, T., Comparative analysis of computer architectures, Stanford university technical report no. 83-243, May 1983.
[IBM94] International Business Machines, Inc., The PowerPC architecture: A specification for a new family of RISC processors. San Francisco: CA, Morgan Kaufmann, 1994.
[ILI82] Iliffe, J., Advanced computer design, Prentice-Hall, London, 1982.
[LEV89] Levy M., and Eckhouse, R., Computer programming and architecture: the VAX, Bedford, Mass., Digital Press, 1989
[LUN77] Lunde, A., "Empirical evaluation of some features of instruction set processor architecture", Comm. of the ACM, March 1977.
[PAT82] Patterson, D., and Sequin, C., "A VLSI RISC", Computer, 15, no. 9, September, 1982, pp. 8-21.
[PAT85] Patterson, D., "Reduced instruction set computers", Comm. of the ACM, 28, no.1, January 1985.
[STA88] Stallings, W., "Reduced instruction set computer architecture", Proc. of the IEEE, vol. 76, no. 1, January 1988, pp. 38-55.