Instruction set design is an important part of computer design. An instruction set is the programmer-visible part of a processor: it exposes the processor's resources, such as functional units, registers, and flags, together with the operations that can manipulate those resources.
An instruction set abstracts away the technology-dependent parts of a processor, such as the frequency of the master clock and implementation details like the number of pipeline stages and the size of the cache memory. An instruction set also defines the architecture of a processor; that is, an ISA defines the function of a processor.
In this chapter we discuss instruction set design issues. An introduction to assembly language is illustrated using the Motorola 6800. The IBM System/360 instruction set is then studied as an example of one of the longest-lived ISAs; the S/360 ISA defines a family of computers and holds a unique position in computer history. Another approach to ISA design, the stack-based ISA, is also discussed. Finally, one of the most revolutionary ideas in ISA design of recent decades, the reduced instruction set computer (RISC), is explored.
Design issues
The designer of an instruction set must consider the following issues:
Types of operations
An instruction set consists of several types of operations. Most of these types must be present in a general-purpose processor.
Types of data
A sequence of bits in memory can represent many types of data: addresses, numbers, characters, logical values {True, False}. These data types are interpreted by the instructions; each instruction requires the correct type of data to produce a meaningful result. The choice of data types in each ISA is heavily influenced by the intended workload, such as binary-coded decimal (BCD) for business applications and floating point for scientific computing. The differences in design reflect the differences in intended use.
Example
The Intel Pentium processor has the following data types: byte, word, double word, quadword, integer, unsigned integer, BCD, packed BCD, near pointer, bit field, byte string, floating-point.
The IBM PowerPC processor has the following data types: byte, halfword, word, doubleword, unsigned byte, unsigned halfword, signed halfword, unsigned word, signed word, unsigned doubleword, byte string, single float, double float (IEEE 754).
Endianness (byte ordering, bit ordering)
Because memory is arranged in linear order, the order of the bits and bytes of a data item must be specified to give a consistent interpretation. There are two schools of thought: big-endian and little-endian. A big-endian layout places the data in memory from the most significant to the least significant "digit"; little-endian does the reverse. Neither has an absolute advantage over the other. In the past, the issue of endianness caused compatibility problems when data had to be transferred between two machines of different endianness. Today, many processors implement both orderings and allow software to switch the mode, which reduces the data-translation problem. The ordering is considered at two levels: bit ordering and byte ordering.
Bit ordering: This refers to whether the least significant bit is the leftmost or the rightmost bit. It matters when data is shifted out serially, as in serial communication applications. However, it is not an architectural problem, since most processors have instructions to shift either the leftmost or the rightmost bit out.
Byte ordering: Suppose a 32-bit value is 12345678 (hex). A big-endian machine stores it as 12,34,56,78 (ordered from low address to high address in memory); a little-endian machine stores it as 78,56,34,12.
Different processors adopt different endianness. Little-endian machines include the Intel 80x86, Pentium, and VAX. Big-endian machines include the IBM 370, Motorola 680x0, and most RISC machines. Some machines are bi-endian: the endianness can be set in a processor status bit, as in the PowerPC and MIPS.
Example: To illustrate the difference between the two orderings, consider how the following C structure is mapped in memory.
struct {
int a; //0x1112_1314 word
int pad;
double b; //0x2122_2324_2526_2728 doubleword
char* c; //0x3132_3334 word
char d[7]; //'A','B','C','D','E','F','G' byte array
short e; //0x5152 halfword
int f; //0x6162_6364 word
} s;
Big-endian address mapping (byte address):

address:   00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
contents:  11 12 13 14 .. .. .. .. 21 22 23 24 25 26 27 28

address:   10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
contents:  31 32 33 34  A  B  C  D  E  F  G .. 51 52 .. ..

address:   20 21 22 23
contents:  61 62 63 64

(".." denotes a padding byte.)
Little-endian address mapping (byte address):

address:   00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
contents:  14 13 12 11 .. .. .. .. 28 27 26 25 24 23 22 21

address:   10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
contents:  34 33 32 31  A  B  C  D  E  F  G .. 52 51 .. ..

address:   20 21 22 23
contents:  64 63 62 61

(".." denotes a padding byte. Note that the bytes of the character array d appear in the same order in both mappings.)
Figure 2.1 Example of a C data structure and its endian maps [IBM94]
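The two byte orderings in Figure 2.1 can be checked directly with Python's struct module, which can pack the same 32-bit value both ways (a quick sketch using the value of field a from the structure above):

```python
import struct

value = 0x11121314  # the field 'a' from the structure above

big = struct.pack(">I", value)     # '>' = big-endian, 'I' = unsigned 32-bit
little = struct.pack("<I", value)  # '<' = little-endian

print(big.hex())     # 11121314 : most significant byte at the lowest address
print(little.hex())  # 14131211 : least significant byte at the lowest address
```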
Instruction formats
An instruction operates on its "operands". The number of operands varies from instruction to instruction, but many instructions share the same number of operands. The number of operands determines the "format" of an instruction. Instruction formats can be classified into 3-, 2-, 1-, and 0-operand instructions.
A 3-operand instruction has the form "op A B C", meaning A = B op C.
A 2-operand instruction has the form "op A B", meaning A = A op B.
A 1-operand instruction has the form "op A", meaning it operates on A.
A 0-operand instruction has the form "op", meaning it has no explicit operand; the operands are implicit on the stack.
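The formats express the same computation in different ways. A small sketch (hypothetical mini-interpreters, not any real ISA) makes the distinction concrete:

```python
# 3-operand: add A, B, C  means  A = B + C
regs = {"A": 0, "B": 7, "C": 5}
regs["A"] = regs["B"] + regs["C"]           # destination named explicitly

# 2-operand: add A, B  means  A = A + B (destination doubles as a source)
regs2 = {"A": 7, "B": 5}
regs2["A"] = regs2["A"] + regs2["B"]

# 0-operand: operands are implicit on a stack
stack = [7, 5]                              # push B; push C
stack.append(stack.pop() + stack.pop())     # add: pop two, push the sum

print(regs["A"], regs2["A"], stack[-1])     # all three compute 12
```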
The operands of an instruction can be memory locations, registers, or constant values, and the choice affects the size of the encoding. A register operand needs a much smaller encoding than a memory operand, because the number of registers in a machine is far smaller than the addressable memory space. The combinations of operand types give rise to the different categories of architecture.
Compare the register-register format with the memory-memory format. Assume the operation code is 8 bits, an operand address is 16 bits, a register field is 4 bits, and each operand is 32 bits. Let I be the size of the executed instructions, D the size of the executed data, and M = I + D the total memory traffic (in bits). The table below shows (I, D, M) for each sequence of operations.
Table 2.1 Comparing register-register and memory-memory instruction formats; (I, D, M) - I the size of instructions, D the size of data, M total memory traffic in bits

operations             register-register               memory-memory
A = B + C              ld rB,B ; ld rC,C ;             add B,C,A
                       add rA,rB,rC ; st A,rA          (56, 96, 152)
                       (104, 96, 200)
A = B + C; B = A + C;  add rA,rB,rC ; add rB,rA,rC ;   add B,C,A ; add A,C,B ;
D = D - B              sub rD,rD,rB                    sub D,B,D
(operands already      (60, 0, 60)                     (168, 288, 456)
in registers)
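The numbers in Table 2.1 follow directly from the assumed field sizes; a quick sketch to recompute the first row (assuming 8-bit opcodes, 16-bit addresses, 4-bit register fields, and 32-bit data, as in the text):

```python
OP, ADDR, REG, DATA = 8, 16, 4, 32  # field sizes in bits

# memory-memory: one 3-address instruction, three memory operands
I_mm = OP + 3 * ADDR            # 56 bits of instruction fetched
D_mm = 3 * DATA                 # 96 bits of data moved
print(I_mm, D_mm, I_mm + D_mm)  # 56 96 152

# register-register: ld rB,B ; ld rC,C ; add rA,rB,rC ; st A,rA
I_rr = 2 * (OP + REG + ADDR) + (OP + 3 * REG) + (OP + REG + ADDR)
D_rr = 3 * DATA
print(I_rr, D_rr, I_rr + D_rr)  # 104 96 200
```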
Processor design is strongly tied to instruction set design. In the past there were many diverse computer designs and hence many different instruction sets. As technology progressed, however, analysis of workloads - the actual running programs - guided instruction selection and led to a convergence of instruction set architectures. Most instruction set architectures today belong to one of three classes:
The load-store architecture has a 3-address format and mostly a 32-bit instruction size. It is the most popular among current microprocessor designs, including the HP PA-RISC, IBM RS/6000, Sun SPARC, MIPS R4000, and DEC Alpha. All data to or from memory must be loaded or stored through a register first; an operation takes its operands from registers and stores the result back into a register. This instruction format simplifies decoding and implementation, and because most operations are performed on registers, they are fast. However, since registers are used so extensively, register allocation becomes important: deciding which variables reside in registers affects the performance of this class of machine, and the allocation is done by compilers.
The register-memory architecture has a 2-address format and 16/32/64-bit instruction sizes. An instruction can operate on registers, with one of the operands possibly in memory. This is the "classical" ISA and is used by some of the longest-lived ISAs of today, the IBM S/360 and the Intel x86 family of processors.
The register-plus-memory architecture is the most flexible in its use of operands: an operand can be a register or a memory location, and the instruction size varies byte by byte. This flexibility comes at a price, namely complexity of implementation. The type is typified by the VAX family of computers, designed in an era when there was a drive to provide high-level-language semantics in the instruction set, to "close the gap" between high-level language and machine language. Because it combines memory and register operands, it allows variables to be kept flexibly in memory and does not need a large number of registers to achieve a high level of performance.
Addressing modes
An addressing mode is the way an instruction calculates the addresses of its operands. The "effective" address can be computed from the values of registers and/or the value of some field in the instruction itself. To access an array, an index is necessary, and the index is usually stored in a register. Indirect addressing is used to represent the "pointer" type and to access a value via a pointer. Many complicated addressing modes have their uses when translating high-level-language constructs into machine instructions. Table 2.2 shows some of the most frequently used addressing modes found in most processors.
Table 2.2 Various addressing modes

addressing mode     example              meaning
register            add r4,r3            r4 = r4 + r3
immediate           add r4,#3            r4 = r4 + 3
based               add r4,100(r1)       r4 = r4 + M[100 + r1]
register indirect   add r4,(r1)          r4 = r4 + M[r1]
indexed             add r3,(r1+r2)       r3 = r3 + M[r1 + r2]
direct              add r1,(1001)        r1 = r1 + M[1001]
memory indirect     add r1,@(r3)         r1 = r1 + M[M[r3]]
auto-increment      add r1,(r2)+         r1 = r1 + M[r2]; r2 = r2 + d
auto-decrement      add r1,-(r2)         r2 = r2 - d; r1 = r1 + M[r2]
scaled              add r1,100(r2)[r3]   r1 = r1 + M[100 + r2 + r3*d]
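The effective-address calculations of Table 2.2 can be mimicked with a small sketch (a toy model: the dictionary M stands in for memory, the register contents and d, the operand size, are arbitrary values chosen for illustration):

```python
M = {108: 7, 8: 3, 1001: 11}   # toy memory: address -> value
r1, r2, r3 = 8, 4, 2           # register contents
d = 4                          # operand size in bytes, used for scaling

based        = M[100 + r1]     # operand of add r4,100(r1)
reg_indirect = M[r1]           # operand of add r4,(r1)
direct       = M[1001]         # operand of add r1,(1001)
scaled_addr  = 100 + r2 + r3 * d  # address part of add r1,100(r2)[r3]

print(based, reg_indirect, direct, scaled_addr)  # 7 3 11 112
```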
Assembly language
In this section we introduce assembly language. Assembly language is the "lingua franca" for talking to the underlying hardware. An example of a real microprocessor's assembly language is illustrated in relation to a high-level language.
Why assembly language is needed
It is becoming less and less necessary for a programmer to program in assembly language: high-level languages have made programs portable and programming more productive. There are, however, situations where assembly language is necessary, such as programming very near the hardware level. A programmer who writes a compiler, a device driver in an OS, or an embedded control program needs assembly language. Assembly language is the language for describing operations on the bare hardware; for a computer architect, it is the "interface" to the hardware functions. During this course we will talk about the innards of computers, their organisation, and how each unit works, and all of this follows from the kind of assembly language a computer has. A computer architect must therefore be able to read and write assembly language well. Every working unit inside a computer performs according to some sequence of instructions.
To study computer architecture, we need to understand assembly language. This introduction concentrates on the principles of assembly language programming; the aim is to enable students to read a subset of an assembly language and understand its operational semantics. We use a real CPU, the Motorola 6800, as our example. It was designed more than 20 years ago, has a simple instruction set, and is easy to understand. We use a real CPU because it shows the complexity of a real device, and we choose a subset of the instruction set that is just large enough to write some small programs.
Instruction set of MC6800
The machine model of the Motorola 6800 shows the resources in the view of an assembly language programmer. The microprocessor has two 8-bit accumulators, ACCA and ACCB, and two 16-bit registers that can perform indexing: X and SP. The condition flags reside in the 8-bit condition code register. The address space is 64K bytes (16-bit addresses). In general, the instructions can be grouped into five categories.
Among the instructions that manipulate the index register is:
LDX - load the index register
Programmer model of the 6800: accumulators A and B (8 bits each); index register X and stack pointer SP (16 bits each); an 8-bit condition code register.
Memory model of the 6800: 64K bytes; addresses 00-FF in short (direct) form, 0000-FFFF in long (extended) form.
Example: P = M + N
Let P = $100, M = $101, N = $102.
ldaa $101 ; A = M
adda $102 ; A = A + N
staa $100 ; P = A
Example: add 1 to 10
In a high-level language:
i = 1; sum = i
while i <= 10
    sum = sum + i
    i = i + 1
In assembly language, let sum = $100 and i = $101.
Example: find the maximum in an array AR[i], i = 1..10 (.org h'100).
Directives
An assembler recognises directives such as .ORG, .END, .DB (define byte), and .DW (define word). In an assembler program, assembly language directives help improve the readability of the program by providing symbolic names. Directives are special instructions: pseudo-instructions that are not translated into any actual machine instruction. Mostly they provide names and constant values stored in memory. ORG sets the location counter, EQU defines a symbol, and DB and DW reserve storage. To simplify register allocation, variables are kept in memory (using DB, DW or EQU). Although it is sometimes laborious to move variables between registers and memory, it is straightforward and easy to understand. Symbolic names can be used to make a program easier to read. From the last example:
.org 0
ldx #1
ldaa AR,x
staa max ; max = AR[1]
ldx #2 ; i = 2
...
An assembler translates a source file into a machine code file (in some format, such as Motorola S-format). The machine code file can be loaded into a simulator and executed. A simulator allows students to execute a program and monitor its effect step by step; it shows the values of all registers and can display memory values. The a68 assembler and the sim68 simulator are available for download from the web page of this book.
IBM System/360 ISA
IBM System/360 [AMD64] is one of the longest-lived instruction sets to date; the architecture was introduced in 1964. The goal of this family of computers was a compatible instruction set across a performance range of 50. The designers' task was a difficult one: the machine was aimed at both scientific and data processing applications. Scientific applications are dominated by floating-point operations; data processing applications involve movement of long strings. The architecture's long life brought to light a now-classical problem in instruction set design: the shortage of address space. As applications grow, the requirement for address space increases very quickly, and a design whose address space is adequate at the time of introduction finds itself lacking just a few years later. To quote Bell and Strecker [BEL76]:
"There is only one mistake . . . that is difficult to recover from - not providing enough address bits . . . "
The S/360 is byte addressable; the smallest addressable unit is the byte. Addresses are "real", referring to physical locations in main memory. Its successor, System/370 [CAS78], introduced a major advance, the "virtual" address, in which an address does not refer directly to a physical location in main memory but is mapped to one by a dynamic address translation mechanism.
The S/360 has 16 32-bit registers, R0 to R15. R2 to R12 are general purpose; R0, R1, R13, R14, and R15 are special purpose and are used in subroutine linkage (Table 2.3). For floating-point operations the registers are paired into four 64-bit floating-point registers, numbered 0, 2, 4, 6.
Table 2.3 S/360 special purpose registers

register   caller                           callee
R0         return value from subroutine     return value
R1         send parameters to subroutine    receive parameters
R13        register save area               save and restore registers
R14        return address                   return address
R15        the address of the subroutine    --
Addressing mode
It has five addressing modes: register-register (RR), register-indexed storage (RX), register-storage (RS), storage-immediate (SI), and storage-storage (SS). The instruction format for each mode is (field:length in bits):

RR   op:8   R1:4   R2:4
RX   op:8   R1:4   X:4    B:4    D:12
RS   op:8   R1:4   R3:4   B:4    D:12
SI   op:8   I:8    B:4    D:12
SS   op:8   L1:4   L2:4   B1:4   D1:12   B2:4   D2:12

RR  register to register: R[R1] = R[R1] op R[R2]
RX  register to indexed storage: R[R1] = R[R1] op M[R[X] + R[B] + D]
RS  register to storage: R[R1] = M[R[B] + D] op R[R3]
SI  storage and immediate: M[R[B] + D] op I
SS  storage to storage: M[R[B1] + D1]:L1 op M[R[B2] + D2]:L2, where L1 and L2 are the lengths of the operands
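The fixed bit fields make S/360 instruction words straightforward to assemble and disassemble. A sketch of packing and unpacking the 32-bit RX format, with field widths taken from the table above (the example opcode 0x5A is the RX-form add; the register and displacement values are arbitrary):

```python
def encode_rx(op, r1, x, b, d):
    """Pack an RX instruction: op:8 | R1:4 | X:4 | B:4 | D:12 = 32 bits."""
    assert 0 <= op < 256 and 0 <= d < 4096
    return (op << 24) | (r1 << 20) | (x << 16) | (b << 12) | d

def decode_rx(word):
    """Unpack a 32-bit RX instruction word back into its five fields."""
    return (word >> 24, (word >> 20) & 0xF, (word >> 16) & 0xF,
            (word >> 12) & 0xF, word & 0xFFF)

w = encode_rx(0x5A, r1=2, x=0, b=12, d=0x100)
print(hex(w))         # 0x5a20c100
print(decode_rx(w))   # (90, 2, 0, 12, 256)
```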
It is byte-addressable: a full word is 32 bits and a double word is 64 bits, with 32 bits as the natural size. For arithmetic data it has decimal, packed decimal, and floating-point numbers in single precision (32-bit) and double precision (64-bit). It also has strings and characters, encoded in EBCDIC (extended binary coded decimal interchange code).
Types of operations
The S/360 was built to accommodate many types of basic functions, with decimal data, binary data, and floating-point data, and arithmetic instructions for each type. The instruction format for decimal addition is not the same as that for binary addition, because decimal addition does not use the general registers. Floating-point arithmetic uses its own set of registers and its own register-numbering conventions. The S/360 instructions can be classified as follows.
Load and store instructions:
L     load
LP    load positive
LN    load negative
LC    load complement
LA    load address
ST    store

Branch instructions:
B     branch
BC    branch on condition, testing the condition code (CC bits), with the following extended mnemonics:
      BZ  branch on zero          BNZ  branch on not zero
      BP  branch on positive      BNP  branch on not positive
      BM  branch on minus         BNM  branch on not minus
      BO  branch on overflow      BNO  branch on not overflow
      The addressing mode can be either RR (the destination address is in a register) or RX (the destination address is calculated from base + index + displacement).
BCT   branch on count: subtracts one from the operand (RR-type) and branches when the result is not 0
BXLE  branch on index low or equal
BXH   branch on index high
BAL   branch and link (RR, RX): the return address is loaded into op1 and control branches to the destination address in op2
BR r  branch to the address in register r, which stores the return address; used in a pair with BAL

Arithmetic and logical instructions (the operands can be RR, RX, SS or SI):
A     add
S     subtract
M     multiply
D     divide
C     compare
CL    compare logical
N     and
O     or
X     xor
TM    test under mask
SL    shift left arithmetic/logical
SR    shift right arithmetic/logical

String and decimal instructions:
MVC   move characters
CLC   compare logical characters
TR    translate: character translation through a table
TRT   translate and test: table look-up, used for string searching
CVB   convert from packed decimal to binary
CVD   convert from binary to packed decimal
PACK  convert from zoned decimal to packed decimal
UNPK  convert from packed decimal to zoned decimal
ED    edit: convert packed decimal to zoned form for display
EDMK  edit and mark: similar to edit but uses a pattern to insert a currency symbol such as $

Example of a program to perform W = X + Y - Z. Assume W, X, Y, Z are in memory.
PROGRAM  START 0
         BALR  12,0       SET UP THE BASE REGISTER
         USING *,12
         L     2,X        R2 = M(X)
         A     2,Y        R2 = R2 + M(Y)
         S     2,Z        R2 = R2 - M(Z)
         ST    2,W        M(W) = R2
         BR    14         RETURN
X        DC    F'10'      FULLWORD CONSTANT 10
Y        DC    F'3'       FULLWORD CONSTANT 3
Z        DC    F'4'       FULLWORD CONSTANT 4
W        DS    F          RESERVE A FULLWORD
         END
Stack-based ISA
In contrast to an ordinary processor of contemporary design, which uses registers, a stack machine uses a stack. A stack is a LIFO (last in, first out) storage with two abstract operations, push and pop: push puts an item on the top of the stack, and pop retrieves the item at the top of the stack.
Calculation using a stack
Because a stack is LIFO, any operation must access its data at the top. A stack needs no "addressing": the location is implicit in the operators that use the stack. Any expression can be transformed into postfix order, and the stack can then be used to evaluate the expression without explicitly locating any variable. For example,
B + C - D ==> B C + D - (postfix) ==> push B; push C; add; push D; sub
add takes the top two items from the stack, adds them, and pushes the result back onto the stack; the sub operator works similarly. store takes one value and one address from the stack and stores the value at the address. Let us compare the above expression with the calculation using registers.
A = B + C - D ==>
r0 = B
r0 = r0 + C
r0 = r0 - D
A = r0
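The stack evaluation described above can be simulated in a few lines (a sketch using a Python list as the stack; the variable values are arbitrary):

```python
B, C, D = 7, 5, 2
stack = []

# postfix: B C + D -
stack.append(B)                            # push B
stack.append(C)                            # push C
stack.append(stack.pop() + stack.pop())    # add: pop two, push B + C
stack.append(D)                            # push D
d, bc = stack.pop(), stack.pop()           # sub: pop two...
stack.append(bc - d)                       # ...push (B + C) - D

A = stack.pop()                            # store the result
print(A)  # 10
```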
One can see that the main difference is that registers must be allocated; for example, r0 is used to store the temporary result, while in a stack machine the temporary storage is implicit. An ISA based on a stack has an advantage over a register-based ISA in that it is very compact: as most instructions take their arguments implicitly, an instruction is very short, usually one byte. Only the few instructions that need an argument, such as jump and push literal, require more than one byte.
Example of a stack ISA
We now illustrate an ISA based on a stack machine. To simplify the presentation, we ignore local variables (and thereby the complications of an activation record). We need load, store, arithmetic operators, call, return, and the conventional jump and branch for flow of control.
Notation: TOP is the item on top of the stack, and NEXT is the item below TOP (so we can talk about two operands on the stack as TOP and NEXT); M[ads] is the value of memory at address ads. "pop a" moves TOP into a; "pop2" pops two items off the stack.
lit #a pushes the immediate value a.
Note that except for lit #a and jz #a, which take #a as an argument, all other instructions have their arguments implicit on the stack. The state of the computation consists of a stack pointer and a program counter. If we have two stacks, one for computation and one for activation records (called the control stack), we need only store the program counter (return address) in the activation record, and nothing needs to be done to the computation stack on subroutine calls. Calling a subroutine just pushes the current program counter (return address) onto the control stack; returning just pops the control stack and restores the previous program counter.
Reduced Instruction Set Computer
As high-level languages became popular and started to replace assembly languages, the design of the instruction set took centre stage. The ISA designs of that period (circa 1970) emphasised support for high-level languages, using instructions that perform complex operations, such as moving a block of characters, and providing various addressing modes to accommodate the data structures of high-level languages. The intention was that with these complex instructions the "level" of the assembly language would be lifted nearer to the high-level languages (the difference between high-level languages and assembly languages is called the "semantic gap" [ILI82]), thus simplifying the construction of compilers, which were among the most complex programs of the day. The ISA designs also emphasised small executable code size, the reasoning being that a smaller program runs faster; one obvious factor is that fewer instructions must be fetched from memory.
However, because of their complexity, the complex instructions required many cycles to execute. The control unit became more difficult to design, and "microprogramming" became the standard engineering technique for battling this complexity. The complexity of a control unit can be measured by the size of its microprogram; the DEC VAX 11/780, which has one of the most complex ISAs, has 5140 x 96 bits of microprogram [LEV89]. This complexity resulted in a longer cycle time. The other negative aspect of a complex ISA is that pipeline scheduling is not very effective and the cost of a stall is very high.
Studies of the dynamic execution of instructions in programs written in high-level languages [PAT82] [LUN77] [HUC83] showed that 1) the most frequently used instructions are the simple ones, and 2) compilers do not make much use of the complex instructions, because it is difficult to match the context (conditions) of language statements to specialised instructions; the compiled code therefore consists mostly of simple instructions. Table 2.4 shows results from [PAT82].
Armed with these findings, a movement in a new direction of instruction set design began [PAT82] [PAT85] [STA88]. In contrast with the earlier ISAs, the new ISAs emphasised 1) making the simple instructions run fast and 2) making pipeline efficiency the main concern. This led to the effort to make every instruction run in one cycle. The main techniques were a load/store instruction set and the use of a large number of registers to store local values and to pass parameters between call and return. The visible characteristic is that the new ISAs have a simplified instruction set (which does not mean the number of instructions is reduced): for example, the number of addressing modes is restricted, complex instructions that cannot complete in one cycle are abandoned, and some complex operations are achieved by a sequence of simple instructions instead.
Table 2.4 Weighted relative dynamic frequency of high-level language operations

          Dynamic occurrence    Machine-instruction    Memory-reference
                                weighted               weighted
          Pascal      C         Pascal      C          Pascal      C
ASSIGN    45          38        13          13         14          15
LOOP      5           3         42          32         33          26
CALL      15          12        31          33         44          45
IF        29          43        11          21         7           13
GOTO      -           3         -           -          -           -
OTHER     6           1         3           1          2           1
The other main departure from previous ISA design is the emphasis on using compilers to schedule efficient code. Many techniques in the new ISAs require compiler sophistication; for example, the delayed branch requires the compiler to fill in the delay slot. Fortunately, software technology had advanced to the point where writing such sophisticated compilers became possible. With a simplified instruction set, compilation techniques achieve a good deal of efficiency; it is easier to generate good code for a simplified ISA than for a complex one. The result of this new thinking is that the CPI of the processor approaches 1.0, the control unit is simplified to the point that a hardwired circuit is practical, and the cycle time is reduced.
The complex instruction sets were named "Complex Instruction Set Computer" (CISC), in contrast to the simplified instruction sets, which were then called "Reduced Instruction Set Computer" (RISC). The years 1980-1990 became the golden age of the RISC philosophy, as the microelectronics industry matured and it became possible to produce a high-performance processor on a single chip. The RISC design dominated the market and became synonymous with high performance. Because of the regularity inherent in RISC designs, computer-aided design (CAD) tools can be applied easily to the design and test process, which accelerates the time to market of new processors. However, compatibility with old software keeps the complex instruction sets alive, notably in the Intel family of microprocessors, the 80x86 and later the Pentium family.
The first three columns below indicate decode complexity; the remaining four indicate pipelining difficulty.

Processor     No. of       Max. inst.    No. of        Indirect     Load/store with   Max. no. of      Unaligned
              inst. sizes  size (bytes)  addr. modes   addressing   arithmetic        memory operands  addressing
MIPS R2000    1            4             1             no           no                1                no
SPARC         1            4             2             no           no                1                no
HP PA         1            4             10            no           no                1                no
IBM RS/6000   1            4             4             no           no                1                yes
IBM 3090      4            8             2             no           yes               2                yes
Intel 80486   12           12            15            no           yes               2                yes
MC68040       11           22            44            yes          yes               2                yes
VAX           56           56            22            yes          yes               6                yes
Figure 2.2 Characteristics of some processors
Figure 2.2 shows characteristics of some processors that illustrate the difference between CISC and RISC designs. The first four processors, the MIPS R2000, SPARC, HP PA and RS/6000, are RISC: they have one fixed instruction size, a small number of addressing modes, no indirect addressing, no load/store combined with arithmetic instructions, and at most one memory operand. The other four processors, the IBM 3090, Intel 80486, MC68040 and VAX, are CISC. These examples were chosen to contrast the two schools of thought; the division between them is not black and white, however, and many ISAs fall in between.
The evolution of ideas in ISA design across both generations (CISC and RISC) reflects change driven by technological forces: CISC succeeded because of the microprogramming technique, just as RISC succeeded because of single-chip processor technology. The success of both ideas is a good example of how a particular tradeoff is reached, and the lessons learned will be applicable to future ISA designs, which will surely be affected by technologies yet to come (such as DNA computing and nanoelectronics).
Current designs use both ideas in the implementation of a processor [HEN91] [FLY98] [FLY99]. The control is divided into two parts: 1) the execution of basic instructions and 2) the execution of complex instructions. The basic instructions complete in one cycle and are multiply issued; the complex instructions have a very deep pipeline (for example, one model of the Intel Pentium has a 14-stage pipeline). Complex instructions can also be translated at run time into wide internal micro-operations, which simplifies the multicycle pipeline, especially for floating-point operations. As Flynn said in one of his articles [FLY97]:
"Tradeoffs between computer design cost-performance and programmer accessible functionality are as current a problem today as they were in 1953."
and, concerning the debate over whether CISC or RISC is better:
" ... Actual performance differences in instruction set efficiency are slight, but these differences still stir passions among hardware designers. Within the past few years, there has been a continuing (and generally unproductive) debate over the cost-performance benefits of the so-called RISC instruction sets over earlier instruction sets labeled CISC."
No doubt the instruction set design of future processors will see another idea as revolutionary as RISC was over CISC in the past.
References
[AMD64] Amdahl, G., Blaauw, G., and Brooks, F., "Architecture of the IBM System/360", IBM Journal of Research and Development, April 1964.
[BEL76] Bell, C., and Strecker, W., "Computer structures: What we have learned from the PDP-11", Proc. of 3rd annual symposium on computer architecture, (1976): 1-14.
[CAS78] Case, R., and Padegs, A., "Architecture of the IBM System/370", Communications of the ACM, 21 (1978): 73-96.
[FLY97] Flynn, M., "Introduction to: Influence of Programming Techniques on the Design of Computers", Proceedings of the IEEE, Volume 85, no. 3, March 1997, pp. 467-469.
[FLY98] Flynn, M., "Computer engineering 30 years after the IBM Model 91", Computer, Volume 31, no. 4, April 1998, pp. 27 -31.
[FLY99] Flynn, M., Hung, P., Rudd, K., "Deep submicron microprocessor design issues", IEEE Micro, Volume 19, no. 4, July-Aug. 1999, pp. 11-22.
[HEN91] Hennessy, J., Jouppi, N., "Computer technology and architecture: an evolving interaction", Computer, Volume 24, no.9 , Sept. 1991, pp. 18-29.
[HUC83] Huck, T., Comparative analysis of computer architectures, Stanford university technical report no. 83-243, May 1983.
[IBM94] International Business Machines, Inc., The PowerPC architecture: A specification for a new family of RISC processors. San Francisco: CA, Morgan Kaufmann, 1994.
[ILI82] Iliffe, J., Advanced computer design, Prentice-Hall, London, 1982.
[LEV89] Levy M., and Eckhouse, R., Computer programming and architecture: the VAX, Bedford, Mass., Digital Press, 1989
[LUN77] Lunde, A., "Empirical evaluation of some features of instruction set processor architecture", Comm. of the ACM, March 1977.
[PAT82] Patterson, D., and Sequin, C., "A VLSI RISC", Computer, 15, no. 9, September, 1982, pp. 8-21.
[PAT85] Patterson, D., "Reduced instruction set computers", Comm. of the ACM, 28, no.1, January 1985.
[STA88] Stallings, W., "Reduced instruction set computer architecture", Proc. of the IEEE, vol. 76, no. 1, January 1988, pp. 38-55.