Some improvements can be made to the above design. To increase speed, the number of states for each instruction must be reduced. To reduce circuit complexity, states should be shared wherever possible.
Reduce the number of states
<store>
MAR = IR:ADS
MDR = R[IR:R0]
M[MAR] = MDR ; MWRITE
<storer>
MAR = R[IR:R2]
MDR = R[IR:R1]
M[MAR] = MDR; MWRITE
The states above cannot be merged because MAR and MDR are both on the same
internal bus and therefore cannot be accessed at the same time. If two
internal buses were available, these states could be merged into one (the
register bank already has two read ports).
<store>
MAR = IR:ADS; MDR = R[IR:R0]; MWRITE
<storer>
MAR = R[IR:R2]; MDR = R[IR:R1]; MWRITE
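The merge condition can be sketched as a resource check. The Python below is a hypothetical model, not part of the S1 design: each register transfer in a state is assumed to occupy one internal bus, and each source written as R[...] is assumed to use one read port of the register bank. MWRITE is a control signal rather than a bus transfer, so it is not counted. The names Transfer and can_merge are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One register transfer "dest = src" (hypothetical model)."""
    dest: str
    src: str


def can_merge(transfers, num_buses=1, read_ports=2):
    """Return True if all transfers can happen in a single clock.

    Assumption: every transfer uses one internal bus, and every source
    written as "R[...]" uses one read port of the register bank.
    Control signals such as MWRITE are not bus transfers and are ignored.
    """
    bus_uses = len(transfers)
    port_uses = sum(1 for t in transfers if t.src.startswith("R["))
    return bus_uses <= num_buses and port_uses <= read_ports


store = [Transfer("MAR", "IR:ADS"), Transfer("MDR", "R[IR:R0]")]
storer = [Transfer("MAR", "R[IR:R2]"), Transfer("MDR", "R[IR:R1]")]

print(can_merge(store))                 # False: two transfers, one bus
print(can_merge(store, num_buses=2))    # True
print(can_merge(storer, num_buses=2))   # True: two reads, two read ports
```

With a single read port, storer could not be merged even with two buses, which is why the two read ports of the register bank matter here.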
Share states
<load>
R[IR:R0] = MDR
<loadr>
R[IR:R2] = MDR
These states can be merged if R0 and R2 are the same field. We can do that by
changing the instruction format to use fixed-field encoding, moving R2 into the
same field as R0.
<add>
R[IR:R2] = T
<inc>
R[IR:R1] = T
These states can be merged if R2 and R1 are the same field. We do that by changing
the meaning of, say, the "add" instruction from R1 + R2 -> R2 to R1 + R2 -> R1.
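State sharing amounts to noticing that two microstates assert identical control fields. A minimal Python sketch, modeling each final state as a hypothetical (Dest, Src, SelR) tuple taken from the RTL above and grouping the instructions whose tuples match; shared_states is an illustrative name.

```python
def shared_states(states):
    """Group instruction names whose final microstate has identical controls."""
    groups = {}
    for name, ctl in states.items():
        groups.setdefault(ctl, []).append(name)
    return groups


# Hypothetical (Dest, Src, SelR) tuples for the last state of each instruction.
before = {
    "load": ("R", "MDR", "IR:R0"),
    "loadr": ("R", "MDR", "IR:R2"),
    "add": ("R", "T", "IR:R2"),
    "inc": ("R", "T", "IR:R1"),
}

# Fixed-field encoding moves loadr's R2 into the R0 field, and "add" is
# redefined as R1 + R2 -> R1, so each pair now selects the same field.
after = dict(before, loadr=("R", "MDR", "IR:R0"), add=("R", "T", "IR:R1"))

print(len(shared_states(before)))  # 4 distinct states
print(len(shared_states(after)))   # 2 distinct states
```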
Timing of S1 hardwired control unit (clocks per instruction)
instruction    | clocks
load           | 6
store          | 6
loadr          | 6
storer         | 6
jump uncond    | 5
jump taken     | 5
jump not taken | 4
move           | 5
add            | 5
inc            | 5
cmp            | 4
call           | 9
ret            | 8
Dest, Src : specify the destination and source of the internal bus.
SelR : selects registers in the register file.
Mctl : memory control for read/write.
ALU : specifies the ALU function and latches the result into the T register.
Misc : other control signals, such as PC + 1.
Cond : the condition tested for jumping to another microword.
Goto : the next (jump target) address.
Table S1 microprogram
Loc | Label  | Dest | Src    | SelR   | ALU   | Mctl | Misc | Cond   | Goto   | Comment
0   | ifetch | MAR  | PC     |        |       |      |      |        |        |
1   | w0     |      |        |        |       | RD   |      | MRDY   | w0     |
2   |        | IR   | MDR    |        |       |      | PC+1 | Decode |        |
3   | load   | MAR  | IR:ADS |        |       |      |      |        |        |
4   | w1     |      |        |        |       | RD   |      | MRDY   | w1     |
5   |        | R    | MDR    | IR:R0  |       |      |      | U      | ifetch |
6   | store  | MAR  | IR:ADS |        |       |      |      |        |        |
7   |        | MDR  | R      | IR:R0  |       |      |      |        |        |
8   | w2     |      |        |        |       | WR   |      | MRDY   | w2     |
9   |        |      |        |        |       |      |      | U      | ifetch |
10  | loadr  | MAR  | R      | IR:R1  |       |      |      |        |        |
11  | w3     |      |        |        |       | RD   |      | MRDY   | w3     |
12  |        | R    | MDR    | IR:R2  |       |      |      | U      | ifetch |
13  | storer | MAR  | R      | IR:R2  |       |      |      |        |        |
14  |        | MDR  | R      | IR:R1  |       |      |      |        |        |
15  | w4     |      |        |        |       | WR   |      | MRDY   | w4     |
16  |        |      |        |        |       |      |      | U      | ifetch |
17  | mov    |      |        | IR:R12 | PASS1 |      |      |        |        |
18  |        | R    | T      | IR:R2  |       |      |      | U      | ifetch |
19  | add    |      |        | IR:R12 | ADD   |      |      |        |        |
20  |        | R    | T      | IR:R1  |       |      |      | U      | ifetch |
21  | cmp    |      |        | IR:R12 | SUB   |      |      | U      | ifetch | set CC
22  | inc    |      |        | IR:R12 | ADD1  |      |      |        |        |
23  |        | R    | T      | IR:R1  |       |      |      | U      | ifetch |
24  | jmp    |      |        |        |       |      |      | testCC | ifetch | cc false
25  |        | PC   | IR:ADS |        |       |      |      | U      | ifetch | jump
26  | jal    | R    | PC     | IR:R0  |       |      |      |        |        |
27  |        | PC   | IR:ADS |        |       |      |      | U      | ifetch |
28  | jr     | PC   | R      | IR:R1  |       |      |      | U      | ifetch |
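How the Cond and Goto columns sequence the microprogram can be sketched by walking only those two columns and counting clocks. This Python sketch rests on assumptions about the sequencer: a blank Cond falls through to the next location; U jumps to Goto; MRDY stays on its own location until memory is ready and then falls through; testCC jumps to Goto when CC is false (the "cc false" note) and otherwise falls through; and Decode is modeled as a lookup by label rather than by the real opcode jump table. The names CONTROL_STORE, LABELS and clocks are hypothetical.

```python
# Sequencing fields (Label, Cond, Goto) of the S1 microprogram table.
# An empty Cond means "fall through to the next location".
CONTROL_STORE = [
    ("ifetch", "", ""),           # 0
    ("w0", "MRDY", "w0"),         # 1
    ("", "Decode", ""),           # 2
    ("load", "", ""),             # 3
    ("w1", "MRDY", "w1"),         # 4
    ("", "U", "ifetch"),          # 5
    ("store", "", ""),            # 6
    ("", "", ""),                 # 7
    ("w2", "MRDY", "w2"),         # 8
    ("", "U", "ifetch"),          # 9
    ("loadr", "", ""),            # 10
    ("w3", "MRDY", "w3"),         # 11
    ("", "U", "ifetch"),          # 12
    ("storer", "", ""),           # 13
    ("", "", ""),                 # 14
    ("w4", "MRDY", "w4"),         # 15
    ("", "U", "ifetch"),          # 16
    ("mov", "", ""),              # 17
    ("", "U", "ifetch"),          # 18
    ("add", "", ""),              # 19
    ("", "U", "ifetch"),          # 20
    ("cmp", "U", "ifetch"),       # 21
    ("inc", "", ""),              # 22
    ("", "U", "ifetch"),          # 23
    ("jmp", "testCC", "ifetch"),  # 24
    ("", "U", "ifetch"),          # 25
    ("jal", "", ""),              # 26
    ("", "U", "ifetch"),          # 27
    ("jr", "U", "ifetch"),        # 28
]
LABELS = {label: loc for loc, (label, _, _) in enumerate(CONTROL_STORE) if label}


def clocks(instr, cc=True, wait=0):
    """Count clocks from ifetch until the microprogram returns to ifetch.

    instr: label of the instruction's sequence, e.g. "load".
    cc:    condition-code value seen by testCC (used by "jmp").
    wait:  extra clocks each MRDY state loops on itself (0 = cache hit).
    """
    loc, count, waited = LABELS["ifetch"], 0, 0
    while True:
        count += 1
        _, cond, goto = CONTROL_STORE[loc]
        if cond == "U":
            nxt = LABELS[goto]
        elif cond == "MRDY":
            if waited < wait:        # memory not ready: stay in this state
                waited, nxt = waited + 1, loc
            else:
                waited, nxt = 0, loc + 1
        elif cond == "testCC":       # assumed: jump to Goto when CC is false
            nxt = loc + 1 if cc else LABELS[goto]
        elif cond == "Decode":       # stands in for the opcode jump table
            nxt = LABELS[instr]
        else:
            nxt = loc + 1
        if nxt == LABELS["ifetch"]:
            return count
        loc = nxt


for name in ("load", "store", "loadr", "storer", "add", "cmp", "jal", "jr"):
    print(name, clocks(name))
print("jmp taken", clocks("jmp", cc=True), "not taken", clocks("jmp", cc=False))
```

With wait=0 the counts agree with the "Timing for microprogrammed S1" table (load 6, store 7, jmp 5 when taken and 4 when not taken, and so on); clocks("load", wait=10) gives 26, that is, 10 extra clocks on each of its two memory accesses.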
Each memory read/write step has a "wait for memory ready" state. Because of the cache memory, one can assume zero clocks of waiting for memory ready on a cache hit, and a miss penalty of more than 10 clocks on a miss.
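The effect of these wait states on average timing can be estimated with simple arithmetic. This Python sketch assumes each memory access hits with a hypothetical probability hit_rate, adds no wait clocks on a hit, and adds exactly miss_penalty clocks on a miss; "load" makes two memory accesses (instruction fetch and data read) on top of its 6 clocks. The 95% hit rate and 10-clock penalty are illustrative numbers only, and expected_clocks is a hypothetical name.

```python
def expected_clocks(base_clocks, mem_accesses, hit_rate, miss_penalty):
    """Average clocks per instruction under a simple hit/miss model.

    Assumption: a cache hit adds no wait clocks and every miss adds
    exactly miss_penalty clocks.
    """
    miss_rate = 1.0 - hit_rate
    return base_clocks + mem_accesses * miss_rate * miss_penalty


# "load": 6 clocks when both of its memory accesses hit.
print(expected_clocks(6, 2, hit_rate=1.0, miss_penalty=10))   # 6.0
print(expected_clocks(6, 2, hit_rate=0.0, miss_penalty=10))   # 26.0
print(expected_clocks(6, 2, hit_rate=0.95, miss_penalty=10))  # about 7
```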
For example, instruction fetch starts with
0: MAR = PC
The Dest and Src of the internal bus are MAR and PC; the microprogram then
waits for memory to fill in MDR.
1: MDR = M[MAR] ; MREAD
After the memory cycle has completed,
2: IR = MDR ; PC = PC + 1
the microprogram branches to the sequence for each instruction depending on
IR:OP and IR:XOP (this instruction decoding mechanism is elaborated later).
Suppose the instruction is "load": the microprogram goes to location 3 (load)
and the following sequence occurs:
3: MAR = IR:ADS
then it waits for memory:
4: MDR = M[MAR] ; MREAD
5: R[IR:R0] = MDR
The register is selected by IR:R0, and the Dest and Src of the internal bus
are R and MDR. After completion, the microprogram branches back to instruction
fetch (specified by the next address field). For an ALU instruction,
for example "add", the following sequence occurs after the instruction
fetch, starting at location 19:
19: T = ADD(R[IR:R1], R[IR:R2])
The registers selected by IR:R1 and IR:R2 are read and fed to the ALU, and the
ALU function ADD is activated. The ALU result is latched into the T register.
Then the result is written back to the register selected by IR:R1, and the
microprogram branches back to instruction fetch:
20: R[IR:R1] = T
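The data movement in these sequences can be sketched by executing the RTL steps directly. In this Python sketch the instruction words are hypothetical pre-decoded dicts (the S1 bit layout is not given here), memory is a plain list, register values are unbounded Python ints, and clocks and wait states are ignored; run_one is an illustrative name. Each line carries the microprogram location it mirrors.

```python
def run_one(state, mem):
    """Execute one instruction as the RTL steps of the microprogram.

    state holds PC, IR, MAR, MDR, T and R (the register file). Instructions
    in mem are hypothetical pre-decoded dicts with keys op, r0, r1, r2, ads.
    """
    # instruction fetch
    state["MAR"] = state["PC"]                  # 0: MAR = PC
    state["MDR"] = mem[state["MAR"]]            # 1: MDR = M[MAR] ; MREAD
    state["IR"] = state["MDR"]                  # 2: IR = MDR ; PC = PC + 1
    state["PC"] += 1
    ir, R = state["IR"], state["R"]
    if ir["op"] == "load":
        state["MAR"] = ir["ads"]                # 3: MAR = IR:ADS
        state["MDR"] = mem[state["MAR"]]        # 4: MDR = M[MAR] ; MREAD
        R[ir["r0"]] = state["MDR"]              # 5: R[IR:R0] = MDR
    elif ir["op"] == "add":
        state["T"] = R[ir["r1"]] + R[ir["r2"]]  # 19: T = ADD(R[IR:R1], R[IR:R2])
        R[ir["r1"]] = state["T"]                # 20: R[IR:R1] = T
    else:
        raise NotImplementedError(ir["op"])


mem = [
    {"op": "load", "r0": 1, "ads": 3},          # load R1 <- M[3]
    {"op": "add", "r1": 1, "r2": 2},            # add  R1 <- R1 + R2
    None,
    40,                                         # data word at address 3
]
state = {"PC": 0, "IR": None, "MAR": 0, "MDR": 0, "T": 0, "R": [0, 0, 2, 0]}
run_one(state, mem)
run_one(state, mem)
print(state["R"], state["PC"])                  # [0, 42, 2, 0] 2
```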
In total, the microprogram is 29 words. Each microword is composed of
the control bits that drive the signals in the datapath.
We assign the bits to each field of the microword as follows:
bit 0..4 Dest : 5 bits for writing to R, PC, IR, MAR, MDR.
bit 5..10 Src : 6 bits for reading from R, PC, IR, MAR, MDR, T.
bit 11..14 SelR : 4 bits for selecting IR:R0, IR:R1, IR:R2, IR:R1,R2 (IR:R12).
bit 15..18 ALU : 4 bits for the ALU function: PASS1, ADD, SUB, ADD1.
bit 19..20 Mctl : 2 bits: Mread, Mwrite.
bit 21 Misc : 1 bit for PC + 1.
bit 22..25 Cond : 4 bits for jump control: Uncond, Mrdy, testCC, Decode.
bit 26..30 Goto : 5 bits; the micro store has 29 addresses, therefore 5 bits are needed to address each of them.
So the unencoded microword for S1 is 31 bits long. Instruction decoding, that is, branching to the microprogram sequence of each instruction, can be achieved by concatenating IR:OP with IR:XOP (3 bits and 4 bits) and using the result to index a jump table that contains the location of each sequence in the microprogram.
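The bit assignment and the decode index can be sketched together in Python. The field positions follow the list above; the order of signals inside each field and the exact signal spellings are assumptions, and FIELDS, pack and decode_index are hypothetical names. In the unencoded word each signal gets its own bit, while Goto holds a 5-bit binary address.

```python
# Unencoded S1 microword: field -> (lowest bit, signals in assumed bit order).
FIELDS = {
    "Dest": (0, ["R", "PC", "IR", "MAR", "MDR"]),
    "Src": (5, ["R", "PC", "IR", "MAR", "MDR", "T"]),
    "SelR": (11, ["IR:R0", "IR:R1", "IR:R2", "IR:R12"]),
    "ALU": (15, ["PASS1", "ADD", "SUB", "ADD1"]),
    "Mctl": (19, ["RD", "WR"]),
    "Misc": (21, ["PC+1"]),
    "Cond": (22, ["U", "MRDY", "testCC", "Decode"]),
}
GOTO_LSB, GOTO_BITS = 26, 5          # bits 26..30, so the word is 31 bits


def pack(goto=0, **signals):
    """Build a microword: one bit per asserted signal, Goto as a 5-bit address."""
    if not 0 <= goto < (1 << GOTO_BITS):
        raise ValueError("Goto must fit in 5 bits")
    word = goto << GOTO_LSB
    for field, name in signals.items():
        lsb, names = FIELDS[field]
        word |= 1 << (lsb + names.index(name))
    return word


def decode_index(op, xop):
    """Jump-table index: 3-bit IR:OP concatenated with 4-bit IR:XOP."""
    if not (0 <= op < 8 and 0 <= xop < 16):
        raise ValueError("IR:OP is 3 bits and IR:XOP is 4 bits")
    return (op << 4) | xop           # 7 bits: a 128-entry jump table


print(bin(pack(Dest="MAR", Src="PC")))   # location 0: 0b1001000
print(decode_index(0b101, 0b0011))       # 83
```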
To reduce the width of the microword, each field can be "encoded"
as follows:
Dest : 5 signals, 3 bits.
Src : 6 signals, 3 bits.
SelR : 4 signals, 3 bits (including NONE).
ALU : 4 signals, 3 bits.
Mctl : 2 signals, 2 bits.
Misc : 1 signal, 1 bit.
Cond : 4 signals, 3 bits.
Goto : only 6 distinct locations to jump to (ifetch, w0, w1, w2, w3, w4), hence 3 bits.
In total, the encoded (vertical) microword for S1 is 21 bits long.
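The encoded widths follow from a simple rule: a field with n signals plus one all-zero "none" code needs ceil(log2(n + 1)) bits, which for positive integers equals Python's int.bit_length(). A short Python check, where treating code 0 as "none" in every field is an assumption generalized from the SelR entry:

```python
# Number of signals per field in the encoded (vertical) S1 microword.
SIGNALS = {"Dest": 5, "Src": 6, "SelR": 4, "ALU": 4, "Mctl": 2,
           "Misc": 1, "Cond": 4, "Goto": 6}


def encoded_width(n):
    """Bits for n signals plus a 'none' code: ceil(log2(n + 1)) for n >= 1."""
    return n.bit_length()


widths = {field: encoded_width(n) for field, n in SIGNALS.items()}
print(sum(widths.values()))   # 21
```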
Figure: Scheme for decoding the opcode in ifetch
Figure: Comparing the unencoded and encoded microwords for S1
Timing for microprogrammed S1 (clocks per instruction)
instruction    | clocks
load           | 6
store          | 7
loadr          | 6
storer         | 7
jump uncond    | 5
jump taken     | 5
jump not taken | 4
move           | 5
add            | 5
inc            | 5
cmp            | 4
jal            | 5
jr             | 4
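Comparing the two timing tables shows where microprogrammed control costs extra clocks. The short Python check below uses only instructions listed under the same name in both tables; call/ret versus jal/jr and the unconditional jump entries are left out. Reading the microprogram table, the extra clock on store and storer appears to come from locations 9 and 16, which exist only to branch back to ifetch, because the preceding wait states w2 and w4 already use Cond and Goto to wait on MRDY.

```python
# Clocks per instruction copied from the two timing tables above.
hardwired = {"load": 6, "store": 6, "loadr": 6, "storer": 6,
             "jump taken": 5, "jump not taken": 4, "move": 5,
             "add": 5, "inc": 5, "cmp": 4}
microprogrammed = {"load": 6, "store": 7, "loadr": 6, "storer": 7,
                   "jump taken": 5, "jump not taken": 4, "move": 5,
                   "add": 5, "inc": 5, "cmp": 4}

# Instructions that take longer under microprogrammed control, and by how much.
slower = {name: microprogrammed[name] - hw
          for name, hw in hardwired.items()
          if microprogrammed[name] != hw}
print(slower)  # {'store': 1, 'storer': 1}
```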