Sequential execution
Pipeline (overlapped execution)
Superpipe
Superscalar
Vector
Overlapped execution (pipeline)
Fetch, Decode, Execute
       Fetch, Decode, Execute
Because memory was slow, execution was fetch-limited: the fetch portion
took longer than decode and execute. To increase performance, designers
made each instruction do "more" during execution.
------ Fetch ------ Dec1 Exe1 Dec2 Exe2
                    ------ Fetch ------ Dec1 Exe1 Dec2 Exe2
As a result, CPI is large, and the cycle time is long because complex circuits are required to execute complex instructions. The chance of conflict in the pipeline also increases: an instruction that stays in the pipeline for a long time can interfere with other instructions.
The invention of cache memory greatly reduced fetch time.
Current designs concentrate on reducing CPI and cycle time. By simplifying
the execution of each instruction (and the ISA), the pipeline becomes more
effective and the circuits can be simpler and faster.
Fetch, decode, execute, writeback
       Fetch, decode, execute, writeback
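The effect of overlapping on cycle count can be sketched in a few lines of Python. This is a minimal model, not a description of any real machine: it assumes every stage takes exactly one cycle and that there are no stalls or conflicts. The function names are made up for this illustration.

```python
def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles when each instruction finishes every stage before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles for an ideal pipeline: fill it once, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


# Assumed 4-stage pipeline: fetch, decode, execute, writeback.
for n in (2, 10, 1000):
    seq = sequential_cycles(n, 4)
    pipe = pipelined_cycles(n, 4)
    print(f"n={n}: sequential={seq} cycles, pipelined={pipe} cycles, CPI={pipe / n:.3f}")
```

With two instructions, as in the diagram, the model gives 8 cycles sequentially and 5 when overlapped. As the instruction count grows, the pipelined CPI approaches 1.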
Superpipeline
Once pipelining brings CPI down to 1, the only way to go faster while
still issuing one instruction per clock is to reduce the cycle time. To
make this possible, the pipeline is divided into finer-grained stages,
which reduces the clock time for each stage.
This idea is called "superpipeline".
Fet1, fet2, dec1, dec2, wrt1, wrt2
      Fet1, fet2, dec1, dec2, wrt1, wrt2
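A rough Python sketch of why finer stages help. It assumes an idealized split: cutting every stage into `split` equal parts divides the cycle time by `split`, with no latch overhead and no stalls. Real designs fall short of this. The function name and the numbers are illustrative only.

```python
def pipeline_time_ns(n_instructions: int, n_stages: int,
                     cycle_ns: float, split: int = 1) -> float:
    """Total time when every stage is cut into `split` equal sub-stages.

    Assumption: the split is perfect, so the cycle time shrinks by `split`
    and no latch overhead or stall is added.
    """
    if n_instructions == 0:
        return 0.0
    stages = n_stages * split               # deeper pipeline
    cycle = cycle_ns / split                # shorter clock period
    cycles = stages + (n_instructions - 1)  # fill once, then one result per cycle
    return cycles * cycle


base = pipeline_time_ns(1000, 4, 10.0)           # 4 stages, 10 ns cycle
deep = pipeline_time_ns(1000, 4, 10.0, split=2)  # 8 stages, 5 ns cycle
print(f"pipeline: {base} ns, superpipeline: {deep} ns")
```

In this model CPI stays close to 1 in both cases. The gain comes entirely from the shorter cycle, and the deeper pipeline costs only a few extra cycles to fill.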
Superscalar
To increase performance further, more than one instruction must be issued
per clock. This is called "superscalar".
Fetch, decode, execute, writeback
Fetch, decode, execute, writeback
       Fetch, decode, execute, writeback
       Fetch, decode, execute, writeback
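The cycle count for multiple issue can be sketched the same way. This assumes an ideal machine in which `issue_width` independent instructions enter the pipeline together every cycle, with no dependences or resource conflicts between them. The function name is hypothetical.

```python
import math


def superscalar_cycles(n_instructions: int, n_stages: int, issue_width: int) -> int:
    """Cycles for an ideal superscalar pipeline issuing `issue_width` instructions per clock."""
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / issue_width)  # issue groups entering the pipeline
    return n_stages + (groups - 1)                    # fill once, then one group per cycle


# The diagram above: 4 instructions, 4 stages, 2 issued per clock.
print(superscalar_cycles(4, 4, 2))    # 5 cycles

# A longer run shows CPI dropping below 1.
cycles = superscalar_cycles(100, 4, 2)
print(cycles, cycles / 100)           # 53 cycles, CPI 0.53
```

With an issue width of 1 the function reduces to the ordinary pipeline count, so CPI below 1 only appears once more than one instruction is issued per clock.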
Of course, combining the two (superpipelined superscalar) is also possible.
Fet1, fet2, dec1, dec2, wrt1, wrt2
Fet1, fet2, dec1, dec2, wrt1, wrt2
      Fet1, fet2, dec1, dec2, wrt1, wrt2
      Fet1, fet2, dec1, dec2, wrt1, wrt2
Summary
- Sequential (non-overlapped execution)
- Pipeline (overlapped execution) CPI --> 1
instruction pipeline (single step)
floating-point pipeline (multi step)
The scoreboard and Tomasulo methods are hardware techniques that enable dynamic execution, in which the hardware can reorder instructions to execute according to the resources available.
- Superpipe CPI = 1, reduced cycle time (higher clock rate)
- Superscalar CPI < 1
- Vector machines reduce fetch time and make the pipeline more effective,
but their use is restricted to the class of programs that fit vector
computation.
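The designs in this summary can be compared with the usual performance relation, execution time = instruction count x CPI x cycle time. The sketch below uses that relation. The CPI and cycle-time figures are invented for illustration (assumed ideal values, not measurements), and the function name is hypothetical.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_ns: float) -> float:
    """Execution time = instruction count x cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_ns


# Illustrative design points only: (CPI, cycle time in ns).
designs = {
    "sequential":  (4.0, 10.0),  # four steps per instruction, no overlap
    "pipeline":    (1.0, 10.0),  # CPI --> 1
    "superpipe":   (1.0, 5.0),   # CPI = 1, shorter cycle
    "superscalar": (0.5, 10.0),  # CPI < 1 (two instructions per clock)
}

for name, (cpi, cycle_ns) in designs.items():
    time_ms = cpu_time_ns(1_000_000, cpi, cycle_ns) / 1e6
    print(f"{name:12s} {time_ms:.1f} ms")
```

Under these assumed figures, superpipelining and two-way superscalar issue reach the same execution time by different routes: one halves the cycle time, the other halves the CPI.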