Performance

The first commercial electronic computer appeared around 1950. The first 25 years the performance improvement came mostly from technology and better computer architectures.  But the last 25 years the improvement mostly came from the advent of microelectronics. The speed increased 18-35% per year.

Technology progresses from vacumm tubes to transisters to integrated circuits.  The birth of microprocessor around 1970 has great impact on performance of computers.  The growth of performance has been highest for microprocessors.  Since 1980 the performance double every 2 years.  This phenomena is called Moore's law.  For example, around 1980 the first IBM PC appeared.   Its CPU was a 8088, 16 bits CPU with 8 MHz clock, 16Kbytes of memory, floppy disk and no hard disk.  The later model offer 5Mbytes hard disk (so called IBM XT).  Today PC equipped with Pentium 32 bit CPU 233 MHz clock, 32Mbyte of memory and 1 Gbyte disk. It performance is around 500 times of the first PC.

Performance = how fast a computer can run
performance = response time ( latency)
performance = throughput  (bandwidth)

The fastest machine of today is the ASCI-Red of the deparment of energy, USA.  It is composed of 2048 nodes of Pentium Pro with collective memory of 600 G bytes.  Its peak performance is 1.8 Tflops and it has run 630 Gflops on 3400 nodes  (running simulation of motion of particles).  See detail in H. Karp, E. Lusk and D. Bailey, "1997 Gordon Bell prize winners",IEEE computer magazine Jan 1998, pp.86-92.
 

Measurement of performance

X is n% faster than Y
execution time Y / execution time X  = 1 + n/100
performance = 1/ execution time   (or 1/t)
execution time Y / execution time X = performance X / performance Y
n = (performance X - performance Y) / performance Y

Amdalh's law
speedup = performance with enhancement use /   performance without enhancement use
speedup = execution time without enhancement use /  execution time with enhancement use

if enhancement is used only fraction_enhancement

execution time new = execution time old (  (1-fraction_enhancement) +
  fraction_enhancement/speedup )

speedup overall = 1 / ((1-fraction_enhancement) + fraction_en / speedup )

Therefore the limit is at how much the enhancement has been used. In achieving speedup by parallelization, Amdalh's law predicts that speedup will be limit by the sequential part of the program.  Let see some numerical example.

Example : a computer has an enhancement with 10 times speedup.  That enhancement is used only 40% of the time.  What is the overall speedup?

speedup overall = 1/ ((1-0.4) + 0.4/10 ) = 1.56

Please note that Amdalh's law apply only with the problem of fixed size.  When problem size much much larger than the machine, Amdalh's law doesn't applied.  This is why the massively parallel machine is still possible.

Example : comparing CPU A and CPU B, A with "compare then branch" instruction sequence, B has special combined "compare&branch".  A has 25% faster clock.  For CPU A, 20% of instruction is "branch" and hench another 20% is the accompanied "compare".  "Branch" takes 2 clocks and all other instructions take 1 clock.  "compare&branch" takes 2 clocks. Both cpu run the same program.  Which is faster?

CPU time A = num. of instruction A x CPI A x cycletime A
 = n.o.i A x (( .20 x 2 ) + ( .8 x 1 ) ) x cycletime A
 = 1.2 x n.o.i x cycletime A

"compare" are not executed in CPU B so 20% of 80% = 25% of instructions are now branching taking 2 clocks and the rest 75% take 1 clock.
CPI B = .25 x 2 + .75 x 1 = 1.25
CPU time B = .8 x n.o.i A x 1.25 x 1.25 cycletime A
 = 1.25 n.o.i A x cycletime A

Therefore A, with shorter cycle time, is faster than B, which executes fewer instructions.

Now if the designer reworks CPU B and reduces the clock cycle time so that now A cycle time is only 10% faster.  Which CPU is faster now?

CPU time B = .8 x n.o.i A x 1.25 x 1.1 cycletime A
 = 1.1 n.o.i A x cycletime A

So now CPU B is faster.
 
Calculation of CPI (Some assembly language is required)
In order to appreciate the effect of different instruction set, understanding of assembly language is required.  An example of assembly language programming is illustrated as follows.

Suppose a hypothetical machine has the typical instruction set composed of : load, store, compare, increment, jump condition.  It has an index register (x) and a set of general purpose register (r0..r7).

Find max of array[i], i=1..N

let the array[i] be accessed by load r0,array,x

max     equ ...
array   equ ...

        load x, #1
        load r0, array, x
        store r0, max         ; max = array[1]
        load x, #2            ; x keeps i
loop    cmp x, #N
        jump GT exit          ; i <= N
        load r0, max
        load r1, array, x
        cmp r0, r1            ; max < array[i]
        jump GE skip
        store r1, max
skip    inc x
        jump loop
exit    END

We count the number of instruction being executed to calculate CPI.
let N=3, array[] = 1,2,3
                  frequency     clock
load          3+4                 2
store         1+2                 2
cmp           4                     1
jump         4                     2
inc             3                     1

total  21 instructions, 35 clocks.  CPI = 35/21 = 1.67