to write a word
send address
send data
command WRITE
wait until cycle complete
type of memory cell : RAM, static RAM, dynamic RAM
New memory bus : page mode, nibble mode, RAMBUS, EDO etc.
To increase memory bandwidth (bits/bytes per second)
* reduce cycle time
* increase word size
* concurrent access (banking, interleaving)
memory hierarchy
processor --- cache --- main memory --- secondary memory
principle of locality of references
* temporal
* spatial (array, vector, code segment)
advantage : average memory access time is nearly equal to the speed of the fastest memory, whereas the average unit cost of the memory system approaches the cost of the cheapest memory.
figure graph of access time vs cost of memory
Cache Organisation
* Fully Associative
* Direct Map
* Set Associative
replacement policy
* random
* least frequently used
* least recently used
* optimal
Cache type is classified according to its "organisation". Mostly
they differ in the following manner :
1 where can a block be placed in the upper level?
2 how is a block found if it is in the upper level?
3 which block should be replaced on a miss?
4 what happens on a write?
Q1
if each block has only one place it can appear in a cache. The
cache is called "direct mapped"
if a block can be placed anywhere, it is called "fully associative".
if a block can be placed in a restricted set of places, it is "set-
associative".
Q2
Cache includes an address tag on each block which is used to check
if the required address is in the cache.
Direct mapped cache searches exactly one tag.
Fully associative cache searches all tags at once.
Set associative cache searches one tag from one set but all set at
once.
Q3
replacement policy
direct mapped cache has no choice.
others have random, Least Recently Used, Least Frequently Used, Optimal.
Q4
two options :write through, write back.
associative memory (Content Addressable Memory, CAM)
retrieve "content" of memory by association. "Is there any cell that
has value 105?".
Using conventional memory (RAM) which has to be accessed by address,
to retrieve by association requires scanning the whole content. If
there is no ordering among the content scanning will take O(n). If
some ordering, such as ascending, using binary search will take O(n log
n). But associative memory take O(1).
Example on-chip cache of Intel 486
4 Way set associative, line size 16 bytes, cache size 8Kbytes
Pseudo LRU 3 bits : r0, r1, r2 Set number :
s0, s1, s2, s3
most recent access
set bit
s0 or s1
r0
s0
r1
s2
r2
if r0 r1 = 00 replace s0
if r0 r1 = 11 replace s1
if r0 r2 = 10 replace s2
if r0 r2 = 11 replace s3
Address Trace
To measure the performance of a cache memory, miss ratio is measure
from an address trace. Some problems are presented :
1. the workload on the trace may not be representative
2. the initialisation transient may grossly affect the evaluation
3. the trace may be too short to obtain accurate measurement.
The length of the trace is important for accuracy. Also the concern about initialisation of cache (should or should not take into account for cache miss). If cache size is 1000 lines, assuming 1 byte per line, and miss ratio is 1%, for miss to occur once for every line requires the trace length of 100,000 addresses just to initialise the cache.