Cache memory

General idea

Cache memory is a "small" fast memory that resides "between" main memory and the processor. It is much faster than main memory, so having a cache reduces the average access time. Because cache memory is much smaller than main memory, it holds a "copy" of some part of main memory. When the processor finds the data it needs in the cache, the access is called a "hit"; otherwise the data has to be fetched from main memory, and the access is called a "miss". The goal is to organise the cache so as to maximise the number of hits.

How it works

Main memory can be considered to be divided into many cache-sized blocks (of which the actual cache can keep only a few at any moment). The full address is divided into two parts: the bits that address a location in the cache, and the bits that form the tag. The TAG storage holds the "block number" of the block of memory kept in the cache at that moment. In other words, the TAG is used to remember which block is stored in the cache.

Organisation

Direct Map
Fully Associative
Set Associative

Direct Map

The simplest organisation of the cache is to use part of the address to "directly" select a TAG slot. When an access occurs, the tag bits of the address are compared with the "block number" stored in that slot; if they are the same, the access is a hit, otherwise it is a miss.

Let us illustrate with an example. The main memory is 1G bytes (30-bit address). The cache is 1M bytes (20-bit address). For a direct map cache we need a TAG storage with 1024 slots, each storing a 10-bit "block number". Let the access address be "xyzabcd", where xyz is the leftmost 10 bits and abcd is the remaining 20 bits. "abcd" is used as the address within the 1M cache. "xyz" is the block number, and the top 10 bits of "abcd" select the TAG slot. When an access occurs, the bottom 20 bits of the address go to the cache block, while "xyz" is compared with the block number stored in the selected TAG slot. If they are the same, the access is a hit and the processor can retrieve valid data from the cache. If it is a miss, the data is fetched from main memory and also written to the cache, and the block number in that TAG slot is updated. (A sketch of this lookup in C appears at the end of this section.)

Direct map organisation is economical to implement. However, it is restrictive, because accesses can conflict at the same slot even though other slots are available. To improve this situation, the access to the TAG storage is changed to "associative".

Fully associative

The cache can be used more effectively if the block number can be placed in any TAG slot instead of being forced into one fixed place as in the direct map. To do this the TAG storage must be "associative". This means that accessing the block number is akin to answering the question "does xyz exist in the TAG?", which is the same as using the "content" to address the memory. With an associative TAG storage, the "xyz" part can be placed in any empty slot, so the hit rate is better than with the direct map organisation. The disadvantage is that associative memory is expensive to implement. The third alternative cache organisation is a hybrid of direct map and fully associative.

Set associative

Rather than having the full freedom of associative memory, a number of direct maps can be used in parallel (a so-called set, or n-way). All TAGs are compared in parallel, and there is flexibility in choosing one of n places to store the data. This scheme keeps the simplicity of the direct map while being flexible in using the available slots (up to n choices).
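The following is a minimal sketch in C of the direct map lookup from the example above: 30-bit addresses, a 1M cache split into 1024 slots of 1K bytes, and a TAG storage of 1024 ten-bit block numbers. The names (cache_lookup, tag, valid) and the valid bit itself are illustrative assumptions, not taken from the notes.

#include <stdint.h>
#include <stdio.h>

#define SLOTS 1024                /* number of TAG slots (1K bytes each) */

static uint16_t tag[SLOTS];       /* 10-bit block number kept per slot   */
static int      valid[SLOTS];     /* does the slot hold real data yet?   */

/* Return 1 on a hit, 0 on a miss.  On a miss the block number is
   written into the TAG slot, as described in the text. */
int cache_lookup(uint32_t addr)
{
    uint32_t xyz  = (addr >> 20) & 0x3FF;  /* leftmost 10 bits: block number  */
    uint32_t slot = (addr >> 10) & 0x3FF;  /* top 10 bits of "abcd": TAG slot */

    if (valid[slot] && tag[slot] == xyz)
        return 1;                          /* hit: data comes from the cache  */

    /* miss: fetch from main memory, write to cache, update the TAG */
    tag[slot]   = xyz;
    valid[slot] = 1;
    return 0;
}

int main(void)
{
    printf("%d\n", cache_lookup(0x12345678));  /* first touch: miss (prints 0) */
    printf("%d\n", cache_lookup(0x12345678));  /* same block:  hit  (prints 1) */
    return 0;
}

The same address split generalises to set associative: an index selects a set of n TAGs, all n are compared in parallel, and the block can go into any free way of the set. A sketch under the same assumptions, with an invented (and deliberately simplistic) replacement choice:

#include <stdint.h>

#define SETS 256                      /* 1024 slots arranged as 256 sets */
#define WAYS 4                        /* ... of 4 ways each (4-way)      */

static uint16_t stag[SETS][WAYS];
static int      svalid[SETS][WAYS];

int set_assoc_lookup(uint32_t addr)
{
    uint32_t block = addr >> 10;      /* 20-bit number of the 1K line    */
    uint32_t set   = block % SETS;    /* which set of n slots            */
    uint32_t t     = block / SETS;    /* the rest becomes the tag        */

    for (int w = 0; w < WAYS; w++)    /* hardware compares all n at once */
        if (svalid[set][w] && stag[set][w] == t)
            return 1;                 /* hit */

    for (int w = 0; w < WAYS; w++)    /* miss: use any free way ...      */
        if (!svalid[set][w]) {
            stag[set][w] = t;
            svalid[set][w] = 1;
            return 0;
        }
    stag[set][0] = t;                 /* ... else replace way 0          */
    return 0;
}

With WAYS = 1 this degenerates to the direct map above; with a single set holding all the slots it behaves as fully associative.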
Write policy

write through
write back

When every write to the cache is also written to main memory (write through), consistency between the cache and main memory is simple to keep. This can be important in some applications. However, always having to write to main memory slows down the access.

In the write back scheme, main memory is updated only when the block occupying a slot is replaced on a miss. To do that, an extra bit of memory is needed to keep track of whether a slot in the cache has been written to or not. This is called the "dirty" bit. When a write to the cache occurs, its dirty bit is set. When a miss occurs, the dirty bit is inspected and, if it is set, the content of the cache slot is written back to main memory. This reduces the number of writes to main memory to the minimum. However, the state of main memory and the cache may not be the same at all times.
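To make the dirty bit concrete, here is a sketch of a write under the write back policy, in the same invented style as the lookup above. The helpers write_back_to_memory and fetch_from_memory stand for the traffic between cache and main memory and are assumptions, shown here as empty stubs.

#include <stdint.h>

#define SLOTS 1024

static uint16_t tag[SLOTS];
static int      valid[SLOTS];
static int      dirty[SLOTS];          /* slot written to since it was loaded? */
static uint8_t  data[SLOTS][1024];     /* the cached 1K lines themselves       */

/* stand-ins for the traffic between cache and main memory */
static void write_back_to_memory(uint32_t block, uint32_t slot, uint8_t *line)
{ (void)block; (void)slot; (void)line; /* copy the line out to main memory */ }
static void fetch_from_memory(uint32_t block, uint32_t slot, uint8_t *line)
{ (void)block; (void)slot; (void)line; /* fill the line from main memory   */ }

void cache_write(uint32_t addr, uint8_t value)
{
    uint32_t xyz  = (addr >> 20) & 0x3FF;              /* block number      */
    uint32_t slot = (addr >> 10) & 0x3FF;              /* TAG slot          */

    if (!(valid[slot] && tag[slot] == xyz)) {          /* miss on this slot */
        if (valid[slot] && dirty[slot])                /* old content modified? */
            write_back_to_memory(tag[slot], slot, data[slot]);
        fetch_from_memory(xyz, slot, data[slot]);
        tag[slot]   = xyz;
        valid[slot] = 1;
        dirty[slot] = 0;
    }
    data[slot][addr & 0x3FF] = value;  /* the write touches only the cache ... */
    dirty[slot] = 1;                   /* ... and marks the slot dirty         */
}

Under write through, the store in the last two lines would instead be accompanied by an immediate write to main memory, and no dirty bit would be needed.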