Cache Memory
Summary
This document details cache memory, including its characteristics, operation, structure, and different types of mapping. The document also discusses the specifics of different cache organizations.
Full Transcript
Memory Hierarchy
Characteristics of the hierarchy, moving away from the processor:
- Decreasing cost per bit
- Increasing capacity
- Increasing access time
- Decreasing frequency of access by the processor
The fourth characteristic is met using the principle of locality of reference.

Locality of Reference
Due to the nature of programming, instructions and data tend to cluster together (loops, subroutines, and data structures).
Over a long period of time, the clusters in use will change; over a short period, they tend to stay the same.

Cache
A cache is a small amount of fast memory. What makes small fast?
- Simpler decoding logic
- More expensive SRAM technology
- Close proximity to the processor: the cache sits between normal main memory and the CPU, or may be located on the CPU chip or module

Cache (continued)
Cache operation, overview:
- The CPU requests the contents of a memory location.
- The cache is checked for this data.
- If present, the data is fetched from the cache (fast).
- If not present, one of two things happens: either the required block is read from main memory into the cache and then delivered from the cache to the CPU (cache physically between CPU and bus), or the required block is read from main memory into the cache and simultaneously delivered to the CPU (CPU and cache both receive data from the same data bus buffer).

Cache Structure
The cache includes tags to identify the address of the block of main memory contained in each line of the cache.
- Each word in main memory has a unique n-bit address.
- There are M = 2^n / K blocks of K words in main memory.
- The cache contains C lines, numbered 0 to C-1, each holding a block of K words plus a tag uniquely identifying that block.

Cache Structure Example
Main memory (RAM) of 1 MB with 1-byte words: 20-bit addresses.
Blocks of 1 KB: 10-bit address within a block.
Number of blocks in RAM = 1 MB / 1 KB = 1K (1024 blocks), each block 1 KB.
If the cache contains 16 lines of 1 KB (1K words) each, i.e. a 16 KB cache, then the cache holds 16 blocks (16 lines) while the RAM holds 1024 blocks.
Address split: tag = 6 bits, index = 4 bits, offset = 10 bits.

Intel x86 caches
80386 – no on-chip cache
80486 – 8 KB, using 16-byte lines
and a four-way set associative organization (main memory had 32 address lines – 4 GB)

Pentium (all versions)
Two on-chip L1 caches: one for data, one for instructions.

Pentium 4 L1 and L2 Caches
L1 cache: 8 KB, 64-byte lines, four-way set associative.
L2 cache: feeds both L1 caches; 256 KB, 128-byte lines, eight-way set associative.

Pentium 4 Design Reasoning
Instructions are decoded into RISC-like micro-ops before the L1 cache.
Micro-ops are fixed length, which suits superscalar pipelining and scheduling.
Pentium instructions are long and complex; performance is improved by separating decoding from scheduling and pipelining.

PowerPC Cache Organization
601 – single 32 KB cache, eight-way set associative
603 – 16 KB (2 x 8 KB), two-way set associative
604 – 32 KB
610 – 64 KB
G3 & G4 – 64 KB L1 cache, eight-way set associative; 256 KB, 512 KB, or 1 MB L2 cache, two-way set associative

Comparison of Cache Sizes (Table 4.3)

Memory Divided into Blocks
(Figure: main memory addresses 0 through 2^n – 1 grouped into blocks of K words each.)

Cache Design
Design elements: size, mapping function, replacement algorithm, write policy, block size, number of caches.

Cache Size
Cost: more cache is expensive.
Speed: more cache is faster (up to a point), but larger decoding circuits slow a cache down.
An algorithm is needed for mapping main memory addresses to lines in the cache.
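The tag/index/offset split from the 1 MB RAM / 16 KB cache example earlier can be sketched in Python (a minimal sketch: the field widths come from that example, and the function name is our own):

```python
# 1 MB main memory (20-bit addresses), 1 KB blocks, and a
# cache of 16 lines of 1 KB, as in the example above.
OFFSET_BITS = 10                          # 1 KB block -> 10-bit offset
INDEX_BITS = 4                            # 16 lines  -> 4-bit index
TAG_BITS = 20 - INDEX_BITS - OFFSET_BITS  # remaining 6 bits are the tag

def split_address(addr):
    """Split a 20-bit address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x5A3C7))  # → (22, 8, 967)
```

The index selects a cache line and the stored tag is compared against the address tag to detect a hit.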
This mapping takes more time than a simple direct RAM access.

Typical Cache Organization (figure)

Mapping Functions
A mapping function is the method used to locate a memory address within a cache.
It is used when copying a block from main memory to the cache, and again when trying to retrieve data from the cache.
There are three kinds of mapping functions: direct, associative, and set associative.

Cache Example
Example of mapping:
- Cache size: 64 KB.
- Block size: 4 bytes, so the cache has 16K (2^14) lines of 4 bytes each.
- Address bus: 24 bits, i.e. 16 MB of main memory divided into 4M blocks of 4 bytes.
- RAM (16 MB): 4M blocks of 4 bytes each. Cache (64 KB): 16K lines.
- Address split: tag = 8 bits, index = 14 bits, offset = 2 bits (24 bits total).

Direct Mapping Traits
Each block of main memory maps to only one cache line; i.e. if a block is in the cache, it will always be found in the same place.
The line number is calculated with
i = j modulo m
where i = cache line number, j = main memory block number, and m = number of lines in the cache.
(Cache line number = RAM block number MOD number of cache lines.)

Direct Mapping Address Structure
Each main memory address can be divided into three fields.
The least significant w bits identify a unique word within a block.
The remaining s bits specify which block in memory.
These s bits are divided into two fields:
- The least significant r bits identify which line (row) in the cache.
- The most significant s – r bits form the tag that uniquely identifies the block held in that line of the cache.
Address layout: Tag (s – r bits) | Line (r bits) | Word (w bits), where the word field is the offset into the block.

Direct Mapping Address Structure Example
For the 24-bit address (16 MB RAM) example: Tag s – r = 8 bits, Line r = 14 bits, Word w = 2 bits.
- 2-bit word identifier (4-byte blocks): address of the word within the block.
- 22-bit block identifier, of which the low 14 bits give the cache line address.
- 8-bit tag (= 22 – 14).
No two blocks that map to the same line have the same tag.
Check the contents of the cache by finding the line and comparing the tag.

Direct Mapping Cache Line Table
Cache line | Main memory blocks held
0 | 0, m, 2m, 3m, …, 2^s – m
1 | 1, m+1, 2m+1, …, 2^s – m + 1
…
m–1 | m–1, 2m–1, 3m–1, …, 2^s – 1
(Example: m lines in the cache, e.g. m = 20; 1024 blocks in the RAM.)

Direct Mapping Cache Organization (figure)

Direct Mapping Examples
What cache line number will the following addresses be stored to, and what will the minimum and maximum address of each block they are in be, if we have a cache with 4K lines of 16 words per block in a 256 MB memory space (28-bit address)?
- RAM: 256 MB (28-bit addresses); cache: 64 KB.
- Number of lines = 4K (2^12); one line (block) = 16 words (bytes), so a 4-bit word offset.
- Address split: Tag s – r = 28 – 12 – 4 = 12 bits, Line r = 12 bits, Word w = 4 bits.
Cache line number = RAM block number MOD number of cache lines.
a.) 0x9ABCDEF: block number = 0x9ABCDE; line = 0x9ABCDE mod 2^12 = 0xCDE = 3294.
b.) 0x1234567: block number = 0x123456; line = 0x456 = 1110.
Assume that a portion of the tags in the cache in our example looks like the table below. Which of the following addresses are contained in the cache?
a.) 0x438EE8 b.) 0xF18EFF c.) 0x6B8EF5 d.)
0xAD8EF3
Tag (binary) | Line number (binary) | Addresses within block: 00, 01, 10, 11
0101 0011 | 1000 1110 1110 10
1110 1101 | 1000 1110 1110 11
1010 1101 | 1000 1110 1111 00
0110 1011 | 1000 1110 1111 01
1011 0101 | 1000 1110 1111 10
1111 0001 | 1000 1110 1111 11

Direct Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line width = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = m = 2^r
Size of tag = (s – r) bits

Direct Mapping Pros & Cons
Simple and inexpensive.
Fixed location for a given block: if a program repeatedly accesses two blocks that map to the same line, cache misses are very high (thrashing).

Associative Mapping Traits
A main memory block can load into any line of the cache.
The memory address is interpreted as:
- Least significant w bits = word position within the block
- Most significant s bits = tag used to identify which block is stored in a particular line of the cache
Every line's tag must be examined for a match, so cache searching gets expensive and slower.

Associative Mapping Address Structure Example
Tag = s bits (22 in the example), Word = w bits (2 in the example).
A 22-bit tag is stored with each 32-bit block of data.
Compare the address's tag field with each tag entry in the cache to check for a hit.
The least significant 2 bits of the address identify which of the four 8-bit words is required from the 32-bit data block.

Fully Associative Cache Organization (figure)

Fully Associative Mapping Example
Assume that a portion of the tags in the cache in our example looks like the table below. Which of the following addresses are contained in the cache?
a.) 0x438EE8 b.) 0xF18EFF c.) 0x6B8EF3 d.)
0xAD8EF3
Tag (binary) | Addresses within block: 00, 01, 10, 11
0101 0011 1000 1110 1110 10
1110 1101 1100 1001 1011 01
1010 1101 1000 1110 1111 00
0110 1011 1000 1110 1111 11
1011 0101 0101 1001 0010 00
1111 0001 1000 1110 1111 11

Associative Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = undetermined (any block can go in any line)
Size of tag = s bits

Set Associative Mapping Traits
Address length is s + w bits.
The cache is divided into a number of sets, v = 2^d.
k blocks/lines can be contained within each set; k lines per set is called k-way set associative mapping.
Number of lines in the cache = v · k = k · 2^d.
Size of tag = (s – d) bits.

Set Associative Mapping Traits (continued)
A hybrid of direct and associative mapping:
- If k = 1, this is basically direct mapping.
- If v = 1, this is associative mapping.
Each set contains a number of lines: the total number of lines divided by the number of sets.
A given block maps to any line within its specified set, e.g. block B can be in any line of set i.
Two lines per set is the most common organization, called 2-way set associative mapping: a given block can be in one of 2 lines in only one specific set.
This is a significant improvement over direct mapping.

K-Way Set Associative Cache Organization (figure)

How does this affect our example?
Let's go to two-way set associative mapping.
This divides the 16K lines into 8K sets, which requires a 13-bit set number.
With 2 word bits, this leaves 9 bits for the tag.
Blocks beginning at addresses 0x000000, 0x008000, 0x010000, 0x018000, 0x020000, 0x028000, etc. map to the same set, set 0.
Blocks beginning at addresses 0x000004, 0x008004, 0x010004, 0x018004, 0x020004, 0x028004, etc. map to the same set, set 1.

Set Associative Mapping Address Structure
Tag = 9 bits, Set = 13 bits, Word = 2 bits.
Note that there is one more bit in the tag than for this same example using direct mapping.
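The set-number computation for this two-way example can be sketched in Python (a minimal sketch: the field widths are the 2/13/9 split above, and the function name is our own):

```python
# 2-way set associative example above: 24-bit addresses, 4-byte
# blocks (2 word bits), 8K sets (13 set bits), leaving a 9-bit tag.
WORD_BITS = 2
SET_BITS = 13

def set_and_tag(addr):
    """Return (set number, tag) for a 24-bit address."""
    block = addr >> WORD_BITS              # drop the word-offset bits
    set_number = block & ((1 << SET_BITS) - 1)
    tag = block >> SET_BITS
    return set_number, tag

# Blocks starting 0x8000 apart share a set and differ only in tag:
print([set_and_tag(a) for a in (0x000000, 0x008000, 0x010000)])
# → [(0, 0), (0, 1), (0, 2)]
```

Within a set, the block may sit in either of the two lines, so both stored tags are compared on a lookup.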
Each set holds two lines, so it is 2-way set associative.
Use the set field to determine which cache set to look in, then compare the tag field to see if we have a hit.

Set Associative Mapping Example
For each of the following addresses, answer the following questions based on a 2-way set associative cache with 4K lines, each line containing 16 words, with a main memory of size 256 MB (28-bit address):
- What cache set number will the block be stored to?
- What will its tag be?
- What will the minimum and maximum address of the block it is in be?
1. 0x9ABCDEF
2. 0x1234567
Address split: Tag (s – d) = 13 bits, Set (d) = 11 bits, Word (w) = 4 bits.

Set Associative Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines per set = k
Number of sets = v = 2^d
Number of lines in cache = k · v = k · 2^d
Size of tag = (s – d) bits

Replacement Algorithms
There must be a method for selecting which line in the cache is going to be replaced when there's no room for a new line.
The algorithm is implemented in hardware, for speed.
Direct mapping needs no replacement algorithm: each block maps to only one line, so that line is replaced.

Associative & Set Associative Replacement Algorithms
Least recently used (LRU): replace the block that hasn't been touched in the longest period of time.
Two-way set associative simply uses a USE bit.
When one block is referenced, its USE bit is set while its partner's in the set is cleared.
First in, first out (FIFO): replace the block that has been in the cache longest.

Associative & Set Associative Replacement Algorithms (continued)
Least frequently used (LFU): replace the block which has had the fewest hits.
Random: only slightly lower performance than the use-based algorithms (LRU, FIFO, and LFU).

Writing to Cache
A cache block must not be overwritten unless main memory is up to date.
Two main problems:
- If the cache is written to, main memory becomes invalid; if main memory is written to, the cache becomes invalid. This can occur if I/O can address main memory directly.
- Multiple CPUs may have individual caches; once one cache is written to, all the other caches are invalid.

Write Through
All writes go to main memory as well as to the cache.
Multiple CPUs can monitor main memory traffic to keep their local caches up to date.
Drawbacks: lots of memory traffic, and writes are slowed down.

Write Back
Updates are initially made in the cache only.
An update bit for the cache slot is set when an update occurs.
If a block is to be replaced, it is written back to main memory only if its update bit is set.
Drawbacks: other caches can get out of sync, and I/O must access main memory through the cache.
Research shows that about 15% of memory references are writes.
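The write-back bookkeeping described above can be sketched in Python (a minimal sketch of a single cache line: the class and function names are our own, and reads/allocation are omitted):

```python
# Write back: writes update only the cache and set an update ("dirty")
# bit; main memory sees the data only when a dirty line is evicted.
class WriteBackLine:
    def __init__(self):
        self.block = None    # block number currently cached
        self.data = None
        self.dirty = False   # the "update bit"

def write(line, block, data, memory):
    """Write one block's data through a single write-back cache line."""
    if line.block not in (None, block) and line.dirty:
        memory[line.block] = line.data   # write back on eviction only
    line.block, line.data, line.dirty = block, data, True

memory = {}
line = WriteBackLine()
write(line, 0, "A", memory)  # cached only; no memory traffic
write(line, 0, "B", memory)  # still no memory traffic
write(line, 1, "C", memory)  # evicts block 0 -> memory[0] == "B"
print(memory)                # → {0: 'B'}
```

Note that main memory never sees the intermediate value "A": repeated writes to the same block cost no memory traffic, which is the policy's advantage over write through.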