Chapter 4 Cache Memory PDF

# Chapter 4 Cache Memory ## Table 4.1 Key Characteristics of Computer Memory Systems | Location | Performance | Physical Type | Physical Characteristics | Organization | | --------------------------- | ------------------------ | ------------------- | ------------------------- | ------------------- | | Internal (e.g. processor registers, cache, main memory) | Access time | Semiconductor | Volatile/nonvolatile | Memory modules | | External (e.g. optical disks, magnetic disks, tapes) | Cycle time | Magnetic | Erasable/nonerasable | | | | Transfer rate | Optical | | | | | | Magneto-optical | | | | Capacity | | | | | | Number of words | | | | | | Number of bytes | | | | | | Unit of Transfer | | | | | | Word | | | | | | Block | | | | | | Access Method | | | | | | Sequential | | | | | | Direct | | | | | | Random | | | | | | Associative | | | | | ## Characteristics of Memory Systems - **Location** - Refers to whether memory is internal and external to the computer - Internal memory is often equated with main memory - Processor requires its own local memory, in the form of registers - Cache is another form of internal memory - External memory consists of peripheral storage devices that are accessible to the processor via I/O controllers - **Capacity** - Memory is typically expressed in terms of bytes - **Unit of transfer** - For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module ## Method of Accessing Units of Data - **Sequential access** - Memory is organized into units of data called records - Access must be made in a specific linear sequence - Access time is variable - **Direct access** - Involves a shared read-write mechanism - Individual blocks or records have a unique address based on physical location - Access time is variable - **Random access** - Each addressable location in memory has a unique, physically wired-in addressing mechanism - The time to access a given location is independent of the sequence of prior accesses and is constant - Any location can be selected at random and directly addressed and accessed - Main memory and some cache systems are random access - **Associative** - A word is retrieved based on a portion of its contents rather than its address - Each location has its own addressing mechanism and retrieval time is constant independent of location or prior access patterns - Cache memories may employ associative access ## Capacity and Performance: - The two most important characteristics of memory - Three performance parameters are used: - **Access time (latency)** - For random-access memory, it is the time it takes to perform a read or write operation - For non-random-access memory, it is the time it takes to position the read-write mechanism at the desired location - **Memory cycle time** - Access time plus any additional time required before the second access can commence - Additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively - Concerned with the system bus, not the processor - **Transfer rate** - The rate at which data can be transferred into or out of a memory unit - For random-access memory, it is equal to 1/(cycle time) ## Memory - The most common forms are: - Semiconductor memory - Magnetic surface memory - Optical - Magneto-optical - Several physical characteristics of data storage are important - **Volatile memory** - Information decays naturally or is lost when electrical power is switched off - **Nonvolatile memory** - Once recorded, information remains without deterioration until deliberately changed - No electrical power is needed to retain information - **Magnetic-surface memories** - Are nonvolatile - **Semiconductor memory** - May be either volatile or nonvolatile - Nonerasable memory - Cannot be altered, except by destroying the storage unit - Semiconductor memory of this type is known as read-only memory (ROM) - For random-access memory, the organization is a key design issue - Organization refers to the physical arrangement of bits to form words ## Memory Hierarchy - Design constraints on a computer's memory can be summed up by three questions: - How much, how fast, how expensive - There is a trade-off among capacity, access time, and cost - Faster access time, greater cost per bit - Greater capacity, smaller cost per bit - Greater capacity, slower access time - The way out of the memory dilemma is not to rely on a single memory component or technology, but to employ a memory hierarchy ## Figure 4.1 The Memory Hierarchy A pyramid is illustrated with the following labels from top to bottom. - Registers - Cache - Main memory - Inboard memory - Outboard storage - Off-line storage - Magnetic tape - Magnetic disk - CD-ROM - CD-RW - DVD-RW - DVD-RAM - Blu-Ray - There are hand-written notes on this figure: - The top of the figure has this note: "Faster and more expensive" - The left hand side of the pyramid has this note: "Slower than" - The upper right hand side of the pyramid has this note: "Slower than" - The lower right hand side of the pyramid has this note: "Slower than" - The bottom of the pyramid has this note: "The price?" ## Figure 4.2 Performance of a Simple Two-Level Memory - A graphic is illustrated showing the relationship between the average access time and fraction of accesses involving only level 1 (hit ratio) of a simple two-level memory. - Level 1 and Level 2 are labeled on the graph. The value of the average access time is higher when only level 1 is involved, and the value of the average access time is lower as the fraction of accesses involving only level 1 increases. ## Memory - At previous chapter we said cache memory has levels - The use of three levels exploits the fact that semiconductor memory comes in a variety of types which differ in speed and cost - Data are stored more permanently on external mass storage devices - External, nonvolatile memory is also referred to as secondary memory or auxiliary memory ## Disk cache - A portion of main memory can be used as a buffer to hold data temporarily that is to be read out to disk - A few large transfers of data can be used instead of many small transfers of data - Data can be retrieved rapidly from the software cache rather than slowly from the disk ## Figure 4.3 Cache and Main Memory - Two diagrams are illustrated: - **Single cache diagram:** - A CPU is connected to a Cache and Main Memory. - Data goes from CPU to Cache, and Cache to Main Memory. - *Word transfer and Block transfer are labeled on the diagram.* Data is transferred from CPU to Cache as a Word Transfer (fast), and from Cache to Main Memory as a Block Transfer (slow). - **Three-level cache organization diagram:** - A CPU is connected to Level 1 cache, Level 2 cache, Level 3 cache, and Main Memory. - *The speed of the transfer is labeled from fastest to slowest.* Data transfers from CPU to Level 1 cache as fastest and slowest from Level 3 cache to Main Memory. - There are handwritten notes on this figure: - Next to the Single cache diagram: "Smallest in size, but the fastest, the largest in size, and the slowest" - Next to the Three-level cache organization diagram: "Customarily" ## Figure 4.4 Cache/Main-Memory Structure - Two diagrams are illustrated: - **Cache diagram:** - A table of cache blocks is shown. Each block contains line numbers, a tag, and a block. - *The direction of data transfer and question are labeled.* Data transfers from Main Memory to cache with a question: "Does the data exist first?". - **Main memory diagram:** - A table of main memory blocks is shown. Each block contains a word length, 2n-1, and a word. ## Figure 4.5 Cache Read Operation - A flowchart is illustrated showing the steps involved in a cache read operation: - Start - Receive address RA from CPU - Is block containing RA in cache? - Yes: - Fetch RA word and deliver to CPU - Done - No: - Access main memory for block containing RA - Allocate cache line for main memory block - Load main memory block into cache line - Deliver RA word to CPU ## Figure 4.6 Typical Cache Organization - A schematic is illustrated of cache organization, showing the following components and connections: - Processor - Control - Cache - Control - Data - System Bus - Address buffer (Data is stored in this buffer) - Data buffer (Data is stored in this buffer) ## Figure 4.7 Logical and Physical Caches - Two diagrams are illustrated: - **Logical cache diagram:** - A processor is connected to a Cache and Main Memory via an MMU. - Logical address is received by the processor. - Data transfer follows this path: Processor receives Logical address, MMU converts it to physical address, Cache stores data, and Main Memory is connected to the Cache. - **Physical cache diagram:** - A processor is connected to Cache and Main Memory via an MMU. - Logical address is received by the processor. - Data transfer follows this path: Processor receives Logical address, MMU converts it to physical address, Cache stores data, and Main Memory is connected to the Cache. - There are handwritten notes on this figure: - Next to the Logical cache diagram: "It's checked if the info exists within the data" - Next to the Physical cache diagram: "What is it?" ## Mapping Function - Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines - Three techniques can be used: - **Direct** - The simplest technique - Maps each block of main memory into only one possible cache line - **Associative** - Permits each main memory block to be loaded into any line of the cache - The cache control logic interprets a memory address simply as a Tag and a Word field - To determine whether a block is in the cache, the cache control logic must simultaneously examine every line's Tag for a match - **Set Associative** - A compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages ## Figure 4.8 Mapping From Main Memory to Cache: Direct and Associative - Two diagrams are illustrated: - **Direct mapping diagram:** - A table of main memory blocks is shown, with "First m blocks of main memory (equal to size of cache)" labeled. - "b" is labeled for each block, which represents the length of block in bits. - A table of cache memory blocks is shown. "b" is labeled for each block. - Data flows from the main memory blocks table to the cache memory blocks table. - "t" is labeled, which represents the length of tag in bits. - **Associative mapping diagram:** - A single main memory block with "b" label, which represents the length of block in bits is shown. - A table of cache memory is shown, with "b" label for each block. - Data flow from the main memory block to the cache memory table. - "t" is labeled, which represents the length of tag in bits. - There are handwritten notes on this figure: - Next to the Associative mapping diagram: "This picture is easier to understand" ## Figure 4.9 Direct-Mapping Cache Organization - A schematic is illustrated showing the direct mapping cache organization. The following components are shown: - Memory address (Tag, Line, and Word are labeled) - Cache (Tag and Data are labeled) - Main Memory - Compare (1 if match, 0 if no match) - Block - *Word transfer and Block transfer are labeled on the diagram.* Data goes from CPU to Cache as a Word Transfer and from Cache to Main Memory as a Block Transfer. - There are handwritten notes on this figure: - Next to the Block: "Data set" ## Figure 4.10 Direct Mapping Example - A diagram is illustrated showing the direct mapping example, showing the following components: - Main Memory (00000000:0000:0000:0000 is labeled) - Cache (00, 16, FF, 00, 16, FF are labeled) - Data (13579246, 77777777, 11235813, FEDCBA98, 12345678, 11223344, 24682468) - Tag (hex) - (00, 00, 16, 16, 16, 16, FF, FF, FF, FF) - Line Number - (0000, 0001, 0CE7, 3FFF, 3FFE, 3FFF) - There are handwritten notes on this figure: - Next to the Main Memory address: "I understand" ## Direct Mapping Summary - Address length = (s + w) bits - Number of addressable units = 2s+w words or bytes - Block size = line size = 2w words or bytes - *Number of blocks in main memory = 2s+w/2w = 2s* - Number of lines in cache = m = 2r - Size of tag = (s – r) bits ## Victim Cache - Originally proposed as an approach to reduce the conflict misses of direct mapped caches without affecting its fast access time - Fully associative cache - Typical size is 4 to 16 cache lines - Residing between direct mapped L1 cache and the next level of memory ## Figure 4.11 Fully Associative Cache Organization - A schematic is illustrated showing the fully associative cache organization. The following components are shown: - Memory address (Tag and Word are labeled) - Cache (Tag and Data are labeled) - Main Memory - Compare (1 if match, 0 if no match) - *Word transfer and Block transfer are labeled on the diagram.* Data goes from CPU to Cache as a Word Transfer and from Cache to Main Memory as a Block Transfer. ## Figure 4.12 Associative Mapping Example - A diagram is illustrated showing the associative mapping example, showing the following components: - Main Memory (00000000:0000:00:00:0000:0:0:0:0000 is labeled) - Cache (058CE6, 058CE7, 058CE8, 3FFFFD, 3FFFFE, 3FFFFF) - Tag (hex) - (000000, 000001, 058CE6, 058CE7, 058CE8, 3FFFFD, 3FFFFE, 3FFFFF) - Line Number - (0000, 0001, 0CE7, 3FFD, 3FFE, 3FFF) - There are handwritten notes on this figure: - Next to the Main Memory address: "One of the parts" ## Associative Mapping Summary - Address length = (s + w) bits - Number of addressable units = 2s+w words or bytes - Block size = line size = 2w words or bytes - Number of blocks in main memory = 2s+w/2w = 2s - Number of lines in cache = undetermined - Size of tag = s bits ## Set Associative Mapping - Compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages - Cache consists of a number of sets - Each set contains a number of lines - A given block maps to any line in a given set - e.g. 2 lines per set - 2 way associative mapping - A given block can be in one of 2 lines in only one set ## Figure 4.13 Mapping From Main Memory to Cache: k-way Set Associative - Two diagrams are illustrated: - **v associative-mapped caches diagram:** - A table of main memory blocks is shown, with "First v blocks of main memory (equal to number of sets)" labeled. - "b" is labeled for each block, which represents the length of block in bits. - A table of cache memory blocks is shown. "b" is labeled for each block. - Data flows from the main memory blocks table to the cache memory blocks table. - "t" is labeled, which represents the length of tag in bits. - **k direct-mapped caches diagram:** - A table of main memory blocks is shown, with "First v blocks of main memory (equal to number of sets)" labeled. - "b" is labeled for each block, which represents the length of block in bits. - A table of cache memory blocks is shown. "b" is labeled for each block. - Data flows from the main memory blocks table to the cache memory blocks table. - "t" is labeled, which represents the length of tag in bits. ## Figure 4.14 k-Way Set Associative Cache Organization - A schematic is illustrated showing the k-way set associative cache organization. The following components are shown: - Memory address (Tag, Set, and Word are labeled) - Cache (Tag, Data, Set 0, and Set 1 are labeled) - Main Memory - Compare (1 if match, 0 if no match) - *Word transfer and Block transfer are labeled on the diagram.* Data goes from CPU to Cache as a Word Transfer and from Cache to Main Memory as a Block Transfer. ## Set Associative Mapping Summary - Address length = (s + w) bits - Number of addressable units = 2s+w words or bytes - Block size = line size = 2w words or bytes - Number of blocks in main memory = 2s+w/2w = 2s - Number of lines in set = k - Number of sets = v = 2d - Number of lines in cache = m = kv = k * 2d - Size of cache = k * 2d+w words or bytes - Size of tag = (s – d) bits ## Figure 4.15 Two-Way Set Associative Mapping Example - A diagram is illustrated showing the two-way set associative mapping example. The following components are shown: - Main Memory (00000000:0000:0000:0000 is labeled) - Cache (000, 02C, 1FF, 000, 02C, 1FF are labeled) - Tag - (000, 000, 02C, 02C, 02C, 02C, 1FF, 1FF) - Set - (0000, 0001, 0CE7, 1FFE, 1FFF) - Data - (13579246, 77777777, 11235813, FEDCBA98, 12345678, 11223344, 24682468, 77777777, 11235813, FEDCBA98, 12345678, 11223344, 24682468) ## Figure 4.16 Varying Associativity Over Cache Size - A graph is illustrated showing the hit ratio for varying associativity over different cache sizes: - The x-axis is labeled "Cache size (bytes)": 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1M - The y-axis is labeled "Hit ratio": 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 - The following labels are used for different associativity: - Direct - 2-way - 4-way - 8-way - 16-way ## Figure 4.17 Total Hit Ratio (L1 and L2) For 8 Kbyte and 16 Kbyte L1 - A graph is illustrated showing the total hit ratio for L1 and L2 caches for 8 Kbyte and 16 Kbyte L1 sizes: - The x-axis is labeled "L2 Cache size (bytes)": 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1M, 2M - The y-axis is labeled "Hit ratio": 0.78, 0.80, 0.82, 0.84, 0.86, 0.88, 0.90, 0.92, 0.94, 0.96, 0.98 - L1= 16K and L1= 8K are labeled on the graph. ## Unified Versus Split Caches - Has become common to split cache: - One dedicated to instructions - One dedicated to data - Both exist at the same level, typically as two L1 caches - Advantages of unified cache: - Higher hit rate - Balances load of instruction and data fetches automatically - Only one cache needs to be designed and implemented - Trend is toward split caches at the L1 and unified caches for higher levels - Advantages of split cache: - Eliminates cache contention between instruction fetch/decode unit and execution unit - Important in pipelining ## Summary ### Chapter 4 - Computer memory system overview - Characteristics of Memory Systems - Memory Hierarchy - Cache memory principles ### Cache Memory - Elements of cache design - Cache addresses - Cache size - Mapping function - Replacement algorithms - Write policy - Line size - Number of caches ## Figure 4.18 Cache Performance Evaluation - A diagram is illustrated with the following labels: - *Memory hierarchy:* - L1 cache - L2 cache - Main Memory - *Performance metrics:* - Hit ratio - Miss ratio - Miss penalty - Average access time - *Cache design parameters:* - Cache size - Block size - Associativity - Replacement policy - Write policy - There are hand-written notes on this figure: - "This is cache design" - "This is cache performance" - "This is cache parameters" ## Figures 34-50 Set Associative Mapping - This section of the document is missing. Based on the included information, the section would likely be a detailed discussion of Set Associative Mapping and its advantages, implementation details, and code examples. ## Figure 62 - 65 Write Policy - This section of the document is missing. Based on the included information, the section would likely be a detailed discussion of Write Policy and its advantages, implementation details, and code examples. ## Figure 66 Line Size - This section of the document is missing. Based on the included information, the section would likely be a detailed discussion of Line Size and its advantages, implementation details, and code examples. ## Figure 67 Multilevel Caches - This section of the document is missing. Based on the included information, the section would likely be a detailed discussion of Multilevel Caches and its advantages, implementation details, and code examples. ## Figure 68 - This section of the document is missing. Based on the included information, the section would likely be a detailed discussion of the total hit ratio (L1 and L2) for 8 kbyte and 16 kbyte L1 sizes. The figure would likely show the relationship between L2 cache size and total hit ratio for different L1 cache configuration.

Chapter 4 Cache Memory PDF

Document Details

Tags

Related

Summary

Full Transcript

Upgrade to continue