Chapter 4 Cache Memory
William Stallings
Summary
This document outlines the concepts of cache memory in computer architecture. It details the characteristics of computer memory systems, various access methods, and the performance parameters associated with memory systems. The document also briefly describes different memory types and the fundamental concepts of cache memory, including its organization and principles.
Full Transcript
# Chapter 4 Cache Memory

## Table 4.1 Key Characteristics of Computer Memory Systems

| Characteristic | Values |
| -------------- | ------ |
| Location | Internal (e.g. processor registers, cache, main memory); external (e.g. optical disks, magnetic disks, tapes) |
| Capacity | Number of words; number of bytes |
| Unit of Transfer | Word; block |
| Access Method | Sequential; direct; random; associative |
| Performance | Access time; cycle time; transfer rate |
| Physical Type | Semiconductor; magnetic; optical; magneto-optical |
| Physical Characteristics | Volatile/nonvolatile; erasable/nonerasable |
| Organization | Memory modules |

## Characteristics of Memory Systems

- **Location**
  - Refers to whether memory is internal or external to the computer
  - Internal memory is often equated with main memory
  - The processor requires its own local memory, in the form of registers
  - Cache is another form of internal memory
  - External memory consists of peripheral storage devices that are accessible to the processor via I/O controllers
- **Capacity**
  - Memory capacity is typically expressed in bytes
- **Unit of transfer**
  - For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module

## Method of Accessing Units of Data

- **Sequential access**
  - Memory is organized into units of data called records
  - Access must be made in a specific linear sequence
  - Access time is variable
- **Direct access**
  - Involves a shared read-write mechanism
  - Individual blocks or records have a unique address based on physical location
  - Access time is variable
- **Random access**
  - Each addressable location in memory has a unique, physically wired-in addressing mechanism
  - The time to access a given location is independent of the sequence of prior accesses and is constant
  - Any location can be selected at random and directly addressed and accessed
  - Main memory and some cache systems are random access
- **Associative**
  - A word is retrieved based on a portion of its contents rather than its address
  - Each location has its own addressing mechanism, and retrieval time is constant, independent of location and prior access patterns
  - Cache memories may employ associative access

## Capacity and Performance

- The two most important characteristics of memory
- Three performance parameters are used:
  - **Access time (latency)**
    - For random-access memory, the time it takes to perform a read or write operation
    - For non-random-access memory, the time it takes to position the read-write mechanism at the desired location
  - **Memory cycle time**
    - Access time plus any additional time required before a second access can commence
    - Additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively
    - Concerned with the system bus, not the processor
  - **Transfer rate**
    - The rate at which data can be transferred into or out of a memory unit
    - For random-access memory, it is equal to 1/(cycle time)

## Memory

- The most common forms are:
  - Semiconductor memory
  - Magnetic surface memory
  - Optical
  - Magneto-optical
- Several physical characteristics of data storage are important:
  - **Volatile memory**
    - Information decays naturally or is lost when electrical power is switched off
  - **Nonvolatile memory**
    - Once recorded, information remains without deterioration until deliberately changed
    - No electrical power is needed to retain information
  - **Magnetic-surface memories**
    - Are nonvolatile
  - **Semiconductor memory**
    - May be either volatile or nonvolatile
  - **Nonerasable memory**
    - Cannot be altered, except by destroying the storage unit
    - Semiconductor memory of this type is known as read-only memory (ROM)
- For random-access memory, the
organization is a key design issue
  - Organization refers to the physical arrangement of bits to form words

## Memory Hierarchy

- Design constraints on a computer's memory can be summed up by three questions: how much, how fast, how expensive
- There is a trade-off among capacity, access time, and cost:
  - Faster access time means greater cost per bit
  - Greater capacity means smaller cost per bit
  - Greater capacity means slower access time
- The way out of the memory dilemma is not to rely on a single memory component or technology, but to employ a memory hierarchy

## Figure 4.1 The Memory Hierarchy

A pyramid is illustrated with the following labels, from top to bottom:

- Inboard memory: registers, cache, main memory
- Outboard storage: magnetic disk, CD-ROM, CD-RW, DVD-RW, DVD-RAM, Blu-Ray
- Off-line storage: magnetic tape

Moving up the pyramid, memory becomes faster and more expensive per bit; moving down, it becomes larger, slower, and cheaper per bit.

## Figure 4.2 Performance of a Simple Two-Level Memory

A graph shows the average access time of a simple two-level memory as a function of the fraction of accesses that involve only level 1 (the hit ratio). When the hit ratio is low, the average access time is close to the slower level 2 access time; as the fraction of accesses satisfied by level 1 increases, the average access time falls toward the faster level 1 access time.
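The relationship plotted in Figure 4.2 can be sketched numerically. The average access time of a simple two-level memory is Ts = H*T1 + (1 - H)*(T1 + T2), where H is the hit ratio; the timings below are made-up values for illustration, not figures from the text:

```python
def average_access_time(hit_ratio: float, t1_ns: float, t2_ns: float) -> float:
    """Average access time of a simple two-level memory.

    On a hit (probability H) only level 1 is accessed; on a miss the
    level 1 access is followed by a level 2 access.
    """
    return hit_ratio * t1_ns + (1.0 - hit_ratio) * (t1_ns + t2_ns)

# Illustrative (made-up) timings: level 1 = 1 ns, level 2 = 10 ns.
for h in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"H = {h:4.2f}  ->  Ts = {average_access_time(h, 1.0, 10.0):5.2f} ns")
```

As the loop shows, the average access time collapses toward the level 1 time only when the hit ratio is close to 1, which is why cache effectiveness depends so strongly on locality of reference.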
## Memory

- As noted in the previous chapter, memory is organized into levels
- The use of three levels exploits the fact that semiconductor memory comes in a variety of types, which differ in speed and cost
- Data are stored more permanently on external mass storage devices
- External, nonvolatile memory is also referred to as secondary memory or auxiliary memory

## Disk Cache

- A portion of main memory can be used as a buffer to hold data temporarily before they are read out to disk
- A few large transfers of data can then be used instead of many small transfers
- Data can be retrieved rapidly from the software cache rather than slowly from the disk

## Figure 4.3 Cache and Main Memory

Two diagrams are illustrated:

- **Single cache:**
  - The CPU is connected to a cache, which is connected to main memory.
  - Word transfers (fast) take place between the CPU and the cache; block transfers (slow) take place between the cache and main memory.
- **Three-level cache organization:**
  - The CPU is connected to a Level 1 cache, followed by a Level 2 cache, a Level 3 cache, and main memory.
  - Transfers are fastest between the CPU and the L1 cache and slowest between the L3 cache and main memory.
  - Customarily, the L1 cache is the smallest but fastest, and each successive level is larger and slower.

## Figure 4.4 Cache/Main-Memory Structure

Two diagrams are illustrated:

- **Cache:**
  - A table of cache lines is shown. Each line holds a tag and a block of data, and is identified by a line number.
  - Blocks are transferred from main memory into cache lines; on each access, the cache is checked first to see whether the data are already present.
- **Main memory:**
  - A table of main memory locations is shown.
Each location holds one word; addresses run from 0 to 2^n - 1, and the memory is organized into fixed-length blocks of words.

## Figure 4.5 Cache Read Operation

A flowchart is illustrated showing the steps involved in a cache read operation:

- Start
- Receive address RA from the CPU
- Is the block containing RA in the cache?
  - Yes:
    - Fetch the RA word and deliver it to the CPU
    - Done
  - No:
    - Access main memory for the block containing RA
    - Allocate a cache line for the main memory block
    - Load the main memory block into the cache line
    - Deliver the RA word to the CPU

## Figure 4.6 Typical Cache Organization

A schematic of a typical cache organization is illustrated, showing the following components and connections:

- Processor (address and data lines)
- Cache, with control logic on both the processor side and the system-bus side
- Address buffer and data buffer, which connect the cache to the system bus

## Figure 4.7 Logical and Physical Caches

Two diagrams are illustrated:

- **Logical (virtual) cache:**
  - The cache sits between the processor and the MMU.
  - The processor accesses the cache directly using the logical address; the MMU translates to a physical address only when main memory must be accessed.
- **Physical cache:**
  - The cache sits between the MMU and main memory.
  - The processor's logical address is first translated by the MMU into a physical address, which is then used to access the cache and main memory.
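The flowchart in Figure 4.5 can be sketched in Python. This is a minimal illustration of the control flow only; the dict-based cache and the function name are assumptions made for the sketch, not structures from the text:

```python
def cache_read(ra: int, cache: dict, main_memory: dict, block_size: int) -> int:
    """Sketch of the Figure 4.5 read flow: check the cache first, and
    fetch the whole block from main memory on a miss."""
    block_number = ra // block_size        # which block contains RA
    offset = ra % block_size               # word within that block
    if block_number in cache:              # hit: deliver the word to the CPU
        return cache[block_number][offset]
    # Miss: access main memory for the block containing RA,
    # allocate a cache line, and load the block into it.
    start = block_number * block_size
    block = [main_memory[addr] for addr in range(start, start + block_size)]
    cache[block_number] = block            # allocate line + load block
    return block[offset]                   # deliver the RA word to the CPU

# Tiny usage example with a 4-word block size.
memory = {addr: addr * 10 for addr in range(16)}
cache: dict = {}
print(cache_read(5, cache, memory, 4))   # miss: loads block 1, returns 50
print(cache_read(6, cache, memory, 4))   # hit in block 1, returns 60
```

Note that a miss brings in the whole block, not just the requested word, which is what makes subsequent nearby accesses (such as address 6 above) hits.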
## Mapping Function

- Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines
- Three techniques can be used:
  - **Direct**
    - The simplest technique
    - Maps each block of main memory into only one possible cache line
  - **Associative**
    - Permits each main memory block to be loaded into any line of the cache
    - The cache control logic interprets a memory address simply as a Tag and a Word field
    - To determine whether a block is in the cache, the cache control logic must simultaneously examine every line's tag for a match
  - **Set associative**
    - A compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages

## Figure 4.8 Mapping From Main Memory to Cache: Direct and Associative

Two diagrams are illustrated:

- **Direct mapping:**
  - The first m blocks of main memory (equal to the size of the cache) each map to a distinct cache line, and the pattern repeats for each successive group of m blocks. "b" is the length of a block in bits; "t" is the length of the tag in bits.
- **Associative mapping:**
  - Any main memory block may be loaded into any line of the cache. "b" is the length of a block in bits; "t" is the length of the tag in bits.

## Figure 4.9 Direct-Mapping Cache Organization

A schematic is illustrated showing the direct-mapping cache organization.
The following components are shown:

- The memory address, divided into Tag, Line, and Word fields
- The cache, holding a tag and a data block for each line
- Main memory
- A comparator that compares the Tag field of the address with the tag stored in the selected line (1 if match, 0 if no match)
- On a hit, the addressed word is delivered from the cache; on a miss, the block containing the word is transferred from main memory into the cache line.

## Figure 4.10 Direct Mapping Example

A diagram illustrates a direct mapping example. Main memory addresses are split into an 8-bit tag, a 14-bit line number, and a word field. Sample cache contents are shown: tags (hex) 00, 16, and FF appear at line numbers such as 0000, 0001, 0CE7, 3FFE, and 3FFF, with data values such as 13579246, 77777777, 11235813, FEDCBA98, 12345678, 11223344, and 24682468.

## Direct Mapping Summary

- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in cache = m = 2^r
- Size of tag = (s - r) bits

## Victim Cache

- Originally proposed as an approach to reduce the conflict misses of a direct-mapped cache without affecting its fast access time
- A fully associative cache
- Typical size is 4 to 16 cache lines
- Resides between the direct-mapped L1 cache and the next level of memory

## Figure 4.11 Fully Associative Cache Organization

A schematic is illustrated showing the fully associative cache organization. The following components are shown:

- The memory address, divided into Tag and Word fields
- The cache, holding a tag and a data block for each line
- Main memory
- Comparators that compare the Tag field of the address with the tags of every line simultaneously (1 if match, 0 if no match)
- On a hit, the addressed word is delivered from the cache; on a miss, the block containing the word is transferred from main memory into a cache line.
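The field widths in the Direct Mapping Summary can be illustrated in code. A minimal sketch; the function name is an assumption, and the concrete example (w = 2, r = 14, address 0x16339C) is chosen to match the two-hex-digit tags and 14-bit line numbers shown in Figure 4.10:

```python
def split_direct_mapped(address: int, w: int, r: int) -> tuple:
    """Split an (s + w)-bit address into tag, line, and word fields
    for a direct-mapped cache.

    w = word bits (block size = 2^w), r = line bits (2^r cache lines);
    the remaining high-order s - r bits form the tag.
    """
    word = address & ((1 << w) - 1)          # low w bits
    line = (address >> w) & ((1 << r) - 1)   # next r bits
    tag = address >> (w + r)                 # remaining s - r bits
    return tag, line, word

# 24-bit address with 4-byte blocks (w = 2) and 16K lines (r = 14):
tag, line, word = split_direct_mapped(0x16339C, w=2, r=14)
print(hex(tag), hex(line), word)   # 0x16 0xce7 0
```

Because the line field is taken directly from the address, two blocks whose addresses share the same line bits always compete for the same cache line, which is the source of the conflict misses that the victim cache is meant to reduce.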
## Figure 4.12 Associative Mapping Example

A diagram illustrates an associative mapping example. Main memory addresses are split into a 22-bit tag and a word field. Sample cache contents are shown: tags (hex) such as 000000, 000001, 058CE6, 058CE7, 058CE8, 3FFFFD, 3FFFFE, and 3FFFFF appear at line numbers such as 0000, 0001, 0CE7, 3FFD, 3FFE, and 3FFF; because the mapping is associative, any block can occupy any line.

## Associative Mapping Summary

- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in cache = undetermined (not constrained by the address format)
- Size of tag = s bits

## Set Associative Mapping

- A compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages
- The cache consists of a number of sets
- Each set contains a number of lines
- A given block maps to any line in a given set
- e.g. with 2 lines per set:
  - 2-way set associative mapping
  - A given block can be in one of 2 lines in only one set

## Figure 4.13 Mapping From Main Memory to Cache: k-Way Set Associative

Two diagrams are illustrated:

- **v associative-mapped caches:**
  - The cache is viewed as v separate associative-mapped caches, one per set; the first v blocks of main memory (equal to the number of sets) map to distinct sets. "b" is the length of a block in bits; "t" is the length of the tag in bits.
- **k direct-mapped caches:**
  - The cache is viewed as k separate direct-mapped caches; the first v blocks of main memory (equal to the number of sets) map to corresponding lines in each cache. A table of cache memory blocks is shown.
"b" is labeled for each block. - Data flows from the main memory blocks table to the cache memory blocks table. - "t" is labeled, which represents the length of tag in bits. ## Figure 4.14 k-Way Set Associative Cache Organization - A schematic is illustrated showing the k-way set associative cache organization. The following components are shown: - Memory address (Tag, Set, and Word are labeled) - Cache (Tag, Data, Set 0, and Set 1 are labeled) - Main Memory - Compare (1 if match, 0 if no match) - *Word transfer and Block transfer are labeled on the diagram.* Data goes from CPU to Cache as a Word Transfer and from Cache to Main Memory as a Block Transfer. ## Set Associative Mapping Summary - Address length = (s + w) bits - Number of addressable units = 2s+w words or bytes - Block size = line size = 2w words or bytes - Number of blocks in main memory = 2s+w/2w = 2s - Number of lines in set = k - Number of sets = v = 2d - Number of lines in cache = m = kv = k * 2d - Size of cache = k * 2d+w words or bytes - Size of tag = (s – d) bits ## Figure 4.15 Two-Way Set Associative Mapping Example - A diagram is illustrated showing the two-way set associative mapping example. 
The following components are shown: main memory addresses are split into a 9-bit tag, a 13-bit set number, and a word field. Sample cache contents are shown: tags (hex) 000, 02C, and 1FF appear at set numbers such as 0000, 0001, 0CE7, 1FFE, and 1FFF, each set holding two lines of data (values such as 13579246, 77777777, 11235813, FEDCBA98, 12345678, 11223344, and 24682468).

## Figure 4.16 Varying Associativity Over Cache Size

A graph shows the hit ratio (0.0 to 1.0) as a function of cache size (1K to 1M bytes) for direct-mapped, 2-way, 4-way, 8-way, and 16-way set associative caches.

## Figure 4.17 Total Hit Ratio (L1 and L2) for 8-Kbyte and 16-Kbyte L1

A graph shows the total hit ratio (0.78 to 0.98) of a two-level cache as a function of L2 cache size (1K to 2M bytes), with separate curves for L1 = 16K and L1 = 8K.
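The set associative lookup described above can be sketched as a small class. This is an illustrative sketch, not the book's implementation: the class and method names are assumptions, eviction within a set is shown as simple FIFO rather than any particular replacement algorithm, and the parameters (k = 2, d = 13, w = 2) are chosen to match the 9-bit tag / 13-bit set split of Figure 4.15:

```python
class SetAssociativeCache:
    """k-way set associative cache sketch: the Set field of an address
    selects one of v sets; the Tag field is compared against every line
    in that set."""

    def __init__(self, k: int, d: int, w: int):
        self.k = k                # lines per set
        self.d = d                # set bits: v = 2^d sets
        self.w = w                # word bits: block size = 2^w
        self.sets = [[] for _ in range(1 << d)]   # each set: list of (tag, block)

    def split(self, address: int):
        """Split an address into (tag, set index, word offset)."""
        word = address & ((1 << self.w) - 1)
        set_index = (address >> self.w) & ((1 << self.d) - 1)
        tag = address >> (self.w + self.d)
        return tag, set_index, word

    def lookup(self, address: int) -> bool:
        """Return True on a hit: the tag matches some line in the set."""
        tag, set_index, _ = self.split(address)
        return any(t == tag for t, _ in self.sets[set_index])

    def fill(self, address: int, block):
        """Load a block on a miss, evicting FIFO if the set is full."""
        tag, set_index, _ = self.split(address)
        lines = self.sets[set_index]
        if len(lines) >= self.k:
            lines.pop(0)          # evict the oldest line in the set
        lines.append((tag, block))

# Two-way set associative, 2^13 sets, 4-byte blocks (a 24-bit address
# splits into 9 tag bits, 13 set bits, and 2 word bits).
cache = SetAssociativeCache(k=2, d=13, w=2)
addr = 0x16339C
print(cache.lookup(addr))    # False (miss)
cache.fill(addr, b"data")
print(cache.lookup(addr))    # True (hit)
```

Only the lines of one set are searched on each access, so the tag comparison hardware is k comparators rather than one per cache line, which is the compromise between direct and fully associative mapping the text describes.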
## Unified Versus Split Caches

- It has become common to split the cache:
  - One cache dedicated to instructions
  - One cache dedicated to data
  - Both exist at the same level, typically as two L1 caches
- Advantages of a unified cache:
  - Higher hit rate
  - Balances the load between instruction and data fetches automatically
  - Only one cache needs to be designed and implemented
- Advantages of a split cache:
  - Eliminates cache contention between the instruction fetch/decode unit and the execution unit
  - Important in pipelining
- The trend is toward split caches at the L1 level and unified caches for higher levels

## Summary

### Chapter 4

- Computer memory system overview
  - Characteristics of memory systems
  - Memory hierarchy
- Cache memory principles

### Cache Memory

- Elements of cache design
  - Cache addresses
  - Cache size
  - Mapping function
  - Replacement algorithms
  - Write policy
  - Line size
  - Number of caches

## Figure 4.18 Cache Performance Evaluation

A diagram is illustrated with the following labels:

- Memory hierarchy: L1 cache, L2 cache, main memory
- Performance metrics: hit ratio, miss ratio, miss penalty, average access time
- Cache design parameters: cache size, block size, associativity, replacement policy, write policy

## Figures 34-50 Set Associative Mapping

This section of the document is missing.

## Figures 62-65 Write Policy

This section of the document is missing.

## Figure 66 Line Size

This section of the document is missing.
## Figure 67 Multilevel Caches

This section of the document is missing.

## Figure 68

This section of the document is missing; based on its context, it would show the total hit ratio (L1 and L2) for 8-Kbyte and 16-Kbyte L1 sizes as a function of L2 cache size, as in Figure 4.17.
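The quantity plotted in Figure 4.17 (and, by its title, in the missing Figure 68) can be expressed as a short calculation. A hedged sketch, assuming the usual convention that the L2 hit ratio is measured over the accesses that miss in L1; the numeric values are made up for illustration:

```python
def total_hit_ratio(h1: float, h2_local: float) -> float:
    """Total hit ratio of a two-level cache: an access hits overall if it
    hits in L1, or misses in L1 and then hits in L2."""
    return h1 + (1.0 - h1) * h2_local

# Illustrative (made-up) values: 90% L1 hits, and 80% of L1 misses hit in L2.
print(total_hit_ratio(0.90, 0.80))   # ~0.98
```

This is why enlarging L2 shows diminishing returns in Figure 4.17: L2 only sees the fraction of accesses that L1 has already missed.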