Cache Memory
Ghana Communication Technology University
Summary
This document provides an overview of cache memory, explaining its importance in computer systems. It details the memory hierarchy concept, the characteristics of different memory types, and their organization. The document emphasizes the trade-offs between capacity, access time, and cost in memory design.
Full Transcript
Chapter 4: Cache Memory

Computer Memory System Overview
Although seemingly simple in concept, computer memory exhibits perhaps the widest range of type, technology, organization, performance, and cost of any feature of a computer system. No one technology is optimal in satisfying the memory requirements for a computer system. As a consequence, the typical computer system is equipped with a hierarchy of memory subsystems, some internal to the system (directly accessible by the processor) and some external (accessible by the processor via an I/O module).

In computer system design, the memory hierarchy is an enhancement that organizes memory so as to minimize access time. The memory hierarchy was developed based on a program behaviour known as locality of reference.

Characteristics of Memory Systems
The complex subject of computer memory is made more manageable if we classify memory systems according to their key characteristics. The most important of these are listed below:
1. Location
2. Capacity
3. Unit of transfer
4. Access method
5. Performance
6. Physical type
7. Physical characteristics
8. Organisation

Location
◦ CPU (registers; the control unit portion of the processor may also require its own internal memory)
◦ Internal (cache, main memory)
◦ External (optical disks, etc.)

Capacity
◦ Internal: expressed in terms of bytes (1 byte = 8 bits) or words (word size and number of words)
◦ External: expressed in terms of bytes

Unit of Transfer
Internal
◦ For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module
◦ Usually governed by data bus width: 64-bit, 128-bit, 256-bit, etc.
External
◦ Usually a block, which is much larger than a word
Addressable unit
◦ Smallest location which can be uniquely addressed
◦ Word internally
◦ Cluster on MS disks

Access Methods (1)
Sequential
◦ Start at the beginning and read through in order
◦ Access time depends on the location of the data and the previous location
◦ e.g. tape
Direct
◦ Individual blocks have a unique address
◦ Access is by jumping to the vicinity plus a sequential search
◦ Access time depends on the location and the previous location
◦ e.g. disk

Access Methods (2)
Random
◦ Individual addresses identify locations exactly
◦ Access time is independent of location or previous access
◦ e.g. RAM
Associative
◦ Data is located by a comparison with the contents of a portion of the store
◦ Access time is independent of location or previous access
◦ e.g. cache

Memory Hierarchy
The key to successful memory system organization is a decreasing frequency of access of the memory by the processor, achieved by employing a variety of technologies.
Registers
◦ In the CPU
Internal or main memory
◦ May include one or more levels of cache
◦ "RAM"
External memory
◦ Backing store

The Bottom Line
Design constraints on computer memory can be summed up as:
How much?
◦ Capacity (if the capacity is there, applications will tend to use it)
How fast?
◦ Memory must be able to keep up with the processor
How expensive?
◦ For a practical system, the cost of memory must be reasonable in relation to other components
There is a trade-off among the three key characteristics of memory: namely, capacity, access time, and cost.
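To make the trade-off concrete, the short sketch below (an illustrative addition, not part of the original slides; the timings and hit ratio are assumed values) estimates the effective access time of a simple two-level system built from a small fast memory and a large slow one:

```python
# Illustrative sketch (not from the slides): effective access time of a
# two-level memory system. t1 and t2 are assumed access times for the fast
# and slow levels; hit_ratio is the fraction of accesses satisfied by the
# fast level. On a miss we pay t1 (the failed lookup) plus t2.

def effective_access_time(t1_ns: float, t2_ns: float, hit_ratio: float) -> float:
    """Average time per access for a two-level memory (cache + main memory)."""
    return hit_ratio * t1_ns + (1.0 - hit_ratio) * (t1_ns + t2_ns)

# Hypothetical numbers: 1 ns fast level, 50 ns slow level, 95% hit ratio.
print(effective_access_time(1.0, 50.0, 0.95))  # ≈ 3.5 ns, close to the fast level
```

Because of locality of reference the hit ratio is usually high, so the average access time stays close to that of the small, fast, expensive level while most of the capacity comes from the large, cheap, slow level.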
Dilemma Facing the Designer
A variety of technologies are used to implement memory systems, and across this spectrum of technologies the following relationships hold:
◦ Faster access time, greater cost per bit
◦ Greater capacity, smaller cost per bit
◦ Greater capacity, slower access time
The way out of this dilemma is not to rely on a single memory component or technology, but to employ a memory hierarchy.

Memory Hierarchy – Diagram
(Hierarchy diagrams not reproduced in this transcript.)

Memory Hierarchy – Dilemma
The dilemma facing the designer is clear. The designer would like to use memory technologies that provide for large-capacity memory, both because the capacity is needed and because the cost per bit is low. However, to meet performance requirements, the designer needs to use expensive, relatively lower-capacity memories with short access times.

Memory Hierarchy
As one goes down the hierarchy, the following occur:
a. Decreasing cost per bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access of the memory by the processor
Thus, smaller, more expensive, faster memories are supplemented by larger, cheaper, slower memories.
All data in one level is also found in the level below, and all data in that lower level is found in the one below it, and so on until we reach the bottom of the hierarchy.

Performance
Access time
◦ Time between presenting the address and getting the valid data
Memory cycle time
◦ Time may be required for the memory to "recover" before the next access
◦ Cycle time is access time plus recovery time
Transfer rate
◦ Rate at which data can be moved

Physical Types
Cache and main memory are built using solid-state semiconductor material (typically CMOS transistors), sometimes referred to as primary memory. The solid-state memory is followed by larger, less expensive, and far slower magnetic memories, consisting typically of the (hard) disk and the tape. It is customary to call the disk the secondary memory, while the tape is conventionally called the tertiary memory.

Physical Types
There are a variety of physical types of memory in use today. The most common are semiconductor memory, magnetic surface memory (used for disk and tape), and optical and magneto-optical memory.
Semiconductor
◦ RAM
Magnetic
◦ Disk and tape
Optical
◦ CD and DVD
Others: bubble, hologram

Physical Characteristics
Several physical characteristics of data storage are important. For example, a volatile memory is one whose information decays naturally or is lost when electrical power is switched off. In a nonvolatile memory, information once recorded remains without deterioration until deliberately changed; no electrical power is needed to retain the information.

Physical Characteristics
Magnetic-surface memories are nonvolatile. Semiconductor memory may be either volatile or nonvolatile. Non-erasable memory cannot be altered, except by destroying the storage unit. Semiconductor memory of this type is known as read-only memory (ROM). Of necessity, a practical non-erasable memory must also be nonvolatile.

Organisation
◦ Physical arrangement of bits into words
◦ Not always obvious, e.g. interleaved

Hierarchy List
Registers
L1 cache
L2 cache
Main memory
Disk cache
Disk
Optical
Tape

Locality of Reference
During the course of the execution of a program, memory references tend to cluster, e.g. loops. Programs typically contain a number of iterative loops and subroutines. Once a loop or subroutine is entered, there are repeated references to a small set of instructions.
Locality of Reference
Similarly, operations on tables and arrays involve access to a clustered set of data words. Over a long period of time, the clusters in use change, but over a short period of time, the processor is primarily working with fixed clusters of memory references.

Cache Memory Principles
Cache memory is intended to give memory speed approaching that of the fastest memories available.
◦ Small amount of fast memory
◦ Sits between normal main memory and the CPU
◦ May be located on the CPU chip or module

Why Cache?
The idea of introducing a cache is that this extremely fast memory stores data that is frequently accessed and, if possible, the data around it. This is to achieve the quickest possible response time to the CPU.

Cache and Main Memory
The cache contains a copy of portions of main memory. When the processor attempts to read a word of memory, a check is made to determine if the word is in the cache. If so, the word is delivered to the processor. If not, a block of main memory, consisting of some fixed number of words, is read into the cache and then the word is delivered to the processor. This works because of the phenomenon of locality of reference.

Cache/Main Memory Structure
(Structure diagram not reproduced in this transcript.)

Cache Operation – Overview
◦ The CPU requests the contents of a memory location
◦ The cache is checked for this data; if present, it is delivered from the cache (fast)
◦ If not present, the required block is read from main memory into the cache, then delivered from the cache to the CPU
◦ The cache includes tags to identify which block of main memory is in each cache slot
◦ The processor generates the read address (RA) of the word to be read

Intelligent Storage System: Cache
(Diagram of an intelligent storage system: host, connectivity (FC SAN), front end, cache, back end, and physical disks.)

Write Operation with Cache
◦ Write-through cache: the write request is written to the cache and through to the underlying storage before the acknowledgement is returned
◦ Write-back cache: the write request is acknowledged as soon as it is placed in the cache, and is de-staged to the underlying storage later

Read Operation with Cache: 'Hits' and 'Misses'
◦ Data found in cache = 'hit': the read request is serviced from the cache
◦ No data found = 'miss': the read request must be serviced from the underlying storage

Measuring Cache Performance

Cache Management: New Data Algorithms
◦ Least Recently Used (LRU): discards the least recently used (oldest) data
◦ Most Recently Used (MRU): discards the most recently used data (assumes that recently used data may not be required for a while)

Cache Implementation
◦ Dedicated cache: separate memory sets reserved for read and for write
◦ Global cache: both read and write operations use the available memory; more efficient
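The read flow and the LRU replacement policy described above can be sketched together in a few lines of Python. This is a simplified, hypothetical model (fully associative, with the names SimpleCache, num_slots and block_size invented for illustration), not an implementation taken from the slides:

```python
from collections import OrderedDict

class SimpleCache:
    """Toy, fully associative cache with LRU replacement (illustrative only)."""

    def __init__(self, num_slots: int, block_size: int):
        self.num_slots = num_slots
        self.block_size = block_size
        self.slots = OrderedDict()        # block tag -> block data; order tracks recency
        self.hits = self.misses = 0

    def read(self, address: int, main_memory: list):
        tag = address // self.block_size  # which main-memory block the word belongs to
        if tag in self.slots:             # hit: deliver the word from the cache
            self.hits += 1
            self.slots.move_to_end(tag)   # mark the block as most recently used
        else:                             # miss: fetch the whole block first
            self.misses += 1
            if len(self.slots) >= self.num_slots:
                self.slots.popitem(last=False)   # evict the least recently used block
            start = tag * self.block_size
            self.slots[tag] = main_memory[start:start + self.block_size]
        return self.slots[tag][address % self.block_size]

main_memory = list(range(64))             # pretend main memory of 64 words
cache = SimpleCache(num_slots=2, block_size=4)
for addr in [0, 1, 2, 3, 5, 1]:           # clustered accesses -> mostly hits
    cache.read(addr, main_memory)
print(cache.hits, cache.misses)           # 4 hits, 2 misses
```

Because the accesses cluster within a few blocks (locality of reference), most reads are hits even with only two cache slots.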
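Similarly, the difference between the two write policies can be sketched as follows. The function names and the dictionary-based cache and backing store are assumptions made for illustration only:

```python
# Simplified sketch of the two write policies (an assumed model, not taken
# from the slides). Write-through acknowledges only after the data reaches
# the backing store; write-back acknowledges once the data is in the cache
# and marks it dirty so it can be de-staged later.

def write_through(cache: dict, backing_store: dict, key, value) -> str:
    cache[key] = value
    backing_store[key] = value      # the write reaches the slower store first
    return "ack"                    # acknowledged only after both writes complete

def write_back(cache: dict, dirty: set, key, value) -> str:
    cache[key] = value
    dirty.add(key)                  # remember that this entry must be flushed later
    return "ack"                    # acknowledged immediately from the cache

def destage(cache: dict, dirty: set, backing_store: dict) -> None:
    """Flush dirty entries, e.g. during idle periods or watermark flushing."""
    for key in list(dirty):
        backing_store[key] = cache[key]
        dirty.discard(key)
```

Write-back gives faster acknowledgements but leaves uncommitted data in the cache, which is why the flushing, mirroring and vaulting mechanisms described next are needed.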
Cache Management: Watermarking
Manage peak I/O request "bursts" through flushing/de-staging:
◦ Idle flushing, high watermark flushing and forced flushing
For maximum performance:
◦ Provide headroom in the write cache for I/O bursts
(Diagram: cache utilisation from 0 to 100%, marked with a low watermark (LWM) and a high watermark (HWM), with regions for idle flushing, high watermark flushing and forced flushing.)

Cache Data Protection
Protecting cache data against failure (e.g. power failure):
◦ Cache mirroring
  – Each write to the cache is held in two different memory locations on two independent memory cards
  – Cache coherency (only writes are mirrored)
◦ Cache vaulting
  – Cache is exposed to the risk of uncommitted data loss due to power failure
  – In the event of a power failure, uncommitted data is dumped to a dedicated set of drives called vault drives

Cache Read Operation – Flowchart
(Flowchart not reproduced in this transcript.)

Cache Design
◦ Addressing
◦ Size
◦ Mapping function
◦ Replacement algorithm
◦ Write policy
◦ Block size
◦ Number of caches

Cache Size
Cost
◦ More cache is expensive
Speed
◦ More cache is faster (up to a point)
◦ Checking the cache for data takes time

Typical Cache Organization
(Organization diagram not reproduced in this transcript.)

Cache Sizes
Processor | Type | Year of Introduction | L1 cache | L2 cache | L3 cache
IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | — | —
PDP-11/70 | Minicomputer | 1975 | 1 KB | — | —
VAX 11/780 | Minicomputer | 1978 | 16 KB | — | —
IBM 3033 | Mainframe | 1978 | 64 KB | — | —
IBM 3090 | Mainframe | 1985 | 128 to 256 KB | — | —
Intel 80486 | PC | 1989 | 8 KB | — | —
Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | —
PowerPC 601 | PC | 1993 | 32 KB | — | —
PowerPC 620 | PC | 1996 | 32 KB/32 KB | — | —
PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB
IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB
IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | —
Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | —
IBM SP | High-end server/supercomputer | 2000 | 64 KB/32 KB | 8 MB | —
CRAY MTAb | Supercomputer | 2000 | 8 KB | 2 MB | —
Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB
SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | —
Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB
IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB
CRAY XD-1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | —

Types of Cache Memory
Cache memory is fast and expensive. Traditionally, it is categorized in "levels" that describe its closeness and accessibility to the microprocessor. There are three general cache levels:
◦ L1 cache, or primary cache, is extremely fast but relatively small, and is usually embedded in the processor chip as CPU cache.
◦ L2 cache, or secondary cache, is often more capacious than L1. L2 cache may be embedded on the CPU, or it can be on a separate chip or coprocessor with a high-speed alternative system bus connecting the cache and the CPU, so that it is not slowed by traffic on the main system bus.
◦ Level 3 (L3) cache is specialized memory developed to improve the performance of L1 and L2. L1 or L2 can be significantly faster than L3, though L3 is usually double the speed of DRAM. With multicore processors, each core can have dedicated L1 and L2 caches, and the cores can share an L3 cache. If an L3 cache references an instruction, it is usually elevated to a higher level of cache.
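As a rough illustration of the watermarking scheme described earlier, the sketch below chooses a flushing mode from the current write-cache utilisation. The threshold values and the function name are assumptions made for the example, not figures from the slides:

```python
# Rough illustration of watermark-driven flushing: the fuller the write cache,
# the more aggressively dirty data is de-staged to disk. Threshold values are
# assumptions chosen for the example.

LOW_WATERMARK = 0.40     # fraction of write cache in use
HIGH_WATERMARK = 0.80

def flush_mode(utilisation: float) -> str:
    if utilisation >= 1.0:
        return "forced flushing"           # cache is full: writes must wait for de-staging
    if utilisation >= HIGH_WATERMARK:
        return "high watermark flushing"   # de-stage aggressively to restore headroom
    if utilisation >= LOW_WATERMARK:
        return "idle flushing"             # de-stage at a modest rate in the background
    return "no flushing needed"

for utilisation in (0.25, 0.55, 0.85, 1.0):
    print(utilisation, flush_mode(utilisation))
```

Keeping utilisation below the high watermark preserves headroom in the write cache so that I/O bursts can still be absorbed quickly.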
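Finally, the "mapping function" listed under Cache Design determines which cache line a main-memory block may occupy. The sketch below assumes direct mapping (one common choice; the slides do not fix a particular scheme) and shows how an address splits into tag, line and word fields. The field widths are hypothetical:

```python
# Hypothetical direct-mapping example: a memory address is split into
# tag | line | word fields. The field widths below are assumptions:
# 4 words per block (2 word bits) and 128 cache lines (7 line bits).

WORD_BITS = 2            # 2**2 = 4 words per block
LINE_BITS = 7            # 2**7 = 128 lines in the cache

def split_address(address: int):
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word

# Each block maps to exactly one line (line = block number mod number of lines);
# the stored tag identifies which of the competing blocks currently occupies it.
print(split_address(0x1A2B))   # -> (13, 10, 3): tag, line, word
```

On a read, the cache uses the line field to select a slot and compares the stored tag with the address's tag field: a match is a hit, a mismatch is a miss that triggers a block fetch from main memory.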