Chapter 2: Memories - BADJI MOKHTAR-ANNABA University 2024/2025

Summary

This document is a chapter from a computer engineering lecture course, focusing on cache memory. It covers the definition of cache memory, the locality principles, the functioning principle and general organization of caches, the main cache types (direct, associative, mixed), and line replacement policies such as FIFO, LRU, and NMRU.

Full Transcript

BADJI MOKHTAR-ANNABA UNIVERSITY
FACULTY OF TECHNOLOGY
COMPUTER ENGINEERING DEPARTMENT

Chapter 2: Memories - Part 2: Cache memories
Dr. MECHERI K. - ARCHI2 S3 2024/2025

PLAN
1. Introduction.
2. Cache memory definition.
3. Locality principles.
4. Functioning principle of caches.
5. General organization of caches.
6. Cache memory types.
7. Number and location of caches.

1. Introduction

Performance Gap Problem
- Processor speed is increasing much faster than memory speed (on the order of 9 GHz vs. 1 GHz). (Figure source: https://www.mdpi.com/2079-9292/12/3/754)
- The gap is considerable and the exchanges between the processor and the main memory (MM) are very numerous, which causes a bottleneck and penalizes processor operation.
- The bus transfer time is also a slowing factor.
- SRAM memories have too low an integration density and too high a cost to constitute large-capacity main memories.

Solutions
- Increase the frequency of memories, which improves bandwidth. However, this evolution is slow.
- Increase the width of the bus, which also increases bandwidth, but is not easy to achieve, mainly because of congestion.
- Reduce processor bandwidth requirements through the use of cache memories.

2. Cache memory definition

Cache memory is a faster memory (SRAM) with a lower capacity, located closer to the processor. It contains copies of information from main memory (DRAM) that is frequently used. It therefore reduces the number of RAM accesses, which saves time.

[Figure: Processor (with registers) - Cache - Main Memory, connected by the local bus and the memory bus.]

3. Locality principles

Principle of spatial locality: when an instruction references an address (@), it is very probable that the next reference will be in the neighborhood of this address (e.g. arrays). The instructions of a program are placed in contiguous memory words and execution proceeds sequentially, so when an instruction is executed, it is very likely that the next instruction to execute is in the next memory word. Therefore, when a data item or an instruction must be loaded into the cache, it is worthwhile to also load the data (and/or instructions) that are close to it in memory. This increases the probability that the processor will find the next data item (or instruction) in the cache.

Principle of temporal locality: data processed by programs is used multiple times within short periods of time (example: a variable in a loop). Likewise, the processor may reference a given word multiple times.

These two principles of locality are the basis of systems using caches.

4. Functioning principle of caches

The principle is to make low-capacity, very fast memories close to the processor cooperate with slower, high-capacity memories. The processor first looks for the word in the cache; if it is present (success / cache hit), it gets it quickly. If the word is not present (failure / cache miss), the processor accesses main memory and places the word in the cache before loading it.

Algorithm for reading a word:

    if word present in cache then
        load processor with the word;
    else
        if cache full then
            load cache (replace);
            load processor;
        else
            load cache;
            load processor;
        endif
    endif

Explanations: the processor first checks whether the addressed memory word is present in the cache (success / cache hit). If the information is not in the cache (failure / cache miss), it must be retrieved from the MM and placed in the cache (load cache), with possible replacement of information currently present in the cache (load cache (replace)). Finally, the processor is loaded with the information now available in the cache. The efficiency of the cache depends on its success rate, and therefore on the probability that the searched word is present in the cache.
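To make the read algorithm concrete, here is a minimal Python sketch of the same hit/miss flow. The line size, cache capacity, and memory contents are illustrative assumptions (none of these values come from the course), and FIFO is one possible choice for the "replace" step:

    from collections import OrderedDict

    LINE_SIZE = 4          # words per memory line (assumption)
    CACHE_CAPACITY = 2     # number of cache lines (assumption)

    main_memory = list(range(64))   # MM: 64 words, value == address
    cache = OrderedDict()           # line number -> list of words

    def read(address):
        line, offset = divmod(address, LINE_SIZE)
        if line in cache:                    # success / cache hit
            return cache[line][offset]
        if len(cache) >= CACHE_CAPACITY:     # cache full: load with replacement
            cache.popitem(last=False)        # evict the oldest line (FIFO)
        start = line * LINE_SIZE             # load cache from MM
        cache[line] = main_memory[start:start + LINE_SIZE]
        return cache[line][offset]           # load processor

    print(read(9), read(10))   # first read misses, second hits the same line

The first read misses and pulls the whole line from the MM, so the second read finds its word already in the cache: spatial locality at work.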
Effective time to access information:

    Teff = h x Tc + (1 - h) x Tm

where Teff is the effective time to access information, h the probability that the information is present in the cache, Tc the cache access time, and Tm the MM access time.

Objective: caches must be carefully organized to increase the probability of information being present in the cache and thus reduce the effective time to access the information.

Algorithm for writing a word:

    if word present in cache then
        edit cache;
        edit MM;
    else
        edit MM;
    endif

The cache contains a copy of part of the MM information. So when we modify the cache, we must also modify the MM to maintain consistency. There are several techniques to achieve this modification:
- Write through (immediate write): we write simultaneously to the cache and the main memory. Consistency is therefore guaranteed.
- Write back (deferred write): the main memory is updated when the cache information needs to be replaced, or whenever the communication bus is free. This method does not guarantee constant consistency, but the write time is lower.
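To illustrate the difference, here is a minimal Python sketch contrasting the two write policies; the dirty-set bookkeeping and all names are assumptions made for this example, not the course's notation:

    main_memory = {0x20: 5}
    cache = {0x20: 5}          # word 0x20 currently cached
    dirty = set()              # write back only: words modified in the cache

    def write_through(address, value):
        if address in cache:
            cache[address] = value
        main_memory[address] = value    # MM updated immediately: consistent

    def write_back(address, value):
        if address in cache:
            cache[address] = value
            dirty.add(address)          # MM update deferred
        else:
            main_memory[address] = value

    def flush(address):
        # Run on replacement (or when the bus is free): copy back to MM.
        if address in dirty:
            main_memory[address] = cache[address]
            dirty.discard(address)

    write_through(0x20, 7)   # cache and MM both become 7
    write_back(0x20, 8)      # cache holds 8, MM still 7 until flush(0x20)

With write back, the MM only catches up when flush runs, which is exactly why constant consistency is not guaranteed.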
5. General organization of caches

Main memory is considered as a series of memory blocks called memory lines. All lines have the same number of words. Example: the MM is organized into lines of 4 words each; the @ of the 1st word of the 1st line is 000, and the @ of the 1st word of the 2nd line is 004.

When a line number exists in the directory, the cache line associated with that key contains the values of the corresponding memory line. For example, the @ of 'c' is 022 = line number 020 + an offset of 2 words.

6. Cache memory types

6.1. Direct cache

This is the simplest cache. Example:
- The main memory has a capacity of 10,000 words = 10^4 words. Addresses vary from 0000 to 9999 (4 decimal positions).
- The MM is composed of lines of 10 words each.

The cache includes:
- A useful memory, which contains the data. Each line is 10 words long, and there are 10 lines.
- A 10-line directory. Each line includes a validity bit, which indicates whether valid data is available in this line, and a key, which precisely identifies a line of data.
- A comparator, which checks whether the label part of the address is equal to the key.

Each line in the cache = a validity bit, a key and a data line:

    size of a cache entry = 1 bit (validity) + n bits (key) + m bits (data)

When an address is presented to the cache, the cache controller decomposes this address into three parts: label, index and offset.
- The index locates a cache entry; we have 10 entries, so one decimal digit is enough (0..9).
- The offset locates a word within a data line; there are 10 words per line, so one decimal digit is enough to designate all the words in a line (0..9).
- The label = the remaining positions of the @. It is compared to the key.

    address size = label size + index size + offset size
    number of cache entries = useful memory size in words / number of words per line = 100 / 10 = 10 = 10^1, hence an index of 1 digit (base 10)
    number of words per line = 10 = 10^1, hence an offset of 1 digit

Direct cache operation algorithm:

    if directory[index] == label then
        load the processor with useful_memory[index, offset];
    else
        load useful_memory[index] from MM;
        directory[index] = label;
        load the processor with useful_memory[index, offset];
    endif

Assume that the processor executes the following program with our cache:
1. Load D, R, 0215
2. Load D, R, 0212
3. Load D, R, 0114
4. Load D, R, 0223
5. Load D, R, 0116

[Figures: step-by-step trace of these accesses on a preloaded cache: 0215 is a hit, 0212 (label 02) is a hit, 0114 (label 01) is a miss that loads its line into entry 1 and updates the key, and 0223 is a hit.]

Application exercise: a 1 MB memory, addressable by bytes. The direct cache has a useful capacity of 8 KB, with blocks (lines) of 4 words (bytes) each.
1. Calculate the label size.
2. Calculate the real size of the cache.

Solution:
1. Label size n?

    address size = n + index size + offset size
    n = address size - index size - offset size
    address size: memory capacity 1 MB = 2^20, so address size = 20 bits
    index size: depends on the number of cache entries;
        number of cache entries = useful memory size / useful line size = 8 KB / 4 B = 2K entries = 2^11 entries, so index size = 11 bits
    offset size: we have 4 = 2^2 words per block, so offset size = 2 bits
    n = 20 - 11 - 2 = 7 bits

2. Real cache size:

    real cache size = size of a cache entry x number of entries
    size of a cache entry = 1 bit (validity) + n bits (label) + m bits (useful data) = 1 + 7 + 4 x 8 = 40 bits
    real cache size = 40 x 2K = 80 Kbits
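Here is a minimal Python sketch of this direct-mapped lookup, reusing the decimal layout of the example above (2-digit label, 1-digit index, 1-digit offset). The memory contents are fabricated and the cache starts empty, so the hit/miss results differ from the preloaded figures:

    LINES, WORDS = 10, 10

    directory = [None] * LINES                  # key (label) per entry
    valid = [False] * LINES
    useful_memory = [[None] * WORDS for _ in range(LINES)]

    def mm_line(label, index):
        # Stand-in for a main-memory line fetch (value == address here).
        base = (label * 10 + index) * WORDS
        return list(range(base, base + WORDS))

    def read(address):
        label, rest = divmod(address, 100)      # e.g. 0215 -> label 02
        index, offset = divmod(rest, 10)        # -> index 1, offset 5
        if valid[index] and directory[index] == label:
            return "hit", useful_memory[index][offset]
        useful_memory[index] = mm_line(label, index)   # load line from MM
        directory[index], valid[index] = label, True
        return "miss", useful_memory[index][offset]

    for a in (215, 212, 114, 223, 116):         # the five accesses above
        print("%04d" % a, read(a)[0])

Starting cold, this prints miss, hit, miss, miss, hit: 0212 hits because 0215 already loaded entry 1 with label 02, and 0116 hits because 0114 reloaded that same entry with label 01.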
6.2. Associative cache

- More complex and more expensive to build.
- There are as many comparators as cache lines.
- No fixed index for loading.
- Address = label + offset.
- The cache controller checks in a single operation whether a label is present in one of the lines of the directory.
- If the response is a success, the data sought is in the useful memory.

Associative cache operation algorithm:

    if directory contains label then
        load processor;
    else
        if directory not full then
            fill a free line (label, info);
            load processor;
        else
            line replacement algorithm;
            fill the chosen line (label, info);
            load processor;
        endif
    endif

Line replacement algorithms:
- FIFO (First In, First Out): the replaced line is the oldest loaded line.
- LRU (Least Recently Used): the replaced line is the least recently accessed line. This policy is better than the previous one because it takes into account the accesses made by the processor to the cache, but it is expensive because it requires maintaining the order of the accesses made.
- NMRU (Not Most Recently Used): the replaced line is not the most recently used line. In this policy, the replaced line is chosen at random from the set of cache lines, excluding the most recently accessed one.

Application exercise: consider a memory addressed on 16 bits, including 12 bits for the label. We assume that the processor successively requests the following sequence of addresses (in hexadecimal). Give the evolution of the four entries of the cache directory and note the successes/failures in the case of the FIFO policy.

Solution (FIFO):

    Time     0      1      2      3      4      5      6
    Address  010A   011A   012F   011B   023F   012C   030F
    Entry0   010    010    010    010    010    010    030
    Entry1          011    011    011    011    011    011
    Entry2                 012    012    012    012    012
    Entry3                               023    023    023
    Result   Miss   Miss   Miss   Hit    Miss   Hit    Miss (replacement)

6.3. Mixed cache

The cache is divided into blocks managed like direct caches, and there is one comparator per block. When an address is presented to the cache, the index simultaneously references one line per block. In a single operation, the comparators check whether the label is in one of these lines. In case of failure, the line must be loaded from the MM into one of the referenced lines; if all of them are occupied, a replacement algorithm is used.

Mixed cache operation algorithm:

    if directories[index] contains label then
        load useful_memory[block, index, offset] into processor;
    else
        choose the block whose line will be replaced;
        replace the line in the chosen block;
        load useful_memory[chosen block, index, offset] into processor;
    endif
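Here is a minimal Python sketch of such a cache with FIFO replacement. The sizes and the label/index split are simplifying assumptions: with NUM_SETS = 1 all lines belong to a single set, which degenerates into the fully associative cache of the FIFO exercise (4 lines, 12-bit labels), so replaying the exercise's address sequence reproduces the table above:

    from collections import deque

    NUM_SETS = 1        # one directory per index value
    WAYS = 4            # lines per set (one comparator per way)
    OFFSET_BITS = 4     # 16-bit address, 12-bit label -> 4-bit offset

    sets = [deque(maxlen=WAYS) for _ in range(NUM_SETS)]   # deques of labels

    def access(address):
        index = (address >> OFFSET_BITS) % NUM_SETS
        label = address >> OFFSET_BITS   # valid split only for NUM_SETS == 1
        s = sets[index]
        if label in s:
            return "hit"
        s.append(label)   # deque(maxlen=WAYS) evicts the oldest line (FIFO)
        return "miss"

    for a in (0x010A, 0x011A, 0x012F, 0x011B, 0x023F, 0x012C, 0x030F):
        print(hex(a), access(a))

Raising NUM_SETS (and excluding the index bits from the label) would turn the same skeleton into the mixed organization: one FIFO-managed set per index value.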
7. Number and location of caches

- The objective of caches is to increase the amount of information transferred to the processor per unit of time (improve bandwidth), by increasing the frequency of memories (reducing access times) and/or improving the quality of the transfers between processor and memory.
- The idea is therefore to bring the memory as close as possible to the processor in order to reduce transfer times, the ideal being to place the memory in the processor chip.
- Unfortunately, it is not possible to house high-capacity fast memories in the processor chip (low degree of integration).
- A hierarchy of memories is therefore set up on three levels:
  - The 1st level (L1) of cache memory, small and very fast, is placed in the processor (instructions; data).
  - The 2nd level (L2), with greater capacity and fast access, is outside the processor.
  - The 3rd level (L3) is made up of the MM.

[Figure: Processor - Level 1 Cache - Level 2 Cache - Main Memory, connected by the local bus. Source: https://encyclopedia2.thefreedictionary.com/L2+cache]

The processor first searches for the data or instruction in the L1 cache; in case of failure, the information is searched for in the L2 cache; in case of failure again, it is searched for in the MM. Good operation implies that all information in cache L1 is also found in cache L2, and that all information in cache L2 is in the MM.
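Here is a minimal Python sketch of this three-level search; the contents and access times are purely illustrative, and each level's contents include those of the level above it, mirroring the inclusion property just described:

    levels = [
        ("L1", {0x10: "a"}, 1),                           # in the processor
        ("L2", {0x10: "a", 0x20: "b"}, 10),               # outside, larger
        ("MM", {0x10: "a", 0x20: "b", 0x30: "c"}, 100),   # main memory
    ]

    def read(address):
        cost = 0
        for name, store, access_time in levels:
            cost += access_time
            if address in store:            # found at this level
                return store[address], name, cost
        raise KeyError(address)

    print(read(0x30))   # misses L1 and L2, served by main memory

A word absent from L1 and L2 pays all three access times, which is why a high hit rate in the upper levels matters.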
Main reference

Alain Cazes, Joëlle Delacroix, 'Architecture des machines et des systèmes informatiques, cours et exercices corrigés', 4e édition, DUNOD, 2011.