csc25-chapter_08-17-38_part2.pdf
ITA
2024
Cache Coherence

SMP Problems

Let's assume that P1 writes memory position X in its cache, and P2 then reads Mem[X]. What happens?

Cache coherence problem
- informally, a system is coherent if any read of a data item returns the last value written to that item

Memory system coherence and consistency are complementary
- coherence: ensures the use of the most current data
- consistency: synchronizes reads/writes between processors

1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 16/44

SMP Problems: Consistency

P1                P2
a = 0;            b = 0;
...               ...
a = 1;            b = 1;
if(b==0)          if(a==0)
...               ...

Initially, both a and b are in the caches with value zero
Which of the two if statements will be taken? Or both of them?
- most of the time this must be handled by the programmer, i.e., through synchronization

Coherence

A memory system is coherent if
1. a read by processor P to location X that follows a write by P to X, with no writes to X by another processor between the write and the read by P, always returns the value written by P
2. a read by processor P1 to location X that follows a write by processor P2 to X returns the written value, if the write and the read are sufficiently separated in time and no other writes to X occur between those two accesses
3. writes to the same location are serialized, i.e., two writes to the same location by any two processors are seen in the same order by all processors
   - e.g., if the values 1 and then 2 are written to a location, processors can never read the value of the location as 2 and then later read it as 1

Basic schemes for enforcing coherence
- keep track of the status of any sharing of a data block
- cache block status is kept by using status bits associated with that block

Hardware-based solution for multiprocessors to maintain cache coherence: cache coherence protocols
Snooping
- every cache containing a copy of the data from a physical memory block can track the sharing status of the block
Directory-based
- the sharing status of a particular block of physical memory is kept in one location, i.e., the directory
- SMP uses a single directory
- DSM uses distributed directories

Snooping Coherence Protocols

Each cache has a copy of the
- memory data block, and
- sharing status of the block, e.g., shared/non-shared
As caches share the memory bus, they snoop on the memory traffic to check whether they hold a copy of the "in-transit" block
Protocols
1. write invalidate protocol
2. write update or write broadcast protocol

Write Invalidate Protocol

Writing to a shared block invalidates the copies of that block in the other processors' caches
When trying to access an invalid block
- there is a cache miss, and
- the data comes from the "dirty" cache block, which also updates the memory, i.e., the write back case
Writing to non-shared blocks does not cause problems. Why?
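The opening question (P1 writes X in its cache, then P2 reads X) can be played out in a few lines. What follows is only a minimal sketch under stated assumptions: two write back caches modelled as plain dictionaries, a single block X, and a hypothetical helper named simulate that is not part of the course material. It contrasts what P2 reads with and without a write invalidate step.

```python
def simulate(write_invalidate):
    """Toy model (hypothetical, for illustration): P2 caches X, P1 writes X, P2 reads X."""
    memory = {"X": 0}
    caches = {"P1": {}, "P2": {}}   # per-processor cache: address -> value
    dirty = set()                   # (processor, address) pairs not yet written back

    def read(proc, addr):
        cache = caches[proc]
        if addr not in cache:       # cache miss
            # A dirty copy in another cache supplies the data and updates memory.
            for owner, owned_addr in list(dirty):
                if owned_addr == addr and owner != proc:
                    memory[addr] = caches[owner][addr]
                    dirty.discard((owner, addr))
            cache[addr] = memory[addr]
        return cache[addr]

    def write(proc, addr, value):
        if write_invalidate:
            # The other caches snoop the write and drop their copies.
            for other, cache in caches.items():
                if other != proc:
                    cache.pop(addr, None)
        caches[proc][addr] = value
        dirty.add((proc, addr))     # write back: memory is not updated yet

    read("P2", "X")                 # P2 caches X = 0
    write("P1", "X", 1)             # P1 writes X in its own cache
    return read("P2", "X"), memory["X"]


print(simulate(write_invalidate=False))  # P2 hits on its stale copy
print(simulate(write_invalidate=True))   # P2 misses and gets P1's dirty block
```

In this model the run without invalidation leaves P2 reading the old value 0 while memory also still holds 0; with invalidation, P2's read misses, P1's dirty block is supplied, and memory is updated on the way.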
What about the write through approach?

Write Invalidate Protocol

Example
- assuming that neither cache initially holds X and that the value of X in memory is 0
Assuming also write back caches

Write Update Protocol

It differs only in how writes are handled
It updates all the cached copies of a data item when that item is written
Must broadcast all writes to shared cache lines
- consumes considerably more bandwidth
Therefore, virtually all recent multiprocessors have opted to implement a write invalidate protocol

Brief Protocols Comparison

Pro write invalidate protocol
- multiple writes to the same word with no intervening reads require multiple broadcasts under write update, but just one initial block invalidation under write invalidate
- each word written in a cache block requires a write broadcast in the write update protocol, whereas only the first write to any word within the block needs to generate an invalidation
- the write invalidate protocol acts on cache blocks, while the write update protocol must act on individual words
- the write invalidate protocol tends to generate less traffic on the memory bus
- the memory bus may be seen as a bottleneck in SMP

Pro write update protocol
- the delay between writing a word on one processor and reading the written value on another processor is smaller in the write update protocol
- the written data is updated immediately in the reader's cache
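The trade-off in the comparison above can be made concrete by counting bus transactions for a short access trace. This is a rough sketch, not a measurement: it assumes a single cache block, one bus transaction per miss, per invalidation and per update broadcast, and the function name count_bus_traffic and the trace format are made up for the illustration.

```python
def count_bus_traffic(trace, protocol):
    """Count (bus transactions, read misses) for accesses to ONE cache block.

    trace: list of ("r" | "w", processor) pairs; protocol: "invalidate" or "update".
    Hypothetical cost model: each miss, invalidation or update broadcast costs one transaction.
    """
    if protocol not in ("invalidate", "update"):
        raise ValueError("unknown protocol: " + protocol)
    holders = set()          # processors that currently hold a valid copy
    transactions = 0
    read_misses = 0
    for op, proc in trace:
        if op == "r":
            if proc not in holders:          # read miss: fetch the block
                transactions += 1
                read_misses += 1
                holders.add(proc)
        elif protocol == "invalidate":
            if holders != {proc}:            # write miss or write to a shared block
                transactions += 1            # one invalidation for the whole block
                holders = {proc}
        else:                                # write update
            if proc not in holders:          # write miss: fetch the block first
                transactions += 1
                holders.add(proc)
            if len(holders) > 1:             # every write to a shared block is broadcast
                transactions += 1
    return transactions, read_misses


# Both processors read the block, P1 writes it four times, then P2 reads it again.
trace = [("r", "P1"), ("r", "P2")] + [("w", "P1")] * 4 + [("r", "P2")]
print(count_bus_traffic(trace, "invalidate"))
print(count_bus_traffic(trace, "update"))
```

For this trace the invalidate variant needs fewer bus transactions (4 against 6), while the update variant avoids the final read miss on P2 (2 read misses against 3), which matches the two "pro" lists.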
Write Invalidate Implementation

Block invalidation
- the key to the implementation is to get access to the memory bus
- use it to invalidate a block, i.e., the processor sends the block address through the bus
- the other processors are snooping on the bus, and
- watching whether they have that block in their caches, i.e., by checking the address and invalidating the block
Serialized writing
- the need to get access to the bus, as an exclusive resource, forces the serialization of the writes

Locating a data item on a cache miss
Write through cache
- all written data are always sent to the memory
- the most recent value of a data item can always be fetched from memory
Write back cache
- if the block is clean
  - acts like write through
- if the block is dirty
  - sends the dirty block in response to the read request
  - aborts the memory read/fetch operation

A simple protocol with three states
- invalid: as if not present in the cache
- shared: present in one or more private caches (potentially shared)
  - not necessarily in the cache of the processor which requested the data
- modified/exclusive: updated in the private cache / present only in one cache
  - not necessarily in the cache of the processor which requested the data
The status changes from invalid to shared on the first read of the block, even if there is only one copy
And on the first write it becomes exclusive

Write invalidate, cache coherence protocol FSM for a private write back cache. Hits/misses in local cache
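Since the FSM figure itself is not reproduced in this transcript, the three-state protocol can be sketched as a transition table for one block in one private write back cache. The transitions below follow the common textbook form of the invalid/shared/modified protocol and are an assumption, not a copy of the slide's diagram; the event names (cpu_read, cpu_write, bus_read_miss, bus_invalidate) and the helper run are made up for the illustration.

```python
# (current state, event) -> (next state, action taken by this cache)
# Assumed textbook-style transitions; event and action names are hypothetical.
TRANSITIONS = {
    ("invalid", "cpu_read"): ("shared", "place read miss on bus"),
    ("invalid", "cpu_write"): ("modified", "place write miss on bus"),
    ("invalid", "bus_read_miss"): ("invalid", "none"),
    ("invalid", "bus_invalidate"): ("invalid", "none"),
    ("shared", "cpu_read"): ("shared", "read hit"),
    ("shared", "cpu_write"): ("modified", "place invalidate on bus"),
    ("shared", "bus_read_miss"): ("shared", "none"),
    ("shared", "bus_invalidate"): ("invalid", "drop copy"),
    ("modified", "cpu_read"): ("modified", "read hit"),
    ("modified", "cpu_write"): ("modified", "write hit"),
    ("modified", "bus_read_miss"): ("shared", "write back block"),
    ("modified", "bus_invalidate"): ("invalid", "write back block"),
}


def run(events, state="invalid"):
    """Feed a sequence of events to the FSM; return the final state and the states visited."""
    visited = [state]
    for event in events:
        state, _action = TRANSITIONS[(state, event)]
        visited.append(state)
    return state, visited


# First read, first write, then another cache reads and later writes the block.
final_state, visited = run(["cpu_read", "cpu_write", "bus_read_miss", "bus_invalidate"])
print(visited)
```

The visited states show the behaviour stated above: the first read moves the block from invalid to shared, and the first write makes it modified/exclusive; the two snooped events then bring it back down to shared and invalid.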
Directory-based Protocol

Is there any cache coherence problem wrt DSM, i.e., with shared address space?

Directory protocol: an alternative to a snooping-based coherence protocol
A directory keeps the state of every block that may be cached
Information in the directory includes
- which caches have copies of the block
- whether it is dirty
- block status, i.e., shared, uncached, modified

Solution: distribute the directory along with the memory
- different coherence requests can go to different directories
- just as different memory requests go to different memories
Each directory is responsible for tracking the caches that share the memory addresses of the memory portion in its node

Directory added to each node to implement cache coherence in DSM

The state diagrams are the same as those used in snooping
Implementation based on message passing between nodes, rather than snooping on the bus
- needs to know which node has the block to make the invalidation
Each processor keeps information on which processors have each memory block
- field with an associated bit for each system processor for each memory block

Multicomputers

Is there any cache coherence problem wrt multicomputers based on message passing?
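The directory bookkeeping described above (a state per block plus one bit per processor) can be sketched as follows. This is a simplified illustration under assumptions: a single directory, the states uncached/shared/modified, and hypothetical names (Directory, read_miss, write_miss, and the message labels "fetch", "invalidate", "fetch_invalidate"). Instead of snooping, the directory returns the point-to-point messages it would send.

```python
class Directory:
    """Toy directory: per block, a state and a bit vector of sharers (hypothetical sketch)."""

    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.entries = {}                     # block -> [state, sharer_bits]

    def _entry(self, block):
        return self.entries.setdefault(block, ["uncached", 0])

    def sharers(self, block):
        bits = self._entry(block)[1]
        return [p for p in range(self.num_procs) if (bits >> p) & 1]

    def read_miss(self, block, proc):
        entry = self._entry(block)
        messages = []
        if entry[0] == "modified":            # the owner must supply the dirty block
            owner = self.sharers(block)[0]
            messages.append(("fetch", owner))
        entry[0] = "shared"
        entry[1] |= 1 << proc                 # set the requester's bit
        return messages

    def write_miss(self, block, proc):
        entry = self._entry(block)
        kind = "fetch_invalidate" if entry[0] == "modified" else "invalidate"
        # Only the nodes whose bit is set receive a message.
        messages = [(kind, p) for p in self.sharers(block) if p != proc]
        entry[0] = "modified"
        entry[1] = 1 << proc                  # the writer becomes the only holder
        return messages


d = Directory(num_procs=4)
d.read_miss("B", 0)
d.read_miss("B", 2)
invalidations = d.write_miss("B", 1)          # processors 0 and 2 hold copies
fetches = d.read_miss("B", 3)                 # processor 1 owns the dirty block
print(invalidations, fetches, d.sharers("B"))
```

The bit vector is what lets the directory send invalidations only to the nodes that actually hold the block, rather than broadcasting on a shared bus.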