Shared Memory Architecture
Nadeem Kafi
Summary
This document presents an overview of shared memory architecture, including various types like UMA, NUMA, and COMA. It discusses concepts like cache coherence and different protocols used to maintain consistency in shared memory systems. The document also covers design considerations and example scenarios, providing a foundational understanding of shared memory.
Full Transcript
Chapter 4.1, 4.2, 4.3, 4.4, 4.5 - Hesham & Mostafa
Shared Memory Architecture
Nadeem Kafi [email protected]
All material is taken from the textbooks and the Internet.

Shared Memory Systems
Communication between tasks running on different processors is performed through writing to and reading from the global memory. All inter-processor coordination and synchronization is also accomplished via the global memory.

Characteristics of shared memory systems:
- Any processor can directly reference any memory location.
- Communication occurs implicitly as a result of loads and stores.
- The location of data in memory is transparent to the programmer.
- Support is inherently provided on a wide range of platforms (standard processors today include dedicated hardware support for shared memory).
- Memory may be physically distributed among processors.

Shared Memory Systems
Two main problems need to be addressed when designing a shared memory system: (1) performance degradation due to contention, and (2) coherence problems. Performance degradation might happen when multiple processors try to access the shared memory simultaneously. A typical design might use caches to solve the contention problem. However, having multiple copies of data spread throughout the caches might lead to a coherence problem. The copies in the caches are coherent if they are all equal to the same value; however, if one of the processors writes over the value of one of the copies, that copy becomes inconsistent because it no longer equals the value of the other copies.

Classification of Shared Memory
- UMA (Uniform Memory Access): Each processor has an equal opportunity to read from and write to memory, with equal access speed.
- NUMA (Non-Uniform Memory Access): Memory access time depends on the location of the referenced memory; each processor reaches its local memory faster than remote memory.
- COMA (Cache-Only Memory Architecture): There is no memory hierarchy, and the address space is made up of all the caches. A cache directory (D) helps in remote cache access.

Shared Memory Requirements
- Support for memory coherency: the machine must make sure that all of the processing nodes have an accurate picture of the most up-to-date memory.
- Support for atomic operations on data: the machine must allow only one processor to change a data item at a time. Non-atomic operation: one processor requests data and, before the request is answered, another processor changes that data. (A small sketch illustrating this follows the bus-based SMP example below.)

Shared Memory Design
There are two types of interconnection network designs: bus-based and switch-based.

Bus-based SMP
The bus/cache architecture alleviates the need for expensive multi-ported memories and interface circuitry, as well as the need to adopt a message-passing paradigm when developing application software. However, the bus may get saturated if multiple processors try to access the shared memory (via the bus) simultaneously. A typical bus-based design uses caches to solve the bus contention problem. High-speed caches connected to each processor on one side and the bus on the other side mean that local copies of instructions and data can be supplied at the highest possible rate.

Hit rate & miss rate of a cache
If a request cannot be satisfied by the cache, the block must be copied from the global memory, across the bus, into the cache, and then passed on to the local processor.

Bus-based SMP
The maximum number of processors with cache memories that the bus can support is given by the relation

    N <= B / ((1 - h) * v)

where:
- N is the number of processors,
- B is the bandwidth of the bus (the maximum number of memory requests it can service per second),
- h is the hit rate of each cache, so (1 - h) is the miss rate, and
- v is the rate at which each processor generates memory references.

Example: a worked sketch follows.
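The sketch below works through the relation above in Python. All numeric values (hit rate, access times, bus bandwidth, per-processor reference rate) are illustrative assumptions, not figures from the slides; the sketch also computes the effective access time implied by the hit-rate/miss-rate discussion.

# Sketch: effective access time and the maximum number of processors
# a shared bus can support. All numbers are illustrative assumptions.

h = 0.95          # cache hit rate (assumed)
t_cache = 2e-9    # cache access time in seconds (assumed)
t_mem = 100e-9    # global memory access time in seconds (assumed)

# Effective access time: hits are served by the cache, misses by memory.
t_eff = h * t_cache + (1 - h) * t_mem
print(f"effective access time: {t_eff * 1e9:.1f} ns")   # 6.9 ns here

B = 1e9           # bus bandwidth: requests serviced per second (assumed)
v = 1e8           # references issued per processor per second (assumed)

# Each processor sends only its misses, (1 - h) * v requests/s, to the bus.
# The bus saturates when N * (1 - h) * v reaches B, so N <= B / ((1 - h) * v).
N_max = B / ((1 - h) * v)
print(f"maximum processors on the bus: {int(N_max)}")   # 200 here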
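The atomicity requirement listed earlier can also be made concrete in software. The sketch below, again an illustration rather than anything from the slides, shows how a non-atomic read-modify-write lets concurrent threads lose updates, and how a lock restores the one-writer-at-a-time guarantee that the hardware must provide.

import threading

counter = 0
lock = threading.Lock()

def increment(n, use_lock):
    global counter
    for _ in range(n):
        if use_lock:
            with lock:          # only one thread may modify at a time
                counter += 1
        else:
            counter += 1        # non-atomic: load, add, store can interleave

for use_lock in (False, True):
    counter = 0
    threads = [threading.Thread(target=increment, args=(100_000, use_lock))
               for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("with lock:" if use_lock else "without lock:", counter)

# Without the lock the total can fall short of 400000 (lost updates);
# with the lock it is always 400000.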
Caches and Cache Coherence
Caches play a key role in all cases:
- They reduce average data access time.
- They reduce the bandwidth demands placed on the shared interconnect.
But private processor caches create a problem:
- Copies of a variable can be present in multiple caches.
- A write by one processor may not become visible to the others, which will keep accessing the stale value in their caches.
This is the cache coherence problem, and actions must be taken to ensure visibility.

Cache Memory Coherence / Cache Coherence (figure slides)

Shared Memory System Coherence
The four combinations for maintaining coherence among all caches and global memory are:
- write-update with write-through,
- write-update with write-back,
- write-invalidate with write-through, and
- write-invalidate with write-back.

Snooping Protocols for Cache Coherence
Snooping protocols are based on watching bus activities and carrying out the appropriate coherence commands when necessary. Global memory is moved in blocks, and each block has a state associated with it that determines what happens to the entire contents of the block. The state of a block might change as a result of the operations Read-Miss, Read-Hit, Write-Miss, and Write-Hit. A cache miss means that the requested block is not in the cache, or that it is in the cache but has been invalidated. Snooping protocols differ in whether they update or invalidate shared copies in remote caches in the case of a write operation. They also differ as to where to obtain the new data in the case of a cache miss. (A minimal write-invalidate sketch appears at the end of this transcript.)

Directory-based Protocols
Updating or invalidating caches using snoopy protocols might become impractical, owing to the nature of some interconnection networks and the size of the shared memory system. For example, when a multistage network is used to build a large shared memory system, the broadcasting techniques used in snoopy protocols become very expensive. In such situations, coherence commands need to be sent only to those caches that might be affected by an update. This is the idea behind directory-based protocols.

Directory-based Protocols
Cache coherence protocols that store information on where copies of blocks reside are called directory schemes. A directory is a data structure that maintains information on the processors that share a memory block and on its state. The information maintained in the directory can be either centralized or distributed. A central directory maintains information about all blocks in a single data structure; while this keeps everything in one location, it becomes a bottleneck and suffers from large search times. To alleviate this problem, the same information can be handled in a distributed fashion by allowing each memory module to maintain a separate directory. In a distributed directory, the entry associated with a memory block has only one pointer to one of the caches that requested the block.

The remaining slides cover fully mapped directories, limited directories, chained directories, and invalidation protocols (centralized directory invalidation).
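To make the snooping discussion concrete, here is a minimal write-invalidate sketch. It models only valid/invalid cached copies with write-through to global memory; the class and method names are assumptions for illustration, not a protocol from the slides.

# Minimal write-invalidate, write-through snooping sketch.
# Each cache watches ("snoops") the shared bus; on a remote write it
# invalidates its own copy of the block. Illustrative only.

class Bus:
    def __init__(self):
        self.caches = []

    def broadcast_write(self, writer, block):
        # every other cache snoops the write and invalidates its copy
        for cache in self.caches:
            if cache is not writer:
                cache.invalidate(block)

class Cache:
    def __init__(self, name, bus, memory):
        self.name, self.bus, self.memory = name, bus, memory
        self.blocks = {}                       # block -> value (valid copies)
        bus.caches.append(self)

    def invalidate(self, block):
        self.blocks.pop(block, None)

    def read(self, block):
        if block in self.blocks:               # read hit
            return self.blocks[block]
        value = self.memory[block]             # read miss: fetch over the bus
        self.blocks[block] = value
        return value

    def write(self, block, value):
        self.blocks[block] = value
        self.memory[block] = value             # write-through to global memory
        self.bus.broadcast_write(self, block)  # remote copies are invalidated

memory = {"x": 0}
bus = Bus()
p1, p2 = Cache("P1", bus, memory), Cache("P2", bus, memory)

p1.read("x"); p2.read("x")   # both caches hold x = 0
p1.write("x", 42)            # P2's copy is invalidated
print(p2.read("x"))          # read miss -> fetches 42, not the stale 0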
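The directory idea can be sketched the same way. The toy fully mapped, centralized directory below keeps one entry per memory block recording which caches hold a copy, so invalidations are sent only to those caches instead of being broadcast; the structure and names are illustrative assumptions, not the textbook's exact protocol.

# Minimal fully mapped (centralized) directory sketch: coherence
# commands go only to caches that actually hold a copy. Illustrative only.

class Directory:
    def __init__(self, memory):
        self.memory = memory
        self.sharers = {}                    # block -> set of caches holding it

    def read(self, cache, block):
        self.sharers.setdefault(block, set()).add(cache)
        return self.memory[block]

    def write(self, cache, block, value):
        # invalidate only the recorded sharers (no broadcast needed)
        for sharer in self.sharers.get(block, set()) - {cache}:
            sharer.invalidate(block)
        self.memory[block] = value
        self.sharers[block] = {cache}        # writer is now the only holder

class Cache:
    def __init__(self, name, directory):
        self.name, self.directory = name, directory
        self.blocks = {}

    def invalidate(self, block):
        self.blocks.pop(block, None)

    def read(self, block):
        if block not in self.blocks:         # miss: go through the directory
            self.blocks[block] = self.directory.read(self, block)
        return self.blocks[block]

    def write(self, block, value):
        self.directory.write(self, block, value)
        self.blocks[block] = value

directory = Directory({"x": 0})
p1, p2, p3 = (Cache(n, directory) for n in ("P1", "P2", "P3"))
p1.read("x"); p2.read("x")      # directory records P1 and P2 as sharers
p3.write("x", 7)                # invalidations go only to P1 and P2
print(p1.read("x"))             # 7: fresh copy fetched via the directory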