Parallel Computer Architectures
Summary
This document provides a general overview of parallel computer architectures. It covers various types of parallel computation systems, including multiprocessor systems and their classification under Flynn's taxonomy, and describes architectures such as symmetric multiprocessors (SMPs), non-uniform memory access (NUMA) systems, clusters, and massively parallel processors (MPPs). It also touches on concepts such as fault tolerance and graceful degradation.
Full Transcript
Parallel Computer Architectures – Chapter 8: Multiprocessor Systems

Multiprocessor systems refer to the use of multiple processors that execute instructions simultaneously and communicate via shared memory.

Multiple Processor Organization – Flynn's Taxonomy

First proposed by Michael J. Flynn in 1966, Flynn's taxonomy is a classification of parallel computer architectures based on the number of concurrent instruction streams (single or multiple) and data streams (single or multiple) available in the architecture. It defines four categories:
- SISD – Single instruction, single data stream
- SIMD – Single instruction, multiple data stream
- MISD – Multiple instruction, single data stream
- MIMD – Multiple instruction, multiple data stream

SISD – Single Instruction, Single Data Stream

A single processor executes a single instruction stream to operate on data stored in a single memory.
- Data stored in a single memory
- Uniprocessor
[Figure: SISD organization – CU: control unit, PU: processing unit, MU: memory unit]

SIMD – Single Instruction, Multiple Data Stream

- A single instruction stream
- Multiple processing elements (PEs)
- Each processing element has associated local memory or shared memory

Each instruction is executed simultaneously on a different set of data by different processors. There is often a central controller that broadcasts the instruction stream to all the processing elements.
[Figure: SIMD organization – PE: processing element, LM: local memory, DS: data stream]

MISD – Multiple Instruction, Single Data Stream

- One sequence of data
- A set of processors
- Each processor executes a different instruction sequence
- Not much practical application

MIMD – Multiple Instruction, Multiple Data Stream

- A set of processors
- Simultaneously execute different instruction sequences
- On different sets of data

Examples:
- SMPs (Symmetric Multiprocessors)
- NUMA systems (Non-Uniform Memory Access)
- Clusters (groups of "partnering" computers)

MIMD systems are further divided by memory organization:
- Shared memory (SMP or NUMA)
- Distributed memory (clusters)

MIMD – Shared Memory (Tightly Coupled)

Processors share memory and communicate via that shared memory. A "tightly coupled" system usually:
- runs a single copy of the OS with a single job queue
- has a single address space
- has a single bus or backplane to which all processors and memories are connected
- has very low communication latency
- lets processors communicate through shared memory (see the code sketch at the end of this section)

[Figure: Block diagram of a tightly coupled multiprocessor]

Types of Tightly Coupled Systems

Symmetric multiprocessor (SMP):
- Processors share a single memory or pool of memory
- A shared bus is used to access memory
- Memory access time to a given region of memory is approximately the same for each processor

Non-uniform memory access (NUMA):
- Access times to different regions of memory may differ

Symmetric vs. asymmetric: in a symmetric organization all processors are peers; in an asymmetric organization one processor is the master and the others are slaves.

MIMD – Distributed Memory (Loosely Coupled Systems, Clusters)

- A collection of independent uniprocessors
- Each CPU runs an independent OS
- Communication via a local area network
- The system may appear different when viewed from different hosts
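To make the tightly coupled, shared-memory style described above concrete, here is a minimal sketch (not from the source slides) assuming a POSIX-threads environment: several threads, standing in for the processors of a shared-memory system, update a single counter that lives in one shared address space. The thread count and variable names are illustrative.

```c
/* Minimal sketch: threads stand in for processors of a tightly coupled
 * system and communicate through one shared address space. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4                       /* illustrative processor count      */

static long counter = 0;                 /* lives in the single shared memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);       /* coordinate through shared memory  */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);      /* one OS, one job queue: main just waits */
    printf("counter = %ld\n", counter);  /* every thread saw the same memory  */
    return 0;
}
```

Note the very low communication cost: "sending" data to another processor is just a store to shared memory that the other processor can load directly.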
Cluster (Loosely Coupled)

- A collection of independent whole uniprocessors or SMPs, usually called nodes
- Interconnected to form a cluster
- Working together as a unified resource, giving the illusion of being one machine
- Communication via fixed paths or network connections (see the message-passing sketch below)

Difference between Tightly Coupled and Loosely Coupled

- Tightly coupled: a multiprocessor; communication latency on the order of nanoseconds
- Loosely coupled: a multicomputer; communication latency on the order of microseconds

Cluster Benefits

- Absolute scalability: It is possible to create large clusters that far surpass the power of even the largest standalone machines. A cluster can have tens, hundreds, or even thousands of machines, each of which is a multiprocessor.
- Incremental scalability: A cluster is configured in such a way that it is possible to add new systems to the cluster in small increments. Thus, a user can start out with a modest system and expand it as needs grow, without having to go through a major upgrade in which an existing small system is replaced with a larger one.
- High availability: Because each node in a cluster is a standalone computer, the failure of one node does not mean loss of service. In many products, fault tolerance is handled automatically in software.
- Superior price/performance: By using commodity building blocks, it is possible to put together a cluster with equal or greater computing power than a single large machine, at much lower cost.

Memory Architecture of Multiprocessor Systems

- Shared memory: Uniform Memory Access (UMA) or Non-Uniform Memory Access (NUMA)
- Distributed memory

UMA Architecture

- Uniform memory access (UMA): all processors have the same latency when accessing memory.
- This architecture scales only to a limited number of processors.
- Most commonly represented today by Symmetric Multiprocessor (SMP) machines: identical processors with equal access, and equal access times, to memory.
- Sometimes called CC-UMA (Cache-Coherent UMA). Cache coherence means that if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level.

NUMA Architecture

- Not all processors have equal access time to all memories.
- Often built by physically linking two or more SMPs; one SMP can directly access the memory of another SMP, but memory access across the link is slower.
- Each processor has its own local memory. The memory of other processors is accessible, but the latency of accessing it is higher; this is called "remote memory access".

Fault Tolerance and Graceful Degradation

- Fault-tolerant systems are designed so that if a component fails or a network route becomes unusable, a backup component, procedure, or route can immediately take its place with no negative impact on users (implemented by hardware and software duplication).
- Graceful degradation is the ability of a computer, machine, electronic system, or network to maintain limited functionality even when a large portion of it has been destroyed or rendered inoperative. The purpose of graceful degradation is to prevent catastrophic failure.

Blade Servers

A more recent development: multiple processor boards, I/O boards, and networking boards are placed in the same chassis (a modular design optimized to minimize the use of physical space and energy). Each blade boots independently and runs its own OS and applications; some blades are themselves multiprocessors. In essence, these servers consist of multiple independent multiprocessor systems.
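In the loosely coupled cluster model, by contrast, each node has private memory and data moves only through explicit messages over the network. The following is a minimal sketch of that style, assuming the MPI message-passing library; the rank assignments and the payload value are illustrative, not taken from the source.

```c
/* Minimal sketch of cluster-style message passing: each node (MPI rank)
 * has its own private memory, and data moves only via explicit messages. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which node am I?      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many nodes total? */

    if (rank == 0 && size > 1) {
        int payload = 42;                   /* data in node 0's local memory */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d from node 0\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```

With a typical MPI installation this would be compiled with mpicc and launched across two cluster nodes with a command such as mpirun -np 2 ./a.out; exact tooling varies by MPI distribution. The microsecond-scale latency of the network message is what separates this model from the nanosecond-scale shared-memory access of a tightly coupled system.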
Beowulf Cluster

A Beowulf cluster is a cluster implemented on multiple identical commercial off-the-shelf computers connected with a TCP/IP Ethernet local area network.

Massively Parallel Processing (MPP)

- A massively parallel processor (MPP) is a single computer with many networked processors.
- MPPs have many of the same characteristics as clusters, but MPPs have specialized interconnect networks (whereas clusters use commodity hardware for networking).
- MPPs also tend to be larger than clusters, typically having "far more" than 100 processors.
- In an MPP, "each CPU contains its own memory and copy of the operating system and application. Each subsystem communicates with the others via a high-speed interconnect."
- Blue Gene/L is an example of a massively parallel processor.

Grid Computing

- Grid computing is the most distributed form of parallel computing. It makes use of computers communicating over the Internet to work on a given problem.
- Most grid computing applications use middleware: software that sits between the operating system and the application to manage network resources and standardize the software interface. The most common grid computing middleware is the Berkeley Open Infrastructure for Network Computing (BOINC).
- Often, grid computing software makes use of "spare cycles", performing computations at times when a computer is idling.
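Both Beowulf clusters and grid computing rely on ordinary TCP/IP networking rather than a specialized interconnect. As a rough illustration of that communication style, here is a minimal sketch using POSIX sockets; the port number, message text, and command-line interface are illustrative assumptions, and error handling is omitted for brevity.

```c
/* Minimal sketch of node-to-node communication over commodity TCP/IP,
 * the kind of networking a Beowulf cluster or grid application relies on.
 * Run "./node server" on one machine and "./node client <server-ip>" on another. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT 5000                                            /* illustrative port */

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s server|client <server-ip>\n", argv[0]);
        return 1;
    }
    if (strcmp(argv[1], "server") == 0) {
        int ls = socket(AF_INET, SOCK_STREAM, 0);            /* listening socket  */
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(PORT);
        bind(ls, (struct sockaddr *)&addr, sizeof addr);
        listen(ls, 1);
        int conn = accept(ls, NULL, NULL);                   /* wait for a partner node */
        char buf[64] = {0};
        recv(conn, buf, sizeof buf - 1, 0);                  /* receive its message     */
        printf("server received: %s\n", buf);
        close(conn);
        close(ls);
    } else {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(PORT);
        inet_pton(AF_INET, argc > 2 ? argv[2] : "127.0.0.1", &addr.sin_addr);
        connect(s, (struct sockaddr *)&addr, sizeof addr);   /* reach the server node   */
        const char *msg = "partial result from worker node";
        send(s, msg, strlen(msg), 0);                        /* ship data over the LAN  */
        close(s);
    }
    return 0;
}
```

Libraries such as MPI (for clusters) and middleware such as BOINC (for grids) layer scheduling, fault handling, and data distribution on top of exactly this kind of commodity transport.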