
Lecture (5): Message Passing Architectures

Interconnection Networks
- Static Networks: consist of point-to-point communication links between processors. Also referred to as direct networks.
- Dynamic Networks: built using switches and communication links; the links are connected to one another dynamically by the switches. Also referred to as indirect networks.

Network Criteria (Static Networks)
- Latency: time to transfer a message through the network.
  - Network latency: transmission time.
  - Communication latency: transmission time + overhead.
- Bandwidth: number of bits that can be transmitted per second.
- Connectivity: number of paths between any two processors.
- Cost: number of links in the network.
- Ease of construction: regularity of the network.

Comparing Static Networks
- Diameter: maximum shortest path between any two processors (gives the worst-case latency).
- Bisection width: minimum number of links that must be cut to partition the network into two equal (or almost equal) halves (indicates potential communication bottlenecks).
- Bisection bandwidth: bisection width times the bandwidth of a link (or the sum of the bandwidths of the cut links).

Communication Model of Parallel Platforms
There are two primary forms of data exchange between parallel tasks: accessing a shared data space, and exchanging messages (message passing). Platforms that provide a shared data space are called shared-address-space machines or multiprocessors. Platforms that support messaging are called message-passing platforms or multicomputers. In shared-address-space architectures, communication is implicitly specified, since some (or all) of the memory is accessible to all processors.

Shared-Address-Space Platforms
All of the memory is accessible to all processors. Processors interact by modifying data objects stored in this shared address space.
If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise it is a non-uniform memory access (NUMA) machine.

NUMA and UMA Shared-Address-Space Platforms
Typical shared-address-space architectures: (a) uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only.

Multiprocessors can be categorized into three shared-memory models:
1. Uniform Memory Access (UMA)
2. Non-uniform Memory Access (NUMA)
3. Cache-only Memory Access (COMA)

UMA vs NUMA:
1. UMA stands for Uniform Memory Access; NUMA stands for Non-uniform Memory Access.
2. In UMA, a single memory controller is used; in NUMA, multiple memory controllers are used.
3. UMA is slower than NUMA; NUMA is faster than UMA.
4. UMA has limited bandwidth; NUMA has more bandwidth than UMA.
5. UMA is suited to general-purpose and time-sharing applications; NUMA is suited to real-time applications.
6. In UMA, memory access time is balanced (equal); in NUMA, memory access time is not equal.
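The static-network comparison metrics introduced earlier (diameter and bisection width) can be computed directly for small topologies. A minimal Python sketch, using a hypothetical 4-node ring as the test network; the brute-force bisection search is only practical for tiny graphs:

```python
from collections import deque
from itertools import combinations

def diameter(adj):
    """Maximum over all node pairs of the shortest-path length (in hops)."""
    best = 0
    for src in adj:
        dist = {src: 0}          # BFS from src
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

def bisection_width(adj):
    """Minimum number of links cut to split the network into two equal halves.
    Brute force over all balanced partitions -- fine for tiny example graphs."""
    nodes = list(adj)
    best = None
    for side_a in combinations(nodes, len(nodes) // 2):
        a = set(side_a)
        # Count edges with exactly one endpoint inside the chosen half.
        cut = sum(1 for u in a for v in adj[u] if v not in a)
        best = cut if best is None else min(best, cut)
    return best

# 4-node ring: 0 - 1 - 2 - 3 - 0
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diameter(ring))         # 2
print(bisection_width(ring))  # 2
```

For the ring, the farthest pair of nodes is two hops apart (diameter 2), and any balanced cut must sever at least two links (bisection width 2), matching the definitions above.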
Message-Passing Platforms
These platforms comprise a set of processing nodes, each with its own (exclusive) memory. Instances of such a view arise naturally from clustered workstations, also known as non-shared-address-space multicomputers. These platforms are implemented/programmed using (variants of) send and receive message primitives.

Message Passing vs. Shared-Address-Space Platforms
Message passing requires little hardware support other than a network. A shared-address-space platform can easily emulate message passing; the reverse, emulating a shared address space on a message-passing platform, is more difficult to do in an efficient manner.

Communication Costs in Parallel Machines
The time taken to communicate a message between two nodes in a network is the sum of the time to prepare the message for transmission and the time taken by the message to traverse the network to its destination.
- Startup time (ts): the time required to handle a message at the sending and receiving nodes. This includes the time to prepare the message (adding header, trailer, and error-correction information).
- Per-hop time (th): after a message leaves a node, it takes a finite amount of time to reach the next node in its path. The time taken by the header of a message to travel between two directly connected nodes is called the per-hop time.
- Per-word transfer time (tw): if the channel bandwidth is r words per second, each word takes time tw = 1/r to traverse the link. This time includes network as well as buffering overheads.

Suppose that a message of size m words is transmitted through such a network and traverses l links. At each link, the message incurs a cost th for the header and tw*m for the rest of the message to traverse the link. Since there are l such links, this contributes l(th + tw*m) to the total time.
The total communication cost for a message of size m words to traverse l communication links is therefore:

    tcomm = ts + l(th + tw*m)

In current parallel computers, the per-hop time th is quite small. For most parallel algorithms it is less than tw*m even for small values of m, and thus can be ignored, giving tcomm ≈ ts + l*tw*m.

Physical Organization of Parallel Platforms
We begin this discussion with an ideal parallel machine called the Parallel Random Access Machine, or PRAM. The PRAM is an abstract model of parallel computation which assumes that all processors operate synchronously under a single clock and can randomly access a large shared memory. A PRAM consists of p processors and a global memory of unbounded size that is uniformly accessible to all processors. Processors share a common clock but may execute different instructions in each cycle.

Interconnection Networks for Parallel Computers
Interconnection networks carry data between processors and to memory. Interconnects are made of switches and links (wires, fiber) and are classified as static or dynamic. Static networks consist of point-to-point communication links among processing nodes and are also referred to as direct networks. Dynamic networks are built using switches and communication links and are also referred to as indirect networks.

Classification of interconnection networks: (a) a static network; (b) a dynamic network.

Switches map a fixed number of inputs to outputs. The total number of ports on a switch is the degree of the switch (degree = Ninput + Noutput). The cost of a switch grows as the square of its degree. Processors talk to the network via a network interface, which may hang off the I/O bus or the memory bus. The relative speeds of the I/O and memory buses impact the performance of the network.
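The communication cost model above can be sketched as a small calculator. A minimal Python example; the parameter values below are made-up illustrations, not measurements from any real machine:

```python
def comm_cost(ts, th, tw, m, l, ignore_hop=False):
    """Store-and-forward communication cost for an m-word message
    crossing l links: tcomm = ts + l*(th + tw*m).
    With ignore_hop=True, drop the per-hop term th (usually negligible)."""
    if ignore_hop:
        return ts + l * tw * m
    return ts + l * (th + tw * m)

# Hypothetical parameters: 50 us startup, 1 us per hop, 0.5 us per word.
full   = comm_cost(ts=50.0, th=1.0, tw=0.5, m=100, l=4)
approx = comm_cost(ts=50.0, th=1.0, tw=0.5, m=100, l=4, ignore_hop=True)
print(full)    # 254.0
print(approx)  # 250.0
```

Note how small the gap between the two results is: with th much smaller than tw*m, dropping the per-hop term changes the estimate by under 2%, which is why the text says th can usually be ignored.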

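As noted earlier, message-passing platforms are programmed with variants of send and receive. A minimal sketch of that style using Python's multiprocessing Pipe as a stand-in for a real message-passing layer such as MPI; the worker task and payload are hypothetical:

```python
from multiprocessing import Pipe, Process

def worker(conn):
    """Child task: receive a message, process it, send the result back."""
    data = conn.recv()      # blocking receive from the parent
    conn.send(sum(data))    # explicit send of the reply
    conn.close()

def run_round_trip(payload):
    """Send payload to a child process and receive its reply."""
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(payload)    # explicit send
    result = parent_end.recv()  # explicit receive
    p.join()
    return result

if __name__ == "__main__":
    print(run_round_trip([1, 2, 3, 4]))  # 10
```

The key point mirrored from the lecture: no memory is shared between the two processes; all interaction happens through the explicit send/recv pair.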