Parallel Programming Lectures PDF
Document Details
Uploaded by StimulativeNebula9716
Tags
Related
- dokumen.pub_parallel-programming-techniques-and-applications-using-networked-workstations-and-parallel-computers-2nd-ed-0131405632-9780131405639.pdf
- Message Passing Computing Lecture PDF
- Multi-Threading - Concurrent & Parallel Programming in Java PDF
- COMP 3730 Intro to Parallel Programming Fall 2024 Lecture 1 PDF
- Parallel Programming PDF
- MPI Programming Lectures PDF
Summary
These lecture notes cover various aspects of parallel programming. It starts with the historical context of generations of computing hardware before advancing to relevant models and concepts. Topics discussed include parallelism, processes, threads, communication patterns and message passing interface (MPI).
Full Transcript
Lecture 1 Generations First gen (1940 - 1956): Vacuum tubes Second gen (1956 - 1963): Transistors Third gen (1964 - 1971): Integrated circuits Fourth gen (1971 - present): Microprocessors Fifth gen (present - beyond): A.I. Basic Parallel Models (Flynn's Taxonomy) I - Inst...
Lecture 1 Generations First gen (1940 - 1956): Vacuum tubes Second gen (1956 - 1963): Transistors Third gen (1964 - 1971): Integrated circuits Fourth gen (1971 - present): Microprocessors Fifth gen (present - beyond): A.I. Basic Parallel Models (Flynn's Taxonomy) I - Instruction; D - data Visualization SISD SIMD MIMD MISD - Uniprocessors - SSE instruction of x86 - Multiprocessor architecture Doesn't exist - Pentium - Vector processors Mutliprocessor architectures Shared memory Distributed memory Hybrid --------------- interconnection interconnection memory --------------- --------------- --------------- mem \ mem \ mem mem \ mem CPU \ CPU \ CPU --------------- --------------- CPU \ CPU \ CPU CPU \ CPU \ CPU Processes and Threads Process: Instance of the OS to execute a program Thread Smallest unit of processing; sequence of instructions Has heap (dynamic data); stack (local data); access to global static data Parallel programming models Message passing Parallel procs exchange data by passing messages ex: PVM; MPI Shared memory Parallel threads share a global address space ex: POSIX threads, OpenMP Implicit Process interaction is not visible to the programmer ex: PGAS, GA Formulas t(1) - time on 1 core/CPU t(n) - time on n core/CPUs Efficiency: t(1) E(n) = ⋅ 100% n ⋅ t(n) Speed-up: t(1) S(n) = t(n) Scalability: Strong scaling - problem size = const when ↑n Weak scaling - problem size ~ n Amdahl's law: a - sequential portion p - number of procs 1 Sp = 1−a a+ p Lecture 2 MPI Terminology - Basics MPI - Message Passing Interface Task - instance; sub-program or process of MPI program Group - ordered set of process identifiers Rank - unique number assigned to each task Context - property that allows the partitioning of the communication space Communicator - scope for communication operations between groups; Intra and Inter communication visuals Four types of MPI functions: Initialization and management calls Point to point communication Collective operations Data type creation hello_world.cpp #include #include void main(int argc, char *argv[]) { int rank, size; MPI_Init(&argc, &argv); // Connects the program to the procs; Creates the global communicator MPI_Comm_rank(MPI_COMM_WORLD, &rank); // To access rank and size for a comm MPI_Comm_size(MPI_COMM_WORLD, &size); printf("I am %d of %d\n", rank, size); MPI_Finalize(); // Cleans up all data from procs return 0; } Lecture 3 MPI Terminology - Properties of Procedures: Blocking communication: Program waits until the communication is complete before continuing. Simpler, but can lead to idling. (Ringing to someone's phone). Visualization Nonblocking communication: Program continues immediately after initiating communication. More complex, but allows overlapping communication and computation. Requires a separate check for completion. (Sending emails). Visualization Parts of messages: Data part (BUF; COUNT; DATATYPE) Message envelop (SOURCE; DEST; TAG; COMM) Communication modes: Send modes: Synchronous send Buffered send Standard send Ready send - Only completes when - Copies content of sent - Eutomatically chooses - Sends bit by bit as soon as the receive has started message to proc's user-defined between Ssend, Bsend Recv is ready buffer - Uses internal buffer - Always completes irrespective - Always completes irrespective of whether a receive has been of whether a receive has been posted or not posted or not - High latency; good - Low latency; low bandwidth - Minimal transfer time - Only use if logic of your bandwidth program permits it (receive is - Risk of idle times; posted before send) serialization; deadlocks Receive modes: Only one receive call for all modes Wildcards used for receiving messages: MPI_ANY_SOURCE - receiving from any source MPI_ANY_TAG - receiving message with any tag Deadlock The situation where processes are waiting for sends and receives which can never be posted. Example: // For the same rank: MPI_Ssend(..., dest = right_neighbor,...) MPI_Recv(..., source = left_neighbor,...) To avoid deadlocks: Use odd-even separation of processors' communication Use nonblocking communication Lecture 4 Nonblocking communication: Goal: overlapping communication and computation Communication modes: Synchronous send Buffered send Standard send Ready send MPI_Issend MPI_Ibsend MPI_Isend MPI_Irsend Receive Probe MPI_Irecv MPI_Iprobe The REQUEST handle The value is used by MPI_WAIT or MPI_TEST to test when the communication has completed. If the communication has completed, the request handle is set to MPI_REQUEST_NULL Blocking operations Nonblocking operations - Avoids deadlock - Decreases synchronization overheaed A blocking send can be used with nonblocking receive (vice versa too) Nonblocking operation followed by a matching MPI_Wait is equivalent to blocking operations Lecture 5 Characteristics of collective communication Synchronization may or may not occur No tags are used Receive buffers must gave exactly same size as send buffers MPI_Abort Terminates all MPI processes associated with the communicator comm. If the communicator is invalid function is called on MPI_COMM_WORLD Lecture 6 Global reduction operators Associative Order in which sub-reductions are performed is not defined Predefined reduction operation handles They're all associative and commutative Handle Function MPI_MAX Max of all values MPI_MIN Min of all values MPI_SUM Sum of all values MPI_PROD Product of all values MPI_LAND Logical AND MPI_BAND Bitwise AND MPI_LOR Logical OR Handle Function MPI_BOR Bitwise OR MPI_LXOR Logical exclusive OR MPI_BXOR Bitwise exclusive OR MPI_MAXLOC Max and it's location MPI_MINLOC Min and it's location MPI_SUM visualization user_defined_op_example.cpp void myProd(Complex *in, Complex *inout, int *len, MPI_Datatype *dptr) { int i; Complex c; for (i=0; i< *len; ++i) { c.real = inout->real * in->real - inout->imag * in->imag; c.imag = inout->real * in->imag + inout->imag * in->real; *inout = c; in++; inout++; } } Complex a, answer; MPI_Op myOp; MPI_Datatype ctype; MPI_Type_contiguous(2, MPI_DOUBLE, &ctype); MPI_Type_commit(&ctype); MPI_Op_create(myProd, True, &myOp); MPI_Reduce(a, answer, 100, ctype, myOp, root, comm); Lecture 7 Only syntax Lecture 8 Only syntax Lecture 9 Topologies Topologies are how the processes are connected MPI's virtual topologies map the program structure. Independent of the actual hardware network. Topologies are almost essential if: You are writing structure-generic libraries Your program has a variable graph structure Benefits: Applications have specific communication patterns Toplogies advice plans to the program when it's running Types of topologies: 1. Cartesian 2. Graph Cartesian topologies No dimensions periodic => grid All dimesions periodic => torus Some dimensions periodic => cylinder Lecture 10 Graph topology Uses hierarchical systems Communicators can be intra (used for communicating within a single group of processes) or inter (used for communication between two disjoin group of processes). Default communicators - MPI_COMM_WORLD , MPI_COMM_SELF Elements of Graph topology Visualization Neighbors has to be listed in ascending order process edges nneighbors index 0 1 1 1 1 0, 2 2 1+2=3 2 1, 3 2 1+2+2=5 3 2 1 1 + 2 + 2 +1 = 6 nnodes = 4 index = 1, 3, 5, 6 edges = 1, 0, 2, 1, 3, 2 Lecture 11 Derived datatypes These datatypes are derived from basic datatypes using constructors. MPI datatypes are opaque objects Any derived datatype is defined by: A list of basic datatypes A list of displacements (positive, zero, negative) General type map: Datatype Displacement datatype 0 displacement of datatype 0 datatype 1 displacement of datatype 1...... Example 0--4--8--12--16----24----32----40----48---- ///|//|//| |/////|/////|/////|/////|////| ----------^^^------------------------------ hole Datatype Displacement MPI_INT 0 MPI_INT 4 MPI_INT 8 MPI_DOUBLE 16 MPI_DOUBLE 24 MPI_DOUBLE 32 MPI_DOUBLE 40 MPI_DOUBLE 48 Size is the size of datatype without holes Extent is the span from the lower bound to the upperbound (with holes). Holes ath the end of the new type are not counted Example visualiztation