Full Transcript

Parallel Algorithm Design Parallel Algorithm Design Look at Ian Foster’s Methodology (PCAM) Partition – Decompose the problem – Identify the concurrent tasks (multiple tasks that progress at the same time in the worker system but are not executed at the same time.) –...

Parallel Algorithm Design Parallel Algorithm Design Look at Ian Foster’s Methodology (PCAM) Partition – Decompose the problem – Identify the concurrent tasks (multiple tasks that progress at the same time in the worker system but are not executed at the same time.) – Often the most difficult step Communication – Often dictated by partition We will focus on Partitioning and Mapping Agglomeration (clustering/blocking) – Often not much you can do here Mapping – Difficult problem – Load Balancing Preliminaries an action or event preceding or preparing for something more important A given problem may be partitioned in many different ways. Tasks may be the same, different, or even non determined sizes – Coarse grain – large tasks – Fine grain – very small tasks Often, partitioning's are illustrated in the form of a “task dependency graph” – Directed graph – Nodes are tasks – Edges denote that the result of one task is needed for the computation of the result in another task Task Dependency Graph Can be a graph or adjacency matrix Preliminaries Degree of Concurrency – The number of tasks that can be executed in parallel Maximum Degree of Concurrency – the maximum number of such tasks at any point during execution – Since the number of tasks that can be executed in parallel may change over program execution,. Average degree of concurrency – the average number of tasks that can be processed in parallel over the execution of the program. The degree of concurrency increases as the decomposition becomes finer in granularity and vice versa. – When viewed strictly from a “number of tasks” perspective. – However, the number of concurrent operations may not follow this relationship Preliminaries A directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other. – The longest path determines the shortest time in which the program can be executed in parallel. – The length of the longest path in a task dependency graph is called the critical path length. Task 4 Task 3 Task 2 Task 1 Task 4 Task 3 Task 2 Task 1 10 10 10 10 10 10 10 10 6 Task 5 9 Task 6 6 Task 5 11 Task 6 27 8 Task 7 7 Task 7 34 (a) (b) at are the critical path lengths? If each task takes 10 time units, what is the shortest parallel execution Limits on Parallel Performance It would appear that the parallel time can be made arbitrarily small by making the decomposition finer in granularity. There is an inherent bound on how fine the granularity of a computation can be. – For example, in the case of multiplying a dense matrix with a vector, there can be no more than (n2) concurrent tasks. Concurrent tasks may also have to exchange data with other tasks. This results in communication overhead. The tradeoff between the granularity of a decomposition and associated overheads often determines performance bounds. Partitioning Techniques There is no single recipe that works for all problems. We can benefit from some commonly used techniques: – Recursive Decomposition – Data Decomposition – Exploratory Decomposition – Speculative Decomposition Recursive Decomposition Generally suited to problems that are solved using a divide and conquer strategy. Decompose based on sub-problems Often results in natural concurrency as sub-problems can be solved in parallel. Need to think recursively – parallel not sequential Recursive Decomposition: Quicksort Once the list has been partitioned around the pivot, each sublist can be processed concurrently. Once each sublist has been partitioned around the pivot,each sub-sublist can be processed concurrently. Recursive Decomposition: Finding the Min/Max/Sum Any associative and commutative operation. 1. procedure SERIAL_MIN (A,n) 2. begin 3. min = A; 4. for i:= 1 to n−1 do 5. if (A[i]

Use Quizgecko on...
Browser
Browser