Parallel Computing Lecture 3 PDF
Document Details
New Jersey Institute of Technology
David A. Bader
Summary
This document is a lecture on applications of parallel computing. It covers parallelism within a single processor: instruction-level parallelism (ILP), pipelining, single instruction multiple data (SIMD) units, and fused multiply-add (FMA) instructions. It also features a case study on matrix multiplication and a discussion of optimization in practice.
Full Transcript
DS 642: Applications of Parallel Computing
Lecture 3
02/05/2024
http://www.cs.njit.edu/~bader

Outline
- Processors and registers
- Memory hierarchies
- Parallelism within single processors
  - Instruction Level Parallelism (ILP) and Pipelining
  - SIMD units
  - Special Instructions (FMA)
- Case study: Matrix Multiplication
- Optimization in practice

What is Pipelining?
Dave Patterson's Laundry example
Latency: wash (30 min) + dry (40 min) + fold (20 min) = 90 min
[Figure: timeline from 6 PM to 9 PM showing loads A, B, C, D in task order, with stage durations 30, 40, 40, 40, 40, 20 min]
In this example (4 loads of laundry):
- Sequential execution takes 4 * 90 min = 6 hours
- Pipelined execution takes 30 + 4*40 + 20 min = 3.5 hours
Bandwidth = loads/hour
= 4/6 l/h w/o pipelining
= 4/3.5 l/h w pipelining
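The laundry arithmetic can be checked with a small simulation. The sketch below is illustrative only and is not part of the lecture: the choice of Python, the function names `sequential_minutes` and `pipelined_minutes`, and the modeling assumptions (one washer, one dryer, one folder, and a load may wait between stages until the next stage is free) are all assumptions made here.

```python
def sequential_minutes(stage_minutes, loads):
    """Total time when each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Total time when loads overlap, with one unit per stage.

    Assumption (not from the lecture): a load that finishes a stage early
    simply waits until the next stage's unit becomes free.
    """
    stage_free_at = [0] * len(stage_minutes)  # time each stage is next available
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for stage, duration in enumerate(stage_minutes):
            start = max(t, stage_free_at[stage])
            t = start + duration
            stage_free_at[stage] = t
    return stage_free_at[-1]


if __name__ == "__main__":
    stages = [30, 40, 20]  # wash, dry, fold (minutes), as in the laundry example
    loads = 4
    for label, minutes in [
        ("sequential", sequential_minutes(stages, loads)),
        ("pipelined", pipelined_minutes(stages, loads)),
    ]:
        hours = minutes / 60
        print(f"{label}: {hours} hours, bandwidth {loads / hours:.2f} loads/hour")
```

Under these assumptions the simulation gives 360 minutes sequentially and 210 minutes pipelined, matching 4 * 90 min = 6 hours and 30 + 4*40 + 20 min = 3.5 hours. The pipelined total is governed by the slowest stage (the 40-minute dryer), which is why that term is multiplied by the number of loads.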