20240129-DS642-Lecture-02.pdf
DS 642: Applications of Parallel Computing
Lecture 2, 01/29/2024
http://www.cs.njit.edu/~bader

The Motifs of Parallel Computing

How to cover all applications?
- Phil Colella's famous 7 "dwarfs" of scientific computing (simulation): Dense Linear Algebra, Sparse Linear Algebra, Particle Methods, Structured Grids, Unstructured Grids, Spectral Methods (e.g. FFT), Monte Carlo.
- Source: Colella's 2004 DARPA presentation, "Defining Software Requirements for Scientific Computing".

Motifs: Common Computational Methods
- What do commercial and CSE applications have in common?
- [Heat map: rows are the 13 motifs (1 Finite State Mach., 2 Combinational, 3 Graph Traversal, 4 Structured Grid, 5 Dense Matrix, 6 Sparse Matrix, 7 Spectral (FFT), 8 Dynamic Prog, 9 N-Body, 10 MapReduce, 11 Backtrack/B&B, 12 Graphical Models, 13 Unstructured Grid); columns are application areas (HPC, ML, Games, DB, SPEC, Embed, Health, Image, Speech, Music, Browser); red = hot/important, blue = cool/not important.]
- Source: the Berkeley View paper, "A View of the Parallel Computing Landscape", 2006.

Analytics vs. Simulation Motifs (National Academies, 2013)
- 7 Giants of Data vs. 7 Dwarfs of Simulation:
  - Basic statistics | Monte Carlo methods
  - Generalized N-Body | Particle methods
  - Graph-theory | Unstructured meshes
  - Linear algebra | Dense Linear Algebra
  - Optimizations | Sparse Linear Algebra
  - Integrations | Spectral methods
  - Alignment | Structured Meshes

Motifs of Genomic Data Analysis
- These computational patterns dominate ExaBiome application problems:
  - Overlap: find all overlaps in a set
  - Assembly: find / correct overlaps
  - Distance: how good are overlaps
  - Index: lookup in database
- Large shared memory platforms were most common, which limits science questions and approaches.
- Source: Yelick et al., "The Parallelism Motifs of Genomic Data Analysis", Philosophical Transactions A, 2020.

Why the Fastest Computers are Parallel Computers
- All of them, since 2005, including laptops and handhelds.

Tunnel Vision by Experts
- "I think there is a world market for maybe five computers." Thomas Watson, chairman of IBM, 1943.
- "There is no reason for any individual to have a computer in their home." Ken Olsen, president and founder of Digital Equipment Corporation, 1977.
- "640K [of memory] ought to be enough for anybody." Bill Gates, chairman of Microsoft, 1981.
- "On several recent occasions, I have been asked whether parallel computing will soon be relegated to the trash heap reserved for promising technologies that never quite make it." Ken Kennedy, CRPC Director, 1994.
- Slide source: Warfield et al.

Technology Trends: Microprocessor Capacity
- 2X transistors/chip every 1.5 years, called "Moore's Law".
- Microprocessors have become smaller, denser, and more powerful.
- Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.
- Slide source: Jack Dongarra.

Microprocessor Transistors / Clock (1970-2000)
- [Chart: transistors (thousands) and clock frequency (MHz) on a logarithmic scale, 1970-2000.]

Microprocessor Transistors / Clock
- [Chart. Source: Horace He (posted on Twitter).]

But the actual Moore's law seems alive (so far)
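As a quick illustration of the doubling rule quoted on the Technology Trends slide above, here is a minimal Python sketch (not from the slides); the 1971 starting point of roughly 2,300 transistors for the Intel 4004 is an assumption added for illustration:

```python
# Projected transistors per chip if density doubles every 1.5 years
# ("2X transistors/chip every 1.5 years", as quoted on the slide).

def projected_transistors(year, base_year=1971, base_count=2.3e3,
                          doubling_period_years=1.5):
    """Transistor count projected forward from an assumed 1971 baseline."""
    doublings = (year - base_year) / doubling_period_years
    return base_count * 2.0 ** doublings

for year in (1971, 1985, 2000, 2020):
    print(f"{year}: ~{projected_transistors(year):.1e} transistors per chip")
```

By 2020 this projects on the order of 10^13 transistors per chip, an exponential of the kind the later "It's hard to think exponentially" slide warns about.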
Historical Impact of Device Shrinkage
- What happens when feature size (transistor size) shrinks by a factor of x?
- Clock rate goes up by x because wires are shorter (actually less than x, because of power consumption).
- Transistors per unit area go up by x^2.
- Die size has also increased, typically another factor of ~x (this often happened concurrently with feature-size shrinkage, not because of it).
- Raw computing power of the chip goes up by ~x^4!
- Typically the x^3 part is devoted to either on-chip parallelism (hidden parallelism such as ILP) or locality (caches).
- So some programs got x^3 times faster, without changing them.

Limits: How fast can a serial computer be?
- Consider a 1 Tflop/s (10^12 flop/s), 1 Tbyte sequential machine:
  - Data must travel some distance r from memory to the processor.
  - To get 1 data element per cycle, data must make the trip 10^12 times per second at the speed of light, c = 3x10^8 m/s. Thus r < c/10^12 = 0.3 mm.
  - Now put 1 Tbyte of storage in a 0.3 mm x 0.3 mm area: each bit occupies about 1 square Angstrom, the size of a small atom.
- No choice but parallelism. (A short numeric check of this argument appears after the Amdahl's Law slide at the end of this transcript.)

But What about Heat Density?
- [Image by Sam Spratt.]

Power Density Limits Serial Performance
- Faster processors increase power; scaling clock speed (business as usual) will not work:
  - Dynamic power is proportional to V^2 f C.
  - Increasing frequency (f) also increases supply voltage (V): a roughly cubic effect.
  - Increasing cores increases capacitance (C), but only linearly.
- Save power by lowering clock speed and adding parallelism.
- High-performance serial processors waste power: speculation, dynamic dependence checking, and implicit parallelism discovery all burn power.
- More transistors, but not faster serial processors.
- [Chart: power density (W/cm^2) vs. year for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, and P6, extrapolating toward hot-plate, nuclear-reactor, rocket-nozzle, and Sun's-surface levels. Source: Patrick Gelsinger, Shenkar Bokar, Intel.]

More drivers for parallelism: Chip yield
- Yield: what % of chips are usable?
- Complexity of fabrication (decreased size and number of steps) increases errors.
- Parallelism helps, e.g., KNL (in Cori) was sold with only 68 out of 76 cores "on" to improve yield.
- Source: http://electroiq.com/blog/2016/02/yield-and-cost-challenges-at-16nm-and-beyond/

Dennard Scaling is Dead; Moore's Law Will Follow
- Science implication: Atlas computing estimate off by $1B.
- Data source: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten, and K. Rupp.

Moore's Law
- It's hard to think exponentially.
- But it's also hard to stop.

Moore's Law reinterpreted
- The number of cores per chip can double every two years.
- Clock speed will not increase (and may decrease).
- Need to deal with systems with millions of concurrent threads.
- Need to deal with inter-chip parallelism as well as intra-chip parallelism.
- But Moore's Law is not forever...

Parallel hardware is everywhere
- It's just about all you can find today in laptops, servers, etc.: Intel® Knights Landing (68 cores), Intel® Haswell (16 cores), AMD Ryzen (8 cores), NVIDIA V100, Fujitsu ARM (48 cores).
- Even my cell phone has a parallel processor. There's no escape!

Caution: Amdahl's Law
- Suppose only part of an application is parallel. Amdahl's law:
  - s = fraction of work done sequentially (the Amdahl fraction)
  - 1-s = fraction parallelizable
  - P = number of processors
- Speedup(P) = Time(1)/Time(P) <= 1/(s + (1-s)/P)
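To make the bound concrete, here is a minimal Python sketch (not from the slides; the sequential fractions and processor counts are arbitrary illustrative values):

```python
# Amdahl's Law: with sequential fraction s, the speedup on P processors
# is bounded by 1 / (s + (1 - s) / P), approaching 1/s as P grows.

def amdahl_speedup(s, p):
    """Best-case speedup for sequential fraction s on p processors."""
    return 1.0 / (s + (1.0 - s) / p)

for s in (0.01, 0.05, 0.10):
    for p in (16, 256, 1_000_000):
        print(f"s = {s:.2f}, P = {p:>9,}: speedup <= {amdahl_speedup(s, p):7.1f}")
    print(f"s = {s:.2f}, P -> infinity: speedup -> {1.0 / s:.0f}")
```

Even with 99% of the work parallelized, a million processors give less than a 100x speedup, which is why the "millions of concurrent threads" mentioned on the Moore's Law reinterpreted slide only pay off if the sequential fraction is driven very low.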
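And the numeric check of the "How fast can a serial computer be?" slide promised earlier; all constants come from that slide, with 1 Tbyte taken as 8x10^12 bits:

```python
# Numeric check of the speed-of-light argument for a 1 Tflop/s serial machine.
c = 3.0e8            # speed of light, m/s (value used on the slide)
ops_per_sec = 1.0e12 # 1 Tflop/s: one data element must arrive per cycle

# Data must cover the memory-to-processor distance r 10^12 times per second,
# so r < c / 10^12.
r = c / ops_per_sec
print(f"r < {r * 1e3:.1f} mm")                      # prints 0.3 mm

# Pack 1 Tbyte of storage into an r x r square.
bits = 8.0e12                                       # 1 Tbyte = 8x10^12 bits
area_per_bit_m2 = (r * r) / bits
square_angstrom = 1e-20                             # (1e-10 m)^2
print(f"~{area_per_bit_m2 / square_angstrom:.1f} square Angstroms per bit")
```

The result, roughly one square Angstrom per bit, matches the slide's conclusion that a bit would have to fit in the area of a small atom, hence "no choice but parallelism".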