COMP 426 Lecture Notes
Summary
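These lecture notes for COMP 426 cover multicore processor architecture and parallel programming. Topics include the technology trends that led to multicore (the power, ILP, and memory walls), Flynn’s taxonomy, and homogeneous versus heterogeneous multicore designs. The programming material covers shared-memory programming with OpenMP, Cilk/Cilk Plus, and Intel Threading Building Blocks; distributed-memory programming with message passing; performance analysis (coverage, Amdahl’s Law, granularity, load balancing, and memory locality); and GPU programming with CUDA and OpenCL. Worked examples include vector addition, SAXPY, Fibonacci, and matrix multiplication.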
Full Transcript
Lecture 0 (August 29, 2024)
- Photomicrograph of Intel Pentium CPU
- Technology Trends: Microprocessor Capacity
- Increasing Processor Speed
- Microprocessor Transistors and Clock Rate
- Limitation #1: Power Wall (Power Density)
- Instruction-Level Parallelism
- Very Long Instruction Word (VLIW)
- SIMD and Vector Processing
- Hardware Multithreading
- Limitation #2: ILP Wall (Hidden Parallelism Tapped Out)
- Limitation #3: Memory Wall
- Why Multicore?
- Multicore Architecture
- Multicore in Products
- Advantages
- Disadvantages
- Flynn’s Taxonomy
- SISD architecture
- SIMD architecture
- MIMD architecture
- Processor Structures
- Multi-core processor
- Homogeneous Multicore
- Examples
- Throughput-Oriented Architecture
- Heterogeneous Multicore
- Example

Lecture 1 (September 10, 2024)
- Top-level anatomy of the Cell processor
- Processor Interconnects
- Scalar vs. SIMD Operations
- Scalar vs. SIMD Code
- Restrictions on SIMD Operations
- Scalar program for difference calculation
- Vector operations
- Shared Memory Multicore Architecture
- Programming Shared Memory Processors
- Distributed Memory Multicore Architecture
- Multicore Programming
- General Structure of Multicore Programs
- Example Parallelization
- Types of Parallelism
- OpenMP
- OpenMP History
- Simple OpenMP Example
- OpenMP Paradigm
- Parallel Programming with OpenMP
- Cilk
- Cilk to Cilk++ and Cilk Plus
- Cilk++ (and Cilk Plus)
- Fibonacci Example: Creating Parallelism
- Basic Cilk Keywords
- Dynamic Multithreading
- Parallelizing Vector Addition
- Cilk’s Work-Stealing Scheduler
- Mutual Exclusion in Cilk
- Compiling Cilk
- Programming Distributed Memory Processors
- Message Passing
- Example Message Passing Program
- Performance Analysis
- Understanding Performance
- Limits to Performance Scalability
- Coverage (Extent of Parallelism)
- Amdahl’s Law
- Granularity
- Fine vs. Coarse Granularity
- The Load Balancing Problem
- Static Load Balancing
- Dynamic Load Balancing
- Granularity and Performance Tradeoffs
- Overlapping Communication and Computation
- Locality of Memory Accesses (Shared Memory)
- Memory Access Latency in Shared Memory Architectures
- Symmetric Multiprocessor (SMP) architecture
- Nonuniform memory access (NUMA) architecture
- Cache Architectures

Lecture 2 (September 16, 2024)
- Problems in Multicore Programming
- Open Multi-Processing (OpenMP)
- OpenMP Example
- A Programmer’s View of OpenMP
- Limitations of OpenMP
- Cilk and Cilk Plus
- Summary of Cilk Plus
- Intel® Threading Building Blocks
- TBB 4.0 Components
- Scalability
- Scalability in TBB
- Generic Parallel Algorithms
- Generic Parallel Algorithms in TBB
- Example
- Data Parallel Decomposition
- Recursive Decomposition
- Lazy Parallelism in Recursive Partitioning
- Practical Matters
- Grain Size
- Blocking
- parallel_for
- Range is Generic
- Partitioning the work
- More on affinity_partitioner
- Example: Matrix Multiply
- Example: Find Index of Smallest Element
- Parallel pipeline
- TBB Library – Pipeline, continued
- Task Scheduler
- What is a Task
- Example – a TBB Task
- Task Dependencies
- Task Tree Example
- Optimization: Continuation Passing
- Example: Naive Fibonacci Calculation
- Two Execution Orders
- Work Stealing
- Work Depth First; Steal Breadth First
- Executing and stealing tasks in TBB
- Synchronization Primitives
- Mutex Behaviors
- TBB Synchronization Primitives
- TBB Synchronization Primitives Features
- Example: spin_rw_mutex
- Concurrent Containers
- C++ STL
- Concurrency-Friendly Interfaces
- TBB Concurrent Containers
- concurrent_vector
- concurrent_queue
- concurrent_hash_map
- Example: map strings to integers
- Native Threads vs. TBB
- OpenMP vs. Intel® Threading Building Blocks

Lecture 3 (October 1, 2024)
- Compute Unified Device Architecture (CUDA)
- CUDA – C with a Co-processor
- CUDA Devices and Threads
- Differences between GPU and CPU threads
- CUDA C Programming
- CUDA Kernels and Threads
- CUDA API
- GPU vs. CPU
- CPU vs. GPU - Hardware
- Traditional Graphics Pipeline
- Pixel / Thread Processing
- G80 Device
- Processing Element
- Hardware implementation
- Streaming Multiprocessor (SM)
- Data-parallel Programming
- What is GPGPU?
- Previous GPGPU Constraints
- Buzzword: Kernel
- Buzzword: Thread
- Buzzword: Block
- Buzzword: Grid
- Mapping Buzzwords to GPU Hardware
- Software Stack
- Single Instruction Multiple Thread (SIMT) Execution
- Execution Model
- Thread/Warp Divergence
- C SAXPY
- SAXPY on a GPU
- CUDA SAXPY
- Thread Life Cycle in HW
- SM Executes Blocks
- Thread Scheduling/Execution
- SM Warp Scheduling
- Example
- Thread Batching: Grids and Blocks
- Block and Thread IDs
- Memory Model
- CUDA Device Memory Space Overview
- Global, Constant, and Texture Memories
- CUDA Device Memory Allocation
- CUDA Host-Device Data Transfer
- CUDA Function Declarations
- A Matrix Data Type
- Calling a Kernel Function – Thread Creation
- A Simple Running Example: Matrix Multiplication
- Multiply Using One Thread Block
- Programming Model: Matrix Multiplication Example
- Step 1: Matrix Data Transfers
- Step 2: Matrix Multiplication: A Simple Host Code in C
- Step 3: Matrix Multiplication: Host-side Main Program Code
- Step 4: Matrix Multiplication: Device-side Kernel Function
- Step 5: Some Loose Ends
- Step 6: Handling Arbitrary Sized Square Matrices
- CUDA Advantages over Legacy GPGPU
- CUDA Disadvantages
- A Common Programming Strategy
- Tiled Data Strategy
- Simple Matrix Multiplication
- How about performance on G80?
- Idea: Use Shared Memory to reuse global memory data
- Tiled Multiply
- Device Runtime Component: Synchronization Function
- First-order Size Considerations in G80
- CUDA Code: Kernel Execution Configuration
- Tiled Matrix Multiplication Kernel
- G80 Shared Memory and Threading
- Tiling Size Effects
- What’s Limiting My Code?
October 1, 2024 7:24 PM Screen clipping taken: 2024-10-01 7:24 PM Lecture 3 Page 226 OpenGL Rendering October 1, 2024 7:24 PM Screen clipping taken: 2024-10-01 7:25 PM Lecture 3 Page 227 OpenGL Interoperability October 1, 2024 7:25 PM Screen clipping taken: 2024-10-01 7:25 PM Lecture 3 Page 228 Example from simpleGL in SDK October 1, 2024 7:25 PM Screen clipping taken: 2024-10-01 7:25 PM Screen clipping taken: 2024-10-01 7:25 PM Lecture 3 Page 229 Processor Parallelism October 29, 2024 3:20 PM Screen clipping taken: 2024-10-29 3:20 PM Lecture 4 Page 230 The origins of OpenCL October 29, 2024 3:20 PM Screen clipping taken: 2024-10-29 3:20 PM Lecture 4 Page 231 With OpenCL we can October 29, 2024 3:20 PM Screen clipping taken: 2024-10-29 3:21 PM Lecture 4 Page 232 OpenCL Design Requirements October 29, 2024 3:21 PM Screen clipping taken: 2024-10-29 3:21 PM Screen clipping taken: 2024-10-29 3:21 PM Lecture 4 Page 233 Benefits of OpenCL October 29, 2024 3:21 PM Screen clipping taken: 2024-10-29 3:22 PM Lecture 4 Page 234 Anatomy of OpenCL October 29, 2024 3:22 PM Screen clipping taken: 2024-10-29 3:22 PM Screen clipping taken: 2024-10-29 3:22 PM Lecture 4 Page 235 OpenCL Program October 29, 2024 3:23 PM Screen clipping taken: 2024-10-29 3:23 PM Lecture 4 Page 236 Hierarchy of Models October 29, 2024 3:23 PM Screen clipping taken: 2024-10-29 3:23 PM Lecture 4 Page 237 Platform Model October 29, 2024 3:23 PM Screen clipping taken: 2024-10-29 3:24 PM Lecture 4 Page 238 OpenCL Platform Example October 29, 2024 3:24 PM Screen clipping taken: 2024-10-29 3:24 PM Lecture 4 Page 239 Memory Model October 29, 2024 3:23 PM Screen clipping taken: 2024-10-29 3:24 PM Lecture 4 Page 240 Execution Model October 29, 2024 3:23 PM Screen clipping taken: 2024-10-29 4:04 PM Lecture 4 Page 241 Screen clipping taken: 2024-10-29 4:04 PM Screen clipping taken: 2024-10-29 4:04 PM Lecture 4 Page 242 Screen clipping taken: 2024-10-29 4:04 PM Screen clipping taken: 2024-10-29 4:04 PM Lecture 4 Page 
243 Screen clipping taken: 2024-10-29 4:04 PM Lecture 4 Page 244 Screen clipping taken: 2024-10-29 6:02 PM Screen clipping taken: 2024-10-29 6:02 PM Lecture 4 Page 245 Screen clipping taken: 2024-10-29 6:02 PM Screen clipping taken: 2024-10-29 6:02 PM Lecture 4 Page 246 Screen clipping taken: 2024-10-29 6:03 PM Lecture 4 Page 247 Programming Model October 29, 2024 3:23 PM Screen clipping taken: 2024-10-29 6:03 PM Lecture 4 Page 248 Screen clipping taken: 2024-10-29 6:03 PM Screen clipping taken: 2024-10-29 6:03 PM Lecture 4 Page 249 Screen clipping taken: 2024-10-29 6:03 PM Lecture 4 Page 250 Screen clipping taken: 2024-10-29 6:03 PM Screen clipping taken: 2024-10-29 6:03 PM Lecture 4 Page 251 OpenCL Runtime October 29, 2024 4:03 PM Screen clipping taken: 2024-10-29 6:04 PM Lecture 4 Page 252 Command Queues October 29, 2024 6:04 PM Screen clipping taken: 2024-10-29 6:05 PM Lecture 4 Page 253 Execution October 29, 2024 6:05 PM Screen clipping taken: 2024-10-29 6:05 PM Lecture 4 Page 254 Synchronization October 29, 2024 6:05 PM Screen clipping taken: 2024-10-29 6:05 PM Lecture 4 Page 255 Work-Item Synchronization Within a Work-Group October 29, 2024 6:06 PM Screen clipping taken: 2024-10-29 6:06 PM Screen clipping taken: 2024-10-29 6:06 PM Lecture 4 Page 256 OpenCL Events October 29, 2024 6:06 PM Screen clipping taken: 2024-10-29 6:07 PM Screen clipping taken: 2024-10-29 6:07 PM Lecture 4 Page 257 Generating and consuming events October 29, 2024 6:09 PM Screen clipping taken: 2024-10-29 6:09 PM Lecture 4 Page 258 Event: basic event usage October 29, 2024 6:09 PM Screen clipping taken: 2024-10-29 6:10 PM Lecture 4 Page 259 Host code influencing commands: User events October 29, 2024 6:12 PM Screen clipping taken: 2024-10-29 6:13 PM Lecture 4 Page 260 Commands influencing host code October 29, 2024 6:13 PM Screen clipping taken: 2024-10-29 6:14 PM Lecture 4 Page 261 Profiling with Events October 29, 2024 6:14 PM Screen clipping taken: 2024-10-29 6:14 PM Lecture 4 Page 
262 OpenCL Synchronization: Queues & Events October 29, 2024 6:10 PM Screen clipping taken: 2024-10-29 6:11 PM Lecture 4 Page 263 Why Events? Won’t a barrier do? October 29, 2024 6:11 PM Screen clipping taken: 2024-10-29 6:11 PM Screen clipping taken: 2024-10-29 6:11 PM Lecture 4 Page 264 Barriers between queues: clEnqueueBarrier doesn’t work October 29, 2024 6:11 PM Screen clipping taken: 2024-10-29 6:12 PM Lecture 4 Page 265 Screen clipping taken: 2024-10-29 6:12 PM Lecture 4 Page 266 Using the Profiling interface October 29, 2024 6:14 PM Screen clipping taken: 2024-10-29 6:15 PM Lecture 4 Page 267 cl_profiling_info values October 29, 2024 6:16 PM Screen clipping taken: 2024-10-29 6:16 PM Lecture 4 Page 268 Profiling Example October 29, 2024 6:16 PM Screen clipping taken: 2024-10-29 6:16 PM Lecture 4 Page 269 Events inside Kernels … Async. copy October 29, 2024 6:17 PM Screen clipping taken: 2024-10-29 6:17 PM Lecture 4 Page 270 Events and the C++ interface (for profiling) October 31, 2024 5:12 PM Screen clipping taken: 2024-10-31 5:13 PM Lecture 4 Page 271 OpenCL C for Compute Kernels October 31, 2024 5:13 PM Screen clipping taken: 2024-10-31 5:13 PM Screen clipping taken: 2024-10-31 5:13 PM Screen clipping taken: 2024-10-31 5:13 PM Lecture 4 Page 272 Language Highlights October 31, 2024 5:13 PM Screen clipping taken: 2024-10-31 5:14 PM Lecture 4 Page 273 Lecture 4 Page 274 Language Restrictions October 31, 2024 5:14 PM Screen clipping taken: 2024-10-31 5:15 PM Lecture 4 Page 275 Optional Extensions October 31, 2024 5:15 PM Screen clipping taken: 2024-10-31 5:15 PM Lecture 4 Page 276 OpenGL Interoperability October 31, 2024 5:15 PM Screen clipping taken: 2024-10-31 5:15 PM Screen clipping taken: 2024-10-31 5:15 PM Lecture 4 Page 277 Lecture 4 Page 278 OpenCL Programming October 31, 2024 5:15 PM Screen clipping taken: 2024-10-31 5:15 PM Lecture 4 Page 279 Initialization October 31, 2024 5:15 PM Screen clipping taken: 2024-10-31 5:16 PM Screen clipping taken: 
Choosing Devices
Create Memory Objects
Allocating Images and Buffers
Memory Resources
Image Formats and Samplers
Transfer Data
Reading/Writing Memory Object Data
Execution Overview
Program Objects
Kernel Objects
Program and Kernel Objects
Executing Code
Kernel Arguments
Kernel Execution
Executing Kernels
Sample walkthrough
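One practical detail relates to Kernel Execution and Executing Kernels. In OpenCL 1.x, when an explicit local (work-group) size is passed to clEnqueueNDRangeKernel, each global dimension must be a multiple of it. OpenCL 2.0 relaxes this rule with non-uniform work-groups. A common workaround pads the global size and lets the extra work-items exit early. The sketch below shows this with a hypothetical helper, `round_up_global`.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds a problem size n up to the next multiple of the work-group size so
   it can be used as the global size. The kernel must then skip the padding
   work-items whose global id is >= n. */
size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}
```

For n = 1000 and a work-group size of 64, this gives a global size of 1024. The 24 extra work-items must skip the computation, for example with `if (get_global_id(0) >= n) return;` in the kernel.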
Matrix multiplication: sequential code
Matrix multiplication: OpenCL kernel
Matrix multiplication host program
Making OpenCL Matrix Multiplication really fast
Performance of OpenCL Matrix Multiplication
Optimizing OpenCL Matrix Multiplication
Row of C per work-item, A row private
An N-dimension domain of work-items
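The matrix multiplication slides start from a sequential triple loop. They then give each work-item a whole row of C, with the matching row of A copied into private memory. The lecture's own code is not preserved here. The C sketch below assumes square, row-major n x n float matrices, and `MAX_N`, `mmul_seq` and `mmul_row_workitem` are hypothetical names. The second function is the body that one work-item would execute, so calling it for every row i emulates the NDRange on the host.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64 /* hypothetical cap on the size of the private row buffer */

/* Sequential reference: C = A * B for n x n row-major matrices. */
void mmul_seq(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            float tmp = 0.0f;
            for (size_t k = 0; k < n; ++k)
                tmp += A[i * n + k] * B[k * n + j];
            C[i * n + j] = tmp;
        }
}

/* Work done by one "work-item" in the row-per-work-item variant. It computes
   all of row i of C after copying row i of A into a local array. In OpenCL C,
   that array would be private memory (typically registers). */
void mmul_row_workitem(size_t i, size_t n, const float *A, const float *B,
                       float *C)
{
    assert(n <= MAX_N);
    float Awrk[MAX_N];
    for (size_t k = 0; k < n; ++k)
        Awrk[k] = A[i * n + k];
    for (size_t j = 0; j < n; ++j) {
        float tmp = 0.0f;
        for (size_t k = 0; k < n; ++k)
            tmp += Awrk[k] * B[k * n + j];
        C[i * n + j] = tmp;
    }
}
```

With the private copy, row i of A is read from global memory once, instead of once for every column j. B is still read column by column from global memory.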
Reduce work-item overhead … do one row of C per work-item
Vector Types
Vector Operations
Converting a scalar loop into a vector loop
Vector instructions example
Portable performance in OpenCL
Advice for performance portability
OpenCL 2.0
Conclusion

Lecture 5 (November 5, 2024)
Four Common Steps to Creating a Parallel Program
Decomposition (Amdahl’s Law)
Assignment (Granularity)
Orchestration and Mapping (Locality)
The PCAM Methodology
Parallel Programming by Pattern
Patterns for Parallelizing Programs
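Amdahl's Law, from the Decomposition slide, works as follows. Suppose a fraction P of a program's serial run time can be parallelized perfectly over N processors and the rest stays serial. The speedup is then at most S(N) = 1 / ((1 - P) + P / N). A small C helper with a hypothetical name makes the bound concrete:

```c
#include <assert.h>

/* Amdahl's Law: upper bound on speedup when a fraction p (0..1) of the
   serial run time parallelizes perfectly over n processors and the
   remaining 1 - p stays serial. */
double amdahl_speedup(double p, unsigned n)
{
    assert(p >= 0.0 && p <= 1.0 && n > 0);
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

With P = 0.9, the speedup approaches but never reaches 1 / (1 - 0.9) = 10, no matter how many processors are added.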
Example
Guidelines for Task Decomposition
Common Data Decompositions
Case for Pipeline Decomposition
Reengineering for Parallelism
Example: Molecular dynamics
Finding Concurrency Design Space
Understand Control Dependences
Evaluate Design
Algorithm Structure Design Space
Major Organizing Principle
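Two of the usual data decompositions, block and cyclic, reduce to simple index arithmetic. The sketch below is an illustration rather than the slide's code, and `block_range` and `cyclic_owner` are hypothetical names. The block version places boundaries at r*n/p, so block sizes differ by at most one element when p does not divide n.

```c
#include <assert.h>
#include <stddef.h>

/* Block decomposition: worker r of p owns the half-open index range
   [*lo, *hi) of n elements. */
void block_range(size_t n, size_t p, size_t r, size_t *lo, size_t *hi)
{
    assert(p > 0 && r < p);
    *lo = r * n / p;
    *hi = (r + 1) * n / p;
}

/* Cyclic decomposition: element i is owned by worker i mod p. */
size_t cyclic_owner(size_t i, size_t p)
{
    assert(p > 0);
    return i % p;
}
```

For n = 10 and p = 4, the blocks are [0,2), [2,5), [5,7) and [7,10). A cyclic decomposition instead deals elements out round-robin. This can balance the load better when the cost per element varies along the index range.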
Decision Tree for Algorithm Structure Design Space
Organize by Tasks?
Task Parallelism
Good vs. Poor Load Balance
Divide and Conquer
Parallelizing the Divide-and-Conquer Strategy
Organize by Data?
Geometric Decomposition
Recursive Data
Recursive Data Example: Find the Root
Work vs. Concurrency Tradeoff
Organize by Flow of Data?
Pipeline
Throughput vs. Latency
Event-Based Coordination
Program Structure Patterns
Single Program Multiple Data
Pattern
Challenges
Multiple Program Multiple Data
Loop Parallelism Pattern
Master/Worker Pattern
Fork/Join Pattern
Map-Reduce
Algorithm Structure and Organization

Lecture 6 (November 27, 2024)
PCAM Example
Partitioning
Communication
Agglomeration
Image Convolution Kernel
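Find the Root is the classic illustration of the Recursive Data pattern. It is usually solved by pointer jumping, which also shows the work vs. concurrency tradeoff. The following is a sequential C sketch of that technique, not the slides' own formulation. It assumes the forest is stored as a parent array in which each root points to itself, and `find_roots` is a hypothetical name. Each round emulates one synchronous parallel step in which every node replaces its parent with its grandparent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Pointer jumping on a forest stored as a parent array: a root r has
   parent[r] == r, every entry is a valid index, and there are no other
   cycles. A second buffer, `next`, makes each round behave like one
   synchronous parallel step. On return, parent[i] holds the root of i.
   Returns the number of rounds that changed something, or -1 if allocation
   fails. */
int find_roots(int *parent, size_t n)
{
    int *next = malloc(n * sizeof *next);
    if (n > 0 && next == NULL)
        return -1;
    int rounds = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; ++i) { /* one "work-item" per node */
            next[i] = parent[parent[i]];
            if (next[i] != parent[i])
                changed = true;
        }
        if (n > 0)
            memcpy(parent, next, n * sizeof *parent);
        if (changed)
            ++rounds;
    }
    free(next);
    return rounds;
}
```

For a chain of 8 nodes, the loop needs 3 productive rounds (log2 8). However, every round updates all 8 nodes, about 24 pointer updates in total. A sequential walk that visits each node once does less work, so the extra concurrency comes at the cost of extra work.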
Image Convolution Host Program
Monotonic and Bitonic Sets
Bitonic Splits
Forming Bitonic Halves and Bitonic Sequence
Splitting the bitonic sort among work-groups
Sorting Elements inside a Vector
Sorting an 8-element sequence
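For the Image Convolution Kernel and Image Convolution Host Program slides, a host-side reference implementation is useful for checking device output. This sketch is an assumption, not the lecture's kernel. It uses a square filter of odd width on a single-channel float image. Out-of-range coordinates are clamped to the nearest edge, in the spirit of an OpenCL sampler using CLK_ADDRESS_CLAMP_TO_EDGE. The filter is applied without flipping, as many image-processing kernels do. `convolve2d` and `clampi` are hypothetical names.

```c
#include <assert.h>

/* Clamps v into [lo, hi]. */
static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* out = in filtered by a filter_width x filter_width filter (odd width).
   Images are row-major with `rows` x `cols` floats; reads outside the image
   use the nearest edge pixel. */
void convolve2d(const float *in, float *out, int rows, int cols,
                const float *filter, int filter_width)
{
    assert(filter_width % 2 == 1);
    int half = filter_width / 2;
    for (int r = 0; r < rows; ++r)          /* one work-item per output pixel */
        for (int c = 0; c < cols; ++c) {
            float sum = 0.0f;
            for (int fr = -half; fr <= half; ++fr)
                for (int fc = -half; fc <= half; ++fc) {
                    int rr = clampi(r + fr, 0, rows - 1);
                    int cc = clampi(c + fc, 0, cols - 1);
                    sum += in[rr * cols + cc] *
                           filter[(fr + half) * filter_width + (fc + half)];
                }
            out[r * cols + c] = sum;
        }
}
```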
16-element Bitonic Sort
General Bitonic Sort

Lecture 7 (November 27, 2024)
3 Ways to Accelerate Applications
Introduction to Thrust
Thrust vs. CUDA
"Hello World" in Thrust
A more elaborate example: averaging an array of numbers
Averaging an array of numbers – The explanation
SAXPY using Thrust
What is OpenACC?
What are compiler directives?
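The bitonic sort slides build up from 8- and 16-element networks to the general case. Below is a sequential C emulation of the general network for power-of-two n, with a hypothetical function name. The lecture's OpenCL version, which also sorts inside vectors and splits the work across work-groups, is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential emulation of a bitonic sorting network (ascending order).
   n must be a power of two. Each (k, j) pass is one stage of the network,
   and its n/2 compare-exchanges are independent. In OpenCL, each stage could
   be one kernel launch, or one barrier-separated step inside a work-group. */
void bitonic_sort(int *a, size_t n)
{
    assert((n & (n - 1)) == 0); /* power of two (n == 0 also passes) */
    for (size_t k = 2; k <= n; k <<= 1)          /* size of sequences being merged */
        for (size_t j = k >> 1; j > 0; j >>= 1) { /* compare distance in this stage */
            for (size_t i = 0; i < n; ++i) {
                size_t partner = i ^ j;
                if (partner <= i)
                    continue; /* each pair is handled once, by its lower index */
                int ascending = (i & k) == 0; /* direction of this k-sized block */
                if ((ascending && a[i] > a[partner]) ||
                    (!ascending && a[i] < a[partner])) {
                    int t = a[i];
                    a[i] = a[partner];
                    a[partner] = t;
                }
            }
        }
}
```

Every stage performs n/2 independent compare-exchanges, and there are log2(n)(log2(n) + 1)/2 stages. This is why the network maps well onto many work-items separated by barriers or kernel launches.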
OpenACC Directives
Single code for multiple platforms
Familiar to OpenMP Programmers
OpenACC: Key Advantages
True Open Standard
A Simple Example: SAXPY
kernels: Our first OpenACC Directive
General Directive Syntax and Scope
Complete SAXPY Example Code
C Detail: the “restrict” Keyword
Compile and Run
Compare: OpenACC and CUDA Implementation
Big Difference: OpenACC vs CUDA implementations
OpenACC parallel directive
Kernels vs parallel
OpenACC loop directive
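The OpenACC SAXPY example and the C `restrict` detail can be sketched as follows. This is a hedged reconstruction, not the slide's listing. The explicit copyin/copy data clauses are an addition so that the array extents are unambiguous to the compiler. With an OpenACC-capable compiler, it might be built with, for example, `nvc -acc saxpy.c` (NVIDIA HPC SDK) or `gcc -fopenacc saxpy.c`. Compilers without OpenACC support ignore the pragma and run the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* SAXPY: y = a*x + y. `restrict` promises that x and y do not overlap, so
   the compiler may treat the iterations as independent. The kernels
   directive asks an OpenACC compiler to offload the loop. It copies x to the
   device and copies y to the device and back. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n <= 0 || (x != NULL && y != NULL));
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```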