COMP 3730 Intro to Parallel Programming Fall 2024 Lecture 1 PDF

Document Details


Memorial University of Newfoundland

2024

Terrence Tricco

Tags

parallel programming, computer science, programming, lecture notes

Summary

This lecture introduces parallel programming concepts, focusing on the growing need for faster computation, particularly for large language models (LLMs). It discusses the computational demands of training LLMs like Llama and highlights the role of parallel processing in making that training feasible. The lecture also explores Moore's Law and how transistor size limits have driven the shift to multicore processors.

Full Transcript


COMP 3730 Intro to Parallel Programming, Fall 2024 – Instructor: Terrence Tricco © – Lecture 1: Introduction

What is Parallel Programming?
Parallel programming is about simultaneous computation. The goal is to solve a problem faster by using multiple processing elements (cores, CPUs, GPUs, etc.).

Do We Need to Calculate Faster?
Large Language Models have grown exponentially in size – 1000x larger in just 3 years! (Narayanan+ 2021)

LLM Training Details
What does it take to train an LLM? Llama is a set of large language models (LLMs) openly released by Meta (Facebook). They are comparable in size and ability to other LLMs, such as ChatGPT or Claude. Training an LLM requires serious computational power: these models have billions of parameters, and their training data consists of trillions of words.

Llama 1 – 65B (Feb 2023): 21 days training time on 2,048 Nvidia A100 GPUs (14M CUDA cores total). Training on a single GPU would take roughly 43,000 days (~120 years!).
Llama 3 – 405B (Jul 2024): ~80 days training time on 16,384 Nvidia H100 GPUs (276M CUDA cores total). Training on a single GPU would take roughly 1,310,000 days (~3,600 years!).

The Power of Parallelism
Parallel computing is what reduced the training of Llama 3 from an impossible 3,600 years to a mere 3 months.

Llama Parallel Challenges
In this course, we will explore many of the same challenges faced in training Llama. See section 3.3 of the Llama paper for their discussion of how they parallelized the training of their models: parallelism for model scaling, network topology, load balancing, GPU utilization, memory imbalance, computation imbalance, network-aware parallelism configuration, and collective communication.

Do I Care?
So what? Meta is an outlier! I won't be training LLMs!

Everything is Parallel
All modern applications use threads. Threads are a way to run your program using multiple cores. Up until now, everything that you have coded in your degree has almost certainly been single threaded. Your program executes your code in a sequential format: you write a sequence of instructions (lines of code), and those instructions are executed one after another in that sequence.

Sequential Example (Single Thread)

    a = 0
    for i in range(0, 100):
        a += 1
    print(a)

Output: 100

Parallel Example (Two Threads)
Each CPU works on half of the range at the same time. (A C++ sketch of the same idea appears at the end of this section.)

    CPU 1                       CPU 2
    a = 0                       b = 0
    for i in range(0, 50):      for i in range(50, 100):
        a += 1                      b += 1

    a = a + b
    print(a)

Output: 100

In theory, this will be 2x as fast! Each CPU is working simultaneously.

Are You Sure I Need to Care About This?
The largest, industry-leading companies care about parallel programming to build their biggest products. The smallest, everyday program on your laptop is inherently parallel, using multiple threads.
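The Python-style pseudocode above is only illustrative. As a preview of the explicit C/C++ threading covered later in the course, here is a minimal sketch of the same two-thread split using std::thread. The structure is my own illustration, not taken from the slides.

    // Minimal sketch of the two-thread example using C++ std::thread.
    // Each thread counts its own half of the range into a separate variable,
    // so there is no shared counter to synchronize; results are combined at the end.
    #include <iostream>
    #include <thread>

    int main() {
        long a = 0, b = 0;

        // "CPU 1": count the first half of the range.
        std::thread t1([&a]() {
            for (int i = 0; i < 50; ++i) a += 1;
        });

        // "CPU 2": count the second half of the range.
        std::thread t2([&b]() {
            for (int i = 50; i < 100; ++i) b += 1;
        });

        // Wait for both threads to finish before combining their results.
        t1.join();
        t2.join();

        std::cout << (a + b) << std::endl;  // prints 100
        return 0;
    }

Because each thread writes to its own variable and the results are only combined after join(), this sketch avoids the data race that would occur if both threads incremented the same counter – a theme we will return to when we discuss synchronization.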
Why Is Everything Parallel? Moore's Law
In 1965, Intel co-founder Gordon Moore noticed that transistor counts were doubling every two years. He predicted that this trend would continue for at least the next 10+ years. This rate of doubling is known as Moore's Law. (Moore 1965)

Transistors
Transistors are the electronic components that make up a CPU. A CPU is essentially a network of transistors working together to perform calculations or computations. Transistor count is a rough proxy for computational power: more transistors means a more powerful (faster) computer.

[Figure: transistor counts over time. Single-core processors show an exponential increase that flattens/plateaus in the mid-2000s, while multicore processors continue the exponential trend. Data source: https://en.wikipedia.org/wiki/Transistor_count]

Rise of the Multicore
We see that transistor count keeps increasing exponentially, in line with Moore's Law, even to today. But since the mid-2000s, individual cores have effectively peaked in frequency and power consumption. Transistor count has only continued to increase exponentially in line with Moore's Law because CPUs have become multicore.

Transistor Size
[Figure: exponential decrease in transistor size over time; Intel's transistor size has stopped shrinking at around 10 nm. Data source: Wikipedia]
Transistor counts increased from the 1950s to the 2000s because transistors kept becoming smaller. The problem is this: transistors have stopped becoming smaller. For the past decade, transistors have been unable to become smaller than 5-10 nm using normal manufacturing processes.

Physical Limits
A transistor that is 2 nm in size is roughly 10 atoms across. The physics at this scale is a barrier for transistors. Quantum effects (e.g., quantum tunnelling) cannot be ignored, and heat dissipation is a significant challenge (to avoid overheating). Transistors simply cannot continue to be scaled down in the traditional manner.

Moore's Law is Dead
"Moore's Law is dead. […] And the ability for Moore's Law to deliver twice the performance at the same cost, or at the same performance and half the cost, every year and a half is over. It's completely over." – Nvidia CEO Jensen Huang at the GPU Technology Conference, 2022

Or Is It?
Intel CEO Pat Gelsinger, at Intel Innovation 2023, argued the opposite – that the trend is not over.

Moore's Law in 10 Years
Where will Moore's Law take us in 10 years' time? What will our computers look like then? Supercomputers might give us an idea. The world's most powerful supercomputers have followed Moore's Law for 30+ years. (top500.org)

Top 5 Supercomputers in the World (top500.org)
- 9,472 64-core AMD Epyc CPUs + 37,888 220-core AMD Instinct GPUs
- 10,624 52-core Intel Xeon Max CPUs + 63,744 128-core Intel Data Centre GPU Max
- Eagle weighs 6 tons!

Computers in 10+ Years
Supercomputers of today regularly use CPUs with 48-64 cores. The trend is clear: consumer-level computers will continue progressing toward larger and larger core counts. Expect to see more and more heterogeneous architectures – computers will feature combinations of specialized hardware!

GPUs
Supercomputers have become reliant on GPUs to achieve greater computational power. Programming for GPUs will become a fact of your development career. (Machine learning has already progressed this way.) And GPUs have even more cores: Nvidia H100 GPUs have up to 16,896 CUDA cores.
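As a small aside (my own illustration, not from the slides): you can already check how many hardware threads your own machine exposes using only the C++ standard library.

    // Minimal sketch: query how many hardware threads this machine exposes.
    #include <iostream>
    #include <thread>

    int main() {
        // hardware_concurrency() returns a hint; it may return 0 if unknown.
        unsigned int n = std::thread::hardware_concurrency();
        std::cout << "This machine reports " << n << " hardware threads.\n";
        return 0;
    }

On a typical modern laptop this reports somewhere around 8-16 hardware threads, which is exactly why everyday programs are written to use multiple threads.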
COMP 3730 Introduction to Parallel Programming

What to Expect
This course is focused on the practical aspects of parallel programming. There will be theoretical components, but the goal is to learn how to work with parallel architectures. How does a multicore CPU or GPU work? How do we write code for them?

Here is the tentative high-level set of topics:
- Introduction to parallelism.
- Explicit threading in C/C++.
- Implicit threading in C/C++.
- Performance aspects, synchronization, efficiency.
- Vectorization.
- GPU programming with CUDA.

Evaluation Scheme
40% – Assignments (4): 2 before the Midterm, 2 after the Midterm.
20% – Midterm Exam: written, in class, on October 24. There are no deferred Midterm exams.
40% – Final Exam.
You must pass either the Midterm or Final exam to pass the course.

Programming Language
Assignments will use C++. I strongly recommend using an IDE if you do not already. You can get Professional Editions of PyCharm and CLion as a student if you register. I strongly recommend versioning your code on GitHub. Repositories must be private.

Lecture Recordings
I will be recording lectures and providing them to you. I believe that lectures should be as accessible as possible. Our room does not have Lecture Capture, but I will be recording the lecture slides and audio myself. There will not be any video of the room/myself, unfortunately. Recordings will be posted onto YouTube.

Copyright
Lecture recordings, lecture notes, slides, assignments, exams, and any other course material are subject to copyright. Your instructor (T. Tricco) owns the copyright to these materials. You are not allowed to share them with your friends, post them online, or share them in any capacity whatsoever without my permission. (Hint: Ask me about software licences and copyright!)

Generative AI
I support your use of LLMs to assist with your learning. I use ChatGPT and LLMs myself; they are useful tools. However, ChatGPT cannot replace thinking or learning. You will need to learn the course content to pass the exams, so it is to your benefit to earnestly work on the assignments. Copying code or text from an LLM and presenting it as your own original work will be considered plagiarism.

Contact and Office Hours
Office hours are on Mondays from 1pm-3pm. My office is ER-6031 (sixth floor of the Earth Science building). Email me at [email protected]. I rarely check Brightspace email.

Conclusion
Welcome to COMP 3730. This is a new course I have created (/am creating) from scratch. In some sense, you're all my beta testers, and I will happily accept your feedback throughout the course on how to improve things for next time. See you all next week.
