Document Details


Uploaded by ExuberantPenguin7970

Mutah University

2024

Dr. Ali Alshdifat

Tags

high performance computing, HPC, computer architecture, parallelism

Summary

This document covers the basics of High-Performance Computing (HPC), including its main components and the various types of parallelism. It also surveys related architectures such as multi-core processors, GPUs, clusters, grid computing, and cloud computing, along with their benefits.

Full Transcript


Dr. Ali Alshdifat, Data Science Department, 2024 – 2025. Chapter 2: High Performance Computing

CONTENTS
 Introduction to HPC
 Parallel Architectures
 Multi Cores
 Graphical Processing Units
 Clusters
 Grid Computing
 Cloud Computing

HIGH PERFORMANCE COMPUTING (HPC)
 The ability to process data and perform complex calculations at high speeds.
 One of the best-known types of HPC solutions is the supercomputer.
 A supercomputer contains thousands of compute nodes that work together to complete one or more tasks.
 Example: the IBM Blue Gene/P supercomputer "Intrepid" at Argonne National Laboratory runs 164,000 processor cores using normal data center air conditioning, grouped in 40 racks/cabinets connected by a high-speed network.

WHEN DO WE NEED HPC?
Case 1: Complete a time-consuming operation in less time
 I am an automotive engineer.
 I need to design a new car that consumes less gasoline.
 I'd rather have the design completed in 6 months than in 2 years.
 I want to test my design using computer simulations rather than building very expensive prototypes and crashing them.
Case 2: Complete an operation under a tight deadline
 I work for a weather prediction agency.
 I am getting input from weather stations/sensors.
 I'd like to predict tomorrow's forecast today.
Case 3: Perform a high number of operations per second
 I am an engineer at Amazon.com.
 My web server gets 1,000 hits per second.
 I'd like my web server and databases to handle 1,000 transactions per second so that customers do not experience bad delays.

WHAT DOES HPC INCLUDE?
 High-performance computing is fast computing: computations run in parallel over many compute elements (CPUs, GPUs), with a very fast network connecting the compute elements.
 Hardware: computer architecture (vector computers, distributed systems, clusters); network connections (InfiniBand, Ethernet, proprietary).
 Software: programming models such as MPI (Message Passing Interface), SHMEM (Shared Memory), PGAS, etc.
 Applications: open source and commercial.

HOW DOES HPC WORK?
 HPC solutions have three main components: compute, network, and storage.
 To build a high-performance computing architecture, compute servers are networked together into a cluster.
 Software programs and algorithms are run simultaneously on the servers in the cluster.
 The cluster is networked to the data storage to capture the output.
 Together, these components operate seamlessly to complete a diverse set of tasks.

PARALLEL ARCHITECTURES
 Traditionally, software has been written for serial computation:
 A problem is broken into a discrete series of instructions.
 Instructions are executed sequentially, one after another, on a single processor.
 Only one instruction may execute at any moment in time.
 Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
 A problem is broken into discrete parts that can be solved concurrently.
 Each part is further broken down into a series of instructions.
 Instructions from each part execute simultaneously on different processors.
 An overall control/coordination mechanism is employed.

(Figure: serial computing vs. parallel computing.)

PARALLEL ARCHITECTURES
 Virtually all stand-alone computers today are parallel from a hardware perspective:
 Multiple functional units (L1 cache, L2 cache, branch, pre-fetch, decode, floating-point, graphics processing (GPU), integer, etc.)
 Multiple execution units/cores
 Multiple hardware threads
 Networks connect multiple stand-alone computers (nodes) to make larger parallel computer clusters.
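To make the serial-versus-parallel contrast concrete, here is a minimal Python sketch using the standard library; the problem (a sum of squares) and the part count are illustrative choices, not from the original slides.

```python
# Minimal sketch: the same problem solved serially and in parallel.
from multiprocessing import Pool

def sum_of_squares(chunk):
    """One discrete part of the overall problem."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))

    # Serial computing: one stream of instructions on one processor.
    serial_result = sum_of_squares(data)

    # Parallel computing: break the problem into discrete parts,
    # solve the parts concurrently, then combine the partial results.
    n_parts = 4
    chunks = [data[i::n_parts] for i in range(n_parts)]
    with Pool(processes=n_parts) as pool:
        parallel_result = sum(pool.map(sum_of_squares, chunks))

    assert serial_result == parallel_result
```

Combining the partial sums at the end plays the role of the overall control/coordination mechanism mentioned above.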
WHY USE PARALLEL ARCHITECTURES?
 Save time and/or money
 Solve larger / more complex problems
 Provide concurrency
 Take advantage of non-local resources
 Make better use of underlying parallel hardware

TYPES OF PARALLELISM
Data Parallelism
 Focuses on distributing the data across different parallel computing nodes.
 Also called loop-level parallelism.
 Example: CPU A could add all elements from the top half of the matrices, while CPU B could add all elements from the bottom half (see the sketch after this section).
 Since the two processors work in parallel, performing the matrix addition takes half the time of performing the same operation serially on one CPU alone.
Task Parallelism
 Focuses on distributing tasks across different processors.
 Also known as functional parallelism or control parallelism.
 As a simple example, if we are running code on a 2-processor system (CPUs "a" and "b") in a parallel environment and we wish to do tasks "A" and "B", it is possible to tell CPU "a" to do task "A" and CPU "b" to do task "B" simultaneously, reducing the runtime of the execution.
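A minimal sketch of the data-parallel matrix-addition example above, using Python's standard library; the two workers stand in for CPU A and CPU B, and the matrix size is an illustrative choice.

```python
# Data parallelism sketch: worker A adds the top half of the matrices,
# worker B adds the bottom half, mirroring the slide's example.
from multiprocessing import Pool

def add_rows(args):
    a_rows, b_rows = args
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a_rows, b_rows)]

if __name__ == "__main__":
    n = 4
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i * j for j in range(n)] for i in range(n)]

    mid = n // 2
    halves = [(A[:mid], B[:mid]), (A[mid:], B[mid:])]  # top half, bottom half

    with Pool(processes=2) as pool:   # "CPU a" and "CPU b"
        top, bottom = pool.map(add_rows, halves)

    print(top + bottom)               # the full matrix A + B
```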
PARALLEL ARCHITECTURES
 Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be classified along the two independent dimensions of Instruction Stream and Data Stream.
 Flynn's classical taxonomy of parallel architectures:
 SISD – Single Instruction stream, Single Data stream
 SIMD – Single Instruction stream, Multiple Data stream
 MISD – Multiple Instruction stream, Single Data stream
 MIMD – Multiple Instruction stream, Multiple Data stream

FLYNN'S CLASSICAL TAXONOMY
 SISD: Serial. Only one instruction and one data stream are acted on during any one clock cycle. Examples: older mainframes, minicomputers, workstations, and single-processor/core PCs.
 SIMD: All processing units execute the same instruction at any given clock cycle; each processing unit operates on a different data element. Examples: most modern computers, particularly those with GPUs, employ SIMD instructions and execution units.
 MISD: Different instructions operate on a single data element. Example: multiple cryptography algorithms attempting to crack a single coded message.
 MIMD: Every processor can execute different instructions on different data elements. Examples: most current-generation supercomputers, networked parallel computer clusters and "grids", multi-processor computers, and multi-core PCs.

PARALLEL COMPUTER MEMORY ARCHITECTURES: SHARED MEMORY ARCHITECTURE
 All processors access all memory as a single global address space, and data sharing is fast.
 Multiple processors can operate independently but share the same memory resources.
 Changes in a memory location effected by one processor are visible to all other processors.
 Shared memory machines have been classified as UMA and NUMA, based upon memory access times.
Uniform Memory Access (UMA)
 Commonly represented today by Symmetric Multiprocessor (SMP) machines.
 Identical processors, with equal access and access times to memory.
 Sometimes called CC-UMA (Cache Coherent UMA): cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update.
Non-Uniform Memory Access (NUMA)
 Often made by physically linking two or more SMPs.
 One SMP can directly access the memory of another SMP.
 Not all processors have equal access time to all memories.
 Memory access across the link is slower.
 If cache coherency is maintained, it may also be called CC-NUMA (Cache Coherent NUMA).

PARALLEL COMPUTER MEMORY ARCHITECTURES: DISTRIBUTED MEMORY ARCHITECTURE
 Each processor has its own local memory and operates independently; the programmer is responsible for many details of communication between processors.
 Changes a processor makes to its local memory have no effect on the memory of other processors; hence, the concept of cache coherency does not apply.
 When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define how and when the data is communicated.
 Synchronization between tasks is likewise the programmer's responsibility.
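Because distributed-memory processors cannot read each other's memory, data must be moved by explicit, programmer-defined communication. A minimal message-passing sketch using Python's standard library, with a Pipe standing in for the interconnect (names and values are illustrative):

```python
# Distributed-memory sketch: each process has private data; the only way
# to share it is an explicit send on one side and a receive on the other.
from multiprocessing import Process, Pipe

def worker(conn):
    local_data = [1, 2, 3]        # private to this process
    conn.send(sum(local_data))    # explicit, programmer-defined communication
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())     # 6 -- received, not read from shared memory
    p.join()
```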
MULTI CORES
 A multi-core processor is a single computing component with two or more independent processing units called cores, which read and execute program instructions.

(Figure: single-core vs. multi-core CPU chip.)

MULTI-CORES
 The cores fit on a single processor socket; also called a CMP (Chip Multi-Processor).
 The cores run in parallel.
 Interaction with the OS: the OS perceives each core as a separate processor, and the OS scheduler maps threads/processes to different cores.
 Most major operating systems support multi-core today.

WHY MULTI-CORES?
 It is difficult to make single-core clock frequencies even higher.
 Deeply pipelined circuits:
 heat problems
 speed-of-light problems
 difficult design and verification
 large design teams necessary
 server farms need expensive air conditioning
 Many new applications are multithreaded.
 General trend in computer architecture (shift towards more parallelism).

MULTI-CORES
 Multi-core processors are MIMD: different cores execute different threads (Multiple Instructions), operating on different parts of memory (Multiple Data).
 A multi-core processor is a shared memory multiprocessor: all cores share the same memory.

WHAT APPLICATIONS BENEFIT FROM MULTI-CORE?
 Database servers
 Web servers (web commerce)
 Compilers
 Multimedia applications
 Scientific applications, CAD/CAM
 Editing a photo while recording a TV show through a digital video recorder
 Downloading software while running an anti-virus program
 Anything that can be threaded today will map efficiently to multi-core.

MULTI-CORES: CACHE COHERENCE PROBLEM
 Cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches.
 When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, which is particularly the case with CPUs in a multi-core architecture.
 Coherence mechanisms:
 Snooping: snooping-based protocols tend to be faster, if enough bandwidth is available, since every transaction is a request/response seen by all processors. However, snooping isn't scalable: every request must be broadcast to all nodes in the system.
 Directory-based: tend to have longer latencies but use much less bandwidth, since messages are point-to-point rather than broadcast. For this reason, many of the larger systems (>64 processors) use this type of cache coherence.

MULTI-CORES: COHERENCE PROTOCOLS
 Write-invalidate: when a write operation is observed to a location that a cache has a copy of, the cache controller invalidates its own copy of the snooped memory location, which forces a read from main memory of the new value on its next access.
 Write-update: when a write operation is observed to a location that a cache has a copy of, the cache controller updates its own copy of the snooped memory location with the new data.
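A toy simulation of the write-invalidate idea, to show the mechanics; this is not a real hardware protocol, and the class and method names are invented for illustration.

```python
# Toy write-invalidate sketch: on a write, every other cached copy of the
# location is discarded, forcing the next reader back to main memory.
class ToyCache:
    def __init__(self, memory, peers):
        self.memory = memory          # shared "main memory" (a dict)
        self.peers = peers            # all caches on the "bus", incl. this one
        self.lines = {}               # this core's local copies

    def read(self, addr):
        if addr not in self.lines:    # miss: fetch the value from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        for cache in self.peers:      # snooped write: invalidate other copies
            if cache is not self:
                cache.lines.pop(addr, None)
        self.lines[addr] = value
        self.memory[addr] = value     # write-through, for simplicity

memory = {0x10: 1}
caches = []
a = ToyCache(memory, caches)
b = ToyCache(memory, caches)
caches.extend([a, b])

b.read(0x10)         # b caches the old value 1
a.write(0x10, 2)     # invalidates b's copy
print(b.read(0x10))  # 2 -- b misses and re-reads the updated value
```

A write-update variant would replace the pop() with an assignment of the new value into each peer's lines.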
GRAPHICAL PROCESSING UNITS (GPU)
 A processor optimized for 2D/3D graphics, video, visual computing, and display.
 A highly parallel, highly multithreaded multiprocessor optimized for visual computing.
 Provides real-time visual interaction with computed objects via graphics, images, and video.
 Serves as both a programmable graphics processor and a scalable parallel computing platform.
 Heterogeneous systems: combine a GPU with a CPU.

GPU EVOLUTION
 1980s – No GPU; PCs used a VGA controller.
 1990s – More functions added into the VGA controller.
 1997 – 3D acceleration functions: hardware for triangle setup and rasterization, texture mapping, shading.
 2000 – A single-chip graphics processor (beginning of the term "GPU").
 2005 – Massively parallel programmable processors.
 2007 – CUDA (Compute Unified Device Architecture).
 2010 – AMD's Radeon cards, GeForce 10 series.

WHY GPU?
 To provide separate, dedicated graphics resources, including a graphics processor and memory.
 To relieve some of the burden on the main system resources, namely the central processing unit, main memory, and the system bus, which would otherwise get saturated with graphical operations and I/O requests.

GPU VS CPU
 A GPU is tailored for highly parallel operation, while a CPU executes programs serially. For this reason, GPUs have many parallel execution units, while CPUs have few execution units.
 GPUs have significantly faster and more advanced memory interfaces, as they need to shift around a lot more data than CPUs.
 GPUs have much deeper pipelines (several thousand stages vs. 10–20 for CPUs).

COMPONENTS OF GPU
 Graphics processor
 Graphics co-processor
 Graphics accelerator
 Frame buffer
 Memory
 Graphics BIOS
 Digital-to-Analog Converter (DAC)
 Display connector
 Computer (bus) connector

CLUSTERS
 A computer cluster is a group of loosely or tightly coupled computers that work together so closely that, in many respects, it can be viewed as a single computer.
 Connected through a fast LAN.
 Deployed to improve speed and reliability over that provided by a single computer, while typically being much more cost-effective than a single computer of comparable speed or reliability.
 Middleware is required to manage the cluster.

CLUSTERS
 In cluster computing, each node within a cluster is an independent system, with its own operating system, private memory, and, in some cases, its own file system.
 Because processors on one node cannot directly access the memory on other nodes, programs or software run on clusters usually employ a procedure called "message passing" to get data and execution code from one node to another.

NEED OF CLUSTERS
 More computing power.
 Better reliability: orchestrating a number of low-cost, commercial off-the-shelf computers has given rise to a variety of architectures and configurations.
 Improved performance and availability over that of a single computer.
 More cost-effective than single computers of comparable speed or availability.
 E.g. Big Data.

TYPES OF CLUSTERS
High Availability Clusters
 Provide uninterrupted availability of data or services (typically web services) to the end-user community.
 In case of node failure, service can be restored without affecting the availability of the services provided by the cluster, though there will be a performance drop due to the missing node.
 Implementations: mission-critical applications or databases, mail, file and print, web, or application servers.
 E.g. Oracle Clusterware.
Load Balancing Clusters
 Distribute incoming requests for resources or content among multiple nodes running the same programs or having the same content.
 Every node in the cluster is able to handle requests for the same content or application.
 Typically seen in a web-hosting environment.
 E.g. nginx as an HTTP load balancer.
Compute Clusters
 Used for computation-intensive purposes, rather than handling IO-oriented operations such as web service or databases.
 Compute clusters vary in the level of coupling: jobs with frequent communication among nodes may require a dedicated network, dense location, and likely homogeneous nodes, while jobs with infrequent communication between nodes may relax some of these requirements.
 Implementations: data mining, simulations.
 E.g. Rocks package on Linux.

BEOWULF CLUSTERS
 Use parallel processing across multiple computers to create cheap and powerful supercomputers.
 A Beowulf cluster has two types of computers:
 Master (also called service node or front node): used to interact with users and manage the cluster; typically the node with peripherals such as a keyboard, mouse, floppy, video, etc.
 Nodes: a group of computers (computing nodes).
 E.g. OSCAR on Linux.
 When a large problem or set of data is given to a Beowulf cluster, the master computer first runs a program that breaks the problem into small discrete pieces; it then sends a piece to each node to compute. As nodes finish their tasks, the master computer continually sends more pieces to them until the entire problem has been computed.
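A minimal master/worker sketch of the Beowulf pattern just described, using processes on one machine in place of real cluster nodes; the queue-based distribution is an illustrative stand-in for the cluster's message passing, and the piece sizes are arbitrary.

```python
# Beowulf-style sketch: the master breaks the problem into small pieces and
# keeps feeding them to nodes until the whole problem has been computed.
from multiprocessing import Process, Queue

def node(tasks, results):
    while True:
        piece = tasks.get()
        if piece is None:             # master signals: no more work
            break
        results.put(sum(piece))       # compute one piece, report back

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    pieces = [list(range(i, i + 100)) for i in range(0, 1000, 100)]

    nodes = [Process(target=node, args=(tasks, results)) for _ in range(4)]
    for p in nodes:
        p.start()
    for piece in pieces:              # master distributes the pieces
        tasks.put(piece)
    for _ in nodes:                   # one stop signal per node
        tasks.put(None)

    total = sum(results.get() for _ in pieces)
    for p in nodes:
        p.join()
    print(total)                      # sum of 0..999 = 499500
```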
CLUSTERS: TECHNOLOGIES TO IMPLEMENT
Parallel Virtual Machine (PVM)
 Must be directly installed on every cluster node and provides a set of software libraries that present the node as a "parallel virtual machine".
 Provides a run-time environment for message passing, task and resource management, and fault notification.
Message Passing Interface (MPI)
 Drew on various features available in commercial systems of the time.
 The MPI specifications then gave rise to specific implementations.
 Implementations typically use TCP/IP and socket connections.
 A widely available communications model that enables parallel programs to be written in languages such as C, Fortran, Python, etc.

CLUSTER BENEFITS
 Availability
 Performance
 Low cost
 Elasticity
 Run jobs anytime, anywhere

GRID COMPUTING
 Grid computing combines computers from multiple administrative domains to reach a common goal.
 What distinguishes grid computing from cluster systems such as cluster computing is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed.
 A special kind of distributed computing in which different computers within the same network share one or more resources.

TYPES OF GRID COMPUTING – DATA GRIDS
 Allow you to distribute your data across the grid.
 The main goal of a data grid is to provide as much data as possible from memory on every grid node and to ensure data coherency.
 Characteristics:
a. Data Replication – all data is fully replicated to all nodes in the grid.
b. Data Invalidation – whenever data changes on one of the nodes, the same data on all other nodes is purged.
c. Distributed Transactions – transactions are required to ensure data coherency.
d. Data Backups – useful for fail-over; some data grid products provide the ability to assign backup nodes for the data.
e. Data Affinity/Partitioning – allows splitting/partitioning the whole data set into multiple subsets and assigning every subset to a grid node.

TYPES OF GRID COMPUTING – COMPUTE GRIDS
 Allow you to take a computation, optionally split it into multiple parts, and execute the parts on different grid nodes in parallel, which leads to a faster rate of execution. E.g. MapReduce (see the sketch after this list).
 Help improve overall scalability and fault tolerance by offloading your computations onto the most available nodes.
 Characteristics:
a. Automatic Deployment.
b. Topology Resolution – allows provisioning nodes based on any node characteristic or user-specific configuration.
c. Collision Resolution – jobs are executed in parallel, but synchronization is maintained.
d. Load Balancing – proper balancing of your system load within the grid.
e. Checkpoints – long-running jobs should be able to periodically store their intermediate state.
f. Grid Events – a querying mechanism for all grid events is essential.
g. Node Metrics – a good compute grid solution should be able to provide dynamic grid metrics for all grid nodes.
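A minimal MapReduce-style sketch of the compute-grid idea: the computation is split into parts, the parts run in parallel, and the partial results are combined. The word-count task and the pool size are illustrative choices, and local processes stand in for grid nodes.

```python
# Compute-grid sketch in the MapReduce style: map in parallel, then reduce.
from collections import Counter
from multiprocessing import Pool

def map_count(line):
    """Map step: count the words in one part of the input."""
    return Counter(line.split())

if __name__ == "__main__":
    lines = ["the cat sat", "the dog sat", "the cat ran"]

    with Pool(processes=3) as pool:            # "grid nodes"
        partials = pool.map(map_count, lines)  # parts executed in parallel

    totals = Counter()                         # reduce step: combine partials
    for c in partials:
        totals += c
    print(totals)  # Counter({'the': 3, 'cat': 2, 'sat': 2, 'dog': 1, 'ran': 1})
```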
GRID COMPUTING
Advantages:
 Can solve larger, more complex problems in a shorter time.
 Easier to collaborate with other organizations.
 Makes better use of existing hardware.
Disadvantages:
 Grid software and standards are still evolving.
 There is a learning curve to get started.
 Non-interactive job submission.

CLOUD COMPUTING
 A computing paradigm shift where computing moves away from personal computers or an individual application server to a "cloud" of computers.
 Abstraction: users of the cloud only need to be concerned with the computing service being asked for; the underlying details of how it is achieved are hidden.
 Virtualization: cloud computing virtualizes systems by pooling and sharing resources.

NIST DEFINITION OF CLOUD COMPUTING
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

CHARACTERISTICS OF CLOUD COMPUTING
1. On-demand self-service
2. Broad network access
3. Resource pooling
4. Rapid elasticity
5. Measured service

CLOUD COMPONENTS
 Clients
 Data center (the collection of servers where the application to which you subscribe is housed)
 Internet

CLOUD COMPUTING: BENEFITS
 Lower costs: lower computer costs and reduced software costs; by using cloud infrastructure on a pay-as-used, on-demand basis, users can save on capital and operational investment.
 Ease of utilization
 Quality of service
 Reliability
 Outsourced IT management
 Simplified maintenance and upgrades
 Low barrier to entry
 Unlimited storage capacity
 Universal document access
 Latest version availability

CLOUD COMPUTING: LIMITATIONS
 Requires a constant Internet connection.
 Does not work well with low-speed connections.
 Applications may be less customizable than those larger organizations can build for themselves.
 Security and privacy issues.
 The cloud service provider may go down.
 Latency concerns.

RESOURCES
1. https://www.hpcadvisorycouncil.com/pdf/Intro_to_HPC.pdf
2. https://computing.llnl.gov/tutorials/parallel_comp/#Whatis
3. https://www.cs.cmu.edu/~fp/courses/15213-s06/lectures/27-multicore.pdf
4. https://en.wikipedia.org/wiki/Cache_coherence
5. https://www.youtube.com/watch?v=A_i5kOlj_UU
6. https://www.youtube.com/watch?v=bkLVuNfiCVs
