csc25-chapter_10.pdf
CSC-25 High Performance Architectures
Lecture Notes – Chapter X
Warehouse-scale computer

Denis Loubach
[email protected]
Department of Computer Systems
Computer Science Division – IEC
Aeronautics Institute of Technology – ITA
1st semester, 2024

Detailed Contents

Introduction
  Concepts
  Main Goals
  Main Requirements
  Nodes, Racks and Switches
Examples
  Workload
  Memory Hierarchy
  Networking Hierarchy
  Power Utilization Effectiveness - PUE
  Server from a Google WSC
  Why Regular Machines
  Seeks and Scans
Cloud Computing
References

Introduction

Concepts

"Anyone can build a fast CPU. The trick is to build a fast system."
Seymour Cray, considered the father of the supercomputer

Warehouse-scale computer - WSC
- foundation of the internet services that billions of people use every day around the globe

A WSC acts as one giant machine
- it costs hundreds of millions of dollars for the building, the electrical and cooling infrastructure, the servers, and the networking equipment
- the networking equipment connects and houses 50,000 to 100,000 servers

Quick growth of commercial cloud computing
- makes WSCs accessible to anyone with a credit card

Nowadays, the target is to provide information technology for the world
- instead of high-performance computing - HPC for scientists and engineers
Main Goals

Cost-performance
- work done per dollar is critical because of the scale
- reducing even small costs could save millions of dollars

Energy efficiency
- the energy consumed is turned into heat
- power and cooling
- work done per joule is critical

Dependability via redundancy
- 99.99% availability (called "four nines"), i.e., downtime ≤ 1 h per year
- software redundancy also plays an important role, along with hardware redundancy

Network I/O
- keeps data consistent between multiple WSCs
- also interfaces with the public

Interactive and batch processing workloads
- interactive workloads, e.g., search and social networking
- massively parallel batch programs, e.g., calculating metadata useful to such services

Main Requirements

Ample parallelism
- data-level parallelism, e.g., web crawlers
- internet service applications, aka software as a service - SaaS
- "easy" parallelism, i.e., request-level parallelism
- many independent efforts going on in parallel with little need for communication/synchronization

Operational costs
- energy, power distribution, and cooling represent more than 30% of the costs over 10 years

Location
- inexpensive electricity
- proximity to Internet backbone optical fibers
- people nearby to work at the WSC

Big scale trade-offs
- smaller unit costs, i.e., bigger quantities purchased at a time
- "less" dependability, i.e., bigger failure rates
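The "four nines" availability goal above can be checked with a little arithmetic. The following is a minimal sketch, assuming a non-leap year of 8,760 hours; the function name `max_downtime_hours` is made up for illustration and does not come from the lecture notes.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h, assuming a non-leap year


def max_downtime_hours(availability: float) -> float:
    """Return the yearly downtime budget (in hours) for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


# Compare a few availability levels, including the "four nines" goal.
for label, availability in [
    ("two nines", 0.99),
    ("three nines", 0.999),
    ("four nines", 0.9999),
]:
    hours = max_downtime_hours(availability)
    print(f"{label}: {hours:.3f} h/year ({hours * 60:.1f} min/year)")
```

Under these assumptions, four nines allows about 0.876 h (roughly 53 minutes) of downtime per year, which is consistent with the "≤ 1 h per year" bound on the slide.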
Nodes, Racks and Switches

Figure: switches hierarchy in a WSC

Examples

Workload

MapReduce (Hadoop is an open-source alternative)
- popular framework for batch processing in WSCs

Map
- applies a programmer-supplied function to each input
- runs on hundreds of computers to produce an intermediate result of key-value pairs

Reduce
- collects the output of those distributed tasks
- collapses them using another programmer-defined function

MapReduce, words and documents indexing example

Figure: MapReduce usage at Google from 2004 to 2016. PB stands for petabytes

MapReduce, words and documents indexing code fragment example

    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
        EmitIntermediate(w, "1");  // produce a (word, "1") pair for every word

    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);     // convert each count string to an integer
      Emit(AsString(result));      // emit the total count for this word
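The pseudocode fragment above can be mirrored in a small runnable sketch. This is a single-process illustration only, assuming whitespace tokenization and integer counts instead of the "1" strings; the names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are hypothetical and are not part of MapReduce or Hadoop. A real framework would distribute the map and reduce tasks over many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in contents.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the intermediate values by key, as the framework would do."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(word: str, counts: List[int]) -> int:
    """Reduce step: collapse all counts for one word into a total."""
    return sum(counts)


def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process."""
    intermediate = (
        pair for name, text in documents.items() for pair in map_fn(name, text)
    )
    return {word: reduce_fn(word, counts) for word, counts in shuffle(intermediate).items()}


# Hypothetical input documents, used only to exercise the sketch.
docs = {"a.txt": "the quick brown fox", "b.txt": "the lazy dog and the fox"}
counts = word_count(docs)
print(counts["the"], counts["fox"])  # prints: 3 2
```

The explicit `shuffle` step is the part the pseudocode leaves implicit: between map and reduce, all values that share a key are gathered so that each reduce call sees every count for one word.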
Memory Hierarchy

Each node contains:
- 16 GiB DRAM (×80 per rack) (×30 per array)
- 128 GiB Flash (×80 per rack) (×30 per array)
- 2,048 GiB Disk (×80 per rack) (×30 per array)
- 1 Gbit/s Ethernet port

The rack holds 80 nodes, and the array has 30 racks.

Latency (µs)

           Local Node    Rack      Array
  DRAM     0.1           300       500
  Flash    100           400       600
  Disk     10,000        11,000    12,000

Networking software and switch overhead increase the latency in the rack; the array switch hardware/software increases latency further.

Bandwidth (MiB/s)

           Local Node    Rack      Array
  DRAM     20,000        100       10
  Flash    1,000         100       10
  Disk     200           100       10

The 1 Gbit/s Ethernet limits the remote bandwidth within the rack; the bandwidth of the array switch limits the remote bandwidth further.

Bottom line, i.e., comparison between local node and array
- network overhead considerably increases latency
- network collapses differences in bandwidth

Networking Hierarchy

Some WSCs need more than one array; in that case, there is one more level in the networking hierarchy.

Figure: regular Layer 3 routers connect arrays together and also to the Internet; the core router operates in the Internet backbone

Power Utilization Effectiveness - PUE

Metric to evaluate the efficiency of a WSC

$$\mathrm{PUE} = \frac{\text{Total facility power consumption}}{\text{IT equipment power consumption}} \qquad (1)$$

where PUE must be ≥ 1; the bigger the PUE, the less efficient the WSC
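Equation (1) can be turned into a short calculation. The sketch below assumes a made-up facility whose load figures (`it_kw`, `cooling_kw`, `other_kw`) are hypothetical and chosen only for illustration; the function name `pue` is likewise an assumption.

```python
def pue(total_facility_power_kw: float, it_equipment_power_kw: float) -> float:
    """Compute PUE as total facility power divided by IT equipment power."""
    if it_equipment_power_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_power_kw < it_equipment_power_kw:
        # The facility total includes the IT load, so PUE can never be below 1.
        raise ValueError("total facility power cannot be below IT equipment power")
    return total_facility_power_kw / it_equipment_power_kw


# Hypothetical facility: all three figures are assumptions, not measurements.
it_kw = 1000.0      # servers, storage, and networking equipment
cooling_kw = 550.0  # air conditioning
other_kw = 140.0    # e.g., power distribution losses

total_kw = it_kw + cooling_kw + other_kw
print(f"PUE = {pue(total_kw, it_kw):.2f}")  # prints: PUE = 1.69
```

Every watt that is not delivered to IT equipment raises the numerator only, which is why a larger PUE indicates a less efficient WSC.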
Power Utilization Effectiveness - PUE (cont.)

The power for AC and other uses (e.g., power distribution) is normalized to the power for the IT equipment
- power for IT equipment must be 1.0
- AC varies from about 0.30 to 1.40× the power of the IT equipment
- power for other uses varies from about 0.05 to 0.60× the power of the IT equipment

Median PUE is 1.69
- the cooling infrastructure uses more than half as much power as the servers
- on average, 0.55 of the 1.69 is for cooling

Figure: power utilization efficiency of 19 data centers, 2006; PUE from most to least efficient

Figure: average PUE from 15 Google WSCs, 2008 to 2017; the spiking line is the quarterly average PUE, and the straighter line is the trailing 12-month average PUE

Server from a Google WSC

- Intel Haswell 2.3 GHz CPUs
- 2 sockets × 18 cores × 2 threads, giving 72 "virtual cores" per machine
- 2.5 MiB last-level cache per core
- 256 GB DDR3-1600 DRAM
- 2 × 8 TB SATA disk drives, or 1 TB SSD
- 10 Gbit/s Ethernet link

Why Regular Machines

Processor
- HP Integrity Superdome - Itanium2: 64 sockets; 128 cores, dual-threaded; 1.6 GHz Itanium2, 12 MB last-level cache
- HP ProLiant ML350 G5: 1 socket, quad-core; 2.66 GHz X5355 CPU, 8 MB last-level cache

Memory
- HP Integrity Superdome - Itanium2: 2,048 GB
- HP ProLiant ML350 G5: 24 GB

Disk storage
- HP Integrity Superdome - Itanium2: 320,974 GB | 7,056 drives
- HP ProLiant ML350 G5: 3,961 GB | 105 drives

TPC-C price/performance
- HP Integrity Superdome - Itanium2: $2.93/tpmC
- HP ProLiant ML350 G5: $0.73/tpmC

Price/performance (server HW only)
- HP Integrity Superdome - Itanium2: $1.28/tpm
- HP ProLiant ML350 G5: $0.10/tpm

The TPC-C benchmark is measured in transactions per minute (tpmC) – www.tpc.org
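The price/performance figures in the comparison above can be reduced to two ratios. This is a minimal sketch using only the four dollar values from the table; the dictionary layout and the name `cost_ratio` are assumptions made for illustration.

```python
# Dollars per transaction-per-minute, copied from the comparison above.
superdome = {"tpcc": 2.93, "hw_only": 1.28}
proliant = {"tpcc": 0.73, "hw_only": 0.10}


def cost_ratio(metric: str) -> float:
    """How many times more the high-end machine costs per unit of throughput."""
    return superdome[metric] / proliant[metric]


print(f"full TPC-C configuration: {cost_ratio('tpcc'):.1f}x")
print(f"server hardware only: {cost_ratio('hw_only'):.1f}x")
```

With these numbers the high-end machine costs about 4 times more per unit of throughput for the full configuration, and about 12.8 times more when only the server hardware is counted, which is the cost argument for building a WSC from regular machines.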
Seeks and Scans

Let's assume a 1 TB database with 100-byte records
- update 1% of the records

First scenario: random access
- assuming each update takes ≈ 30 ms, i.e., seek, read, write
- 10^8 updates → ≈ 35 days

Second scenario: rewrite all records
- assuming 100 MiB/s throughput
- time → a matter of hours

The lesson is to avoid random seeks

Cloud Computing

"If computers of the kind I have advocated become the computers of the future, then computing may someday be organized as a public utility just as the telephone system is a public utility... The computer utility could become the basis of a new and important industry."
John McCarthy, MIT centennial celebration, 1961

Back then, he was thinking about timesharing.

To fulfill the demand of an increasing number of users, Internet players (Amazon, Google, and Microsoft) build very large warehouse-scale computers from commodity components.

McCarthy's prediction eventually came true.

References

Information to the reader: lecture notes mainly based on the following references

Castro, Paulo André. Notas de Aula da disciplina CES-25 Arquiteturas para Alto Desempenho. ITA. 2018.

Dean, Jeff. Designs, Lessons and Advice from Building Large Distributed Systems. Online. Google Fellow Presentation. 2009. URL: https://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf.

Hennessy, J. L. and D. A. Patterson. Computer Architecture: A Quantitative Approach. 6th ed. Morgan Kaufmann, 2017.
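As a worked check of the seeks-and-scans estimate from the Examples section, the two scenarios can be computed directly. This is a minimal sketch under stated assumptions: 1 TB is taken as 10^12 bytes, and the rewrite scenario is assumed to need one full read pass plus one full write pass at the quoted throughput. All constant names are made up for illustration.

```python
DB_BYTES = 10**12                     # 1 TB, assuming a decimal terabyte
RECORD_BYTES = 100                    # size of one record
UPDATE_PERCENT = 1                    # share of records that are updated
UPDATE_MS = 30                        # seek + read + write per random update
THROUGHPUT_BYTES_S = 100 * 2**20      # 100 MiB/s sequential throughput
PASSES = 2                            # assumption: read everything, then write everything

records = DB_BYTES // RECORD_BYTES            # 10^10 records
updates = records * UPDATE_PERCENT // 100     # 10^8 updates

# Scenario 1: one random access per updated record.
random_seconds = updates * UPDATE_MS / 1000
random_days = random_seconds / 86_400

# Scenario 2: stream the whole database sequentially.
scan_seconds = PASSES * DB_BYTES / THROUGHPUT_BYTES_S
scan_hours = scan_seconds / 3_600

print(f"random access: {random_days:.1f} days")   # prints: random access: 34.7 days
print(f"full rewrite:  {scan_hours:.1f} hours")   # prints: full rewrite:  5.3 hours
print(f"speedup:       {random_seconds / scan_seconds:.0f}x")
```

Even though the rewrite touches 100 times more records than strictly necessary, sequential throughput beats per-record seeks by two orders of magnitude under these assumptions.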