CSC-25 High Performance Architectures
Lecture Notes – Chapter X
Warehouse-scale computer
Denis Loubach
[email protected]
Department of Computer Systems
Computer Science Division – IEC
Aeronautics Institute of Technology – ITA
1st semester, 2024

Detailed Contents
- Introduction: Concepts; Main Goals; Main Requirements; Nodes, Racks and Switches
- Examples: Workload; Memory Hierarchy; Networking Hierarchy; Power Utilization Effectiveness - PUE; Server from a Google WSC; Why Regular Machines; Seeks and Scans
- Cloud Computing
- References

1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA

Outline: Introduction, Examples, Cloud Computing, References

Introduction
Concepts
"Anyone can build a fast CPU. The trick is to build a fast system."
Seymour Cray, considered the father of the supercomputer

Warehouse-scale computer (WSC)
- foundation of the Internet services that billions of people around the globe use every day
- a WSC acts as one giant machine
- costs hundreds of millions of dollars for the building, the electrical and cooling infrastructure, the servers, and the networking equipment, which together house and connect 50,000–100,000 servers
Rapid growth of commercial cloud computing
- makes WSCs accessible to anyone with a credit card
Nowadays, the target is providing information technology for the world
- instead of high-performance computing (HPC) for scientists and engineers
Main Goals
Cost-performance
- work done per dollar is critical at this scale
- reducing even small costs could save millions
Energy efficiency
- the energy consumed is turned into heat
- both the power itself and the cooling needed to remove that heat must be paid for
- work done per joule is critical
Dependability via redundancy
- 99.99% availability (called "four nines"), i.e., downtime ≤ 1 h per year
- software redundancy also plays an important role, along with hardware redundancy
Network I/O
- keeping data consistent between multiple WSCs
- as well as interfacing with the public
Interactive and batch processing workloads
- interactive workloads, e.g., search and social networking
- massively parallel batch programs, e.g., computing metadata useful to such services

Main Requirements
Ample parallelism
- data-level parallelism, e.g., web crawlers
- Internet service applications, aka software as a service (SaaS)
- "easy" parallelism, i.e., request-level parallelism
- many independent efforts going on in parallel, with little need for communication/synchronization
Operational costs
- energy, power distribution, and cooling represent more than 30% of the costs over 10 years
Location
- inexpensive electricity
- proximity to Internet backbone optical fibers
- human resources available nearby
Big scale trade-offs
- smaller unit costs, since bigger quantities are purchased at a time
- "less" dependability, i.e., bigger failure rates
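The "four nines" goal under Main Goals can be checked with one line of arithmetic: allowed downtime is (1 − availability) × hours per year. A minimal sketch, assuming a 365-day year; the function name is illustrative only.

```python
# Sketch: convert an availability target into the maximum yearly downtime.
# Assumption: a 365-day year (8,760 hours), ignoring leap years.
HOURS_PER_YEAR = 365 * 24


def max_downtime_hours(availability: float) -> float:
    """Yearly downtime (hours) allowed by `availability`, given as 0..1."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    targets = [("two nines", 0.99), ("three nines", 0.999),
               ("four nines", 0.9999), ("five nines", 0.99999)]
    for label, a in targets:
        hours = max_downtime_hours(a)
        print(f"{label:>11}: {hours:8.3f} h/year ({hours * 60:7.1f} min)")
```

For 99.99%, it gives about 0.88 h (≈ 53 min) per year, consistent with the ≤ 1 h figure; each additional nine cuts the allowed downtime by a factor of 10.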
Nodes, Racks and Switches
Figure: Switch hierarchy in a WSC

Examples
Workload
MapReduce (Hadoop is an alternative open-source version)
- popular framework for batch processing in WSCs
Map
- applies a programmer-supplied function to each input
- runs on hundreds of computers to produce an intermediate result of key-value pairs
Reduce
- collects the output of those distributed tasks
- collapses them using another programmer-defined function

MapReduce, word and document indexing example
Figure: MapReduce usage at Google from 2004 to 2016 (PB stands for petabytes)

MapReduce, word and document indexing: code fragment example

    map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
            EmitIntermediate(w, "1"); // emit a (word, "1") pair per occurrence

    reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
            result += ParseInt(v); // convert each count string to an integer
        Emit(AsString(result));
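To see the map, shuffle, and reduce phases end to end, the fragment above can be simulated in a single process. This is a sketch, not Hadoop's or Google's MapReduce API: map_fn and reduce_fn mirror the pseudocode, while run_mapreduce is a hypothetical driver that performs sequentially the grouping by key that the real framework distributes across many machines.

```python
# Single-process simulation of the MapReduce word-count example (sketch).
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, str]]:
    # key: document name; value: document contents
    for word in value.split():
        yield word, "1"  # EmitIntermediate(w, "1")


def reduce_fn(key: str, values: Iterable[str]) -> str:
    # key: a word; values: a list of counts (as strings)
    result = 0
    for v in values:
        result += int(v)  # ParseInt(v)
    return str(result)  # Emit(AsString(result))


def run_mapreduce(documents: Dict[str, str],
                  mapper: Callable[[str, str], Iterable[Tuple[str, str]]],
                  reducer: Callable[[str, Iterable[str]], str]) -> Dict[str, str]:
    # Map phase: in a WSC, each mapper call could run on a different machine.
    intermediate: Dict[str, List[str]] = defaultdict(list)
    for name, contents in documents.items():
        for k, v in mapper(name, contents):
            intermediate[k].append(v)  # shuffle: group values by key
    # Reduce phase: one reducer call per distinct intermediate key.
    return {k: reducer(k, vs) for k, vs in sorted(intermediate.items())}


if __name__ == "__main__":
    docs = {"doc1": "the cat sat", "doc2": "the dog sat down"}
    print(run_mapreduce(docs, map_fn, reduce_fn))
    # {'cat': '1', 'dog': '1', 'down': '1', 'sat': '2', 'the': '2'}
```

Keeping the driver separate from map_fn and reduce_fn reflects the framework's split: the programmer writes only the two functions, and the runtime handles distribution and grouping.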
Memory Hierarchy
Each node contains:
- 16 GiB DRAM
- 128 GiB Flash
- 2,048 GiB Disk
- 1 Gbit/s Ethernet port
The rack holds 80 nodes, and the array has 30 racks, so each per-node capacity is multiplied by 80 within a rack and by 30 again within the array.

Latency (µs)
          Local node   Rack      Array
DRAM      0.1          300       500
Flash     100          400       600
Disk      10,000       11,000    12,000
Networking software and switch overhead increase the latency within the rack; the array switch hardware/software increases it further.

Bandwidth (MiB/s)
          Local node   Rack      Array
DRAM      20,000       100       10
Flash     1,000        100       10
Disk      200          100       10
The 1 Gbit/s Ethernet limits the remote bandwidth within the rack; the bandwidth of the array switch limits it further.

Bottom line, i.e., comparing the local node with the array:
- network overhead considerably increases latency
- the network collapses the differences in bandwidth between DRAM, Flash, and Disk

Networking Hierarchy
Some WSCs need more than one array; in that case, there is one more level in the networking hierarchy.
Regular Layer 3 routers connect the arrays together and to the Internet; core routers operate in the Internet backbone.

Power Utilization Effectiveness - PUE
Metric to evaluate the efficiency of a WSC:

$$\mathrm{PUE} = \frac{\text{Total facility power consumption}}{\text{IT equipment power consumption}} \tag{1}$$

where PUE is always ≥ 1, because the total facility power includes the IT equipment power; the bigger the PUE, the less efficient the WSC.
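Eq. (1) is easy to evaluate once each power component is known. A minimal sketch, with power given in any consistent unit or normalized to the IT load (IT = 1.0); the example uses the median data center of the 2006 survey (PUE 1.69, of which 0.55 is cooling), with the remaining 0.14 for other uses derived here as 1.69 − 1.0 − 0.55 rather than reported. Function names are illustrative.

```python
# PUE helper (sketch). Power values can be in any consistent unit (e.g., kW)
# or normalized so that IT equipment power = 1.0.


def pue(it_power: float, cooling_power: float, other_power: float) -> float:
    """Eq. (1): total facility power divided by IT equipment power."""
    if it_power <= 0:
        raise ValueError("IT equipment power must be positive")
    if cooling_power < 0 or other_power < 0:
        raise ValueError("overhead power cannot be negative")
    total_facility_power = it_power + cooling_power + other_power
    return total_facility_power / it_power  # >= 1 by construction


def it_share(pue_value: float) -> float:
    """Fraction of facility power that actually reaches the IT equipment."""
    return 1.0 / pue_value


if __name__ == "__main__":
    # Median data center: IT = 1.0, cooling = 0.55, other uses = 0.14 (derived)
    median = pue(1.0, 0.55, 0.14)
    print(f"PUE = {median:.2f}, IT share = {it_share(median):.0%}")
    # PUE = 1.69, IT share = 59%
```

Reading PUE through it_share makes the cost concrete: at the median, only about 59% of the power the facility draws reaches the servers.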
Power Utilization Effectiveness - PUE
The power for AC and other uses (e.g., power distribution) is normalized to the power for the IT equipment:
- the power for IT equipment is therefore 1.0
- AC varies from about 0.30 to 1.40× the power of the IT equipment
- power for other uses varies from about 0.05 to 0.60× the power of the IT equipment
The median PUE is 1.69:
- the cooling infrastructure uses more than half as much power as the servers
- on average, 0.55 of the 1.69 is for cooling
Figure: Power utilization efficiency of 19 data centers in 2006, with PUE ordered from most to least efficient.
Figure: Average PUE of 15 Google WSCs, 2008 to 2017; the spiky line is the quarterly average PUE, and the smoother line is the trailing 12-month average PUE.

Server from a Google WSC
- Intel Haswell 2.3 GHz CPUs: 2 sockets × 18 cores × 2 threads, giving 72 "virtual cores" per machine
- 2.5 MiB last-level cache per core
- 256 GB DDR3-1600 DRAM
- 2 × 8 TB SATA disk drives, or 1 TB SSD
- 10 Gbit/s Ethernet link

Why Regular Machines
                                     HP Integrity Superdome - Itanium2           HP ProLiant ML350 G5
Processor                            64 sockets, 128 cores, dual-threaded;       1 socket, quad-core;
                                     1.6 GHz Itanium2, 12 MB last-level cache    2.66 GHz X5355 CPU, 8 MB last-level cache
Memory                               2,048 GB                                    24 GB
Disk storage                         320,974 GB, 7,056 drives                    3,961 GB, 105 drives
TPC-C price/performance              $2.93/tpmC                                  $0.73/tpmC
Price/performance (server HW only)   $1.28/tpm                                   $0.10/tpm
The TPC-C benchmark is measured in transactions per minute (tpmC); see www.tpc.org.
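Returning to the Memory Hierarchy example, its numbers can be turned into a rough model: aggregate capacity per level (node, rack of 80 nodes, array of 30 racks) and a first-order access time, latency plus size divided by bandwidth. The linear time model is a simplifying assumption that ignores queuing and contention, and the rack and array bandwidths are taken as 100 and 10 MiB/s for every technology, reflecting that the network rather than the device sets remote bandwidth. Function and constant names are illustrative.

```python
# Rough model of the example WSC memory hierarchy (sketch, not a simulator).
# Assumptions: rack/array bandwidths are the same for all technologies, and
# access time = latency + size / bandwidth (no queuing or contention).
NODES_PER_RACK = 80
RACKS_PER_ARRAY = 30
MULTIPLIER = {"local": 1, "rack": NODES_PER_RACK,
              "array": NODES_PER_RACK * RACKS_PER_ARRAY}

NODE_CAPACITY_GIB = {"DRAM": 16, "Flash": 128, "Disk": 2_048}

LATENCY_US = {  # microseconds
    "DRAM": {"local": 0.1, "rack": 300, "array": 500},
    "Flash": {"local": 100, "rack": 400, "array": 600},
    "Disk": {"local": 10_000, "rack": 11_000, "array": 12_000},
}

BANDWIDTH_MIB_S = {  # MiB/s; remote values capped by the network
    "DRAM": {"local": 20_000, "rack": 100, "array": 10},
    "Flash": {"local": 1_000, "rack": 100, "array": 10},
    "Disk": {"local": 200, "rack": 100, "array": 10},
}


def capacity_gib(tech: str, level: str) -> int:
    """Total capacity of `tech` reachable at `level` (local, rack, or array)."""
    return NODE_CAPACITY_GIB[tech] * MULTIPLIER[level]


def access_time_s(tech: str, level: str, size_mib: float) -> float:
    """First-order time (s) to fetch `size_mib` MiB: latency + transfer time."""
    latency_s = LATENCY_US[tech][level] * 1e-6
    return latency_s + size_mib / BANDWIDTH_MIB_S[tech][level]


if __name__ == "__main__":
    for level in ("local", "rack", "array"):
        caps = {tech: capacity_gib(tech, level) for tech in NODE_CAPACITY_GIB}
        t = access_time_s("DRAM", level, 1.0)
        print(f"{level:>5}: capacity {caps} GiB, 1 MiB from DRAM in {t:.6f} s")
```

Under these assumptions, 1 MiB of DRAM takes about 50 µs locally but about 0.1 s across the array, roughly a 2,000× gap that is almost entirely transfer time: the "network collapses differences in bandwidth" point in numbers.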
Seeks and Scans
Let's assume a 1 TB database with 100-byte records (i.e., 10¹⁰ records), of which 1% must be updated.
First scenario: random access
- assuming each update takes ≈ 30 ms, i.e., seek, read, write
- 10⁸ updates → ≈ 35 days
Second scenario: rewrite all records
- assuming 100 MiB/s sequential throughput
- time → a few hours (roughly 5 to 6 hours to read and write the whole 1 TB)
The lesson is to avoid random seeks.

Cloud Computing
"If computers of the kind I have advocated become the computers of the future, then computing may someday be organized as a public utility just as the telephone system is a public utility... The computer utility could become the basis of a new and important industry."
John McCarthy, MIT centennial celebration, 1961
At the time, he was thinking about timesharing.

To fulfill the demand of an increasing number of users, Internet players (Amazon, Google, and Microsoft) build very large warehouse-scale computers from commodity components.
McCarthy's prediction eventually came true.

References
Information to the reader
Lecture notes mainly based on the following references:
- Castro, Paulo André. Notas de Aula da disciplina CES-25 Arquiteturas para Alto Desempenho [Lecture notes for the course CES-25 Architectures for High Performance]. ITA, 2018.
- Dean, Jeff. Designs, Lessons and Advice from Building Large Distributed Systems. Google Fellow presentation (online), 2009. URL: https://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
- Hennessy, J. L. and D. A. Patterson. Computer Architecture: A Quantitative Approach. 6th ed. Morgan Kaufmann, 2017.
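The Seeks and Scans estimate in the Examples section can be checked with a few lines of arithmetic. A minimal sketch, assuming 1 TB = 10¹² bytes and reading "rewrite all records" as one full sequential read plus one full sequential write (an interpretation, not stated on the slide); constant and function names are hypothetical.

```python
# Back-of-the-envelope check of "Seeks and Scans" (sketch).
# Assumptions: 1 TB = 10**12 bytes; the rewrite reads and then writes the
# whole database once at the stated sequential throughput.
DB_BYTES = 10**12
RECORD_BYTES = 100
UPDATE_PERCENT = 1
SECONDS_PER_RANDOM_UPDATE = 0.030     # seek + read + write
SEQUENTIAL_BYTES_PER_S = 100 * 2**20  # 100 MiB/s


def random_update_seconds() -> float:
    records = DB_BYTES // RECORD_BYTES         # 10**10 records
    updates = records * UPDATE_PERCENT // 100  # 10**8 updates
    return updates * SECONDS_PER_RANDOM_UPDATE


def full_rewrite_seconds(passes: int = 2) -> float:
    # passes=2: one sequential read plus one sequential write of all data
    return passes * DB_BYTES / SEQUENTIAL_BYTES_PER_S


if __name__ == "__main__":
    print(f"random updates: {random_update_seconds() / 86_400:.1f} days")
    print(f"full rewrite:   {full_rewrite_seconds() / 3_600:.1f} hours")
```

It prints about 34.7 days for the random updates versus about 5.3 hours for the sequential rewrite: touching only 1% of the data with seeks is over 150× slower than streaming all of it twice.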