Computer Abstractions and Technology Lecture Notes PDF
Summary
These lecture notes cover computer abstractions and technology for an introductory computer architecture course. They discuss the components of a computer (processors, memory, memory controllers, and interconnect), four decades of performance scaling, the end of Moore's Law and Dennard scaling, and future opportunities such as domain-specific architectures. The notes are presented as PDF slides.
Full Transcript
Computer Abstractions and Technology
471029: Introduction to Computer Architecture, 2nd Lecture
Disclaimer: Slides are mainly based on the COD (Computer Organization and Design) 5th edition textbook and also developed in part by Prof. Dohyung Kim @ KNU and the computer architecture courses @ KAIST and SKKU.

Opening the Box
- Display, fans, speakers, batteries
- Computer board + DRAM + CPU + GPU
- I/O devices and storage

Components of a Computer
- Processor
- Memory
- Interconnects: NoC (Network-on-Chip), processor interconnect, large-scale networks
- I/O:
  - User-interface devices: display, keyboard, mouse, sound, camera, ...
  - Storage devices: HDD, SSD, CD/DVD
  - Network adapters: Ethernet, 3G/4G/5G, Wi-Fi, Bluetooth, NFC, ...
- The same components appear in all classes of computers.

Inside the Processor (CPU)
- Datapath: performs operations on data
- Control: tells the datapath, memory, and I/O devices what to do
- Old days: the "ARM1 (Acorn RISC Machine 1)", 1985, with clearly separated control and datapath regions
- Sources: http://www.righto.com/2016/02/reverse-engineering-arm1-processors.html and https://en.wikichip.org/wiki/acorn/microarchitectures/arm1

Inside the Processor (CPU), cont'd
- [Annotated ARM1 die: program counter, fetch, instruction pipe, pipeline status decode, instruction decode, ALU decode, register decode, shift decode, register file, ...]

Inside the Memory
- Off-chip memory controller (until ~2008)
- [Diagram: CPU connected to DRAM through a separate memory-controller chip, alongside PCI]
- Source: "Memory Systems: Cache, DRAM, Disk", Jacob et al., 2010

Inside the Memory, cont'd
- DRAM technology

What Happened?
- FLOPS (flop/s) = floating-point operations per second; KFLOPS = 10³ FLOPS, MFLOPS = 10⁶ FLOPS, GFLOPS = 10⁹ FLOPS
- 1996: Hitachi CP-PACS/2048 supercomputer (2048 processing units, 3D hyper-crossbar network): ~368.2 GFLOPS at 257 kW
- 2019: AMD EPYC 7702P 64-core processor: ~388 GFLOPS at ~0.2 kW
- Roughly the same performance at three orders of magnitude less power (see the sketch below)
- Source: top500.org
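To make the "three orders of magnitude" claim concrete, here is a minimal Python sketch (not from the slides) that converts the two machines' quoted figures into GFLOPS per watt:

```python
# Energy efficiency of the two machines above, in GFLOPS per watt.
# Performance (GFLOPS) and power (kW) figures are the ones quoted on the slide.

machines = {
    "Hitachi CP-PACS/2048 (1996)": (368.2, 257.0),  # ~368.2 GFLOPS at 257 kW
    "AMD EPYC 7702P (2019)":       (388.0, 0.2),    # ~388 GFLOPS at ~0.2 kW
}

for name, (gflops, kilowatts) in machines.items():
    efficiency = gflops / (kilowatts * 1000)        # GFLOPS per watt
    print(f"{name}: {efficiency:.4f} GFLOPS/W")

# At roughly equal performance, the power ratio is 257 kW / 0.2 kW ≈ 1285x:
# about three orders of magnitude, as the slide notes.
```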
Reviewing 40 Years of Moore's Law
- 40 years of stunning progress in microprocessor design: ~1.4× annual performance improvement for 40+ years
- Width: 8 → 16 → 32 → 64 bits (~4×)
- Instruction-level parallelism: from 4-10 cycles per instruction to 4+ instructions per cycle (~10-20×)
- Multicore: from one core to 128+ cores (~128×+)
- Clock rate: 3 MHz to 4 GHz (through technology & architecture)

Now: Inside the Processor (CPU)
- Status quo (Intel Core i7-3960X): further integrated, more functionality on one die

Now: Inside the Processor (CPU), cont'd
- Status quo (AMD Ryzen 5000, Zen 3 architecture)

Now: Inside the Memory
- High-Bandwidth Memory (HBM): 3D-stacked memory
- Examples: NVIDIA Volta V100 GPU + HBM2 (2017), AMD Radeon Pro 5600 GPU + HBM2 (2020)
- Source: https://community.cadence.com/cadence_blogs_8/b/fv/posts/what-s-new-with-hybrid-memory-cube-hmc

Interconnect in CPU
- Interconnect matters as the number of compute units increases (e.g., Intel Ice Lake)
- Network-on-Chip (on-chip network), e.g., a mesh topology
- Sources: "Accelerating Fibre Orientation Estimation from Diffusion Weighted Magnetic Resonance Imaging Using GPUs", Hernández et al., 2013; Intel; "On-Chip Networks", Mark Hill, 2009

Eight Great Ideas
- Design for Moore's Law
- Use abstraction to simplify design
- Make the common case fast
- Performance via parallelism
- Performance via pipelining
- Performance via prediction
- Hierarchy of memories
- Dependability via redundancy

Abstractions
- Abstraction helps us deal with complexity by hiding lower-level details.
- Application programming interface (API): the interface between applications and libraries (e.g., a programming language's libraries)
- Application binary interface (ABI): the system-software interface, or the interface between two binary programs (e.g., a calling convention)
- Instruction set architecture (ISA): the hardware/software interface
- Implementation: the details underlying an interface
- Source: https://www.computer.org/csdl/mags/co/2005/05/r5032-abs.html

Parallelism
- Implicit parallelism: instruction-level parallelism (ILP)
  - From the programmer's perspective, a sequence of instructions executes sequentially; the hardware executes it in parallel.
  - Techniques: pipelining, speculation (prediction), caching, superscalar execution (multiple instructions per cycle), dynamic scheduling (out-of-order execution), ...
- Explicit parallelism: data- and thread-level parallelism
  - Hardware provides parallel resources, and the programmer explicitly uses them to execute instructions simultaneously (see the sketch below).
  - Why move to explicit parallelism? Diminishing returns on instruction-level parallelism.
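As referenced in the list above, here is a minimal Python sketch of the implicit/explicit distinction, assuming a CPU-bound toy workload; the prime-counting function, problem size, and worker count are illustrative choices, not from the lecture:

```python
# Minimal sketch of *explicit* data-level parallelism: the programmer
# partitions the work, and separate cores execute the pieces simultaneously.
# (ILP, by contrast, is extracted by the hardware from what looks to the
# programmer like a purely sequential instruction stream.)
from concurrent.futures import ProcessPoolExecutor

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division; deliberately CPU-bound."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    N, WORKERS = 200_000, 4                      # illustrative sizes
    chunk = N // WORKERS
    chunks = [(i * chunk, (i + 1) * chunk) for i in range(WORKERS)]

    # Sequential version: one instruction stream; any overlap is ILP's doing.
    seq = sum(count_primes(c) for c in chunks)

    # Explicitly parallel version: the same chunks on four processes.
    with ProcessPoolExecutor(max_workers=WORKERS) as pool:
        par = sum(pool.map(count_primes, chunks))

    assert seq == par                            # same answer either way
    print(f"{seq} primes below {N}")
```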
Everything goes well and looks fine... but we now have new challenges.

[Note] Uniprocessor Performance (Single Core)

The End of Moore's Law

End of Dennard Scaling
- Dennard scaling: as transistors get smaller, power density stays constant.
- Power = α × C × F × V², where α is the fraction of time the transistors switch, C is capacitance (related to area), F is frequency, and V is voltage.
- So as transistor sizes shrank and voltage was reduced, circuits could operate at higher frequencies at the same power: with a scaling factor k, capacitance and voltage each drop by k while frequency rises by k, so per-transistor power falls by k², exactly offsetting the k² more transistors that fit in the same area.
- Dennard scaling ignored leakage current and threshold voltage, which establish a floor on power per transistor.
- These created a "power wall" that has limited practical processor frequency.

Running Into the Power Wall

End of Dennard Scaling Is a Crisis
- Processors have reached their power limit: thermal dissipation is maxed out.
- Energy consumption has become more important to users, e.g., in mobile, IoT, and datacenters.
- E.g., the global IT industry's 2012 electricity consumption, in billions of kilowatt-hours, compared to the world's largest energy-consuming countries (source: Greenpeace).

"New Golden Age of Computer Architecture"
- Hennessy & Patterson, 2018 Turing Lecture
- The end of Dennard scaling and Moore's Law means no more free performance.
- "The next decade will see a Cambrian explosion of novel computer architectures."

Future Opportunities: Domain-Specific Architectures and Languages
- Domain-specific architecture (DSA): an architecture tailored to a specific problem domain, also called an "accelerator"
  - GPUs for graphics processing
  - Neural-network processors for deep learning
  - Processors for software-defined networking
  - [Note] These share many architectural techniques with general-purpose processors.
- Domain-specific language (DSL): DSAs require targeting higher-level operations to the architecture, but extracting such structure and information from languages like Python, Java, and C is simply too difficult. DSLs make it possible to program DSAs efficiently:
  - MATLAB, a language for operating on matrices
  - TensorFlow, a dataflow language used for programming DNNs
  - P4, a language for programming SDNs
  - Halide, a language for image processing

Future Opportunities: Secure Architecture and S/W
- Control isolation
- Data isolation
- Constant-time programming
- Avoiding speculative execution
- Avoiding shared resources
- https://www.neowin.net/news/microsoft-no-longer-suggests-overlooking-downfall-of-intel-7th-8th-9th-10th-11th-gen-cpus/

Future Opportunities: Energy-Efficient Architecture and S/W
- Minimizing instruction counts, e.g., the same function in less code (see the sketch below)
- Less data movement, fewer communications
- Data-centric architecture, e.g., PIM (processor-in-memory)
- "Energy Efficiency across Programming Languages", SLE 2017
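As a closing illustration of "minimizing instruction counts" and "less data movement" at the software level (a sketch of the idea, not from the slides; the functions and data size are made up), compare a version that materializes intermediate lists with one that streams the data once:

```python
# Sketch of "same function, fewer instructions / less data movement".
# Both functions compute the sum of squares of the even numbers in `data`,
# but the first materializes two intermediate lists (extra memory traffic),
# while the second streams each element once with constant extra memory.

def sum_even_squares_naive(data):
    evens = [x for x in data if x % 2 == 0]   # pass 1: build a list
    squares = [x * x for x in evens]          # pass 2: build another list
    return sum(squares)                       # pass 3: reduce it

def sum_even_squares_streaming(data):
    # One pass, no intermediate lists: a generator feeds sum() directly.
    return sum(x * x for x in data if x % 2 == 0)

data = range(1_000_000)
assert sum_even_squares_naive(data) == sum_even_squares_streaming(data)
```

Data-centric hardware such as PIM applies the same instinct one level down: move the computation to where the data lives instead of moving the data to the computation.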