Computer Abstractions and Technology Lecture Notes PDF

Summary

These lecture notes cover computer abstractions and technologies related to the study of computer architecture. They discuss topics such as computer components, processors, memory, memory controllers, and interconnect. The lecture notes are presented as a PDF slide document.

Full Transcript

Computer Abstractions and Technology. 471029: Introduction to Computer Architecture, 2nd Lecture. Disclaimer: Slides are mainly based on the COD 5th edition textbook and were developed in part by Prof. Dohyung Kim @ KNU and the computer architecture courses @ KAIST and SKKU.

Opening the Box
- Display, fans, computer board (with DRAM, CPU, and GPU), batteries, I/O devices, speakers, storage

Components of a Computer
- Processor
- Memory
- Interconnects: NoC (Network-on-Chip), processor interconnect, large-scale networks
- I/O
  - User-interface devices: display, keyboard, mouse, sound, camera, ...
  - Storage devices: HDD, SSD, CD/DVD, ...
  - Network adapters: Ethernet, 3G/4G/5G, Wi-Fi, Bluetooth, NFC, ...
- The same components appear in every class of computer

Inside the Processor (CPU)
- Datapath: performs operations on data
- Control: tells the datapath, memory, and I/O devices what to do
- The old days: "ARM1 (Acorn RISC Machine 1)", 1985, with visible Control and Datapath regions
- Sources: http://www.righto.com/2016/02/reverse-engineering-arm1-processors.html and https://en.wikichip.org/wiki/acorn/microarchitectures/arm1

Inside the Processor (CPU), cont'd
- [Figure: ARM1 floorplan with labeled blocks, including Program Counter, Instruction Fetch/Pipe, Pipeline Status, Instruction Decode, ALU Decode, Register Decode, Shift Decode, and Register File]

Inside the Memory
- Off-chip memory controller (up to ~2008): PCI, DRAM (Memory Controller)

Inside the Memory, cont'd
- Off-chip memory controller (up to ~2008)
- Source: "Memory Systems: Cache, DRAM, Disk", Jacob et al., 2010

Inside the Memory, cont'd
- DRAM technology

What Happened?
- FLOPS (flop/s) = floating-point operations per second; KFLOPS = 10^3 FLOPS, MFLOPS = 10^6 FLOPS, GFLOPS = 10^9 FLOPS
- 1996: Hitachi CP-PACS/2048 supercomputer (2048 processing units, 3D hyper-crossbar network), performance ~368.2 GFLOPS at 257 kW
- 2019: AMD EPYC 7702P 64-core processor, performance ~388 GFLOPS at ~0.2 kW
- Roughly the same performance for three orders of magnitude less power (Source: top500.org)
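The performance-per-watt gap in the 1996-vs-2019 comparison can be checked with a quick calculation. This is only a sketch using the figures exactly as quoted on the slide, not independent measurements:

```python
# Figures as quoted on the slide: (performance in GFLOPS, power in watts).
systems = {
    "Hitachi CP-PACS/2048 (1996)": (368.2, 257_000),
    "AMD EPYC 7702P (2019)": (388.0, 200),
}

# Energy efficiency of each system.
for name, (gflops, watts) in systems.items():
    print(f"{name}: {gflops / watts:.5f} GFLOPS/W")

# Ratio of the two efficiencies: roughly three orders of magnitude.
ratio = (388.0 / 200) / (368.2 / 257_000)
print(f"Efficiency improvement: ~{ratio:.0f}x")
```

At nearly identical GFLOPS, the entire gap comes from power, which is why the slide frames the improvement in orders of magnitude rather than raw performance.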
Reviewing 40 Years of Moore's Law
- 40 years of stunning progress in microprocessor design: ~1.4x annual performance improvement for 40+ years
- Word width: 8 → 16 → 32 → 64 bits (~4x)
- Instruction-level parallelism: from 4-10 cycles per instruction to 4+ instructions per cycle (~10-20x)
- Multicore: from one core to 128+ cores (~128x+)
- Clock rate: 3 MHz to 4 GHz (through technology and architecture)

Now: Inside the Processor (CPU)
- Status quo (Intel Core i7-3960X): further integrated, with more functionality on the die

Now: Inside the Processor (CPU), cont'd
- Status quo (AMD Ryzen 5000, Zen 3 architecture)

Now: Inside the Memory
- High-Bandwidth Memory (HBM): 3D-stacked memory
- Source: https://community.cadence.com/cadence_blogs_8/b/fv/posts/what-s-new-with-hybrid-memory-cube-hmc
- Examples: NVIDIA Volta V100 GPU + HBM2 (2017); AMD Radeon Pro 5600 GPU + HBM2 (2020)

Interconnect in CPU
- Interconnect matters as the number of compute units increases (e.g., Intel Ice Lake)
- Network-on-Chip (on-chip network); source: "Accelerating Fibre Orientation Estimation from Diffusion Weighted Magnetic Resonance Imaging Using GPUs", Hernández et al., 2013
- Example of a mesh topology (Sources: Intel; "On-Chip Networks", Mark Hill, 2009)

Eight Great Ideas
- Design for Moore's Law
- Use abstraction to simplify design
- Make the common case fast
- Performance via parallelism
- Performance via pipelining
- Performance via prediction
- Hierarchy of memories
- Dependability via redundancy

Abstractions
- Abstraction helps us deal with complexity by hiding lower-level details
- Application programming interface (API): the interface between applications and libraries (e.g., those of a programming language)
- Application binary interface (ABI): the system-software interface, or the interface between two binary programs (e.g., a calling convention)
- Instruction set architecture (ISA): the hardware/software interface
- Implementation: the details underlying the interface
- Source: https://www.computer.org/csdl/mags/co/2005/05/r5032-abs.html

Parallelism
- Implicit parallelism, i.e., instruction-level parallelism (ILP): from the programmer's perspective, a sequence of instructions executes sequentially, but the hardware executes it in parallel via pipelining, speculation (prediction), caching, superscalar issue (multiple instructions per cycle), dynamic scheduling (out-of-order execution), and more
- Explicit parallelism, i.e., data- and thread-level parallelism: hardware provides parallel resources to execute instructions simultaneously. Why? Diminishing returns on instruction-level parallelism.

Everything goes well and looks fine... but we now have new challenges.

[Note] Uniprocessor performance (single core)

The End of Moore's Law

The End of Dennard Scaling
- Dennard scaling: as transistors get smaller, power density stays constant
- Power = α × C × F × V², where α is the fraction of time the circuit is switching, C is capacitance, F is frequency, and V is voltage
- Capacitance is related to area, so as transistor sizes shrank and voltage was reduced, circuits could operate at higher frequencies at the same power
- Dennard scaling ignored leakage current and threshold voltage, which establish a baseline of power per transistor; these created a "power wall" that has limited practical processor frequency

Running into the Power Wall

The End of Dennard Scaling Is a Crisis
- Processors have reached their power limit: thermal dissipation is maxed out
- Energy consumption has become more important to users, e.g., in mobile, IoT, and datacenter settings
- E.g., the global IT industry's 2012 electricity consumption, in billions of kilowatt-hours, was comparable to that of the world's largest energy-consuming countries (Source: Greenpeace)

"New Golden Age of Computer Architecture"
- Hennessy & Patterson, 2018 Turing Lecture
- The end of Dennard scaling and Moore's Law means no more free performance
- "The next decade will see a Cambrian explosion of novel computer architectures"

Future Opportunities
- Domain-specific architecture (DSA): an architecture tailored to a specific problem domain, also called an 'accelerator'
  - GPUs for graphics processing
  - Neural-network processors for deep learning
  - Processors for software-defined networking
  - [Note] These share similar architectural techniques with general-purpose processors
- Domain-specific language (DSL): a DSA requires targeting higher-level operations to the architecture, but extracting such structure and information from languages like Python, Java, and C is simply too difficult; DSLs make it possible to program DSAs efficiently
  - MATLAB, a language for operating on matrices
  - TensorFlow, a dataflow language used for programming DNNs
  - P4, a language for programming SDNs
  - Halide, a language for image processing

Future Opportunities
- Secure architecture and S/W: control isolation, data isolation, constant-time programming, avoiding speculative execution, avoiding shared resources
- https://www.neowin.net/news/microsoft-no-longer-suggests-overlooking-downfall-of-intel-7th-8th-9th-10th-11th-gen-cpus/

Future Opportunities
- Energy-efficient architecture and S/W: minimizing instruction count (the same function in less code), less data movement, fewer communications
- Data-centric architecture, e.g., PIM (processor-in-memory)
- "Energy Efficiency across Programming Languages", SLE 2017
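The "less data movement" point above can be made concrete with a back-of-the-envelope energy model. The per-operation energies below are invented placeholder values, not measurements for any real chip; they only encode the widely cited ordering that an off-chip DRAM access costs orders of magnitude more energy than an on-chip ALU operation or SRAM access:

```python
# Illustrative (assumed) energy costs per operation, in picojoules.
# Placeholders only: they reflect the typical ordering
# ALU op << on-chip SRAM access << off-chip DRAM access.
ENERGY_PJ = {"alu_op": 1.0, "sram_access": 10.0, "dram_access": 1000.0}

def kernel_energy_pj(alu_ops, sram_accesses, dram_accesses):
    """Total kernel energy under a simple additive cost model."""
    return (alu_ops * ENERGY_PJ["alu_op"]
            + sram_accesses * ENERGY_PJ["sram_access"]
            + dram_accesses * ENERGY_PJ["dram_access"])

# Same arithmetic work, different data movement:
streaming = kernel_energy_pj(alu_ops=1_000_000, sram_accesses=0,
                             dram_accesses=1_000_000)
blocked = kernel_energy_pj(alu_ops=1_000_000, sram_accesses=900_000,
                           dram_accesses=100_000)
print(f"streaming from DRAM: {streaming / 1e6:.1f} uJ")
print(f"cache-blocked:       {blocked / 1e6:.1f} uJ")
```

Under any constants with this ordering, restructuring a kernel so that most accesses hit on-chip storage (e.g., by cache blocking) reduces total energy far more than shaving ALU operations does, which is the data-centric argument behind approaches like PIM.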
