Computer Organization and Design RISC-V Edition PDF
Document Details
2021
David A. Patterson, John L. Hennessy
Summary
This textbook, Computer Organization and Design: The Hardware/Software Interface, Sixth Edition, by Patterson and Hennessy, provides a comprehensive overview of computer architecture and design, with a focus on the RISC-V architecture. It covers the core concepts and principles of computer organization, including the interactions between hardware and software components.
Full Transcript
In Praise of Computer Organization and Design: The Hardware/ Software Interface, Sixth Edition “Textbook selection is often a frustrating act of compromise—pedagogy, content coverage, quality of exposition, level of rigor, cost. Computer Organization and Design is the rare book that hits all the right notes across the board, without compromise. It is not only the premier computer organization textbook, it is a shining example of what all computer science textbooks could and should be.” —Michael Goldweber, Xavier University “I have been using Computer Organization and Design for years, from the very first edition. This new edition is yet another outstanding improvement on an already classic text. The evolution from desktop computing to mobile computing to Big Data brings new coverage of embedded processors such as the ARM, new material on how software and hardware interact to increase performance, and cloud computing. All this without sacrificing the fundamentals.” —Ed Harcourt, St. Lawrence University “To Millennials: Computer Organization and Design is the computer architecture book you should keep on your (virtual) bookshelf. The book is both old and new, because it develops venerable principles—Moore’s Law, abstraction, common case fast, redundancy, memory hierarchies, parallelism, and pipelining—but illustrates them with contemporary designs.” —Mark D. Hill, University of Wisconsin-Madison “The new edition of Computer Organization and Design keeps pace with advances in emerging embedded and many-core (GPU) systems, where tablets and smartphones will/are quickly becoming our new desktops. This text acknowledges these changes, but continues to provide a rich foundation of the fundamentals in computer organization and design which will be needed for the designers of hardware and software that power this new class of devices and systems.” —Dave Kaeli, Northeastern University “Computer Organization and Design provides more than an introduction to computer architecture. It prepares the reader for the changes necessary to meet the ever- increasing performance needs of mobile systems and big data processing at a time that difficulties in semiconductor scaling are making all systems power constrained. In this new era for computing, hardware and software must be co-designed and system-level architecture is as critical as component-level optimizations.” —Christos Kozyrakis, Stanford University “Patterson and Hennessy brilliantly address the issues in ever-changing computer hardware architectures, emphasizing on interactions among hardware and software components at various abstraction levels. By interspersing I/O and parallelism concepts with a variety of mechanisms in hardware and software throughout the book, the new edition achieves an excellent holistic presentation of computer architecture for the post- PC era. This book is an essential guide to hardware and software professionals facing energy efficiency and parallelization challenges in Tablet PC to Cloud computing.” —Jae C. Oh, Syracuse University R I S C - V E D I T I O N Computer Organization and Design T H E H A R D W A R E S O F T W A R E I N T E R FA C E SECOND EDITION David A. Patterson has been teaching computer architecture at the University of California, Berkeley, since joining the faculty in 1977, where he held the Pardee Chair of Computer Science. 
His teaching has been honored by the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von Neumann Medal and the C & C Prize with John Hennessy. Like his coauthor, Patterson is a Fellow of both AAAS organizations, the Computer History Museum, ACM, and IEEE, and he was elected to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served as chair of the CS division in the Berkeley EECS department, as chair of the Computing Research Association, and as President of ACM. This record led to Distinguished Service Awards from ACM, CRA, and SIGARCH. He received the Tapia Achievement Award for Civic Science and Diversifying Computing and shared the 2017 ACM A. M. Turing Award with Hennessy. At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer, and the foundation of the commercial SPARC architecture. He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led to dependable storage systems from many companies. He was also involved in the Network of Workstations (NOW) project, which led to cluster technology used by Internet companies and later to cloud computing. These projects earned four dissertation awards from ACM. In 2016, he became Professor Emeritus at Berkeley and a Distinguished Engineer at Google, where he works on domain specific architecture for machine learning. He is also the Vice Chair of RISC-V International and the Director of the RISC-V International Open Source Laboratory. John L. Hennessy was a Professor of Electrical Engineering and Computer Science at Stanford University, where he has been a member of the faculty since 1977 and was, from 2000 to 2016, its tenth President. Hennessy is a Fellow of the IEEE and ACM; a member of the National Academy of Engineering, the National Academy of Science, and the American Philosophical Society; and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000 John von Neumann Award, which he shared with David Patterson. In 2017, they shared the ACM A. M. Turing Award. He has also received seven honorary doctorates. In 1981, he started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a leave from the university to cofound MIPS Computer Systems (now MIPS Technologies), which developed one of the first commercial RISC microprocessors. As of 2006, over 2 billion MIPS microprocessors have been shipped in devices ranging from video games and palmtop computers to laser printers and network switches. Hennessy subsequently led the DASH (Director Architecture for Shared Memory) project, which prototyped the first scalable cache coherent multiprocessor; many of the key ideas have been adopted in modern multiprocessors. In addition to his technical activities and university responsibilities, he has continued to work with numerous start-ups, both as an early-stage advisor and an investor. 
He is currently Director of Knight-Hennessy Scholars and serves as non-executive chairman of Alphabet. R I S C - V E D I T I O N Computer Organization and Design T H E H A R D W A R E S O F T W A R E I N T E R FA C E SECOND EDITION David A. Patterson University of California, Berkeley Google, Inc John L. Hennessy Stanford University Morgan Kaufmann is an imprint of Elsevier 50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States Copyright © 2021 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein). Notices Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/ or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein. RISC-V and the RISC-V logo are registered trademarks managed by the RISC-V Foundation, used under permission of the RISC-V Foundation. All rights reserved. This publication is independent of the RISC-V Foundation, which is not affiliated with the publisher and the RISC-V Foundation does not authorize, sponsor, endorse or otherwise approve this publication. All material relating to ARM® technology has been reproduced with permission from ARM Limited, and should only be used for education purposes. All ARM-based models shown or referred to in the text must not be used, reproduced or distributed for commercial purposes, and in no event shall purchasing this textbook be construed as granting you or any third party, expressly or by implication, estoppel or otherwise, a license to use any other ARM technology or know how. Materials provided by ARM are copyright © ARM Limited (or its affi liates). British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress ISBN: 978-0-12-820331-6 For Information on all Morgan Kaufmann publications visit our website at https://www.elsevier.com/books-and-journals Publisher: Katey Birtcher Senior Acquisitions Editor: Stephen R. 
Merken Content Development Specialist: Beth LoGiudice Project Manager: Janish Paul Designer: Patrick Ferguson Typeset by TNQ Technologies To Linda, who has been, is, and always will be the love of my life Contents Preface xi C H A P T E R S 1 Computer Abstractions and Technology 2 1.1 Introduction 3 1.2 Seven Great Ideas in Computer Architecture 10 1.3 Below Your Program 13 1.4 Under the Covers 16 1.5 Technologies for Building Processors and Memory 25 1.6 Performance 29 1.7 The Power Wall 40 1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 43 1.9 Real Stuff: Benchmarking the Intel Core i7 46 1.10 Going Faster: Matrix Multiply in Python 49 1.11 Fallacies and Pitfalls 50 1.12 Concluding Remarks 53 1.13 Historical Perspective and Further Reading 55 1.14 Self-Study 55 1.15 Exercises 59 2 Instructions: Language of the Computer 66 2.1 Introduction 68 2.2 Operations of the Computer Hardware 69 2.3 Operands of the Computer Hardware 73 2.4 Signed and Unsigned Numbers 80 2.5 Representing Instructions in the Computer 87 2.6 Logical Operations 95 2.7 Instructions for Making Decisions 98 2.8 Supporting Procedures in Computer Hardware 104 2.9 Communicating with People 114 2.10 RISC-V Addressing for Wide Immediates and Addresses 120 2.11 Parallelism and Instructions: Synchronization 128 2.12 Translating and Starting a Program 131 2.13 A C Sort Example to Put it All Together 140 Contents vii 2.14 Arrays versus Pointers 148 2.15 Advanced Material: Compiling C and Interpreting Java 151 2.16 Real Stuff: MIPS Instructions 152 2.17 Real Stuff: ARMv7 (32-bit) Instructions 153 2.18 Real Stuff: ARMv8 (64-bit) Instructions 157 2.19 Real Stuff: x86 Instructions 158 2.20 Real Stuff: The Rest of the RISC-V Instruction Set 167 2.21 Going Faster: Matrix Multiply in C 168 2.22 Fallacies and Pitfalls 170 2.23 Concluding Remarks 172 2.24 Historical Perspective and Further Reading 174 2.25 Self-Study 175 2.26 Exercises 178 3 Arithmetic for Computers 188 3.1 Introduction 190 3.2 Addition and Subtraction 190 3.3 Multiplication 193 3.4 Division 199 3.5 Floating Point 208 3.6 Parallelism and Computer Arithmetic: Subword Parallelism 233 3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 234 3.8 Going Faster: Subword Parallelism and Matrix Multiply 236 3.9 Fallacies and Pitfalls 238 3.10 Concluding Remarks 241 3.11 Historical Perspective and Further Reading 242 3.12 Self-Study 242 3.13 Exercises 246 4 The Processor 252 4.1 Introduction 254 4.2 Logic Design Conventions 258 4.3 Building a Datapath 261 4.4 A Simple Implementation Scheme 269 4.5 Multicyle Implementation 282 4.6 An Overview of Pipelining 283 4.7 Pipelined Datapath and Control 296 4.8 Data Hazards: Forwarding versus Stalling 313 4.9 Control Hazards 325 4.10 Exceptions 333 4.11 Parallelism via Instructions 340 4.12 Putting it All Together: The Intel Core i7 6700 and ARM Cortex-A53 354 viii Contents 4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply 363 4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 365 4.15 Fallacies and Pitfalls 365 4.16 Concluding Remarks 367 4.17 Historical Perspective and Further Reading 368 4.18 Self-Study 368 4.19 Exercises 369 5 Large and Fast: Exploiting Memory Hierarchy 386 5.1 Introduction 388 5.2 Memory Technologies 392 5.3 The Basics of Caches 398 5.4 Measuring and Improving Cache Performance 412 5.5 Dependable Memory Hierarchy 431 5.6 Virtual Machines 436 5.7 Virtual 
Memory 440 5.8 A Common Framework for Memory Hierarchy 464 5.9 Using a Finite-State Machine to Control a Simple Cache 470 5.10 Parallelism and Memory Hierarchy: Cache Coherence 475 5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 479 5.12 Advanced Material: Implementing Cache Controllers 480 5.13 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies 480 5.14 Real Stuff: The Rest of the RISC-V System and Special Instructions 486 5.15 Going Faster: Cache Blocking and Matrix Multiply 488 5.16 Fallacies and Pitfalls 489 5.17 Concluding Remarks 494 5.18 Historical Perspective and Further Reading 495 5.19 Self-Study 495 5.20 Exercises 499 6 Parallel Processors from Client to Cloud 518 6.1 Introduction 520 6.2 The Difficulty of Creating Parallel Processing Programs 522 6.3 SISD, MIMD, SIMD, SPMD, and Vector 527 6.4 Hardware Multithreading 534 6.5 Multicore and Other Shared Memory Multiprocessors 537 6.6 Introduction to Graphics Processing Units 542 6.7 Domain-Specific Architectures 549 6.8 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors 552 Contents ix 6.9 Introduction to Multiprocessor Network Topologies 557 6.10 Communicating to the Outside World: Cluster Networking 561 6.11 Multiprocessor Benchmarks and Performance Models 561 6.12 Real Stuff: Benchmarking the Google TPUv3 Supercomputer and an NVIDIA Volta GPU Cluster 572 6.13 Going Faster: Multiple Processors and Matrix Multiply 580 6.14 Fallacies and Pitfalls 583 6.15 Concluding Remarks 585 6.16 Historical Perspective and Further Reading 587 6.17 Self-Study 588 6.18 Exercises 590 A P P E N D I X A The Basics of Logic Design A-2 A.1 Introduction A-3 A.2 Gates, Truth Tables, and Logic Equations A-4 A.3 Combinational Logic A-9 A.4 Using a Hardware Description Language A-20 A.5 Constructing a Basic Arithmetic Logic Unit A-26 A.6 Faster Addition: Carry Lookahead A-37 A.7 Clocks A-47 A.8 Memory Elements: Flip-Flops, Latches, and Registers A-49 A.9 Memory Elements: SRAMs and DRAMs A-57 A.10 Finite-State Machines A-66 A.11 Timing Methodologies A-71 A.12 Field Programmable Devices A-77 A.13 Concluding Remarks A-78 A.14 Exercises A-79 Index I-1 O N L I N E C O N T E N T B Graphics and Computing GPUs B-2 B.1 Introduction B-3 B.2 GPU System Architectures B-7 B.3 Programming GPUs B-12 B.4 Multithreaded Multiprocessor Architecture B-25 B.5 Parallel Memory System B-36 B.6 Floating-point Arithmetic B-41 x Contents B.7 Real Stuff: The NVIDIA GeForce 8800 B-46 B.8 Real Stuff: Mapping Applications to GPUs B-55 B.9 Fallacies and Pitfalls B-72 B.10 Concluding Remarks B-76 B.11 Historical Perspective and Further Reading B-77 C Mapping Control to Hardware C-2 C.1 Introduction C-3 C.2 Implementing Combinational Control Units C-4 C.3 Implementing Finite-State Machine Control C-8 C.4 Implementing the Next-State Function with a Sequencer C-22 C.5 Translating a Microprogram to Hardware C-28 C.6 Concluding Remarks C-32 C.7 Exercises C-33 D Survey of Instruction Set Architectures D-2 D.1 Introduction D-3 D.2 A Survey of RISC Architectures for Desktop, Server, and Embedded Computers D-4 D.3 The Intel 80×86 D-30 D.4 The VAX Architecture D-50 D.5 The IBM 360/370 Architecture for Mainframe Computers D-68 D.6 Historical Perspective and References D-74 Glossary G-1 Further Reading FR-1 Preface The most beautiful thing we can experience is the mysterious. It is the source of all true art and science. 
Albert Einstein, What I Believe, 1930 About This Book We believe that learning in computer science and engineering should reflect the current state of the field, as well as introduce the principles that are shaping computing. We also feel that readers in every specialty of computing need to appreciate the organizational paradigms that determine the capabilities, performance, energy, and, ultimately, the success of computer systems. Modern computer technology requires professionals of every computing specialty to understand both hardware and software. The interaction between hardware and software at a variety of levels also offers a framework for understanding the fundamentals of computing. Whether your primary interest is hardware or software, computer science or electrical engineering, the central ideas in computer organization and design are the same. Thus, our emphasis in this book is to show the relationship between hardware and software and to focus on the concepts that are the basis for current computers. The recent switch from uniprocessor to multicore microprocessors confirmed the soundness of this perspective, given since the first edition. While programmers could ignore the advice and rely on computer architects, compiler writers, and silicon engineers to make their programs run faster or be more energy-efficient without change, that era is over. For programs to run faster, they must become parallel. While the goal of many researchers is to make it possible for programmers to be unaware of the underlying parallel nature of the hardware they are programming, it will take many years to realize this vision. Our view is that for at least the next decade, most programmers are going to have to understand the hardware/software interface if they want programs to run efficiently on parallel computers. The audience for this book includes those with little experience in assembly language or logic design who need to understand basic computer organization as well as readers with backgrounds in assembly language and/or logic design who want to learn how to design a computer or understand how a system works and why it performs as it does. xii Preface About the Other Book Some readers may be familiar with Computer Architecture: A Quantitative Approach, popularly known as Hennessy and Patterson. (This book in turn is often called Patterson and Hennessy.) Our motivation in writing the earlier book was to describe the principles of computer architecture using solid engineering fundamentals and quantitative cost/performance tradeoffs. We used an approach that combined examples and measurements, based on commercial systems, to create realistic design experiences. Our goal was to demonstrate that computer architecture could be learned using quantitative methodologies instead of a descriptive approach. It was intended for the serious computing professional who wanted a detailed understanding of computers. A majority of the readers for this book do not plan to become computer architects. The performance and energy efficiency of future software systems will be dramatically affected, however, by how well software designers understand the basic hardware techniques at work in a system. Thus, compiler writers, operating system designers, database programmers, and most other software engineers need a firm grounding in the principles presented in this book. Similarly, hardware designers must understand clearly the effects of their work on software applications. 
Thus, we knew that this book had to be much more than a subset of the material in Computer Architecture, and the material was extensively revised to match the different audience. We were so happy with the result that the subsequent editions of Computer Architecture were revised to remove most of the introductory material; hence, there is much less overlap today than with the first editions of both books.

Why a RISC-V Edition?
The choice of instruction set architecture is clearly critical to the pedagogy of a computer architecture textbook. We didn't want an instruction set that required describing unnecessary baroque features for someone's first instruction set, no matter how popular it is. Ideally, your initial instruction set should be an exemplar, just like your first love. Surprisingly, you remember both fondly.
Since there were so many choices at the time, for the first edition of Computer Architecture: A Quantitative Approach we invented our own RISC-style instruction set. Given the growing popularity and the simple elegance of the MIPS instruction set, we switched to it for the first edition of this book and to later editions of the other book. MIPS has served us and our readers well. It's been many years since we made that switch, and while billions of chips that use MIPS continue to be shipped, they are typically found in embedded devices where the instruction set is nearly invisible. Thus, for a while now it's been hard to find a real computer on which readers can download and run MIPS programs.
The good news is that an open instruction set that adheres closely to the RISC principles has recently debuted, and it is rapidly gaining a following. RISC-V, which was developed originally at UC Berkeley, not only cleans up the quirks of the MIPS instruction set, but it offers a simple, elegant, modern take on what instruction sets should look like in 2020. Moreover, because it is not proprietary, open-source RISC-V simulators, compilers, debuggers, and so on are easily available, and there are even open-source RISC-V implementations written in hardware description languages. Moreover, 2020 saw the introduction of low-cost boards based on RISC-V that are the equivalent of the Raspberry Pi, which is not the case for MIPS. Readers will not only benefit from studying these RISC-V designs, they will be able to modify them and go through the implementation process in order to understand the impact of their hypothetical changes on performance, die size, and energy.
This is an exciting opportunity for the computing industry as well as for education, and thus at the time of this writing more than 300 companies have joined the RISC-V foundation. This sponsor list includes virtually all the major players except for ARM and Intel, including Alibaba, Amazon, AMD, Google, Hewlett Packard Enterprise, IBM, Microsoft, NVIDIA, Qualcomm, Samsung, and Western Digital. It is for these reasons that we wrote a RISC-V edition of this book, and we switched Computer Architecture: A Quantitative Approach to RISC-V as well.
With this edition of the RISC-V version, we switched from 64-bit RV64 to 32-bit RV32. Instructors found that the extra complexity of a 64-bit instruction set made it harder on the students. RV32 reduces the core architecture by 10 instructions—dropping ld, sd, lwu, addw, subw, addiw, sllw, srlw, slliw, srliw—and students don't have to understand operations on the lower 32 bits of a 64-bit register. We also can largely ignore doublewords and just use words in the text.
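To make the word-versus-doubleword distinction concrete, here is a hedged C sketch (not from the book; the function names are illustrative). The comments describe typical code generation only; the exact instructions depend on the compiler and its options.

```c
#include <stdint.h>

/* Word (32-bit) versus doubleword (64-bit) arithmetic in C.
   On RV64 both functions typically compile to a single add instruction;
   on RV32 the doubleword version typically becomes a short sequence that
   adds the low and high halves and propagates the carry between them.
   (Illustrative only; actual code generation depends on the compiler.) */

uint32_t add_words(uint32_t a, uint32_t b)        /* a "word" in the book's terms */
{
    return a + b;
}

uint64_t add_doublewords(uint64_t a, uint64_t b)  /* a "doubleword" */
{
    return a + b;
}
```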
In this edition we also hid the odd-looking SB and UJ formats until Chapter 4. We explain the hardware savings of the swirled bit orderings in the immediate fields of SB and UJ later, since that chapter is where we show the datapath hardware (the sketch below gives a taste of the bit shuffling involved). Just as we did for the sixth MIPS edition, we added an online section showing a multiple-clock-cycle implementation for this edition, but we modified it to match RISC-V. Some faculty prefer to go through the multicycle implementation after the single-cycle implementation before introducing pipelining.
The only changes for the RISC-V edition from the MIPS edition are those associated with the change in instruction sets, which primarily affects Chapter 2, Chapter 3, the virtual memory section in Chapter 5, and the short VMIPS example in Chapter 6. In Chapter 4, we switched to RISC-V instructions, changed several figures, and added a few "Elaboration" sections, but the changes were simpler than we had feared. Chapter 1 and most of the appendices are virtually unchanged. The extensive online documentation of RISC-V, combined with its magnitude, makes it difficult to come up with a replacement for the MIPS version of Appendix A ("Assemblers, Linkers, and the SPIM Simulator" in the MIPS Sixth Edition). Instead, Chapters 2, 3, and 5 include quick overviews of the hundreds of RISC-V instructions outside of the core RISC-V instructions that we cover in detail in the rest of the book. The current plan is to continue revising both the MIPS and RISC-V versions of this book, as we did in 2020.
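As a taste of the "swirled bit orderings" mentioned above, the following hedged C sketch (not from the book) reassembles the branch offset scattered across a B-type (SB) instruction word, using the bit positions published in the RV32I base specification. The scattering looks odd in software, but it lets the hardware keep the sign bit and most immediate bits in the same positions across formats, which is the hardware saving Chapter 4 explains.

```c
#include <stdint.h>

/* Reassemble the branch offset encoded in a B-type (SB) RISC-V instruction.
   Bit positions follow the published RV32I base encoding:
   imm[12] = inst[31], imm[11] = inst[7], imm[10:5] = inst[30:25],
   imm[4:1] = inst[11:8], and imm[0] is always zero. */
static int32_t sb_branch_offset(uint32_t inst)
{
    uint32_t imm = (((inst >> 31) & 0x1u)  << 12) |
                   (((inst >> 7)  & 0x1u)  << 11) |
                   (((inst >> 25) & 0x3Fu) << 5)  |
                   (((inst >> 8)  & 0xFu)  << 1);

    /* Sign-extend the 13-bit immediate to 32 bits without relying on
       implementation-defined right shifts of negative values. */
    if (imm & 0x1000u)
        imm |= 0xFFFFE000u;
    return (int32_t)imm;
}
```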
Changes for the Second Edition
There has arguably been more change in the technology and business of computer architecture since the fifth edition than there was across the first five:
- The slowing of Moore's Law. After 50 years of biennial doubling of the number of transistors per chip, Gordon Moore's prediction no longer holds. Semiconductor technology will still improve, but more slowly and less predictably than in the past.
- The rise of Domain Specific Architectures (DSA). In part due to the slowing of Moore's Law and in part due to the end of Dennard Scaling, general-purpose processors are only improving a few percent per year. Moreover, Amdahl's Law limits the practical benefit of increasing the number of processors per chip. In 2020, it is widely believed that the most promising path forward is DSA. A DSA doesn't try to run everything well like a general-purpose processor, but focuses on running programs of one domain much better than conventional CPUs.
- Microarchitecture as a security attack surface. Spectre demonstrated that speculative out-of-order execution and hardware multithreading make timing-based side-channel attacks practical. Moreover, these are not due to bugs that can be fixed, but a fundamental challenge to this style of processor design.
- Open instruction sets and open-source implementations. The opportunities and impact of open-source software have come to computer architecture. Open instruction sets like RISC-V enable organizations to build their own processors without first negotiating a license, which has enabled shared open-source implementations that anyone can freely download and use, as well as proprietary implementations of RISC-V. Open-source software and hardware are a boon to academic research and instruction, allowing students to see and enhance industrial-strength technology.
- The re-verticalization of the information technology industry. Cloud computing has led to no more than a half-dozen companies that provide computing infrastructure for everyone to use. Much like IBM in the 1960s and 1970s, these companies determine both the software stack and the hardware that they deploy. The changes above have led to some of these "hyperscalers" developing their own DSA and RISC-V chips for deployment in their clouds.

[A roadmap chart in the printed preface lists, for Chapters 1-6 and Appendices A-D, which sections have a software focus and which have a hardware focus, and tags each group of sections with a suggested reading depth: read carefully, read if you have time, review or read, read for culture, or reference.]

The second edition of COD (RISC-V edition) reflects these recent changes, updates all the examples and figures, responds to requests of instructors, plus adds a pedagogic improvement inspired by textbooks I used to help my grandchildren with their math classes. The Going Faster section is now in every chapter. It starts with a Python version in Chapter 1, whose poor performance inspires learning C and then rewriting matrix multiply in C in Chapter 2 (a plain, unoptimized C version of matrix multiply appears in a sketch a few paragraphs below). The remaining chapters accelerate matrix multiply by leveraging data-level parallelism, instruction-level parallelism, thread-level parallelism, and by adjusting memory accesses to match the memory hierarchy of a modern server. This computer has 512-bit SIMD operations, speculative out-of-order execution, three levels of caches, and 48 cores. All four optimizations add only 21 lines of C code yet speed up matrix multiply by almost 50,000, cutting it from nearly 6 hours in Python to less than 1 second in optimized C. If I were a student again, this running example would inspire me to use C and learn the underlying hardware concepts of this book.
With this edition, every chapter has a Self-Study section that asks thought-provoking questions and supplies the answers afterwards to help you evaluate whether you follow the material on your own. Besides explaining that Moore's Law and Dennard Scaling no longer hold, we've de-emphasized Moore's Law as a change agent, which was prominent in the fifth edition. Chapter 2 has more material to emphasize that binary data has no inherent meaning—the program determines the data type—not an easy concept for beginners to grasp. Chapter 2 also includes a short description of MIPS as a contrasting instruction set to RISC-V, alongside ARMv7, ARMv8, and x86. (There is also a companion version of this book based on MIPS instead of RISC-V, and we're updating that with the other changes as well.) The benchmark example of Chapter 2 is upgraded to SPEC2017 from SPEC2006.
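For readers who want to see the starting point of that running example now, here is a minimal, unoptimized C version of matrix multiply. It is a hedged sketch in the spirit of the Going Faster thread, not the book's own listing; the row-major layout and the function name are assumptions.

```c
#include <stddef.h>

/* Unoptimized double-precision matrix multiply, C = C + A * B, for square
   n x n matrices stored here in row-major order. A minimal sketch in the
   spirit of the "Going Faster" running example, not the book's 21-line
   optimized version. */
void dgemm_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double cij = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                cij += A[i * n + k] * B[k * n + j];
            C[i * n + j] = cij;
        }
}
```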
At instructors' request, we've restored the multicycle implementation of RISC-V as an online section in Chapter 4 between the single-cycle implementation and the pipelined implementation. Some instructors find these three steps an easier path to teach pipelining. The Putting It All Together examples of Chapters 4 and 5 were updated to the recent ARM Cortex-A53 microarchitecture and the Intel Core i7 6700 Skylake microarchitecture. The Fallacies and Pitfalls sections of Chapters 5 and 6 added pitfalls covering the Row Hammer and Spectre hardware security attacks. Chapter 6 has a new section introducing DSAs using Google's Tensor Processing Unit (TPU) version 1. Chapter 6's Putting It All Together section is updated to compare Google's TPUv3 DSA supercomputer to a cluster of NVIDIA Volta GPUs. Finally, we updated all the exercises in the book.
While some elements changed, we have preserved useful book elements from prior editions. To make the book work better as a reference, we still place definitions of new terms in the margins at their first occurrence. The "Understanding Program Performance" book element helps readers understand the performance of their programs and how to improve it, just as the "Hardware/Software Interface" book element helps readers understand the tradeoffs at this interface. "The Big Picture" section remains so that the reader sees the forest despite all the trees. "Check Yourself" sections help readers confirm their comprehension of the material on the first time through, with answers provided at the end of each chapter. This edition still includes the green RISC-V reference card, which was inspired by the "Green Card" of the IBM System/360. This card has been updated and should be a handy reference when writing RISC-V assembly language programs.

Instructor Support
We have collected a great deal of material to help instructors teach courses using this book. Solutions to exercises, figures from the book, lecture slides, and other materials are available to instructors who register with the publisher. In addition, the companion Web site provides links to free RISC-V software. Check the publisher's website for more information: https://textbooks.elsevier.com/web/manuals.aspx?isbn=9780128203316

Concluding Remarks
If you read the following acknowledgments section, you will see that we went to great lengths to correct mistakes. Since a book goes through many printings, we have the opportunity to make even more corrections. If you uncover any remaining, resilient bugs, please contact the publisher. This edition is the fourth break in the long-standing collaboration between Hennessy and Patterson, which started in 1989. The demands of running one of the world's great universities meant that President Hennessy could no longer make the time for the substantial commitment needed to create a new edition. The remaining author felt once again like a tightrope walker without a safety net. Hence, the people in the acknowledgments and Berkeley colleagues played an even larger role in shaping the contents of this book. Nevertheless, this time around there is only one author to blame for the new material in what you are about to read.

Acknowledgments for the Second Edition
With every edition of this book, we are very fortunate to receive help from many readers, reviewers, and contributors. Each of these people has helped to make this book better.
xviii Preface We are grateful for the assistance of Khaled Benkrid and his colleagues at ARM Ltd., who carefully reviewed the ARM-related material and provided helpful feedback. Special thanks goes to Dr. Rimas Avizenis, who developed the various versions of matrix multiply and supplied the performance numbers as well. I deeply appreciate his continued help after he has graduated from UC Berkeley. As I worked with his father while I was a graduate student at UCLA, it was a nice symmetry to work with Rimas when he was a graduate student at UC Berkeley. I also wish to thank my longtime collaborator Randy Katz of UC Berkeley, who helped develop the concept of great ideas in computer architecture as part of the extensive revision of an undergraduate class that we did together. I’d like to thank David Kirk, John Nickolls, and their colleagues at NVIDIA (Michael Garland, John Montrym, Doug Voorhies, Lars Nyland, Erik Lindholm, Paulius Micikevicius, Massimiliano Fatica, Stuart Oberman, and Vasily Volkov) for writing the first in-depth appendix on GPUs. I’d like to express again my appreciation to Jim Larus, recently named Dean of the School of Computer and Communications Science at EPFL, for his willingness in contributing his expertise on assembly language programming, as well as for welcoming readers of this book with regard to using the simulator he developed and maintains. I am also very grateful to Jason Bakos of the University of South Carolina, who updated and created new exercises, based on the exercises created by Perry Alexander (The University of Kansas); Javier Bruguera (Universidade de Santiago de Compostela); Matthew Farrens (University of California, Davis); Zachary Kurmas (Grand Valley State University); David Kaeli (Northeastern University); Nicole Kaiyan (University of Adelaide); John Oliver (Cal Poly, San Luis Obispo); Milos Prvulovic (Georgia Tech); Jichuan Chang (Google); Jacob Leverich (Stanford); Kevin Lim (Hewlett-Packard); and Partha Ranganathan (Google). Additional thanks goes to Jason Bakos for updating the lecture slides, based on updated slides from Peter Ashenden (Ashenden Design Pty Ltd). I am grateful to the many instructors who have answered the publisher’s surveys, reviewed our proposals, and attended focus groups. They include the following individuals: Focus Groups: Bruce Barton (Suffolk County Community College), Jeff Braun (Montana Tech), Ed Gehringer (North Carolina State), Michael Goldweber (Xavier University), Ed Harcourt (St. Lawrence University), Mark Hill (University of Wisconsin, Madison), Patrick Homer (University of Arizona), Norm Jouppi (HP Labs), Dave Kaeli (Northeastern University), Christos Kozyrakis (Stanford University), Jae C. Oh (Syracuse University), Lu Peng (LSU), Milos Prvulovic (Georgia Tech), Partha Ranganathan (HP Labs), David Wood (University of Wisconsin), Craig Zilles (University of Illinois at Urbana-Champaign). 
Surveys and Reviews: Mahmoud Abou-Nasr (Wayne State University), Perry Alexander (The University of Kansas), Behnam Arad (Sacramento State University), Hakan Aydin (George Mason University), Hussein Badr (State University of New York at Stony Brook), Mac Baker (Virginia Military Institute), Ron Barnes (George Mason University), Douglas Blough (Georgia Institute of Technology), Kevin Bolding (Seattle Pacific University), Miodrag Bolic (University of Ottawa), John Bonomo Preface xix (Westminster College), Jeff Braun (Montana Tech), Tom Briggs (Shippensburg University), Mike Bright (Grove City College), Scott Burgess (Humboldt State University), Fazli Can (Bilkent University), Warren R. Carithers (Rochester Institute of Technology), Bruce Carlton (Mesa Community College), Nicholas Carter (University of Illinois at Urbana-Champaign), Anthony Cocchi (The City University of New York), Don Cooley (Utah State University), Gene Cooperman (Northeastern University), Robert D. Cupper (Allegheny College), Amy Csizmar Dalal (Carleton College), Daniel Dalle (Université de Sherbrooke), Edward W. Davis (North Carolina State University), Nathaniel J. Davis (Air Force Institute of Technology), Molisa Derk (Oklahoma City University), Andrea Di Blas (Stanford University), Nathan B. Doge (The University of Texas at Dallas), Derek Eager (University of Saskatchewan), Ata Elahi (Souther Connecticut State University), Ernest Ferguson (Northwest Missouri State University), Rhonda Kay Gaede (The University of Alabama), Etienne M. Gagnon (L’Université du Québec à Montréal), Costa Gerousis (Christopher Newport University), Paul Gillard (Memorial University of Newfoundland), Michael Goldweber (Xavier University), Georgia Grant (College of San Mateo), Paul V. Gratz (Texas A&M University), Merrill Hall (The Master’s College), Tyson Hall (Southern Adventist University), Ed Harcourt (St. Lawrence University), Justin E. Harlow (University of South Florida), Paul F. Hemler (Hampden-Sydney College), Jayantha Herath (St. Cloud State University), Martin Herbordt (Boston University), Steve J. Hodges (Cabrillo College), Kenneth Hopkinson (Cornell University), Bill Hsu (San Francisco State University), Dalton Hunkins (St. Bonaventure University), Baback Izadi (State University of New York—New Paltz), Reza Jafari, Abbas Javadtalab (Concordia University), Robert W. Johnson (Colorado Technical University), Bharat Joshi (University of North Carolina, Charlotte), Nagarajan Kandasamy (Drexel University), Rajiv Kapadia, Ryan Kastner (University of California, Santa Barbara), E.J. Kim (Texas A&M University), Jihong Kim (Seoul National University), Jim Kirk (Union University), Geoffrey S. Knauth (Lycoming College), Manish M. Kochhal (Wayne State), Suzan Koknar-Tezel (Saint Joseph’s University), Angkul Kongmunvattana (Columbus State University), April Kontostathis (Ursinus College), Christos Kozyrakis (Stanford University), Danny Krizanc (Wesleyan University), Ashok Kumar, S. Kumar (The University of Texas), Zachary Kurmas (Grand Valley State University), Adrian Lauf (University of Louisville), Robert N. Lea (University of Houston), Alvin Lebeck (Duke University), Baoxin Li (Arizona State University), Li Liao (University of Delaware), Gary Livingston (University of Massachusetts), Michael Lyle, Douglas W. 
Lynn (Oregon Institute of Technology), Yashwant K Malaiya (Colorado State University), Stephen Mann (University of Waterloo), Bill Mark (University of Texas at Austin), Ananda Mondal (Claflin University), Euripedes Montagne (University of Central Florida), Tali Moreshet (Boston University), Alvin Moser (Seattle University), Walid Najjar (University of California, Riverside), Vijaykrishnan Narayanan (Penn State University), Danial J. Neebel (Loras College), Victor Nelson (Auburn University), John Nestor (Lafayette College), Jae C. Oh (Syracuse University), Joe Oldham (Centre College), Timour Paltashev, James Parkerson (University of Arkansas), Shaunak Pawagi (SUNY at Stony Brook), Steve Pearce, xx Preface Ted Pedersen (University of Minnesota), Lu Peng (Louisiana State University), Gregory D. Peterson (The University of Tennessee), William Pierce (Hood College), Milos Prvulovic (Georgia Tech), Partha Ranganathan (HP Labs), Dejan Raskovic (University of Alaska, Fairbanks) Brad Richards (University of Puget Sound), Roman Rozanov, Louis Rubinfield (Villanova University), Md Abdus Salam (Southern University), Augustine Samba (Kent State University), Robert Schaefer (Daniel Webster College), Carolyn J. C. Schauble (Colorado State University), Keith Schubert (CSU San Bernardino), William L. Schultz, Kelly Shaw (University of Richmond), Shahram Shirani (McMaster University), Scott Sigman (Drury University), Shai Simonson (Stonehill College), Bruce Smith, David Smith, Jeff W. Smith (University of Georgia, Athens), Mark Smotherman (Clemson University), Philip Snyder (Johns Hopkins University), Alex Sprintson (Texas A&M), Timothy D. Stanley (Brigham Young University), Dean Stevens (Morningside College), Nozar Tabrizi (Kettering University), Yuval Tamir (UCLA), Alexander Taubin (Boston University), Will Thacker (Winthrop University), Mithuna Thottethodi (Purdue University), Manghui Tu (Southern Utah University), Dean Tullsen (UC San Diego), Steve VanderLeest (Calvin College), Christopher Vickery (Queens College of CUNY), Rama Viswanathan (Beloit College), Ken Vollmar (Missouri State University), Guoping Wang (Indiana-Purdue University), Patricia Wenner (Bucknell University), Kent Wilken (University of California, Davis), David Wolfe (Gustavus Adolphus College), David Wood (University of Wisconsin, Madison), Ki Hwan Yum (University of Texas, San Antonio), Mohamed Zahran (City College of New York), Amr Zaky (Santa Clara University), Gerald D. Zarnett (Ryerson University), Nian Zhang (South Dakota School of Mines & Technology), Xiaoyu Zhang (California State University San Marcos), Jiling Zhong (Troy University), Huiyang Zhou (North Carolina State University), Weiyu Zhu (Illinois Wesleyan University). A special thanks also goes to Mark Smotherman for making multiple passes to find technical and writing glitches that significantly improved the quality of this edition. We wish to thank the extended Morgan Kaufmann family for agreeing to publish this book again under the able leadership of Katey Birtcher, Steve Merken, and Beth LoGiudice: I certainly couldn’t have completed the book without them. We also want to extend thanks to Janish Paul, who managed the book production process, and Patrick Ferguson, who did the cover design. Finally, I owe a huge debt to Yunsup Lee and Andrew Waterman for taking on the first edition’s conversion to RISC-V in their spare time while founding a startup company. 
Kudos to Eric Love as well, who made the original RISC-V versions of the exercises in this book while finishing his Ph.D. We're all excited to see what will happen with RISC-V in academia and beyond. The contributions of the nearly 150 people we mentioned here have helped make this new edition what I hope will be our best book yet. Enjoy!
David A. Patterson

1 Computer Abstractions and Technology

"Civilization advances by extending the number of important operations which we can perform without thinking about them."
—Alfred North Whitehead, An Introduction to Mathematics, 1911

Chapter contents: 1.1 Introduction; 1.2 Seven Great Ideas in Computer Architecture; 1.3 Below Your Program; 1.4 Under the Covers; 1.5 Technologies for Building Processors and Memory; 1.6 Performance; 1.7 The Power Wall; 1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors; 1.9 Real Stuff: Benchmarking the Intel Core i7; 1.10 Going Faster: Matrix Multiply in Python; 1.11 Fallacies and Pitfalls; 1.12 Concluding Remarks; 1.13 Historical Perspective and Further Reading; 1.14 Self-Study; 1.15 Exercises.
Computer Organization and Design RISC-V Edition. DOI: http://dx.doi.org/10.1016/B978-0-12-820331-6.00001-6 © 2016, 2021 Elsevier Inc. All rights reserved.

1.1 Introduction
Welcome to this book! We're delighted to have this opportunity to convey the excitement of the world of computer systems. This is not a dry and dreary field, where progress is glacial and where new ideas atrophy from neglect. No! Computers are the product of the incredibly vibrant information technology industry, all aspects of which are responsible for almost 10% of the gross national product of the United States, and whose economy has become dependent in part on the rapid improvements in information technology.
This unusual industry embraces innovation at a breathtaking rate. In the last 40 years, there have been a number of new computers whose introduction appeared to revolutionize the computing industry; these revolutions were cut short only because someone else built an even better computer. This race to innovate has led to unprecedented progress since the inception of electronic computing in the late 1940s. Had the transportation industry kept pace with the computer industry, for example, today we could travel from New York to London in a second for a penny. Take just a moment to contemplate how such an improvement would change society—living in Tahiti while working in San Francisco, going to Moscow for an evening at the Bolshoi Ballet—and you can appreciate the implications of such a change.
Computers have led to a third revolution for civilization, with the information revolution taking its place alongside the agricultural and industrial revolutions. The resulting multiplication of humankind's intellectual strength and reach naturally has affected our everyday lives profoundly and changed the ways in which the search for new knowledge is carried out. There is now a new vein of scientific investigation, with computational scientists joining theoretical and experimental scientists in the exploration of new frontiers in astronomy, biology, chemistry, and physics, among others.
The computer revolution continues. Each time the cost of computing improves by another factor of 10, the opportunities for computers multiply. Applications that were economically infeasible suddenly become practical.
In the recent past, the following applications were "computer science fiction."
- Computers in automobiles: Until microprocessors improved dramatically in price and performance in the early 1980s, computer control of cars was ludicrous. Today, computers reduce pollution, improve fuel efficiency via engine controls, and increase safety through nearly automated driving and air bag inflation to protect occupants in a crash.
- Cell phones: Who would have dreamed that advances in computer systems would lead to more than half of the planet having mobile phones, allowing person-to-person communication to almost anyone anywhere in the world?
- Human genome project: The cost of computer equipment to map and analyze human DNA sequences was hundreds of millions of dollars. It's unlikely that anyone would have considered this project had the computer costs been 10 to 100 times higher, as they would have been 15 to 25 years earlier. Moreover, costs continue to drop; you will soon be able to acquire your own genome, allowing medical care to be tailored to you.
- World Wide Web: Not in existence at the time of the first edition of this book, the web has transformed our society. For many, the web has replaced libraries and newspapers.
- Search engines: As the content of the web grew in size and in value, finding relevant information became increasingly important. Today, many people rely on search engines for such a large part of their lives that it would be a hardship to go without them.

Clearly, advances in this technology now affect almost every aspect of our society. Hardware advances have allowed programmers to create wonderfully useful software, which explains why computers are omnipresent. Today's science fiction suggests tomorrow's killer applications: already on their way are glasses that augment reality, the cashless society, and cars that can drive themselves.

Traditional Classes of Computing Applications and Their Characteristics
Although a common set of hardware technologies (see Sections 1.4 and 1.5) is used in computers ranging from smart home appliances to cell phones to the largest supercomputers, these different applications have distinct design requirements and employ the core hardware technologies in different ways. Broadly speaking, computers are used in three dissimilar classes of applications.
personal computer (PC): A computer designed for use by an individual, usually incorporating a graphics display, a keyboard, and a mouse.
Personal computers (PCs) in the form of laptops are possibly the best-known form of computing, which readers of this book have likely used extensively. Personal computers emphasize delivery of good performance to single users at low costs and usually execute third-party software. This class of computing drove the evolution of many computing technologies, which is merely 40 years old!
server: A computer used for running larger programs for multiple users, often simultaneously, and typically accessed only via a network.
Servers are the modern form of what were once much larger computers, and are usually accessed only via a network. Servers are oriented to carrying sizable workloads, which may consist of either single complex applications—usually a scientific or engineering application—or handling many small jobs, such as would occur in building a large web server. These applications are usually based on software from another source (such as a database or simulation system), but are often modified or customized for a particular function. Servers are built from the same basic technology as desktop computers, but provide for greater computing, storage, and input/output capacity. In general, servers also place a higher emphasis on dependability, since a crash is usually more costly than it would be on a single-user PC.
supercomputer: A class of computers with the highest performance and cost; they are configured as servers and typically cost tens to hundreds of millions of dollars.
terabyte (TB): Originally 1,099,511,627,776 (2^40) bytes, although communications and secondary storage systems developers started using the term to mean 1,000,000,000,000 (10^12) bytes. To reduce confusion, we now use the term tebibyte (TiB) for 2^40 bytes, defining terabyte (TB) to mean 10^12 bytes. Figure 1.1 shows the full range of decimal and binary values and names.
Servers span the widest range in cost and capability. At the low end, a server may be little more than a desktop computer without a screen or keyboard and cost a thousand dollars. These low-end servers are typically used for file storage, small business applications, or simple web serving. At the other extreme are supercomputers, which at the present consist of hundreds of thousands of processors and many terabytes of memory, and cost tens to hundreds of millions of dollars. Supercomputers are usually used for high-end scientific and engineering calculations, such as weather forecasting, oil exploration, protein structure determination, and other large-scale problems. Although such supercomputers represent the peak of computing capability, they represent a relatively small fraction of the servers and thus a proportionally tiny fraction of the overall computer market in terms of total revenue.
embedded computer: A computer inside another device used for running one predetermined application or collection of software.
Embedded computers are the largest class of computers and span the widest range of applications and performance. Embedded computers include the microprocessors found in your car, the computers in a television set, and the networks of processors that control a modern airplane or cargo ship. A popular term today is Internet of Things (IoT), which suggests many small devices that all communicate wirelessly over the Internet. Embedded computing systems are designed to run one application or one set of related applications that are normally integrated with the hardware and delivered as a single system; thus, despite the large number of embedded computers, most users never really see that they are using a computer!

FIGURE 1.1 Decimal versus binary size prefixes:
- kilobyte (KB) = 10^3 bytes; kibibyte (KiB) = 2^10 bytes; 2% larger
- megabyte (MB) = 10^6 bytes; mebibyte (MiB) = 2^20 bytes; 5% larger
- gigabyte (GB) = 10^9 bytes; gibibyte (GiB) = 2^30 bytes; 7% larger
- terabyte (TB) = 10^12 bytes; tebibyte (TiB) = 2^40 bytes; 10% larger
- petabyte (PB) = 10^15 bytes; pebibyte (PiB) = 2^50 bytes; 13% larger
- exabyte (EB) = 10^18 bytes; exbibyte (EiB) = 2^60 bytes; 15% larger
- zettabyte (ZB) = 10^21 bytes; zebibyte (ZiB) = 2^70 bytes; 18% larger
- yottabyte (YB) = 10^24 bytes; yobibyte (YiB) = 2^80 bytes; 21% larger
- ronnabyte (RB) = 10^27 bytes; robibyte (RiB) = 2^90 bytes; 24% larger
- queccabyte (QB) = 10^30 bytes; quebibyte (QiB) = 2^100 bytes; 27% larger
The 2^X vs. 10^Y bytes ambiguity was resolved by adding a binary notation for all the common size terms. In the last column we note how much larger the binary term is than its corresponding decimal term, which is compounded as we head down the chart. These prefixes work for bits as well as bytes, so gigabit (Gb) is 10^9 bits while gibibit (Gib) is 2^30 bits. The society that runs the metric system created the decimal prefixes, with the last two proposed only in 2019 in anticipation of the global capacity of storage systems. All the names are derived from the etymology in Latin of the powers of 1000 that they represent.
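The last column of Figure 1.1 can be recomputed directly. The short C program below is a hedged illustration (not from the book) that prints how much larger each binary prefix is than its decimal counterpart; the rounded results match the figure's 2% through 27%.

```c
#include <stdio.h>
#include <math.h>

/* Recompute the last column of Figure 1.1: how much larger each binary
   prefix (2^(10k)) is than its decimal counterpart (10^(3k)).
   The figure's percentages are these values rounded. */
int main(void)
{
    const char *pair[] = { "kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi",
                           "peta/pebi", "exa/exbi",  "zetta/zebi", "yotta/yobi",
                           "ronna/robi", "quecca/quebi" };
    for (int k = 1; k <= 10; k++) {
        double binary  = pow(2.0, 10.0 * k);    /* 2^10, 2^20, ..., 2^100 */
        double decimal = pow(10.0, 3.0 * k);    /* 10^3, 10^6, ..., 10^30 */
        printf("%-12s %5.1f%% larger\n", pair[k - 1],
               100.0 * (binary / decimal - 1.0));
    }
    return 0;   /* prints roughly 2.4%, 4.9%, 7.4%, ..., 26.8% */
}
```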
The society that runs the metric system created the decimal prefixes, with the last two proposed only in 2019 in anticipation of the global capacity of storage systems. All the names are derived from the entymology in Latin of the powers of 1000 that they represent. Embedded applications often have unique application requirements that combine a minimum performance with stringent limitations on cost or power. For example, consider a music player: the processor need only to be as fast as necessary to handle its limited function, and beyond that, minimizing cost and power is the most important objective. Despite their low cost, embedded computers often have lower tolerance for failure, since the results can vary from upsetting (when your new television crashes) to devastating (such as might occur when the computer in a plane or cargo ship crashes). In consumer-oriented embedded applications, such as a digital home appliance, dependability is achieved primarily through simplicity— the emphasis is on doing one function as perfectly as possible. In large embedded systems, techniques of redundancy from the server world are often employed. Although this book focuses on general-purpose computers, most concepts apply directly, or with slight modifications, to embedded computers. Elaboration: Elaborations are short sections used throughout the text to provide more detail on a particular subject that may be of interest. Disinterested readers may skip over an Elaboration, since the subsequent material will never depend on the contents of the Elaboration. Many embedded processors are designed using processor cores, a version of a processor written in a hardware description language, such as Verilog or VHDL (see Chapter 4). The core allows a designer to integrate other application-specific hardware with the processor core for fabrication on a single chip. Welcome to the Post-PC Era The continuing march of technology brings about generational changes in computer hardware that shake up the entire information technology industry. Since the fourth edition of the book, we have undergone such a change, as significant in the 1.1 Introduction 7 1600 Smart phone 1400 1200 1000 Millions 800 600 Cell phone 400 (excluding smart phones) PC (excluding tablets) 200 Tablet 0 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 FIGURE 1.2 The number manufactured per year of tablets and smart phones, which reflect the post-PC era, versus personal computers and traditional cell phones. Smart Personal mobile phones represent the recent growth in the cell phone industry, and they passed PCs in 2011. PCs, tablets, and devices (PMDs) are traditional cell phone categories are declining. The peak volume years are 2011 for cell phones, 2013 for PCs, and 2014 for tablets. PCs fell from 20% of total units shipped in 2007 to 10% in 2018. small wireless devices to connect to the Internet; past as the switch starting 40 years ago to personal computers. Replacing the PC they rely on batteries for is the personal mobile device (PMD). PMDs are battery operated with wireless power, and software is installed by downloading connectivity to the Internet and typically cost hundreds of dollars, and, like PCs, apps. Conventional users can download software (“apps”) to run on them. Unlike PCs, they no longer examples are smart have a keyboard and mouse, and are more likely to rely on a touch-sensitive screen phones and tablets. or even speech input. 
Today’s PMD is a smart phone or a tablet computer, but tomorrow it may include electronic glasses. Figure 1.2 shows the rapid growth over time of tablets and smart phones versus that of PCs and traditional cell phones.

FIGURE 1.2 The number manufactured per year of tablets and smart phones, which reflect the post-PC era, versus personal computers and traditional cell phones. Smart phones represent the recent growth in the cell phone industry, and they passed PCs in 2011. PCs, tablets, and traditional cell phone categories are declining. The peak volume years are 2011 for cell phones, 2013 for PCs, and 2014 for tablets. PCs fell from 20% of total units shipped in 2007 to 10% in 2018. (The chart plots millions of units per year from 2007 through 2018 for smart phones, cell phones excluding smart phones, PCs excluding tablets, and tablets.)

Taking over from the conventional server is Cloud Computing, which relies upon giant datacenters that are now known as Warehouse Scale Computers (WSCs). Companies like Amazon and Google build these WSCs, containing 50,000 servers, and then let companies rent portions of them so that they can provide software services to PMDs without having to build WSCs of their own. Indeed, Software as a Service (SaaS) deployed via the Cloud is revolutionizing the software industry, just as PMDs and WSCs are revolutionizing the hardware industry. Today’s software developers will often have a portion of their application that runs on the PMD and a portion that runs in the Cloud.

Cloud Computing: Large collections of servers that provide services over the Internet; some providers rent dynamically varying numbers of servers as a utility.

Software as a Service (SaaS): Delivers software and data as a service over the Internet, usually via a thin program such as a browser that runs on local client devices, instead of binary code that must be installed and run wholly on that device. Examples include web search and social networking.

What You Can Learn in This Book

Successful programmers have always been concerned about the performance of their programs, because getting results to the user quickly is critical in creating popular software. In the 1960s and 1970s, a primary constraint on computer performance was the size of the computer’s memory. Thus, programmers often followed a simple credo: minimize memory space to make programs fast. In the last two decades, advances in computer design and memory technology have greatly reduced the importance of small memory size in most applications other than those in embedded computing systems.

Programmers interested in performance now need to understand the issues that have replaced the simple memory model of the 1960s: the parallel nature of processors and the hierarchical nature of memories. We demonstrate the importance of this understanding in Chapters 3 to 6 by showing how to improve the performance of a C program by a factor of 200. Moreover, as we explain in Section 1.7, today’s programmers need to worry about the energy efficiency of their programs running either on the PMD or in the Cloud, which also requires understanding what is below your code. Programmers who seek to build competitive versions of software will therefore need to increase their knowledge of computer organization.

We are honored to have the opportunity to explain what’s inside this revolutionary machine, unraveling the software below your program and the hardware under the covers of your computer. By the time you complete this book, we believe you will be able to answer the following questions:

- How are programs written in a high-level language, such as C or Java, translated into the language of the hardware, and how does the hardware execute the resulting program? Comprehending these concepts forms the basis of understanding the aspects of both the hardware and software that affect program performance.

- What is the interface between the software and the hardware, and how does software instruct the hardware to perform needed functions? These concepts are vital to understanding how to write many kinds of software.
- What determines the performance of a program, and how can a programmer improve the performance? As we will see, this depends on the original program, the software translation of that program into the computer’s language, and the effectiveness of the hardware in executing the program.

- What techniques can be used by hardware designers to improve performance? This book will introduce the basic concepts of modern computer design. The interested reader will find much more material on this topic in our advanced book, Computer Architecture: A Quantitative Approach.

- What techniques can be used by hardware designers to improve energy efficiency? What can the programmer do to help or hinder energy efficiency?

- What are the reasons for and the consequences of the switch from sequential processing to parallel processing? This book gives the motivation, describes the current hardware mechanisms to support parallelism, and surveys the new generation of “multicore” microprocessors (see Chapter 6).

- Since the first commercial computer in 1951, what great ideas did computer architects come up with that lay the foundation of modern computing?

multicore microprocessor: A microprocessor containing multiple processors (“cores”) in a single integrated circuit.

Without understanding the answers to these questions, improving the performance of your program on a modern computer or evaluating what features might make one computer better than another for a particular application will be a complex process of trial and error, rather than a scientific procedure driven by insight and analysis.

This first chapter lays the foundation for the rest of the book. It introduces the basic ideas and definitions, places the major components of software and hardware in perspective, shows how to evaluate performance and energy, introduces integrated circuits (the technology that fuels the computer revolution), and explains the shift to multicores.

In this chapter and later ones, you will likely see many new words, or words that you may have heard but are not sure what they mean. Don’t panic! Yes, there is a lot of special terminology used in describing modern computers, but the terminology actually helps, since it enables us to describe precisely a function or capability. In addition, computer designers (including your authors) love using acronyms, which are easy to understand once you know what the letters stand for!

acronym: A word constructed by taking the initial letters of a string of words. For example, RAM is an acronym for Random Access Memory, and CPU is an acronym for Central Processing Unit.

To help you remember and locate terms, we have included a highlighted definition of every term in the margins the first time it appears in the text. After a short time of working with the terminology, you will be fluent, and your friends will be impressed as you correctly use acronyms such as BIOS, CPU, DIMM, DRAM, PCIe, SATA, and many others.

To reinforce how the software and hardware systems used to run a program will affect performance, we use a special section, Understanding Program Performance, throughout the book to summarize important insights into program performance. The first one appears below.
Understanding Program Performance

The performance of a program depends on a combination of the effectiveness of the algorithms used in the program, the software systems used to create and translate the program into machine instructions, and the effectiveness of the computer in executing those instructions, which may include input/output (I/O) operations. The following summarizes how the hardware and software affect performance, and where each topic is covered in this book:

- Algorithm: determines both the number of source-level statements and the number of I/O operations executed. (Covered in other books!)
- Programming language, compiler, and architecture: determine the number of computer instructions for each source-level statement. (Chapters 2 and 3)
- Processor and memory system: determine how fast instructions can be executed. (Chapters 4, 5, and 6)
- I/O system (hardware and operating system): determines how fast I/O operations may be executed. (Chapters 4, 5, and 6)

Check Yourself

Check Yourself sections are designed to help readers assess whether they comprehend the major concepts introduced in a chapter and understand the implications of those concepts. Some Check Yourself questions have simple answers; others are for discussion among a group. Answers to the specific questions can be found at the end of the chapter. Check Yourself questions appear only at the end of a section, making it easy to skip them if you are sure you understand the material.

1. The number of embedded processors sold every year greatly outnumbers the number of PC and even post-PC processors. Can you confirm or deny this insight based on your own experience? Try to count the number of embedded processors in your home. How does it compare with the number of conventional computers in your home?

2. As mentioned earlier, both the software and hardware affect the performance of a program. Can you think of examples where each of the following is the right place to look for a performance bottleneck?
   - The algorithm chosen
   - The programming language or compiler
   - The operating system
   - The processor
   - The I/O system and devices

1.2 Seven Great Ideas in Computer Architecture

We now introduce seven great ideas that computer architects have invented in the last 60 years of computer design. These ideas are so powerful they have lasted long after the first computer that used them, with newer architects demonstrating their admiration by imitating their predecessors. These great ideas are themes that we will weave through this and subsequent chapters as examples arise. To point out their influence, in this section we introduce icons and highlighted terms that represent the great ideas, and we use them to identify the nearly 100 sections of the book that feature use of the great ideas.

Use Abstraction to Simplify Design

Both computer architects and programmers had to invent techniques to make themselves more productive, for otherwise design time would lengthen as dramatically as resources grew. A major productivity technique for hardware and software is to use abstractions to characterize the design at different levels of representation; lower-level details are hidden to offer a simpler model at higher levels. We’ll use the abstract painting icon to represent this great idea.

Make the Common Case Fast

Making the common case fast will tend to enhance performance better than optimizing the rare case. Ironically, the common case is often simpler than the rare case and hence is usually easier to enhance. This common-sense advice implies that you know what the common case is, which is only possible with careful experimentation and measurement (see Section 1.6). We use a sports car as the icon for making the common case fast, as the most common trip has one or two passengers, and it’s surely easier to make a fast sports car than a fast minivan!
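As a small illustration in C (ours, not the book’s), the routine below applies the idea: callers almost always pass contiguous data, so that common case gets the simple tight loop a compiler can optimize well, while the rare strided case takes a general path. The function name and its stride parameter are invented for this sketch.

    #include <stdio.h>
    #include <stddef.h>

    /* Make the common case fast: most callers pass contiguous data (stride == 1),
       so that case gets the simple tight loop; the rare strided case falls back
       to the general path. */
    double sum(const double *a, size_t n, size_t stride) {
        double total = 0.0;
        if (stride == 1) {                        /* common case: contiguous */
            for (size_t i = 0; i < n; i++)
                total += a[i];
        } else {                                  /* rare case: strided      */
            for (size_t i = 0; i < n; i++)
                total += a[i * stride];
        }
        return total;
    }

    int main(void) {
        double data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        printf("all elements:    %g\n", sum(data, 8, 1));   /* common case */
        printf("every other one: %g\n", sum(data, 4, 2));   /* rare case   */
        return 0;
    }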
Performance via Parallelism

Since the dawn of computing, computer architects have offered designs that get more performance by computing operations in parallel. We’ll see many examples of parallelism in this book. We use multiple jet engines of a plane as our icon for parallel performance.

Performance via Pipelining

A particular pattern of parallelism is so prevalent in computer architecture that it merits its own name: pipelining. For example, before fire engines, a “bucket brigade” would respond to a fire, which many cowboy movies show in response to a dastardly act by the villain. The townsfolk form a human chain to carry water from its source to the fire, as they can move buckets up the chain much more quickly than individuals running back and forth. Our pipeline icon is a sequence of pipes, with each section representing one stage of the pipeline.

Performance via Prediction

Following the saying that it can be better to ask for forgiveness than to ask for permission, the next great idea is prediction. In some cases, it can be faster on average to guess and start working rather than wait until you know for sure, assuming that the mechanism to recover from a misprediction is not too expensive and your prediction is relatively accurate. We use the fortune-teller’s crystal ball as our prediction icon.

Hierarchy of Memories

Programmers want the memory to be fast, large, and cheap, as memory speed often shapes performance, capacity limits the size of problems that can be solved, and the cost of memory today is often the majority of computer cost. Architects have found that they can address these conflicting demands with a hierarchy of memories, with the fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest, largest, and cheapest per bit at the bottom. As we shall see in Chapter 5, caches give the programmer the illusion that main memory is almost as fast as the top of the hierarchy and nearly as big and cheap as the bottom of the hierarchy. We use a layered triangle icon to represent the memory hierarchy. The shape indicates speed, cost, and size: the closer to the top, the faster and more expensive per bit the memory; the wider the base of the layer, the bigger the memory.
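A brief C sketch (ours, not the book’s) of why the hierarchy matters to programmers even when the arithmetic is identical: the first loop nest walks the array in the order C stores it, reusing data near the top of the hierarchy, while the second strides across rows and leans on the slower levels. The array size and timing code are illustrative, and the measured gap will vary from machine to machine; Chapter 5 explains the mechanics.

    #include <stdio.h>
    #include <time.h>

    #define N 2048
    static double grid[N][N];   /* about 32 MiB, far larger than typical caches */

    /* C stores grid[][] row by row. The row-order loop touches memory
       sequentially and reuses each cache line; the column-order loop strides
       across rows and misses much more often, even though the answer is the same. */
    static double sum_rows(void) {
        double total = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                total += grid[i][j];
        return total;
    }

    static double sum_columns(void) {
        double total = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                total += grid[i][j];
        return total;
    }

    int main(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                grid[i][j] = 1.0;

        clock_t t0 = clock();
        double a = sum_rows();
        clock_t t1 = clock();
        double b = sum_columns();
        clock_t t2 = clock();

        printf("row order:    sum=%.0f in %.3f s\n", a, (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("column order: sum=%.0f in %.3f s\n", b, (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }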
Dependability via Redundancy

Computers not only need to be fast; they need to be dependable. Since any physical device can fail, we make systems dependable by including redundant components that can take over when a failure occurs and help detect failures. We use the tractor-trailer as our icon, since the dual tires on each side of its rear axles allow the truck to continue driving even when one tire fails. (Presumably, the truck driver heads immediately to a repair facility so the flat tire can be fixed, thereby restoring redundancy!)

In the prior edition, we listed an eighth great idea, which was “Designing for Moore’s Law.” Gordon Moore, one of the founders of Intel, made a remarkable prediction in 1965: integrated circuit resources would double every year. A decade later he amended his prediction to doubling every two years. His prediction was accurate, and for 50 years Moore’s Law shaped computer architecture. As computer designs can take years, the resources available per chip (“transistors”; see page 25) could easily double or triple between the start and finish of the project. Like a skeet shooter, computer architects had to anticipate where the technology would be when the design was finished rather than design for when it began.

Alas, no exponential growth can last forever, and Moore’s Law is no longer accurate. The slowing of Moore’s Law is shocking for computer designers who have long leveraged it. Some do not want to believe it is over despite substantial evidence to the contrary. Part of the reason is confusion between saying that Moore’s prediction of a doubling every two years is now incorrect and claiming that semiconductors will no longer improve. Semiconductor technology will continue to improve, but more slowly than in the past. Starting with this edition, we will discuss the implications of the slowing of Moore’s Law, especially in Chapter 6.

Elaboration: During the heyday of Moore’s Law, the cost per chip resource dropped with each new technology generation. For the latest technologies, the cost per resource may be flat or even rising with each new generation, due to the cost of new equipment, the elaborate processes invented to make chips work at finer feature sizes, and the reduced number of companies investing in these new technologies to push the state of the art. Less competition naturally leads to higher prices.

1.3 Below Your Program

“In Paris they simply stared when I spoke to them in French; I never did succeed in making those idiots understand their own language.” Mark Twain, The Innocents Abroad, 1869

A typical application, such as a word processor or a large database system, may consist of millions of lines of code and rely on sophisticated software libraries that implement complex functions in support of the application. As we will see, the hardware in a computer can only execute extremely simple low-level instructions. To go from a complex application to the primitive instructions involves several layers of software that interpret or translate high-level operations into simple computer instructions, an example of the great idea of abstraction. Figure 1.3 shows that these layers of software are organized primarily in a hierarchical fashion, with applications being the outermost ring and a variety of systems software sitting between the hardware and the application software.

FIGURE 1.3 A simplified view of hardware and software as hierarchical layers, shown as concentric circles with hardware in the center and application software outermost. In complex applications, there are often multiple layers of application software as well. For example, a database system may run on top of the systems software hosting an application, which in turn runs on top of the database.

systems software: Software that provides services that are commonly useful, including operating systems, compilers, loaders, and assemblers.

operating system: Supervising program that manages the resources of a computer for the benefit of the programs that run on that computer.

There are many types of systems software, but two types of systems software are central to every computer system today: an operating system and a compiler. An operating system interfaces between a user’s program and the hardware and provides a variety of services and supervisory functions. Among the most important functions are:

- Handling basic input and output operations
- Allocating storage and memory
- Providing for protected sharing of the computer among multiple applications using it simultaneously

Examples of operating systems in use today are Linux, iOS, Android, and Windows.
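To make those services concrete, here is a short C sketch (ours, not the book’s) in which nearly every line is ultimately carried out by the operating system on the program’s behalf; the file name notes.txt is hypothetical.

    #include <stdio.h>
    #include <stdlib.h>

    /* Each call below is serviced by the operating system on our behalf:
       malloc obtains memory, fopen/fread perform input, printf performs output,
       and the OS keeps this program's files and memory protected from the other
       programs sharing the computer at the same time. */
    int main(void) {
        char *buffer = malloc(4096);            /* allocating storage and memory */
        if (buffer == NULL)
            return 1;

        FILE *f = fopen("notes.txt", "r");      /* basic input ...               */
        if (f != NULL) {
            size_t n = fread(buffer, 1, 4095, f);
            buffer[n] = '\0';
            printf("read %zu bytes\n", n);      /* ... and basic output          */
            fclose(f);
        }

        free(buffer);
        return 0;
    }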
compiler: A program that translates high-level language statements into assembly language statements.

Compilers perform another vital function: the translation of a program written in a high-level language, such as C, C++, Java, or Visual Basic, into instructions that the hardware can execute. Given the sophistication of modern programming languages and the simplicity of the instructions executed by the hardware, the translation from a high-level language program to hardware instructions is complex. We give a brief overview of the process here and then go into more depth in Chapter 2.

From a High-Level Language to the Language of Hardware

To speak directly to electronic hardware, you need to send electrical signals. The easiest signals for computers to understand are on and off, and so the computer alphabet is just two letters. Just as the 26 letters of the English alphabet do not limit how much can be written, the two letters of the computer alphabet do not limit what computers can do. The two symbols for these two letters are the numbers 0 and 1, and we commonly think of the computer language as numbers in base 2, or binary numbers. We refer to each “letter” as a binary digit or bit. Computers are slaves to our commands, which are called instructions. Instructions, which are just collections of bits that the computer understands and obeys, can be thought of as numbers. For example, the bits

1001010100101110

tell one computer to add two numbers. Chapter 2 explains why we use numbers for instructions and data; we don’t want to steal that chapter’s thunder, but using numbers for both instructions and data is a foundation of computing.

binary digit: Also called a bit. One of the two numbers in base 2 (0 or 1) that are the components of information.

instruction: A command that computer hardware understands and obeys.

The first programmers communicated to computers in binary numbers, but this was so tedious that they quickly invented new notations that were closer to the way humans think. At first, these notations were translated to binary by hand, but this process was still tiresome. Using the computer to help program the computer, the pioneers invented software to translate from symbolic notation to binary. The first of these programs was named an assembler. This program translates a symbolic version of an instruction into the binary version. For example, the programmer would write

add A, B

and the assembler would translate this notation into

1001010100101110

This instruction tells the computer to add the two numbers A and B. The name coined for this symbolic language, still used today, is assembly language. In contrast, the binary language that the machine understands is the machine language.

assembler: A program that translates a symbolic version of instructions into the binary version.

assembly language: A symbolic representation of machine instructions.
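To make the idea concrete, here is a toy assembler written in C (ours, not the book’s). The 16-bit format, the opcode values, and the register numbers are all invented for this sketch; they are not a real RISC-V encoding, though they are chosen so that the output happens to reproduce the bit pattern shown above.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* A toy assembler for a made-up 16-bit format (NOT a real RISC-V encoding):
       bits 15-12 hold the operation, bits 11-6 and 5-0 hold two register numbers.
       It only shows the idea that symbolic instructions become bits. */
    static uint16_t assemble(const char *op, unsigned reg_a, unsigned reg_b) {
        uint16_t opcode = 0;
        if (strcmp(op, "add") == 0) opcode = 0x9;   /* arbitrary choice */
        if (strcmp(op, "sub") == 0) opcode = 0xA;   /* arbitrary choice */
        return (uint16_t)((opcode << 12) | ((reg_a & 0x3F) << 6) | (reg_b & 0x3F));
    }

    int main(void) {
        uint16_t word = assemble("add", 20, 46);    /* "add A, B" as a number */
        for (int bit = 15; bit >= 0; bit--)
            putchar((word >> bit) & 1 ? '1' : '0'); /* print the binary version */
        putchar('\n');
        return 0;
    }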
machine language: A binary representation of machine instructions.

Although a tremendous improvement, assembly language is still far from the notations a scientist might like to use to simulate fluid flow or that an accountant might use to balance the books. Assembly language requires the programmer to write one line for every instruction that the computer will follow, forcing the programmer to think like the computer.

The recognition that a program could be written to translate a more powerful language into computer instructions was one of the great breakthroughs in the early days of computing. Programmers today owe their productivity, and their sanity, to the creation of high-level programming languages and compilers that translate programs in such languages into instructions. Figure 1.4 shows the relationships among these programs and languages, which are more examples of the power of abstraction.

high-level programming language: A portable language such as C, C++, Java, or Visual Basic that is composed of words and algebraic notation that can be translated by a compiler into assembly language.

High-level language program (in C):

    void swap(size_t v[], size_t k)
    {
        size_t temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

Compiler