Computer Organization and Architecture (11th Edition Global) PDF
William Stallings
Summary
This is a textbook on computer organization and architecture, covering the fundamentals of the field and relating them to the contemporary challenge of designing balanced computer systems for performance. The 11th edition adds a new chapter on the memory hierarchy, substantially expands the coverage of pipeline organization, and expands the treatment of assembly language to a full chapter.
Full Transcript
This is a special edition of an established title widely used by colleges and universities throughout the world. Pearson published this exclusive Global Edition for the benefit of students outside the United States and Canada. If you purchased this book within the United States or Canada, you should be aware that it has been imported without the approval of the Publisher or Author.

Computer Organization and Architecture: Designing for Performance, Eleventh Edition, Global Edition. William Stallings.

The eleventh edition of Computer Organization and Architecture continues to present the nature and characteristics of modern-day computer systems clearly and completely. To ensure that continuous innovations are captured while maintaining comprehensive coverage of the field, the previous edition was extensively reviewed by instructors and professionals, leading to improved pedagogy, enhanced user-friendliness, and substantial changes throughout the book. The book provides a thorough discussion of the fundamentals of computer organization and architecture and relates these fundamentals to contemporary, real-world issues one encounters when designing balanced computer systems that achieve optimal performance. This text can be used across academic and professional spheres alike and is ideal for courses in computer science, computer engineering, and electrical engineering.

New to This Edition
Memory hierarchy has been expanded to a new chapter and covers the principle of locality, performance modeling of data access, and other topics. The discussion on pipeline organization has been substantially expanded with new text and figures in Chapters 16, 17, and 18. The treatment of assembly language has been expanded to a full chapter, with more detail and examples. The book supports recommendations of the ACM/IEEE Computer Science Curricula 2013 and Computer Engineering Curricula 2016. Running examples across the book use Intel x86 and ARM architectures to tie concepts to real-world design choices.

Digital Resources for Students
Your eBook provides 12-month access to digital resources on the Companion Website. Refer to the preface in the textbook for a detailed list of resources. Follow the instructions below to register for the Companion Website for William Stallings' Computer Organization and Architecture, Eleventh Edition, Global Edition.
1. Go to www.pearsonglobaleditions.com
2. Enter the title of your textbook or browse by author name.
3. Click Companion Website.
4. Click Register and follow the on-screen instructions to create a login name and password.
ISSSTY-FLAIL-EMERY-DUVET-BLUNT-RISES
Use a coin to scratch off the coating and reveal your access code. Do not use a sharp knife or other sharp object as it may damage the code. Use the login name and password you created during registration to start using the online resources that accompany your textbook.
IMPORTANT: This access code can only be used once. This subscription is valid for 12 months upon activation and is not transferable.
For technical support, go to https://support.pearson.com/getsupport CVR_STAL0103_11_GE_CVR_Vivar_IFC_IBC.indd 1 21/09/21 10:44 AM Computer Organization and Architecture Designing for Performance Eleventh Edition A01_STAL0103_11_GE_FM.indd 1 29/09/21 7:03 PM This page is intentionally left blank A01_STAL0103_11_GE_FM.indd 2 29/09/21 7:03 PM Computer Organization and Architecture Designing for Performance Eleventh Edition Global Edition William Stallings 330 Hudson Street, New York, NY 10013 A01_STAL0103_11_GE_FM.indd 3 29/09/21 7:03 PM Cover Art: zairiazmal/Shutterstock Pearson Education Limited KAO Two KAO Park Hockham Way Harlow CM17 9SR United Kingdom and Associated Companies throughout the world Visit us on the World Wide Web at: www.pearsonglobaleditions.com Please contact https://support.pearson.com/getsupport/s/contactsupport with any queries on this content. © Pearson Education Limited 2022 The right of William Stallings to be identified as the author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988. Authorized adaptation from the United States edition, entitled Computer Organization and Architecture, ISBN 978-0-13- 499719-3 by William Stallings published by Pearson Education © 2019. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a license permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions department, please visit www.pearsoned.com/permissions/. All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners. Attributions of third-party content appear on the appropriate page within the text. Unless otherwise indicated herein, any third-party trademarks that may appear in this work are the property of their respective owners and any references to third-party trademarks, logos or other trade dress are for demonstrative or descriptive purposes only. Such references are not intended to imply any sponsorship, endorsement, authorization, or promotion of Pearson’s products by the owners of such marks, or any relationship between the owner and Pearson Education, Inc. or its affiliates, authors, licensees, or distributors. This eBook is a standalone product and may or may not include all assets that were part of the print version. It also does not provide access to other Pearson digital products like MyLab and Mastering. The publisher reserves the right to remove any material in this eBook at any time. 
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library ISBN 10: 1-292-42010-3 (print) ISBN 13: 978-1-292-42010-3 (print) ISBN 13: 978-1-292-42008-0 (uPDF eBook) To Tricia my loving wife, the kindest and gentlest person A01_STAL0103_11_GE_FM.indd 5 29/09/21 7:03 PM This page is intentionally left blank A01_STAL0103_11_GE_FM.indd 6 29/09/21 7:03 PM Contents Preface 13 About the Author 22 PART ONE INTRODUCTION 23 Chapter 1 Basic Concepts and Computer Evolution 23 1.1 Organization and Architecture 24 1.2 Structure and Function 25 1.3 The IAS Computer 33 1.4 Gates, Memory Cells, Chips, and Multichip Modules 39 1.5 The Evolution of the Intel x86 Architecture 45 1.6 Embedded Systems 46 1.7 ARM Architecture 51 1.8 Key Terms, Review Questions, and Problems 56 Chapter 2 Performance Concepts 59 2.1 Designing for Performance 60 2.2 Multicore, MICs, and GPGPUs 66 2.3 Two Laws that Provide Insight: Ahmdahl’s Law and Little’s Law 67 2.4 Basic Measures of Computer Performance 70 2.5 Calculating the Mean 73 2.6 Benchmarks and SPEC 81 2.7 Key Terms, Review Questions, and Problems 88 PART TWO THE COMPUTER SYSTEM 94 Chapter 3 A Top-Level View of Computer Function and Interconnection 94 3.1 Computer Components 95 3.2 Computer Function 97 3.3 Interconnection Structures 112 3.4 Bus Interconnection 114 3.5 Point-to-Point Interconnect 116 3.6 PCI Express 121 3.7 Key Terms, Review Questions, and Problems 129 Chapter 4 The Memory Hierarchy: Locality and Performance 134 4.1 Principle of Locality 135 4.2 Characteristics of Memory Systems 140 4.3 The Memory Hierarchy 143 4.4 Performance Modeling of a Multilevel Memory Hierarchy 150 4.5 Key Terms, Review Questions, and Problems 157 7 A01_STAL0103_11_GE_FM.indd 7 29/09/21 7:03 PM 8 Contents Chapter 5 Cache Memory 160 5.1 Cache Memory Principles 161 5.2 Elements of Cache Design 165 5.3 Intel x86 Cache Organization 187 5.4 The IBM z13 Cache Organization 190 5.5 Cache Performance Models 191 5.6 Key Terms, Review Questions, and Problems 195 Chapter 6 Internal Memory 199 6.1 Semiconductor Main Memory 200 6.2 Error Correction 209 6.3 DDR DRAM 214 6.4 eDRAM 219 6.5 Flash Memory 221 6.6 Newer Nonvolatile Solid-State Memory Technologies 224 6.7 Key Terms, Review Questions, and Problems 227 Chapter 7 External Memory 232 7.1 Magnetic Disk 233 7.2 RAID 243 7.3 Solid State Drives 253 7.4 Optical Memory 256 7.5 Magnetic Tape 262 7.6 Key Terms, Review Questions, and Problems 264 Chapter 8 Input/Output 267 8.1 External Devices 269 8.2 I/O Modules 271 8.3 Programmed I/O 274 8.4 Interrupt-Driven I/O 278 8.5 Direct Memory Access 287 8.6 Direct Cache Access 293 8.7 I/O Channels and Processors 300 8.8 External Interconnection Standards 302 8.9 IBM z13 I/O Structure 305 8.10 Key Terms, Review Questions, and Problems 309 Chapter 9 Operating System Support 313 9.1 Operating System Overview 314 9.2 Scheduling 325 9.3 Memory Management 331 9.4 Intel x86 Memory Management 342 9.5 ARM Memory Management 347 9.6 Key Terms, Review Questions, and Problems 352 PART THREE ARITHMETIC AND LOGIC 356 Chapter 10 Number Systems 356 10.1 The Decimal System 357 A01_STAL0103_11_GE_FM.indd 8 29/09/21 7:03 PM Contents 9 10.2 Positional Number Systems 358 10.3 The Binary System 359 10.4 Converting Between Binary and Decimal 359 10.5 Hexadecimal Notation 362 10.6 Key Terms and Problems 364 Chapter 11 Computer Arithmetic 366 11.1 The Arithmetic and Logic Unit 367 11.2 Integer Representation 368 11.3 Integer Arithmetic 373 11.4 Floating-Point 
Representation 388 11.5 Floating-Point Arithmetic 396 11.6 Key Terms, Review Questions, and Problems 405 Chapter 12 Digital Logic 410 12.1 Boolean Algebra 411 12.2 Gates 416 12.3 Combinational Circuits 418 12.4 Sequential Circuits 436 12.5 Programmable Logic Devices 445 12.6 Key Terms and Problems 450 PART FOUR INSTRUCTION SETS AND ASSEMBLY LANGUAGE 454 Chapter 13 Instruction Sets: Characteristics and Functions 454 13.1 Machine Instruction Characteristics 455 13.2 Types of Operands 462 13.3 Intel x86 and ARM Data Types 464 13.4 Types of Operations 467 13.5 Intel x86 and ARM Operation Types 480 13.6 Key Terms, Review Questions, and Problems 488 Appendix 13A Little-, Big-, and Bi-Endian 494 Chapter 14 Instruction Sets: Addressing Modes and Formats 498 14.1 Addressing Modes 499 14.2 x86 and ARM Addressing Modes 505 14.3 Instruction Formats 511 14.4 x86 and ARM Instruction Formats 519 14.5 Key Terms, Review Questions, and Problems 524 Chapter 15 Assembly Language and Related Topics 528 15.1 Assembly Language Concepts 529 15.2 Motivation for Assembly Language Programming 532 15.3 Assembly Language Elements 534 15.4 Examples 540 15.5 Types of Assemblers 545 15.6 Assemblers 545 15.7 Loading and Linking 548 15.8 Key Terms, Review Questions, and Problems 555 A01_STAL0103_11_GE_FM.indd 9 29/09/21 7:03 PM 10 Contents PART FIVE THE CENTRAL PROCESSING UNIT 559 Chapter 16 Processor Structure and Function 559 16.1 Processor Organization 560 16.2 Register Organization 561 16.3 Instruction Cycle 567 16.4 Instruction Pipelining 570 16.5 Processor Organization for Pipelining 588 16.6 The x86 Processor Family 590 16.7 The ARM Processor 597 16.8 Key Terms, Review Questions, and Problems 603 Chapter 17 Reduced Instruction Set Computers 608 17.1 Instruction Execution Characteristics 610 17.2 The Use of a Large Register File 615 17.3 Compiler-Based Register Optimization 620 17.4 Reduced Instruction Set Architecture 622 17.5 RISC Pipelining 628 17.6 MIPS R4000 632 17.7 SPARC 638 17.8 Processor Organization for Pipelining 643 17.9 CISC, RISC, and Contemporary Systems 645 17.10 Key Terms, Review Questions, and Problems 647 Chapter 18 Instruction-Level Parallelism and Superscalar Processors 651 18.1 Overview 652 18.2 Design Issues 659 18.3 Intel Core Microarchitecture 668 18.4 ARM Cortex-A8 674 18.5 ARM Cortex-M3 680 18.6 Key Terms, Review Questions, and Problems 685 Chapter 19 Control Unit Operation and Microprogrammed Control 691 19.1 Micro-operations 692 19.2 Control of the Processor 698 19.3 Hardwired Implementation 708 19.4 Microprogrammed Control 711 19.5 Key Terms, Review Questions, and Problems 720 PART SIX PARALLEL ORGANIZATION 723 Chapter 20 Parallel Processing 723 20.1 Multiple Processors Organization 725 20.2 Symmetric Multiprocessors 727 20.3 Cache Coherence and the MESI Protocol 731 20.4 Multithreading and Chip Multiprocessors 740 20.5 Clusters 745 20.6 Nonuniform Memory Access 748 20.7 Key Terms, Review Questions, and Problems 752 A01_STAL0103_11_GE_FM.indd 10 29/09/21 7:03 PM Contents 11 Chapter 21 Multicore Computers 758 21.1 Hardware Performance Issues 759 21.2 Software Performance Issues 762 21.3 Multicore Organization 767 21.4 Heterogeneous Multicore Organization 769 21.5 Intel Core i7-5960X 778 21.6 ARM Cortex-A15 MPCore 779 21.7 IBM z13 Mainframe 784 21.8 Key Terms, Review Questions, and Problems 787 Appendix A System Buses 790 A.1 Bus Structure 791 A.2 Multiple-Bus Hierarchies 792 A.3 Elements of Bus Design 794 Appendix B Victim Cache Strategies 799 B.1 Victim Cache 800 B.2 Selective Victim 
Cache 802 Appendix C Interleaved Memory 804 Appendix D The International Reference Alphabet 807 Appendix E Stacks 810 E.1 Stacks 811 E.2 Stack Implementation 812 E.3 Expression Evaluation 813 Appendix F Recursive Procedures 817 F.1 Recursion 818 F.2 Activation Tree Representation 819 F.3 Stack Implementation 825 F.4 Recursion and Iteration 826 Appendix G Additional Instruction Pipeline Topics 829 G.1 Pipeline Reservation Tables 830 G.2 Reorder Buffers 837 G.3 Tomasulo’s Algorithm 840 G.4 Scoreboarding 844 Glossary 848 References 857 Index 866 A01_STAL0103_11_GE_FM.indd 11 29/09/21 7:03 PM This page is intentionally left blank A01_STAL0103_11_GE_FM.indd 12 29/09/21 7:03 PM Preface WHAT’S NEW IN THE ELEVENTH EDITION Since the tenth edition of this book was published, the field has seen continued innovations and improvements. In this new edition, I try to capture these changes while maintaining a broad and comprehensive coverage of the entire field. To begin this process of revision, the tenth edition of this book was extensively reviewed by a number of professors who teach the subject and by professionals working in the field. The result is that, in many places, the narrative has been clarified and tightened, and illustrations have been improved. Beyond these refinements to improve pedagogy and u ser-friendliness, there have been substantive changes throughout the book. Roughly the same chapter organization has been retained, but much of the material has been revised and new material has been added. The most noteworthy changes are as follows: Multichip Modules: A new discussion of MCMs, which are now widely used, has been added to Chapter 1. SPEC benchmarks: The treatment of SPEC in Chapter 2 has been updated to cover the new SPEC CPU2017 benchmark suite. Memory hierarchy: A new chapter on memory hierarchy expands on material that was in the cache memory chapter, plus adds new material. The new Chapter 4 includes: —Updated and expanded coverage of the principle of locality —Updated and expanded coverage of the memory hierarchy —A new treatment of performance modeling of data access in a memory hierarchy Cache memory: The cache memory chapter has been updated and revised. Chapter 5 now includes: —Revised and expanded treatment of logical cache organization, including new figures, to improve clarity —New coverage of content-addressable memory —New coverage of write allocate and no write allocate policies —A new section on cache performance modeling. Embedded DRAM: Chapter 6 on internal memory now includes a section on the increasingly popular eDRAM. 13 A01_STAL0103_11_GE_FM.indd 13 29/09/21 7:03 PM 14 Preface Advanced Format 4k sector hard drives: Chapter 7 on external memory now includes discussion of the now widely used 4k sector hard drive format. Boolean algebra: The discussion on Boolean algebra in Chapter 12 has been expanded with new text, figures, and tables, to enhance understanding. Assembly language: The treatment of assembly language has been expanded to a full chapter, with more detail and more examples. Pipeline organization: The discussion on pipeline organization has been substantially expanded with new text and figures. The material is in new sections in Chapters 16 (Processor Structure and Function), 17 (RISC), and 18 (Superscalar). Cache coherence: The discussion of the MESI cache coherence protocol in Chapter 20 has been expanded with new text and figures. 
SUPPORT OF ACM/IEEE COMPUTER SCIENCE AND COMPUTER ENGINEERING CURRICULA
The book is intended for both an academic and a professional audience. As a textbook, it is intended for a one- or two-semester undergraduate course for computer science, computer engineering, and electrical engineering majors. This edition supports recommendations of the ACM/IEEE Computer Science Curricula 2013 (CS2013). CS2013 divides all course work into three categories: Core-Tier 1 (all topics should be included in the curriculum); Core-Tier 2 (all or almost all topics should be included); and Elective (desirable to provide breadth and depth). In the Architecture and Organization (AR) area, CS2013 includes five Tier 2 topics and three Elective topics, each of which has a number of subtopics. This text covers all eight topics listed by CS2013. Table P.1 shows the support for the AR Knowledge Area provided in this textbook.
This book also supports the ACM/IEEE Computer Engineering Curricula 2016 (CE2016). CE2016 defines a necessary body of knowledge for undergraduate computer engineering, divided into twelve knowledge areas. One of these areas is Computer Architecture and Organization (CE-CAO), consisting of ten core knowledge areas. This text covers all of the CE-CAO knowledge areas listed in CE2016. Table P.2 shows the coverage.

Table P.1 Coverage of CS2013 Architecture and Organization (AR) Knowledge Area

Knowledge Unit: Digital Logic and Digital Systems (Tier 2)
Topics: Overview and history of computer architecture; combinational vs. sequential logic / field-programmable gate arrays as a fundamental combinational and sequential logic building block; multiple representations/layers of interpretation (hardware is just another layer); physical constraints (gate delays, fan-in, fan-out, energy/power)
Textbook Coverage: Chapters 1 and 12

Knowledge Unit: Machine Level Representation of Data (Tier 2)
Topics: Bits, bytes, and words; numeric data representation and number bases; fixed- and floating-point systems; signed and twos-complement representations; representation of non-numeric data (character codes, graphical data)
Textbook Coverage: Chapters 10 and 11

Knowledge Unit: Assembly Level Machine Organization (Tier 2)
Topics: Basic organization of the von Neumann machine; control unit (instruction fetch, decode, and execution); instruction sets and types (data manipulation, control, I/O); assembly/machine language programming; instruction formats; addressing modes; subroutine call and return mechanisms (cross-reference PL/Language Translation and Execution); I/O and interrupts; shared memory multiprocessors/multicore organization; introduction to SIMD vs. MIMD and the Flynn Taxonomy
Textbook Coverage: Chapters 1, 8, 13, 14, 15, 19, 20, and 21
Knowledge Unit: Memory System Organization and Architecture (Tier 2)
Topics: Storage systems and their technology; memory hierarchy: temporal and spatial locality; main memory organization and operations; latency, cycle time, bandwidth, and interleaving; cache memories (address mapping, block size, replacement and store policy); multiprocessor cache consistency / using the memory system for inter-core synchronization / atomic memory operations; virtual memory (page table, TLB); fault handling and reliability
Textbook Coverage: Chapters 4, 5, 6, 7, 9, and 20

Knowledge Unit: Interfacing and Communication (Tier 2)
Topics: I/O fundamentals: handshaking, buffering, programmed I/O, interrupt-driven I/O; interrupt structures: vectored and prioritized, interrupt acknowledgment; external storage, physical organization, and drives; buses: bus protocols, arbitration, direct memory access (DMA); RAID architectures
Textbook Coverage: Chapters 3, 7, and 8

Knowledge Unit: Functional Organization (Elective)
Topics: Implementation of simple datapaths, including instruction pipelining, hazard detection, and resolution; control unit: hardwired realization vs. microprogrammed realization; instruction pipelining; introduction to instruction-level parallelism (ILP)
Textbook Coverage: Chapters 16, 17, 18, and 19

Knowledge Unit: Multiprocessing and Alternative Architectures (Elective)
Topics: Example SIMD and MIMD instruction sets and architectures; interconnection networks; shared multiprocessor memory systems and memory consistency; multiprocessor cache coherence
Textbook Coverage: Chapters 20 and 21

Knowledge Unit: Performance Enhancements (Elective)
Topics: Superscalar architecture; branch prediction, speculative execution, out-of-order execution; prefetching; vector processors and GPUs; hardware support for multithreading; scalability
Textbook Coverage: Chapters 17, 18, and 20

Table P.2 Coverage of CE2016 Computer Architecture and Organization Knowledge Area

History and overview: Chapter 1, Basic Concepts and Computer Evolution
Relevant tools, standards and/or engineering constraints: Chapter 3, A Top-Level View of Computer Function and Interconnection
Instruction set architecture: Chapter 13, Instruction Sets: Characteristics and Functions; Chapter 14, Instruction Sets: Addressing Modes and Formats; Chapter 15, Assembly Language and Related Topics
Measuring performance: Chapter 2, Performance Concepts
Computer arithmetic: Chapter 10, Number Systems; Chapter 11, Computer Arithmetic
Processor organization: Chapter 16, Processor Structure and Function; Chapter 17, Reduced Instruction Set Computers (RISCs); Chapter 18, Instruction-Level Parallelism and Superscalar Processors; Chapter 19, Control Unit Operation and Microprogrammed Control
Memory system organization and architectures: Chapter 4, The Memory Hierarchy: Locality and Performance; Chapter 5, Cache Memory; Chapter 6, Internal Memory Technology; Chapter 7, External Memory
Input/Output interfacing and communication: Chapter 8, Input/Output
Peripheral subsystems: Chapter 3, A Top-Level View of Computer Function and Interconnection; Chapter 8, Input/Output
Multi/Many-core architectures: Chapter 21, Multicore Computers
Distributed system architectures: Chapter 20, Parallel Processing

OBJECTIVES
This book is about the structure and function of computers. Its purpose is to present, as clearly and completely as possible, the nature and characteristics of modern-day computer systems. This task is challenging for several reasons.
First, there is a tremendous variety of prod- ucts that can rightly claim the name of computer, from s ingle-chip microprocessors costing a few dollars to supercomputers costing tens of millions of dollars. Variety is exhibited not only in cost but also in size, performance, and application. Second, the rapid pace of change that has always characterized computer technology continues with no letup. These changes cover all aspects of computer technology, from the underlying integrated circuit technology used to construct computer components to the increasing use of parallel organization con- cepts in combining those components. In spite of the variety and pace of change in the computer field, certain fundamen- tal concepts apply consistently throughout. The application of these concepts depends on the current state of the technology and the price/performance objectives of the designer. A01_STAL0103_11_GE_FM.indd 16 29/09/21 7:03 PM Preface 17 The intent of this book is to provide a thorough discussion of the fundamentals of computer organization and architecture and to relate these to contemporary design issues. The subtitle suggests the theme and the approach taken in this book. It has always been important to design computer systems to achieve high performance, but never has this requirement been stronger or more difficult to satisfy than today. All of the basic per- formance characteristics of computer systems, including processor speed, memory speed, memory capacity, and interconnection data rates, are increasing rapidly. Moreover, they are increasing at different rates. This makes it difficult to design a balanced system that maxi- mizes the performance and utilization of all elements. Thus, computer design increasingly becomes a game of changing the structure or function in one area to compensate for a per- formance mismatch in another area. We will see this game played out in numerous design decisions throughout the book. A computer system, like any system, consists of an interrelated set of components. The system is best characterized in terms of structure—the way in which components are interconnected, and f unction—the operation of the individual components. Furthermore, a computer’s organization is hierarchical. Each major component can be further described by decomposing it into its major subcomponents and describing their structure and function. For clarity and ease of understanding, this hierarchical organization is described in this book from the top down: Computer system: Major components are processor, memory, I/O. Processor: Major components are control unit, registers, ALU, and instruction execu- tion unit. Control unit: Provides control signals for the operation and coordination of all proces- sor components. Traditionally, a microprogramming implementation has been used, in which major components are control memory, microinstruction sequencing logic, and registers. More recently, microprogramming has been less prominent but remains an important implementation technique. The objective is to present the material in a fashion that keeps new material in a clear context. This should minimize the chance that the reader will get lost and should provide better motivation than a b ottom-up approach. Throughout the discussion, aspects of the system are viewed from the points of view of both architecture (those attributes of a system visible to a machine language programmer) and organization (the operational units and their interconnections that realize the architecture). 
EXAMPLE SYSTEMS
This text is intended to acquaint the reader with the design principles and implementation issues of contemporary computer organization and architecture. Accordingly, a purely conceptual or theoretical treatment would be inadequate. To illustrate the concepts and to tie them to real-world design choices that must be made, two processor families have been chosen as running examples:
Intel x86 architecture: The x86 architecture is the most widely used for nonembedded computer systems. The x86 is essentially a complex instruction set computer (CISC) with some RISC features. Recent members of the x86 family make use of superscalar and multicore design principles. The evolution of features in the x86 architecture provides a unique case study of the evolution of most of the design principles in computer architecture.
ARM: The ARM architecture is arguably the most widely used embedded processor, used in cell phones, iPods, remote sensor equipment, and many other devices. The ARM is essentially a reduced instruction set computer (RISC). Recent members of the ARM family make use of superscalar and multicore design principles.
Many, but by no means all, of the examples in this book are drawn from these two computer families. Numerous other systems, both contemporary and historical, provide examples of important computer architecture design features.
PLAN OF THE TEXT
The book is organized into six parts:
Introduction
The computer system
Arithmetic and logic
Instruction sets and assembly language
The central processing unit
Parallel organization, including multicore
The book includes a number of pedagogic features, including the use of interactive simulations and numerous figures and tables to clarify the discussion. Each chapter includes a list of key words, review questions, and homework problems. The book also includes an extensive glossary, a list of frequently used acronyms, and a bibliography.
INSTRUCTOR SUPPORT MATERIALS
Support materials for instructors are available at the Instructor Resource Center (IRC) for this textbook, which can be reached through the publisher's Web site www.pearsonglobaleditions.com. The IRC provides the following materials:
Projects manual: Project resources including documents and portable software, plus suggested project assignments for all of the project categories listed subsequently in this Preface.
Solutions manual: Solutions to end-of-chapter Review Questions and Problems.
PowerPoint slides: A set of slides covering all chapters, suitable for use in lecturing.
PDF files: Copies of all figures and tables from the book.
Test bank: A chapter-by-chapter set of questions.
Sample syllabuses: The text contains more material than can be conveniently covered in one semester. Accordingly, instructors are provided with several sample syllabuses that guide the use of the text within limited time. These samples are based on real-world experience by professors with the first edition.
STUDENT RESOURCES
For this new edition, a tremendous amount of original supporting material for students has been made available online. The Companion Web Site, at www.pearsonglobaleditions.com, includes a list of relevant links organized by chapter. To aid the student in understanding the material, a separate set of homework problems with solutions is available at this site.
Stu- dents can enhance their understanding of the material by working out the solutions to these problems and then checking their answers. The site also includes a number of documents and papers referenced throughout the text. Students can subscribe to the Companion Web Site by using the access codes provided on the inside front cover. PROJECTS AND OTHER STUDENT EXERCISES For many instructors, an important component of a computer organization and architec- ture course is a project or set of projects by which the student gets hands-on experience to reinforce concepts from the text. This book provides an unparalleled degree of support for including a projects component in the course. The instructor’s support materials available through the IRC not only includes guidance on how to assign and structure the projects but also includes a set of user’s manuals for various project types plus specific assignments, all written especially for this book. Instructors can assign work in the following areas: Interactive simulation assignments: Described subsequently. Research projects: A series of research assignments that instruct the student to research a particular topic on the Internet and write a report. Simulation projects: The IRC provides support for the use of the two simulation pack- ages: SimpleScalar can be used to explore computer organization and architecture design issues. SMPCache provides a powerful educational tool for examining cache design issues for symmetric multiprocessors. Assembly language projects: A simplified assembly language, CodeBlue, is used and assignments based on the popular Core Wars concept are provided. Reading/report assignments: A list of papers in the literature, one or more for each chapter, that can be assigned for the student to read and then write a short report. Writing assignments: A list of writing assignments to facilitate learning the material. Test bank: Includes T/F, multiple choice, and fill-in-the-blank questions and answers. This diverse set of projects and other student exercises enables the instructor to use the book as one component in a rich and varied learning experience and to tailor a course plan to meet the specific needs of the instructor and students. A01_STAL0103_11_GE_FM.indd 19 29/09/21 7:03 PM 20 Preface INTERACTIVE SIMULATIONS An important feature in this edition is the incorporation of interactive simulations. These simulations provide a powerful tool for understanding the complex design features of a modern computer system. A total of 20 interactive simulations are used to illustrate key functions and algorithms in computer organization and architecture design. At the relevant point in the book, an icon indicates that a relevant interactive simulation is available online for student use. Because the animations enable the user to set initial conditions, they can serve as the basis for student assignments. The instructor’s supplement includes a set of assignments, one for each of the animations. Each assignment includes several specific prob- lems that can be assigned to students. ACKNOWLEDGMENTS This new edition has benefited from review by a number of people, who gave generously of their time and expertise. 
The following professors provided a review of the entire book: Nikhil Bhargava (Indian Institute of Management, Delhi), James Gil de Lamadrid (Bowie State University, Computer Science Department), Debra Calliss (Computer Science and Engineering, Arizona State University), Mohammed Anwaruddin (Wentworth Institute of Technology, Dept. of Computer Science), Roger Kieckhafer (Michigan Technological University, Electrical & Computer Engineering), Paul Fortier (University of Massachusetts Darthmouth, Electrical and Computer Engineering), Yan Zhang (Department of Com- puter Science and Engineering, University of South Florida), Patricia Roden (University of North Alabama, Computer Science and Information Systems), Sanjeev Baskiyar (Auburn University, Computer Science and Software Engineering), and (Jayson Rock, University of Wisconsin-Milwaukee, Computer Science). I would especially like to thank Professor Roger Kieckhafer for permission to make use of some of the figures and performance models from his course lecture notes. Thanks also to the many people who provided detailed technical reviews of one or more chapters: Rekai Gonzalez Alberquilla, Allen Baum, Jalil Boukhobza, Dmitry Bufistov, Humberto Calderón, Jesus Carretero, Ashkan Eghbal, Peter Glaskowsky, Ram Huggahalli, Chris Jesshope, Athanasios Kakarountas, Isil Oz, Mitchell Poplingher, Roger Shepherd, Jigar Savla, Karl Stevens, Siri Uppalapati, Dr. Sriram Vajapeyam, Kugan Vivekanandara- jah, Pooria M. Yaghini, and Peter Zeno, Professor Cindy Norris of Appalachian State University, Professor Bin Mu of the Uni- versity of New Brunswick, and Professor Kenrick Mock of the University of Alaska kindly supplied homework problems. Aswin Sreedhar of the University of Massachusetts developed the interactive simula- tion assignments. Professor Miguel Angel Vega Rodriguez, Professor Dr. Juan Manuel Sánchez Pérez, and Professor Dr. Juan Antonio Gómez Pulido, all of University of Extremadura, Spain, prepared the SMPCache problems in the instructor’s manual and authored the SMPCache User’s Guide. A01_STAL0103_11_GE_FM.indd 20 29/09/21 7:03 PM Preface 21 Todd Bezenek of the University of Wisconsin and James Stine of Lehigh University prepared the SimpleScalar problems in the instructor’s manual, and Todd also authored the SimpleScalar User’s Guide. Finally, I would like to thank the many people responsible for the publication of the book, all of whom did their usual excellent job. This includes the staff at Pearson, particu- larly my editor Tracy Johnson, her assistant Meghan Jacoby, and project manager Bob Engelhardt. Thanks also to the marketing and sales staffs at Pearson, without whose efforts this book would not be in front of you. ACKNOWLEDGMENTS FOR THE GLOBAL EDITION Pearson would like to acknowledge and thank the following for their work on the Global Edition. Contributor Asral Bahari Jambek (Universiti Malaysia Perlis, Malaysia) Carl Barton (Birkbeck University of London) Sutep Tongngam (NIDA, Thailand) Reviewers Rana Ejaz Ahmed (American University of Sharjah, UAE) Ritesh Ajoodha (University of the Witwatersrand, Johannesburg) Asral Bahari Jambek (Universiti Malaysia Perlis, Malaysia) Patricia E.N. Lutu (University of Pretoria, South Africa) Sezer Gören Uğurdağ (Yeditepe Üniversitesi, Turkey) A01_STAL0103_11_GE_FM.indd 21 29/09/21 7:03 PM About the Author Dr. William Stallings has authored 18 textbooks, and counting revised editions, over 70 books on computer security, computer networking, and computer architecture. 
In over 30 years in the field, he has been a technical contributor, technical manager, and an executive with several high-technology firms. Currently, he is an independent consultant whose clients have included computer and networking manufacturers and customers, software development firms, and leading-edge government research institutions. He has 13 times received the award for the best computer science textbook of the year from the Text and Academic Authors Association.
He created and maintains the Computer Science Student Resource Site at ComputerScienceStudent.com. This site provides documents and links on a variety of subjects of general interest to computer science students (and professionals). He is a member of the editorial board of Cryptologia, a scholarly journal devoted to all aspects of cryptology.
Dr. Stallings holds a PhD from MIT in computer science and a BS from Notre Dame in electrical engineering.

PART ONE INTRODUCTION
Chapter 1 Basic Concepts and Computer Evolution
1.1 Organization and Architecture
1.2 Structure and Function (Function; Structure)
1.3 The IAS Computer
1.4 Gates, Memory Cells, Chips, and Multichip Modules (Gates and Memory Cells; Transistors; Microelectronic Chips; Multichip Module)
1.5 The Evolution of the Intel x86 Architecture
1.6 Embedded Systems (The Internet of Things; Embedded Operating Systems; Application Processors versus Dedicated Processors; Microprocessors versus Microcontrollers; Embedded versus Deeply Embedded Systems)
1.7 ARM Architecture (ARM Evolution; Instruction Set Architecture; ARM Products)
1.8 Key Terms, Review Questions, and Problems

Learning Objectives
After studying this chapter, you should be able to:
Explain the general functions and structure of a digital computer.
Present an overview of the evolution of computer technology from early digital computers to the latest microprocessors.
Present an overview of the evolution of the x86 architecture.
Define embedded systems and list some of the requirements and constraints that various embedded systems must meet.

1.1 ORGANIZATION AND ARCHITECTURE
In describing computers, a distinction is often made between computer architecture and computer organization. Although it is difficult to give precise definitions for these terms, a consensus exists about the general areas covered by each. For example, see [VRAN80], [SIEW82], and [BELL78a]; an interesting alternative view is presented in [REDD76].
Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical execution of a program. A term that is often used interchangeably with computer architecture is instruction set architecture (ISA). The ISA defines instruction formats, instruction opcodes, registers, instruction and data memory; the effect of executed instructions on the registers and memory; and an algorithm for controlling instruction execution. Computer organization refers to the operational units and their interconnections that realize the architectural specifications. Examples of architectural attributes include the instruction set, the number of bits used to represent various data types (e.g., numbers, characters), I/O mechanisms, and techniques for addressing memory. Organizational attributes include those hardware details transparent to the programmer, such as control signals; interfaces between the computer and peripherals; and the memory technology used.
For example, it is an architectural design issue whether a computer will have a multiply instruction. It is an organizational issue whether that instruction will be implemented by a special multiply unit or by a mechanism that makes repeated use of the add unit of the system. The organizational decision may be based on the anticipated frequency of use of the multiply instruction, the relative speed of the two approaches, and the cost and physical size of a special multiply unit.
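To make the distinction concrete, the following C sketch shows the kind of software (or microcode-level) routine that could stand in for a dedicated multiply unit. It is a minimal sketch: the function name, the 32-bit unsigned operands, and the shift-and-add strategy are assumptions made for this illustration, not a description of any particular machine.

    /* Illustrative sketch only: multiplication realized by repeated use of the
       add (and shift) operations, i.e., an organizational choice that leaves the
       architecturally defined multiply behavior unchanged. The function name and
       operand width are assumptions made for this example. */
    #include <stdint.h>

    uint32_t multiply_by_shift_and_add(uint32_t multiplicand, uint32_t multiplier)
    {
        uint32_t product = 0;
        while (multiplier != 0) {
            if (multiplier & 1u)       /* low bit set: add in the current partial term */
                product += multiplicand;
            multiplicand <<= 1;        /* shift the partial term for the next bit */
            multiplier >>= 1;          /* move on to the next multiplier bit */
        }
        return product;                /* result modulo 2^32, as for a 32-bit multiply */
    }

Whichever realization is chosen, a program that issues a multiply observes the same architectural behavior; only the speed, cost, and chip area of the implementation differ.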
Historically, and still today, the distinction between architecture and organization has been an important one. Many computer manufacturers offer a family of computer models, all with the same architecture but with differences in organization. Consequently, the different models in the family have different price and performance characteristics. Furthermore, a particular architecture may span many years and encompass a number of different computer models, its organization changing with changing technology. A prominent example of both these phenomena is the IBM System/370 architecture. This architecture was first introduced in 1970 and included a number of models. The customer with modest requirements could buy a cheaper, slower model and, if demand increased, later upgrade to a more expensive, faster model without having to abandon software that had already been developed. Over the years, IBM has introduced many new models with improved technology to replace older models, offering the customer greater speed, lower cost, or both. These newer models retained the same architecture so that the customer's software investment was protected. Remarkably, the System/370 architecture, with a few enhancements, has survived to this day as the architecture of IBM's mainframe product line.
In a class of computers called microcomputers, the relationship between architecture and organization is very close. Changes in technology not only influence organization but also result in the introduction of more powerful and more complex architectures. Generally, there is less of a requirement for generation-to-generation compatibility for these smaller machines. Thus, there is more interplay between organizational and architectural design decisions. An intriguing example of this is the reduced instruction set computer (RISC), which we examine in Chapter 17.
This book examines both computer organization and computer architecture. The emphasis is perhaps more on the side of organization. However, because a computer organization must be designed to implement a particular architectural specification, a thorough treatment of organization requires a detailed examination of architecture as well.

1.2 STRUCTURE AND FUNCTION
A computer is a complex system; contemporary computers contain millions of elementary electronic components. How, then, can one clearly describe them? The key is to recognize the hierarchical nature of most complex systems, including the computer [SIMO96]. A hierarchical system is a set of interrelated subsystems; each subsystem may, in turn, contain lower level subsystems, until we reach some lowest level of elementary subsystem.
The hierarchical nature of complex systems is essential to both their design and their description.
The designer need only deal with a particular level of the system at a time. At each level, the system consists of a set of components and their interrelationships. The behavior at each level depends only on a simplified, abstracted characterization of the system at the next lower level. At each level, the designer is concerned with structure and function: Structure: The way in which the components are interrelated. Function: The operation of each individual component as part of the structure. M01_STAL0103_11_GE_C01.indd 25 29/09/21 7:18 PM 26 CHAPTER 1 / Basic Concepts and Computer Evolution In terms of description, we have two choices: starting at the bottom and build- ing up to a complete description, or beginning with a top view and decomposing the system into its subparts. Evidence from a number of fields suggests that the top- down approach is the clearest and most effective [WEIN75]. The approach taken in this book follows from this viewpoint. The computer system will be described from the top down. We begin with the major components of a computer, describing their structure and function, and proceed to successively lower layers of the hierarchy. The remainder of this section provides a very brief overview of this plan of attack. Function Both the structure and functioning of a computer are, in essence, simple. In general terms, there are only four basic functions that a computer can perform: Data processing: Data may take a wide variety of forms, and the range of pro- cessing requirements is broad. However, we shall see that there are only a few fundamental methods or types of data processing. Data storage: Even if the computer is processing data on the fly (i.e., data come in and get processed, and the results go out immediately), the computer must temporarily store at least those pieces of data that are being worked on at any given moment. Thus, there is at least a s hort-term data storage function. Equally important, the computer performs a long-term data storage function. Files of data are stored on the computer for subsequent retrieval and update. Data movement: The computer’s operating environment consists of devices that serve as either sources or destinations of data. When data are received from or delivered to a device that is directly connected to the computer, the process is known as input–output (I/O), and the device is referred to as a peripheral. When data are moved over longer distances, to or from a remote device, the process is known as data communications. Control: Within the computer, a control unit manages the computer’s resources and orchestrates the performance of its functional parts in response to instructions. The preceding discussion may seem absurdly generalized. It is certainly possible, even at a top level of computer structure, to differentiate a variety of func- tions, but to quote [SIEW82]: There is remarkably little shaping of computer structure to fit the function to be performed. At the root of this lies the general-purpose nature of computers, in which all the functional specialization occurs at the time of programming and not at the time of design. Structure We now look in a general way at the internal structure of a computer. We begin with a traditional computer with a single processor that employs a microprogrammed control unit, then examine a typical multicore structure. 
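As a purely illustrative aside, the top-down decomposition just outlined can be sketched in C as a set of nested types, each level hiding the detail of the level below it. All of the type and field names here are invented for the example; the real decomposition is developed in Figure 1.1 and in the chapters that follow.

    /* Illustrative sketch: a computer described top-down as nested components.
       Every name and field below is an assumption made for this example. */
    typedef struct { int sequencing_state; /* control memory, decoders, etc. elided */ } ControlUnit;
    typedef struct { int flags;            /* adders, shifters, logic circuits elided */ } ALU;
    typedef struct { unsigned long r[16];  /* register count assumed */ } RegisterFile;

    typedef struct {                /* one level down: the CPU and its components */
        ControlUnit  control_unit;
        ALU          alu;
        RegisterFile registers;     /* CPU interconnection (internal bus) elided */
    } CPU;

    typedef struct {                /* top level: the four main structural components */
        CPU            cpu;
        unsigned char *main_memory; /* stores data and instructions */
        void          *io_modules;  /* moves data to and from the environment */
        void          *system_bus;  /* system interconnection among the above */
    } Computer;

Describing the system this way mirrors the approach of the text: each level can be designed and understood in terms of the structure and function of its components, without reference to the details hidden inside them.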
Simple Single-Processor Computer
Figure 1.1 provides a hierarchical view of the internal structure of a traditional single-processor computer. There are four main structural components:
Central processing unit (CPU): Controls the operation of the computer and performs its data processing functions; often simply referred to as processor.
Main memory: Stores data.
I/O: Moves data between the computer and its external environment.
System interconnection: Some mechanism that provides for communication among CPU, main memory, and I/O. A common example of system interconnection is by means of a system bus, consisting of a number of conducting wires to which all the other components attach.
(Figure 1.1, The Computer: Top-Level Structure, depicts the computer as I/O, main memory, and the CPU joined by a system bus; within the CPU are registers, an ALU, an internal bus, and a control unit, which in turn contains sequencing logic, control unit registers and decoders, and control memory.)
There may be one or more of each of the aforementioned components. Traditionally, there has been just a single processor. In recent years, there has been increasing use of multiple processors in a single computer. Some design issues relating to multiple processors crop up and are discussed as the text proceeds; Part Five focuses on such computers.
Each of these components will be examined in some detail in Part Two. However, for our purposes, the most interesting and in some ways the most complex component is the CPU. Its major structural components are as follows:
Control unit: Controls the operation of the CPU and hence the computer.
Arithmetic and logic unit (ALU): Performs the computer's data processing functions.
Registers: Provides storage internal to the CPU.
CPU interconnection: Some mechanism that provides for communication among the control unit, ALU, and registers.
Part Three covers these components, where we will see that complexity is added by the use of parallel and pipelined organizational techniques. Finally, there are several approaches to the implementation of the control unit; one common approach is a microprogrammed implementation. In essence, a microprogrammed control unit operates by executing microinstructions that define the functionality of the control unit. With this approach, the structure of the control unit can be depicted as in Figure 1.1. This structure is examined in Part Four.
Multicore Computer Structure
As was mentioned, contemporary computers generally have multiple processors. When these processors all reside on a single chip, the term multicore computer is used, and each processing unit (consisting of a control unit, ALU, registers, and perhaps cache) is called a core. To clarify the terminology, this text will use the following definitions.
Central processing unit (CPU): That portion of a computer that fetches and executes instructions. It consists of an ALU, a control unit, and registers. In a system with a single processing unit, it is often simply referred to as a processor.
Core: An individual processing unit on a processor chip. A core may be equivalent in functionality to a CPU on a single-CPU system. Other specialized processing units, such as one optimized for vector and matrix operations, are also referred to as cores.
Processor: A physical piece of silicon containing one or more cores.
The processor is the computer component that interprets and executes instructions. If a processor contains multiple cores, it is referred to as a multicore processor. After about a decade of discussion, there is broad industry consensus on this usage.
Another prominent feature of contemporary computers is the use of multiple layers of memory, called cache memory, between the processor and main memory. Chapter 4 is devoted to the topic of cache memory. For our purposes in this section, we simply note that a cache memory is smaller and faster than main memory and is used to speed up memory access by placing in the cache data from main memory that is likely to be used in the near future. A greater performance improvement may be obtained by using multiple levels of cache, with level 1 (L1) closest to the core and additional levels (L2, L3, and so on) progressively farther from the core. In this scheme, level n is smaller and faster than level n + 1.
Figure 1.2 is a simplified view of the principal components of a typical multicore computer. Most computers, including embedded computers in smartphones and tablets, plus personal computers, laptops, and workstations, are housed on a motherboard. Before describing this arrangement, we need to define some terms. A printed circuit board (PCB) is a rigid, flat board that holds and interconnects chips and other electronic components. The board is made of layers, typically two to ten, that interconnect components via copper pathways that are etched into the board. The main printed circuit board in a computer is called a system board or motherboard, while smaller ones that plug into the slots in the main board are called expansion boards.
(Figure 1.2, Simplified View of Major Elements of a Multicore Computer, shows a motherboard carrying main memory chips, a processor chip, and I/O chips; the processor chip holds eight cores and an L3 cache; each core contains instruction logic, an arithmetic and logic unit, load/store logic, an L1 I-cache and L1 data cache, and L2 instruction and data caches.)
The most prominent elements on the motherboard are the chips. A chip is a single piece of semiconducting material, typically silicon, upon which electronic circuits and logic gates are fabricated. The resulting product is referred to as an integrated circuit.
The motherboard contains a slot or socket for the processor chip, which typically contains multiple individual cores, in what is known as a multicore processor. There are also slots for memory chips, I/O controller chips, and other key computer components. For desktop computers, expansion slots enable the inclusion of more components on expansion boards. Thus, a modern motherboard connects only a few individual chip components, with each chip containing from a few thousand up to hundreds of millions of transistors.
Figure 1.2 shows a processor chip that contains eight cores and an L3 cache. Not shown is the logic required to control operations between the cores and the cache and between the cores and the external circuitry on the motherboard. The figure indicates that the L3 cache occupies two distinct portions of the chip surface. However, typically, all cores have access to the entire L3 cache via the aforementioned control circuits.
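The benefit of the multilevel cache scheme just described can be seen with a simple average-access-time calculation of the kind developed more formally in Chapter 4. The following C sketch is illustrative only; the hit ratios and cycle counts are assumed values chosen for the example, not measurements of any real processor.

    /* Illustrative sketch: average memory access time (AMAT) for an assumed
       three-level cache hierarchy. Each access first tries L1; a miss falls
       through to the next, larger and slower level. All numbers are invented
       for this example. */
    #include <stdio.h>

    int main(void)
    {
        double t_l1 = 4,   h_l1 = 0.90;   /* L1 access time (cycles) and hit ratio */
        double t_l2 = 12,  h_l2 = 0.95;   /* hit ratio of accesses that reach L2   */
        double t_l3 = 40,  h_l3 = 0.98;   /* hit ratio of accesses that reach L3   */
        double t_mem = 200;               /* main memory access time (cycles)      */

        double amat = t_l1 + (1 - h_l1) * (t_l2 + (1 - h_l2) * (t_l3 + (1 - h_l3) * t_mem));

        printf("average access time = %.2f cycles\n", amat);  /* about 5.4 cycles here */
        return 0;
    }

Even with these made-up numbers, the point of the hierarchy is visible: because most accesses hit in the small, fast levels, the average access time stays close to the L1 time rather than the main memory time.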
The processor chip shown in Figure 1.2 does not represent any specific product, but provides a general idea of how such chips are laid out. Next, we zoom in on the structure of a single core, which occupies a portion of the processor chip. In general terms, the functional elements of a core are:
Instruction logic: This includes the tasks involved in fetching instructions, and decoding each instruction to determine the instruction operation and the memory locations of any operands.
Arithmetic and logic unit (ALU): Performs the operation specified by an instruction.
Load/store logic: Manages the transfer of data to and from main memory via cache.
The core also contains an L1 cache, split between an instruction cache (I-cache) that is used for the transfer of instructions to and from main memory, and an L1 data cache, for the transfer of operands and results. Typically, today's processor chips also include an L2 cache as part of the core. In many cases, this cache is also split between instruction and data caches, although a combined, single L2 cache is also used. Keep in mind that this representation of the layout of the core is only intended to give a general idea of internal core structure. In a given product, the functional elements may not be laid out as the three distinct elements shown in Figure 1.2, especially if some or all of these functions are implemented as part of a microprogrammed control unit.
Examples
It will be instructive to study some real-world examples that illustrate the hierarchical structure of computers. Let us take as an example the motherboard for a computer built around two Intel Quad-Core Xeon processor chips.
Activity: The most important elements of this motherboard, in addition to the processor sockets, are listed below. Research online and find motherboards that match this description.
PCI-Express slots for a high-end display adapter and for additional peripherals (Section 3.6 describes PCIe).
Ethernet controller and Ethernet ports for network connections.
USB sockets for peripheral devices.
Serial ATA (SATA) sockets for connection to disk memory (Section 7.7 discusses Ethernet, USB, and SATA).
Interfaces for DDR (double data rate) main memory chips (Section 5.3 discusses DDR).
Intel 3420 chipset, an I/O controller for direct memory access operations between peripheral devices and main memory (Section 7.5 discusses DDR).
Following our top-down strategy, as illustrated in Figures 1.1 and 1.2, we can now zoom in and look at the internal structure of a processor chip, referred to as a processor unit (PU). For variety, we look at an IBM chip instead of the Intel processor chip. Figure 1.3 is a to-scale layout of the processor chip for the IBM z13 mainframe computer [LASC16]. This chip has 3.99 billion transistors. The superimposed labels indicate how the silicon surface area of the chip is allocated. We see that this chip has eight cores, or processors. In addition, a substantial portion of the chip is devoted to the L3 cache, which is shared by all eight cores. The L3 control logic controls traffic between the L3 cache and the cores and between the L3 cache and the external environment. Additionally, there is storage control (SC) logic between the cores and the L3 cache. The memory controller (MC) function controls access to memory external to the chip. The GX I/O bus controls the interface to the channel adapters accessing the I/O.
[Figure 1.3 IBM z13 Processor Unit (PU) Chip Diagram: a to-scale layout showing the eight cores (Core0 through Core7), the L3 cache data and control areas, the storage control (SC) interfaces, the memory control unit (MCU) with its drivers and receivers, and the GX and PCI bus interfaces.]

[Figure 1.4 IBM z13 Core Layout: a photograph of one core, with the ISU, IFB, ICM, IDU, LSU, XU, PC, FXU, VFU, RU, and COP subareas labeled.]

Going down one level deeper, we examine the internal structure of a single core, as shown in the photograph of Figure 1.4. The core implements the z13 instruction set architecture, referred to as the z/Architecture. Keep in mind that this is a portion of the silicon surface area making up a single-processor chip. The main subareas within this core area are the following:

ISU (instruction sequence unit): Determines the sequence in which instructions are executed in what is referred to as a superscalar architecture. It enables the out-of-order (OOO) pipeline. It tracks register names, OOO instruction dependency, and handling of instruction resource dispatch. These concepts are discussed in Chapter 16.

IFB (instruction fetch and branch) and ICM (instruction cache and merge): These two subunits contain the 128-kB1 instruction cache, branch prediction logic, instruction fetching controls, and buffers. The relative size of these subunits is the result of the elaborate branch prediction design.

IDU (instruction decode unit): The IDU is fed from the IFB buffers, and is responsible for the parsing and decoding of all z/Architecture operation codes.

LSU (load-store unit): The LSU contains the 96-kB L1 data cache, and manages data traffic between the L2 data cache and the functional execution units. It is responsible for handling all types of operand accesses of all lengths, modes, and formats as defined in the z/Architecture.

XU (translation unit): This unit translates logical addresses from instructions into physical addresses in main memory. The XU also contains a translation lookaside buffer (TLB) used to speed up memory access. TLBs are discussed in Chapter 8.

PC (core pervasive unit): Used for instrumentation and error collection.

FXU (fixed-point unit): The FXU executes fixed-point arithmetic operations.

VFU (vector and floating-point units): The binary floating-point part handles all binary and hexadecimal floating-point operations, as well as fixed-point multiplication operations. The decimal floating-point part handles both fixed-point and floating-point operations on numbers that are stored as decimal digits. The vector execution part handles vector operations.

RU (recovery unit): The RU keeps a copy of the complete state of the system, including all registers, collects hardware fault signals, and manages the hardware recovery actions.

COP (dedicated co-processor): The COP is responsible for data compression and encryption functions for each core.

L2D: A 2-MB L2 data cache for all memory traffic other than instructions.

L2I: A 2-MB L2 instruction cache.

1 kB = kilobyte = 1024 bytes. Numerical prefixes are explained in a document under the "Other Useful" tab at ComputerScienceStudent.com.

As we progress through the book, the concepts introduced in this section will become clearer.

1.3 THE IAS COMPUTER

The first generation of computers used vacuum tubes for digital logic elements and memory.
A number of research and then commercial computers were built using vacuum tubes. For our purposes, it will be instructive to examine perhaps the most famous first-generation computer, known as the IAS computer. This example illustrates many of the fundamental concepts found in all computer systems.

A fundamental design approach first implemented in the IAS computer is known as the stored-program concept. This idea is usually attributed to the mathematician John von Neumann. Alan Turing developed the idea at about the same time. The first publication of the idea was in a 1945 proposal by von Neumann for a new computer, the EDVAC (Electronic Discrete Variable Computer).2 (A small code sketch illustrating the stored-program idea, using the register names of Figure 1.5, appears after the description of the IAS structure below.)

In 1946, von Neumann and his colleagues began the design of a new stored-program computer, referred to as the IAS computer, at the Princeton Institute for Advanced Studies. The IAS computer, although not completed until 1952, is the prototype of all subsequent general-purpose computers.3 Figure 1.5 shows the structure of the IAS computer (compare with Figure 1.1). It consists of:

A main memory, which stores both data and instructions4

An arithmetic and logic unit (ALU) capable of operating on binary data

A control unit, which interprets the instructions in memory and causes them to be executed

Input–output (I/O) equipment operated by the control unit

[Figure 1.5 IAS Structure: the central processing unit (CPU) comprises the arithmetic-logic unit (CA), containing the AC, MQ, and MBR registers and the arithmetic-logic circuits, and the program control unit (CC), containing the PC, IBR, MAR, and IR registers and the control circuits; instructions and data flow between the CPU, main memory M(0) through M(4095), and the input-output equipment (I, O). Register key: AC = accumulator register; MQ = multiply-quotient register; MBR = memory buffer register; IBR = instruction buffer register; PC = program counter; MAR = memory address register; IR = instruction register.]

2 The 1945 report on EDVAC is available at box.com/COA11e.

3 A 1954 report [GOLD54] describes the implemented IAS machine and lists the final instruction set. It is available at box.com/COA11e.

4 In this book, unless otherwise noted, the term instruction refers to a machine instruction that is directly interpreted and executed by the processor, in contrast to a statement in a high-level language, such as Ada or C++, which must first be compiled into a series of machine instructions before being executed.
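To make the stored-program concept concrete, the C sketch below (referred to earlier) models the IAS registers named in Figure 1.5 and a single fetch step of the instruction cycle. It is a deliberately simplified illustration: the word width, the instruction encoding, and the unused IBR are toy assumptions for this example and do not reproduce the actual IAS design documented in the 1954 report. The point it illustrates is that, because instructions live in the same memory M as data, fetching an instruction is simply another memory read directed by the PC, MAR, and MBR registers.

#include <stdint.h>
#include <stdio.h>

#define MEM_WORDS 4096              /* M(0) through M(4095), as in Figure 1.5 */

/* The IAS registers named in Figure 1.5 (widths simplified for this sketch). */
struct ias_state {
    uint64_t M[MEM_WORDS];  /* main memory: holds data AND instructions      */
    uint64_t AC, MQ;        /* accumulator and multiply-quotient registers   */
    uint64_t MBR;           /* memory buffer register                        */
    uint32_t IBR;           /* instruction buffer register (unused here)     */
    uint32_t PC;            /* program counter                               */
    uint32_t MAR;           /* memory address register                       */
    uint32_t IR;            /* instruction register (holds the opcode)       */
};

/* One fetch step of the instruction cycle, written as register transfers.
   Because the program is stored in the same memory M as its data, fetching
   an instruction is just another memory read; that is the essence of the
   stored-program concept. */
static void fetch(struct ias_state *s) {
    s->MAR = s->PC;                          /* address of next instruction     */
    s->MBR = s->M[s->MAR];                   /* read the word containing it     */
    s->IR  = (uint32_t)(s->MBR >> 12);       /* toy split: upper bits = opcode  */
    s->MAR = (uint32_t)(s->MBR & 0xFFF);     /* lower 12 bits = operand address */
    s->PC += 1;                              /* ready for the next instruction  */
}

int main(void) {
    struct ias_state s = {0};
    s.M[0] = (1ULL << 12) | 500;             /* opcode 1, operand address 500   */
    fetch(&s);
    printf("IR=%u MAR=%u PC=%u\n",
           (unsigned)s.IR, (unsigned)s.MAR, (unsigned)s.PC);  /* IR=1 MAR=500 PC=1 */
    return 0;
}

A full simulator would go on to execute the decoded instruction (for example, loading M(500) into AC) and then repeat the cycle.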