CO textbook.pdf
Document Details
Uploaded by EasierPromethium5587
Mangalore University
Related
- PCSII Depression/Anxiety/Strong Emotions 2024 Document
- A Concise History of the World: A New World of Connections (1500-1800)
- Human Bio Test PDF
- Vertebrate Pest Management PDF
- Lg 5 International Environmental Laws, Treaties, Protocols, and Conventions
- Fármacos Anticoncepcionais e Disfunção Erétil PDF
Full Transcript
This page intentionally left blank This page intentionally left blank December 9, 2010 12:37 ham_338065_halftitle Sheet number 1 Page number i cyan black COMPUTER ORGANIZATION AND EMBEDDED SYSTEMS This page intentionally left blank December...
This page intentionally left blank This page intentionally left blank December 9, 2010 12:37 ham_338065_halftitle Sheet number 1 Page number i cyan black COMPUTER ORGANIZATION AND EMBEDDED SYSTEMS This page intentionally left blank December 15, 2010 09:16 ham_338065_title Sheet number 1 Page number iii cyan black COMPUTER ORGANIZATION AND EMBEDDED SYSTEMS SIXTH EDITION Carl Hamacher Queen’s University Zvonko Vranesic University of Toronto Safwat Zaky University of Toronto Naraig Manjikian Queen’s University December 22, 2010 10:39 ham_338065_copy Sheet number 1 Page number iv cyan black COMPUTER ORGANIZATION AND EMBEDDED SYSTEMS, SIXTH EDITION Published by McGraw-Hill, a business unit of The McGraw-Hill Companies, Inc., 1221 Avenue of the Americas, New York, NY 10020. Copyright © 2012 by The McGraw-Hill Companies, Inc. All rights reserved. Previous editions 2002, 1996, and 1990. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written consent of The McGraw-Hill Companies, Inc., including, but not limited to, in any network or other electronic storage or transmission, or broadcast for distance learning. Some ancillaries, including electronic and print components, may not be available to customers outside the United States. This book is printed on acid-free paper. 1 2 3 4 5 6 7 8 9 DOC/DOC 0 9 8 7 6 5 4 3 2 1 ISBN 978–0–07–338065–0 MHID 0–07–338065–2 Vice President & Editor-in-Chief: Marty Lange Vice President EDP/Central Publishing Services: Kimberly Meriwether David Publisher: Raghothaman Srinivasan Senior Sponsoring Editor: Peter E. Massar Developmental Editor: Darlene M. Schueller Senior Marketing Manager: Curt Reynolds Senior Project Manager: Lisa A. Bruflodt Buyer: Laura Fuller Design Coordinator: Brenda A. Rolwes Media Project Manager: Balaji Sundararaman Cover Design: Studio Montage, St. Louis, Missouri Cover Image: © Royalty-Free/CORBIS Compositor: Techsetters, Inc. Typeface: 10/12 Times Roman Printer: R. R. Donnelley & Sons Company/Crawfordsville, IN Library of Congress Cataloging-in-Publication Data Computer organization and embedded systems / Carl Hamacher... [et al.]. – 6th ed. p. cm. Includes bibliographical references. ISBN-13: 978-0-07-338065-0 (alk. paper) ISBN-10: 0-07-338065-2 (alk. paper) 1. Computer organization. 2. Embedded computer systems. I. Hamacher, V. Carl. QA76.9.C643.H36 2012 004.2'2–dc22 2010050243 www.mhhe.com December 7, 2010 11:51 ham_338065_ded Sheet number 1 Page number v cyan black To our families This page intentionally left blank December 15, 2010 09:18 ham_338065_ata Sheet number 1 Page number vii cyan black About the Authors Carl Hamacher received the B.A.Sc. degree in Engineering Physics from the University of Waterloo, Canada, the M.Sc. degree in Electrical Engineering from Queen’s University, Canada, and the Ph.D. degree in Electrical Engineering from Syracuse University, New York. From 1968 to 1990 he was at the University of Toronto, Canada, where he was a Professor in the Department of Electrical Engineering and the Department of Computer Science. He served as director of the Computer Systems Research Institute during 1984 to 1988, and as chairman of the Division of Engineering Science during 1988 to 1990. In 1991 he joined Queen’s University, where is now Professor Emeritus in the Department of Electrical and Computer Engineering. He served as Dean of the Faculty of Applied Science from 1991 to 1996. During 1978 to 1979, he was a visiting scientist at the IBM Research Laboratory in San Jose, California. In 1986, he was a research visitor at the Laboratory for Circuits and Systems associated with the University of Grenoble, France. During 1996 to 1997, he was a visiting professor in the Computer Science Department at the University of California at Riverside and in the LIP6 Laboratory of the University of Paris VI. His research interests are in multiprocessors and multicomputers, focusing on their interconnection networks. Zvonko Vranesic received his B.A.Sc., M.A.Sc., and Ph.D. degrees, all in Electrical En- gineering, from the University of Toronto. From 1963 to 1965 he worked as a design engineer with the Northern Electric Co. Ltd. in Bramalea, Ontario. In 1968 he joined the University of Toronto, where he is now a Professor Emeritus in the Department of Electrical & Computer Engineering. During the 1978–79 academic year, he was a Senior Visitor at the University of Cambridge, England, and during 1984-85 he was at the University of Paris, 6. From 1995 to 2000 he served as Chair of the Division of Engineering Science at the University of Toronto. He is also involved in research and development at the Altera Toronto Technology Center. His current research interests include computer architecture and field-programmable VLSI technology. He is a coauthor of four other books: Fundamentals of Digital Logic with VHDL Design, 3rd ed.; Fundamentals of Digital Logic with Verilog Design, 2nd ed.; Microcom- puter Structures; and Field-Programmable Gate Arrays. In 1990, he received the Wighton Fellowship for “innovative and distinctive contributions to undergraduate laboratory in- struction.” In 2004, he received the Faculty Teaching Award from the Faculty of Applied Science and Engineering at the University of Toronto. Safwat Zaky received his B.Sc. degree in Electrical Engineering and B.Sc. in Mathemat- ics, both from Cairo University, Egypt, and his M.A.Sc. and Ph.D. degrees in Electrical Engineering from the University of Toronto. From 1969 to 1972 he was with Bell North- ern Research, Bramalea, Ontario, where he worked on applications of electro-optics and vii This page intentionally left blank December 15, 2010 09:18 ham_338065_ata Sheet number 2 Page number viii cyan black viii About the Authors magnetics in mass storage and telephone switching. In 1973, he joined the University of Toronto, where he is now Professor Emeritus in the Department of Electrical and Computer Engineering. He served as Chair of the Department from 1993 to 2003 and as Vice-Provost from 2003 to 2009. During 1980 to 1981, he was a senior visitor at the Computer Laboratory, University of Cambridge, England. He is a Fellow of the Canadian Academy of Engineering. His research interests are in the areas of computer architecture, digital-circuit design, and electromagnetic compatibility. He is a coauthor of the book Microcomputer Structures and is a recipient of the IEEE Third Millennium Medal and of the Vivek Goel Award for distinguished service to the University of Toronto. Naraig Manjikian received his B.A.Sc. degree in Computer Engineering and M.A.Sc. degree in Electrical Engineering from the University of Waterloo, Canada, and his Ph.D. degree in Electrical Engineering from the University of Toronto. In 1997, he joined Queen’s University, Kingston, Canada, where he is now an Associate Professor in the Department of Electrical and Computer Engineering. From 2004 to 2006, he served as Undergraduate Chair for Computer Engineering. From 2006 to 2007, he served as Acting Head of the Department of Electrical and Computer Engineering, and from 2007 until 2009, he served as Associate Head for Student and Alumni Affairs. During 2003 to 2004, he was a visiting professor at McGill University, Montreal, Canada, and the University of British Columbia. During 2010 to 2011, he was a visiting professor at McGill University. His research interests are in the areas of computer architecture, multiprocessor systems, field-programmable VLSI technology, and applications of parallel processing. December 15, 2010 09:21 ham_338065_pref Sheet number 1 Page number ix cyan black Preface This book is intended for use in a first-level course on computer organization and embedded systems in electrical engineering, computer engineering, and computer science curricula. The book is self-contained, assuming only that the reader has a basic knowledge of computer programming in a high-level language. Many students who study computer organization will have had an introductory course on digital logic circuits. Therefore, this subject is not covered in the main body of the book. However, we have provided an extensive appendix on logic circuits for those students who need it. The book reflects our experience in teaching three distinct groups of students: elec- trical and computer engineering undergraduates, computer science undergraduates, and engineering science undergraduates. We have always approached the teaching of courses on computer organization from a practical point of view. Thus, a key consideration in shap- ing the contents of the book has been to carefully explain the main principles, supported by examples drawn from commercially available processors. Our main commercial examples are based on: Altera’s Nios II, Freescale’s ColdFire, ARM, and Intel’s IA-32 architectures. It is important to recognize that digital system design is not a straightforward process of applying optimal design algorithms. Many design decisions are based largely on heuristic judgment and experience. They involve cost/performance and hardware/software tradeoffs over a range of alternatives. It is our goal to convey these notions to the reader. The book is aimed at a one-semester course in engineering or computer science pro- grams. It is suitable for both hardware- and software-oriented students. Even though the emphasis is on hardware, we have addressed a number of relevant software issues. McGraw-Hill maintains a Website with support material for the book at http://www. mhhe.com/hamacher. Scope of the Book The first three chapters introduce the basic structure of computers, the operations that they perform at the machine-instruction level, and input/output methods as seen by a programmer. The fourth chapter provides an overview of the system software needed to translate programs written in assembly and high-level languages into machine language and to manage their execution. The remaining eight chapters deal with the organization, interconnection, and performance of hardware units in modern computers, including a coverage of embedded systems. Five substantial appendices are provided. The first appendix covers digital logic circuits. Then, four current commercial instruction set architectures—Altera’s Nios II, Freescale’s ColdFire, ARM, and Intel’s IA-32—are described in separate appendices. Chapter 1 provides an overview of computer hardware and informally introduces terms that are discussed in more depth in the remainder of the book. This chapter discusses ix December 15, 2010 09:21 ham_338065_pref Sheet number 2 Page number x cyan black x Preface the basic functional units and the ways they interact to form a complete computer system. Number and character representations are discussed, along with basic arithmetic operations. An introduction to performance issues and a brief treatment of the history of computer development are also provided. Chapter 2 gives a methodical treatment of machine instructions, addressing techniques, and instruction sequencing. Program examples at the machine-instruction level, expressed in a generic assembly language, are used to discuss concepts that include loops, subroutines, and stacks. The concepts are introduced using a RISC-style instruction set architecture. A comparison with CISC-style instruction sets is also included. Chapter 3 presents a programmer’s view of basic input/output techniques. It explains how program-controlled I/O is performed using polling, as well as how interrupts are used in I/O transfers. Chapter 4 considers system software. The tasks performed by compilers, assemblers, linkers, and loaders are explained. Utility programs that trace and display the results of executing a program are described. Operating system routines that manage the execution of user programs and their input/output operations, including the handling of interrupts, are also described. Chapter 5 explores the design of a RISC-style processor. This chapter explains the sequence of processing steps needed to fetch and execute the different types of machine instructions. It then develops the hardware organization needed to implement these pro- cessing steps. The differing requirements of CISC-style processors are also considered. Chapter 6 provides coverage of the use of pipelining and multiple execution units in the design of high-performance processors. A pipelined version of the RISC-style processor design from Chapter 5 is used to illustrate pipelining. The role of the compiler and the rela- tionship between pipelined execution and instruction set design are explored. Superscalar processors are discussed. Input/output hardware is considered in Chapter 7. Interconnection networks, including the bus structure, are discussed. Synchronous and asynchronous operation is explained. Interconnection standards, including USB and PCI Express, are also presented. Semiconductor memories, including SDRAM, Rambus, and Flash memory imple- mentations, are discussed in Chapter 8. Caches are explained as a way for increasing the memory bandwidth. They are discussed in some detail, including performance modeling. Virtual-memory systems, memory management, and rapid address-translation techniques are also presented. Magnetic and optical disks are discussed as components in the memory hierarchy. Chapter 9 explores the implementation of the arithmetic unit of a computer. Logic design for fixed-point add, subtract, multiply, and divide hardware, operating on 2’s- complement numbers, is described. Carry-lookahead adders and high-speed multipliers are explained, including descriptions of the Booth multiplier recoding and carry-save addi- tion techniques. Floating-point number representation and operations, in the context of the IEEE Standard, are presented. Today, far more processors are in use in embedded systems than in general-purpose computers. Chapters 10 and 11 are dedicated to the subject of embedded systems. First, basic aspects of system integration, component interconnections, and real-time operation are presented in Chapter 10. The use of microcontrollers is discussed. Then, Chapter 11 concentrates on system-on-a-chip (SoC) implementations, in which a single chip integrates December 15, 2010 09:21 ham_338065_pref Sheet number 3 Page number xi cyan black Preface xi the processing, memory, I/O, and timer functionality needed to satisfy application-specific requirements. A substantial example shows how FPGAs and modern design tools can be used in this environment. Chapter 12 focuses on parallel processing and performance. Hardware multithread- ing and vector processing are introduced as enhancements in a single processor. Shared- memory multiprocessors are then described, along with the issue of cache coherence. In- terconnection networks for multiprocessors are presented. Appendix A provides extensive coverage of logic circuits, intended for a reader who has not taken a course on the design of such circuits. Appendices B, C, D, and E illustrate how the instruction set concepts introduced in Chapters 2 and 3 are implemented in four commercial processors: Nios II, ColdFire, ARM, and Intel IA-32. The Nios II and ARM processors illustrate the RISC design style. ColdFire has an easy-to-teach CISC design, while the IA-32 CISC architecture represents the most successful commercial design. The presentation for each processor includes assembly- language examples from Chapters 2 and 3, implemented in the context of that processor. The details given in these appendices are not essential for understanding the material in the main body of the book. It is sufficient to cover only one of these appendices to gain an appreciation for commercial processor instruction sets. The choice of a processor to use as an example is likely to be influenced by the equipment in an accompanying laboratory. Instructors may wish to use more that one processor to illustrate the different design approaches. Changes in the Sixth Edition Substantial changes in content and organization have been made in preparing the sixth edition of this book. They include the following: The basic concepts of instruction set architecture are now covered using the RISC-style approach. This is followed by a comparative examination of the CISC-style approach. The processor design discussion is focused on a RISC-style implementation, which leads naturally to pipelined operation. Two chapters on embedded systems are included: one dealing with the basic structure of such systems and the use of microcontrollers, and the other dealing with system-on- a-chip implementations. Appendices are used to give examples of four commercial processors. Each appendix includes the essential information about the instruction set architecture of the given processor. Solved problems have been included in a new section toward the end of chapters and appendices. They provide the student with solutions that can be expected for typical problems. Difficulty Level of Problems The problems at the end of chapters and appendices have been classified as easy (E), medium (M), or difficult (D). These classifications should be interpreted as follows: December 15, 2010 09:21 ham_338065_pref Sheet number 4 Page number xii cyan black xii Preface Easy—Solutions can be derived in a few minutes by direct application of specific information presented in one place in the relevant section of the book. Medium—Use of the book material in a way that does not directly follow any examples presented is usually needed. In some cases, solutions may follow the general pattern of an example, but will take longer to develop than those for easy problems. Difficult—Some additional insight is needed to solve these problems. If a solution requires a program to be written, its underlying algorithm or form may be quite different from that of any program example given in the book. If a hardware design is required, it may involve an arrangement and interconnection of basic logic circuit components that is quite different from any design shown in the book. If a performance analysis is needed, it may involve the derivation of an algebraic expression. What Can Be Covered in a One-Semester Course This book is suitable for use at the university or college level as a text for a one-semester course in computer organization. It is intended for the first course that students will take on computer organization. There is more than enough material in the book for a one-semester course. The core material on computer organization and relevant software issues is given in Chapters 1 through 9. For students who have not had a course in logic circuits, the material in Appendix A should be studied near the beginning of a course and certainly prior to covering Chapter 5. A course aimed at embedded systems should include Chapters 1, 2, 3, 4, 7, 8, 10 and 11. Use of the material on commercial processor examples in Appendices B through E can be guided by instructor and student interest, as well as by relevance to any hardware laboratory associated with a course. Acknowledgments We wish to express our thanks to many people who have helped us during the preparation of this sixth edition of the book. Our colleagues Daniel Etiemble of University of Paris South and Glenn Gulak of Uni- versity of Toronto provided numerous comments and suggestions that helped significantly in shaping the material. Blair Fort and Dan Vranesic provided valuable help with some of the programming examples. Warren R. Carithers of Rochester Institute of Technology, Krishna M. Kavi of Uni- versity of North Texas, and Nelson Luiz Passos of Midwestern State University provided reviews of material from both the fifth and sixth editions of the book. The following people provided reviews of material from the fifth edition of the book: Goh Hock Ann of Multimedia University, Joseph E. Beaini of University of Colorado Denver, Kalyan Mohan Goli of Jawaharlal Nehru Technological University, Jaimon Jacob of Model Engineering College Ernakulam, M. Kumaresan of Anna University Coimbatore, December 15, 2010 09:21 ham_338065_pref Sheet number 5 Page number xiii cyan black Preface xiii Kenneth K. C. Lee of City University of Hong Kong, Manoj Kumar Mishra of Institute of Technical Education and Research, Junita Mohamad-Saleh of Universiti Sains Malaysia, Prashanta Kumar Patra of College of Engineering and Technology Bhubaneswar, Shanq- Jang Ruan of National Taiwan University of Science and Technology, S. D. Samantaray of G. B. Pant University of Agriculture and Technology, Shivakumar Sastry of University of Akron, Donatella Sciuto of Politecnico of Milano, M. P. Singh of National Institute of Technology Patna, Albert Starling of University of Arkansas, Shannon Tauro of University of California Irvine, R. Thangarajan of Kongu Engineering College, Ashok Kunar Turuk of National Institute of Technology Rourkela, and Philip A. Wilsey of University of Cincinnati. Finally, we truly appreciate the support of Raghothaman Srinivasan, Peter E. Massar, Darlene M. Schueller, Lisa Bruflodt, Curt Reynolds, Brenda Rolwes, and Laura Fuller at McGraw-Hill. Carl Hamacher Zvonko Vranesic Safwat Zaky Naraig Manjikian December 15, 2010 12:12 ham_338065_extrafm Sheet number 1 Page number xiv cyan black McGraw-Hill CreateTM Craft your teaching resources to match the way you teach! With McGraw-Hill Create, www.mcgrawhillcreate.com, you can easily rearrange chapters, com- bine material from other content sources, and quickly upload content you have written like your course syllabus or teaching notes. Find the content you need in Create by search- ing through thousands of leading McGraw-Hill textbooks. Arrange your book to fit your teaching style. Create even allows you to personalize your book’s appearance by selecting the cover and adding your name, school, and course information. Order a Create book and you’ll receive a complimentary print review copy in 3-5 business days or a complimentary electronic review copy (eComp) via email in minutes. Go to www.mcgrawhillcreate.com today and register to experience how McGraw-Hill Create empowers you to teach your students your way. McGraw-Hill Higher Education and Blackboard® have teamed up. Blackboard, the Web-based course management system, has partnered with McGraw-Hill to better allow students and faculty to use online materials and activities to complement face-to-face teaching. Blackboard features exciting social learning and teaching tools that foster more logical, visually impactful and active learning opportunities for students. You’ll transform your closed-door classrooms into communities where students remain connected to their educational experience 24 hours a day. This partnership allows you and your students access to McGraw-Hill’s Create right from within your Blackboard course - all with one single sign-on. McGraw-Hill and Blackboard can now offer you easy access to industry leading technology and content, whether your campus hosts it, or we do. Be sure to ask your local McGraw-Hill representative for details. December 16, 2010 09:28 ham_338065_toc Sheet number 1 Page number xv cyan black Contents Chapter 1 2.3 Instructions and Instruction Sequencing 32 Basic Structure of 2.3.1 Register Transfer Notation 33 2.3.2 Assembly-Language Notation 33 Computers 1 2.3.3 RISC and CISC Instruction Sets 34 1.1 Computer Types 2 2.3.4 Introduction to RISC Instruction 1.2 Functional Units 3 Sets 34 2.3.5 Instruction Execution and Straight-Line 1.2.1 Input Unit 4 Sequencing 36 1.2.2 Memory Unit 4 2.3.6 Branching 37 1.2.3 Arithmetic and Logic Unit 5 2.3.7 Generating Memory Addresses 40 1.2.4 Output Unit 6 1.2.5 Control Unit 6 2.4 Addressing Modes 40 2.4.1 Implementation of Variables and 1.3 Basic Operational Concepts 7 Constants 41 1.4 Number Representation and Arithmetic 2.4.2 Indirection and Pointers 42 Operations 9 2.4.3 Indexing and Arrays 45 1.4.1 Integers 10 2.5 Assembly Language 48 1.4.2 Floating-Point Numbers 16 2.5.1 Assembler Directives 50 1.5 Character Representation 17 2.5.2 Assembly and Execution of 1.6 Performance 17 Programs 53 1.6.1 Technology 17 2.5.3 Number Notation 54 1.6.2 Parallelism 19 2.6 Stacks 55 1.7 Historical Perspective 19 2.7 Subroutines 56 1.7.1 The First Generation 20 2.7.1 Subroutine Nesting and the Processor 1.7.2 The Second Generation 20 Stack 58 1.7.3 The Third Generation 21 2.7.2 Parameter Passing 59 1.7.4 The Fourth Generation 21 2.7.3 The Stack Frame 63 1.8 Concluding Remarks 22 2.8 Additional Instructions 65 1.9 Solved Problems 22 2.8.1 Logic Instructions 67 Problems 24 2.8.2 Shift and Rotate Instructions 68 References 25 2.8.3 Multiplication and Division 71 2.9 Dealing with 32-Bit Immediate Values 73 Chapter 2 2.10 CISC Instruction Sets 74 2.10.1 Additional Addressing Modes 75 Instruction Set 2.10.2 Condition Codes 77 Architecture 27 2.11 RISC and CISC Styles 78 2.1 Memory Locations and Addresses 28 2.12 Example Programs 79 2.1.1 Byte Addressability 30 2.12.1 Vector Dot Product Program 79 2.1.2 Big-Endian and Little-Endian 2.12.2 String Search Program 81 Assignments 30 2.13 Encoding of Machine Instructions 82 2.1.3 Word Alignment 31 2.14 Concluding Remarks 85 2.1.4 Accessing Numbers and Characters 32 2.15 Solved Problems 85 2.2 Memory Operations 32 Problems 90 xv December 16, 2010 09:28 ham_338065_toc Sheet number 2 Page number xvi cyan black xvi Contents Chapter 3 Chapter 5 Basic Input/Output 95 Basic Processing Unit 151 3.1 Accessing I/O Devices 96 5.1 Some Fundamental Concepts 152 3.1.1 I/O Device Interface 97 5.2 Instruction Execution 155 3.1.2 Program-Controlled I/O 97 5.2.1 Load Instructions 155 3.1.3 An Example of a RISC-Style I/O 5.2.2 Arithmetic and Logic Instructions 156 Program 101 5.2.3 Store Instructions 157 3.1.4 An Example of a CISC-Style I/O 5.3 Hardware Components 158 Program 101 5.3.1 Register File 158 3.2 Interrupts 103 5.3.2 ALU 160 3.2.1 Enabling and Disabling Interrupts 106 5.3.3 Datapath 161 3.2.2 Handling Multiple Devices 107 5.3.4 Instruction Fetch Section 164 3.2.3 Controlling I/O Device Behavior 109 5.4 Instruction Fetch and Execution Steps 165 3.2.4 Processor Control Registers 110 5.4.1 Branching 168 3.2.5 Examples of Interrupt Programs 111 5.4.2 Waiting for Memory 171 3.2.6 Exceptions 116 5.5 Control Signals 172 3.3 Concluding Remarks 119 5.6 Hardwired Control 175 3.4 Solved Problems 119 5.6.1 Datapath Control Signals 177 Problems 126 5.6.2 Dealing with Memory Delay 177 5.7 CISC-Style Processors 178 Chapter 4 5.7.1 An Interconnect using Buses 180 Software 129 5.7.2 Microprogrammed Control 183 5.8 Concluding Remarks 185 4.1 The Assembly Process 130 5.9 Solved Problems 185 4.1.1 Two-pass Assembler 131 Problems 188 4.2 Loading and Executing Object Programs 131 4.3 The Linker 132 Chapter 6 4.4 Libraries 133 4.5 The Compiler 133 Pipelining 193 4.5.1 Compiler Optimizations 134 6.1 Basic Concept—The Ideal Case 194 4.5.2 Combining Programs Written in 6.2 Pipeline Organization 195 Different Languages 134 6.3 Pipelining Issues 196 4.6 The Debugger 134 6.4 Data Dependencies 197 4.7 Using a High-level Language for I/O 6.4.1 Operand Forwarding 198 Tasks 137 6.4.2 Handling Data Dependencies in 4.8 Interaction between Assembly Language and Software 199 C Language 139 6.5 Memory Delays 201 4.9 The Operating System 143 6.6 Branch Delays 202 4.9.1 The Boot-strapping Process 144 6.6.1 Unconditional Branches 202 4.9.2 Managing the Execution of Application 6.6.2 Conditional Branches 204 Programs 144 6.6.3 The Branch Delay Slot 204 4.9.3 Use of Interrupts in Operating 6.6.4 Branch Prediction 205 Systems 146 6.7 Resource Limitations 209 4.10 Concluding Remarks 149 6.8 Performance Evaluation 209 Problems 149 6.8.1 Effects of Stalls and Penalties 210 References 150 6.8.2 Number of Pipeline Stages 212 December 16, 2010 09:28 ham_338065_toc Sheet number 3 Page number xvii cyan black Contents xvii 6.9 Superscalar Operation 212 8.2.4 Synchronous DRAMs 276 6.9.1 Branches and Data Dependencies 214 8.2.5 Structure of Larger Memories 279 6.9.2 Out-of-Order Execution 215 8.3 Read-only Memories 282 6.9.3 Execution Completion 216 8.3.1 ROM 283 6.9.4 Dispatch Operation 217 8.3.2 PROM 283 6.10 Pipelining in CISC Processors 218 8.3.3 EPROM 284 6.10.1 Pipelining in ColdFire Processors 219 8.3.4 EEPROM 284 6.10.2 Pipelining in Intel Processors 219 8.3.5 Flash Memory 284 6.11 Concluding Remarks 220 8.4 Direct Memory Access 285 6.12 Examples of Solved Problems 220 8.5 Memory Hierarchy 288 Problems 222 8.6 Cache Memories 289 References 226 8.6.1 Mapping Functions 291 8.6.2 Replacement Algorithms 296 Chapter 7 8.6.3 Examples of Mapping Techniques 297 Input/Output Organization 227 8.7 Performance Considerations 300 8.7.1 Hit Rate and Miss Penalty 301 7.1 Bus Structure 228 8.7.2 Caches on the Processor Chip 302 7.2 Bus Operation 229 8.7.3 Other Enhancements 303 7.2.1 Synchronous Bus 230 8.8 Virtual Memory 305 7.2.2 Asynchronous Bus 233 8.8.1 Address Translation 306 7.2.3 Electrical Considerations 236 8.9 Memory Management Requirements 310 7.3 Arbitration 237 8.10 Secondary Storage 311 7.4 Interface Circuits 238 8.10.1 Magnetic Hard Disks 311 7.4.1 Parallel Interface 239 8.10.2 Optical Disks 317 7.4.2 Serial Interface 243 8.10.3 Magnetic Tape Systems 322 7.5 Interconnection Standards 247 8.11 Concluding Remarks 323 7.5.1 Universal Serial Bus (USB) 247 7.5.2 FireWire 251 8.12 Solved Problems 324 7.5.3 PCI Bus 252 Problems 328 7.5.4 SCSI Bus 256 References 332 7.5.5 SATA 258 7.5.6 SAS 258 Chapter 9 7.5.7 PCI Express 258 Arithmetic 335 7.6 Concluding Remarks 260 9.1 Addition and Subtraction of Signed 7.7 Solved Problems 260 Numbers 336 Problems 263 9.1.1 Addition/Subtraction Logic Unit 336 References 266 9.2 Design of Fast Adders 339 Chapter 8 9.2.1 Carry-Lookahead Addition 340 9.3 Multiplication of Unsigned Numbers 344 The Memory System 267 9.3.1 Array Multiplier 344 8.1 Basic Concepts 268 9.3.2 Sequential Circuit Multiplier 346 8.2 Semiconductor RAM Memories 270 9.4 Multiplication of Signed Numbers 346 8.2.1 Internal Organization of Memory 9.4.1 The Booth Algorithm 348 Chips 270 9.5 Fast Multiplication 351 8.2.2 Static Memories 271 9.5.1 Bit-Pair Recoding of Multipliers 352 8.2.3 Dynamic RAMs 274 9.5.2 Carry-Save Addition of Summands 353 December 16, 2010 09:28 ham_338065_toc Sheet number 4 Page number xviii cyan black xviii Contents 9.5.3 Summand Addition Tree using 3-2 Chapter 11 Reducers 355 System-on-a-Chip—A Case 9.5.4 Summand Addition Tree using 4-2 Reducers 357 Study 421 9.5.5 Summary of Fast Multiplication 359 11.1 FPGA Implementation 422 9.6 Integer Division 360 11.1.1 FPGA Devices 423 9.7 Floating-Point Numbers and Operations 363 11.1.2 Processor Choice 423 9.7.1 Arithmetic Operations on 11.2 Computer-Aided Design Tools 424 Floating-Point Numbers 367 11.2.1 Altera CAD Tools 425 9.7.2 Guard Bits and Truncation 368 11.3 Alarm Clock Example 428 9.7.3 Implementing Floating-Point 11.3.1 User’s View of the System 428 Operations 369 11.3.2 System Definition and Generation 429 9.8 Decimal-to-Binary Conversion 372 11.3.3 Circuit Implementation 430 9.9 Concluding Remarks 372 11.3.4 Application Software 431 9.10 Solved Problems 374 11.4 Concluding Remarks 440 Problems 377 Problems 440 References 383 References 441 Chapter 10 Chapter 12 Embedded Systems 385 Parallel Processing and 10.1 Examples of Embedded Systems 386 Performance 443 10.1.1 Microwave Oven 386 10.1.2 Digital Camera 387 12.1 Hardware Multithreading 444 10.1.3 Home Telemetry 390 12.2 Vector (SIMD) Processing 445 10.2 Microcontroller Chips for Embedded 12.2.1 Graphics Processing Units (GPUs) 448 Applications 390 12.3 Shared-Memory Multiprocessors 448 10.3 A Simple Microcontroller 392 12.3.1 Interconnection Networks 450 10.3.1 Parallel I/O Interface 392 12.4 Cache Coherence 453 10.3.2 Serial I/O Interface 395 12.4.1 Write-Through Protocol 453 10.3.3 Counter/Timer 397 12.4.2 Write-Back protocol 454 10.3.4 Interrupt-Control Mechanism 399 12.4.3 Snoopy Caches 454 10.3.5 Programming Examples 399 12.4.4 Directory-Based Cache Coherence 456 10.4 Reaction Timer—A Complete Example 401 12.5 Message-Passing Multicomputers 456 10.5 Sensors and Actuators 407 12.6 Parallel Programming for 10.5.1 Sensors 407 Multiprocessors 456 10.5.2 Actuators 410 12.7 Performance Modeling 460 10.5.3 Application Examples 411 12.8 Concluding Remarks 461 10.6 Microcontroller Families 412 Problems 462 10.6.1 Microcontrollers Based on the Intel References 463 8051 413 10.6.2 Freescale Microcontrollers 413 Appendix A 10.6.3 ARM Microcontrollers 414 10.7 Design Issues 414 Logic Circuits 465 10.8 Concluding Remarks 417 A.1 Basic Logic Functions Problems 418 A.1.1 Electronic Logic Gates 469 References 420 A.2 Synthesis of Logic Functions 470 December 16, 2010 09:28 ham_338065_toc Sheet number 5 Page number xix cyan black Contents xix A.3 Minimization of Logic Expressions 472 B.4.4 Logic Instructions 537 A.3.1 Minimization using Karnaugh Maps 475 B.4.5 Move Instructions 537 A.3.2 Don’t-Care Conditions 477 B.4.6 Branch and Jump Instructions 538 A.4 Synthesis with NAND and NOR Gates 479 B.4.7 Subroutine Linkage Instructions 541 A.5 Practical Implementation of Logic Gates 482 B.4.8 Comparison Instructions 545 A.5.1 CMOS Circuits 484 B.4.9 Shift Instructions 546 A.5.2 Propagation Delay 489 B.4.10 Rotate Instructions 547 A.5.3 Fan-In and Fan-Out Constraints 490 B.4.11 Control Instructions 548 A.5.4 Tri-State Buffers 491 B.5 Pseudoinstructions 548 A.6 Flip-Flops 492 B.6 Assembler Directives 549 A.6.1 Gated Latches 493 B.7 Carry and Overflow Detection 551 A.6.2 Master-Slave Flip-Flop 495 B.8 Example Programs 553 A.6.3 Edge Triggering 498 B.9 Control Registers 553 A.6.4 T Flip-Flop 498 B.10 Input/Output 555 A.6.5 JK Flip-Flop 499 B.10.1 Program-Controlled I/O 556 A.6.6 Flip-Flops with Preset and Clear 501 B.10.2 Interrupts and Exceptions 556 A.7 Registers and Shift Registers 502 B.11 Advanced Configurations of Nios II A.8 Counters 503 Processor 562 A.9 Decoders 505 B.11.1 External Interrupt Controller 562 A.10 Multiplexers 506 B.11.2 Memory Management Unit 562 A.11 Programmable Logic Devices (PLDs) 509 B.11.3 Floating-Point Hardware 562 A.11.1 Programmable Logic Array (PLA) 509 B.12 Concluding Remarks 563 A.11.2 Programmable Array Logic (PAL) 511 B.13 Solved Problems 563 A.11.3 Complex Programmable Logic Devices Problems 568 (CPLDs) 512 A.12 Field-Programmable Gate Arrays 514 Appendix C A.13 Sequential Circuits 516 The ColdFire Processor 571 A.13.1 Design of an Up/Down Counter as a Sequential Circuit 516 C.1 Memory Organization 572 A.13.2 Timing Diagrams 519 C.2 Registers 572 A.13.3 The Finite State Machine Model 520 C.3 Instructions 573 A.13.4 Synthesis of Finite State Machines 521 C.3.1 Addressing Modes 575 A.14 Concluding Remarks 522 C.3.2 Move Instruction 577 Problems 522 C.3.3 Arithmetic Instructions 578 C.3.4 Branch and Jump Instructions 582 References 528 C.3.5 Logic Instructions 585 Appendix B C.3.6 Shift Instructions 586 C.3.7 Subroutine Linkage Instructions 587 The Altera Nios II Processor 529 C.4 Assembler Directives 593 C.5 Example Programs 594 B.1 Nios II Characteristics 530 C.5.1 Vector Dot Product Program 594 B.2 General-Purpose Registers 531 C.5.2 String Search Program 595 B.3 Addressing Modes 532 C.6 Mode of Operation and Other Control B.4 Instructions 533 Features 596 B.4.1 Notation 533 C.7 Input/Output 597 B.4.2 Load and Store Instructions 534 C.8 Floating-Point Operations 599 B.4.3 Arithmetic Instructions 536 C.8.1 FMOVE Instruction 599 December 16, 2010 09:28 ham_338065_toc Sheet number 6 Page number xx cyan black xx Contents C.8.2 Floating-Point Arithmetic D.9 Conditional Execution of Instructions 648 Instructions 600 D.10 Coprocessors 650 C.8.3 Comparison and Branch D.11 Embedded Applications and the Thumb Instructions 601 ISA 651 C.8.4 Additional Floating-Point D.12 Concluding Remarks 651 Instructions 601 D.13 Solved Problems 652 C.8.5 Example Floating-Point Program 602 Problems 657 C.9 Concluding Remarks 603 References 660 C.10 Solved Problems 603 Problems 608 Appendix E References 609 The Intel IA-32 Architecture Appendix D 661 The ARM Processor 611 E.1 Memory Organization 662 E.2 Register Structure 662 D.1 ARM Characteristics 612 D.1.1 Unusual Aspects of the ARM E.3 Addressing Modes 665 Architecture 612 E.4 Instructions 668 D.2 Register Structure 613 E.4.1 Machine Instruction Format 670 E.4.2 Assembly-Language Notation 670 D.3 Addressing Modes 614 E.4.3 Move Instruction 671 D.3.1 Basic Indexed Addressing Mode 614 E.4.4 Load-Effective-Address Instruction 671 D.3.2 Relative Addressing Mode 615 E.4.5 Arithmetic Instructions 672 D.3.3 Index Modes with Writeback 616 E.4.6 Jump and Loop Instructions 674 D.3.4 Offset Determination 616 E.4.7 Logic Instructions 677 D.3.5 Register, Immediate, and Absolute E.4.8 Shift and Rotate Instructions 678 Addressing Modes 618 E.4.9 Subroutine Linkage Instructions 679 D.3.6 Addressing Mode Examples 618 E.4.10 Operations on Large Numbers 681 D.4 Instructions 621 D.4.1 Load and Store Instructions 621 E.5 Assembler Directives 685 D.4.2 Arithmetic Instructions 622 E.6 Example Programs 686 D.4.3 Move Instructions 625 E.6.1 Vector Dot Product Program 686 D.4.4 Logic and Test Instructions 626 E.6.2 String Search Program 686 D.4.5 Compare Instructions 627 E.7 Interrupts and Exceptions 687 D.4.6 Setting Condition Code Flags 628 E.8 Input/Output Examples 689 D.4.7 Branch Instructions 628 E.9 Scalar Floating-Point Operations 690 D.4.8 Subroutine Linkage Instructions 631 E.9.1 Load and Store Instructions 692 D.5 Assembly Language 635 E.9.2 Arithmetic Instructions 693 D.5.1 Pseudoinstructions 637 E.9.3 Comparison Instructions 694 D.6 Example Programs 638 E.9.4 Additional Instructions 694 D.6.1 Vector Dot Product 639 E.9.5 Example Floating-Point Program 694 D.6.2 String Search 639 E.10 Multimedia Extension (MMX) D.7 Operating Modes and Exceptions 639 Operations 695 D.7.1 Banked Registers 641 E.11 Vector (SIMD) Floating-Point D.7.2 Exception Types 642 Operations 696 D.7.3 System Mode 644 E.12 Examples of Solved Problems 697 D.7.4 Handling Exceptions 644 E.13 Concluding Remarks 702 D.8 Input/Output 646 Problems 702 D.8.1 Program-Controlled I/O 646 References 703 D.8.2 Interrupt-Driven I/O 648 December 10, 2010 11:03 ham_338065_ch01 Sheet number 1 Page number 1 cyan black c h a p t e r 1 Basic Structure of Computers Chapter Objectives In this chapter you will be introduced to: The different types of computers The basic structure of a computer and its operation Machine instructions and their execution Number and character representations Addition and subtraction of binary numbers Basic performance issues in computer systems A brief history of computer development 1 December 10, 2010 11:03 ham_338065_ch01 Sheet number 2 Page number 2 cyan black 2 CHAPTER 1 Basic Structure of Computers This book is about computer organization. It explains the function and design of the various units of digital computers that store and process information. It also deals with the input units of the computer which receive information from external sources and the output units which send computed results to external destinations. The input, storage, processing, and output operations are governed by a list of instructions that constitute a program. Most of the material in the book is devoted to computer hardware and computer architecture. Computer hardware consists of electronic circuits, magnetic and optical storage devices, displays, electromechanical devices, and communication facilities. Computer architecture encompasses the specification of an instruction set and the functional behavior of the hardware units that implement the instructions. Many aspects of programming and software components in computer systems are also discussed in the book. It is important to consider both hardware and software aspects of the design of the various computer components in order to gain a good understanding of computer systems. 1.1 Computer Types Since their introduction in the 1940s, digital computers have evolved into many different types that vary widely in size, cost, computational power, and intended use. Modern computers can be divided roughly into four general categories: Embedded computers are integrated into a larger device or system in order to automat- ically monitor and control a physical process or environment. They are used for a specific purpose rather than for general processing tasks. Typical applications include industrial and home automation, appliances, telecommunication products, and vehicles. Users may not even be aware of the role that computers play in such systems. Personal computers have achieved widespread use in homes, educational institu- tions, and business and engineering office settings, primarily for dedicated individual use. They support a variety of applications such as general computation, document preparation, computer-aided design, audiovisual entertainment, interpersonal communication, and In- ternet browsing. A number of classifications are used for personal computers. Desktop computers serve general needs and fit within a typical personal workspace. Workstation computers offer higher computational capacity and more powerful graphical display ca- pabilities for engineering and scientific work. Finally, Portable and Notebook computers provide the basic features of a personal computer in a smaller lightweight package. They can operate on batteries to provide mobility. Servers and Enterprise systems are large computers that are meant to be shared by a potentially large number of users who access them from some form of personal computer over a public or private network. Such computers may host large databases and provide information processing for a government agency or a commercial organization. Supercomputers and Grid computers normally offer the highest performance. They are the most expensive and physically the largest category of computers. Supercomputers are used for the highly demanding computations needed in weather forecasting, engineering design and simulation, and scientific work. They have a high cost. Grid computers provide a more cost-effective alternative. They combine a large number of personal computers and December 10, 2010 11:03 ham_338065_ch01 Sheet number 3 Page number 3 cyan black 1.2 Functional Units 3 disk storage units in a physically distributed high-speed network, called a grid, which is managed as a coordinated computing resource. By evenly distributing the computational workload across the grid, it is possible to achieve high performance on large applications ranging from numerical computation to information searching. There is an emerging trend in access to computing facilities, known as cloud com- puting. Personal computer users access widely distributed computing and storage server resources for individual, independent, computing needs. The Internet provides the neces- sary communication facility. Cloud hardware and software service providers operate as a utility, charging on a pay-as-you-use basis. 1.2 Functional Units A computer consists of five functionally independent main parts: input, memory, arithmetic and logic, output, and control units, as shown in Figure 1.1. The input unit accepts coded information from human operators using devices such as keyboards, or from other comput- ers over digital communication lines. The information received is stored in the computer’s memory, either for later use or to be processed immediately by the arithmetic and logic unit. The processing steps are specified by a program that is also stored in the memory. Finally, the results are sent back to the outside world through the output unit. All of these actions are coordinated by the control unit. An interconnection network provides the means for the functional units to exchange information and coordinate their actions. Later chapters will provide more details on individual units and their interconnections. We refer to the Memory Arithmetic Input and logic Interconnection network Output Control I/O Processor Figure 1.1 Basic functional units of a computer. December 10, 2010 11:03 ham_338065_ch01 Sheet number 4 Page number 4 cyan black 4 CHAPTER 1 Basic Structure of Computers arithmetic and logic circuits, in conjunction with the main control circuits, as the processor. Input and output equipment is often collectively referred to as the input-output (I/O) unit. We now take a closer look at the information handled by a computer. It is conve- nient to categorize this information as either instructions or data. Instructions, or machine instructions, are explicit commands that Govern the transfer of information within a computer as well as between the computer and its I/O devices Specify the arithmetic and logic operations to be performed A program is a list of instructions which performs a task. Programs are stored in the memory. The processor fetches the program instructions from the memory, one after another, and performs the desired operations. The computer is controlled by the stored program, except for possible external interruption by an operator or by I/O devices connected to it. Data are numbers and characters that are used as operands by the instructions. Data are also stored in the memory. The instructions and data handled by a computer must be encoded in a suitable format. Most present-day hardware employs digital circuits that have only two stable states. Each instruction, number, or character is encoded as a string of binary digits called bits, each having one of two possible values, 0 or 1, represented by the two stable states. Numbers are usually represented in positional binary notation, as discussed in Section 1.4. Alphanumeric characters are also expressed in terms of binary codes, as discussed in Section 1.5. 1.2.1 Input Unit Computers accept coded information through input units. The most common input device is the keyboard. Whenever a key is pressed, the corresponding letter or digit is automatically translated into its corresponding binary code and transmitted to the processor. Many other kinds of input devices for human-computer interaction are available, in- cluding the touchpad, mouse, joystick, and trackball. These are often used as graphic input devices in conjunction with displays. Microphones can be used to capture audio input which is then sampled and converted into digital codes for storage and processing. Similarly, cameras can be used to capture video input. Digital communication facilities, such as the Internet, can also provide input to a computer from other computers and database servers. 1.2.2 Memory Unit The function of the memory unit is to store programs and data. There are two classes of storage, called primary and secondary. Primary Memory Primary memory, also called main memory, is a fast memory that operates at electronic speeds. Programs must be stored in this memory while they are being executed. The December 10, 2010 11:03 ham_338065_ch01 Sheet number 5 Page number 5 cyan black 1.2 Functional Units 5 memory consists of a large number of semiconductor storage cells, each capable of storing one bit of information. These cells are rarely read or written individually. Instead, they are handled in groups of fixed size called words. The memory is organized so that one word can be stored or retrieved in one basic operation. The number of bits in each word is referred to as the word length of the computer, typically 16, 32, or 64 bits. To provide easy access to any word in the memory, a distinct address is associated with each word location. Addresses are consecutive numbers, starting from 0, that identify successive locations. A particular word is accessed by specifying its address and issuing a control command to the memory that starts the storage or retrieval process. Instructions and data can be written into or read from the memory under the control of the processor. It is essential to be able to access any word location in the memory as quickly as possible. A memory in which any location can be accessed in a short and fixed amount of time after specifying its address is called a random-access memory (RAM). The time required to access one word is called the memory access time. This time is independent of the location of the word being accessed. It typically ranges from a few nanoseconds (ns) to about 100 ns for current RAM units. Cache Memory As an adjunct to the main memory, a smaller, faster RAM unit, called a cache, is used to hold sections of a program that are currently being executed, along with any associated data. The cache is tightly coupled with the processor and is usually contained on the same integrated-circuit chip. The purpose of the cache is to facilitate high instruction execution rates. At the start of program execution, the cache is empty. All program instructions and any required data are stored in the main memory. As execution proceeds, instructions are fetched into the processor chip, and a copy of each is placed in the cache. When the execution of an instruction requires data located in the main memory, the data are fetched and copies are also placed in the cache. Now, suppose a number of instructions are executed repeatedly as happens in a program loop. If these instructions are available in the cache, they can be fetched quickly during the period of repeated use. Similarly, if the same data locations are accessed repeatedly while copies of their contents are available in the cache, they can be fetched quickly. Secondary Storage Although primary memory is essential, it tends to be expensive and does not retain in- formation when power is turned off. Thus additional, less expensive, permanent secondary storage is used when large amounts of data and many programs have to be stored, particu- larly for information that is accessed infrequently. Access times for secondary storage are longer than for primary memory. A wide selection of secondary storage devices is available, including magnetic disks, optical disks (DVD and CD), and flash memory devices. 1.2.3 Arithmetic and Logic Unit Most computer operations are executed in the arithmetic and logic unit (ALU) of the processor. Any arithmetic or logic operation, such as addition, subtraction, multiplication, December 10, 2010 11:03 ham_338065_ch01 Sheet number 6 Page number 6 cyan black 6 CHAPTER 1 Basic Structure of Computers division, or comparison of numbers, is initiated by bringing the required operands into the processor, where the operation is performed by the ALU. For example, if two numbers located in the memory are to be added, they are brought into the processor, and the addition is carried out by the ALU. The sum may then be stored in the memory or retained in the processor for immediate use. When operands are brought into the processor, they are stored in high-speed storage elements called registers. Each register can store one word of data. Access times to registers are even shorter than access times to the cache unit on the processor chip. 1.2.4 Output Unit The output unit is the counterpart of the input unit. Its function is to send processed results to the outside world. A familiar example of such a device is a printer. Most printers employ either photocopying techniques, as in laser printers, or ink jet streams. Such printers may generate output at speeds of 20 or more pages per minute. However, printers are mechanical devices, and as such are quite slow compared to the electronic speed of a processor. Some units, such as graphic displays, provide both an output function, showing text and graphics, and an input function, through touchscreen capability. The dual role of such units is the reason for using the single name input/output (I/O) unit in many cases. 1.2.5 Control Unit The memory, arithmetic and logic, and I/O units store and process information and perform input and output operations. The operation of these units must be coordinated in some way. This is the responsibility of the control unit. The control unit is effectively the nerve center that sends control signals to other units and senses their states. I/O transfers, consisting of input and output operations, are controlled by program instructions that identify the devices involved and the information to be transferred. Control circuits are responsible for generating the timing signals that govern the transfers and determine when a given action is to take place. Data transfers between the processor and the memory are also managed by the control unit through timing signals. It is reasonable to think of a control unit as a well-defined, physically separate unit that interacts with other parts of the computer. In practice, however, this is seldom the case. Much of the control circuitry is physically distributed throughout the computer. A large set of control lines (wires) carries the signals used for timing and synchronization of events in all units. The operation of a computer can be summarized as follows: The computer accepts information in the form of programs and data through an input unit and stores it in the memory. Information stored in the memory is fetched under program control into an arithmetic and logic unit, where it is processed. Processed information leaves the computer through an output unit. All activities in the computer are directed by the control unit. December 10, 2010 11:03 ham_338065_ch01 Sheet number 7 Page number 7 cyan black 1.3 Basic Operational Concepts 7 1.3 Basic Operational Concepts In Section 1.2, we stated that the activity in a computer is governed by instructions. To perform a given task, an appropriate program consisting of a list of instructions is stored in the memory. Individual instructions are brought from the memory into the processor, which executes the specified operations. Data to be used as instruction operands are also stored in the memory. A typical instruction might be Load R2, LOC This instruction reads the contents of a memory location whose address is represented symbolically by the label LOC and loads them into processor register R2. The original contents of location LOC are preserved, whereas those of register R2 are overwritten. Execution of this instruction requires several steps. First, the instruction is fetched from the memory into the processor. Next, the operation to be performed is determined by the control unit. The operand at LOC is then fetched from the memory into the processor. Finally, the operand is stored in register R2. After operands have been loaded from memory into processor registers, arithmetic or logic operations can be performed on them. For example, the instruction Add R4, R2, R3 adds the contents of registers R2 and R3, then places their sum into register R4. The operands in R2 and R3 are not altered, but the previous value in R4 is overwritten by the sum. After completing the desired operations, the results are in processor registers. They can be transferred to the memory using instructions such as Store R4, LOC This instruction copies the operand in register R4 to memory location LOC. The original contents of location LOC are overwritten, but those of R4 are preserved. For Load and Store instructions, transfers between the memory and the processor are initiated by sending the address of the desired memory location to the memory unit and asserting the appropriate control signals. The data are then transferred to or from the memory. Figure 1.2 shows how the memory and the processor can be connected. It also shows some components of the processor that have not been discussed yet. The interconnections between these components are not shown explicitly since we will only discuss their func- tional characteristics here. Chapter 5 describes the details of the interconnections as part of processor organization. In addition to the ALU and the control circuitry, the processor contains a number of registers used for several different purposes. The instruction register (IR) holds the instruction that is currently being executed. Its output is available to the control circuits, which generate the timing signals that control the various processing elements involved in executing the instruction. The program counter (PC) is another specialized register. It December 10, 2010 11:03 ham_338065_ch01 Sheet number 8 Page number 8 cyan black 8 CHAPTER 1 Basic Structure of Computers Main memory Processor-memory interface PC R0 Control R1 Processor IR ALU R n–1 n general purpose registers Figure 1.2 Connection between the processor and the main memory. contains the memory address of the next instruction to be fetched and executed. During the execution of an instruction, the contents of the PC are updated to correspond to the address of the next instruction to be executed. It is customary to say that the PC points to the next instruction that is to be fetched from the memory. In addition to the IR and PC, Figure 1.2 shows general-purpose registers R0 through Rn−1 , often called processor registers. They serve a variety of functions, including holding operands that have been loaded from the memory for processing. The roles of the general-purpose registers are explained in detail in Chapter 2. The processor-memory interface is a circuit which manages the transfer of data between the main memory and the processor. If a word is to be read from the memory, the interface sends the address of that word to the memory along with a Read control signal. The interface waits for the word to be retrieved, then transfers it to the appropriate processor register. If a word is to be written into memory, the interface transfers both the address and the word to the memory along with a Write control signal. Let us now consider some typical operating steps. A program must be in the main memory in order for it to be executed. It is often transferred there from secondary storage through the input unit. Execution of the program begins when the PC is set to point to the December 10, 2010 11:03 ham_338065_ch01 Sheet number 9 Page number 9 cyan black 1.4 Number Representation and Arithmetic Operations 9 first instruction of the program. The contents of the PC are transferred to the memory along with a Read control signal. When the addressed word (in this case, the first instruction of the program) has been fetched from the memory it is loaded into register IR. At this point, the instruction is ready to be interpreted and executed. Instructions such as Load, Store, and Add perform data transfer and arithmetic opera- tions. If an operand that resides in the memory is required for an instruction, it is fetched by sending its address to the memory and initiating a Read operation. When the operand has been fetched from the memory, it is transferred to a processor register. After operands have been fetched in this way, the ALU can perform a desired arithmetic operation, such as Add, on the values in processor registers. The result is sent to a processor register. If the result is to be written into the memory with a Store instruction, it is transferred from the processor register to the memory, along with the address of the location where the result is to be stored, then a Write operation is initiated. At some point during the execution of each instruction, the contents of the PC are incremented so that the PC points to the next instruction to be executed. Thus, as soon as the execution of the current instruction is completed, the processor is ready to fetch a new instruction. In addition to transferring data between the memory and the processor, the computer accepts data from input devices and sends data to output devices. Thus, some machine instructions are provided for the purpose of handling I/O transfers. Normal execution of a program may be preempted if some device requires urgent service. For example, a monitoring device in a computer-controlled industrial process may detect a dangerous condition. In order to respond immediately, execution of the current program must be suspended. To cause this, the device raises an interrupt signal, which is a request for service by the processor. The processor provides the requested service by executing a program called an interrupt-service routine. Because such diversions may alter the internal state of the processor, its state must be saved in the memory before servicing the interrupt request. Normally, the information that is saved includes the contents of the PC, the contents of the general-purpose registers, and some control information. When the interrupt-service routine is completed, the state of the processor is restored from the memory so that the interrupted program may continue. This section has provided an overview of the operation of a computer. Detailed dis- cussion of these concepts is given in subsequent chapters, first from the point of view of the programmer in Chapters 2, 3, and 4, and then from the point of view of the hardware designer in later chapters. 1.4 Number Representation and Arithmetic Operations The most natural way to represent a number in a computer system is by a string of bits, called a binary number. We will first describe binary number representations for integers as well as arithmetic operations on them. Then we will provide a brief introduction to the representation of floating-point numbers. December 10, 2010 11:03 ham_338065_ch01 Sheet number 10 Page number 10 cyan black 10 CHAPTER 1 Basic Structure of Computers 1.4.1 Integers Consider an n-bit vector B = bn−1... b1 b0 where bi = 0 or 1 for 0 ≤ i ≤ n − 1. This vector can represent an unsigned integer value V (B) in the range 0 to 2n − 1, where V (B) = bn−1 × 2n−1 + · · · + b1 × 21 + b0 × 20 We need to represent both positive and negative numbers. Three systems are used for representing such numbers: Sign-and-magnitude 1’s-complement 2’s-complement In all three systems, the leftmost bit is 0 for positive numbers and 1 for negative numbers. Figure 1.3 illustrates all three representations using 4-bit numbers. Positive values have identical representations in all systems, but negative values have different representations. In the sign-and-magnitude system, negative values are represented by changing the most B Values represented Sign and b3 b2 b1 b0 magnitude 1’s complement 2’s complement 0 1 1 1 +7 +7 +7 0 1 1 0 +6 +6 +6 0 1 0 1 +5 +5 +5 0 1 0 0 +4 +4 +4 0 0 1 1 +3 +3 +3 0 0 1 0 +2 +2 +2 0 0 0 1 +1 +1 +1 0 0 0 0 +0 +0 +0 1 0 0 0 –0 –7 –8 1 0 0 1 –1 –6 –7 1 0 1 0 –2 –5 –6 1 0 1 1 –3 –4 –5 1 1 0 0 –4 –3 –4 1 1 0 1 –5 –2 –3 1 1 1 0 –6 –1 –2 1 1 1 1 –7 –0 –1 Figure 1.3 Binary, signed-integer representations. December 10, 2010 11:03 ham_338065_ch01 Sheet number 11 Page number 11 cyan black 1.4 Number Representation and Arithmetic Operations 11 significant bit (b3 in Figure 1.3) from 0 to 1 in the B vector of the corresponding positive value. For example, +5 is represented by 0101, and −5 is represented by 1101. In 1’s-complement representation, negative values are obtained by complementing each bit of the corresponding positive number. Thus, the representation for −3 is obtained by complementing each bit in the vector 0011 to yield 1100. The same operation, bit complementing, is done to convert a negative number to the corresponding positive value. Converting either way is referred to as forming the 1’s-complement of a given number. For n-bit numbers, this operation is equivalent to subtracting the number from 2n − 1. In the case of the 4-bit numbers in Figure 1.3, we subtract from 24 − 1 = 15, or 1111 in binary. Finally, in the 2’s-complement system, forming the 2’s-complement of an n-bit number is done by subtracting the number from 2n. Hence, the 2’s-complement of a number is obtained by adding 1 to the 1’s-complement of that number. Note that there are distinct representations for +0 and −0 in both the sign-and- magnitude and 1’s-complement systems, but the 2’s-complement system has only one rep- resentation for 0. For 4-bit numbers, as shown in Figure 1.3, the value −8 is representable in the 2’s-complement system but not in the other systems. The sign-and-magnitude sys- tem seems the most natural, because we deal with sign-and-magnitude decimal values in manual computations. The 1’s-complement system is easily related to this system, but the 2’s-complement system may appear somewhat unnatural. However, we will show that the 2’s-complement system leads to the most efficient way to carry out addition and subtraction operations. It is the one most often used in modern computers. Addition of Unsigned Integers Addition of 1-bit numbers is illustrated in Figure 1.4. The sum of 1 and 1 is the 2-bit vector 10, which represents the value 2. We say that the sum is 0 and the carry-out is 1. In order to add multiple-bit numbers, we use a method analogous to that used for manual computation with decimal numbers. We add bit pairs starting from the low-order (right) end of the bit vectors, propagating carries toward the high-order (left) end. The carry-out from a bit pair becomes the carry-in to the next bit pair to the left. The carry-in must be added to a bit pair in generating the sum and carry-out at that position. For example, if both bits of a pair are 1 and the carry-in is 1, then the sum is 1 and the carry-out is 1, which represents the value 3. 0 1 0 1 + 0 + 0 + 1 + 1 0 1 1 10 Carry-out Figure 1.4 Addition of 1-bit numbers. December 10, 2010 11:03 ham_338065_ch01 Sheet number 12 Page number 12 cyan black 12 CHAPTER 1 Basic Structure of Computers Addition and Subtraction of Signed Integers We introduced three systems for representing positive and negative numbers, or, simply, signed numbers. These systems differ only in the way they represent negative values. Their relative merits from the standpoint of ease of performing arithmetic operations can be summarized as follows. The sign-and-magnitude system is the simplest representation, but it is also the most awkward for addition and subtraction operations. The 1’s-complement method is somewhat better. The 2’s-complement system is the most efficient method for performing addition and subtraction operations. To understand 2’s-complement arithmetic, consider addition modulo N (abbreviated as mod N ). A helpful graphical device for the description of addition of unsigned integers mod N is a circle with the values 0 through N − 1 marked along its perimeter, as shown in Figure 1.5a. Consider the case N = 16, shown in part (b) of the figure. The decimal values 0 through 15 are represented by their 4-bit binary values 0000 through 1111 around the outside of the circle. In terms of decimal values, the operation (7 + 5) mod 16 yields the value 12. To perform this operation graphically, locate 7 (0111) on the outside of the circle and then move 5 units in the clockwise direction to arrive at the answer 12 (1100). Similarly, (9 + 14) mod 16 = 7; this is modeled on the circle by locating 9 (1001) and moving 14 units in the clockwise direction past the zero position to arrive at the answer 7 (0111). This graphical technique works for the computation of (a + b) mod 16 for any unsigned integers a and b; that is, to perform addition, locate a and move b units in the clockwise direction to arrive at (a + b) mod 16.