Computer Organization and Architecture PDF
Summary
This document is an excerpt from the lecture slides for Computer Organization and Architecture: Designing for Performance, 11th Edition (Global Edition). It covers Chapter 1 (basic concepts and computer evolution, including structure and function, the IAS computer, Moore's law, the Intel x86 line, embedded systems, and ARM), Chapter 2 (performance concepts, including Amdahl's law, Little's law, means, and SPEC benchmarks), and the beginning of Chapter 3 (a top-level view of computer function and interconnection).
Full Transcript
Computer Organization and Architecture: Designing for Performance, 11th Edition, Global Edition
Chapter 1: Basic Concepts and Computer Evolution
Copyright © 2022 Pearson Education, Ltd. All Rights Reserved

Computer Architecture and Computer Organization
- Computer architecture: attributes of a system visible to the programmer, which have a direct impact on the logical execution of a program. Architectural attributes include the instruction set, the number of bits used to represent various data types, I/O mechanisms, and techniques for addressing memory.
- Computer organization: the operational units and their interconnections that realize the architectural specifications. Organizational attributes include hardware details transparent to the programmer, control signals, interfaces between the computer and peripherals, and the memory technology used.

IBM System/370 Architecture
- Introduced in 1970 and included a number of models.
- A customer could upgrade to a more expensive, faster model without having to abandon the original software.
- New models are introduced with improved technology but retain the same architecture, so the customer's software investment is protected.
- The architecture has survived to this day as the architecture of IBM's mainframe product line.

Structure and Function
- A computer is a hierarchical system: a set of interrelated subsystems.
- The hierarchical nature of complex systems is essential to both their design and their description: the designer need only deal with a particular level of the system at a time, concerned with structure and function at each level.
- Structure: the way in which components relate to each other.
- Function: the operation of individual components as part of the structure.

Function
There are four basic functions that a computer can perform:
- Data processing: data may take a wide variety of forms, and the range of processing requirements is broad.
- Data storage: short-term and long-term.
- Data movement: input-output (I/O) when data are received from or delivered to a device (peripheral) that is directly connected to the computer; data communications when data are moved over longer distances, to or from a remote device.
- Control: a control unit manages the computer's resources and orchestrates the performance of its functional parts in response to instructions.

Figure 1.1 The Computer: Top-Level Structure (a top-down view: computer with I/O, main memory, system bus, and CPU; within the CPU: registers, ALU, internal bus, control unit; within the control unit: sequencing logic, control unit registers and decoders, control memory)

Structure
There are four main structural components of the computer:
- CPU: controls the operation of the computer and performs its data processing functions.
- Main memory: stores data.
- I/O: moves data between the computer and its external environment.
- System interconnection: some mechanism that provides for communication among CPU, main memory, and I/O.
CPU
Major structural components:
- Control unit: controls the operation of the CPU and hence the computer.
- Arithmetic and logic unit (ALU): performs the computer's data processing function.
- Registers: provide storage internal to the CPU.
- CPU interconnection: some mechanism that provides for communication among the control unit, ALU, and registers.

Multicore Computer Structure
- Central processing unit (CPU): the portion of the computer that fetches and executes instructions. It consists of an ALU, a control unit, and registers. In a system with a single processing unit, it is referred to as a processor.
- Core: an individual processing unit on a processor chip. A core may be equivalent in functionality to a CPU on a single-CPU system; specialized processing units are also referred to as cores.
- Processor: a physical piece of silicon containing one or more cores. The processor is the computer component that interprets and executes instructions. A processor that contains multiple cores is referred to as a multicore processor.

Cache Memory
- Multiple layers of memory between the processor and main memory.
- Smaller and faster than main memory.
- Used to speed up memory access by placing in the cache data from main memory that is likely to be used in the near future.
- A greater performance improvement may be obtained by using multiple levels of cache, with level 1 (L1) closest to the core and additional levels (L2, L3, and so on) progressively farther from the core.

Figure 1.2 Simplified View of Major Elements of a Multicore Computer (motherboard with main memory chips, I/O chips, and a processor chip; processor chip with multiple cores and L3 cache; each core with arithmetic and logic unit (ALU), instruction logic, load/store logic, L1 instruction cache, L1 data cache, L2 instruction cache, and L2 data cache)

Figure 1.3

Figure 1.4

History of Computers: First Generation: Vacuum Tubes
- Vacuum tubes were used for digital logic elements and memory.
- IAS computer:
  - The fundamental design approach was the stored-program concept, attributed to the mathematician John von Neumann; the first publication of the idea was in 1945 for the EDVAC.
  - Design began at the Princeton Institute for Advanced Studies and was completed in 1952.
  - The prototype of all subsequent general-purpose computers.

Figure 1.5

Figure 1.6

Registers of the IAS machine
- Memory buffer register (MBR): contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit.
- Memory address register (MAR): specifies the address in memory of the word to be written from or read into the MBR.
- Instruction register (IR): contains the 8-bit opcode of the instruction being executed.
- Instruction buffer register (IBR): employed to temporarily hold the right-hand instruction from a word in memory.
- Program counter (PC): contains the address of the next instruction pair to be fetched from memory.
- Accumulator (AC) and multiplier quotient (MQ): employed to temporarily hold operands and results of ALU operations.
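To make the roles of these registers concrete, here is a minimal fetch-step sketch in Python (an illustration, not the textbook's code). It assumes 40-bit memory words that hold two 20-bit instructions, each an 8-bit opcode plus a 12-bit address, as described above; the detail of when the IBR is bypassed is simplified.

```python
# Minimal sketch of the IAS fetch step (illustrative; simplified control flow).
# A 40-bit word holds two 20-bit instructions: 8-bit opcode + 12-bit address each.

class IASRegisters:
    def __init__(self):
        self.PC = 0      # address of the next instruction pair to fetch
        self.MAR = 0     # memory address for the next read or write
        self.MBR = 0     # 40-bit word read from / written to memory
        self.IBR = None  # buffered right-hand 20-bit instruction, if any
        self.IR = 0      # 8-bit opcode of the instruction being executed
        self.AC = 0      # accumulator
        self.MQ = 0      # multiplier-quotient register

def fetch(regs, memory):
    """Return (opcode, address) of the next instruction, updating the registers."""
    if regs.IBR is not None:                # right-hand instruction already buffered
        instr, regs.IBR = regs.IBR, None
        regs.PC += 1                        # move on to the next instruction pair
    else:
        regs.MAR = regs.PC
        regs.MBR = memory[regs.MAR]         # read a full 40-bit word
        instr = (regs.MBR >> 20) & 0xFFFFF  # left-hand 20-bit instruction
        regs.IBR = regs.MBR & 0xFFFFF       # buffer the right-hand instruction
    regs.IR = (instr >> 12) & 0xFF          # 8-bit opcode
    address = instr & 0xFFF                 # 12-bit address field
    return regs.IR, address
```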
Figure 1.7

Table 1.1 The IAS Instruction Set

Data transfer:
- 00001010 LOAD MQ: transfer contents of register MQ to the accumulator AC
- 00001001 LOAD MQ,M(X): transfer contents of memory location X to MQ
- 00100001 STOR M(X): transfer contents of accumulator to memory location X
- 00000001 LOAD M(X): transfer M(X) to the accumulator
- 00000010 LOAD -M(X): transfer -M(X) to the accumulator
- 00000011 LOAD |M(X)|: transfer the absolute value of M(X) to the accumulator
- 00000100 LOAD -|M(X)|: transfer -|M(X)| to the accumulator

Unconditional branch:
- 00001101 JUMP M(X,0:19): take the next instruction from the left half of M(X)
- 00001110 JUMP M(X,20:39): take the next instruction from the right half of M(X)

Conditional branch:
- 00001111 JUMP+ M(X,0:19): if the number in the accumulator is nonnegative, take the next instruction from the left half of M(X)
- 00010000 JUMP+ M(X,20:39): if the number in the accumulator is nonnegative, take the next instruction from the right half of M(X)

Arithmetic:
- 00000101 ADD M(X): add M(X) to AC; put the result in AC
- 00000111 ADD |M(X)|: add |M(X)| to AC; put the result in AC
- 00000110 SUB M(X): subtract M(X) from AC; put the result in AC
- 00001000 SUB |M(X)|: subtract |M(X)| from AC; put the remainder in AC
- 00001011 MUL M(X): multiply M(X) by MQ; put the most significant bits of the result in AC and the least significant bits in MQ
- 00001100 DIV M(X): divide AC by M(X); put the quotient in MQ and the remainder in AC
- 00010100 LSH: multiply the accumulator by 2; that is, shift left one bit position
- 00010101 RSH: divide the accumulator by 2; that is, shift right one bit position

Address modify:
- 00010010 STOR M(X,8:19): replace the left address field at M(X) by the 12 rightmost bits of AC
- 00010011 STOR M(X,28:39): replace the right address field at M(X) by the 12 rightmost bits of AC

Figure 1.8

Integrated Circuits
- A computer consists of gates, memory cells, and interconnections among these elements; the gates and memory cells are constructed of simple digital electronic components.
- Data storage is provided by memory cells; data processing is provided by gates; data movement uses the paths among components to move data from memory to memory and from memory through gates to memory; control uses the paths among components to carry control signals.
- Integrated circuits exploit the fact that components such as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon.
- Many transistors can be produced at the same time on a single wafer of silicon, and the transistors can be connected with a process of metallization to form circuits.

Transistors
- The fundamental building block of digital circuits, used to construct processors, memories, and other digital logic devices.
- The active part of the transistor is made of silicon or some other semiconductor material that can change its electrical state when pulsed: in its normal state the material may be nonconductive or conductive, and the transistor changes its state when a voltage is applied to the gate.
- Discrete component: a single, self-contained transistor. Discrete components were manufactured separately, packaged in their own containers, and soldered or wired together onto Masonite-like circuit boards.

Figure 1.9
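Returning to the IAS instruction set in Table 1.1 above, the sketch below (an illustration, not the book's code) shows how a few of the data transfer, arithmetic, and branch opcodes could be interpreted. The opcode values are taken from the table; the register and memory model is deliberately simplified (plain Python integers, no 40-bit wraparound, no left/right instruction packing).

```python
# Illustrative interpreter for a handful of IAS opcodes from Table 1.1.

LOAD_M  = 0b00000001   # LOAD M(X):  AC <- M(X)
ADD_M   = 0b00000101   # ADD M(X):   AC <- AC + M(X)
SUB_M   = 0b00000110   # SUB M(X):   AC <- AC - M(X)
STOR_M  = 0b00100001   # STOR M(X):  M(X) <- AC
JUMP_PLUS_RIGHT = 0b00010000   # JUMP+ M(X,20:39): branch if AC is nonnegative

def execute(opcode, x, state, memory):
    """Execute one instruction; state is a dict holding 'AC' and 'PC'."""
    if opcode == LOAD_M:
        state['AC'] = memory[x]
    elif opcode == ADD_M:
        state['AC'] += memory[x]
    elif opcode == SUB_M:
        state['AC'] -= memory[x]
    elif opcode == STOR_M:
        memory[x] = state['AC']
    elif opcode == JUMP_PLUS_RIGHT and state['AC'] >= 0:
        state['PC'] = x            # simplified: branch target, halves not modeled
    else:
        raise NotImplementedError(f"opcode {opcode:08b} not covered in this sketch")

# Example: add M(100) and M(101) and store the sum in M(102).
memory = {100: 3, 101: 2, 102: 0}
state = {'AC': 0, 'PC': 0}
for op, x in [(LOAD_M, 100), (ADD_M, 101), (STOR_M, 102)]:
    execute(op, x, state, memory)
assert memory[102] == 5
```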
Figure 1.10

Figure 1.11

Moore's Law
- In 1965, Gordon Moore, co-founder of Intel, observed that the number of transistors that could be put on a single chip was doubling every year.
- The pace slowed to a doubling every 18 months in the 1970s but has been sustained at that rate ever since.
- Consequences of Moore's law:
  - The cost of computer logic and memory circuitry has fallen at a dramatic rate.
  - The electrical path length is shortened, increasing operating speed.
  - The computer becomes smaller and is more convenient to use in a variety of environments.
  - Reduction in power and cooling requirements.
  - Fewer interchip connections.

Figure 1.12

Evolution of Intel Microprocessors

(a) 1970s processors

| | 4004 | 8008 | 8080 | 8086 | 8088 |
|---|---|---|---|---|---|
| Introduced | 1971 | 1972 | 1974 | 1978 | 1979 |
| Clock speeds | 108 kHz | 108 kHz | 2 MHz | 5 MHz, 8 MHz, 10 MHz | 5 MHz, 8 MHz |
| Bus width | 4 bits | 8 bits | 8 bits | 16 bits | 8 bits |
| Number of transistors | 2,300 | 3,500 | 6,000 | 29,000 | 29,000 |
| Feature size (µm) | 10 | 8 | 6 | 3 | 6 |
| Addressable memory | 640 bytes | 16 KB | 64 KB | 1 MB | 1 MB |

(b) 1980s processors

| | 80286 | 386 DX | 386 SX | 486 DX CPU |
|---|---|---|---|---|
| Introduced | 1982 | 1985 | 1988 | 1989 |
| Clock speeds | 6–12.5 MHz | 16–33 MHz | 16–33 MHz | 25–50 MHz |
| Bus width | 16 bits | 32 bits | 16 bits | 32 bits |
| Number of transistors | 134,000 | 275,000 | 275,000 | 1.2 million |
| Feature size (µm) | 1.5 | 1 | 1 | 0.8–1 |
| Addressable memory | 16 MB | 4 GB | 16 MB | 4 GB |
| Virtual memory | 1 GB | 64 TB | 64 TB | 64 TB |
| Cache | – | – | – | 8 kB |

(c) 1990s processors

| | 486 SX | Pentium | Pentium Pro | Pentium II |
|---|---|---|---|---|
| Introduced | 1991 | 1993 | 1995 | 1997 |
| Clock speeds | 16–33 MHz | 60–166 MHz | 150–200 MHz | 200–300 MHz |
| Bus width | 32 bits | 32 bits | 64 bits | 64 bits |
| Number of transistors | 1.185 million | 3.1 million | 5.5 million | 7.5 million |
| Feature size (µm) | 1 | 0.8 | 0.6 | 0.35 |
| Addressable memory | 4 GB | 4 GB | 64 GB | 64 GB |
| Virtual memory | 64 TB | 64 TB | 64 TB | 64 TB |
| Cache | 8 kB | 8 kB | 512 kB L1 and 1 MB L2 | 512 kB L2 |

(d) Recent processors

| | Pentium III | Pentium 4 | Core 2 Duo | Core i7 EE 4960X | Core i9-7900X |
|---|---|---|---|---|---|
| Introduced | 1999 | 2000 | 2006 | 2013 | 2017 |
| Clock speeds | 450–660 MHz | 1.3–1.8 GHz | 1.06–1.2 GHz | 4 GHz | 4.3 GHz |
| Bus width | 64 bits | 64 bits | 64 bits | 64 bits | 64 bits |
| Number of transistors | 9.5 million | 42 million | 167 million | 1.86 billion | 7.2 billion |
| Feature size (nm) | 250 | 180 | 65 | 22 | 14 |
| Addressable memory | 64 GB | 64 GB | 64 GB | 64 GB | 128 GB |
| Virtual memory | 64 TB | 64 TB | 64 TB | 64 TB | 64 TB |
| Cache | 512 kB L2 | 256 kB L2 | 2 MB L2 | 1.5 MB L2 / 15 MB L3 | 14 MB L3 |
| Number of cores | 1 | 1 | 2 | 6 | 10 |
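As a rough check on the doubling rate quoted above, the fragment below (illustrative arithmetic, not from the slides) projects a transistor count forward at one doubling every 18 months and compares it with two data points from the tables: the 8086 (29,000 transistors, 1978) and the Core i9-7900X (7.2 billion, 2017).

```python
# Rough Moore's-law projection: one doubling every 18 months.
def project(start_count, start_year, end_year, months_per_doubling=18):
    doublings = (end_year - start_year) * 12 / months_per_doubling
    return start_count * 2 ** doublings

projected = project(29_000, 1978, 2017)           # start from the 8086
print(f"projected 2017 count: {projected:.3g}")   # about 1.9e12
print(f"actual Core i9-7900X: {7_200_000_000:.3g}")
# The projection overshoots by a couple of orders of magnitude, which simply
# reflects that the actual doubling interval has been longer than 18 months
# over parts of this period.
```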
Highlights of the Evolution of the Intel Product Line (1 of 2)
- 8080: the world's first general-purpose microprocessor; an 8-bit machine with an 8-bit data path to memory; used in the first personal computer (the Altair).
- 8086: a more powerful 16-bit machine; has an instruction cache, or queue, that prefetches a few instructions before they are executed; the first appearance of the x86 architecture. The 8088 was a variant of this processor and was used in IBM's first personal computer, securing the success of Intel.
- 80286: an extension of the 8086 enabling addressing of a 16-MB memory instead of just 1 MB.
- 80386: Intel's first 32-bit machine and the first Intel processor to support multitasking.
- 80486: introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining; also offered a built-in math coprocessor.

Highlights of the Evolution of the Intel Product Line (2 of 2)
- Pentium: Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel.
- Pentium Pro: continued the move into superscalar organization with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution.
- Pentium II: incorporated Intel MMX technology, designed specifically to process video, audio, and graphics data efficiently.
- Pentium III: incorporated additional floating-point instructions, the Streaming SIMD Extensions (SSE).
- Pentium 4: includes additional floating-point and other enhancements for multimedia.
- Core: the first Intel x86 microprocessor with a dual core.
- Core 2: extends the Core architecture to 64 bits; Core 2 Quad provides four cores on a single chip. More recent Core offerings have up to 10 cores per chip. An important addition to the architecture was the Advanced Vector Extensions instruction set.

Embedded Systems
- The use of electronics and software within a product.
- Billions of computer systems are produced each year that are embedded within larger devices; today many devices that use electric power have an embedded computing system.
- Embedded systems are often tightly coupled to their environment. This can give rise to real-time constraints imposed by the need to interact with the environment: constraints such as required speeds of motion, required precision of measurement, and required time durations dictate the timing of software operations.
- If multiple activities must be managed simultaneously, this imposes more complex real-time constraints.

Figure 1.13
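To illustrate the kind of timing constraint described above, here is a minimal sketch (purely illustrative; the 10 ms period and the sensor/actuator calls are assumed placeholders, not anything from the slides) of a fixed-period control loop that flags a missed deadline.

```python
import time

PERIOD = 0.010  # assumed 10 ms control period, for illustration only

def read_sensor():       # placeholder for a real device read
    return 0.0

def drive_actuator(v):   # placeholder for a real device write
    pass

def control_loop(iterations=100):
    next_deadline = time.monotonic() + PERIOD
    for _ in range(iterations):
        drive_actuator(read_sensor())            # work that must fit in the period
        remaining = next_deadline - time.monotonic()
        if remaining < 0:
            print("deadline missed by", -remaining, "s")  # a real system must handle this
        else:
            time.sleep(remaining)                # wait out the rest of the period
        next_deadline += PERIOD

control_loop()
```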
The Internet of Things (IoT)
- A term that refers to the expanding interconnection of smart devices, ranging from appliances to tiny sensors.
- Primarily driven by deeply embedded devices.
- Generations of deployment culminating in the IoT:
  - Information technology (IT): PCs, servers, routers, firewalls, and so on, bought as IT devices by enterprise IT people and primarily using wired connectivity.
  - Operational technology (OT): machines/appliances with embedded IT built by non-IT companies, such as medical machinery, SCADA, process control, and kiosks, bought as appliances by enterprise OT people and primarily using wired connectivity.
  - Personal technology: smartphones, tablets, and eBook readers bought as IT devices by consumers, exclusively using wireless connectivity and often multiple forms of wireless connectivity.
  - Sensor/actuator technology: single-purpose devices bought by consumers, IT, and OT people, exclusively using wireless connectivity, generally of a single form, as part of larger systems.
- It is the fourth generation that is usually thought of as the IoT, and it is marked by the use of billions of embedded devices.

Embedded Operating Systems
There are two general approaches to developing an embedded operating system (OS):
- Take an existing OS and adapt it for the embedded application.
- Design and implement an OS intended solely for embedded use.

Application Processors versus Dedicated Processors
- Application processors: defined by the processor's ability to execute complex operating systems; general-purpose in nature. An example is the smartphone: the embedded system is designed to support numerous apps and perform a wide variety of functions.
- Dedicated processors: dedicated to one or a small number of specific tasks required by the host device. Because such an embedded system is dedicated to a specific task or tasks, the processor and associated components can be engineered to reduce size and cost.

Figure 1.14

Deeply Embedded Systems
- A subset of embedded systems.
- Has a processor whose behavior is difficult to observe both by the programmer and by the user.
- Uses a microcontroller rather than a microprocessor.
- Is not programmable once the program logic for the device has been burned into ROM.
- Has no interaction with a user.
- Dedicated, single-purpose devices that detect something in the environment, perform a basic level of processing, and then do something with the results.
- Often have wireless capability and appear in networked configurations, such as networks of sensors deployed over a large area.
- Typically have extreme resource constraints in terms of memory, processor size, time, and power consumption.

ARM
- Refers to a processor architecture that has evolved from RISC design principles and is used in embedded systems.
- A family of RISC-based microprocessors and microcontrollers designed by ARM Holdings, Cambridge, England.
- ARM chips are high-speed processors known for their small die size and low power requirements.
- Probably the most widely used embedded processor architecture, and indeed the most widely used processor architecture of any kind in the world.
- ARM originally stood for Acorn RISC Machine, later Advanced RISC Machine.
ARM Products
- Cortex-M: Cortex-M0, Cortex-M0+, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33
- Cortex-R
- Cortex-A

Figure 1.15

Summary: Chapter 1, Basic Concepts and Computer Evolution
- Organization and architecture
- Structure and function
- The IAS computer
- Gates, memory cells, chips, and multichip modules: gates and memory cells; transistors; microelectronic chips; multichip module
- The evolution of the Intel x86 architecture
- Embedded systems: the Internet of things; embedded operating systems; application processors versus dedicated processors; microprocessors versus microcontrollers; embedded versus deeply embedded systems
- ARM architecture: ARM evolution; instruction set architecture; ARM products

Computer Organization and Architecture: Designing for Performance, 11th Edition, Global Edition
Chapter 2: Performance Concepts

Designing for Performance
- The cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically.
- Today's laptops have the computing power of an IBM mainframe from 10 or 15 years ago.
- Processors are so inexpensive that we now have microprocessors we throw away.
- Desktop applications that require the great power of today's microprocessor-based systems include image processing, three-dimensional rendering, speech recognition, videoconferencing, multimedia authoring, voice and video annotation of files, and simulation modeling.
- Businesses are relying on increasingly powerful servers to handle transaction and database processing and to support massive client/server networks that have replaced the huge mainframe computer centers of yesteryear.
- Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-transaction-rate applications for a broad spectrum of clients.

Microprocessor Speed
Techniques built into contemporary processors include:
- Pipelining: the processor moves data or instructions into a conceptual pipe, with all stages of the pipe processing simultaneously.
- Branch prediction: the processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.
- Superscalar execution: the ability to issue more than one instruction in every processor clock cycle (in effect, multiple parallel pipelines are used).
- Data flow analysis: the processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions.
- Speculative execution: using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations and keeping execution engines as busy as possible.
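As a back-of-the-envelope illustration of why pipelining helps (a simplified model, not from the slides): if instruction processing is split into k equal stages of one cycle each, n instructions take roughly n*k cycles without pipelining but only k + (n - 1) cycles with an ideal pipeline, because a new instruction enters the pipe every cycle. Hazards, stalls, and branches are ignored here.

```python
# Idealized pipeline timing model (ignores hazards, stalls, and branch effects).
def cycles_unpipelined(n_instructions, k_stages):
    return n_instructions * k_stages

def cycles_pipelined(n_instructions, k_stages):
    return k_stages + (n_instructions - 1)   # fill the pipe once, then one per cycle

n, k = 1_000, 5
print(cycles_unpipelined(n, k))                           # 5000 cycles
print(cycles_pipelined(n, k))                             # 1004 cycles
print(cycles_unpipelined(n, k) / cycles_pipelined(n, k))  # speedup approaching k
```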
Performance Balance
Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components. Architectural examples include:
- Increase the number of bits that are retrieved at one time by making DRAMs "wider" rather than "deeper" and by using wide bus data paths.
- Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory.
- Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip.
- Increase the interconnect bandwidth between processors and memory by using higher-speed buses and a hierarchy of buses to buffer and structure data flow.

Figure 2.1 Typical I/O Device Data Rates (data rates spanning roughly 10^1 to 10^11 bps for keyboard, mouse, scanner, laser printer, optical disc, hard disk, Wi-Fi modem at max speed, graphics display, and Ethernet modem at max speed)

Improvements in Chip Organization and Architecture
- Increase the hardware speed of the processor: fundamentally due to shrinking logic gate size (more gates, packed more tightly, increasing the clock rate; propagation time for signals reduced).
- Increase the size and speed of caches: by dedicating part of the processor chip, cache access times drop significantly.
- Change the processor organization and architecture: increase the effective speed of instruction execution, using parallelism.

Problems with Clock Speed and Logic Density
- Power: power density increases with the density of logic and the clock speed, making it hard to dissipate heat.
- RC delay: the speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them; delay increases as the RC product increases. As components on the chip decrease in size, the wire interconnects become thinner, increasing resistance; the wires are also closer together, increasing capacitance.
- Memory latency and throughput: memory access speed (latency) and transfer speed (throughput) lag processor speeds.

Figure 2.2 Processor Trends (transistors in thousands, frequency in MHz, power in W, and number of cores, plotted from 1970 to 2010)

Multicore
- The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate.
- The strategy is to use two simpler processors on the chip rather than one more complex processor.
- With two processors, larger caches are justified.
- As caches became larger, it made performance sense to create two and then three levels of cache on a chip.

Many Integrated Core (MIC) and Graphics Processing Unit (GPU)
- MIC: a leap in performance, as well as challenges in developing software to exploit such a large number of cores. The multicore and MIC strategy involves a homogeneous collection of general-purpose processors on a single chip.
- GPU: a core designed to perform parallel operations on graphics data. Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as to process video. GPUs are used as vector processors for a variety of applications that require repetitive computations.
Amdahl's Law
- Proposed by Gene Amdahl.
- Deals with the potential speedup of a program using multiple processors compared to a single processor.
- Illustrates the problems facing industry in the development of multicore machines: software must be adapted to a highly parallel execution environment to exploit the power of parallel processing.
- Can be generalized to evaluate and design technical improvement in a computer system.

Figure 2.3 Illustration of Amdahl's Law: with total single-processor time T and a fraction f of the work parallelizable, the time on N processors is (1 - f)T + fT/N, so Speedup = 1 / ((1 - f) + f/N).

Figure 2.4 Amdahl's Law for Multiprocessors (speedup versus number of processors, plotted for f = 0.95, 0.90, 0.75, and 0.5)

Little's Law
- A fundamental and simple relation with broad applications.
- Can be applied to almost any system that is statistically in steady state and in which there is no leakage.
- Queuing system: if the server is idle, an arriving item is served immediately; otherwise it joins a queue. There can be a single queue for a single server or for multiple servers, or multiple queues with one for each of multiple servers.
- The average number of items in a queuing system equals the average rate at which items arrive multiplied by the time that an item spends in the system.
- The relationship requires very few assumptions; because of its simplicity and generality it is extremely useful.

Figure 2.5
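Both laws reduce to one-line formulas. A minimal sketch follows (the formulas restate the slides; the example numbers are chosen only for illustration): Amdahl's speedup for a program in which a fraction f of the work is parallelized across N processors is 1 / ((1 - f) + f/N), and Little's law states that the average number of items in the system equals the arrival rate multiplied by the average time an item spends in the system.

```python
# Amdahl's law: speedup when a fraction f of a program is parallelized over N processors.
def amdahl_speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

# Little's law: average items in the system = arrival rate * average time in system.
def littles_law_items(arrival_rate, time_in_system):
    return arrival_rate * time_in_system

for f in (0.5, 0.75, 0.90, 0.95):                  # the curves plotted in Figure 2.4
    print(f, amdahl_speedup(f, 8), amdahl_speedup(f, 1024))

# e.g. 100 arrivals/s, 50 ms average time in system -> 5 items in the system on average
print(littles_law_items(arrival_rate=100, time_in_system=0.05))
```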
Table 2.1 Performance Factors and System Attributes (Ic = instruction count, p = processor cycles per instruction, m = memory references per instruction, k = ratio of memory cycle time to processor cycle time, τ = processor cycle time)

| | Ic | p | m | k | τ |
|---|---|---|---|---|---|
| Instruction set architecture | X | X | | | |
| Compiler technology | X | X | X | | |
| Processor implementation | | X | | | X |
| Cache and memory hierarchy | | | | X | X |

Calculating the Mean
- The use of benchmarks to compare systems involves calculating the mean value of a set of data points related to execution time.
- The three common formulas used for calculating a mean are the arithmetic, geometric, and harmonic means.

Figure 2.6 Comparison of Means on Various Data Sets (each set has a maximum data point value of 11; MD = median, AM = arithmetic mean, GM = geometric mean, HM = harmonic mean). The data sets are: (a) constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11); (b) clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11); (c) uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11); (d) large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11); (e) small-number bias (1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11); (f) upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1); (g) lower outlier (1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11).

Arithmetic Mean
- An arithmetic mean (AM) is an appropriate measure if the sum of all the measurements is a meaningful and interesting value.
- The AM is a good candidate for comparing the execution time performance of several systems. For example, suppose we were interested in using a system for large-scale simulation studies and wanted to evaluate several alternative products. On each system we could run the simulation multiple times with different input values for each run, and then take the average execution time across all runs. The use of multiple runs with different inputs should ensure that the results are not heavily biased by some unusual feature of a given input set. The AM of all the runs is a good measure of the system's performance on simulations and a good number to use for system comparison.
- The AM used for a time-based variable, such as program execution time, has the important property that it is directly proportional to the total time: if the total time doubles, the mean value doubles.

Table 2.2 A Comparison of Arithmetic and Harmonic Means for Rates

| | Computer A time (secs) | Computer B time (secs) | Computer C time (secs) | Computer A rate (MFLOPS) | Computer B rate (MFLOPS) | Computer C rate (MFLOPS) |
|---|---|---|---|---|---|---|
| Program 1 (10^8 FP ops) | 2.0 | 1.0 | 0.75 | 50 | 100 | 133.33 |
| Program 2 (10^8 FP ops) | 0.75 | 2.0 | 4.0 | 133.33 | 50 | 25 |
| Total execution time | 2.75 | 3.0 | 4.75 | – | – | – |
| Arithmetic mean of times | 1.38 | 1.5 | 2.38 | – | – | – |
| Inverse of total execution time (1/sec) | 0.36 | 0.33 | 0.21 | – | – | – |
| Arithmetic mean of rates | – | – | – | 91.67 | 75.00 | 79.17 |
| Harmonic mean of rates | – | – | – | 72.72 | 66.67 | 42.11 |

Table 2.3 A Comparison of Arithmetic and Geometric Means for Normalized Results

(a) Results normalized to Computer A

| | Computer A time | Computer B time | Computer C time |
|---|---|---|---|
| Program 1 | 2.0 (1.0) | 1.0 (0.5) | 0.75 (0.38) |
| Program 2 | 0.75 (1.0) | 2.0 (2.67) | 4.0 (5.33) |
| Total execution time | 2.75 | 3.0 | 4.75 |
| Arithmetic mean of normalized times | 1.00 | 1.58 | 2.85 |
| Geometric mean of normalized times | 1.00 | 1.15 | 1.41 |

(b) Results normalized to Computer B

| | Computer A time | Computer B time | Computer C time |
|---|---|---|---|
| Program 1 | 2.0 (2.0) | 1.0 (1.0) | 0.75 (0.75) |
| Program 2 | 0.75 (0.38) | 2.0 (1.0) | 4.0 (2.0) |
| Total execution time | 2.75 | 3.0 | 4.75 |
| Arithmetic mean of normalized times | 1.19 | 1.00 | 1.38 |
| Geometric mean of normalized times | 0.87 | 1.00 | 1.22 |

Table 2.4 Another Comparison of Arithmetic and Geometric Means for Normalized Results

(a) Results normalized to Computer A

| | Computer A time | Computer B time | Computer C time |
|---|---|---|---|
| Program 1 | 2.0 (1.0) | 1.0 (0.5) | 0.20 (0.1) |
| Program 2 | 0.4 (1.0) | 2.0 (5.0) | 4.0 (10.0) |
| Total execution time | 2.4 | 3.00 | 4.2 |
| Arithmetic mean of normalized times | 1.00 | 2.75 | 5.05 |
| Geometric mean of normalized times | 1.00 | 1.58 | 1.00 |

(b) Results normalized to Computer B

| | Computer A time | Computer B time | Computer C time |
|---|---|---|---|
| Program 1 | 2.0 (2.0) | 1.0 (1.0) | 0.20 (0.2) |
| Program 2 | 0.4 (0.2) | 2.0 (1.0) | 4.0 (2.0) |
| Total execution time | 2.4 | 3.0 | 4.2 |
| Arithmetic mean of normalized times | 1.10 | 1.00 | 1.10 |
| Geometric mean of normalized times | 0.63 | 1.00 | 0.63 |
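The following sketch (illustrative; it simply re-derives a few entries of Tables 2.2 and 2.3 above) shows the arithmetic, harmonic, and geometric means used in these comparisons, computed with Python's standard library.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Computer A from Table 2.2: two programs of 10^8 floating-point operations each.
times_a = [2.0, 0.75]                        # execution times in seconds
rates_a = [100 / t for t in times_a]         # 10^8 FLOPs / t seconds = (100 / t) MFLOPS

print(fmean(times_a))           # 1.375 -> arithmetic mean of times (1.38 in the table)
print(fmean(rates_a))           # 91.67 -> arithmetic mean of rates
print(harmonic_mean(rates_a))   # 72.73 -> harmonic mean of rates

# Computer B's times normalized to Computer A, from Table 2.3(a):
print(geometric_mean([0.5, 2.67]))   # about 1.16 (1.15 in the table)
```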
Benchmark Principles
Desirable characteristics of a benchmark program:
1. It is written in a high-level language, making it portable across different machines.
2. It is representative of a particular kind of programming domain or paradigm, such as systems programming, numerical programming, or commercial programming.
3. It can be measured easily.
4. It has wide distribution.

System Performance Evaluation Corporation (SPEC)
- Benchmark suite: a collection of programs, defined in a high-level language, that together attempt to provide a representative test of a computer in a particular application or system programming area.
- SPEC is an industry consortium that defines and maintains the best-known collection of benchmark suites aimed at evaluating computer systems. SPEC performance measurements are widely used for comparison and research purposes.

SPEC CPU2017
- The best-known SPEC benchmark suite and the industry-standard suite for processor-intensive applications.
- Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O.
- Consists of 20 integer benchmarks and 23 floating-point benchmarks written in C, C++, and Fortran.
- For all of the integer benchmarks and most of the floating-point benchmarks, there are both rate and speed benchmark programs.
- The suite contains over 11 million lines of code.

Table 2.5(A) SPEC CPU2017 integer benchmarks (Kloc = line count, including comments and whitespace, for source files used in a build, divided by 1000)

| Rate | Speed | Language | Kloc | Application Area |
|---|---|---|---|---|
| 500.perlbench_r | 600.perlbench_s | C | 363 | Perl interpreter |
| 502.gcc_r | 602.gcc_s | C | 1304 | GNU C compiler |
| 505.mcf_r | 605.mcf_s | C | 3 | Route planning |
| 520.omnetpp_r | 620.omnetpp_s | C++ | 134 | Discrete event simulation: computer network |
| 523.xalancbmk_r | 623.xalancbmk_s | C++ | 520 | XML to HTML conversion via XSLT |
| 525.x264_r | 625.x264_s | C | 96 | Video compression |
| 531.deepsjeng_r | 631.deepsjeng_s | C++ | 10 | AI: alpha-beta tree search (chess) |
| 541.leela_r | 641.leela_s | C++ | 21 | AI: Monte Carlo tree search (Go) |
| 548.exchange2_r | 648.exchange2_s | Fortran | 1 | AI: recursive solution generator (Sudoku) |
| 557.xz_r | 657.xz_s | C | 33 | General data compression |

Table 2.5(B) SPEC CPU2017 floating-point benchmarks

| Rate | Speed | Language | Kloc | Application Area |
|---|---|---|---|---|
| 503.bwaves_r | 603.bwaves_s | Fortran | 1 | Explosion modeling |
| 507.cactuBSSN_r | 607.cactuBSSN_s | C++, C, Fortran | 257 | Physics; relativity |
| 508.namd_r | | C++, C | 8 | Molecular dynamics |
| 510.parest_r | | C++ | 427 | Biomedical imaging; optical tomography with finite elements |
| 511.povray_r | | C++ | 170 | Ray tracing |
| 519.lbm_r | 619.lbm_s | C | 1 | Fluid dynamics |
| 521.wrf_r | 621.wrf_s | Fortran, C | 991 | Weather forecasting |
| 526.blender_r | | C++ | 1577 | 3D rendering and animation |
| 527.cam4_r | 627.cam4_s | Fortran, C | 407 | Atmosphere modeling |
| | 628.pop2_s | Fortran, C | 338 | Wide-scale ocean modeling (climate level) |
| 538.imagick_r | 638.imagick_s | C | 259 | Image manipulation |
| 544.nab_r | 644.nab_s | C | 24 | Molecular dynamics |
| 549.fotonik3d_r | 649.fotonik3d_s | Fortran | 14 | Computational electromagnetics |
| 554.roms_r | 654.roms_s | Fortran | 210 | Regional ocean modeling |

Table 2.6 SPEC CPU2017 Integer Benchmarks for HP Integrity Superdome X: (a) rate result (768 copies)

| Benchmark | Base Seconds | Base Rate | Peak Seconds | Peak Rate |
|---|---|---|---|---|
| 500.perlbench_r | 1141 | 1070 | 933 | 1310 |
| 502.gcc_r | 1303 | 835 | 1276 | 852 |
| 505.mcf_r | 1433 | 866 | 1378 | 901 |
| 520.omnetpp_r | 1664 | 606 | 1634 | 617 |
| 523.xalancbmk_r | 722 | 1120 | 713 | 1140 |
| 525.x264_r | 655 | 2053 | 661 | 2030 |
| 531.deepsjeng_r | 604 | 1460 | 597 | 1470 |
| 541.leela_r | 892 | 1410 | 896 | 1420 |
| 548.exchange2_r | 833 | 2420 | 770 | 2610 |
| 557.xz_r | 870 | 953 | 863 | 961 |
Table 2.6 SPEC CPU2017 Integer Benchmarks for HP Integrity Superdome X: (b) speed result (384 threads)

| Benchmark | Base Seconds | Base Ratio | Peak Seconds | Peak Ratio |
|---|---|---|---|---|
| 600.perlbench_s | 358 | 4.96 | 295 | 6.01 |
| 602.gcc_s | 546 | 7.29 | 535 | 7.45 |
| 605.mcf_s | 866 | 5.45 | 700 | 6.75 |
| 620.omnetpp_s | 276 | 5.90 | 247 | 6.61 |
| 623.xalancbmk_s | 188 | 7.52 | 179 | 7.91 |
| 625.x264_s | 283 | 6.23 | 271 | 6.51 |
| 631.deepsjeng_s | 407 | 3.52 | 343 | 4.18 |
| 641.leela_s | 469 | 3.63 | 439 | 3.88 |
| 648.exchange2_s | 329 | 8.93 | 299 | 9.82 |
| 657.xz_s | 2164 | 2.86 | 2119 | 2.92 |

Terms Used in SPEC Documentation
- Benchmark: a program written in a high-level language that can be compiled and executed on any computer that implements the compiler.
- System under test: the system to be evaluated.
- Reference machine: a system used by SPEC to establish a baseline performance for all benchmarks. Each benchmark is run and measured on this machine to establish a reference time for that benchmark.
- Base metric: required for all reported results; has strict guidelines for compilation.
- Peak metric: enables users to attempt to optimize system performance by optimizing the compiler output.
- Speed metric: simply a measurement of the time it takes to execute a compiled benchmark; used for comparing the ability of a computer to complete single tasks.
- Rate metric: a measurement of how many tasks a computer can accomplish in a certain amount of time; called a throughput, capacity, or rate measure. It allows the system under test to execute simultaneous tasks to take advantage of multiple processors.

Figure 2.7 SPEC Evaluation Flowchart: start; get the next program; run the program three times; select the median value; compute Ratio(prog) = Tref(prog)/Tsut(prog); if more programs remain, repeat; otherwise compute the geometric mean of all ratios; end.

Table 2.7 SPECspeed 2017_int_base Benchmark Results for the Reference Machine (1 thread)

| Benchmark | Seconds | Energy (kJ) | Average Power (W) | Maximum Power (W) |
|---|---|---|---|---|
| 600.perlbench_s | 1774 | 1920 | 1080 | 1090 |
| 602.gcc_s | 3981 | 4330 | 1090 | 1110 |
| 605.mcf_s | 4721 | 5150 | 1090 | 1120 |
| 620.omnetpp_s | 1630 | 1770 | 1090 | 1090 |
| 623.xalancbmk_s | 1417 | 1540 | 1090 | 1090 |
| 625.x264_s | 1764 | 1920 | 1090 | 1100 |
| 631.deepsjeng_s | 1432 | 1560 | 1090 | 1130 |
| 641.leela_s | 1706 | 1850 | 1090 | 1090 |
| 648.exchange2_s | 2939 | 3200 | 1080 | 1090 |
| 657.xz_s | 6182 | 6730 | 1090 | 1140 |
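Putting the SPEC terminology and the flowchart of Figure 2.7 together, the sketch below (illustrative code, not SPEC's tooling) recomputes a few of the base ratios in Table 2.6(b) from the reference-machine times in Table 2.7, then takes the geometric mean of the ratios, which is how an overall metric is formed.

```python
from statistics import geometric_mean

# Reference-machine times (Table 2.7) and system-under-test base times (Table 2.6(b)), in seconds.
ref_times = {"600.perlbench_s": 1774, "602.gcc_s": 3981, "605.mcf_s": 4721}
sut_times = {"600.perlbench_s": 358,  "602.gcc_s": 546,  "605.mcf_s": 866}

ratios = {b: ref_times[b] / sut_times[b] for b in ref_times}   # Ratio = Tref / Tsut
print(ratios)                           # about 4.96, 7.29, 5.45, matching Table 2.6(b)
print(geometric_mean(ratios.values()))  # overall metric over this (partial) set of benchmarks
```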
Summary: Chapter 2, Performance Concepts
- Designing for performance: microprocessor speed; performance balance; improvements in chip organization and architecture
- Multicore, MICs, GPGPUs
- Amdahl's Law
- Little's Law
- Basic measures of computer performance: clock speed; instruction execution rate
- Calculating the mean: arithmetic mean; harmonic mean; geometric mean
- Benchmark principles
- SPEC benchmarks

Computer Organization and Architecture: Designing for Performance, 11th Edition, Global Edition
Chapter 3: A Top-Level View of Computer Function and Interconnection

Computer Components
- Contemporary computer designs are based on concepts developed by John von Neumann at the Institute for Advanced Studies, Princeton.
- Referred to as the von Neumann architecture, based on three key concepts:
  - Data and instructions are stored in a single read-write memory.
  - The contents of this memory are addressable by location, without regard to the type of data contained there.
  - Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next.
- Hardwired program: the result of the process of connecting the various components in the desired configuration.

Figure 3.1 Hardware and Software Approaches: (a) programming in hardware (data passes through a fixed sequence of arithmetic and logic functions to produce results); (b) programming in software (instruction codes go to an instruction interpreter, which issues control signals to general-purpose arithmetic and logic functions operating on the data).

Software and I/O Components
- Software: a sequence of codes or instructions; part of the hardware interprets each instruction and generates control signals. A new sequence of codes is provided for each new program, instead of rewiring the hardware.
- Major CPU components: an instruction interpreter and a module of general-purpose arithmetic and logic functions.
- I/O components:
  - Input module: contains basic components for accepting data and instructions and converting them into an internal form of signals usable by the system.
  - Output module: a means of reporting results.

Memory and I/O Registers
- Memory address register (MAR): specifies the address in memory for the next read or write.
- Memory buffer register (MBR): contains the data to be written into memory or receives the data read from memory.
- I/O address register (I/OAR): specifies a particular I/O device.
- I/O buffer register (I/OBR): used for the exchange of data between an I/O module and the CPU.

Figure 3.2 Computer Components: Top-Level View (CPU with PC, IR, MAR, MBR, I/O AR, I/O BR, and an execution unit; system bus; main memory holding instructions and data in locations 0 to n-1; I/O module with buffers. PC = program counter, IR = instruction register, MAR = memory address register, MBR = memory buffer register, I/O AR = input/output address register, I/O BR = input/output buffer register.)

Figure 3.3 Basic Instruction Cycle (start; fetch cycle: fetch the next instruction; execute cycle: execute the instruction; halt)

Fetch Cycle
- At the beginning of each instruction cycle the processor fetches an instruction from memory.
- The program counter (PC) holds the address of the instruction to be fetched next.
- The processor increments the PC after each instruction fetch so that it will fetch the next instruction in sequence.
- The fetched instruction is loaded into the instruction register (IR).
- The processor interprets the instruction and performs the required action.
Action Categories
- Processor-memory: data transferred from processor to memory or from memory to processor.
- Processor-I/O: data transferred to or from a peripheral device by transferring between the processor and an I/O module.
- Data processing: the processor may perform some arithmetic or logic operation on data.
- Control: an instruction may specify that the sequence of execution be altered.

Figure 3.4 Characteristics of a Hypothetical Machine: (a) instruction format: 16 bits, with a 4-bit opcode (bits 0-3) and a 12-bit address (bits 4-15); (b) integer format: a sign bit followed by a 15-bit magnitude; (c) internal CPU registers: program counter (PC) = address of instruction, instruction register (IR) = instruction being executed, accumulator (AC) = temporary storage; (d) partial list of opcodes: 0001 = load AC from memory, 0010 = store AC to memory, 0101 = add to AC from memory.

Figure 3.5 Example of Program Execution (contents of memory and registers in hexadecimal): a three-instruction program at locations 300-302 (1940, 5941, 2941) loads the contents of location 940 (0003) into the AC, adds the contents of location 941 (0002) to give 0005 (3 + 2 = 5), and stores the result back into location 941; the six steps show the PC advancing from 300 to 303 with the IR and AC updated accordingly.

Figure 3.6 Instruction Cycle State Diagram (instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, operand fetch, data operation, operand store; with loops for multiple operands, multiple results, and string or vector data, and a return to fetch the next instruction when the current instruction is complete)
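To tie the fetch and execute cycles together, here is a small simulator (an illustration, not from the textbook) of the hypothetical machine of Figures 3.4 and 3.5: 16-bit instructions with a 4-bit opcode and a 12-bit address, and the three opcodes listed in Figure 3.4(d). Running the three-instruction program from Figure 3.5 leaves 0005 in location 941.

```python
# Hypothetical machine of Figures 3.4/3.5: 16-bit words, 4-bit opcode, 12-bit address.
LOAD, STORE, ADD = 0x1, 0x2, 0x5   # opcodes from Figure 3.4(d)

def run(memory, pc):
    ac = 0
    while pc in memory:
        ir = memory[pc]                            # fetch: IR <- M(PC)
        pc += 1                                    # increment the program counter
        opcode, address = ir >> 12, ir & 0xFFF
        if opcode == LOAD:
            ac = memory[address]                   # load AC from memory
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF   # add to AC from memory
        elif opcode == STORE:
            memory[address] = ac                   # store AC to memory
        else:
            break                                  # halt on anything this sketch does not model
    return memory, ac

# Program and data from Figure 3.5 (values in hexadecimal, as in the figure).
mem = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941, 0x940: 0x0003, 0x941: 0x0002}
mem, ac = run(mem, pc=0x300)
assert mem[0x941] == 0x0005 and ac == 0x0005       # 3 + 2 = 5, as in the figure
```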
Table 3.1 Classes of Interrupts
- Program: generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, an attempt to execute an illegal machine instruction, or a reference outside a user's allowed memory space.
- Timer: generated by a timer within the processor; allows the operating system to perform certain functions on a regular basis.
- I/O: generated by an I/O controller to signal normal completion of an operation, to request service from the processor, or to signal a variety of error conditions.
- Hardware failure: generated by a failure such as a power failure or memory parity error.

Figure 3.7 Program Flow of Control Without and With Interrupts: (a) no interrupts; (b) interrupts, short I/O wait; (c) interrupts, long I/O wait (the user program's WRITE calls hand off to an I/O program; with interrupts, the interrupt handler runs when the I/O completes)

Figure 3.8 Transfer of Control via Interrupts (the interrupt occurs after instruction i of the user program; control transfers to the interrupt handler and then returns to instruction i + 1)

Figure 3.9 Instruction Cycle with Interrupts (fetch cycle, execute cycle, and, if interrupts are enabled, an interrupt cycle that checks for and processes pending interrupts before the next fetch)

Figure 3.10 Program Timing: Short I/O Wait (without interrupts the processor waits for each I/O operation; with interrupts the I/O operation proceeds concurrently with processor execution)

Figure 3.11 Program Timing: Long I/O Wait (with interrupts the I/O operation is concurrent with processor execution for part of the time, after which the processor still waits)

Figure 3.12 Instruction Cycle State Diagram, With Interrupts (as Figure 3.6, with an interrupt check after the operand store: if an interrupt is pending it is serviced; otherwise the next instruction is fetched)

Figure 3.13 Transfer of Control with Multiple Interrupts: (a) sequential interrupt processing; (b) nested interrupt processing

Figure 3.14 Example Time Sequence of Multiple Interrupts (a user program interrupted in turn by printer, communication, and disk interrupt service routines)
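The interrupt cycle of Figures 3.9 and 3.12 adds one step to the basic loop: after each execute stage, if interrupts are enabled and one is pending, the processor suspends the current program and transfers control to a handler. Here is a minimal sketch of that control flow (illustrative only; the fetch/execute and handler bodies are placeholders, not real hardware behavior).

```python
from collections import deque

pending_interrupts = deque()        # e.g. filled by I/O modules or a timer
interrupts_enabled = True

def fetch_and_execute(pc):          # placeholder for the fetch and execute cycles
    return pc + 1

def handle_interrupt(source, pc):   # placeholder interrupt handler dispatch
    print("saving PC", hex(pc), "and servicing:", source)
    return pc                       # resume the interrupted program afterwards

def instruction_cycle(pc, steps=10):
    for _ in range(steps):
        pc = fetch_and_execute(pc)                     # fetch cycle + execute cycle
        if interrupts_enabled and pending_interrupts:  # interrupt cycle: check for interrupts
            pc = handle_interrupt(pending_interrupts.popleft(), pc)
    return pc

pending_interrupts.append("I/O: printer operation complete")
instruction_cycle(pc=0x300)
```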
I/O Function
- An I/O module can exchange data directly with the processor.
- The processor can read data from or write data to an I/O module: the processor identifies a specific device that is controlled by a particular I/O module, using I/O instructions rather than memory-referencing instructions.
- In some cases it is desirable to allow I/O exchanges to occur directly with memory:
  - The processor grants to an I/O module the authority to read from or write to memory, so that the I/O-memory transfer can occur without tying up the processor.
  - The I/O module issues read or write commands to memory, relieving the processor of responsibility for the exchange.
  - This operation is known as direct memory access (DMA).

Figure 3.15 Computer Modules (memory with N words at addresses 0 to N-1, with read/write signals, addresses, and data; an I/O module with M ports, internal and external data, read/write and address inputs, and interrupt signals; and a CPU exchanging instructions, data, control signals, and interrupt signals, with addresses and data going out)

The interconnection structure must support the following types of transfers:
- Memory to processor: the processor reads an instruction or a unit of data from memory.
- Processor to memory: the processor writes a unit of data to memory.
- I/O to processor: the processor reads data from an I/O device via an I/O module.
- Processor to I/O: the processor sends data to the I/O device.
- I/O to or from memory: an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access.

Bus Interconnection
- A bus is a communication pathway connecting two or more devices. Its key characteristic is that it is a shared transmission medium: signals transmitted by any one device are available for reception by all other devices attached to the bus. If two devices transmit during the same time period, their signals will overlap and become garbled.
- A bus typically consists of multiple communication lines; each line is capable of transmitting signals representing binary 1 and binary 0.
- Computer systems contain a number of different buses that provide pathways between components at various levels of the computer system hierarchy.
- System bus: a bus that connects major computer components (processor, memory, I/O). The most common computer interconnection structures are based on the use of one or more system buses.

Data Bus
- Data lines provide a path for moving data among system modules; they may consist of 32, 64, 128, or more separate lines.
- The number of lines is referred to as the width of the data bus and determines how many bits can be transferred at a time.
- The width of the data bus is a key factor in determining overall system performance.

Address Bus
- Used to designate the source or destination of the data on the data bus: if the processor wishes to read a word of data from memory, it puts the address of the desired word on the address lines.
- The width of the address bus determines the maximum possible memory capacity of the system.
- Also used to address I/O ports: the higher-order bits are used to select a particular module on the bus, and the lower-order bits select a memory location or I/O port within the module.

Control Bus
- Used to control the access to and use of the data and address lines: because the data and address lines are shared by all components, there must be a means of controlling their use.
- Control signals transmit both command and timing information among system modules: timing signals indicate the validity of data and address information, and command signals specify operations to be performed.

Figure 3.16 Bus Interconnection Scheme (CPU, memory modules, and I/O modules attached to a bus made up of control lines, address lines, and data lines)
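The role of the address and data bus widths can be made concrete with a little arithmetic (the specific numbers below are chosen only for illustration): an n-line address bus can distinguish 2^n locations, so it bounds the physically addressable memory, while the data bus width sets how many bits move per transfer.

```python
def addressable_locations(address_lines):
    return 2 ** address_lines                 # each extra address line doubles the address space

def bus_transfers(total_bits, data_bus_width):
    return -(-total_bits // data_bus_width)   # ceiling division: transfers needed for a block

print(addressable_locations(32))              # 4,294,967,296 locations (4 GB if byte-addressed)
print(bus_transfers(512, 64))                 # a 512-bit block needs 8 transfers on a 64-bit data bus
print(bus_transfers(512, 128))                # but only 4 transfers on a 128-bit data bus
```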
Point-to-Point Interconnect
- The principal reason for the change from shared buses was the electrical constraints encountered with increasing the frequency of wide synchronous buses: at higher and higher data rates it becomes increasingly difficult to perform the synchronization and arbitration functions in a timely fashion.
- A conventional shared bus on the same chip magnified the difficulties of increasing bus data rate and reducing bus latency to keep up with the processors.
- Point-to-point interconnect has lower latency, higher data rate, and better scalability.

Quick Path Interconnect (QPI)
- Introduced in 2008.
- Multiple direct connections: direct pairwise connections to other components, eliminating the need for the arbitration found in shared transmission systems.
- Layered protocol architecture: these processor-level interconnects use a layered protocol architecture rather than the simple use of control signals found in shared bus arrangements.
- Packetized data transfer: data are sent as a sequence of packets, each of which includes control headers and error control codes.

Figure 3.17 (cores A and B, each with attached DRAM, connected to an I/O hub serving I/O devices)