Computer Architecture PDF
Document Details
Uploaded by SufficientOnomatopoeia6591
Cairo University Engineering
2016
Douglas Comer
Tags
Summary
This document provides an introduction to computer architecture, covering terminology, component design, and implementation details. It includes diagrams and truth tables to illustrate the use of transistors and logic gates in digital systems.
Full Transcript
Terminology Used With Digital Systems d Three key ideas – Architecture – Design – Implementation Computer Architecture – Module 1 11 Fall, 2016...
Terminology Used With Digital Systems d Three key ideas – Architecture – Design – Implementation Computer Architecture – Module 1 11 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Architecture d Refers to the overall organization of a computer system d Analogous to a blueprint d Specifies – Functionality of major components – Interconnections among components d Abstracts away details Computer Architecture – Module 1 12 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Design d Needed before a digital system can be built d Translates architecture into components d Fills in details that the architectural specification omits d Specifies items such as – How components are grouped onto boards – How power is distributed to boards d Many designs can satisfy a given architecture Computer Architecture – Module 1 13 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Implementation d All details necessary to build a system d Includes – Specific part numbers to be used – Mechanical specifications of chassis and cases – Layout of components on boards – Power supplies and power distribution – Signal wiring and connectors Computer Architecture – Module 1 14 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Summary d Architecture is required because understanding computer organization leads to programming excellence d This course covers the four essential aspects of computer architecture – Digital logic – Processors – Memory – I/O d You will have fun with hardware in the lab Computer Architecture – Module 1 15 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of A Traditional Transistor C large current flows B from point C to point E small current flows from point B to E E d Amplification means the large output current varies exactly like the small input current Computer Architecture – Module 2 8 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Field Effect Transistor d Called a Metal Oxide Semiconductor FET (MOSFET) when used on a CMOS chip d Three external connections – Source – Gate – Drain d Designed to act as a switch (on or off) – When the input reaches a threshold (i.e., becomes logic 1), the transistor turns on and passes full current – When the input falls below a threshold (i.e., becomes logic 0), the transistor turns off and passes no current Computer Architecture – Module 2 9 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of A Field Effect Transistor (Used For Switching) source turns on current flowing gate from point S to point D non-zero current flowing from point G to D drain d Input arrives at the gate d Logic zero (zero volts) means the transistor is off; logic 1 (positive voltage) turns the transistor on Computer Architecture – Module 2 10 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Alternative Field Effect Transistor (Also Used For Switching) source turns on current flowing gate from point S to point D no current flowing from point G to D drain d Circle on the gate indicates an inversion d Logic 0 (zero volts) turns the transistor on, and logic 1 (positive voltage) turns the transistor off Computer Architecture – Module 2 11 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Boolean Logic d Mathematical basis for digital circuits d Three basic functions: and, or, and not A B A and B A B A or B A not A 0 0 0 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 1 1 1 1 Computer Architecture – Module 2 12 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Digital Logic d Can implement Boolean functions with transistors d Five volts represents Boolean 1 (true) d Zero volts represents Boolean 0 (false) Computer Architecture – Module 2 13 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Transistors Implementing Boolean Not + voltage (called Vdd ) input output 0 volts d When input is zero volts, output is connected to + voltage d When input is five volts, output is connected to 0 volts d Hardware engineers use Vdd to denote positive voltage Computer Architecture – Module 2 14 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Logic Gate d Hardware component d Consists of integrated circuit d Implements an individual Boolean function d To reduce complexity, hardware uses inverse of Boolean functions – Nand gate implements not and – Nor gate implements not or – Inverter implements not Computer Architecture – Module 2 15 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Truth Tables For Nand, Nor, and Xor Gates A B A nand B A B A nor B A B A xor B 0 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 1 1 1 0 1 1 0 0 1 0 1 1 1 0 1 1 0 1 1 0 Computer Architecture – Module 2 16 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Example Of Internal Gate Structure (Nand Gate) + A input output B input – d Solid dot indicates electrical connection Computer Architecture – Module 2 17 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Symbols Used In Schematic Diagrams d Basic gates nand gate nor gate inverter and gate or gate xor gate Computer Architecture – Module 2 18 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Technology For Logic Gates d Most popular technology known as Transistor-Transistor Logic (TTL) d Allows direct interconnection (a wire can connect output from one gate to input of another) d Single output can connect to multiple inputs – Called fanout – Limited to a small number Computer Architecture – Module 2 19 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Example Interconnection Of TTL Gates d Suppose we need a signal to indicate that the power button is depressed and the disk is ready d Two logic gates are needed to form logical and – Output from nand gate connected to input of inverter input from power button output input from disk Computer Architecture – Module 2 20 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Consider The Following Circuit X C output Y A B Z d Question: what does the circuit implement? Computer Architecture – Module 2 21 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Two Ways To Describe A Circuit d Boolean expression – Often used when designing circuit – Can be transformed to equivalent version that requires fewer gates d Truth table – Enumerates inputs and outputs – Often used when debugging a circuit Computer Architecture – Module 2 22 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Describing A Circuit With Boolean Algebra X C output Y A B Z d Value at point A is: not Y d Value at point B is: Z nor (not Y) Computer Architecture – Module 2 23 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Describing A Circuit With Boolean Algebra X C output Y A B Z d Value at point C is: (X nand ((Z nor (not Y)) d Value at output is: X and (Z nor (not Y)) Computer Architecture – Module 2 24 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Simplifying Boolean Expressions d Rules are similar to conventional algebra – Associative – Reflexive – Distributive d See Appendix 2 in the text for details Computer Architecture – Module 2 25 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Describing A Circuit With A Truth Table X Y Z A B C output 0 0 0 1 0 1 0 0 0 1 1 0 1 0 0 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 1 0 0 1 0 1 1 1 1 0 0 1 0 d Table lists all possible inputs and output for each d Can also state values for intermediate points Computer Architecture – Module 2 26 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Nand / Nor Vs. And / Or d Mathematically, nand / nor / not is equivalent to and / or / not d Practically – It is possible to construct and and or gates – Sometimes, humans find and and or operations easier to understand d Example circuit or truth table output can be described by Boolean expression: X and Y and (not Z)) Computer Architecture – Module 2 27 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Binary Addition d How does a computer perform addition? d Analogous to the method used in elementary school d Each digit is a single bit carry carry carry 1 0 1 0 0 + 1 1 1 0 1 1 1 0 0 0 1 d Note: first bit never has a carry input Computer Architecture – Module 2 28 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Half-Adder Circuit d Adds two input bits d Produces two output bits – Sum – Carry d We will use exclusive or gate plus and gate exclusive-or gate bit 1 sum bit 2 and gate carry Computer Architecture – Module 2 29 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Full-Adder Circuit d Input is two bits plus a carry d Produces two output bits – Sum – Carry bit 1 bit 2 sum carry in carry out Computer Architecture – Module 2 30 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved In Practice d A single gate only has a few connections d A chip has many pins for external connections d Result: package multiple gates on each chip d We will see examples shortly Computer Architecture – Module 2 31 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved An Example Logic Gate Technology d 7400 family of chips d Package is about one-half inch long d Implement TTL logic d Powered by five volts d Each chip contains multiple gates Computer Architecture – Module 2 32 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Example Gates On 7400-Series Chips 14 13 12 11 10 9 8 14 13 12 11 10 9 8 14 13 12 11 10 9 8 + + + gnd gnd gnd 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 7400 7402 7404 (Quad 2-input NAND) (Quad 2-input NOR) (Hex Inverter) d Pins 7 and 14 connect to ground and power d Power and ground must be connected for the chip to operate Computer Architecture – Module 2 33 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Logic Gates And Computers Computer Architecture – Module 2 34 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Logic Gates And Computers d Question: how can computers be constructed from simple logic gates? Computer Architecture – Module 2 34 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Logic Gates And Computers d Question: how can computers be constructed from simple logic gates? d Answer: they cannot Computer Architecture – Module 2 34 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Logic Gates And Computers d Question: how can computers be constructed from simple logic gates? d Answer: they cannot d Logic gates only provide a Boolean combination of inputs (known as combinatorial circuits) Computer Architecture – Module 2 34 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Logic Gates And Computers d Question: how can computers be constructed from simple logic gates? d Answer: they cannot d Logic gates only provide a Boolean combination of inputs (known as combinatorial circuits) d Additional functionality is needed – Circuits that maintain state – Circuits that operate on a clock Computer Architecture – Module 2 34 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Circuits That Maintain State d More sophisticated than combinatorial circuits d Output depends on history of previous input as well as values on input lines Computer Architecture – Module 2 35 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Basic Circuit That Maintains State d Known as latch d Has two inputs: data and enable d When enable is 1, output is same as data d When enable goes to 0, output stays locked at current value data in output enable Computer Architecture – Module 2 36 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Propagation Delay Computer Architecture – Module 2 37 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Propagation Delay d Key in understanding a latch Computer Architecture – Module 2 37 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Propagation Delay d Key in understanding a latch d Consider the circuit output d What does it do? Computer Architecture – Module 2 37 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Propagation Delay d Key in understanding a latch d Consider the circuit output d What does it do? d Mathematically, the circuit is meaningless because an inverter produces the complement of its input, but in this case the output is fed back into the input d Practically, a propagation delay means the output stays the same for a short time, and then changes d Result: output varies over time, 0 for time t, 1 for time t, 0 for time t, and so on, where t is the propagation delay Computer Architecture – Module 2 37 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Register d Basic building block for a computer d Acts like a miniature N-bit memory d Can be built out of latches enable line for the register Register 1-bit latch 1-bit latch input bits for output bits for the register the register 1-bit latch 1-bit latch Computer Architecture – Module 2 38 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved A More Complex Circuit That Maintains State d Basic flip-flop d Can be constructed from a pair of latches d Analogous to push-button power switch (i.e., push-on push-off) d Each new 1 received as input causes output to reverse – First input pulse causes flip-flop to turn on – Second input pulse causes flip-flop to turn off Computer Architecture – Module 2 39 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Output Of A Flip-Flop input output flip-flop in: 0 0 1 0 1 1 0 0 0 0 1 0 1 0 1 0 out: 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 1 time increases d Note: output only changes when input makes a transition from zero to one (i.e., rises) Computer Architecture – Module 2 40 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Flip-Flop Action Plotted As Transition Diagram 1 in: 0 1 out: 0 clock: time increases d All changes synchronized with clock (described later) d Output changes on rising edge of input d Also called leading edge Computer Architecture – Module 2 41 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Binary Counter d Counts input pulses d Output is binary value d Includes reset line to restart count at zero d Example: 4-bit counter available as single integrated circuit Computer Architecture – Module 2 42 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of Counter input outputs decimal 0 000 0 0 000 0 1 001 1 counter 0 001 1 outputs 1 010 2 input time 1 010 2 increases 0 010 2 1 011 3 (a) 0 011 3 1 100 4 0 100 4 1 101 5... (b) d Part (a) shows the schematic of a counter chip d Part (b) shows the output as the input changes Computer Architecture – Module 2 43 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Clock d Permits active circuits d Electronic circuit that pulses regularly d Measured in cycles per second (Hz) d Output of clock is square wave (sequence of 1 0 1 0 1... ) clock 1 output 0 time Computer Architecture – Module 2 44 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Decoder/Demultiplexor d Takes binary number as input d Uses input to select one output d Technical distinction – Decoder simply selects one of its outputs – Demultiplexor feeds a special input to the selected output d In practice: engineers often use the term “demux” for either, and blur the distinction Computer Architecture – Module 2 45 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of Decoder d Binary value on input lines determines which output is active decoder “000” “001” “010” x “011” inputs y outputs “100” z “101” “110” “111” d Technical detail: on some decoder chips, an active output is logic 0 and all others are logic 1 Computer Architecture – Module 2 46 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Example: Execute A Sequence Of Steps d Imagine the power-on sequence for an embedded system – Test the battery – Power on and test the memory – Start the disk – Power up the display – Read boot sector from disk into memory – Start the CPU d Separate hardware module performs each task d Need to activate the modules in sequence Computer Architecture – Module 2 47 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Circuit To Execute A Sequence decoder not used counter test battery clock test memory start disk power screen read boot blk start CPU not used d Technique: count clock pulses and use decoder to select an output for each possible counter output d Note: counter will wrap around to zero, so this is an infinite loop Computer Architecture – Module 2 48 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Feedback d Output of circuit used as an input d Called feedback d Allows more control d Example: stop sequence when output F becomes active d Boolean algebra CLOCK and (not F) Computer Architecture – Module 2 49 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of Feedback For Termination decoder not used these two gates perform clock the Boolean and function counter test battery test memory start disk state CRT read boot blk start CPU feedback stop d Note additional input needed to restart sequence Computer Architecture – Module 2 50 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Latch operations Operation Output=Z Data in Enable s A A=0, B=1 1 1 If Z=0, it inverts to 1 1 If Z=1 it stays 1 Z A=1, B=0 If Z=0 it stays 0 1 0 0 If Z=1, it inverts to 0 B A=1, B=1 If Z=0, it stays 0 0 X If Z=1 it stays 1 The Variety Of Processors And Computational Engines Fetch-Execute cycle, Starting and stopping a processor Terminology d The terms processor and computational engine refer broadly to any mechanism that drives computation d Wide variety of sizes and complexity d Processor is key element in all computational systems Computer Architecture – Module 4 2 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Von Neumann Architecture d Characteristic of most modern processors d Reference to mathematician John Von Neumann, a pioneer in computer architecture d Unlike Harvard architecture, there is one memory d Fundamental concept is a stored program (i.e., a program in the same memory as the data) d Three basic components interact to form a computational system – Processor – Memory – I/O facilities Computer Architecture – Module 4 3 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of Von Neumann Architecture computer processor memory input/output facilities Computer Architecture – Module 4 4 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Processor d Digital device d Performs computation involving multiple steps d Wide variety of capabilities d Mechanisms available – Fixed logic – Selectable logic – Parameterized logic – Programmable logic Computer Architecture – Module 4 5 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Hierarchical Structure And Processors d Most computer architecture follows a hierarchical approach d Subparts of a large, central processor are sophisticated enough to meet our definition of processor d Some engineers use term computational engine for subpiece that is less powerful than main processor Computer Architecture – Module 4 6 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of Processor Hierarchy CPU graphics trigonometry engine engine other query components engine arithmetic engine Computer Architecture – Module 4 7 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Major Components Of A Conventional Processor d Controller to coordinate operation (often omitted from architecture diagrams) d Arithmetic Logic Unit (ALU) d Local data storage d Internal interconnections d External interfaces (I/O buses) Computer Architecture – Module 4 8 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of A Conventional Processor internal interconnection(s) controller local ALU storage external interface external connection Computer Architecture – Module 4 9 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Parts Of A Conventional Processor d Controller – Overall responsibility for execution – Moves through sequence of steps – Coordinates other units – Timing-based operation: knows how long each unit requires and schedules steps accordingly d Arithmetic Logic Unit – Operates as directed by controller – Provides arithmetic and Boolean operations – Performs one operation at a time as directed Computer Architecture – Module 4 10 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Parts Of A Conventional Processor (continued) d Internal interconnections – Allow transfer of values among units of the processor – Also called data paths d External interface – Handles communication between processor and rest of computer system – Provides interaction with external memory as well as external I/O devices Computer Architecture – Module 4 11 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Parts Of A Conventional Processor (continued) d Local data storage – Holds data values for operations – Values must be inserted (e.g., loaded from memory) before the operation can be performed – Typically implemented with registers Computer Architecture – Module 4 12 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Arithmetic Logic Unit d Main computational engine in conventional processor d Complex unit that can perform variety of tasks d Typical ALU operations – Arithmetic (integer add, subtract, multiply, divide) – Shift (left, right, circular) – Boolean (and, or, not, exclusive or) Computer Architecture – Module 4 13 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Processor Categories And Roles d Many possible roles for individual processors in – Coprocessors – Microcontrollers – Embedded system processors – General-purpose processors Computer Architecture – Module 4 14 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Coprocessor d Operates in conjunction with and under the control of another processor d Usually – Special-purpose processor – Performs a single task – Operates at high speed d Example: floating point accelerator Computer Architecture – Module 4 15 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Microcontroller d Programmable device d Dedicated to control of a physical system d Example: control an automobile engine or grocery store door d Negative: extremely limited (slow processor and tiny memory) d Positive: very low power consumption Computer Architecture – Module 4 16 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Example Steps A Microcontroller Performs (Automatic Door) do forever { wait for the sensor to be tripped; turn on power to the door motor; wait for a signal that indicates the door is open; wait for the sensor to reset; delay ten seconds; turn off power to the door motor; } Computer Architecture – Module 4 17 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Embedded System Processor d Runs sophisticated electronic device d May be more powerful than microcontroller d Generally low power consumption d Example: control DVD player, including commands received from a remote control as well as from the front panel Computer Architecture – Module 4 18 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved General-Purpose Processor d Most powerful type of processor d Completely programmable d Full functionality d Power consumption is secondary consideration d Example: CPU in a personal computer Computer Architecture – Module 4 19 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Processor Implementation d Originally: discrete logic d Later: single circuit board d Even later: single chip d Now: usually part of a single chip Computer Architecture – Module 4 20 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Definition Of Programmable Device d To a software engineer programming means – Writing, compiling, and loading code into memory – Executing the resulting memory image d To a hardware engineer a programmable device – Has a processor separate from the program it runs – May have the program burned onto a chip Computer Architecture – Module 4 21 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Fetch-Execute Cycle d Basis for programmable processors d Allows processor to move through program steps automatically d Implemented by processor hardware d At some level, every programmable processor implements a fetch-execute cycle Computer Architecture – Module 4 22 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Fetch-Execute Algorithm 222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 Repeat forever { 1 1 1 1 1 1 Fetch: access the next step of the program from the 1 1 location in which the program has been stored. 1 1 1 1 Execute: Perform the step of the program. 1 1 1 1 } 1 1 1 12222222222222222222222222222222222222222222222222222222222222222222222222222221 d Note: we will discuss in more detail later Computer Architecture – Module 4 23 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Program Translation d Processors require a program to be – In memory – Represented in binary d Programmers prefer a program to be – Readable by humans – In a High Level Language d Solution: allow programmers to write code in a readable high-level language and translate to binary d Use computer software to perform the translation Computer Architecture – Module 4 24 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of Program Translation preprocessed source source assembly code preprocessor compiler code code relocatable binary assembler object linker object code code object code (functions) in libraries Computer Architecture – Module 4 25 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Clock Rate And Instruction Rate d Clock rate – Rate at which gates are clocked – Provides a measure of the underlying hardware speed d Instruction rate – Measures the number of instructions a processor can execute per unit time d On some processors, a given instruction may take more clock cycles than other instructions d Example: multiplication may take longer than addition Computer Architecture – Module 4 26 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Stopping A Processor d Processor runs fetch-execute indefinitely d Software must plan next step d Two possibilities when last step of computation finishes – Smallest embedded systems: code enters a loop testing for a change in input – Larger systems: operating system runs and executes an infinite loop d Note: to reduce power consumption, hardware may provide a way to put processor to sleep until I/ O activity occurs (covered later in the course) Computer Architecture – Module 4 27 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Starting A Processor d Processor hardware includes a reset line that stops the fetch-execute cycle d For power-down: reset line is asserted d During power-up, logic holds the reset until the processor and memory are initialized d Power-up steps known as bootstrap Computer Architecture – Module 4 28 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Summary d Processor performs a computation involving multiple steps d Many types of processors – Coprocessor – Microcontroller – Embedded system processor – General-purpose processor d Arithmetic Logic Unit (ALU) performs basic arithmetic and Boolean operations Computer Architecture – Module 4 29 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Summary (continued) d Hardware in programmable processor runs fetch-execute cycle d Until a processor is powered down, fetch-execute must continue Computer Architecture – Module 4 30 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Questions? Introducing instruction set What Instructions Should A Processor Offer? d Minimum set is sufficient, but inconvenient d Extremely large set is convenient, but inefficient d Architect must consider additional factors – Physical size of processor chip – Expected use – Power consumption d Tradeoffs mean a variety of designs exist Computer Architecture – Module 5 2 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Instruction Set Architecture d Idea pioneered by IBM d Allows multiple, compatible models d Define – Set of instructions – Operands and meaning d Do not define – Implementation details – Processor speed Computer Architecture – Module 5 3 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved A Few Choices d Functionality: what the instructions provide – Arithmetic (integer or floating point) – Logic (bit manipulation and testing) – Control (branching, function call) – Other (graphics, data conversion) d Format: representation for each instruction d Semantics: effect when instruction is executed d An Instruction Set Architecture includes all of the above Computer Architecture – Module 5 4 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Parts Of An Instruction d Opcode specifies operation to be performed d Operands specify data values on which to operate d Result location specifies where result is to be placed Computer Architecture – Module 5 5 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Instruction Format d Instruction represented as sequence of bits in memory (usually multiples of bytes) d Typically – Opcode at beginning of instruction – Operands follow opcode opcode operand 1 operand 2... Computer Architecture – Module 5 6 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Instruction Length d Fixed-length – Every instruction is same size – Hardware is less complex – Hardware can run faster – Wasted space: some instructions do not use all the bits d Variable-length – Some instructions shorter than others – Allows instructions with no operands, a few operands, or many operands – Efficient use of memory (no wasted space) Computer Architecture – Module 5 7 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved General-Purpose Registers d High-speed storage mechanism d Part of the processor (on chip) d Each register holds an integer or a pointer d Numbered from 0 through N–1 d Basic uses – Temporary storage during computation – Operand for arithmetic operation d Note: some processors require all operands for an arithmetic operation to come from general-purpose registers Computer Architecture – Module 5 8 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Floating Point Registers d Usually separate from general-purpose registers d Each holds one floating-point value d Floating point registers are operands for floating point arithmetic Computer Architecture – Module 5 9 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Example Of Programming With Registers d Task – Start with variables X and Y in memory – Add X and Y and place the result in variable Z (also in memory) d Example steps – Load a copy of X into register 1 – Load a copy of Y into register 2 – Add the value in register 1 to the value in register 2, and put the result in register 3 – Store a copy of the value in register 3 in Z d Note: the above assumes registers 1, 2, and 3 are available Computer Architecture – Module 5 10 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Terminology d Register spilling – Occurs when a register is needed for a computation and all registers contain values – General idea * Save current contents of register(s) in memory * Reload registers(s) from memory when values are needed d Register allocation – Refers to choosing which values to keep in registers at a given time – Performed by programmer or compiler Computer Architecture – Module 5 11 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Double Precision d Refers to value that is twice as large as a standard integer d Most processors do not have dedicated registers for double precision computation d Approach taken: programmer must use a contiguous pair of registers to hold a double precision value d Example: multiplication of two 32-bit integers – Result can require 64 bits – Programmer specifies that result goes into a pair of registers (e.g., 4 and 5) Computer Architecture – Module 5 12 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Register Banks d Registers partitioned into disjoint sets called banks d Additional hardware detail d Optimizes performance d Complicates programming Computer Architecture – Module 5 13 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Typical Register Bank Scheme d Registers divided into two banks d ALU instruction that takes two operands must have one operand from each bank d Programmer must ensure operands are in separate banks d Note: having two operands from the same bank will cause a run-time error Computer Architecture – Module 5 14 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Why Register Banks Are Used d Parallel hardware facilities allow simultaneous access of both banks Bank A Bank B 0 4 1 5 2 6 3 7 separate hardware units used to access the register banks Processor d Access takes half as long as using a single bank Computer Architecture – Module 5 15 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Consequence For Programmers d Even trivial programs cause problems d Example R ← X + Y S ← Z - X T ← Y + Z d Operands must be assigned to banks d No feasible choice for the above Computer Architecture – Module 5 16 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Register Conflicts d Occur when operands specify same register bank d May be reported by compiler / assembler d Programmer must rewrite code or insert extra instruction to copy an operand value to the opposite register bank d In the previous example – Start with Y and Z in the same bank – Before adding Y and Z, copy one to another bank Computer Architecture – Module 5 17 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Two Types Of Instruction Sets d CISC: Complex Instruction Set Computer d RISC: Reduced Instruction Set Computer Computer Architecture – Module 5 18 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved CISC Instruction Set d Many instructions (often hundreds) d Given instruction can require arbitrary time to compute d Example: Intel/AMD (x86/x64) or IBM instruction set d Typical complex instructions – Move graphical item on bitmapped display – Copy or clear a region of memory – Perform a floating point computation Computer Architecture – Module 5 19 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved RISC Instruction Set d Few instructions (typically 32 or 64) d Each instruction executes in one clock cycle d Example: MIPS or ARM instruction set d Omits complex instructions – No floating-point instructions – No graphics instructions d Sequence of instructions needed to perform complex action Computer Architecture – Module 5 20 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Instruction pipeline Instruction set example Instruction Pipeline d A major idea in processor design d Also called execution pipeline d Optimizes performance d Permits processor to complete more instructions per unit time d Typically used with RISC instruction set Computer Architecture – Module 5 21 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Basic Steps In A Fetch-Execute Cycle d Fetch the next instruction d Decode the instruction and fetch operands from registers d Perform the arithmetic operation specified by the opcode d Perform memory read or write, if needed d Store result back to the registers Computer Architecture – Module 5 22 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Instruction Pipeline Approach d Build separate hardware block for each step of the fetch-execute cycle d Arrange hardware to pass an instruction through the sequence of hardware blocks d Allows step K of one instruction to execute while step K–1 of next instruction executes d Result is an execution pipeline Computer Architecture – Module 5 23 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of An Execution Pipeline stage 1 stage 2 stage 3 stage 4 stage 5 fetch decode perform read or store next plus fetch arithmetic write the instruction operands operation memory result d Example pipeline has five stages d All stages operate at the same time d Instruction passes through like a factory assembly line Computer Architecture – Module 5 24 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of Instructions In A Pipeline clock stage 1 stage 2 stage 3 stage 4 stage 5 1 inst. 1 - - - - Time 2 inst. 2 inst. 1 - - - 3 inst. 3 inst. 2 inst. 1 - - 4 inst. 4 inst. 3 inst. 2 inst. 1 - 5 inst. 5 inst. 4 inst. 3 inst. 2 inst. 1 6 inst. 6 inst. 5 inst. 4 inst. 3 inst. 2 7 inst. 7 inst. 6 inst. 5 inst. 4 inst. 3 8 inst. 8 inst. 7 inst. 6 inst. 5 inst. 4 Computer Architecture – Module 5 25 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Pipeline Speed d All stages operate in parallel d Given stage can start to process a new instruction as soon as current instruction finishes d Effect: N-stage pipeline can operate on N instructions simultaneously, producing speedup d Result – One instruction completes every time pipeline moves – For RISC processor, one instruction completes on every clock cycle d Comparison: without a pipeline, each instruction would take five clock cycles Computer Architecture – Module 5 26 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Significance Of A Pipeline To A Programmer d Pipeline is transparent to programmers (i.e., is automatic) d Execution speed – Is never worse than a processor without a pipeline – May be K times faster than processor without a pipeline d Pipeline stalls (i.e., pauses) if item is not available when a stage needs the item d Programmer who does not understand pipeline can produce code that stalls frequently Computer Architecture – Module 5 27 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Example Of Instructions That Cause A Stall d Consider code that – Performs addition and subtraction operations – Uses registers A through E for operands and results d Example instruction sequence Instruction K: C ← add A B Instruction K+1: D ← subtract E C d Instruction K+1 must wait for operand C to be computed d Result is a stall Computer Architecture – Module 5 28 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Effect Of Stall On Pipeline stage 1 stage 2 stage 3 stage 4 stage 5 fetch fetch ALU access write instruction operands operation memory results clock 1 inst. K inst. K-1 inst. K-2 inst. K-3 inst. K-4 Time 2 inst. K+1 inst. K inst. K-1 inst. K-2 inst. K-3 3 inst. K+2 (inst. K+1) inst. K inst. K-1 inst. K-2 4 (inst. K+2) (inst. K+1) – inst. K inst. K-1 5 (inst. K+2) (inst. K+1) – – inst. K 6 (inst. K+2) inst. K+1 – – – 7 inst. K+3 inst. K+2 inst. K+1 – – 8 inst. K+4 inst. K+3 inst. K+2 inst. K+1 – 9 inst. K+5 inst. K+4 inst. K+3 inst. K+2 inst. K+1 10 inst. K+6 inst. K+5 inst. K+4 inst. K+1 inst. K+2 d We say a bubble passes through pipeline Computer Architecture – Module 5 29 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Actions That Cause A Pipeline Stall d Access external storage (i.e., memory reference) d Invoke a coprocessor (i.e., I/O) d Branch to a new location d Call a subroutine Computer Architecture – Module 5 30 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Achieving Maximum Speed d Program must be written to accommodate instruction pipeline d To minimize stalls – Avoid introducing unnecessary branches – Delay references to result register(s) d A contradiction – Good software engineering practice divides a large program into smaller functions – A function call stalls the pipelining Computer Architecture – Module 5 31 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Example Of Avoiding Stalls C ← add A B C ← add A B D ← subtract E C F ← add G H F ← add G H M ← add K L J ← subtract I F D ← subtract E C M ← add K L J ← subtract I F P ← subtract M N P ← subtract M N (a) (b) d Stalls eliminated by rearranging (a) to (b) d Compilers for RISC processors usually optimize code to avoid stalls Computer Architecture – Module 5 32 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved A Note About Pipelines d We can think of pipelining as an automatic optimization – Hardware speeds up processing if possible – If speedup is not possible, hardware is still correct d Consequence: code that is not optimized will work correctly, but may run slower than necessary Computer Architecture – Module 5 33 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Forwarding d Hardware optimization to avoid a stall d Allows ALU to reference result in next instruction d Example Instruction K: C ← add A B Instruction K+1: D ← subtract E C d Forwarding hardware – Passes result of add operation directly to ALU without waiting to store it in a register – Ensures the value arrives by the time subtract instruction reaches the pipeline stage for execution Computer Architecture – Module 5 34 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved No-Op Instruction d Often included in RISC instruction sets d May seem unnecessary d Has no effect on – Registers – Memory – Program counter – Computation d Purpose: can be inserted to avoid instruction stalls Computer Architecture – Module 5 35 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Use Of No-Op d Example Instruction K: C ← add A B Instruction K+1: no-op Instruction K+2: D ← subtract E C d If forwarding is available, no-op allows time for result from register C to be fetched for subtract operation d Compilers insert no-op instructions to optimize performance Computer Architecture – Module 5 36 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Types Of Opcodes d Operations usually classified into groups d An example categorization – Arithmetic instructions (integer arithmetic) – Logical instructions (also called Boolean) – Data access and transfer instructions – Conditional and unconditional branch instructions – Floating point instructions – Processor control instructions – Graphics instructions Computer Architecture – Module 5 37 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Program Counter d Hardware register d Used during fetch-execute cycle d Gives address of next instruction to execute d Also known as instruction pointer or instruction counter Computer Architecture – Module 5 38 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Fetch-Execute Algorithm Details 222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 Assign the program counter an initial program address. 1 1 Repeat forever { 1 1 1 1 Fetch: access the next step of the program from the location given by the 1 1 1 1 program counter. 1 1 1 1 Set an internal address register, A, to the address beyond the instruction that 1 1 was just fetched. 1 1 1 1 Execute: Perform the step of the program. 1 1 1 1 Copy the contents of address register A to the program counter. 1 1 1 1 } 1 12222222222222222222222222222222222222222222222222222222222222222222222222222221 Computer Architecture – Module 5 39 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Branches And Fetch Execute d Absolute branch – Typically named jump – Operand is an address – Assigns operand value to internal register A d Relative branch – Typically named br – Operand is a signed value – Adds operand to internal register A Computer Architecture – Module 5 40 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Subroutine Call d Jump to subroutine (jsr instruction) – Similar to a jump – Saves value of internal register A – Replaces A with operand address d Return from subroutine (ret instruction) – Retrieves value saved during jsr – Replaces A with saved value Computer Architecture – Module 5 41 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Passing Arguments d Multiple methods are used d Choice depends on language/ compiler as well as hardware d Examples – Store arguments in memory – Store arguments in special-purpose hardware registers – Store arguments in general-purpose registers d Many techniques also used to return result from function Computer Architecture – Module 5 42 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Register Window d Hardware optimization for argument passing d Processor contains many general-purpose registers d Only a small subset of registers visible at any time d Caller places arguments in reserved registers d During procedure call, register window moves to hide old registers and expose new registers Computer Architecture – Module 5 43 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved Illustration Of Register Window registers 0 - 7 before other registers subroutine is called are unavailable x1 x2 x3 x4 A B C D (a) registers 0 - 7 unavailable when subroutine runs unavailable x1 x2 x3 x4 A B C D l1 l2 l3 l4 (b) d (a) registers before calling a subroutine d (b) registers when the subroutine runs Computer Architecture – Module 5 44 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved An Example Instruction Set d Known as MIPS instruction set d Early RISC design d Minimalistic d Only 32 instructions Computer Architecture – Module 5 45 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved MIPS Instruction Set (Part 1) 2222222222222222222222222222222222222222222222222222222222222222 Instruction Meaning Arithmetic add integer addition subtract integer subtraction add immediate integer addition (register + constant) add unsigned unsigned integer addition subtract unsigned unsigned integer subtraction add immediate unsigned unsigned addition with a constant move from coprocessor access coprocessor register multiply integer multiplication multiply unsigned unsigned integer multiplication divide integer division divide unsigned unsigned integer division move from Hi access high-order register move from Lo access low-order register Logical (Boolean) and logical and (two registers) or logical or (two registers) and immediate and of register and constant or immediate or of register and constant shift left logical Shift register left N bits shift right logical Shift register right N bits Computer Architecture – Module 5 46 Fall, 2016 Copyright 2016 by Douglas Comer. All rights reserved MIPS Instruction Set (Part 2) 2 222222222222222222222222222222222222222222222222222222222222222222222 Instruction Meaning Data Transfer load word load register from memory store word store register into memory load upper immediate place constant in upper sixteen bits of register move from coproc. register obtain a value from a coprocessor Conditional Branch branch equal branch if two registers equal branch not equal branch if two registers unequal set on less than compare two registers set less than immediate compare register and constant set less than unsigned compare unsigned registers set less than immediate compare unsigned register and constant Unconditional Branch jump go to target address jump register go to address in register jump and link procedure call Computer Architecture