Week 5: Pipelining Basics
Summary
This document is a lecture on computer architecture, focusing on pipelining and the computer memory hierarchy. It discusses the concept of pipelining and the trade-offs involved in memory design.
Full Transcript
University of Wollongong in Dubai

Pipelining and Memory of Computer Systems

Learning objectives
1. Pipelining as a means of improving the performance of instruction execution at the CPU level
2. The memory of a computer system
3. The characteristics of a memory system
4. The memory hierarchy
5. Primary memory: bits and memory addresses

A computer's components and their interactions
[Figure slide: a computer's components and their interactions]

Computer function revisited
The basic function performed by a computer is the execution of a program, which consists of a set of instructions stored in memory. The processor does the actual work by executing the instructions specified in the program.

Strategies to improve computer performance
Computer architects have been continuously striving to improve the performance of the machines they design. One obvious way is to make the chips run faster by increasing their clock speed, but for every new design there is a physical limit to what is possible. Therefore, computer architects looked to parallelism, doing two or more things at once, as a way to get even more performance for a given clock speed.

Parallelism in a computer: two main forms
In computer architecture, parallelism comes in two general forms: instruction-level parallelism and processor-level parallelism.
- Instruction-level parallelism: parallelism is exploited within individual instructions to get more instructions per second out of the machine.
- Processor-level parallelism: multiple CPUs work together on the same problem. Multi-core can be considered an extension of processor-level parallelism.

Pipelining, a traditional form of instruction-level parallelism
Computer architects have known for years that the actual fetching of instructions from memory is a major bottleneck in instruction execution speed. The solution they adopted is to fetch instructions from memory in advance, so that instructions are readily available when the CPU needs them.

How is this performed?
- Step 1: The prefetched instructions are stored in a special set of registers called the prefetch buffer.
- Step 2: When an instruction is needed, it can usually be taken from the prefetch buffer rather than waiting for a memory read to complete.

More precisely, prefetching divides instruction execution into two parts: (1) fetching and (2) actual execution.
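The slides describe the prefetch buffer only in prose. As a rough illustration (my own sketch, not from the lecture), the Python model below treats the prefetch buffer as a small FIFO queue: a fetch that hits the buffer costs one cycle, while a fetch that misses stalls for a full memory read. The buffer size and latency values are made-up parameters, and the model assumes the prefetcher's own memory reads overlap with instruction execution.

```python
from collections import deque

# Toy model (not from the lecture): the prefetch buffer as a FIFO queue.
# A memory read costs MEM_LATENCY cycles; a hit in the buffer costs 1 cycle.
MEM_LATENCY = 4   # assumed memory read latency, in cycles
BUFFER_SIZE = 4   # assumed prefetch buffer capacity

def run(program, prefetch=True):
    """Count the cycles spent fetching every instruction in `program`."""
    buffer = deque()   # instructions fetched ahead of time
    next_to_load = 0   # next program index the prefetcher will load
    cycles = 0
    for _ in program:
        if prefetch:
            # The prefetcher tops the buffer up while the CPU is busy,
            # so its memory reads are assumed to be off the critical path.
            while len(buffer) < BUFFER_SIZE and next_to_load < len(program):
                buffer.append(program[next_to_load])
                next_to_load += 1
        if buffer:
            buffer.popleft()        # instruction ready: no memory wait
            cycles += 1
        else:
            cycles += MEM_LATENCY   # stall for a full memory read
    return cycles

program = [f"I{i}" for i in range(1, 10)]
print("without prefetch:", run(program, prefetch=False))  # 9 * 4 = 36 cycles
print("with prefetch:   ", run(program, prefetch=True))   # 9 * 1 = 9 cycles
```

Even this crude model shows why prefetching helps: the memory wait is paid in the background instead of on the critical path.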
Pipelining divides instruction execution into more than two parts
With pipelining, instruction execution is often divided into many (often a dozen or more) parts, each one handled by a dedicated piece of hardware, all of which can run in parallel.

Simple view of instruction processing: two main steps
The processor reads (fetches) instructions from memory one at a time, then executes each instruction.
[Figure slide: instruction cycle flowchart]
[Figure slide: a simplified view of the instruction cycle using stages S1, S2, S3, S4, S5]

Example of pipelining
The following example illustrates a 5-stage pipeline over a period of nine clock cycles. We assume that the pipeline has 5 units or stages (S1, S2, S3, S4, S5) and that the program under execution is made of nine consecutive instructions I1, I2, I3, I4, I5, I6, I7, I8, I9. We also consider the time taken, in clock cycles, by each instruction, as well as the time taken to execute all the instructions.
[Figure slide: the nine instructions progressing through the five-stage pipeline]

- During clock cycle 1, stage S1 is working on instruction I1, fetching it from memory.
- During cycle 2, stage S2 decodes I1, while S1 fetches I2.
- During cycle 3, S3 fetches the operands for I1, S2 decodes I2, and S1 fetches I3.
- During cycle 4, S4 executes I1, S3 fetches the operands for I2, S2 decodes I3, and S1 fetches I4.
- During cycle 5, S5 writes the result of I1 back, S4 executes I2, S3 fetches the operands for I3, S2 decodes I4, and S1 fetches I5.
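To make the walkthrough concrete, here is a short Python sketch (an illustration, not part of the original slides) that prints which instruction occupies which stage in every clock cycle, using the rule that instruction Ik is in stage Ss during cycle k + s - 1:

```python
# Reproduce the stage-by-stage table for the lecture's example:
# instruction I(k) enters stage S(s) at clock cycle k + s - 1.
STAGES = ["S1 fetch", "S2 decode", "S3 operands", "S4 execute", "S5 write back"]
N_INSTRUCTIONS = 9
N_CYCLES = 9  # the period shown in the lecture example

for cycle in range(1, N_CYCLES + 1):
    row = []
    for s, name in enumerate(STAGES, start=1):
        k = cycle - s + 1  # which instruction is in stage s during this cycle
        row.append(f"{name}: I{k}" if 1 <= k <= N_INSTRUCTIONS else f"{name}: -")
    print(f"cycle {cycle}: " + " | ".join(row))
```

At cycle 5 the output shows I1 in S5 through I5 in S1, exactly as in the walkthrough above, and from then on S5 completes one instruction every cycle.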
A numerical application
Assume a cycle time of 2 nanoseconds (2 × 10^-9 seconds). Then it takes 5 × 2 ns = 10 ns = 10^-8 seconds for an instruction to travel through the five-stage pipeline.

At first glance, with an instruction taking 10 ns, it seems that in one second we can execute 1 / 10^-8 = 10^8 instructions, that is, 100 MIPS (100 million instructions per second). Do you agree or disagree? You should disagree. Why? Look closely at the diagram and focus on stage S5, which shows when the execution of an instruction terminates, and on how frequently S5 completes an instruction in the long run.

A closer look

Instruction   Clock cycle it finishes   Clock time it finishes (ns)
I1            5                         10
I2            6                         12
I3            7                         14
I4            8                         16
I5            9                         18
I6            10                        20
I7            11                        22
...

Starting from clock time 10 nanoseconds, an instruction is completed every 2 nanoseconds. Hence, in the long run, since an instruction completes every 2 nanoseconds, 1 / (2 × 10^-9 s) = 5 × 10^8 instructions are executed per second. That is 500 MIPS, not 100 MIPS as previously thought!
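The point the slides are making is the difference between latency (the time for one instruction) and throughput (instructions completed per second). The arithmetic is easy to verify in a few lines of Python, using the lecture's own numbers:

```python
# Latency vs. throughput for the five-stage pipeline (lecture's numbers).
cycle_time = 2e-9   # seconds per clock cycle
n_stages = 5

latency = n_stages * cycle_time             # time for ONE instruction: 1e-8 s
naive_mips = (1 / latency) / 1e6            # the flawed estimate: 100 MIPS
steady_state_mips = (1 / cycle_time) / 1e6  # one completion per cycle: 500 MIPS

print(f"latency per instruction: {latency * 1e9:.0f} ns")
print(f"naive estimate:          {naive_mips:.0f} MIPS")
print(f"steady-state throughput: {steady_state_mips:.0f} MIPS")
```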
Another example of pipelining
In this example, instruction execution is divided into six stages:
1. Fetch instruction (FI): read the next expected instruction into a buffer.
2. Decode instruction (DI): determine the opcode and the operand specifiers.
3. Calculate operands (CO): calculate the effective address of each source operand.
4. Fetch operands (FO): fetch each operand from memory.
5. Execute instruction (EI): perform the indicated operation.
6. Write operand (WO): store the result in memory.
[Figure slide: time diagram of the execution of the instructions of a given program through the six stages]

Pipelining: can we achieve better performance?
If one pipeline is good, then surely two pipelines are better. The following illustrates a possible design for a dual pipeline CPU.
[Figure slide: example of a five-stage dual pipeline CPU]

A simple 5-stage pipeline CPU has one pipe; a dual 5-stage pipeline CPU has two pipes. In the dual pipeline CPU there is a single instruction fetch unit: it fetches pairs of instructions together and puts each one into its own pipeline. Each of the pipelines includes an instruction decode unit, an operand fetch unit, an instruction execution unit, and a write-back unit.

What is the reason for having a single fetch unit fetch two instructions at a time? There are at least two reasons; one is that accessing memory once to fetch two consecutive instructions is cheaper than accessing memory twice, once per instruction. This applies at the level of each CPU.

Memory of a computer system
The complex subject of computer memory is made more manageable if we classify memory systems according to their key characteristics.

Characteristics of memory systems
- Location: internal (e.g., processor registers, cache, main memory); external (e.g., optical disks, magnetic disks, tapes)
- Capacity: number of words; number of bytes
- Unit of transfer: word; block
- Access method: sequential; direct; random; associative
- Performance: access time; cycle time; transfer rate
- Physical type: semiconductor; magnetic; optical; magneto-optical
- Physical characteristics: volatile/nonvolatile; erasable/nonerasable
- Organization: memory modules

Location
- Refers to whether memory is internal or external to the computer.
- Internal memory is often equated with main memory.
- The processor requires its own local memory, in the form of registers.
- Cache is another form of internal memory.
- External memory consists of peripheral storage devices that are accessible to the processor via I/O controllers.

Capacity
- Memory capacity is typically expressed in terms of bytes.

Unit of transfer
- For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module.

Methods of accessing units of data
- Sequential access: memory is organized into units of data called records; access must be made in a specific linear sequence; access time is variable.
- Direct access: involves a shared read-write mechanism; individual blocks or records have a unique address based on physical location; access time is variable.
- Random access: each addressable location in memory has a unique, physically wired-in addressing mechanism; the time to access a given location is independent of the sequence of prior accesses and is constant; any location can be selected at random and directly addressed and accessed. Main memory and some cache systems are random access.
- Associative access: a word is retrieved based on a portion of its contents rather than its address; each location has its own addressing mechanism, and retrieval time is constant independent of location or prior access patterns. Cache memories may employ associative access.

Capacity and performance: the two most important characteristics of memory
Three performance parameters are used:
- Access time (latency): for random-access memory, the time it takes to perform a read or write operation; for non-random-access memory, the time it takes to position the read-write mechanism at the desired location.
- Memory cycle time: access time plus any additional time required before a second access can commence; additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively. Cycle time is concerned with the system bus, not the processor.
- Transfer rate: the rate at which data can be transferred into or out of a memory unit; for random-access memory it is equal to 1/(cycle time).

Memory types and physical characteristics
The most common forms of memory are semiconductor memory, magnetic surface memory, optical memory, and magneto-optical memory. Several physical characteristics of data storage are important:
- Volatile memory: information decays naturally or is lost when electrical power is switched off.
- Nonvolatile memory: once recorded, information remains without deterioration until deliberately changed; no electrical power is needed to retain information.
- Magnetic-surface memories are nonvolatile.
- Semiconductor memory may be either volatile or nonvolatile.
- Nonerasable memory cannot be altered, except by destroying the storage unit; semiconductor memory of this type is known as read-only memory (ROM).

Memory hierarchy and memory design constraints
Design constraints on a computer's memory can be summed up by three questions: how much, how fast, and how expensive?

Here we have a dilemma: there is a trade-off among capacity, access time, and cost:
- faster access time means greater cost per bit;
- greater capacity means smaller cost per bit;
- greater capacity means slower access time.
The designer would like to use memory technologies that provide large-capacity memory, both because the capacity is needed and because the cost per bit is low. However, to meet performance requirements, the designer needs to use expensive, relatively lower-capacity memories with short access times.

Solution to the dilemma
The way out of this dilemma is not to rely on a single memory component or technology, but to employ a memory hierarchy: smaller, more expensive, faster memories are supplemented by larger, cheaper, slower memories. Going down the hierarchy, we find:
- decreasing cost per bit;
- increasing capacity;
- increasing access time;
- decreasing frequency of access of the memory by the processor.
It is the last point, the decreasing frequency of access at the lower levels, that makes the hierarchy work despite the increasing access time.
[Figure slide: relative cost, size, and speed characteristics across the memory hierarchy]

Main memory (primary memory)
- The memory of a computer: the part of a computer where programs and data are stored.
- Bit: a binary digit.
- Cell: a location in memory containing data.

Memory addresses
Memories consist of a number of cells (or locations), each of which can store a piece of information. Each cell has a number, called its address, by which programs can refer to it. If a memory has n cells, they will have addresses 0 to n - 1. If a cell consists of k bits, it can hold any one of 2^k different bit combinations. All cells in a memory contain the same number of bits.

Three ways of organizing a 96-bit memory

Memory organization   Size of a cell   Number of cells (addresses)
(a)                   8 bits           96/8 = 12
(b)                   12 bits          96/12 = 8
(c)                   16 bits          96/16 = 6

Exercise 1
What is the minimum number of bits needed to reference an address in memory (a)?

Solution
Memory (a) consists of 12 possible locations, from location 0 to location 11. We seek the smallest number of bits n such that 2^n >= 12, since an n-bit address can distinguish 2^n locations and we need at least 12.

n   2^n   2^n >= 12
1   2     NO
2   4     NO
3   8     NO
4   16    YES
5   32    YES

The minimum is n = 4 bits.
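The trial-and-error table can be checked directly: the minimum number of address bits is the smallest n with 2^n >= (number of cells), which is ceil(log2(cells)). Here is a short Python check (my own, not part of the slides) covering all three 96-bit organizations:

```python
import math

# Reproduce the lecture's 96-bit memory organizations and the exercise:
# the minimum address width is the smallest n with 2**n >= number of cells.
TOTAL_BITS = 96

for label, cell_bits in [("a", 8), ("b", 12), ("c", 16)]:
    n_cells = TOTAL_BITS // cell_bits
    address_bits = math.ceil(math.log2(n_cells))  # smallest n with 2**n >= n_cells
    print(f"({label}) {cell_bits}-bit cells -> {n_cells} cells, "
          f"{address_bits} address bits")
```

This confirms 4 address bits for organization (a), and gives 3 bits each for organizations (b) and (c).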
References
- William Stallings, Computer Organization and Architecture, 11th Edition, Pearson, 2019.
- Andrew S. Tanenbaum, Structured Computer Organization, 6th Edition, Pearson, 2016.