Lecture 3 Pipelining PDF
Uploaded by ImpartialElPaso571
The Copperbelt University
Summary
These lecture notes cover pipelining in computer architecture: basic instruction pipelining, and how to calculate the execution time, speedup and effects of pipelining. They include examples and diagrams to illustrate the concepts, and also cover superscalar processors. The notes are suitable for undergraduate students studying computer science.
Full Transcript
LECTURE 3
Pipelining

Basic Instruction Pipelining
• Simplest definition: execution of the next instruction begins before execution of the previous instruction is completed.
• A technique used in the design of modern processors, microcontrollers and CPUs to increase instruction throughput (the number of instructions that can be executed per unit time).

The main idea of instruction pipelining
• Main idea: divide (split) the processing of a CPU instruction into a series of independent steps/operations, as defined by the opcode.
• This allows the CPU's control logic to handle instructions at the processing rate of the slowest step.
• The slowest step is much faster than the time needed to process the instruction as a single step.

Example of a car assembly plant
• As in car assembly, each step carries out a single microinstruction/micro-operation, and each step is linked to the next.
• Each step is called a pipe stage; the stages are connected one to the next to form a pipe.
• Instructions enter at one end, progress through the stages and exit at the other end.
• The time required to move an instruction one step down the pipeline is called a processor cycle.

Pipeline stages
Example: Consider a RISC pipeline broken into five stages, with a set of flip-flops between each pair of stages, as follows:
1 Instruction fetch (IF)
2 Instruction decode (ID)
3 Execute (EX)
4 Memory access (MEM)
5 Write-back (WB)

Figure: Block diagram showing pipeline stages

A pipelined processor consists of:
• Internal modules which can work semi-independently on separate microinstructions
• Stages linked by flip-flops
Pipelining reduces the overall processing time of a sequence of instructions, but it does not reduce the number of stages each instruction passes through.

Serial processing
• A non-pipelined processor is not as efficient, because some CPU modules are idle while another is active during the instruction cycle.
• Note that pipelining does not completely remove idle time in a pipelined CPU.
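
The difference between serial and pipelined execution can be made concrete with a small text space-time diagram. The Python sketch below is illustrative only and is not part of the original lecture: the function names `space_time_rows` and `render` are hypothetical, and it assumes an ideal five-stage pipeline (IF, ID, EX, MEM, WB) where every stage takes one cycle and there are no stalls.

```python
# Illustrative sketch (not from the lecture): text space-time diagrams.
# Assumption: ideal pipeline, one cycle per stage, no stalls or hazards.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def space_time_rows(num_instructions, stages=STAGES, pipelined=True):
    """Return one row per instruction; each cell is the stage occupied in that cycle."""
    k = len(stages)
    rows = []
    for i in range(num_instructions):
        # Pipelined: a new instruction starts every cycle.
        # Serial: an instruction starts only after the previous one has finished.
        start = i if pipelined else i * k
        rows.append(["."] * start + list(stages))
    width = max(len(row) for row in rows)
    # Pad every row to the same number of cycles ("." means not in the pipeline).
    return [row + ["."] * (width - len(row)) for row in rows]


def render(rows):
    """Format the rows as text, one instruction per line."""
    lines = []
    for i, row in enumerate(rows):
        cells = " ".join(f"{cell:<3}" for cell in row).rstrip()
        lines.append(f"I{i + 1}: {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    print("Pipelined:")
    print(render(space_time_rows(4, pipelined=True)))
    print("Serial:")
    print(render(space_time_rows(4, pipelined=False)))
```

Under these assumptions, four instructions occupy 8 cycles in the pipelined diagram and 20 cycles in the serial one. In the serial diagram only one stage is busy in any cycle, which is the idle time the notes refer to.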
Making CPU modules work in parallel increases instruction throughput. An instruction pipeline is called fully pipelined if it can accept a new instruction every clock cycle.

Figure: Non-pipelined processor

Quantitative effects of pipelining
• The time in nanoseconds to complete each individual instruction goes up.
• Each instruction takes more cycles to execute, but the average CPI remains roughly the same.
• Clock speed goes up.
• Total execution time goes down, resulting in a lower average time per instruction.

Example
Consider a simple example to understand pipelining. Four students, Ann, Ben, Candle and Donald, share a washing machine, a drying machine and an iron.
Washing takes 30 min
Drying takes 40 min
Ironing takes 20 min

Calculating execution time in serial load
If the students do their workloads in sequence, how much time will it take to finish the four wash loads? Use a diagram to explain your calculations.

Calculating the time in pipelined
If the students have been taught pipelining and decide to apply it, the entire wash load would take 3.5 hours. The first load needs 30 + 40 + 20 = 90 min, and each of the remaining three loads finishes 40 min after the previous one because the dryer is the slowest stage, giving 90 + 3 × 40 = 210 min.

Calculating speedup
It is possible to calculate speedup, i.e. how much faster pipelining is compared to serial processing.

Calculating total time in pipelining
1) Let k be the number of stages in the pipeline and tp the time taken to execute each stage.
2) Each instruction represents a task T in the pipeline; let n be the number of tasks (instructions).
3) The first task requires k × tp time to complete in a k-stage pipeline.
4) The remaining n - 1 tasks emerge from the pipeline one per cycle, so the total time to complete the remaining tasks is (n - 1) tp.
So completing n tasks using a k-stage pipeline requires:
(k × tp) + (n - 1) tp = (k + n - 1) tp
where:
(k × tp) is the time for the first instruction
(n - 1) tp is the time for the remaining n - 1 instructions
This formula applies if all stages take exactly the same time and there are no wait cycles.

Clock skew
Pipelining always introduces clock skew, i.e. different arrival times of the clock signal, especially at adjacent flip-flops. In such a case the length of each pipe stage (the clock cycle) is:
Length of pipe stage = MAX(lengths of unpipelined stages) + overhead (clock skew)
The latency of an instruction is then the number of stages multiplied by this length.

Example
Consider a non-pipelined machine with six execution stages of lengths 50 ns, 50 ns, 60 ns, 60 ns, 50 ns and 50 ns. Find the instruction latency and how much time it would take to execute 100 instructions.
Suppose pipelining is introduced to the above machine, and assume a clock skew of 5 ns is added as overhead to each execution stage.
(i) What would be the instruction latency?
(ii) How much time would it take to execute 100 instructions?

Calculating instruction latency
Length of pipe stage = MAX(lengths of unpipelined stages) + overhead
= MAX(50, 50, 60, 60, 50, 50) + 5 ns
= 60 + 5
= 65 ns
Instruction latency = 6 × 65 = 390 ns
To execute 100 instructions:
= (1 × 6 × 65) + (1 × 65 × 99)
= 390 + 6435
= 6825 ns

Superscalar processor
• A superscalar processor can execute more than one instruction in a cycle.
• It dispatches multiple instructions to several execution units.
• Each execution unit is not a separate processor but a unit within the same processor, such as an ALU.
• It implements a form of parallelism called instruction-level parallelism in a single processor.

Figure: Space-time diagram of a superscalar processor

Figure: Superscalar execution represented by the instruction stages
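
The original space-time figure for the superscalar case is not available in this transcript, so the following Python sketch offers one possible rendering. It is an assumption-laden illustration, not the lecturer's diagram: the function names `superscalar_rows` and `total_cycles` are hypothetical, the issue width of 2 and the five stages (IF, ID, EX, MEM, WB) are assumed, and the model ignores dependencies and hazards between instructions.

```python
# Illustrative sketch (not from the lecture): ideal superscalar space-time diagram.
# Assumptions: 2-wide issue, five one-cycle stages, no dependencies or stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def superscalar_rows(num_instructions, issue_width=2, stages=STAGES):
    """Return one row per instruction; issue_width instructions enter per cycle."""
    rows = []
    for i in range(num_instructions):
        start = i // issue_width  # instructions in the same group share a start cycle
        rows.append(["."] * start + list(stages))
    width = max(len(row) for row in rows)
    # Pad every row to the same number of cycles.
    return [row + ["."] * (width - len(row)) for row in rows]


def total_cycles(num_instructions, issue_width=2, num_stages=len(STAGES)):
    """Cycles needed in this ideal model: stages + number of issue groups - 1."""
    groups = -(-num_instructions // issue_width)  # ceiling division
    return num_stages + groups - 1


if __name__ == "__main__":
    for i, row in enumerate(superscalar_rows(6)):
        print(f"I{i + 1}: " + " ".join(f"{cell:<3}" for cell in row).rstrip())
    print("2-wide cycles:", total_cycles(6, issue_width=2))
    print("1-wide cycles:", total_cycles(6, issue_width=1))
```

In this idealised model six instructions finish in 7 cycles with two-way issue, compared with 10 cycles when only one instruction enters the pipeline per cycle. Real superscalar processors rarely sustain the ideal rate, because dependent instructions cannot always be dispatched together.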
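
The timing calculations from the worked examples earlier in these notes (the laundry loads and the six-stage machine) can be checked with a short script. This Python sketch is not part of the original lecture; all function names are hypothetical, and speedup is assumed to mean serial time divided by pipelined time. The helper `unclocked_pipeline_time` models the laundry case, where a load moves on as soon as the next machine is free, which is an assumption about how the 3.5 hour figure was obtained.

```python
# Illustrative sketch (not from the lecture): checking the worked timing examples.

def serial_time(stage_times, n):
    """Time for n tasks when each task finishes all stages before the next starts."""
    return n * sum(stage_times)


def pipeline_time(k, n, tp):
    """The notes' formula (k + n - 1) * tp for k equal stages of length tp."""
    return (k + n - 1) * tp


def stage_length(stage_times, overhead=0):
    """Pipe stage length: slowest unpipelined stage plus overhead (e.g. clock skew)."""
    return max(stage_times) + overhead


def unclocked_pipeline_time(stage_times, n):
    """Assumed model for unequal stages with no common clock: the first task
    passes through every stage, then one task completes per bottleneck period."""
    return sum(stage_times) + (n - 1) * max(stage_times)


def speedup(t_serial, t_pipelined):
    """Assumed definition: serial time divided by pipelined time."""
    return t_serial / t_pipelined


if __name__ == "__main__":
    laundry = [30, 40, 20]  # wash, dry, iron (minutes)
    t_serial = serial_time(laundry, 4)              # 360 min
    t_pipe = unclocked_pipeline_time(laundry, 4)    # 210 min = 3.5 hours
    print(t_serial, t_pipe, round(speedup(t_serial, t_pipe), 2))

    machine = [50, 50, 60, 60, 50, 50]  # stage lengths (ns)
    tp = stage_length(machine, overhead=5)          # 65 ns
    print(sum(machine), serial_time(machine, 100))  # 320 ns latency, 32000 ns total
    print(len(machine) * tp, pipeline_time(len(machine), 100, tp))  # 390 ns, 6825 ns
    print(round(speedup(serial_time(machine, 100), pipeline_time(6, 100, tp)), 2))
```

Under these assumptions the laundry speedup is about 1.71 and the six-stage machine speedup is about 4.69. Applying the equal-stage formula to the laundry with tp = 40 min would instead give (3 + 4 - 1) × 40 = 240 min, which shows why the formula is stated only for stages of equal length.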