Pipelining in Computer Architecture PDF

Computer Architecture Pipelining Isaac Osei Nyantakyi (Ph.D) Chapter objective Basic concepts: Understand the fundamental concept of pipelining in CPU architecture. Pipeline stages: Identify and describe the key stages of a typical CPU pipeline. Hazard identification:Identify the types of hazards that can occur in a pipeline. Introduction 1 What is Pipelining? 2 The Major Hurdle of Pipelining-Structural Hazards – Data Hazards – Control Hazards 3 How is Pipelining Implemented 4 What Makes Pipelining Hard to Implement? 5 Extending the MIPS Pipeline to Handle Multi-cycle Operations * What Is Pipelining Laundry Example Ann, Brian, Cathy, Dave A B C D each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer takes 40 minutes “Folder” takes 20 minutes * What Is Pipelining 6 PM 7 8 9 10 11 Midnight Time 30 40 20 30 40 20 30 40 20 30 40 20 T a A s k B O r d C e r D Sequential laundry takes 6 hours for 4 loads If they learned pipelining, how long would laundry take? * What Is Pipelining Start work ASAP 6 PM 7 8 9 10 11 Midnight Time 30 40 40 40 40 20 T a A s Pipelined laundry takes k 3.5 hours for 4 loads B O r d C e r D * What Is Pipelining Lessons Pipelining Pipelining doesn’t help 6 PM 7 8 9 latency of single task, it helps throughput of entire workload Time Pipeline rate limited by T slowest pipeline stage a 30 40 40 40 40 20 Multiple tasks operating s simultaneously k A Potential speedup = Number O pipe stages r B Unbalanced lengths of pipe d stages reduces speedup e Time to “fill” pipeline and time r C to “drain” it reduces speedup D * Pipelining Pipelining is an implementation technique whereby multiple instructions are overlapped in execution – Takes advantage of parallelism that exists among the actions needed to execute an instruction fetch instruction from memory decode to figure out what to do read source operands execute write results * The Basic Pipeline For MIPS Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 ALU Ifetch Reg DMem Reg I n s ALU Reg t Ifetch Reg DMem r. ALU O Ifetch Reg DMem Reg r d ALU e Ifetch Reg DMem Reg r Figure 3.3 * Pipeline Hurdles Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle – Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away) – Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock) – Control hazards: Pipelining of branches & other instructions that change the PC – Common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline * Pipeline Hurdles Definition conditions that lead to incorrect behavior if not fixed Structural hazard – two different instructions use same h/w in same cycle Data hazard – two different instructions use same storage – must appear as if the instructions execute in correct order Control hazard – one instruction affects which instruction is next Resolution Pipeline interlock logic detects hazards and fixes them simple solution: stall increases CPI, decreases performance better solution: partial stall some instruction stall, others proceed better to stall early than late * Structural Hazards When two or Time (clock cycles) more different instructions want Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 to use same hardware I Load Ifetch ALU Reg DMem Reg resource in same n cycle s Instr 1 ALU Ifetch Reg DMem Reg e.g., MEM uses t the same memory r. port as IF as ALU Ifetch Reg DMem Reg Instr 2 shown in this slide. O ALU r Instr 3 Ifetch Reg DMem Reg d e ALU Reg Instr 4 Ifetch Reg DMem r Figure 3.6 * Structural Hazards Time (clock cycles) Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 This is another way of looking at the effect of I Load Ifetch ALU Reg DMem Reg a stall. n s ALU t Instr 1 Ifetch Reg DMem Reg r. ALU Reg Instr 2 Ifetch Reg DMem O r Stall Bubble Bubble Bubble Bubble Bubble d e r ALU Instr 3 Ifetch Reg DMem Reg Figure 3.7 * Structural Hazards This is another way to represent the stall we saw on the last few pages. * Structural Hazards Dealing with Structural Hazards Stall low cost, simple Increases CPI use for rare case since stalling has performance effect Pipeline hardware resource useful for multi-cycle resources good performance sometimes complex e.g., RAM Replicate resource good performance increases cost (+ maybe interconnect delay) useful for cheap or divisible resources * Structural Hazards Structural hazards are reduced with these rules: Each instruction uses a resource at most once Always use the resource in the same pipeline stage Use the resource for one cycle only Many RISC ISA’s are designed with this in mind Sometimes very complex to do this. For example, memory of necessity is used in the IF and MEM stages. Some common Structural Hazards: Memory Floating point - Since many floating point instructions require many cycles, it’s easy for them to interfere with each other. Starting up more of one type of instruction than there are resources. For instance, the PA-8600 can support two ALU + two load/store instructions per cycle - that’s how much hardware it has available. * Data Hazards These occur when at any time, there are instructions active that need to access the same data (memory or register) locations. Where there’s real trouble is when we have: instruction A instruction B and B manipulates (reads or writes) data before A does. This violates the order of the instructions, since the architecture implies that A completes entirely before B is executed. * Data Hazards Execution Order is: Read After Write (RAW) InstrJ tries to read operand before InstrI writes it InstrI InstrJ I: add r1,r2,r3 J: sub r4,r1,r3 Caused by a “Dependence” (in compiler nomenclature). This hazard results from an actual need for communication. * Data Hazards Execution Order is: Write After Read (WAR) InstrJ tries to write operand before InstrI reads i InstrI InstrJ – Gets wrong operand I: sub r4,r1,r3 J: add r1,r2,r3 K: mul r6,r1,r7 – Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”. Can’t happen in MIPS 5 stage pipeline because: – All instructions take 5 stages, and – Reads are always in stage 2, and – Writes are always in stage 5 * Data Hazards Execution Order is: Write After Write (WAW) InstrJ tries to write operand before InstrI writes it InstrI InstrJ – Leaves wrong result ( InstrI not InstrJ ) I: sub r1,r4,r3 J: add r1,r2,r3 K: mul r6,r1,r7 Called an “output dependence” by compiler writers This also results from the reuse of name “r1”. Can’t happen in MIPS 5 stage pipeline because: – All instructions take 5 stages, and – Writes are always in stage 5 Will see WAR and WAW in later more complicated pipes * Data Hazards Simple Solution to RAW Hardware detects RAW and stalls Assumes register written then read each cycle + low cost to implement, simple -- reduces IPC Try to minimize stalls Minimizing RAW stalls Bypass/forward/shortcircuit (We will use the word “forward”) Use data before it is in the register + reduces/avoids stalls -- complex Crucial for common RAW hazards * Data Hazards Time (clock cycles) IF ID/RF EX MEM WB I ALU add r1,r2,r3Ifetch Reg DMem Reg n s ALU Ifetch Reg DMem Reg t sub r4,r1,r3 r. ALU Ifetch Reg DMem Reg and r6,r1,r7 O r ALU Ifetch Reg DMem Reg d or r8,r1,r9 e ALU Reg r xor r10,r1,r11 Ifetch Reg DMem The use of the result of the ADD instruction in the next three instructions causes a hazard, since the register is not written until after those instructions read it. Figure 3.9 * Forwarding is the concept of making data available to the input of the ALU for Data Hazards subsequent instructions, even though the generating instruction hasn’t gotten to WB Forwarding To Avoid in order to write the memory or registers. Data Hazard Time (clock cycles) I n add r1,r2,r3 Ifetch ALU Reg DMem Reg s t ALU r. sub r4,r1,r3 Ifetch Reg DMem Reg O ALU Ifetch Reg DMem Reg r and r6,r1,r7 d e ALU Ifetch Reg DMem Reg r or r8,r1,r9 ALU Ifetch Reg DMem Reg xor r10,r1,r11 Figure 3.10 * The data isn’t loaded until after Data Hazards the MEM stage. Time (clock cycles) I lw r1, 0(r2) Ifetch ALU Reg DMem Reg n s t ALU sub r4,r1,r6 Ifetch Reg DMem Reg r. O ALU Ifetch Reg DMem Reg and r6,r1,r7 r d e ALU Ifetch Reg DMem Reg r or r8,r1,r9 There are some instances where hazards occur, even with forwarding. Figure 3.12 * The stall is necessary as shown Data Hazards here. Time (clock cycles) I n ALU Reg s lw r1, 0(r2) Ifetch Reg DMem t r. ALU sub r4,r1,r6 Ifetch Reg Bubble DMem Reg O r d Bubble ALU Ifetch Reg DMem Reg e and r6,r1,r7 r Bubble ALU Ifetch Reg DMem or r8,r1,r9 There are some instances where hazards occur, even with forwarding. Figure 3.13 * This is another representation Data Hazards of the stall. LW R1, 0(R2) IF ID EX MEM WB SUB R4, R1, R5 IF ID EX MEM WB AND R6, R1, R7 IF ID EX MEM WB OR R8, R1, R9 IF ID EX MEM WB LW R1, 0(R2) IF ID EX MEM WB SUB R4, R1, R5 IF ID stall EX MEM WB AND R6, R1, R7 IF stall ID EX MEM WB OR R8, R1, R9 stall IF ID EX MEM WB * Control Hazards A control hazard is when we need to find the destination of a branch, and can’t fetch any new instructions until we know that destination. * Control Hazard on Control Hazards Branches Three Stage Stall ALU 10: beq r1,r3,36 Ifetch Reg DMem Reg ALU Ifetch Reg DMem Reg 14: and r2,r3,r5 ALU Reg 18: or r6,r1,r7 Ifetch Reg DMem ALU Ifetch Reg DMem Reg 22: add r8,r1,r9 ALU 36: xor r10,r1,r11 Ifetch Reg DMem Reg * Dealing with Branches Multiple Streams Prefetch Branch Target Loop buffer Branch prediction Delayed branching Multiple Streams Have two pipelines Prefetch each branch into a separate pipeline Use appropriate pipeline Leads to bus & register contention Multiple branches lead to further pipelines being needed Prefetch Branch Target Target of branch is prefetched in addition to instructions following branch Keep target until branch is executed Used by IBM 360/91 Loop Buffer Very fast memory Maintained by fetch stage of pipeline Check buffer before fetching from memory Very good for small loops or jumps c.f. cache Used by CRAY-1 Branch Prediction (1) Predict never taken – Assume that jump will not happen – Always fetch next instruction – 68020 & VAX 11/780 – VAX will not prefetch after branch if a page fault would result (O/S v CPU design) Predict always taken – Assume that jump will happen – Always fetch target instruction Branch Prediction (2) Predict by Opcode – Some instructions are more likely to result in a jump than thers – Can get up to 75% success Taken/Not taken switch – Based on previous history – Good for loops Branch Prediction (3) Delayed Branch – Do not take jump until you have to – Rearrange instructions

Pipelining in Computer Architecture PDF

Document Details

Tags

Related

Summary

Full Transcript