

Forwarding

• Now, we'll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.

The pipelined datapath

[Figure: the pipelined datapath, with the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers and the control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite and MemToReg.]

Pipeline diagram review

Clock cycle       1    2    3    4    5    6    7    8    9
lw $8, 4($29)     IF   ID   EX   MEM  WB
sub $2, $4, $5         IF   ID   EX   MEM  WB
and $9, $10, $11            IF   ID   EX   MEM  WB
or $16, $17, $18                 IF   ID   EX   MEM  WB
add $13, $14, $0                      IF   ID   EX   MEM  WB

• This diagram shows the execution of an ideal code fragment.
  - Each instruction needs a total of five cycles for execution.
  - One instruction begins on every clock cycle for the first five cycles.
  - One instruction completes on each cycle from that time on.

Our examples are too simple

• Here is the example instruction sequence used to illustrate pipelining on the previous page.

    lw $8, 4($29)
    sub $2, $4, $5
    and $9, $10, $11
    or $16, $17, $18
    add $13, $14, $0

• The instructions in this example are independent.
  - Each instruction reads and writes completely different registers.
  - Our datapath handles this sequence easily, as we saw last time.
• But most sequences of instructions are not independent!

An example with dependencies

    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

• There are several dependencies in this new code fragment.
  - The first instruction, SUB, stores a value into $2.
  - That register is used as a source in the rest of the instructions.
• This is not a problem for the single-cycle and multicycle datapaths.
  - Each instruction is executed completely before the next one begins.
  - This ensures that instructions 2 through 5 above use the new value of $2 (the SUB result), just as we expect.
• How would this code sequence fare in our pipelined datapath?

Data hazards in the pipeline diagram

Clock cycle       1    2    3    4    5    6    7    8    9
sub $2, $1, $3    IF   ID   EX   MEM  WB
and $12, $2, $5        IF   ID   EX   MEM  WB
or $13, $6, $2              IF   ID   EX   MEM  WB
add $14, $2, $2                  IF   ID   EX   MEM  WB
sw $15, 100($2)                       IF   ID   EX   MEM  WB

• The SUB instruction does not write to register $2 until clock cycle 5. This causes two data hazards in our current pipelined datapath.
  - The AND reads register $2 in cycle 3. Since SUB hasn't modified the register yet, this will be the old value of $2, not the new one.
  - Similarly, the OR instruction uses register $2 in cycle 4, again before it's actually updated by SUB.

Things that are okay

• The ADD instruction is okay, because of the register file design.
  - Registers are written at the beginning of a clock cycle.
  - The new value is available to a read later in that same cycle, so the ADD, which reads $2 in cycle 5, sees the SUB result.
• The SW is no problem at all, since it reads $2 after the SUB finishes.

Dependency arrows

[Figure: the same pipeline diagram, with arrows drawn from the SUB to each later instruction that uses $2.]

• Arrows indicate the flow of data between instructions.
  - The tails of the arrows show when register $2 is written.
  - The heads of the arrows show when $2 is read.
• Any arrow that points backwards in time represents a data hazard in our basic pipelined datapath. Here, hazards exist between instructions 1 & 2 and 1 & 3.
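The "arrow points backwards in time" rule can be checked mechanically. The Python sketch below is a hypothetical software model, not part of the lecture: `Instr`, `cycle_of` and `hazards_without_forwarding` are illustrative names. It assumes one instruction enters IF every cycle with no stalls, that sources are read in ID and results written in WB, and that a read in the same cycle as the write is safe, as described for the ADD above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STAGES = ("IF", "ID", "EX", "MEM", "WB")


@dataclass
class Instr:
    text: str
    dest: Optional[int]       # register written in WB, or None (e.g. sw)
    sources: Tuple[int, ...]  # registers read in ID


def cycle_of(k: int, stage: str) -> int:
    """1-based clock cycle in which instruction k (0-based) is in `stage`,
    assuming one instruction enters IF every cycle and nothing stalls."""
    return k + 1 + STAGES.index(stage)


def hazards_without_forwarding(prog: List[Instr]) -> List[Tuple[int, int, int]]:
    """Return (writer, reader, register) triples, numbered from 1, whose
    dependency arrow points backwards in time: the reader's ID stage comes
    before the writer's WB stage. A read in the same cycle as the write is
    treated as safe (write early in the cycle, read later in it)."""
    hazards = []
    for j, writer in enumerate(prog):
        if writer.dest is None:
            continue
        for k in range(j + 1, len(prog)):
            reader = prog[k]
            if writer.dest in reader.sources and cycle_of(k, "ID") < cycle_of(j, "WB"):
                hazards.append((j + 1, k + 1, writer.dest))
            if reader.dest == writer.dest:
                break  # later readers depend on this newer write instead
    return hazards


prog = [
    Instr("sub $2, $1, $3", 2, (1, 3)),
    Instr("and $12, $2, $5", 12, (2, 5)),
    Instr("or $13, $6, $2", 13, (6, 2)),
    Instr("add $14, $2, $2", 14, (2, 2)),
    Instr("sw $15, 100($2)", None, (15, 2)),  # sw reads both $15 and $2
]

for k, ins in enumerate(prog):
    print(f"{ins.text:<16} ID in cycle {cycle_of(k, 'ID')}, WB in cycle {cycle_of(k, 'WB')}")
print(hazards_without_forwarding(prog))  # [(1, 2, 2), (1, 3, 2)]
```

For this fragment the model reports hazards only for instructions 1 & 2 and 1 & 3. The ADD reads $2 in cycle 5, the same cycle the SUB writes it, so it is not flagged.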
A fancier pipeline diagram

[Figure: the same five instructions drawn as a pipeline diagram over clock cycles 1-9, with IM, Reg, DM and Reg icons for each instruction's stages.]

A more detailed look at the pipeline

• We have to eliminate the hazards, so the AND and OR instructions in our example will use the correct value for register $2.
• Let's look at when the data is actually produced and consumed.
  - The SUB instruction produces its result in its EX stage, during cycle 3 in the diagram below.
  - The AND and OR need the new value of $2 in their EX stages, during clock cycles 4-5 here.

Clock cycle       1    2    3    4    5    6    7
sub $2, $1, $3    IF   ID   EX   MEM  WB
and $12, $2, $5        IF   ID   EX   MEM  WB
or $13, $6, $2              IF   ID   EX   MEM  WB

Bypassing the register file

• The actual result $1 - $3 is computed in clock cycle 3, before it's needed in cycles 4 and 5.
• If we could somehow bypass the writeback and register read stages when needed, then we could eliminate these data hazards.
  - Today we'll focus on hazards involving arithmetic instructions.
  - Next time, we'll examine the lw instruction.
• Essentially, we need to pass the ALU output from SUB directly to the AND and OR instructions, without going through the register file.

Where to find the ALU result

• The ALU result generated in the EX stage is normally passed through the pipeline registers to the MEM and WB stages, before it is finally written to the register file.
• This is an abridged diagram of our pipelined datapath.
[Figure: abridged pipelined datapath showing the PC, instruction memory, registers, ALU and data memory separated by the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers, with a mux choosing Rt or Rd as the destination register.]

Forwarding

• Since the pipeline registers already contain the ALU result, we could just forward that value to subsequent instructions, to prevent data hazards.
  - In clock cycle 4, the AND instruction can get the value $1 - $3 from the EX/MEM pipeline register used by SUB.
  - Then in cycle 5, the OR can get that same result from the MEM/WB pipeline register being used by SUB.

[Figure: pipeline diagram of sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2 over clock cycles 1-7.]

Outline of forwarding hardware

• A forwarding unit selects the correct ALU inputs for the EX stage.
  - If there is no hazard, the ALU's operands will come from the register file, just like before.
  - If there is a hazard, the operands will come from either the EX/MEM or MEM/WB pipeline registers instead.
• The ALU sources will be selected by two new multiplexers, with control signals named ForwardA and ForwardB.

Simplified datapath with forwarding muxes

[Figure: the abridged datapath with two new three-input multiplexers (inputs 0, 1, 2) in front of the ALU, controlled by ForwardA and ForwardB.]

Detecting EX/MEM data hazards

• So how can the hardware determine if a hazard exists?
• An EX/MEM hazard occurs between the instruction currently in its EX stage and the previous instruction if:
  1. The previous instruction will write to the register file, and
  2. The destination is one of the ALU source registers in the EX stage.
• There is an EX/MEM hazard between the two instructions below.

    sub $2, $1, $3
    and $12, $2, $5

• Data in a pipeline register can be referenced using a class-like syntax.
  For example, ID/EX.RegisterRt refers to the rt field stored in the ID/EX pipeline register.

EX/MEM data hazard equations

• The first ALU source comes from the EX/MEM pipeline register when necessary.

    if (EX/MEM.RegWrite = 1 and EX/MEM.RegisterRd = ID/EX.RegisterRs) then ForwardA = 2

• The second ALU source is similar.

    if (EX/MEM.RegWrite = 1 and EX/MEM.RegisterRd = ID/EX.RegisterRt) then ForwardB = 2

Detecting MEM/WB data hazards

• A MEM/WB hazard may occur between an instruction in the EX stage and the instruction from two cycles ago.
• One new problem is if a register is updated twice in a row.

    add $1, $2, $3
    add $1, $1, $4
    sub $5, $5, $1

• Register $1 is written by both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded.

MEM/WB hazard equations

• Here is an equation for detecting and handling MEM/WB hazards for the first ALU source.

    if (MEM/WB.RegWrite = 1 and MEM/WB.RegisterRd = ID/EX.RegisterRs
        and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs or EX/MEM.RegWrite = 0)) then ForwardA = 1

• The second ALU operand is handled similarly.

    if (MEM/WB.RegWrite = 1 and MEM/WB.RegisterRd = ID/EX.RegisterRt
        and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt or EX/MEM.RegWrite = 0)) then ForwardB = 1

• The last condition keeps the older MEM/WB value from being forwarded when EX/MEM holds a newer value for the same register, as in the double update of $1 above.

Simplified datapath with forwarding

[Figure: the datapath with the ForwardA and ForwardB multiplexers driven by a forwarding unit, whose inputs include ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd and MEM/WB.RegisterRd.]

The forwarding unit

• The forwarding unit takes several inputs: four register numbers and two control signals.

    ID/EX.RegisterRs     EX/MEM.RegisterRd     MEM/WB.RegisterRd
    ID/EX.RegisterRt     EX/MEM.RegWrite       MEM/WB.RegWrite

  (The two RegWrite signals are not shown in the diagram, but they come from the control unit.)
• The forwarding unit outputs are selectors for the ForwardA and ForwardB multiplexers attached to the ALU. These outputs are generated from the inputs using the equations on the previous pages.
• Some new buses route data from pipeline registers to the new muxes.

Example

    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

• Assume again each register initially contains its number plus 100.
  - After the first instruction, $2 should contain -2 (101 - 103).
  - The other instructions should all use -2 as one of their operands.
• We'll try to keep the example short.
  - Assume no forwarding is needed except for register $2.
  - We'll skip the first two cycles, since they're the same as before.

Clock cycle 3

[Figure: datapath snapshot. IF: or $13, $6, $2; ID: and $12, $2, $5; EX: sub $2, $1, $3, with the ALU computing 101 - 103 = -2.]

Clock cycle 4: forwarding $2 from EX/MEM

[Figure: datapath snapshot. IF: add $14, $2, $2; ID: or $13, $6, $2; EX: and $12, $2, $5; MEM: sub $2, $1, $3. The first ALU input receives -2 from EX/MEM instead of the old value 102 read from the register file.]

Clock cycle 5: forwarding $2 from MEM/WB

[Figure: datapath snapshot. IF: sw $15, 100($2); ID: add $14, $2, $2; EX: or $13, $6, $2; MEM: and $12, $2, $5; WB: sub $2, $1, $3. The second ALU input receives -2 from MEM/WB.]

Lots of data hazards

• The first data hazard occurs during cycle 4.
  - The forwarding unit notices that the ALU's first source register for the AND is also the destination of the SUB instruction.
  - The correct value is forwarded from the EX/MEM register, overriding the incorrect old value still in the register file.
• A second hazard occurs during clock cycle 5.
  - The ALU's second source (for OR) is the SUB destination again.
  - This time, the value has to be forwarded from the MEM/WB pipeline register instead.
• There are no other hazards involving the SUB instruction.
  - During cycle 5, SUB writes its result back into register $2.
  - The ADD instruction can read this new value from the register file in the same cycle.

Complete pipelined datapath...so far

[Figure: the full pipelined datapath with control, the ForwardA and ForwardB multiplexers and the forwarding unit.]

What about stores?
• Two “easy” cases:

    add $1, $2, $3            add $1, $2, $3
    sw $4, 0($1)              sw $1, 0($4)

  [Figure: pipeline diagrams of each pair over clock cycles 1-6.]

Store Bypassing: Version 1

[Figure: datapath snapshot with EX: sw $4, 0($1) and MEM: add $1, $2, $3.]

Store Bypassing: Version 2

[Figure: datapath snapshot with EX: sw $1, 0($4) and MEM: add $1, $2, $3.]

What about stores?

• A harder case:

    lw $1, 0($2)
    sw $1, 0($4)

  [Figure: pipeline diagram of the two instructions over clock cycles 1-6.]
• In what cycle is:
  - The load value available?
  - The store value needed?
• What do we have to add to the datapath?

Load/Store Bypassing: Extend the Datapath

[Figure: the datapath extended with a new multiplexer controlled by a ForwardC signal, for the sequence lw $1, 0($2) followed by sw $1, 0($4).]

Miscellaneous comments

• Each MIPS instruction writes to at most one register.
  - This makes the forwarding hardware easier to design, since there is only one destination register that ever needs to be forwarded.
• Forwarding is especially important with deep pipelines like the ones in all current PC processors.
• Section 6.4 of the textbook has some additional material not shown here.
  - Their hazard detection equations also ensure that the source register is not $0, which can never be modified.
  - There is a more complex example of forwarding, with several cases covered. Take a look at it!

Summary

• In real code, most instructions are dependent upon other ones.
  - This can lead to data hazards in our original pipelined datapath.
  - Instructions can't write back to the register file soon enough for the next two instructions to read.
• Forwarding eliminates data hazards involving arithmetic instructions.
  - The forwarding unit detects hazards by comparing the destination registers of previous instructions to the source registers of the current instruction.
  - Hazards are avoided by grabbing results from the pipeline registers before they are written back to the register file.
• Next, we'll finish up pipelining.
  - Forwarding can't save us in some cases involving lw.
  - We still haven't talked about branches for the pipelined datapath.
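To tie the summary's description of the forwarding unit back to the EX/MEM and MEM/WB equations, here is a small Python sketch of its logic. It is a hypothetical software model, not the lecture's hardware: `ForwardingInputs`, `select`, `forwarding_unit` and `alu_operand` are illustrative names, the 0/1/2 selector values follow the `ForwardA = 2` / `ForwardA = 1` encoding in the equations, and the `rd != 0` guard is the textbook's $0 check mentioned under the miscellaneous comments.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical model of the forwarding unit's six inputs; field names mirror
# the lecture's ID/EX.RegisterRs, EX/MEM.RegWrite, ... notation.
@dataclass
class ForwardingInputs:
    id_ex_rs: int
    id_ex_rt: int
    ex_mem_regwrite: bool
    ex_mem_rd: int
    mem_wb_regwrite: bool
    mem_wb_rd: int


# Mux encoding used by the equations: 0 = register file, 1 = MEM/WB, 2 = EX/MEM.
FROM_REGFILE, FROM_MEM_WB, FROM_EX_MEM = 0, 1, 2


def select(src: int, f: ForwardingInputs) -> int:
    """Forwarding-mux selector for one ALU source register number."""
    # EX/MEM hazard: the previous instruction writes this source register.
    # The rd != 0 guard is the textbook's extra check, since $0 is never modified.
    if f.ex_mem_regwrite and f.ex_mem_rd != 0 and f.ex_mem_rd == src:
        return FROM_EX_MEM
    # MEM/WB hazard. Reaching this line means the EX/MEM test failed, which
    # (apart from the $0 guard) is the "(EX/MEM.RegisterRd != src or
    # EX/MEM.RegWrite = 0)" clause, so a newer EX/MEM value always wins.
    if f.mem_wb_regwrite and f.mem_wb_rd != 0 and f.mem_wb_rd == src:
        return FROM_MEM_WB
    return FROM_REGFILE


def forwarding_unit(f: ForwardingInputs) -> Tuple[int, int]:
    """Return (ForwardA, ForwardB) for the instruction in the EX stage."""
    return select(f.id_ex_rs, f), select(f.id_ex_rt, f)


def alu_operand(sel: int, regfile: int, mem_wb: int, ex_mem: int) -> int:
    """Three-input forwarding mux in front of one ALU input."""
    return (regfile, mem_wb, ex_mem)[sel]


# Clock cycle 4 of the example: AND ($2, $5) in EX, SUB (writes $2) in MEM.
# Assumption: MEM/WB holds an instruction that does not write a register.
cycle4 = ForwardingInputs(id_ex_rs=2, id_ex_rt=5,
                          ex_mem_regwrite=True, ex_mem_rd=2,
                          mem_wb_regwrite=False, mem_wb_rd=0)
# Clock cycle 5: OR ($6, $2) in EX, AND (writes $12) in MEM, SUB (writes $2) in WB.
cycle5 = ForwardingInputs(id_ex_rs=6, id_ex_rt=2,
                          ex_mem_regwrite=True, ex_mem_rd=12,
                          mem_wb_regwrite=True, mem_wb_rd=2)
# Double update: sub $5, $5, $1 in EX, add $1, $1, $4 in MEM, add $1, $2, $3 in WB.
twice = ForwardingInputs(id_ex_rs=5, id_ex_rt=1,
                         ex_mem_regwrite=True, ex_mem_rd=1,
                         mem_wb_regwrite=True, mem_wb_rd=1)

print(forwarding_unit(cycle4))  # (2, 0)
print(forwarding_unit(cycle5))  # (0, 1)
print(forwarding_unit(twice))   # (0, 2)

# In cycle 4 the register file still returns the old $2 (102), but the mux
# picks the SUB result -2 from EX/MEM (MEM/WB input unused, so 0 here).
a, _ = forwarding_unit(cycle4)
print(alu_operand(a, regfile=102, mem_wb=0, ex_mem=-2))  # -2
```

Checking the EX/MEM case first and returning early is one simple way to express the priority rule: the MEM/WB path is taken only when no newer value for the same register sits in EX/MEM.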
