Lec19-4471029-MIPS Pipelining (Contd) PDF
Summary
These lecture notes discuss pipelining in MIPS architecture, covering the instruction processing cycle and different stages in a single-cycle Uarch. The notes include diagrams and figures.
Full Transcript
Pipelining: Pipelining Instruction Processing
471029: Introduction to Computer Architecture, 19th Lecture
Disclaimer: Slides are mainly based on the COD 5th textbook and were also developed in part by Profs. Dohyung Kim @ KNU and the computer architecture courses @ KAIST and SKKU.

Remember: The Instruction Processing Cycle
Generic phases:
- Fetch
- Decode
- Evaluate Address
- Fetch Operands
- Execute
- Store Result
Mapped onto five steps:
1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)

Remember: the Single-Cycle Uarch
[Figure: single-cycle MIPS datapath with the control unit and its signals (RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite). PCSrc1 = Jump, PCSrc2 = Br Taken (bcond).]
The whole datapath takes time T per instruction; BW = ~(1/T).

Dividing Into Stages
- IF: Instruction fetch (200 ps)
- ID: Instruction decode/register file read (100 ps)
- EX: Execute/address calculation (200 ps)
- MEM: Memory access (200 ps)
- WB: Write back (100 ps)
[Figure: the single-cycle datapath cut into the five stages above; the RF write path is annotated "ignore for now".]
Is this the correct partitioning? Why not 4 or 6 stages? Why not different boundaries?
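To make the partitioning question concrete, here is a minimal sketch that compares the clock period of the undivided datapath (the sum of the stage latencies) with that of the staged datapath (the slowest stage). The names `STAGE_LATENCY_PS`, `single_cycle_period`, `pipelined_period`, and `total_time` are made up for illustration, and the sketch assumes zero pipeline register overhead and no stalls.

```python
# Stage latencies from the "Dividing Into Stages" slide, in picoseconds.
STAGE_LATENCY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_period(latencies):
    """One long cycle must cover every stage back to back."""
    return sum(latencies.values())


def pipelined_period(latencies):
    """All stages share one clock, so the slowest stage sets the period."""
    return max(latencies.values())


def total_time(n_instructions, latencies, pipelined):
    """Time to finish n instructions (assumption: no stalls, no register overhead)."""
    if not pipelined:
        return n_instructions * single_cycle_period(latencies)
    # The first instruction needs one cycle per stage; each later one adds a cycle.
    cycles = len(latencies) + n_instructions - 1
    return cycles * pipelined_period(latencies)


if __name__ == "__main__":
    t_single = single_cycle_period(STAGE_LATENCY_PS)
    t_pipe = pipelined_period(STAGE_LATENCY_PS)
    print("single-cycle period:", t_single, "ps")
    print("pipelined period:   ", t_pipe, "ps")
    print("steady-state speedup:", t_single / t_pipe)
```

With these latencies the periods are 800 ps and 200 ps. The ratio falls short of the stage count because the stages are not balanced, which is one reason the choice of boundaries matters.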
Instruction Pipeline Throughput
[Figure: timing of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Axes: program execution order (in instructions) against time (200 ps to 1800 ps). Top panel, non-pipelined: each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg in 800 ps, and a new instruction starts every 800 ps. Bottom panel, pipelined: each stage slot is 200 ps, and a new instruction starts every 200 ps.]
5-stage speedup is 4, not 5 as predicted by the ideal model. Why?

Enabling Pipelined Processing: Pipeline Registers
- Need registers between stages
  - To hold information produced in the previous cycle
- No resource is used by more than 1 stage!
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Register fields shown: PCF, IRD, PCD+4, PCE+4, AE, BE, ImmE, nPCM, AoutM, BM, MDRW, AoutW. The undivided datapath takes T ps; each of the k stages takes T/k ps.]

Pipelined Operation Example
Regularity!
All instruction classes must follow the same path and timing through the pipeline stages.
Any performance impact?
[Figure: a lw instruction moving through the pipelined datapath one stage per clock: Instruction fetch, Instruction decode, Execution, Memory, Write back.]

Pipelined Operation Example (cont'd)
[Figure: lw $10, 20($1) followed by sub $11, $2, $3 on the pipelined datapath, Clock 1 through Clock 6.]
Clock 1: lw in Instruction fetch
Clock 2: sub in Instruction fetch, lw in Instruction decode
Clock 3: sub in Instruction decode, lw in Execution
Clock 4: sub in Execution, lw in Memory
Clock 5: sub in Memory, lw in Write back
Clock 6: sub in Write back
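The clock-by-clock example above can be reproduced with a short trace. This is only a sketch: `STAGES`, `stage_at`, and `trace` are illustrative names, and it assumes one instruction is fetched per clock with no hazards or stalls (the lw and sub here do not depend on each other).

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(fetch_clock, clock):
    """Stage occupied at `clock` by an instruction fetched at `fetch_clock`.

    Clocks are 1-based. Returns None if the instruction is not in the pipeline.
    """
    index = clock - fetch_clock
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def trace(program):
    """Return one dict per clock mapping stage name -> instruction text."""
    n_clocks = len(program) + len(STAGES) - 1
    rows = []
    for clock in range(1, n_clocks + 1):
        active = {}
        for i, instr in enumerate(program):
            stage = stage_at(i + 1, clock)  # instruction i is fetched at clock i + 1
            if stage is not None:
                active[stage] = instr
        rows.append(active)
    return rows


if __name__ == "__main__":
    program = ["lw $10, 20($1)", "sub $11, $2, $3"]
    for clock, active in enumerate(trace(program), start=1):
        busy = ", ".join(f"{s}: {active[s]}" for s in STAGES if s in active)
        print(f"Clock {clock}: {busy}")
```

Each instruction takes the same five steps in the same order, so its position depends only on how many clocks have passed since it was fetched.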
Illustrating Pipeline Operation: Operation View

       t0   t1   t2   t3   t4   t5   t6   t7
Inst0  IF   ID   EX   MEM  WB
Inst1       IF   ID   EX   MEM  WB
Inst2            IF   ID   EX   MEM  WB
Inst3                 IF   ID   EX   MEM  WB
Inst4                      IF   ID   EX   MEM
                                IF   ID   EX
                                     IF   ID
                                          IF

Steady state (full pipeline): from t4 on, all five stages are busy.

Illustrating Pipeline Operation: Resource View

      t0  t1  t2  t3  t4  t5  t6  t7  t8  t9  t10
IF    I0  I1  I2  I3  I4  I5  I6  I7  I8  I9  I10
ID        I0  I1  I2  I3  I4  I5  I6  I7  I8  I9
EX            I0  I1  I2  I3  I4  I5  I6  I7  I8
MEM               I0  I1  I2  I3  I4  I5  I6  I7
WB                    I0  I1  I2  I3  I4  I5  I6

Control Points in a Pipeline
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with its control points marked: PCSrc, Branch, RegWrite, MemWrite, MemRead, ALUSrc, MemtoReg, ALUOp, RegDst. ALU control takes Instruction [15-0] (function field) and ALUOp; RegDst selects between Instruction [20-16] and Instruction [15-11].]
Identical set of control points as the single-cycle datapath!!
Control Signals in a Pipeline
- For a given instruction
  - same control signals as single-cycle, but
  - control signals are required at different cycles, depending on the stage
-> Option 1: decode once using the same logic as single-cycle and buffer the signals until consumed
[Figure: Control decodes the instruction into EX, M, and WB signal groups. ID/EX carries EX, M, and WB; EX/MEM carries M and WB; MEM/WB carries WB.]
-> Option 2: carry the relevant "instruction word/field" down the pipeline and decode locally within each stage or in a previous stage
Which one is better?
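As an illustration of Option 1, the sketch below decodes an instruction class once and then passes a shrinking bundle of control signals through the pipeline registers. The grouping into EX, M, and WB follows the figure on the slide. The signal values for lw and R-format are taken from the standard COD control table rather than from these slides, so treat them as an assumption, and `CONTROL`, `decode_once`, and `flow_through_pipeline` are hypothetical names.

```python
# Control values per instruction class, grouped by the stage that consumes them.
# Assumption: values follow the usual COD control table; only two classes are shown.
CONTROL = {
    "lw": {
        "EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
}


def decode_once(instr_class):
    """Decode in the ID stage, exactly once, producing all three signal groups."""
    groups = CONTROL[instr_class]
    return {name: dict(signals) for name, signals in groups.items()}


def flow_through_pipeline(instr_class):
    """Show which control groups each pipeline register holds for one instruction."""
    id_ex = decode_once(instr_class)                    # holds EX, M, WB
    ex_mem = {"M": id_ex["M"], "WB": id_ex["WB"]}       # EX group consumed in EX
    mem_wb = {"WB": ex_mem["WB"]}                       # M group consumed in MEM
    return {"ID/EX": id_ex, "EX/MEM": ex_mem, "MEM/WB": mem_wb}


if __name__ == "__main__":
    for reg, bundle in flow_through_pipeline("lw").items():
        print(reg, "carries", sorted(bundle))
```

The bundle loses one group per stage, so later pipeline registers need fewer control bits than ID/EX.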