CS471 Parallel Processing Lecture 2 PDF
Document Details
Uploaded by PropitiousNephrite4270
Dr. Ahmed Hesham Mostafa
Tags
Related
- (The Morgan Kaufmann Series in Computer Architecture and Design) David A. Patterson, John L. Hennessy - Computer Organization and Design RISC-V Edition_ The Hardware Software Interface-Morgan Kaufmann-102-258.pdf
- (The Morgan Kaufmann Series in Computer Architecture and Design) David A. Patterson, John L. Hennessy - Computer Organization and Design RISC-V Edition_ The Hardware Software Interface-Morgan Kaufmann-102-258-pages-5.pdf
- Digital Design and Computer Architecture RISC-V Edition PDF
- Computer Organization and Architecture Notes & Ebook PDF
- Chapter 4 The Processor PDF
- Computer Architecture: CISC vs RISC (PDF)
Summary
These lecture notes cover parallel processing and RISC architecture, including instruction pipelining and the five stages of a pipeline. The document includes diagrams and examples of instruction executions.
Full Transcript
CS471- Parallel Processing Dr. Ahmed Hesham Mostafa Lecture 2 – RISC and Instruction Pipelining Instruction Pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the proc...
CS471- Parallel Processing Dr. Ahmed Hesham Mostafa Lecture 2 – RISC and Instruction Pipelining Instruction Pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps. is a process of arrangement of hardware elements of the CPU such that its overall performance is increased. Simultaneous execution of more than one instruction takes place in a pipelined processor. A processor is said to be fully pipelined if it can fetch an instruction on every cycle. Thus, if some instructions or conditions require delays that inhibit fetching new instructions, the processor is not fully pipelined. Pipeline is also called Instruction Level Parallelism (ILP). Pipeline Stages : The number of dependent steps varies with the machine architecture. For example: The 1956–1961 IBM Stretch project proposed the terms Fetch, Decode, and Execute that have become common. The classic RISC pipeline comprises: Instruction fetch Instruction decode and register fetch Execute Memory access Register write back The Atmel AVR and the PIC microcontroller each have a two-stage pipeline. CISC V.S RISC CISC is a complex instruction , which can perform multiple operations and uses a complex hardware: Example MAC R0,R1,R3 which means R0 = ( R1 * R3 ) + R0. Example ADD [R0],[R1],[R3] which means add the value present in the address in the R1 to the value present in the address in the R3 and store the result in the address present in the R0 CISC V.S RISC programs RISC is a reduced instruction set , which can only perform simple instructions and uses simple hardware. Example ADD R0,R1,R3 which means R0 = R1 + R3; Example LD R0,[R1] which means move R0 register the value present in the address that present in the R1. RISC: Reduced Instruction Set Computer Properties: All operations on data apply to data in registers and typically change the entire register (32 or 64 bits per reg); Only load and store operations affect memory; load: move data from mem to reg; store: move data from reg to mem; Only a few instruction formats; all instructions typically being one size. RISC: Reduced Instruction Set Computer RISC has 32 registers and 3 classes of instructions. The first class is ALU instruction. It usually operates on two registers and stores the result in a third register. Typical ALU instructions include add, subtract, and logical operations such as AND, OR. RISC: Reduced Instruction Set Computer The second class is Load and Store instructions that affect memory. They first get the memory address by adding the base register and an offset. The load instruction uses a second register operand as the data destination while the store instruction uses a second register as the data source. Load (LD) and store (SD) instructions operands: base register + offset; the sum (called effective address) is used as a memory address; RISC: Reduced Instruction Set Computer The third class is branches and jumps that make conditional transfers of control. A branch instruction consists of two phases. The first phase specifies the branch condition to decide whether the branch should take effect. It’s just like an if-else in programming language. The branch condition can be verified via a set of condition bits or comparison between two registers. If the branch condition is satisfied, the second phase decides the branch destination. That is, where to fetch the next instruction. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction IF ID EX MEM WB Instruction Fetch cycle The first cycle is IF for fetching the current instruction from memory. To do so, the processor sends the program counter to memory, fetches the current instruction from there and then increments the program counter by 4. ( as each mem cell 8 bit and instruction is 32 bit so instruction need 4 mem cells) PC = PC + 4; //each instr is 4 bytes RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction IF ID EX MEM WB Instruction Decode/register fetch cycle decode the instruction; read the registers (corresponding to register source specifiers); RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction IF ID EX MEM WB Execution/effective address cycle The third cycle is EX for calculating the effective memory address. ALU operates on the operands fetched in the ID cycle. There are 3 classes of functions according to the instruction type. The first class is memory reference, ALU adds base register and offset to form effective address. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction IF ID EX MEM WB Execution/effective address cycle The second class is register-register ALU instruction: in this case, ALU performs the operation specified by opcode on the values in the register file. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction IF ID EX MEM WB Execution/effective address cycle The third class is register-immediate ALU instruction. For this type, ALU operates on the first value read from the register file and the sign-extended immediate. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction IF ID EX MEM WB MEMory access The fourth cycle is MEM for memory access. there are two types of instructions affecting memory. For load instruction, the memory does a read using the effective address. For store instruction, the memory writes some data using the effective address. RISC: Reduced Instruction Set Computer at most 5 clock cycles per instruction IF ID EX MEM WB Write-Back cycle The fifth cycle is WB for writing the result into the register file. The result may come from the memory if it’s a load instruction or from the ALU if it’s an ALU instruction. RISC Instruction Format Instructions are divided into three types: R (register), I (immediate), and J (jump). Every instruction starts with a 6-bit opcode. In addition to the opcode, R-type instructions specify three registers,a shift amount field, and a function field; I-type instructions specify two registers and a 16-bit immediate value; J-type instructions follow the opcode with a 26-bit jump target. 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits op rs rt rd shamt funct R-Format 6 bits 5 bits 5 bits 16 bits op rs rt offset I-Format 6 bits 26 bits op address J-Format Set of Instructions op rs rt rd sh fn 31 25 20 15 10 5 0 R 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits Opcode Source 1 Source 2 Destination Unused Opcode ext I or base or dest’n imm Operand / Offset, 16 bits J jta Jump target address, 26 bits inst Instruction, 32 bits Fig. 13.1 MicroMIPS instruction formats and naming of the various fields. We will refer to this diagram later Seven R-format ALU instructions (add, sub, slt, and, or, xor, nor) Six I-format ALU instructions (lui, addi, slti, andi, ori, xori) Two I-format memory access instructions (lw, sw) Three I-format conditional branch instructions (bltz, beq, bne) Four unconditional jump instructions (j, jr, jal, syscall) The Instruction Usage op fn MicroMIPS Copy Load upper immediate lui rt,imm 15 Instruction Add add rd,rs,rt 0 32 Subtract sub rd,rs,rt 0 34 Set Set less than slt rd,rs,rt 0 42 Arithmetic Add immediate addi rt,rs,imm 8 Including Set less than immediate slti rd,rs,imm 10 la rt,imm(rs) AND and rd,rs,rt 0 36 (load address) OR or rd,rs,rt 0 37 makes it easier XOR xor rd,rs,rt 0 38 Logic NOR nor rd,rs,rt 0 39 to write useful AND immediate andi rt,rs,imm 12 programs OR immediate ori rt,rs,imm 13 XOR immediate xori rt,rs,imm 14 Load word lw rt,imm(rs) 35 Memory access Store word sw rt,imm(rs) 43 Jump j L 2 Jump register jr rs 0 8 Branch less than 0 bltz rs,L 1 Control transfer Branch equal beq rs,rt,L 4 Branch not equal bne rs,rt,L 5 Jump and link jal L 3 Table 13.1 System call syscall 0 12 A 5-Stage Pipeline (ADD instruction) ADD R0, R1,R3 Program Memory IF ID ID ExecuteMemory Memory Store Store ADD R0, R1,R3 0x0056 0ab0f 0x01f00 PC = 0x1f00 IR = Data Memory R0 R1 R3 0x 1224 0a00 CPU A 5-Stage Pipeline (ADD instruction) ADD R0, R1,R3 Program Memory IF ID ID ExecuteMemory Memory Store Store ADD R0, R1,R3 0x1856 0000 0x01f00 PC = 0x1f04 IR =0056 ab0f Data Memory R0 R1 R3 0x 1224 0a00 CPU A 5-Stage Pipeline (ADD instruction) ADD R0, R1,R3 Program Memory IF ID ID ExecuteMemory Memory Store Store ADD R0, R1,R3 0x1856 0000 0x01f00 PC = 0x1f04 IR =0056 ab0f Data Memory R0 R1 to ALU latch R3 to ALU latch 0x 1224 0a00 CPU A 5-Stage Pipeline (ADD instruction) ADD R0, R1,R3 Program Memory IF ID ID ExecuteMemory Memory Store Store ADD R0, R1,R3 0x1856 0000 0x01f00 PC = 0x1f04 IR =0056 ab0f Data Memory R0 R1 R1 + R3 R3 0x 1224 0a00 CPU A 5-Stage Pipeline (ADD instruction) ADD R0, R1,R3 Program Memory IF ID ID ExecuteMemory Memory Store Store ADD R0, R1,R3 0x1856 0000 0x01f00 Memory access stage is a dummy state in the Add instruction PC = 0x1f04 IR =0056 ab0f Data Memory R0 R1 R1 + R3 R3 0x 1224 0a00 CPU A 5-Stage Pipeline (ADD instruction) ADD R0, R1,R3 Program Memory IF ID ID ExecuteMemory Memory Store Store ADD R0, R1,R3 0x1856 0000 0x01f00 PC = 0x1f04 IR =0056 ab0f Data Memory R0 = R1 + R3 R1 R1 + R3 R3 0x 1224 0a00 CPU A 5-Stage Pipeline (LD instruction) LD R1,24[R0] Program Memory IF ID ID ExecuteMemory Memory Store Store LD R1,24[R0] 0x1856 0000 0x01f00 PC = 0x1f00 IR = Data Memory R0 R1 0x 1224 0a00 CPU A 5-Stage Pipeline (LD instruction) LD R1,24[R0] Program Memory IF ID ID ExecuteMemory Memory Store Store LD R1,24[R0] 0x1856 0000 0x01f00 PC = 0x1f04 IR =1856 0000 Data Memory R0 R1 0x 1224 0a00 CPU A 5-Stage Pipeline (LD instruction) LD R1,24[R0] Program Memory IF ID Execute Memory Store LD R1,24[R0] 0x1856 0000 0x01f00 PC = 0x1f04 IR =1856 0000 Data Memory R0 to ALU latch R1 0x 1224 0a00 CPU A 5-Stage Pipeline (LD instruction) LD R1,24[R0] Program Memory IF ID Execute Memory Store LD R1,24[R0] 0x1856 0000 0x01f00 PC = 0x1f04 IR =1856 0000 Data Memory R0 ALU Latch + 24 0x 1224 0a00 R1 CPU A 5-Stage Pipeline (LD instruction) LD R1,24[R0] Program Memory IF ID Execute Memory Store LD R1,24[R0] 0x1856 0000 0x01f00 PC = 0x1f04 IR =1856 0000 Data Memory R0 ALU Latch + 24 0x 1224 0a00 R1 = CPU A 5-Stage Pipeline (LD instruction) LD R1,24[R0] Program Memory IF ID Execute Memory Store LD R1,24[R0] 0x1856 0000 0x01f00 PC = 0x1f04 IR =1856 0000 Data Memory R0 ALU Latch + 24 0x 1224 0a00 R1 = 0x 1224 0a00 CPU A 5-Stage Pipeline (Branch instruction) If the instruction decoded was a branch or jump, the target address of the branch or jump was computed in parallel with reading the register file. The branch condition is computed after the register file is read, and if the branch is taken or if the instruction is a jump, the PC predictor in the first stage is assigned the branch target, rather than the incremented PC that has been computed. So the branch instruction and call instruction are completed in two cycles. A 5-Stage Pipeline (Branch instruction) Program Memory BEZ R2,100; // If R2 is zero branch to location PC+100. IF ID Execute Memory Store BEZ R2,100; 0x10A6 0027 0x01f00 PC = 0x1f00 IR = Data Memory ALU Latch 0x 1224 0a00 R2 = CPU A 5-Stage Pipeline (Branch instruction) Program Memory BEZ R2,100; // If R2 is zero branch to location PC+100. IF ID Execute Memory Store BEZ R2,100; 0x10A6 0027 0x01f00 PC = 0x1f04 IR =10A6 0027 Data Memory ALU Latch 0x 1224 0a00 R2 = CPU A 5-Stage Pipeline (Branch instruction) Program Memory BEZ R2,100; // If R2 is zero branch to location PC+100. IF ID Execute Memory Store BEZ R2,100; 0x10A6 0027 0x01f00 PC = 0x1f04 IR =10A6 0027 Data Memory TB = PC + 100 ALU Latch Compare against zero 0x 1224 0a00 R2 to ALU Latch CPU A 5-Stage Pipeline (Branch instruction) Program Memory BEZ R2,100; // If R2 is zero branch to location PC+100. IF ID Execute Memory Store BEZ R2,100; 0x10A6 0027 0x01f00 If the Branch to be taken , the PC is updated PC = TB (PC+100) IR =10A6 0027 Data Memory TB = PC + 100 ALU Latch Compare against zero 0x 1224 0a00 R2 to ALU Latch CPU A 5-Stage Pipeline (Branch instruction) Program Memory BEZ R2,100; // If R2 is zero branch to location PC+100. IF ID Execute Memory Store BEZ R2,100; 0x10A6 0027 0x01f00 The Execute stage has no additional work to do. PC = TB (PC+100) IR =10A6 0027 Data Memory TB = PC + 100 ALU Latch Compare against zero 0x 1224 0a00 R2 to ALU Latch CPU A 5-Stage Pipeline (Branch instruction) Program Memory BEZ R2,100; // If R2 is zero branch to location PC+100. IF ID Execute Memory Store BEZ R2,100; 0x10A6 0027 0x01f00 The Memory stage has no additional work to do. PC = TB (PC+100) IR =10A6 0027 Data Memory TB = PC + 100 ALU Latch Compare against zero 0x 1224 0a00 R2 to ALU Latch CPU A 5-Stage Pipeline (Branch instruction) Program Memory BEZ R2,100; // If R2 is zero branch to location PC+100. IF ID Execute Memory Store BEZ R2,100; 0x10A6 0027 0x01f00 The Store stage has no additional work to do. PC = TB (PC+100) IR =10A6 0027 Data Memory TB = PC + 100 ALU Latch Compare against zero 0x 1224 0a00 R2 to ALU Latch CPU RISC: Five-Stage Pipeline Simply start a new instruction on each clock cycle; Speedup = 5. RISC: Five-Stage Pipeline To achieve the pipelining, we need separate instruction memory and data memory. Because both IF and MEM require memory access, if there is only one single memory, IF and MEM will have memory conflicts during the pipelining process. RISC: Five-Stage Pipeline separate instruction and data mems to eliminate conflicts for a single memory between instruction fetch and data memory access. Instr mem Data mem IF MEM RISC: Five-Stage Pipeline How it works introduce pipeline registers between successive stages; pipeline registers store the results of a stage and use them as the input of the next stage. RISC: Five- Stage Pipeline high level paradigm of RISC pipelining Pipelined datapath and control 1 0 ID/EX WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P Add C Shift RegWrite left 2 Read Read register 1 data 1 MemWrite ALU Read Instruction Zero Read Read address [31-0] 0 register 2 data 2 Result Address Write 1 Data Instruction register MemToReg memory memory Registers ALUOp Write data ALUSrc Write Read 1 data data Instr [15 - 0] Sign RegDst extend MemRead 0 Instr [20 - 16] 0 Instr [15 - 11] 1 45 An example execution sequence ▪ Here’s a sample sequence of instructions to execute 1000: lw $8, 4($29) 1004: sub $2, $4, $5 1008: and $9, $10, $11 1012: or $16, $17, $18 1016: add $13, $14, $0 ▪ We’ll make some assumptions, just so we can show actual data values: — Each register contains its number plus 100. For instance, register $8 contains 108, register $29 contains 129, etc. — Every data memory location contains 99 ▪ Our pipeline diagrams will follow some conventions: — An X indicates values that aren’t important, like the constant field of an R-type instruction — Question marks ??? indicate values we don’t know, usually resulting from instructions coming before and after the ones in our example 46 Cycle 1 (filling) IF: lw $8, 4($29) ID: ??? EX: ??? MEM: ??? WB: ??? 1 ID/EX 0 WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P 1004 Add C Shift RegWrite (?) left 2 ??? ??? ??? 1000 Read Read register 1 data 1 MemWrite (?) ALU ReadInstruction ??? ??? Zero Read Read ??? ??? address [31-0] 0 register 2 data 2 Result Address ??? Write ??? MemToReg Data Instruction register 1 (?) ??? Registers ALUOp (???) memory memory Write ??? data ALUSrc (?) ??? Write Read 1 data data ??? Sign ??? RegDst (?) ??? extend MemRead (?) 0 ??? ??? 0 ??? ??? ??? ??? ??? 1 ??? 47 Cycle 2 IF: sub $2, $4, $5 ID: lw $8, 4($29) EX: ??? MEM: ??? WB: ??? 1 ID/EX 0 WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P 1008 Add C Shift RegWrite (?) left 2 29 129 ??? 1004 Read Read register 1 data 1 MemWrite (?) ALU ReadInstruction X X ??? Zero Read Read ??? address [31-0] 0 register 2 data 2 Result Address ??? Write ??? MemToReg Data Instruction register 1 (?) ??? Registers ALUOp (???) memory memory Write ??? data ALUSrc (?) ??? Write Read 1 data data 4 Sign ??? RegDst (?) ??? extend MemRead (?) 0 8 ??? 0 ??? ??? ??? X ??? 1 ??? 48 Cycle 3 IF: and $9, $10, $11 ID: sub $2, $4, $5 EX: lw $8, 4($29) MEM: ??? WB: ??? 1 ID/EX 0 WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P 1012 Add C Shift RegWrite (?) left 2 4 104 129 1008 Read Read register 1 data 1 MemWrite (?) ALU ReadInstruction 5 X Zero Read Read 105 ??? address [31-0] 0 register 2 data 2 Result Address 4 ??? Write 133 MemToReg Data Instruction register 1 (?) ??? Registers ALUOp (add) memory memory Write ??? Write ??? data ALUSrc (1) Read 1 data data X Sign 4 RegDst (0) ??? extend MemRead (?) 0 X 8 ??? ??? 0 8 2 X 1 ??? 49 Cycle 4 IF: or $16, $17, $18 ID: and $9, $10, $11 EX: sub $2, $4, $5 MEM: lw $8, 4($29) WB: ??? 1 ID/EX 0 WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P 1016 Add C Shift RegWrite (?) left 2 10 110 104 1012 Read Read register 1 data 1 MemWrite (0) ALU ReadInstruction 11 105 Zero Read Read 111 133 address [31-0] 0 register 2 data 2 Result Address ??? Write –1 MemToReg Data Instruction register 1 (?) ??? Registers ALUOp (sub) memory memory Write 99 ??? data ALUSrc (0) X Write Read 1 data data X Sign X RegDst (1) ??? extend MemRead (1) 0 X X ??? 0 2 8 9 2 1 ??? 50 Cycle 5 (full) IF: add $13, $14, $0 ID: or $16, $17, $18 EX: and $9, $10, $11 MEM: sub $2, $4, $5 WB: lw $8, 4($29) 1 ID/EX 0 WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P 1020 Add C Shift RegWrite (1) left 2 17 117 110 1016 Read Read register 1 data 1 MemWrite (0) ALU ReadInstruction 18 111 Zero Read Read 118 -1 address [31-0] 0 register 2 data 2 Result Address 8 Write 110 MemToReg Data Instruction register 1 (1) 99 Registers ALUOp (and) memory memory Write X 99 data ALUSrc (0) 105 Write Read 1 data data X Sign X RegDst (1) 133 extend MemRead (0) 0 X X 0 9 2 8 16 9 1 99 51 Cycle 6 (emptying) IF: ??? ID: add $13, $14, $0 EX: or $16, $17, $18 MEM: and $9, $10, $11 WB: sub $2, $4, $5 1 ID/EX 0 WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P ??? Add C Shift RegWrite (1) left 2 14 114 117 1020 Read Read register 1 data 1 MemWrite (0) ALU ReadInstruction 0 0 118 Zero Read Read 110 address [31-0] 0 register 2 data 2 Result Address 2 Write 119 MemToReg Data Instruction register 1 (0) -1 Registers ALUOp (or) memory memory Write X X data ALUSrc (0) 111 Write Read 1 data data X Sign X RegDst (1) -1 extend MemRead (0) 0 X X 0 16 9 2 13 16 1 -1 52 Cycle 7 IF: ??? ID: ??? EX: add $13, $14, $0 MEM: or $16, $17, $18 WB: and $9, $10, $11 1 ID/EX 0 WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P ??? Add C Shift RegWrite (1) left 2 ??? ??? 114 ??? Read Read register 1 data 1 MemWrite (0) ALU ReadInstruction ??? 0 Zero Read Read ??? 119 address [31-0] 0 register 2 data 2 Result Address 9 Write 114 MemToReg Data Instruction register 1 (0) 110 Registers ALUOp (add) memory memory Write X X data ALUSrc (0) 118 Write Read 1 data data ??? Sign X RegDst (1) 110 extend MemRead (0) 0 ??? X 0 13 16 9 ??? 13 1 110 53 Cycle 8 IF: ??? ID: ??? EX: ??? MEM: add $13, $14, $0 WB: or $16, $17, $18 1 ID/EX 0 WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P ??? Add C Shift RegWrite (1) left 2 ??? ??? ??? ??? Read Read register 1 data 1 MemWrite (0) ALU ReadInstruction ??? ??? Zero Read Read ??? 114 address [31-0] 0 register 2 data 2 Result Address 16 Write ??? MemToReg Data Instruction register 1 (0) 119 Registers ALUOp (???) memory memory Write X X data ALUSrc (?) 0 Write Read 1 data data ??? Sign ??? RegDst (?) 119 extend MemRead (0) 0 ??? ??? 0 ??? 13 16 ??? ??? 1 119 54 Cycle 9 IF: ??? ID: ??? EX: ??? MEM: ??? WB: add $13, $14, $0 1 ID/EX 0 WB EX/MEM PCSrc Control M WB MEM/WB IF/ID EX M WB 4 Add P ??? Add C Shift RegWrite (1) left 2 ??? ??? ??? ??? Read Read register 1 data 1 MemWrite (?) ALU ReadInstruction ??? ??? Zero Read Read ??? ??? address [31-0] 0 register 2 data 2 Result Address 13 Write ??? MemToReg Data Instruction register 1 (0) 114 Registers ALUOp (???) memory memory Write X X data ALUSrc (?) ? Write Read 1 data data ??? Sign ??? RegDst (?) 114 extend MemRead (?) 0 ??? ??? 0 ??? ??? 13 ??? ??? 1 114 55 That’s a lot of diagrams there Clock cycle 1 2 3 4 5 6 7 8 9 lw $t0, 4($sp) IF ID EX MEM WB sub $v0, $a0, $a1 IF ID EX MEM WB and $t1, $t2, $t3 IF ID EX MEM WB or $s0, $s1, $s2 IF ID EX MEM WB add $t5, $t6, $0 IF ID EX MEM WB — You can see how instruction executions are overlapped — Each functional unit is used by a different instruction in each cycle — The pipeline registers save control and data values generated in previous clock cycles for later use — When the pipeline is full in clock cycle 5, all of the hardware units are utilized. This is the ideal situation, and what makes pipelined processors so fast 56 When Pipeline Is Stuck LD R1, 0(R2) R1 R1 SUB R4, R1, R5 Actually, in many cases, instructions cannot be fully pipelined. In this example, R1 is ready by load instruction at the end of clock cycle 4; But subtract instruction requires R1 at the beginning of clock cycle 4; in this case, the processor can not fully pipeline the two instructions as we expect. References Computer System Architecture (3rd Edition) by M. Morris Mano Computer organization and architecture designing for performance (8th edition) by William Stallings Computer Architecture: A Quantitative Approach 5th Edition by John L. Hennessy Patterson & Hennessy, “Computer Organization &Design” Lecture Notes, Lecture 4: Pipelining Basics & Hazards, Kai Bu, [email protected] EmbeededNotes Pipeline , https://www.mediafire.com/file/yl157ng7eby7989/Video_6__-_PipeLine_- _1_Intro.pptx/file https://www.youtube.com/watch?v=X84uO3H83bA Lecture Notes, Introduction to Computer Architecture, at the University of California, Santa Barbara. © Behrooz Parhami Thanks