
CSC-25 High Performance Architectures
Lecture Notes, Chapter III-A: Exploiting Instruction-Level Parallelism (ILP)

Denis Loubach
Department of Computer Systems, Computer Science Division (IEC)
Aeronautics Institute of Technology (ITA)
1st semester, 2024

Detailed Contents

- ILP
- Pipeline Concepts
  - Overview
  - Speedup
  - Efficiency
  - Throughput
- Major Hurdles of Pipelining
  - Unbalanced Length in Pipe Stages
  - Structural Hazard
  - Data Hazard
  - Control Hazard
  - Common Solution
- Datapaths and Stalls
  - MIPS Pipeline
  - View of Pipeline and Functional Units
  - Structural Hazard
  - Data Hazard
  - Minimizing Data Hazard Stalls by Forwarding
  - HW changing for Forwarding
  - Control Hazard
- Summary
- References

ILP

All processors since about 1985 have used pipelining to overlap the execution of instructions and thereby improve performance.

This potential overlap among instructions is known as instruction-level parallelism (ILP), since the instructions can be evaluated in parallel.

Pipeline Concepts: Overview

Pipelining is a natural idea; think of assembly lines. The task is divided into sequential sub-tasks, resources are allocated to each sub-task, and the transitions between phases are controlled.

Example: laundry
- wash and dry as a single 4 h step: 4 baskets take 16 h
- wash (2 h) and dry (2 h) as two overlapped steps: 4 baskets take 10 h, because the first basket needs 4 h and each further basket finishes 2 h after the previous one (4 + 3 × 2 = 10)

Instruction cycle pipeline: an instruction can be broken into the following cycles:
1. instruction fetch - RI
2. instruction decode - DI
3. operands fetch - OO
4. execution - EXE
5. result store - AR

Parallelism in instructions

[Figure: stages (RI, DI, OO, EXE, AR) vs. time diagram without a pipeline. I1, I2 and I3 each pass through the five stages before the next instruction starts, over 15 clock cycles.]

Without a pipeline there is idle time and inefficiency: CPI = 15 cycles / 3 instructions = 5.

[Figure: stages vs. time diagram with a pipeline. I1 to I11 enter one per clock cycle and overlap across the five stages, over 15 clock cycles.]

With a pipeline: CPI = 15 cycles / 11 instructions ≈ 1.36. The lower limit of the CPI is 1. Look at clock cycles (CC) 5 to 15: how many clocks are there, and how many instructions complete?

[Figure: comparison of single-cycle, multiple-cycle and pipelined execution.]

Pipeline Concepts: Speedup

What performance enhancement should be expected from a pipeline?

[Figure: stages (S1 to S5) vs. time (t) diagram, with I1 to I11 entering one per clock cycle over 15 clock cycles.]

Basic concepts:
- n: number of instructions
- p: number of pipeline stages (a.k.a. depth of the pipeline)
- T_j (1 ≤ j ≤ p): time spent in stage S_j
- T_L: stage transition time
- T_MAX = max{T_j}
- T = T_MAX + T_L: clock period (clock cycle)
- f = 1/T: frequency
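As a small illustration of the definitions T = T_MAX + T_L and f = 1/T, the sketch below computes the clock period and frequency of a pipeline. The stage times and the transition time are made-up values in nanoseconds, chosen only for the example; the function name `clock_period` is likewise hypothetical.

```python
def clock_period(stage_times, transition_time):
    """Return (T, f) with T = T_MAX + T_L and f = 1 / T."""
    t_max = max(stage_times)            # slowest stage sets the pace
    period = t_max + transition_time    # T = T_MAX + T_L
    return period, 1.0 / period         # f = 1 / T


# Hypothetical stage times T_1..T_5 in nanoseconds (assumption, not from the notes)
stage_times_ns = [2.0, 1.5, 2.5, 3.0, 1.0]
period_ns, freq_ghz = clock_period(stage_times_ns, transition_time=0.2)
print(f"T = {period_ns:.2f} ns, f = {freq_ghz:.4f} GHz")
```

Since the period is in nanoseconds, its inverse is in GHz. Note that the faster stages gain nothing: every stage is clocked at the pace of the slowest one.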
Pipeline Concepts: Speedup (cont.)

Measuring speedup from pipelining:

\[ \text{Speedup}_P = \frac{\text{AverageInstructionTimeUnpipelined}}{\text{AverageInstructionTimePipelined}} \tag{1} \]

\[ \text{AverageInstructionTimeUnpipelined} = n \times p \times T \tag{2} \]

\[ \text{AverageInstructionTimePipelined} = (p + n - 1) \times T \tag{3} \]

Equations (2) and (3) give the time for all n instructions; dividing both by n gives the per-instruction averages and leaves the ratio in (1) unchanged.

Then, eq. (1) becomes:

\[ \text{Speedup}_P = \frac{n \times p}{p + n - 1} \tag{4} \]

If n ≫ p, Speedup_P ≈ p, so the maximum is Speedup_P ≤ p.

Question: the higher the p, the higher the Speedup_P?

Pipeline Concepts: Efficiency

Efficiency η: the ratio between the “occupied area” and the “total area” in the stages vs. time diagram, i.e., the improvement actually observed relative to the number of pipeline stages p.

\[ \eta = \frac{\text{Speedup}_P}{p} = \frac{\dfrac{n \times p}{p + n - 1}}{p} = \frac{n}{p + n - 1} \tag{5} \]

When n → ∞, η → 1 (maximum efficiency).

Pipeline Concepts: Throughput

Throughput W: number of completed tasks per time unit.

\[ W = \frac{n}{(p + n - 1) \times T} = \frac{\eta}{T} \tag{6} \]

When η → 1, W → 1/T = f (maximum throughput).

Major Hurdles of Pipelining

Impediments to reaching the maximum speedup, efficiency and throughput:
- stages with different durations
- cache misses
- instructions with different lengths
- hazards and dependencies
  - structural: resource conflicts
  - data: an instruction depends on the result of a previous instruction
    - read after write (RAW)
    - write after read (WAR)
    - write after write (WAW)
  - control: branch prediction, delayed branches, changes in the PC
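To see the ideal values that these impediments eat into, formulas (4) to (6) can be evaluated numerically. The sketch below is only an illustration: the function name `pipeline_metrics` and the choice of a unit clock period are assumptions, while n = 11 and p = 5 match the stages vs. time diagram used earlier.

```python
def pipeline_metrics(n, p, period):
    """Ideal speedup, efficiency and throughput of a p-stage pipeline."""
    total_cycles = p + n - 1                      # cycles to finish n instructions
    speedup = (n * p) / total_cycles              # eq. (4)
    efficiency = n / total_cycles                 # eq. (5)
    throughput = n / (total_cycles * period)      # eq. (6)
    return speedup, efficiency, throughput


# p = 5 stages, clock period of 1 time unit (assumption for the example)
for n in (11, 1_000, 1_000_000):
    s, e, w = pipeline_metrics(n, p=5, period=1.0)
    print(f"n={n}: speedup={s:.3f}, efficiency={e:.3f}, throughput={w:.3f}")
```

As n grows, the speedup approaches p, and the efficiency and throughput approach 1 and 1/T respectively, as the limits above state.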
[Figure: stages vs. time diagram over 15 clock cycles. I1 to I5 flow through the pipeline and I6 is fetched behind I5; in the stages after RI, I5 is followed by idle slots before I19, I20, I21, ... appear. The point where I5 is decoded is marked with *.]

* I5 is a conditional branch. It blocks when decoded, which leaves a run of idle stages (inefficiency). There are two possibilities for what executes next: I6 or I19.

Major Hurdles of Pipelining: Unbalanced Length in Pipe Stages

[Figure: stages vs. time diagram over 20 clock cycles in which RI, OO and AR each take four clock cycles per instruction, while DI and EXE take one.]

The result is a lot of idle time and inefficiency.

Major Hurdles of Pipelining: Structural Hazard

Also known as functional dependency. It results from a race condition in which two or more instructions compete for the same resource at the same time.

Possible solutions:
- one of the instructions must wait
- increase the number of resources

[Figure: stages vs. time diagram over 15 clock cycles in which RI takes four clock cycles for I3 and for I5, delaying the instructions behind them.]

Structural Hazard: Control of Resources Usage

When a dependency involves only processor resources (e.g., registers or functional units), a resource reservation table is generally used.

Every instruction checks the table for the resources it will use; if any of them is marked, the instruction is blocked until the resource is released.

After using a resource, the instruction frees it by unmarking its entry in the reservation table.
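The check, mark and unmark steps of a reservation table can be sketched as follows. This is a minimal software model, not a description of any particular hardware: the class name `ReservationTable`, its methods and the resource names are all hypothetical.

```python
class ReservationTable:
    """Toy resource reservation table: each resource is free or owned."""

    def __init__(self, resources):
        # None means the resource is not marked (free)
        self._owner = {name: None for name in resources}

    def try_reserve(self, instr, needed):
        """Mark every needed resource for instr, or mark none and return False."""
        if any(self._owner[r] is not None for r in needed):
            return False              # some resource is marked: instr is blocked
        for r in needed:
            self._owner[r] = instr    # mark the resource
        return True

    def release(self, instr):
        """Unmark every resource held by instr."""
        for r, owner in self._owner.items():
            if owner == instr:
                self._owner[r] = None


table = ReservationTable(["ALU", "MEM", "r1"])
first = table.try_reserve("I1", ["ALU", "r1"])   # I1 gets both resources
blocked = table.try_reserve("I2", ["r1"])        # r1 is marked, so I2 must wait
table.release("I1")                              # I1 unmarks its resources
retry = table.try_reserve("I2", ["r1"])          # now I2 can proceed
print(first, blocked, retry)
```

Reserving all resources or none keeps a blocked instruction from holding a partial set while it waits.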
Major Hurdles of Pipelining: Data Hazard

A data hazard occurs when the pipeline changes the order of read/write accesses to operands, so that the order differs from the one seen when the instructions execute sequentially on an unpipelined processor.

Data Hazard: write after read (WAR)

    A = B + C
    B = C + D

Is there any problem? This is a false dependency: it causes no problem as long as there is no out-of-order execution. WAR hazards are impossible in the simple five-stage pipeline, but they occur when instructions are reordered.

Data Hazard: write after write (WAW)

    A = B + C
    A = D + E

Is there any problem? Again a false dependency: it causes no problem as long as there is no out-of-order execution. WAW hazards are also impossible in the simple five-stage pipeline, but they occur when instructions are reordered.
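The data hazard classes named in this chapter (RAW, WAR and WAW) differ only in which operands of an earlier and a later instruction overlap. The sketch below classifies a pair of instructions on that basis; the representation of an instruction as a `(destination, sources)` pair and the function name `classify_hazards` are assumptions made for the example.

```python
def classify_hazards(first, second):
    """Classify the dependencies between two instructions in program order.

    Each instruction is a (destination, sources) pair.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = []
    if dest1 in srcs2:
        found.append("RAW")   # second reads what first writes
    if dest2 in srcs1:
        found.append("WAR")   # second writes what first reads
    if dest1 == dest2:
        found.append("WAW")   # both write the same operand
    return found


# A = B + C ; B = C + D
print(classify_hazards(("A", {"B", "C"}), ("B", {"C", "D"})))  # ['WAR']
# A = B + C ; A = D + E
print(classify_hazards(("A", {"B", "C"}), ("A", {"D", "E"})))  # ['WAW']
# A = B + C ; E = A + D
print(classify_hazards(("A", {"B", "C"}), ("E", {"A", "D"})))  # ['RAW']
```

Whether a detected dependency turns into an actual hazard still depends on the pipeline: in the simple five-stage pipeline only the RAW case can cause a wrong result.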
Data Hazard: read after write (RAW)

    1  A = B + C
    2  E = A + D

Is there any problem? Yes. The instruction on line 2 can only read the value of A after the instruction on line 1 has completed.

Simple solution: delay the execution of the instruction on line 2. For how many clocks?

Major Hurdles of Pipelining: Control Hazard

A control hazard basically comes from the pipelining of branches and other instructions that change the PC.

Major Hurdles of Pipelining: Common Solution

A common solution to these hazards is to “simply” stall the pipeline for as long as the hazard is present, by inserting one or more “bubbles” into it.

Datapaths and Stalls: MIPS Pipeline

Stages of the MIPS pipeline:
1. instruction fetch: IF cycle
2. instruction decode/register fetch: ID cycle
3. execution/effective address: EX cycle
4. memory access: MEM cycle
5. write-back: WB cycle

Datapaths and Stalls: View of Pipeline and Functional Units

[Figure: instruction order vs. time (clock cycles). A load followed by Inst. #1 to #4, each passing through IF (Mem), ID (Reg), EX (ALU), MEM (Mem) and WB (Reg), offset by one clock cycle.]

Recall the hazard classes: structural (resource conflicts), data (RAW, WAR, WAW) and control (branch prediction, delayed branches, changes in the PC).

Datapaths and Stalls: Structural Hazard

A single memory is a structural hazard.

[Figure: the same diagram with a single memory (Mem) used in both IF and MEM. The load's MEM access and the IF of Inst. #3 fall in the same clock cycle.]

Solution #1: stall the pipeline to resolve the memory structural hazard.

[Figure: the same diagram with a stall (bubble) inserted, so that the fetch of Inst. #3 no longer overlaps the load's memory access.]

Solution #2: use a separate instruction cache (Im) and data cache (Dm).

[Figure: the same diagram with Im used in IF and Dm used in MEM, so the two accesses no longer conflict.]
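The single-memory conflict can be located mechanically. The sketch below assumes a five-stage pipeline with no stalls, in which instruction i is fetched in clock cycle i (counting from 0) and reaches MEM three cycles later; the function name `memory_conflicts` and the boolean encoding of the program are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(uses_data_memory):
    """Find cycles where one memory is needed by both IF and MEM.

    uses_data_memory[i] is True when instruction i is a load or store.
    Returns a list of (cycle, instr_in_MEM, instr_in_IF) tuples.
    """
    mem_offset = STAGES.index("MEM")     # MEM is 3 stages after IF
    conflicts = []
    for i, uses_mem in enumerate(uses_data_memory):
        fetched = i + mem_offset         # instruction being fetched that cycle
        if uses_mem and fetched < len(uses_data_memory):
            conflicts.append((fetched, i, fetched))
    return conflicts


# load, Inst. #1, Inst. #2, Inst. #3, Inst. #4 (only the load touches data memory)
program = [True, False, False, False, False]
print(memory_conflicts(program))  # [(3, 0, 3)]
```

The single conflict reported is the load in MEM against the fetch of Inst. #3, which is the case the stall in solution #1 resolves and the split caches in solution #2 avoid.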
Datapaths and Stalls: Data Hazard

Consider the following code:

    add r1, r2, r3
    sub r4, r1, r3
    and r6, r1, r7
    or  r8, r1, r9
    xor r10, r1, r11

All the instructions after the add use its result. In turn, the add writes the value of r1 in the WB pipe stage.

[Figure: instruction order vs. time (clock cycles) for the five instructions, each passing through IF (Im), ID (Reg), EX (ALU), MEM (Dm) and WB (Reg). Data hazard on r1.]

- The register file is used as a source in the ID stage and as a destination in the WB stage.
- It is written in the first half of the stage and read in the second half.
- There is a RAW hazard on the sub and and instructions.

HW stall: the hardware does not change the PC; instead, it keeps fetching the same instruction and sets the control signals to harmless values (0).

[Figure: add r1,r2,r3, then two stall rows (Im followed by bubbles in the remaining stages), then sub r4,r1,r3 and and r6,r1,r7.]

SW: insert independent instructions between the add and its consumers; in the worst case, insert nop instructions.

[Figure: add r1,r2,r3, nop, nop, sub r4,r1,r3, and r6,r1,r7.]
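The worst-case software fix can be sketched as a pass that pads the program with nops. It assumes no forwarding and a register file written in the first half and read in the second half of a cycle, so a consumer must come at least three instructions after its producer (hence the two nops in the diagram). The tuple encoding `(op, dest, sources)` and the name `insert_nops` are assumptions for the example.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad a program with nops so every RAW consumer is far enough
    from its producer. Each instruction is (op, dest, sources)."""
    out = []
    for op, dest, srcs in program:
        needed = 0
        # look back over the instructions that are still too close
        for back in range(1, min_distance):
            if back > len(out):
                break
            prev_dest = out[-back][1]
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),
]
print([instr[0] for instr in insert_nops(program)])
# ['add', 'nop', 'nop', 'sub', 'and', 'or', 'xor']
```

A compiler would first try to move independent instructions into those two slots; this sketch only shows the fallback of filling them with nops.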
Datapaths and Stalls: Minimizing Data Hazard Stalls by Forwarding

Forwarding is a simple hardware technique, a.k.a. bypassing or short-circuiting.

Consider the previous RAW hazard:

    add r1, r2, r3
    sub r4, r1, r3
    and r6, r1, r7
    or  r8, r1, r9
    xor r10, r1, r11

Key points in forwarding:
- the result is not really needed by the sub until after the add actually produces it
- what if that result could be moved from the pipeline register where the add stores it to where the sub needs it?
- in this case, the need for a stall can be avoided

The ALU result from the EX/MEM and MEM/WB pipeline registers is always fed back to the ALU inputs.

Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB.

The forwarding hardware detects that the previous ALU operation wrote the register corresponding to a source of the current ALU operation. The control logic then selects the forwarded result as the ALU input, instead of the value read from the register file.

If the sub is stalled, the add completes and the bypass is not activated. The same is true when an interrupt occurs between the two instructions.
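The selection made by the control logic can be sketched as a comparison between a source register of the instruction in EX and the destination registers held in the EX/MEM and MEM/WB pipeline registers. The function name `forward_select` and the string labels are hypothetical; the sketch also ignores details such as instructions that do not write a register result.

```python
def forward_select(src, ex_mem_dest, mem_wb_dest):
    """Choose where an ALU input comes from for source register src.

    ex_mem_dest / mem_wb_dest: destination register of the instruction
    currently held in that pipeline register, or None.
    """
    if src == ex_mem_dest:
        return "EX/MEM"    # most recent producer wins
    if src == mem_wb_dest:
        return "MEM/WB"
    return "REGFILE"       # no hazard: use the value read in ID


# sub r4, r1, r3 in EX while add r1, r2, r3 sits in EX/MEM
print(forward_select("r1", ex_mem_dest="r1", mem_wb_dest=None))  # EX/MEM
print(forward_select("r3", ex_mem_dest="r1", mem_wb_dest=None))  # REGFILE
# and r6, r1, r7 in EX: sub is in EX/MEM, add has moved to MEM/WB
print(forward_select("r1", ex_mem_dest="r4", mem_wb_dest="r1"))  # MEM/WB
```

Checking EX/MEM before MEM/WB matters when both hold the same destination register: the younger result is the one the program order requires.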
Datapaths and Stalls: Minimizing Data Hazard Stalls by Forwarding (cont.)

Not all potential data hazards can be handled by forwarding. Consider the following code:

    ld  r1, 0(r2)
    sub r4, r1, r5
    and r6, r1, r7
    or  r8, r1, r9

What is the problem here?

- The ld instruction does not have the data until the end of its MEM cycle.
- The sub instruction needs the data by the beginning of that same clock cycle, which is its EX cycle.
- This case needs a stall.

Datapaths and Stalls: HW changing for Forwarding

Hardware forwarding includes three new paths:
1. ALU output at the end of the EX stage
2. memory output at the end of the MEM stage
3. ALU output at the end of the MEM stage

The multiplexers are widened to include the new paths from the pipeline registers.

This assumes that register writing occurs in the first half of the cycle and reading in the second half, so the WB output does not need to be forwarded; otherwise, extra forwarding is needed.

Datapaths and Stalls: Control Hazard

Control hazards can cause a greater performance loss than data hazards.

When a branch is executed, it may or may not change the PC to something other than its current value plus four, i.e., the address of the next instruction.

Taken vs. untaken: if a branch changes the PC to its target address, it is a taken branch; otherwise, it is an untaken branch.

If instruction i is a taken branch, the PC is normally not changed until the end of ID, after the address calculation and comparison have completed.
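The taken vs. untaken distinction amounts to a choice between two candidate values for the PC. A minimal sketch, assuming 4-byte instructions as in the text; the function name `next_pc` and the addresses used are made up for the example.

```python
def next_pc(pc, is_branch, taken, target):
    """Return the address of the next instruction to fetch."""
    if is_branch and taken:
        return target      # taken branch: PC becomes the branch target
    return pc + 4          # non-branch or untaken branch: fall through


# Hypothetical addresses
print(hex(next_pc(0x1000, is_branch=False, taken=False, target=0)))       # 0x1004
print(hex(next_pc(0x1004, is_branch=True, taken=False, target=0x2000)))   # 0x1008
print(hex(next_pc(0x1004, is_branch=True, taken=True, target=0x2000)))    # 0x2000
```

The control hazard arises because, in the pipeline, `taken` and `target` are only known at the end of ID, one cycle after the next fetch has already started.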
Control Hazard: freezing or flushing the pipeline

The simplest method for coping with branches is to redo the fetch of the instruction that follows a branch. Since the branch is only detected during ID, the first IF is essentially a stall. This scheme is known as freezing or flushing the pipeline.

    Clock cycle             1   2   3   4   5   6   7
    Branch instruction      IF  ID  EX  MEM WB
    Branch successor            IF  IF  ID  EX  MEM WB
    Branch successor + 1                IF  ID  EX  MEM
    Branch successor + 2                    IF  ID  EX

    Branch causing a one-cycle stall in a five-stage pipeline

With one stall cycle per branch, the performance loss is 10-30%, depending on the branch frequency.

Control Hazard: the predicted-not-taken scheme

Every branch is treated as not taken. When ID verifies that the branch is untaken, fetching simply continues with the fall-through instructions. When ID finds that the branch is taken, the fetch is redone at the branch target.

Control Hazard: the predicted-taken scheme

As the name says, this scheme treats every branch as taken. In this five-stage pipeline, however, the target address is not known any earlier than the branch outcome, so the approach brings no advantage here.

Control Hazard: the delayed branch scheme

This scheme was heavily used in early RISC processors.

    branch instruction
    sequential successor 1
    branch target if taken

The sequential successor sits in the branch delay slot, i.e., that instruction is executed whether or not the branch is taken. Most processors that implement this technique use a single-instruction delay, although longer delays are possible.

Instructions in the delay slot are always executed. If the branch is untaken, execution continues with the instruction after the branch delay instruction; otherwise, execution continues at the branch target.

What if the instruction in the branch delay slot is also a branch?

There is a lot of room for compilers to play with in the delayed branch technique. Optimizing compilers should choose the instructions to be placed after the branch instructions (in the branch delay slot), and these instructions must actually be executed.

Control Hazard: branch prediction

Is it possible to eliminate the delay in branch instructions?

Branch-prediction scheme [1]: guess the outcome of the branch condition and proceed as if the guess were correct.
- the processor state cannot be affected should the prediction turn out to be wrong

[1] A Critical Intel Flaw Breaks Basic Security for Most Computers. Wired. 2018. https://www.wired.com/story/critical-intel-flaw-breaks-basic-security-for-most-computers

Prediction is very good for performance when the hit rate is good, which is possible in many cases:
- end-of-loop testing is always false, except at the last iteration
- searches fail in all iterations, except possibly in the last

Branch prediction is generally used in superscalar processors, addressed later in the course.
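The end-of-loop case gives a feel for the hit rates that make prediction worthwhile. The sketch below measures how often a fixed guess matches a sequence of branch outcomes; the function name `hit_rate` and the 10-iteration loop are assumptions chosen for the example, and real predictors are more elaborate than a fixed guess.

```python
def hit_rate(outcomes, predicted):
    """Fraction of branch outcomes that match a fixed prediction."""
    hits = sum(1 for outcome in outcomes if outcome == predicted)
    return hits / len(outcomes)


# End-of-loop test for a 10-iteration loop: false 9 times, true at the last one
loop_exit_test = [False] * 9 + [True]
print(hit_rate(loop_exit_test, predicted=False))  # 0.9
```

The longer the loop, the closer the hit rate gets to 1, since only the final iteration is mispredicted.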
Summary

Pipeline processor:
- an enhancement of the multiple clock cycle processor
- each functional unit can be used only once per instruction
- instructions must use functional units at the same stage as all other instructions
- brings considerable performance improvements
- also brings new “problems”, e.g., structural, data and control hazards

References

Information to the reader: these lecture notes are mainly based on the following references.

Castro, Paulo André. Notas de Aula da disciplina CES-25 Arquiteturas para Alto Desempenho. ITA. 2018.
Hennessy, J. L. and D. A. Patterson. Arquitetura de Computadores: Uma Abordagem Quantitativa. 5th ed. Campus, 2014.
Hennessy, J. L. and D. A. Patterson. Computer Architecture: A Quantitative Approach. 6th ed. Morgan Kaufmann, 2017.
Patterson, D. and S. Kong. Lecture notes, CS152 Computer Architecture and Engineering, Lecture 15: Designing a Pipeline Processor. Online. 1995.
