CSC-25 High Performance Architectures
Lecture Notes – Chapter III-A
Exploiting Instruction-Level Parallelism (ILP)

Denis Loubach
Department of Computer Systems
Computer Science Division – IEC
Aeronautics Institute of Technology – ITA
1st semester, 2024

Detailed Contents

- ILP
- Pipeline Concepts
  - Overview
  - Speedup
  - Efficiency
  - Throughput
- Major Hurdles of Pipelining
  - Unbalanced Length in Pipe Stages
  - Structural Hazard
  - Data Hazard
  - Control Hazard
  - Common Solution
- Datapaths and Stalls
  - MIPS Pipeline
  - View of Pipeline and Functional Units
  - Structural Hazard
  - Data Hazard
  - Minimizing Data Hazard Stalls by Forwarding
  - HW changing for Forwarding
  - Control Hazard
- Summary
- References

ILP

Since about 1985, all processors have used pipelining to overlap the execution of instructions and thereby improve performance.

This potential overlap among instructions is known as instruction-level parallelism (ILP), since the instructions can be evaluated in parallel.

Pipeline Concepts: Overview

Pipelining is a natural idea, familiar from assembly lines.

It consists of dividing a task into sequential sub-tasks, allocating resources to each sub-task, and controlling the phase changes.

Example: laundry
- wash and dry as a single 4 h step: 4 baskets take 16 h
- wash (2 h) and dry (2 h) as separate steps: how long will 4 baskets take? 10 h, since the first basket finishes after 4 h and each of the other three finishes 2 h after the previous one

Instruction cycle pipeline: an instruction can be broken into the following cycles:
1. instruction fetch - RI
2. instruction decode - DI
3. operands fetch - OO
4. execution - EXE
5. result store - AR

RI -> DI -> OO -> EXE -> AR

Parallelism in instructions

Cycle  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
RI     I1                  I2                  I3
DI         I1                  I2                  I3
OO             I1                  I2                  I3
EXE                I1                  I2                  I3
AR                     I1                  I2                  I3

Without pipeline: idle time and inefficiency. CPI = 15 cycles / 3 instructions = 5.

Cycle  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
RI     I1  I2  I3  I4  I5  I6  I7  I8  I9  I10 I11
DI         I1  I2  I3  I4  I5  I6  I7  I8  I9  I10 I11
OO             I1  I2  I3  I4  I5  I6  I7  I8  I9  I10 I11
EXE                I1  I2  I3  I4  I5  I6  I7  I8  I9  I10 I11
AR                     I1  I2  I3  I4  I5  I6  I7  I8  I9  I10 I11

With pipeline: CPI = 15 cycles / 11 instructions ≈ 1.36. The lower limit is CPI = 1; look at clock cycles (CC) 5 to 15. How many clocks and how many instructions?

[Figure: single-cycle, multiple-cycle and pipelined execution, overview comparison]

Pipeline Concepts: Speedup

What performance improvement can be expected from pipelining?

[Figure: stages vs. time diagram, with stages S1 to S5 on the vertical axis and time t on the horizontal axis; instructions I1 to I11 fill the five stages over 15 cycles, as in the pipelined table above]

Basic concepts:
- n: number of instructions
- p: number of pipeline stages (a.k.a. depth of the pipeline)
- T_j (1 ≤ j ≤ p): time spent in stage S_j
- T_L: stage transition time
- T_MAX = max{T_j}
- T = T_MAX + T_L: clock period (clock cycle)
- frequency f = 1/T
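As a quick numeric illustration of the definitions above, the following sketch derives the clock period and the frequency from per-stage latencies. The latencies and the transition time are made-up values chosen for the example, not figures from the lecture, and the function names are hypothetical.

```python
def clock_period(stage_times_ns, transition_ns):
    """T = T_MAX + T_L: the slowest stage plus the stage transition time."""
    return max(stage_times_ns) + transition_ns


def frequency_ghz(period_ns):
    """f = 1 / T; a period given in nanoseconds yields a frequency in GHz."""
    return 1.0 / period_ns


# Hypothetical latencies for a five-stage pipeline (illustrative values only).
stage_times_ns = [0.8, 0.6, 0.9, 1.0, 0.7]   # T_1 .. T_5
transition_ns = 0.25                          # T_L

T = clock_period(stage_times_ns, transition_ns)
f = frequency_ghz(T)
print(f"T = {T} ns, f = {f} GHz")
```

Note that the slowest stage alone sets the clock: shortening any other stage in this example would not change T.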
Measuring speedup from pipelining

$$\text{Speedup}_P = \frac{\text{AverageInstructionTimeUnpipelined}}{\text{AverageInstructionTimePipelined}} \qquad (1)$$

The times below are totals over a run of n instructions, with every stage taking one clock period T; dividing both by n gives the averages and leaves the ratio unchanged.

$$\text{AverageInstructionTimeUnpipelined} = n \times p \times T \qquad (2)$$

$$\text{AverageInstructionTimePipelined} = (p + n - 1) \times T \qquad (3)$$

Then, eq. (1) becomes:

$$\text{Speedup}_P = \frac{n \times p}{p + n - 1} \qquad (4)$$

If n ≫ p, Speedup_P ≈ p, so the maximum is Speedup_P ≤ p.

Question: the higher the p, the higher the Speedup_P?

Pipeline Concepts: Efficiency

Efficiency η: the ratio between the "occupied area" and the "total area" of the stages vs. time diagram, i.e., the improvement actually observed relative to the number of pipeline stages p.

$$\eta = \frac{\text{Speedup}_P}{p} = \frac{\frac{n \times p}{p + n - 1}}{p} = \frac{n}{p + n - 1} \qquad (5)$$

When n → ∞, η → 1 (maximum efficiency).

Pipeline Concepts: Throughput

Throughput W: number of completed tasks per unit of time.

$$W = \frac{n}{(p + n - 1) \times T} = \frac{\eta}{T} \qquad (6)$$

When η → 1, W → 1/T = f (maximum throughput).

Major Hurdles of Pipelining

Impediments to reaching the maximum speedup, efficiency and throughput:
- stages with different durations
- cache misses
- instructions with different lengths
- hazards and dependencies
  - structural: resource conflicts
  - data: an instruction depends on the result of a previous instruction
    - read after write - RAW
    - write after read - WAR
    - write after write - WAW
  - control: branch prediction, delayed branches, changes in the PC
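For reference, the ideal values that these impediments erode can be computed directly from eqs. (4) to (6). The sketch below is only a numeric check of those formulas; the function names are hypothetical and the inputs reuse the 5-stage, 11-instruction diagram and the laundry example from the overview.

```python
def pipelined_time(n, p, T):
    """Total time for n instructions on a p-stage pipeline: (p + n - 1) * T."""
    return (p + n - 1) * T


def speedup(n, p):
    """Eq. (4): n * p / (p + n - 1)."""
    return n * p / (p + n - 1)


def efficiency(n, p):
    """Eq. (5): speedup / p = n / (p + n - 1)."""
    return n / (p + n - 1)


def throughput(n, p, T):
    """Eq. (6): n / ((p + n - 1) * T) = efficiency / T."""
    return n / pipelined_time(n, p, T)


# Laundry example: 4 baskets, 2 stages of 2 h each.
print("laundry hours:", pipelined_time(n=4, p=2, T=2))

# Five-stage pipeline: speedup approaches p and efficiency approaches 1 as n grows.
p = 5
for n in (3, 11, 1000):
    print(n, round(speedup(n, p), 3), round(efficiency(n, p), 3))
```

With n = 11 and p = 5 the speedup is 55/15, well below the limit of 5, because the fill and drain phases of the pipeline still account for a large share of the 15 cycles.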
[Figure: stages vs. time diagram (RI, DI, OO, EXE, AR) for I1 to I6 followed by I19 to I21, with I5 marked by *]

* I5 is a conditional branch: it is blocked when decoded, which causes a number of idle stages (inefficiency).

Two possibilities: execute I6 or I19.

Major Hurdles of Pipelining: Unbalanced Length in Pipe Stages

[Figure: stages vs. time diagram (RI, DI, OO, EXE, AR) in which instructions occupy some stages for several consecutive cycles]

Result: lots of idle time and inefficiency.

Major Hurdles of Pipelining: Structural Hazard

Also known as functional dependency.

It results from a race condition in which two or more instructions compete for the same resource at the same time.

Possible solutions:
- one of the instructions must wait
- increase the number of resources

[Figure: stages vs. time diagram (RI, DI, OO, EXE, AR) for I1 to I7 with a structural hazard]

Control of Resources Usage

When a dependency involves only processor resources (e.g., registers or functional units), a resource reservation table is generally used.

Every instruction checks the table for the resources it will use; if any of them is marked, the instruction is blocked until that resource is released.

After using a resource, the instruction frees it by unmarking its entry in the reservation table.
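The check/mark/release protocol of a reservation table can be sketched in a few lines. This is a minimal software model under simplifying assumptions (one entry per resource, all-or-nothing reservation); the class, method and resource names are hypothetical and do not describe any particular processor.

```python
class ReservationTable:
    """Tracks which processor resources are currently reserved."""

    def __init__(self, resources):
        # False = free, True = marked as in use.
        self.busy = {name: False for name in resources}

    def try_reserve(self, needed):
        """Mark all needed resources, or none of them if any is busy."""
        if any(self.busy[r] for r in needed):
            return False          # the instruction is blocked this cycle
        for r in needed:
            self.busy[r] = True
        return True

    def release(self, used):
        """Unmark the resources once the instruction is done with them."""
        for r in used:
            self.busy[r] = False


table = ReservationTable(["alu", "mem_port", "r1"])
print(table.try_reserve(["alu", "r1"]))   # first instruction proceeds
print(table.try_reserve(["alu"]))         # second one finds the ALU marked
table.release(["alu", "r1"])
print(table.try_reserve(["alu"]))         # after the release it can proceed
```

Reserving everything or nothing keeps a blocked instruction from holding part of its resources while it waits.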
Major Hurdles of Pipelining: Data Hazard

A data hazard occurs when the pipeline changes the order of read/write accesses to operands, so that the order differs from the one seen when instructions execute sequentially on an unpipelined processor.

Write after read - WAR

    A = B + C
    B = C + D

Is there any problem?

This is a false dependency, harmless as long as there is no out-of-order execution. WAR hazards are impossible in the simple five-stage pipeline, but they occur when instructions are reordered.

Write after write - WAW

    A = B + C
    A = D + E

Is there any problem?

Again a false dependency, harmless as long as there is no out-of-order execution. WAW hazards are also impossible in the simple five-stage pipeline, but they occur when instructions are reordered.

Read after write - RAW

    1  A = B + C
    2  E = A + D

Is there any problem?

Yes. The instruction on line 2 can only read the value of A after the instruction on line 1 has completed.

Simple solution: delay the execution of the instruction on line 2. How many clocks?

Major Hurdles of Pipelining: Control Hazard

A control hazard arises from pipelining branches and other instructions that change the PC.

Major Hurdles of Pipelining: Common Solution

A common solution to all of these hazards is to "simply" stall the pipeline for as long as the hazard is present, by inserting one or more "bubbles" into it.

Datapaths and Stalls: MIPS Pipeline

Stages of the MIPS pipeline:
1. instruction fetch - IF cycle
2. instruction decode/register fetch - ID cycle
3. execution/effective address - EX cycle
4. memory access - MEM cycle
5. write-back - WB cycle

Datapaths and Stalls: View of Pipeline and Functional Units

[Figure: instruction order (Load, Inst. #1 to Inst. #4) vs. time in clock cycles; each instruction goes through IF (Mem), ID (Reg), EX (ALU), MEM (Mem) and WB (Reg), starting one cycle after the previous one]

Hazards and dependencies to examine on this datapath: structural (resource conflicts), data (RAW, WAR, WAW) and control (branch prediction, delayed branches, changes in the PC).

Datapaths and Stalls: Structural Hazard

A single memory is a structural hazard.

[Figure: the same pipeline diagram with a single memory (Mem) serving both IF and MEM; the MEM access of the Load and the instruction fetch of Inst. #3 fall in the same clock cycle]

Solution #1: stall the pipeline to resolve the memory structural hazard.

[Figure: the same diagram with a one-cycle stall (bubble) that delays the fetch of Inst. #3]

Solution #2: use a separate instruction cache (Im) and data cache (Dm).

[Figure: the same diagram with IF using Im and MEM using Dm, so no stall is needed]
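The two solutions to the single-memory hazard can be compared with a small timing model. The sketch below is a deliberately simplified assumption: it tracks only the cycle in which each instruction is fetched, gives the MEM access priority over IF, and ignores every other hazard. The function name and the boolean program encoding are hypothetical.

```python
def fetch_cycles(program, single_memory):
    """Return the clock cycle in which each instruction is fetched.

    program: list of booleans, True if the instruction accesses data memory
    in its MEM stage (load/store). Stages are IF, ID, EX, MEM, WB, so an
    instruction fetched in cycle c uses data memory in cycle c + 3.
    """
    mem_busy = set()      # cycles in which some load/store is in MEM
    cycles = []
    cycle = 1
    for uses_mem in program:
        # With one shared memory, IF has to wait while a MEM access is ongoing.
        while single_memory and cycle in mem_busy:
            cycle += 1    # bubble: the fetch is retried in the next cycle
        cycles.append(cycle)
        if uses_mem:
            mem_busy.add(cycle + 3)
        cycle += 1
    return cycles


program = [True, False, False, False, False]   # Load, Inst. #1 .. Inst. #4
print(fetch_cycles(program, single_memory=True))    # Solution #1: stall
print(fetch_cycles(program, single_memory=False))   # Solution #2: Im and Dm
```

In this model the shared memory delays Inst. #3 and everything after it by one cycle, while the split caches let every instruction be fetched one cycle after the previous one.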
Datapaths and Stalls: Data Hazard

Let's consider the following code:

    add r1, r2, r3
    sub r4, r1, r3
    and r6, r1, r7
    or  r8, r1, r9
    xor r10, r1, r11

All the instructions after the add use the result of the add instruction. In turn, the add instruction writes the value of r1 only in the WB pipe stage.

[Figure: pipeline diagram (IF/Im, ID/Reg, EX/ALU, MEM/Dm, WB/Reg) vs. time in clock cycles for the five instructions above, highlighting the data hazard on r1]

- The register file is used as a source in the ID stage and as a destination in the WB stage.
- It is read in the second half of the clock cycle and written in the first half.
- There is a RAW hazard on the sub and and instructions. The or reads r1 in the same cycle in which the add writes it, so it already gets the new value, and the xor reads it one cycle later.

HW stall: the hardware does not change the PC; instead, it keeps fetching the same instruction and sets the control signals to harmless values (0).

[Figure: pipeline diagram for add r1,r2,r3, two stall cycles (bubbles), sub r4,r1,r3 and and r6,r1,r7]

SW: insert independent instructions between the producer and its consumers; in the worst case, insert nop instructions.

[Figure: pipeline diagram for add r1,r2,r3, nop, nop, sub r4,r1,r3 and and r6,r1,r7]
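The software approach can be sketched as a pass that pads the program with nops. The sketch assumes the five-stage pipeline without forwarding and with the split-cycle register file described above, so a consumer must be at least three issue slots after its producer (its ID then coincides with the producer's WB). The tuple encoding and the function name are hypothetical, and a real compiler would first try to move independent instructions into the gap.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad a program with nops so RAW-dependent instructions are far enough apart.

    program: list of (opcode, dest, sources) tuples.
    min_distance: issue slots required between a producer and its consumer.
    """
    scheduled = []
    for op, dest, sources in program:
        needed = 0
        # Inspect the previous (min_distance - 1) slots for a producer of a source.
        for back in range(1, min_distance):
            idx = len(scheduled) - back
            if idx < 0:
                break
            prev_dest = scheduled[idx][1]
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, min_distance - back)
        scheduled.extend([NOP] * needed)
        scheduled.append((op, dest, sources))
    return scheduled


program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),
]
print([op for op, _, _ in insert_nops(program)])
```

For this program the pass places two nops between the add and the sub; the and needs none, because the two nops already push it far enough from the add.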
Datapaths and Stalls: Minimizing Data Hazard Stalls by Forwarding

Forwarding is a simple hardware technique, a.k.a. bypassing or short-circuiting.

Let's consider the previous RAW hazard:

    add r1, r2, r3
    sub r4, r1, r3
    and r6, r1, r7
    or  r8, r1, r9
    xor r10, r1, r11

Key points in forwarding:
- the result is not really needed by the sub until after the add actually produces it
- what if that result could be moved from the pipeline register where the add stores it to where the sub needs it?
- in this case, the need for a stall can be avoided

The ALU result from both the EX/MEM and MEM/WB pipeline registers is always fed back to the ALU inputs.

Pipeline registers: IF/ID; ID/EX; EX/MEM; MEM/WB.

The forwarding hardware detects that a previous ALU operation has written the register corresponding to a source of the current ALU operation. Control logic then selects the forwarded result as the ALU input, instead of the value read from the register file.

If the sub is stalled, the add will be completed and the bypass will not be activated. This is also true when an interrupt occurs between the two instructions.
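The selection made by the control logic can be sketched as a comparison of register numbers. This is a simplified software model, not the actual hardware: the dictionary fields and the function name are hypothetical, and special cases such as a hardwired zero register are ignored.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where an ALU source operand comes from.

    src_reg: register number read by the instruction now in EX.
    ex_mem, mem_wb: dicts describing the instructions one and two stages
    ahead, with keys 'writes_reg' (bool) and 'dest' (register number).
    Returns 'EX/MEM', 'MEM/WB' or 'REGFILE'.
    """
    # The most recent producer wins, so EX/MEM is checked first.
    if ex_mem["writes_reg"] and ex_mem["dest"] == src_reg:
        return "EX/MEM"
    if mem_wb["writes_reg"] and mem_wb["dest"] == src_reg:
        return "MEM/WB"
    return "REGFILE"


add = {"writes_reg": True, "dest": 1}    # add r1, r2, r3
sub = {"writes_reg": True, "dest": 4}    # sub r4, r1, r3
idle = {"writes_reg": False, "dest": 0}  # nothing useful in that stage

# sub in EX, add one stage ahead: r1 is bypassed, r3 comes from the register file.
print(forward_select(1, ex_mem=add, mem_wb=idle))
print(forward_select(3, ex_mem=add, mem_wb=idle))

# One cycle later the and is in EX: the add result now sits in MEM/WB.
print(forward_select(1, ex_mem=sub, mem_wb=add))
```

Checking EX/MEM before MEM/WB matters when two earlier instructions write the same register: the younger result is the one the current instruction must see.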
Not all potential data hazards can be handled by forwarding.

Let's consider the following code:

    ld  r1, 0(r2)
    sub r4, r1, r5
    and r6, r1, r7
    or  r8, r1, r9

What is the problem here?

The ld instruction does not have the data until the end of its MEM cycle, while the sub instruction needs the data at the beginning of that same clock cycle, in which it is in EX. This case needs a stall.

Datapaths and Stalls: HW changing for Forwarding

Hardware forwarding adds three new paths to the ALU inputs:
1. ALU output at the end of the EX stage
2. memory output at the end of the MEM stage
3. ALU output at the end of the MEM stage

The multiplexers are widened to include the new paths from the pipeline registers.

This assumes that register writing occurs in the first half of the cycle and reading in the second half, so the WB output does not need to be forwarded; otherwise, extra forwarding is needed.

Datapaths and Stalls: Control Hazard

Control hazards can cause a greater performance loss than data hazards.

When a branch is executed, it may or may not change the PC to a value other than its current value plus four, i.e., the address of the next instruction.

If a branch changes the PC to its target address, it is a taken branch; otherwise, it is an untaken branch.

If instruction i is a taken branch, the PC is normally not changed until the end of ID, after the address calculation and the comparison have completed.
Simplest method for coping with branches: redo the fetch of the instruction following a branch. The first IF can be seen as a stall, since the branch is only detected during ID. This scheme is known as freezing or flushing the pipeline.

Branch instruction      IF  ID  EX  MEM WB
Branch successor            IF  IF  ID  EX  MEM WB
Branch successor + 1                    IF  ID  EX  MEM
Branch successor + 2                        IF  ID  EX

A branch causing a one-cycle stall in a five-stage pipeline.

With one stall cycle per branch, the performance loss is 10-30%, depending on the branch frequency.

The predicted-not-taken scheme

Every branch is treated as untaken. When the branch is confirmed as untaken during ID, the fall-through instruction that was already fetched simply proceeds. When the branch turns out to be taken during ID, the fetch is redone at the branch target.

The predicted-taken scheme

As the name suggests, it treats every branch as taken. In this five-stage pipeline, the target address is not known any earlier than the branch outcome, so this approach brings no advantage here.

The delayed branch scheme

Heavily used in early RISC processors:

    branch instruction
    sequential successor_1
    branch target if taken

The sequential successor is in the branch delay slot, i.e., that instruction is executed whether or not the branch is taken. Most processors implementing this technique use a single-instruction delay, although longer delays are possible.

Instructions in the delay slot are always executed. If the branch is untaken, execution continues with the instruction after the branch delay instruction; otherwise, execution continues at the branch target.

What if the instruction in the branch delay slot is also a branch?

There is a lot of room for compilers to work with in the delayed branch technique. Optimizing compilers should choose the instructions to place in the branch delay slot, right after the branch; since those instructions are always executed, they must be valid and useful.

Branch prediction

Is it possible to eliminate the delay in branch instructions?

Branch-prediction scheme [1]: guess the outcome of the branch condition and proceed as if the guess were correct
- the processor state must not be affected if the prediction turns out to be wrong

Prediction is very good for performance when hit rates are high, which is possible in many cases:
- the end-of-loop test is always false, except at the last iteration
- searches fail in all iterations, except possibly the last

Generally used in superscalar processors, addressed later in the course.

[1] A Critical Intel Flaw Breaks Basic Security for Most Computers. Wired. 2018. https://www.wired.com/story/critical-intel-flaw-breaks-basic-security-for-most-computers
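The cost of the branch schemes above can be estimated with a simple CPI model: an ideal CPI of 1 plus the average number of branch stall cycles per instruction. The sketch below is an assumption-laden illustration, not a measurement: the branch fractions and the hit rate are made-up inputs, the function names are hypothetical, and all other stalls are ignored.

```python
def cpi_with_branches(branch_fraction, penalty_cycles, stalled_fraction=1.0):
    """Ideal CPI of 1 plus the average branch stall cycles per instruction.

    branch_fraction: share of instructions that are branches.
    penalty_cycles: stall cycles paid by a branch that stalls.
    stalled_fraction: share of branches that pay the penalty
        (1.0 when freezing the pipeline on every branch; the misprediction
        rate when a prediction scheme is used).
    """
    return 1.0 + branch_fraction * stalled_fraction * penalty_cycles


def slowdown_percent(cpi):
    """Extra cycles relative to the ideal CPI of 1, in percent."""
    return (cpi - 1.0) * 100.0


# Freeze/flush: every branch costs one cycle.
for branch_fraction in (0.10, 0.20, 0.30):
    cpi = cpi_with_branches(branch_fraction, penalty_cycles=1)
    print(branch_fraction, round(slowdown_percent(cpi), 1))

# Hypothetical predictor that is right for 90% of the branches.
cpi = cpi_with_branches(0.20, penalty_cycles=1, stalled_fraction=0.10)
print(round(slowdown_percent(cpi), 1))
```

Under this model, branch fractions between 10% and 30% reproduce the 10-30% loss quoted for one stall cycle per branch, and a good hit rate shrinks the loss in proportion to the misprediction rate.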
Summary

Pipelined processor:
- an enhancement of the multiple-clock-cycle processor
- each functional unit can be used only once per instruction
- every instruction must use a given functional unit in the same stage as all other instructions
- brings considerable performance improvements
- also brings new "problems", e.g., structural, data and control hazards

References

Information to the reader: these lecture notes are mainly based on the following references.

- Castro, Paulo André. Notas de Aula da disciplina CES-25 Arquiteturas para Alto Desempenho. ITA, 2018.
- Hennessy, J. L. and D. A. Patterson. Arquitetura de Computadores: Uma Abordagem Quantitativa. 5th ed. Campus, 2014.
- Hennessy, J. L. and D. A. Patterson. Computer Architecture: A Quantitative Approach. 6th ed. Morgan Kaufmann, 2017.
- Patterson, D. and S. Kong. Lecture notes, CS152 Computer Architecture and Engineering, Lecture 15: Designing a Pipeline Processor. Online, 1995.