csc25-chapter_06.pdf

CSC-25 High Performance Architectures Lecture Notes – Chapter VI Branch Prediction & Speculative Superscalar Processors Denis Loubach [email protected] Department of Computer Systems Computer Science Division – IEC Aeronautics Institute of Technology – ITA 1st semester, 2024 Detailed Contents Simple Example Speculative Execution Loop-Based Example Overview Algorithm’s Details Control Dependencies Multiple Issue Superscalar Processors Dynamic Branch Prediction Concept Overview Example Branch-Prediction Buffers Alternatives to Tomasulo’s Algorithm Branch-Target Buffers Software Techniques Correlating Predictors Basic Pipeline Scheduling Wrap-up Loop Unrolling Hardware-Based Speculation Wrap-up Overview Loop Unrolling Key Ideas Instruction-Level Parallelism - ILP Extended Tomasulo’s Algorithm References 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 2/59 Outline Speculative Execution Dynamic Branch Prediction Hardware-Based Speculation Multiple Issue Superscalar Processors Alternatives to Tomasulo’s Algorithm Wrap-up References 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 3/59 Speculative Execution Overview Is it still possible to unroll loops without speculative execution in case the branches have RAW dependence/hazards with an iteration? I in this case, no I speculative execution is required 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 4/59 Speculative Execution (cont.) Overview New questions wrt speculative superscalar execution I more hardware needed, e.g., reservation stations, functional units I more complex control and dependencies detection I additional memory, to guarantee the correction in case of prediction error, i.e., misprediction I needs more efficient branch prediction to pay the new hardware cost with performance 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 5/59 Speculative Execution (cont.) Control Dependencies As the number of executed instructions per clock cycle increases, the potential instructions flow also increases I CPI decreases I delays caused by branches can seriously impact performance How to minimize this problem? I “simple”, do not stop or reduce speed in branches 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 6/59 Outline Speculative Execution Dynamic Branch Prediction Hardware-Based Speculation Multiple Issue Superscalar Processors Alternatives to Tomasulo’s Algorithm Wrap-up References 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 7/59 Dynamic Branch Prediction Overview Static branch prediction I predicted-not-taken1 I predicted-taken Dynamic branch prediction I branches can be executed many times during a program execution, e.g., loops 1 Untaken: fetch and fall-through ordinarily; taken (during ID): fetch is redone at the branch target 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 8/59 Dynamic Branch Prediction (cont.) Overview More parallel instructions mean greater impact due to branch delays Predicted-not-taken approach may be inefficient 1 addi R1,R1,0 2 addi R4,R3,5000 3 loop: 4 load R2,0(R3) 5 addi R3,R3,1 6 add R1,R1,R2 7 bne R3,R4,loop 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 9/59 Dynamic Branch Prediction (cont.) Branch-Prediction Buffers Possible solution to the that problem? Relate one bit to each branch decision I if last iteration was taken, prediction is taken I otherwise, prediction is untaken Still needs to compute the target branch address 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 10/59 Dynamic Branch Prediction (cont.) Branch-Prediction Buffers Branch-prediction buffers Simplest dynamic branch-prediction scheme, a.k.a. branch history table Small memory indexed by the lower portion of the branch instruction address 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 11/59 Dynamic Branch Prediction (cont.) Branch-Prediction Buffers Memory contains a bit associated with I indicates whether a branch was recently taken or untaken Not sure if the prediction is correct I may have been put there by another branch with same low-order address bits I prediction is a hint assumed to be correct I fetching begins in the predicted direction I if the hint is wrong, the prediction bit is inverted and stored back 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 12/59 Dynamic Branch Prediction (cont.) Branch-Prediction Buffers Why using just the low-order address bits? Any problem with the low-order address bits? 1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 13/59 Dynamic Branch Prediction (cont.) Branch-Prediction Buffers 1-bit prediction scheme shortcoming I even if a branch is almost always taken, it’s likely to mispredict twice, rather than once when it is not taken I i.e., misprediction causes the prediction bit to be flipped 1 do{ 2 // possible duplicated error (misprediction) 3 // wrt ``predicted-taken'' in the inner-loop 4 do{ 5 sum+=v[i][j++]; 6 } 7 while(j

csc25-chapter_06.pdf

Document Details

Tags

Related

Full Transcript