cap6-lecture-notes-117-132-4-6.pdf

Full Transcript

6 Branch Prediction & Speculative Superscalar Processors

Example. When the 2-bit prediction buffer is applied to the code from Listing 6.4, the mispredictions are reduced to a single case. Here, the state “11” corresponds to strongly “predicted taken”, and the state “00” to strongly “predicted not taken”.

Branch-Target Buffers

The general branch prediction approach makes it possible to determine, i.e., predict, the next instruction before the current instruction has completed. The current instruction may even be delayed by dependencies. Such an approach still needs to decode the instruction during the ID cycle in order to compute the branch target address. This can cause delays when the branch is taken, assuming the predicted-not-taken approach is in place.

What about determining the branch target address during the IF cycle? This seems to be the ideal solution. The challenge is to determine whether the yet-undecoded instruction is a branch and, if so, what the next PC value is. Knowing this gives a branch with zero delay penalty.

For this purpose, there is the branch-target buffer technique, also known as the branch-target cache. It is a branch-prediction cache that stores the predicted address of the next instruction following a branch. As illustrated in Fig. 6.2, the first column of the buffer contains only the addresses of the predicted-taken branches. The second column holds the predicted PC, i.e., the next PC after the branch. Therefore, a “new” fetch begins immediately at that address, i.e., the predicted PC. The hardware for this branch-target buffer is identical to the hardware for a cache memory.

Figure 6.2: Branch-target buffer indexed by the branch instruction address. Here, the PC of the instruction to fetch is searched for a match in the buffer. A match means the branch is predicted taken, and the predicted PC is used as the next PC.
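The lookup described above can be sketched in C as a small software model. This is only an illustration under stated assumptions, not the hardware design from the notes: the names `Btb`, `btb_next_pc`, and `btb_update`, the 16-entry direct-mapped organization, the 4-byte instruction size, and the policy of dropping an entry when its branch turns out not taken are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BTB_ENTRIES 16u /* hypothetical size; must be a power of two */

typedef struct {
    bool     valid;
    uint32_t branch_pc;    /* first column: address of a predicted-taken branch */
    uint32_t predicted_pc; /* second column: next PC after the branch */
} BtbEntry;

typedef struct {
    BtbEntry entry[BTB_ENTRIES];
} Btb;

/* Assumes 4-byte instructions: drop the two low PC bits, keep the index bits. */
static uint32_t btb_index(uint32_t pc) {
    return (pc >> 2) & (BTB_ENTRIES - 1u);
}

/* Start with an empty buffer: no branch is predicted taken yet. */
void btb_init(Btb *b) {
    assert(b != NULL);
    for (size_t i = 0; i < BTB_ENTRIES; ++i) {
        b->entry[i].valid = false;
    }
}

/* IF stage: a hit yields the predicted PC, a miss falls through to pc + 4. */
uint32_t btb_next_pc(const Btb *b, uint32_t pc) {
    assert(b != NULL);
    const BtbEntry *e = &b->entry[btb_index(pc)];
    if (e->valid && e->branch_pc == pc) {
        return e->predicted_pc;
    }
    return pc + 4u;
}

/* After the branch resolves: record taken branches, drop stale entries
 * (assumed replacement policy for this sketch). */
void btb_update(Btb *b, uint32_t pc, bool taken, uint32_t target) {
    assert(b != NULL);
    BtbEntry *e = &b->entry[btb_index(pc)];
    if (taken) {
        e->valid = true;
        e->branch_pc = pc;
        e->predicted_pc = target;
    } else if (e->valid && e->branch_pc == pc) {
        e->valid = false;
    }
}
```

In this model, a call such as `btb_next_pc(&b, pc)` stands for the work done during IF: the full PC is compared against the stored branch address, so an instruction that merely shares the same index with a recorded branch still falls through to `pc + 4`.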
The main steps regarding the branch-target buffer are shown in Fig. 6.3. The branch-target buffer can determine the next predicted PC at the end of the IF phase, since it uses the branch instruction address as the index. Without the buffer, the next predicted PC could be determined only after the ID phase.

Figure 6.3: Branch-target algorithm, with the corresponding cycles on the left-hand side.

Correlating Predictors

In the technique of correlating predictors, a previous branch (e.g., branch 1 in Listing 6.5) can be used to determine the behavior of the next branch (e.g., branch 2 in Listing 6.5).

Listing 6.5: Correlating predictors code example.

    if (k == 0)    // branch 1
        k = 1;
    if (k == 1)    // branch 2
    ...

Given this, it is possible to create two predictions for branch 2 (Listing 6.5): one for the case where branch 1 is taken, and another for the case where branch 1 is not taken. Let’s consider the code presented in Listings 6.6 and 6.7.

Listing 6.6: Original C code, related to the correlating branch predictors.

    if (aa == 2)
        aa = 0;
    if (bb == 2)
        bb = 0;
    if (aa != bb) {...

Listing 6.7: RISC-V code generated from Listing 6.6. Here, the C variable aa is assigned to the RISC-V register x1, and bb to x2.

    addi x3, x1, -2
    bnez x3, L1        // branch b1 (aa != 2)
    add  x1, x0, x0    // aa
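The idea of keeping two predictions for a branch, selected by the outcome of the previous branch, can be sketched in C as follows. This is a minimal model under stated assumptions rather than the scheme defined in the notes: it uses one bit of global history (the outcome of the most recently executed branch) to choose between two 2-bit saturating counters per table entry, and the names `CorrPredictor`, `corr_predict`, and `corr_update`, as well as the 16-entry table and 4-byte instruction size, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CORR_ENTRIES 16u /* hypothetical table size; must be a power of two */

typedef struct {
    /* counter[i][h]: 2-bit counter (0..3) for table entry i, used when the
     * previous branch was not taken (h = 0) or taken (h = 1). */
    uint8_t  counter[CORR_ENTRIES][2];
    unsigned last_taken; /* 1-bit global history: outcome of the last branch */
} CorrPredictor;

/* Assumes 4-byte instructions, as in the index of a simple prediction buffer. */
static uint32_t corr_index(uint32_t pc) {
    return (pc >> 2) & (CORR_ENTRIES - 1u);
}

/* All counters start at 0 ("00"), history starts as "not taken". */
void corr_init(CorrPredictor *p) {
    assert(p != NULL);
    for (size_t i = 0; i < CORR_ENTRIES; ++i) {
        p->counter[i][0] = 0;
        p->counter[i][1] = 0;
    }
    p->last_taken = 0u;
}

/* Predict taken when the counter selected by the history is 2 or 3. */
bool corr_predict(const CorrPredictor *p, uint32_t pc) {
    assert(p != NULL);
    return p->counter[corr_index(pc)][p->last_taken] >= 2u;
}

/* Train the selected counter with the real outcome, then shift the outcome
 * into the 1-bit history so the next branch sees it. */
void corr_update(CorrPredictor *p, uint32_t pc, bool taken) {
    assert(p != NULL);
    uint8_t *c = &p->counter[corr_index(pc)][p->last_taken];
    if (taken) {
        if (*c < 3u) { ++*c; }
    } else {
        if (*c > 0u) { --*c; }
    }
    p->last_taken = taken ? 1u : 0u;
}
```

With this layout, a second branch whose outcome always follows the first one ends up training two separate counters, one per outcome of the first branch, so each counter sees a constant pattern once it has warmed up.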
