ECE 486/586
Computer Architecture
Lecture # 16
Spring 2019
Portland State University
Lecture Topics
• Branch Prediction
Reference:
• Chapter 3: Section 3.3
Why Predict Branches?
• The decision about control flow (where to fetch the next instruction from?) is made in the fetch stage
• The branch penalty is non-zero because by the time the processor computes the branch outcome (in the decode stage), a useless instruction may already have been fetched and must be discarded
• To avoid fetching useless instructions, the processor needs to know the branch outcome in the fetch stage
• This involves the following steps:
– Anticipating that the instruction being fetched is a branch instruction
– Predicting whether the branch instruction will be taken or not taken
– Predicting the branch target address (for a taken branch)
Basic Branch Prediction
• Branch prediction buffer (branch history table)
• Memory indexed by low order bits of branch instruction address
• Stores previous branch outcomes to predict next outcome
• Memory is not tagged (unlike cache)
• Consequence: entry may reflect a different branch (aliasing)
[Figure: bits PC[11:2] (10 bits) of the branch address index a prediction buffer with 2^10 = 1K entries]
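The indexing described above can be sketched in a few lines of Python. This is only an illustration: the helper name bht_index and the two example addresses are assumptions, not part of the lecture, and the sketch assumes 4-byte aligned instructions (which is why the two lowest PC bits are skipped).

```python
INDEX_BITS = 10  # 2^10 = 1K entries, matching the PC[11:2] example


def bht_index(pc: int) -> int:
    """Return bits PC[11:2] of a branch address (hypothetical helper)."""
    return (pc >> 2) & ((1 << INDEX_BITS) - 1)


# Two made-up branch addresses that differ only above bit 11.
pc_a = 0x00401234
pc_b = 0x00801234

# Because the buffer is not tagged, both branches share one entry (aliasing).
print(bht_index(pc_a), bht_index(pc_b))
```

Both addresses map to the same index, so each branch would read and update history left behind by the other.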
Static Branch Prediction
• In static branch prediction, the prediction made for a conditional branch remains constant (static) throughout the execution of a program
• Example 1: Always-predict-not-taken
– Simplest form of prediction: always fetch the next instruction in sequential order
– In case of a misprediction, the incorrectly fetched instruction is discarded and branch penalty is incurred
– Low prediction accuracy because many branches in the program are taken
• Typically, branch outcomes are not completely random
• In a loop with many iterations, forward branches (beginning of loop) are mostly not taken and backward branches (end of loop) are mostly taken
• Example 2: Predict not-taken for forward branches and taken for backward branches
– Improves prediction accuracy as compared to the always-not-taken prediction
– Mispredictions still happen during the last loop iteration
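The two static schemes above can be compared with a small Python sketch. The function names, the addresses 0x120 and 0x100, and the 10-iteration loop are illustrative assumptions; a branch is treated as backward when its target address is lower than its own address.

```python
def predict_always_not_taken(branch_pc, target_pc):
    """Example 1: always predict not taken (False)."""
    return False


def predict_btfnt(branch_pc, target_pc):
    """Example 2: predict taken only for backward branches."""
    return target_pc < branch_pc


def mispredictions(predict, branch_pc, target_pc, outcomes):
    """Count outcomes that differ from the (constant) static prediction."""
    guess = predict(branch_pc, target_pc)
    return sum(1 for taken in outcomes if taken != guess)


# Loop-closing backward branch: taken 9 times, not taken on the last iteration.
loop_outcomes = [True] * 9 + [False]

print(mispredictions(predict_always_not_taken, 0x120, 0x100, loop_outcomes))
print(mispredictions(predict_btfnt, 0x120, 0x100, loop_outcomes))
```

For this loop, always-not-taken mispredicts every iteration except the last, while the forward/backward rule mispredicts only the last one.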
Dynamic Branch Prediction
• Outcomes for a branch instruction often change during program execution
– Static prediction may result in low prediction accuracy
• But, outcomes for a particular branch often follow a predictable pattern
• Key idea behind dynamic branch prediction:
– Track the past outcomes for a branch instruction to make predictions about future outcomes
• In its simplest form, a dynamic prediction algorithm can use the result of the most recent execution of a branch instruction
– This result can be captured in a single bit (e.g., “0” if the branch was taken and “1” if the branch was not taken)
– The processor assumes that the next time the branch instruction is executed, its outcome will be the same as the last time
1-bit Branch Prediction
• The algorithm is implemented by a 2-state state machine:
LT -- Branch is likely to be taken
LNT -- Branch is likely not to be taken
• The prediction for a branch is based on the current state of the state machine
• The state transitions are based on the actual outcome computed after the branch has been executed
Example
• Consider a branch instruction which is executed 6 times in a program. The actual outcomes of the branch are T, T, NT, T, T, NT where “T” = Taken and “NT” = Not taken. Assume that the 1-bit branch predictor starts in the LNT state. What predictions will it make for each instance of the branch?
Instance  Current State  Prediction  Actual Outcome  Next State
1         LNT            NT          T               LT
2         LT             T           T               LT
3         LT             T           NT              LNT
4         LNT            NT          T               LT
5         LT             T           T               LT
6         LT             T           NT              LNT
Prediction Accuracy = 2/6
Mispredictions happen during both the first and last iterations of the loop => one bit of state is not enough to capture the branch outcome pattern accurately
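The worked example above can be replayed with a short Python sketch of the 2-state machine. The function name and the encoding (True for LT, False for LNT) are assumptions made for illustration.

```python
def simulate_one_bit(outcomes, start_taken=False):
    """Replay a 1-bit predictor. State True = LT, False = LNT."""
    state = start_taken
    predictions = []
    for taken in outcomes:
        predictions.append(state)  # predict whatever happened last time
        state = taken              # next state follows the actual outcome
    return predictions


# T, T, NT, T, T, NT with the predictor starting in LNT
outcomes = [True, True, False, True, True, False]
preds = simulate_one_bit(outcomes, start_taken=False)
correct = sum(p == o for p, o in zip(preds, outcomes))
print(f"Prediction accuracy = {correct}/{len(outcomes)}")
```

Only instances 2 and 5 are predicted correctly, which matches the 2/6 accuracy in the table.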
2-bit Branch Prediction
ST: Strongly likely to be taken
LT: Likely to be taken
LNT: Likely not to be taken
SNT: Strongly likely not to be taken
Branch predicted as Taken in the ST and LT states
Branch predicted as Not taken in the LNT and SNT states
Example
• Consider a branch instruction which is executed 6 times in a program. The actual outcomes of the branch are T, T, NT, T, T, NT where “T” = Taken and “NT” = Not taken. Assume that the 2-bit branch predictor starts in the LT state. What predictions will it make for each instance of the branch?
Instance  Current State  Prediction  Actual Outcome  Next State
1         LT             T           T               ST
2         ST             T           T               ST
3         ST             T           NT              LT
4         LT             T           T               ST
5         ST             T           T               ST
6         ST             T           NT              LT
Prediction Accuracy = 4/6
Mispredictions happen only during the last iteration of the loop => fewer mispredictions than 1-bit prediction
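The same outcome sequence can be replayed through a 2-bit predictor. The sketch below assumes the four states behave as a saturating counter (SNT=0, LNT=1, LT=2, ST=3); the example only exercises the LT and ST transitions shown in the table, so the other transitions are an assumption and the lecture's state diagram may define them differently.

```python
SNT, LNT, LT, ST = 0, 1, 2, 3  # assumed encoding: saturating 2-bit counter


def simulate_two_bit(outcomes, start=LT):
    """Return a list of (state, prediction) pairs, one per branch instance."""
    state = start
    trace = []
    for taken in outcomes:
        prediction = state >= LT        # ST and LT predict taken
        trace.append((state, prediction))
        if taken:
            state = min(state + 1, ST)  # move toward ST, saturate there
        else:
            state = max(state - 1, SNT)  # move toward SNT, saturate there
    return trace


# T, T, NT, T, T, NT with the predictor starting in LT
outcomes = [True, True, False, True, True, False]
trace = simulate_two_bit(outcomes, start=LT)
correct = sum(pred == o for (_, pred), o in zip(trace, outcomes))
print(f"Prediction accuracy = {correct}/{len(outcomes)}")
```

A single not-taken outcome only moves the state from ST to LT, so the predictor still predicts taken on the next loop entry.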
Prediction Accuracy of 4K 2-bit Predictor
Having more Entries Isn’t the Solution
Correlating Branch Predictors
• Simple 2-bit prediction schemes use the history of a single branch to predict the future behavior of that branch. This is called local branch prediction
• The behavior of other branches may have an impact on the current branch
• Outcomes of different branches are often correlated
Example:
if (a == 2)
    a = 0;
if (b == 2)
    b = 0;
if (a == b) {
}
In the compiled code, if the first two branches are not taken (that is, a == 2 and b == 2, so both are set to 0), then the third branch is taken because a == b. Local branch prediction cannot capture this behavior
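    DADDI R3, R1, -2    ; R3 = a - 2 (a is in R1)
    BNEZ  R3, L1        ; taken if a != 2
    DADD  R1, R0, R0    ; a = 0
L1: DADDI R3, R2, -2    ; R3 = b - 2 (b is in R2)
    BNEZ  R3, L2        ; taken if b != 2
    DADD  R2, R0, R0    ; b = 0
L2: DSUB  R3, R1, R2    ; R3 = a - b
    BEQZ  R3, L3        ; taken if a == b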
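The correlation between the three branches can be checked with a small Python model of the assembly above. The function name run_branches and the range of test values are illustrative assumptions.

```python
def run_branches(a, b):
    """Return (taken1, taken2, taken3) for the three branches in order."""
    taken1 = (a != 2)   # BNEZ R3, L1
    if not taken1:
        a = 0           # DADD R1, R0, R0
    taken2 = (b != 2)   # BNEZ R3, L2
    if not taken2:
        b = 0           # DADD R2, R0, R0
    taken3 = (a == b)   # BEQZ R3, L3
    return taken1, taken2, taken3


# Whenever the first two branches fall through, the third is always taken.
for a in range(4):
    for b in range(4):
        t1, t2, t3 = run_branches(a, b)
        if not t1 and not t2:
            assert t3

# Taken alone, the third branch goes both ways depending on a and b.
print(run_branches(0, 0), run_branches(0, 1), run_branches(2, 2))
```

The third branch is not constant by itself, but it is fully determined once the first two are known to be not taken, which is information a per-branch history does not hold.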
Correlating Branch Predictor with 2-bit Global History Register
[Figure: a 2-bit global branch history selects one of four 2-bit per-branch predictors for the given branch address]
• Correlating (or 2-level) predictors use the behavior of other branches (global branch history) to make branch predictions
• Can extend the branch history to m bits, recording the outcomes of the last m branches
• Requires 2^m tables, each of length 2^(branch address bits used)
• Global branch history is implemented as an m-bit shift register where each bit records whether a branch was taken or not taken
Correlating Branch Predictor with m-bit Global History Register
An (m,n) correlating predictor uses the behavior of the last m branches to choose from 2^m branch predictors, each of which is an n-bit predictor
Total number of bits = 2^m * n * Number of entries in each prediction table
                     = 2^m * n * 2^(branch address bits used)
For a predictor that does not use any global history, m = 0; e.g., a (0,2) predictor is a 2-bit predictor with no global history
[Figure: an m-bit global branch history selects one of 2^m n-bit per-branch predictors for the given branch address]
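A minimal Python sketch of an (m,n) correlating predictor follows. The class name, the use of saturating counters that start at 0, the PC[..:2] indexing, and the alternating test pattern are all assumptions chosen for illustration, not details taken from the lecture.

```python
class CorrelatingPredictor:
    """Sketch of an (m,n) predictor: 2^m tables of n-bit saturating counters."""

    def __init__(self, m, n, index_bits):
        self.m, self.n, self.index_bits = m, n, index_bits
        self.history = 0                      # m-bit global history shift register
        self.max_count = (1 << n) - 1
        self.tables = [[0] * (1 << index_bits) for _ in range(1 << m)]

    def _index(self, pc):
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def predict(self, pc):
        counter = self.tables[self.history][self._index(pc)]
        return counter >= (1 << (self.n - 1))  # upper half predicts taken

    def update(self, pc, taken):
        table, i = self.tables[self.history], self._index(pc)
        if taken:
            table[i] = min(table[i] + 1, self.max_count)
        else:
            table[i] = max(table[i] - 1, 0)
        # Shift the newest outcome into the global history (stays 0 when m = 0).
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_correct(predictor, pc, outcomes):
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct


alternating = [True, False] * 50  # T, NT, T, NT, ...
correct_02 = count_correct(CorrelatingPredictor(0, 2, 10), 0x1000, alternating)
correct_22 = count_correct(CorrelatingPredictor(2, 2, 10), 0x1000, alternating)
print(correct_02, correct_22)
```

On a strictly alternating branch, the (0,2) predictor in this sketch never settles, while the (2,2) predictor learns a separate counter per history pattern after a few warm-up mispredictions.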
Correlating Predictor Examples
Question: How many bits are in the (0,2) branch predictor with 4K entries? How many entries are in a (2,2) predictor with the same number of bits?
Solution:
Number of bits = 2^m * n * 2^(branch address bits used)
For the (0,2) predictor:
Number of bits = 2^0 * 2 * 4K = 8K bits
For the (2,2) predictor:
Number of bits = 8K
8K = 2^2 * 2 * Number of predictor entries
=> Number of predictor entries = 1K
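The arithmetic in this solution can be double-checked with a short Python sketch. The two helper names are illustrative assumptions; "entries" here means the number of entries in each prediction table, as in the formula above.

```python
def predictor_bits(m, n, entries):
    """Total bits = 2^m * n * entries per prediction table."""
    return (2 ** m) * n * entries


def entries_for_budget(m, n, total_bits):
    """Entries per table that fit in a fixed bit budget."""
    return total_bits // ((2 ** m) * n)


bits_02 = predictor_bits(0, 2, 4 * 1024)         # (0,2) predictor, 4K entries
entries_22 = entries_for_budget(2, 2, bits_02)   # (2,2) predictor, same budget
print(bits_02, entries_22)
```

The (0,2) predictor uses 8192 bits (8K), and a (2,2) predictor with the same budget has 1024 (1K) entries per table.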
Comparison of 2-bit Predictors