Branch Prediction
J. Nelson Amaral
Why Branch Prediction?
• One in every 5 to 7 instructions of a program is a branch.
• Not predicting, or mispredicting, is very costly in architectures with deep pipelines or with many functional units.
Baer p. 129
Anatomy of a Predictor
Baer p. 130
Anatomy of a Branch Predictor
• Event Source: the execution of the program
  – Predictive information:
    • Can be encoded in the instruction code
      – a bit indicates the most likely outcome
      – forward/backward branch
    • Obtained from some profiling information
Baer p. 130
Anatomy of a Branch Predictor (cont.)
• Event Selection: when to predict?
  – Simple solution: compute the prediction for every instruction (even non-branches)
    • Only use the result of the prediction for branches
Baer p. 130
Anatomy of a Branch Predictor (cont.)
• Prediction Indexing:
  – Use part of the PC to index prediction tables:
    • history of outcomes of previous branches at this PC
    • history of the execution path leading to this PC
Baer p. 130
Anatomy of a Branch Predictor (cont.)
• Predictor Mechanism:
  – Static (example):
    • forward: always not taken
    • backward: always taken
  – Dynamic:
    • Finite State Machine predictor: saturating counters
    • Markov predictor: correlation
Baer p. 131
Anatomy of a Branch Predictor (cont.)
• Feedback and Recovery:
  – Use the real outcome to reinforce the prediction
  – Must recover from mispredictions
Baer p. 131
Control Flow Statistics

| Application | % control flow | % cond. branches (% taken) | % uncond. (% direct) | % calls | % returns |
|---|---|---|---|---|---|
| SPEC95int | 20.4 | 14.9 (46) | 1.1 (77) | 2.2 | 2.1 |
| Desktop | 18.7 | 13 (39) | 1.1 (92) | 2.4 | 2.1 |

A 4-way superscalar has to predict a branch, on average, every other cycle.
Baer p. 131
Interbranch Distances
40% of the time there are 0 or 1 cycles between predictions.
Branch resolution takes about 10 cycles.
If the prediction is wrong, up to 40 wrong-path instructions (4 per cycle × 10 cycles) are in flight by the time the resolution occurs.
Simulation for a 4-way out-of-order architecture
Baer p. 131
Static Predictions
Always Taken OR Always Not Taken
Baer p. 132
Static Predictions: Always Taken
• Early studies indicated that 2/3 of branches are taken
  – but 30% of those branches were unconditional!
• For conditional branches there appears to be no preferred direction.
Baer p. 132
Alternative Static Predictions
Forward: Always Not Taken
Backward: Always Taken
Accuracy improvements are barely noticeable.
Static prediction based on profiling is slightly better.
Static branch-not-taken has no implementation cost on the pipeline.
Baer p. 132
Dynamic Predictors
• The prediction for a given branch changes with the execution of the program.
  – Simple: a finite-state machine encodes the outcome of a few recent executions of the branch.
  – Elaborate: not only earlier outcomes of the branch, but also other correlated parts of the program are considered.
Baer p. 132
When to predict?
• Static prediction: at the Instruction Decode stage
  – Know that the instruction is a branch
• Dynamic prediction: at the Instruction Fetch stage
  – Calculate prediction for every instruction, even non-branch ones.
Baer p. 133
What to Predict?
• Branch Direction: Is the branch taken or not?
• Branch Target: Address of the next instruction for a taken branch
Baer p. 133
Predicting Direction
• Where do we find the prediction?
  Look at the recent past: what was the direction the last time this same branch was executed?
• How to encode the prediction?
  A single bit encodes the prediction. The prediction bit is set at prediction time.
Baer p. 133
Prediction Hysteresis
• Look at the last two resolutions
  – Two wrong predictions are necessary to change the prediction
  – Motivated by wrong predictions at the end of inner loops.
Baer p. 133
2-Bit Saturating Counter
States of the counter:
• Last two instances were taken
• Last instance was taken but the previous was not
• Last instance was not taken but the previous was taken
• Last two instances were not taken
Baer p. 134
2-Bit Saturating Counter (Example)
for (i = 0; i < m; i++)
  for (j = 0; j < n; j++)
    begin S1; S2; …; Sk end;
[Figure: flowchart of the nested loop; node labels: i ← 0, m ≤ 0, n ≥ 0, j ← 0, S1; S2; …; Sk, j ← j+1, j < n (T/NT), i ← i+1, i < m]
1-bit predictor for the inner-loop branch:

| i | j | Pred | Outc |
|---|---|---|---|
| 0 | 0 | NT | T |
| 0 | 1 | T | T |
| 0 | n | T | NT |
| 1 | 0 | NT | T |
| 1 | 1 | T | T |

2 × m mispredictions
Baer p. 134
2-Bit Saturating Counter (Example)
for (i = 0; i < m; i++)
  for (j = 0; j < n; j++)
    begin S1; S2; …; Sk end;
[Figure: flowchart of the nested loop; node labels: i ← 0, m ≤ 0, n ≥ 0, j ← 0, S1; S2; …; Sk, j ← j+1, j < n (T/NT), i ← i+1, i < m]
1-bit and 2-bit predictors for the inner-loop branch:

| i | j | 1-bit Pred | 1-bit Outc | 2-bit State | 2-bit Pred | 2-bit Outc |
|---|---|---|---|---|---|---|
| 0 | 0 | NT | T | wNT | NT | T |
| 0 | 1 | T | T | sT | T | T |
| 0 | n | T | NT | sT | T | NT |
| 1 | 0 | NT | T | wT | T | T |
| 1 | 1 | T | T | sT | T | T |

m + 1 mispredictions with the 2-bit counter
Baer p. 134
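The misprediction counts above (2 × m for 1 bit, m + 1 for 2 bits) can be checked with a small simulation. This is only a sketch under stated assumptions: it uses a plain saturating counter (increment on taken, decrement on not taken), which may differ in its weak-state transitions from the state machine drawn on the slide, it starts the counter just below the "taken" threshold, and it models the inner-loop branch as n taken outcomes followed by one not-taken outcome. The function names are illustrative, not from the slides.

```python
def count_mispredictions(outcomes, bits):
    """Count mispredictions of one branch with a `bits`-wide saturating counter.

    Assumption: the counter starts just below the taken threshold
    (state 0 for 1 bit, "weak not taken" for 2 bits).
    """
    top = (1 << bits) - 1          # saturation value
    threshold = 1 << (bits - 1)    # counter >= threshold predicts taken
    counter = threshold - 1
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= threshold
        if predicted_taken != taken:
            misses += 1
        # Feedback: move toward the actual outcome, saturating at both ends.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return misses


def inner_loop_branch(m, n):
    """Outcomes of the inner-loop branch: n taken, then one not taken, m times."""
    return ([True] * n + [False]) * m


m, n = 10, 4
outcomes = inner_loop_branch(m, n)
print(count_mispredictions(outcomes, bits=1))  # 20, i.e. 2 * m
print(count_mispredictions(outcomes, bits=2))  # 11, i.e. m + 1
```

With 1 bit, both the loop exit and the next loop entry are mispredicted; with 2 bits, the single not-taken exit only moves the counter to a weak state, so the next entry is still predicted taken.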
Accuracy of Branch Prediction
• Includes unconditional branches
• Predictions are associated with branches after each branch’s first execution
[Figure: prediction accuracy chart]
Average of 26 traces (IBM 370, DEC PDP-11, CDC 6400)
Average of 32 traces (MIPS R2000, Sun SPARC, DEC VAX, Motorola 68000)
Fixed prediction: determined by the first execution of the branch.
3-bit counters yield only minor improvements
Baer p. 135
Where to store the Prediction
Need one (or two) bits for each possible branch address.
• 32-bit address → 2^30 entries (one per 4-byte instruction)
Storing prediction bits with instructions.
• Need to modify code every 5 instructions.
Use a cache (Branch Prediction Buffer – BPB).
• Many more bits for tags than for predictions.
Solution: ditch the tags.
Baer p. 136
Pattern History Table (PHT)
Each entry of the PHT stores the state of a finite state machine associated with a branch.
Use selected bits from the PC to index (or hash) the PHT.
Aliasing: multiple branches may index the same PHT entry.
Performance degrades slightly.
Baer p. 136
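A tagless PHT indexed by PC bits can be sketched as follows. The table size, the choice of dropping the two low PC bits (word-aligned instructions), the initial counter state, and the example addresses are all assumptions made for illustration.

```python
class BimodalPredictor:
    """Tagless table of 2-bit saturating counters indexed by low PC bits."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)   # assumed start: weak not taken

    def index(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self.index(pc)] >= 2   # True means "taken"

    def update(self, pc, taken):
        i = self.index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)


bp = BimodalPredictor(index_bits=4)
branch_a, branch_b = 0x1000, 0x1040   # hypothetical PCs that share an entry
bp.update(branch_a, True)
bp.update(branch_a, True)
print(bp.index(branch_a) == bp.index(branch_b))  # True: the two branches alias
print(bp.predict(branch_b))                      # True: b inherits a's training
```

Because there are no tags, branch_b silently reuses the counter trained by branch_a; that is the aliasing the slide refers to.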
Accuracy of Bimodal Predictor (based on PHT)
Based on 10 SPEC89 traces.
Baer p. 137
Where Is the Predictor Stored?
Embedded in instruction cache:
• Alpha 21264: 1 counter per instruction? (2K counters)
• Sun UltraSPARC: 2 counters/cache line (2K counters)
• AMD K5: 1 counter/cache line (1K counters)
Separate PHT:
• MIPS R10000 (512 counters)
• IBM PowerPC 620 (512 counters)
Intel Pentium: combines PHT with Branch Target Buffer (512 entries)
Baer p. 137
Feedback and Recovery
Baer p. 137
Feedback: Bimodal Predictor
• Feedback: update the 2-bit counter for the executing branch
• When is the update done?
  – When the actual direction is found (EX stage): other predictions of the same branch may already have been made.
  – When the branch commits: even more predictions may have been made.
  – Speculatively, when the prediction is made: only reinforces the prediction in a bimodal predictor.
EX/commit updating makes little difference in performance.
Textbook typo (p. 137): choice for the timing of the “update”.
Baer p. 137
Local vs. Global Predictor
• Local:
  – Only use the history of the branch to be predicted
• Global:
  – Use the history of other branches that precede the branch to be predicted.
Baer p. 138
Motivation for Global Prediction
• Example from SPEC program eqntott:
if (aa == 2) /* b1 */
  aa = 0;
if (bb == 2) /* b2 */
  bb = 0;
if (aa != bb) { /* b3 */
  ….
}
If b1 and b2 are taken, then b3 is not taken.
Baer p. 138
Correlator Predictor
Two-level predictor.
History Register:
• A 1 is inserted on the right when a branch is taken (0 otherwise)
• Shifted-out bits are lost
Baer p. 139
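A two-level correlator predictor can be sketched as a global history register that indexes a PHT of 2-bit counters. The sketch below updates both levels immediately after each branch (no pipeline timing), starts the counters at "weak not taken", and runs on a made-up trace for three branches b1, b2, b3; all of these are assumptions for illustration.

```python
class CorrelatorPredictor:
    """Two-level predictor: a global history register indexes a PHT."""

    def __init__(self, history_bits):
        self.mask = (1 << history_bits) - 1
        self.history = 0                        # global history register
        self.pht = [1] * (1 << history_bits)    # assumed start: weak not taken

    def predict(self):
        return self.pht[self.history] >= 2

    def update(self, taken):
        i = self.history
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        # A 1 is inserted on the right for a taken branch, 0 otherwise;
        # the mask drops the bit shifted out on the left.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def mispredictions(predictor, trace):
    misses = 0
    for taken in trace:
        misses += predictor.predict() != taken
        predictor.update(taken)
    return misses


# Hypothetical trace of (b1, b2, b3) outcomes: rounds alternate between
# (taken, taken, not taken) and (not taken, not taken, taken).
T, N = True, False
round_pair = [T, T, N, N, N, T]
predictor = CorrelatorPredictor(history_bits=3)
mispredictions(predictor, round_pair * 5)          # warm-up
print(mispredictions(predictor, round_pair * 5))   # 0
```

In this trace b3 alternates between not taken and taken, which a per-branch 2-bit counter cannot track, while three bits of global history identify every position in the pattern.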
Update Problem in the Correlator Predictor
• PHT is updated non-speculatively at commit stage.
• What is the problem with non-speculative updates of the global register?
Baer p. 139
Updating the Global Register in the Correlator Predictor
if (aa == 2) /* b1 */
  aa = 0;
if (bb == 2) /* b2 */
  bb = 0;
if (aa != bb) { /* b3 */
  ….
}

| Event | Time |
|---|---|
| Prediction of b1 | t |
| Prediction of b2 | t+1 |
| Prediction of b3 | t+2 |
| Commit of b1 | t+5 |

With updates at commit, the outcomes of b1 and b2 have not yet been shifted into the global register when b3 is predicted at t+2: branches b1 and b2 are not included in the prediction of branch b3!
Baer p. 139
Updating the Global Register in the Correlator Predictor
if (aa == 2) /* b1 */
  aa = 0;
if (bb == 2) /* b2 */
  bb = 0;
if (aa != bb) { /* b3 */
  ….
}
Mispredictions and cache misses affect the commit time of earlier branches.
• Two consecutive predictions of a branch b may use different ancestors of b.
• Even if the path leading to b is the same
Baer p. 139
Solution to the Update Problem in the Correlator Predictor
• Update the Global Register speculatively when the prediction is made.
• New problem:
  – Need a repair mechanism
  – All bits after a misprediction are from branches in the wrong path.
Baer p. 139
Repair Mechanism for Global Register in the Correlator Predictor
• Decode Stage:
  – Checkpoint current GR into a FIFO queue
• Commit Stage:
  – H: head of the queue
  – The corresponding checkpointed GR is H.
  – Correct prediction: discard H
  – Incorrect prediction: shift branch outcome into H and make it the new GR.
Baer p. 144
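The checkpoint-and-repair steps can be sketched with a FIFO of saved global-register values. This sketch merges the decode-stage checkpoint and the speculative update into one call, and it empties the queue on a misprediction on the assumption that all younger in-flight branches are squashed; the class and method names are hypothetical.

```python
from collections import deque


class SpeculativeHistory:
    """Global register (GR) updated speculatively, with FIFO checkpoints."""

    def __init__(self, bits=8):
        self.mask = (1 << bits) - 1
        self.gr = 0
        self.checkpoints = deque()   # oldest in-flight branch at the left

    def _shift(self, gr, taken):
        return ((gr << 1) | int(taken)) & self.mask

    def on_predict(self, predicted_taken):
        # Checkpoint the GR as seen by this branch, then update speculatively.
        self.checkpoints.append(self.gr)
        self.gr = self._shift(self.gr, predicted_taken)

    def on_commit(self, predicted_taken, actual_taken):
        head = self.checkpoints.popleft()   # H: checkpoint of the committing branch
        if predicted_taken != actual_taken:
            # Repair: shift the real outcome into H and make it the new GR.
            self.gr = self._shift(head, actual_taken)
            # Assumption: younger branches were on the wrong path and are squashed.
            self.checkpoints.clear()


h = SpeculativeHistory(bits=4)
for guess in (True, True, False):    # speculative predictions for b1, b2, b3
    h.on_predict(guess)
h.on_commit(True, True)              # b1 predicted correctly: discard H
h.on_commit(True, False)             # b2 mispredicted: repair the GR
print(bin(h.gr))                     # 0b10 (b1 taken, b2 not taken)
```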
Optimization to GR Checkpointing
Put into the queue a GR that has the corrected bit shifted into it.
Baer p. 144
Issues with Correlator Predictor
• For small PHTs
  – Performance is worse than local predictors
• It does not use the location of the branch in the program for the prediction
  – May introduce excessive aliasing
• Solution to the aliasing problem:
  – Reintroduce the PC in the indexing of the PHT
Baer p. 140
gshare Predictor
A common hash is an XOR function.
Baer p. 141
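The XOR hash can be sketched as follows: the PHT index is formed from PC bits XORed with the global history. The table size, the dropped low PC bits, the initial counter state, and the example addresses are assumptions for illustration.

```python
class GsharePredictor:
    """PHT of 2-bit counters indexed by (PC bits XOR global history)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0
        self.pht = [1] * (1 << index_bits)   # assumed start: weak not taken

    def index(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits are dropped.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.pht[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


g = GsharePredictor(index_bits=4)
g.history = 0b1010                   # same global history for both lookups
print(g.index(0x1000), g.index(0x1004))   # 10 11
```

With the same history, two different branches now select different PHT entries, which is how reintroducing the PC reduces aliasing.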
Accuracy and Use of gshare
• Almost perfect for SPEC FP95.
• 0.83 accuracy for SPEC INT95
  – 0.65 for program go
Used in: AMD K5, Sun UltraSPARC, IBM Power4
Baer p. 141
Example
• Assume n=4:
  – bimodal mispredicts 1 out of every 5 times
  – global mispredicts from 0 to 5 times depending on other branches in the loop
• This branch has a fixed pattern:
  – “4 taken, 1 not taken”
• How can this pattern be learned?
  – Remember the history of individual branches
• We need predictors more attuned to the locality of individual branches
[Figure: flowchart of the nested loop; node labels: i ← 0, m ≤ 0, n ≥ 0, j ← 0, S1; S2; …; Sk, j ← j+1, j < n (T/NT), i ← i+1, i < m]
Baer p. 142
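One way to learn the "4 taken, 1 not taken" pattern is to keep a history register per branch and use it to index a PHT. The sketch below assumes a 4-bit local history, a dictionary keyed by PC for the history registers, counters that start at "weak not taken", and a made-up branch address.

```python
class LocalHistoryPredictor:
    """Per-branch history registers indexing one shared PHT of 2-bit counters."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.histories = {}                      # PC -> local history register
        self.pht = [1] * (1 << history_bits)     # assumed start: weak not taken

    def predict(self, pc):
        return self.pht[self.histories.get(pc, 0)] >= 2

    def update(self, pc, taken):
        h = self.histories.get(pc, 0)
        self.pht[h] = min(3, self.pht[h] + 1) if taken else max(0, self.pht[h] - 1)
        self.histories[pc] = ((h << 1) | int(taken)) & self.mask


def run(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        misses += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return misses


pattern = [True, True, True, True, False]   # "4 taken, 1 not taken"
loop_branch = 0x2000                        # hypothetical PC
lp = LocalHistoryPredictor(history_bits=4)
run(lp, loop_branch, pattern * 4)           # warm-up
print(run(lp, loop_branch, pattern * 10))   # 0
```

After warm-up, the history 1111 always precedes the not-taken outcome, so its PHT entry predicts not taken while every other history predicts taken.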
global-set predictor
• First Level: A global shift register for correlations
• Second Level: A set of multiple PHTs to prevent aliasing
  – expensive in terms of storage
    • must use few PHTs to be viable
Baer p. 142/143
set-global predictor
• A set of branch history registers (BHT)
• A single global PHT
Baer p. 143
set-set predictor
• A set of branch history registers (BHT)
• A set of PHTs
Baer p. 143
Predicting the Branch Target
• When is the target of a branch computed?
  – In a superscalar architecture (e.g., the IA-32 of the Intel P6), after several pipeline stages.
• What is the point of predicting direction early if we don’t know where the branch goes?
  – Need to also predict the branch target address.
Baer p. 145
Branch Target Buffer (BTB)
• A cache-like storage that records branch addresses and associated targets
• If there is a hit in the BTB for a branch predicted taken:
  – PC ← Target in BTB for branch
Baer p. 146
Integrated BTB-PHT
• BTB needs much more space than the PHT
  – # of entries is limited by the BTB.
• BTB must be accessed in a single cycle
Baer p. 146
Decoupled BTB-PHT
• Parallel BTB and PHT access
• If the PHT says ‘taken’ and there is a hit in the BTB, then PC ← Address in BTB
Baer p. 146
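The next-PC selection for a decoupled BTB and PHT can be sketched as a small function. The BTB is modeled as a plain dictionary from branch address to target, the fall-through case simply fetches sequentially (a real front end may instead stall on a predicted-taken BTB miss), and the addresses are made up; all of this is an assumption for illustration.

```python
def next_fetch_pc(pc, pht_says_taken, btb, instr_size=4):
    """Choose the next fetch address from the PHT direction and a BTB lookup."""
    target = btb.get(pc)                     # None models a BTB miss
    if pht_says_taken and target is not None:
        return target                        # PC <- address in BTB
    # Assumption: otherwise keep fetching sequentially.
    return pc + instr_size


btb = {0x1000: 0x2000}                       # hypothetical branch -> target
print(hex(next_fetch_pc(0x1000, True, btb)))    # 0x2000
print(hex(next_fetch_pc(0x1000, False, btb)))   # 0x1004
print(hex(next_fetch_pc(0x1008, True, btb)))    # 0x100c
```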
Decoupled BTB-PHT
• For space efficiency:
  – Only taken branches are added to the BTB
    • They are added at the backend when the outcome is known.
IBM PowerPC 620: 256-entry, 2-way set-associative BTB; 2K-counter PHT
Baer p. 146
Integrating the BTB with the Branch History Table (BHT)
• The history of all branches needs to be recorded in the BTB+BHT
• Taken and not-taken branches need to be included
Most likely, it is not the same bit field from the PC that is used to index the BTB+BHT and to select the PHT.
Intel P6: 4-bit local history, 512 BTB entries, # of PHTs not published
What happens on a BTB miss? “Backward taken, forward not taken” prediction.
Baer p. 147
Two Instances of Mispredictions
• Direction of branch b is mispredicted
  – Recovery only when b is at the head of the reorder buffer
    • lots of instructions to be nullified
• BTB miss for branch b (direction is correctly predicted taken)
  – Cannot fetch instructions until the target is computed
    • only affects the filling of the front end
Baer p. 147
misfetch
• Branch is correctly predicted taken and
• There is a hit in the BTB
• but the target address is wrong
  – caused by indirect jumps
    • more common in object-oriented languages
  – can modify a BTB entry after two misfetches
    • need a counter with each BTB entry
Intel Pentium M: has an indirect branch predictor that associates global history registers with target addresses
Baer p. 148
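The "replace after two misfetches" policy can be sketched with a small counter per BTB entry. Whether the two misfetches must be consecutive is not stated on the slide; this sketch assumes they must, and resets the counter on a correct target. The class name and addresses are hypothetical.

```python
class BTBEntry:
    """BTB entry whose target is replaced only after two misfetches."""

    def __init__(self, target):
        self.target = target
        self.misfetches = 0

    def resolve(self, actual_target):
        if actual_target == self.target:
            self.misfetches = 0          # assumption: a correct fetch resets the count
            return
        self.misfetches += 1
        if self.misfetches == 2:
            self.target = actual_target  # second misfetch: install the new target
            self.misfetches = 0


entry = BTBEntry(target=0x2000)
entry.resolve(0x3000)
print(hex(entry.target))   # 0x2000, one misfetch is tolerated
entry.resolve(0x3000)
print(hex(entry.target))   # 0x3000
```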
Chapter 2 — Instructions: Language of the Computer — 53
CMPUT 229 Flashback: Procedure Call Instructions
• Procedure call: jump and link
    jal ProcedureLabel
  – Address of following instruction put in $ra
  – Jumps to target address
• Procedure return: jump register
    jr $ra
  – Copies $ra to program counter
  – Can also be used for computed jumps
    • e.g., for case/switch statements
P-H p. 113
Example fact(3)
int fact (int n) { if (n < 1) return(1); else return(n * fact(n-1)); }
MIPS assembly:
fact:
    sub $sp, $sp, 8      # make room on the stack for 2 more items
    sw  $ra, 4($sp)      # save the return address
    sw  $a0, 0($sp)      # save the argument n
    slt $t0, $a0, 1      # if ($a0 < 1) then $t0 ← 1 else $t0 ← 0
    beq $t0, $zero, L1   # if n ≥ 1, go to L1
    add $v0, $zero, 1    # return 1
    add $sp, $sp, 8      # pop two items from the stack
    jr  $ra              # return to the instruction after jal
L1: sub $a0, $a0, 1      # subtract 1 from argument
    jal fact             # call fact(n-1)
    lw  $a0, 0($sp)      # just returned from jal: restore n
    lw  $ra, 4($sp)      # restore the return address
    add $sp, $sp, 8      # pop two items from the stack
    mul $v0, $a0, $v0    # return n*fact(n-1)
    jr  $ra              # return to the caller
Caller:
    0x1000 3FFC  addi $a0, $zero, 3
    0x1000 4000  jal fact
    0x1000 4004  ….
Processor state: $a0 = 3, $sp = 0x1000 2000; $t0, $v0 and $ra not yet set
Pat.-Hen. pp. 136-138 and A-26/A-29
Example fact(3) (cont., same code and caller as on the previous slide)
Processor state: $a0 = 3, $sp = 0x1000 2000, $ra = 0x1000 4004; $t0 and $v0 not yet set
Pat.-Hen. pp. 136-138 and A-26/A-29
Example fact(3) (cont., same code and caller as on the previous slides)
Processor state: $t0 = 1, $v0 = 6, $a0 = 3, $sp = 0x1000 2000, $ra = 0x1000 4004
Stack contents, from high address to low address:
    0x1000 4004
    3
    0x1000 6FEC
    2
    0x1000 6FEC
    1
    0x1000 6FEC
    0
Pat.-Hen. pp. 136-138 and A-26/A-29
![Page 58: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/58.jpg)
Call/Return Mechanisms
foo(….){
    …
    0x10001000 jal bar
    0x10001004 …
    …
    0x10001800 jal bar
    0x10001804 …
    …
    0x10001CE4 jal bar
    0x10001CE8 …
    ...
}
bar(….){
    …
    0x1000F0E0 jal baz
    0x1000F0E4 …
    ...
    jr $ra
}
baz(….){
    ...
    jr $ra
}
How to predict the next instruction to be executed after the return?
We know that the branch is always taken.
The return address is known since the time of each call!
Baer p. 150
![Page 59: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/59.jpg)
Return Address Stack
foo(….){
    …
    0x10001000 jal bar
    0x10001004 …
    …
    0x10001800 jal bar
    0x10001804 …
    …
    0x10001CE4 jal bar
    0x10001CE8 …
    ...
}
bar(….){
    …
    0x1000F0E0 jal baz
    0x1000F0E4 …
    ...
    jr $ra
}
baz(….){
    ...
    jr $ra
}
Pop address from the stack at return.
Push return address onto the stack at the function call.
The stack is implemented as a circular buffer: on overflow the oldest entry is overwritten, so a later return pops a wrong address. What is the best strategy to handle overflow?
Baer p. 150
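The push/pop behaviour and the overflow problem can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design of any particular processor: the class name `ReturnAddressStack`, the helper `nested_calls`, and the synthetic return addresses are assumptions made for the example.

```python
class ReturnAddressStack:
    """Fixed-size return address stack kept in a circular buffer (illustrative model)."""

    def __init__(self, size):
        self.entries = [None] * size
        self.top = 0  # index of the next free slot; wraps around

    def push(self, return_address):
        # At a call: record the return address. Once the buffer is full,
        # this silently overwrites the oldest entry.
        self.entries[self.top] = return_address
        self.top = (self.top + 1) % len(self.entries)

    def pop(self):
        # At a return: predict the most recently pushed address.
        self.top = (self.top - 1) % len(self.entries)
        return self.entries[self.top]


def nested_calls(ras, depth):
    """Make `depth` nested calls, then return from all of them.

    Returns how many return addresses were predicted correctly.
    The addresses are synthetic values chosen for the example.
    """
    addresses = [0x10001004 + 0x100 * i for i in range(depth)]
    for address in addresses:
        ras.push(address)
    correct = 0
    for address in reversed(addresses):
        if ras.pop() == address:
            correct += 1
    return correct
```

With a 4-entry stack, `nested_calls(ReturnAddressStack(4), 3)` predicts all 3 returns, while `nested_calls(ReturnAddressStack(4), 6)` predicts only the 4 innermost returns correctly: the two oldest return addresses were overwritten on overflow.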
![Page 60: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/60.jpg)
Speculative calls and returns
foo(….){
    …
    0x10000FFC beq … target
    0x10001000 jal bar
    0x10001004 …
    …
target:
    0x10001800 jal baz
    0x10001804 …
    …
    0x10001CE4 jal bar
    0x10001CE8 …
    ...
}
bar(….){
    …
    0x1000F0E0 bne … next
    0x1000F0E4 jr $ra
    ...
next: ….
}
Function calls and returns executed in the predicted path of a branch change the return address stack.
Need a recovery mechanism for the return address stack.
If a single path is followed, save the pointer to the top of the stack on a branch prediction and restore it in case of misprediction.
Baer p. 150
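A minimal sketch of this recovery scheme, assuming a circular-buffer return stack: `checkpoint` saves the top-of-stack pointer when a branch is predicted, and `restore` puts it back on a misprediction. The class and method names, and the caller's return address `0x10000400`, are hypothetical; the other addresses follow the `foo` example above.

```python
class CheckpointedRAS:
    """Return address stack with top-of-stack checkpointing (illustrative model)."""

    def __init__(self, size):
        self.entries = [None] * size
        self.top = 0  # index of the next free slot

    def push(self, return_address):
        self.entries[self.top] = return_address
        self.top = (self.top + 1) % len(self.entries)

    def pop(self):
        self.top = (self.top - 1) % len(self.entries)
        return self.entries[self.top]

    def checkpoint(self):
        # Taken when a branch is predicted: remember only the pointer.
        return self.top

    def restore(self, saved_top):
        # On a misprediction: discard speculative pushes and pops by
        # resetting the pointer. Entry contents are not restored.
        self.top = saved_top


ras = CheckpointedRAS(8)
ras.push(0x10000400)        # hypothetical: return address of foo's caller
saved = ras.checkpoint()    # beq at 0x10000FFC is predicted not taken
ras.push(0x10001004)        # speculative: jal bar at 0x10001000
ras.restore(saved)          # beq was mispredicted: undo the speculative push
ras.push(0x10001804)        # correct path: jal baz at 0x10001800
after_baz = ras.pop()       # return from baz
after_foo = ras.pop()       # return from foo
```

Restoring only the pointer is cheap but not complete: if the wrong path pops an entry and then pushes a new one, the old contents are overwritten and the pointer alone cannot bring them back.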
![Page 61: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/61.jpg)
Return Stacks
MIPS R10000: 1-entry return stack
DEC Alpha 21164: 12-entry return stack
Intel Pentium III: 16-entry return stack
Baer p. 151
![Page 62: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/62.jpg)
A different way of doing things…
Don’t know which way to go?
“Some people go both ways.”
(Scarecrow, The Wizard of Oz)
Baer p. 151
![Page 63: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/63.jpg)
IBM System 360/91
• Upon decoding a branch:
– fetch, decode, and enqueue both the taken and the not-taken paths into separate buffers
• Upon branch resolution:
– one buffer becomes the execution path
– the other is discarded
Baer p. 151
![Page 64: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/64.jpg)
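In a restricted version …
• The branch is predicted taken and there is a BTB hit
• Instruction cache line: the branch instruction, followed by the fall-through instructions in the same cache line
• Resume buffer: holds the fall-through instructions from the cache line
• On a misprediction: fetch from the resume buffer!
• Used in the MIPS R10000 and Intel P6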
Baer p. 151
![Page 65: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/65.jpg)
Loop Detector
• A separate loop predictor detects loop patterns:
– TTTTTTTNTTTTTTTNTTTTTTTNTTTTTTTNTT….
• Uses a separate counter for each recognized loop
Intel Pentium M
Baer p. 151
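One way a per-loop counter can capture the TTTTTTTN pattern is sketched below. This is a simplified model, not the Pentium M design: it assumes the predictor learns a trip count (the number of taken outcomes before the not-taken exit) and trusts it once the same count has been seen twice in a row. All names are hypothetical.

```python
class LoopPredictor:
    """Learns a fixed trip count per loop branch (illustrative model)."""

    def __init__(self):
        self.loops = {}  # branch address -> state of that loop

    def predict(self, pc):
        entry = self.loops.get(pc)
        if entry is None or not entry["confident"]:
            return True  # no reliable trip count yet: predict taken
        # Predict taken until the learned trip count is reached.
        return entry["count"] < entry["trip"]

    def update(self, pc, taken):
        entry = self.loops.setdefault(
            pc, {"trip": None, "count": 0, "confident": False})
        if taken:
            entry["count"] += 1
        else:
            # Loop exit: confident only if this trip count repeats the last one.
            entry["confident"] = entry["trip"] == entry["count"]
            entry["trip"] = entry["count"]
            entry["count"] = 0


def count_mispredictions(predictor, pc, pattern):
    """Run a string of 'T'/'N' outcomes for one branch; return the miss count."""
    misses = 0
    for outcome in pattern:
        taken = outcome == "T"
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

On four executions of a loop with seven taken outcomes each, `count_mispredictions(LoopPredictor(), 0x1000, "TTTTTTTN" * 4)` gives 2: the exits of the first two executions are missed while the trip count is learned, and every later exit is predicted correctly.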
![Page 66: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/66.jpg)
Sophisticated Predictors
• Tension:
– Branch Correlation (global information) × Individual Branch Patterns (local information)
• neutral aliasing
– between branches biased the same way
• destructive aliasing
– between branches with opposite bias
• bias bit
– added to BTB
– PHT predicts whether the direction agrees with the bias bit
• two branches with strong opposite biases that alias do not destroy each other's predictions
Baer p. 152
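The effect of the bias bit can be illustrated with a small model in which the PHT counters predict "agrees with the bias bit" rather than "taken". This is a sketch under stated assumptions: the bias bit is set to the first observed outcome of each branch, a dictionary stands in for the BTB, and the PHT is indexed by address bits only. The names are hypothetical.

```python
class AgreePredictor:
    """PHT counters predict agreement with a per-branch bias bit (illustrative)."""

    def __init__(self, pht_size=16):
        self.pht = [2] * pht_size  # 2-bit counters; >= 2 means "agree"
        self.bias = {}             # stands in for the bias bit kept in the BTB

    def _index(self, pc):
        return (pc >> 2) % len(self.pht)

    def predict(self, pc):
        bias = self.bias.get(pc, True)  # assumption: default bias is taken
        agree = self.pht[self._index(pc)] >= 2
        return bias if agree else not bias

    def update(self, pc, taken):
        # Assumption: the bias bit is set to the first observed outcome.
        bias = self.bias.setdefault(pc, taken)
        i = self._index(pc)
        if taken == bias:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)


def count_mispredictions(predictor, trace):
    """trace is a list of (pc, taken) pairs; returns the number of misses."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Two branches that alias to the same PHT entry with opposite strong bias.
trace = [(0x1000, True), (0x1040, False)] * 10
misses = count_mispredictions(AgreePredictor(16), trace)
```

Here both branches push the shared counter toward "agree", so `misses` is 1 (the first execution of the not-taken branch, before its bias bit is set). A plain shared 2-bit counter would be pulled in opposite directions by the same trace.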
![Page 67: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/67.jpg)
skewed predictor
• Goal: reduce aliasing
• Use three PHTs
– different hashing function for each PHT
– take majority vote
Baer p. 153
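The majority vote can be sketched as follows. The three hash functions here are simple address-only functions invented for the example, not the skewing functions of any real design, and all three banks are updated on every branch, which is also a simplifying assumption.

```python
class SkewedPredictor:
    """Three PHT banks, each with its own hash; majority vote (illustrative)."""

    def __init__(self, index_bits=4):
        self.bits = index_bits
        self.mask = (1 << index_bits) - 1
        # 2-bit counters, initialised to weakly not taken.
        self.banks = [[1] * (1 << index_bits) for _ in range(3)]

    def _indices(self, pc):
        # Hypothetical hash functions: two branches that collide in one
        # bank are unlikely to collide in the other two.
        word = pc >> 2
        high = word >> self.bits
        return [word & self.mask,
                (word ^ high) & self.mask,
                (word + 3 * high) & self.mask]

    def predict(self, pc):
        votes = sum(bank[i] >= 2
                    for bank, i in zip(self.banks, self._indices(pc)))
        return votes >= 2

    def update(self, pc, taken):
        for bank, i in zip(self.banks, self._indices(pc)):
            bank[i] = min(3, bank[i] + 1) if taken else max(0, bank[i] - 1)


def count_mispredictions(predictor, trace):
    """trace is a list of (pc, taken) pairs; returns the number of misses."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# 0x1000 and 0x1040 collide in the first bank only.
trace = [(0x1000, True), (0x1040, False)] * 10
misses = count_mispredictions(SkewedPredictor(4), trace)
```

With these hashes the two branches share entry 0 of the first bank but use different entries in the other two, so the colliding bank is outvoted and `misses` is 1 (the first execution of the taken branch, while the counters warm up).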
![Page 68: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/68.jpg)
hybrid (or combining) predictor
Two different prediction strategies
Tournament predictor: predicts which strategy should be used
Baer p. 156
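A minimal sketch of a tournament scheme, assuming a bimodal table and a global-history table as the two strategies, and a table of 2-bit chooser counters that moves toward whichever component was right when they disagree. The class names, table sizes, and update policy are assumptions for illustration, not the design of a specific processor.

```python
class Bimodal:
    """2-bit counters indexed by branch address (local strategy)."""

    def __init__(self, size=16):
        self.table = [2] * size

    def predict(self, pc):
        return self.table[(pc >> 2) % len(self.table)] >= 2

    def update(self, pc, taken):
        i = (pc >> 2) % len(self.table)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)


class GlobalHistory:
    """2-bit counters indexed by the last few branch outcomes (global strategy)."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.table = [2] * (1 << history_bits)

    def predict(self, pc):
        return self.table[self.history] >= 2

    def update(self, pc, taken):
        i = self.history
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


class Tournament:
    """Chooser counters select between two component predictors."""

    def __init__(self, first, second, size=16):
        self.first, self.second = first, second
        self.chooser = [1] * size  # 0-1: use first, 2-3: use second

    def predict(self, pc):
        use_second = self.chooser[(pc >> 2) % len(self.chooser)] >= 2
        return self.second.predict(pc) if use_second else self.first.predict(pc)

    def update(self, pc, taken):
        p1, p2 = self.first.predict(pc), self.second.predict(pc)
        i = (pc >> 2) % len(self.chooser)
        if p1 != p2:  # train the chooser only when the components disagree
            if p2 == taken:
                self.chooser[i] = min(3, self.chooser[i] + 1)
            else:
                self.chooser[i] = max(0, self.chooser[i] - 1)
        self.first.update(pc, taken)
        self.second.update(pc, taken)


def count_mispredictions(predictor, trace):
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A branch that alternates taken / not taken.
trace = [(0x1000, True), (0x1000, False)] * 10
bimodal_misses = count_mispredictions(Bimodal(), trace)
tournament_misses = count_mispredictions(Tournament(Bimodal(), GlobalHistory(2)), trace)
```

On this alternating branch the bimodal predictor alone misses every not-taken outcome (10 misses), while the tournament predictor misses twice during warm-up and then follows the global-history component.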
![Page 69: Branch Prediction J. Nelson Amaral. Why Branch Prediction? Every 5-7 instruction of a program is a branch Not predicting, or miss-predicting, is very.](https://reader030.fdocuments.us/reader030/viewer/2022032801/56649d545503460f94a3068e/html5/thumbnails/69.jpg)
Tournament Predictor
Baer p. 155