CPE 731 Advanced Computer Architecture
ILP: Part II – Branch Prediction
Dr. Gheith Abandah
Adapted from the slides of Prof. David Patterson, University of California, Berkeley
04/18/23 CPE 731, ILP2 2
Outline
• Static Branch Prediction
• Dynamic Branch Prediction
• Branch History Table
• Correlated Branch Prediction
• Tournament Predictors
• Branch Target Buffer
Static Branch Prediction
• We learned how to schedule code around delayed branches
• To reorder code around branches, the compiler needs to predict each branch statically at compile time
• Simplest scheme is to predict every branch as taken
– Average misprediction = untaken branch frequency = 34% on SPEC
• A more accurate scheme predicts branches using profile information collected from earlier runs, and modifies the prediction based on the last run:
[Figure: misprediction rate (axis 0% to 25%) of profile-based prediction. Integer: compress 12%, eqntott 22%, espresso 18%, gcc 11%, li 12%. Floating point: doduc 4%, ear 6%, hydro2d 9%, mdljdp 10%, su2cor 15%.]
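A minimal sketch of the two static schemes above, assuming a hypothetical profile format (branch label mapped to taken and not-taken counts from an earlier run). The branch names and counts are made up for illustration; they are not SPEC data.

```python
def profile_based_predictions(profile):
    """Fix one prediction per branch: True means 'predict taken'.

    profile: dict of branch label -> (times_taken, times_not_taken),
    gathered from an earlier run (hypothetical format).
    """
    return {branch: taken >= not_taken
            for branch, (taken, not_taken) in profile.items()}


def misprediction_rate(predictions, profile):
    """Fraction of dynamic branches mispredicted when replaying the profile."""
    wrong = total = 0
    for branch, (taken, not_taken) in profile.items():
        total += taken + not_taken
        # A fixed prediction is wrong on every execution that goes the other way.
        wrong += not_taken if predictions[branch] else taken
    return wrong / total


# Made-up profile: a loop branch, an error check, and a data-dependent test.
profile = {"loop": (90, 10), "error_check": (1, 99), "data_test": (40, 60)}
always_taken = {branch: True for branch in profile}
profiled = profile_based_predictions(profile)

print(misprediction_rate(always_taken, profile))  # equals the untaken frequency
print(misprediction_rate(profiled, profile))
```

With predict-taken, the misprediction rate is exactly the fraction of executions that are not taken, which is the relationship stated on the slide.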
Dynamic Branch Prediction
• Why does prediction work?
– Underlying algorithm has regularities
– Data that is being operated on has regularities
– Instruction sequence has redundancies that are artifacts of the way that humans/compilers think about problems
• Is dynamic branch prediction better than static branch prediction?
– Seems to be
– There are a small number of important branches in programs which have dynamic behavior
• Performance = ƒ(accuracy, cost of misprediction)
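One common way to make the performance relation concrete is to charge each mispredicted branch a fixed stall penalty. The sketch below assumes that simple model; the function name, the 20% branch fraction, the 10-cycle penalty, and the base CPI of 1.0 are illustrative assumptions, not figures from these slides.

```python
def branch_stall_cpi(branch_fraction, misprediction_rate, penalty_cycles):
    """Average stall cycles added per instruction by mispredicted branches."""
    return branch_fraction * misprediction_rate * penalty_cycles


base_cpi = 1.0  # assumed ideal CPI with perfect prediction
for rate in (0.10, 0.05, 0.02):
    cpi = base_cpi + branch_stall_cpi(0.20, rate, penalty_cycles=10)
    print(f"misprediction rate {rate:.0%}: CPI = {cpi:.2f}")
```

Under this model, accuracy and misprediction cost multiply, so a deeper pipeline (larger penalty) makes each point of accuracy worth more.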
Branch History Table
• Use the lower k bits of the PC to index the table
[Figure: the lower k bits of the branch address index a Branch History Table (BHT) of 2^k entries, each n bits wide; the selected entry predicts taken or not taken.]
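The indexing step can be sketched as a bit mask. The helper name `bht_index` and the two branch addresses are hypothetical; the example also shows that branches whose addresses share their lower k bits map to the same entry.

```python
def bht_index(pc, k):
    """Return the BHT index: the lower k bits of the branch address."""
    return pc & ((1 << k) - 1)


k = 10
bht = [0] * (1 << k)  # 2^k entries, all starting at "predict not taken"

# Two hypothetical branch addresses that differ only above bit k-1
# share one entry, so each sees the other's history (aliasing).
a, b = 0x1234, 0x5634
print(bht_index(a, k), bht_index(b, k))
```

Because the table stores no tags, a lookup always returns some prediction, even when the entry was last trained by a different branch.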
BHT, n=1
• Problem: in a loop, a 1-bit BHT causes two mispredictions each time the loop runs (the average loop runs 9 iterations before exit):
– End-of-loop case, when the branch exits instead of looping as before
– First iteration the next time the loop is entered, when it predicts exit instead of looping
[State diagram: two states, 0 (predict not taken) and 1 (predict taken); a taken branch leads to state 1 and a not-taken branch leads to state 0.]
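The two mispredictions per loop execution can be checked with a small simulation of a single 1-bit entry. This is a sketch under an assumed loop shape (the loop branch is taken 8 times and then falls through), and the function name is illustrative.

```python
def count_mispredictions_1bit(outcomes, state=False):
    """Replay branch outcomes through one 1-bit BHT entry.

    state is the stored bit: True predicts taken, False predicts not taken.
    """
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken  # a 1-bit entry just remembers the last outcome
    return misses


# Assumed loop shape: the backward branch is taken 8 times, then falls through.
one_pass = [True] * 8 + [False]
print(count_mispredictions_1bit(one_pass * 3))  # three executions of the loop
```

Each execution of the loop costs one miss on entry and one on exit, so three executions give six mispredictions out of 27 branches.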
BHT, n=2
• Solution: a 2-bit scheme that changes the prediction only after mispredicting twice
• Red: stop, not taken
• Green: go, taken
• Adds hysteresis to the decision-making process
[State diagram: four states, two that predict taken and two that predict not taken, with transitions labeled T (taken) and NT (not taken).]
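A sketch of the hysteresis effect, using a 2-bit saturating counter. This is one common way to realize a 2-bit scheme and may differ in its transitions from the state diagram on the slide; the loop shape (taken 8 times, then not taken) is an assumption.

```python
def count_mispredictions_2bit(outcomes, counter=3):
    """Replay outcomes through one 2-bit saturating counter (0..3).

    Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Count up on taken, down on not taken, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


one_pass = [True] * 8 + [False]
print(count_mispredictions_2bit(one_pass * 3))             # starts strongly taken
print(count_mispredictions_2bit(one_pass * 3, counter=0))  # starts strongly not taken
```

Once warmed up, the single not-taken outcome at loop exit only weakens the counter, so the next loop entry is still predicted taken and each execution costs one misprediction instead of two.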
BHT Accuracy
• Mispredictions occur because either:
– Wrong guess for that branch
– Got the branch history of the wrong branch when indexing the table
• 4096-entry table:
[Figure: misprediction rate (axis 0% to 20%), integer and floating-point benchmarks, in plotted order: eqntott 18%, espresso 5%, gcc 12%, li 10%, spice 9%, doduc 5%, spice 9%, fpppp 9%, matrix300 0%, nasa7 1%.]
Correlated Branch Prediction
• Aka Two-Level Predictors
• Motivation
1. if (a==2) a=0;
2. if (b==2) b=0;
3. if (a != b) {…}
If conditions (1) and (2) are both true, then a and b are both 0, so condition (3) can be predicted to be false.
Correlated Branch Prediction
• Idea: record the m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table
• In general, an (m,n) predictor records the last m branches to select between 2^m history tables, each with n-bit counters
– Thus, the old 2-bit BHT is a (0,2) predictor
• Global Branch History: m-bit shift register keeping the T/NT status of the last m branches
• Each entry in the table has 2^m n-bit predictors
Correlating Branch Prediction
(2,2) predictor
– Behavior of recent branches selects between four predictions of next branch, updating just that prediction
[Figure: the branch address (4 bits in the figure) selects a row of four 2-bit branch predictors; the 2-bit global branch history selects which of the four supplies the prediction.]
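A minimal sketch of an (m,n) predictor, assuming saturating counters that start at zero and a table indexed by the lower k bits of the branch address. The class name, the `run` helper, and the alternating test branch are illustrative assumptions.

```python
class CorrelatingPredictor:
    """(m, n) predictor: 2^k rows, each holding 2^m n-bit saturating counters."""

    def __init__(self, m=2, n=2, k=10):
        self.m, self.k = m, k
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)   # counter >= threshold predicts taken
        self.history = 0                # m-bit global shift register
        self.table = [[0] * (1 << m) for _ in range(1 << k)]

    def _row(self, pc):
        return self.table[pc & ((1 << self.k) - 1)]

    def predict(self, pc):
        return self._row(pc)[self.history] >= self.threshold

    def update(self, pc, taken):
        row = self._row(pc)
        c = row[self.history]
        # Update only the counter selected by the current global history.
        row[self.history] = min(self.max_count, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def run(predictor, pc, outcomes):
    """Count mispredictions for one branch replayed through a predictor."""
    misses = 0
    for taken in outcomes:
        misses += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return misses


alternating = [True, False] * 50  # a branch that flips direction every time
print(run(CorrelatingPredictor(m=2, n=2), 0x40, alternating))  # (2,2) predictor
print(run(CorrelatingPredictor(m=0, n=2), 0x40, alternating))  # plain 2-bit BHT
```

With m=0 the history register is always zero and the structure degenerates to the plain 2-bit BHT, which is why the slides call that a (0,2) predictor. On the alternating branch the (2,2) version learns the pattern after a short warm-up, while the (0,2) version keeps missing every taken outcome.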
Correlated Branch Prediction
Example:
Find the size of a (2,2) predictor with k=10
Size = 2^m * n * 2^k
= 2^2 * 2 * 2^10 = 4 * 2 * 1K = 8 Kbits
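The size formula can be written as a one-line helper (the function name is illustrative). The second call is an added comparison, not from the slide: a plain 2-bit BHT with k=12, i.e. a (0,2) predictor with 4096 entries.

```python
def predictor_size_bits(m, n, k):
    """Total storage of an (m, n) predictor with 2^k rows, in bits."""
    return (2 ** m) * n * (2 ** k)


print(predictor_size_bits(2, 2, 10))  # (2,2) predictor, 1K rows
print(predictor_size_bits(0, 2, 12))  # 2-bit BHT, 4K entries
```

Both configurations use the same number of bits, which makes a 4096-entry 2-bit BHT and a 1024-entry (2,2) predictor a like-for-like comparison.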
Accuracy of Different Schemes
[Figure: frequency of mispredictions (axis 0% to 20%) for nasa7, matrix300, doduc, spice, fpppp, gcc, espresso, eqntott, li, and tomcatv under three schemes: 4,096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; 1,024 entries (2,2).]
Tournament Predictors
• Multilevel branch predictor
• Use n-bit saturating counter to choose between predictors
• Usual choice between global and local predictors
Tournament Predictors
Tournament predictor using, say, 4K 2-bit counters indexed by local branch address. Chooses between:
• Global predictor
– 4K entries indexed by the history of the last 12 branches (2^12 = 4K)
– Each entry is a standard 2-bit predictor
• Local predictor
– Local history table: 1024 10-bit entries recording the last 10 branches, indexed by branch address
– The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters
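The structure described above can be sketched as follows. This is a simplified illustration with the table sizes from the slide, not a model of any real processor: the zero initial values, the rule of training the selector only when the two components disagree, and all names are assumptions.

```python
def bump(counter, taken, max_count):
    """Move a saturating counter up on taken, down on not taken."""
    return min(max_count, counter + 1) if taken else max(0, counter - 1)


class TournamentPredictor:
    def __init__(self):
        self.selector = [0] * 4096     # 2-bit; values 2..3 mean "use global"
        self.global_ctr = [0] * 4096   # 2-bit, indexed by 12-bit global history
        self.local_hist = [0] * 1024   # last 10 outcomes of each branch
        self.local_ctr = [0] * 1024    # 3-bit, indexed by a 10-bit local history
        self.global_hist = 0

    def _both(self, pc):
        global_pred = self.global_ctr[self.global_hist] >= 2
        local_pred = self.local_ctr[self.local_hist[pc % 1024]] >= 4
        return global_pred, local_pred

    def predict(self, pc):
        global_pred, local_pred = self._both(pc)
        return global_pred if self.selector[pc % 4096] >= 2 else local_pred

    def update(self, pc, taken):
        global_pred, local_pred = self._both(pc)
        if global_pred != local_pred:
            # Train the selector toward whichever component was right.
            sel = pc % 4096
            self.selector[sel] = bump(self.selector[sel], global_pred == taken, 3)
        self.global_ctr[self.global_hist] = bump(
            self.global_ctr[self.global_hist], taken, 3)
        hist = self.local_hist[pc % 1024]
        self.local_ctr[hist] = bump(self.local_ctr[hist], taken, 7)
        # Shift the new outcome into both history registers.
        self.global_hist = ((self.global_hist << 1) | int(taken)) & 0xFFF
        self.local_hist[pc % 1024] = ((hist << 1) | int(taken)) & 0x3FF


def late_misses(predictor, pc, outcomes, warmup):
    """Count mispredictions after the first `warmup` outcomes."""
    misses = 0
    for i, taken in enumerate(outcomes):
        if i >= warmup and predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


print(late_misses(TournamentPredictor(), 0x40, [True, False] * 150, warmup=200))
```

The usage line replays a branch that alternates direction; once the histories fill and the counters train, both components predict it and no late mispredictions remain.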
Alpha 21264 Branch Predictor
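[Figure: three blocks feed the final prediction. Selector: 4K 2-bit entries indexed by branch address (k=12). Global: 4K 2-bit entries indexed by the 12-bit branch history register (BHR). Local: a 1K-entry table of 10-bit histories indexed by branch address, whose output indexes 1K 3-bit counters. The selector output chooses between the global and local predictions.]
Size = 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29 Kbits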
Comparing Predictors (Fig. 2.8)
• Advantage of tournament predictor is ability to select the right predictor for a particular branch
– Particularly crucial for integer benchmarks.
– A typical tournament predictor will select the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks
Branch Target Buffers (BTB)
• Branch target calculation is costly and stalls the instruction fetch.
• BTB stores PCs the same way as caches
• The PC of a branch is sent to the BTB
• When a match is found, the corresponding predicted PC is returned
• If the branch was predicted taken, instruction fetch continues at the returned predicted PC
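The lookup sequence above can be sketched as a small direct-mapped, tagged table. The sketch assumes the BTB holds entries only for branches that were taken, so a hit implies "predicted taken"; the class name, the 512-entry size, and the 4-byte instruction width are illustrative assumptions.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each entry holds (branch PC tag, predicted target PC)."""

    def __init__(self, k=9):
        self.mask = (1 << k) - 1
        self.entries = [None] * (1 << k)

    def lookup(self, pc):
        """Return the predicted target PC on a hit, or None on a miss."""
        entry = self.entries[pc & self.mask]
        if entry is not None and entry[0] == pc:  # full-PC tag match, as in a cache
            return entry[1]
        return None

    def record_taken(self, pc, target):
        """Remember the target of a branch that was just taken."""
        self.entries[pc & self.mask] = (pc, target)


def next_fetch_pc(btb, pc, instr_bytes=4):
    """Fetch from the predicted target on a hit, else fall through."""
    target = btb.lookup(pc)
    return target if target is not None else pc + instr_bytes


btb = BranchTargetBuffer()
print(hex(next_fetch_pc(btb, 0x400)))  # miss: sequential fetch
btb.record_taken(0x400, 0x800)
print(hex(next_fetch_pc(btb, 0x400)))  # hit: fetch from the stored target
```

Unlike the untagged BHT, the tag comparison keeps a different branch that maps to the same entry from receiving a wrong target.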
Pentium 4 Misprediction Rate (per 1000 instructions, not per branch)
[Figure: branch mispredictions per 1000 instructions (axis 0 to 14). SPECint2000: 164.gzip 11, 175.vpr 13, 176.gcc 7, 181.mcf 12, 186.crafty 9. SPECfp2000: 168.wupwise 1, 171.swim 0, 172.mgrid 0, 173.applu 0, 177.mesa 5.]
SPECint: 6% misprediction rate per branch (19% of INT instructions are branches)
SPECfp: 2% misprediction rate per branch (5% of FP instructions are branches)
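The per-branch rates and the per-1000-instruction counts are related by the branch fraction. As a worked check using only the percentages quoted on this slide (the symbols f and r are introduced here for the branch fraction and the per-branch misprediction rate):

```latex
\begin{align*}
\text{mispredictions per 1000 instructions} &= 1000 \times f \times r \\
\text{SPECint:}\quad 1000 \times 0.19 \times 0.06 &= 11.4 \\
\text{SPECfp:}\quad 1000 \times 0.05 \times 0.02 &= 1.0
\end{align*}
```

These averages are in line with the per-benchmark bars, which run from 7 to 13 for the integer programs and from 0 to 5 for the floating-point programs.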
Dynamic Branch Prediction Summary
• Prediction is becoming an important part of execution
• Branch History Table: 2 bits for loop accuracy
• Correlation: recently executed branches are correlated with the next branch
– Either different branches (GA)
– Or different executions of the same branch (PA)
• Tournament predictors take this insight to the next level by using multiple predictors, usually one based on global information and one based on local information, and combining them with a selector
– In 2006, tournament predictors using 30K bits are in processors like the Power5 and Pentium 4
• Branch Target Buffer: includes branch address & prediction