Predicting Conditional Branches With Fusion-Based Hybrid Predictors Gabriel H. Loh Yale University...
Predicting Conditional Branches With Fusion-Based Hybrid Predictors
Gabriel H. Loh, Yale University, Dept. of Computer Science
Dana S. Henry, Yale University, Depts. of Elec. Eng. & Comp. Sci.
This research was funded by NSF Grant MIP-9702281
The Branch Prediction Problem
• 1 out of 5 instructions is a branch
• May require many cycles to resolve
  – P4 has a 20-cycle branch resolution pipeline
  – Future pipeline depths likely to increase [Sprangle02]
• Predict branches to keep pipeline full
[Figure: pipeline from PC through Compute to Branch resolution]
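The cost of a wrong prediction can be estimated with back-of-envelope arithmetic. The sketch below makes simplifying assumptions: every misprediction costs the full 20-cycle resolution latency, and nothing overlaps with the stall. The 5% and 2% misprediction rates are hypothetical values chosen for illustration.

```python
def stall_cycles_per_instruction(branch_fraction: float,
                                 mispredict_rate: float,
                                 penalty_cycles: int) -> float:
    """Average cycles lost per instruction to branch mispredictions.

    Simplifying assumption: each mispredicted branch costs the full
    resolution latency and no other stall overlaps with it.
    """
    return branch_fraction * mispredict_rate * penalty_cycles


# 1 in 5 instructions is a branch; 20-cycle resolution (P4-like).
# The misprediction rates below are hypothetical.
for rate in (0.05, 0.02):
    lost = stall_cycles_per_instruction(0.2, rate, 20)
    print(f"mispredict rate {rate:.0%}: {lost:.2f} lost cycles per instruction")
# prints 0.20 for 5% and 0.08 for 2%
```

Consider a 6-issue machine, which could ideally spend about 1/6 of a cycle per instruction. Under these assumptions, 0.2 lost cycles per instruction would more than double its ideal CPI.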
Bigger Predictors = More Accurate
(but bigger predictors = slower)
• Larger predictors tend to yield more accurate predictions
• Faster cycle times force smaller branch predictors
• Overriding predictor couples a small, fast predictor with a large, multi-cycle predictor [Jiménez2000]
  – performs close to an ideal large, fast predictor
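One way to see why overriding works is to count the cycles lost per branch. The sketch below is a simplified model, not the mechanism from [Jiménez2000] in detail. The fast predictor steers fetch at once. If the slow predictor's answer arrives later and disagrees, it overrides, and the instructions fetched in between are discarded. The 4-cycle override latency and the 20-cycle misprediction penalty are assumed values.

```python
def overriding_penalty(fast_pred: bool, slow_pred: bool, outcome: bool,
                       slow_latency: int = 4, mispredict_penalty: int = 20) -> int:
    """Cycles lost on one branch with an overriding predictor (simplified model).

    Assumed costs: a disagreement costs `slow_latency` cycles to re-steer
    fetch; a wrong final (slow) prediction costs the full misprediction
    penalty on top of that.
    """
    penalty = 0
    if slow_pred != fast_pred:
        penalty += slow_latency        # slow predictor overrides the fast one
    if slow_pred != outcome:
        penalty += mispredict_penalty  # final prediction is still wrong
    return penalty


# Fast predictor wrong, slow predictor right: only the short override bubble.
print(overriding_penalty(fast_pred=False, slow_pred=True, outcome=True))  # 4
```

Suppose the large predictor is usually right and disagreements are rare. The average penalty then approaches that of the large predictor alone, as if it answered in a single cycle.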
Hybrid Predictors
• Wide variety of branch prediction algorithms available
• Hybrid combines more than one "stand-alone" or component predictor [McFarling93]:
[Figure: component predictors P1 and P2 feed a meta-predictor, which chooses the final prediction]
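A common way to build the meta-predictor is a table of 2-bit "chooser" counters. The sketch below assumes such a table, indexed here only by the branch address. A counter trains toward whichever component was right, but only when the two components disagree. The table size and indexing are assumptions, not details from the slide.

```python
class TournamentHybrid:
    """Two-component hybrid with a table of 2-bit chooser counters.

    Counter values 0-1 select P1, values 2-3 select P2. The chooser is only
    trained when P1 and P2 disagree, moving toward whichever one was right.
    Table size and PC-only indexing are assumptions for this sketch.
    """

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.chooser = [1] * entries  # start weakly favoring P1

    def _index(self, pc: int) -> int:
        return pc % self.entries

    def predict(self, pc: int, p1: bool, p2: bool) -> bool:
        return p2 if self.chooser[self._index(pc)] >= 2 else p1

    def update(self, pc: int, p1: bool, p2: bool, outcome: bool) -> None:
        if p1 == p2:
            return  # both right or both wrong: nothing to learn about the choice
        i = self._index(pc)
        if p2 == outcome:
            self.chooser[i] = min(3, self.chooser[i] + 1)
        else:
            self.chooser[i] = max(0, self.chooser[i] - 1)
```

The final prediction is always one of the components' outputs. This is the "selection" style of hybrid that the rest of the talk contrasts with fusion.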
Multi-Hybrids
[Figure: "Multi-Hybrid" [Evers96]: components P1, P2, …, Pn feed a priority encoder that produces the final prediction]
[Figure: "Quad-Hybrid" [Evers00]: meta-predictor M1 chooses between P1 and P2, M2 between P3 and P4, and M3 between the outputs of M1 and M2 to produce the final prediction]
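The priority-encoder selection of the multi-hybrid can be sketched as follows. The update rule is our reading of [Evers96] and should be treated as an assumption:

- Each component keeps a 2-bit counter, initialized to 3.
- If any component at 3 was correct, the counters of the wrong components are decremented.
- Otherwise, the counters of the correct components are incremented.

This rule always leaves at least one counter at 3. The real design keeps one set of counters per branch (per BTB entry); the sketch shows a single set.

```python
from typing import List


class MultiHybridSelector:
    """Priority-encoder selection among n components (one branch's counters).

    Each component has a 2-bit counter, initialized to 3. The prediction comes
    from the highest-priority (lowest-numbered) component whose counter is 3.
    """

    def __init__(self, n_components: int):
        self.counters = [3] * n_components

    def select(self) -> int:
        for k, c in enumerate(self.counters):
            if c == 3:
                return k
        return 0  # not reached while the update rule keeps one counter at 3

    def predict(self, preds: List[bool]) -> bool:
        return preds[self.select()]

    def update(self, preds: List[bool], outcome: bool) -> None:
        correct = [p == outcome for p in preds]
        if any(c == 3 and ok for c, ok in zip(self.counters, correct)):
            # A top-ranked component was right: demote the components that were wrong.
            self.counters = [c if ok else max(0, c - 1)
                             for c, ok in zip(self.counters, correct)]
        else:
            # Every top-ranked component was wrong: promote the ones that were right.
            self.counters = [min(3, c + 1) if ok else c
                             for c, ok in zip(self.counters, correct)]
```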
Our Idea: Prediction Fusion
[Figure: Prediction Selection picks one of the component predictions P1, P2, P3, …, Pn; Prediction Fusion combines all of them into the final prediction]
Early Attempt from ML
• Weighted Majority algorithm [LW94]
  – Better predictors get assigned larger weights
  – Final prediction is the outcome with the larger weight sum
• Predictor with largest weight not always correct
[Figure: weighted vote among P1 to P8; the two sides' weights sum to 0.487 and 0.513]
P2, P6 and P7 say "not-taken"; P1, P3, P4, P5 and P8 say "taken"
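A minimal sketch of the Weighted Majority algorithm applied to branch outcomes follows. Each component predictor is an "expert" with a weight. The prediction is the side (taken or not-taken) with the larger total weight. After each branch, every expert that was wrong has its weight multiplied by a factor beta in [0, 1). The value beta = 0.5 and the tie-break toward "taken" are assumptions.

```python
class WeightedMajority:
    """Weighted Majority over binary (taken / not-taken) expert predictions."""

    def __init__(self, n_experts: int, beta: float = 0.5):
        if not 0.0 <= beta < 1.0:
            raise ValueError("beta must be in [0, 1)")
        self.weights = [1.0] * n_experts
        self.beta = beta

    def predict(self, preds) -> bool:
        taken = sum(w for w, p in zip(self.weights, preds) if p)
        not_taken = sum(w for w, p in zip(self.weights, preds) if not p)
        return taken >= not_taken  # ties go to "taken" (arbitrary choice)

    def update(self, preds, outcome: bool) -> None:
        # Penalize every expert that predicted wrongly.
        self.weights = [w * self.beta if p != outcome else w
                        for w, p in zip(self.weights, preds)]


wm = WeightedMajority(3)
wm.weights = [0.4, 0.3, 0.3]
# The heaviest expert says "not taken" but is outvoted by the other two.
print(wm.predict([False, True, True]))  # True
```

The vote is a fixed weighted sum, so a combination of predictions that the weights cannot express is always predicted the same way. The heaviest expert can also be wrong while still being outvoted, or vice versa.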
Outline
• COLT Predictor
• Choosing parameters and components
• Performance
• Prediction distributions, component choice
COLT Organization
[Figure: the branch address and branch history index the VMT to pick a mapping table; the component predictions P1, P2, P3, …, Pn (e.g., 1 0 1 0 …) index an entry in that mapping table, which gives the final prediction]
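The organization above can be sketched in a few lines. The following assumptions are illustrative, not the paper's exact configuration:

- A few low branch-address bits and global-history bits pick one mapping table (MT) from the VMT.
- The n component predictions, read as an n-bit number with P1 as the most significant bit, pick a saturating counter in that MT.
- Counters in the upper half of their range mean "taken".

The 4-bit counter width matches the counter width column in the results table later in the deck.

```python
class COLT:
    """Sketch of a COLT-style fusion predictor (illustrative assumptions)."""

    def __init__(self, n_components: int, addr_bits: int, hist_bits: int,
                 counter_bits: int = 4):
        self.n = n_components
        self.addr_bits = addr_bits
        self.hist_bits = hist_bits
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        n_tables = 1 << (addr_bits + hist_bits)  # one MT per address/history pattern
        mt_size = 1 << n_components              # one counter per prediction vector
        # Start every counter just below the "taken" threshold.
        self.vmt = [[self.threshold - 1] * mt_size for _ in range(n_tables)]

    def _mt_index(self, pc: int, history: int) -> int:
        addr = pc & ((1 << self.addr_bits) - 1)
        hist = history & ((1 << self.hist_bits) - 1)
        return (addr << self.hist_bits) | hist

    def _entry_index(self, preds) -> int:
        if len(preds) != self.n:
            raise ValueError("expected one prediction per component")
        idx = 0
        for p in preds:  # P1 ends up as the most significant bit
            idx = (idx << 1) | int(p)
        return idx

    def predict(self, pc: int, history: int, preds) -> bool:
        mt = self.vmt[self._mt_index(pc, history)]
        return mt[self._entry_index(preds)] >= self.threshold

    def update(self, pc: int, history: int, preds, outcome: bool) -> None:
        mt = self.vmt[self._mt_index(pc, history)]
        e = self._entry_index(preds)
        if outcome:
            mt[e] = min(self.max_count, mt[e] + 1)
        else:
            mt[e] = max(0, mt[e] - 1)
```

Unlike a selector, the MT can map any pattern of component predictions to either outcome. This includes patterns where no single component is right.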
Pathological Example
[Figure: components P1, P2 and P3 all predict 0]
Actual outcome = 1 (taken)
Example (cont'd)
[Figure: P1, P2 and P3 all predict 0]
Selection: the prediction is always wrong, no matter which component is selected
COLT: the VMT can recognize and remember this pattern, mapping the prediction vector 0 0 0 to 1
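The pathological example can be checked directly. A selector can only output one component's prediction, so if every component says 0 it cannot say 1. A table keyed by the whole prediction vector can learn the mapping from 0 0 0 to 1. The 2-bit counters and the helper names are hypothetical choices for this sketch.

```python
def selection_can_be_correct(preds, outcome) -> bool:
    """A selector only ever outputs one of the component predictions."""
    return any(p == outcome for p in preds)


def train_fusion_table(cases, init=1):
    """Train a table mapping a prediction vector to a 2-bit counter.

    `cases` is an iterable of (prediction tuple, actual outcome) pairs.
    """
    table = {}
    for preds, outcome in cases:
        c = table.get(preds, init)
        table[preds] = min(3, c + 1) if outcome else max(0, c - 1)
    return table


def fusion_predict(table, preds, init=1) -> bool:
    return table.get(preds, init) >= 2


case = ((False, False, False), True)  # P1, P2, P3 say 0; branch is taken
print(selection_can_be_correct(*case))  # False
table = train_fusion_table([case] * 2)
print(fusion_predict(table, case[0]))   # True
```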
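COLT Lookup Delay
[Figure: lookup timeline. Branch address/history bits (1 0 0 1 1 …) select a mapping table while the component predictors P1, P2, …, Pn compute their predictions; the MT select driven by those predictions follows the prediction time and lies on the critical delay path]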
Design Choices
• # of branch address bits and # of branch history bits: determine number of mapping tables
• # of components: determines size of individual MTs
• Choice of components
  – gshare, PAs, gskewed, …
  – History length, PHT size, …
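The two dependencies on this slide can be turned into a storage estimate. The sketch below assumes the VMT is indexed directly by a address bits and h history bits, with no hashing, which gives 2^(a+h) mapping tables. Each MT holds one counter per n-bit prediction vector, which gives 2^n entries. The example parameter values are hypothetical.

```python
def colt_storage_bits(addr_bits: int, hist_bits: int,
                      n_components: int, counter_bits: int) -> int:
    """Total mapping-table storage, assuming direct (unhashed) indexing."""
    n_tables = 2 ** (addr_bits + hist_bits)  # one MT per address/history pattern
    entries_per_table = 2 ** n_components    # one counter per prediction vector
    return n_tables * entries_per_table * counter_bits


bits = colt_storage_bits(addr_bits=4, hist_bits=6, n_components=4, counter_bits=4)
print(bits // 8 // 1024, "KB")  # 8 KB
```

Under this model, adding one index bit doubles the number of MTs, and adding one component doubles every MT. This is why the number of components interacts directly with the storage budget.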
Predictor Components
• Global History
  – gshare [McFarling93]
  – Bi-Mode [Lee97]
  – Enhanced gskewed [Michaud97]
  – YAGS [Eden98]
• Local History
  – PAs [Yeh94]
  – pskewed [Evers96]
• Other
  – 2bC (bimodal) [Smith81]
  – Loop [Chang95]
  – alloyed Perceptron [Jiménez02]
History lengths optimized on test data sets
Total of 59 configurations; sizes vary up to 64KB
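To make one of the listed components concrete, here is a minimal gshare-style sketch. It XORs the branch address with the global history register and uses the result to index a table of 2-bit saturating counters. Two choices are assumptions for brevity: the table size is tied to the history length, and the address is shifted right by 2 to drop the byte offset of 4-byte instructions.

```python
class Gshare:
    """gshare-style predictor: (address XOR global history) indexes 2-bit counters."""

    def __init__(self, history_bits: int):
        self.mask = (1 << history_bits) - 1
        self.counters = [1] * (1 << history_bits)  # weakly not-taken
        self.history = 0  # most recent outcome in the least significant bit

    def _index(self, pc: int) -> int:
        # pc >> 2 drops the byte offset of 4-byte instructions (an assumption).
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

Consider a branch that alternates taken and not-taken. Its two history patterns land in different counters, so gshare learns it after a short warm-up. A bimodal (2bC) table indexed by address alone cannot learn this branch.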
Huge Search Space
• 2^59 ways to choose components, plus the ways to choose COLT parameters
• We use a genetic search
Gene format: component bits …, VMT size, history length, …
  – bit-k = 0 means don't include Pk
  – bit-k = 1 means do include Pk
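The following is a minimal sketch of a genetic search over genes in the format above. Each gene holds 59 component-inclusion bits plus COLT parameters. The parameter ranges, operators and population settings are assumptions: one-point crossover, bit-flip mutation, and truncation selection that keeps the better half. In the real search, fitness would come from simulating each configuration on the tuning traces. `toy_fitness` is a hypothetical stand-in so the sketch runs.

```python
import random

N_COMPONENTS = 59  # candidate component configurations


def random_gene(rng):
    """Gene: inclusion bits plus COLT parameters (ranges are hypothetical)."""
    return {
        "include": [rng.random() < 0.1 for _ in range(N_COMPONENTS)],
        "vmt_log2": rng.randint(8, 15),
        "history_length": rng.randint(0, 12),
    }


def crossover(a, b, rng):
    cut = rng.randrange(1, N_COMPONENTS)  # one-point crossover on the bit vector
    return {
        "include": a["include"][:cut] + b["include"][cut:],
        "vmt_log2": rng.choice([a["vmt_log2"], b["vmt_log2"]]),
        "history_length": rng.choice([a["history_length"], b["history_length"]]),
    }


def mutate(gene, rng, rate=0.02):
    gene["include"] = [(not bit) if rng.random() < rate else bit
                       for bit in gene["include"]]
    return gene


def genetic_search(fitness, generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_gene(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the better half (elitist)
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(gene):
    """Hypothetical stand-in for simulated accuracy: prefer exactly 4 components."""
    return -abs(sum(gene["include"]) - 4)


best = genetic_search(toy_fitness)
print(sum(best["include"]), "components selected")
```

Because the better half survives each generation unchanged, the best fitness found never decreases. This keeps a search over the 2^59-sized space well behaved even with a small population.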
Methodology
• SPEC2000 integer benchmarks
  – For tuning/optimization: 10M branches from test
  – For evaluation: 500M branches from train
  – Skipped first 100M branches
  – Compiled with cc -arch ev6 -O4 -fast -non_shared
• SimpleScalar simulator
  – sim-safe for trace collection
  – MASE for ILP simulations
Genetic Search COLT Results
Name (Size, KB) | Components | VMT | Counter width | History length
16 | alpct(34/10) gskewed(12) gshare(8) | 2048 | 4 | 8
32 | alpct(34/10) gshare(15) gshare(9) PAs(7) | 8192 | 4 | 7
64 | alpct(40/14) gshare(16) YAGS(11) pskewed(6) | 16384 | 4 | 10
128 | alpct(40/14) alpct(38/14) gshare(16) gskewed(13) YAGS(12) PAs(8) | 16384 | 4 | 7
256 | alpct(50/18) alpct(34/10) gshare(18) Bi-Mode(16) gskewed(15) PAs(8) | 32768 | 4 | 4
ILP Performance
• Simulated CPU:
  – 6-issue
  – 20-cycle pipeline
  – Same functional units, latencies and caches as the Intel P4/NetBurst microarchitecture
[Figure: ILP performance of 1-cycle 2bC, 4-cycle OR alpct, 4-cycle OR COLT and ideal 1-cycle COLT]
COLT Parameter Sensitivity
• Mapping table counter widths
• Number of mapping tables
• Number of history bits for VMT index
Explaining Choice of Components
• Parameter sensitivity results show that the genetic algorithm (GA) performed well for the COLT parameters
• Why did it choose the component predictors that it did?
Classifying COLT Predictions
• We examined the 32KB COLT configuration
• For each mapping table lookup, we examine the neighboring entries:
[Figure: P1 P2 P3 P4 predict 1 0 0 1; entry 0001 = NT, entry 1001 = T, entry 1101 = T]
Classifying Predictions (cont'd)
• easy: all neighboring entries agree
• short: only gshare(9) distinguishes
• long: only gshare(14) distinguishes
• local: only PAs(7) distinguishes
• perceptron: only alpct(34/10) distinguishes
• multi-length: mix of gshare(9), gshare(14) or alpct
• mixed: both global and local components
[Figure: 32KB COLT components gshare(9), gshare(14), PAs(7) and alpct(34/10), and the distribution of predictions across these classes]
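The classification can be sketched as follows. "Neighboring entries" are read as the MT entries that differ from the looked-up entry in exactly one bit, as in the 1001 / 0001 / 1101 example. A component "distinguishes" when flipping its bit changes the counter's prediction. The assignment of gshare(9), gshare(14), PAs(7) and alpct(34/10) to P1 to P4, and the 4-bit counters with threshold 8, are assumptions.

```python
# Assumed ordering of the 32KB COLT's components as P1..P4 (P1 = most significant bit).
COMPONENTS = ["gshare(9)", "gshare(14)", "PAs(7)", "alpct(34/10)"]
GLOBAL = {"gshare(9)", "gshare(14)", "alpct(34/10)"}
SINGLE_CLASS = {"gshare(9)": "short", "gshare(14)": "long",
                "PAs(7)": "local", "alpct(34/10)": "perceptron"}


def distinguishing_components(mt, index, n, threshold):
    """Components whose single-bit flip changes the mapping table's prediction."""
    base = mt[index] >= threshold
    result = set()
    for k in range(n):
        bit = 1 << (n - 1 - k)  # bit position of component P(k+1)
        if (mt[index ^ bit] >= threshold) != base:
            result.add(COMPONENTS[k])
    return result


def classify(dist):
    if not dist:
        return "easy"            # all neighboring entries agree
    if len(dist) == 1:
        return SINGLE_CLASS[next(iter(dist))]
    if dist <= GLOBAL:
        return "multi-length"    # several global-history components
    return "mixed"               # global and local components


# Slide example: entry 1001 = T, neighbor 0001 = NT, neighbor 1101 = T.
mt = [15] * 16   # 4-bit counters, all strongly taken
mt[0b0001] = 0   # flipping P1 turns the prediction to NT
print(classify(distinguishing_components(mt, 0b1001, 4, 8)))  # short
```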
Related Work/Issues
• Alloyed history [Skadron00]
• Variable path history length [Stark98]
• Dynamic history length fitting [Juan98]
• Interference reduction [lots…]
COLT handles all of these cases*
* Doesn't support partial update policies
Open Research
• Better individual components
• Augment with SBI [Manne99], agree [Sprangle97]
• Better fusion algorithms
• Hybrid fusion/selection algorithms
• Other domains (branch confidence prediction, value prediction, memory dependence prediction, instruction criticality prediction, …)
Summary
• Fusion is more powerful than selection
  – Combines multiple sources of information
• Branch behavior is very varied
  – Need long, short, global and local histories: multiple simultaneous lengths and types of history
• COLT is one possible fusion-based predictor
  – Combines multiple types of information
  – Current "best" purely dynamic predictor*