Exploring Correlation for Indirect Branch Prediction
Nikunj Bhansali, Chintan Panirwala, Huiyang Zhou
Department of Electrical and Computer Engineering, North Carolina State University
Baseline: ITTAGE Indirect Branch Predictor
[A. Seznec and P. Michaud, JILP 2006]
A PPM-based predictor contains multiple Markov predictors, each capturing a different history length; the one with the longest match is used to make the prediction.
Our Main Idea:
1. Longest history length vs. adaptive history lengths.
2. Address-target correlation.
Predictor Structure – Main Predictor
[Figure: main predictor structure. Tagged tables T1 ... Tn, each entry holding tag, u and target fields plus an alt field (except T1). Per-table match signals (T1_Match ... Tn_Match), the HBT hit signal and hlen select the target prediction.]
Main Predictor at Fetch stage
ITTAGE as the baseline predictor (no T0)
Two ways to adaptively select the proper table (or history length)
1. Alt bit in each entry (except T1)
2. A separate table for hard-to-predict branches
Using Alt bits to select a table
Entry fields: tag, u, alt, target
Alt = 0: the target from the current entry is preferred for the prediction.
Alt = 1: a table with a shorter history is used to make the final prediction.
There is no alt bit for table T1.
Initially the alt field is set to zero. Update mechanism:
If the table with the longest match fails to make a correct prediction while another table does, the alt field is set in the matching entries whose history lengths are longer than that table's.
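The alt-bit rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator code: the names `Entry`, `select_with_alt` and `update_alt` are hypothetical, the list index i stands for table T(i+1), and two behaviours the slides do not specify are assumed and marked in the comments (falling back to the shortest matching table when every longer match has alt set, and using the longest correct table as the reference point on update).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    target: int
    alt: bool = False  # stays False for T1, which has no alt bit


def select_with_alt(hits: List[Optional[Entry]]) -> Optional[int]:
    """hits[i] is the tag-matching entry of table T(i+1), or None on a miss.

    Returns the index of the table that provides the prediction.
    """
    matching = [i for i, e in enumerate(hits) if e is not None]
    # Walk from the longest matching history to the shortest.
    for i in reversed(matching):
        # Assumption: the shortest matching table is used even if its alt is set.
        if not hits[i].alt or i == matching[0]:
            return i
    return None


def update_alt(hits: List[Optional[Entry]], actual_target: int) -> None:
    """Set alt bits when the longest match was wrong but a shorter one was right."""
    matching = [i for i, e in enumerate(hits) if e is not None]
    if not matching or hits[matching[-1]].target == actual_target:
        return
    correct = [i for i in matching if hits[i].target == actual_target]
    if not correct:
        return
    best = correct[-1]  # assumption: longest table that was correct
    for i in matching:
        if i > best:  # i > best >= 0, so T1 (index 0) is never marked
            hits[i].alt = True
```

With matches in T1, T3 and T4, the selection starts at T4; after an update in which T3 held the right target, T4's alt bit is set and the next selection returns T3.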
Hard-to-predict Branch Table (HBT)
A cache-like set-associative structure in which each entry contains a tag, a misprediction counter (mc) and a history length (hlen).
The HBT is updated based on the prediction provided by the longest history.
The mc field is used for replacement, so that hard-to-predict branches are captured by the HBT.
hlen is used to select the hlen-th longest history.
For example, if hlen = 2 and T2, T4 and T5 have tag matches and their corresponding alt fields are false, then T2 will be selected for prediction.
The main predictor provides prediction at fetch stage.
The main predictor is updated at retire stage of an indirect branch.
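A small sketch of the hlen selection may help make the example concrete. The function name `select_by_hlen` is hypothetical. The sketch assumes hlen is zero-based (0 selects the longest eligible match), since that reading reproduces the example above, and it assumes hlen is clamped when fewer tables match; the slides state neither detail.

```python
from typing import List, Optional


def select_by_hlen(match: List[bool], alt: List[bool], hlen: int) -> Optional[int]:
    """Return the table number (1-based, as in T1..Tn) chosen on an HBT hit.

    match[i] and alt[i] describe table T(i+1). Only tables with a tag match
    and a cleared alt field are eligible.
    """
    eligible = [i + 1 for i in range(len(match)) if match[i] and not alt[i]]
    if not eligible:
        return None
    eligible.sort(reverse=True)  # longest history first
    # Assumption: clamp hlen when fewer tables match than it asks for.
    k = min(hlen, len(eligible) - 1)
    return eligible[k]


# The slide's example: T2, T4 and T5 match, all alt fields are false, hlen = 2.
match = [False, True, False, True, True]
alt = [False] * 5
chosen = select_by_hlen(match, alt, hlen=2)
```

Here `chosen` is 2, i.e. table T2, which agrees with the example; with hlen = 0 the same inputs select T5.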
Auxiliary Predictor at AGEN stage
Correlation between producer load address and consumer branch target, e.g.,
Load R19 = Mem [R3] //Address: 0x60848100 0x60846ec8
Br R19 //Target: 0x60751a64 0x607691c9
The producer load accesses two addresses, and each address provides a different branch target.
As long as the data structures at these addresses do not change frequently, the addresses are sufficient to predict the branch target of the consumer indirect branch.
[Figure: ATT entry layout. A tag matched against the branch PC, followed by <addr,target> pairs matched against the hashed load address.]
Auxiliary Predictor Design
Address Target Correlation (ATC) is captured using the Address Target Table (ATT).
Accessed at the AGEN stage of the load instruction.
PC of indirect branch used for tag match.
Hashed load address is used to find matching address-target pair.
Updated at the EXE stage of an indirect branch
LRU replacement policy.
Reduces misprediction penalty in case the prediction differs from the one provided at fetch stage.
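The ATT behaviour described above can be sketched as follows. This is an illustrative model only: the class name `AddressTargetTable`, the XOR-fold hash, the fully associative organisation and the oldest-first replacement of pairs inside an entry are assumptions, not details taken from the slides. The default sizes follow the storage figures given in this talk (26 entries, 10-bit hashed address); 10 pairs per entry is an inferred value, and the branch PC used in the usage lines is made up.

```python
from collections import OrderedDict
from typing import Optional


class AddressTargetTable:
    def __init__(self, num_entries: int = 26, pairs_per_entry: int = 10,
                 addr_bits: int = 10) -> None:
        self.num_entries = num_entries
        self.pairs_per_entry = pairs_per_entry
        self.addr_bits = addr_bits
        # branch PC (tag) -> {hashed load address -> target}; dict order tracks LRU.
        self.entries: "OrderedDict[int, OrderedDict[int, int]]" = OrderedDict()

    def _hash(self, load_addr: int) -> int:
        """Hypothetical hash: XOR-fold the address into addr_bits bits."""
        mask = (1 << self.addr_bits) - 1
        h = 0
        while load_addr:
            h ^= load_addr & mask
            load_addr >>= self.addr_bits
        return h

    def predict(self, branch_pc: int, load_addr: int) -> Optional[int]:
        """Looked up once the producer load's address is known (AGEN stage)."""
        pairs = self.entries.get(branch_pc)
        if pairs is None:
            return None
        return pairs.get(self._hash(load_addr))

    def update(self, branch_pc: int, load_addr: int, target: int) -> None:
        """Called when the indirect branch resolves (EXE stage)."""
        pairs = self.entries.get(branch_pc)
        if pairs is None:
            if len(self.entries) >= self.num_entries:
                self.entries.popitem(last=False)  # evict the LRU entry
            pairs = OrderedDict()
            self.entries[branch_pc] = pairs
        else:
            self.entries.move_to_end(branch_pc)  # mark as most recently used
        h = self._hash(load_addr)
        if h not in pairs and len(pairs) >= self.pairs_per_entry:
            pairs.popitem(last=False)  # assumption: drop the oldest pair
        pairs[h] = target


# Usage with the address/target values from the example; the PC is hypothetical.
att = AddressTargetTable()
BR_PC = 0x400000
att.update(BR_PC, 0x60848100, 0x60751A64)
att.update(BR_PC, 0x60846EC8, 0x607691C9)
```

After these two updates, a lookup with either load address returns the target recorded for it, and a lookup for a branch PC that was never inserted returns None.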
Storage Cost (1/2)
Tagged table entry: U ctr: 2 bits; Target: 32 bits; Alt: 1 bit (except T1); Tag: partial tag
HBT (1,216 bits): 32 entries; Tag: 32 bits; mc: 2 bits; hlen: 4 bits
ATT (11,882 bits): 26 entries; Tag: 32 bits; LRU: 5 bits; <target,address>: <32,10> bits
Storage Cost (2/2)
Global history: 640 * 2 bits
Path history: 16 bits
Other counters: 39 bits
Total: 64.97 KB
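The HBT and ATT bit counts quoted in the storage budget can be checked with simple arithmetic. In the sketch below the function names are hypothetical, and the number of <target,address> pairs per ATT entry is an assumption: the slides do not state it, and 10 is the value that reproduces the quoted 11,882 bits. The tagged-table sizes are not given, so the 64.97 KB total is not recomputed here.

```python
def hbt_bits(entries: int = 32, tag: int = 32, mc: int = 2, hlen: int = 4) -> int:
    """Bits used by the Hard-to-predict Branch Table."""
    return entries * (tag + mc + hlen)


def att_bits(entries: int = 26, tag: int = 32, lru: int = 5,
             pairs_per_entry: int = 10,  # assumption, not stated on the slides
             target: int = 32, addr: int = 10) -> int:
    """Bits used by the Address Target Table."""
    return entries * (tag + lru + pairs_per_entry * (target + addr))


def global_history_bits() -> int:
    """The slide lists the global history as 640 * 2 bits."""
    return 640 * 2


hbt_total = hbt_bits()               # 32 * 38
att_total = att_bits()               # 26 * 457
ghist_total = global_history_bits()  # 1280
```

Under these assumptions `hbt_total` is 1216 and `att_total` is 11882, matching the figures listed for the two tables.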
Experimental Results
Overall performance improvement (ATT, 11,882 bits): 15.6%
Performance improvement with a small ATT (1,624 bits): 14.8%
Discussion: Why we may not win
1. Other contestants are doing superb!
2. Our baseline ITTAGE is not well tuned. The code and the predictor structure are modified from L-TAGE.
Discussion: Why we can win
Our main ideas, adaptive history length and address-target correlation, can further improve well-tuned predictors.
Conclusions
Although control flow history carries correlation to targets, the strength of correlation may either increase or decrease for different indirect branches when we increase the history length.
There exists strong correlation between producer load addresses and consumer branch targets.
Thank You