NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical...
-
Upload
alijah-blackshaw -
Category
Documents
-
view
214 -
download
2
Transcript of NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical...
![Page 1: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/1.jpg)
NC STATE UNIVERSITY
Center for Efficient, Scalable, and Reliable ComputingDepartment of Electrical & Computer Engineering
North Carolina State University
EXACT:Explicit Dynamic-Branch Prediction
with Active Updates
Muawya Al-Otoom, Elliott Forbes, Eric Rotenberg
![Page 2: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/2.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 2© Eric Rotenberg
Branch Prediction and Performance
1K-entry instruction window
128-entry issue queue (similar to Nehalem)
4-wide fetch/issue/retire
16 KB gshare
Randomly remove 0% to 100% of mispredictions
0
0.5
1
1.5
2
2.5
3
3.5
4
% of remaining branch mispredicts removed
IPC
bzip2gziptwolf
89%
91%
95%
![Page 3: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/3.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 3© Eric Rotenberg
while ( . . . ){ . . . for (i = 0; i < 16; i++) { if (a[i] > 10) { . . . } else { . . . } } . . .}
LOAD r5 = a[i]X: BRANCH r5 <= 10
Example
![Page 4: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/4.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 4© Eric Rotenberg
Example (cont.)
LOAD
BRANCH
4 11 15 2 7 1 3 52 9 3 8 5 55 833
a[14]a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] a[13]
1 0 0 1 1 1 1 0 1 1 1 1 0 10
18
a[15]
0
0 0 1
X
+
PC
GHR
0
PHT
![Page 5: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/5.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 5© Eric Rotenberg
Good Scenario #1
LOAD
BRANCH
4 11 15 2 7 1 3 52 9 3 8 5 55 833
a[14]a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] a[13]
1 0 0 1 1 1 1 0 1 1 1 1 0 10
18
a[15]
0
0 0 1
X
+
PC
GHR
0
PHT a[4] has unique GHR context. GHR implicitly identifies a[4]. Dedicated prediction for a[4].
dedicatedpredictionfor a[4]
![Page 6: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/6.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 6© Eric Rotenberg
Good Scenario #2
LOAD
BRANCH
4 11 15 2 7 1 3 52 9 3 8 5 55 833
a[14]a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] a[13]
1 0 0 1 1 1 1 0 1 1 1 1 0 10
18
a[15]
0
a[6], a[10] have same GHR context. GHR does not distinguish a[6], a[10]. Shared prediction for a[6], a[10]. Fortunately, they have same outcome.
sharedpredictionfor a[6], a[10]
1 0 1
X
+
PC
GHR
1
PHT
![Page 7: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/7.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 7© Eric Rotenberg
Bad Scenario #1
LOAD
BRANCH
4 11 15 2 7 1 3 52 9 3 8 5 55 833
a[14]a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] a[13]
1 0 0 1 1 1 1 0 1 1 1 1 0 10
18
a[15]
0
a[6], a[15] have same GHR context. GHR does not distinguish a[6], a[15]. Shared prediction for a[6], a[15]. Problem: they have different outcomes.
sharedpredictionfor a[6], a[15]
1 0 1
X
+
PC
GHR
?
PHT
![Page 8: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/8.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 8© Eric Rotenberg
Bad Scenario #1 (cont.)
LOAD
BRANCH
4 11 15 2 7 1 3 52 9 3 8 5 55 833
a[14]a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] a[13]
1 0 0 1 1 1 1 0 1 1 1 1 0 10
18
a[15]
0
Indirect solution Use longer history. GHR now distinguishes a[6], a[15].
0 1 0
X
+
PC
GHR
PHT
1 0
X
+
PC
GHR
1
1
11 0
dedicated prediction for a[6]
dedicated prediction for a[15]
![Page 9: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/9.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 9© Eric Rotenberg
Bad Scenario #1 (cont.)
LOAD
BRANCH
4 11 15 2 7 1 3 52 9 3 8 5 55 833
a[14]a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] a[13]
1 0 0 1 1 1 1 0 1 1 1 1 0 10
18
a[15]
0
Indirect solution Use longer history. GHR now distinguishes a[6], a[15]. Ambiguity not eradicated: a[10], a[15].
PHT
1 0
X
+
PC
GHR 11
? sharedpredictionfor a[10], a[15]
Direct solution Use address of element. Dedicated predictions for different elements.
![Page 10: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/10.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 10© Eric Rotenberg
Bad Scenario #2
LOAD
BRANCH
4 11 15 2 7 1 3 52 9 3 8 5 55 833
a[14]a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] a[13]
1 0 0 1 1 1 1 0 1 1 1 1 0 10
18
a[15]
0
0 0 1
X
+
PC
GHR
0
PHT STORE a[4] = 6
stalepredictionfor a[4]
6
1
Conventional passive updates Predictor’s entry is stale after store Retrained after suffering misprediction
Solution: Active updates
![Page 11: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/11.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 11© Eric Rotenberg
Big Picture
branchpredictor
memory
Proposedbranchpredictor
memory
Conventional
Branch predictor should mirror a program’s objects
![Page 12: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/12.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 12© Eric Rotenberg
Big Picture Branch predictor should mirror a program’s objects Branch predictor should mirror changes as they happen
branchpredictor
memory
Conventional
Store
branchpredictor
memory
Proposed
Store
![Page 13: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/13.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 13© Eric Rotenberg
Characterize Mispredictions
Bad Scenario #1 Measure how often global branch history does
not distinguish different dynamic branches that have different outcomes
Bad Scenario #2 Measure how often stores cause stale
predictions Evaluate for two history-based predictors
Very large gselect Very large L-TAGE
![Page 14: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/14.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 14© Eric Rotenberg
Characterize Mispredictions
Bad Scenario #1
![Page 15: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/15.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 15© Eric Rotenberg
Characterize Mispredictions
Bad Scenario #2
Note:Predominance of Bad Scenario #1 may obscure occurrences of Bad Scenario #2.
![Page 16: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/16.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 16© Eric Rotenberg
Problems and Solutions Two problems:
{PC, global branch history} does not always distinguish different dynamic branches that have different outcomes
Stores to memory change branch outcomes
Two solutions: Explicitly identify dynamic branches to provide dedicated
predictions for them (EX)o branch ID = hash(PC, load addresses)
Stores “actively update” the branch predictor (ACT)
![Page 17: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/17.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 17© Eric Rotenberg
Bad Scenario #2
Bad Scenario #1
Load-dependent branches use explicit predictor Index with branch ID (EX) Index with branch ID and perform active updates (EXACT)
Other branches use the default predictor (e.g., L-TAGE)
![Page 18: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/18.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 18© Eric Rotenberg
3 Implementation Challenges
1. Indexing the explicit predictor Branch ID unknown at fetch time Loads are unlikely to have computed their addresses
by the time the branch is fetched
2. Large explicit predictor Many different IDs contribute to mispredictions Predictor size normally limited by cycle time
3. Large active update unit Too large to implement with dedicated storage
![Page 19: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/19.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 19© Eric Rotenberg
Implementation Challenge #1:Indexing the Explicit Predictor
Insight: sequences of IDs repeat due to traversing data structures
Use ID of a prior retired branch at a fixed distance N
![Page 20: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/20.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 20© Eric Rotenberg
0%
2%
4%
6%
8%
10%
12%
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30distance
mis
pre
dic
tio
n r
ate
%
bzip2 crafty gzip twolf
0%
2%
4%
6%
8%
10%
12%
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30distance
mis
pre
dic
tio
n r
ate
%
bzip2 crafty gzip twolf
IDEAL:Assumes ID of Nth branch away is always available.
REAL:Use explicit predictor only if Nth branch away is retired.Otherwise use default predictor.
(N)
(N)
use IDof branchitself
![Page 21: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/21.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 21© Eric Rotenberg
Need large explicit predictor with fast cycle time Indexing with prior retired branch IDs makes it
easily pipelinable In general, pipelining is straightforward if the
index does not depend on immediately preceding predictions
Implementation Challenge #2:Large Explicit Predictor
![Page 22: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/22.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 22© Eric Rotenberg
++
++
++
GBQ
ID1+
BHR
ID2+
BHR
ID3+
BHR
CYCLE 1 CYCLE 2 CYCLE 3 CYCLE 4 CYCLE 5 CYCLE 6 CYCLE 7
Implementation Challenge #2:Large Explicit Predictor
![Page 23: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/23.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 23© Eric Rotenberg
Most benchmarks are tolerant of 400+ cycles of active update latency Large distance between stores and re-encounters of the branches
they update
Implementation Challenge #3:Large Storage for Active Updates
0%
2%
4%
6%
8%
10%
12%
0 100 200 300 400 500 600 700 800 900 1000active update latency (cycles)
mis
pre
dic
tio
n r
ate
%
bzip2 crafty gzip twolf
![Page 24: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/24.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 24© Eric Rotenberg
Exploit “Predictor Virtualization” [Burcea et al.] Eliminate significant amount of dedicated storage Use small L1 table backed by full table in physical
memory The full table in physical memory is transparently
cached in the general-purpose memory hierarchy (e.g., L2$)
Implementation Challenge #3:Large Storage for Active Updates
![Page 25: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/25.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 25© Eric Rotenberg
Putting it all together
Processor PipelineFetch Retire
Explicit Predictor
ID Gen
Default Predictor
past ID for predicting current branch
passive updates
stores
active updates
Active Update
Unit
1
2
3
![Page 26: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/26.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 26© Eric Rotenberg
gzip
4%
5%
6%
7%
8%
9%
1 10 100 1000 10000cost (K-Bytes)
mis
pre
dic
tio
n r
ate
%
gsharegshare+EXACTgshare+EXACT (VIRT. SACT)L-TAGEL-TAGE+EXACTL-TAGE+EXACT (VIRT. SACT)
![Page 27: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/27.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 27© Eric Rotenberg
Active Update Unit
First proposal to use store instructions to update the branch predictor
Store instructions might: Change a branch outcome in the explicit predictor Change a trip-count in the explicit loop predictor
Two mechanisms required: Convert store address into a predictor index Convert store value into a branch-outcome or trip-count
![Page 28: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/28.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 28© Eric Rotenberg
Active Update Example0x0400STORE 0x20 X
.
.
.
0x0800LOAD R5 X
0x0808BLE R5, 0x30, target
Explicit Predictor
Single-Address Conversion Table (SACT)
X
0xABCD
0xABCD0x0808
Ranges Reuse Table (RRT)
T min T max N min N max0x0808 0x10
compare
T
PC index
0x30 0x31 0x50
0x20
N
![Page 29: NC STATE UNIVERSITY Center for Efficient, Scalable, and Reliable Computing Department of Electrical & Computer Engineering North Carolina State University.](https://reader036.fdocuments.us/reader036/viewer/2022081602/551c5633550346b1458b4fbc/html5/thumbnails/29.jpg)
NC STATE UNIVERSITY
Int’l Conference on Computing Frontiers, May 16-19, 2010 29© Eric Rotenberg
Future Work
• Rich ISA support– Empower compiler or programmer to directly index
into the explicit predictor, directly manage it, and directly manage the instruction fetch unit
– Close gap between real and ideal indexing– More efficient hardware