Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev...
![Page 1: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/1.jpg)
Variable Length Delta Prefetcher 1
Efficiently Prefetching Complex Address Patterns
Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian
University of Utah
Chris Wilkerson, Zeshan Chishti, Seth Pugsley
Intel Labs
![Page 2: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/2.jpg)
Prefetchers
Confirmation-Based Prefetchers
• Issue predictions after a few deltas
• High accuracy
• Short streams lose out
Immediate Prefetchers
• Aggressive
• Low accuracy
• Waste DRAM bandwidth and cache capacity
Accurate vs. Fast
![Page 3: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/3.jpg)
Spatial Correlation
• Learn access (delta) patterns
• Apply patterns when similar conditions recur
• E.g., PC, physical address, delta patterns
Delta Patterns
• Regular delta patterns, e.g., (+1, +1, +1)…, (+2, +2, +2, +2)…
• Irregular delta patterns, e.g., (+1, +2, +3)…
![Page 4: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/4.jpg)
Long Repeatable Streams of Irregular Deltas
Page Num: 479218
Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7, …
Delta patterns for milc
![Page 5: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/5.jpg)
Long Repeatable Streams of Irregular Deltas
Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7, -1, -5, …
Cache Line: A+1, A+10, A+2, A+3, A+11, A+12, A+4, A+5, A+6, A+13, A+12, A+7, …
Stream 1: A+1, A+2, A+3, A+4, A+5, A+6, A+7
Stream 2: A+10, A+11, A+12, A+13
Confirmation Prefetches
Stride Prefetcher Coverage: 5/11
SandBox Prefetcher Coverage: 9/11
Neither is perfectly timely!
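The mapping from the delta stream to the cache-line sequence above is a running sum. A minimal sketch in C, where the function name `deltas_to_offsets` is made up for illustration and offsets are measured in cache lines relative to the base line A:

```c
#include <assert.h>
#include <stddef.h>

/* Turn a delta stream into cache-line offsets relative to a base line A.
 * out[i] holds the offset reached after applying deltas[0..i]. */
void deltas_to_offsets(const int *deltas, size_t n, int *out)
{
    int offset = 0;                 /* offset 0 stands for line A itself */

    assert(deltas != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        offset += deltas[i];        /* running sum of the deltas */
        out[i] = offset;
    }
}
```

Feeding it the twelve deltas listed on the slide gives the offsets 1, 10, 2, 3, 11, 12, 4, 5, 6, 13, 12, 7, which are the A+k values shown.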
![Page 6: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/6.jpg)
Variable Length Delta Prefetcher
![Page 7: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/7.jpg)
Structure of VLDP
[Figure: block diagram. Labels: Core 1 … Core 8, $ (caches), Last Level $, $ Access, Per Page Delta History Tables, Delta Prediction Tables, Offset Prediction Tables, Predicted Delta/Offset]
![Page 8: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/8.jpg)
Delta History Table
• Tracks deltas within a page
for (i = 0; i < BIGNUM; i++) {
    a[i] = b[i] + c[i];
}
• a, b, c can each belong to different pages, so deltas between pages are meaningless
Delta = Current Address - Last Address
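A sketch of how a per-page delta could be computed in C. The geometry is an assumption, not something the slide states: 64-byte cache lines and 128 lines per page (8 KB pages), chosen to agree with the 7-bit offsets shown for the Offset Prediction Table. The delta is taken as the current line offset minus the last one, the sign convention under which the milc example yields +1, +9, -8. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (not stated on the slide): 64-byte cache lines and
 * 128 lines per page, i.e. 8 KB pages with 7-bit line offsets. */
enum { LINE_BITS = 6, OFFSET_BITS = 7 };

/* Page number of a byte address. */
uint64_t page_of(uint64_t addr)
{
    return addr >> (LINE_BITS + OFFSET_BITS);
}

/* Cache-line offset of a byte address within its page (0..127). */
int line_offset_of(uint64_t addr)
{
    return (int)((addr >> LINE_BITS) & ((1u << OFFSET_BITS) - 1u));
}

/* Delta between two accesses to the same page:
 * current line offset minus last line offset. */
int delta_within_page(uint64_t last_addr, uint64_t cur_addr)
{
    /* Deltas across pages are meaningless, so refuse to compute them. */
    assert(page_of(last_addr) == page_of(cur_addr));
    return line_offset_of(cur_addr) - line_offset_of(last_addr);
}
```

Because a[i], b[i] and c[i] in the loop above usually sit in different pages, each page would be tracked separately and only accesses to the same page would be passed to `delta_within_page`.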
![Page 9: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/9.jpg)
Delta History Table
Entry fields:
• Page Num.
• Last Addr.
• Last 4 Deltas
• Last Predictor
• Num. Times Used
• Last Four Prefetched Offsets
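One way to picture an entry with these fields is the C struct below. It is only a sketch: the field types are wider than real hardware would use, and the meanings given to "Last Predictor" and "Num. Times Used" in the comments are a reading of the field names, not something the slide spells out. The helper names `dht_init` and `dht_record` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DHT_HISTORY = 4 };

/* One Delta History Table entry; widths are illustrative only. */
struct dht_entry {
    uint64_t page_num;                     /* page this entry tracks */
    int      last_offset;                  /* line offset of the last access */
    int      last_deltas[DHT_HISTORY];     /* oldest first, newest last */
    int      num_deltas;                   /* valid deltas, saturates at 4 */
    int      last_predictor;               /* table behind the last prediction (assumed meaning) */
    int      times_used;                   /* accesses seen to this page (assumed meaning) */
    int      last_prefetched[DHT_HISTORY]; /* last four prefetched offsets */
};

/* Start tracking a page at its first access. */
void dht_init(struct dht_entry *e, uint64_t page_num, int first_offset)
{
    assert(e != NULL);
    memset(e, 0, sizeof *e);
    e->page_num = page_num;
    e->last_offset = first_offset;
    e->times_used = 1;
}

/* Record a later access to the same page and return the new delta. */
int dht_record(struct dht_entry *e, int offset)
{
    int delta;

    assert(e != NULL);
    delta = offset - e->last_offset;
    /* Shift the history left by one and append the newest delta. */
    memmove(e->last_deltas, e->last_deltas + 1,
            (DHT_HISTORY - 1) * sizeof e->last_deltas[0]);
    e->last_deltas[DHT_HISTORY - 1] = delta;
    if (e->num_deltas < DHT_HISTORY)
        e->num_deltas++;
    e->last_offset = offset;
    e->times_used++;
    return delta;
}
```

Starting a page at offset 1 and then recording offsets 10 and 2 leaves the two newest history slots holding 9 and -8.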
![Page 10: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/10.jpg)
Delta Prediction Tables
Table 1 entry: Delta (8 b) | Pred. (8 b) | Accuracy (2 b)
Table 3 entry: Deltas (3 × 8 b) | Pred. (8 b) | Accuracy (2 b)
64 rows per table
Each table reports Match?; a MUX selects the Predicted Delta
Highest priority: t=3; lowest priority: t=1
![Page 11: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/11.jpg)
Offset Prediction Table
Entry: First Page Offset (7 b) | Pred. Offset (7 b) | Accuracy (2 b)
OPT is used only to predict the second access to a page
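A rough C sketch of how such a table might behave. Only the three fields and the "second access" role come from the slide. The row indexing, the rule that a new row starts with accuracy 1, and the replacement policy on a mispredict are assumptions made for illustration, and all names are hypothetical.

```c
#include <assert.h>

enum { OPT_ENTRIES = 64, OPT_OFFSETS = 128, OPT_MAX_ACC = 3 };

struct opt_entry {
    int valid;
    int first_offset;   /* offset of the first access to a page */
    int pred_offset;    /* predicted offset of the second access */
    int accuracy;       /* 2-bit saturating counter, 0..3 */
};

/* Assumed indexing: low bits of the first offset select the row. */
int opt_index(int first_offset)
{
    return first_offset % OPT_ENTRIES;
}

/* First access to a page: return the predicted second offset,
 * or -1 when the table has no confident prediction. */
int opt_predict(const struct opt_entry *opt, int first_offset)
{
    const struct opt_entry *e;

    assert(first_offset >= 0 && first_offset < OPT_OFFSETS);
    e = &opt[opt_index(first_offset)];
    if (e->valid && e->first_offset == first_offset && e->accuracy > 0)
        return e->pred_offset;
    return -1;
}

/* Second access to a page: train with the observed (first, second) pair. */
void opt_train(struct opt_entry *opt, int first_offset, int second_offset)
{
    struct opt_entry *e;

    assert(first_offset >= 0 && first_offset < OPT_OFFSETS);
    e = &opt[opt_index(first_offset)];
    if (!e->valid || e->first_offset != first_offset) {
        /* New pattern: install it so it can predict right away. */
        e->valid = 1;
        e->first_offset = first_offset;
        e->pred_offset = second_offset;
        e->accuracy = 1;
    } else if (e->pred_offset == second_offset) {
        if (e->accuracy < OPT_MAX_ACC)
            e->accuracy++;
    } else if (e->accuracy > 0) {
        e->accuracy--;
    } else {
        /* Accuracy already exhausted: adopt the new second offset. */
        e->pred_offset = second_offset;
        e->accuracy = 1;
    }
}
```

The point of the structure is that a prediction can be issued on the very first access to a page, before any delta has been observed there.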
![Page 12: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/12.jpg)
Need for Multiple Tables
Repeating delta pattern: (1, 2, 3, 5, 2, 4)…
Table 1 (Delta → Pred.): 1 → 2; 2 → 3; 3 → 5; 5 → 2
Table 2 (Deltas → Pred.): 1,2 → 3; 2,3 → 5; 3,5 → 2; 5,2 → 4
50% accuracy: in Table 1, delta 2 is followed by 3 once and by 4 once
Search for a delta pattern match starts from the rightmost table (the longest delta history)
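The longest-history-first search can be sketched in C as follows. This is a simplified model with hypothetical names: rows are found by linear scan, and a full table simply drops new patterns instead of replacing an old row.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NUM_TABLES = 3, ROWS = 64 };

struct dpt_row {
    int valid;
    int key[NUM_TABLES];  /* Table t uses key[0..t-1], oldest delta first */
    int pred;             /* predicted next delta */
};

struct dpt {
    struct dpt_row rows[NUM_TABLES][ROWS];  /* rows[t-1] is Table t */
};

/* Store "history of length t -> pred" in Table t. Rows fill in order,
 * so a matching row is always met before the first free one. */
void dpt_learn(struct dpt *d, int t, const int *hist, int pred)
{
    assert(t >= 1 && t <= NUM_TABLES);
    for (int r = 0; r < ROWS; r++) {
        struct dpt_row *row = &d->rows[t - 1][r];
        if (!row->valid ||
            memcmp(row->key, hist, (size_t)t * sizeof(int)) == 0) {
            row->valid = 1;
            memcpy(row->key, hist, (size_t)t * sizeof(int));
            row->pred = pred;
            return;
        }
    }
}

/* Look up the newest n deltas (oldest first), longest history first.
 * Returns the number of the table that matched, or 0 if none did. */
int dpt_lookup(const struct dpt *d, const int *hist, int n, int *pred)
{
    assert(n >= 0 && pred != NULL);
    for (int t = (n < NUM_TABLES ? n : NUM_TABLES); t >= 1; t--) {
        const int *suffix = hist + (n - t);   /* the newest t deltas */
        for (int r = 0; r < ROWS; r++) {
            const struct dpt_row *row = &d->rows[t - 1][r];
            if (row->valid &&
                memcmp(row->key, suffix, (size_t)t * sizeof(int)) == 0) {
                *pred = row->pred;
                return t;
            }
        }
    }
    return 0;
}
```

Loaded with the two tables from the slide, the history (5, 2) matches in Table 2 and predicts 4, while the history (1, 2) predicts 3. A lone delta of 2 can only be served by Table 1, which is where the ambiguity comes from.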
![Page 13: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/13.jpg)
Looking farther than one Delta ahead
Repeating delta pattern: (1, 2, 3), (1, 2, 3)…
Table 1 (Delta → Pred.): 1 → 2; 2 → 3; 3 → 1
Table 2 (Deltas → Pred.): 1,2 → 3; 2,3 → 1; 3,1 → 2
Current Delta → Degree 1 Prediction
![Page 14: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/14.jpg)
Looking farther than one Delta ahead
Repeating delta pattern: 1, 2, 3, 1, 2, 3…
Table 1 (Delta → Pred.): 1 → 2; 2 → 3; 3 → 1
Table 2 (Deltas → Pred.): 1,2 → 3; 2,3 → 1; 3,1 → 2
Use recursive lookup to look farther than one delta
Current Delta → Degree 1 Prediction
Current Delta, Degree 1 Prediction → Degree 2 Prediction
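A small C sketch of the recursive lookup, using the table contents for the (1, 2, 3) pattern. Each predicted delta is fed back in as if it had been observed, so a degree-n request yields n prefetch offsets. The names are hypothetical, and a delta of 0 is used here as a stand-in for "no delta known".

```c
#include <assert.h>
#include <stddef.h>

struct rule { int len; int key[2]; int pred; };  /* len = history length */

/* Table contents for the repeating pattern (1, 2, 3);
 * longer histories are listed first so they win. */
static const struct rule RULES[] = {
    {2, {1, 2}, 3}, {2, {2, 3}, 1}, {2, {3, 1}, 2},   /* Table 2 */
    {1, {1, 0}, 2}, {1, {2, 0}, 3}, {1, {3, 0}, 1},   /* Table 1 */
};

/* Predict the delta following history (prev, cur). prev == 0 means the
 * older delta is unknown. Returns 0 when nothing matches. */
int predict_next(int prev, int cur)
{
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        const struct rule *r = &RULES[i];
        if (r->len == 2 && prev != 0 &&
            r->key[0] == prev && r->key[1] == cur)
            return r->pred;
        if (r->len == 1 && r->key[0] == cur)
            return r->pred;
    }
    return 0;
}

/* Recursive lookup: write up to `degree` prefetch offsets into out[]
 * and return how many were produced. */
int prefetch_offsets(int prev, int cur, int offset, int degree, int *out)
{
    int n = 0;

    assert(degree >= 0 && out != NULL);
    while (n < degree) {
        int next = predict_next(prev, cur);
        if (next == 0)
            break;              /* no prediction: stop looking ahead */
        offset += next;
        out[n++] = offset;
        prev = cur;             /* the prediction becomes history */
        cur = next;
    }
    return n;
}
```

With history (1, 2) at line offset 3, a degree-3 request yields offsets 6, 7 and 9, corresponding to the predicted deltas 3, 1 and 2.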
![Page 15: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/15.jpg)
Case Study: Streaming Workloads
Repeating delta pattern: 1, 1, 1, 1, 1…
Table 1 (Delta → Pred.): 1 → 1
Table 2 (Deltas → Pred.): empty
Patterns learned from one page are applied to another
![Page 16: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/16.jpg)
Updating the Delta History Tables
On each LLC access:
• If the page is present, add the delta to its entry
• If the page is not present, replace an entry (evict Not Recently Used)
Entry fields: Page Num., Last Add., Last 4 Deltas, Last Predictor, Num. Used, Last 4 Prefetches
![Page 17: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/17.jpg)
Updating the Prediction Tables
DHT entry: Page Num. | Last Add. | Last 3 Deltas: B, C, D | Last Predictor
Latest Delta: E
Can the current state predict the latest delta?
Table 3 (Deltas → Pred.): B,C,D → E?
Table 2 (Deltas → Pred.): C,D → E?
Table 1 (Delta → Pred.): D → F?
• If the prediction is correct: increment accuracy
• If the prediction is wrong: decrement accuracy; if accuracy == 0, update + promote the prediction
• If the prediction is missing: seed T1 with the prediction
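The three update rules can be sketched for a single table entry in C. The 2-bit counter width comes from the table layout shown earlier. Checking for accuracy == 0 after the decrement, and starting a seeded entry at accuracy 0, are assumptions. Promotion to the next-longer table is left to the caller, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { ACC_MAX = 3 };   /* 2-bit saturating counter */

struct dpt_entry {
    int valid;
    int pred;       /* predicted next delta */
    int accuracy;   /* 0..ACC_MAX */
};

enum outcome { CORRECT, WRONG_KEPT, WRONG_REPLACED, SEEDED };

/* Update one entry with the delta that was actually observed. */
enum outcome dpt_update(struct dpt_entry *e, int latest_delta)
{
    assert(e != NULL);
    if (!e->valid) {
        /* Prediction missing: seed the entry with the observed delta. */
        e->valid = 1;
        e->pred = latest_delta;
        e->accuracy = 0;
        return SEEDED;
    }
    if (e->pred == latest_delta) {
        if (e->accuracy < ACC_MAX)
            e->accuracy++;
        return CORRECT;
    }
    if (e->accuracy > 0)
        e->accuracy--;
    if (e->accuracy == 0) {
        /* Confidence exhausted: overwrite the stored prediction.
         * A caller would also promote the pattern to the next table. */
        e->pred = latest_delta;
        return WRONG_REPLACED;
    }
    return WRONG_KEPT;
}
```

An entry that has been right twice therefore survives one misprediction and is overwritten on the second.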
![Page 18: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/18.jpg)
Populating the Prediction Tables
Table 1 (Delta → Pred.): 1 → A
Table 2 (Deltas → Pred.): 1,1 → B
Table 3 (Deltas → Pred.): 1,1,1 → C
Insertion: Pattern Missing → Table 1; Table 1 Wrong → Table 2; Table 2 Wrong → Table 3 (NRU replacement in each table)
If mis-predict, a longer delta history might be needed
![Page 19: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/19.jpg)
Evaluation Methodology
• Simics + USIMM
• 8 RISC cores, UltraSPARC III ISA
• 3.2 GHz, 4-wide OoO, 128-entry RoB
• 32 KB I&D L1 caches, 4 cycles
• 8 MB shared (1 MB per core) L2 cache, 10 cycles
• DRAM specifications
  • 2 channels, 2 ranks per channel, 8 banks per rank
  • 800 MHz DDR3 DRAM
• SPEC 2006, NPB, and CloudSuite
• Mix1: milc, astar, lbm, libq; Mix2: xalancbmk, lbm, zeusmp, milc
![Page 20: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/20.jpg)
VLDP Configuration
• Per-core VLDP
• 1 Offset Prediction Table, 64 entries
• 3 Delta Prediction Tables, 64 entries each
• 16-entry Delta History Table
• Only Delta Prediction Tables 2 and 3 contribute to multi-degree prefetch
Storage overhead:
Offset Prediction Table: 128 B
Delta History Table: 222 B
Delta Prediction Tables: 648 B
Total: 998 B/core
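The storage figures can be partly cross-checked against the field widths shown on the table slides. The C sketch below reproduces the 128 B Offset Prediction Table exactly. For the Delta Prediction Tables the listed fields alone add up to 624 B, and the quoted 648 B is matched if each row carries one extra bit. That extra bit is an assumption here (it could be an NRU replacement bit). The 222 B Delta History Table figure is taken as given. Function names are hypothetical.

```c
#include <assert.h>

/* Bytes needed for `rows` entries of `bits_per_row` bits, rounded up. */
unsigned table_bytes(unsigned rows, unsigned bits_per_row)
{
    assert(rows > 0 && bits_per_row > 0);
    return (rows * bits_per_row + 7u) / 8u;
}

/* Offset Prediction Table: 64 rows of 7 + 7 + 2 bits. */
unsigned opt_bytes(void)
{
    return table_bytes(64, 7 + 7 + 2);
}

/* Delta Prediction Tables 1..3: Table t holds t 8-bit deltas, an 8-bit
 * prediction and a 2-bit accuracy counter in each of its 64 rows.
 * extra_bits_per_row is an assumption, e.g. 1 for a replacement bit. */
unsigned dpt_bytes(unsigned extra_bits_per_row)
{
    unsigned total = 0;

    for (unsigned t = 1; t <= 3; t++)
        total += table_bytes(64, 8 * t + 8 + 2 + extra_bits_per_row);
    return total;
}
```

Under that one-bit assumption, 128 B + 222 B + 648 B gives the 998 B per core quoted above.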
![Page 21: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/21.jpg)
Performance Improvement (Vs No PC)
VLDP is 6% better than AMPM, 9% better than SBP, and 17% better than FDP
[Figure: bar chart of speedup (y-axis 0.8 to 2.0) for FDP, SBP, AMPM, and VLDP across the NPB workloads (CG, IS, LU, MG, SP), the CloudSuite workloads, SPEC 2006 (Astar, Lbm, Libq, Mcf, Milc, Omnet, Soplex, Xalanc, Zeus), Mix1, Mix2, and GM]
![Page 22: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/22.jpg)
Performance Improvement (Vs PC)
VLDP is 7.1% better than GHB and 7.6% better than SMS
[Figure: bar chart of speedup (y-axis 0.8 to 2.0) for SMS, GHB_PC_DC, and VLDP across the NPB workloads (CG, IS, LU, MG, SP), the CloudSuite workloads, SPEC 2006 (Astar, Lbm, Libq, Mcf, Milc, Omnet, Soplex, Xalanc, Zeus), Mix1, Mix2, and GM]
![Page 23: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/23.jpg)
Coverage
FDP 16%, SMS 55%, SBP 40%, GHB 33%, AMPM 49%, VLDP 61%
[Figure: bar chart of coverage (y-axis 0% to 120%) for FDP, SMS, SBP, GHB_PC_DC, AMPM, and VLDP across NPB, CloudSuite, Spec2006, Spec2006-Mix, and GM]
![Page 24: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/24.jpg)
Sensitivity to table size
[Figure: speedup (y-axis 0.98 to 1.03) for configurations 32Page, 16Page, and 8Page, each with 8T, 16T, 32T, and 64T]
2% increase in performance when DPT size is increased
![Page 25: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/25.jpg)
Sensitivity to number of Delta Prediction Tables
3DPT gives only a modest 1% performance improvement, but it improves efficiency by reducing DRAM requests by 3%
[Figure: speedup and DRAM accesses (y-axis 1.0 to 1.5) for 1DPT_NoOPT, 1DPT+OPT, 2DPT+OPT, 3DPT+OPT, and 4DPT+OPT]
![Page 26: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/26.jpg)
Conclusions
• OPT issues predictions without confirmation
• DPT recognizes irregular delta patterns
• Long delta patterns provide high accuracy
• Less than 1 KB per core overhead
• 6% better performance than AMPM
![Page 27: Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.](https://reader035.fdocuments.us/reader035/viewer/2022062504/5a4d1b727f8b9ab0599b5e86/html5/thumbnails/27.jpg)
Thank You