Back to the Future: Leveraging Belady’s Algorithm for...
![Page 1: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement](https://reader034.fdocuments.us/reader034/viewer/2022050223/5f68cd1c2e04a56cd06b2d9a/html5/thumbnails/1.jpg)
Back to the Future: Leveraging Belady’s Algorithm for Improved Cache Replacement
Akanksha Jain
Calvin Lin

Cache Replacement
• On a cache miss, which line should we evict?
  – Well-studied problem
• Belady provided an optimal solution in 1966

Belady’s Optimal Algorithm (OPT)
• Evict the line that is accessed farthest in the future
  – Impractical: it requires knowledge of future accesses

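The eviction rule above can be simulated offline when the whole trace is known in advance. The following is a minimal sketch, not the authors' code: the function name `simulate_opt`, the trace format (a list of line names), and the choice to always insert a missing line (no bypass) are assumptions made for illustration.

```python
def simulate_opt(accesses, capacity):
    """Return one hit/miss flag per access under Belady's OPT (no bypass)."""
    # For each position, find the position of the next access to the same line.
    next_use = [float("inf")] * len(accesses)
    upcoming = {}
    for i in range(len(accesses) - 1, -1, -1):
        next_use[i] = upcoming.get(accesses[i], float("inf"))
        upcoming[accesses[i]] = i

    cache = {}  # cached line -> position of its next access
    hits = []
    for i, line in enumerate(accesses):
        if line in cache:
            hits.append(True)
        else:
            hits.append(False)
            if len(cache) >= capacity:
                # Evict the line whose next access is farthest in the future.
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[line] = next_use[i]
    return hits


if __name__ == "__main__":
    print(simulate_opt(list("ABCCEDDA"), capacity=2))
```

The simulator needs the full trace before it can make its first decision, which is exactly why the rule cannot be applied directly in hardware.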
Existing Solutions
• Rely on heuristics that make assumptions about the underlying access pattern
  – LRU assumes recency-friendly accesses
  – MRU assumes thrashing accesses
  – Recent solutions use more sophisticated heuristics
• Problem: Heuristics don’t work well when their assumptions don’t hold

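A small experiment makes the problem concrete. The sketch below is illustrative only: the helper `hit_rate`, the two toy traces, and the cache size of 2 lines are assumptions, not data from the talk. It runs LRU and MRU on a thrashing trace (a cycle over more lines than fit) and on a recency-friendly trace (a slowly sliding working set).

```python
from collections import OrderedDict


def hit_rate(accesses, capacity, evict_mru):
    """Hit rate of a fully associative cache that evicts the MRU or the LRU line."""
    cache = OrderedDict()  # ordered from least to most recently used
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                # last=True removes the most recently used line,
                # last=False removes the least recently used line.
                cache.popitem(last=evict_mru)
            cache[line] = True
    return hits / len(accesses)


if __name__ == "__main__":
    thrashing = list("ABC" * 10)      # cycles over 3 lines with room for 2
    friendly = list("ABABCBCDCDEDE")  # working set slides slowly
    for name, trace in (("thrashing", thrashing), ("friendly", friendly)):
        lru = hit_rate(trace, 2, evict_mru=False)
        mru = hit_rate(trace, 2, evict_mru=True)
        print(f"{name}: LRU={lru:.2f} MRU={mru:.2f}")
```

Each policy does well only on the trace that matches its assumption and poorly on the other one.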
Example: Matrix Multiplication
[Figure: matrix multiplication over matrices A (medium-term reuse), B (long-term reuse), and C (short-term reuse), with a bar chart of hit rate for LRU, MRU, DRRIP, SDBP, SHiP, and OPT. A second build of the slide breaks each bar down by B (long-term), A (medium-term), and C (short-term).]

Significant Headroom
[Figure: bar chart titled "Cache Replacement for SPEC". Y-axis: MPKI reduction over LRU (%), 0 to 30. Bars: DIP, DRRIP, DSB, SDBP, SHiP, and Optimal. Year labels under the bars: 2007, 2010, 2010, 2010, 2011.]
We are going to use OPT

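The chart's metric, MPKI reduction over LRU, compares misses per thousand instructions (MPKI) against the LRU baseline. A minimal sketch of the arithmetic follows; the function names and the numbers in the usage lines are hypothetical and are not results from the talk.

```python
def mpki(misses, instructions):
    """Misses per thousand (kilo) instructions."""
    return misses * 1000 / instructions


def reduction_over_lru(policy_mpki, lru_mpki):
    """Percentage by which a policy lowers MPKI relative to LRU."""
    return (lru_mpki - policy_mpki) / lru_mpki * 100


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    lru = mpki(misses=10_000, instructions=1_000_000)
    other = mpki(misses=8_000, instructions=1_000_000)
    print(f"LRU MPKI={lru}, policy MPKI={other}, "
          f"reduction={reduction_over_lru(other, lru):.1f}%")
```

A negative value means the policy misses more often than LRU.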
Our Solution: Key Idea
• We cannot look into the future
• But we can apply the OPT algorithm to past events to learn how OPT behaves
• If the past predicts the future, then this solution should approach OPT
[Figure: timeline split into past behavior and future; a predictor trained on the past answers the question "What should I evict?"]

Complication
• OPT looks arbitrarily far into the future
  – We might need an arbitrarily long history

How far in the future does OPT need to look?
[Figure: error compared to infinite OPT (%), 0 to 70, plotted against the view of the future (number of cache accesses: 1×, 2×, 4×, 8×, 16×), for Belady and LRU.]
The history length is bounded, but we need to track past accesses covering 8× the cache size

Our Solution
• Hawkeye Cache Replacement (Hawkeye)
  – Hawks can see 8× farther than the best humans

Hawkeye: Challenges
• Need to look at a long history (8× the cache size)
• Need to efficiently compute the OPT solution for a long history of past references
• New algorithm called OPTgen
  – Online (linear time)
  – Sampling (12KB overhead for 2MB cache)

Hawkeye: Overall Design
[Figure: block diagram shown in three builds. The cache access stream feeds OPTgen, which computes OPT’s decisions for the past. OPTgen’s OPT hit/miss output goes to a PC-based predictor, which remembers past OPT decisions and supplies the insertion priority to the last-level cache.]
• Training: for an access to address X, OPTgen answers "With OPT, would X be a cache hit or miss?" and the hit/miss result trains the predictor for the PC
• Prediction: for an access to address X, the predictor answers "Does this PC tend to load cache-friendly or cache-averse lines?" and X is inserted with high or low priority

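The predictor half of the design can be sketched as a table of saturating counters indexed by the PC of the load. This is a minimal illustration, not the authors' implementation: the class name `PCPredictor`, the 3-bit counter width, the midpoint threshold, the neutral starting value, and the use of a Python dictionary in place of a fixed-size hardware table are all assumptions.

```python
class PCPredictor:
    """Per-PC saturating counters trained on OPT hit/miss outcomes."""

    def __init__(self, bits=3):
        self.max_value = (1 << bits) - 1   # counters saturate at this value
        self.threshold = 1 << (bits - 1)   # at or above: cache-friendly
        self.counters = {}                 # PC -> counter value

    def train(self, pc, opt_hit):
        """Move the PC's counter up on an OPT hit and down on an OPT miss."""
        value = self.counters.get(pc, self.threshold)
        if opt_hit:
            value = min(self.max_value, value + 1)
        else:
            value = max(0, value - 1)
        self.counters[pc] = value

    def is_cache_friendly(self, pc):
        """True means insert with high priority, False with low priority."""
        return self.counters.get(pc, self.threshold) >= self.threshold


if __name__ == "__main__":
    predictor = PCPredictor()
    for _ in range(3):
        predictor.train(0x400A10, opt_hit=False)  # hypothetical PC value
    print(predictor.is_cache_friendly(0x400A10))
```

In this sketch an unseen PC starts at the threshold and is treated as cache-friendly until training says otherwise; a different default is equally plausible.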
OPTgen
• Simple online algorithm that reproduces OPT’s solution for the past
• Inspired by Belady’s insight
  – Lines that are reused first have higher priority
• OPTgen also gives higher priority to lines that are reused first

OPTgen
• For each line, we ask if this line would have been a hit or miss with OPT.
[Figure: access sequence A B C C E D D A on a timeline of past behavior, cache capacity = 2, with the usage intervals of C, D, and A marked. Second build: the reuse of A is a hit. Third build: the sequence is extended to A B C C E D D A B, and the reuse of B is a miss.]

OPTgen is equivalent to OPT
• 100% accuracy with unlimited history
  – 95.5% accuracy with 8× history (for SPEC 2006)

OPTgen Algorithm: Insight 1
• OPT hit/miss can be determined at the time of reuse
[Figure: access sequence A B C C E D D A B on a timeline of past behavior, with the usage intervals of C, D, and A marked.]
Don’t need information about future accesses because they will have lower priority than A

OPTgen Algorithm: Insight 2
• OPT solution can be reproduced by tracking occupancy rather than cache contents
[Figure: access sequence A B C C E D D A B, cache capacity = 2, with the usage intervals of C, D, A, and B marked. Occupancy per time step: 1 1 2 1 1 2 1.]

OPTgen Algorithm: Insight 3
• OPT considers both reuse distances and their overlap
[Figure: access sequence A B C C E D D A E, cache capacity = 2, with the usage intervals of C, D, A, and E marked.]
E’s reuse distance is smaller than A’s, but it misses with OPT because it has higher overlap

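The three insights can be combined into a short sketch: decide hit or miss when a line is reused, track only how many lines occupy the cache at each time step, and let a reuse hit only if there is room throughout its whole usage interval. The code below is an illustration under stated assumptions, not the authors' implementation: the function name `optgen`, the unbounded Python list used as the occupancy vector, and the per-access scan are all choices made here for clarity, whereas the talk describes a bounded history.

```python
def optgen(accesses, capacity):
    """Return (hit flags, occupancy vector) for a trace of line names."""
    occupancy = []    # occupancy[t] = number of lines held during time step t
    last_access = {}  # line -> time of its previous access
    hits = []
    for now, line in enumerate(accesses):
        occupancy.append(0)
        hit = False
        if line in last_access:
            start = last_access[line]
            # The reuse hits only if the cache has a free slot at every
            # time step of the interval since the previous access.
            if all(occupancy[t] < capacity for t in range(start, now)):
                hit = True
                for t in range(start, now):
                    occupancy[t] += 1
        hits.append(hit)
        last_access[line] = now
    return hits, occupancy


if __name__ == "__main__":
    for trace in ("ABCCEDDA", "ABCCEDDAB", "ABCCEDDAE"):
        hits, occupancy = optgen(list(trace), capacity=2)
        print(trace, "last access hit:", hits[-1], "occupancy:", occupancy)
```

On the sequences from the slides, the reuse of A is a hit and leaves the occupancy 1 1 2 1 1 2 1, while the reuses of B and E both miss because one time step in their intervals is already full.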
Sampling for OPTgen
• 8× history for a 2MB cache
  – 100K entries (> 0.5MB)
• Set Dueling [Qureshi et al., ISCA 07]
  – Sample the behavior of a few cache sets
  – 64 sampled sets
• Set Dueling with OPTgen
  – Apply OPTgen to 64 sampled sets
  – 2.5K entries (12KB)
  – 95% accuracy in estimating miss rate

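Sampling means that only accesses mapping to a small subset of cache sets are tracked. The sketch below shows one way to pick 64 sets for a 2MB, 16-way cache. The 64-byte line size, the evenly spaced selection rule, and the names `set_index` and `is_sampled` are assumptions; the slides do not say how the sampled sets are chosen.

```python
LINE_BYTES = 64                                 # assumed cache line size
CACHE_BYTES = 2 * 1024 * 1024                   # 2MB cache
WAYS = 16                                       # 16-way set associative
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)   # 2048 sets
NUM_SAMPLED = 64                                # sampled sets
STRIDE = NUM_SETS // NUM_SAMPLED                # every 32nd set is sampled


def set_index(address):
    """Cache set that a byte address maps to."""
    return (address // LINE_BYTES) % NUM_SETS


def is_sampled(address):
    """True if the access should be tracked by the sampled history."""
    return set_index(address) % STRIDE == 0


if __name__ == "__main__":
    sampled = {set_index(a) for a in range(0, CACHE_BYTES, LINE_BYTES)
               if is_sampled(a)}
    print(len(sampled), "of", NUM_SETS, "sets are sampled")
```

Only accesses for which `is_sampled` returns true would be fed to the history tracker, which is how the storage drops from the full-cache figure to the sampled one.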
Hawkeye: Overall Design
[Figure: the block diagram again (cache access stream, OPTgen, PC-based predictor, last-level cache, OPT hit/miss, insertion priority), now annotated "95% accurate".]

Evaluation
• Cache Replacement Championship Simulator (CRC)
• Benchmarks: Memory-intensive SPEC 2006
• Hardware Configurations
  – Single Core: Last-level Cache: 16-way, 2MB LLC
  – Multi-core: Shared Cache: 16-way, 4MB & 8MB LLC
• Replacement policies
  – Baseline: LRU
  – DRRIP [2010]
  – SDBP [2011]
  – SHiP [2012]

Miss Reduction
[Figure: bar chart of miss reduction over LRU (%), -30 to 60, shown in several builds. Benchmarks: astar, gromacs, gobmk, lbm, omnetpp, calculix, gcc, leslie, zeus, libq, bzip, h264, tonto, hmmer, xalanc, mcf, soplex, gems, cactus, sphinx3, and the mean. The first build shows DRRIP, SHiP, and SDBP; later builds add Hawkeye with the label 17.4%; the last build adds OPT and extends the axis to 70.]
• Hawkeye outperforms DRRIP, SDBP and SHiP
• Hawkeye does not result in a slowdown for any benchmark
• Hawkeye’s performance gains are consistent across benchmarks

Speedup
[Figure: bar chart of speedup (%), -20 to 50, for DRRIP, SHiP, SDBP, and Hawkeye. Benchmarks: astar, gromacs, gobmk, lbm, omnetpp, calculix, gcc, leslie, zeus, libq, bzip2, h264, tonto, hmmer, xalan, mcf, soplex, gems, cactus, sphinx, and the geometric mean. The chart carries the label 8.4%.]

Multi-Core Results
[Figure: bar chart of MPKI reduction over LRU (%), 0 to 35, against the number of cores (1, 2, 4), for SDBP, SHiP, and Hawkeye. Speedup labels: 8.4%, 13.5%, 15.0%.]
Averaged across 100s of multi-programmed SPEC runs

Hawkeye: Summary
• New goal: learn from the OPT solution
• Not limited to specific access patterns
• Models both reuse distance and demand
[Figure: bar chart of MPKI reduction over LRU (%), 0 to 30, for DIP, DRRIP, DSB, SDBP, SHiP, OPT, and Hawkeye. Year labels: ‘07, ‘10, ‘10, ‘10, ‘11, 2015.]

Conclusions and Future Work
• Recent trend to view cache replacement as a prediction problem
  – Learn from the past to predict the future
• Hawkeye learns from an oracle
• Future Work: More sophisticated predictors to learn the OPT solution for the past