Accurate and Complexity-Effective Spatial Pattern Prediction
Transcript of Accurate and Complexity-Effective Spatial Pattern Prediction
![Page 1: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/1.jpg)
Computer Architecture Lab at
University of Toronto
AENAO: Power Aware Memory Coherence & Hierarchies for Servers
http://eecg.toronto.edu/~aenao
Accurate and Complexity-Effective Spatial Pattern Prediction
Chi Chen, Se-Hyun Yang, Babak Falsafi, Andreas Moshovos
![Page 2: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/2.jpg)
CALCM
Motivation – Variation in Spatial Locality
• Caches exploit spatial locality via block size
  • Prefetch nearby data to improve performance
• “One Size Fits All” solution
  • Large enough for prefetching
  • Small enough to avoid memory link saturation
• Opportunity: variation within and across applications
• If the “Best Block Size” were known:
  1. Prefetch even further: higher performance
  2. “Turn off” unused data in cache: lower leakage power
![Page 3: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/3.jpg)
This Work
• Dynamic Spatial Pattern Prediction
• Leakage Power Reduction
  • Sub-blocks of a block as a group
  • Place “unused” block parts in low-leakage state
• Prefetching
  • Consecutive memory blocks as a group
  • Selectively prefetch blocks upon first access in group
• Key Contribution: PC + Offset Within Group
  • Quick learning
  • Compact representation
  • High coverage
![Page 4: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/4.jpg)
How Well it Works
• Spatial Pattern Predictor (SPP)
  • 256-entry, tag-less, direct-mapped
  • ~95% coverage
• L1 Data Leakage Energy Reduction
  • ~40% reduction w/ 70nm CMOS technology
  • < 1% average performance degradation
• Prefetching w/ 1024-byte Group
  • Up to 2x speedup and 56% average
  • Conventional cache: 14% slowdown
![Page 5: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/5.jpg)
Outline
Conventional Cache: Optimization Opportunities
Variation in Spatial Locality
Prediction Framework
Prior Work
Results
![Page 6: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/6.jpg)
Optimization Opportunity #1
L1D with 64-byte cache lines

typedef struct person {
    char name[20];
    …
    int age;
    int isAdult;
    struct person* next;
}; // total 64 bytes

// do something …
while ( people ) {
    if ( people->age >= 21 )
        people->isAdult = TRUE;
    people = people->next;
}

[Figure: conventional cache. Each list node access misses and brings in a full 64-byte line, of which only the age, isAdult, and next fields are touched; the rest of the line is untouched.]
• Resident untouched data: wasteful leakage
![Page 7: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/7.jpg)
Optimization Opportunity #2
L1D with 64-byte cache lines

struct person {
    char name[20];
    …
    int age;
    int isAdult;
} people[LARGE];

// do something …
for ( i = 0; i < LARGE; i++ ) {
    if ( people[i].age >= 21 )
        people[i].isAdult = TRUE;
}

[Figure: conventional cache. Consecutive cache lines are shown as Group #1 and Group #2; in each line only the age and isAdult fields are touched.]
• Detect access patterns at group level
• Selectively prefetch same block members
• Improve performance w/o saturating memory
![Page 8: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/8.jpg)
Variation in Spatial Locality
[Figure: stacked bars for facerec, gcc, mcf, and vortex. y-axis: all cache lines touched (0% to 100%); segments: fraction of the line used, 1/8 through 8/8.]
Fraction of data used before eviction. Measured on 64KB 2-way L1D w/ 64B cache lines.
Average line usage (facerec, gcc, mcf, vortex): 40%, 89%, 26%, 48%
![Page 9: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/9.jpg)
Prediction Framework
Minimum Fetch Unit (MFU):
• replacement unit of cache
• e.g., cache line or sub-block
Spatial Group:
• group of adjacent MFUs
• indexed by logical tag
Spatial Pattern:
• reference pattern of a spatial group, e.g., 1 0 . . . 1
Spatial Group Generation:
• starts with a new logical tag
[Figure: accesses over time labeled Tag0, Tag0, Tag1, Tag1, Tag1, . . .]
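The framework terms above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the 64-byte group and 8-byte MFU sizes, the one-bit-per-MFU encoding, and the names `logical_tag`, `mfu_offset`, `generation_t`, and `record_access` are all hypothetical, and only a single in-flight generation is tracked for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for illustration only: 64-byte spatial group, 8-byte MFUs. */
#define GROUP_BYTES    64u
#define MFU_BYTES      8u
#define MFUS_PER_GROUP (GROUP_BYTES / MFU_BYTES)

/* Logical tag: identifies the spatial group that an address falls in. */
static uint64_t logical_tag(uint64_t addr) {
    return addr / GROUP_BYTES;
}

/* Index of the MFU inside its spatial group (0 .. MFUS_PER_GROUP - 1). */
static unsigned mfu_offset(uint64_t addr) {
    return (unsigned)((addr % GROUP_BYTES) / MFU_BYTES);
}

/* One generation of a spatial group: its tag plus one bit per MFU. */
typedef struct {
    uint64_t tag;
    uint8_t  pattern;  /* bit i set => MFU i was referenced */
    int      valid;
} generation_t;

/* Record one access. Returns 1 if the access carried a new logical tag and
   therefore started a new generation, 0 if it extended the current one. */
static int record_access(generation_t *g, uint64_t addr) {
    uint64_t tag = logical_tag(addr);
    unsigned off = mfu_offset(addr);
    int started = 0;

    assert(off < 8u);  /* the pattern is stored in an 8-bit field */
    if (!g->valid || g->tag != tag) {
        g->tag = tag;
        g->pattern = 0;
        g->valid = 1;
        started = 1;
    }
    g->pattern |= (uint8_t)(1u << off);
    return started;
}
```

Starting from a zero-initialized `generation_t`, accesses to 0x1000, 0x1008, and 0x1038 fall in one group and touch MFUs 0, 1, and 7; an access to 0x1040 carries a new logical tag and begins the next generation.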
![Page 10: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/10.jpg)
Spatial Pattern Predictor
[Figure: data cache, Current Pattern Table (CPT), and Pattern History Table (PHT). Labeled fields: Spatial Pattern Register, PHT Entry Pointer, Prediction Index, Spatial Pattern History; a comparison (=?) on the prediction index yields the spatial pattern prediction.]
• Current Pattern Table records patterns
• Pattern History Table stores captured patterns
• Prediction Index: PC + SPG Offset, 32 bits
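A rough sketch of how the two tables might cooperate, assuming the tag-less, direct-mapped, 256-entry PHT reported elsewhere in the deck. The index function, the 8-MFU group size, the overwrite-on-completion update policy, and all names (`pht_t`, `cpt_entry_t`, `prediction_index`, `spp_begin`, `spp_touch`, `spp_end`) are assumptions made for illustration; the slides state only that the index combines the PC with the SPG offset.

```c
#include <assert.h>
#include <stdint.h>

#define PHT_ENTRIES    256u  /* table size reported on the slides */
#define MFUS_PER_GROUP 8u    /* assumption: 8 MFUs per spatial group */

/* Tag-less, direct-mapped pattern history table: one pattern per entry. */
typedef struct {
    uint8_t pattern[PHT_ENTRIES];
} pht_t;

/* In-flight record for one spatial group (a CPT entry in slide terms). */
typedef struct {
    uint8_t  pattern;    /* spatial pattern register: MFUs touched so far */
    unsigned pht_index;  /* PHT entry pointer chosen at the first access  */
} cpt_entry_t;

/* Hypothetical index function combining PC and SPG offset. */
static unsigned prediction_index(uint64_t pc, unsigned spg_offset) {
    assert(spg_offset < MFUS_PER_GROUP);
    return (unsigned)(((pc >> 2) * MFUS_PER_GROUP + spg_offset) % PHT_ENTRIES);
}

/* First access to a group: fetch the stored prediction, start recording.
   An untrained entry yields an all-zero pattern in this sketch. */
static uint8_t spp_begin(const pht_t *pht, cpt_entry_t *e,
                         uint64_t pc, unsigned spg_offset) {
    e->pht_index = prediction_index(pc, spg_offset);
    e->pattern = (uint8_t)(1u << spg_offset);
    return pht->pattern[e->pht_index];
}

/* Later access within the same generation: mark the MFU as used. */
static void spp_touch(cpt_entry_t *e, unsigned spg_offset) {
    assert(spg_offset < MFUS_PER_GROUP);
    e->pattern |= (uint8_t)(1u << spg_offset);
}

/* End of the generation: store the observed pattern for future lookups. */
static void spp_end(pht_t *pht, const cpt_entry_t *e) {
    pht->pattern[e->pht_index] = e->pattern;
}
```

Because the same instruction tends to enter different groups at the same offset, one trained entry can serve many groups; that is the intuition behind the compact representation and quick learning the deck attributes to the PC + offset handle.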
![Page 11: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/11.jpg)
Prior Work
• Static profiling, V. Vleet, et al. ICCD 1999
• Adjustable block size, Dubnicki & LeBlanc. ISCA 1992
• Fetching adjacent cache lines, Temam & Jegou. ICS 1994
• Dual cache, Gonzalez, Aliagas & Valero. ICS 1995
• Spatial Locality Detection Table, Johnson, Merten & Hwu. MICRO 1998
• Spatial Footprint Predictor (SFP), Kumar & Wilkerson. ISCA 1998
Key Difference is Prediction Handle: PC + Group Offset
1. Compact Representation
2. Quick Learning
3. High Coverage
![Page 12: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/12.jpg)
Results Overview
Predictor Performance Statistics
Leakage Power Reduction
Performance Improvement w/ Prefetching
![Page 13: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/13.jpg)
Methodology
• SimpleScalar simulator
  • 64KB 2-way L1D/L1I cache, 2-cycle latency
  • 2MB 8-way L2 cache, 12-cycle latency
• SPEC CPU2000 Alpha binaries + reference inputs
• Predictor performance evaluation
  • Simulated to completion
• Performance impact evaluation
  • Skipped 10B and simulated next 500M instructions
• Energy reduction evaluation
  • SPICE w/ 70nm CMOS technology & 1V supply voltage
![Page 14: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/14.jpg)
Practical Predictor: Performance
[Figure: stacked bars for facerec, gcc, mcf, and vortex, each with configurations A, B, C. y-axis: % of perfect predictions (0% to 160%); segments: Correct Prediction, Under-Prediction, Over-Prediction, Training.]
256 entries. A: 16-way; B: DM; C: FA
• 256-entry tag-less direct-mapped: average prediction accuracy of 96%
![Page 15: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/15.jpg)
Predictor Applications
• Leakage energy reduction
  • Sub-blocks as minimum fetch units
  • Cache lines as spatial groups
  • A cache miss starts a spatial group generation
  • Assuming Gated-Ground by Agarwal, Li, & Roy
• Spatial group prefetcher
  • Cache lines as minimum fetch units
  • Adjacent cache lines grouped into spatial groups
  • A new logical tag starts a spatial group generation
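To illustrate how a predicted pattern could drive each application, here is a minimal sketch. It assumes 64-byte cache lines in 1024-byte spatial groups for the prefetcher and eight 8-byte sub-blocks per line for the leakage case; the function names `prefetch_targets` and `gated_subblocks` are hypothetical, and the gating mechanism itself is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES      64u    /* prefetcher MFU: one cache line */
#define GROUP_BYTES     1024u  /* spatial group size used for prefetching */
#define LINES_PER_GROUP (GROUP_BYTES / LINE_BYTES)  /* 16 lines */

/* Prefetcher: write the addresses of lines predicted as used into out[],
   skipping the line that triggered the access (it is fetched on demand).
   Returns the number of prefetch addresses produced. */
static size_t prefetch_targets(uint64_t trigger_addr, uint16_t predicted,
                               uint64_t out[LINES_PER_GROUP]) {
    uint64_t base = trigger_addr - (trigger_addr % GROUP_BYTES);
    unsigned trigger = (unsigned)((trigger_addr % GROUP_BYTES) / LINE_BYTES);
    size_t n = 0;

    assert(out != NULL);
    for (unsigned i = 0; i < LINES_PER_GROUP; i++) {
        if (i != trigger && ((predicted >> i) & 1u))
            out[n++] = base + (uint64_t)i * LINE_BYTES;
    }
    return n;
}

/* Leakage: count the sub-blocks of a line (assumed 8 per line) that the
   predicted pattern marks as unused and that could be put in a
   low-leakage state. */
static unsigned gated_subblocks(uint8_t predicted) {
    unsigned unused = 0;
    for (unsigned i = 0; i < 8u; i++) {
        if (!((predicted >> i) & 1u))
            unused++;
    }
    return unused;
}
```

For a trigger at 0x10040 with predicted bits 0, 1, and 3 set, the sketch requests lines 0x10000 and 0x100C0 and leaves the trigger line to the demand fetch.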
![Page 16: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/16.jpg)
Leakage Energy Reduction
[Figure: bars for facerec, gcc, mcf, vortex, and AVG showing Execution Time Increase (0% to 5%) and Relative Leakage Power (0% to 100%). Annotated values: 60%, <1%, ~2%.]
• Up to 73% leakage energy reduction
• ~40% average leakage energy reduction
• < 1% average performance degradation
![Page 17: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/17.jpg)
Performance Improvement
[Figure: bars for facerec, gcc, mcf, vortex, and AVG. y-axis: -50% to 150%; series: SPG 1024, SPG 512, CONV. 1024, CONV. 512.]
• Up to 2x speedup with 1024B spatial groups
• ~60% average speedup with 1024B spatial groups
![Page 18: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/18.jpg)
Summary
• Spatial Pattern Predictor (SPP)
  • Key Contribution: PC + Group Offset
• Small and Effective, High Coverage
  • 256-entry, tag-less, direct-mapped
  • ~95% coverage
• L1 Data Leakage Energy Reduction
  • ~40% reduction w/ 70nm CMOS technology
  • < 1% average performance degradation
• Prefetching w/ 1024-byte Group
  • Up to 2x speedup and 56% average
  • Conventional cache: 14% slowdown
![Page 19: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/19.jpg)
![Page 20: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/20.jpg)
Prediction Index
• Infinite tables
• PC + SPG offset yields high prediction accuracy
• PC + SPG offset has low prediction memory requirements
[Figure: stacked bars for facerec, gcc, mcf, and vortex, each with index schemes A to D. y-axis: 0% to 160%; segments: Correct Prediction, Under-Prediction, Over-Prediction, Training.]
A: PC; B: PC+SPG ID; C: PC+SPG OFFSET; D: PC+ADDR
![Page 21: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/21.jpg)
Contributions
• Spatial Pattern Predictor (SPP)
  • 256-entry, tag-less, direct-mapped
  • ~95% coverage
• Leakage Energy Reduction
  • ~40% reduction w/ 70nm CMOS technology
  • < 1% average performance degradation
• Processor Performance Improvement
  • Up to 2x speedup
![Page 22: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/22.jpg)
Variations in Spatial Locality
[Figure: stacked bars for ammp, art, bzip, equake, facerec, fma3d, gap, gcc, lucas, mcf, mgrid, and vortex. y-axis: percentage of all cache line usages (0% to 100%); segments: fraction of the line used, in bins <=13%, 14-25%, 26-38%, 39-50%, 51-63%, 64-75%, 76-88%, 89-100%.]
Fraction of data used before eviction. Measured on 64KB 2-way L1D w/ 64B cache lines.
![Page 23: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/23.jpg)
Prediction Index
• PC + SPG offset yields high prediction accuracy
• PC + SPG offset has a low prediction memory requirement
[Figure: stacked bars for ammp, art, bzip, equake, facerec, fma3d, gap, gcc, lucas, mcf, mgrid, and vortex, each with index schemes A to D. y-axis: percent of perfect predictions (0% to 160%); segments: Correct Prediction, Underprediction, Overprediction, Training.]
A: PC-only; B: PC+SPG ID; C: PC+SPG OFFSET; D: PC+ADDR
![Page 24: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/24.jpg)
Predictor Memory Organization
• 256-entry tag-less direct-mapped yields average prediction accuracy of 96%
[Figure: stacked bars for ammp, art, bzip, equake, facerec, fma3d, gap, gcc, lucas, mcf, mgrid, and vortex, each with configurations A to F. y-axis: percent of perfect predictions (0% to 160%); segments: Correct Prediction, Underprediction, Overprediction, Training.]
A: 128-entry 16-way; B: 128-entry DM; C: 128-entry FA; D: 256-entry 16-way; E: 256-entry DM; F: 256-entry FA
![Page 25: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/25.jpg)
Spatial Group Size (1/2)
[Figure: stacked bars for ammp, art, bzip, equake, facerec, fma3d, gap, gcc, lucas, mcf, mgrid, and vortex, each with configurations A to E. y-axis: percentage of perfect predictions (0% to 160%); segments: Correct Prediction, Underprediction, Overprediction, Training.]
A: 16B Spatial Group, 8B Fetch Unit; B: 32B Spatial Group, 8B Fetch Unit; C: 64B Spatial Group, 8B Fetch Unit; D: 128B Spatial Group, 8B Fetch Unit; E: 256B Spatial Group, 8B Fetch Unit
![Page 26: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/26.jpg)
Spatial Group Size (2/2)
[Figure: stacked bars for ammp, art, bzip, equake, facerec, fma3d, gap, gcc, lucas, mcf, mgrid, and vortex, each with configurations A to F. y-axis: percentage of perfect predictions (0% to 160%); segments: Correct Prediction, Underprediction, Overprediction, Training.]
A: 32B Spatial Group, 8B Fetch Unit; B: 64B Spatial Group, 8B Fetch Unit; C: 128B Spatial Group, 8B Fetch Unit; D: 128B Spatial Group, 64B Fetch Unit; E: 256B Spatial Group, 64B Fetch Unit; F: 512B Spatial Group, 64B Fetch Unit
![Page 27: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/27.jpg)
Predictor Memory Organization
[Figure: stacked bars for ammp, art, bzip, equake, facerec, fma3d, gap, gcc, lucas, mcf, mgrid, and vortex, each with table sizes A to G. y-axis: percentage of perfect predictions (0% to 160%); segments: Correct Prediction, Underprediction, Overprediction, Training.]
A: 8-entry; B: 16-entry; C: 32-entry; D: 64-entry; E: 128-entry; F: 256-entry; G: INF
![Page 28: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/28.jpg)
Leakage Energy Reduction
• Up to 73% leakage energy reduction
• ~40% average leakage energy reduction
• < 1% average performance degradation
[Figure: bars for ammp, art, bzip, equake, facerec, fma3d, gap, gcc, lucas, mcf, mgrid, vortex, and AVG showing Execution Time Increase (up to 5%) and Fraction of Baseline Leakage Dissipation (0% to 100%).]
![Page 29: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/29.jpg)
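Performance Improvement
Performance improvement (%) per benchmark:

| Benchmark | CONV. 512B | CONV. 1024B | SPG 512B | SPG 1024B |
|---|---|---|---|---|
| ammp | -41 | -63 | 10 | -25 |
| art | 32 | 96 | 121 | 305 |
| bzip | -43 | -49 | 6 | 8 |
| equake | -34 | -41 | 59 | 99 |
| facerec | -13 | -3 | 58 | 103 |
| fma3d | -9 | -9 | 0 | 0 |
| gap | 20 | 31 | 31 | 47 |
| gcc | -2 | -2 | 1 | 1 |
| lucas | -23 | -67 | 34 | 51 |
| mcf | -27 | -32 | 38 | 67 |
| mgrid | 6 | 12 | 36 | 53 |
| vortex | -27 | -43 | 1 | 1 |
| AVG | -13 | -14 | 33 | 59 |

• Up to 2x speedup with 1024B spatial groups
• ~60% average speedup with 1024B spatial groups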
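As a sanity check on the per-benchmark numbers, the short sketch below recomputes each column's mean and compares it with the AVG row. It assumes each flattened row is read as four signed integers in the order conventional 512B, conventional 1024B, SPG 512B, SPG 1024B, and that the AVG row is a simple arithmetic mean rounded to the nearest integer; the array and function names are hypothetical.

```c
#include <assert.h>

#define NUM_BENCHMARKS 12

/* Rows in order: ammp, art, bzip, equake, facerec, fma3d, gap, gcc,
   lucas, mcf, mgrid, vortex. Values are percent performance improvement
   as read from the slide (assumed column split, see lead-in). */
static const int conv_512[NUM_BENCHMARKS]  =
    { -41, 32, -43, -34, -13, -9, 20, -2, -23, -27, 6, -27 };
static const int conv_1024[NUM_BENCHMARKS] =
    { -63, 96, -49, -41, -3, -9, 31, -2, -67, -32, 12, -43 };
static const int spg_512[NUM_BENCHMARKS]   =
    { 10, 121, 6, 59, 58, 0, 31, 1, 34, 38, 36, 1 };
static const int spg_1024[NUM_BENCHMARKS]  =
    { -25, 305, 8, 99, 103, 0, 47, 1, 51, 67, 53, 1 };

/* Arithmetic mean rounded to the nearest integer (halves away from zero),
   computed in integer arithmetic to avoid floating point. */
static int rounded_mean(const int *v, int n) {
    long sum = 0;
    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += v[i];
    return (int)((sum >= 0 ? 2 * sum + n : 2 * sum - n) / (2L * n));
}
```

Under this reading the four means come out to -13, -14, 33, and 59, in line with the AVG row and with the roughly 60% average speedup quoted for 1024B spatial groups.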
![Page 30: Accurate and Complexity-Effective Spatial Pattern Prediction](https://reader035.fdocuments.us/reader035/viewer/2022062322/568145f5550346895db2fc39/html5/thumbnails/30.jpg)
Predictor Memory Organization
[Figure: stacked bars for facerec, gcc, mcf, and vortex, each with configurations A to F. y-axis: 0% to 160%; segments: Correct Prediction, Under-Prediction, Over-Prediction, Training.]
A: 128-entry 16-way; B: 128-entry DM; C: 128-entry FA; D: 256-entry 16-way; E: 256-entry DM; F: 256-entry FA
• 256-entry tag-less direct-mapped: average prediction accuracy of 96%