Predicting Performance Impact of DVFS for Realistic Memory Systems
Rustam Miftakhutdinov, Eiman Ebrahimi, Yale N. Patt
2
Dynamic Voltage/Frequency Scaling
[Figure: image labeled V and f. Image source: intel.com]
3
Impact of Frequency Scaling
[Figure: time, power, and energy plotted against frequency, with fopt marked]
4
Impact of Frequency Scaling
[Figure: power and time plotted against frequency; label fo]
5
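Prediction Overview
[Figure: frequency and energy per instruction versus instructions (0 to 300K). Panels: time, power, and energy versus frequency, each marked at fo; time × power gives energy, with fopt marked on the energy panel. Label: "our work".]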
6
Outline
Intro to performance prediction
Why realistic memory systems?
Variable memory latency
Prefetching
✓
7
Why Realistic Memory System?
[Figure: image labeled V and f]
8
Prior Work
• Stall time
• Leading loads (2010): S. Eyerman et al., G. Keramidas et al., B. Rountree
Evaluated with a constant-access-latency memory system
9
Energy Savings
[Figure: bar chart of normalized energy savings (%), y-axis 0 to 9. Groups: Constant Access Latency; Realistic DRAM; Realistic DRAM + Streaming Prefetcher. Bars: Oracle, Stall time, Leading loads, Our predictor. Annotation: "< 0.1".]
* Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks. Baseline: most energy-efficient static frequency for SPEC 2006.
11
Outline
Intro to performance prediction
Why realistic memory systems?
Variable memory latency
Prefetching
✓
✓
12
Execution Example
[Figure: timeline of chip activity and memory requests A, B, C, D, E over time; labels 1 to 4]
13
T = Tmemory + Tcompute
Tmemory: independent of frequency
Tcompute: proportional to cycle time
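A minimal Python sketch of this decomposition, assuming Tmemory and Tcompute were measured at a current cycle time t0. The function name and the example numbers are illustrative, not from the slides.

```python
def predict_time(t_memory, t_compute, t0, t):
    """Predict execution time at cycle time t from measurements taken at t0.

    t_memory is treated as independent of frequency; t_compute scales
    in proportion to cycle time.
    """
    return t_memory + t_compute * (t / t0)


# Hypothetical numbers: 6.0 units of memory time and 4.0 units of compute
# time, measured at cycle time t0 = 0.25. Doubling the cycle time (halving
# the frequency) doubles only the compute part.
print(predict_time(6.0, 4.0, 0.25, 0.5))  # 14.0
```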
14
Linear Model
[Figure: execution time T versus cycle time t; Tmemory and Tcompute marked at cycle time to]
15
Measuring Tmemory
[Figure: timeline of chip activity and memory requests over time]
17
Causes of Request Dependences
• Pointer Chasing [Figure: nodes linked by "next" pointers]
• Finite Chip Resources [Figure: instruction window containing two misses]
18
Measuring Tmemory
[Figure: timeline of chip activity and memory requests over time, with Tstart, Tend, and Tmemory marked]
Critical Path Algorithm
1. At Tstart: record Tstart and Tmemory.
2. At Tend: compute path = Tmemory(Tstart) + (Tend - Tstart), that is, old critical path + request latency.
3. Set Tmemory = max(Tmemory, path). The new Tmemory is the length of the critical path.
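The three steps can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: requests are given as hypothetical (Tstart, Tend) pairs with Tend > Tstart, events are replayed in time order, and a completion that coincides with another request's start is processed first.

```python
def critical_path_tmemory(requests):
    """Return Tmemory as the length of the critical path through the requests.

    requests: sequence of (t_start, t_end) pairs, each with t_end > t_start.
    """
    requests = list(requests)
    events = []
    for idx, (t_start, t_end) in enumerate(requests):
        events.append((t_start, 1, idx))  # kind 1: request starts
        events.append((t_end, 0, idx))    # kind 0: request ends (sorts first on ties)
    events.sort()

    t_memory = 0.0
    recorded = {}  # Tmemory as seen by each request when it started
    for time, kind, idx in events:
        if kind == 1:
            # Step 1: record Tmemory at Tstart.
            recorded[idx] = t_memory
        else:
            # Step 2: old critical path + request latency.
            path = recorded[idx] + (time - requests[idx][0])
            # Step 3: keep the longest path seen so far.
            t_memory = max(t_memory, path)
    return t_memory


# Two overlapping requests followed by a later one (made-up timestamps).
print(critical_path_tmemory([(0, 100), (10, 110), (120, 200)]))  # 180.0
```

In this example the two overlapping requests contribute 100 time units together, not 200, because neither extends the path recorded when the other started.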
20
Linear Model
[Figure: execution time T versus cycle time t, with Tmemory and Tcompute marked at cycle time to. The resulting time versus frequency curve, marked at fo, is multiplied (×) by power versus frequency to give energy versus frequency, with fopt marked.]
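A short Python sketch of how a time prediction could feed the choice of fopt: predicted time at each candidate frequency is multiplied by power at that frequency, and the frequency with the lowest energy is kept. The slides do not give a power model, so the power table, the function name, and all numbers are hypothetical.

```python
def pick_energy_optimal_frequency(t_memory, t_compute, f0, power_at):
    """Return (frequency, energy) minimizing predicted time × power.

    t_memory, t_compute: times measured at the current frequency f0.
    power_at: dict mapping each candidate frequency to chip power
              (hypothetical values supplied by the caller).
    """
    best_f, best_energy = None, float("inf")
    for f, power in power_at.items():
        # Linear model: compute time scales with cycle time, i.e. with f0 / f.
        time = t_memory + t_compute * (f0 / f)
        energy = time * power
        if energy < best_energy:
            best_f, best_energy = f, energy
    return best_f, best_energy


# Made-up power table for three frequencies (GHz -> W).
power_table = {1.0: 30.0, 2.0: 40.0, 4.0: 90.0}
print(pick_energy_optimal_frequency(6.0, 4.0, 4.0, power_table))  # (2.0, 560.0)
```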
22
Critical Path: Variable Access Latency
[Figure: timeline of chip activity and memory requests over time]
Leading Loads: Constant Access Latency
[Figure: timeline of chip activity and memory requests over time]
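For comparison, a hedged Python sketch of the leading-loads estimate as it is commonly described in the prior work: Tmemory is the summed latency of "leading" requests, meaning those issued while no other request is outstanding. The slides do not spell out this definition, so treat it, the function name, and the timestamps as assumptions.

```python
def leading_loads_tmemory(requests):
    """Estimate Tmemory as the total latency of leading loads.

    requests: iterable of (t_start, t_end) pairs.
    A request counts as leading if no earlier request is still outstanding
    when it starts.
    """
    t_memory = 0.0
    busy_until = float("-inf")  # latest completion time among requests seen so far
    for t_start, t_end in sorted(requests):
        if t_start >= busy_until:
            t_memory += t_end - t_start
        busy_until = max(busy_until, t_end)
    return t_memory


# The second request overlaps the first but takes longer (variable latency).
print(leading_loads_tmemory([(0, 100), (10, 150), (160, 260)]))  # 200.0
```

Here only the first and third requests are leading, so the extra latency of the second request is not counted. With constant access latency, overlapping requests would take as long as their leading load and the omission would not matter.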
23
Leading Loads
[Figure: execution time T versus cycle time t, with Tmemory and Tcompute marked at cycle time to; label "leading loads". The resulting time versus frequency curve, marked at fo, is multiplied (×) by power versus frequency to give energy versus frequency, with fopt marked.]
25
Energy Savings
[Figure: bar chart of normalized energy savings (%), y-axis 0 to 8. Groups: Constant Access Latency; Realistic DRAM. Bars: Oracle, Stall time, Leading loads, Our predictor.]
* Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks. Baseline: most energy-efficient static frequency for SPEC 2006.
26
Outline
Intro to performance prediction
Why realistic memory systems?
Variable memory latency
Prefetching
✓
✓
✓
27
Streaming Workload
[Figure: two timelines of chip activity and memory requests over time, one with Prefetcher OFF and one with Prefetcher ON]
28
Limited Bandwidth Model
[Figure: execution time T versus cycle time t; labels Tdemand, Tcompute, Tmemorymin, tcrossover, and 0]
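One plausible reading of these labels, sketched in Python: execution time follows the linear model (Tdemand plus scaled Tcompute) until it would drop below Tmemorymin, the minimum time the memory system needs, and stays flat at that floor for cycle times below tcrossover. The slide itself shows only the labels, so the formula, the function names, and the numbers are assumptions.

```python
def predict_time_limited_bandwidth(t, t0, t_demand, t_compute, t_memory_min):
    """Predict execution time at cycle time t under a bandwidth floor.

    t_demand and t_compute are measured at cycle time t0; t_memory_min is
    the assumed minimum time the memory system needs for the workload.
    """
    linear = t_demand + t_compute * (t / t0)
    return max(t_memory_min, linear)


def crossover_cycle_time(t0, t_demand, t_compute, t_memory_min):
    """Cycle time at which the linear prediction equals t_memory_min."""
    return t0 * (t_memory_min - t_demand) / t_compute


# Made-up measurements at t0 = 1.0.
print(predict_time_limited_bandwidth(1.0, 1.0, 2.0, 4.0, 4.0))   # 6.0
print(predict_time_limited_bandwidth(0.25, 1.0, 2.0, 4.0, 4.0))  # 4.0
print(crossover_cycle_time(1.0, 2.0, 4.0, 4.0))                  # 0.5
```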
29
Energy Savings
[Figure: bar chart of normalized energy savings (%), y-axis 0 to 9. Groups: Constant Access Latency; Realistic DRAM; Realistic DRAM + Streaming Prefetcher. Bars: Oracle, Stall time, Leading loads, Our predictor. Annotation: "< 0.1".]
* Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks. Baseline: most energy-efficient static frequency for SPEC 2006.
30
Recap
Intro to performance prediction
Why realistic memory systems?
Variable memory latency
Prefetching
✓
✓
✓
✓
31
Final Thought
Performance predictors need realistic evaluation