Reducing Energy Consumption of Disk
Storage Using Power-Aware Cache Management
Q. Zhu, F. David, C. Devaraj, Z. Li, Y. Zhou, P. Cao*
University of Illinois at Urbana-Champaign & *Cisco Systems Inc.
HPCA '04
Presented by: Justin Kliger & Scott Schneider
Motivation
Reduce energy consumption
Targeting large data centers such as the EMC Symmetrix: 10-50 TBytes of storage, 128 GB of non-volatile memory cache
Motivation
Dealing with large data storage and large caches separated from application servers
Motivation
Data centers consume huge amounts of power: currently 150-200 W/ft²
Power density is expected to increase up to 25% annually
Storage devices already account for 27% of power consumption at a data center
Significance of reducing energy consumption:
Can limit costs to these data centers
Keeps energy costs from becoming prohibitive and preventing data center expansion
Positive environmental impacts
Motivation
Focus on the cache replacement algorithm: conserve energy by changing the average idle time of disks
Create a power-aware algorithm
Designate priority disks, allowing some disks to greatly increase idle time
Selectively keep blocks from priority disks in the cache to decrease power usage
Outline
Background for disk power model
Off-line analysis
Online algorithm
Evaluation & results
Write policies
Conclusion, impact & further research
Disk Power Model
Conventional disks have three states
Active and Idle consume full power
Standby consumes less power, but requires a spin-up to satisfy a request
Gurumurthi et al. proposed multi-speed disks
Lower rotational speeds consume less energy
Transitioning from a lower speed to the next higher speed costs less than switching from standby to active
Disk Power Model
Their disk model uses these proposed multi-speed disks
Multi-speed disks can be configured to service requests at all speeds or only at the highest speed
Servicing requests only at the highest speed makes the disks essentially multi-state disks, as opposed to two-state disks
Their model uses 4 intermediate lower-power modes
Disk Power Management
Oracle disk power management (DPM)
Term for the case when the entire request sequence is known ahead of time, so perfect power management is possible
Provides an upper bound on energy saved
Upon request completion, Oracle DPM examines the interval length t until the next request
If t is greater than the break-even time, it spins the disk down immediately
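The spin-down rule above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; the power and transition-energy constants below are assumptions chosen only to make the arithmetic concrete.

```python
# Sketch of the Oracle DPM decision rule. All constants are assumptions,
# not the paper's disk parameters.
P_IDLE = 13.5         # W: power while spinning idle at full speed (assumed)
P_STANDBY = 2.5       # W: power in standby (assumed)
E_TRANSITION = 135.0  # J: extra energy for one spin-down + spin-up (assumed)

def break_even_time():
    # Idle duration at which spinning down pays off:
    # P_IDLE * t == E_TRANSITION + P_STANDBY * t
    return E_TRANSITION / (P_IDLE - P_STANDBY)

def oracle_dpm(next_interval):
    """Oracle DPM knows the gap t until the next request and spins the
    disk down immediately iff t exceeds the break-even time."""
    return "spin_down" if next_interval > break_even_time() else "stay_idle"
```

With these assumed constants the break-even time is about 12.3 s, so a known 60 s gap triggers an immediate spin-down while a 5 s gap does not.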
Disk Power Management
The minimum energy consumption follows the lower envelope of the consumption lines for the individual states
Online algorithms use the crossover points of these lines as thresholds
DPM and Cache Replacement
[Diagram: the storage cache's replacement policy sits above per-disk DPM modules, one per disk]
Cache Replacement Policy
Power-aware off-line algorithms
The optimal cache-hit algorithm (Belady's) can be suboptimal for power consumption
[Figure 3: An example showing Belady's algorithm is not energy-optimal]
Belady's algorithm: 6 misses; 24 time units at high energy consumption
Power-aware off-line algorithms
The optimal cache-hit algorithm (Belady's) can be suboptimal for power consumption
[Figure 3: An example showing Belady's algorithm is not energy-optimal]
An energy-optimal alternative: 7 misses; 16 time units at high energy consumption
Power-Aware Off-line Algorithms
Energy-optimal algorithm
Developed a polynomial-time dynamic-programming algorithm, but it does not help in building an online algorithm; details not included
Off-line Power-Aware Greedy algorithm (OPG)
More practical; used as the comparison point in the trace evaluation
Considers future deterministic misses
Power-Aware Off-line Algorithms
OPG evicts the block with the minimum energy penalty:
With Practical DPM: OL(Li) + OL(Fi) − OL(Li + Fi)
With Oracle DPM: LE(Li) + LE(Fi) − LE(Li + Fi)
Time complexity is O(n²)
A heuristic, because it only considers the current set of deterministic misses
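The penalty has an intuitive reading: evicting a block splits one long idle gap (Li + Fi) into two shorter gaps (Li and Fi), and the penalty is the extra energy that split costs. A hedged sketch of this greedy choice for a two-state disk under Oracle DPM, with assumed power constants:

```python
# Sketch of OPG's eviction rule for a two-state disk. Constants are
# illustrative assumptions, not the paper's disk parameters.
P_IDLE, P_STANDBY, E_TRANS = 13.5, 2.5, 135.0  # W, W, J (assumed)

def min_energy(t):
    """Minimum energy an oracle DPM spends over an idle gap of t seconds:
    either stay idle, or pay the transition energy and sleep in standby."""
    return min(P_IDLE * t, E_TRANS + P_STANDBY * t)

def penalty(L, F):
    """Energy penalty of evicting a block: E(L) + E(F) - E(L + F)."""
    return min_energy(L) + min_energy(F) - min_energy(L + F)

def opg_victim(candidates):
    """candidates: {block: (L, F)}, where L and F are the idle gaps on the
    block's disk before and after its next deterministic miss (assumed
    interpretation). OPG evicts the block with the minimum penalty."""
    return min(candidates, key=lambda b: penalty(*candidates[b]))
```

Short gaps on both sides give a penalty of zero (the disk would stay spun up anyway), so such blocks are cheap to evict; blocks whose eviction breaks a long sleep carry a large penalty and are kept.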
Power-Aware Online Algorithm
Insight gained from off-line analysis: avoid evicting blocks with large energy penalties
Small increases in idle time can have big energy gains
Online Approach
Online algorithm goals
Use the cache replacement policy to reshape each disk's access pattern
Give priority to blocks from inactive disks to increase their average idle time
Allow the average interval time of some disks to decrease so that the interval time of already-idle disks can increase
Overall energy consumption is reduced
Other Online Factors
Potential energy savings are also determined by:
The percentage of capacity misses in a workload must be high; cold misses can't be avoided
The distribution of accesses determines actual interval lengths; a larger deviation from the mean offers more opportunity for savings
An online algorithm needs to identify these properties for each disk to make good decisions
Tracking Cold Misses
Use a Bloom filter to track cold misses:
Allocate a vector v of m bits
Use k independent hash functions h1, h2, ..., hk
Example: m = 13, k = 4
Initially: 0 0 0 0 0 0 0 0 0 0 0 0 0
Inserting block a sets bits h1(a), h2(a), h3(a), h4(a): 0 1 0 0 1 0 1 0 0 1 0 0 0
Identifying a cold miss is always correct; false positives are possible for non-cold misses
For 1.6M blocks with m = 2M bits and k = 7, false positives occur 0.82% of the time
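A minimal Bloom filter for this purpose can be sketched as follows. The hash scheme (deriving k indexes from one SHA-256 digest) is an assumption for brevity; the paper does not specify its hash functions.

```python
# Minimal Bloom filter sketch for cold-miss tracking. The hash derivation
# is an assumption, not the paper's construction.
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = 0  # m-bit vector, stored as one big integer

    def _indexes(self, key):
        # Derive k bit positions from a single digest (assumed scheme).
        d = hashlib.sha256(str(key).encode()).digest()
        return [int.from_bytes(d[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, key):
        for i in self._indexes(key):
            self.bits |= 1 << i

    def __contains__(self, key):
        return all(self.bits >> i & 1 for i in self._indexes(key))

def is_cold_miss(bf, block):
    """A miss on `block` is cold iff the filter has never seen it;
    record the block either way."""
    cold = block not in bf
    bf.add(block)
    return cold
```

Because membership tests never miss an inserted key, a reported cold miss is always genuinely cold; only the reverse direction (a non-cold miss reported as seen) can be a false positive.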
Distribution Estimate
Disk access distribution is estimated using an epoch-based histogram technique
An approximation method estimates the cumulative distribution function of a disk's interval length, F(x) = P[X ≤ x]
In each epoch:
Track the interval length between consecutive disk accesses
Each interval falls into a discrete range
Summing the counts of a range and all ranges above it approximates the probability that an interval is at least that long, i.e., 1 − F(x)
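The epoch-based estimate above can be sketched with a small histogram class. The bucket boundaries are illustrative assumptions; the paper's ranges may differ.

```python
# Sketch of the epoch-based interval-length histogram. Bucket boundaries
# are assumed for illustration.
import bisect

class IntervalHistogram:
    def __init__(self, bounds=(1, 2, 4, 8, 16, 32, 64, 128)):
        self.bounds = list(bounds)            # seconds, bucket upper edges
        self.counts = [0] * (len(bounds) + 1)  # one extra overflow bucket
        self.last_access = None

    def record_access(self, now):
        """Bucket the gap between this access and the previous one."""
        if self.last_access is not None:
            gap = now - self.last_access
            self.counts[bisect.bisect_left(self.bounds, gap)] += 1
        self.last_access = now

    def prob_at_least(self, x):
        """Approximate P[interval >= x]: the counts of the bucket containing
        x and all buckets above it, over the total count."""
        total = sum(self.counts)
        if total == 0:
            return 0.0
        i = bisect.bisect_left(self.bounds, x)
        return sum(self.counts[i:]) / total
```

For example, accesses at t = 0, 10, 20, 21 produce gaps of 10, 10, and 1 seconds, so the estimated probability of an interval of at least 8 seconds is 2/3.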
Power Aware Cache Management
Dynamically track cold misses and the cumulative distribution of interval lengths for each disk
Each epoch, classify disks as:
Priority disks: a "small" percentage of cold misses and large interval lengths with "high" probability
Regular disks: the rest
Power Aware Cache Management
The basic idea: reshape the access pattern to keep priority disks idle
PA can be combined with other algorithms such as LIRS, ARC, MQ, etc.
Example: PA-LRU
PA-LRU employs two LRU stacks
LRU0 keeps regular-disk blocks; LRU1 keeps priority-disk blocks
Blocks are evicted from the bottom of LRU0 first, then from the bottom of LRU1
Parameters:
α is the cold-miss threshold
p is the cumulative probability
β is the interval-length (CDF) threshold
epoch length
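The two-stack structure can be sketched as below. This is a simplified sketch: the per-epoch disk classification is assumed to be supplied externally (from the cold-miss and interval statistics), and re-classifying cached blocks mid-epoch is not handled.

```python
# Hedged sketch of PA-LRU's two-stack eviction. Disk classification is
# assumed to come from the epoch statistics; not the paper's full code.
from collections import OrderedDict

class PALRU:
    def __init__(self, capacity, priority_disks):
        self.capacity = capacity
        self.priority_disks = set(priority_disks)
        self.lru0 = OrderedDict()  # regular-disk blocks (evicted first)
        self.lru1 = OrderedDict()  # priority-disk blocks (evicted last)

    def access(self, block, disk):
        """Return True on a cache hit, False on a miss."""
        for stack in (self.lru0, self.lru1):
            if block in stack:
                stack.move_to_end(block)  # hit: move to MRU position
                return True
        # Miss: evict from the bottom of LRU0 first, then LRU1.
        if len(self.lru0) + len(self.lru1) >= self.capacity:
            victim_stack = self.lru0 if self.lru0 else self.lru1
            victim_stack.popitem(last=False)
        target = self.lru1 if disk in self.priority_disks else self.lru0
        target[block] = disk
        return False
```

Because evictions always drain LRU0 first, blocks from priority disks linger in the cache, stretching those disks' idle intervals past the spin-down threshold.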
Overall Design
[Diagram: the PA classification engine informs the cache replacement algorithm, which sits above per-disk DPM modules, one per disk]
Evaluation
Specifications of disk:
Additionally, 4 more low-speed power modes: 12000RPM, 9000RPM, 6000RPM, 3000RPM
Evaluation
Two system traces: OLTP & Cello96
Cache size: 128 MBytes for OLTP; 32 MBytes for Cello96
Evaluation
Compared 4 algorithms: Belady's, OPG, LRU, PA-LRU
Also measured disk energy consumption with an infinite cache size, which provides a lower bound (only cold misses access the disk)
Practical DPM uses the thresholds identified earlier to remain competitive with Oracle DPM
PA-LRU uses 4 parameters: epoch length = 15 minutes, α = 50%, p = 80%, β = 5 seconds
Results
PA-LRU saves 16% more energy than LRU on the OLTP trace
However, PA-LRU saves only 2-3% more energy than LRU on the Cello96 trace
Results
How PA-LRU improves performance:
[Figure: comparison of LRU and PA-LRU]
Write Policies
Four write policies:
WB: Write-Back writes dirty blocks only upon eviction
WT: Write-Through writes dirty blocks immediately
WBEU: Write-Back with Eager Updates writes a block immediately if its disk is active; otherwise it waits
WTDU: Write-Through with Deferred Update writes dirty blocks to a log if the target disk is in a low-power mode
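The four policies differ only in where a dirty block goes as a function of the target disk's state, which a small dispatch function can make explicit. This is a sketch of the decision logic only; the disk-state flags are assumed inputs, and the log-replay path of WTDU is omitted.

```python
# Sketch of the four write policies' dispatch logic. Disk-state flags are
# assumed inputs; this is not the paper's implementation.
def write_action(policy, disk_active, disk_low_power):
    """Return where a dirty block goes under each policy:
    'disk' = written through now, 'cache' = held until eviction,
    'log' = appended to a log and replayed later."""
    if policy == "WT":
        return "disk"                 # always write through immediately
    if policy == "WB":
        return "cache"                # flush only on eviction
    if policy == "WBEU":              # eager update while disk is spun up
        return "disk" if disk_active else "cache"
    if policy == "WTDU":              # defer to a log while disk sleeps
        return "log" if disk_low_power else "disk"
    raise ValueError(policy)
```

The energy intuition: WBEU and WTDU both avoid forcing a sleeping disk to spin up for a write, either by buffering the block (WBEU) or by persisting it elsewhere (WTDU), while plain WT wakes the disk every time.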
Write Policies Evaluation
Synthetic traces varied write/read ratios and interarrival times
WB vs. WT: Write-Back is consistently better, by up to 20%
WBEU vs. WT: WBEU is consistently better, by up to 65%
WTDU vs. WT: WTDU is consistently better, by up to 55%
Conclusion
Effective off-line algorithm and analysis (OPG)
Designed and evaluated a power-aware online algorithm (PA-LRU), which can use 16% less energy than LRU
Considered write-policy effects on energy savings
Impact of work
Theoretical off-line analysis of power-aware caching policies
Identification of requirements for an online power-aware caching algorithm
Published in 2004; 3 self-citations plus Pinheiro & Bianchini (ICS '04) and Papathanasiou & Scott (USENIX '04)
Further Research
Reduce the number of parameters (PB-LRU)
Apply the online algorithm to single disks
Prefetching
Consider the storage cache's own energy consumption
Evaluate in a real storage system context