10 years of research on Power Management (now called green computing)
![Page 1: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/1.jpg)
10 years of research on Power Management
(now called green computing)
Rami Melhem
Daniel Mosse
Bruce Childers
![Page 2: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/2.jpg)
• Introduction
• Power management in real-time systems
• Power management in multi-core processors
• Performance-Resilience-Power Tradeoff
• Management of memory power
• Phase Change Memory
![Page 3: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/3.jpg)
Power management techniques
Two common techniques:
1) Throttling: turn off (or change the mode of) unused components. This requires predicting usage patterns to avoid the time and energy overhead of on/off or mode switching.
2) Frequency and voltage scaling: scale down the core's speed (frequency and voltage).
Designing power efficient components is orthogonal to power management
![Page 4: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/4.jpg)
Frequency/voltage scaling
• Gracefully reduce performance
• Dynamic power: $P_d = C f^3 + P_{ind}$
• Static power: independent of $f$

[Figure: power vs. time at full and at reduced frequency, showing the $C f^3$, $P_{ind}$ and static power components and the idle time.]

When frequency is halved:
• Time is doubled
• $C f^3$ is divided by 8
• Energy caused by $C f^3$ is divided by 4
• Energy caused by $P_{ind}$ is doubled
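The arithmetic behind halving the frequency can be checked numerically. The sketch below is only an illustration of the power model on this slide; the values chosen for `c`, `p_ind` and the cycle count are arbitrary assumptions, not figures from the talk.

```python
def run(freq, cycles, c=1.0, p_ind=0.1):
    """Return (time, energy from C*f^3, energy from Pind) for `cycles` at `freq`.

    c and p_ind are illustrative constants, not measured values.
    """
    time = cycles / freq                 # execution time grows as 1/f
    cubic_energy = c * freq**3 * time    # energy = power * time
    pind_energy = p_ind * time
    return time, cubic_energy, pind_energy


t_full, e_cubic_full, e_ind_full = run(1.0, 1000)
t_half, e_cubic_half, e_ind_half = run(0.5, 1000)

print("time ratio:", t_half / t_full)
print("C*f^3 energy ratio:", e_cubic_full / e_cubic_half)
print("Pind energy ratio:", e_ind_half / e_ind_full)
```

Whether slowing down saves energy overall therefore depends on how large the frequency-independent term is relative to the cubic term.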
![Page 5: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/5.jpg)
Different goals of power management
• Minimize total energy consumption
  – static energy decreases with speed
  – dynamic energy increases with speed
• Minimize the energy-delay product
  – takes performance into consideration
• Minimize the maximum temperature
• Maximize performance given a power budget
• Minimize energy given a deadline
• Minimize energy given reliability constraints

[Figure: energy vs. speed (f), with curves $C f^2$, $P_{ind}/f$ and their total; energy*delay vs. f, with curves $C f$ and $P_{ind}/f^2$.]
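Taking the curves in the figure at face value, the energy per unit of work is $C f^2 + P_{ind}/f$ and the energy-delay product is $C f + P_{ind}/f^2$. Setting the derivative of each to zero gives a closed-form optimal speed. The sketch below assumes exactly these two expressions; the function names and the sample constants are hypothetical.

```python
def energy(f, c, p_ind):
    """Energy per unit of work: dynamic part C*f^2 plus Pind/f."""
    return c * f**2 + p_ind / f


def energy_optimal_speed(c, p_ind):
    """Solve dE/df = 2*C*f - Pind/f^2 = 0 for f."""
    return (p_ind / (2.0 * c)) ** (1.0 / 3.0)


def edp_optimal_speed(c, p_ind):
    """Solve d(E*delay)/df = C - 2*Pind/f^3 = 0 for f."""
    return (2.0 * p_ind / c) ** (1.0 / 3.0)


c, p_ind = 1.0, 0.25                      # illustrative constants only
f_e = energy_optimal_speed(c, p_ind)
f_edp = edp_optimal_speed(c, p_ind)
print("energy-optimal speed:", f_e)
print("energy-delay-optimal speed:", f_edp)
```

Under this model the energy-delay optimum is always at a higher speed than the energy optimum, which is one way to see that it takes performance into consideration.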
![Page 6: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/6.jpg)
DVS in real-time systems
• Utilize slack to slow down future tasks (Proportional, Greedy, aggressive, …)

[Figure: CPU speed vs. time between Smin and Smax, showing worst-case execution and the remaining time before the deadline, under static scaling and under dynamic scaling (power management points).]
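A minimal sketch of dynamic scaling with slack reclamation, under stated assumptions: tasks run back to back, execution times are expressed at full speed, and at each task boundary (treated here as a power management point) the speed is reset to the remaining worst-case work divided by the remaining time. This is one simple greedy scheme, not necessarily the one used in the original work, and all names are hypothetical.

```python
def run_with_pmps(wcets, actuals, deadline, s_min=0.2, s_max=1.0):
    """Simulate speed selection at task boundaries.

    wcets   -- worst-case execution time of each task at full speed
    actuals -- actual execution time of each task at full speed
    Returns the speed chosen for each task and the finish time.
    """
    now = 0.0
    speeds = []
    for i, actual in enumerate(actuals):
        remaining_wcet = sum(wcets[i:])       # worst case still ahead
        remaining_time = deadline - now       # time left before the deadline
        speed = remaining_wcet / remaining_time
        speed = min(s_max, max(s_min, speed)) # clamp to the supported range
        speeds.append(speed)
        now += actual / speed                 # slower speed stretches execution
    return speeds, now


speeds, finish = run_with_pmps([2, 2, 2], [1, 1, 1], deadline=10)
print(speeds, finish)
```

Because each speed is computed from the worst case, the deadline is still met if every remaining task runs for its full WCET; tasks that finish early leave slack that lowers the speed of later tasks.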
![Page 7: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/7.jpg)
Implementation of Power Management Points
• Can be implemented as periodic OS interrupts
• Difficulty: the OS does not know how much execution is remaining
• The compiler can insert code to provide hints to the OS

[Figure: min, average and max remaining execution time at a branch and at a loop.]
![Page 8: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/8.jpg)
Example of compiler/OS collaboration
At a power management hint
• Compiler records WCET based on the longest remaining path

At a power management point
• OS uses knowledge about the current load to set the speed

[Figure: min, average and max remaining execution time.]
![Page 9: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/9.jpg)
Compiler/OS collaboration
• Compiler (knows the task): static analysis of the application source code, PMHs (power management hints)
• OS/HW (knows the system): run-time information, PMPs (power management points)

[Figure: timeline showing PMHs in the code and the interrupts for executing PMPs.]
![Page 10: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/10.jpg)
DVS for multiple cores
Manage energy by determining:
• The speed for the serial section
• The number of cores used in the parallel section
• The speed in the parallel section

To derive a simple analytical model, assume Amdahl's law: p% of the computation can be perfectly parallelized.

[Figure: serial section (s) and parallel section (p) executed on one core and on two cores; options shown are using more cores, slowing down the cores, and slowing down the parallel section.]
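One way to make the Amdahl-based model concrete is sketched below. It assumes the per-core power model from the earlier slide ($C f^3 + P_{ind}$), normalizes total work to 1, and assumes that cores not in use during the serial section are turned off and consume nothing. These assumptions and all names are illustrative, not taken from the talk.

```python
def time_and_energy(p, n_cores, f_serial, f_parallel, c=1.0, p_ind=0.1):
    """Execution time and energy for a workload with parallel fraction p.

    Assumes idle cores are off during the serial section (an assumption).
    """
    s = 1.0 - p                                   # serial fraction
    t_serial = s / f_serial
    t_parallel = p / (n_cores * f_parallel)       # perfect parallel speedup

    def core_power(f):
        return c * f**3 + p_ind

    e_serial = core_power(f_serial) * t_serial
    e_parallel = n_cores * core_power(f_parallel) * t_parallel
    return t_serial + t_parallel, e_serial + e_parallel


# One core at full speed vs. two cores with the parallel section at half speed.
t_one, e_one = time_and_energy(0.8, 1, 1.0, 1.0)
t_two, e_two = time_and_energy(0.8, 2, 1.0, 0.5)
print(t_one, e_one)
print(t_two, e_two)
```

In this example both configurations finish at the same time, but the two-core configuration with a slowed-down parallel section uses less energy.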
![Page 11: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/11.jpg)
Mapping streaming applications to CMPs
• Streaming applications are prevalent
  – Audio, video, real-time tasks, cognitive applications
• Constraints:
  – Inter-arrival time (T)
  – End-to-end delay (D)
• Power-aware mapping to CMPs
  – Determine speeds
  – Account for communication
  – Exclude faulty cores
![Page 12: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/12.jpg)
Mapping a linear task graph onto a linear pipeline
If the # of stages = # of cores

Find $t_{stage}$ to

minimize $\sum_{stage=0}^{n} (e_{stage} + \epsilon_{stage})$

subject to
$\sum_{stage=0}^{n} (t_{stage} + \tau_{stage}) \le D$
$\max_{stage} (t_{stage} + \tau_{stage}) \le T$

where
• $e_i$: energy for executing stage i
• $\epsilon_i$: energy for moving data from stage i-1 to stage i
• $t_i$: time for executing stage i
• $\tau_i$: time for moving data from stage i-1 to stage i
![Page 13: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/13.jpg)
Mapping a linear task graph onto a linear pipeline
If the # of stages > # of cores
1) Group the stages so that the number of stage groups equals the number of cores
2) Use a dynamic programming approach to explore possible groupings
3) A faster approximate solution can guarantee a result within a specified error bound of the optimum
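The grouping step can be illustrated with a small dynamic program. This is a sketch under simplifying assumptions, not the algorithm from the original work: stages are grouped contiguously, communication costs are ignored, each group must finish its work within the inter-arrival time `period`, so it runs at frequency `work / period`, and only dynamic energy (`c * f**2 * work`) is counted. All names are hypothetical.

```python
def group_stages(work, cores, period, c=1.0):
    """Partition stages into `cores` contiguous groups with minimum energy.

    work -- cycles needed by each pipeline stage (len(work) >= cores)
    Returns (total_energy, groups) where groups lists stage indices per core.
    """
    n = len(work)
    prefix = [0.0]
    for w in work:
        prefix.append(prefix[-1] + w)

    def group_energy(i, j):
        """Energy of running stages i..j-1 on one core within `period`."""
        w = prefix[j] - prefix[i]
        f = w / period
        return c * f**2 * w

    inf = float("inf")
    # best[j][k]: minimum energy for the first j stages using k cores
    best = [[inf] * (cores + 1) for _ in range(n + 1)]
    cut = [[0] * (cores + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, n + 1):
        for k in range(1, min(j, cores) + 1):
            for i in range(k - 1, j):             # last group is stages i..j-1
                cand = best[i][k - 1] + group_energy(i, j)
                if cand < best[j][k]:
                    best[j][k] = cand
                    cut[j][k] = i

    groups, j = [], n
    for k in range(cores, 0, -1):                 # walk the cuts backwards
        i = cut[j][k]
        groups.append(list(range(i, j)))
        j = i
    groups.reverse()
    return best[n][cores], groups


total, groups = group_stages([1, 2, 3, 4], cores=2, period=1.0)
print(total, groups)
```

The table has (stages x cores) entries and each entry scans the possible positions of the last cut, so the cost grows cubically in the worst case, which is why a faster approximate method can be attractive for long pipelines.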
![Page 14: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/14.jpg)
Mapping a non-linear task graph onto CMP
• Timing constraints are conventionally satisfied through load-balanced mapping
• Additional constraints
  – Minimize energy consumption
  – Maximize performance for a given energy budget
  – Avoid faulty cores

[Figure: task graph with nodes A through K, repeated per instance, mapped onto cores running at maximum, medium and minimum speed.]
![Page 15: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/15.jpg)
Turn OFF some PEs

[Figure: the task graph (nodes A through K, repeated per instance) remapped onto PEs running at maximum speed/voltage (fmax), medium speed/voltage or minimum speed/voltage (fmin), with some PEs turned OFF.]
![Page 16: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/16.jpg)
DVS using Machine Learning
Characterize the execution state of a core by
• Rate of instruction execution (IPC)
• # of memory accesses per instruction
• Average memory access time (depends on other threads)

During training, record for each state
• The core frequency
• The energy consumption
Determine the optimal frequency for each state.

During execution, periodically:
• Estimate the current state (through run-time measurements)
• Assume that the future is a continuation of the present
• Set the frequency to the best recorded during training

[Figure: CMP with four cores, each with its own L1 and L2 caches, and an MMC.]
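The train-then-look-up structure described above can be sketched as a simple table. This is an illustration only: the state quantization (bin widths), the sample measurements and the fallback frequency are all assumptions, and a plain "lowest recorded energy per state" table stands in for whatever learning method the original work used.

```python
def quantize(ipc, mem_per_instr, mem_latency_ns):
    """Map raw measurements to a discrete state (bin widths are assumptions)."""
    return (int(ipc / 0.5), int(mem_per_instr / 0.05), int(mem_latency_ns / 50))


def train(samples):
    """samples: iterable of (state, frequency, energy) recorded during training.

    Returns a policy mapping each state to its lowest-energy frequency.
    """
    best = {}
    for state, freq, energy in samples:
        if state not in best or energy < best[state][1]:
            best[state] = (freq, energy)
    return {state: freq for state, (freq, _) in best.items()}


def choose_frequency(policy, state, default_freq):
    """At run time, assume the near future looks like the current state."""
    return policy.get(state, default_freq)   # unseen state: fall back


compute_bound = quantize(1.2, 0.02, 80)
memory_bound = quantize(0.4, 0.12, 180)
policy = train([
    (compute_bound, 2.0, 9.0), (compute_bound, 1.0, 11.0),
    (memory_bound, 2.0, 8.0), (memory_bound, 1.0, 5.0),
])
print(choose_frequency(policy, memory_bound, default_freq=2.0))
```

Falling back to a default (here the highest) frequency for states never seen in training is a conservative design choice in this sketch; it trades energy for predictable performance.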
![Page 17: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/17.jpg)
Statistical learning applied to DVS in CMPs.

[Figure: training phase (learning engine, automatic policy generator) and runtime (integrated DVS policy that determines frequencies and voltages).]
![Page 18: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/18.jpg)
Energy-Reliability tradeoff
Using time redundancy (checkpointing and rollbacks)

If you have time slack:
1) add checkpoints
2) reserve recovery time
3) reduce processing speed

For a given number of checkpoints, we can find the speed that minimizes energy consumption while guaranteeing recovery and timeliness.

[Figure: speed vs. time up to the deadline, relative to Smax.]
![Page 19: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/19.jpg)
Optimal number of checkpoints
More checkpoints = more overhead + less recovery slack

For a given slack (C/D) and checkpoint overhead (r/C), we can find the number of checkpoints that minimizes energy consumption while guaranteeing recovery and timeliness.

[Figure: energy vs. # of checkpoints, with D, C and r marked on the execution timeline.]
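A rough numerical sketch of this tradeoff follows. The model is an assumption made for illustration, not the formulation from the original work: C is the computation time at Smax (normalized to 1), r is the time per checkpoint at Smax, a single fault must be tolerated by re-running one segment (C/n plus one checkpoint) at the same speed, and energy counts only the fault-free dynamic part, proportional to speed squared times work.

```python
def speed_for(n, c, d, r):
    """Lowest normalized speed that fits the work, n checkpoints and
    one recovery (one segment plus its checkpoint) before deadline d."""
    return (c + n * r + c / n + r) / d


def best_checkpoint_count(c, d, r, max_n=50):
    """Search n = 1..max_n for the feasible count with the lowest energy.

    Returns (n, energy, speed) or None if no count meets the deadline.
    """
    best = None
    for n in range(1, max_n + 1):
        s = speed_for(n, c, d, r)
        if s > 1.0:                      # would need more than Smax
            continue
        energy = s**2 * (c + n * r)      # power ~ s^3, time = work / s
        if best is None or energy < best[1]:
            best = (n, energy, s)
    return best


n_opt, e_opt, s_opt = best_checkpoint_count(c=60.0, d=100.0, r=1.0)
print(n_opt, e_opt, s_opt)
```

With few checkpoints the reserved recovery time is large and forces a high speed; with many checkpoints the checkpointing overhead dominates, so the energy curve has a minimum in between.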
![Page 20: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/20.jpg)
Faults are rare events
If a fault occurs, execution may continue at Smax after recovery.
![Page 21: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/21.jpg)
Non-uniform check-pointing
Observation: if a fault occurs, execution may continue at Smax after recovery.
Advantage: recovery in an early section can use slack created by execution of later sections at Smax
Disadvantage: increases energy consumption when a fault occurs (a rare event)
Requires non-uniform checkpoints.
![Page 22: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/22.jpg)
Triple Modular Redundancy vs. Duplex
TMR: vote and exclude the faulty result
Duplex: Compare and rollback if different
Efficiency of TMR vs. Duplex depends on
• static power (l)
• checkpoint overhead
• load

[Figure: for load = 0.7, regions of the plane of checkpoint overhead vs. static power (l) where Duplex is more energy efficient and where TMR is more energy efficient.]
![Page 23: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/23.jpg)
Add memory power to the mix
Example: DRAM and SRAM modules can be switched between different power states (modes), but switching is not free:
• Mode transition power overhead
• Mode transition time overhead

Power states:
• Active (779.1 mW)
• Standby (275.0 mW)
• Power-down (150 mW)
• Self-refresh (20.87 mW)

[Figure: state transition diagram; the transitions are labeled auto, 5 ns, 5 ns, 1000 ns and 5 ns.]
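Because transitions cost time and energy, a low-power mode only pays off when the idle period is long enough. The sketch below computes a break-even idle time under an assumed model: staying put costs `p_high * T`, while switching costs a fixed transition energy `e_tr` plus `p_low` for the rest of the idle period. The power values are the ones listed above; the transition energy is a hypothetical figure chosen for illustration (active power drawn for 1000 ns).

```python
def break_even_idle_time(p_high, p_low, t_tr, e_tr):
    """Shortest idle period (seconds) for which switching modes saves energy.

    Solves p_high * T = e_tr + p_low * (T - t_tr) for T; the idle period
    must also be at least as long as the transition itself.
    """
    t = (e_tr - p_low * t_tr) / (p_high - p_low)
    return max(t_tr, t)


STANDBY_W = 275.0e-3
SELF_REFRESH_W = 20.87e-3
TRANSITION_S = 1000e-9
TRANSITION_J = 779.1e-3 * TRANSITION_S   # hypothetical transition energy

t_be = break_even_idle_time(STANDBY_W, SELF_REFRESH_W, TRANSITION_S, TRANSITION_J)
print("break-even idle time (s):", t_be)
```

Idle periods shorter than this value are better spent in the current mode, which is why predicting idle times matters.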
![Page 24: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/24.jpg)
OS assisted Memory Power Management?
• Keep a histogram of bank access patterns and idle time distributions.
• Use machine learning techniques to select the optimal “threshold” to turn banks off.
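As a stand-in for the learning step, the sketch below picks a turn-off threshold by exhaustive search over recorded idle periods. It is an assumption-laden illustration: a bank stays on for `threshold` time units of idleness and is then turned off, paying a fixed transition energy; the power values, idle periods and candidate thresholds are invented for the example.

```python
def idle_energy(idle, threshold, p_on, p_off, e_tr):
    """Energy spent over one idle period under a timeout policy."""
    if idle <= threshold:
        return p_on * idle                       # bank never turned off
    return p_on * threshold + e_tr + p_off * (idle - threshold)


def pick_threshold(idle_periods, candidates, p_on=1.0, p_off=0.1, e_tr=5.0):
    """Return the candidate threshold with the lowest total energy."""
    def total(threshold):
        return sum(idle_energy(t, threshold, p_on, p_off, e_tr)
                   for t in idle_periods)
    return min(candidates, key=total)


observed = [1, 1, 1, 20, 20]                     # hypothetical idle-time history
best = pick_threshold(observed, candidates=[0, 2, 5, 30])
print("chosen threshold:", best)
```

With this history, a short timeout avoids paying the transition cost on the frequent short idle periods while still turning the bank off early during the long ones.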
![Page 25: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/25.jpg)
Example of compiler assisted Memory Power Management?
Code transformations to increase the memory idle time (the time between memory accesses).

Before the compiler transformation:
…. Load x …. Store x …. Load z …. Load y …. Store z …. Store y …. ….

After the compiler transformation:
Load x  Load y …. …. …. …. …. …. Store y  Load z …. Store z
![Page 26: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/26.jpg)
Example of compiler assisted Memory Power Management?
Memory allocation: algorithms that use the access pattern to allocate memory to banks in a way that maximizes bank idle times.

Declare A[], B[], C[], D[] …. Access A …. Access D …. Access B …. Access C …. Access B …. ….

Possible allocations:
• A[], B[] together and C[], D[] together, OR
• A[], D[] together and C[], B[] together
![Page 27: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/27.jpg)
Phase Change Memory (PCM)
A power saving memory technology
• Solid state memory made of germanium-antimony alloy
• Switching between states is thermal based (not electrical based)
• Samsung, Intel, Hitachi and IBM developed PCM prototypes (to replace Flash).
![Page 28: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/28.jpg)
Properties of PCM
• Non-volatile but faster than Flash
• Byte addressable but denser and cheaper than DRAM
• No static power consumption and very low switching power
• Not susceptible to SEUs (single event upsets) and hence does not need error detecting or correcting codes
  – Errors occur only during write (not read); use a simple read-after-write to detect errors
![Page 29: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/29.jpg)
So, where is the catch?
• Slower than DRAM
  – factor of 2 for read and 10 for write
• Low endurance
  – A cell fails after $10^7$ writes (as opposed to $10^{15}$ for DRAM)
• Asymmetric energy consumption
  – write is more expensive than read
• Asymmetry in bit writing
  – writing 0s is faster than writing 1s
![Page 30: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/30.jpg)
Goal: use PCM as main memory
Advantages: cheaper + denser + lower power consumption

• Traditional architecture: CPU, Memory Controller, DRAM
• Proposed architecture: CPU, MM with AEB, PCM
  – AEB: acceleration/endurance buffer
  – MM: memory manager
![Page 31: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/31.jpg)
Dealing with asymmetric read/write
• Use coherence algorithms in which “writes” are not on the critical path.
• Design algorithms with “read rather than write” in mind
• Take advantage of the fact that writing 0s is faster than writing 1s
  – Pre-write a block with 1s as soon as the block is dirty in the cache
  – On write back, only write 0s.
![Page 32: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/32.jpg)
Dealing with low write endurance(write minimization)
• Block (or page) allocation algorithms should not be oblivious to the status of the block, for wear minimization
• Modify the cache replacement algorithm
  – e.g., LRR replacement (least recently read)
  – Lower priority to dirty pages in cache replacement
• Use coherence algorithms that minimize writes (write-through is not good)
• Read/compare/write: write a bit only if it differs from the current content
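Read/compare/write can be illustrated in a few lines. The sketch below models memory as a list of integer words (an assumption for illustration; a real controller does this in hardware) and returns how many cells were actually programmed, which is the quantity that matters for wear.

```python
def read_compare_write(memory, addr, new_word, width=8):
    """Write new_word at addr, programming only the bits that differ.

    Returns the number of bits actually written.
    """
    mask = (1 << width) - 1
    old_word = memory[addr]                    # read
    changed = (old_word ^ new_word) & mask     # compare: 1 where bits differ
    memory[addr] = old_word ^ changed          # write: flip only those bits
    return bin(changed).count("1")


memory = [0b10101010]
first = read_compare_write(memory, 0, 0b10101000)   # one bit differs
second = read_compare_write(memory, 0, 0b10101000)  # identical content
print(first, second, bin(memory[0]))
```

Rewriting unchanged data costs no cell writes at all under this scheme, at the price of an extra read before every write.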
![Page 33: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/33.jpg)
Wear leveling
• Memory allocation decisions should consider age of blocks (age = number of write cycles exerted)
• Periodically change the physical location of a page (write to a location different from the one read from)
• Consider memory as a consumable resource - can be periodically replaced
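A minimal sketch of age-aware allocation, assuming a per-block write counter and a free list ordered by that counter, so that the least-worn free block is always handed out first. The class and method names are hypothetical and the policy is only one simple possibility.

```python
import heapq


class WearAwareAllocator:
    """Hand out the free block with the fewest recorded writes."""

    def __init__(self, n_blocks):
        self.writes = [0] * n_blocks
        self.free = [(0, block) for block in range(n_blocks)]
        heapq.heapify(self.free)                 # ordered by (age, block id)

    def allocate(self):
        _, block = heapq.heappop(self.free)      # youngest free block
        return block

    def record_write(self, block):
        self.writes[block] += 1                  # age = number of write cycles

    def release(self, block):
        heapq.heappush(self.free, (self.writes[block], block))


alloc = WearAwareAllocator(3)
b0 = alloc.allocate()
for _ in range(5):
    alloc.record_write(b0)
alloc.release(b0)
order = [alloc.allocate(), alloc.allocate(), alloc.allocate()]
print(b0, order)
```

A heavily written block goes to the back of the queue when released, so subsequent allocations spread writes over the less-worn blocks.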
![Page 34: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/34.jpg)
Detailed architecture

[Figure: block diagram of the Memory Manager. Labeled components: CPU bus interface, Request Controller (FSM), Tag Array, Requests Buffer, In Flight Buffer (SRAM), In Flight Buffer Busy Bitmap, DRAM Controller/DMAC, PCM Controller/DMAC, AEB page cache, Spare Table, PCM pages area, spares, control bus and data bus.]
![Page 35: 10 years of research on Power Management (now called green computing)](https://reader031.fdocuments.us/reader031/viewer/2022012922/5681691f550346895de04c13/html5/thumbnails/35.jpg)
Conclusion
It is essential to manage the tradeoffs between
• Time constraints (deadlines or rates)
• Energy constraints
• Reliability constraints

across hardware, compiler and OS.