Runtime Software Power Estimation and Minimization Tao Li
Power-aware Computing
Power: Software Perspective & Impact
Power estimation: the first step to power management & optimization
Software contributes to & largely impacts power consumption
It is crucial to model power from the perspective of software
Evaluate software energy in early design stage
Understand impact of software optimizations on energy
Support run-time power management and optimizations
Power: Software Perspective & Impact (Contd.)
Software Power Estimation: Current Techniques
Instruction level modeling: computation intensive
High level macro-modeling: difficult to apply to general code
Event counting based modeling: impacted by the availability of performance counters
Architecture level simulation: large slowdown
Challenges in Run-time Power Estimation
High fidelity & fast speed
On-the-fly estimation capability, non-intrusive & low overhead
Simplicity, availability and generality
Experimental Methodology
SoftWatt: cycle-accurate & full-system power simulation framework
SimOS infrastructure, Wattch power model
Commercial OS & real applications
Out-of-order superscalar processor
Caches & memory hierarchy
Low-power disk
Experimental Methodology (Contd.)
Applications
E-mail and file management (sendmail, fileman)
Java (SPECjvm98: db, jess, javac, jack, mtrt, compress)
SPECInt95 (gcc, vortex)
Database (Postgres: select, update, join)
Miscellaneous (pmake, osboot)
OS Power Characterization
OS power varies from one application to another
29 W (gcc) to 66 W (fileman)
Variance of power consumption in OS service routines & invocations
[Figure: average power (W, axis 0 to 60) and standard deviation (%, axis 0 to 16)]
OS Power Characterization (Contd.)
OS routine power correlates with its performance
Circuits used to exploit ILP burn a significant portion of power
The number of in-flight instructions flowing through these circuits affects their switching activity
For a given OS routine, similar IPC indicates similar circuit switching activity and therefore similar power
OS Routine Power-Performance Correlation
[Figure panels: SCSI Disk Interrupt Handler; Read File System Call]
Routine Level OS Power Model
Idea: use a linear regression model
P_routine = k1 × IPC_routine + k0
to track the power of an OS routine as its performance varies
Energy(OS) = Sum [ Energy(OS routines) ] = Sum [ Power(OS routines) × Time(OS routines) ]
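The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the least-squares fit is one common way to obtain k1 and k0, and the sample values, the routine name "demo", and all function names are hypothetical.

```python
def fit_linear(ipc_samples, power_samples):
    """Ordinary least-squares fit of power = k1 * ipc + k0."""
    n = len(ipc_samples)
    if n < 2 or n != len(power_samples):
        raise ValueError("need at least two paired samples")
    mean_x = sum(ipc_samples) / n
    mean_y = sum(power_samples) / n
    sxx = sum((x - mean_x) ** 2 for x in ipc_samples)
    if sxx == 0:
        raise ValueError("IPC samples must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ipc_samples, power_samples))
    k1 = sxy / sxx
    k0 = mean_y - k1 * mean_x
    return k1, k0


def routine_power(ipc, k1, k0):
    """Evaluate P_routine = k1 * IPC_routine + k0 (watts)."""
    return k1 * ipc + k0


def os_energy(invocations, coeffs):
    """Sum Power * Time over invocations given as (routine, ipc, seconds)."""
    return sum(routine_power(ipc, *coeffs[name]) * seconds
               for name, ipc, seconds in invocations)


# Hypothetical (IPC, power) samples for one routine; not data from the slides.
ipc = [0.4, 0.8, 1.2, 1.6]
power = [14.0, 26.0, 38.0, 50.0]
k1, k0 = fit_linear(ipc, power)

# One hypothetical 2 ms invocation of that routine at IPC 1.0.
energy = os_energy([("demo", 1.0, 0.002)], {"demo": (k1, k0)})
```

With these made-up samples the fit lands on a slope of about 30 and an intercept of about 2, so the single invocation costs roughly 32 W × 2 ms.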
Routine Level OS Power Model (Contd.)
Regression Model: P = k1 × IPC + k0

OS Service     k1     k0    ε       Comment
utlb           23.6   6.2   0.17%   TLB miss handler
COW_fault      32.1   1.1   0.19%   copy-on-write fault
simscsi_intr   33.9   1.3   1.94%   SCSI disk I/O interrupt
clock          36.4   0.6   2.68%   clock interrupts
read           29.6   4.7   4.53%   read file
write          34.3   1.5   1.27%   write file
open           34.3   1.2   0.41%   open a file or serial port
close          30.4   3.9   2.61%   close an open channel

ε: Model Fitting Error
Routine Level OS Power Modeling
Pre-characterization: low level energy simulation, model fitting
Run-time estimation: OS routine boundaries, evaluation using counter values
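A minimal sketch of the run-time half of this flow, assuming the estimator is handed instruction and cycle counter readings at each OS routine boundary. The coefficients are copied from the regression table in this deck; the class name, method names, the 1 GHz clock, and the counter readings are hypothetical, and nested routines (for example an interrupt arriving inside a system call) are not handled.

```python
# (k1, k0) pairs copied from the regression table in this deck.
COEFFS = {"utlb": (23.6, 6.2), "read": (29.6, 4.7), "clock": (36.4, 0.6)}


class OSEnergyEstimator:
    """Accumulates OS energy from counter deltas at routine boundaries."""

    def __init__(self, coeffs, clock_hz):
        self.coeffs = coeffs
        self.clock_hz = clock_hz      # assumed fixed clock frequency
        self.energy_joules = 0.0
        self._open = None             # (routine, instructions, cycles) at entry

    def enter(self, routine, instructions, cycles):
        """Record counter values when an OS routine starts."""
        self._open = (routine, instructions, cycles)

    def leave(self, instructions, cycles):
        """Estimate the energy of the routine that just finished."""
        routine, inst0, cyc0 = self._open
        self._open = None
        d_inst = instructions - inst0
        d_cyc = cycles - cyc0
        if d_cyc <= 0:
            return 0.0
        k1, k0 = self.coeffs[routine]
        power = k1 * (d_inst / d_cyc) + k0        # P = k1 * IPC + k0
        energy = power * d_cyc / self.clock_hz    # E = P * time
        self.energy_joules += energy
        return energy


# Hypothetical counter readings around one "read" system call.
est = OSEnergyEstimator(COEFFS, clock_hz=1e9)
est.enter("read", instructions=1000, cycles=2000)
e = est.leave(instructions=3000, cycles=4000)
```

Here the call retires 2000 instructions in 2000 cycles (IPC 1.0), so the model gives 29.6 + 4.7 = 34.3 W over 2 microseconds.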
[Figure: estimation error (%) of the routine based regression model P_routine = k1 × IPC_routine + k0 for sendmail, fileman, db, jess, postgres.select, postgres.update, and osboot; y-axis from -2% to 2%]
[Figure: estimation error (%) of the flat regression model P_OS = g1 × IPC_OS + g0 for sendmail, fileman, db, jess, postgres.select, postgres.update, and osboot; y-axis from -15% to 15%]
Cumulative Estimation Error
Flat Regression Model: P_OS = g1 × IPC_OS + g0
Per-routine Estimation Error
Routine based Regression Model: P_routine = k1 × IPC_routine + k0
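The difference between the two models can be illustrated with a small synthetic experiment. The sketch below takes the utlb and clock coefficients from the regression table in this deck, generates power values at hypothetical IPC points directly from those lines, and then fits a single flat line through all points. Because the data is synthetic, the routine based error is zero by construction. The example shows only why one line cannot follow routines with different slopes, and it does not reproduce the measured errors.

```python
# (k1, k0) pairs copied from the regression table in this deck.
ROUTINES = {"utlb": (23.6, 6.2), "clock": (36.4, 0.6)}
IPC_POINTS = [0.5, 1.0, 1.5]  # hypothetical operating points


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Synthetic samples: (routine, ipc, power generated from the routine's line).
samples = [(name, ipc, k1 * ipc + k0)
           for name, (k1, k0) in ROUTINES.items()
           for ipc in IPC_POINTS]

# Flat model: one line fitted across all routines.
g1, g0 = fit_linear([s[1] for s in samples], [s[2] for s in samples])


def max_rel_error(predict):
    """Largest relative error of a predictor over the synthetic samples."""
    return max(abs(predict(name, ipc) - p) / p for name, ipc, p in samples)


flat_err = max_rel_error(lambda name, ipc: g1 * ipc + g0)
routine_err = max_rel_error(
    lambda name, ipc: ROUTINES[name][0] * ipc + ROUTINES[name][1])
```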
Per-routine Estimation Error (Contd.)
OS Energy Dissipation
[Figure: % of OS cycles and % of OS energy; axis from 0% to 60%; annotated values: 92%, 89%]
Phases in Programs (8-issue machine)
[Figure: IPC (axis 0 to 6) over execution time (0.00 to 0.26 seconds); benchmark: SPECjvm98 jess]
Resources are utilized differently during different phases of program execution
Average IPC - User: 2.1, OS: 1.1
Power Minimization via Processor Resource Adaptations
Adapt processor resources to program needs
What can be adapted?
Bandwidth of fetch/decode/issue/retire…
Size of instruction window, re-order buffer, load store queue…
Reduce power, retain performance
Effects of Tuning Processor Resource for the OS
8-issue -> 4-issue
OS Performance degradation: 4%
OS Power savings: 50%
              1-issue  2-issue  4-issue  6-issue  8-issue
OS IPC        0.88     1.09     1.15     1.19     1.21
OS Power (W)  6.4      12.2     21.7     31.1     42.8
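The trade-off can be recomputed from the table values. The short sketch below does so. The function name is hypothetical, and because the table entries are rounded, the result (roughly 5% IPC loss and 49% power savings for 8-issue to 4-issue) only approximates the 4% and 50% quoted on the slide.

```python
# issue width -> (OS IPC, OS power in watts), copied from the table above.
ISSUE_WIDTHS = {
    1: (0.88, 6.4),
    2: (1.09, 12.2),
    4: (1.15, 21.7),
    6: (1.19, 31.1),
    8: (1.21, 42.8),
}


def relative_change(baseline, tuned):
    """Return (fractional IPC loss, fractional power saving) vs. baseline."""
    base_ipc, base_power = ISSUE_WIDTHS[baseline]
    tuned_ipc, tuned_power = ISSUE_WIDTHS[tuned]
    ipc_loss = (base_ipc - tuned_ipc) / base_ipc
    power_saving = (base_power - tuned_power) / base_power
    return ipc_loss, power_saving


ipc_loss, power_saving = relative_change(baseline=8, tuned=4)
print(f"IPC loss: {ipc_loss:.1%}, power saving: {power_saving:.1%}")
```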
Previous Approach for Adaptations
[Figure: sampling based adaptation; labels: sampling, cycles, sampling window, IPC (instructions per cycle), adaptation; intervals A to F]
Problems with Sampling based Adaptations (Contd.)
OS executions are short-lived
Adaptation overhead
[Figure: timelines of interleaved user and OS execution under a sampling window and a smaller sampling window; labels: A, B, C, Ta, Ts, Th]
OS-aware Routine based Adaptations
OS-aware: identify OS executions via processor execution modes
Just-in-time & full coverage of OS activities
Routine-based: adapt processor resources at OS routine boundaries
Precise exceptions: drained pipeline
Achieve minimum adaptation overhead
[Figure: timeline of interleaved user and OS execution; labels: OS routine-based optimal adaptations, adaptations with minimum overhead]
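The control flow described here can be sketched as a small state machine that switches configuration on kernel entry and restores it on return to user mode. All names and the configuration values below are hypothetical. On real hardware the switch would be triggered by the execution-mode change itself, which this sketch only models as two method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    issue_width: int
    window_size: int


# Hypothetical settings; the slides do not give per-routine configurations.
DEFAULT = Config(issue_width=8, window_size=128)
ROUTINE_CONFIGS = {
    "clock": Config(issue_width=2, window_size=32),
    "read": Config(issue_width=4, window_size=64),
}


class RoutineAdapter:
    """Selects a processor configuration at OS routine boundaries."""

    def __init__(self, default, per_routine):
        self.default = default
        self.per_routine = per_routine
        self.current = default
        self.switches = 0  # number of actual reconfigurations

    def _apply(self, config):
        if config != self.current:
            self.current = config
            self.switches += 1

    def on_kernel_entry(self, routine):
        # The pipeline is already drained by the precise exception,
        # so the switch is modeled as free of extra drain cost.
        self._apply(self.per_routine.get(routine, self.default))

    def on_kernel_exit(self):
        self._apply(self.default)


adapter = RoutineAdapter(DEFAULT, ROUTINE_CONFIGS)
adapter.on_kernel_entry("read")
width_in_read = adapter.current.issue_width
adapter.on_kernel_exit()
adapter.on_kernel_entry("unlisted_routine")  # falls back to the default
```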
OS-aware Routine based Adaptations (Contd.)
Apply optimal adaptation for each individual OS routine
Exploit the routine level Energy-Delay Product variance
[Figure: normalized Energy-Delay Product (axis 0 to 1) of the OS services clock, COW_fault, and read on 1-issue, 2-issue, 4-issue, 6-issue, and 8-issue configurations]
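As a rough illustration of how an Energy-Delay Product comparison works, the sketch below ranks configurations by relative EDP. It assumes a fixed instruction count and clock frequency, under which delay is proportional to 1/IPC and energy to power/IPC, so EDP is proportional to power/IPC². Per-routine measurements are not available in this transcript, so the input is the aggregate OS table from the earlier "Effects of Tuning Processor Resource for the OS" slide. The outcome is therefore illustrative only and says nothing about the per-routine optima shown in the figure.

```python
# configuration -> (OS IPC, OS power in watts), aggregate values from the
# "Effects of Tuning Processor Resource for the OS" table in this deck.
OS_TABLE = {
    "1-issue": (0.88, 6.4),
    "2-issue": (1.09, 12.2),
    "4-issue": (1.15, 21.7),
    "6-issue": (1.19, 31.1),
    "8-issue": (1.21, 42.8),
}


def relative_edp(ipc, power):
    """EDP up to a constant factor, assuming fixed work and clock rate."""
    return power / (ipc * ipc)


def normalized_edp(table, baseline):
    """Relative EDP of every configuration, normalized to the baseline."""
    base = relative_edp(*table[baseline])
    return {name: relative_edp(*values) / base
            for name, values in table.items()}


norm = normalized_edp(OS_TABLE, baseline="8-issue")
best = min(norm, key=norm.get)  # configuration with the lowest EDP
```

A per-routine version would keep one such table per OS service and select the minimum for each routine separately.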
Routine based Adaptations: OS Power
[Figure: normalized OS power (axis 0 to 0.6) for pmake, gcc, vortex, sendmail, fileman, db, jess, javac, jack, postgres.select, postgres.update, osboot, and AVG; series: sampling based adaptation (window size: 2048 cycles), sampling based adaptation (window size: 128 cycles), routine based adaptation]
OS Performance
[Figure: normalized OS IPC (axis 0.7 to 1) for pmake, gcc, vortex, sendmail, fileman, db, jess, javac, jack, postgres.select, postgres.update, osboot, and AVG; series: sampling based adaptation (window size: 2048 cycles), sampling based adaptation (window size: 128 cycles), routine based adaptation]
OS Power & Performance Tradeoff
[Figure: normalized Energy·Delay (axis 0 to 0.8); series: sampling based adaptation (window size: 2048 cycles), sampling based adaptation (window size: 128 cycles), routine based adaptation]