Estimating the Worst-Case Energy Consumption of Embedded Software
![Page 1: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/1.jpg)
1
Estimating the Worst-Case Energy Consumption of Embedded Software
Ramkumar Jayaseelan Tulika Mitra Xianfeng Li
School of Computing, National University of Singapore
![Page 2: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/2.jpg)
2
Motivation
Conventional scheduling techniques give timing guarantees
  Processor cycles are the critical resource
  WCET of the tasks is a required input
Battery life is equally important for mobile devices
  Scheduling techniques have to give energy guarantees
  Worst-Case Energy Consumption (WCEC) of the tasks is a required input
![Page 3: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/3.jpg)
3
Remotely Deployed Systems
Available energy is unevenly distributed among nodes
Spatio-temporal scheduling benefits from WCEC
[Figure: sensor network and local station]
![Page 4: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/4.jpg)
4
Energy-Based Guarantees
Scheduling critical and non-critical tasks in a battery-operated system
Non-critical tasks can be run only if energy constraints for critical tasks are satisfied
Worst-case energy estimation is crucial
![Page 5: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/5.jpg)
5
Reward-Based Scheduling
Energy consumption increases with voltage; delay is proportional to 1 / Voltage
Reward-based scheduling attempts to satisfy constraints on both energy and timing
An energy guarantee is possible only if the worst-case energy consumption of the tasks is known
![Page 6: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/6.jpg)
6
Outline
Background
Relation between WCET and worst-case energy consumption
Estimation technique: simplified model
Instruction cache and speculation
Experimental results
Conclusion
![Page 7: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/7.jpg)
7
Background
Power and energy are often used interchangeably
  Power is energy consumed per unit time
Energy consumed during program execution
  E = P × t
  This is an approximation, as P is also a function of time
![Page 8: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/8.jpg)
8
In reality, when a program executes:
  Energy is the area under the power curve, E = ∫P(t)dt
  E = P × T is an approximation
[Figure: power plotted against time]
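A minimal sketch of the gap between the two formulas, assuming a hypothetical per-cycle power trace (the sample values and the 1 ns cycle time are illustrative and do not come from the slides). It approximates the integral by summing power × cycle time, then compares that with a single-power estimate based on the peak value.

```python
# Hypothetical power trace: one sample per clock cycle, in watts.
power_trace = [0.8, 1.2, 1.0, 0.4, 0.4, 1.6, 1.0, 0.6]
cycle_time = 1e-9  # assumed 1 ns per cycle


def energy_integral(trace, dt):
    """Approximate E = integral of P(t) dt by summing P * dt per cycle."""
    return sum(p * dt for p in trace)


def energy_single_power(power, n_cycles, dt):
    """E = P x T using one representative power value for the whole run."""
    return power * n_cycles * dt


exact = energy_integral(power_trace, cycle_time)
peak_based = energy_single_power(max(power_trace), len(power_trace), cycle_time)
print(f"integral: {exact:.3e} J, peak power x T: {peak_based:.3e} J")
```

Using the peak power is safe but pessimistic; using any lower single value risks underestimating the energy of some execution.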
![Page 9: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/9.jpg)
9
WCEC versus WCET
[Figure: total energy (nanojoules) and execution time (cycles) plotted across program inputs. Full input space expansion for a 5-element insertion sort program.]
![Page 10: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/10.jpg)
10
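Cannot Estimate WCEC from WCET

| Benchmark | WCET × avg_power (µJ) | Observed (µJ) |
|-----------|-----------------------|---------------|
| isort     | 489.92                | 525.88        |
| fft       | 12106.49              | 10260.86      |
| fdct      | 138.20                | 105.57        |
| ludcmp    | 131.76                | 119.33        |
| matsum    | 972.03                | 1154.31       |
| minver    | 93.61                 | 80.80         |
| bsearch   | 3.84                  | 3.07          |
| des       | 724.05                | 643.75        |
| matmult   | 178.12                | 166.88        |
| qsort     | 54.79                 | 43.73         |
| qurt      | 23.80                 | 17.65         |

Possible underestimation using WCEC = WCET × power: for isort and matsum the observed energy exceeds the estimate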
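As a quick check of the table, the following sketch lists the benchmarks for which WCET × avg_power falls below the observed energy. The numbers are copied from the table above; the variable names are only illustrative.

```python
# (benchmark, WCET x avg_power in uJ, observed energy in uJ), copied from the table.
rows = [
    ("isort", 489.92, 525.88),
    ("fft", 12106.49, 10260.86),
    ("fdct", 138.20, 105.57),
    ("ludcmp", 131.76, 119.33),
    ("matsum", 972.03, 1154.31),
    ("minver", 93.61, 80.80),
    ("bsearch", 3.84, 3.07),
    ("des", 724.05, 643.75),
    ("matmult", 178.12, 166.88),
    ("qsort", 54.79, 43.73),
    ("qurt", 23.80, 17.65),
]

# An estimate is unsafe as a worst-case bound if an observed run exceeds it.
underestimated = [name for name, estimate, observed in rows if estimate < observed]
print(underestimated)
```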
![Page 11: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/11.jpg)
11
WCEC versus WCET
WCEC path need not be the same as the WCET path
WCEC cannot be directly estimated from the WCET value
![Page 12: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/12.jpg)
12
A closer look at Power
Dynamic power: power consumed due to the switching of transistors
Leakage power: power consumed independent of switching activity
Dynamic power forms the bulk of power consumption in today’s processors
![Page 13: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/13.jpg)
13
Dynamic Power
$$P = \frac{1}{2} \times A \times V^2 \times C \times f$$
  V is the supply voltage
  C is the capacitance of the circuit
  f is the frequency
  A is the activity factor
V, C and f are independent of program execution
Variation in P is due to the variation in A
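A small sketch of the formula, assuming SI units and hypothetical values for V, C and f (none of these numbers come from the slides). It shows that, with V, C and f fixed, power scales linearly with the activity factor A.

```python
def dynamic_power(activity, voltage, capacitance, frequency):
    """P = 1/2 * A * V^2 * C * f (watts, for volts, farads and hertz)."""
    return 0.5 * activity * voltage ** 2 * capacitance * frequency


# Hypothetical, program-independent parameters.
V, C, F = 1.2, 1e-9, 500e6

busy = dynamic_power(0.9, V, C, F)  # cycle with many active components
idle = dynamic_power(0.1, V, C, F)  # cycle with few active components
print(f"busy: {busy:.3f} W, idle: {idle:.3f} W")
```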
![Page 14: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/14.jpg)
14
Variation in Activity Factor (A)
Not all parts of the processor are used in every cycle
  e.g., the data cache is used only for loads/stores
Clock gating disables unused components
Activity factor (A) varies during the execution of the program
Model the variation in A through static analysis
![Page 15: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/15.jpg)
15
Switch-off Energy
An inactive component cannot be fully switched off
  A certain portion of the peak energy is consumed even in idle cycles
Switch-off energy is proportional to the number of idle cycles
![Page 16: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/16.jpg)
16
Clock Energy and Leakage Energy
Clock power: power consumed in the clock distribution network
Leakage power: power consumed due to leakage in transistors
Clock energy and leakage energy are directly proportional to the execution time
![Page 17: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/17.jpg)
17
Energy Components Summary
Dynamic energy
  Switching of transistors during execution
  Independent of execution time
Switch-off energy
  Energy consumed in unused components
  Depends on idle cycles
Clock and leakage energy
  Directly proportional to execution time
![Page 18: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/18.jpg)
18
WCEC versus WCET
[Figure: total energy (nanojoules) and execution time (cycles) plotted across program inputs. Full input space expansion for a 5-element insertion sort program.]
![Page 19: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/19.jpg)
19
Our Analysis: Overview
Operate on the control flow graph
Estimate the worst-case energy of basic blocks
Formulate the estimation for the whole program as an integer linear programming (ILP) problem
![Page 20: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/20.jpg)
20
ILP Formulation
Input: control flow graph of the program
Objective function (to be maximized):
$$\text{Worst-case energy} = \sum_{B} \mathit{WCEC}_{B} \times \mathit{count}_{B}$$
Need to estimate the worst-case energy consumption ($\mathit{WCEC}_{B}$) of each basic block
![Page 21: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/21.jpg)
21
Flow Constraints
[Figure: control flow graph with basic blocks B0, B1, B2 and B3]
Inflow = basic block execution count = outflow:
  E0,1 = B0 = 1
  E0,1 + E2,1 = E1,2 + E1,3 = B1
  E1,2 = E2,3 + E2,1 = B2
  E2,3 + E1,3 = B3 = 1
Bounds on maximum loop iterations:
  Loop bound: E2,1 <= 100
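The constraints above are small enough to explore by hand. The sketch below is not an ILP solver: it enumerates every assignment of edge counts that satisfies the flow constraints and the loop bound, and keeps the one with the highest total energy. The per-block energy values are hypothetical and only serve to make the example concrete.

```python
# Hypothetical worst-case energy per basic block (nJ); not from the slides.
WCEC = {"B0": 5, "B1": 3, "B2": 10, "B3": 4}
LOOP_BOUND = 100


def solve_by_enumeration(wcec, loop_bound):
    """Brute-force stand-in for an ILP solver on the four-block example."""
    best = None
    e01 = 1  # E0,1 = B0 = 1
    for e21 in range(loop_bound + 1):  # loop bound: E2,1 <= loop_bound
        for e13 in (0, 1):
            e23 = 1 - e13              # E2,3 + E1,3 = B3 = 1
            e12 = e23 + e21            # E1,2 = E2,3 + E2,1 = B2
            b1_in, b1_out = e01 + e21, e12 + e13
            if b1_in != b1_out:        # inflow of B1 must equal its outflow
                continue
            counts = {"B0": 1, "B1": b1_in, "B2": e12, "B3": 1}
            energy = sum(wcec[b] * counts[b] for b in counts)
            if best is None or energy > best[0]:
                best = (energy, counts)
    return best


energy, counts = solve_by_enumeration(WCEC, LOOP_BOUND)
print(energy, counts)
```

Enumeration only works here because the example has two free variables; a real control flow graph needs an ILP solver.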
![Page 22: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/22.jpg)
22
Worst-Case Energy of a Basic Block
Processor model
Energy components
  Instruction-specific energy
  Pipeline-specific energy
![Page 23: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/23.jpg)
23
Processor Model
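[Figure: processor model with pipeline stages IF, ID, ISSUE, EX, WB and CM; an instruction buffer (IBUF) and reorder buffer (ROB); and functional units ALU, MULT and FPU]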
![Page 24: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/24.jpg)
24
Pipelined Execution of Instructions
  ADD R1,R2,R3
  MUL R4,R5,R6
  SUB R7,R8,R9
Stages per instruction over clock cycles (CC) 1 to 8:
  ADD: IF ID IS EX WB CM
  MUL: IF ID IS EX WB CM
  SUB: IF ID IS EX WB CM
Difficult to statically predict the energy consumption in each cycle
![Page 25: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/25.jpg)
25
Pipelined Execution of Instructions
  ADD R1,R2,R3
  MUL R4,R5,R6
  SUB R7,R8,R9
Stages per instruction over clock cycles (CC) 1 to 8, with two stall cycles marked:
  ADD: IF ID IS EX WB CM
  MUL: IF ID IS EX WB
  SUB: IF ID IS EX
Difficult to statically predict the energy consumption in each cycle
![Page 26: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/26.jpg)
26
Our Approach
Determine the maximum energy consumed on a component by component basis
Static analysis to determine the maximum energy consumed by a component in a specified interval
![Page 27: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/27.jpg)
27
Execution of Instruction
[Figure: an instruction passing through the pipeline stages IF, ID, ISSUE, EX, WB and CM]
![Page 28: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/28.jpg)
28
Instruction Specific Energy
Energy consumed due to the sub-tasks associated with the execution of an instruction
  e.g., register file access, ALU usage, etc.
Depends on the type of executed instruction
No correlation with execution time
![Page 29: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/29.jpg)
29
Pipeline Specific Energy
During program execution, energy is consumed due to
  Switch-off power (idle cycles)
  Leakage power (every cycle)
  Clock network power (every cycle)
Cannot be attributed to any instruction
Energy is consumed even in idle cycles
![Page 30: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/30.jpg)
30
Energy Components
Observation: the energy consumed can be separated into
Instruction-specific energy
  Energy associated with the execution of a particular instruction
  Independent of execution time
Pipeline-specific energy
  Energy consumed in other components such as clock network, leakage, etc.
  Related to execution time
![Page 31: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/31.jpg)
31
Worst-case Energy of a Basic block
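$$\mathit{energy}_{BB} = \mathit{dynamic}_{BB} + \mathit{switchoff}_{BB} + \mathit{leakage}_{BB} + \mathit{clock}_{BB}$$
$\mathit{dynamic}_{BB}$: instruction-specific energy for BB
$\mathit{switchoff}_{BB}$, $\mathit{leakage}_{BB}$ and $\mathit{clock}_{BB}$ are the energy consumed in unused components, leakage and the clock network during $\mathit{WCET}_{BB}$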
![Page 32: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/32.jpg)
32
Instruction Specific Energy
Energy consumed due to switching activity generated by the instructions in BB
Sum of the energy consumed by the individual instructions in BB
$$\mathit{dynamic}_{BB} = \sum_{\mathit{instr} \in BB} \mathit{dynamic}_{\mathit{instr}}$$
![Page 33: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/33.jpg)
33
Switch-off Energy
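Unused units consume 10% of peak energy
Switch-off energy for a specific component (C)
$$\mathit{switchoff}_{BB}(C) = \mathit{Idle\_cycles}(C) \times \mathit{energy}(C) \times 0.1$$
$$\mathit{switchoff}_{BB}(C) = (\mathit{WCET}_{BB} - \mathit{uses}_{BB}(C)) \times \mathit{energy}(C) \times 0.1$$
Switch-off energy for basic block BB
$$\mathit{switchoff}_{BB} = \sum_{C \in \mathit{components}} \mathit{switchoff}_{BB}(C)$$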
![Page 34: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/34.jpg)
34
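Clock Energy and Leakage Energy
Clock energy
$$\mathit{clock}_{BB} = \mathit{WCET}_{BB} \times \mathit{clock\_energy}_{\mathit{cycle}}$$
Leakage energy
$$\mathit{leakage}_{BB} = \mathit{WCET}_{BB} \times \mathit{leakage\_energy}_{\mathit{cycle}}$$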
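The four components of a basic block's worst-case energy can be combined in a few lines. This is a sketch under stated assumptions: the `Component` type, the function name and all numeric values are hypothetical, and only the 10% switch-off fraction and the structure of the formulas are taken from the slides.

```python
from dataclasses import dataclass

SWITCHOFF_FRACTION = 0.1  # from the slides: unused units consume 10% of peak energy


@dataclass
class Component:
    name: str
    peak_energy: float  # hypothetical peak energy per cycle (nJ)
    uses: int           # cycles in which the basic block uses this component


def basic_block_energy(wcet, instr_energies, components,
                       clock_per_cycle, leakage_per_cycle):
    """energy_BB = dynamic_BB + switchoff_BB + leakage_BB + clock_BB."""
    dynamic = sum(instr_energies)  # instruction-specific part
    switchoff = sum((wcet - c.uses) * c.peak_energy * SWITCHOFF_FRACTION
                    for c in components)  # idle cycles of each component
    clock = wcet * clock_per_cycle
    leakage = wcet * leakage_per_cycle
    return dynamic + switchoff + leakage + clock


# Hypothetical basic block: WCET of 10 cycles, three instructions, two components.
units = [Component("ALU", 1.0, uses=2), Component("MULT", 2.0, uses=1)]
total = basic_block_energy(10, [2.0, 3.0, 2.5], units,
                           clock_per_cycle=0.5, leakage_per_cycle=0.2)
print(f"{total:.2f} nJ")
```

Only the dynamic term is independent of the block's WCET; the other three grow with it, which is why a conservative WCET estimate inflates the energy estimate.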
![Page 35: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/35.jpg)
35
Overlap among basic blocks
[Figure: timeline of basic blocks B1, B2, BB and B3 overlapping in time, with time points t1 to t5 and WCET_BB marked]
![Page 36: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/36.jpg)
36
Switch-off Energy
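Unused units consume 10% of peak energy
Switch-off energy for a specific component (C)
$$\mathit{switchoff}_{BB}(C) = \mathit{Idle\_cycles}(C) \times \mathit{energy}(C) \times 0.1$$
$$\mathit{switchoff}_{BB}(C) = (\mathit{WCET}_{BB} - \mathit{uses}_{BB}(C)) \times \mathit{energy}(C) \times 0.1$$
Switch-off energy for basic block BB
$$\mathit{switchoff}_{BB} = \sum_{C \in \mathit{components}} \mathit{switchoff}_{BB}(C)$$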
![Page 37: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/37.jpg)
37
Instruction Cache Modeling
Context-based ILP formulation used in WCET analysis [Li et al RTSS 2004]
Each basic block is divided into memory blocks
A context maps each of these memory blocks to a cache hit or miss
Estimate the worst-case energy of each context, taking into account the main memory access energy
![Page 38: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/38.jpg)
38
Modeling Branch Misprediction
[Figure: timeline with blocks BB', BX and BB and time points t1, t2 and t3]
![Page 39: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/39.jpg)
39
Objective function
$$\mathit{Energy} = \sum_{i=1}^{N} \sum_{j \to i} \sum_{\omega \in C_i} \Big[ \mathit{energy}_{j \to i}(c,\omega) \times \mathit{count}_{j \to i}(c,\omega) + \mathit{energy}_{j \to i}(m,\omega) \times \mathit{count}_{j \to i}(m,\omega) \Big]$$
count(c,ω) is the number of times basic block Bi is executed with the path coming from Bj and the branch predicted correctly
count(m,ω) is defined similarly for the case where the branch is mispredicted
energy(c,ω) and energy(m,ω) are defined in the same manner
The ILP problem is solved to generate values for count, using constraints similar to those of WCET analysis
![Page 40: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/40.jpg)
40
Results
Platform: SimpleScalar toolset
Modified the WCET analysis tool [Li et al RTSS 2004] to estimate worst-case energy
Energy values for processor components are derived from parameterized models in Wattch
The ILP problem is solved using CPLEX
![Page 41: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/41.jpg)
41
Results
Compare the estimated WCEC against the observed values for eleven benchmarks
Observed values are obtained using the Wattch power simulator
  The actual inputs producing the WCEC are unknown
  Manually select inputs that might produce the WCEC
![Page 42: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/42.jpg)
42
Styles of Clock Gating
Simple: Peak power is consumed even if there is one access to a specific component
Ideal : Power consumed is proportional to the number of ports accessed
Realistic: Same as ideal but unused components consume switch-off power
![Page 43: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/43.jpg)
43
Results
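Results for ideal clock gating are more accurate than for simple clock gating because of the distribution of accesses

| Benchmark | Ideal Est (µJ) | Ideal Obs (µJ) | Ideal Ratio | Simple Est (µJ) | Simple Obs (µJ) | Simple Ratio |
|-----------|----------------|----------------|-------------|-----------------|-----------------|--------------|
| isort     | 468.85         | 422.76         | 1.11        | 524.95          | 455.94          | 1.15         |
| fft       | 9600.99        | 8586.49        | 1.12        | 11057.50        | 9185.39         | 1.20         |
| fdct      | 89.92          | 83.63          | 1.08        | 99.31           | 88.79           | 1.11         |
| ludcmp    | 98.75          | 92.77          | 1.06        | 115.39          | 100.32          | 1.15         |
| matsum    | 1012.83        | 929.94         | 1.09        | 1227.37         | 994.11          | 1.23         |
| minver    | 63.66          | 59.61          | 1.07        | 74.91           | 64.15           | 1.17         |
| bsearch   | 2.54           | 2.40           | 1.06        | 3.51            | 3.07            | 1.14         |
| des       | 546.41         | 518.22         | 1.05        | 613.16          | 553.74          | 1.10         |
| matmult   | 149.70         | 132.08         | 1.13        | 172.39          | 136.93          | 1.26         |
| qsort     | 34.90          | 31.16          | 1.12        | 39.50           | 33.84           | 1.17         |
| qurt      | 13.98          | 11.91          | 1.17        | 16.36           | 12.97           | 1.26         |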
![Page 44: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/44.jpg)
44
Results
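Results for ideal clock gating are more accurate than for realistic clock gating because of conservative WCET estimation

| Benchmark | Realistic Est (µJ) | Realistic Obs (µJ) | Realistic Ratio | Ideal Est (µJ) | Ideal Obs (µJ) | Ideal Ratio |
|-----------|--------------------|--------------------|-----------------|----------------|----------------|-------------|
| isort     | 596.93             | 525.88             | 1.14            | 468.85         | 422.76         | 1.11        |
| fft       | 13631.21           | 10260.86           | 1.33            | 9600.99        | 8586.49        | 1.12        |
| fdct      | 121.65             | 105.57             | 1.15            | 89.92          | 83.63          | 1.08        |
| ludcmp    | 139.75             | 119.33             | 1.17            | 98.75          | 92.77          | 1.06        |
| matsum    | 1397.72            | 1154.31            | 1.21            | 1012.83        | 929.94         | 1.09        |
| minver    | 90.95              | 80.80              | 1.13            | 63.66          | 59.61          | 1.07        |
| bsearch   | 3.81               | 3.07               | 1.24            | 2.54           | 2.40           | 1.06        |
| des       | 715.58             | 643.75             | 1.11            | 546.41         | 518.22         | 1.05        |
| matmult   | 212.94             | 166.88             | 1.28            | 149.70         | 132.08         | 1.13        |
| qsort     | 49.84              | 43.73              | 1.14            | 34.90          | 31.16          | 1.12        |
| qurt      | 21.95              | 17.65              | 1.24            | 13.98          | 11.91          | 1.17        |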
![Page 45: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/45.jpg)
45
Conclusion
Static worst-case energy estimation technique that takes into account pipelining, instruction cache and branch prediction
Future work
  Validation using commercial processors
  Explore the possibility of providing thermal guarantees
![Page 46: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/46.jpg)
46
Execution of an Add Instruction
[Figure: an ADD instruction passing through each pipeline stage]
  IF: I-cache access
  ID: instruction decode + rename logic
  ISSUE: wakeup + selection logic
  EX: register file read + add unit access
  WB: result bus
  CM: ROB retire + register file update
![Page 47: Estimating the Worst-Case Energy Consumption of Embedded Software](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813ffd550346895dab2ce1/html5/thumbnails/47.jpg)
47
Instruction Specific Energy
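Each component is accessed once
Selection logic may be accessed multiple times
Instruction-specific energy is
$$\mathit{dynamic}_{BB} = \mathit{selection\_power}_{\mathit{cycle}} \times \mathit{wcet}_{BB} + \sum_{\mathit{instr} \in BB} \mathit{dynamic}_{\mathit{instr}}$$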