Energy Efficient Scheduling Techniques For Real-Time Embedded Systems
Rabi Mahapatra
9/23/2015
Outline
• Introduction
• Motivation
• Related Work
• Single Processor Systems
• Distributed Multiprocessor Systems
• Experiments & Results
• Summary
Introduction
[Figure: Sample embedded systems: PDA, audio/video entertainment devices, robots, handheld computer, mobile phone, network camera, wireless presentation gateway, Cerfcube.]
Application Specification for Embedded Systems
• Periodic task graphs
  • Each task characterized by: period, execution time, deadline
• Sporadic tasks
  • Invoked at any time
  • Hard deadline
• Soft aperiodic tasks
  • Invoked at any time
  • No deadline
[Figure: Typical input specification of embedded systems. Nodes t1 to t5; annotations "Period = 90, Deadline = 90" and "Sporadic Task, Deadline = 30".]
Why Low Power?
• High power dissipation causes chip failures
• Expensive cooling and packaging overheads; high manufacturing costs
• Portable systems: user convenience is limited by battery size and recharging interval
Power Management
• Processor power dissipation is a function of $\alpha \cdot C_l \cdot V_{dd}^2 \cdot f$
• Various low-power techniques
  • System-level
  • Architecture-level
  • Circuit-level
• System-level power reduction techniques:
  • Dynamic Voltage Scaling
  • Dynamic Power Management
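As a rough illustration of why voltage scaling pays off, the sketch below evaluates the dynamic power expression α · C_l · V_dd² · f at two operating points, where α is the switching activity, C_l the load capacitance, V_dd the supply voltage and f the clock frequency. The numeric values and the function name `dynamic_power` are assumptions chosen only for illustration; they are not figures from the slides.

```python
def dynamic_power(alpha: float, c_load: float, v_dd: float, freq: float) -> float:
    """Dynamic power in watts: alpha * C_l * V_dd^2 * f."""
    return alpha * c_load * v_dd ** 2 * freq


# Hypothetical operating points (illustrative values only).
full = dynamic_power(alpha=0.5, c_load=1e-9, v_dd=1.5, freq=200e6)
half = dynamic_power(alpha=0.5, c_load=1e-9, v_dd=0.75, freq=100e6)

# Halving both voltage and frequency divides power by 8. A fixed workload
# then takes twice as long, so the energy per task is divided by 4.
power_ratio = full / half
energy_ratio = power_ratio / 2
print(power_ratio, energy_ratio)
```

The quadratic dependence on V_dd is what makes running slower at a lower voltage cheaper than running fast and then idling.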
System Level Power Management Taxonomy
SLPM
  DPM
    Fixed tasks: single processor, multiprocessors
    Variable tasks: single processor, multiprocessors
  LPS (DVS)
    Fixed task set: single processor, multiprocessors
    Variable task set: single processor, multiprocessors
Lower levels of the tree: D≤P (contd ..) | No restrictions; Tolerance DL | Hard real-time
SLPM: System Level Power Management; DPM: Dynamic Power Management; LPS: Low Power Scheduling
System Level Power Management Taxonomy (contd …)
D≤P
  Tolerance DL | Hard real-time
    No precedence: periodic, periodic + sporadic
    With precedence: periodic, periodic + sporadic
Our Objective
Given an embedded system and its application task graphs with library functions (i.e. period, execution time, deadline, etc.), our goal is to reduce the system-wide power consumption while guaranteeing the deadlines.
Related Work
Multi-Processor:
J. Luo and N. K. Jha, 2001, “Battery-Aware static scheduling”
• Global shifting scheme & local schedule transformations
• More suitable to small scale systems
R. Mishra, N. Rastogi, and D. Zhu, 2003, “Energy aware scheduling for distributed”
• Greedy and gap-filling dynamic power management techniques
• Limited to task graphs with equal deadline
D. Zhu, R. Melhem, and B. Childers, 2003, “Scheduling with Dynamic Voltage/Speed”
• Slack sharing among processors, global queue
• Limited to homogeneous systems with shared memory
Single Processor:
G. Quan and X. Hu, 2001
• Minimum constant voltage for each interval
• Assumes deadline less than or equal to period
V. Swaminathan and K. Chakrabarty, 2000
• Low-energy earliest deadline first heuristic
• No guarantee on required maximum processor speed
Contributions
• Provides a framework for single-processor systems that considers tasks:
  • whose response time is greater than the period
  • with precedence constraints
• Introduces a chain-of-task-sets based execution approach to model low power in distributed embedded systems.
Energy Efficient Scheduling Techniques for Single Processor
Proposed Approach
“A 3-step approach to reduce power in single processor embedded systems with arbitrary response times and precedence constraints.”
Step 1: Task priority assignment that guarantees precedence constraints.
Step 2: Determination of task speed that
• guarantees deadlines
• reduces power consumption
Step 3: Dynamic power management for
• idle intervals
• run-time variations in task execution time
Task Modeling
• Periodic task graphs
  • Scheduled according to their priorities
• Sporadic task
  • Invoked at any time
  • Hard deadline
  • Execution slot is needed
  • Let ‘e’ be the worst-case execution time and ‘d’ be the deadline
  • Execution slots are defined with Period: d - e, Deadline: d - e
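The execution-slot definition can be sketched as a small helper. The symbol subtracted from d is lost in the transcript; the sketch takes it to be the worst-case execution time e, which is an assumption. The names `ExecutionSlot` and `execution_slot`, and the example WCET of 5, are hypothetical; the deadline of 30 is the one shown for the sporadic task in the input-specification figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionSlot:
    period: float
    deadline: float
    wcet: float


def execution_slot(wcet: float, deadline: float) -> ExecutionSlot:
    """Reserve a periodic slot for a sporadic task with WCET e and deadline d."""
    if not 0 < wcet < deadline:
        raise ValueError("need 0 < wcet < deadline")
    # Assumed reading of the slide: slot period = slot deadline = d - e.
    return ExecutionSlot(period=deadline - wcet, deadline=deadline - wcet, wcet=wcet)


slot = execution_slot(wcet=5, deadline=30)
print(slot)
```

Treating the sporadic task as a periodic slot lets it be ordered by period together with the periodic task graphs.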
STEP 1 : Priority Assignment
1. Arrange the task graphs and execution slots in increasing order of their period.
2. Remove the task graph with the smallest period from the list.
3. Remove the node with no predecessor and least slack time, and assign the node the next highest priority.
4. If not all nodes in the graph are assigned priorities, go to step 3.
5. If the list is not empty, go to step 2; otherwise END.
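The priority-assignment flowchart can be sketched as follows. This is one possible reading of the flowchart, not the author's code: the `TaskGraph` structure, the per-node slack values supplied as input, and the tie-break on node name are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    name: str
    period: float
    slack: dict[str, float]                                    # node -> slack time
    preds: dict[str, set[str]] = field(default_factory=dict)   # node -> predecessors


def assign_priorities(graphs: list[TaskGraph]) -> dict[tuple[str, str], int]:
    """Return {(graph, node): priority}, where 1 is the highest priority."""
    priority: dict[tuple[str, str], int] = {}
    next_priority = 1
    for g in sorted(graphs, key=lambda g: g.period):  # smallest period first
        remaining = set(g.slack)
        while remaining:
            # Nodes whose predecessors have all been removed already.
            ready = [n for n in remaining
                     if not (g.preds.get(n, set()) & remaining)]
            if not ready:
                raise ValueError(f"cycle in task graph {g.name}")
            # Least slack first; the node name only breaks ties.
            node = min(ready, key=lambda n: (g.slack[n], n))
            priority[(g.name, node)] = next_priority
            next_priority += 1
            remaining.remove(node)
    return priority


g1 = TaskGraph("G1", period=90, slack={"t1": 10, "t2": 5, "t3": 20},
               preds={"t2": {"t1"}, "t3": {"t1"}})
s1 = TaskGraph("S1", period=25, slack={"s": 0})  # execution slot of a sporadic task
prio = assign_priorities([g1, s1])
print(prio)
```

Because a node is only eligible once its predecessors are gone, every predecessor ends up with a higher priority than its successors, which is how precedence constraints are respected.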
STEP 2 : Task Speed Determination
1. Arrange tasks in decreasing order of priority.
2. For each task in the list, determine the speed at which the task and all high priority tasks in the list can be run.
3. Find the task with the largest speed, ‘s’.
4. Mark the speed for this task and all other high priority tasks as ‘s’.
5. Remove all these tasks from the list.
6. If the list is empty, END; otherwise go to step 2.
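A minimal sketch of the speed-determination loop, under several assumptions that are not on the slide: speeds are normalised to (0, 1], execution time scales as e/s, deadlines satisfy D ≤ P, the feasibility check is a standard response-time iteration, and the minimum speed is found by bisection. All function names are hypothetical.

```python
import math
from typing import Optional

Task = tuple[float, float, float]  # (period P, execution time e at full speed, deadline D)


def response_time(i: int, tasks: list[Task], speeds: list[float]) -> Optional[float]:
    """Worst-case response time of task i (D <= P case), or None on a deadline miss."""
    exec_i = tasks[i][1] / speeds[i]
    t = exec_i
    while t <= tasks[i][2]:
        # Own execution plus interference from higher-priority tasks 0..i-1.
        demand = exec_i + sum(
            math.ceil(t / tasks[k][0] - 1e-9) * tasks[k][1] / speeds[k]
            for k in range(i))
        if demand == t:
            return t
        t = demand
    return None


def min_common_speed(i: int, tasks: list[Task], fixed: dict[int, float], tol: float) -> float:
    """Smallest speed at which task i and the not-yet-fixed higher-priority tasks can run."""
    def feasible(s: float) -> bool:
        speeds = [fixed.get(k, s) for k in range(i + 1)]
        return response_time(i, tasks, speeds) is not None

    if not feasible(1.0):
        raise ValueError(f"task {i} misses its deadline even at full speed")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if feasible(mid) else (mid, hi)
    return hi


def assign_speeds(tasks: list[Task], tol: float = 1e-4) -> list[float]:
    """tasks must be listed in decreasing order of priority."""
    fixed: dict[int, float] = {}
    start = 0  # index of the first task still in the list
    while start < len(tasks):
        cand = {i: min_common_speed(i, tasks, fixed, tol)
                for i in range(start, len(tasks))}
        worst = max(cand, key=lambda i: (cand[i], i))  # task needing the largest speed
        for k in range(start, worst + 1):              # it and all higher-priority tasks
            fixed[k] = cand[worst]
        start = worst + 1
    return [fixed[i] for i in range(len(tasks))]


speeds = assign_speeds([(10, 3, 5), (20, 2, 20)])
print(speeds)
```

In this example the first task needs roughly 0.6 of full speed because of its tight deadline, after which the second task can run at roughly 0.2.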
Task Schedulability
• Let $\Gamma = \{T_1, T_2, \ldots, T_N\}$ be the task set arranged in decreasing order of priorities.
• Characteristics of $T_i$: $\{P_i, e_i, D_i\}$ (period, execution time, deadline).
• A task set is feasible if the deadlines of all tasks are always met.
• Critical Instant Theorem (Liu and Layland, 1973, “Scheduling algorithms for multiprogramming”): if a task meets its deadline whenever the task is requested simultaneously with all the high priority tasks, then the deadline will always be met for all task phasings.
Task Schedulability (Contd …)
In other words, the task set $\Gamma = \{T_1, T_2, \ldots, T_N\}$ is schedulable if and only if $t_i \le D_i \;\forall i = 1, \ldots, N$, where
$\sum_{k=1}^{i-1} \lceil t_i / P_k \rceil \, e_k + e_i = t_i$, if $P_i \ge D_i$ …………… (1)
otherwise $t_{i,j} \le D_i \;\forall i = 1, \ldots, N$ and for the ‘j’ instances of $T_i$, where $t_{i,j} = R(t_{i,j} + (j-1)P_i) - (j-1)P_i$ and
$R(t_{i,j}) = \sum_{k=1}^{i-1} \lceil t_{i,j} / P_k \rceil \, e_k + j \cdot e_i$ …………… (2)
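For tasks whose deadline may exceed the period, equation (2) can be read as the usual busy-period analysis: for each instance j, find the smallest finish time w with w = Σ ⌈w/P_k⌉ e_k + j·e_i, then subtract the release time (j−1)·P_i. The sketch below follows that reading. The stopping rule (stop once an instance finishes before the next one is released) is an assumption not stated on the slide, and the function names and example task set are illustrative.

```python
from typing import Optional

Task = tuple[int, int, int]  # (period P, execution time e, deadline D), integer time units


def finish_time(i: int, j: int, tasks: list[Task], limit: int) -> Optional[int]:
    """Smallest w with w = sum_{k<i} ceil(w / P_k) * e_k + j * e_i, or None if w > limit."""
    own = j * tasks[i][1]
    w = own
    while w <= limit:
        # -(-w // P) is an exact integer ceiling of w / P.
        nxt = own + sum(-(-w // tasks[k][0]) * tasks[k][1] for k in range(i))
        if nxt == w:
            return w
        w = nxt
    return None


def worst_response_time(i: int, tasks: list[Task]) -> Optional[int]:
    """Largest t_{i,j} over the instances checked, or None if some instance misses D_i."""
    period, _, deadline = tasks[i]
    worst, j = 0, 1
    while True:
        w = finish_time(i, j, tasks, limit=(j - 1) * period + deadline)
        if w is None:
            return None
        worst = max(worst, w - (j - 1) * period)  # t_{i,j}
        if w <= j * period:                       # finished before instance j + 1 arrives
            return worst
        j += 1


# Tasks in decreasing order of priority; the second one has D > P.
tasks = [(70, 26, 70), (100, 62, 120)]
print(worst_response_time(1, tasks))
```

In this example the worst case for the second task is not its first instance, which is why checking only the critical instant is insufficient when D > P.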
Step 3: Dynamic Power Management
• During system operation, idle intervals arise when the actual task execution time is less than the worst-case execution time (which is assumed at the time of fixed-priority scheduling).
• These idle intervals cannot be exploited by off-line methods.
• An on-line method that adapts the clock speed to take advantage of idle intervals is needed.
DPM (Contd ..)
• Schedule the tasks according to their pre-determined speeds in a preemptive manner.
• If the current task has finished and the queue of ready tasks is empty, then:
  • Determine the length of the idle interval.
  • If feasible, put the processor in the power-down mode.
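The idle-interval decision might look like the sketch below. The slide does not say what "feasible" means; here it is assumed to mean that the idle time left after the wake-up latency is at least a break-even time below which powering down costs more than it saves. The parameters `break_even` and `wakeup_latency` and the function name are hypothetical.

```python
def idle_action(now: float, next_release: float,
                break_even: float, wakeup_latency: float) -> tuple[str, float]:
    """Decide what to do when the ready queue is empty at time `now`.

    Returns (action, time at which the processor must be awake again).
    """
    idle_length = next_release - now
    usable = idle_length - wakeup_latency
    if usable >= break_even:
        # Assumed feasibility rule: sleeping pays off and the processor
        # can still wake up before the next task release.
        return ("power_down", next_release - wakeup_latency)
    return ("stay_idle", next_release)


print(idle_action(now=12.0, next_release=20.0, break_even=5.0, wakeup_latency=1.0))
print(idle_action(now=18.0, next_release=20.0, break_even=5.0, wakeup_latency=1.0))
```

Waking up one latency period before the next release keeps the hard deadlines unaffected by the power-down.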
Experimental Setup
• Event-driven simulator
• Intel StrongARM SA-1100 embedded processor specifications
• Real-world test cases (CNC controller, INS, avionics, …)
Benchmarks
Test case       # Periodic task graphs   # Sporadic tasks   # Tasks with D > P   Utilization
Synthetic I     3                        1                  2                    0.52
Synthetic II    5                        3                  4                    0.61
Synthetic III   10                       5                  8                    0.737
CNC [1]         8                        ---                ---                  0.488
INS [2]         6                        ---                ---                  0.72
Avionics [3]    14                       1                  ---                  0.692
Characteristics of various test cases
[Figure: Comparison of % energy savings with various low-power techniques. X axis: various low power techniques (VLPS [5], proposed technique); Y axis: % energy savings (0 to 100); series: CNC, INS.]
[Figure: % energy savings with the proposed technique on various test cases. X axis: test cases (Synthetic I, Synthetic II, Synthetic III, CNC, INS, Avionics); Y axis: % energy savings (0% to 100%).]
Energy Efficient Scheduling Techniques for Multi-Processor Embedded Systems
Overview
• Preliminaries
• System model
• Slack distribution heuristic
• Periodical determination of service rate
• Experiments & Results
Preliminaries
• Command and control systems that comprise hard real-time applications in a distributed environment.
• An application comprises:
  • Chain(s) of tasks or task sets
  • Hard deadlines
  • Exchange of messages during execution
• Admitting a task set (connection establishment): key issues
  • Traffic descriptor [6]
  • Worst-case delay analysis
• Power reduction approaches
  • Slack distribution
  • Clock speed adaptation during system run-time
System Model
A task set is described by a vector triplet $(P_i, C_i, D_i)$, where
$P_i \equiv \{P_i^1, \ldots\}$
$C_i \equiv \{C_i^1, \ldots, C_i^n\}$
$D_i \equiv \{D_i^1, \ldots, D_i^n\}$
[Figure: A distributed system with 3 nodes & 2 task sets; labels: PE1, PE2, PE3, M1, M2.]
Admission of Task Set
• Task set admission: key phases
  • Setting up task set
  • Reply task set
• Setting up task set: key issues
  • local worst-case delay < local deadline
  • end-to-end worst-case delay < end-to-end deadline
• Reply task set: key issues
  • Slack distribution
  • Service rate < 1 (periodic service rate determination)
Observations
• Processing of messages at a node can be extended up to their delay bounds.
• This slack can be utilized to increase the worst-case delay tolerable at the computational nodes involved in processing the task set.
• The actual processing time demanded by the messages of a task set during run-time varies and is less than the worst-case specification.
• A technique to adapt the clock speed periodically is introduced to take advantage of run-time variations.
Slack Distribution
• The slack in a task set is the difference between the end-to-end deadline and the sum of the worst-case delays suffered at each node.
• This slack can be distributed among the nodes serving the task set to reduce the system energy consumption.
• The slack is distributed among the nodes according to the service rate of the nodes.
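The slack computation and its distribution can be sketched as follows. The slide only says the slack is distributed "according to the service rate of the nodes"; sharing it in direct proportion to the service rates is an assumption made for this example, as are the function name and the numbers.

```python
def distribute_slack(end_to_end_deadline: float,
                     worst_case_delays: list[float],
                     service_rates: list[float]) -> list[float]:
    """Return the extended local delay bound of each node on the task set's path."""
    slack = end_to_end_deadline - sum(worst_case_delays)
    if slack < 0:
        raise ValueError("task set cannot be admitted: worst-case delays exceed the deadline")
    total_rate = sum(service_rates)
    # Assumed rule: each node receives slack in proportion to its service rate.
    return [delay + slack * rate / total_rate
            for delay, rate in zip(worst_case_delays, service_rates)]


local_bounds = distribute_slack(100, [20, 30, 10], [0.5, 0.25, 0.25])
print(local_bounds)
```

The extended local bounds still sum to the end-to-end deadline, so the end-to-end guarantee is preserved while each node gains room to run slower.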
Service Rate Determination
• Key issues:
  • Monitoring the traffic pattern
  • Feedback incorporation while determining service rate
  • Periodical service rate determination
    • guarantees processing of messages of outstanding intervals by their delay bounds
    • guarantees processing of messages of upcoming interval by their delay bounds
• Scheduling policies considered: FCFS & WRR
FCFS Scheduling Policy
The new service rate at the beginning of every interval is determined according to
$S_t = \sum_{j=1}^{k} S_t^j$, where k = [expression not recoverable from the transcript]
and the corresponding queue is determined according to
$Q_t = \sum_{j=1}^{k} Q_t^j$
The service rate $S_t^j$ should be such that it processes the outstanding messages that arrived during the interval (t-j, t-(j-1)) by their remaining delay bound, i.e., $(d_{FCFS} - j)$:
$S_t^j \cdot (d_{FCFS} - j) = Q_t^j$
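A hedged sketch of the per-interval FCFS rate computation. It relies on the one relation the text states clearly, namely that messages that arrived j intervals ago must be processed within their remaining delay bound (d_FCFS − j), and it assumes that the per-interval rates simply add up to the new service rate. The function name and the example numbers are hypothetical.

```python
def fcfs_service_rate(outstanding: list[float], delay_bound: int) -> float:
    """outstanding[j - 1] is the processing time still demanded by messages
    that arrived during the interval (t - j, t - (j - 1))."""
    rate = 0.0
    for j, demand in enumerate(outstanding, start=1):
        if demand <= 0:
            continue
        remaining = delay_bound - j  # remaining delay bound, in intervals
        if remaining <= 0:
            raise ValueError(f"messages from interval {j} have no delay budget left")
        rate += demand / remaining   # rate needed to finish them in time
    return rate


rate = fcfs_service_rate(outstanding=[0.6, 0.3], delay_bound=4)
print(rate)
```

Older messages have less delay budget left, so the same backlog demands a higher rate the longer it has waited.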
WRR Scheduling Policy
The new service rate at the beginning of every interval is determined according to
$S_t^i = \sum_{j=1}^{k} S_t^{i,j}$
and the corresponding queue is determined according to
$Q_t^i = \sum_{j=1}^{k} Q_t^{i,j}$
The service rate and the corresponding processing time demanded by the outstanding messages that arrived during the interval (t-j, t-(j-1)) are given by
$S_t^{i,j}$ and $Q_t^{i,j} = S_t^{i,j} \cdot (d_i - j)$
Experimental Setup
• Event-driven simulator
• Socket interface for communication
• Intel PXA250 XScale embedded processor
• Real-life test cases (DSP, Multimedia, …)
Benchmarks
Test case       Number of nodes   Number of connections   Number of modes
Synthetic I     3                 10                      2
Synthetic II    5                 20                      2
Synthetic III   10                30                      3
Multimedia      4                 4                       3
DSP [4]         16                31                      1
Characteristics of various test cases
Benchmarks (Contd …)
Test case           Mode 1 (nodes, connections)   Mode 2 (nodes, connections)   Mode 3 (nodes, connections)
Synthetic (10,30)   (9,20)                        (9,25)                        (10,30)
Multimedia (4,4)    (3,2)                         (3,3)                         (4,4)
Mode configurations for Multimedia and Synthetic test cases
Energy Saving versus Slack Distribution
[Figure: two panels, FCFS and WRR. X axis: slack distribution schemes (Srate, Equal, Wcet, Greedy); Y axis: system energy savings %; series: (3,10), (5,20), (10,30), MM(4,4), DSP(16,31).]
Energy Saving at Different Modes
[Figure: X axis: normalised peak power (Synthetic 1, Synthetic 0.8, Multimedia 1, Multimedia 0.8); Y axis: system energy savings %; series: Mode 1, Mode 2, Mode 3.]
Service Rate at Intervals
[Figure: (10,30) at one node. X axis: intervals (1 to 12); Y axis: normalised service rate; series: 0.4, 0.6, 0.8, 0.9, 1.]
Service Rate vs MI
[Figure: (3,10) at one node. X axis: MI (Monitoring Interval), 1 to 5; Y axis: normalised service rate; series: 1, 0.9, 0.8.]
Overhead Due to Number of Task Sets on Service
[Figure: (10,30) at one node. X axis: number of connections (1 to 30); Y axis: overhead (µsecs).]
Summary
• Energy-efficient scheduling technique for single processor that:
  • handles sporadic and periodic task graphs with precedence constraints
  • takes into account tasks with arbitrary response times
  • determines minimum speed for each task
  • adapts clock speed to take advantage of idle intervals
• A connection-based task execution approach for distributed embedded systems that:
  • effectively distributes the slack available in the connection to reduce system-wide power consumption
  • periodically adjusts the clock speed to take advantage of run-time variations
• Experimental results indicate that the proposed techniques yield significant energy savings.
References
1. N. Kim, M. Ryu, S. Hong, M. Saksena, C. Choi, and H. Shin, “Visual assessment of a real time system design: A case study on a CNC controller,” in Proc. IEEE Real-Time Systems Symposium, December 1996.
2. A. Burns, K. Tindell, and A. Wellings, “Effective analysis for engineering real-time fixed priority schedulers,” IEEE Trans. on Software Eng., vol. 21, no. 5, pp. 475–480, May 1995.
3. C. Locke, D. Vogel, and T. Mesler, “Building a predictable avionics platform in Ada: A case study,” in Proc. IEEE Real-Time Systems Symposium, December 1991.
4. C. M. Woodside and G. G. Monforton, “Fast allocation of processes in distributed and parallel systems,” IEEE Trans. Parallel & Distr. Systems, vol. 4, no. 2, pp. 164–174, Feb. 1993.
References (Contd ..)
5. G. Quan and X. Hu, “Energy efficient fixed priority scheduling for real-time systems on variable voltage processors,” in Proc. Design Automation Conference, June 2001.
6. A. Raha, N. Malcom, and W. Zhao, “Guaranteeing end-to-end deadlines in ATM networks,” in Proc. International Conference on Distributed Computing Systems, May 1995.
THANK YOU