Performance Analysis of Concurrent & Distributed Real-Time Software
Designs
ECEN5053 Software Engineering of Distributed Systems
University of Colorado
October 23, 2002 ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
Overview
Why bother
Review of RMA
Advanced RMA
Event Sequence Analysis
Examples
Why bother?
Quantitative analysis allows for early detection of potential performance problems
Both Rate Monotonic Analysis and Event Sequence Analysis are applied to designs
At the task architecture level
Provides early performance estimate and characterization, e.g., where the bottlenecks are
A Word About the SPE model
The SPE model (Smith and Williams) can model distributed systems or single CPU systems
Represent components whether they are software or hardware or both
Specify varying workloads
Review of RMA
Priority based scheduling of concurrent tasks with hard deadlines
Same CPU
Can be used in environments with less rigid constraints
For example, server role in a client/server application
Assumes priority preemption scheduling algorithm
Can be applied where task synchronization is required
RMA Review (cont. 1)
Basic Theory
Initially: independent periodic tasks
• Do not communicate with each other
• Do not synchronize with each other
A periodic task has
• A period T, the interval between its successive executions
• An execution time C, the CPU time required per period
• CPU utilization of C/T
A group of tasks is schedulable if each task can meet its deadlines
Assign a fixed priority such that the shorter period has the higher priority
RMA Review (cont. 2)
A set of n independent periodic tasks scheduled by the rate monotonic algorithm will always meet its deadlines for all task phasings, if:
$$\frac{C_1}{T_1} + \dots + \frac{C_n}{T_n} \le n\left(2^{1/n} - 1\right) = U(n)$$
where Ci and Ti are the execution time and period of task ti, respectively.
(Note: the upper bound converges to 69% as the number of tasks approaches infinity.)
U(1) = 1.000 U(2) = .828 U(3) = .779 U(4) = .756
U(5) = .743 U(6) = .734 U(7) = .728 U(8) = .724
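The bound U(n) and the utilization test can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function names and the sample task sets are hypothetical and chosen only for illustration.

```python
def utilization_bound(n: int) -> float:
    """Rate monotonic utilization bound U(n) = n * (2^(1/n) - 1)."""
    return n * (2 ** (1.0 / n) - 1)


def passes_utilization_test(tasks):
    """Apply the utilization bound test to a list of (C, T) pairs.

    Returns (passed, total_utilization). A False result does not prove the
    task set is unschedulable, because the test is only a sufficient condition.
    """
    total = sum(c / t for c, t in tasks)
    return total <= utilization_bound(len(tasks)), total


if __name__ == "__main__":
    # Hypothetical task sets, given as (execution time, period) in ms.
    light = [(20, 100), (30, 150), (60, 200)]   # utilization 0.70
    heavy = [(20, 100), (30, 150), (90, 200)]   # utilization 0.85
    for name, tasks in (("light", light), ("heavy", heavy)):
        ok, u = passes_utilization_test(tasks)
        print(name, round(u, 3), "passes" if ok else "inconclusive")
```

The tabulated U(n) values appear to be truncated rather than rounded to three decimals, so a computed value such as U(3) ≈ 0.7798 corresponds to the listed .779.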
Conclusions & Assumptions
The rate monotonic algorithm is stable when there is a transient overload
A subset of the total number of tasks (highest priorities) will still meet their deadlines if the system is overloaded for a relatively short time.
Context switching overhead is included in the CPU times of the interrupting tasks
The Utilization Bound Theorem is pessimistic. If it fails, we can do a further check by applying a second theorem to get an exact determination of whether the tasks are schedulable.
Completion Time Theorem -- Thm 2
For a set of independent periodic tasks, if each task meets its deadline when all tasks are started at the same time, the deadlines will be met for any combination of start times.
Check the end of the first period of task ti as well as the end of all periods of higher priority tasks.
Remember the higher priority tasks have shorter periods
These are called scheduling points
Can be illustrated graphically with a timing diagram
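The scheduling-point check can also be done in code instead of with a timing diagram. The sketch below is one possible reading of the theorem, assuming independent periodic tasks with deadlines equal to their periods and all tasks released together at time 0; the function names and the sample task sets are hypothetical.

```python
import math


def meets_first_deadline(tasks, i):
    """Check task i of a list of (C, T) pairs sorted by increasing period.

    The task meets its first deadline if, at some scheduling point t <= T_i,
    the cumulative CPU demand of tasks 0..i released since time 0 fits in t.
    """
    _, t_i = tasks[i]
    relevant = tasks[: i + 1]
    # Scheduling points: ends of all periods of tasks 0..i that fall within T_i.
    points = sorted({k * period
                     for _, period in relevant
                     for k in range(1, int(t_i // period) + 1)})
    for t in points:
        demand = sum(c * math.ceil(t / period) for c, period in relevant)
        if demand <= t:
            return True
    return False


def schedulable(tasks):
    """Apply the check to every task under rate monotonic priority order."""
    ordered = sorted(tasks, key=lambda ct: ct[1])
    return all(meets_first_deadline(ordered, i) for i in range(len(ordered)))


if __name__ == "__main__":
    # Hypothetical set with utilization 0.85, above U(3), yet schedulable.
    print(schedulable([(20, 100), (30, 150), (90, 200)]))
    # Raising the last execution time to 110 ms makes the third task miss.
    print(schedulable([(20, 100), (30, 150), (110, 200)]))
```

For the first set, the third task's demand at t = 200 ms is 2·20 + 2·30 + 90 = 190 ms, which fits, so the set is schedulable even though it fails the utilization bound test.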
Time-annotated sequence diagram
[Figure: timing diagram for tasks t1, t2, t3; time axis in msec]
Contradictions to Basic RMA Theory
Sometimes tasks execute at actual priorities different from their rate monotonic priorities – priority inversion
For example, a lower priority task must execute its critical section at a higher priority to avoid being preempted by a higher priority task that shares the same resource but is mutually excluded
• Support mutual exclusion
• Avoid deadlock
Contradictions to Basic RMA Theory - 2
Aperiodic tasks can be treated as periodic tasks whose “period” is the worst-case (minimum) inter-arrival time
If this “period” is longer than another task’s period, the aperiodic task will be assigned a lower priority
Often aperiodic tasks are interrupt-driven and execute as soon as the interrupt arrives
Accounting for Priority Inversion
Extend Theorem 1 (Utilization Bound)
Four factors need to be considered to determine whether task ti can meet its first deadline
Preemption time by higher priority tasks (periods shorter than Ti): Cj/Tj for each such task
Execution time for task ti, Ci/Ti
Preemption by higher priority tasks with longer periods, that is, non-rate-monotonic priorities.
• Can only interrupt ti once (why?)
• Ck is the sum of their execution times
• Ck/Ti because worst case is that it all occurs in i’s period
Blocking time by lower priority tasks – once/Ti
Generalized Utilization Bound Thm
$$U_i = \sum_{j \in H_n} \frac{C_j}{T_j} + \frac{1}{T_i}\left(C_i + B_i + \sum_{k \in H_1} C_k\right)$$
where H_n is the set of higher priority tasks with periods shorter than Ti and H_1 is the set of higher priority tasks with periods longer than Ti.
Ui is the utilization bound during period Ti for task ti. The first term is the total preemption utilization by higher priority tasks with periods of less than ti’s. The second term is the CPU utilization by task ti. The third term is the worst-case blocking utilization experienced by ti. The fourth term is the total preemption utilization by higher priority tasks with longer periods than ti’s period. (Terms 3 and 4 are instances of priority inversion.)
If Ui is less than the worst-case upper bound for U(i), this means the task ti will meet its deadline. The utilization-bound test must be applied to each task. Since rate monotonic priorities are not guaranteed, ti may meet its deadline while a higher priority task does not.
Generalized Completion Time Theorem
Assumes the worst case that all tasks are ready for execution at the start of the task ti’s period.
Draw the timing sequence diagram for all the tasks and take into account the priority inversion as well as preemption that can occur.
If each task meets its first deadline while all higher priority tasks meet all of their deadlines up to that point and all priority-inverted tasks meet their deadlines up to that point, then ti will meet its deadlines.
Task scheduling and Design
Cautious approach at design time
• Use estimates
• Satisfy Thm 1, the conservative one, not just Thm 2
If some lower priority tasks are soft real-time or non-real-time
• Ok to exceed the utilization bound somewhat
• If it is ok for them to miss their deadlines/targets occasionally
At design time, can choose priorities to assign
• Aim for rate monotonic priorities for periodic tasks
• Assign highest priorities to interrupt-driven tasks to reflect reality
• If 2 tasks have the same period, assign one a higher priority based on application semantics
Example of Generalized RMA
4 tasks, t1 and t3 are periodic and t2 and ta are aperiodic
ta is interrupt-driven and must execute within 200 ms of the arrival of its interrupt or data will be lost.
t2 has a worst-case interarrival time of T2.
t1 is periodic: C1 = 20; T1 = 100; U1 = 0.2
t2 is aperiodic: C2 = 15; T2 = 150; U2 = 0.1
ta is aperiodic, interrupt-driven: Ca = 4; Ta = 200; Ua = 0.02
t3 is periodic: C3 = 30; T3 = 300; U3 = 0.1
t1, t2 and t3 access a data repository protected by semaphore s.
Notes, not meant for use as slide
If tasks assigned strict rate monotonic priorities, obviously the assignments in priority order from highest to lowest would be t1, t2, ta, and t3.
The stringent response time of ta tells us to give it the highest priority. The priority assignment becomes ta, t1, t2, and t3.
Overall CPU utilization is 0.42, which is less than the worst-case utilization bound for an infinite number of tasks, namely 0.69.
Since rate monotonic priorities are not strictly assigned, we can’t rely on the basic Theorem 1; we need to apply the extended Theorem 1 to each task individually.
ta is highest priority and interrupt-driven so there are no blockers. Ua is 0.02 < U(1) -- no problem meeting its deadline.
(cont. next slide)
Notes 2, not meant for use as slide
Consider t1. Need to consider four factors:
a. Preemption time by higher priority tasks with periods less than T1. There are higher priority tasks (the aperiodic one) but not with shorter periods.
b. Execution time C1 for the task t1 = 20. U1 = 0.2
c. Preemption by higher priority tasks with longer periods. ta is one of these. Preemption utilization during the period T1: Ca /T1 = 4/100=0.04
d. Blocking time by lower priority tasks. Because of the semaphore, t2 and t3 can both potentially block t1, but at most one lower priority task can actually block t1 (why?). The worst case is the task with the longer CPU time, t3, with C3 = 30. Blocking utilization during the period T1: B3 /T1 = 30/100 = 0.3
Worst case utilization = preemption util. +execution util. + blocking util. = .04 + .2 + .3 = .54 < worst-case upper bound of .69. t1 will be ok.
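The four factors for t1 can be combined in a few lines of code. This is a minimal sketch of the generalized utilization calculation; the function and parameter names are hypothetical, and it assumes, as the notes do, that the worst-case blocking time equals the full execution time of the blocking task.

```python
def generalized_utilization(c_i, t_i, shorter_period_preempters,
                            longer_period_preempters, blocking_ms):
    """Worst-case utilization U_i seen by a task with execution time c_i and period t_i.

    shorter_period_preempters: (C, T) pairs of higher priority tasks with T < t_i;
        each can preempt repeatedly, contributing C/T.
    longer_period_preempters: (C, T) pairs of higher priority tasks with T > t_i;
        each preempts at most once within t_i, contributing C/t_i.
    blocking_ms: worst-case blocking time by a single lower priority task.
    """
    preemption_repeated = sum(c / t for c, t in shorter_period_preempters)
    execution = c_i / t_i
    blocking = blocking_ms / t_i
    preemption_once = sum(c for c, _ in longer_period_preempters) / t_i
    return preemption_repeated + execution + blocking + preemption_once


if __name__ == "__main__":
    # Task t1 from the example: C1 = 20, T1 = 100, preempted by ta (Ca = 4,
    # Ta = 200), blocked at worst by t3 for 30 ms.
    u1 = generalized_utilization(20, 100, [], [(4, 200)], 30)
    print(round(u1, 2))  # 0.54, matching the hand calculation above
```

The same function can be called for t2 and t3 by listing, for each, which higher priority tasks have shorter or longer periods and which lower priority task can block.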
NOTES 3
You do the calculation for tasks 2 and 3. Ask for help if you need it.
Event Sequence Analysis
If done properly, during requirements definition, the system’s required response times to external events are specified
After task structuring, we can make a first attempt at allocating time budgets to the concurrent tasks
Event Sequence Analysis determines the tasks to be executed to service a given external event
Performance Analysis using ESA
Pick an external event
Determine which I/O task is activated by this event
Determine the sequence of internal events that follow in response
Identify the tasks that are activated
Identify the I/O tasks that generate the system response to the external event
Estimate CPU time for each task
Estimate CPU overhead, inter-task communication and synchronization overhead
Consider other tasks that execute during this period
CPU Utilization for ESA
The sum of the following must be less than or equal to the specified system response time:
CPU times for the tasks that participate in the event sequence
Times for additional tasks that execute
CPU overhead
Allocate a worst-case upper bound for uncertain CPU times
Overall CPU utilization, estimate for given interval
CPU time for each task, for each path if there is more than one
Frequency of activation * tasks’ CPU times
Example of Perf. Analysis using ESA
Consider Cruise Control subsystem; see event sequence diagram
based on task architecture diagram
assume, for now, that all the other tasks in the system as well as Calibration in this subsystem have lower priorities so that we can ignore them
Consider first the case of the driver engaging the cruise control lever in the accelerate position resulting in controlled acceleration of the car.
Performance requirement: system must respond to driver’s action within 250 ms.
Sequence of internal events following the driver’s stimulus is shown by the event seq. on the concurrent collaboration diagram (Fig. 17.2 taken from Gomaa’s book).
Performance Analysis ESA example - cont.
Assume Cruise Control is in its initial state. ACCEL is the cruise control input.
Event sequence: (Ci is time to process event i)
C1: interrupt arrives from external cruise control lever
C2: CC Lever Interface reads the ACCEL input from the CC lever
C3: CC Lever interface sends a cc request message to CC
C4: CC receives the msg, executes its state transition diagram, and changes state from Initial to Accelerating
C5: CC sends an increase speed command msg to Speed Adjustment
C6: Speed Adjustment executes the command, computes throttle value
C7: Speed Adj sends throttle value msg to Throttle Interface task
Performance Analysis ESA example - cont. 2
Event sequence continues:
C8: Throttle Interface computes new throttle position
C9: Throttle Interface outputs throttle position to the real-world throttle. (This is an output operation, uses no CPU time.)
Four tasks required to support the ACCEL external event
Minimum of four context switches required, 4*Cx where Cx is context switching overhead
Assume Cm is message communication overhead so C3, C5, and C7 are all equal to Cm
Performance Analysis ESA example - cont. 3
Execution time of this event sequence, Ce = what?
System response time, however, must also consider other tasks that could execute during the time when the system must respond to the external event.
Look at Fig. 17.2 (remember we have artificially decided that all other tasks have lower priorities -- they can’t execute during this time)
Assume Auto Sensors (C10) is periodically activated every 100 ms. It could execute 3 times before the 250 ms deadline.
Shaft Interface (C11) is activated once every shaft rotation. It could execute up to __?__ times assuming a shaft rotation max rate of 6000 rpm. This is once every __?__ .
Distance & Speed (C12) activates periodically once every quarter of a second. In the 250 ms window, it can execute _?_.
Performance Analysis ESA example - cont. 4
Every time another task intervenes, there could be two context switches (assume 0.5ms for real-time)
assuming the executing task is preempted and then resumes execution after completion of the intervening task
These three tasks could therefore impose an additional __?__ context switches.
Total CPU time Cother for these three tasks including system overhead is what?
Estimated response time to the external event is greater than or equal to the total CPU time which is the sum of the tasks in the event sequence plus the CPU time for the other tasks.
Ctotal = Ce + Cother
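One way to write these quantities out, as a sketch rather than the slides' own derivation: it assumes C3 = C5 = C7 = Cm, that C9 uses no CPU time, that the event sequence needs exactly four context switches, and that every intervening execution costs two context switches. The counts n10, n11 and n12 (the number of times Auto Sensors, Shaft Interface and Distance & Speed run inside the response window) are notation introduced here for illustration.

$$C_e = C_1 + C_2 + C_4 + C_6 + C_8 + 3C_m + 4C_x$$

$$C_{other} = n_{10}\,(C_{10} + 2C_x) + n_{11}\,(C_{11} + 2C_x) + n_{12}\,(C_{12} + 2C_x)$$

$$C_{total} = C_e + C_{other}$$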
Performance Analysis ESA example - cont. 5
Make estimates for each of these timing parameters so that the equations can be solved (see table provided)
Substituting for the timing parameters results in estimated value of Ce = 35 ms.
Substituting for the estimated timing parameters adding up to Cother results in estimated value of 79 ms
Ctotal = 114 ms. This is well below the specified response time of 250 ms.
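The bookkeeping behind these totals can be sketched in code. In the sketch below the task periods and the 0.5 ms context switch come from the slides (6000 rpm is one shaft rotation every 10 ms), while the per-task CPU times of 5, 1 and 10 ms are hypothetical placeholders: the table the slides refer to is not reproduced here, and these values were picked only so that the total reproduces the 79 ms quoted above. Ce is taken as the 35 ms estimate from the slides.

```python
import math


def activations(window_ms, period_ms):
    """Worst-case number of times a periodic task can run inside the window."""
    return math.ceil(window_ms / period_ms)


def other_cpu_time(window_ms, tasks, cx_ms):
    """CPU time taken by intervening tasks, given as (cpu_ms, period_ms) pairs.

    Each execution is charged two context switches: one to preempt the
    running task and one to resume it afterwards.
    """
    return sum(activations(window_ms, period) * (cpu + 2 * cx_ms)
               for cpu, period in tasks)


if __name__ == "__main__":
    WINDOW_MS = 250            # required response time from the slides
    CX_MS = 0.5                # context switching overhead from the slides
    SHAFT_PERIOD_MS = 60_000 / 6000   # 6000 rpm -> one rotation every 10 ms

    # (CPU time, period); the CPU times are hypothetical placeholders.
    other_tasks = [
        (5, 100),               # Auto Sensors, every 100 ms
        (1, SHAFT_PERIOD_MS),   # Shaft Interface, once per rotation
        (10, 250),              # Distance & Speed, every 250 ms
    ]

    c_e = 35                   # event sequence estimate from the slides
    c_other = other_cpu_time(WINDOW_MS, other_tasks, CX_MS)
    print(c_other, c_e + c_other)   # 79.0 114.0
```

Changing `CX_MS` or any placeholder CPU time and rerunning gives a quick feel for how much headroom remains below the 250 ms requirement.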
Performance Analysis ESA example - cont. 6
How susceptible is the estimated response time to error?
Experiment with different values
What if context switching time were 1 ms instead of 0.5?
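A rough worked answer, under stated assumptions: the event sequence needs 4 context switches, and the three other tasks run 3, 25 and 1 times in the 250 ms window (25 follows from 6000 rpm, i.e. one shaft rotation every 10 ms), each execution costing 2 context switches. Only the context switching term changes:

$$\Delta C = (4 + 2 \cdot 29)\,(1.0 - 0.5)\ \text{ms} = 31\ \text{ms}$$

$$C_{total} \approx 114\ \text{ms} + 31\ \text{ms} = 145\ \text{ms} < 250\ \text{ms}$$

Under these assumptions, doubling the context switching time raises the estimate by about a quarter but still leaves it well inside the required response time.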
Performance Analysis using RMA & ESA
An external event activates a task. Its execution initiates a series of internal events which activate other internal tasks.
Can all the tasks in the combined event sequence be executed before the deadline?
Each internal event sequence can be analyzed regarding how much time it will take. The internal event sequences can then be treated as a group of tasks rate monotonically speaking …
That is, initially allocate all the tasks in the event sequence the same priority. These can collectively be considered one equivalent task from a real-time scheduling viewpoint.
Performance Analysis using RMA & ESA - 2
This equivalent task has a CPU time equal to the
sum of the CPU times of the tasks in the event sequence
Plus context switching overhead
Plus message communication or event synchronization overhead
Worst-case inter-arrival time of the external event that initiates the event sequence is the period of this equivalent task.
Performance Analysis using RMA & ESA - 3
To decide if the equivalent task can meet its deadline, apply the real-time scheduling theorems. Consider:
Preemption by higher priority tasks
Blocking by lower priority tasks
Execution time of the equivalent task itself
Cannot always replace all tasks in the event sequence by a single equivalent task
A task may be used in more than one event sequence
Executing the equivalent task at the chosen priority may prevent other tasks from meeting their deadlines.
May need to analyze tasks separately and assign different priorities
Performance Analysis using RMA & ESA - 4
Must consider preemption and blocking on a per task basis
Also necessary to determine whether all tasks in the event sequence will complete before the deadline.
Perf. Analysis using RMA
Some considerations
• Consider first a steady state involving only the periodic tasks.
• After that, the aperiodic externally-imposed demands on the system can be considered.
• Consider the worst steady state case, namely the case that causes maximum CPU demand
• Remember context switching time
• You can include aperiodic tasks if they have a known/estimated worst-case inter-arrival time
• If 2 tasks have same period, assign higher priority to the independent task*
Perf. Analysis using RMA - 2
Access time to shared data stores consists of one read instruction or one write instruction.
So small that potential delay time due to blocking of one task by another is considered negligible.
It’s guaranteed to be “short” and to “terminate” so don’t try to compute it as a blocking factor, just include it in its CPU time
Significant priority inversion delays can occur and those are the ones to consider
Perf. Analysis Example using RMA & ESA
Back to the Cruise Control example
Driver initiates an external event (CC lever or pressing the brake)
Must consider the tasks in the event sequence as well as the periodic tasks that execute on an ongoing basis when simply driving under CC
Earlier we replaced the four tasks in the event sequence with an equivalent aperiodic task
Perf. Analysis Example using RMA & ESA -2
Consider the impact of the additional load imposed by the driver-initiated external event on the steady state load of the periodic tasks.
The worst case is when the vehicle is already under automated control (CC). If it weren’t, Speed Adjustment and Throttle Interface wouldn’t be executing so the CPU load would be lighter
Input from CC lever. In the event sequence analysis, we saw CC Lever Interface, CC, Speed Adjustment, and Throttle Interface process this input. (CPU time Ce calculated at slide 29)
Perf. Analysis Example using RMA & ESA -3
Four tasks are involved but they must execute in strict sequence.
Each activated by msg from its predecessor.
The four are equivalent to one aperiodic task
Ce is the sum of the CPU times of the four tasks plus msg communication overhead and context switching overhead. We’ll call the combined task the “event sequence task”
In RMA, can treat aperiodic task as one whose period is the minimum inter-arrival time of the requests. Call it Te = 250 ms.
For now, assume desired response is also Te
Perf. Analysis Example using RMA & ESA -4
When assigning priority to the event sequence task, initially assign its rate monotonic priority.
When we do this, the event sequence task has the same period as two other periodic tasks, Speed Adjustment and Distance & Speed.
Assign the event sequence task the highest priority of those three
The event sequence task still has a lower priority than Shaft Interface, Throttle Interface, and Auto Sensors. (See Table 17.4, Gomaa)
Ce for the event sequence task is 35 ms; Te is 250 ms; therefore CPU utilization Ue is 0.14
Perf. Analysis Example using RMA & ESA -5
Total CPU utilization of the periodic tasks is 0.48 (you can compute that if you don’t believe me)
Total periodic and event sequence task CPU utilization is 0.62 which is less than .69 and therefore less than U(n) where n is the number of periodic tasks plus 1
Therefore, the event sequence task can meet its deadline as can all the periodic tasks.
Perf. Analysis Example using RMA & ESA -6
We made one assumption
All tasks can be allocated their rate monotonic priorities
What is wrong with giving the event sequence task its rate monotonic priority?
What is wrong with giving it the highest priority?
Compromise: give the event sequence task a priority lower than Shaft Interface but higher than Throttle Interface and Auto Sensors. This is higher than its rate monotonic priority.
What does THAT mean we’ll have to do?
Perf. Analysis Example using RMA & ESA -7
Overall CPU utilization is less than 0.69
Bursts of activity can lead to transient loads that are much higher
In the 100 ms worst case CPU burst, the total utilization of the three steady state tasks and the one event sequence task is 67 %, allowing lower priority tasks to execute.
If the next highest priority task, Distance & Speed, were to also execute in this busy 100 ms, CPU utilization would increase to 78%
Comparing to the proper U(n) value, all tasks can meet their deadlines.
Design Restructuring
If proposed design does not meet performance goals, design needs to be restructured
Revisit task clustering criteria and task inversion criteria
Consider sequential task inversion
• CC task sends a speed command msg to the Speed Adj task which in turn sends throttle msgs to the Throttle Interface task.
• These may be combined into one task, the CC task, with passive objects for Speed Adj and Throttle Interface. This eliminates message communication overhead between them plus context switching overhead
Estimation & Measurement of Performance Parameters
Performance input parameters must be determined through estimation or measurement before the performance analysis is carried out.
These input parameters are the independent variables of the performance analysis
Dependent variables are variables whose values are estimated by the real-time scheduling theory
Assumption for RMA: all tasks are locked in main memory so there is no paging overhead. Typically paging overhead cannot be tolerated in real-time system design.
Estimation & Measurement of Performance Parameters -- 2
Individual task parameters that need to be estimated for each task involved in the performance analysis
• Task’s period Ti, which is the interval between its successive executions
• Execution time Ci, which is the CPU time required for the period
CPU overheads
• Context switching overhead
• Interrupt handling overhead
• Inter-task communication and synchronization overhead