Feedback Control Real-Time Scheduling
A Dissertation
Presented to the Faculty of the School of Engineering and Applied Science
University of Virginia
In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Computer Science
by
Chenyang Lu
May 2001
Copyright by
Chenyang Lu
All Rights Reserved
May 2001
Approvals
This dissertation is submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Computer Science
__________________________________
Chenyang Lu
Approved:
__________________________________
John A. Stankovic (Advisor)
__________________________________
Sang H. Son (Chair)
__________________________________
Tarek F. Abdelzaher
__________________________________
Marty Humphrey
__________________________________
Jörg Liebeherr
__________________________________
Gang Tao (Minor Representative)
Accepted by the School of Engineering and Applied Science:
__________________________________
Richard W. Miksad (Dean)
May 2001
Abstract
We develop Feedback Control real-time Scheduling (FCS) as a unified framework to
provide Quality of Service (QoS) guarantees in unpredictable environments (such as e-
business servers on the Internet). FCS includes four major components. First, novel
scheduling architectures provide performance control to a new category of QoS critical
systems that cannot be addressed by traditional open loop scheduling paradigms. Second,
we derive dynamic models for computing systems for the purpose of performance
control. These models provide a theoretical foundation for adaptive performance control.
Third, we apply established control methodology to design scheduling algorithms with
proven performance guarantees, which is in contrast with existing heuristics-based
solutions relying on laborious design/tuning/testing iterations. Fourth, a set of control-
based performance specifications characterizes the efficiency, accuracy, and robustness
of QoS guarantees.
The generality and strength of FCS are demonstrated by its instantiations in three
important applications with significantly different characteristics. First, we develop
real-time CPU scheduling algorithms that guarantee low deadline miss ratios in systems
where task execution times may deviate from estimations at run-time. We solve the
saturation problems of real-time CPU scheduling systems with a novel integrated control
structure. Second, we develop an adaptive web server architecture to provide relative and
absolute delay guarantees to different service classes with unpredictable workloads. The
adaptive architecture has been implemented by modifying an Apache web server.
Evaluation experiments on a testbed of networked Linux PCs demonstrate that our server
provides robust relative/absolute delay guarantees despite instantaneous changes in the
user population. Third, we develop a data migration executor for networked storage
systems that migrates data on-line while guaranteeing specified I/O throughput of
concurrent applications.
Acknowledgements
First, thanks to my advisor, Jack Stankovic, for being a great mentor to me both
personally and professionally. His encouragement, support, and advice are greatly
appreciated. My thanks go to Tarek Abdelzaher, Sang Son, and Gang Tao for sharing
their ideas and insights on research.
Thanks to Guillermo Alvarez, John Wilkes, Michael Hobbs, Ralph Becker-Szendy,
Simon Towers, and all other members of the storage systems program at HP Labs for
offering me a great research environment and their collaborations during my internship at
HP Labs.
Thanks to Jörg Liebeherr and Marty Humphrey for serving on my dissertation
committee and for their valuable suggestions on my dissertation.
Thanks to Jörgen Hansson, Victor Lee, Michael Marley, John Regehr, and all other
members of the real-time systems group for interesting and stimulating discussions.
Thanks to all of my friends for providing invaluable moral support. I want to
especially thank Hainan Lin for helping me through the years in Charlottesville.
Last but not least, I want to thank my parents and my wife for their understanding
and support of my research endeavors and for accompanying me through all the happy and
sad days.
Table of Contents
1. Introduction
1.1. Motivation
1.2. Contributions
2. Related Work
2.1. Classical Real-Time Scheduling
2.2. Real-Time Scheduling for Embedded Digital Control Systems
2.3. QoS Adaptation
2.4. Service Delay Guarantee in Web Servers
2.5. Data Migration in Storage Systems
3. Feedback Control Real-Time Scheduling Framework
3.1. Feedback Control Scheduling Architecture
3.1.1. Control Related Variables
3.1.2. Feedback Control Loop
3.2. Performance Specifications and Metrics
3.2.1. Performance Profile
3.2.2. Load Profile
3.3. Control Theory Based Design Methodology
4. Real-Time CPU Scheduling
4.1. Feedback Control Real-Time Scheduling Architecture
4.1.1. Task Model
4.1.2. Control Related Variables
4.1.3. Feedback Control Loop
4.1.4. Basic Scheduler
4.2. Performance Specifications and Metrics
4.2.1. Performance Profile
4.2.2. Load Profile
4.3. Modeling the Controlled Real-Time System
4.4. Design of FC-RTS Algorithms
4.4.1. Design of the Controller
4.4.2. Closed-Loop System Model
4.4.3. Control Tuning and Analysis
4.4.4. FC-RTS Algorithms
4.5. Experiments
4.5.1. FECSIM Real-Time System Simulator
4.5.2. Scheduling Policy of the Basic Scheduler
4.5.3. Workload
4.5.4. QoS Actuator
4.5.5. Profiling the Controlled Real-Time Systems
4.5.6. Controller Parameters
4.5.7. Performance References
4.5.8. Evaluation Experiment A: Arrival Overload
4.5.9. Evaluation Experiment B: Arrival/Internal Overload
4.6. Comparison of Real-Time Scheduling Algorithms in Overload
4.7. Summary
5. Web Server with Delay Guarantees
5.1. Introduction
5.2. Background
5.3. Semantics of Service Delay Guarantees
5.4. A Feedback Control Architecture for Web Server QoS
5.4.1. Connection Scheduler
5.4.2. Server Processes
5.4.3. Monitor
5.4.4. Controllers
5.5. Design of the Controller
5.5.1. Performance Specifications
5.5.2. Modeling the Web Server: A System Identification Approach
5.5.3. Root-Locus Design
5.6. Implementation
5.7. Experimentation
5.7.1. Comparing Connection Delays and Response Times
5.7.2. System Identification
5.7.3. Evaluation of the Adaptive Web Server
5.8. Summary
6. Online Data Migration in Storage Systems
6.1. Introduction and Motivations
6.2. Aqueduct: a Feedback Control Architecture for Online Data Migration
6.2.1. Migration Planner
6.2.2. LV Mover
6.2.3. QoS guarantees
6.2.4. The Feedback Control Loop
6.2.5. The Monitor
6.2.6. The Controller
6.2.7. The Actuator
6.3. Design and Analysis of the Controller
6.3.1. The Dynamic Model
6.3.2. Controller Tuning and Analysis
6.4. Implementation
6.5. Experiments
6.5.1. Experiment Configurations
6.5.2. Migration Penalty
6.5.3. System Profiling
6.5.4. Performance Evaluation
6.6. Conclusion and Future Work
7. General Issues
7.1. Granularity of Performance Control
7.2. Sampling Period and Overhead
7.3. Robustness of Linear Models and PI Control
8. Conclusions and Future Work
Reference
List of Figures
Figure 3.1 The FCS Architecture
Figure 3.2 Control Theory based Design Methodology for FCS Algorithms
Figure 4.1 Feedback Control Real-Time Scheduling Architecture
Figure 4.2 The Model of the Controlled System
Figure 4.3 Closed-Loop System Model for Real-Time CPU Scheduling
Figure 4.4 System Response to Reference Input
Figure 4.5 System Response to Disturbance Input
Figure 4.6 Settling Time vs. Process Gain
Figure 4.7 The FC-UM Algorithm
Figure 4.8 The FECSIM Simulator
Figure 4.9 Controlled Variables vs. Total Requested Utilization
Figure 4.10 Response to Arrival Overload SL(0, 150%) (DM/PA)
Figure 4.11 Response to Arrival Overload SL(0, 150%) (EDF/P)
Figure 4.12 Execution Time Factor Ga in Experiment B
Figure 4.13 Response to Arrival/Internal Overload (DM/PA)
Figure 4.14 Response to Arrival/Internal Overload (EDF/P)
Figure 4.15 Average Performance of FC-RTS algorithms and the Baseline
Figure 5.1 The Feedback-Control Architecture for Delay Guarantees
Figure 5.2 Architecture for system identification
Figure 5.3 The Root Locus of the web server model
Figure 5.4 Connection delay and response time
Figure 5.5 System identification results for Relative Delay
Figure 5.6 System Identification Results for Absolute Delay
Figure 5.7 Evaluation Results of Relative Delay Guarantees between Two Classes
Figure 5.8 Evaluation Results of Relative Delay Guarantees for Three Classes
Figure 5.9 Evaluation of Absolute Delay Guarantees
Figure 6.1 Aqueduct: The Feedback Control Architecture for Data Migration
Figure 6.2 Step Response of Aqueduct
Figure 6.3 Device iops during data migration
Figure 6.4 Migration Penalty in Experiment 1
Figure 6.5 Migration Penalty in Experiment 2
Figure 6.6 Relationship between migration speed and migration speed
Figure 6.7 Device iops and control input of Aqueduct
Figure 6.8 Average iops of AFAP and Aqueduct, and Aqueduct in steady state
Figure 6.9 QoS Violation Ratio of AFAP, Aqueduct, and Aqueduct in Steady State
Figure 6.10 QoS violation ratio using 0.98IS
Figure 6.11 Worst QoS Violations of AFAP, Aqueduct, Aqueduct in steady state
Figure 6.12 Execution Time of Migration Plan
List of Tables
Table 4.1 Testing Configurations
Table 4.2 Controller Parameters of FC-RTS Algorithms
Table 4.3 Performance References of FC-RTS Algorithms
Table 4.4 The Performance Profiles of FC-U in Experiment B
Table 4.5 The Performance Profiles of FC-M in Experiment B
Table 4.6 The Performance Profiles of FC-UM in Experiment B
Table 4.7 Comparison of Real-Time Scheduling Paradigms in Overload Conditions
Table 5.1 Variables and Parameters of the Absolute Delay Controller CAk
Table 5.2 Variables and Parameters of the Relative Delay Controller CRk
List of Symbols
C(k) a controlled variable
CS a performance reference
U(k) a manipulated variable
TS the settling time
CO the overshoot
ESC the steady-state error
SP the sensitivity with regard to a system parameter P
SL(Ln, Lm) the step load that increases instantaneously from Ln to Lm
RL(Ln, Lm, TR) the ramp load that increases linearly from Ln to Lm within TR sec
Di[j] the relative deadline of task i at QoS level j
EEi[j] the estimated execution time of task i at QoS level j
AEi[j] the actual execution time of task i at QoS level j
Vi[j] the value of task i at QoS level j
Pi[j] the invocation period of periodic task i at QoS level j
EIi[j] the estimated inter-arrival-time of aperiodic task i at QoS level j
AIi[j] the average inter-arrival-time of aperiodic task i at QoS level j
Bi[j] the estimated CPU utilization of task i at QoS level j
Ai[j] the actual CPU utilization of task i at QoS level j
Ga(k) the utilization ratio in the kth sampling period
GA the worst-case utilization ratio
Gm(k) the miss ratio factor in the kth sampling period
GM the worst-case miss ratio factor
Ath(k) the schedulable utilization threshold GA in the kth sampling period
Wk the absolute or relative connection delay guarantee of service class k
Ck(m) the connection delay of class k in the mth sampling period
Bk(m) the process budget of class k in the mth sampling period
Rm(k) the inter-submove-time in the kth sampling period
Ii(k) the number of I/O per sec of device i in the kth sampling period
List of Abbreviations
FCS Feedback Control real-time Scheduling
RM Rate Monotonic scheduling policy
EDF Earliest Deadline First scheduling policy
DM Deadline Monotonic scheduling policy
Chapter 1
Introduction
1.1. Motivation
Real-time scheduling algorithms fall into two categories: static and dynamic scheduling.
In static scheduling, the scheduling algorithm has complete knowledge of the task set and
its constraints, such as deadlines, computation times, precedence constraints, and future
release times. The Rate Monotonic (RM) algorithm and its extensions [40][48] are static
scheduling algorithms and represent one major paradigm for real-time scheduling. In
dynamic scheduling, however, the scheduling algorithm does not have complete
knowledge of the task set or its timing constraints. For example, new task activations, not
known to the algorithm when it is scheduling the current task set, may arrive at a future
unknown time. Dynamic scheduling can be further divided into two categories:
scheduling algorithms that work in resource sufficient environments and those that work
in resource insufficient environments. Resource sufficient environments are systems
where the system resources are sufficient to a priori guarantee that, even though tasks
arrive dynamically, at any given time all the tasks are schedulable. Under certain
conditions, Earliest Deadline First (EDF) [48][71] is an optimal dynamic scheduling
algorithm in resource sufficient environments. EDF is a second major paradigm for real-
time scheduling. While real-time system designers try to design the system with
sufficient resources, because of cost and unpredictable environments, it is sometimes
impossible to guarantee that the system resources are sufficient. In this case, EDF's
performance degrades rapidly in overload situations. The Spring scheduling algorithm
[79] can dynamically guarantee incoming tasks via on-line admission control and
planning and thus is applicable in resource insufficient environments. Many other
algorithms [71] have also been developed to operate in this way. These admission-
control-based algorithms represent the third major paradigm for real-time scheduling.
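The resource sufficient case described above can be made concrete with the classical utilization tests for periodic tasks. The sketch below is illustrative only and is not taken from this dissertation: it assumes independent, preemptable periodic tasks on one processor with deadlines equal to periods, under which EDF meets all deadlines if and only if the total utilization is at most 1, while the Rate Monotonic bound n(2^(1/n) - 1) is only a sufficient condition. The function names and the sample task set are hypothetical.

```python
from fractions import Fraction

def total_utilization(tasks):
    """Sum of execution_time / period over (execution_time, period) pairs."""
    return sum(Fraction(c) / Fraction(p) for c, p in tasks)

def edf_schedulable(tasks):
    """Exact EDF test under the stated assumptions: utilization <= 1."""
    return total_utilization(tasks) <= 1

def rm_bound(n):
    """Sufficient (not necessary) utilization bound for Rate Monotonic."""
    return n * (2 ** (1.0 / n) - 1)

def rm_sufficient(tasks):
    """True if the task set passes the sufficient RM utilization test."""
    return float(total_utilization(tasks)) <= rm_bound(len(tasks))

# Hypothetical task set: (execution time, period); utilization is 0.95.
tasks = [(1, 4), (2, 5), (3, 10)]
print(edf_schedulable(tasks))               # passes the EDF test
print(rm_sufficient(tasks))                 # fails the sufficient RM test
print(edf_schedulable(tasks + [(1, 10)]))   # one more task overloads the CPU
```

Both tests depend on execution times and periods being known in advance, which is the assumption that breaks down in the unpredictable environments discussed next.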
However, despite the significant body of results in these three paradigms of real-time
scheduling, many real world problems are not easily supported. While algorithms such as
EDF, RM and the Spring scheduling algorithm can support sophisticated task set
characteristics (such as deadlines, precedence constraints, shared resources, jitter, etc.),
they are all "open loop" scheduling algorithms. Open loop refers to the fact that once
schedules are created they are not "adjusted" based on continuous feedback. While open-
loop scheduling algorithms can perform well in predictable environments in which the
workloads can be accurately modeled (e.g., traditional process control systems), they can
perform poorly in unpredictable environments, i.e., systems whose workloads cannot be
accurately modeled. For example, the Spring scheduling algorithm assumes complete
knowledge of the task set except for their future release times. Systems with open-loop
schedulers such as the Spring scheduling algorithm are usually designed based on worst-
case workload parameters. When accurate system workload models are not available,
such an approach can result in a highly underutilized system based on extremely
pessimistic estimation of workload.
In recent years, a new category of soft real-time applications executing in open and
unpredictable environments is rapidly growing [69]. Examples include open systems on
the Internet such as online trading and e-business servers, and data-driven systems such
as smart spaces, agile manufacturing, and many defense applications such as C4I. For
example, in an e-business server, neither the resource requirements nor the arrival rate of
service requests are known a priori. However, performance guarantees are required in
these applications. Failure to meet performance guarantees may result in loss of
customers, financial damage, liability violations, or even mission failures. For these
applications, a system design based on open loop scheduling and estimation of worst-case
resource requirements can result in an extremely expensive and underutilized system.
As a cost-effective approach to achieve performance guarantees in unpredictable
environments, several adaptive scheduling algorithms have been recently developed (e.g.,
[5][8][9][24][44][46][55]). While early research on real-time scheduling was concerned
with guaranteeing complete avoidance of undesirable effects such as overload and
deadline misses, adaptive real-time systems are designed to handle such effects
dynamically. There remain many open research questions in adaptive real-time
scheduling. In particular, how can a system designer specify the performance requirement
of an adaptive real-time system? And how can the designer systematically design a scheduling
algorithm to satisfy its performance specifications? The design methodology for
automatic adaptive systems has been developed in feedback control theory [32][34].
However, feedback control theory has been mostly applied in mechanical and electrical
systems. The modeling, analysis and implementation of adaptive real-time systems lead
to significant research challenges.
Recently, several works applied control theory to computing systems. For example,
several papers [4][13][22][23][28][58][63][66][73][75] presented flexible or adaptive
real-time (CPU) scheduling techniques to improve digital control system performance.
These techniques are tailored to the specific characteristics of digital control systems
instead of general adaptive real-time computing systems. Several other papers [6][19]
[44][63][64][74] presented adaptive CPU scheduling algorithms or QoS management
architectures for computing systems such as multimedia and communication systems.
Transient and steady state performance of adaptive real-time systems has received special
attention in recent years. For example, Brandt et al. [19] evaluated a dynamic QoS
manager by measuring the transient performance of applications in response to QoS
adaptations. Rosu et al. [64] proposed a set of performance metrics to capture the
transient responsiveness of adaptations and its impact on applications. The paper
proposed metrics that are similar to the settling time and steady-state error metrics found in
control theory.
However, to the best of our knowledge, no unified framework exists to date for designing an
adaptive system from performance specifications of desired dynamic response. In this
thesis, we establish feedback control real-time scheduling (FCS) [53], a unified
framework of adaptive real-time systems based on feedback control theory. Our control
theoretical framework includes the following elements:
• Feedback control scheduling architectures that map the feedback control structure
to adaptive resource scheduling in real-time systems [52],
• A set of performance specifications and metrics to characterize transient and
steady state performance of adaptive real-time systems [51], and
• A control theory based design methodology for resource scheduling algorithms to
satisfy their performance specifications [50][53].
In contrast with ad hoc approaches that rely on laborious design/tuning/testing
iterations, our framework enables system designers to systematically design adaptive
real-time systems with established analytical methods to achieve desired performance
guarantees in unpredictable environments.
1.2. Contributions
Specifically, the main contributions of this thesis work are as follows:
A control-theoretical foundation for adaptive real-time systems: We apply
control theory to provide a theoretical foundation for adaptive real-time
scheduling. In contrast with some existing scheduling algorithms that utilize
feedback control in an ad hoc manner, we provide theoretical understanding of
feedback control scheduling and develop a systematic design methodology for
adaptive real-time systems with analytically proven performance guarantees in
unpredictable environments.
Design methodology for real-time systems in unpredictable environments:
While traditional design methods for real-time systems depend on workload
parameters known a priori (e.g., worst-case execution times, worst-case arrival
rates, and blocking factors due to resource contentions), our control theory based
design methodology provides robust performance guarantees when accurate
characterizations of the workloads are not available. This feature makes our
design framework especially valuable for performance critical systems in
unpredictable environments, e.g., open systems on the Internet such as online
trading and e-business servers, and data-driven systems such as smart spaces, agile
manufacturing, and many defense applications.
Software architecture for feedback performance control: We develop a
general software architecture for adaptive performance control in unpredictable
environments. Our architecture facilitates control theory based design and
analysis of an adaptive real-time system by mapping it to the structure of
feedback control systems. This architecture includes a set of control-related
variables (performance references, controlled variables and manipulated
variables), and software components such as monitor, actuator, and controller.
Our architecture has been implemented as three instances tailored to the specific
characteristics and performance requirements of different applications including
real-time CPU scheduling, a web server, and data migration in networked storage
systems. These successful instantiations demonstrate the general applicability of
our architecture in software systems in unpredictable environments.
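The roles of the monitor, controller, and actuator described above can be sketched as a sampled loop. This is a minimal illustration under stated assumptions, not the architecture implemented in this work: the plant is a hypothetical toy system whose controlled variable C(k) equals an unknown gain times the manipulated variable U(k), saturating at 1.0, and the controller is a simple incremental law U(k+1) = U(k) + kp * (CS - C(k)). The class name, the gain, and kp are all made up for the example.

```python
class ToyPlant:
    """Hypothetical controlled system: C(k) = min(1, gain * U(k))."""

    def __init__(self, gain):
        self.gain = gain          # unknown to the controller
        self.requested = 0.0      # manipulated variable U(k)

    def measure(self):
        """Monitor: sample the controlled variable C(k)."""
        return min(1.0, self.gain * self.requested)

    def actuate(self, delta):
        """Actuator: apply the change computed by the controller."""
        self.requested = max(0.0, self.requested + delta)


def control_loop(plant, reference, kp, periods):
    """Run the feedback loop for a number of sampling periods."""
    trace = []
    for _ in range(periods):
        c = plant.measure()        # monitor
        error = reference - c      # compare with performance reference CS
        plant.actuate(kp * error)  # controller output applied by the actuator
        trace.append(c)
    return trace


trace = control_loop(ToyPlant(gain=2.0), reference=0.9, kp=0.25, periods=30)
print(trace[:4], trace[-1])
```

In this toy setting the error shrinks by a factor of |1 - kp * gain| each period while the plant is unsaturated, so the loop converges to the reference without knowing the gain exactly, provided that factor is below 1.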
Performance specifications and guarantees: While hard real-time systems
require absolute guarantees, such guarantees are infeasible and unnecessary for
many soft real-time systems in unpredictable environments. We adopt a set of
performance metrics and specifications in control theory to characterize the
transient and steady state performance of adaptive real-time systems. Transient
state performance (including settling time and overshoot) of an adaptive system
represents the responsiveness and efficiency of adaptation in response to
environmental variations, and steady-state performance (including stability,
steady state error, and sensitivity) describes a system's long-term performance. In
contrast, traditional metrics such as average miss-ratio cannot capture the
transient behavior of the system in response to load variations.
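As a rough illustration of how such metrics can be read off a measured response, the sketch below computes overshoot, settling time, and steady-state error from a list of samples of a controlled variable. The precise definitions are assumptions made for this example (a fixed tolerance band around a nonzero reference, and steady-state error taken as the mean deviation after settling); the definitions used in this work are given in Chapter 3 and may differ. The function name and the sample data are hypothetical.

```python
def response_metrics(samples, reference, period=1.0, band=0.02):
    """Overshoot, settling time and steady-state error of a sampled response.

    Assumes reference != 0. 'band' is the half-width of the tolerance band
    that defines steady state in this sketch.
    """
    # Overshoot: largest excursion above the reference, relative to it.
    overshoot = max(0.0, max(samples) - reference) / reference

    # Settling: first sample after which the response stays inside the band.
    settle_index = None
    for k in range(len(samples)):
        if all(abs(c - reference) <= band for c in samples[k:]):
            settle_index = k
            break
    if settle_index is None:
        return {"overshoot": overshoot, "settling_time": None,
                "steady_state_error": None}

    steady = samples[settle_index:]
    error = sum(steady) / len(steady) - reference
    return {"overshoot": overshoot,
            "settling_time": settle_index * period,
            "steady_state_error": error}


# Made-up response to a step load, sampled every 0.5 sec.
samples = [0.0, 0.6, 1.2, 1.05, 0.99, 1.0, 1.01, 1.0]
metrics = response_metrics(samples, reference=1.0, period=0.5)
print(metrics)
```

An average over the whole run would hide the excursion to 1.2 in this trace, which is the limitation of average miss-ratio style metrics noted above.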
Modeling real-time computing systems: Unlike traditional control systems such
as electrical and mechanical systems, real-time computing systems do not have
readily available differential/difference equations that can be used in control
analysis. In this thesis work, we apply an analytical approach and system
identification techniques to the modeling of three computing systems: a generic
CPU-bound real-time system, a modified Apache web server, and a networked
storage system. In the analytical approach, a system designer describes a system
directly with mathematical equations based on the knowledge of the system
dynamics. When such knowledge is not available (as in the case of the Apache
web server), we use system identification [11] to estimate the system model based
on system input/output from profiling experiments. This modeling methodology
and established analytical models provide a basis for the application of control
theory to adaptive real-time scheduling.
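The idea behind system identification can be sketched with a least-squares fit of a difference equation to input/output data. The example below is an assumption-laden illustration rather than the procedure used in this work: it assumes a first-order model y(k) = a*y(k-1) + b*u(k-1), generates synthetic profiling data from known parameters, and recovers them by solving the 2x2 normal equations. The model order, the function name, and the data are hypothetical.

```python
def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y(k) = a*y(k-1) + b*u(k-1)."""
    ks = range(1, len(y))
    syy = sum(y[k - 1] * y[k - 1] for k in ks)
    syu = sum(y[k - 1] * u[k - 1] for k in ks)
    suu = sum(u[k - 1] * u[k - 1] for k in ks)
    sy_t = sum(y[k - 1] * y[k] for k in ks)
    su_t = sum(u[k - 1] * y[k] for k in ks)

    # Solve [[syy, syu], [syu, suu]] [a, b]^T = [sy_t, su_t]^T by Cramer's rule.
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("input does not excite the system enough to fit")
    a = (sy_t * suu - syu * su_t) / det
    b = (syy * su_t - syu * sy_t) / det
    return a, b


# Synthetic profiling run: square-wave input applied to a known system.
true_a, true_b = 0.7, 0.3
u = [1.0 if (k // 3) % 2 == 0 else 0.0 for k in range(30)]
y = [0.0]
for k in range(1, len(u)):
    y.append(true_a * y[k - 1] + true_b * u[k - 1])

a, b = fit_first_order(u, y)
print(round(a, 6), round(b, 6))
```

With noise-free data the fit recovers the generating parameters; with measured data the estimates are only as good as the chosen model order and how strongly the input excites the system.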
Handling non-linearities of real-time systems: The control design of an
adaptive resource scheduler is non-trivial due to the non-linearities and unknown
or random factors in many real-time computing systems. We solved these
problems with model linearization techniques and novel control structures based
on the particular characteristics of real-time systems. Our work demonstrates that
robust performance control can be achieved despite the intrinsic non-linearities
and uncertainties of real-time systems.
Practical FCS implementation in three applications: Using our design
framework, we develop practical resource scheduling algorithms that can provide
robust (steady state and transient) performance guarantees in unpredictable
environments, while traditional scheduling algorithms fail to provide such
guarantees. We develop FCS algorithms for three application domains including
real-time CPU scheduling, web servers, and storage systems. These applications
are significantly different in terms of semantics of performance guarantees,
scheduled resources, monitor/actuator mechanisms, and system models. Our
evaluation experiments demonstrate that our FCS algorithms based on the FCS
framework successfully achieved robust performance guarantees in all three
applications. The success in these applications demonstrates that FCS is a unified
framework for adaptive computing systems.
Real-Time CPU Scheduling: We develop a set of feedback control real-time
scheduling (FCS) algorithms that guarantee low deadline miss ratio
and high CPU utilization by dynamically adjusting task QoS levels and
CPU requirements. Simulation experiments demonstrate that our FCS
algorithms provide robust steady and transient state performance
guarantees in terms of deadline miss ratio even when task execution
times vary considerably from their estimates and when the system's
schedulable utilization bound is unknown.
Connection Scheduling in Web Servers: We develop adaptive connection
scheduling algorithms that provide relative, absolute and hybrid service
delay guarantees for different service classes on web servers under HTTP
1.1. The scheduling algorithms feature feedback control loops that
enforce delay guarantees for classes via dynamic connection scheduling
and server process reallocation. The scheduling algorithms have been
implemented by modifying an Apache web server. Experimental results
demonstrate that our adaptive server provides robust delay guarantees
when web workload varies significantly. Properties of our adaptive web
server also include guaranteed stability, and satisfactory efficiency and
accuracy in achieving desired delay or delay differentiation. Our new real-
time web server will be particularly useful for e-business and e-trading
applications, where a priori QoS guarantees are desirable in the face of bursty
and unpredictable workloads from the Internet.
On-line Data Migration in Storage Systems: We have extended our work
to a non-real-time application, on-line data migration in storage systems.
On-line data migration is necessary in large-scale storage systems (e.g.,
data centers of e-business and large organizations, and multimedia service
centers such as video-on-demand) for performance optimization,
load balancing, and backup operations. However, data migration can
cause unacceptable performance degradations in concurrent applications
due to excessive resource contentions on the storage system. We develop
an adaptive data migration executor with a feedback control architecture
that guarantees desired I/O throughput for applications by dynamically
regulating the speed of data migration. The migration executor has been
implemented and evaluated at a storage testbed at HP Labs. Our
evaluation experiments demonstrate that our adaptive migration executor
achieved specified I/O throughput of all devices at the cost of slowing
down data migration. Our work on storage systems demonstrates the
generality of our control-theory-based framework in non-real-time
systems.
Technology Impact: Not only have we produced several research papers
[6][50][51][52][53][70], parts of this thesis work have also been transferred to
other university research groups. We have sent our real-time CPU scheduling
simulator FECSIM and the feedback control CPU scheduling algorithms to a
group in Sweden for them to study the algorithms. We have transferred the source
code of our adaptive web server and system identification software to Professor
Lui Sha's group at UIUC and given them input on the modeling of web servers. The
project of online data migration in networked storage systems was conducted
when the author was a research intern in the Storage Systems Program at Hewlett
Packard Laboratories (Palo Alto). Hewlett Packard is in the process of applying
the feedback control data migration technique developed in the Aqueduct project
for a patent.
The rest of the thesis is organized as follows. We discuss the state-of-the-art in
Chapter 2. In Chapter 3, we present the general control-theory based design methodology
for adaptive real-time systems. The first case study, feedback control real-time CPU
scheduling, is presented in Chapter 4. The second case study, adaptive connection
scheduling for service delay guarantees in web servers, is presented in Chapter 5. The
third case study, on-line data migration with I/O throughput guarantees on concurrent
applications in storage systems, is presented in Chapter 6. After summarizing several
general issues in Chapter 7, we conclude the thesis in Chapter 8.
Chapter 2
Related Work
Real-time resource scheduling has evolved from static to dynamic and
adaptive as target application environments have become increasingly unpredictable.
While classical real-time scheduling is concerned with absolute guarantees in highly
predictable environments, more recent research aims at developing more flexible,
adaptive and cost-effective solutions to handle unpredictable environments. This thesis
work establishes a theoretical foundation and unified framework for achieving a new
category of performance guarantees in unpredictable environments with adaptive real-
time resource scheduling. In this chapter, we summarize the work related to this thesis
research. The classical results on real-time scheduling are described in Section 2.1. A
category of flexible and adaptive real-time scheduling algorithms tailored for digital
control systems is summarized in Section 2.2. In Section 2.3, we then describe existing
QoS adaptation techniques and compare them with our FCS framework. Related works
on web server delay guarantees and storage systems are summarized in Sections 2.4 and
2.5, respectively.
2.1. Classical Real-Time Scheduling
Classical real-time scheduling algorithms depend on a priori characterization of
workload and systems to provide performance guarantees in predictable environments
(e.g., embedded process control and avionics). For example, Rate Monotonic (RM)
[40][48] and Earliest Deadline First (EDF) [48][71] require complete knowledge about
the task set such as resource requirements, precedence constraints, resource contention,
and future arrival times. Dynamic real-time systems [71] pioneered by the Spring project
[79] provide guarantees upon new task arrivals with on-line admission control and
planning. Unlike earlier systems based on RM or EDF, the dynamic real-time systems do
not require future task arrival time to be known a priori. However, the on-line admission
control and planning in the above dynamic systems still depend on a priori task set
characterizations including resource requirements, precedence constraints, and resource
contention. While classical algorithms such as EDF, RM and the Spring scheduling
algorithm can support sophisticated task set characteristics, they cannot provide
performance guarantees in systems operating in unpredictable environments where an
accurate workload model is not available. Such systems include Internet servers (e.g., on-
line stock trading and e-business) and data-driven systems (e.g., smart spaces and agile
manufacturing). A key observation that motivated this thesis work is that a fundamental
reason for the inadequacy of classical real-time scheduling in unpredictable environments
lies in its open loop nature. Because they do not adjust schedules based on continuous
performance feedback, open loop schedulers schedule tasks and system resources based on
worst-case workload estimations. When accurate system workload models are not
available, the open loop approach may result in a highly underutilized system based on
extremely pessimistic estimation of workload. In contrast, feedback control real-time
scheduling provides robust performance guarantees in unpredictable environments with a
closed loop approach.
2.2. Real-Time Scheduling for Embedded Digital Control Systems
There have been several results that have applied feedback control theory to the design of
real-time computing systems. For example, several papers [30][58][65][66] presented co-
design methods for real-time scheduling algorithms and embedded digital control
systems. The co-design methods trade off the quality of control performance against its
computation requirements to produce more cost-effective system designs than separate
design of control and scheduling. These approaches are off-line solutions, and their on-line
scheduling algorithms are still classical open-loop algorithms such as EDF and RM.
Several other papers presented on-line scheduling algorithms [4][16][22][23][30][73] to
improve the robustness of digital control systems by dynamically relaxing the timing
constraints within the tolerable range of the digital control system in overload conditions.
However, these techniques require a priori knowledge of the tasks such as execution
times. Furthermore, these techniques are tailored to CPU-bound digital controllers and
are not applicable to other computing systems such as e-business servers and on-line
trading where the performance bottleneck may not be the CPU.
2.3. QoS Adaptation
The concept of using performance feedback to adjust the schedule has been incorporated
in general-purpose operating systems in the form of multi-level feedback queue
scheduling [18]. The system adjusts a task's priority based on whether it consumes a time
slice or is blocked due to I/O. This type of feedback control is based on intuitive solutions
rather than systematic control derivation to achieve performance guarantees.
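The priority adjustment described above can be sketched as follows. This is a simplified illustration in Python, not the policy of any particular operating system: the function name adjust_priority, the convention that level 0 is the highest priority, and the one-level promotion and demotion steps are all assumptions made for the example.

```python
def adjust_priority(level, used_full_slice, num_levels):
    """Return the new queue level of a task after it leaves the CPU.

    Level 0 is the highest priority (an assumption of this sketch).
    """
    if used_full_slice:
        # The task consumed its whole time slice: treat it as CPU-bound
        # and move it one level down, stopping at the lowest queue.
        return min(level + 1, num_levels - 1)
    # The task blocked on I/O before the slice expired: treat it as
    # interactive and move it one level up, stopping at the top queue.
    return max(level - 1, 0)
```

The rule reacts to observed behavior, but it is not derived from a model of the system, so it does not by itself yield analytical performance guarantees.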
In recent years, QoS adaptation architectures and algorithms have been developed to
support applications such as communication subsystems [8], multimedia [19][24],
distributed visual tracking [46] and operating systems [55][61][63][69][78]. Some of
these techniques [55][61][63] include optimization algorithms to optimize the value in
QoS adaptation. However, their optimization algorithms assume that the resource
requirement of every QoS level is a priori known. In contrast, our FCS framework
provides performance guarantees even when the resource requirements are unknown or
deviate from the estimations. Several other works [8][21][25][78] developed feedback
based adaptation algorithms that do not depend on completely accurate knowledge about
workloads. However, their feedback loops were based on heuristics and they did not
establish time domain analysis on the efficiency of QoS adaptation in response to run-
time variations. Our FCS framework provides a unified framework to design adaptive
real-time systems with proven transient state performance.
Li and Nahrstedt utilized control theory to develop a feedback control loop to
guarantee desired network packet rate in a distributed visual tracking system [46]. Hollot,
Misra, Towsley, and Gong [36] apply control theory to analyze a congestion control
algorithm on IP routers. While these works also use control theory analysis on
computing systems, they do not address timing constraints and service delays on end
server systems, which are the focus of this thesis.
Transient and steady state performance of QoS adaptation has received special
attention in recent years (e.g., [19][64][75]). For example, Brandt et al. [19] evaluated a
dynamic QoS manager by measuring the transient performance of applications in
response to QoS adaptations. Rosu et al. [64] proposed a set of performance metrics to
capture the transient responsiveness of adaptations and its impact on applications.
However, they did not provide a methodology to design a system from its performance
specifications in terms of the above metrics. Instead, they only used the metrics in system
testing. In contrast, by extending and mapping these metrics to the dynamic response of
control systems, our FCS framework provides a control-theory-based methodology to
design a system to analytically satisfy its performance specifications.
2.4. Service Delay Guarantee in Web Servers
Support for different classes of service on the Web (with special emphasis on server
delay differentiation) has been investigated in recent literature. For example, the authors
of [28] proposed and evaluated an architecture in which restrictions are imposed on the
amount of server resources (such as threads or processes), which are available to basic
clients. In [9][10] admission control and scheduling algorithms are used to provide
premium clients with better service. In [17] a server architecture is proposed that
maintains separate service queues for premium and basic clients, thus facilitating their
differential treatment. While the above differentiation approach usually offers better
service to premium clients, it does not provide any guarantees on the service and hence
can be called the best effort differentiation model.
Notably, a feedback control loop was used in [5][6][9] to control the desired CPU
utilization of a web server with adaptive admission control. Their CPU utilization control
can be extended to guarantee the desired absolute delay in web servers under HTTP 1.0
protocol and when CPU is the bottleneck resource. This technique is not applicable to
servers under HTTP 1.1 protocol, which can be handled by our adaptive server described
in Chapter 5. A least squares estimator was used in [1] for automatic profiling of resource
usage parameters of a web server. However, the work did not establish a dynamic
model for the server.
Several other works such as [13][26] developed kernel-level mechanisms to achieve
overload protection and proportional resource allocations in server systems. Their work
did not utilize feedback control, nor did they provide any relative or absolute delay
guarantees. Support for proportional differentiated services in network routers has been
investigated in [26][47]. That work did not address end systems such as web servers.
2.5. Data Migration in Storage Systems
An old approach to performing backups and data relocations is to do them at night, while
the system is idle. As discussed, this does not help with many current applications such
as e-business that require continuous operation and adaptation to quickly changing
system/workload conditions. The approach of bringing the whole (or parts of the) system
offline is also impractical due to the substantial business costs that it incurs. Online
migration and backup are still in their infancy in the current state of the art. Some
existing tools such as the Veritas Volume Manager [75] can guarantee consistent access
to each piece of data while it is being migrated. However, we are not aware of any
existing solution that handles concurrent accesses while bounding the impact of
migration on concurrent applications.
Chapter 3
Feedback Control Real-Time Scheduling
Framework
In this chapter, we describe feedback control real-time scheduling (FCS), a unified
framework of adaptive real-time systems based on feedback control theory. The FCS
framework includes the following elements:
A feedback control scheduling architecture that maps adaptive resource
scheduling in real-time systems [52] to feedback control loops,
A set of performance specifications and metrics [51] to characterize transient and
steady state performance of adaptive real-time systems, and
A control theory based design methodology [50][53] for resource scheduling
algorithms to satisfy their performance specifications.
A key feature of the FCS framework is its use of feedback control theory (rather than
ad hoc solutions) as a scientific underpinning. The FCS framework enables system
designers to systematically design adaptive real-time systems with established analytical
methods to achieve analytically provable performance guarantees in unpredictable
environments. To the best of our knowledge, this is the first unified framework that provides a
fundamental theory and analytical design methodology for adaptive real-time systems to
achieve specified performance guarantees in unpredictable environments. In this chapter,
we describe the elements of the general FCS framework at a high level. The specific
technical challenges and solutions are described with its concrete instantiations in three
different application domains: real-time CPU scheduling (Chapter 4), web servers
(Chapter 5), and networked storage systems (Chapter 6).
3.1. Feedback Control Scheduling Architecture
The major components of our FCS architecture are a set of control related variables and a
feedback control loop that maps a feedback control system structure to real-time resource
scheduling.
[Figure 3.1. The FCS Architecture (block diagram; labels: performance reference, error, Controller with control function, control input, Actuator, manipulated variable, Scheduler, Real-Time System, controlled variable, Monitor, sample)]
3.1.1. Control Related Variables
A first step in designing the FCS architecture is to decide the following key variables of a
real-time system in terms of control theory.
Controlled variable C(k): the performance metric that characterizes the system
performance defined over a sampling period ((k-1)W, kW), where W is an
application-specific constant called the sampling window. The scheduler controls
the controlled variable in order to achieve the desired performance. The choice of
controlled variables depends on the performance guarantees that need to be
provided to the specific application of a system. For example, if an absolute delay
guarantee is required in an Internet server (e.g., critical stock trading operations in
an on-line trading system), the (absolute) service delays of HTTP requests should
be defined as the controlled variable. On the other hand, if proportional
differentiated service is required in an Internet server (e.g., e-commerce stores
where customers are classified into different service classes depending on their
monthly fees), the relative delays of service classes become the appropriate
controlled variables. For another example, the deadline miss ratio and the CPU
utilization are typical controlled variables for soft real-time systems (e.g.,
multimedia streaming, process control, and robotics) where explicit timing
constraints need to be respected.
Performance reference CS: the desired system performance in terms of a controlled
variable C(k). The performance reference defines a contract established between
the adaptive resource scheduler and the users such that the performance reference
should be enforced. The difference between the performance reference and the
value of the corresponding controlled variable is called the error EC(k) = CS - C(k).
For example, if a system sets its performance reference to a deadline miss ratio of CS =
2%, and the current miss ratio is 10%, the system has an error EC(k) = -8%.
Manipulated variable U(k): a system attribute that is dynamically changed by the
scheduler. The manipulated variable should be effective for performance control,
i.e., changing its value should affect the system's controlled variable(s). The
choice of manipulated variable should reflect the resource bottleneck of a system.
For example, the total requested utilization should be used as a
manipulated variable if the CPU is the bottleneck resource of a web server, but it should
not be used as the manipulated variable if the CPU is not the bottleneck resource
(e.g., in the case of HTTP 1.1 as described in Section 5.2).
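To make these definitions concrete, the following Python sketch computes a deadline miss ratio as the controlled variable C(k) over one sampling window, together with the corresponding error. The record type Completion, the function names, the half-open window ((k-1)W, kW], and the choice to report a miss ratio of 0 for an empty window are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    """One task that finished or was aborted (hypothetical record type)."""
    finish_time: float   # seconds since the start of the run
    met_deadline: bool


def miss_ratio(completions, k, window):
    """Controlled variable C(k): deadline miss ratio over ((k-1)W, kW]."""
    start, end = (k - 1) * window, k * window
    in_window = [c for c in completions if start < c.finish_time <= end]
    if not in_window:
        return 0.0  # assumption: an empty window counts as no misses
    missed = sum(1 for c in in_window if not c.met_deadline)
    return missed / len(in_window)


def error(reference, controlled):
    """Error EC(k) = CS - C(k)."""
    return reference - controlled
```

With CS = 2% and a measured miss ratio of 10%, error(0.02, 0.10) evaluates to -0.08 up to floating-point rounding, which matches the example given for the performance reference.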
3.1.2. Feedback Control Loop
The FCS architecture has a feedback control loop that is invoked at every sampling
instant k. Each feedback control loop is composed of a Monitor, a Controller, and an
Actuator.
1) The Monitor measures the controlled variables and feeds the samples back to the
Controller.
2) The Controller compares the performance references with corresponding controlled
variables to get the current errors, and calls control algorithms to compute a control
input, the new value of the manipulated variable based on the errors. The control
algorithm is a critical component with significant impacts on the system performance
and hence is the centerpiece of the design of an FCS algorithm. Note that control
theory enables us to derive the control algorithm and analytically prove that the
algorithm can provide the desired performance guarantees.
3) The Actuator changes the manipulated variable based on the newly computed control
input. The Actuator implements a mechanism that dynamically reallocates
(reschedules) the resource corresponding to the manipulated variable. For example,
corresponding to a manipulated variable of the total requested CPU utilization, we
design a QoS Actuator to dynamically adjust task QoS levels (different QoS levels
have different execution times and/or invocation periods).
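The three steps of the loop can be sketched in Python as follows. This is an illustration under stated assumptions, not one of the algorithms developed in this thesis: the plant ToyCpu (whose measured utilization is the requested utilization scaled by an unknown execution-time factor and saturated at 100%), the integral-style control law U(k+1) = U(k) + K * E(k), and the gain value used in the usage note are all hypothetical.

```python
class ToyCpu:
    """Hypothetical plant: measured utilization saturates at 100%."""

    def __init__(self, requested, load_factor):
        self.requested = requested      # manipulated variable U(k)
        self.load_factor = load_factor  # actual / estimated execution time

    def utilization(self):
        return min(1.0, self.load_factor * self.requested)


def run_loop(cpu, reference, gain, periods):
    """Run the Monitor -> Controller -> Actuator cycle for `periods` samples."""
    samples = []
    for _ in range(periods):
        measured = cpu.utilization()                 # Monitor: sample C(k)
        err = reference - measured                   # Controller: E(k) = CS - C(k)
        control_input = cpu.requested + gain * err   # Controller: new U(k)
        cpu.requested = max(0.0, control_input)      # Actuator: apply it
        samples.append(measured)
    return samples
```

For instance, run_loop(ToyCpu(1.0, 1.5), reference=0.9, gain=0.5, periods=30) starts saturated at a utilization of 1.0 and drives the measured utilization toward 0.9, even though the controller never learns the load factor of 1.5.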
3.2. Performance Specifications and Metrics
We now describe the second element of the FCS framework, the performance
specifications and metrics for adaptive real-time systems. While early research on real-
time computing was concerned with guaranteeing complete avoidance of undesirable
effects such as overload and deadline misses, adaptive real-time systems are designed to
handle such effects dynamically. Using a control theory framework, we characterize the
dynamic performance of an adaptive real-time system in both transient and steady state
upon load or resource changes. Transient behavior of an adaptive system represents the
responsiveness and efficiency of adaptation in reacting to changes in run-time conditions,
and steady-state behavior describes a system's long-term performance after its transient
response settles. In contrast, traditional metrics such as the average miss-ratio often fail
to capture the transient behavior of the system in response to load variations. Another
important advantage of our metrics is that they formulate the performance of real-time
systems as dynamic responses in control theory, and therefore enable the use of control
design methods to satisfy the specifications. Our performance specifications and metrics
consist of a set of performance profiles (see footnote 1) in terms of the controlled variables. We also
present a set of representative load profiles adapted from control theory [32].
Corresponding to signals widely used in control theory, our load profiles can be used to
provide guidance for control design and generate canonical system response to variations
of run-time conditions.
3.2.1. Performance Profile
The performance profile characterizes important transient and steady state properties of a
system in terms of its controlled variables. Note that when the sampling window W is
small, a controlled variable C(k) approximates the instantaneous system performance at
the sampling instant k. In contrast, traditional metrics for real-time systems such as
average miss-ratio and average utilization are defined based on a much larger time
window than the sampling period W. The average metrics are often inadequate for
characterizing the dynamics of the system performance [50]. From the control theory
point of view, a real-time system transits from the steady state to the transient state when
a controlled variable deviates significantly from its steady state value in response to
variation in its run-time condition. After a time interval in the transient state, the system
may settle down to a new steady state after the feedback control loop converges the
controlled variable to the vicinity of a new value. The steady state is defined as a state
when the controlled variable C(k) stays within % of its performance reference CS. The
performance profile includes the following elements.
Footnote 1: The performance profile has been called the miss-ratio profile in [50] when deadline miss ratio is used as the controlled variable.
Stability: A system is Bounded-Input-Bounded-Output (BIBO) stable if its controlled
variables are always bounded for bounded performance references and
disturbances. Note that the performance of an unstable system can severely and
persistently diverge from the desired performance so as to cause system
malfunctioning and even complete system failure. Stability is a necessary
condition for achieving the desired performance reference. Stability is an especially
important requirement for FCS algorithms because a poorly designed
Controller can overreact to performance errors and push a real-time system to
unstable conditions.
Transient-state response represents the responsiveness and efficiency of adaptive
resource scheduling in reacting to changes in run-time conditions.
Settling time Ts: The time it takes the system to settle down to a steady
state from the start of a transient state. The settling time represents how
fast the system can regain desired performance after a change in its run-
time condition.
Overshoot Co: The maximum amount that a controlled variable overshoots
its reference divided by its reference, i.e., Co = (CM - CS) / CS where CM is
the maximum value of the controlled variable during its transient state.
Overshoot characterizes the worst-case transient performance degradation
of a system. A system may require a low overshoot because severe
transient performance degradation may lead to system failure. For
example, in media players, a high transient deadline miss-ratio can cause
buffer overflows [19].
Steady-state error ESC: The difference between the average value of a controlled
variable in steady state and its reference. The steady state error characterizes how
precisely the system can enforce desired performance in steady state.
Sensitivity SP: Relative change of a controlled variable in steady state with respect
to the relative change of a system parameter P. For example, assuming the
controlled variable is deadline miss ratio, the system's sensitivity with respect to
the task execution time SAE represents how significantly the change in the task
execution time affects the system miss-ratio. Sensitivity describes the robustness
of the system with regard to workload or system variations.
The performance profile establishes a set of metrics of adaptive real-time systems based
on the specification of dynamic response in control theory. The metrics enable system
designers to apply established control theory techniques to achieve stability, and meet
transient and steady state specifications.
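As an illustration of how these metrics could be evaluated on measured data, the following Python sketch computes them from a list of samples C(k) that begins at the start of a transient state. The function names, the tolerance parameter tol (the half-width of the steady-state band around the reference), and the sign convention "average minus reference" for the steady-state error are assumptions of this sketch. The reference is assumed to be non-zero.

```python
def settling_index(trace, reference, tol):
    """First sample index after which C(k) stays within tol of the reference.

    Returns None if the trace ends outside the tolerance band.
    """
    last_outside = -1
    for k, value in enumerate(trace):
        if abs(value - reference) > tol:
            last_outside = k
    settled = last_outside + 1
    return settled if settled < len(trace) else None


def settling_time(trace, reference, tol, window):
    """Settling time Ts, in the same time unit as the sampling window W."""
    k = settling_index(trace, reference, tol)
    return None if k is None else k * window


def overshoot(trace, reference):
    """Overshoot Co = (CM - CS) / CS, with CM the maximum sample."""
    return (max(trace) - reference) / reference


def steady_state_error(trace, reference, tol):
    """Average of the steady-state samples minus the reference."""
    k = settling_index(trace, reference, tol)
    if k is None:
        return None
    steady = trace[k:]
    return sum(steady) / len(steady) - reference


def sensitivity(c_before, c_after, p_before, p_after):
    """Relative change of C in steady state over relative change of P."""
    return ((c_after - c_before) / c_before) / ((p_after - p_before) / p_before)
```

In this sketch the steady-state band is supplied by the caller, so the same trace can be checked against different tolerance requirements.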
3.2.2. Load Profile
According to control theory, the performance profile of an adaptive system may be
specified assuming representative load profiles including step load and ramp load. The
step load represents the worst case of load variation that overloads the system
instantaneously, while the ramp load represents a nominal form of load variation. The
load profiles are defined as follows.
Step-load SL(Ln, Lm): a load profile that instantaneously jumps from a nominal
load Ln to a higher load Lm > Ln and stays constant after the jump. Instantaneous
load change such as the step load is more difficult to handle than gradual load
change.
Ramp-load RL(Ln, Lm, TR): a load profile that increases linearly from the nominal
load Ln to a higher load Lm > Ln during a time interval of TR sec. Compared with
the step load, the ramp signal represents a less severe load variation scenario.
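The two load profiles can be written as simple functions of time. The following Python sketch is illustrative only: the function names and the t_start parameter (the instant at which the load change begins) are assumptions, and the load unit is left to the application.

```python
def step_load(t, nominal, peak, t_start=0.0):
    """SL(Ln, Lm): the load jumps from nominal to peak at t_start and stays there."""
    return peak if t >= t_start else nominal


def ramp_load(t, nominal, peak, ramp_duration, t_start=0.0):
    """RL(Ln, Lm, TR): the load rises linearly from nominal to peak over ramp_duration."""
    if t <= t_start:
        return nominal
    if t >= t_start + ramp_duration:
        return peak
    return nominal + (peak - nominal) * (t - t_start) / ramp_duration
```

For example, ramp_load(5.0, 0.5, 1.5, 10.0) returns 1.0, the midpoint of a ramp from 0.5 to 1.5 over 10 seconds.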
One key advantage of using the above load profiles for performance specification is
that they are amenable to well-established design and analysis methods in control theory
and, therefore, fit well with our control theoretical framework. This means that a system
designer can use control theory methods to analytically design the system to satisfy a
performance profile in response to a load profile as defined above. Specifically, a load
profile can be modeled as disturbance signals in the form of a step or ramp signal (see
Section 4.4). Based on control theory, a linear system's dynamic properties can be
determined by its dynamic response to a step signal or a ramp signal regardless of the signal's
parameters, including the magnitude of load variation (Lm-Ln) and the ramp duration TR. If
a real-time system can be approximated with a linear model in its operation conditions,
its performance profile can be determined by stressing the system with a step load, i.e.,
the system can achieve satisfactory performance under any combination of step and
ramp loads if its performance profile in response to a step load or ramp load satisfies its
specifications.
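The scaling argument for linear systems can be checked numerically. The Python sketch below assumes a hypothetical first-order linear difference model y(k+1) = a*y(k) + b*d(k), which is not one of the models derived in this thesis, and drives it with step disturbances of two different magnitudes. After each response is divided by its step magnitude, the two traces coincide, so the shape of the response does not depend on Lm-Ln.

```python
def step_response(a, b, magnitude, periods):
    """Response of y(k+1) = a*y(k) + b*d(k) to a step d(k) = magnitude, y(0) = 0."""
    y = 0.0
    trace = []
    for _ in range(periods):
        trace.append(y)
        y = a * y + b * magnitude
    return trace


def normalized(trace, magnitude):
    """Scale a response by the magnitude of the step that produced it."""
    return [y / magnitude for y in trace]


# Hypothetical model parameters and two step magnitudes.
small = normalized(step_response(0.8, 0.2, 0.5, 20), 0.5)
large = normalized(step_response(0.8, 0.2, 2.0, 20), 2.0)
max_gap = max(abs(s - l) for s, l in zip(small, large))
```

A non-linear model, for example one with saturation, would not pass this check in general, because saturation could clip the larger response while leaving the smaller one untouched.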
Unfortunately, if a real-time system is non-linear in its operation conditions, its
dynamic response to arbitrary load variations cannot be determined from
its response to a single step load or a single ramp load, because the system performance
depends on the specific parameters of the load profiles. In this case, the performance
profiles in response to specific load profiles are only indications of the system
performance in general, and the load profiles are application-specific, based on a
set of expected load characteristics and system requirements.
We should also note that a load profile is an abstraction of the workload, and there can
be many possible instantiations of the same load profile. The instantiation of a load
profile should incorporate the knowledge of the workload, and, therefore, the load profile
should be viewed as an enhancement to existing benchmarks (e.g., [37][40][41][42]
[75][77]). For example, the system load can be interpreted as the total requested CPU
utilization in the system where CPU is the bottleneck resource. For another example, the
load of an Internet server may be interpreted as the number of concurrent users.
[Figure 3.2. Control Theory based Design Methodology for FCS Algorithms (flow diagram; labels: Requirement Analysis, Performance Specifications, Modeling, System Model, Controller Design, FCS algorithms, Satisfy)]
3.3. Control Theory Based Design Methodology
The third element of our FCS framework is the control theory based design methodology
(see Figure 3.2). Based on the scheduling architecture and the performance specifications,
we now establish a design methodology based on feedback control theory. Using this
design methodology, a system designer can systematically design an adaptive resource
scheduler to satisfy the system's performance specifications with established analytical
methods. This methodology is in contrast with existing ad hoc approaches that depend on
laborious design/tuning/testing iterations. Our design methodology works as follows.
1) The system designer specifies the desired dynamic behavior with transient and
steady state performance metrics. This step maps the performance requirements of
an adaptive real-time system to the dynamic response specification of a control
system.
2) The system designer establishes a dynamic model of the real-time system for the
purposes of performance control. A dynamic model describes the mathematical
relationship between the control input and the controlled variables of a system
with differential/difference equations or state matrices. Modeling is important
because it provides a basis for the analytical design of the Controller. However,
modeling has been a major challenge for applying control theory to real-time
systems due to the lack of established differential/difference equations to describe
real-time systems. Two different approaches can be used to establish the dynamic
model of a real-time system. The analytical approach directly describes a system
with mathematical equations based on the knowledge of the system dynamics.
When such knowledge is not available, the system identification approach [11]
can be used to estimate the system model based on profiling experiments. In this
thesis work, we apply the analytical approach to model a generic CPU-bound
real-time system and a storage system, and develop a system identification tool
to model a web server whose dynamics are less well understood. Our work represents a first
step in modeling real-time systems using rigorous mathematical equations. Our
modeling methodology and established analytical models provide a foundation for
the application of control theory to adaptive real-time systems in this thesis work
and future works in this area.
3) Based on the performance specifications and system model from steps 1) and 2), the
system designer applies established mathematical techniques (e.g., the Root Locus
method, frequency design, or state based design) of feedback control theory [32]
to design FCS algorithms that analytically guarantee the specified transient and
steady-state behavior at run-time. Compared with existing ad hoc approaches, our
analytical design approach significantly reduces the design time and effort
required for adaptive systems because it requires far fewer design/testing
iterations. Furthermore, the resultant system's parameters can be easily
tuned with existing control theory methods and tools in practice, and the resultant
system can be proved to satisfy its performance specifications. In contrast, the
tuning of adaptive systems designed with ad hoc methods often depends on repeated
testing, guessing, or rules of thumb without performance guarantees at run-time.
In summary, we describe a unified FCS framework for adaptive real-time systems
that provides performance guarantees in unpredictable environments. Our FCS
framework includes 1) a software architecture for feedback performance control, 2) a
set of performance specifications and metrics that describes the efficiency, accuracy,
and robustness of performance guarantees, and 3) a control theory methodology for
designing FCS algorithms to satisfy the performance specifications. In the next three
chapters, we describe the details of three instantiations of the FCS framework in three
application domains.
Chapter 4
Real-Time CPU Scheduling
In this Chapter, we develop a set of novel real-time CPU scheduling algorithms called
FC-RTS [51][52][53][70] that guarantee low deadline miss ratio and high CPU utilization
when the workload deviates from estimations at run-time. Our FC-RTS algorithms provide a
scheduling solution for a new category of soft real-time systems working in unpredictable
environments, whose performance cannot be guaranteed by many existing real-time
scheduling algorithms including RM [43], EDF [70], the Spring algorithm [79], and QoS
adaptation algorithms [4][61]. Such systems include open systems on the Internet such as
on-line trading servers, e-business servers, and on-line media streaming, and data driven
systems such as database applications. For example, in an on-line trading server, the
processing time for a service request often depends on the user input that is unknown to
the scheduler. As another example, in a surveillance system, the processing time of
object tracking based on camera images can vary dramatically due to the movement scope
of the object being tracked [23]. In addition, our FC-RTS algorithms can also provide
performance guarantees for off-the-shelf software applications, components, and device
drivers when accurate information on their execution time and invocation rates is
unavailable.
A motivation for applying the FCS framework to real-time CPU scheduling is the
observation that many existing feedback based scheduling algorithms [8][21][25] are
based on heuristics rather than a theoretical foundation. These algorithms often depend
on laborious design/tuning/testing iterations, and may still fail to handle unexpected or
untested conditions at run-time. While the design methodology for automatic feedback
control systems has been developed in feedback control theory, the modeling, analysis
and implementation of real-time scheduling pose significant challenges for
real-time system research. In this thesis, we design our FC-RTS algorithms based on
feedback control theory by instantiating the FCS framework in real-time CPU scheduling.
Specifically, our major contributions include the following:
• A novel and general feedback control real-time CPU scheduling architecture that
allows plug-ins of different real-time scheduling policies and QoS optimization
algorithms and a set of tuning rules based on the scheduling policies,
• An analytical model of a CPU-bound real-time system, which to the best of our
knowledge is the first dynamic model for generic real-time CPU scheduling,
• A set of analysis results and tuning methods for FC-RTS algorithms to achieve
performance specifications including stability, settling time, overshoot, steady
state performance, and sensitivity with regard to workload variations,
• Practical FC-RTS algorithms applicable to different types of real-time
applications,
• Performance evaluation results demonstrating that our analytically designed
FC-RTS algorithms can provide robust performance guarantees in terms of deadline
miss ratio and CPU utilization, and achieve satisfactory performance profiles in
response to overloads caused by new task arrivals and task execution time
variations.
The feedback control real-time scheduling architecture is described in Section 4.1.
We describe the performance specifications and metrics in Section 4.2. We establish an
analytical model for a real-time system in Section 4.3. Based on the model, we present
the design and control analysis of a set of FC-RTS algorithms in Section 4.4. We present
the performance evaluation results of these scheduling algorithms in Section 4.5. We then
qualitatively compare FC-RTS algorithms with several existing scheduling paradigms in
Section 4.6. Finally, we summarize this chapter in Section 4.7.
[Figure 4.1 diagram labels: Task Arrivals, Current Tasks, Scheduler, QoS Actuator, Basic Scheduler, CPU, Completed/Aborted Tasks, Monitor, Controlled Variables, Controller, Performance References, Control Input]
Figure 4.1. Feedback Control Real-Time Scheduling Architecture
4.1. Feedback Control Real-Time Scheduling Architecture
Our feedback control real-time CPU scheduling (FC-RTS) architecture (illustrated in
Figure 4.1) is composed of four parts: a task model, a set of control related variables, a
feedback control loop that maps a feedback control system structure to real-time CPU
scheduling, and a Basic Scheduler.
4.1.1. Task Model
In our task model, each task Ti has N QoS levels (N ≥ 2). Each QoS level j (0 ≤ j ≤ N-1)
of Ti is characterized by the following attributes:
Di[j]: the relative deadline
EEi[j]: the estimated execution time
AEi[j]: the (actual) execution time that can vary considerably from instance to
instance and is unknown to the scheduler
Vi[j]: the value that task Ti contributes if it is completed at QoS level j before its
deadline Di[j]. The lowest QoS level 0 represents the rejection of the task
and Vi[0] = 0. Every QoS level contributes a miss penalty MPi < 0 if it
misses its deadline.
Periodic tasks:
Pi[j]: the invocation period
Bi[j]: the estimated CPU utilization Bi[j] = EEi[j] / Pi[j]
Ai[j]: the (actual) CPU utilization Ai[j] = AEi[j] / Pi[j]
Aperiodic tasks:
EIi[j]: the estimated inter-arrival-time between subsequent invocations
AIi[j]: the average inter-arrival-time that is unknown to the scheduler
Bi[j]: the estimated CPU utilization Bi[j] = EEi[j] / EIi[j]
Ai[j]: the (actual) CPU utilization Ai[j] = AEi[j] / AIi[j]
In this model, a higher QoS level of a task has a higher (both estimated and actual)
CPU utilization and contributes a higher value if it meets its deadline, i.e., Bi[j+1] > Bi[j],
Ai[j+1] > Ai[j], and Vi[j+1] > Vi[j]. In the simplest form, each task only has two QoS
levels (corresponding to the admission and the rejection of the task, respectively). In
many applications including web services [5], multimedia [19], embedded digital control
systems [23], and systems that support imprecise computation [48] or flexible security
[68], each task has more than two QoS levels and the scheduler can trade-off the CPU
utilization of a task with the value it contributes to the system at a finer granularity. The
QoS levels may differ in terms of execution time and/or period/inter-arrival-time. For
example, a web server may dynamically change the execution time of an HTTP session
by changing the complexity of the requested web page [5]. For another example, several
papers have shown that the deadlines and periods of tasks in embedded digital control
systems and multimedia players can be adjusted on-line [19][23][66] within certain
ranges. A key feature of our task model is that it characterizes systems in unpredictable
environments where tasks' actual CPU utilization is time varying and unknown to the
scheduler. Such systems are amenable to the use of feedback control loops to
dynamically correct the scheduling errors to adapt to load variations at run-time.
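The task model above can be sketched in code. The following is a minimal illustration for periodic tasks only; the class and field names (QoSLevel, Task, check_model) and the example task are hypothetical and are not part of the FC-RTS implementation. The actual execution time AEi[j] is deliberately omitted because it is unknown to the scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QoSLevel:
    """One QoS level j of a periodic task (hypothetical field names)."""
    deadline: float        # Di[j], relative deadline
    est_exec_time: float   # EEi[j], estimated execution time
    period: float          # Pi[j], invocation period
    value: float           # Vi[j], value contributed if the deadline is met

    @property
    def est_utilization(self) -> float:
        """Bi[j] = EEi[j] / Pi[j]."""
        return self.est_exec_time / self.period


@dataclass
class Task:
    name: str
    levels: List[QoSLevel]  # index 0 is the rejection level

    def check_model(self) -> bool:
        """Check N >= 2, Vi[0] = 0, and that Bi[j] and Vi[j] strictly increase with j."""
        if len(self.levels) < 2 or self.levels[0].value != 0:
            return False
        pairs = zip(self.levels, self.levels[1:])
        return all(hi.est_utilization > lo.est_utilization and hi.value > lo.value
                   for lo, hi in pairs)


# Illustrative task with three QoS levels; level 0 models rejection.
video = Task("video", [
    QoSLevel(deadline=0.1, est_exec_time=0.0, period=0.1, value=0.0),
    QoSLevel(deadline=0.1, est_exec_time=0.02, period=0.1, value=1.0),
    QoSLevel(deadline=0.05, est_exec_time=0.02, period=0.05, value=2.0),
])
```

In this example the higher QoS level shortens the period rather than lengthening the execution time, which is one of the two ways the text says QoS levels may differ.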
4.1.2. Control Related Variables
An important step in designing the FC-RTS architecture is to decide the following
variables of a real-time system in terms of control theory.
Controlled variables are the performance metrics controlled by the scheduler in
order to achieve desired system performance. Controlled variables of a real-time
system may include the deadline miss ratio M(k) and the CPU utilization U(k)
(also called miss ratio and utilization, respectively), both defined over a time
window ((k-1)W, kW), where W is the sampling period and k is called the
sampling instant.
The miss ratio M(k) at the kth sampling instant is defined as the number
of deadline misses divided by the total number of completed and
aborted task instances in a sampling window ((k-1)W, kW). Miss ratio
is usually the most important performance metric in a real-time
system.
The utilization U(k) at the kth sampling instant is the percentage of
CPU busy time in a sampling window ((k-1)W, kW). CPU utilization is
regarded as a controlled variable for real-time systems due to cost and
throughput considerations. CPU utilization is also important because of
its direct linkage with the deadline miss ratio (see Section 4.3).
Another controlled variable might be the total value V(k) delivered by
the system in the kth sampling period. In the remainder of this chapter,
we do not directly use the total value as a controlled variable, but
rather address the value imparted by tasks via the QoS Actuator (see
Section 4.5.1).
Performance references represent the desired system performance in terms of the
controlled variables, i.e., the desired miss ratio MS and/or the desired CPU
utilization US. For example, a particular system may require deadline miss ratio
MS = 0 and CPU utilization US = 90%. The difference between a performance
reference and the current value of the corresponding controlled variable is called
an error, i.e., the miss ratio error EM = MS - M(k) and the utilization error
EU = US - U(k).
Manipulated variables are system attributes that can be dynamically changed by
the scheduler to affect the values of the controlled variables. In our architecture,
the manipulated variable is the total estimated utilization B(k) = Σi Bi[li(k)] of all
tasks in the system, where Ti is a task with a QoS level of li(k) in the kth sampling
window. The rationale for choosing the total estimated utilization as a manipulated
variable is that most real-time scheduling policies (such as EDF and
Rate/Deadline Monotonic) can guarantee no deadline misses when the system is
not overloaded, and in normal situations, the miss ratio increases as the system
load increases. The other controlled variable, the utilization U(k), also usually
increases as the total estimated utilization increases. However, the utilization is
often different from the total estimated utilization B(k). This is due to the
estimation error of execution times when workload is unpredictable and time
varying. Another difference between U(k) and B(k) is that U(k) can never exceed
100% while B(k) does not have this boundary.
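The definitions above translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and returning a miss ratio of 0 for a window in which no task instance finished is an assumption, since the text does not define that case.

```python
from typing import List, Sequence, Tuple


def sample_window(misses: int, finished: int, busy_time: float,
                  W: float) -> Tuple[float, float]:
    """Return (M(k), U(k)) for one sampling window of length W.

    misses:    deadline misses observed in the window
    finished:  completed plus aborted task instances in the window
    busy_time: CPU busy time in the window
    """
    # Assumption: an empty window counts as a miss ratio of 0.
    miss_ratio = misses / finished if finished > 0 else 0.0
    utilization = busy_time / W
    return miss_ratio, utilization


def errors(M_s: float, U_s: float, M_k: float, U_k: float) -> Tuple[float, float]:
    """Return (EM, EU) = (MS - M(k), US - U(k))."""
    return M_s - M_k, U_s - U_k


def total_estimated_utilization(est_util: Sequence[Sequence[float]],
                                levels: List[int]) -> float:
    """B(k) = sum over i of Bi[li(k)], where levels[i] is li(k)."""
    return sum(est_util[i][level] for i, level in enumerate(levels))
```

For instance, 5 misses out of 50 finished instances with 0.95 s of busy time in a 1 s window give M(k) = 0.1 and U(k) = 0.95; with references MS = 0 and US = 0.9, both errors are negative, indicating overload.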
4.1.3. Feedback Control Loop
The FC-RTS architecture features a feedback control loop that is invoked at every
sampling instant k. It is composed of a Monitor, a Controller, and a QoS Actuator (Figure
4.1).
1) The Monitor measures the controlled variables (M(k) and/or U(k)) and feeds the
samples back to the Controller.
2) The Controller compares the performance references with corresponding controlled
variables to get the current errors, and computes a change DB(k) (called the control
input) to the total estimated requested utilization, i.e., B(k+1) = B(k) + DB(k), based
on the errors. The Controller uses a control function to compute the correct control
input to compensate for the load variations and keep the controlled variables close to
the references. The detailed design of the Controller is presented in Section 4.4.
3) The QoS Actuator calls a QoS optimization algorithm (see Section 4.5.1) to maximize
the system value by dynamically adjusting tasks' QoS levels under the utilization
constraint computed by the Controller, B(k+1) = B(k) + DB(k). In the simplest form,
each task has only two QoS levels and the QoS Actuator is essentially an
admission controller.
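One invocation of the loop described in steps 1) to 3) can be sketched as follows. This is not the controller designed in Section 4.4 nor the QoS optimization algorithm of Section 4.5.1: the proportional control law, its gain, the greedy value-density admission rule, and all function names are assumptions chosen only to make the data flow concrete for the two-level (admission control) case.

```python
from typing import List, Tuple


def controller(U_s: float, U_k: float, gain: float) -> float:
    """Return the control input DB(k) from the utilization error.

    Assumption: a proportional law DB(k) = gain * (US - U(k)).
    """
    return gain * (U_s - U_k)


def admission_actuator(tasks: List[Tuple[str, float, float]],
                       budget: float) -> List[str]:
    """Two-level QoS actuator acting as an admission controller.

    tasks holds (name, estimated utilization > 0, value). Tasks are admitted
    greedily in order of value per unit of estimated utilization while the
    total estimated utilization stays within the budget B(k+1).
    """
    admitted, used = [], 0.0
    for name, util, value in sorted(tasks, key=lambda t: t[2] / t[1], reverse=True):
        if used + util <= budget:
            admitted.append(name)
            used += util
    return admitted


def control_step(B_k: float, U_s: float, U_k: float, gain: float,
                 tasks: List[Tuple[str, float, float]]) -> Tuple[float, List[str]]:
    """One sampling instant: compute B(k+1) = B(k) + DB(k), then re-admit tasks."""
    B_next = B_k + controller(U_s, U_k, gain)
    return B_next, admission_actuator(tasks, B_next)


# Illustrative overload: measured utilization 1.0 against a reference of 0.9.
tasks = [("a", 0.3, 3.0), ("b", 0.4, 2.0), ("c", 0.3, 1.5)]
B_next, admitted = control_step(B_k=1.0, U_s=0.9, U_k=1.0, gain=0.5, tasks=tasks)
```

Here the negative utilization error lowers the estimated utilization budget from 1.0 to about 0.95, so the actuator can no longer admit all three tasks and rejects the last one it considers.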
In addition to the above feedback control loop, our FC-RTS architecture also includes
arriving-time QoS control, i.e., in addition to being called periodically by the Controller,
the QoS Actuator is also invoked upon the arrival of each task. The arriving-time QoS
control isolates disturbances caused by new task arrivals (see Section 4.3). Feedback
control scheduling in systems without arriving-time QoS control was previously studied
in [50].
4.1.4. Basic Scheduler
The FC-RTS architecture has a Basic Scheduler that schedules admitted tasks with a
scheduling policy (e.g., EDF or Rate/Deadline Monotonic). The properties of the
scheduling policy can have significant impact on the design of the feedback control loop.
Our FC-RTS architecture permits plugging in different real-time scheduling policies for
this Basic Scheduler and then designing the entire feedback control scheduling system
around this choice (see Section 4.4.4).
A key difference between our work and much previous work is that, while previous
work often assumes the CPU utilization of each task is known a priori, we focus on
systems in unpredictable environments where tasks' actual CPU utilizations are unknown
and time varying. This more challenging problem necessitates the feedback control loop
to dynamically correct the scheduling errors at run-time. Our FC-RTS architecture
establishes a mapping from real-time scheduling to a typical structure of feedback control
systems. This step enables us to treat a real-time system as a feedback control system and
utilize feedback control theory to design the system rather than developing ad hoc
algorithms.
4.2. Performance Specifications and Metrics
We now specialize the second element of the FCS framework, the performance
specifications, to real-time CPU scheduling. The performance specifications consist of a
set of performance profiles in terms of utilization U(k) and miss ratio M(k), and a set of
load profiles in terms of the total requested CPU utilization of a system.
4.2.1. Performance Profile
The performance profile characterizes important transient and steady state performance
of a real-time system. M(k) and U(k) characterize the system performance in the sampling
window ((k-1)W, kW). In contrast, traditional metrics for real-time systems such as
average miss-ratio and average utilization are defined based on a much larger time
window than the sampling period W. The average metrics are often inadequate for
characterizing the dynamics of the system performance in response to overload
conditions [50]. The performance profile of a real-time system includes the following.
Stability: A real-time system is stable if its miss ratio M(k) and utilization U(k) are
always bounded for bounded references. Although both miss ratio M(k) and
utilization U(k) are naturally bounded in the range [0, 1], stability is a necessary
condition to prevent the controlled variables from severe deviations from the
reference values.
Transient-state response represents the real-time system's responsiveness and
efficiency of QoS adaptation in reacting to changes in run-time conditions.
Overshoot Mo and Uo: For a real-time system, we define overshoot as the
maximum amount that the system overshoots its miss ratio or utilization
reference divided by its miss ratio or utilization reference, i.e., Mo = (Mmax -
MS) / MS and Uo = (Umax - US) / US, respectively. The maximum miss ratio
Mmax and utilization Umax in the transient state are called the absolute overshoot.
Overshoot is important to a real-time system because a high transient
miss-ratio or utilization can cause system failure in many systems such as
robots and media streaming [19].
Settling time Ts: The time it takes the system to enter a steady state in
response to a load profile. The settling time represents how fast the system
can settle down to steady state with desired miss ratio and/or utilization.
Steady-state error ESM and ESU: The difference between the average values of
miss ratio M(k) and/or utilization U(k) in steady state and its corresponding
reference. The steady state error characterizes how precisely the system can enforce
the desired miss ratio and/or utilization in steady state.
Sensitivity Sp: Relative change of a controlled variable in steady state with respect
to the relative change of a system parameter p. For example, sensitivity of miss
ratio with respect to the task execution time SAE represents how significantly the
change in the task execution time affects the system miss-ratio. Sensitivity
describes the robustness of the system with regard to workload or system
variations.
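The transient and steady-state metrics above can be computed from a trace of samples of a controlled variable. The sketch below is illustrative: the function names, the tolerance band used to decide when the system has entered steady state, and the sign convention of the steady-state error (average minus reference) are assumptions, since the text does not fix them.

```python
from typing import List, Optional


def overshoot(samples: List[float], reference: float) -> float:
    """Relative overshoot (max - reference) / reference; requires reference > 0."""
    return (max(samples) - reference) / reference


def settling_time(samples: List[float], reference: float, W: float,
                  tolerance: float) -> Optional[float]:
    """Time after which every sample stays within `tolerance` of the reference.

    samples[k] is the value measured in the k-th sampling window (k = 0 is the
    first window after the load change). Returns None if the last sample is
    still outside the tolerance band.
    """
    last_outside = -1
    for k, x in enumerate(samples):
        if abs(x - reference) > tolerance:
            last_outside = k
    if last_outside == len(samples) - 1:
        return None
    return (last_outside + 1) * W


def steady_state_error(samples: List[float], reference: float,
                       settled_from: int) -> float:
    """Average of the samples from index `settled_from` on, minus the reference."""
    steady = samples[settled_from:]
    return sum(steady) / len(steady) - reference


# Illustrative utilization trace with reference US = 0.9 and W = 0.5 s.
trace = [0.5, 0.8, 1.0, 0.95, 0.91, 0.9, 0.89, 0.9]
Uo = overshoot(trace, 0.9)
Ts = settling_time(trace, 0.9, W=0.5, tolerance=0.02)
ESU = steady_state_error(trace, 0.9, settled_from=4)
```

Because the relative overshoot divides by the reference, it is undefined for a miss ratio reference of MS = 0; in that case only the absolute overshoot is meaningful.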
4.2.2. Load Profile
Fo