Feedback Control Real-Time Scheduling A Dissertation Presented to the Faculty of the School of Engineering and Applied Science University of Virginia In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy Computer Science by Chenyang Lu May 2001


  • Feedback Control Real-Time Scheduling

    A Dissertation

    Presented to the Faculty of the School of Engineering and Applied Science

    University of Virginia

    In Partial Fulfillment of the Requirements for the Degree of

    Doctor of Philosophy

    Computer Science

    by

    Chenyang Lu

    May 2001

  • 2

    Copyright by

    Chenyang Lu

    All Rights Reserved

    May 2001

  • 3

    Approvals

    This dissertation is submitted in partial fulfillment of the requirements for the degree of

    Doctor of Philosophy

    Computer Science

    __________________________________

    Chenyang Lu

    Approved:

    __________________________________

    John A. Stankovic (Advisor)

    __________________________________

    Sang H. Son (Chair)

    __________________________________

    Tarek F. Abdelzaher

    __________________________________

    Marty Humphrey

    __________________________________

    Jörg Liebeherr

    __________________________________

    Gang Tao (Minor Representative)

    Accepted by the School of Engineering and Applied Science:

    __________________________________

    Richard W. Miksad (Dean)

    May 2001

  • 4

    Abstract

    We develop Feedback Control real-time Scheduling (FCS) as a unified framework to

    provide Quality of Service (QoS) guarantees in unpredictable environments (such as e-

    business servers on the Internet). FCS includes four major components. First, novel

    scheduling architectures provide performance control to a new category of QoS critical

    systems that cannot be addressed by traditional open loop scheduling paradigms. Second,

    we derive dynamic models for computing systems for the purpose of performance

    control. These models provide a theoretical foundation for adaptive performance control.

    Third, we apply established control methodology to design scheduling algorithms with

    proven performance guarantees, which is in contrast with existing heuristics-based

    solutions relying on laborious design/tuning/testing iterations. Fourth, a set of control-

    based performance specifications characterizes the efficiency, accuracy, and robustness

    of QoS guarantees.

    The generality and strength of FCS are demonstrated by its instantiations in three

    important applications with significantly different characteristics. First, we develop

    real-time CPU scheduling algorithms that guarantee low deadline miss ratios in systems

    where task execution times may deviate from estimations at run-time. We solve the

    saturation problems of real-time CPU scheduling systems with a novel integrated control

    structure. Second, we develop an adaptive web server architecture to provide relative and

    absolute delay guarantees to different service classes with unpredictable workloads. The

    adaptive architecture has been implemented by modifying an Apache web server.

    Evaluation experiments on a testbed of networked Linux PCs demonstrate that our server

    provides robust relative/absolute delay guarantees despite instantaneous changes in the

    user population. Third, we develop a data migration executor for networked storage

    systems that migrate data on-line while guaranteeing specified I/O throughput of

    concurrent applications.

  • 5

    Acknowledgements

    First, thanks to my advisor, Jack Stankovic, for being a great mentor to me both

    personally and professionally. His encouragement, support, and advice are greatly

    appreciated. My thanks go to Tarek Abdelzaher, Sang Son, and Gang Tao for sharing

    their ideas and insights on research.

    Thanks to Guillermo Alvarez, John Wilkes, Michael Hobbs, Ralph Becker-Szendy,

    Simon Towers, and all other members of the storage systems program at HP Labs for

    offering me a great research environment and their collaborations during my internship at

    HP Labs.

    Thanks to Jörg Liebeherr and Marty Humphrey for serving on my dissertation

    committee and for their valuable suggestions on my dissertation.

    Thanks to Jörgen Hansson, Victor Lee, Michael Marley, John Regehr, and all other

    members of the real-time systems group for interesting and stimulating discussions.

    Thanks to all of my friends for providing invaluable moral support. I want to

    especially thank Hainan Lin for helping me through the years at Charlottesville.

    Last but not least, I want to thank my parents and my wife for their understanding

    and support of my research endeavors and accompanying me through all the happy and

    sad days.

  • 6

    Table of Contents

    1. Introduction............................................................................................................ 15

    1.1. Motivation................................................................................................. 15

    1.2. Contributions............................................................................................. 19

    2. Related Work.......................................................................................................... 26

    2.1. Classical Real-Time Scheduling ............................................................... 27

    2.2. Real-Time Scheduling for Embedded Digital Control Systems ............... 28

    2.3. QoS Adaptation......................................................................................... 28

    2.4. Service Delay Guarantee in Web Servers................................................. 30

    2.5. Data Migration in Storage Systems .......................................................... 31

    3. Feedback Control Real-Time Scheduling Framework ........................................ 32

    3.1. Feedback Control Scheduling Architecture .............................................. 33

    3.1.1. Control Related Variables..................................................................... 33

    3.1.2. Feedback Control Loop......................................................................... 35

    3.2. Performance Specifications and Metrics .................................................. 36

    3.2.1. Performance Profile .............................................................................. 37

    3.2.2. Load Profile .......................................................................................... 39

    3.3. Control Theory Based Design Methodology ............................................ 42

    4. Real-Time CPU Scheduling .................................................................................. 45

    4.1. Feedback Control Real-Time Scheduling Architecture............................ 47

    4.1.1. Task Model ........................................................................................... 48

  • 7

    4.1.2. Control Related Variables..................................................................... 49

    4.1.3. Feedback Control Loop......................................................................... 51

    4.1.4. Basic Scheduler..................................................................................... 52

    4.2. Performance Specifications and Metrics .................................................. 53

    4.2.1. Performance Profile .............................................................................. 53

    4.2.2. Load Profile .......................................................................................... 55

    4.3. Modeling the Controlled Real-Time System ............................................ 56

    4.4. Design of FC-RTS Algorithms ................................................................. 60

    4.4.1. Design of the Controller........................................................................ 61

    4.4.2. Closed-Loop System Model ................................................................. 62

    4.4.3. Control Tuning and Analysis ................................................................ 64

    4.4.4. FC-RTS Algorithms.............................................................................. 73

    4.5. Experiments .............................................................................................. 80

    4.5.1. FECSIM Real-Time System Simulator ................................................ 81

    4.5.2. Scheduling Policy of the Basic Scheduler ............................................ 81

    4.5.3. Workload............................................................................................... 82

    4.5.4. QoS Actuator ........................................................................................ 84

    4.5.5. Profiling the Controlled Real-Time Systems........................................ 85

    4.5.6. Controller Parameters ........................................................................... 87

    4.5.7. Performance References ....................................................................... 88

    4.5.8. Evaluation Experiment A: Arrival Overload ........................................ 90

    4.5.9. Evaluation Experiment B: Arrival/Internal Overload........................... 96

    4.6. Comparison of Real-Time Scheduling Algorithms in Overload ............ 108

  • 8

    4.7. Summary ................................................................................................. 109

    5. Web Server with Delay Guarantees..................................................................... 111

    5.1. Introduction............................................................................................. 111

    5.2. Background............................................................................................. 116

    5.3. Semantics of Service Delay Guarantees ................................................. 118

    5.4. A Feedback Control Architecture for Web Server QoS ......................... 120

    5.4.1. Connection Scheduler ......................................................................... 121

    5.4.2. Server Processes.................................................................................. 123

    5.4.3. Monitor ............................................................................................... 123

    5.4.4. Controllers........................................................................................... 123

    5.5. Design of the Controller.......................................................................... 127

    5.5.1. Performance Specifications ................................................................ 128

    5.5.2. Modeling the Web Server: A System Identification Approach .......... 129

    5.5.3. Root-Locus Design ............................................................................. 133

    5.6. Implementation ....................................................................................... 136

    5.7. Experimentation...................................................................................... 138

    5.7.1. Comparing Connection Delays and Response Times......................... 139

    5.7.2. System Identification .......................................................................... 141

    5.7.3. Evaluation of the Adaptive Web Server ............................................. 143

    5.8. Summary ................................................................................................. 150

    6. Online Data Migration in Storage Systems ........................................................ 152

    6.1. Introduction and Motivations.................................................................. 152

    6.2. Aqueduct: a Feedback Control Architecture for Online Data Migration 156

  • 9

    6.2.1. Migration Planner ............................................................................... 156

    6.2.2. LV Mover............................................................................................ 157

    6.2.3. QoS guarantees ................................................................................... 158

    6.2.4. The Feedback Control Loop ............................................................... 160

    6.2.5. The Monitor ........................................................................................ 161

    6.2.6. The Controller..................................................................................... 161

    6.2.7. The Actuator ....................................................................................... 163

    6.3. Design and Analysis of the Controller.................................................... 163

    6.3.1. The Dynamic Model ........................................................................... 164

    6.3.2. Controller Tuning and Analysis.......................................................... 165

    6.4. Implementation ....................................................................................... 168

    6.5. Experiments ............................................................................................ 169

    6.5.1. Experiment Configurations................................................................. 170

    6.5.2. Migration Penalty................................................................................ 171

    6.5.3. System Profiling.................................................................................. 175

    6.5.4. Performance Evaluation...................................................................... 177

    6.6. Conclusion and Future Work .................................................................. 185

    7. General Issues ...................................................................................................... 187

    7.1. Granularity of Performance Control ....................................................... 187

    7.2. Sampling Period and Overhead .............................................................. 189

    7.3. Robustness of Linear Models and PI Control ......................................... 191

    8. Conclusions and Future Work ............................................................................ 193

    Reference.................................................................................................................. 197

  • 10

    List of Figures

    Figure 3.1 The FCS Architecture...................................................................... 33

    Figure 3.2 Control Theory based Design Methodology for FCS Algorithms................. 41

    Figure 4.1 Feedback Control Real-Time Scheduling Architecture.................................. 47

    Figure 4.2 The Model of the Controlled System .............................................................. 57

    Figure 4.3 Closed-Loop System Model for Real-Time CPU Scheduling ........................ 62

    Figure 4.4 System Response to Reference Input .............................................................. 69

    Figure 4.5 System Response to Disturbance Input ........................................................... 70

    Figure 4.6 Settling Time vs. Process Gain........................................................................ 72

    Figure 4.7 The FC-UM Algorithm.................................................................................... 76

    Figure 4.8 The FECSIM Simulator................................................................................... 81

    Figure 4.9 Controlled Variables vs. Total Requested Utilization..................................... 86

    Figure 4.10 Response to Arrival Overload SL(0, 150%) (DM/PA).................................. 89

    Figure 4.11 Response to Arrival Overload SL(0, 150%) (EDF/P) ................................... 90

    Figure 4.12 Execution Time Factor Ga in Experiment B................................................. 96

    Figure 4.13 Response to Arrival/Internal Overload (DM/PA) ......................................... 97

    Figure 4.14 Response to Arrival/Internal Overload (EDF/P) ........................................... 98

    Figure 4.15 Average Performance of FC-RTS algorithms and the Baseline.................. 107

    Figure 5.1 The Feedback-Control Architecture for Delay Guarantees .......................... 120

    Figure 5.2 Architecture for system identification .......................................................... 131

    Figure 5.3 The Root Locus of the web server model ..................................................... 136

    Figure 5.4 Connection delay and response time............................................................. 139

  • 11

    Figure 5.5 System identification results for Relative Delay .......................................... 141

    Figure 5.6 System Identification Results for Absolute Delay........................................ 143

    Figure 5.7 Evaluation Results of Relative Delay Guarantees between Two Classes..... 146

    Figure 5.8 Evaluation Results of Relative Delay Guarantees for Three Classes ........... 147

    Figure 5.9 Evaluation of Absolute Delay Guarantees.................................................... 150

    Figure 6.1 Aqueduct: The Feedback Control Architecture for Data Migration............. 160

    Figure 6.2 Step Response of Aqueduct.......................................................... 167

    Figure 6.3 Device iops during data migration................................................................ 172

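    Figure 6.4 Migration Penalty in Experiment 1................................................ 173

    Figure 6.5 Migration Penalty in Experiment 2................................................ 173

    Figure 6.6 Relationship between migration speed and migration speed................. 176

    Figure 6.7 Device iops and control input of Aqueduct...................................... 180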

    Figure 6.8 Average iops of AFAP and Aqueduct, and Aqueduct in steady state .......... 181

    Figure 6.9 QoS Violation Ratio of AFAP, Aqueduct, and Aqueduct in Steady State ... 183

    Figure 6.10 QoS violation ratio using 0.98IS................................................. 183

    Figure 6.11 Worst QoS Violations of AFAP, Aqueduct, Aqueduct in steady state....... 184

    Figure 6.12 Execution Time of Migration Plan................................................ 185

  • 12

    List of Tables

    Table 4.1 Testing Configurations.................................................................................... 82

    Table 4.2 Controller Parameters of FC-RTS Algorithms ............................................... 87

    Table 4.3 Performance References of FC-RTS Algorithms ........................................... 88

    Table 4.4 The Performance Profiles of FC-U in Experiment B.................................... 100

    Table 4.5 The Performance Profiles of FC-M in Experiment B.................................... 103

    Table 4.6 The Performance Profiles of FC-UM in Experiment B.................................. 105

    Table 4.7 Comparison of Real-Time Scheduling Paradigms in Overload Conditions ........ 109

    Table 5.1 Variables and Parameters of the Absolute Delay Controller CAk................. 124

    Table 5.2 Variables and Parameters of the Relative Delay Controller CRk.................. 126

  • 13

    List of Symbols

    C(k) a controlled variable

    CS a performance reference

    U(k) a manipulated variable

    TS the settling time

    CO the overshoot

    ESC the steady-state error

    SP the sensitivity with regard to a system parameter P

    SL(Ln, Lm) the step load that increases instantaneously from Ln to Lm

    RL(Ln, Lm, TR) the ramp load that increases linearly from Ln to Lm within TR sec

    Di[j] the relative deadline of task i at QoS level j

    EEi[j] the estimated execution time of task i at QoS level j

    AEi[j] the actual execution time of task i at QoS level j

    Vi[j] the value of task i at QoS level j

    Pi[j] the invocation period of periodic task i at QoS level j

    EIi[j] the estimated inter-arrival-time of aperiodic task i at QoS level j

    AIi[j] the average inter-arrival-time of aperiodic task i at QoS level j

    Bi[j] the estimated CPU utilization of task i at QoS level j

    Ai[j] the actual CPU utilization of task i at QoS level j

    Ga(k) the utilization ratio in the kth sampling period

    GA the worst-case utilization ratio

    Gm(k) the miss ratio factor in the kth sampling period

    GM the worst-case miss ratio factor

    Ath(k) the schedulable utilization threshold GA in the kth sampling period

    Wk the absolute or relative connection delay guarantee of service class k

    Ck(m) the connection delay of class k in the mth sampling period

    Bk(m) the process budget of class k in the mth sampling period

    Rm(k) the inter-submove-time in the kth sampling period

    Ii(k) the number of I/O per sec of device i in the kth sampling period

  • 14

    List of Abbreviations

    FCS Feedback Control real-time Scheduling

    RM Rate Monotonic scheduling policy

    EDF Earliest Deadline First scheduling policy

    DM Deadline Monotonic scheduling policy

  • 15

    Chapter 1

    Introduction

    1.1. Motivation

    Real-time scheduling algorithms fall into two categories: static and dynamic scheduling.

    In static scheduling, the scheduling algorithm has complete knowledge of the task set and

    its constraints, such as deadlines, computation times, precedence constraints, and future

    release times. The Rate Monotonic (RM) algorithm and its extensions [40][48] are static

    scheduling algorithms and represent one major paradigm for real-time scheduling. In

    dynamic scheduling, however, the scheduling algorithm does not have complete

    knowledge of the task set or its timing constraints. For example, new task activations, not

    known to the algorithm when it is scheduling the current task set, may arrive at a future

    unknown time. Dynamic scheduling can be further divided into two categories:

    scheduling algorithms that work in resource sufficient environments and those that work

    in resource insufficient environments. Resource sufficient environments are systems

    where the system resources are sufficient to a priori guarantee that, even though tasks

    arrive dynamically, at any given time all the tasks are schedulable. Under certain

  • 16

    conditions, Earliest Deadline First (EDF) [48][71] is an optimal dynamic scheduling

    algorithm in resource sufficient environments. EDF is a second major paradigm for real-

    time scheduling. While real-time system designers try to design the system with

    sufficient resources, because of cost and unpredictable environments, it is sometimes

    impossible to guarantee that the system resources are sufficient. In this case, EDF's

    performance degrades rapidly in overload situations. The Spring scheduling algorithm

    [79] can dynamically guarantee incoming tasks via on-line admission control and

    planning and thus is applicable in resource insufficient environments. Many other

    algorithms [71] have also been developed to operate in this way. These admission-

    control-based algorithms represent the third major paradigm for real-time scheduling.

    However, despite the significant body of results in these three paradigms of real-time

    scheduling, many real world problems are not easily supported. While algorithms such as

    EDF, RM and the Spring scheduling algorithm can support sophisticated task set

    characteristics (such as deadlines, precedence constraints, shared resources, jitter, etc.),

    they are all "open loop" scheduling algorithms. Open loop refers to the fact that once

    schedules are created they are not "adjusted" based on continuous feedback. While open-

    loop scheduling algorithms can perform well in predictable environments in which the

    workloads can be accurately modeled (e.g., traditional process control systems), they can

    perform poorly in unpredictable environments, i.e., systems whose workloads cannot be

    accurately modeled. For example, the Spring scheduling algorithm assumes complete

    knowledge of the task set except for their future release times. Systems with open-loop

    schedulers such as the Spring scheduling algorithm are usually designed based on worst-

    case workload parameters. When accurate system workload models are not available,

  • 17

    such an approach can result in a highly underutilized system based on extremely

    pessimistic estimation of workload.
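    As a rough illustration of this pessimism, the sketch below applies the standard utilization-based tests for independent, preemptive periodic tasks on one processor with deadlines equal to periods: the Liu and Layland bound n(2^(1/n) - 1), a sufficient test for RM, and U <= 1 for EDF. The task sets, the numbers, and the function names are hypothetical and are not taken from this thesis.

```python
from typing import List, Tuple

# (execution time, period) in the same time unit; deadline assumed equal to period.
Task = Tuple[float, float]


def utilization(tasks: List[Task]) -> float:
    """Total CPU utilization requested by a periodic task set."""
    return sum(c / p for c, p in tasks)


def rm_bound(n: int) -> float:
    """Liu and Layland utilization bound for n tasks under Rate Monotonic."""
    return n * (2 ** (1.0 / n) - 1)


def rm_sufficient(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) schedulability test for RM."""
    return utilization(tasks) <= rm_bound(len(tasks))


def edf_schedulable(tasks: List[Task]) -> bool:
    """Utilization test for EDF under the assumptions stated above."""
    return utilization(tasks) <= 1.0


# Hypothetical numbers: the same three tasks sized by worst-case estimates
# and by what they typically consume at run time.
worst_case = [(4.0, 10.0), (6.0, 20.0), (10.0, 40.0)]
typical = [(2.0, 10.0), (3.0, 20.0), (5.0, 40.0)]

print(f"worst-case utilization: {utilization(worst_case):.3f}")
print(f"typical utilization:    {utilization(typical):.3f}")
print(f"RM bound for 3 tasks:   {rm_bound(3):.3f}")
```

    With these made-up numbers, a system provisioned for the worst case would run at less than half of its capacity most of the time, which is the kind of underutilization that open-loop, worst-case design can produce.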

    In recent years, a new category of soft real-time applications executing in open and

    unpredictable environments is rapidly growing [69]. Examples include open systems on

    the Internet such as online trading and e-business servers, and data-driven systems such

    as smart spaces, agile manufacturing, and many defense applications such as C4I. For

    example, in an e-business server, neither the resource requirements nor the arrival rate of

    service requests are known a priori. However, performance guarantees are required in

    these applications. Failure to meet performance guarantees may result in loss of

    customers, financial damage, liability violations, or even mission failures. For these

    applications, a system design based on open loop scheduling and estimation of worst-case

    resource requirements can result in an extremely expensive and underutilized system.

    As a cost-effective approach to achieve performance guarantees in unpredictable

    environments, several adaptive scheduling algorithms have been recently developed (e.g.,

    [5][8][9][24][44][46][55]). While early research on real-time scheduling was concerned

    with guaranteeing complete avoidance of undesirable effects such as overload and

    deadline misses, adaptive real-time systems are designed to handle such effects

    dynamically. There remain many open research questions in adaptive real-time

    scheduling. In particular, how can a system designer specify the performance requirement

    of an adaptive real-time system? And how can the designer systematically design a scheduling

    algorithm to satisfy its performance specifications? The design methodology for

    automatic adaptive systems has been developed in feedback control theory [32][34].

    However, feedback control theory has been mostly applied in mechanical and electrical

  • 18

    systems. The modeling, analysis and implementation of adaptive real-time systems lead

    to significant research challenges.

    Recently, several works applied control theory to computing systems. For example,

    several papers [4][13][22][23][28][58][63][66][73][75] presented flexible or adaptive

    real-time (CPU) scheduling techniques to improve digital control system performance.

    These techniques are tailored to the specific characteristics of digital control systems

    instead of general adaptive real-time computing systems. Several other papers [6][19]

    [44][63][64][74] presented adaptive CPU scheduling algorithms or QoS management

    architectures for computing systems such as multimedia and communication systems.

    Transient and steady state performance of adaptive real-time systems has received special

    attention in recent years. For example, Brandt et al. [19] evaluated a dynamic QoS

    manager by measuring the transient performance of applications in response to QoS

    adaptations. Rosu et al. [64] proposed a set of performance metrics to capture the

    transient responsiveness of adaptations and its impact on applications. The paper

    proposed metrics that are similar to the settling time and steady-state error metrics found in

    control theory.

    However, to the best of our knowledge, no unified framework exists to date for designing an

    adaptive system from performance specifications of desired dynamic response. In this

    thesis, we establish feedback control real-time scheduling (FCS) [53], a unified

    framework of adaptive real-time systems based on feedback control theory. Our control

    theoretical framework includes the following elements:

    Feedback control scheduling architectures that map the feedback control structure

  • 19

    to adaptive resource scheduling in real-time systems [52],

    A set of performance specifications and metrics to characterize transient and

    steady state performance of adaptive real-time systems [51], and

    A control theory based design methodology for resource scheduling algorithms to

    satisfy their performance specifications [50][53].

    In contrast with ad hoc approaches that rely on laborious design/tuning/testing

    iterations, our framework enables system designers to systematically design adaptive

    real-time systems with established analytical methods to achieve desired performance

    guarantees in unpredictable environments.

    1.2. Contributions

    Specifically, the main contributions of this thesis work are as follows:

    A control-theoretical foundation for adaptive real-time systems: We apply

    control theory to provide a theoretical foundation for adaptive real-time

    scheduling. In contrast with some existing scheduling algorithms that utilize

    feedback control in an ad hoc manner, we provide theoretical understanding of

    feedback control scheduling and develop a systematic design methodology for

    adaptive real-time systems with analytically proven performance guarantees in

    unpredictable environments.

    Design methodology for real-time systems in unpredictable environments:

    While traditional design methods for real-time systems depend on a priori

    known workload parameters (e.g., worst-case execution times, worst-case arrival

  • 20

    rates, and blocking factors due to resource contentions), our control theory based

    design methodology provides robust performance guarantees when accurate

    characterizations of the workloads are not available. This feature makes our

    design framework especially valuable for performance critical systems in

    unpredictable environments, e.g., open systems on the Internet such as online

    trading and e-business servers, and data-driven systems such as smart spaces, agile

    manufacturing, and many defense applications.

    Software architecture for feedback performance control: We develop a

    general software architecture for adaptive performance control in unpredictable

    environments. Our architecture facilitates control theory based design and

    analysis of an adaptive real-time system by mapping it to the structure of

    feedback control systems. This architecture includes a set of control-related

    variables (performance references, controlled variables and manipulated

    variables), and software components such as monitor, actuator, and controller.

    Our architecture has been implemented as three instances tailored to the specific

    characteristics and performance requirements of different applications including

    real-time CPU scheduling, a web server, and data migration in networked storage

    systems. These successful instantiations demonstrate the general applicability of

    our architecture in software systems in unpredictable environments.
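    The roles of the monitor, controller, and actuator can be sketched as a small simulation. Everything in it is an assumption made for illustration: the toy system (measured utilization equals an unknown gain times the admitted load, capped at 1), the proportional control rule, and all names and numbers. It is not one of the controllers or models designed in later chapters.

```python
from typing import List


class ToySystem:
    """Hypothetical controlled system, not a model from the thesis.

    Controlled variable: measured CPU utilization.
    Manipulated variable: admitted (estimated) load.
    The gain stands for the unknown ratio of actual to estimated demand.
    """

    def __init__(self, gain: float, admitted_load: float) -> None:
        self.gain = gain
        self.admitted_load = admitted_load

    def monitor(self) -> float:
        """Monitor: sample the controlled variable for the current period."""
        return min(1.0, self.gain * self.admitted_load)

    def actuate(self, change: float) -> None:
        """Actuator: apply the requested change to the manipulated variable."""
        self.admitted_load = max(0.0, self.admitted_load + change)


def controller(reference: float, measured: float, kp: float) -> float:
    """Controller: map the error to a change in the manipulated variable.

    A plain proportional rule, assumed here only for illustration.
    """
    return kp * (reference - measured)


def run_loop(system: ToySystem, reference: float, kp: float,
             periods: int) -> List[float]:
    """Run the monitor -> controller -> actuator cycle once per sampling period."""
    trace = []
    for _ in range(periods):
        measured = system.monitor()
        system.actuate(controller(reference, measured, kp))
        trace.append(measured)
    return trace


trace = run_loop(ToySystem(gain=2.0, admitted_load=0.2),
                 reference=0.9, kp=0.25, periods=30)
print([round(value, 3) for value in trace[:6]])
```

    In this toy setting the error shrinks by the factor (1 - gain * kp) each period, so the loop converges only while gain * kp stays between 0 and 2. Relating such tuning conditions to the unknown gain is the kind of question a control-theoretic analysis answers.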

    Performance specifications and guarantees: While hard real-time systems

    require absolute guarantees, such guarantees are infeasible and unnecessary for

  • 21

    many soft real-time systems in unpredictable environments. We adopt a set of

    performance metrics and specifications from control theory to characterize the

    transient and steady state performance of adaptive real-time systems. Transient

    state performance (including settling time and overshoot) of an adaptive system

    represents the responsiveness and efficiency of adaptation in response to

    environmental variations, and steady-state performance (including stability,

    steady state error, and sensitivity) describes a system's long-term performance. In

    contrast, traditional metrics such as average miss-ratio cannot capture the

    transient behavior of the system in response to load variations.

    Modeling real-time computing systems: Unlike traditional control systems such

    as electrical and mechanical systems, real-time computing systems do not have

    readily available differential/difference equations that can be used in control

    analysis. In this thesis work, we apply an analytical approach and system

    identification techniques to the modeling of three computing systems: a generic

    CPU-bound real-time system, a modified Apache web server, and a networked

    storage system. In the analytical approach, a system designer describes a system

    directly with mathematical equations based on the knowledge of the system

    dynamics. When such knowledge is not available (as in the case of the Apache

    web server), we use system identification [11] to estimate the system model based

    on system input/output from profiling experiments. This modeling methodology

    and established analytical models provide a basis for the application of control

    theory to adaptive real-time scheduling.

    Handling non-linearities of real-time systems: The control design of an

    adaptive resource scheduler is non-trivial due to the non-linearities and unknown

    or random factors in many real-time computing systems. We solved these

    problems with model linearization techniques and novel control structures based

    on the particular characteristics of real-time systems. Our work demonstrates that

    robust performance control can be achieved despite the intrinsic non-linearities

    and uncertainties of real-time systems.

    Practical FCS implementation in three applications: Using our design

    framework, we develop practical resource scheduling algorithms that can provide

    robust (steady state and transient) performance guarantees in unpredictable

    environments, while traditional scheduling algorithms fail to provide such

    guarantees. We develop FCS algorithms for three application domains including

    real-time CPU scheduling, web servers, and storage systems. These applications

    are significantly different in terms of semantics of performance guarantees,

    scheduled resources, monitor/actuator mechanisms, and system models. Our

    evaluation experiments demonstrate that our FCS algorithms based on the FCS

    framework successfully achieved robust performance guarantees in all three

    applications. The success in these applications demonstrates that FCS is a unified

    framework for adaptive computing systems.

    Real-Time CPU Scheduling: We develop a set of feedback control real-time

    scheduling (FCS) algorithms that guarantee a low deadline miss ratio

    and high CPU utilization by dynamically adjusting task QoS levels and

    CPU requirements. Simulation experiments demonstrate that our FCS

    algorithms provide robust steady and transient state performance

    guarantees in terms of deadline miss ratio even when task execution

    times vary considerably from their estimates and when the system's

    schedulable utilization bound is unknown.

    Connection Scheduling in Web Servers: We develop adaptive connection

    scheduling algorithms that provide relative, absolute and hybrid service

    delay guarantees for different service classes on web servers under HTTP

    1.1. The scheduling algorithms feature feedback control loops that

    enforce delay guarantees for classes via dynamic connection scheduling

    and server process reallocation. The scheduling algorithms have been

    implemented by modifying an Apache web server. Experimental results

    demonstrate that our adaptive server provides robust delay guarantees

    when web workload varies significantly. Properties of our adaptive web

    server also include guaranteed stability, and satisfactory efficiency and

    accuracy in achieving desired delay or delay differentiation. Our new real-

    time web server will be particularly useful for e-business and e-trading

    applications, where a priori QoS guarantees are desirable in the face of bursty

    and unpredictable workloads from the Internet.

    On-line Data Migration in Storage Systems: We have extended our work

    to a non-real-time application, on-line data migration in storage systems.

    On-line data migration is necessary in large-scale storage systems (e.g.,

    data centers of e-business and large organizations, and multimedia service

    centers such as video-on-demand) for performance optimization,

    load balancing, and backup operations. However, data migration can

    cause unacceptable performance degradations in concurrent applications

    due to excessive resource contentions on the storage system. We develop

    an adaptive data migration executor with a feedback control architecture

    that guarantees desired I/O throughput for applications by dynamically

    regulating the speed of data migration. The migration executor has been

    implemented and evaluated at a storage testbed at HP Labs. Our

    evaluation experiments demonstrate that our adaptive migration executor

    achieved specified I/O throughput of all devices at the cost of slowing

    down data migration. Our work on storage systems demonstrates the

    generality of our control-theory-based framework in non-real-time

    systems.

    Technology Impact: Not only have we produced several research papers

    [6][50][51][52][53][70], parts of this thesis work have also been transferred to

    other university research groups. We have sent our real-time CPU scheduling

    simulator FECSIM and the feedback control CPU scheduling algorithms to a

    group in Sweden for them to study the algorithms. We have transferred the source

    code of our adaptive web server and system identification software to Professor

    Lui Sha's group at UIUC and given them inputs on modeling of web servers. The

    project of online data migration in networked storage systems was conducted

    when the author was a research intern in the Storage Systems Program at Hewlett

    Packard Laboratories (Palo Alto). Hewlett Packard is in the process of applying

    the feedback control data migration technique developed in the Aqueduct project

    for a patent.

    The rest of the thesis is organized as follows. We discuss the state-of-the-art in

    Chapter 2. In Chapter 3, we present the general control-theory based design methodology

    for adaptive real-time systems. The first case study, feedback control real-time CPU

    scheduling, is presented in Chapter 4. The second case study, adaptive connection

    scheduling for service delay guarantees in web servers, is presented in Chapter 5. The

    third case study, on-line data migration with I/O throughput guarantees on concurrent

    applications in storage systems, is presented in Chapter 6. After summarizing several

    general issues in Chapter 7, we conclude the thesis in Chapter 8.

    Chapter 2

    Related Work

    Real-time resource scheduling has evolved from static to dynamic and

    adaptive as the target application environments have become increasingly unpredictable.

    While classical real-time scheduling is concerned with absolute guarantees in highly

    predictable environments, more recent research aims at developing more flexible,

    adaptive and cost-effective solutions to handle unpredictable environments. This thesis

    work establishes a theoretical foundation and unified framework for achieving a new

    category of performance guarantees in unpredictable environments with adaptive real-

    time resource scheduling. In this chapter, we summarize the work related to this thesis

    research. The classical results on real-time scheduling are described in Section 2.1. A

    category of flexible and adaptive real-time scheduling algorithms tailored for digital

    control systems is summarized in Section 2.2. In Section 2.3, we then describe existing

    QoS adaptation techniques and compare them with our FCS framework. Related works

    on web server delay guarantees and storage systems are summarized in Sections 2.4 and

    2.5, respectively.

    2.1. Classical Real-Time Scheduling

    Classical real-time scheduling algorithms depend on a priori characterization of

    workload and systems to provide performance guarantees in predictable environments

    (e.g., embedded process control and avionics). For example, Rate Monotonic (RM)

    [40][48] and Earliest Deadline First (EDF) [48][71] require complete knowledge about

    the task set such as resource requirements, precedence constraints, resource contention,

    and future arrival times. Dynamic real-time systems [71] pioneered by the Spring project

    [79] provide guarantees upon new task arrivals with on-line admission control and

    planning. Unlike earlier systems based on RM or EDF, the dynamic real-time systems do

    not require future task arrival time to be known a priori. However, the on-line admission

    control and planning in the above dynamic systems still depend on a priori task set

    characterizations including resource requirements, precedence constraints, and resource

    contention. While classical algorithms such as EDF, RM and the Spring scheduling

    algorithm can support sophisticated task set characteristics, they cannot provide

    performance guarantees in systems operating in unpredictable environments where an

    accurate workload model is not available. Such systems include Internet servers (e.g., on-

    line stock trading and e-business) and data-driven systems (e.g., smart spaces and agile

    manufacturing). A key observation that motivated this thesis work is that a fundamental

    reason for the inadequacy of classical real-time scheduling in unpredictable environments

    lies in its open loop nature. Because they do not adjust schedules based on continuous

    performance feedback, open loop schedulers schedule tasks and system resources based on

    worst-case workload estimations. When accurate system workload models are not

    available, the open loop approach may result in a highly underutilized system based on

    extremely pessimistic estimation of workload. In contrast, feedback control real-time

    scheduling provides robust performance guarantees in unpredictable environments with a

    closed loop approach.

    2.2. Real-Time Scheduling for Embedded Digital Control Systems

    There have been several results that have applied feedback control theory to the design of

    real-time computing systems. For example, several papers [30][58][65][66] presented co-

    design methods for real-time scheduling algorithms and embedded digital control

    systems. The co-design methods trade off the quality of control performance and its

    computation requirements to produce more cost-effective system designs than separate

    design of control and scheduling. These approaches are off-line solutions, and their

    on-line scheduling algorithms are still classical open-loop algorithms such as EDF and RM.

    Several other papers presented on-line scheduling algorithms [4][16][22][23][30][73] to

    improve the robustness of digital control systems by dynamically relaxing the timing

    constraints within the tolerable range of the digital control system in overload conditions.

    However, these techniques require a priori knowledge of the tasks such as execution

    times. Furthermore, these techniques are tailored to CPU-bound digital controllers and

    are not applicable to other computing systems such as e-business servers and on-line

    trading where the performance bottleneck may not be the CPU.

    2.3. QoS Adaptation

    The concept of using performance feedback to adjust the schedule has been incorporated

    in general-purpose operating systems in the form of multi-level feedback queue

    scheduling [18]. The system adjusts a task's priority based on whether it consumes a time

    slice or is blocked due to I/O. This type of feedback control is based on intuitive solutions

    rather than systematic control derivation to achieve performance guarantees.

    In recent years, QoS adaptation architectures and algorithms have been developed to

    support applications such as communication subsystems [8], multimedia [19][24],

    distributed visual tracking [46] and operating systems [55][61][63][69][78]. Some of

    these techniques [55][61][63] include optimization algorithms to optimize the value in

    QoS adaptation. However, their optimization algorithms assume that the resource

    requirement of every QoS level is a priori known. In contrast, our FCS framework

    provides performance guarantees even when the resource requirements are unknown or

    deviate from the estimations. Several other works [8][21][25][78] developed feedback

    based adaptation algorithms that do not depend on completely accurate knowledge about

    workloads. However, their feedback loops were based on heuristics and they did not

    establish time domain analysis on the efficiency of QoS adaptation in response to run-

    time variations. Our FCS framework provides a unified framework to design adaptive

    real-time systems with proven transient state performance.

    Li and Nahrstedt utilized control theory to develop a feedback control loop to

    guarantee desired network packet rate in a distributed visual tracking system [46]. Hollot,

    Misra, Towsley, and Gong [36] apply control theory to analyze a congestion control

    algorithm on IP routers. While these works also use control theory analysis on

    computing systems, they do not address timing constraints and service delays on end

    server systems, which are the focus of this thesis.

    Transient and steady state performance of QoS adaptation has received special

    attention in recent years (e.g., [19][64][75]). For example, Brandt et al. [19] evaluated a

    dynamic QoS manager by measuring the transient performance of applications in

    response to QoS adaptations. Rosu et al. [64] proposed a set of performance metrics to

    capture the transient responsiveness of adaptations and its impact on applications.

    However, they did not provide a methodology to design a system from its performance

    specifications in terms of the above metrics. Instead, they only used the metrics in system

    testing. In contrast, by extending and mapping these metrics to the dynamic response of

    control systems, our FCS framework provides a control-theory-based methodology to

    design a system to analytically satisfy its performance specifications.

    2.4. Service Delay Guarantee in Web Servers

    Support for different classes of service on the Web (with special emphasis on server

    delay differentiation) has been investigated in recent literature. For example, the authors

    of [28] proposed and evaluated an architecture in which restrictions are imposed on the

    amount of server resources (such as threads or processes), which are available to basic

    clients. In [9][10] admission control and scheduling algorithms are used to provide

    premium clients with better service. In [17] a server architecture is proposed that

    maintains separate service queues for premium and basic clients, thus facilitating their

    differential treatment. While the above differentiation approach usually offers better

    service to premium clients, it does not provide any guarantees on the service and hence

    can be called the best effort differentiation model.

    Notably, a feedback control loop was used in [5][6][9] to control the desired CPU

    utilization of a web server with adaptive admission control. Their CPU utilization control

    can be extended to guarantee the desired absolute delay in web servers under HTTP 1.0

    protocol and when CPU is the bottleneck resource. This technique is not applicable to

    servers under HTTP 1.1 protocol, which can be handled by our adaptive server described

    in Chapter 5. A least squares estimator was used in [1] for automatic profiling of resource

    usage parameters of a web server. However, the work did not establish a dynamic

    model for the server.

    Several other works such as [13][26] developed kernel level mechanisms to achieve

    overload protection and proportional resource allocations in server systems. Their work

    did not utilize feedback control, nor did they provide any relative or absolute delay

    guarantees. Supporting proportional differentiated services in network routers has been

    investigated in [26][47]. Their work did not address end systems such as web servers.

    2.5. Data Migration in Storage Systems

    An old approach to performing backups and data relocations is to do them at night, while

    the system is idle. As discussed, this does not help with many current applications such

    as e-business that require continuous operation and adaptation to quickly changing

    system/workload conditions. The approach of bringing the whole (or parts of the) system

    offline is also impractical due to the substantial business costs that it incurs. Online

    migration and backup are still in their infancy in the current state of the art. Some

    existing tools such as the Veritas Volume Manager [75] can guarantee consistent access

    to each piece of data while it is being migrated. However, we are not aware of any

    existing solution that handles concurrent accesses while bounding the impact of

    migration on concurrent applications.

    Chapter 3

    Feedback Control Real-Time Scheduling

    Framework

    In this chapter, we describe feedback control real-time scheduling (FCS), a unified

    framework of adaptive real-time systems based on feedback control theory. The FCS

    framework includes the following elements:

    A feedback control scheduling architecture that maps adaptive resource

    scheduling in real-time systems [52] to feedback control loops,

    A set of performance specifications and metrics [51] to characterize transient and

    steady state performance of adaptive real-time systems, and

    A control theory based design methodology [50][53] for resource scheduling

    algorithms to satisfy their performance specifications.

    A key feature of the FCS framework is its use of feedback control theory (rather than

    ad hoc solutions) as a scientific underpinning. The FCS framework enables system

    designers to systematically design adaptive real-time systems with established analytical

    methods to achieve analytically provable performance guarantees in unpredictable

    environments. To the best of our knowledge, this is the first unified framework that provides a

    fundamental theory and analytical design methodology for adaptive real-time systems to

    achieve specified performance guarantees in unpredictable environments. In this chapter,

    we describe the elements of the general FCS framework at a high level. The specific

    technical challenges and solutions are described with its concrete instantiations in three

    different application domains: real-time CPU scheduling (Chapter 4), web servers

    (Chapter 5), and networked storage systems (Chapter 6).

    3.1. Feedback Control Scheduling Architecture

    The major components of our FCS architecture are a set of control related variables and a

    feedback control loop that maps a feedback control system structure to real-time resource

    scheduling.

    [Figure: block diagram of the feedback control loop. Blocks: Controller, Scheduler,

    Actuator, Monitor, Real-Time System. Signals: performance reference, error, control

    function, control input, manipulated variable, controlled variable, sample.]

    Figure 3.1. The FCS Architecture

    3.1.1. Control Related Variables

    A first step in designing the FCS architecture is to decide the following key variables of a

    real-time system in terms of control theory.

    Controlled variable C(k): the performance metric that characterizes the system

    performance defined over a sampling period ((k-1)W, kW), where W is an

    application specific constant called the sampling window. The scheduler controls

    the controlled variable in order to achieve the desired performance. The choice of

    controlled variables depends on the performance guarantees that need to be

    provided to the specific application of a system. For example, if an absolute delay

    guarantee is required in an Internet server (e.g., critical stock trading operations in

    an on-line trading system), the (absolute) service delays of HTTP requests should

    be defined as the controlled variable. On the other hand, if proportional

    differentiated service is required in an Internet server (e.g., e-commerce stores

    where customers are classified into different service classes depending on their

    monthly fees), the relative delays of service classes become the appropriate

    controlled variables. For another example, the deadline miss ratio and the CPU

    utilization are typical controlled variables for soft real-time systems (e.g.,

    multimedia streaming, process control, and robotics) where explicit timing

    constraints need to be respected.

    Performance reference C_S: the desired system performance in terms of a controlled

    variable C(k). The performance reference defines a contract established between

    the adaptive resource scheduler and the users such that the performance reference

    should be enforced. The difference between the performance reference and the

    value of the corresponding controlled variable is called the error E_C(k) = C_S - C(k).

    For example, if a system sets its performance reference to a deadline miss ratio of

    C_S = 2%, and the current miss ratio is 10%, the system has an error E_C(k) = -8%.

    Manipulated variable U(k): a system attribute that is dynamically changed by the

    scheduler. The manipulated variable should be effective for performance control,

    i.e., changing its value should affect the system's controlled variable(s). The

    choice of manipulated variable should reflect the resource bottleneck of a system.

    For example, the total requested utilization should be used as the

    manipulated variable if the CPU is the bottleneck resource of a web server, but it should

    not be used as the manipulated variable if the CPU is not the bottleneck resource

    (e.g., in the case of HTTP 1.1 as described in Section 5.2).

    3.1.2. Feedback Control Loop

    The FCS architecture has a feedback control loop that is invoked at every sampling

    instant k. Each feedback control loop is composed of a Monitor, a Controller, and an

    Actuator.

    1) The Monitor measures the controlled variables and feeds the samples back to the

    Controller.

    2) The Controller compares the performance references with corresponding controlled

    variables to get the current errors, and calls control algorithms to compute a control

    input, i.e., the new value of the manipulated variable, based on the errors. The control

    algorithm is a critical component with significant impacts on the system performance

    and hence is the centerpiece of the design of an FCS algorithm. Note that control

    theory enables us to derive the control algorithm and analytically prove that the

    algorithm can provide the desired performance guarantees.

    3) The Actuator changes the manipulated variable based on the newly computed control

    input. The Actuator implements a mechanism that dynamically reallocates

    (reschedules) the resource corresponding to the manipulated variable. For example,

    corresponding to a manipulated variable of the total requested CPU utilization, we

    design a QoS Actuator to dynamically adjust task QoS levels (different QoS levels

    have different execution times and/or invocation periods).
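
The Monitor, Controller, and Actuator steps above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the toy plant (actual utilization equals an unknown ratio times the requested utilization), the proportional control law, the gain `kp`, and all class and function names. The control algorithms actually derived in this work are presented in the later chapters.

```python
from typing import List


class ToySystem:
    """Hypothetical plant: actual utilization = ratio * requested utilization."""

    def __init__(self, requested: float, ratio: float) -> None:
        self.requested = requested  # manipulated variable U(k)
        self.ratio = ratio          # actual/estimated execution time, unknown to the controller

    def monitor(self) -> float:
        """Monitor: sample the controlled variable C(k) (utilization, capped at 1.0)."""
        return min(1.0, self.ratio * self.requested)

    def actuate(self, control_input: float) -> None:
        """Actuator: apply the control input to the manipulated variable."""
        self.requested = max(0.0, self.requested + control_input)


def controller(reference: float, sample: float, kp: float) -> float:
    """Controller: compute the error E_C(k) = C_S - C(k) and a proportional control input."""
    error = reference - sample
    return kp * error


def run_loop(system: ToySystem, reference: float, kp: float, periods: int) -> List[float]:
    """Invoke the feedback loop once per sampling instant and record C(k)."""
    trace = []
    for _ in range(periods):
        sample = system.monitor()
        trace.append(sample)
        system.actuate(controller(reference, sample, kp))
    return trace


# Execution times are twice their estimates (ratio=2.0), which the controller does not know.
trace = run_loop(ToySystem(requested=0.3, ratio=2.0), reference=0.8, kp=0.4, periods=20)
```

In this toy setting the sampled utilization starts at 0.6 and converges toward the reference of 0.8 even though the controller never learns the execution-time ratio, which is the behavior the closed loop is meant to provide.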

    3.2. Performance Specifications and Metrics

    We now describe the second element of the FCS framework, the performance

    specifications and metrics for adaptive real-time systems. While early research on real-

    time computing was concerned with guaranteeing complete avoidance of undesirable

    effects such as overload and deadline misses, adaptive real-time systems are designed to

    handle such effects dynamically. Using a control theory framework, we characterize the

    dynamic performance of an adaptive real-time system in both transient and steady state

    upon load or resource changes. Transient behavior of an adaptive system represents the

    responsiveness and efficiency of adaptation in reacting to changes in run-time conditions,

    and steady-state behavior describes a system's long-term performance after its transient

    response settles. In contrast, traditional metrics such as the average miss-ratio often fail

    to capture the transient behavior of the system in response to load variations. Another

    important advantage of our metrics is that they formulate the performance of real-time

    systems as dynamic responses in control theory, and therefore enable the use of control

    design methods to satisfy the specifications. Our performance specifications and metrics

    consist of a set of performance profiles (see footnote 1) in terms of the controlled variables. We also

    present a set of representative load profiles adapted from control theory [32].

    Corresponding to signals widely used in control theory, our load profiles can be used to

    provide guidance for control design and generate canonical system response to variations

    of run-time conditions.

    3.2.1. Performance Profile

    The performance profile characterizes important transient and steady state properties of a

    system in terms of its controlled variables. Note that when the sampling window W is

    small, a controlled variable C(k) approximates the instantaneous system performance at

    the sampling instant k. In contrast, traditional metrics for real-time systems such as

    average miss-ratio and average utilization are defined based on a much larger time

    window than the sampling period W. These average metrics are often inadequate for

    characterizing the dynamics of the system performance [50]. From the control theory

    point of view, a real-time system transits from the steady state to the transient state when

    a controlled variable deviates significantly from its steady state value in response to

    variation in its run-time condition. After a time interval in the transient state, the system

    may settle down to a new steady state after the feedback control loop converges the

    controlled variable to the vicinity of a new value. The steady state is defined as a state

    in which the controlled variable C(k) stays within a specified percentage of its performance reference C_S. The

    performance profile includes the following elements.

    1 The performance profile has been called the miss-ratio profile in [50] when deadline miss ratio is used as the controlled variable.

    Stability: A system is Bounded-Input-Bounded-Output (BIBO) stable if its controlled

    variables are always bounded for bounded performance references and

    disturbances. Note that the performance of an unstable system can severely and

    persistently diverge from the desired performance so as to cause system

    malfunctioning and even complete system failure. Stability is a necessary

    condition for achieving the desired performance reference. Stability is an especially

    important requirement for FCS algorithms because a poorly designed

    Controller can overreact to performance errors and push a real-time system to

    unstable conditions.

    Transient-state response represents the responsiveness and efficiency of adaptive

    resource scheduling in reacting to changes in run-time conditions.

    Settling time T_s: The time it takes the system to settle down to a steady

    state from the start of a transient state. The settling time represents how

    fast the system can regain desired performance after a change in its run-

    time condition.

    Overshoot C_o: The maximum amount by which a controlled variable overshoots

    its reference, divided by its reference, i.e., C_o = (C_M - C_S) / C_S, where C_M is

    the maximum value of the controlled variable during its transient state.

    Overshoot characterizes the worst-case transient performance degradation

    of a system. A system may require a low overshoot because severe

    transient performance degradation may lead to system failure. For

    example, in media players, a high transient deadline miss-ratio can cause

    buffer overflows [19].

    Steady-state error E_SC: The difference between the average value of a controlled

    variable in steady state and its reference. The steady state error characterizes how

    precisely the system can enforce desired performance in steady state.

    Sensitivity S_P: Relative change of a controlled variable in steady state with respect

    to the relative change of a system parameter P. For example, assuming the

    controlled variable is deadline miss ratio, the system's sensitivity with respect to

    the task execution time S_AE represents how significantly the change in the task

    execution time affects the system miss-ratio. Sensitivity describes the robustness

    of the system with regard to workload or system variations.

    The performance profile establishes a set of metrics of adaptive real-time systems based

    on the specification of dynamic response in control theory. The metrics enable system

    designers to apply established control theory techniques to achieve stability, and meet

    transient and steady state specifications.
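
As an illustration of how the performance profile could be evaluated from measurements, the following Python sketch computes the settling time, overshoot, steady-state error, and sensitivity from a recorded trace of C(k). It is a sketch under stated assumptions rather than the evaluation code used in this work: the function names, the `tolerance` parameter (the steady-state band expressed as a fraction of the reference), the sign convention of the steady-state error (average minus reference), and the sample trace are all hypothetical. The trace is assumed to begin at the start of the transient state.

```python
from statistics import mean
from typing import Optional, Sequence


def first_steady_index(trace: Sequence[float], reference: float,
                       tolerance: float) -> Optional[int]:
    """Index of the first sample after which C(k) stays inside the steady-state band."""
    band = tolerance * abs(reference)
    last_outside = -1
    for k, value in enumerate(trace):
        if abs(value - reference) > band:
            last_outside = k
    settled = last_outside + 1
    return settled if settled < len(trace) else None


def settling_time(trace: Sequence[float], reference: float,
                  tolerance: float, window: float) -> Optional[float]:
    """Settling time T_s in seconds, given the sampling window W; None if never settled."""
    k = first_steady_index(trace, reference, tolerance)
    return None if k is None else k * window


def overshoot(trace: Sequence[float], reference: float) -> float:
    """Overshoot C_o = (C_M - C_S) / C_S, with C_M taken as the maximum of the trace."""
    return (max(trace) - reference) / reference


def steady_state_error(trace: Sequence[float], reference: float,
                       tolerance: float) -> Optional[float]:
    """E_SC as (average of C(k) in steady state) - C_S; the sign is an assumed convention."""
    k = first_steady_index(trace, reference, tolerance)
    return None if k is None else mean(trace[k:]) - reference


def sensitivity(c_before: float, c_after: float,
                p_before: float, p_after: float) -> float:
    """S_P: relative change of the controlled variable over relative change of parameter P."""
    return ((c_after - c_before) / c_before) / ((p_after - p_before) / p_before)


# Hypothetical miss-ratio samples after a load change, reference C_S = 2%, W = 0.5 sec.
miss_ratio = [0.10, 0.06, 0.03, 0.021, 0.02, 0.02]
ts = settling_time(miss_ratio, reference=0.02, tolerance=0.1, window=0.5)
co = overshoot(miss_ratio, reference=0.02)
```

With this trace the miss ratio enters the 10% band around the reference at the fourth sample, so the settling time is three sampling windows, and the peak of 10% against a reference of 2% gives an overshoot of about 4 (400%).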

    3.2.2. Load Profile

    According to control theory, the performance profile of an adaptive system may be

    specified assuming representative load profiles including step load and ramp load. The

    step load represents the worst case of load variation that overloads the system

    instantaneously, while the ramp load represents a nominal form of load variation. The

    load profiles are defined as follows.

    Step-load SL(L_n, L_m): a load profile that instantaneously jumps from a nominal

    load L_n to a higher load L_m > L_n and stays constant after the jump. An instantaneous

    load change such as the step load is more difficult to handle than a gradual load

    change.

    Ramp-load RL(L_n, L_m, T_R): a load profile that increases linearly from the nominal

    load L_n to a higher load L_m > L_n during a time interval of T_R sec. Compared with

    the step load, the ramp load represents a less severe load variation scenario.
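
The two load profiles can be written as simple functions of time, as in the Python sketch below. The function names, the optional start-time parameters, and the choice to hold the load at L_m after the ramp ends are assumptions made for illustration; the definitions above only fix the jump and the linear increase.

```python
def step_load(t: float, nominal: float, peak: float, t_jump: float = 0.0) -> float:
    """SL(L_n, L_m): load jumps from nominal to peak at t_jump and stays constant."""
    return peak if t >= t_jump else nominal


def ramp_load(t: float, nominal: float, peak: float,
              duration: float, t_start: float = 0.0) -> float:
    """RL(L_n, L_m, T_R): load rises linearly from nominal to peak over `duration` seconds.

    Holding the load at `peak` after the ramp ends is an assumption of this sketch.
    """
    if t <= t_start:
        return nominal
    if t >= t_start + duration:
        return peak
    return nominal + (peak - nominal) * (t - t_start) / duration


# Load sampled once per second: nominal 50% rising to 150% of capacity.
step_samples = [step_load(t, 0.5, 1.5, t_jump=2.0) for t in range(5)]
ramp_samples = [ramp_load(t, 0.5, 1.5, duration=10.0) for t in range(0, 15, 5)]
```

Sampling either function once per sampling window yields a load sequence that can drive a simulator or a workload generator when measuring a performance profile.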

    One key advantage of using the above load profiles for performance specification is

    that they are amenable to well-established design and analysis methods in control theory

    and, therefore, fit well with our control theoretical framework. This means that a system

    designer can use control theory methods to analytically design the system to satisfy a

    performance profile in response to a load profile as defined above. Specifically, a load

    profile can be modeled as disturbance signals in the form of a step or ramp signal (see

    Section 4.4). Based on control theory, a linear system's dynamic properties can be

    determined by its dynamic response to a step signal or a ramp signal regardless of the

    signal's parameters, including the magnitude of load variation (Lm - Ln) and the ramp duration TR. If

    a real-time system can be approximated with a linear model under its operating conditions,

    its performance profile can be determined by stressing the system with a step load, i.e.,

    the system can achieve satisfactory performance under any combination of step and

  • 41

    ramp load if its performance profile in response to a step load or ramp load satisfies its

    specifications.
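
    The linearity argument can be checked numerically on a toy model. The Python sketch below uses a hypothetical first-order linear system, y(k+1) = a*y(k) + (1-a)*u(k), which is not the model developed in this thesis; the 2% settling band is likewise an assumption. For such a system the step response scales with the step magnitude, so the settling behavior does not depend on Lm - Ln.

```python
def step_response(a, magnitude, steps):
    """Response of y(k+1) = a*y(k) + (1-a)*u(k) to a step input of the given size."""
    y, out = 0.0, []
    for _ in range(steps):
        out.append(y)
        y = a * y + (1.0 - a) * magnitude
    return out


def settling_index(response, final_value, band=0.02):
    """First sample index from which the response stays within +/- band of final_value."""
    tol = band * abs(final_value)
    for k in range(len(response)):
        if all(abs(v - final_value) <= tol for v in response[k:]):
            return k
    return None


small = step_response(0.5, 1.0, 40)   # step of magnitude 1
large = step_response(0.5, 5.0, 40)   # step of magnitude 5
# For a linear system both responses settle at the same sampling instant.
same_settling = settling_index(small, 1.0) == settling_index(large, 5.0)
```

    A non-linear system, for example one whose output saturates, would generally fail this check, which is why its load profiles have to be chosen per application.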

    Unfortunately, if a real-time system is non-linear under its operating conditions, its

    dynamic response to arbitrary load variations cannot be determined from

    its response to a single step load or a single ramp load because the system performance

    depends on the specific parameters of the load profiles. In this case, the performance

    profiles in response to specific load profiles are only indications of the system

    performance in general, and the load profiles are application-specific, based on a

    set of expected load characteristics and system requirements.

    We should also note that a load profile is an abstraction of the workload, and there can

    be many possible instantiations of the same load profile. The instantiation of a load

    profile should incorporate the knowledge of the workload, and, therefore, the load profile

    should be viewed as an enhancement to existing benchmarks (e.g., [37][40][41][42]

    [75][77]). For example, the system load can be interpreted as the total requested CPU

    utilization in the system where CPU is the bottleneck resource. For another example, the

    load of an Internet server may be interpreted as the number of concurrent users.

    Figure 3.2. Control Theory based Design Methodology for FCS Algorithms (diagram labels: Requirement Analysis, Performance Specifications, Modeling, System Model, Controller Design, FCS algorithms, Satisfy)

  • 42

    3.3. Control Theory Based Design Methodology

    The third element of our FCS framework is the control theory based design methodology

    (see Figure 3.2). Based on the scheduling architecture and the performance specifications,

    we now establish a design methodology based on feedback control theory. Using this

    design methodology, a system designer can systematically design an adaptive resource

    scheduler to satisfy the system's performance specifications with established analytical

    methods. This methodology is in contrast with existing ad hoc approaches that depend on

    laborious design/tuning/testing iterations. Our design methodology works as follows.

    1) The system designer specifies the desired dynamic behavior with transient and

    steady state performance metrics. This step maps the performance requirements of

    an adaptive real-time system to the dynamic response specification of a control

    system.

    2) The system designer establishes a dynamic model of the real-time system for the

    purposes of performance control. A dynamic model describes the mathematical

    relationship between the control input and the controlled variables of a system

    with differential/difference equations or state matrices. Modeling is important

    because it provides a basis for the analytical design of the Controller. However,

    modeling has been a major challenge for applying control theory to real-time

    systems due to the lack of established differential/difference equations to describe

    real-time systems. Two different approaches can be used to establish the dynamic

    model of a real-time system. The analytical approach directly describes a system

  • 43

    with mathematical equations based on the knowledge of the system dynamics.

    When such knowledge is not available, the system identification approach [11]

    can be used to estimate the system model based on profiling experiments. In this

    thesis work, we apply the analytical approach to model a generic CPU-bound

    real-time system and a storage system, and develop a system identification tool

    to model a web server whose dynamics are less clear. Our work represents a first

    step in modeling real-time systems using rigorous mathematical equations. Our

    modeling methodology and established analytical models provide a foundation for

    the application of control theory to adaptive real-time systems in this thesis work

    and future work in this area.

    3) Based on the performance specifications and system model from steps 1) and 2), the

    system designer applies established mathematical techniques (e.g., the Root Locus

    method, frequency design, or state based design) of feedback control theory [32]

    to design FCS algorithms that analytically guarantee the specified transient and

    steady-state behavior at run-time. Compared with existing ad hoc approaches, our

    analytical design approach significantly reduces the design time and effort required

    for adaptive systems because it requires far fewer design/testing

    iterations. Furthermore, the resultant system's parameters can be easily

    tuned with existing control theory methods and tools in practice and the resultant

    system can be proved to satisfy its performance specifications. In contrast,

    tuning adaptive systems designed with ad hoc methods often depends on repeated

    testing, guessing, or rules of thumb without performance guarantees at run-time.
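
    As a rough illustration of the system identification approach mentioned in step 2), the following Python sketch fits a first-order difference equation to profiling data by least squares. The model structure y(k) = a*y(k-1) + b*u(k-1) and the function name are assumptions chosen for brevity; they are not the identification tool or the models developed in this thesis.

```python
def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y(k) = a*y(k-1) + b*u(k-1).

    u: control input samples, y: measured output samples (same length).
    """
    if len(u) != len(y) or len(y) < 3:
        raise ValueError("need equal-length traces with at least 3 samples")
    # Accumulate the 2x2 normal equations for the regressors [y(k-1), u(k-1)].
    syy = syu = suu = sy_t = su_t = 0.0
    for k in range(1, len(y)):
        y_prev, u_prev, y_now = y[k - 1], u[k - 1], y[k]
        syy += y_prev * y_prev
        syu += y_prev * u_prev
        suu += u_prev * u_prev
        sy_t += y_prev * y_now
        su_t += u_prev * y_now
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("input does not excite the system enough to identify it")
    # Solve the normal equations with Cramer's rule.
    a = (sy_t * suu - su_t * syu) / det
    b = (su_t * syy - sy_t * syu) / det
    return a, b
```

    The fitted pair (a, b) would then serve as the dynamic model for the controller design in step 3). A real profiling experiment would also need an input signal that varies enough to make the two parameters distinguishable, which is what the determinant check guards against.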

  • 44

    In summary, we describe a unified FCS framework for adaptive real-time systems

    that provides performance guarantees in unpredictable environments. Our FCS

    framework includes 1) a software architecture for feedback performance control, 2) a

    set of performance specifications and metrics that describe the efficiency, accuracy,

    and robustness of performance guarantees, and 3) a control theory methodology for

    designing FCS algorithms to satisfy the performance specifications. In the next three

    chapters, we describe the details of three instantiations of the FCS framework in three

    application domains.

  • 45

    Chapter 4

    Real-Time CPU Scheduling

    In this chapter, we develop a set of novel real-time CPU scheduling algorithms called

    FC-RTS [51][52][53][70] that guarantee low deadline miss ratio and high CPU utilization

    when workloads deviate from estimations at run-time. Our FC-RTS algorithms provide a

    scheduling solution for a new category of soft real-time systems working in unpredictable

    environments, whose performance cannot be guaranteed by many existing real-time

    scheduling algorithms including RM [43], EDF [70], the Spring algorithm [79], and QoS

    adaptation algorithms [4][61]. Such systems include open systems on the Internet such as

    on-line trading servers, e-business servers, and on-line media streaming, and data driven

    systems such as database applications. For example, in an on-line trading server, the

    processing time for a service request often depends on the user input that is unknown to

    the scheduler. For another example, in a surveillance system, the processing time of

    object tracking based on camera images can vary dramatically due to the scope of movement

    of the object being tracked [23]. In addition, our FC-RTS algorithms can also provide

    performance guarantees for off-the-shelf software applications, components, and device

    drivers when accurate information on their execution time and invocation rates is

  • 46

    unavailable.

    A motivation for applying the FCS framework to real-time CPU scheduling is the

    observation that many existing feedback based scheduling algorithms [8][21][25] are

    based on heuristics rather than a theoretical foundation. These algorithms often depend

    on laborious design/tuning/testing iterations, and may still fail to handle unexpected or

    untested conditions at run-time. While the design methodology for automatic feedback

    control systems has been developed in feedback control theory, the modeling, analysis,

    and implementation of real-time scheduling pose significant challenges to

    real-time systems research. In this thesis, we design our FC-RTS algorithms based on

    feedback control theory by instantiating the FCS framework in real-time CPU scheduling.

    Specifically, our major contributions include the following:

    A novel and general feedback control real-time CPU scheduling architecture that

    allows plug-ins of different real-time scheduling policies and QoS optimization

    algorithms and a set of tuning rules based on the scheduling policies,

    An analytical model of a CPU-bound real-time system, which to the best of our

    knowledge is the first dynamic model for generic real-time CPU scheduling,

    A set of analysis results and tuning methods for FC-RTS algorithms to achieve

    performance specifications including stability, settling time, overshoot, steady

    state performance, and sensitivity with regard to workload variations,

    Practical FC-RTS algorithms applicable to different types of real-time

    applications,

    Performance evaluation results demonstrating that our analytically designed FC-

    RTS algorithms can provide robust performance guarantees in terms of deadline

  • 47

    miss ratio and CPU utilization, and achieve satisfactory performance profiles in

    response to overloads caused by new task arrivals and task execution time

    variations.

    The feedback control real-time scheduling architecture is described in Section 4.1.

    We describe the performance specifications and metrics in Section 4.2. We establish an

    analytical model for a real-time system in Section 4.3. Based on the model, we present

    the design and control analysis of a set of FC-RTS algorithms in Section 4.4. We present

    the performance evaluation results of these scheduling algorithms in Section 4.5. We then

    qualitatively compare FC-RTS algorithms with several existing scheduling paradigms in

    Section 4.6. Finally, we summarize this chapter in Section 4.7.

    Figure 4.1. Feedback Control Real-Time Scheduling Architecture (diagram labels: Task Arrivals, QoS Actuator, Basic Scheduler, Current Tasks, CPU, Completed/Aborted Tasks, Monitor, Controlled Variables, Performance References, Controller, Control Input, Adjust QoS)

    4.1. Feedback Control Real-Time Scheduling Architecture

    Our feedback control real-time CPU scheduling (FC-RTS) architecture (illustrated in

    Figure 4.1) is composed of four parts: a task model, a set of control related variables, a

    feedback control loop that maps a feedback control system structure to real-time CPU

  • 48

    scheduling, and a Basic Scheduler.

    4.1.1. Task Model

    In our task model, each task Ti has N QoS levels (N ≥ 2). Each QoS level j (0 ≤ j ≤ N-1)

    of Ti is characterized by the following attributes:

    Di[j]: the relative deadline

    EEi[j]: the estimated execution time

    AEi[j]: the (actual) execution time that can vary considerably from instance to

    instance and is unknown to the scheduler

    Vi[j]: the value that task Ti contributes if it is completed at QoS level j before its

    deadline Di[j]. The lowest QoS level 0 represents the rejection of the task

    and Vi[0] = 0. Every QoS level contributes a miss penalty MPi < 0 if it

    misses its deadline.

    Periodic tasks:

    Pi[j]: the invocation period

    Bi[j]: the estimated CPU utilization Bi[j] = EEi[j] / Pi[j]

    Ai[j]: the (actual) CPU utilization Ai[j] = AEi[j] / Pi[j]

    Aperiodic tasks:

    EIi[j]: the estimated inter-arrival-time between subsequent invocations

    AIi[j]: the average inter-arrival-time that is unknown to the scheduler

    Bi[j]: the estimated CPU utilization Bi[j] = EEi[j] / EIi[j]

    Ai[j]: the (actual) CPU utilization Ai[j] = AEi[j] / AIi[j]

    In this model, a higher QoS level of a task has a higher (both estimated and actual)

    CPU utilization and contributes a higher value if it meets its deadline, i.e., Bi[j+1] > Bi[j],

  • 49

    Ai[j+1] > Ai[j], and Vi[j+1] > Vi[j]. In the simplest form, each task only has two QoS

    levels (corresponding to the admission and the rejection of the task, respectively). In

    many applications including web services [5], multimedia [19], embedded digital control

    systems [23], and systems that support imprecise computation [48] or flexible security

    [68], each task has more than two QoS levels and the scheduler can trade off the CPU

    utilization of a task with the value it contributes to the system at a finer granularity. The

    QoS levels may differ in terms of execution time and/or period/inter-arrival-time. For

    example, a web server may dynamically change the execution time of an HTTP session

    by changing the complexity of the requested web page [5]. For another example, several

    papers have shown that the deadlines and periods of tasks in embedded digital control

    systems and multimedia players can be adjusted on-line [19][23][66] within certain

    ranges. A key feature of our task model is that it characterizes systems in unpredictable

    environments where tasks' actual CPU utilization is time varying and unknown to the

    scheduler. Such systems are amenable to the use of feedback control loops to

    dynamically correct the scheduling errors to adapt to load variations at run-time.
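
    The task model above maps naturally onto a small data structure. The following Python sketch is one possible encoding, not the implementation used in this thesis: the class and field names are hypothetical, the rejection level is represented by a zero estimated execution time, and the actual execution time AEi[j] is deliberately left out because it is unknown to the scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QoSLevel:
    deadline: float       # Di[j], relative deadline
    est_exec_time: float  # EEi[j], estimated execution time
    interval: float       # Pi[j] (periodic) or EIi[j] (aperiodic)
    value: float          # Vi[j], value if completed before the deadline

    @property
    def est_utilization(self) -> float:
        """Bi[j] = EEi[j] / Pi[j], or EEi[j] / EIi[j] for aperiodic tasks."""
        return self.est_exec_time / self.interval


@dataclass
class Task:
    levels: List[QoSLevel]  # index j is the QoS level; level 0 is rejection
    miss_penalty: float     # MPi < 0
    level: int = 0          # currently assigned QoS level li(k)

    def __post_init__(self):
        if len(self.levels) < 2:
            raise ValueError("a task needs at least two QoS levels")
        if self.miss_penalty >= 0:
            raise ValueError("the miss penalty must be negative")
        if self.levels[0].value != 0:
            raise ValueError("level 0 (rejection) must have value 0")
        if not 0 <= self.level < len(self.levels):
            raise ValueError("assigned level is out of range")
        # Higher levels must cost more estimated utilization and give more value.
        for low, high in zip(self.levels, self.levels[1:]):
            if high.est_utilization <= low.est_utilization or high.value <= low.value:
                raise ValueError("QoS levels must be strictly increasing")

    @property
    def est_utilization(self) -> float:
        """Estimated utilization at the currently assigned QoS level."""
        return self.levels[self.level].est_utilization
```

    With only two levels per task, changing a task's level between 0 and 1 amounts to rejecting or admitting it.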

    4.1.2. Control Related Variables

    An important step in designing the FC-RTS architecture is to decide the following

    variables of a real-time system in terms of control theory.

    Controlled variables are the performance metrics controlled by the scheduler in

    order to achieve desired system performance. Controlled variables of a real-time

    system may include the deadline miss ratio M(k) and the CPU utilization U(k)

    (also called miss ratio and utilization, respectively), both defined over a time

  • 50

    window ((k-1)W, kW), where W is the sampling period and k is called the

    sampling instant.

    The miss ratio M(k) at the kth sampling instant is defined as the number

    of deadline misses divided by the total number of completed and

    aborted task instances in a sampling window ((k-1)W, kW). Miss ratio

    is usually the most important performance metric in a real-time

    system.

    The utilization U(k) at the kth sampling instant is the percentage of

    CPU busy time in a sampling window ((k-1)W, kW). CPU utilization is

    regarded as a controlled variable for real-time systems due to cost and

    throughput considerations. CPU utilization is also important because of

    its direct linkage with the deadline miss ratio (see Section 4.3).

    Another controlled variable might be the total value V(k) delivered by

    the system in the kth sampling period. In the remainder of this paper,

    we do not directly use the total value as a controlled variable, but

    rather address the value imparted by tasks via the QoS Actuator (see

    and Section 4.5.1)

    Performance references represent the desired system performance in terms of the

    controlled variables, i.e., the desired miss ratio MS and/or the desired CPU

    utilization US. For example, a particular system may require deadline miss ratio

    MS = 0 and CPU utilization US = 90%. The difference between a performance

    reference and the current value of the corresponding controlled variable is called

  • 51

    an error, i.e., the miss ratio error EM = MS - M(k) and the utilization error

    EU = US - U(k).

    Manipulated variables are system attributes that can be dynamically changed by

    the scheduler to affect the values of the controlled variables. In our architecture,

    the manipulated variable is the total estimated utilization B(k) = Σi Bi[li(k)] of all tasks

    in the system, where Ti is a task with a QoS level of li(k) in the kth sampling

    window. The rationale for choosing the total estimated utilization as a manipulated

    variable is that most real-time scheduling policies (such as EDF and

    Rate/Deadline Monotonic) can guarantee no deadline misses when the system is

    not overloaded, and in normal situations, the miss ratio increases as the system

    load increases. The other controlled variable, the utilization U(k), also usually

    increases as the total estimated utilization increases. However, the utilization is

    often different from the total estimated utilization B(k). This is due to the

    estimation error of execution times when workload is unpredictable and time

    varying. Another difference between U(k) and B(k) is that U(k) can never exceed

    100% while B(k) does not have this boundary.
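
    The control related variables reduce to a few lines of arithmetic per sampling window. The Python sketch below is illustrative only: the function names and argument layout are hypothetical, and treating an empty window as a miss ratio of 0 is an assumption that the definitions above do not address.

```python
def miss_ratio(misses, completed, aborted):
    """M(k): deadline misses divided by completed plus aborted task instances."""
    total = completed + aborted
    if total == 0:
        return 0.0  # assumption: no finished instances in the window counts as 0
    return misses / total


def utilization(busy_time, window):
    """U(k): fraction of the sampling window W during which the CPU was busy."""
    return busy_time / window


def errors(m_ref, u_ref, m_k, u_k):
    """Return (EM, EU): each performance reference minus its measured value."""
    return m_ref - m_k, u_ref - u_k


def total_estimated_utilization(est_utils):
    """B(k): sum of Bi[li(k)] over all tasks at their current QoS levels."""
    return sum(est_utils)
```

    Note that utilization() can never return more than 1.0 because busy time cannot exceed the window, whereas total_estimated_utilization() has no such bound.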

    4.1.3. Feedback Control Loop

    The FC-RTS architecture features a feedback control loop that is invoked at every

    sampling instant k. It is composed of a Monitor, a Controller, and a QoS Actuator (Figure

    4.1).

    1) The Monitor measures the controlled variables (M(k) and/or U(k)) and feeds the

    samples back to the Controller.

  • 52

    2) The Controller compares the performance references with corresponding controlled

    variables to get the current errors, and computes a change ΔB(k) (called the control

    input) to the total estimated requested utilization, i.e., B(k+1) = B(k) + ΔB(k), based

    on the errors. The Controller uses a control function to compute the correct control

    input to compensate for the load variations and keep the controlled variables close to

    the references. The detailed design of the Controller is presented in Section 4.4.

    3) The QoS Actuator calls a QoS optimization algorithm (see Section 4.5.1) to maximize

    the system value by dynamically adjusting tasks' QoS levels under the utilization

    constraint computed by the Controller, B(k+1) = B(k) + ΔB(k). In the simplest form,

    each task has only two QoS levels and the QoS Actuator is essentially an

    admission controller.
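
    One invocation of the loop can be sketched as follows. This Python example is a simplified illustration under stated assumptions: the proportional control function and its gain are placeholders (the actual Controller is designed in Section 4.4), the greedy first-fit admission stands in for the QoS optimization algorithm of Section 4.5.1, and all names are hypothetical.

```python
def controller(u_ref, u_k, gain=0.5):
    """Compute the control input, i.e. the change to B(k), from the utilization error.

    Assumption: a plain proportional control function is used for illustration.
    """
    return gain * (u_ref - u_k)


def qos_actuator(est_utils, budget):
    """Admission control for two-level tasks: admit tasks in order while they fit."""
    admitted, used = [], 0.0
    for i, b in enumerate(est_utils):
        if used + b <= budget:
            admitted.append(i)
            used += b
    return admitted


def control_step(b_k, u_ref, u_k, est_utils, gain=0.5):
    """Run the loop once at sampling instant k, given the Monitor's sample u_k."""
    b_next = b_k + controller(u_ref, u_k, gain)   # B(k+1) = B(k) + change
    admitted = qos_actuator(est_utils, b_next)    # enforce the new constraint
    return b_next, admitted
```

    For instance, with B(k) = 0.8, a reference of 0.9, and a measured utilization of 1.0, the controller lowers the budget, and the actuator then admits only as many tasks as fit under the reduced constraint.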

    In addition to the above feedback control loop, our FC-RTS architecture also includes

    arriving-time QoS control, i.e., in addition to being called periodically by the Controller,

    the QoS Actuator is also invoked upon the arrival of each task. The arriving-time QoS

    control isolates disturbances caused by new task arrivals (see Section 4.3). Feedback

    control scheduling in systems without arriving-time QoS control was previously studied

    in [50].

    4.1.4. Basic Scheduler

    The FC-RTS architecture has a Basic Scheduler that schedules admitted tasks with a

    scheduling policy (e.g., EDF or Rate/Deadline Monotonic). The properties of the

    scheduling policy can have significant impact on the design of the feedback control loop.

    Our FC-RTS architecture permits plugging in different real-time scheduling policies for

  • 53

    this Basic Scheduler and then designing the entire feedback control scheduling system

    around this choice (see Section 4.4.4).

    A key difference between our work and many previous works is that while previous

    work often assumes the CPU utilization of each task is known a priori, we focus on

    systems in unpredictable environments where tasks' actual CPU utilizations are unknown

    and time varying. This more challenging problem requires a feedback control loop

    that dynamically corrects the scheduling errors at run-time. Our FC-RTS architecture

    establishes a mapping from real-time scheduling to a typical structure of feedback control

    systems. This step enables us to treat a real-time system as a feedback control system and

    utilize feedback control theory to design the system rather than developing ad hoc

    algorithms.

    4.2. Performance Specifications and Metrics

    We now specialize the second element of the FCS framework, the performance

    specifications, to real-time CPU scheduling. The performance specifications consist of a

    set of performance profiles in terms of utilization U(k) and miss ratio M(k), and a set of

    load profiles in terms of the total requested CPU utilization of a system.

    4.2.1. Performance Profile

    The performance profile characterizes important transient and steady state performance

    of a real-time system. M(k) and U(k) characterize the system performance in the sampling

    window ((k-1)W, kW). In contrast, traditional metrics for real-time systems such as

    average miss-ratio and average utilization are defined based on a much larger time

    window than the sampling period W. Average metrics are often inadequate for

  • 54

    characterizing the dynamics of the system performance in response to overload

    conditions [50]. The performance profile of a real-time system includes the following.

    Stability: A real-time system is stable if its miss ratio M(k) and utilization U(k) are

    always bounded for bounded references. Although both miss ratio M(k) and

    utilization U(k) are naturally bounded in the range [0, 1], stability is a necessary

    condition to prevent the controlled variables from severe deviations from the

    reference values.

    Transient-state response represents the real-time system's responsiveness and

    efficiency of QoS adaptation in reacting to changes in run-time conditions.

    Overshoot Mo and Uo: For a real-time system, we define overshoot as the

    maximum amount that the system overshoots its miss ratio or utilization

    reference divided by its miss ratio or utilization reference, i.e., Mo = (Mmax -

    MS) / MS, Uo = (Umax - US) / US, respectively. The maximum miss ratio

    Mmax and utilization Umax in the transient state are called the absolute overshoot.

    Overshoot is important to a real-time system because a high transient

    miss-ratio or utilization can cause system failure in many systems such as

    robots and media streaming [19].

    Settling time Ts: The time it takes the system to enter a steady state in

    response to a load profile. The settling time represents how fast the system

    can settle down to steady state with desired miss ratio and/or utilization.

    Steady-state error ESM and ESU: The difference between the average values of

    miss ratio M(k) and/or utilization U(k) in steady state and its corresponding

    reference. The steady state error characterizes how precisely the system can enforce

  • 55

    the desired miss ratio and/or utilization in steady state.

    Sensitivity Sp: Relative change of a controlled variable in steady state with respect

    to the relative change of a system parameter p. For example, sensitivity of miss

    ratio with respect to the task execution time SAE represents how significantly the

    change in the task execution time affects the system miss-ratio. Sensitivity

    describes the robustness of the system with regard to workload or system

    variations.
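
    The performance profile metrics can be computed from a sampled trace of M(k) or U(k). The Python sketch below is a minimal illustration: the function names, the tolerance band used to decide when the system has settled, the index marking the start of steady state, and the sign convention of the steady-state error (average minus reference) are all assumptions rather than definitions from this thesis.

```python
def overshoot(trace, ref):
    """Relative overshoot: (maximum sample - reference) / reference."""
    if ref == 0:
        raise ValueError("relative overshoot is undefined for a zero reference")
    return (max(trace) - ref) / ref


def settling_time(trace, ref, window, band):
    """Time k*W after which every sample stays within ref +/- band, else None."""
    for k in range(len(trace)):
        if all(abs(v - ref) <= band for v in trace[k:]):
            return k * window
    return None


def steady_state_error(trace, ref, start):
    """Average of the samples from index start onward, minus the reference."""
    steady = trace[start:]
    if not steady:
        raise ValueError("no steady-state samples")
    return sum(steady) / len(steady) - ref


def sensitivity(y_old, y_new, p_old, p_new):
    """Relative change of a controlled variable per relative change of parameter p."""
    return ((y_new - y_old) / y_old) / ((p_new - p_old) / p_old)
```

    When the reference is zero, as with a miss ratio reference MS = 0, only the absolute overshoot (the maximum sample itself) is meaningful, which is why overshoot() rejects that case.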

    4.2.2. Load Profile

    Fo