Feedback Control Real-Time Scheduling

A Dissertation

Presented to the Faculty of the School of Engineering and Applied Science

University of Virginia

In Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

Computer Science

by

Chenyang Lu

May 2001

Copyright by

Chenyang Lu

All Rights Reserved

May 2001

Approvals

This dissertation is submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Computer Science

__________________________________

Chenyang Lu

Approved:

__________________________________

John A. Stankovic (Advisor)

__________________________________

Sang H. Son (Chair)

__________________________________

Tarek F. Abdelzaher

__________________________________

Marty Humphrey

__________________________________

Jörg Liebeherr

__________________________________

Gang Tao (Minor Representative)

Accepted by the School of Engineering and Applied Science:

__________________________________

Richard W. Miksad (Dean)

May 2001

Abstract

We develop Feedback Control real-time Scheduling (FCS) as a unified framework to

provide Quality of Service (QoS) guarantees in unpredictable environments (such as e-

business servers on the Internet). FCS includes four major components. First, novel

scheduling architectures provide performance control to a new category of QoS critical

systems that cannot be addressed by traditional open loop scheduling paradigms. Second,

we derive dynamic models for computing systems for the purpose of performance

control. These models provide a theoretical foundation for adaptive performance control.

Third, we apply established control methodology to design scheduling algorithms with

proven performance guarantees, which is in contrast with existing heuristics-based

solutions relying on laborious design/tuning/testing iterations. Fourth, a set of control-

based performance specifications characterizes the efficiency, accuracy, and robustness

of QoS guarantees.

The generality and strength of FCS are demonstrated by its instantiations in three

important applications with significantly different characteristics. First, we develop real-

time CPU scheduling algorithms that guarantee low deadline miss ratios in systems

where task execution times may deviate from estimations at run-time. We solve the

saturation problems of real-time CPU scheduling systems with a novel integrated control

structure. Second, we develop an adaptive web server architecture to provide relative and

absolute delay guarantees to different service classes with unpredictable workloads. The

adaptive architecture has been implemented by modifying an Apache web server.

Evaluation experiments on a testbed of networked Linux PCs demonstrate that our server

provides robust relative/absolute delay guarantees despite instantaneous changes in the

user population. Third, we develop a data migration executor for networked storage

systems that migrates data on-line while guaranteeing specified I/O throughput of

concurrent applications.

Acknowledgements

First, thanks to my advisor, Jack Stankovic, for being a great mentor to me both

personally and professionally. His encouragement, support, and advice are greatly

appreciated. My thanks go to Tarek Abdelzaher, Sang Son, and Gang Tao for sharing

their ideas and insights on research.

Thanks to Guillermo Alvarez, John Wilkes, Michael Hobbs, Ralph Becker-Szendy,

Simon Towers, and all other members of the storage systems program at HP Labs for

offering me a great research environment and their collaborations during my internship at

HP Labs.

Thanks to Jörg Liebeherr and Marty Humphrey for serving on my dissertation

committee and their valuable suggestions on my dissertation.

Thanks to Jörgen Hansson, Victor Lee, Michael Marley, John Regehr, and all other

members of the real-time systems group for interesting and stimulating discussions.

Thanks to all of my friends for providing invaluable moral support. I want to

especially thank Hainan Lin for helping me through the years at Charlottesville.

Last but not least, I want to thank my parents and my wife for their understanding

and support of my research endeavors and accompanying me through all the happy and

sad days.

Table of Contents

1. Introduction............................................................................................................ 15

1.1. Motivation................................................................................................. 15

1.2. Contributions............................................................................................. 19

2. Related Work.......................................................................................................... 26

2.1. Classical Real-Time Scheduling ............................................................... 27

2.2. Real-Time Scheduling for Embedded Digital Control Systems ............... 28

2.3. QoS Adaptation......................................................................................... 28

2.4. Service Delay Guarantee in Web Servers................................................. 30

2.5. Data Migration in Storage Systems .......................................................... 31

3. Feedback Control Real-Time Scheduling Framework ........................................ 32

3.1. Feedback Control Scheduling Architecture .............................................. 33

3.1.1. Control Related Variables..................................................................... 33

3.1.2. Feedback Control Loop......................................................................... 35

3.2. Performance Specifications and Metrics .................................................. 36

3.2.1. Performance Profile .............................................................................. 37

3.2.2. Load Profile .......................................................................................... 39

3.3. Control Theory Based Design Methodology ............................................ 42

4. Real-Time CPU Scheduling .................................................................................. 45

4.1. Feedback Control Real-Time Scheduling Architecture............................ 47

4.1.1. Task Model ........................................................................................... 48

4.1.2. Control Related Variables..................................................................... 49

4.1.3. Feedback Control Loop......................................................................... 51

4.1.4. Basic Scheduler..................................................................................... 52

4.2. Performance Specifications and Metrics .................................................. 53

4.2.1. Performance Profile .............................................................................. 53

4.2.2. Load Profile .......................................................................................... 55

4.3. Modeling the Controlled Real-Time System ............................................ 56

4.4. Design of FC-RTS Algorithms ................................................................. 60

4.4.1. Design of the Controller........................................................................ 61

4.4.2. Closed-Loop System Model ................................................................. 62

4.4.3. Control Tuning and Analysis ................................................................ 64

4.4.4. FC-RTS Algorithms.............................................................................. 73

4.5. Experiments .............................................................................................. 80

4.5.1. FECSIM Real-Time System Simulator ................................................ 81

4.5.2. Scheduling Policy of the Basic Scheduler ............................................ 81

4.5.3. Workload............................................................................................... 82

4.5.4. QoS Actuator ........................................................................................ 84

4.5.5. Profiling the Controlled Real-Time Systems........................................ 85

4.5.6. Controller Parameters ........................................................................... 87

4.5.7. Performance References ....................................................................... 88

4.5.8. Evaluation Experiment A: Arrival Overload ........................................ 90

4.5.9. Evaluation Experiment B: Arrival/Internal Overload........................... 96

4.6. Comparison of Real-Time Scheduling Algorithms in Overload ............ 108

4.7. Summary ................................................................................................. 109

5. Web Server with Delay Guarantees..................................................................... 111

5.1. Introduction............................................................................................. 111

5.2. Background............................................................................................. 116

5.3. Semantics of Service Delay Guarantees ................................................. 118

5.4. A Feedback Control Architecture for Web Server QoS ......................... 120

5.4.1. Connection Scheduler ......................................................................... 121

5.4.2. Server Processes.................................................................................. 123

5.4.3. Monitor ............................................................................................... 123

5.4.4. Controllers........................................................................................... 123

5.5. Design of the Controller.......................................................................... 127

5.5.1. Performance Specifications ................................................................ 128

5.5.2. Modeling the Web Server: A System Identification Approach .......... 129

5.5.3. Root-Locus Design ............................................................................. 133

5.6. Implementation ....................................................................................... 136

5.7. Experimentation...................................................................................... 138

5.7.1. Comparing Connection Delays and Response Times......................... 139

5.7.2. System Identification .......................................................................... 141

5.7.3. Evaluation of the Adaptive Web Server ............................................. 143

5.8. Summary ................................................................................................. 150

6. Online Data Migration in Storage Systems ........................................................ 152

6.1. Introduction and Motivations.................................................................. 152

6.2. Aqueduct: a Feedback Control Architecture for Online Data Migration 156

6.2.1. Migration Planner ............................................................................... 156

6.2.2. LV Mover............................................................................................ 157

6.2.3. QoS guarantees ................................................................................... 158

6.2.4. The Feedback Control Loop ............................................................... 160

6.2.5. The Monitor ........................................................................................ 161

6.2.6. The Controller..................................................................................... 161

6.2.7. The Actuator ....................................................................................... 163

6.3. Design and Analysis of the Controller.................................................... 163

6.3.1. The Dynamic Model ........................................................................... 164

6.3.2. Controller Tuning and Analysis.......................................................... 165

6.4. Implementation ....................................................................................... 168

6.5. Experiments ............................................................................................ 169

6.5.1. Experiment Configurations................................................................. 170

6.5.2. Migration Penalty................................................................................ 171

6.5.3. System Profiling.................................................................................. 175

6.5.4. Performance Evaluation...................................................................... 177

6.6. Conclusion and Future Work .................................................................. 185

7. General Issues ...................................................................................................... 187

7.1. Granularity of Performance Control ....................................................... 187

7.2. Sampling Period and Overhead .............................................................. 189

7.3. Robustness of Linear Models and PI Control ......................................... 191

8. Conclusions and Future Work ............................................................................ 193

Reference.................................................................................................................. 197

List of Figures

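Figure 3.1 The FCS Architecture ....................................................................................... 33

Figure 3.2 Control Theory based Design Methodology for FCS Algorithms .................... 41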

Figure 4.1 Feedback Control Real-Time Scheduling Architecture.................................. 47

Figure 4.2 The Model of the Controlled System .............................................................. 57

Figure 4.3 Closed-Loop System Model for Real-Time CPU Scheduling ........................ 62

Figure 4.4 System Response to Reference Input .............................................................. 69

Figure 4.5 System Response to Disturbance Input ........................................................... 70

Figure 4.6 Settling Time vs. Process Gain........................................................................ 72

Figure 4.7 The FC-UM Algorithm.................................................................................... 76

Figure 4.8 The FECSIM Simulator................................................................................... 81

Figure 4.9 Controlled Variables vs. Total Requested Utilization..................................... 86

Figure 4.10 Response to Arrival Overload SL(0, 150%) (DM/PA).................................. 89

Figure 4.11 Response to Arrival Overload SL(0, 150%) (EDF/P) ................................... 90

Figure 4.12 Execution Time Factor Ga in Experiment B................................................. 96

Figure 4.13 Response to Arrival/Internal Overload (DM/PA) ......................................... 97

Figure 4.14 Response to Arrival/Internal Overload (EDF/P) ........................................... 98

Figure 4.15 Average Performance of FC-RTS algorithms and the Baseline.................. 107

Figure 5.1 The Feedback-Control Architecture for Delay Guarantees .......................... 120

Figure 5.2 Architecture for system identification .......................................................... 131

Figure 5.3 The Root Locus of the web server model ..................................................... 136

Figure 5.4 Connection delay and response time............................................................. 139

Figure 5.5 System identification results for Relative Delay .......................................... 141

Figure 5.6 System Identification Results for Absolute Delay........................................ 143

Figure 5.7 Evaluation Results of Relative Delay Guarantees between Two Classes..... 146

Figure 5.8 Evaluation Results of Relative Delay Guarantees for Three Classes ........... 147

Figure 5.9 Evaluation of Absolute Delay Guarantees.................................................... 150

Figure 6.1 Aqueduct: The Feedback Control Architecture for Data Migration............. 160

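Figure 6.2 Step Response of Aqueduct ............................................................................ 167

Figure 6.3 Device iops during data migration................................................................ 172

Figure 6.4 Migration Penalty in Experiment 1 ............................................................... 173

Figure 6.5 Migration Penalty in Experiment 2 ............................................................... 173

Figure 6.6 Relationship between migration speed and migration speed ........................ 176

Figure 6.7 Device iops and control input of Aqueduct .................................................. 180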

Figure 6.8 Average iops of AFAP and Aqueduct, and Aqueduct in steady state .......... 181

Figure 6.9 QoS Violation Ratio of AFAP, Aqueduct, and Aqueduct in Steady State ... 183

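Figure 6.10 QoS violation ratio using 0.98IS ................................................................. 183

Figure 6.11 Worst QoS Violations of AFAP, Aqueduct, Aqueduct in steady state ....... 184

Figure 6.12 Execution Time of Migration Plan .............................................................. 185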

List of Tables

Table 4.1 Testing Configurations.................................................................................... 82

Table 4.2 Controller Parameters of FC-RTS Algorithms ............................................... 87

Table 4.3 Performance References of FC-RTS Algorithms ........................................... 88

Table 4.4 The Performance Profiles of FC-U in Experiment B.................................... 100

Table 4.5 The Performance Profiles of FC-M in Experiment B ................................... 103

Table 4.6 The Performance Profiles of FC-UM in Experiment B ................................ 105

Table 4.7 Comparison of Real-Time Scheduling Paradigms in Overload Conditions . 109

Table 5.1 Variables and Parameters of the Absolute Delay Controller CAk................. 124

Table 5.2 Variables and Parameters of the Relative Delay Controller CRk.................. 126

List of Symbols

C(k) a controlled variable

CS a performance reference

U(k) a manipulated variable

TS the settling time

CO the overshoot

ESC the steady-state error

SP the sensitivity with regard to a system parameter P

SL(Ln, Lm) the step load that increases instantaneously from Ln to Lm

RL(Ln, Lm, TR) the ramp load that increases linearly from Ln to Lm within TR sec

Di[j] the relative deadline of task i at QoS level j

EEi[j] the estimated execution time of task i at QoS level j

AEi[j] the actual execution time of task i at QoS level j

Vi[j] the value of task i at QoS level j

Pi[j] the invocation period of periodic task i at QoS level j

EIi[j] the estimated inter-arrival-time of aperiodic task i at QoS level j

AIi[j] the average inter-arrival-time of aperiodic task i at QoS level j

Bi[j] the estimated CPU utilization of task i at QoS level j

Ai[j] the actual CPU utilization of task i at QoS level j

Ga(k) the utilization ratio in the kth sampling period

GA the worst-case utilization ratio

Gm(k) the miss ratio factor in the kth sampling period

GM the worst-case miss ratio factor

Ath(k) the schedulable utilization threshold GA in the kth sampling period

Wk the absolute or relative connection delay guarantee of service class k

Ck(m) the connection delay of class k in the mth sampling period

Bk(m) the process budget of class k in the mth sampling period

Rm(k) the inter-submove-time in the kth sampling period

Ii(k) the number of I/O per sec of device i in the kth sampling period

List of Abbreviations

FCS Feedback Control real-time Scheduling

RM Rate Monotonic scheduling policy

EDF Earliest Deadline First scheduling policy

DM Deadline Monotonic scheduling policy

Chapter 1

Introduction

1.1. Motivation

Real-time scheduling algorithms fall into two categories: static and dynamic scheduling.

In static scheduling, the scheduling algorithm has complete knowledge of the task set and

its constraints, such as deadlines, computation times, precedence constraints, and future

release times. The Rate Monotonic (RM) algorithm and its extensions [40][48] are static

scheduling algorithms and represent one major paradigm for real-time scheduling. In

dynamic scheduling, however, the scheduling algorithm does not have the complete

knowledge of the task set or its timing constraints. For example, new task activations, not

known to the algorithm when it is scheduling the current task set, may arrive at a future

unknown time. Dynamic scheduling can be further divided into two categories:

scheduling algorithms that work in resource sufficient environments and those that work

in resource insufficient environments. Resource sufficient environments are systems

where the system resources are sufficient to a priori guarantee that, even though tasks

arrive dynamically, at any given time all the tasks are schedulable. Under certain

conditions, Earliest Deadline First (EDF) [48][71] is an optimal dynamic scheduling

algorithm in resource sufficient environments. EDF is a second major paradigm for real-

time scheduling. While real-time system designers try to design the system with

sufficient resources, because of cost and unpredictable environments, it is sometimes

impossible to guarantee that the system resources are sufficient. In this case, EDF's

performance degrades rapidly in overload situations. The Spring scheduling algorithm

[79] can dynamically guarantee incoming tasks via on-line admission control and

planning and thus is applicable in resource insufficient environments. Many other

algorithms [71] have also been developed to operate in this way. These admission-

control-based algorithms represent the third major paradigm for real-time scheduling.

However, despite the significant body of results in these three paradigms of real-time

scheduling, many real world problems are not easily supported. While algorithms such as

EDF, RM and the Spring scheduling algorithm can support sophisticated task set

characteristics (such as deadlines, precedence constraints, shared resources, jitter, etc.),

they are all "open loop" scheduling algorithms. Open loop refers to the fact that once

schedules are created they are not "adjusted" based on continuous feedback. While open-

loop scheduling algorithms can perform well in predictable environments in which the

workloads can be accurately modeled (e.g., traditional process control systems), they can

perform poorly in unpredictable environments, i.e., systems whose workloads cannot be

accurately modeled. For example, the Spring scheduling algorithm assumes complete

knowledge of the task set except for their future release times. Systems with open-loop

schedulers such as the Spring scheduling algorithm are usually designed based on worst-

case workload parameters. When accurate system workload models are not available,

such an approach can result in a highly underutilized system based on extremely

pessimistic estimation of workload.

In recent years, a new category of soft real-time applications executing in open and

unpredictable environments is rapidly growing [69]. Examples include open systems on

the Internet such as online trading and e-business servers, and data-driven systems such

as smart spaces, agile manufacturing, and many defense applications such as C4I. For

example, in an e-business server, neither the resource requirements nor the arrival rate of

service requests are known a priori. However, performance guarantees are required in

these applications. Failure to meet performance guarantees may result in loss of

customers, financial damage, liability violations, or even mission failures. For these

applications, a system design based on open loop scheduling and estimation of worst-case

resource requirements can result in an extremely expensive and underutilized system.

As a cost-effective approach to achieve performance guarantees in unpredictable

environments, several adaptive scheduling algorithms have been recently developed (e.g.,

[5][8][9][24][44][46][55]). While early research on real-time scheduling was concerned

with guaranteeing complete avoidance of undesirable effects such as overload and

deadline misses, adaptive real-time systems are designed to handle such effects

dynamically. There remain many open research questions in adaptive real-time

scheduling. In particular, how can a system designer specify the performance requirement

of an adaptive real-time system? And how can he systematically design a scheduling

algorithm to satisfy its performance specifications? The design methodology for

automatic adaptive systems has been developed in feedback control theory [32][34].

However, feedback control theory has been mostly applied in mechanical and electrical

systems. The modeling, analysis and implementation of adaptive real-time systems lead

to significant research challenges.

Recently, several works applied control theory to computing systems. For example,

several papers [4][13][22][23][28][58][63][66][73][75] presented flexible or adaptive

real-time (CPU) scheduling techniques to improve digital control system performance.

These techniques are tailored to the specific characteristics of digital control systems

instead of general adaptive real-time computing systems. Several other papers [6][19]

[44][63][64][74] presented adaptive CPU scheduling algorithms or QoS management

architectures for computing systems such as multimedia and communication systems.

Transient and steady state performance of adaptive real-time systems has received special

attention in recent years. For example, Brandt et al. [19] evaluated a dynamic QoS

manager by measuring the transient performance of applications in response to QoS

adaptations. Rosu et al. [64] proposed a set of performance metrics to capture the

transient responsiveness of adaptations and its impact on applications. The paper

proposed metrics that are similar to settling time and steady-state error metrics found in

control theory.

However, to the best of our knowledge, no unified framework exists to date for designing an

adaptive system from performance specifications of desired dynamic response. In this

thesis, we establish feedback control real-time scheduling (FCS) [53], a unified

framework of adaptive real-time systems based on feedback control theory. Our control

theoretical framework includes the following elements:

Feedback control scheduling architectures that map the feedback control structure

to adaptive resource scheduling in real-time systems [52],

A set of performance specifications and metrics to characterize transient and

steady state performance of adaptive real-time systems [51], and

A control theory based design methodology for resource scheduling algorithms to

satisfy their performance specifications [50][53].

In contrast with ad hoc approaches that rely on laborious design/tuning/testing

iterations, our framework enables system designers to systematically design adaptive

real-time systems with established analytical methods to achieve desired performance

guarantees in unpredictable environments.

1.2. Contributions

Specifically, the main contributions of this thesis work are as follows:

A control-theoretical foundation for adaptive real-time systems: We apply

control theory to provide a theoretical foundation for adaptive real-time

scheduling. In contrast with some existing scheduling algorithms that utilize

feedback control in an ad hoc manner, we provide theoretical understanding of

feedback control scheduling and develop a systematic design methodology for

adaptive real-time systems with analytically proven performance guarantees in

unpredictable environments.

Design methodology for real-time systems in unpredictable environments:

While traditional design methods for real-time system design depend on a priori

known workload parameters (e.g., worst-case execution times, worst-case arrival

rates, and blocking factors due to resource contentions), our control theory based

design methodology provides robust performance guarantees when accurate

characterizations of the workloads are not available. This feature makes our

design framework especially valuable for performance critical systems in

unpredictable environments, e.g., open systems on the Internet such as online

trading and e-business servers, and data-driven systems such as smart space, agile

manufacturing, and many defense applications.

Software architecture for feedback performance control: We develop a

general software architecture for adaptive performance control in unpredictable

environments. Our architecture facilitates control theory based design and

analysis of an adaptive real-time system by mapping it to the structure of

feedback control systems. This architecture includes a set of control-related

variables (performance references, controlled variables and manipulated

variables), and software components such as monitor, actuator, and controller.

Our architecture has been implemented as three instances tailored to the specific

characteristics and performance requirements of different applications including

real-time CPU scheduling, a web server, and data migration in networked storage

systems. These successful instantiations demonstrate the general applicability of

our architecture in software systems in unpredictable environments.
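
The mapping described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the class names (ToySystem, IncrementalController), the linear toy system, and the gain values are all hypothetical and chosen only to show how a monitor, a controller, and an actuator cooperate around a performance reference, a controlled variable, and a manipulated variable.

```python
from dataclasses import dataclass


@dataclass
class IncrementalController:
    """Illustrative control law (an assumption, not the thesis design):
    the change in the manipulated variable is proportional to the error."""
    kp: float

    def update(self, reference: float, measured: float) -> float:
        error = reference - measured
        return self.kp * error


class ToySystem:
    """Hypothetical controlled system: utilization = gain * admitted load.
    The gain stands in for workload properties the designer does not know."""

    def __init__(self, gain: float) -> None:
        self.gain = gain
        self.load = 0.0  # manipulated variable

    def monitor(self) -> float:
        """Monitor: sample the controlled variable for this period."""
        return self.gain * self.load

    def actuate(self, delta: float) -> None:
        """Actuator: apply the requested change, never going below zero."""
        self.load = max(0.0, self.load + delta)


def run_loop(system: ToySystem, controller: IncrementalController,
             reference: float, periods: int) -> list:
    """Run the loop once per sampling period and return the measured trace."""
    trace = []
    for _ in range(periods):
        measured = system.monitor()                     # 1. monitor
        delta = controller.update(reference, measured)  # 2. controller
        system.actuate(delta)                           # 3. actuator
        trace.append(measured)
    return trace


if __name__ == "__main__":
    trace = run_loop(ToySystem(gain=2.0), IncrementalController(kp=0.25),
                     reference=0.9, periods=10)
    print([round(v, 3) for v in trace])
```

In this toy setting the error shrinks by a constant factor each period as long as the product of the unknown gain and kp stays between 0 and 2, which hints at why the loop can tolerate an inaccurate workload model.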

Performance specifications and guarantees: While hard real-time systems

require absolute guarantees, such guarantees are infeasible and unnecessary for

many soft real-time systems in unpredictable environments. We adopt a set of

performance metrics and specifications in control theory to characterize the

transient and steady state performance of adaptive real-time systems. Transient

state performance (including settling time and overshoot) of an adaptive system

represents the responsiveness and efficiency of adaptation in response to

environmental variations, and steady-state performance (including stability,

steady state error, and sensitivity) describes a system's long-term performance. In

contrast, traditional metrics such as average miss-ratio cannot capture the

transient behavior of the system in response to load variations.
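
As a rough illustration of these metrics, the sketch below computes overshoot, settling time, and steady-state error from a sampled trace of a controlled variable. The definitions follow common textbook usage and are assumptions here; the precise definitions adopted in this thesis are given in its later chapters and may differ. The function names, the tolerance band, and the tail length are hypothetical.

```python
def overshoot(trace, reference):
    """Largest amount by which the trace exceeds the reference (0 if never)."""
    return max(0.0, max(trace) - reference)


def settling_time(trace, reference, tolerance, period=1.0):
    """Time after which every sample stays within `tolerance` of the reference.

    Returns None when the final sample is still outside the band.
    """
    last_outside = -1
    for k, value in enumerate(trace):
        if abs(value - reference) > tolerance:
            last_outside = k
    if last_outside == len(trace) - 1:
        return None
    return (last_outside + 1) * period


def steady_state_error(trace, reference, tail=5):
    """Mean of the last `tail` samples minus the reference."""
    tail_values = trace[-tail:]
    return sum(tail_values) / len(tail_values) - reference


if __name__ == "__main__":
    # Hypothetical miss-ratio-like trace sampled every 0.5 s.
    trace = [0.0, 0.6, 1.2, 1.1, 0.99, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(overshoot(trace, 1.0))
    print(settling_time(trace, 1.0, tolerance=0.05, period=0.5))
    print(steady_state_error(trace, 1.0))
```

An average over the whole trace would hide both the peak at the third sample and the time the system needs to settle, which is the limitation of average miss ratio noted above.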

Modeling real-time computing systems: Unlike traditional control systems such

as electrical and mechanical systems, real-time computing systems do not have

readily available differential/difference equations that can be used in control

analysis. In this thesis work, we apply an analytical approach and system

identification techniques to the modeling of three computing systems: a generic

CPU-bound real-time system, a modified Apache web server, and a networked

storage system. In the analytical approach, a system designer describes a system

directly with mathematical equations based on the knowledge of the system

dynamics. When such knowledge is not available (as in the case of the Apache

web server), we use system identification [11] to estimate the system model based

on system input/output from profiling experiments. This modeling methodology

and established analytical models provide a basis for the application of control

theory to adaptive real-time scheduling.
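
To make the idea of system identification concrete, the sketch below fits a difference equation to input/output samples by least squares. The first-order model structure y(k+1) = a*y(k) + b*u(k), the function names, and the synthetic data are assumptions made for brevity; they are not the model or the profiling data used for the Apache web server in this thesis.

```python
def simulate_first_order(a, b, u, y0=0.0):
    """Generate outputs of y[k+1] = a*y[k] + b*u[k] for an input sequence u."""
    y = [y0]
    for value in u:
        y.append(a * y[-1] + b * value)
    return y


def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y[k+1] = a*y[k] + b*u[k].

    Solves the 2x2 normal equations directly; raises ValueError when the
    data cannot separate the two parameters.
    """
    n = min(len(u), len(y) - 1)
    if n < 2:
        raise ValueError("need at least two input/output pairs")
    syy = sum(y[k] * y[k] for k in range(n))
    syu = sum(y[k] * u[k] for k in range(n))
    suu = sum(u[k] * u[k] for k in range(n))
    sy1y = sum(y[k + 1] * y[k] for k in range(n))
    sy1u = sum(y[k + 1] * u[k] for k in range(n))
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("input is not exciting enough to identify the model")
    a = (sy1y * suu - sy1u * syu) / det
    b = (sy1u * syy - sy1y * syu) / det
    return a, b


if __name__ == "__main__":
    # Synthetic profiling run: a varying input applied to a known system.
    u = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    y = simulate_first_order(0.8, 0.5, u)
    print(fit_first_order(u, y))
```

The input sequence is deliberately varied: a constant input would make the two parameters hard to tell apart, which is why profiling experiments typically change the input over time.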

Handling non-linearities of real-time systems: The control design of an

adaptive resource scheduler is non-trivial due to the non-linearities and unknown

or random factors in many real-time computing systems. We solved these

problems with model linearization techniques and novel control structures based

on the particular characteristics of real-time systems. Our work demonstrates that

robust performance control can be achieved despite the intrinsic non-linearities

and uncertainties of real-time systems.

Practical FCS implementation in three applications: Using our design

framework, we develop practical resource scheduling algorithms that can provide

robust (steady state and transient) performance guarantees in unpredictable

environments, while traditional scheduling algorithms fail to provide such

guarantees. We develop FCS algorithms for three application domains including

real-time CPU scheduling, web servers, and storage systems. These applications

are significantly different in terms of semantics of performance guarantees,

scheduled resources, monitor/actuator mechanisms, and system models. Our

evaluation experiments demonstrate that our FCS algorithms based on the FCS

framework successfully achieved robust performance guarantees in all three

applications. The success in these applications demonstrates that FCS is a unified

framework for adaptive computing systems.

Real-Time CPU Scheduling: We develop a set of feedback control real-

time scheduling (FCS) algorithms that guarantee low deadline miss ratio

and high CPU utilization by dynamically adjusting task QoS levels and

CPU requirements. Simulation experiments demonstrate that our FCS

algorithms provide robust steady and transient state performance

guarantees in terms of deadline miss ratio even when the task execution

time varies considerably from its estimate and when the system's

schedulable utilization bound is unknown.

Connection Scheduling in Web Servers: We develop adaptive connection

scheduling algorithms that provide relative, absolute and hybrid service

delay guarantees for different service classes on web servers under HTTP

1.1. The scheduling algorithms feature feedback control loops that

enforce delay guarantees for classes via dynamic connection scheduling

and server process reallocation. The scheduling algorithms have been

implemented by modifying an Apache web server. Experimental results

demonstrate that our adaptive server provides robust delay guarantees

when web workload varies significantly. Properties of our adaptive web

server also include guaranteed stability, and satisfactory efficiency and

accuracy in achieving desired delay or delay differentiation. Our new real-

time web server will be particularly useful for e-business and e-trading

applications, where a priori QoS guarantees are desirable in the face of bursty

and unpredictable workloads from the Internet.

On-line Data Migration in Storage Systems: We have extended our work

to a non-real-time application, on-line data migration in storage systems.

On-line data migration is necessary in large-scale storage systems (e.g.,

data centers of e-business and large organizations, and multimedia service

centers such as video-on-demand) for performance optimization,

load balancing, and backup operations. However, data migration can

cause unacceptable performance degradations in concurrent applications

due to excessive resource contention on the storage system. We develop

an adaptive data migration executor with a feedback control architecture

that guarantees desired I/O throughput for applications by dynamically

regulating the speed of data migration. The migration executor has been

implemented and evaluated at a storage testbed at HP Labs. Our

evaluation experiments demonstrate that our adaptive migration executor

achieved the specified I/O throughput on all devices at the cost of slowing

down data migration. Our work on storage systems demonstrates the

generality of our control-theory-based framework in non-real-time

systems.

Technology Impact: Not only have we produced several research papers

[6][50][51][52][53][70], parts of this thesis work have also been transferred to

other university research groups. We have sent our real-time CPU scheduling

simulator FECSIM and the feedback control CPU scheduling algorithms to a

group in Sweden for them to study the algorithms. We have transferred the source

code of our adaptive web server and system identification software to Professor

Lui Sha's group at UIUC and given them input on the modeling of web servers. The

project of online data migration in networked storage systems was conducted

when the author was a research intern in the Storage Systems Program at Hewlett

Packard Laboratories (Palo Alto). Hewlett Packard is in the process of applying

the feedback control data migration technique developed in the Aqueduct project

for a patent.

The rest of the thesis is organized as follows. We discuss the state-of-the-art in

Chapter 2. In Chapter 3, we present the general control-theory based design methodology

for adaptive real-time systems. The first case study, feedback control real-time CPU

scheduling, is presented in Chapter 4. The second case study, adaptive connection

scheduling for service delay guarantees in web servers, is presented in Chapter 5. The

third case study, on-line data migration with I/O throughput guarantees on concurrent

applications in storage systems, is presented in Chapter 6. After summarizing several

general issues in Chapter 7, we conclude the thesis in Chapter 8.

Chapter 2

Related Work

Real-time resource scheduling has evolved from static to dynamic and

adaptive as the target application environments have become increasingly unpredictable.

While classical real-time scheduling is concerned with absolute guarantees in highly

predictable environments, more recent research aims at developing more flexible,

adaptive and cost-effective solutions to handle unpredictable environments. This thesis

work establishes a theoretical foundation and unified framework for achieving a new

category of performance guarantees in unpredictable environments with adaptive real-

time resource scheduling. In this chapter, we summarize the work related to this thesis

research. The classical results on real-time scheduling are described in Section 2.1. A

category of flexible and adaptive real-time scheduling algorithms tailored for digital

control systems is summarized in Section 2.2. In Section 2.3, we then describe existing

QoS adaptation techniques and compare them with our FCS framework. Related works

on web server delay guarantees and storage systems are summarized in Sections 2.4 and

2.5, respectively.

2.1. Classical Real-Time Scheduling

Classical real-time scheduling algorithms depend on a priori characterization of

workload and systems to provide performance guarantees in predictable environments

(e.g., embedded process control and avionics). For example, Rate Monotonic (RM)

[40][48] and Earliest Deadline First (EDF) [48][71] require complete knowledge about

the task set such as resource requirements, precedence constraints, resource contention,

and future arrival times. Dynamic real-time systems [71] pioneered by the Spring project

[79] provide guarantees upon new task arrivals with on-line admission control and

planning. Unlike earlier systems based on RM or EDF, the dynamic real-time systems do

not require future task arrival time to be known a priori. However, the on-line admission

control and planning in the above dynamic systems still depend on a priori task set

characterizations including resource requirements, precedence constraints, and resource

contention. While classical algorithms such as EDF, RM and the Spring scheduling

algorithm can support sophisticated task set characteristics, they cannot provide

performance guarantees in systems operating in unpredictable environments where an

accurate workload model is not available. Such systems include Internet servers (e.g., on-

line stock trading and e-business) and data-driven systems (e.g., smart spaces and agile

manufacturing). A key observation that motivated this thesis work is that a fundamental

reason for the inadequacy of classical real-time scheduling in unpredictable environments

lies in their open loop nature. Because they do not adjust schedules based on continuous

performance feedback, open loop schedulers schedule tasks and system resources based on

worst-case workload estimations. When accurate system workload models are not

available, the open loop approach may result in a highly underutilized system based on

extremely pessimistic estimation of workload. In contrast, feedback control real-time

scheduling provides robust performance guarantees in unpredictable environments with a

closed loop approach.

2.2. Real-Time Scheduling for Embedded Digital Control Systems

There have been several results that have applied feedback control theory to the design of

real-time computing systems. For example, several papers [30][58][65][66] presented co-

design methods for real-time scheduling algorithms and embedded digital control

systems. The co-design methods trade off the quality of control performance against its

computation requirements to produce more cost-effective system designs than separate

design of control and scheduling. These approaches are off-line solutions and their on-

line scheduling algorithms are still classical open-loop algorithms such as EDF and RM.

Several other papers presented on-line scheduling algorithms [4][16][22][23][30][73] to

improve the robustness of digital control systems by dynamically relaxing the timing

constraints within the tolerable range of the digital control system in overload conditions.

However, these techniques require a priori knowledge of the tasks such as execution

times. Furthermore, these techniques are tailored to CPU-bound digital controllers and

are not applicable to other computing systems such as e-business servers and on-line

trading where the performance bottleneck may not be the CPU.

2.3. QoS Adaptation

The concept of using performance feedback to adjust the schedule has been incorporated

in general-purpose operating systems in the form of multi-level feedback queue

scheduling [18]. The system adjusts a task's priority based on whether it consumes a time

slice or is blocked due to I/O. This type of feedback control is based on intuitive solutions

rather than systematic control derivation to achieve performance guarantees.

In recent years, QoS adaptation architectures and algorithms have been developed to

support applications such as communication subsystems [8], multimedia [19][24],

distributed visual tracking [46] and operating systems [55][61][63][69][78]. Some of

these techniques [55][61][63] include optimization algorithms to optimize the value in

QoS adaptation. However, their optimization algorithms assume that the resource

requirement of every QoS level is a priori known. In contrast, our FCS framework

provides performance guarantees even when the resource requirements are unknown or

deviate from the estimations. Several other works [8][21][25][78] developed feedback

based adaptation algorithms that do not depend on completely accurate knowledge about

workloads. However, their feedback loops were based on heuristics and they did not

establish time domain analysis on the efficiency of QoS adaptation in response to run-

time variations. Our FCS framework provides a unified framework to design adaptive

real-time systems with proven transient state performance.

Li and Nahrstedt utilized control theory to develop a feedback control loop to

guarantee desired network packet rate in a distributed visual tracking system [46]. Hollot,

Misra, Towsley, and Gong [36] applied control theory to analyze a congestion control

algorithm on IP routers. While these works also apply control-theoretic analysis to

computing systems, they do not address timing constraints and service delays on end

server systems, which are the focus of this thesis.

Transient and steady state performance of QoS adaptation has received special

attention in recent years (e.g., [19][64][75]). For example, Brandt et al. [19] evaluated a

dynamic QoS manager by measuring the transient performance of applications in

response to QoS adaptations. Rosu et al. [64] proposed a set of performance metrics to

capture the transient responsiveness of adaptations and its impact on applications.

However, they did not provide a methodology to design a system from its performance

specifications in terms of the above metrics. Instead, they only used the metrics in system

testing. In contrast, by extending and mapping these metrics to the dynamic response of

control systems, our FCS framework provides a control-theory-based methodology to

design a system to analytically satisfy its performance specifications.

2.4. Service Delay Guarantee in Web Servers

Support for different classes of service on the Web (with special emphasis on server

delay differentiation) has been investigated in recent literature. For example, the authors

of [28] proposed and evaluated an architecture in which restrictions are imposed on the

amount of server resources (such as threads or processes), which are available to basic

clients. In [9][10] admission control and scheduling algorithms are used to provide

premium clients with better service. In [17] a server architecture is proposed that

maintains separate service queues for premium and basic clients, thus facilitating their

differential treatment. While the above differentiation approach usually offers better

service to premium clients, it does not provide any guarantees on the service and hence

can be called the best effort differentiation model.

Notably, a feedback control loop was used in [5][6][9] to control the desired CPU

utilization of a web server with adaptive admission control. Their CPU utilization control

can be extended to guarantee the desired absolute delay in web servers under HTTP 1.0

protocol and when CPU is the bottleneck resource. This technique is not applicable to

servers under HTTP 1.1 protocol, which can be handled by our adaptive server described

in Chapter 5. A least squares estimator was used in [1] for automatic profiling of resource

usage parameters of a web server. However, the work did not establish a dynamic

model for the server.

Several other works such as [13][26] developed kernel-level mechanisms to achieve

overload protection and proportional resource allocations in server systems. Their work

did not utilize feedback control, nor did they provide any relative or absolute delay

guarantees. Support for proportional differentiated services in network routers has been

investigated in [26][47]. Their work did not address end systems such as web servers.

2.5. Data Migration in Storage Systems

An old approach to performing backups and data relocations is to do them at night, while

the system is idle. As discussed, this does not help with many current applications such

as e-business that require continuous operation and adaptation to quickly changing

system/workload conditions. The approach of bringing the whole (or parts of the) system

offline is also impractical due to the substantial business costs that it incurs. Online

migration and backup are still in their infancy in the current state of the art. Some

existing tools such as the Veritas Volume Manager [75] can guarantee consistent access

to each piece of data while it is being migrated. However, we are not aware of any

existing solution that handles concurrent accesses while bounding the impact of

migration on concurrent applications.

Chapter 3

Feedback Control Real-Time Scheduling

Framework

In this chapter, we describe feedback control real-time scheduling (FCS), a unified

framework of adaptive real-time systems based on feedback control theory. The FCS

framework includes the following elements:

A feedback control scheduling architecture that maps adaptive resource

scheduling in real-time systems [52] to feedback control loops,

A set of performance specifications and metrics [51] to characterize transient and

steady state performance of adaptive real-time systems, and

A control theory based design methodology [50][53] for resource scheduling

algorithms to satisfy their performance specifications.

A key feature of the FCS framework is its use of feedback control theory (rather than

ad hoc solutions) as a scientific underpinning. The FCS framework enables system

designers to systematically design adaptive real-time systems with established analytical

methods to achieve analytically provable performance guarantees in unpredictable

environments. To the best of our knowledge, this is the first unified framework that provides a

fundamental theory and analytical design methodology for adaptive real-time systems to

achieve specified performance guarantees in unpredictable environments. In this chapter,

we describe the elements of the general FCS framework at a high level. The specific

technical challenges and solutions are described with their concrete instantiations in three

different application domains: real-time CPU scheduling (Chapter 4), web servers

(Chapter 5), and networked storage systems (Chapter 6).

3.1. Feedback Control Scheduling Architecture

The major components of our FCS architecture are a set of control related variables and a

feedback control loop that maps a feedback control system structure to real-time resource

scheduling.

[Figure: block diagram. Labels recovered from the original: performance reference, error (+/-), Controller (control function), control input, Actuator, manipulated variable, Real-Time System, Scheduler, controlled variable, Monitor, sample.]

Figure 3.1. The FCS Architecture

3.1.1. Control Related Variables

A first step in designing the FCS architecture is to decide the following key variables of a

real-time system in terms of control theory.

Controlled variable C(k): the performance metric that characterizes the system

performance defined over a sampling period ((k-1)W, kW), where W is an

application-specific constant called the sampling window. The scheduler controls

the controlled variable in order to achieve the desired performance. The choice of

controlled variables depends on the performance guarantees that need to be

provided to the specific application of a system. For example, if an absolute delay

guarantee is required in an Internet server (e.g., critical stock trading operations in

an on-line trading system), the (absolute) service delays of HTTP requests should

be defined as the controlled variable. On the other hand, if proportional

differentiated service is required in an Internet server (e.g., e-commerce stores

where customers are classified into different service classes depending on their

monthly fees), the relative delays of service classes become the appropriate

controlled variables. For another example, the deadline miss ratio and the CPU

utilization are typical controlled variables for soft real-time systems (e.g.,

multimedia streaming, process control, and robotics) where explicit timing

constraints need to be respected.

Performance reference CS: the desired system performance in terms of a controlled

variable C(k). The performance reference defines a contract established between

the adaptive resource scheduler and the users such that the performance reference

should be enforced. The difference between the performance reference and the

value of the corresponding controlled variable is called the error EC(k) = CS - C(k).

For example, if a system sets its performance reference to a deadline miss ratio of CS =

2%, and the current miss ratio is 10%, the system has an error EC(k) = -8%.

Manipulated variable U(k): a system attribute that is dynamically changed by the

scheduler. The manipulated variable should be effective for performance control,

i.e., changing its value should affect the system's controlled variable(s). The

choice of manipulated variable should reflect the resource bottleneck of a system.

For example, the total requested utilization can be used as a

manipulated variable if the CPU is the bottleneck resource of a web server, but it should

not be used as the manipulated variable if the CPU is not the bottleneck resource

(e.g., in the case of HTTP 1.1 as described in Section 5.2).
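
The relationship between the controlled variable, the performance reference, and the error can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the miss ratio is assumed here to be the number of missed deadlines divided by the number of tasks that finished in the sampling window, which the text above does not spell out.

```python
def miss_ratio(deadlines_missed, tasks_finished):
    """Controlled variable C(k): deadline miss ratio in one sampling window.

    Assumed definition: missed deadlines / tasks finished in the window.
    """
    if tasks_finished == 0:
        return 0.0
    return deadlines_missed / tasks_finished


def control_error(reference, controlled):
    """Error EC(k) = CS - C(k)."""
    return reference - controlled


# Numbers from the example in the text: CS = 2%, measured miss ratio = 10%.
c_k = miss_ratio(deadlines_missed=10, tasks_finished=100)
e_k = control_error(reference=0.02, controlled=c_k)
print(f"C(k) = {c_k:.2f}, EC(k) = {e_k:+.2f}")
```

A negative error means the measured miss ratio is above the reference, so the scheduler has to reduce the load it admits.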

3.1.2. Feedback Control Loop

The FCS architecture has a feedback control loop that is invoked at every sampling

instant k. Each feedback control loop is composed of a Monitor, a Controller, and an

Actuator.

1) The Monitor measures the controlled variables and feeds the samples back to the

Controller.

2) The Controller compares the performance references with corresponding controlled

variables to get the current errors, and calls control algorithms to compute a control

input, the new value of the manipulated variable based on the errors. The control

algorithm is a critical component with significant impacts on the system performance

and hence is the centerpiece of the design of an FCS algorithm. Note that control

theory enables us to derive the control algorithm and analytically prove that the

algorithm can provide the desired performance guarantees.

3) The Actuator changes the manipulated variable based on the newly computed control

input. The Actuator implements a mechanism that dynamically reallocates

(reschedules) the resource corresponding to the manipulated variable. For example,

corresponding to a manipulated variable of the total requested CPU utilization, we

design a QoS Actuator to dynamically adjust task QoS levels (different QoS levels

have different execution times and/or invocation periods).
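
The three steps above can be sketched as a small simulation. Everything in it is an assumption made for illustration, not an algorithm from this thesis: the toy plant (actual utilization equals a fixed gain times the requested utilization, capped at 1), the incremental control law U(k+1) = U(k) + kp * E(k), and the names ToySystem, controller, and run_loop.

```python
class ToySystem:
    """Hypothetical plant: actual utilization = gain * requested, capped at 1.0."""

    def __init__(self, gain):
        self.gain = gain          # ratio of actual to estimated execution time
        self.requested = 0.0      # manipulated variable U(k)

    def measure(self):
        # Monitor: sample the controlled variable C(k).
        return min(1.0, self.gain * self.requested)

    def actuate(self, requested):
        # Actuator: apply the new manipulated variable, kept within [0, 1].
        self.requested = max(0.0, min(1.0, requested))


def controller(reference, sample, current_u, kp):
    # Controller: E(k) = CS - C(k), then U(k+1) = U(k) + kp * E(k).
    error = reference - sample
    return current_u + kp * error


def run_loop(system, reference, kp, steps):
    """Invoke the loop once per sampling instant and record C(k)."""
    trace = []
    for _ in range(steps):
        c_k = system.measure()
        trace.append(c_k)
        system.actuate(controller(reference, c_k, system.requested, kp))
    return trace


# Execution times are twice their estimates (gain = 2), unknown to the controller.
system = ToySystem(gain=2.0)
system.actuate(0.5)
trace = run_loop(system, reference=0.7, kp=0.3, steps=30)
print([round(c, 3) for c in trace[:5]], round(trace[-1], 3))
```

In this toy setting the loop starts saturated and then converges to the reference without knowing the gain, which is the behavior an open loop scheduler with a wrong execution time estimate cannot provide.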

3.2. Performance Specifications and Metrics

We now describe the second element of the FCS framework, the performance

specifications and metrics for adaptive real-time systems. While early research on real-

time computing was concerned with guaranteeing complete avoidance of undesirable

effects such as overload and deadline misses, adaptive real-time systems are designed to

handle such effects dynamically. Using a control theory framework, we characterize the

dynamic performance of an adaptive real-time system in both transient and steady state

upon load or resource changes. Transient behavior of an adaptive system represents the

responsiveness and efficiency of adaptation in reacting to changes in run-time conditions,

and steady-state behavior describes a system's long-term performance after its transient

response settles. In contrast, traditional metrics such as the average miss-ratio often fail

to capture the transient behavior of the system in response to load variations. Another

important advantage of our metrics is that they formulate the performance of real-time

systems as dynamic responses in control theory, and therefore enable the use of control

design methods to satisfy the specifications. Our performance specifications and metrics

consist of a set of performance profiles (see footnote 1) in terms of the controlled variables. We also

present a set of representative load profiles adapted from control theory [32].

Corresponding to signals widely used in control theory, our load profiles can be used to

provide guidance for control design and generate canonical system response to variations

of run-time conditions.

3.2.1. Performance Profile

The performance profile characterizes important transient and steady state properties of a

system in terms of its controlled variables. Note that when the sampling window W is

small, a controlled variable C(k) approximates the instantaneous system performance at

the sampling instant k. In contrast, traditional metrics for real-time systems such as

average miss-ratio and average utilization are defined based on a much larger time

window than the sampling period W. The average metrics are often inadequate for

characterizing the dynamics of the system performance [50]. From the control theory

point of view, a real-time system transits from the steady state to the transient state when

a controlled variable deviates significantly from its steady state value in response to

variation in its run-time condition. After a time interval in the transient state, the system

may settle down to a new steady state after the feedback control loop converges the

controlled variable to the vicinity of a new value. The steady state is defined as a state

when the controlled variable C(k) stays within a specified percentage of its performance reference CS. The

performance profile includes the following elements.

Footnote 1: The performance profile was called the miss-ratio profile in [50], where the deadline miss ratio is used as the controlled variable.

Stability: A system is Bounded-Input-Bounded-Output (BIBO) stable if its controlled

variables are always bounded for bounded performance references and

disturbances. Note that the performance of an unstable system can severely and

persistently diverge from the desired performance so as to cause system

malfunctioning and even complete system failure. Stability is a necessary

condition for achieving the desired performance reference. Stability is an especially

important requirement for FCS algorithms because a poorly designed

Controller can overreact to performance errors and push a real-time system to

unstable conditions.

Transient-state response represents the responsiveness and efficiency of adaptive

resource scheduling in reacting to changes in run-time conditions.

Settling time Ts: The time it takes the system to settle down to a steady

state from the start of a transient state. The settling time represents how

fast the system can regain desired performance after a change in its run-

time condition.

Overshoot Co: The maximum amount that a controlled variable overshoots

its reference divided by its reference, i.e., Co = (CM - CS) / CS where CM is

the maximum value of the controlled variable during its transient state.

Overshoot characterizes the worst-case transient performance degradation

of a system. A system may require a low overshoot because severe

transient performance degradation may lead to system failure. For

example, in media players, a high transient deadline miss-ratio can cause

buffer overflows [19].

Steady-state error ESC: The difference between the average value of a controlled

variable in steady state and its reference. The steady state error characterizes how

precisely the system can enforce the desired performance in steady state.

Sensitivity SP: Relative change of a controlled variable in steady state with respect

to the relative change of a system parameter P. For example, assuming the

controlled variable is deadline miss ratio, the system's sensitivity with respect to

the task execution time SAE represents how significantly the change in the task

execution time affects the system miss-ratio. Sensitivity describes the robustness

of the system with regard to workload or system variations.

The performance profile establishes a set of metrics of adaptive real-time systems based

on the specification of dynamic response in control theory. The metrics enable system

designers to apply established control theory techniques to achieve stability, and meet

transient and steady state specifications.
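
As an illustration of how these metrics could be extracted from measurements, the sketch below computes the settling time, overshoot, and steady-state error from a list of samples C(k). It is a hedged example rather than the evaluation code used in this thesis: the function name, the tolerance argument (the relative width of the steady-state band), the convention that the first sample marks the start of the transient state, and the clamping of overshoot at zero are all assumptions.

```python
from statistics import mean


def performance_profile(trace, reference, window, tolerance):
    """Return (settling_time, overshoot, steady_state_error) for samples C(k).

    trace[0] is assumed to be the first sample of the transient state, and
    steady state means |C(k) - CS| <= tolerance * CS from some index onward.
    """
    band = tolerance * reference
    # Earliest index from which every later sample stays inside the band.
    settled = len(trace)
    for k in range(len(trace) - 1, -1, -1):
        if abs(trace[k] - reference) > band:
            break
        settled = k
    if settled == len(trace):
        raise ValueError("trace never reaches steady state")
    settling_time = settled * window
    transient = trace[:settled]
    # Co = (CM - CS) / CS, reported as 0 if the reference is never exceeded.
    peak = max(transient) if transient else reference
    overshoot = max(0.0, (peak - reference) / reference)
    # ESC: average steady-state value minus the reference.
    steady_state_error = mean(trace[settled:]) - reference
    return settling_time, overshoot, steady_state_error


# Illustrative miss-ratio samples with CS = 2% and W = 0.5 s.
samples = [0.10, 0.06, 0.03, 0.021, 0.02, 0.0195, 0.0205, 0.02]
ts, co, esc = performance_profile(samples, reference=0.02, window=0.5, tolerance=0.10)
print(f"Ts = {ts} s, Co = {co:.1f}, ESC = {esc:+.4f}")
```

The sample values are made up for the example; with real data the trace would come from the Monitor, one value per sampling window.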

3.2.2. Load Profile

According to control theory, the performance profile of an adaptive system may be

specified assuming representative load profiles including step load and ramp load. The

step load represents the worst case of load variation that overloads the system

instantaneously, while the ramp load represents a nominal form of load variation. The

load profiles are defined as follows.

Step-load SL(Ln, Lm): a load profile that instantaneously jumps from a nominal

load Ln to a higher load Lm > Ln and stays constant after the jump. Instantaneous

load change such as the step load is more difficult to handle than gradual load

change.

Ramp-load RL(Ln, Lm, TR): a load profile that increases linearly from the nominal

load Ln to a higher load Lm > Ln during a time interval of TR sec. Compared with

the step load, the ramp signal represents a less severe load variation scenario.
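
A minimal sketch of the two load profiles as functions of time follows. The onset parameters t_step and t_start are assumptions added for illustration (the definitions above do not name the time at which the change begins), as is holding the load at Lm once the ramp has finished.

```python
def step_load(t, l_n, l_m, t_step=0.0):
    """SL(Ln, Lm): nominal load Ln before t_step, higher load Lm from t_step on."""
    return l_m if t >= t_step else l_n


def ramp_load(t, l_n, l_m, t_r, t_start=0.0):
    """RL(Ln, Lm, TR): load rises linearly from Ln to Lm over t_r seconds."""
    if t <= t_start:
        return l_n
    if t >= t_start + t_r:
        return l_m  # assumed: the load stays at Lm after the ramp
    return l_n + (l_m - l_n) * (t - t_start) / t_r


# Load from 50% to 150% of capacity, sampled every 2.5 s.
times = [0.0, 2.5, 5.0, 7.5, 10.0]
print([step_load(t, 0.5, 1.5, t_step=5.0) for t in times])
print([ramp_load(t, 0.5, 1.5, t_r=10.0) for t in times])
```

How the abstract load value is turned into an actual workload (task arrivals, concurrent users, and so on) remains application specific.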

One key advantage of using the above load profiles for performance specification is

that they are amenable to well-established design and analysis methods in control theory

and, therefore, fit well with our control-theoretic framework. This means that a system

designer can use control theory methods to analytically design the system to satisfy a

performance profile in response to a load profile as defined above. Specifically, a load

profile can be modeled as disturbance signals in the form of a step or ramp signal (see

Section 4.4). Based on control theory, a linear system's dynamic properties can be

determined by its dynamic response to a step signal or a ramp signal, regardless of the signal's

parameters including the magnitude of load variation (Lm-Ln) and the ramp duration TR. If

a real-time system can be approximated with a linear model in its operation conditions,

its performance profile can be determined by stressing the system with a step load, i.e.,

the system can achieve satisfactory performance under any combinations of step and

ramp load if its performance profile in response to a step load or ramp load satisfies its

specifications.
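
The independence from the magnitude of the load change follows from the scaling (homogeneity) property of linear systems. The sketch below checks it numerically on a hypothetical first-order difference equation; the coefficients a and b and the function name are illustrative assumptions and do not correspond to any model derived in this thesis.

```python
def step_response(a, b, magnitude, steps):
    """Simulate y(k+1) = a*y(k) + b*d(k) with y(0) = 0 and d(k) = magnitude."""
    y, out = 0.0, []
    for _ in range(steps):
        out.append(y)
        y = a * y + b * magnitude
    return out


small = step_response(0.6, 0.4, magnitude=0.5, steps=20)
large = step_response(0.6, 0.4, magnitude=2.0, steps=20)

# Normalizing each response by its step magnitude gives the same curve,
# so relative overshoot and settling time do not depend on the magnitude.
norm_small = [y / 0.5 for y in small]
norm_large = [y / 2.0 for y in large]
max_gap = max(abs(s - l) for s, l in zip(norm_small, norm_large))
print(f"largest gap between normalized responses: {max_gap:.2e}")
```

A non-linear system, for example one whose utilization saturates at 100%, would not pass this check, which is why its response has to be evaluated for each load profile of interest.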

Unfortunately, if a real-time system is non-linear in its operation conditions, the

dynamic response of the system to arbitrary load variations cannot be determined by

its response to a single step load or a single ramp load because the system performance

depends on the specific parameters of the load profiles. In this case, the performance

profiles in response to specific load profiles are only indications of the system

performance in general. The load profiles are then application-specific, based on a

set of expected load characteristics and system requirements.

We should also note that load profile is an abstraction of the workload, and there can

be many possible instantiations of the same load profile. The instantiation of a load

profile should incorporate the knowledge of the workload, and, therefore, the load profile

should be viewed as an enhancement to existing benchmarks (e.g., [37][40][41][42]

[75][77]). For example, the system load can be interpreted as the total requested CPU

utilization in the system where CPU is the bottleneck resource. For another example, the

load of an Internet server may be interpreted as the number of concurrent users.

[Figure: flow diagram. Labels recovered from the original: Requirement Analysis, Performance Specifications, Modeling, System Model, Controller Design, FCS algorithms, Satisfy.]

Figure 3.2. Control Theory based Design Methodology for FCS Algorithms

3.3. Control Theory Based Design Methodology

The third element of our FCS framework is the control theory based design methodology

(see Figure 3.2). Based on the scheduling architecture and the performance specifications,

we now establish a design methodology based on feedback control theory. Using this

design methodology, a system designer can systematically design an adaptive resource

scheduler to satisfy the system's performance specifications with established analytical

methods. This methodology is in contrast with existing ad hoc approaches that depend on

laborious design/tuning/testing iterations. Our design methodology works as follows.

1) The system designer specifies the desired dynamic behavior with transient and

steady state performance metrics. This step maps the performance requirements of

an adaptive real-time system to the dynamic response specification of a control

system.

2) The system designer establishes a dynamic model of the real-time system for the

purposes of performance control. A dynamic model describes the mathematical

relationship between the control input and the controlled variables of a system

with differential/difference equations or state matrices. Modeling is important

because it provides a basis for the analytical design of the Controller. However,

modeling has been a major challenge for applying control theory to real-time

systems due to the lack of established differential/difference equations to describe

real-time systems. Two different approaches can be used to establish the dynamic

model of a real-time system. The analytical approach directly describes a system

with mathematical equations based on the knowledge of the system dynamics.

When such knowledge is not available, the system identification approach [11]

can be used to estimate the system model based on profiling experiments. In this

thesis work, we apply the analytical approach to model a generic CPU-bound

real-time system and a storage system, and develop a system identification tool

to model a web server whose dynamics are less clear. Our work represents a first

step in modeling real-time systems using rigorous mathematical equations. Our

modeling methodology and established analytical models provide a foundation for

the application of control theory to adaptive real-time systems in this thesis work

and future works in this area.

3) Based on the performance specifications and system model from steps 1) and 2), the

system designer applies established mathematical techniques (e.g., the Root Locus

method, frequency design, or state based design) of feedback control theory [32]

to design FCS algorithms that analytically guarantee the specified transient and

steady-state behavior at run-time. Compared with existing ad hoc approaches, our

analytical design approach significantly reduces the design time and effort

required for adaptive systems because it requires far fewer design/testing

iterations. Furthermore, the resultant system's parameters can be easily

tuned with existing control theory methods and tools in practice, and the resultant

system can be proved to satisfy its performance specifications. In contrast, the

tuning of adaptive systems designed with ad hoc methods often depends on repeated

testing, guessing, or rules of thumb, without performance guarantees at run-time.
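
To make step 3) concrete, the following minimal sketch works through a pole-based stability check for a hypothetical first-order model. The model u(k+1) = u(k) + g * delta_b(k), the proportional control law delta_b(k) = kp * (u_ref - u(k)), and all names (g, kp, closed_loop_pole, is_stable, simulate) are assumptions chosen for illustration; they are not the models or controllers derived later in this thesis. Substituting the control law into the model gives u(k+1) = (1 - kp*g) * u(k) + kp*g * u_ref, so the closed loop has a single pole at 1 - kp*g and is stable when that pole lies strictly inside the unit circle.

```python
def closed_loop_pole(kp: float, g: float) -> float:
    """Pole of u(k+1) = (1 - kp*g) * u(k) + kp*g * u_ref (assumed model)."""
    return 1.0 - kp * g


def is_stable(kp: float, g: float) -> bool:
    """A discrete-time first-order loop is stable iff |pole| < 1."""
    return abs(closed_loop_pole(kp, g)) < 1.0


def simulate(kp: float, g: float, u_ref: float, u0: float = 0.0, steps: int = 20):
    """Iterate the assumed closed loop and return the trace of u(k)."""
    u = u0
    trace = [u]
    for _ in range(steps):
        delta_b = kp * (u_ref - u)  # proportional control input (assumed)
        u = u + g * delta_b         # assumed first-order system model
        trace.append(u)
    return trace
```

For example, with g = 1.0 a gain of kp = 0.5 places the pole at 0.5 and the simulated output converges to the reference, whereas kp = 2.5 places the pole at -1.5 and the loop diverges. The point of the sketch is only that such a conclusion can be reached analytically from the model before any testing.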

In summary, we describe a unified FCS framework for adaptive real-time systems

that provides performance guarantees in unpredictable environments. Our FCS

framework includes 1) a software architecture for feedback performance control, 2) a

set of performance specifications and metrics that describes the efficiency, accuracy,

and robustness of performance guarantees, and 3) a control theory methodology for

designing FCS algorithms to satisfy the performance specifications. In the next three

chapters, we describe the details of three instantiations of the FCS framework in three

application domains.

Chapter 4

Real-Time CPU Scheduling

In this chapter, we develop a set of novel real-time CPU scheduling algorithms called

FC-RTS [51][52][53][70] that guarantee low deadline miss ratio and high CPU utilization

when the workload deviates from estimates at run-time. Our FC-RTS algorithms provide a

scheduling solution for a new category of soft real-time systems working in unpredictable

environments, whose performance cannot be guaranteed by many existing real-time

scheduling algorithms including RM [43], EDF [70], the Spring algorithm [79], and QoS

adaptation algorithms [4][61]. Such systems include open systems on the Internet such as

on-line trading servers, e-business servers, and on-line media streaming, and data driven

systems such as database applications. For example, in an on-line trading server, the

processing time for a service request often depends on the user input that is unknown to

the scheduler. For another example, in a surveillance system, the processing time of

object tracking based on camera images can vary dramatically with the scope of movement

of the object being tracked [23]. In addition, our FC-RTS algorithms can also provide

performance guarantees for off-the-shelf software applications, components, and device

drivers when accurate information on their execution time and invocation rates is

unavailable.

A motivation for applying the FCS framework to real-time CPU scheduling is the

observation that many existing feedback based scheduling algorithms [8][21][25] are

based on heuristics rather than a theoretical foundation. These algorithms often depend

on laborious design/tuning/testing iterations, and may still fail to handle unexpected or

untested conditions at run-time. While the design methodology for automatic feedback

control systems has been developed in feedback control theory, the modeling, analysis,

and implementation of real-time scheduling pose significant challenges to

real-time system research. In this thesis, we design our FC-RTS algorithms based on

feedback control theory by instantiating the FCS framework in real-time CPU scheduling.

Specifically, our major contributions include the following:

• A novel and general feedback control real-time CPU scheduling architecture that

allows plug-ins of different real-time scheduling policies and QoS optimization

algorithms, and a set of tuning rules based on the scheduling policies,

• An analytical model of a CPU-bound real-time system, which to the best of our

knowledge is the first dynamic model for generic real-time CPU scheduling,

• A set of analysis results and tuning methods for FC-RTS algorithms to achieve

performance specifications including stability, settling time, overshoot, steady

state performance, and sensitivity with regard to workload variations,

• Practical FC-RTS algorithms applicable to different types of real-time

applications,

• Performance evaluation results demonstrating that our analytically designed

FC-RTS algorithms can provide robust performance guarantees in terms of deadline

miss ratio and CPU utilization, and achieve satisfactory performance profiles in

response to overloads caused by new task arrivals and task execution time

variations.

The feedback control real-time scheduling architecture is described in Section 4.1.

We describe the performance specifications and metrics in Section 4.2. We establish an

analytical model for a real-time system in Section 4.3. Based on the model, we present

the design and control analysis of a set of FC-RTS algorithms in Section 4.4. We present

the performance evaluation results of these scheduling algorithms in Section 4.5. We then

qualitatively compare FC-RTS algorithms with several existing scheduling paradigms in

Section 4.6. Finally, we summarize this chapter in Section 4.7.

Figure 4.1. Feedback Control Real-Time Scheduling Architecture [diagram labels: Task Arrivals, QoS Actuator (Adjust QoS), Basic Scheduler (Sched), Current Tasks, CPU, Completed/Aborted Tasks, Monitor, Controlled Variables, Controller, Performance References, Control Input]

4.1. Feedback Control Real-Time Scheduling Architecture

Our feedback control real-time CPU scheduling (FC-RTS) architecture (illustrated in

Figure 4.1) is composed of four parts: a task model, a set of control related variables, a

feedback control loop that maps a feedback control system structure to real-time CPU

scheduling, and a Basic Scheduler.

4.1.1. Task Model

In our task model, each task Ti has N QoS levels (N ≥ 2). Each QoS level j (0 ≤ j ≤ N-1)

of Ti is characterized by the following attributes:

Di[j]: the relative deadline

EEi[j]: the estimated execution time

AEi[j]: the (actual) execution time that can vary considerably from instance to

instance and is unknown to the scheduler

Vi[j]: the value that task Ti contributes if it is completed at QoS level j before its

deadline Di[j]. The lowest QoS level 0 represents the rejection of the task

and Vi[0] = 0. Every QoS level contributes a miss penalty MPi < 0 if it

misses its deadline.

Periodic tasks:

Pi[j]: the invocation period

Bi[j]: the estimated CPU utilization Bi[j] = EEi[j] / Pi[j]

Ai[j]: the (actual) CPU utilization Ai[j] = AEi[j] / Pi[j]

Aperiodic tasks:

EIi[j]: the estimated inter-arrival-time between subsequent invocations

AIi[j]: the average inter-arrival-time that is unknown to the scheduler

Bi[j]: the estimated CPU utilization Bi[j] = EEi[j] / EIi[j]

Ai[j]: the (actual) CPU utilization Ai[j] = AEi[j] / AIi[j]

In this model, a higher QoS level of a task has a higher (both estimated and actual)

CPU utilization and contributes a higher value if it meets its deadline, i.e., Bi[j+1] > Bi[j],

Ai[j+1] > Ai[j], and Vi[j+1] > Vi[j]. In the simplest form, each task only has two QoS

levels (corresponding to the admission and the rejection of the task, respectively). In

many applications including web services [5], multimedia [19], embedded digital control

systems [23], and systems that support imprecise computation [48] or flexible security

[68], each task has more than two QoS levels and the scheduler can trade off the CPU

utilization of a task against the value it contributes to the system at a finer granularity. The

QoS levels may differ in terms of execution time and/or period/inter-arrival-time. For

example, a web server may dynamically change the execution time of an HTTP session

by changing the complexity of the requested web page [5]. For another example, several

papers have shown that the deadlines and periods of tasks in embedded digital control

systems and multimedia players can be adjusted on-line [19][23][66] within certain

ranges. A key feature of our task model is that it characterizes systems in unpredictable

environments where tasks' actual CPU utilization is time varying and unknown to the

scheduler. Such systems are amenable to the use of feedback control loops to

dynamically correct the scheduling errors to adapt to load variations at run-time.
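
The task model above can be sketched as a small data structure. This is an illustrative sketch only: the class names QoSLevel and Task, the field names, and the example task are assumptions, not part of the FC-RTS implementation. Only scheduler-visible attributes are stored; the actual execution time AEi[j] is omitted because it is unknown to the scheduler. The rejection level is modeled here with zero estimated execution time, which is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QoSLevel:
    deadline: float       # Di[j]: relative deadline
    est_exec_time: float  # EEi[j]: estimated execution time
    period: float         # Pi[j] (periodic) or EIi[j] (aperiodic)
    value: float          # Vi[j]: value contributed if the deadline is met

    @property
    def est_utilization(self) -> float:
        # Bi[j] = EEi[j] / Pi[j]  (or EEi[j] / EIi[j] for aperiodic tasks)
        return self.est_exec_time / self.period


@dataclass
class Task:
    name: str
    levels: List[QoSLevel]  # levels[0] is the rejection level
    miss_penalty: float     # MPi

    def validate(self) -> None:
        """Check the ordering constraints stated in the task model."""
        if len(self.levels) < 2:
            raise ValueError("a task needs at least two QoS levels")
        if self.levels[0].value != 0:
            raise ValueError("QoS level 0 (rejection) must have value 0")
        for lower, higher in zip(self.levels, self.levels[1:]):
            if higher.est_utilization <= lower.est_utilization:
                raise ValueError("Bi[j+1] must exceed Bi[j]")
            if higher.value <= lower.value:
                raise ValueError("Vi[j+1] must exceed Vi[j]")


# Hypothetical three-level task: rejected, low rate, high rate.
player = Task(
    name="player",
    levels=[
        QoSLevel(deadline=100.0, est_exec_time=0.0, period=100.0, value=0.0),
        QoSLevel(deadline=100.0, est_exec_time=10.0, period=100.0, value=1.0),
        QoSLevel(deadline=50.0, est_exec_time=10.0, period=50.0, value=3.0),
    ],
    miss_penalty=-1.0,
)
player.validate()
```

In this example the higher QoS level keeps the same execution time but halves the period, which doubles the estimated utilization from 0.1 to 0.2 in exchange for a higher value.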

4.1.2. Control Related Variables

An important step in designing the FC-RTS architecture is to identify the following

variables of a real-time system in terms of control theory.

Controlled variables are the performance metrics controlled by the scheduler in

order to achieve desired system performance. Controlled variables of a real-time

system may include the deadline miss ratio M(k) and the CPU utilization U(k)

(also called miss ratio and utilization, respectively), both defined over a time

window ((k-1)W, kW), where W is the sampling period and k is called the

sampling instant.

The miss ratio M(k) at the kth sampling instant is defined as the number

of deadline misses divided by the total number of completed and

aborted task instances in a sampling window ((k-1)W, kW). Miss ratio

is usually the most important performance metric in a real-time

system.

The utilization U(k) at the kth sampling instant is the percentage of

CPU busy time in a sampling window ((k-1)W, kW). CPU utilization is

regarded as a controlled variable for real-time systems due to cost and

throughput considerations. CPU utilization is also important because of

its direct linkage with the deadline miss ratio (see Section 4.3).

Another controlled variable might be the total value V(k) delivered by

the system in the kth sampling period. In the remainder of this paper,

we do not directly use the total value as a controlled variable, but

rather address the value imparted by tasks via the QoS Actuator (see

and Section 4.5.1)

Performance references represent the desired system performance in terms of the

controlled variables, i.e., the desired miss ratio MS and/or the desired CPU

utilization US. For example, a particular system may require deadline miss ratio

MS = 0 and CPU utilization US = 90%. The difference between a performance

reference and the current value of the corresponding controlled variable is called

an error, i.e., the miss ratio error EM = MS - M(k) and the utilization error

EU = US - U(k).

Manipulated variables are system attributes that can be dynamically changed by

the scheduler to affect the values of the controlled variables. In our architecture,

the manipulated variable is the total estimated utilization B(k) = Σi Bi[li(k)] of all tasks in the system, where Ti is a task with a QoS level of li(k) in the kth sampling

window. The rationale for choosing the total estimated utilization as a manipulated

variable is that most real-time scheduling policies (such as EDF and

Rate/Deadline Monotonic) can guarantee no deadline misses when the system is

not overloaded, and in normal situations, the miss ratio increases as the system

load increases. The other controlled variable, the utilization U(k), also usually

increases as the total estimated utilization increases. However, the utilization is

often different from the total estimated utilization B(k). This is due to the

estimation error of execution times when workload is unpredictable and time

varying. Another difference between U(k) and B(k) is that U(k) can never exceed

100% while B(k) does not have this boundary.
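
The definitions of the controlled variables, the errors, and the manipulated variable can be written out as follows. This is a minimal sketch under stated assumptions: the function names and argument layout are hypothetical, utilization is returned as a fraction in [0, 1] rather than a percentage, and the miss ratio of a window in which no task instance completed or was aborted is taken to be 0, a case the definitions above do not address.

```python
from typing import List, Sequence, Tuple


def miss_ratio(misses: int, completed: int, aborted: int) -> float:
    """M(k): deadline misses over completed plus aborted instances in one window."""
    total = completed + aborted
    if total == 0:
        return 0.0  # assumption: an empty window counts as no misses
    return misses / total


def utilization(busy_time: float, window: float) -> float:
    """U(k): fraction of the sampling window W during which the CPU was busy."""
    return busy_time / window


def errors(m_ref: float, u_ref: float, m: float, u: float) -> Tuple[float, float]:
    """(EM, EU): reference minus measured value."""
    return m_ref - m, u_ref - u


def total_estimated_utilization(
    est_util: Sequence[Sequence[float]], levels: List[int]
) -> float:
    """B(k): sum over tasks i of Bi[li(k)], where est_util[i][j] holds Bi[j]."""
    return sum(est_util[i][level] for i, level in enumerate(levels))
```

For instance, a window with 18 completed and 2 aborted instances, 2 of which missed their deadlines, has a miss ratio of 0.1. Note that total_estimated_utilization can exceed 1.0, whereas utilization cannot, matching the distinction between B(k) and U(k) drawn above.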

4.1.3. Feedback Control Loop

The FC-RTS architecture features a feedback control loop that is invoked at every

sampling instant k. It is composed of a Monitor, a Controller, and a QoS Actuator (Figure

4.1).

1) The Monitor measures the controlled variables (M(k) and/or U(k)) and feeds the

samples back to the Controller.

2) The Controller compares the performance references with corresponding controlled

variables to get the current errors, and computes a change DB(k) (called the control

input) to the total estimated requested utilization, i.e., B(k+1) = B(k) + DB(k), based

on the errors. The Controller uses a control function to compute the correct control

input to compensate for the load variations and keep the controlled variables close to

the references. The detailed design of the Controller is presented in Section 4.4.

3) The QoS Actuator calls a QoS optimization algorithm (see Section 4.5.1) to maximize

the system value by dynamically adjusting tasks' QoS levels under the utilization

constraint computed by the Controller, B(k+1) = B(k) + DB(k). In the simplest form,

each task has only two QoS levels and the QoS Actuator is essentially an

admission controller.
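
One invocation of the loop described in steps 1) to 3) can be sketched as below for the simplest case, where every task has two QoS levels and the QoS Actuator acts as an admission controller. Two parts are assumptions made for illustration: the control function is a plain proportional law on the utilization error with a hypothetical gain kp (the actual controller design is the subject of Section 4.4), and the actuator uses a greedy highest-value-density-first heuristic in place of the QoS optimization algorithm of Section 4.5.1. All function names are hypothetical.

```python
from typing import List, Tuple

# A candidate task in admit/reject form: (name, estimated utilization, value).
Candidate = Tuple[str, float, float]


def controller(u_ref: float, u_measured: float, kp: float) -> float:
    """Return the control input DB(k) from the utilization error (assumed P law)."""
    return kp * (u_ref - u_measured)


def qos_actuator(tasks: List[Candidate], budget: float) -> List[str]:
    """Admit tasks greedily by value density until the utilization budget is used.

    Assumes every estimated utilization is strictly positive.
    """
    admitted: List[str] = []
    used = 0.0
    by_density = sorted(tasks, key=lambda t: t[2] / t[1], reverse=True)
    for name, est_util, _value in by_density:
        if used + est_util <= budget:
            admitted.append(name)
            used += est_util
    return admitted


def control_step(
    b_current: float, u_ref: float, u_measured: float, kp: float,
    tasks: List[Candidate],
) -> Tuple[float, List[str]]:
    """One sampling instant: compute B(k+1) = B(k) + DB(k), then re-admit tasks."""
    b_next = b_current + controller(u_ref, u_measured, kp)
    return b_next, qos_actuator(tasks, b_next)
```

With B(k) = 1.0, a reference of 0.9, a measured utilization of 1.0, and kp = 0.5, the budget drops to 0.95; of three candidates with estimated utilizations 0.4, 0.3, and 0.3, the two with the highest value density fit and the third is rejected.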

In addition to the above feedback control loop, our FC-RTS architecture also includes

arriving-time QoS control, i.e., in addition to being called periodically by the Controller,

the QoS Actuator is also invoked upon the arrival of each task. The arriving-time QoS

control isolates disturbances caused by new task arrivals (see Section 4.3). Feedback

control scheduling in systems without arriving-time QoS control was previously studied

in [50].

4.1.4. Basic Scheduler

The FC-RTS architecture has a Basic Scheduler that schedules admitted tasks with a

scheduling policy (e.g., EDF or Rate/Deadline Monotonic). The properties of the

scheduling policy can have significant impact on the design of the feedback control loop.

Our FC-RTS architecture permits plugging in different real-time scheduling policies for

this Basic Scheduler and then designing the entire feedback control scheduling system

around this choice (see Section 4.4.4).

A key difference between our work and many previous works is that while previous

work often assumes the CPU utilization of each task is known a priori, we focus on

systems in unpredictable environments where tasks' actual CPU utilizations are unknown

and time varying. This more challenging problem necessitates the feedback control loop

to dynamically correct the scheduling errors at run-time. Our FC-RTS architecture

establishes a mapping from real-time scheduling to a typical structure of feedback control

systems. This step enables us to treat a real-time system as a feedback control system and

utilize feedback control theory to design the system rather than developing ad hoc

algorithms.

4.2. Performance Specifications and Metrics

We now specialize the second element of the FCS framework, the performance

specifications, to real-time CPU scheduling. The performance specifications consist of a

set of performance profiles in terms of utilization U(k) and miss ratio M(k), and a set of

load profiles in terms of the total requested CPU utilization of a system.

4.2.1. Performance Profile

The performance profile characterizes important transient and steady state performance

of a real-time system. M(k) and U(k) characterize the system performance in the sampling

window ((k-1)W, kW). In contrast, traditional metrics for real-time systems such as

average miss-ratio and average utilization are defined based on a much larger time

window than the sampling period W. Average metrics are often inadequate for

characterizing the dynamics of the system performance in response to overload

conditions [50]. The performance profile of a real-time system includes the following.

Stability: A real-time system is stable if its miss ratio M(k) and utilization U(k) are

always bounded for bounded references. Although both miss ratio M(k) and

utilization U(k) are naturally bounded in the range [0, 1], stability is a necessary

condition to prevent the controlled variables from severe deviations from the

reference values.

Transient-state response represents the real-time system's responsiveness and

efficiency of QoS adaptation in reacting to changes in run-time conditions.

Overshoot Mo and Uo: For a real-time system, we define overshoot as the

maximum amount that the system overshoots its miss ratio or utilization

reference divided by its miss ratio or utilization reference, i.e., Mo = (Mmax -

MS) / MS, Uo = (Umax - US) / US, respectively. The maximum miss ratio

Mmax and utilization Umax in the transient state are called the absolute overshoot.

Overshoot is important to a real-time system because a high transient

miss-ratio or utilization can cause system failure in many systems such as

robots and media streaming [19].

Settling time Ts: The time it takes the system to enter a steady state in

response to a load profile. The settling time represents how fast the system

can settle down to steady state with desired miss ratio and/or utilization.

Steady-state error ESM and ESU: The difference between the average value of

miss ratio M(k) and/or utilization U(k) in steady state and the corresponding

reference. The steady-state error characterizes how precisely the system can enforce

the desired miss ratio and/or utilization in steady state.

Sensitivity Sp: Relative change of a controlled variable in steady state with respect

to the relative change of a system parameter p. For example, sensitivity of miss

ratio with respect to the task execution time SAE represents how significantly the

change in the task execution time affects the system miss-ratio. Sensitivity

describes the robustness of the system with regard to workload or system

variations.
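
The transient and steady-state metrics above can be computed from a recorded trace of samples, as in the following sketch. Several details are assumptions rather than definitions from this section: steady state is taken to begin at the first sample after which the trace stays within an absolute tolerance band tol around the reference, the steady-state error is signed as average minus reference, and the relative overshoot is reported as 0 when the trace never exceeds the reference. The relative overshoot requires a nonzero reference; for a reference of 0 (e.g., MS = 0) the absolute overshoot max(trace) applies instead. Function names are hypothetical.

```python
from typing import Optional, Sequence


def overshoot(trace: Sequence[float], ref: float) -> float:
    """Relative overshoot (max - ref) / ref; requires ref != 0."""
    return max(0.0, (max(trace) - ref) / ref)


def settling_index(trace: Sequence[float], ref: float, tol: float) -> Optional[int]:
    """First sample index from which all samples stay within tol of ref."""
    for k in range(len(trace)):
        if all(abs(x - ref) <= tol for x in trace[k:]):
            return k
    return None  # the trace never settles within the band


def settling_time(
    trace: Sequence[float], ref: float, window: float, tol: float = 0.02
) -> Optional[float]:
    """Settling time Ts in time units, given the sampling period W (window)."""
    k = settling_index(trace, ref, tol)
    return None if k is None else k * window


def steady_state_error(trace: Sequence[float], ref: float, start: int) -> float:
    """Average of the samples from index start onward, minus the reference."""
    steady = trace[start:]
    return sum(steady) / len(steady) - ref
```

For a utilization trace [0.5, 0.8, 1.0, 0.95, 0.91, 0.9, 0.9, 0.9] with reference 0.9 and W = 0.5, the relative overshoot is about 0.11, the trace settles at index 4 (Ts = 2.0), and the steady-state error from that index onward is 0.0025.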

4.2.2. Load Profile

Fo
