
Feedback Control Real-Time Scheduling

A Dissertation

Presented to the Faculty of the School of Engineering and Applied Science

University of Virginia

In Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

Computer Science

by

Chenyang Lu

May 2001


© Copyright by

Chenyang Lu

All Rights Reserved

May 2001


Approvals

This dissertation is submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Computer Science

__________________________________

Chenyang Lu

Approved:

__________________________________

John A. Stankovic (Advisor)

__________________________________

Sang H. Son (Chair)

__________________________________

Tarek F. Abdelzaher

__________________________________

Marty Humphrey

__________________________________

Jörg Liebeherr

__________________________________

Gang Tao (Minor Representative)

Accepted by the School of Engineering and Applied Science:

__________________________________

Richard W. Miksad (Dean)

May 2001


Abstract

We develop Feedback Control real-time Scheduling (FCS) as a unified framework to

provide Quality of Service (QoS) guarantees in unpredictable environments (such as e-

business servers on the Internet). FCS includes four major components. First, novel

scheduling architectures provide performance control to a new category of QoS critical

systems that cannot be addressed by traditional open loop scheduling paradigms. Second,

we derive dynamic models for computing systems for the purpose of performance

control. These models provide a theoretical foundation for adaptive performance control.

Third, we apply established control methodology to design scheduling algorithms with

proven performance guarantees, which is in contrast with existing heuristics-based

solutions relying on laborious design/tuning/testing iterations. Fourth, a set of control-

based performance specifications characterizes the efficiency, accuracy, and robustness

of QoS guarantees.

The generality and strength of FCS are demonstrated by its instantiations in three

important applications with significantly different characteristics. First, we develop

real-time CPU scheduling algorithms that guarantee low deadline miss ratios in systems

where task execution times may deviate from estimations at run-time. We solve the

saturation problems of real-time CPU scheduling systems with a novel integrated control

structure. Second, we develop an adaptive web server architecture to provide relative and

absolute delay guarantees to different service classes with unpredictable workloads. The

adaptive architecture has been implemented by modifying an Apache web server.

Evaluation experiments on a testbed of networked Linux PCs demonstrate that our server

provides robust relative/absolute delay guarantees despite instantaneous changes in the

user population. Third, we develop a data migration executor for networked storage

systems that migrates data on-line while guaranteeing the specified I/O throughput of

concurrent applications.


Acknowledgements

First, thanks to my advisor, Jack Stankovic, for being a great mentor to me both

personally and professionally. His encouragement, support, and advice are greatly

appreciated. My thanks go to Tarek Abdelzaher, Sang Son, and Gang Tao for sharing

their ideas and insights on research.

Thanks to Guillermo Alvarez, John Wilkes, Michael Hobbs, Ralph Becker-Szendy,

Simon Towers, and all other members of the storage systems program at HP Labs for

offering me a great research environment and their collaborations during my internship at

HP Labs.

Thanks to Jörg Liebeherr and Marty Humphrey for serving on my dissertation

committee and their valuable suggestions on my dissertation.

Thanks to Jörgen Hansson, Victor Lee, Michael Marley, John Regehr, and all other

members of the real-time systems group for interesting and stimulating discussions.

Thanks to all of my friends for providing invaluable moral support. I want to

especially thank Hainan Lin for helping me through the years at Charlottesville.

Last but not least, I want to thank my parents and my wife for their understanding

and support of my research endeavors and accompanying me through all the happy and

sad days.


Table of Contents

1. Introduction
1.1. Motivation
1.2. Contributions
2. Related Work
2.1. Classical Real-Time Scheduling
2.2. Real-Time Scheduling for Embedded Digital Control Systems
2.3. QoS Adaptation
2.4. Service Delay Guarantee in Web Servers
2.5. Data Migration in Storage Systems
3. Feedback Control Real-Time Scheduling Framework
3.1. Feedback Control Scheduling Architecture
3.1.1. Control Related Variables
3.1.2. Feedback Control Loop
3.2. Performance Specifications and Metrics
3.2.1. Performance Profile
3.2.2. Load Profile
3.3. Control Theory Based Design Methodology
4. Real-Time CPU Scheduling
4.1. Feedback Control Real-Time Scheduling Architecture
4.1.1. Task Model
4.1.2. Control Related Variables
4.1.3. Feedback Control Loop
4.1.4. Basic Scheduler
4.2. Performance Specifications and Metrics
4.2.1. Performance Profile
4.2.2. Load Profile
4.3. Modeling the Controlled Real-Time System
4.4. Design of FC-RTS Algorithms
4.4.1. Design of the Controller
4.4.2. Closed-Loop System Model
4.4.3. Control Tuning and Analysis
4.4.4. FC-RTS Algorithms
4.5. Experiments
4.5.1. FECSIM Real-Time System Simulator
4.5.2. Scheduling Policy of the Basic Scheduler
4.5.3. Workload
4.5.4. QoS Actuator
4.5.5. Profiling the Controlled Real-Time Systems
4.5.6. Controller Parameters
4.5.7. Performance References
4.5.8. Evaluation Experiment A: Arrival Overload
4.5.9. Evaluation Experiment B: Arrival/Internal Overload
4.6. Comparison of Real-Time Scheduling Algorithms in Overload
4.7. Summary
5. Web Server with Delay Guarantees
5.1. Introduction
5.2. Background
5.3. Semantics of Service Delay Guarantees
5.4. A Feedback Control Architecture for Web Server QoS
5.4.1. Connection Scheduler
5.4.2. Server Processes
5.4.3. Monitor
5.4.4. Controllers
5.5. Design of the Controller
5.5.1. Performance Specifications
5.5.2. Modeling the Web Server: A System Identification Approach
5.5.3. Root-Locus Design
5.6. Implementation
5.7. Experimentation
5.7.1. Comparing Connection Delays and Response Times
5.7.2. System Identification
5.7.3. Evaluation of the Adaptive Web Server
5.8. Summary
6. Online Data Migration in Storage Systems
6.1. Introduction and Motivations
6.2. Aqueduct: a Feedback Control Architecture for Online Data Migration
6.2.1. Migration Planner
6.2.2. LV Mover
6.2.3. QoS guarantees
6.2.4. The Feedback Control Loop
6.2.5. The Monitor
6.2.6. The Controller
6.2.7. The Actuator
6.3. Design and Analysis of the Controller
6.3.1. The Dynamic Model
6.3.2. Controller Tuning and Analysis
6.4. Implementation
6.5. Experiments
6.5.1. Experiment Configurations
6.5.2. Migration Penalty
6.5.3. System Profiling
6.5.4. Performance Evaluation
6.6. Conclusion and Future Work
7. General Issues
7.1. Granularity of Performance Control
7.2. Sampling Period and Overhead
7.3. Robustness of Linear Models and PI Control
8. Conclusions and Future Work
Reference


List of Figures

Figure 3.1 The FCS Architecture
Figure 3.2 Control Theory based Design Methodology for FCS Algorithms
Figure 4.1 Feedback Control Real-Time Scheduling Architecture
Figure 4.2 The Model of the Controlled System
Figure 4.3 Closed-Loop System Model for Real-Time CPU Scheduling
Figure 4.4 System Response to Reference Input
Figure 4.5 System Response to Disturbance Input
Figure 4.6 Settling Time vs. Process Gain
Figure 4.7 The FC-UM Algorithm
Figure 4.8 The FECSIM Simulator
Figure 4.9 Controlled Variables vs. Total Requested Utilization
Figure 4.10 Response to Arrival Overload SL(0, 150%) (DM/PA)
Figure 4.11 Response to Arrival Overload SL(0, 150%) (EDF/P)
Figure 4.12 Execution Time Factor Ga′ in Experiment B
Figure 4.13 Response to Arrival/Internal Overload (DM/PA)
Figure 4.14 Response to Arrival/Internal Overload (EDF/P)
Figure 4.15 Average Performance of FC-RTS algorithms and the Baseline
Figure 5.1 The Feedback-Control Architecture for Delay Guarantees
Figure 5.2 Architecture for system identification
Figure 5.3 The Root Locus of the web server model
Figure 5.4 Connection delay and response time
Figure 5.5 System identification results for Relative Delay
Figure 5.6 System Identification Results for Absolute Delay
Figure 5.7 Evaluation Results of Relative Delay Guarantees between Two Classes
Figure 5.8 Evaluation Results of Relative Delay Guarantees for Three Classes
Figure 5.9 Evaluation of Absolute Delay Guarantees
Figure 6.1 Aqueduct: The Feedback Control Architecture for Data Migration
Figure 6.2 Step Response of Aqueduct
Figure 6.3 Device iops during data migration
Figure 6.4 Migration Penalty in Experiment 1
Figure 6.5 Migration Penalty in Experiment 2
Figure 6.6 Relationship between migration speed and migration speed
Figure 6.7 Device iops and control input of Aqueduct
Figure 6.8 Average iops of AFAP and Aqueduct, and Aqueduct in steady state
Figure 6.9 QoS Violation Ratio of AFAP, Aqueduct, and Aqueduct in Steady State
Figure 6.10 QoS violation ratio using 0.98IS
Figure 6.11 Worst QoS Violations of AFAP, Aqueduct, Aqueduct in steady state
Figure 6.12 Execution Time of Migration Plan


List of Tables

Table 4.1 Testing Configurations
Table 4.2 Controller Parameters of FC-RTS Algorithms
Table 4.3 Performance References of FC-RTS Algorithms
Table 4.4 The Performance Profiles of FC-U in Experiment B
Table 4.5 The Performance Profiles of FC-M in Experiment B
Table 4.6 The Performance Profiles of FC-UM in Experiment B
Table 4.7 Comparison of Real-Time Scheduling Paradigms in Overload Conditions
Table 5.1 Variables and Parameters of the Absolute Delay Controller CAk
Table 5.2 Variables and Parameters of the Relative Delay Controller CRk


List of Symbols

C(k) a controlled variable

CS a performance reference

U(k) a manipulated variable

TS the settling time

CO the overshoot

ESC the steady-state error

SP the sensitivity with regard to a system parameter P

SL(Ln, Lm) the step load that increases instantaneously from Ln to Lm

RL(Ln, Lm, TR) the ramp load that increases linearly from Ln to Lm within TR sec

Di[j] the relative deadline of task i at QoS level j

EEi[j] the estimated execution time of task i at QoS level j

AEi[j] the actual execution time of task i at QoS level j

Vi[j] the value of task i at QoS level j

Pi[j] the invocation period of periodic task i at QoS level j

EIi[j] the estimated inter-arrival-time of aperiodic task i at QoS level j

AIi[j] the average inter-arrival-time of aperiodic task i at QoS level j

Bi[j] the estimated CPU utilization of task i at QoS level j

Ai[j] the actual CPU utilization of task i at QoS level j

Ga(k) the utilization ratio in the kth sampling period

GA the worst-case utilization ratio

Gm(k) the miss ratio factor in the kth sampling period

GM the worst-case miss ratio factor

Ath(k) the schedulable utilization threshold GA in the kth sampling period

Wk the absolute or relative connection delay guarantee of service class k

Ck(m) the connection delay of class k in the mth sampling period

Bk(m) the process budget of class k in the mth sampling period

Rm(k) the inter-submove-time in the kth sampling period

Ii(k) the number of I/O per sec of device i in the kth sampling period


List of Abbreviations

FCS Feedback Control real-time Scheduling

RM Rate Monotonic scheduling policy

EDF Earliest Deadline First scheduling policy

DM Deadline Monotonic scheduling policy


Chapter 1

Introduction

1.1. Motivation

Real-time scheduling algorithms fall into two categories: static and dynamic scheduling.

In static scheduling, the scheduling algorithm has complete knowledge of the task set and

its constraints, such as deadlines, computation times, precedence constraints, and future

release times. The Rate Monotonic (RM) algorithm and its extensions [40][48] are static

scheduling algorithms and represent one major paradigm for real-time scheduling. In

dynamic scheduling, however, the scheduling algorithm does not have complete

knowledge of the task set or its timing constraints. For example, new task activations, not

known to the algorithm when it is scheduling the current task set, may arrive at a future

unknown time. Dynamic scheduling can be further divided into two categories:

scheduling algorithms that work in resource sufficient environments and those that work

in resource insufficient environments. Resource sufficient environments are systems

where the system resources are sufficient to a priori guarantee that, even though tasks

arrive dynamically, at any given time all the tasks are schedulable. Under certain


conditions, Earliest Deadline First (EDF) [48][71] is an optimal dynamic scheduling

algorithm in resource sufficient environments. EDF is a second major paradigm for real-

time scheduling. While real-time system designers try to design the system with

sufficient resources, because of cost and unpredictable environments, it is sometimes

impossible to guarantee that the system resources are sufficient. In this case, EDF’s

performance degrades rapidly in overload situations. The Spring scheduling algorithm

[79] can dynamically guarantee incoming tasks via on-line admission control and

planning and thus is applicable in resource insufficient environments. Many other

algorithms [71] have also been developed to operate in this way. These admission-

control-based algorithms represent the third major paradigm for real-time scheduling.

However, despite the significant body of results in these three paradigms of real-time

scheduling, many real world problems are not easily supported. While algorithms such as

EDF, RM and the Spring scheduling algorithm can support sophisticated task set

characteristics (such as deadlines, precedence constraints, shared resources, jitter, etc.),

they are all "open loop" scheduling algorithms. Open loop refers to the fact that once

schedules are created they are not "adjusted" based on continuous feedback. While open-

loop scheduling algorithms can perform well in predictable environments in which the

workloads can be accurately modeled (e.g., traditional process control systems), they can

perform poorly in unpredictable environments, i.e., systems whose workloads cannot be

accurately modeled. For example, the Spring scheduling algorithm assumes complete

knowledge of the task set except for their future release times. Systems with open-loop

schedulers such as the Spring scheduling algorithm are usually designed based on worst-

case workload parameters. When accurate system workload models are not available,


such an approach can result in a highly underutilized system based on extremely

pessimistic estimation of workload.

In recent years, a new category of soft real-time applications executing in open and

unpredictable environments is rapidly growing [69]. Examples include open systems on

the Internet such as online trading and e-business servers, and data-driven systems such

as smart spaces, agile manufacturing, and many defense applications such as C4I. For

example, in an e-business server, neither the resource requirements nor the arrival rate of

service requests are known a priori. However, performance guarantees are required in

these applications. Failure to meet performance guarantees may result in loss of

customers, financial damage, liability violations, or even mission failures. For these

applications, a system design based on open loop scheduling and estimation of worst-case

resource requirements can result in an extremely expensive and underutilized system.

As a cost-effective approach to achieve performance guarantees in unpredictable

environments, several adaptive scheduling algorithms have been recently developed (e.g.,

[5][8][9][24][44][46][55]). While early research on real-time scheduling was concerned

with guaranteeing complete avoidance of undesirable effects such as overload and

deadline misses, adaptive real-time systems are designed to handle such effects

dynamically. There remain many open research questions in adaptive real-time

scheduling. In particular, how can a system designer specify the performance requirement

of an adaptive real-time system? And how can the designer systematically design a scheduling

algorithm to satisfy its performance specifications? The design methodology for

automatic adaptive systems has been developed in feedback control theory [32][34].

However, feedback control theory has been mostly applied in mechanical and electrical


systems. The modeling, analysis and implementation of adaptive real-time systems lead

to significant research challenges.

Recently, several works applied control theory to computing systems. For example,

several papers [4][13][22][23][28][58][63][66][73][75] presented flexible or adaptive

real-time (CPU) scheduling techniques to improve digital control system performance.

These techniques are tailored to the specific characteristics of digital control systems

instead of general adaptive real-time computing systems. Several other papers [6][19]

[44][63][64][74] presented adaptive CPU scheduling algorithms or QoS management

architectures for computing systems such as multimedia and communication systems.

Transient and steady state performance of adaptive real-time systems has received special

attention in recent years. For example, Brandt et al. [19] evaluated a dynamic QoS

manager by measuring the transient performance of applications in response to QoS

adaptations. Rosu et al. [64] proposed a set of performance metrics to capture the

transient responsiveness of adaptations and its impact on applications. The paper

proposed metrics that are similar to the settling time and steady-state error metrics found in

control theory.

However, to the best of our knowledge, no unified framework exists to date for designing an

adaptive system from performance specifications of desired dynamic response. In this

thesis, we establish feedback control real-time scheduling (FCS) [53], a unified

framework of adaptive real-time systems based on feedback control theory. Our control

theoretical framework includes the following elements:

• Feedback control scheduling architectures that map the feedback control structure

to adaptive resource scheduling in real-time systems [52],

• A set of performance specifications and metrics to characterize transient and

steady state performance of adaptive real-time systems [51], and

• A control theory based design methodology for resource scheduling algorithms to

satisfy their performance specifications [50][53].

In contrast with ad hoc approaches that rely on laborious design/tuning/testing

iterations, our framework enables system designers to systematically design adaptive

real-time systems with established analytical methods to achieve desired performance

guarantees in unpredictable environments.

1.2. Contributions

Specifically, the main contributions of this thesis work are as follows:

A control-theoretical foundation for adaptive real-time systems: We apply

control theory to provide a theoretical foundation for adaptive real-time

scheduling. In contrast with some existing scheduling algorithms that utilize

feedback control in an ad hoc manner, we provide theoretical understanding of

feedback control scheduling and develop a systematic design methodology for

adaptive real-time systems with analytically proven performance guarantees in

unpredictable environments.

Design methodology for real-time systems in unpredictable environments:

While traditional design methods for real-time systems depend on workload

parameters known a priori (e.g., worst-case execution times, worst-case arrival

rates, and blocking factors due to resource contentions), our control theory based

design methodology provides robust performance guarantees when accurate

characterizations of the workloads are not available. This feature makes our

design framework especially valuable for performance critical systems in

unpredictable environments, e.g., open systems on the Internet such as online

trading and e-business servers, and data-driven systems such as smart spaces, agile

manufacturing, and many defense applications.

Software architecture for feedback performance control: We develop a

general software architecture for adaptive performance control in unpredictable

environments. Our architecture facilitates control theory based design and

analysis of an adaptive real-time system by mapping it to the structure of

feedback control systems. This architecture includes a set of control-related

variables (performance references, controlled variables and manipulated

variables), and software components such as monitor, actuator, and controller.

Our architecture has been implemented as three instances tailored to the specific

characteristics and performance requirements of different applications including

real-time CPU scheduling, a web server, and data migration in networked storage

systems. These successful instantiations demonstrate the general applicability of

our architecture in software systems in unpredictable environments.
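
As a rough illustration of how these components fit together, the following Python sketch wires a monitor, a controller, and an actuator into one loop around a toy system. Everything in it is an assumption made for illustration rather than part of the architecture itself: the `SimulatedSystem` class, the linear relation between requested and measured utilization, the `gain` value, and the proportional gain `kp` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SimulatedSystem:
    """Toy controlled system (hypothetical): measured = gain * requested."""
    gain: float             # assumed ratio of actual to estimated resource demand
    requested: float = 0.0  # manipulated variable (total requested utilization)

    def monitor(self) -> float:
        """Sample the controlled variable; utilization saturates at 100%."""
        return min(1.0, self.gain * self.requested)

    def actuate(self, delta: float) -> None:
        """Apply the controller output by changing the requested utilization."""
        self.requested = max(0.0, self.requested + delta)


def proportional_controller(reference: float, measured: float, kp: float) -> float:
    """Map the error (reference minus measurement) to a change in the manipulated variable."""
    return kp * (reference - measured)


def run_loop(system: SimulatedSystem, reference: float, kp: float, periods: int) -> list:
    """Run the monitor -> controller -> actuator cycle once per sampling period."""
    trace = []
    for _ in range(periods):
        measured = system.monitor()
        trace.append(measured)
        system.actuate(proportional_controller(reference, measured, kp))
    return trace


# The controller is never told the gain; it only sees the measured utilization.
trace = run_loop(SimulatedSystem(gain=2.0), reference=0.9, kp=0.25, periods=40)
print(f"first sample: {trace[0]:.3f}, last sample: {trace[-1]:.3f}")
```

With these made-up numbers the measured utilization climbs toward the 0.9 reference from below, even though the loop has no direct knowledge of the gain.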

Performance specifications and guarantees: While hard real-time systems

require absolute guarantees, such guarantees are infeasible and unnecessary for

many soft real-time systems in unpredictable environments. We adopt a set of

performance metrics and specifications from control theory to characterize the

transient and steady state performance of adaptive real-time systems. Transient

state performance (including settling time and overshoot) of an adaptive system

represents the responsiveness and efficiency of adaptation in response to

environmental variations, and steady-state performance (including stability,

steady state error, and sensitivity) describes a system's long-term performance. In

contrast, traditional metrics such as average miss-ratio cannot capture the

transient behavior of the system in response to load variations.
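
To make these metrics concrete, the sketch below computes a settling time, an overshoot, and a steady-state error from a sampled trace of a controlled variable. The function name `performance_profile`, the absolute `tolerance` band used to decide when the trace has settled, and the sample trace are assumptions chosen for illustration; they are simplified stand-ins, not the formal definitions used later in this thesis. The trace is assumed to be non-empty.

```python
def performance_profile(trace, reference, sampling_period=1.0, tolerance=0.02):
    """Summarize a non-empty trace of a controlled variable against its reference.

    settling_time: time of the first sample after which every sample stays
        within +/- tolerance of the reference (None if the trace never settles).
    overshoot: largest excursion above the reference (0.0 if there is none).
    steady_state_error: mean of the settled samples minus the reference.
    """
    # Walk backwards to find the start of the final run of in-band samples.
    settle_index = len(trace)
    for k in range(len(trace) - 1, -1, -1):
        if abs(trace[k] - reference) > tolerance:
            break
        settle_index = k

    settled = trace[settle_index:]
    return {
        "settling_time": settle_index * sampling_period if settled else None,
        "overshoot": max(0.0, max(trace) - reference),
        "steady_state_error": (sum(settled) / len(settled) - reference) if settled else None,
    }


# Hypothetical trace sampled every 0.5 time units with a reference of 1.0.
sample_trace = [0.0, 0.6, 1.2, 1.1, 1.01, 1.0, 0.99, 1.0]
profile = performance_profile(sample_trace, reference=1.0, sampling_period=0.5)
print(profile)
```

An average over the whole trace would hide the early excursion to 1.2, whereas the overshoot and settling time report it explicitly.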

Modeling real-time computing systems: Unlike traditional control systems such

as electrical and mechanical systems, real-time computing systems do not have

readily available differential/difference equations that can be used in control

analysis. In this thesis work, we apply an analytical approach and system

identification techniques to the modeling of three computing systems: a generic

CPU-bound real-time system, a modified Apache web server, and a networked

storage system. In the analytical approach, a system designer describes a system

directly with mathematical equations based on the knowledge of the system

dynamics. When such knowledge is not available (as in the case of the Apache

web server), we use system identification [11] to estimate the system model based

on system input/output from profiling experiments. This modeling methodology

and established analytical models provide a basis for the application of control

theory to adaptive real-time scheduling.
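
The idea behind system identification can be sketched as follows, assuming a first-order difference model y(k+1) = a·y(k) + b·u(k) whose parameters are estimated by least squares from recorded input/output samples. The model order, the function name `fit_first_order`, and the synthetic data are assumptions for illustration only; they are not the model or the data of the web server studied in this thesis.

```python
def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y[k+1] = a*y[k] + b*u[k].

    u holds the inputs u[0..n-1]; y holds the outputs y[0..n].
    """
    n = len(y) - 1
    if n < 2 or len(u) < n:
        raise ValueError("need at least two transitions and one input per transition")

    # Sums that make up the 2x2 normal equations.
    syy = sum(y[k] * y[k] for k in range(n))
    suu = sum(u[k] * u[k] for k in range(n))
    syu = sum(y[k] * u[k] for k in range(n))
    sy1y = sum(y[k + 1] * y[k] for k in range(n))
    sy1u = sum(y[k + 1] * u[k] for k in range(n))

    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("input does not excite the system enough to identify it")

    # Solve the normal equations with Cramer's rule.
    a = (sy1y * suu - syu * sy1u) / det
    b = (syy * sy1u - syu * sy1y) / det
    return a, b


# Synthetic profiling data from a made-up system with a = 0.8 and b = 0.5,
# driven by a square-wave input so that the response is informative.
u_data = [1.0 if (k // 5) % 2 == 0 else -1.0 for k in range(50)]
y_data = [0.0]
for k in range(50):
    y_data.append(0.8 * y_data[-1] + 0.5 * u_data[k])

a_hat, b_hat = fit_first_order(u_data, y_data)
print(f"estimated a = {a_hat:.4f}, b = {b_hat:.4f}")
```

Because the synthetic data is noise-free, the estimates match the generating parameters up to rounding; measurements from a real system would contain noise, and the fit would only approximate the underlying dynamics.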


Handling non-linearities of real-time systems: The control design of an

adaptive resource scheduler is non-trivial due to the non-linearities and unknown

or random factors in many real-time computing systems. We solved these

problems with model linearization techniques and novel control structures based

on the particular characteristics of real-time systems. Our work demonstrates that

robust performance control can be achieved despite the intrinsic non-linearities

and uncertainties of real-time systems.

Practical FCS implementation in three applications: Using our design

framework, we develop practical resource scheduling algorithms that can provide

robust (steady state and transient) performance guarantees in unpredictable

environments, while traditional scheduling algorithms fail to provide such

guarantees. We develop FCS algorithms for three application domains including

real-time CPU scheduling, web servers, and storage systems. These applications

are significantly different in terms of semantics of performance guarantees,

scheduled resources, monitor/actuator mechanisms, and system models. Our

evaluation experiments demonstrate that our FCS algorithms based on the FCS

framework successfully achieved robust performance guarantees in all three

applications. The success in these applications demonstrates that FCS is a unified

framework for adaptive computing systems.

• Real-Time CPU Scheduling: We develop a set of feedback control real-time

scheduling (FCS) algorithms that guarantee low deadline miss ratio

and high CPU utilization by dynamically adjusting task QoS levels and

CPU requirements. Simulation experiments demonstrate that our FCS

algorithms provide robust steady and transient state performance

guarantees in terms of deadline miss ratio even when the task execution

time varies considerably from the estimation and when the system’s

schedulable utilization bound is unknown.

• Connection Scheduling in Web Servers: We develop adaptive connection

scheduling algorithms that provide relative, absolute and hybrid service

delay guarantees for different service classes on web servers under HTTP

1.1. The scheduling algorithms feature feedback control loops that

enforce delay guarantees for classes via dynamic connection scheduling

and server process reallocation. The scheduling algorithms have been

implemented by modifying an Apache web server. Experimental results

demonstrate that our adaptive server provides robust delay guarantees

when web workload varies significantly. Properties of our adaptive web

server also include guaranteed stability, and satisfactory efficiency and

accuracy in achieving desired delay or delay differentiation. Our new real-

time web server will be particularly useful for e-business and e-trading

applications, where a priori QoS guarantees are desirable in the face of bursty

and unpredictable workloads from the Internet.

• On-line Data Migration in Storage Systems: We have extended our work

to a non-real-time application, on-line data migration in storage systems.

On-line data migration is necessary in large-scale storage systems (e.g.,

data centers of e-business and large organizations, and multimedia service

centers such as video-on-demand) for performance optimization,

load balancing, and back-up operations. However, data migration can

cause unacceptable performance degradations in concurrent applications

due to excessive resource contention on the storage system. We develop

an adaptive data migration executor with a feedback control architecture

that guarantees desired I/O throughput for applications by dynamically

regulating the speed of data migration. The migration executor has been

implemented and evaluated at a storage testbed at HP Labs. Our

evaluation experiments demonstrate that our adaptive migration executor

achieved specified I/O throughput of all devices at the cost of slowing

down data migration. Our work on storage systems demonstrates the

generality of our control-theory-based framework in non-real-time

systems.

Technology Impact: Not only have we produced several research papers

[6][50][51][52][53][70], parts of this thesis work have also been transferred to

other university research groups. We have sent our real-time CPU scheduling

simulator FECSIM and the feedback control CPU scheduling algorithms to a

group in Sweden for them to study the algorithms. We have transferred the source

code of our adaptive web server and system identification software to Professor

Lui Sha’s group at UIUC and given them inputs on modeling of web servers. The

project of online data migration in networked storage systems was conducted

when the author was a research intern in the Storage Systems Program at Hewlett

Packard Laboratories (Palo Alto). Hewlett Packard is in the process of applying

the feedback control data migration technique developed in the Aqueduct project

for a patent.

The rest of the thesis is organized as follows. We discuss the state-of-the-art in

Chapter 2. In Chapter 3, we present the general control-theory based design methodology

for adaptive real-time systems. The first case study, feedback control real-time CPU

scheduling, is presented in Chapter 4. The second case study, adaptive connection

scheduling for service delay guarantees in web servers, is presented in Chapter 5. The

third case study, on-line data migration with I/O throughput guarantees on concurrent

applications in storage systems, is presented in Chapter 6. After summarizing several

general issues in Chapter 7, we conclude the thesis in Chapter 8.

Chapter 2

Related Work

A general trend of real-time resource scheduling has evolved from static to dynamic and

adaptive while the target application environments become increasingly unpredictable.

While classical real-time scheduling is concerned with absolute guarantees in highly

predictable environments, more recent research aims at developing more flexible,

adaptive and cost-effective solutions to handle unpredictable environments. This thesis

work establishes a theoretical foundation and unified framework for achieving a new

category of performance guarantees in unpredictable environments with adaptive real-

time resource scheduling. In this chapter, we summarize the work related to this thesis

research. The classical results on real-time scheduling are described in Section 2.1. A

category of flexible and adaptive real-time scheduling algorithms tailored for digital

control systems is summarized in Section 2.2. In Section 2.3, we then describe existing

QoS adaptation techniques and compare them with our FCS framework. Related works

on web server delay guarantees and storage systems are summarized in Sections 2.4 and

2.5, respectively.

2.1. Classical Real-Time Scheduling

Classical real-time scheduling algorithms depend on a priori characterization of

workload and systems to provide performance guarantees in predictable environments

(e.g., embedded process control and avionics). For example, Rate Monotonic (RM)

[40][48] and Earliest Deadline First (EDF) [48][71] require complete knowledge about

the task set such as resource requirements, precedence constraints, resource contention,

and future arrival times. Dynamic real-time systems [71] pioneered by the Spring project

[79] provide guarantees upon new task arrivals with on-line admission control and

planning. Unlike earlier systems based on RM or EDF, the dynamic real-time systems do

not require future task arrival time to be known a priori. However, the on-line admission

control and planning in the above dynamic systems still depend on a priori task set

characterizations including resource requirements, precedence constraints, and resource

contention. While classical algorithms such as EDF, RM and the Spring scheduling

algorithm can support sophisticated task set characteristics, they cannot provide

performance guarantees in systems operating in unpredictable environments where an

accurate workload model is not available. Such systems include Internet servers (e.g., on-

line stock trading and e-business) and data-driven systems (e.g., smart spaces and agile

manufacturing). A key observation that motivated this thesis work is that a fundamental

reason for the inadequacy of classical real-time scheduling in unpredictable environments

lies in their open loop nature. Because they do not adjust schedules based on continuous

performance feedback, open loop schedulers schedule tasks and system resources based on

worst-case workload estimations. When accurate system workload models are not

available, the open loop approach may result in a highly underutilized system based on

extremely pessimistic estimation of workload. In contrast, feedback control real-time

scheduling provides robust performance guarantees in unpredictable environments with a

closed loop approach.

2.2. Real-Time Scheduling for Embedded Digital Control Systems

There have been several results that have applied feedback control theory to the design of

real-time computing systems. For example, several papers [30][58][65][66] presented co-

design methods for real-time scheduling algorithms and embedded digital control

systems. The co-design methods trade off the quality of control performance against its

computation requirements to produce more cost-effective system designs than separate

design of control and scheduling. These approaches are off-line solutions and their on-line

scheduling algorithms are still classical open-loop algorithms such as EDF and RM.

Several other papers presented on-line scheduling algorithms [4][16][22][23][30][73] to

improve the robustness of digital control systems by dynamically relaxing the timing

constraints within the tolerable range of the digital control system in overload conditions.

However, these techniques require a priori knowledge of the tasks such as execution

times. Furthermore, these techniques are tailored to CPU-bound digital controllers and

are not applicable to other computing systems such as e-business servers and on-line

trading where the performance bottleneck may not be the CPU.

2.3. QoS Adaptation

The concept of using performance feedback to adjust the schedule has been incorporated

in general-purpose operating systems in the form of multi-level feedback queue

scheduling [18]. The system adjusts a task’s priority based on whether it consumes a time

slice or is blocked due to I/O. This type of feedback control is based on intuitive solutions

rather than systematic control derivation to achieve performance guarantees.

In recent years, QoS adaptation architectures and algorithms have been developed to

support applications such as communication subsystems [8], multimedia [19][24],

distributed visual tracking [46] and operating systems [55][61][63][69][78]. Some of

these techniques [55][61][63] include optimization algorithms to optimize the value in

QoS adaptation. However, their optimization algorithms assume that the resource

requirement of every QoS level is a priori known. In contrast, our FCS framework

provides performance guarantees even when the resource requirements are unknown or

deviate from the estimations. Several other works [8][21][25][78] developed feedback

based adaptation algorithms that do not depend on completely accurate knowledge about

workloads. However, their feedback loops were based on heuristics and they did not

establish time domain analysis on the efficiency of QoS adaptation in response to run-

time variations. Our FCS framework provides a unified framework to design adaptive

real-time systems with proven transient state performance.

Li and Nahrstedt utilized control theory to develop a feedback control loop to

guarantee desired network packet rate in a distributed visual tracking system [46]. Hollot,

Misra, Towsley, and Gong [36] apply control theory to analyze a congestion control

algorithm on IP routers. While these works also use control theory analysis on

computing systems, they do not address timing constraints and service delays on end

server systems, which are the focus of this thesis.

Transient and steady state performance of QoS adaptation has received special

attention in recent years (e.g., [19][64][75]). For example, Brandt et al. [19] evaluated a

dynamic QoS manager by measuring the transient performance of applications in

response to QoS adaptations. Rosu et al. [64] proposed a set of performance metrics to

capture the transient responsiveness of adaptations and its impact on applications.

However, they did not provide a methodology to design a system from its performance

specifications in terms of the above metrics. Instead they only used the metrics in system

testing. In contrast, by extending and mapping these metrics to the dynamic response of

control systems, our FCS framework provides a control-theory-based methodology to

design a system to analytically satisfy its performance specifications.

2.4. Service Delay Guarantee in Web Servers

Support for different classes of service on the Web (with special emphasis on server

delay differentiation) has been investigated in recent literature. For example, the authors

of [28] proposed and evaluated an architecture in which restrictions are imposed on the

amount of server resources (such as threads or processes), which are available to basic

clients. In [9][10] admission control and scheduling algorithms are used to provide

premium clients with better service. In [17] a server architecture is proposed that

maintains separate service queues for premium and basic clients, thus facilitating their

differential treatment. While the above differentiation approach usually offers better

service to premium clients, it does not provide any guarantees on the service and hence

can be called the best effort differentiation model.

Notably, a feedback control loop was used in [5][6][9] to control the desired CPU

utilization of a web server with adaptive admission control. Their CPU utilization control

can be extended to guarantee the desired absolute delay in web servers under HTTP 1.0

protocol and when CPU is the bottleneck resource. This technique is not applicable to

servers under HTTP 1.1 protocol, which can be handled by our adaptive server described

in Chapter 5. A least squares estimator was used in [1] for automatic profiling of resource

usage parameters of a web server. However, the work did not establish a dynamic

model for the server.

Several other works such as [13][26] developed kernel-level mechanisms to achieve

overload protection and proportional resource allocations in server systems. Their work

did not utilize feedback control, nor did they provide any relative or absolute delay

guarantees. Support for proportional differentiated services in network routers has been

investigated in [26][47]. Their work did not address end systems such as web servers.

2.5. Data Migration in Storage Systems

An old approach to performing backups and data relocations is to do them at night, while

the system is idle. As discussed, this does not help with many current applications such

as e-business that require continuous operation and adaptation to quickly changing

system/workload conditions. The approach of bringing the whole (or parts of the) system

offline is also impractical due to the substantial business costs that it incurs. Online

migration and backup are still in their infancy in the current state of the art. Some

existing tools such as the Veritas Volume Manager [75] can guarantee consistent access

to each piece of data while it’s being migrated. However, we are not aware of any

existing solution that handles concurrent accesses while bounding the impact of

migration on concurrent applications.

Chapter 3

Feedback Control Real-Time Scheduling

Framework

In this chapter, we describe feedback control real-time scheduling (FCS), a unified

framework of adaptive real-time systems based on feedback control theory. The FCS

framework includes the following elements:

A feedback control scheduling architecture that maps adaptive resource

scheduling in real-time systems [52] to feedback control loops,

A set of performance specifications and metrics [51] to characterize transient and

steady state performance of adaptive real-time systems, and

A control theory based design methodology [50][53] for resource scheduling

algorithms to satisfy their performance specifications.

A key feature of the FCS framework is its use of feedback control theory (rather than

ad hoc solutions) as a scientific underpinning. The FCS framework enables system

designers to systematically design adaptive real-time systems with established analytical

methods to achieve analytically provable performance guarantees in unpredictable

environments. To the best of our knowledge, this is the first unified framework that provides a

fundamental theory and analytical design methodology for adaptive real-time systems to

achieve specified performance guarantees in unpredictable environments. In this chapter,

we describe the elements of the general FCS framework at a high level. The specific

technical challenges and solutions are described with its concrete instantiations in three

different application domains: real-time CPU scheduling (Chapter 4), web servers

(Chapter 5), and networked storage systems (Chapter 6).

3.1. Feedback Control Scheduling Architecture

The major components of our FCS architecture are a set of control related variables and a

feedback control loop that maps a feedback control system structure to real-time resource

scheduling.

[Figure 3.1. The FCS Architecture. Blocks: Controller (control function), Actuator, Scheduler, Real-Time System, Monitor. Signals: performance reference, error (reference minus sampled controlled variable), control input, manipulated variable, controlled variable, sample.]

3.1.1. Control Related Variables

A first step in designing the FCS architecture is to decide the following key variables of a

real-time system in terms of control theory.

Controlled variable C(k): the performance metric that characterizes the system

performance defined over a sampling period ((k-1)W, kW), where W is an

application-specific constant called the sampling window. The scheduler controls

the controlled variable in order to achieve the desired performance. The choice of

controlled variables depends on the performance guarantees that need to be

provided to the specific application of a system. For example, if an absolute delay

guarantee is required in an Internet server (e.g., critical stock trading operations in

an on-line trading system), the (absolute) service delays of HTTP requests should

be defined as the controlled variable. On the other hand, if proportional

differentiated service is required in an Internet server (e.g., e-commerce stores

where customers are classified into different service classes depending on their

monthly fees), the relative delays of service classes become the appropriate

controlled variables. For another example, the deadline miss ratio and the CPU

utilization are typical controlled variables for soft real-time systems (e.g.,

multimedia streaming, process control, and robotics) where explicit timing

constraints need to be respected.

Performance reference CS: the desired system performance in terms of a controlled

variable C(k). The performance reference defines a contract established between

the adaptive resource scheduler and the users such that the performance reference

should be enforced. The difference between the performance reference and the

value of the corresponding controlled variable is called the error EC(k) = CS – C(k).

For example, if a system sets its performance reference to a deadline miss ratio of CS =

2%, and the current miss ratio is 10%, the system has an error EC(k) = -8%.

Manipulated variable U(k): a system attribute that is dynamically changed by the

scheduler. The manipulated variable should be effective for performance control,

i.e., changing its value should affect the system’s controlled variable(s). The

choice of manipulated variable should reflect the resource bottleneck of a system.

For example, the total requested utilization should be used as the

manipulated variable if the CPU is the bottleneck resource of a web server, but it should

not be used as the manipulated variable if the CPU is not the bottleneck resource

(e.g., in the case of HTTP 1.1 as described in Section 5.2).

3.1.2. Feedback Control Loop

The FCS architecture has a feedback control loop that is invoked at every sampling

instant k. Each feedback control loop is composed of a Monitor, a Controller, and an

Actuator.

1) The Monitor measures the controlled variables and feeds the samples back to the

Controller.

2) The Controller compares the performance references with corresponding controlled

variables to get the current errors, and calls control algorithms to compute a control

input, the new value of the manipulated variable based on the errors. The control

algorithm is a critical component with significant impacts on the system performance

and hence is the centerpiece of the design of an FCS algorithm. Note that control

theory enables us to derive the control algorithm and analytically prove that the

algorithm can provide the desired performance guarantees.

3) The Actuator changes the manipulated variable based on the newly computed control

input. The Actuator implements a mechanism that dynamically reallocates

(reschedules) the resource corresponding to the manipulated variable. For example,

corresponding to a manipulated variable of the total requested CPU utilization, we

design a QoS Actuator to dynamically adjust task QoS levels (different QoS levels

have different execution times and/or invocation periods).
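The Monitor, Controller, and Actuator steps above can be sketched as a short Python loop. This is an illustration under stated assumptions, not the algorithm developed in later chapters: the plant model in `monitor` (measured utilization equals an unknown gain times the requested utilization, capped at 1), the integral control law in `controller`, and the values of `ki`, `gain`, and the initial request are all hypothetical choices made for the example.

```python
def monitor(requested_util, gain):
    """Hypothetical plant: CPU utilization measured over one sampling window.

    `gain` stands for the unknown ratio between actual and estimated
    execution times; the controller never reads it directly.
    """
    return min(1.0, gain * requested_util)


def controller(reference, measured, ki):
    """Assumed integral control law: returns the change in the manipulated variable."""
    error = reference - measured  # E(k) = CS - C(k)
    return ki * error


def run_loop(reference=0.9, gain=2.0, ki=0.25, steps=40):
    """Run the feedback loop and return the sampled controlled variable C(k)."""
    requested = 0.2  # manipulated variable: total requested CPU utilization
    trace = []
    for _ in range(steps):
        measured = monitor(requested, gain)            # 1) Monitor samples C(k)
        delta = controller(reference, measured, ki)    # 2) Controller computes control input
        requested = max(0.0, requested + delta)        # 3) Actuator applies the new value
        trace.append(measured)
    return trace


if __name__ == "__main__":
    trace = run_loop()
    print("first sample:", trace[0])
    print("last sample:", trace[-1])
```

With these example values the error shrinks by a constant factor of 1 - ki * gain at every sampling instant, so the measured utilization approaches the reference even though the controller does not know `gain`.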

3.2. Performance Specifications and Metrics

We now describe the second element of the FCS framework, the performance

specifications and metrics for adaptive real-time systems. While early research on real-

time computing was concerned with guaranteeing complete avoidance of undesirable

effects such as overload and deadline misses, adaptive real-time systems are designed to

handle such effects dynamically. Using a control theory framework, we characterize the

dynamic performance of an adaptive real-time system in both transient and steady state

upon load or resource changes. Transient behavior of an adaptive system represents the

responsiveness and efficiency of adaptation in reacting to changes in run-time conditions,

and steady-state behavior describes a system's long-term performance after its transient

response settles. In contrast, traditional metrics such as the average miss-ratio often fail

to capture the transient behavior of the system in response to load variations. Another

important advantage of our metrics is that they formulate the performance of real-time

systems as dynamic responses in control theory, and therefore enable the use of control

design methods to satisfy the specifications. Our performance specifications and metrics

consist of a set of performance profiles1 in terms of the controlled variables. We also

present a set of representative load profiles adapted from control theory [32].

Corresponding to signals widely used in control theory, our load profiles can be used to

provide guidance for control design and generate canonical system response to variations

of run-time conditions.

3.2.1. Performance Profile

The performance profile characterizes important transient and steady state properties of a

system in terms of its controlled variables. Note that when the sampling window W is

small, a controlled variable C(k) approximates the instantaneous system performance at

the sampling instant k. In contrast, traditional metrics for real-time systems such as

average miss-ratio and average utilization are defined based on a much larger time

window than the sampling period W. These average metrics are often inadequate for

characterizing the dynamics of the system performance [50]. From the control theory

point of view, a real-time system transits from the steady state to the transient state when

a controlled variable deviates significantly from its steady state value in response to

variation in its run-time condition. After a time interval in the transient state, the system

may settle down to a new steady state after the feedback control loop converges the

controlled variable to the vicinity of a new value. The steady state is defined as a state

when the controlled variable C(k) stays within ε% of its performance reference CS. The

performance profile includes the following elements.

1 The performance profile has been called the miss-ratio profile in [50] when deadline miss ratio is used as the controlled variable.

Stability: A system is Bounded-Input-Bounded-Output (BIBO) stable if its controlled

variables are always bounded for bounded performance references and

disturbances. Note that the performance of an unstable system can severely and

persistently diverge from the desired performance so as to cause system

malfunctioning and even complete system failure. Stability is a necessary

condition for achieving the desired performance reference. Stability is an especially

important requirement for FCS algorithms because a poorly designed

Controller can overreact to performance errors and push a real-time system to

unstable conditions.

Transient-state response represents the responsiveness and efficiency of adaptive

resource scheduling in reacting to changes in run-time conditions.

• Settling time Ts: The time it takes the system to settle down to a steady

state from the start of a transient state. The settling time represents how

fast the system can regain desired performance after a change in its run-

time condition.

• Overshoot Co: The maximum amount that a controlled variable overshoots

its reference divided by its reference, i.e., Co = (CM – CS) / CS where CM is

the maximum value of the controlled variable during its transient state.

Overshoot characterizes the worst-case transient performance degradation

of a system. A system may require a low overshoot because severe

transient performance degradation may lead to system failure. For

example, in media players, a high transient deadline miss-ratio can cause

buffer overflows [19].

Steady-state error ESC: The difference between the average value of a controlled

variable in steady state and its reference. The steady state error characterizes how

precisely the system can enforce desired performance in steady state.

Sensitivity SP: Relative change of a controlled variable in steady state with respect

to the relative change of a system parameter P. For example, assuming the

controlled variable is deadline miss ratio, the system’s sensitivity with respect to

the task execution time SAE represents how significantly the change in the task

execution time affects the system miss-ratio. Sensitivity describes the robustness

of the system with regard to workload or system variations.

The performance profile establishes a set of metrics of adaptive real-time systems based

on the specification of dynamic response in control theory. The metrics enable system

designers to apply established control theory techniques to achieve stability, and meet

transient and steady state specifications.
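As a rough illustration of how the settling time, overshoot, and steady-state error defined above could be measured from a sampled trace, consider the following Python sketch. The function name `performance_profile` and its parameters are hypothetical. The sketch also assumes that the transient state begins at the first sample of the trace, that the reference is non-zero, and that the steady-state band is `tolerance` times the reference.

```python
def performance_profile(samples, reference, window, tolerance=0.02):
    """Estimate (settling time, overshoot, steady-state error) from a trace.

    samples[k] is the controlled variable C(k), taken every `window` seconds.
    Steady state is taken to mean |C(k) - CS| <= tolerance * |CS|.
    """
    band = tolerance * abs(reference)

    # Find the last sample that lies outside the steady-state band.
    last_outside = -1
    for k, value in enumerate(samples):
        if abs(value - reference) > band:
            last_outside = k
    settled_from = last_outside + 1

    # Settling time Ts: time from the start of the trace until the system settles.
    settling_time = settled_from * window

    # Overshoot Co = (CM - CS) / CS, with CM the maximum during the transient state.
    transient = samples[:settled_from]
    overshoot = 0.0
    if transient:
        overshoot = max(0.0, (max(transient) - reference) / reference)

    # Steady-state error: average steady-state value minus the reference.
    steady = samples[settled_from:]
    steady_state_error = None
    if steady:
        steady_state_error = sum(steady) / len(steady) - reference

    return settling_time, overshoot, steady_state_error


if __name__ == "__main__":
    trace = [0.0, 0.5, 1.2, 1.1, 1.0, 1.0, 1.0]
    print(performance_profile(trace, reference=1.0, window=0.5))
```

If the trace never enters the band, the steady-state error is reported as `None` rather than a number, since no steady state was observed.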

3.2.2. Load Profile

According to control theory, the performance profile of an adaptive system may be

specified assuming representative load profiles including step load and ramp load. The

step load represents the worst case of load variation that overloads the system

instantaneously, while the ramp load represents a nominal form of load variation. The

load profiles are defined as follows.

Step-load SL(Ln, Lm): a load profile that instantaneously jumps from a nominal

load Ln to a higher load Lm > Ln and stays constant after the jump. Instantaneous

load change such as the step load is more difficult to handle than gradual load

change.

Ramp-load RL(Ln, Lm, TR): a load profile that increases linearly from the nominal

load Ln to a higher load Lm > Ln during a time interval of TR sec. Compared with

the step load, the ramp signal represents a less severe load variation scenario.
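The two load profiles can be written as simple functions of time. The Python sketch below is one possible reading of the definitions above; the parameters `t_jump` and `t_start`, which fix when the load change begins, are assumptions added for the example and are not part of the SL and RL notation.

```python
def step_load(t, nominal, high, t_jump=0.0):
    """SL(Ln, Lm): the load jumps from `nominal` to `high` at `t_jump` and stays there."""
    return nominal if t < t_jump else high


def ramp_load(t, nominal, high, duration, t_start=0.0):
    """RL(Ln, Lm, TR): the load rises linearly from `nominal` to `high` over `duration` seconds."""
    if t <= t_start:
        return nominal
    if t >= t_start + duration:
        return high
    return nominal + (high - nominal) * (t - t_start) / duration


if __name__ == "__main__":
    for t in (0, 25, 50, 100, 150):
        print(t, step_load(t, 0.5, 1.5, t_jump=50), ramp_load(t, 0.5, 1.5, duration=100))
```

A concrete workload generator would still have to turn the returned load value into task arrivals or requests for the system under test.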

One key advantage of using the above load profiles for performance specification is

that they are amenable to well-established design and analysis methods in control theory

and, therefore, fit well with our control theoretical framework. This means that a system

designer can use control theory methods to analytically design the system to satisfy a

performance profile in response to a load profile as defined above. Specifically, a load

profile can be modeled as disturbance signals in the form of a step or ramp signal (see

Section 4.4). Based on control theory, a linear system’s dynamic properties can be

determined by its dynamic response to a step signal or a ramp load regardless of its

parameters including the magnitude of load variation (Lm-Ln) and the ramp duration TR. If

a real-time system can be approximated with a linear model in its operation conditions,

its performance profile can be determined by stressing the system with a step load, i.e.,

the system can achieve satisfactory performance under any combinations of step and

ramp load if its performance profile in response to a step load or ramp load satisfies its

specifications.
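The claim that a linear system's response does not depend on the magnitude of the step can be checked numerically on a toy loop. The Python sketch below uses a hypothetical linear closed loop (a static plant gain with an integral controller, both assumptions made for the example) and compares step responses of two different magnitudes after normalizing each by its own magnitude.

```python
def step_response(magnitude, plant_gain=2.0, ki=0.25, steps=20):
    """Deviation of the controlled variable from its reference after a step disturbance.

    Hypothetical linear loop: deviation(k) = plant_gain * u(k) + magnitude,
    with the integral control law u(k+1) = u(k) - ki * deviation(k).
    """
    u = 0.0
    deviations = []
    for _ in range(steps):
        deviation = plant_gain * u + magnitude
        deviations.append(deviation)
        u -= ki * deviation
    return deviations


def normalized(trace, magnitude):
    """Scale a response by the size of the step that caused it."""
    return [value / magnitude for value in trace]


if __name__ == "__main__":
    small = normalized(step_response(0.1), 0.1)
    large = normalized(step_response(0.5), 0.5)
    print(max(abs(a - b) for a, b in zip(small, large)))
```

The two normalized traces coincide up to floating-point rounding, which is why the shape of a single step response is enough to characterize this loop. Adding a saturation (for example, capping utilization at 100%) would make the loop non-linear and break this property.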

Unfortunately, if a real-time system is non-linear in its operation conditions, the

dynamic response of the system to arbitrary load variations cannot be determined by

its response to a single step load or a single ramp load because the system performance

depends on the specific parameters of the load profiles. In this case, the performance

profiles in response to specific load profiles are only “indications” of the system

performance in general. In this case, the load profiles are application-specific based on a

set of expected load characteristics and system requirements.

We should also note that load profile is an abstraction of the workload, and there can

be many possible instantiations of the same load profile. The instantiation of a load

profile should incorporate the knowledge of the workload, and, therefore, the load profile

should be viewed as an enhancement to existing benchmarks (e.g., [37][40][41][42]

[75][77]). For example, the system load can be interpreted as the total requested CPU

utilization in the system where CPU is the bottleneck resource. For another example, the

load of an Internet server may be interpreted as the number of concurrent users.

[Figure 3.2. Control Theory based Design Methodology for FCS Algorithms. Requirement Analysis produces the Performance Specifications and Modeling produces the System Model; Controller Design uses both to produce FCS algorithms that satisfy the specifications.]

3.3. Control Theory Based Design Methodology

The third element of our FCS framework is the control theory based design methodology

(see Figure 3.2). Based on the scheduling architecture and the performance specifications,

we now establish a design methodology based on feedback control theory. Using this

design methodology, a system designer can systematically design an adaptive resource

scheduler to satisfy the system’s performance specifications with established analytical

methods. This methodology is in contrast with existing ad hoc approaches that depend on

laborious design/tuning/testing iterations. Our design methodology works as follows.

1) The system designer specifies the desired dynamic behavior with transient and

steady state performance metrics. This step maps the performance requirements of

an adaptive real-time system to the dynamic response specification of a control

system.

2) The system designer establishes a dynamic model of the real-time system for the

purposes of performance control. A dynamic model describes the mathematical

relationship between the control input and the controlled variables of a system

with differential/difference equations or state matrices. Modeling is important

because it provides a basis for the analytical design of the Controller. However,

modeling has been a major challenge for applying control theory to real-time

systems due to the lack of established differential/difference equations to describe

real-time systems. Two different approaches can be used to establish the dynamic

model of a real-time system. The analytical approach directly describes a system

with mathematical equations based on the knowledge of the system dynamics.

When such knowledge is not available, the system identification approach [11]

can be used to estimate the system model based on profiling experiments. In this

thesis work, we apply the analytical approach to model a generic CPU-bound

real-time system and a storage system, and develop a system identification tool

to model a web server whose dynamics are less clear. Our work represents a first

step in modeling real-time systems using rigorous mathematical equations. Our

modeling methodology and established analytical models provide a foundation for

the application of control theory to adaptive real-time systems in this thesis work

and future work in this area.

3) Based on the performance specs and system model from steps 1) and 2), the

system designer applies established mathematical techniques (i.e., the Root Locus

method, frequency design, or state based design) of feedback control theory [32]

to design FCS algorithms that analytically guarantee the specified transient and

steady-state behavior at run-time. Compared with existing ad hoc approaches, our

analytical design approach significantly reduces the design time and effort required

for adaptive systems because our approach requires far fewer design/testing

iterations. Furthermore, the resultant system’s parameters can be easily

tuned with existing control theory methods and tools in practice and the resultant

system can be proved to satisfy its performance specifications. In contrast, the

tuning of adaptive systems designed with ad hoc methods often depends on repeated

testing, guessing, or rule-of-thumb without performance guarantees at run-time.
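As an illustration of the system identification approach mentioned in step 2), the sketch below fits a first-order difference model y(k+1) = a·y(k) + b·u(k) to profiling data by least squares. The model structure, the function name `fit_first_order`, and the sample data are assumptions made for illustration; this is not the identification tool developed in this thesis.

```python
from typing import Sequence, Tuple


def fit_first_order(u: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y(k+1) = a*y(k) + b*u(k).

    u holds the inputs u(0..n-1); y holds the outputs y(0..n).
    """
    if len(y) != len(u) + 1:
        raise ValueError("y must have exactly one more sample than u")
    # Each triple is (y(k), u(k), y(k+1)).
    triples = list(zip(y[:-1], u, y[1:]))
    s_yy = sum(yk * yk for yk, _, _ in triples)
    s_yu = sum(yk * uk for yk, uk, _ in triples)
    s_uu = sum(uk * uk for _, uk, _ in triples)
    r_y = sum(yk * yn for yk, _, yn in triples)
    r_u = sum(uk * yn for _, uk, yn in triples)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_yy * s_uu - s_yu * s_yu
    if det == 0:
        raise ValueError("data are not informative enough to identify a and b")
    a = (r_y * s_uu - r_u * s_yu) / det
    b = (s_yy * r_u - s_yu * r_y) / det
    return a, b
```

With noise-free data generated by an integrator-like system (a = 1), the fit recovers the generating parameters; real profiling data would be noisy and would usually call for more samples and model validation.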

In summary, we describe a unified FCS framework for adaptive real-time systems

that provides performance guarantees in unpredictable environments. Our FCS

framework includes 1) a software architecture for feedback performance control, 2) a

set of performance specifications and metrics that describes the efficiency, accuracy,

and robustness of performance guarantees, and 3) a control theory methodology for

designing FCS algorithms to satisfy the performance specifications. In the next three

chapters, we describe the details of three instantiations of the FCS framework in three

application domains.

Chapter 4

Real-Time CPU Scheduling

In this Chapter, we develop a set of novel real-time CPU scheduling algorithms called

FC-RTS [51][52][53][70] that guarantee low deadline miss ratio and high CPU utilization

when the workload deviates from estimates at run-time. Our FC-RTS algorithms provide a

scheduling solution for a new category of soft real-time systems working in unpredictable

environments, whose performance cannot be guaranteed by many existing real-time

scheduling algorithms including RM [43], EDF [70], the Spring algorithm [79], and QoS

adaptation algorithms [4][61]. Such systems include open systems on the Internet such as

on-line trading servers, e-business servers, and on-line media streaming, and data driven

systems such as database applications. For example, in an on-line trading server, the

processing time for a service request often depends on the user input that is unknown to

the scheduler. For another example, in a surveillance system, the processing time of

object tracking based on camera images can vary dramatically due to the movement scope

of the object being tracked [23]. In addition, our FC-RTS algorithms can also provide

performance guarantees for off-the-shelf software applications, components, and device

drivers when accurate information on their execution time and invocation rates is

unavailable.

A motivation for applying the FCS framework to real-time CPU scheduling is the

observation that many existing feedback based scheduling algorithms [8][21][25] are

based on heuristics rather than a theoretical foundation. These algorithms often depend

on laborious design/tuning/testing iterations, and may still fail to handle unexpected or

untested conditions at run-time. While the design methodology for automatic feedback

control systems has been developed in feedback control theory, the modeling, analysis

and implementation of real-time scheduling lead to significant research challenges to

real-time system research. In this thesis, we design our FC-RTS algorithms based on

feedback control theory by instantiating the FCS framework in real-time CPU scheduling.

Specifically, our major contributions include the following:

• A novel and general feedback control real-time CPU scheduling architecture that allows plug-ins of different real-time scheduling policies and QoS optimization algorithms, and a set of tuning rules based on the scheduling policies,

• An analytical model of a CPU-bound real-time system, which to the best of our knowledge is the first dynamic model for generic real-time CPU scheduling,

• A set of analysis results and tuning methods for FC-RTS algorithms to achieve performance specifications including stability, settling time, overshoot, steady state performance, and sensitivity with regard to workload variations,

• Practical FC-RTS algorithms applicable to different types of real-time applications,

• Performance evaluation results demonstrating that our analytically designed FC-RTS algorithms can provide robust performance guarantees in terms of deadline miss ratio and CPU utilization, and achieve satisfactory performance profiles in response to overloads caused by new task arrivals and task execution time variations.

The feedback control real-time scheduling architecture is described in Section 4.1.

We describe the performance specifications and metrics in Section 4.2. We establish an

analytical model for a real-time system in Section 4.3. Based on the model, we present

the design and control analysis of a set of FC-RTS algorithms in Section 4.4. We present

the performance evaluation results of these scheduling algorithms in Section 4.5. We then

qualitatively compare FC-RTS algorithms with several existing scheduling paradigms in

Section 4.6. Finally, we summarize this chapter in Section 4.7.

[Figure 4.1 (diagram not reproduced). Labels: Task Arrivals, QoS Actuator (AdjustQoS), Basic Scheduler (Sched), Current Tasks, CPU, Completed/Aborted Tasks, Monitor, Controlled Variables, Controller, Performance References, Control Input.]

Figure 4.1. Feedback Control Real-Time Scheduling Architecture

4.1. Feedback Control Real-Time Scheduling Architecture

Our feedback control real-time CPU scheduling (FC-RTS) architecture (illustrated in

Figure 4.1) is composed of four parts: a task model, a set of control related variables, a

feedback control loop that maps a feedback control system structure to real-time CPU

scheduling, and a Basic Scheduler.

4.1.1. Task Model

In our task model, each task Ti has N QoS levels (N ≥ 2). Each QoS level j (0 ≤ j ≤ N-1)

of Ti is characterized by the following attributes:

Di[j]: the relative deadline

EEi[j]: the estimated execution time

AEi[j]: the (actual) execution time that can vary considerably from instance to

instance and is unknown to the scheduler

Vi[j]: the value that task Ti contributes if it is completed at QoS level j before its

deadline Di[j]. The lowest QoS level 0 represents the rejection of the task

and Vi[0] = 0. Every QoS level contributes a miss penalty MPi < 0 if it

misses its deadline.

Periodic tasks:

Pi[j]: the invocation period

Bi[j]: the estimated CPU utilization Bi[j] = EEi[j] / Pi[j]

Ai[j]: the (actual) CPU utilization Ai[j] = AEi[j] / Pi[j]

Aperiodic tasks:

EIi[j]: the estimated inter-arrival-time between subsequent invocations

AIi[j]: the average inter-arrival-time that is unknown to the scheduler

Bi[j]: the estimated CPU utilization Bi[j] = EEi[j] / EIi[j]

Ai[j]: the (actual) CPU utilization Ai[j] = AEi[j] / AIi[j]

In this model, a higher QoS level of a task has a higher (both estimated and actual)

CPU utilization and contributes a higher value if it meets its deadline, i.e., Bi[j+1] > Bi[j],

Ai[j+1] > Ai[j], and Vi[j+1] > Vi[j]. In the simplest form, each task only has two QoS

levels (corresponding to the admission and the rejection of the task, respectively). In

many applications including web services [5], multimedia [19], embedded digital control

systems [23], and systems that support imprecise computation [48] or flexible security

[68], each task has more than two QoS levels and the scheduler can trade off the CPU

utilization of a task with the value it contributes to the system at a finer granularity. The

QoS levels may differ in terms of execution time and/or period/inter-arrival-time. For

example, a web server may dynamically change the execution time of an HTTP session

by changing the complexity of the requested web page [5]. For another example, several

papers have shown that the deadlines and periods of tasks in embedded digital control

systems and multimedia players can be adjusted on-line [19][23][66] within certain

ranges. A key feature of our task model is that it characterizes systems in unpredictable

environments where a task’s actual CPU utilization is time varying and unknown to the

scheduler. Such systems are amenable to the use of feedback control loops to

dynamically correct the scheduling errors to adapt to load variations at run-time.
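The periodic part of the task model can be sketched as a small data structure. The class and field names below are hypothetical; only the attributes visible to the scheduler are represented, since the actual execution time AEi[j] is unknown to it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QoSLevel:
    deadline: float       # D_i[j], relative deadline
    est_exec_time: float  # EE_i[j], estimated execution time
    period: float         # P_i[j], invocation period
    value: float          # V_i[j], value if completed before the deadline

    @property
    def est_utilization(self) -> float:
        """B_i[j] = EE_i[j] / P_i[j]."""
        return self.est_exec_time / self.period


@dataclass
class PeriodicTask:
    name: str
    levels: List[QoSLevel]  # index 0 is the rejection level (B = 0, V = 0)
    level: int = 0          # current QoS level l_i(k)

    def est_utilization(self) -> float:
        return self.levels[self.level].est_utilization

    def levels_are_ordered(self) -> bool:
        """Check B_i[j+1] > B_i[j] and V_i[j+1] > V_i[j]."""
        return all(
            hi.est_utilization > lo.est_utilization and hi.value > lo.value
            for lo, hi in zip(self.levels, self.levels[1:])
        )


def total_estimated_utilization(tasks: List[PeriodicTask]) -> float:
    """Sum of B_i[l_i(k)] over all tasks."""
    return sum(t.est_utilization() for t in tasks)
```

An aperiodic task would look the same with the period replaced by the estimated inter-arrival time EIi[j].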

4.1.2. Control Related Variables

An important step in designing the FC-RTS architecture is to decide the following

variables of a real-time system in terms of control theory.

Controlled variables are the performance metrics controlled by the scheduler in

order to achieve desired system performance. Controlled variables of a real-time

system may include the deadline miss ratio M(k) and the CPU utilization U(k)

(also called miss ratio and utilization, respectively), both defined over a time

window ((k-1)W, kW), where W is the sampling period and k is called the

sampling instant.

• The miss ratio M(k) at the kth sampling instant is defined as the number

of deadline misses divided by the total number of completed and

aborted task instances in a sampling window ((k-1)W, kW). Miss ratio

is usually the most important performance metric in a real-time

system.

• The utilization U(k) at the kth sampling instant is the percentage of

CPU busy time in a sampling window ((k-1)W, kW). CPU utilization is

regarded as a controlled variable for real-time systems due to cost and

throughput considerations. CPU utilization is important also because

of its direct linkage with the deadline miss ratio (see Section 4.3).

• Another controlled variable might be the total value V(k) delivered by

the system in the kth sampling period. In the remainder of this chapter,

we do not directly use the total value as a controlled variable, but

rather address the value imparted by tasks via the QoS Actuator (see

and Section 4.5.1)

Performance references represent the desired system performance in terms of the

controlled variables, i.e., the desired miss ratio MS and/or the desired CPU

utilization US. For example, a particular system may require deadline miss ratio

MS = 0 and CPU utilization US = 90%. The difference between a performance

reference and the current value of the corresponding controlled variable is called

an error, i.e., the miss ratio error EM = MS – M(k) and the utilization error EU = US

– U(k).

Manipulated variables are system attributes that can be dynamically changed by

the scheduler to affect the values of the controlled variables. In our architecture,

the manipulated variable is the total estimated utilization B(k) = ∑iBi[li(k)] of all

tasks in the system, where Ti is a task with a QoS level of li(k) in the kth sampling

window. The rationale for choosing the total estimated utilization as a manipulated

variable is that most real-time scheduling policies (such as EDF and

Rate/Deadline Monotonic) can guarantee no deadline misses when the system is

not overloaded, and in normal situations, the miss ratio increases as the system

load increases. The other controlled variable, the utilization U(k), also usually

increases as the total estimated utilization increases. However, the utilization is

often different from the total estimated utilization B(k). This is due to the

estimation error of execution times when workload is unpredictable and time

varying. Another difference between U(k) and B(k) is that U(k) can never exceed

100% while B(k) does not have this boundary.
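The definitions of the controlled variables and their errors can be written out as a short sketch. The `WindowStats` record and its field names are assumptions for illustration, as is the choice to report a miss ratio of 0 for a window in which no task instance finished.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WindowStats:
    misses: int       # deadline misses in the window ((k-1)W, kW)
    completed: int    # completed task instances in the window
    aborted: int      # aborted task instances in the window
    busy_time: float  # CPU busy time in the window
    window: float     # sampling period W


def miss_ratio(s: WindowStats) -> float:
    """M(k): misses divided by completed plus aborted instances."""
    total = s.completed + s.aborted
    # Assumption: an empty window counts as a miss ratio of 0.
    return s.misses / total if total else 0.0


def utilization(s: WindowStats) -> float:
    """U(k): fraction of the window in which the CPU was busy."""
    return s.busy_time / s.window


def errors(s: WindowStats, m_ref: float, u_ref: float) -> Tuple[float, float]:
    """(E_M, E_U) = (M_S - M(k), U_S - U(k))."""
    return m_ref - miss_ratio(s), u_ref - utilization(s)
```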

4.1.3. Feedback Control Loop

The FC-RTS architecture features a feedback control loop that is invoked at every

sampling instant k. It is composed of a Monitor, a Controller, and a QoS Actuator (Figure

4.1).

1) The Monitor measures the controlled variables (M(k) and/or U(k)) and feeds the

samples back to the Controller.

2) The Controller compares the performance references with corresponding controlled

variables to get the current errors, and computes a change DB(k) (called the control

input) to the total estimated requested utilization, i.e., B(k+1) = B(k) + DB(k), based

on the errors. The Controller uses a control function to compute the correct control

input to compensate for the load variations and keep the controlled variables close to

the references. The detailed design of the Controller is presented in Section 4.4.

3) The QoS Actuator calls a QoS optimization algorithm (see Section 4.5.1) to maximize

the system value by dynamically adjusting tasks’ QoS levels under the utilization

constraint computed by the Controller, B(k+1) = B(k) + DB(k). In the simplest form,

each task has only two QoS levels and the QoS Actuator is essentially an

admission controller.

In addition to the above feedback control loop, our FC-RTS architecture also includes

arriving-time QoS control, i.e., in addition to being called periodically by the Controller,

the QoS Actuator is also invoked upon the arrival of each task. The arriving-time QoS

control isolates disturbances caused by new task arrivals (see Section 4.3). Feedback

control scheduling in systems without arriving-time QoS control was previously studied

in [50].
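For the simplest case, in which each task has only two QoS levels, steps 2) and 3) of the loop can be sketched as a budget update followed by admission control. The greedy value-density ordering used here is an assumption chosen for brevity; it is not the QoS optimization algorithm of Section 4.5.1, and the names `next_budget` and `admit` are hypothetical.

```python
from typing import List, Tuple

# (name, estimated utilization B_i > 0, value V_i)
Task = Tuple[str, float, float]


def next_budget(b_k: float, d_b: float) -> float:
    """B(k+1) = B(k) + DB(k); clamping at zero is an added assumption."""
    return max(0.0, b_k + d_b)


def admit(tasks: List[Task], budget: float) -> Tuple[List[str], float]:
    """Admit tasks in decreasing value density while they fit the budget."""
    admitted: List[str] = []
    used = 0.0
    by_density = sorted(tasks, key=lambda t: t[2] / t[1], reverse=True)
    for name, est_util, _value in by_density:
        if used + est_util <= budget:
            admitted.append(name)
            used += est_util
    return admitted, used
```

The same `admit` call could be made on every task arrival to mimic arriving-time QoS control, using the most recent budget.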

4.1.4. Basic Scheduler

The FC-RTS architecture has a Basic Scheduler that schedules admitted tasks with a

scheduling policy (e.g., EDF or Rate/Deadline Monotonic). The properties of the

scheduling policy can have significant impact on the design of the feedback control loop.

Our FC-RTS architecture permits plugging in different real-time scheduling policies for

this Basic Scheduler and then designing the entire feedback control scheduling system

around this choice (see Section 4.4.4).

A key difference between our work and many previous works is that while previous

work often assumes the CPU utilization of each task is known a priori, we focus on

systems in unpredictable environments where tasks’ actual CPU utilizations are unknown

and time varying. This more challenging problem necessitates the feedback control loop

to dynamically correct the scheduling errors at run-time. Our FC-RTS architecture

establishes a mapping from real-time scheduling to a typical structure of feedback control

systems. This step enables us to treat a real-time system as a feedback control system and

utilize feedback control theory to design the system rather than developing ad hoc

algorithms.

4.2. Performance Specifications and Metrics

We now specialize the second element of the FCS framework, the performance

specifications, to real-time CPU scheduling. The performance specifications consist of a

set of performance profiles in terms of utilization U(k) and miss ratio M(k), and a set of

load profiles in terms of the total requested CPU utilization of a system.

4.2.1. Performance Profile

The performance profile characterizes important transient and steady state performance

of a real-time system. M(k) and U(k) characterize the system performance in the sampling

window ((k-1)W, kW). In contrast, traditional metrics for real-time systems such as

average miss-ratio and average utilization are defined based on a much larger time

window than the sampling period W. The average metrics are often inadequate for

characterizing the dynamics of the system performance in response to overload

conditions [50]. The performance profile of a real-time system includes the following.

Stability: A real-time system is stable if its miss ratio M(k) and utilization U(k) are

always bounded for bounded references. Although both miss ratio M(k) and

utilization U(k) are naturally bounded in the range [0, 1], stability is a necessary

condition to prevent the controlled variables from severe deviations from the

reference values.

Transient-state response represents the real-time system’s responsiveness and

efficiency of QoS adaptation in reacting to changes in run-time conditions.

• Overshoot Mo and Uo: For a real-time system, we define overshoot as the

maximum amount that the system overshoots its miss ratio or utilization

reference divided by its miss ratio or utilization reference, i.e., Mo = (Mmax

– MS) / MS, Uo = (Umax – US) / US, respectively. The maximum miss ratio

Mmax and utilization Umax in the transient state are called the absolute overshoot.

Overshoot is important to a real-time system because a high transient

miss-ratio or utilization can cause system failure in many systems such as

robots and media streaming [19].

• Settling time Ts: The time it takes the system to enter a steady state in

response to a load profile. The settling time represents how fast the system

can settle down to steady state with desired miss ratio and/or utilization.

Steady-state error ESM and ESU: The difference between the average values of

miss ratio M(k) and/or utilization U(k) in steady state and its corresponding

reference. The steady state error characterizes how precisely the system can enforce

the desired miss ratio and/or utilization in steady state.

Sensitivity Sp: Relative change of a controlled variable in steady state with respect

to the relative change of a system parameter p. For example, sensitivity of miss

ratio with respect to the task execution time SAE represents how significantly the

change in the task execution time affects the system miss-ratio. Sensitivity

describes the robustness of the system with regard to workload or system

variations.
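The transient and steady-state metrics above can be computed from a sampled trace of M(k) or U(k). The sketch below is illustrative only: the text does not say how entry into steady state is detected, so the tolerance band `tol` around the reference is an assumption, as are the function names.

```python
from statistics import mean
from typing import Optional, Sequence


def overshoot(trace: Sequence[float], ref: float) -> float:
    """Relative overshoot (max - ref) / ref; undefined for a zero reference."""
    if ref == 0:
        raise ValueError("relative overshoot is undefined for a zero reference")
    return (max(trace) - ref) / ref


def settling_index(trace: Sequence[float], ref: float, tol: float) -> Optional[int]:
    """First sampling instant after which the trace stays within ref +/- tol."""
    for k in range(len(trace)):
        if all(abs(x - ref) <= tol for x in trace[k:]):
            return k
    return None


def settling_time(trace: Sequence[float], ref: float, tol: float,
                  w: float) -> Optional[float]:
    """Settling time in seconds for a sampling period w."""
    k = settling_index(trace, ref, tol)
    return None if k is None else k * w


def steady_state_error(trace: Sequence[float], ref: float, k_settle: int) -> float:
    """Average of the trace in steady state minus the reference."""
    return mean(trace[k_settle:]) - ref
```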

4.2.2. Load Profile

For a CPU bound real-time system, the load profile L(k) of a system at sampling instant k

is defined in terms of the total CPU utilization of all the tasks arriving at the system.

Specifically, two forms of overload can occur in a real-time system.

Arrival Overload: For this type of overload, the load variation ∆L = (Lm - Ln) in a step load SL(Ln, Lm) or a ramp load RL(Ln, Lm, TR) is caused by the new arrival of a set of tasks {j} with a total CPU utilization of ∆L at a system with an initial admitted task set {p} with a total CPU utilization of A(0) = Ln. The load variation is defined as the total CPU utilization of all new tasks (assuming every new task is at the highest QoS level N-1), ∆L = (Lm - Ln) = ∑{j}Aj[N-1], while the initial load is defined as the total actual CPU utilization at time 0, i.e., A(0) = Ln = ∑{p}Ap[lp(0)] where lp(0) denotes the initial QoS level of task p. For example, a

step load SL(0, 150%) represents a sudden arrival of a task set with a total

utilization of 150% at an initially idle system.

Internal Overload: For this type of overload, the load variation ∆L = (Lm - Ln) in a

step load SL(Ln, Lm) or a ramp load RL(Ln, Lm, TR) is caused by increases in CPU

utilizations (i.e., execution times and/or inter-arrival times/periods) of tasks

already (admitted) in the system such that the total CPU utilization increases from

Ln to Lm. For example, a step load SL(100%, 150%) represents a scenario where

the execution time of every task in an initially 100% utilized system suddenly

increases by 50% at the same time instant (e.g., due to extra processing overhead

caused by packet retransmissions due to TCP congestion control).
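The two load profiles used in these examples can be generated as sampled signals. The sketch follows the examples SL(0, 150%) and SL(100%, 150%), reading the first argument as the initial load and the second as the load after the change; the function names, the start index, and the per-sample representation are assumptions.

```python
from typing import List


def step_load(l_n: float, l_m: float, k_step: int, n: int) -> List[float]:
    """SL(Ln, Lm): load jumps from l_n to l_m at sampling instant k_step."""
    return [l_n if k < k_step else l_m for k in range(n)]


def ramp_load(l_n: float, l_m: float, t_r: float, w: float,
              k_start: int, n: int) -> List[float]:
    """RL(Ln, Lm, TR): load rises linearly from l_n to l_m over t_r seconds.

    w is the sampling period W; the ramp begins at sampling instant k_start.
    """
    samples = []
    for k in range(n):
        t = (k - k_start) * w  # time elapsed since the ramp began
        if t <= 0:
            samples.append(l_n)
        elif t >= t_r:
            samples.append(l_m)
        else:
            samples.append(l_n + (l_m - l_n) * t / t_r)
    return samples
```

For instance, `step_load(1.0, 1.5, 2, 4)` gives a load of 100% for two sampling windows followed by 150%.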

4.3. Modeling the Controlled Real-Time System

Given the FC-RTS architecture described in Section 4.1 and the performance

specifications described in Section 4.2, we apply the feedback control theory based methodology

described in Section 3.3 to design the FC-RTS algorithms. The first step of this

methodology is to establish an analytical model to approximate the controlled system in

the FC-RTS architecture (Figure 4.1).

The controlled system includes the QoS Actuator, the scheduled real-time system, the

Basic Scheduler, and the Monitor. The input to the controlled system is the control input,

i.e., the change in the total estimated utilization DB(k). The output of the controlled

system includes the controlled variables, miss ratio M(k) and utilization U(k). Although it

is difficult to precisely model a nonlinear and time varying system such as a real-time

system, we can approximate such a system with a linear model for the purpose of control

design because of the robustness of feedback control with regard to system variations.

The block diagram of the controlled system model is illustrated in Figure 4.2. We now

derive the model from the control input, DB(z), through each block in Figure 4.2. The

goal is to derive the transfer function from the control input to the output, the controlled

variables U(z) and M(z). While the block diagram is expressed in the z-domain that is

amenable to control design, we describe the equivalent notations and formulas in the time

domain in the following for clarity of presentation. For example, DB(k) in the time

domain is equivalent to DB(z) in the z-domain.

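[Figure 4.2 (diagram not reproduced). Labels: Controller; Controlled System; signals DB(z), B(z), A(z), U(z), M(z); blocks 1/(z-1) and GA; saturation curves of U versus A and of M versus A, with the threshold Ath.]

Figure 4.2. The Model of the Controlled System

Starting from the control input DB(k), the total estimated utilization B(k) is the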

integration of the control input DB(k):

B(k+1) = B(k) + DB(k) Equation 4.1

Since the precise execution time of each task is unknown and time varying, the total

(actual) requested utilization A(k) may differ from the total estimated requested

utilization B(k):

A(k) = Ga(k)B(k) Equation 4.2

where Ga(k), called the utilization ratio, is a time-variant ratio that represents the extent

of workload variation in terms of total requested utilization. For example, Ga(k) = 2

means that the actual total requested utilization is twice the estimated total utilization.

Since Ga(k) is time variant, we should use the maximum possible value GA =

max{Ga(k)}, called the worst-case utilization ratio, in control design to guarantee

stability in all cases. Hence Equation 4.2 can be simplified to the following formula for

the purpose of control design:

A(k) = GAB(k) Equation 4.3

The relationship between the total requested utilization A(k) and the controlled

variables is nonlinear due to saturation, i.e., a saturated controlled variable remains constant

even when the control input DB(k) ≠ 0. Saturation complicates the control design because the

controlled variables become unresponsive to the control in their saturation zones. When

the CPU is underutilized, the utilization U(k) is outside its saturation zone and stays close

to the total requested utilization A(k):

U(k) = A(k) (A(k) ≤ 1) Equation 4.4

However, since utilization can never exceed 100%, U(k) saturates when the CPU is

overloaded:

U(k) = 1 (A(k) > 1) Equation 4.5

In contrast, the miss ratio M(k) saturates at 0 when the CPU is underutilized, i.e., the

total requested utilization is below a threshold Ath(k), called the schedulable utilization

threshold, or utilization threshold for simplicity.

M(k) = 0 (A(k) ≤ Ath(k)) Equation 4.6

In existing real-time scheduling theory, schedulable utilization bounds have been

derived for various real-time scheduling policies under different workload assumptions

[7][43][48][72]. A utilization bound Ab is typically defined as a fixed lower bound for all

possible workloads under certain assumptions, while we define the utilization threshold

Ath(k) as the time varying actual threshold for the system’s particular workload in the kth

sampling period (and hence Ab ≤ Ath(k)). Since it is always true that Ath(k) ≤ 1, the

saturation zones of CPU utilization (A(k) > 1) and that of miss ratio (A(k) ≤ Ath(k)) are

guaranteed to be mutually exclusive. This property means that at any instant of time, at

least one of the controlled variables does not saturate. Note that different scheduling

policies in the Basic Scheduler usually lead to different utilization thresholds Ath(k). For

example, if EDF is plugged into the FC-RTS architecture and the workload is composed

of independent and periodic tasks, the utilization threshold Ath = 100%. In comparison,

the utilization threshold is usually lower than 100% if RM is plugged into the

architecture. Therefore, the scheduling policy and the workload characteristics affect the

choices on the controlled variable and its reference (see Section 4.4.4).

When A(k) > Ath(k), M(k) usually increases nonlinearly with the total requested

utilization A(k) (as demonstrated in Section 4.5.5). The relationship between M(k) and

A(k) needs to be linearized by taking the derivative at the vicinity of the operation point

(A(k) = Ath(k)).

$G_m = \frac{dM(k)}{dA(k)}$          Equation 4.7

In practice, the miss ratio factor Gm can be estimated experimentally by plotting a

M(k) curve as a function of A(k) based on experimental data and measuring its slope at

the vicinity of the point where M(k) starts to become nonzero (see Section 4.5.5). At the

vicinity of A(k) = Ath(k), we have the following linearized formula:

M(k) = M(k-1) + Gm(A(k) – A(k-1)) (A(k) > Ath(k)) Equation 4.8

Since Gm is usually different at different load levels, we use the worst-case miss ratio

factor GM, defined as the maximum value of measured Gm in the likely load range in

Controller tuning to guarantee stability. Note that different scheduling policies in the

Basic Scheduler usually have a different miss ratio factor, and hence the choice of the

scheduling policy has a direct impact on the Controller parameters (see Section 4.4.4).
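The experimental estimation of the miss ratio factor can be sketched with finite differences over measured (A, M) points. The function name is hypothetical and the sample points in the usage note are made-up numbers, not measurements from this thesis.

```python
from typing import List, Sequence, Tuple


def worst_case_miss_ratio_factor(points: Sequence[Tuple[float, float]]) -> float:
    """Largest finite-difference slope dM/dA where the miss ratio is nonzero.

    points are (A, M) pairs sorted by increasing requested utilization A.
    """
    slopes: List[float] = []
    for (a1, m1), (a2, m2) in zip(points, points[1:]):
        # Skip segments that lie entirely inside the saturation zone M = 0.
        if a2 > a1 and m2 > 0.0:
            slopes.append((m2 - m1) / (a2 - a1))
    if not slopes:
        raise ValueError("no segment with a nonzero miss ratio")
    return max(slopes)
```

For illustrative points such as (1.0, 0), (1.1, 0.1), (1.2, 0.25), (1.5, 0.4), the segment slopes are about 1.0, 1.5, and 0.5, so the sketch returns about 1.5 as the worst-case factor over that load range.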

From Equations (4.1)-(4.8), we can derive a transfer function for each controlled

variable when it is outside its saturation zone:

Utilization control: Under the condition that A(k) < 1, there exists a transfer function

PU(z) from the control input DB(z) to the CPU utilization, i.e., U(z) = PU(z)DB(z), where

PU(z) = GA / (z-1) (A(k) < 1) Equation 4.9

Miss ratio control: Under the condition that A(k) > Ath(k), there exists a transfer

function PM(z) from the control input DB(z) to the miss ratio, i.e., M(z) = PM(z)DB(z), where

PM(z) = GAGM / (z-1) (A(k) > Ath(k)) Equation 4.10

Since the model for miss ratio is the same as that for utilization except for the extra miss-ratio factor GM in Equation 4.10, for simplicity of discussion we use the same formula P(z)

to represent the transfer functions of both controlled variables:

P (z) = G / (z-1) Equation 4.11

where G is called the process gain. G = GA for utilization control and G = GAGM for miss

ratio control.

In summary, the controlled system is approximated with a first order transfer function

(Equation 4.11) with a different saturation zone (Equation 4.5 and Equation 4.6) for each

controlled variable, utilization U(k) and miss ratio M(k), respectively. Note that the

saturation properties cause the controlled system to be non-linear and lead to special

challenges for the Controller design.
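The time-domain model of the controlled system can be collected into one step function. This is a sketch under stated assumptions: the gains are held constant, the miss ratio above the threshold uses the integrated form M = Gm (A - Ath) of Equation 4.8, and the clamp of M at 1 is added here because a miss ratio cannot exceed 100%.

```python
from typing import Tuple


def controlled_system_step(b: float, d_b: float, ga: float,
                           ath: float = 1.0,
                           gm: float = 1.0) -> Tuple[float, float, float, float]:
    """Advance the model by one sampling period.

    Returns (B, A, U, M) for the next sampling window.
    """
    b_next = b + d_b                   # Equation 4.1: integrate the control input
    a = ga * b_next                    # Equation 4.3: actual requested utilization
    u = min(a, 1.0)                    # Equations 4.4 and 4.5: U saturates at 1
    if a <= ath:
        m = 0.0                        # Equation 4.6: M saturates at 0
    else:
        m = min(1.0, gm * (a - ath))   # linearized growth above the threshold
    return b_next, a, u, m
```

With ga = 2, a budget that looks safe to the scheduler (B = 0.75) already overloads the CPU (A = 1.5), which is the situation the feedback loop is meant to correct.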

4.4. Design of FC-RTS Algorithms

In this section, we apply control design methods and analysis to the Controller, the key

component of feedback control scheduling algorithms. We first present the control

algorithm and the model of the feedback control loop for each controlled variable. Based

on the analytical system models, we apply control theory to tune the Controller and

develop a mathematical analysis on the performance profiles of the resultant Controller.

We then design several FC-RTS algorithms to handle the saturation zones in different

types of real-time systems.

4.4.1. Design of the Controller

At each sampling instant k, the Controller computes a control input DB(k), the change in

the total estimated requested utilization, based on the miss ratio error EM(k) = MS - M(k)

and/or the CPU utilization error EU(k) = US - U(k). In this section, we focus on a

Controller for a single controlled variable. The goal of a Controller includes (1)

guaranteed stability, (2) zero steady state error, (3) zero sensitivity to workload

variations, and (4) satisfactory settling time and overshoots. Since the same control

function can be used for both controlled variables, we use the same symbol E(k) to

represent the miss ratio error EM(k) and the utilization error EU(k) in the rest of this

section. Similarly we use S to denote the miss ratio reference MS and utilization reference

US, and the symbol O to denote the controlled variables, miss ratio M(k) and utilization

U(k).

For the FC-RTS architecture illustrated in Figure 4.1, we choose to use a simple P

(Proportional) control function [32] to compute the control input. The P control function

is in Equation 4.12(a) in the time domain and Equation 4.12(b) in the z domain where KP

is a tunable parameter.

$D_B(k) = K_P E(k)$          (a)

$C(z) = K_P$          (b)

Equation 4.12

The rationale for using a P Controller instead of a more sophisticated Controller such

as PID (Proportional-Integral-Derivative) Controller is that the controlled system

includes an integrator in the QoS Actuator (see Equation 4.1) such that zero steady state

error can be achieved without an I (Integral) term in the Controller (see detailed analysis

in Section 4.4.2). The D (Derivative) term is not appropriate for controlling real-time

systems because Derivative control may amplify the noise in miss ratio and utilization

due to random workloads.

The performance of the real-time system depends on the Controller parameter KP. An

ad hoc approach to design the Controller is to repeat numerous experiments on different

parameter values. In our work, we apply established control theory methods to tune the

parameters analytically to guarantee the performance specifications. We first tune the

Controller for each of the controlled variables in Section 4.4.2 based on the linear models

of the controlled system (Equation 4.11). Due to the saturation properties, the

performance of the closed loop system may deviate from the linear case. We address this

issue in Section 4.4.4.
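A short simulation illustrates why a P Controller can reach the reference without an Integral term: the integrator B(k+1) = B(k) + DB(k) keeps accumulating corrections until the error is zero. The sketch below covers utilization control only, and the gain, utilization ratio, and reference values used in the usage note are arbitrary illustrative numbers.

```python
from typing import List


def simulate_utilization_control(kp: float, ga: float, u_ref: float,
                                 steps: int, b0: float = 0.0) -> List[float]:
    """Return the trace U(0..steps-1) of the closed loop under P control."""
    b = b0
    trace: List[float] = []
    for _ in range(steps):
        u = min(ga * b, 1.0)     # Equations 4.3 to 4.5: measured utilization
        trace.append(u)
        d_b = kp * (u_ref - u)   # Equation 4.12(a): P control on the error
        b = b + d_b              # Equation 4.1: integrator in the QoS Actuator
    return trace
```

With kp = 0.25, ga = 2, and a reference of 0.9, the utilization starts at 0, reaches 0.45 after one sampling period, and converges to 0.9 even though the execution time estimates are off by a factor of two.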

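[Figure 4.3 (diagrams not reproduced). (a) Miss Ratio Control: the reference MS z/(z-1) minus the fed-back output M(z) drives the Controller KP, which produces DB(z); the block GA / (z-1) produces A(z); the disturbance L(z) is added; the block GM produces M(z). (b) Utilization Control: the same loop with reference US z/(z-1) and output U(z), without the GM block.]

Figure 4.3. Closed-Loop System Model for Real-Time CPU Scheduling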

4.4.2. Closed-Loop System Model

For the purpose of control design, the system output is the controlled variable O(k) (miss

ratio M(k) or utilization U(k)). There are two input signals to a closed loop system with a

single (miss ratio or utilization) Controller.

Reference Input and Arrival Overload

The first input is the performance reference S (i.e., MS or US) modeled as a step signal,

Sz/(z-1) in the z domain. Note that with the arrival-time QoS control mechanism in our

FC-RTS architecture, the particular form of load profiles does not affect the system’s

response because the actual tasks admitted into the system are always determined by the

QoS Actuator. Therefore, the system response to the reference input corresponds to the

system performance in response to arrival overload. Given the model of the controlled system P(z) (Equation 4.11) and the Controller C(z) (Equation 4.12), we can establish the same closed-loop transfer function for both miss ratio and utilization control in response to the reference input (see the block diagrams in Figure 4.3):

$$H_S(z) = \frac{C(z)P(z)}{1 + C(z)P(z)} = \frac{K_P G}{z - (1 - K_P G)} \qquad \text{(a)}$$

$$O(z) = H_S(z)\,\frac{S z}{z - 1} \qquad \text{(b)}$$

Equation 4.13

where G = GA in utilization control, and G = GAGM in miss ratio control.
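As a worked check of Equation 4.13(a), the algebra can be written out explicitly. This sketch assumes the forms suggested by Figure 4.3, namely a proportional Controller C(z) = K_P and a controlled system P(z) = G/(z-1); Equations 4.11 and 4.12 themselves are defined earlier in the chapter.

```latex
\begin{aligned}
H_S(z) &= \frac{C(z)P(z)}{1 + C(z)P(z)}
        = \frac{K_P \dfrac{G}{z-1}}{1 + K_P \dfrac{G}{z-1}}
        = \frac{K_P G}{z - 1 + K_P G}
        = \frac{K_P G}{z - (1 - K_P G)}
\end{aligned}
```

Under these assumptions the single closed-loop pole sits at z = 1 - K_P G.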

Disturbance Input: Internal Overload

The second input to the closed-loop system is the internal overload when admitted tasks’

CPU utilizations vary. The internal overload can be modeled as a disturbance that adds to

the total requested utilization A(k) (see Figure 4.3(a)). In particular, a step load SL(Ln, Lm)

is modeled as a step signal L(k) that jumps instantaneously from 0 to (Lm – Ln), or L(z) =

(Lm – Ln)z/(z-1) in the z domain; a ramp load RL(Ln, Lm, TR) is modeled as a ramp signal

L(k) that linearly increases from 0 to (Lm – Ln) in a duration of TR sec. Note that in the

case of internal overload input, the specific load profile decides the input signal and

therefore has a direct impact on the system performance. In this thesis, we focus our

analysis on the step load profile because it represents more severe load variations than the

ramp load with a finite duration. Regarding the disturbance input, the transfer function for utilization control and the system output in response to the internal overload are as follows.

$$H_D(z) = \frac{1}{1 + C(z)P(z)} = \frac{z - 1}{z - (1 - K_P G_A)} \qquad \text{(a)}$$

$$O(z) = H_S(z)\,\frac{S z}{z - 1} + H_D(z)\,L(z) \qquad \text{(b)}$$

Equation 4.14

The above transfer function is also applicable to miss ratio control, except that the disturbance input should be transformed to GML(k), or GML(z) in the z domain, to take into account the extra GM term in Figure 4.3(a).

4.4.3. Control Tuning and Analysis

We now present the tuning and analysis of the utilization Controller and the miss ratio

Controller based on the analytical models described in Equation 4.13(a) and Equation

4.14(a). According to control theory, the performance profile of a system depends on the

poles of its closed loop transfer function. Based on Equation 4.13(a) and Equation

4.14(a), we can place the closed loop pole p = 1-KPG at the desired location by choosing

the right value for the control parameter KP. We now present the details of using control

theory to derive KP to achieve desired performance profile.

Stability Condition: The sufficient and necessary condition for the utilization and

the miss ratio control to guarantee stability is:

0 < KP < 2/G Equation 4.15

Proof: According to control theory, a system is stable if and only if all the poles {pj | 0 ≤ j < n} (n is the total number of poles) of its transfer function are inside the unit circle of the z-plane [33]:

|pj| < 1 (0 ≤ j < n)

From Equation 4.13(a) and Equation 4.14(a), the only pole of the utilization and the miss

ratio control system in response to the arrival overload and the internal overload is

p0 = 1 - KPG Equation 4.16

Hence, the utilization control and the miss ratio control guarantee stability if and only if

|1 - KPG| < 1

Therefore, the sufficient and necessary condition of stability is Equation 4.15.
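The stability condition can be checked numerically. The following is a minimal Python sketch, not code from this work: it iterates the first-order recurrence implied by Equation 4.13(a), o(k+1) = o(k) + KP·G·(S − o(k)), in which the tracking error is multiplied by the pole p0 = 1 − KP·G at every sampling period. The function name `tracking_error` and the numeric values are illustrative assumptions.

```python
def tracking_error(kp, g, s=1.0, steps=200):
    """Iterate o(k+1) = o(k) + kp*g*(s - o(k)) from o(0) = 0 and
    return the remaining error |s - o(steps)|."""
    o = 0.0
    for _ in range(steps):
        o += kp * g * (s - o)
    return abs(s - o)


G = 2.0  # assumed process gain, for illustration only

inside = tracking_error(kp=0.37 / G, g=G)   # pole 0.63, inside the unit circle
edge = tracking_error(kp=2.0 / G, g=G)      # pole -1, on the unit circle
outside = tracking_error(kp=2.5 / G, g=G)   # pole -1.5, outside the unit circle

print(inside, edge, outside)
```

With KP inside the range of Equation 4.15 the error vanishes; at KP = 2/G the output oscillates without converging, and beyond it the error grows without bound.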

We derive the steady state performance of the utilization and the miss ratio control

system by applying the Final Value Theorem to the system output in Equation 4.13(b)

and Equation 4.14(b). The following steady state analysis assumes that the stability

condition in Equation 4.15 is satisfied.

Steady state error (arrival overload): In response to an arrival overload, the miss

ratio and the utilization control guarantee zero steady state error, i.e., ESC = 0

Proof: Let O(z) be the output of a stable system. The Final Value Theorem [33] of digital control theory shows that the system output converges to the final value

$$O(\infty) = \lim_{z \to 1} (z - 1)\,O(z)$$

From Equation 4.13(b), the output of the utilization and the miss ratio control in response to an arrival overload is

$$O(z) = \frac{K_P G}{z - (1 - K_P G)} \cdot \frac{S z}{z - 1}$$

where S represents the performance reference.

Applying the Final Value Theorem to the above equation, the final value of the utilization and miss ratio control is

$$O(\infty) = \lim_{z \to 1} (z - 1)\,O(z) = \lim_{z \to 1} (z - 1)\,\frac{K_P G}{z - (1 - K_P G)} \cdot \frac{S z}{z - 1} = S$$

Equation 4.17

It follows that the steady state error ESC = S - O(∞) = 0.

Steady state error (internal overload): In response to an internal overload, the miss ratio and the utilization control achieve zero steady state error, i.e., ESC = 0.

Proof: From Equation 4.14(b), the system output of the utilization and miss ratio control in response to an internal overload SL(Ln, Lm) is

$$O(z) = \frac{K_P G}{z - (1 - K_P G)} \cdot \frac{S z}{z - 1} + \frac{z - 1}{z - (1 - K_P G)} \cdot \frac{\Delta L\, z}{z - 1}$$

where ∆L = Lm – Ln for the utilization control, and ∆L =

GM(Lm – Ln) for the miss ratio control.

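Applying the Final Value Theorem to the above equation, the final value of the utilization control and the miss ratio control is

$$O(\infty) = \lim_{z \to 1} (z - 1)\,O(z) = \lim_{z \to 1} (z - 1)\left[\frac{K_P G}{z - (1 - K_P G)} \cdot \frac{S z}{z - 1} + \frac{z - 1}{z - (1 - K_P G)} \cdot \frac{\Delta L\, z}{z - 1}\right] = S$$

It follows that the steady state error ESC = S - O(∞) = 0.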

Sensitivity: Assuming stability, the steady-state performance of the utilization control and the miss ratio control has zero sensitivity with regard to task execution times, inter-arrival-times, and miss ratio factor.

Proof: According to the definition in Equation 4.11, G = Ga(k) for the utilization control, and G = Ga(k)Gm(k) for the miss ratio control. The variation in Ga(k) represents the variation in the task execution times and/or inter-arrival-times, and the variation in Gm(k) represents the variation in the miss ratio factor. From Equation 4.17 and the corresponding final value derived above for the internal overload, the final output of the utilization and miss ratio control system in response to the arrival overload and the internal overload always equals the performance reference S for any value of G that satisfies the stability condition (Equation 4.15). It follows that SG = 0.

In summary of our steady state analysis, we have proven

that, under the stability condition in Equation 4.15, the

utilization control and the miss ratio control always

achieve the performance reference in steady state in

response to arrival and internal overload. Furthermore,

we have also shown that this guarantee is robust with

regard to task execution times, inter-arrival-times, and

the miss ratio factor.
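The steady state and sensitivity results can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the simulator used in this work: it implements the linear utilization loop of Figure 4.3(b) as A(k+1) = A(k) + G·KP·(S − U(k)) with U(k) = A(k) + L(k), and the name `final_output`, the step time, and the numeric values are hypothetical.

```python
def final_output(kp, g_actual, s, delta_l, steps=500, overload_at=100):
    """Simulate the linear utilization loop and return the last measured output.

    A(k+1) = A(k) + g_actual * kp * (s - U(k)),  U(k) = A(k) + L(k),
    where the internal overload L(k) steps from 0 to delta_l at sample overload_at.
    """
    a = 0.0
    u = 0.0
    for k in range(steps):
        load = delta_l if k >= overload_at else 0.0
        u = a + load                   # measured output, including the disturbance
        a += g_actual * kp * (s - u)   # integrator in the QoS Actuator
    return u


KP = 0.37 / 2.0  # designed for an assumed worst-case gain of 2.0

for g in (0.8, 2.0, 10.0):  # actual gains, all inside the stability range
    print(g, final_output(KP, g, s=0.9, delta_l=0.3))
```

For every gain in the stable range the output returns to the reference S = 0.9 after the overload, which is the zero-sensitivity property stated above; only the speed of recovery differs.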

Transient State Performance

According to control theory, for the system transfer function Equation 4.13(a), the overshoot remains zero in response to arrival overload if the closed-loop pole p0 ≥ 0. From Equation 4.16, the only pole is p0 = 1 - KPG. Hence the utilization control and the miss ratio control achieve zero overshoot if and only if

0 < KP ≤ 1/G

The settling time decreases as the Controller parameter increases in the above range, because the pole p0 moves from 1 toward 0.

We place the pole at p0 = 0.63 by setting KP = 0.37/G, or:

Miss Ratio Control: KP = 0.37/(GAGM) (a)

Utilization Control: KP = 0.37/GA (b)

Equation 4.18

The above values for the Controller parameter KP have the following properties based on control analysis.

1) The parameters in Equation 4.18 satisfy the stability condition in Equation 4.15.

2) Since the control parameter values in Equation 4.18 also satisfy the zero overshoot condition, the overshoot in response to the reference input is:

Miss Ratio Control: MO = 0 Mmax = MS (a)

Utilization Control: UO = 0 Umax = US (b)

Equation 4.19

3) However, the Controller cannot affect the overshoot in response to the disturbance

input, which directly changes the output before any control action could take place:

Miss Ratio Control: MO = GM(Lm – Ln)/MS Mmax = MS + GM(Lm – Ln) (a)

Utilization Control: UO = (Lm – Ln)/US Umax = US + Lm – Ln (b)

Equation 4.20

4) Regarding the system as in steady state if its output O(k) is within ε% = 2% of its final value, the above pole placement corresponds to the same settling time in response to the reference input and the disturbance input:

Miss Ratio/Utilization Control: Ts = 4.5 sec Equation 4.21
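The 4.5 sec figure can be traced with a short calculation. The sketch below assumes a sampling period of W = 0.5 sec; that value is not stated in this section and is used here only because it reproduces Equation 4.21. With the pole at p0 = 0.63, the deviation from the final value shrinks by a factor of p0 per sampling period for both inputs, and the system is taken as settled once the deviation falls to 2% of its initial size.

```latex
\begin{aligned}
\text{reference step:}\quad & S - o(k) = S\,p_0^{\,k}, \qquad
\text{disturbance step:}\quad o(k) - S = \Delta L\,p_0^{\,k} \\
p_0^{\,k} \le 0.02 \;&\Longrightarrow\; k \ge \frac{\ln 0.02}{\ln 0.63} \approx 8.47
\;\Longrightarrow\; k = 9 \\
T_s &= 9\,W = 9 \times 0.5\ \text{sec} = 4.5\ \text{sec}
\end{aligned}
```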

However, the above settling time is not applicable to the miss ratio control in response to arrival overload because the miss ratio M(k) saturates at 0. Assume an arrival overload hits an idle system at time 0. The miss ratio control observes M(0) = 0, which results in a control signal of KP(MS – M(0)) = KPMS. Since MS is typically small, the control signal is also small. Due to saturation, the miss ratio stays at 0 and the control signal remains small. This property can cause the utilization and miss ratio to increase more slowly than in the case of the linear model and result in a longer settling time than Equation 4.21. One solution is to assign a high initial value to the estimated requested utilization B(k) when the system is idle, which helps to push the system out of the saturation zone faster than a zero initial B(k). This solution is adopted in our Evaluation Experiment B in Section 4.5.9.

Based on the above analysis, we have the following conclusions on the transient

performance of the closed-loop system.

• Arrival Overload

From Equation 4.21, in response to an arrival overload the output settles to within 2% of the performance reference in 4.5 sec. Furthermore, Equation 4.19(a) ensures that with miss ratio control, the system miss ratio never exceeds the miss ratio reference in response to an arrival overload. Similarly, Equation 4.19(b) ensures that with utilization control, the CPU utilization never exceeds the utilization reference in response to an arrival overload. We use MATLAB to plot the closed-loop system’s step response to a unit reference input in Figure 4.4.

[Plot omitted: step response, amplitude vs. time (sec); closed loop from r to y, settling time 4.5]

Figure 4.4. System Response to Reference Input

[Plot omitted: step response, amplitude vs. time (sec); system Gd, settling time 4.5]

Figure 4.5. System Response to Disturbance Input

• Internal Overload

From Equation 4.21, the system output can recover to within 2% of the performance reference in 4.5 sec after the beginning of an internal step overload. However, Equation 4.20(a) and Equation 4.20(b) show that the system suffers from a non-zero overshoot during the transient state in response to an internal step overload. With miss ratio control, the system miss ratio M(k) can overshoot the reference MS by GM(Lm-Ln). With utilization control, the CPU utilization can overshoot the reference US by (Lm-Ln). We use MATLAB to plot the closed-loop system’s step response to a unit disturbance input (the reference input was set to zero) in Figure 4.5.

Impact of System/Workload Variations on Performance Profiles

Because a real-time system is usually a time-varying system (as discussed in Section 4.3), an important issue is how the variations in the system and workload affect the above analysis results, which are based on fixed values of system parameters. Specifically, since Ga(k) and Gm(k) may differ from the worst-case utilization ratio GA and miss ratio factor GM, we analyze below how changes in the miss ratio factor Gm(k) and utilization ratio Ga(k) affect the performance profile of the closed-loop system.

• Stability

Based on the stability condition in Equation 4.15 and the Controller parameters in Equation 4.18(a) and Equation 4.18(b), we can derive the range of Gm(k) and Ga(k) such that the system stability is guaranteed.

Miss Ratio Control: 0 < Ga(k)Gm(k) < 5.4GAGM

Utilization Control: 0 < Ga(k) < 5.4GA

Equation 4.22

Note that since we usually compute the Controller parameter KP based on worst-case estimations such that GA > Ga(k) > 0 and GM > Gm(k) > 0, our closed-loop system guarantees stability. Furthermore, even if the actual system parameters exceed the design-time estimations (due to estimation error or dramatic system change), stability is still guaranteed by the closed-loop system as long as Ga(k) and Gm(k) stay within the above stability range.
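The factor 5.4 in Equation 4.22 follows from substituting the design values of Equation 4.18 into the stability condition of Equation 4.15, applied to the actual process gain. The symbol G(k) is introduced here only as shorthand: G(k) = Ga(k) with G = GA for utilization control, and G(k) = Ga(k)Gm(k) with G = GAGM for miss ratio control.

```latex
\begin{aligned}
0 < K_P\,G(k) < 2, \quad K_P = \frac{0.37}{G}
\;&\Longrightarrow\; 0 < G(k) < \frac{2}{0.37}\,G \approx 5.4\,G
\end{aligned}
```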

• Steady State Performance

We have proven that both miss ratio control and utilization control can achieve their

performance references in steady state as long as the systems remain stable. Therefore,

both the miss ratio control and utilization control provide robust and accurate

performance guarantees in steady state regardless of the actual values of miss ratio factor

and utilization ratio if they stay in the stability range (Equation 4.22).

[Plot omitted: settling time Ts (y-axis, 0 to 14) vs. process gain G (x-axis, 0.5 to 2.5), where G = GaGm for miss ratio control and G = Ga for utilization control]

Figure 4.6. Settling Time vs. Process Gain

• Transient Performance

Unlike stability and steady state performance, the closed-loop system’s settling time is sensitive to the variations in miss ratio factor Gm(k) and utilization ratio Ga(k). Assume we use an estimation of GA = 2.0 to compute the utilization control parameter KP = 0.37/GA = 0.185 (as in our experiments in Section 4.5); we then plot the theoretical settling time corresponding to different process gains G in Figure 4.6. The settling time decreases from 12.5 sec to 4 sec as the process gain G increases from 0.8 to 2.2. This result shows that with the same Controller parameter KP, the system reacts faster to overload when its utilization ratio and miss ratio factor are larger. Therefore, a P Controller with a fixed parameter KP cannot guarantee a fixed settling time. Instead, if the range of the process gain G is known, a range of settling time can be guaranteed. For example, if we know that the process gain stays in the range 0.8 ≤ G ≤ 2.0, the settling time can be guaranteed to be in the range of 4.5 ≤ TS ≤ 12.5 (sec) as shown in Figure 4.6.
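The relationship between process gain and settling time can be tabulated with a few lines of Python. This is an illustrative sketch, not the MATLAB analysis used for the figure: it counts the sampling periods needed for the error factor |p|^k, with p = 1 − KP·G, to fall to 2%. The 0.5 sec sampling period is an assumption (it is not stated in this section) chosen because it reproduces the 12.5 sec, 4.5 sec, and 4 sec values quoted above; `settling_time` is a hypothetical helper name.

```python
import math

SAMPLING_PERIOD = 0.5  # sec; assumed, not stated in this section


def settling_time(kp, g, tol=0.02, period=SAMPLING_PERIOD):
    """Return the time until |p|**k first drops to tol or below,
    where p = 1 - kp*g is the closed-loop pole (math.inf if unstable)."""
    p = abs(1.0 - kp * g)
    if p >= 1.0:
        return math.inf
    k, err = 0, 1.0
    while err > tol:
        err *= p
        k += 1
    return k * period


KP = 0.37 / 2.0  # utilization Controller tuned for GA = 2.0

for g in (0.8, 1.2, 1.6, 2.0, 2.2):
    print(f"G = {g:.1f}  ->  Ts = {settling_time(KP, g):.1f} sec")
```

A gain outside the stability range of Equation 4.22 yields an infinite settling time in this sketch.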

Similarly, the overshoot is also sensitive to the variations in the process gain. For our closed-loop transfer function in response to arrival overload (Equation 4.13(a)), the overshoot remains zero if the closed-loop pole p ≥ 0. Therefore, the system can achieve zero overshoot in response to an arrival overload if the miss ratio factor and the utilization ratio satisfy:

Miss Ratio Control: 0 < Ga(k)Gm(k) < 2.7GAGM

Utilization Control: 0 < Ga(k) < 2.7GA

Equation 4.23

In summary, given the system parameters, the worst-case utilization ratio GA, and the

miss ratio factor GM, we can directly derive the control parameter KP based on Equation

4.18(a) and Equation 4.18(b) to guarantee a set of performance profiles including

stability, zero steady state error, and a satisfactory range of transient performance. Note

that the analytical tuning method of the control parameter is significantly easier and less

time consuming than ad hoc approaches based on repeated simulation experiments. This

is one important advantage of using our control-theory based FCS framework instead of

ad hoc solutions.

4.4.4. FC-RTS Algorithms

In this section, we present the design of FC-RTS algorithms based on the utilization

and/or miss ratio control to achieve satisfactory performance profiles in different types of

real-time systems. We also discuss the impact of the basic scheduling policy and

workloads on the FC-RTS algorithm’s design.

FC-U: Feedback Utilization Control

The FC-U scheduling algorithm uses a utilization control loop (see Figure 4.3(b)) to

control the utilization U(k). FC-U can guarantee that the system has zero miss ratio in

steady state if its reference US ≤ Ath where Ath is the schedulable utilization threshold of

the system.

Because CPU utilization U(k) saturates at 100%, FC-U cannot detect how severely the system is overloaded when U(k) remains at 100%. The consequence of this problem is that FC-U can have a longer settling time than the analysis results based on the linear model in severely overloaded conditions. The closer the reference is to 100%, the longer the settling time will be. This is because the utilization control measures an error with a smaller magnitude and thus generates a smaller control input than the ideal case described by the linear model (Equation 4.11). For example, suppose the total requested utilization A(k) = 200% and the utilization reference is 99%. The error measured by the Controller would be EU = US – U(k) = 0.99 – 1 = -0.01; however, the error would have been EU = US – U(k) = 0.99 – 2 = -1.01 according to the linear model described by Equation 4.11. In the extreme case, US = 100% can cause the system to stay in overload (a settling time of infinity) because the error EU = 0 even when the system is severely overloaded. Therefore, the reference US should be kept sufficiently below 100% (e.g., US ≤ 90%) to alleviate the impact of saturation on the control performance.
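The effect of utilization saturation on settling time can be sketched in Python. This is a simplified illustration under stated assumptions, not the FC-U implementation: the measured utilization is modeled as U(k) = min(A(k), 1), the loop gain KP·G is fixed at 0.37, the system starts at A(0) = 200%, and `steps_to_settle` is a hypothetical helper that counts sampling periods until A(k) is within 2% of the reference.

```python
def steps_to_settle(us, a0=2.0, kp_g=0.37, saturate=True, tol=0.02, max_steps=10_000):
    """Count sampling periods until A(k) is within tol*us of the reference us.

    With saturate=True the Controller only sees U(k) = min(A(k), 1.0);
    with saturate=False it sees A(k) itself, as in the linear model.
    Returns None if the loop has not settled within max_steps.
    """
    a = a0
    for k in range(max_steps + 1):
        if abs(a - us) <= tol * us:
            return k
        u = min(a, 1.0) if saturate else a
        a += kp_g * (us - u)
    return None


print("linear model, US = 99%:", steps_to_settle(0.99, saturate=False))
print("saturated,    US = 90%:", steps_to_settle(0.90))
print("saturated,    US = 99%:", steps_to_settle(0.99))
print("saturated,    US = 100%:", steps_to_settle(1.00))
```

In this sketch the saturated loop takes far longer than the linear model predicts, the delay grows as US approaches 100%, and with US = 100% the measured error is zero so the overload is never corrected.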

FC-U is especially appropriate for systems with a utilization bound that is a priori

known and not pessimistic. In such systems, FC-U can guarantee a zero miss ratio in

steady state if its reference US ≤ Ab ≤ Ath. For example, FC-U can perform well in a

system with EDF scheduling and a periodic and independent task set because its

utilization bound is 100%. However, FC-U is not applicable for systems whose utilization

bounds are unknown or significantly pessimistic. In such systems, a reference that is too

optimistic (higher than the utilization threshold) can cause high miss ratio even in steady

state. On the other hand, a reference that is too pessimistic can unnecessarily underutilize

the system.

FC-M: Feedback Miss Ratio Control

The FC-M scheduling algorithm uses a miss ratio control loop (see Figure 4.3(a)) to

directly control the system miss ratio M(k) (FC-M has been called FC-EDF if EDF is

plugged into the Basic Scheduler [52]). Compared with FC-U, the advantage of FC-M is

that it does not depend on any knowledge about the utilization bound and therefore is

applicable in many real-world systems. In the process of directly controlling the miss

ratio, the miss ratio control loop always changes the total requested utilization A(k) to the

vicinity of the (unknown) utilization threshold Ath(k). An additional advantage of FC-M is

that it can achieve higher CPU utilization than FC-U because the utilization threshold is

often higher than the utilization bound.

Similar to FC-U, FC-M has restrictions on the miss ratio reference MS due to

saturation. Because miss ratio M(k) saturates at 0, FC-M cannot detect how severely the

system is underutilized. Therefore FC-M can have a longer settling time than the analysis

results based on the linear model (Figure 4.3(a)) in severely underutilized conditions, and

the settling time increases as the miss ratio reference decreases. This is because the miss

ratio control measures an error of a smaller magnitude and generates a smaller control

input than the case of the linear model (Equation 4.11). For example, suppose the total

requested utilization A(k) = 10% and the miss ratio reference is MS = 1%, the error

measured by the Controller would be EM = MS – M(k) = 0.01 – 0 = 0.01; however, the

error would have been much larger according to the linear model because it would have a

“negative” miss-ratio. In the extreme case, MS = 0 can cause the CPU to remain

underutilized because the error EM = 0 even when the system is severely underutilized.

Therefore, the miss ratio reference should have some distance from the saturation

boundary 0 (e.g., MS ≥ 1%) to alleviate the impact of saturation on the control

performance. Unfortunately, a positive miss ratio reference also means that the system

cannot achieve zero miss ratio in steady state.

In summary, the FC-M scheduling algorithm (with a small positive miss ratio

reference) can achieve low deadline miss ratio (close to MS) and high CPU utilization

even if the system’s utilization bound is unknown or time varying [52]. Since FC-M

cannot guarantee zero deadline miss ratio in steady state, it is only applicable to soft real-

time systems that can tolerate sporadic deadline misses in steady state.

[Block diagram omitted: the Monitor feeds M(k) and U(k) back to two Controllers with references MS and US; the minimum of their outputs DBM and DBU is sent as DB to the QoS Actuator, which adjusts the QoS of the current tasks run by the Basic Scheduler on the CPU.]

Figure 4.7. The FC-UM Algorithm

FC-UM: Integrated Utilization/Miss Ratio Control

The FC-UM algorithm (also called FC-EDF2 if EDF is plugged into the Basic Scheduler

[50]) integrates miss-ratio control and utilization control (Figure 4.7) together to achieve

the advantages of both FC-U and FC-M. In this integrated control scheme, both miss-

ratio M(k) and utilization U(k) are monitored. At each sampling instant, M(k) and U(k)

are fed back to two separate Controllers, the miss ratio Controller and the utilization

Controller, respectively. Each Controller then computes its control signal independently.

The control input of the utilization control DBU(k) is compared with the miss-ratio control

input DBM(k), and the smaller one DB(k) = min(DBU(k), DBM(k)) is sent to the QoS

Actuator.

Note that the advantage of FC-U is that it can achieve excellent performance (M(k) =

0) in steady state if the utilization reference is correct, while the advantage of FC-M is

that it can always achieve low (but non-zero) miss ratio and therefore is more robust in

face of utilization threshold variations. The integrated control structure can achieve the

advantages of both controls because of the following reasons. If used alone, the

utilization control would change the total requested utilization A(k) to its reference US in

steady state, and the miss ratio control loop would change A(k) to the vicinity of the

utilization threshold Ath(k) in steady state. Due to the min operation on the two control

inputs, the integrated control loop would change the total requested utilization to the

lower value caused by the two control loops, min(Ath(k), US). The implication of this

feature is that the integrated control loop always achieves the performance of the

relatively more conservative control loop in steady state. Specifically, in a system

scheduled by FC-UM, if US ≤ Ath(k), the utilization control dominates in steady state and guarantees that the total requested utilization A(k) stays close to its utilization reference US and thus miss ratio M(k) = 0 in steady state. On the other hand, if US > Ath(k), the miss ratio control dominates in steady state and guarantees that the total requested utilization stays close to the utilization threshold Ath(k) and miss ratio M(k) = MS in steady state.
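The min operation can be illustrated with a toy simulation. The sketch below is not the FC-UM implementation and rests on several assumptions: both Controllers are proportional with the gains of Equation 4.18, the measured utilization is min(A, 1), and the miss ratio is modeled as a piecewise-linear function GM·(A − Ath) clipped to [0, 1], which stands in for the controlled system model of Equation 4.11. The function name and default values are hypothetical.

```python
def fc_um_steady_state(u_s, a_th, m_s=0.01, g_a=1.0, g_m=1.0, a0=2.0, steps=2000):
    """Run the sketched FC-UM loop and return (A, U, M) after `steps` periods."""
    kp_u = 0.37 / g_a            # utilization Controller gain
    kp_m = 0.37 / (g_a * g_m)    # miss ratio Controller gain
    a = a0

    def measure(a):
        u = min(a, 1.0)                            # utilization saturates at 100%
        m = min(max(g_m * (a - a_th), 0.0), 1.0)   # assumed miss ratio model
        return u, m

    for _ in range(steps):
        u, m = measure(a)
        db_u = kp_u * (u_s - u)          # utilization control input DBU(k)
        db_m = kp_m * (m_s - m)          # miss ratio control input DBM(k)
        a += g_a * min(db_u, db_m)       # the smaller input goes to the QoS Actuator
    u, m = measure(a)
    return a, u, m


print(fc_um_steady_state(u_s=0.7, a_th=0.9))  # US <= Ath: utilization control dominates
print(fc_um_steady_state(u_s=0.9, a_th=0.8))  # US > Ath: miss ratio control dominates
```

Under these assumptions the first case settles at A = US with M = 0, and the second settles slightly above Ath with M = MS, matching the behavior described above.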

Therefore, in a system with the FC-UM scheduler, the system administrator can simply set the utilization reference US to a value that causes no deadline misses in the nominal case (e.g., based on system profiling or experience), and set the miss ratio reference MS according to the application’s requirement on miss ratio. FC-UM can guarantee zero deadline misses in the nominal case while guaranteeing that the miss ratio stays close to MS even if the utilization threshold of the system becomes lower than the utilization reference. Our experimental results demonstrate that FC-UM achieves

satisfactory performance. The rigorous analysis of the integrated Controller is left for our

future work.

Impacts of Scheduling Policies and Applications on FC-RTS Algorithm Design

An important factor that affects the design of FC-RTS algorithms is whether an a

priori known and non-pessimistic utilization bound exists for the scheduling policy and

workload of a system. Existing real-time scheduling theory has derived the schedulable

utilization bound for various scheduling policies based on different workload

assumptions. For example, assuming a periodic and independent task set, it has been

established that the schedulable utilization bound of EDF and RM is 100% and 69%,

respectively [48]. It was recently proven that the schedulable utilization bound for

Deadline Monotonic scheduling is 58% for general aperiodic and periodic tasks in the

ideal case [7]. Other papers established schedulable utilization bounds for other types of

workloads (e.g., [40][71]). Since FC-U can guarantee miss ratio M(k) = 0 in steady state

if its utilization reference US ≤ Ab, the utilization reference should be determined based

on the scheduling policy and workload of a system. For example, for an independent and

periodic task set scheduled by EDF, a US = 90% is sufficient to guarantee that miss ratio

stays at 0 in steady state. Because FC-U can achieve zero steady state miss ratio, it is the most appropriate FC-RTS algorithm for systems with a known and non-pessimistic

utilization bound. FC-UM can also achieve zero steady state miss-ratio in this type of

system, but it is more complicated than FC-U.

Unfortunately, the utilization bounds of many unpredictable real-time systems are

still unknown. For example, in a typical on-line trading server, database transactions and

Web request processing can be blocked frequently due to concurrency control, disk I/O,

and TCP congestion control. The task arrival patterns may also vary considerably

because its workload is composed of periodic price updating tasks and unpredictable and

aperiodic stock trading request processing. Deciding a utilization bound on top of

commercial OS’s can become even more difficult due to unpredictable kernel activities

such as interrupt handling. Another issue is that a theoretical utilization bound can be severely

pessimistic for the specific workload currently in a system. For example, although the

utilization bound of Rate Monotonic is 69% for periodic independent tasks, uniformly

distributed task sets often do not suffer deadline misses even when the CPU utilization

reaches 88% [44]. Enforcing the utilization at the utilization bound may not be cost-

effective in soft real-time systems. FC-M and FC-UM are more appropriate than FC-U

for systems without a known and non-pessimistic utilization bound.

We should note that different scheduling policies and workloads usually introduce

different miss ratio factors GM. Because the gain KP of the miss ratio Controller should be inversely proportional to the miss ratio factor (Equation 4.18(a)), the scheduling policy and workload can directly affect the correct parameter of the miss ratio Controller. For example, our previous experiments showed that while the EDF algorithm with a periodic task set led to a miss ratio factor GM = 1.254, the Extended Deadline Monotonic (DM) algorithm with a mixed periodic and aperiodic task set has a much smaller miss ratio factor GM = 0.447 (see Section 4.5.5). This result means that for DM with the mixed task set, the KP of the miss ratio Controller should be 2.81 times the KP of EDF with the periodic task set in order to achieve similar performance profiles.

In summary, we have designed three FC-RTS algorithms (FC-U, FC-M, and FC-UM)

using control theory based on an analytical model for a real-time system. Our control

theory analysis proves that the resultant FC-RTS algorithms can achieve the following

performance guarantees under the stability condition in Equation 4.22:

(1) Guaranteeing stability,

(2) Guaranteeing that the system miss ratio and utilization remain close to the

corresponding performance reference in steady state, and

(3) Satisfactory settling time (Figure 4.6) and zero overshoot under condition of

Equation 4.23 in transient state.

This design methodology is in contrast with existing ad hoc design methods that

depend on laborious design and testing iterations. We also investigate the impacts of

scheduling policies and workloads on the design of FC-RTS algorithms.

4.5. Experiments

In this section, we describe the simulation experiments that evaluate the performance of

our FC-RTS algorithms and the correctness of our control design. We first describe a

real-time CPU scheduling simulator used for our experiments. We then describe the

configurations of the experiments and workloads. A set of profiling experiments on the

controlled system is presented next. We then present two sets of evaluation experiments

on our FC-RTS algorithms.

4.5.1. FECSIM Real-Time System Simulator

The FC-RTS architecture is implemented on a generic uniprocessor real-time system

simulator called FECSIM [52]. FECSIM (Figure 4.8) has five components: a set of

Sources that each generates a periodic or aperiodic task; an Executor that emulates the

Basic Scheduler and the execution of the tasks; a Monitor that periodically measures

controlled variables; a Controller that periodically computes the control input based on

the performance errors; and a QoS Actuator that adjusts the QoS levels of the tasks to

optimize the system value (based on estimated utilizations) under the utilization

constraints. Different basic real-time scheduling policies can be plugged into the

Executor. The Controller can be turned on/off to emulate the closed loop or open loop

scheduling. The QoS Actuator can also be turned off for system profiling experiments

(see Section 4.5.5).

[Block diagram omitted: Sources 1 to n feed the Executor (ready_q, Scheduling Policy), which finishes or aborts tasks; the Monitor measures the controlled variables M(k) and/or U(k); the Controller compares them with the performance references and sends the control input DB(k) to the QoS Actuator (QoS Optimization Algorithm), which adjusts QoS.]

Figure 4.8. The FECSIM Simulator

4.5.2. Scheduling Policy of the Basic Scheduler

To demonstrate the generality of our FC-RTS architecture and the robustness of our FC-

RTS algorithms, we present experimental results with two combinations of task sets and

scheduling policies in the basic scheduler. We denote these two combinations DM/PA

and EDF/P (see Table 4.1). We describe the scheduling policies in this section, and the

workloads in Section 4.5.3.

Configuration   Basic Scheduling Policy        Task Set
EDF/P           EDF                            Periodic
DM/PA           Extended Deadline Monotonic    Mixed Periodic/Aperiodic

Table 4.1. Testing Configurations

Two different scheduling policies, Extended Deadline Monotonic (DM) and EDF are

used in the Basic Scheduler.

DM: Each (periodic or aperiodic) task is assigned a fixed priority that equals its

relative deadline. A shorter relative deadline leads to a higher priority. DM has

been proved to be the optimal static scheduling policy in terms of maximizing

schedulable utilization bound under certain conditions [7].

EDF: Each (periodic or aperiodic) task is dynamically assigned a priority that

equals its absolute deadline. An earlier absolute deadline leads to a higher

priority. EDF is a major dynamic real-time scheduling policy [48][72].

4.5.3. Workload

Two different task sets are used in our evaluation experiments.

Mixed Periodic/Aperiodic (PA): the workload is composed of 50% aperiodic tasks

and 50% periodic tasks. This type of task set can be found in a typical on-line

trading server whose workload is composed of periodic stock updating tasks and

aperiodic user requests such as trading and information queries.

Periodic (P): all the tasks are periodic tasks. This type of task set emulates real-

time applications such as multi-media streaming and process control where most

of the system operations are periodic.

Each task follows the task model described in Section 4.1.1. Each task is assumed to

have three QoS levels (0, 1, 2) including the lowest level 0 that represents service

rejection. For the rejection level, both the task execution time and value are set to 0. The

distributions of the task parameters are as follows. For the purpose of presentation, we

assume each time unit is 0.1 ms.

EEi[j]: The estimated execution time ETi[2] of task Ti at the QoS level 2 follows a

uniform distribution in the range [0.2, 0.8] ms, and ETi[1] = 0.2ETi[2].

AEi[j]: The actual execution time AEi[j] of task Ti at QoS level j follows a normal distribution N(AAEi[j], AAEi[j]^(1/2)), where the average execution time AAEi[j] = Ga′ETi[j]. Ga′, called the execution time factor, is a tunable workload parameter that approximates the utilization ratio Ga. The larger Ga′ is, the more pessimistic is the estimation of execution time. The maximum value of Ga′ is 2.0 in all of our experiments, which means that the estimated execution time is twice the average execution time. Therefore, the worst-case utilization ratio is

Worst-Case Utilization Ratio: GA = 2.0 Equation 4.24

This value is used to compute the Controller parameters based on Equation 4.18(a) and Equation 4.18(b).

Di[j]: All QoS levels of a task Ti share the same fixed relative deadline Di = (10Fi

+ 10)ETi[2], where Fi follows a uniform distribution in the range [10, 15].

A task instance is immediately aborted once it misses its deadline.

Vi[j]: The value Vi[j] of task Ti at QoS level j is computed as a weight wi times its

estimated execution time, i.e., Vi[j] = wiETi[j]. The weight wi follows a

uniform distribution in the range [1, 5].

Periodic tasks:

Pi[j]: All QoS levels of a task Ti have the same period, which equals its deadline: Pi = Di.

The average utilization of each periodic task i at QoS level j is AAi[j] =

AAEi[j]/Pi.

Aperiodic tasks:

AIi[j]: The inter-arrival-time of an aperiodic task Ti follows an exponential

distribution with an average inter-arrival-time of AIi = Di. The average

utilization of each aperiodic task i at QoS level j is AAi[j] = AAEi[j]/AIi.

EIi[j]: The estimated inter-arrival-time of an aperiodic task Ti equals the average

inter-arrival-time, i.e., EIi = AIi = Di.
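The parameter distributions listed above can be sketched as a small generator. This is a minimal sketch under stated assumptions, not the FECSIM implementation: times are expressed in time units of 0.1 ms, the second parameter of the normal distribution is treated as a standard deviation, negative samples are clamped to zero, and the names TaskParams, make_task, next_interarrival, and actual_execution_time are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class TaskParams:
    et: dict         # estimated execution time per QoS level (time units of 0.1 ms)
    deadline: float  # relative deadline D_i (time units)
    value: dict      # V_i[j] = w_i * ET_i[j]
    periodic: bool


def make_task(rng: random.Random, periodic: bool) -> TaskParams:
    et2 = rng.uniform(2.0, 8.0)            # [0.2, 0.8] ms expressed in 0.1 ms units
    et = {0: 0.0, 1: 0.2 * et2, 2: et2}    # level 0 is service rejection
    f = rng.uniform(10.0, 15.0)
    deadline = (10.0 * f + 10.0) * et2     # D_i = (10 F_i + 10) ET_i[2]
    w = rng.uniform(1.0, 5.0)
    value = {level: w * t for level, t in et.items()}
    return TaskParams(et=et, deadline=deadline, value=value, periodic=periodic)


def next_interarrival(rng: random.Random, task: TaskParams) -> float:
    """Period P_i = D_i for periodic tasks; exponential with mean D_i otherwise."""
    if task.periodic:
        return task.deadline
    return rng.expovariate(1.0 / task.deadline)


def actual_execution_time(rng, task, level, ga_prime):
    """Sample AE_i[j] around the mean AAE_i[j] = Ga' * ET_i[j]."""
    mean = ga_prime * task.et[level]
    if mean == 0.0:
        return 0.0
    # Assumption: standard deviation is sqrt(mean); negative draws are clamped.
    return max(0.0, rng.gauss(mean, mean ** 0.5))


rng = random.Random(1)
periodic_task = make_task(rng, periodic=True)
aperiodic_task = make_task(rng, periodic=False)
```

A mixed PA workload would call make_task with periodic=True for half of the tasks and periodic=False for the rest.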

4.5.4. QoS Actuator

A Highest-Value-Density-First (HVDF) QoS assignment algorithm [67] is used in the

QoS Actuator. The value density of QoS level j of a task Ti is defined as VDi[j] =

Vi[j]/Bi[j]. The HVDF algorithm assigns QoS levels to all the current tasks in the order of

the decreasing value density until the total estimated requested utilization reaches a

utilization constraint UC. A fixed threshold of 80% is used by open loop scheduling

algorithms. In comparison, our FC-RTS algorithms dynamically change the threshold UC

= B(k+1) = B(k) + DB(k) at each sampling instant.

When each task’s utilization is small and there are no deadline misses, the HVDF

algorithm can approximate the optimal value under the utilization constraint. However, if

the actual requested utilization is unknown (as is the case in unpredictable

environments), the QoS optimization algorithm cannot always guarantee no deadline

misses and maximize the system value when used with an open loop scheduling

algorithm.

Note that our FC-RTS architecture can incorporate different real-time scheduling

policies and QoS optimization algorithms (although the scheduling policy does affect the

design of FC-RTS algorithms and the Controller parameters, as discussed in Section

4.4.4). Our work focuses on the steady and transient state performance of the feedback

control loop rather than evaluating the basic scheduling policies or QoS optimization

algorithms.

4.5.5. Profiling the Controlled Real-Time Systems

In the first set of experiments, we profile the controlled system to verify the saturation

properties of the controlled variables, miss ratio M(k) and CPU utilization U(k), and

measure the miss ratio factor GM, which is a key system parameter used for computing

the Controller parameter KP in miss ratio control (see Equation 4.18(a)).

Since we are interested in the properties of the controlled system, we turn off the

Controller and the QoS Actuator of FECSIM in the profiling experiments. A set of step

loads SL(0, Lm) with different overload levels Lm is used to stress FECSIM for 60

sec. Each step load is composed of a set of tasks with an average total requested

utilization of Lm. The experiments are repeated for both EDF/P and DM/PA

configurations. We plot the measured average CPU utilization and average miss ratio

corresponding to each step load level Lm in Figure 4.9(a) (DM/PA) and Figure 4.9 (b)

(EDF/P). Each point in the figures represents the average value of 5 runs. The 90%

confidence intervals of the average miss ratio are also shown, while the confidence

intervals of the average utilization are omitted because they are always within ±1% of the

corresponding average values.
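For readers who want to reproduce such intervals, one common way to compute a 90% confidence interval from 5 runs is a Student-t interval. The text does not state which interval construction was used, so the t-based formula is an assumption, and the five miss ratios below are hypothetical values, not measurements from the profiling experiments.

```python
import math
import statistics

# Two-sided 90% Student-t critical value for 4 degrees of freedom (5 runs).
T_CRIT_90_DF4 = 2.132


def ci90_five_runs(samples):
    """Return (mean, half_width) of a 90% t-interval for exactly 5 samples."""
    if len(samples) != 5:
        raise ValueError("the critical value above is only valid for 5 runs")
    mean = statistics.mean(samples)
    half_width = T_CRIT_90_DF4 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


# Hypothetical average miss ratios of 5 runs at one step load level.
miss_ratios = [0.040, 0.046, 0.043, 0.049, 0.047]
mean_miss, half_width = ci90_five_runs(miss_ratios)
```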

[Plot omitted. Panels: (a) DM/PA, (b) EDF/P. X-axis: Average Total Requested Utilization (%), 0 to 200. Y-axis: Average Miss Ratio, Average Utilization (%), 0 to 100. Series: Average Utilization, Average Miss Ratio.]

Figure 4.9. Controlled Variables vs. Total Requested Utilization

Profiling Results on DM/PA

First we study the profiling results on DM/PA. From Figure 4.9(a), we can see that CPU

utilization U(k) saturates at 100% after the step load level Lm exceeds 100%. Miss ratio

M(k) saturates at 0 when the average total requested utilization A′ is below 90%, and

deadline misses start to occur when A′ reaches 90%.

When A′ is above 90%, the system’s average miss ratio increases as the total

requested utilization increases. We measure the maximum slope of the miss ratio curve

near the boundary of the saturation zone to approximate the miss ratio factor GM. In

Figure 4.9(a), the maximum slope is 0.447 when the average total requested utilization

increases from 100% to 110%. Therefore the miss ratio factor is

Miss Ratio Factor (DM/PA): GM = 0.447 (Equation 4.25)
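The slope measurement can be sketched as a finite difference over adjacent profiling points. The function name max_slope and the profile values below are hypothetical; the points were chosen only so that the steepest segment lies between 100% and 110% requested utilization, and they are not the data behind Figure 4.9.

```python
def max_slope(points):
    """Largest finite-difference slope between adjacent (load, miss_ratio) points.

    Both coordinates are fractions (1.0 means 100%).
    """
    pts = sorted(points)
    return max(
        (m1 - m0) / (a1 - a0)
        for (a0, m0), (a1, m1) in zip(pts, pts[1:])
    )


# Hypothetical profile: (average total requested utilization, average miss ratio).
profile = [(0.9, 0.0), (1.0, 0.020), (1.1, 0.0647), (1.2, 0.100)]
g_m = max_slope(profile)
```

Taking the maximum rather than the average slope gives a worst-case gain, which is the quantity the Controller design needs.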

Profiling Results on EDF/P

Second, we study the profiling results on EDF/P. From Figure 4.9(b), we can see that

CPU utilization U(k) saturates at 100% after the step load level Lm exceeds 100%. Miss

ratio M(k) saturates at 0 when the average total utilization A′ is below 100%, and deadline

misses start to occur when A′ reaches 100% (the deadline misses when A′ = 100% are due

to random execution times of the workload).

When A′ is above 100%, the system’s average miss ratio increases as the total

requested utilization increases. In Figure 4.9(b), the maximum slope is 1.254 when the

average total requested utilization increases from 100% to 110%. Therefore the miss ratio factor is

Miss Ratio Factor (EDF/P): GM = 1.254 (Equation 4.26)

4.5.6. Controller Parameters

Based on the worst-case utilization ratio GA (Equation 4.24) of our workload, and the

worst-case miss ratio factor GM (Equation 4.25 and Equation 4.26) based on our profiling

experiments, we can compute the Controller parameter using Equation 4.18(a) and

Equation 4.18(b). The resultant Controller parameter KP for each FC-RTS algorithm is

listed in Table 4.2. All FC-RTS algorithms have a sampling window W = 0.5 sec in all

experiments.

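KP (DM/PA): FC-U 0.185; FC-M 0.414

KP (EDF/P): FC-U 0.185; FC-M 0.148

FC-UM: uses the FC-U value for its utilization control and the FC-M value for its miss ratio control

W (Sampling Window): 0.5 sec for all algorithms

Table 4.2. Controller Parameters of FC-RTS Algorithms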
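As a quick consistency check on the numbers in this section, the tabulated gains can be multiplied by the worst-case factors from Equations 4.24 to 4.26. The observation that every product is close to 0.37 is arithmetic on the values quoted here; it is not a restatement of Equation 4.18, and treating the utilization loop as having no miss ratio factor (a factor of 1.0) is an assumption of this sketch.

```python
G_A = 2.0  # worst-case utilization ratio (Equation 4.24)

# (algorithm, configuration) -> (K_P from Table 4.2, miss ratio factor G_M).
# Assumption: the utilization loop has no miss ratio factor, so 1.0 is used.
controller_table = {
    ("FC-U", "DM/PA"): (0.185, 1.0),
    ("FC-U", "EDF/P"): (0.185, 1.0),
    ("FC-M", "DM/PA"): (0.414, 0.447),  # G_M from Equation 4.25
    ("FC-M", "EDF/P"): (0.148, 1.254),  # G_M from Equation 4.26
}

worst_case_loop_gain = {
    key: kp * G_A * gm for key, (kp, gm) in controller_table.items()
}
```

Each entry of worst_case_loop_gain evaluates to roughly 0.37, which suggests the gains were scaled so that the worst-case loop gain is the same for all four configurations.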

4.5.7. Performance References

The miss ratio reference depends on the application’s requirement and tolerance to

deadline misses in steady state. For example, Amazon.com may accept a higher miss

ratio reference than E-Trade.com because merchandise purchases usually have less

strict timing constraints than stock trading transactions. We assume that a miss ratio

reference MS = 2% (in both FC-M and FC-UM) is appropriate for our simulated applications.

The utilization reference US should be lower than the nominal utilization threshold of the

basic scheduling policy and the task set. We have also discussed in Section 4.4.4 that US

should be lower than 100%, the saturation boundary of the utilization control. Since the

theoretical utilization bound of EDF and periodic task set is 100% in the ideal case [48],

we set US = 90% in both FC-U and FC-UM in the EDF/P case. Although it has been

shown that DM and general (aperiodic and/or periodic) task sets have a theoretical

utilization bound of 58%, this bound is too pessimistic for our mixed aperiodic/periodic

task set. For example, in our profiling experiments (Figure 4.9(a)), the utilization

threshold Ath appears to be in the range (90%, 100%). We choose US = 80% in FC-U and

US = 90% in FC-UM. FC-UM has a more optimistic utilization reference than FC-U

because the miss ratio control in FC-UM provides a worst-case bound for the closed-loop

performance even if the utilization reference becomes higher than the actual utilization

threshold. The chosen performance references are summarized in Table 4.3.

US: FC-U 80% (DM/PA), 90% (EDF/P); FC-M N/A; FC-UM 90%

MS: FC-U N/A; FC-M 2%; FC-UM 2%

Table 4.3. Performance References of FC-RTS Algorithms

[Plot omitted. Panels: (a) FC-U, (b) FC-M, (c) FC-UM, (d) Open-Loop Baseline. X-axis: Time (second). Y-axis: U(k); B(k); M(k) (%), 0 to 100. Series: US (where applicable), U(k), B(k), M(k).]

Figure 4.10. Response of Scheduling Algorithms to Arrival Overload SL(0, 150%) (DM/PA)

[Plot omitted. Panels: (a) FC-U, (b) FC-M, (c) FC-UM, (d) open-loop baseline. X-axis: Time (second). Y-axis: Utilization; Miss Ratio (%), 0 to 100. Series: US (where applicable), U(k), B(k), M(k).]

Figure 4.11. Response of Scheduling Algorithms to Arrival Overload SL(0, 150%) (EDF/P)

4.5.8. Evaluation Experiment A: Arrival Overload

In this section, we present the performance evaluation results of three FC-RTS

algorithms, FC-U, FC-M, and FC-UM, in response to an arrival overload SL(0, 150%).

The execution time factor Ga′ = 2. Therefore the average execution time of each task is

twice the estimate. An open loop scheduling algorithm using a fixed utilization

constraint B = 80% for QoS optimization is also evaluated as a baseline. The same

scheduling policies (DM and EDF) and QoS optimization algorithm (the HVDF

algorithm) are used for all FC-RTS algorithms and the baseline. A zero initial value

B(0) = 0 is used for the total estimated utilization B(k) in this section. A larger initial

value for B(k) is used in Experiment B (Section 4.5.9) to reduce the settling time of

FC-M and FC-UM.

The sampled miss ratio M(k) and utilization U(k) of a typical run for each scheduling

algorithm are illustrated in Figure 4.10 (DM/PA) and Figure 4.11 (EDF/P). We now

describe the results for each of the scheduling algorithms.

FC-U

The performance evaluations of FC-U with DM/PA and EDF/P are illustrated in Figure

4.10(a) and Figure 4.11(a), respectively. First we look at FC-U with DM/PA. In response to

the arrival overload, FC-U increases the CPU utilization U(k) by increasing the total

estimated utilization B(k) of the tasks in the system. The increasing B(k) is enforced by

the QoS Actuator that increases task QoS levels with the QoS optimization algorithm

HVDF. By 4.5 sec, the settling time predicted by our control analysis (see Section 4.4.3),

U(k) reaches 77.1%, which is within 3.6% of the reference US = 80%. This result is close

to our prediction that the U(k) should reach within 2% of the reference by 4.5 sec. The

small difference between the experimental results and the theoretical prediction is due to

the randomness of our workload (see footnote 2). U(k) never reaches beyond 80% in the transient state

(before 4.5 sec). This result is also consistent with our theoretical prediction of zero

overshoot, UO = 0 (see Section 4.4.3).
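The 4.5 sec prediction can be reproduced with a small simulation of the utilization loop. This sketch assumes the linear model U(k+1) = U(k) + Ga·DB(k) with DB(k) = KP(US – U(k)), which this section does not restate, ignores saturation and workload noise, and reads "within 2% of the reference" as a relative tolerance; the function name samples_to_settle is hypothetical.

```python
def samples_to_settle(kp, ga, reference, u0=0.0, tolerance=0.02, max_steps=1000):
    """Count sampling periods until U(k) is within tolerance of the reference.

    Assumed noise-free model: U(k+1) = U(k) + ga * kp * (reference - U(k)).
    """
    u = u0
    for k in range(max_steps + 1):
        if abs(reference - u) <= tolerance * reference:
            return k
        u += ga * kp * (reference - u)
    raise RuntimeError("did not settle within max_steps")


W = 0.5  # sampling window in seconds
k_settle = samples_to_settle(kp=0.185, ga=2.0, reference=0.80)
settling_time_sec = k_settle * W
```

Under these assumptions the loop needs 9 sampling periods, i.e. 4.5 sec, and U(k) approaches the reference from below without overshoot because the per-sample error factor 1 – Ga·KP = 0.63 is positive.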

The CPU utilization U(k) remains stable all through the run. After 4.5 sec, the

utilization stays close to 80% and the system error stays close to zero. Because U(k) stays

below the utilization threshold, the miss ratio M(k) = 0 throughout the run.

Footnote 2: The measured performance profiles in our performance experiments are approximations to the theoretical definitions due to the noise introduced by the random workload.

The performance of FC-U with EDF/P (see Figure 4.11(a)) is similar to that of FC-U

with DM/PA. At 4.5 sec, FC-U increases the CPU utilization U(k) to 87.14%, within

3.2% of the reference US = 90%. U(k) never reaches beyond 90% in the transient state

(before 4.5 sec). The CPU utilization U(k) remains stable all through the run and close to

90% after 4.5 sec. Because U(k) stays below the utilization threshold, the miss ratio M(k)

= 0 throughout the run.

FC-M

The performance evaluations of FC-M with DM/PA and EDF/P are illustrated in Figure

4.10(b) and Figure 4.11(b), respectively. We first study FC-M with DM/PA. In response

to the arrival overload, FC-M increases the total estimated utilization B(k) by increasing

the QoS levels of arriving or admitted tasks. As discussed in Section 4.4.3, due to the

saturation problem of the miss-ratio control, the settling time of FC-M in response to the

arrival overload is longer than the prediction based on the linear model (Equation 4.13).

M(k) stays at 0 for the first 26.5 sec since the beginning of the arrival overload. The

system settles at approximately 30 sec when M(k) reaches 1.23% (within 0.77% of the

reference MS = 2%) and U(k) reaches 94.44%. We can shorten the settling time of FC-M

in response to arrival overload by assigning a larger initial value to the total estimated

utilization B(0) (as shown in Section 4.5.9). M(k) never reaches beyond 2% and therefore

achieves zero miss ratio overshoot in the transient state.

M(k) remains stable throughout the run. In steady state (after 30 sec), M(k) stays close

to 2% and below 5% throughout the run except for M(k) = 5.97% at 31.5 sec. This result

shows that the steady state error is close to 0 as predicted by our analysis in Section 4.4.3.

We also observe that with FC-M, the CPU utilization U(k) in steady state is clearly

higher than the CPU utilization (close to 80%) in the run of FC-U. This is because by

directly controlling the miss ratio, FC-M can change the CPU utilization to the vicinity of

the (a priori unknown) utilization threshold, which is higher than the utilization reference

of FC-U that is set to 80% a priori.

The performance of FC-M with EDF/P (Figure 4.11(b)) is similar to the case of

DM/PA. The settling time is approximately 87 sec when the miss ratio reaches 2.88%.

FC-M with EDF/P achieves zero overshoot in transient state. The miss ratio stays close to

2% in steady state and remains stable throughout the run. Similar to the case of DM/PA,

FC-M with EDF/P also has a higher CPU utilization (close to 100%) than FC-U with

EDF/P (close to 90%) in steady state.

In summary, compared with FC-U, FC-M achieves higher CPU utilization and

robustness with regard to utilization threshold variations at the cost of a low but non-zero

miss ratio in steady state.

FC-UM

The performance evaluations of FC-UM with DM/PA and EDF/P are illustrated in Figure

4.10(c) and Figure 4.11(c), respectively. First we study the performance of FC-UM with

DM/PA. After the overload arrives, FC-UM increases the utilization U(k). Similar to

FC-M, the miss ratio M(k) stays at 0 and the CPU utilization U(k) increases more slowly than

with FC-U. In the beginning of the run, the (saturated) miss ratio control computes a smaller

control signal DBM(0) = KP(MS – M(0)) = 0.414*(0.02–0) = 0.008 than the utilization

control’s signal DBU(0) = KP(US – U(0)) = 0.185*(0.9–0) = 0.167. Due to the min

operation on control inputs from the two Controllers, the miss ratio control dominates the

control loop in the starting phase. The miss ratio control signal remains 0.008 and stays

smaller than the utilization control signal, which decreases as the utilization U(k)

increases. At time 27 sec, the utilization U(54) reaches 94.9% and the miss ratio M(54) =

0.93%. Now the utilization control signal DBU(54) = -0.009 becomes smaller than the

miss ratio control signal DBM(54) = 0.004 and takes over the control loop. Because the

utilization threshold is higher than the utilization reference US = 90%, the utilization

control dominates the control loop, and U(k) stays close to 90% while the miss ratio stays

at 0 after 27 sec. Therefore, the settling time is approximately 27 sec. Since neither U(k)

nor M(k) surpasses its corresponding reference in transient state (before 27 sec), FC-UM

achieves 0 overshoot in both U(k) and M(k).
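The hand-over between the two Controllers can be checked numerically with the gains and references quoted above for FC-UM with DM/PA. The function names p_control and fc_um_signal are hypothetical, and the sketch covers only the proportional terms and the min operation, not the rest of the control loop.

```python
def p_control(kp, reference, measured):
    """Proportional control signal: K_P * (reference - measured)."""
    return kp * (reference - measured)


def fc_um_signal(u, m, kp_u=0.185, kp_m=0.414, us=0.90, ms=0.02):
    """Return (applied, utilization_signal, miss_ratio_signal) for one sample.

    The applied signal is the minimum of the two Controllers' outputs.
    """
    db_u = p_control(kp_u, us, u)
    db_m = p_control(kp_m, ms, m)
    return min(db_u, db_m), db_u, db_m


# Start of the run: U(0) = 0, M(0) = 0.
start = fc_um_signal(u=0.0, m=0.0)
# Sample k = 54 (27 sec): U = 94.9%, M = 0.93%.
later = fc_um_signal(u=0.949, m=0.0093)
```

At the start the miss ratio signal (about 0.008) is the smaller one and is applied; at k = 54 the utilization signal (about -0.009) drops below the miss ratio signal (about 0.004), so the utilization control takes over.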

In the steady state, the utilization U(k) stays close to 90% and hence FC-UM achieves

zero steady state error in terms of the utilization. The miss ratio M(k) remains close to 0%,

lower than the miss ratio reference MS = 2% throughout the steady state except M(63) =

2.04%. This is because the utilization reference is lower than the utilization threshold and

therefore dominates the control loop in the steady state. Note that if the utilization

reference were higher than the utilization threshold, the miss ratio control would

dominate the control loop and FC-UM would achieve zero steady state error in terms of miss

ratio and a steady state utilization close to the utilization threshold. The system remains

stable throughout the run.

The performance of FC-UM with EDF/P (Figure 4.11(c)) is similar to the case of FC-

UM with DM/PA. The miss-ratio control dominates the control loop in the beginning of

the experiment until 75 sec (the settling time) when the utilization control starts to take

over the control loop. FC-UM with EDF/P achieves zero overshoot in both utilization

U(k) and M(k). Because the utilization reference US is lower than the utilization

threshold, FC-UM with EDF/P achieves zero steady state error in terms of utilization and

the miss ratio stays at 0 throughout the steady state.

In summary, FC-UM combines the advantages of both FC-U and FC-M and achieves

zero steady state miss ratio in the nominal case when the utilization reference is lower

than the utilization threshold. Furthermore, FC-UM can also achieve a low steady state

miss ratio even if the system’s utilization threshold changes to lower than the utilization

reference.

Open Loop QoS Optimization Algorithm

In comparison with the FC-RTS algorithms, the system scheduled by the open loop QoS

optimization algorithm suffers from high miss ratios with both DM/PA and EDF/P (see

Figure 4.10(d) and Figure 4.11(d)). This is because the task execution time is on average

twice the estimate and the QoS optimization algorithm overloads the CPU due to

the incorrect estimations of task execution time. On the other hand, the system would

suffer from low CPU utilization if the task execution time were lower than the estimation

(see Section 4.5.9). This result demonstrates that the open loop QoS optimization

algorithm is incapable of maintaining satisfactory performance in the face of unpredictable

workload.

In summary, we have demonstrated that all of our three FC-RTS algorithms, FC-U,

FC-M, FC-UM can provide desired performance guarantees in terms of miss ratio and

CPU utilization in steady state and achieve satisfactory performance profiles in response

to an arrival overload SL(0, 150%) when the average task execution times differ

from the estimation. In contrast, the open-loop QoS optimization fails to provide such

performance guarantees in the face of the same overload.

[Plot omitted. X-axis: Time (sec), 0 to 400. Y-axis: Ga′, 0 to 2.5. Labeled values in order: 0.8, 1.26, 2, 1.5.]

Figure 4.12. Execution Time Factor Ga′ in Experiment B

4.5.9. Evaluation Experiment B: Arrival/Internal Overload

In the second set of evaluation experiments, we stress our FC-RTS algorithms and the

baseline with a more unpredictable load profile than the one used in Experiment A. The

new load profile causes an arrival overload of SL(0, 150%) in the beginning of each run.

Furthermore, the average task execution times of all tasks vary every 100 sec to create

internal overload in the system. The execution time factor Ga′ throughout the run is

shown in Figure 4.12. The execution time factor Ga′ instantaneously jumps from 0.8 to

1.26 at time 100 sec. This change causes a 57.5% increase in the average execution time

of every task. Suppose the total requested utilization of the system is A(200) before the

jump, the execution time change corresponds to an internal overload of SL(A(200),

1.575A(200)). A similar step load SL(A(400), 1.575A(400)) occurs again at time 200 sec

when Ga′ jumps from 1.26 to 2. The jump at time 300 sec, on the other hand, creates an

internal underload SL(A(600), 0.75A(600)) (modeled as a negative step signal) when Ga′

instantaneously decreases from 2 to 1.5.
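The step sizes quoted above follow from the ratios of consecutive execution time factors. The sketch below recomputes them from the Ga′ profile of Figure 4.12; the list layout and the function name internal_steps are hypothetical.

```python
# (time of change in sec, execution time factor Ga' from that time on)
ga_profile = [(0, 0.8), (100, 1.26), (200, 2.0), (300, 1.5)]


def internal_steps(profile):
    """Return (time, ratio) for each jump, where ratio = new Ga' / old Ga'.

    A ratio above 1 is an internal overload; below 1 an internal underload.
    """
    return [
        (t1, g1 / g0)
        for (_t0, g0), (t1, g1) in zip(profile, profile[1:])
    ]


steps = internal_steps(ga_profile)
```

The first and third ratios evaluate to 1.575 and 0.75, matching the text. The second ratio, 2/1.26, evaluates to about 1.587, slightly above the 1.575 quoted for the jump at 200 sec.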

In this set of experiments, a larger initial value B(0) = 80% is assigned to the

estimated requested utilization B(k) to shorten the settling time of FC-M and FC-UM in

response to arrival overloads. The configurations of our FC-RTS algorithms are listed in

Table 4.2 and Table 4.3. The open-loop baseline uses a fixed B(k) = 80% and 90% for

QoS optimization with DM/PA and EDF/P, respectively. A typical run of each of the FC-

RTS algorithms and the baseline is shown in Figure 4.13 (DM/PA) and Figure 4.14

(EDF/P).

[Plot omitted. Panels: (a) FC-U, (b) FC-M, (c) FC-UM, (d) open-loop baseline. X-axis: Time (second), 0 to 400. Y-axis: U(k); B(k); M(k) (%), 0 to 100. Series: US (where applicable), U(k), B(k), M(k).]

Figure 4.13. Response of Scheduling Algorithms to Arrival/Internal Overload (DM/PA)

[Plot omitted. Panels: (a) FC-U, (b) FC-M, (c) FC-UM, (d) open-loop baseline. X-axis: Time (second), 0 to 400. Y-axis: U(k); B(k); M(k) (%), 0 to 100. Series: US (where applicable), U(k), B(k), M(k).]

Figure 4.14. Response of Scheduling Algorithms to Arrival/Internal Overload (EDF/P)

FC-U

The performance evaluations of FC-U with DM/PA and EDF/P are illustrated in Figure

4.13(a) and Figure 4.14(a), respectively. We first study the performance of FC-U with

DM/PA. Because FC-U starts from a non-zero estimated requested utilization B(0) =

80%, it settles to steady state within only 0.5 sec. FC-U stays in steady state while the

utilization U(k) stays close to the utilization reference US = 80% and the miss ratio M(k)

remains 0 until 100 sec when the execution time factor increases from 0.8

to 1.26 and causes an internal overload. Consequently, the utilization U(k) overshoots to

100% at 100.5 sec and the miss ratio increases to 13.06%. In response to the overload

condition, FC-U reduces total estimated utilization B(k) in the system by lowering task

QoS levels. Within 9 sec (close to the predicted settling time of 8 sec for Ga = 1.26)

after the internal overload starts, U(k) decreases to 80.34% while M(k) becomes 0,

and the system resettles in a steady state with the utilization close to 80% and miss ratio

M(k) = 0%. FC-U responds similarly to the internal overload at 200 sec when the

execution time factor increases from 1.26 to 2. The system settles down to a satisfactory

steady state within 5 sec (close to the predicted settling time of 4.5 sec for Ga = 2).

At time 300 sec, an internal underutilization occurs when the execution

time factor decreases from 2 to 1.5. Consequently, the utilization U(k) decreases to

63.36%. In response to the instantaneous underutilization, FC-U increases total estimated

utilization B(k) allowed in the system by improving task QoS levels. At 305 sec, the

system resettles in a satisfactory steady state with the utilization close to 80% and miss

ratio M(k) = 0%.

The performance of FC-U with EDF/P is similar to the above case of FC-U with

DM/PA. FC-U with EDF/P reacts to both the arrival overload and the subsequent

internal load variations efficiently and (re)settles to a satisfactory steady state with the

utilization close to its reference US = 90% and 0 miss ratio. The performance profiles of FC-U with

DM/PA and EDF/P are summarized in Table 4.4. Note that FC-U with DM/PA and

EDF/P hold the utilization at their respective references (80% and 90%) in all steady states despite the difference in

execution times. This observation verifies that FC-U has zero sensitivity with regard to

execution time variations and provides robust performance guarantees in the face of

unpredictable workloads.

Transient State (0~24.5) sec: Settling Time 4.5 sec; Absolute Overshoot: Miss Ratio 0.88%, Utilization 84.12%

Steady State (24.5~100.5) sec: Average Miss Ratio 0.00%, Average Utilization 80.04%

Transient State (100.5~114) sec: Settling Time 9 sec; Absolute Overshoot: Miss Ratio 13.06%, Utilization 100%

Steady State (114~200.5) sec: Average Miss Ratio 0.00%, Average Utilization 80.00%

Steady State (212~300.5) sec: Average Miss Ratio 0.00%, Average Utilization 80.01%

Transient State (300.5~311) sec: Settling Time 5 sec; Absolute Overshoot: Miss Ratio 0.00%, Utilization 87.22%

Steady State (311~400) sec: Average Miss Ratio 0.00%, Average Utilization 80.00%

(a) FC-U with DM/PA

Transient State (0~79.5) sec: Absolute Overshoot: Miss Ratio 0, Utilization 90.20%

Steady State (79.5~100.5) sec: Average Miss Ratio 0.00%, Average Utilization 90.03%

Transient State (100.5~111) sec: Settling Time 13%; Absolute Overshoot: Miss Ratio 36.28%, Utilization 100.00%

Steady State: Average Miss Ratio 0.00%, Average Utilization 90.01%

Transient State (200.5~205.5) sec: Settling Time 7 sec; Absolute Overshoot: Miss Ratio 32.03%, Utilization 100%

Steady State (205.5~300.5) sec: Average Miss Ratio 0.00%, Average Utilization 90.00%

Transient State (300.5~327.5) sec: Settling Time 4 sec; Absolute Overshoot: Miss Ratio 0.00%, Utilization 88.32%

Steady State (327.5~400) sec: Average Miss Ratio 0.00%, Average Utilization 89.99%

(b) FC-U with EDF/P

Table 4.4. The Performance Profiles of FC-U in Experiment B

FC-M

The run of FC-M with DM/PA and EDF/P is illustrated in Figure 4.13(b) and Figure

4.14(b), respectively. We first study the run of FC-M with DM/PA. Because the initial

estimated requested utilization B(0) = 80% and the initial execution time factor Ga′ = 0.8,

the system starts from zero miss ratio with utilization close to 64%. The system settles to

a steady state within 24.5 sec when the miss ratio M(49) = 1.41% and the utilization

U(49) = 96.1%. FC-M stays in steady state while the miss ratio M(k) stays close to the

miss ratio reference MS = 2% and the utilization U(k) stays close to 100%. In the steady

state (from 24.5 sec to 100 sec), the average miss ratio is 1.88% (steady state error ESM =

0.11%) while the average utilization is 99.1%.

The system stays in steady state until 100 sec when the execution time factor

increases from 0.8 to 1.26 and causes an internal overload. Consequently, the

miss ratio and the utilization overshoot to 1.41% and 40.7%, respectively. In response to

the instantaneous overload, FC-M reduces the total estimated utilization B(k) by lowering

task QoS levels. Within 13.5 sec after the internal overload starts, the M(k) changes to

M(228) = 1.44% while U(228) = 98.0%, and the system resettles to a steady state. In the

new steady state (from 114.5 sec to 200 sec), the average miss ratio is 1.95% (steady state

error ESM = 0.05%) while the average utilization is 98.45%. Similarly, FC-M successfully

responds to the internal overload at 200 sec and settles down to a satisfactory steady state

within 11.5 sec.

At time 300 sec, the execution time factor decreases from 2 to 1.5 and the utilization

consequently drops to 79.9%. At time 211 sec, M(k) increases to 3.2% while U(k)

increases to 99.74%, and the system resettles in a satisfactory steady state with an

average miss ratio of 2.01% (the steady state error ESM = -0.01%) and an average

utilization close to 98.27%.

The performance of FC-M with EDF/P is similar to the above case of FC-M with

DM/PA. The FC-M with EDF/P successfully reacts to the arrival overload and the

subsequent internal load variations and (re)settles to steady states with the miss ratio

close to 2% and utilization close to 100% despite the difference in execution times.

This observation verifies that FC-M has zero sensitivity with regard to execution time

variations and provides robust performance guarantees in the face of unpredictable

workloads. The performance profiles of FC-M with DM/PA and EDF/P are

summarized in Table 4.5.



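Steady State (24.5~100.5) sec: Average Miss Ratio 1.88%, Average Utilization 99.01%

Steady State: Average Miss Ratio 1.95%, Average Utilization 98.45%

Steady State: Average Miss Ratio 2.01%, Average Utilization 97.61%

Steady State: Average Miss Ratio 2.01%, Average Utilization 98.27%

(a) FC-M with DM/PA

Steady State (79.5~100.5) sec: Average Miss Ratio 2.23%, Average Utilization 100.00%

Steady State: Average Miss Ratio 1.99%, Average Utilization 100.00%

Transient State (200.5~205.5) sec: Settling Time 5.5 sec; Absolute Overshoot: Miss Ratio 38.11%, Utilization 100.00%

Steady State: Average Miss Ratio 2.03%, Average Utilization 99.99%

Steady State: Average Miss Ratio 2.04%, Average Utilization 100.00%

(b) FC-M with EDF/P

Table 4.5. The Performance Profiles of FC-M in Experiment B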

FC-UM

The run of FC-UM with DM/PA and EDF/P is illustrated in Figure 4.13(c) and Figure

4.14(c), respectively. We first study the run of FC-UM with DM/PA. In response to the

arrival overload at time 0, the miss ratio control dominates the control loop in the

transient state until the utilization approaches the utilization reference US = 90% when the

utilization control takes over the control loop. Because the utilization reference is lower than the

utilization threshold, the utilization control dominates the control loop and the system

settles to steady state at 17.5 sec. The miss ratio M(k) stays at 0% most of the time while

in steady state, and the utilization U(k) stays close to 90%.

The system stays in the steady state until 100 sec when the execution time factor

increases from 0.8 to 1.26. The utilization U(k) overshoots to 100% and the

miss ratio overshoots to 24.53%. Although both the miss ratio control and the utilization

control compute negative control signals in response to the internal overload, the miss

ratio control takes over the control loop because the utilization saturates at 100%, which yields a

utilization control signal with a smaller magnitude. The miss ratio control dominates the control

loop until the utilization approaches 90% and the miss ratio becomes zero. The utilization control

then takes over and the system settles to a new steady state at 105 sec with an average

miss ratio of 0.07% and an average utilization of 89.85% (steady state error ESU =

0.15%).

FC-UM responds similarly to the internal overload at 200 sec when the execution time

factor increases from 1.26 to 2. The system settles down to a satisfactory steady state

within 2.5 sec. In the steady state (from 203 sec to 300 sec), the average miss ratio is

0.12% and the average utilization is 89.71% (steady state error ESU = 0.29%).

At time 300 sec, the execution time factor decreases from 2 to 1.5 and the utilization

U(k) drops to 69.24%. Similar to the beginning of the run, FC-UM increases total

estimated utilization B(k) by improving task QoS levels. At time 308.5 sec, U(k)

increases to 92.02% while the system resettles in a steady state with an average miss ratio

of 0.07% and an average utilization close to 89.90% (the steady state error ESU = 0.10%).

The performance of FC-UM with EDF/P is similar to the above case of FC-UM with

DM/PA. The FC-UM with EDF/P successfully reacts to both the arrival overload and the

internal overload and (re)settles to satisfactory steady states while the miss ratio stays

close to 0% and the utilization stays close to 90% despite the difference in execution

times. This observation verifies that FC-UM has zero sensitivity with regard to execution

time variations and provides robust performance guarantees in the face of unpredictable

workloads. The performance profiles of FC-UM are summarized in Table 4.6.



                                      Miss Ratio    Utilization
(24.5~100.5) sec
    Average                           0.03%         89.92%
Transient State (100.5~114) sec
    Settling Time: 4.5 sec
    Absolute Overshoot                24.53%        100.00%
    Average                           0.07%         89.85%
    Average                           0.12%         89.71%
    Average                           0.07%         89.90%

(a) FC-UM with DM/PA

                                      Miss Ratio    Utilization
(79.5~100.5) sec
    Average                           0.00%         89.48%
    Average                           0.00%         89.85%
    Average                           0.00%         89.78%
    Average                           0.00%         89.86%

(b) FC-UM with EDF/P

Table 4.6. The Performance Profiles of FC-UM in Experiment B

Open-Loop Baseline

The performance of the open-loop baseline with DM/PA and EDF/P is illustrated in

Figure 4.13(d) and Figure 4.14(d), respectively. In contrast with our FC-RTS algorithms,

the open-loop baseline fails to provide performance guarantees in miss ratio or utilization

with both EDF/P and DM/PA. When task execution times are lower than the

estimations (from 0 to 100 sec), the baseline algorithm underutilizes the CPU (with

utilization U(k) close to 72%). On the other hand, when the execution times exceed the

estimations (from 100.5 sec to 400 sec), the system suffers from persistent deadline

misses. For example, the baseline with DM/PA has an average miss ratio of 9.23% from

200.5 sec to 300 sec and the miss ratio reaches 94.1%. The baseline with EDF/P has an

average miss ratio of 51.39% in the same period.

In summary, our evaluation results verify that our FC-RTS algorithms can provide the

following performance guarantees under the stability condition in Equation 4.22:

(1) Stability in face of arrival overload and internal overload

(2) System miss ratio and utilization stay close to the corresponding performance

reference in steady state regardless of variations in task execution times

(3) Satisfactory settling time and low overshoot in transient state

In addition to the performance profiles, the average performance of the FC-RTS

algorithms and the baseline are shown in Figure 4.15(a) (DM/PA) and Figure 4.15(b)

(EDF/P). The considered performance metrics include the average miss ratio Ma, average

CPU utilization Ua, and the Average Value Completion Ratio Va defined as the total

completed value divided by the total values of all the arriving tasks at the highest QoS

level. Va characterizes the utility and throughput of the system throughout the run. All of

the above metrics are computed based on the performance throughout the run. Every data

point in Figure 4.15(a) and Figure 4.15(b) is the mean of 5 repeated runs. The 90%

confidence interval of each Ma, Ua, and Va is within ±0.91%, ±0.23%, and ±1.25%,

respectively, of its mean. We can see that all the FC-RTS algorithms consistently

outperform the open-loop baseline in terms of average miss ratio and the value

completion ratio.
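The Average Value Completion Ratio and the confidence interval of a mean over 5 runs can be sketched as follows. This is only an illustration: the text does not state how the intervals were computed, so the Student-t interval used here is an assumption, the function names are hypothetical, and the sample values are invented to exercise the code rather than taken from the experiments.

```python
import math
import statistics

# Two-sided 90% Student-t critical value for 4 degrees of freedom (5 runs).
T_CRIT_90_DOF4 = 2.132


def value_completion_ratio(completed_values, arrived_values_at_highest_qos):
    """Va: total completed value divided by the total value of all arriving
    tasks, each counted at its highest QoS level."""
    return sum(completed_values) / sum(arrived_values_at_highest_qos)


def mean_and_ci90(samples):
    """Mean of 5 repeated runs and the half-width of its 90% confidence interval."""
    if len(samples) != 5:
        raise ValueError("T_CRIT_90_DOF4 is only valid for 5 samples")
    mean = statistics.mean(samples)
    half_width = T_CRIT_90_DOF4 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


# Hypothetical per-run values of Va (percent); not data from the experiments.
va_runs = [50.1, 51.0, 49.6, 50.8, 50.5]
mean, half_width = mean_and_ci90(va_runs)
print(f"Va = {mean:.2f}% +/- {half_width:.2f}%")
```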

[Figure 4.15: two bar charts, (a) DM/PA and (b) EDF/P. Vertical axis: percentage from 0.00 to 100.00. Horizontal axis: FC-U, FC-M, FC-UM, Baseline. Bars: Ma, Ua, Va.]

Figure 4.15. Average Performance of FC-RTS algorithms and the Baseline (Ma: Average Miss Ratio; Ua: Average Utilization; Va: Average Value Completion Ratio)

In summary, our evaluation results demonstrate that our three FC-RTS algorithms

provide robust and precise performance guarantees in terms of utilization and miss ratio

even when the workload significantly varies from the estimation. Furthermore, they also

achieve satisfactory transient state performance profiles in response to arrival and internal

overload. In contrast, an open loop QoS optimization algorithm fails to provide such

guarantees when the workload deviates from the a priori estimation.

4.6. Comparison of Real-Time Scheduling Algorithms in Overload

We now qualitatively compare several existing real-time scheduling algorithms (see

Table 4.7). Our comparison is based on two criteria, the required knowledge of the workload

by a scheduler and its performance in overload conditions. Simple algorithms such as

Rate (Deadline) Monotonic based on off-line schedulability analysis depend on complete

knowledge about the workload and the system including tasks’ resource requirements,

future arrivals and the system’s schedulable utilization bound. These algorithms cannot

work in overload conditions because of their lack of overload handling mechanisms. The

(open loop) on-line admission control or QoS optimization based algorithms add

flexibility to real-time systems by not requiring knowledge about task future arrivals,

although the tasks’ resource requirements and utilization bound still need to be known a

priori. FC-RTS algorithms achieve the next level of flexibility by providing robust

performance guarantees without requiring a priori knowledge about tasks’ resource

requirements and even the utilization bound as in the case of FC-M and FC-UM.

Therefore, FC-RTS algorithms provide the most appropriate solutions for soft real-time

systems in unpredictable environments. Such systems include online trading and e-

business servers, and data-driven systems such as smart spaces, agile manufacturing, and

many defense applications such as C4I.

Columns 2 to 4: knowledge of the workload/system required by the scheduler. Columns 5 and 6: performance in overload.

Algorithm | Task resource requirement | Future arrival time | Utilization bound | Miss ratio (steady state / transient state) | CPU utilization
RM, EDF | Yes | Yes | Yes | N/A | N/A
Open Loop Admission Control/QoS Optimization | Yes | No | Yes | 0 | High if estimation of resource requirement is not pessimistic; Low otherwise
FC-U | No | No | Yes | 0 / Bounded by overshoot | High
FC-M | No | No | No | Small / Bounded by overshoot | High
FC-UM | No | No | No | 0 nominally; Guaranteed to be small / Bounded by overshoot | High

Table 4.7. Comparison of Real-Time Scheduling Paradigms in Overload Conditions

4.7. Summary

We successfully apply our FCS framework to systematically design a set of feedback

control real-time CPU scheduling (FC-RTS) algorithms that achieve desired transient and

steady state performance specifications in face of unpredictable task execution times and

arrivals. The key results of our research on CPU scheduling include:

A novel FC-RTS architecture that integrates performance feedback control with

different real-time scheduling policies and QoS optimization algorithms,

Specialization of the generic performance profiles and load profiles to metrics

based on CPU utilization and deadline miss ratio, and arrival/internal overload in

CPU-bound real-time systems,

An analytical model of CPU-bound real-time systems that provides a foundation

for design and analysis of such algorithms with established control theory,

A set of FC-RTS algorithms, FC-U, FC-M, and FC-UM, that provide the following

performance guarantees in terms of deadline miss ratio and/or CPU utilization for

different types of real-time applications in unpredictable environments,

• Stability in face of arrival overload and internal overload,

• Accurate enforcement of performance references in steady state, and

• Satisfactory settling time and low overshoot in transient state.

A set of tuning/analysis results of FC-RTS algorithms to achieve desired

performance profiles in response to unpredictable overload conditions,

Simulation evaluation results that demonstrate our FC-RTS algorithms achieve

robust performance guarantees and desired performance profiles in response to

new task arrivals and execution time variations, and

A qualitative comparison of classical real-time scheduling paradigms and FC-

RTS algorithms that achieve a leap in flexibility and predictability in


Chapter 5

Web Server with Delay Guarantees

5.1. Introduction

The increasing diversity of applications supported by the World Wide Web and the

increasing popularity of time-critical web-based applications (such as online trading)

motivate building QoS-aware web servers. Such servers customize their performance

attributes depending on the class of the served requests so that more important requests

receive better service. From the perspective of the requesting clients, the most visible

service performance attribute is typically the service delay. Different requests may have

different tolerances to service delays. For example, one can argue that stock trading

requests should be served more promptly than information requests. Similarly, interactive

clients should be served more promptly than background software agents such as web

crawlers and prefetching proxies. Some businesses may also want to provide different

service delays to different classes of customers depending on their importance or monthly

fees. In this chapter, we provide a solution to support delay differentiation in web servers.

While existing best effort differentiation approaches [9][10][13][28] on web servers

usually offer better service to premium clients, they do not

provide guarantees on the extent of the difference between premium and basic

performance levels. This difference depends heavily on load conditions and may be

difficult to quantify. In a situation where clients pay to receive better service, any

ambiguity regarding the expected performance improvement may cause client concern,

and is, therefore, perceived as a disadvantage. Compared with the best effort

differentiation model, the proportional differentiated service and the absolute guarantee

model both provide stronger guarantees in service differentiation.

In the absolute guarantee model, a fixed maximum service delay (i.e., a soft deadline)

for each class needs to be enforced. A disadvantage of the absolute guarantee model is

that it is usually difficult to determine appropriate deadlines for web services. For

example, the tolerable delay threshold of a web user may vary significantly depending on

web page design, length of session, browsing purpose, and properties of the web browser

[19]. Since system load can grow arbitrarily high in a web server, it is impossible to

satisfy the absolute delay guarantees of all service classes under overload conditions. The

absolute delay guarantee requires that all classes receive satisfactory delay if the server is

not overloaded; otherwise desired delays are violated in the predefined priority order, i.e.,

low priority classes always suffer guarantee violation earlier than high priority classes³.

In the absolute guarantee model, deadlines that are too loose may not provide necessary

service differentiation because the deadlines can be satisfied even when delays of

different classes are the same. On the other hand, deadlines that are too tight can cause

extremely long latency for low priority classes in order to enforce high priority classes’

(potentially unnecessarily) tight deadlines.

³ Another scheme to implement absolute guarantee is to apply admission control on incoming requests during overload conditions. However, from the perspective of web clients, request denial by admission control is no better than service failure due to overload.

In the proportional differentiated service model introduced in [26], a fixed ratio

between the delays seen by the different service classes can be enforced. This architecture

provides a specification interface and an enforcement mechanism such that a desired

"distance" between the performance levels of different classes can be specified and

maintained. This service model is more precise in its performance differentiation

semantics than the best effort differentiation model. The proportional differentiated

service is also more flexible than absolute guarantee because it does not require fixed

deadlines to be assigned to each service class.

Depending on the nature of the overload condition, either the proportional

differentiated service or the absolute guarantee may become more desirable. The

proportional differentiated service may be less appropriate in severe overload conditions

because even high priority clients may get extremely long delays. In nominal overload

conditions, however, the proportional differentiated service may be more desirable than

absolute guarantee because the proportional differentiated service can provide adequate

and precise service differentiation without requiring artificial, fixed deadlines to be

assigned to each service class. Therefore, a hybrid guarantee is desirable in some

systems. For example, a hybrid policy can be that the server provides proportional

differentiated service when the delay received by each class is within its tolerable

threshold. When the delay received by a high priority class exceeds its threshold, the

server automatically switches to the absolute guarantee model that enforces desired

delays for high priority classes at the cost of violating desired delays of low priority

classes. This policy can achieve the flexibility of the proportional differentiated service in

nominal overload and bound the delay of high priority class in severe overload

conditions.

In this chapter, we present a web server architecture to support delay guarantees

including the absolute guarantee, proportional differentiated service, and the hybrid

guarantee described above. A key challenge in guaranteeing service delays in a web

server is that resource allocation that achieves the desired delay or delay differentiation

depends on load conditions that are unknown a priori. A main contribution of this thesis

is the introduction of a feedback control architecture for adapting resource allocation such

that the desired delay differentiation between classes is achieved. We formulate the

adaptive resource allocation problem as one of feedback control and apply feedback

control theory to develop the resource allocation algorithm. We target our architecture

specifically for the HTTP 1.1 protocol [32], the most recent version of HTTP that has

been adopted at present by most web servers and browsers. As we show in this thesis,

persistent connections introduced by HTTP 1.1 give rise to peculiar server bottlenecks

that affect our choice of resource allocation mechanisms for delay differentiation. Hence,

our contributions can be summarized as follows:

An adaptive architecture for achieving relative and absolute service delay

guarantees in web servers under HTTP 1.1

Use of feedback control theory and methodology to design an adaptive connection

scheduler with proven performance guarantees. The design methodology

includes:

• Using system identification to model web servers for purposes of

performance control,

• Specifying performance requirements of web servers using control-based

metrics, and

• Designing feedback controllers using the Root Locus method to satisfy the

performance specifications.

Developing a system identification methodology and software tool as an empirical

and practical modeling solution for computer systems with unknown or

complicated dynamics, which has been a major barrier for applying feedback

control in such systems.

Implementing the adaptive architecture by modifying an Apache web server on

top of a Linux platform.

Performance evaluation that demonstrates our adaptive architecture and

connection scheduling algorithms achieve robust service delay guarantees even

when the workload varies considerably.

The rest of this chapter is organized as follows. In Section 5.2, we briefly describe

how web servers (in particular those with the HTTP 1.1 protocol) operate. In Section 5.3,

we formally define the semantics of delay differentiation guarantees on web servers. The

design of the adaptive server architecture to provide delay guarantees is described in

Section 5.4. In Section 5.5, we apply feedback control theory to systematically design

feedback controllers to satisfy the desired performance of the web server. The

implementation of the architecture on an Apache web server and experimental results are

presented in Sections 5.6 and 5.7, respectively. We summarize this chapter in Section 5.8.

5.2. Background

The first step towards designing architectural components for service delay

differentiation is to understand how web servers operate. Web server software usually

adopts a multi-process or a multi-threaded model. Processes or threads can be either

created on demand or maintained in a pre-existing pool that awaits incoming TCP

connection requests to the server. The latter design alternative reduces service overhead

by avoiding dynamic process creation and termination - a very costly operation in an

operating system such as UNIX. Here, we assume a multi-process model with a pool of

processes, which is the model of the Apache server, the most commonly used web server

today [31].

In HTTP 1.0, each TCP connection carries a single HTTP request. This results in an

excessive number of concurrent TCP connections. To remedy this problem the current

version of HTTP, called HTTP 1.1 [32], reduces the number of concurrent TCP

connections with a mechanism called persistent connections, which allows multiple web

requests to reuse the same connection. An HTTP 1.1 client first sends a TCP connection

request to a web server. The request is stored in the listen queue of the server's well-

known port. Eventually, the request is dequeued and a TCP connection is established

between the client and one of the server processes. The client can then send HTTP

requests and receive responses over the established connection. The HTTP 1.1 protocol

requires that an established TCP connection be kept alive after a request is served in

anticipation of potential future requests. If no requests arrive on this connection within

TIMEOUT seconds, the connection is closed and the process responsible for it is returned

to the idle pool. Due to the increasing popularity of HTTP 1.1, we focus on the

implications of persistent connections on server performance, and present a resource

allocation mechanism for delay differentiation that specifically addresses the peculiarities

of this protocol.

Persistent connections generate a new type of resource bottleneck on the server. Since

a server process may be tied up with a persistent connection even after the request is

served, the CPU(s) can be under-utilized. One way to alleviate this problem is to increase

the number of server processes. However, too many processes can cause thrashing in

virtual memory systems [24] thus degrading server performance considerably. In

practice, an operating system specific limit is imposed on the number of concurrent

server processes. This limit is often the bottleneck in servers implementing HTTP 1.1.

Hence, while new connections suffer large queuing delays, requests arriving on existing

connections are served almost immediately by their dedicated processes⁴. We verify this

observation experimentally as described in Section 5.7.1. The observation is important as

it affects our choice of delay metrics and QoS enforcement architecture. In particular,

since CPU may not be the bottleneck resource, CPU scheduling/allocation is not

necessarily an effective mechanism to provide service differentiation in web servers

using HTTP 1.1. Instead, we develop a server process allocation mechanism (Section

5.4.1) to support service differentiation in such systems. Note that it is not the objective

of our current research to interfere with HTTP 1.1 semantics to improve CPU utilization.

⁴ Some new web servers (e.g., [58]) have a single-threaded, event-driven architecture, which has no limit on the number of connections that can be accepted besides limits on the maximum number of open descriptors imposed by the OS. In this architecture, processing delay instead of the connection delay may dominate the server response time.

5.3. Semantics of Service Delay Guarantees

Let the connection delay denote the time interval between the arrival of a TCP

connection (establishment) request and the time the connection is accepted (dequeued) by

a server process. Let the processing delay denote the time interval between the arrival of

an HTTP service request to the process responsible for the corresponding connection and

the time the server completes transferring the response to the client.

Connection delay includes the queuing delay on the server's well-known port. As

explained in the previous section, such queuing delay may be significant even when CPU

utilization on the server is not high due to lack of available server processes/threads.

Processing delay of requests on already established connections, on the other hand, is

much smaller because such requests do not get queued in the server’s well-known port.

Hence, we focus on applying delay differentiation only to connection delays. Using

connection delays as the delay metric of choice is also desirable for another reason.

Besides being the dominant delay, it is also less of a function of client-side factors than

the processing delay. The processing delay is dominated by the time it takes to send the

response to the client which depends on TCP throughput. If the client is slow, TCP flow

control will reduce the response transfer rate accordingly. Since processing delay depends

on client speed, it is not an appropriate metric of server performance quality that is

attributable to the server.

Suppose every HTTP request belongs to a class k (0 ≤ k < N). The connection delay

Ck(m) of class k at the mth sampling instant is defined as the average connection delay of

all established connections of class k within the interval ((m-1)S, mS) sec, where S is a constant

sampling period. Connection delay guarantees are defined as follows. For simplicity of

presentation, we use delay and connection delay interchangeably in the rest of this

chapter.

Relative Delay Guarantee: A desired relative delay Wk is assigned to each class

k. A relative delay guarantee {Wk | 0 ≤ k < N} requires that Cj(m)/Cl(m) = Wj/Wl

for ∀ classes j and l (j ≠ l). For example, if class 0 has a desired relative delay of

1.0, and class 1 has a desired relative delay of 2.0, it is required that the

connection delay of class 0 should be half of that of class 1.

Absolute Delay Guarantee: A desired (absolute) delay Wk is assigned to each

class k. An absolute delay guarantee {Wk | 0 ≤ k < N} requires that Cj(m) ≤ Wj for

∀ classes j if ∃ a class l > j and Cl(m) ≤ Wl (a lower class number means a higher

priority). Note that since system load can grow arbitrarily high in a web server, it

is impossible to satisfy the desired delay of all service classes under overload

conditions. The absolute delay guarantee requires that all classes receive

satisfactory delay if the server is not overloaded; otherwise desired delays are

violated in the predefined priority order, i.e., low priority classes always suffer

guarantee violation earlier than high priority classes.

Based on the relative and absolute delay guarantees, different hybrid guarantees can

be composed for the specific requirements of the application. For example, the hybrid

guarantee described in Section 5.1 can be formulated as follows.

Hybrid Delay Guarantee: Each class k is assigned a value Wk that represents

both its desired delay and its desired relative delay. The hybrid guarantee {Wk | 0

≤ k < N} provides the relative delay guarantees if the desired absolute delay of

every class is satisfied. When the server is severely overloaded and desired delays

cannot be provided to all classes, the hybrid guarantee provides absolute delay

guarantees to high priority classes at the cost of violating the delays of low

priority classes. This hybrid guarantee provides the flexibility of the proportional

differentiated service in nominal overload while bounding the delay of high

priority classes in severe overload conditions.
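The three guarantee definitions above can be read as predicates over the sampled delays {Ck(m)} and the desired values {Wk}. The following is a minimal sketch under stated assumptions: class 0 has the highest priority, all delays are positive, and the relative guarantee is tested within a tolerance (the definition itself requires exact equality, so the `tolerance` parameter is an addition for illustration). All function names are hypothetical.

```python
def relative_guarantee_met(delays, desired, tolerance=0.05):
    """Check C_j/C_l == W_j/W_l for all class pairs, within a relative tolerance.

    C_j/C_l == W_j/W_l is equivalent to C_j/W_j == C_l/W_l, so comparing
    every class against class 0 covers all pairs.
    """
    base = delays[0] / desired[0]
    return all(abs(c / w - base) <= tolerance * base
               for c, w in zip(delays, desired))


def absolute_guarantee_met(delays, desired):
    """Check that violations occur only in priority order (class 0 is highest).

    Class j may miss its desired delay only if every lower-priority class
    l > j misses its own desired delay as well.
    """
    met = [c <= w for c, w in zip(delays, desired)]
    return all(met[j] or not any(met[j + 1:]) for j in range(len(met)))


def hybrid_mode(delays, desired):
    """Select which guarantee the hybrid policy enforces in this sampling period."""
    if all(c <= w for c, w in zip(delays, desired)):
        return "relative"
    return "absolute"


# Example from the text: W0 = 1.0 and W1 = 2.0, so class 0 should see half
# the connection delay of class 1.
print(relative_guarantee_met([1.5, 3.0], [1.0, 2.0]))
print(absolute_guarantee_met([0.5, 5.0], [1.0, 2.0]))
print(hybrid_mode([1.5, 3.0], [1.0, 2.0]))
```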

[Figure 5.1 diagram. Labeled elements: TCP connection requests, TCP listen queue, Connection Scheduler, server processes of the web server, HTTP service requests, HTTP response, Monitor, Controllers. Labeled signals: desired delays {Wk | 0 ≤ k < N}, sampled delays {Ck | 0 ≤ k < N}, process budgets {Bk | 0 ≤ k < N}.]

Figure 5.1. The Feedback-Control Architecture for Delay Guarantees

5.4. A Feedback Control Architecture for Web Server QoS

In this section, we present an adaptive web server architecture (as illustrated in Figure

5.1) to provide the above delay guarantees. A key feature of this architecture is the use of

feedback control loops to enforce desired relative/absolute delays via dynamic

reallocation of server processes. The architecture is composed of a Connection Scheduler, a

Monitor, a Controller, and a fixed pool of server processes. We describe the design of the

components in the following subsections.

5.4.1. Connection Scheduler

The Connection Scheduler serves as an actuator to control the delays of different

classes. It listens to the well-known port and accepts every incoming TCP connection

request. The Connection Scheduler uses an adaptive proportional share policy to allocate

server processes to connections from different classes⁵. At every sampling instant m,

every class k (0 ≤ k < N) is assigned a process budget, Bk(m), i.e., class k should be

allocated at most Bk(m) server processes in the mth sampling period. For a system with

absolute delay guarantees (Section 5.4.4), the total budgets of all classes may exceed the

total number of server processes, which is a condition called control saturation. In this

case, the process budgets are satisfied in the priority order until every process has been

allocated to a class. This policy means that the process budgets of high priority classes

are always satisfied before those of low priority classes, and thus the correct order of

guarantee violations can be achieved. For a server with relative delay guarantee, our

Relative Delay Controllers always guarantee that the total budget equals the total number

of processes (Section 5.4.4). For each class k, the Connection Scheduler maintains a

(FIFO) connection queue Qk and a process counter Rk. The connection queue Qk holds

connections of class k before they are allocated server processes. The counter Rk is the

number of processes allocated to class k. After an incoming connection is accepted, the

Connection Scheduler classifies the new connection and inserts the connection descriptor

into the connection queue corresponding to its class. Whenever a server process becomes

available, a connection at the front of a connection queue Qk is dispatched if class k has

the highest priority among all eligible classes {j | Rj < Bj(m)}.

⁵ Note that the Connection Scheduler uses process allocation instead of CPU allocation to control the delays of different classes. This is because processes may hold idle (persistent) connections and therefore CPU is not necessarily the bottleneck resource under HTTP 1.1 protocols (as discussed in Section 5.2).

For the above scheduling algorithm, a key issue is how to decide the process budgets

{Bk | 0 ≤ k < N} to achieve the desired relative or absolute delays {Wk | 0 ≤ k < N}. Note

that static mappings from the desired relative or absolute delay {Wk | 0 ≤ k < N} to the

process budget {Bk | 0 ≤ k < N} (e.g., based on system profiling) cannot work well when

the workloads are unpredictable and vary at run time (see performance results in Section

5.7.3). This problem motivates the use of feedback controllers to dynamically adjust the

process budgets {Bk | 0 ≤ k < N} to maintain desired delays.

Because the Controller can dynamically change the process budgets, a situation can

occur when a class k’s new process budget Bk(m) (after the adjustment in saturation

conditions described above) exceeds the total number of free server processes and

processes already allocated to class k. Such a class k is called an under-budget class. Two

different policies, preemptive and non-preemptive scheduling, can be supported in this

case. In the preemptive scheduling model, the Connection Scheduler immediately forces

server processes to close connections of over-budget classes whose new process budgets

are less than the number of processes currently allocated to them. In the non-preemptive

scheduling model, the Connection Scheduler waits for server processes to voluntarily release

connections of over-budget classes before it allocates enough processes to under-budget

classes. The advantage of the preemptive model is that it is more responsive to the

Controller’s input and load variations, but it can cause jittery delay in preempted classes

because they may have to re-establish connections with the server in the middle of

loading a web page. Only the non-preemptive scheduling is currently implemented in our

web server testbed.
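The dispatch rule of the Connection Scheduler described in this subsection can be sketched as below. This is a simplified, non-preemptive illustration, not the Apache-based implementation: the class and method names are hypothetical, class 0 is taken to be the highest priority, and classes with an empty queue are skipped, which the text does not state explicitly.

```python
from collections import deque


class ConnectionScheduler:
    """Non-preemptive proportional-share dispatcher; class 0 has the highest priority."""

    def __init__(self, num_classes):
        self.queues = [deque() for _ in range(num_classes)]  # Q_k, FIFO
        self.allocated = [0] * num_classes                   # R_k
        self.budgets = [0] * num_classes                     # B_k(m)

    def set_budgets(self, budgets):
        """Called at each sampling instant with the Controllers' output."""
        self.budgets = list(budgets)

    def enqueue(self, class_id, connection):
        """Insert an accepted and classified connection into its class queue."""
        self.queues[class_id].append(connection)

    def dispatch(self):
        """Called when a server process becomes free.

        Returns (class_id, connection) for the highest-priority eligible class
        (R_j < B_j) that has a waiting connection, or None if there is none.
        """
        for k, queue in enumerate(self.queues):
            if queue and self.allocated[k] < self.budgets[k]:
                self.allocated[k] += 1
                return k, queue.popleft()
        return None

    def release(self, class_id):
        """Called when a server process closes a connection of class_id."""
        self.allocated[class_id] -= 1


scheduler = ConnectionScheduler(num_classes=2)
scheduler.set_budgets([1, 2])
scheduler.enqueue(0, "a0")
scheduler.enqueue(0, "a1")
scheduler.enqueue(1, "b0")
first = scheduler.dispatch()   # class 0 is eligible and has the highest priority
second = scheduler.dispatch()  # class 0 has used its budget, so class 1 is served
print(first, second)
```

Because the sketch is non-preemptive, a reduced budget only takes effect as processes call `release`, which mirrors the behavior described for over-budget classes.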

5.4.2. Server Processes

The second component of the architecture (Figure 5.1) is a fixed pool of server

processes. Every server process reads connection descriptors from the Connection

Scheduler. Once a server process closes a TCP connection, it notifies the Connection

Scheduler and becomes available to process new connections.

5.4.3. Monitor

The Monitor is invoked at each sampling instant m. It computes the average

connection delays {Ck(m) | 0 ≤ k < N} of all classes during the last sampling period. The

sampled connection delays are used by the Controllers to compute new process

budgets.
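A minimal sketch of the Monitor's per-period averaging, assuming each accepted connection reports its class, its arrival time, and the time it was accepted by a server process. The class name, method names, and the use of `None` for a class with no accepted connections are illustrative choices; the text does not specify how an empty period is handled.

```python
class Monitor:
    """Accumulates connection delays per class and averages them once per sampling period."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self._reset()

    def _reset(self):
        self.delay_sums = [0.0] * self.num_classes
        self.counts = [0] * self.num_classes

    def record(self, class_id, arrival_time, accept_time):
        """Record one connection: delay = time accepted by a process - arrival time."""
        self.delay_sums[class_id] += accept_time - arrival_time
        self.counts[class_id] += 1

    def sample(self):
        """Return {C_k(m)} for the period that just ended and start a new period.

        A class with no accepted connections in the period yields None.
        """
        averages = [s / n if n else None
                    for s, n in zip(self.delay_sums, self.counts)]
        self._reset()
        return averages


monitor = Monitor(num_classes=2)
monitor.record(0, arrival_time=0.0, accept_time=1.0)
monitor.record(0, arrival_time=2.0, accept_time=5.0)
sampled = monitor.sample()
print(sampled)
```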

5.4.4. Controllers

The architecture uses one Controller for each relative or absolute delay constraint. At

each sampling instant m, the Controllers compare the sampled connection delays {Ck(m) |

0 ≤ k < N} with the desired relative or absolute delays {Wk | 0 ≤ k < N}, and compute

new process budgets {Bk(m) | 0 ≤ k < N}⁶, which are used by the Connection Scheduler to

(non-preemptively) reallocate server processes during the following sampling period. We

first describe the Absolute Delay Controllers and the Relative Delay Controllers, and then

briefly describe how to compose the Hybrid Delay Controllers based on the Absolute and

Relative Delay Controllers at the end of this section.

⁶ It is the exact algorithm for this computation that control theory enables us to derive as described in the remainder of this section and Section 5.5.

The Absolute Delay Controllers

The absolute delay of every class k is controlled by a separate Absolute Delay

Controller CAk. The key parameters and variables of CAk are shown in Table 5.1.

Reference VSk: The reference of an Absolute Delay Controller CAk is the desired delay of class k, i.e., VSk = Wk.

Output Vk(m): From the Absolute Delay Controller CAk’s perspective, the system output Vk(m) at the sampling instant m is the sampled delay of class k, i.e., Vk(m) = Ck(m).

Error Ek(m): The difference between the reference and the output, i.e., Ek(m) = VSk – Vk(m).

Control input Uk(m): At every sampling instant m, the Absolute Delay Controller CAk computes the control input Uk(m), i.e., the process budget Bk(m) of class k.

Table 5.1. Variables and Parameters of the Absolute Delay Controller CAk

The goal of the Absolute Delay Controller CAk is to reduce the error Ek(m) to 0 and

achieve the desired delay for class k. Intuitively, when Ek(m) = VSk – Vk(m) < 0, the

Controller should increase the process budget Uk(m) = Bk(m) to allocate more processes

to class k. At every sampling instant m, the Absolute Delay Controller calls PI

(Proportional-Integral) control [28] to compute the control input. A digital form of the PI

control function is

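\[
\begin{aligned}
U_k(m) &= K_P \Big( E_k(m) + K_I \sum_{j=0}^{m} E_k(j) \Big) && \text{(a)} \\
U_k(m) &= U_k(m-1) + g \big( E_k(m) - r\,E_k(m-1) \big), \quad g = K_P (1 + K_I), \quad r = \frac{1}{1 + K_I} && \text{(b)}
\end{aligned}
\tag{5.1}
\]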

where g and r (or KP and KI) are design parameters called the controller gain and the

controller zero, respectively. Equations 5.1(a) and 5.1(b) are equivalent to each other,

and Equation 5.1(b) is used in our implementation. Performance of the web server

depends on the values of the controller parameters. An ad hoc approach to design the

controller is to conduct laborious experiments on different values of the parameters.

Instead, we apply control theory to tune the parameters analytically to guarantee the

desired performance in the web server. The design and tuning methodology is presented

in Section 5.5.
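As a rough illustration of the PI control function, the following Python sketch implements the incremental form U(m) = U(m-1) + g(E(m) - rE(m-1)) and checks numerically that it agrees with a positional form U(m) = KP(E(m) + KI ∑ E(j)) when g = KP(1 + KI) and r = 1/(1 + KI). The function names, the zero initial conditions U(-1) = E(-1) = 0, and the sample error values are assumptions made for this example; they are not taken from the server implementation.

```python
def pi_positional(errors, kp, ki):
    """Positional PI form: U(m) = KP * (E(m) + KI * sum of E(0..m))."""
    outputs = []
    error_sum = 0.0
    for e in errors:
        error_sum += e
        outputs.append(kp * (e + ki * error_sum))
    return outputs


def pi_incremental(errors, g, r):
    """Incremental PI form: U(m) = U(m-1) + g * (E(m) - r * E(m-1)).

    Assumes U(-1) = 0 and E(-1) = 0 (an assumption of this sketch).
    """
    outputs = []
    u_prev = 0.0
    e_prev = 0.0
    for e in errors:
        u = u_prev + g * (e - r * e_prev)
        outputs.append(u)
        u_prev, e_prev = u, e
    return outputs


def gains_from_kp_ki(kp, ki):
    """Map (KP, KI) to the controller gain g and the controller zero r."""
    return kp * (1.0 + ki), 1.0 / (1.0 + ki)


if __name__ == "__main__":
    errors = [-1.4, -0.9, -0.3, 0.1, 0.0]  # hypothetical error samples
    kp, ki = 0.2, 0.5                      # hypothetical gains
    g, r = gains_from_kp_ki(kp, ki)
    for a, b in zip(pi_positional(errors, kp, ki), pi_incremental(errors, g, r)):
        print(f"{a:.6f} {b:.6f}")
```

The incremental form needs only the previous control input and the previous error, which is one plausible reason to prefer it in an implementation that runs once per sampling period.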

For a system with N service classes, the Absolute Delay Guarantee is enforced by N Absolute Delay Controllers CAk (0 ≤ k < N). At each sampling instant m, each Controller CAk computes the process budget of class k. Note that in overload conditions, the process budgets (especially those of low priority classes) computed by the Absolute Delay Controllers may not be feasible if the sum of the computed process budgets of all classes exceeds the total number of server processes M, i.e., ∑k Bk(m) > M. This situation is called control saturation. Because low priority classes should suffer guarantee violations in overload conditions, the system always satisfies the computed process budgets in decreasing order of priority until every server process has been allocated to a class7.
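The handling of control saturation can be sketched as follows. This is a minimal illustration, assuming that class 0 has the highest priority and that budgets are non-negative process counts; the function name resolve_saturation is hypothetical and the sketch does not model the minimum reservation mentioned in the footnote.

```python
def resolve_saturation(requested_budgets, total_processes):
    """Grant process budgets in decreasing priority order.

    requested_budgets[k] is the budget computed for class k, where index 0
    is assumed to be the highest priority class. Each class receives its
    requested budget as long as processes remain; lower priority classes
    absorb the shortfall.
    """
    remaining = total_processes
    granted = []
    for budget in requested_budgets:
        share = min(max(budget, 0), remaining)
        granted.append(share)
        remaining -= share
    return granted


if __name__ == "__main__":
    # Requests sum to 180 but only 128 processes exist.
    print(resolve_saturation([80, 60, 40], 128))
```

When the requests do not exceed M, the sketch returns them unchanged.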

The Relative Delay Controllers

The relative delay of every two adjacent classes k and k-1 is controlled by a separate Relative Delay Controller CRk. Each Relative Delay Controller CRk has the following key parameters and variables. For simplicity of discussion, we use the same notation for the corresponding parameters and variables of the Absolute Delay Controllers and the Relative Delay Controllers.

7 To avoid complete starvation of low priority classes, the system may reserve a minimum number of server processes for each service class.

Reference VSk: The reference of the Relative Delay Controller CRk is the desired delay ratio between class k and k-1, i.e., VSk = Wk/Wk-1.

Output Vk(m): From the perspective of the Relative Delay Controller CRk, the system output is the sampled delay ratio between class k and k-1, i.e., Vk(m) = Ck(m) / Ck-1(m).

Error Ek(m): The difference between the reference and the output, i.e., Ek(m) = VSk – Vk(m).

Control input Uk(m): At every sampling instant m, CRk computes the control input Uk(m), defined as the ratio (called the process ratio) between the numbers of processes to be allocated to class k-1 and class k, i.e., Uk(m) = Bk-1(m) / Bk(m).

Table 5.2. Variables and Parameters of the Relative Delay Controller CRk

Intuitively, when Ek(m) < 0, CRk should decrease the process ratio Uk(m) to allocate more processes to class k relative to class k-1. The goal of the controller CRk is to reduce the error Ek(m) to 0 and achieve the correct delay ratio between class k and k-1. Similar to the Absolute Delay Controller, the Relative Delay Controller also uses PI (Proportional-Integral) control (Equation (5.1)) to compute the control input (note that the parameters and variables are interpreted differently in the Absolute Delay Controller and the Relative Delay Controller).

For a system with N service classes, the Relative Delay Guarantee is enforced by N-1 Relative Delay Controllers CRk (1 ≤ k < N). At every sampling instant m, the system calculates the process budget Bk(m) of each class k as follows.

control_relative_delay ({Wk | 0 ≤ k < N}, {Ck(m) | 0 ≤ k < N}) {
    set class (N-1)’s process proportion PN-1(m) = 1;
    S = PN-1(m);
    for ( k = N-2; k ≥ 0; k--) {
        call CRk+1 to get the process ratio Uk+1(m) between class k and k+1;
        set the process proportion of class k to Pk(m) = Pk+1(m) Uk+1(m);
        S = S + Pk(m);
    }
    for ( k = N-1; k ≥ 0; k--)
        Bk(m) = M (Pk(m) / S);
}
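A runnable sketch of the budget computation is given below. It assumes that the N-1 process ratios have already been produced by the Relative Delay Controllers and are passed in as a list, where entry k holds the ratio between the budgets of class k and class k+1; the function name and the use of fractional budgets (a real server would presumably round them to whole processes) are assumptions of this example.

```python
def relative_delay_budgets(process_ratios, total_processes):
    """Convert adjacent-class process ratios into per-class process budgets.

    process_ratios[k] is the ratio B_k / B_(k+1) for k = 0 .. N-2, i.e. the
    control input of the controller for classes k and k+1.
    Returns a list of N (possibly fractional) budgets that sum to
    total_processes.
    """
    n = len(process_ratios) + 1
    proportions = [0.0] * n
    proportions[n - 1] = 1.0  # the last class is the reference proportion
    for k in range(n - 2, -1, -1):
        proportions[k] = proportions[k + 1] * process_ratios[k]
    total = sum(proportions)
    return [total_processes * p / total for p in proportions]


if __name__ == "__main__":
    # Two classes, process ratio B0/B1 = 3, 128 server processes.
    print(relative_delay_budgets([3.0], 128))
```

Normalizing by the sum of the proportions guarantees that the budgets never exceed the number of server processes, so no separate saturation step is needed in this mode.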

The Hybrid Delay Controllers

The hybrid delay guarantee described in Section 5.3 can be implemented via dynamic

switching between the Absolute Delay Controllers and the Relative Delay Controllers.

The server switches from Relative Delay Controllers to Absolute Delay Controllers if the absolute delay guarantee of the highest priority class is violated, i.e., C0(m) > W0 + H. Conversely, the server switches from Absolute Delay Controllers to Relative Delay Controllers if C0(m) < W0 - H. The threshold window ±H in the mode switching condition avoids thrashing between the two sets of Controllers. Since the hybrid delay guarantee is a straightforward extension of absolute and relative delay guarantees, we focus on the design and evaluation of absolute and relative delay guarantees in the rest of this chapter.
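The mode switching rule amounts to a simple hysteresis test, sketched below. The mode labels "relative" and "absolute" and the function name next_mode are hypothetical names chosen for this illustration.

```python
def next_mode(mode, c0, w0, h):
    """Return the controller mode for the next sampling period.

    mode: current mode, "relative" or "absolute" (labels assumed here).
    c0:   sampled delay C0(m) of the highest priority class.
    w0:   desired absolute delay W0 of that class.
    h:    half-width H of the threshold window.
    """
    if mode == "relative" and c0 > w0 + h:
        return "absolute"
    if mode == "absolute" and c0 < w0 - h:
        return "relative"
    return mode  # inside the window: keep the current mode


if __name__ == "__main__":
    mode = "relative"
    for delay in [5.5, 6.5, 5.5, 4.5, 3.5]:  # hypothetical delay samples
        mode = next_mode(mode, delay, w0=5.0, h=1.0)
        print(delay, mode)
```

A delay that stays between W0 - H and W0 + H never triggers a switch, which is what prevents thrashing.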

In summary, we have presented a feedback control architecture to achieve absolute, relative and hybrid delay guarantees on web servers. A key component in this architecture is the Controllers, which are responsible for dynamically computing correct process budgets in the face of unpredictable workload and system variations. In the rest of this chapter, we use the closed-loop server to refer to the adaptive web server with the Controllers, while the open-loop server refers to a non-adaptive web server without the Controllers. We present the design and tuning of the Controllers in the next section.

5.5. Design of the Controller

In this Section, we apply the FCS framework to design the Relative Delay Controller CRk


and the Absolute Delay Controller CAk. In Section 5.5.1, we specify the performance

requirement of the Controllers. We then use system identification techniques to establish

dynamic models for the web server in Section 5.5.2. Based on the dynamic model, we use

the Root Locus method to design the Controllers that meet the performance specifications

(Section 5.5.3).

5.5.1. Performance Specifications

In this section, we use the performance specifications of the FCS framework to

characterize the performance requirement of our web server in terms of service delay

guarantees. The performance specifications of the web server include the following:

• Stability: a (BIBO) stable system should have bounded output in response to

bounded input. To the Relative Delay Controller (with a finite delay ratio

reference), stability requires that the delay ratio should always be bounded at run-

time. To the Absolute Delay Controller, stability requires that the service delay

should always be bounded at run-time. Stability is a necessary condition for

achieving desired relative or absolute delays.

• Settling time Ts is the time it takes the output to converge to within 2% of the

reference and enter steady state. The settling time represents the efficiency of the

Controller, i.e., how fast the server can converge to the desired relative or

absolute delay. As an example, we assume that our web server requires the

settling time Ts < 5 min.

• Steady state error Es is the difference between the reference input and average of

output in steady state. The steady state error represents the accuracy of the


Relative Delay Controller or the Absolute Delay Controller in achieving the

desired relative or absolute delay. As an example, we assume that our web server

requires a steady state error |Es| < 0.1VS. Note that satisfying this steady state error

requirement means that the web server can achieve the desired relative or absolute

delays in steady state.

5.5.2. Modeling the Web Server: A System Identification Approach

A dynamic model describes the mathematical relationship between the input and the

output of a system (usually with differential or difference equations). This dynamic

model provides a basis for the analytical design of the Controller. In Section 4.3, we approximated the aggregate dynamics of generic CPU-bound real-time systems with an intuitive first order model. Unfortunately, such an analytical modeling approach cannot be easily applied to computer systems with more complicated or unknown dynamics, such as web servers. We therefore adopt an empirical approach, applying system identification [11] to estimate the model of such systems based on system profiling data.

In our system identification approach, the controlled computer system is approximated as a linear model described by a difference equation with unknown parameters. A system engineer or administrator first uses a workload generator to stimulate the controlled system with pseudo-random digital white-noise input [61], and then uses a least squares estimator [11] to estimate the model parameters. This system identification methodology provides a practical solution for modeling computing systems with unknown dynamics, which has been a major barrier to applying feedback control in such systems. We have also developed a software tool (as illustrated in Figure 5.2) to facilitate the system identification of web servers.

We now apply system identification to establish a dynamic model for the controlled

web server system (including the Connection scheduler, the server processes, and the

Monitor) for the purpose of controlling relative/absolute delays. From the perspective of

a Relative Delay Controller CRk, the control input to the controlled system is the process

ratio Uk(m) = Bk-1(m)/Bk(m). The output of the controlled system is the delay ratio Vk(m) =

Ck(m)/Ck-1(m). From the perspective of an Absolute Delay Controller CAk, the (control)

input of the controlled system is the process budget Uk(m) = Bk(m). The output of the

controlled system is the delay Vk(m) = Ck(m). We intentionally use the same symbols for

input and output for Relative and Absolute Delay Controllers because the design

methodology described below applies to both cases. Assuming the controlled system

models for different classes are similar, we skip the class number k of Uk(m) and Vk(m) in

the rest of this Section. Our experimental results (Section 5.7.2) establish that, for both

relative and absolute delay control, the controlled system can be modeled as a second

order difference equation with adequate accuracy for the purpose of control design. The

architecture for system identification is illustrated in Figure 5.2. We describe the

components of the architecture in the following subsections.


[Figure 5.2 diagram. Components: white-noise generator, TCP listen queue, Connection Scheduler, server processes, Monitor, and least squares estimator. Signals: TCP connection requests, HTTP service requests and responses, process budgets {B0, B1}, sampled delays {C0, C1}, and the estimated model parameters.]

Figure 5.2. Architecture for system identification

Model Structure

The web server is modeled as a difference equation with unknown parameters, i.e., an

nth order model can be described as follows,

$$V(m) = \sum_{j=1}^{n} a_j V(m-j) + \sum_{j=1}^{n} b_j U(m-j) \qquad (5.2)$$

In an nth order model, there are 2n parameters {aj, bj | 1 ≤ j ≤ n} that need to be

decided by the least-squares estimator. The difference equation model is motivated by the

fact that the output of an open-loop server depends on previous inputs and outputs.

Intuitively, the dynamics of a web server are due to the queuing of connections and the

non-preemptive scheduling mechanism. For example, the connection delay may depend

on the number of server processes allocated to its class in several previous sampling

periods. For another example, after class k’s process budget is increased, the Connection

Scheduler has to wait for connections of other classes to voluntarily release server

processes to reclaim enough processes to class k.


White Noise Input

To stimulate the dynamics of the open-loop server, we use a pseudo-random digital

white noise generator to randomly switch two classes’ process budgets between two

configurations. White noise input has been commonly used for system identification [11].

We do not describe the white noise algorithm here because we use a standard algorithm [61].

Least Squares Estimator

The least squares estimator is the key component for system identification. In this

section, we review its mathematical formulation and describe its use to estimate the

model parameters. The derivation of estimator equations is given in [12]. The estimator is

invoked periodically at every sampling instant. At the mth sampling instant, it takes as

input the current output V(m), n previous outputs V(m-j) (1 ≤ j ≤ n), and n previous

inputs U(m-j) (1 ≤ j ≤ n). The measured output V(m) is fit to the model described in

Equation (5.2). Define the vector q(m) = (V(m-1) … V(m-n) U(m-1) …U(m-n))T, and the

vector θ(m) = (a1(m)…an(m) b1(m)… bn(m))T, i.e., the estimations of the model

parameters in Equation (5.2). These estimates are initialized to 1 at the start of the

estimation. Let R(m) be a square matrix whose initial value is set to a diagonal matrix

with the diagonal elements set to 10. The estimator’s equations at sampling instant m are

[11]:

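$$\gamma(m) = \big( q(m)^T R(m-1)\, q(m) + 1 \big)^{-1} \qquad (5.3)$$

$$\theta(m) = \theta(m-1) + R(m-1)\, q(m)\, \gamma(m) \big( V(m) - q(m)^T \theta(m-1) \big) \qquad (5.4)$$

$$R(m) = R(m-1) \big( I - q(m)\, \gamma(m)\, q(m)^T R(m-1) \big) \qquad (5.5)$$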

At any sampling instant, the estimator can “predict” a value Vp(m) of the output by substituting the current estimates θ(m) into Equation (5.2). The difference V(m)-Vp(m) between the measured output and the prediction is the estimation error. It has been proved that the least squares estimator iteratively updates the parameter estimates at each sampling instant such that the sum of squared estimation errors ∑0≤i≤m (V(i) - Vp(i))^2 is minimized.
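To make the estimator concrete, the sketch below implements one recursive least squares update, with γ(m) = (q(m)ᵀR(m-1)q(m) + 1)⁻¹, θ(m) = θ(m-1) + γ(m)R(m-1)q(m)(V(m) - q(m)ᵀθ(m-1)) and R(m) = R(m-1) - γ(m)R(m-1)q(m)q(m)ᵀR(m-1), and applies it to a synthetic, noise-free second order system driven by a two-level pseudo-random input. The synthetic plant, the zero initial outputs, the input levels, the random seed, and all function names are assumptions of this example; a real web server would add measurement noise and would not follow the model exactly.

```python
import random


def rls_update(theta, R, q, v):
    """One recursive least squares step.

    theta: current parameter estimates (list of floats)
    R:     current square matrix (list of lists), assumed symmetric
    q:     regressor vector (previous outputs, then previous inputs)
    v:     measured output V(m)
    """
    n = len(theta)
    Rq = [sum(R[i][j] * q[j] for j in range(n)) for i in range(n)]
    gamma = 1.0 / (sum(q[i] * Rq[i] for i in range(n)) + 1.0)
    error = v - sum(q[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gamma * Rq[i] * error for i in range(n)]
    # R stays symmetric, so R q q^T R is the outer product of Rq with itself.
    new_R = [[R[i][j] - gamma * Rq[i] * Rq[j] for j in range(n)]
             for i in range(n)]
    return new_theta, new_R


def identify_second_order(true_params, steps, seed):
    """Estimate (a1, a2, b1, b2) of a synthetic second order plant."""
    a1, a2, b1, b2 = true_params
    rng = random.Random(seed)
    theta = [1.0] * 4                      # estimates initialized to 1
    R = [[10.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    v1 = v2 = 0.0                          # V(m-1), V(m-2)
    u1 = u2 = 1.0                          # U(m-1), U(m-2)
    for _ in range(steps):
        v = a1 * v1 + a2 * v2 + b1 * u1 + b2 * u2  # noise-free plant output
        theta, R = rls_update(theta, R, [v1, v2, u1, u2], v)
        v2, v1 = v1, v
        u2, u1 = u1, rng.choice([1.0, 3.0])        # two-level random input
    return theta


if __name__ == "__main__":
    estimate = identify_second_order((0.74, -0.37, 0.95, -0.12), 2000, seed=1)
    print([round(x, 3) for x in estimate])
```

Because the synthetic data contain no noise, the estimates approach the true parameters as samples accumulate; the remaining bias comes only from the initial values of θ and R.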

Our system identification results (Section 5.7.2) established that the controlled

system can be modeled as a second order difference equation,

V(m) = a1V(m-1) + a2V(m-2) + b1U(m-1) + b2U(m-2) (5.6a)

In the case of relative delay control, V(m) denotes the delay ratio between the two

controlled classes, and U(m) denotes the process ratio between the two controlled classes,

and the estimated model parameters are (Section 5.7.2):

(a1, a2, b1, b2) = (0.74, -0.37, 0.95, -0.12) (5.6b)

In the case of absolute delay control, V(m) denotes the delay of one controlled class,

and U(m) denotes the process budget of the controlled class. The estimated model

parameters based on system identification experiments (Section 5.7.2) are

(a1, a2, b1, b2) = (-0.08, -0.2, -0.2, -0.05) (5.6c)

5.5.3. Root-Locus Design

Given a model described by Equation (5.6a), we can apply control theory to design

the Relative Delay Controller and the Absolute Delay Controller. The controlled system

model in Equation (5.6a) can be converted to a transfer function G(z) in the z-domain (Equation (5.7)). The transfer function of the PI controller (Equation (5.1)) in the z-domain is

Equation (5.8). Given the controlled system model and the Controller model, the transfer

function of the closed loop system is Equation (5.9).


$$G(z) = \frac{V(z)}{U(z)} = \frac{b_1 z + b_2}{z^2 - a_1 z - a_2} \qquad (5.7)$$

$$D(z) = \frac{g (z - r)}{z - 1} \qquad (5.8)$$

$$G_c(z) = \frac{D(z) G(z)}{1 + D(z) G(z)} \qquad (5.9)$$
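For readers less familiar with the z-transform, the conversion of the second order difference equation (5.6a) and of the incremental PI control law into transfer functions can be sketched as follows. The derivation assumes zero initial conditions, so that a delay of j sampling periods corresponds to multiplication by z^{-j}; E(z) denotes the transform of the error.

```latex
\begin{aligned}
V(z) &= a_1 z^{-1} V(z) + a_2 z^{-2} V(z) + b_1 z^{-1} U(z) + b_2 z^{-2} U(z) \\
\Rightarrow\; \frac{V(z)}{U(z)} &= \frac{b_1 z^{-1} + b_2 z^{-2}}{1 - a_1 z^{-1} - a_2 z^{-2}}
  = \frac{b_1 z + b_2}{z^2 - a_1 z - a_2} \\[4pt]
U(z) &= z^{-1} U(z) + g \left( E(z) - r z^{-1} E(z) \right) \\
\Rightarrow\; \frac{U(z)}{E(z)} &= \frac{g \left( 1 - r z^{-1} \right)}{1 - z^{-1}}
  = \frac{g (z - r)}{z - 1}
\end{aligned}
```

The pole of the controller at z = 1 is the discrete-time integrator, and the controller zero sits at z = r.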

We use the Root Locus [32] to tune the controller gain g and the controller zero r so

that the performance specs can be satisfied. We only summarize results of the design in

this thesis. The details of the design process can be found in control textbooks such as

[32].

To design the Relative Delay Controller, we use the Root Locus tool to plot the traces of the closed-loop poles (based on the model parameters in Equation (5.6b)) as the controller gain increases; the traces are illustrated on the z-plane in Figure 5.3. The closed-loop poles are placed at

p0 = 0.70, p1,2 = 0.38±0.62i (5.10a)

by setting the Relative Delay Controller’s parameters to

g = 0.3, r = 0.05 (5.10b)

Similarly, to design the Absolute Delay Controller (based on the model parameters in Equation (5.6c)), the closed-loop poles are placed at

p0 = 0.607, p1,2 = -0.30±0.59i (5.11a)

by setting the Absolute Delay Controller’s parameters to

g = -4.6, r = 0.3 (5.11b)

The above pole placement is chosen to achieve the following properties in the closed

loop system [32]:


Stability: The closed-loop system with the Relative Delay Controller (with

parameters in Equation (5.10b)) or the Absolute Delay Controller (with

parameters in Equation (5.11b)) guarantees stability because all the closed-loop

poles are in the unit circle, i.e., |pj| < 1 (0 ≤ j ≤ 2) (Equations (5.10a) and (5.11a)).

Settling time: According to control theory, decreasing the radius (i.e., the

distance to the origin in the z-plane) of the closed-loop poles usually results in

shorter settling time. The Relative Delay Controller (with Equation (5.10b))

achieves a settling time of 270 sec, and the Absolute Delay Controller (with

Equation (5.11b)) achieves a settling time of 210 sec, both lower than the required

settling time (300 sec) defined in Section 5.5.1.

Steady state error: Both the Relative Delay Controller and the Absolute Delay

Controller achieve zero steady state error, i.e., Es = 0. This result can be easily

proved using the Final Value Theorem in digital control theory [28]. This result

means that, in steady state, the closed-loop system with the Relative Delay

Controller or the Absolute Delay Controller guarantees the desired relative delays

or the desired absolute delays, respectively.
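The zero steady state error can be outlined with a short calculation. The sketch below assumes the plant G(z) = (b1 z + b2)/(z^2 - a1 z - a2) corresponding to Equation (5.6a), the PI controller D(z) = g(z - r)/(z - 1), a step reference of magnitude VS, a stable closed loop, and g(1 - r)(b1 + b2) ≠ 0; it is an outline rather than a complete proof.

```latex
\begin{aligned}
G_c(z) &= \frac{D(z) G(z)}{1 + D(z) G(z)}
        = \frac{g (z - r)(b_1 z + b_2)}{(z - 1)(z^2 - a_1 z - a_2) + g (z - r)(b_1 z + b_2)} \\[4pt]
\lim_{m \to \infty} V(m)
       &= \lim_{z \to 1} (z - 1)\, G_c(z)\, \frac{V_S\, z}{z - 1}
        = G_c(1)\, V_S
        = \frac{g (1 - r)(b_1 + b_2)}{0 + g (1 - r)(b_1 + b_2)}\, V_S
        = V_S \\[4pt]
E_s &= V_S - \lim_{m \to \infty} V(m) = 0
\end{aligned}
```

The term (z - 1)(z^2 - a1 z - a2) vanishes at z = 1 because of the integrator in the controller, which is why the result does not depend on the particular model parameters.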

In summary, using feedback control theory techniques including system identification

and the Root Locus design, we systematically design the Relative Delay Controller and

the Absolute Delay Controller that analytically provide the desired relative or absolute

delay guarantee and meet the transient and steady state performance specifications

described in Section 5.5.1. This result further shows the strength of the control-theory-

based design framework for adaptive computing systems.
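The reported pole locations can be cross-checked numerically. The sketch below assumes that the closed-loop characteristic equation is (z - 1)(z^2 - a1 z - a2) + g(z - r)(b1 z + b2) = 0, which follows from a plant (b1 z + b2)/(z^2 - a1 z - a2) and a PI controller g(z - r)/(z - 1); the function names and the bisection-based root finder are choices made for this example and are not the Root Locus tool used in the design.

```python
import cmath


def closed_loop_char_poly(a1, a2, b1, b2, g, r):
    """Coefficients (c2, c1, c0) of z^3 + c2 z^2 + c1 z + c0 = 0, obtained by
    expanding (z - 1)(z^2 - a1 z - a2) + g (z - r)(b1 z + b2)."""
    c2 = -(a1 + 1.0) + g * b1
    c1 = (a1 - a2) + g * (b2 - r * b1)
    c0 = a2 - g * r * b2
    return c2, c1, c0


def cubic_roots(c2, c1, c0):
    """Roots of a monic real cubic: bisection for one real root, then
    deflation to a quadratic solved in closed form."""
    def p(z):
        return ((z + c2) * z + c1) * z + c0

    bound = 1.0 + max(abs(c2), abs(c1), abs(c0))  # Cauchy bound on |root|
    lo, hi = -bound, bound                        # p(lo) < 0 < p(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    real_root = 0.5 * (lo + hi)
    # p(z) = (z - real_root)(z^2 + d1 z + d0)
    d1 = c2 + real_root
    d0 = c1 + real_root * d1
    disc = cmath.sqrt(d1 * d1 - 4.0 * d0)
    return [complex(real_root), (-d1 + disc) / 2.0, (-d1 - disc) / 2.0]


if __name__ == "__main__":
    relative = cubic_roots(*closed_loop_char_poly(0.74, -0.37, 0.95, -0.12, 0.3, 0.05))
    absolute = cubic_roots(*closed_loop_char_poly(-0.08, -0.2, -0.2, -0.05, -4.6, 0.3))
    for name, poles in (("relative", relative), ("absolute", absolute)):
        print(name, [f"{p.real:.3f}{p.imag:+.3f}i" for p in poles],
              "stable" if all(abs(p) < 1.0 for p in poles) else "unstable")
```

With the parameters of Equations (5.6b) and (5.10b), and of (5.6c) and (5.11b), the computed poles agree with Equations (5.10a) and (5.11a) to within rounding, and all of them lie inside the unit circle.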


[Figure 5.3 plot on the z-plane; legend: Root Locus, Closed Loop Poles.]

Figure 5.3. The Root Locus of the web server model

5.6. Implementation

We now describe the implementation of the web server. We modified the source code

of Apache 1.3.9 [11] and implemented a new library as a Connection Manager (including

the Connection Scheduler, the Monitor and the Controllers). The server was written in C

on a Linux platform. The server is composed of a Connection Manager process and a

fixed pool of server processes (modified from Apache). The Connection Manager process

communicates with each server process with a separate UNIX domain socket.

The Connection Manager runs a loop that listens to the web server’s TCP socket

and accepts incoming TCP connection requests. Each connection request is

classified based on its sender’s IP address8 and scheduled by a Connection

Scheduler function. The Connection Scheduler dispatches a connection by

sending its descriptor to a free server process through the corresponding UNIX


domain socket. The Connection Scheduler time-stamps the acceptance and

dispatching of each connection. The difference between the acceptance and the

dispatching time is recorded as the connection delay of the connection. Strictly

speaking, the connection delay should also include the queuing time in the TCP

listen queue in the kernel. However, the kernel delay is negligible in this case

because the Connection Manager always greedily accepts (dequeues) all incoming

connection requests in a tight loop.

The Monitor and the Controllers are invoked periodically at every sampling instant. For each invocation, the Monitor computes the average delay for each class. This information is then passed to the Controllers, which implement the control algorithm to compute new process budgets.

We modified the code of the Apache server processes so that they accept

connection descriptors from UNIX domain sockets (instead of common TCP

listen socket as in the original Apache server). When a server process closes a

connection, it notifies the Connection Manager of its new status by sending a byte

of data to the Connection Manager through the UNIX domain socket.
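The delay bookkeeping performed by the Connection Scheduler and the Monitor can be sketched in a few lines. The actual server is written in C inside the Connection Manager; the Python class below, including the name DelayMonitor and its methods, is a hypothetical illustration of the logic only.

```python
from collections import defaultdict


class DelayMonitor:
    """Collects connection delays per class and averages them per period."""

    def __init__(self):
        self._delays = defaultdict(list)

    def record(self, service_class, accept_time, dispatch_time):
        """Store one connection delay: dispatch time minus acceptance time."""
        self._delays[service_class].append(dispatch_time - accept_time)

    def sample(self):
        """Return {class: average delay} for the period and start a new one."""
        averages = {k: sum(v) / len(v) for k, v in self._delays.items() if v}
        self._delays.clear()
        return averages


if __name__ == "__main__":
    monitor = DelayMonitor()
    monitor.record(0, accept_time=1.0, dispatch_time=3.0)
    monitor.record(0, accept_time=2.0, dispatch_time=6.0)
    monitor.record(1, accept_time=0.0, dispatch_time=9.0)
    print(monitor.sample())
```

A class with no dispatched connections in a sampling period has no entry in the result; how the Controllers should treat such a period is left open in this sketch.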

The server can be configured to a closed-loop/open-loop server by turning on/off the

Controllers. An open-loop server can be configured for either system identification or

performance evaluation.

8 Other classification criteria include cookies, browser plug-ins, request type/path, and virtual servers [17].


5.7. Experimentation

All experiments were conducted on a testbed of five PCs connected with 100 Mbps

Ethernet. Each machine had a 450MHz AMD K6-2 processor and 256 MB RAM. One

machine was used to run the web server with HTTP 1.1, and up to four other machines

were used to run clients that stress the server with a synthetic workload. The

experimental setup was as follows.

• Client: We used SURGE [14] to generate realistic web workloads in our

experiments. SURGE uses a number of user equivalents (also called users for

simplicity) to emulate the behavior of real-world clients. The load on the server can

be adjusted by changing the number of users on the client machines. Up to 500

concurrent users were used in our experiments.

• Server: The total number of server processes was configured to 128. Since service

differentiation is most necessary when the server is overloaded, we set up the

experiment such that the ratio between the number of users and the number of server

processes could drive the server to overload. Note that although large web servers

such as on-line trading servers usually have more server processes, they also tend to

have many more users than the workload we generated. Therefore, our configuration

can be viewed as an emulation of real-world overload scenarios at a smaller scale. The

sampling period S was set to 30 sec in all the experiments. The connection

TIMEOUT of HTTP 1.1 was set to 15 sec.

In Section 5.7.1, we present experimental results that compare connection delays with


response time of a server with HTTP 1.1. The experiments on system identification are

presented in Section 5.7.2. We present the evaluation of the closed-loop server in Section

5.7.3.

[Figure 5.4 plot: Connection Delay and Response Time, Time (sec), versus Number of Users.]

Figure 5.4. Connection delay and response time

5.7.1. Comparing Connection Delays and Response Times

In the first set of experiments, we compare the average connection delay and the

average response time (per HTTP request) of an open-loop server (see Figure 5.4) to

justify the use of connection delay as a metric for service differentiation in web servers

with HTTP 1.1. All connections are treated as being in the same class and all server processes are allocated to the class. Every point in Figure 5.4 refers to the average connection delay or average response time in four 10-minute runs with the same number of users. The 90% confidence intervals are within 0.58 sec for all the presented average connection delays, and within 0.21 sec for all the presented average response times. The

connection delay is significantly higher and increases at a much faster rate than the

response time as the number of users increases. For example, when the number of users is

400, the connection delay is 4.9 times the response time. Note that the average response

time is computed based on two types of requests, i.e., the response time (including the


connection delay and the processing delay) of the first request of each connection and the

response time (including only the processing time) of each subsequent request. The

difference between connection delay and response time is due to the fact that processing

delay is on average significantly shorter than connection delay. We also ran similar experiments with 256 server processes (the maximum number allowed by the original Apache on Linux). With 256 server processes, the ratio between the connection delay and the response time is similar to that presented in Figure 5.4. For example, the connection delay was 5.3 times the response time when 400 users were used. This result justifies our

decision to use connection delay as a metric for service differentiation in web servers

with HTTP 1.1.


[Figure 5.5 panels, each plotted against Time (second) from 0 to 1800: (a) Estimated model parameters a1, a2, b1, b2 (second order model); (b) Modeling error (second-order model), actual and estimated Delay Ratio; (c) Modeling error (first-order model), actual and estimated Delay Ratio; (d) Modeling error (third-order model), actual and estimated Delay Ratio.]

Figure 5.5. System identification results for Relative Delay

5.7.2. System Identification

We now present the results of system identification experiments for both relative

delay and absolute delay to establish a dynamic model for the open-loop system. Four

client machines are divided into two classes 0 and 1, and each class has 200 users. We

begin with the relative delay experiments. The input, process ratio U(m) = B0(m)/B1(m), is

initialized to 1. At each sampling instant, the white noise randomly sets the process ratio

to 3 or 1. The sampled output, the relative delay V(m) = C1(m)/C0(m) is fed to the least

squares estimator to estimate model parameters (Equation (5.2)). Figure 5.5(a) shows the estimated parameters of a second order model (Equation (5.6a)) at successive sampling instants in a 30 min run. The estimator and the white noise generator are turned on 2 min after SURGE started in order to avoid its start-up phase. We can see that the estimates of the parameters (a1, a2, b1, b2) converge to (0.74, -0.37, 0.95, -0.12). Substituting the

estimations into Equation (5.6), we established an estimated second-order model for the

open-loop server. To verify the accuracy of the model, we re-run the experiment with a

different white noise input (i.e., with a different random seed) to the open-loop server and

compare the actual delay ratio and that predicted by the estimated model. The result is

illustrated in Figure 5.5(b). We can see that prediction of the estimated model is

consistent with the actual relative delay throughout the 30 min run. This result shows that

the estimated second order model is adequate for designing the Relative Delay

Controller. We also re-ran the system identification experiments to estimate a first order

model and a third order model. The results demonstrate that the estimated first order

model had larger prediction error than the second order model (see Figure 5.5(c)), while

an estimated third order model does not tangibly improve the modeling accuracy (see

Figure 5.5(d)). Hence the second order model is chosen as the best compromise between

accuracy and complexity.

The system identification experiments are repeated with the same workload and

configurations for the absolute delay. The input of the open loop system is the process

budget U(m) = B0(m) of class 0, which is initialized to 64. At each sampling instant, the

white noise randomly sets the process budget to 96 or 64. The output is the sampled delay

V(m) = C0(m) of class 0. To linearize the model, we feed the difference between two

consecutive inputs (B0(m) - B0(m-1)) and the difference between two consecutive outputs

(C0(m) - C0(m-1)) to the least squares estimator to estimate the model parameters in

Equation (5.2). Figure 5.6(a) shows the estimated parameters of the second order model (Equation (5.6a)) at successive sampling instants in a 30 min run. The estimates of the parameters (a1, a2, b1, b2) converge to (-0.08, -0.2, -0.2, -0.05). To verify the accuracy

of the model, we re-run the experiment with a different white noise input to the open-loop

server and compare the actual difference between two consecutive delay samples with

that predicted by the estimated model (Figure 5.6(b)). Similar to the relative delay case,

the prediction of the estimated model is consistent with the actual delay throughout the 30

min run. This result shows that the estimated second order model is adequate for

designing the Absolute Delay Controller.

[Figure 5.6 panels, each plotted against Time (second) from 0 to 1800: (a) Estimated model parameters a1, a2, b1, b2 (second-order model); (b) Modeling error (second-order model), actual and estimated C0(m)-C0(m-1) (second).]

Figure 5.6. System Identification Results for Absolute Delay

5.7.3. Evaluation of the Adaptive Web Server

In this section, we present evaluation results for our adaptive web server. We first

present the evaluation results of the Relative Delay Controller. We then present the

results for guaranteeing the relative delays of a server with three classes. The evaluation

results of absolute delay guarantee are presented in the end of this section.


Evaluation of Relative Delay Guarantees between Two Classes

To evaluate the relative delay guarantee in a server with two classes, we set up the

experiments as follows.

Workload: Four client machines are evenly divided into two classes. Each client

machine has 100 users. In the first half of each run, only one client machine from

class 0 and two client machines from class 1 (100 users from class 0 and 200

users from class 1) generate HTTP requests to the server. The second machine

from class 0 starts generating HTTP requests 870 sec later than the other three

machines. Therefore, the user population changes to 200 from class 0 and 200

from class 1 in the latter half of each run.

Closed-loop server: The reference input (the desired delay ratio between class 1

and 0) to the Controller is W1/W0 = 3. The process ratio B0(m)/B1(m) is initialized

to 1 in the beginning of the experiments. To avoid the starting phase of SURGE,

the Controller is turned on 150 sec after SURGE started. The sampled absolute

connection delays and the delay ratio between the two classes are illustrated in

Figure 5.7(a) and (b), respectively.

Open-loop server: An open-loop server is also tested as a baseline. The open-

loop server is fine-tuned to have a “correct” process allocation based on profiling

experiments using the original workload (100 class 0 users and 200 class 1 users).

The results of the open-loop server are illustrated in Figure 5.7(c)(d).

We first look at the first half of the experiment on the closed-loop server (Figure 5.7(a)(b)). When the Controller is turned on at 150 sec, the delay ratio C1(m)/C0(m) = (28.5

sec / 6.5 sec) = 4.4 due to incorrect process allocation. The Controller dynamically

reallocates processes and changes the relative delay to the vicinity of the reference W1/W0

= 3. The relative delay stays close (within 10%) to the reference at most sampling instants

after it converged. This demonstrates that the closed-loop server can guarantee the

desired relative delay. Compared with an open-loop server, a key advantage of a closed-

loop server is that it can maintain robust relative delay guarantees when workload varies.

Robust performance guarantees are especially important in web servers, which often face unpredictable and bursty workloads [26]. The robustness of our closed-loop server is

demonstrated by its response to the load variation starting at 870 sec (Figure 5.7(a)(b)).

Because the number of users of class 0 suddenly increases from 100 to 200, the delay

ratio drops from 3.2 (at 870 sec) to 1.2 (at 900 sec) - far below the reference W1/W0 = 3.

The Controller reacts to load variation by allocating more processes to class 0 while

deallocating processes from class 1. By time 1140 sec, the relative delay successfully re-

converges to 2.9.

In contrast, while the open-loop server achieves satisfactory relative delays when the

workload conforms to its expectation (from 150 sec to 900 sec), it violates the relative

delay guarantee after the workload changes (see Figure 5.7(c)(d)). After the workload

changes (from 960 sec to the end of the run), connections from class 0 consistently have

longer delays than connections from class 1.

In terms of the control metrics, the closed-loop server maintains stability because its

relative delay is clearly bounded throughout the run. We observe from Figure 5.7(b)

that the server renders satisfactory efficiency and accuracy in achieving the desired


relative delays. In particular, in response to the workload variation at time 870 sec, the

distinguishable performance deviation from the reference lasts for 180 sec

(from 900 sec to 1080 sec), well within the theoretical settling time of 270 sec based on

our design (Section 5.5.3). The delay ratio stays close to the reference in steady state,

which demonstrates a small steady state error (see footnote 9).

[Figure 5.7 plots omitted. Panels: (a) Closed-loop: Connection Delays C(0) and C(1); (b) Closed-loop: Delay Ratio (C1(m)/C0(m)) and Process Ratio (P0(m)/P1(m)); (c) Open-loop: Connection Delays C(0) and C(1); (d) Open-loop: Delay Ratio (C1(m)/C0(m)) and Process Ratio (P0(m)/P1(m)). Panels (a) and (c) plot Connection Delay (second) against Time (second) for Class 0 and Class 1; panels (b) and (d) plot the Delay Ratio C1(m)/C0(m), the Process Ratio, and the reference against Time (second).]

Figure 5.7. Evaluation Results of Relative Delay Guarantees between Two Classes

9 Due to the noise of the server caused by the random workload, it is impossible to precisely quantify the settling time and steady state error based on the ideal definitions (Section 5.5.1).

Evaluation of a Server with Three Classes

In the next experiment, we evaluate the performance of a closed-loop server with

three classes. Each class has a client machine with 100 users. The Controller is turned on

at 150 sec. The desired relative delays are (W0, W1, W2) = (1, 2, 4). The process

proportions are initialized to (P0, P1, P2) = (1, 1, 1). From Figure 5.8, we can see that the

connection delays begin at (C0, C1, C2) = (14.6, 17.3, 17.5), which has the ratio (1, 1.2,

1.2), and then change to (C0, C1, C2) = (9.3, 16.2, 33.9), which has the ratio (1, 1.7, 3.6),

i.e., close to the desired relative delay, 240 sec after the Controller is turned on. The

relative connection delay remains bounded and close to the desired relative delay in

steady state. This experiment demonstrates that the Relative Controllers can guarantee desired

relative delays for more than two classes.
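The ratios quoted above can be reproduced by normalizing each class delay by the class 0 delay. The following is only an illustrative sketch; the helper name `delay_ratios` is hypothetical and not part of the thesis.

```python
def delay_ratios(delays):
    """Normalize per-class connection delays by the class 0 delay."""
    base = delays[0]
    return [d / base for d in delays]


# Delays (in seconds) reported in the text for classes 0, 1 and 2.
initial = delay_ratios([14.6, 17.3, 17.5])
converged = delay_ratios([9.3, 16.2, 33.9])

print([round(r, 1) for r in initial])    # [1.0, 1.2, 1.2]
print([round(r, 1) for r in converged])  # [1.0, 1.7, 3.6]
```

The converged ratios (1, 1.7, 3.6) can then be compared directly with the desired (W0, W1, W2) = (1, 2, 4).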

[Figure 5.8 plot omitted: Connection Delay (second) against Time (second) for Class 0, Class 1, and Class 2.]

Figure 5.8. Evaluation Results of Relative Delay Guarantees for Three Classes

Evaluation of Absolute Delay Guarantees

In this section, we evaluate the absolute delay guarantee for two classes. The

experiment is set up as follows.

Workload: The same workload as in the experiments for relative guarantee is

used to evaluate the absolute guarantees. In the first half of each run, 100 users


from class 0 and 200 users from class 1 generate HTTP requests to the server.

Another 100 users from class 0 start generating HTTP requests 870 sec later than

the original users. Thus the user population changes to 200 from class 0 and 200

from class 1 in the latter half of each run.

Closed-loop server: The reference input (the desired delays for classes 0 and 1) to

the Controller is (W0, W1) = (10, 30) (sec). The process budgets (B0(m), B1(m)) are

initialized to 64 for each class in the beginning of the experiments. To avoid the

start up phase of SURGE, the Controller is turned on 150 sec after SURGE

started. The sampled absolute connection delays of the two classes are illustrated

in Figure 5.9(a).

Open-loop server: An open-loop server is tested as a baseline. The open-loop

server is fine-tuned to have a “correct” process allocation to achieve the desired

absolute delays based on profiling experiments using the original workload (100

class 0 users and 200 class 1 users). The results of the open-loop server are

illustrated in Figure 5.9(b).

In the first half of the experiment on the closed-loop server (Figure 5.9(a)), the

Controllers dynamically allocate processes and the delays of both classes remain close to

their desired delays (10 sec and 30 sec, respectively). At time 870 sec, the number of users

of class 0 suddenly increases from 100 to 200, and the delay of class 0 increases from 8.4

sec (at time 870 sec) to 20.0 sec (at time 900 sec) – violating its absolute delay guarantee

(10 sec). The Controllers react to the load variation by allocating more processes to class

0 and decreasing the number of processes allocated to class 1. By time 1020 sec, the


delay of class 0 successfully re-converges to 9.6 sec at the cost of violating the delay

guarantee of the low priority class (class 1) (see footnote 10).

In comparison, while the open-loop server achieves satisfactory delays for both

classes when the workload is similar to its expectation (from 150 sec to 900 sec), it fails

to provide the delay guarantee for class 0, which has the highest priority, after the workload

changes (see Figure 5.9(b)). Instead, connections from class 0 consistently have longer

delays than connections from class 1 after the workload changes, i.e., the open-loop

server fails to achieve the desired delay for the high priority class.

Note that while both the open loop server and the closed loop server violate the delay

guarantee of one service class, the closed loop server provides the correct order of

guarantee violation by discriminating against the low priority class, while the open loop

server fails to achieve the correct order. In terms of control metrics, the unsaturated (high

priority class) controller maintains stability because its delay is clearly bounded

throughout the run. Note that because the system load can grow arbitrarily, Absolute

Delay Controllers (especially those of low priority classes) can saturate and become

unstable in overload conditions even if they are tuned correctly. We observe from Figure

5.9(a) that the server renders satisfactory efficiency and accuracy in achieving the

desired delay for the high priority class (class 0). In particular, in response to the

workload variation at time 870 sec, the distinguishable performance

deviation from the reference lasts for 60 sec (from 930 sec to 990 sec), well within the

theoretical settling time of 210 sec based on the control design (Section 5.5.3). The delay

of class 0 stays close to the reference in steady state, which demonstrates a small steady

state error for the high priority class, i.e., the desired delay of the high priority class is

guaranteed in steady state even when the server is severely overloaded.

10 The low priority class suffers long delays in the second half of the experiment. This is because the server devotes most processes to the high priority class to enforce its absolute delay guarantee and consequently starves the low priority classes.

[Figure 5.9 plots omitted. Panels: (a) Connection Delays of the Closed Loop Server; (b) Connection Delays of the Open Loop Server. Each panel plots Latency (second) against Time (second) for Class 0 and Class 1.]

Figure 5.9. Evaluation of Absolute Delay Guarantees

In summary, our evaluation results demonstrate that the closed-loop server provides

robust relative and absolute delay guarantees even when the workload varies significantly.

Properties of our adaptive web server also include guaranteed stability, satisfactory

efficiency and accuracy in achieving desired delay or relative delay differentiation. The

experimental results are also consistent with our theoretical analysis, which verifies the

correctness of our design methodology and dynamic system model for real-time systems.

5.8. Summary

We apply the FCS framework to develop an adaptive architecture to provide


relative, absolute and hybrid service delay guarantees for different service classes on web

servers under HTTP 1.1. The first contribution of this work is the architecture based on

feedback control loops that enforce delay guarantees for different classes via dynamic

connection scheduling and process reallocation. The second contribution is our use of

feedback control theory to design the feedback loop with proven performance guarantees.

In contrast with ad hoc approaches that often rely on laborious tuning and design

iterations, our control theory approach enables us to systematically design an adaptive

web server with established analytical methods. The design methodology includes using

system identification to establish dynamic models for a web server, and using the Root

Locus method to design feedback controllers to satisfy performance specifications. The

adaptive architecture has been implemented by modifying an Apache web server.

Experimental results demonstrate that our adaptive server provides robust delay

guarantees even when workload varies significantly. Properties of our adaptive web

server also include guaranteed stability, and satisfactory efficiency and accuracy in

achieving desired delay or delay differentiation. In the future, we will extend our

architecture to web server farms. We are also interested in achieving service delay

guarantees in web servers supporting dynamic content (e.g., database queries and media

streaming) where feedback control scheduling of multiple resources (CPU, memory, and

storage) may be necessary to handle different run-time conditions.


Chapter 6

Online Data Migration in Storage Systems (see footnote 11)

6.1. Introduction and Motivations

The storage requirements of enterprise-scale computing systems are currently

increasing at a very fast pace. Taking into account only online data (excluding tapes,

optical disks, and other tertiary storage media), storage system capacity has been

doubling in size every six to twelve months [60]. Current enterprise systems store up to

tens of terabytes in tens to hundreds of disk arrays and enclosures, interconnected by

storage area networks (SAN) such as Fibre Channel [2] or Gigabit Ethernet [1]. In many

cases, large data sets are spread over geographically distributed locations, with some

degree of replication for failure and disaster recovery.

11 The work presented in this chapter was done when the author was a research intern at HP Labs.

It is extremely difficult to make good data placement decisions at this level of

complexity. Ideally, data should be close (in terms of low latency and high effective

bandwidth) to the applications using it, while continuing to provide the required level of

reliability. More importantly, even if a data placement is initially adequate at some point

in the life of the system, it may become inadequate on short notice. New devices added

to the system need to be populated with data in order to balance the load; symmetrically,

some devices may be taken offline for repairs or due to obsolescence. Failures may occur

in the system without advance warning; even if the level of redundancy remains high

enough to prevent data loss, the performance in degraded mode may be unacceptable, and

the previous level of redundancy may need to be re-established by creating more replicas

of critical data. Finally, the performance achievable with a given data placement depends

on how the data is accessed, and by whom. Access patterns may change because of

gradual trends (e.g. more customers of a company), or seasonal variations (e.g. load

spikes for e-tailers before the holidays), or periodic application characteristics (e.g. the

west coast branch of a multinational opens shortly after the European offices have closed

for the day). The placement of data onto storage devices may change many times during

the lifetime of a system. We consider backup as a particular case of migration, in which

the original copy is not erased; keeping online backups of critical data to minimize

switchover and recovery times is a widely followed practice in large installations.

In this chapter, we address the problem of migrating data in a storage system on-line,

i.e. the data being migrated is concurrently being accessed by applications running on the

storage system. Some existing solutions work offline by interrupting the system’s

operation while migration occurs. This has the benefit that it makes it easy to guarantee

that customer data will remain consistent, as the presence of non-coordinated concurrent

readers and writers could otherwise result in data corruption. However, global enterprises


such as distributed data centers and multinational corporations need to access their data

around the clock because of the planetary scale of their operations. The business costs of

bringing down their systems are unacceptable. Some other existing solutions take the

middle road, selectively blocking accesses to the subset of the data being migrated. The

drawback is that client applications may not be able to cope with the increased delays

(e.g., because of timeouts built into their code), and that the performance degradation can

be substantial, both because of some data being unavailable and because of the contention

for system resources between the migration task and the client applications. Some

systems have quiet periods during which operations can cease, or at least degraded

performance does not have a major impact; but many systems are in use all the time, and

even systems that are periodically quiescent will seldom be able to tolerate an arbitrary

degradation in performance because of a data migration caused by an unforeseen event

during business hours. It is not acceptable for data migration to arbitrarily disrupt the

quality of service provided to applications executing in parallel. If migration were

allowed to go unchecked, customers could see a significant slow-down because the

storage, host, and networking subsystems are busy relocating data.

Realistic applications in practical systems must satisfy quality-of-service (QoS)

requirements. Examples of these requirements include performance (throughput,

bandwidth, latency) and dependability (availability, reliability). In general, on-line data

migration should satisfy two conflicting requirements:

Performance isolation: Applications should not see persistent QoS violations

during data migration, i.e., on-line data migration should be transparent to

applications in terms of performance.


Efficiency: Data migration should complete as quickly as possible under the constraint of

achieving performance isolation.

We assume that migration should use as much as possible of the available system

resources left by applications (or, equivalently, we want to satisfy both requirements

while keeping QoS violations to a minimum). In general, this is only one possible design

point: the right equilibrium between the two requirements depends on the individual

needs of each storage system. Some may need the migration to be completed as fast as

possible even if application performance is impacted, while others may place a greater

emphasis on the QoS guarantees.

We present a novel approach to migrate data in storage systems. The main

contribution of this work is Aqueduct, an adaptive architecture that, based on periodic

measurements of the current performance of the applications, uses a feedback loop to

dynamically adjust the speed of data migration. Aqueduct completes a migration

efficiently while avoiding QoS violations in running applications. So far, research has

concentrated on minimizing the total backup/migration time, whereas our adaptive

approach may proceed slowly, but guarantees that QoS requirements will not be violated.

Our feedback loop has been designed systematically, following well-established concepts

and methodology in control theory as opposed to hand-tuned heuristics for a

particular system. Another contribution of this work is a performance analysis of the

speed of data migration and its impact on concurrent applications on a networked storage

system testbed. Our performance evaluation showed that Aqueduct guarantees the

desired application iops (number of I/Os per second) for all devices during data

migration.


In Section 6.2, we present the design of the Aqueduct architecture. Section 6.3

describes the design and analysis of the control part of the Aqueduct feedback loop. The

implementation details are described in Section 6.4. In Section 6.5, we present the

evaluation experiments and the results. In Section 6.6, we conclude the chapter with a

summary and future work.

6.2. Aqueduct: a Feedback Control Architecture for Online Data Migration

We develop a feedback-control-based migration executor that dynamically converges

to the correct migration speed using on-line trial-and-error based on performance

feedback. A data migration management subsystem is composed of a migration planner

and a migration executor. Given an initial and a goal store assignment, the migration

planner computes a migration plan composed of a sequence of moves, and the migration

executor moves data across devices to complete the migration plan. Excessive migration-

penalties are unacceptable for applications with QoS requirements. Due to the

uncertainties in storage systems and the fact that existing workload models allow

fluctuations on arbitrary time scales [35], it is difficult to predict a priori a "correct"

speed of data migration that avoids QoS violations without being overly pessimistic.

In this thesis, we present Aqueduct, a feedback-control architecture for the

migration executor.

6.2.1. Migration Planner

The migration planner generates a migration plan as an input to the migration

executor and triggers the execution of data migration. A migration plan is composed of a

set of partially ordered moves. Each move is composed of the name of a store object to be


moved, the source device, the destination device, and the dependencies on other moves

representing the precedence constraints among moves. A store object represents a logical

entity of storage such as all the data of an e-mail server, a database, or the /usr directory

of a file system. The partial order may be due to the capacity constraints on the devices.

For example, the following describes a move, planmove10, that moves store

planTest_item14 from device c10t0d0 to c10t2d0; three other moves must occur

before this move.

    move planmove10 {
        {store planTest_item14}
        {source /dev/dsk/c10t0d0}
        {destination /dev/dsk/c10t2d0}
        {dependencies { planmove4 planmove15 planmove5 }}
    }

Since the migration plan is only partially ordered, it is possible to conduct several

eligible moves in parallel. The current Aqueduct prototype (Section 6.4) only conducts

moves sequentially.
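A sequential executor needs some total order that respects the plan's partial order. The sketch below shows one way to derive such an order with Python's standard `graphlib` module (Python 3.9 or later). The dictionary layout and the dependency of planmove15 on planmove4 are hypothetical, chosen only for illustration; the thesis does not state how the prototype orders eligible moves.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each move maps to the set of moves that must finish first.
# Only planmove10's dependency list comes from the example in the text.
plan = {
    "planmove4": set(),
    "planmove5": set(),
    "planmove15": {"planmove4"},  # assumed dependency, for illustration only
    "planmove10": {"planmove4", "planmove15", "planmove5"},
}


def sequential_order(dependencies):
    """Return one total order of moves that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = sequential_order(plan)
print(order)  # planmove10 always appears after its three prerequisites
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies are cyclic, which would indicate an invalid plan.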

6.2.2. LV Mover

The LV Mover is a mechanism to move a logical volume from its current device to a

destination device. The LV Mover is implemented on top of the LVM (Logical Volume

Manager) [56] of HP-UX. When invoked, the LV Mover first creates a

mirror of the logical volume on the destination device, and then splits the two copies with

the mirror as the logical volume’s master copy on the new device. The underlying LVM

guarantees the consistency of the data in the logical volume while it is being moved. The

LV Mover maintains the logical consistency of data being moved.

However, the LV Mover does not have control over the speed of each move. Instead,

the Actuator modulates the migration speed by enforcing an idle interval between


subsequent invocations of the LV Mover (see Section 6.2.7). Each store is divided into

small logical volumes called substores with a fixed size of Ssub MB so that the speed of

data migration can be controlled at a fine granularity. We call the move of a substore a

submove, and each move in a migration plan is executed as a sequence of submoves

until the whole store is moved. The sleep time between the end of a submove and the start

of the next submove is called the inter-submove-time.

6.2.3. QoS guarantees

Two orthogonal issues need to be addressed on QoS specifications for applications on

a storage system: What QoS metrics should be guaranteed? At what granularity should

guarantees be provided?

QoS metric

The QoS metrics for storage systems include latency, iops (number of completed

I/O’s per second) and bandwidth [19]. Ideally, guarantees should be provided in all the

required metrics, which may require a multi-input-multi-output control solution [55]. In

this thesis, we design Aqueduct to provide guarantees only on iops as a first step

toward the full set of performance guarantees.

Granularity of QoS guarantees

QoS guarantees may be achieved at three different granularities.

Stream QoS guarantees give the finest granularity of QoS control. However, an

individual stream tends to exhibit noisy behavior. Because stream-level

monitoring and control overheads are proportional to the number of streams,

stream-level guarantees may not scale well in enterprise storage systems with a large

number of I/O streams.

Global QoS guarantee refers to a guarantee on aggregated performance of all

streams in a storage system. The aggregated behavior of a large number of I/O

stream tends to be less noisy and easier to control than each individual stream.

The global QoS guarantee also avoids the scalability problem of the stream

guarantee. However, the global guarantee does not guarantee satisfactory

performance for each individual I/O stream. Our experimental results (Section

6.5.2) show that some devices can suffer especially severe performance

degradation even when the global aggregated performance specification may be

satisfied.

Device QoS guarantee is a trade-off between the stream and global QoS

guarantees. A device may be a single disk in arrays composed of independent

disks (JBOD). In a RAID, a device may be a LUN, or Logical Unit, which is a

set of disks bound together using a layout such as RAID 1/0 or RAID 5, and

addressed as a single entity. With the device QoS guarantee, the aggregated

performance on each device achieves its QoS spec. Since the aggregated

performance of streams on each victim device is guaranteed, the probability of

stream-level QoS violation is lower than the global guarantee scheme. The device

QoS guarantee scheme can also scale better than the stream QoS guarantee

because its monitoring and control overhead is proportional to the number of

devices, which is usually much smaller than the number of streams.

Aqueduct provides the device QoS guarantee. The guaranteed device iops {ISi | 0 ≤ i

< N}, where N is the number of devices in the system, is listed in a contract file as an


input to Aqueduct. In the common case where all the devices (e.g., all the disks in a

JBOD or all the LUN’s in a RAID) are similar, the guaranteed device iops may be the

same. In our FCS framework, ISi is also called the performance reference of device i. The

contract may be directly specified by the system administrators or derived from

application requirements on each device. Note that the device iops only includes the I/Os

performed on behalf of the applications. The I/Os for data migration are not considered

as part of the device iops.

[Figure 6.1 diagram omitted. Labeled elements: Controller, Actuator, Monitor, Planner, Migration Plan, LV Mover, Contract, Initial Assignment, Goal Assignment, Disk Arrays, LUN, I/O Streams, Applications; labeled signals: {ISi}, {Ii(k)}, Rm(k).]

Figure 6.1. Aqueduct: The Feedback Control Architecture for Data Migration

6.2.4. The Feedback Control Loop

Aqueduct (illustrated in Figure 6.1) features a feedback control loop that is invoked at

every sampling instant kW where W is a constant sampling window.

1) The Monitor periodically samples the average device iops {Ii(k) | 0 ≤ i < N} in the

last sampling period ((k-1)W, kW).

2) The Controller compares the sampled iops {Ii(k) | 0 ≤ i < N} with the performance

references {ISi(k) | 0 ≤ i < N}, and computes a control input, the new speed of data


migration, in the next sampling period (kW, (k+1)W). Intuitively, Aqueduct

should slow down data migration when some devices’ iops are lower than their

corresponding performance references; and speed up data migration when all the

devices perform better than their performance references. The Controller

quantifies the mapping from {Ii(k) | 0 ≤ i < N} to the control input so that the

migration efficiently converges to the correct speed and avoids excessive

oscillations.

3) The Actuator moves data according to a migration plan while enforcing the

migration speed according to the control input.

We now present the details of the major components of the feedback control loop in

the following sections.

6.2.5. The Monitor

At every sampling instant k, the Monitor collects the average iops of each device

{Ii(k) | 0 ≤ i < N} in the last sampling period and feeds it to the Controller. The Monitor

may be implemented on top of existing performance monitoring tools such as the HP

PerfView toolset (a part of the HP OpenView software) [37].

6.2.6. The Controller

At each sampling instant k, given the sampled iops {Ii(k) | 0 ≤ i < N} from the

Monitor, and the performance references {ISi(k) | 0 ≤ i < N} from the contract, the

Controller uses a control algorithm to compute a new migration speed for the next

sampling period (kW, (k+1)W). The migration speed is defined as the inter-submove-time

Tim(k) or submove rate Rm(k), i.e., number of submoves in the sampling period (kW,


(k+1)W). The control algorithm works as follows:

1) For each device 0 ≤ i < N, compute the error Ei(k) = Ii(k) - ISi. A device i has a negative

error if its iops Ii(k) is less than its reference ISi.

2) Find the smallest error Emin(k) = min{Ei(k) | 0 ≤ i < N}. If there are devices with

negative errors, Emin(k) is the negative error with the largest absolute value.

3) Compute the change in submove-rate according to a PI (Proportional-Integral)

control function [33]:

\[
dR_m(k) = K_P \cdot \left( E_{\min}(k) + K_I \cdot \sum_{1 \le u \le k} E_{\min}(u) \right)
\]

Equation 6.1

4) Compute the new submove-rate:

Rm(k) = Rm(k-1) + dRm(k) Equation 6.2

5) Convert Rm(k) to inter-submove-time:

Tim(k) = W/Rm(k) - Tm

where Tm is the average duration of a submove (measured via system profiling).

6) Notify the Actuator of the new inter-submove-time Tim (the control input).
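The six steps above can be sketched in code as follows. This is a minimal illustration, not the thesis implementation. It assumes that the error is taken as measured iops minus reference (so that a device below its reference has a negative error, as step 1 describes in words) and that Equation 6.1 reads dRm(k) = KP · (Emin(k) + KI · Σ Emin(u)), the form consistent with Equation 6.4. The class name, the lower bound `min_rate`, and the clamping of the result to a non-negative sleep time are hypothetical additions.

```python
class MigrationSpeedController:
    """PI controller sketch following steps 1-6 (names are illustrative)."""

    def __init__(self, kp, ki, window, submove_time, initial_rate, min_rate=0.1):
        self.kp = kp                      # control gain KP
        self.ki = ki                      # control gain KI
        self.window = window              # sampling window W (sec)
        self.submove_time = submove_time  # average submove duration Tm (sec)
        self.rate = initial_rate          # submove rate Rm(k-1)
        self.min_rate = min_rate          # assumed floor to avoid division by zero
        self.error_sum = 0.0              # running sum of Emin(u)

    def step(self, sampled_iops, reference_iops):
        """Return the inter-submove-time Tim(k) for the next sampling period."""
        # Steps 1-2: per-device errors and the smallest (most negative) one.
        errors = [i - ref for i, ref in zip(sampled_iops, reference_iops)]
        e_min = min(errors)
        # Step 3: PI control function (assumed reading of Equation 6.1).
        self.error_sum += e_min
        d_rate = self.kp * (e_min + self.ki * self.error_sum)
        # Step 4: new submove rate (Equation 6.2), clamped to the assumed floor.
        self.rate = max(self.rate + d_rate, self.min_rate)
        # Step 5: convert the rate to an inter-submove-time, never negative.
        return max(self.window / self.rate - self.submove_time, 0.0)


# Hypothetical numbers: three devices with a common reference of 100 iops.
controller = MigrationSpeedController(
    kp=0.0364, ki=0.1008, window=30.0, submove_time=1.0, initial_rate=5.0
)
t_im = controller.step([120.0, 90.0, 110.0], [100.0, 100.0, 100.0])
print(controller.rate, t_im)
```

Because one device is below its reference, the minimum error is negative, the submove rate drops, and the returned sleep time grows.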

The rationale for using inter-submove-time as the manipulated variable is that a

longer inter-submove-time reduces the resource consumed by data migration and

consequently improves the performance of concurrent I/O streams. In Section 6.5.3, we

present a set of profiling results to verify that device iops can be effectively controlled via

regulation of the inter-submove-time.

The control input Tim is computed based on the minimum error among all devices.

Therefore, the Controller dynamically adjusts the migration speed until the minimum


error converges to zero. This means that if the control function is properly tuned, the iops

of every device is higher than or equal to its reference, Ii(k) ≥ ISi (0 ≤ i < N), in steady state.

The control algorithm has two parameters called control gains, i.e., KP and KI. The

values of the KP and KI need to be tuned to guarantee stability and efficient convergence

to the specs. The detailed design and tuning of the control function are presented in

Section 6.3.

6.2.7. The Actuator

The Actuator executes the migration plan at the migration speed that is dynamically

adjusted by the Controller. The Actuator divides each move in the migration plan into

S/Ssub submoves, where S is the size of the store to be moved and Ssub is the fixed

size of each substore. For each submove, the Actuator invokes the LV Mover to move a

substore, and sleeps for Tim sec after the LV Mover completes the submove before

invoking the LV Mover to conduct the next submove.
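The Actuator's behavior can be sketched as a simple loop. This is only an illustration under stated assumptions: `move_substore` stands in for an LV Mover invocation, `get_inter_submove_time` stands in for reading the latest Tim from the Controller, rounding S/Ssub up with `ceil` is an assumption for stores that are not an exact multiple of Ssub, and skipping the sleep after the final submove is a design choice of this sketch. None of these names come from the thesis.

```python
import math
import time


def run_move(store_size_mb, substore_size_mb, move_substore,
             get_inter_submove_time, sleep=time.sleep):
    """Execute one move as a sequence of submoves and return their count."""
    # Assumed rounding: a partial trailing substore still needs one submove.
    count = math.ceil(store_size_mb / substore_size_mb)
    for index in range(count):
        move_substore(index)  # hypothetical stand-in for the LV Mover
        if index < count - 1:
            # Re-read Tim before each pause so Controller updates take effect.
            sleep(get_inter_submove_time())
    return count


# Dry run with recording stubs instead of a real mover and a real sleep.
moved, pauses = [], []
total = run_move(100, 32, moved.append, lambda: 2.5, sleep=pauses.append)
print(total, moved, pauses)  # 4 [0, 1, 2, 3] [2.5, 2.5, 2.5]
```

Passing `sleep` as a parameter keeps the loop testable without actually waiting.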

In summary, we designed Aqueduct, an on-line data migration architecture that

guarantees the specified aggregated application iops on each storage device. The key

novelty and contribution of Aqueduct is a feedback control loop that adaptively adjusts

migration speed based on performance of concurrent applications.

6.3. Design and Analysis of the Controller

In this section, we present the second major contribution of Aqueduct, the modeling,

design, analysis and tuning of the Controller, which is critical to the success of

Aqueduct. We first establish a dynamic model of Aqueduct and the storage system in

Section 6.3.1, and then tune the control functions with established control theory in


Section 6.3.2.

6.3.1. The Dynamic Model

For the controlled system (including the storage system, the Monitor, and the

Actuator) of Aqueduct, the output is the victim iops Iv(k) (defined as the iops of the device i

with the smallest error Emin(k+1)) in the time interval ((k-1)W, kW). The input of the

controlled system is the submove rate Rm(k) or the inter-submove-time Tim(k). In our

control design, we use Rm(k) as the input because the submove rate leads to the following

linear model of the controlled system that is amenable to linear control theory:

\[
I_{\min}(k+1) - I_{\min}(k) = G \left( R_m(k) - R_m(k-1) \right) \quad (a)
\]

\[
I_{\min}(z) = P(z) R_m(z), \quad P(z) = G z^{-1} \quad (b)
\]

Equation 6.3

Equation 6.3(a) is in the time domain, and Equation 6.3(b) is the equivalent model in the

z-domain. The term z^{-1} represents a time delay of a sampling window. The other

dynamics of the controlled system are ignored because they occur at a much smaller time

scale than the sampling window. The process gain, G, is the derivative of the output

Imin(k+1) with respect to the submove rate Rm(k). The process gain characterizes how

sensitive the victim device iops is with regard to the change in submove rate, and is

different for different I/O size and read/write percentage. To guarantee stability in all

cases, we use the process gain with the largest magnitude, Gmax = -8.00 in our control

design. We verify the above controlled system model and measure the process gains with

profiling experiments (see Section 6.5.3).

The transfer function from the minimum error Emin(z) to the change in submove rate

dRm(z) is the standard PI control (Equation 6.1) in the z-domain:

\[
C_1(z) = K_P \left( 1 + K_I \frac{z}{z - 1} \right)
\]

Equation 6.4

The transfer function from the change in submove rate dRm(z) to the submove rate

Rm(z) (Equation 6.2) is modeled as an integrator:

\[
C_2(z) = \frac{z}{z - 1}
\]

Equation 6.5

For the closed loop Aqueduct system, the input is the reference of the victim iops, and

the output is the victim iops Imin(z). Given Equation 6.3, Equation 6.4, and Equation 6.5,

we can derive the transfer function of the closed loop Aqueduct system (including the

Controller and the controlled system):

\[
P_C(z) = \frac{C_1(z) C_2(z) P(z)}{1 + C_1(z) C_2(z) P(z)}
\]

Equation 6.6

Assuming the common case that the references of all devices are the same, ISi = IS (0

≤ i < N), the constant reference is modeled as a step input, IS·z/(z−1), to the closed loop

system. Therefore, the output of Aqueduct, the victim iops Iv(z), can be derived:

\[
I_v(z) = \frac{IS \cdot z}{z - 1} P_C(z)
\]

Equation 6.7

6.3.2. Controller Tuning and Analysis

Given the dynamic model in Equation 6.6, we apply the Root-Locus method [33] in

control theory to tune the control gains KP and KI. Since the tuning follows a standard

process and is similar to the web server tuning described in Section 5.5.3, we only give

the results and analysis in this section. Given the process gain G = Gmax = -8.00, we set

KI = 0.1008, KP = 0.0364 Equation 6.8

which place the closed-loop poles of Equation 6.6 at

p0,1 = 0.84 ± 0.0607i Equation 6.9
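The pole locations can be checked numerically. The sketch below rests on an assumption that the text does not spell out: the loop gain enters the characteristic polynomial with magnitude KP · |G|, i.e., the negative sign of G is absorbed by an error convention of measured iops minus reference. Under that assumption, substituting Equations 6.3(b), 6.4 and 6.5 into the closed loop gives the characteristic polynomial (z − 1)² + KP · |G| · ((1 + KI) z − 1). The function name is hypothetical.

```python
import cmath


def closed_loop_poles(kp, ki, gain_magnitude):
    """Roots of (z - 1)^2 + kp*|G|*((1 + ki)*z - 1) = 0 (assumed sign convention)."""
    a = kp * gain_magnitude
    # Expanded form: z^2 + (a*(1 + ki) - 2)*z + (1 - a) = 0
    b = a * (1.0 + ki) - 2.0
    c = 1.0 - a
    root = cmath.sqrt(b * b - 4.0 * c)
    return (-b + root) / 2.0, (-b - root) / 2.0


poles = closed_loop_poles(kp=0.0364, ki=0.1008, gain_magnitude=8.0)
for p in poles:
    print(f"{p.real:.4f} {p.imag:+.4f}i  |p| = {abs(p):.4f}")
```

With the gains of Equation 6.8 this gives a complex pair near 0.84 ± 0.06i, in line with Equation 6.9, and both magnitudes are below 1.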

Now we apply control theory [33] to analyze the performance profile based on the

tuning results in Equation 6.8.
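A short time-domain simulation can complement the analysis that follows. The sketch iterates the linear model of Equation 6.3(a) together with the PI control law, again assuming the error is measured iops minus reference and that the control law has the form KP · (E + KI · ΣE). The initial iops value and reference are hypothetical, and the model ignores noise and saturation, so this illustrates convergence only and is not a reproduction of the thesis results.

```python
def simulate(reference, initial_iops, gain, kp, ki, steps):
    """Iterate the closed loop and return the victim iops at each sample."""
    iops = initial_iops
    error_sum = 0.0
    trace = [iops]
    for _ in range(steps):
        error = iops - reference               # assumed sign convention
        error_sum += error
        d_rate = kp * (error + ki * error_sum)  # change in submove rate
        iops = iops + gain * d_rate             # Equation 6.3(a)
        trace.append(iops)
    return trace


# Hypothetical scenario: iops starts at 150 with a reference of 100.
for g in (-8.0, -4.0):
    final = simulate(100.0, 150.0, g, kp=0.0364, ki=0.1008, steps=300)[-1]
    print(g, round(final, 6))
```

For both process gains tried here the simulated victim iops settles at the reference, which matches the claim that the final value does not depend on G as long as the loop is stable.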

Stability: Aqueduct guarantees stability because all of its closed-loop poles lie

inside the unit circle of the z-domain, i.e., |p0,1| < 1. Stability is guaranteed despite

possible variations in the process gain G because we assume the process

gain with the largest magnitude, Gmax, in our stability analysis.

Steady state performance: Applying the final value theorem in control theory to

Equation 6.7, we derive the final value of the victim iops:

$$\lim_{k \to \infty} I_v(k) = \lim_{z \to 1} (z-1) I_v(z) = \lim_{z \to 1} (z-1) \, I_S \frac{z}{z-1} \, \frac{C_1(z) C_2(z) P(z)}{1 + C_1(z) C_2(z) P(z)} = I_S$$    Equation 6.10

This result means that the victim iops converges to the specified reference IS, and every device's iops Ii(k) ≥ IS (0 ≤ i < N) after the feedback control loop converges. We should also note that since Iv(∞) = IS, Aqueduct achieves the optimal speed under the constraint of the specified device iops.

Sensitivity with regard to the process gain G: Equation 6.10 also shows that the final value of the victim iops Iv(k) does not depend on the process gain G, i.e., the victim iops converges to the reference IS regardless of the process gain G as long as Aqueduct remains stable. This property is important because the process gain G may vary at run-time under different workloads.

Overshoot: Assuming the process gain G = Gmax, the victim iops Iv(k) overshoots

the reference IS by 18%, i.e., max(Iv(k)) = 1.18IS during the transient state after

Aqueduct starts the execution of a migration plan.

Settling/Rise Time: Assuming the process gain G = Gmax, Aqueduct's settling time is 29W, where W is the sampling window, i.e., the victim iops Iv(k) converges to within 2% of the reference 29W sec after the beginning of migration. With a sampling window W = 30 sec, Aqueduct has a settling time of 870 sec. Although the settling time may be long, the victim iops reaches 0.98IS 5W sec (called the rise time in control theory) after the migration starts and stays within 0.98IS ≤ Iv(k) ≤ 1.18IS afterwards (as illustrated in Figure 6.2, which is plotted with Matlab). Since the system overshoot is small (18%), we regard the system as in the steady state after the rise time for the purpose of performance evaluation.
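The performance profile above can be reproduced approximately with a short simulation. The sketch below is not the Matlab script used for the figure; it assumes, as an illustration only, that the controlled system is a one-sample delay with gain magnitude |G|, so that the closed loop becomes P_C(z) = K((1+KI)z - 1) / ((z-1)^2 + K((1+KI)z - 1)) with K = KP|G|. The function name is hypothetical, and the extra gain values are the slopes of the fitted lines reported in Figure 6.6(b).

```python
def step_response(kp, ki, gain_magnitude, steps):
    """Unit-step response of the assumed closed-loop transfer function P_C(z)."""
    k = kp * gain_magnitude
    a1 = 2.0 - k * (1.0 + ki)   # coefficient of y(n-1)
    a2 = -(1.0 - k)             # coefficient of y(n-2)
    b1 = k * (1.0 + ki)         # coefficient of u(n-1)
    b2 = -k                     # coefficient of u(n-2)
    y = []
    for n in range(steps):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        u1 = 1.0 if n >= 1 else 0.0   # the unit step starts at n = 0
        u2 = 1.0 if n >= 2 else 0.0
        y.append(a1 * y1 + a2 * y2 + b1 * u1 + b2 * u2)
    return y


response = step_response(kp=0.0364, ki=0.1008, gain_magnitude=8.00, steps=400)
overshoot = max(response) - 1.0
rise_index = next(n for n, value in enumerate(response) if value >= 0.98)
print(f"overshoot = {overshoot:.1%}, rise time = {rise_index}W")

# The final value does not depend on the process gain as long as the loop is stable.
final_values = [step_response(0.0364, 0.1008, g, 400)[-1]
                for g in (3.9064, 4.0015, 7.8621, 8.0027)]
print(final_values)
```

Under these assumptions the response peaks at about 18% above the reference and first reaches 0.98 of the reference after five sampling windows.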

In summary, we apply a control theory methodology to tune and analyze Aqueduct.

Specifically, we establish a dynamic model for the Aqueduct system, apply the Root

Locus method to tune the controller, and prove that the tuned Aqueduct achieves robust

performance guarantees in terms of device iops and a satisfactory performance profile.

Figure 6.2. Step Response of Aqueduct (x-axis: Time (sec); y-axis: Victim iops)

6.4. Implementation

We implement an Aqueduct prototype in C++ on HP-UX 11.0. Upon invocation,

Aqueduct creates two processes, a Monitor/Controller process and an Actuator process.

We now describe the source code of the major components in detail.

Initialization

1) The Monitor/Controller process forks the Actuator process to execute the

migration plan and establishes a pipe (with the non_blocking I/O mode)

between the main Aqueduct process and the mover process. The Actuator

process reads the output file generated by the migration planner.

2) The Monitor/Controller process initializes a monitor object and a controller

object. The constructor of class Controller initializes a vector of iops

references of all I/O devices based on a contract file.

The Monitor/Controller Process

The Monitor/Controller process repeats a loop until the migration plan is completed.

In each iteration of the loop, the Monitor/Controller process calls monitor.sample() and

then controller.control().

1) monitor.sample() samples the iops of all devices. In the current Aqueduct

prototype, monitor.sample() loops until it successfully opens an output file

that is periodically generated by the workload generator called Pylon (see

Section 6.5.1). It then reads the iops samples of all devices from the output file and puts them into a vector monitor.vec_stream_perf. In the future, Aqueduct should be modified to interact with a performance-monitoring tool such as PerfView to get the performance samples.

2) controller.control(monitor.vec_stream_perf) computes the new inter-submove-time based on the control algorithm described in Section 6.2.6, and then writes it to the pipe connected with the Actuator process. This step is skipped if the Controller is turned off through a configuration file.

The Actuator Process

In parallel with the Monitor/Controller process, the Actuator process steps through the plan in a loop. In each iteration of the loop, it performs the following steps:

1) Get the next submove (substore and destination device) from the migration

plan.

2) Fork a LV Mover process to conduct the submove and wait for its completion.

3) Read from the (non-blocking) pipe. If the read succeeds, the inter-submove-time inter_submv_time is set to the value from the pipe; otherwise it keeps the old value.

4) Sleep for inter_submv_time sec.
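Step 3 of the Actuator loop can be sketched as follows. The prototype is written in C++; this Python fragment only illustrates the non-blocking read pattern. The function name, the newline-separated text encoding of the values, and the 4096-byte read size are assumptions of the sketch.

```python
import os


def read_inter_submove_time(read_fd, current):
    """Return the newest value pending on a non-blocking pipe, or `current` if none.

    Hypothetical helper; assumes the Controller writes newline-separated decimals.
    """
    try:
        data = os.read(read_fd, 4096)
    except BlockingIOError:
        return current          # nothing was written since the last submove
    values = data.decode().split()
    return float(values[-1]) if values else current


# Small demonstration with a local pipe standing in for the Controller's pipe.
demo_read_fd, demo_write_fd = os.pipe()
os.set_blocking(demo_read_fd, False)
inter_submv_time = read_inter_submove_time(demo_read_fd, 0.0)   # pipe is empty
os.write(demo_write_fd, b"4.5\n")
inter_submv_time = read_inter_submove_time(demo_read_fd, inter_submv_time)
print(inter_submv_time)
```

Keeping only the last value means that if the Controller produces several updates while a long submove is in progress, the Actuator sleeps according to the most recent one.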

6.5. Experiments

We now present a set of experimental results on a networked storage test-bed at the

Storage System Program of HP Labs. In Section 6.5.1, we describe the configurations of

the experiments. In Section 6.5.2, we present a performance study that quantifies the

performance penalty caused by data migration. In Section 6.5.3, we then describe a set of

profiling experiments to measure the process gain of the controlled storage system.

Finally, in Section 6.5.4, we present a set of performance evaluations of the Aqueduct prototype.

6.5.1. Experiment Configurations

The hardware used in the presented experiments includes a JBOD disk array composed of 5 disks, and a host machine running HP-UX 11.0. The disk array and the host machine are connected with a fibre channel.

Two logical volume groups with 25 stores are used in our experiments. The host machine runs a synthetic workload generator called Pylon to generate streams of I/O requests to the disk array. We now describe the stores and I/O streams in detail.

Stores

Volume group vg02 includes 4 disks (also called physical volumes in LVM)

in the JBOD. 24 stores (including 20 migrated stores and 4 fixed stores) are

created in vg02. A migrated store is a store that is moved across devices, while

a fixed store is never moved in any experiments.

Volume group vg04 includes a separate disk in the JBOD with a single

standalone store on it. Similar to the fixed stores, the standalone store is not

moved. It is called standalone because it belongs to a volume group that never

participates in data migration.

I/O streams

A Pylon process executes 5 as-fast-as-possible (afap) I/O streams, including one stream (called a fixed-stream) on each fixed store and one stream (called a standalone-stream) on the standalone store. All I/O streams are completely random with a run count = 1, i.e., the I/O stream never generates I/O requests on sequential locations on a disk. Intuitively, afap I/O suffers most from resource contention from concurrent data migration. Therefore, the migration-penalty on afap I/O streams represents the upper bound of the migration-penalty on I/O streams with real workloads. Since there is only one I/O stream per device in all the experiments presented in this report, the iops of a stream is the same as the total iops of its target device. When multiple I/O streams exist on the same device, the device iops should be the aggregated iops of all I/O streams on the device.

6.5.2. A Performance Study on Migration Penalty

In this section, we present a performance study to quantify the performance penalty

caused by data migration on concurrent applications. In all the experiments presented in

this section, the Controller is turned off and a fixed inter-submove time is used.

Metrics

Data migration may affect the performance of application I/O in two ways: 1) it may cause resource contention with concurrent application I/O; 2) the load on devices may change due to changes in load distribution when stores are moved. We only

execute I/O streams on fixed/standalone stores in our performance studies and there is no

change of load distribution during data migration. Therefore, only the effect of resource

contention is reflected in the presented experiments. To quantify the impact of data

migration on concurrent application I/O, we define the following terminology.

minMi: The minimum iops of device i during the execution of a migration plan.

maxMi: The maximum iops of device i during the execution of a migration plan.

minNi: The minimum iops of device i after a migration plan is completed.

Pi: The migration-penalty (or penalty for simplicity) of device i during the execution of a migration plan, Pi = (minNi – minMi) / minNi.

The penalty Pi of device i represents how many iops it lost due to concurrent data migration relative to its iops without concurrent migration.

Figure 6.3. Device iops during data migration (x-axis: Time (min); y-axis: Device iops; curves for the devices with fixed stores and the device with the standalone store under two workloads: IO size = 2kB, read only, Tim = 4 sec, and IO size = 64kB, read 50%, Tim = 0 sec; the completion of data migration is marked on the plot)

Results

The iops of all devices in two experiments with different workloads are illustrated in Figure 6.3.

In Experiment 1, all devices (the top 5 curves in Figure 6.3) are read-only, and the

size of requested data (request size) is 2 KB. The inter-submove-time Tim = 4 sec.

In Experiment 2 (the bottom 5 curves in Figure 6.3), all devices are 50% read

with request size = 64 KB. The inter-submove-time Tim = 0 sec, i.e., the Actuator

process does not sleep between two subsequent submoves.

The sampling period W = 30 sec in all experiments. Each point in Figure 6.3 represents the iops of a device during a sampling period. In both experiments all devices achieve significantly lower iops during migration than after the migration is completed. In Experiment 1 (see Figure 6.4), the penalties on the fixed-streams are in the range [9.7%, 13.0%]. The standalone-stream has a smaller penalty of 4.1%. In Experiment 2 (see Figure 6.5), the penalties on the fixed-streams are in the range [15.4%, 17.9%]. The standalone-stream has a smaller penalty of 5.3%.

i  Type        minMi   maxMi   minNi   DPi   VPi   APi   Pi
0  Fixed       161.94  173.67  186.10  6.3%  3.5%  3.2%  13.0%
1  Fixed       169.40  174.86  187.54  2.9%  2.8%  3.9%  9.7%
2  Fixed       163.41  172.80  187.55  5.0%  3.9%  3.9%  12.9%
3  Fixed       166.26  175.18  186.63  4.8%  2.7%  3.5%  10.9%
4  Standalone  180.15  -       187.77  -     -     4.1%  4.1%

Figure 6.4. Migration Penalty in Experiment 1

i  Type        minMi   maxMi   minNi   DPi   VPi   APi   Pi
0  Fixed       83.86   89.59   102.14  5.6%  7.7%  4.6%  17.9%
1  Fixed       86.28   91.82   102.02  5.4%  5.5%  4.5%  15.4%
2  Fixed       84.29   90.55   102.41  6.1%  6.7%  4.8%  17.7%
3  Fixed       86.00   92.25   101.77  6.1%  5.1%  4.2%  15.5%
4  Standalone  97.45   -       102.95  -     -     5.3%  5.3%

Figure 6.5. Migration Penalty in Experiment 2

A more detailed analysis (see Figure 6.4 and Figure 6.5) reveals that the penalty of a device can be logically divided into three portions, namely, the device-penalty, the vg-penalty, and the array-penalty (called the global-penalty below).

• Device-Penalty DPi of device i is defined as the difference between the maximum iops and the minimum iops of device i during migration divided by the minimum iops of device i without migration, i.e., DPi = (maxMi - minMi) / minNi. This is based on the observation that, during data migration, devices with fixed-streams achieve lower iops when moves occur on them (i.e., when serving as a source or destination device of a move) than when no moves occur on them. The device-penalty is caused by contention between data migration and I/O streams for resources (e.g., disk arm and/or disk controller) of the shared device.

• Vg-Penalty VPi of device i (with a fixed-stream) is defined as the difference between the minimum iops minMs of the device with a standalone-stream during migration and device i's maximum iops maxMi during migration, divided by minNi; i.e., VPi = (minMs - maxMi) / minNi. Note that even when no moves occur on a device with a fixed-stream, its iops is still lower than the iops of the device with a standalone-stream during migration. The vg-penalty may be caused by the volume group management overhead (e.g., metadata update and/or locking) that is incurred by every device in a volume group when mirror-split operations are performed on any logical volume in the volume group.

• Global-penalty APi: Surprisingly, even the device with a standalone-stream

achieves lower iops during migration than after the migration is completed,

although the standalone store belongs to a volume group that never participates in

data migration. We call the portion of migration penalty on every device in a

storage system the global-penalty APi. APi is the portion of penalty not included

in the device-penalty or vg-penalty, i.e., APi = Pi - DPi - VPi = (minNi - minMs) /

minNi. The global-penalty may be due to contention on resources shared by the

whole storage system, e.g., array controllers and/or fiber channels. The reasons

for the vg-penalty and the global-penalty remain open questions that need further

investigation into the mirror/split mechanism.
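The decomposition can be checked against the first row of Figure 6.4. The sketch below takes the vg-penalty as the standalone device's minimum iops minus device i's maximum iops, which is the ordering that reproduces the positive values listed in the figure. The function and parameter names are hypothetical.

```python
def migration_penalties(min_m, max_m, min_n, min_m_standalone):
    """Return (DPi, VPi, APi, Pi) as fractions of minNi for one device."""
    device_penalty = (max_m - min_m) / min_n
    vg_penalty = (min_m_standalone - max_m) / min_n
    array_penalty = (min_n - min_m_standalone) / min_n
    total_penalty = (min_n - min_m) / min_n
    return device_penalty, vg_penalty, array_penalty, total_penalty


# Device 0 in Experiment 1; 180.15 is minMs, the standalone device's minimum iops.
dp, vp, ap, p = migration_penalties(161.94, 173.67, 186.10, 180.15)
print(f"DP={dp:.1%} VP={vp:.1%} AP={ap:.1%} P={p:.1%}")
```

The three portions sum to the total penalty by construction, since the intermediate terms maxMi and minMs cancel.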

The measured device-penalty, vg-penalty, and global-penalty for all devices in Experiments 1 and 2 are summarized in Figure 6.4 and Figure 6.5, respectively. Different devices suffer different degrees of penalties. In particular, when data moves occur on a device, it receives all three penalties, while a device with a standalone-stream only suffers from the global-penalty. We call the device with the least iops during a sampling period ((k-1)W, kW) the victim device at the kth sampling period. The iops of the victim device is called the victim iops Iv(k). Note that the victim iops Iv(k) forms a "bottom envelope" of the sampled iops of all devices during migration.
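As a small illustration of the "bottom envelope", the victim iops of each sampling period is the minimum over all devices in that period. The function name and the sample values below are hypothetical.

```python
def victim_iops(samples_by_device):
    """samples_by_device[i][k] is the iops of device i in sampling period k."""
    return [min(period) for period in zip(*samples_by_device)]


# Hypothetical samples for three devices over four sampling periods.
samples = [
    [160.0, 171.0, 168.0, 175.0],
    [165.0, 158.0, 170.0, 172.0],
    [170.0, 169.0, 162.0, 174.0],
]
print(victim_iops(samples))  # a different device can be the victim in each period
```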

6.5.3. System Profiling

In this section, we present a set of profiling experiments to 1) verify the effectiveness of migration speed (inter-submove-time or submove rate) as the manipulated variable for controlling the device iops, and 2) measure the process gain G for control design. The standalone-stream and its target disk are not used in the profiling experiments because that disk is usually not the victim device. Four workloads with different combinations of I/O request size (2KB or 64KB) and read ratio (100% or 50%) are used. For each workload, we run Aqueduct (with the Controller turned off) repeatedly with the same migration plan. Each run uses a fixed inter-submove-time in the range between 0 sec and 22 sec. The sampling window W = 30 sec in all runs. Each data point plotted in Figure 6.6(a)(b) represents the average victim iops avg(Iv(k)) over all sampling periods during the execution of a migration plan. The 90% confidence interval of every data point is within ±1.11% of the average value.

Victim-iops and inter-submove-time

We can see that the average victim-iops increases monotonically with the inter-submove-time Tim in all workloads. For example, for workload (2 KB, 100% read), the average victim-iops increases from 150.51 iops to 180.588 iops (an increase of 20.0%) when the inter-submove-time increases from 0 sec to 22 sec. This result verifies that migration speed is an effective mechanism for controlling device iops. However, Figure 6.6(a) also shows that the relationship between Tim and Iv is non-linear, i.e., the slope of the Iv vs. Tim curve changes dramatically in the tested range of Tim. Such a non-linear relationship is not amenable to linear control design [33].

Figure 6.6. Relationship between victim iops and migration speed. Panel (a): victim-iops vs inter-submove-time (x-axis: Tim; y-axis: Iv). Panel (b): victim-iops vs submove rate (x-axis: Rm; y-axis: Iv), with fitted lines y = -7.8621x + 182.71, y = -8.0027x + 188.87, y = -3.9064x + 104.64, and y = -4.0015x + 103.68. Workloads in both panels: (2KB, 50%READ), (2KB, 100%READ), (64KB, 50%READ), (64KB, 100%READ).

Victim-iops and submove-rate

To find a linear model for the controlled storage system, we re-plotted the average victim iops as a function of the submove-rate Rm in Figure 6.6(b) based on the same experiments. We can see that Iv decreases linearly as a function of the submove-rate Rm with all workloads. Using linear regression techniques (as shown in Figure 6.6(b)), we find that the relationship between the victim-iops and the submove-rate can be formulated as

avg(Iv(k)) = Ic + GRm

where Ic is viewed as the constant capacity of the victim device. The process gain G of the controlled storage system can be approximated with the slope of the Iv vs. Rm curve. Note that G is different for different workloads and is especially sensitive to request size. A smaller I/O size leads to a G of larger magnitude, i.e., iops with smaller I/O is more sensitive to changes in migration speed. For example, G = -8.00 when the request size is 2KB and read only, which is more than twice the magnitude of G = –3.91 when the request size is 64KB and 50% read. Among all the workloads, Gmax = -8.00 has the largest magnitude and is used in our control design (Section 6.3).
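The slope estimation can be illustrated with an ordinary least-squares fit. The sketch below does not use the measured data, which is not listed in the text; it generates synthetic points from one of the fitted lines reported in Figure 6.6(b) and recovers the slope (the process gain G) and the intercept (the capacity Ic). The function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic points lying on y = -8.0027x + 188.87, not measured data.
submove_rates = [1.0, 2.0, 3.0, 4.0, 5.0]
average_victim_iops = [-8.0027 * r + 188.87 for r in submove_rates]
gain, capacity = fit_line(submove_rates, average_victim_iops)
print(gain, capacity)
```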

The submove rate Rm can be approximately converted to the inter-submove-time by Tim = SP/Rm - Tm, where SP is the sampling period and Tm is the submove time, i.e., the average duration of each submove. The average submove time over all sampling periods during migration for each workload is as follows:

Req. size   100%READ          50%READ
2KB         6.384±0.326 sec   6.807±0.388 sec
64KB        7.005±0.376 sec   6.873±0.332 sec

Since a smaller submove time leads to a longer sleep time in our control algorithm, we use the smallest measurement Tm = 6.384 sec in our Controller implementation to be conservative.
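The conversion can be written as a small helper. This is a sketch rather than the Controller's actual code: the function name is hypothetical, and clamping negative results to zero (the Actuator cannot sleep for a negative time) is an assumption.

```python
def inter_submove_time(submove_rate, sampling_period=30.0, submove_time=6.384):
    """Tim = SP / Rm - Tm, clamped at zero (the clamp is an assumption of this sketch)."""
    if submove_rate <= 0:
        raise ValueError("submove rate must be positive")
    return max(0.0, sampling_period / submove_rate - submove_time)


print(inter_submove_time(3.0))  # 30/3 - 6.384, about 3.6 sec of sleep between submoves
print(inter_submove_time(5.0))  # 30/5 - 6.384 is negative, so no sleep
```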

6.5.4. Performance Evaluation

In this section, we present the performance evaluation of the (closed-loop) Aqueduct prototype. The standalone-stream and its target device are not used in the performance evaluation. Two workloads are used in the experiments. In workload A, each I/O stream has a request size of 2 KB and all I/O requests are read-only. In workload B, each I/O stream has a request size of 64KB and 50% of the I/O requests are reads. With workload A, the iops reference of each device is 165 iops, and the reference (spec) of each device is 95 iops when workload B is used.

We use a baseline called AFAP, which is configured by turning off the Controller in Aqueduct and setting the fixed inter-submove-time to 0 sec. In all the evaluation experiments presented in this section, the initial value Tim(0) = 0 sec for the (closed-loop) Aqueduct. The sampling period is 30 sec. All the data points presented in this section (except for those in Figure 6.7) are the average values (with 90% confidence intervals) of 11 repeated runs.

Performance Metrics

The following performance metrics are used in the performance evaluation.

• IAi: Average iops of device i during migration.

• VRi: QoS violation ratio of device i. A QoS violation of device i is a sampling period k during which the iops of device i is less than its spec, i.e., Ii(k) < ISi. The QoS violation ratio of device i is defined as the number of QoS violations NVi divided by the total number of sampling periods NSP, i.e., VRi = NVi / NSP.

• WVi: Worst QoS violation of device i, defined as the largest error of device i during migration relative to its reference, i.e., WVi = max(ISi - Ii(k)) / ISi. Note that while the QoS violation ratio represents the frequency of QoS violations, the worst QoS violation represents the extent of QoS violation in the worst sampling period.

• Tr: Rise time of the storage system, defined as the duration from the beginning of migration (time 0) to the first sampling instant kW when Ii(k) > 0.98ISi for every device i. As discussed in Section 6.3.2, since the overshoot is small, we regard the system as in the "steady state" after the rise time. A reason for using the rise time instead of the real settling time as the start time of the steady state is that the rise time is easier to measure in a noisy system such as the storage system. The rise time describes how fast Aqueduct reaches the correct migration speed to approach the reference iops of all devices after migration starts. It is desirable to have a small rise time and therefore a short transient interval without QoS guarantees.

The following three metrics describe the system performance in the steady state.

• IASi: Average iops of device i in steady state.

• VRSi: Steady-state QoS violation ratio of device i, defined as the number of QoS violations after the system enters a steady state divided by the total number of sampling periods in the steady state.

• WVSi: Worst QoS violation of device i in the steady state, defined as the largest error of device i relative to its spec in steady state.

The last metric describes the efficiency of data migration.

• TDM: Execution time of a migration plan, defined as the duration of the execution of a migration plan. TDM represents the efficiency of data migration. Note that it is undesirable to over-throttle the migration speed while unnecessarily allowing devices to perform better than their specs.
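The definitions of VRi, WVi, and Tr translate directly into a few lines of code. The sketch below uses hypothetical function names and hypothetical sample values; it indexes sampling periods from k = 1 so that the kth sample corresponds to the sampling instant kW.

```python
def violation_ratio(samples, spec):
    """VRi: fraction of sampling periods in which the device iops is below its spec."""
    return sum(1 for s in samples if s < spec) / len(samples)


def worst_violation(samples, spec):
    """WVi: largest shortfall relative to the spec, max(ISi - Ii(k)) / ISi."""
    return max(spec - s for s in samples) / spec


def rise_time(samples_by_device, specs, window):
    """Tr: first sampling instant kW at which every device exceeds 0.98 of its spec."""
    for k, period in enumerate(zip(*samples_by_device), start=1):
        if all(s > 0.98 * spec for s, spec in zip(period, specs)):
            return k * window
    return None  # the threshold was never reached


# Hypothetical samples for two devices, spec IS = 165 iops, W = 30 sec.
device0 = [150.0, 160.0, 163.0, 166.0, 164.0]
device1 = [155.0, 158.0, 162.0, 167.0, 166.0]
print(violation_ratio(device0, 165.0))
print(worst_violation(device0, 165.0))
print(rise_time([device0, device1], [165.0, 165.0], 30.0))
```

The steady-state variants VRSi and WVSi follow by applying the same functions to the samples taken after the rise time.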

Figure 6.7. Device iops and control input of Aqueduct (top graph: device iops {Ii} (iops) vs Time (min), with the levels IS and 0.98IS marked; bottom graph: submove-rate Rm (1/(30 sec)) and inter-submove-time Tim (sec) vs Time (min); annotations: Ts = 90 sec, Steady-state, TDM = 720 sec)

A Typical Run

A typical run of Workload A is illustrated in Figure 6.7. The top graph illustrates the iops of all devices. The bottom graph shows the inter-submove-time Tim(k) and the submove rate Rm(k) computed by the Controller. The iops of all devices are less than the iops spec IS = 165 iops when migration starts with Tim(0) = 0 sec and Rm(0) = 5 submove/W. This is because the data migration is too fast and causes excessive resource contention with the concurrent I/O streams. Aqueduct reacts to the QoS violations by gradually increasing Tim(k) to slow down the migration. By time 90 sec (the rise time), the iops of all devices has increased to at least 0.98IS = 161.7 iops while the submove rate has been reduced to 3 submove/W. In the steady state, the iops of all devices stays above or close to the spec while the submove rate stays at 2-3 submove/W. The victim iops Iv(k) stays close to the reference in steady state. This demonstrates that Aqueduct converges to a correct speed and achieves QoS guarantees in the steady state. The system remains stable throughout the run. The execution time of the migration plan TDM = 720 sec in this run.

Average Device iops

The average device iops of the AFAP baseline and Aqueduct, and the steady state average iops of Aqueduct are illustrated in Figure 6.8. With AFAP, the average iops of every device is less than the spec with both workloads. In comparison, when Aqueduct executes the same migration plan, every device achieves an average iops higher than the reference. In addition, the steady state average iops IASi of Aqueduct is higher than its average iops, i.e., IASi > IAi > IS. This result shows that Aqueduct effectively increases the iops of every device to more than the reference. The performance improvement is especially significant after the system settles to the steady state.

Figure 6.8. Average iops of AFAP and Aqueduct, and Aqueduct in steady state. Panel (a): Average Device iops (Workload A: 64KB 50%READ). Panel (b): Average Device iops (Workload B: 2KB READ only). In both panels, x-axis: Device Number (0 to 3); y-axis: iops; series: IAi AFAP, IAi AQUEDUCT, IASi AQUEDUCT, SPEC IS.

QoS Violation Ratio

The QoS violation ratios of AFAP and Aqueduct, and the steady state QoS violation ratio of Aqueduct are illustrated in Figure 6.9. We can see that the AFAP baseline causes every device to violate its iops spec in most of the sampling periods: VRi > 90% for every device i in both workloads. In the Aqueduct case, the QoS violation ratio of every device is significantly lower than with the AFAP baseline, i.e., VRi < 35% (Workload A) and VRi < 30% (Workload B) for every device i. The QoS violation ratio in the steady state is further reduced to lower than 20% in both workloads, i.e., VRSi < 20% (Workloads A and B) for every device i. The steady-state QoS violation ratio in the Aqueduct case is less than ¼ of the QoS violation ratio of the AFAP baseline. Nevertheless, Aqueduct cannot eliminate QoS violations even in steady state because its feedback control loop oscillates around (rather than above) the specs. If a device iops Ii close to the spec, i.e., Ii ≥ 0.98ISi, is acceptable to the applications, we can treat ISi' = 0.98ISi as a relaxed spec for device i (see footnote 13). The QoS violation ratios based on the relaxed specs are plotted in Figure 6.10. We can see that the QoS violation ratios of all devices remain above 90% (both workloads) in the AFAP case. In the Aqueduct case, the QoS violation ratios of all devices are lower than 20%. Most importantly, the steady state QoS violation ratios are reduced to close to zero for all devices, i.e., VRSi < 5% for every device and workload. This means that Aqueduct successfully achieves QoS guarantees at 98% of the specs after it settles down to a steady state.

13 Note that Aqueduct can achieve the strict spec IS by using ISi/0.98 to compute the control input (see the Controller algorithm) so that the device iops converge to ISi/0.98 (instead of ISi).

Figure 6.9. QoS Violation Ratio of AFAP, Aqueduct, and Aqueduct in Steady State. Panel (a): Workload A: 64KB, 50% READ. Panel (b): Workload B: 2KB, READ only. In both panels, x-axis: Device Number (0 to 3); y-axis: QoS violation ratio; series: VRi AFAP, VRi AQUEDUCT, VRSi AQUEDUCT.

Figure 6.10. QoS violation ratio based on the relaxed spec 0.98IS (two panels; x-axis: Device Number (0 to 3); y-axis: QoS violation ratio; series: VRi AFAP, VRi AQUEDUCT, VRSi AQUEDUCT)

Worst QoS Violation

The worst QoS violations of AFAP and Aqueduct, and Aqueduct in steady state are illustrated in Figure 6.11. For the AFAP baseline, the worst QoS violation is more than 10% of the spec (both workloads) for every device except for device 4. The worst QoS violation of Aqueduct is lower than that of the AFAP baseline, but the difference is insignificant. This is because the worst QoS violations occur before the system settles to the steady state. In comparison, the worst QoS violation of every device in the steady state is WVSi < 3% with both workloads, which is significantly lower than the worst QoS violations of AFAP and of Aqueduct throughout the run. This means that device iops never becomes significantly lower than the spec in the steady state. The results on worst QoS violations and QoS violation ratios together show that iops guarantees are successfully achieved in the steady state.

Figure 6.11. Worst QoS Violations of AFAP, Aqueduct, and Aqueduct in steady state (two panels; x-axis: Device Number (0 to 3); y-axis: Worst QoS violation; series: WVi AFAP, WVi AQUEDUCT, WVSi AQUEDUCT)

Rise Time

The evaluation has shown that Aqueduct can successfully guarantee iops for every device in steady state. The rise time measures an orthogonal property, i.e., how fast Aqueduct can settle to a steady state. In our experiments, the rise time Tr of Aqueduct is 204.5(±17.6) sec for Workload A, and 144.5(±9.8) sec for Workload B. The rise time can be further reduced if a shorter sampling period is used. Note that Workload B (request size = 2KB, READ only) has a shorter rise time. This is because a smaller I/O size leads to a larger process gain and therefore the system is more responsive to feedback control.

Figure 6.12. Execution Time of Migration Plan (groups: (a) (64KB READ 50%) and (b) (2KB READ only); y-axis: Execution time of Migration Plan (sec); series: AFAP, AQUEDUCT)

Execution Time of the Migration Plan

The execution times of the migration plan for the AFAP baseline and Aqueduct are illustrated in Figure 6.12. Aqueduct achieves QoS guarantees at the cost of a longer execution time of the migration plan (TDM). For Workload A, Aqueduct has TDM = 765.0(±11.3) sec, 83% longer than the AFAP case. For Workload B, Aqueduct's TDM = 1012(±6) sec, 121% longer than the AFAP case.

In summary, the evaluation experiments demonstrate that 1) Aqueduct settles to a steady state after a rise time of Tr < 4 min for the tested workloads; 2) in the steady state, every device achieves its iops spec with a QoS violation ratio (based on 98% of the specs) of less than 5% and a worst QoS violation of less than 3% of the specs; 3) Aqueduct's execution time of the migration plan is longer than that of the AFAP baseline (by 83% for Workload A and 121% for Workload B).

6.6. Conclusion and Future Work

The Aqueduct project demonstrates the applicability of our FCS framework in a non-real-time application, i.e., networked enterprise storage systems. The major contributions of the Aqueduct project are summarized as follows.

• A performance study that demonstrates uncontrolled data migration may cause

significant performance penalty on concurrent application I/O.

• A feedback control architecture called Aqueduct that dynamically adapts

migration speed to guarantee specified iops for every device in a storage system.

• A control theory methodology including a system-profiling technique for modeling the storage system, the Root Locus method for tuning the controller, and a control theory analysis of the performance profile of the Aqueduct design.

• Implementation and evaluation of an Aqueduct prototype on a networked storage testbed that demonstrate that Aqueduct provides the desired iops guarantees for all devices during data migration.

The Aqueduct project has successfully made a case for the strength of the FCS framework in networked storage systems. Future work includes more realistic

workload/experiments on high end RAID, and a more general implementation that

interacts with performance monitoring tools. The division of each store into multiple

logical volumes cannot scale well for large stores because of limitations on the number of

logical volumes in a volume group and the LVM overhead. An efficient mechanism for

dividing moves/stores into submoves/substores needs to be developed. More

sophisticated Actuator mechanisms may also be developed, e.g., the Actuator may

dynamically change the plan to avoid busy devices in the next move.


Chapter 7

General Issues

In this chapter, we summarize some insights based on the application of the FCS framework to three different types of applications. In Section 7.1, we argue that

controlling aggregate performance (instead of each individual task) is a scalable and

practical control scheme in many computer systems. In Section 7.2, we discuss the design

tradeoffs related to the sampling period of feedback control loops in computer systems.

Finally, in Section 7.3 we discuss some open questions on the robustness regarding our

linear models and control design in actual non-linear and time variant environments.

7.1. Granularity of Performance Control

The granularity of performance control is an important issue for designing feedback

control resource scheduling algorithms. Ideally, a performance guarantee should be

provided for each individual task such as each process/thread, each TCP connection, and

each I/O steam. Although the individual guarantee may be possible for small systems

such as PC’s or simple digital embedded controllers, it may be impractical on a large

server system such as a web server or a storage server for e-business applications. First,

controlling each individual task is not scalable to server systems with a large number of

tasks. For example, it is not uncommon for a web server to handle millions of users and

TCP connections every hour, and performance control of each connection may introduce

extremely high overhead at run time. Second, the noisy and random behavior of each

individual task cannot be easily described by differential/difference equations or

controlled by classical control functions.

To solve the above problems, we control the aggregate performance in all

three applications in this thesis.

• Our real-time CPU scheduling algorithms (Chapter 4) control the aggregate

deadline miss ratio and total CPU utilization of all the tasks in the system.

• Our web server (Chapter 5) controls the average service delay of each service

class composed of hundreds of users.

• The Aqueduct data migration executor (Chapter 6) provides performance

guarantees on the aggregate throughput of I/O streams on each storage device.

Compared with performance control at the individual task level, aggregate

performance control is more scalable and efficient at run time. We have also shown that all

three applications can be sufficiently approximated with first-order or second-order

difference equations for the purpose of controlling the aggregate performance. For

example, although the queuing delay and processing of each individual task can be

difficult to describe with differential equations, the aggregate CPU utilization can be

readily approximated with a discrete integration model. Aggregate system behavior is

simple to model because the aggregation of a large number of individual tasks

tends to smooth out the noise of individual tasks and is therefore more amenable to

modeling based on classical differential/difference equations. The aggregated

performance guarantees are especially appropriate to applications where the individual

tasks of the same aggregate are equally important. For example, to support delay

differentiation in a web server, all the HTTP requests in the same service class should be

identical in terms of QoS requirements.
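The smoothing effect of aggregation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: each task's measurement is modeled as independent uniform noise, and the function name `aggregate_std` and its parameters are hypothetical, not part of any FCS implementation. For independent, identically distributed samples, the standard deviation of the mean of N samples shrinks in proportion to 1/sqrt(N), which is the property the sketch makes visible.

```python
import random
import statistics


def aggregate_std(num_tasks, num_periods=1000, seed=0):
    """Standard deviation of the per-period average over `num_tasks` tasks.

    Assumption: each task's measurement in a sampling period is an
    independent draw from a uniform distribution on [0, 1).
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(num_periods):
        total = sum(rng.random() for _ in range(num_tasks))
        averages.append(total / num_tasks)
    return statistics.pstdev(averages)


# Noise seen when monitoring one task versus an aggregate of 100 tasks.
single_task_noise = aggregate_std(1)
aggregate_noise = aggregate_std(100)
print(f"single task: {single_task_noise:.4f}, aggregate of 100: {aggregate_noise:.4f}")
```

Real task behavior is neither independent nor uniformly distributed, so the reduction observed in practice may be smaller than in this idealized setting.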

However, the disadvantage of aggregate performance control is that it may not

provide guarantees to each individual task. Special consideration may be necessary to

provide individual performance guarantees with an aggregate control scheme. For

example, in real-time CPU scheduling, each task can under certain assumptions be guaranteed to make its deadline by

combining aggregate CPU utilization control with knowledge of the schedulable

utilization bound from real-time scheduling theory (see

Chapter 4). In the storage data migration executor, we make a design tradeoff between

the practicality of control and individual performance by choosing the throughput of each

device, rather than that of each individual I/O stream or of the whole storage server, as the controlled

variable (see Chapter 6). Individual guarantees may also be handled by the actuator

mechanisms. If a system has critical tasks that require hard real-time guarantees, a fixed

amount of resources should be reserved for such tasks. In our FC-RTS algorithms

presented in Chapter 4, the QoS optimization algorithm in the actuator assigns higher

QoS levels to tasks with higher values or importance.
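The utilization-bound argument mentioned earlier in this paragraph can be sketched as follows. The sketch assumes the classic conditions of Liu and Layland [48] (independent, preemptable periodic tasks on one processor with deadlines equal to periods), under which the schedulable utilization bound is n(2^(1/n) - 1) for rate monotonic scheduling and 1 for EDF. The helper `utilization_set_point` and its `margin` parameter are hypothetical names introduced only for illustration.

```python
def rm_utilization_bound(num_tasks):
    """Liu and Layland bound for rate monotonic scheduling of n tasks."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    return num_tasks * (2 ** (1.0 / num_tasks) - 1)


def edf_utilization_bound():
    """Schedulable utilization bound for EDF under the same assumptions."""
    return 1.0


def utilization_set_point(bound, margin=0.0):
    """Hypothetical helper: a set point at or below the schedulable bound.

    If the controlled aggregate utilization stays at or below the bound,
    the assumed task model implies that no task misses its deadline.
    """
    if not 0.0 <= margin < bound:
        raise ValueError("margin must be in [0, bound)")
    return bound - margin


# Example: a set point for 10 rate monotonic tasks with a small safety margin.
set_point = utilization_set_point(rm_utilization_bound(10), margin=0.02)
```

When the task model violates these assumptions (for example, deadlines shorter than periods), the bound no longer applies and a different schedulability test would be needed.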

7.2. Sampling Period and Overhead

The sampling period of feedback control is an important parameter whose choice may require

some design trade-off. Intuitively, reducing the sampling period causes the Controller to

react faster to variations of run-time conditions and may result in better transient response

such as shorter settling time and lower overshoot. However, a smaller sampling period also

means more control overhead. The size of the sampling period may also have a lower

bound imposed by the workload. For example, to control the service delays of a web

server, enough TCP connections per service class should be dispatched to server

processes so that the monitor can infer a smooth service delay of each class at every

sampling instant. Otherwise the sampled delay may be dominated by the noise of a small

number of connections. Therefore the arrival rate of the TCP connections and the capacity of

the server determine a lower bound of the sampling window. This lower bound

decreases as the connection arrival rate and server capacity increase. This

property means that busier and more powerful web servers can benefit from a smaller

sampling period. Another design option is to use a low-pass filter in the monitor to

smooth out the performance samples [54]. However, the low-pass filter also tends to slow

down the Controller’s response to run-time conditions. The sampling period may need to

be compatible with the periodicity of the workload. For example, for a periodic task set

with harmonic arrival periods, the sampling period of the CPU scheduling algorithm

should be a multiple of the least common multiple of the tasks’ arrival periods. Otherwise the

system may oscillate frequently because the number of task instances processed differs from one

sampling period to the next, which introduces noise into the sampled miss ratio [51].

Therefore, the sampling period of performance control should be tailored to the specific

systems and workloads.
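The three constraints discussed in this section can be made concrete with a short sketch. All names below are hypothetical and chosen for illustration. The workload lower bound assumes the Monitor needs a fixed minimum number of completed requests per window, and the filter shown is an exponential moving average, which is one common low-pass filter and not necessarily the one used in [54].

```python
from functools import reduce
from math import gcd


def hyperperiod(periods_ms):
    """Least common multiple of integer task periods (in milliseconds)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods_ms)


def compatible_sampling_period(periods_ms, lower_bound_ms):
    """Smallest multiple of the hyperperiod that is >= lower_bound_ms."""
    h = hyperperiod(periods_ms)
    multiples = max(1, -(-lower_bound_ms // h))  # ceiling division on integers
    return multiples * h


def workload_lower_bound(min_samples, completions_per_second):
    """Shortest window (seconds) expected to contain min_samples completions."""
    return min_samples / completions_per_second


class LowPassFilter:
    """Exponential moving average: y(k) = a*x(k) + (1 - a)*y(k-1), 0 < a <= 1."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample  # the first sample initializes the filter
        else:
            self.value = self.alpha * sample + (1.0 - self.alpha) * self.value
        return self.value


# Tasks with periods 10, 20 and 40 ms and a 100 ms lower bound.
period_ms = compatible_sampling_period([10, 20, 40], 100)
```

A smaller `alpha` smooths the samples more strongly but delays the Controller's view of a real change, which mirrors the tradeoff described in the text.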

The overhead of an FCS algorithm includes time spent in the Monitor, the Controller,

and the Actuator. The Controller overhead is usually negligible if it uses simple linear

control algorithms on aggregated controlled variables as in all the case studies in this

thesis. On the other hand, the overhead introduced by the Monitor and Actuator may need

more consideration. The Monitor introduces overhead for the collection of performance

information to compute the controlled variables. For example, collecting the aggregated

CPU utilization (as in FC-U and FC-UM in Section 4.4.4) may be more efficient than

keeping track of the CPU utilization of each individual task in the system. The Actuator

introduces overhead for changing the manipulated variables. For example, if

preemptive scheduling is adopted in the web server (Section 5.4.1), the Actuator may

terminate established TCP connections and dispatch new TCP connections to server

processes. In contrast, non-preemptive scheduling allows the Actuator to be much

more efficient because it only needs to change the process budget variable of

each service class at each sampling instant. If no efficient Monitor or Actuator

mechanism is possible, the control designer may be forced to increase the sampling

period to reduce the relative overhead at the cost of slower response to run-time

variations.
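The relationship between per-sample overhead and sampling period can be written down directly. The following is a minimal sketch with hypothetical names and made-up cost figures. It assumes that the Monitor, Controller, and Actuator each incur a fixed cost once per sampling period, which ignores any monitoring cost that accrues per request.

```python
def relative_overhead(monitor_cost, controller_cost, actuator_cost, sampling_period):
    """Fraction of each sampling period spent in the feedback loop itself."""
    return (monitor_cost + controller_cost + actuator_cost) / sampling_period


def min_sampling_period(monitor_cost, controller_cost, actuator_cost, max_fraction):
    """Shortest sampling period that keeps the overhead within max_fraction."""
    if not 0.0 < max_fraction <= 1.0:
        raise ValueError("max_fraction must be in (0, 1]")
    return (monitor_cost + controller_cost + actuator_cost) / max_fraction


# Illustrative costs in milliseconds (not measured values): with an expensive
# Actuator, a 1% overhead budget forces a sampling period of about 1 second.
period_ms = min_sampling_period(2.0, 0.1, 7.9, max_fraction=0.01)
```

Halving the Actuator cost in this model halves the shortest admissible sampling period only if the Actuator dominates the total, which is why the text singles out the Monitor and Actuator for attention.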

7.3. Robustness of Linear Models and PI Control

In all three applications, we approximate a non-linear and time-varying computer system

with a linear and time-invariant model based on analysis or system identification

experiments. We then use classical linear control theory to design P (Proportional) or PI

(Proportional-Integral) Controllers based on the linear model. Our evaluation experiments

have shown that FCS algorithms developed using this linear control approach

demonstrate exceptional robustness despite approximations in the linear models. In all

of our evaluation experiments in the three applications, all the FCS algorithms provide

performance guarantees (including stability, small steady state error, and satisfactory

transient response) in the face of considerable variations in system workload and

non-linearities in the actual system. There remain two open questions on our linear models

and control approach: What conditions may cause the linear models to significantly

deviate from the actual non-linear computer systems? What conditions may cause

classical Controllers such as PI control to fail and therefore necessitate more

sophisticated control algorithms such as gain scheduling and adaptive control [12]?

The above two questions are different because PI Controllers may achieve satisfactory

performance even when the models are inaccurate. Answers to these questions may lead

to even more robust feedback control resource scheduling solutions in unpredictable

environments.
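The observation that a simple controller can tolerate an inaccurate model can be illustrated on a toy example. The sketch below is not the model or tuning used in this thesis. It assumes an integrator plant u(k+1) = u(k) + g·Δb(k) with an unknown gain g, controlled by a proportional controller Δb(k) = Kp·(set point − u(k)), and it ignores saturation. For this loop the error evolves as e(k+1) = (1 − g·Kp)·e(k), so the loop converges whenever 0 < g·Kp < 2, even if Kp was chosen for a different nominal gain.

```python
def simulate_p_control(actual_gain, kp, set_point=0.9, steps=50, initial=0.0):
    """Simulate a P controller on an integrator plant with gain `actual_gain`.

    Returns the trajectory of the controlled variable, starting at `initial`.
    """
    u = initial
    trace = [u]
    for _ in range(steps):
        error = set_point - u
        delta_b = kp * error           # controller output (change in load)
        u = u + actual_gain * delta_b  # plant: integrator with unknown gain
        trace.append(u)
    return trace


# Controller tuned for a nominal gain of 1.0, applied to plants whose actual
# gain ranges from half to three times the nominal value.
KP = 0.5
final_values = {g: simulate_p_control(g, KP)[-1] for g in (0.5, 1.0, 2.0, 3.0)}

# A gain five times the nominal value violates g * Kp < 2 and diverges.
diverged = simulate_p_control(5.0, KP)[-1]
```

The example also shows where such robustness ends: once the actual gain leaves the stable range, a fixed linear controller fails, which is the situation in which gain scheduling or adaptive control would be considered.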

Chapter 8

Conclusions and Future Work

In this thesis we establish Feedback Control real-time Scheduling (FCS) as a unified

framework for adaptive real-time systems based on feedback control theory. The FCS

framework supports fundamental resource scheduling solutions that provide robust

performance guarantees for real-time systems operating in unpredictable environments.

Such systems include open systems on the Internet such as online trading and e-business

servers, and data-driven systems such as smart spaces and agile manufacturing. In

contrast to ad hoc approaches that rely on laborious design/tuning/testing iterations, our

framework enables system designers to systematically design adaptive real-time systems

with established analytical methods to achieve robust performance guarantees in unpredictable environments.


We first introduce the major components and methodologies of the FCS framework in

general terms. The FCS framework includes a general feedback control scheduling

architecture that maps the feedback control structure to adaptive resource scheduling, a set

of performance specifications and metrics to characterize transient and steady state

performance of adaptive real-time systems, and a control theory based design

methodology for resource scheduling algorithms to satisfy their performance

specifications.

We then present our first application of the FCS framework, real-time CPU

scheduling. We develop an adaptive CPU scheduling architecture and a set of scheduling

algorithms that provide performance guarantees in terms of deadline miss ratio and CPU

utilization in CPU-bound real-time systems in the face of unpredictable task arrivals and

execution time variations. These scheduling algorithms are analytically designed and

tuned with feedback control theory based on a novel model of generic CPU-bound real-

time systems. Our simulation experiments demonstrate that our scheduling algorithms

can guarantee stability, desired miss ratio and CPU utilization in steady state, and

satisfactory transient performance in response to severe overloads and considerable

workload variations.

Our second application of the FCS framework is an adaptive architecture that

provides relative and absolute service delay guarantees for different service classes on

web servers under HTTP 1.1. This architecture is based on feedback control loops that

enforce delay guarantees for classes via dynamic connection scheduling and process

reallocation. We develop a system identification tool that enables system designers to

establish mathematical models for computer systems with unknown dynamics based on

experimental data. Based on a web server model established with our system

identification tool, we use a control theory method called the Root Locus method to

design feedback controllers to satisfy performance specifications. The adaptive

architecture has been implemented by modifying an Apache web server. Experimental

results demonstrate that our adaptive server provides robust delay guarantees even when

user populations of different classes vary significantly. Properties of our adaptive web

server also include guaranteed stability, and satisfactory efficiency and accuracy in

achieving desired delay or delay differentiation.

We also extend our FCS framework to a non-real-time application: on-line data

migration in networked storage servers. We develop a data migration executor called

Aqueduct that dynamically regulates data migration speed while guaranteeing specified

I/O throughput of concurrent applications. We implement an Aqueduct prototype on a

storage server testbed. Our performance evaluation experiments demonstrate that

Aqueduct completes data migrations while successfully maintaining the throughput

specifications.

The successful application of our approach in three significantly different applications

gives us confidence in the applicability of our FCS framework to a wide range of

real-time and non-real-time systems.

The presented work on FCS suggests many interesting directions for future work and

research. This thesis is mostly concerned with managing a single resource to

achieve a single dimension of performance guarantees. One direction for future research is

on feedback control scheduling of multiple resources for systems where bottleneck

resources can change at run-time. For example, e-business servers supporting dynamic

web contents such as database transactions and video/audio streaming may need feedback

control scheduling of the CPU, networking, memory, and storage in order to handle

different run-time conditions. Multiple concurrent feedback control loops and adaptive

mode-switching mechanisms need to be developed for such applications.

Another direction for future research is to extend the single node solutions in this

thesis to distributed systems such as server farms and smart spaces composed of

networked embedded systems. Future research is necessary to develop scalable and

decentralized control architectures and models that coordinate networked controllers to

achieve aggregate performance guarantees.

From a control theory perspective, it is interesting to investigate the application of

robust and adaptive control theory in computer systems with non-linearities and

variations that cannot be handled by the classical linear control scheme used in this

thesis. Research in these areas may lead to another leap toward the robustness and

flexibility of real-time systems in extremely unpredictable environments.

The current applications of the FCS framework are implemented as separate

application level software or as simulators. It would be interesting to implement the FCS

architecture as part of an OS kernel or middleware that offers a general set of service

APIs providing performance guarantees to applications.

References

[1] 3Com Corporation, “Gigabit Ethernet Comes of Age,” Technology white paper, June 1996.

[2] ANSI, “Fibre Channel Arbitrated Loop,” Standard X3.272-1996, April 1996.

[3] T. F. Abdelzaher, “An Automated Profiling Subsystem for QoS-Aware Services,” IEEE Real-Time Technology and Applications Symposium, Washington D.C., June 2000.

[4] T. F. Abdelzaher, E. M. Atkins, and K. G. Shin, “QoS Negotiation in Real-Time Systems and its Application to Automatic Flight Control,” IEEE Real-Time Technology and Applications Symposium, June 1997.

[5] T. F. Abdelzaher and N. Bhatti, “Web Server QoS Management by Adaptive Content Delivery,” International Workshop on Quality of Service, 1999.

[6] T. F. Abdelzaher and C. Lu, “Modeling and Performance Control of Internet Servers”, 39th IEEE Conference on Decision and Control, Sydney, Australia, December 2000.

[7] T. F. Abdelzaher and C. Lu, “Schedulability Analysis and Utilization Bounds for Highly Scalable Real-Time Services,” IEEE Real-Time Technology and Applications Symposium, Taipei, Taiwan, June 2001.

[8] T. F. Abdelzaher and K. G. Shin, "End-host Architecture for QoS-Adaptive Communication," IEEE Real-Time Technology and Applications Symposium, Denver, Colorado, June 1998.

[9] T. F. Abdelzaher and K. G. Shin, “QoS Provisioning with qContracts in Web and Multimedia Servers,” IEEE Real-Time Systems Symposium, Phoenix, Arizona, December 1999, pp. 44-53.

[10] J. Almeida, M. Dabu, A. Manikutty, and P. Cao, “Providing Differentiated Levels of Service in Web Content Hosting,” First Workshop on Internet Server Performance, Madison, WI, June 1998.

[11] Apache Software Foundation, http://www.apache.org.

[12] K. J. Astrom and B. Wittenmark, Adaptive control (2nd Ed.), Addison-Wesley, 1995.

[13] C. Aurrecoechea, A. Campbell, and L. Hauw, “A Survey of QoS Architectures,” 4th IFIP International Conference on Quality of Service, Paris, France, March 1996.

[14] P. Barford and M. E. Crovella, “Generating Representative Web Workloads for Network and Server Performance Evaluation,” ACM SIGMETRICS '98, Madison WI, 1998.

[15] G. Banga, P. Druschel, and J. C. Mogul, “Resource Containers: A New Facility for Resource Management in Server Systems,” Operating Systems Design and Implementation (OSDI'99), 1999.

[16] G. Beccari, et al., “Rate Modulation of Soft Real-Time Tasks in Autonomous Robot Control Systems,” EuroMicro Conference on Real-Time Systems, York, UK, June 1999.

[17] N. Bhatti and R. Friedrich, “Web Server Support for Tiered Services.” IEEE Network, 13(5), Sept.-Oct. 1999.

[18] P. R. Blevins and C. V. Ramamoorthy, “Aspects of a Dynamically Adaptive Operating Systems,” IEEE Transactions on Computers, Vol. 25, No. 7, pp. 713-725, July 1976.

[19] E. Borowsky, R. Golding, A. Merchant, L. Schreier, E.Shriver, M.Spasojevic, and J. Wilkes, “Using Attribute-Managed Storage to Achieve QoS,” 5th Intl. Workshop on Quality of Service, New York, June 1997.

[20] A. Bouch, N. Bhatti, and A. J. Kuchinsky, “Quality is in the Eye of the Beholder: Meeting Users' Requirements for Internet Quality of Service,” ACM CHI'2000, The Hague, Netherlands, April 2000.

[21] S. Brandt and G. Nutt, “A Dynamic Quality of Service Middleware Agent for Mediating Application Resource Usage,” IEEE Real-Time Systems Symposium, December 1998.

[22] G. Buttazzo, G. Lipari, and L. Abeni, "Elastic Task Model for Adaptive Rate Control," IEEE Real-Time Systems Symposium, Madrid, Spain, pp. 286-295, December 1998.

[23] M. Caccamo, G. Buttazzo, and L. Sha, “Capacity Sharing for Overrun Control,” IEEE Real-Time Systems Symposium, Orlando, FL, December 2000.

[24] R. Carr, Virtual Memory Management, Ann Arbor, MI: UMI Research Press, 1984.

[25] S. Cen, "A Software Feedback Toolkit and its Application In Adaptive Multimedia Systems," Ph.D. Thesis, Oregon Graduate Institute, October 1997.

[26] M. E. Crovella and A. Bestavros, “Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes,” IEEE/ACM Transactions on Networking, 5(6):835--846, Dec 1997.

[27] C. Dovrolis, D. Stiliadis, and P. Ramanathan, “Proportional Differentiated Services: Delay Differentiation and Packet Scheduling,” SIGCOMM’99, Cambridge, Massachusetts, August 1999.

[28] P. Druschel and G. Banga, “Lazy Receiver Processing (LRP): A Network Subsystem Architecture for Server Systems,” Operating Systems Design and Implementation (OSDI'96), Seattle, WA, October 1996.

[29] L. Eggert and J. Heidemann, “Application-Level Differentiated Services for Web Servers,” World Wide Web Journal, Vol 2, No 3, March 1999, pp. 133-142.

[30] J. Eker, "Flexible Embedded Control Systems-Design and Implementation." PhD-thesis, Lund Institute of Technology, Dec 1999.

[31] E-Soft Inc., “Web Server Survey,” http://www.securityspace.com.

[32] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” IETF RFC 2616, June 1999.

[33] G. F. Franklin, J. D. Powell and M. L. Workman, Digital Control of Dynamic Systems (3rd Ed.), Addison-Wesley, 1998.

[34] G. F. Franklin, J. D. Powell and A. Emami-Naeini, Feedback Control of Dynamic Systems (3rd Ed.), Addison-Wesley, 1994.

[35] S. Gribble, G. Manku, E. Roselli and E. Brewer, “Self-similarity in File Systems”, SIGMETRICS’98, April 1998.

[36] C. V. Hollot, V. Misra, D. Towsley, and W. Gong, “A Control Theoretic Analysis of RED,” IEEE INFOCOM, Anchorage, Alaska, April 2001.

[37] HP OpenView Homepage, http://www.openview.hp.com/.

[38] N. I. Kamenoff and N. H. Weiderman, “Hartstone Distributed Benchmark: Requirements and Definitions,” IEEE Real-Time Systems Symposium, 1991.

[39] Mathworks Inc., http://www.mathworks.com/products/matlab.

[40] R. P. Kar and K. Porter, "Rhealstone -- a Real Time Benchmarking Proposal," Dr. Dobb's Journal, 14(2), February 1989.

[41] D. L. Kiskis and K. G. Shin, “SWSL: A Synthetic Workload Specification Language for Real-Time Systems”, IEEE Transactions on Software Engineering, 20(10), October 1994.

[42] D. L. Kiskis and K. G. Shin, “A Synthetic Workload for a Distributed Real-Time System”, Journal of Real-Time Systems, 11(1), July 1996.

[43] M. Klein, T. Ralya, B. Pollak, R. Obenza, M. G. Harbour, A Practitioner's Handbook for Real-Time Analysis – Guide to Rate Monotonic Analysis for Real-Time Systems, Kluwer Academic Publishers, August 1993.

[44] C. Lee, J. Lehoczky, D. Siewiorek, R. Rajkumar, and J. Hansen, “A Scalable Solution to the Multi-Resource QoS Problem,” IEEE Real-Time Systems Symposium, Phoenix, AZ, Dec 1999.

[45] J. P. Lehoczky, L. Sha and Y. Ding, “The Rate Monotonic Scheduling Algorithm – Exact Characterization and Average Case Behavior,” IEEE Real-Time Systems Symposium, 1989.

[46] B. Li and K. Nahrstedt, “A Control-based Middleware Framework for Quality of Service Adaptations,” IEEE Journal of Selected Areas in Communication, Special Issue on Service Enabling Platforms, 17(9), Sept. 1999.

[47] J. Liebeherr and N. Christin, “Buffer Management and Scheduling for Enhanced Differentiated Services,” University of Virginia Tech. Report CS-2000-24, August 2000.

[48] C. L. Liu and J. W. Layland, “Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment,” Journal of the ACM, Vol. 20, No. 1, pp. 46-61, 1973.

[49] J. W. S. Liu, et al., “Algorithms for Scheduling Imprecise Computations”, IEEE Computer, Vol. 24, No. 5, May 1991.

[50] C. Lu, T. F. Abdelzaher, J. A. Stankovic, and S. H. Son, “A Feedback Control Architecture and Design Methodology for Service Delay Guarantees in Web Servers,” University of Virginia, Technical Report CS-2001-05, submitted to IEEE Transactions on Computers, Special Issue on QoS Issues in Internet Web Services, January 2001.

[51] C. Lu, J. A. Stankovic, T. F. Abdelzaher, G. Tao, S. H. Son and M. Marley, “Performance Specifications and Metrics for Adaptive Real-Time Systems,” IEEE Real-Time Systems Symposium, Orlando, FL, Dec 2000.

[52] C. Lu, J. A. Stankovic, G. Tao and S. H. Son, “Design and Evaluation of a Feedback Control EDF Scheduling Algorithm,” IEEE Real-Time Systems Symposium, Phoenix, AZ, Dec 1999.

[53] C. Lu, J. A. Stankovic, G. Tao, and S. H. Son, “Feedback Control Real-Time Scheduling: Framework, Modeling, and Algorithms,” University of Virginia, Technical Report CS-2001-06, submitted to Real-Time Systems Journal, Special Issue on Control-Theoretical Approaches to Real-Time Computing, January 2001.

[54] Y. Lu, A. Saxena, and T. F. Abdelzaher, “Differentiated Caching Services; A Control-Theoretical Approach,” International Conference on Distributed Computing Systems, Phoenix, AZ, April 2001.

[55] J. M. Maciejowski, Multivariable Feedback Design, Addison-Wesley, 1989.

[56] T. Madell, Disk and File Management Tasks on HP-UX, Prentice Hall, 1997.

[57] P. Mejia-Alvarez, R. Melhem, and D. Mosse, “An Incremental Approach to Scheduling during Overloads in Real-Time Systems,” IEEE Real-Time Systems Symposium, Orlando, FL, Dec 1999.

[58] V. Pai, P. Druschel and W. Zwaenepoel, “Flash: An Efficient and Portable Web Server,” USENIX Annual Technical Conference, Monterey, CA, June 1999.

[59] L. Palopoli, L. Abeni, F. Conticelli, M. D. Natale, and G. Buttazzo, “Real-Time Control System Analysis: An Integrated Approach,” IEEE Real-Time Systems Symposium, Orlando, FL, Dec 2000.

[60] G. Papadopoulos, “Moore’s Law Ain’t Good Enough”, keynote speech at Hot Chips X, August 1998.

[61] S. K. Park and K. W. Miller, “Random Number Generators: Good Ones Are Hard to Find,” Communications of the ACM, vol. 31, no. 10, Oct. 1988, pp. 1192-1201.

[62] R. Rajkumar, C. Lee, J. Lehoczky, and D. Siewiorek, “Practical Solutions for QoS-based Resource Allocation Problems,” IEEE Real-Time Systems Symposium, December 1998.

[63] D. Rosu, K. Schwan, and S. Yalamanchili, “FARA–a Framework for Adaptive Resource Allocation in Complex Real-Time Systems,” IEEE Real-Time Technology and Applications Symposium, June 1998.

[64] D. Rosu, K. Schwan, S. Yalamanchili and R. Jha, "On Adaptive Resource Allocation for Complex Real-Time Applications," IEEE Real-Time Systems Symposium, Dec 1997.

[65] M. Ryu and S. Hong, “Toward Automatic Synthesis of Schedulable Real-Time Controllers”, Integrated Computer-Aided Engineering, 5(3) 261-277, 1998.

[66] D. Seto, J. P. Lehoczky, L. Sha, and K. G. Shin, “On Task Schedulability in Real-Time Control Systems,” IEEE Real-Time Systems Symposium, December 1996.

[67] S. S. Skiena, The Algorithm Design Manual, Telos/Springer-Verlag, New York, November 1997.

[68] S. H. Son, R. Zimmerman, and J. Hansson, "An Adaptable Security Manager for Real-Time Transactions," Euromicro Conference on Real-Time Systems, Stockholm, Sweden, June 2000.

[69] D. C. Steere, et al., "A Feedback-driven Proportion Allocator for Real-Rate Scheduling," Symposium on Operating Systems Design and Implementation, Feb 1999.

[70] J. A. Stankovic, C. Lu, S. H. Son, and G. Tao, "The Case for Feedback Control Real-Time Scheduling," EuroMicro Conference on Real-Time Systems, York, UK, June 1999.

[71] J. A. Stankovic and K. Ramamritham (Eds), Hard Real-Time Systems, IEEE Press, 1988.

[72] J. A. Stankovic, M. Spuri, K. Ramamritham, and G. C. Buttazzo, Deadline Scheduling for Real-Time Systems – EDF and Related Algorithms, Kluwer Academic Publishers, 1998.

[73] K. G. Shin and C. L. Meissner, “Adaptation and Graceful Degradation of Control System Performance by Task Reallocation and Period Adjustment,” EuroMicro Conference on Real-Time Systems, June 1999.

[74] D. C. Steere, et al., "A Feedback-driven Proportion Allocator for Real-Rate Scheduling," Symposium on Operating Systems Design and Implementation, Feb 1999.

[75] Veritas Software Corporation, “Veritas Volume Manager,” http://www.veritas.com/us/products/volumemanager/.

[76] N. H. Weiderman and N. I. Kamenoff, “Hartstone Uniprocessor Benchmark: Definitions and Experiments for Real-Time Systems,” Journal of Real-Time Systems, 4(4), December 1992.

[77] L. R. Welch and B. A. Shirazi, "A Dynamic Real-time Benchmark for Assessment of QoS and Resource Management Technology," IEEE Real-time Technology and Applications Symposium, June 1999.

[78] L. R. Welch, B. Shirazi and B. Ravindran, “Adaptive Resource Management for Scalable, Dependable Real-time Systems: Middleware Services and Applications to Shipboard Computing Systems,” IEEE Real-time Technology and Applications Symposium, June 1998.

[79] W. Zhao, K. Ramamritham and J. A. Stankovic, “Preemptive Scheduling Under Time and Resource Constraints,” IEEE Transactions on Computers 36(8), 1987.
