Decentralised Agent-based Resource Allocation in Open and Dynamic Environments
Dissertation
Submitted in fulfilment of requirements for the degree of Doctor of Philosophy
Faculty of Information and Communication Technology
Swinburne University of Technology
Tino Schlegel
June 2010

I certify that I have read this dissertation and that, in my opinion, it is fully
adequate in scope and quality as a dissertation for the degree of Doctor of
Philosophy.
(Prof Ryszard Kowalczyk) Principal Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully
adequate in scope and quality as a dissertation for the degree of Doctor of
Philosophy.
Abstract
The on-demand provisioning of distributed applications and services in open, large-scale
and distributed systems such as the Internet is a complex undertaking. It requires an
adaptive resource allocation scheme that can effectively and efficiently allocate resources on
a large scale and across different administrative domains. Two of the main challenges for
resource allocation in such environments are the lack of full control over the resources and
the uncertainty and the limitation in the type and the amount of information about resource
providers and resource consumers. These challenges restrict the use of resource allocation
schemes that require central control or assume the availability of full information and direct
coordination between providers and consumers.
In this thesis, we study distributed resource allocation problems in open and dynamic
environments consisting of independent resource providers and resource consumers. In
such environments, consumers have to select a provider with sufficient resources for the
task execution. Shared resources that serve the computational needs of the consumers are
offered by the providers. The interest of each provider is an optimal utilisation of their
resources, or equivalently to minimise the amount of idle resources. On the other hand, the
consumers pursue the execution of their tasks at a provider where the demand of resources
does not exceed the capacity in order to get a high quality of service. The problem in this
setting is that providers and consumers operate independently and no central entity exists
that can effectively mediate the allocation of resources. Resource consumers are not directly
aware of each other, so they have no means to communicate or to coordinate their resource
allocation decisions explicitly. Instead, each consumer must learn to align its own resource
allocation decisions with those of the others.
This problem of the on-demand allocation of resources in open environments, where the
control over resources is decentralised among the participants, has not been well investigated
in the current research literature. Existing resource allocation schemes with decentralised
control are typically studied in a static context, where a fixed number of consumers with static
resource demands try to access a fixed amount of resources provided by a single provider.
This thesis addresses the above problems and proposes a multi-agent framework for
decentralised and dynamic resource allocation. Resource allocation decisions are made by
autonomous adaptive agents in the presence of changing demand for resources as well as
changing availability and capacity of shared resources. This resource allocation mechanism
is based on inductive reasoning techniques. It allows agents to reason about the
expected amount of available resources based on past observations. This knowledge enables
the agents to individually request the allocation of resources without direct coordination
between them, while pursuing the overall aim of collectively optimising the utilisation of the
shared resources. The resource allocation is created by the effective competition of agents
for the available resources and is a purely emergent effect.
The second contribution of this thesis is a study of the impact of different information models
with regard to the level of coordination between the agents. More specifically, we consider
the Publish-Subscribe and the Data-Pull information models. The results show that agents
can adapt their resource allocation decisions in the face of gradual changes in a dynamic
environment. The resource utilisation of a provider is closer to the optimal utilisation when
consumers have only limited and heterogeneous information that they individually collect
using the Data-Pull model as opposed to the level of coordination that can be achieved
when the providers publish their resource utilisation information globally. At the same
time, the resource allocation success rate for the agents is significantly higher with limited
information because agents are less reactive to fluctuations in the environment.
The applicability of the developed algorithm in open and dynamic environments is
demonstrated in a range of different scenarios. More specifically, we first examine environments
where consumers use the resources of a single provider in different settings with static and
dynamic capacities. Then, this thesis investigates different resource allocation strategies
in environments with multiple providers and tasks that require multiple types of resources
simultaneously. The empirical evaluation shows that the utilisation of resources is closer to
optimal when the consumers have less information available and only explore alternative
resource allocations when their recent allocations were unsatisfactory.
Acknowledgements
I would like to express my gratitude to my supervisor, Professor Ryszard Kowalczyk, who
supported me throughout my PhD candidature with his invaluable guidance, insightful
advice and constant encouragement during this difficult time. I would also like to thank
my co-supervisors, Professor Jun Han and Dr. Bao Vo, for their support. I appreciate all
of their thoughtful directions, feedback and encouragement.
Special thanks go to Dr. Peter Braun, Jan Richter, Mohan Baruwal Chhetri, Dr. Xuan
Thang Nguyen, Dr. Jian Feng Zhang, and all current and former members of the Intelligent
Agent Technology Group at Swinburne University of Technology.
This thesis has been supported financially by the Swinburne University of Technology, and
I would like to express my sincere gratitude for providing this opportunity.
Last but not least, I would like to thank my parents and wife, Dr. Linda Osman, for their
love and constant support.
I hereby declare that the thesis entitled “Decentralised Agent-based Resource Allocation
in Open and Dynamic Environments” submitted in fulfilment of the requirements for the
Degree of Doctor of Philosophy in the Faculty of Information and Communication
Technologies of Swinburne University of Technology, is my own work and that it contains no
material which has been accepted for the award to the candidate of any other degree or
diploma, except where due reference is made in the text of the thesis. To the best of my
knowledge, it contains no material previously published or written by another person except
where due reference is made in the text of the thesis.
Tino Schlegel
Publications
Portions of the material in this thesis have previously appeared in the following publications:
1. T. Schlegel, P. Braun, and R. Kowalczyk. Towards Autonomous Mobile Agents with
Emergent Migration Behaviour. In Proceedings of the Fifth International Joint
Conference on Autonomous Agents & Multi Agent Systems (AAMAS 2006), Hakodate
(Japan), pp. 585–592, ACM Press, 2006
2. T. Schlegel and R. Kowalczyk. Towards Self-organising Agent-based Resource
Allocation in a Multi-Server Environment. In Proceedings of the Sixth Intl. Joint Conf.
on Autonomous Agents and Multiagent Systems (AAMAS 2007), Hawai’i, USA, pp.
74–81, IFAAMAS, 2007
3. T. Schlegel and R. Kowalczyk. Self-organizing Nomadic Services in Grids. In M.
Prokopenko (ed.) Advances in Applied Self-organizing Systems, pp. 217–244, Springer,
2008
4. T. Schlegel, R. Kowalczyk, and B.Q. Vo. Decentralized Co-Allocation of Interrelated
Resources in Dynamic Environments. In Proceedings of the 2008 IEEE/WIC/ACM
International Conference on Intelligent Agent Technology (IAT-08), Sydney,
Australia, pp. 104–108, IEEE Press, 2008
Contents
1.2 Exemplar Problem Domains
1.2.1 Rich Internet Applications
1.2.2 Service-Oriented Computing
1.3 Research Requirements
1.4 Research Contributions
1.4.1 Research Questions
1.4.2 Scientific Contributions
1.5 Thesis Outline
2.2 Survey
2.2.3 Grid Computing
2.2.4 Peer-to-Peer Computing
2.2.5 Multi-Agent Systems
3.1 Multi Agent Systems
3.2 Fundamentals of Non-cooperative Game Theory
3.2.1 Rationality Concepts
3.2.2 Solution Concepts
3.3.1 Nash equilibrium Solutions
3.3.2 Learning in the El Farol Bar Problem
3.3.3 Reinforcement Learning Approaches
3.3.4 Inductive Reasoning Agents
3.3.5 Discussion
3.4 Summary
4.1.1 Basic Terminology
4.2.1 Publish-Subscribe Model
4.2.2 Data-Pull Model
4.3.1 Problem Description
4.3.3 Problem Formalisation
4.4 Histories, Beliefs and Strategies
4.5 Basic Consumer Algorithm
4.6 Assumptions
4.7 Summary
5.1 System Model
5.2.1 Global Resource Allocation Strategy
5.2.2 Provider Selection Strategies
5.2.3 Residual Capacity Prediction
5.2.4 Predictor Specification
5.2.5 Decision Evaluation
5.3.1 Problem Formulation
5.3.2 Resource Exploration
5.4 Theoretical Investigations
5.5 Empirical Evaluation
5.5.2 Hypotheses
5.5.4 Dynamic Environment Results
5.6 Summary
6.1 Model Extension
6.2.1 Approach
6.2.3 Selection of Unpredictable Provider
6.2.4 Resource Exploration
6.2.5 Decision Evaluation
6.3 Empirical Evaluation
6.3.2 Hypotheses
6.3.4 Co-Allocation of Interrelated Resources Results
6.4 Summary
7.1 Research Summary
7.2 Research Contributions
7.3 Further Research
List of Tables
3.1 Payoff Matrix – Battle of the Sexes
5.1 Confidence intervals for Bernoulli distribution with a normal distribution for n=100 and different levels of p.
5.2 Comparison of the resource allocation qualities of the local and the global strategy as well as the random choice game with both provider selection strategies in various static environments.
5.3 Comparison of the minimal, median and maximum resource allocation success rates for the local strategy and the global strategy with the LMAP selection strategy in various static environments.
5.4 Comparison of the resource allocation qualities of the local and the global strategy with both provider selection strategies in various dynamic environments.
5.5 Comparison of the average distance of the resource usage from the capacity for the local strategy and the global strategy with both provider selection strategies, m ∈ {10, 15} and |Πi| ∈ {2, 4} in various dynamic environments.
6.1 Comparison of the resource allocation qualities for both provider selection strategies in Experiment 1 (unsaturated) and 2 (saturated).
6.2 Comparison of the providers’ average resource usage and standard deviation for both provider selection strategies in Experiment 2 (saturated).
6.3 Comparison of the minimal, median and maximum resource allocation success rates for experiments 1-4
6.4 Comparison of the providers’ resource allocation qualities for both provider selection strategies in Experiment 3 (unsaturated) and 4 (saturated).
6.5 Comparison of the providers’ resource allocation qualities for both provider selection strategies in Experiment 6.
List of Figures
1.1 Web server providing shared resources for a limited number of clients
1.2 Resource consumer in an environment with resource providers in different administrative domains
2.1 Centralised Resource Allocation Paradigm
2.2 Distributed Resource Allocation Paradigm
2.3 Decentralised Resource Allocation Paradigm
3.1 Perfect reasoning assumes that agents apply deductive reasoning to translate the complete knowledge about the game into the best possible action.
3.2 Feedback loop of inductive reasoning and bounded rational agents
4.1 Basic model concepts (on left) with examples (on right). Arrows indicate relationships between concepts.
4.2 Comparison of the Publish-Subscribe and Data-Pull Information Models.
4.3 Resource information using (a) the Publish-Subscribe Model and (b) the Data-Pull Model
4.4 Model of an open environment
4.5 Decentralised, agent-based resource allocation algorithm
5.1 Resource allocation scenario with multiple consumers and a single provider for each resource type.
5.2 Example of resource allocation decisions process with 3 consumers.
5.3 Multi-consumer resource allocation decision problem in an environment with a single provider with local and global information.
5.4 Global versus local history information of 3 consumers ci, cj and ck about provider pl.
5.5 Comparison of resource usage and variance of the global and local strategy using the last most accurate predictor selection strategy in various static environments.
5.6 Comparison of resource usage and variance of the local strategy for different predictor set and history sizes using the last most accurate predictor selection strategy in various static environments.
5.7 Comparison of resource usage and variance of the global strategy for different predictor set and history sizes using the last most accurate predictor selection strategy in various static environments.
5.8 Comparison of resource usage and variance of the local strategy for different predictor pool sizes using the last most accurate predictor selection strategy in various static environments (m ∈ {8, 12}).
5.9 Comparison of resource usage and variance of the global strategy for different predictor pool sizes using the last most accurate predictor selection strategy in various static environments (m ∈ {8, 12}).
5.10 Comparison of resource usage and variance of the global strategy and the local strategy for a variety of arbitrary predictors with different history sizes using the last most accurate predictor selection strategy in various static environments.
5.11 Comparison of resource usage and variance for the local strategy with different predictor selection strategies and history sizes in various static environments.
5.12 Comparison of resource usage and variance for the global strategy with different predictor selection strategies and history sizes in various static environments.
5.13 Convergence of average resource usage and standard deviation for the local strategy and the global strategy with the LMAP strategy, m = 15 and |Πi| = 2 in various static environments.
5.14 Comparison of the resource usage histograms for the local strategy and the global strategy with the LMAP strategy, m = 15 and |Πi| = 2 in various static environments.
5.15 Comparison of the individual resource allocation success rates of the local strategy and the global strategy with the LMAP strategy, m = 15 and |Πi| = 2 in various static environments.
5.16 Comparison of resource usage and variance for the local strategy and the global strategy with the LMAP strategy, m = 15 and |Πi| = 2 in various dynamic environments.
5.17 Comparison of resource usage and variance for the local strategy and the global strategy with the HPSS strategy, m = 15 and |Πi| = 2 in various dynamic environments.
5.18 Comparison of the resource usage histograms for the local strategy and the global strategy with the LMAP strategy, m = 15 and |Πi| = 2 in various static environments.
5.19 Comparison of the resource usage histograms for the local strategy and the global strategy with the HPSS strategy, m = 15 and |Πi| = 2 in various static environments.
5.20 Convergence of average resource usage and standard deviation for the local strategy and the global strategy with the HPSS strategy, m = 15 and |Πi| = 2 in an environment with a dynamic consumer population, |C| ∈ [20, 100].
5.21 Convergence of average resource usage and standard deviation for the local strategy and the global strategy with the HPSS strategy, m = 15 and |Πi| = 2 in an environment with a dynamic consumer population, |C| ∈ [60, 140].
6.1 System Model with multiple candidate providers and task with multiple resource types.
6.2 Multiple Provider Decision Problem with local information
6.3 Normalisation of the expected residual capacities for the max-min Greedy Provider Selection Strategy.
6.4 Decay rate of historical information
6.5 Example history of consumer ci about two candidate providers
6.6 Experiment 1: Adaptive resource allocation in static, unsaturated environment with the greedy provider selection strategy.
6.7 Experiment 1: Adaptive resource allocation in static, unsaturated environment with the provider confidence selection strategy.
6.8 Experiment 2: Adaptive resource allocation in static, saturated environment with the greedy provider selection strategy.
6.9 Experiment 2: Adaptive resource allocation in static, saturated environment with the provider confidence selection strategy.
6.10 Experiment 2: Comparison of the relative resource usage histograms for both provider selection strategies in a static, unsaturated environment.
6.11 Experiment 3: Average resource usage development in dynamic, unsaturated environment with the greedy provider selection strategy.
6.12 Experiment 3: Adaptive resource allocation in dynamic, unsaturated environment with the provider confidence selection strategy.
6.13 Experiment 4: Average resource usage development in dynamic, saturated environment with the greedy provider selection strategy.
6.14 Experiment 4: Average resource usage development in dynamic, saturated environment with the provider confidence selection strategy.
6.15 Experiment 4: Comparison of the relative resource usage histograms for both provider selection strategies in dynamic, saturated environment.
6.16 Experiment 5: Average resource usage development with the greedy provider selection strategy in environment with a dynamic provider population.
6.17 Experiment 6: Average resource usage development in an environment with dynamic consumer population and dynamic provider capacities and the greedy provider selection strategy.
6.18 Experiment 6: Average resource usage development with dynamic consumer population and dynamic provider capacities and the provider confidence selection strategy.
6.19 Experiment 6: Comparison of the relative resource usage histograms for both provider selection strategies in static environments (Providers p1-p3).
6.20 Experiment 6: Comparison of the relative resource usage histograms for both provider selection strategies in static environments (Providers p4-p6).
6.21 Experiment 7: Average resource usage development with the greedy provider selection strategy in a static environment where tasks require multiple resource types.
6.22 Experiment 8: Average resource usage development with the greedy provider selection strategy in an open and dynamic environment where tasks require multiple resource types.
1 Introduction
The increasing popularity of distributed computing paradigms such as Grid computing
[Foster and Kesselman, 1999], Peer-to-Peer systems [Buford et al., 2008] and, more recently,
Cloud Computing [Velte et al., 2009] has transformed the Internet into a powerful computing
platform for distributed applications. Important characteristics of these environments
are computational and geographical distribution, a dynamic system architecture, the lack
of coherent global knowledge and a dispersed ownership and control over the computing
resources [Foster and Kesselman, 1999].
Nowadays, most applications are distributed so that users can remotely access services, data,
and devices on demand from geographically dispersed locations and across different
administrative domains. Examples of such distributed applications are communication software
(e.g. http://www.googlewave.com or http://www.twitter.com), social networks
(http://www.facebook.com), photo-sharing (http://www.flickr.com or http://www.photobucket.com)
and personal finance (http://www.westpac.com or http://finance.yahoo.com), among others.
Not only do they enable access to a variety of information and computing resources regardless
of the current location, they also allow people at multiple locations to collaboratively share
and edit documents, spreadsheets, slide shows and photos.
The on-demand provisioning of distributed applications and services in open, large-scale and
highly distributed environments like the Internet is a complex undertaking. Every application
or service request is associated with a task that needs to be executed. The execution
requires the allocation of some amount of shared resources on a resource provider. The
efficient and flexible allocation of resources on a large scale and across different administrative
domains is a critical requirement for the provisioning of distributed applications in such
environments. Traditional resource allocation approaches, which are typically based on the
central resource allocation paradigm with a central decision maker, are not suitable for this
kind of environment as they require full control over the resources and global knowledge
about the state of the environment.
It has been recognised that a shift away from the central resource allocation paradigm is
required [Wolpert, 2003, Eymann et al., 2007]. Open, large-scale and dynamic environments
require a resource allocation scheme that can cope with the challenges introduced by these
environments. The main challenges are the lack of full control over the resource allocation
process as well as the uncertainty and the limitation in the type and the amount of available
information on different providers and consumers [Schopf, 2004]. These challenges are
mainly caused by the distribution of the resources across different administrative domains.
1.1 Resource Allocation in Open Environments
Resource allocation is one of the fundamental and most important problems in computer
science. It describes a mechanism for the efficient and effective management of the access
to a limited resource or set of resources by its various consumers. A common formulation
of a resource allocation problem in a distributed computing environment is to assign a set
of tasks to resources and schedule their executions to optimise certain performance metrics.
Examples of such metrics are the maximisation of the number of jobs that can be processed
without deadline violations or the minimisation of the completion time of the entire job
set [Lu, 2006]. This problem has been proven to be NP-complete [Coffman, 1976], and thus
any practical resource allocation algorithm presents a trade-off between the computational
complexity and quality of the allocations [Braun et al., 2001].
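As a rough illustration of this trade-off, the following Python sketch compares a fast greedy heuristic with an exhaustive search for one of the metrics mentioned above, the completion time of the entire job set. It is a minimal sketch under stated assumptions, not a method taken from this thesis: the function names `greedy_makespan` and `optimal_makespan`, the identical resources and the task durations are all hypothetical.

```python
from itertools import product

def greedy_makespan(durations, n_resources):
    """Greedy heuristic (assumed for illustration): longest task first,
    each task placed on the currently least-loaded resource."""
    loads = [0] * n_resources
    for d in sorted(durations, reverse=True):
        i = loads.index(min(loads))  # first least-loaded resource
        loads[i] += d
    return max(loads)

def optimal_makespan(durations, n_resources):
    """Exhaustive search over all assignments; the number of assignments
    grows exponentially with the number of tasks."""
    best = sum(durations)
    for assignment in product(range(n_resources), repeat=len(durations)):
        loads = [0] * n_resources
        for d, r in zip(durations, assignment):
            loads[r] += d
        best = min(best, max(loads))
    return best

tasks = [3, 3, 2, 2, 2]  # hypothetical task durations
print(greedy_makespan(tasks, 2), optimal_makespan(tasks, 2))  # prints "7 6"
```

For this small instance the heuristic finishes after 7 time units while the optimum is 6; the exhaustive search already examines 2^5 assignments and becomes impractical as the task set grows.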
Many solutions to the resource allocation problem in a system with central control
exist. Often, those approaches employ conventional optimisation techniques (e.g. vector-packing
algorithms [Shahabuddin et al., 2001], integer programming techniques [Sheikh and
Khan, 2005, Gertphol and Prasanna, 2005] or constraint programming techniques [Hladik
et al., 2008]) to compute optimal resource allocations at the expense of scalability for
large problem sizes. They typically assume full control over the resource allocation process
and complete knowledge about the state of all resources and the entire set of tasks.
These assumptions, however, do not hold in an open, large-scale and dynamic environment
spanning different administrative domains. This kind of environment consists
of autonomous, heterogeneous resource consumers and resource providers, which are not
known in advance. The structure of the system is unknown and can change dynamically
over time. Resources are owned and maintained by different organisations that
do not willingly hand over control of their resources to a central resource management
authority. Resource consumers or resource providers can enter and leave the system at any
time without prior notification, or they can change their interests unexpectedly. The supply
of and demand for shared resources can increase or decrease over time, reflecting different
needs.
With respect to distributed systems, a number of resource allocation approaches for
domains such as high-performance cluster computing [Buyya, 1999, Yeo and Buyya, 2006],
the scheduling of complex workflows [Pandey and Buyya, 2008], load balancing of Internet
applications [Bourke, 2001] or Grid computing applications [Nabrzyski et al., 2004] have been
proposed in the literature. Those approaches address the problems of distribution, scalability
and the lack of full control over the resources in open and dynamic environments, which
prevents the use of conventional approaches based on optimisation techniques. In recent years,
market-based techniques [Clearwater, 1996] have gained significant attention in the development
of approaches for the allocation of computationally expensive tasks in open environments.
They partially distribute the control over the resource allocation process among the entities
in the system and can work with incomplete information. Market-based approaches also
allow the consumers to better specify their personal preferences. The practical suitability of
most traditional approaches for the on-demand allocation of resources in open and dynamic
environments in real time, however, is limited by their high computational complexity,
high communication overhead and scalability problems when the number of participants
in the market increases [Haque et al., 2005]. Those approaches and their limitations in open
and dynamic environments are detailed in Chapter 2.
Over the past years, there has been a growing realisation of how useful it is to control open,
large-scale environments with little to no centralised communication, where the decisions
are made by autonomous individuals that have little detailed knowledge about the system’s
small-scale dynamic behaviour [Wolpert, 2003].
This thesis studies the resource allocation problem under decentralised control in open and
dynamic environments. We review existing resource allocation approaches and investigate
their suitability for solving the resource allocation problem in an environment without
central control or any direct communication between the decision makers. Our resource
allocation framework employs autonomous, adaptive agents that allow a decentralisation of
control over the resource allocation process. The dynamic and distributed nature of open,
large-scale systems can benefit from adaptive and decentralised control. The decentrali-
sation of the resource allocation decisions increases not only the robustness of individual
systems, but also makes it possible to build adaptive, self-organising and scalable systems that are
able to maintain their efficiency in the face of unexpected environmental changes. The re-
source allocations can emerge through the interaction between many self-interested resource
consumers competing for shared resources.
In recent times, the idea of decentralising the resource allocation process has gained sig-
nificant attention [Greenwald, 1999, Wolpert et al., 2000, Mainland et al., 2005, Vengerov,
2007]. Simple models and algorithms [Arthur, 1994, Challet and Zhang, 1997, Franke, 2003,
Hichri and Kirman, 2007] that enable the decentralised coordination of self-interested indi-
viduals in simplified, static environments have been proposed. However, the applicability of
those proposed approaches for the decentralised, on-demand allocation of resources in open
and dynamic environments has not been studied before and remains an open issue.
The central objective of this thesis is the development of a fully decentralised resource al-
location algorithm for open and dynamic environments. The performance goal for the re-
source allocation algorithm is the maximisation of the resource utilisation while avoiding
the over-allocation of the resources of any provider beyond its capacity. The idea is to use
autonomous, adaptive agents that request the allocation of resources without any direct
communication to coordinate their decisions with other consumers. To do this, the
agents must learn to adapt their resource allocation behaviour to the changes in a dynamic
environment. A special objective of the algorithm is the efficient coordination between the
autonomous users in the case that the total demand of all users exceeds the total capacity
of the providers. It is desired to provide a solution that is computationally efficient with
low communication requirements, which is also robust, scalable and able to maintain its
efficiency in the face of unexpected changes of an open and dynamic environment.
1 Introduction
In particular, the focus is on environments that are characterised by many frequent interac-
tions, where each interaction requires the real-time allocation of different types of resources.
This enables the use of information from previous task allocations to make future resource
allocation decisions. In this thesis, we define a task as anything that requires one or more
resources for its execution. A resource can be anything that can be allocated – typically
computing resources such as CPU cycles, system memory, and hard disk space or network
bandwidth.
The next section motivates our research with two real-world problem domains and illustrates
the research problems that need to be solved.
1.2 Exemplar Problem Domains
This section introduces two real-world application domains that gain significant advantages
from decentralised resource allocation schemes. The first application domain is Rich Internet
Applications (RIAs) introduced in Section 1.2.1. The second application domain that can
benefit from an efficient decentralised resource allocation framework is Service-Oriented
Computing (SOC), which is introduced in Section 1.2.2.
1.2.1 Rich Internet Applications
Rich Internet Applications are Web applications and Web services that are accessed online
with a web browser, while the software and data are stored on the servers. Examples of
Rich Internet Applications are travel booking services [5], online office applications [6], social
networking [7], photo-sharing applications [8] or file storage applications [9] that can be accessed
with a modern Web browser on-demand from any device with network access. Typically,
Rich Internet Applications can be described as an on-demand computing paradigm that
is characterised by frequent and short interactions with a remote resource provider where
the allocation of small amounts of resources is required in real-time. Typical for such
applications are a short time between subsequent interactions, and subsequent allocation
requests that have similar resource demands. Figure 1.1 illustrates such a scenario, in which
a large number of resource consumers access a RIA. Each request sent by consumer $c_i$
requires the allocation of a fixed amount of resources, $\omega_i$, for the execution at the provider,
who offers free access to its shared resources of a known capacity, say $\Gamma$. The resources can
be shared by multiple users simultaneously. However, the quality of service is only assured
when the demand for resources does not exceed the provided capacity. This scenario is a
typical example of an open environment where multiple entities share a common pool of
resources that requires an adaptive, decentralised resource allocation scheme.
[5] For example http://www.expedia.com
[6] For example, www.zoho.com offers a suite of online office applications (writer, spreadsheets etc.) that can be accessed from the Website with a modern web browser.
[7] For example http://www.facebook.com
[8] For example http://www.flickr.com
[9] For example http://www.4shared.com
Figure 1.1: Web server providing shared resources for a limited number of clients
High volumes of incoming service requests from a large number of autonomous users can
cause serious technical problems for the provider. The lack of resources
is intensified at peak times, when the total demand of all users, $\Omega = \sum_i \omega_i$,
can rise to several times the provided capacity, $\Gamma$. The ability to handle resource
allocation requests in peak times while minimizing the cost of required resources is the
subject of ongoing research. Most existing resource allocation mechanisms for on-demand
resource allocation do not handle this situation [Shahabuddin et al., 2001, Bourke, 2001,
Kopparapu, 2002, Zhou and Ippolitia, 2008].
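The capacity condition described above can be made concrete with a small sketch. The function names and the example numbers are illustrative assumptions and do not come from the thesis; the code only checks whether the total demand Ω of the current requests stays within the provider capacity Γ.

```python
from typing import Sequence


def total_demand(demands: Sequence[float]) -> float:
    """Total demand of all users: the sum of the individual demands."""
    return sum(demands)


def is_over_allocated(demands: Sequence[float], capacity: float) -> bool:
    """True when the total demand exceeds the capacity offered by the provider."""
    return total_demand(demands) > capacity


def demand_ratio(demands: Sequence[float], capacity: float) -> float:
    """Total demand relative to capacity; values above 1.0 mean over-allocation."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return total_demand(demands) / capacity
```

For instance, two simultaneous requests of 150 units each against a capacity of 100 units give a demand ratio of 3.0, which corresponds to the peak-time situation in which demand is several times the capacity.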
A common workaround for the provisioning of a high quality of service (QoS) (i.e. response
time) even in peak times, is to prepare the provider for such a worst case scenario in advance.
This requires huge investments in provider resources for the delivery of a high quality of
service in peak times. This is often realised with the virtualisation of resources, where
multiple heterogeneous resources managed by a local instance appear as one homogeneous
entity to the clients [David et al., 2009]. However, this workaround does not eliminate the
actual problem of coordinating the access to the resources when the total demand exceeds
the capacity of the provider. In addition, it is a very costly solution, in which a large amount
of resources is underutilised most of the time.
A central objective of this work is the development of a decentralised resource allocation
mechanism that enables the efficient coordination between self-interested, autonomous con-
sumers in peak times, when the total demand for resources exceeds the capacity of the
provider.
The technical problems are caused by the lack of coordination among the autonomous
users competing for the limited capacity of the provider. This is a common resource sharing problem
that occurs when unrestricted access to a publicly available resource with limited capacity
is provided. It is known as the “Tragedy of the Commons”, a social trap that was
popularised by Hardin [1968]. Thus, the provisioning of a RIA requires a decentralised and
adaptive resource allocation scheme that can effectively coordinate the large number of
incoming requests from autonomous users to prevent the over-allocation of the provider
resources.
1.2.2 Service-Oriented Computing
The second domain that can benefit from a decentralised resource allocation framework is
the service-oriented computing domain. Service-oriented computing (SOC) [OASIS, 2006]
is an emerging computing paradigm for building, delivering and consuming distributed
applications. Each application is modelled as a collection of loosely-coupled services com-
municating via the exchange of standardised documents. Often the services are located
on different hosts and potentially across different administrative boundaries. Each service
execution is associated with a task that requires the allocation of shared resources offered
by the provider. Although this paradigm provides great flexibility with clear advantages,
it has its own share of problems that need to be solved. An important
subtask of Quality of Service (QoS) management is service selection. Services must
be selected from a list of candidates so that the requirements on the QoS of the composition
can be satisfied.
Figure 1.2: Resource consumer in an environment with resource providers in different administrative domains (labels in the figure: “Resource providers discovered by the client”; P1 C=50, P2 C=40; T1 R=15, T2 R=15, T3 R=25, T4 R=8, T5 R=34)
The same service is often provided by different providers located in their own administra-
tive domain, i.e. the resources in each domain are managed by their own administrative
authority. The providers differ in the quality-of-service, which is determined by the capacity
of resources that they offer and the current demand of resources. Thus, to ensure a high
quality of service, each client must select a provider from the list of candidate providers
that offers enough free resources for the service execution. In this environment, many au-
tonomous clients compete for shared resources of the same set of providers simultaneously.
Each client has to select a provider based on its limited view about the environment. This
situation with one client and a set of service providers is visualised in Figure 1.2. If all
clients select the same service provider in their composition, that provider
may not have enough capacity to serve all clients simultaneously. As a consequence, the
QoS of the individual service may be violated. The problem is intensified by the fact that
users are autonomous and self-interested and are not explicitly aware of each other. They
may know that the shared resource is used by others simultaneously; however, they have
no way to observe their presence.
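The selection problem described above can be illustrated with a short sketch. The provider names, capacities and demands below are hypothetical values chosen for illustration, and the helper functions are assumptions rather than part of the thesis. The sketch computes the load that a set of independent provider choices places on each provider and reports which providers end up over-allocated.

```python
from typing import Dict, List


def provider_loads(choices: List[str], demands: List[float]) -> Dict[str, float]:
    """Sum the demand that the clients' independent choices place on each provider."""
    loads: Dict[str, float] = {}
    for provider, demand in zip(choices, demands):
        loads[provider] = loads.get(provider, 0.0) + demand
    return loads


def overloaded(loads: Dict[str, float], capacities: Dict[str, float]) -> List[str]:
    """Return the providers whose load exceeds their capacity."""
    return sorted(p for p, load in loads.items() if load > capacities[p])


# Hypothetical environment: two providers and four clients.
capacities = {"P1": 50.0, "P2": 40.0}
demands = [15.0, 15.0, 8.0, 25.0]

# Every client picks P1 on its own: 63 units of demand meet 50 units of capacity.
all_same = provider_loads(["P1", "P1", "P1", "P1"], demands)

# A coordinated outcome: the last client uses P2 instead.
spread = provider_loads(["P1", "P1", "P1", "P2"], demands)
```

Here `overloaded(all_same, capacities)` contains `"P1"`, whereas `overloaded(spread, capacities)` is empty. No client can tell the two outcomes apart in advance from its own limited view, which is why a coordination mechanism is needed.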
Therefore, an adaptive resource allocation mechanism for the coordination of the access
to multiple shared resources is required. The mechanism must work across different ad-
ministrative domains with little information about the environment. The key issue is the
coordination between the autonomous resource consumers competing for a set of shared
resources. This decentralised resource allocation problem is the central objective of this
work.
1.3 Research Requirements
The aim of this thesis is to develop a decentralised resource allocation framework for the on-
demand allocation of resources in open, large-scale and dynamic environments in real-time.
The resource allocation mechanism must work across separately administered domains,
under the consideration of resource heterogeneity, loss of absolute control over the resources,
and with limited and outdated information about the system. The challenges that
need to be addressed in the development of a decentralised framework are explained in more
detail as follows:
• R1: Decentralisation of control – Shared resources in an open environment are
owned and maintained by different organisations. The owners of resources do not give
up the control over their own resources to a central resource management authority. At
the same time, the autonomous, self-interested consumers prefer to retain control over
their resource allocation decisions rather than let a central facilitator decide allocations. As
a consequence, the control over the resource allocation decisions must be distributed
among the participants in the system.
• R2: Shared resources and unpredictability – Resources are shared among a
number of users. The lack of control over the resources and the other users sharing
the same resources may result in a high degree of fluctuation of the resource utilisation.
Therefore, a mechanism is required that enables the effective coordination among the
autonomous and self-interested users to assure a high system performance.
• R3: Limited information about the environment – The owners of resources may
have different policies about the provisioning of information about their resources as no
accepted standard exists. As a consequence, the information about different providers
in open environments can differ in type, level of detail and accuracy. All consumers
are competing for shared resources without knowing each other. They have no infor-
mation on how many others are competing for the same shared resources and what the
resource requirements of other consumers are. Thus, the resource allocation framework
must work across different administrative domains with very limited information that
can be heterogeneous, uncertain, or incomplete.
• R4: Dynamic environment – Open environments are subject to constant changes –
intended changes as well as unintended changes – leading to a highly dynamic system.
Intended changes are mainly caused by the change of interests of resource providers
or resource consumers over time. Resource providers may provide additional shared
resources or more types of resources to attract more customers. Others may also
reduce the amount of resources to reflect the change in demand. In fact, providers
can become available or unavailable without prior warning or notification. Unintended
changes can be caused by unscheduled maintenance work or faults in components
such as computer or network connection crashes. Therefore, the resource allocation
algorithm must be able to improve the resource allocations through experience in
stable environments, yet must be highly adaptive in order to maintain its efficiency
in the face of unexpected changes of a dynamic environment.
• R5: On-demand allocation of resources – The focus of the resource allocation framework
is the on-demand allocation of resources, without queuing
resource allocation requests for future execution. The challenge is to allocate as
many tasks in real-time as possible without over-utilising shared resources. The al-
gorithm must provide a good trade-off between the computational efficiency of the
algorithm and the computation of near-optimal resource allocations. The distribu-
tion of control enables a speed-up of the resource allocation computations due to the
parallel execution.
• R6: Geographical distribution of resources – Providers and consumers are ge-
ographically widely dispersed. This distribution leads to a wide variety of network
characteristics. Not only does latency scale with the distance, but also the cost of
network transfer needs to be taken into consideration when allocating resources for
distributed applications.
• R7: Communication overhead – The lack of information in open and distributed
environments is a common problem. Optimal resource allocations can only be computed
with complete information about all resource providers and resource consumers,
which causes a high communication overhead, in particular in distributed
systems under decentralised control [Modi et al., 2001]. The challenge is the develop-
ment of a resource allocation algorithm with little overhead with regard to the amount
of information needed for the resource allocation process.
On the other hand, it is desirable that the network as a whole shows optimised be-
haviour with regard to low overhead communication, short computation times and
Pareto-optimal resource allocation. The coordination concept should avoid the so-called
over-usage of shared resources known as the “Tragedy of the Commons” or
“free-riding behaviour” which can lead to the network’s collapse [Eymann et al., 2003].
In conclusion, these properties characterise an open and dynamic environment as a com-
plex and adaptive system, which is “a dynamic network of many agents acting in parallel,
constantly acting and reacting to what the other agents are doing. The control of a system
tends to be highly dispersed and decentralized. If there is to be any coherent behaviour in
the system, it has to arise from competition and cooperation among the agents themselves.
The overall behaviour of the system is the result of a huge number of decisions made every
moment by many individual agents” [Waldrop, 1992].
1.4 Research Contributions
This thesis studies the decentralised resource allocation problem in open and dynamic en-
vironments. In particular, it investigates the feasibility of building autonomous adaptive
agents that can allocate tasks to resource providers in an open and dynamic environment.
The key problem to address is the coordination of the access to the limited shared resources
between the autonomous resource consumer agents. Each consumer wants to use the shared
resources for the execution of its own tasks while avoiding the over-allocation of the shared
resources to prevent performance degradation.
Research Question 1: Decentralised Resource Allocation
Can the resource allocation problem be efficiently solved in open and dynamic environments
where the control over the resource allocation decisions is distributed among autonomous
and adaptive agents?
This question addresses the problem of the development of a decentralised resource alloca-
tion framework that can address the challenges of open and dynamic environments listed in
Section 1.3. We aspire to a solution that is technically feasible, can be practically applied and has good
economic characteristics for providers as well as consumers. These properties
are explained in more detail in the following:
• Technical feasibility: The resource allocation framework must work in open envi-
ronments across different administrative domains, with incomplete and not up-to-date
information about resources, and the loss of absolute control over the resources.
• Practical applicability: A practical resource allocation framework for open and
dynamic environments must have a good computational performance and a limited
communication overhead. It must efficiently allocate tasks that require different re-
sources among multiple providers. The algorithm must be highly adaptable to changes
in dynamic environments in terms of dynamic capacities and consumer populations.
• Economic characteristics: The algorithm must determine efficient resource allo-
cations and avoid the over-usage of the shared resources commonly known as the
“Tragedy of the Commons” [Hardin, 1968]. In addition, it is desirable that the allo-
cations are fair to consumers and providers.
Research Question 2: Effects of Information Models
What are the effects of different information models (Publish-Subscribe Information Model
versus Data-Pull Information Model) on the performance of the agent coordination under
decentralised control with regard to the efficiency of the decentralised resource allocations?
The amount of information that a decentralised decision-making procedure requires has a
big impact on its practical applicability. The more information the decision-making
process requires, the higher the communication overhead of the resource allocation
approach and the lower its practical applicability.
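The difference in communication overhead between the two information models can be sketched as follows. The class and method names are hypothetical, and the behaviour shown is an assumption based on the general meaning of the terms "publish-subscribe" and "data-pull", not the thesis's own definition of the models. The sketch counts one message for every utilisation value that leaves the provider.

```python
from typing import Callable, List


class Provider:
    """Hypothetical provider that exposes its utilisation in two ways."""

    def __init__(self, capacity: float) -> None:
        self.capacity = capacity
        self.load = 0.0
        self.messages_sent = 0  # messages carrying utilisation data
        self._subscribers: List[Callable[[float], None]] = []

    def utilisation(self) -> float:
        return self.load / self.capacity

    def pull(self) -> float:
        """Data-pull: a consumer asks and the provider answers once."""
        self.messages_sent += 1
        return self.utilisation()

    def subscribe(self, callback: Callable[[float], None]) -> None:
        """Publish-subscribe: a consumer registers to receive every update."""
        self._subscribers.append(callback)

    def set_load(self, load: float) -> None:
        """Change the load and push the new utilisation to all subscribers."""
        self.load = load
        for callback in self._subscribers:
            self.messages_sent += 1
            callback(self.utilisation())
```

Under these assumptions, the cost of the pull model grows with the number of queries issued by consumers, while the cost of the publish-subscribe model grows with the number of load changes multiplied by the number of subscribers.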
Research Question 3: Adaptability in Open and Dynamic Environments
Can a decentralised resource allocation mechanism adapt to changes in open and dynamic
environments?
A practical resource allocation mechanism for open environments must maintain its efficiency
in the face of unexpected changes in open and dynamic environments. Not only can
the supply of and demand for shared resources change over time, but computers can also crash
and need to be restarted, or network connections can fail.
Research Question 4: Empirical Evaluation
How can the decentralised resource allocation algorithm for open and dynamic environments
be evaluated by means of simulation?
The developed algorithm has to be evaluated with respect to its technical and economic
characteristics. Computer simulations were applied to evaluate the decentralised resource
allocation algorithm. The biggest challenge was caused by the lack of empirical data that
reflects the characteristics of open and dynamic environments. Furthermore, no simulation
framework for the decentralised allocation of resources in an open and dynamic environment
was readily available.
1.4.2 Scientific Contributions
The main contribution of this thesis is a decentralised resource allocation framework for open
and dynamic environments. The main purpose is the real-time allocation of resources for the
execution of repeated tasks that require small amounts of resources for short periods of time.
A learning algorithm is proposed that works across separately administered domains, under
the consideration of resource heterogeneity, lack of absolute control over the resources, and
with limited and dated information about the system. The basic mechanism of our solution
is inspired by the general framework of inductive reasoning under bounded rationality, first
proposed by Arthur [1994]. This framework targeted the coordination of agents for access
to a single shared resource in a static context given full information. The algorithm needs
to be modified and extended to apply it in dynamic environments, with multiple shared
resources and different resource types.
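To make the starting point concrete, the following is a minimal sketch of inductive reasoning for a single shared resource, in the spirit of the bar-attendance setting of Arthur [1994]. It is not the algorithm developed in this thesis. The predictor set, the error-smoothing factor of 0.9, the warm-up history and the default parameters are all illustrative assumptions. Each agent holds a few simple predictors of the next attendance, follows the one with the lowest running error, and uses the resource only if that predictor expects free capacity.

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[int]], float]


def mean_of_last(k: int) -> Predictor:
    """Predict the average attendance of the last k rounds."""
    def predict(history: Sequence[int]) -> float:
        window = history[-k:]
        return sum(window) / len(window)
    return predict


def mirror(n_agents: int) -> Predictor:
    """Predict the mirror image of the last attendance around n_agents / 2."""
    return lambda history: n_agents - history[-1]


def trend(history: Sequence[int]) -> float:
    """Extrapolate the change between the last two rounds."""
    return history[-1] + (history[-1] - history[-2])


class Agent:
    def __init__(self, predictors: List[Predictor]) -> None:
        self.predictors = predictors
        self.errors = [0.0] * len(predictors)  # running error per predictor

    def decide(self, history: Sequence[int], capacity: int) -> bool:
        """Use the resource if the currently best predictor expects free capacity."""
        best = min(range(len(self.predictors)), key=lambda i: self.errors[i])
        return self.predictors[best](history) < capacity

    def learn(self, history: Sequence[int], actual: int) -> None:
        """Update each predictor's running error with the observed attendance."""
        for i, predictor in enumerate(self.predictors):
            error = abs(predictor(history) - actual)
            self.errors[i] = 0.9 * self.errors[i] + 0.1 * error


def simulate(n_agents: int = 100, capacity: int = 60,
             rounds: int = 200, seed: int = 0) -> List[int]:
    """Return the attendance per round for agents that never communicate directly."""
    rng = random.Random(seed)
    pool = [mean_of_last(1), mean_of_last(2), mean_of_last(5),
            mean_of_last(10), mirror(n_agents), trend]
    agents = [Agent(rng.sample(pool, 3)) for _ in range(n_agents)]
    history = [rng.randint(0, n_agents) for _ in range(10)]  # random warm-up
    for _ in range(rounds):
        attendance = sum(agent.decide(history, capacity) for agent in agents)
        for agent in agents:
            agent.learn(history, attendance)
        history.append(attendance)
    return history[10:]
```

The only information an agent uses is the shared attendance history, so the sketch reflects the static, single-resource, full-information setting that the text identifies as the limitation to be overcome.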
The contributions of this thesis can be summarised as follows:
• A new resource allocation framework for open and dynamic environments
is proposed. The on-demand allocation of resources in such environments is not
well investigated in the current research literature. Most existing resource allocation
frameworks for distributed environments aim at the allocation, or advance reservation,
of resources for computationally expensive tasks. Existing resource allocation schemes
with decentralised control are suitable in this scenario, but they are mostly studied
in simplified mathematical models and often in the context of congestion games. The
resource allocation framework is provided with:
– A distributed multi-agent model that serves as a reference for any implementation
of the framework.
– A novel, fully decentralised algorithm for the allocation of resources in dynamic
environments.
– Two information models – the publish-subscribe and the data-pull model – to
gather information to make informed resource allocation decisions.
• A novel, fully decentralised algorithm for the on-demand allocation of re-
sources for the execution of repeated tasks is developed. The algorithm is
a distributed multi-agent learning algorithm based on the principles of inductive rea-
soning and bounded rationality. The algorithm efficiently coordinates the allocation
decisions of the consumers competing for shared resources. Agents employing this
algorithm are able to learn request behaviour that leads to a near-optimal utilisation of provider
resources without any a priori information. The highly adaptive resource allocation
strategy continuously incorporates new information about the dynamic environment
into its decision-making procedure. This enables the agents to maintain a high effi-
ciency of the resource allocations in the face of unexpected gradual changes in an open
and dynamic environment. The algorithm is computationally efficient and minimises
the communication overhead by making resource allocation decisions purely based on
local information.
• A comprehensive set of empirical results for the analysis and evaluation
of the algorithm performance. A range of empirical evaluations in static and
dynamic environments demonstrates the practical applicability of the resource allo-
cation framework and the ability of the agents adopting the algorithm to efficiently
learn near-optimal resource allocations.
1.5 Thesis Outline
This thesis is organised as follows:
Chapter 2 conducts a comprehensive survey of the research in the area of resource allocation
mechanisms and frameworks targeting open and dynamic environments. We characterise
the types of control of existing resource allocation approaches and review selected rep-
resentative approaches with regard to their suitability for open, large-scale and dynamic
environments.
Chapter 3 reviews the existing work in multi-agent systems and game theory in repeated
games that focus on the coordination among individuals under decentralised control. We
are particularly interested in the extent to which existing models and techniques can address the
challenges posed by the target environment, and in their shortcomings. We identify the areas
where further improvements are needed in order to satisfy the research requirements.
Chapter 4 sets up the frame for our work. We provide a formal model of an open and
dynamic environment that captures the relevant characteristics for the resource allocation
problem. We then define the general resource allocation process, introduce information
models and resource allocation strategies.
Chapter 5 develops a novel decentralised resource allocation algorithm for environments
where a resource type is provided by a single provider. This chapter presents a thorough
experimental evaluation that demonstrates the self-organising features of the algorithm. The
allocations are purely the results of the emergent behaviour of the individual decisions of
many consumers. We analyse the impact of the consumers’ available information, the belief
models they use and decision making strategies on the efficiency of the resource allocation
decision of the consumers.
Chapter 6 focuses on complex environments, which are open and dynamic environments with
multiple resource providers and where consumers have tasks that require the co-allocation
of multiple types of resources. We extend our decentralised resource allocation framework
by developing new resource allocation strategies that enable the consumers to cope with this
kind of environment. We demonstrate experimentally that the consumers can effectively
allocate their tasks onto the providers in open and dynamic environments. A comprehensive
set of empirical evidence in open and dynamic environments supports the main point of this
thesis: bounded rational agents can learn mutually beneficial resource allocation strategies
through self-organisation.
Chapter 7 concludes the thesis with a summary of the contributions and a brief discussion
of possible research directions for the future.
2 Background
This chapter gives an overview of recent developments in resource allocation and job scheduling
research for open environments. It discusses some of the important technological advances
that have led to the emergence of the World Wide Web, Service-oriented computing,
Grid computing, and more recently Cloud computing. It presents a simple classification
of resource allocation systems, followed by a brief survey of some representative resource
allocation mechanisms from different research directions.
Resource allocation is one of the fundamental and most important problems in computer
science and has been studied extensively in the literature. The study of resource allocation
and job scheduling has transitioned from single processor systems to multiprocessor systems
[Casavant and Kuhl, 1988], from offline problems to online problems [Stoesser et al., 2007],
from independent tasks to interacting tasks [Modi et al., 2001, Abdallah and Lesser, 2006],
from one-time tasks to repeated tasks [Lu, 2006] and closed systems to open, large-scale
and distributed systems [Nabrzyski et al., 2004].
In general, the resource allocation problem is a combinatorial optimisation problem, which
is proven to be NP-complete [Coffman, 1976]. Thus, any practical resource allocation algo-
rithm presents a trade-off between the computational complexity and quality of the alloca-
tions [Braun et al., 2001].
The growing interest in resource allocation mechanisms for open, large-scale and distributed
systems is a result of the emergence of new computing technologies that make it possible to build
such systems, e.g. Grid Computing [Foster and Kesselman, 1999, 2003], Peer-to-Peer sys-
tems [Buford et al., 2008], service-oriented architectures [Erl, 2005] and, more recently,
Cloud computing [Boss et al., 2007] and their needs for efficient and flexible resource allo-
cation mechanisms to utilise their resources on a large scale across different administrative
domains.
The important characteristics of open, large-scale and dynamic environments are the com-
putational and geographical distribution of consumers and providers, a dynamic system
architecture, the lack of coherent global knowledge and a dispersed ownership and control
over the computing resources. These create new challenges for resource allocation that
need to be addressed. The main problem is that in most cases, the entity that wants
to allocate resources in an open system does not own the resources (unlike a local resource
manager) and thus has no control over them, and the available information about
the resources is limited and often not up-to-date. Thus, the resource allocator must usually
make best-effort decisions given the limited information available at the time of the re-
source allocation request [Schopf, 2004]. The main challenge for the allocation of resources
in open environments is that other resource allocators can submit their own tasks to the
same provider at the same time or during the execution of the task. This can cause the
over-utilisation of the resources and consequently a degradation of the performance for
all simultaneously running tasks. Another problem is that a provider can decide at any
time to alter the amount of shared resources or not to offer any shared resources at all.
In contrast, a local resource manager is responsible for the management of the resources
of a single machine, or perhaps for a cluster of machines, where the manager has
full control of the resources and schedules all tasks. The local resource managers provide
information about the status of their resources to the global resource allocators, which use
this information to make resource allocation decisions on a large scale across different
administrative domains.
2.1 Classification of Resource Allocation
Many efforts have been undertaken to build taxonomies of resource allocation approaches for
distributed computing systems. They provide an overview over the vast amount of different
approaches proposed over many decades. For example, Casavant and Kuhl [1988] reviewed
resource allocation mechanisms for multi-processor machines. Rotithor [1994] proposed a
taxonomy of dynamic job scheduling algorithms with focus on scheduling and state esti-
mation techniques. Braun et al. [1998] presented a taxonomy for heterogeneous computing
environments. Krauter et al. [2002] built a detailed taxonomy of grid resource management
systems with the focus on resource management architectures for Grids, and more recently,
Yeo and Buyya [2006] categorised market-based resource allocation algorithms for utility-
driven cluster computing with emphasis on the users’ quality of service requirements.
The general resource allocation problem has been described a number of times and in a
number of different ways in the literature [Coffman, 1976, Dhall and Liu, 1978].
Definition 1: Resource Allocation Problem [Lu, 2006]
Given a set of tasks, each of them associated with a deadline and a priority, and a set
of computing resources, assign the tasks to the resources and schedule their execution to
optimise certain performance metrics.
Examples of such optimisation criteria are maximising the number of tasks that can
be executed without deadline violations, minimising the total completion time, and minimising
the number of resources needed to execute the tasks within their deadlines. To clarify the
components of a resource allocation mechanism that are involved in solving the resource
allocation problem, we use the following general definition.
Definition 2: Resource Allocation Mechanism [Casavant and Kuhl, 1988]
A resource allocation mechanism describes a procedure for the efficient and the effective
management of the access to and the use of a resource or a set of resources by its consumers.
The main components of the resource allocation mechanism are:
• Resource Provider(s) (RP)
• Resource Consumer(s) (RC)
• Resource Allocation Instance(s) or Resource Allocator(s) (RA)
The resource providers are the owners of the resources. The consumers want to access or
use the resources that are managed or controlled by one or more resource allocators. A
resource allocator is a logical entity that can be a separate resource allocation instance, or
this role can be played by the consumer or the provider.
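To make Definition 1 concrete, the following minimal Python sketch solves a small instance of the problem. It is an illustration only and is not taken from the cited literature: the `Task` fields, the assumption of identical resources and the greedy earliest-deadline-first rule are simplifying assumptions made for this example, and the metric considered is the number of tasks that finish without a deadline violation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    duration: float   # execution time on any resource (resources assumed identical)
    deadline: float   # latest acceptable completion time
    priority: int     # larger value = more important (assumed convention)


def allocate(tasks, n_resources):
    """Greedy earliest-deadline-first assignment to the earliest-free resource."""
    free_at = [0.0] * n_resources          # time at which each resource becomes idle
    schedule, missed = [], []
    # Earliest deadline first; ties are broken in favour of the higher priority.
    for task in sorted(tasks, key=lambda t: (t.deadline, -t.priority)):
        r = min(range(n_resources), key=lambda i: free_at[i])
        finish = free_at[r] + task.duration
        if finish <= task.deadline:
            free_at[r] = finish
            schedule.append((task.name, r, finish))
        else:
            missed.append(task.name)       # rejected: it would violate its deadline
    return schedule, missed


tasks = [
    Task("a", duration=2, deadline=2, priority=1),
    Task("b", duration=2, deadline=3, priority=2),
    Task("c", duration=2, deadline=4, priority=1),
    Task("d", duration=1, deadline=4, priority=3),
]
schedule, missed = allocate(tasks, n_resources=2)
print([name for name, _, _ in schedule], missed)   # ['a', 'b', 'd', 'c'] []
print(allocate(tasks, n_resources=1)[1])           # ['b', 'c']
```

With two resources all four tasks meet their deadlines, whereas with a single resource tasks b and c are rejected. A greedy rule of this kind is not guaranteed to be optimal for every instance.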
Table 2.1: Structural Aspects of the Resource Allocation Problem
Organisational Structure We focus on the organisational structure of the resource alloca-
tion mechanism, which determines the type of control that the algorithm uses. The three
types of control are centralised, distributed and decentralised control. These different types
of control are distinguished by the flow of information of the data acquisition, the location
of the decision making process and the enactment of decisions. Even though these terms
are widely used to categorise resource allocation approaches in the research literature, their
meaning is often ambiguous. We argue that a decentralised resource allocation mechanism
has significant advantages in terms of allocating resources in open and dynamic environ-
ments compared to other types of control. In order to do this, we briefly highlight their
characteristics and the associated advantages and disadvantages of each resource allocation
paradigm. The characteristics are summarised in Table 2.1.
• Centralised Resource Allocation – A system with a centralised resource allocation
scheme has a single resource allocator with full control over the data acquisition, the
decision making and the enactment of decisions. The data from all providers and all
consumers in the system is collected and stored by the central resource allocator, which
decides on the resource allocations, that is, which resources are allocated to which
consumers. Figure 2.1 shows the centralised resource allocation paradigm, where all
entities are controlled by the central instance.
Advantages: A central resource allocator simplifies the management, the deployment
and the maintenance of the resource allocation mechanism, in particular, in closed
systems with a single administrative domain. Each entity communicates with the
dedicated resource allocator which coordinates the use of the resources in the system.
An advantage of centralised resource allocation is that the resource allocator has global
knowledge of the entire state of the system and full control over all resource allocation
decisions. This allows the optimisation of certain system performance metrics such as
system throughput or the makespan.
Disadvantages: The central resource allocator is also the major drawback of this paradigm
and raises serious concerns as the system size grows. Not only is it a bottleneck
of the system and a single point of failure, but systems with a centralised resource
allocation scheme also experience state and information overload with increasing scale,
as all information needs to be gathered and processed at a single instance
[Eymann et al., 2007]. In addition, the realisation of central control
is often challenging in open environments, where the resources are under different
ownership, because it is difficult to incorporate the different resource allocation policies of
providers and consumers. Hence, central resource allocation schemes are not suitable
for the allocation of tasks to resources in open, large-scale environments across
different administrative domains.
• Distributed Resource Allocation – A system with distributed resource allocation
consists of a collection of independent consumers and providers, each with partial con-
trol over the data acquisition and the enactment of decisions. However, the resource
allocation decision making is supervised or remote-controlled by a single, dedicated
resource allocator, coordinator or facilitator instance. Figure 2.2 illustrates the distributed
resource allocation paradigm. An example is a market-based mechanism,
where the market is the central instance that determines the consumers with the
highest bids who are granted access to the resources.
Figure 2.2: Distributed Resource Allocation Paradigm
Advantages: Distributed approaches typically strike a trade-off between the advantages
and disadvantages of the centralised and decentralised resource allocation paradigms.
Many tackle a specific problem of centralised approaches such as the scalability or
fault-tolerance of the system, often at the cost of another property. For example, an
approach that increases the scalability of the system does not allow the optimisation of
certain performance metrics such as system throughput or the makespan.
Disadvantages: Distributed resource allocation schemes do not give providers and consumers
autonomy over resource allocation decisions, and they usually cannot accommodate
individual resource allocation policies for the providers or the consumers.
• Decentralised Resource Allocation – The control of a resource allocation mechanism
is decentralised when the authority, responsibility and control over the data
acquisition, the decision making and the enactment of the decisions are functionally
and often geographically distributed over several resource allocation entities. Often,
the role of the resource allocator is played by the consumers or the providers. There
is no single resource allocator that is more important than other entities and no one
entity is capable of allocating all resources, so that no entity becomes a bottleneck for
the system. Figure 2.3 illustrates the decentralised resource allocation paradigm.
Advantages: Decentralised resource allocation can provide several advantages, especially
in systems with a large number of geographically dispersed components.
First, the single point of failure and the bottleneck that central control introduces
are removed, which increases the robustness and scalability of the
system. Second, the distribution of control gives the participants not only the autonomy
to choose among alternative resource allocations, but also the ability to implement their
own resource allocation policies, as opposed to centralised and distributed solutions.
Applying decentralised control enables building more robust, scalable, adaptive, fault-tolerant
and self-organising systems since there is no critical reliance on any specific
entity in the system [Eymann et al., 2005]. Third, the costs of using and maintaining
a decentralised resource allocation system would be reduced and spread over a
number of individuals, compared to a centralised coordinator instance. In addition,
there are business and economic reasons for distributing control in large systems. The
resources are owned and maintained by different organisations that do not willingly
hand over the control over their resources to a central resource allocation authority.
These advantages are particularly important in open and dynamic environments.
Disadvantages: Decentralised control introduces several problems of its own. The
effectiveness of the resulting decentralised system depends on the level of coordination
and cooperation among the participants [Akram et al., 2005]. Each resource allocator
must typically make best-effort resource allocation decisions given the limited infor-
mation that is available. This can cause the system to get caught in situations such
as the Tragedy of the Commons [Hardin, 1968] that can result in inefficient resource
allocations.
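The contrast between a central allocator with global knowledge and independent allocators that act on limited information can be illustrated with a small Python sketch. It is a deliberately simplified toy model and not a mechanism proposed in this thesis: the function names, the unit-size tasks and the assumption that all independent allocators see the same outdated load snapshot and deterministically pick the apparently least-loaded provider are all hypothetical.

```python
def central_allocation(n_tasks, loads):
    """Central allocator: sees the true load and updates it after every assignment."""
    loads = list(loads)
    for _ in range(n_tasks):
        target = min(range(len(loads)), key=lambda i: loads[i])
        loads[target] += 1
    return loads


def uncoordinated_allocation(n_tasks, snapshot):
    """Independent allocators: all decide at once from the same stale snapshot."""
    loads = list(snapshot)
    # Every allocator sees the same outdated view, so each one picks the same provider.
    target = min(range(len(snapshot)), key=lambda i: snapshot[i])
    for _ in range(n_tasks):
        loads[target] += 1
    return loads


initial = [2, 0, 1]                          # tasks already running on three providers
print(central_allocation(6, initial))        # [3, 3, 3]
print(uncoordinated_allocation(6, initial))  # [2, 6, 1]
```

In this toy model the central allocator balances the load perfectly, while the uncoordinated allocators all select the provider that appeared idle and thereby over-utilise it.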
between cooperative and non-cooperative mechanisms.
• Cooperative resource allocation mechanisms involve explicit coordination among resource
allocators through communication. Typically, the providers coordinate the
efficient use of their resources. This coordination requires a significant amount of
communication in the system, beyond that of the centralised resource allocation paradigm,
which can be seen as its major drawback [Raja and Lesser, 2004].
• Non-cooperative resource allocation mechanisms, on the other hand, operate without
any explicit coordination among the entities in the system. Recent research has recog-
nised the value of decentralised resource allocation with minimal or no communication
between the resource allocators [Wolpert, 2003, Eymann et al., 2007].
2.2 Survey
This section provides a survey of recent advances in the area of resource allocation ap-
proaches for distributed computing systems, and evaluates them with respect to the re-
quirements of an open and dynamic environment outlined in Chapter 1. This section dis-
cusses background technologies that our work builds upon. We discuss different resource
allocation frameworks and decentralised decision making technologies to give an indication
of their current state of the art and to provide target applications for our research.
2.2.1 Web Applications
Web applications are probably the most common type of applications in an open and dis-
tributed computing environment (cf. Section 1.2.1). Web applications are often provided
from a cluster of computers [Kopparapu, 2002], which is owned and managed by a single
organisation. Therefore, centralised resource allocation mechanisms often manage the clus-
ter of resources and present the cluster as a single unified resource to its consumers. A
common resource allocation mechanism for providing Internet applications with a cluster is
load balancing [Bourke, 2001]. The central resource broker controls the allocation of user
requests to resources. It tries to be fair to all providers by balancing the system load equally
2 Background
among them. Different techniques [Kannan et al., 2001] such as round-robin, least connec-
tions and URL hashing have been proposed to achieve this goal. These techniques work
best in environments with homogeneous resources and equal resource demands. They allow
an increase of the total capacity in order to serve a large number of user requests. This
can limit the problem of resource over-utilisation caused by a high demand for resources.
However, this advantage comes at a very high price, since provisioning a capacity
above the theoretical limit of the user demand is very costly. The value gained often
does not justify the cost of providing, operating and maintaining such a high capacity,
because most of the time only a very small fraction of the resources is utilised. The high
cost of serving the peak demand is the major disadvantage of this solution, besides the
drawback of central control.
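The three load balancing techniques named above can be sketched in a few lines of Python. The sketch is illustrative only and does not reproduce any particular product: the class names, the `pick` and `release` methods and the use of SHA-256 for URL hashing are assumptions made for this example.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._next = cycle(servers)

    def pick(self, request_url=None):
        return next(self._next)


class LeastConnections:
    """Picks the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self, request_url=None):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1          # called when a connection is closed


class UrlHash:
    """Maps the same URL to the same server, independent of load."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, request_url):
        digest = hashlib.sha256(request_url.encode("utf-8")).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]


servers = ["s1", "s2", "s3"]
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])      # ['s1', 's2', 's3', 's1']
```

None of the three policies takes the capacity of a server or the size of a request into account, which is consistent with the observation that they work best with homogeneous resources and equal resource demands.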
Recent research in the area of providing web applications focuses on the dynamic placement
of web applications on a computer cluster. The goals are the maximisation of the total satisfied
application demand across different web applications that utilise the same resources,
the minimisation of the number of application placement decisions, and the balancing of the load
across machines [Karve et al., 2006, Tang et al., 2007]. This concept of centrally managed
resources has emerged as a new computing paradigm that is known as cloud computing
[Boss et al., 2007, Velte et al., 2009].
Other examples of existing cluster resource management systems with a central controller
include Utopia Load Sharing Facility [Zhou et al., 1993], Condor [Frey et al., 2002], IBM
LoadLeveler [Kannan et al., 2001], Load Sharing Facility (LSF) [Computing, 2009], and
Sun’s Grid Engine (SGE) [Microsystems, 2009]. The focus of all approaches lies on the
optimisation of the overall cluster performance given full control over the resources and
global information.
On the other hand, client-side resource management solutions that follow a decentralised
approach are more robust and less costly. For example, Zhu [2007] proposed the idea of
a decentralised load balancing mechanism for the provisioning of web applications. When
each client randomly selects the server it connects to, the loads should be distributed evenly
among servers given equal demand for each client request and homogeneous providers. The
advantage of this solution is the ability of each client to handle the failover of an application
server gracefully. The client can fail over to another server when the chosen
server does not respond within a preset period of time. The application server connection
seamlessly fails over to another server. The actual provider that executes the task is trans-
parent to the user. This solution has no central broker that allocates the tasks to resources.
In addition, application servers can be geographically distributed since each client, rather
than a central broker, selects the provider. The locations of the providers are unrestricted,
thus simplifying the creation of a content delivery network by deploying servers closer to
end users, to improve the application performance and scalability [Buyya et al., 2008]. This
idea of the decentralisation of the control in the system can provide advantages compared to
a centralised solution. However, the basic assumptions of equal demand and homogeneous
providers cannot be sustained in open and dynamic environments.
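The client-side idea described above, random server selection combined with failover after a timeout, can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation of Zhu [2007]: the `call` function, the `ServerTimeout` exception and the default timeout value are hypothetical placeholders for whatever transport a real client would use.

```python
import random


class ServerTimeout(Exception):
    """Raised by the hypothetical `call` function when a server does not answer in time."""


def invoke_with_failover(servers, call, timeout_s=2.0, rng=random):
    """Try the servers in random order until one answers within `timeout_s`."""
    candidates = list(servers)
    rng.shuffle(candidates)               # random choice spreads load without a broker
    for server in candidates:
        try:
            return server, call(server, timeout_s)
        except ServerTimeout:
            continue                      # fail over to the next randomly ordered server
    raise RuntimeError("no server responded within the timeout")


def fake_call(server, timeout_s):
    # Stand-in for a real request; servers whose name starts with "down" never answer.
    if server.startswith("down"):
        raise ServerTimeout(server)
    return "ok:" + server


print(invoke_with_failover(["down1", "up", "down2"], fake_call))   # ('up', 'ok:up')
```

Because every client shuffles the server list independently, no central broker is needed. As noted above, however, the load is only spread evenly under the assumptions of equal demand and homogeneous providers.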
Overall, most resource allocation approaches for web applications target self-contained
systems with a single administrative domain (e.g. cluster computing) where complete and
up-to-date information about the status of all resources in the system is assumed. Together
with the knowledge about the demand of the users, these resource allocation approaches
are able to achieve a good performance in self-contained systems. However, they are not
designed to face the challenges of open and dynamic environments, where a central con-
troller cannot be established across different administrative domains to allocate tasks to
heterogeneous providers that can be added or removed at any time.
2.2.2 Service-Oriented Computing
As stated in Section 1.2.2, service-oriented computing (SOC) is an emerging computing
paradigm for offering and consuming resources and functionality over a distributed computer
system. Even though the idea of sharing resources and offering services between computers
has been considered in the past, most early distributed systems were built for a special
purpose. Those systems usually relied on static links between components and used ad-hoc
mechanisms to interoperate [Singh and Huhns, 2005].
Manual Service Provisioning The SOC approach addresses these shortcomings by pro-
viding mechanisms for the automatic discovery and invocation of services at runtime. Web
services have become a popular technology for enabling the service-oriented computing par-
adigm by providing common protocols and data formats for service consumers and providers
to communicate over the Internet. Web services are often seen as a special type of web application
where the application providers offer the resources needed to satisfy all client
requests. Thus, the majority of current service-oriented applications do not select services
dynamically at runtime, as is envisaged by the research literature in service-oriented computing
[Zimmermann et al., 2005]. Instead, most development environments (e.g. IBM’s WebSphere
Integration Developer1) provide tools for human programmers to define the service-oriented
application and to manually bind the services to concrete Web service instances at design
time [Stein et al., 2009]. In summary, most current technologies and software tools for the
invocation of services in an open and dynamic environment rely on human effort.
Automatic Service Provisioning To overcome the problems of manually specified SOC applications,
some research has suggested a dynamic service provisioning approach, that is, the
selection of particular service instances for specific tasks [Sirin et al., 2005]. Most proposed
service provisioning approaches focus on advance service level agreements based on some
form of financial remuneration to providers for their services. For example, constraint-based
service provisioning techniques use decision rules (e.g. reliability > 0.95 or response time
< 0.5s) to filter appropriate services and select among the services that can guarantee to
satisfy the constraints. QoS optimisation techniques improve these selection techniques by
considering preferences for different QoS characteristics, which are also assumed to be guar-
anteed by the resource provider in a service level agreement before the provisioning of the
service. Decentralised solutions of this distributed optimisation problem using distributed
constraint satisfaction techniques can be found in [Frei and Faltings, 1999, Modi et al.,
2001]. Distributed resource allocation based on advance service level agreements is an
important problem; however, it is not within the scope of this thesis.
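The two selection steps described above, filtering by decision rules and then ranking by QoS preferences, can be sketched as follows. The sketch is an assumption-laden illustration and not one of the cited techniques: the `Offer` type, the conjunction of the two example rules and the linear weighted score are hypothetical choices, and only the thresholds 0.95 and 0.5 s come from the example in the text.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str
    reliability: float      # advertised probability of successful execution
    response_time: float    # advertised response time in seconds


def satisfies_constraints(offer):
    # Decision rules with the example thresholds from the text (combined with "and" here).
    return offer.reliability > 0.95 and offer.response_time < 0.5


def select(offers, w_reliability=0.5, w_response_time=0.5):
    """Filter by hard constraints, then rank by a weighted QoS preference score."""
    feasible = [o for o in offers if satisfies_constraints(o)]
    if not feasible:
        return None

    def score(o):
        # Higher reliability is better; lower response time is better.
        return w_reliability * o.reliability - w_response_time * o.response_time

    return max(feasible, key=score)


offers = [
    Offer("p1", reliability=0.99, response_time=0.40),
    Offer("p2", reliability=0.96, response_time=0.10),
    Offer("p3", reliability=0.90, response_time=0.05),   # fails the reliability rule
]
print(select(offers).provider)             # p2
print(select(offers, 1.0, 0.0).provider)   # p1
```

Both steps rely on the advertised values being guaranteed by the provider, which is exactly the role that a service level agreement plays in these approaches.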
Stein [2008] proposed a flexible service provisioning approach for complex workflows in un-
certain and unreliable environments. The proposed agent-based approach uses probabilistic
performance information about providers to reason about service uncertainty in order to de-
termine the optimal number of services that must be provisioned in parallel to meet budget
and deadline constraints.
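The intuition behind provisioning several services in parallel can be illustrated with a simple calculation. The following sketch is not the algorithm of Stein [2008]; it assumes, purely for illustration, that each provider independently completes the task before the deadline with the same probability `p_success`, and it ignores budget constraints. Under that assumption, at least one of n parallel providers succeeds with probability 1 - (1 - p_success)^n.

```python
def providers_needed(p_success, target):
    """Smallest n such that 1 - (1 - p_success)**n >= target (independence assumed)."""
    if not 0.0 < p_success <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < p_success <= 1 and 0 <= target < 1")
    n, p_all_fail = 1, 1.0 - p_success
    while 1.0 - p_all_fail < target:
        n += 1
        p_all_fail *= 1.0 - p_success     # one more provider must fail as well
    return n


print(providers_needed(0.7, 0.95))   # 3
print(providers_needed(0.5, 0.99))   # 7
```

In this simplified model, three providers that each succeed with probability 0.7 are enough to reach a 95% chance that at least one of them meets the deadline. Each additional provider adds cost, which is why a budget constraint bounds n from above in practice.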
None of these works addressed the problem of the on-demand provisioning of services in
open environments without the establishment of service level agreements in advance or the
guarantee of certain QoS constraints (e.g. resource availability) by the service providers.
This thesis addresses the problem of allocating resources for the provisioning of a service
on demand without prior service level agreements between resource consumers and resource
providers.
1http://www-01.ibm.com/software/integration/wid/
2.2.3 Grid Computing
Since the mid-1990s, the vision of Grid computing has been to provide a distributed computing
infrastructure for sharing heterogeneous computing resources (services, data or processing
cycles) across a wide-area network [Foster and Kesselman, 2004]. More specifically, the
vision is to allow companies and research institutes to collaborate on solving complex com-
puting problems faster with a large number of computers that are geographically distributed
across different administrative domains [Foster and Kesselman, 1999, 2003, Berman et al.,
2003].
What distinguishes the Grid paradigm from other existing distributed computing paradigms is
that users can participate in this environment by using idle system resources or sharing their
own resources or parts of them without giving away the control over their resources. This
vision to provide access to heterogeneous resource providers is central to Grid computing and
defines the field, rather than a particular implementation. However, several Grid toolkits
have emerged [Baker et al., 2002], which provide the basic infrastructure for accessing
heterogeneous resources across different administrative domains. For example, the Globus
Toolkit 42 [Foster and Kesselman, 1997] is a popular Gr