A Broker for Cost-efficient QoS-aware Resource Allocation in EC2


Thesis submitted on 30 May 2011 in fulfilment of the requirements for the degree of Master of Science, at the Faculty of Science of the University of Antwerp.

Promoters: Prof. Dr. Jan Broeckhove, Dr. Kurt Vanmechelen

K. Vermeersch

Research Group Computational Modelling and Programming

Contents

List of Figures

List of Tables

Preface

Abstract

1 INTRODUCTION
1.1 What is Cloud Computing
1.1.1 Definition
1.1.2 The Cloud (R)Evolution
1.1.3 Service Models
1.1.4 Characteristics
1.1.5 Challenges
1.2 Amazon’s Cloud Computing Offering
1.2.1 Introduction
1.2.2 AWS Product Portfolio
1.2.3 Instance Types
1.2.4 Instance Pricing
1.3 Motivation for a EC2 Broker

2 ENVIRONMENTAL ANALYSIS
2.1 Introduction
2.2 Price Evolution of the On-Demand and Reserved Instances
2.3 Price Evolution Spot
2.3.1 Introduction
2.3.2 Working Method
2.3.3 Price Analysis
2.3.4 SpotWatch


2.4 Pricing Comparison of the Different Regions
2.4.1 On-Demand
2.4.2 Reserved
2.4.3 Spot
2.4.4 Comparison of the Pricing Models
2.4.5 Data storage and transfer
2.5 Conclusion

3 RESOURCE SCHEDULING
3.1 Introduction
3.2 Optimal Division between Pricing Models
3.2.1 Reserved vs On-Demand Instances
3.2.2 Spot Instances
3.3 Workload Models
3.3.1 Workload Models
3.3.2 Workload Constraints
3.4 Workload Scheduling
3.4.1 Workload Model 1 (total VM hours needed is specified)
3.4.2 Workload Model 2 (every hour #VMs needed is specified)
3.5 Spot Decision Model
3.5.1 Checkpointing
3.5.2 SpotModel
3.5.3 Implementation Changes
3.6 Conclusion

4 BROKER DESIGN
4.1 Introduction
4.2 Broker Input
4.2.1 Task Generation and Specification
4.2.2 Price Gathering and Analysis
4.3 Broker Scheduling
4.3.1 Region Allocation
4.3.2 Workload Distribution
4.4 Resource Allocation
4.4.1 Reserved Model
4.4.2 Spot Model
4.5 Broker Output
4.5.1 Graphical Representation
4.5.2 Textual Representation
4.5.3 Detailed Cost Overview
4.6 Conclusion


5 BROKER EVALUATION
5.1 Introduction
5.2 Cost Evaluation
5.2.1 Workload Model 1
5.2.2 Workload Model 2
5.3 Scalability Evaluation
5.3.1 Workload Model 1
5.3.2 Workload Model 2
5.4 Conclusion

6 CONCLUSION
6.1 Conclusions and Contributions
6.2 Future Work

Appendices

Appendix A: Size Estimation EC2 Regions
A.1 Introduction
A.2 Size IP Ranges in Different Regions

Appendix B: On-Demand and Reserved Price Evolution
B.1 Introduction
B.2 On-Demand and Reserved Price Evolution

Appendix C: On-Demand/Reserved Price Versus Hardware Costs
C.1 Introduction
C.2 Hardware Evolution
C.3 Price Evolution
C.4 Conclusion

Appendix D: Basic Economics
D.1 Supply and Demand

Appendix E: Spot Price Analysis
E.1 Introduction
E.2 Box Plot Introduction
E.3 Statistical Terms
E.4 Outlier Detection

Appendix F: On-Demand versus Reserved Instances
F.1 Introduction
F.2 Tipping Points (3-year period)


Appendix G: Developed Software
G.1 Introduction
G.2 Environmental Analysis Tools
G.3 Broker Prototype


List of Figures

2.1 Screenshot of CloudExchange.org
2.2 Example of Spot Prices exceeding the On-Demand Prices in 2010 (Standard Small Linux instance in the US-East region)
2.3 Average Spot Price per Day (High-Memory Double Extra Large Linux instance in the US-East region)
2.4 Boxplots Spot Price per Day (High-Memory Double Extra Large Linux instance in the US-East region)
2.5 Average Spot Price per Week (High-Memory Double Extra Large Linux instance in the US-East region)
2.6 Boxplots Spot Price per Week (High-Memory Double Extra Large Linux instance in the US-East region)
2.7 Boxplots Spot Price per Week between December 2009 and January 2011 (Standard Large Linux instance in the US-East region)
2.8 Boxplots Spot Price per Week during the Christmas period between December 1st 2010 and January 10th 2011 (High-Memory Double Extra Large Linux instance in the US-East region)

2.9 Boxplots Spot Price per Day of the Week (High-Memory Double Extra Large Linux instance in the US-East region)
2.10 Average Spot Price per Hour of the Day (High-Memory Double Extra Large Linux instance in the US-East region)
2.11 Boxplots Spot Price per Hour of the Day (High-Memory Double Extra Large Linux instance in the US-East region)
2.12 Spot Price Difference between Day and Night (High-Memory Double Extra Large Linux instance in the US-East region)
2.13 Boxplots of the spot prices between December 18th 2010 and February 13th 2011 (High-Memory Double Extra Large Linux instance in the US-East region)
2.14 Average Spot Price Evolution (High-CPU Extra Large Instance in the US-East Region)


2.15 Boxplots Spot Price Evolution (High-CPU Extra Large Instance in the US-East Region)
2.16 Screenshot of SpotWatch.eu
2.17 Average Spot Price for Standard Large Linux instance in the US-East Region between December 14th 2009 and February 14th 2010
2.18 Instance Pricing Comparison US-East Region
2.19 Instance Pricing Relative Comparison US-East Region
2.20 Instance Pricing Relative Comparison EU-West Region

3.1 Traditional Data center versus AWS Resource Provisioning
3.2 Example Workload
3.3 Division overview Reserved vs On-Demand
3.4 Workload Model 1 Specification
3.5 Workload Model 2 Specification
3.6 Workload Description File
3.7 Scheduling Total Workload (basic)
3.8 Price Schedule Total Workload in US Dollars (basic)
3.9 Scheduling Total Workload (optimized)
3.10 Price Schedule Total Workload in US Dollars (optimized)
3.11 Scheduling Workload Per Hour (basic)
3.12 Price Schedule Workload Per Hour in US Dollars (basic)
3.13 Example Workload Model 2 Scheduling
3.14 Combination formula for the number of possible combinations of x objects from a set of y objects
3.15 Scheduling Workload Per Hour (optimized)
3.16 Price Schedule Workload Per Hour in US Dollars (optimized)
3.17 Decision Model [1]: variables
3.18 Decision Model [1]: practical meaning of variables
3.19 Decision Model [1]: workflow graph
3.20 Decision Model [1]: execution time - bid price - confidence level graph
3.21 Workload Base Level Reserved Instances

4.1 Broker Task Components Overview
4.2 Broker Input Task Description Different Workload Models
4.3 Broker Input Task Design
4.4 Broker Input PriceWatch Design
4.5 Broker Scheduling Component
4.6 Broker Region Scheduling Not-Spot-Enabled Tasks: Standard Small Linux Instances
4.7 Broker Region Scheduling Not-Spot-Enabled Tasks: Standard Small Windows Instances
4.8 Broker Region Scheduling Spot-Enabled Tasks: Standard Small Instances
4.9 Broker Allocation Component
4.10 Broker Output GUI Zooming Capability


4.11 Broker Output GUI Details SubTask
4.12 Broker Output GUI Cost Overview
4.13 Broker Design Overview

5.1 Snippet of the Benchmark Results Output File
5.2 Snippet of the Benchmark Tasks Output File
5.3 Box plots total brokering time Basic Only On-Demand versus Spot-Enabled Scheduling and Allocation (US-East region, Standard Small Linux Instance) [workload model 1]
5.4 Box plots total brokering time Basic Only On-Demand versus Spot-Enabled Scheduling and Allocation (US-East region, Standard Small Linux Instance) [workload model 2]

6.1 Broker Design Overview

C.1 CPU Procurement Cost Evolution for Intel Xeon E5430 (per unit)
C.2 CPU Procurement Cost Evolution for Intel Xeon E5507 (per unit)

D.1 Supply and Demand Principle

E.1 Schematic explanation of Boxplot
E.2 Positive vs Negative Skewness


List of Tables

1.1 Properties of Standard Instances
1.2 Properties of Micro Instances
1.3 Properties of High-Memory Instances
1.4 Properties of High-CPU Instances
1.5 Properties of Cluster Instances
1.6 On-Demand Pricing in the US-East Region
1.7 Reserved Pricing in the US-East Region
1.8 Dedicated On-Demand Pricing in the US-East Region
1.9 Dedicated Reserved Pricing in the US-East Region
1.10 Availability of the Instance Types in the different Regions

2.1 Evolution of On-Demand Pricing in the US-East region
2.2 Evolution of the Reserved Pricing in the US-East region
2.3 Comparison Average Spot Prices (High-Memory Double Extra Large Linux instance in the US-East region)
2.4 Region Comparison On-Demand Pricing of Linux Instances
2.5 Region Comparison On-Demand Pricing of Windows Instances
2.6 Region Comparison Reserved Pricing of Linux Instances
2.7 Region Comparison Reserved Pricing of Windows Instances
2.8 Region Comparison Average Spot Pricing of Linux Instances
2.9 Region Comparison Average Spot Pricing of Windows Instances
2.10 Data Storage Pricing
2.11 Data Transfer Pricing

3.1 Percentage of time a certain number of instances is required by the example workload
3.2 On-Demand versus Reserved Linux instances 1-Year Overview
3.3 On-Demand versus Reserved Windows instances 1-Year Overview
3.4 Tipping Point taking one more Reserved (versus On-Demand) instance is better


3.5 Spot Prices in Percentage of the On-Demand Prices
3.6 Reserved prices become cheaper than spot prices after the stated amount of days

5.1 Average price reduction from a set of brokering options to a set of different brokering options [workload model 1]
5.2 Average price reduction from a set of brokering options to a set of different brokering options [workload model 2]
5.3 Average time distribution (in percentage of total brokering time) of the different brokering phases [workload model 1]
5.4 Average time increase (in seconds) when a task is added to the workload presented to the broker prototype (US-East region, Standard Small Linux Instance) [workload model 1]
5.5 Average time distribution (in percentage of total brokering time) of the different brokering phases [workload model 2]
5.6 Average time increase (in seconds) when a task is added to the workload presented to the broker prototype (US-East region, Standard Small Linux Instance) [workload model 2]

A.1 Public IP Ranges of the EC2 Regions

C.1 Microprocessors used by EC2 Instances at Introduction
C.2 Microprocessors used by EC2 Instances according to Test Program
C.3 Extra Information about the Microprocessors used by EC2 Instances

F.1 Linux 3-Year Overview
F.2 Windows 3-Year Overview


Preface

The subject of this thesis, designing a broker for intelligent (cost-efficient and QoS-aware) cloud resource allocation, was presented to me during my first year of the CS Master's programme at the University of Antwerp. I couldn’t wait to take on the challenge. My choice of Computer Science major, namely Distributed Systems and Computer Networks, reflects my particular affinity with cloud computing.

This thesis was written using a strongly data-analysis-driven and prototype-driven working method. Regular meetings with my mentors kept me on the right track, while keeping a blog¹ up to date provided me with valuable feedback from the industry and obliged me to start the writing process early. The first term (fall 2010) focused on the research aspects, such as the price analysis and the region comparison. The second term (spring 2011) was spent finishing the broker prototype and writing this document.

First of all, I’d like to thank my promoter Prof. Dr. Jan Broeckhove, co-promoter Dr. Kurt Vanmechelen and mentor Ruben Van den Bossche. They are all members of the research group Computational Modeling and Programming (CoMP) at the University of Antwerp. Without the continuous support of Kurt and Ruben, my work would not have been as valuable; it was a very pleasant experience to work with them. I’m grateful for the opportunity I got and for the valuable skills I could develop while working on this thesis. Of course, I would not have been able to obtain my computer science degree without the support of my parents and my fellow students, who became great friends during the course of my college experience.

Finally, I would like to thank the providers of cloud applications such as Dropbox and Google Docs, which helped me during the writing process. Most importantly, however, I want to thank the people behind EC2 at Amazon; all their tools and products are easy to use and well documented. I hope the information in this thesis will be as valuable to you as the experience of gathering it was to me.

¹ The thesis blog can be found at http://www.thesis.kurtvermeersch.com/.


Abstract

Cloud computing is a model for providing on-demand, dynamically scalable computation and storage resources to end users. Amazon’s Elastic Compute Cloud (EC2) offering allows consumers to allocate, in an on-demand fashion, virtual machine instances in Amazon’s data centers that run a user-defined operating system and application software stack. That is, a user can allocate or deallocate virtual machine instances at any time according to their current workload demand.

In terms of pricing, a user has the choice between the following models:

1. On-Demand Instances are priced at a fixed hourly rate. Once launched, these instances are guaranteed to be kept running as long as the user pays for them. However, a user has no full guarantee of being able to launch on-demand instances.

2. Reserved Instances require an upfront payment per instance for a one- or three-year period, which is then supplemented with usage-based charges at a fixed hourly price. These instances come with guaranteed availability for the user.

3. Spot Instances have prices that vary hourly. They can be shut down by EC2 if the consumer’s bid no longer exceeds the spot market price, so there are no guarantees for the user.
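The out-bid rule for spot instances can be made concrete with a small simulation. This is a minimal sketch, not the broker developed in this thesis: the function name `run_spot_instance` and the price trace are hypothetical, and it assumes that every hour the instance survives is billed at that hour's spot market price rather than at the bid.

```python
from typing import List, Tuple


def run_spot_instance(bid: float, hourly_spot_prices: List[float]) -> Tuple[int, float]:
    """Simulate one spot instance hour by hour.

    The instance keeps running while the bid exceeds the current spot price;
    in the first hour where the bid no longer exceeds the price it is shut
    down. Each hour the instance survives is charged at that hour's spot
    price (an assumption of this sketch), not at the bid.
    Returns (hours_run, total_cost_in_usd).
    """
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if bid <= price:  # bid no longer exceeds the market price: out-bid
            break
        hours_run += 1
        total_cost += price  # pay the market price, not the bid
    return hours_run, total_cost


# Hypothetical spot price trace in USD per hour (illustrative, not EC2 data).
trace = [0.030, 0.031, 0.029, 0.045, 0.030]
hours, cost = run_spot_instance(bid=0.040, hourly_spot_prices=trace)
print(hours, round(cost, 3))  # -> 3 0.09
```

Although the price drops back below the bid in the last hour, the simulated instance is already gone; this missing guarantee is what has to be weighed against the low average spot price.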

At present, consumers do not have any tools to optimally map their workload and QoS requirements (such as the deadline by which a workload needs to finish) onto these different pricing plans. Nevertheless, the potential for cost reductions through intelligent instance allocation is considerable; spot prices, for example, are on average much lower than on-demand prices. In this thesis we devise heuristics that can be used by a brokering component to realize this optimization goal. This involves an analysis of the consumer’s workload characteristics, QoS requirements and the volatility of the EC2 spot market to define an optimal portfolio of instance allocations across the three pricing plans.
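As a first intuition for such heuristics, the choice between a reserved and an on-demand instance reduces to a tipping point, in the spirit of those listed in Table 3.4 and Appendix F. With upfront fee U, on-demand rate p_od and reserved hourly rate p_res, a reservation pays off once the usage h per term exceeds h* = U / (p_od - p_res). The sketch below uses hypothetical round-number prices (not the EC2 prices analysed in this thesis), and the helper names are illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours in a one-year reservation term


def on_demand_cost(hours: float, od_rate: float) -> float:
    """Cost of running on-demand: pure hourly charging."""
    return hours * od_rate


def reserved_cost(hours: float, upfront: float, res_rate: float) -> float:
    """Cost of a reserved instance: upfront fee plus a fixed hourly rate."""
    return upfront + hours * res_rate


def tipping_point_hours(upfront: float, od_rate: float, res_rate: float) -> float:
    """Usage (hours per term) above which the reserved instance is cheaper."""
    if od_rate <= res_rate:
        return math.inf  # the reservation can never pay off
    return upfront / (od_rate - res_rate)


# Hypothetical prices in USD, chosen for round numbers (not EC2's actual prices).
OD_RATE, UPFRONT, RES_RATE = 0.10, 300.0, 0.04
h_star = tipping_point_hours(UPFRONT, OD_RATE, RES_RATE)
print(round(h_star), f"{h_star / HOURS_PER_YEAR:.0%}")  # -> 5000 57%
```

With these numbers, an instance expected to run for more than about 57% of the year is cheaper as a reserved instance; below that, on-demand wins.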


CHAPTER 1

INTRODUCTION

This chapter first introduces the technology that forms the focus of this thesis, namely cloud computing. Next, it discusses the core cloud concepts and how Amazon implements them in its Elastic Compute Cloud (EC2) products. The chapter concludes with a detailed overview of, and motivation for, the subject of this thesis.

1.1 What is Cloud Computing

This thesis is about the Elastic Compute Cloud (EC2) [2], the public cloud solution offered by Amazon, so a high-level overview of cloud computing is given here. First, the nature of cloud computing is discussed and a definition for this nebulous term is given; then the existing service models and deployment models are presented, followed by the benefits and challenges of cloud computing. Note that this is a rather short introduction to cloud computing, mainly aimed at emphasizing the importance of the research proposed in this thesis. Cloud computing is a relatively new technology that is still evolving rapidly. Readers unfamiliar with the topic are encouraged to consult some of the resources [3] referenced in this chapter.

1.1.1 Definition

One of the definitions of cloud computing that best fits our problem context is given in “Cloud Computing and Grid Computing 360-Degree Compared” [4]:

Cloud computing is a large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet.

The term cloud computing is often used as a buzzword for the big switch to an IT world where computation and storage resources are provided over the Internet. This is why so many different definitions of, and misconceptions about, cloud computing exist. In “The Cloud Revolution” [5], Charles Babcock sketches the typical case in which a company’s CEO […] the cloud is ‘the next phase of Internet computing’, but that the meaning of this term is now more muddled than ever. Since cloud computing is a relatively new step in the evolution of computing, it is constantly redefining itself, which makes it hard to capture in one generic definition. Ian Foster’s definition, however, emphasizes a number of key features of cloud computing, which are explained below.

Cloud computing is a large-scale distributed computing paradigm, since the services are offered over the Internet. According to NIST [6], the term cloud is used as a metaphor for the Internet¹. End users, companies and developers all have access to the storage, processing power and services offered by cloud providers. They can access these resources from anywhere, as long as they have a device connected to the Internet.

The shift to cloud computing is driven by economies of scale. Cloud providers benefit from a relative cost reduction when they build large data centers: this enables them to buy hardware in enormous quantities, which makes their compute power relatively cheap. Cloud computing has a market-oriented business model in which users are charged for the cloud services they consume [7]. These cloud services include compute power, storage and network services. These resources are charged for analogously to conventional utilities in everyday life (e.g. water, electricity, gas and telephony). This concept is called utility computing.

Virtualization is a key underlying technology of cloud computing. Reports indicate that the average CPU utilization of a server is only 10% [8]. Being able to run multiple virtual machines on one physical machine in cloud data centers enables more efficient use of the hardware resources. This possibility of server consolidation forms the basis for a number of the advantages of the cloud model.
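Server consolidation is essentially a bin-packing problem: place virtual machines with given loads onto as few physical hosts as possible. The sketch below uses first-fit decreasing, one common packing heuristic; it is not claimed to be what any cloud provider actually uses, and the function name and the 80% per-host cap (leaving headroom) are assumptions for illustration. Loads are integer CPU percentages of one host.

```python
from typing import List


def first_fit_decreasing(loads: List[int], capacity: int = 100) -> List[List[int]]:
    """Consolidate VM loads onto hosts with the first-fit decreasing heuristic.

    loads: CPU load of each VM, in percent of one physical host.
    capacity: maximum load (percent) allowed per host, leaving headroom.
    Returns one list of VM loads per host that is switched on.
    """
    hosts: List[List[int]] = []
    for load in sorted(loads, reverse=True):  # place the largest VMs first
        if load > capacity:
            raise ValueError(f"VM load {load}% exceeds host capacity {capacity}%")
        for host in hosts:
            if sum(host) + load <= capacity:  # first host with room wins
                host.append(load)
                break
        else:  # no existing host had room: switch on a new one
            hosts.append([load])
    return hosts


# Ten servers that each run at the 10% average utilization cited above,
# consolidated onto hosts that we (hypothetically) cap at 80% load.
placement = first_fit_decreasing([10] * 10, capacity=80)
print(len(placement), [sum(h) for h in placement])  # -> 2 [80, 20]
```

Under these assumptions, ten physical servers collapse onto two, which is the kind of utilization gain that virtualization makes possible.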

Cloud computing offers dynamically scalable on-demand resources over the Internet. This creates the illusion of an infinite amount of on-demand computing resources, which allows companies to start small and grow when an increase in the number of users of their services requires more resources. Releasing the acquired resources when the company no longer needs them is equally easy. This elasticity eliminates the need for an up-front commitment to a fixed-size, long-term investment in hardware, as found in traditional privately owned data centers.
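The cost of the fixed-size alternative can be illustrated by counting instance-hours. The sketch below assumes a hypothetical hourly demand trace and compares a static cluster sized for the peak hour with an elastic allocation that follows demand exactly; the function name is illustrative only.

```python
from typing import List, Tuple


def provisioning_hours(demand: List[int]) -> Tuple[int, int]:
    """Instance-hours needed to serve an hourly demand trace.

    fixed: a static cluster sized for the peak hour runs every hour.
    elastic: exactly the demanded number of instances runs each hour.
    """
    if not demand:
        return 0, 0
    fixed = max(demand) * len(demand)
    elastic = sum(demand)
    return fixed, elastic


# Hypothetical six-hour trace: number of instances needed in each hour.
fixed, elastic = provisioning_hours([2, 3, 10, 4, 2, 3])
print(fixed, elastic, fixed - elastic)  # -> 60 24 36
```

Here the static cluster pays for 36 idle instance-hours, which is the over-provisioning that elasticity avoids. Real savings also depend on the relative price of owned versus rented capacity, which this sketch ignores.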

¹ A cloud symbol is used in computer network diagrams to represent the Internet.


1.1.2 The Cloud (R)Evolution

Cloud computing is a generic term that describes the concepts, characteristics and technologies involved. These are changing the industry, so cloud computing could be described as an ‘industrial revolution’. But how did the cloud computing model that exists today originate?

It all started in 1961, when John McCarthy said “computation may someday be organized as a public utility” [9]. In 1966, many of the characteristics of cloud computing, such as elasticity, the illusion of infinite supply and the comparison with the electricity industry, were described by Douglas Parkhill in the book “The Challenge of the Computer Utility” [10]. The term ‘cloud’ originated in the telecommunications sector: when VPNs were introduced to balance the load instead of using direct point-to-point connections, the decentralized network structure was referred to as the cloud. The cloud symbol then found its way into telecommunication network diagrams, in which it represents the telephone network. Later on, the symbol also came to represent the Internet in computer network diagrams.

Nicholas Carr stated in The Big Switch [11]: “Cheap utility-supplied computing will ultimately change society as profoundly as cheap electricity did.” A hundred years ago, companies stopped producing their own power, since getting power through the electricity grid became cheaper. These companies could focus on their core business activities and no longer had to worry about power production. The same shift is now happening in IT. A number of non-IT companies will no longer need an IT department or their own server park. Even IT startups no longer need to invest in expensive servers: FourSquare, a startup that created a location-based social network, runs its applications on top of Amazon EC2 [12]. Computing is becoming a utility: one plugs a cable into the wall and has access to it over the Internet. IT infrastructure is being commoditized, which means that cloud computing is becoming commonplace and standardized. Cloud computing is a metered service, which means that the customer only pays for the services and capacity that are used. The usage is measured according to well-defined models.

Over the last few years, large IT enterprises have been making big investments in cloud computing. Companies such as Google (Google Apps [13]), Microsoft (Windows Azure [14]), Amazon (EC2 [2] and S3 [15]) and IBM are investing heavily in their cloud computing services. These companies are building large next-generation data centers containing tens of thousands of servers. These investments are crucial in making the growth of their services, and the rise of cloud computing, a reality.

1.1.3 Service Models

Cloud computing solutions can be divided into three service models [6]: IaaS, PaaS and SaaS. These models are discussed in this section in order of their level of abstraction.


IaaS In Infrastructure as a Service, raw hardware resources are provided. In this case the customer can choose the operating system and the software stack that runs on the cloud instances. Companies that offer IaaS products include FlexiScale [16], GoGrid [17], RackSpace [18] and Amazon.

PaaS In Platform as a Service, a platform is provided, which means that a framework of software tools is made available on top of the hardware. The customer writes applications that will be hosted on a cloud instance offering a certain software stack. Products available in this field include Microsoft Azure [14] and Google App Engine [19].

SaaS In Software as a Service, finished software solutions are provided to the consumer. The applications are accessible through a thin client; in other words, the user only needs a web browser to access the service. Although these applications are deployed in the cloud, they are often still configurable by the customer. The provider takes care of the management of the software, so there is no need for the user to install patches, etc. These applications are often offered free of charge (when the provider has an alternative stream of revenue such as advertising) or on a per-user pricing plan. Salesforce [20] and Google are well-known providers of SaaS products. Salesforce is known for its Customer Relationship Management software tools, while Google Apps [13] offers Gmail, Google Docs, Google Calendar, etc.

Different deployment models exist for cloud computing: public, private, hybrid and community clouds.

A public cloud means that the customer uses an off-site third-party cloud provider.

A private cloud means that the customer emulates the cloud computing concepts on a privately owned data center. This emulation is made possible by a number of virtualization products that offer the ability to host virtual machines on infrastructure that is used solely by one organization. These products provide some of the advantages of cloud computing, but still require an up-front investment in hardware (they lack the advantages inherent to provisioning capacity from a third-party provider).

A hybrid cloud combines a public and a private cloud. A company can host part of its application portfolio on managed dedicated servers, while the other part is hosted on public cloud instances.

A community cloud may be established for a number of organizations that have similar requirements for their infrastructure. These organizations agree to share common infrastructure, which may offer a higher degree of security and privacy, but is more expensive than a public cloud solution.


1.1.4 Characteristics

A number of cloud computing characteristics were already touched upon when discussing the definition of cloud computing. This section gives an overview of the most important characteristics and the associated advantages of cloud computing.

On-Demand: Cloud resources are available in an on-demand manner. This makes the provisioning of resources a continuous process that can be automated. When the load of an application rises, the amount of resources rented from the cloud provider should increase. If this process is automated, even an unpredictable burst in workload can be handled appropriately. This elasticity enables the customer's application to grow according to its needs, without suffering the disadvantages of under-provisioning (the company misses potential customers and revenue because its services fail or are unavailable) or over-provisioning (the consumer pays for resources he does not need). Cloud computing also brings a certain level of agility that enables a fast time to market for new software products, since these can be hosted on cloud resources that are available on demand.

Device and Location Independence: The cloud resources are accessible over the Internet, which makes access to the applications hosted on these resources independent of device (as long as it has Internet access) and location.

Virtualization: Virtualization techniques make it possible to run multiple virtual machines (running different applications) on a single physical machine. This concept, called multitenancy, enables the sharing of compute cycles and storage resources, which underlies many of the advantages of cloud computing. Running multiple applications on one physical machine results in a higher resource utilization rate.

Pay-what-you-use: The customer only pays for the services he actually uses. There are clearly defined rules about how the user is charged for the consumed resources, so it is important that resource usage is measured properly.

Focus on Core Activities: Cloud computing makes it possible for companies to focus on their core activities; the IT resource management activities are outsourced to the cloud provider.

Green IT: Cloud computing is a form of green IT: a data center that increases the utilization rate of its hardware lowers the total power usage, which makes it ecologically friendly. The cloud providers invest in achieving an optimal Power Usage Effectiveness (PUE) [21] for their data centers, which is important since power consumption is one of the highest operational costs of a data center.


CapEx to OpEx: Cloud computing turns capital expenses, which took the form of an up-front investment in servers, into operational expenses. A consumption-based pricing model is used, which means that one no longer pays for resources that are not needed.
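The on-demand elasticity described above boils down to periodically recomputing, from the observed load, how many instances are needed and launching or terminating the difference. The sketch below assumes a load measured in requests per second, a fixed per-instance capacity and a target utilisation of 70%; all three are hypothetical parameters, and this is not the policy language of Amazon's Auto Scaling service.

```python
import math


def instances_needed(load_rps: float, capacity_per_instance_rps: float,
                     target_utilisation: float = 0.7, minimum: int = 1) -> int:
    """Smallest instance count that keeps every instance at or below the target utilisation."""
    if capacity_per_instance_rps <= 0 or not 0 < target_utilisation <= 1:
        raise ValueError("capacity must be positive and utilisation in (0, 1]")
    usable_per_instance = capacity_per_instance_rps * target_utilisation
    return max(minimum, math.ceil(load_rps / usable_per_instance))


def scaling_action(current_instances: int, load_rps: float,
                   capacity_per_instance_rps: float) -> int:
    """Positive result: instances to launch; negative: instances to terminate; zero: no change."""
    return instances_needed(load_rps, capacity_per_instance_rps) - current_instances


# A burst to 1000 requests/s while 5 instances of 200 requests/s each are running.
print(scaling_action(5, 1000, 200))  # 3
```

Scaling on a utilisation target rather than on full capacity leaves headroom for the next burst while new instances are still booting.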

1.1.5 Challenges

Not all characteristics of the cloud computing paradigm are advantageous; the paradigm also introduces a number of (new) challenges.

Data and vendor lock-in: Data and vendor lock-in can occur in a cloud context if a provider makes it difficult or impossible to move data or applications out of its data centers to another cloud provider. Solving this problem requires standardization of the technologies and procedures involved.

Performance Unpredictability: Service Level Agreements are well defined regarding the availability of instances, but they do not cover performance characteristics. Performance unpredictability can be caused by interference between multiple virtual machines on the same physical hardware. For example, I/O operations of different virtual machines performed on the same physical hard disk influence each other's performance.

Uncertainty of supply: Although there is the illusion of infinite supply, the physical resource limit can be reached. This problem is best handled by using resources from a pool of different cloud providers, which assures high availability of resources and eliminates the single point of failure².

Software licensing: There is no pricing model for software licensing tailored to software installations on cloud instances. One possible way of dealing with this issue is to use open source software on the cloud instances. Software companies will, however, adjust their pricing models gradually.

Privacy: Privacy concerns arise because countries have different regulations. The United States' PATRIOT ACT [23], for example, makes it possible for the US government to request data access when a terrorist threat is feared. This problem can be partially solved by allowing customers to select a region, which ensures that their instances run in the corresponding geographical part of the world, where legislation acceptable to the customer is in use.

² A good illustration of this uncertainty is the failure Amazon EC2 experienced on April 22nd 2011, when a large number of instances and services were unavailable for multiple hours, see [22].


1.2 Amazon’s Cloud Computing Offering

In this section, an overview of Amazon's implementation of the cloud computing model is given and the different products provided by Amazon are discussed. The two most important ones are EC2 [2] and S3 [15]. EC2 (Elastic Compute Cloud) is described by Amazon as a web service that provides resizable compute capacity. S3 (Simple Storage Service) is described by Amazon as storage for the Internet. Both tools are designed to make web-scale computing easier for developers. Since this thesis focuses on Amazon as a cloud provider, it is important to understand why Amazon became a cloud player and what products it offers.

1.2.1 Introduction

After the dot-com bubble (in 2000), Amazon decided to modernize its data centers, since its servers were most of the time using only a small fraction of their maximum capacity. By introducing the cloud model, which consolidates workloads on virtualized systems, Amazon obtained much higher resource utilization rates. This increased efficiency gave them the insight that they could benefit even more from cloud computing if a larger variety of workloads was available. This variety could be achieved by offering cloud solutions to the public. Operating its famous online bookstore, which made Amazon a well-known company, had given it the knowledge of the technologies needed to develop a cloud product. Amazon thus decided to develop a cloud product, which led to the beta release of Amazon Web Services (AWS) in 2006. This was also a form of diversification: introducing a revenue stream from a new market reduces the business risks of a company.

1.2.2 AWS Product Portfolio⁴

On August 24th 2006, Amazon launched a beta version of EC2. Computer Business Review [24] described the offering as follows:

The retailer’s web services division yesterday opened Elastic Cloud Computing for limited beta testing, promising prices starting at 0.10 dollar per virtual server per hour. EC2 is one of the first services to address anticipated demand for on-demand computing capacity at the lower end of the market. It’s targeted at smaller web developers that don’t want to be able to dynamically scale if their services become unexpectedly popular.

Compute Products

EC2 is the web service that offers compute capacity in the cloud; it allows the user to obtain capacity in the form of instances. EC2 is a public virtual computing facility, in which virtualization is enabled by the open source Xen technology. An instance is a virtual machine with certain characteristics in terms of disk space, compute capacity, memory capacity, and so on. A number of different instance types are available, for example several High-Memory and High-CPU instance types. The EC2 web service enables the customer to quickly launch and terminate instances, such that applications can be scaled up and down easily. The customer is only charged for the instance hours he used and has full control over his instances (root access). EC2 offers features to build failure-resilient elastic applications, such as Amazon Elastic Load Balancing, Auto Scaling and Amazon CloudWatch (for monitoring purposes). The AWS Management Console makes the instances easy to manage; the Amazon EC2 command line tools are an alternative. If the user's needs do not fit one of the preconfigured images, a customized Amazon Machine Image (AMI) can be created that contains the required tools, libraries, settings, and so forth. To create a custom AMI, the user can start from an existing AMI and modify it. An alternative is to build an AMI from scratch and upload it to S3. To start an instance with this newly created AMI, the ‘ec2-run-instances’ command can be used with the id of the new AMI. If an AWS customer has Elastic IP Addresses (static IP addresses) coupled to his account, these can be dynamically assigned to the running instances.

⁴ This overview was created in February 2011 and might not be up-to-date. Check out http://aws.amazon.com/ for detailed information.

Storage Products

Amazon S3 provides a web service that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. This service gives the customer the perception of infinite storage capacity. S3 users are charged on a pay-what-you-use basis as well. The objects stored in S3 can be up to 5 TB in size (recently upgraded from 5 GB⁵). These objects are stored in buckets that can be accessed by providing a unique developer key.

The other storage services that Amazon provides are Amazon Elastic Block Store (EBS) and the AWS Import/Export functionality. The latter accelerates moving large amounts of data into and out of AWS, using portable physical storage devices for transport. Amazon EBS provides persistent storage volumes that can also serve as an alternative root device when launching an Amazon EC2 instance. Data on Amazon EBS persists independently of the lifetime of the instance, whereas data on the instance's local store only remains available during the lifetime of the instance.

Database Products

Amazon Relational Database Service (Amazon RDS) is a managed relational database service in the cloud. Amazon RDS is based on the familiar MySQL database, which means that applications and tools that work with MySQL databases work seamlessly with Amazon RDS. Amazon SimpleDB is a web service that provides the functionality to index data and query it quickly.

⁵ On December 9th 2010, the object size limit was raised to 5 TB.


Messaging Products

The Amazon Simple Queue Service (Amazon SQS) allows developers to move data between distributed components of their applications that perform different tasks, without losing messages or requiring each component to be always available. Amazon SNS (Simple Notification Service) enables sending notifications from the cloud to subscribers or other applications. This is done by a push mechanism: whenever the customer posts a notification, it is propagated directly to all the subscribers of the corresponding topic and delivered over each subscriber's protocol of choice.

Networking Products

Amazon Route 53 is a DNS service. It translates human-readable names into the numeric IP addresses of the servers associated with the web service. Route 53 answers DNS queries with low latency by using a global network of DNS servers. Amazon VPC (Virtual Private Cloud) creates a secure and seamless bridge between the existing IT infrastructure of a company and the AWS cloud; this is achieved by accessing EC2 through an IPsec-based virtual private network (VPN). In this way Amazon also tries to embrace the hybrid cloud model, which combines a private cloud with EC2. Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances using a number of protocols, namely HTTP, HTTPS, TCP and SSL. The service is also able to detect unhealthy instances and will in that case stop routing traffic to the affected instances.

Conclusion

Amazon has developed an array of different cloud products through the years⁶. Its cloud services have been expanding and evolving constantly ever since the beta introduction in 2006. James Hamilton wrote the following on his blog [25], which reflects the fact that Amazon is involved in an emerging market:

Even working in Amazon Web Services, I’m finding the frequency of new product announcements and updates a bit dizzying. It’s amazing how fast the cloud is taking shape and the feature set is filling out. Utility computing has really been on fire. I’ve never seen an entire new industry created and come fully to life this fast. Fun times.

1.2.3 Instance Types

When EC2 was first released, only one instance type was available. This instance type is now called a Standard Small instance. A number of different types have been introduced over time to cover the needs of different users. Choosing the instance type that is appropriate for your application is rather difficult, since it is workload dependent. This section gives an overview of the different instance types; this comparison is important because the pricing of the different instances will be analyzed.

⁶ The complete AWS product portfolio can be found at http://aws.amazon.com/products/.

To make the different instance types comparable in terms of compute power, Amazon introduced the EC2 Compute Unit (ECU). One ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. This is also equivalent to an early-2006 1.7 GHz Xeon processor, which is referenced in Amazon's original documentation. Amazon's data centers contain commodity off-the-shelf hardware, so different microprocessors have been used through the years. Because Amazon uses the ECU to specify the compute power of the instance types it offers, it is quite difficult to compare prices across different IaaS cloud providers, since this measure is Amazon-specific.
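Within EC2 itself, the ECU does allow a rough normalisation: dividing an instance type's hourly price by its ECU rating gives a price per ECU-hour. The sketch below illustrates this; the catalogue names and prices are hypothetical placeholders rather than Amazon's list prices, and the ECU remains only an approximate indicator of real performance for a given workload.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class InstanceType:
    name: str
    ecu: float           # EC2 Compute Units advertised for the instance type
    hourly_price: float  # USD per instance-hour (illustrative value)


def price_per_ecu_hour(instance_type: InstanceType) -> float:
    """Hourly price normalised by the advertised compute capacity."""
    return instance_type.hourly_price / instance_type.ecu


def cheapest_per_ecu(types: Iterable[InstanceType]) -> InstanceType:
    """Instance type with the lowest price per ECU-hour."""
    return min(types, key=price_per_ecu_hour)


# Hypothetical catalogue: names and prices are placeholders, not Amazon's list prices.
catalogue = [
    InstanceType("small", ecu=1.0, hourly_price=0.10),
    InstanceType("high-cpu-medium", ecu=5.0, hourly_price=0.20),
    InstanceType("large", ecu=4.0, hourly_price=0.40),
]
best = cheapest_per_ecu(catalogue)
print(best.name, round(price_per_ecu_hour(best), 3))  # high-cpu-medium 0.04
```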

The instance types that exist today are divided into groups that have similar characteristics and are well suited for a certain workload type. These instance type groups are Standard, Micro, High-Memory, High-CPU, Cluster-Compute and Cluster-GPU. For each group, the corresponding instances are compared in a table. The stated ‘ECU’ amounts are delivered by a varying number of virtual CPU cores (given in the ‘cores’ column). The ‘platform’ column of the instance comparison tables states whether the instance is a 32 or 64 bit system. The ‘I/O performance’ is labeled with a subjective grade (e.g. moderate). The fact that Amazon does not use an absolute measure indicates that the performance depends strongly on how many customers are sharing the resource at a given moment in time.

Standard instances are suited for most general purpose applications.

Table 1.1: Properties of Standard Instances

Micro instances are appropriate for low-throughput applications and for applications that occasionally need some extra compute cycles, since this instance type can temporarily burst its available compute capacity (up to 2 ECU, which is twice as much CPU power as the default Standard Small instance type has available). These instances are not available with a local instance store; they have to be coupled to an EBS resource.

Table 1.2: Properties of Micro Instances

High-Memory instances are appropriate for high-throughput applications (including databases).

Table 1.3: Properties of High-Memory Instances


High-CPU instances are well suited for compute-intensive applications.

Table 1.4: Properties of High-CPU Instances

Cluster instances come in two varieties: Compute and GPU instances. Cluster Compute instances are best suited for HPC (High Performance Computing) applications. These instances are powered by two Intel Xeon X5570 quad-core Nehalem processors and have improved I/O performance, thanks to the increased network performance (10 Gigabit Ethernet connections). The Cluster GPU instances, on the other hand, are best suited for applications that make use of highly parallelized processing, such as rendering and media processing applications. These instances contain two NVIDIA Tesla Fermi M2050 GPUs and have the same processor and networking interface as the Cluster Compute instances.

Table 1.5: Properties of Cluster Instances

All these instances are launched using a certain AMI, which determines the operating system and software stack that is available at launch. Amazon EC2 currently supports a variety of operating systems, including Red Hat Linux, Windows Server, openSUSE Linux, Fedora, Debian, OpenSolaris, CentOS, Gentoo Linux and Oracle Linux. This list is constantly expanding as EC2 supports more and more platforms.

1.2.4 Instance Pricing

EC2 currently operates in five regions, namely US-East (Northern Virginia), US-West (Northern California), EU-West (Ireland), Asia Pacific (Singapore) and Asia Pacific (Tokyo)⁷. For now, we focus on the US-East region, since it is the original region and still the most active market⁸. Later on, the pricing in the different regions will be compared (see section 2.4). Many factors contribute to the price differences between these geographic locations. The locations have, for example, varying building and operating costs for their data centers. The operating costs of a data center are strongly influenced by electricity and Internet connectivity charges, which vary across geographical regions. Within a region, several Availability Zones are available; these are geographically dispersed and isolated from failures in other availability zones. The communication (and data transfer) between the different availability zones within a region is reliable and fast.

⁷ The Tokyo region was only introduced on March 2nd 2011.

⁸ The number of public IP addresses provided in this region by Amazon can be used as an indication of the size of the region's EC2 market, see appendix A.


Amazon offers three pricing models to its customers: On-Demand, Reserved and Spot pricing.

On-Demand Instances

In the On-Demand pricing model, the customer pays for compute capacity by the hour at a fixed hourly rate. There is no upfront investment (i.e. a fixed fee per instance), which means that there is no long-term commitment for the customer. This gives the user the flexibility to start and terminate instances whenever his application needs more or less compute capacity. It is, however, possible that for short periods of time no more instances are available in a certain availability zone. Whenever the availability of a certain number of instances has to be assured, these instances should be reserved instead of acquired as on-demand instances. The on-demand pricing model is best suited for applications with short-term, spiky or unpredictable workloads that cannot be interrupted. Applications in the testing or development phase are also suited for the on-demand pricing model.
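As an illustration of this hourly charging scheme, the following sketch computes the on-demand cost of a job. It assumes, as EC2's per-instance-hour billing did at the time of writing, that every started instance-hour is billed as a full hour; the rate in the example is hypothetical.

```python
import math


def on_demand_cost(runtime_hours: float, hourly_rate: float, instances: int = 1) -> float:
    """Cost of running `instances` on-demand instances for `runtime_hours` each.

    Assumes every partial instance-hour is rounded up to a full billed hour.
    """
    if runtime_hours < 0 or instances < 0:
        raise ValueError("runtime and instance count must be non-negative")
    billed_hours = math.ceil(runtime_hours)
    return instances * billed_hours * hourly_rate


# Three instances running 2 h 15 min at a hypothetical $0.10/hour: 3 x 3 billed hours.
print(round(on_demand_cost(2.25, 0.10, instances=3), 2))  # 0.9
```

Under such rounding, splitting a job over many instances that each run slightly longer than a whole hour can noticeably increase its cost.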

Table 1.6 presents the on-demand prices for all instance types in the US-East region.

Table 1.6: On-Demand Pricing in the US-East Region

Reserved Instances

Reserved Instances require a one-time upfront payment (a fixed price), which reserves the instance for a one or three year term (depending on the customer's choice). In return for this upfront investment, the hourly rate is significantly lower and the customer is assured that his instance, with the chosen operating system and availability zone, will be available at any time. If a certain instance is needed constantly for a significant amount of time, the reserved pricing model is cheaper than the on-demand model. The tipping point at which a reserved instance becomes cheaper than an on-demand instance depends on the instance type and on the utilization rate of the instance. This is analyzed in section 3.2.1 of chapter 3. The reserved pricing model is best suited for applications with steady-state or predictable usage and for applications that require reserved capacity, including disaster recovery software.
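The tipping point mentioned above follows from equating two cost lines: over a term, a reserved instance costs the upfront fee plus the reserved rate per used hour, whereas an on-demand instance costs the on-demand rate per used hour. The sketch below computes the break-even usage under that simple model; the fee and rates in the example are hypothetical, and the detailed analysis in chapter 3 may take additional factors into account.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760


def break_even_hours(upfront_fee: float, on_demand_rate: float, reserved_rate: float) -> float:
    """Instance-hours within the term above which the reserved instance is cheaper.

    Solves upfront_fee + reserved_rate * h = on_demand_rate * h for h.
    """
    saving_per_hour = on_demand_rate - reserved_rate
    if saving_per_hour <= 0:
        return math.inf  # the reservation never pays off
    return upfront_fee / saving_per_hour


def break_even_utilisation(upfront_fee: float, on_demand_rate: float,
                           reserved_rate: float, term_years: int = 1) -> float:
    """Break-even usage expressed as a fraction of all hours in the term."""
    hours = break_even_hours(upfront_fee, on_demand_rate, reserved_rate)
    return hours / (term_years * HOURS_PER_YEAR)


# Hypothetical offer: $300 upfront and $0.20/h reserved versus $0.50/h on demand.
print(round(break_even_hours(300, 0.50, 0.20)))           # 1000
print(round(break_even_utilisation(300, 0.50, 0.20), 3))  # 0.114
```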

Table 1.7 presents the reserved prices for all instance types in the US-East region.

Table 1.7: Reserved Pricing in the US-East Region

Spot Instances

In December 2009, a new pricing model called Spot pricing was released. Spot Instances do not require an upfront commitment, and most of the time the hourly rate is lower than the on-demand rate. In this model the hourly price fluctuates; it is set by Amazon based on the supply of and demand for instances. Amazon can sell its excess capacity using this pricing model. A customer specifies the maximum price (bid) he is willing to pay for the instance. When the Spot Price rises above the customer's maximum bid, Amazon shuts the instance down. The spot pricing model is best suited for applications that have flexible start and end times and for applications that are only feasible at very low compute prices. The average spot prices and an analysis of the spot price history follow in the next chapter.
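The interaction between the bid and the fluctuating spot price can be illustrated with a small simulation over an hourly price trace. The sketch assumes, as in EC2, that a running instance is charged the prevailing spot price rather than the bid, and it simplifies the trace to one price per hour; the prices in the example are made up.

```python
from typing import List, Tuple


def simulate_spot_run(hourly_prices: List[float], bid: float,
                      hours_needed: int) -> Tuple[int, float, bool]:
    """Run a spot instance hour by hour against a price trace.

    Returns (hours_completed, cost, finished). The instance keeps running while the
    spot price does not exceed the bid and is terminated as soon as it does.
    Each completed hour is charged at that hour's spot price, not at the bid.
    """
    hours_done, cost = 0, 0.0
    for price in hourly_prices:
        if hours_done == hours_needed:
            break
        if price > bid:
            return hours_done, cost, False  # out-bid: the instance is terminated
        hours_done += 1
        cost += price
    return hours_done, cost, hours_done == hours_needed


trace = [0.03, 0.04, 0.05, 0.09, 0.03]  # hypothetical spot prices in USD/hour
print(simulate_spot_run(trace, bid=0.06, hours_needed=5)[0::2])  # (3, False)
```

A higher bid lowers the risk of interruption without raising the price paid per hour, which is why the choice of bid is mainly a trade-off between availability and the worst-case hourly cost.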

Dedicated Instances

This pricing model was introduced on March 27th 2011 for the Amazon Virtual Private Cloud (VPC) product. Dedicated instances are launched on hardware dedicated to a single customer. This ensures that the customer's Amazon EC2 compute instances run in an isolated environment, such that the customer's application performance cannot be influenced by the workload of other customers. The single tenancy can be assumed to be limited to the local disk, processor and memory; the network and networked storage devices are still shared by multiple customers. Two versions exist: one without an upfront commitment, called Dedicated On-Demand instances, and one with a one or three year upfront fee, called Dedicated Reserved instances. For now this offering is only available in the US-East and EU-West regions. It is available for all instance types except the Micro, Cluster Compute and Cluster GPU instances. The prices for dedicated instances, presented in Table 1.8 and Table 1.9, are between 17% and 25% higher (for the fixed fee as well as the hourly rate of reserved instances) than those of the regular on-demand and reserved instances. On top of this, there is an additional dedication fee of $10 per hour for each region in which at least one dedicated instance is running. The convenience of having an isolated resource thus comes at a rather high cost.

Table 1.8: Dedicated On-Demand Pricing in the US-East Region

Table 1.9: Dedicated Reserved Pricing in the US-East Region
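Because the dedication fee is charged per region rather than per instance, it dominates the bill for small deployments. The sketch below computes the hourly cost of a set of dedicated instances; the region names and instance rates are hypothetical, and only the $10 per-region fee is taken from the text above.

```python
from typing import Dict, List

DEDICATION_FEE_PER_REGION = 10.0  # USD per hour, per region with a dedicated instance


def dedicated_hourly_cost(rates_by_region: Dict[str, List[float]]) -> float:
    """Hourly cost of all running dedicated instances.

    rates_by_region maps a region to the hourly rates of the dedicated instances
    running there; the dedication fee is added once for every non-empty region.
    """
    total = 0.0
    for rates in rates_by_region.values():
        if rates:
            total += DEDICATION_FEE_PER_REGION + sum(rates)
    return total


# One hypothetical $0.10/h dedicated instance costs $10.10/h in total.
print(round(dedicated_hourly_cost({"us-east": [0.10], "eu-west": []}), 2))  # 10.1
```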

Free Tier

On November 1, 2010, Amazon introduced a free usage tier for its AWS services. The following is offered to new customers each month for a duration of 12 months:

• 750 hours of Micro instance usage.

• 750 hours of an Elastic Load Balancer and 15 GB of data processing.

• 10 GB of EBS, 1 million I/Os, 1 GB of snapshot storage, 10000 snapshot get requests and 1000 snapshot put requests.


• 5 GB of S3 storage, 20000 get requests and 2000 put requests.

• 15 GB of data transfer in and 15 GB of data transfer out.

Another part of the free usage tier is offered to existing customers as well, and these offerings do not expire after 12 months. Everyone gets 25 Amazon SimpleDB machine hours with 1 GB of storage, 100000 Amazon Simple Queue Service requests, 10 CloudWatch alarms and a number of Simple Notification Service requests and notifications.
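Since the 750 free Micro hours roughly correspond to one instance running for a whole month, a quick calculation shows what remains billable. The sketch assumes that the free hours are pooled over all Micro instances of the account, which the list above does not state explicitly, and it ignores the other free-tier items.

```python
from typing import List

FREE_MICRO_HOURS_PER_MONTH = 750  # Micro instance hours in the free usage tier


def billable_micro_hours(hours_per_instance: List[int]) -> int:
    """Micro instance-hours that remain billable in a month after the free tier.

    Assumes the free hours apply to the account's total Micro usage.
    """
    used = sum(hours_per_instance)
    return max(0, used - FREE_MICRO_HOURS_PER_MONTH)


# Two Micro instances running a full 31-day month (744 hours each).
print(billable_micro_hours([744, 744]))  # 738
```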

Table 1.10 gives an overview of the instance types that are offered by Amazon EC2 under the different pricing models and operating systems in the different geographical regions. The Cluster Compute and Cluster GPU instances are not offered everywhere: these instances are only offered running Linux in the US-East region.

Table 1.10: Availability of the Instance Types in the Different Regions

1.3 Motivation for a EC2 Broker

The cloud computing overview has made clear that cloud computing offers great advantages to many companies. The pay-what-you-use model is one of these benefits: it should result in the customer paying less while still meeting the quality of service requirements of his application. Under- and over-provisioning of resources can be avoided thanks to the on-demand character of cloud solutions. The previous section, which discussed Amazon's product offering, has shown that moving an application to the cloud is not straightforward, since a large number of different instance types and cloud services exist. On top of that, different pricing models are available.

The pricing models offered by Amazon, as described in the previous section, provide the user with the following options:

• Allocate on-demand instances that are priced at a fixed hourly rate.

• Allocate reserved instances with a guaranteed availability and a fixed hourly price, which is lower than the on-demand price for the corresponding instance. This model, however, requires an upfront payment per instance for a one or three year period.

• Allocate instances on a spot market with prices that vary hourly.

Determining how much running an application on EC2 costs is a hard task. The cost depends on many properties of the application's workload, such as which instances and cloud services it requires, in which geographic location it has to run, how much data storage and data transfer is required, and so on. Although a simple cost calculator [26] is available, it still requires many manual decisions that are hard to make, such as which instance types and pricing models have to be used. The calculator does not determine the optimal solution (in the sense of the one with the lowest cost) for running an application on EC2 that still respects the QoS constraints of the application or workload.

For the moment, consumers do not have any tools to optimally map their workload and QoS requirements (such as the deadline by which a workload needs to finish) to the different EC2 pricing plans. Nevertheless, the potential cost reduction through intelligent instance allocation is large, since the hourly prices paid under the different models differ significantly. The differences between these models and an analysis of the historic prices are discussed in the next chapter.

The goal of this thesis is to devise heuristics that a brokering component can use to realize this optimization goal. This involves an analysis of the consumer's workload characteristics, QoS requirements and the volatility of the EC2 spot market, in order to define an optimal portfolio of instance allocations across the three pricing plans.
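To make this optimization goal more concrete, the sketch below solves a deliberately simplified version of the problem: given an hourly demand profile (the number of instances needed in each hour of a one-year term), it decides how many instances to reserve and serves the remainder on demand. It ignores spot instances, QoS constraints and instance type selection, uses hypothetical prices, and is not the heuristic developed in this thesis; it only illustrates the kind of trade-off a broker has to evaluate.

```python
from typing import Sequence


def portfolio_cost(demand: Sequence[int], reserved: int, upfront_fee: float,
                   reserved_rate: float, on_demand_rate: float) -> float:
    """Cost of serving an hourly demand profile with `reserved` reserved instances
    (used first) plus on-demand instances for any remaining demand."""
    reserved_hours = sum(min(d, reserved) for d in demand)
    on_demand_hours = sum(max(d - reserved, 0) for d in demand)
    return (reserved * upfront_fee
            + reserved_hours * reserved_rate
            + on_demand_hours * on_demand_rate)


def best_reserved_count(demand: Sequence[int], upfront_fee: float,
                        reserved_rate: float, on_demand_rate: float) -> int:
    """Try every reservation level from 0 to the peak demand and keep the cheapest."""
    candidates = range(max(demand, default=0) + 1)
    return min(candidates, key=lambda k: portfolio_cost(
        demand, k, upfront_fee, reserved_rate, on_demand_rate))


# Base load of 2 instances all year, peaking at 5 during 760 hours (hypothetical prices).
year = [2] * 8000 + [5] * 760
print(best_reserved_count(year, upfront_fee=100, reserved_rate=0.04, on_demand_rate=0.10))  # 2
```

The exhaustive search is only feasible because the example has a single instance type and a single reservation term; a realistic broker faces a much larger decision space, which is why heuristics are needed.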

CHAPTER 2

ENVIRONMENTAL ANALYSIS

This chapter explains which environmental parameters are taken into account in the heuristic this research proposes for the EC2 resource broker. These parameters mostly resulted from the analysis of Amazon's pricing models and the actual prices of the different AWS products and services. Environmental parameters that can considerably influence the total cost for the customer are considered important; other parameters are ignored, since they have little influence on the total cost. Determining which parameters to take into account is a trade-off that influences the complexity of the broker.

2.1 Introduction

The price of running a particular application, with a corresponding workload, on EC2 does not only depend on the chosen pricing model. The first choice that determines the associated cost is the instance type that the application requires. In the proposed broker we presume this choice is made by the user, but selecting an appropriate instance type could be automated through further research. Depending on the type of workload, the application can be benchmarked for a fixed amount of time; from the acquired data, the resource requirements of the application can be determined, which directly yields a best-fit instance type.

There are other degrees of freedom when choosing the instance on which a workload will be handled, and some of these choices influence the price. One of these parameters is the operating system (OS) of the instance: EC2 offers both Linux and Windows instances. This choice depends strongly on the application characteristics. A geographical region has to be chosen for the instance as well; depending on the application, this choice may be entirely unconstrained. As will be shown later on, this choice can make a big difference in the resulting cost. Another degree of freedom is the availability zone, but all zones within a region offer the same EC2 prices. The chosen pricing model clearly has a significant influence on the total cost. The choice between on-demand and reserved instances only depends on how long the instance will be needed by the customer. There is a tipping point (as will be shown in chapter 3) beyond which a reserved instance is the appropriate choice to ensure the lowest possible total cost. Whether the spot pricing model should be used depends on whether the application can be interrupted at unexpected moments in time. Certain types of workload will require a checkpointing technique. This checkpointing/snapshotting overhead takes time away from the instance hour and should be taken into account when determining the total cost for a certain workload. Since these parameters are workload dependent, different workloads will be examined (see chapter 3).
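The effect of checkpointing overhead on the price of useful work is easy to quantify: if a fraction of every instance-hour is spent writing checkpoints, the effective price per hour of useful computation rises accordingly. The sketch assumes a fixed number of checkpoint minutes per instance-hour; the figures in the example are hypothetical.

```python
def cost_per_useful_hour(hourly_price: float, checkpoint_minutes_per_hour: float) -> float:
    """Effective price of one hour of useful work when part of every instance-hour
    is spent on checkpointing."""
    useful_fraction = 1.0 - checkpoint_minutes_per_hour / 60.0
    if useful_fraction <= 0:
        raise ValueError("checkpointing consumes the whole instance-hour")
    return hourly_price / useful_fraction


# A hypothetical $0.09/h spot instance spending 6 minutes per hour on checkpoints.
print(round(cost_per_useful_hour(0.09, 6), 4))  # 0.1
```

Such an adjusted price, rather than the raw spot price, is what should be compared with the on-demand rate when deciding whether a workload is worth running on spot instances.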

Amazon's different instance purchasing options are characterized by a number of environmental parameters that can be exploited to reduce the cost of running a workload on EC2.

• On-Demand Instances let customers pay for compute power by the hour. There are no long-term up-front commitments, which means the customer is free from the costs of purchasing and maintaining hardware. The large fixed costs (capital expenses) are transformed into smaller variable operational expenses. The need for an own capacity safety net to handle traffic peaks is removed: extra instances can be launched, even automatically, in case of an unexpectedly high traffic load. The evolution of the on-demand price over time is discussed in this chapter, to determine whether these price changes have to be accounted for in our model. We can assume that cloud provider competition and lower hardware costs over time will cause the on-demand prices to drop. The on-demand prices in the different regions are compared as well, to determine whether certain instance types are cheaper in certain regions.

• Reserved Instances require the user to make a one-time upfront payment (every 1 or 3 years, according to the user's choice). The hourly rate for these instances is, however, significantly lower than the corresponding on-demand price. The customer has no further obligations: he may choose how much of the time a reserved instance is used, and is only charged for the hours the instance is actually running. The evolution of the reserved prices over time is discussed in this chapter, to determine whether there have been price changes for both the upfront fee and the hourly rate. Again we assume these changes to be price reductions, because the hardware cost for Amazon to offer these resources decreases over time. The reserved prices in the different regions are compared as well, from which we can conclude whether certain instance types are cheaper in certain regions. We determine from which level of usage a certain region becomes cheaper; this difference is caused by differences in the ratio of the upfront fee to the hourly rate in the different regions.

• Spot Instances allow customers to bid on unused Amazon EC2 capacity; the corresponding instances are periodically priced based on the level of supply and demand (see appendix D). As long as this price is lower than the customer's bid, the customer is able to run the requested instance. If an application is flexible enough to run on spot instances, the Amazon EC2 costs can become significantly lower. It is therefore important to study the spot price history in order to develop an intelligent resource allocation heuristic that takes spot instances into account. The statistical analysis of the spot price history focuses on finding trends in the spot price evolution. This means investigating the difference in price between the different geographical regions, but also between night and day, between weekdays and weekend days, and so on. The spot price is also compared to the other pricing models across the different regions. We try to determine whether these prices are purely based upon the rules of supply and demand, or whether they are artificially chosen to meet Amazon's needs. More information about the statistical terms and the boxplots used in this chapter can be found in appendix E.
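The comparisons between night and day and between weekdays and weekend days amount to grouping the price records by time category and averaging per group. The sketch below does this with an unweighted mean over the records; the 08:00-20:00 daytime window is an arbitrary assumption, and because spot price records mark price changes rather than fixed intervals, a time-weighted average would be more faithful in a real analysis.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_price_by_period(records: Iterable[Tuple[datetime, float]]) -> Dict[Tuple[str, str], float]:
    """Average spot price per (weekday/weekend, day/night) bucket.

    Daytime is assumed to be 08:00-19:59 in the timestamps' own time zone.
    """
    buckets = defaultdict(list)
    for timestamp, price in records:
        week_part = "weekend" if timestamp.weekday() >= 5 else "weekday"
        day_part = "day" if 8 <= timestamp.hour < 20 else "night"
        buckets[(week_part, day_part)].append(price)
    return {bucket: mean(prices) for bucket, prices in buckets.items()}


sample = [
    (datetime(2011, 3, 7, 10, 0), 0.030),  # Monday morning
    (datetime(2011, 3, 7, 14, 0), 0.050),  # Monday afternoon
    (datetime(2011, 3, 7, 23, 0), 0.029),  # Monday night
    (datetime(2011, 3, 5, 12, 0), 0.028),  # Saturday noon
]
print(mean_price_by_period(sample))
```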

The research in this chapter concerns an empirical analysis based on the history of the EC2 instance prices. An important preliminary is to find a way to acquire this pricing history. The current on-demand and reserved prices can be found (subdivided per region) on the EC2 pricing website [27]. The EC2 API does not provide a way to fetch the current on-demand and reserved prices, and the history of these prices is hard to determine. The only way to find out when and by how much these prices changed was to examine the announcements in the News and Events section of the AWS EC2 website [28]. To determine what the previous prices were, several blogs and forums had to be consulted. The spot price history, on the other hand, can be requested through the Amazon EC2 API, although only the spot prices of the last 3 months can be obtained. The API is therefore not the ideal source for spot prices when a long-term analysis is desired. The full spot price history can be found on cloudexchange.org [29] and exported to a CSV file containing record tuples of a timestamp (date and time) and the spot price at that moment, expressed in US dollars.


Figure 2.1: Screenshot of CloudExchange.org
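A minimal sketch of loading such an exported file into (timestamp, price) tuples follows. The exact column layout and timestamp format of the CloudExchange export are not specified here, so the sketch assumes two columns, a `YYYY-MM-DD HH:MM:SS` timestamp followed by the price in US dollars; `parse_spot_history` and `load_spot_history` are hypothetical helper names.

```python
import csv
from datetime import datetime
from typing import Iterable, List, Tuple

SpotRecord = Tuple[datetime, float]

# Assumed timestamp format; adapt it to the format of the actual export.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_spot_history(lines: Iterable[str]) -> List[SpotRecord]:
    """Parse CSV lines of (timestamp, price) into a time-sorted record list."""
    records = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip empty or malformed lines
        try:
            ts = datetime.strptime(row[0].strip(), TIMESTAMP_FORMAT)
            price = float(row[1])
        except ValueError:
            continue  # skip a header row or unparsable values
        records.append((ts, price))
    records.sort(key=lambda r: r[0])
    return records


def load_spot_history(path: str) -> List[SpotRecord]:
    """Read one OS/instance type/region history file from disk."""
    with open(path, newline="") as f:
        return parse_spot_history(f)
```

Because `csv.reader` accepts any iterable of strings, the parser can be tested on an in-memory list of lines as well as on an open file.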

Note that the CSV files, which provide the full history of the spot prices, do not include information for Micro and Cluster Compute/GPU instances. Amazon only recently announced that Cluster Compute and Cluster GPU instances are available under the Spot pricing model, and Micro instances did not exist yet when the CloudExchange service was created¹. These operating system, instance type and region specific files are used as input files for the proposed broker. To compensate for the aforementioned disadvantages of CloudExchange, and to ensure that our broker still has the needed input available even when cloudexchange.org goes offline, we created our own service that takes the CloudExchange files as initial input and appends the new spot prices to these files, after fetching them daily through an Amazon EC2 API call. This service is incorporated in a spot price application called SpotWatch, available online at http://www.spotwatch.eu; more of its features are discussed in section 2.3.4.
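The daily update step of such a service can be sketched as an append-only merge: only records newer than the last stored timestamp are appended, so fetching an overlapping window from the API is harmless. This is an illustration under that assumption, not SpotWatch's actual code. The API call itself is left out, and `merge_new_prices` and `append_to_csv` are hypothetical names.

```python
import csv
from datetime import datetime
from typing import List, Tuple

SpotRecord = Tuple[datetime, float]


def merge_new_prices(history: List[SpotRecord], fetched: List[SpotRecord]) -> List[SpotRecord]:
    """Return the records from `fetched` that are newer than the stored history.

    `history` is assumed to be sorted by timestamp. The API may return an
    overlapping window in no particular order, so records that are not newer
    than the last stored one are dropped and the rest are sorted.
    """
    last_seen = history[-1][0] if history else datetime.min
    return sorted((r for r in fetched if r[0] > last_seen), key=lambda r: r[0])


def append_to_csv(path: str, new_records: List[SpotRecord]) -> None:
    """Append records to a history file in a two-column (timestamp, price) layout."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for ts, price in new_records:
            writer.writerow([ts.strftime("%Y-%m-%d %H:%M:%S"), price])
```

Keying the merge on the last stored timestamp keeps each file sorted and free of duplicates without rereading the whole history.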

2.2 Price Evolution of the On-Demand and Reserved Instances

Amazon EC2 instance prices are not fixed, in the sense that Amazon can change these prices at any given moment. It should not come as a surprise that prices have decreased over the last couple of years. This is a rather natural phenomenon: the instance types that were released a couple of years ago still have the same specifications today, but the hardware cost to provide such an instance has decreased over time. An overview of the on-demand and reserved pricing history of the EC2 instances is given in this section.

The pricing history can be found in detail in appendix B; a general overview of the evolution of the on-demand prices in the US-East region is given in Table 2.1. Note that the dark grey cells contain the prices (in US dollars) that are currently in use.

¹ Micro instances were announced on September 9th 2010.


Table 2.1: Evolution of On-Demand Pricing in the US-East region

Table 2.1 demonstrates that the on-demand prices of the Standard and High-CPU Linux/Unix instances have decreased by 15% since their introduction years ago. The High-Memory instances have decreased by over 16% in price since their introduction in 2009. The instances that were introduced in 2010 have not received a price update yet. The price of the Windows instances has decreased less over time: the Standard and High-CPU instances became about 4% cheaper in 2 years. What causes this difference in price reduction between the on-demand Linux and Windows instances is hard to determine. One possible explanation is that Microsoft is still fine-tuning the licensing costs it charges Amazon for the Windows operating system. Another explanation is that Windows instances were already priced more competitively at their launch date. The Windows High-Memory instances, however, got a considerable price decrease of about 14%. The Windows instances that launched in 2010 have not had a price update yet.
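As a worked example of how these percentages are obtained, take the Standard Small Linux instance in US-East, assuming its 2006 launch price of $0.10 per hour and its current on-demand price of $0.085 per hour:

```latex
\text{reduction} = \frac{p_{\text{old}} - p_{\text{new}}}{p_{\text{old}}}
                 = \frac{0.10 - 0.085}{0.10} = 0.15 = 15\%
```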

A general overview of the evolution of the reserved prices is given in Table 2.2.

Table 2.2: Evolution of the Reserved Pricing in the US-East region

When the hourly rate for reserved instances changes, this also impacts the hourly rate of reserved instances acquired before the rate change. This is not mentioned in the official terms, but an official press release [28] about the price reduction of September 2010 shows that this is done in practice:

If you have existing Reserved Instances your hourly usage rate will automatically be lowered to the new usage rate and your estimated bill will reflect these changes later this month.

The one-time fee that was already paid, however, is not adjusted, so a broker could still take possible reductions of the one-time fee into account. We notice that not many price reductions have happened in the EC2 history. Most of the new prices were introduced together with a new instance type. The only real general price reduction happened in October 2009, when all on-demand prices were lowered by up to 15%. The increasing competition in the IaaS market could introduce more regular price changes in the future: competitors such as CloudSigma, ElasticHosts, FlexiScale, GoGrid and RackSpace are gaining market share.

An interesting line of research concerns the relation between on-demand price reductions in EC2 and the reduction in hardware costs over time. The details of this investigation, focusing on CPU procurement costs, can be found in appendix C. Our analysis makes clear that finding the price reduction of hardware products is difficult, and finding out what hardware EC2 is using is not much easier. The appendix shows that during the last 5 years, in which Amazon EC2 has been active, newer microprocessors were put in use. This evolution was accompanied by a noteworthy hardware price decrease of up to 80 percent. The hourly rate for EC2 instances, on the other hand, has only seen price reductions of up to 15 percent. We cannot draw any firm conclusion about whether the 15% price reduction is a fair reflection of the hardware cost decrease, since we have no insight into the other parameters (such as the operational costs involved) that make up the prices Amazon charges its customers.

2.3 Price Evolution Spot

2.3.1 Introduction

Amazon offers a spot pricing model that allows customers to bid on unused EC2 capacity. Customers keep a running instance as long as their bid exceeds the current spot price. The spot price is calculated every hour per instance type and per region. Amazon claims that these spot prices change according to the rules of supply and demand, but whether this is fully the case is questionable. If a customer has an application that is flexible enough to cease its operations during periods when the customer's bid no longer meets or exceeds the spot price, this application could be a good fit for the spot market. This pricing model is thus only usable for applications with certain characteristics. These applications often have to be adapted to be able to recover from instance outages (e.g. through the use of a checkpointing scheme). The use of spot instances can significantly lower one's total costs, since on average spot prices are a lot lower than on-demand prices, and even lower than the hourly reserved instance prices.
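The bidding rule described above can be illustrated with a small simulation over an hourly price trace. The sketch assumes one price per hour and that the instance runs in every hour in which the bid is at least the spot price. It also assumes, as on EC2, that the customer pays the spot price rather than the bid for every hour the instance runs. Restart overheads are ignored, and `simulate_spot` is a hypothetical name.

```python
from typing import List, Tuple


def simulate_spot(prices: List[float], bid: float) -> Tuple[int, float, int]:
    """Simulate one spot instance on an hourly spot price trace.

    Returns (hours_run, total_cost, interruptions), where an interruption is a
    transition from running to terminated because the price rose above the bid.
    """
    hours_run = 0
    total_cost = 0.0
    interruptions = 0
    running = False
    for price in prices:
        if price <= bid:
            hours_run += 1
            total_cost += price  # charged the current spot price, not the bid
            running = True
        else:
            if running:
                interruptions += 1
            running = False
    return hours_run, total_cost, interruptions
```

For the trace `[0.03, 0.031, 0.09, 0.03, 0.2, 0.2, 0.029]` and a bid of $0.085, the instance runs 4 hours at a total cost of $0.12 and is interrupted twice.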

Amazon released the innovative (auction-like) spot pricing mechanism in December 2009. Since it is still rather new, not much research has been done on it yet. To perform a statistical analysis of the spot pricing, we need the history of these prices. As explained earlier (see section 2.1), these spot price traces can best be retrieved from CloudExchange or SpotWatch.


Spot instances open a market model where unused resources are sold through the mechanisms of supply and demand (see appendix D), but are these rules actually carried out? The problem is that the pricing model may not be purely supply/demand driven, since there is only one seller, namely Amazon. No one can guarantee that Amazon does not use all the information it has at hand to determine the current spot price that is most profitable for it. Amazon could, for example, use the maximum bids customers placed on the spot market to choose the price that maximizes its profit.

CycleCloud [30] performed an experiment on EC2 that required a large number of spot instances to be requested. This experiment often caused out-of-capacity errors. CycleCloud hoped to be able to predict instance availability by analyzing the historical spot prices, but this failed. The spot prices do not always reflect the demand and supply of an instance: no increases in spot price were noticed when out-of-capacity errors (which indicate a supply shortage) occurred.

Customers with applications that need resources urgently can bid a higher price on the spot market to get the remaining resources at their disposal; specifying a higher maximum bid raises the priority of a request for capacity. An interesting question is why customers would not always bid the on-demand price on the spot market, in the hope of always paying less than the on-demand price. The problem with this technique is that the spot price can actually rise above the on-demand price, which would cause all such spot instances to be terminated at once. Spot instances are especially suited to flexible applications (applications that are easily stopped and started again), such as financial modeling, web crawling, load testing and video conversion jobs. Note that these tasks can be performed in iterations, which makes taking snapshots rather easy. It nevertheless remains difficult to choose an intelligent checkpointing scheme, since the best scenario, the one that causes the least overhead, would be to take a snapshot only right before the spot instance gets terminated. Snapshotting makes these types of applications more resilient to the fact that spot instances are terminated when the current spot price exceeds the maximum bid.
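The checkpointing trade-off can be made concrete with a toy model. With a checkpoint after every `interval` hours of work, a termination after `run_hours` hours loses the work done since the last checkpoint, while every checkpoint adds a fixed overhead. The model and its parameters are illustrative assumptions, not measurements.

```python
def checkpoint_waste(run_hours: float, interval: float, overhead: float) -> float:
    """Hours wasted when an instance is terminated after `run_hours` hours.

    Wasted time = overhead of all checkpoints taken + work lost since the last
    checkpoint. For simplicity, checkpointing does not delay the work itself.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    checkpoints = int(run_hours // interval)
    lost_work = run_hours - checkpoints * interval
    return checkpoints * overhead + lost_work
```

For a termination after 10 hours and an overhead of 0.1 hour per checkpoint, an interval of 3 hours wastes 1.3 hours and an interval of 1 hour wastes 1.0 hour. A single checkpoint taken just before termination (interval 10) would waste only 0.1 hour, but that requires knowing the termination moment in advance.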

The boxplot in Figure 2.2 shows that, on multiple days, the spot price for Standard Small instances in the US-East region rose above $0.085/hour, the on-demand price in that region. On a number of days the boxplot indicates outliers that represent spot prices higher than the corresponding on-demand price of the instance. We notice that during most days of the month the quartiles are positioned around the $0.031/hour point, which is the spot price of the corresponding instance most of the time. This phenomenon can be explained by the fact that when a number of users need resources urgently, they can specify a maximum bid higher than the on-demand price to raise the relative priority of their requests. This allows them to gain access to as much immediate capacity as possible. If only a limited number of spot instances is available, this can cause the spot price to rise above the on-demand price, in accordance with the rules of supply and demand. The spot price does not follow the rules of supply and demand all of the time, however, since we sometimes get out-of-capacity errors without changes in the spot price, as discussed previously.

Figure 2.2: Example of Spot Prices exceeding the On-Demand Prices in 2010 (Standard Small Linux instance in the US-East region)
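Checking a trace for this phenomenon is straightforward: group the records per calendar day and keep the days whose maximum spot price exceeds the on-demand price. The sketch assumes (timestamp, price) records as stored in the history files; `days_above_on_demand` is a hypothetical name, and $0.085 is the US-East on-demand price quoted above.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Tuple


def days_above_on_demand(records: List[Tuple[datetime, float]], on_demand: float) -> Dict[date, float]:
    """Map each day on which the spot price exceeded `on_demand` to that day's maximum."""
    daily_max: Dict[date, float] = defaultdict(float)  # spot prices are positive
    for ts, price in records:
        d = ts.date()
        daily_max[d] = max(daily_max[d], price)
    return {d: p for d, p in sorted(daily_max.items()) if p > on_demand}
```

Calling `days_above_on_demand(records, 0.085)` on a Standard Small US-East history returns the dates that show outliers above the on-demand line in a plot such as Figure 2.2.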

2.3.2 Working Method

We start by investigating the spot price history of the US-East region, since the generated graphs for this region fluctuate the most, which may indicate more spot market activity in this region. The following analysis is first performed on the spot prices between the end of August and the middle of October 2010; afterwards we determine whether the conclusions still hold in different time frames. The different scenarios for dividing the spot instance prices are:

• Average per day: We want to examine whether the average spot price evolves over time, so we can compare how the spot market of the corresponding instance behaved on the different dates.

• Average per week: We want to be able to investigate whether there are certain weeks in the year when the spot market behaves differently. For example, prices could decrease during the Christmas holidays, since a lot of people are off from work and certain cloud applications thus need less capacity. Alternatively, this could be a period of higher demand, since people buy gifts online and web shops need more capacity.

• Average per day of the week: The average per day of the week is included to determine whether a pattern can be found in the spot price across the days of the week. This could, for example, make it possible to determine whether the spot prices are lower (or higher) during the weekend, when a lot of people are off from work.

• Average per hour of the day: The average per hour of the day is included to determine whether a pattern can be found in the spot price during the day. We could, for example, find out whether prices decrease during the night in a certain region (taking the time differences into account).
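The four scenarios only differ in the key used to group the records. The standard-library sketch below assumes (timestamp, price) records. The week key uses the ISO calendar week, which is one possible convention. Timestamps are used as stored, without time zone conversion. The names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Callable, Dict, Hashable, List, Tuple

Record = Tuple[datetime, float]

# Grouping key per scenario.
SCENARIOS: Dict[str, Callable[[datetime], Hashable]] = {
    "per_day": lambda ts: ts.date(),
    "per_week": lambda ts: tuple(ts.isocalendar())[:2],  # (ISO year, ISO week)
    "per_day_of_week": lambda ts: ts.weekday(),  # Monday == 0 ... Sunday == 6
    "per_hour_of_day": lambda ts: ts.hour,
}


def group_prices(records: List[Record], key: Callable[[datetime], Hashable]) -> Dict[Hashable, List[float]]:
    """Collect the recorded prices per group key."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for ts, price in records:
        groups[key(ts)].append(price)
    return dict(groups)


def average_per_group(records: List[Record], scenario: str) -> Dict[Hashable, float]:
    """Unweighted mean of the recorded prices in each group of a scenario."""
    groups = group_prices(records, SCENARIOS[scenario])
    return {k: mean(v) for k, v in sorted(groups.items())}
```

The mean here weights every recorded price point equally. A time-weighted mean, which accounts for how long each price was in effect, would be a reasonable alternative design choice.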

For all these scenarios we created a boxplot of the data and a plot of the average price. Boxplots are very good at presenting information about the central tendency, symmetry and skew, as well as the outliers of the data. This data can be used to make the EC2 capacity planner more intelligent; it could, for example, be used to foresee cheaper prices at certain times of the day. To make this possible, a number of statistics, including the mean, skew and kurtosis, are extracted from the spot price history data set; these values are stored for every scenario and for each possible OS-Region-Instance type combination. The different boxplot components and the statistical values that are calculated are explained in appendix E.
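The summary statistics mentioned above can be computed per group with the standard library. The sketch uses `statistics.quantiles` with its default "exclusive" method for the quartiles, and population moments for the skew and the excess kurtosis (0 for a normal distribution). Appendix E may use different estimators, so values computed this way can differ slightly from those reported in this chapter.

```python
from statistics import mean, quantiles
from typing import Dict, List


def summary_statistics(prices: List[float]) -> Dict[str, float]:
    """Quartiles, mean, skew and excess kurtosis of a list of spot prices."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    q1, q2, q3 = quantiles(prices, n=4)
    mu = mean(prices)
    n = len(prices)
    m2 = sum((p - mu) ** 2 for p in prices) / n  # population variance
    m3 = sum((p - mu) ** 3 for p in prices) / n
    m4 = sum((p - mu) ** 4 for p in prices) / n
    if m2 == 0:
        skew = kurtosis = 0.0  # constant prices: no spread and no tails
    else:
        skew = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis
    return {"Q1": q1, "Q2": q2, "Q3": q3, "mean": mu, "skew": skew, "kurtosis": kurtosis}
```

A positive `skew` together with a mean above `Q2` is the signature of the right-tailed price distributions discussed in the next section.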

2.3.3 Price Analysis

The generated graphs for the High-Memory Double Extra Large instance (us-east-1.linux.m2.xlarge) are discussed in this section, since these graphs clearly show the general trends that were found. We demonstrate the findings using graphs from the US-East region, since this region seemed to be the most active spot market. This can be explained by the fact that spot pricing is a rather new model and that customers want to experiment with it in the cheapest region.

Average per Day

Figure 2.3 gives an overview of the average spot price of the High-Memory Double Extra Large instance between August 25th 2010 and October 15th 2010. We notice that the price fluctuates quite a bit.

Figure 2.3: Average Spot Price per Day (High-Memory Double Extra Large Linux instance in the US-East region)


Although the average prices could suggest considerably higher spot prices on certain dates, the boxplots (Figure 2.4) show that the fluctuations in average price are caused by outliers. The percentiles in the boxplots are aligned quite well, also on the days when the average is higher. This means that most of the values on such a date are still lower than the Q3 border, which is situated around $0.176/hour. If we take the lower and upper whisker ends into account, it becomes even clearer that most prices lie within the same range of values on all dates. The fluctuations in average spot price can thus only be explained by the existence of outliers. For this instance we get the following statistical values: Q1 = 0.165, Q2 = 0.172, Q3 = 0.176, an arithmetic mean of 0.1833, a skew of 4.73 and a kurtosis of 21.01. A positive skew indicates that the tail on the right side is longer than the left side and that the bulk of the values lies to the left of the mean (including the median). A high skew value is an indication of outliers, since it tells us that most values are smaller than the average value. Another indication of outliers is the high kurtosis value.

Figure 2.4: Boxplots Spot Price per Day (High-Memory Double Extra Large Linux instance in the US-East region)
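Boxplot whiskers are commonly drawn with Tukey's rule: values more than 1.5 times the interquartile range beyond Q1 or Q3 are plotted as outliers. Assuming the boxplots in this chapter follow that convention (appendix E describes the one actually used), the outliers that drive the averages up can be isolated as sketched below; the sample prices in the usage note are made up.

```python
from statistics import quantiles
from typing import List, Tuple


def tukey_fences(prices: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper fence: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(prices, n=4)  # quantiles() sorts the data itself
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(prices: List[float], k: float = 1.5) -> Tuple[List[float], List[float]]:
    """Separate the prices inside the fences from the outliers."""
    low, high = tukey_fences(prices, k)
    inside = [p for p in prices if low <= p <= high]
    outliers = [p for p in prices if p < low or p > high]
    return inside, outliers
```

For the made-up day `[0.165, 0.168, 0.172, 0.174, 0.176, 0.171, 0.60]`, the quartiles are 0.168 and 0.176, the upper fence lies near 0.188, and only the single 0.60 value is flagged. Yet that one value lifts the mean well above the median.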

We can use the information acquired from the average per day analysis in our resource allocation algorithm, by determining a maximum bid price that guarantees, with a certain probability, access to the needed number of spot instances. This avoids renting instances at moments when extremely high outlier values occur. The evolution of the quartiles should also be monitored to adjust our maximum bid appropriately; this, however, requires our broker to have a real-time component in its scheduler. For now we focus on general trends, by which we mean conclusions that can be drawn from analysing the spot price history, without introducing a real-time component in our resource allocation algorithm.
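One simple way to derive such a bid from the history, assuming past prices are representative of future ones, is to take an empirical quantile of the observed prices. A bid at the p-th quantile would have been at or above the spot price for roughly a fraction p of the recorded price points. Note that this counts recorded price points, not hours, so it ignores how long each price was in effect. `bid_for_availability` is a hypothetical helper, not part of the broker described later.

```python
import math
from typing import List


def bid_for_availability(prices: List[float], availability: float) -> float:
    """Smallest observed price that is >= a fraction `availability` of the prices."""
    if not prices:
        raise ValueError("empty price history")
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    ordered = sorted(prices)
    # Number of observations the bid must cover; rounding guards against
    # floating-point noise in availability * len(ordered).
    needed = max(1, math.ceil(round(availability * len(ordered), 9)))
    return ordered[needed - 1]


def observed_availability(prices: List[float], bid: float) -> float:
    """Fraction of recorded prices at or below the bid."""
    return sum(p <= bid for p in prices) / len(prices)
```

Because an outlier only affects the highest quantiles, a 90% bid stays close to Q3. Only demanding near-100% availability forces the bid up to the outlier level.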

Average per Week

The average price per week graph (see Figure 2.5) shows fluctuations. These fluctuations seem less abrupt, which is understandable since averaging the spot prices per week flattens out the outlier values.


Figure 2.5: Average Spot Price per Week (High-Memory Double Extra Large Linux instance in the US-East region)

Once again the boxplot graph (Figure 2.6) shows fewer fluctuations: the quartiles are well aligned across the different weeks. The differences in average spot price are again caused by outliers.

Figure 2.6: Boxplots Spot Price per Week (High-Memory Double Extra Large Linux instance in the US-East region)

Figure 2.7 shows that the same conclusion holds on a bigger time scale, namely from the introduction of spot instances (December 2009) until January 1st 2011. All quartiles are aligned very well; the only peculiarity is that during the first couple of weeks there was a search for the right spot price.


Figure 2.7: Boxplots Spot Price per Week between December 2009 and January 2011 (Standard Large Linux instance in the US-East region)

If we focus on a holiday period (cf. the scenario described in section 2.3.2), such as Christmas (see Figure 2.8), we notice a fall in the Q3 value during the week that ended with Christmas in 2010 (4th boxplot). This was the case in 2009 as well. The future will tell whether this is a trend or whether the small spot price differences in these weeks were mere coincidence.

Figure 2.8: Boxplots Spot Price per Week during the Christmas period between December 1st 2010 and January 10th 2011 (High-Memory Double Extra Large Linux instance in the US-East region)


Average per Day of the Week

The average spot price across the days of the week shows fluctuations too. The boxplots in Figure 2.9 show that during the weekend Q3 falls below the $0.175/hour price point, while the Q3 border is higher on weekdays. We conclude that prioritizing weekend days over weekdays when scheduling instance hours in a broker can lead to cost reductions.

Figure 2.9: Boxplots Spot Price per Day of the Week (High-Memory Double Extra Large Linux instance in the US-East region)

Average per Hour of the Day

The average spot price per hour plot (see Figure 2.10) shows fluctuations: the average prices are higher in the afternoon and the evening, and cheaper during the night and the morning.

Figure 2.10: Average Spot Price per Hour of the Day (High-Memory Double Extra Large Linux instance in the US-East region)


The boxplots (see Figure 2.11) clearly show that the percentiles lie rather close to each other, and thus that the average spot price fluctuations are caused by outliers. The statistical values are almost always around 0.165 for Q1 and around 0.176 for Q3. The kurtosis value is rather high most of the time, which indicates the presence of outliers.

Figure 2.11: Boxplots Spot Price per Hour of the Day (High-Memory Double Extra Large Linux instance in the US-East region)

Since a difference in price between weekend days and weekdays is noticeable, it is natural to investigate the price difference between day and night as well. We assume the day to last from 8AM until 8PM. Figure 2.12 shows the difference between the day and night prices for every day; a positive peak means that the spot price was more expensive during the day. Note that this was the case for most of the days. The statistics show that the price during the day is on average 0.0096 US dollars higher than during the night. This difference is rather small for an average price of $0.1833 (about 5%). We conclude that our suspicion of higher prices during the day is confirmed in this case, but that the difference is very small. This small difference can be explained by the fact that US-East is the very first EC2 region and is used by developers from all over the world. Developers are attracted to this region by the pricing of the instances, and new services are introduced first at the US-East location. The global customer base of the US-East region causes a constant workload on the region and may be the reason for the small difference in price between day and night. It is worth bearing in mind that this effect could become more pronounced in the future, when the spot market becomes more active. The average difference between day and night and its standard deviation are good metrics to keep an eye on.


Figure 2.12: Spot Price Difference between Day and Night (High-Memory Double Extra Large Linux instance in the US-East region)
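The day/night comparison can be computed per calendar day as the mean price between 8AM and 8PM minus the mean price outside that window. The mean and standard deviation of these daily differences are then the metrics mentioned above. The sketch assumes timestamps in the region's local time and unweighted means of the recorded prices. It takes "night" to be the remaining hours of the same calendar day and skips days that lack either a day or a night observation. All names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean, pstdev
from typing import Dict, List, Tuple

DAY_START, DAY_END = 8, 20  # day = [08:00, 20:00), as assumed in the text


def day_night_differences(records: List[Tuple[datetime, float]]) -> Dict[date, float]:
    """Per calendar day: mean daytime price minus mean night-time price."""
    day: Dict[date, List[float]] = defaultdict(list)
    night: Dict[date, List[float]] = defaultdict(list)
    for ts, price in records:
        bucket = day if DAY_START <= ts.hour < DAY_END else night
        bucket[ts.date()].append(price)
    return {d: mean(day[d]) - mean(night[d]) for d in sorted(day) if d in night}


def summarize(diffs: Dict[date, float]) -> Tuple[float, float]:
    """Average day-night difference and its (population) standard deviation."""
    values = list(diffs.values())
    return mean(values), pstdev(values)
```

A positive average difference corresponds to the positive peaks in Figure 2.12; a standard deviation that is large relative to the average would indicate that the day/night effect is not reliable enough to schedule on.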

Comparison to other Time Frames

There are only small differences when we compare the generated plots with those of a more recent time frame (e.g. December 18th 2010 until February 13th 2011); our main conclusions still hold. The prices during the weekend are still only a little cheaper than those during the week. The boxplot in Figure 2.13 shows that the quartiles of the per day plot are still aligned properly (compare Figure 2.4).

Figure 2.13: Boxplots of the spot prices between December 18th 2010 and February 13th 2011 (High-Memory Double Extra Large Linux instance in the US-East region)

Comparing the average spot prices of these two time frames (see Table 2.3) reveals that the average prices did change over time. Most of the average spot prices increased slightly, except for the High-CPU Extra Large and Standard Large instances, whose prices decreased considerably. For the Standard Large instance this can be explained by the fact that during the earlier time frame its spot price was too high compared to the other spot prices: its spot versus on-demand price ratio was much higher than that of the other instance types. This change can thus be seen as a price correction. The High-Memory Double and Quadruple Extra Large instance prices also decreased, which is explained by the fact that these instances got an on-demand hourly price reduction on September 1st 2010. During the second time frame the spot prices of these instances were adjusted according to the newly introduced prices. We conclude that monitoring the long-term evolution of the prices is an important task.

Table 2.3: Comparison Average Spot Prices (High-Memory Double Extra Large Linux instance in the US-East region)

Comparison to other Instance Types

This section compares the results to the prices of the other instance types in the US-East region. Figures 2.14 and 2.15 were created for the High-CPU Extra Large instance type (us-east-1.linux.c1.medium). We notice that the average stays rather constant until the beginning of October 2010. This probably indicates less activity on the spot market: the flat graph indicates a period in which the market mechanisms were not fully at work. Such periods can be detected by using the average difference between the Q1, Q2 and Q3 values as a measure of market activity.


Figure 2.14: Average Spot Price Evolution (High-CPU Extra Large Instance in the US-East Region)

On the boxplot we notice that the price evolution starts to become interesting at the beginning of October, because the percentiles start to be positioned differently.

Figure 2.15: Boxplots Spot Price Evolution (High-CPU Extra Large Instance in the US-East Region)

Other graphs for this High-CPU Extra Large instance type also start to become interesting in October 2010. It is therefore important to analyze the price traces over different time frames, since this makes it possible to detect periods during which the market mechanisms are not fully in operation.
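The text leaves the exact activity measure open. One possible reading, sketched below, computes the spread between the quartiles per day (here Q3 - Q1). Days whose spread stays below a small threshold are then flagged as periods in which the market mechanisms are not fully at work. The sketch uses the "inclusive" quantile method so that the quartiles stay within the observed prices. The threshold value is an arbitrary assumption that would need tuning per instance type.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import quantiles
from typing import Dict, List, Tuple


def daily_quartile_spread(records: List[Tuple[datetime, float]]) -> Dict[date, float]:
    """Per day, the interquartile range Q3 - Q1 of the recorded spot prices."""
    per_day: Dict[date, List[float]] = defaultdict(list)
    for ts, price in records:
        per_day[ts.date()].append(price)
    spreads = {}
    for d, prices in sorted(per_day.items()):
        if len(prices) < 2:
            spreads[d] = 0.0  # a single price point shows no movement at all
            continue
        q1, _, q3 = quantiles(prices, n=4, method="inclusive")
        spreads[d] = q3 - q1
    return spreads


def inactive_days(records: List[Tuple[datetime, float]], threshold: float = 0.001) -> List[date]:
    """Days whose quartile spread is below `threshold` (an assumed value)."""
    return [d for d, s in daily_quartile_spread(records).items() if s < threshold]
```

Applied to the High-CPU Extra Large history, such a filter would be expected to flag most days before October 2010, the period that looks flat in Figure 2.14.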

Comparison to other Operating Systems and Regions

The US-East region is the cheapest and is presented by Amazon as the region to experiment in; all new functionalities are introduced in this region first. It is therefore reasonable to assume that customers from all over the world use instances in this region, so that, because these customers are located in different time zones, there is activity on the US-East spot market at all times of the day. When the spot market in the EU-West region is analyzed, one would predict that the prices during the night decrease more than in the US-East region. This is, however, not the case, which may indicate a lack of activity in the spot market of this region. The same conclusions as those drawn for the US-East region hold for the other regions and for the Windows instances as well.

2.3.4 SpotWatch

To enable further research on the statistical analysis of EC2 spot pricing, we created the website http://www.spotwatch.eu, where graphs containing averages and the accompanying boxplots are generated on request. This web service allows its users to create graphs for all existing regions (including the new Tokyo region), instance types and operating systems offered by Amazon EC2, in any desired time frame. The spot price history is available from the beginning of the EC2 spot market until the current date, and is updated daily through an Amazon EC2 API call. SpotWatch offers four different chart types: the data can be plotted per date, per week, per day of the week and per hour of the day. The features of this service can easily be extended in the future. Useful extensions include presenting a number of statistical values about the queried time period and offering more graph types. We could also allow users to download graphs together with their corresponding CSV data files. Since we received multiple positive reactions from people in the cloud industry, including two AWS evangelists at Amazon.com, we plan to keep this site up and improve its features over time. Figure 2.16 shows the service's look and feel.

Figure 2.16: Screenshot of SpotWatch.eu


2.4 Pricing Comparison of the Different Regions

In this section we investigate the pricing of EC2 instances in the different geographical regions. Needless to say, not only the monetary cost should be considered when running an application in the cloud, but also the latencies involved (although for some applications higher latencies are no problem). The prices of the different locations can be found on Amazon's website and are presented in this section in tables in which the cheapest price for a certain instance is indicated with a green cell. Note that the Cluster Compute and Cluster GPU instances are only offered in the US-East region, running Linux, under the On-Demand or Reserved pricing model; the other cases are labeled 'Not Available' (NA).

2.4.1 On-Demand

Table 2.4 presents the on-demand prices for the different instances across the existing regions. For Linux instances we notice that the US-East region is the cheapest. The US-West, EU-West and APAC-SouthEast (Singapore) regions have the same prices, but these are considerably higher than the ones in the US-East region: in these regions the Standard and High-CPU instances are 11.76% more expensive, the Micro instances 25% and the High-Memory instances 14%. The newly introduced APAC-North (Tokyo) region has the highest Linux instance prices. The fact that the US-East region is the cheapest can have many causes, such as higher operational costs (e.g. personnel, taxes, ...) in the other regions.

Table 2.4: Region Comparison On-Demand Pricing of Linux Instances
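These relative premiums are computed with respect to the US-East price. For the Standard Small Linux instance, whose US-East on-demand price is $0.085 per hour, an 11.76% premium corresponds to a price of $0.095 per hour:

```latex
\text{premium} = \frac{p_{\text{region}} - p_{\text{US-East}}}{p_{\text{US-East}}}
               = \frac{0.095 - 0.085}{0.085} \approx 0.1176 = 11.76\%
```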

Table 2.5 presents a comparison of the on-demand prices for Windows instances across the different regions. The same order does not hold for Windows instances: the US-East, EU-West, APAC-SouthEast and APAC-North regions have the same prices, while the US-West region is the most expensive one. There is, however, one exception: the Micro instances are cheapest in the US-East region and have the same price in all other regions. This shows that, even when the most important criterion is to get instances at the lowest possible cost, choosing the US-East region is not always the best option. When dealing with Windows instances, it would be beneficial to take other parameters, such as latency, into account to determine the appropriate region.

Table 2.5: Region Comparison On-Demand Pricing of Windows Instances

2.4.2 Reserved

Table 2.6 presents the hourly reserved prices for the different instances across the existing geographical regions. For Linux instances the US-East region offers the cheapest prices. The US-West, EU-West and APAC-SouthEast (Singapore) regions have the same prices, which are between 25 and 30 percent higher than the US-East prices. The prices in the new APAC-North (Tokyo) region are again (as with the on-demand prices) the highest: about 9 to 11 percent higher than the prices in the US-West, EU-West and the other APAC region. It is important to note that the Tokyo region is the first one to introduce a different fixed fee, which makes this region even more expensive compared to the other regions (its fixed fees are about 5 percent higher).

Table 2.6: Region Comparison Reserved Pricing of Linux Instances

Table 2.7 shows a comparison of the reserved hourly prices for Windows instances across the different regions. The same order holds for Windows instances as for Linux instances. The US-East region is the cheapest, followed by the US-West, EU-West and APAC-SouthEast regions, which all have the same prices, between 13 and 25 percent higher than the US-East prices. For Windows instances the same increased fixed fee applies in the newest region, APAC-North (Tokyo), which makes this region even more expensive. Its hourly prices were already between 7 and 9 percent higher than in the US-West, EU-West and APAC-SouthEast regions, except for the Micro instance price, which is about 23 percent higher in this region.

Table 2.7: Region Comparison Reserved Pricing of Windows Instances

2.4.3 Spot

For the following tables, the average spot price for each possible OS, instance type and region combination was first determined using the spot price history up until the end of March 2011. Table 2.8 presents the spot prices for the Linux instances across the different regions; it shows that the US-East region is the cheapest for all instances. The US-West, EU-West and APAC-SouthEast regions have practically the same average spot prices, which are a little higher than the ones in the US-East region. The Tokyo region is once again the most expensive one. The High-Memory Double Extra Large and Quadruple Extra Large instances, however, seem to be an exception, since their prices do not exceed those of the other regions. The Tokyo region has not existed long enough (only since the beginning of March 2011) to be certain whether this is a robust trend.

Table 2.8: Region Comparison Average Spot Pricing of Linux Instances
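Building such a table boils down to averaging the history of every (OS, instance type, region) combination and selecting the region with the lowest average. The sketch assumes the histories are available in a dictionary keyed by such triples. The key layout, the region labels and the sample values in the usage note are illustrative.

```python
from statistics import mean
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (operating system, instance type, region)


def average_spot_prices(histories: Dict[Key, List[float]]) -> Dict[Key, float]:
    """Unweighted average spot price per OS/instance type/region combination."""
    return {key: mean(prices) for key, prices in histories.items() if prices}


def cheapest_region(averages: Dict[Key, float], os_name: str, instance_type: str) -> Tuple[str, float]:
    """Region with the lowest average spot price for the given OS and type."""
    candidates = [(region, avg) for (o, t, region), avg in averages.items()
                  if o == os_name and t == instance_type]
    if not candidates:
        raise KeyError((os_name, instance_type))
    return min(candidates, key=lambda c: c[1])
```

For example, with histories for `("linux", "m1.small", "us-east-1")` and `("linux", "m1.small", "eu-west-1")`, `cheapest_region(averages, "linux", "m1.small")` returns the region label and its average. A broker could combine this with latency requirements rather than use it on its own.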

Table 2.9 presents a comparison of the average spot prices for Windows instances across the different regions. Roughly the same order holds: the US-East region is the cheapest, followed by the US-West, EU-West and APAC-SouthEast regions, which again all have about the same prices. The Tokyo region would be expected to be the most expensive one, but this is not the case.

Table 2.9: Region Comparison Average Spot Pricing of Windows Instances

For Windows, all instance types except the High-Memory Extra Large and the Micro instance are cheaper in the Tokyo region than in the other regions. The average spot prices in the Tokyo region are, however, still considerably higher than in the US-East region. This could again mean that the observation period was too short to give a good picture of the average spot prices. Alternatively, there may be an initialization phase during which spot prices behave somewhat differently. Such a phase, in which the spot price had not yet settled, was observed in other regions as well. This can be seen in the price fluctuations during the first weeks after spot instances were introduced at the end of 2009 (see Figure 2.17).

Figure 2.17: Average Spot Price for Standard Large Linux instance in the US-East Region between December 14th 2009 and February 14th 2010


2.4.4 Comparison of the Pricing Models

The three pricing models (on-demand, reserved and spot) are now combined into one graph (see Figure 2.18), which clearly shows how the prices of the different instances relate to each other. Note that the reserved prices used in these graphs assume the instance is used for the entire year. This is the best-case scenario for reserved pricing, so reserved pricing is compared to its on-demand and spot counterparts under the most favourable conditions.

Figure 2.18: Instance Pricing Comparison US-East Region

Figure 2.19 is a normalised version of the instance price comparison for the US-East region. The reserved prices are always around 65% of the on-demand prices. As discussed above, this is the best-case scenario, with the smallest possible fixed fee per hour for the one-year period. The average spot price equals about 35% to 45% of the on-demand price, so it is considerably cheaper even than the reserved prices. The Micro instance, however, is an exception: its average spot price is relatively high and only about 4% cheaper than the corresponding reserved price.


Figure 2.19: Instance Pricing Relative Comparison US-East Region
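The normalised figures can be reproduced with a little arithmetic. The minimal sketch below assumes the best case described above: a 1-year reserved instance that runs all 8760 hours of the year, so its fixed fee is spread over every hour. It uses the Standard Small Linux US-East prices quoted later in Chapter 3: $0.085 on-demand, $0.03 reserved plus $227.5 per year, and $0.037 average spot. Class and method names are hypothetical.

```java
class RelativePricing {
    static final double HOURS_PER_YEAR = 365 * 24; // 8760 hours

    // Effective hourly cost of a 1-year reserved instance that runs all year:
    // the fixed fee is spread evenly over every hour of the year.
    static double effectiveReservedHourly(double feePerYear, double reservedHourly) {
        return reservedHourly + feePerYear / HOURS_PER_YEAR;
    }

    // Price expressed as a fraction of the on-demand price.
    static double relativeToOnDemand(double price, double onDemandHourly) {
        return price / onDemandHourly;
    }

    public static void main(String[] args) {
        double onDemand = 0.085, reservedHourly = 0.03, fee = 227.5, avgSpot = 0.037;
        double reserved = effectiveReservedHourly(fee, reservedHourly);
        System.out.printf("reserved: %.1f%% of on-demand%n",
                100 * relativeToOnDemand(reserved, onDemand));
        System.out.printf("spot:     %.1f%% of on-demand%n",
                100 * relativeToOnDemand(avgSpot, onDemand));
    }
}
```

For these prices the sketch gives about 65.8% for reserved and 43.5% for spot, which fall within the ranges stated above.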

Figure 2.20 shows the normalised graph for the EU-West region. Assuming the instances are used the whole year, the reserved prices are about 70% of the on-demand prices, which is relatively high compared to the US-East region. The spot prices in the EU-West region lie between 43% and 53% of the on-demand prices. This is also relatively high compared to the US-East region, where most average spot prices were less than 40% of the on-demand prices. This indicates that either the on-demand prices in the EU-West region are relatively low or those in the US-East region are relatively high.


Figure 2.20: Instance Pricing Relative Comparison EU-West Region

The other regions show a graph similar to the one for the EU-West region. It indicates that the spot and reserved prices for Micro instances lie closer together than the prices of the different pricing models for the other instance types.

2.4.5 Data storage and transfer

For data storage the US-East, EU-West and APAC-SouthEast regions are the cheapest locations; the APAC-North (Tokyo) region is slightly more expensive. The US-West region is the most expensive region for data storage.

Table 2.10: Data Storage Pricing

The Asian regions are the most expensive ones for outgoing data transfer, with the Tokyo region being the most expensive. For incoming data, on the other hand, all regions offer the same price.


Table 2.11: Data Transfer Pricing

2.5 Conclusion

This chapter determined the scope of our model. The analysis of the EC2 instance prices identified the environmental parameters that the heuristic proposed in this thesis will take into account for its resource broker. Other parameters that at first seemed important will be ignored, since it became clear that they do not influence the total cost much. The scope of the model determines the complexity of the broker: there is a trade-off between the completeness of the model and the complexity of the algorithms involved.

The cloud computing market is relatively new and thus very volatile: the product portfolio and prices change continuously. It is nevertheless possible to identify some trends and draw some conclusions from the analysis given. The following items considerably influence the total cost of running a customer's workload on EC2:

• The choice of region will be taken into account, since we have shown that always choosing the US-East region is not an optimal strategy. As mentioned before, the introduced latencies must also be taken into account, since they may matter for certain workloads.

• The workload will be divided between the different pricing models, namely on-demand, reserved and spot instances, so as to minimize the total cost as intelligently as possible.

• For spot instances, the differences between the regions and the noticeable price evolution over the hours of the day have to be accounted for.

• Checkpointing costs, introduced when snapshotting is needed in the case of spot pricing, will be taken into account as well.

The following choices also influence the cost of running on EC2. They will be assumed fixed and will not be taken into account in our heuristics.

• The choice of instance type: our broker assumes it knows which instance type is the most appropriate for a given workload. To make the broker more complete, a workload benchmark could be implemented that determines the corresponding instance type; more on this in Chapter 4.

• The same holds for the choice of the instance's OS. We have seen that different OSes have different prices on EC2, presumably because of differing licensing costs. We will, however, assume the OS to be known for a given workload, so it is no longer a degree of freedom for our heuristic.

• The different availability zones within a region will not be taken into account. This is unnecessary, since all availability zones of a given region have the same prices.

• The long-term price evolution is not considered. For on-demand and reserved pricing we have shown that, up until now, prices have rarely decreased. For spot pricing, however, price trends will be taken into account.

The proposed model needs constant adaptation to the current trends and conclusions, and this process should be automated as much as possible. Moreover, the more price trend characteristics are taken into account, the more complete the model becomes. A more complete model lets our heuristic help the broker minimize the total cost for the cloud consumer more effectively.

CHAPTER 3

RESOURCE SCHEDULING

This chapter explains how, based on the findings of the previous chapter, an intelligent heuristic can be developed that approximates the optimal division between the On-Demand, Reserved and Spot instance pricing models. An optimal division in this case is one that minimizes the total cost for the customer. The environmental characteristics that the heuristic should consider were already presented in Chapter 2. This chapter first presents a way to make an optimal division between Reserved and On-Demand instances. It then discusses how a workload can be described by different models, each of which requires an appropriate resource scheduling algorithm. It concludes with a discussion of how spot instances fit into the derivation.

3.1 Introduction

Cloud computing introduced a business change in many companies, since it alters the way departments interact and the way costs are allocated [31]. Cloud computing provides a layer of abstraction that hides the technical details. This lets companies focus on aligning supply and demand while provisioning infrastructure efficiently. IT departments can react to the business's needs more quickly, since they are isolated from the arcane business of buying, installing and managing IT infrastructure. Automating the resource allocation process is what enables a customer to benefit fully from the advantages of cloud computing. Automation also ensures that infrastructure is managed as efficiently as possible. This automatic resource allocation heuristic should try to minimize the total cost for the cloud customer. This chapter combines the capacity management approach, which focuses on trend and threshold models, with an emphasis on workload analysis, intelligent workload placement and resource allocation.


In the cloud computing environment it is important to perform strategic optimization, which is a proactive, long-term placement of resources based on supply and demand. This form of resource allocation can have two possible consequences [3]:

• Over-provisioning: the actual demand falls short of the forecast demand, so that too many resources are provisioned. This is a problem, since it leads to resources with no or low utilization, and money is wasted on resources that are not really needed.

• Under-provisioning: the actual demand exceeds the planned capacity. This is a problem, since it can lead to operational risks.

Figure 3.1 illustrates these properties and shows that a traditional data center always encounters one or both of these problems. A traditional solution is not as flexible in allocating resources as a cloud solution. One of the key characteristics of cloud computing is on-demand access to resources. Even so, exactly matching the provisioned amount of resources to the actual demand is still a rather difficult task. It is, however, possible to use the Auto Scaling feature of AWS to automatically add on-demand instances when the current demand requires more resources.

Figure 3.1: Traditional Data center versus AWS Resource Provisioning

Amazon offers a reserved pricing model, which reintroduces over-provisioning at the instance level, since one may hold more reserved instances than are needed at a particular moment. To achieve the best fit, meaning the one that minimizes the total cost, it is important to have a clear view of the resource demand over time. In this chapter we assume the workload to be known for an entire year when determining the resource allocation scheme. Even so, foreseeing or predicting the resource demand remains essential for drawing some of the conclusions made in this chapter. Further research should be done on patterns in the workload of typical cloud applications. Monitoring tools should also be developed that feed information about the evolution and trends in resource demand back to our broker. We restrict the planning of resource allocation to a one-year period, since it is hard to estimate what the resources will be worth in three years. A customer cannot foresee how his workload will evolve over such a long period, as technology and customers' needs change rapidly.

3.2 Optimal Division between Pricing Models

In this section we describe how to divide the workload between instances of the different pricing models, namely On-Demand, Reserved and Spot instances. We show that an optimal division is possible for reserved and on-demand instances, while bringing spot instances into the picture makes it more difficult.

3.2.1 Reserved vs On-Demand Instances

In this section we determine the optimal division between on-demand and reserved instances; the spot pricing model is not taken into account for now. To make the discussion more tractable, we start from an example workload. The workload in Figure 3.2 gives the required number of Standard Small Linux instances in the US-East region for a period of 30 days. Note that the on-demand price for such an instance is $0.085/hour. The reserved price amounts to $0.03/hour, supplemented with a fixed fee of $227.5/year per instance.

Figure 3.2: Example Workload

The example workload requires 87 instance-days in total, and the maximum number of instances needed on a single day is 4. So the maximum number of reserved instances needed will also be 4. We now extrapolate the workload from a period of a month (30 days) to a year by multiplying the calculated price by 365/30. The tables in Figure 3.3 present all the possible divisions between on-demand and reserved instances.

Figure 3.3: Division overview Reserved vs On-Demand

Using 3 reserved instances thus yields the desired result, namely the solution with the lowest total cost. We now show that determining the tipping point, denoted by x, amounts to solving a simple equation. The tipping point expresses how much time an instance has to be in use for a reserved instance to be cheaper than an on-demand instance.

cost reserved instance = 227.5 + x ∗ 0.03

cost on-demand instance = x ∗ 0.085

The costs of a reserved and an on-demand instance are equal when x equals 4136.36 instance hours, which is about 172.35 days or 47.22% of a year. Applying this technique to the example gives the results in Table 3.1. For each possible number of instances, it shows the percentage of time during which the example workload requires that many instances.
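The tipping point follows from equating the two cost lines above, which gives x = fee / (on-demand hourly rate − reserved hourly rate). A minimal Java sketch with hypothetical class and method names:

```java
class TippingPoint {
    static final double HOURS_PER_YEAR = 365 * 24;

    // Solves fee + x * reservedHourly = x * onDemandHourly for x (instance hours).
    static double breakEvenHours(double fee, double reservedHourly, double onDemandHourly) {
        if (onDemandHourly <= reservedHourly) {
            throw new IllegalArgumentException("on-demand rate must exceed reserved rate");
        }
        return fee / (onDemandHourly - reservedHourly);
    }

    public static void main(String[] args) {
        double hours = breakEvenHours(227.5, 0.03, 0.085);
        System.out.printf("%.2f hours = %.2f days = %.2f%% of a year%n",
                hours, hours / 24, 100 * hours / HOURS_PER_YEAR);
    }
}
```

For the Standard Small Linux prices in US-East, main prints roughly 4136 hours, 172 days and 47% of a year, matching the values above.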

Table 3.1: Percentage of time a certain number of instances is required by theexample workload

We notice that 3 is the largest number of reserved instances for which our tipping point value of 47.22% is still reached. This yields an optimal division containing 3 reserved instances. On days that need more than 3 instances simultaneously, these reserved instances are supplemented with on-demand instances.
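The selection rule can be stated as follows: reserve the k-th instance only if at least k instances are needed during at least the tipping-point fraction of the time. The sketch below applies that rule to a per-day demand profile. It assumes Table 3.1 counts the days on which at least k instances are required, since the k-th reserved instance is busy exactly on those days. The names are hypothetical.

```java
class ReservedCount {
    // demand[i] = number of instances needed in period i (e.g. per day).
    // Returns the largest k such that the fraction of periods needing at
    // least k instances is still >= tippingFraction.
    static int optimalReserved(int[] demand, double tippingFraction) {
        if (demand.length == 0) {
            return 0;
        }
        int max = 0;
        for (int d : demand) {
            max = Math.max(max, d);
        }
        int best = 0;
        for (int k = 1; k <= max; k++) {
            int periods = 0;
            for (int d : demand) {
                if (d >= k) periods++;
            }
            double fraction = (double) periods / demand.length;
            if (fraction < tippingFraction) {
                break; // fractions only shrink as k grows
            }
            best = k;
        }
        return best;
    }
}
```

Because the fraction of days with demand of at least k can only decrease as k grows, the loop stops at the first level that misses the tipping point.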

Table 3.2 gives an overview of the tipping point values for Linux instances when instances are reserved for a 1-year period.


Table 3.2: On-Demand versus Reserved Linux instances 1-Year Overview

Note that the choice of geographical region influences the optimal division. Micro instances, for example, have the lowest tipping point value in the Tokyo region. In Tokyo, the reserved pricing model therefore already beats the on-demand model for this instance type at smaller loads than in other regions.

Table 3.3 gives an overview for Windows instances for a 1-year period.

Table 3.3: On-Demand versus Reserved Windows instances 1-Year Overview

We notice that Windows instances have lower tipping point values than the corresponding Linux instances. The tipping points calculated for instances used during a 3-year period can be found in Appendix F.

The choice of using the reserved pricing model depends strongly on how certain one is about the expected workload. This raises the interesting research question of whether it is cheaper to take too few or too many reserved instances. We derive the condition under which reserving one instance fewer than the optimal amount is cheaper than reserving one more. For our example workload, 3 was found to be the optimal number of reserved instances.

PO = on-demand hourly instance price = $0.085 for a Linux Standard Small instance in the US-East region

PRH = reserved hourly instance price = $0.03 for a Linux Standard Small instance in the US-East region

PRY = reserved fixed upfront instance fee = $227.50 for a 1-year period

S = scaling factor (in instance hours) = (365 ∗ 24)/30 = 292, to scale from a 30-day period to the 1-year period

OPT = optimal number of reserved instances = 3 for our example

T = total number of instance-days needed per month = 87 for our example workload

X = number of on-demand instance-days needed per month (30 days) in the optimal-minus-one situation = 29. With 2 reserved instances (the optimal amount being 3), our example workload uses 58 reserved instance-days each month. The rest of the workload is handled by 29 on-demand instance-days.

Y = number of on-demand instance-days needed per month (30 days) in the optimal-plus-one situation = 0. With 4 reserved instances (the optimal amount being 3), the reserved instances handle all 87 required instance-days, so no on-demand instance-days are needed.

Since we want to know whether it is better to take one more or one less reserved instance than the optimal amount, we first determine and compare the total cost of both possibilities.

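$$
\begin{aligned}
\mathit{OPTMIN1} &= X \cdot \mathit{PO} \cdot S + (T - X) \cdot \mathit{PRH} \cdot S + (\mathit{OPT} - 1) \cdot \mathit{PRY} \\
&= 29 \cdot \left(0.085 \cdot \frac{365 \cdot 24}{30}\right) + (87 - 29) \cdot \left(0.03 \cdot \frac{365 \cdot 24}{30}\right) + 2 \cdot 227.5 \\
&= 29 \cdot 24.82 + 58 \cdot 8.76 + 455 \\
&= 719.78 + 963.08 \\
&= 1682.86
\end{aligned}
$$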


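$$
\begin{aligned}
\mathit{OPTPLUS1} &= Y \cdot \mathit{PO} \cdot S + (T - Y) \cdot \mathit{PRH} \cdot S + (\mathit{OPT} + 1) \cdot \mathit{PRY} \\
&= 0 \cdot \left(0.085 \cdot \frac{365 \cdot 24}{30}\right) + (87 - 0) \cdot \left(0.03 \cdot \frac{365 \cdot 24}{30}\right) + 4 \cdot 227.5 \\
&= 0 \cdot 24.82 + 87 \cdot 8.76 + 910 \\
&= 0 + 1672.12 \\
&= 1672.12
\end{aligned}
$$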

The OPTPLUS1 solution is the better choice, since it has the lower total cost. Formally, we make the choice by determining whether the difference between OPTMIN1 and OPTPLUS1 is positive or negative.

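$$
\begin{aligned}
\mathit{OPTMIN1} - \mathit{OPTPLUS1} &= (X - Y) \cdot \mathit{PO} \cdot S + (Y - X) \cdot \mathit{PRH} \cdot S - 2 \cdot \mathit{PRY} \\
&= 10.74 > 0 \\
\Rightarrow \quad \mathit{OPTMIN1} &> \mathit{OPTPLUS1}
\end{aligned}
$$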

The tipping point for taking more or fewer reserved instances is the point where the difference between OPTMIN1 and OPTPLUS1 equals 0.

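$$
\begin{aligned}
0 &= (X - Y) \cdot \mathit{PO} \cdot S + (Y - X) \cdot \mathit{PRH} \cdot S - 2 \cdot \mathit{PRY} \\
2 \cdot \mathit{PRY} &= (X - Y) \cdot (\mathit{PO} \cdot S - \mathit{PRH} \cdot S) \\
X - Y &= \frac{2 \cdot \mathit{PRY}}{\mathit{PO} \cdot S - \mathit{PRH} \cdot S} \\
X - Y &= \frac{455}{24.82 - 8.76} \\
X - Y &= 28.33
\end{aligned}
$$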

When X exceeds Y by 28.33 instance-days or more, taking 1 more reserved instance is the better choice. Recall that X is the number of on-demand instance-days needed in a 30-day period when taking 1 less reserved instance. Y is the number of on-demand instance-days needed in a 30-day period when taking 1 more reserved instance. As shown in Table 3.4, this tipping point is almost the same across all instance types and geographical regions, except for the Micro instance type.

Table 3.4: Tipping Point at which Taking One More Reserved (versus On-Demand) Instance is Better
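The comparison between OPTMIN1 and OPTPLUS1 and the resulting threshold on X − Y can be checked numerically. The minimal sketch below reproduces the worked example. Method names are hypothetical; the formulas are those given above.

```java
class OneMoreOrLess {
    // S: scales instance-days per 30-day month to instance hours per year.
    static final double S = 365.0 * 24 / 30; // 292

    // Yearly cost with `reserved` reserved instances, when onDemandDays of the
    // totalDays monthly instance-days run on demand (the OPTMIN1/OPTPLUS1 formula).
    static double yearlyCost(int reserved, int totalDays, int onDemandDays,
                             double po, double prh, double pry) {
        return onDemandDays * po * S + (totalDays - onDemandDays) * prh * S + reserved * pry;
    }

    // Value of X - Y at which OPTMIN1 and OPTPLUS1 cost the same.
    static double threshold(double po, double prh, double pry) {
        return 2 * pry / (po * S - prh * S);
    }

    public static void main(String[] args) {
        double po = 0.085, prh = 0.03, pry = 227.5;
        double optMin1 = yearlyCost(2, 87, 29, po, prh, pry);  // OPT - 1 = 2, X = 29
        double optPlus1 = yearlyCost(4, 87, 0, po, prh, pry);  // OPT + 1 = 4, Y = 0
        System.out.printf("OPTMIN1 = %.2f, OPTPLUS1 = %.2f, threshold = %.2f%n",
                optMin1, optPlus1, threshold(po, prh, pry));
    }
}
```

Since the threshold depends only on PRY / (PO − PRH), instance types whose fee and hourly rates scale together end up with nearly the same threshold. This is consistent with the near-constant values in Table 3.4.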


3.2.2 Spot Instances

When the technique used for the on-demand versus reserved division is applied to spot instances, using 100% spot instances turns out to be always the cheapest option. As mentioned before, this does not mean that spot instances should be chosen blindly. The quality of service constraints and the characteristics of the particular workload should be taken into account. If the workload, for example, does not allow easy snapshotting, it would be better to use a reserved or on-demand instance for it. Moreover, this analysis uses average spot prices, so taking only spot instances is probably not always the best choice. Spot prices may be high at some moments, and during those moments we would be better off using on-demand or reserved instances.

First we investigate the division between Spot and On-Demand instances for our example, which uses Standard Small Linux instances.

cost on-demand instance = x ∗ 0.085

cost spot instance = x ∗ 0.037

The average spot price is lower than the on-demand rate, so choosing spot instances will always be the cheapest solution. Table 3.5 presents the average spot price as a percentage of the on-demand price for the corresponding instance. Since all percentages are below 100, spot instances will always be the choice with the lowest total cost.

Table 3.5: Spot Prices in Percentage of the On-Demand Prices

Next we apply the technique for the division between spot and reserved instances.

cost reserved instance = 227.5 + x ∗ 0.03

cost spot instance = x ∗ 0.037

In this case a reserved instance only becomes cheaper once x exceeds 32500 instance hours, which is about 1354.17 days. This is a period of more than a year, so choosing the spot instance is always the cheapest option. Table 3.6 shows that this conclusion holds for every instance-region combination. The break-even point always lies beyond 365 days, even though we only took into account the fixed fee of a one-year reservation. The 'Not Available' (NA) fields mark the cases where the average spot price was already lower than the reserved hourly rate. In those cases the spot choice always yields the solution with the lowest cost.

Table 3.6: Reserved prices become cheaper than spot prices after the stated number of days
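The same break-even reasoning, with the average spot price in place of the on-demand price, gives the entries of Table 3.6. The minimal sketch below uses hypothetical names. An empty result stands for the 'NA' case, where the average spot price does not exceed the reserved hourly rate.

```java
import java.util.OptionalDouble;

class SpotVsReserved {
    static final int DAYS_PER_YEAR = 365;

    // Days of continuous use after which a 1-year reserved instance becomes
    // cheaper than paying the average spot price. Empty (the "NA" case) when
    // the average spot price does not exceed the reserved hourly rate.
    static OptionalDouble breakEvenDays(double fee, double reservedHourly,
                                        double avgSpotHourly) {
        if (avgSpotHourly <= reservedHourly) {
            return OptionalDouble.empty(); // reserving never pays off
        }
        return OptionalDouble.of(fee / (avgSpotHourly - reservedHourly) / 24);
    }

    // Spot wins within a one-year reservation if break-even lies beyond 365 days.
    static boolean spotCheaperForOneYear(double fee, double reservedHourly,
                                         double avgSpotHourly) {
        OptionalDouble days = breakEvenDays(fee, reservedHourly, avgSpotHourly);
        return !days.isPresent() || days.getAsDouble() > DAYS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println(breakEvenDays(227.5, 0.03, 0.037));
        System.out.println(spotCheaperForOneYear(227.5, 0.03, 0.037));
    }
}
```

For the example ($227.5 fee, $0.03 reserved hourly rate, $0.037 average spot price), the break-even is about 1354 days, well beyond one year.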

When we decide to use spot instances, we need to determine an appropriate bid price. The bid should be high enough to obtain the desired number of spot instances, but as low as possible to keep the total cost low. We describe how to determine the value of the maximum bid at the end of this chapter.

3.3 Workload Models

There is no easy access to detailed cloud workload data, so we defined two workload models that will be used within this thesis.

3.3.1 Workload Models

A distinction is made in the broker component between two workload models:

1. In the first workload model, a task has a deadline and a certain number of VM hours that need to be executed. These work hours can be distributed over time so that they best fit the schedule. This workload model can be compared to the BOINC Powered Projects [32], which have a deadline (number of days) and a processing time (task length).

2. The second model specifies a task with a deadline and a certain length. For every work hour, it also specifies how many VMs are needed to execute the corresponding part of the task. This workload model could be compared to a system that takes batch jobs, where the total load takes on a wave form.

3.3.2 Workload Constraints

An example task description is given for both workload models, showing the different characteristics and constraints involved. Each task has a name and a description. The user of our broker can also specify the instance type and operating system of the instance that should be used for the task. It would be best to benchmark the task to determine the appropriate instance type, but as discussed this falls outside the scope of this thesis. A task can optionally be pinned to a certain region, which is appropriate when low latency is important for the task. When the geographic region field is left blank, our broker determines where it is cheapest to place the workload. We also specify whether the task can make use of spot instances, in other words whether the application can easily deal with snapshotting.
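The task fields listed above can be summarized as a small data type. The records below are a hypothetical sketch, not the classes of the thesis prototype. A blank region is taken to mean that the broker chooses the region, and the instance type and OS strings in the test are illustrative values only.

```java
// Hypothetical task description holding the fields listed above; the names
// are illustrative and not those of the thesis prototype.
record TaskSpec(String name, String description, String instanceType,
                String operatingSystem, String region, boolean spotAllowed) {
    // A blank (or missing) region lets the broker pick the cheapest one.
    boolean brokerChoosesRegion() {
        return region == null || region.isBlank();
    }
}

// Workload model 1: a total number of VM hours to run before the deadline.
record Model1Task(TaskSpec spec, int deadlineHour, int lengthVmHours) {}

// Workload model 2: per task hour, the number of VMs needed; the task's
// width in hours is the length of that array.
record Model2Task(TaskSpec spec, int deadlineHour, int[] vmsPerHour) {
    int widthHours() {
        return vmsPerHour.length;
    }
}
```

Keeping the per-hour VM counts of model 2 in a separate array mirrors the separate workload specification file that the task specification references.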

1. Workload Model 1 (total VM hours needed is specified):

Figure 3.4: Workload Model 1 Specification

This task is specified with a deadline, which is the moment in time when the task has to be finished, and a length, which is the total number of instance hours the task will take.

2. Workload Model 2 (every hour #VMs needed is specified):

Figure 3.5: Workload Model 2 Specification

A task of workload model two is also constrained by a deadline. The specified length reflects the total width of the task in hours, but each hour can require multiple instances to complete the corresponding part of the task. These amounts are specified in a separate workload specification file, which is referenced in the task specification. The workload generator takes an upper and a lower boundary and fluctuates the load between them. It starts with a random load (within the boundaries) for the first hour of the task. Random amounts of workload are then added in each consecutive task hour until the upper bound is reached. From then on, random amounts of workload are subtracted until the lower boundary is reached. We repeat this process to generate a workload amount for every hour of the task. The values are then adjusted for every available instance type according to the Elastic Compute Units (ECUs) that Amazon assigns to it, so that the ratio of workload to ECUs is the same for every instance type. These workload values represent the number of instances required, so our broker rounds them up to the nearest integer. A snippet from a CSV file containing such a load description is given in Figure 3.6.

Figure 3.6: Workload Description File
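The wave-shaped load generation described in item 2 can be sketched as follows. The text does not give the size of each random step, so `maxStep` is an assumed parameter. The sketch also assumes that the load is expressed in ECUs. Dividing by an instance type's ECU count and rounding up then yields the number of instances of that type. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

class WorkloadGenerator {
    // Random walk bouncing between lower and upper: random increments until
    // the upper bound is hit, then random decrements until the lower bound
    // is hit, and so on. maxStep is an assumed parameter.
    static double[] generateLoad(int hours, double lower, double upper,
                                 double maxStep, Random rnd) {
        double[] load = new double[hours];
        double current = lower + rnd.nextDouble() * (upper - lower); // random start
        boolean rising = true;
        for (int h = 0; h < hours; h++) {
            load[h] = current;
            double step = rnd.nextDouble() * maxStep;
            if (rising) {
                current = Math.min(upper, current + step);
                if (current >= upper) rising = false;
            } else {
                current = Math.max(lower, current - step);
                if (current <= lower) rising = true;
            }
        }
        return load;
    }

    // Converts load (assumed to be in ECUs) into whole instances of a type
    // offering ecuPerInstance ECUs, rounding up.
    static int[] instancesNeeded(double[] load, double ecuPerInstance) {
        int[] n = new int[load.length];
        for (int h = 0; h < load.length; h++) {
            n[h] = (int) Math.ceil(load[h] / ecuPerInstance);
        }
        return n;
    }

    public static void main(String[] args) {
        double[] load = generateLoad(24, 2.0, 10.0, 3.0, new Random(42));
        System.out.println(Arrays.toString(instancesNeeded(load, 1.0))); // 1 ECU, e.g. Standard Small
        System.out.println(Arrays.toString(instancesNeeded(load, 4.0))); // 4 ECUs, e.g. Standard Large
    }
}
```

Clamping at the bounds keeps every generated value inside the interval, and the direction flips as soon as a bound is touched.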

3.4 Workload Scheduling

This section describes how to get from a number of constrained workload descriptions to a resource schedule. The schedule must satisfy all deadline constraints and divide the load as evenly as possible over time. This yields instances that are loaded for as much of the time as possible, and an instance can be marked as reserved whenever it reaches the tipping point. We schedule over a one-year period, since predicting a customer's workload over longer time frames is not feasible. EC2 is evolving quickly, so it could start offering new pricing models that better fit one's workload. Moreover, it is hard (or impossible) to predict a realistic workload, because it is uncertain what technology will look like in three years' time. Nor can we predict how the company will evolve during that period. For shorter time frames, on the other hand, we are unable to make intelligent decisions because of the minimum 1-year fixed fee for reserved instances.

3.4.1 Workload Model 1 (total VM hours needed is specified)

To determine the optimal division between reserved and on-demand instances, the tasks' workloads must be scheduled so that the resources are in use for as much of the time as possible. Here we describe how to schedule the workload across resources for the workload model that describes a task by the number of instance hours to be executed before a given deadline. First, the following naive scheduling algorithm is used, which assigns all the task hours of the tasks that need to be scheduled to different resources. On a single resource, task hours are assigned until all task hours are distributed or the instance hour corresponding to the task's deadline is reached. If not all task hours of the task fit on a resource, an extra resource is added.

for all t in tasks do
    while not t.isDistributed() do
        r = createNewResource();
        r.addPartUntilDeadlineOrEndReached(t);
        resources.append(r);
    end while
end for

For our example workload, this algorithm yields a total price of 619.92 US dollars to run on EC2 (illustrated by Figure 3.7 and Figure 3.8).

Figure 3.7: Scheduling Total Workload (basic)

Figure 3.8: Price Schedule Total Workload in US Dollars (basic)
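A runnable Java rendering of the naive pseudocode might look as follows. As an assumption, a resource is modelled as an array of hourly slots, each holding the name of the task that occupies that hour. Comments map each step to the corresponding pseudocode call; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveScheduler {
    // Workload model 1 task: totalHours VM hours due before deadlineHour.
    record Task(String name, int deadlineHour, int totalHours) {}

    // Each resource is a timeline of hourly slots holding a task name, or
    // null when idle. Every task gets fresh resources, filled from hour 0
    // up to its deadline, until all of its hours are placed.
    static List<String[]> schedule(List<Task> tasks, int horizonHours) {
        List<String[]> resources = new ArrayList<>();
        for (Task t : tasks) {
            if (t.deadlineHour() <= 0 || t.deadlineHour() > horizonHours) {
                throw new IllegalArgumentException("deadline outside horizon: " + t);
            }
            int remaining = t.totalHours();
            while (remaining > 0) {                       // while not t.isDistributed()
                String[] r = new String[horizonHours];    // createNewResource()
                for (int h = 0; h < t.deadlineHour() && remaining > 0; h++) {
                    r[h] = t.name();                      // addPartUntilDeadlineOrEndReached(t)
                    remaining--;
                }
                resources.add(r);                         // resources.append(r)
            }
        }
        return resources;
    }
}
```

The deadline check guards against an endless loop: with a deadline at hour 0, no hour could ever be placed.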

To perform the scheduling more intelligently (preferably optimally) for this workload model, the tasks are first sorted in earliest-deadline-first order. Then, for every task, we assign the task hours to the instance hours of the existing resources. We start from the earliest available instance hour of a resource and process the task hours until all of them are distributed or the instance hour corresponding to the task's deadline is reached. If all existing instances are filled up to the task's deadline and task hours remain, an extra resource is added. This process yields an optimal scheduling solution that provisions the right number of reserved instances to obtain the lowest total cost possible. Section 3.2.1 explained how to determine the boundary between a reserved and an on-demand instance, that is, the VM-hour utilization above which renting a reserved instance is cheaper.

tasks.performEDFSort();
if resources.isEmpty() then
    resources.addNew();
end if
for all t in tasks do
    for all r in resources do
        r.addPartUntilDeadlineOrEndReached(t);
        if t.isDistributed() then
            break;
        end if
        if r.isLast() then
            resources.addNew();
        end if
    end for
end for

This intelligent scheduling algorithm resulted in a solution with a total cost of 516 US dollars (illustrated by Figure 3.9 and Figure 3.10). This is a decrease of 16.76 percent compared to the naive scheduling method. The more intelligent solution uses the reserved pricing model for all four resources.

Figure 3.9: Scheduling Total Workload (optimized)

Figure 3.10: Price Schedule Total Workload in US Dollars (optimized)
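The earliest-deadline-first variant can be made runnable in the same way. This sketch models a resource as an array of hourly slots, which is an assumption and not the prototype's data structure. It opens a new resource only when the existing ones are full up to the task's deadline, which also covers the case where no resource exists yet.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class EdfScheduler {
    // Workload model 1 task: totalHours VM hours due before deadlineHour.
    record Task(String name, int deadlineHour, int totalHours) {}

    // Earliest-deadline-first, first-fit: each task fills the free hourly
    // slots (earliest first) of the existing resources up to its deadline;
    // a new resource is opened only when all existing ones are full up to
    // that deadline.
    static List<String[]> schedule(List<Task> tasks, int horizonHours) {
        List<Task> sorted = new ArrayList<>(tasks);
        sorted.sort(Comparator.comparingInt(Task::deadlineHour)); // performEDFSort()
        List<String[]> resources = new ArrayList<>();
        for (Task t : sorted) {
            if (t.deadlineHour() <= 0 || t.deadlineHour() > horizonHours) {
                throw new IllegalArgumentException("deadline outside horizon: " + t);
            }
            int remaining = t.totalHours();
            for (int i = 0; remaining > 0; i++) {
                if (i == resources.size()) {
                    resources.add(new String[horizonHours]); // resources.addNew()
                }
                String[] r = resources.get(i);
                for (int h = 0; h < t.deadlineHour() && remaining > 0; h++) {
                    if (r[h] == null) {                      // free slot before deadline
                        r[h] = t.name();
                        remaining--;
                    }
                }
            }
        }
        return resources;
    }
}
```

On the same two-task input used for the naive sketch (15 hours due by hour 10 and 3 hours due by hour 5), this version needs 2 resources instead of 3. The reason is that the longer task reuses the idle hours left on the first resource.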

3.4.2 Workload Model 2 (every hour #VMs needed is specified)

Here we describe how to schedule the workload across resources for the workload model that describes a task by its length. In this model the length is the total width of the task in hours, and each hour of the task can need multiple instances to complete its part of the task. The task is constrained by a deadline as well. The basic scheduling technique simply starts every task at the start date and schedules all hours of the task adjacently, without gaps. Applying it to our example workload gives a total cost of 37509.08 US dollars (illustrated by Figure 3.11 and Figure 3.12).

Figure 3.11: Scheduling Workload Per Hour (basic)

Figure 3.12: Price Schedule Workload Per Hour in US Dollars (basic)

We now introduce a more intelligent scheduling algorithm (illustrated by Figure 3.13) that tries to maximize the number of reserved instances that can be used while meeting the given deadlines. In other words, it maximizes the load on the needed resources, so that the reserved tipping point is reached as often as possible.

Figure 3.13: Example Workload Model 2 Scheduling

First we developed a way to schedule these tasks optimally. Finding the optimal solution starts with calculating all the different divisions of the workload hours over the time available for that task. The number of such divisions equals the number of combinations of x out of y. Here y is the total number of time slots available between the start and deadline time, and x is the number of time slots of the task. The combination formula is given in Figure 3.14.

$$\binom{y}{x} = \frac{y!}{x!\,(y-x)!}$$

Figure 3.14: Combination formula for the number of possible combinations of x objects from a set of y objects.

Because not all possible combinations can be stored in memory at once, a combination is only generated at the moment it is used. This is made possible by a combination generator class that is initialized with x and y and can be used to iterate over all possible combinations. Determining the optimal task schedule thus reduces to selecting the best schedule among all candidates: we calculate the cost of each schedule and pick the cheapest one. However, we concluded that the time complexity of this algorithm is too high. The reason is that all possible combinations are tried. The only constraints are that the task hours are scheduled in order and that the schedule does not exceed the deadline. This technique becomes infeasible when the time between the task's start and deadline is much larger than the task length. In that case there are many ways to schedule the task, and generating even a single schedule is already time-consuming. For instance, a single 10-hour task that may be placed anywhere in a 48-hour window already has C(48,10), or about 6.5 billion, possible schedules.
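A combination generator of the kind described can be written as an iterator. It produces the next combination in lexicographic order on demand, so only one combination is held in memory at a time. The class below is a hypothetical sketch. Each combination is a sorted array of slot indices, so placing the task's hours on those slots keeps them in order.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Lazily enumerates all x-element subsets of {0, ..., y-1} in lexicographic order.
class CombinationGenerator implements Iterator<int[]> {
    private final int x;
    private final int y;
    private int[] next; // the combination to return next, or null when exhausted

    CombinationGenerator(int x, int y) {
        if (x < 0 || x > y) {
            throw new IllegalArgumentException("need 0 <= x <= y");
        }
        this.x = x;
        this.y = y;
        next = new int[x];
        for (int i = 0; i < x; i++) {
            next[i] = i; // first combination: 0, 1, ..., x-1
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public int[] next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        int[] current = next.clone();
        advance();
        return current;
    }

    // Moves `next` to its lexicographic successor, or to null after the last one.
    private void advance() {
        int i = x - 1;
        while (i >= 0 && next[i] == y - x + i) {
            i--; // this position is already at its maximum
        }
        if (i < 0) {
            next = null;
            return;
        }
        next[i]++;
        for (int j = i + 1; j < x; j++) {
            next[j] = next[j - 1] + 1;
        }
    }

    public static void main(String[] args) {
        // A 2-hour task in a 4-slot window: C(4, 2) = 6 placements.
        CombinationGenerator gen = new CombinationGenerator(2, 4);
        while (gen.hasNext()) {
            System.out.println(Arrays.toString(gen.next()));
        }
    }
}
```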

We therefore had to settle for a suboptimal solution that applies the above algorithm to smaller intervals: we divide the time into intervals and distribute the workload evenly among them. The interval size is chosen by counting the combinations possible within an interval and shrinking the intervals until this count drops below a given threshold. This threshold can be raised on a system with more CPU and memory resources, or lowered when a quicker scheduling result is desired. Within each bucket (interval), we try all possible solutions and keep the one that needs the fewest instances (resources). Concatenating the bucket schedules then yields the total schedule. Minimizing the number of resources needed already makes the solution suboptimal in terms of total cost. The reason is that this requirement yields an even distribution that is not necessarily the cheapest solution. Because each bucket covers only a small time frame, it is difficult to select the solution that will use the most reserved instances. Note that the pricing models are allocated to the instances of the schedule post-mortem, that is, after the scheduling is completed; the allocation phase is explained in detail in Section 4.4. To select the best solution, we would have to determine whether all buckets together would reach the reserved tipping point for a certain resource. That would take away the advantage of using intervals, namely not having to keep all possible combinations in memory simultaneously. We prefer an option that only uses information within the bucket itself, since this simplifies the solution and allows the algorithm to be made recursive. We also tried a scheduler that extrapolates the number of hours an instance was needed in the interval to the larger time frame, so as to decide between reserved and on-demand this way. This did not seem to give better results than the scheduler that minimized the number of instances needed in each interval. The described algorithm is given in pseudocode here.

for all t in tasks do
    buckets.divideEqually(t);
end for
for all b in buckets do
    // try all combinations, choose the one
    // that minimizes the number of needed instances
    b.makePlanning();
end for
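The makePlanning step can be illustrated with a small brute-force planner. The sketch below illustrates the idea under stated assumptions and is not the prototype's code. The class `BucketPlanner` is hypothetical. Each task portion in the bucket is modelled as an array giving the number of instances needed per task hour, and its hours must land on distinct bucket hours in their original order. Every placement is tried, and the one with the lowest peak load, i.e. the fewest instances, is kept.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical exhaustive planner for one bucket: tries every in-order
 * placement of every task portion and keeps the one with the lowest peak
 * load, which equals the number of instances needed.
 */
class BucketPlanner {
    private final int width;            // hours in the bucket
    private final List<int[]> profiles; // instances needed per task hour, one array per task
    private final int[] load;           // current load per bucket hour
    private final int[][] placement;    // working copy: bucket hour chosen for each task hour
    private int bestPeak = Integer.MAX_VALUE;
    private int[][] bestPlacement;

    BucketPlanner(int width, List<int[]> profiles) {
        this.width = width;
        this.profiles = profiles;
        this.load = new int[width];
        this.placement = new int[profiles.size()][];
        for (int t = 0; t < profiles.size(); t++) {
            if (profiles.get(t).length > width) throw new IllegalArgumentException("task does not fit");
            placement[t] = new int[profiles.get(t).length];
        }
    }

    /** Returns the minimal number of instances; the placement is available afterwards. */
    int makePlanning() {
        placeTask(0);
        return bestPeak;
    }

    int[][] bestPlacement() {
        return bestPlacement;
    }

    // All tasks before t are placed; place task t, or evaluate a complete placement.
    private void placeTask(int t) {
        if (t == profiles.size()) {
            int peak = Arrays.stream(load).max().orElse(0);
            if (peak < bestPeak) {
                bestPeak = peak;
                bestPlacement = new int[placement.length][];
                for (int i = 0; i < placement.length; i++) bestPlacement[i] = placement[i].clone();
            }
            return;
        }
        placeHour(t, 0, 0);
    }

    // Choose a bucket hour >= firstFree for task hour i of task t (keeps hours in order).
    private void placeHour(int t, int i, int firstFree) {
        int[] profile = profiles.get(t);
        if (i == profile.length) {
            placeTask(t + 1);
            return;
        }
        // leave enough room for the remaining task hours
        for (int h = firstFree; h <= width - (profile.length - i); h++) {
            placement[t][i] = h;
            load[h] += profile[i];
            placeHour(t, i + 1, h + 1);
            load[h] -= profile[i]; // undo before trying the next hour
        }
    }
}
```

Because the search is exhaustive, its cost grows with the product of the per-task combination counts, which is why the bucket size has to be capped by a threshold.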

Using this scheduling algorithm we obtain a total price of 34,531.56 US dollars (illustrated by Figure 3.15 and Figure 3.16), which corresponds to a decrease of 7.94 percent of the total cost compared to the basic scheduling result. Only 27 instances are required now, instead of the 39 instances that were needed at one point in the basic version. Thirteen of these instances are reserved, whereas the basic scheduling version used only 11 reserved instances.

Figure 3.15: Scheduling Workload Per Hour (optimized)

Figure 3.16: Price Schedule Workload Per Hour in US Dollars (optimized)


3.5 Spot Decision Model

The introduction of spot instances in our heuristic remains challenging. Our spot bid handling is based on the findings of “Decision Model for Cloud Computing under SLA Constraints” [1], co-authored by Derrick Kondo. The paper presents a probabilistic model that can be used to answer the question of how to bid given certain workload constraints. Our broker can apply this model to automatically generate spot price bids that meet the reliability and performance requirements. The model is tailored to environments where resource pricing and reliability vary significantly and dynamically, and where the number of resources allocated initially is flexible and almost unbounded. The authors of the ‘Decision Model’ also provide an implementation of their algorithms, which helps us determine the bid price that minimizes the total cost for a task of a certain length running on a certain instance type. We ported this implementation, ‘SpotModel’, to Java so that it can act as a component of our broker prototype. An explanation of how the spot model is used in our broker can be found in section 4.4.2.

3.5.1 Checkpointing

Spot instances are terminated, and are no longer available, as soon as the current spot price exceeds the customer's bid. It is therefore a good idea to take snapshots of the work that has already been done. We consider the following two checkpointing schemes:

• OPT: Optimal checkpointing means that a snapshot is taken just before a failure. A failure occurs when there is a gap, i.e. a period during which we do not get access to a spot instance because the spot price exceeds our maximum bid price. Note that optimal checkpointing is impossible in practice when the spot prices for the period in which we are scheduling tasks are not yet known. It is nevertheless important to implement this scheme, because it provides a reference point against which the performance of other schemes can be compared.

• HOUR: Hourly checkpointing means that a snapshot is taken every hour we have a spot instance at our disposal. This clearly has a larger overhead cost, which the decision model accounts for by reserving a fixed amount of time to take the snapshot (and to recover from it the next time a spot instance is available). Hourly checkpointing is chosen because one hour is the smallest billing granularity of the Amazon EC2 spot pricing model.

Adding more checkpointing schemes to our model is not too difficult, since a simulation of a number of checkpointing techniques is implemented in the source code that accompanies the paper “Reducing Costs of Spot Instances via Checkpointing in the Amazon Elastic Compute Cloud” [33]. The work of Kondo et al. shows that hourly checkpointing reduces the costs significantly in the presence of failures and that other schemes do not perform better. Only for a small set of instance types was an edge-driven checkpointing technique found to result in a lower total cost. For more in-depth information, see [33].

3.5.2 SpotModel

The proposed Decision Model has the following random variables (see Figure 3.17), on which constraints can be placed in the implementation.

Figure 3.17: Decision Model [1]: variables

The following example (see Figure 3.18) shows what the different variables mean in practice. A failure occurs when the spot price exceeds the bid price, and the checkpointing and restart times are accounted for in the spot instance hours adjacent to this gap.

Figure 3.18: Decision Model [1]: practical meaning of variables


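$$
\begin{aligned}
T &= \text{task length (useful computation time)} = 6 \text{ hours}\\
ET &= T + \text{chpt} + \text{failure} + \text{restart} = 10 \text{ hours}\\
AT &= \text{availability} = 8 \text{ hours}\\
M &= \sum (\text{usedPrice} \cdot \text{numberOfHoursUsed}) = 0.1 \cdot 3 + 0.2 \cdot 4 + 0.3 \cdot 1 = 1.4 \text{ USD}\\
EP &= \frac{M}{AT} = 0.175 \text{ USD/h}\\
AR &= \frac{AT}{ET} = 0.8\\
UR &= \frac{T}{ET} = 0.6
\end{aligned}
$$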
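The example can be reproduced with a simplified replay of one spot run. This sketch rests on stated assumptions and is not the SpotModel implementation; the class `SpotRunSimulator` and its parameters are hypothetical. The price trace is made up so that it yields the numbers above: three hours at 0.1 USD, two gap hours at 0.5 USD, four hours at 0.2 USD and one hour at 0.3 USD, with an assumed bid of 0.4 USD. Checkpointing follows the OPT scheme, and checkpoint and restart are assumed to take one hour each. Instance hours are paid at the spot price, and ET is counted from the start of the trace.

```java
import java.util.Optional;

class SpotRunSimulator {
    /** ET, AT, M and T of one run, plus the derived ratios of the decision model. */
    record Result(int executionHours, int availableHours, double monetaryCost, double taskHours) {
        double expectedPrice() { return monetaryCost / availableHours; }                // EP = M / AT
        double availabilityRatio() { return (double) availableHours / executionHours; } // AR = AT / ET
        double utilizationRatio() { return taskHours / executionHours; }                // UR = T / ET
    }

    /**
     * Replays an hourly price trace. An hour is available when its price does not
     * exceed the bid; otherwise it is a failure (gap). Under OPT checkpointing a
     * snapshot is taken in the last available hour before a gap, and a restart is
     * needed in the first available hour after it. Returns empty if the trace ends
     * before the task is done.
     */
    static Optional<Result> simulate(double[] hourlyPrices, double bid, double taskHours,
                                     double checkpointHours, double restartHours) {
        double done = 0.0;        // useful computation completed so far
        double cost = 0.0;        // M: sum of the spot prices paid
        int available = 0;        // AT: hours during which we held an instance
        boolean afterGap = false; // previous hour was a failure after the run started
        for (int hour = 0; hour < hourlyPrices.length; hour++) {
            if (hourlyPrices[hour] > bid) {
                afterGap = available > 0; // before the first instance hour nothing is lost
                continue;
            }
            available++;
            cost += hourlyPrices[hour]; // the spot price is paid, not the bid
            double capacity = 1.0 - (afterGap ? restartHours : 0.0);
            afterGap = false;
            if (taskHours - done <= capacity + 1e-9) { // the task finishes in this hour
                return Optional.of(new Result(hour + 1, available, cost, taskHours));
            }
            boolean gapNext = hour + 1 < hourlyPrices.length && hourlyPrices[hour + 1] > bid;
            if (gapNext) capacity -= checkpointHours; // OPT: snapshot just before the failure
            done += Math.max(0.0, capacity);
        }
        return Optional.empty();
    }
}
```

Calling `simulate(new double[] {0.1, 0.1, 0.1, 0.5, 0.5, 0.2, 0.2, 0.2, 0.2, 0.3}, 0.4, 6, 1, 1)` returns ET = 10, AT = 8 and M = 1.4 USD, and therefore EP = 0.175 USD/h, AR = 0.8 and UR = 0.6, as in the example.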

The Decision Model has the following workflow (see Figure 3.19): deciding whether a certain submission is feasible involves simulating the execution of the workload and checking whether there are results that meet the constraints.

Figure 3.19: Decision Model [1]: workflow graph

The decision model performs simulations on real price traces of Amazon EC2, selecting a random starting point in the price history at which to start running the task. Because of the number of simulations involved, the result gives a realistic expected cost. The output of the program can be used to make an intelligent spot bid, one that ensures the task finishes before the given deadline and within the foreseen budget. To find the optimal bid price, we determine whether there are combinations of model parameters that are feasible; in other words, we select all possibilities for which the task meets its time constraints. This is done by look-ups in tables of previously computed distributions, so the processing effort is negligible. Among the feasible cases, we select the one with the smallest cost. If no feasible cases exist, the job cannot be performed under the desired constraints.
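The selection step described above amounts to a filter followed by a minimum. In this sketch, the `Candidate` record and its fields are hypothetical stand-ins for the values looked up in the precomputed distributions: the execution time and expected cost for one bid price at the requested confidence level. An empty result corresponds to a job that cannot be run under the constraints.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class BidSelector {
    /** Values looked up for one bid price at the requested confidence level (assumed shape). */
    record Candidate(double bidPrice, double executionHours, double expectedCost) {}

    /** Cheapest candidate that meets both the deadline and the budget, if any. */
    static Optional<Candidate> cheapestFeasible(List<Candidate> candidates,
                                                double deadlineHours, double budget) {
        return candidates.stream()
                .filter(c -> c.executionHours() <= deadlineHours) // time constraint
                .filter(c -> c.expectedCost() <= budget)          // budget constraint
                .min(Comparator.comparingDouble(Candidate::expectedCost));
    }
}
```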


Using the results of the example run of the decision model implementation in the paper, we created the graph in Figure 3.20. It shows the expected execution time for a given bid price and a given confidence level (p) with which the constraints must be met. When the bid price becomes too low, the execution time increases significantly, and a higher confidence level also yields a longer execution time. One of the findings of the paper was that bidding a low price yields cost savings of about 10%, but can lengthen the execution time significantly.

Figure 3.20: Decision Model [1]: execution time - bid price - confidence level graph

3.5.3 Implementation Changes

First, we refactored the software [34] into more classes and methods to make the code more readable, and added Javadoc comments. The original code needs a lot of memory, because it reads a few hundred thousand records of spot price history from the input CSV file into memory. It then uses these values to run a number of simulations, and only writes all results from memory back to file at the end of all simulations. This process is repeated for every combination of task length and checkpointing scheme. To improve performance, we adjusted the source code so that only the historical spot prices of one instance type are read into memory at a time and used for the corresponding simulations. This removed one dimension from almost all data structures used in the program. The change was possible because there are no correlations between the data records of different instance types; in other words, the simulation can be done for every instance type separately.

We also changed the input format of the Decision Model. Previously it took a single data.csv file as input, containing the spot price data for every instance type in different columns. This file did not contain any date information; every record contained the price for one minute. The application now takes the CSV files from cloudexchange.org as input, which means there is a separate file for every region-OS-instance combination. Each file has two columns: the first contains the date, the second the corresponding spot price for that instance at that time.
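A loader for one such per-instance file might look as follows. The class `SpotPriceCsv` is hypothetical. The timestamp format of the cloudexchange.org files is not given here, so the caller passes a `DateTimeFormatter`; the pattern in the usage example is an assumption. Only one region-OS-instance file is held in memory at a time, in line with the per-instance-type processing described above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.NavigableMap;
import java.util.TreeMap;

class SpotPriceCsv {
    /** Reads "timestamp,price" lines into a time-ordered map; unparsable lines are skipped. */
    static NavigableMap<LocalDateTime, Double> read(Reader in, DateTimeFormatter format) throws IOException {
        NavigableMap<LocalDateTime, Double> prices = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;
                String[] cols = line.split(",");
                if (cols.length < 2) continue; // malformed line
                try {
                    LocalDateTime when = LocalDateTime.parse(cols[0].trim(), format);
                    prices.put(when, Double.parseDouble(cols[1].trim()));
                } catch (DateTimeParseException | NumberFormatException e) {
                    // header row or unparsable value: ignore it
                }
            }
        }
        return prices;
    }
}
```

For instance, `SpotPriceCsv.read(new FileReader(path), DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))` would load one file, provided its timestamps used that (assumed) pattern.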

3.6 Conclusion

This chapter showed how to make the optimal division between reserved and on-demand instances, as illustrated by Figure 3.21. The base level of workload that requires reserved instances to minimize the cost is easily determined for a given combination of instance type, operating system and geographical region. It involves solving a simple equation to determine the tipping point for the particular situation. The actual use is then compared with this tipping point to decide whether or not to use the reserved pricing model.
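Assume that the reserved model charges a one-time fee F plus a lower hourly rate r_res over a fixed term, while on-demand charges r_od per hour. Under that assumption, the simple equation referred to above is F + h * r_res = h * r_od, and its solution is the tipping point h* = F / (r_od - r_res) hours. The sketch below computes it. The class `TippingPoint` and its method names are hypothetical, and the prices in any call have to come from the pricing input.

```java
/**
 * Break-even ("tipping point") between reserved and on-demand pricing, assuming
 *   on-demand cost: h * onDemandRate
 *   reserved cost:  upfrontFee + h * reservedRate
 */
class TippingPoint {
    /** Number of hours h at which both models cost the same. */
    static double breakEvenHours(double upfrontFee, double onDemandRate, double reservedRate) {
        if (onDemandRate <= reservedRate) {
            throw new IllegalArgumentException("the reserved hourly rate must be lower");
        }
        return upfrontFee / (onDemandRate - reservedRate);
    }

    /** The tipping point expressed as a utilization fraction of the reservation term. */
    static double breakEvenUtilization(double upfrontFee, double onDemandRate,
                                       double reservedRate, double termHours) {
        return breakEvenHours(upfrontFee, onDemandRate, reservedRate) / termHours;
    }

    /** Reserve a resource only if its scheduled hours exceed the tipping point. */
    static boolean useReserved(double scheduledHours, double upfrontFee,
                               double onDemandRate, double reservedRate) {
        return scheduledHours > breakEvenHours(upfrontFee, onDemandRate, reservedRate);
    }
}
```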

Figure 3.21: Workload Base Level Reserved Instances

We investigated whether reserving too many or too few instances is the better option in terms of total cost, and found that this depends on the characteristics of the workload involved.

We then introduced the two workload models that are used during the development of the broker prototype. The first describes a task by stating the total number of VM hours that need to be executed and a deadline by which the task has to be finished. The second describes a task by stating its length, which in this case equals the total width (in hours) of the task; every hour of the task may still need multiple instances to complete the corresponding part. We developed algorithms that schedule tasks of both models in a way that tries to minimize the total cost. Starting from a number of constrained workload descriptions, they produce a resource schedule that satisfies all deadline constraints and divides the load as evenly as possible over time. Once the schedule is made, we can determine which pricing model is most appropriate for each resource involved. To make the division between reserved and on-demand instances, the tipping point values determined earlier in the chapter (see section 3.2.1) can be used.

To introduce spot instances in our heuristic, we use the port of the ‘SpotModel’ application that accompanies the paper “Decision Model for Cloud Computing under SLA Constraints” [1]. It provides a way to determine the bid price that minimizes the total cost for a certain instance type and a task of a given length in the spot market, while meeting deadline and budget constraints with a certain confidence percentage. Details about the integration of this software in our broker can be found in section 4.4.2.

CHAPTER 4

BROKER DESIGN

This chapter introduces the developed broker prototype and the underlying heuristics and algorithms. The broker is a tool¹ that tries to optimally map the workload of a consumer to the different pricing models offered by Amazon EC2, namely on-demand, reserved and spot instances. The consumer's jobs can be bound by a set of quality of service constraints, such as a deadline by which the task needs to finish. The previous chapters showed that the potential cost reductions through intelligent instance allocation are large, since spot prices are almost always much lower than on-demand prices. This chapter discusses the algorithms and broker components involved in reaching an intelligent schedule from an input of constrained workloads. The underlying heuristics of the broker are evaluated in the next chapter (see chapter 5).

4.1 Introduction

The cloud computing market is relatively new and changing rapidly; the spot market, for example, was only introduced in EC2 in December 2009. EC2 now offers three pricing models, namely on-demand, reserved and spot pricing. However, it does not offer its consumers any tools to optimally map workloads and their QoS requirements, such as a deadline by which a workload needs to be finished, to these pricing models. This chapter describes the broker prototype, which implements the proposed heuristics to map a number of constrained workloads to an intelligent resource allocation schedule. This schedule tries to maximize the cost reductions for the consumer while still meeting the given deadlines.

¹ The Java prototype of our broker can be found online; see appendix G for download instructions.


Our broker consists of four easily separable and equally important tasks, which are shown schematically in Figure 4.1 and briefly explained afterwards.

Figure 4.1: Broker Task Components Overview

• Input: The first component of the broker prototype is the input of the broker. It consists of a task generation and specification component that is used to specify (and generate) tasks that follow the previously defined workload models, see section 3.3.1. A task's characteristics and the corresponding workload are specified in CSV files using a predefined format. Another part of this component provides and analyzes the prices of the different instances across the existing pricing models; an instance is identified by the combination of an operating system, an instance type and a geographical region. The on-demand, reserved and spot prices cannot all be accessed through the EC2 API², so we need to provide our own component that supplies this information to the broker. The spot pricing component needs to track the price history and analyze the trends that occur, as explained in the environmental analysis chapter (see chapter 2).

• Scheduling: The next step in the brokering process is the scheduling of the tasks (specified by the input component) across the different geographical regions, in a way that minimizes the cost for the consumer as much as possible. A second function of this part of the broker is to spread the workloads over time, while taking into account the start and deadline constraints of the workloads. Spreading the load evenly ensures that the resources are in use as much of the time as possible. This is needed to make maximal use of the reserved pricing model, which is cheaper than on-demand pricing for well-utilized resources.

• Allocation: The next brokering step is the allocation phase. It determines which pricing model (on-demand, reserved or spot pricing) should be used for a certain resource, given the load that was scheduled on this resource in the previous step of the brokering process. The resource allocation algorithms incorporate the conclusions of the pricing history analysis in the environmental analysis chapter (see chapter 2). This component uses the pricing input to determine the tipping point in terms of resource utilization percentage. These tipping points indicate the level of resource utilization from which it is better to use the reserved pricing model rather than the on-demand one for a certain instance, see section 3.2.1. Whether the spot pricing model is used depends on whether the tasks scheduled on a resource are allowed to run on spot instances. If all tasks are spot-enabled, the spot pricing model is used. If there is a combination of spot-enabled tasks and tasks that do not allow the spot pricing model, the tipping point for reserved pricing versus a combination of spot and on-demand pricing is calculated. The underlying algorithms are discussed in section 4.4.

• Output: The last step of the brokering process consists of presenting the calculated schedule and associated costs to the consumer. Our application provides a graphical representation of the schedule, in the form of a Gantt chart, as well as a textual one. Both representations are accompanied by a detailed overview of the costs associated with the proposed schedule. The cost is the metric used to measure the performance of our broker, since our goal is to minimize the total cost for the consumer.

² Only three months' worth of spot price history can be retrieved through the EC2 web service, and it does not provide easy access to the on-demand and reserved prices either.

The different components of the proposed broker prototype are presented in this chapter: their software designs are briefly discussed and the algorithms involved are explained.

4.2 Broker Input

The input component of the broker consists of a task generation and specification component and a price gathering and analysis component; both are presented in this section.

4.2.1 Task Generation and Specification

In the previous chapter (see section 3.3.1), two workload models were introduced to describe the tasks that our broker has to schedule in such a way that the total cost is minimized. The first model describes a task by stating the total number of instance hours that need to be executed and a deadline by which the task has to be finished. The second model describes a task by stating the length of the task and specifying, for every hour of the task, how many instances are needed to complete the corresponding part of the task.

The task specification input file (see Figure 4.2) contains the name and a description of the task. The file also provides information about the associated workload and the deadline of the task. The task specification includes information about the EC2 instances on which the workload should run: the instance type, the operating system, and whether the spot pricing model may be used for the workload. The region in which to run the task can be specified as well. This is optional, however, since cost reductions can be achieved by automatically determining the cheapest location. Sometimes it is important to run an application in a specific geographical region, for example when its users are situated in that region and low latencies are desirable, or for legal reasons. Whether a workload requires a certain region is hard to decide automatically, since it depends on many workload characteristics. For now, the decision whether the choice of region is free is left to the user of our broker.

Figure 4.2: Broker Input Task Description Different Workload Models

When a task is generated using the task generator, a CSV file is created based on a number of program arguments that specify the different properties of the task. The user chooses one of the two provided workload models to generate a task; the workload for the second model is generated randomly within the specified boundaries, as discussed before in chapter 2. Users can also supply their own workload traces.

Figure 4.3 gives an overview of the design of the task specification input component of the broker.

Figure 4.3: Broker Input Task Design


The output of the broker input step is a TaskCollection, which is passed to the scheduling component. The tasks in the collection's list are read from task specification CSV files, and each Task contains a task specification and a workload. The workload is represented by an object corresponding to the workload model of the task. A workload of the first model is specified by a list of SubTasks, while a workload of the second model is specified by a list of SubTaskCollections, where a SubTaskCollection is a list of SubTasks. This way, we can represent the appropriate amount of workload for every hour of the task. The TaskSpecification consists of a name and description for the task, an earliest start date and a deadline, whether the task is allowed to run on spot instances, and a description of the instance on which the task can be performed. The InstanceSpecification consists of the operating system, the instance type and the geographical region of the instance that the task requires.
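The data model described above maps naturally onto Java records. This is a minimal sketch in which the class names follow the text, while all fields, types and the `totalInstanceHours` helper are assumptions. A null region stands for a task whose region the broker may choose.

```java
import java.time.LocalDateTime;
import java.util.List;

// Hypothetical rendering of the broker's input data model; the class names
// follow the text, the fields and types are assumptions.

/** Operating system, instance type and region; region == null means "broker chooses". */
record InstanceSpecification(String operatingSystem, String instanceType, String region) {}

record TaskSpecification(String name, String description, LocalDateTime earliestStart,
                         LocalDateTime deadline, boolean spotEnabled,
                         InstanceSpecification instance) {}

/** One unit of work, measured here in instance hours (assumption). */
record SubTask(int instanceHours) {}

/** The sub-tasks of one hour of a workload of the second model. */
record SubTaskCollection(List<SubTask> subTasks) {}

/** Common type of the two workload models. */
interface Workload {}

/** Model 1: a flat list of sub-tasks (total instance hours plus a deadline). */
record TotalHoursWorkload(List<SubTask> subTasks) implements Workload {}

/** Model 2: one SubTaskCollection per hour of the task. */
record HourlyWorkload(List<SubTaskCollection> hours) implements Workload {}

record Task(TaskSpecification specification, Workload workload) {
    /** Total instance hours of the task, whichever workload model it uses. */
    int totalInstanceHours() {
        if (workload instanceof TotalHoursWorkload flat) {
            return flat.subTasks().stream().mapToInt(SubTask::instanceHours).sum();
        }
        if (workload instanceof HourlyWorkload hourly) {
            return hourly.hours().stream()
                    .flatMap(hour -> hour.subTasks().stream())
                    .mapToInt(SubTask::instanceHours)
                    .sum();
        }
        throw new IllegalStateException("unknown workload model");
    }
}

record TaskCollection(List<Task> tasks) {}
```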

The broker as introduced in this section has some limitations, since we require the user to specify certain properties manually in the task specifications. These properties include the instance type and the operating system of the instance that is most appropriate for the task, and whether spot instances can be used for the task. These characteristics should be determined automatically as well, but this needs further research. The first part of the broker would then iterate over the tasks that have to be scheduled in order to determine these characteristics:

for all task in providedTasks do
    task.determineAppropriateOperatingSystem();
    task.determineAppropriateInstanceType();
    task.determineWhetherSpotEnabled();
end for

One of the limitations of the broker prototype is that the user of our broker has to specify whether a certain task can be run on a spot instance. Whether a task is spot-enabled depends on the ability of the application to handle frequent gaps in the availability of the resource. These gaps are dealt with by a checkpointing scheme that takes snapshots at appropriate moments in time (for example at the end of every hour). After an outage, the application resumes from the snapshot taken earlier in the next available instance hour; these schemes were discussed in section 3.5.1. Certain applications deal with these situations better and can thus use the spot instance market efficiently to reduce the cost for the consumer.

The determination of the appropriate instance type for a given workload can be automated too. The instance type for a certain workload can be determined by benchmarking the workload in a virtualized Xen environment, since EC2 instances are hosted on Xen. The benchmark would measure characteristics of the workload such as its CPU, disk and memory requirements. With this information, a mapping can be made to the instance types offered by EC2. My research project [35], entitled “Instrumentation of Xen VMs for efficient VM scheduling and capacity planning in hybrid clouds”, makes it possible to monitor a Xen environment that is loaded with a certain task. It could be extended with a set of metrics that are valuable for mapping a virtual machine (VM) running a certain task or workload to the virtual machine profiles of EC2.

4.2.2 Price Gathering and Analysis

An interface to access the price of a given instance under a certain pricing model is important for the broker, since the broker is designed to determine a resource allocation scheme that minimizes the total cost. Access to up-to-date on-demand, reserved and spot prices is crucial for the correct working of the broker algorithms. We provide the other components of the broker with an interface to the pricing information for the three EC2 pricing models, see Figure 4.4. A user can request the price of a certain InstanceSpecification (see section 4.2.1), which uniquely defines an instance by its type, region and operating system. When a spot price is requested, a date and time can be provided to obtain the spot price at a certain moment in time; if none is provided, the latest price is returned. A Price object contains an hourly rate and the corresponding currency; a ReservedPrice additionally contains the fixed price.
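The lookup interface can be sketched as follows. The class names `Price`, `ReservedPrice`, `PriceWatch` and `InstanceSpecification` follow the text, but their fields and method signatures are assumptions. The spot lookup uses `NavigableMap.floorEntry` to return the price in effect at a given moment, and falls back to the latest entry when no moment is given.

```java
import java.time.LocalDateTime;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;

// Hypothetical sketch of the price interface; names follow the text,
// fields and signatures are assumptions.
record InstanceSpecification(String operatingSystem, String instanceType, String region) {}

class Price {
    final double hourlyRate;
    final String currency;

    Price(double hourlyRate, String currency) {
        this.hourlyRate = hourlyRate;
        this.currency = currency;
    }
}

/** A reserved price adds the one-time fixed fee to the hourly rate. */
class ReservedPrice extends Price {
    final double fixedPrice;

    ReservedPrice(double fixedPrice, double hourlyRate, String currency) {
        super(hourlyRate, currency);
        this.fixedPrice = fixedPrice;
    }
}

class PriceWatch {
    private final Map<InstanceSpecification, Price> onDemand;          // from the manual CSV input
    private final Map<InstanceSpecification, ReservedPrice> reserved;  // from the manual CSV input
    private final Map<InstanceSpecification, NavigableMap<LocalDateTime, Double>> spot; // spot history

    PriceWatch(Map<InstanceSpecification, Price> onDemand,
               Map<InstanceSpecification, ReservedPrice> reserved,
               Map<InstanceSpecification, NavigableMap<LocalDateTime, Double>> spot) {
        this.onDemand = onDemand;
        this.reserved = reserved;
        this.spot = spot;
    }

    Optional<Price> onDemandPrice(InstanceSpecification instance) {
        return Optional.ofNullable(onDemand.get(instance));
    }

    Optional<ReservedPrice> reservedPrice(InstanceSpecification instance) {
        return Optional.ofNullable(reserved.get(instance));
    }

    /** Spot price in effect at {@code when}, or the latest known price if {@code when} is null. */
    Optional<Price> spotPrice(InstanceSpecification instance, LocalDateTime when) {
        NavigableMap<LocalDateTime, Double> history = spot.get(instance);
        if (history == null || history.isEmpty()) return Optional.empty();
        Map.Entry<LocalDateTime, Double> entry =
                (when == null) ? history.lastEntry() : history.floorEntry(when);
        if (entry == null) return Optional.empty(); // requested moment precedes the history
        return Optional.of(new Price(entry.getValue(), "USD"));
    }
}
```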

Figure 4.4: Broker Input PriceWatch Design

The on-demand and reserved prices are provided through an input CSV file, since these values cannot easily be acquired through an EC2 API call. The only way to acquire them through the API would be to launch an instance of the corresponding type and query the cost of the instance hour. We therefore chose to enter the prices into the broker component manually. The required information can be found in the EC2 pricing section of the AWS website [27].

Spot pricing information is provided by the SpotWatch tool, which retrieves the new spot prices daily (through an API call to the EC2 web service) and adds them to its history database. The SpotWatch application can be used to access the spot price history and to obtain certain statistical properties of the price traces, see section 2.3.4.


4.3 Broker Scheduling

After acquiring and calculating the required pricing data and specifying the workload of the different tasks, it is possible to schedule these workloads across the different geographical regions of EC2.

Figure 4.5: Broker Scheduling Component

Figure 4.5 shows that the scheduling component takes the tasks with their corresponding deadline constraints, together with the EC2 pricing information, as input. The scheduling component then divides all workloads across the geographical regions of EC2, namely US-East, US-West, EU-West, APAC-Tokyo and APAC-Singapore. This section explains how this distribution is made such that the goal of minimizing the cost is achieved. The workload distribution part of the broker's scheduling component then distributes the load evenly over time. An even distribution maximizes the utilization rate of the resources involved. This in turn maximizes the use of the reserved pricing model, which was shown to be cheaper than the on-demand one once a certain utilization tipping point is reached (see section 3.2.1). The scheduling algorithms for the different workload models were already presented in the previous chapter (see section 3.4). They are improved upon here so that workloads that are allowed to use spot instances are supported too. The output of the scheduling component is an in-memory schedule of the load per instance, where an instance is identified by an operating system, an instance type and a geographical region.

4.3.1 Region Allocation

There are large price differences between the regions, as discussed in section 2.4 of the environmental analysis chapter, so it is important to make an intelligent choice of region for every workload that is presented to the broker.

The algorithm for the region allocation is presented here.

for all task in providedTasks do
    if task.hasRegionAssigned() then
        // Do nothing
    else
        instanceDescription = task.getInstanceDescription();
        if task.isSpotEnabled() then
            cheapestRegion = determineSpotCheapestRegion(instanceDescription, task.getTimePeriod());
        else
            cheapestRegion = determineCheapestRegion(instanceDescription);
        end if
        task.setRegion(cheapestRegion);
    end if
end for

The algorithm iterates over all tasks and assigns each task to a geographical region, unless the user has already done so. Users can pin a task to a region when the application has to run at a certain location. This can be important when the application requires low latencies, for example to reach an external service in a certain region. When no region is assigned to a task yet, we determine the cheapest one purely based on pricing, without taking into account the load already assigned to the different regions. Note that further price reductions can be achieved by first scheduling the tasks that the user has already pinned to a certain region. The other tasks can afterwards be assigned to a region in a way that minimizes the total cost, taking the existing schedules into account. Basing the decision only on the prices in the different regions keeps the broker's strategy straightforward and reduces the computational complexity of scheduling. There is thus a trade-off between broker complexity and performance.

We now discuss the algorithm that ranks the regions for on-demand and reserved pricing for a specific instance, of which we know the instance type and the operating system. The on-demand and reserved prices of the instance are ranked separately from low to high. If both rankings select the same geographical region as the cheapest one for the corresponding instance specification, the broker assigns tasks to this region. For Linux instances we found this to always be the case; Figure 4.6 demonstrates this for the Standard Small instances.

Figure 4.6: Broker Region Scheduling Not-Spot-Enabled Tasks: Standard Small Linux Instances

Note that the regions are numbered from the cheapest to the most expensive one, and that each price is higher than or equal to the previous one. For Standard Small Windows instances the region ranking was not the same for the on-demand and the reserved pricing model, but both rankings indicated that the US-East region has the cheapest price (see Figure 4.7). Selecting the cheapest region poses no problem here, but problems could arise if the rankings differ more. In that case we could introduce a parameter that indicates how much the on-demand and the reserved price influence the decision. For example, we could let the on-demand price count for x% (and the reserved one for (100-x)%). Here x is the average percentage of the total number of instance hours that a not-spot-enabled task runs under the on-demand pricing model, based on empirical data. This technique ensures that the typical usage of on-demand versus reserved prices is taken into account when a geographical region is chosen.
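The proposed x% weighting could be implemented as a weighted score per region. The sketch assumes that the reserved price is expressed as an effective hourly rate (the fixed fee spread over the term plus the hourly fee), so that both prices share the unit USD per hour. The class `RegionRanker` and the parameter x, given as a fraction in [0, 1], are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RegionRanker {
    /**
     * Regions ordered by x * onDemand + (1 - x) * effectiveReserved (USD per hour),
     * cheapest first; ties are broken alphabetically for a stable result.
     */
    static List<String> rank(Map<String, Double> onDemandRate,
                             Map<String, Double> effectiveReservedRate, double x) {
        if (x < 0 || x > 1) throw new IllegalArgumentException("x must lie in [0, 1]");
        Comparator<String> byScore = Comparator.comparingDouble(
                region -> x * onDemandRate.get(region) + (1 - x) * effectiveReservedRate.get(region));
        return onDemandRate.keySet().stream()
                .filter(effectiveReservedRate::containsKey) // both prices are needed to score a region
                .sorted(byScore.thenComparing(Comparator.naturalOrder()))
                .collect(Collectors.toList());
    }
}
```

With x = 1 the ranking reduces to the on-demand ranking, and with x = 0 to the reserved one, so the two separate rankings described above are the extreme cases of this score.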

Figure 4.7: Broker Region Scheduling Not-Spot-Enabled Tasks: Standard Small Windows Instances

To rank the regions for a spot-enabled task, the average Q3 (third quartile) value of the spot price during the period in which the task is scheduled is used. This period is defined as running from the task's start date until its deadline. In a broker that has to predict prices, the coming spot prices are not yet known, and the average Q3 value over the last month could be used instead, for example. The ranking found for the Standard Small instances (for March 2011) is shown in Figure 4.8.
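A ranking based on the average Q3 value might be computed as below. How "average Q3" is defined is an assumption here: one Q3 per calendar day, using the nearest-rank definition, averaged over the days between start and deadline. The class `SpotRegionRanker` is hypothetical, and SpotWatch (section 2.3.4) may compute its statistics differently.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.stream.Collectors;

class SpotRegionRanker {
    /** Third quartile by the nearest-rank method (values must be non-empty). */
    static double nearestRankQ3(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(0.75 * sorted.size()); // 1-based rank
        return sorted.get(rank - 1);
    }

    /** Average of the daily Q3 values of the prices between start and deadline (inclusive). */
    static double averageDailyQ3(NavigableMap<LocalDateTime, Double> history,
                                 LocalDateTime start, LocalDateTime deadline) {
        Map<LocalDate, List<Double>> perDay = history.subMap(start, true, deadline, true)
                .entrySet().stream()
                .collect(Collectors.groupingBy(e -> e.getKey().toLocalDate(),
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
        return perDay.values().stream()
                .mapToDouble(SpotRegionRanker::nearestRankQ3)
                .average()
                .orElse(Double.POSITIVE_INFINITY); // no data in the window: rank last
    }

    /** Regions ordered from the lowest to the highest average daily Q3. */
    static List<String> rank(Map<String, NavigableMap<LocalDateTime, Double>> historyPerRegion,
                             LocalDateTime start, LocalDateTime deadline) {
        Map<String, Double> score = new HashMap<>();
        historyPerRegion.forEach((region, history) ->
                score.put(region, averageDailyQ3(history, start, deadline)));
        List<String> regions = new ArrayList<>(score.keySet());
        regions.sort(Comparator.comparingDouble((String r) -> score.get(r))
                .thenComparing(Comparator.naturalOrder()));
        return regions;
    }
}
```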

Figure 4.8: Broker Region Scheduling Spot-Enabled Tasks: Standard Small Instances

Note that the reserved prices give the same ranking of the geographical regions as the spot prices. When there are instance hours left on a reserved instance, spot-enabled tasks also use this unused capacity. If there are no free instance hours left, the cheapest solution is to use the spot pricing model instead of the on-demand one (see the environmental analysis in chapter 2). During the allocation phase, however, a check is performed to see whether the spot price exceeds the on-demand price at the corresponding moment in time. If it does, the on-demand pricing model is used.

4.3.2 Workload Distribution

In the previous chapter we introduced two workload models and explained how both types of workload can be scheduled on resources within a geographical region (see section 3.4). These algorithms resulted in a schedule that divided the provided workloads as evenly as possible over time, while still meeting the deadline constraints of all the tasks involved. This section explains how spot-enabled tasks are taken into account in the scheduling algorithms of the broker.

Workload Model 1 (total VM hours needed is specified)
In the first workload model we handle the spot-enabled tasks separately: these tasks are added to the schedule after the other tasks have been scheduled.

notSpotEnabledTasks = tasks.subsetNotSpotEnabled();
spotEnabledTasks = tasks.subsetSpotEnabled();
performSchedulingWLM1(notSpotEnabledTasks);
performSchedulingWLM1(spotEnabledTasks);

This is done by calling the algorithm presented in section 3.4.1 (named performSchedulingWLM1 in algorithm 4.3.2) twice: first for the not-spot-enabled subset of the tasks and afterwards for the spot-enabled subset. Scheduling the tasks in this order leaves the spot-enabled tasks with more gaps, which these tasks can cope with better. It also ensures that as much of the not-spot-enabled workload as possible is co-located on the first resources in the list of the schedule. Because the largest amount of workload is scheduled on these resources, they have the highest chance of reaching the reserved pricing tipping point. As a result, as many not-spot-enabled tasks as possible end up on the resources that will be assigned the reserved pricing model in the next step of the brokering process. Since not-spot-enabled tasks are tagged as unsuited for spot instances, their task parts must use the on-demand pricing model whenever they are scheduled on a resource that does not reach the reserved tipping point. On-demand prices were shown to be higher than the corresponding reserved prices, so scheduling the not-spot-enabled tasks first is important to obtain the cheapest result possible.
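The effect of this scheduling order can be illustrated with a small Python sketch. The real performSchedulingWLM1 spreads the load as described in section 3.4.1; here a hypothetical first-fit placement (lowest-indexed resource with a free hourly slot before the deadline) stands in for it, which is enough to show where the not-spot-enabled task parts end up.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    hours: int          # total VM hours needed (workload model 1)
    deadline: int       # the task must finish before this hourly slot
    spot_enabled: bool

def first_fit(tasks, resources, horizon):
    """Hypothetical stand-in for performSchedulingWLM1.

    resources is a list of hourly slot lists (None = free). Each task hour goes
    to the lowest-indexed resource with a free slot before the deadline; a new
    resource is opened when the existing ones are full.
    """
    for task in tasks:
        last_slot = min(task.deadline, horizon)
        if last_slot <= 0:
            raise ValueError(f"{task.name}: deadline leaves no usable slot")
        remaining, r = task.hours, 0
        while remaining > 0:
            if r == len(resources):
                resources.append([None] * horizon)
            slots = resources[r]
            for hour in range(last_slot):
                if remaining == 0:
                    break
                if slots[hour] is None:
                    slots[hour] = task.name
                    remaining -= 1
            r += 1
    return resources

def schedule_wlm1(tasks, horizon):
    not_spot = [t for t in tasks if not t.spot_enabled]
    spot = [t for t in tasks if t.spot_enabled]
    resources = []
    first_fit(not_spot, resources, horizon)   # first pass: not-spot-enabled tasks
    first_fit(spot, resources, horizon)       # second pass fills the remaining gaps
    return resources

tasks = [Task("s", 3, 4, True), Task("a", 4, 4, False), Task("b", 2, 4, False)]
for row in schedule_wlm1(tasks, horizon=4):
    print(row)
# ['a', 'a', 'a', 'a']
# ['b', 'b', 's', 's']
# ['s', None, None, None]
```

Although the spot-enabled task s is listed first, it only receives the gaps left after a and b have been placed, so the fully loaded first resource, the best candidate for the reserved model, carries only not-spot-enabled work.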

Workload Model 2 (every hour #VMs needed is specified)

In the second workload model the spot instance possibility must be accounted for as well; to this end, a small adjustment is made to the previously introduced algorithm 3.4.2.

for all t in tasks do
    buckets.divideEqually(t);
end for
for all b in buckets do
    // try all combinations, choose the one
    // that minimizes the number of needed instances
    b.makePlanningWithoutSpotEnabledTasks();
    b.addSpotEnabledTasksToPlanning();
end for

In every bucket we want the spot-enabled tasks to take up as few reserved slots in the final schedule as possible. Since we aim to put as heavy a load as possible on the first resources of the schedule, it is better to handle the spot-enabled tasks after the other tasks have been scheduled on as few resources as possible. Note that when the spot-enabled tasks are added to the schedule, we again try to keep the total number of required resources as small as possible.
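The ordering inside a single bucket can be sketched as follows, assuming each bucket is given as per-hour VM counts for not-spot-enabled and spot-enabled tasks (a hypothetical representation). The search over combinations of algorithm 3.4.2 is omitted; the sketch only shows that stacking the not-spot-enabled VMs first keeps them on the lowest-indexed, most heavily loaded resources.

```python
def plan_bucket(non_spot_demand, spot_demand):
    """Return grid[r][h] in {'N', 'S', None} for one bucket (sketch).

    non_spot_demand[h], spot_demand[h]: VMs needed in hour h of the bucket.
    """
    if len(non_spot_demand) != len(spot_demand):
        raise ValueError("demand lists must cover the same hours")
    hours = len(non_spot_demand)
    n_resources = max((a + b for a, b in zip(non_spot_demand, spot_demand)), default=0)
    grid = [[None] * hours for _ in range(n_resources)]
    for h in range(hours):
        r = 0
        for _ in range(non_spot_demand[h]):   # not-spot-enabled VMs first ...
            grid[r][h] = "N"
            r += 1
        for _ in range(spot_demand[h]):       # ... spot-enabled VMs on top
            grid[r][h] = "S"
            r += 1
    return grid

def utilization(grid):
    return [sum(cell is not None for cell in row) / len(row) for row in grid]

for row in plan_bucket([2, 1, 2], [1, 2, 0]):
    print(row)
# ['N', 'N', 'N']
# ['N', 'S', 'N']
# ['S', 'S', None]
```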

4.4 Resource Allocation

The resource allocation component of the broker determines which EC2 pricing model (on-demand, reserved or spot) is best suited for each resource in the schedule calculated by the scheduling component of the broker. These pricing model allocation choices are made with the goal of minimizing the total cost for the consumer.

Figure 4.9: Broker Allocation Component

Figure 4.9 shows that the resource allocation component of the broker takes as input the task schedule made for every instance required by one of the provided tasks. An instance is specified by an operating system, an instance type and a geographical region. Many of the choices made by the resource allocation algorithm are based on the conclusions of the pricing history analysis in the environmental analysis chapter (see chapter 2). The algorithm determines whether a given resource of the schedule (determined in the previous brokering step) reaches the tipping-point utilization rate at which the reserved pricing model becomes cheaper than the on-demand one. The output of this component of the broker is an in-memory representation of the instance-specific input schedule, annotated with cost information.

4.4.1 Reserved Model

Once the tasks’ workloads are scheduled, it is possible to determine which pricing model is most appropriate for each resource involved. Depending on the types of tasks scheduled on a specific resource, three different situations can occur.

• Only spot-enabled tasks are scheduled on the resource. In this case all the task parts use the spot pricing model, since this was found to be the cheapest model on average. The SpotModel (see section 4.4.2) provides a bid price, which is then used to determine how the subtasks assigned to the resource are scheduled in time. When the spot price crosses the on-demand price, the corresponding task hours use the on-demand pricing model.

• Only not-spot-enabled tasks are scheduled on the resource. In this case, whether the resource should be provisioned as a reserved instance can be determined using the following algorithm.

tippingPoint = getTippingPoint(resource.getInstanceDescription());
usageCounter = 0;
for all slot in slots do
    if slot.isTaken() then
        usageCounter++;
    end if
end for
if (usageCounter / slots.getAmount()) ≥ tippingPoint then
    resource.markAsReserved();
else
    resource.markAsOnDemand();
end if

In this algorithm the function getTippingPoint determines the instance-specific tipping point using the technique explained in section 3.2.1 of the previous chapter, which deals with the division between on-demand and reserved instances. Next, the fraction of time the resource is needed (because a task part was scheduled on it) is computed, to check whether the resource reaches the tipping point. If it does, the reserved pricing model is used; otherwise, the load is scheduled on an on-demand instance.

• Mixture of spot-enabled and not-spot-enabled tasks. In this case we need to decide when the reserved pricing model is cheaper than a combination of on-demand instances and spot instances (the latter for the spot-enabled tasks). We use the following definitions in the tipping point calculation:

$x$ = number of hours a part of a spot-enabled task is scheduled on the resource

$y$ = number of hours a part of a not-spot-enabled task is scheduled on the resource

$$\text{cost}_{\text{reserved}} = \text{hourlyResPrice} + \frac{\text{fixedResPrice}}{x + y}$$

$$\text{cost}_{\text{on-demand}} = \text{hourlyOnDemandPrice}$$

$$\text{cost}_{\text{spot}} = \text{avgHourlySpotPrice}$$

The inequality to check whether taking a reserved instance is the preferred choice then becomes:

$$\text{total cost}_{\text{reserved}} \le \text{total cost}_{\text{on-demand}} + \text{total cost}_{\text{spot}}$$

$$(x + y)\,\text{cost}_{\text{reserved}} \le y\,\text{cost}_{\text{on-demand}} + x\,\text{cost}_{\text{spot}}$$

$$(x + y)\,\text{hourlyResPrice} \le y\,\text{hourlyOnDemandPrice} + x\,\text{avgHourlySpotPrice} - \text{fixedResPrice}$$
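Both reserved-instance checks can be written down directly. The sketch below assumes the tipping point is the usual break-even utilization of a reserved instance over its term, fixedResPrice / (termHours · (hourlyOnDemandPrice − hourlyResPrice)), which may differ in detail from the technique of section 3.2.1; the mixed case implements the last inequality above. All prices in the usage lines are made up.

```python
def tipping_point(fixed_res, hourly_res, hourly_od, term_hours):
    """Utilization above which a reserved instance beats on-demand (assumed form).

    Over a term of term_hours at utilization u:
      reserved  = fixed_res + u * term_hours * hourly_res
      on-demand =             u * term_hours * hourly_od
    """
    if hourly_od <= hourly_res:
        raise ValueError("on-demand must cost more per hour than reserved")
    return fixed_res / (term_hours * (hourly_od - hourly_res))

def allocate_not_spot(slots_taken, tipping):
    """Only not-spot-enabled tasks: reserved once the tipping point is reached."""
    utilization = sum(slots_taken) / len(slots_taken)
    return "reserved" if utilization >= tipping else "on-demand"

def reserved_preferred(x, y, hourly_res, fixed_res, hourly_od, avg_spot):
    """Mixed case: x spot-enabled hours and y not-spot-enabled hours.

    (x+y)*hourlyResPrice <= y*hourlyOnDemandPrice + x*avgHourlySpotPrice - fixedResPrice
    """
    if x + y == 0:
        return False
    return (x + y) * hourly_res <= y * hourly_od + x * avg_spot - fixed_res

tp = tipping_point(fixed_res=100.0, hourly_res=0.05, hourly_od=0.10, term_hours=8760)
print(round(tp, 4))                                           # 0.2283
print(allocate_not_spot([True] * 5 + [False] * 5, tp))        # reserved
print(reserved_preferred(100, 200, 0.03, 10.0, 0.085, 0.03))  # True
print(reserved_preferred(100, 200, 0.03, 20.0, 0.085, 0.03))  # False
```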

4.4.2 Spot Model

Workloads flagged as able to run on spot instances are scheduled on spot instances as long as no reserved instance slots are available. Once the decision is made to run a certain task on the spot market, we use the ‘SpotModel’ port of the application accompanying the “Decision Model for Cloud Computing under SLA Constraints” [1] paper. The findings of the paper are explained in the previous chapter (see section 3.5). The SpotModel determines the maximum bid price to use on the EC2 spot market for a given instance and a task of a given length, such that deadline and budget constraints are met with a given confidence level. Our broker selects the bid that meets the constraints with a confidence of 99%. Once the appropriate bid amount is determined, we still have to decide in which slots the task parts will be scheduled. When the spot prices are not known in advance, one could wait for each new spot price and schedule the task on the next slot only if the current spot price is lower than the maximum bid. Our broker prototype incorporates our findings about the spot market (see chapter 2) when scheduling: weekend and night slots are preferred, since these tend to have lower prices. We first check which weekend slots have prices below the chosen bid price, then which night slots do, and finally the task parts that are not yet scheduled are assigned to the remaining time slots with prices below the bid price.
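The slot preference order can be sketched as follows, assuming the spot price trace is known per hour, as in simulations on historical prices. The night window of 22:00 to 06:00 is a hypothetical choice, since the text does not define it, and a slot qualifies only when its price is strictly below the bid, following the wording above. The prices are made up.

```python
from datetime import datetime

def is_weekend(ts):
    return ts.weekday() >= 5                 # Saturday = 5, Sunday = 6

def is_night(ts):
    return ts.hour >= 22 or ts.hour < 6      # hypothetical night window

def select_spot_slots(trace, hours_needed, bid, deadline):
    """trace: (timestamp, spot_price) pairs, one per hour, in chronological order.

    Returns the chosen timestamps; fewer than hours_needed means the bid is
    too low to finish the task before the deadline.
    """
    usable = [ts for ts, price in trace if ts < deadline and price < bid]
    weekend = [ts for ts in usable if is_weekend(ts)]
    night = [ts for ts in usable if not is_weekend(ts) and is_night(ts)]
    other = [ts for ts in usable if not is_weekend(ts) and not is_night(ts)]
    return sorted((weekend + night + other)[:hours_needed])

trace = [
    (datetime(2011, 3, 4, 12), 0.02),   # Friday noon
    (datetime(2011, 3, 4, 23), 0.03),   # Friday night
    (datetime(2011, 3, 5, 12), 0.05),   # Saturday noon
    (datetime(2011, 3, 5, 13), 0.20),   # Saturday, above the bid
]
# Picks Friday 23:00 and Saturday 12:00.
print(select_spot_slots(trace, 2, bid=0.10, deadline=datetime(2011, 3, 7)))
```

The cheaper Friday-noon hour is skipped: the preference order is a heuristic based on the observed price trends, not an exact optimization.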

4.5 Broker Output

The output component of the broker offers two ways of presenting the proposed schedule to the consumer: our application provides a graphical representation in the form of a Gantt chart and a textual representation of the schedule. Both representations are accompanied by a detailed overview of the costs associated with the proposed schedule, which is important since the broker tries to minimize the total cost for the consumer.

4.5.1 Graphical Representation

The resulting resource allocation schedule of the broker is represented graphically using the Gantt chart available in the JFreechart library [36]. A Gantt chart is a type of bar chart that generally illustrates a project schedule. Its elements have a start and finish date, and sometimes the dependencies between the elements are indicated as well. The domain axis represents time. A vertical line is drawn to show which components should already be finished according to the schedule. This type of chart was found to be appropriate for representing the information in our schedule. The JFreechart library provides an easy way to present the information in this form and makes it possible to zoom in on parts of the schedule, as shown in Figure 4.10. Note that all subtasks of a given task are assigned the same unique color.

Figure 4.10: Broker Output GUI Zooming Capability

A popup with information about the subtask under the mouse pointer was added as well; it is shown in Figure 4.11. It provides information about the task the subtask belongs to, such as its name, description and deadline. It also gives information about the subtask itself: its start and end date and time, the price that has to be paid for running the subtask at that moment in time, and which hour of the task the subtask corresponds to. Finally, it shows information about the instance on which the subtask has to be executed: the geographical region, the operating system, the instance type and the pricing model.

Figure 4.11: Broker Output GUI Details SubTask


4.5.2 Textual Representation

The schedule and resource allocation scheme can be stored in a CSV file. It is important to have a way to store and share brokering results, so that they are not lost once the calculations have been performed. This opens the possibility of implementing import functionality that loads a schedule from file into the GUI of the broker prototype. Schedules that were made earlier could then still be examined using the GUI tools, which present them in an easy-to-browse way; extracting the required information from a textual representation of the schedule is much harder.
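A minimal sketch of storing and reloading a schedule with Python's standard `csv` module. The column names are hypothetical, not the prototype's actual file layout, and the reader only illustrates the import functionality mentioned above as a possibility.

```python
import csv
import io

# Hypothetical column layout; the prototype's actual CSV format may differ.
FIELDS = ["task", "region", "instance_type", "os", "pricing_model", "start", "end", "price"]

def write_schedule(rows, stream):
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

def read_schedule(stream):
    rows = []
    for row in csv.DictReader(stream):
        row["price"] = float(row["price"])   # CSV stores every field as text
        rows.append(row)
    return rows

buffer = io.StringIO()
write_schedule([{
    "task": "benchTask1", "region": "us-east", "instance_type": "m1.small",
    "os": "linux", "pricing_model": "spot",
    "start": "2011-03-05 12:00", "end": "2011-03-05 13:00", "price": 0.031,
}], buffer)
buffer.seek(0)
print(read_schedule(buffer)[0]["price"])  # 0.031
```

For a real file, `open(path, "w", newline="")` would take the place of the in-memory buffer.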

4.5.3 Detailed Cost Overview

Both representations of the resource allocation schedule are accompanied by a detailed overview of the costs. The cost of every allocated instance hour can be consulted, as well as the cost per task. A hierarchical structure of the costs (see Figure 4.12) is also given, subdivided (in this order) by geographical region, instance type, operating system and pricing model. The order of these categories can be changed in the settings of the broker GUI.

Figure 4.12: Broker Output GUI Cost Overview
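The hierarchical cost overview amounts to grouping the per-hour costs by a configurable sequence of keys. A sketch, assuming each allocated instance hour is a dictionary with hypothetical key names and made-up costs:

```python
def cost_tree(hours, order=("region", "instance_type", "os", "pricing_model")):
    """Nest per-hour costs by the given keys; the leaves hold summed costs."""
    tree = {}
    for hour in hours:
        node = tree
        for key in order[:-1]:
            node = node.setdefault(hour[key], {})
        leaf = hour[order[-1]]
        node[leaf] = node.get(leaf, 0.0) + hour["cost"]
    return tree

def make_hour(region, itype, os_name, model, cost):
    return {"region": region, "instance_type": itype, "os": os_name,
            "pricing_model": model, "cost": cost}

hours = [
    make_hour("us-east", "m1.small", "linux", "spot", 0.25),
    make_hour("us-east", "m1.small", "linux", "spot", 0.25),
    make_hour("us-east", "m1.small", "linux", "on-demand", 0.5),
    make_hour("eu-west", "m1.small", "linux", "reserved", 0.125),
]
print(cost_tree(hours)["us-east"]["m1.small"]["linux"])
# {'spot': 0.5, 'on-demand': 0.5}
print(cost_tree(hours, order=("pricing_model", "region")))
# {'spot': {'us-east': 0.5}, 'on-demand': {'us-east': 0.5}, 'reserved': {'eu-west': 0.125}}
```

Passing a different `order` mirrors the GUI setting that changes the order of the categories.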

This cost overview is important for evaluating the performance of the proposed broker, which is discussed in the next chapter.

4.6 Conclusion

This chapter presented the design of the developed broker prototype. The broker maps the consumer’s workloads and QoS requirements, such as the deadline by which a workload needs to finish, to the different pricing plans offered by EC2. The potential for cost reductions through this intelligent instance allocation scheme is large: spot prices, for example, are on average much lower than on-demand prices. The heuristics given in this and previous chapters all serve the same optimization goal: making the schedule of the workloads as cheap as possible for the consumer.

The broker’s working process is divided into four components; Figure 4.13 gives an overview.


Figure 4.13: Broker Design Overview

The input-based component provides pricing information and a task generation and specification part that delivers the constrained workloads for which the broker develops a cost-efficient schedule. The scheduling component assigns the different workloads to a geographical region. Per region, an instance-specific schedule (for a certain operating system and instance type) is created, which divides the load as equally as possible over time. The resource allocation component then takes this schedule and determines, for every resource needed, which pricing model is best suited to minimize the total cost for the consumer. The schedule’s components are annotated with the corresponding costs, and the output component of the broker then presents the schedule to the user graphically or textually. The graphical representation is a Gantt-like chart that shows the task parts scheduled over time across the different resources. A detailed cost overview is provided to the consumer as well.

The developed broker prototype takes a number of items into account, since these were found to influence the total cost of running the customer’s workload on EC2 considerably. The geographical region is chosen automatically in order to minimize the cost. This choice, however, influences the quality of service of the application running on EC2 in terms of network latency. The user therefore needs to be able to assign a task to a specific region when high latencies are not acceptable. It is the user’s responsibility to weigh this benefit against the possible cost reductions of leaving the choice to the broker. The broker also divides the resources between the different pricing models: a pricing model (on-demand, reserved or spot) is allocated to every resource of the created schedule. Price trends observed in the spot price history, such as lower prices during weekends, are taken into account as well.

The proposed broker prototype of course has a number of restrictions. Several of the heuristics we developed could be replaced by algorithms that provide better schedules, but these algorithms were deemed infeasible because their time complexity is too high. The basic algorithm that schedules the load (workload model 2) for a specific instance as equally as possible is an example of this (see section 3.4.2). Our schedule is also suboptimal because the region for an instance is determined purely on the basis of pricing, without considering the load that has already been assigned to the geographical regions. The conclusions of the environmental analysis chapter (see chapter 2) have been taken into account as much as possible. Certain things that could be automated were not elaborated in this thesis, such as determining the instance type that best suits the characteristics of a given workload. In this prototype, the operating system and instance type to be used for each task are assumed to be provided by the user. Benchmarking a workload to acquire its resource utilization characteristics would enable automating the mapping of the workload to an instance type.

CHAPTER 5

BROKER EVALUATION

This chapter evaluates the performance of the proposed broker and its underlying algorithms, including the cost savings achieved by introducing the broker. Benchmarking the broker not only indicates the achievable cost reduction, but also measures the scalability of the proposed heuristics.

5.1 Introduction

To evaluate the broker model and implementation (see chapter 4), the prototype needs to be benchmarked. The benchmark provides the data required to draw conclusions about the cost reduction and scalability of the broker prototype.

To compare the achieved cost savings, the broker is given a certain workload and run with different scheduling and resource allocation options. Recall that scheduling distributes the different task hours in time across a number of required resources, whereas resource allocation determines the appropriate pricing model for each resource on which tasks are scheduled. Section 3.4 of the ‘Resource Scheduling’ chapter presented different versions of the scheduling algorithms for both workload models (see section 3.3):

• Basic scheduling is the naive method in which every task is scheduled on a separate resource. The task hours are scheduled consecutively, starting at the beginning of the schedule. This scheduling method results in a worst-case schedule and thus provides a good point of comparison for measuring the cost savings achieved with the proposed optimized scheduling technique. More details about the underlying algorithms, including the distinction between the two workload models, can be found in section 3.4.

• Optimized scheduling is the method in which the task hours are distributed as equally as possible over time, so that resources are loaded for as much of the time as possible. This enables more resources to use the reserved pricing model, which is cheaper than on-demand instances once a certain utilization tipping point is reached. More details about the algorithms for both workload models can be found in section 3.4.

• Spot-Optimized scheduling is an alteration of the optimized algorithm that ensures that the least loaded resources contain as many spot-enabled task parts as possible. This way the heavily loaded resources can use the reserved pricing model, while the spot pricing model is used for the other resources as much as possible. Since spot prices are on average considerably lower than the prices of the other pricing models, this helps the broker achieve its goal of minimizing the total cost for the customer. The scheduling algorithms are altered such that the spot-enabled tasks are scheduled after the other ones; details can be found in section 4.3 of the previous chapter.

A number of different resource allocation options were built into the algorithms presented in section 4.4:

• Only On-Demand resource allocation means that no actual resource allocation is performed: all instances use the on-demand pricing model. This option represents a naive allocation scheme that corresponds to the worst-case scenario in terms of cost for the consumer.

• On-Demand & Reserved resource allocation means that for every resource involved in the brokering process, the broker checks whether the reserved tipping point is reached. In other words, when the reserved pricing model yields a lower total cost than the on-demand pricing model, the reserved one is used.

• Spot-Enabled resource allocation is the scheme that takes spot instances into account: it uses the spot model implementation to determine an appropriate bid, which results in an allocation decision according to the rules described in section 4.4.2.

• Optimal Spot resource allocation is the scheme that does not use the spot model; instead, it determines the optimal spot instance hour allocation from the actual historical spot prices. The algorithm exploits the fact that the spot price history is known for the time period in which the broker is creating a schedule, so that the task hours of a spot-enabled task can be assigned to the instance hours with the lowest spot prices. This results in a cost with an optimal (lowest possible) spot price contribution. Optimal Spot resource allocation can therefore be regarded as the best achievable result and serves as a reference point.
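For the ‘Optimal Spot’ scheme, picking the cheapest hours from a known price history is a k-smallest selection. A sketch, assuming hourly prices indexed from the start of the schedule and a deadline given as an exclusive hour index; the prices are made up.

```python
import heapq

def optimal_spot_hours(prices, task_hours, deadline):
    """prices[h]: known historical spot price in hour h; deadline is exclusive."""
    candidates = range(min(deadline, len(prices)))
    if task_hours > len(candidates):
        raise ValueError("not enough hours before the deadline")
    chosen = heapq.nsmallest(task_hours, candidates, key=lambda h: prices[h])
    return sorted(chosen), sum(prices[h] for h in chosen)

hours, cost = optimal_spot_hours([0.04, 0.02, 0.05, 0.01, 0.005], task_hours=2, deadline=4)
print(hours)            # [1, 3]  (hour 4 is cheapest but lies past the deadline)
print(round(cost, 4))   # 0.03
```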


The pseudo code of the benchmarking method:

for all region in regions do
    for all instance in instanceTypes do
        for all os in OSes do
            createTasksAndResultsFiles(region, instance, os);
            tasks = new TaskCollection(region, instance, os);
            for i = 0; i < maxNrOfTasks; i++ do
                tasks.addRandomTask();
                for all scheduling in schedulingVersions do
                    for all allocation in allocationVersions do
                        results = runBroker(tasks, scheduling, allocation);
                        writeResultsToFile(scheduling, allocation, results);
                    end for
                end for
            end for
            writeTasksToFile(tasks);
            closeTasksAndResultsFiles();
        end for
    end for
end for

The ‘runBroker’ method performs the four brokering phases in order and times each of them separately. It returns the timing results and the cost results calculated in the output phase of the broker. Every possible combination of geographical region, instance type and operating system is benchmarked. Each instance specification can be benchmarked separately, since the broker divides the presented workloads by the combination of these three parameters anyway. The only part of the broker prototype that is not benchmarked this way is the automatic region allocation feature presented in section 4.3.1. The benchmark executes all four steps of the brokering process (whose implementations are explained in chapter 4) for every possible combination of scheduling and resource allocation options. All these combinations provide cost and timing measurements for one and the same workload, so the results can be compared with each other. To investigate how the cost and duration of the brokering evolve, the benchmark is performed for a number of different workloads.
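Timing each phase separately, in the spirit of ‘runBroker’, can be sketched as follows. The phase callables are hypothetical hooks that take and return some broker state, and `time.perf_counter` serves as the clock.

```python
import time

PHASES = ("input", "scheduling", "allocation", "output")

def run_broker(phases, state):
    """Run the four phases in order and time each one (sketch).

    phases: {phase name: callable(state) -> state}, hypothetical hooks.
    """
    timings = {}
    for name in PHASES:
        start = time.perf_counter()
        state = phases[name](state)
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings[name] for name in PHASES)
    return state, timings

# Dummy phases that only record that they ran.
phases = {name: (lambda s, name=name: s + [name]) for name in PHASES}
state, timings = run_broker(phases, [])
print(state)            # ['input', 'scheduling', 'allocation', 'output']
print(sorted(timings))  # ['allocation', 'input', 'output', 'scheduling', 'total']
```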

The workload presented to the broker consists of a number of randomly generated tasks for the given geographical region, instance type and operating system. Each task is named ‘benchTask’ followed by its number in the current benchmarking workload. Each task is assigned a pseudo-random deadline that is feasible: the time between the beginning of the schedule and the deadline of the task is always greater than or equal to the length of the task. The benchmark takes a parameter that represents the probability that the deadline falls within the last 10 days of the one-year period for which a schedule is made. This yields a set of tasks that can occupy the resources during the whole scheduling period, so that the tipping point for the reserved pricing model can be reached from time to time. The tasks are assigned a random length between one hour and a hundred days (2400 hours). For a task of the second workload model, the maximum task length equals the number of hours between the beginning of the schedule and the deadline assigned to the task. These tasks also require a certain number of resources for each task hour, according to the distribution explained in section 3.4.2. With a given probability, a task is allowed to run on spot instances; in our benchmark this probability was set to 40%. If too many tasks are spot-enabled, the benchmark takes a long time, because the spot model turns out to be the most compute-intensive part of the brokering process, as shown later in this chapter. A benchmark starts with one task and keeps adding tasks until a given maximum is reached; for every version of the workload, all the different scheduling and resource allocation options are benchmarked.
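The task generator for workload model 1 can be sketched as below. The hour-based time axis (a 365-day year of 8760 hours) and the field names are assumptions; the probabilities and bounds follow the text.

```python
import random
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24   # one-year scheduling period in hourly slots (8760)
LAST_TEN_DAYS = 10 * 24

@dataclass
class BenchTask:
    name: str
    length: int        # hours (workload model 1)
    deadline: int      # hours after the start of the schedule
    spot_enabled: bool

def random_task(rng, index, p_late_deadline, p_spot=0.4, max_length=2400):
    length = rng.randint(1, max_length)
    # Feasible deadline: never earlier than `length` hours after the start.
    low = length
    if rng.random() < p_late_deadline:
        # Deadline within the last 10 days of the scheduling period.
        low = max(low, HOURS_PER_YEAR - LAST_TEN_DAYS)
    deadline = rng.randint(low, HOURS_PER_YEAR)
    return BenchTask(f"benchTask{index}", length, deadline, rng.random() < p_spot)

rng = random.Random(2011)
workload = [random_task(rng, i, p_late_deadline=0.5) for i in range(1, 11)]
print(all(t.length <= t.deadline <= HOURS_PER_YEAR for t in workload))  # True
```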

For every single run of the benchmark, a line is written to the instance-specific results file; a separate file is created for each region, instance type and operating system combination being processed. The output file contains the general information of the benchmark run, such as a reference to the workload being scheduled, the active scheduling and resource allocation options, and the cost and timing measurements. Figure 5.1 shows an example of the benchmark output. It lists the total time (in seconds) of the brokering process and the time needed by each of the four phases (input, scheduling, allocation and output), as well as the total cost (in US dollars) and the contribution of each pricing model to this total.

Figure 5.1: Snippet of the Benchmark Results Output File

Figure 5.2 shows the second output file generated for each instance specification by the benchmarking process. The file contains the tasks that were generated during the benchmarking process of the corresponding instance specification. For every task the file contains: the name of the task, the region to run it in, the instance type and operating system of the instance to run it on, the deadline and length of the task, and whether the task is allowed to run on spot instances.

Figure 5.2: Snippet of the Benchmark Tasks Output File

The benchmarking process introduced in this section is used to evaluate the cost and scalability performance of the implemented broker prototype, as is discussed in the next sections of this chapter.


5.2 Cost Evaluation

This section demonstrates the cost savings realized by our brokering model and implementation, using the results of running the benchmark¹ for Linux instances in the US-East region. The other regions and operating systems give similar results; this region and operating system were chosen because they can be seen as the most active and thus the most interesting to study.

5.2.1 Workload Model 1

The original benchmark described in section 5.1 shows the behavior expected from the broker: when a spot-enabled task is added, for example, the allocation time of the spot-enabled schemes increases and a spot cost is introduced, while the total cost decreases. The only remarkable observation is that the ‘Spot-Enabled’ and ‘Optimal Spot’ allocation techniques always lead to the same total cost. Both techniques use the real spot price traces. This is a positive result for our broker, since it shows that, when the simulations are performed on the actual spot price history, the spot model finds a bid price that yields the best possible spot hour allocation. Once a prediction algorithm is introduced into the spot allocation scheme, this would no longer be the case, and the optimal solution would then be a good point of comparison.

To get an idea of the cost reductions accomplished by the broker model, the benchmark described in section 5.1 is slightly altered. Every benchmarking step now generates a workload of ten random tasks, instead of starting with one task and adding tasks one by one until a given maximum is reached. The brokering process is executed a hundred times for every combination of options. Table 5.1 shows the price reductions achieved by the broker prototype when certain options are used instead of others.

Table 5.1: Average price reduction from a set of brokering options to a set of different brokering options [workload model 1]

If the ‘Only On-Demand’ allocation technique is used, no price difference is found between basic and spot-optimized scheduling. With the ‘Spot-Enabled’ allocation scheme, using spot-optimized instead of basic scheduling yields an average price reduction of 3%. Spot-optimized scheduling with the ‘Only On-Demand’ allocation scheme is on average 71.49% more expensive than basic scheduling with the ‘Spot-Enabled’ allocation scheme; this increase appears in the table as a negative price reduction. The most interesting reduction in the table is the one achieved by our proposed broker implementation, which equals 42.55% on average: spot-optimized scheduling combined with the spot-enabled allocation option results in a schedule that is on average 42.55% cheaper than that of basic scheduling with only on-demand allocation. The variance of this percentage is only 0.33%, which indicates that the proposed broker almost always produced results around 42.55% cheaper.

¹ The results were obtained by running the benchmark on an Intel Core i7 Q720 1.66GHz system running Ubuntu 9.10.
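The percentages in Table 5.1 can be derived from paired costs as sketched below. This assumes the reduction is computed per run as (cost_from − cost_to) / cost_from and then averaged, and it uses the population variance; the text does not state which variance estimator was used. The costs are made up.

```python
import statistics

def price_reduction(cost_from, cost_to):
    """Percentage saved by switching option sets; negative means more expensive."""
    return (cost_from - cost_to) / cost_from * 100.0

def summarize(cost_pairs):
    reductions = [price_reduction(a, b) for a, b in cost_pairs]
    return statistics.fmean(reductions), statistics.pvariance(reductions)

print(price_reduction(100.0, 150.0))                    # -50.0
mean, var = summarize([(100.0, 60.0), (100.0, 56.0)])
print(round(mean, 6), round(var, 6))                    # 42.0 4.0
```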

For workloads of the first model, we conclude that the benchmark results indicate that the broker considerably reduces the cost of running randomly presented workloads on EC2. This was shown in this section for a specific instance specification, but further investigation shows that similar cost reductions are also achieved in other geographical regions and for other instance types and operating systems.


5.2.2 Workload Model 2

Benchmarking the broker with tasks of the second workload model yields very similar findings. Again the alternate benchmark version is used, which generates workloads of ten tasks and each time makes the schedule using all possible combinations of broker options. The findings are shown in Table 5.2. There is no noticeable price reduction from ‘basic — ondemand’ to ‘spot — onlyondemand’, and a price increase is observed when the cost of the ‘basic — spotenabled’ schedule is compared with that of the ‘spot — onlyondemand’ one. As for workload model one in Table 5.1, the other two transitions cause considerable price reductions. Our proposed broker model realizes an average price reduction of 45.63% for workload model two loads when spot-optimized scheduling is used together with the spot-enabled allocation scheme, instead of basic scheduling that only allocates on-demand instances.

Table 5.2: Average price reduction from a set of brokering options to a set of different brokering options [workload model 2]

For workloads of the second model, we likewise conclude that the benchmark results indicate that the broker achieves considerable cost savings for running randomly presented workloads on EC2.

5.3 Scalability Evaluation

This section investigates the scalability of the proposed broker by analyzing the time the different phases of the brokering process take in different situations. The results used in this section were obtained by performing the benchmark for Linux instances in the US-East region.


Table 5.3 was created for workloads of the first model from the data gathered by the benchmarking experiment described in section 5.1. The table presents the timing of the different phases of the brokering process. When the allocation option does not take the spot pricing model into account, that is, for the ‘Only On-Demand’ and ‘On-Demand & Reserved’ allocation schemes, most of the time (over 90%) is spent in the input phase of the broker. The other phase containing ‘slow’ I/O operations is the output phase, but here it contributes little to the total time, because the graphical Gantt-like chart representation is turned off while benchmarking the broker prototype; the output phase only calculates a price overview. When an allocation algorithm that takes spot-enabled tasks into account is used, most of the time (over 99%) is spent in the allocation phase. Since the duration of the input, output and scheduling phases remains the same, the allocation time is orders of magnitude larger than that of the other phases. This phase takes so long because determining the spot bid is an expensive operation: either the spot model is used to run a simulation, or all spot prices are iterated over to determine the optimal tipping-point bid, such that enough prices below the bid price are selected to run the workload.

Table 5.3: Average time distribution (in percentage of total brokering time) of the different brokering phases [workload model 1]

To determine whether there is a scalability problem, we analyze how much time is added to the duration of the brokering process when more tasks need to be scheduled. Table 5.4 presents the average number of seconds by which the brokering duration increases when a random task is added to the existing workload. When a spot-enabled allocation scheme (‘Spot-Enabled’ or ‘Optimal Spot’) is used, the duration increases by about twenty seconds for Standard Small instances. One run of this experiment starts with a workload of one random task and keeps adding tasks until the workload contains ten tasks. The time increases show that scalability issues arise only when an allocation algorithm that takes the spot pricing model into account is used. When the number of tasks is increased to a hundred, the average duration increase becomes smaller, but it is still significant. The spot allocation scheme of the broker should therefore be altered to make it more scalable. One solution would be to run the simulations of the spot model in advance, but simulating all possible task lengths is infeasible. When only a division between on-demand and reserved instances is made, the brokering time does not increase much as tasks are added. Further analysis of the data shows that, once the workload already contains spot-enabled tasks, adding a spot-enabled task increases the duration about as much as adding a not-spot-enabled task.

Table 5.4: Average time increase (in seconds) when a task is added to the workload presented to the broker prototype (US-East region, Standard Small Linux Instance) [workload model 1]
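The measurement behind Table 5.4 (start from one random task, keep adding tasks and record the brokering duration each time) can be sketched as follows. The broker and the random task generator are passed in as hypothetical functional parameters, since the prototype's benchmarking code is not shown here. The average increase is the mean of the differences between consecutive durations, which telescopes to (last - first) / (n - 1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch of the incremental scalability benchmark. */
final class IncrementBenchmark {
    /** Runs the broker on workloads of size 1..maxTasks and returns each duration in seconds. */
    static <T> double[] measure(Supplier<T> randomTask, Consumer<List<T>> broker, int maxTasks) {
        List<T> workload = new ArrayList<>();
        double[] durations = new double[maxTasks];
        for (int i = 0; i < maxTasks; i++) {
            workload.add(randomTask.get());
            long start = System.nanoTime();
            broker.accept(new ArrayList<>(workload)); // pass a copy so the broker cannot alter the workload
            durations[i] = (System.nanoTime() - start) / 1e9;
        }
        return durations;
    }

    /** Average increase in duration per added task: mean of the consecutive differences. */
    static double averageIncrease(double[] durations) {
        if (durations.length < 2) throw new IllegalArgumentException("need at least two measurements");
        double sum = 0;
        for (int i = 1; i < durations.length; i++) sum += durations[i] - durations[i - 1];
        return sum / (durations.length - 1);
    }
}
```

Recording every intermediate duration, rather than only the first and last, also makes it possible to check whether the growth is roughly linear.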

Figure 5.3 shows box plots of the total brokering time gathered during an altered benchmark run. The benchmark generated a random workload of ten tasks twenty times in a row, and every combination of brokering options was applied to each workload. The box plots compare two configurations: basic scheduling with only on-demand allocation, which is the most naive scheduling and allocation option and should be fast but costly for the customer, and spot-optimized scheduling with the spot-enabled allocation technique, which is our proposed broker implementation. The duration of the naive implementation is very small: between 0.05 and 0.06 seconds for a workload of ten random tasks. As expected, the duration of our proposed broker is always much larger. Its duration also fluctuates strongly, depending on the presented workload.


Figure 5.3: Box plots of total brokering time, Basic Only On-Demand versus Spot-Enabled Scheduling and Allocation (US-East region, Standard Small Linux Instance) [workload model 1]

The conclusion is that our proposed brokering implementation is certainly feasible for small workloads. However, for workloads with a large number of tasks for a single instance specification (combination of geographical region, instance type and operating system) that use spot-enabled techniques, the implementation has to become more scalable. Since the time increase was localized to the spot allocation scheme, this is where improvements have to be made.


Benchmarking the broker with tasks of the second workload model yields similar findings. However, Table 5.5 shows that for the second workload model, with the allocation schemes ‘Only On-Demand’ and ‘On-Demand & Reserved’, the scheduling phase is now the one that takes the most time, whereas for tasks of workload model one the input phase has the longest duration. This difference is caused by the complexity of the scheduling algorithm for this kind of task (see section 3.4.2). When spot-enabled allocation schemes are used, the allocation phase still accounts for over 99% of the brokering time and thus overshadows the other phases. As for workload model one, the allocation phase is the bottleneck for workload model two.


Table 5.5: Average time distribution (in percentage of total brokering time) of the different brokering phases [workload model 2]

The amount of time added to the brokering process when more tasks need to be scheduled was analyzed for the second workload model as well. Table 5.6 presents the average number of seconds by which the brokering duration increases when a random task is added to the existing workload. When a spot-enabled allocation scheme is used, i.e. ‘Spot-Enabled’ or ‘Optimal Spot’, the duration increases by over 400 seconds for the benchmarked Standard Small instances. This increase is considerably higher than for workload model one, due to the complexity of the scheduling algorithm for this kind of task (see section 3.4.2).

Table 5.6: Average time increase (in seconds) when a task is added to the workload presented to the broker prototype (US-East region, Standard Small Linux Instance) [workload model 2]

Figure 5.4 shows box plots of the total brokering time gathered during the alternative benchmark version (see section 5.3.1). The box plot for basic scheduling is situated around the ten-second mark and has a very small range, while spot-optimized scheduling with a spot-enabled allocation scheme results in a box plot showing that durations between 200 and 1400 seconds are most common. These durations seem very high, but note that a schedule for a one-year period is created every time. By comparison, the brokering process for tasks of workload model one generally takes only between 60 and 270 seconds (see Figure 5.3).


Figure 5.4: Box plots of total brokering time, Basic Only On-Demand versus Spot-Enabled Scheduling and Allocation (US-East region, Standard Small Linux Instance) [workload model 2]

The brokering process for tasks of the second workload model takes longer than for the first workload model. The scalability problem is again localized to the spot allocation scheme, so this is where improvements have to be made.

5.4 Conclusion

The benchmarking of the broker performance indicates that the broker achieves considerable cost savings when running randomly presented workloads on EC2, while still meeting the imposed deadline constraints.

In terms of scalability, a mostly linear increase in time consumption was observed when tasks were added to the workload one by one, so the broker prototype can be considered scalable to a certain extent. Note, however, that both the SpotModel simulation and the optimal spot allocation algorithm take a lot of time, so a more scalable approach to spot resource allocation should be developed in the future.

CHAPTER 6

CONCLUSION

This chapter concludes the ‘A Broker for Cost-efficient QoS aware Resource Allocation in EC2’ thesis by giving an overview of the steps that were performed to arrive at the resulting broker prototype and its associated model. The overview also states the contributions of this work, such as the developed algorithms. The second section of the chapter proposes a number of possible extensions to the broker and presents future research that could be performed.

6.1 Conclusions and Contributions

Starting from an overview of cloud computing and a detailed description of the offerings of Amazon EC2 (see chapter 1), a proposal for an EC2 brokering tool was created. The diversity of the EC2 offerings shows that moving an application to the cloud is not straightforward: a large number of different instance types, cloud services and pricing models are available. The proposed model focuses on finding the best possible division of the required resources for a given workload among the three pricing models offered: on-demand, reserved and spot pricing. The best possible solution is taken to be the one that minimizes the total cost for the consumer while still meeting the workload's constraints. At present, consumers do not have any tools to optimally map their workload and QoS requirements (such as the deadline by which a workload needs to finish) to these different pricing plans. Nevertheless, the potential cost reductions through intelligent instance allocation are large: spot prices in the US-East region, for example, are on average about 60% lower than on-demand prices.

Determining how much it costs to run an application on EC2 is still a hard task. The cost depends on many properties of the application's workload, such as which instances and cloud services it requires, in which geographic location it has to run, and how much data storage and data transfer is required. A number of environmental parameters that can considerably influence the total cost for the customer were presented in chapter 2. These parameters were identified by comparing the different pricing models across the different geographical regions and by analyzing the price history. Deciding which parameters to take into account in the broker's model is a trade-off that influences the complexity of the broker. The following parameters are taken into account in the broker's model:

• The choice of geographical region influences the cost, and always choosing the US-East region is not an optimal strategy (see section 2.4). Taking into account the latency introduced by choosing a certain region is important, since a workload might impose latency constraints.

• The division between the different pricing models, namely on-demand, reserved and spot instances, influences the cost as well (see section 2.4.4). When spot prices are not considered, an optimal division between on-demand and reserved resources can be made purely on the basis of resource utilization, as explained in section 3.2.1. The base level of the workload that should run on reserved instances to minimize the cost is easily determined by solving an equation for the tipping-point utilization.

• Concerning spot pricing, the differences between the regions and the variation in price over the hours of the day have to be accounted for. The broker prefers time periods that have been shown statistically to have lower spot prices, such as weekends and nights. The volatility of the spot market makes it an interesting market to study (see section 2.3), which is why a tool suite to analyze the statistical properties of the spot price history was developed. Part of this program is available as a web service called SpotWatch, which can be found online [37]. SpotWatch presents the spot price history in an easy-to-interpret way, in the form of box plot graphs. An appropriate bid is determined by a port of the ‘SpotModel’ software that accompanies the work of Kondo [1] on the creation of a decision model for cloud computing.
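Preferring the hours of the day with statistically lower spot prices can be illustrated by ranking the hours by their average historical price. This is a minimal sketch under the assumption that the (event-based) spot price history has already been resampled to one price per hour; the class and method names are hypothetical and the broker's actual selection logic is not shown here. The same idea applies to days of the week.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

/** Hypothetical sketch: rank the hours of the day by their average historical spot price. */
final class HourOfDayProfile {
    /**
     * @param hourOfDay hour (0-23) at which each price sample was taken
     * @param prices    the sampled spot prices, same length as hourOfDay
     * @return the hours that have samples, ordered from cheapest to most expensive average price
     */
    static int[] cheapestHours(int[] hourOfDay, double[] prices) {
        if (hourOfDay.length != prices.length) throw new IllegalArgumentException("length mismatch");
        double[] sum = new double[24];
        int[] count = new int[24];
        for (int i = 0; i < prices.length; i++) {
            sum[hourOfDay[i]] += prices[i];
            count[hourOfDay[i]]++;
        }
        double[] avg = new double[24];
        for (int h = 0; h < 24; h++) avg[h] = count[h] == 0 ? Double.NaN : sum[h] / count[h];
        // Keep only hours with data and sort them by average price (the stream sort is stable).
        return IntStream.range(0, 24)
                .filter(h -> count[h] > 0)
                .boxed()
                .sorted(Comparator.comparingDouble(h -> avg[h]))
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```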

The following choices influence the cost of running a workload on EC2, but are ignored or assumed to be fixed in our broker's heuristics:

• Checkpointing costs, which are incurred when snapshotting is needed (in the case of spot pricing). See section 3.5.1 for more information.

• The choice of the instance type on which to run a provided workload is assumed to be made by the user in our broker prototype. To make the broker more complete, a workload benchmark that determines the most appropriate instance type for a given workload could be implemented (see chapter 4).


• The long-term price evolution of on-demand and reserved instances is not considered, because these prices have not changed often in the past (see section 2.2).

The broker prototype, described in chapter 4, uses two workload models. The first describes a task by the total number of work hours that need to be executed and a deadline by which the task has to be finished. The second describes a task by its length, which in this case equals the total width (in hours) of the task; every hour of the task can, however, still require multiple instances to complete the corresponding part of the task. The broker operates in four phases: input, scheduling, allocation and output (illustrated by Figure 6.1).

Figure 6.1: Broker Design Overview
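The two workload models introduced above can be represented with simple data types. The following is a minimal sketch; the class and field names are hypothetical and not taken from the prototype. For the first model it also derives the fewest parallel instances needed when the work is spread evenly up to the deadline, which assumes the work can be split freely across instances.

```java
/** Hypothetical representations of the two workload models (names are illustrative only). */
final class Workloads {
    /** Model 1: a total amount of work (instance hours) that must be done before a deadline. */
    static final class WorkDeadlineTask {
        final int workHours;
        final int deadlineHours; // hours from the start of the scheduling period

        WorkDeadlineTask(int workHours, int deadlineHours) {
            if (workHours <= 0 || deadlineHours <= 0) throw new IllegalArgumentException("values must be positive");
            this.workHours = workHours;
            this.deadlineHours = deadlineHours;
        }

        /** Fewest parallel instances if the work is spread evenly up to the deadline. */
        int minParallelInstances() {
            return (workHours + deadlineHours - 1) / deadlineHours; // integer ceiling division
        }
    }

    /** Model 2: a task of fixed width (hours) where each hour may need several instances. */
    static final class ProfileTask {
        final int[] instancesPerHour;

        ProfileTask(int... instancesPerHour) {
            this.instancesPerHour = instancesPerHour.clone();
        }

        int widthHours() {
            return instancesPerHour.length;
        }

        int totalInstanceHours() {
            int total = 0;
            for (int n : instancesPerHour) total += n;
            return total;
        }
    }
}
```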

The input component (see section 4.2) provides pricing information and a task generation and specification part that delivers the constrained workloads for which the broker develops a cost-efficient schedule. The scheduling component (see section 4.3) assigns the different workloads to a geographical region. Per region, an instance-specific schedule is created that divides the load as evenly as possible over time. The resource allocation component (see section 4.4) then takes this schedule and determines, for every resource involved, which pricing model is best suited to minimize the total cost for the consumer. The task components are annotated with the corresponding costs, and the schedule is then presented graphically or textually by the output component of the broker (see section 4.5). The graphical representation is a Gantt-like chart that shows the task parts scheduled over time across different resources. A detailed cost overview is presented to the user as well.

The broker performance was evaluated in chapter 5, where it was found that the broker achieves considerable cost savings when running randomly presented workloads on EC2, while still meeting the imposed deadline constraints. Besides the cost, the scalability of the broker was evaluated as well (see section 5.3). A mostly linear increase in time consumption was observed when tasks were added to the workload one by one, so the broker prototype can be considered reasonably scalable. The duration of a broker benchmark run might seem long, but the benchmark generated schedules covering a whole year.


6.2 Future Work

The broker model and the prototype implementation proposed in this thesis can be extended with different components. The time frame in which this research was conducted was too short to elaborate on certain aspects of the problem statement. A number of possible extensions to the broker are listed in this section:

• The broker model assumes that the appropriate instance type and operating system to run a given workload on are provided by the user (see section 3.3). Benchmarking a workload to acquire a number of resource utilization characteristics would enable the automatic mapping of the workload to an instance type. A similar limitation of the broker prototype is that whether a certain task can run on a spot instance has to be specified by the user as well.

• An investigation of the possibility of running two workloads together on a single large instance, instead of running them separately on two slightly smaller instances.

• The broker prototype should be transformed into a web service with a well-defined protocol for performing brokering tasks. This should be relatively easy using JAX-WS [38], since all brokering code is written in Java.

• Expand the spot model so that more of the conclusions drawn from the box plots are incorporated (see section 2.3). The model could be modified to use a prediction mechanism that tries to forecast future spot prices based on the spot price history and noticeable general trends. The broker prototype currently requires the spot prices to be known for the whole scheduling period.

• More EC2-specific features (see section 1.2) can be incorporated into the broker model, so that the model better represents the real price the customer would pay to run their workload on the Amazon EC2 cloud. Data storage and transfer costs can be taken into account, dedicated instances can be added to the model, the free usage tier can be accounted for, and so on.

• Checkpointing is currently only incorporated in the spot model (see section 3.5.1); it is not used by the broker prototype. Taking the overhead cost of snapshotting into account would yield a more realistic cost overview and is thus desirable.

Appendices

APPENDIX A

A: Size Estimation EC2 Regions

A.1 Introduction

To determine how large the EC2 markets in the different geographical regions are, it is interesting to investigate the size of the infrastructure Amazon offers in these regions. On April 22nd, 2011, AWS posted the public IP ranges used by the different geographical EC2 regions on the official EC2 forums [39]. The post was accompanied by the following introductory sentence: “We are pleased to announce that as part of our ongoing expansion, we have added a new public IP range (APAC-Tokyo)”. The size of these IP ranges can be seen as an indication of the size of the different EC2 regions.

A.2 Size IP Ranges in Different Regions

The published public IP ranges can be used to compare the number of addresses offered in the different regions, as illustrated in Table A.1.

Table A.1: Public IP Ranges of the EC2 Regions


From the provided data it can be seen that more than half of all public IP addresses provided by Amazon EC2 are located in the US-East region, followed by EU-West and US-West, which account for about 15% of the IP addresses. The Asian regions represent less than 10% of the IP addresses, but these regions were introduced most recently and are still growing faster than the other regions.
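The number of addresses in a published range follows directly from its CIDR prefix length: a block with prefix length p contains 2^(32 - p) IPv4 addresses. A minimal sketch for summing the blocks of a region is given below; it assumes the ranges are listed in CIDR notation and do not overlap, and the class name is hypothetical. The test uses documentation-only address ranges (RFC 5737), not the actual EC2 ranges.

```java
import java.util.List;

/** Hypothetical helper: counts the IPv4 addresses in CIDR blocks, e.g. per EC2 region. */
final class CidrSize {
    /** Number of addresses in one IPv4 CIDR block such as "192.0.2.0/24". */
    static long addresses(String cidr) {
        int slash = cidr.indexOf('/');
        if (slash < 0) throw new IllegalArgumentException("missing prefix length: " + cidr);
        int prefix = Integer.parseInt(cidr.substring(slash + 1));
        if (prefix < 0 || prefix > 32) throw new IllegalArgumentException("invalid prefix: " + prefix);
        return 1L << (32 - prefix); // 2^(32 - prefix)
    }

    /** Total number of addresses over a list of (assumed non-overlapping) blocks. */
    static long total(List<String> blocks) {
        long sum = 0;
        for (String b : blocks) sum += addresses(b);
        return sum;
    }
}
```

Dividing each region's total by the sum over all regions gives the percentages discussed above.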

APPENDIX B

B: On-Demand and Reserved Price Evolution

B.1 Introduction

This appendix gives the history of EC2 and its instances in chronological order. The overview contains a selection of the events that Amazon announced in its ‘What's New’ section [28]; only events considered important for the research in this thesis are listed. When the reserved or on-demand pricing in the US-East region was updated, the changes are illustrated with a table containing the new prices.

B.2 On-Demand and Reserved Price Evolution

(24/08/2006) Amazon Elastic Compute Cloud beta is announced. Only one instance type (1.7GHz Xeon processor/1.75 GB of RAM) was available, for $0.10 per hour.

(22/10/2007) EC2 in unlimited beta and new instance types are announced.

(29/05/2008) High-CPU instances announced.


(23/10/2008) EC2 exits beta and offers an SLA with a commitment of 99.95% availability.

(23/10/2008) EC2 instances running Windows Server available.

(10/12/2008) The European Region is added.

(08/01/2009) The AWS management console is launched.

(03/03/2009) Windows instances announced for the EU region.

(12/03/2009) EC2 introduces Reserved instances.

(15/04/2009) Reserved instances now available in Europe.

(20/08/2009) Prices of Reserved instances are decreased.

(30/09/2009) The distinction between instances running Windows and instances running Windows with Authentication Services is removed (the price of the general Windows instances is put in place).

(27/10/2009) New High-Memory instances announced.


(27/10/2009) All other on-demand prices are lowered by up to 15%.

(03/12/2009) Northern California region launched.

(14/12/2009) Amazon EC2 Spot instances announced.

(23/02/2010) Extra Large High Memory instances get introduced.

(23/02/2010) EC2 instances with Windows now available.

(29/04/2010) Asia Pacific (Singapore) Region announced.

(13/07/2010) Cluster Compute instances announced.

(01/09/2010) Lower prices for High Memory Double and Quadruple XL instances.

(09/09/2010) Micro instances are announced.


(21/10/2010) AWS Free Usage Tier introduced.

(15/11/2010) Cluster GPU instances announced.

(03/12/2010) Free monitoring of CPU, disk and network performance metrics through CloudWatch is introduced.

(02/03/2011) Asia Pacific (Tokyo) Region announced.

APPENDIX C

C: On-Demand/Reserved Price Versus Hardware Costs

C.1 Introduction

An interesting research question concerns the relation between the price reductions in EC2 and the hardware cost reduction that can be expected over time. This appendix, which focuses on the CPU cost, discusses the underlying hardware Amazon uses for instances of a certain type. The evolution of the hardware used by EC2 instances is investigated as well.

C.2 Hardware Evolution

According to several sources [40], the microprocessors listed in Table C.1 were used in the hardware on which a given instance type ran at the time that instance type was introduced.


Table C.1: Microprocessors used by EC2 Instances at Introduction

The following pseudocode was used in a test to verify whether these are still the microprocessors that Amazon uses today¹.

for i = 0 to x do
    for all type in instanceTypes do
        instance = startInstanceUSRegion(type)
        procInfo = instance.execute("more /proc/cpuinfo")
        output.append(procInfo)
    end for
end for

The command “more /proc/cpuinfo” used in this algorithm returns a number of CPU-related characteristics of the underlying system [41].

[ec2-user@ip-10-212-101-159 ~]$ more /proc/cpuinfo

processor : 0

vendor_id : GenuineIntel

cpu family : 6

model : 23

model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz

stepping : 10

cpu MHz : 2659.994

cache size : 6144 KB

...
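To tally which processor each started instance ran on, only the “model name” line of output like the one above is needed. The following minimal sketch extracts the distinct model names from the text returned by the command; the class name is hypothetical, and starting the instances and collecting the output (the hypothetical startInstanceUSRegion and execute steps of the pseudocode) are not covered. On a multi-core machine /proc/cpuinfo repeats the block once per logical processor, so the names are deduplicated.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: distinct processor model names in the output of `more /proc/cpuinfo`. */
final class CpuInfo {
    static Set<String> modelNames(String cpuinfo) {
        Set<String> models = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String line : cpuinfo.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank lines and lines without a key
            // Keys are padded with tabs/spaces in /proc/cpuinfo, so trim both sides.
            if (line.substring(0, colon).trim().equals("model name")) {
                models.add(line.substring(colon + 1).trim());
            }
        }
        return models;
    }
}
```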

This experiment, which tested which type of processor is used by the machine on which a requested instance (of a certain type) runs, resulted in Table C.2, which presents the microprocessors used by instances of each type.

Table C.2: Microprocessors used by EC2 Instances according to Test Program

These results show that Amazon started using newer microprocessors for a number of earlier introduced instance types. They did not replace the old processors, but added more machines with newer microprocessors to their data centers. The processor that has been around the longest, however, namely the AMD Opteron 2218 that was used in the US-East region for Standard Small instances in the early days of EC2, seems to have disappeared (or at least it has become rare to get a machine with this processor). Table C.3 gives more information about the microprocessors in use in EC2, such as their official launch date and their original price (per unit, when bought in batches of a thousand units) [42].

¹ These tests were performed on March 26th 2011.

Table C.3: Extra Information about the Microprocessors used by EC2 Instances

C.3 Price Evolution

This section investigates how the prices of the microprocessors used in EC2 have evolved. The following consumer market price evolution graphs, taken from a German price comparison site [43], show the average price of each processor across a number of popular online electronics merchants over time. The price of, for example, the Intel Xeon E5430 (see Figure C.1) and E5507 (see Figure C.2) did not change much on the consumer market. The starting price in these graphs mirrors the introduction price mentioned in Table C.3.

Figure C.1: CPU Procurement Cost Evolution for Intel Xeon E5430 (per unit)


Figure C.2: CPU Procurement Cost Evolution for Intel Xeon E5507 (per unit)

The fact that the CPU price did not decrease much does not mean that the hardware cost for Amazon did not decrease. Amazon started using newer processor models over time, which can reduce the hardware cost of the compute power needed to offer the advertised number of Elastic Compute Units (ECU) of a certain instance type. The Standard Small instances on EC2 started using different microprocessors over time, and this evolution is accompanied by a price reduction. The AMD Opteron 2218, which was used in the early days of EC2 in the US-East region, was a dual-core 2.6GHz processor and cost 873 dollars at the time of its introduction. Expressed as price per GHz (summed over all cores), this processor cost about 167.88 dollars per GHz. The microprocessors used today for this kind of instance are the Intel Xeon E5430 (quad-core at 2.66GHz for 455 dollars) and the Intel Xeon E5507 (quad-core at 2.26GHz for 276 dollars), with a corresponding price per GHz of 42.76 and 30.53 dollars. Substituting the AMD Opteron with the newer E5430 therefore comes with a price reduction of about 75%. This reduction happened in a little over a year, since the E5430 was launched in the fourth quarter of 2007 and the AMD Opteron 2218 was introduced in the third quarter of 2006. The newer Intel Xeon E5507, launched in the first quarter of 2010, represents a reduction of about 82% compared to the AMD Opteron processor and of about 29% compared to the Intel Xeon E5430.
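The price-per-GHz figures above can be reproduced with a few lines of arithmetic: the launch price is divided by the clock speed summed over all cores, and the reduction is one minus the ratio of the new to the old price per GHz. The sketch below only uses the prices, core counts and clock speeds stated in the text; the class name is hypothetical.

```java
/** Reproduces the price-per-GHz comparison of the processors discussed above. */
final class PricePerGhz {
    /** Launch price (USD per unit) divided by the aggregate clock speed over all cores. */
    static double pricePerGhz(double priceUsd, int cores, double ghz) {
        return priceUsd / (cores * ghz);
    }

    /** Relative reduction (0..1) when moving from oldValue to newValue. */
    static double reduction(double oldValue, double newValue) {
        return 1.0 - newValue / oldValue;
    }

    public static void main(String[] args) {
        double opteron2218 = pricePerGhz(873, 2, 2.6);  // about 167.88
        double xeonE5430 = pricePerGhz(455, 4, 2.66);   // about 42.76
        double xeonE5507 = pricePerGhz(276, 4, 2.26);   // about 30.53
        System.out.printf("E5430 vs Opteron 2218: %.0f%%%n", 100 * reduction(opteron2218, xeonE5430));
        System.out.printf("E5507 vs Opteron 2218: %.0f%%%n", 100 * reduction(opteron2218, xeonE5507));
        System.out.printf("E5507 vs E5430: %.0f%%%n", 100 * reduction(xeonE5430, xeonE5507));
    }
}
```

Running main prints reductions of 75%, 82% and 29%, matching the percentages in the text.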

C.4 Conclusion

Not much data about hardware costs and the hardware Amazon uses is publicly available. During the last five years, in which Amazon EC2 has been active, Amazon has put newer microprocessors into use to provide certain instances to its customers. This evolution was accompanied by a noteworthy hardware price decrease of roughly 80%. The hourly rate for EC2 instances, on the other hand, has only been reduced by up to 15%. It is, however, hard to draw conclusions from this divergence, since the CPU procurement cost is only a small portion of the total cost of the service that Amazon offers through EC2. The price impact should be normalized using the percentage of the instance hourly price that the CPU cost represents. This percentage can, however, only be determined by Amazon itself, since others cannot access the required data. It is reasonable to assume that the hardware price decrease gives Amazon room for price reductions if competition became tougher.

APPENDIX D

D: Basic Economics

D.1 Supply and Demand

The principle of supply and demand [44] should be the basis of the EC2 spot price market, so it is important to understand what it actually means. The principle describes the quantity and price of a good based on the relationship between its sellers and buyers. In this case, the price of a spot instance should be based on how many resources are still available on EC2 on the one hand, and on the number of customers willing to pay a certain price for the instance on the other. Both groups, sellers (supply) and buyers (demand), are represented by a curve (see Figure D.1), and the point where both curves meet is called the equilibrium; it indicates the socially optimal price for the given good. The slopes of the curves reflect the elasticity of supply and demand, which determines how much the equilibrium price will change if supply or demand changes.

Figure D.1: Supply and Demand Principle
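For the textbook case of straight-line curves, the equilibrium can be computed in closed form. Assuming demand Qd = a - b·P and supply Qs = c + d·P with positive slopes b and d, setting Qd = Qs gives P* = (a - c) / (b + d). This is only an illustration of the principle; Amazon does not publish how its spot prices are actually determined, and the class name is hypothetical.

```java
/** Equilibrium of linear supply and demand curves (textbook illustration, not EC2's actual mechanism). */
final class LinearMarket {
    /** Demand Qd = a - b * P and supply Qs = c + d * P meet at P* = (a - c) / (b + d). */
    static double equilibriumPrice(double a, double b, double c, double d) {
        if (b <= 0 || d <= 0) throw new IllegalArgumentException("slopes must be positive");
        return (a - c) / (b + d);
    }

    /** Quantity traded at the equilibrium price (read off the demand curve). */
    static double equilibriumQuantity(double a, double b, double c, double d) {
        return a - b * equilibriumPrice(a, b, c, d);
    }
}
```

With a = 100, b = 2, c = 10 and d = 1, the equilibrium price is 30 and the quantity is 40. A steeper (less elastic) curve, i.e. a smaller b or d, makes the price move more when the other curve shifts.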

APPENDIX E

E: Spot Price Analysis

E.1 Introduction

The empirical analysis of the spot price involves a number of statistical terms that are briefly discussed in this appendix. Since the empirical analysis in chapter 2 uses a large number of box plots to illustrate its conclusions, it is important to explain how to interpret the different components of a box plot graph.

E.2 Box Plot Introduction

First of all, a schematic representation of the boxplot is given:

Figure E.1: Schematic explanation of Boxplot


A box plot [45], also called a box-and-whisker graph, is a graphical representation that gives an idea of the dispersion of the data; it is useful for describing the behavior of the data in the middle as well as at the ends (tails) of the distribution. A box is drawn between the upper quartile (Q3, the 75th percentile) and the lower quartile (Q1, the 25th percentile), with a solid line drawn across the box to indicate the median value. The interquartile range (IQ) is the difference between the upper and the lower quartile. Whiskers are used to identify the ranges that contain values that differ more from the median of the distribution. Inner fences (whiskers, not shown in the illustration) are drawn at Q1 - 1.5*IQ and Q3 + 1.5*IQ. The upper outer fence is located at Q3 + 3*IQ, while the lower outer fence can be found at Q1 - 3*IQ. These fences give an idea of what the tails of the distribution look like. Any observation outside these fences is considered a potential outlier.
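The quartiles and fences described above can be computed as in the following minimal sketch. It uses linear interpolation between order statistics, which is only one of several common percentile definitions; JFreeChart's own box-and-whisker calculation may use a different definition and therefore yield slightly different quartiles. The class name is hypothetical.

```java
import java.util.Arrays;

/** Quartiles, interquartile range and the inner/outer fences of a box plot. */
final class BoxPlotStats {
    final double q1, median, q3, iqr;

    BoxPlotStats(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("no values");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        q1 = percentile(sorted, 0.25);
        median = percentile(sorted, 0.50);
        q3 = percentile(sorted, 0.75);
        iqr = q3 - q1;
    }

    /** Linear interpolation between order statistics (one of several common definitions). */
    static double percentile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    double lowerInnerFence() { return q1 - 1.5 * iqr; }
    double upperInnerFence() { return q3 + 1.5 * iqr; }
    double lowerOuterFence() { return q1 - 3.0 * iqr; }
    double upperOuterFence() { return q3 + 3.0 * iqr; }

    /** True when x lies outside the inner fences, i.e. it is a potential outlier. */
    boolean isPotentialOutlier(double x) {
        return x < lowerInnerFence() || x > upperInnerFence();
    }
}
```

For the values 1 to 9 this gives Q1 = 3, median = 5, Q3 = 7 and IQ = 4, so the inner fences lie at -3 and 13 and the outer fences at -9 and 19.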

E.3 Statistical Terms

We decided to use the JFreeChart library [36] to create the box plots in our SpotWatch application, since it provides an easy way to create such graphs using its BoxAndWhiskerRenderer. The data structure it uses during the generation of the box plots provides a number of interesting statistical values, which are explained below. We store these values for every combination of operating system, geographic region and instance type that is processed.

• Arithmetic mean: The arithmetic mean, often simply called the mean or average, indicates the central tendency of the sample.

• Geometric mean: The geometric mean indicates the typical value of a set of numbers. It is similar to the arithmetic mean, which is what most people think of with the word “average”, except that the numbers are all multiplied with each other and then the nth root of the resulting product is taken (where n is the count of numbers in the set).

• Number of Values: In the context of our broker application, the number of values in a sample indicates the number of spot price changes that occurred; this value can be used as part of an expression that indicates the activity of the spot market.

• Maximum & Minimum: The maximum and minimum are, as you wouldexpect, the greatest value and the smallest in the set.

• Percentile: A percentile is the value of a variable below which a certain percentage of the observations (values in the sample set) fall. The 25th percentile is also known as the first quartile (Q1), the 50th percentile as the median or second quartile (Q2), and the 75th percentile as the third quartile (Q3).


• Variance: The variance describes how far the values of a distribution lie from the mean; it is the average of the squared deviations from the mean.

• Standard deviation: The standard deviation is a widely used measure of variability or diversity; it shows how much dispersion there is from the average (mean) value. A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values.

• Kurtosis: Kurtosis is a measure of the ‘peakedness’ of the probability distribution of the random variable. A higher kurtosis value means that infrequent extreme deviations contribute more of the observed variance, as opposed to frequent modestly sized deviations.

• Skewness: Skewness [46] is a measure of the asymmetry of the probability distribution. The skewness value can be positive or negative, as illustrated in Figure E.2.

Figure E.2: Positive vs Negative Skewness

Thus, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side, which usually means that most of the values (including the median) lie to the right of the mean. A positive skew, on the other hand, indicates that the tail on the right side is longer than the left side and that the bulk of the values usually lie to the left of the mean. A value of zero indicates that the values are relatively evenly distributed on both sides of the mean, which typically, though not necessarily, implies a symmetric distribution.
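The statistics listed above can be computed directly from a sample, as in the following minimal sketch. It uses the population (1/n) formulas and reports excess kurtosis (zero for a normal distribution); the library used by SpotWatch may apply bias-corrected sample formulas, so its values can differ slightly, particularly for small samples. The class name is hypothetical.

```java
/** Descriptive statistics from the list above, using population (1/n) formulas. */
final class Moments {
    static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    /** n-th root of the product, computed via logarithms to avoid overflow; values must be positive. */
    static double geometricMean(double[] x) {
        double logSum = 0;
        for (double v : x) {
            if (v <= 0) throw new IllegalArgumentException("geometric mean needs positive values");
            logSum += Math.log(v);
        }
        return Math.exp(logSum / x.length);
    }

    /** k-th central moment: the average of (x - mean)^k. */
    static double centralMoment(double[] x, int k) {
        double m = mean(x), s = 0;
        for (double v : x) s += Math.pow(v - m, k);
        return s / x.length;
    }

    static double variance(double[] x) { return centralMoment(x, 2); }

    static double standardDeviation(double[] x) { return Math.sqrt(variance(x)); }

    /** Negative: longer left tail; positive: longer right tail. */
    static double skewness(double[] x) {
        return centralMoment(x, 3) / Math.pow(variance(x), 1.5);
    }

    /** Excess kurtosis: 0 for a normal distribution, higher for heavier tails. */
    static double excessKurtosis(double[] x) {
        double v = variance(x);
        return centralMoment(x, 4) / (v * v) - 3.0;
    }
}
```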

E.4 Outlier Detection

To determine whether conclusions based on calculated averages, such as the ones discussed in the environmental analysis chapter (see chapter 2), are valid, it is important to find out whether they might have been caused by the presence of outliers. Grubbs [47] defines an outlier as follows:

An outlying observation that appears to deviate markedly from othermembers of the sample in which it occurs.


Sometimes a distinction is made between outliers and extreme values [48], but we consider extreme values to be outliers as well. An extreme value is an observation that might have a low probability of occurrence but cannot be statistically shown to originate from a different distribution than the rest of the data.

Outliers often indicate either measurement errors or a population with a heavy-tailed distribution. In the former case the values can be discarded; measurement errors do not occur in our application, however, because we use the spot prices exactly as Amazon EC2 published them through its API. A heavy-tailed distribution, in turn, is indicated by a high kurtosis value.

A number of outlier detection strategies [49] exist, but they all rely on a measure of spread: values that fall within a certain distance below and above a central value are considered normal. One possibility is to use the mean plus or minus x times the standard deviation as the range of ‘normal’ values; values outside this range are considered possible outliers. The range that is often applied runs from the mean minus three times the standard deviation to the mean plus three times the standard deviation. For a normal distribution, approximately 68% of the values fall within one standard deviation (SD) of the mean, 95% within 2 SD and 99.7% within 3 SD.

Another detection strategy derives the range from the characteristics of a boxplot: values that fall below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are considered outliers. IQR stands for interquartile range, the distance between the 25th and the 75th percentile, or in other terms Q3 - Q1.
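Both rules can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the standard deviation is the population one (`statistics.pstdev`), and the quartiles come from `statistics.quantiles` with the "inclusive" method; other quartile conventions can shift Q1 and Q3 slightly.

```python
from statistics import fmean, pstdev, quantiles


def sd_outliers(values, k=3.0):
    """Values outside [mean - k*SD, mean + k*SD]."""
    mu = fmean(values)
    sd = pstdev(values)
    low, high = mu - k * sd, mu + k * sd
    return [x for x in values if x < low or x > high]


def iqr_outliers(values, factor=1.5):
    """Values outside [Q1 - factor*IQR, Q3 + factor*IQR] (boxplot rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [x for x in values if x < low or x > high]


if __name__ == "__main__":
    # Hypothetical price series with a single spike.
    prices = [0.030] * 10 + [0.031] * 10 + [0.500]
    print(sd_outliers(prices), iqr_outliers(prices))
```

The standard-deviation rule uses a mean and SD that the outliers themselves inflate, so one very large spot price can widen the range enough to hide other extreme values; the quartile-based rule is less sensitive to this.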

APPENDIX F

F: On-Demand versus Reserved Instances

F.1 Introduction

The Resource Scheduling chapter (see Chapter 3) gives a technique to make the optimal division between on-demand and reserved instances, optimal in the sense that it minimizes the total cost. For a particular instance (in a certain geographical region, running a certain operating system and of a certain instance type), it was shown that a tipping point can be determined: the utilization rate from which it becomes cheaper for that instance to use the reserved pricing model. This appendix presents the tipping points for Linux and Windows instances that are rented for a 3-year period.

F.2 Tipping Points (3-year period)

Table F.1 gives an overview of the tipping point percentages for Linux instances over a 3-year period. The tipping point expresses the fraction of time an instance has to be actually in use for the reserved pricing model to be cheaper than the on-demand one.
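Assuming the reserved pricing model consists of a one-time fee U plus a reduced hourly rate r, against an on-demand hourly rate p, an instance that is used for a fraction u of a period of T hours costs p*u*T on demand and U + r*u*T when reserved. Setting both costs equal gives the tipping point u* = U / ((p - r)*T). The sketch below computes it; the function name is hypothetical, the example prices are placeholders rather than actual EC2 prices, and the upfront fee is not discounted over time.

```python
HOURS_PER_YEAR = 365 * 24


def tipping_point(upfront, reserved_hourly, on_demand_hourly, years):
    """Utilization u* at which U + r*u*T equals p*u*T.

    Above u* the reserved instance is cheaper; a result above 1.0 means
    the reservation never pays off within the term.
    """
    if on_demand_hourly <= reserved_hourly:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    total_hours = years * HOURS_PER_YEAR
    return upfront / ((on_demand_hourly - reserved_hourly) * total_hours)


if __name__ == "__main__":
    # Placeholder prices (USD), not taken from Table F.1 or F.2.
    u = tipping_point(upfront=350.0, reserved_hourly=0.03,
                      on_demand_hourly=0.085, years=3)
    print(f"reserved is cheaper above {u:.1%} utilization")
```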


Table F.1: Linux 3-Year Overview

Table F.2 gives the same overview for Windows instances over a 3-year period.

Table F.2: Windows 3-Year Overview

For a Windows instance that is rented for a period of 3 years, the reserved version is already cheaper than an on-demand instance at a utilization rate of 25 percent. The 3-year reserved version is nevertheless not taken into account in our model, since it is impossible to predict the utilization rate over such a long period. First, it is difficult, if not impossible, for a company to predict its workload for the next 3 years. Second, EC2 is evolving quickly, and Amazon could start offering new pricing models that better fit one's workload. Since it is impossible to predict how technology will evolve in 3 years' time, both predicting one's workload and foreseeing what Amazon EC2 will look like are hard.

APPENDIX G

G: Developed Software

G.1 Introduction

This appendix provides the information needed to access the software that was developed to accompany the research described in this thesis. It consists of a Java prototype of the broker application and the SpotWatch website, which was developed to make the spot price history publicly available.

G.2 Environmental Analysis Tools

The graphs and tables generated for the analysis of the EC2 instance prices in the environmental analysis chapter (see Chapter 2) are gathered in one Microsoft Excel file that updates whenever its input, consisting of the current on-demand, reserved and average spot prices, is modified. This Excel file can be downloaded from the following location:

http://kurtvermeersch.com/Thesis/EnvironmentalAnalysisFinal.xlsx

The statistical analysis of EC2 spot pricing performed in the environmental analysis chapter has been turned into a web service, SpotWatch, that allows the user to create graphs for all regions, instance types and operating systems currently offered by Amazon EC2 (last checked on April 25th 2011), for any desired time frame. The spot price history is available from the start of the EC2 spot market until the current date and is updated daily through an Amazon EC2 API call. SpotWatch offers 4 different chart types: the data can be plotted per date, per week, per day of the week or per hour of the day, and for every option a graph of the average spot price and a corresponding boxplot are generated. The application can be accessed through its website:

http://spotwatch.eu
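The four chart types amount to grouping the price history by a key derived from each timestamp and averaging each group. The sketch below assumes the history is a list of (timestamp, price) records, such as those returned by the EC2 spot price history API call; it averages the records themselves rather than weighting each price by how long it was in effect, and all names are hypothetical rather than taken from SpotWatch's code. Quartiles for the boxplots could be computed per group in the same way.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

# Hypothetical grouping keys, one per chart type.
GROUPINGS = {
    "date": lambda ts: ts.date(),
    "week": lambda ts: tuple(ts.isocalendar()[:2]),  # (ISO year, ISO week)
    "weekday": lambda ts: ts.weekday(),              # 0 = Monday
    "hour": lambda ts: ts.hour,                      # 0..23
}


def average_spot_price(history, grouping):
    """Average price per group, sorted by group key."""
    key = GROUPINGS[grouping]
    buckets = defaultdict(list)
    for ts, price in history:
        buckets[key(ts)].append(price)
    return {k: fmean(v) for k, v in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical records, not real EC2 data.
    history = [
        (datetime(2011, 4, 25, 10, 0), 0.030),
        (datetime(2011, 4, 25, 10, 30), 0.050),
        (datetime(2011, 4, 26, 11, 0), 0.040),
    ]
    print(average_spot_price(history, "hour"))
```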

G.3 Broker Prototype

The broker maps a consumer's workloads, with their corresponding deadlines, to a schedule that specifies which portfolio of instances and pricing models minimizes the total cost as well as possible. The broker was described in the broker design chapter (see Chapter 4). A prototype of this broker was developed in Java and can be downloaded, together with Javadoc documentation and a ReadMe file, from the following location:

http://kurtvermeersch.com/Thesis/PrototypeBrokerFinal.zip

Part of this broker is the SpotModel, which determines an appropriate maximum spot price bid based on the spot price history: it allows a workload to be scheduled such that, when bidding a certain value, it meets its deadline with a certain confidence. This part of the broker is based on the tool developed for the paper “Decision Model for Cloud Computing under SLA Constraints” by Andrzejak, Kondo and Yi [1]. The Java port of the SpotModel can be downloaded separately from this location:

http://kurtvermeersch.com/Thesis/SpotModelFinal.zip
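The idea of choosing a bid that meets a deadline with a given confidence can be illustrated with a much simpler model than the one in [1]. This sketch is not the SpotModel algorithm. It assumes an hourly price series, assumes the instance merely pauses while the spot price exceeds the bid (no termination, restart or checkpointing overhead, unlike the setting studied in [33]), and estimates the confidence as the fraction of historical windows of deadline length in which enough running hours would have been available. The function names are hypothetical.

```python
def deadline_confidence(hourly_prices, bid, work_hours, deadline_hours):
    """Fraction of historical windows of `deadline_hours` in which an
    instance bidding `bid` would have run for at least `work_hours`."""
    n_windows = len(hourly_prices) - deadline_hours + 1
    if n_windows <= 0:
        raise ValueError("price history is shorter than the deadline")
    successes = 0
    for start in range(n_windows):
        window = hourly_prices[start:start + deadline_hours]
        # Simplification: the instance runs in every hour where price <= bid.
        running_hours = sum(1 for price in window if price <= bid)
        if running_hours >= work_hours:
            successes += 1
    return successes / n_windows


def minimal_bid(hourly_prices, work_hours, deadline_hours, confidence):
    """Lowest observed price that reaches the requested confidence, or None."""
    for bid in sorted(set(hourly_prices)):
        c = deadline_confidence(hourly_prices, bid, work_hours, deadline_hours)
        if c >= confidence:
            return bid
    return None
```

Candidate bids are restricted to prices seen in the history, since any bid between two consecutive observed prices yields the same running hours. The window scan is quadratic; a sliding count would make it linear for long histories.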

Bibliography

[1] A. Andrzejak, D. Kondo, and S. Yi, “Decision model for cloud computing under SLA constraints,” in Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2010 IEEE International Symposium on, pp. 257–266, 2010.

[2] Amazon, “Elastic compute cloud.” http://aws.amazon.com/ec2, 2008. [Accessed 22-12-08].

[3] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the clouds: A Berkeley view of cloud computing,” Tech. Rep. UCB/EECS-2009-28, EECS Department, University of California, Berkeley, February 2009.

[4] I. Foster, Y. Zhao, I. Raicu, and S. Lu, “Cloud computing and grid computing 360-degree compared,” in Proc. 2008 Grid Computing Environments Workshop, 2008.

[5] C. Babcock, The Cloud Revolution. McGraw-Hill, 2010.

[6] P. Mell and T. Grance, “The NIST definition of cloud computing,” 2009.

[7] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, “Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility,” Future Gener. Comput. Syst., vol. 25, no. 6, pp. 599–616, 2009.

[8] M. Uddin and A. A. Rahman, “Server consolidation: An approach to make data centers energy efficient & green.” http://aws.amazon.com/ec2, 2008. [Accessed 22-12-08].

[9] J. McCarthy, Centennial Keynote Address. MIT, 1961.

[10] D. Parkhill, The Challenge of the Computer Utility. Addison-Wesley Publishing Company, 1966.

[11] N. Carr, The Big Switch: Rewiring the World, from Edison to Google. W. W. Norton & Company, 2008.

[12] H. Heymann, “What stack does Foursquare run on EC2?” http://www.quora.com/What-stack-does-Foursquare-run-on-EC2. [Accessed 13-04-11].

[13] Google, “Google Apps for Business.” http://www.google.com/Apps. [Accessed 13-04-11].

[14] Microsoft, “Windows Azure.” http://www.microsoft.com/windowsazure/. [Accessed 13-04-11].

[15] Amazon, “Simple storage service.” http://aws.amazon.com/s3, 2008.

[16] FlexiScale, “FlexiScale public cloud.” http://www.flexiant.com/products/flexiscale/. [Accessed 13-04-11].

[17] GoGrid, “GoGrid, complex infrastructure made easy.” http://www.gogrid.com/. [Accessed 03-05-11].

[18] RackSpace, “Rackspace hosting.” http://www.rackspace.com. [Accessed 03-05-11].

[19] Google, “Google App Engine.” code.google.com/appengine/. [Accessed 13-04-11].

[20] SalesForce, “CRM & cloud computing.” www.salesforce.com. [Accessed 13-04-11].

[21] Digital Realty Trust, “PUE data center efficiency metric.” http://www.digitalrealtytrust.com/pue-efficiency.aspx. [Accessed 03-05-11].

[22] AWS, “AWS service health dashboard.” http://status.aws.amazon.com/. [Accessed 27-04-11].

[23] U.S. Congress, “Uniting and strengthening America by providing appropriate tools required to intercept and obstruct terrorism (USA PATRIOT Act) Act of 2001.” http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=107_cong_public_laws&docid=f:publ056.107, 2001. [Accessed 13-04-11].

[24] C. S. Writer, “Amazon cloud computing goes beta.” http://www.cbronline.com/news/amazon_cloud_computing_goes_beta, 2006. [Accessed 13-04-11].

[25] J. Hamilton, “Amazon Route 53 DNS service.” http://perspectives.mvdirona.com/2010/12/06/AmazonRoute53DNSService.aspx, 2010. [Accessed 13-04-11].

[26] “Simple monthly calculator.” http://calculator.s3.amazonaws.com/calc5.html. [Accessed 13-04-11].

[27] Amazon AWS, “Amazon EC2 pricing.” http://aws.amazon.com/ec2/pricing/. [Accessed 25-04-11].

[28] E. W. New, “What's new?” http://aws.amazon.com/about-aws/whats-new/. [Accessed 14-04-11].

[29] National Institute of Standards and Technology, “Engineering statistics handbook.” http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm. [Accessed 17-04-11].

[30] CycleCloud, “Lessons learned building a 4096-core cloud HPC supercomputer.” http://blog.cyclecomputing.com/2011/03/cyclecloud-4096-core-cluster.html, 2011. [Accessed 05-05-11].

[31] A. Hillier, “Analytics for internal cloud management.” http://www.cirba.com/forms/63f_analytics-for-internal-cloud-m.htm. [Accessed 17-04-11].

[32] BOINC Wiki, “Catalog of BOINC powered projects.” http://www.boinc-wiki.info/Catalog_of_BOINC_Powered_Projects. [Accessed 19-04-11].

[33] S. Yi, D. Kondo, and A. Andrzejak, “Reducing costs of spot instances via checkpointing in the Amazon Elastic Compute Cloud,” in 3rd International Conference on Cloud Computing (IEEE CLOUD 2010), pp. 236–243, 2010.

[34] S. Yi, “SpotModel.” http://sourceforge.net/projects/spotmodel/. [Accessed 18-04-11].

[35] K. Vermeersch, “Research project blog.” http://www.stage.kurtvermeersch.com/. [Accessed 25-04-11].

[36] JFreeChart, “JFreeChart.” http://www.jfree.org/jfreechart/. [Accessed 17-04-11].

[37] K. Vermeersch, “SpotWatch.” http://www.spotwatch.eu/. [Accessed 19-05-11].

[38] “JAX-WS project description.” http://jax-ws.java.net/. [Accessed 25-05-11].

[39] Jason@AWS, “Amazon EC2 public IP ranges.” https://forums.aws.amazon.com/ann.jspa?annID=1008. [Accessed 26-04-11].

[40] CloudHarmony, “What is an ECU? CPU benchmarking in the cloud.” http://blog.cloudharmony.com/2010/05/what-is-ecu-cpu-benchmarking-in-cloud.html. [Accessed 14-04-11].

[41] CloudIquity, “Amazon EC2 instances and cpuinfo.” http://www.cloudiquity.com/2009/01/amazon-ec2-instances-and-cpuinfo/. [Accessed 14-04-11].

[42] Heise, “Preisentwicklung Intel Xeon E5430.” http://www.heise.de/preisvergleich/eu/?phist=293851&age=2000. [Accessed 14-04-11].

[43] Intel, “Intel Xeon processor E5507.” http://ark.intel.com/Product.aspx?id=37100. [Accessed 14-04-11].

[44] National Geographic, The Knowledge Book: Everything You Need to Know to Get By in the 21st Century. National Geographic, 2009.

[45] T. Lossen, “CloudExchange.” http://cloudexchange.org. [Accessed 13-04-11].

[46] Wikipedia, “Skewness.” http://en.wikipedia.org/wiki/Skewness. [Accessed 17-04-11].

[47] F. E. Grubbs, “Procedures for detecting outlying observations in samples,” Technometrics, vol. 11, 1969.

[48] V. Barnett and T. Lewis, Outliers in Statistical Data. John Wiley & Sons, 1985.

[49] H. Seltman, “Exploratory data analysis.” http://www.stat.cmu.edu/~hseltman/309/Book/chapter4.pdf. [Accessed 17-04-11].
