Transcript of: Runtime Measurements in the Cloud · September 14th, 2010
Runtime Measurements in the Cloud
Observing, Analyzing, and Reducing Variance
Jörg Schad, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz
Information Systems Group, Saarland University
VLDB 2010, September 14th, Singapore
Motivation

"Yeah! The prototype is ready."

[Hadoop++: J. Dittrich et al., Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing). VLDB 2010. Presentation on Wednesday at 12:00, research track, cloud computing session.]

"But where do I run our 100-node experiments?"

"O.K., run the experiments on Amazon EC2."

Experiment:
• Application: a MapReduce aggregation job
• Number of virtual nodes: 50
• Repetitions: once every hour

[Figure: Runtime [sec] (0 to 1200) over 50 measurements, EC2 Cluster vs. Local Cluster.]

"Oops! ... Why?! ... Let's investigate!"
Related Work

M. Armbrust et al., Above the Clouds: A Berkeley View of Cloud Computing. UCB Technical Report, 2009.
Summary: performance unpredictability is mentioned as one of the major obstacles for cloud computing. [Focus: research challenges]

S. Ostermann et al., A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing. Cloudcomp, 2009.
Summary: evaluation of different Amazon cloud services in terms of cost and performance. [Focus: absolute performance]

D. Kossmann et al., An Evaluation of Alternative Architectures for Transaction Processing in the Cloud. SIGMOD, 2010. [Appeared after the VLDB'10 deadline]
Summary: cost and performance evaluation of different distributed database architectures and cloud providers. [Focus: application performance]

Variability in performance remains an open question. Go for it!
Agenda
• Motivation
• Related Work
• Background
• Methodology
• Results & Analysis

Background
Amazon EC2
• Most popular cloud infrastructure
• Three locations: US, EU, and Asia [Asia appeared after the VLDB'10 deadline]
• Different availability zones within the US location
• Linux-based virtual machines (instances)
• Five EC2 instance types: standard, micro [available from September 9th], high-memory, high-CPU, and cluster-compute [the latter appeared after the VLDB'10 deadline]
Standard Instances
• Small instance: 1.7 GB of main memory, 1 EC2 Compute Unit, 160 GB of local storage
• Large instance: 7.5 GB of main memory, 4 EC2 Compute Units, 850 GB of local storage
• Extra Large instance: 15 GB of main memory, 8 EC2 Compute Units, 1,690 GB of local storage

"One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor."
Methodology
What to Measure?
Microbenchmarks:
• CPU performance
• Memory performance
• Disk I/O (sequential and random)
• Internal network bandwidth
• External network bandwidth
• Instance startup
In this talk: CPU, memory, and disk I/O.
How to Measure?
• CPU performance: Ubench
• Memory performance: Ubench
• Disk I/O (sequential and random): Bonnie++
Goal of Our Study
• Do different instance types have different variations in performance?
• Do different locations or availability zones impact performance?
• Does performance depend on the time of day, weekday, or week?
Setup
• Small and large instances in the US and EU locations [the Asia location was introduced after the VLDB deadline]
• Default settings for Ubench and Bonnie++
• Results reported in CET time
• Baseline: our team's cluster at Saarland University
  • 2.66 GHz Quad-Core Xeon CPU
  • 16 GB of main memory
  • 6x750 GB SATA hard disks
  • 50 Xeon-based virtual nodes
Methodology
Every hour, in a loop:
• Kill the previous instances
• Create one small and one large instance
• On each instance (small and large): run Ubench, run Bonnie++, run the other benchmarks

Start: December 14, 2009. End: January 12, 2010. Duration: 31 days.
[Results were collected for one additional month, but showed no additional pattern.]
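The hourly cycle above can be sketched as a plan generator. This is purely schematic (the step strings and the function name are illustrative, not the scripts used in the study); it only makes the ordering of the loop explicit:

```python
def hourly_plan(instance_types=("small", "large")):
    """Return the ordered steps of one measurement cycle, run once per hour.

    Instances are killed before each cycle so that every run lands on
    freshly created (and likely different) physical nodes.
    """
    steps = ["kill previous instances"]
    steps += [f"create one {t} instance" for t in instance_types]
    for t in instance_types:
        # Benchmarks run strictly one after another, so that only one
        # benchmark is active at any time and results do not interfere.
        for bench in ("Ubench", "Bonnie++", "other benchmarks"):
            steps.append(f"run {bench} on {t} instance")
    return steps

for step in hourly_plan():
    print(step)
```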
Measure of Variation
• Different measures exist: range, variance, standard deviation, ...
• We need to compare data series on different scales
• Coefficient of Variation (COV): the ratio of the sample standard deviation (s) to the mean (x̄)

Table 1: Physical cluster: benchmark results obtained as baseline

         CPU [Ubench score]   Memory [Ubench score]   Seq. Read [KB/s]   Random Read [s]   Network [MB/s]
Mean     1,248,629            390,267                 70,036             215               924
Min      1,246,265            388,833                 69,646             210               919
Max      1,250,602            391,244                 70,786             219               925
Range    4,337                2,411                   1,140              9                 6
COV      0.001                0.003                   0.006              0.019             0.002

[Excerpt from the paper:]

...byte I/O and block I/O. For further details, please refer to [4]. In our study, we report results for sequential reads/writes and random-read block I/O, since these are the most influential aspects in database applications.

Network Bandwidth. We use the Iperf benchmark [8] to measure network performance. Iperf is a modern alternative for measuring maximum TCP and UDP bandwidth performance, developed by NLANR/DAST. It measures the maximum TCP bandwidth, allowing users to tune various parameters and UDP characteristics, and reports results for bandwidth, delay jitter, and datagram loss. Unlike other network benchmarks (e.g., Netperf), Iperf consumes fewer system resources, which yields more precise results.

S3 Access. To evaluate S3, we measure the time required to upload a 100 MB file from one unused node of our physical cluster at Saarland University (which has no local network contention) to a newly created bucket on S3 (in either the US or the EU location). The bucket creation and deletion times are included in the measurement. It is worth noting that such a measurement also reflects the network congestion between our local cluster and the respective Amazon datacenter.

4.3 Benchmark Execution. We ran our benchmarks twice every hour during 31 days (from December 14 to January 12) on small and large instances. The reason for such a long measurement period is that we expected the performance results to vary considerably over time; it also allows a more meaningful analysis of the system performance of Amazon EC2. We collected one additional month of data, but saw no patterns beyond those presented here (the entire dataset is publicly available on the project website [9]). We shut down all instances after 55 minutes, forcing Amazon EC2 to create new instances just before all benchmarks ran again. The main idea behind this is to better distribute our tests over different computing nodes and hence obtain a realistic overall measure for each of our benchmarks. To avoid benchmark results impacting each other, we ran all benchmarks sequentially, ensuring that only one benchmark was running at any time. Note that, since a single run can sometimes take longer than 30 minutes, in such cases we ran all benchmarks only once. To run the Iperf benchmark, we synchronized two instances just before running it, because Iperf requires two idle instances. Furthermore, since two instances are not necessarily in the same availability zone, network bandwidth is very likely to differ; thus, we ran separate experiments for the cases where the two instances are in the same availability zone and where they are not.

4.4 Experimental Setup. We ran our experiments on Amazon EC2 using one small standard instance and one large standard instance in both the US and EU locations (we increased the number of instances in Section 7.1). Please refer to Section 3 for details on the hardware of these instances. For both instance types we used a Linux Fedora 8 OS. For each instance type we created one Amazon Machine Image per location, including the necessary benchmark code. We used the standard instances' local storage and mnt partitions for both types when running Bonnie++. We stored all benchmark results in a MySQL database hosted on our local file server.

To compare EC2 results with a meaningful baseline, we also ran all benchmarks (except instance startup and S3) on our local cluster of physical nodes. It has the following configuration: one 2.66 GHz Quad-Core Xeon CPU running a 64-bit platform with Linux openSUSE 11.1 OS, 16 GB main memory, 6x750 GB SATA hard disks, and three Gigabit network cards in bonding mode. As we had full control of this cluster, there was no additional workload on it during our experiments. Thus, it represents the best-case scenario, which we consider our baseline.

We used the default settings for Ubench, Bonnie++, and Iperf in all our experiments. As Ubench performance also depends on compiler performance, we used gcc-c++ 4.1.2 on all Amazon EC2 instances and all physical nodes of our cluster. Finally, as Amazon EC2 is used by users from all over the world, and thus from different time zones, there is no single local time; we therefore use CET as the coordinated time for presenting results.

4.5 Measure of Variation. Let us now introduce the measure we use to evaluate the variance in performance. A number of measures exist: range, interquartile range, and standard deviation, among others. The standard deviation is a widely used measure of variance, but it is hard to compare across different measurements: a given standard deviation value only indicates how high or low the variance is relative to a single mean value. Furthermore, our study involves the comparison of different scales. For these two reasons, we use the Coefficient of Variation (COV), defined as the ratio of the standard deviation to the mean. Since we compute the COV over a sample of results, we use the sample standard deviation. The COV is therefore formally defined as:

\mathrm{COV} = \frac{1}{\bar{x}} \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}

Here N is the number of measurements, x_1, ..., x_N are the measured results, and \bar{x} is their mean. Note that we divide by N - 1 and not by N, as only N - 1 of the N differences (x_i - \bar{x}) are independent [18].

In contrast to the standard deviation, the COV allows us to compare the degree of variation from one data series to another, even if the means differ.
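As a minimal sketch (the function name is illustrative), the COV can be computed directly from this definition:

```python
import math

def cov(samples):
    """Coefficient of Variation: sample standard deviation divided by the mean."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance divides by N - 1, since only N - 1 of the
    # differences (x_i - mean) are independent.
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(variance) / mean

# Two series with the same absolute spread but different means:
# the COV makes their relative variation comparable.
print(cov([990, 1000, 1010]))  # prints 0.01
print(cov([90, 100, 110]))     # prints 0.1
```

This is exactly why the COV, unlike the raw standard deviation, lets the study compare series on different scales (Ubench scores vs. KB/s vs. MB/s).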
Results & Analysis
CPU Performance
[Figure: CPU Performance [Ubench score] (0 to 600,000) per hourly measurement over Weeks 52 to 3, US location vs. EU location, large instances.]
• COV of 24% (baseline: COV of 0.1%)
• Observation: two bands in performance
Memory Performance
[Figure: Memory Performance [Ubench score] (0 to 350,000) per hourly measurement over Weeks 52 to 3, US location vs. EU location, large instances.]
• COV of 10% (baseline: COV of 0.3%)
• Observation: again, two bands in performance
Random I/O Performance
[Figure: Random Read Disk Performance [KB/s] (50 to 400) per hourly measurement over Weeks 52 to 3, US location vs. EU location, large instances.]
• COV of 13% (baseline: COV of 1.9%)
• Observation: one band in performance for the US location
Availability Zones
[Figure: CPU Performance [Ubench score] (0 to 600,000) per hourly measurement over Weeks 52 to 3 for zones us-east-1a, us-east-1b, us-east-1c, and us-east-1d, large instances.]
"Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones..."
• Observation 1: us-east-1d results always fall in the upper band
• Observation 2: the lower-band results belong only to us-east-1c
CPU Distribution (1/2)

"Why two bands? ... What if I verify the performance of the different processor types?"

• Request 5 Xeon-Opteron instance pairs, determining the processor type by examining the /proc/cpuinfo file
• For each pair, run Ubench 150 times
Runtime Measurements in the Cloud, J. Schad, J. Dittrich, and J. QuianéSeptember 14th, 2010
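The processor identification step above can be sketched as follows. This is a hypothetical helper, not the authors' actual tooling, and the model-name strings are examples:

```python
# Hypothetical sketch: tell Xeon and Opteron instances apart by parsing
# the "model name" entry of /proc/cpuinfo, as the experiment does.

def cpu_model(cpuinfo_text):
    """Return the first 'model name' value found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("model name"):
            return line.split(":", 1)[1].strip()
    return "unknown"

def processor_family(model):
    """Map a model string to the coarse families used in the experiment."""
    if "Xeon" in model:
        return "Xeon"
    if "Opteron" in model:
        return "Opteron"
    return "other"

if __name__ == "__main__":
    # On a running instance one would use: open("/proc/cpuinfo").read()
    sample = "model name\t: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz"
    print(processor_family(cpu_model(sample)))  # Xeon
```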
CPU Distribution (2/2) (slide 20, Results & Analysis)

[Figure: CPU performance (Ubench score, 0-140,000) of the five Xeon-Opteron instance pairs.]

Different performance per processor type: within each type the COV is small (1% and 2%), while the COV for the combined measurements is 24%.

Observation: one CPU type means one underlying system [memory and I/O follow this pattern].
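The COV (coefficient of variation, standard deviation relative to the mean) effect described here can be reproduced with made-up numbers: two tight per-type distributions combine into one wide distribution. The scores below are illustrative round figures, not the paper's measurements:

```python
# Illustrative sketch of the COV effect: tight per-processor-type score
# distributions yield a high COV once the measurements are pooled.
from statistics import mean, pstdev

def cov(scores):
    """Coefficient of variation: standard deviation over the mean."""
    return pstdev(scores) / mean(scores)

xeon    = [119000, 120000, 121000]  # hypothetical Ubench scores
opteron = [59000, 60000, 61000]     # hypothetical Ubench scores

print(f"Xeon COV:     {cov(xeon):.1%}")
print(f"Opteron COV:  {cov(opteron):.1%}")
print(f"Combined COV: {cov(xeon + opteron):.1%}")
```

Pooling the two types is what produces the large combined COV, which matches the two-band picture in the plot.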
Larger Clusters (slide 21, Results & Analysis)

Cluster of 50 Large Instances.

[Figure: mean Ubench score (0-140,000) per cluster start point, over 35 hours.]

Observation: 30% of the measurements fall into the low-performance band. Still, there are two bands in performance.
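The band bookkeeping behind this observation can be sketched as follows; the threshold and the hourly means are illustrative assumptions, not the measured data:

```python
# Hypothetical sketch: split mean Ubench scores into a high and a low
# performance band around a chosen cut, and report the low-band fraction.

LOW_BAND_THRESHOLD = 90000  # assumed cut between the two Ubench bands

def low_band_fraction(scores, threshold=LOW_BAND_THRESHOLD):
    """Fraction of measurements that fall below the band threshold."""
    low = [s for s in scores if s < threshold]
    return len(low) / len(scores)

hourly_means = [120000] * 7 + [60000] * 3  # 10 made-up hourly cluster means
print(f"{low_band_fraction(hourly_means):.0%} in the low band")  # 30%
```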
MapReduce Job (slide 22, Results & Analysis)

- Application: a MapReduce aggregation job
- Number of virtual nodes: 50
- Repetitions: once every hour

[Figure: runtime [sec] of 50 measurements, EC2 cluster vs. local cluster.]

Observation: the upper band results from virtual clusters composed of 80% Xeon processors (vs. 20% for the lower band).
Lessons Learned (slide 23, Conclusion)

Ah, O.K.! So... be careful!
- High variance in performance: COV of up to 24%
- Results are hard to interpret
- Repeatability only to a limited extent
- Two bands in performance, partially due to different physical CPU types
Conclusion (slide 24)

Amazon should:
- reveal the physical details
- allow users to specify physical characteristics

Researchers should:
- use equivalent virtual clusters to compare systems
- report the underlying system type with the results

By the way... Amazon recently introduced the Cluster Compute Instances [after the VLDB'10 deadline]. Their variance is still significantly higher than in a local cluster.

Thank you!