Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty...

24
Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty Humphrey Department of Computer Science, University of Virginia Science Cloud Workshop June 21st, 2010 1

Transcript of Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty...

Page 1: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty HumphreyDepartment of Computer Science, University of Virginia

Science Cloud Workshop June 21st, 2010 1

Page 2: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

The question is not can I build my application for the cloud, it’s how to do it well

How will it perform? Our focus

How do Azure services perform?

Experiments run between November, 2009 and February, 2010

Science Cloud Workshop June 21st, 2010 2

Page 3: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 3

Deploying and Scaling Compute Resources

Storage Services

Direct TCP Communication

Azure SQL Services

Page 4: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Methodology

Application deployed from Azure Blob Storage

Deployment package <5MB

Measure time to start deployment (i.e. 4 small instances.)

Measure time to double instance count

Between Dec 17, 2009 and Jan 09 2010 we ran the experiment 431 times. Failure rate: 2.6%

Science Cloud Workshop June 21st, 2010 4

Page 5: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 5

9.910.6

11.3

13.8

8.99.9

11.0

13.2

0

2

4

6

8

10

12

14

16

Small Medium Large X-Large

Minutes

VM Instance Size

Average Instance Adding Time

Web

Worker

Page 6: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 6

0

5

10

15

20

25

Scale Web Scale Worker Start Web Start Worker

Minutes

Role

Scaling versus Startup for 4 Small Instances

4th VM

3rd VM

2nd VM

1st VM

Page 7: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Deploying a VM takes around 10 minutes—too long?

Adding instances takes much longer than initial deployment—even worse

Larger instance types take longer to start & web roles take longer than worker roles

Not all instances will come online at the same time

Science Cloud Workshop June 21st, 2010 7

Page 8: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Blobs – Large, unstructured storage

Tables – Semi-structured data, queries, updates, inserts, deletes

Queues – FIFO, asynchronous messaging

Science Cloud Workshop June 21st, 2010 8

Page 9: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Large object storage – 50GB or 1TB limit depending on type

Get/Put semantics Performance isolated between blob

containers Methodology:

Get a 1GB blob concurrently with 1 – 192 clients operating on the same blob

Put 1GB blobs concurrently into same container

Science Cloud Workshop June 21st, 2010 9

Page 10: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 10

0

2

4

6

8

10

12

14

1 8 16 32 64 128 192

Per-Client Bandwidth

(MB/s)

Concurrent Instances

Upload

Download

Download more than 2x upload speed Single, small client ~100Mb/s

Page 11: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 11

0

50

100

150

200

250

300

350

400

450

1 8 16 32 64 128 192

MB/s

Concurrency

Cumulative Bandwidth

Download

Upload

Page 12: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Entity, Attribute, Value model Semi-structured, no schema Methodology: Perform 4 primary operations: insert, query, update,

delete

Each client operates on unique entities (rows) within the same shared partition

Insert & Query & Delete: 500 ops/client

Update: 100 ops/client

~220K entities in table for Query, Update, & Delete

Science Cloud Workshop June 21st, 2010 12

Page 13: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 13

0

1

2

3

4

5

6

1 8 16 32 64 128 192

Ave

rag

e O

pe

rati

on

Tim

e (

sec)

Table Update

1K

4K

16K

64K

0

20

40

60

80

100

120

140

160

180

200

1 8 16 32 64 128 192

Ave

rag

e T

ime

(m

s)

Concurrent Instances

Table Performance Using 4KB Entities

Query

Insert

Delete

Page 14: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Passing small (<=8K) messages in a FIFO style

Get, Put, Peek operations

Methodology: Single queue, concurrent clients get/put messages

Science Cloud Workshop June 21st, 2010 14

Page 15: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 15

0

5

10

15

20

25

30

1 2 4 8 16 32 64 128 192

Put Message

512B

1K

4K

8K

0

10

20

30

40

50

1 2 4 8 16 32 64 128 192

Peek Message

0

5

10

15

20

25

30

1 2 4 8 16 32 64 128 192

Get Message

Vertical Axis: Msg/sec

Horizontal Axis: Concurrent Instances

Page 16: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

TCP Endpoints allow Worker-to-Worker Role communication directly

Offers a lower latency communication mechanism than message queues

No intermediary required

Science Cloud Workshop June 21st, 2010 16

Page 17: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 17

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Cu

mu

lati

ve P

erc

en

tag

e

MB/sec

Bandwidth

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0 20 40 60 80 100

Cu

mu

lati

ve P

erc

en

tag

e

RTT ms

Latency

Page 18: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

TCP performance can change dramatically, why?

Science Cloud Workshop June 21st, 2010 18

0

20

40

60

80

100

Time

MB/s

Azure to Azure TCP Bandwidth over Time

Page 19: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Normal SQL Server capabilities (RDBMS) Size limited to <50GB per database Tested with TPC-E benchmark for OLTP

workload

Our .NET implementation of the benchmark

Simulates a brokerage house

Testing DB is 3GB in size

Science Cloud Workshop June 21st, 2010 19

Page 20: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 20

00.05

0.10.15

0.20.25

0.30.35

0.40.45

Avg

. Tra

ns.

Tim

e in

Se

con

ds

Single Thread TPCE Client Local to LAN

Azure to Azure

Page 21: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 21

75%

100%

125%

150%

175%

200%

225%

1 2 4 6 8 16 20 30 40 50 64

Number of Concurrent Threads

Azure Cloud LAN Server

Normalized average TPCE transaction time (only committed transactions)

Page 22: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Science Cloud Workshop June 21st, 2010 22

0%10%20%30%40%50%60%70%80%90%

100%

1 2 4 6 8 16 20 30 40 50

Number of Concurrent Client Threads

Azure Cloud

LAN server

Normalized Percent of Transactions Committed per Client

Thread

Page 23: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Deployment size expected client slowdown and service throughput

Deployment scaling is slower than initial deployment, web roles slower than worker roles, large VMs slower than small VMs

VM deployment can take a long time depending on how many are requested

Distribute blob accesses across as many containers as possible to achieve performance at scale

Science Cloud Workshop June 21st, 2010 23

Page 24: Zach Hill, Jie Li, Ming Mao, Arkaitz Ruiz-Alvarez, Marty ...datasys.cs.iit.edu/events/ScienceCloud2010/s01.pdfExpect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially

Access tables by partition and row key. Property filters are slow

Tables scale well for query and insert, but watch out for delete and update – this is expected

Expect SQL Azure 2x slowdown SQL Azure scales reasonably well, especially under

30 or less concurrent clients SQL Azure performance over time: low variability

Science Cloud Workshop June 21st, 2010 24