Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

22
PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES Hassan Nawaz, Gideon Juve, Rafael Ferreira da Silva, Ewa Deelman USC Information Sciences Institute 18th Workshop on Advances in Parallel and Distributed Computational Models 30th IEEE International Parallel & Distributed Processing Symposium May 23, 2016 – Chicago, USA

Transcript of Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

Page 1: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES

Hassan Nawaz, Gideon Juve, Rafael Ferreira da Silva, Ewa DeelmanUSC Information Sciences Institute

18th Workshop on Advances in Parallel and Distributed Computational Models30th IEEE International Parallel & Distributed Processing Symposium

May 23, 2016 – Chicago, USA

Page 2: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

OUTLINE

Introduction Experiment Conditions Storage Configurations

Evaluation Performance Benchmarking Summary

Scientific workflowsMotivationGoals

Scientific applicationExecution environment

ConclusionsFuture Research Directions

Cloud storageVirtual machine storageSubmit host

Network I/OSSD I/O

MakespanCumulative execution timeData transfer time

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services 2Pegasus

Page 3: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

3

INTRODUCTION

>>Scientific WorkflowsLarge scale computationsProvenance

Data-intensive WorkflowsCharacterized by tasks that consume large volumes of data, and the application makespan is dominated by the processing of data movement and I/O operations

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

job

dependencyUsuallydatadependencies

split

merge

pipeline

Command-lineprograms

DAGdirected-acyclic graphs

Pegasus

Page 4: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

4

MOTIVATION

>>>Scientific WorkflowsCampus clustersNational cyberinfrastructures

Cloud ComputingPredictable performanceQuality of the service

Challenge: run data-intensive application on cloudsClouds were not designed for the execution of complex simulations and I/O intensive applications

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

AdvantagesOn-demand resource provisioningAbility to store virtual machines (VM)Resource monitoringFull control of the execution environment

Page 5: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

5

GOALS

OCTOBER 2014

Therefore…There is an incentive for researchers to explore avenues to reduce the cost of executing workflows, while increasing their efficiency

In this work…

Practical evaluation of the performance of anI/O-intensive scientific workflow (Montage)

Cloud environments:

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

Cost efficiencyApplication deadline

Energy-aware schedulingOn-demand resource provisioning

…GoogleCompute Engine

Page 6: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

6

SCIENTIFIC APPLICATION

>Mon

tage

10,429 jobsreads 4.2 GB of input dataproduces 7.9 GB of output data

Montage is an astronomy application that creates astronomical image mosaics using data collected from telescopes

An illustrative representation of the Montage workflow

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

A workflow instance operates over about

23K intermediate files, where most of them

have a few MBs6

6190

2 199

5931

10598

0

3000

6000

9000

0 256 4K 64K 512K 4MFile Size (Bytes)

# Fi

les

>

Page 7: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

EXECUTION ENVIRONMENT

Pegasus Workflow Management Systemhttp://pegasus.isi.edu

Automates complex, multi-stage processing pipelinesEnables parallel, distributed computationsAutomatically executes data transfersHandles failures with to provide reliability

Amazon: m3.2xlargeGoogle: n1-standard-8

8 cores per node, 30GB of memory,and 70GB SSD

7

VM Instance Types cleanup jobRemovesunuseddata

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

>stage-in job

stage-out job

registration job

Transferstheworkflowinputdata

Transferstheworkflowoutputdata

Registerstheworkflowoutputdata

Page 8: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

DATA MOVEMENT TOOLS

aws-cli

8

In order to measure the actual overhead involved on data transfers, weinitially limit the transfer mechanisms to single-threaded mode

gsutil pegasus-s3

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

Amazon standard client Google standard clientPegasus standard client built on top of

standard Amazon API

Page 9: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

9

STORAGE CONFIGURATION DEPLOYMENTS

However, storing all intermediate files in a storage service may be costly

Intermediate files are stored into object storage.It is expected to be the most commonly used in cloud

environments due to its simplicity

CLO

UD

STO

RA

GE

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

objectstorage

submit host(e.g.,user’slaptop)

Page 10: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

10

STORAGE CONFIGURATION DEPLOYMENTS

Although it may reduce the monetary cost, it may not be scalable for very large workflow executions

Intermediate files are stored locally to disks attached to the VM

Helps quantify the overhead of other storage configurations

VM S

TOR

AG

E

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

localdisk

VirtualMachineB

VirtualMachineA

submit host(e.g.,user’slaptop)

localdisk

Page 11: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

11

STORAGE CONFIGURATION DEPLOYMENTS

Unlikely to be used in real production workflows due to the high latency in transferring data to the submit host

Intermediate files are stored at the submit hostUseful in low cost scenarios, or local analyses in the intermediate results

SUB

MIT

HO

ST

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

submit host(e.g.,user’slaptop)

Typical OSG sitesOpenScience Grid

Page 12: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

12

OVERALL MAKESPAN EVALUATION

The VM storage configuration outperforms all other configurations due to the absence of transferring

intermediate files

Average workflow makespan, the turnaround time for a workflow to complete its execution

Average makespan values for 3 runs of the Montage Workflow for different storage configurations>

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

The performance gain on storing data locally is 400% when compared to the Cloud Storage configuration, and up to 580% in relation to the Submit Host configuration

Page 13: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

13

EVALUATION

The execution time for the VM Storage configuration is larger due to the execution of local data movement

operations (waiting for concurrent I/O operations)

Data transfer operations become a bottleneck in the Cloud Storage and Submit Host configurations

VM S

tora

ge

Subm

it H

ost

Clo

ud S

tora

ge

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

0

4000

8000

12000

Makespan Execution Time Transfer Time

Tim

e (s

)

AmazonGoogle

Page 14: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

14

BENCHMARKING

Network I/OEvaluated in the Cloud Storage scenario(files are stored in an object store)

>

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

●●●●●●

●●●●●

●●●

●●●●

●●●

●●●

●●

●●

●●●●

●●●●●

●●●●●

●●●●

●●

●●●

●●

●●●●

●●●●●

●●●●●

1

Hourly Downloads

Log 1

0(Ti

me

in s

econ

ds)

● aws−cli gsutil pegasus−s3

Time series download times from an object store to a VM(May 12, 2015 to May 18, 2015)

The goal in transferring an empty file is to measure the overhead induced by the system

The merge job stages in over 6K small files with average size of 0.3KB

The performance of these operations are of utmost importance

Page 15: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

15

NETWORK I/Opegasus-s3 has better performance

for small file sizes in mostly cases

Poor performance of the gsutil tool may include network latency and increased load

Upl

oad

Dow

nloa

d10

20

30

0B 10KB 100KB 1MB 10MB 100MB 1GBFile Size

Tim

e (s

)

aws−cligsutilpegasus−s3

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

10

20

0B 10KB 100KB 1MB 10MB 100MB 1GBFile Size

Tim

e (s

)

aws−cligsutilpegasus−s3

Page 16: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

16

NETWORK I/O

We used tcpdump and wireshark to trace TCP packets

We ran the transfer tools in the debug mode and evaluated all request operationsBytes per 0.01s transferred per TCP connection

<

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

aws-cli uses https over all operations, which generates an additional overhead of Transport Level Security (TLS)

●●●●●●●●

●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●0

3000

6000

9000

Time (s)

Byte

s pe

r tic

k (0

.01s

)

● pegasus−s3 aws−cli (TCP 1) aws−cli (TCP 2)

Amazon client (aws-cli) uses two TCP connections to perform a copy command, while pegasus-s3 uses only one

TLS overheadGET request gsutil also establishes two TCP connections, but two GET requests

Page 17: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

17

BENCHMARKING

Amazon provides a consistent baseline throughput of 3 IOPS per GB and handles

bursts up to 3000 IOPS per volume

Executed dd with a block size of 4MB and one thousand blocksPerformed 100 sequential iterations

I/O T

hrou

ghpu

t

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

2550

7510

0

10 20 30 40 50 60 70 80 90 100

Iteration

Thro

ughp

ut (M

B/s)

● Google AmazonSSD I/OEvaluated in the VM Storage scenario(files are stored in an attached disk)

> Amazon General Purpose SSD / Google Persistent SSD

Amazon’s bursttolerance policy

Alternative solution: Provisioned IOPS SSD VolumeDrawback: may significantly increase the cost

Page 18: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

MULTI-THREADED DATA TRANSFER

Single-threaded ModeFacilitates the detection/evaluation of performance issues

Not often used in production environments

18

Multi-threaded Mode

H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

>

0

5000

10000

15000

Single−thread Multi−threaded

Mak

espa

n (s

)

AmazonGoogle

Average workflow makespan for 3 runs of the Montage workflow using singe- and multi-threaded mode for data transfer

Workflow runs using the Cloud Storage configuration

Gradually increased the number of threads used to execute the transfer operations. A reduction in the makespan was observed up to 5 threads

Makespan for Amazon is 21% lower, while for Google the improvement is of 32%

Page 19: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

>

SUMMARY

Summary Future Research DirectionsConclusionFuture Research Directions

For workflows that operate over a large number of (small) files, the performance may be poor. A possible solution to mitigate this overhead is to use a bulk mechanism to concatenate and manage files within a single transfer request (similar to multipart strategy)

The performance difference among data transfer clients is mostly due to the number of connections established to perform a transfer operation. If secured connections is not a requirement, not using them could significantly increase the performance

19H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

Page 20: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

>

SUMMARY

Summary Future Research DirectionsConclusionFuture Research Directions

Cloud computing is constantly evolving

20H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

Time series download times from an object store to a VM(Oct 28, 2015 to Dec 02, 2015)

Page 21: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

>

SUMMARY

Summary ContributionsConclusionFuture Research Directions

An evaluation of the impact of varying storage configurations on the performance of an I/O-intensive workflow

A quantitative analysis of application performance on popular cloud systems using provenance data

A comprehensive analysis of benchmarking file transfer times of different sizes using different cloud tools

A discussion on indicators that would significantly improve the performance of I/O-intensive workflows on cloud environments

21H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus

Page 22: Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES

Rafael Ferreira da Silva, Ph.D.Computer Scientist, USC Information Sciences [email protected] – http://rafaelsilva.com

Thank You

Questions?

http://pegasus.isi.edu

H. Nawaz, G. Juve, R. Ferreira da Silva, E. Deelman