Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services
-
Upload
rafael-ferreira-da-silva -
Category
Technology
-
view
556 -
download
2
Transcript of Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services
PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES
Hassan Nawaz, Gideon Juve, Rafael Ferreira da Silva, Ewa DeelmanUSC Information Sciences Institute
18th Workshop on Advances in Parallel and Distributed Computational Models30th IEEE International Parallel & Distributed Processing Symposium
May 23, 2016 – Chicago, USA
OUTLINE
Introduction Experiment Conditions Storage Configurations
Evaluation Performance Benchmarking Summary
Scientific workflowsMotivationGoals
Scientific applicationExecution environment
ConclusionsFuture Research Directions
Cloud storageVirtual machine storageSubmit host
Network I/OSSD I/O
MakespanCumulative execution timeData transfer time
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services 2Pegasus
3
INTRODUCTION
>>Scientific WorkflowsLarge scale computationsProvenance
Data-intensive WorkflowsCharacterized by tasks that consume large volumes of data, and the application makespan is dominated by the processing of data movement and I/O operations
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services
job
dependencyUsuallydatadependencies
split
merge
pipeline
Command-lineprograms
DAGdirected-acyclic graphs
Pegasus
4
MOTIVATION
>>>Scientific WorkflowsCampus clustersNational cyberinfrastructures
Cloud ComputingPredictable performanceQuality of the service
Challenge: run data-intensive application on cloudsClouds were not designed for the execution of complex simulations and I/O intensive applications
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
AdvantagesOn-demand resource provisioningAbility to store virtual machines (VM)Resource monitoringFull control of the execution environment
5
GOALS
OCTOBER 2014
Therefore…There is an incentive for researchers to explore avenues to reduce the cost of executing workflows, while increasing their efficiency
In this work…
Practical evaluation of the performance of anI/O-intensive scientific workflow (Montage)
Cloud environments:
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
Cost efficiencyApplication deadline
Energy-aware schedulingOn-demand resource provisioning
…GoogleCompute Engine
6
SCIENTIFIC APPLICATION
>Mon
tage
10,429 jobsreads 4.2 GB of input dataproduces 7.9 GB of output data
Montage is an astronomy application that creates astronomical image mosaics using data collected from telescopes
An illustrative representation of the Montage workflow
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
A workflow instance operates over about
23K intermediate files, where most of them
have a few MBs6
6190
2 199
5931
10598
0
3000
6000
9000
0 256 4K 64K 512K 4MFile Size (Bytes)
# Fi
les
>
EXECUTION ENVIRONMENT
Pegasus Workflow Management Systemhttp://pegasus.isi.edu
Automates complex, multi-stage processing pipelinesEnables parallel, distributed computationsAutomatically executes data transfersHandles failures with to provide reliability
Amazon: m3.2xlargeGoogle: n1-standard-8
8 cores per node, 30GB of memory,and 70GB SSD
7
VM Instance Types cleanup jobRemovesunuseddata
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
>stage-in job
stage-out job
registration job
Transferstheworkflowinputdata
Transferstheworkflowoutputdata
Registerstheworkflowoutputdata
DATA MOVEMENT TOOLS
aws-cli
8
In order to measure the actual overhead involved on data transfers, weinitially limit the transfer mechanisms to single-threaded mode
gsutil pegasus-s3
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
Amazon standard client Google standard clientPegasus standard client built on top of
standard Amazon API
9
STORAGE CONFIGURATION DEPLOYMENTS
However, storing all intermediate files in a storage service may be costly
Intermediate files are stored into object storage.It is expected to be the most commonly used in cloud
environments due to its simplicity
CLO
UD
STO
RA
GE
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
objectstorage
submit host(e.g.,user’slaptop)
10
STORAGE CONFIGURATION DEPLOYMENTS
Although it may reduce the monetary cost, it may not be scalable for very large workflow executions
Intermediate files are stored locally to disks attached to the VM
Helps quantify the overhead of other storage configurations
VM S
TOR
AG
E
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
localdisk
VirtualMachineB
VirtualMachineA
submit host(e.g.,user’slaptop)
localdisk
11
STORAGE CONFIGURATION DEPLOYMENTS
Unlikely to be used in real production workflows due to the high latency in transferring data to the submit host
Intermediate files are stored at the submit hostUseful in low cost scenarios, or local analyses in the intermediate results
SUB
MIT
HO
ST
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
submit host(e.g.,user’slaptop)
Typical OSG sitesOpenScience Grid
12
OVERALL MAKESPAN EVALUATION
The VM storage configuration outperforms all other configurations due to the absence of transferring
intermediate files
Average workflow makespan, the turnaround time for a workflow to complete its execution
Average makespan values for 3 runs of the Montage Workflow for different storage configurations>
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
The performance gain on storing data locally is 400% when compared to the Cloud Storage configuration, and up to 580% in relation to the Submit Host configuration
13
EVALUATION
The execution time for the VM Storage configuration is larger due to the execution of local data movement
operations (waiting for concurrent I/O operations)
Data transfer operations become a bottleneck in the Cloud Storage and Submit Host configurations
VM S
tora
ge
Subm
it H
ost
Clo
ud S
tora
ge
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
0
4000
8000
12000
Makespan Execution Time Transfer Time
Tim
e (s
)
AmazonGoogle
14
BENCHMARKING
Network I/OEvaluated in the Cloud Storage scenario(files are stored in an object store)
>
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
●●●●●●
●
●
●
●
●●●●●
●
●
●●●
●●●●
●
●●●
●●●
●●
●●
●
●
●●●●
●●●●●
●●●●●
●●●●
●
●●
●
●
●
●
●●●
●
●●
●
●
●●●●
●●●●●
●●●●●
1
Hourly Downloads
Log 1
0(Ti
me
in s
econ
ds)
● aws−cli gsutil pegasus−s3
Time series download times from an object store to a VM(May 12, 2015 to May 18, 2015)
The goal in transferring an empty file is to measure the overhead induced by the system
The merge job stages in over 6K small files with average size of 0.3KB
The performance of these operations are of utmost importance
15
NETWORK I/Opegasus-s3 has better performance
for small file sizes in mostly cases
Poor performance of the gsutil tool may include network latency and increased load
Upl
oad
Dow
nloa
d10
20
30
0B 10KB 100KB 1MB 10MB 100MB 1GBFile Size
Tim
e (s
)
aws−cligsutilpegasus−s3
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
10
20
0B 10KB 100KB 1MB 10MB 100MB 1GBFile Size
Tim
e (s
)
aws−cligsutilpegasus−s3
16
NETWORK I/O
We used tcpdump and wireshark to trace TCP packets
We ran the transfer tools in the debug mode and evaluated all request operationsBytes per 0.01s transferred per TCP connection
<
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
aws-cli uses https over all operations, which generates an additional overhead of Transport Level Security (TLS)
●●●●●●●●
●
●●●●●●●●●
●
●●●●●●●●●●●●●
●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●0
3000
6000
9000
Time (s)
Byte
s pe
r tic
k (0
.01s
)
● pegasus−s3 aws−cli (TCP 1) aws−cli (TCP 2)
Amazon client (aws-cli) uses two TCP connections to perform a copy command, while pegasus-s3 uses only one
TLS overheadGET request gsutil also establishes two TCP connections, but two GET requests
17
BENCHMARKING
Amazon provides a consistent baseline throughput of 3 IOPS per GB and handles
bursts up to 3000 IOPS per volume
Executed dd with a block size of 4MB and one thousand blocksPerformed 100 sequential iterations
I/O T
hrou
ghpu
t
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
2550
7510
0
10 20 30 40 50 60 70 80 90 100
Iteration
Thro
ughp
ut (M
B/s)
● Google AmazonSSD I/OEvaluated in the VM Storage scenario(files are stored in an attached disk)
> Amazon General Purpose SSD / Google Persistent SSD
Amazon’s bursttolerance policy
Alternative solution: Provisioned IOPS SSD VolumeDrawback: may significantly increase the cost
MULTI-THREADED DATA TRANSFER
Single-threaded ModeFacilitates the detection/evaluation of performance issues
Not often used in production environments
18
Multi-threaded Mode
H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
>
0
5000
10000
15000
Single−thread Multi−threaded
Mak
espa
n (s
)
AmazonGoogle
Average workflow makespan for 3 runs of the Montage workflow using singe- and multi-threaded mode for data transfer
Workflow runs using the Cloud Storage configuration
Gradually increased the number of threads used to execute the transfer operations. A reduction in the makespan was observed up to 5 threads
Makespan for Amazon is 21% lower, while for Google the improvement is of 32%
>
SUMMARY
Summary Future Research DirectionsConclusionFuture Research Directions
For workflows that operate over a large number of (small) files, the performance may be poor. A possible solution to mitigate this overhead is to use a bulk mechanism to concatenate and manage files within a single transfer request (similar to multipart strategy)
The performance difference among data transfer clients is mostly due to the number of connections established to perform a transfer operation. If secured connections is not a requirement, not using them could significantly increase the performance
19H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
>
SUMMARY
Summary Future Research DirectionsConclusionFuture Research Directions
Cloud computing is constantly evolving
20H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
Time series download times from an object store to a VM(Oct 28, 2015 to Dec 02, 2015)
>
SUMMARY
Summary ContributionsConclusionFuture Research Directions
An evaluation of the impact of varying storage configurations on the performance of an I/O-intensive workflow
A quantitative analysis of application performance on popular cloud systems using provenance data
A comprehensive analysis of benchmarking file transfer times of different sizes using different cloud tools
A discussion on indicators that would significantly improve the performance of I/O-intensive workflows on cloud environments
21H. Nawaz, G. Juve, R. Ferreira da Silva, E. DeelmanPerformance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web ServicesPegasus
PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES
Rafael Ferreira da Silva, Ph.D.Computer Scientist, USC Information Sciences [email protected] – http://rafaelsilva.com
Thank You
Questions?
http://pegasus.isi.edu
H. Nawaz, G. Juve, R. Ferreira da Silva, E. Deelman