Post on 08-Jan-2018
Cloud Computing for the Automated Assignment of Broadband Rotational Spectra: Porting Autofit to Amazon EC2
A thesis by Aaron C. Olinger
[Figure: Speed up Factor (y-axis) vs. Number of Instances (x-axis)]
Autofit: Fitting Rotational Spectra [2]
[1]: Phillips, M., Pruning Interstellar Space: Rotational Spectroscopy of Small Organic Molecules. Senior Thesis, New College of Florida (2013)
[2]: N.A. Seifert, I.A. Finneran, C. Perez, D.P. Zaleski, J.L. Neill, A.L. Steber, R.D. Suenram, A. Lesarri, S.T. Shipman, B.H. Pate, "Autofit, an Automated Fitting Tool for Broadband Rotational Spectra, and Applications to 1-Hexanal", Journal of Molecular Spectroscopy (2015)
[1]
Distributed Computing
[Diagram: Distributor Algorithm; a Server connected to three Clients, each running an App]
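As a rough illustration of the server-to-clients idea in the diagram, the sketch below deals a list of work items out to a fixed number of clients. It is not Autofit's distributor algorithm, which the slides do not describe; the function name `distribute`, the round-robin rule, and the example data are all assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def distribute(work_items: Sequence[T], n_clients: int) -> List[List[T]]:
    """Deal work items out round-robin (hypothetical rule, not Autofit's).

    Client i receives items i, i + n_clients, i + 2 * n_clients, ...
    so the client loads differ by at most one item.
    """
    if n_clients < 1:
        raise ValueError("n_clients must be at least 1")
    return [list(work_items[i::n_clients]) for i in range(n_clients)]


# Hypothetical example: 10 work items shared among the 3 clients in the diagram.
shares = distribute(range(10), 3)
for client_id, share in enumerate(shares):
    print(f"client {client_id}: {share}")
```

Round-robin is used here only because it keeps the shares balanced without knowing how long each item takes; a real distributor might instead hand out work on demand.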
Cloud Computing
General Purpose – Current Generation [1]

Instance     vCPU  ECU       Memory (GiB)  Instance Storage (GB)  Linux/Unix Usage
t2.micro     1     Variable  1             EBS Only               $0.013
m3.medium    1     3         3.75          1 x 4 SSD              $0.070
m3.large     2     6.5       7.5           1 x 32 SSD             $0.140
m3.xlarge    4     13        15            2 x 40 SSD             $0.280
m3.2xlarge   8     29        30            2 x 80 SSD             $0.560

Compute Optimized – Current Generation [1]

Instance     vCPU  ECU       Memory (GiB)  Instance Storage (GB)  Linux/Unix Usage
c4.large     2     8         3.75          EBS Only               $0.116
c4.xlarge    4     16        7.5           EBS Only               $0.232
c4.8xlarge   36    132       60            EBS Only               $1.856
c3.large     2     7         3.75          2 x 16 SSD             $0.105
c3.8xlarge   32    108       60            2 x 320 SSD            $1.680

• Rates charged at the start of each hour
[1]: Amazon Web Services. Instance Types. http://aws.amazon.com/ec2/instance-types/ (Accessed May 5, 2015)
Creating the Distribution Network
Performance t2.micro
[Figure: t2.micro Speed up factor and Ideal. Series: Speed Up Factor, Ideal Parallelization. x-axis: Number of Instances; y-axis: Speed up Factor]
[Figure: t2.micro Time to Complete. Series: Trial 1, Trial 2, Trial 3, Trial 4. x-axis: Number of Instances; y-axis: Time to Complete (s)]
General Purpose – Current Generation

Instance     vCPU  ECU       Memory (GiB)  Instance Storage (GB)  Linux/Unix Usage
t2.micro     1     Variable  1             EBS Only               $0.013 (+750 hr free per month)
Performance c4.large
[Figure: c4.large mean completion time. x-axis: Instances; y-axis: Time to Complete (s)]
Trend line: f(x) = 520.851888670618 x^-0.877354094684018, R² = 0.996683951187583
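The trend line reported for the c4.large mean completion time can be evaluated directly. The sketch below copies the two fitted coefficients from the slide and derives the speed-up and parallel efficiency that the fit implies. The function names are assumptions, and reading speed-up as f(1)/f(n) is one common definition that the slides do not spell out.

```python
# Coefficients copied from the power-law trend line on the slide.
A = 520.851888670618      # fitted completion time on one instance, in seconds
B = -0.877354094684018    # fitted exponent


def fitted_time(n_instances: int) -> float:
    """Completion time in seconds predicted by the fit f(x) = A * x**B."""
    return A * n_instances ** B


def fitted_speedup(n_instances: int) -> float:
    """Speed-up implied by the fit, taken as f(1) / f(n) = n ** (-B)."""
    return fitted_time(1) / fitted_time(n_instances)


def fitted_efficiency(n_instances: int) -> float:
    """Fraction of ideal (linear) speed-up retained at n instances."""
    return fitted_speedup(n_instances) / n_instances


for n in (1, 5, 10, 20):
    print(n, round(fitted_time(n), 1), round(fitted_speedup(n), 2),
          round(fitted_efficiency(n), 2))
```

Because the fitted exponent is greater than -1, the implied speed-up grows more slowly than the instance count, so it stays below the ideal parallelization line.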
[Figure: c4.large Speed up factor and Ideal. Series: Speed Up Factor, Ideal Parallelization. x-axis: Number of Instances; y-axis: Speed up Factor]
Time to complete

                     Desktop (s)  Laptop (s)  Desktop (minutes)  Laptop (minutes)
Mean                 600          1595        10                 26.6
Standard deviation   35           32          0.58               0.53
Compute Optimized – Current Generation

Instance     vCPU  ECU       Memory (GiB)  Instance Storage (GB)  Linux/Unix Usage
c4.large     2     8         3.75          EBS Only               $0.116
Cost Analysis
[Figure: Predicted Cost for 3 Day Job Using c4.large Instances. x-axis: Instances running; y-axis: Cost ($)]
Compute Optimized – Current Generation

Instance     vCPU  ECU       Memory (GiB)  Instance Storage (GB)  Linux/Unix Usage
c4.large     2     8         3.75          EBS Only               $0.116

Number of instances  20     15     10     5     3     2      1
Time (hr)            5.2    6.6    8.4    15.6  24.9  36.9   72.0
Cost ($)             13.92  12.18  10.44  9.28  8.7   8.584  8.468
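The slides do not state how the predicted costs were computed. One rule that reproduces every value in the table is to bill each instance for floor(hours) + 1 hours at the c4.large price, which fits the earlier note that rates are charged at the start of each hour. The sketch below applies that rule; the billing rule and the reading of the $0.116 price as an hourly rate are inferences from the table, and the function name is an assumption.

```python
import math

RATE = 0.116  # c4.large Linux/Unix usage price, assumed to be per hour


def predicted_cost(n_instances: int, hours: float, rate: float = RATE) -> float:
    """Cost in dollars, billing each instance floor(hours) + 1 hours.

    The floor(hours) + 1 rule is inferred from the tabulated costs,
    not stated in the slides.
    """
    billed_hours = math.floor(hours) + 1
    return round(n_instances * billed_hours * rate, 3)


# Run times (hr) for each instance count, copied from the table.
run_times = {20: 5.2, 15: 6.6, 10: 8.4, 5: 15.6, 3: 24.9, 2: 36.9, 1: 72.0}
costs = {n: predicted_cost(n, t) for n, t in run_times.items()}

for n, cost in costs.items():
    print(f"{n:>2} instances: {run_times[n]:>5} hr -> ${cost}")
```

Under this rule the cost rises with instance count because every instance pays for a partly used final hour, while the total compute time shrinks more slowly than the number of instances grows.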
Future Work
• Run additional tests on larger data sets
• Complete benchmarking for more instance types
• Incorporate t2 instances into network
Acknowledgements
This thesis was made possible through the work of:
• Pate Group (UVa)
• Maria Phillips
• Ian Finneran
• Dr. Nathan Seifert (UVa)
• Dr. Steve Shipman
• Noah Anderson