AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

description

We present our work on creating sustainable science services using Globus, Amazon Web Services, and the Galaxy framework. We focus on Globus Genomics as a successful use case.

Page 1

AWS Government, Education, and Nonprofits Symposium

Washington, DC | June 24, 2014 - June 26, 2014

Science as a Service on AWS

Ravi K Madduri

Page 2

Page 3

Outline

• CI Mission and Introduction of Science as a Service
• Motivation
– Why is this important?
• Separation of concerns – Going far together
• Examples of Science as a Service
• Focus on Globus Genomics as a success story
– Announcing Globus Genomics AWS Test Drive

Page 4

Our Vision for a 21st Century Discovery Infrastructure

Provide more capability for people at lower cost by delivering science as a service

www.globus.org

Page 5

Two Broader Themes

• Productivity of researchers
– Time spent performing administrative tasks vs. time spent doing science
– Reproducibility
• Sustainability of scientific software
– Reduction in funding for science

Page 6

Time-consuming tasks in science

• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports

Page 7

Page 8

Page 9

42%

Page 10

Presenting 21st Century Discovery Infrastructure

Page 11

Going Far Together

Separation of Concerns

Page 12

Our Science Stack

• Galaxy
– Interactive execution
– Creation, Execution, Sharing, Discovering Workflows
• Globus
– Data management
– Identity Management
• AWS
– EC2, EBS, S3, SNS, Spot, Route 53, CloudFormation

SaaS

PaaS

IaaS

Page 13

Time-consuming tasks in science

• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature

Page 14

Globus: Fast, reliable data transfer

Data Source, Data Destination

1. User initiates transfer request
2. Globus moves and syncs files
3. Globus notifies user

Page 15

Amazon S3 Endpoints

Page 16

Globus: Sharing off existing systems

Data Source

1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus tracks shared files; no need to move files to cloud storage!
3. User B logs into Globus and accesses shared file

Page 17

MyProxy

Globus: Federated identity

Page 18

Globus Transfer

>25,000 registered users; >150 daily
50 PB moved; >1B files
10x (or better) performance vs. scp
99.9% availability
Entirely hosted on Amazon

Page 19

Globus: Data publication service

[Diagram labels: Community; Collection; Policies (Metadata, Access Control, License, Storage, Curation Workflow); Dataset (Data, Metadata), shown three times]

Page 20

Time-consuming tasks in science

• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports

Page 21

Globus Science Stack in Action

Data Management

• Data endpoints shown: Sequencing Centers, Public Data, Storage, Local Cluster/Cloud, Seq Center, Research Lab
• Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints
• Transfer paths shown: FTP, SCP, HTTP, others; Globus SaaS

Data Analysis

• Globus Genomics on Amazon EC2
• Galaxy Based Workflow Management System
• Globus integrated within Galaxy
• Web-based UI
• Drag-and-drop workflow creation
• Easily modify workflows with new tools
• Analytical tools are automatically run on the scalable compute resources when possible
• Galaxy Data Libraries

[Diagram labels: Fastq, Ref Genome, Alignment, Variant Calling, Picard, GATK]

Page 22

Flexible, scalable, affordable genomics analysis for all biologists

Page 23

Globus Genomics

• Analysis tools profiled for optimal performance

• Workload management for parallel execution

• Resources provisioned on demand

• High performance, reliable data movement

• Seamless access using institution’s credentials

• Best practice + extensible, customizable pipelines

Page 24

Globus Climate

Page 25

Globus Materials

Page 26

Cardiovascular Research

Page 27

Proton Cancer Treatment

No. Histories | Execution Time (s) | No. Per Hour | On-demand Cost ($2.10) | Spot Cost ($0.50)
1.5B | 570 | 6 | $35 | $9
1B | 445 | 8 | $27 | $7
0.5B | 283 | 12 | $18 | $5
0.25B | 170 | 21 | $10 | $2
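
As a rough cross-check of the figures above, the "No. Per Hour" values match the number of whole runs that fit in one hour at the listed execution time, that is, floor(3600 / execution time). The Python sketch below reproduces that column and derives a per-run cost from the hourly prices in the column headers ($2.10 on-demand, $0.50 spot). The function names are hypothetical, and the cost model (hourly price divided by runs per hour) is an assumption that the slide does not state. The slide's cost figures look like these per-run values scaled by 100 and rounded, but the slide gives no unit, so the sketch does not try to reproduce them exactly.

```python
# Hypothetical helper names; the rows are copied from the slide's table.
ON_DEMAND_PER_HOUR = 2.10  # USD per instance-hour, from the column header
SPOT_PER_HOUR = 0.50       # USD per instance-hour, from the column header

# (number of histories, execution time in seconds)
ROWS = [("1.5B", 570), ("1B", 445), ("0.5B", 283), ("0.25B", 170)]


def runs_per_hour(execution_seconds: int) -> int:
    """Whole runs that complete within one hour."""
    return 3600 // execution_seconds


def cost_per_run(hourly_price: float, execution_seconds: int) -> float:
    """Assumed model: one hour's price spread over the runs it fits."""
    return hourly_price / runs_per_hour(execution_seconds)


if __name__ == "__main__":
    for histories, seconds in ROWS:
        n = runs_per_hour(seconds)
        on_demand = cost_per_run(ON_DEMAND_PER_HOUR, seconds)
        spot = cost_per_run(SPOT_PER_HOUR, seconds)
        print(f"{histories:>6}  {seconds:>4}s  {n:>2}/h  "
              f"on-demand ${on_demand:.3f}  spot ${spot:.3f}")
```

Under this assumed model, the spot price lowers the per-run cost by the ratio 2.10 / 0.50, about 4.2 times, regardless of run length.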

Page 28

Usage has been promising

[Chart: Instance Hours and Cost ($) by date, January through June]

2.5 Million Core hours

Page 29

Exome: 3 – 12hrs ~1hr

Whole Genome: ~22hrs ~10hrs

RNA-Seq: 1 – 12hrs ~minutes

Page 30

Diversity of collaborations

Dobyns Lab
Cox Lab
Volchenboum Lab
Olopade Lab
Nagarajan Lab

Page 31

Common misconceptions

• Cloud is expensive
• Cloud is insecure
• It takes a long time to move data and it’s hard
• Cloud is about VMs and we got VMs
• My codes won’t run on the cloud
• Cloud is not HPC-enough
• Amazon will be acquired or will file for bankruptcy
– What happens to my data?

Page 32

Possible Solutions

• Outreach
• Case studies with TCO for various domains and problem types
• Compliance
• Transparency in Billing

Page 33

Our Vision for a 21st Century Discovery Infrastructure

To make advanced computational capabilities available to all researchers at substantially lower cost

Page 34

We’re “all in” on cloud

Identify time-consuming activities amenable to automation and outsourcing, and deliver them as high-quality, low-touch SaaS

Extract common elements as a research data management automation PaaS

Leverage IaaS for reliability, economies of scale

Page 35

Thank you to our sponsors!