AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

Post on 21-Jun-2015


We present our work on creating sustainable science services using Globus, Amazon Web Services, and the Galaxy framework. We focus on Globus Genomics as a successful use case.

Transcript of AWS Public Sector Summit 2014 Talk - Science as a Service using AWS

AWS Government, Education, and Nonprofits Symposium

Washington, DC | June 24, 2014 - June 26, 2014


madduri@anl.gov

Science as a Service on AWS

Ravi K Madduri


Outline

• CI Mission and Introduction of Science as a Service
• Motivation
  – Why is this important?
• Separation of concerns – Going far together
• Examples of Science as a Service
• Focus on Globus Genomics as a success story
  – Announcing Globus Genomics AWS Test Drive


Our Vision for a 21st Century Discovery Infrastructure

Provide more capability for people at lower cost by delivering Science as a service

www.globus.org


Two Broader Themes

• Productivity of Researchers
  – Time spent performing administrative tasks vs. time spent doing science
  – Reproducibility
• Sustainability of scientific software
  – Reduction in funding for science


Time-consuming tasks in science

• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports


42%


Presenting 21st Century Discovery Infrastructure


Going Far Together

Separation of Concerns


Our Science Stack

• Galaxy (SaaS)
  – Interactive execution
  – Creation, Execution, Sharing, Discovering Workflows
• Globus (PaaS)
  – Data management
  – Identity Management
• AWS (IaaS)
  – EC2, EBS, S3, SNS, Spot, Route 53, CloudFormation



Globus: Fast, reliable data transfer

Data Source → Data Destination

1. User initiates transfer request
2. Globus moves and syncs files
3. Globus notifies user


Amazon S3 Endpoints


Globus: Sharing off existing systems

Data Source

1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus tracks shared files; no need to move files to cloud storage!
3. User B logs into Globus and accesses shared file


MyProxy

Globus: Federated identity


Globus Transfer

• >25,000 registered users; >150 daily
• 50 PB moved; >1B files
• 10x (or better) performance vs. scp
• 99.9% availability
• Entirely hosted on Amazon


Globus: Data publication service

[Diagram labels: Community; Collection; Policies (Metadata, Access Control, License, Storage, Curation Workflow); Dataset (Data, Metadata)]



Globus Science Stack in Action

[Diagram labels: Sequencing Centers, Public Data, Storage, Local Cluster/Cloud, Seq Center, Research Lab; connections labeled FTP, SCP, HTTP, others, and Globus SaaS]

Data Management

Globus provides a
• High-performance
• Fault-tolerant
• Secure
file transfer service between all data endpoints

Data Analysis

Galaxy-Based Workflow Management System
• Globus integrated within Galaxy
• Web-based UI
• Drag-and-drop workflow creation
• Easily modify workflows with new tools

[Diagram labels: Galaxy Data Libraries (Fastq, Ref Genome); Picard; GATK; Alignment; Variant Calling]

Globus Genomics on Amazon EC2
• Analytical tools are automatically run on the scalable compute resources when possible


Flexible, scalable, affordable genomics analysis for all biologists


Globus Genomics

• Analysis tools profiled for optimal performance
• Workload management for parallel execution
• Resources provisioned on demand
• High performance, reliable data movement
• Seamless access using institution’s credentials
• Best practice + extensible, customizable pipelines


Globus Climate


Globus Materials


Cardiovascular Research


Proton Cancer Treatment

No. Histories   Execution Time (s)   No. Per Hour   On-demand Cost ($2.10)   Spot Cost ($0.50)
1.5B            570                  6              $35                      $9
1B              445                  8              $27                      $7
0.5B            283                  12             $18                      $5
0.25B           170                  21             $10                      $2
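The "No. Per Hour" column is consistent with the number of whole runs that fit into one hour, floor(3600 / execution time). A minimal Python sketch of that arithmetic follows. It assumes that $2.10 and $0.50 are on-demand and Spot prices per instance-hour, which the slide does not state explicitly, and it reports cost in dollars per run; the slide does not say how its cost columns were scaled or rounded, so the sketch does not try to reproduce them. The function and variable names are hypothetical.

```python
# Sketch only: prices are ASSUMED to be dollars per instance-hour.
ON_DEMAND_PER_HOUR = 2.10  # assumed on-demand hourly price
SPOT_PER_HOUR = 0.50       # assumed Spot hourly price

# Execution times in seconds, keyed by "No. Histories", taken from the table.
EXECUTION_TIMES_S = {"1.5B": 570, "1B": 445, "0.5B": 283, "0.25B": 170}


def runs_per_hour(execution_time_s: int) -> int:
    """Whole runs that fit into one instance-hour."""
    return 3600 // execution_time_s


def cost_per_run(hourly_price: float, execution_time_s: int) -> float:
    """Dollars per run when an hour is billed in full and filled with whole runs."""
    return hourly_price / runs_per_hour(execution_time_s)


if __name__ == "__main__":
    for histories, seconds in EXECUTION_TIMES_S.items():
        n = runs_per_hour(seconds)
        on_demand = cost_per_run(ON_DEMAND_PER_HOUR, seconds)
        spot = cost_per_run(SPOT_PER_HOUR, seconds)
        print(f"{histories:>6}: {n:2d} runs/hour, "
              f"on-demand ${on_demand:.3f}/run, spot ${spot:.3f}/run")
```

Under these assumptions the Spot cost per run is always the on-demand cost scaled by 0.50 / 2.10, roughly a quarter, regardless of problem size.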


Usage has been promising

[Chart: Instance Hours and Cost ($) by date, January through June]

2.5 Million Core hours


Exome: 3 – 12hrs ~1hr

Whole Genome: ~22hrs ~10hrs

RNA-Seq: 1 – 12hrs ~minutes


Diversity of collaborations

• Dobyns Lab
• Cox Lab
• Volchenboum Lab
• Olopade Lab
• Nagarajan Lab


Common misconceptions

• Cloud is expensive
• Cloud is insecure
• It takes a long time to move data and it’s hard
• Cloud is about VMs and we got VMs
• My codes won’t run on the cloud
• Cloud is not HPC-enough
• Amazon will be acquired or will file for bankruptcy
  – What happens to my data?


Possible Solutions

• Outreach
• Case studies with TCO for various domains and problem types
• Compliance
• Transparency in Billing


Our Vision for a 21st Century Discovery Infrastructure

To make advanced computational capabilities available to all researchers at substantially lower cost


We’re “all in” on cloud

Identify time-consuming activities amenable to automation and outsourcing, and deliver them as high-quality, low-touch SaaS

Extract common elements as a research data management automation PaaS

Leverage IaaS for reliability, economies of scale


Thank you to our sponsors!