Bionimbus Cambridge Workshop (3-28-11, v7)
![Page 1: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/1.jpg)
Bionimbus: A Cloud-Based Infrastructure for Managing,
Analyzing and Sharing Genomics Data
Robert Grossman
Institute for Genomics & Systems Biology
Computation Institute
University of Chicago
and
Open Cloud Consortium
March 29, 2011
![Page 2: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/2.jpg)
Part 1. Biology, Big Data & Clouds
Two of the 14 high throughput sequencers at the Ontario Institute for Cancer Research (OICR).
![Page 3: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/3.jpg)
Source: Lincoln Stein
![Page 4: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/4.jpg)
The Challenge is to Support Cubes of Next Gen Sequence Data
Perturb the environment
Different developmental stages
Different pathologies
Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie, or other data set.
![Page 5: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/5.jpg)
Genomics as a Big Data Science

| Discipline | Duration | Size | # Devices |
|---|---|---|---|
| HEP - LHC | 10 years | 15 PB/year | One |
| Astronomy - LSST | 10 years | 10 PB/year | One |
| Genomics - NGS | 2-4 years | 0.5 TB/genome | Hundreds |
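The table mixes units (PB per year for LHC and LSST, TB per genome for NGS). A minimal sketch of how the two compare, assuming decimal units (1 PB = 1000 TB); the function name and the conversion are illustrative assumptions, not from the slides:

```python
# Figures come from the table; the decimal conversion is an assumption.
TB_PER_PB = 1000  # assumed: 1 PB = 1000 TB

ANNUAL_PB = {"HEP - LHC": 15, "Astronomy - LSST": 10}
TB_PER_GENOME = 0.5


def genomes_equivalent(pb_per_year: float, tb_per_genome: float = TB_PER_GENOME) -> float:
    """Number of genomes whose data volume equals one year of the given output."""
    return pb_per_year * TB_PER_PB / tb_per_genome


if __name__ == "__main__":
    for name, pb in ANNUAL_PB.items():
        print(f"{name}: {pb} PB/year is about {genomes_equivalent(pb):,.0f} genomes")
```

Under these assumptions, a sequencing effort spread across hundreds of devices reaches LHC-scale volume once it has produced tens of thousands of genomes.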
![Page 6: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/6.jpg)
What is new about clouds?
![Page 7: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/7.jpg)
Scale is New
![Page 8: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/8.jpg)
Elastic, On-Demand Computing with Usage Based Pricing Is New
1 computer in a rack for 120 hours
costs the same as
120 computers in three racks for 1 hour
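The comparison above follows directly from usage-based pricing, where cost depends only on computer-hours consumed. A minimal sketch, assuming a flat hourly rate; the rate value and function name are hypothetical and not taken from the slides:

```python
def usage_cost(computers: int, hours: int, rate_per_computer_hour: float) -> float:
    """Usage-based cost: pay only for the computer-hours consumed."""
    return computers * hours * rate_per_computer_hour


RATE = 0.10  # hypothetical price per computer-hour

serial = usage_cost(computers=1, hours=120, rate_per_computer_hour=RATE)
parallel = usage_cost(computers=120, hours=1, rate_per_computer_hour=RATE)

# Both runs consume 120 computer-hours, so the bill is identical,
# but the parallel run returns results 120 times sooner.
print(f"serial: {serial:.2f}, parallel: {parallel:.2f}, equal: {serial == parallel}")
```

The sketch ignores effects that a real bill may include, such as data transfer or minimum billing increments.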
![Page 9: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/9.jpg)
Part 2. What is Bionimbus?
www.bionimbus.org
![Page 10: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/10.jpg)
Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.
![Page 11: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/11.jpg)
Bionimbus Private Cloud UC
Bionimbus Community Cloud
Bionimbus Private Cloud XY
Amazon
dbGaP
External Sequencers
IGSB Sequencers
BID Generator
Step 1. Get Bionimbus ID (BID), assign project, private/community, public cloud, etc.
Step 2. Send sample to be sequenced.
Step 3a. Return raw reads.
Step 3b. Return variant calls, CNV, annotation…
Step 4. Secure data routing to appropriate cloud based upon BID.
Step 5. Cloud based analysis using IGSB and 3rd party tools and applications.
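The role of the BID in Steps 1 and 4 can be sketched as a small registry: an ID is issued with a project and a target cloud, and returned data is later routed by looking up that ID. This is an illustrative toy, not the Bionimbus implementation; the class names, cloud labels, and ID format are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical cloud labels, loosely named after the boxes in the diagram.
CLOUDS = {"private-uc", "community", "private-xy", "amazon"}


@dataclass(frozen=True)
class Registration:
    bid: str      # Bionimbus ID issued in Step 1
    project: str
    cloud: str    # destination for data carrying this BID


class BidGenerator:
    """Toy stand-in for the BID Generator; the ID format is invented."""

    def __init__(self) -> None:
        self._counter = count(1)
        self._registry = {}  # maps BID string to Registration

    def register(self, project: str, cloud: str) -> Registration:
        # Step 1: issue a BID and record the project and target cloud.
        if cloud not in CLOUDS:
            raise ValueError(f"unknown cloud: {cloud}")
        bid = f"BID-{next(self._counter):06d}"
        reg = Registration(bid, project, cloud)
        self._registry[bid] = reg
        return reg

    def route(self, bid: str) -> str:
        # Step 4: choose the destination cloud from the BID alone.
        return self._registry[bid].cloud


gen = BidGenerator()
reg = gen.register(project="demo-project", cloud="community")
print(reg.bid, "->", gen.route(reg.bid))
```

Keying the routing on the BID means sequencers only need to attach the ID to returned reads; the private/community decision is made once, at registration.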
![Page 12: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/12.jpg)
What is a good unit to understand data intensive computing of biological data?
![Page 13: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/13.jpg)
Bionimbus & OSDC Today
• The NIH in the U.S. currently makes approximately 2 PB of data available for download.
• Bionimbus 2010 consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
• Bionimbus is part of the proof-of-concept (POC) Open Science Data Cloud, which consists of 14 racks, 472 nodes, 3776 cores and 3+ PB of storage.
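The figures above can be compared with a little arithmetic. A rough sketch, assuming the OSDC totals include the Bionimbus hardware and treating "3+ PB" as a lower bound of 3.0 PB; the class and method names are illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    racks: int
    nodes: int
    cores: int
    storage_pb: float  # OSDC is listed as "3+ PB"; 3.0 is used as a lower bound

    def cores_per_node(self) -> float:
        return self.cores / self.nodes

    def nodes_per_rack(self) -> float:
        return self.nodes / self.racks


bionimbus = Cluster(racks=6, nodes=212, cores=1568, storage_pb=0.9)
osdc = Cluster(racks=14, nodes=472, cores=3776, storage_pb=3.0)

for name, c in (("Bionimbus", bionimbus), ("OSDC", osdc)):
    print(f"{name}: {c.cores_per_node():.1f} cores/node, "
          f"{c.nodes_per_rack():.1f} nodes/rack")

# Share of OSDC proof-of-concept cores attributable to Bionimbus,
# under the assumption that the OSDC totals include Bionimbus.
print(f"Bionimbus core share: {bionimbus.cores / osdc.cores:.0%}")
```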
![Page 14: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/14.jpg)
Database Services
Analysis Pipelines & Re-analysis Services
GWT-based Front End
Large Data Cloud Services
Data Ingestion Services
Elastic Cloud Services
Intercloud Services
![Page 15: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/15.jpg)
Bionimbus Deployment Options
Bionimbus Community Cloud (www.bionimbus.org)
Bionimbus AMIs & Amazon hosted applications
Bionimbus Private Clouds
![Page 16: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/16.jpg)
Part 3. Some Bionimbus Case Studies
![Page 17: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/17.jpg)
Case Study: Public Datasets in Bionimbus
![Page 18: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/18.jpg)
Case Study: modENCODE
• Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
• Bionimbus VMs were used for some of the integrative analysis.
• Bionimbus is used as a backup for the modENCODE DCC.
![Page 19: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/19.jpg)
Case Study: IGSB
• All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.
![Page 20: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/20.jpg)
Bionimbus Virtual Machine Releases
Peak Calling: MAT, MA2C, PeakSeq, MACS, SPP
Quality Control: Various
Alignment & Genotyping: Bowtie, TopHat, Samtools, Picard
![Page 21: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/21.jpg)
Part 4. What is the OSDC?
![Page 22: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/22.jpg)
Open Science Data Cloud
Astronomical data
Biological data (Bionimbus)
NSF-PIRE OSDC Data Challenge
Earth science data (& disaster relief)
![Page 23: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/23.jpg)
www.opencloudconsortium.org
• U.S.-based not-for-profit corporation.
• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
• Manages cloud computing testbeds: Open Cloud Testbed.
• Develops reference implementations, benchmarks and standards.
![Page 24: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/24.jpg)
OCC Members
• Companies: Cisco, Citrix, Yahoo!, …
• Universities: University of Chicago, Calit2, Johns Hopkins, Northwestern Univ., ORNL, University of Illinois at Chicago, …
• Federal agencies: NASA
• Other: National Lambda Rail
• Adding international partners in 2011.
![Page 25: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/25.jpg)
Infrastructure
• 2010 Proof-of-Concept Infrastructure
  – 450+ nodes
  – 3000+ cores
  – 3+ PB
  – Four data centers (two more to come in 2011)
  – Data centers have 10G network connections (some 100G links in 2011)
• Plan to add approximately 1 PB of data in 2011.
• With current funding, we will refresh 1/3 of the infrastructure in 2011 and 2012.
![Page 26: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/26.jpg)
Towards a Long Term, Sustainable Model
• Cap Exp about $1M/year
• Op Exp about $1M/year
• Moore Foundation providing $1M/year for 2011 and 2012 to support the Cap Exp.
![Page 27: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/27.jpg)
[Figure: variety of analysis (Low, Med, Wide) plotted against data size (Small, Medium to Large, Very Large). Scientist with laptop: no infrastructure. Open Science Data Cloud: general infrastructure. Sequencing centers, LHC, LSST: dedicated infrastructure.]
![Page 28: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/28.jpg)
[Figure: computing platforms arranged by cycles and persistent data (Small, Med, Large): single workstations, small to medium clusters, HPC, large & spec. clusters, databases, data clouds.]
![Page 29: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/29.jpg)
Bionimbus Team*
David Hanley, Nicolas Negre, Elizabeth Bartom, Nicholas Bild, Christopher D. Brown, Marc Domanus, Robert L Grossman, A. Jason Grundstad, Xiangjun Liu, Michal Sabala, Parantu K Shah, Kevin P White
Institute for Genomics & Systems Biology
University of Chicago
Jia Chen, Yunhong Gu and Damian Roqueiro
University of Illinois at Chicago
Lincoln Stein and Zheng Zha
Ontario Institute for Cancer Research
*In alphabetical order
![Page 30: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/30.jpg)
Acknowledgements
![Page 31: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/31.jpg)
Questions?
![Page 32: Bionimbus Cambridge Workshop (3-28-11, v7)](https://reader035.fdocuments.us/reader035/viewer/2022062712/55d56a7bbb61eb2f6e8b4641/html5/thumbnails/32.jpg)
Thank You
For more information:
www.bionimbus.org
www.opencloudconsortium.org
www.igsb.org
rgrossman.com