Open Science Data Cloud - CCA 11
Open Science Data Cloud
Robert Grossman
Open Cloud Consortium
University of Chicago
Open Data Group
April 13, 2011
Open Science Data Cloud
• Astronomical data
• Biological data (Bionimbus)
• NSF-PIRE OSDC Data Challenge
• Earth science data (& disaster relief)
Who are we?
www.opencloudconsortium.org
• U.S.-based not-for-profit corporation.
• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
• Manages cloud computing testbeds: Open Cloud Testbed.
• Develops reference implementations, benchmarks and standards.
OCC Members
• Companies: Cisco, Citrix, Yahoo!, …
• Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago, …
• Federal agencies: NASA
• International Partners: AIST (Japan)
• Other: National Lambda Rail
• Beginning to add international partners in 2011.
Proof of Concept (2008 - 2010)
• 4 locations
• 10G networks
• 450+ nodes
• 3000 cores
• 2 PB
Phase 1 (2011 - 2014)
• 6-10 locations
• 100G networks
• $1M - $2M hardware per year
Phase 2 (2015 - 2020)
• Build a data center for science
Why Another Cloud Project?
[Figure: variety of analysis (low, medium, wide) plotted against data size (small, medium to large, very large). Labeled regions: scientist with laptop (no infrastructure); Open Science Data Cloud (general infrastructure); high energy physics, astronomy (dedicated infrastructure).]
[Figure: cycles (small, medium, large) plotted against persistent data. Labels: single workstations; small to medium clusters; HPC; large & spec. clusters; databases; data clouds.]
What is the Open Science Data Cloud?
Hosted, managed, distributed facility to:
• Manage & archive your medium and large datasets
• Provide computational resources to analyze them
• Provide networking to share them with your colleagues and the public.
Long Term Goal
Build a (small) data center for science.
And preserve your data the same way that libraries preserve books &
museums preserve art.
OSDC Perspective
• Take a long term point of view (think like a library not a cloud service provider)
• Operate infrastructure at the scale of a small data center
• Interoperate with public clouds
• Open, interoperable architecture
• Experiment at scale
• Vendor neutral
OSDC Projects
Project 1. Bionimbus
www.bionimbus.org
Case Study: Public Datasets in Bionimbus
What Could You Do With 1 PB of Genomics Data?
• The NIH in the U.S. currently makes approximately 2 PB of data available for download.
• Bionimbus today consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
• We plan to add approximately 1 PB of genomics and other data from the biological sciences to Bionimbus in 2011.
Case Study: modENCODE
• Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
• Bionimbus VMs were used for some of the integrative analysis.
• Bionimbus is used as a backup for the modENCODE DCC.
Project Matsu 2: An Elastic Cloud For Disaster Response
Daniel Mandl - NASA/GSFC, Lead
Provide Fire / Flood Data to Rescue Workers
Short Term Pilot for 2011
• Colored areas represent catchments where rainfall collects and drains to river basins.
• River gauges are displayed as small circles.
• Detailed measurements are available on the display by clicking on the river gauge stations.
Note blue bars indicating a surge of rainfall upstream
Then a flood wave appears downstream at Rundu river gauge days later
Flood Dashboard
Zambezi basin consisting of upper, middle and lower catchments
Project 3: OSDC PIRE Project
OSDC PIRE Project Overview
• Research
  – Cloud middleware for data intensive computing
  – Wide area clouds
• Training and education workshops
  – Data intensive computing using the OSDC
  – Cloud computing for scientific computing
• Outreach
  – OSDC Data Challenge
Foreign Partners
• National Institute of Advanced Industrial Science and Technology (AIST), Japan
• Beijing Institute of Genomics (BIG)
• Edinburgh University
• Korea Institute of Science & Technology
• São Paulo State University
• Universidade Federal Fluminense, Brazil
• University of Amsterdam
OSDC Data Challenge
• Annual contest to select 3 to 4 datasets each year to add to the OSDC.
• Looking for the most interesting datasets to add.
Research Focus
• Cloud architectures for data intensive computing
• Wide area clouds
• Continuous learning
• Scanning queries
Ways to Participate
• Nominate one of your graduate students to spend a summer working with one of the OSDC PIRE Foreign Partners
• Send one of your graduate students to hands-on Workshops, such as Introduction to Data Intensive Computing
• Submit your most impressive dataset to the OSDC Data Challenge
• Buy a container of computers and join the OSDC
Open Science Data Cloud Sustainability Model
Towards a Long Term, Sustainable Model
• Capital Exp about $1M/year
• Operating Exp about $1M/year
• Moore Foundation providing $1M/year for 2011 and 2012 to support the Cap Exp.
Who do you most trust to manage your data for 100 years?
Companies may not be here tomorrow.
Think of a not for profit with that mission.
Government agencies have a role, but are not always easy to use.
Buy A Container and Join the OCC
• Use 2/3 of the container for your own purposes.
• Provide 1/3 of the container to the OCC for a shared replica space.
To Get Involved
Join the Open Cloud Consortium: www.opencloudconsortium.org
Questions?