A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research
![Page 1: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/1.jpg)
“A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research”
Keynote Presentation
CENIC 2013
Held at Calit2@UCSD
March 11, 2013
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
![Page 2: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/2.jpg)
“Blueprint for the Digital University”--Report of the UCSD Research Cyberinfrastructure Design Team
• A Five Year Process Begins Pilot Deployment This Year
research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf
No Data Bottlenecks--Design for Gigabit/s Data Flows
April 2009
See talk on RCI by Richard Moore, Today at 4pm
![Page 3: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/3.jpg)
Calit2 Sunlight OptIPuter Exchange Connects 60 Campus Sites Each Dedicated at 10Gbps
Maxine Brown, EVL, UIC, OptIPuter Project Manager
![Page 4: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/4.jpg)
Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable
2005 2007 2009 2010 2011 2013
$80K/port, Chiaro (60 max)
$5K, Force 10 (40 max)
$500, Arista (48 ports)
$400 (48 ports, today); 576 ports (2013)
• Port Pricing is Falling
• Density is Rising – Dramatically
• Cost of 10GbE Approaching Cluster HPC Interconnects
Source: Philip Papadopoulos, SDSC/Calit2
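As a rough illustration of the trend on this slide, the quoted per-port prices can be turned into the port cost of lighting the 60 campus sites mentioned earlier. This is a back-of-envelope sketch only: the one-port-per-site figure and all names below are assumptions for illustration, and chassis, optics, and fiber costs are ignored.

```python
# Per-port 10GbE prices (US dollars) as quoted on the slide, oldest first.
PORT_PRICES_USD = [
    ("Chiaro", 80_000),
    ("Force 10", 5_000),
    ("Arista", 500),
    ("48-port switch, today", 400),
]

# Assumption for illustration: one 10GbE port per campus site.
CAMPUS_SITES = 60


def cost_for_sites(price_per_port, sites=CAMPUS_SITES):
    """Port cost alone (no optics or fiber) for `sites` 10GbE connections."""
    return price_per_port * sites


def price_drop_factor(old_price, new_price):
    """How many times cheaper a port became between two price points."""
    return old_price / new_price


if __name__ == "__main__":
    for label, price in PORT_PRICES_USD:
        print(f"{label:<24} ${cost_for_sites(price):>9,} for {CAMPUS_SITES} ports")
    first, last = PORT_PRICES_USD[0][1], PORT_PRICES_USD[-1][1]
    print(f"Per-port price fell by a factor of {price_drop_factor(first, last):.0f}")
```

Under these assumptions the same 60 ports drop from millions of dollars to tens of thousands, which is the affordability point the slide makes.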
![Page 5: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/5.jpg)
Arista Enables SDSC’s Massively Parallel 10G Switched Data Analysis Resource
![Page 6: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/6.jpg)
Partnering Opportunities with NSF: SDSC’s Gordon, Dedicated Dec. 5, 2011
• Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW
– Emphasizes MEM and IOPS over FLOPS
– Supernode has Virtual Shared Memory:
  – 2 TB RAM Aggregate
  – 8 TB SSD Aggregate
– Total Machine = 32 Supernodes
– 4 PB Disk Parallel File System >100 GB/s I/O
• System Designed to Accelerate Access to Massive Datasets being Generated in Many Fields of Science, Engineering, Medicine, and Social Science
Source: Mike Norman, Allan Snavely SDSC
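The per-supernode figures above can be scaled up to the whole machine with simple arithmetic. The sketch below assumes the 2 TB RAM and 8 TB SSD figures are per supernode and multiply cleanly across all 32, and it treats the ">100 GB/s" I/O figure as a floor; the function names are illustrative, not part of any SDSC tooling.

```python
# Figures taken from the slide.
SUPERNODES = 32
RAM_TB_PER_SUPERNODE = 2
SSD_TB_PER_SUPERNODE = 8
DISK_PB = 4
DISK_IO_GB_PER_S = 100  # slide says ">100 GB/s", so this is a lower bound


def machine_totals(supernodes=SUPERNODES):
    """Aggregate RAM and SSD, assuming per-supernode figures simply add up."""
    return {
        "ram_tb": supernodes * RAM_TB_PER_SUPERNODE,
        "ssd_tb": supernodes * SSD_TB_PER_SUPERNODE,
    }


def full_scan_hours(disk_pb=DISK_PB, gb_per_s=DISK_IO_GB_PER_S):
    """Hours to stream the whole file system once at the given I/O rate."""
    gigabytes = disk_pb * 1_000_000  # decimal units: 1 PB = 1,000,000 GB
    return gigabytes / gb_per_s / 3600


if __name__ == "__main__":
    print(machine_totals())
    print(f"Full scan at {DISK_IO_GB_PER_S} GB/s: {full_scan_hours():.1f} hours")
```

Because the I/O rate is a lower bound, the scan time computed here is an upper bound.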
![Page 7: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/7.jpg)
Gordon Bests Previous Mega I/O per Second by 25x
![Page 8: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/8.jpg)
Creating a “Big Data Freeway” System Connecting Instruments, Computers, & Storage
Phil Papadopoulos, PI; Larry Smarr, co-PI
PRISM@UCSD
Start Date 1/1/13
See talk on PRISM by Phil P. Tomorrow at 9am
![Page 9: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/9.jpg)
Many Disciplines Beginning to Need Dedicated High Bandwidth on Campus
• Remote Analysis of Large Data Sets
– Particle Physics
• Connection to Remote Campus Compute & Storage Clusters
– Ocean Observatory
– Microscopy and Next Gen Sequencers
• Providing Remote Access to Campus Data Repositories
– Protein Data Bank and Mass Spectrometry
• Enabling Remote Collaborations
– National and International
How to Terminate a CENIC 100G Campus Connection
![Page 10: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/10.jpg)
PRISM@UCSD Enables Remote Analysis of Large Data Sets
![Page 11: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/11.jpg)
CERN’s CMS Detector is One of the World’s Most Complex Scientific Instruments
See talk on LHC 100G Networks by Azher Mughal, Caltech, Today at 10am
![Page 12: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/12.jpg)
CERN’s CMS Experiment Generates Massive Amounts of Data
![Page 13: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/13.jpg)
UCSD is a Tier-2 LHC Data Center
Source: Frank Wuerthwein, Physics UCSD
![Page 14: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/14.jpg)
Flow Out of CERN for CMS Detector Peaks at 32 Gbps!
Source: Frank Wuerthwein, Physics UCSD
![Page 15: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/15.jpg)
CMS Flow Into Fermilab Peaks at 10Gbps
Source: Frank Wuerthwein, Physics UCSD
![Page 16: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/16.jpg)
CMS Flow into UCSD Physics Peaks at 2.4 Gbps
Source: Frank Wuerthwein, Physics UCSD
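To put the three peak rates on these slides in storage terms, the sketch below converts each one into the volume that would arrive in a day if the peak were sustained. That is an assumption made only for scale (real flows are bursty), and the labels and function name are illustrative.

```python
# Peak CMS flow rates quoted on the slides, in gigabits per second.
PEAK_FLOWS_GBPS = {
    "Out of CERN": 32.0,
    "Into Fermilab": 10.0,
    "Into UCSD Physics": 2.4,
}

SECONDS_PER_DAY = 86_400


def terabytes_per_day(gbps):
    """Decimal terabytes moved in one day at a sustained rate of `gbps`."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e12


if __name__ == "__main__":
    for label, rate in PEAK_FLOWS_GBPS.items():
        print(f"{label:<18} {rate:>5.1f} Gbps -> {terabytes_per_day(rate):7.2f} TB/day")
```

Even the smallest of the three flows would fill tens of terabytes a day if sustained, which is why a Tier-2 site needs dedicated campus bandwidth.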
![Page 17: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/17.jpg)
The Open Science Grid: A Consortium of Universities and National Labs
Open for all of science, including biology, chemistry, computer science, engineering, mathematics, medicine, and physics
Source: Frank Wuerthwein, Physics UCSD
![Page 18: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/18.jpg)
Dan Cayan USGS Water Resources Discipline
Scripps Institution of Oceanography, UC San Diego
much support from Mary Tyree, Mike Dettinger, Guido Franco and other colleagues
Sponsors: California Energy Commission NOAA RISA program California DWR, DOE, NSF
Planning for climate change in California: substantial shifts on top of already high climate variability
![Page 19: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/19.jpg)
Greenhouse Gas Emissions and Concentration (CMIP3 GCMs)
UCSD Campus Climate Researchers Need to Download Results from Remote Supercomputer Simulations
Source: Dan Cayan, SIO UCSD
![Page 20: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/20.jpg)
GCMs ~150km downscaled to Regional models ~12km
Many simulations (IPCC AR4 and IPCC AR5) have been downscaled using statistical methods
INCREASING VOLUME OF CLIMATE SIMULATIONS
In comparison to 4th IPCC (CMIP3) GCMs, Latest Generation CMIP5 Models Provide:
• More Simulations
• Higher Spatial Resolution
• More Developed Process Representation
• Daily Output is More Available
Global to Regional Downscaling
Source: Dan Cayan, SIO UCSD
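One way to see why downscaling inflates data volume: going from roughly 150 km to roughly 12 km grid spacing multiplies the number of grid cells over the same region. The sketch below assumes square cells and an unchanged domain, which is a simplification and not a description of any specific downscaling method.

```python
def cell_multiplier(coarse_km, fine_km):
    """Factor by which the cell count grows when grid spacing shrinks.

    Assumes square cells covering the same area, so the count scales
    with the square of the spacing ratio.
    """
    return (coarse_km / fine_km) ** 2


if __name__ == "__main__":
    factor = cell_multiplier(150, 12)
    print(f"150 km -> 12 km: about {factor:.0f}x more cells per output field")
```

Each output field therefore grows by about two orders of magnitude before the additional simulations and daily output of newer model generations are counted.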
![Page 21: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/21.jpg)
average summer afternoon temperature
GFDL A2 1km downscaled to 1km
Hugo Hidalgo, Tapash Das, Mike Dettinger
![Page 22: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/22.jpg)
HOW MUCH CALIFORNIA SNOW LOSS?
Initial projections indicate substantial reduction in snow water for Sierra Nevada+
Declining Apr 1 SWE:
• 2050 median SWE ~ 2/3 historical median
• 2100 median SWE ~ 1/3 historical median
![Page 23: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/23.jpg)
PRISM@UCSD Enables Connection to Remote Campus Compute & Storage Clusters
![Page 24: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/24.jpg)
The OOI CI is Built on Dedicated 10GE and Serves Researchers, Education, and Public
Source: Matthew Arrott, John Orcutt OOI CI
![Page 25: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/25.jpg)
Reused Undersea Optical Cables Form a Part of the Ocean Observatories
Source: John Delaney UWash OOI
![Page 26: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/26.jpg)
Source: John Orcutt, Matthew Arrott, SIO/Calit2
OOI CI is Built on Dedicated Optical Networks and Federal Agency & Commercial Clouds
![Page 27: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/27.jpg)
OOI CI Team at Scripps Institution of Oceanography Needs Connection to Its Server Complex in Calit2
![Page 28: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/28.jpg)
Ultra High Resolution Microscopy Images Created at the National Center for Microscopy Imaging
![Page 29: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/29.jpg)
Zeiss Merlin 3View w/ 32k x 32k Scanning and Automated Mosaicing:
Current = 1-2 TB/week, soon 12 TB/week
JEOL-4000EX w/ 8k x 8k CCD, Automated Mosaicing, and Serial Tomography:
Current = 1 TB/week
FEI Titan w/ 4k x 4k STEM, EELS, 4k x 3.5k DDD, 4k x 4k CCD, Automated Mosaicing, and Multi-tilt Tomography:
Current = 1 TB/week
200-500 TB/year Raw, >2 PB/year Aggregate
Microscopes Are Big Data Generators – Driving Software & Cyberinfrastructure Development
Source: Mark Ellisman, School of Medicine, UCSD
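The weekly volumes on this slide can be restated as network load. The sketch below computes the average rate needed just to keep up with an instrument, and the time to move one week of output over a link of a given speed. It assumes decimal units and a link used at full line rate, which real transfers rarely achieve; the function names are illustrative.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def average_mbps(tb_per_week):
    """Average megabits per second needed to keep pace with an instrument."""
    bits = tb_per_week * 1e12 * 8
    return bits / SECONDS_PER_WEEK / 1e6


def transfer_hours(terabytes, link_gbps):
    """Hours to move `terabytes` over a link running at full line rate."""
    bits = terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600


if __name__ == "__main__":
    for tb in (1, 2, 12):
        print(f"{tb:>2} TB/week ~ {average_mbps(tb):6.1f} Mbps average")
    for gbps in (1, 10):
        print(f"12 TB over {gbps:>2} Gbps: {transfer_hours(12, gbps):5.1f} hours")
```

The average rates look modest, but data arrives in bursts: moving a week of output in one go takes more than a day at 1 Gbps and a few hours at 10 Gbps under these idealized assumptions.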
![Page 30: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/30.jpg)
NIH National Center for Microscopy & Imaging Research Integrated Infrastructure of Shared Resources
Source: Steve Peltier, Mark Ellisman, NCMIR
Local SOM Infrastructure
Scientific Instruments
End User Workstations
Shared Infrastructure
![Page 31: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/31.jpg)
Agile System that Spans Resource Classes
![Page 32: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/32.jpg)
SDSC Gordon Supercomputer Analysis of LS Gut Microbiome Displayed on Calit2 VROOM
Calit2 VROOM-FuturePatient Expedition
See Live Demo on Calit2 to CICESE 10G, Weds at 8:30am
![Page 33: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/33.jpg)
PRISM@UCSD Enables Providing Remote Access to Campus Data Repositories
![Page 34: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/34.jpg)
Protein Data Bank (PDB) Needs Bandwidth to Connect Resources and Users
• Archive of experimentally determined 3D structures of proteins, nucleic acids, complex assemblies
• One of the largest scientific resources in life sciences
Source: Phil Bourne and Andreas Prlić, PDB
Hemoglobin
Virus
![Page 35: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/35.jpg)
PDB Usage Is Growing Over Time
• More than 300,000 Unique Visitors per Month
• Up to 300 Concurrent Users
• ~10 Structures are Downloaded per Second 24/7/365
• Increasingly Popular Web Services Traffic
Source: Phil Bourne and Andreas Prlić, PDB
![Page 36: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/36.jpg)
2010 FTP Traffic
RCSB PDB: 159 million entry downloads
PDBe: 34 million entry downloads
PDBj: 16 million entry downloads
Source: Phil Bourne and Andreas Prlić, PDB
![Page 37: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/37.jpg)
PDB Plans to Establish Global Load Balancing
• Why is it Important?
– Enables PDB to Better Serve Its Users by Providing Increased Reliability and Quicker Results
• How Will it be Done?
– By More Evenly Allocating PDB Resources at Rutgers and UCSD
– By Directing Users to the Closest Site
• Need High Bandwidth Between Rutgers & UCSD Facilities
Source: Phil Bourne and Andreas Prlić, PDB
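The idea of directing users to the closest site can be sketched as picking the reachable site with the lowest measured round-trip time. This is a hypothetical illustration, not how PDB implements load balancing: the site names come from the slide, while the latency figures, health flags, and the `pick_site` function are assumptions made up for the example.

```python
def pick_site(rtt_ms, healthy=None):
    """Return the healthy site with the lowest round-trip time.

    rtt_ms:  mapping of site name -> measured round-trip time in ms.
    healthy: optional mapping of site name -> bool; sites missing from it
             are treated as unavailable. If None, all sites are eligible.
    """
    candidates = {
        site: rtt
        for site, rtt in rtt_ms.items()
        if healthy is None or healthy.get(site, False)
    }
    if not candidates:
        raise RuntimeError("no healthy site available")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical measurements for a user on the US East Coast.
    measured = {"Rutgers": 12.0, "UCSD": 70.0}
    print(pick_site(measured))
    # If Rutgers is down, traffic falls back to UCSD.
    print(pick_site(measured, healthy={"Rutgers": False, "UCSD": True}))
```

Failover of this kind only serves users well if both sites hold the same data, which is one reason the slide calls for high bandwidth between the Rutgers and UCSD facilities.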
![Page 38: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/38.jpg)
UCSD Center for Computational Mass Spectrometry Becoming Global MS Repository
ProteoSAFe: Compute-intensive discovery MS at the click of a button
MassIVE: repository and identification platform for all MS data in the world
Source: Nuno Bandeira, Vineet Bafna, Pavel Pevzner, Ingolf Krueger, UCSD
proteomics.ucsd.edu
![Page 39: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/39.jpg)
Automation: Do it Billions of Times
• Large Volumes of Data from Many Sources--Must Automate
– Thousands of Users, Tens of Thousands of Searches
– Multi-Omics: Proteomics, Metabolomics, Proteogenomics, Natural Products, Glycomics, etc.
• CCMS ProteoSAFe
– Scalable: Distributed Computation over 1000s of CPUs
– Accessible: Intuitive Web-Based User Interfaces
– Flexible: Easy Integration of New Analysis Workflows
• Already Analyzed >1B Spectra in >26,000 Searches from >2,200 Users
![Page 40: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/40.jpg)
PRISM@UCSD Enables Remote National and International Collaborations
![Page 41: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/41.jpg)
Tele-Collaboration for Audio Post-Production: Realtime Picture & Sound Editing Synchronized Over IP
Skywalker Sound@Marin Calit2@San Diego
![Page 42: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/42.jpg)
Tele-Collaboration for Cinema Post-Production
Disney + Skywalker Sound + Digital Domain + Laser Pacific
NTT Labs + UCSD/Calit2 + UIC/EVL + Pacific Interface
![Page 43: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/43.jpg)
Collaboration Between EVL’s CAVE2 and Calit2’s VROOM Over 10Gb Wavelength
EVL
Calit2
Source: NTT Sponsored ON*VECTOR Workshop at Calit2 March 6, 2013
![Page 44: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/44.jpg)
Calit2 is Linked to CICESE at 10G, Coupling OptIPortals at Each Site
See Live Demo on Calit2 to CICESE 10G, Weds at 8:30am
![Page 45: A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Intensive Research](https://reader030.fdocuments.us/reader030/viewer/2022012918/554e7ed9b4c90545698b51e3/html5/thumbnails/45.jpg)
PRAGMA: A Practical Collaboration Framework
Build and Sustain Collaborations
Advance & Improve Cyberinfrastructure Through Applications
Source: Peter Arzberger, Calit2 UCSD