Supporting Polar Research with National Cyberinfrastructure

Justin Miller
High Performance File Systems
Indiana University

Polar HPDC Workshop
Rutgers University
December 4, 2014



Our Experience with Polar Research

• Indiana University is a partner with the Center for Remote Sensing of Ice Sheets (CReSIS) at the University of Kansas
  – CReSIS develops radar instruments and processing software
  – IU provides the computational support
    • Data collection in the field
    • Data storage and processing on IU resources

• Instruments and compute systems are installed on airborne science platforms
  – NASA Operation IceBridge
  – NSF-funded campaigns


Polar Research Operations Center (PROC)

• Data collection in the field
  – Forward Observer
    • In-flight data collection and processing system
      – Captures multiple copies of the data simultaneously
      – Provides the ability to process data in near real-time
        » Quality control
        » Observation
    • Ground lab hardware
      – generate derived products in the field

• Data management back at home
  – computational and archival storage
  – data and metadata management
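The idea of capturing multiple copies of the data in one pass can be illustrated with a minimal sketch. This is not the Forward Observer implementation, whose internals the slides do not describe; the function names `tee_stream` and `file_digest`, the chunk size, and the destination paths are all hypothetical. The sketch writes every chunk of an incoming stream to several destinations and returns a SHA-256 digest that can be used for a simple quality-control check afterwards.

```python
import contextlib
import hashlib
from pathlib import Path
from typing import BinaryIO, Iterable


def tee_stream(source: BinaryIO, destinations: Iterable[Path],
               chunk_size: int = 1 << 20) -> str:
    """Copy `source` to every destination in a single pass over the data.

    Returns the SHA-256 hex digest of the bytes that were read.
    (Hypothetical helper, for illustration only.)
    """
    digest = hashlib.sha256()
    with contextlib.ExitStack() as stack:
        # Open all destination files; ExitStack closes them on exit.
        outputs = [stack.enter_context(open(path, "wb")) for path in destinations]
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            # Each chunk is written to every copy before the next chunk is read.
            for out in outputs:
                out.write(chunk)
    return digest.hexdigest()


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, for comparison with the source digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Comparing `file_digest(copy)` against the value returned by `tee_stream` for each copy confirms that every destination holds the same bytes as the source.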


Data Storage and Processing

• Data from the field is shipped to the University of Kansas and Indiana University (two copies)
  – At KU they copy the data to long-term archival storage
  – At IU we load the data onto a 3.5 PB filesystem for processing

• A complete season of radar data is approximately 80 TB

• Able to keep multiple complete seasons online for reprocessing

• Researchers use custom MATLAB and Python code to generate data products from the raw data

• Data products are available online
    • http://data.cresis.ku.edu
    • http://nsidc.org/icebridge/portal/
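As a rough check on "multiple complete seasons online", the two figures from this slide (a 3.5 PB filesystem and roughly 80 TB per season) can be combined in a short sketch. It assumes decimal units (1 PB = 1000 TB), and the `usable_fraction` parameter is a hypothetical knob, since the slides do not say how much of the filesystem is available for radar data. With the whole filesystem available, the upper bound is about 43 seasons.

```python
SEASON_TB = 80        # approximate size of one season of radar data (from the slide)
FILESYSTEM_PB = 3.5   # filesystem capacity at IU (from the slide)
TB_PER_PB = 1000      # assumption: decimal units


def seasons_that_fit(filesystem_pb: float, season_tb: float,
                     usable_fraction: float = 1.0) -> int:
    """Number of whole seasons that fit in the usable part of the filesystem.

    `usable_fraction` is hypothetical: the share of the filesystem
    assumed to be available for raw radar data.
    """
    usable_tb = filesystem_pb * TB_PER_PB * usable_fraction
    return int(usable_tb // season_tb)


upper_bound = seasons_that_fit(FILESYSTEM_PB, SEASON_TB)
quarter_share = seasons_that_fit(FILESYSTEM_PB, SEASON_TB, usable_fraction=0.25)
print(f"Whole filesystem: {upper_bound} seasons; one quarter of it: {quarter_share} seasons")
```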

A national science & engineering cloud

Funded by the National Science Foundation, Award #ACI-1445604

pti.iu.edu/jetstream


What is Jetstream?

• NSF’s first cloud for science and engineering research across all areas of activity supported by the NSF

• Jetstream will be a user-friendly cloud environment designed to give researchers and research students access to interactive computing and data analysis resources “on demand.”

• It will provide a library of virtual machines that users can select from to do their research.

• Software creators and researchers will also be able to create their own customized virtual machines or their own “private computing system” within Jetstream.

• It will enable countless discoveries across disciplines such as biology, atmospheric science, economics, network science, observational astronomy, and social sciences.


Jetstream Hardware Components

Jetstream site              # CPUs  # Physical  PFLOPS  Total RAM    Node-local     Secondary     Connection to
                                         cores               (GB)  storage (TB)  storage (TB)  Internet2 (Gbps)
Production systems
  IU                           640       7,680   0.258     40,960           640           960               100
  TACC                         640       7,680   0.258     40,960           640           960               100
Test & development system
  Arizona                       32         384   0.013      2,048            32           192               100
Total                        1,312      15,744   0.529     83,968         1,312         2,112               300
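The Total row and a couple of per-site ratios can be re-derived from the table with a short sketch. The numbers are taken from the table above; the `Site` class, its field names, and the helper `column_totals` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Site:
    """One row of the hardware table (field names are illustrative)."""
    name: str
    cpus: int
    cores: int
    pflops: float
    ram_gb: int
    node_local_tb: int
    secondary_tb: int
    internet2_gbps: int


SITES = [
    Site("IU", 640, 7680, 0.258, 40960, 640, 960, 100),
    Site("TACC", 640, 7680, 0.258, 40960, 640, 960, 100),
    Site("Arizona", 32, 384, 0.013, 2048, 32, 192, 100),
]


def column_totals(sites):
    """Sum every numeric column; round to 3 places to absorb float error in PFLOPS."""
    numeric = [f.name for f in fields(Site) if f.name != "name"]
    return {col: round(sum(getattr(s, col) for s in sites), 3) for col in numeric}


totals = column_totals(SITES)
print(totals)

for site in SITES:
    # Derived ratios: physical cores per CPU and RAM per physical core.
    print(site.name, site.cores // site.cpus, "cores/CPU,",
          round(site.ram_gb / site.cores, 2), "GB RAM/core")
```

The sums reproduce the Total row, and each site works out to 12 physical cores per CPU.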


Jetstream System Diagram


News for software developers

• Jetstream is enabling cyberinfrastructure

• RESTful APIs

• You build, package, deploy

• Users run on NSF-funded hardware

• Can leverage Globus technology for data movement and authentication


Citations and License Terms

• Slides about Jetstream are from the presentation “Jetstream – A national science & engineering cloud” by Craig Stewart, PI

• Slides about Wrangler are from the presentation “A Transformational Data Intensive Resource for the Open Science Community” by Dan Stanzione, Director and PI, and Niall Gaffney, Director for Data Intensive Computing

• Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.

• Except where otherwise noted, contents of this presentation are copyright 2013 by the Trustees of Indiana University.

• This document is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.