The TeraGrid: An essential tool for 21st century science Craig Stewart, Associate Dean, Research Technologies Chief Operating Officer, Pervasive Technology Labs Chair, Coalition for Academic Scientific Computing IU TeraGrid Resource Partner PI Indiana University [email protected] 17 February 2008


The TeraGrid: An essential tool for 21st century science

Craig Stewart,

Associate Dean, Research Technologies

Chief Operating Officer, Pervasive Technology Labs

Chair, Coalition for Academic Scientific Computing

IU TeraGrid Resource Partner PI

Indiana University

[email protected]

17 February 2008


Outline

• Why this workshop may be valuable to you
– (Time-consuming computations on the critical path of your research? Need more storage? Do you provide scientific services/resources over the Web?)
• What is cyberinfrastructure?
• Examples of TeraGrid uses
• More detailed info about the TeraGrid
– Architecture
– Storage
– Computation
– Science Gateway use and support, including Visualization
– Data source and service hosting
• How can you get going using the TeraGrid?
– Resources are available to use
– Help using the system is available
– At the end of the talk we will help those who wish (and have laptops here) start the application process. You need your CV to finish the whole process, but you can do some of the work and save it
• NB: ‘Tufte was here’


What is Cyberinfrastructure?

• Indiana University’s definition of Cyberinfrastructure:

“Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.”

• This and other information is in the Wikipedia definition of Cyberinfrastructure
• Some basic terms
– TFLOPS - Trillions (10^12) of FLOating Point operations (mathematical operations) per Second
– Processor hour - one hour of processor (CPU) utilization
– TB - terabyte; PB - petabyte
– Parallel programming
– MPI - Message Passing Interface
– WSRF - Web Services Resource Framework

©Trustees of Indiana University. May be reused so long as IU and TeraGrid logos remain, and any modifications to original are noted. Courtesy Craig A. Stewart, IU


What is the TeraGrid?

• An instrument (cyberinfrastructure) that delivers high-end IT resources - storage, computation, visualization, and data/service hosting - almost all of which are UNIX-based under the covers; some hidden by Web interfaces

– A data storage and management facility: over 20 Petabytes of storage (disk and tape), over 100 scientific data collections

– A computational facility - over 750 TFLOPS in parallel computing systems and growing

– (Sometimes) an intuitive way to do very complex tasks, via Science Gateways, or get data via data services

• A service: help desk and consulting, Advanced Support for TeraGrid Applications (ASTA), education and training events and resources

• The largest individual cyberinfrastructure facility funded by the NSF, which supports the national science and engineering research community

• Something you can use without financial cost - allocated via peer review (and without double jeopardy)


Examples of what you can do with the TeraGrid: Simulation of cell membrane processes

• Simulation of TonB-dependent transporter (TBDT)
• Used 400,000 processor (CPU) hours on systems at National Center for Supercomputing Applications, IU, Pittsburgh Supercomputing Center [45 years with one processor]
• Modeled mechanisms for allowing transport of molecules through cell membrane
• Experimental analysis not possible!
• Work by Emad Tajkhorshid and James Gumbart, of University of Illinois Urbana-Champaign. Mechanics of Force Propagation in TonB-Dependent Outer Membrane Transport. Biophysical Journal 93:496-504 (2007).
• Results of the simulation may be seen at www.life.uiuc.edu/emad/TonB-BtuB/btub-2.5Ans.mpg

Image courtesy of Emad Tajkhorshid, UIUC
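
The bracketed "45 years with one processor" figure on the cell membrane slide follows from simple arithmetic on the 400,000 processor hours quoted there. A minimal sketch in Python; the function name and the 1,000-processor comparison are illustrative assumptions, not figures from the talk:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def wall_clock_years(processor_hours, processors=1):
    """Wall-clock years if the work spreads perfectly over `processors`."""
    return processor_hours / (processors * HOURS_PER_YEAR)


# Figure from the slide: 400,000 processor (CPU) hours
years_on_one_cpu = wall_clock_years(400_000)

# Hypothetical comparison: the same work on 1,000 processors, ideal scaling
days_on_1000_cpus = wall_clock_years(400_000, processors=1000) * 365

print(f"{years_on_one_cpu:.1f} years on one processor")
print(f"{days_on_1000_cpus:.1f} days on 1,000 processors (ideal scaling)")
```

Real codes rarely scale perfectly, so the parallel figure should be read as a lower bound on elapsed time.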


Predicting storms

• Hurricanes and tornadoes cause massive loss of life and damage to property

• TeraGrid supported the spring 2007 NOAA and University of Oklahoma Hazardous Weather Testbed
– Major goal: assess how well ensemble forecasting predicts thunderstorms, including the supercells that spawn tornadoes
– Nightly reservation at PSC
– Delivers “better than real time” prediction
– Used 675,000 CPU hours for the season
– Used 312 TB on HPSS storage at PSC

Slide courtesy of Dennis Gannon, IU, and LEAD Collaboration


Solve any Rubik’s Cube in 26 moves?

• Rubik's Cube is perhaps the most famous combinatorial puzzle of its time

• > 43 quintillion states (4.3x10^19)

• Gene Cooperman and Dan Kunkle of Northeastern Univ. proved any state can be solved in 26 moves

• 7TB of distributed storage on TeraGrid allowed them to develop the proof

Source: http://www.physorg.com/news99843195.html
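
The "> 43 quintillion" state count on the Rubik's Cube slide can be reproduced exactly with the standard counting argument for the 3x3x3 cube. A short sketch; the variable names are ours, and the code only counts states, it says nothing about how the 26-move proof itself was carried out:

```python
from math import factorial

# 8 corner pieces: any permutation, each with 3 orientations,
# but the orientation of the last corner is fixed by the other seven.
corner_arrangements = factorial(8) * 3**7

# 12 edge pieces: any permutation, each with 2 orientations,
# but the orientation of the last edge is fixed by the other eleven.
edge_arrangements = factorial(12) * 2**11

# Corner and edge permutations must have the same parity,
# which halves the number of reachable combinations.
reachable_states = corner_arrangements * edge_arrangements // 2

print(f"{reachable_states:,} states (about {reachable_states:.1e})")
```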


• Resources for many disciplines!

• > 40,000 processors in aggregate

• Resource availability will grow during 2008 at unprecedented rates


The TeraGrid Map

[Figure: map of TeraGrid sites. Sites labeled: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC/ISI, UNC/RENCI, UW, Tennessee, LONI/LSU. Legend: Resource Provider (RP); Software Integration Partner; Grid Infrastructure Group (UChicago); Network Hub.]

©University of Chicago, Courtesy Dane Skow, Director, TeraGrid Grid Infrastructure Group. Used with Permission.


But you don’t care - TeraGrid Architecture

[Figure: TeraGrid architecture diagram. Labels: POPS (for now); Science Gateways; User Portal; Command Line; TeraGrid Infrastructure (Accounting, Network, Authorization, …); Compute Service; Viz Service; Data Service; Network, Accounting, …; RP 1; RP 2; RP 3.]

©University of Chicago, Courtesy Dane Skow, Director, TeraGrid Grid Infrastructure Group. Used with Permission and modified substantially from original by Craig A. Stewart


Data storage and management: Tape

• TeraGrid provides persistent (up to Feb 2010+) storage on disk and tape

• Could you benefit from having a spare copy of your data stored someplace removed from your home location?

• Allocatable tape-based storage systems:
– IU (Indiana University) - geographically distributed
– NCAR (National Center for Atmospheric Research) - also supports dual copy
– NCSA (National Center for Supercomputing Applications)
– SDSC (San Diego Supercomputer Center)
– Note: most sites have massive data storage systems that provide storage in support of computation
• Command line usage is reasonably straightforward with GridFTP; IU is developing a GUI


Data storage and management: Disk

• GPFS-WAN (General Parallel File System Wide Area Network). ~ 1 petabyte
– Home at San Diego Supercomputer Center; may be accessed as if it were a local file system from NCAR, NCSA, IU, UC/ANL
• IU Data Capacitor - Lustre
– 1 petabyte of spinning disk
– Primarily for short term storage of data
• Long term disk storage allocations
– Indiana University, National Center for Supercomputing Applications, San Diego Supercomputer Center


TeraGrid High Performance Computing Systems 2007-8

Computational Resources (size approximate - not to scale)

Slide Courtesy Tommy Minyard, TACC

[Figure: computational resources at SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Tennessee, and LONI/LSU. Aggregate labels: 2007 (504 TF); 2008 (~1 PF).]


Two examples of TeraGrid supercomputers

• Newest addition to the TeraGrid - Texas Advanced Computing Center’s Ranger
– Biggest open supercomputer in the world
– 504 TFLOPS Sun Constellation
– 15,744 AMD Quad-core “Barcelona” processors
– Disk subsystem - 1.7 petabytes
• IU’s Big Red
– 30 TFLOPS
– Particularly good for molecular dynamics codes
– Biggest system in the TeraGrid in summer 2006

Ranger info courtesy of Tommy Minyard, TACC

Big Red
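
The Ranger and Big Red figures quoted on this slide can be combined into a rough per-core number. The sketch below uses only the slide's own values (504 TFLOPS, 15,744 quad-core processors, 30 TFLOPS); the constant names are ours, and the result is a peak, back-of-envelope figure rather than a measured one:

```python
RANGER_PEAK_TFLOPS = 504
RANGER_PROCESSORS = 15_744
CORES_PER_PROCESSOR = 4      # "Quad-core" per the slide
BIG_RED_PEAK_TFLOPS = 30

# Total cores in Ranger
ranger_cores = RANGER_PROCESSORS * CORES_PER_PROCESSOR

# Peak GFLOPS per core (1 TFLOPS = 1,000 GFLOPS)
gflops_per_core = RANGER_PEAK_TFLOPS * 1000 / ranger_cores

# How many Big Reds' worth of peak performance Ranger represents
ranger_vs_big_red = RANGER_PEAK_TFLOPS / BIG_RED_PEAK_TFLOPS

print(f"{ranger_cores:,} cores, about {gflops_per_core:.1f} GFLOPS peak per core")
print(f"Ranger peak is about {ranger_vs_big_red:.1f}x Big Red")
```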


Science Gateways

• A Science Gateway is a domain-specific computing environment, typically accessed via the Web, that provides a scientific community with end-to-end support for a particular scientific workflow

• Science Gateways are distinguished from Web portals (http://en.wikipedia.org/wiki/Web_portal) in that portals “present information from diverse sources in a unified way.”

• Hides complexity (pay no attention to the grid behind the curtain…)


LEAD (portal.leadproject.org)

• Simple enough that an undergraduate can use it!
• National Center for Supercomputing Applications (NCSA) and IU teamed up to support the WxChallenge weather forecast competition. 64 teams, 1000 students, ~16,000 CPU hours on Big Red

• XBaya is available from http://www.collab-ogce.org/


Purdue’s NanoHUB (www.nanohub.org)


U. Chicago SIDGrid (sidgrid.ci.uchicago.edu)


IU Render Portal

• Supports scientific visualization
• Supports education in visualization, graphics, and new media

Images by Chris Matusek and Ralf Frieser


Purdue TeraDRE


TeraGrid Science Gateways
Accessible at http://www.teragrid.org/programs/sci_gateways/

Title | Discipline
Open Science Grid (OSG) | Advanced Scientific Computing
Special PRiority and Urgent Computing Environment (SPRUCE) | Advanced Scientific Computing
Massive Pulsar Surveys using the Arecibo L-band Feed Array (ALFA) | Astronomical Sciences
National Virtual Observatory (NVO) | Astronomical Sciences
High Resolution Daily Temperature and Precipitation Data for the Northeast United States | Atmospheric Sciences
Linked Environments for Atmospheric Discovery (LEAD) | Atmospheric Sciences
Computational Chemistry Grid (GridChem) | Chemistry
Computational Science and Engineering Online (CSE-Online) | Chemistry
Network for Earthquake Engineering Simulation (NEES) | Earthquake Hazard Mitigation
GEON (GEOsciences Network) | Earth Sciences
NanoHUB | Nanotechnology
TeraGrid Geographic Information Science Gateway (GISolve) | Geography


TeraGrid Science Gateways (continued)
Accessible at http://www.teragrid.org/programs/sci_gateways/

Title | Discipline
CIG Science Gateway for the Geodynamics Community | Geophysics
QuakeSim (QuakeSim) | Geophysics
The Earth System Grid (ESG) | Global Atmospheric Research
National Biomedical Computation Resource (NBCR) | Integrative Biology and Neuroscience
Developing Social Informatics Data Grid (SIDGrid) | Language, Cognition, and Social Behavior
Neutron Science TeraGrid Gateway (NSTG) | Materials Research
Biology and Biomedicine Science Gateway | Molecular Biosciences
Open Life Sciences Gateway (OLSG) | Molecular Biosciences
The Telescience Project | Neuroscience Biology
Grid Analysis Environment (GAE) | Physics
SCEC Earthworks Project | Seismology
TeraGrid Visualization Gateway | Visualization, Image Processing


Hosting services

• Remember that old Waffle House commercial?
• If you have a data set or a data resource that serves a national community (or even a community that extends beyond your home institution… or a community you would like to extend beyond your home institution) …

• Hosting of your service is available from Indiana University via our Quarry system!


MutDB (www.mutdb.org)

http://www.chembiogrid.org/


Getting an account and allocation

• Get a POPS (Partnership Online Proposal System) account
• Apply for a DAC (Development Allocation Committee) allocation: < 5 TB disk, < 25 TB tape storage, and/or < 30,000 Standard Units (SUs - related to CPU hours - in general an SU on one of the newer TeraGrid systems is about 0.5 CPU hours)
• Wait a month (although IU can help you shorten that!)
• Read the introductory documentation
• Use the TeraGrid KB if you need to
• Ask for help ([email protected], [email protected])
• Go discover!
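
The DAC ceilings and the rough SU-to-CPU-hour conversion quoted above lend themselves to a quick sanity check before filling out a request. A minimal sketch, assuming the slide's approximate figure of 0.5 CPU hours per SU (the actual ratio varies by system); the function and constant names are hypothetical and not part of POPS or any TeraGrid tool:

```python
# Ceilings from the slide: requests must stay strictly below these values
DAC_LIMITS = {"disk_tb": 5, "tape_tb": 25, "sus": 30_000}

# Rough conversion from the slide; an assumption that varies by system
CPU_HOURS_PER_SU = 0.5


def su_to_cpu_hours(sus, hours_per_su=CPU_HOURS_PER_SU):
    """Convert Standard Units to approximate CPU hours."""
    return sus * hours_per_su


def fits_dac(disk_tb=0, tape_tb=0, sus=0):
    """True if every requested amount is strictly below its DAC ceiling."""
    request = {"disk_tb": disk_tb, "tape_tb": tape_tb, "sus": sus}
    return all(request[key] < DAC_LIMITS[key] for key in DAC_LIMITS)


# Upper bound on CPU hours a DAC allocation could represent at this ratio
max_cpu_hours = su_to_cpu_hours(DAC_LIMITS["sus"])

print(f"A DAC allocation tops out near {max_cpu_hours:,.0f} CPU hours")
print(fits_dac(disk_tb=2, tape_tb=10, sus=20_000))
```

Requests at or above any ceiling would need a larger allocation type than the start-up request walked through in the following slides.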


Go to the POPS page - https://pops-submit.teragrid.org/


Create a POPS Login


Indicate that you are “New” to the TeraGrid


Indicate that this is a “Start-up” Request


Select DAC-TG (nonintuitive)


Fill out PI information


Skip Co-PIs probably (unless Co-PI has current funding and you don’t)


Fill out info on your project


Fill out info on your funding


Make reasonable estimates about your computing


Upload your CV and submit when ready!


Additional info

• www.researchtechnologies.iu.edu (also pervasive.iu.edu)
• Getting started guide - includes examples of good proposals: http://www.teragrid.org/userinfo/getting_started.php
• Review criteria: http://www.teragrid.org/userinfo/access/allocationspolicy.php
• When you’re in a foreign country there is nothing like a guide. If you need help with the application process, contact IU consultants at [email protected] or submit a help request via the TeraGrid ([email protected])
• If you are interested in having a data collection or science gateway hosted on the TeraGrid, definitely contact IU directly ([email protected]). Do the same if you are interested in Advanced Support for TeraGrid Applications
• If you are anxious to get going, contact us as soon as you have your DAC allocation request submitted and we can provide a local login for up to 6 weeks of use


Acknowledgements

• IU’s involvement as a TeraGrid Resource Partner is supported in part by the National Science Foundation under Grants No. ACI-0338618, OCI-0451237, OCI-0535258, and OCI-0504075.

• The IU Data Capacitor is supported in part by the National Science Foundation under Grant No. CNS-0521433.

• The Grid Infrastructure Group management of the TeraGrid, and Dane Skow's leadership thereof, is funded by NSF grant 0503697.

• Purdue’s involvement as a TeraGrid Resource Partner is supported in part by the National Science Foundation under Grant No. OCI-050399.

• This research was supported in part by the Pervasive Technology Labs and the Indiana METACyt Initiative. Both Indiana University initiatives are supported by the Lilly Endowment, Inc.

• This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University.

• The LEAD portal is developed under the leadership of IU Professors Dr. Dennis Gannon and Dr. Beth Plale, and supported by NSF grant 331480. Marcus Christie and Suresh Marru of the Extreme! Computing Lab contributed the LEAD graphics.

• The ChemBioGrid Portal is developed under the leadership of IU Professor Dr. Geoffrey C. Fox and Dr. Marlon Pierce and funded via the Pervasive Technology Labs (supported by the Lilly Endowment, Inc.) and the National Institutes of Health grant P20 HG003894-01.

• Many of the ideas presented in this talk were developed under a Fulbright Senior Scholar’s award to Stewart, funded by the US Department of State and the Technische Universitaet Dresden.

• Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF), National Institutes of Health (NIH), Lilly Endowment, Inc., or any other funding agency.

• This work is made possible by the dedicated efforts of the expert staff of the Research Technologies Division of University Information Technology Services, the faculty and staff of the Pervasive Technology Labs, and the staff of UITS generally. Erik Cornet, Mike Lowe, Scott Tiege, Michael Grobe, and Malinda Lingwall helped with this presentation.

• Thanks to the faculty and staff with whom we collaborate locally at IU and globally (within the US via the TeraGrid, and internationally via collaboration with Technische Universitaet Dresden)

Thank you! Any questions?