Advancing High End Scientific Computing at Argonne

A U.S. Department of Energy Office of Science Laboratory
Operated by The University of Chicago

Argonne National Laboratory
Office of Science, U.S. Department of Energy

Advancing High End Scientific Computing at Argonne
Overview, Operations Philosophy, Best Practices

Raymond Bair, Ph.D.
Senior Computational Scientist
Director, Laboratory Computing Resource Center
Mathematics and Computer Science Division

May 2005

Pioneering Science and Technology
Office of Science, U.S. Department of Energy

About Argonne

• Founded in 1943, designated a national laboratory in 1946

• Managed by The University of Chicago for the Department of Energy
- ~4000 employees and 4000 facility users
- ~$500M budget
- 1500-acre site in Illinois

• Broad R&D portfolio

• Numerous sponsors

• Collaborations worldwide

MCS Vision

• Petascale Computing
- Increase by several orders of magnitude the computing power that can be applied to individual scientific problems, thus enabling progress in understanding complex physical and biological systems.

• Grid Computing
- Interconnect the world’s most important scientific databases, computing systems, instruments and facilities to improve scientific productivity and remove barriers to collaboration.

• Computational Science and Engineering
- Make high-end computing a core tool for challenging modeling, simulation and analysis problems. Foster high-end computing use in non-traditional areas.

http://www.mcs.anl.gov/

High Performance Computing is Integral to Argonne Science and Technology Thrusts

• Biology
- Identify functions of genes; model cellular processes

• Nanoscience
- Experiment + theory for catalysts, sensors, electronics, photonics

• Environment
- Understand atmospheric chemistry, aerosols, climate change

• Transportation
- Efficient truck aerodynamics and fuel injection

• Energy
- Next generation nuclear reactors
- Hydrogen storage and production

• Physics
- From nuclear structure to stellar explosions

High End Computing Facilities at Argonne

• Chiba City: Software Scalability
- Scalability R&D in system software, open source software, and applications
- 512 CPUs, 256 nodes, Myrinet, 2 TB storage, Linux
- DOE OASCR funded
- Installed in 1999

• TeraGrid / ETF: Integrated High End Resources
- Advances production grids and science gateways
- 48 TF at 9 sites connected by 10-40 Gbit links
- 224 CPUs at ANL, with focus on visualization
- NSF funded

• Jazz: ANL Applications
- Supports ANL community: 60+ projects from a spectrum of S&E divisions
- 350 CPUs, Myrinet, 20 TB storage
- ANL funded
- Achieved 1.1 TF
- Installed in 2002

• BlueGene/L: Petascale Evaluation
- Evaluate new architectures, explore scalability, and develop systems software, tools and applications for petascale computers
- DOE NLCF program
- 5.6 TF system installed in 2005
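The facilities slide quotes both a CPU count and an achieved rate for one system only, Jazz (350 CPUs, 1.1 TF), so that is the one case where an average per-CPU rate can be worked out from the slide itself. The short Python sketch below does that arithmetic. The function name is illustrative, and the result is only an average over the whole machine; the slide does not say which benchmark produced the 1.1 TF figure. The TeraGrid numbers are not combined here because the 48 TF covers all 9 sites while the 224 CPUs are at ANL alone.

```python
def gflops_per_cpu(achieved_tflops: float, cpus: int) -> float:
    """Convert an aggregate rate in TF into an average rate per CPU in GF."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return achieved_tflops * 1000.0 / cpus


# Figures quoted on the slide for Jazz: 1.1 TF achieved on 350 CPUs.
jazz_rate = gflops_per_cpu(1.1, 350)

if __name__ == "__main__":
    print(f"Jazz: about {jazz_rate:.2f} GF per CPU")
```

With the slide's figures this comes to a little over 3 GF per CPU on average.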

Three HPC Communities

[Figure: three communities placed on axes of Sophistication and Scale: Launching off the Desktop, Scaling Up, Climbing the Next Summit. Other labels in the figure: Shared/Replicated Memory; Simple Distributed Models; Hundreds to Many Hundreds; Thousands and Beyond.]

Lowering Barriers to HPC Adoption

• One Click Account Creation
- 1000 hours to any staff member to get started

• Hands On Tutorials
- Learn the basics, then get some practice

• BYO Code Workshops
- Small groups, individual attention

• Short Proposals for Large Projects
- Internal peer review

[Chart: number of Jazz projects over time. Axis "Projects" from 0 to 80; quarterly dates from Apr-03 through Apr-05.]

14 Argonne divisions use Jazz

http://www.lcrc.anl.gov/

Ongoing Project Support

• New Project Follow-Up
- Contact project leads after accounts generated
- Help with porting, scripting, parallelization strategies
- Match applications developers with MCS staff

• Dedicated Tool and Package Support
- Compilers, libraries, tools, and commercial codes built and tested prior to system upgrades

• Performance Analysis and Tuning Support
- Personal assistance and tutorials on advanced performance improvement techniques

Project Requirements and Process

• Selection Criteria
- Scientific and Technical Merit
- Suitability for the System
- Technical Feasibility (and readiness)
- Resources Required (and past performance)

• Allocations Committee (Jazz)
- Crosscutting membership
- Quarterly meetings
- Center staff have interim authority
- Multiple project priorities

• Reporting Requirements
- Acknowledgement – publications and presentations
- Performance Results (BG) or Annual Report (1-2 pages)

• First-ever 3D simulations of the full core of a nuclear reactor with rigorous treatment of multiphysics phenomena

• Initial development on a divisional 80-node cluster, but simulations outgrew it

• “Jazz enabled us to conduct full-system runs of unprecedented size and resolution for core simulations.”

Large Applications Have Many Lives

• Many codes are long-lived – longer than architectural cycles, OS fads, or the ascent and descent of languages

• Researchers run the same code on different systems during a computational campaign

• Features and tools that bridge environments are valuable

• Ports require justification commensurate with effort – price-performance, capability, longevity, …

• Small system availability, alone, is not adequate justification to port

Large Applications Operate in Complex Distributed Environments

• Files and databases are not stored ‘locally’
- SAN, HFS, Remote SRM, WAFS, Grid/Web Services
- Fast end-to-end performance is complex

• Multi-System Workflow: setup, solve, analyze

• Cyber Security and AUP are not strictly local issues
- Authentication, proxies, virtual organizations, all create policy challenges

Blue Gene/L Evaluation Plans

• Thoroughly understand performance
- Benchmarks and applications
- Optimal approaches
- Scalability models

• Advance systems software
- And firm up administration tools

• Influence next generation designs

• Disseminate results and know-how
- Papers, workshops, consortium, …

The Blue Gene/L Consortium formed by ANL and IBM

• Focuses interest in the Blue Gene series
- Exploiting its potential for computational science

• Creates a framework for cooperation
- Developing applications, tools and systems software
- Sharing support of systems (not a fully supported IBM product)
- Exchanging innovations and novel solutions

• Supports upcoming HPC needs
- Training students and developing the next generation user community
- Providing functional requirements for next generation systems

Working Groups
- Applications
- System Software
- Operations
- Architecture
- Outreach

http://www.mcs.anl.gov/bgconsortium/

Blue Gene/L Consortium Members (48)

• DOE Laboratories
- Ames National Laboratory/Iowa State U.
- Argonne National Laboratory
- Brookhaven National Laboratory
- Fermi National Laboratory
- Jefferson Laboratory
- Lawrence Berkeley National Laboratory
- Lawrence Livermore National Laboratory
- Oak Ridge National Laboratory
- Pacific Northwest National Laboratory
- Princeton Plasma Physics Laboratory

• Universities
- Boston University
- California Institute of Technology
- Columbia University
- Harvard University
- Indiana University
- Louisiana State University
- Massachusetts Institute of Technology
- National Center for Atmospheric Research
- New York University/Courant Institute
- Northern Illinois University
- Northwestern University
- Ohio State University
- Pittsburgh Supercomputing Center
- Princeton University
- Purdue University
- Rutgers University
- Stony Brook University (SUNY)
- Texas A&M University
- University of California: Irvine, San Francisco, San Diego/SDSC
- University of Chicago
- University of Delaware
- University of Illinois – Urbana Champaign
- University of Minnesota
- University of North Carolina
- University of Southern California/ISI
- University of Texas at Austin – TACC
- University of Utah
- University of Wisconsin

• Industry
- Engineered Intelligence Corporation
- IBM

• International
- ASTRON/LOFAR, The Netherlands
- Trinity College, Ireland
- John von Neumann Institute, Germany
- NIWS Co., Ltd., Japan
- University of Edinburgh, EPCC Scotland

BlueGene/L Consortium Activities

• Applications Workshop: April 27-28, 2005 at Argonne

• Systems Software Workshop: Feb. 23-24, 2005 in Salt Lake City

• BlueGene Update Meeting: February 8, 2005 via Access Grid

• HPC with BlueGene/L and QCDOC Architectures: Oct. 27-28, 2004 at Brookhaven

• Consortium Meeting: Sept. 10, 2004 via Access Grid

• Consortium Kick-off: April 27, 2004 at Argonne

http://www.mcs.anl.gov/bgconsortium/

What’s Next for Argonne’s BlueGene/L?

• Continue system evaluation
- Port Argonne applications and those of collaborations
- Finish PVFS2 port and deployment
- Help fill gaps in systems software and tools

• Access for BlueGene/L Consortium Members
- Hands-On workshops at ~2 month intervals
- Participants receive 2 month accounts

• Community resources on Consortium web site
- List of ported applications, availability, and contact
- List of ported routines and tools for download
- List of application performance results

http://www.bgl.mcs.anl.gov/

1st Applications Running 2048-way on ANL BG/L

Application | Institution | Domain | Description | Comments
----------- | ----------- | ------ | ----------- | --------
Nimrod | U Wisconsin | Fusion | Non-ideal MHD (finite element) w/ rotation, complex boundaries | Near 1 TF/s on full machine
POP | LANL | Oceanography | Primitive equations on sphere: hydrostatic, Boussinesq | Run to 2048 procs; sample results on later slide
PetscFUN3d | ANL/NASA | General CFD | Unstructured N-S solver (compressible and incompressible) | Good scaling to 2048 processors
DL_POLY | Daresbury Laboratory | Nano-chemistry | Molecular dynamics with provisions for periodic slabs and solids | Good scaling to 2048 processors
pNeo | ANL/UC | Neuroscience | Hodgkin/Huxley Model for neuron firing | Run to 2048. Communication still being optimized
QMC | ANL | Nuclear Physics | Nuclear binding energy using Monte Carlo | Good scaling to 2048 processors
Nek5 | ANL | General CFD | N-S using spectral elements | Good scaling to 2048 processors
Flash | ANL/UC | Astrophysics | Hydro (PPM) + Nuclear burning | Scaling tests to 16k processors
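The comments above report "good scaling to 2048 processors" without giving numbers. One common way to quantify such a claim for a fixed problem size is speedup and parallel efficiency relative to the smallest run. The Python sketch below shows that calculation. The function name and the timings are hypothetical and are not measurements from the Argonne BG/L; the slide does not say which metric the authors used.

```python
def strong_scaling(timings):
    """Return {procs: (speedup, efficiency)} relative to the smallest run.

    timings maps processor count -> wall-clock seconds for one fixed problem size.
    Efficiency is the measured speedup divided by the ideal (linear) speedup.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    results = {}
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        ideal = procs / base_procs
        results[procs] = (speedup, speedup / ideal)
    return results


# Hypothetical timings in seconds, for illustration only.
example = {256: 800.0, 512: 410.0, 1024: 215.0, 2048: 118.0}

if __name__ == "__main__":
    for procs, (speedup, eff) in strong_scaling(example).items():
        print(f"{procs:5d} procs: speedup {speedup:5.2f}x, efficiency {eff:6.1%}")
```

An efficiency near 1.0 at 2048 processors would correspond to what the table calls good scaling; a value that drops off sharply would point to communication or load-balance costs, as noted for pNeo.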