A U.S. Department of Energy Office of Science Laboratory, operated by The University of Chicago
Argonne National Laboratory
Office of Science, U.S. Department of Energy
Advancing High End Scientific Computing at Argonne: Overview, Operations Philosophy, Best Practices
Raymond Bair, Ph.D.
Senior Computational Scientist
Director, Laboratory Computing Resource Center
Mathematics and Computer Science
May 2005
Pioneering Science and Technology
Office of Science, U.S. Department of Energy
About Argonne
• Founded in 1943, designated a national laboratory in 1946
• Managed by The University of Chicago for the Department of Energy
  - ~4000 employees and 4000 facility users
  - ~$500M budget
  - 1500-acre site in Illinois
• Broad R&D portfolio
• Numerous sponsors
• Collaborations worldwide
MCS Vision
Petascale Computing
Grid Computing
Computational Science and Engineering
• Increase by several orders of magnitude the computing power that can be applied to individual scientific problems, thus enabling progress in understanding complex physical and biological systems.
• Interconnect the world’s most important scientific databases, computing systems, instruments and facilities to improve scientific productivity and remove barriers to collaboration.
• Make high-end computing a core tool for challenging modeling, simulation and analysis problems. Foster high-end computing use in non-traditional areas.
http://www.mcs.anl.gov/
High Performance Computing is Integral to Argonne Science and Technology Thrusts
• Biology
  - Identify functions of genes; model cellular processes
• Nanoscience
  - Experiment + theory for catalysts, sensors, electronics, photonics
• Environment
  - Understand atmospheric chemistry, aerosols, climate change
• Transportation
  - Efficient truck aerodynamics and fuel injection
• Energy
  - Next generation nuclear reactors
  - Hydrogen storage and production
• Physics
  - From nuclear structure to stellar explosions
High End Computing Facilities at Argonne
• Chiba City: Software Scalability
  - Scalability R&D in system software, open source software, and applications
  - 512 CPUs, 256 nodes, Myrinet, 2 TB storage, Linux
  - DOE OASCR funded. Installed in 1999.
• TeraGrid / ETF: Integrated High End Resources
  - Advances production grids and science gateways
  - 48 TF at 9 sites connected by 10-40 Gbit links
  - 224 CPUs at ANL, with focus on visualization
  - NSF funded
• Jazz: ANL Applications
  - Supports ANL community: 60+ projects from a spectrum of S&E divisions
  - 350 CPUs, Myrinet, 20 TB storage
  - Achieved 1.1 TF
  - ANL funded. Installed in 2002.
• BlueGene/L: Petascale Evaluation
  - Evaluate new architectures, explore scalability, and develop systems software, tools and applications for petascale computers
  - DOE NLCF program
  - 5.6 TF system installed in 2005
Three HPC Communities
[Figure: three user communities placed on axes labeled Sophistication and Scale: "Launching off the Desktop", "Scaling Up", and "Climbing the Next Summit". Other diagram labels: Shared/Replicated Memory; Simple Distributed Models; Hundreds to Many Hundreds; Thousands and Beyond.]
Lowering Barriers to HPC Adoption
• One Click Account Creation
  - 1000 hours to any staff member to get started
• Hands On Tutorials
  - Learn the basics, then get some practice
• BYO Code Workshops
  - Small groups, individual attention
• Short Proposals for Large Projects
  - Internal peer review
[Chart: number of Jazz projects over time, Apr-03 through Apr-05; vertical axis "Projects", 0 to 80. Caption: 14 Argonne divisions use Jazz.]
http://www.lcrc.anl.gov/
Ongoing Project Support
• New Project Follow-Up
  - Contact project leads after accounts generated
  - Help with porting, scripting, parallelization strategies
  - Match applications developers with MCS staff
• Dedicated Tool and Package Support
  - Compilers, libraries, tools, and commercial codes built and tested prior to system upgrades
• Performance Analysis and Tuning Support
  - Personal assistance and tutorials on advanced performance improvement techniques
Project Requirements and Process
• Selection Criteria
  - Scientific and Technical Merit
  - Suitability for the System
  - Technical Feasibility (and readiness)
  - Resources Required (and past performance)
• Allocations Committee (Jazz)
  - Crosscutting membership
  - Quarterly meetings
  - Center staff have interim authority
  - Multiple project priorities
• Reporting Requirements
  - Acknowledgement – publications and presentations
  - Performance Results (BG) or Annual Report (1-2 pages)
• First-ever 3D simulations of the full core of a nuclear reactor with rigorous treatment of multiphysics phenomena
• Initial development on a divisional 80-node cluster, but simulations outgrew it
• “Jazz enabled us to conduct full-system runs of unprecedented size and resolution for core simulations.”
Large Applications Have Many Lives
• Many codes are long-lived – longer than architectural cycles, OS fads, or the ascent and descent of languages
• Researchers run the same code on different systems during a computational campaign
• Features and tools that bridge environments are valuable
• Ports require justification commensurate with effort – price-performance, capability, longevity, …
• Small system availability, alone, is not adequate justification to port
Large Applications Operate in Complex Distributed Environments
• Files and databases are not stored ‘locally’
  - SAN, HFS, Remote SRM, WAFS, Grid/Web Services
  - Fast end-to-end performance is complex
• Multi-System Workflow: setup, solve, analyze
• Cyber Security and AUP are not strictly local issues
  - Authentication, proxies, virtual organizations, all create policy challenges
[Diagram label: Sensors]
Blue Gene/L Evaluation Plans
• Thoroughly understand performance
  - Benchmarks and applications
  - Optimal approaches
  - Scalability models
• Advance systems software
  - And firm up administration tools
• Influence next generation designs
• Disseminate results and know-how
  - Papers, workshops, consortium, …
The Blue Gene/L Consortium, formed by ANL and IBM
• Focuses interest in the Blue Gene series
  - Exploiting its potential for computational science
• Creates a framework for cooperation
  - Developing applications, tools and systems software
  - Sharing support of systems (not a fully supported IBM product)
  - Exchanging innovations and novel solutions
• Supports upcoming HPC needs
  - Training students and developing the next generation user community
  - Providing functional requirements for next generation systems
• Working Groups
  - Applications
  - System Software
  - Operations
  - Architecture
  - Outreach
http://www.mcs.anl.gov/bgconsortium/
Blue Gene/L Consortium Members (48)
• DOE Laboratories
  - Ames National Laboratory/Iowa State U.
  - Argonne National Laboratory
  - Brookhaven National Laboratory
  - Fermi National Laboratory
  - Jefferson Laboratory
  - Lawrence Berkeley National Laboratory
  - Lawrence Livermore National Laboratory
  - Oak Ridge National Laboratory
  - Pacific Northwest National Laboratory
  - Princeton Plasma Physics Laboratory
• Universities
  - Boston University
  - California Institute of Technology
  - Columbia University
  - Harvard University
  - Indiana University
  - Louisiana State University
  - Massachusetts Institute of Technology
  - National Center for Atmospheric Research
  - New York University/Courant Institute
  - Northern Illinois University
  - Northwestern University
  - Ohio State University
  - Pittsburgh Supercomputing Center
  - Princeton University
  - Purdue University
  - Rutgers University
  - Stony Brook University (SUNY)
  - Texas A&M University
  - University of California: Irvine, San Francisco, San Diego/SDSC
  - University of Chicago
  - University of Delaware
  - University of Illinois – Urbana Champaign
  - University of Minnesota
  - University of North Carolina
  - University of Southern California/ISI
  - University of Texas at Austin – TACC
  - University of Utah
  - University of Wisconsin
• Industry
  - Engineered Intelligence Corporation
  - IBM
• International
  - ASTRON/LOFAR, The Netherlands
  - Trinity College, Ireland
  - John von Neumann Institute, Germany
  - NIWS Co., Ltd., Japan
  - University of Edinburgh, EPCC Scotland
BlueGene/L Consortium Activities
• Applications Workshop: April 27-28, 2005 at Argonne
• Systems Software Workshop: Feb. 23-24, 2005 in Salt Lake City
• BlueGene Update Meeting: February 8, 2005 via Access Grid
• HPC with BlueGene/L and QCDOC Architectures: Oct. 27-28, 2004 at Brookhaven
• Consortium Meeting: Sept. 10, 2004 via Access Grid
• Consortium Kick-off: April 27, 2004 at Argonne
http://www.mcs.anl.gov/bgconsortium/
What’s Next for Argonne’s BlueGene/L?
• Continue system evaluation
  - Port Argonne applications and those of collaborations
  - Finish PVFS2 port and deployment
  - Help fill gaps in systems software and tools
• Access for BlueGene/L Consortium Members
  - Hands-On workshops at ~2 month intervals
  - Participants receive 2 month accounts
• Community resources on Consortium web site
  - List of ported applications, availability, and contact
  - List of ported routines and tools for download
  - List of application performance results
http://www.bgl.mcs.anl.gov/
1st Applications Running 2048-way on ANL BG/L

| Application | Institution | Domain | Description | Comments |
|---|---|---|---|---|
| Nimrod | U Wisconsin | Fusion | Non-ideal MHD (finite element) w/ rotation, complex boundaries | Near 1 TF/s on full machine |
| POP | LANL | Oceanography | Primitive equations on sphere – hydrostatic, Boussinesq | Run to 2048 procs – sample results on later slide |
| PetscFUN3d | ANL/NASA | General CFD | Unstructured N-S solver (compressible and incompressible) | Good scaling to 2048 processors |
| DL_POLY | Daresbury Laboratory | Nano-chemistry | Molecular dynamics with provisions for periodic slabs and solids | Good scaling to 2048 processors |
| pNeo | ANL/UC | Neuroscience | Hodgkin/Huxley Model for neuron firing | Run to 2048. Communication still being optimized |
| QMC | ANL | Nuclear Physics | Nuclear binding energy using Monte Carlo | Good scaling to 2048 processors |
| Nek5 | ANL | General CFD | N-S using spectral elements | Good scaling to 2048 processors |
| Flash | ANL/UC | Astrophysics | Hydro (PPM) + Nuclear burning | Scaling tests to 16k processors |
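The comments for these applications report scaling only qualitatively ("Good scaling to 2048 processors"). As a minimal sketch of how such a statement is commonly quantified, the Python below computes strong-scaling speedup and parallel efficiency relative to the smallest run of a fixed-size problem. The function name `strong_scaling` and the timings are hypothetical placeholders chosen for illustration; they are not measurements from the Argonne BG/L.

```python
def strong_scaling(timings):
    """Return (procs, speedup, efficiency) tuples relative to the smallest run.

    timings maps processor count -> wall-clock seconds for one fixed problem size.
    """
    base_procs = min(timings)          # smallest run is the reference point
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        ideal = procs / base_procs     # perfect scaling would reach this speedup
        rows.append((procs, speedup, speedup / ideal))
    return rows


# Hypothetical wall-clock times in seconds (illustrative only, not BG/L data).
example_timings = {256: 800.0, 512: 400.0, 1024: 210.0, 2048: 125.0}

if __name__ == "__main__":
    for procs, speedup, efficiency in strong_scaling(example_timings):
        print(f"{procs:5d} procs  speedup {speedup:6.2f}  efficiency {efficiency:6.1%}")
```

An efficiency that stays close to 1.0 as the processor count grows is what "good scaling" usually means; a steady decline often points to communication or serial overhead.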