Argonne National Laboratory
A U.S. Department of Energy Office of Science Laboratory, Operated by The University of Chicago

Advancing High End Scientific Computing at Argonne: Overview, Operations Philosophy, Best Practices

Raymond Bair, Ph.D.
Senior Computational Scientist
Director, Laboratory Computing Resource Center
Mathematics and Computer Science Division
[email protected]

May 2005
Pioneering Science and Technology
Office of Science, U.S. Department of Energy
About Argonne
• Founded in 1943, designated a national laboratory in 1946
• Managed by The University of Chicago for the Department of Energy
• ~4000 employees and 4000 facility users
• ~$500M budget
• 1500-acre site in Illinois
• Broad R&D portfolio
• Numerous sponsors
• Collaborations worldwide
MCS Vision
• Petascale Computing: Increase by several orders of magnitude the computing power that can be applied to individual scientific problems, thus enabling progress in understanding complex physical and biological systems.
• Grid Computing: Interconnect the world's most important scientific databases, computing systems, instruments, and facilities to improve scientific productivity and remove barriers to collaboration.
• Computational Science and Engineering: Make high-end computing a core tool for challenging modeling, simulation, and analysis problems. Foster high-end computing use in non-traditional areas.
http://www.mcs.anl.gov/
High Performance Computing is Integral to Argonne Science and Technology Thrusts
• Biology: identify functions of genes; model cellular processes
• Nanoscience: experiment + theory for catalysts, sensors, electronics, photonics
• Environment: understand atmospheric chemistry, aerosols, climate change
• Transportation: efficient truck aerodynamics and fuel injection
• Energy: next generation nuclear reactors; hydrogen storage and production
• Physics: from nuclear structure to stellar explosions
High End Computing Facilities at Argonne
• Chiba City (Software Scalability; installed 1999): scalability R&D in system software, open source software, and applications. 512 CPUs, 256 nodes, Myrinet, 2 TB storage, Linux. DOE OASCR funded.
• TeraGrid / ETF (Integrated High End Resources): advances production grids and science gateways. 48 TF at 9 sites connected by 10-40 Gbit links; 224 CPUs at ANL, with a focus on visualization. NSF funded.
• Jazz (ANL Applications; installed 2002): supports the ANL community, with 60+ projects from a spectrum of S&E divisions. 350 CPUs, Myrinet, 20 TB storage; achieved 1.1 TF. ANL funded.
• BlueGene/L (Petascale Evaluation; installed 2005): evaluate new architectures, explore scalability, and develop systems software, tools, and applications for petascale computers. 5.6 TF system. DOE NLCF program.
Three HPC Communities
[Figure: three user communities plotted by sophistication and scale]
• Launching off the Desktop: simple distributed models; shared/replicated memory
• Scaling Up: hundreds to many hundreds of processors
• Climbing the Next Summit: thousands of processors and beyond
Lowering Barriers to HPC Adoption
• One Click Account Creation: 1000 hours to any staff member to get started
• Hands On Tutorials: learn the basics, then get some practice
• BYO Code Workshops: small groups, individual attention
• Short Proposals for Large Projects: internal peer review
[Chart: number of Jazz projects by quarter, April 2003 to April 2005; 14 Argonne divisions use Jazz]
http://www.lcrc.anl.gov/
Ongoing Project Support
• New Project Follow-Up
- Contact project leads after accounts are generated
- Help with porting, scripting, and parallelization strategies
- Match application developers with MCS staff
• Dedicated Tool and Package Support
- Compilers, libraries, tools, and commercial codes built and tested prior to system upgrades
• Performance Analysis and Tuning Support
- Personal assistance and tutorials on advanced performance improvement techniques
Project Requirements and Process
• Selection Criteria
- Scientific and Technical Merit
- Suitability for the System
- Technical Feasibility (and readiness)
- Resources Required (and past performance)
• Allocations Committee (Jazz)
- Crosscutting membership
- Quarterly meetings
- Center staff have interim authority
- Multiple project priorities
• Reporting Requirements
- Acknowledgement in publications and presentations
- Performance results (BG) or annual report (1-2 pages)

Example: the first-ever 3D simulations of the full core of a nuclear reactor with rigorous treatment of multiphysics phenomena. Initial development was on a divisional 80-node cluster, but the simulations outgrew it. "Jazz enabled us to conduct full-system runs of unprecedented size and resolution for core simulations."
Large Applications Have Many Lives
• Many codes are long-lived, outlasting architectural cycles, OS fads, and the rise and fall of languages
• Researchers run the same code on different systems during a computational campaign
• Features and tools that bridge environments are valuable
• Ports require justification commensurate with the effort: price-performance, capability, longevity, …
• Availability of a small system, alone, is not adequate justification to port
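The "bridge environments" idea can be sketched as a thin abstraction layer: application code calls one small interface, and per-system details live in swappable backends. This is a hypothetical illustration in Python, not code from any of the systems described.

```python
# Hypothetical sketch of an environment-bridging layer: the application
# asks choose_backend() for a runtime, and only the backends know
# system specifics. All names here are illustrative.

class SerialBackend:
    """Fallback used when no parallel runtime is available."""
    def rank(self):
        return 0
    def size(self):
        return 1

def choose_backend():
    """Pick an MPI-backed runtime if the system provides one, else serial."""
    try:
        from mpi4py import MPI  # present only where MPI is installed

        class MPIBackend:
            def rank(self):
                return MPI.COMM_WORLD.Get_rank()
            def size(self):
                return MPI.COMM_WORLD.Get_size()

        return MPIBackend()
    except ImportError:
        return SerialBackend()

backend = choose_backend()
```

The same application source then runs unchanged on a laptop and on a cluster, which is exactly the property that makes a port cheaper to justify.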
Large Applications Operate in Complex Distributed Environments
• Files and databases are not stored 'locally'
- SAN, HFS, remote SRM, WAFS, Grid/Web services
- Fast end-to-end performance is complex
• Multi-System Workflow: setup, solve, analyze
• Cyber security and AUPs are not strictly local issues
- Authentication, proxies, and virtual organizations all create policy challenges
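The setup, solve, analyze workflow above can be sketched as a simple staged pipeline; the stage names, resources, and toy computations below are illustrative only, not any production workflow system.

```python
# Hypothetical sketch of a multi-system workflow: each stage may
# execute on a different resource, and each stage's outputs feed
# the next stage's inputs.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Stage:
    name: str
    resource: str  # e.g. a front-end, a cluster, a visualization system
    run: Callable[[Dict], Dict]

def run_workflow(stages, inputs):
    """Execute stages in order, handing each stage's outputs to the next."""
    data = dict(inputs)
    for stage in stages:
        data = stage.run(data)
    return data

stages = [
    Stage("setup", "front-end", lambda d: {**d, "mesh": d["cells"] * 8}),
    Stage("solve", "cluster", lambda d: {**d, "result": d["mesh"] * 0.5}),
    Stage("analyze", "viz-system", lambda d: {**d, "ok": d["result"] > 0}),
]
out = run_workflow(stages, {"cells": 1000})
```

In a real campaign each `run` would stage data to a different system, which is where the end-to-end performance and authentication issues above arise.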
Blue Gene/L Evaluation Plans
• Thoroughly understand performance
- Benchmarks and applications
- Optimal approaches
- Scalability models
• Advance systems software, and firm up administration tools
• Influence next generation designs
• Disseminate results and know-how
- Papers, workshops, consortium, …
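The simplest scalability model of the kind such an evaluation might build on is Amdahl's law; the parameter values below are illustrative, not BG/L measurements.

```python
# Amdahl's law: speedup(p) = 1 / (s + (1 - s) / p), where s is the
# fraction of the work that remains serial. Values are illustrative.

def amdahl_speedup(p, serial_fraction):
    """Predicted speedup on p processors given a fixed serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# Even a 0.1% serial fraction limits a 2048-processor run to roughly
# one third of the ideal speedup, which is why scalability models
# matter when evaluating machines of this size.
speedup = amdahl_speedup(2048, 0.001)
```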
The Blue Gene/L Consortium, formed by ANL and IBM
• Focuses interest in the Blue Gene series
- Exploiting its potential for computational science
• Creates a framework for cooperation
- Developing applications, tools, and systems software
- Sharing support of systems (not a fully supported IBM product)
- Exchanging innovations and novel solutions
• Supports upcoming HPC needs
- Training students and developing the next generation user community
- Providing functional requirements for next generation systems

Working Groups: Applications, System Software, Operations, Architecture, Outreach

http://www.mcs.anl.gov/bgconsortium/
Blue Gene/L Consortium Members (48)
• DOE Laboratories: Ames National Laboratory/Iowa State U., Argonne National Laboratory, Brookhaven National Laboratory, Fermi National Laboratory, Jefferson Laboratory, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Oak Ridge National Laboratory, Pacific Northwest National Laboratory, Princeton Plasma Physics Laboratory
• Universities: Boston University, California Institute of Technology, Columbia University, Harvard University, Indiana University, Louisiana State University, Massachusetts Institute of Technology, National Center for Atmospheric Research, New York University/Courant Institute, Northern Illinois University, Northwestern University, Ohio State University, Pittsburgh Supercomputing Center, Princeton University, Purdue University, Rutgers University, Stony Brook University (SUNY), Texas A&M University, University of California (Irvine, San Francisco, San Diego/SDSC), University of Chicago, University of Delaware, University of Illinois Urbana-Champaign, University of Minnesota, University of North Carolina, University of Southern California/ISI, University of Texas at Austin (TACC), University of Utah, University of Wisconsin
• Industry: Engineered Intelligence Corporation, IBM
• International: ASTRON/LOFAR, The Netherlands; Trinity College, Ireland; John von Neumann Institute, Germany; NIWS Co., Ltd., Japan; University of Edinburgh, EPCC, Scotland
BlueGene/L Consortium Activities
• Applications Workshop: April 27-28, 2005 at Argonne
• Systems Software Workshop: Feb. 23-24, 2005 in Salt Lake City
• BlueGene Update Meeting: February 8, 2005 via Access Grid
• HPC with BlueGene/L and QCDOC Architectures: Oct. 27-28, 2004 at Brookhaven
• Consortium Meeting: Sept. 10, 2004 via Access Grid
• Consortium Kick-off: April 27, 2004 at Argonne
http://www.mcs.anl.gov/bgconsortium/
What's Next for Argonne's BlueGene/L?
• Continue system evaluation
- Port Argonne applications and those of collaborators
- Finish PVFS2 port and deployment
- Help fill gaps in systems software and tools
• Access for BlueGene/L Consortium Members
- Hands-on workshops at ~2 month intervals
- Participants receive 2-month accounts
• Community resources on the Consortium web site
- List of ported applications, availability, and contacts
- List of ported routines and tools for download
- List of application performance results
http://www.bgl.mcs.anl.gov/
1st Applications Running 2048-way on ANL BG/L

Application | Institution | Domain | Description | Comments
Nimrod | U Wisconsin | Fusion | Non-ideal MHD (finite element) with rotation, complex boundaries | Near 1 TF/s on full machine
POP | LANL | Oceanography | Primitive equations on sphere; hydrostatic, Boussinesq | Run to 2048 procs; sample results on later slide
PETSc-FUN3D | ANL/NASA | General CFD | Unstructured N-S solver (compressible and incompressible) | Good scaling to 2048 processors
DL_POLY | Daresbury Laboratory | Nano-chemistry | Molecular dynamics with provisions for periodic slabs and solids | Good scaling to 2048 processors
pNeo | ANL/UC | Neuroscience | Hodgkin-Huxley model for neuron firing | Run to 2048; communication still being optimized
QMC | ANL | Nuclear Physics | Nuclear binding energy using Monte Carlo | Good scaling to 2048 processors
Nek5 | ANL | General CFD | N-S using spectral elements | Good scaling to 2048 processors
Flash | ANL/UC | Astrophysics | Hydro (PPM) + nuclear burning | Scaling tests to 16k processors
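A claim such as "good scaling to 2048 processors" is typically quantified as parallel efficiency relative to a smaller baseline run; the timings in this sketch are made up for illustration, not data from the table above.

```python
# Parallel efficiency: measured speedup over a baseline run, divided
# by the ideal speedup for the added processors. Timings below are
# illustrative only.

def parallel_efficiency(t_base, p_base, t_p, p):
    """Efficiency of a run on p processors vs. a baseline on p_base."""
    return (t_base / t_p) / (p / p_base)

# Example: 100 s on 256 processors vs. 14 s on 2048 processors
# gives an efficiency of about 0.89.
eff = parallel_efficiency(100.0, 256, 14.0, 2048)
```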