applications requiring exascale compute and data capabilities · applications requiring exascale...
Transcript of applications requiring exascale compute and data capabilities · applications requiring exascale...
ORNL is managed by UT-Battelle for the US Department of Energy
Science applications requiring exascale compute and data capabilities Jack Wells Director of Science Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory SIAM EX14 Workshop Chicago 6 July 2014
2
Outline • U.S. DOE Leadership Computing Program
– Breakthrough Science Across a Broad Range of Disciplines
• Science opportunities over next decade – Fusion Energy, Biomass to Biofuels, Solar Energy,
Nuclear Energy
• Industry also has big problems and demanding requirements
• CORAL Procurement: Mission need for pre-exascale capability system in 2018
• Exascale challenges remain with us
3
Mission need for Leadership Computing Leadership computing capability is required for scientists to tackle the high-resolution, multi-scale/multi-physics simulations of greatest interest and impact to both science and the nation.
Leadership Computing capability is typically 10-100X greater than other computational centers.
Leadership Computing research is mission critical to inform policy decisions and advance innovation in far reaching topics such as:
• energy assurance
• ecological sustainability
• scientific discovery
• global security
“We will respond to the threat of climate change, knowing that the failure to do so would betray our children and future generations.” – President Obama 1/21/2013
4
What is the Leadership Computing Facility (LCF)? • Collaborative DOE Office of Science
program at ORNL and ANL
• Mission: Provide the computational and data resources required to solve the most challenging problems.
• 2-centers/2-architectures to address diverse and growing computational needs of the scientific community
• Highly competitive user allocation programs (INCITE, ALCC).
• Projects receive 10x to 100x more resource than at other generally available centers.
• LCF centers partner with users to enable science & engineering breakthroughs (Liaisons, Catalysts).
5
Hardware scaled from single-core through dual-core to quad-core and dual-socket , 12-core SMP nodes
Scaling applications and system software was the biggest challenge
Cray XT4 Dual-Core 119 TF
2006 2007 2008
Cray XT3 Dual-Core 54 TF
Cray XT4 Quad-Core 263 TF
ORNL has increased system performance by 1,000 times 2004-2010
2005
Cray X1 3 TF
Cray XT3 Single-core 26 TF
2009
Cray XT5 Systems 12-core, dual-socket SMP 2.3 PF
6
Science breakthroughs at the OLCF:
Biomass as a viable, sustainable feedstock for hydrogen production for fuel cells, Nano Letters (2011)
J. Phys. Chem. Lett. (2010) 71 & 74 citations, respectively
World’s first continuous simulation of 21,000 years of Earth’s climate
history, Science (2009) 116 citations
Largest simulation of a galaxy’s worth of dark matter, showed for the first time the
fractal-like appearance of dark matter substructures, Nature (2008)
326 citations, 3/2014
MD simulations show selectivity filter of a trans-membrane ion channel is sterically locked open by hidden water molecules, Nature (2013)
Global Warming preceded by increasing carbon dioxide concentrations during the last deglaciation, Nature (2012). 64 citations, 3/2014
Researchers solved the 2D Hubbard model and presented evidence that it predicts
HTSC behavior, Phys. Rev. Lett (2005) 105 citations, 3/2014
First-Principles Flame Simulation Provides Crucial Information to Guide Design of Fuel-Efficient Clean
Engines, Proc. Combust. Insti. (2007) 78 citations, 3/2014
Calculation of the number of bound nuclei in nature, Nature (2012), 36 citations, 3/2014 , 36 citations, 3/2014
SELECTED science and engineering advances over the period 2003 - 2013
Astrophysicists discover supernova shock-wave instability, Astrophys. J. (2003) 254 citations, 3/2014
Demonstrated that three-body forces are necessary to describe the long lifetime of 14C Phys. Rev. Lett. (2011) 28 citations, 3/2014
2007 2008 2009 2010 2011 2013 2012 2004 2005 2006 2014 2003
7
High-impact science across a broad range of disciplines
Superconductivity “Doping dependence of spin
excitations and correlations with high-temperature super- conductivity in iron pnictides,“
Meng Wang(IOP CAS Beijing), Nature Communications.
December (2013)
Paleoclimate Science “Northern Hemisphere forcing of Southern Hemisphere climate during the last deglaciation,” Feng He (UW Madison), et al., Nature, February (2013)
Molecular Biology “Recovery from slow
inactivation in K1 channels is controlled by water molecules”
Jared Ostmeyer, et al. (U. Chicago) Nature, Sept. (2013)
Polymer Science “Self-Organized and Cu-Coordinated Surface Linear Polymerization” Qing Li, B. Sumpter (ORNL), Nature Scientific Reports. July (2013)
Molecular Biology “A phenylalanine rotameric switch for signal-state control in bacterial chemoreceptors” D. Ortega (UTK), Nature Communications December (2013)
Complex Oxide Materials “Atomically resolved
spectroscopic studyof Sr2IrO4: Experiment and theory,” Qing
Li (ORNL), E.G. Eguiluz (UTK) Nature Scientific Reports.
October (2013)
For example in 2013:
8
Science opportunities over the next decade
The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still appears, you may have to delete the image and then insert it again.
• Science benefits and impact of future systems are examined on an ongoing basis.
• Baseline plans are developed in consultation with leading domain scientists.
• Detailed performance analyses are conducted for a subset of applications to understand architectural bottlenecks.
• Leadership Computing Facility has been actively engaged in community assessments of future computational needs and solutions.
Application requirements process has been guiding paths forward
9
Our Science requires that we advance computational capability 1000x over the next decade. Mission: Providing world-class computational resources and specialized services for the most computationally intensive global challenges
Vision: Deliver transforming discoveries in climate, materials, biology, energy technologies, etc
Jaguar 2.3 PF 362 TB DRAM
Titan 27 PF 600 TB DRAM Hybrid GPU/CPU
2010 2012 2017 2022
OLCF-5: 1 EF 20 MW
OLCF-4: 100-250 PF 4000 TB memory > 20MW 6 day resilience
Roadmap to Exascale
What are the Challenges?
10
Science challenges for LCF in next decade
Combustion Science Increase efficiency by
25%-50% and lower emissions from internal
combustion engines using advanced fuels and low-temperature combustion.
Biomass to Biofuels Enhance the understanding
and production of biofuels for transportation and other bio-
products from biomass.
Fusion Energy Develop predictive understanding of plasma properties, dynamics, and interactions with surrounding materials.
Climate Change Science Understand the dynamic ecological and chemical evolution of the climate system with uncertainty quantification of impacts.
Solar Energy Improve photovoltaic efficiency and lower cost for organic and inorganic materials.
Nuclear Energy: For existing reactors, provide
safe, increased fuel utilization, power upgrades, and reactor
lifetime extensions. Design new, safe, cost-effective reactors.
.
11
Fusion Energy/ITER
2013-2018 2018-2023 • Perform high-fidelity simulation of edge plasma
turbulent transport in tokamak from first-principles to address DIII-D and JET-scale plasmas with a goal of understanding high-confinement physics.
• Increase simulation of tokamak edge plasma to ITER scale. Coupled simulations of plasma edge with core and chamber wall interactions. Control edge-localized modes and other destructive mechanisms.
• Perform integrated first-principles simulation including the critical multiscale processes to study fusion-reacting plasmas in realistic magnetic confinement geometries.
• Produce an experimentally validated simulation capability for ITER to design DEMO, the ITER follow-on facility to solve the engineering issues necessary for electricity production with fusion plasmas.
Key science challenges: Effectively model and control the flow of plasma and energy in a fusion reactor, scaling up to ITER-size. Develop predictive understanding of plasma properties, dynamics, and interactions with surrounding materials. Mitigate plasma disruptions.
Science enabled by LCF Capabilities
A global particle-in-cell simulation to show core turbulence in a tokamak. Image S. Ethier, PPPL
12
XGC1 Gyrokinetic simulation of “blobby” edge turbulence in a DIII-D H-mode plasma, together with background and neutral particle dynamics.
The resulting heat load footprint from XGC1 on divertor plate, mapped back to outboard midplane.
Achieved clarification of some assumptions/findings from reduced models, together with some new discoveries
XGC1 on Titan simulates turbulent transport in plasma edge for whole tokamak from first-principles
Courtesy: CS Chang
13
Biomass to biofuels
2013-2018 2018-2023 • Atomic-detail dynamical models of biomass
systems of several million atoms, permitting detailed analysis of interactions
• Simulations of pretreatment effects on multi-component biomass systems to understand the bottlenecks in bioconversion
• Understand the dynamics of enzymatic reactions on biomass by simulating interactions between microbial systems and cellulosic biomass
• Design superior enzymes for conversion of biomass
Key science challenges: Enhance the understanding and production of biofuels from biomass for transportation and other bio-products. The main challenge to overcome is the recalcitrance of biomass (cellulosic materials) to hydrolysis.
Science enabled by increasing LCF Capabilities
Lignin interacting with crystalline cellulose.
14
Science Objectives and Impact
Boosting Bioenergy and Overcoming Recalcitrance Molecular Dynamics Simulations
• Improve our understanding of lignin-cellulose interactions at the molecular level in order to overcome biomass recalcitrance and improve the efficiency of biofuel production, thereby reducing the cost of ethanol
• Ethanol, which is carbon-neutral and domestically produced, is the primary renewable substitute for gasoline
2013 INCITE Program PI: Jeremy Smith
University of Tennessee & ORNL Used Hours: 72,167,000
Performance & OLCF Contribution Science Results
A cellulase enzyme (pink) hydrolyzing a cellulose (blue) strand despite the presence of lignin aggregates (green) on the cellulose surface. Vizualization by M. Matheson (ORNL)
• GROMACS application has been adapted to run on the GPU-accelerated Titan system: Project can handle much larger systems—30 million atoms, compared to 3 million atoms on Jaguar
• The OLCF’s Mike Matheson provided visualization services
• Discovered amorphous cellulose is easier to break down because it associates less with lignin
• This is because the less-organized cellulose interacts strongly with water, making it less available to interact with lignin
• Biofuel production may potentially become more efficient by manipulating cellulose crystallinity so as to reduce lignin precipitation onto cellulose
15
Solar energy
2013-2018 2018-2023 • Understand growth, interface structure, and
stability of heterogeneous polymer blends necessary for efficient solar conversion.
• Simulations of structure, carrier transport, and defect states in nanomaterials.
• Describe excited state phenomena in homogeneous systems.
• Enable computational screening of materials for desired excited-state and charge transport properties.
• Systems-level, multiphysics simulations of practical photovoltaic devices are enabled.
• Uncertainty quantification enabled for critical integrated materials properties.
Key science challenges: Improve photovoltaic efficiency and lower cost for organic and inorganic materials. A photovoltaic material poses difficult challenges in the prediction of morphology, excited state phenomena, transport, and materials aging.
Science enabled by LCF Capabilities
16
Science Objectives and Impact • Organic photovoltaic (OPV) solar cells
are promising renewable energy sources:
– Low costs, high-flexibility, and light weight
• Bulk-heterojunction (BHJ) active layer morphology and domain size is critical for improving performance
Towards Rational Design of Efficient Organic Photovoltaic Materials
LAMMPS Early Science Project J.-M. Carrillo, W.M. Brown,
Oak Ridge National Laboratory Hours Used: 72,000,000
Titan Simulation: LAMMPS Preliminary Science Results
Corse-grained MD simulation of phase-separation of a 1:1 weight ratio P3HT/PCBM mixture into donor (white) and acceptor (blue) domains.
P3HT (electron donor)
PCBM (electron acceptor)
• OLCF/CAAR team prepared LAMMPS for use with GPUs on Titan
• Portability: Builds with CUDA or OpenCL • Speedups on Titan (GPU+CPU vs. CPU:
2X to 15x (mixed precision) depending upon model and simulation
• Speedup of 2.5-3x for OPV simulation used here
• Titan simulations are 27x larger and 10x longer – Converged P3HT:PCBM separation in 400ns CGMD
time • Prediction: Increasing polymer chain length will
decrease the size of the electron donor domains • Prediction: PCBM (fullerene) loading parameter results
in an increasing, then decreasing impact on P3HT domain size
17
Nuclear Energy advances will be enabled by LCF over the next decade
2013-2016 2016-2020 • Coupled-physics simulation of fluid flow,
heat transfer, neutron transport, and fuel behavior for fuel assemblies.
• Single-physics and low-fidelity coupled-physics analysis of full cores.
• Predict behavior of nominal reactor operation and existing nuclear fuels with limited quantification of uncertainties.
• Coupled-physics simulation of fluid flow, heat transfer, neutron transport, and fuel behavior for multiple fuel assemblies and limited full-core analysis of nominal reactor operation.
• Initial capability for analysis of reactor transients, including some accident scenarios. Improved quantification of uncertainties.
Key science challenges: For existing reactors provide safe, increased fuel utilization, power upgrades, and reactor lifetime extensions. Design new, safe, cost-effective reactors.
Science enabled by LCF Capabilities
18
Execution
Goals • Compare fidelity and performance of Shift against Keno, SPN,
and SN (Denovo) • Generate high-fidelity neutronics solution for code comparison
of solutions for predicting reactor startup and physics testing
Westinghouse-CASL Simulation of Reactor Start Up – Quarter-Core Zero Power Physics Test
• AP1000 model created and results generated for reactor criticality, rod worth, and reactivity coefficients
• Identical VERA Input models used for Shift, SPN, and SN – dramatically simpler than KENO-VI input model
Results • Some of the largest Monte Carlo calculations ever performed
(1 trillion particles) have been completed – runs use 230,000 cores of Titan or more
• Excellent agreement with KENO-VI • Extremely fine-mesh SN calculations, which leverage Titan’s
GPU accelerators, are under way
AP1000 pin powers
19
Number of Industry Projects Underway at OLCF by Calendar Year
3 5 5
7
12
19 21
29 28
0
5
10
15
20
25
30
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 Current as of June 2014
*
* Year in progress
20
Human skin barrier
Global flood maps
Engine cycle-to-cycle variation
Fuel efficient jet engines
Wind turbine resilience
Welding Software
Demonstrated small molecules can have large
and varying impact on skin permeability
depending on their molecular
characteristics—important for
product efficacy and safety
Developed fluvial and pluvial high resolution global flood maps to enable insurance firms to better price risk and reduce loss of life and property
Developing novel approach to using massively parallel, multiple simultaneous combustion cycle simulations to address cycle-to-cycle variations in spark ignition engine
Conducting first-of-a-kind high-
fidelity LES computations
of flow in turbomachinery components for
more fuel efficient, next-generation jet
engines
First time simulation of ice formation within million-molecule water droplets is
expanding understanding of freezing at the
molecular level to enhance wind
turbine resilience in cold climates
Evaluating large-scale HPC and
GPU capability of critical welding
simulation software and
further developing & testing weld optimization
algorithm
Innovation through Industrial Partnerships
21
Aircraft design
Consumer product stability
Gasoline engine injector
Jet engine efficiency
Li-ion batteries
Underhood cooling
Unexpected discovery of multiple solutions for steady RANS equations with separated flow helps explain why numerical modeling sometimes fails to capture maximum lift
Developed method to measure impact of additives, such as dyes and perfumes, on properties of lipid systems such as fabric enhancer and other formulated products
Optimizing multihole gasoline spray injector nozzle designs for better in-cylinder fuel-air mixture distributions, greater fuel efficiency and reduced physical prototypes
Accurate predictions
of atomization of liquid fuel
by aerodynamic forces enhance
combustion stability, improve
efficiency, and reduce emissions
New classes of solid inorganic Li-ion electrolytes
could deliver high ionic
and low electronic conductivity and good
electrochemical stability
Developed a new, efficient and automatic
analytical cooling package
optimization process leading to one of a kind
design optimization of
cooling systems
Innovation through Industrial Partnerships
22
Catalysis Design
innovation Aircraft design
Industrial fire suppression
Turbo machinery efficiency
Long-haul truck fuel efficiency
Demonstrated biomass as a
viable, sustainable feedstock
for hydrogen production for
fuel cells; showed nickel is a
feasible catalytic alternative to platinum
Accelerating design of shock
wave turbo compressors
for carbon capture and
sequestration
Simulated takeoff and
landing scenarios improved
a critical code for estimating characteristics of commercial
aircraft, including lift, drag, and controllability
Developing high-fidelity modeling
capability for fire growth and
suppression; fire losses
account for 30% of U.S. property
loss costs
Simulated unsteady flow in turbo machinery,
opening new opportunities for
design innovation and efficiency improvements.
Simulations reduced by 50%
the time to develop a unique system of add-on
parts that increases fuel
efficiency by 7−12% for long-haul (18-
wheeler) trucks
Innovation through Industrial Partnerships
23
Science Objectives and Impact • Ramgen Power Systems is developing shock wave
compression technology to meet DOE goals for reducing Carbon Capture and Sequestration costs.
• NASA transonic test case: Conduct high-resolution simulations of “Stage 67” tip injection flow control (originally designed and experimentally tested by NASA Glenn Research Center in 2004) to observe new details of fundamental aerodynamic phenomenon.
Ramping Up Turbomachine Performance Simulating novel tip injection flow control shown to delay stall in shock wave compression technology
Application Performance Science Results
• Validated accuracy of CFD models driving optimization of Ramgen turbomachine designs.
• Simulated secondary structures observed for the first time and never detected experimentally.
• Gained insight into how injected jets manipulate viscous shear flow and vortex structures to delay stall.
Project PI: A. Grosvenor, Ramgen Power Systems Allocation Program: Discretionary, ALCC
• ADIOS integrated into FINE/Turbo, enabling 100x increase in I/O speed.
• Efficient parallel performance of commercial code FINE/Turbo to 1.5 billion grid cells (148 million per rotor passage) running up to 5,000 cores.
Simulation demonstrating tip injection influence on vortices at tip clearance gap between rotor tip and casing. Grosvenor, et al. Proceedings of ASME Turbo Expo (2014).
OLCF contributions: Mike Matheson guided Ramgen and Numeca to improve scalability, I/O performance, memory utilization and workflow design.
24
Leadership Computing Mission Need for 150 PF to 400 PF Capability System in 2017-2018
DOE Mission Need Statement was approved 12/2012 and revised in 7/2013: • 150-400 PF Capability System with delivery in
2017-2018 – Capability will be divided between two Oak Ridge and
Argonne Leadership Computing Facilities – Architectural Diversity among the LCF systems is
required
25
CORAL Joint NNSA & SC Leadership Computing Acquisition Project
Leadership Computers run the most demanding DOE mission applications and advance HPC technologies to assure continued US/DOE leadership
Objective - Procure 3 leadership computers to be sited at ANL, ORNL and LLNL in CY17
Approach Competitive process - one RFP (issued by LLNL) leading to 2 R&D contracts and 3 computer procurement contracts For risk reduction and to meet a broad set of requirements, 2 architectural paths will be selected Once Selected, Multi-year Lab-Awardee relationship to co-design computers Both R&D contracts jointly managed by the 3 Labs Each lab manages and negotiates its own computer procurement contract, and may exercise options to meet their specific needs Understanding that long procurement lead-time may impact architectural characteristics and designs of procured computers
Sequoia (LLNL) 2012 - 2017
Mira (ANL) 2012 - 2017
Titan (ORNL) 2012 - 2017
Current DOE Leadership Computers
26
CORAL Procurement Model
RFP
R&D contract SC Lab #1 computer contract (2017 delivery)
R&D contract SC Lab #2 computer contract (2017 delivery) LLNL computer contract (2017 delivery)
Two Diverse Architecture Paths
27
OLCF-4 Accelerated Application Readiness
Center for Accelerated Application Readiness (CAAR) • Performance analysis of community applications
• Technical plan for code restructuring and optimization
• Deployment on OLCF-4
OLCF-4 will issue a call for proposals in FY2015 for application development partnerships between community developers, OLCF staff and the OLCF Vendor Center of Excellence.
28
Are we seeing the real exascale challenges?
Advanced simulation and modeling apps
Exascale Challenges
Scale Power Resilience Memory wall
Conquering Petascale problems of today
Beware being eaten alive by the Exascale problems of tomorrow.
29
Conclusions • Leadership computing is for the critically important
problems that need the most powerful compute and data infrastructure
• Broad-based science requirements for exascale computing are well established
• Accelerated, hybrid-multicore computing solutions are performing well on real, complex scientific applications. – But you must work to expose the parallelism in your codes. – This refactoring of codes is largely common to all massively
parallel architectures
• OLCF resources are available to industry, academia, and labs, through open, peer-reviewed allocation mechanisms.
30
Acknowledgements
OLCF-3 Vendor Partners: Cray, AMD, NVIDIA, CAPS, Allinea
OLCF Users:
CS Chang (PPPL), Jeremy Smith(UT/ORNL), Jan- Michael Carrillo (ORNL), Tom Evans/John Turner (ORNL), Allan Grosvenor (Ramgen/Masten)
Mike Matheson and Dave Pugmire (ORNL) for visualizations
DOE ASCR Leadership Computing Program, ALCF, OLCF This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.