Transcript of NVIDIA HPC Update for Earth System Modeling
Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA
NVIDIA HPC Update for Earth System Modeling
2
TOPICS OF DISCUSSION
• NVIDIA HPC UPDATE
• ESM PROGRESS WITH GPUS
• COSMO
• WRF
• ESCAPE/IFS
• ESM GROWTH IN HPC + AI
3
NVIDIA Company Update
New HQ in Santa Clara, CA, USA
FY17 revenue ~$7B USD
GTC 2018: Mar 26-29 (www.gputechconf.com/present/call-for-submissions)
NVIDIA Core Markets: Artificial Intelligence, Computer Graphics, GPU Computing
4
NVIDIA Growth from Advancement of HPC and AI
3x HPC developers: 120,000 (2014) → 400,000 (2016)
25x deep learning developers: 2,200 (2014) → 55,000 (2016)
DL developers by segment: Higher Ed 35%, Software 19%, Internet 15%, Auto 10%, Government 5%, Medical 4%, Finance 4%, Manufacturing 4%
GPUs power the world’s leading data centers for HPC and AI
5
NVIDIA Volta – GPU Feature Comparisons

                           V100 (2017)              P100 (2016)         K40 (2014)
Double Precision TFlop/s   7.5                      5.3                 1.4
Single Precision TFlop/s   15.0                     10.6                4.3
Half Precision TFlop/s     120 (DL)                 21.2                n/a
Memory Bandwidth (GB/s)    900                      720                 288
Memory Size                16 GB                    16 GB               12 GB
Interconnect               NVLink: up to 300 GB/s;  NVLink: 160 GB/s;   PCIe: 16 GB/s
                           PCIe: 32 GB/s            PCIe: 32 GB/s
Power                      300 W                    300 W               235 W

Volta Availability: DGX-1 Q3 2017; OEM Q4 2017
6
NVIDIA NVLink Fast Interconnect
x86 Cluster Node: GPU-to-GPU
Power Cluster Node: CPU-to-GPU-to-GPU
7
DOE CORAL Systems with Volta and NVLink
LLNL Sierra: 150 PF in 2018
ORNL Summit: 200 PF in 2018, 27,600 GPUs
CAAR support from IBM and NVIDIA
8
ACME: Accelerated Climate Modeling for Energy – first fully accelerated climate model (GPU and MIC)
Consolidation of DOE ESM projects from 7 into 1; DOE Labs: Argonne, LANL, LBL, LLNL, ORNL, PNNL, Sandia
Towards NH global: Atm 12 km, Ocn 15 km, 80-year simulations
ACME component models and GPU progress:
• Atm – ACME-Atmosphere (NCAR CAM-SE fork); dycore now in trunk, CAM physics started with OpenACC
• Ocn – MPAS-O (LANL); LANL team at ORNL OpenACC Hackathon during 2015
• Others with published OpenACC progress: Sea-Ice – ACME-CICE (LANL); Land – CLM (ORNL, NCAR); Cloud Superparameterization – SAM (SBU, CSU); Land-Ice – PISCEES (multi-lab – LLNL, Sandia)
DOE ACME GPU-Accelerated Coupled Climate Model
10
Developer Relations: Stan Posey - Santa Clara, CA (HQ) [email protected]
Developer Technology: Carl Ponder, PhD – Austin, TX, US WRF, MPAS-A, FV3, GEOS-5, COAMPS, GEM
Jeff Larkin – Oak Ridge, TN, US CAM-SE, all ACME component models
Jeremy Appleyard, PhD – Oxford, UK IFS, NEMO, UM/GungHo, GOcean
Peter Messmer, PhD – Zurich, CH IFS/ESCAPE, COSMO, ICON
Akira Naruse, PhD – Tokyo, JP JMA-GSM, ASUCA, NICAM
. . .
PGI Applications Eng: Dave Norton, Lake Tahoe, CA All models that use PGI compilers
Business Alliances: Steve Rohm – Charlotte, NC, US US East: NOAA, EC, DOE, DoD, NASA
Greg Branch – Boulder, CO, US US West: NOAA, NCAR, DOE, NASA
Jeremy Purches - Bristol, UK ECMWF, UKMO, STFC, No. Europe
Stefan Kraemer – Würselen, DE DWD, DKRZ/MPI-M, MCH, Central Europe
Frederic Pariente – Toulouse, FR MF, IPSL, CNRS, CERFACS, So. Europe
. . .
Solution Architects: Jeff Adie – Singapore WRF, MPAS, any ESM sales opportunity
Several Others Worldwide Contact [email protected]
NVIDIA Support for Earth System Modeling Domain
11
WW ESM growth in GPU funded-development: NOAA, NCAR, ECMWF, DOE, DoD
Large ESM-driven GPU systems (K80/P100): NOAA, ECMWF, CSCS, NIES, Others
First-ever GPU-based operational NWP: MeteoSwiss with COSMO (since 2016); ~4x speedup with ~5x less energy vs. conventional CPU-only; new COSMO evaluations underway by met services in DE, RU, IT
DOE climate model ACME-Atm v1 production on TITAN using PGI OpenACC
NCAR collaboration on MPAS with 2016 & 2017 GPU Hands-on Workshops https://www2.cisl.ucar.edu/news/summertime-hackathon (focus on GPU development of MPAS-A with KISTI)
ECMWF selected NVIDIA as partner for ESCAPE exascale weather project https://software.ecmwf.int/wiki/display/OPTR/NVIDIA+Basic+GPU+Training+with+emphasis+on+Fortran+and+OpenACC
NEMO Systems Team invitation for NVIDIA to join new HPC working group Following successful NVIDIA OpenACC scalability of NEMO for ORCA025 configuration (NEMO UGM 2014)
New ESM opportunities, developing in new solution focus areas DL in climate and weather; BI for Ag and Actuary; Air quality monitoring (CN, KR); Commercial WRF start-up TQI
ESM development teams with participation in global GPU Hackathons DOE/ACME, NRL/COAMPS, MPI-M/ECHAM6, ODU/FVCOM, NRL/HYCOM, NOAA GFDL radiation, RRTMGP
Select NVIDIA ESM Highlights Since Multi-Core 6
12
SENA – NOAA funding for accelerator development of WRF, NGGPS (FV3), GFDL climate, NMMB
ESCAPE – ECMWF-led EUC Horizon 2020 program for IFS; NVIDIA 1 of 11 funded partners
ACME – US DOE accelerated climate model: CAM-SE, MPAS-O, CICE, CLM, SAM, PISCEES, others
AIMES – Governments of DE, FR, and JP funding HPC (and GPU) developments of ICON, DYNAMICO, NICAM
SIParCS – NCAR academic program funding for HPC (and GPU) developments of MPAS, CESM, DART, Fields
AOLI – US DoD accelerator development of operational models HYCOM, NUMA, CICE, RRTMG
GridTools – Swiss gov funding MCH/CSCS/ETH for accelerator-based DSL in COSMO, ICON, others
GPU Funded-Development Growing for ESM
NOTE: Follow each program LINK for details; Programs listed from top-down in rough order of newest to oldest start date
HPC Programs with Funding Specifically Targeted for GPU Development of Various ESMs
13
Important GPU-Based NWP System Deployments
NOAA to improve NWP and climate research with GPUs:
• Develop global model with 3 km resolution, a five-fold increase over today’s resolution
• NWP model: FV3/GFS (also climate research)
• Cray CS-Storm, 760 x P100, 8 GPUs per node
MeteoSwiss deploys world’s 1st operational NWP on GPUs:
• 2-3x higher resolution for daily forecasts
• 14x more simulation with ensemble approach for medium-range forecasts
• NWP model: COSMO
• Cray CS-Storm, 192 x K80, 8 GPUs per node
14
NOAA NGGPS Motivated Several GPU Collaborations
NGGPS NH Model Dycore Candidates (5)
NGGPS final selection: FV3
Models with GPU developments: NIM, MPAS, NEPTUNE/NUMA, FV3
From: “Next Generation HPC and Forecast Model Application Readiness at NCEP” by John Michalakes, NOAA NCEP; AMS, Phoenix, AZ, Jan 2015
16
NCAR MPAS Model and OpenACC Developments
Results by NCAR presented at ISC17 BoF, 20 June, Frankfurt, DE: “Cloud Resolving Global Earth-System Models: HPC at Its Extreme”

MPAS-A (dycore)  Global grid resolution  P100 vs. BDW node (2 sockets)  P100 vs. BDW (1 socket)
v5.0 trunk       120 km                  2.5x (actual)                  5.0x (estimate)
v5.0 trunk       60 km                   2.7x (actual)                  5.4x (estimate)
v5.0 trunk       30 km                   n/a (exceeded memory of a single GPU)

Source: http://www.isc-hpc.com/isc17_ap/sessiondetails.htm?t=session&o=444&a=select&ra=index
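The per-socket columns above are derived rather than measured: a GPU that is 2.5x faster than a 2-socket node is roughly 5x one socket, assuming near-linear scaling across sockets. A minimal sketch of that conversion (the assumption, not an NCAR-provided formula):

```python
def per_socket_estimate(speedup_vs_node, sockets_per_node=2):
    """Estimate GPU speedup vs a single CPU socket from a measured
    speedup vs a full node, assuming linear scaling across sockets."""
    return speedup_vs_node * sockets_per_node

# Matches the table's actual -> estimate pairs:
print(per_socket_estimate(2.5))   # 5.0
print(per_socket_estimate(2.7))   # 5.4
```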
17
US NRL NUMA Dycore and Scalability on TITAN
“Towards Exascale Computing with the Atmospheric Model NUMA” – Dr. Daniel Abdi, Frank Giraldo, NPS; Multi-Core 6 Workshop, 14 Sep 16, NCAR, Boulder, USA
• Weak scalability to 16,384 GPUs for 3 NUMA numerical methods (explicit); 90% efficiency
• Strong scalability to 4,096 GPUs for 3 NUMA global resolutions (implicit-explicit); strong scalability for all grid resolutions
Sources:
- https://www2.cisl.ucar.edu/sites/default/files/Abdi_Boulder_2016.pdf
- Giraldo, et al., SIAM CSE, Mar 2017
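For readers less familiar with the scalability metrics quoted above: weak scaling grows the problem with the GPU count (ideal runtime is constant), while strong scaling fixes the problem size (ideal runtime drops as 1/N). A minimal sketch with illustrative timings, not measurements from the NUMA study:

```python
def weak_scaling_efficiency(t1, tn):
    """Weak scaling: problem size grows with resource count, so ideal
    runtime is constant. Efficiency = T(1) / T(N)."""
    return t1 / tn

def strong_scaling_efficiency(t1, tn, n):
    """Strong scaling: fixed problem size, ideal runtime T(1)/N.
    Efficiency = T(1) / (N * T(N))."""
    return t1 / (n * tn)

# Hypothetical timings: 100 s per step on 1 GPU, 111.1 s at scale
# would correspond to the ~90% weak-scaling efficiency cited above.
print(round(weak_scaling_efficiency(100.0, 111.1), 2))   # 0.9
```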
19
Large Scale Climate Simulations with COSMO – Oliver Fuhrer, et al., MeteoSwiss
Strong scaling to 4888 x P100 GPUs on Piz Daint (#3 on the Top500, 25.3 PetaFLOPS, 5320 x P100 GPUs)
Source: PASC 2017, Lugano, CH, Jun 2017; contact Hannes Vogt, CSCS, [email protected]
20
MeteoSwiss Weather Prediction Based on GPUs
World’s First GPU-Accelerated NWP
• Piz Kesch (Cray CS-Storm), installed at CSCS July 2015
• 2 racks with 48 total CPUs and 192 Tesla K80 total GPUs
• High GPU-density nodes: 2 x CPU + 8 x GPU
• > 90% of FLOPS from GPUs
• Operational NWP since March 2016
Image by NVIDIA/MeteoSwiss
21
MeteoSwiss and Operational COSMO NWP on GPUs

MeteoSwiss COSMO NWP configurations since 2008, before GPUs:
• COSMO 7 (6.6 km): 3 per day, 3-day forecast
• COSMO 2 (2.2 km): 8 per day, 24-hour forecast
• IFS from ECMWF: 2 per day, 10-day forecast

MeteoSwiss COSMO NWP configurations during 2016, with GPUs:
• COSMO E (2.2 km): 2 per day, 5-day forecast
• COSMO 1 (1.1 km): 8 per day, 24-hour forecast
• IFS from ECMWF: 2 per day, 10-day forecast

“New configurations of higher resolution and ensemble predictions possible owing to the performance-per-energy gains from GPUs” – X. Lapillonne, MeteoSwiss; EGU Assembly, Apr 2015
23
GPU Developments for the WRF Model
CUDA + OpenACC: TempoQuest
• Plans for commercial “WRF-based” software product; NVIDIA providing standard engineering guidance
• Based on WRF 3.6.1 and 3.8.1, ARW dycore
OpenACC: NVIDIA + NCAR (guidance)
• Migrating 3.6.1 routines to 3.7.1 and 3.8.1
• Working towards unified memory capability
• PGI compiler continues to improve results
• Several P100 customer evaluations completed (example in later slides)
Potential for full-model WRF on GPUs
• Several months away; hybrid in near term
• P100/V100 GPUs will improve hybrid vs. Kepler
24
TempoQuest USA Start-up Commercializing WRF
• TQI HQ in Boulder, CO
• 3 developers focused on CUDA + OpenACC
• Release candidate based on WRF ARW 3.8.1 now under validation and QA
• NVIDIA providing engineering support
25
WRF Modules and Routines Available in OpenACC
Project to implement OpenACC routines into full-model WRF 3.6.1:
• Several dynamics routines, including all of advection
• Several physics schemes (10):
  – Microphysics (4): Kessler, Morrison, Thompson, WSM6
  – Radiation (2): RRTM (LW), Dudhia (SW); plus RRTMG (modified AER version from 3.7.1 trunk)
  – Planetary boundary layer (2): YSU, GWDO
  – Cumulus (1): Kain-Fritsch
  – Surface layer (1): Noah
dyn_em/module_advect_em.OpenACC.F
dyn_em/module_bc_em.OpenACC.F
dyn_em/module_big_step_utilities_em.OpenACC.F
dyn_em/module_diffusion_em.OpenACC.F
dyn_em/module_em.OpenACC.F
dyn_em/module_first_rk_step_part1.OpenACC.F
dyn_em/module_first_rk_step_part2.OpenACC.F
dyn_em/module_small_step_em.OpenACC.F
dyn_em/module_stoch.OpenACC.F
dyn_em/solve_em.OpenACC.F
dyn_em/start_em.OpenACC.F
frame/module_dm.OpenACC.F
frame/module_domain_extra.OpenACC.F
frame/module_domain.OpenACC.F
frame/module_domain_type.OpenACC.F
frame/module_integrate.OpenACC.F
share/mediation_integrate.OpenACC.F
share/module_bc.OpenACC.F
share/wrf_bdyin.OpenACC.F
phys/module_bl_gwdo.OpenACC.F
phys/module_bl_ysu.OpenACC.F
phys/module_cu_kfeta.OpenACC.F
phys/module_cumulus_driver.OpenACC.F
phys/module_microphysics_driver.OpenACC.F
phys/module_microphysics_zero_out.OpenACC.F
phys/module_mp_kessler.OpenACC.F
phys/module_mp_morr_two_moment.OpenACC.F
phys/module_mp_thompson.OpenACC.F
phys/module_mp_wsm6.OpenACC.F
phys/module_pbl_driver.OpenACC.F
phys/module_physics_addtendc.OpenACC.F
phys/module_physics_init.OpenACC.F
phys/module_ra_rrtm.OpenACC.F
phys/module_ra_sw.OpenACC.F
phys/module_sf_noahlsm.OpenACC.F
phys/module_sf_sfclayrev.OpenACC.F
phys/module_surface_driver.OpenACC.F
Routines completed: Dynamics (11), Physics (18), Other (8)
26
WRF OpenACC Performance: Weather Service in JP
[Chart: runtime vs. CPU-only (32, 64, 128 cores) and CPU + P100 GPU (4+4, 8+8, 16+16) configurations]
Comparisons on NVIDIA PSG Cluster: 2x Intel Xeon E5-2698v3 Haswell CPUs (2.3 GHz, 16 cores), 256 GB memory, 4x P100 GPUs, PGI 16.9
WRF Domain (5 km):
• WRF 3.6.1
• Grid = 661 x 711 x 51 (~24 M points)
• Time step = 30 sec, 720 time steps
• MP = Morrison
• LSM = Noah
• PBL = YSU
• Rad = Dudhia + RRTM
27
WRF OpenACC Performance: Weather Service in JP
[Chart: same PSG cluster and 5-km WRF domain as the previous slide; reference runtime ~7400 s; ~15x speedup with 16 P100 GPUs]
30
ESCAPE Development of Weather & Climate Dwarfs
NVIDIA-Developed Dwarf: Spectral Transform – Spherical Harmonics
GPU-based spectral transform approach:
• For all vertical layers:
  – Batched 1D FFT (variable length) along all latitudes
  – Batched Legendre transform (GEMM, variable length) along all longitudes
• NVIDIA libraries for FFT and GEMM
• OpenACC directives
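The data flow of the dwarf described above, a Fourier transform around each latitude circle followed by a Legendre-style matrix multiply over latitudes, can be sketched on the CPU with a naive DFT and matmul standing in for the batched cuFFT/GEMM calls. The grid sizes and the "legendre" matrix below are illustrative placeholders, not the IFS operators:

```python
# CPU sketch of the spectral-transform dwarf's data flow:
#   step 1: 1D Fourier transform along each latitude circle (cuFFT stand-in)
#   step 2: matrix multiply over latitudes per zonal wavenumber (GEMM stand-in)
# The GPU version batches these per vertical layer; plain loops keep
# this sketch dependency-free.
import cmath

def dft(row):
    """Naive O(n^2) DFT of one latitude circle."""
    n = len(row)
    return [sum(row[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                for j in range(n)) for k in range(n)]

def spectral_transform(grid, legendre):
    """grid: nlat x nlon field; legendre: nspec x nlat weight matrix."""
    fourier = [dft(row) for row in grid]          # FFT per latitude
    nlat, nlon, nspec = len(grid), len(grid[0]), len(legendre)
    # GEMM over latitudes for each zonal wavenumber m
    return [[sum(legendre[s][i] * fourier[i][m] for i in range(nlat))
             for m in range(nlon)] for s in range(nspec)]

# Tiny illustrative example: constant field, trivial "Legendre" weights.
grid = [[1.0] * 8 for _ in range(4)]   # 4 latitudes, 8 longitudes
legendre = [[0.25] * 4]                # one spectral row: mean over latitudes
spec = spectral_transform(grid, legendre)
# A constant field has all energy in zonal wavenumber m = 0:
print(round(abs(spec[0][0]), 6))       # 8.0
print(round(abs(spec[0][1]), 6))       # 0.0
```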
31
Results of Multi-GPU Spectral Transform Dwarf
Source: ESCAPE 2nd Dissemination Workshop, 5-7 Sep 2017, Poznań, Poland
33
Sample of Inside HPC Headlines Just Last Week . . .
“It’s somewhat ironic that training for deep learning probably has more similarity to the HPL benchmark than many of the simulations that are run today…” – Kathy Yelick, LBNL
34
Hardware Trends for Operational and Research HPC

Organization  Location          Models            Previous HPC    Current HPC                          Size / Cost (M) / Date
ECMWF         Reading, UK       IFS               IBM Power       Cray XC30 – x86                      3.5 PF / $65 / Jun 2013
Met Office    Exeter, UK        UM                IBM Power       Cray XC30 – x86                      16 PF / $120 / Oct 2014
DWD           Offenbach, DE     COSMO, ICON       NEC SX-9        Cray XC30 – x86                      2 PF / $23 / Jan 2013
MF            Toulouse, FR      Arpege, Arome     NEC SX-9        Bull – x86                           5 PF / $? / Nov 2012
NOAA NCEP     Various, US       GFS, HRRR/WRF     IBM Power       IBM iDataPlex – x86; Cray XC40 – x86 5 PF / $50/yr / Oct 2015
Env Canada    Montreal, CA      GEM-YY, WRF       IBM Power       TBA ~2016                            > 5 PF
JMA           Tokyo, JP         GSM, ASUCA        Hitachi Power   TBA ~2016                            > 5 PF
DKRZ/MPI-M    Hamburg, DE       ICON, MPI-ESM     IBM Power       Bull – x86                           3 PF / $35 / May 2014
NCAR          Boulder, CO, US   CESM, WRF, MPAS   IBM iDataPlex   SGI ICE XA – x86                     5.34 PF / ~$60 / Jan 2016
NOAA ESRL     Fairmont, WV, US  FV3, MPAS, WRF    Various         Cray CS-Storm – x86                  TBA ~2016

(The original slide grouped these rows with sidebar labels “Operational and Research” and “Research”.)
35
Use Cases for HPC + Artificial Intelligence
“A confluence of developments is driving this new wave of AI development. Computer power is growing, algorithms and AI models are becoming more sophisticated, and, perhaps most important of all, the world is generating once unimaginable volumes of the fuel that powers AI—data.”
“Billions of gigabytes every day, collected by networked devices . . .”
36
HPC + AI for Weather and Climate Applications
AI in Weather Applications
Challenges for HPC and AI in Weather and Climate
NERSC
Yandex + Start-ups
NOAA, MCH, others
NCAR, KISTI, others
39
Background
Unexpected fog can cause an airport to cancel or delay flights, sometimes with global effects on flight planning.
Challenge
While the weather forecasting model at MeteoSwiss works at a 2 km x 2 km resolution, the runways at Zurich airport are less than 2 km long. So human forecasters sift through huge volumes of simulated data with 40 parameters, such as wind, pressure, and temperature, to predict visibility at the airport.
Solution
MeteoSwiss is investigating the use of deep learning to forecast fog type and visibility at sub-km scale at Zurich airport.
MCH Use of DL Models for Fog Forecast at Airports
40
NOAA Neural Network Study on Thompson MP
https://ams.confex.com/ams/97Annual/webprogram/Paper310969.html
41
KISTI Weather Research Deploys Deep Learning
Courtesy Dr. Minsu Joh, KISTI, May 2017
KISTI Schematic of HPC Systems with 60 x K40m GPUs
KISTI Disaster Management HPC Research Group deploys GPUs for numerical and deep learning models in the prediction of extreme weather
KISTI to present this research at Climate Informatics (CI) 2017, Boulder, CO, Sep 2017
44
KISTI Weather Research Deploys Deep Learning
Courtesy Dr. Minsu Joh, KISTI, May 2017
KISTI schematic of the workflow of a hybrid numerical plus deep learning model
KISTI scientists use output from the MPAS numerical atmosphere model to train a deep learning model for improved forecast tracking of typhoons
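The hybrid workflow described above pairs numerical model output with a learned prediction step. A minimal dependency-free sketch of that idea, fitting a least-squares model on synthetic features standing in for fields extracted from MPAS output; all data, coefficients, and names here are illustrative, not KISTI's actual pipeline:

```python
# Toy "numerical model output -> learned predictor" workflow.
# Features stand in for MPAS-derived fields (e.g. steering winds);
# the target stands in for the storm's next-step displacement.
import random

random.seed(0)

def make_sample():
    u, v = random.uniform(-20, 20), random.uniform(-20, 20)  # "steering wind"
    dx = 0.8 * u + 0.1 * v          # synthetic ground-truth displacement
    return [u, v], dx

def fit_2d_least_squares(X, y):
    """Solve the 2x2 normal equations (X^T X) w = X^T y by hand."""
    a = sum(x[0] * x[0] for x in X); b = sum(x[0] * x[1] for x in X)
    c = b;                           d = sum(x[1] * x[1] for x in X)
    p = sum(x[0] * t for x, t in zip(X, y))
    q = sum(x[1] * t for x, t in zip(X, y))
    det = a * d - b * c
    return [(d * p - b * q) / det, (a * q - c * p) / det]

train = [make_sample() for _ in range(200)]
X = [s[0] for s in train]; y = [s[1] for s in train]
w = fit_2d_least_squares(X, y)
print(round(w[0], 3), round(w[1], 3))   # recovers ~0.8 and ~0.1
```

On noiseless synthetic data the fit recovers the generating coefficients exactly; a real pipeline would use many more predictors, a nonlinear model, and observed best-track targets.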