
Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA

NVIDIA HPC Update for Earth System Modeling

2

TOPICS OF DISCUSSION

• NVIDIA HPC UPDATE

• ESM PROGRESS WITH GPUS

• COSMO

• WRF

• ESCAPE/IFS

• ESM GROWTH IN HPC + AI

3

New HQ in Santa Clara, CA, USA

FY17 Rev ~$7B USD

GTC 2018 Mar 26-29

NVIDIA Company Update

NVIDIA Core Markets: Artificial Intelligence, Computer Graphics, GPU Computing

www.gputechconf.com/present/call-for-submissions

4

3x HPC Developers: 120,000 (2014) to 400,000 (2016)

25x Deep Learning Developers: 2,200 (2014) to 55,000 (2016)

Developer segments: Higher Ed 35%, Software 19%, Internet 15%, Auto 10%, Government 5%, Medical 4%, Finance 4%, Manufacturing 4%

NVIDIA Growth from Advancement of HPC and AI

GPUs Power World’s Leading Data Centers for HPC and AI:

5

                           V100 (2017)              P100 (2016)        K40 (2014)
Double Precision TFlop/s   7.5                      5.3                1.4
Single Precision TFlop/s   15.0                     10.6               4.3
Half Precision TFlop/s     120 (DL)                 21.2               n/a
Memory Bandwidth (GB/s)    900                      720                288
Memory Size                16 GB                    16 GB              12 GB
Interconnect               NVLink: up to 300 GB/s   NVLink: 160 GB/s   PCIe: 16 GB/s
                           PCIe: 32 GB/s            PCIe: 32 GB/s
Power                      300 W                    300 W              235 W

NVIDIA Volta – GPU Feature Comparisons

Volta Availability: DGX-1 Q3 2017; OEM Q4 2017

Generational ratios annotated on the slide: V100 vs. P100 roughly 1.42x DP, 1.42x SP, ~6x half precision, 1.25x memory bandwidth; P100 vs. K40 roughly 3.8x DP, 2.5x SP, 2.5x memory bandwidth, 1.33x memory size.

6

NVIDIA NVLink Fast Interconnect

x86 Cluster Node: GPU-to-GPU

Power Cluster Node: CPU-to-GPU-to-GPU
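As a rough, hedged illustration of the GPU-to-GPU path highlighted above (not from the original slides; device IDs and the buffer size are placeholders), the C sketch below enables CUDA peer access between two GPUs and issues a direct device-to-device copy, which the CUDA runtime routes over NVLink when the GPUs are NVLink-connected:

```c
/* Minimal sketch: direct GPU-to-GPU copy between two peer GPUs.
 * On NVLink-connected GPUs the runtime uses the NVLink path;
 * otherwise it falls back to PCIe. Device IDs and sizes are
 * illustrative placeholders, not from the original slides. */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int dev0 = 0, dev1 = 1;
    const size_t nbytes = 256 * 1024 * 1024;   /* 256 MB test buffer */
    int can01 = 0, can10 = 0;

    cudaDeviceCanAccessPeer(&can01, dev0, dev1);
    cudaDeviceCanAccessPeer(&can10, dev1, dev0);
    if (!can01 || !can10) {
        fprintf(stderr, "Peer access not supported between GPUs %d and %d\n", dev0, dev1);
        return EXIT_FAILURE;
    }

    /* Enable peer access in both directions. */
    cudaSetDevice(dev0);
    cudaDeviceEnablePeerAccess(dev1, 0);
    cudaSetDevice(dev1);
    cudaDeviceEnablePeerAccess(dev0, 0);

    /* Allocate one buffer on each GPU. */
    void *buf0 = NULL, *buf1 = NULL;
    cudaSetDevice(dev0);
    cudaMalloc(&buf0, nbytes);
    cudaSetDevice(dev1);
    cudaMalloc(&buf1, nbytes);

    /* Direct device-to-device copy (GPU0 -> GPU1). */
    cudaMemcpyPeer(buf1, dev1, buf0, dev0, nbytes);
    cudaDeviceSynchronize();

    cudaFree(buf1);
    cudaSetDevice(dev0);
    cudaFree(buf0);
    printf("Copied %zu bytes GPU%d -> GPU%d\n", nbytes, dev0, dev1);
    return 0;
}
```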

7

DOE CORAL Systems with Volta and NVLink

LLNL Sierra: 150 PF in 2018; ORNL Summit: 200 PF in 2018

CAAR support from IBM and NVIDIA


27,600 GPUs

8

ACME: Accelerated Climate Modeling for Energy, the first fully accelerated climate model (GPU and MIC)

Consolidation of DOE ESM projects from 7 into 1; DOE Labs: Argonne, LANL, LBL, LLNL, ORNL, PNNL, Sandia

Towards NH global resolution: Atm 12 km, Ocn 15 km, 80-year simulations

ACME component models and GPU progress:
Atm – ACME-Atmosphere (NCAR CAM-SE fork): dycore now in trunk, CAM physics started with OpenACC
Ocn – MPAS-O (LANL): LANL team at ORNL OpenACC Hackathon during 2015
Others with published OpenACC progress:
Sea-Ice – ACME-CICE (LANL)
Land – CLM (ORNL, NCAR)
Cloud Superparameterization – SAM (SBU, CSU)
Land-Ice – PISCEES (multi-lab: LLNL, Sandia)

DOE ACME GPU-Accelerated Coupled Climate Model

9

TOPICS OF DISCUSSION

• NVIDIA HPC UPDATE

• ESM PROGRESS WITH GPUS

• COSMO

• WRF

• ESCAPE/IFS

• ESM GROWTH IN HPC + AI

10

Developer Relations: Stan Posey – Santa Clara, CA, US (HQ), [email protected]

Developer Technology:
Carl Ponder, PhD – Austin, TX, US: WRF, MPAS-A, FV3, GEOS-5, COAMPS, GEM
Jeff Larkin – Oak Ridge, TN, US: CAM-SE, all ACME component models
Jeremy Appleyard, PhD – Oxford, UK: IFS, NEMO, UM/GungHo, GOcean
Peter Messmer, PhD – Zurich, CH: IFS/ESCAPE, COSMO, ICON
Akira Naruse, PhD – Tokyo, JP: JMA-GSM, ASUCA, NICAM
. . .

PGI Applications Engineering: Dave Norton – Lake Tahoe, CA, US: all models that use PGI compilers

Business Alliances:
Steve Rohm – Charlotte, NC, US: US East (NOAA, EC, DOE, DoD, NASA)
Greg Branch – Boulder, CO, US: US West (NOAA, NCAR, DOE, NASA)
Jeremy Purches – Bristol, UK: ECMWF, UKMO, STFC, Northern Europe
Stefan Kraemer – Würselen, DE: DWD, DKRZ/MPI-M, MCH, Central Europe
Frederic Pariente – Toulouse, FR: MF, IPSL, CNRS, CERFACS, Southern Europe
. . .

Solution Architects: Jeff Adie – Singapore: WRF, MPAS, any ESM sales opportunity

Several others worldwide – contact [email protected]

NVIDIA Support for Earth System Modeling Domain

11

Worldwide ESM growth in GPU funded development: NOAA, NCAR, ECMWF, DOE, DoD

Large ESM-driven GPU systems (K80/P100): NOAA, ECMWF, CSCS, NIES, Others

First-ever GPU-based operational NWP: MeteoSwiss with COSMO (since 2016); ~4x speedup with ~5x less energy vs. conventional CPU-only; new COSMO evaluations by met services in DE, RU, IT

DOE climate model ACME-Atm v1 production on TITAN using PGI OpenACC

NCAR collaboration on MPAS with 2016 & 2017 GPU Hands-on Workshops https://www2.cisl.ucar.edu/news/summertime-hackathon (focus on GPU development of MPAS-A with KISTI)

ECMWF selected NVIDIA as a partner for the ESCAPE exascale weather project https://software.ecmwf.int/wiki/display/OPTR/NVIDIA+Basic+GPU+Training+with+emphasis+on+Fortran+and+OpenACC

NEMO Systems Team invitation for NVIDIA to join new HPC working group, following successful NVIDIA OpenACC scalability of NEMO for the ORCA025 configuration (NEMO UGM 2014)

New ESM opportunities developing in new solution focus areas: DL in climate and weather; BI for agriculture and actuarial science; air quality monitoring (CN, KR); commercial WRF start-up TQI

ESM development teams with participation in global GPU Hackathons: DOE/ACME, NRL/COAMPS, MPI-M/ECHAM6, ODU/FVCOM, NRL/HYCOM, NOAA GFDL radiation, RRTMGP

Select NVIDIA ESM Highlights Since the Multi-Core 6 Workshop

12

SENA – NOAA funding for accelerator development of WRF, NGGPS (FV3), GFDL climate, NMMB

ESCAPE – ECMWF-led EUC Horizon 2020 program for IFS; NVIDIA 1 of 11 funded partners

ACME – US DOE accelerated climate model: CAM-SE, MPAS-O, CICE, CLM, SAM, PISCEES, others

AIMES – Governments of DE, FR, and JP funding HPC (and GPU) developments of ICON, DYNAMICO, NICAM

SIParCS – NCAR academia funding for HPC (and GPU) developments of MPAS, CESM, DART, Fields

AOLI – US DoD accelerator development of operational models HYCOM, NUMA, CICE, RRTMG

GridTools – Swiss government funding of MCH/CSCS/ETH for an accelerator-based DSL in COSMO, ICON, others

GPU Funded-Development Growing for ESM

NOTE: Follow each program link for details; programs are listed top-down in rough order of newest to oldest start date

HPC Programs with Funding Specifically Targeted for GPU Development of Various ESMs

13

Important GPU-Based NWP System Deployments

NOAA to improve NWP and climate research with GPUs:
Develop a global model with 3 km resolution, a five-fold increase over today's resolution
NWP model: FV3/GFS (also climate research)
Cray CS-Storm, 760 x P100, 8 GPUs per node

MeteoSwiss deploys the world's first operational NWP on GPUs:
2-3x higher resolution for daily forecasts
14x more simulation with an ensemble approach for medium-range forecasts
NWP model: COSMO
Cray CS-Storm, 192 x K80, 8 GPUs per node

14

NGGPS final selection: FV3

Models with GPU developments: NIM, MPAS, NEPTUNE/NUMA, FV3

NOAA NGGPS Motivated Several GPU Collaborations

From: Next Generation HPC and Forecast Model Application Readiness at NCEP, by John Michalakes, NOAA NCEP; AMS, Phoenix, AZ, Jan 2015

NGGPS NH Model Dycore Candidates (5)

15

NOAA NIM Model and Scaling on Dense GPU Node

Source: NVIDIA GTC 2016

Results by NOAA ESRL

16

NCAR MPAS Model and OpenACC Developments

Results by NCAR, presented at the ISC17 BoF “Cloud Resolving Global Earth-System Models: HPC at Its Extreme”, 20 June, Frankfurt, DE

MPAS-A (dycore)   Global grid resolution   P100 vs. BDW node (2 x BDW sockets)   P100 vs. BDW (1 x BDW socket)
v5.0 trunk        120 km                   2.5x (actual)                         5.0x (estimate)
v5.0 trunk        60 km                    2.7x (actual)                         5.4x (estimate)
v5.0 trunk        30 km                    n/a (exceeded memory of a single GPU)

Source: http://www.isc-hpc.com/isc17_ap/sessiondetails.htm?t=session&o=444&a=select&ra=index

17

US NRL NUMA Dycore and Scalability on TITAN

Sources: https://www2.cisl.ucar.edu/sites/default/files/Abdi_Boulder_2016.pdf; Giraldo et al., SIAM CSE, Mar 2017

Weak Scalability to 16,384 GPUs for 3 NUMA Numerical Methods (Explicit)

Strong Scalability to 4,096 GPUs for 3 NUMA Global Resolutions (Implicit-Explicit)

Multi-Core 6 Workshop, 14 Sep 2016, NCAR, Boulder, USA: Towards Exascale Computing with the Atmospheric Model NUMA, by Dr. Daniel Abdi and Frank Giraldo, NPS

• Strong scalability for all grid resolutions
• Weak scalability at 90% efficiency

18

TOPICS OF DISCUSSION

• NVIDIA HPC UPDATE

• ESM PROGRESS WITH GPUS

• COSMO

• WRF

• ESCAPE/IFS

• ESM GROWTH IN HPC + AI

19

Large Scale Climate Simulations with COSMO

Source: PASC 2017, Lugano, CH, Jun 2017; Contact Hannes Vogt, CSCS, [email protected]

Strong Scaling to 4888 x P100 GPUs

Piz Daint: #3 on the Top500, 25.3 PetaFLOPS, 5,320 x P100 GPUs

- Oliver Fuhrer et al., MeteoSwiss

20

MeteoSwiss Weather Prediction Based on GPUs

World’s First GPU-Accelerated NWP

Piz Kesch (Cray CS Storm)

Installed at CSCS July 2015

2x Racks with 48 Total CPUs

192 Tesla K80 Total GPUs

High GPU Density Nodes:

2 x CPU + 8 x GPU

> 90% of FLOPS from GPUs

Operational NWP since Mar 2016

Image by NVIDIA/MeteoSwiss

21

MeteoSwiss COSMO NWP Configurations Since 2008 (Before GPUs):
• COSMO 7 (6.6 km): 3 per day, 3-day forecast
• COSMO 2 (2.2 km): 8 per day, 24-hr forecast
• IFS from ECMWF: 2 per day, 10-day forecast

MeteoSwiss COSMO NWP Configurations During 2016 (With GPUs):
• COSMO E (2.2 km): 2 per day, 5-day forecast
• COSMO 1 (1.1 km): 8 per day, 24-hr forecast
• IFS from ECMWF: 2 per day, 10-day forecast

“New configurations of higher resolution and ensemble predictions possible owing to the performance-per-energy gains from GPUs” – X. Lapillonne, MeteoSwiss; EGU Assembly, Apr 2015

MeteoSwiss and Operational COSMO NWP on GPUs

22

TOPICS OF DISCUSSION

• NVIDIA HPC UPDATE

• ESM PROGRESS WITH GPUS

• COSMO

• WRF

• ESCAPE/IFS

• ESM GROWTH IN HPC + AI

23

CUDA + OpenACC: TempoQuest
• Plans for commercial “WRF-based” software product
• NVIDIA providing standard engineering guidance
• Based on WRF 3.6.1 and 3.8.1, ARW dycore

OpenACC: NVIDIA + NCAR (guidance)
• Migrating 3.6.1 routines to 3.7.1 and 3.8.1
• Working towards unified memory capability
• PGI compiler continues to improve results
• Several P100 customer evaluations completed (example in later slides)

Potential for full-model WRF on GPUs
• Several months away; hybrid in near term
• P100/V100 GPUs will improve hybrid vs. Kepler

GPU Developments for the WRF Model

24

TQI HQ in Boulder; 3 developers focus on CUDA + OpenACC

Release candidate based on WRF ARW 3.8.1 now under validation and QA

NVIDIA providing engineering support

TempoQuest USA Start-up Commercializing WRF

25

WRF Modules and Routines Available in OpenACC

Project to implement OpenACC routines into full-model WRF 3.6.1: several dynamics routines, including all of advection

Several physics schemes (10):
• Microphysics (4) – Kessler, Morrison, Thompson, WSM6
• Radiation (2) – RRTM (LW), Dudhia (SW); RRTMG (modified AER version from 3.7.1 trunk)
• Planetary boundary layer (2) – YSU, GWDO
• Cumulus (1) – Kain-Fritsch
• Surface layer (1) – Noah

dyn_em/module_advect_em.OpenACC.F

dyn_em/module_bc_em.OpenACC.F

dyn_em/module_big_step_utilities_em.OpenACC.F

dyn_em/module_diffusion_em.OpenACC.F

dyn_em/module_em.OpenACC.F

dyn_em/module_first_rk_step_part1.OpenACC.F

dyn_em/module_first_rk_step_part2.OpenACC.F

dyn_em/module_small_step_em.OpenACC.F

dyn_em/module_stoch.OpenACC.F

dyn_em/solve_em.OpenACC.F

dyn_em/start_em.OpenACC.F

frame/module_dm.OpenACC.F

frame/module_domain_extra.OpenACC.F

frame/module_domain.OpenACC.F

frame/module_domain_type.OpenACC.F

frame/module_integrate.OpenACC.F

share/mediation_integrate.OpenACC.F

share/module_bc.OpenACC.F

share/wrf_bdyin.OpenACC.F

phys/module_bl_gwdo.OpenACC.F

phys/module_bl_ysu.OpenACC.F

phys/module_cu_kfeta.OpenACC.F

phys/module_cumulus_driver.OpenACC.F

phys/module_microphysics_driver.OpenACC.F

phys/module_microphysics_zero_out.OpenACC.F

phys/module_mp_kessler.OpenACC.F

phys/module_mp_morr_two_moment.OpenACC.F

phys/module_mp_thompson.OpenACC.F

phys/module_mp_wsm6.OpenACC.F

phys/module_pbl_driver.OpenACC.F

phys/module_physics_addtendc.OpenACC.F

phys/module_physics_init.OpenACC.F

phys/module_ra_rrtm.OpenACC.F

phys/module_ra_sw.OpenACC.F

phys/module_sf_noahlsm.OpenACC.F

phys/module_sf_sfclayrev.OpenACC.F

phys/module_surface_driver.OpenACC.F

Routines completed: Dynamics (11), Physics (18), Other (8)

(A minimal sketch of the OpenACC directive pattern follows.)
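The WRF port itself is written in Fortran with OpenACC; purely as a hedged illustration of the directive pattern such physics and dynamics loops use, the C sketch below (field names, sizes, and the update formula are invented) offloads a WRF-style (i,k,j) loop nest with OpenACC data and parallel loop directives. The data region keeps fields resident on the GPU across the loop nest, in the spirit of the resident/unified-memory goals mentioned above.

```c
/* Minimal OpenACC sketch of a WRF-style (i,k,j) loop nest.
 * The real WRF port is in Fortran; names, sizes, and the update
 * formula here are illustrative only. Compile e.g. with:
 *   pgcc -acc -ta=tesla -Minfo=accel acc_sketch.c */
#include <stdio.h>
#include <stdlib.h>

#define NI 128
#define NK 64
#define NJ 128
#define IDX(i,k,j) (((j)*NK + (k))*NI + (i))

int main(void)
{
    size_t n = (size_t)NI * NK * NJ;
    float *t  = malloc(n * sizeof *t);   /* e.g. temperature tendency */
    float *qv = malloc(n * sizeof *qv);  /* e.g. water vapor field    */

    for (size_t m = 0; m < n; m++) { t[m] = 0.0f; qv[m] = 1.0e-3f; }

    /* Keep the fields resident on the GPU for the whole loop nest,
     * copying in qv and copying the updated t back out. */
    #pragma acc data copyin(qv[0:n]) copy(t[0:n])
    {
        #pragma acc parallel loop collapse(2)
        for (int j = 0; j < NJ; j++) {
            for (int i = 0; i < NI; i++) {
                /* Vertical (k) loop kept sequential per column,
                 * as is typical for column physics. */
                #pragma acc loop seq
                for (int k = 0; k < NK; k++) {
                    t[IDX(i,k,j)] += 2.5e6f * qv[IDX(i,k,j)] / 1004.0f;
                }
            }
        }
    }

    printf("t[0] = %g\n", t[0]);
    free(t); free(qv);
    return 0;
}
```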

26

[Chart: WRF runtime for CPU-only runs (32, 64, 128 cores) vs. hybrid runs (4+4, 8+8, 16+16 CPU cores + P100 GPUs)]

WRF OpenACC Performance: Weather Service in JP

Comparisons on NVIDIA PSG Cluster

2x Intel Xeon E5-2698V3 Haswell CPUs (2.3GHz, 16 cores) 256 GB memory, 4x P100 GPUs, PGI 16.9

WRF Domain (5km)

• WRF 3.6.1

• Grid = 661 x 711 x 51 (24 MM)

• Time step = 30 sec, 720 TS’s

• MP = Morrison

• LSM = Noah

• PBL = YSU

• Rad = Dudhia + RRTM

27

[Chart: same CPU-only vs. CPU + GPU comparison as the previous slide, with results annotated]

(Same NVIDIA PSG cluster and WRF 5 km domain configuration as the previous slide.)

Annotated results: ~7400 s; ~15x speedup with 16 GPUs

WRF OpenACC Performance: Weather Service in JP

28

TOPICS OF DISCUSSION

• NVIDIA HPC UPDATE

• ESM PROGRESS WITH GPUS

• COSMO

• WRF

• ESCAPE/IFS

• ESM GROWTH IN HPC + AI

29

NVIDIA Member of ESCAPE Program on Exascale NWP


30

ESCAPE Development of Weather & Climate Dwarfs

Batched Legendre Transform (GEMM), variable length; Batched 1D FFT, variable length

NVIDIA-Developed Dwarf: Spectral Transform - Spherical Harmonics

GPU-based Spectral Transform Approach:
• For all vertical layers: 1D FFT along all latitudes, GEMM along all longitudes
• NVIDIA-based libraries for FFT and GEMM
• OpenACC directives

(A minimal sketch of this batched pattern follows.)
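As a hedged sketch of this batched approach (dimensions, buffer contents, and the per-level matrix are invented; the real dwarf handles variable-length batches and the spherical-harmonic details omitted here), the C code below uses cuFFT's cufftPlanMany for the batched 1D latitude FFTs and cuBLAS's strided-batched GEMM as a stand-in for the per-level Legendre transform:

```c
/* Minimal sketch of a batched spectral-transform step:
 * batched 1D FFTs (cuFFT) followed by a batched GEMM (cuBLAS).
 * Dimensions and data are placeholders; the actual ESCAPE dwarf
 * uses variable-length batches and spherical-harmonic weights. */
#include <cuda_runtime.h>
#include <cufft.h>
#include <cublas_v2.h>

int main(void)
{
    const int nlon = 256, nlat = 128, nlev = 64;   /* illustrative grid  */
    const int nwav = nlat;                         /* placeholder spectral dim */
    const int nfft_batch = nlat * nlev;            /* one FFT per latitude per level */

    /* Device buffers: grid-point field, Fourier coefficients,
     * per-level transform matrices, and spectral output. */
    cufftComplex *field, *fourier;
    float *legendre, *spectral;
    cudaMalloc((void **)&field,    sizeof(cufftComplex) * nlon * nfft_batch);
    cudaMalloc((void **)&fourier,  sizeof(cufftComplex) * nlon * nfft_batch);
    cudaMalloc((void **)&legendre, sizeof(float) * nwav * nlat * nlev);
    cudaMalloc((void **)&spectral, sizeof(float) * nwav * 2 * nlon * nlev);

    /* Placeholder data (zeros) so the calls below are well-defined. */
    cudaMemset(field, 0, sizeof(cufftComplex) * nlon * nfft_batch);
    cudaMemset(legendre, 0, sizeof(float) * nwav * nlat * nlev);

    /* 1) Batched 1D FFT along each latitude circle, for all levels. */
    cufftHandle plan;
    int n[1] = { nlon };
    cufftPlanMany(&plan, 1, n, NULL, 1, nlon, NULL, 1, nlon,
                  CUFFT_C2C, nfft_batch);
    cufftExecC2C(plan, field, fourier, CUFFT_FORWARD);

    /* 2) Batched GEMM: per level, multiply the (nwav x nlat) transform
     *    matrix by the (nlat x 2*nlon) real/imag Fourier coefficients. */
    cublasHandle_t blas;
    cublasCreate(&blas);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemmStridedBatched(blas, CUBLAS_OP_N, CUBLAS_OP_N,
        nwav, 2 * nlon, nlat,
        &alpha,
        legendre, nwav, (long long)nwav * nlat,
        (const float *)fourier, nlat, (long long)nlat * 2 * nlon,
        &beta,
        spectral, nwav, (long long)nwav * 2 * nlon,
        nlev);
    cudaDeviceSynchronize();

    cublasDestroy(blas);
    cufftDestroy(plan);
    cudaFree(field); cudaFree(fourier); cudaFree(legendre); cudaFree(spectral);
    return 0;
}
```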

31

Results of Multi-GPU Spectral Transform Dwarf

Source: ESCAPE 2nd Dissemination Workshop, 5-7 Sep 2017, Poznań, Poland

32

TOPICS OF DISCUSSION

• NVIDIA HPC UPDATE

• ESM PROGRESS WITH GPUS

• COSMO

• WRF

• ESCAPE/IFS

• ESM GROWTH IN HPC + AI

33

Sample of Inside HPC Headlines Just Last Week . . .

“It’s somewhat ironic that training for deep learning probably has more similarity to the HPL benchmark than many of the simulations that are run today…” – Kathy Yelick, LBNL

34

Hardware Trends for Operational and Research HPC

Organization | Location | Models | Previous HPC | Current HPC | Size / Cost (M) / Date
ECMWF | Reading, UK | IFS | IBM Power | Cray XC30 – x86 | 3.5 PF / $65 / Jun 2013
Met Office | Exeter, UK | UM | IBM Power | Cray XC30 – x86 | 16 PF / $120 / Oct 2014
DWD | Offenbach, DE | COSMO, ICON | NEC SX-9 | Cray XC30 – x86 | 2 PF / $23 / Jan 2013
MF | Toulouse, FR | Arpege, Arome | NEC SX-9 | Bull – x86 | 5 PF / $? / Nov 2012
NOAA NCEP | Various, US | GFS, HRRR/WRF | IBM Power | IBM iDataPlex – x86, Cray XC-40 – x86 | 5 PF / $50/yr / Oct 2015
Env Canada | Montreal, CA | GEM-YY, WRF | IBM Power | TBA ~2016 | > 5 PF
JMA | Tokyo, JP | GSM, ASUCA | Hitachi Power | TBA ~2016 | > 5 PF
DKRZ/MPI-M | Hamburg, DE | ICON, MPI-ESM | IBM Power | Bull – x86 | 3 PF / $35 / May 2014
NCAR | Boulder, CO, US | CESM, WRF, MPAS | IBM iDataPlex | SGI ICE XA – x86 | 5.34 PF / ~$60 / Jan 2016
NOAA ESRL | Fairmont, WV, US | FV3, MPAS, WRF | Various | Cray CS-Storm – x86 | TBA ~2016

35

Use Cases for HPC + Artificial Intelligence

“A confluence of developments is driving this new wave of AI development. Computer power is growing, algorithms and AI models are becoming more sophisticated, and, perhaps most important of all, the world is generating once unimaginable volumes of the fuel that powers AI—data.”

“Billions of gigabytes every day, collected by networked devices . . .”

36

HPC + AI for Weather and Climate Applications

[Diagram: AI in Weather Applications and Challenges for HPC and AI in Weather and Climate, with examples from NERSC, Yandex + start-ups, NOAA, MCH, NCAR, KISTI, and others]

37

NWP Nowcasting Systems Apply Deep Learning

38

NWP Nowcasting Systems Apply Deep Learning

39

Background

Unexpected fog can cause an airport to cancel or delay flights, sometimes having global effects on flight planning.

Challenge

While the weather forecasting model at MeteoSwiss works at a 2 km x 2 km resolution, the runways at Zurich airport are less than 2 km long. So human forecasters sift through huge volumes of simulated data with 40 parameters, such as wind, pressure, and temperature, to predict visibility at the airport.

Solution

MeteoSwiss is investigating the use of deep learning to forecast fog type and visibility at sub-km scale at Zurich airport.

MCH Use of DL Models for Fog Forecast at Airports

41

KISTI Weather Research Deploys Deep Learning

Courtesy Dr. Minsu Joh, KISTI, May 2017

KISTI Schematic of HPC Systems with 60 x K40m GPUs

KISTI Disaster Management HPC Research Group deploys GPUs for numerical and deep learning models in the prediction of extreme weather

42

Courtesy Dr. Minsu Joh, KISTI, May 2017

KISTI Disaster Management HPC Research Group deploys GPUs for numerical and deep learning models in the prediction of extreme weather

KISTI to present this research at Climate Informatics (CI) 2017, Boulder, CO, Sep 2017

KISTI Weather Research Deploys Deep Learning

43

Courtesy Dr. Minsu Joh, KISTI, May 2017

(Same KISTI research summary as the previous slide; more detail on the next slide.)

KISTI Weather Research Deploys Deep Learning

44

Courtesy Dr. Minsu Joh, KISTI, May 2017

KISTI schematic of the workflow of a hybrid numerical plus deep learning model

KISTI scientists use output from the MPAS numerical atmosphere model to train a deep learning model for improved forecast tracking of typhoons

KISTI Weather Research Deploys Deep Learning

Stan Posey, [email protected]

Thank you and Questions?