
Federal Department of Home Affairs FDHA
Federal Office of Meteorology and Climatology MeteoSwiss

DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models

Carlos Osuna, MeteoSwiss

carlos.osuna@meteoswiss.ch

V. Clement, O. Fuhrer, S. Moosbrugger, X. Lapillonne, P. Spoerri, T. Schulthess, F. Thuering, H. Vogt, T. Wicky

MPI Seminar, 21 November 2017

[Figures: COSMO 2 km ensembles (21 members) | COSMO 1 km]


[Figure: hardware landscape: multi-core, GPU, MIC]

Top 500 overview: Sunway TaihuLight, Tianhe-2, Titan, Sequoia, K computer, Piz Daint, Trinity, Cori, Oakforest-PACS, Gyoukou.

Disruptive emergence of accelerators in HPC. Large memory bandwidth has sped up our models. On NVIDIA GPUs:
• dynamical cores: 2-3x
• physical parametrizations: 3-7x
• spectral transforms: > 10x


How to achieve portability for our GFD models?


COSMO global 1 km: discussion paper under review, https://www.geosci-model-dev-discuss.net/gmd-2017-230/


A domain-specific language (DSL) for GFD:
• abstracts away all unnecessary details of a programming-language (Fortran) implementation
• concise syntax: only those keywords needed to express the numerical problem

The GridTools ecosystem:
• a set of C++ APIs / libraries
• DSL for PDEs
• covers a large class of problems
• performance portability
• separation of concerns
• interface to Fortran
• open-source license


DSL for weather codes on unstructured grids

DSLs are a successful approach to deal with the software "productivity gap". ESCAPE develops a DSL based on GridTools: library tools for solving PDEs on multiple grids. The DSL syntax requires only the grid-point operation, e.g. for the Laplacian:

    lap(i,j) = -4*u + u(i+1) + u(i-1) + u(j+1) + u(j-1)

• Tiled loop templates and parallelization are abstracted.
• Use of fast on-chip memory is also abstracted.
• The grid is abstracted (in combination with Atlas).
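For contrast, here is a minimal sketch of the kind of code this one-line grid-point operation replaces: a hand-written, hand-tiled loop nest (field layout, sizes, and tile factor are illustrative assumptions, not taken from the talk):

    #include <algorithm>

    // Hand-written 5-point Laplacian with explicit cache tiling: the loop
    // structure the DSL user never writes. Interior points only; no halo
    // exchange or boundary handling shown.
    void laplacian(const double* u, double* lap, int ni, int nj) {
        const int TILE = 32; // illustrative tile size
        auto idx = [nj](int i, int j) { return i * nj + j; };
        for (int ib = 1; ib < ni - 1; ib += TILE)
            for (int jb = 1; jb < nj - 1; jb += TILE)
                for (int i = ib; i < std::min(ib + TILE, ni - 1); ++i)
                    for (int j = jb; j < std::min(jb + TILE, nj - 1); ++j)
                        lap[idx(i, j)] = -4.0 * u[idx(i, j)]
                                       + u[idx(i + 1, j)] + u[idx(i - 1, j)]
                                       + u[idx(i, j + 1)] + u[idx(i, j - 1)];
    }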

Carlos Osuna, MPI Seminar, 21 November 2017 12

Writing Operators using DSL
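The operator code on this slide did not survive extraction. As a stand-in, here is a minimal sketch of a GridTools-style stencil functor for the Laplacian above; the spelling follows the modern GridTools 2.x API, which postdates this 2017 talk, so take the exact header and type names as assumptions:

    #include <gridtools/stencil/cartesian.hpp>
    using namespace gridtools::stencil::cartesian;

    // The user declares accessors with their read extents and writes the
    // grid-point operation once; loops, tiling, and parallelization are
    // generated by the library for the chosen backend.
    struct lap_functor {
        using out = inout_accessor<0>;
        using in = in_accessor<1, extent<-1, 1, -1, 1>>;
        using param_list = make_param_list<out, in>;

        template <class Eval>
        GT_FUNCTION static void apply(Eval&& eval) {
            eval(out()) = -4.0 * eval(in()) + eval(in(1, 0)) + eval(in(-1, 0))
                        + eval(in(0, 1)) + eval(in(0, -1));
        }
    };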


Global grids
• New DSL constructs for stencils on global grids
• Expose an unstructured-grid API
• Leverage grid structure for performance

Icosahedral | Cubed sphere | Octahedral


Grid abstraction

Icosahedral | Octahedral | Dual grid | Cubed sphere

    constexpr auto offsets = connectivity<edges, vertices, Color>::offsets();
    for (auto off : offsets) {
        eval(flux) += eval(pD(off));
    }

or, equivalently, as a single reduction:

    eval(flux) = eval(sum_on_vertices(0.0, pD));

• The DSL syntax elements are grid independent.
• The same DSL code can be compiled for multiple grids.
• GridTools will support efficient native layouts for octahedral/icosahedral grids.
• GridTools will use Atlas in order to support any grid.


Example of GridTools DSL: MPDATA


Full MPDATA implemented using the GridTools DSL: upwind fluxes, min/max limiter, compute fluxes, limit fluxes, flux solution, rho correction. For example, the upwind-flux computation:

    m_upwind_fluxes = make_computation<gpu>(domain_uwf, grid_,
        make_multistage(execute<forward>(),
            make_stage<upwind_flux, octahedral_topology_t, edges>(
                p_flux(), p_pD(), p_vn()),
            make_stage<upwind_fluz, octahedral_topology_t, vertices>(
                p_fluz(), p_pD(), p_wn()),
            make_stage<fluxzdiv, octahedral_topology_t, vertices>(
                p_divVD(), p_flux(), p_fluz(), p_dual_volumes(), p_edges_sign()),
            make_stage<advance_solution, octahedral_topology_t, vertices>(
                p_pD(), p_divVD(), p_rho())));

Composition of multiple MPDATA operators
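The composed version on this slide also did not survive extraction. A plausible sketch, reusing the make_computation pattern from the previous slide for the limiter steps listed there; minmax_limiter, limit_fluxes, and the extra placeholders are assumed names, not taken from the talk:

    // Hypothetical composition of the MPDATA correction steps
    // (assumed stage and placeholder names).
    m_mpdata_correction = make_computation<gpu>(domain_corr, grid_,
        make_multistage(execute<forward>(),
            make_stage<minmax_limiter, octahedral_topology_t, vertices>(
                p_pD_min(), p_pD_max(), p_pD()),
            make_stage<limit_fluxes, octahedral_topology_t, edges>(
                p_flux(), p_pD_min(), p_pD_max()),
            make_stage<advance_solution, octahedral_topology_t, vertices>(
                p_pD(), p_divVD(), p_rho())));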


Grid abstraction: performance portability

Thanks to the abstraction of the mesh we can implement any partitioning, e.g. on an O32 mesh:
• equal-regions partitioner implemented with Atlas
• structured partitioner implemented with Atlas


Atlas: a library for NWP and climate modelling (Deconinck et al. 2017). Slide credit: Willem Deconinck.


Grid abstraction: performance portability

The abstraction of the mesh can also implement any indexing layout, and indexing has a large impact on performance. Measured bandwidth, NVIDIA P100, 128x128x80 grid:

    Layout   Bandwidth (GB/s)
    SN DA    211
    SN IA    270
    UN IA    130
    HN IA    256

Effective bandwidth: B = (sum over fields of bytes accessed) / runtime.
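A minimal sketch of that bandwidth bookkeeping, under the common stencil-benchmark assumption that each field touched is transferred once per grid point; the field count and runtime below are illustrative, not measured values from the talk:

    #include <cstdio>

    int main() {
        const long points = 128L * 128L * 80L;   // grid from the table above
        const double transfers_per_point = 3.0;  // assumed: 2 reads + 1 write
        const double bytes = points * transfers_per_point * sizeof(double);
        const double runtime_s = 1.2e-4;         // illustrative measured runtime
        std::printf("B = %.0f GB/s\n", bytes / runtime_s / 1e9);
        return 0;
    }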


GridTools and Atlas ...
• rely solely on a C++ compiler -> no external tools
• provide a high level of abstraction -> performance portability

But writing in this new programming model ...
• is hard (C++ expertise required) -> decreased productivity
• is unsafe -> no protection from violating the parallel model
• often requires expert knowledge to produce efficient code

DSLs and abstractions: state of the art


What is the future of GFD models in a complex HPC environment?
1. Need to increase model development productivity
2. Need to decrease the maintenance cost of models
3. Need to extend tools to a large set of models, methods, and domains
4. Need to maximize efficiency


Need to increase model development productivity: high-level DSLs with a concise, safe, and intuitive syntax that remove boilerplate from embedded GridTools.


Need to decrease the maintenance cost of models:
• check for parallel errors
• auto-generation of boundary conditions and halo exchanges
• check for out-of-bounds accesses
• safe generation of loop data dependencies


Need to extend tools to a large set of models, methods, and domains.

GFD models: ICON-atmos, IFS, COSMO, LFRic, ICON-Ocean
DSLs and programming models: GridTools, Liszt, PsyClone, HybridFortran, Firedrake, CLAW

Options:
• Unify all tools in a single DSL software stack -> maintenance issue.
• One DSL (programming model) per model -> currently leading to an explosion of DSL solutions.
• ESCAPE2


In the ESCAPE2 DNA3:
• Develop programming languages for GFD models that can increase productivity and lower maintenance
• Standardization: avoid an explosion of tools
• Support multiple domains (FV weather, FV ocean, structured, unstructured, FE) in the standardization
• Multiple domain languages or customizations, a single tool
• Language to be defined together with scientists


ESCAPE2 base technology: DAWN and GTClang, a modular compiler toolchain (based on modern compiler technology: LLVM).


ESCAPE2 base technology: DAWN supports both structured and unstructured grids through plugins (ICON-Ocean plugin, IFS collocated FE plugin), i.e. multiple variants of a language:

    a[i+1]           (structured: Cartesian offsets)
    on_cells(+, a)   (unstructured: neighbor reduction)
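A sketch of how the same operator might be written in the two variants, using the GTClang-style syntax shown on the next slides; the unstructured spelling is an assumption extrapolated from the on_cells(+, a) fragment above:

    // Structured variant: Cartesian offsets.
    stencil diff_structured {
      storage a, b;
      Do {
        vertical_region(k_start, k_end) { b = a[i+1] - a[i-1]; }
      }
    };

    // Unstructured variant: reduction over cell neighbors (assumed spelling).
    stencil diff_unstructured {
      storage a, b;
      Do {
        vertical_region(k_start, k_end) { b = on_cells(+, a); }
      }
    };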

Slides: Fabian Thüring

Example – Horizontal Diffusion


    stencil_function laplacian {
      storage phi;
      Do {
        return 4.0 * phi - phi[i+1] - phi[i-1] - phi[j+1] - phi[j-1];
      }
    };

    stencil hd_type2 {
      storage hd, in;
      temporary_storage lap;
      Do {
        vertical_region(k_start, k_end) {
          lap = laplacian(in);
          hd = laplacian(lap);
        }
      }
    };

Example – Horizontal Diffusion


    stencil hd_type2 {
      storage u, utens;
      temporary_storage uavg;
      Do {
        vertical_region(k_start, k_end) {
          uavg = (u[i+1] + u[i-1]) * 0.5;
          utens = uavg[k+1];   // reads a level of the temporary not yet
                               // computed in this vertical sweep: an unsafe
                               // access the toolchain can detect
        }
      }
    };

Safety First


    stencil hd_type2 {
      storage u, utens;
      temporary_storage uavg;
      Do {
        vertical_region(k_start, k_end) {
          uavg = (u[i+1] + u[i-1]) * 0.5;
          utens = uavg[j+1] + uavg[j-1];   // horizontal offsets on a temporary
                                           // written in the same Do: a data
                                           // dependency the toolchain must
                                           // resolve safely
        }
      }
    };

Safety First


    stencil_function laplacian {
      storage phi;
      Do {
        return 4.0 * phi - phi[i+1] - phi[i-1] - phi[j+1] - phi[j-1];
      }
    };

    stencil hd_type2 {
      storage hd, in;
      temporary_storage lap;
      Do {
        vertical_region(k_start, k_end) {
          lap = laplacian(in);
          hd = laplacian(lap);
        }
      }
    };

    // ... mixed with standard C++ code:
    #pragma omp parallel for nowait
    #pragma acc ...
    for (int i = 0; i < isize; ++i) {
      for (int j = 0; j < jsize; ++j) {
        for (int k = 0; k < ksize; ++k) {
          udiff(i, j, k) = hd(i + 1, j, k) + hd(i - 1, j, k);
        }
      }
    }

Interoperability: escaping the language. DSL code can be mixed with standard C++ code (+OpenMP, +OpenACC, +CUDA), as in the example above.


Standardization: SIR (Stencil Intermediate Representation)

    {
      "filename": "/code/access_test.cpp",
      "stencils": [{
        "name": "hori_diff",
        "loc": { "line": 24, "column": 8 },
        "ast": {
          "block_stmt": {
            "statements": {
              "expr_stmt": {
                "assignment_expr": {
                  "left": { "field_access_expr": { "name": "b", "offset": [] } },
                  "right": {
                    "binary_op": {
                      "left": { "field_access_expr": { "name": "a", "offset": [1, 0, 0] } },
                      "right": { "field_access_expr": { "name": "a", "offset": [-1, 0, 0] } }
                    }
                  }
                }
              }
            }
          }
        }
      }]
    }

The SIR above encodes: b = a(i+1) + a(i-1).

Multiple thin DSL frontends can generate the standard SIR: GTClang (C++), GT4Py (Python), CLAW (Fortran), PsyClone (Fortran), ESCAPE2 / Ocean.
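A minimal sketch of what such a thin frontend does: build the SIR fragment above as JSON and serialize it. This uses the nlohmann/json library purely for illustration; the real toolchain defines its own SIR schema and serialization, and the helper below is hypothetical:

    #include <iostream>
    #include <string>
    #include <vector>
    #include <nlohmann/json.hpp>

    using json = nlohmann::json;

    // Hypothetical helper: a SIR-style field access node, following the
    // fragment shown above.
    static json field_access(const std::string& name, std::vector<int> offset) {
        return {{"field_access_expr", {{"name", name}, {"offset", offset}}}};
    }

    int main() {
        // b = a(i+1) + a(i-1), encoded as an assignment over a binary op.
        // The "op" field is assumed; the fragment above omits it.
        json stmt = {{"assignment_expr",
            {{"left", field_access("b", {})},
             {"right", {{"binary_op",
                 {{"op", "+"},
                  {"left", field_access("a", {1, 0, 0})},
                  {"right", field_access("a", {-1, 0, 0})}}}}}}}};
        std::cout << stmt.dump(2) << "\n";
        return 0;
    }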


DAWN: Compiler optimization passes


COSMO dycore evaluation of the DSL toolchain

[Performance plots, NVIDIA P100: horizontal diffusion (Type 2 u, v, w, pp; Smagorinsky) and advection (metric and time terms)]


Conclusions
• The future of GFD models (high resolution, multi-decade simulations) will bring serious computational challenges: efficiency and production cost, and maintenance cost.
• We need to find the right programming models so that adaptation to new architectures does not hinder scientific development -> high productivity.
• Single efforts are no longer a valid model; we need collaborations.
• ESCAPE2 is a great opportunity to define the new language together with scientists; we have the technology in place.


BACKUPS


Internal presentation, Valentin Clément

Federal Department of Home Affairs FDHA
Federal Office of Meteorology and Climatology MeteoSwiss

MeteoSvizzera, Via ai Monti 146, CH-6605 Locarno-Monti, T +41 58 460 92 22, www.meteosvizzera.ch
MétéoSuisse, 7bis, av. de la Paix, CH-1211 Genève 2, T +41 58 460 98 88, www.meteosuisse.ch
MétéoSuisse, Chemin de l'Aérologie, CH-1530 Payerne, T +41 58 460 94 44, www.meteosuisse.ch
MeteoSwiss, Operation Center 1, CH-8058 Zurich-Airport, T +41 58 460 91 11, www.meteoswiss.ch