
Porting Telemac–Mascaret to OpenPower and experimenting GPU offloading to accelerate the Tomawac module

TUC 2019, 16-17th October, CERFACS, Toulouse, France

Judicael Grasset(1), Stephen Longshaw(1), Charles Moulinec(1), David R. Emerson(1)

Yoann Audouin(2), Pablo Tassi(2)

October 17, 2019

(1) STFC, Daresbury Laboratory, Warrington, United Kingdom

(2) EDF R&D, Chatou, France

Computing used

OpenPower architecture in a nutshell:

• IBM POWER processors

• NVIDIA GPUs

• NVIDIA NVLink

The machine used for this work is Paragon. Each of its nodes consists of:

• 2 IBM POWER8 processors, with 8 cores each

• Each core has simultaneous multithreading (SMT) capability

• In this case the cores are able to run either 1 thread (SMT1), 2 threads (SMT2), 4 threads (SMT4) or 8 threads (SMT8) at the same time

• 4 NVIDIA P100 GPUs

• NVIDIA NVLink for GPU–GPU and GPU–CPU interconnections

Porting to OpenPower

• Why? Summit and Sierra, the two most powerful clusters in the world, are based on the OpenPower architecture (Top500, June 2019)

• Porting to a different architecture might reveal bugs in the code (increased robustness)


Porting to OpenPower

Status of the port:

Version              PGI 18.10           GCC 9.1     XL 16.1.1.1
v8p0r2               compiles            compiles    does not compile*
trunk (Oct. 2019)    does not compile*   compiles    does not compile*

*Problem known and solved; it compiles when a small patch is applied.

All tests were done with the Spectrum MPI library.


Experimenting with GPUs

Or trying to port Telemac to the architecture of the future (present)


The test case

Test case used: tomawac/fetch_limited/tom_test6.cas

• This is a limited test with a small mesh: 75k elements, 32k points.

• It spends all of its time in a single Fortran subroutine: qnlin3.f

• This subroutine was reported to be a bottleneck by some users during the 2018 TELEMAC User Conference.


qnlin3.f

In a nutshell:

• do loop
  • init some variables
  • do loop
    • init some variables
    • do loop
      • init some variables
      • do loop
        • tmp_array(x,y,z) = tmp_array(x,y,z) + k
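In other words, four nested loops whose innermost statement accumulates into a shared temporary array. A minimal Fortran sketch of that structure (the array name, loop bounds, index expressions and the increment are invented placeholders, not the actual qnlin3.f code):

program qnlin3_structure_sketch
  ! Hedged sketch of the loop-nest shape described above; all names and sizes are assumptions
  implicit none
  integer, parameter :: n1 = 4, n2 = 4, n3 = 4, n4 = 4
  real :: tmp_array(n1, n2, n3)
  integer :: i1, i2, i3, i4, x, y, z
  real :: k

  tmp_array = 0.0
  do i1 = 1, n1
     ! init some variables (placeholder)
     do i2 = 1, n2
        ! init some variables (placeholder)
        do i3 = 1, n3
           ! init some variables (placeholder)
           do i4 = 1, n4
              ! x, y, z and k would be computed from i1..i4; different i4
              ! iterations can update the same entry of tmp_array
              x = i1; y = i2; z = i3
              k = 1.0
              tmp_array(x, y, z) = tmp_array(x, y, z) + k
           end do
        end do
     end do
  end do
end program qnlin3_structure_sketch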


Porting to GPUs, methods

Different solutions exist:

• Pragma-based: OpenMP, OpenACC

• Library-based: Magma, cuBLAS...

• Language extensions: CUDA, OpenCL


MPI+OpenACC (PGI compiler) on GPU

Move the data to the GPU and execute the loop nest on it:

• !$acc data copy(array)

• !$acc parallel loop collapse(4)

• do loop
  • do loop
    • do loop
      • do loop
        • !$acc atomic
        • array(x,y,z) = array(x,y,z) + k

• ...

• !$acc end data

Elsewhere, during the initialisation of the code, each MPI task has been linked to a specific GPU (see the sketch below).
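A self-contained sketch of how this MPI+OpenACC pattern could look in Fortran (the array name, loop bounds and the rank-to-GPU binding are illustrative assumptions, not the actual Telemac code; OpenACC device numbering may differ between implementations):

program mpi_openacc_sketch
  use mpi
  use openacc
  implicit none
  integer, parameter :: n = 8
  real :: tmp_array(n, n, n)
  integer :: i1, i2, i3, i4, ierr, rank, ngpus

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Done once at initialisation: bind each MPI task to one of the node's GPUs
  ngpus = acc_get_num_devices(acc_device_nvidia)
  if (ngpus > 0) call acc_set_device_num(mod(rank, ngpus), acc_device_nvidia)

  tmp_array = 0.0

  ! Move the array to the GPU, run the collapsed loop nest there,
  ! and protect the concurrent updates with an atomic
  !$acc data copy(tmp_array)
  !$acc parallel loop collapse(4)
  do i1 = 1, n
     do i2 = 1, n
        do i3 = 1, n
           do i4 = 1, n
              !$acc atomic
              tmp_array(i1, i2, i3) = tmp_array(i1, i2, i3) + 1.0
           end do
        end do
     end do
  end do
  !$acc end data

  call MPI_Finalize(ierr)
end program mpi_openacc_sketch

Building such a sketch would require the PGI compiler with OpenACC enabled (e.g. -acc -ta=tesla).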


MPI+OpenACC (PGI compiler) on GPU

[Results figure]

MPI+OpenMP (IBM compiler) on GPU

Move the data to the GPU and execute the loop nest on it:

• !$omp target data map(array)

• !$omp target teams distribute parallel do collapse(4)

• do loop
  • do loop
    • do loop
      • do loop
        • !$omp atomic
        • array(x,y,z) = array(x,y,z) + k

• ...

• !$omp end target data

Elsewhere, during the initialisation of the code, each MPI task has been linked to a specific GPU (see the sketch below).
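An equivalent sketch with OpenMP target offloading (again, the names and the rank-to-device binding are illustrative assumptions, not the actual Telemac code):

program mpi_openmp_target_sketch
  use mpi
  use omp_lib
  implicit none
  integer, parameter :: n = 8
  real :: tmp_array(n, n, n)
  integer :: i1, i2, i3, i4, ierr, rank, ngpus

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Done once at initialisation: bind each MPI task to one of the node's GPUs
  ngpus = omp_get_num_devices()
  if (ngpus > 0) call omp_set_default_device(mod(rank, ngpus))

  tmp_array = 0.0

  ! Map the array to the GPU, distribute the collapsed loop nest over
  ! teams and threads, and protect the concurrent updates with an atomic
  !$omp target data map(tofrom: tmp_array)
  !$omp target teams distribute parallel do collapse(4)
  do i1 = 1, n
     do i2 = 1, n
        do i3 = 1, n
           do i4 = 1, n
              !$omp atomic update
              tmp_array(i1, i2, i3) = tmp_array(i1, i2, i3) + 1.0
           end do
        end do
     end do
  end do
  !$omp end target data

  call MPI_Finalize(ierr)
end program mpi_openmp_target_sketch

With the IBM XL compiler this would be built with OpenMP offloading enabled (e.g. -qsmp=omp -qoffload).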


MPI+OpenMP (IBM compiler) on GPU

[Results figure]

Somme test case

• Somme, 7-day simulation

• Telemac2d-Tomawac-Sisyphe coupling

[Pie chart: breakdown of runtime by subroutine, with shares between 6% and 21.1%, over semimp, qwind1, propa, fremoy, schar41_per_4d, log, qnlin1, bief_interp and other subroutines.]


Inclusion in the codebase

• OpenACC and OpenMP directives are redundant (two sets of directives for the same loops)

• This could be solved with pragmas in this case (see the sketch after this list)

• But that might not always be possible

• Usage of the optional directory
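Because both OpenACC and OpenMP directives are plain Fortran comments unless the corresponding compiler option is given, one way the redundancy could be handled is to keep both sets of sentinels on the same loop nest and let the enabled backend pick up its own directives. A hedged sketch under that assumption (not necessarily the form adopted in the Telemac sources):

program dual_directive_sketch
  ! Whichever backend is enabled at compile time interprets its own sentinel;
  ! the other directive line is ignored as an ordinary Fortran comment
  implicit none
  integer, parameter :: n = 8
  real :: a(n, n, n)
  integer :: i1, i2, i3, i4

  a = 0.0
  !$omp target teams distribute parallel do collapse(4) map(tofrom: a)
  !$acc parallel loop collapse(4) copy(a)
  do i1 = 1, n
     do i2 = 1, n
        do i3 = 1, n
           do i4 = 1, n
              !$omp atomic update
              !$acc atomic
              a(i1, i2, i3) = a(i1, i2, i3) + 1.0
           end do
        end do
     end do
  end do
  print *, 'sum =', sum(a)
end program dual_directive_sketch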


Conclusion

Results achieved:

• Telemac-Mascaret ported to OpenPower

• The port revealed bugs in Telemac-Mascaret and in some compilers

• Good improvement when using the GPU for the qnlin3 subroutine

• Work is still ongoing, but will be more difficult for real-world test cases


Acknowledgements

• This work is supported by the Hartree Centre through the Innovation Return on Research (IROR) programme.


Thank you for your attention

If you think the code is too slow, or uses too much memory for you (partel, Telemac, Tomawac...), please contact us.

Contact:

judicael.grasset@stfc.ac.uk charles.moulinec@stfc.ac.uk
