The Performance Optimisation and Productivity Centre of ... › market ›...

1
The Performance Optimisation and Productivity Centre of Excellence Sally Bridgwater Nick Dingle Jonathan Boyle (Numerical Algorithms Group) Introduction NAG are delighted to be part of the EU group called POP (Perfor- mance Optimisation and Productivity) that is helping to improve the performance of software. In brief POP offers to analyse software and recommend improvements with a focus on HPC and parallelization. This service is free of charge to EU organisations. Methodology Efficiency metrics give an overview of how well the parallelization of the applications works and how efficient the hardware is used. Each efficiency metric investigates one source of common inefficiency in a parallel program. The metrics are organized in a hierarchy which allows you to drill down from the top level Global Efficiency to specific inefficiencies in a pro- gram, they give a detailed overview of the parallel performance of an application in a very condensed form. Global Efficiency - Overall performance – Parallel Efficiency - Efficiency of parallelization strategy * Load Balance Efficiency - Distribution of work * Communication Efficiency · Serialization Efficiency - Dependencies between processes · Transfer Efficiency - Effect of data transfer – Computational Efficiency - Scaling of computational load * IPC Scaling - Implicates resource contention * Instruction Scaling - Increase in computational work Figure 1: Timeline visualization of four processes executing a simple program with a major load imbalance between processes causing long waiting times in MPI. Figure 2: Three processes performing dependent computations which leads to serial- ization in the program execution causing long waiting times in MPI but perfect load balance. Tools Integral to the POP project is the use of performance analysis tools, some developed by members of the POP consortium. Barcelona tools Include Paraver, a trace-based performance analyser with great flex- ibility to explore and extract information. Including timelines that graphically display the evolution of the application, and tables that provide statistical information. Figure 3: Paraver timeline ulich tools Include Scalasca which characterises parallel execution inefficiencies and detects the best candidates for optimisation. Figure 4: Cube view of Scalasca data from MPI test program Intel ® tools Include VTune which is a powerful tool for profiling code written in a variety of languages including C/C++, Fortran and Python. It can be used on parallel code that uses paradigms such as OpenMP, MPI and Intel ® Thread Building Blocks (TBB), as well as on serial code. Figure 5: VTune analysis of Cheby code. Breadth of Work Organisations from a wide range of application areas have used the services: • Computer aided engineering (CAE) Finite element analysis (FEA) Computational fluid dynamics (CFD) • Earth sciences • Neural networks • Materials & Chemistry modelling • Electronic structure calculations • Health Along with many languages and parallelization approaches: Acknowledgements The other POP project partners: Barcelona Supercomputing Center, Juelich Supercomputing Center, RWTH Aachen, HLRS Stuttgart and Teratec. This project has received funding from the European Union‘s Horizon 2020 research and innovation programme under grant agreement No 676553. October 2017 Correspondence address: [email protected] www.nag.com

Transcript of The Performance Optimisation and Productivity Centre of ... › market ›...

Page 1: The Performance Optimisation and Productivity Centre of ... › market › pop-centre-excellence.pdf · The Performance Optimisation and Productivity Centre of Excellence Sally Bridgwater

The Performance Optimisation and ProductivityCentre of Excellence

Sally Bridgwater Nick Dingle Jonathan Boyle (Numerical Algorithms Group)

IntroductionNAG are delighted to be part of the EU group called POP (Perfor-mance Optimisation and Productivity) that is helping to improve theperformance of software. In brief POP offers to analyse software andrecommend improvements with a focus on HPC and parallelization.This service is free of charge to EU organisations.

MethodologyEfficiency metrics give an overview of how well the parallelization ofthe applications works and how efficient the hardware is used. Eachefficiency metric investigates one source of common inefficiency in aparallel program.

The metrics are organized in a hierarchy which allows you to drill downfrom the top level Global Efficiency to specific inefficiencies in a pro-gram, they give a detailed overview of the parallel performance of anapplication in a very condensed form.• Global Efficiency - Overall performance

– Parallel Efficiency - Efficiency of parallelization strategy∗ Load Balance Efficiency - Distribution of work∗ Communication Efficiency

· Serialization Efficiency - Dependencies between processes· Transfer Efficiency - Effect of data transfer

– Computational Efficiency - Scaling of computational load∗ IPC Scaling - Implicates resource contention∗ Instruction Scaling - Increase in computational work

Figure 1: Timeline visualization of four processes executing a simple program with amajor load imbalance between processes causing long waiting times in MPI.

Figure 2: Three processes performing dependent computations which leads to serial-ization in the program execution causing long waiting times in MPI but perfect loadbalance.

ToolsIntegral to the POP project is the use of performance analysis tools,some developed by members of the POP consortium.

Barcelona toolsInclude Paraver, a trace-based performance analyser with great flex-ibility to explore and extract information. Including timelines thatgraphically display the evolution of the application, and tables thatprovide statistical information.

Figure 3: Paraver timeline

Julich toolsInclude Scalasca which characterises parallel execution inefficienciesand detects the best candidates for optimisation.

Figure 4: Cube view of Scalasca data from MPI test program

Intel® toolsInclude VTune which is a powerful tool for profiling code written in avariety of languages including C/C++, Fortran and Python. It can beused on parallel code that uses paradigms such as OpenMP, MPI andIntel® Thread Building Blocks (TBB), as well as on serial code.

Figure 5: VTune analysis of Cheby code.

Breadth of WorkOrganisations from a wide range of application areas have used theservices:• Computer aided engineering (CAE)

– Finite element analysis (FEA)– Computational fluid dynamics (CFD)

• Earth sciences• Neural networks• Materials & Chemistry modelling• Electronic structure calculations• Health

Along with many languages and parallelization approaches:

AcknowledgementsThe other POP project partners: Barcelona Supercomputing Center,Juelich Supercomputing Center, RWTH Aachen, HLRS Stuttgart andTeratec.

This project has received funding from the European Union‘s Horizon2020 research and innovation programme under grant agreement No676553.

October 2017 Correspondence address: [email protected] www.nag.com