Towards Scalable Cross-Platform Application Performance Analysis -- Tool Goals and Progress


October 18, 2001 LACSI Symposium, Santa Fe, NM

Towards Scalable Cross-Platform Application Performance Analysis -- Tool Goals and Progress

Shirley Moore
shirley@cs.utk.edu


Scalability Issues

• Code instrumentation
– Hand instrumentation too tedious for large codes
• Runtime control of data collection
• Batch queueing systems
– Cause problems for interactive tools
• Tracefile size and complexity
• Data analysis


Cross-platform Issues

• Goal: similar user interfaces across different platforms

• Tools necessarily rely on platform-dependent substrates – e.g., for accessing hardware counters.

• Standardization of interfaces and data formats promotes interoperability and allows design of portable tools.


Where is Standardization Needed?

• Performance data
– Trace records vs. summary statistics
– Data format
– Data semantics
• Library interfaces
– Access to hardware counters
– Statistical profiling
– Dynamic instrumentation


Standardization? (cont.)

• User interfaces
– Common set of commands
– Common functionality
• Timing routines
• Memory utilization information


Parallel Tools Consortium

• http://www.ptools.org/
• Interaction between vendors, researchers, and users
• Venue for standardization
• Current projects
– PAPI
– DPCL


Hardware Counters

• Small set of registers that count events, which are occurrences of specific signals related to the processor’s function

• Monitoring these events facilitates correlation between the structure of the source/object code and the efficiency of the mapping of that code to the underlying architecture.


Goals of PAPI

• Solid foundation for cross-platform performance analysis tools

• Free tool developers from re-implementing counter access

• Standardization between vendors, academics and users

• Encourage vendors to provide hardware and OS support for counter access

• Reference implementations for a number of HPC architectures

• Well documented and easy to use


PAPI Implementation

[Architecture diagram: tools sit on the portable layer -- the PAPI High Level and PAPI Low Level interfaces -- which rests on the machine-specific layer: the PAPI Machine Dependent Substrate, backed by a kernel extension, the operating system, and the hardware performance counters.]


PAPI Preset Events

• Proposed standard set of events deemed most relevant for application performance tuning
• Defined in papiStdEventDefs.h
• Mapped to native events on a given platform
– Run tests/avail to see the list of PAPI preset events available on a platform


Statistical Profiling

• PAPI provides support for execution profiling based on any counter event.

• PAPI_profil() creates a histogram by text address of overflow counts for a specified region of the application code.

• Used in the vprof tool from Sandia National Laboratories


PAPI Reference Implementations

• Linux/x86, Windows 2000
– Requires patch to Linux kernel, driver for Windows
• Linux/IA-64
• Sun Solaris 2.8/Ultra I/II
• IBM AIX 4.3+/Power
– Contact IBM for pmtoolkit
• SGI IRIX/MIPS
• Compaq Tru64/Alpha Ev6 & Ev67
– Requires OS device driver patch from Compaq
– Per-thread and per-process counts not possible
– Extremely limited number of events
• Cray T3E/Unicos


PAPI Future Work

• Improve accuracy of hardware counter and statistical profiling data
– Microbenchmarks to measure accuracy (Pat Teller, UTEP)
– Use hardware support for overflow interrupts
– Use Event Address Registers (EARs) where available
• Data structure based performance counters (collaboration with UMd)
– Qualify event counting by address range
– Page-level counters in cache coherence hardware


PAPI Future (cont.)

• Memory utilization extensions (following list suggested by Jack Horner, LANL)
– Memory available on a node
– Total memory available/used
– High-water-mark memory used by process/thread
– Disk swapping by process
– Process-memory locality
– Location of memory used by an object
• Dynamic instrumentation – e.g., PAPI probe modules


For More Information

• http://icl.cs.utk.edu/projects/papi/
– Software and documentation
– Reference materials
– Papers and presentations
– Third-party tools
– Mailing lists


DPCL

• Dynamic Probe Class Library
• Built on top of the IBM version of the University of Maryland’s dyninst
• Current platforms
– IBM AIX
– Linux/x86 (limited functionality)
• Dyninst has been ported to more platforms but by itself lacks functionality for easily instrumenting parallel applications.


Infrastructure Components?

• Parsers for common languages
• Access to hardware counter data
• Communication behavior instrumentation and analysis
• Dynamic instrumentation capability
• Runtime control of data collection and analysis
• Performance data management


Case Studies

• Test tools on large-scale applications in production environment

• Reveal limitations of tools and point out areas where improvements are needed

• Develop performance tuning methodologies for large-scale codes


PERC: Performance Evaluation Research Center

• Developing a science for understanding performance of scientific applications on high-end computer systems
• Developing engineering strategies for improving performance on these systems
• DOE Labs: ANL, LBNL, LLNL, ORNL
• Universities: UCSD, UIUC, UMD, UTK
• Funded by SciDAC: Scientific Discovery through Advanced Computing


PERC: Real-World Applications

• High Energy and Nuclear Physics
– Shedding New Light on Exploding Stars: Terascale Simulations of Neutrino-Driven SuperNovae and Their NucleoSynthesis
– Advanced Computing for 21st Century Accelerator Science and Technology
• Biology and Environmental Research
– Collaborative Design and Development of the Community Climate System Model for Terascale Computers
• Fusion Energy Sciences
– Numerical Computation of Wave-Plasma Interactions in Multi-dimensional Systems
• Advanced Scientific Computing
– Terascale Optimal PDE Solvers (TOPS)
– Applied Partial Differential Equations Center (APDEC)
– Scientific Data Management (SDM)
• Chemical Sciences
– Accurate Properties for Open-Shell States of Large Molecules
• …and more


Parallel Climate Transition Model

• Components for Ocean, Atmosphere, Sea Ice, Land Surface and River Transport

• Developed by Warren Washington’s group at NCAR

• POP: Parallel Ocean Program from LANL

• CCM3: Community Climate Model 3.2 from NCAR including LSM: Land Surface Model

• ICE: CICE from LANL and CCSM from NCAR

• RTM: River Transport Module from UT Austin

• Fortran 90 with MPI


PCTM: Parallel Climate Transition Model

[Diagram: a Flux Coupler connects the Land Surface, Ocean, Atmosphere, Sea Ice, and River models; the parallelized modules execute sequentially.]


PCTM Instrumentation

• Vampir tracefiles in the tens-of-gigabytes range even for a toy problem
• Hand instrumentation with PAPI tedious
• UIUC working on SvPablo instrumentation
• Must work in batch queueing environment
• Plan to try other tools
– MPE logging and Jumpshot
– TAU
– VGV?


In Progress

• Standardization and reference implementations for memory utilization information (funded by DoD HPCMP PET, Ptools-sponsored project)

• Repositories of application performance evaluation case studies (e.g., SciDAC PERC)

• Portable dynamic instrumentation for parallel applications (DOE MICS project – UTK, UMd, UWisc)

• Increased functionality and accuracy of hardware counter data collection (DoD HPCMP, DOE MICS)


Next Steps

• Additional areas for standardization?
– Scalable trace file format
– Metadata standards for performance data
– New hardware counter metrics (e.g., SMP and DMP events, data-centric counters)
– Others?


Next Steps (cont.)

• Sharing of tools and data
– Open source software
– Machine and software profiles
– Runtime performance data
– Benchmark results
– Application examples and case studies
• Long-term goal: common performance tool infrastructure across HPC systems