Performance Technology for Component Software


Page 1: Performance Technology for Component Software

Allen D. Malony, Sameer Shende {malony,shende}@cs.uoregon.edu

Department of Computer and Information Science

Computational Science Institute

University of Oregon

Performance Technology for Component Software

Page 2: Performance Technology for Component Software

Oct 16, 2002 LACSI 2002

Outline

  Complexity and performance technology
  TAU performance system
  Developing performance interfaces for CCA
  Performance modeling and prediction issues
  Applications
    Uintah [U. Utah], VTF [Caltech], SAMRAI [LLNL]
  Concluding remarks

Page 3: Performance Technology for Component Software

Focus on Component Technology and CCA

  Emerging component technology for HPC and Grid
    Component: software object embedding functionality
    Component architecture (CA): how components connect
    Component framework: implements a CA
  Common Component Architecture (CCA)
    Standard foundation for scientific component architecture
    Component descriptions
      Scientific Interface Description Language (SIDL)
    CCA ports for component interactions
    CCA framework services (CCAFEINE)
      directory, registry, connection, event

Page 4: Performance Technology for Component Software

Problem Statement

How do we create robust and ubiquitous performance technology for the analysis and tuning of component software in the presence of (evolving) complexity challenges?

How do we apply performance technology effectively for the variety and diversity of performance problems that arise in the context of CCA components?

Page 5: Performance Technology for Component Software

TAU Performance System Framework

  Tuning and Analysis Utilities
  Performance system framework for scalable parallel and distributed high-performance computing
  Targets a general complex system computation model
    nodes / contexts / threads
    Multi-level: system / software / parallelism
    Measurement and analysis abstraction
  Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
    Portable, configurable performance profiling/tracing facility
    Open software approach
  University of Oregon, LANL, FZJ Germany
  http://www.cs.uoregon.edu/research/paracomp/tau

Page 6: Performance Technology for Component Software

TAU Performance System Architecture

[Figure: TAU performance system architecture diagram; surviving labels: EPILOG, Paraver]

Page 7: Performance Technology for Component Software

Extended Component Design

  PKC: Performance Knowledge Component
  POC: Performance Observability Component

[Figure: extended component design; surviving label: generic component]

Page 8: Performance Technology for Component Software

Performance Observation

  Ability to observe execution performance is important
    Empirically-derived performance knowledge
      Does not require measurement integration in component
    Monitor during execution to make dynamic decisions
      Measurement integration is key
  Performance observation integration
    Component integration: core and variant
    Runtime measurement and data collection
    On-line and off-line performance analysis

Page 9: Performance Technology for Component Software

Performance Observation Component (POC)

  Performance observation in a performance-engineered component model
  Functional extension of original component design
    Include new component methods and ports for other components to access measured performance data
    Allow original component to access performance data
      Encapsulate as tightly coupled and co-resident performance observation object
      POC "provides" ports allow use of optimized interfaces to access "internal" performance observations

Page 10: Performance Technology for Component Software

Design of Performance Observation Component

  One performance component per context
  Performance component provides a Measurement Port
  Measurement Port allows a user to create and access:
    Timer (start/stop, set name/type/group)
    Event (trigger)
    Control (enable/disable groups)
    Query (get functions, metrics, counters, dump to disk)

[Figure: Performance Component with a Measurement Port; labels: Timer, Event, Control, Query]

Page 11: Performance Technology for Component Software

Measurement Port in CCAFEINE

namespace performance {
namespace ccaports {
  class Measurement : public virtual classic::gov::cca::Port {
  public:
    virtual ~Measurement() {}

    /* Create a Timer */
    virtual performance::Timer* createTimer(void) = 0;
    virtual performance::Timer* createTimer(string name) = 0;
    virtual performance::Timer* createTimer(string name, string type) = 0;
    virtual performance::Timer* createTimer(string name, string type,
                                            string group) = 0;

    /* Create a Query interface */
    virtual performance::Query* createQuery(void) = 0;

    /* Create a User Defined Event interface */
    virtual performance::Event* createEvent(void) = 0;
    virtual performance::Event* createEvent(string name) = 0;

    /**
     * Create a Control interface for selectively enabling and disabling
     * the instrumentation based on groups
     */
    virtual performance::Control* createControl(void) = 0;
  };
}  // namespace ccaports
}  // namespace performance

Page 12: Performance Technology for Component Software

Timer Class Interface

namespace performance {
  class Timer {
  public:
    virtual ~Timer() {}

    /* Start the Timer. Implement these methods in
     * a derived class to provide required functionality. */
    virtual void start(void) = 0;

    /* Stop the Timer. */
    virtual void stop(void) = 0;

    virtual void setName(string name) = 0;
    virtual string getName(void) = 0;

    virtual void setType(string name) = 0;
    virtual string getType(void) = 0;

    /** Set the group name associated with the Timer
     * (e.g., all MPI calls can be grouped into an "MPI" group) */
    virtual void setGroupName(string name) = 0;
    virtual string getGroupName(void) = 0;

    virtual void setGroupId(unsigned long group) = 0;
    virtual unsigned long getGroupId(void) = 0;
  };
}
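The Timer interface leaves start/stop behavior to a derived class. As a rough illustration only, the sketch below backs a reduced copy of that interface with std::chrono and adds a scope guard so that every start() is paired with a stop(). The names ChronoTimer, ScopedTimer, and sumTo are hypothetical and are not part of TAU or the CCA port.

```cpp
#include <cassert>
#include <chrono>
#include <string>

namespace sketch {

// Reduced copy of the slide's Timer interface (start/stop/name only).
class Timer {
public:
  virtual ~Timer() {}
  virtual void start() = 0;
  virtual void stop() = 0;
  virtual void setName(std::string name) = 0;
  virtual std::string getName() = 0;
};

// Hypothetical implementation backed by std::chrono; not TAU's timer.
class ChronoTimer : public Timer {
public:
  explicit ChronoTimer(std::string name) : name_(name) {}
  void start() override {
    assert(!running_);  // nested starts are not supported in this sketch
    begin_ = std::chrono::steady_clock::now();
    running_ = true;
  }
  void stop() override {
    assert(running_);
    total_ += std::chrono::steady_clock::now() - begin_;
    running_ = false;
    ++calls_;
  }
  void setName(std::string name) override { name_ = name; }
  std::string getName() override { return name_; }
  long calls() const { return calls_; }
  double seconds() const {
    return std::chrono::duration<double>(total_).count();
  }

private:
  std::string name_;
  std::chrono::steady_clock::time_point begin_;
  std::chrono::steady_clock::duration total_{};
  bool running_ = false;
  long calls_ = 0;
};

// Starts the timer on construction and stops it when the scope ends,
// including on early return.
class ScopedTimer {
public:
  explicit ScopedTimer(Timer& t) : t_(t) { t_.start(); }
  ~ScopedTimer() { t_.stop(); }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
  Timer& t_;
};

// Example of an instrumented routine: sums 1..n while the timer runs.
double sumTo(int n, Timer& t) {
  ScopedTimer guard(t);
  double s = 0.0;
  for (int i = 1; i <= n; ++i) s += i;
  return s;
}

}  // namespace sketch
```

Because sumTo only sees the abstract Timer, a different measurement back end could be substituted without touching the instrumented routine.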

Page 13: Performance Technology for Component Software

Control Class Interface

namespace performance {
  class Control {
  public:
    virtual ~Control() {}

    /* Control instrumentation. Enable group Id. */
    virtual void enableGroupId(unsigned long id) = 0;
    /* Control instrumentation. Disable group Id. */
    virtual void disableGroupId(unsigned long id) = 0;
    /* Control instrumentation. Enable group name. */
    virtual void enableGroupName(string name) = 0;
    /* Control instrumentation. Disable group name. */
    virtual void disableGroupName(string name) = 0;
    /* Control instrumentation. Enable all groups. */
    virtual void enableAllGroups(void) = 0;
    /* Control instrumentation. Disable all groups. */
    virtual void disableAllGroups(void) = 0;
  };
}
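One way to picture the group-based control is as a default state plus a set of exceptions. The sketch below is an assumption about how such bookkeeping could work, not a description of TAU's implementation; GroupControl, isEnabled, exampleSession, and the group ids are hypothetical, and only the id-based calls are modeled.

```cpp
#include <cassert>
#include <set>

namespace sketch {

// Tracks which instrumentation groups are enabled: a default state
// (all on or all off) plus the ids that deviate from that default.
class GroupControl {
public:
  void enableGroupId(unsigned long id) { set(id, true); }
  void disableGroupId(unsigned long id) { set(id, false); }
  void enableAllGroups() { defaultOn_ = true; exceptions_.clear(); }
  void disableAllGroups() { defaultOn_ = false; exceptions_.clear(); }

  // Hypothetical query a timer could consult before measuring.
  bool isEnabled(unsigned long id) const {
    const bool isException = exceptions_.count(id) != 0;
    return defaultOn_ ? !isException : isException;
  }

private:
  void set(unsigned long id, bool on) {
    if (on == defaultOn_) exceptions_.erase(id);
    else exceptions_.insert(id);
  }
  bool defaultOn_ = true;
  std::set<unsigned long> exceptions_;
};

// Walks through a typical sequence of control calls.
bool exampleSession() {
  const unsigned long kMpi = 1, kIo = 2;  // hypothetical group ids
  GroupControl control;
  assert(control.isEnabled(kMpi));        // everything starts enabled
  control.disableGroupId(kMpi);
  assert(!control.isEnabled(kMpi) && control.isEnabled(kIo));
  control.disableAllGroups();
  control.enableGroupId(kIo);             // only the I/O group is measured
  assert(control.isEnabled(kIo) && !control.isEnabled(kMpi));
  return true;
}

}  // namespace sketch
```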

Page 14: Performance Technology for Component Software

Query Class Interface

namespace performance {
  class Query {
  public:
    virtual ~Query() {}

    /* Get the list of Timer names */
    virtual void getTimerNames(const char **& functionList, int& numFuncs) = 0;

    /* Get the list of Counter names */
    virtual void getCounterNames(const char **& counterList,
                                 int& numCounters) = 0;

    /* getTimerData. Returns lists of metrics. */
    virtual void getTimerData(const char **& inTimerList, int numTimers,
                              double **& counterExclusive,
                              double **& counterInclusive,
                              int*& numCalls, int*& numChildCalls,
                              const char **& counterNames,
                              int& numCounters) = 0;

    virtual void dumpProfileData(void) = 0;
    virtual void dumpProfileDataIncremental(void) = 0;  // timestamped dump
    virtual void dumpTimerNames(void) = 0;
    virtual void dumpTimerData(const char **& inTimerList, int numTimers) = 0;
    virtual void dumpTimerDataIncremental(const char **& inTimerList,
                                          int numTimers) = 0;
  };
}
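getTimerData returns exclusive and inclusive values together with call and child-call counts. The sketch below shows one common way such quantities can be derived from nested timer start/stop events, using explicit timestamps so the arithmetic is easy to follow. CallStackProfile, TimerStats, and the enter/exit methods are hypothetical names, recursion is not handled, and this is not a description of how TAU computes its metrics.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

namespace sketch {

struct TimerStats {
  double inclusive = 0.0;  // time between start and stop, children included
  double exclusive = 0.0;  // inclusive time minus time spent in children
  int numCalls = 0;
  int numChildCalls = 0;
};

class CallStackProfile {
public:
  // Timer `name` starts at time `now`; the enclosing timer gains a child call.
  void enter(const std::string& name, double now) {
    if (!stack_.empty()) ++stats_[stack_.back().name].numChildCalls;
    stack_.push_back(Frame{name, now, 0.0});
  }

  // The innermost running timer stops at time `now`.
  void exit(double now) {
    assert(!stack_.empty());
    const Frame f = stack_.back();
    stack_.pop_back();
    const double inclusive = now - f.start;
    TimerStats& s = stats_[f.name];
    s.inclusive += inclusive;
    s.exclusive += inclusive - f.childTime;
    ++s.numCalls;
    // Charge this call's inclusive time to the parent as child time.
    if (!stack_.empty()) stack_.back().childTime += inclusive;
  }

  const TimerStats& get(const std::string& name) const {
    return stats_.at(name);
  }

private:
  struct Frame {
    std::string name;
    double start;
    double childTime;
  };
  std::vector<Frame> stack_;
  std::map<std::string, TimerStats> stats_;
};

}  // namespace sketch
```

For example, if "main" runs from t=0 to t=10 and calls "solve" (t=1 to 4) and "io" (t=5 to 6), then "main" has inclusive time 10, exclusive time 6, one call, and two child calls.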

Page 15: Performance Technology for Component Software

Event Class Interface

namespace performance {
  class Event {
  public:
    /**
     * Destructor
     */
    virtual ~Event() {}

    /**
     * Trigger the event, recording a data value
     * (e.g., size of a message, error in an iteration, memory allocated)
     */
    virtual void trigger(double data) = 0;
  };
}
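A user-defined event receives one data value per trigger, so an implementation has to decide what to keep. The sketch below keeps running statistics (count, minimum, maximum, mean) as one plausible choice. StatEvent and its accessors are hypothetical and do not describe the TAU component's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

namespace sketch {

// Accumulates summary statistics over the values passed to trigger().
class StatEvent {
public:
  void trigger(double data) {
    ++count_;
    sum_ += data;
    min_ = std::min(min_, data);
    max_ = std::max(max_, data);
  }

  long count() const { return count_; }
  // The statistics below are only defined after at least one trigger.
  double min() const { assert(count_ > 0); return min_; }
  double max() const { assert(count_ > 0); return max_; }
  double mean() const { assert(count_ > 0); return sum_ / count_; }

private:
  long count_ = 0;
  double sum_ = 0.0;
  double min_ = std::numeric_limits<double>::infinity();
  double max_ = -std::numeric_limits<double>::infinity();
};

}  // namespace sketch
```

Triggering such an event with message sizes 8, 2, and 5 would yield a count of 3, a minimum of 2, a maximum of 8, and a mean of 5.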

Page 16: Performance Technology for Component Software

Measurement Port Implementation

  TAU component implements the MeasurementPort
    Implements Timer, Event, Query and Control classes
    Registers the port with the CCAFEINE framework
  Components target the generic MeasurementPort interface
    Runtime selection of TAU component during execution
    Instrumentation code independent of underlying tool
    Instrumentation code independent of measurement choice
  TauMeasurement_CCA port implementation uses a specific TAU measurement library

Page 17: Performance Technology for Component Software

Using MeasurementPort

#include "ports/Measurement_CCA.h"
…
double MonteCarloIntegrator::integrate(double lowBound, double upBound,
                                       int count) {
  classic::gov::cca::Port* port;
  double sum = 0.0;
  // Get Measurement port
  port = frameworkServices->getPort("MeasurementPort");
  if (port)
    measurement_m = dynamic_cast<performance::ccaports::Measurement*>(port);
  if (measurement_m == 0) {
    cerr << "Connected to something other than a Measurement port";
    return -1;
  }
  static performance::Timer* t = measurement_m->createTimer(
      string("IntegrateTimer"));
  t->start();

  for (int i = 0; i < count; i++) {
    double x = random_m->getRandomNumber();
    sum = sum + function_m->evaluate(x);
  }
  t->stop();
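The lookup-then-dynamic_cast pattern in integrate() can be tried outside a framework. In the sketch below, Services, Measurement::noteCall, CountingMeasurement, and UnrelatedPort are hypothetical stand-ins for the framework services object and the real ports. Unlike the slide, which treats a missing port as an error, this variant assumes the measurement port is optional and simply runs uninstrumented when none is connected.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

class Port {
public:
  virtual ~Port() {}
};

// Hypothetical, much-reduced measurement port: a single call instead of
// the Timer/Event/Control/Query factories shown on the slides.
class Measurement : public virtual Port {
public:
  virtual void noteCall(const std::string& timerName) = 0;
};

class CountingMeasurement : public Measurement {
public:
  void noteCall(const std::string&) override { ++calls; }
  int calls = 0;
};

class UnrelatedPort : public virtual Port {};

// Hypothetical stand-in for the framework's services object.
class Services {
public:
  void connect(const std::string& name, Port* port) { ports_[name] = port; }
  Port* getPort(const std::string& name) const {
    std::map<std::string, Port*>::const_iterator it = ports_.find(name);
    return it == ports_.end() ? nullptr : it->second;
  }

private:
  std::map<std::string, Port*> ports_;
};

// Midpoint-rule integral of f(x) = x over [0, 1], instrumented only when
// a Measurement port is connected under the expected name.
double integrate(const Services& svc, int count) {
  assert(count > 0);
  // Port is a virtual base, so the downcast has to be a dynamic_cast; it
  // yields nullptr for a missing port or a port of another type.
  Measurement* m = dynamic_cast<Measurement*>(svc.getPort("MeasurementPort"));
  if (m) m->noteCall("IntegrateTimer");
  double sum = 0.0;
  for (int i = 0; i < count; ++i) sum += (i + 0.5) / count;
  return sum / count;
}

}  // namespace sketch
```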

Page 18: Performance Technology for Component Software

Using TAU Component in CCAFEINE

repository get TauMeasurement
repository get Driver
repository get MidpointIntegrator
repository get MonteCarloIntegrator
repository get RandomGenerator
repository get LinearFunction
repository get NonlinearFunction
repository get PiFunction

create LinearFunction lin_func
create NonlinearFunction nonlin_func
create PiFunction pi_func
create MonteCarloIntegrator mc_integrator
create RandomGenerator rand

create TauMeasurement tau
connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
connect mc_integrator FunctionPort nonlin_func FunctionPort
connect mc_integrator MeasurementPort tau MeasurementPort
create Driver driver
connect driver IntegratorPort mc_integrator IntegratorPort
go driver Go
quit

Page 19: Performance Technology for Component Software

Using SIDL for Language Interoperability

//
// File: performance.sidl
//

version performance 1.0;

package performance {
  class Timer {
    void start();
    void stop();
    void setName(in string name);
    string getName();
    void setType(in string name);
    string getType();
    void setGroupName(in string name);
    string getGroupName();
    void setGroupId(in long group);
    long getGroupId();
  }
}

Page 20: Performance Technology for Component Software

Using SIDL Interface for Timers

// SIDL:
#include "performance_Timer.hh"
int main(int argc, char* argv[]) {
  performance::Timer t = performance::Timer::_create();
  ...
  t.setName("Integrate timer");
  t.start();

  // Computation
  for (int i = 0; i < count; i++) {
    double x = random_m->getRandomNumber();
    sum = sum + function_m->evaluate(x);
  }
  ...
  t.stop();

  return 0;
}

Page 21: Performance Technology for Component Software

Performance Knowledge Component

  Describe and store "known" component's performance
    Benchmark characterizations in performance database
    Empirical or analytical performance models
  Saved information about component performance
    Use for performance-guided selection and deployment
    Use for runtime adaptation
  Representation must be in common forms with standard means for accessing the performance information

Page 22: Performance Technology for Component Software

Performance Knowledge Repository & Component

  Component performance repository
    Implement in component architecture framework
    Similar to CCA component repository [Alexandria]
    Access by component infrastructure
  View performance knowledge as component (PKC)
    PKC ports give access to performance knowledge
      to other components
      back to original component
    Store performance model for performance prediction
    Component composition performance knowledge

Page 23: Performance Technology for Component Software

Component Performance Model

  User specified
  Inferred automatically by performance tool
  Prior performance data
  Expression
  Parametric model
  Estimate performance of a single component by
    Querying runtime performance data
    Passing this to performance model for evaluation
  Integration of performance observation and knowledge components key to runtime selection of components
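As a small illustration of a parametric model built from prior performance data, the sketch below fits T(n) = a + b*n by ordinary least squares and uses the fitted models to choose between two interchangeable components for a given problem size. The linear form, the names LinearModel, fit, and pickFaster, and the sample values are all assumptions made for this example; the slides do not prescribe a model form.

```cpp
#include <cassert>
#include <utility>
#include <vector>

namespace sketch {

// Parametric model T(n) = a + b * n for a component's run time.
struct LinearModel {
  double a;
  double b;
  double predict(double n) const { return a + b * n; }
};

// Ordinary least-squares fit to (problem size, measured time) samples.
LinearModel fit(const std::vector<std::pair<double, double> >& samples) {
  assert(samples.size() >= 2);
  double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
  for (const auto& s : samples) {
    sx += s.first;
    sy += s.second;
    sxx += s.first * s.first;
    sxy += s.first * s.second;
  }
  const double count = static_cast<double>(samples.size());
  const double denom = count * sxx - sx * sx;
  assert(denom != 0.0);  // needs at least two distinct problem sizes
  LinearModel m;
  m.b = (count * sxy - sx * sy) / denom;
  m.a = (sy - m.b * sx) / count;
  return m;
}

// Returns 0 if the first component is predicted to be at least as fast
// as the second for problem size n, otherwise 1.
int pickFaster(const LinearModel& first, const LinearModel& second, double n) {
  return first.predict(n) <= second.predict(n) ? 0 : 1;
}

}  // namespace sketch
```

With samples (1, 3), (2, 5), (3, 7) the fit gives a = 1 and b = 2. A component with a high fixed cost but a low per-element cost can lose at small n and win at large n, which is the kind of decision runtime selection would have to make.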

Page 24: Performance Technology for Component Software

Composition of Components

  Understanding scalability of performance models (research problem)
    Linear superposition principle does not apply!
    Composition of scalable components may not produce a scalable execution (mismatch of data structures…)

[Figure: Scalable Component A and Scalable Component B exchanging data; labeled "Unscalable union"]
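A toy cost model makes the composition point concrete. Assume, purely for illustration, that each component's work divides perfectly across p processes, while converting data between their mismatched layouts costs time proportional to p. The functions, the cost forms, and the constants below are hypothetical and were chosen only to show the effect.

```cpp
#include <cassert>

namespace sketch {

// Perfectly scalable component: time falls as 1/p.
double componentTime(double work, int p) {
  assert(p > 0);
  return work / p;
}

// Assumed cost of converting data between the two components' layouts:
// grows linearly with the number of processes.
double redistributionTime(double costPerProcess, int p) {
  assert(p > 0);
  return costPerProcess * p;
}

// Time of the composed application: both components plus the data
// exchange that only exists because they are composed.
double composedTime(double workA, double workB, double costPerProcess, int p) {
  return componentTime(workA, p) + componentTime(workB, p) +
         redistributionTime(costPerProcess, p);
}

}  // namespace sketch
```

With workA = workB = 128 and a unit redistribution cost, each component alone speeds up from 8 to 2 time units going from 16 to 64 processes, yet the composed time rises from 32 to 68. Summing the individual component models would miss the third term entirely.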

Page 25: Performance Technology for Component Software

Performance Technology for Components: TAU

[Figure: TAU performance system architecture diagram; surviving labels: EPILOG, Paraver]

Page 26: Performance Technology for Component Software

TAU Instrumentation

  Flexible instrumentation mechanisms at multiple levels
    Source code
      Manual (TAU API, CCA Measurement Port API)
      Automatic using Program Database Toolkit (PDT), OPARI (for OpenMP programs), Babel SIDL compiler (proposed)
    Object code
      Pre-instrumented libraries (e.g., MPI using PMPI)
      Statically linked
      Dynamically linked (e.g., virtual machine instrumentation)
      Fast breakpoints (compiler generated)
    Executable code
      Dynamic instrumentation (pre-execution) using DynInstAPI

Page 27: Performance Technology for Component Software

Program Database Toolkit

[Figure: PDT architecture. An application or library is processed by the C/C++ and Fortran 77/90 parsers into IL, then by the C/C++ and Fortran 77/90 IL analyzers into program database files. DUCTAPE provides access to these files for PDBhtml (program documentation), SILOON (application component glue), CHASM (C++/F90 interoperability), and TAU_instr (automatic source instrumentation).]

Page 28: Performance Technology for Component Software

Program Database Toolkit (PDT)

  Program code analysis framework for developing source-based tools for C99, C++ and F90 [U.Oregon, LANL, FZJ Germany]
  High-level interface to source code information
  Widely portable:
    IBM, SGI, Compaq, HP, Sun, Linux clusters, Windows, Apple, Hitachi, Cray T3E...
  Integrated toolkit for source code parsing, database creation, and database query
    commercial grade front end parsers (EDG for C99/C++, Mutek for F90)
    Intel/KAI C++ headers for std. C++ library distributed with PDT
    portable IL analyzer, database format, and access API
    open software approach for tool development
  Target and integrate multiple source languages
  Used in CCA for automated generation of SIDL [CHASM]
  Used in TAU to build automated performance instrumentation tools (tau_instrumentor)
  Can be used to generate code for performance ports in CCA

Page 29: Performance Technology for Component Software

New Features in TAU

  Instrumentation
    OPARI – OpenMP directive rewriting approach [POMP, FZJ]
    Selective instrumentation – grouping, include/exclude lists
    tau_reduce – rule based detection of high overhead lightweight routines
  Measurement
    PAPI [UTK] – Support for multiple hardware counters/time
    Callpath profiling (1-level)
    Native generation of EPILOG traces [EXPERT, FZJ]
  Analysis
    Support for Paraver [CEPBA] trace visualizer
    jracy – New Java based profile browser in TAU
  Availability
    New platforms and compilers supported (NEC, Hitachi, Intel)
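The idea of rule-based detection of high-overhead lightweight routines can be sketched as a filter over profile data: a routine called very often but doing little work per call is a candidate for an exclude list, since timer overhead would dominate its measurement. The rule, the thresholds, and the names RoutineProfile and excludeList below are hypothetical and do not reproduce tau_reduce's actual rule syntax or defaults.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

struct RoutineProfile {
  std::string name;
  long numCalls;
  double exclusiveUsec;  // total exclusive time in microseconds
};

// Returns the routines that are called at least `minCalls` times and
// spend less than `maxUsecPerCall` microseconds per call on average.
std::vector<std::string> excludeList(const std::vector<RoutineProfile>& profile,
                                     long minCalls, double maxUsecPerCall) {
  assert(minCalls > 0);  // also guarantees a non-zero divisor below
  std::vector<std::string> excluded;
  for (const RoutineProfile& r : profile) {
    if (r.numCalls >= minCalls &&
        r.exclusiveUsec / r.numCalls < maxUsecPerCall) {
      excluded.push_back(r.name);
    }
  }
  return excluded;
}

}  // namespace sketch
```

The resulting names could then feed the exclude list used for selective instrumentation on the next run.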

Page 30: Performance Technology for Component Software

Applications: Uintah (U. Utah)

Scalability analysis

Page 31: Performance Technology for Component Software

Applications: VTF (ASCI ASAP Caltech)

  C++, C, F90, Python
  PDT, MPI

Page 32: Performance Technology for Component Software

Overview of VTF Code Profile

VTF code run with 1 solid node, 32 fluid nodes (Nodes 0 and 1 are solid and fluid server nodes)

Solid solver adlib computes cube response to planar shock

Fluid solver arm3d evolves shock using Godunov scheme on two-level AMR grid

Try to balance solid & fluid workload, reduce wait time at end of each time step

Use of highly refined solid mesh leads to expensive broadcast of solid boundary location data from fluid server to other nodes (long magenta bars)

Colored bars indicate portion of total execution time spent by each node in various functions

Page 33: Performance Technology for Component Software

June 24, 2002 Argonne CCA Meeting

Applications: SAMRAI (LLNL)

  C++
  PDT, MPI
  SAMRAI timers (groups)

Page 34: Performance Technology for Component Software

TAU Status

  Instrumentation supported:
    Source, preprocessor, compiler, MPI, runtime, virtual machine
  Languages supported:
    C++, C, F90, Java, Python, HPF, ZPL, HPC++, pC++...
  Packages supported:
    PAPI [UTK], PCL [FZJ] (hardware performance counter access)
    Opari, PDT [UO, LANL, FZJ], DyninstAPI [U. Maryland] (instrumentation)
    EXPERT, EPILOG [FZJ], Vampir [Pallas], Paraver [CEPBA] (visualization)
  Platforms supported:
    IBM SP, SGI Origin, Sun, HP Superdome, HP/Compaq Tru64 ES, Linux clusters (IA-32, IA-64, PowerPC, Alpha), Apple, Windows, Hitachi SR8000, NEC SX, Cray T3E ...
  Compiler suites supported:
    GNU, Intel KAI (KCC, KAP/Pro), Intel, SGI, IBM, Compaq, HP, Fujitsu, Hitachi, Sun, Apple, Microsoft, NEC, Cray, PGI, Absoft, …
  Thread libraries supported:
    Pthreads, SGI sproc, OpenMP, Windows, Java, SMARTS

Page 35: Performance Technology for Component Software

Concluding Remarks

Complex component systems pose challenging performance analysis problems that require robust methodologies and tools

  New performance problems will arise
    Instrumentation and measurement
    Data analysis and presentation
    Diagnosis and tuning
  Performance engineered components
    Performance knowledge, observation, query and control
  Integration of performance technology

Page 36: Performance Technology for Component Software

Support Acknowledgement

  TAU and PDT support:
    Department of Energy (DOE)
      DOE 2000 ACTS contract
      DOE MICS contract
      DOE ASCI Level 3 (LANL, LLNL)
      U. of Utah DOE ASCI Level 1 subcontract
    DARPA
    NSF National Young Investigator (NYI) award