Projections – A Performance Tool for Charm++ Applications

Chee Wai [email protected]

Parallel Programming Laboratory

Dept. of Computer Science

University of Illinois at Urbana-Champaign

http://charm.cs.uiuc.edu

Outline

• General Introduction to Projections

• Projections Basics

• Advanced Features

• Features to aid Effective Analysis

• Extremely Large Datasets

• Tips and Notes

10/19/05 Projections Tutorial 3

Projections

• Projections is a performance tool designed for use with Charm++/AMPI.

• Trace-based, post-mortem analysis.

• Supports highly detailed traces, summary formats and a flexible user-level API.

• Java-based visualization tool for presenting performance information.


Charm++ Model

• Object Oriented. Chares (objects) encapsulate data, standard C++ methods and entry methods.

• Message Driven. Entry methods represent work units activated by an incoming message.

• Only one entry method may execute at any time in a Chare.

• A runtime schedules incoming messages.

What you will need

• A version of Charm++ built without the CMK_OPTIMIZE flag.

– User-built: download our release or check out a copy from anonymous CVS.

– Pre-built: check with your machine's sysadmin.

• Java Runtime 1.3.1 or higher.

• Projections visualization binary (projections.jar).

– User-built Charm++: located in charm/tools/projections/bin.

– Pre-built: check with your sysadmin or acquire the binary separately.

Outline: Basics

• General Introduction to Projections

• Projections Basics

– Instrumentation

– Trace generation

– Visualization




Trace Generation: Basics

– Automatic trace instrumentation: no user code is required by default.

– Any Charm++ version built without the CMK_OPTIMIZE flag supports tracing.

– All Charm++ entry methods and messaging events are traced.

Trace Generation: Steps

• Link your application with the link-time options “-tracemode projections -tracemode summary”.

• Run your application normally.

• At the end of the run, you will see “.log”, “.sum” and “.sts” files generated in the same directory as your application binary.

Visualization: Basic Steps

• Run the script found at charm/tools/projections/bin as such:

– projections [<application>.sts]

• Or launch the Java binary projections.jar from the command line, passing the optional <application>.sts filename as an argument.


Visualization: Main Window

Visualization: Overview

Visualization: Usage Profile

Visualization: Time Profile

Visualization: Timeline

Visualization: Task Histogram

Visualization: Communication

Visualization: Tabulated Call Info


Projections Basic demo.

Outline: Advanced Features

• General Introduction to Projections


• Advanced Features

– Partial Tracing

– Tracing User Events

– Tracing AMPI Functions



• Tips and Notes


Partial Trace Generation

• The following API calls are provided by the tracing framework:

– void traceBegin()

– void traceEnd()

– The above calls turn tracing on/off for the processor on which the call was made.

– int traceIsOn() queries the tracing framework status.

• +traceoff runtime option

– Causes tracing (over all processors) to be turned off when the application is started.

Partial Tracing “watch-it”s

• traceBegin() and traceEnd() calls apply only to the processor on which they are invoked.

– Offers flexibility but is vulnerable to programmer error.

– Typically used in a collective manner (e.g. NAMD does this at specific load balancing operations).

• Partial trace calls are invoked in the context of an entry method. One should be prepared to drop initial performance data just after turning tracing on and just before turning tracing off.

• Appropriate use of the +traceoff runtime option is essential.

Partial Tracing Example

// In the case when tracing is off at the beginning,
// only turn tracing on from after the first LB to the firstLdbStep after
// the second LB.
//   1     2     3       4        5       6     7
//  off   on   Alg7   refine   refine   ...    on
#if CHARM_VERSION >= 050606
if (traceAvailable()) {
  static int specialTracing = 0;
  // Enter the special mode only if tracing was off at the first LB cycle.
  if (ldbCycleNum == 1 && traceIsOn() == 0) specialTracing = 1;
  if (specialTracing) {
    if (ldbCycleNum == 4) traceBegin();  // start tracing at cycle 4
    if (ldbCycleNum == 6) traceEnd();    // stop tracing at cycle 6
  }
}
#endif


Tracing User Events

• The following APIs are provided for user event registration and tracing:

– int traceRegisterUserEvent(char *eventDesc, int EventNum=-1)

• Acquire or specify an event ID to be associated with an event name.

– void traceUserEvent(int eventNum)

• Use a valid event ID to record an event.

– void traceUserBracketEvent(int eventNum, double startTime, double endTime)

• Use a valid event ID to record an event interval.
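The register-then-record pattern can be sketched as below. This is only an illustration: everything inside the namespace `sketch` is a hypothetical stand-in so the code compiles without Charm++, and it is not the runtime's implementation. The stand-ins mirror the signatures listed above, except that `const char *` is assumed so that string literals compile cleanly, and the ID assignment rule (next unused integer) is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins mirroring the user-event API from the slide.
// In a real Charm++ program these calls come from the tracing framework.
namespace sketch {

struct Interval {
    int eventNum;
    double start;
    double end;
};

std::map<int, std::string> registry;   // event ID -> event name
std::vector<Interval> recorded;        // stands in for the trace log

// Acquire (eventNum == -1) or specify an event ID for the given name.
int traceRegisterUserEvent(const char *eventDesc, int eventNum = -1) {
    if (eventNum < 0)  // assumed rule: pick the next unused ID
        eventNum = registry.empty() ? 0 : registry.rbegin()->first + 1;
    registry[eventNum] = eventDesc;
    return eventNum;
}

// Record a point event; a point event is stored as a zero-length interval.
void traceUserEvent(int eventNum, double now) {
    assert(registry.count(eventNum) == 1);  // must be registered first
    recorded.push_back({eventNum, now, now});
}

// Record an event interval for a registered ID.
void traceUserBracketEvent(int eventNum, double startTime, double endTime) {
    assert(registry.count(eventNum) == 1);  // must be registered first
    assert(startTime <= endTime);
    recorded.push_back({eventNum, startTime, endTime});
}

}  // namespace sketch
```

A caller would register once at startup, for example `int ev = sketch::traceRegisterUserEvent("force computation");`, and later bracket the region of interest with `sketch::traceUserBracketEvent(ev, t0, t1);`. Note that the stand-in `traceUserEvent` takes an explicit timestamp only because the sketch has no runtime clock; the call listed on the slide takes the event ID alone.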

AMPI Function Tracing API

• Works like User Events in Charm++ Applications.

• Why? MPI Function abstraction is invisible to the Charm++ runtime.

• REGISTER_FUNCTION(<namestring>) to register <namestring> as a string-id to be traced.

• TRACEFUNC(<funcall>,<namestring>) to record a call to function <funcall> to be associated with the event registered as <namestring>.

Advanced Features Demo

Outline: Effective Analysis




• Features to aid Effective Analysis

– How Tracing Works

– Memory Footprint Control

– Data Volume Control

– Visualization Controls


• Tips and Notes

Log Event Tracing

• Each Charm++ event and any registered user events are recorded into a pre-assigned memory buffer on each processor.

• Default buffer size is 10,000 trace log entries.

• When a buffer is full, a special flush event is logged and the buffer is flushed to disk.

• Flushing is done independently of other processors. There is no synchronization on a flush.
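The buffering behaviour described above can be sketched roughly as follows. `LogEntry` and `LogBuffer` are hypothetical names, the "disk" is just a second vector, and the choice to place the flush marker as the first entry of the emptied buffer is an assumption; the slides do not show the real record layout or ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical log record; the real Projections layout is not shown here.
struct LogEntry {
    std::string kind;  // e.g. an entry-method event, a user event, or "FLUSH"
    double time;
};

// Hypothetical per-processor trace buffer with a fixed capacity.
class LogBuffer {
public:
    explicit LogBuffer(std::size_t capacity = 10000) : capacity_(capacity) {
        assert(capacity >= 2);  // room for the flush marker plus one entry
        entries_.reserve(capacity);
    }

    // Append one event; flush locally when the buffer becomes full.
    void record(const std::string &kind, double time) {
        entries_.push_back({kind, time});
        if (entries_.size() >= capacity_) flush(time);
    }

    std::size_t flushCount() const { return flushes_; }
    const std::vector<LogEntry> &inMemory() const { return entries_; }
    const std::vector<LogEntry> &onDisk() const { return disk_; }

private:
    // Write everything out, then note the flush itself in the fresh buffer
    // so that its cost is visible in the trace. No other processor is
    // consulted: each buffer flushes on its own schedule.
    void flush(double time) {
        disk_.insert(disk_.end(), entries_.begin(), entries_.end());
        entries_.clear();
        ++flushes_;
        entries_.push_back({"FLUSH", time});
    }

    std::size_t capacity_;
    std::size_t flushes_ = 0;
    std::vector<LogEntry> entries_;  // in-memory buffer
    std::vector<LogEntry> disk_;     // stands in for the .log file
};
```

Because each processor flushes independently, a flush on one processor can delay work that other processors are waiting on, which then shows up in their traces as idle time.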

Summary Tracing

• Invoked by the same event set as in Log tracing. User events are, however, ignored.

• Memory buffer is organized into k bins, each initially representing 1 ms of time. Each event contributes data into the appropriate time bin.

• When the buffer is filled, the time each bin represents is doubled and the data is packed into the first k/2 bins. Event contribution continues at the (k/2+1)th bin.
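The bin-doubling scheme can be illustrated with a small sketch. `SummaryBins` is a hypothetical class, not the Charm++ implementation, and it makes one simplifying assumption: each contribution is credited entirely to the bin containing its timestamp, whereas a real tracer would likely split an event that spans several bins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary buffer: k bins of equal width, compacted when full.
class SummaryBins {
public:
    SummaryBins(std::size_t binCount, double binWidthSec)
        : bins_(binCount, 0.0), width_(binWidthSec) {
        assert(binCount >= 2 && binCount % 2 == 0 && binWidthSec > 0.0);
    }

    // Credit busySec seconds of work to the bin that contains time t.
    void add(double t, double busySec) {
        assert(t >= 0.0);
        std::size_t idx = static_cast<std::size_t>(t / width_);
        while (idx >= bins_.size()) {  // buffer exhausted: compact and retry
            compact();
            idx = static_cast<std::size_t>(t / width_);
        }
        bins_[idx] += busySec;
    }

    double binWidth() const { return width_; }
    double bin(std::size_t i) const { return bins_.at(i); }
    std::size_t size() const { return bins_.size(); }

private:
    // Double the bin width and merge neighbouring pairs into the first
    // k/2 bins; the upper half is cleared for future contributions.
    void compact() {
        const std::size_t half = bins_.size() / 2;
        for (std::size_t i = 0; i < half; ++i)
            bins_[i] = bins_[2 * i] + bins_[2 * i + 1];
        for (std::size_t i = half; i < bins_.size(); ++i)
            bins_[i] = 0.0;
        width_ *= 2.0;
    }

    std::vector<double> bins_;
    double width_;
};
```

With 4 bins of 1 s each, a contribution at t = 4.5 s does not fit, so the sketch merges the four bins into the first two, widens every bin to 2 s, and places the new contribution in the third bin. Memory stays fixed while time resolution halves with each compaction.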

Memory Footprint Control

• Tracing Memory Footprint

– Event Log tracemode (-tracemode projections)

• Default 10,000 event entries.

• Controlled by runtime flag “+logsize <size>”.

– Summary tracemode (-tracemode summary)

• Default 10,000 time bins of 1 ms.

• Controlled by runtime flags “+bincount <#bins>” and “+binsize <seconds>”.
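A back-of-the-envelope helper for choosing these values might look like the sketch below. Both functions are hypothetical and rest on a simple model taken from the descriptions in these slides: a log buffer flushes each time it fills, and a summary buffer doubles its coverage at each compaction. The final write at program exit is not counted as a flush.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of mid-run flushes for a processor that logs `events` entries
// with a buffer of `logSize` entries (the +logsize value).
std::uint64_t flushCount(std::uint64_t events, std::uint64_t logSize) {
    assert(logSize > 0);
    return events / logSize;
}

// Seconds of run time a summary buffer can represent after `compactions`
// doublings, given +bincount and +binsize (in seconds).
double summaryCoverageSec(std::uint64_t binCount, double binSizeSec,
                          int compactions) {
    assert(binSizeSec > 0.0 && compactions >= 0);
    // ldexp(x, n) computes x * 2^n.
    return std::ldexp(static_cast<double>(binCount) * binSizeSec, compactions);
}
```

Under this model the defaults (10,000 bins of 1 ms) cover 10 s of run time before the first compaction and 80 s after three, while a processor that logs 25,000 events with the default log size flushes twice during the run.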

Memory Footprint Control (2)

• Trade-offs to consider

– Log Buffers

• Flush overhead vs Memory usage

• Frequency of flushes vs Size of flushes

– Summary Buffers

• Frequency of compaction + Data Granularity vs Memory usage

Controlling Data Volume

• [Code-time] User API for partial tracing.

• [Link-time] Generating only summary data.

• [Run-time] Writing compressed output (runtime flag): +gz-trace

• [Post-run] Deleting subset of generated logs.

• [Visualization] Parameter range control, analysis “Memory”.

Visualization Parameter Control


Specific Visualization Tool Issues

• Memory usage constraints

– Timeline: dependent on the event density of the selected time range of the application. Typical workable range is 10 ms to 10 s for between 10 and 20 processors.

– Processor-based tools (Overview, Communications, Usage Profile): limit to 2000 processors or less.

– Interval-based tools (Graph, Time Profile, Communication vs Time, Animation): limit to 1500 time intervals.

– Histogram tools: limit to 1000 bins or less.

Analysis Techniques

• Zoom in from wide-ranged, low-detail views to a detailed look at problem spots.

• Make use of range histories.

• Control data volume.

• Use effective colors.

• Make effective use of specific tool features (see manual).

Analysis Techniques (2)

• Load Imbalance: Overview, Usage Profile.

• Where's my work going?: Time Profile.

• How is my communication behaving?: Communication, Communication vs Time.

• Are there critical paths? I need details!: Timeline.

Putting it all together

Outline: Extremely Large Datasets



• How Tracing Works



• Tips and Notes

Considerations for Extremely Large Datasets

• Large number of files – beware of the filesystem.

• Huge amounts of data – control!

Demo – NAMD on 8192 processors

Outline: Miscellany



• How Tracing Works



• Tips and Notes


Conveniently Placing Logs

• Specifying a user-defined output location (runtime option):

+traceroot <desired log directory>

– It is important to note that <desired log directory> must be available on a machine's compute nodes.


Perturbation Issues

• Perturbation of application.

– Tracing overhead

• Timer overheads (rdtsc, machine wallclock).

• Acquisition of performance data and storage.

– Observed in the case of NAMD with timesteps below 10 ms with many compute objects in the microsecond range.

Look Closely!

• Do not be fooled! Projections does not handle some visualization artifacts well:

– Fine-grained details can sometimes look like one big solid block on the timeline.

– It is hard to mouse over items that represent fine-grained events.

– Other times, tiny slivers of activity become too small to be drawn.

Answering my questions (again!)

• Load Imbalance: Overview, Usage Profile.

• Where's my work going?: Time Profile.

• How is my communication behaving?: Communication, Communication vs Time.

• Are there critical paths? I need details!: Timeline.

Frequently Asked Questions

Q: I tried user events and projections visualization crashes!

A: Did you register the events before using them?

Q: There are giant stretched event(s) in my run!

A: Did you set a large enough log buffer size?

Q: Projections visualization crashes!

A: Are your logs corrupted? This can happen with bad I/O (experienced on NFS).
