Projections – A Performance Tool for Charm++ Applications
Chee Wai
Parallel Programming Laboratory
Dept. of Computer Science
University of Illinois at Urbana Champaign
http://charm.cs.uiuc.edu
Outline
• General Introduction to Projections
• Projections Basics
• Advanced Features
• Features to aid Effective Analysis
• Extremely Large Datasets
• Tips and Notes
10/19/05 Projections Tutorial 3
Projections
• Projections is a performance tool designed for use with Charm++/AMPI.
• Trace-based, post-mortem analysis.
• Supports highly detailed traces, summary formats and a flexible user-level API.
• Java-based visualization tool for presenting performance information.
Charm++ Model
• Object Oriented. Chares (objects) encapsulate data, standard C++ methods and entry methods.
• Message Driven. Entry methods represent work units activated by an incoming message.
• Only one entry method may execute at any time in a Chare.
• A runtime schedules incoming messages.
What you will need
• A version of Charm++ built without the CMK_OPTIMIZE flag.
– User-built, download our release or check out a copy from anonymous CVS.
– Pre-built, check with your machine sysadmin.
• Java Runtime 1.3.1 or higher.
• Projections Visualization binary (projections.jar)
– User-built Charm++, located in charm/tools/projections/bin
– Pre-built, check with sysadmin or acquire the binary separately.
Outline: Basics
• General Introduction to Projections
• Projections Basics
– Instrumentation
– Trace generation
– Visualization
• Advanced Features
• Features to aid Effective Analysis
• Extremely Large Datasets
Trace Generation: Basics
– Automatic trace instrumentation: no user code is required by default.
– Any Charm++ version built without the CMK_OPTIMIZE flag supports tracing.
– All Charm++ entry methods and messaging events are traced.
Trace Generation: Steps
• Link your application with the link-time options “-tracemode projections -tracemode summary”.
• Run your application normally.
• At the end of the run, you will see “.log”, “.sum” and “.sts” files generated in the same directory as your application binary.
Visualization: Basic Steps
• Run the script found at charm/tools/projections/bin as such:
– projections [<application>.sts]
• Or run the Java binary projections.jar from the command line, passing the optional <application>.sts filename as an argument.
Visualization: Main Window
Visualization: Overview
Visualization: Usage Profile
Visualization: Time Profile
Visualization: Timeline
Visualization: Task Histogram
Visualization: Communication
Visualization: Tabulated Call Info
Projections Basic demo.
Outline: Advanced Features
• General Introduction to Projections
• Projections Basics
• Advanced Features
– Partial Tracing
– Tracing User Events
– Tracing AMPI Functions
• Features to aid Effective Analysis
• Extremely Large Datasets
• Tips and Notes
Partial Trace Generation
• The following API calls are provided by the tracing framework:
– void traceBegin()
– void traceEnd()
– The above calls turn tracing on/off for the processor on which the call was made.
– int traceIsOn() queries the tracing framework status.
• +traceoff runtime option
– Causes tracing (over all processors) to be turned off when the application is started.
Partial Tracing “watch-it”s
• traceBegin() and traceEnd() calls apply only to the processor on which they are invoked.
– Offers flexibility but is vulnerable to programmer error.
– Typically used in a collective manner (e.g. NAMD does this at specific load balancing operations).
• Partial trace calls are invoked in the context of an entry method. One should be prepared to drop initial performance data just after turning tracing on and just before turning tracing off.
• Appropriate use of the +traceoff runtime option is essential.
Partial Tracing Example
// In the case when trace is off at the beginning,
// only turn trace on from after the first LB to the first LdbStep after
// the second LB.
// 1    2    3     4       5       6    7
// off  on   Alg7  refine  refine  ...  on
#if CHARM_VERSION >= 050606
if (traceAvailable()) {
  static int specialTracing = 0;
  // Enable the special tracing window only if tracing started out off.
  if (ldbCycleNum == 1 && traceIsOn() == 0) specialTracing = 1;
  if (specialTracing) {
    if (ldbCycleNum == 4) traceBegin();
    if (ldbCycleNum == 6) traceEnd();
  }
}
#endif
Tracing User Events
• The following APIs are provided for user event registration and tracing:
– int traceRegisterUserEvent(char *eventDesc, int EventNum=-1)
• Acquire or specify an event ID to be associated with the event name.
– void traceUserEvent(int eventNum)
• Use a valid event ID to record an event.
– void traceUserBracketEvent(int eventNum, double startTime, double endTime)
• Use a valid event ID to record an event interval.
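The register-then-record pattern above can be sketched as follows. This is a minimal illustration, not Charm++ code: the three functions are stand-in stubs that copy the signatures listed on this slide (with const char * so string literals compile) and store records in memory. The namespace sketch, the Record type, fakeClock, isRegistered() and runExample() are all hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in stubs (NOT the Charm++ implementation): they mimic the three
// signatures listed above so the calling pattern compiles on its own.
namespace sketch {

struct Record { int eventNum; double start; double end; };

std::vector<std::string> registeredNames;  // index == event ID
std::vector<Record> traceLog;              // stands in for the trace buffer
double fakeClock = 0.0;                    // hypothetical "current time"

// Returns a fresh ID when eventNum is -1; otherwise registers at that ID.
int traceRegisterUserEvent(const char *eventDesc, int eventNum = -1) {
  if (eventNum < 0) eventNum = static_cast<int>(registeredNames.size());
  if (eventNum >= static_cast<int>(registeredNames.size()))
    registeredNames.resize(eventNum + 1);
  registeredNames[eventNum] = eventDesc;
  return eventNum;
}

bool isRegistered(int eventNum) {
  return eventNum >= 0 &&
         eventNum < static_cast<int>(registeredNames.size()) &&
         !registeredNames[eventNum].empty();
}

// Records an interval for a previously registered event.
void traceUserBracketEvent(int eventNum, double startTime, double endTime) {
  assert(isRegistered(eventNum) && "register the event before recording it");
  traceLog.push_back({eventNum, startTime, endTime});
}

// Records a point event at the (fake) current time.
void traceUserEvent(int eventNum) {
  assert(isRegistered(eventNum) && "register the event before recording it");
  traceLog.push_back({eventNum, fakeClock, fakeClock});
}

// Hypothetical application code: register once, then record with the IDs.
int runExample() {
  int solveEvent = traceRegisterUserEvent("solve phase");
  int checkpointEvent = traceRegisterUserEvent("checkpoint");
  traceUserBracketEvent(solveEvent, 0.010, 0.025);  // interval event
  fakeClock = 0.030;
  traceUserEvent(checkpointEvent);                  // point event
  return static_cast<int>(traceLog.size());
}

}  // namespace sketch
```

In a real application the registration calls would typically sit in start-up code so that every event ID is valid before the first record call.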
AMPI Function Tracing API
• Works like User Events in Charm++ Applications.
• Why? MPI Function abstraction is invisible to the Charm++ runtime.
• REGISTER_FUNCTION(<namestring>) to register <namestring> as a string-id to be traced.
• TRACEFUNC(<funcall>,<namestring>) to record a call to function <funcall> to be associated with the event registered as <namestring>.
Advanced Features Demo
Outline: Effective Analysis
• General Introduction to Projections
• Projections Basics
• Advanced Features
• Features to aid Effective Analysis
– How Tracing Works
– Memory Footprint Control
– Data Volume Control
– Visualization Controls
• Extremely Large Datasets
• Tips and Notes
Log Event Tracing
• Each Charm++ event and any registered user events are recorded into a pre-assigned memory buffer on each processor.
• Default buffer size is 10,000 trace log entries.
• When a buffer is full, a special flush event is logged and the buffer is flushed to disk.
• Flushing is done independently of other processors. There is no synchronization on a flush.
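The buffer-and-flush behaviour described above can be modelled with a small sketch. It is a toy model under stated assumptions, not the Charm++ tracing code: LogBuffer, LogEntry and FLUSH_EVENT are hypothetical names, "disk" is only a counter, and the assumption that the flush marker occupies a slot in the fresh buffer is mine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one processor's trace buffer; not Charm++ code.
struct LogEntry { int type; double time; };
const int FLUSH_EVENT = -1;  // assumed marker for the special flush event

class LogBuffer {
 public:
  explicit LogBuffer(std::size_t capacity) : capacity_(capacity) {
    assert(capacity_ >= 2);  // room for a flush marker plus one event
    entries_.reserve(capacity_);
  }

  // Append one event; flush locally (no other processor involved) when full.
  void record(int type, double time) {
    entries_.push_back({type, time});
    if (entries_.size() >= capacity_) flush(time);
  }

  std::size_t flushCount() const { return flushCount_; }
  std::size_t entriesOnDisk() const { return entriesOnDisk_; }
  std::size_t buffered() const { return entries_.size(); }

 private:
  void flush(double time) {
    // "Disk" is just a counter here; a real tracer would write entries out.
    entriesOnDisk_ += entries_.size();
    entries_.clear();
    ++flushCount_;
    entries_.push_back({FLUSH_EVENT, time});  // the flush itself is logged
  }

  std::size_t capacity_;
  std::vector<LogEntry> entries_;
  std::size_t flushCount_ = 0;
  std::size_t entriesOnDisk_ = 0;
};
```

With a capacity of 4, recording 10 events triggers 3 flushes in this model. A larger capacity means fewer but bigger flushes, at the cost of more memory per processor.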
Summary Tracing
• Invoked by the same event set as in Log tracing. User events are, however, ignored.
• The memory buffer is organized into k bins, each initially representing 1 ms of time. Each event contributes data to the appropriate time bin.
• When the buffer is filled, the time represented by each bin is doubled and the data is packed into the first k/2 bins. Event contribution continues at the (k/2+1)th bin.
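The compaction rule can be sketched as below. This is an illustrative model only: SummaryBins and its members are hypothetical names, and the quantity accumulated per bin (busy seconds here) is an assumption, since the slide does not say what each event contributes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of summary-mode time bins; not Charm++ code.
class SummaryBins {
 public:
  SummaryBins(std::size_t k, double binSeconds)
      : bins_(k, 0.0), binSeconds_(binSeconds) {
    assert(k >= 2 && k % 2 == 0 && binSeconds > 0.0);
  }

  // Credit `busySeconds` of work to the bin that covers time `t`.
  void add(double t, double busySeconds) {
    assert(t >= 0.0);
    std::size_t idx = static_cast<std::size_t>(t / binSeconds_);
    while (idx >= bins_.size()) {  // buffer full: compact until t fits
      compact();
      idx = static_cast<std::size_t>(t / binSeconds_);
    }
    bins_[idx] += busySeconds;
  }

  double bin(std::size_t i) const { return bins_[i]; }
  double binSeconds() const { return binSeconds_; }
  int compactions() const { return compactions_; }

 private:
  // Double the bin width and pack the data into the first k/2 bins.
  void compact() {
    const std::size_t half = bins_.size() / 2;
    for (std::size_t i = 0; i < half; ++i)
      bins_[i] = bins_[2 * i] + bins_[2 * i + 1];
    for (std::size_t i = half; i < bins_.size(); ++i) bins_[i] = 0.0;
    binSeconds_ *= 2.0;
    ++compactions_;
  }

  std::vector<double> bins_;
  double binSeconds_;
  int compactions_ = 0;
};
```

For k = 4 bins of 1 ms, a contribution at 4.5 ms forces one compaction: the four filled bins merge into the first two, and the new data lands in the third bin, i.e. the (k/2+1)th. Memory stays fixed while time resolution degrades.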
Memory Footprint Control
• Tracing Memory Footprint
– Event Log tracemode (-tracemode projections)
• Default 10,000 event entries.
• Controlled by runtime flag “+logsize <size>”.
– Summary tracemode (-tracemode summary)
• Default 10,000 time bins of 1 ms.
• Controlled by runtime flags “+bincount <#bins>” and “+binsize <seconds>”.
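As a rough aid for choosing +bincount and +binsize, the arithmetic can be sketched like this. It assumes the doubling rule from the Summary Tracing slide (each compaction doubles the time covered by the buffer). The function names are hypothetical and the helper is not part of Charm++.

```cpp
#include <cassert>

// Seconds of run time that `bins` bins of `binSeconds` each can cover.
double coverageSeconds(long bins, double binSeconds) {
  return static_cast<double>(bins) * binSeconds;
}

// Number of bin-width doublings needed before a run of `runSeconds` fits,
// assuming each compaction doubles the time represented by every bin.
int doublingsNeeded(long bins, double binSeconds, double runSeconds) {
  assert(bins > 0 && binSeconds > 0.0);
  int doublings = 0;
  while (coverageSeconds(bins, binSeconds) < runSeconds) {
    binSeconds *= 2.0;
    ++doublings;
  }
  return doublings;
}
```

With the defaults (10,000 bins of 1 ms) the buffer covers about 10 s before its first compaction. Under this model a 60 s run needs three doublings (10 s, 20 s, 40 s, 80 s) and ends with 8 ms bins.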
Memory Footprint Control (2)
• Trade-offs to consider
– Log Buffers
• Flush overhead vs Memory usage
• Frequency of flushes vs Size of flushes
– Summary Buffers
• Frequency of compaction + Data Granularity vs Memory usage
Controlling Data Volume
• [Code-time] User API for partial tracing.
• [Link-time] Generating only summary data.
• [Run-time] Writing compressed output (runtime flag): +gz-trace
• [Post-run] Deleting subset of generated logs.
• [Visualization] Parameter range control, analysis “Memory”.
Visualization Parameter Control
Specific Visualization Tool Issues
• Memory usage constraints
– Timeline - dependent on the event-density of the selected time range of the application. Typical workable range is 10ms to 10s for between 10-20 processors.
– Processor based tools (Overview, Communications, Usage Profile) - limit to 2000 processors or less.
– Interval based tools (Graph, Time Profile, Communication vs Time, Animation) - limit to 1500 time intervals.
– Histogram tools – limit to 1000 bins or less.
Analysis Techniques
• Zoom in from wide-ranged, low-detail views to a detailed look at problem spots.
• Make use of range histories.
• Control data volume.
• Use effective colors.
• Make effective use of specific tool features (see manual).
Analysis Techniques (2)
• Load Imbalance: Overview, Usage Profile.
• Where's my work going?: Time Profile.
• How is my communication behaving?: Communication, Communication vs Time.
• Are there critical paths? I need details!: Timeline.
Putting it all together
Outline: Extremely Large Datasets
• General Introduction to Projections
• Projections Basics
• How Tracing Works
• Features to aid Effective Analysis
• Extremely Large Datasets
• Tips and Notes
Considerations for Extremely Large Datasets
• Large number of files – beware of the filesystem.
• Huge amounts of data – control!
Demo – NAMD on 8192 processors
Outline: Miscellany
• General Introduction to Projections
• Projections Basics
• How Tracing Works
• Features to aid Effective Analysis
• Extremely Large Datasets
• Tips and Notes
Conveniently Placing Logs
• Specifying a user-defined output location (runtime option):
+traceroot <desired log directory>
– It is important to note that <desired log directory> must be available on a machine's compute nodes.
Perturbation Issues
• Perturbation of application.
– Tracing overhead
• Timer overheads (rdtsc, machine wallclock).
• Acquisition of performance data and storage.
– Observed in the case of NAMD with timesteps below 10ms with many compute objects in the microsecond range.
Look Closely!
• Do not be fooled! Projections does not handle some visualization artifacts well:
– Fine grain details can sometimes look like one big solid block on timeline.
– It is hard to mouse-over items that represent fine-grained events.
– Other times, tiny slivers of activity become too small to be drawn.
Answering my questions (again!)
• Load Imbalance: Overview, Usage Profile.
• Where's my work going?: Time Profile.
• How is my communication behaving?: Communication, Communication vs Time.
• Are there critical paths? I need details!: Timeline.
Frequently Asked Questions
Q: I tried user events and projections visualization crashes!
A: Did you register the events before using them?
Q: There are giant stretched event(s) in my run!
A: Did you set a large enough log buffer size?
Q: Projections visualization crashes!
A: Are your logs corrupted? This can happen with bad I/O (experienced on NFS).