Aamer Jaleel, Eric Borch, Malini Bhandaru, Simon Steely Jr., Joel Emer
Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi...
-
Upload
milton-sullivan -
Category
Documents
-
view
212 -
download
0
Transcript of Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi...
![Page 1: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/1.jpg)
1
Analyzing parallel programs with Pin
Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi-Keung Luk, Gail Lyons, Harish Patil, and Ady Tal, Intel Corp.
Meng YingGNU OSLab
![Page 2: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/2.jpg)
2
Contents
• Introduction• Pintool• Instrumentation modes• Instrumenting parallel programs• Performance considerations• Multithreaded versus multiprocess instrumentation • Example tools• Experimental results • Conclusion
![Page 3: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/3.jpg)
3
Introduction
• Instrumentation– A technique for collecting the information needed to
understand programs.
• Instrumentation-based tools– They typically insert extra code into a program to record
events during execution.
• Pin – A software system that performs runtime binary
instrumentation of Linux and Microsoft Windows applications.
– Pin ‘s goal is to provide an instrumentation platform for building a wide variety of program analysis tools.
![Page 4: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/4.jpg)
4
Pintool
• Instrumentation routines– inspect the application’s instructions and insert calls to
analysis routines.
• Analysis routines– are called when the program executes an instrumented
instruction and often perform ancillary tasks.
• Callback routines– are called when specific conditions are met, or when a
certain event has occurred, for example, when it is about to exit.
![Page 5: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/5.jpg)
PintoolA simple pintool
Instrumentation
Analysis
Callback
5
![Page 6: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/6.jpg)
6
Instrumentation modes
• JIT mode– supports all features of Pin.– JIT mode uses a just-in-time compiler to recompile all
program code and insert instrumentation.• Probe mode– supports a limited feature set but is far faster, adding
almost no overhead to program running time.– Pin overwrites the entry point of procedures with
jumps (called probes) to dynamically generated instrumentation.
![Page 7: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/7.jpg)
7
Instrumentation modes• Probe mode example
![Page 8: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/8.jpg)
8
Instrumenting parallel programs
• Pintool author– Taking some special care
• When a pintool instruments a parallel program, the application threads execute the calls to analysis functions.
– making the analysis functions thread-safe .• using locks to synchronize references to shared data with other
threads. (figure1)
• Pin – provides callbacks when a new thread or new process is
created. Analysis routines can be passed a thread ID.– Pin provides APIs for allocating and addressing thread-
local storage.
![Page 9: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/9.jpg)
9
Performance considerations
• A highly contended lock – serializes execution and leads to poor CPU
utilization. – Pintool authors • must employ standard parallel programming
techniques to avoid excessive serialization. • should use thread-local storage to avoid the need to
lock global storage. • Instead of a single monolithic lock for a data structure,
they should use fine-grained locks.
![Page 10: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/10.jpg)
10
Performance considerations• False sharing
– is occurring when multiple threads access different parts of the same cache line and at least one of them is a write.
– Developers can eliminate false sharing by padding critical data structures to the size of a cache line or rearranging the structures’ data layout.
![Page 11: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/11.jpg)
11
Multithreaded versus multiprocess instrumentation
Multithreaded Multiprocess
Pin notifies the pintool when a new one is created. No Yes
The pintool must reinstrument. No Yes
Sharing pintool data Yes No
Accessing the same data Yes No
![Page 12: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/12.jpg)
12
Example tools
• Intel Parallel Inspector– To find memory and threading errors , it uses Pin to instrument
all machine instructions in the program that reference memory and records the effective addresses. (memory leaks, references to uninitialized data, data races, and deadlocks )
• Intel Trace Analyzer and Collector– provides information critical to understanding and optimizing
cluster performance by quickly finding performance bottlenecks with Message Passing Interface (MPI) communication. The tool presents a timeline showing when MPI messages are sent and received.
![Page 13: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/13.jpg)
13
Example tools
• Intel Parallel Amplifier (three types of analysis)– Hotspots
• attribute time to source lines and call stacks, identifying the parts of the programs that would benefit from tuning and parallelism.
– Concurrency • measures the CPUs’ utilization, giving whole program and per-
function summaries.
– Locks and waits• measures the time multithreaded programs spend waiting on
locks, attributing time to synchronization objects and source lines.
![Page 14: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/14.jpg)
14
Example tools
• CMP$im – is a cache profiler which uses pin to collect the memory
addresses of multithreaded and multiprocessor programs, and uses a memory system’s software model to analyze program behavior.
• PinPlay– enables the capture and deterministic replay of the running of
multithreaded programs under pin. Capturing the running of a program helps developers overcome the non-determinism inherent in multithreading
![Page 15: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/15.jpg)
15
Example tools
• Prospector – discovers potential parallelism in serial programs by loop and
data-dependence profiling.
• Intel Software Development Emulator– is a user-level functional emulator for new instructions in the
Intel64 instruction set built on Pin.– uses Pin to alter the program while it is running.– is primarily a tool for debugging programs that use new
instructions before the corresponding hardware exists.
![Page 16: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/16.jpg)
16
Experimental results• Runtime overhead
![Page 17: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/17.jpg)
17
Experimental results
• Scalable workload performance
![Page 18: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/18.jpg)
18
Conclusion
• Pin was originally conceived as a tool for computer architecture analysis. A user community latched on to its flexible API and high performance, taking the tool into unforeseen domains like security, emulation, and parallel program analysis. We can make tools based on Pin to help us get a deeper understanding of our software’s behavior.
![Page 19: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,](https://reader036.fdocuments.us/reader036/viewer/2022072014/56649e735503460f94b73b8f/html5/thumbnails/19.jpg)
19
Thank you !