Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi...

19
Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi-Keung Luk, Gail Lyons, Harish Patil, and Ady Tal, Intel Corp. Meng Ying GNU OSLab 1

Transcript of Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi...

Page 1: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

1

Analyzing parallel programs with Pin

Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi-Keung Luk, Gail Lyons, Harish Patil, and Ady Tal, Intel Corp.

Meng YingGNU OSLab

Page 2: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

2

Contents

• Introduction• Pintool• Instrumentation modes• Instrumenting parallel programs• Performance considerations• Multithreaded versus multiprocess instrumentation • Example tools• Experimental results • Conclusion

Page 3: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

3

Introduction

• Instrumentation– A technique for collecting the information needed to

understand programs.

• Instrumentation-based tools– They typically insert extra code into a program to record

events during execution.

• Pin – A software system that performs runtime binary

instrumentation of Linux and Microsoft Windows applications.

– Pin ‘s goal is to provide an instrumentation platform for building a wide variety of program analysis tools.

Page 4: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

4

Pintool

• Instrumentation routines– inspect the application’s instructions and insert calls to

analysis routines.

• Analysis routines– are called when the program executes an instrumented

instruction and often perform ancillary tasks.

• Callback routines– are called when specific conditions are met, or when a

certain event has occurred, for example, when it is about to exit.

Page 5: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

PintoolA simple pintool

Instrumentation

Analysis

Callback

5

Page 6: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

6

Instrumentation modes

• JIT mode– supports all features of Pin.– JIT mode uses a just-in-time compiler to recompile all

program code and insert instrumentation.• Probe mode– supports a limited feature set but is far faster, adding

almost no overhead to program running time.– Pin overwrites the entry point of procedures with

jumps (called probes) to dynamically generated instrumentation.

Page 7: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

7

Instrumentation modes• Probe mode example

Page 8: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

8

Instrumenting parallel programs

• Pintool author– Taking some special care

• When a pintool instruments a parallel program, the application threads execute the calls to analysis functions.

– making the analysis functions thread-safe .• using locks to synchronize references to shared data with other

threads. (figure1)

• Pin – provides callbacks when a new thread or new process is

created. Analysis routines can be passed a thread ID.– Pin provides APIs for allocating and addressing thread-

local storage.

Page 9: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

9

Performance considerations

• A highly contended lock – serializes execution and leads to poor CPU

utilization. – Pintool authors • must employ standard parallel programming

techniques to avoid excessive serialization. • should use thread-local storage to avoid the need to

lock global storage. • Instead of a single monolithic lock for a data structure,

they should use fine-grained locks.

Page 10: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

10

Performance considerations• False sharing

– is occurring when multiple threads access different parts of the same cache line and at least one of them is a write.

– Developers can eliminate false sharing by padding critical data structures to the size of a cache line or rearranging the structures’ data layout.

Page 11: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

11

Multithreaded versus multiprocess instrumentation

Multithreaded Multiprocess

Pin notifies the pintool when a new one is created. No Yes

The pintool must reinstrument. No Yes

Sharing pintool data Yes No

Accessing the same data Yes No

Page 12: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

12

Example tools

• Intel Parallel Inspector– To find memory and threading errors , it uses Pin to instrument

all machine instructions in the program that reference memory and records the effective addresses. (memory leaks, references to uninitialized data, data races, and deadlocks )

• Intel Trace Analyzer and Collector– provides information critical to understanding and optimizing

cluster performance by quickly finding performance bottlenecks with Message Passing Interface (MPI) communication. The tool presents a timeline showing when MPI messages are sent and received.

Page 13: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

13

Example tools

• Intel Parallel Amplifier (three types of analysis)– Hotspots

• attribute time to source lines and call stacks, identifying the parts of the programs that would benefit from tuning and parallelism.

– Concurrency • measures the CPUs’ utilization, giving whole program and per-

function summaries.

– Locks and waits• measures the time multithreaded programs spend waiting on

locks, attributing time to synchronization objects and source lines.

Page 14: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

14

Example tools

• CMP$im – is a cache profiler which uses pin to collect the memory

addresses of multithreaded and multiprocessor programs, and uses a memory system’s software model to analyze program behavior.

• PinPlay– enables the capture and deterministic replay of the running of

multithreaded programs under pin. Capturing the running of a program helps developers overcome the non-determinism inherent in multithreading

Page 15: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

15

Example tools

• Prospector – discovers potential parallelism in serial programs by loop and

data-dependence profiling.

• Intel Software Development Emulator– is a user-level functional emulator for new instructions in the

Intel64 instruction set built on Pin.– uses Pin to alter the program while it is running.– is primarily a tool for debugging programs that use new

instructions before the corresponding hardware exists.

Page 16: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

16

Experimental results• Runtime overhead

Page 17: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

17

Experimental results

• Scalable workload performance

Page 18: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

18

Conclusion

• Pin was originally conceived as a tool for computer architecture analysis. A user community latched on to its flexible API and high performance, taking the tool into unforeseen domains like security, emulation, and parallel program analysis. We can make tools based on Pin to help us get a deeper understanding of our software’s behavior.

Page 19: Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,

19

Thank you !