Kent Milfeld TACC May 28, 2009
description
Transcript of Kent Milfeld TACC May 28, 2009
![Page 1: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/1.jpg)
Profiling
Kent Milfeld
TACCMay 28, 2009
![Page 2: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/2.jpg)
Outline
• Profiling MPI performance with MPIp• Tau• gprof• Example: matmult
![Page 3: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/3.jpg)
MPIp
• Scalable Profiling library for MPI apps• Lightweight• Collects statistics of MPI functions• Communicates only during report generation • Less data than tracing tools
http://mpip.sourceforge.net
![Page 4: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/4.jpg)
Usage, Instrumentation, Analysis• How to use
– No recompiling required!– Profiling gathered in MPI profiling layer
• Link static library before default MPI libraries -g -L${TACC_MPIP_LIB} -lmpiP -lbfd -liberty -lintl –lm
• What to analyze– Overview of time spent in MPI communication during the
application run– Aggregate time for individual MPI call
![Page 5: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/5.jpg)
Control• External Control: Set MPIP environment variable (threshold,
callsite depth)– setenv MPIP ‘-t 10 –k 2’– export MPIP= ‘-t 10 –k 2’
• Use MPI_Pcontrol(#) to limit profiling to specific code blocks
Pcontrol Arg Behavior
0 Disable profiling.
1 Enable Profiling.
2 Reset all callsite data.
3 Generate verbose report.
4 Generate concise report.
MPI_Pcontrol(2);MPI_Pcontrol(1); MPI_Abc(…);MPI_Pcontrol(0);
call MPI_Pcontrol(2)call MPI_Pcontrol(1) call MPI_Abc(…)call MPI_Pcontrol(0)
F90
C
![Page 6: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/6.jpg)
• Untar the mpip tar file. % tar -xvf ~train00/mpip.tar; cd mpip• Read the “Instructions” file– setup Env. with sourceme-files% source sourceme.csh (C-type shells)% source sourceme.sh (Bash-type shells)
Example
module purge {Resets to default modules.}module load TACCmodule unload mvapich {Change compiler/MPI to intel/mvapich.} module swap pgi intelmodule load mvapich
module load mpiP {Load up mpiP.}
set path=($path /share/home/00692/train00/mvapich1_intel/Mpipview_2.02/bin){Sets PATH with mpipview dir. – separate software from mpiP.)}
![Page 7: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/7.jpg)
• Compile either the matmultc.c or matmultf.f90 code.
% make matmultfOr
% make matmultc
• Uncomment “ibrun matmultc” or “ibrun matmultf” in the job file.
• Submit job.% qsub job
• Results in:<exec_name><unique number>.mpiP
Example (cont.)
![Page 8: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/8.jpg)
Example *.mpiP Output
• MPI-Time: wall-clock time for all MPI calls
• MPI callsites
![Page 9: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/9.jpg)
Output (cont.)
• Message size
• Aggregate time
![Page 10: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/10.jpg)
e.g. % mpipview matmultf.32.6675.1.mpiPNote: list selection.
mpipview
![Page 11: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/11.jpg)
Indexed
![Page 12: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/12.jpg)
Call Sites
![Page 13: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/13.jpg)
Statistics
![Page 14: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/14.jpg)
Message Sizes
![Page 15: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/15.jpg)
Tau Outline
• General– Measurements– Instrumentation & Control– Example: matmult
• Profiling and Tracing– Event Tracing– Steps for Performance Evaluation– Tau Architecture
• A look at a task-parallel MxM Implementation• Paraprof Interface
15
![Page 16: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/16.jpg)
General
• Tuning and Analysis Utilities (11+ year project effort)
www.cs.uoregon.edu/research/paracomp/tau/
• Performance system framework for parallel, shared & distributed memory systems
• Targets a general complex system computation model– Nodes / Contexts / Threads
• Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
TAU = Profiler and Tracer + Hardware Counters + GUI + Database
16
![Page 17: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/17.jpg)
Tau: Measurements• Parallel profiling
– Function-level, block (loop)-level, statement-level– Supports user-defined events– TAU parallel profile data stored during execution– Hardware counter values– Support for multiple counters– Support for callgraph and callpath profiling
• Tracing– All profile-level events– Inter-process communication events– Trace merging and format conversion
17
![Page 18: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/18.jpg)
PDT is used to instrument your code.
Replace mpicc and mpif90 in make files with tau_f90.sh and tau_cc.sh
It is necessary to specify all the components that will be used in the instrumentation (mpi, openmp, profiling, counters [PAPI], etc. However, these come in a limited number of combinations.)
Combinations: First determine what you want to do (profiling, PAPI counters, tracing, etc.) and the programming paradigm (mpi, openmp), and the compiler. PDT is a required component:
Tau: Instrumentation
PAPICallpath
…
MPIOMP
…
intelpgignu
PDTHand-code
Collectors Compiler:ParallelParadigmInstrumentation
18
![Page 19: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/19.jpg)
You can view the available combinations(alias tauTypes 'ls -C1 $TAU | grep Makefile ' ).
Your selected combination is made known to the compiler wrapper through the TAU_MAKEFILE environment variable.
E.g. the PDT instrumention (pdt) for the Intel compiler (icpc) for MPI (mpi) is set with this command:
setenv TAU_MAKEFILE /…/Makefile.tau-icpc-mpi-pdt
Other run-time and instrumentation options are set through TAU_OPTIONS. For verbose:
setenv TAU_OPTIONS ‘-optVerbose’
Tau: Instrumentation
19
![Page 20: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/20.jpg)
% tar –xvf ~train00/tau.tar
% cd tau READ the Instructions file % source sourceme.csh or % source sourceme.sh create env. (modules and TAU_MAKEFILE)
% make matmultf create executable(s)or
% make matmultc
% qsub job submit job (edit and uncomment ibrun line)
% paraprof (for GUI) Analyze performance data:
Tau Example
20
![Page 21: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/21.jpg)
Definitions – Profiling• Profiling
– Recording of summary information during execution• inclusive, exclusive time, # calls, hardware statistics, …
– Reflects performance behavior of program entities• functions, loops, basic blocks• user-defined “semantic” entities
– Very good for low-cost performance assessment– Helps to expose performance bottlenecks and hotspots– Implemented through
• sampling: periodic OS interrupts or hardware counter traps• instrumentation: direct insertion of measurement code
21
![Page 22: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/22.jpg)
Definitions – Tracing
TracingRecording of information about significant points (events)
during program executionentering/exiting code region (function, loop, block, …) thread/process interactions (e.g., send/receive message)
Save information in event record timestampCPU identifier, thread identifierEvent type and event-specific information
Event trace is a time-sequenced stream of event records Can be used to reconstruct dynamic program behaviorTypically requires code instrumentation
22
![Page 23: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/23.jpg)
Event Tracing: Instrumentation, Monitor, Trace
1 master
2 worker
3 ...
void worker { trace(ENTER, 2);
... recv(A, tag, buf); trace(RECV, A);
... trace(EXIT, 2);
}
void master { trace(ENTER, 1);
... trace(SEND, B);
send(B, tag, buf); ...
trace(EXIT, 1);}
MONITOR58 A ENTER 1
60 B ENTER 2
62 A SEND B
64 A EXIT 1
68 B RECV A
...
69 B EXIT 2
...
CPU A:
CPU B:
Event definition
timestamp
23
![Page 24: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/24.jpg)
Event Tracing: “Timeline” Visualization
1 master
2 worker
3 ...
58 A ENTER 1
60 B ENTER 2
62 A SEND B
64 A EXIT 1
68 B RECV A
...
69 B EXIT 2
...
mainmasterworker
58 60 62 64 66 68 70
B
A
24
![Page 25: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/25.jpg)
Steps of Performance Evaluation
Collect basic routine-level timing profile to determine where most time is being spent
Collect routine-level hardware counter data to determine types of performance problems
Collect callpath profiles to determine sequence of events causing performance problems
Conduct finer-grained profiling and/or tracing to pinpoint performance bottlenecksLoop-level profiling with hardware countersTracing of communication operations
25
![Page 26: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/26.jpg)
TAU Performance System Architecture
26
![Page 27: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/27.jpg)
Overview of Matmult: C = A x B
CreateA & B
A BC = x
Order NP Tasks
MASTER Worker
Receive B
SendRow of A
Multiply row a x B
Send Back row of C
Receive a
ReceiveRow of C
Send B
jj x=
27
![Page 28: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/28.jpg)
Preparation of Matmult: C = A x B
PE 0
CreateA
CreateB
GenerateA & B
BroadcastB to All
by columns
PE 0 PE x
MPI_Bcast( b(1,i)…
loop over i (i=1n)
PE 0
A BC = x
Order NP Tasks
MASTER
28
![Page 29: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/29.jpg)
Master Ops of Matmult: C = A x B
Master (0) sends rows 1 through (p-1) to
slaves (1p-1) receives
1
p-1
PE 0 PE 1 -- p-1
MPI_Send(arow … i,i
loop over i (i=1p-1)
destinationtag
Master (0) receives rows 1 through (n) from
Slaves.
1
n
PE 0 PE 1 -- p-1
MPI_Recv(crow … ANY,k
loop over i (i=1n)
source,tag
MPI_Send(arow …idle,jdest,tag
A BC = x
Order NP Tasks
MASTER
29
![Page 30: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/30.jpg)
Master Ops of Matmult: C = A x B
Slave(any) sends row j of C to master, PE 0
PE 0 MPI_Send( crow … j
jrow j
Slaves multiply all Columns of B into A (row i) to form row i of Matrix C
x=
Pick up broadcast of B columns from PE 0
MPI_Recv( arow …ANY,j
loop over i (i=1n)
Matrix * Vector
row j
A BC = x
Order NP Tasks
Worker
Slave receives anyA row from PE 0 j
j
30
![Page 31: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/31.jpg)
Paraprof and Pprof
• Execute application and analyze performance data:• % qsub job
– Look for files: profile.<task_no>.
– With Multiple counters, look for directories for each counter.
• % pprof (for text based profile display)• % paraprof (for GUI)
– pprof and paraprof will discover files/directories.
– paraprof runs on PCs,Files/Directories can be downloaded to laptop and analyzed there.
31
![Page 32: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/32.jpg)
Tau Paraprof Overview
HPMToolkit
MpiP
TAU
Raw files
PerfDMFmanaged(database)
Metadata
Application
Experiment
Trial
32
![Page 33: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/33.jpg)
Tau Paraprof Manager WindowProvides Machine Details
Organizes Runs as: Applications, Experiments and Trials.
33
![Page 34: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/34.jpg)
Routine Time Experiment
Profile Information is in “GET_TIME_OF_DAY” metric
Mean and Standard Deviation Statistics given.
34
![Page 35: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/35.jpg)
Multiply_Matrices Routine Results
not from same run
Function Data Window gives a closer look at a single function:
35
![Page 36: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/36.jpg)
Float Point OPS trial
Hardware Counters provide Floating Point Operations (Function Data view).
36
![Page 37: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/37.jpg)
L1 Data Cache Miss trial
Hardware Counters provide L1 Cache Miss Operations.
37
![Page 38: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/38.jpg)
Call Path
Call Graph Paths (Must select through “thread” menu.)
38
![Page 39: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/39.jpg)
Call Path
TAU_MAKEFILE = …Makefile.tau-callpath-icpc-mpi-pdt
39
![Page 40: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/40.jpg)
Derived MetricsSelect Argument 1 (green ball); Select Argument 2 (green ball);
Select Operation; then Apply. Derived Metric will appear as a new trial.
40
![Page 41: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/41.jpg)
DerivedMetrics
Be careful even thoughratios are constant, coresmay do different amounts
of work/operations per call.
Since FP/Missratios are
constant– mustbe memory
access problem.
![Page 42: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/42.jpg)
GPROF
GPROF is the GNU Project PROFiler. gnu.org/software/binutils/
•Requires recompilation of the code.
•Compiler options and libraries provide wrappers for each routine call, and periodic sampling of the program.
•A default gmon.out file is produced with the function call information.
•GPROF links the symbol list in the executable with the data in gmon.out.
![Page 43: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/43.jpg)
Types of Profiles
• Flat Profile– CPU time spend in each function (self and cumulative) – Number of times a function is called– Useful to identify most expensive routines
• Call Graph– Number of times a function was called by other functions– Number of times a function called other functions– Useful to identify function relations– Suggests places where function calls could be eliminated
• Annotated Source– Indicates number of times a line was executed
![Page 44: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/44.jpg)
Profiling with gprofUse the -pg flag during compilation:% gcc -g -pg srcFile.c% icc -g -pg srcFile.c% pgcc -g -pg srcFile.c
Run the executable. An output file gmon.out will be generated with the profiling information.
Execute gprof and redirect the output to a file:% gprof exeFile gmon.out> profile.txt% gprof -l exeFile gmon.out> profile_line.txt% gprof -A exeFile gmon.out>profile_anotated.txt
![Page 45: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/45.jpg)
gprof example
Move into the profiling examples directory:
% cd profile/
Compile matvecop with the profiling flag:
% gcc -g -pg -lm matvecop.c {@ ~train00/matvecop.c}
Run the executable to generate the gmon.out file:
% ./a.out
Run the profiler and redirect output to a file:
% gprof a.out gmon.out > profile.txt
Open the profile file and study the instructions.
![Page 46: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/46.jpg)
Visual Call Graph
mainmain
matSqrtmatSqrt
vecSqrtvecSqrt
matCubematCube vecCubevecCubesysSqrtsysSqrt sysCubesysCube
![Page 47: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/47.jpg)
Flat profile
In the flat profile we can identify the most expensive parts of the code (in this case, the calls to matSqrt, matCube, and sysCube).
% cumulative self self total
time seconds seconds calls s/call s/call name
50.00 2.47 2.47 2 1.24 1.24 matSqrt
24.70 3.69 1.22 1 1.22 1.22 matCube
24.70 4.91 1.22 1 1.22 1.22 sysCube
0.61 4.94 0.03 1 0.03 4.94 main
0.00 4.94 0.00 2 0.00 0.00 vecSqrt
0.00 4.94 0.00 1 0.00 1.24 sysSqrt
0.00 4.94 0.00 1 0.00 0.00 vecCube
![Page 48: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/48.jpg)
Call Graph Profileindex % time self children called name
0.00 0.00 1/1 <hicore> (8)
[1] 100.0 0.03 4.91 1 main [1]
0.00 1.24 1/1 sysSqrt [3]
1.24 0.00 1/2 matSqrt [2]
1.22 0.00 1/1 sysCube [5]
1.22 0.00 1/1 matCube [4]
0.00 0.00 1/2 vecSqrt [6]
0.00 0.00 1/1 vecCube [7]
-----------------------------------------------
1.24 0.00 1/2 main [1]
1.24 0.00 1/2 sysSqrt [3]
[2] 50.0 2.47 0.00 2 matSqrt [2]
-----------------------------------------------
0.00 1.24 1/1 main [1]
[3] 25.0 0.00 1.24 1 sysSqrt [3]
1.24 0.00 1/2 matSqrt [2]
0.00 0.00 1/2 vecSqrt [6]
-----------------------------------------------
![Page 49: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/49.jpg)
Profiling dos and don’ts
DO
• Test every change you make
• Profile typical cases• Compile with
optimization flags• Test for scalability
DO NOT
• Assume a change will be an improvement
• Profile atypical cases• Profile ad infinitum
– Set yourself a goal or– Set yourself a time limit
![Page 50: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/50.jpg)
PAPI Implementation
Tools
PAPI Low LevelPAPI High Level
Hardware Performance Counter
Operating System
Kernel Extension
PAPI Machine Dependent SubstrateMachine
SpecificLayer
PortableLayer
50
![Page 51: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/51.jpg)
PAPI Performance Monitor
• Provides high level counters for events:– Floating point instructions/operations, – Total instructions and cycles– Cache accesses and misses– Translation Lookaside Buffer (TLB) counts– Branch instructions taken, predicted, mispredicted
• PAPI_flops routine for basic performance analysis– Wall and processor times– Total floating point operations and MFLOPS http://icl.cs.utk.edu/projects/papi
• Low level functions are thread-safe, high level are not
51
![Page 52: Kent Milfeld TACC May 28, 2009](https://reader035.fdocuments.us/reader035/viewer/2022081517/568154bd550346895dc2c56e/html5/thumbnails/52.jpg)
High-level API
• C interfacePAPI_start_countersPAPI_read_countersPAPI_stop_countersPAPI_accum_countersPAPI_num_countersPAPI_flips PAPI_ipc
• Fortran interfacePAPIF_start_countersPAPIF_read_countersPAPIF_stop_countersPAPIF_accum_countersPAPIF_num_countersPAPIF_flipsPAPIF_ipc
52
Instrumenting Code:1.) Specify “events” in an array.2.) Initialize library.3.) Start counting events.4.) Stop counting (and accumulate).