DPOMP: OpenMP Tool Infrastructure, SCICOMP, Bologna, March 2004
© 2004 Bernd Mohr
dPOMP: An Infrastructure for Performance Monitoring of OpenMP Applications
Bernd Mohr
Forschungszentrum Jülich (FZJ)
John von Neumann - Institut für Computing (NIC)
Zentralinstitut für Angewandte Mathematik (ZAM)
52425 Jülich, [email protected]
© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [2]
dPOMP Team
• Luiz DeRose
  • IBM Research, ACTC
  • Yorktown Heights, NY, USA
  • [email protected]
• Seetharami Seelam
  • IBM Research, ACTC
  • Yorktown Heights, NY, USA
  • [email protected]
• Bernd Mohr
  • Forschungszentrum Jülich, ZAM
  • [email protected]
Thomas J. Watson Research Center
PO Box 218
Yorktown Heights, NY 10598
Outline
• What is POMP?
• What is DPCL?
• IBM compiler and run-time library features that make dPOMP possible
• dPOMP Implementation
• Examples of use
The Motivation: PMPI - The MPI Profiling Interface
• PMPI allows selective replacement of MPI routines at link time ⇒ no re-compilation necessary
• Uses technique of “wrapper” function libraries
• Used by most MPI performance tools
  • Vampirtrace, MP_profiler, MPICH MPE, TAU, EPILOG, …
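[Figure: a user program calls MPI_Bcast and MPI_Send. Without a profiling library, both calls resolve to the MPI library, which provides MPI_Bcast and MPI_Send (also available as PMPI_Send). With a profiling library, the call to MPI_Send resolves to the profiling library's MPI_Send, which in turn calls PMPI_Send in the MPI library; MPI_Bcast still resolves to the MPI library.]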
“Standard” OpenMP Monitoring API?
• Problem:
  • OpenMP (unlike MPI) does not define a standard monitoring interface
  • OpenMP is defined mainly by directives/pragmas
• Solution:
  • POMP: OpenMP Monitoring Interface
  • Joint Development
    – Forschungszentrum Jülich
    – University of Oregon
  • Presented at EWOMP’01, LACSI’01 and SC’01
  • “The Journal of Supercomputing”, 23, Aug. 2002.
POMP Instrumentation
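[Figure: routes from an OpenMP program to an executable that calls the POMP monitoring library:
  • POMP preprocessor → POMP instrumented program → OpenMP compiler → executable
  • OpenMP compiler with --pomp → POMP enabled executable
  • OpenMP compiler → executable running on a POMP enabled RTS
  • OpenMP compiler → executable → binary instrumentor]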
Prototype POMP Instrumentation Tool
• OPARI: OpenMP Pragma And Region Instrumentor
• Source-to-source translator to insert POMP calls around OpenMP constructs and API functions
• Implemented in C++
• Supports:
  • Fortran77 and Fortran90, OpenMP 2.0
  • C and C++, OpenMP 1.0
  • Additional POMP directives for control and region definition
  • EPILOG and TAU POMP measurement libraries
  • Preserves source code information (#line line file)
• Does not support: Instrumentation of user functions
• http://www.fz-juelich.de/zam/kojak/opari/
OpenMP Monitoring APIs: Other Projects
• European IST Project INTONE
  • Development of OpenMP programming environment (includes monitoring interface)
  • Pallas, CEPBA, Royal Inst. of Technology, TU Dresden
  • http://www.cepba.upc.es/intone/
• Intel KAI Software Laboratory (KSL), VGV (Vampir+Guide)
  • Development of OpenMP monitoring interface inside ASCI
  • Based on POMP, but further developed in other directions
• Current status:
  • Design of joint proposal POMP2 == POMP (presented at EWOMP’02)
  • Investigating standardization through OpenMP Forum (??)
POMP Functionality
• Call of POMP routines at significant points (“events”) during execution of OpenMP programs
• Instrumentation-time (static) and run-time (dynamic) event context is passed as parameters to POMP routines
• Allows specification of extent of
  • Instrumentation
  • Monitoring
• Organization of events into groups and assignment to levels allows for flexible yet simple control
OpenMP Event Model
• OpenMP Directives/Pragmas
  • ENTER/EXIT of OpenMP construct plus BEGIN/END of corresponding structured block
  • Special case parallel loop: CHUNKBEGIN/END, ITERBEGIN/END or ITEREVENT instead of BEGIN/END
  • “Single events” for small constructs like atomic or flush
• OpenMP API calls
  • ENTER/EXIT for omp_set_*_lock() functions
  • “Single events” for all API functions
• User functions and regions
  • ENTER/EXIT or “single events”
Example: Standard Instrumentation
(numbered lines are the original program; lines marked *** are inserted by the instrumentor)
  1: int main() {
  2:   int id;
  3:
***   POMP_Init();
***   { POMP_handle_t pomp_hd1 = 0;
***     int32 pomp_tid = omp_get_thread_num();
***     POMP_Parallel_enter(&pomp_hd1, pomp_tid, -1, 1,
***       "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**");
  4: #pragma omp parallel private(id)
  5:   {
***     int32 pomp_tid = omp_get_thread_num();
***     POMP_Parallel_begin(pomp_hd1, pomp_tid);
  6:     id = omp_get_thread_num();
  7:     printf("hello from %d\n", id);
***     POMP_Parallel_end(pomp_hd1, pomp_tid);
  8:   }
***     POMP_Parallel_exit(pomp_hd1, pomp_tid);
***   }
***   POMP_Finalize();
  9: }
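The last argument of POMP_Parallel_enter carries the instrumentation-time context as key=value fields separated by '*'. As a minimal sketch of how a monitoring library might look up one field, assuming only the format visible in the example string: the helper name ctc_get is hypothetical and not part of POMP, and the leading number is skipped without being interpreted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of POMP): copy the value of `key` from a
 * '*'-separated context string such as
 *   "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**"
 * into `out`. Returns 1 if the key was found, 0 otherwise. */
int ctc_get(const char *ctc, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = ctc;

    if (outlen == 0)
        return 0;

    while (*p != '\0') {
        /* A field runs up to the next '*' or to the end of the string. */
        const char *end = strchr(p, '*');
        size_t flen = end ? (size_t)(end - p) : strlen(p);

        /* Match "key=" at the start of the field. */
        if (flen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = flen - klen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;      /* truncate to fit the buffer */
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        if (end == NULL)
            break;
        p = end + 1;                    /* skip the '*' separator */
    }
    out[0] = '\0';                      /* key not present */
    return 0;
}
```

For the string in the example, looking up "type" gives pregion and "file" gives demo.c; a key that does not occur leaves the output buffer empty.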
Example: Optimized Instrumentation
  1: int main() {
  2:   int id;
***   POMP_handle_t pomp_hd1 = 0;
***   POMP_Init();
***   POMP_Get_handle(&pomp_hd1,
***     "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**");
  3:
***   { int32 pomp_tid = omp_get_thread_num();
***     POMP_Parallel_enter(&pomp_hd1, pomp_tid, -1, 1, NULL);
  4: #pragma omp parallel private(id)
  5:   {
***     int32 pomp_tid = omp_get_thread_num();
***     POMP_Parallel_begin(pomp_hd1, pomp_tid);
  6:     id = omp_get_thread_num();
  7:     printf("hello from %d\n", id);
***     POMP_Parallel_end(pomp_hd1, pomp_tid);
  8:   }
***     POMP_Parallel_exit(pomp_hd1, pomp_tid);
***   }
***   POMP_Finalize();
  9: }
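To make the receiving side of these calls concrete, the following sketch implements the four parallel-region routines as stubs that record the order of events. The argument lists are inferred from the calls in the example; the typedefs for POMP_handle_t and int32, the parameter names, the event_log buffer, and replay_demo are assumptions for illustration and not the POMP definition.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed typedefs: the slides use these names without defining them. */
typedef void *POMP_handle_t;
typedef int32_t int32;

/* Illustrative event log, not part of POMP and not thread-safe. */
static char event_log[256];

static void log_event(const char *name, int32 tid)
{
    size_t used = strlen(event_log);
    snprintf(event_log + used, sizeof event_log - used,
             "%s(%d);", name, (int)tid);
}

/* The meaning of the third and fourth arguments is not given on the
 * slides, so this stub accepts and ignores them. */
void POMP_Parallel_enter(POMP_handle_t *handle, int32 tid,
                         int32 arg3, int32 arg4, const char *ctc)
{
    (void)handle; (void)arg3; (void)arg4; (void)ctc;
    log_event("enter", tid);
}

void POMP_Parallel_begin(POMP_handle_t handle, int32 tid)
{
    (void)handle;
    log_event("begin", tid);
}

void POMP_Parallel_end(POMP_handle_t handle, int32 tid)
{
    (void)handle;
    log_event("end", tid);
}

void POMP_Parallel_exit(POMP_handle_t handle, int32 tid)
{
    (void)handle;
    log_event("exit", tid);
}

/* Replays the calls of the instrumented example for a single thread, so
 * the sketch builds without an OpenMP compiler. */
const char *replay_demo(void)
{
    POMP_handle_t hd = 0;

    event_log[0] = '\0';
    POMP_Parallel_enter(&hd, 0, -1, 1, NULL);
    POMP_Parallel_begin(hd, 0);
    POMP_Parallel_end(hd, 0);
    POMP_Parallel_exit(hd, 0);
    return event_log;
}
```

replay_demo() returns the events in the order enter, begin, end, exit. In an actual OpenMP run, begin and end would be issued once by every thread of the team, so a real library would need per-thread or synchronized storage instead of the single buffer used here.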
dPOMP Motivation
• Need for testbed for POMP2 proposal
  • May never be accepted by the OpenMP ARB
  • Even if accepted, may take too long to be implemented
• Need for POMP implementation based on dynamic instrumentation
  • src-to-src: OPARI
  • compiler: INTONE
  • run-time lib: KSL-POMP
• Our Approach
  • A POMP implementation based on dynamic probes
  • Built on top of IBM's DPCL
What Is DPCL?
• C++ Based Class Library
  • IBM Poughkeepsie Unix Development Lab
  • 11 Classes, Plus Additional API's
• Dynamic Instrumentation - Software Probes
  • Based on DynInst and Paradyn
• Language/Programming Model Independent
  • Supports Fortran, Fortran 90, C, C++
  • Requires only information from the executable (a.out)
• Provides a general purpose infrastructure for:
  • Serial, shared memory, and message passing
• A Platform to Enable Tools Developers To Build Tools With Less Time And Effort
DPCL Probes
• DPCL allows tools to insert data, functions, and code patches (probes) into a program dynamically
  • Call site
  • Call entry
  • Call exit
• Probes can collect and report program information, program state, or modify the program execution
• Probes may be placed at specific locations in the program and can be activated:
  • Whenever execution reaches that location
  • By expiration of a timer
  • Exactly once
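The IBM Compiler and Run-time Library
[Figure: how the IBM compiler transforms OpenMP source code, and where POMP calls attach.
  • Source code: main() calls A(); A() contains an OMP parallel region (OMP parallel ... OMP end parallel) enclosing an OMP loop.
  • Compiler generated: main() calls A(); A() calls the run-time library routine xlf_Par (master thread); the outlined function A@0L1 calls xlf_DoPar (all threads); the outlined function A@0L1@OL2 holds the loop (do I=start,end / loop body / enddo).
  • POMP calls shown: POMP_Function_enter, POMP_Function_exit, POMP_Parallel_enter, POMP_Parallel_exit, POMP_Parallel_begin, POMP_Parallel_end, POMP_Loop_enter, POMP_Loop_exit, POMP_Loop_chunk_begin, POMP_Loop_chunk_end.]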
Limitations
• 63 out of 68 POMP events supported!
• Limitations due to compiler issues
  • POMP_Loop_iter_(begin, or end, or event)
  • POMP_Implicit_barrier_(end, or exit)
  • OMP Parallel Loop NOT = OMP Parallel / OMP Loop
  • Compile Time Context (CTC)
    – hasFirstPrivate, hasLastPrivate, hasNowait, hasCopyin, schedule, hasOrdered, and hasCopypriv not available
• Limitations due to DPCL issues
  • Loop iteration values (init, final, incr, chunk)
• Limitations due to lack of time …
  • C++ methods instrumentation support
Changes and Extensions Due to Open Issues
• Fully defined attributes and values for CTC string
• Event handler is always passed by reference
• Finer instrumentation control
  • User defined functions
    – Function calls in the “main” program (outside parallel regions) plus all MPI calls are instrumented by default
    – User can provide a file with functions to instrument
  • POMP Events
    – Only events supplied in the monitoring libraries are instrumented
dPOMP Tool
• Basic usage
    % dpomp <pomp-lib> <exe>
  • <pomp-lib> POMP compliant monitoring library
  • <exe> OpenMP application (or mixed-mode)
• Performs binary instrumentation
  • Amount of instrumentation can be controlled
    – By the tool builder: Set of POMP calls available in the monitoring library
    – By the user: Environment variables
• Executes instrumented application
dPOMP Tool
• Selective instrumentation of user functions
    % dpomp -l <func-list-file> <exe>
    Edit <func-list-file>
    % dpomp -f <func-list-file> <pomp-lib> <exe>
• Predefined POMP libraries (probes)
  • pomprof_probe (to generate *.viz profiles)
  • elg_probe (to generate EPILOG trace files)
• Trial package available from IBM Alphaworks for 2004
  • dPOMP + pomprof_probe
  • http://www.alphaworks.ibm.com/tech/dpomp/
POMP Profiler Library (POMPROF)
• POMP compliant library from IBM ACTC
• Generates a detailed profile describing overheads and time spent by each thread in three key regions of the parallel application:
  • Parallel regions
  • OpenMP loops inside a parallel region
  • User defined functions
• Profile data
  • Presented in the form of an XML file
  • Visualized with PeekPerf
Example: PeekPerf Visualization of POMPROF Output
KOJAK POMP Tracing Library: elg_probe
• POMP monitoring library which generates EPILOG event traces
• Processed by KOJAK’s automatic event trace analyzer EXPERT
The KOJAK Project
• KOJAK: Kit for Objective Judgement and Automatic Knowledge-based detection of bottlenecks
• Long-term goals
  • Design and Implementation of a Portable, Generic, and Automatic Performance Analysis Environment
• Current focus
  • Event Tracing
  • Parallel computers with SMP nodes
  • MPI, OpenMP, Hybrid (OpenMP + MPI) programming model
  • Development of research prototypes
Overall KOJAK Architecture
[Figure: KOJAK tool chain.
  • Semi-automatic Instrumentation: user program → OPARI / TAU instr. → modified program → Compiler / Linker (with POMP+PMPI libraries and EPILOG trace library) → executable
  • execute → EPILOG event trace
  • Automatic Analysis: EXPERT Analyzer (EARL) → analysis result → EXPERT Presenter
  • Manual Analysis: trace converter → VTF3 event trace → VAMPIR]
KOJAK Architecture on IBM AIX
[Figure: KOJAK tool chain with dPOMP.
  • executable → execute with dpomp → EPILOG event trace
  • Automatic Analysis: EXPERT Analyzer (EARL) → analysis result → EXPERT Presenter
  • Manual Analysis: trace converter → VTF3 event trace → VAMPIR]
[Figure: annotated screenshot of the analysis display, with these callouts:
  • Performance Property: What problem?
  • Region Tree: Where in source code? In what context?
  • Location: How is the problem distributed across the machine?
  • Color Coding: How severe is the problem?]
EPILOG Trace Converted to VTF3
• EPILOG-to-VTF3
  • Maps OpenMP constructs into VAMPIR symbols and activities
Conclusion
• Very productive and effective collaboration with IBM ACTC
• Innovative tool infrastructure for OpenMP
• Available at IBM alphaworks
Future Work
• OPARI
  • Support for POMP2
• dPOMP
  • More extensive evaluations
  • Finish missing features
  • Remove limitations?