S3D: Comparing Performance of XT3+XT4 with XT4 Sameer Shende tau-team@cs.uoregon.edu
Scalability Study of S3D using TAU Sameer Shende tau-team@cs.uoregon.edu
TAU Performance System: S3D Scalability Study
Acknowledgements
Alan Morris [UO]
Kevin Huck [UO]
Allen D. Malony [UO]
Kenneth Roche [ORNL]
Bronis R. de Supinski [LLNL]
The performance data presented here is available at:
http://www.cs.uoregon.edu/research/tau/s3d
TAU Parallel Performance System
http://www.cs.uoregon.edu/research/tau/
Multi-level performance instrumentation
  Multi-language automatic source instrumentation
Flexible and configurable performance measurement
Widely-ported parallel performance profiling system
  Computer system architectures and operating systems
  Different programming languages and compilers
Support for multiple parallel programming paradigms
  Multi-threading, message passing, mixed-mode, hybrid
Scalability Study
Harness testcase
Platform: Jaguar Cray XT3 at ORNL
  1p, 8p, 64p, 512p
Goal: to evaluate scaling properties of code regions
  Scalability of MPI operations
Introduction to ParaProf: Main Window
click left mouse button
click right mouse button
% paraprof *.ppk
Load all 1p, 8p, 64p, 512p profile datasets together
ParaProf: MFLOPs sorted by Exclusive Time
Source Code View
Comparison Window: Inclusive Time
Comparing Level 1 Data Cache Misses
CPU Resource Stalls
ParaProf: 3D view for 512 cpus - Jagged Edges!
MPI_Wait - Jagged Edges Seen in 3D Window
pattern repeats every 8 cpus!
512 cpus
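One way to check for a repeating pattern like the one noted above, outside the 3D window, is to fold per-rank MPI_Wait times by rank modulo the suspected period and compare the folded means. The sketch below is only an illustration: the function name `fold_by_period` and the per-rank values are hypothetical and synthetic, not measurements from the S3D runs.

```python
def fold_by_period(values, period):
    """Average per-rank values by (rank % period).

    values: list where index = MPI rank, value = time spent in an event.
    Returns a list of `period` means, one per position within the period.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    sums = [0.0] * period
    counts = [0] * period
    for rank, value in enumerate(values):
        sums[rank % period] += value
        counts[rank % period] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Synthetic stand-in for per-rank MPI_Wait times on 512 ranks (NOT real data):
# a baseline plus an offset that depends on the rank's position within a group of 8.
wait_times = [10.0 + 0.5 * (rank % 8) for rank in range(512)]

profile = fold_by_period(wait_times, 8)
spread = max(profile) - min(profile)
print("mean MPI_Wait by rank % 8:", profile)
print("spread across positions:", spread)
```

A large spread across the folded means at period 8, compared with other candidate periods, would be consistent with the jagged edges repeating every 8 ranks.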
MPI_Wait - Histogram (Bins) View
Comparing MPI_Wait
MPI_Wait time increases steadily with processors!
PerfDMF: Performance Data Mgmt. Framework
PerfExplorer - Comparative Analysis
Relative speedup, efficiency
  total runtime, by event, one event, by phase
Breakdown of total runtime
Group fraction of total runtime
Correlating events to total runtime
Timesteps per second
PerfExplorer
TAU’s PerfDMF database
S3D
PerfExplorer: Select Experiment & Analysis
Relative Efficiency By Event
Relative Efficiency For S3D - Weak Scaling
Relative Speedup
Relative Efficiency & Speedup for One Event
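As a rough illustration of the quantities charted here, the sketch below computes relative speedup and relative efficiency against the smallest run, using one common textbook definition for strong and weak scaling. It is not taken from PerfExplorer, whose exact formulas may differ, and the timings are hypothetical placeholders rather than S3D measurements.

```python
def relative_speedup(times, weak=False):
    """times: dict mapping processor count -> elapsed time for one event (seconds).

    Strong scaling: speedup(p) = T(base) / T(p).
    Weak scaling:   work grows with p, so speedup(p) = (p / base) * T(base) / T(p).
    """
    base_p = min(times)
    base_t = times[base_p]
    speedup = {}
    for p, t in sorted(times.items()):
        ratio = base_t / t
        speedup[p] = ratio * (p / base_p) if weak else ratio
    return speedup


def relative_efficiency(times, weak=False):
    """Efficiency = speedup divided by the increase in processor count."""
    base_p = min(times)
    speedup = relative_speedup(times, weak)
    return {p: s * base_p / p for p, s in speedup.items()}


# Hypothetical per-event timings at the processor counts used in the study.
timings = {1: 100.0, 8: 110.0, 64: 125.0, 512: 160.0}

eff = relative_efficiency(timings, weak=True)
for p in sorted(eff):
    print(f"{p:4d}p  relative efficiency = {eff[p]:.3f}")
```

Under weak scaling the efficiency reduces to T(base) / T(p), so an event whose time stays flat as processors are added has efficiency 1.0.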
Data Mining: Event Correlation to Total Time
r = 1 implies direct correlation
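The r value here is presumably a Pearson correlation coefficient between an event's time and the total runtime across the runs; that reading is an assumption, since the slide does not name the statistic. A minimal sketch with hypothetical timings (not S3D data) follows.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for four runs (1p, 8p, 64p, 512p); NOT measured data.
total_time = [100.0, 110.0, 125.0, 160.0]
event_a = [2.0, 12.0, 27.0, 62.0]    # grows in lockstep with total time
event_b = [50.0, 49.0, 51.0, 50.0]   # roughly flat across runs

print("event_a vs total:", pearson_r(event_a, total_time))
print("event_b vs total:", pearson_r(event_b, total_time))
```

An event whose time tracks total runtime exactly yields r close to 1, which marks it as a candidate for explaining the loss of scalability.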
MPI Scaling
Total Runtime Breakdown by Events
S3D - Building with TAU
Change name of compiler in build/make.XT3
  ftn => tau_f90.sh
  cc => tau_cc.sh
Set compile time environment variables
  setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
    Choose callpath, PAPI counters, MPI profiling, PDT for source instrumentation
  setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optPreProcess'
    Selective instrumentation file eliminates instrumentation in lightweight routines
    Pre-process Fortran source code using cpp before compiling
Set runtime environment variables for instrumentation control and PAPI counter selection in job submission script:
  export TAU_THROTTLE=1
  export COUNTER1=GET_TIME_OF_DAY
  export COUNTER2=PAPI_FP_INS
  export COUNTER3=PAPI_L1_DCM
  export COUNTER4=PAPI_RES_STL
  export COUNTER5=PAPI_L2_DCM
Selective Instrumentation in TAU
% cat select.tau
BEGIN_EXCLUDE_LIST
MCADIF
GETRATES
TRANSPORT_M::MCAVIS_NEW
MCEDIF
MCACON
CKYTCP
THERMCHEM_M::MIXCP
THERMCHEM_M::MIXENTH
THERMCHEM_M::GIBBSENRG_ALL_DIMT
CKRHOY
MCEVAL4
THERMCHEM_M::HIS
THERMCHEM_M::CPS
THERMCHEM_M::ENTROPY
END_EXCLUDE_LIST
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
Getting Access to TAU on Jaguar
set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path)
Choose Stub Makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.*
  Makefile.tau-mpi-pdt-pgi (flat profile)
  Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir)
  Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile)
Binaries of S3D can be found in: ~sameer/scratch/S3D-BINARIES
  withtau
    papi, multiplecounters, mpi, pdt, pgi options
  without_tau
Concluding Discussion
Performance tools must be used effectively
More intelligent performance systems for productive use
  Evolve to application-specific performance technology
  Deal with scale by “full range” performance exploration
  Autonomic and integrated tools
  Knowledge-based and knowledge-driven process
Performance observation methods do not necessarily need to change in a fundamental sense
  More automatically controlled and efficiently used
Develop next-generation tools and deliver to community
Open source with support by ParaTools, Inc.
http://www.cs.uoregon.edu/research/tau
Support Acknowledgements
Department of Energy (DOE)
Office of Science
LLNL, LANL, ORNL, ASC
PERI