Prof. Dr. Michael Gerndt, Technische Universität München, gerndt@in.tum.de
Beyond Automatic Performance Analysis
Prof. Dr. Michael Gerndt, Technische Universität München
www.autotune-project.eu
Performance Analysis and Tuning is Essential
Performance Analysis at Scale
A high-level description of the performance of the cosmology code MADCAP.
Source: David Skinner, NERSC
Performance Analysis for Parallel Systems
• Development cycle
  – Assumption: Reproducibility
• Instrumentation
  – Static vs dynamic
  – Source-level vs binary-level
• Monitoring
  – Software vs hardware
  – Statistical profiles vs event traces
• Analysis
  – Source-based tools
  – Visualization tools
  – Automatic analysis tools
[Figure: development cycle with the stages Coding, Performance Monitoring and Analysis, Production, and Program Tuning]
Periscope
• Automated search
  – Based on formalized performance properties
• Online analysis
  – Search performed while application is executing
• Distributed search
  – User-specified number of analysis agents
  – Additional cores for agents
• Profile data only
  – Even for MPI wait time analysis
Properties
• StallCycles(Region, Rank, Thread, Metric, Phase)
  – Condition: Percentage of lost cycles > 30%
  – Confidence: 1.0
  – Severity: Percentage of lost cycles
• StallCyclesIntegerLoads
  – Requires access to two counters
• L3MissesDominatingMemoryAccess
  – Condition: Importance of L3 misses (theoretical latencies)
  – Severity: Importance multiplied by actual stall cycles
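A property as described above combines a condition, a confidence, and a severity. The following is a minimal Python sketch of that idea, not Periscope's actual implementation. The class name `StallCyclesProperty`, its fields, and the helper `rank_by_severity` are hypothetical; only the 30% threshold, the confidence of 1.0, and the definition of severity as the percentage of lost cycles come from the slide.

```python
from dataclasses import dataclass


@dataclass
class StallCyclesProperty:
    """Hypothetical stand-in for a StallCycles-style property."""
    region: str
    stall_cycles: int
    total_cycles: int
    threshold: float = 30.0  # condition from the slide: > 30% lost cycles

    def lost_percentage(self) -> float:
        # Guard against regions that were never executed.
        if self.total_cycles == 0:
            return 0.0
        return 100.0 * self.stall_cycles / self.total_cycles

    def condition(self) -> bool:
        # The property holds only if the lost cycles exceed the threshold.
        return self.lost_percentage() > self.threshold

    def confidence(self) -> float:
        # Counter-based measurement, so the slide assigns full confidence.
        return 1.0

    def severity(self) -> float:
        # Severity is the percentage of lost cycles itself.
        return self.lost_percentage()


def rank_by_severity(properties):
    """Keep the properties whose condition holds, most severe first."""
    found = [p for p in properties if p.condition()]
    return sorted(found, key=lambda p: p.severity(), reverse=True)
```

Ranking the properties that hold by severity is one way a tool could answer the question of which problem is the most severe.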
Periscope Design
[Figure: Periscope architecture with the components Interactive Frontend, Performance Analysis Agent Network (Master Agent, Communication Agent, Analysis Agent), Application with Monitor, and MRI]
Agent Search Strategies
• An application phase is a period of the program's execution
  – Phase regions
    • Full program
    • Single user region assumed to be repetitive
  – Phase boundaries have to be global (SPMD programs)
• Search strategies
  – Determine hypothesis refinement
    • Region nesting
    • Property hierarchy-based refinement
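Refinement along the region nesting can be pictured as a top-down search: a hypothesis is tested for a region, and only if it holds are the nested regions tested in the next step. The Python sketch below illustrates that reading under stated assumptions. The function `refine_by_region_nesting`, the `children` mapping, and the `holds` callback are hypothetical names, and the slides do not specify the exact refinement rule Periscope uses.

```python
def refine_by_region_nesting(root, children, holds):
    """Return the regions where the hypothesis holds, refining top-down.

    root:     name of the outermost region (for example the phase region)
    children: dict mapping a region to the regions nested inside it
    holds:    callback that evaluates the hypothesis for one region
    """
    proven = []
    candidates = [root]
    # Each pass over `candidates` corresponds to one search step, which in
    # an online tool could map to one execution of the phase region.
    while candidates:
        next_candidates = []
        for region in candidates:
            if holds(region):
                proven.append(region)
                # Refine: look for the same problem in the nested regions.
                next_candidates.extend(children.get(region, []))
        candidates = next_candidates
    return proven
```

Regions nested inside a region where the hypothesis fails are never examined, which keeps the number of measurements per step small.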
Agent Search Strategies
• Stall cycle analysis
• Stall cycle analysis with breadth-first search
• MPI strategy
• OpenMP strategy
• OMP Scalability strategy
• Benchmarking strategy
• Default strategy
Crystal Growth Simulation
Results
USER_REGION; cx.f:61
  78.361; IA64 Pipeline Stall Cycles
  46.250; Stalls due to L1D TLB misses ...
  36.018; L3 misses dominate memory access
  30.294; Stalls due to waiting for FP register
CALL_REGION; cx.f:70
  52.324; IA64 Pipeline Stall Cycles
  29.468; L3 misses dominate memory access
  26.858; Stalls due to L1D TLB misses ...
  23.972; Stalls due to waiting for FP register
velo; cx.f:622
  52.316; IA64 Pipeline Stall Cycles
  29.465; L3 misses dominate memory access
  26.854; Stalls due to L1D TLB misses ...
  23.972; Stalls due to waiting for FP register
LOOP_REGION; cx.f:731
  32.512; IA64 Pipeline Stall Cycles
  24.098; L3 misses dominate memory access
  20.035; Stalls due to waiting for FP register
Integration in Eclipse (PTP)
[Figure: screenshot of the Eclipse (PTP) integration with the callouts "Where is the problem?", "What is the most severe problem?", and "Filter problems for region"]
PerSyst: Periscope System Monitoring
• Distributed fault-tolerant architecture
• Incremental analysis
[Figure: PerSyst on the Altix 4700 with the components High Level Agent (synchronization and aggregation), Analysis Agent and IO Agent (1 agent per partition), and Data Base]
PerSyst: Data Reduction
• Aggregation of data in properties
• Aggregation per job
[Figure: High Level Agent, Analysis Agents, IO Agents, and Data Base, annotated with the data volumes 280 MB/day, < 244 MB/day, and < 18 MB/day]
AutoTune
• Automatic application tuning
  – Performance and energy
• Parallel architectures
  – HPC and parallel servers
  – Homogeneous and heterogeneous
  – Multicore and GPU-accelerated systems
  – Reproducible execution capabilities
• Variety of parallel paradigms
  – MPI, HMPP, parallel patterns
Partners
Technische Universität München
Universität Wien
CAPS Entreprises
Universitat Autònoma de Barcelona
Leibniz Computing Centre
National University of Ireland, Galway, ICHEC
Autotune Approach
• Predefined tuning strategies combining performance analysis and tuning
• Plugins
  – Compiler-based optimization
  – HMPP tuning for GPUs
  – Parallel pattern tuning
  – MPI tuning
  – Energy efficiency tuning
Periscope Tuning Framework
• Online
  – Analysis and evaluation of tuned version in single application run
  – Multiple versions in single step due to parallelism in application
• Result
  – Tuning recommendation
  – Adaptation of source code and/or execution environment
  – Impact on production runs
Autotuning Extension in HMPP
• Directives to provide the optimization space to explore
  – Parameterized loop transformations
  – Alternative/specialized code declarations to specify various implementations
• Runtime API
  – Optimization space description
  – Static code information collection
  – Dynamic information collection (e.g. timing, parameter values)
#pragma hmppcg(CUDA) unroll(RANGE), jam
for( i = 0 ; i < n; i++ ) {
  for( j = 0 ; j < n; j++ ) {
    . . .
    VC(j,i) = alpha*prod+beta * VC(j,i);
  }
}
Energy Efficiency Plugin (LRZ)
1. Preparation phase:
  • Selection of possible core frequencies
  • Selection of regions for code instrumentation
2. "While" loop (until region refinement):
  "For" loop (all frequencies):
    a) Set new frequency of tuned region
    b) Periscope analysis (instrumentation, (re-)compiling, start and stop experiment)
    c) Measure execution time + energy of tuned region
    d) Evaluate experiment
  "End for"
3. Evaluate results of all experiments in refinement loop
4. Store best frequency-region combination
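The loop structure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's code. The function `tune_frequencies`, the `run_experiment` callback, and the default objective are hypothetical. The slide does not say how "best" is defined, so the energy-delay product used here is an assumption, and the outer refinement loop is simplified to a fixed list of regions.

```python
def energy_delay_product(seconds, joules):
    """Assumed objective: lower is better."""
    return seconds * joules


def tune_frequencies(regions, frequencies, run_experiment,
                     objective=energy_delay_product):
    """Pick one frequency per region by exhaustive search.

    run_experiment(region, freq) stands in for steps a) to c): it sets the
    frequency, runs the instrumented region, and returns (seconds, joules).
    """
    best = {}
    for region in regions:              # simplified refinement loop
        results = []
        for freq in frequencies:        # "For" loop over all frequencies
            seconds, joules = run_experiment(region, freq)
            # Step d): evaluate the experiment with the chosen objective.
            results.append((objective(seconds, joules), freq))
        # Steps 3 and 4: compare all experiments and store the best one.
        best[region] = min(results)[1]
    return best
```

Passing a different `objective`, for example one that returns only the energy, changes which frequency is stored without touching the search loop.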
Parallel Pattern/MPI Tuning
• The PP or MPI plugin encloses a performance model (e.g. master/worker) based on Periscope-provided performance data
• Automatically generated tuning decisions are sent to the Tuner
• The Tuner dynamically modifies the application before the next experiment
[Figure: PP or MPI Plugin + Tuner]
Expected Impact
• Improved performance of applications
• Reduced power consumption of parallel systems
• Facilitated program development and porting
• Reduced time for application tuning
• Leadership of European performance tools groups
• Strengthened European HPC industry
THANK YOU