AMD ROCm GPU profiling in Trace Compass
Arnaud Fiorini with Pr. Michel Dagenais
May 8th, 2020
Polytechnique Montreal
DORSAL Laboratory
Agenda
I. Introduction
  1. GPU Development
  2. Optimization Problems
II. Tracing and profiling of CPU-GPU systems
  1. ROC Platform
  2. Tracing GPUs
  3. Profiling GPUs
GPU Development - Introduction
• A few definitions :
• Kernel : A small piece of code executed on the device.
• Heterogeneous system : a system mixing multiple types of processors
[Diagram: CPU, GPU, FPGA, DSP, … ; Unified Memory]
GPU Development - Introduction
• SIMD Architecture :
[Diagram: Data pool, Instruction pool, four APUs]
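To make the kernel and SIMD definitions concrete, here is a minimal pure-Python emulation, not real device code: one small function (the kernel) is applied to every data element, with the element index standing in for the work-item id. The names `saxpy_kernel` and `launch` are illustrative assumptions, and a real GPU would run the work-items in parallel rather than in a loop.

```python
from typing import Callable, List


def saxpy_kernel(gid: int, a: float, x: List[float], y: List[float]) -> None:
    """Body executed by one work-item: y[gid] = a * x[gid] + y[gid]."""
    y[gid] = a * x[gid] + y[gid]


def launch(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Emulate a device launch: the same instructions run once per data element.

    Assumption: sequential emulation only; a real device executes many
    work-items at the same time.
    """
    for gid in range(global_size):
        kernel(gid, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
launch(saxpy_kernel, len(x), 2.0, x, y)
print(y)
```

The host code only chooses the number of work-items and the arguments; the kernel body never loops over the data itself.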
[Figure: © 2019 AMD Corporation]
Optimization Problems - Introduction
• Communication Overhead :
  • Memory synchronisation
  • Interprocessor Communication
• Scheduling and load balancing :
  • Benchmarking
  • Load characteristics of kernels
• Shared Cache :
  • Cache misses, thrashing
• Approaches to investigate these problems :
  • Tracing
  • Profiling (Performance Counters)
ROC Platform - Tracing and profiling of CPU-GPU systems
[Figure: © 2019 AMD Corporation, https://rocm.github.io/]
ROC Platform - Tracing and profiling of CPU-GPU systems
ROCm functioning summarized :
[Diagram: User Application, ROC runtime, User Mode Command Queue, HSA Kernel Agent, GPU; User context]
• Open source
• Existing mechanism to insert trace points
• Standardized interface (HSA)
ROC Platform - Tracing and profiling of CPU-GPU systems
• This work has already been done by AMD and is open source :
  https://github.com/ROCm-Developer-Tools/rocprofiler
  https://github.com/ROCm-Developer-Tools/roctracer
• AMD has released a few other libraries and tools thanks to their Radeon Open Compute initiative.
Tracing GPUs - Tracing and profiling of CPU-GPU systems
[Screenshot: TraceCompass ROCm plugin showing HSA function calls separated by thread, and kernel executions]
Tracing GPUs - Tracing and profiling of CPU-GPU systems
[Screenshot: TraceCompass ROCm plugin showing HIP function calls separated by thread, and kernel executions]
Tracing GPUs - Tracing and profiling of CPU-GPU systems
[Screenshot: TraceCompass ROCm plugin running on Theia front-end]
Tracing GPUs - Tracing and profiling of CPU-GPU systems
Analyzing this tracing data further, future work includes :
• Critical path analysis of CPU-GPU execution
• Determining whether the program performance is limited by the GPU or the CPU
• Extracting statistics to use in profiling analysis
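One simple statistic that could feed the GPU-bound versus CPU-bound question is the fraction of the traced time during which at least one kernel was executing. The sketch below is only an illustrative heuristic, not the analysis planned in this work; the function names and the example intervals are made up, and it assumes kernel executions are available as (start, end) timestamps that fall inside the trace window.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ns, end_ns) of one kernel execution


def merged_duration(intervals: List[Interval]) -> int:
    """Total time covered by the intervals, counting overlapping parts once."""
    total = 0
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from everything seen so far: count it fully.
            total += end - start
            current_end = end
        elif end > current_end:
            # Overlaps the current run: count only the part that extends it.
            total += end - current_end
            current_end = end
    return total


def gpu_busy_fraction(kernels: List[Interval], trace_start: int, trace_end: int) -> float:
    """Share of the trace window during which at least one kernel was running."""
    span = trace_end - trace_start
    if span <= 0:
        raise ValueError("trace window must have a positive duration")
    return merged_duration(kernels) / span


# Hypothetical kernel executions, in nanoseconds since the start of the trace.
kernels = [(0, 40), (30, 60), (80, 100)]
print(gpu_busy_fraction(kernels, 0, 100))
```

A fraction close to 1 suggests the GPU is the limiting resource, while a low fraction suggests the GPU is often waiting on the host; a critical path analysis would be needed to confirm either reading.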
Profiling GPUs - Tracing and profiling of CPU-GPU systems
[Screenshot: TraceCompass ROCm plugin showing performance counters]
Profiling GPUs - Tracing and profiling of CPU-GPU systems
Using tracing and profiling data, future work includes :
• Roofline model
  β : peak bandwidth
  I : arithmetic intensity
  π : peak performance
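In the standard roofline model these three symbols combine into a single bound: attainable performance is min(π, β × I). The short sketch below evaluates that bound; the function name and the hardware figures are hypothetical, chosen only to show a memory-bound and a compute-bound case.

```python
def attainable_performance(pi: float, beta: float, intensity: float) -> float:
    """Roofline bound: min(peak performance, peak bandwidth * arithmetic intensity).

    pi        -- peak performance, e.g. in GFLOP/s
    beta      -- peak memory bandwidth, e.g. in GB/s
    intensity -- arithmetic intensity of the kernel, in FLOP per byte
    """
    return min(pi, beta * intensity)


# Hypothetical device: 1000 GFLOP/s peak, 100 GB/s peak bandwidth.
PI, BETA = 1000.0, 100.0

ridge_point = PI / BETA  # intensity at which the two limits meet
low = attainable_performance(PI, BETA, 2.0)    # below the ridge: memory-bound
high = attainable_performance(PI, BETA, 20.0)  # above the ridge: compute-bound
print(ridge_point, low, high)
```

Kernels whose arithmetic intensity is below the ridge point are limited by memory bandwidth; those above it are limited by compute throughput.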
Profiling GPUs - Tracing and profiling of CPU-GPU systems
Using tracing and profiling data, future work includes :
• Top-down analysis
[Diagram: Performance Counters, Derived Metrics, GPU Architectural Model (Top-down analysis)]
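As a rough illustration of the step from raw performance counters to derived metrics, the sketch below turns counter values into percentages. The counter names and values are hypothetical placeholders, not actual ROCm counter names, and the two metrics are only examples of what a top-down model might consume.

```python
from typing import Dict


def percentage(counters: Dict[str, int], numerator: str, denominator: str) -> float:
    """Derived metric: one raw counter expressed as a percentage of another."""
    total = counters[denominator]
    return 100.0 * counters[numerator] / total if total else 0.0


# Hypothetical raw counter values collected for one kernel execution.
raw = {
    "busy_cycles": 750,
    "total_cycles": 1000,
    "cache_hits": 900,
    "cache_requests": 1200,
}

derived = {
    "gpu_busy_pct": percentage(raw, "busy_cycles", "total_cycles"),
    "cache_hit_pct": percentage(raw, "cache_hits", "cache_requests"),
}
print(derived)
```

Keeping each derived metric as a named ratio of raw counters makes it straightforward to attach it to a level of the architectural model later.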
Thank you for listening !
Questions ?