The Sparse Matrix Vector Product on High-End GPUs
Transcript of The Sparse Matrix Vector Product on High-End GPUs
![Page 1: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/1.jpg)
KIT – The Research University in the Helmholtz Association
HartwigAnzt,TerryCojean, Yuhsiang M.TsaiSteinbuchCentre for Computing (SCC)
www.kit.edu
TheSparseMatrixVectorProductonHigh-EndGPUsSIAMConferenceonParallelProcessingforScientificComputing(PP20)February12- 15,2020HyattRegencySeattle|Seattle,Washington,U.S.
MikeTsai
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of EnergyOffice of Science and the National Nuclear Security Administration and the Helmholtz Impuls und VernetzungsfondVH-NG-1241.
![Page 2: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/2.jpg)
2 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
SpMV onGPUs– MovingawayfromtheNVIDIAhegemony
• Inthepast,NVIDIAGPUsweredominating theGPGPUmarket;
• WeseeanincreasingadoptionofAMDGPUsinleadershipsupercomputers:• FrontiersysteminOakRidge (2021)• ElCapitaninLawrenceLivermoreNationalLab?(2023)
• AMDisheavilyinvestingintheHIPsoftwaredevelopmentecosystem;• HIPprogramming similartoCUDAprogramming;• HIPlibrariessimilartocuBLAS, cuSPARSE,…
• TheRaceison!
• HowcanwepreparetheGinkgosparselinearalgebralibrary forcross-platformportability?
• AretheCUDA-optimizedkernelssuitableforAMDGPUs?
• HowdoestheperformancecompareacrossdifferentGPUs?
![Page 3: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/3.jpg)
3 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
ExtendGinkgo’shardwarescopetoAMDGPUs
LibraryInfrastructureAlgorithm Implementations• IterativeSolvers• Preconditioners• …
Core
OpenMP-kernels• SpMV• Solverkernels• Precond kernels• …
OpenMPReferencekernels• SpMV• Solverkernels• Precond kernels• …
ReferenceCUDA-GPUkernels• SpMV• Solverkernels• Precond kernels• …
CUDA
Librarycorecontainsarchitecture-agnosticalgorithm implementation;
Runtimepolymorphism selectstherightkerneldepending onthetargetarchitecture;
Architecture-specifickernelsexecutethealgorithmontargetarchitecture;
Referencearesequentialkernels tocheckcorrectnessofalgorithmdesignandoptimizedkernels;
Optimizedarchitecture-specifickernels;
Kernels
https://github.com/ginkgo-project/ginkgo
Part of
https://xsdk.info/
![Page 4: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/4.jpg)
4 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
ExtendGinkgo’shardwarescopetoAMDGPUs
LibraryInfrastructureAlgorithm Implementations• IterativeSolvers• Preconditioners• …
Core
OpenMP-kernels• SpMV• Solverkernels• Precond kernels• …
OpenMPReferencekernels• SpMV• Solverkernels• Precond kernels• …
ReferenceHIP-GPUkernels• SpMV• Solverkernels• Precond kernels• …
CUDAHIP-GPUkernels• SpMV• Solverkernels• Precond kernels• …
HIP
Librarycorecontainsarchitecture-agnosticalgorithm implementation;
Runtimepolymorphism selectstherightkerneldepending onthetargetarchitecture;
Architecture-specifickernelsexecutethealgorithmontargetarchitecture;
Referencearesequentialkernels tocheckcorrectnessofalgorithmdesignandoptimizedkernels;
Optimizedarchitecture-specifickernels;
Kernels
![Page 5: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/5.jpg)
5 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
ExtendGinkgo’shardwarescopetoAMDGPUs
LibraryInfrastructureAlgorithm Implementations• IterativeSolvers• Preconditioners• …
Core
OpenMP-kernels• SpMV• Solverkernels• Precond kernels• …
OpenMPReferencekernels• SpMV• Solverkernels• Precond kernels• …
ReferenceHIP-GPUkernels• SpMV• Solverkernels• Precond kernels• …
CUDAHIP-GPUkernels• SpMV• Solverkernels• Precond kernels• …
HIP
Librarycorecontainsarchitecture-agnosticalgorithm implementation;
Runtimepolymorphism selectstherightkerneldepending onthetargetarchitecture;
Architecture-specifickernelsexecutethealgorithmontargetarchitecture;
Referencearesequentialkernels tocheckcorrectnessofalgorithmdesignandoptimizedkernels;
Optimizedarchitecture-specifickernels;
Kernels
CUDA HIP
![Page 6: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/6.jpg)
6 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
ExtendGinkgo’shardwarescopetoAMDGPUs
LibraryInfrastructureAlgorithm Implementations• IterativeSolvers• Preconditioners• …
Core
OpenMP-kernels• SpMV• Solverkernels• Precond kernels• …
OpenMPReferencekernels• SpMV• Solverkernels• Precond kernels• …
ReferenceCUDA-GPUkernels• SpMV• Solverkernels• Precond kernels• …
CUDAHIP-GPUkernels• SpMV• Solverkernels• Precond kernels• …
HIP
Librarycorecontainsarchitecture-agnosticalgorithm implementation;
Runtimepolymorphism selectstherightkerneldepending onthetargetarchitecture;
Architecture-specifickernelsexecutethealgorithmontargetarchitecture;
Referencearesequentialkernels tocheckcorrectnessofalgorithmdesignandoptimizedkernels;
Optimizedarchitecture-specifickernels;
Kernels
![Page 7: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/7.jpg)
7 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
ExtendGinkgo’shardwarescopetoAMDGPUs
LibraryInfrastructureAlgorithm Implementations• IterativeSolvers• Preconditioners• …
Core
OpenMP-kernels• SpMV• Solverkernels• Precond kernels• …
OpenMPReferencekernels• SpMV• Solverkernels• Precond kernels• …
ReferenceCUDA-GPUkernels• SpMV• Solverkernels• Precond kernels• …
CUDAHIP-GPUkernels• SpMV• Solverkernels• Precond kernels• …
HIP
Librarycorecontainsarchitecture-agnosticalgorithm implementation;
Runtimepolymorphism selectstherightkerneldepending onthetargetarchitecture;
Architecture-specifickernelsexecutethealgorithmontargetarchitecture;
Referencearesequentialkernels tocheckcorrectnessofalgorithmdesignandoptimizedkernels;
Optimizedarchitecture-specifickernels;
Kernels
• SharedkernelsCommon
Toavoidcodeduplication,commoncontainskernelssharedbetweenCUDAandHIP(upon parameterconfigs)
![Page 8: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/8.jpg)
8 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
ExtendGinkgo’shardwarescopetoAMDGPUs
• KernelssharedbetweenCUDAandAMDbackends (upon parametersetting)arerelocatedinthe``common’’ module.
• NewcodenecessaryforHIP-specificoptimizationsandforimplementingfunctionalitycurrentlymissing intheHIPecosystem(e.g.cooperativegroups).
CUDA
new
CUDA
common
HIP
![Page 9: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/9.jpg)
9 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
HowdoesGinkgocomparetothevendorlibraries- COOSpMV
GinkgovsHIPsparse onRadeonVII GinkgovscuSPARSE onV100
Resultsandinteractiveperformanceexploreravailableat: https://ginkgo-project.github.io/gpe/
![Page 10: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/10.jpg)
10 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
HowdoesGinkgocomparetothevendorlibraries- CSRSpMV
GinkgovsHIPsparse onRadeonVII GinkgovscuSPARSE onV100
Resultsandinteractiveperformanceexploreravailableat: https://ginkgo-project.github.io/gpe/
![Page 11: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/11.jpg)
11 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
HowdoesGinkgocomparetothevendorlibraries- ELLSpMV
GinkgovsHIPsparse onRadeonVII GinkgovscuSPARSE onV100
Resultsandinteractiveperformanceexploreravailableat: https://ginkgo-project.github.io/gpe/
![Page 12: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/12.jpg)
12 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
HowdoesGinkgocomparetothevendorlibraries- hybridSpMV
GinkgovsHIPsparse onRadeonVII GinkgovscuSPARSE onV100
Resultsandinteractiveperformanceexploreravailableat: https://ginkgo-project.github.io/gpe/
![Page 13: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/13.jpg)
13 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
PerformanceProfileonAMD’sRadeonVII
Resultsandinteractiveperformanceexploreravailableat: https://ginkgo-project.github.io/gpe/
![Page 14: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/14.jpg)
14 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
PerformanceProfileonNVIDIA’sV100
Resultsandinteractiveperformanceexploreravailableat: https://ginkgo-project.github.io/gpe/
![Page 15: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/15.jpg)
15 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
CompilingHIPcodeforNVIDIAGPUs– comparisonagainstnativeCUDAcode
• NativeCUDAvs.HIPcompiledforNVIDIAGPUs• Samekernel• AlltestsonNVIDIAV100(Summit)• WeexpectCUDAtobeslightlyfaster
![Page 16: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/16.jpg)
16 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
CompilingHIPcodeforNVIDIAGPUs– comparisonagainstnativeCUDAcode
• NativeCUDAvs.HIPcompiledforNVIDIAGPUs• Samekernel• AlltestsonNVIDIAV100(Summit)• WeexpectCUDAtobeslightlyfaster
outliers?machinenoise?HIPfasterthanCUDAonNVIDIAGPU?
![Page 17: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/17.jpg)
17 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
CompilingHIPcodeforNVIDIAGPUs– comparisonagainstnativeCUDAcode
• NativeCUDAvs.HIPcompiledforNVIDIAGPUs• Samekernel• AlltestsonNVIDIAV100(Summit)• WeexpectCUDAtobeslightlyfaster
outliers?machinenoise?HIPfasterthanCUDAonNVIDIAGPU?
Outlierstatson100runsa20reps:
![Page 18: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/18.jpg)
18 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
CompilingHIPcodeforNVIDIAGPUs– comparisonagainstnativeCUDAcode
• NativeCUDAvs.HIPcompiledforNVIDIAGPUs• Samekernel• AlltestsonNVIDIAV100(Summit)• WeexpectCUDAtobeslightlyfaster
outliers?machinenoise?HIPfasterthanCUDAonNVIDIAGPU?
Outlierstatson100runsa20reps:
Reproducible– butrelevant?
![Page 19: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/19.jpg)
19 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
CompilingHIPcodeforNVIDIAGPUs– comparisonagainstnativeCUDAcode
HIPcodefaster
CUDAcodefaster
• Running onV100GPU• 2,800testmatrices• Comparekeyfunctionality
• GinkgoSellp SpMV• GinkgoCooSpMV• Vendor’sCsr SpMV• Ginkgo’s CGsolver
SlightadvantagesontheCUDAside,butusually<5%.
![Page 20: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/20.jpg)
20 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
HowdoGPUarchitecturescompareintermsofSpMV performance?
GinkgoCOOSpMV Vendor libraryCOOSpMV
Resultsandinteractiveperformanceexploreravailableat: https://ginkgo-project.github.io/gpe/
![Page 21: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/21.jpg)
21 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
HowdoGPUarchitecturescompareintermsofSpMV performance?
GinkgoCSRSpMV Vendor libraryCSRSpMV
Resultsandinteractiveperformanceexploreravailableat: https://ginkgo-project.github.io/gpe/
![Page 22: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/22.jpg)
22 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
HowdoGPUarchitecturescompareintermsofSpMV performance?
GinkgoELLSpMV Vendor libraryELLSpMV
Resultsandinteractiveperformanceexploreravailableat: https://ginkgo-project.github.io/gpe/
![Page 23: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/23.jpg)
23 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
Summary:TheSparseMatrixVectorProductonHigh-EndGPUs
• AMDanditsHIPsoftwareecosystembecomesarelevantalternativetoNVIDIACUDA;
• Significantsimilarities inthelanguages (HIP/CUDA)allowsforsharedkernel implementations;
• HIPallowstocompileforNVIDIAGPUs
– inmostcaseswithmoderateperformance losscomparedtonativeCUDAcode;
• AMD GPUsandNVIDIA GPUscomparableinsparselinearalgebraperformance;
• Wedeployed comprehensivecross-platformSpMV functionalityintheGinkgo library;
• Weprovideacomprehensive SpMV performancestudyforinteractiveexploration:
https://ginkgo-project.github.io/gpe/
CheckoutourposteronGinkgoatthepostersessiontonight inPP2at6pm,5th floor:Ginkgo- aNode-LevelSparseLinearAlgebraLibraryforHighPerformanceComputing
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of EnergyOffice of Science and the National Nuclear Security Administration and the Helmholtz Impuls und VernetzungsfondVH-NG-1241.
https://xsdk.info/
![Page 24: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/24.jpg)
24 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
CITest
PerformanceDataRepository
ContinuousIntegration(CI)
DeveloperCodeReview
CIBuildSourceCodeRepository
Push
Schedule inBatchSystem
Web-Application
HPCSystem
TrustedReviewer
Users
MergeintoMasterBranch
CIBenchmarkTests https://ginkgo-project.github.io/
https://ginkgo-project.github.io/gpe/
Backup:Ginkgodevelopmentworkflow
![Page 25: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/25.jpg)
25 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
Backup:GinkgoDesign
• Open-sourceC++frameworkforsparselinearalgebra.• Sparselinearsolvers,preconditioners, SpMV etc.• FocusedonMulticoreandManycore accelerators;• Softwarequalityandsustainabilityefforts
guidedbyxSDK communitypolicies:
•
•
•
https://xsdk.info/
• Staticpolymorphism fortemplatingprecisions• ValueType (default:Z,C,D,S), IndexType (int32/64)
• Smartpointerstoavoidmemory leaks• Runtimepolymorphism foroperators andkernels
• Kernelshavethesamesignaturefordifferentarchitectures
• Executordetermineswhichkernel isused
• LinOp classforanylinearoperator:• Matrices• Solvers• Preconditioners• …
generateapply…
DetermineswhereDatalives&operationsareexecuted
![Page 26: The Sparse Matrix Vector Product on High-End GPUs](https://reader033.fdocuments.us/reader033/viewer/2022041516/62527db90ab0856ffc610571/html5/thumbnails/26.jpg)
26 02/13/2020Hartwig Anzt:TheSparseMatrixVectorProduct onHigh-EndGPUs
Backup:GPUformatconversion