More Charm++/TAU examples
![Page 1: More Charm++/TAU examples](https://reader035.fdocuments.us/reader035/viewer/2022062410/56816238550346895dd269b2/html5/thumbnails/1.jpg)
Applications:
• NAMD
• Parallel Framework for Unstructured Meshing (ParFUM)

Features:
• Profile snapshots:
  • Captures the runtime of the application by segregating it into user-specified intervals
• CUDA profiling:
  • Tracks time spent in CUDA kernel routines
  • Shows scaling behavior for an experiment varying the number of devices used
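The idea behind profile snapshots can be sketched in a few lines. This is a toy illustration only, not TAU's API: the class `SnapshotProfiler` and its methods `timed` and `snapshot` are hypothetical names. It accumulates time per routine and, at each snapshot, records how much each routine cost since the previous one.

```python
import time
from collections import defaultdict


class SnapshotProfiler:
    """Toy interval profiler (hypothetical, not TAU's API)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._totals = defaultdict(float)  # cumulative seconds per routine
        self._last = {}                    # totals at the previous snapshot
        self.snapshots = []                # (label, {routine: seconds})

    def timed(self, name, func, *args, **kwargs):
        """Run func and charge its wall-clock time to `name`."""
        start = self._clock()
        try:
            return func(*args, **kwargs)
        finally:
            self._totals[name] += self._clock() - start

    def snapshot(self, label):
        """Close the current interval; store time spent per routine in it."""
        delta = {name: total - self._last.get(name, 0.0)
                 for name, total in self._totals.items()}
        self._last = dict(self._totals)
        self.snapshots.append((label, delta))
        return delta


if __name__ == "__main__":
    prof = SnapshotProfiler()
    for step in range(3):
        prof.timed("work", sum, range(100_000))
        prof.snapshot(f"step {step}")
    for label, delta in prof.snapshots:
        print(label, delta)
```

Comparing the per-interval values across snapshots is what lets phases such as load balancing show up as changes over the course of a run.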
![Page 2: More Charm++/TAU examples](https://reader035.fdocuments.us/reader035/viewer/2022062410/56816238550346895dd269b2/html5/thumbnails/2.jpg)
Load Balancing Phases
NAMD Snapshot Profile of over 800 sec on 2048 processors
[Figure: Mean Exclusive Time and Standard Deviation. Legend: enqueneSelfB, enqueneSelfA, Main, enqueneWorkB, enqueneWorkA, Idle]
![Page 3: More Charm++/TAU examples](https://reader035.fdocuments.us/reader035/viewer/2022062410/56816238550346895dd269b2/html5/thumbnails/3.jpg)
NAMD CUDA events
GPU efficiency gained by doubling the number of GPUs from 16 to 32. These events are broken down by routine and by device number.
[Figure annotations: Device #0; ~100% efficiency; ~50% efficiency]
![Page 4: More Charm++/TAU examples](https://reader035.fdocuments.us/reader035/viewer/2022062410/56816238550346895dd269b2/html5/thumbnails/4.jpg)
NAMD CUDA scaling
Scaling by event and device number. Non-Bonded Calculations scale well; Sum Forces scales less well, but its overall time is only a few microseconds.
[Figure: Scaling Efficiency versus Number of Devices, shown for Non-Bonded Calculations and Sum Forces Calculations]
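The slides do not state how scaling efficiency is computed. A minimal sketch, assuming the usual strong-scaling definition (speedup divided by the factor of added devices), is given here; the function name `scaling_efficiency` and the timings in the example are hypothetical, not measurements from the NAMD runs.

```python
def scaling_efficiency(base_devices, base_time, devices, time):
    """Strong-scaling efficiency relative to a baseline run.

    1.0 means the time dropped in proportion to the added devices;
    0.5 on a doubling means the extra devices bought no speedup.
    """
    if min(base_devices, devices) <= 0 or min(base_time, time) <= 0:
        raise ValueError("device counts and times must be positive")
    speedup = base_time / time
    return speedup / (devices / base_devices)


if __name__ == "__main__":
    # Hypothetical timings for one event when going from 16 to 32 devices.
    print(scaling_efficiency(16, 10.0, 32, 5.0))   # time halves
    print(scaling_efficiency(16, 10.0, 32, 10.0))  # time unchanged
```

Under this definition, an event whose time halves when the device count doubles is at 100% efficiency, and one whose time does not change is at 50%.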
![Page 5: More Charm++/TAU examples](https://reader035.fdocuments.us/reader035/viewer/2022062410/56816238550346895dd269b2/html5/thumbnails/5.jpg)
ParFUM CUDA speedup
[Figure: chart for the 128x8x8 mesh, axis range 0 to 250. Series: Total time using only a CPU; Total time with CUDA acceleration; Time spent in CUDA kernel]
Single CPU or GPU performance on a 128x8x8 mesh. When run with GPU acceleration enabled, ParFUM spent 9 seconds in the CUDA kernel routines.
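The three quantities in this comparison relate as follows. This is a sketch only: the helper names are hypothetical, and the two total times in the example are placeholders rather than values read from the chart. Only the 9 seconds of kernel time comes from the slide.

```python
def speedup(cpu_total, gpu_total):
    """How many times faster the CUDA-accelerated run is than the CPU-only run."""
    if cpu_total <= 0 or gpu_total <= 0:
        raise ValueError("times must be positive")
    return cpu_total / gpu_total


def kernel_fraction(kernel_time, gpu_total):
    """Share of the accelerated run spent inside CUDA kernel routines."""
    if gpu_total <= 0 or not 0 <= kernel_time <= gpu_total:
        raise ValueError("need 0 <= kernel_time <= gpu_total and gpu_total > 0")
    return kernel_time / gpu_total


if __name__ == "__main__":
    cpu_total = 200.0   # placeholder, NOT taken from the slide
    gpu_total = 20.0    # placeholder, NOT taken from the slide
    kernel_time = 9.0   # from the slide: 9 seconds in CUDA kernel routines
    print(speedup(cpu_total, gpu_total))
    print(kernel_fraction(kernel_time, gpu_total))
```

The kernel fraction indicates how much of the accelerated run remains outside the GPU, such as host-side work and data transfers.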