Post on 22-Feb-2016
More Charm++/TAU examples
Applications:
• NAMD
• Parallel Framework for Unstructured Meshing (ParFUM)
Features:
• Profile snapshots:
  • Captures the runtime of the application by segregating it into user-specified intervals
• CUDA profiling:
  • Tracks time spent in CUDA kernel routines
  • Shows scaling behavior for an experiment varying the number of devices used
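The idea behind profile snapshots can be illustrated with a small toy profiler. This is only a sketch of the concept, not how TAU implements snapshots: the `SnapshotProfiler` class, its methods, and the timings are all hypothetical, and the routine names are borrowed from the figure legend for flavor.

```python
from collections import defaultdict


class SnapshotProfiler:
    """Toy profiler (hypothetical, not the TAU API): accumulates time per
    routine and cuts the totals into snapshots at user-chosen points."""

    def __init__(self):
        self._current = defaultdict(float)  # time since the last snapshot
        self.snapshots = []                 # one (label, times) pair per interval

    def record(self, routine, seconds):
        # Add measured time for a routine to the currently open interval.
        self._current[routine] += seconds

    def snapshot(self, label):
        # Close the open interval, store a copy, and start a fresh one.
        self.snapshots.append((label, dict(self._current)))
        self._current.clear()


# Made-up timings, for illustration only.
profiler = SnapshotProfiler()
profiler.record("Main", 1.5)
profiler.record("Idle", 0.5)
profiler.snapshot("interval 0")
profiler.record("Main", 2.0)
profiler.snapshot("interval 1")

for label, times in profiler.snapshots:
    print(label, times)
```

Each snapshot holds only the time accumulated within its own interval, which is what lets a phase such as load balancing show up as a change between consecutive intervals.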
NAMD snapshot profile of over 800 sec on 2048 processors, showing load balancing phases
[Figure: Mean Exclusive Time and Standard Deviation. Legend: enqueneSelfB, enqueneSelfA, Main, enqueneWorkB, enqueneWorkA, Idle]
NAMD CUDA events
GPU efficiency gained by doubling the number of GPUs from 16 to 32. These events are broken down by routine and by device number.
[Figure annotations: Device #0; ~100% efficiency; ~50% efficiency]
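The efficiency figures can be read against the usual strong-scaling definition, sketched below. The slides do not state which formula was used, so this definition is an assumption, and the function name and timings are hypothetical rather than taken from the NAMD runs.

```python
def scaling_efficiency(base_devices, base_time, devices, time):
    """Strong-scaling efficiency relative to a baseline run.

    A value of 1.0 means the time fell in proportion to the added devices.
    """
    speedup = base_time / time
    return speedup / (devices / base_devices)


# Hypothetical timings (not from the slides), going from 16 to 32 GPUs.
halved = scaling_efficiency(16, 10.0, 32, 5.0)      # time halves
unchanged = scaling_efficiency(16, 10.0, 32, 10.0)  # time does not improve
print(halved, unchanged)
```

Under this definition, an event whose time halves when the device count doubles is at 100% efficiency, while one whose time does not change is at 50%.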
NAMD CUDA scaling
Scaling by event and device number: Non-Bonded Calculations scale well. Sum Forces Calculations scale less well, but their overall time is only a few microseconds.
[Figure panels: Non-Bonded Calculations; Sum Forces Calculations. Axes: Number of Devices, Scaling Efficiency]
ParFUM CUDA speedup
[Figure: 128x8x8 mesh, axis ticks 0 to 250. Bars: Total time using only a CPU; Total time with CUDA acceleration; Time spent in CUDA kernel]
Single CPU or GPU performance on a 128x8x8 mesh. When run with GPU acceleration enabled, ParFUM spent 9 seconds in the CUDA kernel routines.