More Charm++/TAU examples

Post on 22-Feb-2016



Transcript of More Charm++/TAU examples

Applications:
• NAMD
• Parallel Framework for Unstructured Meshing (ParFUM)

Features:
• Profile snapshots: capture the runtime of the application by segregating it into user-specified intervals
• CUDA profiling:
  • Tracks time spent in CUDA kernel routines
  • Shows scaling behavior for an experiment varying the number of devices used
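The snapshot idea can be illustrated with a small, self-contained sketch. It is not the TAU API: the class `SnapshotProfiler` and its methods `record` and `snapshot` are hypothetical names chosen for illustration, and the timings are made up. It only shows how cumulative per-routine time might be cut into user-specified intervals by differencing successive snapshots.

```python
from collections import defaultdict


class SnapshotProfiler:
    """Toy profiler (hypothetical, not TAU): splits cumulative time into intervals."""

    def __init__(self):
        self.totals = defaultdict(float)  # cumulative seconds per routine
        self.last = {}                    # cumulative totals at the previous snapshot
        self.snapshots = []               # list of (label, per-interval times)

    def record(self, routine, seconds):
        """Add measured time to a routine's cumulative total."""
        self.totals[routine] += seconds

    def snapshot(self, label):
        """Close the current interval and return the time spent in it per routine."""
        interval = {
            name: total - self.last.get(name, 0.0)
            for name, total in self.totals.items()
        }
        self.last = dict(self.totals)
        self.snapshots.append((label, interval))
        return interval


# Made-up timings for two user-chosen intervals.
prof = SnapshotProfiler()
prof.record("Main", 2.0)
prof.record("Idle", 0.5)
first = prof.snapshot("interval 1")

prof.record("Main", 1.0)
prof.record("Idle", 3.0)
second = prof.snapshot("interval 2")
```

Comparing `first` and `second` shows how the time distribution shifts between intervals (here, far more `Idle` time in the second one), which is the kind of change a whole-run profile would average away.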

NAMD Snapshot Profile

Snapshot profile of over 800 sec on 2048 processors. Annotation: Load Balancing Phases.

[Figure labels: Mean Exclusive Time; Standard Deviation. Legend: enqueneSelfB, enqueneSelfA, Main, enqueneWorkB, enqueneWorkA, Idle]

NAMD CUDA events

GPU efficiency gained by doubling the number of GPUs from 16 to 32. These events are broken down by routine and by device number.

[Figure annotations: Device #0; ~100% efficiency; ~50% efficiency]
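One common way to read the efficiency figures is as measured speedup divided by ideal speedup. The sketch below assumes that definition; the slides do not state which formula was used, and the function name `scaling_efficiency` and the timings are hypothetical.

```python
def scaling_efficiency(base_devices, base_time, devices, time):
    """Measured speedup over ideal speedup; 1.0 means perfect scaling.

    Assumed definition, not confirmed by the slides.
    """
    speedup = base_time / time       # how much faster the larger run is
    ideal = devices / base_devices   # how much faster it would ideally be
    return speedup / ideal


# Hypothetical per-routine times (seconds) when going from 16 to 32 GPUs.
halved = scaling_efficiency(16, 8.0, 32, 4.0)     # time halves when devices double
unchanged = scaling_efficiency(16, 8.0, 32, 8.0)  # time does not improve at all
```

Under this definition, a routine whose time halves when the device count doubles sits at 100% efficiency, while a routine whose time does not change sits at 50%.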

NAMD CUDA scaling

Non-Bonded Calculations

Sum Forces Calculations

Scaling by event and device number. Non-Bonded Calculations scale well; Sum Forces Calculations scale less well, but their overall time is only a few microseconds.

[Figure axes: Number of Devices; Scaling Efficiency]

ParFUM CUDA speedup

[Figure: 128x8x8 Mesh; axis ticks from 0 to 250. Legend: Total time using only a CPU; Total time with CUDA acceleration; Time spent in CUDA kernel]

Single-CPU or single-GPU performance on a 128x8x8 mesh. When run with GPU acceleration enabled, ParFUM spent 9 seconds in the CUDA kernel routines.
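The three bars in the chart can be related by two simple ratios: overall speedup and the share of the accelerated run spent inside CUDA kernels. The sketch below is illustrative only. The 9 seconds of kernel time comes from the slide, but both total times are hypothetical placeholders, since the transcript does not preserve the measured values.

```python
def speedup(cpu_total, gpu_total):
    """How many times faster the CUDA-accelerated run is than the CPU-only run."""
    return cpu_total / gpu_total


def kernel_fraction(kernel_time, gpu_total):
    """Share of the accelerated run's total time spent inside CUDA kernels."""
    return kernel_time / gpu_total


KERNEL_SECONDS = 9.0       # from the slide
cpu_total_seconds = 200.0  # hypothetical placeholder, not a measured value
gpu_total_seconds = 40.0   # hypothetical placeholder, not a measured value

overall = speedup(cpu_total_seconds, gpu_total_seconds)
in_kernel = kernel_fraction(KERNEL_SECONDS, gpu_total_seconds)
```

A small kernel fraction would suggest that the remaining time in the accelerated run is dominated by work outside the kernels, such as host-side computation or data transfer.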