Fast GPU Monte Carlo Simulation for Radiotherapy, DNA Ionization and Beyond 2017 GPU Technology Conference Shogo Okada <[email protected]> Koichi Murakami <[email protected]> Nick Henderson <[email protected]>


Outline

• Geant4
• GPU experimentation
• MPEXS
• Algorithm research
• Application development
• Geant4 multi-threading

Big Picture

$(\vec{x}, \vec{p}, k)$, with $k \in \{\gamma, e^-, e^+, \ldots\}$

Goal: record effect of particle interaction in material

Geant4

• Toolkit for simulation of particles traveling through and interacting with matter
• Supports wide variety of physics models, geometries, and materials
• Extendable - users can add new models
• Used in numerous and diverse application areas
  • high energy physics
  • medical physics
  • spacecraft
  • semiconductor devices
  • biology research

[Figure: example applications: ATLAS, LISA, gMocren]

Parallelism

• Simulations require many events for statistical significance
• Events are IID
• Each computation thread processes an event

Challenges:
• Random nature of simulation leads to thread divergence
• Storage of secondary particles
• Recording of energy deposition

If you want to consider full capability of Geant4:
• Very complicated geometry -- non uniform data structures
• Many material types
• Large data tables to support physics processes

MPEXS

• MPEXS is an adaptation of the core simulation algorithm from Geant4 for GPU
• Target application: X-ray radiotherapy
• Geometry: uniformly discretized box
• Material: Water with variable density
• Physics: Low energy electromagnetics
  • Gamma: Compton scattering, photoelectric effect, pair-production
  • Electron/Positron: ionization, multiple scattering, Bremsstrahlung, positron annihilation
• Each GPU thread tracks an active particle
• Secondary particles are stored on thread-local secondary stacks
• Threads deposit energy to a shared global domain (via atomicAdd)

MPEXS - Performance & Validation

Verification for Dose Distribution

Dose distribution of slab phantoms
- phantom size : 30.5 x 30.5 x 30 cm
- voxel size : 5 x 5 x 2 mm
- field size : 10 cm2
- SSD : 100 cm
- slab materials : (1) water, (2) lung, (3) bone

Density
- water : 1.0 g/cm3
- lung : 0.26 g/cm3
- bone : 1.85 g/cm3
- air : 0.0012 g/cm3

Beam particle and its initial kinetic energy:
- electron with 20MeV
- photon with 6MV Linac
- photon with 18MV Linac

[Figure: slab phantom geometry in the y-z plane, showing the source, the air region, and the slab]

Comparison of depth dose for γ 6MV

[Figure: depth dose distribution and residual for the (1) water, (2) lung, and (3) bone slab phantoms. Curves: G4 v9.6.3 and MPEXS (labeled G4CU in the plots). x-axis: z-direction, depth (cm), 0 to 30; y-axis: dose (Gy); residual = (G4CU − G4) / G4, plotted from -0.2 to 0.2.]

Comparison of depth dose for γ 18MV

[Figure: depth dose distribution and residual for the (1) water, (2) lung, and (3) bone slab phantoms. Curves: G4 v9.6.3 and MPEXS (labeled G4CU in the plots). x-axis: z-direction, depth (cm), 0 to 30; y-axis: dose (Gy); residual = (G4CU − G4) / G4, plotted from -0.2 to 0.2.]

Comparison of depth dose for e- 20MeV

[Figure: depth dose distribution with residual (linear scale) and depth dose distribution on a log scale for the (1) water, (2) lung, and (3) bone slab phantoms. Curves: G4 v9.6.3 and MPEXS (labeled G4CU in the plots). x-axis: z-direction, depth (cm), 0 to 30; y-axis: dose (Gy); residual = (G4CU − G4) / G4, plotted from -0.2 to 0.2.]

Computation Time Performance

γ beam with 6MV
                                  (1) water   (2) lung   (3) bone
G4 [msec/particle]                0.780       0.822      0.819
MPEXS (G4CU) [msec/particle]      0.00336     0.00331    0.00341
speedup factor (= G4 / MPEXS)     232         248        240

γ beam with 18MV
                                  (1) water   (2) lung   (3) bone
G4 [msec/particle]                0.803       0.857      0.924
MPEXS (G4CU) [msec/particle]      0.00433     0.00425    0.00443
speedup factor (= G4 / MPEXS)     185         201        208

e- beam with 20MeV
                                  (1) water   (2) lung   (3) bone
G4 [msec/particle]                1.84        1.87       1.65
MPEXS (G4CU) [msec/particle]      0.00881     0.00958    0.00885
speedup factor (= G4 / MPEXS)     208         195        193

GPU:
- Tesla K20c (Kepler architecture)
- 2496 cores, 706 MHz
- 4096 x 128 threads

CPU:
- Xeon E5-2643 v2 3.50 GHz

# of primaries:
- 50M particles -> e- 20MeV
- 500M particles -> γ 6MV, 18MV

185~250 times speedup against single-core G4 simulation!!

Algorithm Research


• MPEXS does not attempt to sort particles

• Thread divergence: if threads in the same warp are tracking different particle kinds, then thread divergence occurs in physics process code

• Size of particle stack is the same for each thread and is fixed at run-time. Some applications call for the generation of many secondary particles. This restriction meant that we could only run with a small number of active threads.

[Figure: particles in memory at indices 0 to 7 (γ, e-, e+, γ, γ, e-, e-, γ). Processed in this order, neighboring threads run different computation (γ process, e- process, e+ process). Grouped by kind, the particles in memory become γ γ γ γ, e- e- e-, e+.]

MPEXS Experiments

• Initialize each thread with the same random number generator state. This leads to a non-physical simulation, but eliminates thread divergence. We saw a factor 3x speedup in these runs.

• Measure the time it takes to sort particle indices by selected process and perform a run-length encode, and compare it against the time for a single trip through the event loop. Calculations indicate we should expect a factor 2x speedup if this is implemented in the full simulation.
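
The sort and run-length encode step can be illustrated on the CPU with a few lines of Python. This is only a sketch of the idea, not the MPEXS code: the function name `sort_and_rle` and the process ids (0 for Compton scattering, 1 for photoelectric effect) are hypothetical, and a GPU implementation would presumably use parallel sort and reduce primitives instead.

```python
from itertools import groupby


def sort_and_rle(process_ids):
    """Sort particle indices by selected process id, then run-length encode.

    Returns (order, runs): `order` lists particle indices grouped by process,
    and `runs` holds (process_id, start, length) tuples that index into `order`.
    """
    # Stable sort of the indices, keyed on the process each particle selected.
    order = sorted(range(len(process_ids)), key=lambda i: process_ids[i])
    runs = []
    start = 0
    for pid, group in groupby(order, key=lambda i: process_ids[i]):
        length = sum(1 for _ in group)
        runs.append((pid, start, length))
        start += length
    return order, runs


# Hypothetical ids: 0 = Compton scattering, 1 = photoelectric effect.
selected = [0, 1, 0, 0, 1]
order, runs = sort_and_rle(selected)
print(order)  # particle indices, grouped by process
print(runs)   # one (process_id, start, length) entry per process
```

Each run can then be handed to a single process kernel, so that neighboring threads execute the same physics code.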


New Architecture

• Goal 1: minimize/eliminate thread divergence

• Goal 2: eliminate need for fixed-size and thread-local secondary stacks

• Goal 3: maintain extensibility

How it works

[Figure sequence, built up step by step over several slides:]

(1) Input buffer: a buffer of active particles (here 16 γ).
(2) Pop: a block of particles (here 5 γ) is popped from the input buffer.
(3) Process selection: each popped particle selects a process (here Compton scattering or photoelectric effect).
(4) Sort by selected process.
(5) Secondary generation: the processes produce secondary particles (here 3 γ and 5 e-).
(6) Secondary storage: the secondary particles are appended to the output buffers (here growing from 6 γ and 2 e- to 9 γ and 7 e-).

Features

• Store particles on a generalized stack that allows pushing and popping a block of particles in one operation.

• Group particles by kind (gamma, e-, e+). When we pop a block of particles, we know they are all the same kind, thus we can apply the same (non-divergent) operations.

• Maintain separate input and output buffers. Physics processes know the input and output particles. For example, in Compton scattering the input is a photon and the output is a scattered photon and a recoil electron. Thus, we can read from the active input photon buffers and write to output electron and photon buffers that are pushed onto appropriate stacks.

• The sort and run-length encode operations are applied after process selection so that after-step processes are applied only to particles that call for it.
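
A minimal CPU-side sketch of these features is given below, assuming a hypothetical `BlockStack` class with one stack per particle kind. The `compton_step` function is a toy stand-in that only reproduces the input/output pattern described above (one photon in, one photon and one electron out); the even energy split is not physical.

```python
class BlockStack:
    """Stack of same-kind particles supporting block push and pop."""

    def __init__(self):
        self._items = []

    def push_block(self, particles):
        # Append a whole block in one operation.
        self._items.extend(particles)

    def pop_block(self, max_count):
        # Remove and return up to max_count particles from the top.
        count = min(max_count, len(self._items))
        if count == 0:
            return []
        block = self._items[-count:]
        del self._items[-count:]
        return block

    def __len__(self):
        return len(self._items)


def compton_step(photons):
    """Toy stand-in: each input photon yields a scattered photon and a
    recoil electron. Energies are split evenly, which is NOT physical."""
    out_photons = [("gamma", energy / 2) for _, energy in photons]
    out_electrons = [("e-", energy / 2) for _, energy in photons]
    return out_photons, out_electrons


stacks = {"gamma": BlockStack(), "e-": BlockStack(), "e+": BlockStack()}
stacks["gamma"].push_block([("gamma", 6.0)] * 16)  # input buffer of 16 photons

block = stacks["gamma"].pop_block(5)               # all popped particles share one kind
out_gamma, out_electron = compton_step(block)      # separate output buffers
stacks["gamma"].push_block(out_gamma)
stacks["e-"].push_block(out_electron)
```

Because a popped block contains a single particle kind, the code applied to it needs no per-particle branch on kind.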

Properties

• No thread divergence due to process selection. Thread divergence may still occur in the application of a physics process, because many of them rely on sample-reject algorithms to sample from various distributions.

• Reads of particle data in the after-step physics process are non-coalesced. However, all writes of particle data are coalesced. We have to pay for the randomness somewhere.

• Thread-local stacks are not required.
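
To see why sample-reject algorithms still cause divergence, the sketch below counts how many trials each of 32 hypothetical lanes needs to accept a sample. The target density `sin(x)^2` is purely illustrative and is not one of the physics distributions; the point is only that trial counts differ from lane to lane.

```python
import math
import random


def sample_reject(rng, pdf, pdf_max, lo, hi):
    """Rejection sampling on [lo, hi]; returns (sample, number of trials)."""
    trials = 0
    while True:
        trials += 1
        x = rng.uniform(lo, hi)
        # Accept x with probability pdf(x) / pdf_max.
        if rng.uniform(0.0, pdf_max) <= pdf(x):
            return x, trials


def target(x):
    # Illustrative unnormalized density only, not a physics cross-section.
    return math.sin(x) ** 2


rng = random.Random(12345)
trial_counts = [
    sample_reject(rng, target, 1.0, 0.0, math.pi)[1] for _ in range(32)
]

# Threads in a warp run in lock step, so the warp pays for its slowest lane.
warp_cost = max(trial_counts)
mean_cost = sum(trial_counts) / len(trial_counts)
print(warp_cost, mean_cost)
```

The gap between `warp_cost` and `mean_cost` is a rough measure of the work lost to divergence inside one warp.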

Experiments

• The new architecture is substantially different from MPEXS. We have not yet ported the physics processes over. We've done performance experiments with fake/model physics processes (which mimic computation and memory access patterns of the real ones).

• We can vary the number of physics processes and the amount of data moved. The numbers shown are the speedup of the new architecture against the old for a variety of configurations.

Speedup via new architecture

• Speedup due to sorting by process id for fake/model processes
• Vary number of processes and amount of data required by each process
• Results collected from K40

Speedup (rows: data transfer, float #; columns: number of processes)

          1     2     4     8     16    32    64    128
1         0.5   0.6   0.8   1.0   1.3   1.8   2.7   4.1
2         0.5   0.7   0.8   1.0   1.5   2.1   2.9   4.2
4         0.6   0.7   0.9   1.2   1.8   2.6   3.4   4.6
8         0.6   0.8   1.1   1.6   2.4   3.3   4.2   5.2
16        0.6   1.0   1.5   2.1   3.0   4.1   4.9   5.9
32        0.7   1.2   1.8   2.6   3.6   4.6   5.4   6.3
64        0.8   1.4   2.0   2.8   3.9   4.9   5.7   6.6
128       0.9   1.7   2.4   3.1   4.0   5.1   5.9   6.7

Summary

• MPEXS is a GPU-based Monte Carlo simulator for X-ray radiotherapy

• MPEXS attains around 200x speedup when compared to Geant4 running on single CPU core

• Algorithm experimentation indicates a further 2x speed up with a sort operation after process selection

• New architecture also opens opportunities for other applications

• better performance with more physics processes

• no thread-local secondary stacks

Outline

• Geant4
• GPU experimentation
• MPEXS
• Algorithm research
• Application development
• Geant4 multi-threading

MPEXS-DNA

The Geant4-DNA Project

"Geant4-DNA", an extension of Geant4 to DNA physics

• Estimates biological effects (e.g. DNA strand breaks) of radiation at an ultra-low energy scale (down to meV)
• The main objective of the project:
  • Evaluate effects on human health of chronic radiation exposure
    • e.g. medical diagnostics, astronauts in space missions, airline crews, …
• Its computing performance should be improved using GPU power.
  • Energy spread in cells is an important factor for DNA damage.
    • Geant4-DNA calculates complex track geometry within cells.
  • Needs to handle a large number of secondary particles.
    • e.g. more than 20k secondaries are generated per primary
    • Days to weeks of simulation on a CPU cluster

MPEXS-DNA, microdosimetry simulation on GPU

• Based on Geant4-DNA 10.02 p03

1. Physical Phase
• EM Physics for lower energy range (down to meV)
• Calculates energy loss (dose distributions) and generates primary chemical species like excited and ionized H2O (H2O*, H2O-/+, e-aq)
• Physics phase: primary radiation interacting with matter (DNA) and producing radicals

2. Chemical Phase
• Radiolysis of water
• Diffusion, production, and reactions of chemical species
• Chemistry phase: Brownian motion of radicals (further cell level damage) and interactions between radicals

3. Biological Phase (Future work)
• Estimates DNA damage

[Figure: chromatin fiber (constituent of chromosomes) and EM shower in DNA, ∅ 10 nm. A single He+ 100 keV produces direct DNA damages: 5 single strand breaks and 2 double strand breaks in a total of 1.2×10^8 basic elementary volumes. © CENBG, in collaboration with G. Cosmo, CERN. Courtesy of Sebastien Incerti (IN2P3-CNRS / CENBG).]

[Figure: cell radiation damage, from http://www.windows2universe.org/earth/Life/cell_radiation_damage.html]

MPEXS-DNA Physics Processes

Physics processes for X-rays (all 100 eV - 1 GeV, Livermore):
• Compton scattering
• Photoelectric effect
• Gamma conversion
• Rayleigh scattering

Physics processes for electrons, protons, hydrogen atoms, and helium atoms (He++, He+, He0):

Elastic scattering
• Electrons: 9 eV - 10 keV (Uehara); 10 keV - 1 MeV (Champion)
• Protons, hydrogen atoms, helium atoms: 100 eV - 1 MeV (Hoang); 100 eV - 10 MeV (Hoang)

Excitation
• Electrons: 10 eV - 10 keV (Emfietzoglou); 10 keV - 1 MeV (Born)
• Protons: 10 eV - 500 keV (Miller Green); 500 keV - 100 MeV (Born)
• Hydrogen atoms: 10 eV - 500 keV (Miller Green)
• Helium atoms: 1 keV - 400 MeV (Miller Green)

Charge change (e.g. H atom -> p)
• Electrons: -
• Protons: 100 eV - 10 MeV (Dingfelder)
• Hydrogen atoms: 100 eV - 10 MeV (Dingfelder)
• Helium atoms: 1 keV - 400 MeV (Dingfelder)

Ionization
• Electrons: 10 eV - 10 keV (Emfietzoglou); 10 keV - 1 MeV (Born)
• Protons: 100 eV - 500 keV (Rudd); 500 keV - 100 MeV (Born)
• Hydrogen atoms: 100 eV - 100 MeV (Rudd)
• Helium atoms: 1 keV - 400 MeV (Rudd)

Vibrational excitation
• Electrons: 2 - 100 eV (Michaud et al.)
• Protons, hydrogen atoms, helium atoms: -

Dissociative attachment (AB + e- -> AB- -> A + B-)
• Electrons: 4 - 13 eV (Melton)
• Protons, hydrogen atoms, helium atoms: -

Atomic de-excitation occurs during the ionization process and emits Auger electrons and X-rays.

The difference of energy loss process (EM Physics vs DNA Physics)

Standard EM Physics
• Continuous process
  • Energy loss is below a given threshold.
  • Calculates average energy loss at each step with the Bethe-Bloch formula.
  • No secondaries are generated.
• Discrete process
  • Generates a secondary if energy loss is above the threshold.

DNA physics
• Handled as a discrete process without energy thresholds, to calculate the complex energy spread within cells for DNA damage
• A large number of secondaries are generated (~ 20k / primary).

[Equation: Bethe-Bloch formula]

[Figure: a track under the "continuous process" (steps Δx1 to Δx4 with energy losses ΔE1 to ΔE4) compared with a track under the "discrete process" (individual ionization and excitation events with energy losses ΔE1 to ΔE6).]

An issue of low thread occupancy in physics simulation

• DNA Physics simulation had an issue of low thread occupancy.
• The number of active threads was limited due to large memory consumption for storing generated secondaries in the stack.

NVIDIA, Tesla K40c, Global Memory: 11,439 MB (GDDR5)

The difference of # of secondaries and active thread number (DNA vs EM)

                                               DNA Physics          EM Physics
Incident particle                              He++                 e-
Initial energy                                 1 MeV                20 MeV
Typical # of secondaries generated             > 20,000             < 40
Stack size per CUDA thread                     25,000 (1,074 kB)    100 (4.3 kB)
Total active CUDA threads (Nblk x Nthr/blk)    10,240 (80 x 128)    1,048,576 (4,096 x 256)
Total memory usage for stacks                  10,740 MB            4,405 MB
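
The memory figures can be checked with a short calculation. The sketch below assumes binary units (1 kB = 1024 bytes, 1 MB = 1024 kB), which the slide does not state; the helper name `stack_memory_mb` is hypothetical, and the per-entry size is inferred from the DNA column rather than given in the source.

```python
KB = 1024
MB = 1024 * 1024


def stack_memory_mb(stack_kb_per_thread, blocks, threads_per_block):
    """Total stack memory in MB when every thread owns one stack."""
    n_threads = blocks * threads_per_block
    total_bytes = stack_kb_per_thread * KB * n_threads
    return n_threads, total_bytes / MB


dna_threads, dna_mb = stack_memory_mb(1074, 80, 128)
em_threads, em_mb = stack_memory_mb(4.3, 4096, 256)

# Size of one stack entry implied by the DNA column (an inference only).
bytes_per_entry = 1074 * KB / 25000

print(dna_threads, dna_mb)  # DNA physics: thread count and stack memory in MB
print(em_threads, em_mb)    # EM physics: thread count and stack memory in MB
print(bytes_per_entry)
```

Under these assumptions the DNA configuration needs about 10,740 MB for 10,240 threads, which is close to the 11,439 MB of global memory on the K40c, so the thread count cannot grow further. The EM result comes out near 4,403 MB, slightly below the listed 4,405 MB, presumably because the 4.3 kB per-thread figure is rounded.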

CUDA Thread Assignment For MPEXS-DNA Physics Simulation

• A group of 32 CUDA threads is assigned per event, and the threads in a group share a secondary stack.
  • cf.) In the MPEXS case (Standard EM Physics), each thread has its own stack.
• Host memory is also available as a stack (using virtual memory addressing).
• This reduces memory consumption for the stacks and increases the number of active threads (~10k threads -> more than 1 M threads) -> keeps high thread occupancy during the simulation

[Figure: Standard EM Physics: each CUDA thread (0, 1, 2, ...) tracks one particle (e-, e-, γ, γ, e-, e+, γ, ...) and has its own secondary stack (capacity: 100). DNA Physics: the 32 threads of warp #0 (threads 0 to 31, event #0) and of warp #1 (threads 32 to 63, event #1) track particles such as p, e-, and H; each warp shares one secondary stack (total capacity: 25k) located on device memory and on host memory.]

MPEXS-DNA Physics Performance

Depth dose curves (CPU vs GPU)

[Figure: depth dose distribution (Dose (Gy) versus z-direction (um)) and energy deposit (Energy Deposit (eV) versus z-direction (um)) for p 1 MeV, He++ 1 MeV, and e- 100 keV. Curves: Geant4-DNA (CPU) and MPEXS-DNA (GPU).]

Good agreement with Geant4-DNA

Physico-Chemical Phase for MPEXS-DNA

• Physical interactions (ionization / excitation / attachment) produce ionised and excited H2O molecules (H2O+/H2O-, H2O*)
• These molecules then dissociate or release energy into the water
• Electrons (Ekin < 8.22 eV) become hydrated electrons (e-aq)
• These processes occur within 1 ps after irradiation

Electronic state                              Process              Dissociation channel    Fraction (%)
Ionization state                              Dissociative decay   H3O+ + •OH              100
Excitation state: A1B1                        Dissociative decay   •OH + H•                65
                                              Relaxation           H2O + ΔE                35
Excitation state: B1A1                        Auto-ionization      H3O+ + •OH + e-aq       55
                                              Dissociative decay   •OH + •OH + H2          15
                                              Relaxation           H2O + ΔE                30
Excitation state: Rydberg, diffusion bands    Auto-ionization      H3O+ + •OH + e-aq       50
                                              Relaxation           H2O + ΔE                50
Dissociative attachment: H2O-                 Dissociative decay   •OH + OH- + H2          100

Ref.) Radiat Environ Biophys (2009) 48: 11-20
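
One way to read the fraction column is as branching probabilities for each electronic state. The sketch below samples a dissociation channel by inverting the cumulative fractions; the dictionary keys and the function name `sample_channel` are hypothetical shorthand, and this is not claimed to be how MPEXS-DNA or Geant4-DNA implement the step.

```python
import random

# Fractions (%) copied from the table above; keys are shorthand labels.
CHANNELS = {
    "ionization": [("H3O+ + OH", 100)],
    "A1B1": [("OH + H", 65), ("H2O + dE", 35)],
    "B1A1": [("H3O+ + OH + e-aq", 55), ("OH + OH + H2", 15), ("H2O + dE", 30)],
    "rydberg_diffuse": [("H3O+ + OH + e-aq", 50), ("H2O + dE", 50)],
    "attachment": [("OH + OH- + H2", 100)],
}


def sample_channel(state, rng):
    """Pick one channel for `state` by inverting the cumulative fractions."""
    u = rng.uniform(0.0, 100.0)
    cumulative = 0.0
    for products, fraction in CHANNELS[state]:
        cumulative += fraction
        if u < cumulative:
            return products
    return CHANNELS[state][-1][0]  # guards the edge case u == 100.0


rng = random.Random(2017)
counts = {}
for _ in range(10000):
    channel = sample_channel("B1A1", rng)
    counts[channel] = counts.get(channel, 0) + 1
print(counts)  # sampled frequencies should approach 55 / 15 / 30 %
```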

Chemical Phase for MPEXS-DNA

(1) Calculates the intermolecular distance (d) for all pairs.
  • Computation time increases as O(N^2/2).
    • kd-tree algorithm (Geant4-DNA)
    • Spreading over CUDA threads (MPEXS-DNA)
  Then, makes reactions for pairs with d < R.
(2) Finds the minimum distance among the remaining pairs, and calculates the time step (Δt).
(3) Diffuses molecules using Δt.
  • A CUDA thread transports a molecule.
(4) Loops over (1) ~ (3).

[Flowchart: d < R? Yes: make reaction. No: diffusion.]

Reaction radius (R) (by Smoluchowski model):

$R = \dfrac{k}{4 \pi N_A D}$

Species    Diffusion coefficient [m2/s]
H3O+       9.0E-09
H•         7.0E-09
OH-        5.0E-09
e-aq       4.9E-09
H2         4.8E-09
•OH        2.8E-09
H2O2       2.3E-09

Reactions                          Reaction rate [M-1s-1]
2e-aq + 2H2O -> H2 + 2OH-          5.00E+09
e-aq + •OH -> OH-                  2.95E+10
e-aq + H• + H2O -> OH- + H2        2.65E+10
e-aq + H3O+ -> H• + H2O            2.11E+10
e-aq + H2O2 -> OH- + •OH           1.44E+10
•OH + •OH -> H2O2                  4.40E+09
•OH + H• -> H2O                    1.44E+10
H• + H• -> H2                      1.20E+10
H3O+ + OH- -> 2H2O                 1.43E+10

Ref.) Radiat Environ Biophys (2009) 48: 11-20
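
As a worked example of the Smoluchowski reaction radius, the sketch below evaluates R = k / (4π N_A D) for the reaction e-aq + •OH -> OH- using the tabulated values. Two points are assumptions: D is taken as the sum of the two species' diffusion coefficients (the slide does not define D), and k is converted from M^-1 s^-1 to m^3 mol^-1 s^-1 so that R comes out in metres. The function name `reaction_radius_m` is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def reaction_radius_m(k_per_molar_s, d_sum_m2_s):
    """R = k / (4 pi N_A D), with k converted to m^3 mol^-1 s^-1."""
    k_si = k_per_molar_s * 1.0e-3  # 1 L = 1e-3 m^3
    return k_si / (4.0 * math.pi * AVOGADRO * d_sum_m2_s)


# e-aq + OH -> OH-, values from the tables on this slide.
k = 2.95e10          # reaction rate, M^-1 s^-1
d = 4.9e-9 + 2.8e-9  # m^2/s; summing both species is an assumption
radius_nm = reaction_radius_m(k, d) * 1.0e9
print(radius_nm)     # reaction radius in nanometres
```

With these inputs the radius is roughly 0.5 nm, so only pairs that approach to sub-nanometre distances react in step (1).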

MPEXS-DNA Physics and Chemical Performance

Comparison of G-value profile (CPU vs GPU)
✓ Line: Geant4-DNA
✓ Filled circle: MPEXS-DNA

G-value = (# of molecules) / (energy loss), given as # of molecules / 100 eV

[Figure: G-value (# of molecules / 100 eV) versus time (ps, 1 to 10^6) for p 20 MeV. Species: •OH, OH-, H3O+, e-aq, H2, H•, H2O2.]

Agrees with Geant4-DNA within ~ 3 %

Verifying with other simulation data: G-value (e- 750 keV)

[Figure: G-value (# of molecules / 100 eV) versus time (ps, 1 to 10^6) for e- 750 keV, in panels labeled "last week" and "new". Curves: e-aq, H2, •OH, and H2O2 from MPEXS-DNA compared with Partrac.]

Ref.) J. Radiat. Res., 46, 333–341 (2005)

[Figure: diffusion and chemical reactions after irradiating a water phantom with a 10 keV electron]

Code optimization for Tesla K40c GPU

• Fast math option (nvcc --use_fast_math)
  • ~ 1.2x speedup
• L1 cache (nvcc -Xptxas -dlcm=ca)
  • ~ 1.8x speedup
• CUDA Stream
  • For kernels without dependency in Physics Phase
    • Calculating cross-section value for each physical interaction
  • To use GPU resource fully in Chemical Phase

GPU Performance for MPEXS-DNA Simulation
Including Physics and Chemical Phases

Comparison of event number processed per 1 min.

                      e- 750 keV    p 20 MeV
Geant4-DNA (CPU)      13.48         2.57
MPEXS-DNA (GPU)       3279.47       932.82
Speedup               243x          363x

• Up to 360 times speedup against single-core Xeon CPU
• Process time for p 20 MeV (total ~15k events)
  • ~ 4 days (single-core Xeon CPU) -> ~ 16 min. (Tesla K40c GPU)

• GPU: NVIDIA, Tesla K40c, 2,880 cores, 745 MHz
• CPU: Intel, Xeon E5-2643 v2, 3.50 GHz
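
The quoted speedups and run times follow from the events-per-minute figures. The sketch below redoes the arithmetic, assuming the four bar values map to the beams as 13.48 and 3279.47 for e- 750 keV and 2.57 and 932.82 for p 20 MeV, which is the assignment consistent with the 243x and 363x labels.

```python
events_per_min = {
    "e- 750 keV": {"cpu": 13.48, "gpu": 3279.47},
    "p 20 MeV": {"cpu": 2.57, "gpu": 932.82},
}

# Speedup = GPU throughput / CPU throughput for each beam.
speedup = {
    beam: rates["gpu"] / rates["cpu"] for beam, rates in events_per_min.items()
}

n_events = 15000  # "~15k events" for p 20 MeV
cpu_days = n_events / events_per_min["p 20 MeV"]["cpu"] / 60 / 24
gpu_minutes = n_events / events_per_min["p 20 MeV"]["gpu"]

for beam, factor in speedup.items():
    print(beam, round(factor))
print(cpu_days, gpu_minutes)
```

The ratios reproduce the 243x and 363x labels, and 15k events at these rates take about 4 days on the CPU and about 16 minutes on the GPU.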

Performance Gain for Tesla P100 against Tesla K40c

Comparison of event number processed per 1 min. (preliminary result)

                      e- 750 keV    p 20 MeV
MPEXS-DNA (K40c)      3279.47       932.82
MPEXS-DNA (P100)      10053.09      3028.60
Gain                  3.06x         3.24x

• Adopted the same thread configuration as K40c in the simulation with P100
• More than 3 times performance gain against K40c

Summary

• MPEXS-DNA is an extension of MPEXS to DNA Physics.
  • Geant4-DNA suffers from long simulation times, which need to be improved.
• We have succeeded in drastically boosting computing performance for microdosimetry simulation using GPU power.
  • Up to 360 times speedup against single-core Xeon CPU for K40c
• A Tesla P100 is equivalent to ~ 1000 cores of Xeon CPU.
  • ~ 3 times performance gain against K40c without any optimization
  • Further performance improvement could be achieved by appropriate optimization.

In near future

• Developing "killer applications" based on MPEXS-DNA to estimate biological effects of radiation quantitatively
  • DNA single- and double-strand breaks
  • Cellular survival rate
  • Radiosensitization to tumor in radiation therapy (e.g. Gold nanoparticle; GNP)
  • …
• Extending MPEXS to "nuclear physics" and "thermal neutron physics"
  • Proton and carbon therapy
  • Boron Neutron Capture Therapy
  • Radiation shielding calculations
  • …

Acknowledgements

• Makoto Asai, SLAC

• Joseph Perl, SLAC

• Andrea Dotti, SLAC

• Takashi Sasaki, KEK

• Akinori Kimura, Ashikaga Institute of Technology

• Margot Gerritsen, ICME, Stanford

References

• N. Henderson, et al. A CUDA Monte Carlo simulator for radiation therapy dosimetry based on Geant4. <https://dx.doi.org/10.1051/snamc/201404204>

• K. Murakami, et al. Geant4 Based simulation of radiation dosimetry in CUDA. <https://dx.doi.org/10.1109/NSSMIC.2013.6829452>

• S. Okada, et al. GPU Acceleration of Monte Carlo Simulation at the cellular and DNA levels. <https://dx.doi.org/10.1007/978-3-319-23024-5_29>

• S. Agostinelli, et al. Geant4 - a simulation toolkit. <https://dx.doi.org/10.1016/S0168-9002(03)01368-8>

• M.A. Bernal, et al. Track structure modeling in liquid water: A review of the Geant4-DNA very low energy extension of the Geant4 Monte Carlo simulation toolkit. <https://dx.doi.org/10.1016/j.ejmp.2015.10.087>