Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1
![Page 1: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/1.jpg)
CHARMM-G, a GPU-based MD Simulation Code with PME and Reaction Force Field for Studying Large Membrane Regions
Narayan Ganesan (1), Sandeep Patel (2), and Michela Taufer (1)
(1) Computer and Info. Sciences Dept.
(2) Chemistry and Biochemistry Dept.
University of Delaware
![Page 2: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/2.jpg)
Outline
• Overview of forces in molecular dynamics
• Data structures and methodology
• PME for long distance electrostatic interactions
• Steps involved in PME calculations
• Performance and profiling of large membranes
• Related work and conclusions
![Page 3: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/3.jpg)
Classical Forces
• Bonded
  - Bonds
  - Angles
  - Dihedrals
• Non-Bonded
  - Electrostatic: Reaction Field (RF), PME
  - Van der Waals
![Page 4: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/4.jpg)
Bond Interactions
• Bond forces: act only within pairs of atoms
• Angle forces: act only within triads of atoms
• Torsion or dihedral forces: act only within quartets of atoms
![Page 5: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/5.jpg)
Non-bond Interactions: Van der Waals Potential
• Van der Waals or Lennard-Jones potential: decays rapidly with distance
$$E_{VDW} = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$
• A cutoff of ~10 Å accurately captures the effect of the Van der Waals potential
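A minimal sketch of a truncated Lennard-Jones energy, to show how little is discarded by a 10 Å cutoff. The function name and the parameter values (`eps`, `sig`) are hypothetical and chosen only for illustration; they are not taken from the CHARMM force field or from CHARMM-G.

```python
def lennard_jones(r, epsilon, sigma, cutoff=10.0):
    """Lennard-Jones energy 4*eps*((sigma/r)**12 - (sigma/r)**6), zero beyond the cutoff."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    if r > cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Hypothetical parameters: epsilon in kcal/mol, sigma and r in angstroms.
eps, sig = 0.15, 3.5
r_min = 2.0 ** (1.0 / 6.0) * sig                 # location of the energy minimum
well_depth = lennard_jones(r_min, eps, sig)      # energy at the minimum
tail_at_cutoff = lennard_jones(10.0, eps, sig)   # energy still present at the cutoff
```

With these illustrative values the energy at the cutoff is below one percent of the well depth, which is the sense in which the cutoff "captures" the potential.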
![Page 6: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/6.jpg)
Non-Bond Interactions: Electrostatic Potential
• Coulomb potential and force (the force obeys an inverse-square law):
$$U_{coulomb} = \frac{1}{4\pi\varepsilon_0}\,\frac{q_1 q_2}{|\mathbf{r}_1 - \mathbf{r}_2|}$$
$$F_{coulomb} = \frac{1}{4\pi\varepsilon_0}\,\frac{q_1 q_2}{|\mathbf{r}_1 - \mathbf{r}_2|^2}$$
• The potential decays as 1/r with distance
• Since 1/r decays rather slowly, the potential can act over long distances
• Choosing a cutoff for the electrostatic force/potential causes computational errors and inaccuracy
• Our solutions for summing long-distance electrostatic forces:
  - Reaction Force Field (RF)
  - Ewald summation / Particle Mesh Ewald (PME)
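The following sketch compares how much of a 1/r interaction survives at 10 Å relative to a 1/r^6 interaction. The function names are hypothetical, and the default constant `k` (roughly 332.06 kcal·Å/(mol·e²), standing in for 1/(4πε0)) is an assumption about units, not a value quoted in the slides.

```python
def coulomb_energy(q1, q2, r, k=332.0637):
    """Coulomb energy k*q1*q2/r (k assumed in kcal*angstrom/(mol*e^2))."""
    return k * q1 * q2 / r


def coulomb_force(q1, q2, r, k=332.0637):
    """Magnitude of the Coulomb force, k*q1*q2/r**2 (inverse-square law)."""
    return k * q1 * q2 / (r * r)


# Fraction of the value at 1 angstrom that remains at 10 angstroms.
coulomb_fraction = coulomb_energy(1.0, 1.0, 10.0) / coulomb_energy(1.0, 1.0, 1.0)
dispersion_fraction = (1.0 / 10.0) ** 6   # same comparison for a 1/r**6 term
```

A tenth of the Coulomb energy remains at the distance where the 1/r^6 term has dropped by six orders of magnitude, which is why a plain cutoff is acceptable for Van der Waals but not for electrostatics.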
![Page 7: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/7.jpg)
GPU Implementation: Data Structures
• A single thread is assigned to each atom
• For each atom a set of lists is maintained:
  - Bond list: stores the bonds the atom belongs to
  - Angle list: stores the angles the atom belongs to
  - Dihedral list: stores the dihedrals the atom belongs to
  - Nonbond list: stores non-bond interactions with atoms within the cutoff
[Figure: grid of charges q1 to q9]
Nonbond list for q5: (q2, r2), (q6, r6), (q8, r8), (q9, r9)
![Page 8: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/8.jpg)
MD Simulation
• MD simulations are iterative executions of MD steps
• Each iteration computes forces on each particle due to:
  - Bonds – Bond List
  - Angles – Angle List
  - Dihedrals – Dihedral List
  - Electrostatic and Van der Waals – Nonbond List
• If Ewald summation is used, an additional component is added: long-distance interaction using the PME method
• Bond, angle, and dihedral lists are unchanged for each atom throughout the simulation
• Nonbond list is updated based on a cutoff buffer
![Page 9: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/9.jpg)
Ways to Update Nonbond List
• Global neighbor list
  - Each thread can iterate through the global list of atoms to build the nonbond list
• Cell-based neighbor list
  - Divide the domain into equal cells of size = cutoff
  - Search only in the current cell and adjacent cells for neighboring atoms
  - There are 26 adjacent cells and 1 current cell in 3 dimensions
• Cell-based list is computationally very efficient but also needs regular cell updates
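The two strategies can be sketched side by side in serial Python. This is an illustration under simplifying assumptions (no periodic boundaries, one Python loop iteration standing in for one GPU thread); the function names and the random test system are hypothetical and do not reflect the CHARMM-G kernels.

```python
import random
from itertools import product


def _dist2(p, q):
    """Squared distance between two 3-D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def global_neighbor_list(positions, cutoff):
    """Each atom scans the global list of atoms: O(N^2) distance checks."""
    c2 = cutoff * cutoff
    return [sorted(j for j, pj in enumerate(positions)
                   if j != i and _dist2(pi, pj) <= c2)
            for i, pi in enumerate(positions)]


def cell_neighbor_list(positions, cutoff):
    """Bin atoms into cubic cells of edge = cutoff, then scan 27 cells per atom."""
    c2 = cutoff * cutoff
    cells = {}
    for i, p in enumerate(positions):
        key = tuple(int(x // cutoff) for x in p)
        cells.setdefault(key, []).append(i)
    neighbors = []
    for i, pi in enumerate(positions):
        cx, cy, cz = (int(x // cutoff) for x in pi)
        found = []
        # 26 adjacent cells plus the current cell.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                if j != i and _dist2(pi, positions[j]) <= c2:
                    found.append(j)
        neighbors.append(sorted(found))
    return neighbors


# Hypothetical test system: 200 atoms in a 30 x 30 x 30 angstrom box.
random.seed(0)
atoms = [tuple(random.uniform(0.0, 30.0) for _ in range(3)) for _ in range(200)]
same_result = global_neighbor_list(atoms, 10.0) == cell_neighbor_list(atoms, 10.0)
```

Because the cell edge equals the cutoff, any atom within the cutoff must sit in the current cell or one of its 26 neighbors, so both functions return the same lists while the cell-based one inspects far fewer candidates in large systems.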
![Page 10: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/10.jpg)
Cell Updates
• A single thread manages a single cell or a set of cells
• Each cell is managed by a list of the atoms in the cell, called ‘CellList’
• When an atom ‘i’ moves from Cell A to Cell B, the thread responsible for Cell A updates the list of Cell B via thread-safe integer atomic intrinsics
• Invalid atoms are removed from the cell lists by the ‘CellClean’ kernel
![Page 11: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/11.jpg)
Periodic Boundary Condition
[Figure: cell of interest (edge vectors ax, ay; charges q1 to q9) surrounded by periodic copies of itself, with the region of influence marked]
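One common way to evaluate distances under periodic boundary conditions is the minimum-image convention, sketched below. The slides only show the periodic copies and the region of influence, so the use of this convention, the orthorhombic box, and the function name are assumptions made for illustration.

```python
def minimum_image(ri, rj, box):
    """Displacement rj - ri wrapped to the nearest periodic image (orthorhombic box)."""
    wrapped = []
    for a, b, length in zip(ri, rj, box):
        delta = b - a
        delta -= length * round(delta / length)   # shift by whole box lengths
        wrapped.append(delta)
    return tuple(wrapped)


# Box dimensions of the small DMPC system, in angstroms.
box = (46.8, 46.8, 76.0)
d = minimum_image((1.0, 1.0, 1.0), (45.8, 2.0, 75.0), box)
```

Two atoms near opposite faces of the box end up only a few ångströms apart once the nearest image is used, which is how atoms in the cell of interest interact with the surrounding copies.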
![Page 12: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/12.jpg)
Reaction Force Field
• Any molecule is surrounded by a spherical cavity of finite radius
  - Within the radius, electrostatic interactions are calculated explicitly
  - Outside the cavity, the system is treated as a dielectric continuum
• This model allows the replacement of the infinite Coulomb sum by a finite sum plus the reaction field:
$$U_c = \frac{1}{4\pi\varepsilon_0}\sum_{i<j} q_i q_j\left(\frac{1}{r_{ij}} - \frac{B_0\, r_{ij}^2}{2R_c^3}\right)$$
where the first term is the Coulomb potential, the second term is the reaction field correction, and Rc is the radius of the cavity
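A direct, serial transcription of the reaction field sum, as a sketch only. The slides do not define B0 or give its value, so `b0` is passed in as a free parameter; the function name, the constant `k` standing in for 1/(4πε0), and the test values are hypothetical.

```python
def reaction_field_energy(charges, positions, r_cavity, b0, k=332.0637):
    """Sum over pairs i<j inside the cavity of q_i*q_j*(1/r - b0*r**2/(2*Rc**3))."""
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            r = r2 ** 0.5
            if r <= r_cavity:   # only pairs inside the cavity are treated explicitly
                total += charges[i] * charges[j] * (
                    1.0 / r - b0 * r2 / (2.0 * r_cavity ** 3))
    return k * total


# Hypothetical example: two opposite unit charges 2 angstroms apart, cavity radius 10.
pair_q = [1.0, -1.0]
pair_r = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
plain_coulomb = reaction_field_energy(pair_q, pair_r, 10.0, b0=0.0)
with_correction = reaction_field_energy(pair_q, pair_r, 10.0, b0=-1.0)
```

Setting `b0` to zero recovers the truncated Coulomb sum, which makes the size of the correction term easy to inspect.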
![Page 13: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/13.jpg)
Ewald Summation Method (I)
• Proposed by Paul Peter Ewald in 1921 for crystallographic systems
• Has found applications in molecular, astrophysical, and crystallographic systems
• Used to sum inverse-distance potentials over long distances efficiently, e.g., gravity and the Coulomb potential
• In use for numerical simulations since the late 1970s: O(N log N) instead of O(N x N)
![Page 14: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/14.jpg)
Ewald Summation Method
• Three contributions to the total energy, depending on the distance of the interaction:
  - Direct space (Edir)
  - Reciprocal space (Erec)
  - Self energy (Eself)
![Page 15: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/15.jpg)
Ewald Summation Method (II)
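• Divide interactions into short range (direct space) and long range (reciprocal space)
• Short range: direct space, using the nonbond list
$$E_{dir} = \frac{1}{4\pi\varepsilon_0}\sum_{i<j} q_i q_j\,\frac{\mathrm{erfc}(\beta\,|\mathbf{r}_i - \mathbf{r}_j|)}{|\mathbf{r}_i - \mathbf{r}_j|}$$
• Long range: Fourier space
$$E_{rec} = \frac{1}{2\pi V \varepsilon_0}\sum_{\mathbf{m}\neq 0}\frac{\exp(-\pi^2\mathbf{m}^2/\beta^2)}{\mathbf{m}^2}\,S(\mathbf{m})\,S(-\mathbf{m})$$
V: volume of the simulation region
S(m): structure parameters
β: Ewald splitting parameter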
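The short-range term can be sketched as a pair sum screened by the complementary error function. This is a non-periodic illustration; the function name, the splitting parameter `beta`, the cutoff handling, and the constant `k` standing in for 1/(4πε0) are assumptions, not the CHARMM-G implementation.

```python
import math


def ewald_direct_energy(charges, positions, beta, cutoff, k=332.0637):
    """Direct-space sum over pairs i<j of q_i*q_j*erfc(beta*r)/r within the cutoff."""
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(positions[i], positions[j])))
            if r <= cutoff:
                total += charges[i] * charges[j] * math.erfc(beta * r) / r
    return k * total


# The split behind the method: 1/r = erfc(beta*r)/r + erf(beta*r)/r.
# The erfc part decays quickly (short range); the erf part is smooth (long range).
beta, r = 0.3, 5.0
short_part = math.erfc(beta * r) / r
long_part = math.erf(beta * r) / r
```

Since the screened term is already small at a few ångströms for typical values of `beta`, it can be evaluated with the same nonbond list that is used for the Van der Waals interactions.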
![Page 16: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/16.jpg)
Steps in SPME
1. Put charges on grids
2. FFT of charge grid
3. Multiply with structure constants
4. FFT back
   Convolution yields the potential at grid points, which have to be summed
5. Compute the force on atom i by calculating $\partial U / \partial \mathbf{r}_i$
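Steps 2 to 4 amount to a periodic convolution carried out in Fourier space. The sketch below shows this in one dimension with a naive O(n²) transform standing in for an FFT library; the grid values and the kernel are hypothetical, and the kernel is not the actual Ewald influence function.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT library call)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for m in range(n):
        s = sum(v * cmath.exp(sign * 2j * cmath.pi * m * k / n)
                for k, v in enumerate(values))
        out.append(s / n if inverse else s)
    return out


def convolve_via_dft(charge_grid, kernel):
    """Steps 2-4: transform, multiply point-wise, transform back."""
    q_hat = dft(charge_grid)
    g_hat = dft(kernel)
    multiplied = [a * b for a, b in zip(q_hat, g_hat)]
    return [z.real for z in dft(multiplied, inverse=True)]


def convolve_direct(charge_grid, kernel):
    """Reference result: the same periodic convolution summed term by term."""
    n = len(charge_grid)
    return [sum(charge_grid[j] * kernel[(i - j) % n] for j in range(n))
            for i in range(n)]


# Step 1 stand-in: a tiny periodic charge grid (hypothetical values).
charge_grid = [0.0, 1.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]
kernel = [1.0, 0.5, 0.2, 0.1, 0.05, 0.1, 0.2, 0.5]   # illustrative only
potential = convolve_via_dft(charge_grid, kernel)
```

The transform route gives the same grid potential as the term-by-term sum; with a real FFT the cost drops from O(n²) to O(n log n), which is the point of doing the multiplication in reciprocal space.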
![Page 17: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/17.jpg)
Charge Spreading
• Each charge is spread on 4x4x4 = 64 grid points in 3-D
  - Grid spacing: 1 Å
  - Spreading by a cardinal B-spline of order 4
  - Creates a 3-dimensional charge matrix “Q”
• Mesh-based charge density
  - Approximation by sum of charges at each grid point
  - Multiple charges can influence a single lattice point
$$Q(k_1,k_2,k_3) = \sum_{i=1}^{n} q_i\, M_4(x_i - k_1)\, M_4(y_i - k_2)\, M_4(z_i - k_3)$$
x_i, y_i, z_i: position of the ith charge; k_1, k_2, k_3: index of the lattice point
Essmann et al., J. Chem. Phys. 1995
![Page 18: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/18.jpg)
Cardinal B-Spline of Order 4
• B-spline has a region of influence of 4 units
  - Each unit = 1 Å
• During charge spreading, the B-spline has an impact on the neighboring 4x4x4 cells in 3 dimensions
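A small sketch of the order-4 cardinal B-spline, using the standard recursion on the spline order, and of the four weights a charge receives along one axis. The function names are hypothetical, and positions are assumed to be expressed in grid units (1 Å spacing).

```python
import math


def cardinal_bspline(u, order=4):
    """Cardinal B-spline M_order(u); nonzero only for 0 < u < order."""
    if order == 2:
        return max(0.0, 1.0 - abs(u - 1.0))
    n = order
    return (u * cardinal_bspline(u, n - 1)
            + (n - u) * cardinal_bspline(u - 1.0, n - 1)) / (n - 1)


def spline_weights(x, order=4):
    """Weights M(x - k) for the `order` grid indices k that the spline touches."""
    k_max = math.floor(x)
    return {k: cardinal_bspline(x - k, order)
            for k in range(k_max - order + 1, k_max + 1)}


weights = spline_weights(5.3)          # a charge at x = 5.3 grid units
total_weight = sum(weights.values())   # the weights form a partition of unity
```

Four weights per axis give the 4x4x4 = 64 grid points per charge, and because the weights along each axis sum to one, the total charge on the grid equals the charge that was spread.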
![Page 19: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/19.jpg)
CPU vs. GPU Charge Spreading
• Charge spreading by a cardinal B-spline of order 4:
[Figure: charges spread over a unit cell]
• CPU implementation is straightforward
  - Computation time: N_atoms x 4 x 4 x 4 steps
• GPU implementation is hard to parallelize
  - Can lead to race conditions; needs floating-point atomic writes
• Current version of CUDA supports atomic writes for integers only
  - Charges need to be converted to fixed point in order to use this functionality
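The fixed-point workaround can be sketched as follows: each contribution is scaled to an integer so that it could be accumulated with an integer atomic add. The number of fractional bits and the sample contributions are hypothetical; the slides do not state which scaling was used.

```python
FRAC_BITS = 20               # hypothetical number of fractional bits
SCALE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a floating-point charge contribution to a scaled integer."""
    return int(round(x * SCALE))


def from_fixed(n):
    """Convert an accumulated integer back to floating point."""
    return n / SCALE


contributions = [0.1234, -0.5, 0.333333, 0.0421, -0.25]
grid_point = 0
for c in contributions:      # on a GPU, each add would be an integer atomic add
    grid_point += to_fixed(c)
recovered = from_fixed(grid_point)

# Integer addition is order-independent, whereas floating-point addition is not.
float_order_matters = (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
```

Besides enabling integer atomics, the integer sum does not depend on the order in which threads arrive, at the price of a rounding error of at most half a unit in the last fixed-point place per contribution.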
![Page 20: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/20.jpg)
CPU vs. GPU Charge Spreading
[Figure panels: CPU spreading of charges; GPU gathering of charges by a cardinal B-spline of order 4, with each thread assigned to a lattice point]
• Charge spreading on the GPU can be parallelized easily over the grid points instead of the atoms
• Each thread works on a single grid point or a set of grid points
• Needs O(ax*ay*az) threads, with each thread parsing through all the atoms within its 4x4x4 neighborhood –> O(N)
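The difference between atom-centric spreading and grid-centric gathering can be sketched in serial Python, where each iteration of the outer loop in `gather` plays the role of one GPU thread. The function names, the periodic 8x8x8 grid, and the three sample charges are hypothetical.

```python
import math
from itertools import product


def m4(u, order=4):
    """Cardinal B-spline via the standard recursion; M_4 by default."""
    if order == 2:
        return max(0.0, 1.0 - abs(u - 1.0))
    return (u * m4(u, order - 1) + (order - u) * m4(u - 1.0, order - 1)) / (order - 1)


def support(x):
    """The four grid indices along one axis touched by a charge at x."""
    return range(math.floor(x) - 3, math.floor(x) + 1)


def scatter(atoms, n):
    """Atom-centric: every atom writes to 64 grid points (write conflicts on a GPU)."""
    grid = {}
    for q, (x, y, z) in atoms:
        for k1, k2, k3 in product(support(x), support(y), support(z)):
            key = (k1 % n, k2 % n, k3 % n)
            grid[key] = grid.get(key, 0.0) + q * m4(x - k1) * m4(y - k2) * m4(z - k3)
    return grid


def gather(atoms, n):
    """Grid-centric: every lattice point sums over the atoms in its own list."""
    lists = {}   # per-lattice-point neighbor lists
    for idx, (_, (x, y, z)) in enumerate(atoms):
        for k1, k2, k3 in product(support(x), support(y), support(z)):
            lists.setdefault((k1 % n, k2 % n, k3 % n), []).append((idx, k1, k2, k3))
    grid = {}
    for point, entries in lists.items():   # one thread per lattice point, no conflicts
        total = 0.0
        for idx, k1, k2, k3 in entries:
            q, (x, y, z) = atoms[idx]
            total += q * m4(x - k1) * m4(y - k2) * m4(z - k3)
        grid[point] = total
    return grid


# Hypothetical charges: (q, (x, y, z)) with positions in grid units.
atoms = [(1.0, (2.3, 4.7, 1.1)), (-1.0, (2.9, 5.2, 0.4)), (0.5, (6.5, 6.5, 6.5))]
spread_grid = scatter(atoms, 8)
gathered_grid = gather(atoms, 8)
```

Both routes produce the same charge matrix; the gather form gives each lattice point a single writer, so no floating-point atomic writes are needed, but it requires per-lattice-point neighbor lists to be maintained.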
![Page 21: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/21.jpg)
GPU Charge Spreading (I)
• Each lattice point maintains a list of atoms within its 4x4x4 neighborhood for charge gathering
[Figure: a lattice point with charges 1, 2, and 3 in its neighborhood]
Neighbor list of point: (q1, r1), (q2, r2), (q3, r3)
The effects of charges 1, 2, and 3 are gathered at the lattice point
![Page 22: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/22.jpg)
GPU Charge Spreading (II)
[Figure: charge 2 moves to position 2'; lattice points are shaded dark gray, light gray, or white]
• When a charge moves, several lattice points need to be updated
• The charge is added to the neighbor list of lattice points in dark gray
• The charge is removed from the neighbor list of lattice points in light gray
• Lattice points in white are not affected
• Since there are equal numbers of light gray and dark gray lattice points, a 1-to-1 mapping was devised
  - The threads for lattice points in light gray update the lists of lattice points in dark gray in a 1-to-1 fashion
![Page 23: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/23.jpg)
GPU Charge Spreading (III)
[Figure: charges 1 and 2 move to positions 1' and 2']
• When a single lattice point is updated by multiple threads, thread-safe integer atomic intrinsics are used to update the cell lists
![Page 24: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/24.jpg)
Fast Fourier Transform
• CUFFT provides library functions to compute the FFT and inverse FFT
  - 3D FFT implemented with a series of 1D FFTs and transpositions
• CUFFTExec can be optimized by choosing proper FFT dimensions
  - Power of 2
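One simple way to obtain power-of-two FFT dimensions is to round the number of grid points along each axis up to the next power of two. The sketch below assumes a target spacing of about 1 Å; the function names are hypothetical, and the slides do not state how the grid sizes in CHARMM-G were actually chosen.

```python
import math


def next_power_of_two(n):
    """Smallest power of two that is >= n (n must be a positive integer)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def fft_grid_size(box_length, spacing=1.0):
    """Power-of-two grid count whose spacing is no coarser than `spacing`."""
    points = math.ceil(box_length / spacing)
    return next_power_of_two(points)


# Box edges of the large DMPC system, in angstroms.
large_grid = tuple(fft_grid_size(edge) for edge in (93.6, 93.6, 152.0))
```

Rounding up keeps the box size unchanged and makes the effective grid spacing slightly finer than the target, at the cost of a larger charge matrix.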
![Page 25: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/25.jpg)
Scientific Challenge
• One-third of the human genome codes for membrane-bound proteins
• Pharmaceuticals target membrane-bound protein receptors, e.g., G-protein coupled receptors
  - Importance of these systems to human health and to the understanding of dysfunction
• State-of-the-art simulations only consider small regions (or patches) of physiological membranes
• Heterogeneity of the membrane spans length scales much larger than those included in these smaller model systems
• Our goal: apply large-scale GPU-enabled computations to the study of large membrane regions
![Page 26: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/26.jpg)
DMPC
• DiMyristoylPhosphatidylCholine (DMPC) lipid bilayers
  - Explicit solvent, i.e., water
[Figure: membrane in a simulation box labeled 92 Å x 92 Å x 152 Å]
Small system: 17 004 atoms, 46.8 Å x 46.8 Å x 76.0 Å
Large system: 68 484 atoms, 93.6 Å x 93.6 Å x 152.0 Å
![Page 27: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/27.jpg)
Performance
Case studies: global neighbor list and RF (I), cell-based neighbor list and RF (II), global neighbor list and PME (III), and cell-based neighbor list and PME (IV)
[Figure panels: small membrane (17 004 atoms); large membrane (68 484 atoms)]
![Page 28: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/28.jpg)
Kernel Profiling (I)
• Large membrane – RF method
[Figure panels: global neighbor list; cell-based neighbor list]
![Page 29: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/29.jpg)
Kernel Profiling (II)
• Large membrane – PME method
[Figure panels: global neighbor list; cell-based neighbor list]
![Page 30: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/30.jpg)
Related Work
• Other MD code including the PME method:
  - M. J. Harvey and G. De Fabritiis, J. Chem. Theory and Comp., 2009
• Our implementation is different in terms of:
  - Charge spreading algorithm
  - Force field methods, including RF
![Page 31: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/31.jpg)
Conclusions and Future Work
• CHARMM-G is a flexible MD code based on the CHARMM force field, integrating:
  - Ewald summation
  - Reaction force field
• The code supports explicit solvent representations and enables fast simulations of large membrane regions
• Improvements of the CUDA FFT will further improve the performance presented in the paper
• Future work includes:
  - Code optimizations and parallelization across multiple GPUs
  - Scientific characterization of large membranes
![Page 32: Narayan Ganesan 1 , Sandeep Patel 2 , and Michela Taufer 1 Computer and Info. Sciences Dept. 1](https://reader036.fdocuments.us/reader036/viewer/2022062305/56816361550346895dd42f0e/html5/thumbnails/32.jpg)
Acknowledgements
Sponsors:
Collaborators: Sandeep Patel, Brad A. Bauer, Joseph E. Davis (Dept. of Chemistry, UD)
Related work:
Bauer et al., JCC 2010 (In Press)
Davis et al., BICoB 2009
More questions: [email protected]
GCL members in Spring 2010:
Trilce Estrada, Boyu Zhang, Abel Licon, Narayan Ganesan, Lifan Xu, Philip Saponaro, Maria Ruiz, Michela Taufer