Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science...
![Page 1: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/1.jpg)
Modeling molecular dynamics from simulations
Nina Singhal Hinrichs
Departments of Computer Science and Statistics
University of Chicago
January 28, 2009
![Page 2: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/2.jpg)
Motivation
• Proteins are essential parts of living organisms
– enzymes, cell signaling, membrane transport . . .
• Composed of chain of amino acids
• Fold to unique 3-dimensional structure
• Misfolding can cause diseases
– Alzheimer’s, Mad cow, Huntington’s . . .
• How do proteins fold?
![Page 3: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/3.jpg)
Molecular dynamics
• Represent atoms of molecule and solvent
• Model forces on atoms
• Integrate laws of motion
• Small integration time step compared to motion timescales
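As a rough illustration of the "integrate laws of motion" step, here is a minimal sketch of velocity Verlet, one common integrator for Newton's equations. The slides do not say which integrator or force model was used, so the one-dimensional harmonic force, the function name `velocity_verlet`, and the choice of 100 steps per oscillation period are all assumptions made for illustration.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1-D with the velocity Verlet scheme."""
    f = force(x)
    trajectory = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append(x)
    return x, v, trajectory


# Hypothetical test system: a harmonic spring (not a real molecular force field).
k, mass = 1.0, 1.0
period = 2.0 * math.pi * math.sqrt(mass / k)
dt = period / 100.0                        # time step much smaller than the motion timescale

x_end, v_end, traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, mass, dt, 100)
energy_end = 0.5 * mass * v_end**2 + 0.5 * k * x_end**2
```

After one full period the particle should be back near its starting position with nearly unchanged energy, which is why the time step has to be small compared with the fastest motion in the system.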
![Page 4: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/4.jpg)
Folding@Home: Distributed computing for biomolecular simulation
• Perform multiple simulations in parallel
• Total simulation times – hundreds of microseconds (hundreds of CPU-years)
• Very powerful computational resource
– ~200 Teraflops sustained performance
– >1,000,000 total CPUs; 200,000 active
![Page 5: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/5.jpg)
Challenge: How to analyze?
• Enormous datasets
– Describe dynamics in microscopic detail
• Questions we want to answer
– Rate of folding, mechanism of folding . . .
• How can we extract these properties from our data?
![Page 6: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/6.jpg)
Outline
• Markovian state model for molecular motion
– Model description, uses, examples
• New algorithms for building these models
– Defining states and transition probabilities
• New methods for dealing with finite sampling
– Model complexity, uncertainty analysis, targeted sampling
![Page 7: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/7.jpg)
Chemical intuition
Chemical reactions often exhibit stochastic behavior
n-butane
Chandler, Journal of Chemical Physics (1977)
![Page 8: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/8.jpg)
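Markovian state model
Define states in the conformation space
Define transition probabilities, or edges, between states
$$\begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & & \vdots \\ \vdots & & \ddots & \\ p_{N1} & \cdots & & p_{NN} \end{pmatrix}$$
[Figure: network of five states, numbered 1 to 5, connected by edges]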
![Page 9: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/9.jpg)
Uses of the model
• Populations of states over time
• Eigenvalues and eigenvectors – conformational changes
• Kinetic properties – virtually any kinetic property
• Mechanistic properties – most likely path, probability of transitions as graph algorithms
Chodera et al., Multiscale Modeling and Simulation (2006)
[Figure: state populations p plotted against time t]
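The first two uses listed on this slide can be sketched with a few lines of NumPy. The 3-state transition matrix below is made up for illustration and does not come from the talk; the relation between an eigenvalue and a relaxation timescale, timescale = -lag / ln(eigenvalue), is the standard one for Markov chains and is assumed here rather than taken from the slides.

```python
import numpy as np

# Hypothetical row-stochastic transition matrix (each row sums to 1).
P = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])


def propagate(p0, P, n_steps):
    """Return the state populations after 0, 1, ..., n_steps steps."""
    p = np.asarray(p0, dtype=float)
    history = [p]
    for _ in range(n_steps):
        p = p @ P                      # p(t + 1) = p(t) P
        history.append(p)
    return np.array(history)


# Populations over time, starting with everything in state 0.
populations = propagate([1.0, 0.0, 0.0], P, 200)

# Left eigenvectors of P are the right eigenvectors of P transposed.
eigvals, left_vecs = np.linalg.eig(P.T)
order = np.argsort(-eigvals.real)
eigvals = eigvals.real[order]

# The eigenvector for eigenvalue 1 is the stationary distribution.
stationary = left_vecs[:, order[0]].real
stationary = stationary / stationary.sum()

# Slower processes have eigenvalues closer to 1 and longer timescales.
lag_time = 1.0                         # assumed lag, in arbitrary time units
timescales = -lag_time / np.log(eigvals[1:])
```

The remaining eigenvectors describe which states exchange population on each timescale, which is how they connect to conformational changes.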
![Page 10: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/10.jpg)
Example models
alanine peptide: Chodera et al., Multiscale Modeling and Simulation (2006)
lipid vesicle fusion: Kasson et al., PNAS (2006)
alpha helix: Sorin and Pande, Biophysical Journal (2005)
villin headpiece: Jayachandran et al., Journal of Structural Biology (2006)
![Page 11: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/11.jpg)
Computational and statistical challenges
• Building Markovian state model
– Defining states that are Markovian
– Calculating the transition probabilities
• Refining Markovian state model
– Finding the best model
– Determining model uncertainty
– Designing new simulations
[Figure: five-state network and its transition matrix, with entries p11, p12, ..., p1N, p21, p22, ..., pN1, ..., pNN]
![Page 12: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/12.jpg)
Automatic state decomposition
• Challenge: Find appropriate states
• Individual conformations as states does not scale
• Group conformations into discrete states
• Structural clustering is insufficient
• Basic algorithm – combine structural and kinetic similarity
J. D. Chodera*, N. Singhal*, V. S. Pande, K. A. Dill, and W. C. Swope. Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. Journal of Chemical Physics, 126, 155101 (2007). (*These authors contributed equally to this work)
![Page 13: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/13.jpg)
Comparison of structural and kinetic clustering
[Figure panels: structural clustering; kinetic clustering]
trpzip2: Cochran et al., PNAS 98:5578 (2001)
![Page 14: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/14.jpg)
State decomposition – splitting
Cluster conformations by root mean square distance (RMSD)
![Page 15: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/15.jpg)
State decomposition – lumping
group states which inter-convert quickly
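One simple way to picture the lumping step is a greedy merge of whichever two states exchange population fastest. The slides do not state the actual lumping criterion, so the pairwise rule below is a stand-in chosen for brevity, and the function names and the 4-microstate count matrix are hypothetical.

```python
import numpy as np


def lump_fastest_pair(counts, labels):
    """Merge the two states that inter-convert most quickly (one lumping step)."""
    counts = np.asarray(counts, dtype=float)
    probs = counts / counts.sum(axis=1, keepdims=True)
    exchange = probs + probs.T               # symmetric measure of i <-> j exchange
    np.fill_diagonal(exchange, -np.inf)      # ignore self-transitions
    i, j = np.unravel_index(np.argmax(exchange), exchange.shape)
    i, j = min(i, j), max(i, j)

    merged = counts.copy()
    merged[i, :] += merged[j, :]             # transitions out of j now leave i
    merged[:, i] += merged[:, j]             # transitions into j now enter i
    merged = np.delete(np.delete(merged, j, axis=0), j, axis=1)

    new_labels = [lab for k, lab in enumerate(labels) if k != j]
    new_labels[i] = labels[i] + labels[j]
    return merged, new_labels


def lump(counts, n_macrostates):
    """Repeat the greedy merge until n_macrostates remain."""
    counts = np.asarray(counts, dtype=float)
    labels = [[k] for k in range(len(counts))]
    while len(labels) > n_macrostates:
        counts, labels = lump_fastest_pair(counts, labels)
    return counts, labels


# Hypothetical microstate counts: 0 <-> 1 and 2 <-> 3 exchange quickly,
# while transitions between the two pairs are rare.
micro_counts = [[80, 18, 2, 0],
                [20, 78, 1, 1],
                [1, 1, 70, 28],
                [0, 2, 30, 68]]
macro_counts, macro_labels = lump(micro_counts, 2)
```

In this toy case the two fast-exchanging pairs end up as the two lumped states, leaving only the slow transitions between them.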
![Page 16: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/16.jpg)
State decomposition – resplitting
Cluster conformations, restricted to each state
![Page 17: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/17.jpg)
Blocked alanine peptide
[Figure: six states, numbered 1 to 6, of the blocked alanine peptide; axis ticks at -60 and 60]
Chodera et al., Multiscale Modeling and Simulation (2006)
![Page 18: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/18.jpg)
Automatic state decomposition of alanine peptide
Black state sits on top of multiple other states!
Benefit of automatic algorithm
These conformations had an unusual peptide bond
![Page 19: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/19.jpg)
Stability of decomposition
![Page 20: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/20.jpg)
TrpZip peptide
![Page 21: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/21.jpg)
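Transition probabilities
Discretize trajectories into series of states
1223435
Count number of transitions between all pairs of states (transition counts)
$$\begin{pmatrix} z_{11} & z_{12} & \cdots & z_{1N} \\ z_{21} & z_{22} & & \vdots \\ \vdots & & \ddots & \\ z_{N1} & \cdots & & z_{NN} \end{pmatrix}$$
Normalize to obtain the transition probabilities
$$\begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & & \vdots \\ \vdots & & \ddots & \\ p_{N1} & \cdots & & p_{NN} \end{pmatrix}$$
[Figure: trajectory passing through states 1 to 5]
N. Singhal, C. D. Snow, and V. S. Pande. Using path sampling to build better Markovian state models: Predicting the folding rate and mechanism of a trp zipper beta hairpin. Journal of Chemical Physics, 121(1), 415-425 (2004).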
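The count-then-normalize recipe can be sketched directly on the example trajectory 1223435 from this slide. The function names and the `lag` parameter are assumptions for illustration; states are shifted to 0-based indices, and a state with no observed outgoing transitions is left as a row of zeros rather than guessed.

```python
import numpy as np


def count_transitions(trajectory, n_states, lag=1):
    """Count transitions i -> j between frames separated by `lag` steps."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for i, j in zip(trajectory[:-lag], trajectory[lag:]):
        counts[i, j] += 1
    return counts


def row_normalize(counts):
    """Turn counts z_ij into probabilities p_ij = z_ij / sum_k z_ik."""
    counts = np.asarray(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observations stay at zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Discretized trajectory from the slide, converted to 0-based state indices.
trajectory = [int(c) - 1 for c in "1223435"]
counts = count_transitions(trajectory, n_states=5)
probs = row_normalize(counts)
```

With only six observed transitions, most entries are zero and the nonzero rows are very coarse estimates, which is the finite-sampling problem the later slides address.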
![Page 22: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/22.jpg)
Model selection
• Challenge: How many states should we have?
– More states are more Markovian
– More states have more parameters
• How do we evaluate this tradeoff?
N. S. Hinrichs and V. S. Pande. Bayesian metrics for validating and improving Markovian state models for molecular dynamics simulations. (In preparation)
![Page 23: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/23.jpg)
Hidden Markov Model formulation
• Formulate the problem as a Hidden Markov Model structure scoring question
• Different discretizations of continuous space
• Benefits of Bayesian scores
– Naturally handles tradeoff between complexity of model and amount of data
– Avoids over-fitting of parameters
[Figure: hidden Markov model with States and Observations layers]
![Page 24: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/24.jpg)
Alanine peptide results
Score of Hidden Markov models for different lag times
Last model is worse at shorter times but preferred at longer times
No previous evaluation methods could distinguish these models
![Page 25: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/25.jpg)
Uncertainty analysis
Goal: Once we have the states, what is the uncertainty in the model?
[Figure: two versions of the five-state network]
Both are reasonable but give different transition probabilities
Different MFPT, Pfold, eigenvalues, eigenvectors ...
Uncertainty caused by finite sampling
N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, 204909-204921 (2005).
N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, 244101 (2007).
![Page 26: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/26.jpg)
Transition probabilities
Recall that we calculate transition probabilities by counting:
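$$p_{ij} = \frac{z_{ij}}{\sum_k z_{ik}}$$
Instead of getting a single value, we can talk about the distribution of transition probabilities
Bayes’ Rule:
$$P(p_{i*} \mid \text{counts}) \propto P(\text{counts} \mid p_{i*})\, P(p_{i*})$$
[Figure: distribution of pij for a state i with outgoing transition counts of 70 and 30 (to states j and k), compared with counts of 700 and 300]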
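To make the 70/30 versus 700/300 comparison concrete, the sketch below assumes a multinomial likelihood with a uniform Dirichlet prior, so the posterior over a row of transition probabilities is Dirichlet with parameters equal to counts plus one. The prior and the function name `dirichlet_posterior` are assumptions; the slide only states Bayes' rule.

```python
import numpy as np


def dirichlet_posterior(counts, prior=1.0):
    """Posterior mean and variance of each p_ij in one row.

    Assumes a Dirichlet(prior, ..., prior) prior, so the posterior is
    Dirichlet(counts + prior).
    """
    alpha = np.asarray(counts, dtype=float) + prior
    total = alpha.sum()
    mean = alpha / total
    var = alpha * (total - alpha) / (total**2 * (total + 1.0))
    return mean, var


mean_small, var_small = dirichlet_posterior([70, 30])
mean_large, var_large = dirichlet_posterior([700, 300])

# Ratio of posterior standard deviations for the first transition.
std_ratio = np.sqrt(var_small[0] / var_large[0])
```

Both count sets give nearly the same mean, but ten times more data narrows the posterior standard deviation by roughly the square root of ten.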
![Page 27: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/27.jpg)
Sampling approach
Possible solution to get distribution of eigenvalues:
[Figure: each sampled matrix [pij] → solve for eigenvalue, repeated for every sample]
Problem:
• sampling can be expensive
• solving per sample can be expensive
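A minimal sketch of this sampling approach, assuming each row of the transition matrix is drawn independently from a Dirichlet posterior with a uniform prior. The count matrix, the seed, and the function name are hypothetical, and the second-largest eigenvalue is used as the quantity of interest.

```python
import numpy as np


def sample_second_eigenvalue(counts, n_samples, prior=1.0, seed=0):
    """Draw transition matrices from the posterior and solve each for its
    second-largest eigenvalue."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    samples = np.empty(n_samples)
    for s in range(n_samples):
        # One Dirichlet draw per row gives one plausible transition matrix.
        P = np.vstack([rng.dirichlet(row + prior) for row in counts])
        # Sampled matrices need not be reversible, so keep only real parts.
        eig = np.sort(np.linalg.eigvals(P).real)[::-1]
        samples[s] = eig[1]
    return samples


# Hypothetical 3-state count matrix, not data from the talk.
counts = [[900, 100, 0],
          [50, 900, 50],
          [0, 100, 900]]
samples = sample_second_eigenvalue(counts, n_samples=500)
eig_mean, eig_std = samples.mean(), samples.std()
```

Every sample costs one full eigenvalue solve, which is why this approach becomes slow as the number of states grows.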
![Page 28: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/28.jpg)
Closed-form solution
Idea: trade exact distribution for efficient approximation
Eigenvalue equation: $\det(A) = 0$, where $A = P - \lambda I$
Taylor series expansion:
[Equation: first-order expansion of the eigenvalue in the transition probabilities p11, p12, ..., pNN]
• efficient to calculate using adjoint systems
• Multivariate normal approximation of $p_{i*}$
• Closed-form normal distribution for the eigenvalue
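The closed-form idea can be sketched as a first-order (delta method) variance estimate. This sketch swaps in the standard perturbation formula for a simple eigenvalue, d(lambda)/d(p_ij) = l_i r_j / (l · r) with left and right eigenvectors l and r, instead of the adjoint-system calculation named on the slide, and it assumes independent Dirichlet posteriors with a uniform prior for each row. The function names and the count matrix are hypothetical.

```python
import numpy as np


def eigenvalue_sensitivity(P, index=1):
    """Eigenvalue number `index` (0 = largest) and its derivative w.r.t. every p_ij."""
    vals, right = np.linalg.eig(P)
    k = np.argsort(-vals.real)[index]
    lam = vals[k].real
    r = right[:, k].real
    # Left eigenvectors of P are right eigenvectors of P transposed.
    vals_left, left = np.linalg.eig(P.T)
    l = left[:, np.argmin(np.abs(vals_left - vals[k]))].real
    sensitivity = np.outer(l, r) / (l @ r)
    return lam, sensitivity


def closed_form_variance(counts, prior=1.0, index=1):
    """First-order variance of an eigenvalue, split into per-state terms."""
    alpha = np.asarray(counts, dtype=float) + prior
    totals = alpha.sum(axis=1)
    P_mean = alpha / totals[:, None]
    lam, S = eigenvalue_sensitivity(P_mean, index)
    per_state = np.empty(len(alpha))
    for i in range(len(alpha)):
        m = P_mean[i]
        # Covariance of a Dirichlet-distributed row.
        cov = (np.diag(m) - np.outer(m, m)) / (totals[i] + 1.0)
        per_state[i] = S[i] @ cov @ S[i]
    return lam, per_state


# Hypothetical 3-state count matrix, not data from the talk.
counts = [[900, 100, 0],
          [50, 900, 50],
          [0, 100, 900]]
lam, per_state = closed_form_variance(counts)
eig_std = np.sqrt(per_state.sum())
```

Only two eigenvector solves are needed regardless of how many samples a sampling-based estimate would draw, and the result already comes split into one variance term per state.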
![Page 29: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/29.jpg)
Uncertainty results
5000 trajectories from each state
Alanine System Transition Counts
4926    62     0     0     5     7
  18  4978     4     0     0     0
   0     3  4823   158    13     3
   0     0   226  4604     1   169
   0     0     0     1  4788   211
002151534380
[Figure: six-state decomposition of the alanine peptide, states 1 to 6]
Running times (6 states): sampling-based 40 seconds; closed-form < 0.01 seconds
Running times (87 states): sampling-based 3600 seconds; closed-form < 0.07 seconds
![Page 30: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/30.jpg)
Sampling strategies
Problem: Simulations are expensive. Even with Folding@Home, we run simulations for months
How to intelligently allocate our resources?
Common approaches:
• equilibrium sampling – sample each conformation from its equilibrium distribution
• even sampling – sample equally from each state
New sequential approaches
N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, 204909-204921 (2005).
N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, 244101 (2007).
![Page 31: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/31.jpg)
Adaptive sampling
Goal: Reduce uncertainty of eigenvalue
Uncertainty analysis decomposes by transitions from each state
[Equation: variance of the eigenvalue written as a sum of per-state terms, one for the transition probabilities out of each state 1, 2, ..., N]
Variance depends on both uncertainty of and sensitivity to transition probabilities
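Because the variance splits into one term per state, a simple selection rule is to start the next simulation from the state whose term is expected to shrink the most. The sketch below assumes each term scales as 1 / (total + 1), where total is that state's Dirichlet parameter sum, and that sensitivities and mean probabilities barely change after one extra sample, so the expected reduction is the current term divided by (total + 2). The rule, the function name, and the numbers are illustrative assumptions rather than the exact procedure from the talk.

```python
import numpy as np


def choose_next_state(per_state_variance, dirichlet_totals):
    """Pick the state where one more sample is expected to cut variance most.

    If a state's variance term is q / (total + 1), one more sample gives
    q / (total + 2), a reduction of term / (total + 2).
    """
    v = np.asarray(per_state_variance, dtype=float)
    totals = np.asarray(dirichlet_totals, dtype=float)
    expected_reduction = v / (totals + 2.0)
    return int(np.argmax(expected_reduction)), expected_reduction


# Hypothetical per-state variance terms and sample totals.
variance_terms = [4e-4, 1e-4, 9e-4]
totals = [100, 10, 1000]
next_state, reductions = choose_next_state(variance_terms, totals)
```

Here the rule picks the poorly sampled middle state even though the third state contributes the largest variance, because the third state is already so well sampled that one more trajectory changes little.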
![Page 32: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/32.jpg)
Adaptive sampling – alanine
On 6-state alanine system, select trajectories randomly for 3 sampling strategies
Transition Counts
4926    62     0     0     5     7
  18  4978     4     0     0     0
   0     3  4823   158    13     3
   0     0   226  4604     1   169
   0     0     0     1  4788   211
002151534380
![Page 33: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/33.jpg)
Adaptive sampling – villin
• Benefits
– Very quickly reduce the variance
– Reduce the total number of simulations
– Need less computational power
– Can study more complex systems
Villin Headpiece (2454 states): Jayachandran, et al., Journal of Chemical Physics (2006)
![Page 34: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/34.jpg)
Summary
• Markovian state models are convenient methods to describe molecular motion
• Automatic state decomposition
– Scalable to large size systems
• Model selection
– Evaluate tradeoff between model complexity and amount of data
• Uncertainty analysis
– Efficient and decomposable
• Adaptive sampling
– Reduce number of simulations
![Page 35: Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.](https://reader033.fdocuments.us/reader033/viewer/2022051214/56649e035503460f94aedbab/html5/thumbnails/35.jpg)
Acknowledgements
• Vijay Pande – Stanford University adviser
• Bill Swope, Jed Pitera – IBM collaborators
• John Chodera – state decomposition work