All-atom molecular simulations of protein folding and unfolded-state dynamics and
structure with accelerated calculations on GPU
Cezary Czaplewski, Faculty of Chemistry, University of Gdańsk, Poland
The 10th Protein Folding Winter School, KIAS, February 7-11, 2011
Molecular Simulation of ab Initio Protein Folding for
a Millisecond Folder NTL9(1-39)
Vincent A. Voelz (1), Gregory R. Bowman (2), Kyle Beauchamp (2), Vijay S. Pande (1,2,3)
(1) Department of Chemistry, Stanford University; (2) Biophysics Program, Stanford University;
(3) Department of Structural Biology, Stanford University
J. AM. CHEM. SOC. 2010, 132, 1526–1528
• Computer simulations, validated by experiment, can help gain a complete understanding of how proteins fold.
• Folding rates span more than a million-fold range, which suggests possible diversity in folding mechanisms.
• Folding@Home running on GPUs yields several folding trajectories of the 39-residue NTL9(1-39), the slowest-folding protein (~1.5 ms folding time) folded ab initio with an all-atom MD model to date.
• Insights into the folding mechanism are based on a Markov state model (MSM).
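[Figure: time scales of molecular motions on a logarithmic axis: 10^-15 s (femto), 10^-12 s (pico), 10^-9 s (nano), 10^-6 s (micro), 10^-3 s (milli), 10^0 s (seconds). Labeled events: bond vibration, loop closure, helix formation, folding of β-hairpins, protein folding, all-atom MD step, side-chain rotation.]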
GPU
• A specialized processor on a graphics card, separate from the CPU, dedicated to floating-point operations
• Built from stream processors that implement special mathematical operations in hardware
• Stream Processing: applications can use multiple computational units without explicitly managing allocation, synchronization, or communication among those units.
CPU vs. GPU
CPU – 4 cores
Floating-Point Operations per Second for the CPU and GPU
Proteins folded ab initio by all-atom MD (experimental folding time)
• Trp-cage: 4.1 μs (Pitera, Swope, PNAS 2003)
• Fip35 WW: 13 μs (Ensign, Pande, Biophys. J. 2009)
• Villin headpiece: 10 μs (Zagrovic, Snow, Shirts, Pande, JMB 2002)
• Fast-folding villin variant: <1 μs (Ensign, Kasson, Pande, JMB 2007)
• NTL9(1-39): ~1.5 ms
• Folding@Home runs Gromacs with the OpenMM library, written specifically for GPUs, which allows dramatically longer trajectories
• AMBER ff96 with Onufriev, Bashford, Case GBSA
• Up to 10,000 parallel MD simulations at 300, 330, 370 and 450 K
• Starting from native, random coil and extended conformations
• Aggregate simulation time of 1.52 ms
• Out of the ~3000 trajectories started from unfolded states at 370 K, only two reach <3.5 Å RMSD and eight <4 Å RMSD
• The number of folding events is consistent with a simple model of parallel uncoupled folding as a two-state Poisson process: $\langle n \rangle = \int M(t)\, k \exp(-M(t)\, k t)\, dt$
M(t) is the number of parallel simulations that reach time t.
k is the experimental folding rate, ~640/s.
Distributions of RMSD for native-state simulations of NTL9(1−39) after 10 μs
The number of parallel simulations at 370 K that reach time t.
Posterior predictions of the folding rate
A snapshot from a folding trajectory at 3.1 Å RMSD
Non-native and native-like hydrophobic core arrangements
Markov state model (MSM)
• An MSM constitutes a kinetic clustering
• Conformations that can interconvert rapidly are grouped into the same state
• Conformations that can only interconvert slowly are grouped into separate states
• Satisfies the Markov property: the identity of the next state depends only on the identity of the current state and not on any of the previous states
• Transition probability matrix T propagates state probabilities p
• An implied timescale $\tau_k$ for a given lag time $\tau$ can be calculated from the eigenvalues $\mu_k$ of matrix T: $\tau_k = -\tau / \ln \mu_k$
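The two statements above can be sketched numerically. This is a minimal illustration, not the MSMBuilder implementation: it assumes the row-stochastic convention p(t + lag) = p(t) T and the standard relation timescale = −lag / ln(eigenvalue). The three-state matrix and the 1 ns lag time are invented for the example and are not taken from the NTL9 model.

```python
import numpy as np

# Hypothetical 3-state transition matrix; each row sums to 1.
T = np.array([
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
])
lag_ns = 1.0  # assumed lag time, in ns


def propagate(p0, T, n_steps):
    """Apply p <- p T repeatedly (row-vector convention)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        p = p @ T
    return p


def implied_timescales(T, lag):
    """Return -lag / ln(mu) for every eigenvalue mu except the stationary one.

    Assumes the remaining eigenvalues are real and lie in (0, 1).
    """
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    return -lag / np.log(eigvals[1:])


p = propagate([1.0, 0.0, 0.0], T, n_steps=1000)  # relaxes to the stationary distribution
ts = implied_timescales(T, lag_ns)               # slowest relaxation first
print("populations after 1000 steps:", p)
print("implied timescales (ns):", ts)
```

In practice the implied timescales are recomputed at several lag times; a model is usually considered Markovian at lag times where they level off.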
Detail of MSMBuilder package
• 100,000 microstates were generated by clustering conformations separated by 10 ns using the k-centers algorithm
• The remaining 90% of the data was then assigned to these clusters
• The resulting microstates had an average radius of ~4.5 Å
• A macrostate model was generated by lumping the microstates into 2,000 macrostates using the Robust Perron Cluster Analysis (PCCA+) algorithm
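The clustering and assignment steps above can be sketched as follows. This is a toy version under stated assumptions: conformations are represented by random 3-dimensional feature vectors compared with Euclidean distance (a stand-in for whatever structural metric the real pipeline uses), and the function names `k_centers` and `assign` are hypothetical, not the MSMBuilder API. The PCCA+ lumping step is not shown.

```python
import numpy as np


def k_centers(points, n_clusters, seed_index=0):
    """Greedy k-centers: repeatedly pick the point farthest from all current centers."""
    points = np.asarray(points, dtype=float)
    centers = [seed_index]
    # Distance from every point to its nearest center so far.
    dist = np.linalg.norm(points - points[seed_index], axis=1)
    while len(centers) < n_clusters:
        new_center = int(np.argmax(dist))
        centers.append(new_center)
        new_dist = np.linalg.norm(points - points[new_center], axis=1)
        dist = np.minimum(dist, new_dist)
    return np.array(centers)


def assign(points, center_coords):
    """Assign each point to its nearest center; return labels and distances."""
    d = np.linalg.norm(points[:, None, :] - center_coords[None, :, :], axis=2)
    return np.argmin(d, axis=1), np.min(d, axis=1)


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))   # invented stand-in for conformations
subsample = data[::10]              # cluster only every 10th frame
center_idx = k_centers(subsample, n_clusters=20)
center_coords = subsample[center_idx]
labels, dists = assign(data, center_coords)  # then assign all of the data
print("average cluster radius:", dists.mean())
```

Clustering a subsample and assigning the rest keeps the expensive center search small; the mean of `dists` plays the role of the average microstate radius quoted above.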
• Although only a few folding trajectories were observed directly, a network of many possible pathways can be inferred from the overlapping sampling of local transitions.
• Top 10 folding fluxes, calculated by a greedy backtracking algorithm
Implied timescales of Markov State Models (MSMs) built at lag times between 1 and 32 ns
Panels: 100,000-microstate model; 2000-macrostate model
A scatter plot of the 2000 macrostates. Shown in red are the 14 macrostates transited by the top ten pathway fluxes.
A 2000-state Markov State Model (MSM).
The top 10 folding pathways account for ∼25% of the total flux and transit 14 of the 2000 macrostates
Contact profile subspaces used to calculate $Q_\alpha$, $Q_{12}$, $Q_{13}$
$Q = \dfrac{c \cdot c_{nat}}{c_{nat} \cdot c_{nat}}$
c(x): contact profile indexed by x = (i, j)
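A minimal sketch of a contact-profile Q, assuming Q is the projection of a conformation's contact profile c onto the native profile c_nat, Q = (c · c_nat) / (c_nat · c_nat). The binary contact definition, the 8 Å cutoff, the minimum sequence separation, the optional subspace mask and the idealized coordinates are all assumptions made for illustration; the slide does not specify how contacts were defined.

```python
import numpy as np


def contact_profile(coords, cutoff=8.0, min_separation=3):
    """Binary contact vector c(x) over residue pairs x = (i, j) with j - i >= min_separation."""
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(coords), k=min_separation)
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    return (d < cutoff).astype(float)


def q_value(c, c_nat, mask=None):
    """Q = (c . c_nat) / (c_nat . c_nat), optionally restricted to a subspace of pairs."""
    if mask is not None:
        c, c_nat = c[mask], c_nat[mask]
    norm = np.dot(c_nat, c_nat)
    if norm == 0:
        raise ValueError("native profile has no contacts in this subspace")
    return float(np.dot(c, c_nat) / norm)


# Invented test structures: an idealized 12-residue helix as the "native" state
# and a fully extended chain as the "unfolded" state.
n = 12
idx = np.arange(n)
angle = np.deg2rad(100.0) * idx
helix = np.column_stack([2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * idx])
extended = np.column_stack([3.8 * idx, np.zeros(n), np.zeros(n)])

c_nat = contact_profile(helix)
print("Q(native)   =", q_value(contact_profile(helix), c_nat))
print("Q(extended) =", q_value(contact_profile(extended), c_nat))
```

Passing a boolean `mask` over the pair index restricts the sum to one structural element, which is one way to obtain separate Q values for different subspaces.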
The 14 macrostates plotted along structural and kinetic reaction coordinates
Contact profiles for the 14 macrostates involved in the top folding pathways
Values of Q for each of the 14 macrostates involved in the top ten folding pathways
Q-values plotted versus pfold (committor) values
Macrostates l, m and n have very similar structural ensembles and similar pfold values.
These states differ mostly in their hairpin registrations and in the packing of the hairpin loop.
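For reference, pfold (the committor) of a state is the probability of reaching the folded state before the unfolded one. A minimal sketch of how it can be computed from an MSM transition matrix follows, assuming the standard linear system q_i = Σ_j T_ij q_j for intermediate states, with q = 0 in the unfolded set and q = 1 in the folded set. The four-state chain and the function name `committor` are invented for illustration.

```python
import numpy as np


def committor(T, source, sink):
    """Solve for pfold: q = 0 on source states, q = 1 on sink states,
    and q_i = sum_j T_ij q_j on all other (intermediate) states."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    sink = list(sink)
    boundary = set(source) | set(sink)
    inter = [i for i in range(n) if i not in boundary]
    q = np.zeros(n)
    q[sink] = 1.0
    if inter:
        # (I - T_II) q_I = T_IB 1, where B is the sink set.
        A = np.eye(len(inter)) - T[np.ix_(inter, inter)]
        b = T[np.ix_(inter, sink)].sum(axis=1)
        q[inter] = np.linalg.solve(A, b)
    return q


# Hypothetical 4-state chain: state 0 = unfolded, state 3 = folded.
T = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.1, 0.9],
])
q = committor(T, source=[0], sink=[3])
print("pfold per state:", q)
```

States with nearly equal pfold values sit at the same point along the kinetic reaction coordinate even when their structures differ.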
Conclusions
• Existing force field models using implicit solvent are accurate enough to fold proteins ab initio at long time scales, opening the door to simulating more structurally complex proteins.
• There need not be a single pathway or single, dominant mechanism for the folding of a given protein.
• Multiple mechanisms could be simultaneously present.
• The sequence of the protein, coupled with the chemical environment, controls the extent to which each mechanistic pathway is observed.
Take-home message
• GPUs can speed up your simulations 10-fold
• Existing force field models using implicit solvent are accurate enough to fold proteins during MD.
• With only a few folding trajectories observed directly, a network of many possible pathways can be inferred from kinetic clustering using the Markov State Model.
• Several pathways for the folding of a given protein.
• Multiple folding mechanisms (diffusion-collision or nucleation-condensation) could be simultaneously present.