
All-atom molecular simulations of protein folding and unfolded-state dynamics and structure with accelerated calculations on GPU

Cezary Czaplewski, Faculty of Chemistry, University of Gdańsk, Poland

The 10th Protein Folding Winter School, KIAS, February 7-11, 2011

Molecular Simulation of ab Initio Protein Folding for a Millisecond Folder NTL9(1-39)

Vincent A. Voelz,1 Gregory R. Bowman,2 Kyle Beauchamp,2 Vijay S. Pande1,2,3

1 Department of Chemistry, Stanford University

2 Biophysics Program, Stanford University

3 Department of Structural Biology, Stanford University

J. AM. CHEM. SOC. 2010, 132, 1526–1528

• Computer simulations, validated by experiment, can help gain a complete understanding of how proteins fold.

• Folding rates span more than a million-fold range, which suggests a possible diversity of folding mechanisms.

• Folding@Home on GPUs produced several folding trajectories of the 39-residue NTL9(1-39), the slowest-folding protein (~1.5 ms folding time) folded ab initio by all-atom MD to date.

• Insights into folding mechanism based on Markov state model (MSM).

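[Figure: time scales of molecular events on a logarithmic axis: 10^-15 s (femto), 10^-12 s (pico), 10^-9 s (nano), 10^-6 s (micro), 10^-3 s (milli), 10^0 s (seconds). Labeled events: bond vibration, loop closure, helix formation, folding of β-hairpins, protein folding, all-atom MD step, side-chain rotation.]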

GPU

• A processor on a graphics card, dedicated to floating-point calculations

• Incorporates stream processors that provide specialized mathematical operations

• Stream Processing: applications can use multiple computational units without explicitly managing allocation, synchronization, or communication among those units.

CPU vs. GPU

CPU – 4 cores

Floating-Point Operations per Second for the CPU and GPU

Proteins folded ab initio by all-atom MD (experimental folding time)

• Trp-cage: 4.1 μs (Pitera, Swope, PNAS 2003)

• Fip35 WW: 13 μs (Ensign, Pande, Biophys. J. 2009)

• Villin headpiece: 10 μs (Zagrovic, Snow, Shirts, Pande, JMB 2002)

• Fast-folding villin variant: <1 μs (Ensign, Kasson, Pande, JMB 2007)

• NTL9(1-39): ~1.5 ms

• Folding@Home runs Gromacs with the OpenMM library, written specifically for GPUs, which allows dramatically longer trajectories

• AMBER ff96 with the Onufriev, Bashford, Case GBSA model

• Up to 10,000 parallel MD simulations at 300, 330, 370 and 450 K

• Starting from native, random-coil and extended conformations

• Aggregate simulation time of 1.52 ms

• Out of the ~3000 trajectories started from unfolded states at 370 K, only two reach <3.5 Å RMSD and eight reach <4 Å RMSD

• The number of folding events is consistent with a simple model of parallel uncoupled folding as a two-state Poisson process: $\langle n \rangle = \int M(t)\, k \exp(-M(t)\, k t)\, dt$, where $M(t)$ is the number of parallel simulations that reach time $t$ and $k$ is the experimental folding rate, ~640/s
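A back-of-the-envelope version of this consistency check can be sketched as follows. It is a minimal sketch, not the authors' calculation: it assumes each trajectory folds independently as a two-state process with rate k, so a trajectory of length T folds with probability 1 − exp(−kT). The trajectory lengths below are hypothetical and do not reproduce the actual Folding@Home length distribution; only the rate of 640/s comes from the slide.

```python
import math

K_FOLD = 640.0  # experimental folding rate quoted on the slide, in 1/s


def expected_folding_events(traj_lengths_s, k=K_FOLD):
    """Expected number of folded trajectories for independent two-state folders.

    Assumption: a trajectory of length T (seconds) has folded with
    probability 1 - exp(-k * T).
    """
    return sum(1.0 - math.exp(-k * T) for T in traj_lengths_s)


# Hypothetical ensemble (illustrative only): 2000 trajectories of 0.5 us
# and 1000 trajectories of 5 us.
lengths = [0.5e-6] * 2000 + [5e-6] * 1000
n_expected = expected_folding_events(lengths)

# When k * T << 1 the estimate is close to k times the aggregate time.
n_linear = K_FOLD * sum(lengths)

print(f"expected folding events: {n_expected:.2f} (linear estimate {n_linear:.2f})")
```

Because kT is small for microsecond trajectories, the expected count is set almost entirely by the aggregate simulation time, which is why many short trajectories can stand in for one long one under this model.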

Distributions of RMSD for native-state simulations of NTL9(1−39) after 10 μs

The number of parallel simulations at 370 K that reach time t.

Posterior predictions of the folding rate

A snapshot from a folding trajectory (3.1 Å RMSD)

Non-native and native-like hydrophobic core arrangements

Markov state model (MSM)

• An MSM constitutes a kinetic clustering

• Conformations that can interconvert rapidly are grouped into the same state

• Conformations that can only interconvert slowly are grouped into separate states

• Satisfies the Markov property: the identity of the next state depends only on the identity of the current state and not on any of the previous states

• Transition probability matrix T propagates state probabilities p

• An implied timescale $\tau_k$ for a given lag time $\tau$ can be calculated from the eigenvalues $\mu_k$ of matrix $T$: $\tau_k = -\tau / \ln \mu_k$
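The two operations above, propagating state probabilities and extracting implied timescales, can be illustrated on a toy model. This is a minimal sketch with assumptions: the 3-state matrix and the 1 ns lag time are hypothetical, a row-vector convention p(t + τ) = p(t) T is assumed, and the standard relation τ_k = −τ / ln μ_k is used for the eigenvalues μ_k of T.

```python
import numpy as np

LAG_NS = 1.0  # hypothetical lag time, in ns

# Hypothetical row-stochastic transition matrix for a 3-state toy model.
T = np.array([
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.02, 0.98],
])


def propagate(p, T, n_steps):
    """Apply p(t + lag) = p(t) @ T repeatedly (row-vector convention)."""
    for _ in range(n_steps):
        p = p @ T
    return p


def implied_timescales(T, lag):
    """Return tau_k = -lag / ln(mu_k) for every eigenvalue below the largest."""
    mu = np.sort(np.linalg.eigvals(T).real)[::-1]  # descending; mu[0] is ~1
    return -lag / np.log(mu[1:])


p0 = np.array([1.0, 0.0, 0.0])        # all population starts in state 0
p_long = propagate(p0, T, 2000)        # long-time limit: stationary distribution
timescales = implied_timescales(T, LAG_NS)

print("long-time populations:", np.round(p_long, 3))
print("implied timescales (ns):", np.round(timescales, 2))
```

In practice the timescales are recomputed for models built at several lag times; a lag time beyond which they stop changing is taken as evidence that the model is approximately Markovian.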

Detail of MSMBuilder package

• 100,000 microstates were generated by clustering conformations separated by 10 ns using the k-centers algorithm

• The remaining 90% of the data was then assigned to these clusters

• The resulting microstates had an average radius of ~4.5 Å

• A macrostate model was generated by lumping the microstates into 2,000 macrostates using the Robust Perron Cluster Analysis (PCCA+) algorithm
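The subsample-then-assign clustering step can be sketched with a greedy k-centers routine. This is an illustration under stated assumptions, not the MSMBuilder implementation: random 3-D points stand in for conformations, Euclidean distance stands in for RMSD, and the function names `k_centers` and `assign` are hypothetical.

```python
import numpy as np


def k_centers(X, k, seed=0):
    """Greedy k-centers: repeatedly promote the point farthest from all centers.

    Returns the indices of the chosen centers and, for every point, the
    distance to its nearest center.
    """
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]           # random first center
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    for _ in range(k - 1):
        new_center = int(np.argmax(dist))           # farthest point so far
        centers.append(new_center)
        dist = np.minimum(dist, np.linalg.norm(X - X[new_center], axis=1))
    return np.array(centers), dist


def assign(X, center_coords):
    """Assign every point to its nearest center."""
    d = np.linalg.norm(X[:, None, :] - center_coords[None, :, :], axis=2)
    return np.argmin(d, axis=1)


rng = np.random.default_rng(1)
data = rng.normal(size=(500, 3))      # stand-in for conformations
subsample = data[::10]                # cluster on 10% of the data
center_idx, dist_to_center = k_centers(subsample, k=8)
labels = assign(data, subsample[center_idx])   # assign all of the data

print("largest cluster radius in the subsample:", round(float(dist_to_center.max()), 3))
print("cluster populations:", np.bincount(labels, minlength=8))
```

Greedy k-centers tends to produce clusters of roughly uniform radius rather than uniform population, which fits the slide's description of microstates by their average radius.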

• Although only a few folding trajectories were observed directly, a network of many possible pathways can be inferred from the overlapping sampling of local transitions.

• Top 10 folding fluxes, calculated by a greedy backtracking algorithm

Implied timescales of Markov State Models (MSMs) built at lag times between 1 and 32 ns. Panels: 100,000-microstate model; 2000-macrostate model.

A scatter plot of the 2000 macrostates. Shown in red are the 14 macrostates transited by the top ten pathway fluxes.

A 2000-state Markov State Model (MSM).

The top 10 folding pathways account for ∼25% of the total flux and transit 14 of the 2000 macrostates

Contact profile subspaces used to calculate $Q_\alpha$, $Q_{12}$, $Q_{13}$

$Q = \dfrac{c \cdot c_{nat}}{c_{nat} \cdot c_{nat}}$

$c(x)$: contact profile indexed by $x = (i, j)$
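A small numerical illustration of such a Q value follows. It assumes that Q is the projection of a contact profile c onto the native profile c_nat, normalized so that the native profile gives Q = 1, and that a subspace is simply a subset of the (i, j) contact indices. The profiles, the mask and the function name `contact_q` are hypothetical.

```python
import numpy as np


def contact_q(c, c_nat, mask=None):
    """Q = (c . c_nat) / (c_nat . c_nat), optionally restricted to a subspace.

    mask: boolean array selecting the (i, j) contacts of the subspace
    (for example, the contacts of one secondary-structure element).
    """
    c = np.asarray(c, dtype=float)
    c_nat = np.asarray(c_nat, dtype=float)
    if mask is not None:
        c, c_nat = c[mask], c_nat[mask]
    return float(np.dot(c, c_nat) / np.dot(c_nat, c_nat))


# Hypothetical profiles over six residue pairs (1 = contact formed).
c_nat = np.array([1, 1, 1, 0, 1, 0])
c_partial = np.array([1, 0, 1, 1, 0, 0])
first_three = np.array([True, True, True, False, False, False])

q_native = contact_q(c_nat, c_nat)
q_partial = contact_q(c_partial, c_nat)
q_subspace = contact_q(c_partial, c_nat, mask=first_three)

print(q_native, q_partial, round(q_subspace, 3))
```

Non-native contacts (the fourth pair in `c_partial`) do not contribute, because the native profile is zero there.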

The 14 macrostates plotted along structural and kinetic reaction coordinates

Contact profiles for the 14 macrostates involved in the top folding pathways

Values of Q for each of the 14 macrostates involved in the top ten folding pathways

Q-values plotted versus pfold (committor) values
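For readers unfamiliar with pfold, the committor of a state is the probability of reaching the folded state before the unfolded state. A minimal sketch of how it can be obtained from an MSM transition matrix is given below. The 4-state chain is hypothetical, and the linear-system formulation is the standard one for a discrete Markov chain rather than a description of the code used in the paper.

```python
import numpy as np


def committor(T, unfolded, folded):
    """Probability of reaching `folded` before `unfolded` from each state.

    Boundary conditions: q = 0 on unfolded states, q = 1 on folded states.
    For every intermediate state i: q_i = sum_j T[i, j] * q_j.
    """
    n = T.shape[0]
    folded = list(folded)
    q = np.zeros(n)
    q[folded] = 1.0
    inter = [i for i in range(n) if i not in unfolded and i not in folded]
    # Solve (I - T_II) q_I = T_IF 1 for the intermediate states.
    A = np.eye(len(inter)) - T[np.ix_(inter, inter)]
    b = T[np.ix_(inter, folded)].sum(axis=1)
    q[inter] = np.linalg.solve(A, b)
    return q


# Hypothetical 4-state chain: state 0 is unfolded, state 3 is folded.
T = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.1, 0.9],
])
pfold = committor(T, unfolded=[0], folded=[3])
print(np.round(pfold, 3))
```

States with pfold near 0.5 are equally likely to fold or unfold next, which is why pfold serves as a kinetic reaction coordinate alongside structural coordinates such as Q.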

Macrostates l, m and n have very similar structural ensembles and similar pfold values

These states differ mostly in their hairpin registrations and packing of the hairpin loop.

Conclusions

• Existing force field models using implicit solvent are accurate enough to fold proteins ab initio at long time scales, opening the door to simulating more structurally complex proteins.

• There need not be a single pathway or single, dominant mechanism for the folding of a given protein.

• Multiple mechanisms could be simultaneously present.

• The sequence of the protein, coupled with the chemical environment, controls the extent to which each mechanistic pathway is seen.

Take-home message

• A GPU can speed up your simulations 10 times

• Existing force field models using implicit solvent are accurate enough to fold proteins during MD.

• With only a few folding trajectories observed directly, a network of many possible pathways can be inferred from kinetic clustering using the Markov State Model.

• There can be several pathways for the folding of a given protein.

• Multiple folding mechanisms (diffusion-collision or nucleation-condensation) could be simultaneously present.