Application of Probabilistic Roadmaps to the Study of Protein Motion.

Date posted: 20-Dec-2015

Page 1: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Application of Probabilistic Roadmaps to

the Study of Protein Motion

Page 2: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Proteins

Proteins are the workhorses of all living organisms. They perform many vital functions, e.g.:

• Catalysis of reactions
• Transport of molecules
• Building blocks of muscles
• Storage of energy
• Transmission of signals
• Defense against intruders

They are large molecules (few 100s to several 1000s of atoms)

They are made of building blocks (amino acids) drawn from a small “library” of 20 amino-acids

They have an unusual kinematic structure: long serial linkage (backbone) with short side-chains

Page 3: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Protein Sequence

[Figure: peptide chain diagram; labels: N and O atoms, residue i-1]

Long sequence of amino acids (dozens to thousands), also called residues

Dictionary of 20 amino acids (several billion years old)

Page 4: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Central Dogma of Molecular Biology

Physiological conditions: aqueous solution, 37°C, pH 7, atmospheric pressure

Page 5: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Mad cow disease is caused by mis-folding

Drug molecules act by binding to proteins

Molecular motion is an essential process of life

Page 6: Application of Probabilistic Roadmaps to the Study of Protein Motion.

So, studying molecular motion is of critical importance in molecular biology

[Figures: Stanford BioX cluster; NMR spectrometer]

However, few tools are available

Computer simulation:
• Monte Carlo simulation
• Molecular Dynamics

Page 7: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Motion occurs at very different frequencies

HIV-1 protease

Low-frequency motions (diffusive motions) are more directly related to protein functions

Page 8: Application of Probabilistic Roadmaps to the Study of Protein Motion.

[Figure: folding diagram; labels: Unfolded (denatured) state, Intermediate states, Folded (native) state, Many pathways]

Two Major Drawbacks of MD and MC Simulation

1) Each simulation run yields a single pathway, while molecules tend to move along many different pathways

Interest in ensemble properties

Page 9: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Two Major Drawbacks of MD and MC Simulation

1) Each simulation run yields a single pathway, while molecules tend to move along many different pathways

2) Each simulation run tends to waste much time in local minima

Page 10: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Kinematic Models

Atomistic model: The position of each atom is defined by its coordinates in 3-D space

[Figure: eight atoms labeled with their coordinates, (x1,y1,z1) through (x8,y8,z8)]

p atoms → 3p parameters

Drawback: The bond structure is not taken into account

Page 11: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Kinematic Models

Linkage model: The protein consists of atoms connected by rotatable bonds

[Figure: backbone linkage of N, C, C’ and O atoms across residues Res i, Res i+1, Res i+2, Res i+3]

Page 12: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Roadmap-Based Representation

• Compact representation of many motion pathways
• Coarse resolution relative to MC and MD simulation (only low-frequency motions are represented)
• Efficient algorithms for analyzing multiple pathways

Page 13: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Initial Work

A.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th ISMB, pp. 252-261, 1999

• Study of ligand-protein binding
• The ligand is a small flexible molecule, but the protein is assumed rigid
• A fixed coordinate system P is attached to the protein and a moving coordinate system L is defined using three bonded atoms in the ligand
• A conformation of the ligand is defined by the position and orientation of L relative to P and the torsional angles of the ligand

Page 14: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Roadmap Construction (Node Generation)

The nodes of the roadmap are generated by sampling conformations of the ligand uniformly at random in the parameter space (around the protein)

The energy E at each sampled conformation is computed:

$$E = E_{\text{interaction}} + E_{\text{internal}}$$

• $E_{\text{interaction}}$ = electrostatic + van der Waals potential
• $E_{\text{internal}}$ = sum over non-bonded pairs of atoms of (electrostatic + van der Waals)

Page 15: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Roadmap Construction (Node Generation)


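A sampled conformation is retained as a node of the roadmap with probability:

$$P = \begin{cases} 0 & \text{if } E > E_{\max} \\ \dfrac{E_{\max} - E}{E_{\max} - E_{\min}} & \text{if } E_{\min} \le E \le E_{\max} \\ 1 & \text{if } E < E_{\min} \end{cases}$$

Denser distribution of nodes in low-energy regions of conformational space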
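The retention rule above can be sketched in a few lines of Python. This is only an illustration: the function names, the `sample_conformation` and `energy_of` callbacks, and the fixed random seed are assumptions made for the example, not part of the original work.

```python
import random

def retention_probability(energy, e_min, e_max):
    """Probability of keeping a sampled conformation as a roadmap node."""
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    # Linear ramp from 1 (at e_min) down to 0 (at e_max).
    return (e_max - energy) / (e_max - e_min)

def sample_nodes(sample_conformation, energy_of, e_min, e_max, n_samples, rng=None):
    """Draw n_samples conformations and keep each one with the probability above.

    sample_conformation(rng) and energy_of(q) are hypothetical callbacks
    supplied by the caller; they stand in for the real sampler and force field.
    """
    rng = rng or random.Random(0)
    nodes = []
    for _ in range(n_samples):
        q = sample_conformation(rng)
        if rng.random() < retention_probability(energy_of(q), e_min, e_max):
            nodes.append(q)
    return nodes
```

Because low-energy samples are always kept and high-energy ones are always discarded, the retained nodes concentrate in low-energy regions, as the slide states.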

Page 16: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Roadmap Construction (Edge Generation)

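[Figure: straight edge between nodes q and q’, discretized into intermediate conformations qi, qi+1; plot of the energy E along the edge against the threshold Emax]

• Each node is connected to its closest neighbors by straight edges
• Each edge is discretized so that between qi and qi+1 no atom moves by more than some ε (= 1 Å)
• If any E(qi) > Emax, then the edge is rejected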

Page 17: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Roadmap Construction (Edge Generation)

[Figure: straight edge between nodes q and q’, discretized into intermediate conformations qi, qi+1]

• Any two nodes closer together than some threshold distance are connected by a straight edge
• Each edge is discretized so that between qi and qi+1 no atom moves by more than some ε (= 1 Å)
• If all E(qi) ≤ Emax, then the edge is retained and is assigned two weights w(q→q’) and w(q’→q), where:

$$w(q \to q') = \sum_i -\ln\big(P[q_i \to q_{i+1}]\big)$$

$$P[q_i \to q_{i+1}] = \frac{e^{-(E_{i+1}-E_i)/kT}}{e^{-(E_{i+1}-E_i)/kT} + e^{-(E_{i-1}-E_i)/kT}}$$

P[qi→qi+1] is the probability that the ligand moves from qi to qi+1 when it is constrained to move along the edge. The weight w(q→q’) is a heuristic measure of the energetic difficulty of moving from q to q’.
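As a rough illustration of the edge weight, the sketch below sums the negative log step probabilities along a discretized edge. It assumes the energies at the intermediate conformations are already available as a list, and it sums only over interior points, where both neighbors qi-1 and qi+1 exist; how the endpoints are treated is an assumption of this example, since the slide does not say.

```python
import math

def step_probability(e_prev, e_cur, e_next, kT):
    """P[q_i -> q_{i+1}] for a ligand constrained to move along the edge."""
    forward = math.exp(-(e_next - e_cur) / kT)
    backward = math.exp(-(e_prev - e_cur) / kT)
    return forward / (forward + backward)

def edge_weight(energies, kT):
    """w(q -> q') as the sum of -ln P[q_i -> q_{i+1}] over interior points.

    energies: E(q_0), ..., E(q_n) sampled along the edge from q to q'.
    """
    total = 0.0
    for i in range(1, len(energies) - 1):
        p = step_probability(energies[i - 1], energies[i], energies[i + 1], kT)
        total += -math.log(p)
    return total
```

The reverse weight w(q’→q) is obtained by passing the reversed list, `edge_weight(energies[::-1], kT)`; an uphill profile yields a larger weight than the same profile traversed downhill.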

Page 18: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Querying the Roadmap

For a given goal node qg (e.g., binding conformation), Dijkstra’s single-source algorithm computes the lowest-weight paths from qg to each node (in either direction) in O(N log N) time, where N = number of nodes

Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering qg and of all paths leaving qg (~ binding and dissociation rates Kon and Koff)

Protein: Lactate dehydrogenase
Ligand: Oxamate (7 degrees of freedom)
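The query step can be sketched as two runs of Dijkstra's algorithm, one on the roadmap and one on the roadmap with every edge reversed. The adjacency-dictionary layout and the function names below are assumptions made for illustration; the binary-heap version shown runs in O((N + M) log N) for M edges, which matches the stated bound when each node has a bounded number of neighbors.

```python
import heapq

def dijkstra(adjacency, source):
    """Lowest path weight from source to every reachable node.

    adjacency: {node: [(neighbor, weight), ...]} with non-negative weights.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse_graph(adjacency):
    """Flip every directed edge, keeping its weight."""
    rev = {}
    for u, edges in adjacency.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev

def average_path_weights(adjacency, goal):
    """Mean lowest weight of paths entering and leaving the goal node."""
    leaving = dijkstra(adjacency, goal)
    entering = dijkstra(reverse_graph(adjacency), goal)

    def mean_excluding_goal(dist):
        values = [d for node, d in dist.items() if node != goal]
        return sum(values) / len(values) if values else float("nan")

    return mean_excluding_goal(entering), mean_excluding_goal(leaving)
```

Because each edge carries two weights, w(q→q’) and w(q’→q), the graph is directed, which is why the entering and leaving averages can differ.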

Page 19: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Computation of Potential Binding Conformations

1) Sample many (several thousand) ligand conformations at random around the protein
2) Repeat several times:
   • Select lowest-energy conformations that are close to the protein surface
   • Resample around them
3) Retain k (~10) lowest-energy conformations whose centers of mass are at least 5 Å apart

[Figure: lactate dehydrogenase, with the active site marked]
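Step 3 can be sketched as a greedy filter: walk through the conformations in order of increasing energy and keep one only if its center of mass is far enough from every conformation already kept. The greedy strategy and the `(energy, center_of_mass)` tuple layout are assumptions of this sketch; the slide only states the selection criterion.

```python
import math

def select_binding_candidates(conformations, k=10, min_separation=5.0):
    """Keep up to k lowest-energy conformations with well-separated centers.

    conformations: list of (energy, center_of_mass), center as an (x, y, z)
    tuple in angstroms.
    """
    selected = []
    for energy, center in sorted(conformations, key=lambda c: c[0]):
        if all(math.dist(center, other) >= min_separation for _, other in selected):
            selected.append((energy, center))
            if len(selected) == k:
                break
    return selected
```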

Page 20: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Experiments on 3 Complexes

1) PDB ID: 1ldm
   Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues)
   Ligand: Oxamate (6 atoms, 7 dofs)

2) PDB ID: 4ts1
   Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues)
   Ligand: L-leucyl-hydroxylamine (13 atoms, 9 dofs)

3) PDB ID: 1stp
   Receptor: Streptavidin (901 atoms, 121 residues)
   Ligand: Biotin (16 atoms, 11 dofs)

Page 21: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Results for 1ldm

• Some potential binding sites have slightly lower energy than the active site, so energy alone is not a discriminating factor
• Average path weights (energetic difficulty) to enter and leave the binding site are significantly greater for the active site. This indicates that the active site is surrounded by an energy barrier that “traps” the ligand

Page 22: Application of Probabilistic Roadmaps to the Study of Protein Motion.

[Figure: energy plotted against conformation, with two potential binding sites and the active site marked]

Page 23: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Application of Roadmaps to Protein Folding

N.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures. J. Comp. Biology, 10(2):239-255, 2003

• Known native state
• Degrees of freedom: φ-ψ angles
• Energy: van der Waals, hydrogen bonds, hydrophobic effect
• New idea: Sampling strategy
• Application: Finding order of SSE (secondary structure element) formation

Page 24: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Sampling Strategy (Node Generation)

• High dimensionality → non-uniform sampling
• Conformations are sampled using a Gaussian distribution around the native state
• Conformations are sorted into bins by number of native contacts (pairs of Cα atoms that are close together in the native structure)
• Sampling ends when all bins have a minimum number of conformations → “good” coverage of conformational space

Page 25: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Application: Order of Formation of Secondary Structures

• The lowest-weight path is extracted from each denatured conformation to the folded one
• The order of formation of SSEs is computed along each path
• The formation order that appears the most often over all paths is considered the SSE formation order of the protein

Page 26: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Method

1) The contact matrix showing the time step when each native contact appears is built

Page 27: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Protein CI2 (1α + 4β)

Page 28: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Protein CI2 (1α + 4β)

[Figure: contact matrix, with residues 5 and 60 marked on the axes]

The native contact between residues 5 and 60 appears at step 216

Page 29: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Method

1) The contact matrix showing the time step when each native contact appears is built

2) The time step at which a structure appears is approximated as the average of the appearance time steps of its contacts
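The two steps of the method can be sketched as follows. The dictionary layout and the function name are assumptions made for illustration: `contact_times` plays the role of the contact matrix (one entry per native contact), and `sse_contacts` lists which contacts belong to each structure.

```python
def sse_formation_order(contact_times, sse_contacts):
    """Order structures by the average appearance time of their contacts.

    contact_times: {(i, j): time step at which native contact i-j first appears}
    sse_contacts:  {structure name: [(i, j), ...]} contacts defining the structure
    """
    formation_time = {
        name: sum(contact_times[c] for c in contacts) / len(contacts)
        for name, contacts in sse_contacts.items()
    }
    order = sorted(formation_time, key=formation_time.get)
    return order, formation_time
```

For example, with the purely illustrative inputs `{(1, 5): 100, (2, 6): 140, (5, 60): 216, (6, 59): 212}` and two structures owning the first two and last two contacts, the structures would be assigned formation times 120 and 214 and ordered accordingly.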

Page 30: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Protein CI2 (1α + 4β)

• α forms at time step 122 (II)
• β3 and β4 come together at 187 (V)
• β2 and β3 come together at 210 (IV)
• β1 and β4 come together at 214 (I)
• α and β4 come together at 214 (III)


Page 32: Application of Probabilistic Roadmaps to the Study of Protein Motion.

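Comparison with Experimental Data

[Table, only partly recoverable: columns “SSE’s” and “roadmap size”; surviving protein name: CI2; surviving SSE entries: 1+5, 3, 1+4, 1+4; surviving roadmap sizes: 5126, 70k; 5471, 104k; 7975, 104k; 8357, 119k]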

Page 33: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Stochastic Roadmaps

M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Comp. Biol., 10(3-4):257-281, 2003

New Idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges

[Figure: roadmap edge from node vi to node vj, labeled with the transition probability Pij]

Page 34: Application of Probabilistic Roadmaps to the Study of Protein Motion.

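Edge probabilities

Follow Metropolis criteria:

$$P_{ij} = \begin{cases} \dfrac{1}{N_i}\exp(-\Delta E_{ij}/kT), & \text{if } \Delta E_{ij} > 0; \\ \dfrac{1}{N_i}, & \text{otherwise.} \end{cases}$$

Here $N_i$ is the number of neighbors of node $v_i$ and $\Delta E_{ij} = E_j - E_i$.

Self-transition probability:

$$P_{ii} = 1 - \sum_{j \ne i} P_{ij}$$

[Figure: nodes vi and vj, with the edge probability Pij and the self-transition probability Pii]

[Roadmap nodes are sampled uniformly at random and the energy profile along edges is not considered]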
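A minimal sketch of these edge probabilities, assuming the roadmap is given as a dictionary of node energies and a dictionary of neighbor lists (both layouts, and the function name, are choices made for this example):

```python
import math

def transition_probabilities(energies, neighbors, kT):
    """Metropolis-style transition probabilities on a roadmap.

    energies:  {node: energy}
    neighbors: {node: [adjacent nodes]}
    Returns {i: {j: P_ij}}, including the self-transition P_ii.
    """
    P = {}
    for i, adj in neighbors.items():
        row = {}
        n_i = len(adj)
        for j in adj:
            delta = energies[j] - energies[i]
            # Uphill moves are damped by the Boltzmann factor; others are not.
            row[j] = (math.exp(-delta / kT) if delta > 0 else 1.0) / n_i
        # Whatever probability is left over stays at node i.
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P
```

Each row sums to 1, so the result can be used directly as the transition matrix of a Markov chain.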

Page 35: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Stochastic Roadmap Simulation

Stochastic roadmap simulation (SRS) and Monte Carlo simulation both converge to the Boltzmann distribution, i.e., the fraction of steps that SRS spends at nodes in a region V converges toward a value proportional to

$$\int_V e^{-E/kT}\, dV$$

when the number of nodes grows (and the nodes are uniformly distributed)

[Figure: region V of the roadmap, with edges labeled Pij]

Page 36: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Roadmap as Markov Chain

Transition probability Pij depends only on i and j

[Figure: roadmap edge between nodes i and j, labeled Pij]

Page 37: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Example #1: Probability of Folding pfold

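[Figure: from a given conformation, the folded state is reached first with probability pfold and the unfolded state with probability 1 - pfold]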

Page 38: Application of Probabilistic Roadmaps to the Study of Protein Motion.

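First-Step Analysis

F: Folded state
U: Unfolded state

[Figure: node i with self-transition probability Pii and transitions Pij, Pik, Pil, Pim to its neighbors j, k, l, m]

Let $f_i = p_{fold}(i)$. After one step:

$$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$$

Boundary values: f = 1 for nodes in F and f = 0 for nodes in U (on the slide, two of the terms are annotated “= 1”).

• One linear equation per node
• Solution gives pfold for all nodes
• No explicit simulation run
• All pathways are taken into account
• Sparse linear system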
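The first-step equations can be solved without any simulation run. The sketch below uses Gauss-Seidel iteration in plain Python purely to stay dependency-free; that choice is an assumption of this example, and a sparse linear solver would be the natural tool for a real roadmap. It also assumes every node outside F and U has a self-transition probability below 1.

```python
def pfold(P, folded, unfolded, tol=1e-12, max_iter=100000):
    """Solve f_i = sum_j P_ij * f_j with f = 1 on folded and f = 0 on unfolded.

    P: {i: {j: P_ij}} transition probabilities; every node appears as a key.
    """
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    free = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_iter):
        change = 0.0
        for i in free:
            p_ii = P[i].get(i, 0.0)
            rest = sum(p * f[j] for j, p in P[i].items() if j != i)
            # Move the P_ii * f_i term to the left-hand side.
            new = rest / (1.0 - p_ii)
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f
```

On a four-node chain U - a - b - F with equal left and right step probabilities, this yields pfold(a) = 1/3 and pfold(b) = 2/3, the classic gambler's-ruin values.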

Page 39: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Number of Self-Avoiding Walks on a 2D Grid

1, 2, 12, 184, 8512, 1262816, 575780564, 789360053252, 3266598486981642,
(10x10) 41044208702632496804,
(11x11) 1568758030464750013214100,
(12x12) 182413291514248049241470885236 > 10^28

http://mathworld.wolfram.com/Self-AvoidingWalk.html

Page 40: Application of Probabilistic Roadmaps to the Study of Protein Motion.

In contrast …

Computing pfold with MC simulation requires:

For every conformation q of interest

Perform many MC simulation runs from q

Count number of times F is attained first
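For contrast, the per-conformation Monte Carlo estimate can be sketched as below. To keep the example self-contained it runs random walks on a discrete transition table rather than in continuous conformation space, which is a simplification assumed here; the function name, the run count, and the fixed seed are likewise illustrative.

```python
import random

def mc_pfold(P, start, folded, unfolded, n_runs=1000, rng=None):
    """Estimate pfold(start) by counting runs that reach F before U.

    P: {i: {j: P_ij}} transition probabilities of the walk.
    """
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(n_runs):
        node = start
        while node not in folded and node not in unfolded:
            targets = list(P[node])
            weights = [P[node][t] for t in targets]
            node = rng.choices(targets, weights=weights)[0]
        hits += node in folded
    return hits / n_runs
```

Every conformation of interest needs its own batch of runs, and the statistical error shrinks only with the square root of the number of runs, which is why this approach is costly compared with solving one linear system for all nodes.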

Page 41: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Computational Tests

• 1ROP (repressor of primer)
  • 2 helices
  • 6 DOF

• 1HDD (Engrailed homeodomain)
  • 3 helices
  • 12 DOF

H-P energy model with steric clash exclusion [Sun et al., 95]

Page 42: Application of Probabilistic Roadmaps to the Study of Protein Motion.

1ROP

Correlation with MC Approach

Page 43: Application of Probabilistic Roadmaps to the Study of Protein Motion.

pfold for β hairpin

• Immunoglobulin binding protein (Protein G)
• Last 16 amino acids
• Cα-based representation
• Go model energy function
• 42 DOFs

[Zhou and Karplus, ’99]

Page 44: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Computation Times (β hairpin)

Monte Carlo (30 simulations):
• 1 conformation
• ~10 hours of computer time
• Over 10^7 energy computations

Roadmap:
• 2000 conformations
• 23 seconds of computer time
• ~50,000 energy computations

~6 orders of magnitude speedup!

Page 45: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Using Path Sampling to Construct Roadmaps

N. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build Better Markovian State Models: Predicting the Folding Rate and Mechanism of a Tryptophan Zipper Beta Hairpin, J. Chemical Physics, 121(1):415-425, 2004

New idea:
• Paths computed with Molecular Dynamics simulation techniques are used to create the nodes of the roadmap
• More pertinent/better distributed nodes
• Edges are labeled with the time needed to traverse them

Page 46: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Sampling Nodes from Computed Paths (Path Shooting)

[Figure: computed paths between the unfolded state U and the folded state F, with time t]

Page 47: Application of Probabilistic Roadmaps to the Study of Protein Motion.

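Sampling Nodes from Computed Paths (Path Shooting)

[Figure: roadmap between U and F; the edge from node i to node j is labeled with its probability pij and traversal time tij]

Example: the Langevin dynamics equation of motion is

$$F_{\text{ext}} - m\gamma \frac{dx}{dt} + R = 0$$

where R is a Gaussian random force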

Page 48: Application of Probabilistic Roadmaps to the Study of Protein Motion.

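Node Merging

If two nodes are closer together than some threshold distance, they are merged into one and merging rules are applied to update edge probabilities and times

[Figure: a roadmap with nodes 1 to 5, where node 1 has edges to node 2 (P12, t12) and to node 4 (P14, t14); after nodes 2 and 4 are merged into node 2’, node 1 has a single edge to 2’ (P12’, t12’)]

$$P_{12'} = P_{12} + P_{14}, \qquad t_{12'} = P_{12} \times t_{12} + P_{14} \times t_{14}$$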

Page 49: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Node Merging


Merging yields an approximately uniform distribution of nodes over the reachable subset of conformational space

Page 50: Application of Probabilistic Roadmaps to the Study of Protein Motion.

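Application: Computation of MFPT

Mean First Passage Time: the average time at which a protein first reaches its folded state

First-Step Analysis yields:

$$\mathrm{MFPT}(i) = \sum_j P_{ij}\,\big(t_{ij} + \mathrm{MFPT}(j)\big), \qquad \mathrm{MFPT}(i) = 0 \ \text{if } i \in F$$

Assuming first-order kinetics, the probability that a protein has folded by time t is:

$$P_f(t) = 1 - e^{-rt}$$

where r is the folding rate. Taking the mean over the folding-time density $dP_f/dt = r\,e^{-rt}$ gives:

$$\mathrm{MFPT} = \int_0^\infty t\,\frac{dP_f}{dt}\,dt = \frac{1}{r}$$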
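The first-step equations for MFPT have the same shape as those for pfold and can be solved the same way. The sketch below again uses Gauss-Seidel iteration in plain Python as a stand-in for a sparse solver (an assumption of this example), and it assumes the folded set is reachable from every node; otherwise the iteration would not converge.

```python
def mean_first_passage_times(P, t, folded, tol=1e-12, max_iter=100000):
    """Solve MFPT(i) = sum_j P_ij * (t_ij + MFPT(j)), with MFPT = 0 on folded.

    P: {i: {j: P_ij}} transition probabilities; every node appears as a key.
    t: {i: {j: t_ij}} time needed to traverse edge i -> j.
    """
    mfpt = {i: 0.0 for i in P}
    free = [i for i in P if i not in folded]
    for _ in range(max_iter):
        change = 0.0
        for i in free:
            p_ii = P[i].get(i, 0.0)
            rhs = sum(p * t[i][j] for j, p in P[i].items())
            rhs += sum(p * mfpt[j] for j, p in P[i].items() if j != i)
            # Move the P_ii * MFPT(i) term to the left-hand side.
            new = rhs / (1.0 - p_ii)
            change = max(change, abs(new - mfpt[i]))
            mfpt[i] = new
        if change < tol:
            break
    return mfpt
```

Under the first-order kinetics assumption, the folding rate then follows as r = 1/MFPT evaluated at the unfolded starting node.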

Page 51: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Computational Test

• 12-residue tryptophan zipper beta hairpin (TZ2)
• Folding@Home used to generate trajectories (fully atomistic simulation) ranging from 10 to 450 ns
• 1750 trajectories (14 reaching the folded state)
• 22,400-node roadmap
• MFPT ~ 2-9 μs, which is similar to experimental measurements (from fluorescence and IR)

Page 52: Application of Probabilistic Roadmaps to the Study of Protein Motion.

Conclusion

Probabilistic roadmaps are a recent but promising tool for exploring conformational space and computing ensemble properties of molecular pathways

Current/future research:
• Better sampling strategies able to handle more complex molecular models (protein-protein binding)
• More work to include time information in roadmaps
• More thorough experimental validation to compare computed and measured quantitative properties