The Probabilistic Roadmap Approach to Study Molecular Motion

The Probabilistic Roadmap Approach to Study Molecular Motion Jean-Claude Latombe Kwan Im Thong Hood Cho Temple Visiting Professor, NUS Kumagai Professor, Computer Science, Stanford


Transcript of The Probabilistic Roadmap Approach to Study Molecular Motion

Page 1: The Probabilistic Roadmap Approach to  Study Molecular Motion

The Probabilistic Roadmap Approach to Study Molecular Motion

Jean-Claude Latombe

Kwan Im Thong Hood Cho Temple Visiting Professor, NUS

Kumagai Professor, Computer Science, Stanford

Page 2: The Probabilistic Roadmap Approach to  Study Molecular Motion

Molecular motion is an essential process of life

CspA

Page 3: The Probabilistic Roadmap Approach to  Study Molecular Motion

Mad cow disease is caused by misfolding

Drug molecules act by binding to proteins

Understanding molecular motion could help cure many diseases

Page 4: The Probabilistic Roadmap Approach to  Study Molecular Motion

As few experimental tools are available, computational tools are critical

[Images: Stanford BioX cluster; NMR spectrometer]

Computer simulation:

- Monte Carlo simulation

- Molecular Dynamics

Page 5: The Probabilistic Roadmap Approach to  Study Molecular Motion

But MD and MC simulation have two major drawbacks

1) Each simulation run yields a single pathway, while molecules tend to move along many different pathways

Page 6: The Probabilistic Roadmap Approach to  Study Molecular Motion

But MD and MC simulation have two major drawbacks

1) Each simulation run yields a single pathway, while molecules tend to move along many different pathways

Intermediate states

Page 7: The Probabilistic Roadmap Approach to  Study Molecular Motion

But MD and MC simulation have two major drawbacks

1) Each simulation run yields a single pathway, while molecules tend to move along many different pathways → Interest in ensemble properties

Page 8: The Probabilistic Roadmap Approach to  Study Molecular Motion

Example of Ensemble Property:

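Probability of Folding pfold

[Figure: from a given conformation, the molecule reaches the folded state first with probability pfold and the unfolded state first with probability 1 - pfold]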

Measure kinetic distance to folded state

Page 9: The Probabilistic Roadmap Approach to  Study Molecular Motion

Other Examples of Ensemble Properties

Order of formation of secondary structure elements

Average time for a ligand to escape a binding site

Folding rate of a protein

Key intermediates along folding pathways

Etc ...

Page 10: The Probabilistic Roadmap Approach to  Study Molecular Motion

But MD and MC simulation have two major drawbacks

1) Each simulation run yields a single pathway, while molecules tend to move along many different pathways → Interest in ensemble properties

2) Each simulation run tends to waste much time in local minima

Page 11: The Probabilistic Roadmap Approach to  Study Molecular Motion

Roadmap-Based Representation

Network of conformations connected by local motion pathways

Compact representation of huge number of motion pathways

Coarse resolution relative to MC and MD simulation

Efficient algorithms for analyzing multiple pathways

Page 12: The Probabilistic Roadmap Approach to  Study Molecular Motion

Roadmaps for Robot Motion Planning

free space

[Kavraki, Svestka, Latombe, Overmars, 95]

Page 13: The Probabilistic Roadmap Approach to  Study Molecular Motion

Initial Work: Application of Roadmaps to Ligand Binding

A.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th Int. Conf. on Intelligent Syst. for Molecular Biology (ISMB), pp. 252-261, 1999

The ligand is modeled as a flexible molecule, but the protein is assumed rigid

A conformation of the ligand is defined by the position and orientation of a group of 3 atoms relative to the protein and by the torsional angles of the ligand

Page 14: The Probabilistic Roadmap Approach to  Study Molecular Motion

Roadmap Construction (Node Generation)

Conformations of the ligand are sampled at random around the protein

The energy E at each sampled conformation is computed:

$$E = E_{\text{interaction}} + E_{\text{internal}}$$

$E_{\text{interaction}}$ = electrostatic + van der Waals potential

$E_{\text{internal}}$ = sum over non-bonded pairs of atoms of (electrostatic + van der Waals)

A sampled conformation is retained as a node with probability:

$$P = \begin{cases} 0 & \text{if } E > E_{\max} \\ \dfrac{E_{\max} - E}{E_{\max} - E_{\min}} & \text{if } E_{\min} \le E \le E_{\max} \\ 1 & \text{if } E < E_{\min} \end{cases}$$

→ Denser distribution of nodes in low-energy regions of conformational space
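The retention rule above can be sketched in a few lines of Python. This is only an illustration: the function names, the energy values and the thresholds are made up, and the energy function itself (electrostatic plus van der Waals terms) is assumed to be computed elsewhere.

```python
import random

def retention_probability(energy, e_min, e_max):
    """Probability of keeping a sampled conformation as a roadmap node."""
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    # Linear ramp between the two thresholds
    return (e_max - energy) / (e_max - e_min)

def sample_nodes(energies, e_min, e_max, rng):
    """Return the indices of the sampled conformations retained as nodes."""
    return [i for i, e in enumerate(energies)
            if rng.random() < retention_probability(e, e_min, e_max)]

rng = random.Random(0)
energies = [-5.0, 0.0, 2.5, 5.0, 12.0]  # hypothetical energies of 5 samples
kept = sample_nodes(energies, e_min=0.0, e_max=10.0, rng=rng)
```

Low-energy samples are always kept and samples above Emax are never kept, which is what concentrates nodes in low-energy regions.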

Page 15: The Probabilistic Roadmap Approach to  Study Molecular Motion

Roadmap Construction (Edge Generation)

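Each node is connected to each of its closest neighbors by a straight edge

Each edge is discretized at some resolution ε (= 1Å)

If any E(qi) > Emax, then the edge is rejected

[Figure: energy E along the edge from q to q', sampled at intermediate conformations qi, qi+1 spaced ε apart, with the threshold Emax]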

Page 16: The Probabilistic Roadmap Approach to  Study Molecular Motion

Roadmap Construction (Edge Generation)

Each node is connected to each of its closest neighbors by a straight edge

Each edge is discretized at some resolution ε (= 1Å)

If all E(qi) ≤ Emax, then the edge is retained and is assigned two weights w(q→q') and w(q'→q):

$$w(q \to q') = -\sum_i \ln P[q_i \to q_{i+1}]$$

where:

$$P[q_i \to q_{i+1}] = \frac{e^{-(E_{i+1}-E_i)/kT}}{e^{-(E_{i+1}-E_i)/kT} + e^{-(E_{i-1}-E_i)/kT}}$$

(probability that the ligand moves from qi to qi+1 when it is constrained to move along the edge)

The weight w(q→q') is a heuristic measure of the energetic difficulty of moving from q to q'
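As a rough sketch of the edge weight, assuming the step probability compares the Boltzmann factor of the forward move with that of the backward move, and assuming the sum runs only over interior points of the discretized edge (the endpoint handling is not stated on the slide). The function names and energy values are illustrative.

```python
import math

def step_probability(e_prev, e_cur, e_next, kT):
    """P[q_i -> q_{i+1}] for a ligand constrained to move along the edge."""
    forward = math.exp(-(e_next - e_cur) / kT)
    backward = math.exp(-(e_prev - e_cur) / kT)
    return forward / (forward + backward)

def edge_weight(energies, kT):
    """w(q -> q') = -sum_i ln P[q_i -> q_{i+1}] over interior points.

    energies: E(q_0), ..., E(q_n) at the discretized points from q to q'.
    """
    total = 0.0
    for i in range(1, len(energies) - 1):
        p = step_probability(energies[i - 1], energies[i], energies[i + 1], kT)
        total -= math.log(p)
    return total

kT = 0.6  # hypothetical value, in the same units as the energies
uphill = edge_weight([0.0, 1.0, 2.0, 3.0], kT)
downhill = edge_weight([3.0, 2.0, 1.0, 0.0], kT)
```

Traversing the same edge in the two directions gives different weights, which is why each retained edge carries both w(q→q') and w(q'→q).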

Page 17: The Probabilistic Roadmap Approach to  Study Molecular Motion

Querying the Roadmap

For a given goal node qg (e.g., binding conformation), Dijkstra's single-source algorithm computes the lowest-weight paths from qg to each node (in either direction) in O(N log N) time, where N = number of nodes

Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering qg and of all paths leaving qg (~ binding and dissociation rates Kon and Koff)

Protein: Lactate dehydrogenase

Ligand: Oxamate (7 degrees of freedom)
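A minimal sketch of such a query, using Python's standard `heapq` module. The tiny three-node graph and its weights are invented for illustration; paths entering qg are obtained by running the same search on the graph with all edges reversed.

```python
import heapq

def lowest_weights(graph, source):
    """Dijkstra: lowest path weight from source to every reachable node.

    graph: dict mapping node -> list of (neighbor, weight) directed edges.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reversed_graph(graph):
    """Same nodes, every directed edge flipped (weights kept)."""
    rev = {u: [] for u in graph}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev

# Hypothetical roadmap: weights differ by direction, as w(q->q') != w(q'->q)
graph = {"qg": [("a", 1.0)], "a": [("qg", 4.0), ("b", 2.0)], "b": [("a", 0.5)]}
leaving = lowest_weights(graph, "qg")
entering = lowest_weights(reversed_graph(graph), "qg")
others = [n for n in graph if n != "qg"]
avg_leaving = sum(leaving[n] for n in others) / len(others)
avg_entering = sum(entering[n] for n in others) / len(others)
```

Once the two distance tables exist, the averages are a single O(N) pass over the nodes.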

Page 18: The Probabilistic Roadmap Approach to  Study Molecular Motion

Experiments on 3 Complexes

1) PDB ID: 1ldm
Receptor: Lactate dehydrogenase (2386 atoms, 309 residues)
Ligand: Oxamate (6 atoms, 7 dofs)

2) PDB ID: 4ts1
Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues)
Ligand: L-leucyl-hydroxylamine (13 atoms, 9 dofs)

3) PDB ID: 1stp
Receptor: Streptavidin (901 atoms, 121 residues)
Ligand: Biotin (16 atoms, 11 dofs)

Page 19: The Probabilistic Roadmap Approach to  Study Molecular Motion

Computation of Potential Binding Conformations

1) Sample many (several thousand) ligand conformations at random around the protein

2) Repeat several times:
   - Select the lowest-energy conformations that are close to the protein surface
   - Resample around them

3) Retain the k (~10) lowest-energy conformations whose centers of mass are at least 5Å apart

[Figure: lactate dehydrogenase and its active site]

Page 20: The Probabilistic Roadmap Approach to  Study Molecular Motion

Results for 1ldm

Some potential binding sites have slightly lower energy than the active site
→ Energy is not a discriminating factor for recognizing the active site

Average path weights (energetic difficulty) to enter and leave a binding site are significantly greater for the active site
→ Indicates that the active site is surrounded by an energy barrier that “traps” the ligand

Page 21: The Probabilistic Roadmap Approach to  Study Molecular Motion

Application of Roadmaps to Protein Folding

N.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures. J. Comp. Biology, 10(2):239-255, 2003

Known native state

Degrees of freedom: φ-ψ angles

Energy: van der Waals, hydrogen bonds, hydrophobic effect

New idea: Sampling strategy

Page 22: The Probabilistic Roadmap Approach to  Study Molecular Motion

Sampling Strategy (Node Generation)

High dimensionality → non-uniform sampling

Conformations are sampled using a Gaussian distribution around the native state

Conformations are sorted into bins by number of native contacts (pairs of Cα atoms that are close together in the native structure)

Sampling ends when all bins have a minimum number of conformations → “good” coverage of conformational space

Page 23: The Probabilistic Roadmap Approach to  Study Molecular Motion

Application: Order of Formation of Secondary Structure Elements

The lowest-weight path is extracted from each denatured conformation to the folded one

The order of formation of SSEs is computed along each path

The formation order that appears most often over all paths is considered the SSE formation order of the protein

Page 24: The Probabilistic Roadmap Approach to  Study Molecular Motion

Order of Formation of Secondary Structures along a Path

1) The contact matrix showing the time step when each native contact appears is built

Page 25: The Probabilistic Roadmap Approach to  Study Molecular Motion

Protein CI2 (1α + 4β)

Page 26: The Probabilistic Roadmap Approach to  Study Molecular Motion

Protein CI2 (1α + 4β)

[Figure: contact matrix with residues 5 and 60 marked on the axes]

The native contact between residues 5 and 60 appears at step 216

Page 27: The Probabilistic Roadmap Approach to  Study Molecular Motion

Order of Formation of Secondary Structures along a Path

1) The contact matrix showing the time step when each native contact appears is built

2) The time step at which a structure appears is approximated as the average of the appearance time steps of its contacts
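The averaging rule in step 2 can be sketched as follows. Only the (5, 60) contact at step 216 comes from the slides; the other contacts, their time steps and the grouping of contacts into structures are invented for illustration.

```python
def structure_formation_step(contact_steps, structure_contacts):
    """Average appearance time step of the native contacts of one structure."""
    steps = [contact_steps[c] for c in structure_contacts]
    return sum(steps) / len(steps)

# Time step at which each native contact first appears along one path
contact_steps = {
    (5, 60): 216,   # from the slides
    (7, 58): 210,   # hypothetical
    (12, 20): 120,  # hypothetical
    (13, 21): 124,  # hypothetical
}

# Hypothetical assignment of contacts to secondary structures
structures = {
    "helix": [(12, 20), (13, 21)],
    "strand pair": [(5, 60), (7, 58)],
}

formation = {name: structure_formation_step(contact_steps, contacts)
             for name, contacts in structures.items()}
order = sorted(formation, key=formation.get)  # earliest structure first
```

Repeating this over many lowest-weight paths and taking the most frequent `order` gives the formation order attributed to the protein.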

Page 28: The Probabilistic Roadmap Approach to  Study Molecular Motion

Protein CI2 (1α + 4β)

α forms at time step 122 (II)

β3 and β4 come together at 187 (V)

β2 and β3 come together at 210 (IV)

β1 and β4 come together at 214 (III)

Page 29: The Probabilistic Roadmap Approach to  Study Molecular Motion

Application: Order of Formation of Secondary Structure Elements

The lowest-weight path is extracted from each denatured conformation to the folded one

The order of formation of SSEs is computed along each path

The formation order that appears most often over all paths is considered the SSE formation order of the protein

Page 30: The Probabilistic Roadmap Approach to  Study Molecular Motion

Comparison with Experimental Data

[Table residue; row alignment lost in extraction]

SSEs: 1α+5β, 3α, 1α+4β, 1α+4β

Roadmap size: 5126, 70k; 5471, 104k; 7975, 104k; 8357, 119k

Page 31: The Probabilistic Roadmap Approach to  Study Molecular Motion

Stochastic Roadmaps

M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Comp. Biol., 10(3-4):257-281, 2003

New Idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges

[Figure: roadmap nodes vi and vj joined by an edge with transition probability Pij]

Page 32: The Probabilistic Roadmap Approach to  Study Molecular Motion

Edge Probabilities

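Follow the Metropolis criterion:

$$P_{ij} = \begin{cases} \dfrac{1}{N_i}\exp(-\Delta E_{ij}/kT) & \text{if } \Delta E_{ij} > 0 \\ \dfrac{1}{N_i} & \text{otherwise} \end{cases}$$

where $N_i$ is the number of neighbors of node $v_i$ and $\Delta E_{ij} = E_j - E_i$

Self-transition probability:

$$P_{ii} = 1 - \sum_{j \ne i} P_{ij}$$

[Figure: node vi with self-transition Pii and an edge to vj with probability Pij]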
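A small sketch of how these edge probabilities could be assembled, assuming N_i is the number of neighbors of node i and ΔE_ij = E_j - E_i. The three-node roadmap, its energies and the value of kT are made up.

```python
import math

def transition_probabilities(energies, neighbors, kT):
    """Metropolis-style transition probabilities on a roadmap.

    energies: dict node -> energy; neighbors: dict node -> list of adjacent
    nodes (a node is assumed not to list itself as a neighbor).
    Returns dict node -> dict of {target: probability}, self-loop included.
    """
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            dE = energies[j] - energies[i]
            accept = math.exp(-dE / kT) if dE > 0 else 1.0
            row[j] = accept / len(nbrs)
        # Whatever probability is left stays on the node itself
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P

energies = {0: 0.0, 1: 1.0, 2: 0.0}   # hypothetical energies
neighbors = {0: [1], 1: [0, 2], 2: [1]}
P = transition_probabilities(energies, neighbors, kT=1.0)
```

Each row sums to 1 by construction, so the roadmap can be treated directly as a Markov chain.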

Page 33: The Probabilistic Roadmap Approach to  Study Molecular Motion

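Stochastic Roadmap Simulation

Stochastic roadmap simulation (SRS) and Monte Carlo simulation converge to the Boltzmann distribution, i.e., the fraction of steps that SRS spends at nodes in a subset V converges toward

$$\frac{1}{Z}\int_V e^{-E/kT}\,dV$$

when the number of nodes grows (and they are uniformly distributed), where Z is the normalizing partition function

[Figure: subset V of the conformational space, with roadmap edges labeled Pij]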

Page 34: The Probabilistic Roadmap Approach to  Study Molecular Motion

Roadmap as Markov Chain

Transition probability Pij depends only on i and j

[Figure: nodes i and j joined by an edge with transition probability Pij]

Page 35: The Probabilistic Roadmap Approach to  Study Molecular Motion

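Probability of Folding pfold

[Figure: from a given conformation, the molecule reaches the folded state first with probability pfold and the unfolded state first with probability 1 - pfold]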

Page 36: The Probabilistic Roadmap Approach to  Study Molecular Motion

First-Step Analysis

Let fi = pfold(i)

After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm

F: Folded state

U: Unfolded state

[Figure: node i with self-transition Pii and edges Pij, Pik, Pil, Pim to its neighbors j, k, l, m]

Page 37: The Probabilistic Roadmap Approach to  Study Molecular Motion

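First-Step Analysis

Let fi = pfold(i)

After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm

The terms for neighbors that belong to F are marked "= 1", since pfold = 1 in the folded state

F: Folded state

U: Unfolded state

[Figure: node i with self-transition Pii and edges Pij, Pik, Pil, Pim to its neighbors j, k, l, m]

→ One linear equation per node

→ Solution gives pfold for all nodes

No explicit simulation run

All pathways are taken into account

Sparse linear system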
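The linear system can be illustrated on a toy chain of five nodes. The sketch below solves it by Gauss-Seidel iteration in plain Python, which is a choice made here for self-containedness (a sparse linear solver would be the natural tool at scale); the chain and its probabilities are invented, and pfold is fixed to 1 on F and 0 on U.

```python
def solve_pfold(P, folded, unfolded, tol=1e-12, max_iter=100000):
    """Solve f_i = sum_j P_ij f_j with f = 1 on folded and f = 0 on unfolded.

    P: dict node -> dict of {target: probability}, self-loops allowed.
    """
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    for _ in range(max_iter):
        change = 0.0
        for i in P:
            if i in folded or i in unfolded:
                continue
            p_ii = P[i].get(i, 0.0)
            # Move the self-loop term to the left-hand side:
            # (1 - P_ii) f_i = sum_{j != i} P_ij f_j
            new = sum(p * f[j] for j, p in P[i].items() if j != i) / (1.0 - p_ii)
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f

# Toy chain 0 - 1 - 2 - 3 - 4 with node 0 unfolded and node 4 folded
P = {0: {0: 1.0}, 4: {4: 1.0}}
for i in (1, 2, 3):
    P[i] = {i - 1: 0.25, i: 0.5, i + 1: 0.25}
pfold = solve_pfold(P, folded={4}, unfolded={0})
```

For this symmetric chain the solution is pfold(i) = i/4, and a single solve yields the value at every node at once.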

Page 38: The Probabilistic Roadmap Approach to  Study Molecular Motion

Number of Self-Avoiding Walks on a 2D Grid

1, 2, 12, 184, 8512, 1262816, 575780564, 789360053252, 3266598486981642,

(10x10) 41044208702632496804,

(11x11) 1568758030464750013214100,

(12x12) 182413291514248049241470885236 > 10^28

http://mathworld.wolfram.com/Self-AvoidingWalk.html

Page 39: The Probabilistic Roadmap Approach to  Study Molecular Motion

In contrast …

Computing pfold with MC simulation requires:

For every conformation q of interest

Perform many MC simulation runs from q

Count number of times F is attained first

Page 40: The Probabilistic Roadmap Approach to  Study Molecular Motion

Computational Tests

• 1ROP (repressor of primer)
  • 2 helices
  • 6 DOF

• 1HDD (Engrailed homeodomain)
  • 3 helices
  • 12 DOF

H-P energy model with steric clash exclusion [Sun et al., 95]

Page 41: The Probabilistic Roadmap Approach to  Study Molecular Motion

pfold for β hairpin

Immunoglobulin binding protein (Protein G)

Last 16 amino acids

Cα based representation

Go model energy function

42 DOFs

[Zhou and Karplus, '99]

Page 42: The Probabilistic Roadmap Approach to  Study Molecular Motion

1ROP

Correlation with MC Approach

Page 43: The Probabilistic Roadmap Approach to  Study Molecular Motion

Computation Times (β hairpin)

Monte Carlo (30 simulations):
- 1 conformation
- ~10 hours of computer time
- Over 10^7 energy computations

Roadmap:
- 2000 conformations
- 23 seconds of computer time
- ~50,000 energy computations

~6 orders of magnitude speedup!

Page 44: The Probabilistic Roadmap Approach to  Study Molecular Motion

Using Path Sampling to Construct Roadmaps

N. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build Better Markovian State Models: Predicting the Folding Rate and Mechanism of a Tryptophan Zipper Beta Hairpin. J. Chemical Physics, 121(1):415-425, 2004

New idea: Paths computed with Molecular Dynamics simulation techniques are used to create the nodes of the roadmap

More pertinent/better distributed nodes

Edges are labeled with the time needed to traverse them

Page 45: The Probabilistic Roadmap Approach to  Study Molecular Motion

Sampling Nodes from Computed Paths (Path Shooting)

[Figure labels: t, U (unfolded state), F (folded state)]

Page 46: The Probabilistic Roadmap Approach to  Study Molecular Motion

Sampling Nodes from Computed Paths (Path Shooting)

[Figure labels: U, F, and nodes i and j joined by an edge labeled with tij and pij]

Page 47: The Probabilistic Roadmap Approach to  Study Molecular Motion

Node Merging

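If two nodes are closer than some threshold distance, they are merged into one roadmap node

Rules are applied to update edge probabilities and times

[Figure: nodes 2 and 4 are merged into node 2'; the edges leaving node 1 labeled (P12, t12) and (P14, t14) are replaced by a single edge labeled (P12', t12')]

$$P_{12'} = P_{12} + P_{14}$$

$$t_{12'} = \frac{P_{12}\,t_{12} + P_{14}\,t_{14}}{P_{12} + P_{14}}$$

(probability-weighted average of the two times, so that P12' x t12' = P12 x t12 + P14 x t14)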
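A sketch of the merging rule for the outgoing edges of one node, under the assumption that the merged time is the probability-weighted average of the two original times (this keeps the product P x t, and hence the first-step MFPT equation, unchanged). The function name and the numbers are illustrative.

```python
def merge_targets(edges, keep, absorbed):
    """Merge two target nodes in the outgoing edges of a single node.

    edges: dict target -> (probability, time). The edge to `absorbed` is
    folded into the edge to `keep`, which plays the role of node 2'.
    """
    p_a, t_a = edges[keep]
    p_b, t_b = edges[absorbed]
    p_new = p_a + p_b
    # Assumed rule: probability-weighted average of the traversal times
    t_new = (p_a * t_a + p_b * t_b) / p_new
    merged = {k: v for k, v in edges.items() if k not in (keep, absorbed)}
    merged[keep] = (p_new, t_new)
    return merged

# Hypothetical outgoing edges of node 1: target -> (P, t)
edges_from_1 = {2: (0.3, 10.0), 4: (0.1, 30.0), 5: (0.6, 5.0)}
merged = merge_targets(edges_from_1, keep=2, absorbed=4)
```

Here the merged edge carries probability 0.4 and time 15, and the outgoing probabilities of node 1 still sum to 1.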

Page 48: The Probabilistic Roadmap Approach to  Study Molecular Motion

Application: Computation of MFPT

Mean First Passage Time: the average time a protein takes to first reach its folded state

First-Step Analysis yields:

$$\mathrm{MFPT}(i) = \sum_j P_{ij}\,\bigl(t_{ij} + \mathrm{MFPT}(j)\bigr), \qquad \mathrm{MFPT}(i) = 0 \ \text{if } i \in F$$

Assuming first-order kinetics, the probability that a protein has folded by time t is:

$$P_f(t) = 1 - e^{-rt}$$

where r is the folding rate

$$\mathrm{MFPT} = \int_0^\infty t\,\frac{dP_f}{dt}\,dt = \frac{1}{r}$$
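The first-step equations for MFPT can be solved the same way as those for pfold. The sketch below uses a simple fixed-point (Gauss-Seidel) iteration on an invented three-node chain; it assumes the folded set is reachable from every node, otherwise the iteration does not converge.

```python
def solve_mfpt(P, T, folded, tol=1e-12, max_iter=100000):
    """Solve MFPT(i) = sum_j P_ij (t_ij + MFPT(j)), with MFPT = 0 on folded.

    P: dict node -> {target: probability}; T: dict node -> {target: time}.
    """
    m = {i: 0.0 for i in P}
    for _ in range(max_iter):
        change = 0.0
        for i in P:
            if i in folded:
                continue
            new = sum(p * (T[i][j] + m[j]) for j, p in P[i].items())
            change = max(change, abs(new - m[i]))
            m[i] = new
        if change < tol:
            break
    return m

# Toy chain 0 - 1 - 2, node 2 folded, every edge takes one time unit
P = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {2: 1.0}}
T = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {2: 0.0}}
mfpt = solve_mfpt(P, T, folded={2})
folding_rate = 1.0 / mfpt[0]  # r = 1 / MFPT under first-order kinetics
```

By hand, MFPT(1) = 1 + 0.5 MFPT(0) and MFPT(0) = 1 + MFPT(1), which gives MFPT(1) = 3 and MFPT(0) = 4.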

Page 49: The Probabilistic Roadmap Approach to  Study Molecular Motion

Computational Test

12-residue tryptophan zipper beta hairpin (TZ2)

Folding@Home used to generate trajectories (fully atomistic simulation) ranging from 10 to 450 ns

1750 trajectories (14 reaching folded state)

22,400-node roadmap

MFPT ~ 2-9 μs, which is similar to experimental measurements (from fluorescence and IR)

Page 50: The Probabilistic Roadmap Approach to  Study Molecular Motion

Conformational Analysis of Protein Loops

J. Cortés, T. Siméon, M. Remaud-Siméon, and V. Tran. Geometric Algorithms for the Conformational Analysis of Long Protein Loops. J. Comp. Chemistry, 25:956-967, 2004

New idea: Explore the clash-free subset of the conformational space of a loop by building a tree-shaped roadmap

Kinematic model: φ-ψ angles on the backbone + χi torsional angles in side-chains

Page 51: The Probabilistic Roadmap Approach to  Study Molecular Motion

Amylosucrase (AS)

- Only enzyme in its family that acts on sucrose substrate

- The 17-residue loop (named loop 7) between Gly433 and Gly449 is believed to play a pivotal role

Page 52: The Probabilistic Roadmap Approach to  Study Molecular Motion

Roadmap Construction

A tree-shaped roadmap is created from a start conformation qstart

At each step of the roadmap construction, a conformation qrand of the loop is picked at random, and a new roadmap node is created by iteratively pulling toward it the existing node that is closest to qrand

Page 53: The Probabilistic Roadmap Approach to  Study Molecular Motion

Roadmap Construction

[Figure: conformational space C with the subsets Cfree and Cclosed; the tree grows from qstart toward qrand]

Stops when one can’t get closer to qrand or a clash is detected
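The tree growth described here resembles a rapidly-exploring random tree. The sketch below is a generic 2D illustration of that extension step, not the authors' loop-specific algorithm: `is_valid` is a made-up stand-in for the clash and loop-closure tests, and the step size and region are arbitrary.

```python
import math
import random

def extend_tree(nodes, parents, q_rand, is_valid, step=0.05, max_steps=100):
    """Pull the node nearest to q_rand toward it; add the last valid point."""
    near = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], q_rand))
    q = nodes[near]
    moved = False
    for _ in range(max_steps):
        d = math.dist(q, q_rand)
        if d < 1e-9:              # reached q_rand: cannot get any closer
            break
        s = min(step, d)
        q_next = tuple(a + s * (b - a) / d for a, b in zip(q, q_rand))
        if not is_valid(q_next):  # clash (or other violated constraint)
            break
        q, moved = q_next, True
    if not moved:
        return None
    nodes.append(q)
    parents.append(near)
    return len(nodes) - 1

def is_valid(q):
    # Hypothetical valid region standing in for the clash-free, closed set
    return 0.0 <= q[0] < 0.5 and 0.0 <= q[1] <= 1.0

rng = random.Random(1)
nodes, parents = [(0.1, 0.5)], [None]  # qstart is the root of the tree
for _ in range(30):
    q_rand = (rng.random(), rng.random())
    extend_tree(nodes, parents, q_rand, is_valid)
```

The `parents` list records the tree structure, so a path from any node back to qstart can be read off by following parent indices.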

Page 54: The Probabilistic Roadmap Approach to  Study Molecular Motion

Computational Results

Surprisingly, loop 7 can’t move much

Main bottleneck is residue Asp231

[Figure: positions of the Cα atom of the middle residue (Ser441)]

Page 55: The Probabilistic Roadmap Approach to  Study Molecular Motion

Computational Results

If residue Asp231 is “removed”, then loop 7’s mobility increases dramatically. The Cα atom of Ser441 can be displaced by more than 9Å from its crystallographic position

Page 56: The Probabilistic Roadmap Approach to  Study Molecular Motion

Conclusion

Probabilistic roadmaps are a recent but promising tool for exploring conformational spaces and computing ensemble properties of molecular pathways

Current/future research:

• Better sampling strategies able to handle more complex molecular models (protein-protein binding)

• More work to include time information in roadmaps

• More thorough experimental validation to compare computed and measured quantitative properties