
Post on 18-Jan-2018


CONFORMATIONAL OPTIMIZATION AND SAMPLING ALONG NATURAL COORDINATES

Peter Minary
Computational Structural Biology Group & Bio-X Center
Stanford University
Stanford, CA 94305

TALK OUTLINE

– Obstacles for Deciphering the Central Dogma of MB

– Challenges for Optimization & Sampling Algorithms

– Natural Coordinates for Biological Macromolecules

– Chain Closure Algorithms, Obstacles & Solutions

– An Atomic Level Insight into the Central Dogma
• Nucleosome Positioning/Large Scale Optimization
• Structure Space of RNA Junctions and Fractals
• Interpretation & Refinement of Experimental Data

CENTRAL DOGMA OF MOLECULAR BIOLOGY

[Figure: the central dogma after F. H. Crick(1), with stages labeled Transcriptional Regulation, Post Transcriptional Regulation, Translation, Folding, and Motion, leading to FUNCTION.]

“If you want to understand function, study structure.” F. H. C. Crick

(1) F. H. C. Crick et al. Nature 227 561-563 (1970).

TRANSCRIPTIONAL REGULATION

[Figure: a transcription factor (TF) and DNA; Nucleosome Structure and Nucleosome Positioning; the sequence ...GTCCAGTTACGAATTGCGCGC… mapped to a 3D Structure with energy E(Xi); a TF scanning DNA (…..GTGAATGCCCAG…..) in chromatin.]

– Grand Challenges for CSB
• Structure Based Prediction of Nucleosome Positions
• Structure Based Prediction of Transcription Factor (TF) Binding Sites

• Requires All Atom Representation & Rapid Optimization
• Simultaneously Explore Sequence and Structure Space

• Need Conceptually Novel Optimization/Sampling Tools

POST TRANSCRIPTIONAL REGULATION

EXAMPLE: mRNA TRANSPORT IN NEURONS

– Grand Challenges for CSB
• Prediction of RNA Tertiary Structure
• & Transport Protein Binding Sites

• Need a Novel Optimization/Sampling (O/S) Approach

PROTEIN MOTION

[Figure: EM images of a molecular complex, Fatty Acid Synthase (FAS).]

– Current Trend: Experimentally Measured Structures Are Getting
• Larger in Size
• Higher in Flexibility
• Lower in Resolution

– In Current Refinement Methods, Atomic Motions Are Modeled As
• Independent
• Isotropic
• Harmonic

– To Follow the Trend, Atomic Motion in Refinement Methods Should Be
• Collective
• Anisotropic
• Anharmonic

– Demand for Novel Optimization Methods for Structure Refinement

CHALLENGES FOR OPTIMIZATION & SAMPLING ALGORITHMS

– Roughness of the objective function, E(X)
• Leads to rare events in Markov Chain MC(1)
• Solutions
  – Multiple Markov Chains in Temperature(2)/Energy Domain(3, 4)
  – Transformation of Variables(5) and/or using Extra Dimensions(6)

– Large number of degrees of freedom, Nd
• Number of energy basins is non polynomial in Nd
• Solutions
  – Local or Global Torsional Degrees of Freedom(4,7)
  – Arbitrary/Most Relevant/Natural Degrees of Freedom(9)

(1) Metropolis, et al. J. Chem. Phys. 21, 1087-1091 (1953).
(2) Geyer, et al. Proceedings of the 23rd Symposium on the Interface, 156-163 (1991).
(3) Kou, et al. Annals of Statistics 34 1581-1619 (2006).
(4) Minary et al. Annals of Statistics 34 1638-1642 (2006).
(5) Minary et al. SIAM Journal of Scientific Computing 30 2055-2083 (2008).
(6) Minary et al. J. Chem. Phys. 118 2510-2525 (2003).
(7) Minary et al. J. Mol. Biol. 25 920-933 (2008).
(8) Dodd et al. Mol. Phys. 78 961-996 (1993).
(9) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
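The "multiple Markov chains in temperature" idea listed on this slide can be illustrated with a small parallel tempering sketch. This is a toy under stated assumptions, not the speaker's implementation: the 1D energy function, the inverse temperatures, the step size, and all function names are hypothetical and chosen only to show how a hot chain helps a cold chain cross a barrier on a rough landscape.

```python
import math
import random

def energy(x):
    # Hypothetical rough 1D landscape: a double well plus a fast ripple.
    return (x * x - 1.0) ** 2 + 0.3 * math.sin(20.0 * x)

def metropolis_step(x, beta, step, rng):
    # Metropolis rule: accept a trial move with probability min(1, exp(-beta * dE)).
    trial = x + rng.uniform(-step, step)
    d_e = energy(trial) - energy(x)
    if d_e <= 0.0 or rng.random() < math.exp(-beta * d_e):
        return trial
    return x

def parallel_tempering(betas, n_sweeps, step=0.25, seed=0):
    rng = random.Random(seed)
    xs = [-1.0 for _ in betas]  # every replica starts in the left well
    cold_samples = []
    for _ in range(n_sweeps):
        # One Metropolis update per replica at its own inverse temperature.
        xs = [metropolis_step(x, b, step, rng) for x, b in zip(xs, betas)]
        # Propose a swap between one random pair of neighbouring temperatures.
        k = rng.randrange(len(betas) - 1)
        log_ratio = (betas[k] - betas[k + 1]) * (energy(xs[k]) - energy(xs[k + 1]))
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
        cold_samples.append(xs[0])  # betas[0] is the coldest chain
    return cold_samples

samples = parallel_tempering(betas=[8.0, 4.0, 2.0, 1.0], n_sweeps=20000)
right_fraction = sum(1 for x in samples if x > 0.0) / len(samples)
```

Here `right_fraction` is the share of cold-chain samples found in the right-hand well; comparing it with a run that uses a single cold chain gives a rough feel for how much the swaps help.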

NATURAL DEGREES of FREEDOM for NUCLEIC ACIDS

Dx Shift, Dy Slide, Dz Rise
τ Tilt, ρ Roll, ω Twist

Sx Shear, Sy Stretch, Sz Stagger
κ Buckle, π Propeller, σ Opening

dof: 10(4+12x½)

[Figure: each degree of freedom drawn in local x, y, z axes; backbone atoms O3′, P, O5’, C5’, C4’, O1’ and N, with angles τ12, τ23, θ1, θ2.]

Moves break the chain!

NATURAL DEGREES of FREEDOM for PROTEINS

β-SHEET & α-HELIX

Sx Shear, Sy Stretch, Sz Stagger
κ Buckle, π Propeller, σ Opening

[Figure: Sx illustrated in local x, y, z axes.]

Moves break the chain!

CHAIN CLOSURE ALGORITHMS

– Analytical multi atom closure algorithms(1)
• Ncd non-linear equations in Ncd unknowns, where Ncd is the number of closure dof
• Ncd = 6 is the practical limit, given that the complexity is O(fNP(Ncd))

– Single atom Deterministic Full Closure (DFC)(2)
• Cost efficient
• Two solutions or no solution

– Single atom Stochastic Partial Closure (SPC)(3)
• Cost efficient
• A solution always exists, for any size of the chain break

(1) Dodd et al. Mol. Phys. 78 961-996 (1993).
(2) Sklenar et al. J. Comp. Chem. 27 309-315 (2005).
(3) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
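The "two solutions or no solution" behaviour of a single atom closure can be seen in a 2D toy in which only bond lengths are constrained. This is a simplified geometric sketch and an assumption on my part, not the DFC algorithm of reference (2), which works in 3D with additional constraints. The function name and arguments are hypothetical.

```python
import math

def close_single_atom_2d(a, b, r1, r2):
    """Return the positions at distance r1 from a and r2 from b (2D toy)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    d = math.hypot(dx, dy)
    # No solution if the anchors coincide, are too far apart, or too close.
    if d == 0.0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    # Distance from a to the foot of the chord joining the two solutions.
    t = (d * d + r1 * r1 - r2 * r2) / (2.0 * d)
    h = math.sqrt(max(r1 * r1 - t * t, 0.0))
    fx, fy = a[0] + t * dx / d, a[1] + t * dy / d
    # The two solutions are mirror images across the line through a and b.
    return [(fx - h * dy / d, fy + h * dx / d),
            (fx + h * dy / d, fy - h * dx / d)]

two = close_single_atom_2d((0.0, 0.0), (1.6, 0.0), 1.0, 1.0)
none = close_single_atom_2d((0.0, 0.0), (3.0, 0.0), 1.0, 1.0)
```

With anchors 1.6 apart and unit bonds the function returns a mirror-image pair; with anchors 3.0 apart the gap is too wide and the list is empty, which is the case that stochastic partial closure is meant to avoid.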

RECURSIVE STOCHASTIC CLOSURE

1 cycle of RSC = DFC[ SPC[ SPC[ SPC[…] ] ] ]

[Figure: molten zone; SPC steps followed by DFC in the 1st cycle, repeated for m cycles.]

• One SPC step
– Restores 4-5, breaks 3-4

• Multiple SPC steps
– Propagate the chain break
– Narrow the closure gap

• AC = O(Ncd) << O(fNP(Ncd))
– Ncd = 2 Nm + 5

Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
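The composition "DFC[ SPC[ SPC[…] ] ]" can be mimicked with a 2D toy chain in which only bond lengths are constrained. The sketch below is an illustration under loud assumptions, not the published RSC algorithm: the unit bond length, the angular noise, the choice of the solution nearest the old position, and every function name are hypothetical. Each toy SPC step restores the bond to the previous atom and pushes the break one bond further along the molten zone; the final toy DFC step places the last molten atom by circle intersection.

```python
import math
import random

BOND = 1.0  # toy bond length (assumption)

def spc_step(prev_atom, atom, rng, noise):
    # Restore the bond prev_atom-atom: move `atom` onto the circle of radius
    # BOND around prev_atom, near its old direction plus angular noise.
    angle = math.atan2(atom[1] - prev_atom[1], atom[0] - prev_atom[0])
    angle += rng.uniform(-noise, noise)
    return (prev_atom[0] + BOND * math.cos(angle),
            prev_atom[1] + BOND * math.sin(angle))

def dfc_step(left, right, old):
    # Place one atom at distance BOND from both neighbours; return the
    # solution nearest the old position, or None if the gap is too wide.
    dx, dy = right[0] - left[0], right[1] - left[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > 2.0 * BOND:
        return None
    h = math.sqrt(BOND * BOND - 0.25 * d * d)
    mx, my = left[0] + 0.5 * dx, left[1] + 0.5 * dy
    candidates = [(mx - h * dy / d, my + h * dx / d),
                  (mx + h * dy / d, my - h * dx / d)]
    return min(candidates, key=lambda c: math.dist(c, old))

def rsc_cycle(chain, first, last, rng, noise=0.02):
    # One toy cycle: SPC on atoms first..last-1, then DFC on atom `last`.
    # Atoms before `first` and after `last` are held fixed.
    new = list(chain)
    for j in range(first, last):
        new[j] = spc_step(new[j - 1], new[j], rng, noise)
    closed = dfc_step(new[last - 1], new[last + 1], new[last])
    if closed is None:
        return None
    new[last] = closed
    return new

# Zig-zag chain with unit bonds; shifting atoms 0-1 opens a break at bond 1-2.
chain = [(j * math.cos(math.pi / 6), 0.5 * (j % 2)) for j in range(8)]
broken = [(x - 0.3, y) if j < 2 else (x, y) for j, (x, y) in enumerate(chain)]
closed = rsc_cycle(broken, first=2, last=5, rng=random.Random(1))
```

In this example the break at bond 1-2 is passed along atoms 2, 3, and 4 and shrinks at each step, so the last molten atom can be placed exactly. `rsc_cycle` returns `None` when the remaining gap is too wide for the final placement.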

MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-I

[Figure: molten zone (C4’….O3’).]

Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).

MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-II

• Monte Carlo Minimization(1) (MCM) is Monte Carlo on $\tilde{E}(X) = \min_{X} E(X)$

• MCRSC(2) is Monte Carlo on $\tilde{E}(X_i) = \min_{X_d} E(X_i, X_d)$

        minimization     invariant DOF   X           E evaluation
MCM     BFGS, CG         none            cart/tors   ~10-1000
MCRSC   N cycle of RSC   Xi              arbitrary   1

(1) Wales, D. J., Scheraga, H. A. Science 285 1368-1372 (1999).
(2) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
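The MCM row of the comparison can be made concrete with a short sketch: every trial move is followed by a local minimization, and the Metropolis test is applied to the minimized energies. This is a minimal illustration under assumptions, not code from the talk: the 1D energy function is hypothetical, plain gradient descent stands in for BFGS/CG, and MCRSC itself (where cycles of RSC take the place of the minimizer) is not implemented here.

```python
import math
import random

def energy(x):
    # Hypothetical rough 1D energy with several local minima.
    return 0.1 * x * x + math.sin(3.0 * x)

def gradient(x):
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def local_minimize(x, rate=0.05, n_steps=500):
    # Plain gradient descent; each call costs many gradient evaluations,
    # which is the expense the "E evaluation" column refers to for MCM.
    for _ in range(n_steps):
        x -= rate * gradient(x)
    return x

def monte_carlo_minimization(x0, beta, n_moves, step=1.5, seed=0):
    rng = random.Random(seed)
    x = local_minimize(x0)
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_moves):
        # Perturb, then relax to the bottom of whichever basin was reached.
        trial = local_minimize(x + rng.uniform(-step, step))
        e_trial = energy(trial)
        # Metropolis test on the minimized energies.
        if e_trial <= e or rng.random() < math.exp(-beta * (e_trial - e)):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

start_e = energy(local_minimize(4.0))
best_x, best_e = monte_carlo_minimization(x0=4.0, beta=2.0, n_moves=200)
```

Because the walk hops between basin bottoms rather than between raw configurations, barriers between neighbouring minima no longer trap it, at the price of one full minimization per move.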

RECURSIVE STOCHASTIC vs DETERMINISTIC FULL CLOSURE in MONTE CARLO: a B-DNA

• RSC works with move sizes an order of magnitude larger than DFC
• RSC is like a wire: you pull, and the system deforms to follow the change

[Figure: the sampled degrees of freedom Dx, Dy, Dz, Sx, Sy, Sz, each drawn in local x, y, z axes.]

dof: 6

E2 binding DNA: 5’-ACCGAATTCGGT-3’
Force Field: amber99-bs0

Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).

RECURSIVE STOCHASTIC CLOSURE vs LOOP TORSIONAL SAMPLING in MONTE CARLO: an α+β PROTEIN

SCOP id: d1div_2, 55 residue domain

Ncd = 19

[Figure: panels (1) and (2).]

(1) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Minary & Levitt J. Mol. Biol. 25 920-933 (2008).

APPLICATIONS

IN SILICO NUCLEOSOME POSITIONING

THE METHOD: GENERAL PIPELINE

APPLICATION TO CHROMOSOME 14

[Figure: ab initio and in vitro P(i) plotted against position i.]

• Yeast Chromosome 14
– 187k-189k from SGD(1)
– Experimental Data(2)

• Nucleosome template
– 1.9 Å resolution
– pdb code (1kx3)(3)

• Slide nucleosome along DNA
– Slide a 147 bp window
– Design template

• Run MCRSC on all structures
– Force field: AMBER99-bs0(5)
– Software: MOSAICS(6)

• Get probability profile
– P(i) ~ exp(-β <E(i)>)

(1) Cherry, J. M. et al., Nucleic Acids Res. 26, 73-79 (1998).
(2) Kaplan, N. et al., Nature 458, 362-366 (2009).
(3) Davey, C. A. et al., J. Mol. Biol. 319 1097-1113 (2002).
(4) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(5) Perez et al., Biophys. J. 92 3817-3827 (2007).
(6) Minary (2010).
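The last two pipeline steps, sliding a 147 bp window and turning mean energies into P(i) ~ exp(-β <E(i)>), amount to a Boltzmann-weighted normalization. The sketch below shows only that arithmetic; the function names and the example energies are hypothetical, and the mean energies <E(i)> are assumed to come from separate MCRSC runs that are not reproduced here.

```python
import math

def window_starts(sequence_length, window=147):
    # Every start index i at which a full window fits on the sequence.
    return range(sequence_length - window + 1)

def probability_profile(mean_energies, beta):
    # P(i) proportional to exp(-beta * <E(i)>), normalized over all positions.
    # Subtracting the minimum energy avoids overflow without changing P(i).
    e_min = min(mean_energies)
    weights = [math.exp(-beta * (e - e_min)) for e in mean_energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical mean energies for three window positions.
profile = probability_profile([0.0, 1.0, 2.0], beta=math.log(2.0))
n_positions = len(window_starts(150))
```

With β = ln 2 each unit of energy halves the weight, so the three positions receive probabilities in the ratio 4 : 2 : 1, and a 150 bp sequence offers 4 window positions.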

[Figure axis: position i, 187k to 207k. Minary & Levitt.]

NUCLEOSOME OCCUPANCY

Yeast Chromosome 14

[Figure: P(i) in vivo, in vitro, and ab initio versus position i; one panel spans 187000 to 207000 and another 191000 to 199000. Minary & Levitt.]

EXPLORING RNA STRUCTURE SPACE

HIERARCHICAL NATURAL DOFs/MOVES (HNM)

[Figure: hierarchy levels L1, L2, L3, L4.]

RNA 4 WAY JUNCTION: SAMPLING METHODS

[Figure: move sets(1,2,3) and sampling methods. Move set L1 gives NM-MC(1,3); move sets L1 – L2, L1 – L3, and L1 – L4 give HNM-MC(1,2,3). Schematically, MCRSC(1) + L1 + L2 + L3 + L4 + . . . = HNM-MC(1,2,3), and MCRSC(1) + User Defined Move Sets (Medicine/Physics) (Chemistry/Biology).]

(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P. To be submitted.
(3) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS

RNA 4 WAY JUNCTION

[Figure: panels (a) NM-MC(1,5) with L1, (b) FA-MC-Sym(2), (c) FA-Rosetta(3), (d) HNM-MC(1,4,5) with L1-L4; junction arms labeled L1, L2, L3, L4.]

• Necessary condition for unbiased sampling
– Symmetric RNA -> distributions coincide

• Easy to improve by field specific move set
– RNA: relative arrangement of stem loops

• Comparing to Fragment Assembly
– Biased and non continuous sampling
– Dependence on fragment libraries

(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Parisien and Major, Nature, 452, 51 (2008).
(3) R. Das, J. Karanicolas, and D. Baker, Nat. Methods 7 (4), 291 (2010).
(4) Sim, A., Levitt, M., Minary, P., To be submitted.
(5) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS

FRACTAL RNA: BEYOND CURRENT METHODS

• Necessary condition for unbiased sampling
– Symmetric RNA -> arm end distributions coincide

• Further improvement by L5, L6, L7
– No limitation on improvement

• Benchmark with different move sets
– Accuracy converges by L7

[Figure: HNM-MC(1,2,3) error(i) versus i x 10^4 for move sets L1 – L4 and L1 – L7.]

(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P., To be submitted.
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS

FRACTAL RNA: WHY/HOW DOES IT WORK?

• Use embedded subspaces: $\Omega_3 \subset \Omega_2 \subset \Omega_1 \equiv \Omega$

• In particular
– $\Omega_3$: 6 DOFs / main arms(2)
– $\Omega_2$: 6 DOFs / arms of arms(2)
– $\Omega_1$: 10 DOFs / nucleotides(1)

• Low cost method to approximate $\int_{L \in \Omega} \alpha(L)\, f(L)\, dL$, with $\alpha, f : \Omega \to \mathbb{R}$

• Multi scale integration(3) along
– $L_3 \in \Omega_3$
– $L_2 \in \Omega_2$ around all $L_3$
– $L_1 \in \Omega_1$ around all $L_2$

(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P., To be submitted.
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS

CRYO-EM REFINEMENT

OBJECTIVE

[Figure: EM images of a molecular complex, Fatty Acid Synthase (FAS); initial model, refined model, and EM image.]

VALIDATION I

[Figure: initial structure, target structure, and refined structure, with labels 2 Å rmsd and 18 Å rmsd; target projection; the refinement step is labeled optimization(1)-(3) along natural dof.]

(1) Zhang, Minary, Levitt In preparation.
(2) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS

VALIDATION II: CROSS CORRELATION OF MAPS

[Figure: Lysozyme; cross correlation (cc) versus Projection Angle.]

THE PROTOCOL

Etotal = Weight*EEM + Emolecule
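The combined target Etotal = Weight*EEM + Emolecule can be sketched numerically once a form for the EM term is chosen. The slide does not define EEM, so the choice below, EEM = 1 - cc with cc the normalized cross correlation between a model map and the target map, is purely an assumption for illustration, as are the function names and the tiny example maps.

```python
import math

def cross_correlation(map_a, map_b):
    # Normalized (Pearson) cross correlation of two equally sized maps,
    # given as flat lists of density values.
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    return cov / math.sqrt(var_a * var_b)

def total_energy(e_molecule, model_map, target_map, weight):
    # Assumption: E_EM = 1 - cc, so a perfect fit contributes zero.
    e_em = 1.0 - cross_correlation(model_map, target_map)
    return weight * e_em + e_molecule

target = [0.0, 1.0, 2.0, 1.0]
perfect = total_energy(-5.0, [0.0, 1.0, 2.0, 1.0], target, weight=10.0)
inverted = total_energy(-5.0, [2.0, 1.0, 0.0, 1.0], target, weight=10.0)
```

The weight sets how strongly a poor map fit is penalized relative to the molecular force field term: here a perfectly matching map leaves the molecular energy of -5.0 unchanged, while an inverted map raises the total to 15.0.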

REFINEMENT

[Figure: Lysozyme.]

DOMAIN FLEXIBILITY

[Figure: panels credited to (1)-(3) and (4).]

(1) Zhang, Minary, Levitt In preparation.
(2) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
(4) Courtesy of Steve Ludtke, Baylor College, Texas.

CONCLUSION

• CSB has Limited Impact due to Inefficient Conformational Sampling

• Novel Algorithms Supporting Natural DOF May Offer The Solution

• Our Novel Approach May Open New Avenues

– In The Refinement and Interpretation of Experimental Data

– In The Use of Structural Information in Molecular Biology

• Atomic Level Understanding of the Central Dogma of Molecular Biology (CDMB) May Become a Reality with Natural Coordinates (NC)

[Figure: CDMB and FUNCTION.]

“If the code does indeed have some logical foundation then it is legitimate to consider all the evidence, both good and bad, in any attempt to deduce it.” F. H. C. Crick

ACKNOWLEDGEMENTS

– Michael Levitt, Computer Sci. & Structural Biology, Stanford, US
– Jernej Ule, Molecular Biology/MRC, Cambridge, UK
– Peter Lukavsky, Molecular Biology/MRC, Cambridge, UK
– Sebastian Doniach, Physics, Stanford, US
– Zev Bryant, Bioengineering, Stanford, US
– Wing H Wong, Statistics, Stanford, US
– Wah Chiu, Baylor College, Texas, US

– Adelene Sim, Physics, Stanford, US (graduate student)
– Gaurav Chopra, Mathematics, Stanford, US (graduate student)
– Junjie Zhang, Baylor College and Stanford, US (postdoc)

– Anatole von Lilienfeld & Workshop Organizing Committee