Let's Play LEGO!major/BCM6200/notes/baker.pdf · Let's Play LEGO! Use the PDB [1] as the building...

Let's Play LEGO!

● Use the PDB [1] as the building blocks for 3D protein construction.
● Target is Baker's Top7 protein (novel fold) [2].

[1] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne, The Protein Data Bank, Nucl. Acids Res. 28 (2000) 235-242.

[2] B. Kuhlman, G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, D. Baker, Design of a novel globular protein fold with atomic-level accuracy, Science 302 (2003) 1364-1368.

Baker's Top7 Protein

● Two beta-alpha-beta modules (with a strand in between).
● Pure anti-parallel beta-sheet.
● PDB code 1QYS.

Top7 Design

RosettaDesign (http://www.unc.edu/kuhlmanpg/rosettadesign.htm) [1]

1) Make a 3D backbone of a novel fold (i.e. not yet observed in the PDB). The novelty here comes from the relative strand arrangement (2-1-3-5-4).

2) Optimize the sequence on the backbone using a rotamer library.

3) Optimize the backbone given the sequence using an all-atom potential energy (implicit solvent model).

4) Go to 2) until the sequence and the energy converge.

[1] B. Kuhlman, G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, D. Baker, Design of a novel globular protein fold with atomic-level accuracy, Science 302 (2003) 1364-1368.

Our Approach

1) Sequence -> Secondary Structure Assignments
(alpha-helices, beta-turns, beta-strands, coils, ...)
by computer (SS prediction, ...), by lab (CD, NMR, ...), ...

2) Strands -> Plausible beta-sheet topologies
(at least N! * 2^(N-2) solutions, where N is the number of strands)
Start building the protein with its beta-sheet: sequence portions that are far apart in the chain interact at short range in the 3D structure! Everything in between (turns, helices) is trapped!!

3) Topologies -> 3D structures of beta-sheets
(strands are curled and sheets are twisted, pleated and arched)
Existing software is not able to explore the conformational space of beta-sheets because of its complexity (it is not enough to update a few phi/psi angles here and there).
-> We have a very nice solution to it!

Our Approach

4) From the 3D structures of the beta-sheets add:
- beta-turns.
- beta-alpha-beta units.
-> We have a nice solution to this too!

5) Optimize the hydrophobic moments of all the alpha-helices with a simple search space operator and a simple energy function.
(we will stop here...)

6) Use a rotamer library to add side-chains.

7) Use a minimizer to correct:
- bond lengths (especially the peptide link between blocks).
- steric clashes.

Start with the Sequence

● NNPREDICT (http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html) [2][3]

                 1         2         3         4         5         6         7         8         9
        456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
S: MGDIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRISITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQLEGGSLEHHHHHH
C: -----EEEEEEETTEEEEEEE----HHHHHHHHHHHHHHHHHH---EEEEEEE---HHHHHHHHHHHHHHHHHH---EEEEEEETTEEEEEEE-------------
P: ----EEEE----------EEEE----HHHHHHHHHHHHHHH-----EEEEEEE--HHHHHHHHHHHHHHHHHHH----EEEE-----EEEE-------HHHH----

● PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/psiform.html) [1]

                 1         2         3         4         5         6         7         8         9
        456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
S: MGDIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRISITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQLEGGSLEHHHHHH
C: -----EEEEEEETTEEEEEEE----HHHHHHHHHHHHHHHHHH---EEEEEEE---HHHHHHHHHHHHHHHHHH---EEEEEEETTEEEEEEE-------------
P: ---EEEEEEE-------EEEEEEE-HHHHHHHHHHHHHHHHH----EEEEEEEE--HHHHHHHHHHHHHHHHH----EEEEEE---EEEEEEEE------------

Legend: S for Sequence, C for Crystal, P for Prediction.

[1] L. J. McGuffin, K. Bryson, D. T. Jones, The PSIPRED protein structure prediction server, Bioinformatics 16 (2000) 404-405.
[2] J. L. McClelland, D. E. Rumelhart, Explorations in Parallel Distributed Processing, vol. 3, pp. 318-362, MIT Press, Cambridge MA, 1988.
[3] D. G. Kneller, F. E. Cohen, R. Langridge, Improvements in Protein Secondary Structure Prediction by an Enhanced Neural Network, J. Mol. Biol. 214 (1990) 171-182.

Use Hairpin and Helix Predictions Also

● TURNPRED (http://www.jens-meiler.de/turnpred.html) [1]

                 1         2         3         4         5         6         7         8         9
        456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
S: MGDIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRISITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQLEGGSLEHHHHHH
C: -----EEEEEEETTEEEEEEE----HHHHHHHHHHHHHHHHHH---EEEEEEE---HHHHHHHHHHHHHHHHHH---EEEEEEETTEEEEEEE-------------
P: ---EEEEEEEhhhhhhEEEEEE---HHHHHHHHHHHHHHHHH----EEEEEEEE--HHHHHHHHHHHHHHHHHH---EEEEEEhhhEEEEEEEE------------

Legend: S for Sequence, C for Crystal, P for Prediction.

● PROTSCALE (http://www.expasy.org/tools/protscale.html)

[1] M. Kuhn, J. Meiler, D. Baker, Strand-loop-strand motifs: prediction of hairpins and diverging turns in proteins, Proteins 54 (2003) 282-288.

Secondary Structure Prediction

● Exploit the strength of each method (e.g. TURNPRED for beta-turns).

● Use recent methods trained on recent databases (it is not good to use Chou/Fasman, parameterized in 1973 on 59 proteins).

● Look for consistent and consensus predictions (different predictions are due to an ambiguous peptide signal).

● Relative prediction accuracy: beta-turns > alpha-helices > beta-strands > coils
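
The "consistent and consensus" advice can be illustrated with a column-wise majority vote over several prediction strings. This is a minimal sketch, not the procedure used in the talk: the function names `consensus` and `agreement`, the tie symbol, and the toy strings are all assumptions made for illustration.

```python
from collections import Counter


def consensus(predictions, tie="-"):
    """Column-wise majority vote over equal-length secondary structure strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    out = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        # Keep the top state only if it strictly beats the runner-up.
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            out.append(counts[0][0])
        else:
            out.append(tie)  # ambiguous column: no consensus
    return "".join(out)


def agreement(a, b):
    """Fraction of positions where two equal-length strings carry the same state."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy example with three 4-residue predictions (E = strand, H = helix, - = coil).
toy = consensus(["EEH-", "EEHH", "E-H-"])
```

Columns where the methods disagree evenly come out as the tie symbol, which flags the ambiguous stretches rather than hiding them.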

Top7 Secondary Structure

STRAND 4 10
TURN 10 13
STRAND 13 19
LOOP 19 24
HELIX 24 41
LOOP 41 45
STRAND 45 51
LOOP 51 55
HELIX 55 72
LOOP 72 76
STRAND 76 82
TURN 82 85
STRAND 85 91

Fragments from PDB

Not new:

● ROSETTA [1]
● PROFESY [2]
● 3-MER [3]
● Assembly of Segments [4]
...

Ideas behind the use of PDB fragments:

1) Sequence imposes the local backbone conformation [5].
2) PDB blocks reduce the conformational search space [4].

(see also [6] for technical points)

Fragments from PDB (refs)

[1] K. T. Simons, C. Kooperberg, E. Huang, D. Baker, Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions, J. Mol. Biol. 268 (1997) 209-25.

[2] J. Lee, S. Y. Kim, K. Joo, I. Kim, J. Lee, Prediction of protein tertiary structure using PROFESY, a novel method based on fragment assembly and conformational space annealing, Proteins 56 (2004) 704-14.

[3] E. Martineau, P. J. L'Heureux, J. R. Gunn, Biased fragment distribution in MC simulation of protein folding, J. Comput. Chem. 25 (2004) 1895-903.

[4] I. Simon, L. Glasser, H. A. Scheraga, Calculation of protein conformation as an assembly of stable overlapping segments: application to bovine pancreatic trypsin inhibitor, PNAS 88 (1991) 3661-5.

[5] A. G. Street, S. L. Mayo, Intrinsic beta-sheet propensities result from van der Waals interactions between side chains and the local backbone, PNAS 96 (1999) 9074-6.

[6] J. B. Holmes, J. Tsai, Some fundamental aspects of building protein structures from fragment libraries, Protein Sci. 13 (2004) 1636-50.

What is Backtracking?

Variables   Domains
V1          D1 : { 1, 2, 3, ..., 10 }
V2          D2 : { A, B, C, ..., Z }
V3          D3 : { I, II, III, ..., M }

Backtracking produces the Cartesian product D1 x D2 x D3 in a depth-first, search-like manner.

What is Backtracking?

(search tree, one level per variable)
V1: 1, 2, ..., 10
V2: A, B, ..., Z
V3: I, II, ..., M

Partial assignments are checked for validity. If not valid, then backtrack!
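
A minimal sketch of the idea, assuming a user-supplied validity check: the generator below walks the domains depth first and abandons a branch as soon as a partial assignment fails. The names `backtrack` and `is_valid` and the shortened toy domains are illustrative assumptions, not code from the talk.

```python
def backtrack(domains, is_valid, partial=()):
    """Yield every full assignment whose prefixes all pass is_valid, depth first."""
    if len(partial) == len(domains):
        yield partial
        return
    for value in domains[len(partial)]:
        candidate = partial + (value,)
        # Check the partial assignment before descending; failing here
        # prunes the whole sub-tree below this value.
        if is_valid(candidate):
            yield from backtrack(domains, is_valid, candidate)


# Toy domains echoing the slide (shortened so the output stays small).
domains = [(1, 2, 3), ("A", "B"), ("I", "II")]

# With a check that accepts everything, the search enumerates D1 x D2 x D3.
everything = list(backtrack(domains, lambda partial: True))

# Rejecting every partial assignment that starts with 2 prunes that sub-tree.
pruned = list(backtrack(domains, lambda partial: partial[0] != 2))
```

With no constraint the toy search returns all 3 x 2 x 2 combinations; with the constraint, the four assignments under V1 = 2 are never visited.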

Backtrack Optimizations

1) Variables with the smallest domains should be at the root.

2) 1-Consistency (node-consistency)
Often taken for granted because domain values are taken from the PDB (nucleotide conformation, WC relation, pair of beta-strands, ...).

3) 2-Consistency (arc-consistency)
Idea: when a backtrack fails, it often fails for the same reason.
-> Can we eliminate the dead-ends from the domain values?

Compute the Cartesian product D(i) x D(i+1). If there is a value j in D(i) for which no value k in D(i+1) gives a valid sub-tree D(i,j)-D(i+1,k), then remove j from D(i).
-> Not applicable for MCSYM (why? MCSYM is 2-consistent...)
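
The arc-consistency step can be sketched as a filter over one domain against its successor. This is an illustration only: `prune_arc`, `tree_size`, and the toy compatibility rule are hypothetical names and data, and a full implementation would repeat the pass over every adjacent pair of variables.

```python
def prune_arc(d_i, d_next, compatible):
    """Keep only the values of d_i that have at least one compatible partner in d_next."""
    return [j for j in d_i if any(compatible(j, k) for k in d_next)]


def tree_size(domains):
    """Number of leaves of the backtrack tree: the product of the domain sizes."""
    size = 1
    for domain in domains:
        size *= len(domain)
    return size


# Toy domains and a made-up pairwise test (assumption for illustration only).
d_i = [1, 2, 3, 4]
d_next = [10, 20]
compatible = lambda j, k: (j + k) % 3 == 0

before = tree_size([d_i, d_next])
d_i_pruned = prune_arc(d_i, d_next, compatible)  # 3 has no partner and is removed
after = tree_size([d_i_pruned, d_next])
```

Removing a dead-end value once, up front, avoids rediscovering the same failure every time the search reaches that variable.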

Beta-Strand Shuffler

Given all the beta-strands, how do we arrange them to form a beta-sheet?

With N strands there are N! relative orderings (1-2-3, 1-3-2, ...).

In each ordering every strand can point either left or right, which leads to 2^N different strand orientations.

BUT there are 2 axes of symmetry (divide by two, twice), so the total number of folds is "just"

N! * 2^(N-2)

Now, consider sliding the strands with respect to one another... this is called registering.
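
As a quick check of the counting argument, the formula can be evaluated directly. The function name is an assumption; the count ignores registering, exactly as the formula does.

```python
from math import factorial


def sheet_topology_count(n_strands):
    """N! orderings times 2^N orientations, divided by 4 for the two symmetry axes."""
    if n_strands < 2:
        raise ValueError("a sheet needs at least two strands")
    return factorial(n_strands) * 2 ** (n_strands - 2)


# Top7 has five strands: 5! * 2^3 = 120 * 8 = 960 topologies before registering.
top7_count = sheet_topology_count(5)
```

Each extra strand multiplies the count by 2N, so exhaustive enumeration stays cheap only for small sheets.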

Beta-Strand Shuffler

We need a program that:
1) generates all plausible beta-sheet topologies.
2) scores each topology so we can select the best one.

see http://www-lbit.iro.umontreal.ca/bShuffle/index.html...

Beta-Strand Shuffler

1) generates all plausible beta-sheet topologies.
-> this is straightforward...

2) scores each topology so we can select the best one.
-> this is not so obvious... why?

- Context is a major determinant of beta-sheet propensity [1].
- What determines the best registering between 2 strands?
  - residue pairings?
  - hydrogen bond network?
  - context??

[1] D. L. Minor Jr, P. S. Kim, Context is a major determinant of beta-sheet propensity, Nature 371 (1994) 264-7.

Beta-Strand Shuffler

The energy model is as follows:

- Amino-acid composition in parallel and anti-parallel sheets differs [1].
- Amino-acid pairings are not "random" [2][3].
- Beta-sheets often have a hydrophobic face [4].
- Number of H-bonds given a topology (~2.8 kcal/mol/H-bond) [5].

[1] S. Lifson, C. Sander, Antiparallel and parallel beta-strands differ in amino acid residue preferences, Nature 282 (1979) 109-111.
[2] S. Lifson, C. Sander, Specific recognition in the tertiary structure of beta-sheets of proteins, J. Mol. Biol. 139 (1980) 627-639.
[3] H. Zhu, W. Braun, Sequence specificity, statistical potentials, and three-dimensional structure prediction with self-correcting distance geometry calculations of beta-sheet formation in proteins, Protein Sci. 8 (1999) 326-342.
[4] J. S. Richardson, D. C. Richardson, Principles and patterns of protein conformation, Plenum Press, New York, 1989, Ch. 1, pp. 1-98.
[5] D. N. Boobbyer, P. J. Goodford, P. M. McWhinnie, R. C. Wade, New hydrogen-bond potentials for use in determining energetically favorable binding sites on molecules of known structure, J. Med. Chem. 32 (1989) 1083-1094.

Top7 Beta-Sheet

./bShuffle.exe -S -R 6 -O 19-45 -O 51-76 -H -B 23 ./baker.str

STRAND 4 10
STRAND 13 19
STRAND 45 51
STRAND 76 82
STRAND 85 91

● Maximum register sliding is 6 (strands are of length 7).
● We want residues 19 and 45, as well as 51 and 76, to be on opposite sides of the sheet. This forces the two helices to lie over the beta-sheet.
● The mean number of H-bonds over all the generated topologies is 23.
● The most hydrophobic face is also considered (-H).

Top7 Beta-Sheet

2 -> [ 13:K+][ 14:N ][ 15:F@][ 16:D-][ 17:Y@][ 18:T ][ 19:Y@]
1 <- [ 10:D-][ 9:D-][ 8:I ][ 7:N ][ 6:V ][ 5:Q ][ 4:V ]
3 -> [ 45:R+][ 46:V ][ 47:R+][ 48:I ][ 49:S ][ 50:I ][ 51:T ]
5 <- [ 91:Q ][ 90:G ][ 89:E-][ 88:V ][ 87:T ][ 86:V ][ 85:T ]
4 -> [ 76:D-][ 77:I ][ 78:N ][ 79:V ][ 80:T ][ 81:F@][ 82:D-]

Pairing Energy: -23.73 kcal/mol
Hydrophobicity Energy: -7.04 kcal/mol
(Face 1 Hydrophobicity Score: +15.71)
(Face 2 Hydrophobicity Score: -20.36)
H-bonding Energy: -12.50 kcal/mol
(Alternative 1: 26 H-bonds [ 85 with 51] Energy: -7.50 kcal/mol)
(Alternative 2: 28 H-bonds [ 82 with 85] Energy: -12.50 kcal/mol)
---------------------------------------
Total Sheet Energy: -43.27 kcal/mol

Lowest beta-sheet energy:

● As in Top7, with the H-bonding network [82-85].
● Has the highest number of H-bonds among all topologies.
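
The three energy terms in the printout add up to the total, and the two H-bond alternatives are consistent with a linear term measured relative to the -B 23 baseline. The sketch below reproduces that arithmetic. The per-bond weight of -2.5 kcal/mol is inferred from the printed values (26 H-bonds gives -7.50, 28 gives -12.50); it is an assumption, not documented bShuffle behaviour, and the function names are hypothetical.

```python
def hbond_energy(n_hbonds, mean_hbonds=23, per_bond=-2.5):
    """H-bond term relative to a baseline count (weight inferred from the printout)."""
    return (n_hbonds - mean_hbonds) * per_bond


def total_sheet_energy(pairing, hydrophobicity, hbonding):
    """Total sheet energy as the plain sum of the three printed terms (kcal/mol)."""
    return pairing + hydrophobicity + hbonding


# Values copied from the lowest-energy topology above.
alt1 = hbond_energy(26)
alt2 = hbond_energy(28)
total = total_sheet_energy(-23.73, -7.04, alt2)
```

Under this reading, the H-bonding term rewards topologies with more H-bonds than average and penalizes those with fewer.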

Top7 Beta-Sheet

2 -> [ 13:K+][ 14:N ][ 15:F@][ 16:D-][ 17:Y@][ 18:T ][ 19:Y@]
1 <- [ 10:D-][ 9:D-][ 8:I ][ 7:N ][ 6:V ][ 5:Q ][ 4:V ]
3 -> [ 45:R+][ 46:V ][ 47:R+][ 48:I ][ 49:S ][ 50:I ][ 51:T ]
5 <- [ 91:Q ][ 90:G ][ 89:E-][ 88:V ][ 87:T ][ 86:V ][ 85:T ]
4 -> [ 76:D-][ 77:I ][ 78:N ][ 79:V ][ 80:T ][ 81:F@][ 82:D-]

Pairing Energy: -21.93 kcal/mol
Hydrophobicity Energy: -7.04 kcal/mol
(Face 1 Hydrophobicity Score: +15.71)
(Face 2 Hydrophobicity Score: -20.36)
H-bonding Energy: -12.50 kcal/mol
(Alternative 1: 26 H-bonds [ 5 with 18] Energy: -7.50 kcal/mol)
(Alternative 2: 28 H-bonds [ 4 with 19] Energy: -12.50 kcal/mol)
---------------------------------------
Total Sheet Energy: -41.47 kcal/mol

Low beta-sheet energy:

● Same face hydrophobicity as the crystal.
● Networked salt bridge (D9-R47-E89), which is electrostatically more stable than an isolated one [1] (not taken into account).

[1] S. Kumar, R. Nussinov, Salt bridge stability in monomeric proteins, J. Mol. Biol. 293 (1999) 1241-55.

Top7 Beta-Sheet

Low beta-sheet energy:

2 -> [ 13:K+][ 14:N ][ 15:F@][ 16:D-][ 17:Y@][ 18:T ][ 19:Y@]
1 <- [ 10:D-][ 9:D-][ 8:I ][ 7:N ][ 6:V ][ 5:Q ][ 4:V ]
4 -> [ 76:D-][ 77:I ][ 78:N ][ 79:V ][ 80:T ][ 81:F@][ 82:D-]
5 <- [ 91:Q ][ 90:G ][ 89:E-][ 88:V ][ 87:T ][ 86:V ][ 85:T ]
3 -> [ 45:R+][ 46:V ][ 47:R+][ 48:I ][ 49:S ][ 50:I ][ 51:T ]

Pairing Energy: -20.89 kcal/mol
Hydrophobicity Energy: -7.04 kcal/mol
(Face 1 Hydrophobicity Score: +15.71)
(Face 2 Hydrophobicity Score: -20.36)
H-bonding Energy: -12.50 kcal/mol
(Alternative 1: 26 H-bonds [ 85 with 82] Energy: -7.50 kcal/mol)
(Alternative 2: 28 H-bonds [ 51 with 85] Energy: -12.50 kcal/mol)
---------------------------------------
Total Sheet Energy: -40.43 kcal/mol

● Same face hydrophobicity as the crystal.
● D9 and D76 are close in space.
● Would force an alpha-helix to lie on the polar face.

Top7 Beta-Sheet

We cannot discriminate between these plausible topologies based on:

- Face hydrophobicity.
- Number of H-bonds.

Also, each topology has an unpaired charged residue at a border strand. This should prevent amyloid fibril formation [1]. Remember that this protein has been crystallized...

[1] J. S. Richardson, D. C. Richardson, Natural beta-sheet proteins use negative design to avoid edge-to-edge aggregation, PNAS 99 (2002) 2754-9.

Beta-Sheet Builder

Given a beta-sheet topology, how can we build 3D structures that satisfy the prescribed topology (including the H-bonding network and the beta-bulges)?

Does only one beta-sheet conformation allow for proper placement of the alpha-helices? If so, which is it?

Can we explore the conformational search space for a beta-sheet?

see http://www-lbit.iro.umontreal.ca/bBuilder/index.html...

Beta-Sheet Builder

Basically it backtracks on pairs of strands, assembling them in 3D.

Beta-Sheet Builder

Preserves the original H-bonds from the crystal structures...

Beta-Sheet Builder

Precision: accuracy test of the Beta-Sheet Builder on the beta-sheet of 1TML. The crystal structure is shown with light grey cylinders and the best-RMSD structure (0.83 Å) with dark grey ones. Strand ribbons are drawn for the crystal structure.

Flexibility: flexibility test of the Beta-Sheet Builder on the beta-sheet of 1TML. The crystal structure is in red and the worst-RMSD rebuilt structure is in blue. Both structures are aligned along strand 40 to 42. The high RMSD (7.88 Å) arises because the rebuilt structure takes a different path as early as the third strand from the top.

Top7 Beta-Sheet

File: 1101334621-9-2.str

35
4 VAL  5 GLN  6 VAL  7 ASN  8 ILE  9 ASP  10 ASP
13 LYS  14 ASN  15 PHE  16 ASP  17 TYR  18 THR  19 TYR
45 ARG  46 VAL  47 ARG  48 ILE  49 SER  50 ILE  51 THR
76 ASP  77 ILE  78 ASN  79 VAL  80 THR  81 PHE  82 ASP
85 THR  86 VAL  87 THR  88 VAL  89 GLU  90 GLY  91 GLN

LINKP 10 13  LINKH 10 13
LINKP 76 91  LINKH 76 91  LINKP 91 45  LINKP 45 9  LINKH 45 9  LINKP 9 14
LINKP 77 90  LINKP 90 46  LINKH 90 46  LINKP 46 8  LINKP 8 15  LINKH 8 15
LINKP 78 89  LINKH 78 89  LINKP 89 47  LINKP 47 7  LINKH 47 7  LINKP 7 16
LINKP 79 88  LINKP 88 48  LINKH 88 48  LINKP 48 6  LINKP 6 17  LINKH 6 17
LINKP 80 87  LINKH 80 87  LINKP 87 49  LINKP 49 5  LINKH 49 5  LINKP 5 18
LINKP 81 86  LINKP 86 50  LINKH 86 50  LINKP 50 4  LINKP 4 19  LINKH 4 19
LINKP 82 85  LINKH 82 85  LINKP 85 51

● LINKP: beta-sheet partners.
● LINKH: H-bond.
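
The LINKP and LINKH records are simple enough to read back with a regular expression. The sketch below is an assumption about the record layout based only on the listing in this slide (a keyword followed by two residue numbers); `parse_links` is a hypothetical helper, not part of the original tools.

```python
import re


def parse_links(text):
    """Collect (residue, residue) pairs for each LINKP / LINKH record found in text."""
    links = {"LINKP": [], "LINKH": []}
    # Each record is a keyword followed by two integers; whitespace between
    # records is optional, so run-together listings parse as well.
    for kind, a, b in re.findall(r"(LINKP|LINKH)\s+(\d+)\s+(\d+)", text):
        links[kind].append((int(a), int(b)))
    return links


# First records of the Top7 sheet description.
sample = "LINKP 10 13 LINKH 10 13 LINKP 76 91 LINKH 76 91 LINKP 91 45"
links = parse_links(sample)
```

Every LINKH pair in the listing also appears as a LINKP pair, so a quick sanity check is that the set of H-bonded pairs is a subset of the partner pairs.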

Top7 Beta-Sheet

psb.exe.AMD64 1101334621-9-2.str cullpdb_pc25_res3.0_R1.0_d040427_chains3083.bspider.pair.dat 0.5 0.5 0.4 9999 1.0

backtrack tree size before arc-consistency: 1.12615e+14 (100%)
backtrack tree size after arc-consistency: 8.04181e+13 ( 71%)

193243 partial structures were rejected for CB(i)-CB(j) steric conflicts.
31422720 partial structures were rejected for C(i)-N(i+1) peptidic deformation 1.32 +/- 0.5 Å.
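
The peptide-link rejection criterion can be sketched as a distance test between the carbonyl carbon of residue i and the amide nitrogen of residue i+1. This is an illustration under the assumption that "1.32 +/- 0.5" is an acceptance window on that distance in Ångströms; the function name and the coordinates are hypothetical, and the slide does not say how psb.exe implements the test.

```python
from math import dist


def peptide_link_ok(c_i, n_next, ideal=1.32, tolerance=0.5):
    """True when the C(i)-N(i+1) distance lies within ideal +/- tolerance (Angstroms)."""
    return abs(dist(c_i, n_next) - ideal) <= tolerance


# Share of the backtrack tree that survives arc-consistency (sizes from the run above).
remaining = 8.04181e13 / 1.12615e14

# Made-up coordinates: one well-formed link and one stretched link.
good = peptide_link_ok((0.0, 0.0, 0.0), (1.32, 0.0, 0.0))
stretched = peptide_link_ok((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```

A generous tolerance at this stage makes sense if a later minimization step is expected to repair the bond lengths between blocks.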

● 579 beta-sheet 3D structures at 1.0 Å from each other.
● 13:10:57 on AMD64 @ 2.2 GHz.

-> closest RMSD to crystal structure is 1.0 Å (backbone-only).

Top7 Beta-Sheet

Strand 4-10 -> cyan
Strand 13-19 -> blue
Strand 45-51 -> green
Strand 76-82 -> red
Strand 85-91 -> yellow

579 structures aligned on the strand 13-19...

Add Turns

A program to add beta-turns on beta-sheets...

● database of: < TFO(1,2), 3D structure > (turn fragments from PDB)

● distance between 2 TFOs

Top7 Beta-Turns

STRAND 4 10
● TURN 10 13
STRAND 13 19
STRAND 76 82
● TURN 82 85
STRAND 85 91

foreach i ( 1101334621-9-2-??????.pdb )
  ./addFrag.exe -T -Z 1.5 baker.all.bab $i loops.25.dat
end

● Matrix distance ("closure") is 1.5 Å.
● This leaves 463 of the 579 beta-sheets (80%).

Top7 Beta-Turns

(figure; annotated distance: ~20 Å)

Beta-Alpha-Beta Builder

77% of alpha-helices are "tied" to beta-strands.
38% of alpha-helices are "tied" at both N and C-terms.

Beta-Alpha-Beta Builder

Backtrack level 1:
  variables:
    v1: N-term loop.
    v2: helix.
    v3: C-term loop.
  domains:
    3D fragments from PDB.
  -> generates 3D babs for each bab unit.

Backtrack level 2:
  variables:
    v(i): the ith b-a-b unit.
  domains:
    previously assembled babs.
  -> generates 3D structures with co-existing babs.

Beta-Alpha-Beta Builder

The loops encode the stereo-chemistry of the inter-strand crossovers...

Beta-Alpha-Beta Builder

WHY?

Beta-Alpha-Beta of Top7

On the 463 3D structures of beta-sheets with added beta-turns:

● We obtain 9559 3D structures with the 2 babs.

● Not all beta-sheets are suitable for the helices: 115/463 (25%) did not yield 3D structures for babs!

● Stats on the number of structures with 3D babs: Min: 1, Max: 176, Mean: 27.5, StdDev: 33.0

● About 4 days of computation on AMD64 @ 2.2 GHz.

foreach i ( 1101334621-9-2-??????-??????.pdb )
  ./addFrag.exe -L -Z 1.5 baker.all.bab $i loops.25.dat
end

Hydrophobic Fitness Score

[1] E. S. Huang, S. Subbiah, M. Levitt, Recognizing native folds by the arrangement of hydrophobic and polar residues, J. Mol. Biol. 252 (1995) 709-720.
[2] E. S. Huang, S. Subbiah, J. Tsai, M. Levitt, Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations, J. Mol. Biol. 257 (1996) 716-725.

Hydrophobic Fitness Score

2 problems with the original formulation:

1) HP partition of amino-acids.
Solution: use a hydrophobic scaling of the amino-acids.

2) Hard distance cut-offs.
Solution: use a distance switching function.

Hydrophobic Fitness Score

Hydrophobic scaling of the amino-acids:

[1] R. Cowan, R. G. Whittaker, Hydrophobicity indices for amino acid residues as determined by high-performance liquid chromatography, Pept. Res. 3 (1990) 75-80.

Hydrophobic Fitness Score

Distance switching function:

Plot[ (1 - Tanh[ x - 10 ])/2, {x, 0, 20} ]
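
For readers without Mathematica, the same switching function can be written in a few lines. This is a direct transcription of the plotted expression; the function name and the `midpoint` parameter are assumptions added for readability, and the slide does not state the distance units.

```python
from math import tanh


def switch(distance, midpoint=10.0):
    """Smooth replacement for a hard cut-off: about 1 well below midpoint, about 0 well above."""
    return (1.0 - tanh(distance - midpoint)) / 2.0


# Sample the function over the range plotted on the slide.
samples = [switch(x) for x in range(0, 21)]
```

The function equals exactly 0.5 at the midpoint and decreases monotonically, so contacts near the cut-off contribute partially instead of switching on or off abruptly.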

Hydrophobic Fitness Score

And finally:

Helix Hydrophobic Moment

The Beta-Alpha-Beta Builder is a geometrical process: it makes sure that the loop-helix-loop fragment can be attached to the beta-sheet, but it does not take into account the hydrophobic moment of the helix...

Simple procedure to optimize the hydrophobic moment of helices within a 3D structure:

Consider each helix one at a time.
1) Rotate it through 360 degrees and remember the best rotation (we optimize the helix within the context of the others as we currently find them in the 3D structure).
2) Rotate that helix to its best rotation.
Repeat while a helix has rotated to a different angle.

Best rotation: minimum of the HF score!
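
The procedure amounts to a coordinate-wise search over helix rotation angles. The sketch below is a simplified illustration: the `score` callable stands in for the HF score of the whole structure, the 10-degree step is an arbitrary choice, and a helix is only moved when the score strictly improves (a small deviation from the slide's wording, made so that termination is guaranteed). None of these names come from the original optHelix program.

```python
def optimize_helix_rotations(n_helices, score, step=10):
    """Coordinate-wise search over helix rotation angles in degrees; lower score is better."""
    angles = [0] * n_helices
    changed = True
    while changed:
        changed = False
        for h in range(n_helices):
            # Scan a full turn for helix h while the other helices stay where they are.
            def with_angle(a, h=h):
                return angles[:h] + [a] + angles[h + 1:]

            best = min(range(0, 360, step), key=lambda a: score(with_angle(a)))
            if score(with_angle(best)) < score(angles):
                angles[h] = best
                changed = True
    return angles


# Toy stand-in for the HF score: each helix has a preferred angle (made-up values).
toy_score = lambda a: (a[0] - 90) ** 2 + (a[1] - 200) ** 2
best_angles = optimize_helix_rotations(2, toy_score)
```

Because each helix is optimized with the others held fixed, the outer loop has to repeat until no helix moves; with a coupled score the result is a local optimum, not necessarily the global one.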

optHelix

HFS: -2.95    HFS: -5.80

● The HFS is better (lower).
● The rotation has disconnected the helices from their loops.

optHelix

(figure; labels: +154, -10, -7, -4, ...)

Add Loops

● The helix rotation optimization procedure can disconnect the helix from its loop because of the rotation.

● We need a procedure to rebuild these loops: addTurn -L (instead of adding turns we'll add loops).

● We will discard any 3D structure whose loops cannot connect the helices properly (say within 1.5 Å).

HFS vs RMSD

1QYS HFS Score: -7.37

Best Solution

RMSD: 1.34    HFS: -7.20

Conclusions

● It is possible to build atomic-precision 3D structures from PDB fragments.
● It is possible to build novel folds (goal of Top7) from PDB fragments.
● Independence of sequence, except:
  - beta-sheet topology determination (start).
  - helix hydrophobic moment optimization (end).
● Not all beta-sheet 3D structures can accommodate BABs.
● The Hydrophobic Fitness Score is a well-behaved function to pick the native-like structures (in our example).