27

AB INITIO METHODS

Dylan Chivian, Timothy Robertson, Richard Bonneau, and David Baker

Ab initio structure prediction seeks to predict the native conformation of a protein from the amino acid sequence alone. Such attempts are both a fundamental test of our understanding of protein folding, and an important practical challenge in this era of large-scale genome sequencing projects, which are producing large numbers of protein sequences for which no three-dimensional structural information is available.

Anfinsen showed forty years ago that all of the information necessary for a protein to fold to the native state resides in the protein’s amino acid sequence (Anfinsen et al., 1961; Anfinsen, 1973). In the absence of large kinetic barriers in the free energy landscape, Anfinsen’s results and those of large numbers of researchers in the intervening years suggest that the native conformations of most proteins are the lowest free energy conformations for their sequences (for a description of some notable exceptions, see Baker and Agard, 1994).

Successful structure prediction requires a free energy function sufficiently close to the true potential for the native state to be at one of the lowest free energy minima, as well as a method for searching conformational space for low energy minima. Ab initio structure prediction is challenging because current potential functions have limited accuracy, and the conformational space to be searched is vast. Many methods use reduced representations, simplified potentials, and coarse search strategies in recognition of this resolution limit (Simons et al., 1997; Samudrala et al., 1999; Ortiz et al., 1999; Pillardy et al., 2001). Encouragingly, these simplified methods are starting to show some success in protein structure prediction (Murzin, 2001; Lesk, Lo Conte, and Hubbard, 2001) and have advanced to the point where genome scale modeling may become useful.

Structural Bioinformatics. Edited by Philip E. Bourne and Helge Weissig. Copyright 2003 by Wiley-Liss, Inc.

REPRESENTATIONS OF THE POLYPEPTIDE CHAIN

The most detailed representations include all atoms of the protein and the surrounding solvent molecules. However, representing this large number of atoms and the interactions between them is quite computationally expensive, and it is not clear that this level of detail is necessary during the phase of the search far from the native conformation.

To streamline the calculations, representations can be simplified in a variety of ways. Explicit solvent molecules are usually replaced by implicit solvent models. United atom representations are frequently used, in which hydrogens are merged into the carbon, oxygen, and nitrogen atoms to which they are bonded. Side chains can be represented using a limited set of conformations (Dunbrack and Karplus, 1994) that are found to be prevalent in structures from the Protein Data Bank (PDB; see Chapter 9), without any great loss in predictive ability. Alternatively, side-chain atoms can be replaced entirely by locating the side-chain properties at either the centroid of the side chain or at the beta carbon (Simons et al., 1997), which amounts to averaging over the side-chain degrees of freedom and permits a significant performance enhancement at the loss of some degree of specificity.
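The centroid reduction described above can be sketched in a few lines. This is only an illustration: the atom-name convention (PDB-style names, with N, CA, C, and O treated as backbone) and the fallback to the alpha carbon for glycine are assumptions of the sketch, not details taken from the cited methods.

```python
from typing import Dict, Tuple

Coord = Tuple[float, float, float]

# Assumption: PDB-style atom names; these four are treated as backbone.
BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def side_chain_centroid(residue_atoms: Dict[str, Coord]) -> Coord:
    """Return the mean position of the side-chain heavy atoms of one residue.

    Falls back to the alpha carbon when the residue has no side-chain
    heavy atoms (glycine), which is a choice made for this sketch.
    """
    side = [xyz for name, xyz in residue_atoms.items()
            if name not in BACKBONE_ATOMS]
    if not side:
        return residue_atoms["CA"]
    n = len(side)
    return (sum(p[0] for p in side) / n,
            sum(p[1] for p in side) / n,
            sum(p[2] for p in side) / n)
```

Replacing every side chain by one such point removes the side-chain torsional degrees of freedom from the search, at the cost of the specificity noted above.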

The size of the conformational space to be searched can be further reduced by restricting the conformations available to the polypeptide backbone. Certain torsion angle pairs are preferred by amino acids in particular local structures (Marqusee, Robbins, and Baldwin, 1989; Blanco, Rivas, and Serrano, 1994; Callihan and Logan, 1999). One may restrict the torsion angles to discrete values commonly seen in known structures, either by using a small set of phi–psi pairs (Park and Levitt, 1995), by selecting pairs from an ideal set based on predicted regular secondary structure, or by the use of fragments from known protein structures (Sippl, Hendlich, and Lackner, 1992; Bowie and Eisenberg, 1994; Jones, 1997; Simons et al., 1997).
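As a minimal sketch of a discrete-state backbone, the code below maps an arbitrary phi–psi pair to the nearest member of a small state set. The three states and their angle values are illustrative assumptions chosen for the example; they are not the state set of Park and Levitt (1995).

```python
from typing import Dict, Tuple

# Hypothetical state set (degrees); values are illustrative only.
STATES: Dict[str, Tuple[float, float]] = {
    "helix": (-60.0, -45.0),
    "strand": (-120.0, 130.0),
    "left": (60.0, 45.0),
}


def angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b in degrees, in (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def nearest_state(phi: float, psi: float,
                  states: Dict[str, Tuple[float, float]] = STATES) -> str:
    """Name of the discrete state closest to (phi, psi) on the torsion torus."""
    def squared_distance(name: str) -> float:
        s_phi, s_psi = states[name]
        return angle_diff(phi, s_phi) ** 2 + angle_diff(psi, s_psi) ** 2
    return min(states, key=squared_distance)
```

With k states per residue, a chain of n residues has k^n backbone conformations, so the choice of state set directly trades accuracy of representation against the size of the search.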

A method developed by our group that builds structures from protein fragments, called Rosetta (examples of Rosetta predictions in Critical Assessment of Structure Prediction 4 (CASP4) are shown in Figure 27.1), is based on a model of folding in which short segments of the protein chain flicker between different local structures, consistent with their local sequence, and folding to the native state occurs when these local segments are oriented such that low free energy interactions are made throughout the protein (Simons et al., 1997). In simulating this process, it is assumed that the ensemble of local structures sampled by a given sequence segment during folding is roughly approximated by the distribution of local structures sampled by that sequence segment in native protein structures. A list of possible conformations is extracted from experimental structures for each nine-residue segment of the chain, and protein tertiary structures are assembled by searching through the combinations of these short fragments for conformations that have buried hydrophobic residues, paired beta strands, and other low free energy features of native proteins. This strategy resolves some of the typical problems with both the conformational search and the free energy function: the search is greatly accelerated because switching between different possible local structures can occur in a single Monte Carlo step, and fewer demands are placed on the free energy function because local interactions are accounted for in the fragment libraries.
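The fragment-insertion move can be illustrated with a short sketch. It is not the Rosetta implementation: the chain is reduced to a list of phi–psi pairs, the fragment library is a plain dictionary keyed by window start position, and the scoring function is supplied by the caller. All names here are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Torsions = List[Tuple[float, float]]  # one (phi, psi) pair per residue


def insert_fragment(chain: Torsions, start: int, fragment: Torsions) -> Torsions:
    """Return a copy of chain with the fragment's torsions written at start."""
    if start < 0 or start + len(fragment) > len(chain):
        raise ValueError("fragment does not fit inside the chain")
    new = list(chain)
    new[start:start + len(fragment)] = fragment
    return new


def fragment_monte_carlo(chain: Torsions,
                         library: Dict[int, List[Torsions]],
                         score: Callable[[Torsions], float],
                         steps: int,
                         temperature: float,
                         rng: random.Random) -> Tuple[Torsions, float]:
    """Metropolis search in which each move swaps in one library fragment."""
    current, current_score = list(chain), score(chain)
    best, best_score = current, current_score
    positions = sorted(library)
    for _ in range(steps):
        start = rng.choice(positions)
        fragment = rng.choice(library[start])
        trial = insert_fragment(current, start, fragment)
        trial_score = score(trial)
        delta = trial_score - current_score
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score
```

Each accepted move changes the local structure of a whole window at once, which is the source of the acceleration described above; the score function then only needs to discriminate among non-local arrangements.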

In the most simplified models, entire segments of contiguous regular secondary structure are represented as rigid bodies, allowing only freedom at the junctions (Eyrich, Standley, and Friesner, 1999). Such methods perform searches of probable arrangements of the elements, thus significantly decreasing the conformational search. However, such representations lack enough detail to allow for more subtle features such as strand twist and do not accommodate packing issues well.

Figure 27.1. Examples of ROSETTA structure predictions from CASP4 (see Chapter 24). Native/prediction pairs are shown left-to-right, except for 1J8B and 1IJX, which are displayed as a superposition of native and predicted structures. Values indicate Calpha root-mean-square (rms) deviations between native and predicted structures, in angstroms. Colors represent position along the chain from blue (N terminus) to red (C terminus). Figure also appears in Color Figure section.

Panels: Secreted frizzled protein 3 (1IJX); PPase (1I74), domain 2; MutS (1EWQ), domain 1; Ribosome Binding Factor A (1KKG); Hypothetical Protein HI0442 (1J8B); ERp29 C-terminal domain (1G7D). Values printed in the figure: 13.8, 11.1, 10.1, 11.0, 6.9, 7.2.
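For reference, the Calpha rms deviation quoted in the figure is the square root of the mean squared distance between matched Calpha atoms. The sketch below assumes the two coordinate sets have already been optimally superimposed; the superposition step itself is not shown.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def calpha_rmsd(native: Sequence[Coord], model: Sequence[Coord]) -> float:
    """Root-mean-square deviation over matched Calpha positions.

    Assumes the structures are already superimposed and that position k
    in one list corresponds to position k in the other.
    """
    if len(native) != len(model) or not native:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(native, model))
    return math.sqrt(squared / len(native))
```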

An alternative model with a long history is that of the lattice representation, in which residues are restricted to points on a regular three-dimensional lattice, with residues proximal in sequence occupying adjacent lattice points (Skolnick and Kolinski, 1991; Hinds and Levitt, 1994; Dill et al., 1995; Ishikawa, Yue, and Dill, 1999). Such methods allow for very fast sampling of conformational space, but are limited in their ability to represent some of the finer details of backbone conformations (Reva et al., 1996).
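A simple cubic lattice makes the constraints of this representation concrete. The sketch below is an assumption-laden toy, not any of the cited models: it checks that a chain is self-avoiding with sequence neighbours on adjacent lattice points, and counts non-bonded lattice contacts, the quantity that simple lattice energy functions are typically built on.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int, int]


def is_valid_lattice_chain(points: List[Point]) -> bool:
    """True if no site is used twice and consecutive residues are one step apart."""
    if len(set(points)) != len(points):
        return False
    return all(sum(abs(a - b) for a, b in zip(p, q)) == 1
               for p, q in zip(points, points[1:]))


def count_contacts(points: List[Point]) -> int:
    """Count residue pairs that are lattice neighbours but not sequence neighbours."""
    index: Dict[Point, int] = {p: i for i, p in enumerate(points)}
    contacts = 0
    for i, (x, y, z) in enumerate(points):
        # Looking only in the positive directions counts each pair once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None and abs(i - j) > 1:
                contacts += 1
    return contacts
```

Because positions are integers and neighbours are found by dictionary lookup, evaluating a conformation is very cheap, which is what permits the fast sampling mentioned above.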

POTENTIAL FUNCTIONS

There are two categories of potentials that may be employed in evaluating the free energy of the peptide chain and the surrounding solvent. Molecular mechanics potentials seek to model the forces that determine protein conformation using physically based functional forms parameterized from small molecule data or in vacuo quantum mechanical (QM) calculations. For example, van der Waals interactions are usually represented using a standard 6–12 potential with parameters derived from simple liquids, whereas electrostatic interactions are modeled using Coulomb’s law with partial charges derived from QM calculations on peptide substructures or from chemical intuition. In contrast, protein structure-derived potentials or scoring functions are empirically derived from experimental structures from the PDB (Sippl, 1995; Koppensteiner and Sippl, 1998). Usually a functional form is not specified and instead pseudoenergies are obtained by taking the logarithm of probability distribution functions. Such structure-derived potentials are particularly useful in conjunction with reduced complexity models, where they may be viewed as representing the interactions between, for example, side-chain centroids after averaging over all plausible positions of the atoms not represented (Kocher, Rooman, and Wodak, 1994). Such potentials are also useful in treating aspects of protein thermodynamics, particularly the hydrophobic effect, that are not completely understood.
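The two kinds of term can be written down compactly. The sketch below is illustrative: the 6–12 term uses the common epsilon/sigma parameterization, the Coulomb constant is the usual conversion to kcal/mol for charges in units of e and distances in angstroms, and the pseudoenergy is taken relative to a reference probability, whose choice is an assumption of the sketch because the text above does not specify a reference state.

```python
import math

# Conversion factor for q in units of e and r in angstroms (kcal/mol).
COULOMB_CONSTANT = 332.0637


def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """6-12 van der Waals energy: 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r: float, q1: float, q2: float, dielectric: float = 1.0) -> float:
    """Coulomb energy between two partial charges separated by r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def pseudo_energy(observed_count: int, total_observed: int,
                  reference_probability: float, kT: float = 1.0) -> float:
    """Structure-derived pseudoenergy: -kT * ln(P_observed / P_reference).

    reference_probability is a hypothetical input for this sketch; how the
    reference state is defined differs between published potentials.
    """
    p_observed = observed_count / total_observed
    return -kT * math.log(p_observed / reference_probability)
```

A feature seen more often in native structures than the reference predicts receives a negative (favourable) pseudoenergy, and one seen less often receives a positive one.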

Both classes of potentials must represent the forces that determine macromolecular conformation: solvation, electrostatic interactions including hydrogen bonds and ion pairs, van der Waals interactions, and, in certain cases, covalent bonds (Park, Huang, and Levitt, 1997). Additionally, they must be applicable at a granularity that is in keeping with that of the representation selected and the target resolution of the method.

SEARCH METHODS

In searching, as in selecting the appropriate level of detail in the representation and in the potential, one must choose the granularity of the search based on the resolution desired from the method. Molecular dynamics directly integrates Newton’s equations of motion to derive the motion of a molecule in a given potential. However, the very small step size required for numerical stability makes molecular dynamics with full atom representation of protein and solvent impractical for de novo generation of low-resolution models.
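The step-size constraint can be seen even in one dimension. The sketch below integrates Newton's equation for a single particle with the velocity Verlet scheme, a standard integrator chosen here for illustration; the harmonic force used in the usage note is a toy stand-in for a molecular potential.

```python
from typing import Callable, List, Tuple


def velocity_verlet(x: float, v: float,
                    force: Callable[[float], float],
                    mass: float, dt: float,
                    steps: int) -> List[Tuple[float, float]]:
    """Integrate m * d2x/dt2 = force(x); return (position, velocity) per step."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity
        a = a_new
        trajectory.append((x, v))
    return trajectory
```

For a unit-mass harmonic oscillator with `force = lambda x: -x`, a step of 0.01 conserves the total energy closely, whereas a step of 2.5 exceeds the stability limit of the scheme for this system and the trajectory diverges. In a protein the fastest motions are bond vibrations, so the stable step is tiny compared with the time scale of folding.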

To accelerate conformational searching, one must employ techniques that permit coarse sampling of the energy landscape. A variety of methods may be used in conjunction with reduced complexity models and simplified potentials to perform broad searches through low-resolution structures, including Metropolis Monte Carlo simulated annealing (Simons et al., 1997), simulated tempering (Hansmann and Okamoto, 1997), evolutionary algorithms (Bowie and Eisenberg, 1994), and genetic algorithms (Pedersen and Moult, 1997). Individual moves in these procedures can involve quite large perturbations, and allow much more rapid (and coarser) sampling of conformational space in a relatively short time. For example, simple torsion space Monte Carlo procedures involve changing the backbone torsion angles of one or a small number of residues by several degrees, which can produce quite large changes in the Cartesian coordinates of the protein. Fragment insertion-based procedures (see above) can speed sampling by allowing jumps between different local structures in a single step.
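A minimal sketch of Metropolis Monte Carlo simulated annealing in torsion space follows. The geometric cooling schedule, the toy cosine energy, and the 10-degree move size are assumptions made for the example; none is taken from the cited methods.

```python
import math
import random
from typing import Callable, List, Tuple

State = List[float]  # backbone torsion angles in radians


def simulated_annealing(energy: Callable[[State], float],
                        state: State,
                        propose: Callable[[State, random.Random], State],
                        t_start: float, t_end: float, steps: int,
                        rng: random.Random) -> Tuple[State, float]:
    """Metropolis search while the temperature falls from t_start to t_end."""
    current, e_current = list(state), energy(state)
    best, e_best = current, e_current
    for step in range(steps):
        # Geometric cooling schedule (an assumption of this sketch).
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        trial = propose(current, rng)
        e_trial = energy(trial)
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


TARGET = [math.radians(-60.0)] * 5  # hypothetical low-energy torsions


def toy_energy(angles: State) -> float:
    """Toy landscape: zero when every angle equals its target value."""
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, TARGET))


def perturb_one_torsion(angles: State, rng: random.Random) -> State:
    """Change one randomly chosen torsion by a few degrees."""
    new = list(angles)
    k = rng.randrange(len(new))
    new[k] += math.radians(rng.gauss(0.0, 10.0))
    return new
```

A call such as `simulated_annealing(toy_energy, [math.radians(90.0)] * 5, perturb_one_torsion, 2.0, 0.01, 5000, random.Random(0))` runs one search. High early temperatures let the chain cross barriers; the low final temperature settles it into a nearby minimum.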

A single search is unlikely to find the global minimum of the free energy landscape, and may instead yield a structure that has become trapped in a local minimum. In an effort to correct for this possibility, many current methods perform numerous conformational searches, generating an ensemble of candidate structures. Numerous techniques have been used to select those structures most likely to be close to the native from the ensemble (Park and Levitt, 1996; Huang et al., 1996; Samudrala and Moult, 1998), and future insights into features of native protein structures and properties of near-native ensembles will undoubtedly add to the arsenal of methods of selecting the most nativelike structures. Ultimately, improvements in potential functions may make identification of the most accurate models a straightforward procedure of selecting those conformations possessing the lowest free energy (Vorobjev, Almagro, and Hermans, 1998; Lazaridis and Karplus, 1999; Rapp and Friesner, 1999; Petrey and Honig, 2000; Lee et al., 2001). It is possible that improved energy functions for discrimination will ultimately involve a fusion of molecular mechanics-based and protein database-derived potentials.

APPLICATIONS

Genome functional annotation and structural genomics initiatives are two areas of research where ab initio protein structure prediction could make important contributions.

Genome Annotation

While traditionally genome annotation has been accomplished using sequence-similarity search tools, many factors reduce the ability of sequence homology to identify distant homologs (Russell and Ponting, 1998). Domain insertions, circular permutations, exchange of secondary structure elements, and genetic drift all contribute to the divergence of functionally related proteins over time. Thus, the annotation of open reading frames lacking detectable sequence homology to proteins of known function represents a promising application for ab initio models. Low-resolution ab initio predicted structures may be able to reveal structural and functional relationships between proteins not apparent from sequence similarity alone. This concept is well illustrated by some examples of predictions from CASP4. In the first examples (Figs. 27.2a and 27.2b), the predicted structures were each found to be structurally related to a protein with a similar function, but no significant sequence similarity. In the second example (Fig. 27.3), functionally important residues were found clustered in the predicted structures. In both cases, some of the most important insights into these proteins’ function could have been obtained from the predicted structures alone.

Structural similarities like these may be detected using several different methods. First, predicted structures may be compared against the PDB, using a general structure–structure comparison tool (Chapter 16). Recent experiments have found significant matches of ab initio predictions to structural homologs of the native structures for a variety of sequences, suggesting that current techniques may be sufficient to detect evolutionarily distant functional homologies in this manner (Simons, Strauss, and Baker, 2001; Bonneau et al., 2002; see also Chapter 20).

Second, ab initio structures could be probed for the presence of residues adopting conserved geometric motifs (e.g., serine protease catalytic triads). While this approach has been applied to ab initio models with some success (Fetrow and Skolnick, 1998a; Fetrow et al., 1998b), it remains unclear how to best apply the technique to low-resolution structures. In particular, some question remains as to how ambiguous structural motifs must be in order to detect homologies in low-resolution models.

Figure 27.2. Potential of ab initio predictions to detect distant protein homologies. (a) The native structure of bacterial-lysis protein Bacteriocin AS-48 (left, PDB id 1E68) is compared to the best ROSETTA prediction for the structure (center), and the native structure of NK-Lysin (right, PDB id 1NKL), a functionally similar protein. (b) The native structure of domain 2 of the DNA mismatch repair protein MutS (left, PDB id 1EWQ), is compared to the best ROSETTA prediction for the domain (center), and a domain from the native structure of the Tn5 transposase inhibitor (right, PDB id 1B7E). In both (a) and (b) the ab initio models of the proteins were of sufficient quality to detect these functional homologs by the similarity of the folds in the absence of significant sequence similarity. Figure also appears in Color Figure section.

Third, predicted structures could be used to improve the sensitivity and reliability of matches to sequence-based motif libraries, such as the PROSITE database (Bucher and Bairoch, 1994). Previous work has shown that weak matches to functional motif patterns may be filtered effectively by requiring similarity between the structures of pattern matches and the known structural environments of particular motifs (Jonassen et al., 2000). Therefore, it seems possible that ab initio models could provide this structural information when high-resolution structures are unavailable.
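The sequence half of such a filter is a pattern scan. As a rough sketch, a PROSITE-style pattern can be translated into a regular expression and matched against a sequence; the structural check on each hit would then be applied to the predicted model. The translation below covers only the basic syntax elements (`x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` anchors), and the pattern in the usage note is an illustrative zinc-finger-like pattern, not a claim about any particular database entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):          # anchored at the N terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C terminus
            suffix, element = "$", element[:-1]
        if element.endswith(")"):            # repeat count: x(3) or x(2,4)
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":                   # any residue
            core = "."
        elif element.startswith("{"):        # any residue except those listed
            core = "[^" + element[1:-1] + "]"
        else:                                # one residue or an [ABC] class
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)
```

For example, `prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")` yields a pattern that `re.search` can apply to a one-letter amino acid sequence, and `match.start()` and `match.end()` give the residues whose predicted structural environment would be examined.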

Structural Genomics Initiatives

Structural genomics initiatives present a second opportunity for the application of ab initio methods in several ways. First, ab initio structure prediction can help guide target selection by focusing experimental structure determination on those proteins likely to adopt novel folds or to be of particular biological importance.

Figure 27.3. An example of active-site conservation in ab initio models. The ROSETTA predicted structure of domain 1 from an inorganic pyrophosphatase from Streptococcus mutans is compared to the corresponding domain in the native structure (PDB id 1I74). Strongly conserved active site residues are rendered as spheres along the backbone. Note the similar relative orientation of these residues in the native and predicted structures, implying that ab initio models may be sufficient to detect functional homologies using methods that search for functionally significant residue arrangements. Figure also appears in Color Figure section.

Second, although homology modeling methods have been applied on a genomic scale (Sanchez and Sali, 1998; Sanchez and Sali, 1999), these approaches are inherently limited by their need for at least one homolog of known structure with good coverage and sufficient sequence similarity to be structurally equivalent (Marti-Renom et al., 2000; see also Chapter 25). Homologs of this quality are not always available, and therefore homology methods tend to leave significant fractions of both sequences and genomes improperly modeled. Ab initio techniques do not face this limitation, and thus may be a valuable adjunct to homology methods, filling in structural gaps and producing much more complete sets of models than could be obtained by either technique alone.

Third, even small amounts of experimental data can dramatically improve the quality and reliability of ab initio structure prediction with the application of spatial constraints. For example, the Rosetta method can produce moderate- to high-resolution structures when combined with limited NMR constraints (Standley et al., 1999; Bowers, Strauss, and Baker, 2000; Rohl and Baker, 2002). In addition, other sources of experimental data such as chemical cross-linking experiments could be used, allowing rapid structure determination for proteins not readily amenable to X-ray or NMR analysis (e.g., membrane-bound proteins). Ab initio structure prediction may therefore be useful for increasing the speed of structure determination, which is particularly important for structural genomics.
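One common way to apply such spatial constraints is to add a penalty term to the scoring function. The sketch below uses a flat-bottomed quadratic penalty on upper-bound distances; this functional form, the weight, and the restraint format are assumptions for illustration and are not taken from the cited studies.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Restraint = Tuple[int, int, float]  # (residue i, residue j, upper bound)


def restraint_penalty(coords: Sequence[Coord],
                      restraints: List[Restraint],
                      weight: float = 1.0) -> float:
    """Sum of weight * (d - upper)^2 over restraints whose bound is violated."""
    total = 0.0
    for i, j, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d > upper:  # no penalty while the pair is within the bound
            total += weight * (d - upper) ** 2
    return total
```

Added to the score of a trial conformation, a term of this kind steers the search toward the small region of conformational space compatible with the data, which is why a handful of restraints can have a large effect.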

FUTURE WORK

What are the prospects for improvement in ab initio protein structure prediction methods? Improvement in potential functions should permit the generation of more precise and accurate structures. All atom potentials in particular seem promising for the refinement of low-resolution models. Additionally, more detailed structures may require better fine search strategies. Even for coarse models, the sampling rate of protein conformational space has been a limitation, as demonstrated by the tendency of ab initio models to adopt low contact order conformations (Plaxco, Simons, and Baker, 1998). Correcting for this contact order bias through focused sampling of higher contact order conformations will require significantly more computational resources, but is likely to improve the prediction of larger, more complicated proteins. Ideally, the development of search strategies that do not face this local-contact bias would provide a boost to ab initio methods.
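Relative contact order, as defined by Plaxco, Simons, and Baker (1998), is the average sequence separation of contacting residues divided by the chain length. The sketch below takes the contact list as given; how contacts are defined (for example, a distance cutoff between atoms) is left to the caller and is not specified here.

```python
from typing import Iterable, Tuple


def relative_contact_order(contacts: Iterable[Tuple[int, int]],
                           length: int) -> float:
    """Mean sequence separation |i - j| of the contacts, divided by chain length."""
    pairs = list(contacts)
    if not pairs:
        return 0.0
    return sum(abs(i - j) for i, j in pairs) / (len(pairs) * length)
```

A model dominated by helices and local hairpins scores low on this measure, while folds with many contacts between residues distant in sequence score high, so comparing the value for a set of models against values typical of native proteins of similar size is one way to expose the bias described above.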

Ab initio protein structure prediction has traditionally been an area of primarily academic interest, attaining only slow progress. Recently, however, there have been significant advancements in the field. There is hope that ab initio methods will continue to improve, and that this improvement will provide both fundamental insights into the physics underlying protein folding and a valuable, practical resource for genome analysis.

FURTHER READING

CASP3 (1999): Results from the Critical Assessment of Techniques for Protein Structure Prediction. Proteins 37(S3):149–208.

CASP4 (Forthcoming): Results from the Critical Assessment of Techniques for Protein Structure Prediction. Proteins 45(S5):98–162.

Chothia C (1984): Principles that determine the structure of proteins. Ann Rev Biochem 53:537–72.

Kabsch W, Sander C (1984): On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations. Proc Natl Acad Sci USA 81:1075–8.

Lazaridis T, Karplus M (2000): Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10:139–45.

Simons KT, Strauss C, Baker D (2001): Prospects for ab initio protein structural genomics. J Mol Biol 306:1191–9.

Sippl MJ (1995): Knowledge-based potentials for proteins. Curr Opin Struct Biol 5:229–35.

Wallace AC, Borkakoti N, Thornton JM (1997): TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein Sci 6:2308–23.

REFERENCES

Anfinsen CB (1973): Principles that govern the folding of protein chains. Science 181:223–30.

Anfinsen CB, Haber E, Sela M, White FH Jr (1961): The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc Natl Acad Sci USA 47:1309–14.

Baker D, Agard DA (1994): Kinetics versus thermodynamics in protein folding. Biochemistry 33:7505–9.

Blanco FJ, Rivas G, Serrano L (1994): A short linear peptide that folds into a native stable beta-hairpin in aqueous solution. Nat Struct Biol 1:584–90.

Bonneau R, Strauss CE, Rohl CA, Chivian D, Bradley P, Malmstrom L, Robertson T, Baker D (2002): De novo prediction of three-dimensional structures for major protein families. J Mol Biol 322:65–78.

Bowers PM, Strauss CE, Baker D (2000): De novo protein structure determination using sparse NMR data. J Biomol NMR 18:311–8.

Bowie JU, Eisenberg D (1994): An evolutionary approach to folding small alpha-helical proteins that uses sequence information and an empirical guiding fitness function. Proc Natl Acad Sci USA 91:4436–40.

Bucher P, Bairoch A (1994): A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. Proc Int Conf Intell Syst Mol Biol 2:53–61.

Callihan DE, Logan TM (1999): Conformations of peptide fragments from the FK506 binding protein: comparison with the native and urea-unfolded states. J Mol Biol 285:2161–75.

Dann CE, Hsieh JC, Rattner A, Sharma D, Nathans J, Leahy DJ (2001): Insights into Wnt binding and signaling from the structures of two frizzled cysteine-rich domains. Nature 412:86–90.

Davies DR, Braem LM, Reznikoff WS, Rayment I (1999): The three-dimensional structure of a Tn5 transposase-related protein determined to 2.9-A resolution. J Biol Chem 274:11904–13.

Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS (1995): Principles of protein folding--a perspective from simple exact models. Protein Sci 4:561–602.

Dunbrack RL Jr, Karplus M (1994): Conformational analysis of the backbone-dependent rotamer preferences of protein sidechains. Nat Struct Biol 1:334–40.

Eyrich VA, Standley DM, Friesner RA (1999): Prediction of protein tertiary structure to low resolution: performance for a large and structurally diverse test set. J Mol Biol 288:725–42.

Fetrow JS, Skolnick J (1998a): Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T1 ribonucleases. J Mol Biol 281:949–68.

Fetrow JS, Godzik A, Skolnick J (1998b): Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. J Mol Biol 282:703–11.

Gonzalez C, Langdon G, Bruix M, Galvez A, Valdivia E, Maqueda M, Rico M (2000): Bacteriocin AS-48, a microbial cyclic polypeptide structurally and functionally related to mammalian NK-lysin. Proc Natl Acad Sci USA 97:11221–6.

Hansmann UH, Okamoto Y (1997): Numerical comparisons of three recently proposed algorithms in the protein folding problem. J Comput Chem 18:920–33.

Hinds DA, Levitt M (1994): Exploring conformational space with a simple lattice model for protein structure. J Mol Biol 243:668–82.

Huang ES, Subbiah S, Tsai J, Levitt M (1996): Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations. J Mol Biol 257:716–25.

Huang YJ, Swapna GV, Shukla K, Ke H, Xia B, Inouye M, Montelione GT (Forthcoming).

Ishikawa K, Yue K, Dill KA (1999): Predicting the structures of 18 peptides using Geocore. Protein Sci 8:716–21.

Jonassen I, Eidhammer I, Grindhaug SH, Taylor WR (2000): Searching the protein structure databank with weak sequence patterns and structural constraints. J Mol Biol 304:599–619.

Jones DT (1997): Successful ab initio prediction of the tertiary structure of NK-lysin using multiple sequences and recognized supersecondary structural motifs. Proteins 29(S1):185–91.

Kocher JP, Rooman MJ, Wodak SJ (1994): Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches. J Mol Biol 235:1598–613.

Koppensteiner WA, Sippl MJ (1998): Knowledge-based potentials--back to the roots. Biochemistry (Mosc) 63:247–52.

Lazaridis T, Karplus M (1999): Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J Mol Biol 288:477–87.

Lee MR, Tsai J, Baker D, Kollman PA (2001): Molecular dynamics in the endgame of protein structure prediction. J Mol Biol 313:417–30.

Lesk AM, Lo Conte L, Hubbard T (2001): Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins 45(S5):98–118.

Liepinsh E, Andersson M, Ruysschaert JM, Otting G (1997): Saposin fold revealed by the NMR structure of NK-lysin. Nat Struct Biol 4:793–5.

Liepinsh E, Barishev M, Shapiro A, Ingelman-Sundberg M, Otting G, Mkrtchian S (2001): Thioredoxin fold as a homodimerization module in the putative chaperone ERp29: NMR structures of the domains and experimental model of the 51 kDa dimer. Structure 9:457–71.

Lim K, Tempczyk A, Toedt J, Parsons J, Howard A, Eisenstein E, Herzberg O (Forthcoming).

Marqusee S, Robbins VH, Baldwin RL (1989): Unusually stable helix formation in short alanine-based peptides. Proc Natl Acad Sci USA 86:5286–90.

Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A (2000): Comparative protein structure modeling of genes and genomes. Ann Rev Biophys Biomol Struct 29:291–325.

Merckel MC, Fabrichniy IP, Salminen A, Kalkkinen N, Baykov AA, Lahti R, Goldman A (2001): Crystal structure of Streptococcus mutans pyrophosphatase: a new fold for an old mechanism. Structure 9:289–97.

Murzin AG (2001): Progress in protein structure prediction. Nat Struct Biol 8:110–2.

Obmolova G, Ban C, Hsieh P, Yang W (2000): Crystal structures of mismatch repair protein MutS and its complex with a substrate DNA. Nature 407:703–10.

Ortiz AR, Kolinski A, Rotkiewicz P, Ilkowski B, Skolnick J (1999): Ab initio folding of proteins using restraints derived from evolutionary information. Proteins 37(S3):177–85.

Park BH, Levitt M (1995): The complexity and accuracy of discrete state models of protein structure. J Mol Biol 249:493–507.

Park B, Levitt M (1996): Energy functions that discriminate X-ray and near native folds from well-constructed decoys. J Mol Biol 258:367–92.

Park BH, Huang ES, Levitt M (1997): Factors affecting the ability of energy functions to discriminate correct from incorrect folds. J Mol Biol 266:831–46.

Pedersen JT, Moult J (1997): Protein folding simulations with genetic algorithms and a detailed molecular description. J Mol Biol 269:240–59.

Petrey D, Honig B (2000): Free energy determinants of tertiary structure and the evaluation of protein models. Protein Sci 9:2181–91.

Pillardy J, Czaplewski C, Liwo A, Lee J, Ripoll DR, Kazmierkiewicz R, Oldziej S, Wedemeyer WJ, Gibson KD, Arnautova YA, Saunders J, Ye YJ, Scheraga HA (2001): Recent improvements in prediction of protein structure by global optimization of a potential energy function. Proc Natl Acad Sci USA 98:2329–33.

Plaxco KW, Simons KT, Baker D (1998): Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol 277:985–94.

Rapp CS, Friesner RA (1999): Prediction of loop geometries using a generalized Born model of solvation effects. Proteins 35:173–83.

Reva BA, Finkelstein AV, Sanner MF, Olson AJ (1996): Adjusting potential energy functions for lattice models of chain molecules. Proteins 25:379–88.

Rohl CA, Baker D (2002): De novo determination of protein backbone structure from residual dipolar couplings using Rosetta. J Am Chem Soc 124:2723–9.

Russell RB, Ponting CP (1998): Protein fold irregularities that hinder sequence analysis. Curr Opin Struct Biol 8:364–71.

Samudrala R, Moult J (1998): An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. J Mol Biol 275:895–916.

Samudrala R, Xia Y, Huang E, Levitt M (1999): Ab initio protein structure prediction using a combined hierarchical approach. Proteins 37(S3):194–8.

Sanchez R, Sali A (1998): Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc Natl Acad Sci USA 95:13597–602.

Sanchez R, Sali A (1999): Comparative protein structure modeling in genomics. J Comp Phys 151:388–401.

Simons KT, Kooperberg C, Huang E, Baker D (1997): Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268:209–25.

Simons KT, Strauss C, Baker D (2001): Prospects for ab initio protein structural genomics. J Mol Biol 306:1191–9.

Sippl MJ (1995): Knowledge-based potentials for proteins. Curr Opin Struct Biol 5:229–35.

Sippl MJ, Hendlich M, Lackner P (1992): Assembly of polypeptide and protein backbone conformations from low energy ensembles of short fragments: development of strategies and construction of models for myoglobin, lysozyme, and thymosin beta 4. Protein Sci 1:625–40.

Skolnick J, Kolinski A (1991): Dynamic Monte Carlo simulations of a new lattice model of globular protein folding, structure and dynamics. J Mol Biol 221:499–531.

Standley DM, Eyrich VA, Felts AK, Friesner RA, McDermott AE (1999): A branch and bound algorithm for protein structure refinement from sparse NMR data sets. J Mol Biol 285:1691–710.

Vorobjev YN, Almagro JC, Hermans J (1998): Discrimination between native and intentionally misfolded conformations of proteins: ES/IS, a new method for calculating conformational free energy that uses both dynamics simulations with an explicit solvent and an implicit solvent continuum model. Proteins 32:399–413.
