Why Proteins Are So Robust to Site Mutation



    Why Are Proteins So Robust To Site Mutations?

    Darin M. Taverna¹ and Richard A. Goldstein²*

    ¹Biophysics Research Division and ²Department of Chemistry, University of Michigan, Ann Arbor, MI 48109-1055, USA

    There have been repeated observations that proteins are surprisingly robust to site mutations, enduring significant numbers of substitutions with little change in structure, stability, or function. These results are almost paradoxical in light of what is known about random heteropolymers and the sensitivity of their properties to seemingly trivial mutations. To address this discrepancy, the preservation of biological protein properties in the presence of mutation has been interpreted as indicating the independence of selective pressure on such properties. Such results also lead to the prediction that de novo protein design should be relatively easy, in contrast to what is observed. Here, we use a computational model with lattice proteins to demonstrate how this robustness can result from population dynamics during the evolutionary process. As a result, sequence plasticity may be a characteristic of evolutionarily derived proteins and not necessarily a property of designed proteins. This suggests that this robustness must be re-interpreted in evolutionary terms, and has consequences for our understanding of both in vivo and in vitro protein evolution.

    © 2002 Academic Press

    Keywords: site substitution; mutagenesis; molecular evolution; protein stability; protein folding

    *Corresponding author

    Introduction

    There has been much interest in probing the relationship between a protein's sequence and its resulting structural, thermodynamic, and functional properties. It is hoped that insights resulting from these pursuits will lead to the ability to predict protein properties based on sequence information as well as how these properties could be altered by changes in the sequence. Such insights are also crucial in developing the ability to design proteins with prescribed or altered structures, stabilities, and functionalities.

    One of the major methods of investigating the relationship of protein sequence to the corresponding properties is to alter naturally occurring proteins through site mutagenesis. Often the substitution is chosen so as to modify a specific interaction, although more exhaustive and random substitutions have been studied. One of the surprising results of such studies is the robustness of protein properties to mutations. Although most site substitutions are destabilizing, many result in essentially unchanged stabilities, and a significant fraction of mutations actually result in increased stability over the wild type (e.g. see Reddy et al.¹). The conclusion drawn from these studies is that there is an inherent robustness to the mapping of sequence to structure, and that sequence space consists of large regions of possible sequences corresponding to proteins with essentially equivalent properties. The general level of sequence plasticity has also led researchers to conclude that the robust properties must not be under active selection during evolution.² This plasticity provides optimism for de novo protein design, in that it indicates that there are large numbers of amino acid sequences consistent with a given stable structure; the ability of a protein to fold despite changed interactions means that the interactions do not have to be formulated precisely in advance. Protein design may correspond to finding a needle in a haystack, but at least it is a quite sizable needle.

    In contrast, de novo engineering of well-packed proteins has proven "surprisingly difficult".³ Why does the sequence plasticity observed in site-directed mutagenesis not translate into ease in protein engineering? Perhaps we are interpreting the results of these mutagenesis experiments in the wrong context. Proteins are the result of a long

    Present address: D. Taverna, Protein Pathways, Inc., 1145 Gayley Ave., Suite 304, Los Angeles, CA 90024, USA.

    E-mail address of the corresponding author: [email protected]

    doi:10.1006/jmbi.2001.5226, available online at http://www.idealibrary.com

    J. Mol. Biol. (2002) 315, 479-484

    0022-2836/02/0304796 $35.00/0 © 2002 Academic Press


    ility. If we consider the fact that most random sequences of amino acids do not have a stable folded state, then any mutation in one of the few viable sequences with a stable ground state would most likely move the stability in the direction of the more random sequences; that is, towards being less stable.

    The surprise is rather that experimental mutations have a significant probability of resulting in unchanged or increased stability. The exact percentages of mutations that are stabilizing vary according to the protein and the nature of the substitutions, ranging from approximately 8% in mutations of barnase¹¹ and staphylococcal nuclease¹²⁻¹⁴ designed to eliminate specific interactions, to 17% in interior locations of myoglobin,¹⁵ 20% of non-Ala locations in Arc repressor,¹⁶ and 29% for two specific solvent-exposed locations in phage T4 lysozyme.¹⁷ A more comprehensive set of 356 site-directed mutations compiled from the literature by Reddy and co-workers showed that 25% of the mutations increased protein stability.¹ While the specifics vary, these results are sufficiently consistent to conclude that robustness would seem to be a general characteristic of systems that have come into being through the Darwinian evolution of populations. The range of these experimental results is close to the 18% to 28% (depending upon ΔG_crit) we observed for our lattice proteins evolving through population dynamics, but far from the 0.04% to 0.4% observed for random-walk sequences with comparable ΔG_crit.

    Note that this robustness occurs in the absence of any selective pressure towards robustness. The various sequences in the model with ΔG_folding < ΔG_crit have equal fitness and equal probability of contributing to the next generation. Robustness towards mutations is just one of a number of properties that emerge from neutral evolution in sequence space, as has been emphasized by a number of authors.⁴,⁵,⁷,¹⁰,¹⁸⁻²⁴

    So how can evolution, where robustness is not an explicit selection criterion, result in such unexpectedly robust proteins? Insight into this phenomenon comes from the pioneering work by Eigen.²⁵ In analytical studies of RNA evolution, he found that evolution selected for a network of genotypes, what he called quasispecies. The relevant fitness of the quasispecies is a function of the fitness of all of the genotypes, so the population of any one genotype would be enhanced by being surrounded by fit neighbors. This effect depends on the possibility of back-mutations: if one genotype contributes to a neighboring genotype in one generation, there is a probability that the neighboring genotype will return the favor in a future generation. Through this mechanism, population dynamics result in an evolutionary selection of genotypes biased by the fitness of their neighbors; that is, on their robustness to mutations. In the sense of fitness landscapes, nature may choose broad fitness plateaus of well-connected neighbors even in the presence of higher, yet poorly connected fitness peaks. This evolutionary heritage is encoded in the genotype, resulting in a sequence plasticity that distinguishes these sequences from random sequences chosen to have the same phenotype. Bornberg-Bauer & Chan, for instance, found that evolutionary dynamics would result in a bias in the population towards "prototype" sequences with the maximum number of "neutral neighbors".⁶ The work described here concentrates on protein stability, but it should be true of any protein property that is important for survival of the organism. This evolutionary trend towards robustness may be a general characteristic of biological systems.²⁶,²⁷
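The back-mutation argument can be made concrete on a toy neutral network. The sketch below is an illustration under stated assumptions and is not the lattice model of this paper: the graph, the mutation rate `mu`, the parameter `n_sites`, and both function names are hypothetical. Nodes are viable genotypes, edges are single point mutations between them, and any mutation that leaves the network is lethal. A single sequence that simply rejects lethal mutations visits every node equally often, whereas a large population exchanging mutants between neighbors concentrates on the well-connected nodes.

```python
def walker_neutrality(adj):
    """Mean number of viable neighbours seen by a single diffusing sequence.

    Rejected (lethal) mutations leave the sequence in place, so all nodes
    of the network are visited equally often.
    """
    return sum(len(nbrs) for nbrs in adj) / len(adj)


def population_neutrality(adj, n_sites, mu=0.05, generations=2000):
    """Mean number of viable neighbours in a large evolving population.

    adj[i] lists the viable one-mutant neighbours of genotype i.
    n_sites is the total number of one-mutant neighbours of any genotype,
    viable or not (an assumption of this toy model).
    """
    n = len(adj)
    freq = [1.0 / n] * n
    for _ in range(generations):
        new = [(1.0 - mu) * f for f in freq]      # offspring without mutation
        for i, nbrs in enumerate(adj):
            for j in nbrs:                        # viable mutants i -> j
                new[j] += mu * freq[i] / n_sites
        total = sum(new)                          # lethal mutants are lost
        freq = [f / total for f in new]
    return sum(f * len(nbrs) for f, nbrs in zip(freq, adj))


if __name__ == "__main__":
    # Three genotypes in a line: 0 - 1 - 2 (hypothetical network).
    path = [[1], [0, 2], [1]]
    print(walker_neutrality(path))                # 4/3
    print(population_neutrality(path, n_sites=4)) # approaches sqrt(2)
```

For this three-node path the single walker sees 4/3 viable neighbors on average, while the population value converges to the largest eigenvalue of the adjacency matrix, which is the square root of 2. The population is therefore more robust than the walker even though no node has a fitness advantage.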

    There are a number of important consequences of this effect. Firstly, the lessons of sequence plasticity in biological proteins may be inapplicable to artificially designed proteins. It may be necessary to have a de novo sequence exquisitely designed to have properties similar to biological proteins. This also suggests that taking advantage of the observed robustness by modifying existing proteins may be a more effective route. Alternatively, in vitro evolution studies may provide proteins with the same degree of sequence plasticity as natural proteins. More optimistically, proteins may have compromised possible interactions and properties in developing this robustness, which suggests that more effective if less robust proteins may be available.

    Figure 2. Density distribution of ΔΔG_mut from model proteins undergoing population (continuous line) and single sequence evolution (broken line), for various values of ΔG_crit.

    Figure 3. Probability of destabilizing mutation from model proteins undergoing population evolution, according to the number of point mutations, as a function of ΔG_crit (thin continuous line). The average rate for all mutations (thick continuous line) and the high rate of destabilizing mutations for single sequence evolution (broken line) are included for comparison.

    In addition, these results suggest that the observed sequence plasticity may have non-obvious consequences for our understanding of proteins and their evolution. For instance, Baker and co-workers observed that sequence changes in the IgG binding domain of protein L often resulted in proteins that folded faster than the wild-type protein, and concluded that this indicates that the folding rate is not under strong selective pressure.² The model presented here results in the opposite conclusion, that properties of the protein under stronger selective pressure are more likely to be "buffered" and thus robust to mutations. In other words, robustness to site mutations would paradoxically be an indication of stronger selective pressure on these characteristics.

    Finally, we note that there is growing interest in the relationship between robustness and evolvability; that is, between the ability to buffer genotypic variations and the ability of an organism to adapt to new situations and environments.²⁸ If such a relationship exists, the tendency of population dynamics to increase sequence plasticity might have had a significant impact on the evolutionary process, including the development of new functionalities of existent proteins.

    Methods

    We consider a highly simplified representation of evolving proteins. Our model proteins consist of chains of n = 25 monomers, confined to a 5 × 5 two-dimensional, maximally compact square lattice with each monomer located at one lattice point. This provides us with 1081 possible conformations represented by the 1081 self-avoiding walks on this lattice, neglecting structures related by rotation, reflection, or inversion. The non-compact states were neglected in order to allow for a reasonable number of stable sequences. Alternatively, we would expect the non-compact states to be negligible as long as the contact energies were sufficiently attractive. The fact that most protein structures are reasonably compact makes this assumption not too unreasonable. There are important differences between the two-dimensional and three-dimensional models, especially in folding simulations where the two-dimensional conformation space may not be ergodic.²⁹,³⁰ While these limitations are critical in folding simulations, we are more interested in the mapping of sequence to structure rather than how the sequence folds to that given structure; the thermodynamic properties described below involve sums over states and should be less affected by the dimensionality of the model. We use the two-dimensional model to provide a more realistic ratio of buried to exposed sites.
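The conformation count can be checked by direct enumeration. The following is a minimal sketch, assuming that walks are treated as directed (the chain has a first and a last residue) and that two walks are the same structure when one of the 8 rotations or reflections of the square maps one onto the other. The function name `compact_conformations` and the canonical-form convention are choices made for this illustration and are not taken from the paper.

```python
N = 5  # lattice side; the chain has N * N = 25 monomers

# The 8 symmetry operations of the square (rotations and reflections).
SYMMETRIES = [
    lambda x, y: (x, y),
    lambda x, y: (N - 1 - x, y),
    lambda x, y: (x, N - 1 - y),
    lambda x, y: (N - 1 - x, N - 1 - y),
    lambda x, y: (y, x),
    lambda x, y: (N - 1 - y, x),
    lambda x, y: (y, N - 1 - x),
    lambda x, y: (N - 1 - y, N - 1 - x),
]


def compact_conformations():
    """Enumerate maximally compact self-avoiding walks, one per symmetry class."""
    found = set()

    def extend(path, visited):
        if len(path) == N * N:
            # Canonical form: the smallest of the 8 symmetry-related images.
            found.add(min(tuple(s(x, y) for x, y in path) for s in SYMMETRIES))
            return
        x, y = path[-1]
        for site in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= site[0] < N and 0 <= site[1] < N and site not in visited:
                visited.add(site)
                path.append(site)
                extend(path, visited)
                path.pop()
                visited.remove(site)

    # Every walk is equivalent to one starting in the octant x <= y <= centre.
    half = (N - 1) // 2
    for x in range(half + 1):
        for y in range(x, half + 1):
            # The walk alternates between the two colours of a checkerboard;
            # with 13 sites of one colour and 12 of the other it must start
            # on the majority colour (x + y even).
            if (x + y) % 2 == 0:
                extend([(x, y)], {(x, y)})
    return sorted(found)


if __name__ == "__main__":
    print(len(compact_conformations()))
```

Under these conventions the enumeration gives 1081 conformations, in agreement with the number quoted in the text. Each conformation is returned as a tuple of 25 lattice coordinates in chain order.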

    We assume that the energy of any sequence in conformation k is given by a simple contact energy of the form:

    \[ E = \sum_{i<j} \gamma_{e_i e_j} \, U_{ij}^{k} \qquad (1) \]

    Here, U_ij^k is equal to 1 if residues i and j are not covalently connected but are on adjacent lattice sites in conformation k (and 0 otherwise), and γ_{e_i e_j} is the contact energy between amino acid e_i at location i and e_j at location j in the sequence. We use the contact energies derived by Miyazawa & Jernigan based on a statistical analysis of the database of known proteins that implicitly includes the effect of interactions of the protein with the solvent.³¹ In our simplified proteins, there are 132 pairs of residues that can possibly come into contact, with 16 of these contacts present in any given compact structure.
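Equation (1) can be sketched in a few lines. The code below is an illustration only: the function names are hypothetical, and the two-letter contact matrix used in the example is a made-up placeholder, not the Miyazawa-Jernigan matrix. A conformation is assumed to be a list of lattice coordinates in chain order.

```python
def contacts(conformation):
    """Pairs (i, j), i < j, on adjacent lattice sites but not covalently bonded."""
    position = {site: i for i, site in enumerate(conformation)}
    pairs = []
    for i, (x, y) in enumerate(conformation):
        # Looking only right and up visits each lattice edge exactly once.
        for site in ((x + 1, y), (x, y + 1)):
            j = position.get(site)
            if j is not None and abs(i - j) > 1:
                pairs.append((min(i, j), max(i, j)))
    return pairs


def contact_energy(sequence, conformation, gamma):
    """Equation (1): sum of gamma[e_i][e_j] over the contacts of a conformation."""
    return sum(gamma[sequence[i]][sequence[j]] for i, j in contacts(conformation))


if __name__ == "__main__":
    # Hypothetical two-letter contact matrix, for illustration only.
    toy_gamma = {"H": {"H": -1.0, "P": 0.0}, "P": {"H": 0.0, "P": 0.0}}
    # A 9-residue chain snaking through a 3 x 3 lattice.
    snake = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 2), (1, 2), (2, 2)]
    print(sorted(contacts(snake)))
    print(contact_energy("HPPPHHPHP", snake, toy_gamma))
```

The two counts quoted in the text follow from simple bookkeeping. On a square lattice two residues can be adjacent only if their separation along the chain is odd, and a non-bonded contact needs a separation of at least 3; for n = 25 there are 132 such pairs. A 5 × 5 lattice has 40 nearest-neighbor edges, 24 of which are occupied by backbone bonds, which leaves 16 contacts in every compact structure.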

    Using equation (1) we can calculate the energy of a given protein sequence in all 1081 possible conformations. We make the assumption that the thermodynamic hypothesis is obeyed and that the lowest-energy structure is the native state;³² the other 1080 possible structures represent the ensemble of unfolded states.

    Not all possible protein sequences are viable. In general, a protein must fulfil a number of conditions relating to stability, functionality, and foldability. Here, we concentrate on stability. For each sequence, we calculate the free energy of folding:

    \[ \Delta G_{\mathrm{folding}} = E_f + kT \ln\left[ Z - \exp\left(-E_f / kT\right) \right] \qquad (2) \]

    where E_f is the energy of the native state and Z is the partition function. (For the Miyazawa-Jernigan potential, we use kT = 0.6.) We consider a sequence as representing a viable protein as long as its ΔG_folding is less than some specified ΔG_crit.
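As a hedged illustration of the free energy of folding, the sketch below evaluates it from a list of conformational energies of one sequence. It assumes that the native state is the single lowest-energy conformation and that all remaining conformations form the unfolded ensemble, as stated above; the function names are hypothetical. The unfolded-state sum, Z minus the native Boltzmann factor, is accumulated directly over the remaining conformations, which is algebraically the same quantity and avoids subtracting two nearly equal numbers.

```python
import math


def folding_free_energy(energies, kT=0.6):
    """Free energy of folding: E_f + kT * ln(Z - exp(-E_f / kT))."""
    e_native = min(energies)
    unfolded = list(energies)
    unfolded.remove(e_native)          # drop one copy of the ground state
    # Factor out the lowest unfolded energy to keep the exponentials bounded.
    shift = min(unfolded)
    z_unfolded = sum(math.exp(-(e - shift) / kT) for e in unfolded)
    return e_native - shift + kT * math.log(z_unfolded)


def is_viable(energies, dg_crit, kT=0.6):
    """A sequence is viable when its folding free energy is below dg_crit."""
    return folding_free_energy(energies, kT) < dg_crit


if __name__ == "__main__":
    # One native state at -2 and two unfolded states at 0 (made-up energies).
    print(folding_free_energy([-2.0, 0.0, 0.0]))   # -2 + 0.6 * ln 2
```

With two degenerate unfolded states the entropy of the unfolded ensemble contributes kT ln 2, so the sequence is less stable than its native energy alone would suggest.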

    We implement two different dynamic models of sequence change. In the first model, we choose a sequence at random and make point mutations until we arrive at a sufficiently stable protein sequence. Starting with this initial stable sequence, residue positions are randomly mutated with the number of mutations chosen from a Poisson distribution with an average of 0.002 mutations per amino acid residue per generation. With this low mutation rate, multiple mutations are rare (the ratio of single mutants to multiple mutants is 200). We calculate the stability of the new sequence; if ΔG_folding is larger than ΔG_crit or the structure has changed, the mutation is rejected and the original sequence retained. Generations where no mutations occurred are not counted. This allows the single sequence to diffuse randomly over the range of acceptable sequences, analogous to random-walk models in which a particle has average zero velocity when a boundary is encountered. This is done five times for each of five values of ΔG_crit (0.0, 0.5, 1.0, 1.5, 2.0). Sequences that arise during these runs are probed for robustness to mutations. We make mutations in the sequence with a Poisson distribution with mean 0.002 mutation per amino acid residue, maintaining a constant rare rate of multiple mutations. We then calculate the probability that a mutation results in a given change in stability (ΔΔG_mut) as a function of the stability prior to the mutation (ΔG_wt).
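The single-sequence model is an accept-or-reject random walk, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the caller supplies a hypothetical `evaluate(sequence)` function returning the native structure and the folding free energy, mutated positions are drawn without repetition, and a mutated residue always changes to a different letter. The Poisson sampler uses Knuth's multiplication method so that only the standard library is needed.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson-distributed integer (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def mutate(sequence, alphabet, n_mut, rng):
    """Replace n_mut distinct positions by a different random residue."""
    residues = list(sequence)
    for pos in rng.sample(range(len(residues)), min(n_mut, len(residues))):
        residues[pos] = rng.choice([a for a in alphabet if a != residues[pos]])
    return "".join(residues)


def single_sequence_walk(sequence, evaluate, dg_crit, alphabet, steps,
                         rate=0.002, rng=None):
    """Diffuse one viable sequence; return the sequence after each counted step.

    evaluate(sequence) must return (native_structure, dg_folding).
    """
    rng = rng or random.Random()
    structure, _ = evaluate(sequence)
    history = [sequence]
    while len(history) <= steps:
        n_mut = poisson(rate * len(sequence), rng)
        if n_mut == 0:
            continue                   # generations without mutation are not counted
        trial = mutate(sequence, alphabet, n_mut, rng)
        new_structure, new_dg = evaluate(trial)
        # Accept only if still stable enough and folding to the same structure.
        if new_dg < dg_crit and new_structure == structure:
            sequence = trial
        history.append(sequence)       # a rejected step keeps the old sequence
    return history


if __name__ == "__main__":
    # Toy stand-in for the lattice model: one structure, stability = -(number of H).
    toy = lambda s: ("fold", -float(s.count("H")))
    walk = single_sequence_walk("HHHHPP", toy, dg_crit=-2.5, alphabet="HP",
                                steps=20, rng=random.Random(1))
    print(walk[-1])
```

Because rejected mutations leave the walker where it is, every counted step appends a viable sequence, and the walker never leaves the set of sequences that satisfy the stability threshold.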

    For the second model, we simulate the effect of population dynamics using an evolutionary scheme, using a method described elsewhere.⁸ We construct a population of N = 3000 identical viable sequences. For each generation, each residue in the protein population has a probability of 0.002 to be mutated to another random residue; both the population size and mutation rate were chosen to be comparable to previous analytical models of evolution processes.³³⁻³⁵ The stability of each protein in the population is then calculated. We use truncation selection where the NH sequences having ΔG_folding


    28. Kirschner, M. & Gerhart, J. (1998). Evolvability. Proc. Natl Acad. Sci. USA, 95, 8420-8427.

    29. Abkevich, A. I., Gutin, A. M. & Shakhnovich, E. I. (1995). Impact of local and non-local interactions on thermodynamics and kinetics of protein folding. J. Mol. Biol. 252, 460-471.

    30. Pande, V. S., Grosberg, A. Y. & Tanaka, T. (1997). Statistical mechanics of simple models of protein folding and design. Biophys. J. 73, 3192-3210.

    31. Miyazawa, S. & Jernigan, R. L. (1985). Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules, 18, 534-552.

    32. Govindarajan, S. & Goldstein, R. A. (1998). On the thermodynamic hypothesis of protein folding. Proc. Natl Acad. Sci. USA, 95, 5545-5549.

    33. Kimura, M. (1979). The neutral theory of molecular evolution. Sci. Am. 241, 98-126.

    34. Ohta, T. (1987). Simulating evolution by gene duplication. Genetics, 115, 207-213.

    35. Ohta, T. (1988). Multigene and supergene families. Oxford Surv. Evol. Biol. 5, 41-65.

    Edited by J. Thornton

    (Received 6 August 2001; received in revised form 22 October 2001; accepted 23 October 2001)
