Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only...

42
JOURNAL OF EXPERIMENTAL ZOOLOGY (MOL DEV EVOL) 288:242–283 (2000) © 2000 WILEY-LISS, INC. Plasticity, Evolvability, and Modularity in RNA LAUREN W. ANCEL 1 AND WALTER FONTANA 2,3 * 1 Department of Biological Sciences, Stanford University, Stanford, California 94305 2 Santa Fe Institute, Santa Fe, New Mexico 87501 3 Institute for Advanced Study, Program in Theoretical Biology, Princeton, New Jersey 08540 ABSTRACT RNA folding from sequences into secondary structures is a simple yet powerful, biophysically grounded model of a genotype–phenotype map in which concepts like plasticity, evolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and profoundly non-independent effects of natural selection. Molecu- lar plasticity is viewed here as the capacity of an RNA sequence to assume a variety of energeti- cally favorable shapes by equilibrating among them at constant temperature. Through simulations based on experimental designs, we study the dynamics of a population of RNA molecules that evolve toward a predefined target shape in a constant environment. Each shape in the plastic repertoire of a sequence contributes to the overall fitness of the sequence in proportion to the time the sequence spends in that shape. Plasticity is costly, since the more shapes a sequence can assume, the less time it spends in any one of them. Unsurprisingly, selection leads to a reduction of plasticity (environmental canalization). The most striking observation, however, is the simulta- neous slow-down and eventual halting of the evolutionary process. The reduction of plasticity entails genetic canalization, that is, a dramatic loss of variability (and hence a loss of evolvability) to the point of lock-in. The causal bridge between environmental canalization and genetic canali- zation is provided by a correlation between the set of shapes in the plastic repertoire of a sequence and the set of dominant (minimum free energy) shapes in its genetic neighborhood. This statisti- cal property of the RNA genotype–phenotype map, which we call plastogenetic congruence, traps populations in regions where most genetic variation is phenotypically neutral. We call this phe- nomenon neutral confinement. Analytical models of neutral confinement, made tractable by the assumption of perfect plastogenetic congruence, formally connect mutation rate, the topography of phenotype space, and evolvability. These models identify three mutational regimes: that corre- sponding to neutral confinement, an exploration threshold corresponding to a breakdown of neu- tral confinement with the simultaneous persistence of the dominant phenotype, and a classic error threshold corresponding to the loss of the dominant phenotype. In a final step, we analyze the structural properties of canalized phenotypes. The reduction of plasticity leads to extreme modu- larity, which we analyze from several perspectives: thermophysical (melting—the RNA version of a norm of reaction), kinetic (folding pathways—the RNA version of development), and genetic (transposability—the insensitivity to genetic context). The model thereby suggests a possible evo- lutionary origin of modularity as a side effect of environmental canalization. J. Exp. Zool. (Mol. Dev. Evol.) 288:242–283, 2000. © 2000 Wiley-Liss, Inc. *Correspondence to: W. Fontana, Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501. E-mail: [email protected] Received 4 November 1999; accepted 20 May 2000 Biological evolution is the transformation of heritable phenotypes through time. Evolution is fueled by the introduction of novel phenotypes and steered by population-level interactions including natural selection and genetic drift. The predomi- nant route to heritable phenotypic change origi- nates with genetic mutation. The processes that translate genetic variation into phenotypic varia- tion give rise to an association between genotype and phenotype which we represent as a map that is sensitive to environmental conditions. Concepts such as canalization, epistasis, and modularity underlie our understanding of phenotypic variabil- ity [for a sweeping perspective see Wagner and Altenberg (’96) and Schlichting and Pigliucci (’98). Yet, a microfoundation of these concepts and of their interconnections in terms of the relation be- tween genotype and phenotype is largely missing. Our goal here is to initiate such a foundation in the specific context of a conceptually, computa-

Transcript of Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only...

Page 1: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

242 L. ANCEL AND W. FONTANAJOURNAL OF EXPERIMENTAL ZOOLOGY (MOL DEV EVOL) 288:242–283 (2000)

© 2000 WILEY-LISS, INC.

Plasticity, Evolvability, and Modularity in RNALAUREN W. ANCEL1 AND WALTER FONTANA2,3*1Department of Biological Sciences, Stanford University, Stanford,California 94305

2Santa Fe Institute, Santa Fe, New Mexico 875013Institute for Advanced Study, Program in Theoretical Biology, Princeton,New Jersey 08540

ABSTRACT RNA folding from sequences into secondary structures is a simple yet powerful,biophysically grounded model of a genotype–phenotype map in which concepts like plasticity,evolvability, epistasis, and modularity can not only be precisely defined and statistically measuredbut also reveal simultaneous and profoundly non-independent effects of natural selection. Molecu-lar plasticity is viewed here as the capacity of an RNA sequence to assume a variety of energeti-cally favorable shapes by equilibrating among them at constant temperature. Through simulationsbased on experimental designs, we study the dynamics of a population of RNA molecules thatevolve toward a predefined target shape in a constant environment. Each shape in the plasticrepertoire of a sequence contributes to the overall fitness of the sequence in proportion to the timethe sequence spends in that shape. Plasticity is costly, since the more shapes a sequence canassume, the less time it spends in any one of them. Unsurprisingly, selection leads to a reductionof plasticity (environmental canalization). The most striking observation, however, is the simulta-neous slow-down and eventual halting of the evolutionary process. The reduction of plasticityentails genetic canalization, that is, a dramatic loss of variability (and hence a loss of evolvability)to the point of lock-in. The causal bridge between environmental canalization and genetic canali-zation is provided by a correlation between the set of shapes in the plastic repertoire of a sequenceand the set of dominant (minimum free energy) shapes in its genetic neighborhood. This statisti-cal property of the RNA genotype–phenotype map, which we call plastogenetic congruence, trapspopulations in regions where most genetic variation is phenotypically neutral. We call this phe-nomenon neutral confinement. Analytical models of neutral confinement, made tractable by theassumption of perfect plastogenetic congruence, formally connect mutation rate, the topography ofphenotype space, and evolvability. These models identify three mutational regimes: that corre-sponding to neutral confinement, an exploration threshold corresponding to a breakdown of neu-tral confinement with the simultaneous persistence of the dominant phenotype, and a classic errorthreshold corresponding to the loss of the dominant phenotype. In a final step, we analyze thestructural properties of canalized phenotypes. The reduction of plasticity leads to extreme modu-larity, which we analyze from several perspectives: thermophysical (melting—the RNA version ofa norm of reaction), kinetic (folding pathways—the RNA version of development), and genetic(transposability—the insensitivity to genetic context). The model thereby suggests a possible evo-lutionary origin of modularity as a side effect of environmental canalization. J. Exp. Zool. (Mol.Dev. Evol.) 288:242–283, 2000. © 2000 Wiley-Liss, Inc.

*Correspondence to: W. Fontana, Santa Fe Institute, 1399 HydePark Road, Santa Fe, NM 87501. E-mail: [email protected]

Received 4 November 1999; accepted 20 May 2000

Biological evolution is the transformation ofheritable phenotypes through time. Evolution isfueled by the introduction of novel phenotypes andsteered by population-level interactions includingnatural selection and genetic drift. The predomi-nant route to heritable phenotypic change origi-nates with genetic mutation. The processes thattranslate genetic variation into phenotypic varia-tion give rise to an association between genotypeand phenotype which we represent as a map thatis sensitive to environmental conditions. Conceptssuch as canalization, epistasis, and modularity

underlie our understanding of phenotypic variabil-ity [for a sweeping perspective see Wagner andAltenberg (’96) and Schlichting and Pigliucci (’98).Yet, a microfoundation of these concepts and oftheir interconnections in terms of the relation be-tween genotype and phenotype is largely missing.Our goal here is to initiate such a foundation inthe specific context of a conceptually, computa-

Page 2: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 243

tionally, and empirically simple yet powerful geno-type–phenotype model based on the folding of RNAsequences (genotypes) into shapes (phenotypes).

RNA folding seems, at first, unlikely to be ableto address canalization, epistasis, and modular-ity. Although we shall mostly use language ap-propriate for RNA, an RNA sequence could beviewed as a metaphor for a genome, and a posi-tion along the sequence as a metaphor for a locuswith four possible alleles (nucleotides). A pheno-type (RNA shape) then is a simple pattern ofgene–gene interactions (base pairs, see section1.2). As this paper weaves together several seem-ingly diverse concepts, we begin with an overview.

Our study provides a molecular illustrationfor the Simpson–Baldwin effect (Baldwin, 1896;Simpson, ’53; Ancel, ’99a) using RNA as an ex-ample. Central to the initial “discovery” stage ofthe Simpson–Baldwin effect is phenotypic plas-ticity, that is, the genetically influenced capacityof an individual to develop into one among a rangeof phenotypes. In an evolutionary context, a fixedenvironment will convey a selective advantage tothose individuals that can access an improved phe-notype within their plastic repertoire over thosewho cannot. The “assimilation” stage of theSimpson–Baldwin effect arises from the fitnesscosts of plasticity. Among the individuals selectedin the first stage, those that can still access theimproved phenotype while reducing their rangeof phenotypic plasticity will have a selective ad-vantage. The Simpson–Baldwin effect describesthe genetic determination of a phenotype that pre-viously seemed to be acquired anew in each gen-eration. A frequently considered mechanism ofplasticity is learning (Hinton and Nowland, ’87;Maynard-Smith, ’87).

Although molecules do not learn, biopolymerslike RNA are plastic in the sense that a given se-quence can realize a repertoire of alternative struc-tures, rather than being frozen in its minimum freeenergy configuration (section 1). An RNA sequencesamples a variety of energetically low lying struc-tures by wiggling among them under thermal fluc-tuation. The overall time a sequence spends in ashape reflects the thermodynamic stability of thatshape. Consider now a population of replicating andmutating RNA sequences that are subject to selec-tion for structural proximity to a constant targetshape (section 2). Assume further that each shapeattainable by a sequence contributes to thatsequence’s overall fitness in proportion to the timethe sequence spends in it.

Sequences with an advantageous but energeti-

cally suboptimal shape will be selected over thosethat lack that shape. Plasticity entails a fitnesscost because the more alternative shapes an RNAmolecule can fold into, the less time it will spendin each shape, including advantageous ones. Se-quences with an advantageous but energeticallysuboptimal shape s will therefore be replaced bymutants with s at lower free energy until s be-comes the minimum free energy structure. Sub-sequently, natural selection will fine tune thethermodynamic stability of σ by favoring variantswith few alternative structures in the energeticvicinity of s. We show that in our RNA model suchgenetic assimilation occurs extremely rapidly andcovers several orders of magnitude in thermody-namic stability (section 2).

Some models link plasticity to a speed-up in evo-lution (Hinton and Nowlan, ’87). This is not the casein our RNA model. In fact, the reduction of plastic-ity in a constant environment leads to a slow downof evolution to the point of a phenotypic dead-end.In section 2 we describe this dynamic, and in sec-tion 3 we offer an explanation for this behavior interms of features that are intrinsic to the RNA geno-type–phenotype map. Several threads come togetheras we argue that genetic assimilation (the reduc-tion of plasticity) requires a genotype–phenotypemap in which plasticity mirrors variability. In otherwords, the shapes appearing in a sequence’s reper-toire of energetically favorable structures correlatesignificantly to the minimum free energy structuresof the one-error mutants of that sequence. Thisturns out to be a general property of the RNA geno-type-phenotype map (section 3). We call this phe-nomenon “plastogenetic congruence.”

Phenotypic variability describes the extent of phe-notypic variation accessible to a genotype throughmutation. The evolvability of an individual is thelikelihood of reaching a phenotype with improvedfitness through mutation (Altenberg, ’95). As such,evolvability is linked to variability via the fitnessfunction. Plasticity, on the other hand, captures thephenotypic variation at a fixed genotype, typicallyinduced by environmental heterogeneity. In thissense, the impacts of the environment on plasticityare analogous to those of genetic mutation on phe-notypic variability. Plastogenetic congruence meansthat plasticity and variability mirror each other: lowplasticity (that is, strong genetic determination) im-plies low variability (that is, strong mutational buff-ering), and vice versa.

Plastogenetic congruence implies that an evo-lutionary reduction of plasticity has a flip side: adecline in variability. This link results in a self-

Page 3: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

244 L. ANCEL AND W. FONTANA

defeating process in which the loss of plasticitythrough natural selection leads to the loss of phe-notypic variability, to the extent of evolutionarylock-in.

Plastogenetic congruence also sheds light onWaddington’s theories for the evolution of organis-mal development (Waddington, ’57). He introducedtwo modes of evolution: environmental canaliza-tion is the honing of developmental pathways toreduce environmental noise, and genetic canaliza-tion is the integration of genetic factors to reducethe deleterious effects of mutation. Under plasto-genetic congruence then, genetic canalization willensue as a byproduct of selection for environmen-tal canalization. This yields a mechanistic expla-nation of a hypothesis put forward by Wagner etal. (’97).

Pairwise epistasis is the influence that an alter-ation of gene i has on the phenotypic consequencesof a subsequent alteration of gene j (Wagner et al.,’98). This definition of epistasis captures the ge-netic control of variability. In RNA, low plasticitycoincides with low variability, maintained by thefixation of epistatic interactions that buffer the phe-notype against mutations. Remarkably, epistaticinteractions in RNA can eliminate variability al-most completely. This phenomenon, which we call“neutral confinement,” contributes to the evolu-tionary lock-in mentioned earlier.

Analytical models built on a stylized version ofplastogenetic congruence predict that, for certainparameter regimes, the mutation rate needed toescape this exploration catastrophe is so high asto result in the loss of the dominant phenotype(error catastrophe). We discuss this issue in sec-tions 4 and 5.

The RNA folding genotype–phenotype map en-ables not only an exploration of evolutionarydynamics but also a characterization of themorphological endpoints of evolution. In section5 we compare three classes of sequences thatshare the same dominant (that is, minimum freeenergy) structure. One set is derived from a ran-dom sample of sequences with the given domi-nant structure, another set has evolved on aneutral network (see section 1.3), and the thirdset results from genetic assimilation under theplastic genotype–phenotype map. The charac-teristic which best distinguishes among thethree classes is modularity. We study modular-ity of shape characters from a variety of per-spectives, all contributing to a definition andquantification of modular traits as “transpos-able characters” that maintain their structural

integrity in different sequence and environmentalcontexts. Although the three classes share the sameminimum free energy structure, that structure isnot even remotely modular on the random se-quences while it is extremely modular on sequencesthat have experienced genetic assimilation.

Genetic assimilation leaves us with sequencesthat possess modular shapes and are at the sametime evolutionarily locked in. This seems to con-tradict the hypothesized evolutionary advantageof modularity: modularity partitions quantitativetraits into independently and easily evolvableunits (Wagner and Altenberg, ’96). While modu-larity may indeed facilitate the quantitative pol-ishing of a trait, it leads to an evolutionary lock-inwith respect to significant structural modificationsof that trait. Resistance to structural change isthe hallmark of a module. Once modules are avail-able, the generation of further evolutionary nov-elty (or plasticity) then may shift from locked-inmodules to the combinatorial arrangement of mod-ules into new units.

1. RNA FOLDING AS A GENOTYPE–PHENOTYPE MAP

1.1. Why RNA?RNA combines genotype and phenotype into a

single molecule. This makes RNA folding in manyrespects a limited, but also a simple, model of agenotype–phenotype map. As a model system, RNAhas the advantages of both computational tracta-bility (Waterman, ’95) and suitability as a substratefor test tube evolution experiments (Joyce, ’89;Landweber, ’99). The RNA sequence–structure re-lation also occupies a rare intermediate level of ab-straction bridging the empirical and the formal.RNA folding algorithms are sufficiently realistic forcomputational discoveries to suggest worthwhileempirical investigations. At the same time, RNAfolding is sufficiently abstract to provide insightand to suggest axioms for the construction of sim-plified models that are analytically tractable.

1.2. Secondary structureRNA molecules are heteropolymers of (predomi-

nantly) four units called ribonucleotides. Ribo-nucleotides have a ribose phosphate in commonbut differ in the base attached to the sugar. Theessence of an RNA sequence is therefore capturedby a string over a four letter alphabet, each letterrepresenting a particular base: A for adenine, Ufor uracil, C for cytosine, and G for guanine. Hy-drogen bonds give rise to stereoselective recogni-

Page 4: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 245

tion between certain base pairs, specifically A·Uand G·C. This base pairing enables an RNA se-quence to be copied into a negative and back againinto a positive. The pairing is not always exact.Error rates depend on the molecular machinerythat assists in pairing and ligating the bases. Forexample, the per nucleotide error rate is 7 × 10–5

to 2.7 × 10–4 for Influenza and 3 × 10–4 for Coliph-age Qβ (Eigen et al., 1989). In this way, base pair-ing enables heredity in RNA viruses. We thereforetreat an RNA sequence as a genotype. The samebase pairing mechanism, however, also enablessegments of a sequence to pair with other seg-ments within the same sequence, causing it to foldback on itself into a three-dimensional structure.(For the formation of an intramolecular structureG·U is also an admissible pair.) This structure con-veys chemical behavior to the sequence and con-stitutes its phenotype.

The rapid replication time and simplicity of thephenotype make RNA a tractable laboratorymodel. RNA molecules can be evolved in the testtube using a variety of techniques for amplifica-tion, variation, and selection. Evolutionary opti-mization of RNA properties in the test tube occursreadily and effectively (Mills et al., ’67; Spiegel-man, ’71; Ellington and Szostak, ’90; Tuerk andGold, ’90; Beaudry and Joyce, ’92; Bartel andSzostak, ’93; Ellington, ’94; Ekland et al., ’95;Landweber and Pokrovskaya, ’99). For a recentreview see Landweber (’99).

Molecular structure in RNA can be character-ized at many levels of resolution. One empiricallywell-established notion is the secondary structure,which is the topology of binary contacts as theyresult from base pairing (Figure 1). The second-ary structure is a useful abstraction, since the pat-tern of base pairs provides both a geometric andthermodynamic scaffold for the tertiary structureof the molecule. This puts the secondary struc-ture in correspondence with some functional prop-erties of the tertiary structure.

A secondary structure on a sequence of lengthn can be represented as a graph of base pair con-tacts (Fig. 1). The nodes of the graph stand forbases at positions i = 1,...,n along the sequence.The set of edges includes the unspecific covalentbackbone connecting node i with node i + 1, for i= 1,...,n – 1, and those indicating pairings betweennon-adjacent positions. The set of such non-adja-cent pairings P has to satisfy two conditions: (i)every edge in P connects a node to at most oneother node, and (ii) if both i < j and k < l are in P,then i < k < j implies that i < l < j. Failure to

meet condition (ii) results in pseudoknots whichare interactions that belong to the next—the ter-tiary—level of structure. Both conditions distin-guish RNA structure from protein structure, inparticular condition (i) which builds RNA second-ary structure exclusively from binary contacts. Weuse a picture of the graph (Fig. 1) as the visuallymost immediate representation of a secondarystructure. We sometimes use a more convenientline-oriented representation of nested parenthe-ses, such as “((((.(((...))).(((...))).)))),” in which a dotstands for an unpaired position, and a pair ofmatching parentheses indicates positions that pairwith one another.

The elements of a secondary structure graph arecertain types of cycles or loops, see Fig. 1. Twocontiguous base pairs constitute the smallest loop.We make the reasonable assumption that the over-all energy of a secondary structure is the sum ofits loop energies. These have been measured andtabulated (Freier et al., ’86; Turner et al., ’88; Jae-ger et al., ’89; He et al., ’91) as a function of loopsize and delimiting base pairs. A stack of base

Fig. 1. RNA secondary structure graph. A secondary struc-ture is a graph consisting of structural elements called cyclesor loops: a hairpin loop occurs when one base pair encloses anumber of unpaired positions, a stack consists of two basepairs with no unpaired positions, while an interior loop hastwo base pairs enclosing unpaired positions. An internal loopis called a bulge if either side has no unpaired positions. Fi-nally, multiloops are loops delimited by more than two basepairs. A position that does not belong to any loop type is calledexternal, such as free ends or joints. Components are shapefeatures delimited by external bases.

Page 5: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

246 L. ANCEL AND W. FONTANA

pairs—a double-stranded region of several contigu-ous base pairs—is the major stabilizing element.The formation of an energetically favorable stack-ing region, however, implies the formation of anenergetically unfavorable loop constraining un-paired bases (for example, a hairpin loop). This“frustrated” energetics leads to a vast combinato-rics of stack and loop arrangements constitutingthe structural repertoire of an individual RNA se-quence. RNA is an excellent model system for se-quence–structure relations in biopolymersprecisely because of our ability to rapidly computethe set of all structures realized by a sequence.In particular, we use dynamic programming tocompute the minimum free energy secondarystructure and statistical mechanics quantities,such as the partition function (Nussinov et al.,’78; Waterman and Smith, ’78; Nussinov andJacobson, ’80; Zuker and Stiegler, ’81; Zuker andSankoff, ’84; McCaskill, ’90). The work in this pa-per makes use of the Vienna RNA folding pack-age (Hofacker et al., ’94–’98), a state-of-the-artlibrary of utilities and RNA folding programs rou-tinely applied in the laboratory.

1.3 Robust properties of RNA foldingRNA folding algorithms vary considerably in the

accuracy of secondary structure predictions for in-dividual instances (Huynen et al., ’97). Our mainconcern, however, is with statistical properties ofthe sequence to structure map as a whole, ratherthan with specific cases. We review two statisti-cal descriptions of the RNA folding map: typicalshapes and neutral networks (Fontana et al.,’93a,b; Schuster et al., ’94; Grüner et al., ’96a,b).

Sequence space and shape space are both veryhigh dimensional, and the sequence space is sub-stantially larger than the shape space. Analyticaltools developed in Stein and Waterman (’78) yieldan upper bound of only Sn = 1.48 × n–3/2 1.85n

shapes vis-à-vis 4n sequences, where n is the se-quence length (Hofacker et al., ’99). The mappingfrom sequences to minimum free energy shapesis significantly many-to-one.

Typical shapesRelatively few shapes are realized with very

high frequency, contrasting many rarely occurringshapes. More precisely, as sequence length goesto infinity, the fraction of such “typical shapes”tends to zero (their number grows neverthelessexponentially), while the fraction of sequencesfolding into them tends to one. Consider a numeri-cal example: In the space of GC-only sequences of

length n = 30, 1.07 × 109 sequences fold into218,820 shapes; 22,718 shapes (10.4%) are typi-cal in the sense of being formed more frequentlythan the average number of sequences per shape,and 93.4% of all sequences fold into these 10.4%shapes (Grüner et al., ’96a,b; Schuster, ’97).

Neutrality and neutral networksMany sequences have the same (typical) shape

α as their minimum free energy structure. We callsuch sequences “neutral” with respect to a. Astructure a thereby identifies an equivalence classof sequences. A one-error mutant of a sequencethat shares the same minimum free energy struc-ture as that sequence is called a “neutral neigh-bor.” By “neutrality” of a sequence we mean thefraction of its 3n one-error mutants that are neu-tral neighbors. This notion of neutrality pertainsto the minimum free energy structures of RNAsequences, and should not be confused with fit-ness-based neutrality.

Any given sequence has a significant fractionof neutral neighbors, and the same holds for theseneighbors. In this way, jumping from neighbor toneighbor, we can map an extensive mutationallyconnected network of sequences that fold into thesame minimum free energy structure (Schusteret al., ’94; Reidys et al., ’97). We termed such net-works “neutral networks” (Schuster et al., ’94).The ability to change a sequence while preserv-ing the phenotype is critical for evolutionary dy-namics. The role of neutrality has been typicallyviewed as a conservative one, buffering the phe-notypic effects of mutations. Neutrality, however,also enables phenotypic change, because it per-mits phenotypically silent mutations to influencethe phenotypic consequences of other mutations.

The boundary of a neutral network is the setof sequences that differ by one mutation from asequence in the network, but are not themselvescontained in the network. Transitions betweenstructures are transitions between adjacent neu-tral networks. This suggests a measure of near-ness between RNA structures based on thefraction of common boundary shared betweentheir corresponding networks in sequence space(Fontana and Schuster, ’98a,b), rather than onshape similarity. RNA shape space is then orga-nized by an accessibility relation based on the ad-jacency of neutral networks in genotype space.Such a topology enables a formal definition of con-tinuous and discontinuous phenotypic change in-dependently of fitness criteria (Fontana andSchuster, ’98a). Figure 2 describes the RNA shape

Page 6: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 247

transformations that are discontinuous in this ac-cessibility sense.

1.4. Plasticity in RNAPlasticity is the genetic determination of a range

of phenotypic possibilities, where the realized phe-notype depends on the environmental context ofthe organism. If the environment is relatively con-sistent or the trait is fixed in development, thenplasticity may only be evident over several gen-erations. In other cases, a plastic individual may

change phenotypes during its lifetime throughlearning or by reacting to environmental stimuli.For a single RNA molecule, unspecific contact witha heat bath triggers transitions between molecu-lar configurations, provided the energy barriersbetween configurations are sufficiently low. Therange of configurations realized by an RNA mol-ecule at a constant temperature depends on theenergies of the configurations. The kinetic processof folding into the minimum free energy configu-ration is the RNA analogue of development. Ran-

Fig. 2. Continuous and discontinuous RNA shape trans-formations. The figure illustrates transformations betweenRNA secondary structure motifs. Solid (dashed) arrows indi-cate continuous (discontinuous) transformations in the to-pology of Fontana and Schuster (’98). Three groups oftransformations are shown. Top: The loss and formation ofa base pair adjacent to a stack are both continuous. Middle:The opening of a constrained stack (e.g., closing a multiloop)is continuous while its creation is discontinuous. This reflects

the fact that the formation of a long stack upon mutation ofa single position is a highly improbable event, whereas theunzipping of a random stack is likely to occur as soon as amutation blocks one of its base pairs. Bottom: Generalizedshifts are discontinuous transformations in which one or bothstrands of a stack slide past one another, ending up with orwithout an overlap. Accordingly, generalized shifts are dividedinto the four classes shown.

Page 7: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

248 L. ANCEL AND W. FONTANA

dom transitions between kinetically accessiblestructures that are energetically close to the mini-mum free energy configuration are an unavoid-able consequence of the stochasticity of molecularmotion, akin to “developmental noise.”

A microscopic kinetic folding model for RNA hasbeen developed recently (Flamm et al., ’99), butat present it is too involved for the goals of thispaper. Instead, we exploit an extension of the stan-dard thermodynamic minimum free energy fold-ing algorithm which permits the computation ofall secondary structures within some energy rangeabove the minimum free energy (Wuchty et al.,’99). This suboptimal folding algorithm yields fastaccess to the low-energy portion of the secondarystructure space of a given sequence. We neglectenergy barriers and assume that a sequenceequilibrates among all structures whose free en-ergy is within 5kT from the minimum free en-ergy configuration. The 5kT choice amounts toapproximately 3 kcal at 37°C and corresponds tothe loss of two G·C/C·G stacking interactions. Un-der thermodynamic equilibration, the Boltzmannprobability of a structure σ, exp(–∆Gσ/kT)/Z, cor-responds to the overall fraction of time that themolecule spends in σ, where ∆Gσ is the free en-ergy of structure σ, k is the Boltzmann constant,T the absolute temperature, and Z = Στ exp(–∆Gτ/kT), the partition function. The latter is computedby an algorithm described in McCaskill (’90).

This defines a genotype–phenotype map thattakes a sequence to a set of structures and theiroccupation times. We shall refer to this map asthe “plastic map,” to distinguish it from the“simple map” where a sequence is associated withits minimum free energy structure only (Fig. 3).

Under natural selection toward a target struc-ture (section 2), the obvious advantage for a plas-tic sequence that covers a broad spectrum ofstructures is the increased likelihood that someof them are structurally similar to the target. Aspointed out in Scheiner (’93) the cost of plasticitymay be in terms of maintaining the cellular ma-chinery required for plasticity, rather than the di-rect impact on fitness due to the realization of aparticular plastic trait. In the stripped down worldof RNA there is no such machinery beyond themolecule itself. Yet, the cost of plasticity is evi-dent: the broader the range of structures, the lesstime the sequence will spend in any one of them.Thus, even if some structures constitute an im-provement, this can easily be undermined by asmall occupancy time. Incidentally, this is analo-gous to Schmalhausen’s argument (’49) that one

cost of plasticity is given by “erroneous” pheno-typic changes (Ancel, ’99a). The erroneous changesin RNA are residencies in detrimental structures.

Biologists have drawn a distinction betweentwo kinds of plasticity: phenotypic plasticityproper often refers to beneficial responses tomacroenvironmental variation, while environ-mental variance refers to flexible responses tomicroenvironmental parameters (Waddington,’57; Gavrilets and Hastings, ’94). Our work inRNA considers the latter. A given sequence as-sumes a range of structures in response to itsmicroenvironment—energy fluctuations at aconstant temperature.

1.5. Epistasis in RNAEpistasis is the extent to which the phenotypic

consequences of a mutation at position i dependon the genetic background provided by the re-maining sequence. One common estimate for theepistasis of a genome measures the deviation ofpairwise gene interactions from additivity (Wag-ner et al., ’98).

Some epistatic interactions increase the neutral-ity of a sequence by making mutations at othersites inconsequential. Consider the sequenceand its minimum free energy structure at thecenter of Fig. 4. We call a sequence position“neutral” if at least one out of three possiblemutations at that locus leave the structure in-variant. The neutral positions, like the one la-beled x, are marked with grey bullets. Theneutral mutation from C to G at position xyields the (same) structure shown at the topleft of Fig. 4. The “+” symbols indicate positionsthat have become neutral as a result of thismutation, while the “–” symbol marks a posi-tion that has lost its neutrality. This illustratesthat even if a point mutation does not affectthe structure, it can alter the extent of neu-trality across the sequence. Epistasis as the ge-netic control of neutrality plays an important rolein the evolutionary dynamics we describe in sec-tion 2.

1.6. Neutrality: a note in terminologyIn section 1.3 two sequences are called “neu-

tral” if they share the same minimum free en-ergy structure. The term neutral, however, maybe used to indicate equal fitness. Under the simplemap, the phenotype of an RNA sequence is justits minimum free energy structure (Fig. 3). In thiscase, neutrality with respect to minimum free en-ergy structures implies neutrality with respect to

Page 8: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 249

both phenotype and fitness. Under the plasticmap, however, the phenotype of an RNA sequenceis a repertoire of energetically favorable struc-tures. In this case, neutrality with respect to mini-mum free energy structure is not equivalent toneutrality with respect to phenotype or fitness.Two sequences that share a minimum free energystructure, will differ in their remaining repertoire.Nevertheless, we maintain the use of minimum

free energy neutrality for two reasons. First, it isrelevant to the phenotype of sequences under theplastic map, as they spend a majority of time inthe minimum free energy structure. Second, evo-lutionary dynamics under the simple map—whichconstitute our baseline for comparison—have his-torically been characterized by this notion (Huy-nen et al., ’96; Fontana and Schuster, ’98a; Reidyset al., ’98).

Fig. 3. Genotype–phenotype maps. The lower part illus-trates the simple map that takes a sequence to its minimumfree energy structure as the only phenotype. The upper partschematizes the plastic map that takes a sequence to the ther-modynamic spectrum of shapes within 5kT (typically T =310.15 K). If shape σ has free energy ∆Gs, the sequence is

assumed to spend a fraction of time in s that is given by itsBoltzmann factor, exp(–∆Gs/kT)/Z (values indicated on theleft). The minimum free energy structure is the dominantphenotype, in the sense that the sequence spends the largestfraction of time in it.

Page 9: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

250 L. ANCEL AND W. FONTANA

2. SIMULATED EVOLUTION IN PLASTICAND NON-PLASTIC RNA POPULATIONS

2.1. Model setupIn this section we describe simulations of evolv-

ing RNA populations. Under the simple genotype–phenotype map (section 1.4) a sequence S has asingle minimum free energy structure σ0 as itsphenotype. To determine the fitness of S, we com-pare s0 to a prespecified target structure t.

This fitness function is motivated by an experi-mental protocol in which RNA sequences areevolved, for example, to optimally bind a ligand.RNA sequences are artificially selected by run-ning the RNA sample through a column that hasthe ligand tethered to the filling material. The de-sirable portion of the sample remains bound inthe column and is eluted by a solvent with suit-able ionic strength (Ellington and Szostak, ’90;Tuerk and Gold, ’90). The selected portion is then

subject to a further cycle of evolution by amplifi-cation through replication at a controlled errorrate. The evolutionary end product is typicallyunpredictable in the laboratory situation. Its pos-sible shape(s) are, however, implicitly specified bythe choice of the ligand, like a simple lock speci-fies its key. We are not seeking RNA shapes withparticular chemical properties, since little isknown about the link between an RNA structureand its binding properties or catalytic behaviors.Instead, we specify the optimal shape directly atthe outset, shortcutting the role of ligands in thelaboratory. We then study the evolutionary dynam-ics, the evolutionary histories, and the thermody-namic properties of evolved sequences.

We define the selective value f(s) of shape s asa hyperbolic function of the Hamming distanced(s,t) between the parenthesized representations(section 1.2) of σ and the target t:

Fig. 4. Epistasis.

Page 10: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 251

=+

1( ) ,0.01 ( , ) /

fd n

σσ t

(1)

where n is the length of the sequence. The resultsreported here are robust to changes in the func-tional form of the selective value. In particular, theyalso hold under linear and exponential forms.

The fitness rj of a sequence Sj is its replicationrate constant, and for the non-plastic case we sim-ply take it to be:

= 0( ).jjr f σ (2)

Our population evolves in a model chemical flowreactor whose outflow is regulated to maintain anearly constant total sequence population size(Eigen, ’71). The entire system is simulated interms of continuous time stochastic chemical re-action kinetics. [For a description see Fontana andSchuster (’87) and Huynen et al. (’96); for a gen-eral simulation technique see Gillespie (’76, ’77).]The number of replication events per time unit inthe flow reactor depends on the replication rateconstants of the individual sequences comprisinga population, and changes over time as the popu-lation evolves to larger rate constants. When com-paring different runs, we therefore plot statisticsalong replication events rather than the timeshown by an external clock.

Point mutations are the sole source of geneticvariation in our simulations. Unless otherwisestated, the replication accuracy per nucleotide po-sition is 0.999, the sequence length is 76 (whichcorresponds to a short tRNA sequence), and thereactor capacity is 1,000 sequences. Simulationsbegin with a homogenous population of a ran-domly generated sequence species.

The plastic fitness function is a simple exten-sion of the one above. For a sequence Sj, we con-sider all of its suboptimal structures s i

j whose freeenergy E(s i

j) is within an interval of size ∆ abovethe minimum free energy E(s 0

j). The s ij are indexed

with increasing energy, where index 0 refers to theground state, and the set of all suboptimal struc-tures accessible within [E(s 0

j), E(s 0j) + ∆] is denoted

with σj(∆). Unless otherwise specified, ∆ = 5kT,where T is the absolute temperature, which is fixedat 310.15 K (37°C), and k is the Boltzmann con-stant, k = 1.98717 × 10–3 kcal/K. Thus, ∆ = 3.08kcal. The macroscopic environment of our modelconsists in the target shape t and the temperatureT. Throughout this paper we only consider fixed tand T.

The selective value of a suboptimal structure

s ij is given by f (s i

j) (eq. 1), exactly as in the non-plastic case. For a plastic sequence, however, s i

j

contributes this value to the overall fitness of thesequence in proportion to the time the sequencespends in it. In the laboratory, a sequence flow-ing through the selection column will switchamong its alternative structures, such that eachstructure contributes to the overall binding prob-ability of the sequence in proportion to its Boltz-mann factor. Hence,

∈ ∆

−= ∑

644474448selective value probability of of

( )

exp( ( ) / )( ) ,

ji

ji

j ji

jj i

j i

E kTr f

Z

σσ

σ σ

σσ (3)

where Z is the partition function. Here we assumethat a sequence achieves thermodynamic equali-bration among its alternative structures. High-en-ergy barriers between the structures, however, mayrender this assumption kinetically unrealistic ontime scales shorter than the experiment. Our un-derstanding of the kinetic process of RNA foldingis too immature at present to assess this claim.

2.2. Discussion of sample runsRNA shape space topology is invariant toplasticity

Figures 5 and 6 juxtapose the progress of a simpleand a plastic population that evolve toward a tRNAtarget shape. We will denote population averageswith –·$. The black curves trace –d(s 0

j,t)$, the aver-age distance between the target and the minimumfree energy shapes realized by the sequences inthe population.

The plastic and simple populations exhibit quali-tatively similar behavior: initial relaxation followedby punctuated dynamics. Steps toward the targetshape correspond to difficult transformations fromone dominant shape to another. Recall from sec-tion 1.3 that the difficulty of discovering minimumfree energy shapes through mutation is indepen-dent of the fitness assigned to shapes. An evolu-tionary transformation of one shape into anothercorresponds to a transition between their neutralnetworks in sequence space. In the simulations,flat periods correspond to populations that aremutationally isolated from higher fitness pheno-types and are genetically diffusing along a shape-neutral network or along several shape-neutralnetworks that are fitness-neutral with respect toeach other. Steps, or difficult shape transforma-tions, are transitions between neutral networkswith a small common border. Such shape trans-

Page 11: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

252 L. ANCEL AND W. FONTANA

formations involve the simultaneous shift of sev-eral base pairings in a single mutational event.

The gray curves in Figs. 5 and 6 monitor–d(s j,t)$, the population average of theweighted shape distance. For any given sequence,–d(s j,t)$ is the Boltzmann weighted average dis-

tance between its suboptimal shapes and thetarget:

∈ ∆= −∑

( )

1( , ) ( , )·exp( ( ) / ).j ji

j j ji id d E kT

Z σ σσ σ τ σt (4)

Fig. 5. Evolution under plastic and simple genotype–phe-notype maps (example 1). These graphs depict the evolutionof the population average of the distance between the mini-mum free energy structures in the population and the target(a), as well as the average weighted shape distance to target(b). (See text for definitions.) The inset magnifies the trajec-

tories through the first few million replication events. In theplastic case, fitness improvements correspond to transitionsmanifest in the average weighted distance to the target. Thetarget is a tRNA cloverleaf structure, the population size islogistically constrained to fluctuate around 1,000 sequences,and the replication accuracy is 0.999 per position.

Fig. 6. Evolution under plastic and simple genotype–phenotype maps (example 2). Seecaption to Fig. 5.

Page 12: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 253

In the plastic case, –d(s 0j,t)$ dictates the trajec-

tory of –d(s j,t)$. The minimum free energystructure of a sequence S is usually closer tothe target than any other suboptimal structureof S. When this is the case, the Boltzmannweighted average distance to target over allsuboptimals, not just those within s j(∆), wouldnecessarily be larger than the average mini-mum free energy distance to target, –d(so

j,t)$.But –d(s j,t)$, which considers only the shapesin s j(∆), remains less than –d(s 0

j,t)$. This indi-cates that the suboptimal structures are not muchworse than the minimum free energy structureor that their probabilities are relatively small. Be-low we show that both of these factors play a role.This is in contrast to the statistics for evolutionunder the simple map where the –d(s j,t)$ fluctu-ates wildly at values much higher than –d(s 0

j,t)$.

No Baldwin expediting effectFigures 5 and 6 represent the outcomes of a

large set of similar simulations: evolution proceedsmuch more slowly under the plastic map, andplastic populations never approach the target asclosely as do simple populations, which often at-tain the target. We defer the explanation of thisstriking contrast until section 3.

Here we take issue with a generalization oftendrawn from the Hinton and Nowlan class of mod-els (Hinton and Nowlan, ’87; Maynard-Smith, ’87)that phenotypic plasticity expedites evolution byeffectively smoothing the fitness function. Morespecifically, they claim that plastic individualsscan a wider range of phenotypes, and thereforehave a better chance of locating advantageousones than non-plastic individuals who are blindto other possibilities. Plasticity will accelerate ge-netic evolution, however, only if the phenotypesrealized through plasticity correspond to the phe-notypes that can be reached through mutation. Insection 3 we show that such a correlation exists inRNA (plastogenetic congruence), and that it ulti-mately leads populations into an evolutionary dead-end. In this section, we counter the claim thatplasticity speeds up the early stages of the evolu-tionary process. On average, plasticity does notshorten the early periods of stasis preceding phe-notypic innovations. To see this, we compare theearly evolutionary trajectories under the plasticmap and the simple map (see the inset of Fig. 5,which magnifies the first few million replications).

Recent work specifies the restrictive population-genetic conditions for a Baldwin expediting effect(Ancel, ’99b). In the present case, the absence of

an expediting effect results from features intrin-sic to the RNA genotype–phenotype map. Considerthe evolutionary transition from a structure α toa higher fitness structure b. The simple popula-tion must wait for a mutation that takes a se-quence from minimum free energy structure a tominimum free energy structure b. In the plasticpopulation, this transition may be mediated by asequence that has minimum free energy structurea and suboptimal structure b. Call this sequenceS. The opportunity to encounter b as a subopti-mal may expedite the production of a mutantwith minimum free energy structure b, if thereis a correlation between the suboptimal struc-tures of a sequence and the minimum free en-ergy structures produced through mutations onthat sequence.

Assume that b is difficult to access from α inthe shape space topology sensu Fontana andSchuster (’98a) (section 1.3). This is typically thecase at the major phenotypic transitions in thesimulations. We claim that the discovery of indi-vidual S under the plastic map is approximatelyas unlikely as the first appearance of an individualwith b as its minimum free energy shape in a non-plastic population dominated by a. That is, plas-ticity does not significantly expedite the searchfor a higher fitness structure.

First we note that the individual S is alwayspossible in principle. If we relax conditions by ask-ing if there is a sequence S′ that can assume bothshapes a and b irrespective of their energies, thatis, dropping the requirements that b is in the 5kTband and a is the minimum free energy struc-ture, then such a sequence S′ exists for any choiceof a and b. This was proved by Reidys et al. (’97).A sequence is said to be compatible with a struc-ture if it can form that structure, regardless ofwhere it ranks energetically. A given structure α,then, determines a set consisting in all sequencescompatible with a. Our individual S′ must there-fore be an element of the intersection between thecompatibility set of a and the compatibility set ofb. The theorem of Reidys et al. (’97) states thatthe intersection of any two compatibility sets isnon-empty.

One can compute the size of these intersectionsWeber (’97). Figure 7 provides a nontechnical il-lustration. A sequence is compatible with a struc-ture α if positions that are supposed to pair in αare occupied by nucleotides that are actually ableto pair. Satisfying two structures at once entailsadditional constraints. If the two structures are“independent,” meaning that positions that pair

Page 13: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

254 L. ANCEL AND W. FONTANA

in one structure are distinct from positions pair-ing in the other, the constraints simply add up.If, however, the two structures are non-indepen-dent, additional constraints arise because certain

positions must now satisfy two pairings simulta-neously, and, depending on how the two structuresare related, such constraints propagate alongchains of dependencies. The constraints on nucle-otides for a sequence that must satisfy two par-ticular structures increase with the number andlength of such chains. The most severe constraintsarise from structure pairs that differ in shifts andtheir generalization (see Figs. 2 and 7). Thesehighly constrained sequences are precisely thosethat enable the difficult shape transformations inthe accessibility topology of Fontana and Schuster(’98a,b). These transformations correspond to theevolutionary steps of Figs. 5 and 6. Plasticity ex-pedites these transformations only after the popu-lation finds these highly constrained sequences,that is, the intersection of the compatibility setsof a and b. Yet, the difficult transformations cor-respond to the smallest overlap regions. These re-gions are, in essence, as hard to find as theboundary between the neutral networks of a andb. The benefit gained once the population is inthe overlap regions is negligible compared to thetime required to find these regions in the firstplace. The search for these regions is made evenmore challenging by the requirement that the in-termediary sequence must not only be compatiblewith a and b, but it must have α as its minimumfree energy structure and b within 5kT. There-fore, if the transformation from a to b is hard toachieve in the simple case, it is approximately asdifficult to achieve in the plastic case. In this way,the failure of plasticity to expedite the discoveryof new structures stems from the organization ofthe RNA genotype–phenotype map.

Environmental canalization (reductionof plasticity)

In Fig. 8 we monitor the time course of fivepopulation plasticity statistics for the plastic case.We discuss three of them here. Curve i in Fig. 8shows the average fraction of the partition func-tion realized within 5kT, which is

∈ ∆= −∑

( )

1 exp( ( ) / ).j ji

j jiz E kT

Z σ σσ (5)

This indicates the total probability of the struc-tures in 5kT relative to all structures attainableby sequence Sj, that is, how much of the partitionfunction is accounted for by the 5kT neighborhoodof the minimum free energy structure. Curve ivshows the average number of structures realized

Fig. 7. Satisfying two structures at once (after Weber ’97).Top: Two structures that differ in a “shift”—a transforma-tion in which one strand slides past the other (see Fig. 2).Middle: The same structures represented as circles with thebase pairs drawn as chords. Bottom: The two circles super-imposed to illustrate the additional constraints on a sequencethat arise from having to satisfy both structures simulta-neously. Position 1 must have a nucleotide that can pair withboth position 13 and 12. But the nucleotide choices at 12 musttake into account its complementarity to position 2 as well,and so on until the end of this chain is reached at position 9.Any such chain can be satisfied, and hence there exists atleast one sequence compatible with any two structures. [Asecondary structure can be thought of as a permutation act-ing on sequence positions. For example, the permutation cor-responding to the structure on the left specifies that positions{5, 6, 7, 8, 9} are fixed points, and that 4 is assigned to 10, 10to 4, 3 to 11, 11 to 3, etc. The two permutations correspond-ing to the two structures generate a dihedral group, and thechain of constraints in the bottom circle is one orbit obtainedby the action of this group on the sequence positions (Reidysand Stadler, ’96; Reidys et al., ’97).]

Page 14: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 255

in 5kT, –|s j(∆)|$, while curve v tracks the aver-age structure variance, –var s j(∆)$, with

∆ = −22

0 0var ( ) ( , ) ( , ) ,j j j j ji id dσ σ σ σ σ (6)

where the bar indicates an average over the struc-tures in s j(∆). In eq. (6) we make use of the factthat the notion of variance can be generalized tosets (here of structures) for which a mean does

not exist, but for which one can compute pairwisedistances between the elements. Finally var s j(∆)conveys information about the structural diver-sity in the plastic repertoire of sequence Sj, while|s j(∆)|simply counts the structures. These arethree quantitative measures of RNA plasticity. Wejuxtapose these to adaptive events indicated bythe evolutionary trajectories (curves vi and vii).

After an initial period, –|s j(∆)|$ levels off to be-tween 126 and 254 structures for the simple case

Fig. 8. Plastic evolution. The time evolution of fiveplasticity statistics (i–v) are shown in conjunction withthe major shape transitions (curves vi and vii). See text

for the definitions of the quantities monitored. The insetenlarges a population-level Simpson–Baldwin effect dis-cussed in the text.

Page 15: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

256 L. ANCEL AND W. FONTANA

(not shown). In contrast, the plastic curve (iv) at-tains much lower, fairly constant levels of about30, 13, and 20 structures. This shows a popula-tion-wide reduced plasticity compared to thesimple case and the initial condition (204 struc-tures in 5kT). The loss of plasticity occurs veryrapidly. The reduction is also evidenced by the av-erage fraction of the partition function, –z j

$ (curvei), which rapidly climbs to 0.9 and settles at 0.95,while the simple case remains in the range [0.65,0.75] (not shown). The structure variance, curvev (range in the plastic case [7, 15], versus rangein the simple case [46, 92]), indicates that alter-native structures in the 5kT band become moresimilar to the ground state.

Transitions between evolutionary epochs aremarked by dashes running vertically through allgraphs. The dynamics surrounding a transitionbreak down into four episodes: (a) the populationis dominated by a structure with no indicationsof a better structure; (b) a sequence with a bettersuboptimal structure arises and dominates thepopulation; (c) natural selection favors mutationsthat move that structure to the minimum free en-ergy; (d) it then favors subsequent mutations thatstabilize the new minimum free energy structureby eliminating variation in the remaining plasticrepertoire. As described above, populations spendlong periods of time in stage a. Stage b is rapidand typically does not transpire at the populationlevel. Stage c, during which the new structure hasbecome the minimum free energy structure buthas not yet been stabilized, is well-illustrated bythe dip in –z j$ immediately after transitions 2, 3,and 6 in Fig. 8. The subsequent rapid assimila-tion to pre-transition levels reflects the stage d.We see even more dramatic evidence for the fourthstage at the beginning of the evolutionary trajec-tory, immediately after transition 1, with the dropin the number of structures within 5kT, –|s j(∆)|$

(curve iv) in Fig. 8.

Example of a population-level Simpson–Baldwin effect

The four episodes surrounding transitions con-stitute the Simpson–Baldwin effect. As discussedin the Introduction, there is a discovery phase(stages a and b) and an assimilation phase (stagesc and d). Transitions 4 and 5 in Fig. 8 are some-what unique in that the entire Simpson–Baldwineffect is manifested at the population-level, al-though not strictly chronologically. At transition4, a new advantageous sequence species S with asuboptimal structure b that is closer to the target

than its minimum free energy shape α begins toinvade the population. After 106 replication eventsthe first sequences S′ appear with b as their mini-mum free energy structure. The population retainsa, however, as a frequent suboptimal shape withhigh Boltzmann weight. Mutation shifts aroundweights between a and b. Although a is selectivelyworse than b, this can be offset by a higherBoltzmann probability. Shape a thereby succeedsin remaining the most prevalent minimum freeenergy structure. This shows up as a discrepancybetween the population averages of the weighteddistance to target (curve vii), –d(s j, t)$, and thedistance based only on the minimum free energystructures (curve vi), –d(s 0

j, t)$; see the enlargedinset at the bottom of Fig. 8. Upon 2 × 105 furtherreplications, sequences with β as the minimumfree energy structure take over the population,and –d(s 0

j, t)$ consequently drops toward –d(s j,t)$.The situation reverses, however, as Boltzmannprobabilities shift again. After 6.5 × 105 replica-tions, b disappears completely as a minimum freeenergy structure and is observed only as a subop-timal. Finally, 3 × 105 replications later the finaltransition back to sequences with b as minimumfree energy structure occurs, accompanied by arealignment of –d(s j,t)$ with –d(s 0

j, t)$. The plot of–zj$ indicates a significant subsequent reductionin plasticity that stabilizes b (label “f” in graph iof Fig. 8). In terms of minimum free energy struc-tures, events 4 and 5 taken together constitute onesingle transition that occurs in two stages. Event 4marks the onset of the Simpson–Baldwin discov-ery phase which lasts until event 5 when the as-similation phase begins. This is the only examplewe encountered of both Simpson–Baldwin phasesbeing revealed at the population level.

2.3. SummaryA quantitative characterization of simulated RNA

evolution under the plastic and the simple geno-type–phenotype maps, shows the following points:

• The punctuated character of the evolutionarydynamics holds for RNA populations evolv-ing under both the simple and the plasticgenotype–phenotype map.

• The populations evolving under the plasticmap halt evolutionary progress much fartheraway from the target structure than thoseevolving under the simple map.

• Plastic populations show no evolutionaryspeed-up (or Baldwin expediting effect); over-all they exhibit a strong slow-down.

Page 16: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 257

• A strong reduction in plasticity occurs in theplastic case. The Simpson–Baldwin effect isobserved along individual founding lineagescausing the phenotypic transitions. The dis-covery phase, however, typically occurs rap-idly and is not evident in population-levelstatistics. We observed one example of aSimpson–Baldwin effect that was entirely ap-parent at the macroscopic population level.

3. PLASTOGENETIC CONGRUENCE ANDNEUTRAL CONFINEMENT

3.1. Plastogenetic congruenceSection 2 demonstrates that a plastic popula-

tion evolving toward a constant target undergoesa drastic narrowing of plastic repertoires and ul-timately hits an evolutionary dead-end. In the cur-

rent section, we construct a causal bridge fromthe loss of plasticity to the loss of evolvability byarguing a significant correlation between thestructures available to a sequence in its plasticrepertoire and the minimum free energy struc-tures present in the mutational vicinity of thatsequence. First we use a concrete example to pro-vide intuition for the mechanistic underpinningsof such a correlation. Then through statistical evi-dence, we claim that it is a fundamental propertyof the genotype–phenotype map.

The top left corner of Fig. 9 shows a short se-quence A and its minimum free energy structurea together with the list of all suboptimal configu-rations within 5kT of the minimum free energy.While the sequence spends 52% of the time in a,its relative stability is only marginal, as it com-petes with a number of energetically close subop-

Fig. 9. Plasticity and minimum free energy shapes of ge-netic neighbors. The figure illustrates how the plasticity of asequence correlates with the minimum free energy structuresof one-error mutants. Arrows show point mutations. The mu-

tation from B to B′ involves the same nucleotide substitutionat the same position as the mutation from A to A′. The onlydifference is the slight change in context due to the neutralmutation from A to B. See text for details.

Page 17: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

258 L. ANCEL AND W. FONTANA

timal structures. These alternative structures areenergetically easy to access because base pair in-teractions holding together the stem in a areweak. For example, the entire stem can switchinto a different position, such as in the 4th sub-optimal configuration b.

Significantly, a’s marginal stability coincides withan increased sensitivity to mutations. For example,mutating a position inside a’s loop (arrow) tips thethermodynamics in favor of b. The result is achange in the minimum free energy structure froma to b (upper right corner of Fig. 4).

Now consider a neutral mutation—one that doesnot displace α as the minimum free energy struc-ture—transforming A into B (lower left corner ofFig. 4). The mutation strengthens the thermody-namic stability of the stacking region of a. TheBoltzmann weight of a increases from 52% in se-quence A to 88% in B. Notice that the set of alter-native structures available to B within 5kT hasbecome considerably smaller compared to A. Inaddition, the few alternative structures that oc-cur with appreciable probability have becomemore similar to the ground state. The relativestability of b (which is still compatible with themutated sequence) has dropped by one order ofmagnitude and disappeared from the set of 5kTsuboptimal structures in B. This coincides withan increased robustness of a towards muta-tions. In fact, the same nucleotide change thatled from A to A′, changing its minimum freeenergy structure, leaves α invariant when ap-plied to sequence B.

The upper arrow from A to A′ illustrates the cor-relation between structures in the plastic reper-toire and structures in the mutational vicinity. Inparticular, this point mutation changes a (5kT) sub-optimal structure into the minimum free energystructure. It is not difficult to imagine that a se-quence with b in its plastic repertoire is much morelikely to have a genetic neighbor with b as the mini-mum free energy structure, then a sequence thatlacks b in its plastic repertoire. The arrows from Ato A′ and further to B′ illustrate the epistatic con-trol of neutrality (section 1.5). This is a special caseof the correlation just described: the more time asequence spends in its minimum free energy struc-ture, the higher the fraction of neutral neighborsin its genetic vicinity.

To summarize, we call a shape s “plastically ac-cessible” to a sequence S if s is in the plastic rep-ertoire of S. Likewise, s is “genetically accessible”from S, if there exists a one-error mutant S′ of S,such that s is the minimum free energy struc-

ture of S′. The alignment of plastic accessibilitywith genetic accessibility means that the set ofshapes into which a particular sequence can fold(plastic accessibility) strongly correlates to theminimum free energy shapes realized by its one-error mutants (genetic accessibility). We call thisproperty of the genotype–phenotype map “plasto-genetic congruence.”

The genetic accessibility of phenotypes con-strains an evolutionary trajectory under thesimple map (Fontana and Schuster, ’98a,b). In theplastic simulations, however, plastic accessibility,through its alignment with genetic accessibility,has an equally profound impact on the evolution-ary dynamics. We return to this after a quantita-tive demonstration of plastogenetic congruence forthe RNA folding map.

Ideally we would perform statistics on a broadsampling of genotypes to show the extent of over-lap between plastic repertoires and minimum freeenergy structures of genetic neighbors. In lieu ofthis computationally prohibitive approach, wepresent three pieces of partial evidence for plasto-genetic congruence:

1. The frequency of a structure b as a minimumfree energy structure among the one-mutantneighbors of a sequence S is significantlylarger for sequences that have b in their plas-tic repertoires than for sequences that do not.

2. The minimum free energy structure a of asequence S is present at high frequency inthe plastic repertoires of one-mutant neigh-bors of S.

3. For any advantageous shape b in the plasticrepertoire of any sequence S, S can typicallyevolve to another sequence S′ with b as itsminimum free energy structure, in only (onaverage) five point mutations.

In the first approach we generate random se-quences Sα with a particular structure a as theminimum free energy structure [“inverse folding,”see Hofacker et al. (’94)]. Since two sequences thatshare the same minimum free energy structureoften share other suboptimal structures, thissimple procedure yields sets of sequences Sb

a thatshare the same ground state α and that also havea particular shape b in their 5kT-plastic reper-toire. We simultaneously obtain control samplesof sequences Sa that specifically lack the shape bfrom their plastic repertoire. We then compute theminimum free energy shapes of all one-mutantneighbors of sequences in the sample Sb

a and in

Page 18: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 259

the control sample Sa, and compare the frequen-cies of sequences with b as the minimum free en-ergy shape. Although a systematic exploration ofthe possible shape combinations a and b is un-feasible, Table 1 provides anecdotal evidence forour proposition: A sequence (with minimum freeenergy structure a) that has a structure b amongits 5kT suboptimals is much more likely to havea one-error mutant with b as its minimum freeenergy structure than a sequence (with minimumfree energy structure a) that lacks b among its5kT suboptimals.

Table 1 shows ratios of likelihoods. The topseven structures b are in the neighborhood of αin the sense of the shape topology developed inFontana and Schuster (’98a,b). We are able to at-tain larger samples for the top structures as theyappear more frequently as suboptimals of se-quences with minimum free energy structure a.The likelihood ratios indicate that these subopti-mal structures are easily converted into minimumfree energy structures by a single point mutation.

The last row of Table 1 shows the only counter-example we found when α is the tRNA cloverleaf.We observe similar patterns for other groundstates a, and for sequence lengths other than 76.

In a second approach, we generate a sample ofsequences with minimum free energy structure α(a “neutral set” of a), and check for the presenceof α in the plastic repertoires of sequences ob-tained by one point mutation from the neutral set.We search and calculate statistics over only theone-mutant neighbors that are compatible (seesection 2.2) with α, since only these can have aas a suboptimal in the first place. (Given a se-quence S, the fraction of one-error mutants com-patible with a is 1 + (nGU – 5nbp)/3n, where n isthe sequence length, nbp is the number of basepairs in a and nGU is the number of GU pairingsthat would occur when S folds into a.) While se-quences that are compatible with a have, by defi-nition, a among their suboptimals at some energy,we are only interested in the limited plasticityrange of 0 ≤ ∆ ≤ 10kT from the minimum free

TABLE 1. Plastogenetic congruence I1

Ground state (a)((((((...((((........)))).(((((.......))))).....(((((.......))))).))))))....

Suboptimal shape (b) o-ratio n-ratio

((((((...((((........)))).((((((.......)))).....(((((.......))))).)))))).... 9.0 8.0(((((....((((........)))).(((((........)))).....(((((........)))))..))))).... 7.4 6.7((((((...(((..........))).(((((........)))).....(((((........))))).)))))).... 6.6 4.9.(((((...((((........)))).(((((.......))))......(((((.......))))).)))))...... 6.2 4.8.........((((........)))).(((((........)))))......(((((.......)))))).......... 14.0 3.9((((((...((((.......)))).(((((.......)))))......((((..........)))).)))))).... 4.0 3.3((((((...((((.......)))).((((.........)))).....(((((.......))))).)))))))).... 3.5 2.8

(((((((....(((.......)))..(((((.......))))).....(((((........))))).)))))).... 15.8 14.0..((((...((((........)))).(((((.......))))).....(((((........))))).))))...... 9.7 10.8((((((...((((........)))).(((((.......))))).(...(((((........)))))))))))).... 7.5 7.5((((((.(.((((........)))).)((((.......))))......(((((........))))).)))))).... 10.8 7.1((((((...((((........))))..((((........))))......((((........))))).)))))).... 7.0 6.2(((((....((((........)))).(((((.......)))))...(.(((((.......))))).)))))).... 7.3 5.7((((((...((((.(.....))))).(((((......))))).....(((((.......))))).)))))).... 6.3 5.5((((((....(((........)))(.(((((.......))))).)...(((((.......)))).)))))).... 7.0 4.7((((((.(.((((.......))).(((((........))))).......((((.......)))).)))))).... 5.2 4.2((((((...((((.(....).)))).(((((.......))))).....(((((......)))).)))))).... 2.8 3.2((((((.(.((((.........)))).(((((......))))).....(((((......))))))))))).... 2.7 2.4((((((.(.((((.........))))((((((......))))).....(((((......))))).))))).... 0.5 0.41Samples of sequences with a tRNA cloverleaf as the minimum free energy structure a (bottom line) and various structures b as suboptimalconfigurations were generated as described in the test. The frequency of b becoming the minimum free energy structure upon one pointmutation was computed for the sample of sequences having b as a suboptimal and for a control sample of sequences lacking b within a 5kTrange from a. The ratio of the former to the latter is tabulated. Two kinds of frequencies were computed: the frequency with which b occursas a ground state among all one-error neighbors (“occurrence frequency” and corresponding o-ratio) and the frequency with which it occurs atleast once in a 1-error neighborhood (“neighborhood frequency” and corresponding n-ratio). For example, row 1 states that the shown shapeoccurs 8 times more frequently as a minimum free energy structure in the one-error neighborhood (n-ratio) of a sequence that has that shapeas a suboptimal compared to one that lacks it. The first part of the table is based on samples with more than 200 sequences with b as asuboptimal, while the second part is based on sample sizes of more than 100 but less than 200 sequences. A given sequence can have severalsuboptimal configurations listed in this table.

Page 19: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

260 L. ANCEL AND W. FONTANA

energy. Structures outside this range have Boltz-mann probabilities too low to influence fitness.Figure 10 shows the fraction of sequences thathave a, the tRNA cloverleaf, among their subop-timal configurations and that are located in thecompatible one-error boundary of a tRNA neutralset. This fraction is shown as a function of theenergy interval ∆. The fraction of sequences in thesample that have at least one one-mutant neigh-bor with the tRNA as a suboptimal (curve i) hits1 at only 0.6kT, while the fraction of all one-errormutants (curve ii) is above 0.8 at 5kT, and reaches1 at 9.6kT (for T = 37°C).

The first numerical observation demonstratesthat the occurrence of α among the plastically ac-cessible configurations of a sequence indicates theimmediate vicinity of a’s neutral network. The sec-ond method shows that a’s neutral network castsa “shadow” into the energetically low lying sub-optimal configurations realized by sequences inits one-error compatible boundary. Together thesefacts suggest that plastic accessibility and geneticaccessibility mirror each other in our RNA model.From a biophysical point of view, this matches in-tuition. It means that the thermodynamic stabil-ity of a suboptimal structure a that is realizedwith non-negligible probability can frequently beimproved (even to the point of making a the

ground state) by one suitably placed point muta-tion. This occurs “positively” through mutationsin the base pairing positions of a to nucleotidesthat yield better stacking energies; and throughsimilar modifications to mismatches at the ter-mini of helical regions. It also occurs “negatively”through mutations that outright eliminate ener-getically competing structures by making the se-quence incompatible to them. Intuitively, themutational stabilization of structures towardwhich a sequence is already predisposed occursmore readily than the construction of a minimumfree energy structure from scratch.

The evolutionary reduction of plasticity proceedsif plasticity entails a fitness cost and mutationproduces similar but less plastic alternatives toexisting phenotypes. The latter relies on plasto-genetic congruence. Figures 11 and 12 emphasizethis evolutionarily enabling function of plasto-genetic congruence. We generate a sample of se-quences with a given a as a 5kT suboptimalconfiguration and no constraints on the minimumfree energy structure. The likelihood of obtainingsequences with a predefined α among their 5kTrepertoires can be tuned by enriching those se-quence segments that should fold into the stack-ing regions of a with stabilizing GC pairs (seeinset of Fig. 11).

We let each sample sequence be the startingpoint of a gradient walk. At each step of a walk,the one-error mutant is chosen that most in-creases the Boltzmann probability P(a) of a. Fig-ure 11 shows the distribution of the walk lengthsuntil α becomes the ground state and until P(a)cannot further be improved. The graphs at the leftof Fig. 11 confirm that there is a large probabilityof making a the ground state in a single point mu-tation. Furthermore, this one-step probability islower for samples that permit α to lie higher inthe energy interval ∆. The difference is one orderof magnitude as ∆ increases from 5kT to 8kT. Ahigher GC content in stacking regions also in-creases the one-step probability, since any bias to-ward α helps its evolution to lower free energy.

As shown in Fig. 11, α usually becomes the mini-mum free energy structure in only two to threemutations, yet it takes many more mutations forthe thermodynamic stability P(α) to attain its lo-cal maximum.

Figure 12 shows the distributions of P(α) whenα has become the minimum free energy structurefor the first time and when P(α) is a local maxi-mum. Two aspects are worth noting. First, the de-gree to which the thermodynamic sharpness of

Fig. 10. Plastogenetic congruence II. For each point in thegraph, we generate a new sample of 100 sequences with thetRNA cloverleaf as their minimum free energy structure. Allone-error mutants were scanned for the presence of the tRNAcloverleaf as a suboptimal configuration. The graph showsthe frequency of such sequences as a function of the energyrange ∆ that defines the plastic repertoire. Curve i refers tothe frequency with which a sequence in the sample has atleast one one-error mutant with the desired suboptimal struc-ture (“neighborhood frequency”), while curve ii shows the “oc-currence frequency.” See text for details.

Page 20: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 261

Fig. 12. Plastogenetic congruence III. The frequency dis-tributions of Boltzmann probabilities for structures I and IIare shown. See Fig. 11 for details. The left set of curves shows

the Boltzmann probabilities when the structures first appearas the ground state, and the right set shows the Boltzmannprobabilities at local optima of the gradient walk.

Fig. 11. Plastogenetic congruence III. The inset shows thefrequency with which a random sequence compatible with atRNA cloverleaf (structure I) has that structure among its5kT suboptimal configurations (curve i). This frequency isshown as a function of the fraction of GC or CG pairs in thesequence segments that fold into the stacking regions of thecloverleaf shape. Curve ii is the frequency with which suchsequences have structure I as a minimum free energy struc-ture. The main graph shows the distribution of gradient walklengths as described in the text. There are two sets of curves,black (left hump) and gray (right hump). The left set pertainsto walk lengths up to the first appearance of the given struc-

ture (I or II) as a minimum free energy structure. The rightset is the distribution of walk lengths until the Boltzmann prob-ability of that structure could not be further improved. Onlywalks that terminated within 100 steps are considered. Circlespertain to structure II, ∆ = 5kT, GC fraction in helical regionsis 0.5, 1,000 distinct walks were performed of which 686 ter-minated. Squares pertain to structure I, ∆ = 5kT, GC fractionis 0.5, 1,000 walks of which 999 terminated. Up-triangles per-tain to structure I, ∆ = 5kT, GC fraction is 0.33 (i.e., no bias),916 walks of which 909 terminated. Down-triangles pertain tostructure I, ∆ = 8kT, GC fraction is 0.33 (i.e., no bias), 322walks of which 322 terminated.

Page 21: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

262 L. ANCEL AND W. FONTANA

tRNA structures (for example) can be improvedis large. Second, the final P(α) distribution is muchnarrower and the average P(α) attained is sub-stantially closer to 1 for a tRNA structure (I) thanfor another structure (II) chosen randomly. Only68% of the gradient walks found within 11 stepsa sequence with the shape II as the minimum freeenergy structure. This suggests structure depen-dent limits to canalization.

In sum, this third perspective on plastogeneticcongruence looks beyond the one-error neighbor-hood. If a conversion of a suboptimal shape into aminimum free energy structure cannot be achievedin a single step, it can occur gradually, over severalsteps. Once a structure has become the groundstate, there is still room for a significant reductionof plasticity.

3.2. Evolutionary downside ofplastogenetic congruence

We have seen that plastogenetic congruence en-ables genetic assimilation, in that mutations eas-ily move a good structure from a minor positionin a plastic repertoire to the minimum free en-ergy position. There is, however, a flip side toplastogenetic congruence. When the reduction ofplasticity is linked to the reduction in genetic vari-ability, plastogenetic congruence stalls evolution.

We now consider the neutrality of the sequencesobtained through our gradient walks. Recall thatneutrality is the fraction of single base mutationsthat preserve the minimum free energy structure.The top graph of Fig. 13 shows four examples ofgradient walks, chosen for their diversity of ap-proaches to a local optimum. The walk criterion op-timizes the Boltzmann probability of the structuredepicted at the bottom of Fig. 13. (This structuredominated the population when the evolutionaryprocess of Fig. 6, section 2.2, became trapped.) Thebottom part of Fig. 13 shows the concurrent changesin neutrality along these same gradient walks. Firsta structure descends to the minimum free energyconfiguration, and the neutrality drops to a mini-mum. Then the neutrality increases sharply to lev-els above 0.4. This value lies in the tail of thedistribution of neutralities for all sequences withthat minimum free energy structure (not shown).The mean of this distribution is 0.3 (Matt Bell, per-sonal communication, August, ’99). Note that thefraction of one-error neighbors compatible with theshown structure is approximately 0.55. (The exactcompatibility fraction depends on the U content of asequence.) One walk (up-triangles) approaches butdoes not sustain this upper bound for neutrality.

After the desired structure has moved into theminimum free energy position, point mutationsthat increase its Boltzmann probability typicallyalso increase the neutrality of the sequence. Thesechanges have an epistatic effect of the kind wedefine in section 1.5 and Fig. 4.

Recall that neutrality indicates robustness togenetic modification and the Boltzmann probabil-ity of the minimum free energy structure indicatesrobustness to thermodynamic perturbation. Assuch, we conclude that genetic canalization occursin tandem with environmental canalization. Inthese walks and in our full-blown simulations, en-vironmental canalization—the reduction of plas-ticity—is the directly selected response to selection

Fig. 13. Gradient walks in plastogenetic space. Gradi-ent walks are generated by moving from the current se-quence to its one-error mutant that most improves theBoltzmann probability of a prespecified structure. Walks be-gin with sequences that have that structure within their5kT plastic repertoire. Top: Progress profiles along samplewalks. Bottom: Concurrent development of neutrality (frac-tion of one-error mutants with the same minimum free en-ergy structure) along these walks.

Page 22: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 263

in a population. Genetic canalization, however, isnot an adaptation. It is instead a byproduct of en-vironmental canalization, to which it is linked bycommon genetic underpinnings. The possibilitythat environmental and genetic canalization sharea genetic basis which might account for the evo-lution of canalization was recently hypothesizedby Wagner et al. (’97). Our study of the evolutionof plasticity in RNA provides a mechanistic real-ization for this hypothesis.

Figures 14 and 15 provide support for the simul-taneous drops in plasticity and evolvability. The firstof these figures illustrates the surprising extent ofplasticity reduction in RNA. It compares the den-sity of structural states of three sequences that havebeen obtained through different processes but thatshare the same minimum free energy structure. Thecommon minimum free energy shape is the charac-teristic shape α found at the dead-end of the evolu-tionary process depicted in Fig. 6 (section 2).

The random sequence obtained by inverse fold-ing α (Hofacker et al., ’94) has 574 different struc-tures within 3 kcal of a. It spends only 3% of thetime in α, and the combined probability of the 574alternative configurations accounts for only 58%of the partition function. The different configura-tions cover a wide range of structural diversity.

The second sequence is the product of evolutionunder the simple map. We inoculated a simulatedflow reactor with the aforementioned inversefolded sequence, and specified its minimum freeenergy structure a as the target structure. Thisleaves no room for phenotypic improvement. De-spite the absence of direct selection pressure onthe well-definition of the ground state, we observea reduction of plasticity by one order of magni-tude. In van Nimwegen et al. (’99a), we learn thata population evolving on a neutral network underthe simple map will concentrate on sequences withhigher than average neutrality (see Figs. 15 and18). Because mutations off the neutral networkyield, on average, much lower fitness phenotypes,there is indirect selection against sequences in theneutral network that have a high fraction of one-error mutants off the network, i.e., low neutral-ity. This is a second order effect that depends onthe probability of deleterious mutation. Similarobservations were made with a model of geneticregulatory networks by A. Wagner (’96). Theevolved sequences that have become mutationallymore robust also exhibit a correspondingly lowerplasticity. This is a manifestation of plastogeneticcongruence, which is (weakly) effective even un-der the simple map.

The last sequence in Fig. 14 is evolved underthe plastic map where reduced plasticity is thedirect consequence of natural selection. The sizeof the plastic repertoire decreases by another or-der of magnitude to only four structures in addi-tion to the ground state which is occupied 67% ofthe time. If the molecule equilibrates over all itsstates, it spends 94% of its time in these five con-figurations that are highly similar to each other.

Figure 15 shows the neutral positions of thesesequences. We call a position neutral if at least onemutation at that position leaves the minimum freeenergy structure unchanged. Figure 15 additionallyincludes three sequences obtained similarly for thestructural end point in the evolutionary process ofFig. 5, section 2. For the structures on the left, theneutrality of the sequences increases from 0.184 forthe random sequence to 0.412 for the neutrallyevolved sequence to 0.456 for the canalized se-quences. Similarly on the left, the neutrality in-creases from 0.158 to 0.311 to 0.430 from top tobottom. The neutral coverage in the canalized caseis almost perfect. That is, a position will tend to beneutral if it permits a mutation that does not de-stroy the compatibility of the sequence with its mini-mum free energy structure. The intermediateneutrality of the neutrally evolved sequences reflectsthe second-order selection for increased neutralityunder the simple map as discussed above. The fur-ther increase in neutrality that occurs under theplastic map is a side effect of the sharp reductionin plasticity.

3.3. Neutral confinementPlastogenetic congruence provides both the

mechanism for an evolutionary reduction of plas-ticity and the link between this reduction and theultimate evolutionary dead-end. Plasticity is costlybecause more structures in the plastic repertoireimplies less time spent in any one. Plastogeneticcongruence enables populations to reduce thesefitness costs through genetic assimilation—themovement of an advantageous suboptimal struc-ture to the minimum free energy structure, andsubsequent reduction in plasticity. At the sametime, plastogenetic congruence causes a loss invariability. In this section we closely examine theextent to which populations end up genetically iso-lated from phenotypic novelty.

Figure 16 compares the frequency distributionof neutralities at the end of the plastic simula-tions depicted in Figs. 5 and 6 (filled circles) andthe corresponding simple runs (down-triangles).(The data represented by squares are discussed

Page 23: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

264 L. ANCEL AND W. FONTANA

below.) Again, the neutrality of a sequence is thefraction of its one-error mutants that share itsminimum free energy structure. It indicates thelack of phenotypic variability of the sequence. Theplastic population evolves to a distribution of neu-tralities with a much higher average and lowerkurtosis than that of the non-plastic population.

Sequences with high neutrality have only a smallproportion of distinct phenotypes in their mutationalvicinities, and are therefore unlikely to mutate to-wards a higher fitness phenotype. Furthermore, weargue that the minimum free energy structures ofthe few non-neutral neighbors offer little phenotypicdiversity, making the discovery of novel advanta-

geous mutants even more unprobable. High neu-trality correlates with high structural similarity be-tween the ground state structure and the otherconfigurations in the plastic repertoire (Wuchty etal., ’99). By plastogenetic congruence, this comes tomean that a one-error mutant of a low-plasticitysequence either folds into the same minimum freeenergy structure (neutrality) or into a structure thatis very similar to it. Very similar here means thatthe structures differ slightly with respect to stacklengths or loop sizes. As a consequence, the discov-ery of new advantageous shapes is considerablyslowed down and eventually halted. We say thepopulation is “neutrally confined.”

Fig. 14. Density of states in 5kT. The structural densityof states is shown for three sequences that have been ob-tained by inverse folding, neutral evolution under the simplemap, and canalization under the plastic map. All three se-quences have the same ground state. For the first two se-quences, we present a few sample structures with their energyon the right hand scale. We display the complete repertoireof the third sequence. The gray boxes indicate many non-displayed structural states.

Page 24: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 265

Highly neutral regions of a neutral network ap-pear to be wide-spread and connected (Matt Bell,personal communication). Genetic diversification,therefore, still occurs within these regions. Yet theextent of genetic variation is insufficient to pro-duce difficult shape transformations, such as ashift or the de novo creation of a stack (Fig. 2).

Neutral confinement in RNA seems independentof the mutation rate. We simulate evolution on aneutral network by starting a population with se-quences that have the designated target structureas their minimum free energy structure. Figure17 depicts such simulations that use the struc-ture shown at the top of Fig. 16 as both the start-

Fig. 15. Neutral positions. The emphasized positions arethose with at least one neutral nucleotide substitution. Wehighlight these neutral positions for inverse folded, neutrally

evolved, and canalized sequences on two minimum free en-ergy structures.

ing minimum free energy shape and the target.Figure 17 monitors the fraction of sequences withthat structure as a function of the replication ac-curacy per position. This is done for two values ofthe superiority (see caption for Fig. 17) and forboth the plastic and the simple map. The pheno-typic error threshold (Huynen et al., ’96; Reidyset al., ’98) is the replication accuracy at which thatminimum free energy structure is lost from thepopulation. At the same time we monitor the av-erage neutrality of sequences with that structure.The independence of the average neutrality fromerror rate and superiority was predicted for thesimple case by van Nimwegen et al. (’99a). In the

Page 25: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

266 L. ANCEL AND W. FONTANA

Fig. 16. Neutral confinement. We measure the neutralityof each sequence species in the population support and plot afrequency distribution. This just means that each sequencetype is weighted the same, irrespective of its frequency inthe population. Frequency-weighted plots look similar, withmore dramatic high-neutrality peaks and a subdued low-neu-trality spectrum. Top: Populations pertaining to the simula-tion of Fig. 6. Filled circles: population support neutralityafter 4.2 × 106 replications under the plastic map (the struc-ture shown in the inset dominates). Down-triangles: neutral-

ity at the end of the simple run (Fig. 6) with the simple map.Squares: neutrality of a population that has evolved for 5.6× 106 replications from the one underlying the filled-circledata when plasticity was switched off. Bottom: Similar analy-sis for populations pertaining to the simulation shown in Fig.5. Filled circles: with plasticity after 30 × 106 replications.Down triangles: simple run without plasticity after 32 × 106

replications. Squares: evolved from the filled-circle data for4.4 × 106 replications with plasticity switched off.

Page 26: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 267

plastic case, however, the independence is ratherunexpected. In section 4 we present formal mod-els of evolution under the plastic map. They pre-dict a shift in the equilibrium distribution fromhigh neutrality (of neutral confinement) to lowerneutrality at a mutation rate higher than the er-ror threshold. This suggests that as the mutationrate increases, the population goes directly fromneutral confinement into falling completely off theneutral network (error catastrophe).

To summarize, we have demonstrated that

plastogenetic congruence plays three critical roles.First it enables the genetic assimilation of subop-timal shapes into ground states. Second it facili-tates the reduction of plasticity. This occurs understabilizing selection following a transition, in theearly periods of apparent stasis when the geneti-cally accessible phenotypes are either neutral orof lower fitness. Third, during this reduction,plastogenetic congruence yields the advance to-wards sequences with low phenotypic variability—sequences in an evolutionary dead-end.

3.4. Fitness landscape perspectiveOur analysis repeatedly appeals to neutrality

with respect to minimum free energy structures.Under the simple map from genotype to minimumfree energy structure, such neutral networks arealso invariant with respect to phenotype and fit-ness. Plasticity adds texture to these neutral net-works in terms of both phenotype and fitness. Twosequences that share a minimum free energy struc-ture will have divergent structures in their remain-ing plastic repertoires, and hence different fitnesses.As discussed in section 1.6, we nevertheless con-tinue to refer to such networks as neutral.

When populations of plastic sequences arrive atan evolutionary dead-end, they are confined to themost neutral recesses of a neutral network. Theseregions correspond to local fitness maxima underthe plastic map. One might therefore argue, thatthese dynamics can be solely conceived in termsof evolution toward a local fitness optimum. Wecounter that the ruggedness of the fitness func-tion that results from the plastic map is remark-able by virtue of plastogenetic congruence.

The fitness peaks at which plastic populationscome to rest correspond not just to good phenotypes(that is, repertoires with a stable minimum free en-ergy structure close to the target), but also to re-gions that are mutationally isolated. Plastogeneticcongruence means that the set of genotypes withhighest fitness in a neutral network are mutation-ally highly interconnected with very small bound-aries (if any) with other neutral networks.

Consider a fitness function in which a randomlychosen connected sub-network of a neutral net-work was assigned high fitness relative to the restof the network. We assert that a population evolv-ing under such a fitness function is much lesslikely to be trapped than a population evolvingunder our plastic fitness function. Arbitrarily cho-sen fitness peaks of this kind will not suffer themutational buffering induced by plastogenetic con-gruence.

Fig. 17. Neutrality as a function of replication accuracy.Populations were obtained by neutral evolution with differ-ent replication accuracies on the structure α shown at thetop of Fig. 16. The graphs that decay exponentially with de-creasing replication accuracy represent the stationary fre-quency of the target phenotype α in the population. The nearlyconstant graphs monitor the average neutrality (in the popu-lation support sense of Fig. 16; essentially the same figureobtains when sequences are weighted by their frequency) ofthe sequences on the “master network” (i.e., whose minimumfree energy configuration is the target α). The replication ac-curacy at which the master network is lost is known as thephenotypic error threshold (Huynen et al., ’96; Reidys et al.,’98). For infinite populations the threshold accuracy qmin de-pends on the average neutrality of the α-network, λα, andthe superiority of α, σα, indicating how much better the rep-lication rate of α, rα, is with respect to the remaining pheno-types in the population (Reidys et al., ’98; Schuster andFontana, ’99): qmin = ((1 – lasa)/(1 – la)sa)1/n with sa = ra/Sb¹arβxβ/(1 – xa), where xb is the fraction of sequences with groundstate β. In the plastic case, rα is not constant, since differentsequences with α as a ground state will have different sub-optimal configurations. We replace ra by the average oversuch sequences in the population. The graphs for the plasticcase always dominate those of the simple case (higher neu-trality and hence lower threshold accuracy). Filled circles:neutral evolution at superiority 10. Down-triangles: neutralevolution at superiority 1.5. Solid curves are linear regres-sions and exponential fits. Note the near constancy of thepopulation neutrality in the plastic case regardless of the su-periority and error rate.

Page 27: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

268 L. ANCEL AND W. FONTANA

3.5. Relationship to work in protein foldingA correlation between the thermodyamic stabil-

ity of a minimum free energy structure and itsmutational robustness has been observed in lat-tice models of protein folding (Bussemaker etal., ’97; Vendruscolo et al., ’97; Govindarajanand Goldstein, ’97; Bornberg-Bauer and Chan,’99), and in RNA models (Wuchty et al., ’99).“Thermodynamic stability” refers to the freeenergy ∆G (Govindarajan and Goldstein, ’97;Bornberg-Bauer and Chan, ’99), the thermalstability (melting temperature) of the groundstate structure (Vendruscolo et al., ’97), or theenergy gap between it and the first excited state(Bussemaker et al., ’97). We relate our findingsto this body of work in two respects.

Although plastogenetic congruence in RNAimplies an analogous correlation between ther-modynamic and mutational stability, it hasbroader implications. The plasticity of an RNAmolecule refers to alternative structures in thevicinity of its ground state at constant tempera-ture. [Bornberg-Bauer and Chan (’99) use theterm “plasticity” unconventionally to refer tomutational (in)stability, rather than to alterna-tive phenotypes of a single genotype in responseto the environment.] Plastogenetic congruencestates that the alternative structures availableto a given sequence are frequently found as theminimum free energy structures of its muta-tional neighbors. This indicates which minimumfree energy structures will be accessible throughmutation, and therefore goes beyond a correla-tion between the thermodynamic and muta-tional stability of the ground state. Hence weintroduce the new terminology “congruence.”

Bornberg-Bauer and Chan (’99) report thatneutral networks in protein models “centeraround a single prototype sequence of maximummutational stability”. This is not true in RNA.Our evolutionary simulations and numericalstudies by Matt Bell (personal communication,August, ’99) indicate that the RNA sequenceswith highest neutrality constitute extended andconnected subnetworks of neutral networks.Furthermore, gradient walks to optimize theBoltzmann probability of the minimum free en-ergy structure (Fig. 13) reach maximum neutral-ity after a dozen steps. Following this, the walkscontinue to improve the thermodynamic stabil-ity of the ground state structure while maintain-ing constant neutrality.

4. ANALYTIC MODELS OF NEUTRALCONFINEMENT

4.1. Model assumptionsIn this section, we discuss three simplified mod-

els of plastic RNA evolution. In each, we charac-terize the dynamics of a plastic population in thefinal stage of evolution, and derive equilibriumconditions in terms of neutrality and fitness. Thetwo assumptions that make these approaches ana-lytically tractable are (1) perfect plastogenic con-gruence and (2) containment within a singleneutral network in which all sequences share thesame minimum free energy structure. By perfectplastogenetic congruence, we mean that structuresaccessible to a sequence through plasticity are ex-actly those found as minimum free energy struc-tures of sequences accessible through singlemutations. Although simulated populations typi-cally consist of sequences from several equally fitneutral networks, our containment assumptionsimplifies the population to sequences that sharea single minimum free energy structure, whiletheir suboptimal structures may vary.

Recall that the plastic genotype–phenotype mapgives rise to a much more rugged fitness land-scape than the simple map. Neutral networks ofsequences with the same minimum free energystructure no longer share the same fitness. Rather,the fitness of a sequence depends directly on otherstructures in its plastic repertoire. Since plasto-genetic congruence ties these structures to theminimum free energy structures of mutationalneighbors, our simplifying assumption (1) meansthat the fitness of a sequence becomes a functionof the phenotypes of its genetic neighbors. By ex-tracting this brand of ruggedness in a simplerframework, we demonstrate that the extreme ge-netic isolation in RNA should come as no surprise.

Furthermore, these models provide insight intothe role of mutation. We demonstrate that the ge-netic isolation may be such that no amount of mu-tation can facilitate an escape. The minimumamount of mutation necessary to produce novelphenotypes is so great that the resulting mutantshave regressed completely from the target.

4.2. Frequency distribution withina neutral network

Consider a population of RNA sequences withphenotypic plasticity. A sequence can assume arange of structures, all within an energetic neigh-borhood of its minimum free energy structure. Thefitness of a sequence is determined by a weighted

Page 28: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 269

average of the distances between its low energyconfigurations and a target structure. The as-sumption that plasticity gives a perfect picture ofevolutionarily adjacent structures allows us to con-strue the fitness of a sequence entirely in termsof the fitness of its mutational neighbors:

)0

( ( )

( , )( ) ( ( , )).

( , )j i j i

j iji

S S j iS S

S Sw S d

S Sµ

µ

δα

δ∈ ∈

∑ fN N

t (7)

where Si is a sequence, a j0 is the minimum free

energy structure of Si, N (Si) is the mutational neigh-borhood of Si, d() determines structural distance be-tween two shapes, f() is a monotonically decreasingfunction of structural distance d(), and δm() is amonotonically decreasing function of mutational dis-tance. This definition is analogous to eq. (3) in sec-tion 2. Because of perfect plastogenetic congruence,the structures that determine the fitness of Si arethe minimum free energy structures in its muta-tional neighborhood. The second factor in eq. (7) isanalogous to the Boltzmann probability in that itweighs a neighboring structure with the likelihoodof reaching its sequence by mutation.

Suppose a population is concentrated on a neu-tral network G of relatively high fitness. That is,all sequences in G have the same minimum freeenergy structure, and most one-error mutants ofthese sequences that lie outside of G have rela-tively much lower fitness. Let |G| be the num-ber of sequences in G.

We use an approach proposed by van Nimwegenet al. (’99a) in which G is viewed as a graph. Eachsequence corresponds to a node, and two nodesare connected by an edge when the sequences theyrepresent differ by exactly one mutation. The de-gree di of a node is the number of edges that con-nect it to another node in G; in other words, it isthe number of single mutations that preserve theminimum free energy structure. A node Si willtherefore have di one-error mutants in G, and 3n– di one-error mutants that lie outside of G, wheren is the length of sequences. Assuming perfectplastogenetic congruence, we can approximate thefitness of an individual node Si by the averagefitness of its one mutant neighbors. We assumefitness off of G is relatively negligible, and there-fore ignore the fitness contributions of one-errormutants not in G. Therefore the fitness of Si isfdi + 0(3n – di) = fdi where f is the selective valueof the minimum free energy structure shared byall sequences in G.

We can express the per generation change inthe frequency distribution of sequences in G witha system of |G| equations. For any S ∈ G,

( )( (1 ) ),

3s s s t tt s

P P d Pdw n

µµ∈

= − +′ ∑f

N(8)

where Ps gives the frequency of sequence S, m givesthe per sequence per generation mutation rate, nis the length of the sequences, and w– is the aver-age fitness of the population. Note that this for-mulation ignores the possibility of mutations ontoG from sequences outside of G.

We now translate this system of equations intoa transition matrix M such that P→′ = M P→ whereP→ is the frequency distribution vector. Let I de-note the identity matrix; A denote the adjacencymatrix where Aij = 1 if Si and Sj are one-errormutants of each other and Aij = 0 otherwise; andD denote the diagonal matrix of degrees with Dii= di for all i and Dij = 0 for all i ≠ j. Then wederive

((1 ) ) .3nµµ= − +M I A D (9)

Assume there exists a unique node c such that dc >di for all i ≠ c. Without loss of generality, we let c =0. Then S0 is the most neutral sequence in G withdegree d0. When m = 0, that is, there is no muta-tion, then M = D which has a leading eigenvalue ofd0 and an associated eigenvector [1,0,0,...,0]. In theabsence of mutation then, the equilibrium popula-tion is made up entirely of S0.

In general, the equilibrium distribution is thesolution P→ to the eigenvalue equation MP→ = lP→

where l is the leading eigenvalue of M. The re-sulting distribution of sequences will reflect a mu-tation–selection balance. Because we can breakour transition matrix into a diagonalizable ma-trix and a small remainder term,

0 11 ) [ ] + [ ],

3M M

nµ µ= + ( − =M D A I D (10)

we can use perturbation theory to get a first-or-der approximation of the equilibrium distributionfor small m. We find that

| |0 01

0

00

0

1 1 1 if 0,1 3

ˆif 1,

1 3 ( )0 otherwise.

G kk

k

kk

k

Adk

n d d

P dA

n d d

µµ

µµ

=

+ − Σ = + − =

= + −

(11)

Page 29: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

270 L. ANCEL AND W. FONTANA

Recall that Aij indicates whether sequences Si andSj differ by a single mutation. According to this ap-proximation, the equilibrium distribution is madeup of the most connected sequence S0 and its one-error mutants. As m decreases, the proportion of non-S0 sequence shrinks. Because the denominator 3n(d0– dk) depends on dk, the frequency of a non-S0 se-quence is an increasing function of its degree. Evensequences with degree close to d0 will appear atlow frequency. For example, consider an Sk wheredk = d0 – 1 and the maximal µ = 0.1. Then P̂k = 0.3d0/n, which is very small for large n.

For two simple networks, we compare the equi-librium distribution of neutrality predicted by ourplastic model to that of the original non-plasticmodel (van Nimwegen et al., ’99a). Both networkscontain 20 nodes (sequences). In the first, node 0is connected to all 19 other nodes, and all othernodes are only connected to 0. In the second, nodes0–10 are connected to all other nodes while nodes11–20 are only connected to nodes 0–10. Figure18 graphs the equilibrium distributions for non-plastic (dotted) and plastic (solid) populations. Theplastic populations are much more concentratedon the nodes with highest degree.

4.3. Three-tiered model: explorationand error thresholds

Premised on the previous model, we turn to therole of mutation. Can mutation enable the popu-lation to avoid the evolutionary dead-end in thefirst place? Intuition suggests that an increasedmutation rate might enable populations to searchbeyond their immediate genetic vicinity, into re-gions where novel phenotypes exist. We show here,however, that for certain parameter values thereis no mutation rate that provides such an escape.Under low mutation rates, populations, like thosein our simulations, are confined in an “explorationcatastrophe,” yet under high mutation rates, popu-lations are in an “error catastrophe” where theyhave slipped away completely from the target.

Again consider a population that has reached arelatively high fitness neutral network G. We breakdown the population into three distinct classes, twowithin the neutral network and one representingthe rest of sequence space. For any sequence Si ∈G, the one-error mutants of Si that are also in Gare called its neutral neighbors, and the number ofsuch neighbors is called di, the degree of Si. Assumethat the neutral network contains a single sequence

Fig. 18. Network distributions. The equilibrium distribu-tions for a non-plastic population (dotted) and a plastic popu-lation (solid). In network a (left ordinate), node 1 has muchhigher neutrality than all other nodes: d1 = 19 and dn = 1 for

n = 2–20. In network b (right ordinate), nodes 1–10 form ahigh-neutrality subnetwork: dn = 10 for n = 1–10 and dn = 1for n = 11–20.

Page 30: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 271

S0 with maximal connectivity d0 within G, and thedegree of all other sequences in G is d1. That is,for any Si ∈ G where Si ≠ S0, di = d1 < d0.

We define three classes of sequences: C0 is justthe sequence S0, C1 consists of the sequences inG – {S0}, and C2 are all sequences not containedin G. Changes in the frequency distribution amongthese classes results from a combination of muta-tion and natural selection.

Let m be the probability of mutation per se-quence per generation. Sequences in C0 mutateto sequences in C1 at a rate md0/3n, where 3n isthe number of possible one-error mutants of a se-quence. Mutations take sequences in C0 off thenetwork, into C2 at a rate m (1 – md0/3n). Likewisethe mutation rates from C1 to itself and to C2 aremd1/3n and m (1 – d1/3n), respectively. We ignoreback mutation of sequences in C1 to C0 and of se-quences in C2 to either C0 or C1. We discuss theimplications of this assumption below.

The sequences in this model are again endowedwith perfect plastogenetic congruence as describedin the previous section. Every sequence has a se-lective value which is a measure of the similaritybetween the sequence’s minimum free energystructure and a pre-determined target structure.Sequences within G have a selective value of 1while sequences off G have a relative selectivevalue of f<1. The relative fitness of a sequence inclass Ci is wi, an average of the selective valuesof its one-error mutants:

0 00 1 ,

3 3d d

wn n

= + − f (12)

1 11 1 ,

3 3d d

wn n

= + − f (13)

w2 = f . (14)

Recall that di/3n (i = 0, 1) is, for any sequence inCi, the fraction of its one-error mutants that alsolie within G. We construct a transition matrix Tthat incorporates the effects of mutation and selec-tion. If P→ = (P1, P2, P3) describes the occupancy ofthe three classes at some time, then P→′= TP→ givesthe frequency distribution in the next generationwhere T is as follows:

0 1

0 1

0

3 30 1 1

30 1 2

(1 ) 0 0

T (1 ) 0 .

(1 ) (1 )

d dn n

d dn dn

w

w w w

w w w

µ

µ µ µ

µ µ

− = − + − −

(15)

The leading eigenvector of T provides the popula-tion equilibrium distribution. The eigensystem ofT is given by

1

00

31 1

22

(1 )

(1 (1 ))dn

w

ww

µλλ µλ

= − −

(16)

and0 0 01 1

0 1 1 03 3 3 3 30 0

0 0 23 3

(1 ) (1 (1 )) (1 (1 )) (1 (1 ))0 (1 )

,1, ,d d dd d

n n n n nd d

n n

w w w w

w w wP − − − − − − − − − − −

− − +

= r µ µ µ µ

µ µ

(17)1

1 231

1 3

(1 (1 ))1 1

0, ,1dndn

w w

wP µ

µ

− − −

=

r(18)

2 (0,0,1).P =r

(19)

The subscripts of the eigenvalues and eigenvec-tors refer to the class which dominates the distri-bution, and not to their magnitudes. As parametervalues vary, so does the leading eigenvalue λ =max(l0, l1, l2).

We seek parameter ranges that allow the popu-lation to explore phenotype space. A populationconcentrated in C0 will have mostly neutral mu-tants, and therefore will be unlikely to find ahigher fitness phenotype through mutation. Wecall this an exploration catastrophe. A populationlost in C2 has regressed from the higher fitnessnetwork. This is the error catastrophe. C1 on theother hand is a high fitness platform from whicha population can explore phenotype space for bet-ter options. For these reasons, we say that a popu-lation with a high concentration of C1 is belowthe error threshold which moves the populationoff of G, and above the exploration threshold whichcontains the population in a highly inward-look-ing subset of G.

In Fig. 19, we display equilibrium distributionsfor various parameter ranges. In every case, weset the length of the sequences to n = 100. Eachright-hand graph shows the concentration of C0(labeled “0”), C1 (labeled “1”), and C2 (labeled “2”).Populations dominated by C2 are above the errorthreshold while populations dominated by C0 arebelow the exploration threshold. On the left wegraph the eigenvalues. The topmost plane at eachpoint is the leading eigenvalue. Its associatedeigenvector is that which determined the frequen-

Page 31: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

272 L. ANCEL AND W. FONTANA

cies in the opposing graph. Light gray, mediumgray, and dark gray represent l0, l1, and l2, re-spectively.

The center distribution assumes neutralities ofd0/3n = 0.45 for C0 and d1/3n = 0.3 for C1. These

Fig. 19. Eigenvalues (left graphs) and equilibrium distri-butions (right graphs). Given values for µ and f, we graph λ0,λ1, and λ2 in light, medium, and dark gray, respectively, and

equilibrium concentrations for C0, C1, and C2 labeled as (0),(1), and (2), respectively.

are the values we obtain from actual simulation.The equilibrium distribution suggests that for thelow values of µ we use in simulations, the popu-lation should be trapped on C0 in an explorationcatastrophe. This is consistent with the neutral

Page 32: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 273

confinement we find for low-plasticity sequences.We discuss these distributions further in the nextsection.

4.4. Neutrality in the subnetwork C0

In simulation, RNA populations evolve throughneutral networks that are much more complexthan the idealized C0 and C1 of the three-tieredmodel. Populations move among highly intercon-nected subnetworks of multiple structurally simi-lar neutral networks. In this extension of thethree-tiered model, we still assume the popula-tion converges on a single neutral network G, butattempt a more realistic conception of the struc-ture of G.

Suppose now that C0 is enlarged to a subnet-work of the neutral network G in which all |C0|sequences have degree d0. As above, the remain-ing sequences in G occupy C1 and share degreed1. We define two new parameters c0 and c1, whichare the fractions of neutral mutations that remainwithin C0 and C1, respectively. Thus 1 – χ0 is thefraction of neutral mutations from C0 to C1, and 1– χ1 is the fraction from C1 to C0.

Mutation operates according to the followingtable of mutation rates (Table 2). The entry inthe Ci row and Cj column is the probability thatan individual in Ci mutates into Cj in a given gen-eration. We will use these rates in the transitionmatrix.

In this case, we also consider mutations fromC1 into C0, but again we ignore mutations fromC2 into G.

As in the original formulation, fitness is theweighted average of the selective values of theminimum free energy structures of all one-errormutants of a sequence. For any sequence in C0 orC1 the total number of one-error mutants that arealso in G is still d0 or d1, respectively. Thereforethe extension of C0 to a sub-network of G doesnot alter the fitnesses of the three classes.

Again we construct a transition matrix that de-

scribes flow between classes. In the following ma-trix µij, 0 ≤ i, j ≤ 2, is the mutation rate from Ci toCj as given in Table 2:

0 0 00 1 10

0 01 1 1 11

0 02 1 12 2

(1 ) 0(1 ) 0 .

w w ww w ww w w

µ µ µµ µ µµ µ

− + = − +′

T (20)

Though it still only involves a quadratic and alinear term, the eigensystem of T′ looks more com-plicated than that of T because T′ is not triangu-lar, and we have added the new parameters c0and c1. In the following analysis, we assume c0 =c1 = c, and explore the effects of c and m on thepredicted equilibrium distributions.

Figure 20 depicts equilibrium distributions forf = 0.9, 0.5, 0.1 from left to right. As c increases,the equilibrium concentration of C0 increasesslightly and the frequency of C1 decreases, mak-ing exploration even more difficult. For very largerc, the frequency of C1 remains negligible for all m.See for example Fig. 21.

The lower eigenvalue graphs reveal that underthe extended model, populations are always in ei-ther the l0 phase or the λ2 phase. In the previousmodel, we found three distinct regimes, each domi-nated by a unique eigenvector. This discrepancyarises from the role of mutation. In the first modelrecall that there is no back mutation from C2 intoC0 and C1, and no back mutation from C1 to C0.In the second model, we add mutation from C1 toC0. A model that includes mutation in all direc-tions among three classes would yield a singlemaximum eigenvector over all of parameter space.In the first model, we use the transitions betweenleading eigenvalues to mark the phase transitionsbetween regimes dominated by different subsetsof the population. In the current model and thehypothetical all-mutations model, however, tran-sitions between such phases in the frequency dis-tribution are difficult to identify. One innovativeapproach to this problem considers finite-popula-tion dynamics in a simplex (van Nimwegen et al.,’99b). Here a qualitative perspective suggests thatthere are parameter values for which populationsgo directly from an exploration catastrophe, con-fined to C0, to an exploration catastrophe, diffus-ing in C2.

4.5. Exploration and error catastrophesIn both versions of the model we find param-

eter ranges for which the population is confined

TABLE 2. Mutation rates

DestinationOrigin C0 C1 C2

C00

0 3dn

χ µ 00

(1 )3dn

χ µ− ( )031 d

nµ −

C1 ( ) 11

13dn

χ µ− 11 3

dn

χ µ ( )131 d

nµ −

C2 0 0 m

Page 33: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

274 L. ANCEL AND W. FONTANA

a slightly monotonically increasing function of c.This implies that as the size and interconnectivityof C0 grows, so does the likelihood of reaching anevolutionary dead-end.

There also exist parameter ranges for which thepopulation is lost in C2. For specified f, n, r, and cwe can find a threshold µ above which the popu-lation reaches such an error catastrophe. Intu-itively, as f—the relative fitness of C2—increases,the error threshold decreases.

These transitions are formally identical to thewell-known genotypic error threshold, denotingthe mutation rate at which a genotype with opti-mal phenotype (“master genotype”) is lost (Eigen,’71), and the phenotypic error threshold, denot-ing the mutation rate at which the optimal phe-notype is no longer maintained in the population(Huynen et al., ’96; Reidys et al., ’98).

In the simple version of our model, the explora-tion threshold corresponds to the genotypic errorthreshold in which there is a single master se-quence with superiority approximately w0/[d0w1/3n + (1 – d0/3n)w2] = [d0(1 – f ) + 3nf ]/[d1d0(1 – f )/3n + 3nf]. The genotypic error thresholddivides the low mutation rates at which the mas-ter sequence is preserved at equilibrium from the

Fig. 20. Extended model equilibrium distributions andeigenvalues. Given values for m and c, these display equilib-rium concentrations for C0, C1, and C2 labeled as (0), (1), and(2), respectively. The bottom row gives eigenvalues l0 in light

gray, l1 in medium gray, and l2 in dark gray. All graphs as-sume n = 100, d0 = 135, and d1 = 95. From left to right, f =0.9, 0.5, 0.1. Note that the range for m varies across graphs.

Fig. 21. Equilibrium distribution for c = 0.9. For low m,the population is confined to C0. Around m = 0.25, the popula-tion moves off C0 into C2, with virtually no occupancy of C1.

to C0 and therefore is unlikely to find phenotypicnovelty through mutation (exploration catastro-phe). For given values of f, n, and c we can iden-tify a threshold m below which the population isin an exploration catastrophe. Generally, the ex-ploration threshold decreases as f increases and is

Page 34: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 275

high mutation rates at which the master class iscatastrophically lost. The exploration thresholdsimilarly divides the low mutation rates at whichthe population is confined to C0 from the high mu-tation rates at which the population reaches C1.

What we call the error catastrophe in our simplemodel is equivalent to a phenotypic error thresh-old where the mean fraction of neutral neighborsis approximately ν ~ d1/3n. In our extended modelboth the exploration catastrophe and the errorcatastrophe correspond to a phenotypic errorthreshold because C0 becomes a neutral networkrather than a single sequence. Increasing the sizeof C0, i.e., increasing the neutrality of C0, post-pones the exploration threshold. A larger C0 en-tails extended ranges of mutation rates at whichthe population will be in an exploration catastro-phe. Consequently the regions of parameter spacein which the population goes directly from an ex-ploration catastrophe into an error catastrophealso increase.

5. MODULARITYModularity is a hallmark of biological organi-

zation and an important source of evolutionaryinnovation (Bonner, ’88; Wagner and Altenberg,’96; Hartwell et al., ’99). Once it exists, modular-ity constitutes an obvious advantage by enablingthe recombination of stable subunits into novelphenotypes. Yet, the origin of modules remains aproblem for evolutionary biology, even in the caseof the most basic protein or RNA domains (West-hof et al., ’96). Here we offer a possible origin forsuch organization.

In section 2, we demonstrated that plasticity israpidly reduced by natural selection under a bio-physically motivated fitness function that weighsthe selective value of each shape in the plasticrepertoire of a sequence by its Boltzmann prob-ability. The reduction of plasticity has, in addi-tion to genetic canalization, a further side effectwhich is best characterized as modularity.

The point we are making in this section is, inessence, yet another characterization of plasto-genetic congruence in RNA: structural units thatappear autonomous from an environmental anddevelopmental perspective appear at the sametime autonomous from a genetic perspective. Herethe environment refers to temperature, and de-velopment refers to the kinetic process by whichan RNA sequence folds from an open chain intoits minimum free energy structure. We show thatthe evolutionarily reduction of plasticity crystal-lizes RNA structures into environmentally, devel-opmentally, and genetically autonomous units.

5.1. Norms of reaction: melting behavior

We view plasticity as a stochastic choice amongalternative structural states of a biopolymer incontact with a heat bath at constant tempera-ture. Biologists often characterize the plasticityof a genotype using a norm of reaction. This istypically a map from some parameter in the en-vironment to a phenotype. The RNA plastic rep-ertoires deviate from this standard frameworkin that the relevant environmental input—Brownian motion in a heat bath—is not easilyscaled on an axis. In this section we use a moreconventional norm of reaction: a map from tem-perature to minimum free energy structure. Thesuite of minimum free energy structures as theychange with temperature is known in biophysicsas “melting” profile. Figures 22 and 23 comparethe (computed) melting profiles of the sequencesfrom Fig. 15 at three levels.

First we compute the melting series, that is, thesuite of minimum free energy structures in thetemperature range from 0 to 100°C. Second wecalculate the heat capacity (at constant pressure),labeled H in Figs. 22 and 23, from the Gibbs freeenergy, G, of the ensemble of structural states bymeans of the partition function Z (McCaskill, ’90):

2

2 with log ,GH T G RT ZT

∂= − = −∂ (21)

where R is the universal gas constant. H can bemeasured empirically by differential scanningcalorimetry (DSC). DSC is widely used to deter-mine the thermophysical properties of materials.As a sample is heated over a range of tempera-ture, the material starts to undergo a phasechange that releases or absorbs heat. The calo-rimeter measures the heat flow (enthalpy change)into or out of the sample undergoing the phasechange, thereby providing data from which theheat capacity can be quantitatively recovered. Ourthird perspective on structural transitions are theBoltzmann probabilities as a function of tempera-ture for each minimum free energy structure ap-pearing in the melting series. These show thatthe peaks in H correspond to major structuralphase transitions occurring at the temperature atwhich the Boltzmann probabilities of the outgo-ing and incoming structures intersect.

In Fig. 22, we compare the melting profile forthe three sequences on the left of Fig. 15. Recallthat these sequences share a common minimumfree energy structure at 37°C. The melting behav-

Page 35: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

276 L. ANCEL AND W. FONTANA

ior of the inverse folded sequence is markedlymore disorderly than that of the neutrally evolvedand canalized sequences. Its melting series con-sists of as many as 10 shapes. The relevant ob-servation, however, is the absence of structuralfeatures that rearrange locally, particularly at lowtemperatures (when there is more structure). Inother words, temperature variations induce glo-bal refoldings of the shape. Furthermore, eachstructure in the melting series remains stable onlyover a small temperature range. As a consequence,the heat capacity consists of several small andclosely occurring humps.

The canalized sequence, in contrast, has ahighly localized melting behavior. We easily iden-tify structural features that melt individually atdistinct temperatures without affecting the integ-

rity of other parts of the structure. Often, but notalways, such structural units coincide with com-ponents in the sense of Fig. 1. The structure ofthe canalized sequence is stable up to 61°C, whenthe large T-like feature at the 5′ end disappearsalmost entirely in a single step. Remarkably, theother feature at the 3′ end is not affected, despitea new open sequence segment that is now avail-able for interaction. The 3′ feature melts at itsown transition temperature of 89°C in a singlestep. The major transitions from #1 to #2 and from#3 to #4 are well separated and marked by twosharp peaks in the heat capacity.

The neutrally evolved sequence occupies amiddle ground. It has fewer transitions and morehighly preserved structural similarity across tran-sitions than the inverse folded sequence, but to a

Fig. 22. Melting behavior I. Left of each graph are theminimum free energy structures in the temperature range0–100°C for three sequences obtained by inverse folding, neu-tral evolution, and canalization on the same structure (Fig.

15, left). The graphs trace the temperature dependence ofthe specific heat (H) and the Boltzmann probabilities of theindividual structures in the melting series.

Page 36: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 277

lesser extent than the canalized sequence. It de-parts significantly from the canalized sequence inits lack of independently melting features. For ex-ample, the transition from #1 to #2, involving thehairpin structure at the 3′ end, neither melts thisfeature completely nor does it preserve some ofits subfeatures. The same holds for the structureportion at the 5′ end.

This comparative analysis of melting behaviorssuggests one facet of “modularity”: the thermo-physical independence of a structural trait fromother traits over a wide temperature range. Notethat this is a quite different notion of “unit” thanwhat is obtained from parsing a shape into struc-tural units based on morphology alone. In fact,all three sequences considered have the same mor-phology at 37°C, but only the canalized sequenceis modular in the sense of thermophysically au-tonomous units.

Figure 23 illustrates the same points. Note,however, that the modules evidenced in Fig. 22are larger structural assemblies than in the caseof Fig. 23.

5.2 Kinetics: modularity and funnelsA recent stochastic model of kinetic RNA fold-

ing at elementary step resolution permits thestudy of folding pathways (Flamm et al., ’99). Thefolding kinetics of our three sequences provides afurther perspective on modularity. The foldingpathways of RNA play a role analogous to devel-opmental pathways of organisms.

The tree graphs in Fig. 24 represent differentorganizations of the energy landscape on whichthe folding process occurs. For each of our threesequences on the left of Fig. 15 the portion of theenergy landscape shown comprises the 50 lowestlocal free energy minima (courtesy Christoph

Fig. 23. Melting behavior II. See caption to Fig. 22. The 37°C ground state structure forthese sequences is the one pictured on the right of Fig. 15.

Page 37: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

278 L. ANCEL AND W. FONTANA

Flamm, University of Vienna). A local minimumcorresponds to a leaf, and leaves are grouped intobasins which are further linked to one another ina hierarchical fashion. The heights of internalnodes represent energy barriers connecting twolocal minima or their basins (Flamm et al., ’99).In other words, the height of the lowest internalnode connecting two leaves represents the ener-getic requirement of refolding from one structureinto the other. The lower right graph of Fig. 24shows the distribution of folding times (first pas-sage times from the unfolded sequence to the na-tive structure) for these sequences.

The differences are striking. Not only does the

Fig. 24. Folding kinetics and energy landscapes. Thegraph on the lower right shows the folding time distributionsfor the three classes of sequences shown on the left of Fig.15, based on a stochastic model of kinetic folding whose el-ementary moves consist in the making, breaking and shift-ing of a single base pair (Flamm et al., ’99). Images (1), (2),

and (3) are the inverse folded, neutrally evolved, and cana-lized sequences, respectively. The trees depict the energylandscape associated with each sequence in terms of the hi-erarchical organization of barriers separating individualstates and their basins. The distance from the root (top) ofthe tree represents the free energy.

canalized sequence fold much more rapidly thanthe others, but it folds predominantly along onewell-defined pathway and has, therefore, onedominant time scale. Its energy surface resemblesa funnel (Bryngelson and Wolynes, ’87; Dill andChan, ’97), suggesting that individual structuralunits fold independently from one another. Se-quence segments of one unit are unlikely tocrossfold with segments of other units, whichwould cause traps delaying the formation of thenative structure. (There is a folding trap [notshown], visited with low probability, that accountsfor the tail of the distribution. It is due to earlydiffusion among several high-energy shapes until

Page 38: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 279

the molecule drops into the funnel and folds). Incontrast, the inverse folded sequence has an en-ergy landscape without much structure, as wellas high barriers separating individual states. Thesequence can misfold in many ways, giving riseto a broad distribution of folding times. The neu-trally evolved sequence exhibits two funnels, oneof which leads to a misfold.

5.3. Context insensitivityWe used the melting behavior and the organi-

zation of the energy landscape to identify modu-larity with respect to temperature change andfolding dynamics. We next turn to modularity asidentified through genetic perturbation.

Point mutations cause only local disruption (ifany) to the minimum free energy structure of alow-plasticity sequence. A thermodynamicallywell-defined stacking region is unlikely to unfoldcompletely when a point mutation knocks out asingle base pair, while a marginally stable stack-ing region is likely to unwind, leading to a globalrearrangement of the structure. Modules bufferagainst extensive rearrangements. This aspect of

modularity is similar to low pleiotropy in anorganismal context (Wagner and Altenberg, ’96).

Perhaps the most defining property of modulesis the maintenance of their structural integrityacross changing genetic contexts. We slice fromeach sequence the segment st that folds into a par-ticular structural unit t. We then paste to the leftand right of st random segments half its size andask whether st still folds into t despite the newgenetic embedding. The chart in Fig. 25 showsthe fraction of 1,000 such foldings that maintaint. Indeed, canalized sequences have structuralcomponents that are much more context indepen-dent than inverse folded sequences. In a few cases,neutrally evolved sequences withstand contextualmodification as well as the canalized ones.

To summarize, plastogenetic congruence statesthat plasticity (the environmental variance of phe-notype) mirrors variability (the mutational sensi-tivity of phenotype). This section on modularity,in essence, further elaborates this theme. Ther-modynamic and kinetic autonomy of units, asmanifest in the norms of reaction to temperatureand the organization of the energy landscape, cor-relates to the autonomy of those same units withrespect to changing genetic contexts. A computa-tional analysis of naturally occurring sequencessuggests that functionally important structureshave heightened context insensitivity (Wagner andStadler, ’99). Recall, however, that in our simula-tions sequences were never selected for modular-ity, only for reduced plasticity. Direct selectionpressures for modularity may exist, but this analy-sis demonstrates that the emergence of modular-ity does not require them. Modularity arises, likegenetic canalization, as a byproduct of environ-mental canalization.

6. DISCUSSION

Biological evolution changes not only the frequen-cies of extant phenotypes, but the phenotypes them-selves. A population genetic analysis of the fate ofinnovations under natural selection provides onlya partial story that must be integrated with atheory of phenotypic innovations (Buss, ’87).

We turn our attention to a simple but nontrivialevolvable object: an RNA molecule. RNA providesboth a theoretically and empirically well-charac-terized high-dimensional relation between geno-type (sequence) and phenotype (structure). Themain algorithms for RNA folding were developedtwenty years ago (Nussinov et al., ’78; Watermanand Smith, ’78; Zuker and Stiegler, ’81), not withthe present questions in mind but rather as a tool

Fig. 25. Context insensitivity of modules. The sequencesegments underlying the 5′ structural component (A) and the3′ component (B) of the shapes on the left and right of Fig.15, respectively, are embedded in random sequence contexts.The chart shows the frequency with which the segment re-tained its original structure if it originated in an inversefolded, neutrally evolved, and canalized sequence.

Page 39: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

280 L. ANCEL AND W. FONTANA

to assist experimentalists. This divergence of ap-plications suggests that inscription errors (thatis, the construction of models which predeterminethe desired output) are less likely here than forgenotype–phenotype models constructed with par-ticular definitions of epistasis, plasticity, or modu-larity in mind.

Recent advances in RNA folding (McCaskill, ’90;Flamm et al., ’99; Wuchty et al., ’99) enable us tointroduce and analyze a form of environment–geneinteraction which we call plasticity. The result isa powerful model system in which concepts likeplasticity, evolvability, epistasis, and modularitynot only can be precisely defined and statisticallymeasured, but reveal simultaneous and pro-foundly non-independent effects of natural selec-tion. Although these concepts were introduced inthe study of organismal evolution, we demonstratethat they apply to the molecular domain as welland are optimistic that lessons from RNA may, inturn, provide robust insight.

The secondary structure into which an RNA se-quence folds is determined not by the primary se-quence alone but also by environmental inputssuch as temperature and the presence of otherpotentially interacting molecules. As a surrogatefor such environmental factors, we map a se-quence to a repertoire of its thermodyanamicallymost stable structures. We assume that theBoltzmann coefficient—a variable reflecting thethermodynamic stability of a structure relative toall other structures within the configuration spaceof a sequence—is proportional to the time thatsequence would spend in the given structure un-der a heterogeneous environment.

The most striking outcome of our simulationsis the dramatic loss of variability that accompa-nies the evolutionary reduction of diversity in theplastic repertoires. Recall that variability is thepotential of a population of sequences to innovatephenotypically. We gain a deeper understandingof the loss of variability through two lines of in-quiry. First we construct the causal bridge fromthe assumptions of our model—point mutation, aplastic genotype–phenotype map, and fitnessbased on the average structural distance to tar-get over all structures in a plastic repertoire—tothe observed evolutionary dead-end. Second, wecharacterize in as many dimensions as possiblethe typical genotype and phenotype distributionfor a steady state population evolved under theplastic map. These two objectives are highly in-terrelated. The link between plasticity and vari-ability is shown by a close look at the evolving

distribution of sequences and their shape reper-toires.

The loss of variability stems from two simpleobservations:

1. The more variation in the plastic repertoire,the less time a sequence spends in its beststructure. In this way, plasticity is costly andis ultimately reduced by natural selection inconstant environments.

2. There is a significant overlap between theshapes in the plastic repertoire of a sequenceand the set of minimum free energy struc-tures of genetically proximate sequences, i.e.,of sequences that differ from it by one muta-tion. We call this property plastogenetic con-gruence.

The first observation is a straightforward con-sequence of our (biophysically motivated) plasticfitness function. We verify the second, which restson the intuition that a point mutation can tip thefolding landscape of a sequence in favor of anylow-energy structure, through several statisticalassays. Phenocopies, environmentally triggeredtraits that correspond to mutant phenotypes, pro-vide evidence for the generality of plastogeneticcongruence in nature. The effects of high tempera-ture on moth antenna morphology (Goldschmidt,’41), of ether on Drosophila melanogaster thoracicdevelopment (Waddington, ’42; Gibson and Hog-ness, ’96) and of gold foil on the fibular crest inbirds (Müller, ’90) among many other environmen-tal perturbations have been shown to mimic knownmutants or ancestral morphologies (Stearns, ’93;Schlichting and Pigliucci, ’98). Because the alignmentbetween environmental variability (plasticity) andgenetic variability has met the skepticism of evo-lutionary geneticists, it is not well-integrated intomainstream evolutionary thinking. Our exampleof the plastic RNA map illustrates its relevance toevolutionary theory and instantiates the claim thatplastogenetic congruence is a fairly ubiquitousproperty of genotype–phenotype relationships.

Under the conditions studied here, natural se-lection reduces plasticity. Plastogenetic congruencethen implies that a drop in diversity in the plas-tic repertoire entails a drop in the diversity ofminimum free energy structures in the one-errorneighborhood (the set of one-error mutants of agiven sequence). Natural selection on plasticityindirectly curtails phenotypic novelty accessibleby mutation, and hence the potential to evolve.

Through idealized models of a plastic popula-

Page 40: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 281

tion, we formalize this explanation for the declinein evolvability. These models become analyticallytractable under the assumption that the plasto-genetic congruence is perfect. In other words, theplastic repertoire for any given sequence is ex-actly the set of minimum free energy structuresin the one-error neighborhood. As the fraction ofone-error mutants with identical minimum freeenergy structure (neutrality) increases, the abil-ity to evolve decreases.

The models formally connect mutation rate, thetopography of a phenotype space and evolvability.Assuming that the population has reached a neu-tral network relatively close to the target shape,we identify three phases of equilibrium distribu-tions: the exploration catastrophe, when the popu-lation is concentrated in a highly neutral regionand so cannot access phenotypic novelty; the er-ror catastrophe, when the population falls off theneutral network into the rest of genotype spacewith on average much lower fitness; and the idealphase, when the population remains in regions ofthe neutral network that have mutational accessto the rest of phenotype space. The fate of thepopulation as mutation increases depends on thestructure of the neutral network. For some, in-creasing mutation rate takes the population fromthe exploration catastrophe through the idealphase to the error catastrophe. For many neutralnetworks, however, the exploration threshold ex-ceeds the error threshold. Upon increasing muta-tion rate, a population goes immediately from anexploration catastrophe to an error catastrophe.Simulation suggests that this is the predicamentof our steady state RNA populations under theplastic map.

Plastogenetic congruence is a robust statisticalfeature of the RNA folding map from sequencesto secondary structures with broad population ge-netic implications. In particular, it instantiates thehypothesis put forward by Wagner et al. (’97) thatgenetic canalization—buffering against phenotypiceffects of mutation—occurs as a byproduct of en-vironmental canalization—the evolution of resil-ience to environmental perturbation.

In meeting the second objective, a character-ization of the phenotypic consequences of natu-ral selection, we discovered a second, equallyremarkable byproduct of environmental canaliza-tion: modularity. By modules we do not mean asyntactical property of structures (which is trivialin RNA) but rather autonomous components thatmaintain their structural integrity across a broadrange of environmental and genetic contexts and

that lose integrity through sudden and discretesteps without affecting the remaining structure.Modularity manifests itself as a resistance to sus-tained environmental or genetic perturbation, andthe dissolution of modules translates into sharpand well separated phase transitions. In section5, we compare thermophysical, genetic and kineticaspects of modularity across sequences that weregenerated by different processes but share thesame minimum free energy structure. Modular-ity appears thermophysically as distinct meltingtemperatures of structural components that van-ish upon melting, leaving an open chain segment,rather than a different structural arrangement.Modularity appears kinetically as a single primaryfolding funnel over the configurational landscapeof a sequence. There is little probability of mis-folding because the folding of a modular compo-nent does not interfere with the folding of othercomponents. Modularity appears genetically as cas-sette-like behavior, by which modular structuralcomponents have a markedly increased probabil-ity of maintaining their integrity if transplantedinto different sequence contexts. This agrees withthe principles of RNA architecture discoveredthrough recent crystallizations of catalytic RNAs[e.g., Cech et al. (’94); for an overview see Westhofet al. (’96)].

Modularity is both a manifestation of evolution-ary lock-in, and provides the basic tool for escap-ing it. The evolutionary stability of modules makesthem, in conjunction with their context-insensi-tivity (transposability), natural building blocks forconstructing novelty at a higher combinatoriallevel (Wagner and Altenberg, ’96). The shift to-ward a combinatorics of modular elements to es-cape evolutionary lock-in appears to be less of aconvenient ad hoc innovation in evolutionary pro-cess than the only route left, once modules ariseas a byproduct of environmental canalization ina constant environment.

Throughout this work, we hold the temperatureand target shape constant, and find an intuitiveevolutionary loss of plasticity and some surprisingcorrollaries. What happens, though, when the en-vironment is not constant? In particular, what con-ditions favor the maintenance of plasticity? Wehope to analyze several modes of environmentalvariability: temperature heterogeneity, target fluc-tuations, and joint folds with co-occuring molecules.

ACKNOWLEDGMENTSThanks to our colleagues Ivo Hofacker, Peter

Stadler, and Stefan Wuchty for their contributions

Page 41: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

282 L. ANCEL AND W. FONTANA

to the Vienna RNA package. We are grateful to LeoBuss, Marc Feldman, Christoph Flamm, PeterGodfrey-Smith, Ellen Goldberg, James Griesemer,Martijn Huynen, Erica Jen, Michael Lachmann,Laura Landweber, Mark Newman, Erik van Nim-wegen, Peter Schuster, and Andreas Wagner forvaluable discussions and comments on the manu-script. Günter Wagner spotted several mistakesin a previous draft. Junhyong Kim’s thoughtfulcomments led to a much improved revision.Christoph Flamm kindly provided the folding ki-netics of section 5. Matt Bell, 1999 REU summerintern at the Santa Fe Institute, shared valuableinsight and back-up computations. The presentwork was supported by a Steinmetz Fellowshipand a U.S. National Defense Science and Engi-neering Fellowship to L.A., by NIH Grant GM-28016 to M.W. Feldman, by the Keck Foundation,and by core grants to the Santa Fe Institute fromthe John D. and Catherine T. MacArthur Foun-dation, the National Science Foundation, and theU.S. Department of Energy. W.F. and his researchprogram are supported by Michael A. Grantham.

REFERENCESAltenberg L. The schema theorem and Price’s theorem. In:

Whitley D, Vose MD, editors, Foundations of Genetic Algo-rithms, pages 23–49. MIT Press, Cambridge, MA, 1995.

Ancel LW. 1999a. A quantitative model of the Simpson–Baldwin effect. J Theor Biol 196:197–209.

Ancel LW. 1999b. Undermining the Baldwin expediting ef-fect: does phenotypic plasticity accelerate evolution? Theo-retical Population Biol (in press).

Baldwin JM. 1896. A new factor in evolution. Am Nat30:441–451.

Bartel DP, Szostak JW. 1993. Isolation of new ribozymes froma large pool of random sequences. Science 261:1411–1418.

Beaudry AA, Joyce GFR. 1992. Directed evolution of an RNAenzyme. Science 257:635–641.

Bonner JT. 1988. The evolution of complexity. Princeton, NJ:Princeton University Press.

Bornberg-Bauer E, Chan HS. 1999. Modeling evolutionarylandscapes: mutational stability, topology, and superfunnelsin sequence space. Proc Natl Acad Sci USA 96:10689–10694.

Bryngelson JD, Wolynes PG. 1987. Spin glasses and the sta-tistical mechanics of protein folding. Proc Natl Acad Sci USA84:7524–7528.

Buss LW. 1987. The evolution of individuality. Princeton, NJ:Princeton University Press.

Bussemaker HJ, Thirumalai D, Bhattacharjee JK. 1997. Ther-modynamic stability of folded proteins against mutations.Phys Rev Lett 79:3530–3533.

Cech TR, Damberger SH, Gutell RR. 1994. Representation ofthe secondary and tertiary structure of group I introns. NatStruct Biol 1:273–280.

Dill KA, Chan HS. 1997. From Levinthal to pathways to fun-nels. Nat Struct Biol 4:10–19.

Eigen M. 1971. Selforganization of matter and the evolu-tion of biological macromolecules. Naturwissenschaften58:465–523.

Eigen M, McCaskill JS, Schuster P. 1989. The molecularquasispecies. Adv Chem Phys 75:149–263.

Ekland EH, Szostak JW, Bartel DP. 1995. Structurally com-plex and highly active RNA ligases derived from randomRNA sequences. Science 269:364–370.

Ellington AD. 1994. RNA selection. Aptamers achieve the de-sired recognition. Curr Biol 4:427–429.

Ellington AD, Szostak JW. 1990. In vitro selection of RNAmolecules that bind specific ligands. Nature 346:818–822.

Flamm C, Fontana W, Hofacker IL, Schuster P. 2000. RNAfolding at elementary step resolution. RNA 6:325–338.

Fontana W, Schuster P. 1987. A computer model of evolution-ary optimization. Biophys Chem 26:123–147.

Fontana W, Schuster P. 1998a. Continuity in evolution: onthe nature of transitions. Science 280:1451–1455.

Fontana W, Schuster P. 1998b. Shaping space: the possibleand the attainable in RNA genotype–phenotype mapping.J Theor Biol 194:491–515.

Fontana W, Konings DAM, Stadler PF, Schuster P. 1993a.Statistics of RNA secondary structures. Biopolymers33:1389–1404.

Fontana W, Stadler PF, Bornberg-Bauer EG, Griesmacher T,Hofacker IL, Tacker M, Tarazona P, Weinberger ED,Schuster P. 1993b. RNA folding and combinatory landscapes.Phys Rev E 47:2083–2099.

Freier SM, Kierzek R, Jaeger JA, Sugimoto N, Caruthers MH,Neilson T, Turner DH. 1986. Improved free-energy param-eters for prediction of RNA duplex stability. Proc Natl AcadSci USA 83:9373–9377.

Gavrilets S, Hastings A. 1994. A quantitative-genetic modelfor selection on developmental noise. Evolution 48:1478–1486.

Gibson G, Hogness DS. 1996. Effect of polymorphism in theDrosophila regulatory gene ultrabithorax on homeotic sta-bility. Science 271:200–203.

Gillespie DT. 1976. A general method for numerically simu-lating the stochastic time evolution of coupled chemical re-actions. J Comp Phys 22:403–434.

Gillespie DT. 1977. Exact stochastic simulation of coupledchemical reactions. J Phys Chem 81:2340–2361.

Goldschmidt RB. 1940. The material basis of evolution. NewHaven, CT: Yale University Press.

Govindarajan S, Goldstein RA. 1997. The foldability landscapeof model proteins. Biopolymers 42:427–438.

Grüner W, Giegerich R, Strothmann D, Reidys C, Weber J,Hofacker IL, Stadler PF, Schuster P. 1996a. Analysis of RNAsequence structure maps by exhaustive enumeration. I. Neu-tral networks. Monatsh Chem 127:355–374.

Grüner W, Giegerich R, Strothmann D, Reidys C, Weber J,Hofacker IL, Stadler PF, Schuster P. 1996b. Analysis of RNAsequence structure maps by exhaustive enumeration. II.Structure of neutral networks and shape space covering.Monatsh Chem 127:375–389.

Hartwell LH, Hopfield JJ, Leibler S, Murray AW. 1999. Frommolecular to modular cell biology. Nature 402:C47–C52.

He L, Kierzek R, SantaLucia J, Walter AE, Turner DH. 1991.Nearest-neighbour parameters for G-U mismatches. Bio-chemistry 30:11124.

Hinton GE, Nowlan SJ. 1987. How learning can guide evolu-tion. Complex Systems 1:495–502.

Hofacker IL, Fontana W, Stadler PF, Bonhoeffer S, TackerM, Schuster P. 1994. Fast folding and comparison of RNAsecondary structures. Monatsh Chem 125(2):167–188.

Hofacker IL, Fontana W, Stadler PF, Schuster P. 1994–1998.Vienna RNA Package. http://www.tbi.univie.ac.at/~ivo/RNA/(free software).

Page 42: Plasticity, Evolvability, and Modularity in RNAevolvability, epistasis, and modularity can not only be precisely defined and statistically measured but also reveal simultaneous and

PLASTICITY AND EVOLVABILITY 283

Hofacker IL, Schuster P, Stadler PF. 1999. Combinatorics ofRNA secondary structures. Discrete Appl Math 89:177–207.

Huynen MA, Stadler PF, Fontana W. 1996. Smoothness withinruggedness: The role of neutrality in adaptation. Proc NatlAcad Sci USA 93:397–401.

Huynen MA, Gutell R, Kongings D. 1997. Assessing the reli-ability of RNA folding using statistical mechanics. J MolBiol 267:1104–1112.

Jaeger JA, Turner DH, Zuker M. 1989. Improved predictionsof secondary structures for RNA. Proc Natl Acad Sci USA86:7706–7710.

Joyce GF. 1989. Amplification, mutation and selection of cata-lytic RNA. Gene 82:83–87.

Landweber LF. 1999. Experimental RNA evolution. TrendsEcol Evolution (TREE) 14:353–358.

Landweber LF, Pokrovskaya ID. 1999. Emergence of a dual-catalytic RNA with metal-specific cleavage and ligase ac-tivities: the spandrels of RNA evolution. Proc Natl Acad SciUSA 96:173–178.

Maynard-Smith J. 1987. Natural selection: when learningguides evolution. Nature 329:761–762.

McCaskill JS. 1990. The equilibrium partition function andbase pair binding probabilities for RNA secondary struc-ture. Biopolymers 29:1105–1119.

Mills DR, Peterson RL, Spiegelman S. 1967. An extracellularDarwinian experiment with a self-duplicating nucleic acidmolecule. Proc Natl Acad Sci USA 58:217–224.

Müller GB. 1990. Developmental mechanisms at the originof morphological novelty: a side-effect hypothesis. In: NiteckiMH, editor. Evolutionary Innovations. Chicago: Universityof Chicago Press. p 99–130.

Nussinov R, Jacobson AB. 1980. Fast algorithm for predict-ing the secondary structure of single-stranded RNA. ProcNatl Acad Sci USA 77(11):6309–6313.

Nussinov R, Piecznik G, Griggs JR, Kleitman DJ. 1978. Algo-rithms for loop matching. SIAM J Appl Math 35(1):68–82.

Reidys C, Stadler PF. 1996. Bio-molecular shapes and alge-braic structures. Comput Chem 20:85–94.

Reidys C, Stadler PF, Schuster P. 1997. Generic properties ofcombinatory maps: neutral networks of RNA secondarystructures. Bull Math Biol 59:339–397.

Reidys C, Forst CV, Schuster P. 1998. Replication and muta-tion on neutral networks. Preprint SFI 98-04-036. Avail-able electronically from www.santafe.edu. Santa Fe: SantaFe Institute.

Scheiner SM. 1993. Genetics and evolution of phenotypic plas-ticity. Annu Rev Ecol Syst 24:35–368.

Schlichting CD, Pigliucci M. 1998. Phenotypic evolution: areaction norm perspective. Sunderland, MA: Sinauer Asso-ciates, Inc.

Schmalhausen II. 1949. In: Dobzhansky T, editor. Factors ofevolution. Philadelphia: Blakiston.

Schuster P. 1997. Landscapes and molecular evolution.Physica D 107:351–365.

Schuster P, Fontana W. 1999. Chance and necessity in evolu-tion: Lessons from RNA. Physica D: Nonlinear Phenomena133:427–452.

Schuster P, Fontana W, Stadler PF, Hofacker I. 1994. From

sequences to shapes and back: a case study in RNA second-ary structures. Proc R Soc (London) B 255:279–284.

Simpson GG. 1953. The Baldwin effect. Evolution 7:110–117.Spiegelman S. 1971. An approach to the experimental analy-

sis of precellular evolution. Q Rev Biophys 4:213–253.Stearns SC. 1993. The evolutionary links between fixed and

variable traits. Acta Paleontol Pol 38:1–17.Stein PR, Waterman MS. 1978. On some new sequences gen-

eralizing the Catalan and Motzkin numbers. Discrete Math26:261–272.

Tuerk C, Gold L. 1990. Systematic evolution of ligands byexponential enrichment (SELEX): RNA ligands to bacte-riophage T4 DNA polymerase. Science 249:505–510.

Turner DH, Sugimoto N, Freier S. 1988. RNA structure pre-diction. Annu Rev Biophys Biophys Chem 17:167–192.

van Nimwegen E, Crutchfield JP, Huynen M. 1999a. Neutralevolution of mutational robustness. Preprint SFI 99-03-021.Santa Fe: Santa Fe Institute. (Available electronically fromwww.santafe.edu.)

van Nimwegen E, Crutchfield JP, Mitchell M. 1999b. Statis-tical dynamics of the royal road genetic algorithm. TheorComp Sci 229:41–102.

Vendruscolo M, Maritan A, Banavar JR. 1997. Stabilitythreshold as a selection principle for protein design. PhysRev Lett 78:3967–3970.

Waddington CH. 1942. Canalization of development and theinheritance of acquired characters. Nature 3811:563–565.

Waddington CH. 1957. The strategy of the genes. New York:MacMillan Co.

Wagner A. 1996. Does evolutionary plasticity evolve? Evolu-tion 50(3):1008–1023.

Wagner A, Stadler PF. 1999. Viral RNA and evolved muta-tional robustness. J Exp Zool/MDE 285:119–127. (SFI work-ing paper 99-02-010, available from http://www.santafe.edu.)

Wagner GP, Altenberg L. 1996. Complex adaptations and theevolution of evolvability. Evolution 50:967–976.

Wagner GP, Booth G, Bagheri-Chaichian H. 1997. A popula-tion genetic theory of canalization. Evolution 51:329–347.

Wagner GP, Laubichler MD, Bagheri-Chaichian H. 1998. Ge-netic measurement theory of epistatic effects. Genetica 102/103:569–580.

Waterman MS. 1995. Introduction to computational biology:sequences, maps and genomes. London: Chapman & Hall.

Waterman MS, Smith TF. 1978. RNA secondary structure: acomplete mathematical analysis. Math Biosci 42:257–266.

Weber J. 1997. Dynamics of neutral evolution. Ph.D. disser-tation, Friedrich-Schiller-Universität, Jena, Germany, 1997(available electronically from http://www.tbi.univie.ac.at).

Westhof E, Masquida B, Jaeger L. 1996. RNA tectonics: to-wards RNA design. Folding Design 1:R78–R88.

Wuchty S, Fontana W, Hofacker IL, Schuster P. 1999. Com-plete suboptimal folding of RNA and the stability of sec-ondary structures. Biopolymers 49:145–165.

Zuker M, Stiegler P. 1981. Optimal computer folding of largerRNA sequences using thermodynamics and auxiliary infor-mation. Nucleic Acids Res 9:133–148.

Zuker M, Sankoff D. 1984. RNA secondary structures andtheir prediction. Bull Math Biol 46(4):591–621.