Evolution of gene regulatory networks: Robustness as an emergent property of evolution



Physica A 387 (2008) 2170–2186
www.elsevier.com/locate/physa

Evolution of gene regulatory networks: Robustness as an emergent property of evolution

Arun Krishnan a, Masaru Tomita a, Alessandro Giuliani b,∗

a Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
b Dept. of Environment and Health, Istituto Superiore di Sanita, Rome, Italy

Received 11 July 2007; received in revised form 1 October 2007
Available online 19 November 2007

Abstract

Gene Regulatory Networks (GRNs) have become a major focus of interest in recent years. Although much work has been done in elucidating the transcriptional regulatory network, the underlying mechanisms that have possibly influenced the evolution of these GRNs are still debatable. We have developed a framework to analyze the effect of objective functions, input types and starting populations on the evolution of GRNs with a specific emphasis on the robustness of evolved GRNs.

We observed that robustness evolves along with the networks as an emergent property even in the absence of specific selective pressure towards more robust systems. In addition, robustness was independent of the selective pressure, input types or the initial starting populations. We also observed the existence of multiple genotypes giving rise to the same phenotype in accordance with the theoretical view that natural selection operates on phenotypes thereby accommodating variation in the genotype by fixing those changes that are phenotype-neutral.

This study gives a proof-of-concept of the fact that robustness is an emergent property of GRNs as well as of the degeneracy of the network topology/function relationship analogous to the sequence/structure problem in proteins.

© 2007 Elsevier B.V. All rights reserved.

PACS: 87.17.-d; 82.39-k; 84.35.+1

Keywords: Networks; Gene expression regulation; Computational biology

1. Introduction

Gene Regulatory Networks (GRNs) have become a major focus of interest in recent years due to the rapid improvement in high-throughput sequencing technologies allied with new experimental strategies and advances in computational modeling and information technology. The basic unit of gene regulation consists of a transcription factor, its DNA binding site and the target gene or transcription unit that it regulates [1]. In a GRN, transcription factors (TFs) receive inputs from upstream signal transduction processes and, in response, bind directly or indirectly (via other TFs or cofactors) to target sequences in the promoter or cis-regulatory regions of target genes. These bound TFs can then promote or repress transcription by stimulating or repressing the assembly of preinitiation complexes.

The resulting network is a complex, multilayered system that can be examined at multiple levels of detail [2]. Much of the work related to GRNs has focused on the elucidation of regulatory networks from time-series gene

∗ Corresponding author. Tel.: +39 0649902579.
E-mail address: [email protected] (A. Giuliani).

0378-4371/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.physa.2007.11.022


expression data [3] as well as on the analysis of the dynamics of such systems [4–17]. The modeling of GRNs has utilized two key approximations [18]. These are: (a) control is exercised at the transcriptional level, and (b) the production of protein product is a continuous process with the rate determined by the balance of gene activation versus repression. The first constraint, even though it is known not to be tenable in many cases, is considered a prerequisite while dealing with GRNs. In our approach we relaxed the strict transcriptional character of control by inserting a post translational modification (PTM) mechanism into the simulation.

Recent approaches have dispensed with the second approximation by including the stochastic nature of the production of individual protein molecules, which is a much more realistic representation than the continuous one given the small number of molecules involved in a given instance of the modeled network. Methods used to model transcriptional control include both "Boolean" [7,19–22] and differential equation approaches [4,14,23–28] as well as hybrid methods [29,30]. The interested reader is referred to papers by Smolen et al. [18] and de Jong [31] for a more exhaustive review of the existing modeling approaches. In an earlier work [32], we showed, using a similar modeling approach, that reverse engineering of GRNs solely on the basis of expression data is an indeterminate problem since multiple genotypes can map to the same phenotype.

Studies have also been carried out towards understanding the structural principles of these networks [33,34]. It is well known that biological and engineering systems share design features such as modularity [35,36] and the reuse of network motifs [33,37]. While efforts have been made at trying to understand the origin of these design features [38–40], they have, for the most part, been focused on example engineering systems. The evolutionary mechanisms that give rise to GRNs are still largely unknown. There have been efforts at understanding particular aspects of evolution, such as the correlation between development, evolution and robustness or canalization of the network [41,42]. Studies on the evolution of GRNs have tended to focus on certain a priori assumptions about the nature of the evolutionary force such as stabilizing selection [35,43–45]. Rice et al. [46] utilized a more abstract approach involving a generalized mapping of genotype to phenotype. Eschel et al. [47] presented an analytic model of genetic assimilation in which individuals randomly experienced environments favoring different phenotypic optima and concluded that genotypes differ in their sensitivities to different environments. Similar conclusions were reached by Pal et al. [48] while Kawecki [49] modeled a system in which the environmental conditions change from generation to generation. Siegal et al. [41] showed that the developmental process constrains the genetic system to produce robustness even in the absence of a selection towards optimum. Babu et al. [50] studied the evolutionary dynamics of prokaryotic transcriptional regulatory networks and noted that organisms with similar lifestyles across a wide phylogenetic range tend to conserve equivalent interactions and network motifs.

In our work, we have tried to develop a more generic framework to study GRN evolution, with the aim of testing the hypothesis that network properties such as robustness and modularity can be considered emergent properties: they are not specifically selected by evolutionary forces, which are instead driven toward much more basic goals such as reaching optimal growth of the organism. Our model consists of elements of the finite-state and stochastic algorithms and simulates the process of gene regulation including transcription, translation, activation, inhibition and Post Translational Modification (PTM). The finite-state aspect of the model is based on the assumption (as in Boolean models) that the important aspects of gene regulation can be described by binary on/off switches. For example, a protein could be bound to a particular gene's promoter region or it could be in the free state. Another example would be that of a protein which could toggle between activated and de-activated states based on its interaction with a PTM agent. Although this behavior could well have been modeled by using differential equations, we chose to model it using a stochastic algorithm since, at the biophase, the concentration of relevant molecules (e.g. drugs binding to a given receptor molecule) is extremely low (on the order of five to ten molecules). This prevents any application of something similar to the law of mass action and implies a stochastic character of the response that we try to model with our paradigm. There is a vast literature [51–53] pointing to the need to take into consideration the intrinsic stochasticity of gene regulation networks to explain some emergent features of gene expression.

We have used a genetic algorithm to evolve a given population over successive generations, where reproduction assumes segregation followed by mutation. We study the effect of input types, evolutionary pressures as well as starting states of the population on the evolution of the network. We observe that despite differing conditions and network connectivities, robustness emerges spontaneously as a by-product of evolution. In agreement with the work of Tsong et al. [54], we observe the existence of multiple genotypes giving rise to the same phenotype in accordance with the theoretical view that natural selection operates on phenotypes thereby accommodating variation in the genotype by fixing those changes that are phenotype-neutral.


2. Methods

Our approach towards the simulation of GRNs is a mix of the finite-state model pioneered by Brazma et al. [55] and stochastic simulation. The model is based on the following assumptions:

• Each gene has a number of TF binding sites in its promoter region.
• Each protein has a number of binding domains, with each binding domain able to bind to a specific gene.
• The binding of a single activating protein to a specific site creates a complex that can then be bound to by RNA Polymerase (RNAP) molecules.
• The binding of a single repressive protein molecule to a binding site creates a complex that can no longer be bound to by RNAP molecules.
• An "active" gene is thus denoted by the presence of the corresponding complexes that can be bound to by RNAP molecules.
• Each protein has the possibility of undergoing PTM.
• The PTM can activate or deactivate a protein.

It must be borne in mind that our model is a simplified representation of a very complex process. Notwithstanding that, the model incorporates some basic features typical of eukaryotic regulation circuits such as post translational modifications (PTM) and the multiplicity of binding domains of the same protein. The relatively low number of elements involved derives from the explosion of optimization time with the increasing size of the GRN; nevertheless there are many regulation circuits involving even fewer genes (proteins) than our simulated systems. On the other hand, the investigated features (robustness, indeterminacy of reverse engineering) should in principle be more easily appreciated in bigger networks, thus implying that our analysis is based on restrictive (and thus more reliable) conditions. In any case, even smaller and simplistic models have been used earlier to study the evolution of GRNs [35,44,56,57].

At its most basic level, the model is a finite-state one since the state of the network depends on the binding/unbinding of proteins to the different binding sites in the promoter regions of the different genes. Each protein has binding domains for zero or more genes. Similarly, each RNAP-cofactor complex can bind to zero or more genes in order to transcribe them. The RNAP-cofactor complexes also evolve by either gaining or losing the ability to bind to and transcribe specific genes. Readers are referred to [32] and to Supplementary Materials for a figure showing the abstraction of our model given an example network.

While the genes in Brazma et al.'s model have binary (ON/OFF) states, gene activity in our model is governed by the number of molecules of the "active" gene (that is, one with promoter proteins bound to its promoter region). As a result, the model stays closer to reality, where a basal level of gene activity is present and genes are seldom seen to exhibit purely binary state behavior. Additionally, in contrast to the work by Brazma et al. [55], time in our case is discrete. Moreover, the state affects the number of molecules of each species in the system. We also model the effect of reversible PTMs. We describe the model in more detail in the following section.

2.1. Model

Following the work of Hayot et al. [58] and Ingram et al. [59], our model of the gene regulatory network attempts to describe the process of gene regulation from transcription binding to protein production in a physically reasonable way. As mentioned in Ref. [59], each gene (say i) is represented as having a section of DNA (D_i) which codes for the corresponding mRNA (M_i). This is preceded by the binding of transcription factors to the promoter region to form a complex Q_i. The transcription factors are one among the different protein species that are present in the system. The number of proteins in the system usually consists of the inputs to the system as well as the products of the number of genes in the system. However, there can be a higher number of protein species than genes. This is in order to cater for all types of transcriptional regulators and will be discussed in greater detail below.

RNAP molecules (in combination with other cofactors) can then bind to Q_i as they read the DNA, forming a second complex Q_i^*. This complex then breaks down on completion of the reading, thereby releasing Q_i, R_i and the newly formed M_i. The mRNA molecules are then translated to produce copies of the protein P_i.

Both positive and negative regulation have been included in the model. In case of negative regulation, protein P_i binding to the promoter region of gene j will result in the formation of a complex Q̄_i. These molecules cannot be bound to by RNAP-cofactor complex molecules and hence repress the particular gene by inhibiting transcription. The inhibition however is not independent of the binding order. Thus a regulator that inhibits the expression of a gene can only bind a promoter region that has not been already bound by any other transcriptional regulator.


Table 1
Species present in our model

Species  Description
I        Input proteins (activating signals)
D        DNA molecules
Q        Transcription factor-DNA complexes (active)
Q∗       RNAP-cofactor-Q complexes
Q̄        Transcription factor-DNA complexes (inactive)
R        RNAP-cofactor complexes
M        mRNA molecules
P        Protein molecules
P∗       Active/Inactive protein molecules for proteins requiring PTM
T        PTM agents
NULL     Null molecules for mono-molecular reactions

Proteins can also undergo PTMs. PTMs are of two types: activating and inhibiting. An activating PTM promotes the activity of the protein while an inhibiting PTM deactivates the protein. It must be mentioned that PTMs in our model are reversible. The species R can be viewed as either RNAP by itself or as an RNAP-cofactor complex. Typically, in our simulations, while simulating only the RNAP molecule, a single R species was utilized whereas multiple R species implied that different RNAP-cofactor complexes were part of the system.

There are 11 species types present in our model as shown in Table 1. The allowed reactions between these species types are given below:

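\[
\begin{aligned}
D + I &\underset{k_1'}{\overset{k_1}{\rightleftharpoons}} Q \\
D + P &\underset{k_1'}{\overset{k_1}{\rightleftharpoons}} Q \\
D + P &\underset{k_1'}{\overset{k_1}{\rightleftharpoons}} \bar{Q} \\
Q + R &\underset{k_2'}{\overset{k_2}{\rightleftharpoons}} Q^{*} \\
Q^{*} &\xrightarrow{k_3} Q + R + M \\
M &\xrightarrow{k_4} M + P \\
M &\xrightarrow{k_5} \emptyset \\
P &\xrightarrow{k_6} \emptyset \\
P + T &\underset{k_7'}{\overset{k_7}{\rightleftharpoons}} P^{*} + T
\end{aligned}
\tag{1}
\]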

The reactions between a particular section of DNA, D, and a protein P, or between the complex Q and the RNAP-cofactor complex R can only take place under certain conditions determined by the type of protein or RNAP-cofactor complex. We model each protein as having potentially up to g DNA-binding domains (one for each gene, where g is the number of genes). Similarly, the different types of RNAP-cofactor complex can bind 1, 2, 3, . . . , g DNA-transcription factor complexes Q.

There are three different types of transcription factors in the system: those that influence the expression of other transcriptional regulators but are themselves not transcriptionally regulated, those that regulate the expression of other regulators and are also transcriptionally regulated, and those that do not regulate the expression of other transcriptional regulators. While the first and the third types arise naturally out of the network evolution (that is, those that do not bind to any transcription factor or those that bind to one or more transcription factors), the second type are a consequence


Fig. 1. Simulation flowchart.

of the inclusion of more proteins than genes in the model. The extra proteins (over and above the number of genes) act as transcriptional regulators that are themselves not regulated by other regulators.

There are a finite number of PTM-agents (T ) and null molecules (NULL) in the system (see Simulation section).

2.2. Simulation

In order to better represent the low copy numbers of all these molecules in the actual cell, we simulate the reactions using a stochastic algorithm. The simulation flowchart is given in Fig. 1.

At each time instant, we pick two species at random. We check to see the compatibility of the species using the reactions given in Eq. (1). If the two molecules cannot take part in a reaction, say for example D and M, then no reaction takes place in that time interval. If however, the species can potentially interact, the subtypes of both the species are again chosen at random. If one of the two species is a P or an R, the bits for the corresponding subtype are checked to see whether the respective protein or RNAP-cofactor complex can bind to the second species. Additionally, the action of the selected protein species (positive or negative regulatory) is checked to ascertain its effect on the other species. Also, the protein is checked to see if it can undergo PTM and if so whether the PTM is activating or inhibiting.


Fig. 2. Example network: The red circles refer to inactivating PTMs. The green circles refer to activating PTMs. R_i refer to the RNAP-cofactor complexes, P_i refer to the proteins. Red and green arrows refer to activation and inhibition respectively while the violet arrows refer to transcription of the protein by the corresponding RNAP-cofactor complex. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Once all the conditions have been satisfied, a further random number r is generated. The reaction takes place only if r ≤ k_i, where k_i is the probability of occurrence of reaction i; in that case the appropriate counts are incremented and decremented according to the stoichiometric coefficients given by Eq. (1). This process is then repeated at the next time interval until the end of the simulation time. Our simulation approach is closest in ethos to that of the StochSim [60] stochastic simulator. We also make use of null molecules in order to simulate mono-molecular reactions.

We simulate the model for a total of T time intervals of δt seconds each (with δt = 0.001 for our simulations). A stochastic simulation can give different results depending on the random numbers used. Hence for each input-type/seed-type/objective-type combination, the algorithm was run 20 times and the run with the highest objective function value at the end of the evolution time was chosen.
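The pick-check-accept loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the translation and degradation reactions of Eq. (1), the names `REACTIONS` and `step` are hypothetical, the probabilities are made-up values, and choosing uniformly among the reactions available to a pair stands in for the subtype selection described in the text.

```python
import random

# Hypothetical subset of Eq. (1), keyed by the unordered pair of species picked.
# NULL is the partner used for mono-molecular reactions.
# The probabilities k are illustrative values, not taken from the paper.
REACTIONS = {
    frozenset(["M", "NULL"]): [
        (0.5, {"P": +1}),   # M -> M + P (translation)
        (0.1, {"M": -1}),   # M -> 0     (mRNA degradation)
    ],
    frozenset(["P", "NULL"]): [
        (0.05, {"P": -1}),  # P -> 0     (protein degradation)
    ],
}

def step(counts, rng):
    """Advance the system by one time interval; return True if a reaction fired."""
    a, b = rng.sample(sorted(counts), 2)       # pick two species at random
    candidates = REACTIONS.get(frozenset([a, b]))
    if candidates is None:                     # incompatible pair, e.g. D and M
        return False
    if counts[a] == 0 or counts[b] == 0:       # no molecule available to react
        return False
    k, change = rng.choice(candidates)         # assumption: uniform choice
    if rng.random() > k:                       # accept only if r <= k
        return False
    for species, delta in change.items():      # apply stoichiometric changes
        counts[species] += delta
    return True

rng = random.Random(0)
counts = {"D": 4, "M": 3, "P": 0, "NULL": 10}
fired = sum(step(counts, rng) for _ in range(1000))
```

Because most random pairs are incompatible or rejected by the acceptance test, only a fraction of the time intervals produce a reaction, which is the intended behaviour of this style of simulator.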

2.3. Network evolution

In order to study how this network evolves, we use a genetic algorithm that finds the optimum combination of domains in the different proteins and RNAP-cofactor complexes that results in the maximization of some objective function after some arbitrary simulation time T (300 s in this case). The number of genes, RNAP-cofactor complexes and proteins are denoted by g, r and p respectively. Each protein is represented using 2g + 2 bits (also called alleles) while each RNAP-cofactor complex is represented using g bits (alleles), one for each gene. The first g bits of each protein represent the binding domains to each of the g promoter regions (1 for presence and 0 for absence of the domains), while the next g bits indicate the type of regulatory action directed towards the respective proteins (1 for promotion and 0 for inhibition). The last 2 bits represent the effect of PTM, with bit 2g + 1 representing the presence or absence of PTM and bit 2g + 2 representing the nature of the PTM (1 for activating or 0 for inhibiting). For example, a protein with a representation 1001-0001-11 (separated by a hyphen for ease of understanding) implies that it can bind to the promoter regions for genes 1 and 4 and that while the regulatory action is negative for gene 1, it is positive for gene 4 (since the first and fourth bits for the second half of the bit string are 0 and 1 respectively). The protein also requires PTM for activation as suggested by the last two bits (11 implies that PTM is required and it is an activating modification). A similar representation is made for the RNAP-cofactor complex using one bit for each gene. The p proteins are encoded using a (2g + 2)p-long bit string (2g + 2 bits for each protein) and we do the same for the RNAP-cofactor complexes using an rg-long bit string. The two encodings are then concatenated to give a chromosome of length (2g + 2)p + rg bits (alleles).

Fig. 2 shows a simple four-gene network with four RNAP-cofactor complexes. Two of the proteins, P1 and P4, have PTMs with one being activating (green circle) and the other inactivating (red circle). The bit string representation for this network is given in Eq. (2).
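As a quick check of the encoding, the worked example 1001-0001-11 can be decoded programmatically. The sketch below follows the bit layout described above; the function name `decode_protein` and the dictionary layout of its result are illustrative choices, not part of the original model code.

```python
def decode_protein(bits, g):
    """Decode a (2g + 2)-bit protein string into its regulatory description."""
    bits = bits.replace("-", "")               # hyphens are only for readability
    if len(bits) != 2 * g + 2:
        raise ValueError("expected 2g + 2 bits")
    binding, action, ptm = bits[:g], bits[g:2 * g], bits[2 * g:]
    targets = {}
    for gene in range(g):
        if binding[gene] == "1":               # protein can bind this promoter
            targets[gene + 1] = "activate" if action[gene] == "1" else "inhibit"
    if ptm[0] == "0":                          # bit 2g + 1: PTM absent
        ptm_kind = None
    else:                                      # bit 2g + 2: nature of the PTM
        ptm_kind = "activating" if ptm[1] == "1" else "inhibiting"
    return {"targets": targets, "ptm": ptm_kind}

example = decode_protein("1001-0001-11", g=4)
# example == {"targets": {1: "inhibit", 4: "activate"}, "ptm": "activating"}
```

Action bits at positions where the binding bit is 0 are ignored in this sketch, since the protein cannot bind those promoters.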

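\[
\underbrace{\text{0011-0011-10}}_{P_1}\;
\underbrace{\text{0011-0001-00}}_{P_2}\;
\underbrace{\text{0000-0000-00}}_{P_3}\;
\underbrace{\text{0001-0000-11}}_{P_4}\;
\underbrace{\text{1000}}_{R_1}\;
\underbrace{\text{0100}}_{R_2}\;
\underbrace{\text{0010}}_{R_3}\;
\underbrace{\text{0001}}_{R_4}
\tag{2}
\]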
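The chromosome for the Fig. 2 network can be assembled from the substrings of Eq. (2) and its length checked against (2g + 2)p + rg. This is a small sketch; `build_chromosome` is a hypothetical helper name introduced for illustration.

```python
# Protein and RNAP-cofactor substrings as given in Eq. (2).
proteins = ["0011-0011-10", "0011-0001-00", "0000-0000-00", "0001-0000-11"]
rnaps = ["1000", "0100", "0010", "0001"]

def build_chromosome(proteins, rnaps):
    """Concatenate the protein encodings followed by the RNAP-cofactor encodings."""
    return "".join(p.replace("-", "") for p in proteins) + "".join(rnaps)

g, p, r = 4, len(proteins), len(rnaps)
chromosome = build_chromosome(proteins, rnaps)
expected_length = (2 * g + 2) * p + r * g      # 10 * 4 + 4 * 4 = 56 bits

# Each RNAP-cofactor complex R_i here has a single set bit, so it
# transcribes exactly one gene (gene i).
transcribed = [[j + 1 for j, b in enumerate(bits) if b == "1"] for bits in rnaps]
```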


Each bit string representation of a network corresponds to a specific individual in the population. A population of 40 individuals is initially seeded. Two different starting conditions have been used:

• fully connected: All proteins can bind to the promoter regions of all genes and all RNAP-cofactor complexes can bind to and transcribe all genes. The whole population consists of identical individuals.

• random network: Random networks are generated for each individual in the population.

The fully connected network represents the most plausible model as a starting point for the simulation of network evolution. A fully connected network bestows broad specificity on the regulatory actions of the transcriptional regulators, at the cost of low turnover for any single gene.

In each generation, two individuals in the population are chosen at random to mate in order to produce offspring (with a crossover probability of 0.85). Mutations can affect chromosomes that are not mating in a given generation with a certain mutation probability. Although mutation probabilities are typically very low, high mutation values (0.8) are used in order to facilitate faster convergence of the algorithm. This enables a much faster coverage of the search space, since the search space can become quite large even for fairly small networks. For example, a network containing 10 genes, 10 proteins and 2 RNAP-cofactor complexes results in a chromosome containing (2g + 2)p + rg = 220 + 20 = 240 bits.
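One generation of the mating and mutation scheme can be sketched as below. The probabilities 0.85 and 0.8 come from the text, and a mutation is taken to be a single bit flip; single-point crossover, the function names, and the way offspring are collected are assumptions made for illustration, since the source does not specify them. The chromosome length uses the 20-gene, 25-protein, 7-complex network studied in the paper.

```python
import random

CROSSOVER_P = 0.85   # crossover probability stated in the text
MUTATION_P = 0.8     # mutation probability stated in the text

def crossover(a, b, rng):
    """Single-point crossover (the crossover type is an assumption)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, rng):
    """Flip one randomly chosen bit of the chromosome."""
    i = rng.randrange(len(chrom))
    flipped = "1" if chrom[i] == "0" else "0"
    return chrom[:i] + flipped + chrom[i + 1:]

def make_offspring(population, rng):
    """Mate one random pair; mutate non-mating chromosomes with MUTATION_P."""
    i, j = rng.sample(range(len(population)), 2)
    offspring = []
    if rng.random() < CROSSOVER_P:
        offspring.extend(crossover(population[i], population[j], rng))
    for idx, chrom in enumerate(population):
        if idx not in (i, j) and rng.random() < MUTATION_P:
            offspring.append(mutate(chrom, rng))
    return offspring

g, p, r = 20, 25, 7
length = (2 * g + 2) * p + r * g               # 42 * 25 + 7 * 20 = 1190 bits
rng = random.Random(1)
population = ["".join(rng.choice("01") for _ in range(length)) for _ in range(40)]
offspring = make_offspring(population, rng)
```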

In each generation, the least fit individuals die and are replaced by new individuals. Hence, as evolution proceeds, the overall fitness of the population keeps increasing. Instead of imposing a single objective function such as stabilizing selection, we imposed two different evolutionary pressures on our GRNs. The first, termed BIOMASS, simulates a selective advantage for a faster growth rate. This is achieved by selectively enhancing the expression levels of some genes with respect to others. We decide a priori that some gene products are linked to cell growth and then select networks that result in higher values for the corresponding proteins, thereby leading to a faster growth. This can by no means be considered a biasing assumption: as a matter of fact, in nature, only a portion of gene products have an effect on the biomass increase, and there are no special reasons, at this level of definition, for us not to consider as arbitrary the specific gene products exerting such an effect. The second objective function, BIOMASS-MINLINKS, tries to increase the biomass (as in the first case) while at the same time keeping the number of interactions in the network to a minimum.

The objective function value to be maximized depends on the objective function being used. For BIOMASS, the objective function was chosen so as to maximize the difference between pre-selected protein levels and those of the rest of the proteins, while for BIOMASS-MINLINKS the objective function value was given by

\[
\frac{\text{BIOMASS fitness value}}{100} + \frac{1}{\text{numlinks}},
\]

where numlinks is the total number of edges in the network. A ranking fitness function was then used to rank the different networks based on their objective function values. In each generation, only 30 of the 40 individuals were replaced, with the 10 fittest individuals (genotypes) passing through the generations. This ensured that the high mutation rates being used did not lead to the loss of good genotypes obtained earlier in the evolutionary path.
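The BIOMASS-MINLINKS score and the retention of the 10 fittest individuals can be sketched as follows. The formula is the one given in the text; the function names, the toy scores and the plain sort used for ranking are assumptions for illustration only.

```python
def biomass_minlinks(biomass_fitness, numlinks):
    """BIOMASS-MINLINKS objective: biomass term plus a reward for fewer edges."""
    return biomass_fitness / 100 + 1 / numlinks

def keep_elite(population, scores, n_elite=10):
    """Return the n_elite individuals with the highest objective values."""
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    return [individual for _, individual in ranked[:n_elite]]

# Two hypothetical networks with equal biomass fitness: the sparser one scores higher.
dense = biomass_minlinks(200, 50)    # 2.0 + 0.02
sparse = biomass_minlinks(200, 10)   # 2.0 + 0.10

# Toy population of 40 labelled individuals with made-up scores.
population = [f"net{i}" for i in range(40)]
scores = [float(i) for i in range(40)]
elite = keep_elite(population, scores)
```

With these inputs the 10 retained individuals are `net39` down to `net30`, and the remaining 30 slots would be filled by new offspring.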

Moreover, we studied the effect of varying the type of inputs to a network. Ingram et al. [59] have shown that the type of inputs to a simple four node network influences the corresponding output. Hence, the type of inputs is also expected to play a role in the evolution dynamics. We have chosen three main input types: step-together where all the inputs are stepped together for a given period of time; step-in-seq where the inputs are stepped one after the other and step-overlap where the inputs are stepped together in an overlapping fashion, such that any two inputs are in an up-regulated state for only parts of their overall up-regulation time.
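One way to picture the three input types is as a list of (start, end) windows during which each input is up-regulated. The sketch below is only an interpretation: the function name `input_windows`, the equal window durations and the `overlap` fraction are assumptions, as the source does not give the exact timings.

```python
def input_windows(kind, n_inputs, duration, overlap=0.5):
    """Return (start, end) up-regulation windows for each input."""
    if kind == "step-together":
        # all inputs are stepped during the same period
        return [(0.0, duration)] * n_inputs
    if kind == "step-in-seq":
        # each input starts when the previous one ends
        return [(i * duration, (i + 1) * duration) for i in range(n_inputs)]
    if kind == "step-overlap":
        # consecutive inputs share a fraction `overlap` of their window
        shift = duration * (1 - overlap)
        return [(i * shift, i * shift + duration) for i in range(n_inputs)]
    raise ValueError(f"unknown input type: {kind}")

together = input_windows("step-together", 3, 10.0)
in_seq = input_windows("step-in-seq", 3, 10.0)
overlapping = input_windows("step-overlap", 3, 10.0)
```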

The last parameter that was varied in our simulations was the nature of the starting population: seed-all denotes a starting population where all the individuals are similar and where every transcriptional regulator (TR) has broad specificity (that is, where every TR can regulate every gene upon binding and every RNAP-cofactor complex can transcribe every gene), while seed-random denotes a starting population with randomly assigned interactions for the proteins and the RNAP-cofactor complexes. The two starting conditions represent two well-known features of transcriptional regulation. Seed-all mirrors the presence of genes sharing the same transcription factors (TFs) without being effectively co-regulated, and thus the existence of many more potential GRNs than those actually activated (a simple analysis of TF databases (data not reported) gives a probability of around 0.20 that two randomly picked genes share at least one TF; this is not significantly different from that of two effectively co-regulated genes). The seed-random condition instead reflects the specificity of TF binding to DNA sequences that, in the local, actually activated networks, drives the dynamics of the specific GRN.

We studied a network consisting of 20 genes, 25 proteins, 7 RNAP-cofactor complexes and 10 inputs. The inputs here act as perturbing signals to the system. The networks consist of binary strings reporting the connection pattern


(a) Growth of best individual.

(b) Growth of population.

Fig. 3. Growth of the best individual in the population (a) and the entire population (b). The network consisted of 20 genes, 25 proteins, 10 inputs and 7 RNAP-cofactor complexes. The objective type used for this particular run was biomass, the input type was step-together while the seed type was seed-AUTO. The fitness value of both the best individual as well as the entire population continues to grow until attaining an "optimal" solution. Interestingly, the final evolved population of individuals is almost completely homogeneous with a very few individuals having very low fitness values. There are no individuals with intermediate fitness values.

among the constituent nodes (as explained above); a mutation corresponds to a change of one element (bit) of the string.

3. Results

3.1. Population growth and evolution

Fig. 3(a) shows the increase in fitness value of the best individual in the population as evolution proceeds. The fitness value increases fairly quickly and converges to a solution within about 200 generations. This is a little misleading since, in an attempt to speed up the computation, the mutation rate was chosen to be very high (0.8). A much lower mutation rate would lead to slower convergence to the optimal solution, with the danger of getting mired in a local optimum. This follows from the stochastic nature of the genetic algorithm: a higher mutation rate implies a stronger drive to traverse rugged landscapes and escape from local optima, given that mutations arise independently of the fitness gradient. However, the qualitative nature of the evolution of the network remains unaffected by this computational “trick”.

Fig. 3(b) shows the growth of the entire population over the different generations. It is interesting to note that the fitness of the population as a whole increases rapidly and the population becomes almost homogeneous except for a


Table 2
The table shows the assignment of the 240 different runs to the six clusters

Clusters    Step-together          Step-in-seq            Step-overlap           Total
            Homogeneous  Random    Homogeneous  Random    Homogeneous  Random
Cluster 1   6            9         5            15        4            8         47
Cluster 2   8            9         9            3         9            7         45
Cluster 3   9            10        5            7         10           4         45
Cluster 4   6            4         8            8         5            8         39
Cluster 5   2            3         9            5         6            11        36
Cluster 6   9            5         4            2         6            2         28

There seems to be a marginally significant distribution of the different classes into the six clusters as evidenced by the Chi-square test (p < 0.05).

Table 3
The table shows the results from the discriminant analysis for discriminating between the six different input-type/seed-type combinations

A priori class  Counts/Percentages  Estimated class                                 Total
                                    ST/H    ST/R    SIS/H   SIS/R   SO/H    SO/R
ST/H            Counts              19      14      0       3       2       2       40
                Percentages         47.50   35.00   0.00    7.50    5.00    5.00
ST/R            Counts              11      26      0       1       1       1       40
                Percentages         27.50   65.00   0.00    2.50    2.50    2.50
SIS/H           Counts              2       1       24      5       5       3       40
                Percentages         5.00    2.50    60.00   12.50   12.50   7.50
SIS/R           Counts              1       3       4       16      10      6       40
                Percentages         2.50    7.50    10.00   40.00   25.00   15.00
SO/H            Counts              3       1       11      4       15      6       40
                Percentages         7.50    2.50    27.50   10.00   37.50   15.00
SO/R            Counts              2       2       4       4       8       20      40
                Percentages         5.00    5.00    10.00   10.00   20.00   50.00
Accuracy        Percentage          52.50   35.00   40.00   60.00   62.50   50.00   50.00

Correct classifications are the diagonal entries. Although the overall classification accuracy is 50%, which is statistically significantly higher than the random assignment rate of 16.67%, it is far from a one-to-one mapping, suggesting that the different simulation runs (different input-type/starting population combinations) cannot be differentiated in the protein space. ST: step-together, SIS: step-in-seq, SO: step-overlap. H and R refer to homogeneous and random starting populations respectively.

few individuals with extremely low (and divergent) fitness values. Such individuals will “die out” as a result of natural selection. It is worth noting the virtual absence of individuals with intermediate fitness values.

To check whether solutions coming from different simulation paradigms can be discriminated (each solution corresponding to a vector-point in the 20-dimensional space spanned by the 20 protein concentrations), we adopted two different statistical approaches: unsupervised and supervised. The 240 different simulation runs (20 simulation runs for each of 12 different input-type/seed-type/objective function-type combinations) can be classified into 6 classes on the basis of input and seed types (3 × 2 conditions). In the unsupervised approach, a k-means (with k = 6) cluster solution was obtained from the coordinates of the 240 runs in the 20-dimensional space (shown in Table 2). A chi-square test on the 6 × 6 contingency table, having the unsupervised clusters as rows and the a priori classes as columns, showed a marginally significant chi-square statistic (p < 0.05) but no clear-cut cluster-class mapping for any of the cluster-class pairs. This implies a general “statistical tendency” of the different runs to be different, but with no possibility of a clear mapping.
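The chi-square comparison can be reproduced from the counts in Table 2. The following is a minimal Python sketch of the standard Pearson chi-square statistic for a contingency table; the function name `chi_square` is hypothetical, and the critical value is the tabulated one for α = 0.05 and 25 degrees of freedom rather than a computed p-value.

```python
# Rows: clusters 1-6; columns: ST/H, ST/R, SIS/H, SIS/R, SO/H, SO/R (counts from Table 2).
table2 = [
    [6, 9, 5, 15, 4, 8],
    [8, 9, 9, 3, 9, 7],
    [9, 10, 5, 7, 10, 4],
    [6, 4, 8, 8, 5, 8],
    [2, 3, 9, 5, 6, 11],
    [9, 5, 4, 2, 6, 2],
]

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square(table2)
CRITICAL_0_05_DF25 = 37.652  # tabulated chi-square critical value (alpha = 0.05, 25 d.o.f.)
print(round(stat, 2), dof, stat > CRITICAL_0_05_DF25)
```

With these counts the statistic comes out at about 38.15 on 25 degrees of freedom, only just above the 5% critical value, which is consistent with the "marginally significant" result reported in the text.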

This result was further demonstrated by the supervised learning approach, in which a linear discriminant analysis (LDA) algorithm was applied to the 240 runs, with the 20 protein concentrations as input variables and the 6 a priori classes as classification variables. The results are shown in Table 3. The overall rate of correct classifications (the number of runs ascribed to the right class over the total number of runs) was 0.50, which is statistically significantly higher than the 0.167 expected for a random assignment among six classes. However, this is still very far from a one-to-one mapping. All in all, these results point to the basic inability to discriminate among different classes of simulations in the protein space, in accordance with our previous results [32].
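The overall classification rate follows directly from the counts in Table 3. The sketch below, in Python, recomputes it from the confusion matrix and compares it with the chance rate using a normal-approximation z-score; the z-score is an assumption chosen for illustration, since the text does not state which significance test was used.

```python
import math

labels = ["ST/H", "ST/R", "SIS/H", "SIS/R", "SO/H", "SO/R"]
# Rows: a priori class; columns: class estimated by the LDA (counts from Table 3).
confusion = [
    [19, 14, 0, 3, 2, 2],
    [11, 26, 0, 1, 1, 1],
    [2, 1, 24, 5, 5, 3],
    [1, 3, 4, 16, 10, 6],
    [3, 1, 11, 4, 15, 6],
    [2, 2, 4, 4, 8, 20],
]

total = sum(map(sum, confusion))
correct = sum(confusion[i][i] for i in range(len(labels)))
accuracy = correct / total            # overall rate of correct classifications
chance = 1 / len(labels)              # random assignment among six classes

# Normal approximation to the binomial under the chance hypothesis (illustrative choice).
standard_error = math.sqrt(chance * (1 - chance) / total)
z_score = (accuracy - chance) / standard_error

# Per-class rate of correct classification (diagonal count over row total).
per_class = {lab: confusion[i][i] / sum(confusion[i]) for i, lab in enumerate(labels)}
print(accuracy, round(chance, 3), round(z_score, 1))
```

The accuracy is 120/240 = 0.5 against a chance rate of about 0.167. The large z-score reflects how far 0.5 lies above chance for 240 runs, while the per-class rates (between 0.375 and 0.65) show how far the result is from a one-to-one mapping.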


Fig. 4. In-degree distribution for homogeneous (top panel) and random (bottom panel) starting populations. Notice the bimodal distribution for the homogeneous starting population. Legend: ST = step-together, SIS = step-in-seq, SO = step-overlap, B = biomass, BM = biomass-minlinks.

3.2. Effect of starting population

Surprisingly, a homogeneous starting population resulted in networks with higher fitness function values, as shown in Table 4 (for example, 712 versus 548 for the step-in-seq input under the biomass objective). Leaving aside possible artefacts due to the trapping of the GA in a local optimum, this means that the higher fitness function values are more attainable from a homogeneous starting population, that is, one in which there is very little specificity to begin with. Since homogeneous starting populations seem to lead to higher fitness function values, it is instructive to explore further the connectivity of the networks evolved from the two different initial populations. Table 5 shows the connectivity statistics for networks obtained for the two different seed types and for the various objective and input types. The table shows the mean and median in-degree and out-degree values for the best networks obtained from the different seed/input/objective type combinations. The median in-degree for a homogeneous starting population is much lower than that for a random starting population (see Table 5), while the mean in-degrees are similar. A mean well above the median points to a few nodes with very high in-degree alongside many nodes with low in-degree. This suggests that the in-degree distribution is bimodal for a homogeneous starting population, with a few nodes having a high value and a number of nodes having very low values, whereas it is unimodal for a random starting population. This can be observed in Fig. 4. The in-degree distribution for the homogeneous case hints at a scaling behavior that could be related to scale-free networks [61], with a few nodes with high in-degrees and the rest of the nodes with low in-degrees. However, the number of nodes, even for this larger network, is too small to make a statistical claim of “scale-freeness”.
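The argument from the mean and median in-degrees can be made concrete with the values in Table 5. The following Python sketch computes the average gap between mean and median in-degree for each seed type; using this gap as a rough indicator of a skewed or bimodal distribution is an illustrative assumption, not a test applied in the original analysis.

```python
# In-degree statistics from Table 5, in the column order
# ST/B, ST/BM, SIS/B, SIS/BM, SO/B, SO/BM.
mean_in = {
    "homogeneous": [10.2, 10.1, 7.8, 9.2, 10.45, 7.35],
    "random": [10.75, 9.8, 9.65, 10.45, 9.2, 9.5],
}
median_in = {
    "homogeneous": [9, 6, 5, 7, 6, 5.5],
    "random": [11, 9, 10, 10.5, 9, 10],
}

def average_gap(seed):
    """Average of (mean - median) in-degree over the six input/objective combinations."""
    gaps = [m - med for m, med in zip(mean_in[seed], median_in[seed])]
    return sum(gaps) / len(gaps)

gap_homogeneous = average_gap("homogeneous")
gap_random = average_gap("random")
print(gap_homogeneous, gap_random)
```

For the homogeneous seed the mean exceeds the median by roughly 2.8 connections on average, whereas for the random seed the two nearly coincide, which is what one would expect from a few high in-degree nodes in the first case and a roughly symmetric unimodal distribution in the second.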

3.3. Robustness of the networks

An area of research that has generated considerable interest in recent years is the study of network robustness and stability. Fig. 5 shows the effect of mutations on the network for a particular objective-input-seed type combination. The absolute differences of the protein levels from their basal levels (for the unmutated network) are plotted for the mutation of each allele of the bit string. The white spaces correspond to a value of zero. Non-zero values are


Fig. 5. Plot of the effect of mutations on the large network consisting of 20 genes, 25 proteins, 5 inputs and 7 RNAP-cofactor complexes. The objective type used for this particular run was biomass, the input type was step-together while the seed type was seed-AUTO. The absolute differences of the protein levels from their basal levels (for the unmutated network) are plotted for the mutation of each allele of the bit string. The white spaces correspond to a value of zero. Non-zero values are color coded from blue (small absolute change in level) to red (large absolute change in level). The preponderance of white and blue colors, along with the fact that there are only a few mutations that lead to massive changes in protein levels, attests to the robustness of the network. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 4
The table shows the fitness functions of the “best” individuals for the different objective-type/input-type combinations, for the two different seed types: homogeneous and random starting populations

Input type      Objective type: biomass    Objective type: biomass-minlinks
                Homogeneous  Random        Homogeneous  Random
Step-together   493          490           5.41         4.39
Step-in-seq     712          548           6.95         5.83
Step-overlap    685          604           6.30         5.66

Table 5
Statistics of the connectivity of the best networks for the different seed/objective/input type combinations

Starting population  Statistics         Step-together     Step-in-seq       Step-overlap
                                        B       BM        B       BM        B       BM
Homogeneous          Mean in-degree     10.2    10.1      7.8     9.2       10.45   7.35
                     Mean out-degree    11.05   12.85     10.3    11.3      12.6    8.3
                     Median in-degree   9       6         5       7         6       5.5
                     Median out-degree  11      13        10      12        12.5    8.5
Random               Mean in-degree     10.75   9.8       9.65    10.45     9.2     9.5
                     Mean out-degree    13.25   12.65     12.15   12.45     12.3    11.85
                     Median in-degree   11      9         10      10.5      9       10
                     Median out-degree  13      12.5      12      13        12      12

Columns are grouped by input type; within each group, B and BM refer to the biomass and biomass-minlinks objective types respectively. Notice the lower median in-degree values for the case where the starting population is homogeneous.

color coded from blue (small absolute change in level) to red (large absolute change in level). The preponderance of white and blue colors along with the fact that there are only a few mutations that lead to massive changes in protein


Fig. 6. Plot of the stability of the network versus the number of edges in the network with 20, 30 and 50 nodes. Stability ($S$) is defined as $S = N_{\lambda^-}/N$, where $N_{\lambda^-}$ is the number of negative eigenvalues and $N$ is the total number of eigenvalues of the adjacency matrix. As can be observed, there is a very steep rise in the number of negative eigenvalues, which then reaches a plateau. For a small number of nodes ($N < 100$), this plateau is reached when the number of edges is $\approx 2(N - 1)$, which seems to be the optimal number of randomly placed edges needed to make the network fully connected.

levels (green, yellow and red lines are rare) attest to the robustness of the network despite the fact that the evolutionary pressure did not implicitly or explicitly indicate a preference for more stable systems.
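The mutation scan behind Fig. 5 can be sketched as follows: each bit of the string is flipped in turn and the absolute change of every protein level from its basal value is recorded. In this Python sketch the function `protein_levels` is a purely hypothetical toy stand-in for the simulation of the network, and the wiring and sizes are invented for the example; only the scan loop reflects the procedure described in the text.

```python
def protein_levels(bits, n_proteins, n_regulators):
    """Toy stand-in for the network simulation (hypothetical): each protein level
    rises with the number of regulators wired to it and saturates below 1."""
    levels = []
    for i in range(n_proteins):
        in_degree = sum(bits[i * n_regulators:(i + 1) * n_regulators])
        levels.append(in_degree / (1.0 + in_degree))
    return levels

def mutation_scan(bits, n_proteins, n_regulators):
    """Flip each bit in turn and return |mutant - basal| per protein for every flip."""
    basal = protein_levels(bits, n_proteins, n_regulators)
    effects = []
    for pos in range(len(bits)):
        mutant = bits[:]
        mutant[pos] ^= 1                      # single-bit mutation
        levels = protein_levels(mutant, n_proteins, n_regulators)
        effects.append([abs(m - b) for m, b in zip(levels, basal)])
    return effects

n_proteins, n_regulators = 4, 3               # small invented example
wiring = [1, 0, 1,  0, 0, 0,  1, 1, 1,  0, 1, 0]
effects = mutation_scan(wiring, n_proteins, n_regulators)

# Cells equal to zero correspond to the white spaces of a plot like Fig. 5.
zero_cells = sum(value == 0.0 for row in effects for value in row)
print(len(effects), zero_cells)  # 12 36
```

In this toy model every mutation changes exactly one protein, so 36 of the 48 cells are zero; in a real simulation the pattern of zero and non-zero cells would depend on the network dynamics.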

Through a simple simulation involving the random addition of connections between nodes, starting from a fully disconnected network, we noticed that the stability of a network, defined as the fraction of negative eigenvalues of the adjacency matrix, increased with the number of connections until a plateau was reached, after which the value remained the same as more and more interconnections were added (see Fig. 6). This implies that robustness and stability are common to networks with very different connectivity properties. In the light of this, it is perhaps not surprising that networks with a smaller degree of connectivity have a robustness similar to that of highly connected networks.
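A simulation of this kind can be sketched in Python as below. The sketch assumes undirected connections, so that the adjacency matrix is symmetric and its eigenvalues are real; this is an assumption, since the text does not say whether the random connections were directed. The eigenvalues are obtained with a plain cyclic Jacobi rotation routine to keep the example free of external libraries, and all function names are hypothetical.

```python
import math
import random

def symmetric_eigenvalues(matrix, max_sweeps=100, tol=1e-20):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):            # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):            # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def stability(adjacency, eps=1e-8):
    """Fraction of eigenvalues of the adjacency matrix that are negative."""
    eigenvalues = symmetric_eigenvalues(adjacency)
    return sum(1 for v in eigenvalues if v < -eps) / len(eigenvalues)

def stability_curve(n_nodes, n_edges, rng):
    """Add random undirected edges one by one, recording the stability after each."""
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    rng.shuffle(pairs)
    curve = []
    for i, j in pairs[:n_edges]:
        adjacency[i][j] = adjacency[j][i] = 1
        curve.append(stability(adjacency))
    return curve

curve = stability_curve(10, 30, random.Random(1))
complete4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(curve[0], stability(complete4))  # 0.1 0.75
```

Two limiting cases are easy to check by hand: a single edge contributes the eigenvalue pair ±1, so one of ten eigenvalues is negative, and a complete graph on four nodes has eigenvalues 3, -1, -1, -1. Since the trace of an adjacency matrix is zero, the fraction of negative eigenvalues can never reach 1.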

4. Discussion

Our work aims to provide a framework for the modeling and simulation of the evolution of gene regulatory networks. We model the evolution of a given network using a genetic algorithm that mimics the natural process of evolution, viz., mating and mutation. We studied the evolution of networks under a wide variety of conditions, varying the starting population, the objective type and the input type.

It was observed that while homogeneous and random starting populations provided similar protein expression profiles, there were marked differences in the final network connectivities. A homogeneous starting population not only tends to evolve towards populations with higher fitness values, but the in-degree distribution of its best individuals also appears to be bimodal. Random starting populations, on the other hand, evolve towards individuals with smaller fitness values, and in this case the in-degree distribution is unimodal. The fact that the two paradigms of seed-all and seed-random gave rise to different results (both in terms of fitness and connectivity patterns) raises some important issues about the relevance of the starting conditions of simulations and about the tenability of a single “maximal efficiency” paradigm for the evolution of gene regulatory networks in nature.

We used a very high mutation rate (0.8) for our simulations. However, it must be borne in mind that there is a danger in increasing the error rate in GAs. Although an increase in the error rate increases the search radius, it can also lead to destabilizing fluctuations at high error rates, a phenomenon called error thresholding. The notion of an error threshold in genetic algorithms is related to the idea of an optimal balance between exploitation and exploration. Too low a mutation rate implies that there is too little exploration; too high a mutation rate on the


Fig. 7. Plot of the objective function values of the final solutions obtained for each mutation rate as a function of the generation in which they were obtained. In general, the higher the mutation rate, the faster the convergence to a solution and the higher the objective function (fitness). The color map on the right goes from blue to maroon for mutation rates ranging from 0.01 to 0.9. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 6
Statistics of the connectivity of the best networks for a specific input-type/objective-type/seed-type combination (Step-in-seq/Biomass/seed ALL) for different mutation rates

Mutation rate  Mean in-degree  Median in-degree  Mean out-degree  Median out-degree
0.01           16.55           17                21.25            21
0.05           13.40           13.50             16.85            16.50
0.1            11.05           11                14               13.50
0.2            11.80           11.50             14.90            14.50
0.5            13.60           11                16.50            16
0.75           6.95            5.5               7.75             7
0.90           7.7             3                 8.6              8.5

other hand can lead to the degeneration of the evolutionary process into random search. Any optimal mutation rate must lie between these two extremes. Thus, using a high mutation rate could lead to a degenerate solution. However, this is true mainly for a generational GA [68], where the entire population changes in every generation (with all the parents dying out). In our case, only 75% of the individuals die in every generation, leaving the best 25% unchanged. Hence, a high mutation rate does not lead to a degenerate solution, since the “memory” of the previous best individuals (genotypes) in the evolutionary path is always present.
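The replacement scheme described above, in which the best 25% of the population is carried over unchanged, can be sketched in Python as follows. The fitness function (a count of ones), the single-point crossover, the choice of parents among the survivors, and the reading of the mutation rate as the probability that an offspring receives one bit flip are all assumptions made for the example and are not taken from the original model.

```python
import random

def fitness(bits):
    """Hypothetical stand-in objective: number of ones (not the biomass objective)."""
    return sum(bits)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover (mating); returns a new list."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def next_generation(population, mutation_rate, rng, keep_fraction=0.25):
    """Keep the best quarter unchanged and replace the rest with mutated offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    survivors = ranked[:n_keep]               # "memory" of the previous best genotypes
    children = []
    while len(survivors) + len(children) < len(population):
        child = crossover(rng.choice(survivors), rng.choice(survivors), rng)
        if rng.random() < mutation_rate:      # mutation: one flipped bit
            pos = rng.randrange(len(child))
            child[pos] ^= 1
        children.append(child)
    return survivors + children

rng = random.Random(42)
population = [[rng.randint(0, 1) for _ in range(40)] for _ in range(20)]
best_per_generation = []
for _ in range(50):
    best_per_generation.append(max(fitness(ind) for ind in population))
    population = next_generation(population, mutation_rate=0.8, rng=rng)
```

Because the survivors are never modified, the best fitness in `best_per_generation` cannot decrease from one generation to the next, whatever the mutation rate. This is the sense in which the retained individuals protect the search from degenerating into a random walk.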

While this is not strictly realistic, it is sufficient for our needs of exploring the landscape of “final” solutions on the evolutionary pathway. However, to make sure that our solutions are indeed the “best”, we ran a series of simulations using different mutation rates (0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.75, 0.9) on a particular input-type/objective-type/seed-type combination (Step-in-seq/Biomass/seed ALL). Fig. 7 shows a plot of the objective functions of the final solutions obtained for each mutation rate, as a function of the generation in which the solutions converged. In general, higher mutation rates lead to faster convergence and higher objective function values, suggesting that the higher mutation rates have indeed led to solutions with higher fitness values. It must be mentioned here that we held the generations-to-convergence parameter at a constant value. Increasing this parameter could well lead to higher fitness values even at smaller mutation rates, but over a much longer evolutionary time. Fig. 8 shows the in-degree distributions for the 7 different mutation rates studied. As expected from the smaller objective function values for the lower mutation rates, the in-degree distribution is unimodal and has high mean and median values (Table 6). However, for higher mutation rates, we see the typical bimodal distribution seen in Fig. 4, with much smaller mean and median values.


Fig. 8. Plot of the in-degree distributions for the 7 different mutation rates. The x-axis shows the in-degree while the y-axis shows the frequency. As expected from the smaller objective function values for the lower mutation rates, the in-degree distribution is unimodal, with higher mean and median values (see Table 6). However, for the higher mutation rates, we see the typical bimodal distribution with much smaller mean and median in-degree values.

As shown in Fig. 6, the stability of networks increases with the number of interactions and then reaches a plateau. This implies that there are a number of equivalent networks of widely differing connectivities that are equally stable. This is also borne out by the work of Tsong et al. [54]. In a recent paper, they examined the gene-regulatory circuit that governs the mating types in several yeast species and observed a wide divergence at the genotypic level despite conservation at the phenotypic level.

This is exactly what we observe through our simulations; that is, the presence of many networks with different network structures (genotypes) giving rise to similar protein values (phenotypes) (see footnote 1). Since natural selection acts on phenotypes, mutations that do not affect the phenotype are often conserved and can become fixed in natural populations [62–64]. This process, called Developmental System Drift (DSD), is apparently ubiquitous and has significant implications for the flexibility of developmental evolution of both conserved and evolving characters.

Of further interest is the fact that the plateau in Fig. 6 is reached at a value corresponding to $\approx 2(N - 1)$, which seems to be the optimal number of randomly placed edges that can make the network fully connected. This seems reminiscent of a sort of percolation threshold. It appears as if there is a selective advantage (assuming a GRN that is fully connected and scale free as reported in Ref. [61]) for networks with connectivities that correspond to the beginning of the plateau, rather than to the end. Does this then imply that, considering an early stage in the evolution of life, the early population had a wide diversity of genotypes? However, from our simulation of the larger network, we observe that a fully connected and homogeneous starting population has a bimodal in-degree distribution as opposed to

1 It is important to stress that we are adopting a biologically improper terminology for genotype and phenotype. From a biological point of view, the network wiring is a phenotype, just as protein expression levels are, but we think this use of the terms genotype and phenotype captures the fact that network wiring can be directly affected by mutations (for example, a DNA mutation leading to a less effective protein, with the consequent elimination of an interconnection of the network, or any other similar mechanism), while protein levels are a consequence. Another excellent metaphor is to consider the network wiring as the protein sequence and the protein expression levels as the protein structure.


a unimodal one for a random starting population. This is more in line with the evolution of hubs in metabolic networks as proposed by Kacser and Beeby [65] and shown as plausible by means of a computer simulation by Pfeiffer et al. [66], in which they assumed that the initial population was made up of a few enzymes with broad specificity which then gave rise to enzymes with enhanced specificities.

Hence, it appears that the make-up of the initial population does play a role in determining the types of networks that are observed. It must however be borne in mind that the less-than-optimal behavior of the seed-random simulations arises because a fully connected population can explore a larger portion of the energy landscape than random networks can. This is akin to having a “higher temperature” in simulated annealing. However, it is difficult to say at this point whether this behavior is a local “computational” effect or a more general effect.

Since there are a number of equivalent networks that can produce similar expression patterns, it appears as if there is an additional selective pressure that implicitly forces the network to evolve to a scale-free model. What is this selective pressure? This is a question that is critical to an understanding of the whole evolutionary process of GRNs and which is yet to be answered. Moreover, we also observed the presence of autoregulatory loops (both positive and negative) as well as PTMs. However, most of the PTMs tended to be activating mechanisms. This could very well be a result of the fact that PTMs in our model are reversible and hence tend to reflect specific activating modifications such as phosphorylations. On the other hand, the incorporation of irreversible PTMs in our model could help to model the effect of ubiquitination-like modifications.

With regard to the robustness of the evolved network, it is pertinent to reiterate the observation by Waddington that the wild type of an organism is much less variable in appearance than the majority of the mutant races [67]. This observation essentially implies that as long as genetic variation exists, any mechanism that dampens the effects of that variation on the phenotype is expected to be favored by stabilizing selection [41]. Siegal et al. showed that the developmental system, modeled as a network of interacting transcriptional regulators, constrained the genetic system to produce canalization (robustness), even without selection towards an optimum [41]. They observed that the extent of canalization or robustness depends on the connectivity of the network, with more highly connected systems being more canalized.

While our observations regarding the appearance of robustness in the evolved network as an emergent property that is not dependent on selection towards an optimum support the view of Siegal et al. [41], Fig. 5 shows that networks differing widely in their connectivity show similar robustness to mutations, in contrast to the observation by Siegal et al. that the degree of robustness increases with increasing connectivity. This occurs even though we use a strict definition of “loss-of-function”, where a change in the expression level of any single protein by >25% implies loss of functionality. A relaxation of this definition would lead to an increase in the robustness of the networks. Bergman et al. observed that the availability of loss-of-function mutations accelerates adaptation to a new optimum phenotype [42]. Can this then be the selective pressure that guides the evolution of GRNs towards sparser, scale-free-like networks?
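The strict loss-of-function criterion can be written down directly. In the Python sketch below, the change is measured relative to the basal level of each protein, a protein with zero basal level is treated as affected by any non-zero expression, and robustness is summarized as the fraction of tolerated mutations; these three choices, the function names and the example values are assumptions made for illustration.

```python
def is_loss_of_function(basal, mutated, threshold=0.25):
    """True if any single protein level changes by more than `threshold` (relative)."""
    for b, m in zip(basal, mutated):
        if b == 0:
            if m != 0:        # assumption: any expression from a silent protein counts
                return True
            continue
        if abs(m - b) / abs(b) > threshold:
            return True
    return False

def robustness(basal, mutant_profiles, threshold=0.25):
    """Fraction of mutations that do NOT cause loss of function (hypothetical summary)."""
    tolerated = sum(
        not is_loss_of_function(basal, profile, threshold) for profile in mutant_profiles
    )
    return tolerated / len(mutant_profiles)

basal = [1.0, 2.0, 4.0]       # invented basal protein levels
mutants = [
    [1.0, 2.0, 4.0],          # neutral mutation
    [1.2, 2.0, 4.0],          # 20% change in one protein: tolerated
    [1.0, 2.6, 4.0],          # 30% change in one protein: loss of function
    [1.0, 2.0, 1.0],          # 75% change in one protein: loss of function
]
print(robustness(basal, mutants))                  # 0.5
print(robustness(basal, mutants, threshold=0.5))   # 0.75
```

Raising the threshold from 0.25 to 0.5 reclassifies the 30% change as tolerated, which illustrates the remark that relaxing the definition increases the measured robustness.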

5. Conclusion

In this work we developed a computational framework for the simulation and analysis of the evolution of GRNs. We analyzed the effect of objective functions, input types and starting population on the evolution of GRNs. We also studied the properties of these evolved networks in terms of their robustness (canalization). We observed that robustness evolves along with the networks as an emergent property even in the absence of selective pressure towards more robust systems. We also observed the presence of various genotypes that give rise to the same phenotypes, in accordance with the theoretical view that natural selection operates on phenotypes, thereby accommodating variation in the genotype by fixing those changes that are phenotype-neutral. These genotypes were of varying degrees of connectivity. The question that remains to be answered deals with the nature of the evolutionary force that guides the evolution towards specific types of networks (mostly scale-free). We believe that understanding the selection of particular types of networks from the set of all the possible network types, and identifying the selective pressure, will enable us to understand the evolution of GRNs.

Acknowledgments

Arun Krishnan and Masaru Tomita would like to acknowledge the Tsuruoka city and Yamagata prefecture governments for funding this project.


References

[1] S.A. Teichmann, M. Madan Babu, Gene regulatory network growth by duplication, Nature Genet. 36 (5) (2004) 492–496.
[2] M. Madan Babu, N.M. Luscombe, L. Aravind, M. Gerstein, S.A. Teichmann, Structure and evolution of transcriptional regulatory networks, Curr. Opin. Struct. Biol. 14 (2004) 283–291.
[3] T.S. Gardner, J.J. Faith, Reverse-engineering transcription control networks, Phys. Life Rev. 2 (1) (2005) 65–88.
[4] E.H. Snoussi, R. Thomas, Logical identification of all steady states: The concept of feedback loop characteristic states, Bull. Math. Biol. 55 (1993) 973–991.
[5] R. Thomas, The role of feedback circuits: Positive feedback circuits are a necessary condition for positive real eigenvalues of the Jacobian matrix, Ber. Bunsenges. Phys. Chem. 98 (1994) 1148–1151.
[6] A. Keller, Specifying epigenetic states with autoregulatory transcription factors, J. Theoret. Biol. 170 (1994) 175–181.
[7] R. Thomas, D. Thieffry, M. Kaufmann, Dynamical behaviour of biological regulatory networks-I. Biological role of feedback loops and practical use of the concept of the loop-characteristic state, Bull. Math. Biol. 57 (1995) 247–276.
[8] A. Keller, Model genetic circuits encoding autoregulatory transcription factors, J. Theoret. Biol. 172 (1995) 169–185.
[9] D.M. Wolf, F.H. Eckmann, On the relationship between genomic regulatory element organization and gene regulatory dynamics, J. Theoret. Biol. 195 (1998) 167–186.
[10] P. Smolen, D.A. Baxter, J.H. Byrne, Frequency selectivity, multistability and oscillations emerge from models of genetic regulatory systems, Amer. J. Physiol. 274 (1998) C531–C542.
[11] P. Smolen, D.A. Baxter, J.H. Byrne, Effects of macromolecular transport and stochastic fluctuations on the dynamics of genetic regulatory systems, Amer. J. Physiol. 277 (1999) C777–C790.
[12] B.Z. Liu, J.H. Peng, Y.C. Sun, Y.W. Liu, A comprehensive dynamical model of pulsatile secretion of the hypothalamo-pituitary-gonadal axis in man, Comput. Biol. Med. 27 (1997) 507–513.
[13] S.M. Reppert, A clockwork explosion, Neuron 21 (1998) 1–4.
[14] H. Smith, Oscillations and multiple steady states in a cyclic gene model with repression, J. Math. Biol. 25 (1987) 169–190.
[15] T. Scheper, D. Klinkenberg, C. Pennartz, J. van Pelt, A mathematical model for the intracellular circadian rhythm generator, J. Neurosci. 19 (1999) 40–47.
[16] A. Goldbeter, A model for circadian oscillations in the Drosophila period (PER) protein, Proc. R. Soc. Lond. Ser. B. 261 (1995) 319–324.
[17] J.C. Leloup, A. Goldbeter, A model for circadian rhythms in Drosophila incorporating the formation of a complex between the PER and TIM proteins, J. Biol. Rhythms. 13 (1998) 70–87.
[18] P. Smolen, D.A. Baxter, J.H. Byrne, Modeling transcriptional control in gene networks - methods, recent results and future directions, Bull. Math. Biol. 62 (2000) 247–292.
[19] R. Somogyi, C. Sniegoski, Modeling the complexity of genetic networks: Understanding multigenic and pleiotropic regulation, Complexity 1 (1996) 45–63.
[20] J. Boden, Programming the Drosophila embryo, J. Theoret. Biol. 188 (1997) 391–445.
[21] X.L. Wen, S. Fuhrman, G. Michaels, D. Carr, S. Smith, J. Barker, R. Somogyi, Large scale temporal gene expression mapping of central nervous system development, Proc. Natl. Acad. Sci. USA 95 (1998) 334–339.
[22] M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Cluster analysis and display of genome wide expression patterns, Proc. Natl. Acad. Sci. USA 95 (1998) 14863–14868.
[23] J. Tyson, H.G. Othmer, The dynamics of feedback control circuits in biochemical pathways, Progr. Theoret. Biol. 5 (1978) 2–62.
[24] J.S. Griffith, Mathematics of cellular control processes. I. Negative feedback to one gene, J. Theoret. Biol. 20 (1968) 202–208.
[25] J.S. Griffith, Mathematics of cellular control processes. II. Positive feedback to one gene, J. Theoret. Biol. 20 (1968) 209–216.
[26] T. Mestl, E. Plahte, S.W. Omholt, A mathematical framework for describing and analyzing gene regulatory networks, J. Theoret. Biol. 176 (1995) 291–300.
[27] T. Mestl, C. Lemay, L. Glass, Chaos in high dimensional neural and gene networks, Physica D 98 (1996) 33–52.
[28] E. Plahte, T. Mestl, S.W. Omholt, A methodological basis for description and analysis of systems with complex switch-like interactions, J. Math. Biol. 36 (1998) 321–348.
[29] H. Mcadams, A. Arkin, Simulation of prokaryotic genetic circuits, Ann. Rev. Biophys. Biomed. Struct. 27 (1998) 199–224.
[30] C.H. Yuh, H. Bolouri, E.H. Davidson, Genomic cis-regulatory logic, experimental and computational analysis of a sea urchin gene, Science 279 (1998) 1896–1902.
[31] H. de Jong, Modeling and simulation of genetic regulatory systems: A literature review, J. Comput. Biol. 9 (1) (2002) 67–103.
[32] A. Krishnan, A. Giuliani, M. Tomita, Indeterminacy of reverse engineering of gene regulatory networks: The curse of gene elasticity, PLoS ONE 2 (6) (2007) e562.
[33] S.S. Shen-Orr, R. Milo, S. Mangan, U. Alon, Network motifs in the transcriptional regulation network of Escherichia Coli, Nature Genet. 31 (2002) 64–68.
[34] N. Guelzim, S. Bottani, P. Bourgine, F. Kepes, Topological and causal structure of the yeast transcriptional regulatory network, Nature Genet. 31 (2002) 60–63.
[35] A. Wagner, Does evolutionary plasticity evolve, Evolution 50 (3) (1996) 1008–1023.
[36] L.H. Hartwell, J.J. Hopfield, S. Leibler, A.W. Murray, From molecular to modular cell biology, Nature 402 (1999) C47–C52.
[37] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, U. Alon, Network motifs: Simple building blocks of complex networks, Science 298 (2002) 824–827.
[38] A. Thompson, P. Layzell, Evolution of Robustness in an Electronic Design. in: 3rd Int. Conf. on Evolvable Systems, ICES2000, 2000, pp. 218–228.


[39] H. Lipson, J.B. Pollack, N.P. Suh, On the evolution of modular variation, Evolution 56 (2002) 1549–1556.
[40] E.A. Variano, J.H. McCoy, H. Lipson, Networks, dynamics and modularity, Phys. Rev. Lett. 92 (2004) 188701.
[41] M.L. Siegal, A. Bergman, Waddington’s canalization revisited: Developmental stability and evolution, Proc. Natl. Acad. Sci. USA 99 (16) (2002) 10528–10532.
[42] A. Bergman, M.L. Siegal, Evolutionary capacitance as a general feature of complex gene networks, Nature 424 (2003) 549–552.
[43] S. Gavrilets, A. Hastings, A quantitative-genetic model for selection on developmental noise, Evolution 48 (5) (1994) 1478–1486.
[44] G.P. Wagner, G. Booth, H. Bagheri-Chaichian, A population genetic theory of canalization, Evolution 51 (2) (1997) 329–347.
[45] L.W. Ancel, W. Fontana, Plasticity, evolvability and modularity in RNA, J. Exp. Zool. 288 (2000) 242–283.
[46] S.H. Rice, The evolution of canalization and the breaking of Von Baer’s laws: Modeling the evolution of development with epistasis, Evolution 52 (1998) 647–656.
[47] I. Eshel, C. Matessi, Canalization, genetic assimilation and preadaptation: A quantitative genetic model, Genetics 149 (1998) 2119–2133.
[48] C. Pal, I. Miklos, Epigenetic inheritance, genetic assimilation and speciation, J. Theoret. Biol. 200 (1999) 19–37.
[49] T.J. Kawecki, The evolution of genetic canalization under fluctuation selection, Evolution 54 (2000) 1–12.
[50] M. Madan Babu, S.A. Teichmann, L. Aravind, Evolutionary dynamics of prokaryotic transcriptional regulatory networks, J. Mol. Biol. 358 (2) (2006) 614–633.
[51] D.A. Hume, Probability in transcriptional regulation and its implications for leukocyte differentiation and inducible gene expression, Blood 96 (7) (2000) 2323–2328.
[52] W.J. Blake, M. Kaern, C.R. Cantor, J.J. Collins, Noise in eukaryotic gene expression, Nature 422 (2003) 633–637.
[53] X. Wang, N. Hao, H.G. Dohlman, T.C. Elston, Bistability, stochasticity, and oscillations in the mitogen-activated protein kinase cascade, Biophys. J. 90 (2006) 1961–1978.
[54] A.E. Tsong, B.B. Tuch, H. Li, A.D. Johnson, Evolution of alternative transcriptional circuits with identical logic, Nature 443 (2006) 415–420.
[55] A. Brazma, T. Schlitt, Reverse engineering of gene regulatory networks: A finite state linear model, Genome Biol. 4 (6) (2003) P5.
[56] G. Gibson, G. Wagner, Canalization in evolutionary genetics: A stabilizing theory? BioEssays 22 (2000) 372–380.
[57] S. Ciliberti, O. Martin, A. Wagner, Robustness can evolve gradually in complex regulatory networks with varying topology, PLoS Comput.

Biol. 3 (2) (2007) e15.[58] F. Hayot, C. Jayaprakash, A feedforward loop motif in transcriptional regulation: Induction and repression, J. Theoret. Biol. 234 (2005)

133–143.[59] P.J. Ingram, M.P.H. Stumpf, J. Stark, Network Motifs: Structure does not determine function, BMC Genomics 7 (2006) 108.[60] N.L. Novere, T.S. Shimizu, StochSim: Modelling of stochastic biomolecular processes, Bioinformatics 17 (6) (2001) 575–576.[61] A.L. Barabasi, E. Bonabeau, Scale-free networks, Sci. Amer. 288 (5) (2003) 60–69.[62] J.R. True, E.S. Haag, Developmental system drift and flexibility in evolutionary trajectories, Evol. Dev. 3 (2001) 109–119.[63] K.M. Weiss, S.M. Fullerton, Phenogenetic drift and the evolution of genotype–phenotype relationships, Theor. Popul. Biol. 57 (2000)

187–195.[64] A. Rokas, Different paths to the same end, Nature 443 (2006) 401–402.[65] H. Kacser, R. Beeby, Evolution of catalytic proteins or on the origin of enzyme species by means of natural selection, J. Mol. Evol. 20 (1984)

38–51.[66] T. Pfeiffer, O.S. Soyer, S. Bonhoeffer, The evolution of connectivity in metabolic networks, PLOS Biol. 3 (7) (2005) e228.[67] C.H. Waddington, Canalization of development and the inheritance of acquired characters, Nature 150 (1942) 563–565.[68] C.H. Waddington, Error thresholds in genetic algorithms, Evol. Comput. 14 (2) (2006) 157–182.