Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

10
Boolean modeling of biological regulatory networks: A methodology tutorial Assieh Saadatpour a , Réka Albert b,a Department of Mathematics, The Pennsylvania State University, University Park, PA 16802, USA b Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA article info Article history: Available online 9 November 2012 Communicated by David Arnosti Keywords: Biological regulatory networks Boolean models Structural analysis Dynamic analysis Drosophila segment polarity gene network abstract Given the complexity and interactive nature of biological systems, constructing informative and coherent network models of these systems and subsequently developing efficient approaches to analyze the assembled networks is of immense importance. The integration of network analysis and dynamic mod- eling enables one to investigate the behavior of the underlying system as a whole and to make experi- mentally testable predictions about less-understood aspects of the processes involved. In this paper, we present a tutorial on the fundamental steps of Boolean modeling of biological regulatory networks. We demonstrate how to infer a Boolean network model from the available experimental data, analyze the network using graph-theoretical measures, and convert it into a predictive dynamic model. For each step, the pitfalls one may encounter and possible ways to circumvent them are also discussed. We illus- trate these steps on a toy network as well as in the context of the Drosophila melanogaster segment polar- ity gene network. Ó 2012 Elsevier Inc. All rights reserved. 1. Introduction Recent advances in high-throughput techniques have led to a tremendous growth in the amount and type of biological data that reflect various biological interactions at different levels. Any cell responds to its environmental stimuli through a coordinated se- quence of biological processes that includes interactions and regu- lation at the signal transduction, gene, protein or metabolism level [1]. For example, at the gene level, transcriptional regulatory pro- teins regulate the activity of their own or of other genes, i.e. induce or repress the transcription from the respective genes, leading to an increase or a decrease in the amount of the expressed mRNA transcript. This in turn changes the concentrations of the corre- sponding proteins and subsequently leads to a new state, or possi- bly a new behavior of the cell. Therefore all genes and their products (i.e., proteins) form a complex web of interactions, in which each component can have a positive or negative effect on the activity of other components. The interactions at the protein le- vel arise from the fact that proteins can form complexes to enable additional functionalities or they can be involved in posttransla- tional modification of other proteins. Given the large number of components (i.e., thousands of genes, proteins and metabolites) and complex interactions in biological systems, network modeling can serve as a powerful tool for coher- ent representation of the whole system in a unified template. In network modeling, the components of the system are represented by nodes (vertices), whereas the interactions and processes among the nodes are denoted by edges (links). Edges in the network can be directed according to the orientation of mass transfer or infor- mation propagation. In addition, the interaction types can be signi- fied with a positive or negative sign representing activation or inhibition, respectively. The aforementioned processes at different levels can be represented in the form of signal transduction net- works [2,3], gene regulatory networks [4,5], protein–protein inter- action networks [6,7], and metabolic networks [8,9], respectively. In particular, the nodes of the gene regulatory networks, in their most general form, consist of genes, mRNAs, and proteins, and the edges denote transcription, translation, transcriptional regula- tion or protein–protein interactions. Considering the myriad of components available inside a cell, we may deal with enormously complicated networks of interactions. Deciphering the structure and dynamics of these networks is the first step towards under- standing the overall behavior of the cell. The network representation provides a backbone for structural analysis and dynamic modeling of the underlying biological sys- tem. Structural analysis includes the use of graph-theoretical mea- sures, such as centrality measures and shortest paths, to extract information on the heterogeneity and organization of the network [10]. Dynamic models reflect the fact that the nodes of the network are molecular species, and describe how the population levels of these species change over time. Dynamic modeling approaches are divided into two major categorizes, namely continuous or dis- crete, according to the description of nodes’ states. Continuous 1046-2023/$ - see front matter Ó 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ymeth.2012.10.012 Corresponding author. Address: 104 Davey Laboratory, The Pennsylvania State University, University Park, PA 16802, USA. Fax: +1 8148653604. E-mail addresses: [email protected] (A. Saadatpour), [email protected], [email protected] (R. Albert). Methods 62 (2013) 3–12 Contents lists available at SciVerse ScienceDirect Methods journal homepage: www.elsevier.com/locate/ymeth

description

Boolean modeling of biological regulatory networks A methodology tutorial

Transcript of Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

Page 1: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

Methods 62 (2013) 3–12

Contents lists available at SciVerse ScienceDirect

Methods

journal homepage: www.elsevier .com/locate /ymeth

Boolean modeling of biological regulatory networks: A methodology tutorial

Assieh Saadatpour a, Réka Albert b,⇑a Department of Mathematics, The Pennsylvania State University, University Park, PA 16802, USAb Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA

a r t i c l e i n f o

Article history:Available online 9 November 2012Communicated by David Arnosti

Keywords:Biological regulatory networksBoolean modelsStructural analysisDynamic analysisDrosophila segment polarity gene network

1046-2023/$ - see front matter � 2012 Elsevier Inc. Ahttp://dx.doi.org/10.1016/j.ymeth.2012.10.012

⇑ Corresponding author. Address: 104 Davey LaboraUniversity, University Park, PA 16802, USA. Fax: +1 8

E-mail addresses: [email protected] (A. [email protected] (R. Albert).

a b s t r a c t

Given the complexity and interactive nature of biological systems, constructing informative and coherentnetwork models of these systems and subsequently developing efficient approaches to analyze theassembled networks is of immense importance. The integration of network analysis and dynamic mod-eling enables one to investigate the behavior of the underlying system as a whole and to make experi-mentally testable predictions about less-understood aspects of the processes involved. In this paper,we present a tutorial on the fundamental steps of Boolean modeling of biological regulatory networks.We demonstrate how to infer a Boolean network model from the available experimental data, analyzethe network using graph-theoretical measures, and convert it into a predictive dynamic model. For eachstep, the pitfalls one may encounter and possible ways to circumvent them are also discussed. We illus-trate these steps on a toy network as well as in the context of the Drosophila melanogaster segment polar-ity gene network.

� 2012 Elsevier Inc. All rights reserved.

1. Introduction

Recent advances in high-throughput techniques have led to atremendous growth in the amount and type of biological data thatreflect various biological interactions at different levels. Any cellresponds to its environmental stimuli through a coordinated se-quence of biological processes that includes interactions and regu-lation at the signal transduction, gene, protein or metabolism level[1]. For example, at the gene level, transcriptional regulatory pro-teins regulate the activity of their own or of other genes, i.e. induceor repress the transcription from the respective genes, leading toan increase or a decrease in the amount of the expressed mRNAtranscript. This in turn changes the concentrations of the corre-sponding proteins and subsequently leads to a new state, or possi-bly a new behavior of the cell. Therefore all genes and theirproducts (i.e., proteins) form a complex web of interactions, inwhich each component can have a positive or negative effect onthe activity of other components. The interactions at the protein le-vel arise from the fact that proteins can form complexes to enableadditional functionalities or they can be involved in posttransla-tional modification of other proteins.

Given the large number of components (i.e., thousands of genes,proteins and metabolites) and complex interactions in biologicalsystems, network modeling can serve as a powerful tool for coher-

ll rights reserved.

tory, The Pennsylvania State148653604.our), [email protected],

ent representation of the whole system in a unified template. Innetwork modeling, the components of the system are representedby nodes (vertices), whereas the interactions and processes amongthe nodes are denoted by edges (links). Edges in the network canbe directed according to the orientation of mass transfer or infor-mation propagation. In addition, the interaction types can be signi-fied with a positive or negative sign representing activation orinhibition, respectively. The aforementioned processes at differentlevels can be represented in the form of signal transduction net-works [2,3], gene regulatory networks [4,5], protein–protein inter-action networks [6,7], and metabolic networks [8,9], respectively.In particular, the nodes of the gene regulatory networks, in theirmost general form, consist of genes, mRNAs, and proteins, andthe edges denote transcription, translation, transcriptional regula-tion or protein–protein interactions. Considering the myriad ofcomponents available inside a cell, we may deal with enormouslycomplicated networks of interactions. Deciphering the structureand dynamics of these networks is the first step towards under-standing the overall behavior of the cell.

The network representation provides a backbone for structuralanalysis and dynamic modeling of the underlying biological sys-tem. Structural analysis includes the use of graph-theoretical mea-sures, such as centrality measures and shortest paths, to extractinformation on the heterogeneity and organization of the network[10]. Dynamic models reflect the fact that the nodes of the networkare molecular species, and describe how the population levels ofthese species change over time. Dynamic modeling approachesare divided into two major categorizes, namely continuous or dis-crete, according to the description of nodes’ states. Continuous

Page 2: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

4 A. Saadatpour, R. Albert / Methods 62 (2013) 3–12

models, usually described as a set of differential equations, are themost relevant strategy to capture the dynamical nature of biologi-cal systems. However, the use of these models is hampered by thescarcity of the kinetic details of the interactions in all but a handfulof extensively studied pathways. On the other hand, discrete mod-els, such as finite state logical models [11,12], Boolean models[13,14], and Petri nets [15,16], provide a qualitative descriptionthat requires no or few parameters. The class of piece-wise lineardifferential equation (hybrid) models [17] bridges the gap betweencontinuous and discrete models by characterizing each node bytwo variables, a continuous concentration and a discrete activity.These models meld the logical description of the regulatory rela-tionships with a linear concentration decay. Which model to selectdepends on the level of quantitative details of the available exper-imental data: continuous models can be employed when sufficientkinetic information is available, discrete models are best suited forless-characterized systems with no kinetic details, and hybridmodels can be used when partial information on the kineticparameters is available.

The focus of this tutorial is on Boolean models, which were firstintroduced by Kauffman [18] and Thomas [19] for modeling generegulatory networks. In these models each node can have only twoqualitative values; 0 (OFF) and 1 (ON). The OFF state refers to a be-low-threshold concentration or activity, which is insufficient toinitiate the intended process or regulation, and the ON state isan above-threshold concentration or activity. Although the con-centration of components in biological systems changes continu-ously over time, the input–output curves of the regulatoryinteractions are often sigmoidal and can be well estimated bystep-functions [14], thereby providing an ample justification forthe use of Boolean models. Notably, these models have been suc-cessfully applied in modeling various biological systems such asthe segment polarity gene network in Drosophila melanogaster[20–22], flower morphogenesis in Arabidopsis thaliana [23], corti-cal area development in mammalian brain [24], the cell cycle ofseveral different organisms [25–28], glucose repression signalingpathways in Saccharomyces cerevisiae [29], mammalian immuneresponse to bacteria [30], T cell receptor signaling [31], phospho-lipase C-coupled calcium signaling [32], abscisic acid signalingnetwork in plants [33], as well as a T cell survival network inhumans [34].

In this paper, we provide a tutorial on the basic steps of Booleanmodeling of biological regulatory networks with a focus on generegulatory networks. However, the methods are general enoughto be applied to signal transduction networks as well, and indeedthey have been applied successfully [29,31–34]. We demonstratehow to infer a Boolean network from the available experimentaldata, analyze it using graph-theoretical measures, and convert itinto a predictive dynamic model. For each step, the decisions andrevisions that may need to be made are described. These stepsare demonstrated for a toy network and in the context of thewell-studied Drosophila segment polarity gene network, whichwe describe next.

Fig. 1. Illustration of the Boolean rules corresponding to a simple regulatorynetwork. (A) Network representation. A, B, and C are the network nodes. Thedirected edge ? or —| denotes activation or inhibition, respectively. (B) Booleanrules governing the nodes’ states for the network given in (A). For simplicity thestate of a node is represented by the node name. The asterisk denotes the futurestate of the labeled node.

2. Drosophila segment polarity gene network

The body of the fruit fly D. melanogaster, like in other arthro-pods, is composed of segments. A single-celled egg develops intoa multi-cellular embryo with 14 segments through a hierarchicalcascade of genetic interactions involving more than 40 genes[35]. The successive transient expression of the gap and pair-rulegenes from maternal morphogens creates a prepattern of the para-segments (the embryonic counterparts of the segments) and initi-ates the expression of the segment polarity genes. These genes,which are expressed throughout the life of the fruit fly, stabilize

the pattern, define narrow boundaries between the parasegments,and contribute to subsequent developmental processes [35,36].The segment polarity genes have a periodic spatial expression pat-tern wherein their mRNA is expressed (has a high concentration) incertain cells, and not expressed (has a very low concentration) inother cells. The main segment polarity genes include engrailed(en), wingless (wg), hedgehog (hh), patched (ptc), cubitus interruptus(ci) and sloppy paired (slp) (see e.g., [37–39]), which encode fortheir corresponding proteins denoted by EN, WG, HH, PTC, CI,and SLP. Two additional proteins, which are cleaved from CI, arealso involved: a transcriptional activator (CIA) and a transcriptionalrepressor (CIR).

The segment polarity genes refine and maintain their expres-sion through a network of intra- and inter-cellular regulatoryinteractions, which is referred to as the Drosophila segment polar-ity gene network [20,40]. This network has been extensively stud-ied using continuous [40–43], Boolean [20,21,44,45], as well ashybrid [22,46] approaches. The primary goal of these models wasto capture the spatially resolved expression of the segment polaritygenes, starting from an initial condition corresponding to the initialpre-pattern of the genes, and ending at a steady state that corre-sponds to the known sustained pattern of the genes. The modelsalso examined how this process changes in the presence of tran-sient or permanent perturbations. von Dassow et al. [40] proposedthe first model for this network: a continuous model containing 13differential equations and 48 unknown kinetic parameters. Theirresults suggested that the network’s topology has an essential im-pact on the robustness of the model. This idea was further sup-ported using Boolean dynamic models [20,21]. In the followingsection, we describe Boolean modeling of biological regulatory net-works in a step-by-step fashion with illustrations from the Dro-sophila segment polarity gene network.

3. Boolean models

A Boolean network model provides a qualitative representationof a system wherein each node can take two possible values de-noted by 1 (ON) or 0 (OFF). This indicates, for example, that a geneis expressed or not expressed, a transcription factor is active orinactive, and a molecule’s concentration is above or below a certainthreshold. This binary value represents the state of a node. In Bool-ean models, the future state of a node is determined based on a lo-gic statement on the current states of its regulators. Thisstatement, called a Boolean rule (function), is usually expressedvia the logic operators AND, OR, and NOT. A simple network con-taining three nodes along with the Boolean rules that express theregulation among the nodes is given in Fig. 1.

Page 3: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

A. Saadatpour, R. Albert / Methods 62 (2013) 3–12 5

3.1. Network inference

Reconstruction of a network structure for a biological regula-tory system consists of integrating information about the compo-nents involved in the system and their interactions from eithergene expression (e.g., microarray) data or causal experimentalobservations.

3.1.1. Inference of the nodes of the networkAs currently it is not possible to comprehensively model the

dynamics of genome-wide regulatory networks, the model is usu-ally confined to target a single behavior or outcome, e.g. paraseg-ment formation in D. melanogaster [20,40], or differentiation of Tcells [47,48]. The aim of this focused model is to include all thegenes and gene products involved in the targeted outcome orbehavior. The list of components to include usually starts from acore set known from the literature, and it expands as additionalinformation is included. For example, large-scale gene expressiondata can be used to identify which genes are differentially ex-pressed over the examined conditions, thus indicating that thesegenes may be associated with the behavior of interest [49–51]. Inaddition, new nodes for the regulatory network can be identifiedusing causal experimental evidence, where modulation of an al-ready known component of the network leads to variations inexpression/concentration of a gene/protein, implying that thegene/protein should be added to the network [52].

3.1.2. Inference of the edges of the networkOnce the nodes of the network are identified, the next step is to

infer the edges connecting these nodes (i.e., interactions among thecomponents). Information on the edges of a regulatory networkcan be obtained from either gene expression data [53–61] or causalexperimental evidence [62]. Inference of edges from gene expres-sion data is achieved using probabilistic, deterministic, or probabi-listic-deterministic hybrid methods [63]. Among the probabilisticapproaches are Bayesian network methods [54,55], which detecta directed, acyclic (feedback loop free) graph, along with a set ofjoint probability distributions describing the causal relationshipsamong the components. Dynamic Bayesian networks allow theinclusion of cyclic regulatory pathways in the inferred network[61]. Deterministic model-driven approaches, including continu-ous [56,57] and discrete methods [58,59], relate the rate of changein the expression level of a given gene to that of other genes using aset of differential or logic equations. Solving these equations re-sults in the identification of potential regulatory relationships thatcan best reproduce the observed expression data. Probabilistic-deterministic hybrid methods, such as probabilistic Boolean net-works [60], take into account the stochastic fluctuations in geneexpression levels by incorporating uncertainty in the designateddependency relationships.

There are three major types of causal experimental evidencefrom which information on edges of a network can be extracted[62]: (1) biochemical evidence of transcription factor-gene interac-tions, enzymatic activity or protein–protein interactions whichindicate direct interaction between two components; (2) geneticevidence of the effect of the mutation of a particular componenton another component; this evidence leads to an indirect causalrelationship between the two components; or (3) pharmacologicalevidence of the effect of exogenous treatment of a particular com-ponent on another component; this evidence also leads to an indi-rect causal relationship between the two components. Theintegration of the indirect causal evidence is often challengingand non-intuitive, as each such apparently pairwise relationshipmay in fact reflect a set of adjacent edges (a path) in the network,and it may involve other, known or unknown nodes. Fortunately,algorithmic approaches exist and have been implemented in the

software package NET-SYNTHESIS [64], a regulatory network infer-ence and simplification tool, which generates the sparsest networkconsistent with the given causal evidence. In some cases, one mayneed to perform manual curation of the output of the software tofind the most realistic network corresponding to the availableexperimental observations [34].

3.1.3. Network visualizationAfter the nodes and edges of the network were identified, one

can assemble a network corresponding to the regulatory systemof interest to get a big picture of the whole system. Several soft-ware packages are available for network visualization and analysisincluding yEd Graph Editor [65], Graphviz [66], Cytoscape [67,68],and Pajek [69].

3.1.4. Inference of the Drosophila segment polarity networkThe interaction network corresponding to the segment polarity

genes in Drosophila was constructed based on qualitative experi-mental data (generally of genetic nature) on causal relationshipsamong the genes [20,40]. As mentioned earlier, von Dassow et al.[40] constructed the first network for the segment polarity genesin Drosophila containing five genes (en, wg, ptc, ci, and hh) and theirproteins, and modeled it using a continuous approach. Their initialchoice of the network topology was not able to reproduce the ob-served expression patterns of the genes. However, adding two newinteractions to the network (an activating edge from WG to wg aswell as an inhibitory edge from CIR to en) restored consistencywith known robust patterns even in the presence of large varia-tions in the kinetic parameters. The authors of the first Booleanmodel of this network decided not to include the putative inhibi-tory edge from CIR to en as no evidence was found in the literatureto support the presence of this edge in the network, and they alsoincorporated the wg auto-regulation in a weaker form [20]. Otherextensions of this reconstructed network include addition of aninhibitory edge from EN to ptc as well as a new node called SloppyPaired (SLP) with its respective edges, all supported by experimen-tal observations [20]. This network was subsequently refined fur-ther by Chaves et al. [21] as depicted in Fig. 2. This figurerepresents intra- and inter-cellular interactions of a cell and itstwo neighbors. When expression of the segment polarity genes be-gins, a given gene is expressed in every fourth cell [20]. Thus onecan focus on a single parasegment of four cells and impose periodicboundary conditions on the ends. There are 13 components percell, thus the total number of nodes in the model is 4 � 13 = 52.

3.2. Constructing the Boolean rules

Having assembled the network, the next step is to construct theBoolean rules based on the relevant literature. If a node has onlyone regulator, then a single variable, usually denoted by the labelof the regulator, appears in its Boolean rule. This variable is com-bined with a NOT operator if the regulator is an inhibitor. Forexample, node A in Fig. 1A is inhibited by node C, resulting inthe Boolean rule A⁄ = NOT C, where the asterisk signifies the futurestate of node A. When a node has multiple regulators, the OR oper-ator is used if any of the regulators can activate the node, and theAND operator is used if co-expression of all of them is required forsuccessful activation of the node. For example, in the case of node Bin Fig. 1A the rule B⁄ = A AND C expresses that both A and C need tobe active (ON) in order for B to be able to turn ON.

When constructing the Boolean rules, one may encounter diffi-culties in deciding whether to use an AND or OR operator in a rulewherein a node is controlled by more than one regulator. Forexample, node B in Fig. 1 is regulated by A and C. How can onedetermine, in a real situation, if activation of both A and C, or onlyone of them, is required for the activation of B? If there is experi-

Page 4: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

Fig. 2. The network of interactions between the Drosophila segment polarity genes. The gray background layers illustrate a cell and its two neighbors. The shape of the nodesindicates whether the corresponding substances are mRNAs (ellipses) or proteins (rectangles). The edges of the network signify either biochemical reactions (e.g., translation,protein interactions) or regulatory interactions (e.g., transcriptional activation). Activating or inhibiting edges are represented by ? or —|, respectively. This figure and itscaption were adapted from [21].

6 A. Saadatpour, R. Albert / Methods 62 (2013) 3–12

mental evidence that knocking out either A or C leads to the ab-sence of B, then AND should be used. In contrast, if there is evi-dence that only simultaneous knockout of A and C wouldinactivate B, then OR should be used. When such information isnot available, if practical, one can construct several variants ofthe Boolean rules and compare the properties of the model variantswith those of the real system to decide which is the best [70]. If thisis not feasible, the OR operator may be used as a default, and themodel can be updated once additional information is obtained.Alternatively, one can employ probabilistic Boolean networks[60,71], which allow incorporating uncertainty in the rules byassigning different Boolean rules to a node, each with a certainprobability of being selected.

Table 1Boolean rules governing the nodes’ states in the Drosophila segment polarity genenetwork given in Fig. 2. Subscripts signify cell number, and the asterisk denotes thefuture state of the labeled node. This table was adapted from [21].

Node Boolean rule

SLPi SLP�i ¼0 if i 2 f1;2g1 if i 2 f3;4g

�wgi wgi

⁄ = (CIAi AND SLPi AND NOT CIRi) OR (wgi AND (CIAi OR SLPi) ANDNOT CIRi)

WGi WGi⁄ = wgi

eni eni⁄ = (WGi�1 OR WGi+1) AND NOT SLPi

ENi ENi⁄ = eni

hhi hhi⁄ = ENi AND NOT CIRi

HHi HHi⁄ = hhi

ptci ptci = CIAi AND NOT ENi AND NOT CIRi

PTCi PTCi⁄ = ptci OR (PTCi AND NOT HHi�1 AND NOT HHi+1)

cii cii⁄ = NOT ENi

CIi CIi⁄ = cii

CIAi CIAi⁄ = CIi AND (NOT PTCi OR HHi-1 OR HHi+1 OR hhi�1 OR hhi+1)

CIRi CIRi⁄ = CIi AND PTCi AND NOT HHi�1 AND NOT HHi+1 AND NOT hhi�1

AND NOT hhi+1

3.2.1. Constructing the Boolean rules for the Drosophila segmentpolarity gene network

In constructing the Boolean rules for this network, first somegeneral assumptions were made: (i) inhibitors are always domi-nant over activators, and (ii) the state of proteins follows the stateof their mRNAs with a time delay. However, there are some excep-tions from the latter assumption. For example, the rule for PTC al-lows for sustained PTC expression even if its mRNA ptc decays.There are three justifying arguments for this rule. First, experimen-tal observations support a sustained, though diminished, level ofPTC in the third cell of the parasegment whereas ptc expressiondisappears from this cell [72]. Second, as PTC is a transmembranereceptor protein and binding to HH of the neighboring cells is theonly reaction PTC participates in, it can be assumed that existingPTC levels are maintained even in the absence of ptc if there isno HH in the neighboring cells [20]. Third, allowing PTC to decayfrom the third cell following the decay of ptc causes the productionof CIA instead of CIR in that cell, which will then cause the abnor-mal expression of wg in the third cell.

We also note that the expression of SLP protein is assumed to beconstant. Indeed, this protein (which is a merger of two proteinswith similar functions) is translated from the pair-rule gene slp,which is activated before the segment polarity genes and ex-

pressed constitutively thereafter [73,74]. Since the translation ofSLP is not affected by any nodes in the segment polarity network,it was assumed that its expression does not change. The fulldescription of the Boolean rules is given in Table 1.

3.3. Structural analysis

Structural analysis of an assembled network by means of graph-theoretical measures sheds light on the topological organizationand function of the underlying biological system. These measuresrange from local to global topological measures, which provideinformation on individual nodes/edges or the whole network,respectively. The most often used measures include connectivitymeasures (such as distance) and centrality measures (such as nodedegree or betweenness centrality). The Python library NetworkX

Page 5: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

A. Saadatpour, R. Albert / Methods 62 (2013) 3–12 7

[75] and the software CellNetAnalyzer [76,77] offer tools for struc-tural analysis of a network.

From a graph theoretical standpoint, a path is a sequence ofadjacent edges in the network. The number of edges on the short-est path connecting two given nodes is called the distance betweenthe two nodes. For example, the distance from A to B in Fig. 1A isone, while the distance from B to A is infinity as there is no pathfrom B to A. A network is connected (or strongly connected inthe directed case) if there is a path (or two directed paths of oppo-site orientations in the directed case) between every pair of nodes.If a network is not (strongly) connected, it is informative to identify(strongly) connected components (or sub-graphs) of the network.In directed graphs, each strongly connected component (SCC)may have an in-component (nodes that can reach the SCC) andout-component (nodes that can be reached from the SCC). Forexample, nodes A and C in Fig. 1A form an SCC having node B inits out-component. It was reported that the transcriptional regula-tory networks have no or only small SCCs (e.g., three-node feed-back loops) [78,79], whereas a large SCC was observed inmetabolic [80] and signaling [81] networks. Having no SCCs indi-cates that the network has an acyclic structure (i.e., it does not con-tain feedback loops), while having a large SCC implies that thenetwork has a central core and the nodes outside this core are inits in- or out-component. Nodes in each of these subsets tend tohave a common task. For example, in signaling networks, the nodesof the in-component tend to be involved in ligand-receptor bindingand the nodes of the out-component are usually responsible for thetranscription of target genes as well as phenotypic changes [81].

Centrality measures describe the importance of individualnodes in the network. The simplest of such measures is the nodedegree, which quantifies the number of edges connected to eachnode. For directed networks, the in- and out-degree of a node is de-fined as the number of edges coming into or going out of the node,respectively, which collectively define the node degree. In particu-lar, the nodes with only outgoing edges (that is, nodes with in-de-gree = 0) are called sources, and nodes with only incoming edges(out-degree = 0) are sinks of the network. Such nodes indicatethe initial and terminal points of flow propagation in signal trans-duction networks. In Fig. 1A, for example, node B is a sink node. It isalso useful to determine the nodes whose degree is highest amongother nodes, termed hubs, whose removal can break down a net-work into isolated clusters [10]. The expression patterns of geneproducts in regulatory networks can be used to further classifyhubs into permanent hubs, e.g. multi-functional transcription fac-tors and house-keeping regulators, and transient hubs which areinfluential in one condition but less so in others [82]. The between-ness centrality of a node is another centrality measure, which is de-fined as the fraction of shortest paths between any pair of nodespassing through the given node [83]. This measure provides infor-mation on the importance of a node in mediating flow through thenetwork. The betweenness centrality of an edge can be definedsimilarly [84]. Although node degree and betweenness centrality(of a node or an edge) are local topological properties, they canbe generalized to distributions over all the nodes (or edges) ofthe network to get insights into the heterogeneity (diversity) ofnode interactivity levels or of the flow propagation through thenetwork [63]. For example, transcriptional regulatory networkstend to have a decreasing but long-tailed out-degree distribution(indicating a large heterogeneity, including the existence of tran-scription factors that regulate many genes) and a narrower in-de-gree distribution [85,86].

Besides the measures described above, it is also beneficial toidentify network motifs, recurring patterns of interconnectionwith well-defined topologies, which form simple building blocksof the network architecture [87]. Among these motifs are feed-forward and feedback loops. A three-node transcriptional feed-

forward loop consists of a transcription factor X that regulates asecond transcription factor Y, and both X and Y regulate gene Z[88]. Notably, such motifs are more abundant in the transcrip-tional regulatory networks of different organisms compared torandomized networks of the same degree distribution[87,89,90]. Feed-forward loops can carry out several functionssuch as filtering of noisy input signals, pulse generation, and re-sponse acceleration [88]. A feedback loop is a directed cyclewhose sign depends on the number of negative interactions inthe cycle; a positive/negative feedback loop has an even/oddnumber of negative interactions [14]. These motifs, as we willsee in the next section, play a pivotal role in the dynamics of asystem [14,91,92].

In the Drosophila segment polarity network given in Fig. 2, thenode SLP has only outgoing edges (in-degree = 0), thus it is asource node in the network. All other nodes have non-zero in-and out-degrees. The two proteins CIR and CIA have the highestin-degree of four as they can be activated in many ways. HHhas the highest out-degree as it has three outgoing edges to eachof the neighboring cells, thus affecting the future expression of sixother nodes. We can also see that there are two negative feedbackloops between ptc, PTC, and CIR as well as between ptc, PTC, andCIA.

We also note that there is a way of incorporating the Booleanrules into the structure of a network by introducing a complemen-tary node for any node whose negated expression (by the NOToperator) appears in the rules, as well as a composite node forany combination of the nodes connected by the AND operator[20]. Although this expansion of the network increases the numberof nodes, it has the benefit of having only one edge type, that is, allthe edges now indicate activation. In particular, the expanded net-work of the segment polarity genes provided important insightsinto the identification of coexpressed genes [20]. For example, itshowed that the cells expressing en and hh never express wg, ptc,or ci, a well-known polarization that is the basis for the name ‘‘seg-ment polarity genes’’ [36]. In a recent study [93], the idea ofexpanding the network combined with consideration of the cas-cading effects of a node’s removal was utilized to identify essentialcomponents of signal transduction networks. In that study, theconcept of elementary signaling mode, defined as the minimalset of nodes that can perform signal transduction independently,was also introduced; this is an adaptation of the path concept tonetworks exhibiting combinatorial regulation. Although this meth-od was primarily used for signal transduction networks, it can beadapted for evaluating the importance of genes in gene regulatorynetworks by considering the connectivity of the whole network in-stead of the connectivity from source to sink nodes [93]. It shouldbe noted that among the graph-theoretical measures describedabove, only the latter one requires the knowledge of the Booleanrules, analysis of the other measures only needs the knowledgeof nodes and edges of the network.

3.4. Dynamic analysis

3.4.1. Time implementationIn order to convert a static network representation into a dy-

namic model, one first needs to decide how to implement time.In Boolean models, time is an implicit variable and can be imple-mented via synchronous or asynchronous update algorithms. Syn-chronous models assume similar timescales for all the processesinvolved in a system, and are implemented by updating all thenodes’ states simultaneously [13]. More precisely, the state of nodei at time step t + 1, denoted by xi(t + 1), is determined based on thestate of its regulators at the tth time step:

xiðt þ 1Þ ¼ fiðxi1 ðtÞ; xi2 ðtÞ; . . . ; ximiðtÞÞ;

Page 6: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

8 A. Saadatpour, R. Albert / Methods 62 (2013) 3–12

where fi is the Boolean rule for node i and xij ’s, 1 6 j 6 mi, are thestates of its regulators. Although the simplicity and computationalefficiency of synchronous models is attractive, they cannot describewell the biological systems that include processes at multiple levels,e.g. mRNA, protein, and post-translational levels, because the time-scales of these processes range from fractions of a second to hours[3]. To overcome this limitation, asynchronous models have beendeveloped wherein the nodes are updated based on their individualtimescales [19,94]. Different asynchronous algorithms have beenproposed so far, including stochastic asynchronous algorithms, suchas the random order asynchronous [21,95] and general asynchro-nous [95] update methods, as well as the deterministic asynchro-nous algorithm [22]. In the random order asynchronousalgorithm, at each time step a random permutation of the nodesis selected and the nodes are updated in that order [21]. In this case,xi(t + 1) is determined according to the most recent updated state ofthe regulators of node i:

xiðt þ 1Þ ¼ fiðxi1 ðsi1 Þ; . . . ; ximiðsimiÞÞ;

where sij 2 ft; t þ 1g. If the position of regulator j is before node i inthe permutation sij ¼ t þ 1, otherwise sij ¼ t. In the general asyn-chronous framework, at each time step a randomly selected nodeis updated [95,96]. In the deterministic asynchronous framework,each node has a pre-selected time unit, ci, (either based on a prioriknowledge or randomly chosen from a uniform distribution) and isupdated at positive multiples of that unit [22]:

xiðt þ 1Þ ¼fiðxi1 ðtÞ; :::; ximi

ðtÞÞ if t þ 1 ¼ kci; k ¼ 1;2;3; :::

xiðtÞ otherwise:

(

A recent study provides a comparison of these three asynchro-nous methods applied to the same biological system [97].

3.4.2. Attractor analysisBy updating the state of the nodes according to synchronous or

asynchronous algorithms, one can obtain the state of the wholesystem at each time step, which is expressed by a vector whoseith element represents the state of node i at that time step. We notethat the Boolean model of a network with n nodes has a total of 2n

states. These states and the allowed transitions among them formthe state transition graph of the system. Starting from an initialstate in the state transition graph and iteratively updating the stateof the nodes, the state of the system evolves over time and follow-ing a trajectory of states, it eventually settles into an attractor.Attractors, which describe the long-time behavior of a system, fallinto two groups: fixed points (steady states), wherein the state of

Fig. 3. Synchronous and asynchronous state transition graphs of the network representeand C, respectively. The gray states are attractors of the system. (A) State transition graphdeterministic update. (B) State transition graph obtained by the random order asynchronof node permutations.

the system does not change, and complex attractors, wherein thesystem oscillates among a set of states. As fixed points of a systemare time independent, they are the same for both synchronous andasynchronous models. In contrast, complex attractors can be differ-ent in synchronous and asynchronous models. In synchronous anddeterministic asynchronous models, the system regularly oscillatesamong a set of states, called a limit cycle. In this case, the length(period) of a limit cycle is the number of states in the cycle. In sto-chastic asynchronous models, the system may oscillate irregularlyamong a set of states; these states form a terminal SCC of the statetransition graph (i.e., an SCC with an empty out-component). Foreach attractor, the set of states that can reach that attractor iscalled the basin of attraction of that attractor.

To obtain the fixed points of small Boolean networks, one canremove the time dependency from the Boolean rules and solvethe resulting set of equations. In addition, methods such as scalarequations and reduced scalar equations [98,99], which are basedon iterations of the Boolean rules, can be used to analytically iden-tify the limit cycles of synchronous models of small networks. Toidentify attractors of large Boolean networks one can take advan-tage of the available search-based methods that utilize the proper-ties of the state transition graphs [96,100,101], or simplify thenetwork prior to dynamic analysis (see next subsection).

Fig. 3 represents the state transition graph of the network givenin Fig. 1 obtained by synchronous and random order asynchronousmethods. In both models, states 001 and 100 are the fixed points.The synchronous model also has a limit cycle of period two con-taining states 010 and 101, which is absent from the asynchro-nous model. Indeed, synchronous models may exhibit limitcycles not present in the corresponding continuous or asynchro-nous models [26,97,102]. We also note that the fixed points canbe obtained analytically by solving the Boolean equations givenin Fig. 1B independent of time. Indeed, substituting A = NOT C intothe second and third equations results in B = 0 and C@C. The latterexpression implies that C can be 0 or 1, and as a result A can be 1 or0, respectively. Therefore the two fixed points (001 and 100) areobtained analytically as well.

By comparing the state transition graphs given in Fig. 3A and B,we see that in the state transition graph of the synchronous model,each state has a unique successor, which is not the case in theasynchronous model. Consequently, attractors in the state transi-tion graph of the synchronous model have disjoint basins of attrac-tion, whereas the basin of attraction of different attractors in(stochastic) asynchronous models may overlap. For example, states000, 010, 101, and 111 in Fig. 3B are in the basins of attraction ofboth fixed points.

d in Fig. 1. The binary digits from left to right represent the state of the nodes A, B,obtained by the synchronous model. Each state has a single successor for this type ofous model. Each states can have multiple successors, at most as many as the number

Page 7: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

Table 2Absorption times and absorption probabilities for the Markov chain corresponding tothe state transition graph of Fig. 3B when the chain starts from the transient states.Absorption probabilities are given for each fixed point: 001 on the left, and 100 onthe right.

State 000 010 011 101 110 111

Absorption time 1.0 1.0 1.0 1.3 1.0 1.3Absorption probability 0.5, 0.5 0.5,0.5 1.0, 0.0 0.5, 0.5 0.0,1.0 0.5,0.5

A. Saadatpour, R. Albert / Methods 62 (2013) 3–12 9

3.4.3. Network reductionAs the size of the state transition graph of a Boolean network is

an exponential function of the number of nodes in the network, itis difficult to map the whole state transition graph for even rela-tively small networks. In such cases, one can sample over a largeset of initial states [103] or take advantage of the available networkreduction techniques to simplify the network while preserving itsessential dynamical properties [97,104–107]. For example, themethod described in [104,106] is based on the removal of stablevariables (i.e., nodes whose states stabilized due to their regulationand irrespective of the update method or initial condition) and ofleaf nodes (i.e., nodes with out-degree = 0). The method presentedin [105,107] iteratively removes the nodes without a self-loop fromthe network. We recently proposed a network reduction methodand successfully applied it to gain insights into the dynamic behav-iors of the drought response in plants and the T-LGL leukemia sig-naling network in humans [97,108]. This method is based on theremoval of stable variables and simple mediator nodes (such asthe ones with in- and out-degree of one). This reduction methodenabled us to identify, for example, attractors of the unperturbedand perturbed models of the T-LGL leukemia signaling networkand to analyze their basins of attraction, thereby predicting novelcandidate therapeutic targets for the disease [108].

3.4.4. Absorption times and absorption probabilitiesA stochastic asynchronous model, including the random order

and general asynchronous models, can be characterized by a Mar-kov chain for which each entry of its transition matrix, pij, denotesthe probability of a transition from state i to state j, 0 6 i; j 6 2n � 1(decimal representation of the binary numbers). If a system has atleast one fixed point and it is possible to reach a fixed point fromeach transient state, then starting from an initial state, the ex-pected number of time steps to be absorbed into one of the fixedpoints (absorption time) as well as the probability to reach eachfixed point (absorption probability) can be computed using theanalysis of the transition matrix (see e.g. [109]). For example, thetransition matrix corresponding to the state transition graph givenin Fig. 3B is as follows, where the probabilities represent the frac-tions of the total number of permutations transforming one stateto another:

P ¼

0 1=2 0 0 1=2 0 0 00 1 0 0 0 0 0 00 1=2 0 0 1=2 0 0 00 1 0 0 0 0 0 00 0 0 0 1 0 0 00 1=3 0 1=6 1=3 0 1=6 00 0 0 0 1 0 0 00 1=3 0 1=6 1=3 0 1=6 0

0BBBBBBBBBBBBB@

1CCCCCCCCCCCCCA

By a rearrangement of the states, the matrix P can be parti-

tioned as P ¼ ðQ R0 I Þ, where Q denotes the submatrix containing

the transient states, I is an identity matrix, and R is the remainingpart. When the chain starts from the transient state i, the expectednumber of time steps for the chain to be absorbed into one of thefixed points is obtained by summing up the entries of the ith row of

the matrix ðI � QÞ�1, where I is the identity matrix of the same sizeas Q. In addition, the probability to reach each fixed point from the

transient state i is given by the ith row of the matrix ðI � QÞ�1 � R[109]. The absorption times and probabilities for the state transi-tion graph of Fig. 3B are given in Table 2. As we can see, the absorp-tion times from states 101 and 111 are longer than the otherstates as they have multiple ways to reach the fixed points.

Furthermore, except for two cases that converge to only one ofthe fixed points, all other states reach the two fixed points withequal probabilities.

3.4.5. Effect of the feedback loopsThe structure of a network plays a determinant role in its

dynamics. For example, it has been conjectured that the presenceof positive feedback loops in the network is necessary for multi-stability (having multiple fixed points), while negative feedbackloops are necessary for sustained oscillations [92]. This conjecturewas proven correct in several continuous [110–113] and discrete[114,115] frameworks. We can use the network of Fig. 1A for anillustration of the dynamic effects of feedback loops. The originalnetwork contains a positive feedback loop between nodes A andC, and it has two fixed points (it is bistable). If we change one ofthe inhibitory edges between nodes A and C into an activation,i.e. changing the positive feedback into a negative feedback, thenthere will be no fixed points for the system and instead it exhibitsoscillatory behavior both in synchronous and asynchronousmodels.

3.4.6. Software packages for dynamic simulationsSeveral software resources can be employed for dynamic anal-

ysis of Boolean models such as BooleanNet [116], BoolNet [117],SimBoolNet [118], GINsim [119], SQUAD [120], ChemChains[121], and ADAM [122]. The state transition graphs in Fig. 3 weregenerated using BooleanNet.

3.4.7. Dynamic analysis of the Drosophila segment polarity genenetwork

Now we illustrate how the described dynamic methods wereapplied in the context of the Drosophila segment polarity gene net-work. The segment polarity genes maintain the parasegment bor-ders and later the polarity of the segment in stages 8–11 of theembryonic development [36]. As the Boolean dynamic model ofthis network was intended to describe the effect of the segmentpolarity genes in maintaining the parasegment borders, the pat-terns of these genes formed before stage 8 was considered as theinitial state of the system [20]. This initial state includes a two-cell-wide SLP stripe in the posterior (toward the tail) half [73], asingle-cell-wide wg stripe in the most posterior part [35], single-cell-wide en and hh stripes in the most anterior (toward the head)part [35,123], and ci and ptc expressed in the posterior three-fourths [35,72,124]. Since the proteins are translated after themRNAs are transcribed, it was assumed that the proteins are notexpressed (OFF) in the initial state [20].

Starting from the above initial state, the synchronous model ofthe system reached the wild-type fixed point, in agreement withthe experimentally known results, after six time steps [20]. Remov-ing the time dependencies from the Boolean rules in Table 1 andsolving the resulting set of Boolean equations, as it was done forthe simple example of Fig. 1, results in ten solutions, six of whichare distinct fixed points as summarized in Table 3, and the restare minor variations of the previous six [20]. Three distinct fixedpoints were observed experimentally, corresponding to the wild-type state and two states (with either broad stripes or no stripes)

Page 8: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

Table 3Characterization of the distinct fixed points for the Drosophila segment polarity genenetwork. Subscripts indicate cell numbers. This table was adapted from [21].

Fixed point Expressed mRNAs/proteins

Wild-type wg4, WG4, en1, EN1, hh1, HH1, ptc2,4, PTC2,3,4, ci2,3,4, CI2,3,4, CIA2,4, CIR3

Broad stripes wg3,4, WG3,4, en1,2, EN1,2, hh1,2, HH1,2, ptc3,4, PTC3,4, ci3,4, CI3,4, CIA3,4

No segmentation ci1,2,3,4, CI1,2,3,4, PTC1,2,3,4, CIR1,2,3,4

Wild-type variant wg4, WG4, en1, EN1, hh1, HH1, ptc2,4, PTC1,2,3,4, ci2,34, CI2,3,4, CIA2,4, CIR3

Ectopic wg3, WG3, en2, EN2, hh2, HH2, ptc1,3, PTC1,3,4, ci1,3,4, CI1,3,4, CIA1,3, CIR4

Ectopic variant wg3, WG3, en2, EN2, hh2, HH2, ptc1,3, PTC1,2,3,4, ci1,3,4, CI1,3,4, CIA1,3, CIR4

10 A. Saadatpour, R. Albert / Methods 62 (2013) 3–12

observed in knockout and over-expression experiments, as de-picted in Fig. 4. The existence of the additional fixed points sug-gests that, for suitable initial conditions, this network canproduce patterns beyond those required in Drosophila embryogen-esis [20]. Estimating the size of the basins of attraction of eachfixed point was achieved by fixing the state of all nodes but oneas in that fixed point and determining how many of such statesreach that particular fixed point. This procedure was repeated forall the initial states wherein the expression of pairs and tripletsof nodes is different from that fixed point. After extrapolation itwas estimated that only a small fraction of the total initial statesreach the wild-type fixed point whereas the majority of the statesreach the fixed point with broad stripes.

Introducing time variability into the model using asynchronousdynamic frameworks enabled further insights into the develop-ment of gene expression patterns. For example, the random orderasynchronous model of the system showed that in the course ofa large number of simulations that start from the wild-type initialcondition but use different random node permutations, more thanhalf of the simulations lead to the wild-type fixed point but a con-siderable percentage results in an infeasible final state, for examplethe non-segmented or broad-striped pattern [21]. This is indicativeof the fragility of the wild-type pattern with respect to unboundedvariability of the synthesis and decay timescales. It was also foundthat the divergence from the wild-type fixed point is due to a tran-

Fig. 4. Illustration of the gene expression pattern of wingless in wild-type and two mutanexpression pattern of wingless in a gastrulating (stage 9) embryo obtained from http://wwmaintained for around 3 hours of embryonic development. The parasegmental furrows ffixed point obtained by the Boolean models. Horizontal rows correspond to the pattern ofparasegments wherein left corresponds to anterior and right to posterior. Each parasegmeexpressed. (B) Top: wingless expression pattern in a patched knockout mutant embryo aappear at the middle of the parasegment, indicating a new en–wg boundary. Bottom: B(with the change that ptc and PTC are not expressed), or when wg, en, or hh are initiatmutant embryo at stage 11 according to [123]. The initial periodic pattern disappearssegmented steady-state of the Boolean models, obtained when wg, en, or hh are kept OFFfrom [21] with permission from Elsevier. Gene expression images in parts (B) and (C) we

sient CIA/CIR imbalance in the posterior half of the parasegment,thus predicting that perturbations of the post-translational modifi-cation of Cubitus Interruptus can have as severe effects as muta-tions [21]. Interestingly, introducing a single realistic constraint,i.e., a timescale separation between protein-level processes andmRNA-level processes, resulted in having only the wild-type (witha frequency of 87.5%) and broad striped (with a frequency of 12.5%)fixed points. A similar timescale separation in the deterministicasynchronous version yielded a convergence to the wild-type fixedpoint with a frequency of 93% or more [22]. Analysis of the absorp-tion times for the random order asynchronous model of this net-work revealed the fast convergence to the wild-type pattern(with an expected time of four steps), contrasted with the longand oscillation-strewn path toward the broad striped pattern (withan expected time of 15 steps) [21]. Taken together, these resultssuggest that the system’s dynamical trajectory is robust even inthe presence of considerable variability in timescales.

3.5. Validity of the reconstructed model

To assess the validity of a reconstructed model, one needs tocheck if the known experimental observations can be replicatedby the dynamic analysis. Therefore, if there is experimental evi-dence for a certain behavior of a system, which cannot be repro-duced by the dynamic model, then the network model and/or theBoolean rules should be revised. To this end, one can, for example,change the rules with uncertainties in them e.g. those for which itwas not clear whether to use an OR or AND operator. Since theBoolean model of the Drosophila segment polarity gene networkreproduces the available experimental observations quite well, itsuggests that the structure of the network reasonably representsthe core features of the underlying biological system. Once newinformation on the involvement of additional nodes/edges in a syn-thesized network becomes available, they can be incorporated intothe model. For example, after the reconstruction of the segmentpolarity gene network, it was observed experimentally that the

ts with the corresponding fixed points obtained using Boolean models. (A) Top: Genew.fruitfly.org. Other segment polarity genes have similar periodic patterns that are

orm at the posterior border of the wg-expressing cells [36]. Bottom: The wild-typeindividual nodes—specified at the left side of the row—over two full and two partialnt is assumed to be four cells wide. A black (gray) box denotes a node that is (is not)t stage 11 according to [123]. The wingless stripes broaden, and secondary furrowsroad striped fixed point of the Boolean models, obtained when patched is kept OFFed in every cell [20]. (C) Top: wingless expression pattern in an engrailed knockout, and gives rise to a non-segmented, embryonic lethal phenotype. Bottom: Non-

, or cell-to-cell signaling is disrupted [20]. This figure and its caption were reprintedre reprinted from [123] with permission from Cold Spring Harbor Laboratory Press.

Page 9: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

A. Saadatpour, R. Albert / Methods 62 (2013) 3–12 11

EN protein inhibits slp transcription [125]. Subsequently, to incor-porate this experimental observation into the Boolean model, thefollowing rule was suggested for slp and SLP [22]:

RX�i ¼0 if i 2 f1;2g1 if i 2 f3;4g

�slp�i ¼ RXi AND NOT ENi

SLP�i ¼ slpi;

where RX represents a combination of regulation by the pair-rulegenes responsible for the establishment of slp. Considering thewild-type initial states for slp and SLP, it was proven that thisnew rule results in constant patterns for the two nodes and accord-ingly the same results as in the previous models [22]. In a latermodel, the rule was refined further to incorporate the feedback ofthe segment polarity network on SLP without the need of introduc-ing the new node RX [46].

3.6. Robustness of the reconstructed model

In order to assess the robustness of the reconstructed modelone can perform a perturbation analysis on the network structureand Boolean rules, or on the nodes’ states. For the former, therobustness can be examined through, for example, randomly add-ing an edge between two components or interchanging OR andAND operators in a rule. For the latter, one can study the effectof knockout or over-expression mutations by fixing the state of anode at 0 or 1, respectively. This analysis can be performed on aparticular initial state of the system or on the attractors. For exam-ple, knocking out en, wg, or hh in the wild-type initial state of theDrosophila segment polarity gene network in the synchronousmodel led to the fixed point with no segmentation (see Table 3),while ptc knockout or several over-expressed initial states resultedin the broad-striped pattern, in agreement with several experi-mental observations [20]. One of the interesting predictions of thisanalysis was the importance of the wg prepattern as it was foundthat an initial condition with no cell expressing wg or WG led tothe same non-segmented attractor as a wg knockout. Several otherresearchers also studied the effect of perturbations in the nodes’states or in the Boolean rules on the attractors of the segmentpolarity gene network [44,45,126]. For example, Subramanianand Gadgil [45] studied the robustness of the synchronous modelof this system under transient perturbations in the nodes’ states.By flipping the state(s) of one or multiple nodes in the wild-typefixed point for a short period of time, they identified critical nodes,such as wg in the first and third cells or en or hh in the second andfourth cells of the parasegment, whose state changes lead the sys-tem away from this fixed point [45].

For systems with cell-to-cell interactions robustness to changesin the number of cells can also be studied. This has been done inthe Drosophila segment polarity gene network using the synchro-nous Boolean as well as hybrid frameworks and led to the conclu-sion that rounds of cell division do not disrupt the pattern of thesegment polarity genes [20,46]. In general, a model is not expectedto be robust to every possible change, but fulfilling some degree ofrobustness reflects the adaptability of the underlying biologicalsystem under different conditions and leads to experimentallytestable predictions.

4. Conclusions

In this tutorial, we presented the basic steps of Boolean model-ing of biological regulatory networks from network reconstructionto structural and dynamic analysis of the inferred network. We notethat the results of Boolean models are greatly affected by the accu-racy of the inferred network and its Boolean rules. Therefore, to get

the most out of the model it is essential to bring to bear all the avail-able experimental observations into the assembled network. Struc-tural and dynamic analysis of the network can then provide insightsinto the topological organization and temporal behavior, respec-tively, of the underlying system. Such an analysis holds the promiseof new predictions which can guide future experimental studies,thus leading to further model refinements.

Since Boolean models are parameter free, they serve as a suit-able starting point for modeling poorly characterized systems,especially large-scale regulatory networks, to gain coarse-grainedinsights into the unknown facets of such systems. In addition,one can incorporate certain quantitative details on the timescales(e.g., the relative rates of two processes) in Boolean models by con-sidering priority classes either in the order of updates [21,26] or inthe probabilities of the node updates [127]. Furthermore, perform-ing a large number of Boolean dynamic simulations and obtainingthe fraction of simulations that have a certain state results in asemi-quantitative curve [34]. Nonetheless, it should be noted thatBoolean models cannot capture fine details of the underlying bio-logical system. For example, negative feedback loops often gener-ate oscillations in Boolean models (especially synchronous ones),but in continous models they can be a source of homeostasis[128] or excitation-adaptation behavior [88] as well. In addition,infinitesimal perturbation analysis cannot be achived in standard(synchronous/asychnronous) Boolean models as they only allowthe extreme case of reversing a node’s state [70]. In such situati-tons, if partial quantitative and/or kinetic information about thebiological processes involved in the system is available, one cantake advantage of alternative modeling strategies, such asautonomous Boolean networks [129,130] or hybrid models[17,22,131,132], in which time is considered to be continuous, toget further insights into such systems.

Acknowledgment

The authors were supported by the NSF grant CCF-0643529.

References

[1] B.Ø. Palsson, Systems Biology: Preperties of Reconstructed Networks,Cambridge University Press, New York, 2006.

[2] H.A. Kestler, C. Wawra, B. Kracher, M. Kuhl, BioEssays 30 (2008) 1110–1125.[3] J.A. Papin, T. Hunter, B.Ø. Palsson, S. Subramaniam, Nat. Rev. Mol. Cell Biol. 6

(2005) 99–111.[4] G. Karlebach, R. Shamir, Nat. Rev. Mol. Cell Biol. 9 (2008) 770–780.[5] T.I. Lee, N.J. Rinaldi, F. Robert, D.T. Odom, Z. Bar-Joseph, et al., Science 298

(2002) 799–804.[6] S. Cho, S.G. Park, D.H. Lee, B.C. Park, J. Biochem. Mol. Biol. 37 (2004) 45–52.[7] M. Pellegrini, D. Haynor, J.M. Johnson, Expert Rev. Proteomics 1 (2004) 239–

249.[8] N.C. Duarte, S.A. Becker, N. Jamshidi, I. Thiele, M.L. Mo, et al., Proc. Natl. Acad.

Sci. USA 104 (2007) 1777–1782.[9] V. Hatzimanikatis, C. Li, J.A. Ionita, L.J. Broadbelt, Curr. Opin. Struct. Biol. 14

(2004) 300–306.[10] R. Albert, A.L. Barabasi, Rev. Mod. Phys. 74 (2002) 47–97.[11] L. Sanchez, J. van Helden, D. Thieffry, J. Theor. Biol. 189 (1997) 377–389.[12] L. Sanchez, D. Thieffry, J. Theor. Biol. 211 (2001) 115–141.[13] S.A. Kauffman, Origins of Order: Self-Organization and Selection in Evolution,

Oxford Univ. Press, New York, 1993.[14] R. Thomas, R. D’Ari, Biological Feedback, CRC Press, Boca Raton, 1990.[15] A. Sackmann, M. Heiner, I. Koch, BMC Bioinformatics 7 (2006) 482.[16] C. Chaouiya, Brief Bioinform 8 (2007) 210–219.[17] L. Glass, J. Chem. Phys. 63 (1975) 1325–1335.[18] S.A. Kauffman, J. Theor. Biol. 22 (1969) 437–467.[19] R. Thomas, J. Theor. Biol. 42 (1973) 563–585.[20] R. Albert, H.G. Othmer, J. Theor. Biol. 223 (2003) 1–18.[21] M. Chaves, R. Albert, E.D. Sontag, J. Theor. Biol. 235 (2005) 431–449.[22] M. Chaves, E.D. Sontag, R. Albert, Syst. Biol. (Stevenage) 153 (2006) 154–167.[23] L. Mendoza, D. Thieffry, E.R. Alvarez-Buylla, Bioinformatics 15 (1999) 593–606.[24] C.E. Giacomantonio, G.J. Goodhill, PLoS Comput. Biol. 6 (2010) e1000936.[25] M.I. Davidich, S. Bornholdt, PLoS One 3 (2008) e1672.[26] A. Faure, A. Naldi, C. Chaouiya, D. Thieffry, Bioinformatics 22 (2006) e124–e131.[27] F.T. Li, T. Long, Y. Lu, Q. Ouyang, C. Tang, Proc. Natl. Acad. Sci. USA 101 (2004)

4781–4786.[28] K. Mangla, D.L. Dill, M.A. Horowitz, PLoS One 5 (2010) e8906.

Page 10: Boolean Modeling of Biological Regulatory Networks a Methodology Tutorial

12 A. Saadatpour, R. Albert / Methods 62 (2013) 3–12

[29] T.S. Christensen, A.P. Oliveira, J. Nielsen, BMC Syst. Biol. 3 (2009) 7.[30] J. Thakar, M. Pilione, G. Kirimanjeswara, E.T. Harvill, R. Albert, PLoS Comput.

Biol. 3 (2007) e109.[31] J. Saez-Rodriguez, L. Simeoni, J.A. Lindquist, R. Hemenway, U. Bommhardt,

et al., PLoS Comput. Biol. 3 (2007) e163.[32] G. Bhardwaj, C.P. Wells, R. Albert, D.B. van Rossum, R.L. Patterson, IET Syst.

Biol. 5 (2011) 174–184.[33] S. Li, S.M. Assmann, R. Albert, PLoS Biol. 4 (2006) e312.[34] R. Zhang, M.V. Shah, J. Yang, S.B. Nyland, X. Liu, et al., Proc. Natl. Acad. Sci. USA

105 (2008) 16308–16313.[35] J.E. Hooper, M.P. Scott, The molecular genetic basis of positional information

in insect segments, in: W. Hennig (Ed.), Early Embryonic Development ofAnimals, Springer-Verlag, Berlin, 1992, pp. 1–48.

[36] L. Wolpert, R. Beddington, J. Brockes, T. Jessell, P. Lawrence, et al., Principles ofDevelopment, Current Biology Press, London, 1998.

[37] P. Aza-Blanc, F.A. Ramirez-Weber, M.P. Laget, C. Schwartz, T.B. Kornberg, Cell89 (1997) 1043–1053.

[38] J.E. Hooper, M.P. Scott, Cell 59 (1989) 751–765.[39] B. Sanson, EMBO Rep. 2 (2001) 1083–1088.[40] G. von Dassow, E. Meir, E.M. Munro, G.M. Odell, Nature 406 (2000) 188–192.[41] N.T. Ingolia, PLoS Biol. 2 (2004) e123.[42] W. Ma, L. Lai, Q. Ouyang, C. Tang, Mol. Syst. Biol. 2 (2006) 70.[43] G. von Dassow, G.M. Odell, J. Exp. Zool. 294 (2002) 179–215.[44] G. Stoll, M. Bischofberger, J. Rougemont, F. Naef, Biosystems 102 (2010) 3–10.[45] K. Subramanian, C. Gadgil, IET Syst. Biol. 4 (2010) 169–176.[46] M. Chaves, R. Albert, J. R. Soc. Interface 5 (Suppl. 1) (2008) S71–84.[47] L. Mendoza, Biosystems 84 (2006) 101–114.[48] L. Mendoza, F. Pardo, Theory Biosci. 129 (2010) 283–293.[49] S.M. Kaech, S. Hemby, E. Kersh, R. Ahmed, Cell 111 (2002) 837–851.[50] R.J. Lund, M. Loytomaki, T. Naumanen, C. Dixon, Z. Chen, et al., J. Immunol.

178 (2007) 3648–3660.[51] D. Mathur, T.W. Danford, L.A. Boyer, R.A. Young, D.K. Gifford, et al., Genome

Biol. 9 (2008) R126.[52] R. Albert, S.M. Assmann, Discrete dynamic modeling with asynchronous

update or, how to model complex systems in the absence of quantitativeinformation, in: N.J. Totowa, D. Belostotsky (Eds.), Plant Systems Biology,Humana Press, 2009, pp. 207–225.

[53] J.J. Faith, B. Hayete, J.T. Thaden, I. Mogno, J. Wierzbowski, et al., PLoS Biol. 5(2007) e8.

[54] N. Friedman, M. Linial, I. Nachman, D. Pe’er, J. Comput. Biol. 7 (2000) 601–620.[55] A. Lewin, S. Richardson, Bayesian methods for microarray data, in: D.J.

Balding, M. Bishp, C. Cannings (Eds.), Handbook of Statistical Genetics, JohnWiley & Sons, Chichester, UK, 2008, pp. 267–295.

[56] A. Gupta, J.D. Varner, C.D. Maranas, Comput. Chem. Eng. 29 (2005) 565–576.[57] T.S. Gardner, D. di Bernardo, D. Lorenz, J.J. Collins, Science 301 (2003) 102–

105.[58] T. Akutsu, S. Miyano, S. Kuhara, Pac. Symp. Biocomput. (1999) 17–28.[59] T.E. Ideker, V. Thorsson, R.M. Karp, Pac. Symp. Biocomput. (2000) 305–316.[60] I. Shmulevich, E.R. Dougherty, S. Kim, W. Zhang, Bioinformatics 18 (2002)

261–274.[61] S.Y. Kim, S. Imoto, S. Miyano, Brief Bioinform. 4 (2003) 228–235.[62] R. Albert, B. DasGupta, R. Dondi, S. Kachalo, E. Sontag, et al., J. Comput. Biol. 14

(2007) 927–949.[63] C. Christensen, J. Thakar, R. Albert, IET Syst. Biol. 1 (2007) 61–77.[64] S. Kachalo, R. Zhang, E. Sontag, R. Albert, B. DasGupta, Bioinformatics 24

(2008) 293–295.[65] yEd Graph Editor: <http://www.yworks.com/en/products_yed_about.html>.[66] Ellson J, Gansner E, Koutsofios L, North SC, Woodhull G Graphviz—open

source graph drawing tools, in: P. Mutzel, M. Jnger, S. Leipert (Eds.), LNCS Vol.2265, Springer, 2002. pp. 483–484.

[67] P. Shannon, A. Markiel, O. Ozier, N.S. Baliga, J.T. Wang, et al., Genome Res. 13(2003) 2498–2504.

[68] M.E. Smoot, K. Ono, J. Ruscheinski, P.L. Wang, T. Ideker, Bioinformatics 27(2011) 431–432.

[69] V. Batagelj, A. Mrvar, Pajek – analysis and visualization of large networks, in:M. Jünger, P. Mutzel (Eds.), Graph Drawing Software, Springer, Berlin, 2003,pp. 77–103.

[70] S. Bornholdt, J. R. Soc. Interface 5 (Suppl. 1) (2008) S85–94.[71] I. Shmulevich, E.R. Dougherty, Probabilistic Boolean Networks: The Modeling

and Control of Gene Regulatory Networks, SIAM, New York, 2010.[72] A.M. Taylor, Y. Nakano, J. Mohler, P.W. Ingham, Mech. Dev. 42 (1993)

89–96.[73] K.M. Cadigan, U. Grossniklaus, W.J. Gehring, Genes Dev. 8 (1994) 899–913.[74] U. Grossniklaus, R.K. Pearson, W.J. Gehring, Genes Dev. 6 (1992) 1030–1051.[75] A. Hagberg, D. Schult, P. Swart, Exploring network structure, dynamics, and

function using NetworkX, in: G. Varoquaux, T. Vaught, J. Millman (Eds.),Proceedings of the 7th Python in Science Conference, Pasadena, CA, 2008, pp.11–15.

[76] S. Klamt, J. Saez-Rodriguez, E.D. Gilles, BMC Syst. Biol. 1 (2007) 2.[77] S. Klamt, J. Saez-Rodriguez, J.A. Lindquist, L. Simeoni, E.D. Gilles, BMC

Bioinformatics 7 (2006) 56.[78] H.W. Ma, J. Buer, A.P. Zeng, BMC Bioinformatics 5 (2004) 199.

[79] J. Sanz, J. Navarro, A. Arbues, C. Martin, P.C. Marijuan, et al., PLoS One 6 (2011)e22178.

[80] H.W. Ma, A.P. Zeng, Bioinformatics 19 (2003) 1423–1430.[81] A. Ma’ayan, S.L. Jenkins, S. Neves, A. Hasseldine, E. Grace, et al., Science 309

(2005) 1078–1083.[82] N.M. Luscombe, M.M. Babu, H. Yu, M. Snyder, S.A. Teichmann, et al., Nature

431 (2004) 308–312.[83] C.L. Freeman, Sociometry 40 (1977) 35–41.[84] M. Girvan, M.E.J. Newman, Proc. Natl. Acad. Sci. USA 99 (2002) 7821–7826.[85] N. Guelzim, S. Bottani, P. Bourgine, F. Kepes, Nat. Genet. 31 (2002) 60–63.[86] R. Dobrin, Q.K. Beg, A.L. Barabasi, Z.N. Oltvai, BMC Bioinformatics 5 (2004) 10.[87] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, et al., Science 298

(2002) 824–827.[88] U. Alon, An Introduction to Systems Biology: Design Principles of Biological

Circuits, Chapman & Hall/CRC, Boca Raton, FL, 2006.[89] C. Christensen, A. Gupta, C.D. Maranas, R. Albert, Physica A 373 (2007) 796–

810.[90] S.S. Shen-Orr, R. Milo, S. Mangan, U. Alon, Nat. Genet. 31 (2002) 64–68.[91] E. Sontag, A. Veliz-Cuba, R. Laubenbacher, A.S. Jarrah, Biophys. J. 95 (2008)

518–526.[92] R. Thomas, On the relation between the logical structure of systems and their

ability to generate multiple steady states and sustained oscillations, in: J.Della Dora, J. Demongeot, B. Lacolle (Eds.), Numerical Methods in the Study ofCritical Phenomena, Springer Verlag, Berlin, 1981, pp. 180–193.

[93] R.S. Wang, R. Albert, BMC Syst. Biol. 5 (2011) 44.[94] R. Thomas, J. Theor. Biol. 153 (1991) 1–23.[95] I. Harvey, T. Bossomaier, Time out of joint: attractors in asynchronous

random Boolean networks, in: P. Husbands, I. Harvey (Eds.), Proceedings ofthe fourth European Conference on Artificial Life, MIT Press, Cambridge, 1997,pp. 67–75.

[96] A. Garg, A. Di Cara, I. Xenarios, L. Mendoza, G. De Micheli, Bioinformatics 24(2008) 1917–1925.

[97] A. Saadatpour, I. Albert, R. Albert, J. Theor. Biol. 266 (2010) 641–656.[98] C. Farrow, J. Heidel, J. Maloney, J. Rogers, IEEE Trans. Neural. Network 15

(2004) 348–354.[99] J. Heidel, J. Maloney, F.J.R. Ch, Int. J. Bifurcat. Chaos 13 (2003) 535–552.

[100] E. Dubrova, M. Teslenko, IEEE/ACM Trans. Comput. Biol. Bioinf. 8 (2011)1393–1399.

[101] T. Skodawessely, K. Klemm, Adv. Complex Syst. 14 (2011) 439–449.[102] A. Mochizuki, J. Theor. Biol. 236 (2005) 291–310.[103] C. Campbell, S. Yang, R. Albert, K. Shea, Proc. Natl. Acad. Sci. USA 108 (2011)

197–202.[104] S. Bilke, F. Sjunnesson, Phys. Rev. E 65 (2002) 016129.[105] A. Naldi, E. Remy, D. Thieffry, C. Chaouiya, Theoret. Comput. Sci. 412 (2011)

2207–2218.[106] K.A. Richardson, Adv. Complex Syst. 8 (2005) 365–381.[107] A. Veliz-Cuba, J. Theor. Biol. 289 (2011) 167–172.[108] A. Saadatpour, R.S. Wang, A. Liao, X. Liu, T.P. Loughran, et al., PLoS Comput.

Biol. 7 (2011) e1002267.[109] C.M. Grinstead, J.L. Snell, Introduction to Probability, AMS, Providence, RI,

1997.[110] O. Cinquin, J. Demongeot, J. Theor. Biol. 216 (2002) 229–241.[111] J.L. Gouze, J. Biol. Syst. 6 (1998) 11–15.[112] E.H. Snoussi, J. Biol. Syst. 6 (1998) 3–9.[113] C. Soulé, ComplexUs 1 (2003) 123–133.[114] A. Reichard, J.-P. Comet, Dis. Appl. Math. 155 (2007) 2403–2413.[115] E. Remy, P. Ruet, D. Thieffry, Adv. Appl. Math. 41 (2008) 335–350.[116] I. Albert, J. Thakar, S. Li, R. Zhang, R. Albert, Source Code Biol. Med. 3 (2008)

16.[117] C. Mussel, M. Hopfensitz, H.A. Kestler, Bioinformatics 26 (2010) 1378–1380.[118] J. Zheng, D. Zhang, P.F. Przytycki, R. Zielinski, J. Capala, et al., Bioinformatics

26 (2010) 141–142.[119] A.G. Gonzalez, A. Naldi, L. Sanchez, D. Thieffry, C. Chaouiya, Biosystems 84

(2006) 91–100.[120] A. Di Cara, A. Garg, G. De Micheli, I. Xenarios, L. Mendoza, BMC Bioinformatics

8 (2007) 462.[121] T. Helikar, J.A. Rogers, BMC Syst. Biol. 3 (2009) 58.[122] F. Hinkelmann, M. Brandon, B. Guang, R. McNeill, G. Blekherman, et al., BMC

Bioinformatics 12 (2011) 295.[123] T. Tabata, S. Eaton, T.B. Kornberg, Genes Dev. 6 (1992) 2635–2645.[124] A. Hidalgo, P. Ingham, Development 110 (1990) 291–301.[125] C. Alexandre, J.P. Vincent, Development 130 (2003) 729–739.[126] Y. Xiao, E.R. Dougherty, Bioinformatics 23 (2007) 1265–1273.[127] L. Tournier, M. Chaves, J. Theor. Biol. 260 (2009) 196–209.[128] J.J. Tyson, K.C. Chen, B. Novak, Curr. Opin. Cell Biol. 15 (2003) 221–231.[129] V. Sevim, X. Gong, J.E. Socolar, PLoS Comput. Biol. 6 (2010) e1000842.[130] R. Zhang, S. De, H.L.D. Cavalcante, Z. Gao, D.J. Gauthier, J.E.S. Socolar, et al.,

Phys. Rev. E 80 (2009) 045202.[131] H. De Jong, J.L. Gouze, C. Hernandez, M. Page, T. Sari, et al., Bull. Math. Biol. 66

(2004) 301–340.[132] J. Thakar, A. Saadatpour-Moghaddam, E.T. Harvill, R. Albert, J. R. Soc. Interface

6 (2009) 599–612.