Modeling Biological Networks Lecture3
-
8/8/2019 Modeling Biological Networks Lecture3
1/69
Dr. Carlo Cosentino, Carnegie Mellon University, Pittsburgh, 2008
Modeling Biological Networks
Dr. Carlo Cosentino
School of Computer and Biomedical Engineering
Department of Experimental and Clinical Medicine
Università degli Studi Magna Graecia
Catanzaro
bioingegneria.unicz.it/~cosentino
-
Outline
Classification of biological networks
Modeling metabolic networks
Modeling gene regulatory networks
Inferring gene regulatory networks
-
Types of Biological Network
Several different kinds of biological network can be distinguished at the molecular level
Gene regulatory
Metabolic
Signal transduction
Protein-protein interaction
Moreover, other networks can be considered as we move to different description levels, e.g.
Immunological
Ecological
Here we will focus exclusively on molecular processes that take place within the cell
-
Goals
A major challenge is to identify, with reasonable accuracy, the complex macromolecular interactions at the gene, metabolite and protein levels
Once identified, the network model can be used to
simulate the process it represents
predict the features of its dynamical behavior
extrapolate cellular phenotypes
-
Graphs
A very useful formal tool for describing and visualizing biological networks is the graph
A graph, or undirected graph, is an ordered pair G = (V, E), where V is the set of vertices, or nodes, and E is a set of unordered pairs of distinct vertices, called edges or lines
For each edge {u, v}, the nodes u and v are said to be adjacent
We have a directed graph, or digraph, if E is a set of ordered pairs
In digraphs, the in-degree, k_in, (out-degree, k_out) of a node is the number of edges incident to (from) that node
Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
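As a minimal sketch of the in-degree and out-degree definitions, the snippet below stores a digraph as a list of ordered pairs. The function name `degrees` and the three-node toy graph are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

def degrees(edges):
    """Return (k_in, k_out) counters for a digraph given as (source, target) pairs."""
    k_in, k_out = Counter(), Counter()
    for u, v in edges:
        k_out[u] += 1   # the edge leaves u
        k_in[v] += 1    # the edge enters v
    return k_in, k_out

# Hypothetical toy digraph: A -> B, A -> C, B -> C
edges = [("A", "B"), ("A", "C"), ("B", "C")]
k_in, k_out = degrees(edges)
print("k_in(C) =", k_in["C"], " k_out(A) =", k_out["A"])
```

A `Counter` returns 0 for nodes with no incoming (or outgoing) edges, which matches the definition without special cases.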
-
Topological Characteristics
The degree distribution, P(k), gives the probability that a selected node has exactly k links
It allows us to distinguish between different classes of networks (see the following slides)
The clustering coefficient of a node I, C_I, measures the aggregation of its adjacent nodes (number of triangles passing through node I)
C(k) is the average clustering coefficient of all nodes with k links
Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
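The two quantities can be sketched for a small undirected graph as follows. The clustering coefficient is computed here with the common normalization C_I = 2 n_I / (k (k - 1)), where n_I is the number of links among the k neighbors of node I. That formula, the function names and the toy graph are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_distribution(adj):
    """P(k): fraction of nodes having exactly k links."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in counts.items()}

def clustering(adj, node):
    """C_I = 2 n_I / (k (k - 1)); defined as 0 for nodes with fewer than 2 links."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3
adj = adjacency([(1, 2), (1, 3), (2, 3), (3, 4)])
P = degree_distribution(adj)
C3 = clustering(adj, 3)
print(P, C3)
```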
-
Erdős-Rényi Random Networks
The Erdős-Rényi model of a random network starts with N nodes and connects each pair of nodes with probability p
The degree follows a Poisson distribution, thus many nodes have the same number of links (close to the average degree)
The tail decreases exponentially, which indicates that nodes with k very different from the average are rare
The clustering coefficient is independent of a node's degree
Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
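A minimal sketch of the construction, assuming a fixed random seed for reproducibility: every pair of nodes is linked independently with probability p, and the resulting mean degree should fall close to p (N - 1). The function name and the values N = 200, p = 0.05 are illustrative choices.

```python
import random
from collections import Counter

def erdos_renyi(n, p, seed=0):
    """Undirected random graph: each of the n(n-1)/2 pairs is linked with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

n, p = 200, 0.05
adj = erdos_renyi(n, p)
degs = [len(nbrs) for nbrs in adj.values()]
mean_k = sum(degs) / n
hist = Counter(degs)   # empirical degree histogram, to compare with a Poisson shape
print(f"mean degree {mean_k:.2f}, expected about {p * (n - 1):.2f}")
```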
-
Scale-Free Networks
Scale-free networks are characterized by a power-law degree distribution
The probability that a node has k links follows $P(k) \sim k^{-\gamma}$, where $\gamma$ is the degree exponent
The probability that a node is highly connected is statistically more significant than in a random graph
In the Barabási-Albert model, at each time point a node with M links is added to the network, which connects to an already existing node I with probability $\Pi_I = k_I / \sum_J k_J$
The underlying mechanism is that nodes with many links have a higher probability of getting more (this is also referred to as preferential attachment)
Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
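Preferential attachment can be sketched with a list in which every node appears once per incident edge, so that a uniform draw from the list selects node I with probability proportional to its degree. The fully connected starting core, the rejection of repeated targets and the function name are assumptions of this sketch, not details given in the lecture.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph to n nodes; each new node attaches m links by preferential attachment."""
    rng = random.Random(seed)
    # Assumed initial condition: a fully connected core of m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node is listed once per incident edge, so a uniform draw from
    # 'stubs' picks node I with probability k_I / sum_J k_J
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets are found
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = barabasi_albert(n=500, m=2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print("largest degree:", max(degree.values()), " smallest degree:", min(degree.values()))
```

In a run of this kind the largest degree is typically far above the average, which is the hub signature described in the slide.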
-
Hierarchical Networks
A hierarchical structure arises in systems that combine modularity and scale-free topology
The hierarchical model is based on the replication of a small cluster of four nodes (the central ones)
The external nodes of the replicas are linked to the central node of the original cluster
The resulting network has a power-law degree distribution, thus it is scale-free
The average clustering coefficient scales with the degree following $C(k) \sim k^{-1}$
Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
-
Graphs of Biological Networks
Depending on the kind of biological network, the edges and nodes of the graph have different meanings
Metabolic network
nodes: metabolic products, edge: a reaction transforming A into B
Transcriptional regulation network (protein-DNA)
nodes: genes and proteins, edge: a TF regulates a gene
Protein-protein network
nodes: proteins, edge: interaction between proteins
Gene regulatory networks (functional association network)
nodes: genes, edge: expressions of A and B are correlated
-
Topology of Biological Networks
An extensive commentary was published by Albert in 2005, reviewing the literature on the topology of different kinds of biological networks
Experimental evidence is reviewed for metabolic, transcriptional regulatory, signal transduction and functional association networks
All of the considered networks approximately exhibit a power-law degree distribution, at least for the in-degree or for the out-degree
For instance, transcriptional regulation networks exhibit a scale-free out-degree distribution, signifying the potential of transcription factors to regulate multiple targets
On the other hand, their in-degree distribution is a more restricted exponential function, suggesting that combinatorial regulation by several TFs is less frequent
Albert, Scale-free networks in cell biology, Journal of Cell Science 118(21), 4947-4957, 2005
-
PP Interaction Network in Yeast
This network is based on yeast two-hybrid experiments
A few highly connected nodes (hubs) hold the network together
The color of a node indicates the phenotypic effect of removing the corresponding protein
red: lethal
green: non-lethal
orange: slow growth
yellow: unknown
Barabási and Oltvai, Nature Reviews Genetics 5(2), 101-113, 2004
-
Outline
Classification of biological networks
Modeling metabolic networks
Modeling gene regulatory networks
Inferring gene regulatory networks
-
Metabolic Reactions
Living cells require energy and material for
building up membranes
storing molecules
replenishing enzymes
replication and repair of DNA
movement
Metabolic reactions can be divided into two categories
Catabolic reactions: breakdown of complex compounds to obtain energy and building blocks
Anabolic reactions: assembly of the compounds used by the cellular mechanisms
-
Basic Concepts of Metabolism
Historically, metabolism is the part of cell function that has been studied most thoroughly over the last decades
As a result, several well-established mathematical tools exist for describing this kind of network
Enzyme kinetics investigates the dynamic properties of the individual reactions in isolation
Stoichiometric analysis deals with the balance of compound production and degradation at the network level
Metabolic control analysis describes the effect of perturbations in the network, in terms of changes in metabolite concentrations
Most of the tools used in the quantitative study of metabolic networks can also be applied to other types of networks
-
Glycolysis
We will use the case study of glycolysis in yeast to illustrate the theoretical concepts introduced hereafter
The pathway shown below is part of the glycolysis process
List of Reactions
v1: hexokinase
v2: consumption of glucose-6-phosphate in other pathways
v3: phosphoglucoisomerase
v4: phosphofructokinase
v5: aldolase
v6: ATP production in lower glycolysis
v7: ATP consumption in other pathways
v8: adenylate kinase
Hynne et al, Full-scale model of glycolysis in Saccharomyces cerevisiae, Biophys. Chem. 94, 121-163, 2001
-
ODE Model of Glycolysis
The system of ODEs describing the pathway is
-
ODE Model with Constant Glucose
The kinetic rates, as functions of the reactants, can be derived by applying the models presented in the previous lecture
Model Parameters
-
Stoichiometric Analysis
The basic elements considered in the stoichiometric analysis of metabolic networks are
The concentrations of the various species
The reactions or transport processes affecting such concentrations
The stoichiometric coefficients denote the proportion of substrate and product molecules involved in a reaction
For instance, if we consider the reaction
$S_1 + S_2 \rightarrow 2P$
the stoichiometric coefficients of $S_1$, $S_2$, $P$ are -1, -1, 2, respectively
-
Stoichiometric Analysis
The change of concentrations in time can be described by means of ODEs
For the simple reaction above we have
$\frac{dS_1}{dt} = -v, \quad \frac{dS_2}{dt} = -v, \quad \frac{dP}{dt} = 2v$
This means that the degradation of $S_1$ with rate v is accompanied by the degradation of $S_2$ with the same rate and by the production of P at twice that rate
-
Stoichiometric Matrix
In general, for a system of m substances and r reactions, the system dynamics are described by
$\frac{dS_i}{dt} = \sum_{j=1}^{r} n_{ij} v_j, \quad i = 1, \dots, m$
The number $n_{ij}$ is the stoichiometric coefficient of the i-th metabolite in the j-th reaction
For the sake of simplicity, we assume that the changes of concentrations are only due to reactions (i.e. we neglect the effect of convection or diffusion)
We can then define the stoichiometric matrix
$N = \{n_{ij}\}, \quad i = 1, \dots, m, \quad j = 1, \dots, r$
in which columns correspond to reactions and rows to concentration variations
-
Stoichiometric Model
The mathematical description of the metabolic network can be given in matrix form as
$\frac{dS}{dt} = N v$
where
$S = (S_1, \dots, S_m)^T$ is the vector of concentration values
$v = (v_1, \dots, v_r)^T$ is the vector of reaction rates
If the system is at steady state (that is, $dS_i/dt = 0$ for $i = 1, \dots, m$) we can also define the vector of steady-state fluxes, $J = (J_1, \dots, J_r)^T$
Finally, the model involves a certain number of parameters, thus we can also define a parameter vector, $p = (p_1, p_2, \dots)^T$
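The matrix form can be sketched numerically for a single reaction in which S1 and S2 are consumed at rate v and P is produced at twice that rate, so the stoichiometric matrix has one column. The mass-action rate law v = k S1 S2, the rate constant, the initial concentrations and the forward-Euler integrator are all assumptions chosen to keep the example short.

```python
def simulate(N, rates, S0, dt=0.001, steps=5000):
    """Forward-Euler integration of dS/dt = N v(S)."""
    S = list(S0)
    for _ in range(steps):
        v = rates(S)
        dS = [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]
        S = [s + dt * d for s, d in zip(S, dS)]
    return S

# Rows: S1, S2, P; single column: the reaction consuming S1 and S2 and producing 2 P
N = [[-1],
     [-1],
     [2]]
k = 1.0                                  # hypothetical rate constant

def rates(S):
    return [k * S[0] * S[1]]             # assumed mass-action kinetics

S0 = [1.0, 0.5, 0.0]
S_end = simulate(N, rates, S0)
print("final concentrations:", S_end)
```

Whatever the rate law, the combinations S1 - S2 and 2 S1 + P stay constant, because they depend only on the entries of N.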
-
Stoichiometric Model of Glycolysis
For the glycolysis model we have
-
Analysis of the Stoichiometric Matrix
A relevant piece of information that can be readily derived from the matrix N is which combinations of individual fluxes are possible at steady state
The system of algebraic equations $N v = 0$ admits a nontrivial solution only if $\mathrm{rank}(N) < r$
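The admissible steady-state fluxes are the vectors in the null space of N. The sketch below computes a basis of that null space by Gauss-Jordan elimination on exact fractions. The four-reaction toy network is hypothetical and is not the glycolysis model; the function name `null_space` is likewise an illustrative choice.

```python
from fractions import Fraction

def null_space(N):
    """Basis of {v : N v = 0}, via Gauss-Jordan elimination on exact fractions."""
    rows = [[Fraction(x) for x in row] for row in N]
    n_cols = len(rows[0])
    pivots = []          # pivot column of each reduced row
    r = 0                # index of the next row to reduce
    for c in range(n_cols):
        if r == len(rows):
            break
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue     # no pivot in this column: it is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -rows[row_idx][free]
        basis.append(vec)
    return basis

# Hypothetical network: v1: -> A, v2: A -> B, v3: B ->, v4: A ->
N = [[1, -1, 0, -1],    # balance of A
     [0, 1, -1, 0]]     # balance of B
K = null_space(N)
for vec in K:
    print([str(x) for x in vec])
```

Here rank(N) = 2 and r = 4, so the basis has two vectors. In both of them the entries for v2 and v3 coincide, which reflects the fact that these two reactions form an unbranched stretch through B.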
-
An Example
Let us consider the simple network
The stoichiometric matrix is N=(1 1 1)
and the steady-state fluxes are described by the linear combination
-
Null Rates at Steady State
For the glycolysis model we have r = 8 and rank(N) = 5, thus a basis of the null space of N is composed of three vectors
Note that the entries in the last row are all zero; this means that the net rate for that reaction is null at steady state
Hence, at steady state we can neglect the reaction v8
-
Unbranched Pathways
Another property that can be readily derived is the presence of unbranched pathways
In this case, the net rates of all the reactions in the pathway must be equal
The entries for the second and third reaction in the matrix K are always equal
This implies that the fluxes through reactions 2 and 3 must be equal at steady state
-
Elementary Flux Modes
A pathway can be defined as a set of metabolic reactions linked by common metabolites
It is not straightforward to recognize pathways in metabolic maps that have been reconstructed from experimental evidence
This problem is formalized in the concept of Elementary Flux Modes (EFMs)
The aim is to find the admissible direct routes for producing a certain metabolite starting from another one
To get an idea of the usefulness of such mathematical methods, we can have a glimpse at a typical whole-organism-scale metabolic network
-
Metabolic Network in Yeast
Palsson, Systems Biology: Properties of Reconstructed Networks, 2006
-
Elementary Flux Modes
Without going into the mathematical details, we can gain further insight by looking at the elementary flux modes of two simple networks
A factor that greatly influences the EFMs is the reversibility of the individual reactions
-
Applications of EFM Analysis
EFMs can be used to
infer the range of metabolic pathways in the network
test a set of enzymes for the production of a desired compound, and find the most convenient pathway
reconstruct metabolism from annotated genome sequences and analyze the effects of enzyme deficiency
reduce drug effects and identify drug targets
-
Flux Balance Analysis
Flux Balance Analysis (FBA) deals with the problem of finding the operative modes of metabolic networks subject to three kinds of constraints
1) The operative mode is assumed to be at steady state
2) The operative mode must respect the (ir)reversibility of the reactions
3) The enzyme catalytic activity in each reaction is limited to an admissible range, i.e. $\alpha_i \le v_i \le \beta_i$
Additional constraints may be imposed by biomass composition or other external conditions
-
Flux Balance Constrained Optimization
Such constraints confine the steady-state fluxes to a feasible set, but usually do not yield a unique solution
Hence, the determination of a particular metabolic flux distribution can be cast as a linear optimization problem
Maximize an objective function
$Z = \sum_{i=1}^{r} c_i v_i$
subject to the constraints given above
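Collecting the three kinds of FBA constraints, the optimization can be written compactly as a linear program. The weights $c_i$ and the set name $\mathcal{I}$ of irreversible reactions are notation assumed here for illustration; the bounds follow the admissible range of each reaction rate.

```latex
\begin{aligned}
\max_{v} \quad & Z = \sum_{i=1}^{r} c_i v_i \\
\text{subject to} \quad & N v = 0 && \text{(steady state)} \\
& v_i \ge 0, \quad i \in \mathcal{I} && \text{(irreversible reactions)} \\
& \alpha_i \le v_i \le \beta_i, \quad i = 1, \dots, r && \text{(admissible range)}
\end{aligned}
```

Since both the objective and the constraints are linear in v, any standard linear programming solver can be used; when the optimum is not unique, the solver returns one point of the optimal face.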
-
Conservation Relations
If a substance is neither added to nor removed from the reaction system, its total concentration remains constant
This property can be derived by analyzing the null space of $N^T$, defined by the matrix G such that
$G N = 0$
The latter implies
$G \frac{dS}{dt} = G N v = 0 \;\Rightarrow\; G S = \mathrm{const}$
The dimension of the null space is m - rank(N)
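A conservation relation can be checked numerically on a small example. The two-reaction adenylate network below, its mass-action rate laws and the rate constants are hypothetical and much simpler than the glycolysis model; the row vector G = (1, 1, 1) is verified to satisfy G N = 0, and the total is then tracked along a forward-Euler trajectory.

```python
# Species order: ATP, ADP, AMP (hypothetical toy network)
# v1: ATP -> ADP            (ATP consumption)
# v2: ATP + AMP -> 2 ADP    (adenylate kinase, written in one direction)
N = [[-1, -1],
     [1, 2],
     [0, -1]]
G = [1, 1, 1]

# G N, computed column by column
GN = [sum(G[i] * N[i][j] for i in range(3)) for j in range(2)]

def euler_step(S, dt, k1=1.0, k2=0.5):
    """One forward-Euler step of dS/dt = N v, with assumed mass-action rates."""
    v = [k1 * S[0], k2 * S[0] * S[2]]
    return [S[i] + dt * sum(N[i][j] * v[j] for j in range(2)) for i in range(3)]

S = [2.0, 0.5, 1.0]
total0 = sum(S)
for _ in range(1000):
    S = euler_step(S, 0.01)
print("G N =", GN, " total before:", total0, " total after:", sum(S))
```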
-
Conservation in Glycolysis
For the glycolysis example we have
which means that the sum of the concentrations of AMP, ADP and ATP remains constant
The conservation relations can be used to simplify the dynamical model: the algebraic equations that express the conservation constraints allow some variables to be written as functions of the others
-
Metabolic Control Analysis
Metabolic Control Analysis (MCA) deals with the sensitivity of the steady-state properties of the network to small parameter changes
It can also be applied to models of other kinds of network, like signaling pathways or gene expression
Issues addressed by MCA
Predict properties of the network from knowledge of individual components
Find which specific step has the greatest influence on a flux, steady-state concentration or reaction rate
Find the best target reaction to treat a metabolic disorder
These questions are very relevant in biotechnological production processes and health care
-
Basic Concepts of MCA
The relations between steady-state properties and model parameters are usually highly nonlinear
There is no general theory predicting the effect of large parameter changes
The MCA approach deals with small parameter changes
Under this assumption, the model can be approximated, in the neighborhood of the steady state, with a linear one
Given the linearized model, it is possible to derive some indexes describing the properties mentioned above, e.g. elasticity coefficients, control coefficients, response coefficients
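For reference, the three kinds of coefficient are often written in the following normalized form in the MCA literature. The notation is an assumption of this note (it is not taken from the slides): $v_k$ is a reaction rate, $S_i$ a concentration, $J$ a steady-state flux and $p$ a parameter.

```latex
\varepsilon^{v_k}_{S_i} = \frac{S_i}{v_k}\,\frac{\partial v_k}{\partial S_i},
\qquad
C^{J}_{v_k} = \frac{v_k}{J}\,\frac{\partial J}{\partial v_k},
\qquad
R^{J}_{p} = \frac{p}{J}\,\frac{\partial J}{\partial p}
```

The elasticity is a local property of an isolated reaction, whereas control and response coefficients are systemic properties evaluated at the steady state of the whole network.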
-
Outline
Classification of biological networks
Modeling metabolic networks
Modeling gene regulatory networks
Inferring gene regulatory networks
-
Gene Regulatory Networks
A protein synthesized from a gene can serve as a transcription factor for another gene, as an enzyme catalyzing a metabolic reaction, or as a component of a signal transduction pathway
Apart from DNA transcription regulation, gene expression may be controlled during RNA processing and transport, RNA translation, and the post-translational modification of proteins
Therefore, gene regulatory networks (GRNs) involve interactions between DNA, RNA, proteins and other molecules
A suitable way to manage this complexity may consist of using functional association networks
In these networks the edges of the corresponding graph do not represent chemical interactions, but functional influences of one gene on another
-
Example of a GRN
A toy regulatory network of three genes is depicted in the cartoon below
De Jong, Modeling and simulation of genetic regulatory systems, INRIA RR-4032, 2000
-
Modeling GRNs
In what follows we present an overview of the models used to describe GRNs
Two main issues have to be taken into account when choosing a modeling framework
Computational requirements for simulation
Available methods for inferring the network topology
-
Bayesian Networks
In the formalism of Bayesian Networks, the structure of a genetic regulatory system is modeled by a directed acyclic graph $G = \langle V, E \rangle$
The vertices $i \in V$, $i = 1, \dots, n$, represent gene expression levels and correspond to random variables $X_i$
For each $X_i$, a conditional distribution $p(X_i \mid \mathrm{parents}(X_i))$ is defined, where $\mathrm{parents}(X_i)$ denotes the direct regulators of i
The graph G and the set of conditional distributions uniquely specify a joint probability distribution p(X)
-
Independency in BN
If $X_i$ is independent of Y given Z, where Y and Z are sets of variables, we can state a conditional independency
$i(X_i; Y \mid Z)$
For every node i in G,
$i(X_i; \text{non-descendants}(X_i) \mid \mathrm{parents}(X_i))$
Hence, the joint probability distribution can be decomposed into
$p(X) = \prod_{i=1}^{n} p(X_i \mid \mathrm{parents}(X_i))$
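The factorization can be sketched for three binary genes in which A and B both regulate C. The graph, the conditional probability values and the function name `joint` are hypothetical; the point is only that the joint probability is a product of one conditional term per node.

```python
from itertools import product

# Hypothetical network: A -> C <- B, all variables binary (0 = off, 1 = on)
parents = {"A": (), "B": (), "C": ("A", "B")}

# P(X = 1 | parent values); the numbers are made up for illustration
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.05, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9},
}

def joint(assign):
    """p(X) as the product over nodes of p(X_i | parents(X_i))."""
    p = 1.0
    for x, pa in parents.items():
        p_on = cpt[x][tuple(assign[q] for q in pa)]
        p *= p_on if assign[x] == 1 else 1.0 - p_on
    return p

names = list(parents)
total = sum(joint(dict(zip(names, vals))) for vals in product((0, 1), repeat=3))
p111 = joint({"A": 1, "B": 1, "C": 1})
print("sum over all states:", total, " p(A=1, B=1, C=1):", p111)
```

Eight joint probabilities are specified by six numbers here; the saving grows quickly with the number of nodes when each gene has only a few regulators.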
-
Example of BN
Here we illustrate the formulation of the BN model for a simple network
Two graphs are said to be equivalent if they imply the same set of independencies; they cannot be distinguished by observation of X
De Jong, Modeling and simulation of genetic regulatory systems, INRIA RR-4032, 2000
-
Features of BNs
There is no need to specify a single value for each parameter of the model; rather, a distribution over the admissible range of values is assigned
This characteristic helps to avoid overfitting, which is common in the presence of a small data set and a large number of parameters
It is a statistical modeling approach, which nicely fits the stochastic nature of biological systems
BNs are static models, although it is possible to take dynamical aspects into account through an extension of this theory, namely dynamic Bayesian networks (DBNs)
-
Boolean Networks
In the framework of Boolean Networks, the expression level of a gene can attain only two values, that is, active (on, 1) or inactive (off, 0)
Accordingly, the interactions between elements of the network are represented by Boolean functions
Smolen, Baxter, Byrne, Mathematical modeling of gene networks, Neuron 26, 567-580, 2000
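A minimal sketch of a Boolean network with synchronous updates is given below. The three genes, their Boolean rules and the function names are hypothetical and chosen only to produce a small cyclic attractor.

```python
def step(state):
    """Synchronous update of a hypothetical three-gene Boolean network."""
    a, b, c = state
    return (
        1 - c,     # A is on unless it is repressed by C
        a,         # B is activated by A
        a & b,     # C requires both A and B
    )

def attractor(state):
    """Iterate until a state repeats and return the cycle that is reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

cycle = attractor((1, 0, 0))
print("attractor:", cycle)
```

Because the state space is finite (2^n states for n genes) and the update is deterministic, every trajectory must end in a fixed point or a cycle, which is what makes simulation cheap even for large networks.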
-
Features of Boolean Networks
Deterministic description
Very easy to build the model and to simulate it, even for very large networks
They provide only a coarse-grained description of the network behavior, and are thus not useful for a more detailed analysis of the regulatory mechanisms
-
ODE Models
We have seen that the mechanistic ODE approach has been widely exploited since the beginning of the last century for modeling biochemical reactions
When the order of the system increases, classical nonlinear ODE models become hardly tractable in terms of parametric analysis, numerical simulation and, especially, identification
In order to overcome these limitations, alternative modeling approaches have been devised for application to biological networks
-
Power-Law Models
The basic concept underlying power-law models is the approximation of classical ODE models by means of a uniform mathematical structure
-
S-Systems
S-systems are a particular class of power-law models in which fluxes are aggregated
$\frac{dX_i(t)}{dt} = \alpha_i \prod_{j=1}^{n} X_j(t)^{g_{i,j}} - \beta_i \prod_{j=1}^{n} X_j(t)^{h_{i,j}}$
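An S-system right-hand side is the difference of two power-law products, one aggregating production and one aggregating degradation. The sketch below uses the customary symbols alpha, beta, g and h for the rate constants and kinetic orders; the two-variable example, its parameter values and the forward-Euler integration are assumptions for illustration.

```python
def s_system_rhs(X, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]."""
    dX = []
    for i in range(len(X)):
        production, degradation = alpha[i], beta[i]
        for j, xj in enumerate(X):
            production *= xj ** g[i][j]
            degradation *= xj ** h[i][j]
        dX.append(production - degradation)
    return dX

# Hypothetical two-variable system: X2 inhibits the production of X1,
# X1 drives the production of X2, both decay with first-order kinetics
alpha, beta = [2.0, 1.0], [1.0, 1.0]
g = [[0.0, -0.5], [1.0, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]

X = [1.0, 1.0]
dt = 0.01
for _ in range(3000):
    X = [x + dt * dx for x, dx in zip(X, s_system_rhs(X, alpha, beta, g, h))]

x_ss = 2.0 ** (2.0 / 3.0)   # analytical steady state of this example: X1 = X2 = 2^(2/3)
print("simulated:", X, " analytical:", x_ss)
```

At steady state production equals degradation, so taking logarithms of both sides gives equations that are linear in log X, which is why steady-state data are convenient for parameter identification in this model class.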
-
Features of S-Systems
S-systems feature low computational requirements
Their structural homogeneity makes it easy to identify the model parameters from steady-state data by means of logarithmic linearization
Generalized aggregation may introduce a loss of accuracy
Violation of biochemical flux conservation
It may conceal important structural features of the network
-
Piecewise-Linear Models
Another class of approximate models based on ODEs is that of piecewise-linear (PWL) models
The basic idea is to approximate sigmoidal curves through step functions
The model takes the general form
$\frac{dx_i}{dt} = f_i(x) - \gamma_i x_i$
where
$f_i(x) = \sum_{l} \kappa_{il}\, b_{il}(x)$
and the functions $b_{il}(\cdot)$ are Boolean-valued regulation functions expressed in terms of step functions
Casey, de Jong, Gouzé, J. Math. Biol. 52, 27-56, 2006
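A PWL model can be sketched for two genes that repress each other, assuming the form commonly used in the PWL literature: a production term switched by a step function, minus linear degradation. The parameter names kappa, gamma and theta, their values and the Euler integrator are assumptions of this sketch.

```python
def step_minus(x, theta):
    """Negative step function: 1 below the threshold theta, 0 at or above it."""
    return 1.0 if x < theta else 0.0

def pwl_rhs(x, kappa=2.0, gamma=1.0, theta=1.0):
    """Two mutually repressing genes: production switches off when the other gene is high."""
    x1, x2 = x
    return [kappa * step_minus(x2, theta) - gamma * x1,
            kappa * step_minus(x1, theta) - gamma * x2]

def simulate(x0, dt=0.01, steps=3000):
    """Forward-Euler integration of the PWL model."""
    x = list(x0)
    for _ in range(steps):
        x = [xi + dt * dxi for xi, dxi in zip(x, pwl_rhs(x))]
    return x

high_1 = simulate([1.5, 0.5])   # gene 1 starts above threshold
high_2 = simulate([0.5, 1.5])   # gene 2 starts above threshold
print(high_1, high_2)
```

Within each region delimited by the thresholds the dynamics are linear, so the two runs relax exponentially towards kappa / gamma for the dominant gene and towards zero for the repressed one.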
-
Features of PWL Models
Numerical simulation studies have shown that PWL models properly approximate the behavior of the corresponding original nonlinear ones
A drawback of this class of systems is that their behavior is very difficult to analyze rigorously
PWL models, indeed, can exhibit singular steady states, that is, equilibrium points lying on the threshold surfaces
Moreover, it is known that the stability of switching systems cannot be reduced to the analysis of the stability of the linear systems in each subspace
-
Outline
Classification of biological networks
Modeling metabolic networks
Modeling gene regulatory networks
Inferring gene regulatory networks
-
Inferring Bayesian Networks
In order to reverse engineer a Bayesian network model of a gene network, we must find the directed acyclic graph that best describes the data
To do this, a scoring function is chosen to evaluate the candidate graphs G with respect to the data set D
The score can be defined using Bayes' rule
$P(G \mid D) = \frac{P(D \mid G)\, P(G)}{P(D)}$
If the topology of the network is partially known, the a priori knowledge can be included in P(G)
The most popular scores are the Bayesian Information Criterion (BIC) and the Bayesian Dirichlet equivalence (BDe)
They incorporate a penalty for complexity to cope with overfitting
-
Inferring Bayesian Networks
The evaluation of all possible networks involves checking all possible combinations of interactions among the nodes
This problem is NP-hard, therefore heuristic methods are used, like the greedy hill-climbing approach, the Markov Chain Monte Carlo method, or Simulated Annealing
A software tool for inferring both BNs and DBNs is Banjo, developed by the group of Hartemink (http://www.cs.duke.edu/~amink/software/banjo)
Yu et al, Advances to Bayesian network inference for generating causal networks from observational biological data, Bioinformatics 20: 3594-3603, 2004
-
Information-Theoretic Approaches
Information-theoretic approaches use a generalization of the Pearson correlation coefficient used in hierarchical clustering, namely the Mutual Information (MI), which is computed as
$MI(X;Y) = H(X) + H(Y) - H(X,Y)$
where the marginal and joint entropy are defined, respectively, as
$H(X) = -\sum_{x \in X} p(x) \log p(x)$
$H(X,Y) = -\sum_{x \in X,\, y \in Y} p(x,y) \log p(x,y)$
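The definition can be sketched for discretized expression values by estimating the probabilities from sample frequencies. The base-2 logarithm, the plug-in frequency estimator and the toy data are assumptions of this example; real expression data would first need to be discretized or handled with a density estimator.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in estimate of H = -sum p log2 p from observed frequencies."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y), with the joint entropy taken over pairs."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical discretized expression profiles (0 = low, 1 = high)
x = [0, 0, 1, 1, 0, 0, 1, 1]
y_copy = list(x)                       # perfectly associated with x
y_indep = [0, 1, 0, 1, 0, 1, 0, 1]     # every (x, y) pair is equally frequent

mi_copy = mutual_information(x, y_copy)
mi_indep = mutual_information(x, y_indep)
print("MI(x, copy) =", mi_copy, " MI(x, independent) =", mi_indep)
```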
-
Information-Theoretic Approaches
From the definitions above it follows that
MI becomes zero if the two variables are statistically independent
A high value of MI indicates that the variables are non-randomly associated with each other
$MI_{ij} = MI_{ji}$, therefore the resulting reconstructed graph is undirected
An important characteristic is that, since the approach assumes independent samples, it is not suitable for application to time series (it can be applied only to steady-state data sets)
A software tool based on mutual information theory is ARACNE, described in Basso et al, Reverse engineering of regulatory networks in human B cells, Nature Genetics 37(4): 382-390, 2005
-
Inference of ODE Models
The identification of the structure and parameters of mechanistic nonlinear ODE models is a very demanding task for nontrivial networks, both from a theoretical point of view and in terms of computational requirements
A feasible approach is based on the use of linearized dynamical models, which yield good results when applied to data sets obtained through perturbation experiments
Several methods have been developed by the groups of Gardner and di Bernardo, dealing both with steady-state (NIR, MNI) and time-series data (TSNI)
-
Time-Series Network Identification
The TSNI algorithm is based on the linearized model
$\dot{x}_i(t_k) = \sum_{j=1}^{N} a_{ij} x_j(t_k) + b_i u(t_k), \quad i = 1, \dots, N, \quad k = 1, \dots, M$
The data set consists of the expression levels of N genes, sampled at M time points with a fixed sampling interval
The experimental data are derived from perturbation experiments (e.g. by treatment with a compound or gene overexpression/downregulation)
A linear regression algorithm is used to estimate the coefficients of the dynamical matrix, $a_{ij}$, and those of the input matrix, $b_i$
A non-zero coefficient $a_{ij}$ indicates an edge in the (directed) graph between nodes i and j, whereas a non-zero $b_i$ indicates that node i is directly affected by the perturbation
Bansal, Della Gatta, di Bernardo, Bioinformatics 22: 815-822, 2006
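The regression idea can be sketched on synthetic, noise-free data. This is not the TSNI procedure itself: it assumes a discrete-time linear model x(k+1) = A x(k) + b u(k) and plain least squares through the normal equations, with a hypothetical two-gene system and a constant perturbation u = 1. All names and numbers are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * m for a, m in zip(M[i], M[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_row(regressors, targets):
    """Least-squares fit of targets ~ regressors . theta via the normal equations."""
    m = len(regressors[0])
    AtA = [[sum(r[a] * r[b] for r in regressors) for b in range(m)] for a in range(m)]
    Atb = [sum(r[a] * t for r, t in zip(regressors, targets)) for a in range(m)]
    return solve(AtA, Atb)

# Synthetic two-gene system: gene 1 is hit by the perturbation and activates gene 2
A_true = [[0.9, 0.0], [0.1, 0.8]]
b_true = [1.0, 0.0]
x = [[0.0, 0.0]]
for k in range(10):
    xk = x[-1]
    x.append([sum(A_true[i][j] * xk[j] for j in range(2)) + b_true[i] for i in range(2)])

regressors = [xk + [1.0] for xk in x[:-1]]          # [x1(k), x2(k), u(k)]
estimates = [fit_row(regressors, [xn[i] for xn in x[1:]]) for i in range(2)]
for i, (a1, a2, b) in enumerate(estimates, start=1):
    print(f"gene {i}: a_{i}1 = {a1:.3f}, a_{i}2 = {a2:.3f}, b_{i} = {b:.3f}")
```

In this sketch the estimated coefficient a_21 is non-zero and a_12 is close to zero, so the inferred graph contains the edge from gene 1 to gene 2 only, and the non-zero b_1 marks gene 1 as the direct target of the perturbation. With noisy data and many genes the normal equations become ill-conditioned, which is one reason practical methods add dimensionality reduction or regularization.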
-
Features of TSNI
For small networks (tens of genes), TSNI is able to correctly infer the network structure
Besides topological inference, ODE-based methods are also well suited for uncovering unknown targets of perturbations, even in complex networks
It is not possible to exploit prior knowledge about the network topology, because this would require exact knowledge of nonphysical parameters
-
LMI-based Inference Approach
The basic idea is to improve linear ODE-based methods by exploiting available prior knowledge about the network topology (as in BNs)
The identification of the parameters aij, bij is cast as a convex optimization problem, in the form of linear matrix inequalities (LMIs)
This formulation makes it possible to reduce the admissible solution space by assigning sign constraints to the coefficients corresponding to known interactions
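[Figure: example network with four nodes (x1, x2, x3, x4) and the corresponding table of prior knowledge about their interactions; known interactions are marked as activation or inhibition, unknown ones as "?"]
Cosentino et al, IET Systems Biology 1(3): 164-173, 2007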
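The effect of sign constraints on the admissible solution space can be illustrated with a much simpler stand-in for the LMI formulation: a least-squares fit of one row of coefficients, solved by projected gradient descent, where each coefficient known to be an activation is kept non-negative and each known inhibition non-positive. This is not the method of Cosentino et al; the function name, the data and the encoding of the prior knowledge (+1, -1, 0) are assumptions made for illustration.

```python
def sign_constrained_ls(X, y, signs, iters=2000):
    """Minimise ||X theta - y||^2 subject to sign constraints.

    signs[j] = +1: known activation, theta_j >= 0
    signs[j] = -1: known inhibition, theta_j <= 0
    signs[j] =  0: no prior knowledge, theta_j unconstrained
    """
    p = len(signs)
    G = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    h = [sum(r[a] * t for r, t in zip(X, y)) for a in range(p)]
    # trace(G) bounds the largest eigenvalue of G, so this step size is safe
    step = 1.0 / sum(G[a][a] for a in range(p))
    theta = [0.0] * p
    for _ in range(iters):
        grad = [sum(G[a][b] * theta[b] for b in range(p)) - h[a] for a in range(p)]
        theta = [theta[a] - step * grad[a] for a in range(p)]
        # Projection onto the set defined by the sign constraints
        theta = [
            max(t, 0.0) if s > 0 else min(t, 0.0) if s < 0 else t
            for t, s in zip(theta, signs)
        ]
    return theta


# Hypothetical data: expression of three genes in six samples (rows of X),
# and the response of one target gene generated from theta_true.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
theta_true = [0.8, -0.5, 0.0]
y = [sum(x * t for x, t in zip(row, theta_true)) for row in X]

# Prior knowledge: gene 0 activates the target, gene 1 inhibits it, gene 2 unknown
signs = [+1, -1, 0]
theta_hat = sign_constrained_ls(X, y, signs)
print(theta_hat)
```

Because the feasible set is convex (each constraint is a half-line), the problem keeps a single global optimum, which is the property the convex formulation in the text relies on.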
Features of the LMI-based Approach
Numerical tests show that exploiting prior knowledge greatly improves the reconstruction performance
The method can exploit qualitative a priori knowledge, as well as quantitative information
Such knowledge is exploited within the reconstruction, not for a posteriori evaluation
The optimization problem is convex, therefore the optimal solution, in terms of data interpolation, can always be found
The latter feature, on the other hand, implies a higher tendency to overfit
Hard to apply to large-scale networks (more than 100 nodes), due to the computational load deriving from the high number of constraints
Choice of the Inference Algorithms
In a recent study, Bansal et al compared the performance obtained using different modeling formalisms (BNs, MI, hierarchical clustering, ODE-based models)
Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007
Results on Experimental Data Sets
Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007
Results Discussion
The different techniques considered in the review infer networks that overlap by only 10% in the best case
Furthermore, the edges predicted by more than one method are not more accurate than those inferred by a single one
On the other hand, taking the union of the interactions found by all the methods would yield an even larger number of false positives
Local perturbation experiments (i.e. affecting one or a few genes) seem to yield better results than global ones (perturbations on a high number of genes)
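The overlap, intersection and union of inferred networks mentioned above can be computed with simple set operations. The sketch below uses made-up edge sets for three methods and a made-up reference network; it only shows how such quantities are obtained and does not reproduce the figures reported in the review.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two edge sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def false_positives(predicted, truth):
    """Predicted edges that are absent from the reference network."""
    return predicted - truth


# Hypothetical directed edges (regulator, target) returned by three methods
predictions = {
    "bayesian_network": {("g1", "g2"), ("g2", "g3"), ("g3", "g4")},
    "mutual_information": {("g1", "g2"), ("g1", "g3"), ("g4", "g1")},
    "ode_based": {("g1", "g2"), ("g2", "g3"), ("g4", "g2")},
}
# Hypothetical reference network used to count false positives
true_edges = {("g1", "g2"), ("g2", "g3"), ("g4", "g1")}

# Pairwise overlap between the networks inferred by different methods
pairwise = {
    (m1, m2): jaccard(predictions[m1], predictions[m2])
    for m1, m2 in combinations(predictions, 2)
}

# Edges found by every method, and edges found by at least one method
consensus = set.intersection(*predictions.values())
union_all = set().union(*predictions.values())

fp_per_method = {m: len(false_positives(e, true_edges)) for m, e in predictions.items()}
fp_union = len(false_positives(union_all, true_edges))

for pair, score in pairwise.items():
    print(pair, round(score, 2))
print("false positives per method:", fp_per_method)
print("false positives in the union:", fp_union)
```

In this toy case each method contributes a different false positive, so the union accumulates all of them, which is the effect described in the third point above.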
Remarks on Inference Algorithms
A relevant issue, common to all inference algorithms, is that the problem is very often underdetermined
All modeling formalisms, indeed, involve a large number of parameters, whereas the number of samples is usually limited (curse of dimensionality)
Possible solutions
Devise methods to exploit different data sets
Reduce the dimensionality of the problem via data preprocessing, e.g.
clustering algorithms
elimination of statistically non-expressed nodes
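The imbalance between parameters and samples, and the effect of removing nodes, can be made concrete with a short count. The sketch assumes a linear ODE model in which each gene has one coefficient per gene plus one per perturbation input, and at most one regression equation per time point. The fixed threshold used to drop genes is a crude stand-in for a proper statistical test, and all names and numbers are hypothetical.

```python
def regression_balance(n_genes, n_timepoints, n_inputs=1):
    """Compare unknowns and equations for the regression of a single gene."""
    unknowns = n_genes + n_inputs   # one row of the dynamical and input matrices
    equations = n_timepoints        # at most one equation per sample
    return {
        "unknowns": unknowns,
        "equations": equations,
        "underdetermined": equations < unknowns,
    }


def drop_unexpressed(data, threshold):
    """Keep only genes whose expression reaches the threshold in some sample.

    data maps a gene name to its list of expression levels.
    """
    return {g: v for g, v in data.items() if max(abs(x) for x in v) >= threshold}


# Genome-scale problem: far more coefficients than samples
print(regression_balance(n_genes=1000, n_timepoints=20))

# After reducing the problem to a small set of genes the balance changes
print(regression_balance(n_genes=15, n_timepoints=20))

# Hypothetical expression profiles (arbitrary units)
profiles = {
    "geneA": [0.1, 1.2, 2.3, 1.9],
    "geneB": [0.01, 0.02, 0.01, 0.00],
    "geneC": [0.5, 0.4, 0.9, 1.1],
}
kept = drop_unexpressed(profiles, threshold=0.1)
print(sorted(kept))
```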
Concluding Remarks
Regardless of the formalism adopted, good inference performance can be achieved only by exploiting the available prior knowledge from the biological literature
Despite the great interest in the topological characterization of biological networks, much remains to be done to exploit such features in the inference process
Several other approaches exist, both for modeling and for inferring biological networks (discrete events, formal languages, machine learning methods, etc.)
References
Klipp et al, Systems Biology in Practice, Wiley-VCH, 2005
Palsson, Systems Biology: Properties of Reconstructed Networks, Cambridge University Press, 2006
Barabasi, Oltvai, Network Biology: Understanding the Cell's Functional Organization, Nature Reviews Genetics 5(2): 101-113, 2004
Hynne et al, Full-scale model of glycolysis in Saccharomyces cerevisiae, Biophys. Chem. 94: 121-163, 2001
De Jong, Modeling and simulation of genetic regulatory systems: a literature review, INRIA RR-4032, 2000
Smolen, Baxter, Byrne, Mathematical modeling of gene networks, Neuron 26: 567-580, 2000
Casey et al, Piecewise-linear Models of Genetic Regulatory Networks: Equilibria and their Stability, J. Math. Biol. 52: 27-56, 2006
Bansal et al, Inference of gene regulatory networks and compound mode of action from time course gene expression profiles, Bioinformatics 22: 815-822, 2006
Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007
Cosentino et al, Linear Matrix Inequalities Approach to Reconstruction of Biological Networks, IET Systems Biology 1(3): 164-173, 2007