Outline Biological motivation Introduction to graph models and Bayesian network Case study

Page 1: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Outline

• Biological motivation

• Introduction to graph models and Bayesian network

• Case study

– “Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data” Segal, Shapira, Regev, Pe’er, Botstein, Koller, Friedman. Nature Genetics. 2003

– “Large-scale mapping and validation of E. coli transcriptional regulation from a compendium of expression profiles.” PLoS Biology. 2007.

Page 2: Outline Biological motivation Introduction to graph models and Bayesian network Case study

• cis-regulatory motif: A short DNA sequence (roughly 6 to 12 bases) to which an “activator” or “repressor” protein can bind. Illustrated at right as activator/repressor binding sites.

1. Biological motivation

Page 3: Outline Biological motivation Introduction to graph models and Bayesian network Case study

• Module: set of genes that participate in a coherent biological process

• Module group: set of modules that all share at least one cis-regulatory motif

• regulator: a gene that encodes a protein whose concentration regulates the expression of other genes

• expression profile: expression levels of a set of genes measured under given experimental conditions

1. Biological motivation

Page 4: Outline Biological motivation Introduction to graph models and Bayesian network Case study

1. Biological motivation

Page 5: Outline Biological motivation Introduction to graph models and Bayesian network Case study

"Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering -- uncertainty and complexity. ... Fundamental to the idea of a graphical model is the notion of modularity -- a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent.

Many of the classical multivariate probabilistic systems are special cases of the general graphical model formalism -- examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models. ... the graphical model formalism provides a natural framework for the design of new systems."

--- Michael Jordan, 1998.

2. Introduction to Bayesian Network

Page 6: Outline Biological motivation Introduction to graph models and Bayesian network Case study

[Figure: Bayesian network with arrows Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet Grass, Rain → Wet Grass]

P(C)=0.5, P(-C)=0.5

P(S|C)=0.1, P(-S|C)=0.9
P(S|-C)=0.5, P(-S|-C)=0.5

P(R|C)=0.8, P(-R|C)=0.2
P(R|-C)=0.2, P(-R|-C)=0.8

P(W|S,R)=0.99, P(-W|S,R)=0.01
P(W|S,-R)=0.9, P(-W|S,-R)=0.1
P(W|-S,R)=0.9, P(-W|-S,R)=0.1
P(W|-S,-R)=0, P(-W|-S,-R)=1

From: http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html

2. Introduction to Bayesian Network

Page 7: Outline Biological motivation Introduction to graph models and Bayesian network Case study

• Graphs in which nodes represent random variables (cloudy? sprinkler? rain? wet grass)

• Arrows, and especially absent arrows, encode conditional independence assumptions (e.g. there is no arrow from C to W, so P(W|S,R,C)=P(W|S,R))

• Present & absent arrows provide compact representation of joint probability distributions

• BNs have a more complicated notion of independence than undirected graphical models, because it takes into account the “directionality” of the arrows

2. Introduction to Bayesian Network

Page 8: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Bayes’ Rule

Rearranging the conditional probability formula

P(A|B) = P(A,B) / P(B)

gives P(A|B) P(B) = P(A,B), and by symmetry P(B|A) P(A) = P(A,B). It follows that:

P(A|B) = P(B|A) P(A) / P(B)

The power of Bayes' rule is that in many situations where we want to compute P(A|B) it is difficult to do so directly, yet we might have direct information about P(B|A). Bayes' rule enables us to compute P(A|B) in terms of P(B|A).
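As a small worked example of the rule, the sketch below inverts P(R|C) from the sprinkler network's tables to get P(C|R). The variable names are illustrative only; the numbers are the ones given on the sprinkler slide.

```python
# Probabilities taken from the sprinkler network tables in this deck.
p_c = 0.5              # P(C): prior probability that it is cloudy
p_r_given_c = 0.8      # P(R|C)
p_r_given_not_c = 0.2  # P(R|-C)

# Total probability: P(R) = P(R|C) P(C) + P(R|-C) P(-C)
p_r = p_r_given_c * p_c + p_r_given_not_c * (1 - p_c)

# Bayes' rule: P(C|R) = P(R|C) P(C) / P(R)
p_c_given_r = p_r_given_c * p_c / p_r

print(f"P(R)   = {p_r:.3f}")
print(f"P(C|R) = {p_c_given_r:.3f}")
```

With these numbers, observing rain raises the probability of a cloudy sky from the prior 0.5 to 0.8.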

2. Introduction to Bayesian Network

Page 9: Outline Biological motivation Introduction to graph models and Bayesian network Case study

[Figure: Bayesian network with arrows Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet Grass, Rain → Wet Grass]

P(C)=0.5, P(-C)=0.5

P(S|C)=0.1, P(-S|C)=0.9
P(S|-C)=0.5, P(-S|-C)=0.5

P(R|C)=0.8, P(-R|C)=0.2
P(R|-C)=0.2, P(-R|-C)=0.8

P(W|S,R)=0.99, P(-W|S,R)=0.01
P(W|S,-R)=0.9, P(-W|S,-R)=0.1
P(W|-S,R)=0.9, P(-W|-S,R)=0.1
P(W|-S,-R)=0, P(-W|-S,-R)=1

From: http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html

Each root node needs a prior probability. Each non-root node needs a conditional probability for every combination of values of its parent nodes.

2. Introduction to Bayesian Network

Page 10: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Major benefit of BN

• Given the values of its parent nodes (R and S), W is conditionally independent of its other ancestors, here the root node C. So W is fully described by P(W|S,R); its table does not need an entry for every combination of ancestor values back to the root.
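A minimal sketch of how the local tables combine: the joint probability is the product of each node's conditional given its parents, and P(W) follows by summing the joint over C, S and R. The helper names (`bernoulli`, `joint`) are hypothetical; the numbers come from the sprinkler network tables in this deck.

```python
from itertools import product

# Conditional probability tables from the sprinkler network slide.
P_C = {True: 0.5, False: 0.5}
P_S_GIVEN_C = {True: 0.1, False: 0.5}   # P(S=True | C)
P_R_GIVEN_C = {True: 0.8, False: 0.2}   # P(R=True | C)
P_W_GIVEN_SR = {                        # P(W=True | S, R)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.9,
    (False, False): 0.0,
}

def bernoulli(p_true, value):
    """Probability that a binary variable equals `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(c, s, r, w):
    """P(C, S, R, W) as the product of each node's local conditional."""
    return (
        P_C[c]
        * bernoulli(P_S_GIVEN_C[c], s)
        * bernoulli(P_R_GIVEN_C[c], r)
        * bernoulli(P_W_GIVEN_SR[(s, r)], w)
    )

# Marginalise over C, S and R to get P(W=True).
p_w = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))
print(f"P(W) = {p_w:.4f}")  # P(W) = 0.6471
```

Note that C still influences W, but only through S and R: the table for W itself never mentions C.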

[Figure: Bayesian network with arrows Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet Grass, Rain → Wet Grass]

2. Introduction to Bayesian Network

Page 11: Outline Biological motivation Introduction to graph models and Bayesian network Case study

This benefit of BNs greatly reduces the number of parameters and computations needed for large networks, e.g. hundreds or thousands of genes

• SSR article: many separate Bayesian networks were generated from gene expression data. Here one activator and one repressor form a basic BN, with the 3 corresponding expression “contexts” shown at bottom.

2. Introduction to Bayesian Network

Page 12: Outline Biological motivation Introduction to graph models and Bayesian network Case study

[Figure: Bayesian network with arrows Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet Grass, Rain → Wet Grass]

P(C)=0.5, P(-C)=0.5

P(S|C)=0.1, P(-S|C)=0.9
P(S|-C)=0.5, P(-S|-C)=0.5

P(R|C)=0.8, P(-R|C)=0.2
P(R|-C)=0.2, P(-R|-C)=0.8

P(W|S,R)=0.99, P(-W|S,R)=0.01
P(W|S,-R)=0.9, P(-W|S,-R)=0.1
P(W|-S,R)=0.9, P(-W|-S,R)=0.1
P(W|-S,-R)=0, P(-W|-S,-R)=1

From: http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html

Reduction in the number of required probabilities: from 2^4 - 1 = 15 for the full joint table to 9 for the network (1 for C, 2 each for S and R, 4 for W)
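The two counts can be checked with a short sketch. It assumes every node is binary, as in the sprinkler example; the `parents` dictionary is just one convenient way to write down the graph.

```python
# Parents of each binary node in the sprinkler network.
parents = {
    "Cloudy": [],
    "Sprinkler": ["Cloudy"],
    "Rain": ["Cloudy"],
    "WetGrass": ["Sprinkler", "Rain"],
}

# Full joint table: one probability per assignment, minus one because they sum to 1.
full_joint = 2 ** len(parents) - 1

# Factored form: a binary node needs one free number per combination of parent values.
factored = sum(2 ** len(ps) for ps in parents.values())

print(full_joint, factored)  # 15 9
```

The gap widens quickly: the full joint grows as 2^n, while the factored count grows with the number of nodes as long as each node has few parents.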

2. Introduction to Bayesian Network

Page 13: Outline Biological motivation Introduction to graph models and Bayesian network Case study

De Jong (2002)

Bayesian network: general formulation

Given a graph G, the likelihood of observing the data D:

$$P(D \mid G) = \prod_{i=1}^{n} p\bigl(X_i \mid \mathrm{parents}(X_i)\bigr)$$

2. Introduction to Bayesian Network
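The general formulation can be sketched for the sprinkler network as follows. The two observations in `data` are invented for illustration, and summing log-probabilities over observations assumes they are independent, which the slide does not state.

```python
import math

# Structure and tables of the sprinkler network from this deck.
PARENTS = {"C": (), "S": ("C",), "R": ("C",), "W": ("S", "R")}
# CPT[node][parent values] = P(node = True | parent values)
CPT = {
    "C": {(): 0.5},
    "S": {(True,): 0.1, (False,): 0.5},
    "R": {(True,): 0.8, (False,): 0.2},
    "W": {(True, True): 0.99, (True, False): 0.9,
          (False, True): 0.9, (False, False): 0.0},
}

def observation_probability(obs):
    """Product over nodes of p(X_i | parents(X_i)) for one complete observation."""
    prob = 1.0
    for node, parent_names in PARENTS.items():
        key = tuple(obs[p] for p in parent_names)
        p_true = CPT[node][key]
        prob *= p_true if obs[node] else 1.0 - p_true
    return prob

def log_likelihood(data):
    """Sum of log-probabilities, treating observations as independent (an assumption)."""
    return sum(math.log(observation_probability(obs)) for obs in data)

# Hypothetical observations, invented for illustration only.
data = [
    {"C": True, "S": False, "R": True, "W": True},
    {"C": False, "S": True, "R": False, "W": True},
]
print(log_likelihood(data))
```

Working in log space avoids numerical underflow when many observations are multiplied together; an observation with probability zero would need separate handling.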

Page 14: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Evaluating Bayesian networks:

• Generally NP-hard!

Where do the numerical estimates of probability come from?

• Can be, at least initially, taken from expert opinion

• Can be learned by system

• Both SSR and BSK articles lay out basics and some details of iterative algorithms for finding probability numbers.
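In the simplest case, where every variable is observed in every sample, a conditional probability can be learned by counting. The sketch below estimates P(Rain | Cloudy) as a relative frequency from invented samples. It is not the iterative algorithm of either article, only an illustration of "learned by system".

```python
from collections import Counter

# Hypothetical (cloudy, rain) observations, invented for illustration only.
samples = [
    (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, False), (False, True),
]

def estimate_conditional(samples):
    """Estimate P(Rain=True | Cloudy=c) as a relative frequency for each value c."""
    totals = Counter(c for c, _ in samples)
    rainy = Counter(c for c, r in samples if r)
    return {c: rainy[c] / totals[c] for c in totals}

p_rain_given_cloudy = estimate_conditional(samples)
print(p_rain_given_cloudy)  # {True: 0.75, False: 0.25}
```

With hidden variables or unknown structure, counting is no longer enough, which is where iterative procedures come in.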

2. Introduction to Bayesian Network

Page 15: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Modelling regulatory networks

De Jong (2002)

2. Introduction to Bayesian Network

Page 16: Outline Biological motivation Introduction to graph models and Bayesian network Case study

• “Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data” Segal, Shapira, Regev, Pe’er, Botstein, Koller, Friedman [SSR] Nature Genetics, June 2003

• Bayesian network-based algorithms are applied to gene expression data to generate good testable hypotheses.

3. Case study

Page 17: Outline Biological motivation Introduction to graph models and Bayesian network Case study

• Expression data set, from Patrick Brown’s lab, is for genes of yeast subjected to various kinds of stress

• Compiled list of 466 candidate regulators

• Applied analysis to 2355 genes in all 173 arrays of yeast data set

• This gave automatic inference of 50 modules of genes

• All modules were analyzed with external data sources to check functional coherence of gene products and validity of regulatory program

• Three novel hypotheses suggested by method were tested in bio lab and found to be accurate

3. Case study

Page 18: Outline Biological motivation Introduction to graph models and Bayesian network Case study

3. Case study

Page 19: Outline Biological motivation Introduction to graph models and Bayesian network Case study

3. Case study

Page 20: Outline Biological motivation Introduction to graph models and Bayesian network Case study

• 2 examples of 50 modules inferred by SSR methods:

– Respiration – mostly genes encoding respiration proteins or glucose-metabolism proteins. One primary regulator predicted – Hap4 – which is known from past experiments to play activation role in respiration. Secondary regulators affect Hap4 expression.

– Nitrogen catabolite repression – 29 genes tied to process by which yeast uses best available nitrogen source. Key regulator suggested is Gat1, due to 26 of 29 genes having Gat1 regulatory motif in their upstream regions.

3. Case study

Page 21: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Respiration network: two major motifs found

3. Case study

Page 22: Outline Biological motivation Introduction to graph models and Bayesian network Case study

• Evaluating module content and regulation programs

– All 50 modules were tested to see if proteins coded in same module had related functions

– Scored modules on how many genes are noted in current bio databases as being related to the predicted function – diagram, next slide

– 31 of 50 modules had coherence >50%; only 4 had coherence <30%.

3. Case study

Page 23: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Colored boxes indicate that known experimental evidence validates the predicted regulatory role of a regulator (named in one of the ‘Reg’ columns) in a given module (each row of the table).

M, C and G column headers and different colors of boxes represent different sorts of experimental evidence that validate the model’s prediction.

C(%): functional coherence of module, from literature mentions of module genes.

#G: number of genes in module

3. Case study

Page 24: Outline Biological motivation Introduction to graph models and Bayesian network Case study

• To find global relationships between modules, graph (next 2 slides) made showing modules & their motifs. Motifs were found within the 500 base pairs upstream from each gene.

• Observations from this graph: modules with related biological functions often shared at least one motif, & sometimes shared one or more regulator genes.

3. Case study

Page 25: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Module relationships

3. Case study

Yeast mutants of Kin82, Ppt1 and Ypl230W were further tested to validate the predicted relationship of each regulator with its module.

Page 26: Outline Biological motivation Introduction to graph models and Bayesian network Case study

3. Case study

Page 27: Outline Biological motivation Introduction to graph models and Bayesian network Case study

What does a BN look like here?

• Need to specify two things to describe a BN

– Graph topology (structure)

– Parameters of each conditional probability distribution

• Possible to learn both from data

• Learning structure is much harder than learning parameters

3. Case study

Page 28: Outline Biological motivation Introduction to graph models and Bayesian network Case study

What can we learn? Why was the Segal et al. paper successful?

• Yeast! Not mouse or human.

• Used gene clustering first. Started from a simplified hypothesis and a small set of known regulators, rather than attempting a network of thousands of genes.

• Experimental validation.

3. Case study

Page 29: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Network measures

Page 30: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Networks

• Key elements: nodes and edges

• Binary network vs weighted network

• Directed vs undirected network

One simple way to generate a binary gene expression network:

(1) Calculate Pearson or Spearman correlation of each pair of genes.

(2) Use a threshold d (e.g. d=0.6) to determine presence of an edge.
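The two steps above can be sketched in plain Python. The expression profiles are invented, and thresholding the absolute value of the Pearson correlation (so that strong negative correlation also gives an edge) is an assumption; the slide does not say whether the sign is kept.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_network(profiles, d=0.6):
    """Return the set of gene pairs whose absolute correlation is at least d.

    Using the absolute value is an assumption made for this sketch.
    """
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= d:
            edges.add((g1, g2))
    return edges

# Hypothetical expression profiles (one value per array), for illustration only.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [1.0, -1.0, 1.0, -1.0],
}
print(correlation_network(profiles))  # {('geneA', 'geneB')}
```

Keeping the correlation value on each edge instead of thresholding it would give a weighted rather than a binary network.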

Page 31: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Connectivity (degree)

The degree of a node is the number of edges connected to the node.

For example:

Deg(a)=4

Deg(c)=5

Page 32: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Clustering coefficient

The (local) clustering coefficient of node i is a density measure of local connections, or “cliquishness”: the fraction of pairs of i's neighbors that are themselves connected.

For example:

ClusterCoef(a)=4/6=0.66667

4 triangles involving a:(d,a,b) (c,a,g) (d,a,c) (d,a,g)

ClusterCoef(b)=2/6=0.33333
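Degree and local clustering coefficient can be computed from an adjacency map, as in the sketch below. The network here is a small hypothetical one, not the network drawn on the slides, and returning 0.0 for nodes with fewer than two neighbors is a convention chosen for this example.

```python
from itertools import combinations

# Hypothetical undirected network (not the one drawn on the slides).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d"), ("d", "e")]

def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj, node):
    """Number of edges attached to `node`."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention chosen here for nodes with fewer than two neighbours
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

adj = build_adjacency(edges)
print(degree(adj, "a"), clustering_coefficient(adj, "a"))
```

In this hypothetical network node a has neighbors b, c and d, of which the pairs (b, c) and (c, d) are linked, so its coefficient is 2/3.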

Page 33: Outline Biological motivation Introduction to graph models and Bayesian network Case study

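Shortest path length

The shortest path length from one node to another is the smallest number of edges that must be traversed to get from the first node to the second.

For example:

Shortest path (from a to e): a->d->e, thus of length 2

Shortest path (from a to g): a->g, thus of length 1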
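For an unweighted network, shortest path lengths can be found with breadth-first search. The sketch below uses a small hypothetical network, not the one drawn on the slides.

```python
from collections import deque

# Hypothetical undirected network (not the one drawn on the slides).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def shortest_path_length(adj, source, target):
    """Number of edges on a shortest path, or None if target is unreachable."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for neighbour in adj[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None

print(shortest_path_length(adj, "a", "e"))  # 2
print(shortest_path_length(adj, "b", "e"))  # 3
```

Breadth-first search visits nodes in order of increasing distance, so the first time the target is dequeued its distance is minimal.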

Page 34: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Betweenness centrality

Betweenness centrality measures a node's centrality in a network by the shortest paths, between all pairs of other nodes, that pass through that node. When a pair has several shortest paths, the node is credited with the fraction that pass through it. It is often a more useful measure than connectivity alone of both the load placed on a node and the node's importance to the network.

For example: g(c)=2.5, summed over these node pairs (fraction of each pair's shortest paths that pass through c):

g-e: 1/2
f-a: 1/3
f-d: 1/3
f-e: 1
a-e: 1/3
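The fractional counting used in the example can be sketched as follows: for each pair of other nodes, count how many of their shortest paths pass through the node of interest. This is a simple unnormalized version on a small hypothetical network (not the one on the slides); function names are illustrative.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network (not the one drawn on the slides).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def bfs_counts(adj, source):
    """Distance and number of shortest paths from source to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]
    return dist, sigma

def betweenness(adj, v):
    """Sum over node pairs (s, t) of the fraction of shortest s-t paths through v."""
    info = {n: bfs_counts(adj, n) for n in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s, t in combinations(sorted(adj), 2):
        if v in (s, t):
            continue
        dist_s, sigma_s = info[s]
        if t not in dist_s or v not in dist_s or t not in dist_v:
            continue  # unreachable pairs contribute nothing
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total

print(betweenness(adj, "d"), betweenness(adj, "a"))  # 3.0 1.0
```

In this hypothetical network node d scores highest because every path to e must pass through it, even though a, c and d all have the same degree.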

Page 35: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Network density

The network density (also known as line density (Snijders 1981)) is defined as the mean off-diagonal adjacency and is closely related to the mean connectivity.

In this network:

The sum of off-diagonal adjacencies, counting each pair of nodes once, is 14.

So the density equals: 14/choose(7,2) = 14/21 = 0.66667
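The density calculation is a one-liner. The sketch below first reproduces the slide's arithmetic and then applies the same formula to a small hypothetical edge list; the function name is illustrative.

```python
from math import comb

def density(n_nodes, edges):
    """Number of distinct undirected edges divided by the number of node pairs."""
    distinct = {frozenset(e) for e in edges}
    return len(distinct) / comb(n_nodes, 2)

# Numbers from the slide: 14 edges among 7 nodes.
print(round(14 / comb(7, 2), 5))  # 0.66667

# Hypothetical 5-node network, for illustration only.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d"), ("d", "e")]
print(density(5, edges))  # 0.6
```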

Density

Page 36: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Assortativity

The assortativity coefficient is the Pearson correlation coefficient of degree between pairs of linked nodes. Hence, positive values of r indicate a correlation between nodes of similar degree, while negative values indicate relationships between nodes of different degree.

In this network:

Assortativity is -0.296
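A minimal sketch of the definition, on a small hypothetical network rather than the one on the slides: collect the degrees at the two ends of every edge and take their Pearson correlation. Counting each undirected edge in both directions is the usual way to make the measure symmetric and is an implementation choice here.

```python
# Hypothetical undirected network (not the one drawn on the slides).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d"), ("d", "e")]

def assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # Count each undirected edge in both directions so the measure is symmetric.
    xs = [deg[u] for u, v in edges] + [deg[v] for u, v in edges]
    ys = [deg[v] for u, v in edges] + [deg[u] for u, v in edges]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var  # undefined (division by zero) if all degrees are equal

print(round(assortativity(edges), 4))  # -0.2857
```

The negative value reflects that the low-degree node e attaches to the high-degree node d.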

Page 37: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Small-world network

Small-world property: most nodes are not neighbors of one another, but most nodes can be reached from every other by a small number of hops or steps.

The typical distance L between two randomly chosen nodes (the number of steps on the shortest path) grows proportionally to the logarithm of the number of nodes N in the network, that is:

$$L \propto \log N$$

Page 38: Outline Biological motivation Introduction to graph models and Bayesian network Case study

Scale-free network

A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as

$$P(k) \sim c\,k^{-r}$$

where c is a normalization constant and r is a parameter whose value is typically in the range 2 < r < 3.

Cohen and Havlin showed analytically that scale-free networks are ultra-small worlds. In this case, due to hubs, the shortest paths become significantly smaller and scale as

$$L \sim \log(\log N)$$