Biological Networks 8 April 2015
Slides courtesy of Eric Franzosa, Kimberly Glass
Co-authorship of scientific articles
Networks in Molecular Biology
Protein-Protein interactions; Protein-DNA interactions; Genetic interactions; Metabolic reactions; Co-expression interactions; Text mining interactions; Association Networks; etc.
Barabasi & Oltvai, Nature Reviews Genetics, 2004
Network terminology and vocabulary: Paths; Motifs; Node metrics; Network metrics
Translation to biological networks
Undirected interactions: Functional networks; Predicting PPI and inferring knowledge from them
Directed networks
Activities: Parsing network data; KEGG, STRING, and Cytoscape; Kevin Bacon
Introduction to networks NETWORKS
A network is a collection of things connected by relationships (in math language, a network is called a graph). It is a set of vertices V and edges E: G = (V, E).
Vocabulary NETWORKS
The things being connected are called nodes (or, in math language, vertices). V = {v1, v2, v3, v4, v5, v6}
Relationships/connections between nodes are called edges (the same term is used in math language). E = {(v1, v2), (v1, v3), (v1, v4), (v1, v5), (v1, v6)}
An edge is said to be incident to two nodes. Two nodes are connected by an edge.
An edge can be undirected (A and B do/are something), drawn A - B, or directed (A does/is something to B), drawn A -> B.
Network examples NETWORKS
Network          Node is...    Edge is...          Directed?
                 Person        Friendship          No
Politics         Politician    Shared project      No
The Internet     Website       Hyperlink           Yes
Family tree      Person        Descent/Marriage    Yes/No
Vocabulary NETWORKS
The number of edges incident to a node is the node's degree. Nodes of high degree are called hubs.
(Figure: a star network with five outer nodes of degree 1; the central hub is a 5th-degree node.)
In-Degree/Out-Degree
Vocabulary NETWORKS
Degree in directed networks can be split into in-degree and out-degree for the number of incoming and outgoing edges, respectively.
(Figure: nodes labeled in-degree/out-degree: 1/0, 0/1, 1/0, 2/3, 0/1, 1/0.)
@EF: Could mention in-degree and out-degree hubs with directed edges. (Draw on the board.)
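To make degree, in-degree and out-degree concrete, here is a minimal Python sketch. The star network is the edge set E given on the vocabulary slide; the directed example and both function names are hypothetical and are not the network drawn on the slide.

```python
from collections import Counter

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def in_out_degrees(edges):
    """(in-degree, out-degree) of every node in a directed edge list (u -> v)."""
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    return {n: (indeg[n], outdeg[n]) for n in nodes}

# Star network from the vocabulary slide: v1 is connected to v2..v6.
star = [("v1", "v2"), ("v1", "v3"), ("v1", "v4"), ("v1", "v5"), ("v1", "v6")]
star_degrees = degrees(star)

# Hypothetical directed example (assumption, not taken from the slides).
directed = [("a", "b"), ("b", "c"), ("b", "d"), ("e", "b")]
directed_degrees = in_out_degrees(directed)
```

In the star, v1 is the hub with degree 5 and every other node has degree 1. In the directed example, node b has in-degree 2 and out-degree 2.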
Graphs
Graph G = (V, E) is a set of vertices V and edges E
V = {v1, v2, v3, v4, v5}
E = {(v1, v2), (v1, v3), (v2, v4), (v2, v5), (v3, v5)}
A subgraph G' of G is induced by some V' ⊆ V and E' ⊆ E
For example, V' = {v1, v2, v3} and E' = {(v1, v2), (v1, v3)}
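The subgraph example can be checked with a few lines of Python. This is a sketch only: sets of tuples are one of several reasonable ways to store V and E, and the helper name `induced_subgraph` is an assumption.

```python
V = {"v1", "v2", "v3", "v4", "v5"}
E = {("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v2", "v5"), ("v3", "v5")}

def induced_subgraph(nodes, edges, keep):
    """Keep the chosen nodes and every edge whose two endpoints are both kept."""
    sub_nodes = set(nodes) & set(keep)
    sub_edges = {(u, v) for u, v in edges if u in sub_nodes and v in sub_nodes}
    return sub_nodes, sub_edges

V_sub, E_sub = induced_subgraph(V, E, {"v1", "v2", "v3"})
```

Keeping v1, v2 and v3 leaves exactly the two edges listed on the slide, (v1, v2) and (v1, v3).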
Networks and Graphs: Terminology
Formally, a network is a graph G = (V, E), an ordered tuple of two sets: V = {v1, ..., vn}, a set of unique nodes, and E = {(vi, vj), ...}, a set of (un)ordered node tuples
Graph types: Bipartite; Cyclic; Acyclic (DAG); Multigraph; Weighted (e.g. edge weights 0.5, 1.2, 6, -2); Loops (self-connections); Undirected; Directed
Sparse vs Dense
G(V, E) where |V| = n and |E| = m are the numbers of vertices and edges
Graph is sparse if m ~ n
Graph is dense if m ~ n^2
Complete graph when every pair of nodes is connected: m = n(n-1)/2 for an undirected graph without loops, which is on the order of n^2
Connected Components
G(V, E), |V| = 69, |E| = 71: 6 connected components
PATHS
Vocabulary INTRO
(Figure: a path from Start Node to End Node; path length = 4.)
Vocabulary: Weighted Edges (distance) INTRO
(Figure: a network with weighted edges; weighted path length = 9 from Start Node to End Node.)
Larger distance = weaker connection
@EF: Are you going to say something about the meaning of edge weights?
Paths
A path is a sequence {x1, x2, ..., xn} such that (x1,x2), (x2,x3), ..., (xn-1,xn) are edges of the graph. A closed path (xn = x1) on a graph is called a graph cycle or circuit.
Shortest-Path between nodes
Longest Shortest-Path
Breadth First Search (BFS) SHORTEST PATH
Goal: Search for a node j starting from a start node i
BFS Algorithm
- Begin at start node i
- Explore all neighbors
- For each neighbor, explore its neighbors
- Keep going till you find search node j
Simple version does not account for weights
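The BFS steps above can be sketched in Python as follows. The toy network and the function name are hypothetical; the rail map from the train problem is not reproduced here. Like the simple version on the slide, this ignores edge weights and minimizes the number of edges.

```python
from collections import deque

def bfs_shortest_path(adjacency, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    previous = {start: None}       # also serves as the set of discovered nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the recorded predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical toy network (assumption, not from the slides).
toy = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
path = bfs_shortest_path(toy, "A", "E")
```

Because nodes are explored in order of their distance from the start, the first time the goal is dequeued the recorded path uses the fewest possible connections.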
BFS Train Problem SHORTEST PATH
How do I get from Frankfurt to Munich with the fewest number of connections? F -> K -> M
Simple version does not account for weights
Dijkstra's Algorithm SHORTEST PATH
Finds shortest path in a network
Network can be weighted or unweighted (distances = 1)
Network can be directed or undirected
Widely used in computer network routing protocols and transportation route calculations
Basic idea: Consider closest nodes (to start node) rather than every neighbor. In the unweighted case it is BFS.
Dijkstra's Algorithm: Train Problem F -> M SHORTEST PATH
Step 1 (Initialization): Sort neighbors by distance to start node. Mark all nodes (except start) as unvisited.
Step 2 (Visit closest neighbors): Visit closest node to start. Mark as visited and keep track of path. Calculate/update neighbor distances to start node.
@EF: I see that space is tight, but I think you should emphasize that you explore the next closest *unvisited* node at each step. I had heard of (but not seen) this algorithm before and it took me a couple passes through the slides before I figured out what was going on. It's an awesome example/explanation though!
Step 3 (Repeat): Repeat Step 2 until you reach destination. Never revisit a node. DONE!!!!!
(Figure: rail map with nodes F, Ma, W, K, Kr, E, N, A, S, M and edge weights 85, 173, 217, 80, 186, 103, 502, 250, 183, 167, 84.)
Node / Visited? / Distance table, one row per slide as the steps proceed:
F 1, Ma 85, Kr ???, K 173, W 217
F 1, Ma 85, Kr 165, K 173, W 217, N ???
F 1, Ma 85, Kr 165, K 173, W 217, N ???, A 415
F 1, Ma 85, Kr 165, K 173, W 217, N ???, A 415, M 675
F 1, Ma 85, Kr 165, K 173, W 217, N 320, E 403, A 415, M 675, S ???
F 1, Ma 85, Kr 165, K 173, W 217, N 320, E 403, A 415, M 675, S 503
F 1, Ma 85, Kr 165, K 173, W 217, N 320, E 403, A 415, M 599, S 503
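A minimal Python sketch of the three steps, using a priority queue to pick the closest unvisited node. The edge assignments below are an assumption: they are the ones that can be inferred from the distance table (for example Ma-Kr = 80 because Kr goes from 85 to 165). The edges into M are left out because the transcript does not make them recoverable, so M is not reached in this sketch. The function names are hypothetical.

```python
import heapq

def dijkstra(graph, start):
    """Shortest distance from start to every reachable node (non-negative weights)."""
    distances = {start: 0}
    visited = set()
    queue = [(0, start)]
    while queue:
        dist, node = heapq.heappop(queue)   # closest node not yet settled
        if node in visited:
            continue                        # never revisit a node
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            new_dist = dist + weight
            if neighbor not in distances or new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return distances

def undirected(edge_weights):
    """Build a symmetric adjacency dict from {(u, v): weight}."""
    graph = {}
    for (u, v), w in edge_weights.items():
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

# Edges inferred from the slide's distance table (assumption); edges to M omitted.
rail = undirected({
    ("F", "Ma"): 85, ("F", "K"): 173, ("F", "W"): 217,
    ("Ma", "Kr"): 80, ("Kr", "A"): 250,
    ("W", "N"): 103, ("W", "E"): 186, ("N", "S"): 183,
})
distances = dijkstra(rail, "F")
```

With these edges the computed distances agree with the table for Ma, Kr, K, W, N, E, A and S.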
NODE METRICS
What is centrality? CENTRALITY
Centrality is a measure of the relative importance of a node within a network
Three typical measures of centrality: Degree Centrality; Closeness Centrality; Betweenness Centrality
Degree centrality is the simplest measure and is equal to the degree of the node
Degree == how many other nodes is it connected to?
What are nodes with high degree centrality called? Hubs
What are hubs? CENTRALITY
Hubs are nodes with a high degree
Date Hubs: Interact with many partners at different times
Party Hubs: Interact with many partners all the time
Controversial: Is there a difference?
@EF: Could ask about this from a structural POV. A protein cannot bind 100 partners simultaneously (max is like based on crystal geometry), so if you have that many partners there must be something special happening, e.g., a shared binding interface. Examples?
Removing hubs is bad for network integrity CENTRALITY
Removing date hubs from the yeast PPI network results in small subgraphs
Hubs are essential CENTRALITY
Single knockouts of essential genes cause the organism to die
Hubs are more likely to be essential than other genes in the yeast protein-protein interaction (PPI) network
@EF: Could recall genetic interactions, as this is a related concept.
Knock-out lethality and connectivity
How do you determine degree cutoff? CENTRALITY
A hub is a node with a high degree
Hub has degree > k, with k = 5 or 8 or 12 or 20
Hub has degree > degree of x% of all nodes, with x = 50 or 80 or 95%
The degree cutoff is (typically) determined ad hoc
Degree centrality is normalized CENTRALITY
CD(i) = Degree(i) / (N-1)
Degree of node divided by total possible nodes it could connect to (ignoring self loop)
Normalized metric for comparing same node in different networks
Closeness centrality measures how close a node is to everything else
~ Average shortest path length to all other nodes (usually reported as its inverse, so that closer nodes score higher)
Betweenness Centrality CENTRALITY
Betweenness centrality measures the number of times a node is present in shortest paths between ALL pairs of nodes
@EF: The political network is an extreme example of betweenness. The bipartisan senators may not be degree hubs, but they are betweenness hubs in that shortest paths from the Democrat clique (left) to the Republican clique (right) must go through them.
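The three centrality measures can be compared on a small example. The sketch below is an illustration under stated assumptions: the network (two triangles joined by a bridge node B) is hypothetical and only loosely mirrors the political-network remark; closeness is computed as the inverse of the average shortest path length and assumes a connected network; betweenness uses Brandes' accumulation scheme for unweighted, undirected networks, which is one standard way to count shortest paths and is not necessarily what the slides used.

```python
from collections import deque

def degree_centrality(adj):
    """CD(i) = Degree(i) / (N - 1)."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def closeness_centrality(adj):
    """Inverse of the average shortest path length (connected network assumed)."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

def betweenness_centrality(adj):
    """Shortest-path betweenness via Brandes' accumulation (unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        n_paths[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):               # farthest nodes first
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != s:
                score[w] += dependency[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted twice

# Hypothetical network: two triangles joined through the bridge node B.
adj = {
    "L1": ["L2", "L3"], "L2": ["L1", "L3"], "L3": ["L1", "L2", "B"],
    "B": ["L3", "R1"],
    "R1": ["B", "R2", "R3"], "R2": ["R1", "R3"], "R3": ["R1", "R2"],
}
degree_scores = degree_centrality(adj)
closeness_scores = closeness_centrality(adj)
betweenness_scores = betweenness_centrality(adj)
```

B has a lower degree than L3 or R1, yet it lies on every shortest path between the two triangles, so it has the highest betweenness and closeness in this example.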
Clustering Coefficient CLUSTERING COEFFICIENT
Clustering coefficient of node i (Ci) measures how close its neighbors are to being a clique (completely connected subgraph)
Clique: All nodes interact; # max edges = 6
CA: # edges = 2, so CA = 2/6 = 1/3
Clustering coefficient
The density of the network surrounding node I, characterized as the number of triangles through I. Related to network modularity
k: number of neighbors of I
nI: number of edges between node I's neighbors
C_I = 2*nI / (k*(k-1))
The center node has 8 (grey) neighbors. There are 4 edges between the neighbors.
C = 2*4 / (8*(8-1)) = 8/56 = 1/7
WHOLE NETWORK METRICS
Node to Network Properties NETWORK PROPERTIES
A simple set of properties comes from averaging a given property of all nodes: Degree_avg, C_avg
You can also average all distances (shortest paths)
Characteristic Path Length (CPL): Average distance between all pairs of nodes
But averages are highly dependent on the number of nodes. It is better to look at a distribution (more in 3 slides)
RANDOM NETWORKS
Network properties can be compared against random (and randomized) networks to assess significance
NETWORK PROPERTIES
Diameter: Maximum distance between all pairs of nodes
Network properties allow you to compare different networks.
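As a sketch of how node-level properties roll up into whole-network properties, the Python below computes the clustering coefficient C = 2n/(k(k-1)) per node and then the average degree, average clustering coefficient, CPL and diameter. It assumes an unweighted, undirected, connected network stored as a dict of neighbor sets; the function names are hypothetical. The example network rebuilds the clustering-coefficient slide: a center node with 8 neighbors and 4 edges between those neighbors (which neighbors are paired is an assumption).

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj, node):
    """C = 2 * n / (k * (k - 1)), with n = edges among the node's k neighbors."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def shortest_path_lengths(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def network_summary(adj):
    """Whole-network averages; only reachable pairs enter the distance statistics."""
    n = len(adj)
    pair_distances = []
    for source in adj:
        dist = shortest_path_lengths(adj, source)
        pair_distances.extend(d for target, d in dist.items() if target != source)
    return {
        "average_degree": sum(len(nbs) for nbs in adj.values()) / n,
        "average_clustering": sum(clustering_coefficient(adj, v) for v in adj) / n,
        "characteristic_path_length": sum(pair_distances) / len(pair_distances),
        "diameter": max(pair_distances),
    }

# Center node "c" with 8 neighbors and 4 edges between the neighbors.
hub = {"c": set()}
for i in range(1, 9):
    hub[f"n{i}"] = {"c"}
    hub["c"].add(f"n{i}")
for a, b in [("n1", "n2"), ("n3", "n4"), ("n5", "n6"), ("n7", "n8")]:
    hub[a].add(b)
    hub[b].add(a)

center_c = clustering_coefficient(hub, "c")
summary = network_summary(hub)
```

For the center node this reproduces the slide's value of 1/7. Every pair of nodes is at most two steps apart, so the diameter is 2.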
Random Networks: ER Model RANDOM NETWORKS
Erdős-Rényi (ER) model is a method for generating a random network
Algorithm:
- Loop through each pair of N nodes
- Randomly add an edge between them with probability p
Example shown: p = 0.01
(Photos: Alfréd Rényi, Paul Erdős)
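The ER algorithm is short enough to write out directly. This is a sketch; the function name and the `seed` parameter are assumptions added for reproducibility.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Visit every pair of n nodes once and keep the edge with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

sparse_random = erdos_renyi(100, 0.01, seed=0)
```

With N nodes there are N(N-1)/2 pairs, so the expected number of edges is p * N(N-1)/2 (49.5 for N = 100 and p = 0.01); any single run will scatter around that value.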
Real vs. Random NETWORK PROPERTIES
Real networks have different properties than random networks
Real networks are small-world and scale-free
NETWORK PROPERTIES
Small-world: Most nodes can be reached from every other by a small number of steps, i.e. small-world networks have small diameter
President Teddy Roosevelt has a Bacon number of 3
6 degrees of separation
Randomizing Networks: Swapping RANDOM NETWORKS
Some properties such as shortest path length are heavily dependent on the size of the network AND the degrees of the nodes.
To avoid changing basic degree related properties, one can randomize an existing real network by iteratively swapping the ends of two edges
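A sketch of the swapping procedure for an undirected network without self-loops or duplicate edges. Two edges (X1, Y1) and (X2, Y2) are rewired to (X1, Y2) and (X2, Y1), which leaves every node's degree unchanged. The function name, the rejection rules and the retry cap are assumptions of this sketch.

```python
import random
from collections import Counter

def randomize_by_swapping(edges, n_swaps, seed=None):
    """Degree-preserving randomization by swapping the ends of two edges."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (x1, y1), (x2, y2) = edges[i], edges[j]
        new1, new2 = (x1, y2), (x2, y1)
        if x1 == y2 or x2 == y1:
            continue  # swap would create a self-loop
        if frozenset(new1) in present or frozenset(new2) in present:
            continue  # swap would duplicate an existing edge
        present -= {frozenset(edges[i]), frozenset(edges[j])}
        present |= {frozenset(new1), frozenset(new2)}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges

def degree_table(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# Hypothetical input: a ring of 8 nodes plus two chords.
original = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
shuffled = randomize_by_swapping(original, n_swaps=20, seed=1)
```

Comparing a property such as CPL between the real network and many such shuffled versions shows whether the observed value is explained by the degrees alone.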
@EF: Nice.
(Figure: edges X1-Y1 and X2-Y2 before and after swapping their ends.)
Scale-free NETWORK PROPERTIES
Degree Distribution: Frequency of all possible node degrees in a network
Scale-free: The degree distribution follows a power-law, P(k) ~ k^(-γ), i.e. most nodes have small degree, but some have a very large degree
@EF: Could talk about why this is the case (i.e., preferential attachment). A new edge in a random network is assigned to a random node, but in a scale-free network it connects to a node with probability proportional to degree. Example: the internet. If you start a website, you are more likely to provide a link to Google (high degree) than to my website (low degree).
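The preferential attachment idea from the note can be sketched as a simple growth process, followed by a degree distribution count. This is an illustration under assumptions: each new node adds exactly one edge, and the function names are hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=None):
    """Grow a network one node at a time; each new node links to one existing
    node chosen with probability proportional to that node's degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # each node appears once per incident edge
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)  # picks a node proportionally to degree
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

def degree_distribution(edges):
    """Map each degree k to the number of nodes with that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

grown = preferential_attachment(200, seed=0)
distribution = degree_distribution(grown)
```

Plotting `distribution` on log-log axes is the usual way to check for the roughly straight line that a power law predicts.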
Motifs NETWORK MOTIFS
Recurring pattern in network with a biological significance
Pioneering work by Uri Alon
Biological function of motifs NETWORK MOTIFS
Network motifs are considered the basic building blocks of a network
Network motifs act as information processing circuits
Coherent FFL (feed-forward loop) acts as a noise filter
X increases -> Y increases
X and Y increase -> Z increases
Time delay between X increasing and Y increasing
(Figure: TF x, TF y, Gene z)
@EF: Could talk about pulse generator (X increases -> Y, Z increase; Y increases -> Z decreases). Also a good thought-provoking homework problem
3-node model and simulation NETWORK MOTIFS
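One way to see the noise-filter behavior is a small simulation of a coherent feed-forward loop in which Z needs both X and Y (AND logic). This is a sketch and not the model shown on the slide: the Euler update, the threshold, the rate and the pulse lengths are all hypothetical choices.

```python
def simulate_coherent_ffl(x_signal, threshold=0.5, rate=1.0, dt=0.1):
    """Euler simulation of X -> Y and (X AND Y) -> Z. Returns the y and z traces."""
    y, z = 0.0, 0.0
    y_trace, z_trace = [], []
    for x in x_signal:
        # Z is produced only while X is on AND Y has accumulated past the threshold.
        z_input = 1.0 if (x > 0 and y > threshold) else 0.0
        y_next = y + dt * rate * (x - y)
        z_next = z + dt * rate * (z_input - z)
        y, z = y_next, z_next
        y_trace.append(y)
        z_trace.append(z)
    return y_trace, z_trace

# X is on for 3 time steps (brief noise) versus 30 time steps (sustained signal).
short_pulse = [1.0] * 3 + [0.0] * 47
long_pulse = [1.0] * 30 + [0.0] * 20
_, z_short = simulate_coherent_ffl(short_pulse)
_, z_long = simulate_coherent_ffl(long_pulse)
```

During the short pulse Y never reaches the threshold, so Z stays off. During the long pulse Y crosses the threshold after a delay and Z switches on, which is the time delay described on the slide.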
Biological Networks
Complexity comes from the set of parts... INTRO
...and their connections (e.g., metabolism) INTRO
How is biological data represented in networks?
(Figure: data sources combined into one network: Gene expression, Physical PPIs, Genetic interactions, Colocalization, Sequence, Protein domains, Regulatory binding sites; color scale: Correlation, High to Low.)
Building and Interpreting Biological Networks
How we build a biological network depends on what data we have AND what we want the edges in the network to represent.
The meaning of the edges in a biological network depends on the method used to generate those edges. This influences how we interpret the interactions in a network.
node: an object in the network (e.g. genes)
edge: indicates relationship between two nodes
Interpreting the edges in Biological Networks
Relational Networks: Generally undirected (non-causal relationships); Nodes all of same type; Generally no signs on edges. Example: Protein A is a dimerization partner with protein B.
Correlation Network: Undirected (non-causal relationships); Nodes all of same type; Edges can have signs. Example: When the expression of Gene A changes, so does the expression of Gene B. *Correlation is not causation.
Regulatory Network: Directed network (causal relationships); Can have different types of nodes; Edges can have signs. Example: TF A regulates Gene B.
Network examples (Molecular biology -omes) NETWORKS
Network                    Node is...       Edge is...                    Directed?
Physical Interactome       Protein          Direct/indirect contact       No
Genetic Interactome        Gene             Epistatic relationship
Informatic Interactome     Various          Computed similarity
Regulatory Interactome 1   TF/gene          Transcriptional activation    Yes
Regulatory Interactome 2   Kinase/target    Phosphorylation
Metabolome 1               Reactant         Reaction
Metabolome 2
PHYSICAL INTERACTION
Physical interactions between proteins (protein-protein interactions) are intuitive to think about. Protein A makes direct physical contact with Protein B in the cell; alternatively, A and B both interact with a third (mediator) protein, C.
@JP: Love the ginger bread men
@JP: ACB is a complex; perhaps you could introduce the term complex
@EF: Good idea, highlighted on next page (real example).
Examples PHYSICAL INTERACTION
ATP synthase is a large, stable complex of physically interacting proteins. These are permanent* interactions. *also called obligate or constitutive
Examples PHYSICAL INTERACTION
(1) Cyclin binds to CDK and (2) the Cyclin-CDK complex binds to a target protein. These are transient interactions.
Detection PHYSICAL INTERACTION
Some physical interactions are inferred from biochemical activities (e.g., a kinase and its target) or from structures (e.g., two chains in contact in the PDB).
There are many experimental techniques for validating or screening for protein-protein interactions. The most popular are affinity capture and two-hybrid.
Affinity capture PHYSICAL INTERACTION
The cell's contents are exposed to a surface engineered to bind a particular protein (the bait, here A). This is often done using an antibody specific to A or a tag fused to A.
The bait protein binds to the surface, bringing its various interaction partners along with it (called prey).
The unbound cellular contents are then washed away.
Prey proteins pulled down by the bait are identified using prey-specific antibodies or by mass spectrometry.
@JP: Mass Spectrometry not spectroscopy
@EF: Doh! I was tired too.
@JP: Poor D. He looks like he just got whacked... err, he could also be a suicide bomber. I need to go to bed
Method strengths: Done well, co-immunoprecipitation is considered a gold standard of protein-protein interaction.
Method weaknesses: Can't differentiate between direct and indirect (mediated) contact; prey must bind bait tightly to be pulled down.
Two-hybrid PHYSICAL INTERACTION
The two-hybrid method exploits the independent operation of DNA-binding (BD) and transcription activating (AD) domains of eukaryotic transcription factors to detect interactions.
(Figure: a transcription factor with its BD bound at the UAS and its AD switching the gene on: Transcription ON.)
Two fusion proteins are made: BD-P1 (bait) and AD-P2 (prey).
Interaction of P1 and P2 is sufficient to initiate transcription. So we can use a label l
(Figure: BD-P1 bound at the UAS recruits AD-P2: Transcription ON.)
Two-hybrid PHYSICAL INTERACTION
Method strengths: Scales well to very high-throughput screens; can detect transient interactions; reasonably specific to binary (A+B) type interactions.
Method weaknesses: High false positive and negative rates; fusion may affect bait/prey proteins' ability to fold or bind; bait/prey may not be able to enter the nucleus (required for activation).
@JP: Maybe you should use the term binary here
@EF: Good idea. Added it.
GENETIC INTERACTIONS
Genetic interactions are more abstract. They go by many names, often recognized by the terms phenotypic, synthetic, or dosage. All are related to the concept of epistasis.
Epistasis GENETIC INTERACTIONS
Let's say there are two methods of recreating ATP from ADP and Pi: one mediated by gene 1 (solid) and another by gene 2 (dashed).
Epistasis GENETIC INTERACTIONS
If only one of the two pathways is lost, the redundant pathway remains, the cell can still produce ATP, and therefore lives. Phenotype = alive.
Epistasis GENETIC INTERACTIONS
If both pathways are lost the cell cannot produce ATP and therefore dies. Loss of both genes results in a new phenotype. Phenotype = dead.
This notion, that a new phenotype can result from a combination of changes at the genetic level, is epistasis. We report a genetic interaction between genes 1 and 2 called synthetic lethality. (Related terms: sick, phenotypic enhancement, rescue).
GENETIC INTERACTIONS
Genetic interactions can be useful for identifying parallel pathways and other subtle (non-physical) interactions.
Complexes may also be revealed if they are robust against the removal of one, but not two, components.
@JP: Directed pathways are inferred from undirected interactions.
@EF: I will try to mention this out-loud.
Common interaction databases DATABASES
BioGRID (http://www.thebiogrid.org/): Biological General Repository for Interaction Datasets. Comprehensive, especially for yeast; includes high throughput and small-scale analyses; 250,000 interactions.
MINT (http://mint.bio.uniroma2.it/mint/): Molecular Interaction database. Experimental interaction data manually curated from literature. 80,000 interactions.
MIPS (http://mips.helmholtz-muenchen.de/): Munich Information Center for Protein Sequences. Very well curated; often used as a gold standard of protein-protein interaction.
HPRD (http://www.hprd.org/): Human Protein Reference Database. Emphasis on human protein bioinformatics, including 40,000 interactions.
Others
@JP: Is MIPS the one with good complex (affinity capture) data?
Single interaction report DATABASES
Gene/Protein 1, code and alias: YOR128C, ADE2
Gene/Protein 2, code and alias: YCR066W, RAD18
Experimental method: Two-hybrid
Reference (including Pubmed ID): Uetz P (2000)
@JP: Nice idea to show them this
Statistics from BioGRID (2009): Organisms DATABASES
Species; Genes in Genome; Reported Interactions; % Confirmed; % Physical; % Genetic
Saccharomyces cerevisiae (Baker's Yeast); 6,000; 95,978; 25%; 49%; 54%
Homo sapiens (Human); 25,000; 26,864; 29%; 100%; 1%
Drosophila melanogaster (Fruitfly); 14,000; 24,953; 11% 89%
Schizosaccharomyces pombe (Fission yeast); 5,000; 11,562; 16% 88%
Caenorhabditis elegans (Nematode worm); 20,000; 6,622; 2%; 69%; 31%
Arabidopsis thaliana (Mouse-ear cress); ; 2,611; 27%; 97%; 4%
Mus musculus (Mouse); 24,000; 894; 21%; 99%; 3%
@JP: I don't get how %Physical + %Genetic > 100 for yeast, human, mouse?
@EF: An interacting pair can be detected by both a physical and a genetic method.
Statistics from BioGRID (2009): Methods DATABASES
Method Type; Method Name; Interactions Reported; Papers Using
Physical; Two-hybrid; 48,192; 4,519
Physical; Affinity Capture-MS; 31,258; 655
Genetic; Phenotypic Enhancement; 30,807; 2,675
Physical; Affinity Capture-Western; 16,524; 8,763
Genetic; Phenotypic Suppression; 12,399; 1,936
Genetic; Synthetic Growth Defect; 12,085; 980
Physical; Reconstituted Complex; 11,782; 7,138
Genetic; Synthetic Lethality; 11,666; 1,555
Physical; Biochemical Activity; 6,657; 1,370
Genetic; Dosage Rescue; 3,660; 1,736
Genetic; Synthetic Rescue; 2,767; 1,277
Physical; PCA; 2,685; 31
Physical; Co-purification; 2,168; 615
Physical; Affinity Capture-RNA; 1,209; 24
Physical; Co-fractionation; 1,065; 444
Statistics from BioGRID (2009): Papers DATABASES
Interactions Reported (≤); Number of Papers
1; 9,639
10; 10,696
100; 1,049
1,000; 64
10,000; 25
100,000; 2
The vast majority of interaction-reporting papers (94.7%) report 10 or fewer interactions (99.6% for 100 or fewer). About 20% of known interactions have only been observed in studies reporting 10 or fewer interactions.
What are they? FUNCTIONAL NETWORKS
Functional association network or Functional linkage network (FLN)
Nodes are genes or proteins
Proteins aka functional association
What can we use to functionally link genes/proteins? GO!
@EF: I have also heard the term Functional Linkage Network (FLN)
STRING FUNCTIONAL NETWORKS
- Physical interactions
- Genomic context (e.g. gene fusion events)
- Coexpression (microarray)
- Literature co-occurrence
STRING FUNCTIONAL NETWORKS
Functional association, predicted physical interaction: Maybe?
Works because they include additional information: Species co-occurrence (630 organisms!!)
Homology based prediction PPI PREDICTION
- Interacting proteins are more likely to co-evolve
- Interactions are transferred to corresponding orthologs
(Figure: proteins A and B have a physical interaction in one species, Mouse or Human; their orthologs in the other species are asked "interaction?")
Interologs: Interacting AND Homologous
HOLD YOUR HORSES!
@EF: I was just thinking, "You know what this lecture needs? More Piglet."
@EF: Could mention the word interologs here, something Brandon taught me recently.
Phylogenetic profiling PPI PREDICTION
Ortholog interactions must be present across many species
(Figure: the A-B interaction marked Yes, No or ? across Human, Mouse, Chicken, Yeast, Worm, Fly, Fugu, E. coli)
5 out of 7; p-value =
Phylogenetic tree similarity PPI PREDICTION
- Entirely based on co-evolution
- A and B have similar trees -> they must interact
(Figure: phylogenetic trees of Protein A and Protein B)
@EF: Could mention hemoglobin alpha/beta, which they have seen from searching UniProt
Structural patterns PPI PREDICTION
- Identify interaction interfaces from structures
- Search for the same interface in other pairs of PDB structures
(Figure: A, B, Interface)
Integrate all information PPI PREDICTION
The best prediction algorithms integrate different evidences using machine learning, like STRING
Basic idea:
Step 1: Identify recurring evidence pattern in known interactions (training)
Step 2: Identify new interactions by searching for same evidence pattern in unknown protein pairs (testing)
@EF: Maybe open for discussion, "What are some other things that make you think two proteins interact?"
@EF: One clever option is that they share domains that interact in other proteins; also co-expression and co-function (which you mentioned) and co-compartment.
How to use interactomes?
PPI ANALYSES
Remember: Network is undirected
Clustering: Find complexes; Protein neighborhoods (functional)
Other: Inferring knowledge such as functional annotations
Clustering PPI ANALYSES
(Figure: network of proteins A, B, C, D, E, F)
Function Assignment PPI ANALYSES
Guilt-by-association: Function is transferred from neighbors
Interacting partner annotations: BLUE, GREEN
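A minimal sketch of guilt-by-association in Python. Two transfer rules are shown as assumptions about how the annotation could be chosen: take the most common annotation among the interacting partners, or take all of them. The network, the annotations and the function names are hypothetical.

```python
from collections import Counter

def neighbor_annotations(adj, annotations, protein):
    """Count how often each annotation occurs among a protein's partners."""
    counts = Counter()
    for partner in adj[protein]:
        counts.update(annotations.get(partner, ()))
    return counts

def assign_best(adj, annotations, protein):
    """Transfer the most common partner annotation (ties broken alphabetically)."""
    counts = neighbor_annotations(adj, annotations, protein)
    if not counts:
        return None
    top = max(counts.values())
    return sorted(label for label, c in counts.items() if c == top)[0]

def assign_all(adj, annotations, protein):
    """Transfer every annotation seen among the partners."""
    return set(neighbor_annotations(adj, annotations, protein))

# Hypothetical example: A is unannotated and interacts with B, C and D.
adj = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
annotations = {"B": {"BLUE"}, "C": {"BLUE"}, "D": {"GREEN"}}
best_for_a = assign_best(adj, annotations, "A")
all_for_a = assign_all(adj, annotations, "A")
```

Here A would be labeled BLUE under the first rule and both BLUE and GREEN under the second.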
Function Assignment PPI ANALYSES
Guilt-by-association: Function is transferred from neighbors
Interacting partner annotations: BLUE, GREEN
best = max
all
McGary et al, Genome Biology, 2007
Correlation Networks
Correlation is the simplest metric for co-expression
(Figure: a genes x conditions expression matrix converted into a genes x genes matrix)
Mutual Information is a Measure of Non-linear Correlation
(Figure axis: Pearson correlation value. Source:)
Mutual Information (MI)
Definition
Properties: Measures how much knowing one of these variables reduces uncertainty about the other; Non-negative and symmetric; Invariant under invertible nonlinear transformation
Network Reconstruction Algorithms that use MI: ARACNE, CLR
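To illustrate why mutual information is described as a measure of non-linear correlation, the sketch below compares Pearson correlation and MI on a made-up pair of expression profiles where B is the square of A. The data are hypothetical and already discrete; real expression values would need to be binned first, and this is not the estimator used by ARACNE or CLR.

```python
from collections import Counter
from math import log2, sqrt

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def mutual_information(xs, ys):
    """MI in bits for two discrete (or pre-binned) variables."""
    n = len(xs)
    count_x, count_y, count_xy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )

# Hypothetical profiles: B depends on A, but not linearly.
expression_a = [-2, -1, 0, 1, 2]
expression_b = [a * a for a in expression_a]

r = pearson(expression_a, expression_b)
mi = mutual_information(expression_a, expression_b)
```

The Pearson correlation is zero because the relationship is symmetric around A = 0, while the MI is about 1.5 bits because knowing A fully determines B.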
Regulatory Networks DIRECTED NETWORKS
Signaling: Phosphorylation; Activation, Inhibition; Protein A -> Protein B
Transcriptional Regulation: Expression, Repression; TF A -> Gene B
TF = Transcription Factor
Regulatory Networks DIRECTED NETWORKS
Signal at Cell Surface -> Cascade to Nucleus -> Activate Transcription Factors (TF) -> Genes -> Gene Expression
@EF: It's such a good feeling when you know "I have a slide for that!"
Transcriptional Regulatory Networks
TF NETWORKS
Identify genes where transcription factors bind: DNA binding sites
- Experimental techniques
- Computational prediction
Identifying DNA Binding Sites: Experiments TF NETWORKS
ChIP-chip / ChIP-seq: Chromatin immunoprecipitation (ChIP) followed by microarray analysis (chip) or sequencing (seq)
Identifying DNA Binding Sites: Computational TF NETWORKS
Motif Scanning: Scan promoters using position weight matrices (PWM)
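A sketch of motif scanning with a position weight matrix. Every window of the sequence is scored by summing log-odds values, log2(p / background), across the motif positions. The matrix, the uniform background of 0.25, the sequence and the function names are hypothetical and chosen only for illustration.

```python
from math import log2

def log_odds_matrix(probabilities, background=0.25):
    """Convert per-position base probabilities into log2 odds scores."""
    return [
        {base: log2(p / background) for base, p in column.items()}
        for column in probabilities
    ]

def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]

def column(preferred):
    """Hypothetical column: 0.85 for the preferred base, 0.05 for each other base."""
    return {base: (0.85 if base == preferred else 0.05) for base in "ACGT"}

# Hypothetical 4-position motif that prefers T, A, T, A.
pwm = log_odds_matrix([column(base) for base in "TATA"])
hits = scan("GGTATAGG", pwm)
best_start, best_score = max(hits, key=lambda hit: hit[1])
```

In practice a score cutoff decides which windows are reported as candidate binding sites; here the best window starts at position 2 (0-based), where the sequence reads TATA.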
Yeast Transcriptional Regulatory Network TF NETWORKS
Rick Young dataset
TF-TF interactions only
Every edge can be an activation or an inhibition.
Overview SIGNALING NETWORKS
Edges: activation or inhibition (multiple edge types!)
@EF: Could draw a simple circuit like a feed forward loop on the board and ask what happens as you ramp up the expression of one of the members in the circuit.
@JP: I was going to talk about motifs and simple building blocks in the next lecture.
KEGG Pathways Database SIGNALING NETWORKS
Edges: activation, inhibition, phosphorylation, etc.
Literature curated, manually drawn pathways
Groups of pathways: Metabolism; Genetic Information Processing; Environmental Information Processing; Cellular Processes; Human Diseases
Pathways are both species specific & cross-species (KO)
Other Pathway Databases
SIGNALING NETWORKS
KEGG (http://www.kegg.jp/kegg/pathway.html): Great for metabolic pathways. Simple interface. Multiple species including prokaryotes.
REACTOME (http://www.reactome.org/): Supposedly the most comprehensive resource for signal transduction pathways. Human only.
BIOCARTA (http://www.biocarta.com/genes/index.asp): Pretty maps with lots of colors. Mammalian.
Experiments SIGNALING NETWORKS
Decades of low throughput, painstaking experiments: Stimulation; Mutants; Structure; Context
No single experiment type to deduce signaling network
Directions = Pathways DIRECTED NETWORKS
- Chain regulatory interactions
- Concept of pathway emerges from directions
- New analyses not possible with undirected networks
TF A -> TF B -> TF C -> Gene D
Recept. A -> Kinase B -> Signal Protein C -> TF D
Connect the dots DIRECTED NETWORKS
Signal at Cell Surface -> Cascade to Nucleus -> Activate Transcription Factors (TF) -> Genes -> Gene Expression
Clustering DIRECTED NETWORKS
Network Analysis and Visualization
SUMMARY / APPLICATION
Functional mapping: mining biological networks
Predicted relationships between genes (Confidence: High to Low)
Cell cycle genes: The strength of these relationships indicates how cohesive a process is.
Cell cycle genes and DNA replication genes: The strength of these relationships indicates how associated two processes are.
Predicting gene function
Predicted relationships between genes (Confidence: High to Low)
Cell cycle genes: These edges provide a measure of how likely a gene is to specifically participate in the process of interest.
IMAGE SOURCES
Slide numbers are no longer correct due to rearrangement and slide deck merging, but consult these URLs for all otherwise unattributed images
https://www.weizmann.ac.il/complex/tlusty/courses/InfoInBio/Papers/AlonMotifs2002.pdf