Biological Networks 8 April 2015


Co-authorship of scientific articles http://www.jeffkennedyassociates.com:16080/connections/concept/image.html

Slides courtesy of Eric Franzosa, Kimberly Glass

[Image: Co-authorship of scientific articles]

Networks in Molecular Biology
- Protein-protein interactions
- Protein-DNA interactions
- Genetic interactions
- Metabolic reactions
- Co-expression interactions
- Text mining interactions
- Association networks
- Etc.
Barabasi & Oltvai, Nature Reviews, 2004

Network terminology and vocabulary
- Paths
- Motifs
- Node metrics
- Network metrics
Translation to biological networks
- Undirected interactions
- Functional networks
- Predicting PPI and inferring knowledge from them
- Directed networks
Activities
- Parsing network data; KEGG, STRING, and Cytoscape; Kevin Bacon

Introduction to networks
NETWORKS

A network is a collection of things connected by relationships (in math language, a network is called a graph). It is a set of vertices V and edges E: G = (V, E).

Vocabulary
The things being connected are called nodes (or, in math language, vertices).
V = {v1, v2, v3, v4, v5, v6}

Vocabulary
Relationships/connections between nodes are called edges (the same term is used in math language).
E = {(v1, v2), (v1, v3), (v1, v4), (v1, v5), (v1, v6)}

Vocabulary
An edge is said to be incident to two nodes. Two nodes are connected by an edge.

Vocabulary
An edge can be undirected (A and B do/are something): A -- B
or directed (A does/is something to B): A -> B

Network examples
Network | Node is... | Edge is... | Directed?
Social network | Person | Friendship | No
Politics | Politician | Shared project | No
The Internet | Website | Hyperlink | Yes
Family tree | Person | Descent/Marriage | Yes/No

Vocabulary
The number of edges incident to a node is the node's degree. Nodes of high degree are called hubs.
[Figure: star network in which the central hub has degree 5 and each outer node has degree 1]

Vocabulary: In-Degree/Out-Degree
Degree in directed networks can be split into in-degree and out-degree, the number of incoming and outgoing edges respectively.
[Figure: directed network with nodes labeled in-degree/out-degree: 1/0, 0/1, 1/0, 2/3, 0/1, 1/0]
@EF: Could mention in-degree and out-degree hubs with directed edges. (Draw on the board.)

Graphs
A graph G = (V, E) is a set of vertices V and edges E.
V = {v1, v2, v3, v4, v5}
E = {(v1, v2), (v1, v3), (v2, v4), (v2, v5), (v3, v5)}
A subgraph G' of G is induced by some V' ⊆ V and E' ⊆ E.
For example, V' = {v1, v2, v3} and E' = {(v1, v2), (v1, v3)}.
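The vocabulary above maps directly onto simple data structures. The following is a minimal sketch, assuming node names are plain strings and edges are tuples; the function names are illustrative and not part of the slides. It encodes the five-node example graph, counts degrees, and recovers the induced subgraph on {v1, v2, v3}.

```python
from collections import Counter

# The example graph from the slide: vertices V and undirected edges E.
V = {"v1", "v2", "v3", "v4", "v5"}
E = {("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v2", "v5"), ("v3", "v5")}


def degrees(edges):
    """Degree of each node in an undirected edge list (no self-loops assumed)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def in_out_degrees(edges):
    """Read each (u, v) as a directed edge u -> v; return (in-degree, out-degree)."""
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


def induced_subgraph(nodes, edges):
    """Keep only the edges whose two endpoints both lie in `nodes`."""
    return {(u, v) for u, v in edges if u in nodes and v in nodes}


deg = degrees(E)
indeg, outdeg = in_out_degrees(E)
sub_edges = induced_subgraph({"v1", "v2", "v3"}, E)
```

Here `sub_edges` contains exactly the two edges listed for E' on the slide, and `deg` shows that v2 is the highest-degree node in this small example.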
Networks and Graphs: Terminology
Formally, a network is a graph G = (V, E), an ordered tuple of two sets: V = {v1, ..., vn}, a set of unique nodes, and E = {(vi, vj), ...}, a set of (un)ordered node tuples.
Types of graphs: Undirected, Directed, Weighted (e.g., edge weights 0.5, 1.2, 6, -2), Bipartite, Cyclic, Acyclic (DAG), Multigraph, Loops (self-connections).

Sparse vs Dense
G(V, E) where |V| = n and |E| = m are the numbers of vertices and edges.
- Graph is sparse if m ~ n
- Graph is dense if m ~ n^2
- Graph is complete when every pair of nodes is connected: m = n(n-1)/2 for an undirected graph without self-loops (m = n^2 only if ordered pairs and self-loops are counted)

Connected Components
G(V, E) with |V| = 69 and |E| = 71 has 6 connected components.

PATHS

Vocabulary
[Figure: a path from a start node to an end node; path length = 4]

Vocabulary: Weighted Edges (distance)
[Figure: the same network with edge weights; weighted path length = 9]
Larger distance = weaker connection
@EF: Are you going to say something about the meaning of edge weights?

Paths
A path is a sequence {x1, x2, ..., xn} such that (x1, x2), (x2, x3), ..., (xn-1, xn) are edges of the graph.
A closed path (xn = x1) on a graph is called a graph cycle or circuit.

Shortest-Path between nodes
[Figure: shortest path between two nodes]

Longest Shortest-Path
[Figure: the longest of all shortest paths in the network]

Breadth First Search (BFS)
Goal: Search for a node j starting from a start node i.
BFS Algorithm
- Begin at start node i
- Explore all neighbors
- For each neighbor, explore its neighbors
- Keep going until you find search node j
Simple version does not account for weights.

BFS Train Problem
How do I get from Frankfurt to Munich with the fewest number of connections? (F -> K -> M)
Simple version does not account for weights.
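The BFS steps above can be sketched in a few lines. This is a minimal sketch, assuming the network is stored as an adjacency dictionary; the station letters in the toy map are placeholders and do not reproduce the rail map on the slide. It returns a path with the fewest edges and ignores weights, as the slide notes.

```python
from collections import deque


def bfs_shortest_path(adjacency, start, goal):
    """Return a path with the fewest edges from start to goal, or None if unreachable."""
    parent = {start: None}           # doubles as the set of discovered nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through the recorded parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy rail map (placeholder station names).
rail = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
path = bfs_shortest_path(rail, "A", "F")
```

Because neighbors are explored level by level, the first time the goal is dequeued it has been reached with the minimum number of connections.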
Dijkstra's Algorithm
- Finds shortest path in a network
- Network can be weighted or unweighted (distances = 1)
- Network can be directed or undirected
- Widely used in computer network routing protocols and transportation route calculations
Basic idea: Consider the closest nodes (to the start node) rather than every neighbor. In the unweighted case it is BFS.

Dijkstra's Algorithm: Train Problem (F to M)
[Figure: rail network with nodes F, Ma, W, K, Kr, E, N, A, S, M and edge weights 85, 173, 217, 80, 186, 103, 502, 250, 183, 167, 84]

Step 1 (Initialization): Sort neighbors by distance to start node. Mark all nodes (except start) as unvisited.
- Distances: F 1, Ma 85, Kr ???, K 173, W 217

Step 2 (Visit closest neighbors): Visit the closest unvisited node to start. Mark it as visited and keep track of the path. Calculate/update neighbor distances to start node.
- Distances: F 1, Ma 85, Kr 165, K 173, W 217, N ???
@EF: I see that space is tight, but I think you should emphasize that you explore the next closest *unvisited* node at each step.

Step 3 (Repeat): Repeat Step 2 until you reach the destination. Never revisit a node.
- Distances: F 1, Ma 85, Kr 165, K 173, W 217, N ???, A 415
- Distances: F 1, Ma 85, Kr 165, K 173, W 217, N ???, A 415, M 675
- Distances: F 1, Ma 85, Kr 165, K 173, W 217, N 320, E 403, A 415, M 675, S ???
- Distances: F 1, Ma 85, Kr 165, K 173, W 217, N 320, E 403, A 415, M 675, S 503
- Distances: F 1, Ma 85, Kr 165, K 173, W 217, N 320, E 403, A 415, M 599, S 503
DONE! Final distances: F 1, Ma 85, Kr 165, K 173, W 217, N 320, E 403, A 415, M 599, S 503
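The three steps above can be sketched with a priority queue. This is a minimal sketch, assuming non-negative edge weights and a dictionary-of-dictionaries graph; the toy graph and its weights are made up for illustration and are not the rail network from the slides.

```python
import heapq


def dijkstra(graph, start, goal):
    """Return (distance, path) for the lowest-weight path from start to goal.

    `graph` maps each node to a dict {neighbor: non-negative edge weight}.
    Returns (float("inf"), None) if the goal cannot be reached.
    """
    dist = {start: 0}
    parent = {start: None}
    visited = set()
    heap = [(0, start)]                        # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)          # closest node not yet finalized
        if node in visited:
            continue                           # stale entry: never revisit a node
        visited.add(node)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return d, path[::-1]
        for neighbor, weight in graph.get(node, {}).items():
            new_dist = d + weight
            if neighbor not in visited and new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist      # update the neighbor's distance to start
                parent[neighbor] = node        # keep track of the path
                heapq.heappush(heap, (new_dist, neighbor))
    return float("inf"), None


# Hypothetical weighted graph; node names and weights are illustrative only.
toy = {
    "s": {"a": 4, "b": 1},
    "a": {"s": 4, "b": 2, "t": 5},
    "b": {"s": 1, "a": 2, "t": 9},
    "t": {"a": 5, "b": 9},
}
distance, route = dijkstra(toy, "s", "t")
```

In this toy graph the route with the fewest edges (s, a, t) costs 9, while the route found by the weighted search (s, b, a, t) costs 8, which is the kind of difference that BFS alone would miss.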
NODE METRICS

What is centrality?
Centrality is a measure of the relative importance of a node within a network.
Three typical measures of centrality:
- Degree Centrality
- Closeness Centrality
- Betweenness Centrality

Degree Centrality
Degree centrality is the simplest measure and is equal to the degree of the node.
Degree == how many other nodes is it connected to?
What are nodes with high degree centrality called? Hubs

What are hubs?
Hubs are nodes with a high degree.
- Date hubs: interact with many partners at different times
- Party hubs: interact with many partners all the time
Controversial: Is there a difference?
@EF: Could ask about this from a structural POV. A protein cannot bind 100 partners simultaneously (the max is limited by crystal geometry), so if you have that many partners there must be something special happening, e.g., a shared binding interface. Examples?

Removing hubs is bad for network integrity
Removing date hubs from the yeast PPI network results in small subgraphs.

Hubs are essential
- Single knockouts of essential genes cause the organism to die.
- Hubs are more likely to be essential than other genes in the yeast protein-protein interaction (PPI) network.
@EF: Could recall genetic interactions, as this is a related concept.
[Figure: Knock-out lethality and connectivity]

How do you determine degree cutoff?
A hub is a node with a high degree.
- Hub has degree > k, with k = 5 or 8 or 12 or 20
- Hub has degree > degree of x% of all nodes, with x = 50 or 80 or 95
The degree cutoff is (typically) determined ad hoc.

Degree centrality is normalized
C_D(i) = Degree(i) / (N - 1)
Degree of the node divided by the total number of nodes it could connect to (ignoring self-loops). This is a normalized metric for comparing the same node in different networks.

Closeness centrality measures how close a node is to everything else. It is based on the average shortest path length to all other nodes (commonly taken as its reciprocal, so that closer nodes score higher).
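Both measures can be computed directly from an adjacency structure. The following is a minimal sketch, assuming a connected, undirected, unweighted network stored as a dictionary of neighbor sets; closeness is taken here as the reciprocal of the average shortest path length, which is one common convention rather than the only one. The example is the six-node star network from the degree slide.

```python
from collections import deque


def degree_centrality(adjacency):
    """C_D(i) = degree(i) / (N - 1) for every node i."""
    n = len(adjacency)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


def bfs_distances(adjacency, source):
    """Unweighted shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(adjacency, node):
    """Reciprocal of the average shortest-path length from node to all other nodes.

    Assumes every other node is reachable from `node`.
    """
    dist = bfs_distances(adjacency, node)
    total = sum(d for other, d in dist.items() if other != node)
    return (len(adjacency) - 1) / total


# Star network: hub 1 is connected to nodes 2..6.
star = {1: {2, 3, 4, 5, 6}, 2: {1}, 3: {1}, 4: {1}, 5: {1}, 6: {1}}
cd = degree_centrality(star)
hub_closeness = closeness_centrality(star, 1)
leaf_closeness = closeness_centrality(star, 2)
```

The hub scores 1.0 on both measures, while each outer node has degree centrality 1/5 and closeness 5/9, since it is one step from the hub and two steps from the other four nodes.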
Betweenness Centrality
Betweenness centrality measures the number of times a node is present in shortest paths between ALL pairs of nodes.
@EF: The political network is an extreme example of betweenness. The bipartisan senators may not be degree hubs, but they are betweenness hubs in that shortest paths from the Democrat clique (left) to the Republican clique (right) must go through them.
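One way to compute this is Brandes' algorithm, sketched below for unweighted, undirected networks. This is an illustrative sketch rather than the method used for the slides: when several shortest paths connect a pair, each node receives the fraction of those paths that pass through it, and endpoints are not counted. The example network is hypothetical: two triangles joined through a single bridge node "x".

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for unweighted, undirected graphs."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical network: triangles {a, b, c} and {d, e, f} joined via "x".
bridge = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = betweenness_centrality(bridge)
```

Node "x" has only two neighbors, so it is not a degree hub, yet it lies on every shortest path between the two triangles and therefore has the highest betweenness, much like the bipartisan senators in the comment above.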
Clustering Coefficient
The clustering coefficient of node i (C_i) measures how close its neighbors are to being a clique (a completely connected subgraph).
Clique: all nodes interact.
Example for node A: maximum number of edges among its neighbors = 6; edges present = 2; C_A = 2/6 = 1/3.

Clustering coefficient
The density of the network surrounding node I, characterized as the number of triangles through I. Related to network modularity.
C_I = 2 n_I / (k (k - 1))
- k: number of neighbors of I
- n_I: number of edges between node I's neighbors
Example: the center node has 8 (grey) neighbors and there are 4 edges between the neighbors, so C = 2*4 / (8*(8-1)) = 8/56 = 1/7.
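The worked example above can be checked numerically. This is a minimal sketch, assuming an undirected network stored as a dictionary of neighbor sets; the slide gives only the counts (8 neighbors, 4 edges among them), so which neighbors are linked to each other is an assumption made here for illustration.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """C = 2 * n / (k * (k - 1)), with k neighbors and n edges among them."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: report 0 when the ratio is undefined
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


# Center "c" with 8 neighbors n1..n8 and 4 (assumed) edges among the neighbors.
network = {f"n{i}": {"c"} for i in range(1, 9)}
network["c"] = {f"n{i}" for i in range(1, 9)}
for u, v in [("n1", "n2"), ("n3", "n4"), ("n5", "n6"), ("n7", "n8")]:
    network[u].add(v)
    network[v].add(u)

center_c = clustering_coefficient(network, "c")
```

The result for the center node is 1/7, matching the slide. Each outer node in this assumed layout has two neighbors that are linked to each other, so its own coefficient is 1.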
WHOLE NETWORK METRICS

Node to Network Properties
A simple set of properties comes from averaging a given property over all nodes: Degree_avg, C_avg.
You can also average all distances (shortest paths).
Characteristic Path Length (CPL): average distance between all pairs of nodes.
But averages are highly dependent on the number of nodes. It is better to look at a distribution (more in 3 slides).

RANDOM NETWORKS
Network properties can be compared against random (and randomized) networks to assess significance.

NETWORK PROPERTIES
Diameter: maximum distance between all pairs of nodes.
Network properties allow you to compare different networks.
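Both whole-network metrics follow from the all-pairs shortest path lengths. The following is a minimal sketch, assuming a connected, undirected, unweighted network; the four-node chain is a hypothetical example chosen so the values are easy to verify by hand.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Unweighted shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_length_stats(adjacency):
    """Return (characteristic path length, diameter) of a connected network."""
    lengths = []
    for source in adjacency:
        dist = bfs_distances(adjacency, source)
        # Every pair is seen from both ends; this does not change the mean or max.
        lengths.extend(d for target, d in dist.items() if target != source)
    return sum(lengths) / len(lengths), max(lengths)


# Hypothetical example: a chain of four nodes, 1 - 2 - 3 - 4.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
cpl, diameter = path_length_stats(chain)
```

For the chain, the six pairwise distances are 1, 1, 1, 2, 2, 3, giving a CPL of 10/6 and a diameter of 3.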
Random Networks: ER Model
The Erdős-Rényi (ER) model is a method for generating a random network.
Algorithm:
- Loop through each pair of N nodes
- Randomly add an edge between them with probability p
[Figure: example random network with p = 0.01; portraits of Alfréd Rényi and Paul Erdős]
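The two-line algorithm above translates almost literally into code. This is a minimal sketch, assuming nodes are numbered 0 to N-1 and the network is undirected without self-loops; the `seed` parameter is an addition for reproducibility.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Return the edge list of a random network on nodes 0..n-1.

    Every unordered pair of nodes is joined independently with probability p.
    """
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


edges = erdos_renyi(n=100, p=0.01, seed=0)
```

With N = 100 and p = 0.01 there are 4,950 candidate pairs, so such a network has about 50 edges on average.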
Real vs. Random
Real networks have different properties than random networks.
Real networks are small-world and scale-free.

Small-world
Most nodes can be reached from every other node in a small number of steps, i.e., small-world networks have a small diameter.
Example: President Teddy Roosevelt has a Bacon number of 3 (six degrees of separation).

Randomizing Networks: Swapping
Some properties, such as shortest path length, are heavily dependent on the size of the network AND the degrees of the nodes.
To avoid changing basic degree-related properties, one can randomize an existing real network by iteratively swapping the ends of two edges:
(X1, Y1), (X2, Y2) -> (X1, Y2), (X2, Y1)
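The swapping procedure can be sketched as follows. This is a minimal sketch, assuming an undirected simple network (no self-loops or duplicate edges) given as an edge list; skipping swaps that would create a self-loop or a duplicate edge is a design choice made here, and the example edge list is hypothetical.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=None):
    """Randomize a network by repeatedly swapping the ends of two edges.

    Edges (x1, y1) and (x2, y2) become (x1, y2) and (x2, y1). Attempts that
    would create a self-loop or a duplicate edge are skipped, so fewer than
    n_swaps rewirings may actually happen.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (x1, y1), (x2, y2) = edges[i], edges[j]
        new_a, new_b = (x1, y2), (x2, y1)
        if x1 == y2 or x2 == y1:
            continue                            # would create a self-loop
        if frozenset(new_a) in present or frozenset(new_b) in present:
            continue                            # would create a duplicate edge
        present -= {frozenset(edges[i]), frozenset(edges[j])}
        present |= {frozenset(new_a), frozenset(new_b)}
        edges[i], edges[j] = new_a, new_b
    return edges


def degree_counts(edges):
    """Degree of every node that appears in the edge list."""
    return Counter(node for edge in edges for node in edge)


original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
shuffled = degree_preserving_shuffle(original, n_swaps=100, seed=1)
```

Every node keeps exactly the degree it had in `original`, so any difference in a metric between the real and shuffled networks cannot be attributed to the degree sequence alone.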
Scale-free
Degree distribution: frequency of all possible node degrees in a network.
Scale-free: the degree distribution follows a power law, P(k) ~ k^(-γ), i.e., most nodes have a small degree, but some have a very large degree.
@EF: Could talk about why this is the case (i.e., preferential attachment). A new edge in a random network is assigned to a random node, but in a scale-free network new edges connect to a node with probability proportional to its degree. Example: on the internet, if you start a website, you are more likely to provide a link to Google (high degree) than to my website (low degree).
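The degree distribution and the preferential attachment idea from the comment can both be sketched briefly. This is a minimal sketch under simplifying assumptions: each new node adds exactly one edge, and nodes with no edges do not appear in the distribution because it is computed from the edge list. The function names and parameters are illustrative.

```python
import random
from collections import Counter


def degree_distribution(edges):
    """P(k): fraction of nodes (among those with at least one edge) with degree k."""
    degree = Counter(node for edge in edges for node in edge)
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}


def preferential_attachment(n, seed=None):
    """Grow a network node by node; each new node links to one existing node
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]                     # each node appears once per incident edge
    for new in range(2, n):
        target = rng.choice(endpoints)     # degree-proportional choice
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


pk_star = degree_distribution([(1, 2), (1, 3), (1, 4)])
grown = preferential_attachment(200, seed=0)
pk_grown = degree_distribution(grown)
```

Plotting `pk_grown` on log-log axes is the usual way to look for the roughly straight line that a power law would produce; with only a few hundred nodes the tail is noisy.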
Motifs
A motif is a recurring pattern in a network with a biological significance.
Pioneering work by Uri Alon.

Biological function of motifs
- Network motifs are considered the basic building blocks of a network
- Network motifs act as information processing circuits
Coherent FFL (TF x, TF y, Gene z) acts as a noise filter:
- X increases -> Y increases
- X and Y increase -> Z increases
- Time delay between X increasing and Z increasing, because Y must increase first
@EF: Could talk about pulse generator (X increases -> Y, Z increase; Y increases -> Z decreases). Also a good thought-provoking homework problem.

3-node model and simulation
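A very small simulation illustrates the noise-filtering behavior of a coherent feed-forward loop. This is a sketch under stated assumptions, not the model shown on the slide: X is a 0/1 input, Y is produced while X is on and decays at unit rate, and Z is produced only while X is on AND Y is above a threshold. The rate constants, threshold, and time step are arbitrary choices.

```python
def simulate_ffl(x_signal, dt=0.1, threshold=0.5):
    """Simulate Y and Z over time for a 0/1 input signal X (simple Euler steps).

    dY/dt = X - Y
    dZ/dt = (X AND Y > threshold) - Z
    """
    y = z = 0.0
    ys, zs = [], []
    for x in x_signal:
        z_on = 1.0 if (x and y > threshold) else 0.0
        y += dt * (x - y)
        z += dt * (z_on - z)
        ys.append(y)
        zs.append(z)
    return ys, zs


short_pulse = [1] * 5 + [0] * 45      # X is on for 0.5 time units
long_pulse = [1] * 30 + [0] * 20      # X is on for 3 time units
_, z_short = simulate_ffl(short_pulse)
_, z_long = simulate_ffl(long_pulse)
```

With these settings the short pulse ends before Y crosses the threshold, so Z never responds, while the long pulse switches Z on after a delay. That is the sense in which the circuit filters out brief fluctuations in X.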
Biological Networks

Complexity comes from the set of parts...
...and their connections (e.g., metabolism)

How is biological data represented in networks?
Data types: gene expression, physical PPIs, genetic interactions, colocalization, sequence, protein domains, regulatory binding sites.
[Figure: the individual data types are combined (+) into a single network (=); color scale for correlation from high to low]

Building and Interpreting Biological Networks
How we build a biological network depends on what data we have AND what we want the edges in the network to represent.
The meaning of the edges in a biological network depends on the method used to generate those edges. This influences how we interpret the interactions in a network.
- node: an object in the network (e.g., genes)
- edge: indicates a relationship between two nodes

Interpreting the edges in Biological Networks
Relational Networks
- Generally undirected (non-causal relationships)
- Nodes all of same type
- Generally no signs on edges
- Example: Protein A is a dimerization partner with protein B. (A -- B)
Correlation Network
- Undirected (non-causal relationships)
- Nodes all of same type
- Edges can have signs
- Example: When the expression of Gene A changes, so does the expression of Gene B. (A -- B)
- *Correlation is not causation.
Regulatory Network
- Directed network (causal relationships)
- Can have multiple types of nodes
- Edges can have signs
- Example: TF A regulates Gene B. (A -> B)

Network examples (Molecular biology -omes)
Network | Node is... | Edge is... | Directed?
Physical Interactome | Protein | Direct/indirect contact | No
Genetic Interactome | Gene | Epistatic relationship |
Informatic Interactome | Various | Computed similarity |
Regulatory Interactome 1 | TF/gene | Transcriptional activation | Yes
Regulatory Interactome 2 | Kinase/target | Phosphorylation |
Metabolome 1 | Reactant | Reaction |
Metabolome 2 | | |

PHYSICAL INTERACTION
Physical interactions between proteins (protein-protein interactions) are intuitive to think about. Protein A makes direct physical contact with Protein B in the cell; alternatively, A and B both interact with a third (mediator) protein, C.
[Figure: A in direct contact with B; A and B both in contact with C]
@JP: A-C-B is a complex; perhaps you could introduce the term complex.
@EF: Good idea, highlighted on next page (real example).

Examples
ATP synthase is a large, stable complex of physically interacting proteins. These are permanent* interactions.
*also called obligate or constitutive

Examples
(1) Cyclin binds to CDK and (2) the Cyclin-CDK complex binds to a target protein. These are transient interactions.

Detection
Some physical interactions are inferred from biochemical activities (e.g., a kinase and its target) or from structures (e.g., two chains in contact in the PDB).
There are many experimental techniques for validating or screening for protein-protein interactions. The most popular are affinity capture and two-hybrid.

Affinity capture
- The cell's contents are exposed to a surface engineered to bind a particular protein (the bait, here A). This is often done using an antibody specific to A or a tag fused to A.
- The bait protein binds to the surface, bringing its various interaction partners along with it (called prey).
- The unbound cellular contents are then washed away.
- Prey proteins pulled down by the bait are identified using prey-specific antibodies or by mass spectrometry.
Method strengths: Done well, co-immunoprecipitation is considered a gold standard of protein-protein interaction.
Method weaknesses: Can't differentiate between direct and indirect (mediated) contact; prey must bind bait tightly to be pulled down.

Two-hybrid
The two-hybrid method exploits the independent operation of the DNA-binding (BD) and transcription-activating (AD) domains of eukaryotic transcription factors to detect interactions.
[Figure: the BD of a transcription factor binds the UAS upstream of a gene and the AD switches transcription ON]
- Two fusion proteins are made: BD-P1 (bait) and AD-P2 (prey).
- Interaction of P1 and P2 is sufficient to initiate transcription of the gene downstream of the UAS (Transcription ON).
Method strengths: Scales well to very high-throughput screens; can detect transient interactions; reasonably specific to binary (A+B) type interactions.
Method weaknesses: High false positive and negative rates; fusion may affect the bait/prey proteins' ability to fold or bind; bait/prey may not be able to enter the nucleus (required for activation).

GENETIC INTERACTIONS
Genetic interactions are more abstract. They go by many names, often recognized by the terms phenotypic, synthetic, or dosage. All are related to the concept of epistasis.

Epistasis
- Let's say there are two methods of recreating ATP from ADP and Pi: one mediated by gene 1 (solid) and another by gene 2 (dashed).
- If only one of the two pathways is lost, the redundant pathway remains, the cell can still produce ATP, and therefore lives. Phenotype = alive.
- If both pathways are lost, the cell cannot produce ATP and therefore dies. Loss of both genes results in a new phenotype. Phenotype = dead.
This notion, that a new phenotype can result from a combination of changes at the genetic level, is epistasis. We report a genetic interaction between genes 1 and 2 called synthetic lethality. (Related terms: sick, phenotypic enhancement, rescue.)

Genetic interactions can be useful for identifying parallel pathways and other subtle (non-physical) interactions. Complexes may also be revealed if they are robust against the removal of one, but not two, components.
@JP: Directed pathways are inferred from undirected interactions.
@EF: I will try to mention this out loud.

DATABASES

Common interaction databases
- BioGRID (http://www.thebiogrid.org/): Biological General Repository for Interaction Datasets. Comprehensive, especially for yeast; includes high-throughput and small-scale analyses; 250,000 interactions.
- MINT (http://mint.bio.uniroma2.it/mint/): Molecular Interaction database. Experimental interaction data manually curated from literature. 80,000 interactions.
- MIPS (http://mips.helmholtz-muenchen.de/): Munich Information Center for Protein Sequences. Very well curated; often used as a gold standard of protein-protein interaction.
- HPRD (http://www.hprd.org/): Human Protein Reference Database. Emphasis on human protein bioinformatics, including 40,000 interactions.
- Others
@JP: Is MIPS the one with good complex (affinity capture) data?

Single interaction report
Gene/Protein 1, code and alias | Gene/Protein 2, code and alias | Experimental method | Reference (including PubMed ID)
YOR128C (ADE2) | YCR066W (RAD18) | Two-hybrid | Uetz P (2000)

Statistics from BioGRID (2009): Organisms
Species | Genes in Genome | Reported Interactions | % Confirmed | % Physical | % Genetic
Saccharomyces cerevisiae (Baker's yeast) | 6,000 | 95,978 | 25% | 49% | 54%
Homo sapiens (Human) | 25,000 | 26,864 | 29% | 100% | 1%
Drosophila melanogaster (Fruit fly) | 14,000 | 24,953 | 11% | 89% |
Schizosaccharomyces pombe (Fission yeast) | 5,000 | 11,562 | 16% | 88% |
Caenorhabditis elegans (Nematode worm) | 20,000 | 6,622 | 2% | 69% | 31%
Arabidopsis thaliana (Mouse-ear cress) | | 2,611 | 27% | 97% | 4%
Mus musculus (Mouse) | 24,000 | 894 | 21% | 99% | 3%
@JP: I don't get how %Physical + %Genetic > 100 for yeast, human, mouse?
@EF: An interacting pair can be detected by both a physical and a genetic method.

Statistics from BioGRID (2009): Methods
Method Type | Method Name | Interactions Reported | Papers Using
Physical | Two-hybrid | 48,192 | 4,519
Physical | Affinity Capture-MS | 31,258 | 655
Genetic | Phenotypic Enhancement | 30,807 | 2,675
Physical | Affinity Capture-Western | 16,524 | 8,763
Genetic | Phenotypic Suppression | 12,399 | 1,936
Genetic | Synthetic Growth Defect | 12,085 | 980
Physical | Reconstituted Complex | 11,782 | 7,138
Genetic | Synthetic Lethality | 11,666 | 1,555
Physical | Biochemical Activity | 6,657 | 1,370
Genetic | Dosage Rescue | 3,660 | 1,736
Genetic | Synthetic Rescue | 2,767 | 1,277
Physical | PCA | 2,685 | 31
Physical | Co-purification | 2,168 | 615
Physical | Affinity Capture-RNA | 1,209 | 24
Physical | Co-fractionation | 1,065 | 444

Statistics from BioGRID (2009): Papers
Interactions Reported (≤) | Number of Papers
1 | 9,639
10 | 10,696
100 | 1,049
1,000 | 64
10,000 | 25
100,000 | 2
The vast majority of interaction-reporting papers (94.7%) report 10 or fewer interactions (99.6% for 100 or fewer). About 20% of known interactions have only been observed in studies reporting 10 or fewer interactions.
FUNCTIONAL NETWORKS

What are they?
Functional association network or functional linkage network (FLN)
- Nodes are genes or proteins
- Edges are functional associations between them
What can we use to functionally link genes/proteins? GO!

STRING
- Physical interactions
- Genomic context (e.g., gene fusion events)
- Coexpression (microarray)
- Literature co-occurrence

STRING
Functional association -> predicted physical interaction? Maybe.
Works because it includes additional information: species co-occurrence (630 organisms!)

PPI PREDICTION

Homology based prediction
- Interacting proteins are more likely to co-evolve
- Interactions are transferred to corresponding orthologs
[Figure: A and B physically interact in mouse; is there a physical interaction between their orthologs in human?]
Interologs: Interacting AND Homologous
HOLD YOUR HORSES!

Phylogenetic profiling
Ortholog interactions must be present across many species.
[Figure: presence (Yes/No) of the A-B interaction in Human, Mouse, Chicken, Yeast, Worm, Fly, Fugu, E. coli; 5 out of 7, with an associated p-value]

Phylogenetic tree similarity
- Entirely based on co-evolution
- A and B have similar trees -> they must interact
@EF: Could mention hemoglobin alpha/beta, which they have seen from searching UniProt.

Structural patterns
- Identify interaction interfaces from structures
- Search for the same interface in other pairs of PDB structures

Integrate all information
The best prediction algorithms, like STRING, integrate different lines of evidence using machine learning.
Basic idea:
- Step 1 (training): Identify recurring evidence patterns in known interactions
- Step 2 (testing): Identify new interactions by searching for the same evidence patterns in unknown protein pairs
@EF: Maybe open for discussion: what are some other things that make you think two proteins interact?
@EF: One clever option is that they share domains that interact in other proteins; also co-expression and co-function (which you mentioned) and co-compartment.

How to use interactomes?
PPI ANALYSES
Remember: the network is undirected.
- Clustering: find complexes; protein neighborhoods (functional)
- Other: inferring knowledge such as functional annotations

Clustering
[Figure: example network of proteins A-F]

Function Assignment
Guilt-by-association: function is transferred from neighbors.
Interacting partner annotations: BLUE, GREEN
- best = max: assign the annotation that is most common among the interacting partners
- all: assign every annotation found among the interacting partners
McGary et al, Genome Biology, 2007
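Guilt-by-association can be sketched as a vote among annotated neighbors. This is a minimal sketch, assuming one annotation per annotated protein; the network, the assignment of BLUE/GREEN labels to proteins, and the function names are hypothetical and only echo the color coding on the slide.

```python
from collections import Counter


def predict_function(adjacency, annotations, protein):
    """'max' strategy: most common annotation among the protein's annotated neighbors.

    Returns None if no neighbor is annotated. On a tie, the annotation seen
    first in sorted neighbor order wins.
    """
    votes = Counter(
        annotations[n] for n in sorted(adjacency[protein]) if n in annotations
    )
    return votes.most_common(1)[0][0] if votes else None


def neighbor_annotations(adjacency, annotations, protein):
    """'all' strategy: every annotation found among the protein's neighbors."""
    return {annotations[n] for n in adjacency[protein] if n in annotations}


# Hypothetical PPI network and known annotations.
ppi = {
    "A": {"B", "C", "D"},
    "B": {"A"}, "C": {"A"}, "D": {"A", "E"},
    "E": {"D", "F"}, "F": {"E"},
}
known = {"B": "BLUE", "C": "BLUE", "D": "GREEN", "F": "GREEN"}
prediction_a = predict_function(ppi, known, "A")
prediction_e = predict_function(ppi, known, "E")
```

Protein A has two BLUE partners and one GREEN partner, so the "max" strategy labels it BLUE, while the "all" strategy would report both annotations.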
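Correlation Networks

Correlation is the simplest metric for co-expression
[Figure: a genes x conditions expression matrix is turned into a genes x genes correlation matrix]

Mutual Information is a Measure of Non-linear Correlation
[Figure: example relationships labeled with their Pearson correlation value]

Mutual Information (MI)
Definition: I(X; Y) = sum over x, y of p(x, y) * log[ p(x, y) / (p(x) p(y)) ]
Properties:
- Measures how much knowing one of these variables reduces uncertainty about the other
- Non-negative and symmetric
- Invariant under invertible (including nonlinear) transformations of the variables
Network reconstruction algorithms that use MI:
- ARACNE
- CLR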
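A small numerical example shows why mutual information can pick up relationships that Pearson correlation misses. This is a minimal sketch, assuming discrete values and plain frequency estimates of the probabilities; real expression data would first need to be discretized or handled with a density estimator, which is not shown. The data are made up: y is fully determined by x, but not linearly.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mutual_information(xs, ys):
    """MI in bits between two discrete variables, estimated from paired samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]          # y depends on x, but the relationship is not linear
r = pearson(xs, ys)
mi = mutual_information(xs, ys)
```

Here the Pearson correlation is zero because the relationship is symmetric around x = 0, while the mutual information is well above zero because knowing x removes all uncertainty about y.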
DIRECTED NETWORKS

Regulatory Networks
- Signaling (Protein A -> Protein B): phosphorylation, activation, inhibition
- Transcriptional regulation (TF A -> Gene B): expression, repression
TF = Transcription Factor

Regulatory Networks
- Signal at cell surface
- Cascade to nucleus
- Activate transcription factors (TF)
- Genes: gene expression

TF NETWORKS

Transcriptional Regulatory Networks
Identify genes where transcription factors bind (DNA binding sites):
- Experimental techniques
- Computational prediction

Identifying DNA Binding Sites: Experiments
ChIP-chip / ChIP-seq: chromatin immunoprecipitation (ChIP) followed by microarray analysis (chip) or sequencing (seq)

Identifying DNA Binding Sites: Computational
Motif scanning: scan promoters using position weight matrices (PWM)
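Motif scanning can be sketched as sliding a log-odds matrix along a sequence. This is a minimal sketch, assuming sequences contain only A, C, G, T, a uniform background, and a pseudocount of 1; the aligned sites and the promoter sequence are invented for illustration, and the function names are not taken from any particular tool.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from aligned, equal-length binding sites."""
    pwm = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; return (position, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(col[base] for col, base in zip(pwm, sequence[i:i + width])))
        for i in range(len(sequence) - width + 1)
    ]


# Hypothetical aligned binding sites and promoter sequence.
sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
promoter = "AACGTTGACTCATTG"
hits = scan(promoter, build_pwm(sites))
best_position, best_score = max(hits, key=lambda hit: hit[1])
```

The best-scoring window starts at position 5 (0-based), where the promoter contains the consensus TGACTC. In practice a score threshold decides which windows are reported as candidate binding sites.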
Yeast Transcriptional Regulatory Network
Rick Young dataset

Yeast Transcriptional Regulatory Network
TF-TF interactions only. Every edge can be an activation or an inhibition.

SIGNALING NETWORKS

Overview
Edges: activation or inhibition (multiple edge types!)
@EF: Could draw a simple circuit like a feed forward loop on the board and ask what happens as you ramp up the expression of one of the members in the circuit.
@JP: I was going to talk about motifs and simple building blocks in the next lecture.

KEGG Pathways Database
Edges: activation, inhibition, phosphorylation, etc.

KEGG Pathways Database
- Literature curated, manually drawn pathways
- Groups of pathways: Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Human Diseases
- Pathways are both species-specific and cross-species (KO)

Other Pathway Databases
- KEGG (http://www.kegg.jp/kegg/pathway.html): Great for metabolic pathways. Simple interface. Multiple species including prokaryotes.
- REACTOME (http://www.reactome.org/): Supposedly the most comprehensive resource for signal transduction pathways. Human only.
- BIOCARTA (http://www.biocarta.com/genes/index.asp): Pretty maps with lots of colors. Mammalian.

Experiments
- Decades of low-throughput, painstaking experiments: stimulation, mutants, structure, context
- No single experiment type to deduce a signaling network

DIRECTED NETWORKS

Directions = Pathways
- Chain regulatory interactions
- Concept of pathway emerges from directions
- New analyses not possible with undirected networks
Examples: TF A -> TF B -> TF C -> Gene D; Receptor A -> Kinase B -> Signal Protein C -> TF D

Connect the dots
Signal at cell surface -> cascade to nucleus -> activate transcription factors (TF) -> genes -> gene expression

Clustering

Network Analysis and Visualization
SUMMARY / APPLICATION

Functional mapping: mining biological networks
[Figure: predicted relationships between genes, colored by confidence from high to low; cell cycle genes highlighted]
- The strength of these relationships indicates how cohesive a process is (e.g., cell cycle genes).
- The strength of these relationships indicates how associated two processes are (e.g., cell cycle genes and DNA replication genes).

Predicting gene function
[Figure: predicted relationships between genes, colored by confidence from high to low; cell cycle genes highlighted]
- These edges provide a measure of how likely a gene is to specifically participate in the process of interest.

IMAGE SOURCES
Slide numbers are no longer correct due to rearrangement and slide deck merging, but consult this URL for otherwise unattributed images:
https://www.weizmann.ac.il/complex/tlusty/courses/InfoInBio/Papers/AlonMotifs2002.pdf