Post on 20-Dec-2015
10/12: “Properties of Interaction Networks”
Presenter: Susan Tang
Scriber: Neda Nategh
DFLW: Chuan Sheng Foo
Upcoming:
10/17: “Transforming Cells into Automata” Ravi Tiruvury
“Index-based search of single sequences” Omkar Mate
10/19: “Multiple indexes and multiple alignments” Siddharth Jonathan
“Human Migrations” Anjalee Sujanani
Properties of Interaction Networks
CS 374 Presentation
Susan Tang
October 12, 2006
Protein Interactions
Protein interactions are ubiquitous and essential for cellular function
Signal transduction
Metabolic pathway
Transcription regulation
Protein Interaction: Cell Signaling
http://en.wikipedia.org/wiki/Phospholipase_C
Protein Interaction: Metabolic Pathway
http://www.phschool.com/science/biology_place/biocoach/images/transcription/eusplice.gif
Protein Interaction: Transcription Regulation
http://www.cifn.unam.mx/Computational_Genomics/old_research/FIG22.gif
Protein Interaction Network
Yeast Protein Interaction Network. Tucker et al. 2001.
Importance of Protein Interaction Networks
Studying protein interaction network architecture allows us to:
  Assess the role of individual proteins in the overall pathway
  Evaluate redundancy of network components
  Identify candidate genes involved in genetic diseases
  Set up the framework for mathematical models
For complex systems, the actual output may not be predictable by looking only at individual components:
  The whole is greater than the sum of its parts
Protein Interaction Data
High-throughput experiments
Yeast 2 Hybrid Screens
Co-IP
Experimental flaws
False positives / False negatives
Self-activators
Promiscuous proteins
Protein concentration differences
Lack of benchmark
Yeast 2 Hybrid Screen
(Cytotrap System)
http://media.biocompare.com/bcimages/techart/cytofig1.jpg
Protein Interaction Data
Figure 1. Network cross-comparison. Pairs of proteins have been binned according to their shortest path in networks generated from Y2H and Co-IP data. The false-color map indicates bins with more (red) or fewer (blue) interactions than expected by chance. Bins enriched for true positives, false positives and true noninteractors are indicated.
Gaining confidence in high-throughput protein interaction networks. Bader et al. 2004
Protein Interaction Data
Validation
mRNA co-expression
genetic interactions
database annotations / keywords
Analyses based on validation studies show that only 30-50% of high-throughput interactions are valid.
Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Gasch et al. 2000.
Protein Interaction Data: Verification
Figure 4. Joint analysis of physical and genetic interactions. Genetic interactions have been used as anchors to mine the physical interaction network. Lines indicate high-confidence physical interactions (blue), genetic interactions (red) and physical + genetic interactions (black). Protein color indicates biological process (red, cell cycle; green, cell defense; yellow, cell environment; yellow, cell fate; magenta, cell organization; lavender, metabolism; blue, protein fate; cyan, protein synthesis; brown, transcription; tan, transport mechanisms; gray, no annotation).
Gaining confidence in high-throughput protein interaction networks. Bader et al. 2004
Interaction Networks: Features
Network Conservation Across Species
  Comparison between yeast, fly, and worm
  3 eukaryotic species with the most complete networks
  71 network regions are conserved across all 3 species
Conserved patterns of protein interaction in multiple species. Sharan et al. 2005.
Interaction Networks: Literature
Interactions can be mapped from one genome to another through comparative genomics
  Annotation Transfer Between Genomes: Protein-Protein Interologs and Protein-DNA Regulogs
  Yu et al.
By correlating gene expression profiles for a hub and its partners, we can predict whether it’s a date or party hub
  Evidence for dynamically organized modularity in the yeast protein-protein interaction network
  Han et al.
Interolog Mapping: Background
Homology-based function annotation
  Sequence similarity → structural similarity → functional similarity
Protein function is a vague term and difficult to compare
  Focus on one aspect of protein function: interactions with other proteins
  Examine the accuracy of comparing sequences to extrapolate protein interactions
Functional similarity = f(Sequence similarity)
Protein interactions = f(Joint sequence similarity of interaction pair)
Interolog Mapping: Protein Homology
Homologs = proteins with significant sequence similarity (E-value <= 10^-10)
Homologs encompass orthologs and paralogs
  Paralogs = proteins in the same species that arose from gene duplication
    DIFFERENT FUNCTION
  Orthologs = proteins in different species that evolved from a common ancestor by speciation
    SAME FUNCTION
In-Paralogs
Out-Paralogs
http://genomebiology.com/content/figures/gb-2001-2-4-comment1005-1.jpg
Interolog Mapping: Orthologs
Interest in Orthologs
  Maintain function
  Maintain interactions
Key concept: If A and B interact in one species, their orthologs A’ and B’ in another species are expected to interact
  (A’ & B’) = “interologs” of (A & B)
Defining Orthologs
  Loose definition: top BLAST hit
  Stringent definition: reciprocal top BLAST hit
  Not all orthologs can be found using the above definitions
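The stringent ortholog definition and the interolog idea can be sketched in a few lines. This is only an illustration: the best-hit tables, the protein names (w1, y1, ...) and both function names are hypothetical, and a real pipeline would build the tables from BLAST output.

```python
def reciprocal_best_hits(best_hit_a_to_b, best_hit_b_to_a):
    """Return {protein in species A: ortholog in species B} under the
    reciprocal top-hit rule: each protein must be the other's best hit."""
    orthologs = {}
    for protein_a, protein_b in best_hit_a_to_b.items():
        if best_hit_b_to_a.get(protein_b) == protein_a:
            orthologs[protein_a] = protein_b
    return orthologs


def transfer_interactions(interactions, orthologs):
    """Map each interacting pair (A, B) to its interolog (A', B'),
    skipping pairs where either protein lacks an ortholog."""
    predicted = set()
    for a, b in interactions:
        if a in orthologs and b in orthologs:
            predicted.add(frozenset((orthologs[a], orthologs[b])))
    return predicted


# Hypothetical toy data: worm proteins w1..w3, yeast proteins y1..y3.
worm_to_yeast = {"w1": "y1", "w2": "y2", "w3": "y2"}
yeast_to_worm = {"y1": "w1", "y2": "w2", "y3": "w3"}

orthologs = reciprocal_best_hits(worm_to_yeast, yeast_to_worm)
interologs = transfer_interactions([("w1", "w2"), ("w1", "w3")], orthologs)
print(orthologs)
print(interologs)
```

In this toy case w3 has no reciprocal best hit, so the w1-w3 interaction cannot be transferred, which illustrates why the stringent definition gives low coverage.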
Interolog Mapping: Interaction Transfer
Previous Works
  Best-match mapping
  Reciprocal best-match mapping
  Disadvantages:
    Low coverage of total set of interactions
    Low prediction accuracy
Limitations of Interaction Transfer
  Some networks are more complete than others
    Proportion of proteins that is annotated
    Proportion of protein interaction partners recorded
Interolog Mapping: New Method
Generalized Interolog Mapping
  Search for all homologs of each interacting protein → homolog family
  Generalized interologs = any protein from family 1 + any protein from family 2
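As a rough sketch of the generalized mapping, every pairing of a homolog of the first protein with a homolog of the second becomes a candidate interaction. The function name, the dictionary layout and the protein identifiers below are assumptions made for illustration.

```python
from itertools import product


def generalized_interologs(source_pair, homolog_families):
    """Return all candidate target pairs for one source interaction.

    homolog_families maps a source protein to the list of its homologs
    in the target organism (its "homolog family").
    """
    a, b = source_pair
    family_1 = homolog_families.get(a, [])
    family_2 = homolog_families.get(b, [])
    # Any protein from family 1 paired with any protein from family 2.
    return {frozenset((x, y)) for x, y in product(family_1, family_2)}


# Hypothetical families: w1 has 2 yeast homologs, w2 has 3.
families = {"w1": ["y1", "y4"], "w2": ["y2", "y5", "y6"]}
candidates = generalized_interologs(("w1", "w2"), families)
print(len(candidates))
```

One source interaction yields 2 x 3 = 6 candidates here, which is why this approach predicts far more pairs than best-match mapping and needs a similarity cutoff to stay accurate.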
Interolog Mapping: Sequence Similarity Measures
Joint Sequence Similarity
  Many ways to define joint sequence similarity
  2 definitions are used here:
    Joint Sequence Identity (JI)
    Joint E-Value (JE)
  JE is less biased for shorter sequences than JI
  Prediction Accuracy vs. JE and Prediction Accuracy vs. JI plots show a similar trend
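The slide does not spell out the two formulas. The sketch below assumes each joint measure is the geometric mean of the two per-protein values, which is one natural way to combine them; treat both definitions as an assumption rather than as the paper's exact method.

```python
import math


def joint_identity(identity_a, identity_b):
    """Assumed JI: geometric mean of the two percent identities."""
    return math.sqrt(identity_a * identity_b)


def joint_e_value(e_value_a, e_value_b):
    """Assumed JE: geometric mean of the two E-values."""
    return math.sqrt(e_value_a * e_value_b)


# Hypothetical pair: A vs. A' is 80% identical, B vs. B' only 20%.
print(joint_identity(80.0, 20.0))
print(joint_e_value(1e-20, 1e-60))
```

A geometric mean is pulled down by the weaker of the two alignments, so a pair only scores well when both proteins are similar to their counterparts.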
Interolog Mapping: Data
Gold Standard Positives P
  Known interacting protein pairs in target organism
  Loose definition of an interaction: does not have to be a physical interaction; can be via a complex association
  8,250 unique interactions in yeast
Gold Standard Negatives N
  Known non-interacting protein pairs in target organism
  Extracted/estimated from knowledge about protein localization
  2,708,746 non-interactions in yeast
Interolog Mapping: Schema
Source organisms:
  H. pylori (bacteria)
  C. elegans (worm)
  D. melanogaster (fly)
  S. cerevisiae (yeast)
Target organism:
  S. cerevisiae (yeast)
Interolog Mapping: Quantitative Parameters
Verification
  V(J) = percentage of verified predictions among generalized interologs using J
Likelihood Ratio
  L(J) = likelihood that a generalized interolog is a true prediction
  Opost = L(J) × Oprior, i.e. L(J) = Opost / Oprior
Naïve Bayesian network
  No correlations between features
  Iterative use of different L’s
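A minimal sketch of the odds update, assuming L(J) is estimated in the usual way as the fraction of gold-standard positives predicted divided by the fraction of gold-standard negatives predicted. The gold-standard sizes are the ones given on the Data slide; the prediction counts and function names are hypothetical.

```python
def likelihood_ratio(true_pos, false_pos, n_positives, n_negatives):
    """L = P(predicted | interacting) / P(predicted | non-interacting).

    Raises ZeroDivisionError if false_pos is 0.
    """
    return (true_pos / n_positives) / (false_pos / n_negatives)


def posterior_odds(prior_odds, *ratios):
    """Naive Bayes update: multiply the prior odds by each independent L."""
    odds = prior_odds
    for ratio in ratios:
        odds *= ratio
    return odds


P, N = 8250, 2708746       # gold-standard sizes from the Data slide
prior = P / N              # prior odds of interaction

# Hypothetical counts for one J bin: 30 predictions hit P, 10 hit N.
L = likelihood_ratio(true_pos=30, false_pos=10, n_positives=P, n_negatives=N)
post = posterior_odds(prior, L)
print(L, post)
```

With a prior estimated from the same gold standards, the posterior odds reduce to true positives over false positives (3 to 1 here). Passing several ratios to posterior_odds corresponds to the iterative use of different L's under the no-correlation assumption.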
Interolog Mapping: Sequence Similarity and Interaction Transfer
Weighted Average of all 4 mappings
Interolog Mapping: Comparison to Other Methods
By the numbers… (applies to the C. elegans → S. cerevisiae mapping only)

Method                              Predicted  Validated  Accuracy
Best-Match                                 84         25       30%
Reciprocal Best-Match                      33         18       54%
Generalized Interolog (all)              9317        162        2%
Generalized Interolog (top 5% JE)         112         35       31%
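The accuracy row is simply validated predictions divided by total predictions. The short check below recomputes it from the counts on the slide; the dictionary layout and function name are illustrative.

```python
# (predicted, validated) counts copied from the comparison above.
results = {
    "Best-Match": (84, 25),
    "Reciprocal Best-Match": (33, 18),
    "Generalized Interolog (all)": (9317, 162),
    "Generalized Interolog (top 5% JE)": (112, 35),
}


def accuracy_percent(predicted, validated):
    """Share of predictions that were validated, as a percentage."""
    return 100.0 * validated / predicted


accuracies = {name: accuracy_percent(p, v) for name, (p, v) in results.items()}
for name, value in accuracies.items():
    print(f"{name}: {value:.1f}%")
```

The recomputed values agree with the slide to within rounding. They also show the trade-off: the top 5% JE subset matches best-match accuracy while validating more interactions (35 vs. 25).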
Interolog Mapping: Trade-Offs
Increase stringency of JE cutoff (higher joint similarity) → Increase Accuracy
→ Decrease Predictive Power (fewer predictions)
Interolog Mapping: Experimental Verification
PIE (Probabilistic Interactome Experimental) = 4 large-scale yeast interaction data sets
ROC curves compare generalized interolog mapping with PIE
Generalized interolog mapping: coverage and accuracy comparable to PIE
Interolog Mapping: Summary
Finding
Higher joint sequence similarity → Higher accuracy of protein interaction transfer
Application
Can use the interolog mapping method developed in the paper to predict interactions in model organisms with less-complete interaction networks
Interaction Network Modularity: Background
Interaction networks are scale-free
  Most proteins interact with a small number of partners
  A few proteins (“hubs”) interact with many partners
  Resistant to random node removal
  Sensitive to targeted hub removal
Types of Hubs
  Party Hubs
    Interact with most of their partners simultaneously
    Perform specific functions inside a module
  Date Hubs
    Interact with different partners at different times or locations
    Connect modules (biological processes) together
Party Hub: Example (Supreme Court)
Stephen G. Breyer
Samuel Alito, Jr.
Ruth Bader Ginsburg
John Roberts (Chief Justice)
David H. Souter
Clarence Thomas
Anthony Kennedy
Antonin Scalia
John Paul Stevens
Date Hub: Example (Presidential Cabinet)
Margaret Spellings
Elaine Chao (Dept of Labor)
Condoleezza Rice (Secretary of State)
George Bush
Samuel Bodman (Dept of Energy)
Alberto Gonzales (Dept of Justice)
Michael O. Leavitt (Dept of HHS)
Interaction Network Modularity: Network Construction
Filtered Yeast Interactome (FYI)
Input Methods
  High-throughput yeast-2-hybrid projects
  Co-IP
  Computational predictions
  MIPS protein complexes
  MIPS physical interactions
Procedure
  Extract high-confidence interactions in yeast
  High confidence = observed by at least 2 different input methods
Results
  1,379 proteins in this set
  Average: 3.6 interactions per protein
  Largest component: 778 proteins connected
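The filtering rule can be sketched as follows: collect the distinct methods that report each pair and keep pairs seen by at least two. The evidence tuples and the function name are hypothetical; only the "at least 2 different input methods" rule comes from the slide.

```python
from collections import defaultdict


def high_confidence_interactions(evidence, min_methods=2):
    """Keep protein pairs reported by at least min_methods distinct methods.

    evidence is an iterable of (protein_a, protein_b, method) records;
    pairs are treated as unordered.
    """
    methods_by_pair = defaultdict(set)
    for protein_a, protein_b, method in evidence:
        methods_by_pair[frozenset((protein_a, protein_b))].add(method)
    return {
        pair
        for pair, methods in methods_by_pair.items()
        if len(methods) >= min_methods
    }


# Hypothetical evidence records.
evidence = [
    ("A", "B", "Y2H"),
    ("B", "A", "Co-IP"),            # same pair, second method
    ("A", "C", "Y2H"),
    ("A", "C", "Y2H"),              # repeat of the same method: not enough
    ("C", "D", "MIPS complexes"),
    ("C", "D", "computational"),
]
fyi = high_confidence_interactions(evidence)
print(fyi)
```

Storing methods in a set means repeated observations by one method never count twice, so A-C is dropped even though it appears two times.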
Interaction Network Modularity: Hub Characterization
Data Source
  mRNA gene expression profiles
  Data for 5 different conditions
Pearson Correlation Coefficients (PCC)
Hub vs. Non-Hub
  Calculate PCC for a hub and each of its partners → take average
  Calculate PCC for a non-hub and each of its partners → take average
  Look at distribution of average PCC
  Hubs have a bimodal distribution
  Non-hubs have a normal distribution centered near 0
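The average PCC for one protein can be computed as sketched below. The expression vectors are made-up toy values and the function names are illustrative; real profiles would come from the expression data sets listed in the deck.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_pcc(protein, partners, expression):
    """Mean PCC between a protein and each of its interaction partners."""
    values = [pearson(expression[protein], expression[p]) for p in partners]
    return sum(values) / len(values)


# Hypothetical expression profiles over four measurements.
expression = {
    "hub": [1.0, 2.0, 3.0, 4.0],
    "p1": [2.0, 4.0, 6.0, 8.0],   # perfectly co-expressed with the hub
    "p2": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
}
avg = average_pcc("hub", ["p1", "p2"], expression)
print(avg)
```

A hub co-expressed with all its partners would score near 1 (party-like), while mixed partners average out toward 0, as in this toy case.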
Interaction Network Modularity: PCC Distribution
Interaction Network Modularity: Prediction of Date vs. Party Hub
Yeast Expression Compendium
  Superset of data for all external conditions
  Bimodal: suggests we can partition date hubs from party hubs
Yeast Expression Conditions
  Pheromone treatment: 45 data points
  Sporulation: 10 data points
  Unfolded protein response: 9 data points
  Stress response: 174 data points
  Cell cycle: 77 data points
Date/Party Partition
  Party Hubs = hubs with average PCC > cutoff in >= 1 conditions
  Date Hubs = the remaining hubs
Figure panel labels: absence of a clear bimodal distribution; presence of a clear bimodal distribution
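The partition rule can be sketched like this. The cutoff value of 0.5, the hub names and the PCC values are hypothetical, and labeling every non-party hub as a date hub is an assumption about how the partition is completed.

```python
def classify_hubs(avg_pcc_by_condition, cutoff):
    """Label each hub 'party' if its average PCC exceeds the cutoff in at
    least one condition, otherwise 'date' (assumed complement)."""
    labels = {}
    for hub, pcc_by_condition in avg_pcc_by_condition.items():
        is_party = any(value > cutoff for value in pcc_by_condition.values())
        labels[hub] = "party" if is_party else "date"
    return labels


# Hypothetical average PCC values per condition.
avg_pcc = {
    "hub1": {"cell cycle": 0.7, "stress response": 0.2},
    "hub2": {"cell cycle": 0.1, "stress response": 0.3},
}
labels = classify_hubs(avg_pcc, cutoff=0.5)
print(labels)
```

Because one condition above the cutoff is enough, a hub that is co-expressed with its partners only during the cell cycle still counts as a party hub.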
Interaction Network Modularity: In Silico Node Removal
Effect on Path Connectivity
  Characteristic path length = average shortest path length between node pairs
  Remove node → observe change in characteristic path length
  Is there a difference in path connectivity change for removal of party vs. date hubs? YES
  Party hubs: connectivity not affected
  Date hubs: connectivity decreased
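A minimal sketch of the node-removal experiment on a toy graph, using breadth-first search for shortest paths. Averaging only over pairs that remain connected is an assumption made here; the graph and all names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Undirected graph as {node: set of neighbours}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_lengths(graph, source):
    """BFS distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def characteristic_path_length(graph):
    """Average shortest path length over all connected node pairs."""
    total, pairs = 0, 0
    for source in graph:
        for target, d in shortest_path_lengths(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")


def remove_node(graph, node):
    """Copy of the graph without the node and its edges."""
    return {n: nbrs - {node} for n, nbrs in graph.items() if n != node}


# Toy ring A-B-C-D-E-H-A, where H acts as a shortcut between A and E.
graph = build_graph(
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("H", "A"), ("H", "E")]
)
before = characteristic_path_length(graph)
after = characteristic_path_length(remove_node(graph, "H"))
print(before, after)
```

Removing the connector H lengthens the average path from 1.8 to 2.0, the kind of change the slide reports for date hubs.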
Interaction Network Modularity: In Silico Node Removal
Effect on Remaining Components
  Is there a difference in main component after node removal for party vs. date hubs? YES
  Main Component (Remove party hub) >> Main Component (Remove date hub)
Figure panels: FYI Network; Removal (Party Hub); Removal (Date Hub)
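The main-component comparison can be illustrated on a toy network with two modules joined by a single connector. The graph, node names and function names are hypothetical; the point is only to show how the largest connected component is measured after a removal.

```python
def build_graph(edges):
    """Undirected graph as {node: set of neighbours}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def remove_node(graph, node):
    """Copy of the graph without the node and its edges."""
    return {n: nbrs - {node} for n, nbrs in graph.items() if n != node}


def largest_component_size(graph):
    """Size of the largest connected component (depth-first traversal)."""
    seen = set()
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


# Module 1: party-like hub P inside a dense cluster a, b, c.
# Module 2: cluster x, y, z. Date-like hub D links the two modules.
graph = build_graph([
    ("P", "a"), ("P", "b"), ("P", "c"), ("a", "b"), ("b", "c"), ("a", "c"),
    ("x", "y"), ("y", "z"), ("x", "z"),
    ("D", "a"), ("D", "x"),
])
full = largest_component_size(graph)
no_party = largest_component_size(remove_node(graph, "P"))
no_date = largest_component_size(remove_node(graph, "D"))
print(full, no_party, no_date)
```

Removing P leaves one component of 7 nodes, while removing D splits the network and the main component shrinks to 4.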
Interaction Network Modularity: In Silico Node Removal
Date Hub Subnetworks
  Each subnetwork has a tendency to be homogeneous in function
  Subnetworks ≈ biological modules
  Can assign a ‘most likely’ function for each subnetwork by examining functional annotation
Interaction Network Modularity: Genetic Interactions
Organized modularity model predicts that genetic perturbations of party hubs should differ from those of date hubs
Genetic Perturbation
  Date hubs and party hubs are comparable in terms of functional essentiality
  Date hubs have more genetic interactions than party hubs
Interaction Network Modularity: Date/Party Hub Representation of FYI
Interaction Network Modularity: Summary
Findings
In silico investigation and genetic interaction analysis both describe a protein interaction model where:
  there is organized modularity
  date hubs act as module connectors
  party hubs function at a lower level within modules
Application
Use this prediction method to classify and organize other interactomes into a modular network
Identification of party and date hubs may provide insight into potential drug targets