Predicting domain-domain interactions using a parsimony approach
Katia Guimaraes, Ph.D., NCBI / NLM / NIH
K. Guimaraes NCBI/NLM/NIH
The problem
We have:
• A protein-protein interaction network, not necessarily very reliable.
• Domain composition of the proteins in the network.
We want:
• To identify a set of putative domain interactions.
Basic assumption: Protein interactions are mediated by domain-domain interactions.
Related Work
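Association Method: Sprinzak and Margalit, J. Mol. Biol., 2001.
Score based on the ratio of observed to expected frequency:
$$\text{Score}(i,j) = \frac{\text{observed frequency}(i,j)}{\text{expected frequency}(i,j)}, \qquad \text{expected frequency}(i,j) = P(i) \cdot P(j)$$
[Figure from Sprinzak and Margalit, 2001: example domain pair with Score = 4]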
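The ratio can be sketched in Python as follows. This is an illustration, not Sprinzak and Margalit's code: the function name `association_scores`, the input format, and the normalization (observed frequency of (i, j) as its share of all domain-pair occurrences across interacting protein pairs, and P(i) as the share of domain i among all domain occurrences in those pairs) are assumptions.

```python
from collections import Counter
from itertools import product


def association_scores(interactions, domains):
    """Observed/expected ratio per domain pair (hypothetical normalization).

    interactions: list of (protein_a, protein_b) tuples
    domains: dict mapping protein -> set of domain names
    """
    pair_counts = Counter()    # occurrences of each unordered domain pair
    domain_counts = Counter()  # occurrences of each domain in interacting proteins
    for pm, pn in interactions:
        for di, dj in product(domains[pm], domains[pn]):
            pair_counts[tuple(sorted((di, dj)))] += 1
        for d in list(domains[pm]) + list(domains[pn]):
            domain_counts[d] += 1

    total_pairs = sum(pair_counts.values())
    total_domains = sum(domain_counts.values())
    scores = {}
    for (di, dj), n in pair_counts.items():
        observed = n / total_pairs
        # Expected frequency taken as P(i) * P(j), as on the slide
        # (no factor of 2 for unordered pairs: an assumption).
        expected = (domain_counts[di] / total_domains) * (domain_counts[dj] / total_domains)
        scores[(di, dj)] = observed / expected
    return scores
```

For example, `association_scores([("P1", "P2")], {"P1": {"A"}, "P2": {"B"}})` returns `{('A', 'B'): 4.0}`.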
Related Work
Maximum Likelihood Estimation (EM): Deng, Mehta, Sun, and Chen. Genome Res., 2002.
GOAL: To assign a probability to each domain-domain contact so that the likelihood of the network is maximized.
Iteratively adjusts the parameters to explain the observed network, until they no longer change.
Important feature of this method: it can take missing data into account, for instance to consider false negatives.
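A compact EM sketch of this idea, written from the model as we understand it from Deng et al. (2002): two proteins interact if at least one of their domain pairs interacts, and whether an edge is observed depends on the true interaction through a false-positive rate `fp` and a false-negative rate `fn`. The function name `em_domain_probs`, the default rates, the starting value `init`, and treating every unordered protein pair as a data point are assumptions, not values from the paper.

```python
from itertools import combinations, product


def em_domain_probs(proteins, domains, observed, fp=1e-4, fn=0.8, iters=200, init=0.1):
    """Estimate lam[(i, j)] = P(domain pair (i, j) interacts) by EM (a sketch).

    Assumed model:
      P(P_mn = 1) = 1 - prod over domain pairs (i, j) of (m, n) of (1 - lam_ij)
      P(O_mn = 1 | P_mn = 1) = 1 - fn,  P(O_mn = 1 | P_mn = 0) = fp
    proteins: list of protein ids; domains: dict protein -> set of domains
    observed: set of frozenset({m, n}) for observed interactions
    """
    data = []  # (is the pair observed as interacting?, its candidate domain pairs)
    for m, n in combinations(proteins, 2):
        pairs = [tuple(sorted(p)) for p in product(domains[m], domains[n])]
        if pairs:
            data.append((frozenset((m, n)) in observed, pairs))

    lam = {p: init for _, pairs in data for p in pairs}
    for _ in range(iters):
        num = dict.fromkeys(lam, 0.0)
        den = dict.fromkeys(lam, 0)
        for obs, pairs in data:
            no_contact = 1.0
            for p in pairs:
                no_contact *= 1.0 - lam[p]
            p_int = 1.0 - no_contact
            lik_int = (1.0 - fn) if obs else fn   # P(O_mn | P_mn = 1)
            lik_not = fp if obs else (1.0 - fp)   # P(O_mn | P_mn = 0)
            p_obs = p_int * lik_int + (1.0 - p_int) * lik_not
            # E-step: posterior that this domain pair is in contact, given O_mn.
            for p in pairs:
                num[p] += lam[p] * lik_int / p_obs
                den[p] += 1
        # M-step: average the posteriors over all occurrences of each domain pair.
        lam = {p: num[p] / den[p] for p in lam}
    return lam
```

On a toy network of single-domain proteins P1 (A), P2 (B) and P3 (C) with only the P1-P2 edge observed, the estimate for (A, B) approaches 1 while (A, C) and (B, C) shrink toward 0.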
Related Work
Domain Pair Exclusion Analysis (DPEA): Riley, Lee, Sabatti, and Eisenberg. Genome Biology, 2005.
APPROACH: MLE is computed multiple times, each time with a given domain-domain interaction disallowed, in order to measure the impact on the likelihood of the protein interaction network.
DPEA outperforms all previous prediction methods.
Our Approach
Our hypothesis:
Interactions evolved in the most parsimonious way.
So, we will try to explain the protein interactions using the “smallest-weighted” set of putative domain interactions.
Example: for the protein interaction network in the figure, a single domain pair would suffice to explain all protein interactions.
The intuition behind our approach
If single-domain proteins interact, the problem is trivial: the interaction must be mediated by the pair formed by their two domains.
But the fact is that most proteins have multiple domains.
What if there are multiple interacting proteins all with multiple domains?
By the parsimony principle, domain pairs that are common to those protein interactions are the best candidates for putative mediators.
In the example in the figure, two domain pairs represent the best choices.
Modeling the problem as an LP
For each domain pair $(D_i, D_j)$, create a variable $x_{ij} \ge 0$.
For each protein interaction $(P_m, P_n)$, create a constraint:
$$\sum_{D_i \in P_m,\; D_j \in P_n} x_{ij} \ge 1$$
For the network in the figure, there will be six constraints.
Modeling the problem as an LP
From the set of protein-protein interactions, identify the potential domain-domain contacts, which form the set of variables. Example:
We have 8 potential contacts (the domain pairs shown in the figure).
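Enumerating the candidate contacts amounts to a Cartesian product over the two proteins' domain sets. A minimal Python sketch, assuming unordered domain pairs stored as sorted tuples; `potential_contacts` and `all_variables` are hypothetical names.

```python
from itertools import product


def potential_contacts(interactions, domains):
    """Map each interacting protein pair to its candidate domain pairs.

    interactions: list of (protein_a, protein_b) tuples
    domains: dict mapping protein -> set of domain names
    Domain pairs are unordered, so each is stored as a sorted tuple.
    """
    return {
        (pm, pn): {tuple(sorted(p)) for p in product(domains[pm], domains[pn])}
        for pm, pn in interactions
    }


def all_variables(contacts):
    """All distinct potential domain contacts, i.e. the LP variables."""
    return sorted(set().union(*contacts.values()))
```

With `domains = {"P1": {"A", "B"}, "P2": {"C"}, "P3": {"A", "C"}}` and interactions `[("P1", "P2"), ("P2", "P3")]`, `all_variables` returns `[('A', 'C'), ('B', 'C'), ('C', 'C')]`: three variables, with (A, C) shared by both constraints.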
Modeling the problem as an LP
Since, under parsimonious evolution, domain pairs appearing in multiple interacting protein pairs are better candidates for mediating the contact, we minimize the sum of all scores assigned to the variables.
So, we have:
Minimize $\sum_{i,j} x_{ij}$
Subject to: $\sum_{D_i \in P_m,\; D_j \in P_n} x_{ij} \ge 1$ for every interacting protein pair $\{P_m, P_n\}$, and $x_{ij} \ge 0$.
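A sketch of this LP with SciPy's `linprog` (HiGHS backend). The input format (a dict from each interacting protein pair to its set of candidate domain pairs) and the function name `lp_scores` are assumptions; the dense constraint matrix only suits toy examples, and an input the size of the dataset used in the talk would call for a sparse formulation.

```python
import numpy as np
from scipy.optimize import linprog


def lp_scores(contacts):
    """Minimize sum x_ij subject to, for every interacting pair {Pm, Pn},
    the sum of x_ij over its candidate domain pairs >= 1, with x_ij >= 0.

    contacts: dict mapping (Pm, Pn) -> set of candidate domain pairs (i, j)
    Returns a dict domain pair -> LP-score (value of x_ij at the optimum).
    """
    var_list = sorted(set().union(*contacts.values()))
    col = {v: k for k, v in enumerate(var_list)}

    # linprog expects A_ub @ x <= b_ub, so each ">= 1" row is negated.
    A_ub = np.zeros((len(contacts), len(var_list)))
    for row, pairs in enumerate(contacts.values()):
        for v in pairs:
            A_ub[row, col[v]] = -1.0
    b_ub = -np.ones(len(contacts))

    c = np.ones(len(var_list))  # equal weight for every variable
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return dict(zip(var_list, res.x))
```

On the toy input `{("P1", "P2"): {("A", "C"), ("B", "C")}, ("P2", "P3"): {("A", "C"), ("C", "C")}}`, the optimum sets the variable for (A, C) to 1 and the others to 0, since that single pair explains both interactions at cost 1.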
Modeling the reliability of the protein interaction network
Large-scale experiments are rather unreliable.
Estimation: protein interaction network reliability ~50%.
To model that:
– Build 1000 protein interaction subnetworks, where each edge is kept with probability equal to the network reliability.
– Compute LP-scores for each $x_{ij}$ in each network $k$, giving $x_{ij}^k$.
– The LP-score for each pair is the average of the values obtained in all runs.
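The sampling step can be sketched in Python as below, under the assumptions that edges are dropped independently and that a domain pair absent from a subnetwork's scores counts as 0 in that run. The scoring function is passed in (in the talk it is the LP), and `reliability_averaged_scores` is a hypothetical name.

```python
import random


def reliability_averaged_scores(interactions, score_fn, r=0.5, runs=1000, seed=0):
    """Average per-network scores over random subnetworks.

    interactions: list of interacting protein pairs (the full network)
    score_fn: callable taking a list of protein pairs and returning
              a dict domain pair -> score
    r: network reliability; each edge is kept independently with probability r
    A domain pair missing from a subnetwork's scores counts as 0 in that run.
    """
    rng = random.Random(seed)
    totals = {}
    for _ in range(runs):
        sub = [e for e in interactions if rng.random() < r]
        for pair, s in score_fn(sub).items():
            totals[pair] = totals.get(pair, 0.0) + s
    return {pair: s / runs for pair, s in totals.items()}
```

Keeping the scorer as a parameter separates the sampling from the solver; the defaults of 1000 runs and r = 0.5 mirror the numbers on the slide.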
The pw-score
$$\text{pw-score}(i,j) = \min\left(\text{p-value}(i,j),\ (1-r)^{w(i,j)}\right)$$
pw-score is an indicator of the influence of:
- Frequency of appearance of the domain pair
- Number of witnesses, in view of network reliability
We use pw-score to filter our predictions.
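A small Python sketch of the pw-score and the filtering step, where r is the network reliability and w(i, j) the number of witnesses (interacting pairs of single-domain proteins) for (i, j). The function names, the default threshold, and ranking the surviving pairs by LP-score are assumptions for illustration; the thresholds 0.01 and 0.05 match those used in the results.

```python
def pw_score(p_value, n_witnesses, r=0.5):
    """min(p-value, (1 - r) ** w), as defined on the slide."""
    return min(p_value, (1.0 - r) ** n_witnesses)


def filter_predictions(lp_scores, p_values, witnesses, r=0.5, threshold=0.05):
    """Keep domain pairs with pw-score <= threshold, ranked by LP-score (assumed).

    lp_scores, p_values: dicts domain pair -> value
    witnesses: dict domain pair -> number of witnesses (missing means 0)
    """
    kept = [pair for pair in lp_scores
            if pw_score(p_values[pair], witnesses.get(pair, 0), r) <= threshold]
    return sorted(kept, key=lambda pair: lp_scores[pair], reverse=True)
```

A pair with five witnesses at r = 0.5 gets a pw-score of at most 0.5 ** 5 = 0.03125, so it passes the 0.05 filter even with a poor p-value, but not the 0.01 filter.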
Dataset used
Protein interaction network and domain contents compiled by Eisenberg’s group for [Riley et al., 2005] (DPEA)
Protein interaction network originally obtained from DIP.
- 26,032 protein-protein interactions (constraints)
- 177,233 potential domain contacts (variables)
Gold Standard Set = Subset of iPFAM
Comparison with other methods
We did two experiments to evaluate our method:
1. Enrichment of domain pairs confirmed by crystal structure among the top-scored pairs.
2. Prediction of the interacting domain pair between two proteins containing at least one domain pair in the gold standard set.
Enrichment of domain pairs in the gold standard set among top-scored pairs
PE method outperforms others in both coverage and accuracy.
[Figure panels: pw-score ≤ 0.01; pw-score ≤ 0.05]
EXPERIMENT 2: Prediction of the interacting domain pair between two interacting proteins
We use a more controlled dataset: protein pairs used in this experiment include only those that contain at least one potential domain contact in the gold standard set (GSS), i.e. 1,780 pairs rather than 26,032.
Given an interacting protein pair (Pm, Pn), identify which domain pair(s) mediate the protein interaction.
We assume that every protein interaction is mediated by a domain pair in the gold standard set.
For each of the 1,780 interacting protein pairs, check whether the domain pair(s) with maximum score is (are) in the gold standard set.
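A Python sketch of this check, computing PPV = TP / (TP + FP). How ties are counted is an assumption here: every domain pair tied for the maximum score counts as one prediction. The function name `mediating_pair_ppv` and the input format are hypothetical.

```python
def mediating_pair_ppv(contacts, scores, gold, tol=1e-9):
    """PPV of predicting the mediating domain pair(s) of each protein pair.

    contacts: dict (Pm, Pn) -> non-empty set of candidate domain pairs
    scores: dict domain pair -> score (missing pairs score 0)
    gold: set of domain pairs in the gold standard set
    """
    tp = fp = 0
    for pairs in contacts.values():
        best = max(scores.get(p, 0.0) for p in pairs)
        for p in pairs:
            if scores.get(p, 0.0) >= best - tol:  # top-scoring (ties included)
                if p in gold:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if tp + fp else 0.0
```

For instance, if one protein pair has a unique top-scoring pair in the gold standard set and another has two tied pairs of which only one is in the set, the PPV is 2/3.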
Comparison of PPV in Mediating Domain Pair Prediction experiment
[Figure: Positive Predictive Value, TP / (TP + FP), in %, for Association, EM, Random, DPEA, and PE, by number of potential domain interactions in the protein pair.]
Number of protein pairs per class (potential domain interactions: protein pairs): 2: 242; 3: 321; 4: 148; 5: 50; 6: 232; 7: 34; 8: 84; 9: 67; 10: 84; 11: 20; 12: 60; 13: 8; 14: 37; 15: 59; 16: 34; 17: 7; 18: 33; 19: 6; 20: 11; 21+: 243; ANY: 1780.
Overall PPV around 75%.
PPV of PE is well above that of other methods in every class.
DPEA ~42%.
PPV estimations are separated into classes according to the number of potential domain contacts of the protein interaction.
Predicting domain-domain interactions using a parsimony approach
Katia Guimaraes, Raja Jothi, Elena Zotenko, and Teresa Przytycka
Genome Biology, 2006
The impact of many appearances of the same domain
Domains that appear very frequently may induce domain pairs with higher scores.
Of course, a frequent pair may actually interact.
But we define a p-value to assess whether a high score could arise from frequency alone.
Estimating a p-value
We randomize the network:
Build 1000 protein interaction networks with:
• Same set of proteins, with same domain architectures
• $n_e$ edges selected at random ($n_e$ = # edges in original protein interaction network)
– Compute LP-scores for each $x_{ij}$ in each network $k$, giving $x_{ij}^k$
– p-value:
$$\text{p-value}(x_{ij}) = \frac{\#\{k : \text{LP-score}(x_{ij}^k) \ge \text{LP-score}(x_{ij})\}}{1000}$$
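A sketch of the randomization test in Python, taking the scoring function as a parameter (in the talk it is the LP). Drawing n_e distinct protein pairs uniformly without self-interactions is an assumption, as is the name `randomization_p_values`; listing every candidate protein pair explicitly only suits small examples.

```python
import random
from itertools import combinations


def randomization_p_values(proteins, n_edges, observed_scores, score_fn, runs=1000, seed=0):
    """p-value(x_ij) = #{k : score(x_ij^k) >= score(x_ij)} / runs.

    proteins: list of protein ids (their domain architectures stay unchanged)
    n_edges: n_e, the number of edges in the original network
    observed_scores: dict domain pair -> score in the original network
    score_fn: callable taking a list of protein pairs and returning
              a dict domain pair -> score (missing pairs score 0)
    """
    rng = random.Random(seed)
    candidate_edges = list(combinations(proteins, 2))  # no self-interactions
    hits = dict.fromkeys(observed_scores, 0)
    for _ in range(runs):
        network = rng.sample(candidate_edges, n_edges)  # n_e distinct random edges
        scores = score_fn(network)
        for pair, obs in observed_scores.items():
            if scores.get(pair, 0.0) >= obs:
                hits[pair] += 1
    return {pair: h / runs for pair, h in hits.items()}
```

Because the proteins and their domain architectures are fixed, a pair built from very frequent domains will also score highly in the random networks, which pushes its p-value up.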
The presence of Witnesses
We recall the case of single-domain interacting proteins:
We call such interacting protein pairs witnesses.
But since the edges of the network are not reliable, we may have false witnesses.
We estimate the chance that all witnesses of (i,j) are false as $(1-r)^{w(i,j)}$,
where r = reliability of the network and w(i,j) = # witnesses of (i,j).
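Counting witnesses and the corresponding probability can be sketched in Python as below. It assumes each edge is a true interaction independently with probability r, so that (1 - r) ** w is the probability that all w witnesses of a pair are false; `count_witnesses` and `all_witnesses_false` are hypothetical names.

```python
from collections import Counter


def count_witnesses(interactions, domains):
    """w(i, j): number of interacting pairs of single-domain proteins
    whose domains are i and j (unordered, stored as a sorted tuple)."""
    w = Counter()
    for pm, pn in interactions:
        if len(domains[pm]) == 1 and len(domains[pn]) == 1:
            (di,), (dj,) = domains[pm], domains[pn]
            w[tuple(sorted((di, dj)))] += 1
    return w


def all_witnesses_false(n_witnesses, r=0.5):
    """Probability that every witness is a false edge, given reliability r."""
    return (1.0 - r) ** n_witnesses
```

Proteins with more than one domain never produce witnesses, so a pair seen only in multi-domain interactions has w = 0 and probability 1.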
Dataset used
As input data we used the files compiled by Eisenberg’s group for [Riley et al., 2005] (DPEA).
Protein interaction network originally obtained from DIP:
- 26,032 protein-protein interactions
- underlying 11,403 proteins
- from 69 organisms
(This set generated 177,233 potential domain contacts.)
Domain architectures of the 11,403 proteins were obtained by HMM and include PFAM-B domains.
Our LP had 177,233 variables and 26,032 constraints.