Topology of PPI Networks: Applications and Questions pp Qwongls/psZ/ppi-poland10.pdf · pp Q...


Topology of PPI Networks: Applications and Questions

Talk given at University of Warsaw, January 2010


Limsoon Wong

(Works with Hon Nian Chua & Guimei Liu)


Plan

• PPI network cleansing based on PPI topology

• PPI-based protein complex prediction

• PPI-based protein function prediction

Talk at Warsaw University, January 2010. Copyright © 2010 by Limsoon Wong

PPI Network Cleansing Based on PPI Topology

Why Protein Interactions?

• Complete genomes are now available

• Knowing the genes is not enough to understand how biology functions

• Proteins, not genes, are responsible for many cellular activities

• Proteins function by interacting w/ other proteins and biomolecules

GENOME PROTEOME “INTERACTOME”

Slide credit: See-Kiong Ng

High-Tech Expt PPI Detection Methods

• Yeast two-hybrid assays

• Mass spec of purified complexes (e.g., TAP)

• Correlated mRNA expression

• Genetic interactions (e.g., synthetic lethality)

• ...

FACT: Generating large amounts of experimental data about protein-protein interactions can be done with ease.

Slide credit: See-Kiong Ng

Key Bottleneck

• Many high-throughput expt detection methods for protein-protein interactions have been devised

• But ...

Slide credit: See-Kiong Ng

Noise in PPI Networks

Large disagreement betw methods

• High level of noise

Need to clean up before making inference on PPI networks

Sprinzak et al., JMB, 327:919-923, 2003


Measures that correlate with function homogeneity and localization coherence

• Two proteins participating in same biological process are more likely to interact

• Two proteins in the same cellular compartments are more likely to interact

• CD-distance

• FS-Weight


CD-distance & FS-Weight: Based on concept that two proteins with many interaction partners in common are likely to be in same biological process & localize to the same compartment

Czekanowski-Dice Distance (Brun et al, 2003)

• Given a pair of proteins (u, v) in a PPI network

– $N_u$ = the set of neighbors of u

– $N_v$ = the set of neighbors of v

• $CD(u,v) = \frac{2|N_u \cap N_v|}{|N_u| + |N_v|}$

• Consider relative intersection size of the two neighbor sets, not absolute intersection size

– Case 1: $|N_u| = 1$, $|N_v| = 1$, $|N_u \cap N_v| = 1$, CD(u,v) = 1

– Case 2: $|N_u| = 10$, $|N_v| = 10$, $|N_u \cap N_v| = 10$, CD(u,v) = 1
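The score on this slide, twice the number of shared neighbours divided by the sum of the two neighbourhood sizes, can be sketched in a few lines of Python. The function name `cd_distance` and the choice to return 0 for two isolated proteins are assumptions made for illustration, not part of the talk.

```python
def cd_distance(n_u, n_v):
    """CD(u, v) = 2|N_u ∩ N_v| / (|N_u| + |N_v|) for two neighbour sets."""
    total = len(n_u) + len(n_v)
    if total == 0:
        return 0.0  # assumption: two isolated proteins score 0
    return 2 * len(n_u & n_v) / total


# Case 1 and Case 2 from the slide: the score is the same even though
# Case 2 is backed by ten shared partners and Case 1 by only one.
case1 = cd_distance({"a"}, {"a"})
case2 = cd_distance(set(range(10)), set(range(10)))
print(case1, case2)
```

Because the score depends only on the relative overlap, it cannot tell a well-supported pair from a poorly supported one.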

Iterated CD-Distance (Liu et al, 2008)

• Variant of CD-distance that penalizes proteins with few neighbors

$w_L(u,v) = \frac{2|N_u \cap N_v|}{|N_u| + \lambda_u + |N_v| + \lambda_v}$

$\lambda_u = \max\{0, \frac{\sum_{x \in G} |N_x|}{|V|} - |N_u|\}$, $\lambda_v = \max\{0, \frac{\sum_{x \in G} |N_x|}{|V|} - |N_v|\}$

• Suppose average degree is 4, then

– Case 1: $|N_u| = 1$, $|N_v| = 1$, $|N_u \cap N_v| = 1$, $w_L(u,v) = 0.25$

– Case 2: $|N_u| = 10$, $|N_v| = 10$, $|N_u \cap N_v| = 10$, $w_L(u,v) = 1$
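A minimal sketch of the adjusted score: each protein whose degree falls below the network average has its degree padded up to that average in the denominator. The name `adjust_cd` and passing the average degree as an argument are illustrative choices, not the authors' code.

```python
def adjust_cd(n_u, n_v, avg_degree):
    """2|N_u ∩ N_v| / (|N_u| + lam_u + |N_v| + lam_v).

    lam_x = max(0, avg_degree - |N_x|) pads low-degree proteins up to the
    network's average degree, so sparse neighbourhoods are penalized.
    """
    lam_u = max(0.0, avg_degree - len(n_u))
    lam_v = max(0.0, avg_degree - len(n_v))
    denom = len(n_u) + lam_u + len(n_v) + lam_v
    if denom == 0:
        return 0.0  # assumption: empty network scores 0
    return 2 * len(n_u & n_v) / denom


# The two cases from the slide, with average degree 4.
case1 = adjust_cd({"a"}, {"a"}, avg_degree=4)
case2 = adjust_cd(set(range(10)), set(range(10)), avg_degree=4)
print(case1, case2)
```

With the padding in place, the single-neighbour pair drops to 0.25 while the ten-neighbour pair keeps its score of 1.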

A thought…

• Weight of interaction reflects its reliability

Can we get better results if we use this weight to re-calculate the score of other interactions?

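Iterated CD-Distance (Liu et al, 2008)

• $w_L^0(u,v) = 1$ if $(u,v) \in G$, otherwise $w_L^0(u,v) = 0$

• $w_L^1(u,v) = \frac{|N_u \cap N_v| + |N_u \cap N_v|}{|N_u| + \lambda_u + |N_v| + \lambda_v}$

• $w_L^k(u,v) = \frac{\sum_{x \in N_u \cap N_v} w_L^{k-1}(x,u) + \sum_{x \in N_u \cap N_v} w_L^{k-1}(x,v)}{\sum_{x \in N_u} w_L^{k-1}(x,u) + \lambda_u^k + \sum_{x \in N_v} w_L^{k-1}(x,v) + \lambda_v^k}$

• $\lambda_u^k = \max\{0, \frac{\sum_{x \in V} \sum_{y \in N_x} w_L^{k-1}(x,y)}{|V|} - \sum_{x \in N_u} w_L^{k-1}(x,u)\}$

• $\lambda_v^k = \max\{0, \frac{\sum_{x \in V} \sum_{y \in N_x} w_L^{k-1}(x,y)}{|V|} - \sum_{x \in N_v} w_L^{k-1}(x,v)\}$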
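The iteration can be sketched as follows: start with weight 1 on every observed edge, then repeatedly recompute each pair's score from the previous round's weights on the edges to their common neighbours, with the same below-average padding as in the adjusted score. This is a rough sketch, not the authors' implementation. It assumes the neighbour sets always come from the original network, that the graph is symmetric with no self-loops, and that pairs without common neighbours get weight 0; `iterated_cd` is a made-up name.

```python
def iterated_cd(neighbors, k):
    """Return {frozenset({u, v}): weight} after k iterations.

    neighbors maps each protein to the set of its partners in the
    original network (assumed symmetric, without self-loops).
    """
    nodes = list(neighbors)
    # Iteration 0: weight 1 for every observed edge; missing keys mean 0.
    w = {frozenset((u, v)): 1.0 for u in nodes for v in neighbors[u]}

    def weight(a, b):
        return w.get(frozenset((a, b)), 0.0)

    for _ in range(k):
        # Weighted degree of each protein under the previous weights.
        strength = {u: sum(weight(x, u) for x in neighbors[u]) for u in nodes}
        avg = sum(strength.values()) / len(nodes)
        new_w = {}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                common = neighbors[u] & neighbors[v]
                if not common:
                    continue  # numerator is 0, so the weight is 0
                num = sum(weight(x, u) + weight(x, v) for x in common)
                lam_u = max(0.0, avg - strength[u])
                lam_v = max(0.0, avg - strength[v])
                den = strength[u] + lam_u + strength[v] + lam_v
                new_w[frozenset((u, v))] = num / den if den else 0.0
        w = new_w
    return w


# Toy network: a triangle a-b-c.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
w1 = iterated_cd(triangle, 1)
w2 = iterated_cd(triangle, 2)
```

With k = 1 the result coincides with the adjusted score computed on unweighted edges, which is a handy sanity check.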

Validation

• DIP yeast dataset

– Functional homogeneity is 32.6% for PPIs where both proteins have functional annotations and 3.4% over all possible PPIs

– Localization coherence is 54.7% for PPIs where both proteins have localization annotations and 4.9% over all possible PPIs

• Let’s see how much better iterated CD-distance is over the baseline above, as well as over the original CD-distance/FS-weight

How many iterations are enough?

Cf. ave functional homogeneity of protein pairs in DIP < 4%; ave functional homogeneity of PPI in DIP < 33%

• Iterated CD-distance achieves best performance wrt functional homogeneity at k=2

• Ditto wrt localization coherence (not shown)

How many iterations are enough?

• Iterative CD-distance at diff k values on noisy network

# of iterations depends on amt of noise

Identifying False Positive PPIs

Cf. ave localization coherence of protein pairs in DIP < 5%; ave localization coherence of PPI in DIP < 55%

[Figure: two panels, functional homogeneity and localization coherence vs #interactions, for AdjustCD (k=1), AdjustCD (k=2), FSweight and CD-distance]

• Iterated CD-distance is an improvement over previous measures for assessing PPI reliability

Identifying False Negative PPIs

Cf. ave localization coherence of protein pairs in DIP < 5%; ave localization coherence of PPI in DIP < 55%

[Figure: two panels, functional homogeneity and localization coherence vs #predicted interactions, for AdjustCD (k=1), AdjustCD (k=2), FSweight and CD-distance]

• Iterated CD-distance is an improvement over previous measures for predicting new PPIs

5-Fold Cross-Validation

• DIP core dataset

– Ave # of proteins in 5 groups: 986

– Ave # of interactions in 5 training datasets: 16723

– Ave # of interactions in 5 testing datasets: 486591

– Ave # of correct answer interactions: 307

• Measures:

– sensitivity = TP/(TP + FN)

– specificity = TN/(TN + FP)

• #negatives >> #positives, specificity is always high

• >97.8% for all scoring methods

– precision = TP/(TP + FP)
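A small sketch of the three measures shows why specificity carries little information here. The totals 486591 and 307 come from the slide; treating them as the number of test pairs and the number of true interactions is one reading of the slide, and the TP/FP split is made up purely for illustration.

```python
def measures(tp, fp, tn, fn):
    """Sensitivity, specificity and precision from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Totals read off the slide; the TP/FP split is hypothetical.
total_pairs, positives = 486591, 307
tp, fp = 100, 900
fn = positives - tp
tn = total_pairs - positives - fp
result = measures(tp, fp, tn, fn)
print(result)
```

Even with nine false positives for every true positive, specificity stays above 0.99 because negatives vastly outnumber positives, so precision and sensitivity are the informative pair.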

5-Fold X-Validation

[Figure: precision vs sensitivity for AdjustCD (k=2), AdjustCD (k=1), FSweight and CD-distance]

• Iterated CD-distance is an improvement over previous measures for identifying false positive & false negative PPIs

PPI-Based Protein Complex Prediction

Motivation

• Nature of high-throughput PPI expts

– Proteins are taken out of their natural context!

• Can a protein interact with so many proteins simultaneously?

• A big “hub” and its “spokes” should probably be decomposed into subclusters

– Each subcluster is a set of proteins that interact in the same space and time

– Viz., a protein complex

PPI-Based Complex Prediction Algo

                                RNSC                                 MCODE                              MCL
Type                            Clustering, local search cost based  Local neighborhood density search  Flow simulation
Multiple assignment of protein  No                                   Yes                                No
Weighted edge                   No                                   No                                 Yes

• Issue: recall vs precision has to be improved

Does a “cleaner” PPI network help?

How to capture “low edge density” complexes?

PPI-Based Complex Prediction

• Recall & precision of protein complex prediction algo have much room for improvement

• Does a “cleaner” PPI network help?

• How to capture “low edge density” complexes?

Clique merging?

Relative density?

Core-n-attachment?

Cleaning PPI Network

• Modify existing PPI network as follows

– Remove interactions with low weight

– Add interactions with high weight

• Then run RNSC, MCODE, MCL, …, as well as our own method CMC
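The cleaning step amounts to two threshold filters over a table of pair weights. The sketch below assumes the weights (e.g. iterated CD-distance scores) have already been computed for both observed and unobserved pairs. The threshold values and the name `clean_network` are hypothetical; the slides do not state which cut-offs were used.

```python
def pair(a, b):
    """Unordered protein pair used as an edge key."""
    return frozenset((a, b))


def clean_network(edges, weights, remove_below, add_above):
    """Drop observed edges with low weight, add unobserved pairs with high weight."""
    kept = {e for e in edges if weights.get(e, 0.0) >= remove_below}
    added = {e for e, w in weights.items() if e not in edges and w >= add_above}
    return kept | added


# Toy example with made-up weights and thresholds.
edges = {pair("a", "b"), pair("b", "c")}
weights = {
    pair("a", "b"): 0.9,   # observed, reliable: kept
    pair("b", "c"): 0.05,  # observed, unreliable: removed
    pair("a", "c"): 0.8,   # unobserved, high score: added
    pair("a", "d"): 0.2,   # unobserved, low score: ignored
}
cleaned = clean_network(edges, weights, remove_below=0.1, add_above=0.5)
```

The cleaned edge set can then be handed to any of the clustering methods named above.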

CMC: Clustering of Maximal Cliques

• Remove noise edges in input PPI network by discarding edges having low iterated CD-distance

• Augment input PPI network by addition of missing edges having high iterated CD-distance

• Predict protein complex by finding overlapping maximal cliques, and merging/removing them

• Score predicted complexes using cluster density weighted by iterated CD-distance

Validation Experiments

• Matching a predicted complex S with a true complex C

– $V_S$: set of proteins in S

– $V_C$: set of proteins in C

– Overlap(S, C) = $|V_S \cap V_C| / |V_S \cup V_C|$

– Overlap(S, C) ≥ 0.5

• Evaluation

– Precision = matched predictions / total predictions

– Recall = matched complexes / total complexes

• Datasets: combined info from 6 yeast PPI expts

– #interactions: 20461 PPI from 4671 proteins

– #interactions with >0 common neighbor: 11487
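The matching rule and the two evaluation measures can be sketched as below, reading the overlap as shared proteins over the union of proteins and the match threshold as "at least 0.5" (the operators were lost in the transcript, so both readings are assumptions). The toy complexes are made up.

```python
def overlap(s, c):
    """Shared proteins divided by the union of proteins of the two complexes."""
    return len(s & c) / len(s | c)


def evaluate(predicted, reference, threshold=0.5):
    """Return (precision, recall) under the overlap-based matching rule."""
    matched_predictions = sum(
        any(overlap(s, c) >= threshold for c in reference) for s in predicted
    )
    matched_complexes = sum(
        any(overlap(s, c) >= threshold for s in predicted) for c in reference
    )
    return matched_predictions / len(predicted), matched_complexes / len(reference)


# Toy data: one prediction matches one reference complex, the others do not.
predicted = [{1, 2, 3}, {7, 8}]
reference = [{1, 2, 3, 4}, {5, 6}]
precision, recall = evaluate(predicted, reference)
```

Precision counts predictions that hit some reference complex and recall counts reference complexes hit by some prediction, so the two numerators can differ.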

Effect of Cleaning on CMC

• Cleaning by Iterated CD-distance improves recall & precision of CMC

Noise Tolerance of CMC

• If cleaning is done by iterating CD-distance 20 times, CMC can tolerate up to 500% noise in the PPI network!

Effect of Cleansing on MCL

• MCL benefits significantly from cleaning too

• Ditto for other protein complex prediction methods

CMC vs Others

Characteristics of Unmatched Clusters

• At k = 2 …

• 85 clusters predicted by CMC do not match complexes in Aloy and MIPS

• Localization coherence score ~90%

• 65/85 have the same informative GO term annotated to > 50% of proteins in the cluster

Likely to be real complexes

PPI-Based Protein Function Prediction

Protein Interaction Based Approaches

• Neighbour counting (Schwikowski et al, 2000)

– Rank function based on freq in interaction partners

• Chi-square (Hishigaki et al, 2001)

– Chi square statistics using expected freq of functions in interaction partners

• Markov Random Fields (Deng et al, 2003; Letovsky et al, 2003)

– Belief propagation exploit unannotated proteins for prediction

• Simulated Annealing (Vazquez et al, 2003)

– Global optimization by simulated annealing

– Exploit unannotated proteins for prediction

• Clustering (Brun et al, 2003; Samanta et al, 2003)

– Functional distance derived from shared interaction partners

– Clusters based on functional distance represent proteins with similar functions

• Functional Flow (Nabieva et al, 2004)

– Assign reliability to various expt sources

– Function “flows” to neighbour based on reliability of interaction and “potential”

• Indirect Functional Assoc (Chua et al, 2006)

– Identification of reliable common interaction partners

Functional Association Thru Interactions

• Direct functional association (Level-1 neighbour):

– Interaction partners of a protein are likely to share functions w/ it

– Proteins from the same pathways are likely to interact

• Indirect functional association (Level-2 neighbour)

– Proteins that share interaction partners with a protein are also likely to share functions w/ it

– Proteins that have common biochemical, physical properties and/or subcellular localization are likely to bind to the same proteins

An Illustrative Case of Indirect Functional Association?

[Figure: SH3 proteins and SH3-binding proteins]

• Is indirect functional association plausible?

• Is it found often in real interaction data?

• Can it be used to improve protein function prediction from protein interaction data?

Freq of Indirect Functional Association

[Figure: example yeast PPI subnetwork; each protein is labelled with its ORF name and functional category codes, e.g. YAL012W|1.1.6.5|1.1.9, YDR158W|1.1.6.5|1.1.9, YPL088W|2.16|1.1.9]

Source: Kenny Chua

Prediction Power By Majority Voting

• Remove overlaps in level-1 and level-2 neighbours to study predictive power of “level-1 only” and “level-2 only” neighbours

• Sensitivity vs Precision analysis

$SN = \frac{\sum_{i}^{K} k_i}{\sum_{i}^{K} n_i}$, $PR = \frac{\sum_{i}^{K} k_i}{\sum_{i}^{K} m_i}$

• $n_i$ is no. of fn of protein i

• $m_i$ is no. of fn predicted for protein i

• $k_i$ is no. of fn predicted correctly for protein i

“level-2 only” neighbours perform better

L1 ∩ L2 neighbours have greatest prediction power
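The sensitivity and precision used on this slide pool counts over all proteins: correctly predicted functions divided by known functions, and correctly predicted functions divided by predicted functions. A minimal sketch, with made-up annotations and the hypothetical name `sensitivity_precision`:

```python
def sensitivity_precision(true_fns, predicted_fns):
    """Pooled SN = sum(k_i) / sum(n_i) and PR = sum(k_i) / sum(m_i).

    n_i: functions of protein i, m_i: functions predicted for protein i,
    k_i: functions predicted correctly for protein i.
    """
    k = sum(len(true_fns[p] & predicted_fns.get(p, set())) for p in true_fns)
    n = sum(len(fns) for fns in true_fns.values())
    m = sum(len(predicted_fns.get(p, set())) for p in true_fns)
    return k / n, k / m


# Toy annotations (illustrative only).
true_fns = {"p1": {"A", "B"}, "p2": {"C"}}
predicted_fns = {"p1": {"A"}, "p2": {"C", "D"}}
sn, pr = sensitivity_precision(true_fns, predicted_fns)
```

Pooling the counts before dividing means heavily annotated proteins weigh more than they would if per-protein ratios were averaged.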

Functional Similarity Estimate: Czekanowski-Dice Distance

• Functional distance between two proteins (Brun et al, 2003)

$D(u,v) = \frac{|N_u \Delta N_v|}{|N_u \cup N_v| + |N_u \cap N_v|}$

• $N_k$ is the set of interacting partners of k

• X ∆ Y is symmetric diff betw two sets X and Y

• Greater weight given to similarity

Similarity can be defined as

$S(u,v) = 1 - D(u,v) = \frac{2X}{2X + (Y + Z)}$

where $X = |N_u \cap N_v|$, $Y = |N_u - N_v|$ and $Z = |N_v - N_u|$

Is this a good measure if u and v have very diff number of neighbours?

Functional Similarity Estimate: FS-Weighted Measure

• FS-weighted measure

$S(u,v) = \frac{2|N_u \cap N_v|}{|N_u - N_v| + 2|N_u \cap N_v|} \times \frac{2|N_u \cap N_v|}{|N_v - N_u| + 2|N_u \cap N_v|}$

• $N_k$ is the set of interacting partners of k

• Greater weight given to similarity

Rewriting this as

$S(u,v) = \frac{2X}{2X + Y} \times \frac{2X}{2X + Z}$
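Writing X for the number of shared partners, Y for partners of u only and Z for partners of v only, the FS-weighted measure is the product 2X/(2X+Y) × 2X/(2X+Z), whereas the Czekanowski-Dice similarity is 2X/(2X+Y+Z). The sketch below computes both for comparison. The function names and the choice to return 0 when nothing is shared are illustrative assumptions.

```python
def overlap_counts(n_u, n_v):
    """X = shared partners, Y = partners of u only, Z = partners of v only."""
    return len(n_u & n_v), len(n_u - n_v), len(n_v - n_u)


def cd_similarity(n_u, n_v):
    x, y, z = overlap_counts(n_u, n_v)
    return 2 * x / (2 * x + y + z) if x else 0.0


def fs_weight(n_u, n_v):
    x, y, z = overlap_counts(n_u, n_v)
    if x == 0:
        return 0.0  # assumption: no shared partner means no similarity
    return (2 * x / (2 * x + y)) * (2 * x / (2 * x + z))


# Toy neighbour sets: X = 2, Y = 2, Z = 1.
n_u, n_v = {1, 2, 3, 4}, {1, 2, 5}
fs = fs_weight(n_u, n_v)
cd = cd_similarity(n_u, n_v)
```

The product form treats the two proteins separately, so a large set of unshared partners on one side lowers only that side's factor.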


Correlation w/ Functional Similarity

• Correlation betw functional similarity & estimates


• Equiv measure slightly better in correlation w/ similarity for L1 & L2 neighbours

Source: Kenny Chua

Reliability of Expt Sources

• Diff Expt Sources have diff reliabilities

– Assign reliability to an interaction based on its expt sources (Nabieva et al, 2004)

• Reliability betw u and v computed by:

$r_{u,v} = 1 - \prod_{i \in E_{u,v}} (1 - r_i)$

• $r_i$ is reliability of expt source i,

• $E_{u,v}$ is the set of expt sources in which interaction betw u and v is observed

Source                   Reliability
Affinity Chromatography  0.823077
Affinity Precipitation   0.455904
Biochemical Assay        0.666667
Dosage Lethality         0.5
Purified Complex         0.891473
Reconstituted Complex    0.5
Synthetic Lethality      0.37386
Synthetic Rescue         1
Two Hybrid               0.265407
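Combining sources this way is a noisy-OR: the interaction is taken as unreliable only if every source that reported it is wrong, so the reliability is one minus the product of (1 − r_i) over the reporting sources. A small sketch using a few of the reliabilities listed on the slide. The function name is made up, and `math.prod` needs Python 3.8 or later.

```python
import math

# A few source reliabilities copied from the table on this slide.
SOURCE_RELIABILITY = {
    "Affinity Precipitation": 0.455904,
    "Dosage Lethality": 0.5,
    "Synthetic Rescue": 1.0,
    "Two Hybrid": 0.265407,
}


def interaction_reliability(sources, source_reliability=SOURCE_RELIABILITY):
    """1 - product over observed sources i of (1 - r_i)."""
    return 1.0 - math.prod(1.0 - source_reliability[s] for s in sources)


single = interaction_reliability(["Dosage Lethality"])
combined = interaction_reliability(["Two Hybrid", "Affinity Precipitation"])
certain = interaction_reliability(["Two Hybrid", "Synthetic Rescue"])
```

An interaction seen by two weak sources ends up more reliable than either source alone, and any source with reliability 1 makes the interaction fully reliable.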

Functional Similarity Estimate: FS-Weighted Measure with Reliability

• Take reliability into consideration when computing FS-weighted measure:

$S_R(u,v) = \frac{2\sum_{w \in N_u \cap N_v} r_{u,w} r_{v,w}}{\sum_{w \in N_u - N_v} r_{u,w} + \sum_{w \in N_u \cap N_v} r_{u,w}(1 - r_{v,w}) + 2\sum_{w \in N_u \cap N_v} r_{u,w} r_{v,w}} \times \frac{2\sum_{w \in N_u \cap N_v} r_{u,w} r_{v,w}}{\sum_{w \in N_v - N_u} r_{v,w} + \sum_{w \in N_u \cap N_v} r_{v,w}(1 - r_{u,w}) + 2\sum_{w \in N_u \cap N_v} r_{u,w} r_{v,w}}$

• $N_k$ is the set of interacting partners of k

• $r_{u,w}$ is reliability weight of interaction betw u and w

Rewriting

$S_R(u,v) = \frac{2X}{2X + Y} \times \frac{2X}{2X + Z}$
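One way to read the reliability-weighted measure is that each count in 2X/(2X+Y) × 2X/(2X+Z) is replaced by a sum of reliabilities. A shared partner w contributes r(u,w)·r(v,w) to X. A partner of u alone contributes r(u,w) to Y, and a shared partner also contributes r(u,w)·(1 − r(v,w)) to Y, with Z defined symmetrically. The sketch below follows that reading, which is a reconstruction from a garbled slide and should be checked against Chua et al. (2006). `fs_weight_reliable` and the reliability lookup are hypothetical names.

```python
def fs_weight_reliable(u, v, neighbors, reliability):
    """Reliability-weighted FS measure (reconstructed reading, see lead-in).

    reliability maps frozenset({a, b}) to the reliability of edge a-b.
    """
    def r(a, b):
        return reliability[frozenset((a, b))]

    n_u, n_v = neighbors[u], neighbors[v]
    common = n_u & n_v
    two_x = 2 * sum(r(u, w) * r(v, w) for w in common)
    if two_x == 0:
        return 0.0  # assumption: no reliable shared partner means 0
    y = sum(r(u, w) for w in n_u - n_v) + sum(
        r(u, w) * (1 - r(v, w)) for w in common
    )
    z = sum(r(v, w) for w in n_v - n_u) + sum(
        r(v, w) * (1 - r(u, w)) for w in common
    )
    return (two_x / (two_x + y)) * (two_x / (two_x + z))


# With every reliability equal to 1 this reduces to the unweighted measure.
neighbors = {"u": {"a", "b", "c"}, "v": {"a"}}
ones = {frozenset(("u", n)): 1.0 for n in "abc"}
ones[frozenset(("v", "a"))] = 1.0
unweighted = fs_weight_reliable("u", "v", neighbors, ones)

# Both edges to the single shared partner are only 50% reliable.
half = {frozenset(("u", "a")): 0.5, frozenset(("v", "a")): 0.5}
weighted = fs_weight_reliable("u", "v", {"u": {"a"}, "v": {"a"}}, half)
```

Setting all reliabilities to 1 recovers the plain FS-weighted measure, which is a useful consistency check on the reconstruction.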

Integrating Reliability

• Equiv measure shows improved correlation w/ functional similarity when reliability of interactions is considered:

Improvement to Prediction Power by Majority Voting

Considering only neighbours w/ FS weight > 0.2

Improvement to Over-Rep of Functions in Neighbours

Use L1 & L2 Neighbours for Prediction

• FS-weighted Average

$f_x(u) = \frac{1}{Z}\left[\lambda r_{int} \pi_x + \sum_{v \in N_u}\left(S_{TR}(u,v)\,\delta(v,x) + \sum_{w \in N_v} S_{TR}(u,w)\,\delta(w,x)\right)\right]$

• $r_{int}$ is fraction of all interaction pairs sharing function

• $\lambda$ is weight of contribution of background freq

• $\delta(k, x) = 1$ if k has function x, 0 otherwise

• $N_k$ is the set of interacting partners of k

• $\pi_x$ is freq of function x in the dataset

• Z is sum of all weights

$Z = 1 + \sum_{v \in N_u}\left(S_{TR}(u,v) + \sum_{w \in N_v} S_{TR}(u,w)\right)$
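FS-weighted averaging scores function x for protein u as a normalized sum: a background term plus the similarity-weighted votes of level-1 neighbours v and of their neighbours w. The sketch below is a rough illustration only. It takes the similarity as a caller-supplied function `s_tr`, treats the background weight and frequencies as plain numbers, and skips w = u so that a protein does not vote for itself; that last choice is an assumption not stated on the slide.

```python
def fs_weighted_average(u, x, neighbors, s_tr, has_function, lam, r_int, pi_x):
    """Score of function x for protein u from level-1 and level-2 neighbours."""
    score = lam * r_int * pi_x  # background contribution
    z = 1.0                     # normalizer: 1 plus the sum of all weights used
    for v in neighbors[u]:
        s_uv = s_tr(u, v)
        score += s_uv * has_function(v, x)
        z += s_uv
        for w in neighbors[v]:
            if w == u:
                continue  # assumption: u does not vote for itself
            s_uw = s_tr(u, w)
            score += s_uw * has_function(w, x)
            z += s_uw
    return score / z


# Toy chain u - v - w where only w is annotated with function "x".
neighbors = {"u": {"v"}, "v": {"u", "w"}, "w": {"v"}}
annotations = {"w": {"x"}}
score = fs_weighted_average(
    "u", "x", neighbors,
    s_tr=lambda a, b: 0.5,  # constant similarity, for illustration
    has_function=lambda k, fn: 1 if fn in annotations.get(k, set()) else 0,
    lam=1.0, r_int=0.2, pi_x=0.1,
)
```

In this toy chain the only vote for "x" comes from the level-2 neighbour w, which is the indirect-association case the slides argue for.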

Performance of FS-Weighted Averaging

• LOOCV comparison with Neighbour Counting, Chi-Square, PRODISTIN

Performance of FS-Weighted Averaging

• Dataset from Deng et al, 2003

– Gene Ontology (GO) Annotations

– MIPS interaction dataset

• Comparison w/ Neighbour Counting, Chi-Square, PRODISTIN, Markov Random Field, FunctionalFlow

Freq of Indirect Functional Association in Other Genomes

D. melanogaster

Effectiveness of FS Weighted Averaging in Other Genomes

Last Remarks

What have we learned?

• Guilt by association of common interaction partners is useful for

– Cleansing high-throughput PPI network data

– Predicting protein complexes

– Inferring protein functional information

• Acknowledgement

– Kenny Chua, Guimei Liu

Readings

• H. N. Chua, et al. “Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions”, Bioinformatics, 22:1623-1630, 2006

• H. N. Chua, et al. “Using Indirect Protein-Protein Interactions for Protein Complex Prediction”, JBCB, 6(3):435-466, 2008

• H. N. Chua, L. Wong. “Increasing the Reliability of Protein Interactomes”, Drug Discovery Today, 13(15/16):652-658, 2008

• G. Liu, et al. “Assessing and predicting protein interactions using both local and global network topological metrics”, Proc GIW2008

• G. Liu, et al. “Complex Discovery from Weighted PPI Networks”, Bioinformatics, 25(15):1891-1897, 2009

Any Questions?