k-Nearest Neighbors in Uncertain Graphs
Michalis Potamias Francesco Bonchi
Aristides Gionis George Kollios
Nearest Neighbors in Uncertain Graphs @ VLDB 2010
Thesis
• Many complex networks are modeled as probabilistic (i.e., uncertain) graphs.
• The probabilistic treatment of such graphs leads to better understanding of real data.
Probabilistic Protein-Protein Interaction Networks
Source: Asthana et al., Genome Research 2004
Possible interactions between proteins are established through biological experiments that entail uncertainty. The edge probability represents that uncertainty.
[Figure: an uncertain graph on nodes A, B, C, D with edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7]
• Neighbors of a given node in a standard graph?
  – Nodes close in terms of shortest-path distance!
• How do we define neighbors in probabilistic graphs?
• How do we define the distance?
  – Treat them as weighted graphs (N06)
  – Nodes with high reliability (GR04)
  – Most probable path (BI03)
  – …shortest paths? (VLDB10)
• Why is it important to find good neighbors of proteins in PPI networks?
  – Detection of candidate co-complex relationships.
  – Actual co-complex relationships can be established through experiments in the lab.
Outline
• Thesis
• Probabilistic PPI Networks
• Distance Definition
• Sampling Algorithms
• kNN Pruning
• Experiments
Distance Definition
[Figure: the uncertain graph on nodes A, B, C, D with edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7, and its 32 possible worlds, one for each subset of the five edges]
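[Figure: the graph (edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7) next to one possible world, in which only edges (A,B) and (B,D) are present]
Probability of the world shown:
$\Pr(\text{world}) = p(A,B)\,p(B,D)\,(1-p(B,C))\,(1-p(C,D))\,(1-p(A,D))$
Shortest path length distribution of d(B,D) over all worlds: Pr(d = 1) = .30, Pr(d = 2) = .26, Pr(d = ∞) = .44.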
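As a sanity check of the possible-world semantics, the following minimal Python sketch enumerates all 2^5 = 32 worlds of the four-node example, weights each world by the product of p(e) over present edges and 1 - p(e) over absent ones, and accumulates the PDF of the hop-count distance d(B,D). The slides do not say which probability sits on which edge, so the assignment in `EDGES` (p(B,D) = 0.3, p(A,B) = 0.2, p(A,D) = 0.6, p(B,C) = 0.4, p(C,D) = 0.7) is hypothetical, chosen because it reproduces the rounded .30/.26/.44 distribution; the helper names are illustrative.

```python
from collections import deque
from itertools import product

# Hypothetical edge-to-probability assignment (the slides do not specify it).
EDGES = {
    ("A", "B"): 0.2,
    ("A", "D"): 0.6,
    ("B", "C"): 0.4,
    ("C", "D"): 0.7,
    ("B", "D"): 0.3,
}
INF = float("inf")


def hop_distance(present, source, target):
    """BFS hop count from source to target using only the edges present in a world."""
    adj = {}
    for u, v in present:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return INF  # target unreachable in this world


def distance_pdf(edges, source, target):
    """Exact PDF of the shortest-path distance by enumerating all 2^|E| worlds."""
    pdf = {}
    items = list(edges.items())
    for mask in product([True, False], repeat=len(items)):
        prob = 1.0
        present = []
        for keep, (edge, p) in zip(mask, items):
            prob *= p if keep else 1.0 - p  # p(e) if present, 1 - p(e) if absent
            if keep:
                present.append(edge)
        d = hop_distance(present, source, target)
        pdf[d] = pdf.get(d, 0.0) + prob
    return pdf


pdf = distance_pdf(EDGES, "B", "D")
for d in sorted(pdf):
    print(d, round(pdf[d], 2))  # prints: 1 0.3 / 2 0.26 / inf 0.44
```

Brute force is only feasible for toy graphs, since the number of worlds doubles with every edge.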
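• Use well-known statistics of the shortest-path PDF:
  – Median
  – Majority (mode)
  – ExpectedReliable
• The infinity problem: whenever two nodes are disconnected with nonzero probability, the plain expected shortest-path length is infinite; ExpectedReliable instead takes the expectation only over the worlds in which a path exists.
• Hard! Computing them exactly requires explicit enumeration of possible worlds: resort to sampling!
For the distribution of the shortest path length d(B,D): $d_{\text{med}}(B,D) = 2$, $d_{\text{maj}}(B,D) = \infty$, $d_{\text{exp}}(B,D) \approx 1.46$.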
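The three statistics can be computed directly from a distance PDF. Below is a minimal Python sketch (the function names are illustrative) applied to the distribution of d(B,D) from the example, Pr = 0.30, 0.26, 0.44 for d = 1, 2, ∞. Taking the median as the smallest distance whose cumulative probability reaches 1/2, and breaking majority ties toward the smaller distance, are assumptions about tie handling. ExpectedReliable averages only over the finite part of the distribution: (0.30·1 + 0.26·2) / 0.56 ≈ 1.46.

```python
import math

INF = math.inf


def median_distance(pdf):
    """Smallest d with Pr(D <= d) >= 1/2 (may be infinite)."""
    cumulative = 0.0
    for d in sorted(pdf):
        cumulative += pdf[d]
        if cumulative >= 0.5:
            return d
    return INF


def majority_distance(pdf):
    """Most probable distance (the mode); ties go to the smaller distance."""
    return min(pdf, key=lambda d: (-pdf[d], d))


def expected_reliable_distance(pdf):
    """Expected distance conditioned on a path existing (finite values only)."""
    finite = {d: p for d, p in pdf.items() if d != INF}
    mass = sum(finite.values())
    if mass == 0:
        return INF
    return sum(d * p for d, p in finite.items()) / mass


# Shortest-path PDF of d(B, D) from the example.
pdf_BD = {1: 0.30, 2: 0.26, INF: 0.44}
print(median_distance(pdf_BD))                       # 2
print(majority_distance(pdf_BD))                     # inf
print(round(expected_reliable_distance(pdf_BD), 2))  # 1.46
```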
Sampling Algorithms
1. Sample (a small number of) worlds.
2. Compute the sample median (an approximation of the true median).
3. Output the result. Approximation guarantee per statistic:
   – Median (Chernoff bound)
   – ExpectedReliable (Hoeffding inequality)
   – Majority (no bound)
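The sampling recipe can be sketched as a Monte Carlo estimator. This minimal Python version again assumes hop-count distances and the hypothetical edge assignment used for the four-node example: it draws each world by keeping every edge independently with its probability, runs a BFS from the source, and returns the sample median, taken as the smallest sampled distance whose empirical CDF reaches 1/2. The function names and the fixed `seed` are illustrative choices, not from the slides. For these probabilities the exact median of d(B,D) is 2 (Pr(d ≤ 1) = 0.30 < 1/2 ≤ Pr(d ≤ 2) ≈ 0.56), and with a couple of thousand samples the estimate should land there.

```python
import random
from collections import deque

INF = float("inf")

# Hypothetical edge probabilities for the four-node example (assignment assumed).
EDGES = {("A", "B"): 0.2, ("A", "D"): 0.6, ("B", "C"): 0.4,
         ("C", "D"): 0.7, ("B", "D"): 0.3}


def sample_world(edges, rng):
    """Keep each edge independently with its probability."""
    return [e for e, p in edges.items() if rng.random() < p]


def bfs_distances(present, source):
    """Hop distances from source in one sampled world."""
    adj = {}
    for u, v in present:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sample_median_distance(edges, source, target, num_worlds, seed=0):
    """Monte Carlo estimate of the median shortest-path distance."""
    rng = random.Random(seed)
    samples = sorted(
        bfs_distances(sample_world(edges, rng), source).get(target, INF)
        for _ in range(num_worlds)
    )
    # Smallest sampled distance whose empirical CDF reaches 1/2.
    return samples[(num_worlds - 1) // 2]


print(sample_median_distance(EDGES, "B", "D", num_worlds=2000))
```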
• BIOMINE: database of biological entities and uncertain interactions from the University of Helsinki; 1M nodes, 10M edges.
• FLICKR: users from flickr.com; edges have been created assuming homophily, based on the Jaccard coefficient of the users' Flickr groups; 77K nodes, 20M edges.
kNN Pruning
• Query: given a probabilistic graph and a source node, find the set of k nodes closest to the source.
• Naïve algorithm:
  1. Sample worlds.
  2. Run Dijkstra traversals and compute a PDF of the shortest-path distance per node.
  3. Calculate the median distance to all nodes using the PDFs.
  4. Compute the k-NN.
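The naïve algorithm can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes an undirected graph given as a dict from edge to probability, uses hop counts (BFS stands in for Dijkstra on unit weights), and takes each node's median as the smallest sampled distance whose empirical CDF reaches 1/2. All function names are hypothetical. Every sampled world is instantiated in full and every node gets a histogram, which is exactly the cost that pruning targets.

```python
import random
from collections import deque

INF = float("inf")


def sample_world(edges, rng):
    """Instantiate a full world: keep each edge independently with its probability."""
    return [e for e, p in edges.items() if rng.random() < p]


def bfs_distances(present, source):
    """Hop distances from source in one world (stand-in for Dijkstra on unit weights)."""
    adj = {}
    for u, v in present:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def naive_knn_median(edges, source, k, num_worlds, seed=0):
    """Naive k-NN under the median distance over num_worlds sampled worlds."""
    rng = random.Random(seed)
    nodes = sorted({u for e in edges for u in e} - {source})
    histograms = {v: [] for v in nodes}  # per-node sample of distances
    for _ in range(num_worlds):          # steps 1-2: sample and traverse every world
        dist = bfs_distances(sample_world(edges, rng), source)
        for v in nodes:
            histograms[v].append(dist.get(v, INF))
    # Step 3: median = smallest sampled distance whose empirical CDF reaches 1/2.
    medians = {v: sorted(h)[(num_worlds - 1) // 2] for v, h in histograms.items()}
    # Step 4: the k nodes with the smallest median (ties broken by node name).
    return sorted(nodes, key=lambda v: (medians[v], v))[:k], medians


chain = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0, ("A", "E"): 0.0}
print(naive_knn_median(chain, "A", k=2, num_worlds=5)[0])  # ['B', 'C']
```

On a deterministic chain (probabilities 1 or 0) the medians are just the hop distances, which makes the output easy to check.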
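Example: 1-NN under the median distance; query node A; sample of 5 worlds.
[Figure: an uncertain graph on nodes A, B, C, D, E, F, G with edge probabilities 0.9, 0.3, 0.4, 0.6, 0.8, 0.5, 0.3, 0.7; naïve algorithm]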
[Figure (animation, naïve algorithm): each of the 5 sampled worlds is instantiated in full and traversed from A; the distances to B, C, D, E, F, G are collected into one histogram per node, from which the median distances and the 1-NN are computed]
• Pruning algorithm (same example: 1-NN, median, node A, 5 sampled worlds)
  – Sample worlds on the fly.
  – Increase the horizon of each Dijkstra traversal one hop at a time.
  – Maintain truncated PDF histograms.
[Figure (animation, pruning algorithm): the traversals from A advance one hop per step in every sampled world, and edges are sampled only when a traversal reaches them; after the first hop, B has median distance 1 while C has median distance greater than 1]
• B has distance 1.
• C has distance greater than 1.
• D, E, F, G, … were not discovered (d > 1).
• The 1-NN set is complete with B: no need to continue.
• Just 2 nodes visited (and 2 histograms maintained).
• Worlds were only partially instantiated.
• Same answer as the naïve algorithm.
• With a small cost: the Dijkstra state needs to be maintained in memory for all worlds.
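The pruning idea can be sketched in Python as well. This is an illustration under stated assumptions, not the authors' code: hop-count distances, the median statistic, and one BFS state per sampled world; the function name and the `reached` dict (standing in for the truncated histograms) are hypothetical. Each world's traversal advances one hop at a time, and an edge's coin is flipped only when that world's traversal first examines it. After hop h, any node reached in at least half of the worlds has median at most h and every other node has median greater than h, so once k nodes qualify the answer cannot change. Given the same coin flips this returns the same set as a naïve computation; a separate naïve run would draw random numbers in a different order, so on uncertain graphs the two would agree only up to sampling noise.

```python
import random


def pruned_knn_median(edges, source, k, num_worlds, seed=0):
    """k-NN under the median distance, expanding all worlds one hop at a time.

    Assumes hop-count distances. Coins for edges are flipped lazily, the first
    time a world's traversal examines the edge, so worlds are only partially
    instantiated. If fewer than k nodes have a finite median, fewer are returned.
    """
    rng = random.Random(seed)
    adj = {}
    for (u, v), p in edges.items():
        adj.setdefault(u, []).append((v, p))
        adj.setdefault(v, []).append((u, p))

    coins = [{} for _ in range(num_worlds)]          # lazily sampled edges, per world
    visited = [{source} for _ in range(num_worlds)]  # traversal state kept for all worlds
    frontiers = [[source] for _ in range(num_worlds)]
    reached = {}  # truncated histogram: node -> number of worlds with d <= hop
    medians = {}  # nodes whose (sample) median is already known
    hop = 0
    while any(frontiers):
        hop += 1
        for w in range(num_worlds):
            next_frontier = []
            for u in frontiers[w]:
                for v, p in adj.get(u, []):
                    key = (min(u, v), max(u, v))
                    if key not in coins[w]:
                        coins[w][key] = rng.random() < p  # sample this edge on the fly
                    if coins[w][key] and v not in visited[w]:
                        visited[w].add(v)
                        next_frontier.append(v)
                        reached[v] = reached.get(v, 0) + 1
            frontiers[w] = next_frontier
        # A node's median is the first hop at which it is reached in >= half the worlds.
        for v, count in reached.items():
            if v not in medians and 2 * count >= num_worlds:
                medians[v] = hop
        # Every node not yet in `medians` has median > hop, so the k-NN set is final.
        if len(medians) >= k:
            break
    ranked = sorted(medians, key=lambda v: (medians[v], v))
    return ranked[:k], medians


chain = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0, ("A", "E"): 0.0}
print(pruned_knn_median(chain, "A", k=1, num_worlds=5))  # (['B'], {'B': 1})
```

With k = 1 on the chain, the traversals stop after the first hop: only A and B are visited, and C and D are never touched.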
kNN Pruning
• BIOMINE: database of biological entities and uncertain interactions from the University of Helsinki; 1M nodes, 10M edges.
• FLICKR: users from flickr.com; edges have been created assuming homophily, based on the Jaccard coefficient of the users' Flickr groups; 77K nodes, 20M edges.
• DBLP: authors from DBLP; probabilities have been assigned based on the number of coauthored papers; 226K nodes, 1.4M edges.
For 200 worlds and 5-NN queries, the speedups of pruning over the naïve algorithm were: 247x (BIOMINE), 111x (FLICKR), 269x (DBLP).
Less uncertainty, more pruning
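[Figure: the example graph with edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7, and the boosted graph with edge probabilities 1-0.8^d, 1-0.6^d, 1-0.4^d, 1-0.7^d, 1-0.3^d]
• Boost the probabilities of edges by giving each edge d chances: $p \mapsto 1-(1-p)^d$.
• d = 1: original graph.
• As d increases, p goes to 1.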
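The boosting rule follows from the "d chances" description: if an edge with probability p gets d independent tries, it is absent only when all d fail, so it is present with probability 1 - (1 - p)^d. A minimal Python sketch applying it to the example graph's probabilities (the function name `boost` is illustrative):

```python
def boost(p, d):
    """Probability that an edge with probability p appears at least once in d independent chances."""
    return 1.0 - (1.0 - p) ** d


probs = [0.2, 0.4, 0.6, 0.3, 0.7]  # edge probabilities of the example graph
for d in (1, 2, 5):
    # d = 2 gives [0.36, 0.64, 0.84, 0.51, 0.91]
    print(d, [round(boost(p, d), 2) for p in probs])
```

With d = 1 the probabilities are unchanged; they grow monotonically toward 1 as d increases.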
Experiments
• Dataset
  – Probabilistic PPI network [Krogan et al., Nature 06]
  – Protein co-complex relationships (ground truth) [Mewes et al., Nuc. Acids Res. 04]
• Experiment
  – Choose a ground-truth edge (A,B).
  – Choose a node C such that there is no ground-truth edge (A,C).
  – Classification task: distinguish between the two types of edges, (A,B) and (A,C).
Conclusion
• Probabilistic graph analysis benefits from possible-world semantics.
– Extended standard graph concepts to probabilistic graphs and designed approximation algorithms to compute them
– Introduced novel pruning algorithms for kNN in probabilistic graphs
– Confirmed the efficacy of our framework on real data.
Future Work
• Enrich the model
  – Node probabilities
  – Arbitrary PDFs
• Explore random walks further
Thank you!