Nonparametric Link Prediction in Dynamic Graphs
![Page 1: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/1.jpg)
1
Nonparametric Link Prediction in Dynamic Graphs
Purnamrita Sarkar (UC Berkeley), Deepayan Chakrabarti (Facebook), Michael Jordan (UC Berkeley)
![Page 2: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/2.jpg)
2
Link Prediction
Who is most likely to interact with a given node?
Friend suggestion in Facebook
Should Facebook suggest Alice as a friend for Bob?
Bob
Alice
![Page 3: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/3.jpg)
3
Link Prediction
Alice
Bob
Charlie
Movie recommendation in Netflix
Should Netflix suggest this
movie to Alice?
![Page 4: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/4.jpg)
4
Link Prediction
Prediction using simple features:
- degree of a node
- number of common neighbors
- last time a link appeared
What if the graph is dynamic?
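As a rough illustration of the three simple features named on this slide, the sketch below computes them for one pair from a list of graph snapshots. The adjacency-dict representation and the function name `pair_features` are assumptions made for this example, not part of the talk.

```python
from math import inf

def pair_features(snapshots, i, j):
    """Return (cn, deg_j, ll) for the pair (i, j).

    snapshots: list of graphs, oldest first; each graph maps a node to the
    set of its neighbors (hypothetical representation).
    cn and deg_j are read from the latest snapshot; ll is the number of
    steps since the pair was last linked (inf if it never was).
    """
    latest = snapshots[-1]
    cn = len(latest.get(i, set()) & latest.get(j, set()))
    deg_j = len(latest.get(j, set()))
    ll = inf
    for age, g in enumerate(reversed(snapshots)):
        if j in g.get(i, set()):
            ll = age
            break
    return cn, deg_j, ll

g1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
g2 = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}}
features = pair_features([g1, g2], "a", "b")
print(features)
```

In a dynamic graph these values change from snapshot to snapshot, which is the question the slide raises.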
![Page 5: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/5.jpg)
5
Related Work
Generative models:
- Exponential family random graph models [Hanneke+/’06]
- Dynamics in latent space [Sarkar+/’05]
- Extension of mixed membership block models [Fu+/10]
Other approaches:
- Autoregressive models for links [Huang+/09]
- Extensions of static features [Tylenda+/09]
![Page 6: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/6.jpg)
6
Goal
Link Prediction incorporating graph dynamics, requiring weak modeling assumptions, allowing fast predictions, and offering consistency guarantees.
![Page 7: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/7.jpg)
7
Outline
Model, Estimator, Consistency, Scalability, Experiments
![Page 8: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/8.jpg)
8
The Link Prediction Problem in Dynamic Graphs
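$G_1, G_2, \ldots, G_{T+1}$
$Y_1(i,j) = 1$, $Y_2(i,j) = 0$, $Y_{T+1}(i,j) = ?$
$$Y_{T+1}(i,j) \mid G_1, G_2, \ldots, G_T \sim \mathrm{Bernoulli}\big(g_{G_1, G_2, \ldots, G_T}(i,j)\big)$$
- $Y_{T+1}(i,j)$: edge in $T+1$
- $g_{G_1, G_2, \ldots, G_T}(i,j)$: features of previous graphs and this pair of nodes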
![Page 9: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/9.jpg)
9
Including graph-based features
Example set of features for pair $(i,j)$:
- $cn(i,j)$ (common neighbors)
- $\ell\ell(i,j)$ (last time a link was formed)
- $deg(j)$
Represent dynamics using “datacubes” of these features (≈ multi-dimensional histogram on binned feature values).
[Figure: datacube with axes cn, ℓℓ, deg; highlighted cell $1 \le cn \le 3$, $3 \le deg \le 6$, $1 \le \ell\ell \le 2$]
- $\eta_t$ = #pairs in $G_t$ with these features
- $\eta_t^+$ = #pairs in $G_t$ with these features, which had an edge in $G_{t+1}$
High $\eta_t^+/\eta_t$ ⇒ this feature combination is more likely to create a new edge at time $t+1$
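The counting behind a datacube can be sketched as follows. This is a minimal example under stated assumptions: graphs are adjacency dicts, only two features (common neighbors and degree of j) are used, and binning is a plain integer division by a hypothetical `bin_width`. The talk does not specify its binning scheme.

```python
from collections import Counter
from itertools import combinations

def build_datacube(g_t, g_next, bin_width=2):
    """Return (eta, eta_plus): per-cell pair counts in g_t, and the
    subset of those pairs that have an edge in g_next."""
    eta, eta_plus = Counter(), Counter()
    for i, j in combinations(sorted(g_t), 2):
        cn = len(g_t[i] & g_t[j])
        # Cell = binned feature values of the pair (assumed binning).
        cell = (cn // bin_width, len(g_t[j]) // bin_width)
        eta[cell] += 1
        if j in g_next.get(i, set()):
            eta_plus[cell] += 1
    return eta, eta_plus

g_t = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
g_next = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
eta, eta_plus = build_datacube(g_t, g_next)
for cell in sorted(eta):
    print(cell, eta_plus[cell], "/", eta[cell])
```

A cell with a high ratio of the second count to the first marks a feature combination that tended to produce edges at the next step.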
![Page 10: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/10.jpg)
10
Including graph-based features
$G_1, G_2, \ldots, G_T$
$Y_1(i,j) = 1$, $Y_2(i,j) = 0$, $Y_{T+1}(i,j) = ?$
Datacube cell: $1 \le cn(i,j) \le 3$, $3 \le deg(i,j) \le 6$, $1 \le \ell\ell(i,j) \le 2$
How do we form these datacubes?
Vanilla idea: One datacube for $G_t \to G_{t+1}$, aggregated over all pairs $(i,j)$
- Does not allow for differently evolving communities
![Page 11: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/11.jpg)
11
Our Model
$G_1, G_2, \ldots, G_T$
$Y_1(i,j) = 1$, $Y_2(i,j) = 0$, $Y_{T+1}(i,j) = ?$
Datacube cell: $1 \le cn(i,j) \le 3$, $3 \le deg(i,j) \le 6$, $1 \le \ell\ell(i,j) \le 2$
How do we form these datacubes?
Our Model: One datacube for each neighborhood
- Captures local evolution
![Page 12: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/12.jpg)
12
Our Model
Datacube: each cell holds two counts
- $\eta_t$: number of node pairs with feature $s$ in the neighborhood of $i$ at time $t$
- $\eta_t^+$: number of node pairs with feature $s$ in the neighborhood of $i$ at time $t$ which got connected at time $t+1$
Datacube cell: $1 \le cn(i,j) \le 3$, $3 \le deg(i,j) \le 6$, $1 \le \ell\ell(i,j) \le 2$
Neighborhood $N_t(i)$ = nodes within 2 hops
Features extracted from $(N_{t-p}, \ldots, N_t)$
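The 2-hop neighborhood used for each local datacube can be computed directly from an adjacency structure. The sketch below assumes an adjacency dict and assumes the node itself belongs to its own neighborhood; the slide does not say either way.

```python
def two_hop_neighborhood(g, i):
    """Nodes within 2 hops of i in adjacency dict g (i included by assumption)."""
    one_hop = g.get(i, set())
    nodes = {i} | one_hop
    for u in one_hop:
        nodes |= g.get(u, set())
    return nodes

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(sorted(two_hop_neighborhood(path, "a")))
print(sorted(two_hop_neighborhood(path, "c")))
```

Restricting the pair counts to pairs inside this node set yields one datacube per node rather than one per graph.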
![Page 13: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/13.jpg)
13
Our Model
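Datacube $d_t(i)$ captures graph evolution in the local neighborhood of a node in the recent past
Model:
$$Y_{T+1}(i,j) \mid G_1, G_2, \ldots, G_T \sim \mathrm{Bernoulli}\big(g(d_t(i), s_t(i,j))\big)$$
Here $g(d_t(i), s_t(i,j))$ replaces $g_{G_1, G_2, \ldots, G_T}(i,j)$:
- $s_t(i,j)$: features of the pair
- $d_t(i)$: local evolution patterns
What is $g(\cdot)$?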
![Page 14: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/14.jpg)
14
Outline
Model, Estimator, Consistency, Scalability, Experiments
![Page 15: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/15.jpg)
15
Kernel Estimator for g
$G_1, G_2, \ldots, G_{T-2}, G_{T-1}, G_T$
Query: datacube at $T-1$ and feature vector at time $T$
Compute similarities between the query and each {datacube, feature} pair at $t = 1, 2, 3, \ldots$
![Page 16: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/16.jpg)
16
Kernel Estimator for g
Factorize the similarity function:
similarity({datacube, feature}, {datacube′, feature′}) = K(datacube, datacube′) · I{feature == feature′}
- Allows computation of $g(\cdot)$ via simple lookups
![Page 17: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/17.jpg)
17
Kernel Estimator for g
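$G_1, G_2, \ldots, G_{T-2}, G_{T-1}, G_T$
Datacubes at $t = 1, 2, 3, \ldots$
Compute similarities only between datacubes: each stored datacube $k$ gets a weight $w_k$ and contributes its counts $(\eta_k, \eta_k^+)$, with $k = 1, \ldots, 4$ in the illustration.
$$\hat g = \frac{w_1 \eta_1^+ + w_2 \eta_2^+ + w_3 \eta_3^+ + w_4 \eta_4^+}{w_1 \eta_1 + w_2 \eta_2 + w_3 \eta_3 + w_4 \eta_4}$$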
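The weights $w_1, \ldots, w_4$ and the count pairs $(\eta_k, \eta_k^+)$ on this slide suggest a kernel-weighted ratio of counts. The sketch below assumes exactly that form: weights are datacube similarities, and the counts are looked up in the cell matching the query pair's features. The function name and the fallback value for an empty denominator are choices made for this example.

```python
def estimate_g(weights, etas, eta_pluses):
    """Weighted ratio sum(w * eta_plus) / sum(w * eta).

    weights:    similarity of each stored datacube to the query datacube
    etas:       pair counts in the cell matching the query features
    eta_pluses: counts of those pairs that became edges one step later
    Returns 0.0 when no weighted evidence exists (an assumed convention).
    """
    num = sum(w * ep for w, ep in zip(weights, eta_pluses))
    den = sum(w * e for w, e in zip(weights, etas))
    return num / den if den > 0 else 0.0

g_hat = estimate_g([1.0, 0.5, 0.25, 0.0], [10, 20, 40, 100], [5, 4, 4, 90])
print(g_hat)
```

Because the fourth datacube has weight 0, its counts do not influence the estimate no matter how large they are.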
![Page 18: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/18.jpg)
18
Kernel Estimator for g
Factorize the similarity function:
similarity({datacube, feature}, {datacube′, feature′}) = K(datacube, datacube′) · I{feature == feature′}
- Allows computation of $g(\cdot)$ via simple lookups
What is K( , )?
![Page 19: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/19.jpg)
19
Similarity between two datacubes
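Idea 1: For each cell $s$, take $(\eta_1^+/\eta_1 - \eta_2^+/\eta_2)^2$ and sum over cells, where $(\eta_1, \eta_1^+)$ and $(\eta_2, \eta_2^+)$ are the counts of the two datacubes.
Problem: Magnitude of $\eta$ is ignored
- 5/10 and 50/100 are treated equally
Consider the distribution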
![Page 20: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/20.jpg)
20
Similarity between two datacubes
[Figure: two distribution plots]
Idea 2: For each cell $s$, compute the posterior distribution of the edge creation probability, once from $(\eta_1, \eta_1^+)$ and once from $(\eta_2, \eta_2^+)$.
dist = total variation distance between distributions, summed over all cells
$$K(\cdot,\cdot) = b^{\,\mathrm{dist}(\cdot,\cdot)}, \quad 0 < b < 1$$
As $b \to 0$, $K(\cdot,\cdot) \to 0$ unless $\mathrm{dist}(\cdot,\cdot) = 0$
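To see why a posterior-based distance separates 5/10 from 50/100, here is a minimal sketch. It assumes a uniform prior, so the posterior of the edge creation probability is Beta($\eta^+ + 1$, $\eta - \eta^+ + 1$), and it discretizes that posterior on a grid. The prior, the grid size `bins`, the treatment of missing cells, and all function names are assumptions for illustration; the talk does not state them.

```python
def posterior_masses(eta, eta_plus, bins=50):
    """Discretized posterior over [0, 1], uniform prior assumed."""
    mids = [(k + 0.5) / bins for k in range(bins)]
    dens = [p ** eta_plus * (1 - p) ** (eta - eta_plus) for p in mids]
    total = sum(dens)
    return [d / total for d in dens]

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kernel(cube1, cube2, b=0.5, bins=50):
    """K = b ** dist, with dist = sum over cells of the TV distance.

    Each cube maps a cell to (eta, eta_plus); a missing cell is treated
    as (0, 0), i.e. a flat posterior (an assumed convention).
    """
    dist = 0.0
    for s in set(cube1) | set(cube2):
        p = posterior_masses(*cube1.get(s, (0, 0)), bins)
        q = posterior_masses(*cube2.get(s, (0, 0)), bins)
        dist += total_variation(p, q)
    return b ** dist

small = {"s": (10, 5)}    # 5 of 10 pairs linked
large = {"s": (100, 50)}  # 50 of 100 pairs linked
print(kernel(small, small), kernel(small, large))
```

Both cubes have the same ratio 0.5, yet the kernel value between them is below 1 because the posterior for 100 observations is much more concentrated than the one for 10.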
![Page 21: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/21.jpg)
21
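Kernel Estimator for g
$$\hat g(\cdot,\cdot) = \frac{\hat h(\cdot,\cdot)}{\hat f(\cdot,\cdot)}$$
$$\hat f = \frac{1}{\#} \sum K(\cdot,\cdot)\, \eta_{t+1}, \qquad \hat h = \frac{1}{\#} \sum K(\cdot,\cdot)\, \eta_{t+1}^+$$
Want to show: $\hat g \to g$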
![Page 22: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/22.jpg)
22
Outline
Model, Estimator, Consistency, Scalability, Experiments
![Page 23: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/23.jpg)
23
Consistency of Estimator
Lemma 1: As T→∞, for some R>0,
Proof using:
$\hat g(\cdot,\cdot) = \hat h(\cdot,\cdot) / \hat f(\cdot,\cdot)$
As T→∞,
![Page 24: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/24.jpg)
24
Consistency of Estimator
Lemma 2: As T→∞,
$\hat g(\cdot,\cdot) = \hat h(\cdot,\cdot) / \hat f(\cdot,\cdot)$
![Page 25: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/25.jpg)
25
Consistency of Estimator
Assumption: finite graph
Proof sketch:
- Dynamics are Markovian with finite state space
- ⇒ the chain must eventually enter a closed, irreducible communication class
- ⇒ geometric ergodicity if class is aperiodic (if not, more complicated…)
- ⇒ strong mixing with exponential decay
- ⇒ variances decay as o(1/T)
![Page 26: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/26.jpg)
26
Consistency of Estimator
Theorem:
Proof Sketch:
for some R>0
So
![Page 27: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/27.jpg)
27
Outline
Model, Estimator, Consistency, Scalability, Experiments
![Page 28: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/28.jpg)
28
Scalability
Full solution:
- Summing over all n datacubes for all T timesteps
- Infeasible
Approximate solution:
- Sum over nearest neighbors of query datacube
How do we find nearest neighbors?
- Locality Sensitive Hashing (LSH) [Indyk+/98, Broder+/98]
![Page 29: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/29.jpg)
29
Using LSH
Devise a hashing function for datacubes such that
- “Similar” datacubes tend to be hashed to the same bucket
- “Similar” = small total variation distance between cells of datacubes
![Page 30: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/30.jpg)
30
Using LSH
[Figure: distribution plot]
Step 1: Map datacubes to bit vectors
- Use B1 buckets to discretize [0,1]
- Use B2 bits for each bucket
- For probability mass p in a bucket, the first bits of that bucket are set to 1, in number proportional to p
Total M*B1*B2 bits, where M = max number of occupied cells << total number of cells
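The bit-vector mapping of Step 1 can be illustrated for a single cell. The transcript does not preserve how many bits are set for a given mass p, so the sketch below assumes a unary code with `round(p * b2)` leading ones per bucket; that rule and the function names are assumptions for this example.

```python
def encode_cell(masses, b2=8):
    """Unary-encode a discretized distribution: one block of b2 bits per bucket.

    masses: probability mass in each of the B1 buckets of [0, 1].
    The first round(p * b2) bits of a block are 1 (assumed rounding rule).
    """
    bits = []
    for p in masses:
        ones = round(p * b2)
        bits.extend([1] * ones + [0] * (b2 - ones))
    return bits

def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))

p = [0.5, 0.5, 0.0, 0.0]
q = [0.25, 0.25, 0.25, 0.25]
l1 = sum(abs(a - b) for a, b in zip(p, q))
print(hamming(encode_cell(p), encode_cell(q)), 8 * l1)
```

With this encoding the Hamming distance between two bit vectors equals B2 times the L1 distance between the distributions, up to rounding, which is what makes a Hamming-space hash usable for total variation distance.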
![Page 31: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/31.jpg)
31
Using LSH
Step 1: Map datacubes to bit vectors
- Total variation distance ↔ L1 distance between distributions ↔ Hamming distance between vectors
Step 2: Hash function = k out of M*B1*B2 bits
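Step 2 is the classic bit-sampling scheme for Hamming distance: a hash function reads k fixed, randomly chosen bit positions. The sketch below shows one hash table under that scheme; the names `make_hash` and `build_index`, the seed, and the use of a single table are assumptions for illustration (practical LSH setups typically use several tables).

```python
import random
from collections import defaultdict

def make_hash(num_bits, k, seed=0):
    """Return a function that reads k randomly chosen bit positions."""
    rng = random.Random(seed)
    positions = rng.sample(range(num_bits), k)
    return lambda bits: tuple(bits[p] for p in positions)

def build_index(vectors, hash_fn):
    """Group vector names by their hash key."""
    buckets = defaultdict(list)
    for name, bits in vectors.items():
        buckets[hash_fn(bits)].append(name)
    return buckets

vectors = {
    "a": [1, 1, 0, 0, 1, 0, 1, 0],
    "b": [1, 1, 0, 0, 1, 0, 1, 0],  # identical to a
    "c": [0, 0, 1, 1, 0, 1, 0, 1],  # complement of a
}
h = make_hash(num_bits=8, k=3)
index = build_index(vectors, h)
print(index[h(vectors["a"])])
```

At query time only the datacubes in the query's bucket are compared with the kernel, instead of all n datacubes at all T timesteps.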
![Page 32: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/32.jpg)
32
Fast Search Using LSH
[Figure: datacube bit vectors (for example 1111111111000000000111111111000) hashed into a table whose buckets are keyed by short bit strings such as 0000, 0001, 1111, 0011, …, 1011]
![Page 33: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/33.jpg)
33
Outline
Model, Estimator, Consistency, Scalability, Experiments
![Page 34: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/34.jpg)
34
Experiments
Baselines
- LL: last link (time of last occurrence of a pair)
- CN: rank by number of common neighbors in $G_T$
- AA: more weight to low-degree common neighbors
- Katz: accounts for longer paths
- CN-all: apply CN to $G_1 \cup \cdots \cup G_T$
- AA-all, Katz-all: similar
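For reference, the static baselines can be sketched in a few lines. These are the standard textbook definitions of common neighbors, Adamic/Adar, and a truncated Katz score, not necessarily the exact variants used in the experiments; `beta` and `max_len` are illustrative parameters.

```python
from math import log

def cn_score(g, i, j):
    """Number of common neighbors of i and j."""
    return len(g[i] & g[j])

def aa_score(g, i, j):
    """Adamic/Adar: common neighbors weighted by 1 / log(degree)."""
    return sum(1.0 / log(len(g[z])) for z in g[i] & g[j] if len(g[z]) > 1)

def katz_score(g, i, j, beta=0.05, max_len=4):
    """Truncated Katz: sum over l of beta**l * (number of length-l walks i -> j)."""
    walks = {i: 1}  # node -> number of walks of the current length from i
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for u, count in walks.items():
            for v in g[u]:
                nxt[v] = nxt.get(v, 0) + count
        walks = nxt
        score += beta ** length * walks.get(j, 0)
    return score

g = {"a": {"c", "d"}, "b": {"c", "d"}, "c": {"a", "b"},
     "d": {"a", "b", "e"}, "e": {"d"}}
print(cn_score(g, "a", "b"), aa_score(g, "a", "b"), katz_score(g, "a", "b", 0.1, 2))
```

All three scores depend only on a single static graph, which is why the "-all" variants need a choice of which graph to apply them to.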
![Page 35: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/35.jpg)
35
Setup
Pick random subset S from nodes with degree > 0 in $G_{T+1}$
For each s in S, predict a ranked list of nodes likely to link to s
Report mean AUC (higher is better)
Training data: $G_1, G_2, \ldots, G_T$
Test data: $G_{T+1}$
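The evaluation metric can be made concrete with a small sketch. It computes the AUC of one node's ranked list as the fraction of (linked, unlinked) candidate pairs ordered correctly, with ties counted as half, and then averages over nodes. Skipping nodes whose candidates are all of one class is an assumption of this example.

```python
def auc(scores, labels):
    """AUC for one source node: P(score of a true link > score of a non-link)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc(per_node):
    """Average AUC over nodes; per_node is a list of (scores, labels)."""
    values = [a for a in (auc(s, y) for s, y in per_node) if a is not None]
    return sum(values) / len(values)

node1 = ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
node2 = ([0.9, 0.1], [1, 0])
print(auc(*node1), mean_auc([node1, node2]))
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a ranking that places every true link above every non-link.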
![Page 36: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/36.jpg)
36
Simulations
Social network model of Hoff et al.
- Each node has an independently drawn feature vector
- Edge(i,j) depends on features of i and j
Seasonality effect
- Feature importance varies with season ⇒ different communities in each season
- Feature vectors evolve smoothly over time ⇒ evolving community structures
![Page 37: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/37.jpg)
37
Simulations
NonParam is much better than others in the presence of seasonality
CN, AA, and Katz implicitly assume smooth evolution
![Page 38: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/38.jpg)
38
Sensor Network*
* www.select.cs.cmu.edu/data
![Page 39: Nonparametric Link Prediction in Dynamic Graphs](https://reader035.fdocuments.us/reader035/viewer/2022062521/5681667a550346895dda1a83/html5/thumbnails/39.jpg)
39
Summary
Link formation is assumed to depend on the neighborhood’s evolution over a time window
Admits a kernel-based estimator
- Consistency
- Scalability via LSH
Works particularly well for
- Seasonal effects
- differently evolving communities