Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering...

36
Local Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented By: Ahmed Elbagoury 1

Transcript of Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering...

Page 1: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Local Graph Sparsification for Scalable ClusteringVe n u S a t u l u r i , S r i n i v a s a n Pa r t h a s a ra t hy a n d Y i y e R u a n

O h a i o S t a t e U n i v e r i s t y

P r e s e n t e d B y : A h m e d E l b a g o u r y

1

Page 2: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Outline•Motivation

•Related Work

•Graph Sparsification• Global Sparsification

• Local Sparsification

• Minwise Hashing

•Experimental Evaluation

•Conclusion

•Discussion

2

Page 3: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Motivation

Social Network community discoveringEconomic Networks

• Man real problems can be modeled as graphs• Analysing these graphs using clustering

Genetic Interaction Networks

3FIGURE ARE FROM HTTP://GOO.GL/2QBCAF

HTTP://GOO.GL/DXKSXK

Page 4: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Motivation•Many graph clustering algorithms have been proposed

•Scalability of these algorithm is a challenge

•Large graphs are common in WWW domain• Facebook and Twitter

The goal is:

Develop a preprocessing step• To speedup graph clustering algorithms• Without loosing quality of the clusters

Not developing a clustering algorithm

4

Page 5: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Outline•Motivation

•Related Work

•Graph Sparsification• Global Sparsification

• Local Sparsification

• Minwise Hashing

•Experimental Evaluation

•Conclusion

•Discussion

5

Page 6: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Related Work•Preserving edge cut• Random Sampling

• Edge Connectivity

• Resistance of an edge

•Edge filtering• Identifies the most interesting edges without keeping cluster structure

• Inappropriates for unweight networks

•Graph Sampling• Selects subset of edges and nodes

• How to cluster missing nodes?!

6

Page 7: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Outline•Motivation

•Related Work

•Graph Sparsification• Global Sparsification

• Local Sparsification

• Minwise Hashing

•Experimental Evaluation

•Conclusion

•Discussion

7

Page 8: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Graph Sparsification•The objective is scaling up clustering algorithms

•Reduce the size of the graph

•Sparsify the graph: Filter only some edges and retain all the nodes

8FIGURES ARE FROM "LOCAL GRAPH SPARSIFICATION FOR SCALABLE CLUSTERING." PROCEEDINGS

OF THE 2011 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA. ACM, 2011

Page 9: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Sparsification Method•Clustering sparsified graph should be much faster than the original graph

•Retain good clustering quality compared to the clustering over the original graph

•Fast sparsification method

9

Page 10: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Sparsification MethodRetain the intra-cluster edges compared to inter-cluster edges

10

Page 11: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Intra and inter edgesOne way to differentiate between intra and inter edges is

• Edge betweenness centrality

• Number of shortest paths between any two vertices that pass through the edge (𝑖, 𝑗)

• The higher the betweenness the higher the edge is an inter-cluster edge

• Expensive to compute the betweenness

11

Page 12: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Similarity-based Sparsification Heuristic•An edge (𝑖, 𝑗) is likely to lie within a cluster • If the vertices 𝑖 and 𝑗 have adjacency list with high overlap

•Jaccard measure quantifies the overlap between

adjacency lists

𝑆𝑖𝑚 𝑖, 𝑗 =|𝐴𝑑𝑗 𝑖 𝐴𝑑𝑗 𝑗 |

|𝐴𝑑𝑗 𝑖 𝐴𝑑𝑗 𝑗 |

12

Page 13: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Global Sparsification (G-Spar)•Each edge (𝑖, 𝑗), will be assigned a similarity 𝑠𝑖𝑚(𝑖, 𝑗)

•Return the graph with the top 𝑠% of all the edges in the graph• 𝑠 is a sparsification parameter

13THE PSEUDOCODE IS FROM "LOCAL GRAPH SPARSIFICATION FOR SCALABLE CLUSTERING."

PROCEEDINGS OF THE 2011 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA. ACM, 2011.

Page 14: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Global Sparsification(G-Spar)Using one global threshold is not suitable for multiple density clusters

Original graph Globally sparsified graph

14FIGURES ARE FROM "LOCAL GRAPH SPARSIFICATION FOR SCALABLE CLUSTERING." PROCEEDINGS

OF THE 2011 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA. ACM, 2011

Page 15: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Local Sparsification (L-Spar)•Avoid using a global threshold

•For each node 𝑖 (with degree 𝑑𝑖), compute the similarity of each edge

•Pick the top 𝑓 𝑑𝑖 = 𝑑𝑖𝑒

• Ensure that for each node, at least one of its incident edges is picked

•Adapt to different densities

15

Page 16: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Local Sparsification (L-Spar)

Original graph Locally sparsified graph

16FIGURES ARE FROM "LOCAL GRAPH SPARSIFICATION FOR SCALABLE CLUSTERING." PROCEEDINGS

OF THE 2011 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA. ACM, 2011

Page 17: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Time Complexity•For node 𝑖 (with degree 𝑑𝑖) and node 𝑗 ( with degree 𝑑𝑗)

• Number of operations to intersect their adjacency lists is proportional to 𝑑𝑖 + 𝑑𝑗

•A node of degree 𝑑𝑖 requires 𝑑𝑖 intersections• The total number of operations is proportional to 𝑖 𝑑𝑖

2

•The total number of operations is proportional to 𝑛 ⋅ 𝑑𝑎𝑣𝑔2

• 𝑑𝑎𝑣𝑔 is the average degree of the graph

Prohibitively large for most graphs

17

Page 18: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Minwise Hashing•Min-wise hashing is as a technique for estimating the jaccard coefficient between sets 𝐴 and 𝐵

•Let ℎ be a hash function that maps the members of a set 𝑆 of size 𝑛 to the range 0,… . , 𝑛 − 1

•Define ℎ𝑚𝑖𝑛(𝑆) to be min𝑥∈𝑆ℎ(𝑥)

• The member 𝑥 of 𝑆 with the minimum value of ℎ 𝑥

• ℎ𝑚𝑖𝑛(𝑆) = 4

18

10 12 4 7 5 3 3 5 0 1 2 4ℎ

Page 19: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Minwise Hashing

19

•ℎ𝑚𝑖𝑛(𝐴) = ℎ𝑚𝑖𝑛(𝐵) when ℎ𝑚𝑖𝑛(𝐴) is in B

•Which means ℎ𝑚𝑖𝑛 𝐴 𝐵 lies in the intersection 𝐴 𝐵

•This happens with probability |𝐴𝑑𝑗 𝑖 𝐴𝑑𝑗 𝑗 |

|𝐴𝑑𝑗 𝑖 𝐴𝑑𝑗(𝑗)|Which is equal to s𝑖𝑚 𝑖, 𝑗

10 12 4 7 5 3

5 4 0 3 1 2

2 4 8 5 3 11

2 0 3 4 1 5

A

h(A)

B

h(B)

h h

Page 20: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Minwise Hashing•A simple estimator for 𝑠𝑖𝑚(𝐴, 𝐵) is 𝐼[ℎ𝑚𝑖𝑛(𝐴) = ℎ𝑚𝑖𝑛(𝐵)]• 𝐼[𝑥] = 1 if x is true and 0 otherwise.

•Hashing can be considered as a random permutation 𝜋 on 𝑛 numbers rather than a hash function

ℎ𝑚𝑖𝑛(𝐴) = 7

first 𝜋 𝐴 = 7

which is the first element in 𝜋(𝐴)

20

10 12 4 7 5 3 7 4 3 5 12 10𝜋

10 12 4 7 5 3 5 4 1 0 3 2ℎ

Page 21: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Minwise HashingThis can be considered as a random permutation 𝜋 on 𝑛 numbers rather than a hash function

first 𝜋 𝐴 = 7 first 𝜋 𝐵 = 8

21

10 12 4 7 5 3

7 4 3 5 12 10

2 4 8 5 3 11

8 3 4 12 11 8

A

𝜋(A)

B

𝜋(B)

𝜋 𝜋

Page 22: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Minwise Hashing•A simple estimator for 𝑠𝑖𝑚(𝐴, 𝐵) is 𝐼 𝑓𝑖𝑟𝑠𝑡 𝜋 𝐴 = 𝑓𝑖𝑟𝑠𝑡 𝜋 𝐵

•This estimate has a high variance• It is always zero or one

22

Page 23: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Minwise HashingTo reduce variance, use 𝑘 independent permutations 𝜋𝑖 , 𝑖 = 1: 𝑘

23

10 12 4 7 5 3

7 4 3 5 12 10

A

𝜋1(A)

5 7 4 10 3 12𝜋2(A)

4 10 7 3 12 5𝜋3(A)

𝑓𝑖𝑟𝑠𝑡(𝜋1 𝐴 ) = 7

𝑓𝑖𝑟𝑠𝑡(𝜋2 𝐴 ) = 5

𝑓𝑖𝑟𝑠𝑡(𝜋3 𝐴 ) = 4

Page 24: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Minwise Hashing•To reduce variance, use 𝑘 independent permutations 𝜋𝑖 , 𝑖 = 1: 𝑘

𝑠𝑖𝑚 𝐴, 𝐵 =1

𝑘

𝑖=1

𝑘

𝐼[𝑓𝑖𝑟𝑠𝑡 𝜋𝑖 𝐴 = 𝑓𝑖𝑟𝑠𝑡 𝜋𝑖 𝐵 ]

24

Page 25: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Generating Permutations•Linear permutations

𝜋 𝑖 = 𝑎 ∗ 𝑖 + 𝑏 % 𝑃

•𝑃 is large prime number

•𝑎 is an integer drawn uniformly at random from [1, 𝑃 − 1]

•𝑏 is an integer drawn uniformly at random from [0, 𝑃 − 1]

•Multiple linear permutations can be generated by generating random pairs of (𝑎, 𝑏, 𝑃)

25

Page 26: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Minwise Hashing•Generate 𝑘 linear permutations, by generating 𝑘 triplets 𝑎, 𝑏, 𝑃

•Compute a length 𝑘 signature for each node by minhashing its adjacency list 𝑘 times

•Fill up a hashtable of size 𝑛 ∗ 𝑘

•∀ 𝑖, 𝑗 ∈ 𝐺• Compare the signatures of i and j

• Count the number of matching minhashes

•Sort the edges incident to a node 𝑖 by the number of matches of each edge

•Include the top 𝑑𝑖𝑒 inclusion in the sparsified graph

26

Page 27: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Outline•Motivation

•Related Work

•Graph Sparsification• Global Sparsification

• Local Sparsification

• Minwise Hashing

•Experimental Evaluation

•Conclusion

•Discussion

27

Page 28: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Experimental Results•Seven real-world datasets• Three biological datasets: Yeast-BioGrid (BioGrid), Yeast-DIP (DIP), Human-PPI (Human)

• Wikipedia: an undirected version of the graph of hyperlinks between Wikipedia articles

• Three social networks datasets: Orkut, Twitter, Flickr tags

•Baselines • L-Sparse

• G-Sparse

• Random Edge Sampling

• ForestFire

• All the experiments were performed on a PC with 16 GB RAM

28

Page 29: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Experimental Results•Average F-score (higher is better) when ground truth is available

•Average Conductance (lower is better) when ground truth is not available

• 𝜙 𝑆 =𝑖,𝑗 𝑠𝑡 𝑖∈𝑆,𝑗∈ 𝑆

min(| 𝑖,𝑗 𝑠𝑡 𝑖∈𝑆,𝑗∈𝑉|, 𝑖,𝑗 𝑠𝑡 𝑖∈ 𝑆, 𝑗∈𝑉 )

•Four state-of- the-art algorithms for clustering • Metis

• Metis+MQI

• MLR-MCL

• Graclus

29

Page 30: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Experimental Results

30

Page 31: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Experimental Results

31

Page 32: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Speedup and F-score as 𝑒 changes

32

Page 33: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Speedup and F-score as 𝑘 changes

33

Page 34: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

Conclusion•Scaling clustering algorithm is an important task

•Introduced an efficient and effective localized method to sparsify a graph• Retain only a fraction of the original number of edges

• Handless multiple density clusters

•The proposed sparsification speeds up graph clustering algorithms without sacrificing quality

34

Page 35: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

35

ThanksQuestions?

Page 36: Local Graph Sparsification for Scalable ClusteringLocal Graph Sparsification for Scalable Clustering Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan Ohaio State Univeristy Presented

DiscussionImplementing the sparsification method on multiple machines

◦ How to partition the data?

◦ Some framework doesn’t support Graph mutations: GraphLab and PowerGraph

How to handle streams (incremental clustering)?◦ Will we need to re-cluster

◦ Will the density change, do we need to change 𝑒

Can we use other measures that takes in account edge weight and edge direction?

How to handle outliers more efficiently?

36