Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a...

38
Community Detection fundamental limits & efficient algorithms Laurent Massoulié, Inria

Transcript of Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a...

Page 1: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Community Detection

fundamental limits & efficient algorithms

Laurent Massoulié, Inria

Page 2: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Community Detection

• From graph of node-to-node interactions, identify groups of similar nodes

Example: Graph of US political blogs’ citations [Adamic & Glance 2005]

Page 3: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Data: “friendship” graph

Variation: NSA’s “co-traveler programme”: Spot groups of suspect persons meeting regularly in unusual places

Application 1: contact recommendation in online social networks

!recommend members of user’s implicit community

Page 4: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Data: {user-item} matrix Example: Netflix prize dataset! {user-movie} ratings

Item communities can guide recommendation: “users who liked this also liked…”

Application 2: item recommendation to users

User / Movie

Alice ? ** ***

Bob ∗∗∗ ? ?

Deirdre ***** ** **

Page 5: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Application 3: categorizing chemical reactives in biology

Data: sets of chemicals and reactions involving them

Jeong, H., et al., Lethality and centrality in protein networks. Nature, 2001. 411(6833): p. 41-2. Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062): p. 1173-8.

More generally: Knowledge graph as generic representation of data A1 has with B1 interaction of type C1 A2 has with B2 interaction of type C2 …

Page 6: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

End goal: Algorithms with good accuracy at low computation cost

Outline:

– An algorithm

– Its performance when signal is strong

– Fundamental limits and better algorithms when signal is weak

Page 7: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Typical algorithm for community detection: first embed, then cluster

7

Embedding space (here

2D)

Page 8: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

How to cluster

8

K-means clustering [Lloyd 1957]

Initialization: start with K centers placed at random 1) Cluster points according to their nearest center 2) Update center position to center of mass of associated

points

Page 9: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

How to cluster

9

K-means clustering [Lloyd 1957]

Initialization: start with K centers placed at random 1) Cluster points according to their nearest center 2) Update center position to center of mass of associated

points 3) Iterate

Page 10: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

How to cluster

10

K-means clustering [Lloyd 1957]

Initialization: start with K centers placed at random 1) Cluster points according to their nearest center 2) Update center position to center of mass of associated

points 3) Iterate

Page 11: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Illustration in dimension 2 on Netflix dataset

Page 12: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

How to embed: The basic recipe for dimension reduction

•  

Best 1D-fitBest 2D-fit

Page 13: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Spectral Embedding

•  1

23

4

0 1 0 0

1 0 1 1

0 1 0 1

0 1 1 0

1 0 1 1

0 2 1 1

1 1 2 0

1 1 0 2

1

23

4

Column: Data of corresponding node

Page 14: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Spectral Embedding

•  

Page 15: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Illustration in dimension 2 on Netflix dataset

Clustering from Singular Value Decomposition

! How good is this method? ! Should we replace adjacency by other matrix? ! How do amount & quality of data affect achievable accuracy?

Page 16: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

The need for generative models of data

Empirical comparison of algorithms on specific datasets: necessary, but – provides only limited understanding of their merits

Analysis of algorithms on data from generative model: – enables to quantify quality of algorithms – reveals fundamental limits on feasibility of community detection – guides design of new algorithms

The problem of ground truth: Where are the true Democrats?

Page 17: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

The Stochastic Block Model [Holland-Laskey-Leinhardt’83]

•  Nodes in block 1

Nodes in block 2

Nodes in block 3

Page 18: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Schematic view of community detection

 

   

 

Signal matrix

Page 19: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Efficiency of spectral approach in a strong-signal regime

Stochastic block model with K=4 communities

 

Page 20: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Efficiency of spectral approach in a strong-signal regime

Non-zero eigenvalues of signal matrix appear nearly unchanged Signal captured in their eigenvectors

Spectrum of adjacency matrix, strong-signal case

Page 21: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

One word on random matrices

•  

Page 22: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Efficiency of spectral approach in a strong-signal regime

•  

Page 23: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Efficiency of spectral approach in a strong-signal regime

Non-zero eigenvalues of signal matrix appear nearly unchanged Signal captured in their eigenvectors

Spectrum of adjacency matrix, strong-signal case

Weaker signal: useful information, if any remains, is no longer concentrated in a few eigenvectors

Page 24: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

 

•  

Page 25: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

 

•   a/n b/n

a/nb/n

n/2

n/2

Page 26: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

The argument for feasibility: fixing the spectral method

•  

Page 27: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Alternative: “Spectral Redemption” [Krzakala-Moore-Mossel-Neeman-Sly-Zdeborova-Zhang 2013]

•  

u v ye f

u ve

f

ef

Page 28: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Spectrum of non-backtracking matrix,stochastic block model

  

 

 

Page 29: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Non-backtracking spectra of stochastic block models [Bordenave-Lelarge-LM, 2015]

•  

Page 30: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Spectrum of non-backtracking matrix,political blogs

Page 31: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Conjectured phase diagram for community detection at low signal

(assuming fixed inter-community parameter b)

Intr

a-bl

ock

para

met

er a

Above Kesten-Stigum threshold: Detection “easy” (spectral methods++)

Number of blocks K

Kesten-St

igum th

resholdDetection feasible but hard

(no known polynomial time algorithm)

Detection infeasible

K = 2

Page 32: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Conclusions❑ Community Detection motivates search for new algorithms

! Led to spectral methods with self-avoiding & non-backtracking path counts, but others are yet to be invented

❑ Community Detection in Stochastic Block Model: rich playground for analysis of computational complexity with methods of statistical physics and probability theory

! What can be said about the hard phase???

Page 33: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

BACKUP

Page 34: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Spectrum of non-backtracking matrix Erdős-Rényi graph (1 community)

 

 

Page 35: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

The argument for impossibility [Mossel-Neeman-Sly 2012]

•  u  

 ...

Page 36: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Ramanujan graphs [Lubotzky-Phillips-Sarnak’88]

•  

Page 37: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Corollary: Erdős-Rényi graphs are nearly Ramanujan

•  

Page 38: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):

Open questions for detection in SBM’s (2)

•  K

n-K½

½

½