Concord 7062 Armor at War - British Sherman Tanks - Concord Publications Company (2006)
Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a...
Transcript of Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a...
![Page 1: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/1.jpg)
Community Detection
fundamental limits & efficient algorithms
Laurent Massoulié, Inria
![Page 2: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/2.jpg)
Community Detection
• From graph of node-to-node interactions, identify groups of similar nodes
Example: Graph of US political blogs’ citations [Adamic & Glance 2005]
![Page 3: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/3.jpg)
Data: “friendship” graph
Variation: NSA’s “co-traveler programme”: Spot groups of suspect persons meeting regularly in unusual places
Application 1: contact recommendation in online social networks
!recommend members of user’s implicit community
![Page 4: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/4.jpg)
Data: {user-item} matrix Example: Netflix prize dataset! {user-movie} ratings
Item communities can guide recommendation: “users who liked this also liked…”
Application 2: item recommendation to users
User / Movie
…
Alice ? ** ***
Bob ∗∗∗ ? ?
…
Deirdre ***** ** **
![Page 5: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/5.jpg)
Application 3: categorizing chemical reactives in biology
Data: sets of chemicals and reactions involving them
Jeong, H., et al., Lethality and centrality in protein networks. Nature, 2001. 411(6833): p. 41-2. Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062): p. 1173-8.
More generally: Knowledge graph as generic representation of data A1 has with B1 interaction of type C1 A2 has with B2 interaction of type C2 …
![Page 6: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/6.jpg)
End goal: Algorithms with good accuracy at low computation cost
Outline:
– An algorithm
– Its performance when signal is strong
– Fundamental limits and better algorithms when signal is weak
![Page 7: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/7.jpg)
Typical algorithm for community detection: first embed, then cluster
7
Embedding space (here
2D)
![Page 8: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/8.jpg)
How to cluster
8
K-means clustering [Lloyd 1957]
Initialization: start with K centers placed at random 1) Cluster points according to their nearest center 2) Update center position to center of mass of associated
points
![Page 9: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/9.jpg)
How to cluster
9
K-means clustering [Lloyd 1957]
Initialization: start with K centers placed at random 1) Cluster points according to their nearest center 2) Update center position to center of mass of associated
points 3) Iterate
![Page 10: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/10.jpg)
How to cluster
10
K-means clustering [Lloyd 1957]
Initialization: start with K centers placed at random 1) Cluster points according to their nearest center 2) Update center position to center of mass of associated
points 3) Iterate
![Page 11: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/11.jpg)
Illustration in dimension 2 on Netflix dataset
![Page 12: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/12.jpg)
How to embed: The basic recipe for dimension reduction
•
Best 1D-fitBest 2D-fit
![Page 13: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/13.jpg)
Spectral Embedding
• 1
23
4
0 1 0 0
1 0 1 1
0 1 0 1
0 1 1 0
1 0 1 1
0 2 1 1
1 1 2 0
1 1 0 2
1
23
4
Column: Data of corresponding node
![Page 14: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/14.jpg)
Spectral Embedding
•
![Page 15: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/15.jpg)
Illustration in dimension 2 on Netflix dataset
Clustering from Singular Value Decomposition
! How good is this method? ! Should we replace adjacency by other matrix? ! How do amount & quality of data affect achievable accuracy?
![Page 16: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/16.jpg)
The need for generative models of data
Empirical comparison of algorithms on specific datasets: necessary, but – provides only limited understanding of their merits
Analysis of algorithms on data from generative model: – enables to quantify quality of algorithms – reveals fundamental limits on feasibility of community detection – guides design of new algorithms
The problem of ground truth: Where are the true Democrats?
![Page 17: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/17.jpg)
The Stochastic Block Model [Holland-Laskey-Leinhardt’83]
• Nodes in block 1
Nodes in block 2
Nodes in block 3
![Page 18: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/18.jpg)
Schematic view of community detection
Signal matrix
![Page 19: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/19.jpg)
Efficiency of spectral approach in a strong-signal regime
Stochastic block model with K=4 communities
![Page 20: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/20.jpg)
Efficiency of spectral approach in a strong-signal regime
Non-zero eigenvalues of signal matrix appear nearly unchanged Signal captured in their eigenvectors
Spectrum of adjacency matrix, strong-signal case
![Page 21: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/21.jpg)
One word on random matrices
•
![Page 22: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/22.jpg)
Efficiency of spectral approach in a strong-signal regime
•
![Page 23: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/23.jpg)
Efficiency of spectral approach in a strong-signal regime
Non-zero eigenvalues of signal matrix appear nearly unchanged Signal captured in their eigenvectors
Spectrum of adjacency matrix, strong-signal case
Weaker signal: useful information, if any remains, is no longer concentrated in a few eigenvectors
![Page 24: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/24.jpg)
•
![Page 25: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/25.jpg)
• a/n b/n
a/nb/n
n/2
n/2
![Page 26: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/26.jpg)
The argument for feasibility: fixing the spectral method
•
![Page 27: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/27.jpg)
Alternative: “Spectral Redemption” [Krzakala-Moore-Mossel-Neeman-Sly-Zdeborova-Zhang 2013]
•
u v ye f
u ve
f
ef
![Page 28: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/28.jpg)
Spectrum of non-backtracking matrix,stochastic block model
![Page 29: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/29.jpg)
Non-backtracking spectra of stochastic block models [Bordenave-Lelarge-LM, 2015]
•
![Page 30: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/30.jpg)
Spectrum of non-backtracking matrix,political blogs
![Page 31: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/31.jpg)
Conjectured phase diagram for community detection at low signal
(assuming fixed inter-community parameter b)
Intr
a-bl
ock
para
met
er a
Above Kesten-Stigum threshold: Detection “easy” (spectral methods++)
Number of blocks K
Kesten-St
igum th
resholdDetection feasible but hard
(no known polynomial time algorithm)
Detection infeasible
K = 2
![Page 32: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/32.jpg)
Conclusions❑ Community Detection motivates search for new algorithms
! Led to spectral methods with self-avoiding & non-backtracking path counts, but others are yet to be invented
❑ Community Detection in Stochastic Block Model: rich playground for analysis of computational complexity with methods of statistical physics and probability theory
! What can be said about the hard phase???
![Page 33: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/33.jpg)
BACKUP
![Page 34: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/34.jpg)
Spectrum of non-backtracking matrix Erdős-Rényi graph (1 community)
![Page 35: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/35.jpg)
The argument for impossibility [Mossel-Neeman-Sly 2012]
• u
...
![Page 36: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/36.jpg)
Ramanujan graphs [Lubotzky-Phillips-Sarnak’88]
•
![Page 37: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/37.jpg)
Corollary: Erdős-Rényi graphs are nearly Ramanujan
•
![Page 38: Community Detection fundamental limits & efficient algorithms · Rual, J.F., et al., Towards a proteome-scale map of the human protein-protein interaction network. Nature, 2005. 437(7062):](https://reader036.fdocuments.us/reader036/viewer/2022062414/5fb483c00649c033f57e94e7/html5/thumbnails/38.jpg)
Open questions for detection in SBM’s (2)
• K
n-K½
½
½