Graph Partitioning and Clustering for Community Detection
Presented By: Group One
1
Outline
Introduction: Hong Hande
Graph Partitioning: Muthu Kumar C and Xie Shudong
Partitional Clustering: Agus Pratondo
Spectral Clustering: Li Furong and Song Chonggang
Summary and Applications of Community Detection: Aleksandr Farseev
2
INTRODUCTION-BY HONG HANDE
3
Facebook Group
https://www.facebook.com/thebeatles?rf=111113312246958
4
Flickr group
http://www.flickr.com/groups/49246928@N00/pool/with/417646359/#photo_417646359
5
CS6234 Advanced Algorithms
Whole class as a community, with sub-communities inside it
6
Graph construction from web data (1)

Webpage www.x.com links to www.y.com and www.z.com; webpage www.y.com links to www.x.com, www.a.com and www.b.com; webpage www.z.com links to www.a.com. Each page becomes a node (x, y, z, a, b) and each hyperlink an edge.
7
Graph construction from web data (2)
Web pages as a graph
cnn.com: lots of links, lots of images (1316 tags)
http://www.aharef.info/2006/05/websites_as_graphs.htm
9
Internet as a graph
nodes = service providers; edges = connections
hierarchical structure
S. Carmi,S. Havlin, S. Kirkpatrick, Y. Shavitt, E. Shir. A model of Internet topology using k-shell decomposition. PNAS 104 (27), pp. 11150-11154, 2007
10
Emerging structures
Graphs (from the web, from daily life) present certain structural characteristics:
groups of nodes interacting with each other, with dense interconnections
functional/topical associations
Community, a.k.a. group, subgroup, module, cluster
11
Community Types
Explicit: the result of conscious human decision.
Implicit: emerging from the interactions and activities of users; special methods are needed to discover them.
12
Defining Communities
Often communities are defined with respect to a graph G = (V, E) representing a set of objects (V) and their relations (E).
Even if such a graph is not explicit in the raw data, it is usually possible to construct one, e.g. feature vectors → distances → graph.
13
Communities and graphs

Given a graph, a community is defined as a set of nodes that are more densely connected to each other (internal edges) than to the rest of the network nodes (external edges).
14
Graph cuts
A cut is a partition of the vertices of a graph into two disjoint subsets.
The cut-set of the cut is the set of edges whose endpoints lie in different subsets of the partition.
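For example (a minimal sketch on a hypothetical 5-node graph, not one from the slides), the cut-set can be read off directly from the edge list:

```python
# A cut splits the vertices into two disjoint subsets; the cut-set is the
# set of edges whose endpoints fall on different sides. Graph and
# partition below are illustrative.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
part_a = {1, 2}
part_b = {3, 4, 5}

cut_set = [(u, v) for (u, v) in edges
           if (u in part_a) != (v in part_a)]
print(cut_set)   # [(1, 3), (2, 4)]
```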
15
Community detection methods
Graph partitioning
Node clustering: K-means clustering, spectral clustering
16
GRAPH PARTITIONING
MUTHU KUMAR C
17
Graph Partitioning
Dividing vertices into groups of predefined size.
Given a graph G = (V, E, WE), with vertices V, edges E and edge weights WE, choose a partition such that:
V = V1 ∪ V2 ∪ … ∪ Vp, with Vi ∩ Vj = Ø for all i ≠ j.
Bisectioning: partitioning into two equal-sized groups of vertices.
18
How many partitions?
There exist many possible partitionings to search. Just to divide n vertices into 2 equal halves there are C(n, n/2)/2 choices, which is exponential in n. Choosing the optimal partitioning is NP-complete.
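As a small sketch (using Python's math.comb), the number of unordered equal bisections of n vertices is C(n, n/2)/2:

```python
from math import comb

# Number of distinct ways to split n vertices (n even) into two unordered
# halves of size n/2: C(n, n/2) / 2 -- exponential in n.
def num_bisections(n):
    assert n % 2 == 0
    return comb(n, n // 2) // 2

print(num_bisections(8))     # 35 for an 8-node graph
print(num_bisections(100))   # astronomically many already for n = 100
```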
[Figure: four different 2-way partitions of the same 8-node example graph]
Kernighan/Lin Algorithm [1]

An iterative, 2-way, balanced partitioning (bisectioning) heuristic.
The algorithm can also be extended to solve more general partitioning problems.
Given a graph G = (V, E) with |V| = 2n, find a partition V = A ∪ B with |A| = |B| = n such that the cutsize T between A and B is minimized, where T is the total weight of the edges with one endpoint in A and the other in B.

1. Kernighan, B. W., & Lin, S. (1970). An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2), 291-307.
21
Kernighan-Lin: Definitions

Let a ∈ A and b ∈ B be two vertices.
External cost: E(a) = total weight of edges from a to nodes in B.
Internal cost: I(a) = total weight of edges from a to other nodes in A.
Moving node a from A to B increases T by I(a) and decreases T by E(a).
This net reduction is measured as D(a) = E(a) - I(a); E(b), I(b) and D(b) are defined analogously for b in B.
K/L Algorithm: Swap

Swapping a ∈ A with b ∈ B changes the cutsize from T to T - g(a, b), where the gain is
g(a, b) = D(a) + D(b) - 2·w(a, b).
The 2·w(a, b) term accounts for the edge between a and b, if any, which stays in the cut after the swap.
Kernighan-Lin Algorithm

// KERNIGHAN-LIN, page 1 of 2
COMPUTE T = COST(A,B) FOR INITIAL A, B
REPEAT                                  // sweep begins
    Compute costs D(v) for all v in V
    Unmark all vertices in V
    While there are unmarked nodes
        Find an unmarked pair (ai, bi) with maximal gain g(ai, bi)
        Mark ai and bi, but do not swap them
        Update D(v) for all unmarked v, as though ai and bi had been swapped
    Endwhile

Each sweep greedily computes |V|/2 candidate pairs from A and B to swap, and picks a sequence of the best such swaps.
// KERNIGHAN-LIN, page 2 of 2
We have now computed:
  *) a sequence of pairs (a1,b1), …, (ak,bk), numbered in the order in which we marked them, and
  *) gains g(1), …, g(k), where k = |V|/2
Pick m ≤ k which maximizes Gain = g(1) + … + g(m)
// Gain is the reduction in cost from swapping (a1,b1) through (am,bm)
If Gain > 0 then            // it is worth swapping
    Update newA = A - {a1,…,am} ∪ {b1,…,bm}
    Update newB = B - {b1,…,bm} ∪ {a1,…,am}
    Update T = T - Gain
endif
UNTIL Gain <= 0             // sweep ends
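One sweep of this pseudocode can be sketched as follows (an illustrative, unoptimized Python rendering on an unweighted graph; the pair search is the naive quadratic scan):

```python
# Illustrative sketch of one Kernighan-Lin sweep on an unweighted graph.
# `adj` maps each vertex to its set of neighbours; A and B are the halves.
def kl_sweep(adj, A, B):
    A, B = set(A), set(B)
    w = lambda u, v: 1 if v in adj[u] else 0           # edge weight (unweighted)

    def cost(v, own, other):
        ext = sum(w(v, u) for u in other)              # external cost E(v)
        intn = sum(w(v, u) for u in own if u != v)     # internal cost I(v)
        return ext - intn                              # D(v) = E(v) - I(v)

    d = {v: cost(v, A, B) for v in A}
    d.update({v: cost(v, B, A) for v in B})

    unmarked_a, unmarked_b = set(A), set(B)
    swaps, gains = [], []
    while unmarked_a and unmarked_b:
        # find an unmarked pair (a, b) with maximal g = D(a) + D(b) - 2 w(a, b)
        a, b, g = max(((a, b, d[a] + d[b] - 2 * w(a, b))
                       for a in unmarked_a for b in unmarked_b),
                      key=lambda t: t[2])
        unmarked_a.discard(a); unmarked_b.discard(b)   # mark, but do not swap
        swaps.append((a, b)); gains.append(g)
        # update D for unmarked nodes as though a and b had been swapped
        for v in unmarked_a:
            d[v] += 2 * w(v, a) - 2 * w(v, b)
        for v in unmarked_b:
            d[v] += 2 * w(v, b) - 2 * w(v, a)

    # pick m maximizing Gain = g(1) + ... + g(m)
    best_gain = best_m = running = 0
    for m, g in enumerate(gains, start=1):
        running += g
        if running > best_gain:
            best_gain, best_m = running, m
    if best_gain > 0:                                  # worth swapping
        for a, b in swaps[:best_m]:
            A.remove(a); B.remove(b); A.add(b); B.add(a)
    return A, B, best_gain
```

On a hypothetical 4-node graph with edges (1,2), (1,3), (3,4) and the poor initial bisection A = {1,4}, B = {2,3}, one sweep executes a single swap and reduces the cut from 3 to 1.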
Kernighan-Lin Example

[Figure: 8-node example graph, nodes 1-4 in partition A and nodes 5-8 in partition B]
Cut cost: 9. Unmarked: 1,2,3,4,5,6,7,8.
Edges are unweighted in this example.
Kernighan/Lin Example (continued): calculate D values to find the best pair.

Cut cost: 9. Unmarked: 1,2,3,4,5,6,7,8.
Costs D(v) of each node: D(1) = 1, D(2) = 1, D(3) = 2, D(4) = 1, D(5) = 1, D(6) = 2, D(7) = 1, D(8) = 1.
Nodes 3 and 5 lead to maximum gain.
Kernighan/Lin Example (continued): gain in the current pass.

g1 = D(3) + D(5) - 2·w(3,5) = 2 + 1 - 0 = 3. Swap (3,5). Gain after node swapping: G1 = g1 = 3.
Mark the identified pair as a candidate swap.
Kernighan/Lin Example (continued): new partitions and cut cost after marking (3,5).

Cut cost: 9 → 6. Unmarked: 1,2,4,6,7,8.
Kernighan/Lin Example (continued): recompute D values for the unmarked nodes.

Cut cost: 6. Unmarked: 1,2,4,6,7,8.
D(1) = -1, D(2) = -1, D(4) = 3, D(6) = 2, D(7) = -1, D(8) = -1.
Kernighan/Lin Example (continued): nodes 4 and 6 lead to maximum gain.

Cut cost: 6. Unmarked: 1,2,4,6,7,8.
g2 = D(4) + D(6) - 2·w(4,6) = 3 + 2 - 0 = 5. Swap (4,6). G2 = G1 + g2 = 8.
Kernighan/Lin Example (continued): after marking (4,6) the cut cost is 1. Unmarked: 1,2,7,8.

Recomputed D values: D(1) = -3, D(2) = -3, D(7) = -3, D(8) = -3.
g3 = D(1) + D(7) - 2·w(1,7) = -3 - 3 - 0 = -6. Swap (1,7). G3 = G2 + g3 = 2. Cut cost: 1 → 7. Unmarked: 2,8.
Kernighan/Lin Example (continued): the last pair.

Cut cost: 7. Unmarked: 2,8. D(2) = -1, D(8) = -1.
g4 = D(2) + D(8) - 2·w(2,8) = -1 - 1 - 0 = -2. Swap (2,8). G4 = G3 + g4 = 0. All nodes are now marked.
Kernighan/Lin Example (continued): choosing the prefix of swaps to execute.

g1 = 3, G1 = 3
g2 = 5, G2 = 8
g3 = -6, G3 = 2
g4 = -2, G4 = 0
Maximum positive gain Gm = 8 with m = 2.
Since Gm > 0, the first m = 2 swaps (3,5) and (4,6) are executed.
Since Gm > 0, more passes are needed, until Gm ≤ 0.
Escaping local minima

Non-monotonically increasing gains: in the sequence of m swaps chosen, some individual gains may be negative. This allows the algorithm to possibly escape "local minima", but there is no guarantee of an optimal solution.

Demerits

Bisectioning does not generalize well to k-way partitioning.
Partitioning to predefined sizes limits utility to niche applications.
ANALYSIS OF K/L ALGORITHM
XIE SHUDONG
41
K/L Algorithm: Analysis

Annotated pseudocode, with the cost of each step:

COMPUTE T = COST(A,B) FOR INITIAL A, B                    — O(|V|²)
    (up to |V|/2 × |V|/2 = |V|²/4 external edges to sum)
REPEAT
    Compute costs D(v) for all v in V                     — O(|V|²)
        (for one node a, D(a) = E(a) - I(a) costs O(|V|); |V| nodes in total)
    Unmark all vertices in V                              — O(|V|)
    While there are unmarked nodes                        — |V|/2 iterations: O(|V|³) total
        Find an unmarked pair (a, b) with maximal g(a, b) — O(|V|²) candidate pairs
        Mark 'a' and 'b' (but do not swap them)           — O(1)
        Update D(v) for all unmarked v, as though
        'a' and 'b' had been swapped                      — O(|V|)
            (in the (i+1)-th loop there are |V| - 2i unmarked nodes, each updated
            in O(1): newD(a') = D(a') + 2·w(a', a) - 2·w(a', b))
    Endwhile
    Pick m maximizing Gain = g(1) + … + g(m)              — O(|V|)
        (running prefix sums g(1), g(1)+g(2), …, over the |V|/2 gains)
    If Gain > 0 then                                      // it is worth swapping
        Update newA = A - {a1,…,am} ∪ {b1,…,bm}           — O(|V|)
        Update newB = B - {b1,…,bm} ∪ {a1,…,am}           — O(|V|)
        Update T = T - Gain                               — O(1)
    endif
UNTIL GAIN <= 0

One sweep of the REPEAT loop is dominated by the while loop: O(|V|³). With p iterations of the outer loop, the total is O(p·|V|³).

How many iterations? Empirical testing by Kernighan and Lin on small graphs (|V| ≤ 360) showed convergence after 2 to 4 passes.
K-MEANS CLUSTERING
by Agus Pratondo
54
Graph in Rn

[Figure: weighted example graph on nodes a, b, c, d, e, x, y]

Adjacency matrix:

    a  b  c  d  e  x  y
a   0  2  0  0  0  0  0
b   2  0  0  0  0  1  5
c   0  0  0  0  0  2  0
d   0  0  0  0  1  3  0
e   0  0  0  1  0  0  1
x   0  1  2  3  0  0  0
y   0  5  0  0  1  0  0

Each row gives a point in R7:
a (0,2,0,0,0,0,0)
b (2,0,0,0,0,1,5)
c (0,0,0,0,0,2,0)
d (0,0,0,0,1,3,0)
e (0,0,0,1,0,0,1)
x (0,1,2,3,0,0,0)
y (0,5,0,0,1,0,0)
Algorithm

Algorithm: Basic K-means
1: Select K points as the initial centroids
2: repeat
3:     Form K clusters by assigning all points to the closest centroids
4:     Re-compute the centroid of each cluster
5: until the centroids do not change
K-means example

Step 1: Let K = 3. Pick 3 initial cluster centers k1, k2, k3 (randomly).
Steps 2-3: Assign each point to the closest cluster center.
Step 4: Move each cluster center to the mean of its cluster.
Steps 3-4 (repeat): Reassign points that are now closest to a different cluster center; three points change cluster.
Steps 3-4 (repeat): Re-compute the cluster means and move the cluster centers to them. The centers changed, so steps 3-4 are repeated.
Step 5: No cluster center changes: converged.
Time Complexity
The loop (steps 3-4 of the basic K-means algorithm) is repeated i times. Step 3: there are n points, and for each point the distance to all k centroids is evaluated. Step 4: there are k cluster centers to re-compute.
With k clusters, n points and i iterations, the time complexity is O(nki).
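The loop above can be sketched in a few lines (an illustrative rendering on 2-D points; for simplicity this sketch seeds the centroids with the first K points instead of a random choice):

```python
# Sketch of the basic K-means loop on 2-D points.
def kmeans(points, k):
    centroids = list(points[:k])                      # step 1: initial centroids
    while True:
        clusters = [[] for _ in range(k)]             # step 3: assignment
        for p in points:
            j = min(range(k),
                    key=lambda j: (p[0] - centroids[j][0]) ** 2
                                  + (p[1] - centroids[j][1]) ** 2)
            clusters[j].append(p)
        # step 4: move each centroid to the mean of its cluster
        new = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
               if c else centroids[j]
               for j, c in enumerate(clusters)]
        if new == centroids:                          # step 5: converged
            return centroids, clusters
        centroids = new
```

Each pass over the points costs O(nk), matching the O(nki) bound for i iterations.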
Discussion

Results can vary significantly depending on the initial choice of seeds (number and position). To increase the chance of finding the global optimum: restart with different random seeds.
Problem with initializations: the 4 topmost points vs. the 4 bottommost points.
Problem with initializations: the 4 leftmost points vs. the 4 rightmost points.
SPECTRAL CLUSTERING-BY LI FURONG
76
Motivation

Two kinds of clusters:
convex shaped, compact — handled well by k-means
non-convex shaped, connected — handled by spectral clustering
Key Idea

Project the data points into a new space; clusters can be trivially detected in the new space.
Next, we will cover: how to find the new space, and how to represent data points in that space.
Matrix Representations of Graphs

Adjacency matrix W, with entries w_ij = weight of the edge between nodes i and j (0 if absent).
Degree of a node i: d_i = sum_j w_ij.
Degree matrix D = diag(d_1, …, d_n).
Graph Laplacian

The (unnormalized) graph Laplacian is L = D - W.
Next, we will see some properties of L which are used for spectral clustering. We will work closely with linear algebra, especially eigenvalues and eigenvectors.
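A minimal numerical sketch of the construction, on an illustrative 3-node path graph:

```python
import numpy as np

# Building L = D - W for a small illustrative graph (a 3-node path
# 1 - 2 - 3), directly from the definitions above.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # adjacency matrix
D = np.diag(W.sum(axis=1))               # degree matrix, d_i = sum_j w_ij
L = D - W                                # graph Laplacian
print(L)                                 # rows of L sum to zero
```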
Properties of Graph Laplacian (1)

For every vector f in R^n:
    f' L f = (1/2) sum_{i,j} w_ij (f_i - f_j)^2        (1)
where d_i = sum_j w_ij and L = D - W                   (2)

Proof:
    f' L f = f' D f - f' W f                                                   (apply Equation 2)
           = sum_i d_i f_i^2 - sum_{i,j} w_ij f_i f_j
           = (1/2) ( sum_i d_i f_i^2 - 2 sum_{i,j} w_ij f_i f_j + sum_j d_j f_j^2 )
           = (1/2) sum_{i,j} w_ij (f_i - f_j)^2                                (apply d_i = sum_j w_ij)
Properties of Graph Laplacian (2)

The smallest eigenvalue of L is 0; the corresponding eigenvector is the constant one vector 1 = (1, …, 1)'.

Proof: the i-th entry of L·1 is sum_j L_ij = d_i - sum_j w_ij = 0, so L·1 = 0·1 and 0 is an eigenvalue with eigenvector 1. By Property (1), f' L f >= 0 for every f, so all eigenvalues are non-negative and 0 is the smallest.
We Have Done So Much Work…

Transform the graph to its Laplacian L.
Study the properties of L, basically the eigenvalues and eigenvectors.
Finally, we can see the relationship between the graph and the eigenvalues!
Number of Connected Components & Eigenvalues of L

A connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph.
If an eigenvalue v has multiplicity k, then there are k linearly independent eigenvectors corresponding to v.
Proposition 2: the multiplicity k of the eigenvalue 0 of L equals the number of connected components A1, …, Ak of the graph, and the eigenspace of eigenvalue 0 is spanned by the indicator vectors of the components (the indicator vector of Ai has entry 1 at the vertices of Ai and 0 elsewhere).
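The proposition can be checked numerically on an illustrative graph with two connected components:

```python
import numpy as np

# Numerical check of Proposition 2 on an illustrative graph with two
# connected components: a triangle {0,1,2} and a single edge {3,4}.
W = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
eigvals = np.linalg.eigvalsh(L)                 # eigenvalues, ascending
num_zero = int(np.sum(np.isclose(eigvals, 0)))  # multiplicity of eigenvalue 0
print(num_zero)                                 # 2 = number of connected components
```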
Proof of Proposition 2

Case of 1 connected component: suppose f is an eigenvector with eigenvalue 0. Then 0 = f' L f = (1/2) sum_{i,j} w_ij (f_i - f_j)^2, so f_i = f_j whenever w_ij > 0. Since any two vertices i and j are joined by a path, f is constant on the whole component, i.e. f = (1, …, 1)' up to scaling.

Case of several connected components: L is block diagonal, with one block per component. Applying the argument above to each block, the eigenvectors of eigenvalue 0 are exactly the indicator vectors of the components, so the multiplicity of 0 equals the number of components.
SPECTRAL CLUSTERING-BY SONG CHONGGANG
119
Spectral Clustering Algorithm

Input: graph G, number k of clusters to form.
1. Compute the adjacency matrix W and degree matrix D.
2. Laplacian L = D - W.
3. Compute the first k eigenvectors u1, …, uk of L.
4. Let U (an n × k matrix) contain the vectors u1, …, uk as columns. New space found!
5. Let y_i in R^k be the vector corresponding to the i-th row of U. Representing the data in the new space!
6. Cluster the points y_1, …, y_n into k clusters using k-means.

Time complexity: O(n³), dominated by the eigen-decomposition.
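The pipeline can be sketched as follows (an illustrative graph with two obvious communities joined by a single weak edge; since k = 2, the final k-means step is replaced here by a simple sign split on the second eigenvector, the Fiedler vector, which suffices for this case):

```python
import numpy as np

# Sketch of unnormalized spectral clustering: two triangles {0,1,2} and
# {3,4,5} joined by the single edge (2,3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0                 # adjacency matrix
D = np.diag(W.sum(axis=1))                  # degree matrix
L = D - W                                   # Laplacian
vals, vecs = np.linalg.eigh(L)              # eigenpairs, ascending eigenvalues
U = vecs[:, :2]                             # first k = 2 eigenvectors as columns
labels = (U[:, 1] > 0).astype(int)          # cluster the rows of U by sign
print(labels)                               # nodes 0-2 in one cluster, 3-5 in the other
```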
Example(1)
Now let’s go through an example. n = 6, k=2
126
Example (2)

Step 1: weighted adjacency matrix W and degree matrix D. [Matrices shown on slide]
Example (3)

Step 2: Laplacian matrix L = D - W. [Matrix shown on slide]

Example (4-5)

Step 3: Eigen-decomposition of L into eigenvalues and eigenvectors; the first k = 2 eigenvectors form the matrix U. [Values shown on slide]
Example (6)

Step 4: Embedding U. Each row of U represents a data point.

Example (7)

Map the rows of U to a two-dimensional space. [Scatter plot shown on slide]
Example (8)

Step 5: Clustering. K-means clustering in the embedded space yields Cluster A and Cluster B. [Scatter plot shown on slide]
Why Does Spectral Clustering Work? (1)

Consider an ideal case: there are no similarities between any nodes in different connected components. This conforms to Proposition 2.
Compute the weighted adjacency matrix W and degree matrix D; L = D - W; compute L's 3 eigenvectors of eigenvalue 0.
Why Does Spectral Clustering Work? (2)

Consider the ideal case. Let the three eigenvectors (the indicator vectors of the components) be the three columns of a matrix U, and project the rows of U into a 3-dimensional space.
Why Does Spectral Clustering Work? (3)

Consider the ideal case. If we now run K-means in this space, we get very good results: # of 0 eigenvalues = # of connected components.
Why Does Spectral Clustering Work? (4)

What if the case is not ideal? We need to introduce perturbation theory. Perturbation is like noise: the nearly ideal case is the ideal case plus a perturbation.
Why Does Spectral Clustering Work? (5)

Perturbation theory will not be formally discussed here; references will be offered on IVLE. What you need to know is:
For the ideal case, the between-cluster similarity is 0, and the first k eigenvectors of the Laplacian matrix L are indicators of the clusters.
For the real case, L' = L + H, where H is the perturbation. Perturbation theory tells us the eigenvectors generated from L' will be very close to the ideal vectors from L, bounded by a small value.
APPLICATIONS AND SUMMARY-BY ALEKSANDR FARSEEV
Applications: VLSI
Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistors into a single chip.
147
Applications: VLSI design

[Figure: VLSI design flow] System Specification → Architectural Design → Functional Design and Logic Design → Circuit Design → Physical Design (Partitioning → Chip Planning → Placement → Clock Tree Synthesis → Signal Routing → Timing Closure) → Physical Verification and Signoff (DRC, LVS, ERC) → Fabrication → Packaging and Testing → Chip.
Applications: VLSI design (2)

[Figure: the same 8-node circuit partitioned two ways into Block A and Block B]
Cut ca: four external connections. Cut cb: two external connections. The better partition minimizes the connections between the blocks.
Applications: Social Media
A one-modality (one-mode) network is a type of network in which all vertices are of the same kind.
A multi-modality (multi-mode) network is a type of network in which vertices are of different kinds.
A hypergraph is a generalization of a graph in which an edge (hyperedge) can connect any number of vertices.
A (k, k)-(hyper)network is a network with k modalities and hyperedges involving exactly k vertices, each vertex from one unique modality. Example: a (3, 3)-network.
150
Applications: Social Media (2)
(3, 3) – network
Graph representation
151
Applications: Social Media (3)
(2, 2) User - Venue network
1 2 3 4
1 2 3 4 5
12 3
4 5
(1,2) User – User similarity network
UsersUsers
Venues
152
Applications: Social Media (4)
Reduction
(3, 3) User - Venue - Photo network → (1, 2) User - User network
153
Applications: Social Media (5)
Detection
154
Applications: Social Media (6)
Understanding
Identification
155
Applications: Social Media (7)
http://next.comp.nus.edu.sg
Other applications
Parallel processing
Parallel Graph Computations
Complex Networks
Power Grids
Geographically Embedded Networks
Road Networks
Image Processing
157
Summary: KL Graph Partitioning

Able to perform only bipartitioning.
Cannot detect overlapping communities.
Time complexity: O(p·|V|³) for p passes.
Summary: K-Means

Fast: O(nki), where n is the number of objects, k the number of clusters, and i the number of iterations.
Easy to implement.
Need to specify k, the number of clusters, in advance.
Not suitable for discovering clusters with non-convex shapes.
Cannot detect overlapping communities.
Summary: Spectral clustering

Need to specify k, the number of clusters, in advance.
Cannot detect overlapping communities.
Time complexity: O(n³), where n is the number of objects.
Able to discover clusters with non-convex shapes.
Sources

1. Kernighan, B. W., & Lin, S. (1970). An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2), 291-307.
2. James Demmel, CS 267: Applications of Parallel Computers, Graph Partitioning, http://www.cs.berkeley.edu/~demmel/cs267_Spr09
3. Andrew B. Kahng, Jens Lienig, Igor L. Markov, Jin Hu, VLSI Physical Design: From Graph Partitioning to Timing Closure.
4. Sadiq M. Sait & Habib Youssef, Chapter 2: Partitioning, King Fahd University of Petroleum & Minerals, College of Computer Sciences & Engineering, Department of Computer Engineering, September 2003.
5. http://shabal.in/visuals/kmeans/2.html
6. www.cs.ucr.edu/~eamonn/205/MachineLearning3.ppt
7. info.psu.edu.sa/psu/cis/asameh/cs-500/dm13-clustering.ppt