Machine Learning - MT 2016
15. Clustering
Varun Kanade
University of Oxford
November 28, 2016
Announcements
- No new practical this week
- All practicals must be signed off in sessions this week
- Firm Deadline: Reports handed in at CS reception by Friday noon
- Revision Class for M.Sc. + D.Phil. Thu Week 9 (2pm & 3pm)
- Work through ML HT2016 Exam (Problem 3 is optional)
Outline
This week, we will study some approaches to clustering
- Defining an objective function for clustering
- k-Means formulation for clustering
- Multidimensional Scaling
- Hierarchical clustering
- Spectral clustering
England pushed towards Test defeat by India
France election: Socialists scramble to avoid split after Fillon win
Giants Add to the Winless Browns’ Misery
Strictly Come Dancing: Ed Balls leaves programme
Trump Claims, With No Evidence, That ‘Millions of People’ Voted Illegally
Vive ‘La Binoche’, the reigning queen of French cinema
Sports: England pushed towards Test defeat by India
Politics: France election: Socialists scramble to avoid split after Fillon win
Sports: Giants Add to the Winless Browns’ Misery
Film&TV: Strictly Come Dancing: Ed Balls leaves programme
Politics: Trump Claims, With No Evidence, That ‘Millions of People’ Voted Illegally
Film&TV: Vive ‘La Binoche’, the reigning queen of French cinema
England: England pushed towards Test defeat by India
France: France election: Socialists scramble to avoid split after Fillon win
USA: Giants Add to the Winless Browns’ Misery
England: Strictly Come Dancing: Ed Balls leaves programme
USA: Trump Claims, With No Evidence, That ‘Millions of People’ Voted Illegally
France: Vive ‘La Binoche’, the reigning queen of French cinema
Clustering
Often data can be grouped together into subsets that are coherent. However, this grouping may be subjective, and it is hard to define a general framework.

Two types of clustering algorithms:

1. Feature-based: points are represented as vectors in $\mathbb{R}^D$
2. (Dis)similarity-based: only pairwise (dis)similarities are known

Two types of clustering methods:

1. Flat: partition the data into $k$ clusters
2. Hierarchical: organise data as clusters, clusters of clusters, and so on
Defining Dissimilarity
- Weighted dissimilarity between (real-valued) attributes:
$$d(\mathbf{x}, \mathbf{x}') = f\left(\sum_{i=1}^{D} w_i\, d_i(x_i, x'_i)\right)$$
- In the simplest setting, $w_i = 1$, $d_i(x_i, x'_i) = (x_i - x'_i)^2$ and $f(z) = z$, which corresponds to the squared Euclidean distance
- Weights allow us to emphasise features differently
- If features are ordinal or categorical, then define distance suitably
- Standardisation (mean 0, variance 1) may or may not help
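As a concrete illustration, here is a minimal NumPy sketch of the weighted dissimilarity above; the function name and the defaults (all $w_i = 1$, $f(z) = z$) are our own choices, matching the simplest setting:

```python
import numpy as np

def weighted_dissimilarity(x, x_prime, w=None):
    """d(x, x') = sum_i w_i * (x_i - x'_i)^2, i.e., f(z) = z and
    d_i(x_i, x'_i) = (x_i - x'_i)^2 as in the simplest setting."""
    x, x_prime = np.asarray(x, dtype=float), np.asarray(x_prime, dtype=float)
    if w is None:
        w = np.ones_like(x)  # default: all features weighted equally
    return float(np.sum(w * (x - x_prime) ** 2))
```

With the defaults this reduces to the squared Euclidean distance; increasing some $w_i$ emphasises that feature.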
Helpful Standardisation (figure)
Unhelpful Standardisation (figure)
Partition-Based Clustering

Want to partition the data into subsets $C_1, \ldots, C_k$, where $k$ is fixed in advance

Define the quality of a partition by
$$W(C) = \frac{1}{2} \sum_{j=1}^{k} \frac{1}{|C_j|} \sum_{i, i' \in C_j} d(\mathbf{x}_i, \mathbf{x}_{i'})$$

If we use $d(\mathbf{x}, \mathbf{x}') = \|\mathbf{x} - \mathbf{x}'\|^2$, then
$$W(C) = \sum_{j=1}^{k} \sum_{i \in C_j} \|\mathbf{x}_i - \boldsymbol{\mu}_j\|^2,$$
where $\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{i \in C_j} \mathbf{x}_i$

The objective is minimising the sum of squared distances to the mean within each cluster
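To see the equivalence of the two forms concretely, here is a short NumPy sketch (the function names and the representation of a partition as a list of index arrays are our own) that computes both:

```python
import numpy as np

def W_pairwise(X, clusters):
    """W(C) = 1/2 * sum_j (1/|C_j|) * sum_{i,i' in C_j} ||x_i - x_i'||^2."""
    total = 0.0
    for idx in clusters:
        Xj = X[idx]
        diffs = Xj[:, None, :] - Xj[None, :, :]  # all pairwise differences in C_j
        total += 0.5 * np.sum(diffs ** 2) / len(idx)
    return total

def W_means(X, clusters):
    """Equivalent form: sum_j sum_{i in C_j} ||x_i - mu_j||^2."""
    return sum(float(np.sum((X[idx] - X[idx].mean(axis=0)) ** 2))
               for idx in clusters)
```

On any dataset and partition the two functions agree up to floating-point error, which is precisely the identity stated above.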
Partition-Based Clustering: k-Means Objective

Minimise jointly over partitions $C_1, \ldots, C_k$ and means $\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_k$:
$$W(C) = \sum_{j=1}^{k} \sum_{i \in C_j} \|\mathbf{x}_i - \boldsymbol{\mu}_j\|^2$$

This problem is NP-hard even for $k = 2$ for points in $\mathbb{R}^D$

If we fix $\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_k$, finding a partition $(C_j)_{j=1}^{k}$ that minimises $W$ is easy:
$$C_j = \{i \mid \|\mathbf{x}_i - \boldsymbol{\mu}_j\| = \min_{j'} \|\mathbf{x}_i - \boldsymbol{\mu}_{j'}\|\}$$

If we fix the clusters $C_1, \ldots, C_k$, minimising $W$ with respect to $(\boldsymbol{\mu}_j)_{j=1}^{k}$ is easy:
$$\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{i \in C_j} \mathbf{x}_i$$

Iteratively run these two steps: assignment and update
(Figure: ground-truth clusters vs. k-means clusters, k = 3)
The k-Means Algorithm

1. Initialise means $\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_k$ "randomly"
2. Repeat until convergence:
   a. Assign each datapoint to the cluster whose mean is closest, obtaining $C_1, \ldots, C_k$:
   $$C_j = \{i \mid j = \mathrm{argmin}_{j'} \|\mathbf{x}_i - \boldsymbol{\mu}_{j'}\|^2\}$$
   b. Update means using the current cluster assignments:
   $$\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{i \in C_j} \mathbf{x}_i$$

Note 1: Ties can be broken arbitrarily

Note 2: Choosing k random datapoints as the initial means is a good idea
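A minimal NumPy sketch of the algorithm, assuming the data are rows of an array `X`; the function name and the stop-when-assignments-stabilise test are our own choices:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Note 2: initialise the means as k random datapoints
    mu = X[rng.choice(len(X), size=k, replace=False)].copy()
    assign = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the closest mean
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_assign = d2.argmin(axis=1)  # ties broken arbitrarily (lowest index)
        if np.array_equal(new_assign, assign):
            break  # partition unchanged, so the algorithm has converged
        assign = new_assign
        # Update step: recompute each mean from its current cluster
        for j in range(k):
            if np.any(assign == j):
                mu[j] = X[assign == j].mean(axis=0)
    return mu, assign
```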
The k-Means Algorithm
Does the algorithm always converge?

Yes, because the $W$ function decreases every time a new partition is used, and there are only finitely many partitions:
$$W(C) = \sum_{j=1}^{k} \sum_{i \in C_j} \|\mathbf{x}_i - \boldsymbol{\mu}_j\|^2$$

Convergence may be very slow in the worst case, but is typically fast on real-world instances

Convergence may be to a local (rather than global) minimum, so run multiple times with random initialisation

Other criteria can be used: k-medoids, k-centres, etc.

Selecting the right $k$ is not easy: plot $W$ against $k$ and identify a "kink"
(Figure: ground-truth clusters vs. k-means clusters, k = 4)
Choosing the number of clusters k
(Figure: MSE on test set vs. K for k-means)
- As in the case of PCA, larger $k$ will give a better value of the objective
- Choose a suitable $k$ by identifying a "kink" or "elbow" in the curve

(Source: Kevin Murphy, Chap. 11)
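As a hedged sketch of how such a curve could be produced: scikit-learn's `KMeans` exposes the within-cluster sum of squares as `inertia_` (our $W$); the three-blob synthetic data below is purely illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in (0, 4, 8)])  # 3 blobs

# W decreases as k grows; look for the "kink"/"elbow" in the printed values
for k in range(1, 9):
    W = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(W, 1))
```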
Multidimensional Scaling (MDS)
In certain cases, it may be easier to define (dis)similarity between objects than to embed them in Euclidean space

Algorithms such as k-means require points to be in Euclidean space

Ideal Setting: Suppose for some $N$ points in $\mathbb{R}^D$ we are given all pairwise Euclidean distances in a matrix $\mathbf{D}$

Can we reconstruct $\mathbf{x}_1, \ldots, \mathbf{x}_N$, i.e., all of $\mathbf{X}$?
Multidimensional Scaling
Distances are preserved under translation, rotation, reflection, etc.

We cannot recover $\mathbf{X}$ exactly; we can aim to determine $\mathbf{X}$ up to these transformations

If $D_{ij}$ is the distance between points $\mathbf{x}_i$ and $\mathbf{x}_j$, then
$$D_{ij}^2 = \|\mathbf{x}_i - \mathbf{x}_j\|^2 = \mathbf{x}_i^T \mathbf{x}_i - 2\mathbf{x}_i^T \mathbf{x}_j + \mathbf{x}_j^T \mathbf{x}_j = M_{ii} - 2M_{ij} + M_{jj}$$

Here $\mathbf{M} = \mathbf{X}\mathbf{X}^T$ is the $N \times N$ matrix of dot products

Exercise: Show that, assuming $\sum_i \mathbf{x}_i = \mathbf{0}$, $\mathbf{M}$ can be recovered from $\mathbf{D}$
Multidimensional Scaling
Consider the (full) SVD: $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$

We can write $\mathbf{M}$ as $\mathbf{M} = \mathbf{X}\mathbf{X}^T = \mathbf{U}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^T\mathbf{U}^T$

Starting from $\mathbf{M}$, we can reconstruct $\tilde{\mathbf{X}}$ using the eigendecomposition of $\mathbf{M}$:
$$\mathbf{M} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$$

Because $\mathbf{M}$ is symmetric and positive semi-definite, $\mathbf{U}^T = \mathbf{U}^{-1}$ and all entries of the (diagonal) matrix $\boldsymbol{\Lambda}$ are non-negative

Let $\tilde{\mathbf{X}} = \mathbf{U}\boldsymbol{\Lambda}^{1/2}$

If we are satisfied with an approximate reconstruction, we can use a truncated eigendecomposition
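A NumPy sketch of this reconstruction; the double-centering step is one answer to the exercise above (assuming $\sum_i \mathbf{x}_i = \mathbf{0}$, one can check that $\mathbf{M} = -\tfrac{1}{2}\mathbf{J}\,\mathbf{D}^{\circ 2}\,\mathbf{J}$, where $\mathbf{D}^{\circ 2}$ is the entrywise square of $\mathbf{D}$ and $\mathbf{J} = \mathbf{I} - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T$):

```python
import numpy as np

def classical_mds(D, l=2):
    """Embed N points in R^l from an N x N matrix of Euclidean distances D."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    M = -0.5 * J @ (D ** 2) @ J           # recover the Gram matrix M = X X^T
    lam, U = np.linalg.eigh(M)            # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]        # reorder to descending
    lam = np.clip(lam[:l], 0.0, None)     # guard tiny negative rounding errors
    return U[:, :l] * np.sqrt(lam)        # X_tilde = U Lambda^{1/2}, truncated
```

The pairwise distances of the returned points match `D` (exactly if the top $l$ eigenvalues capture everything, approximately otherwise), up to translation, rotation and reflection.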
Multidimensional Scaling: Additional Comments
In general, if we define (dis)similarities on objects such as text documents or genetic sequences, we cannot be sure that the resulting similarity matrix $\mathbf{M}$ will be positive semi-definite, or that the dissimilarity matrix $\mathbf{D}$ is a valid squared Euclidean distance

In such cases, we cannot always find a Euclidean embedding that recovers the (dis)similarities exactly

Instead, minimise a stress function: find $\mathbf{z}_1, \ldots, \mathbf{z}_N$ that minimise
$$S(\mathbf{Z}) = \sum_{i \neq j} \left(D_{ij} - \|\mathbf{z}_i - \mathbf{z}_j\|\right)^2$$
Several other types of stress functions can be used
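As a sketch, scikit-learn's `MDS` estimator minimises a stress function of this flavour numerically (via the SMACOF algorithm) and accepts a precomputed dissimilarity matrix; the tiny matrix below is purely illustrative:

```python
import numpy as np
from sklearn.manifold import MDS

# A dissimilarity matrix that need not come from a Euclidean embedding
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [4.0, 2.0, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Z = mds.fit_transform(D)   # points z_1, ..., z_N approximately realising D
print(Z, mds.stress_)      # stress_ is the residual stress after fitting
```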
Multidimensional Scaling: Summary
- In certain applications, it may be easier to define pairwise similarities or distances than to construct a Euclidean embedding of discrete objects, e.g., genetic data, text data, etc.
- Many machine learning algorithms require (or are more naturally expressed with) data in some Euclidean space
- Multidimensional Scaling gives a way to find an embedding of the data in Euclidean space that (approximately) respects the original distance/similarity values
Hierarchical Clustering
Hierarchically structured data exists all around us

- Measurements of different species and individuals within species
- Top-level and low-level categories in news articles
- Country, county, town level data

Two Algorithmic Strategies for Clustering

- Agglomerative: bottom-up, clusters formed by merging smaller clusters
- Divisive: top-down, clusters formed by splitting larger clusters

Visualise this as a dendrogram or tree
Measuring Dissimilarity at Cluster Level
To find hierarchical clusters, we need to define dissimilarity at the cluster level, not just between datapoints

Suppose we have a dissimilarity at the datapoint level, e.g., $d(\mathbf{x}, \mathbf{x}') = \|\mathbf{x} - \mathbf{x}'\|$

There are different ways to define dissimilarity between clusters $C$ and $C'$:

- Single Linkage:
$$D(C, C') = \min_{\mathbf{x} \in C, \mathbf{x}' \in C'} d(\mathbf{x}, \mathbf{x}')$$
- Complete Linkage:
$$D(C, C') = \max_{\mathbf{x} \in C, \mathbf{x}' \in C'} d(\mathbf{x}, \mathbf{x}')$$
- Average Linkage:
$$D(C, C') = \frac{1}{|C| \cdot |C'|} \sum_{\mathbf{x} \in C, \mathbf{x}' \in C'} d(\mathbf{x}, \mathbf{x}')$$
Linkage-based Clustering Algorithm
1. Initialise clusters as singletons: $C_i = \{i\}$ for $i = 1, \ldots, N$
2. Initialise the set of clusters available for merging: $S = \{1, \ldots, N\}$
3. Repeat:
   a. Pick the two most similar clusters: $(j, k) = \mathrm{argmin}_{j, k \in S} D(j, k)$
   b. Create a new cluster $C_l = C_j \cup C_k$
   c. If $C_l = \{1, \ldots, N\}$, break
   d. Set $S = (S \setminus \{j, k\}) \cup \{l\}$
   e. Update $D(i, l)$ for all $i \in S$ (using the desired linkage property)
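A minimal Python/NumPy sketch of these steps, handling single and complete linkage only; the function name, dictionary bookkeeping, and returned merge list are our own choices:

```python
import numpy as np

def agglomerative(D, linkage="single"):
    """D is an N x N dissimilarity matrix; returns the list of merges,
    which is exactly the information a dendrogram displays."""
    N = D.shape[0]
    S = set(range(N))                           # clusters available for merging
    dist = {(i, j): float(D[i, j]) for i in range(N) for j in range(i + 1, N)}
    merges, next_id = [], N
    while len(S) > 1:                           # stop once everything is merged
        # Step a: pick the two most similar available clusters
        j, k = min(((a, b) for a in S for b in S if a < b), key=lambda p: dist[p])
        merges.append((j, k, dist[(j, k)]))     # step b: merge j and k into next_id
        S -= {j, k}
        # Step e: dissimilarity from the new cluster to every remaining one
        for i in S:
            d1 = dist[tuple(sorted((i, j)))]
            d2 = dist[tuple(sorted((i, k)))]
            dist[(i, next_id)] = min(d1, d2) if linkage == "single" else max(d1, d2)
        S.add(next_id)                          # step d: make the merge available
        next_id += 1
    return merges
```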
Hierarchical Clustering: Dendrograms

Outputs of hierarchical clustering algorithms are typically represented using dendrograms

A dendrogram is a binary tree, representing clusters as they were merged

The height of a node represents dissimilarity

Cutting the dendrogram at some level gives a partition of the data
Spectral Clustering
Limitations of k-means
k-means will typically form clusters that are spherical, elliptical, or convex
Kernel PCA followed by k-means can result in better clusters
Spectral clustering is a (related) alternative that often works better
Spectral Clustering
Construct a graph from the data, with one node for every point in the dataset

Use a similarity measure, e.g., $s_{ij} = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / \sigma)$

Construct a $K$-nearest-neighbour graph, i.e., $(i, j)$ is an edge if $i$ is among the $K$ nearest neighbours of $j$, or vice versa

The weight of edge $(i, j)$, if it exists, is $s_{ij}$
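A NumPy sketch of this construction (the function name and the defaults for $K$ and $\sigma$ are our own; taking the elementwise maximum symmetrises the graph, implementing the "or vice versa" rule):

```python
import numpy as np

def similarity_graph(X, K=5, sigma=1.0):
    """Weighted K-nearest-neighbour graph with weights s_ij."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-d2 / sigma)              # s_ij = exp(-||x_i - x_j||^2 / sigma)
    W = np.zeros_like(S)
    for i in range(len(X)):
        nn = np.argsort(d2[i])[1:K + 1]  # K nearest neighbours, excluding i itself
        W[i, nn] = S[i, nn]
    return np.maximum(W, W.T)            # keep edge if i ~ j in either direction
```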
Spectral Clustering
Use graph partitioning algorithms:

- Mincut can give bad cuts (e.g., only one node on one side of the cut)
- Multi-way cuts and balanced cuts are typically NP-hard to compute
- Relaxations of these problems give eigenvectors of the Laplacian

Definitions:

- $\mathbf{W}$ is the weighted adjacency matrix
- $\mathbf{D}$ is the (diagonal) degree matrix: $D_{ii} = \sum_j W_{ij}$
- Laplacian: $\mathbf{L} = \mathbf{D} - \mathbf{W}$
- Normalised Laplacian: $\tilde{\mathbf{L}} = \mathbf{I} - \mathbf{D}^{-1}\mathbf{W}$
Spectral Clustering: Simple Example

Consider a graph on six nodes forming two triangles, $\{1, 2, 3\}$ and $\{4, 5, 6\}$, where all edge weights are 1 (and 0 for missing edges).

The weighted adjacency matrix, the degree matrix and the Laplacian are given by
$$\mathbf{W} = \begin{bmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 & 0 \end{bmatrix}, \qquad \mathbf{D} = 2\mathbf{I},$$
$$\mathbf{L} = \mathbf{D} - \mathbf{W} = \begin{bmatrix} 2 & -1 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 & 0 \\ -1 & -1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & -1 & -1 \\ 0 & 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & -1 & -1 & 2 \end{bmatrix}$$
Let us consider some eigenvectors of $\mathbf{L}$:

- $\mathbf{v}_1 = [1, 1, 1, 1, 1, 1]^T$ is an eigenvector with eigenvalue 0
- $\mathbf{v}_2 = [1, 1, 1, -1, -1, -1]^T$ is also an eigenvector with eigenvalue 0
- $\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2$ for any $\alpha_1, \alpha_2$ is also an eigenvector with eigenvalue 0

We can use the matrix $[\mathbf{v}_1\, \mathbf{v}_2]$ as the $N \times 2$ feature matrix and perform k-means
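We can verify this numerically; a short NumPy check on the Laplacian above (note that `eigh` may return any orthonormal basis of the eigenvalue-0 eigenspace, i.e., some combination $\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2$, which is just as good for clustering):

```python
import numpy as np

# Laplacian of the two-triangle graph above
L = np.array([[ 2, -1, -1,  0,  0,  0],
              [-1,  2, -1,  0,  0,  0],
              [-1, -1,  2,  0,  0,  0],
              [ 0,  0,  0,  2, -1, -1],
              [ 0,  0,  0, -1,  2, -1],
              [ 0,  0,  0, -1, -1,  2]], dtype=float)

lam, V = np.linalg.eigh(L)     # eigenvalues in ascending order
print(np.round(lam, 6))        # two zero eigenvalues: one per connected component
print(np.round(V[:, :2], 3))   # rows are constant within each triangle, so
                               # k-means on these 2-d features separates them
```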
Spectral Clustering: Simple Example

Now perturb the edge weights slightly: the triangle edges get weights close to 1 (1.1, 0.9, 1, ...), and two weak edges, of weights 0.1 and 0.2, connect the triangles.

Let us consider some eigenvectors of $\mathbf{L}$:
$$\mathbf{L} = \mathbf{D} - \mathbf{W} = \begin{bmatrix} 2 & -1.1 & -0.9 & 0 & 0 & 0 \\ -1.1 & 2.2 & -1 & -0.1 & 0 & 0 \\ -0.9 & -1 & 2.1 & 0 & -0.2 & 0 \\ 0 & -0.1 & 0 & 2.1 & -1.1 & -0.9 \\ 0 & 0 & -0.2 & -1.1 & 2.3 & -1 \\ 0 & 0 & 0 & -0.9 & -1 & 1.9 \end{bmatrix}$$

When the weights are slightly perturbed, $\mathbf{v}_1 = [1, \ldots, 1]^T$ is still an eigenvector, with eigenvalue 0

We can't compute the second eigenvector $\mathbf{v}_2$ by hand

Nevertheless, we expect that the eigenspace corresponding to similar eigenvalues is relatively stable

We can still use the matrix $[\mathbf{v}_1\, \mathbf{v}_2]$ as the $N \times 2$ feature matrix and perform k-means
Spectral Clustering Algorithm
Input: weighted graph with weighted adjacency matrix $\mathbf{W}$

1. Construct the Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{W}$
2. Find the eigenvectors $\mathbf{v}_1 = \mathbf{1}, \mathbf{v}_2, \ldots, \mathbf{v}_{l+1}$ corresponding to the $l+1$ smallest eigenvalues
3. Construct the $N \times l$ feature matrix $\mathbf{V}_l = [\mathbf{v}_2, \cdots, \mathbf{v}_{l+1}]$
4. Apply a clustering algorithm using $\mathbf{V}_l$ as features, e.g., k-means

Note: If the degrees of nodes are not balanced, using the normalised Laplacian $\tilde{\mathbf{L}} = \mathbf{I} - \mathbf{D}^{-1}\mathbf{W}$ may be a better idea
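Putting the steps together in a minimal sketch (this reuses the `k_means` function sketched in the k-means section; using `numpy.linalg.eigh` on the unnormalised Laplacian is our choice):

```python
import numpy as np

def spectral_clustering(W, k, l=2):
    """Cluster the nodes of a graph with weighted adjacency matrix W."""
    D = np.diag(W.sum(axis=1))        # (diagonal) degree matrix
    L = D - W                         # unnormalised Laplacian
    lam, V = np.linalg.eigh(L)        # eigenvalues in ascending order
    features = V[:, 1:l + 1]          # v_2, ..., v_{l+1} as the N x l feature matrix
    _, assign = k_means(features, k)  # step 4: cluster the rows with k-means
    return assign
```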
Summary: Clustering
Clustering is grouping together similar data in a larger collection of heterogeneous data

The definition of good clusters is often user-dependent

Clustering algorithms in feature space, e.g., k-means

Clustering algorithms that only use (dis)similarities: k-medoids, hierarchical clustering

Spectral clustering when clusters may be non-convex