Spectral Clustering
by HU Pili
June 16, 2013
Outline
• Clustering Problem
• Spectral Clustering Demo
• Preliminaries
◦ Clustering: K-means Algorithm
◦ Dimensionality Reduction: PCA, KPCA.
• Spectral Clustering Framework
• Spectral Clustering Justification
• Other Spectral Embedding Techniques
Main reference: Hu 2012 [4]
Clustering Problem
Height
Weight
Figure 1. Abstract your target using a feature vector
Clustering Problem
Height
Weight
Figure 2. Cluster the data points into K (2 here) groups
Clustering Problem
Height
Weight
Figure 3. Gain insights into your data (the two clusters: Thin and Fat)
Clustering Problem
Height
Weight
Figure 4. The center is representative (knowledge)
Clustering Problem
Height
Weight
Figure 5. Use the knowledge for prediction
Review: Clustering
We learned the general steps of Data Mining / Knowledge Discovery using a clustering example:
1. Abstract your data in the form of vectors.
2. Run learning algorithms.
3. Gain insights / extract knowledge / make predictions.
We focus on step 2.
Spectral Clustering Demo
Figure 6. Data Scatter Plot
Spectral Clustering Demo
Figure 7. Standard K-Means
Spectral Clustering Demo
Figure 8. Our Sample Spectral Clustering
Spectral Clustering Demo
The algorithm is simple:
• K-Means:
  [idx, c] = kmeans(X, K);
• Spectral Clustering:
  epsilon = 0.7;
  D = dist(X');
  A = double(D < epsilon);
  [V, Lambda] = eigs(A, K);
  [idx, c] = kmeans(V, K);
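For readers without MATLAB, here is a rough NumPy translation of the demo above. It is a sketch under assumptions: the function name `spectral_demo`, the farthest-point seeding, and the hand-rolled Lloyd iteration are illustrative choices, not part of the original code.

```python
import numpy as np

def spectral_demo(X, K, epsilon=0.7, iters=100):
    """Naive epsilon-ball spectral clustering, mirroring the MATLAB demo.

    X: (N, n) array, rows are points; K: number of clusters.
    """
    # Pairwise Euclidean distances (MATLAB: D = dist(X')).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Unweighted epsilon-ball affinity (MATLAB: A = double(D < epsilon)).
    A = (D < epsilon).astype(float)
    # Largest-K eigenvectors of the symmetric affinity (MATLAB: eigs(A, K)).
    w, U = np.linalg.eigh(A)
    V = U[:, np.argsort(w)[::-1][:K]]
    # Minimal Lloyd k-means on the embedding, greedy farthest-point seeding.
    c = [V[0]]
    for _ in range(1, K):
        d2 = np.min([((V - ci) ** 2).sum(1) for ci in c], axis=0)
        c.append(V[np.argmax(d2)])
    c = np.array(c)
    for _ in range(iters):
        idx = np.argmin(((V[:, None, :] - c[None, :, :]) ** 2).sum(-1), axis=1)
        c = np.array([V[idx == j].mean(0) if (idx == j).any() else c[j]
                      for j in range(K)])
    return idx
```

On two well-separated blobs with epsilon smaller than the gap, the rows of the embedding become (near-)indicator vectors, so the final k-means step separates the clusters easily.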
Review: The Demo
The usual case in data mining:
• No weak algorithms
• Preprocessing is as important as algorithms
• The problem looks easier in another space (the secret is coming soon)
Transformation to another space:
• High to low: dimensionality reduction, low-dimension embedding. e.g. spectral clustering.
• Low to high. e.g. Support Vector Machine (SVM)
Secrets of Preprocessing
Figure 9. The similarity graph: connect to ε-ball.
D = dist(X'); A = double(D < epsilon);
Secrets of Preprocessing
Figure 10. 2-D embedding using largest 2 eigenvectors
[V, Lambda] = eigs(A, K);
Secrets of Preprocessing
Figure 11. Even better after projecting to the unit circle (not used in our sample but more applicable; Brand 2003 [3])
Secrets of Preprocessing
Figure 12. Angle histograms (all points; cluster 1; cluster 2): the two clusters are concentrated and nearly perpendicular to each other.
Notations
• Data points: X_{n×N} = [x_1, x_2, …, x_N]. N points, each of n dimensions, organized in columns.
• Feature extraction: φ(x_i), mapping to n̂ dimensions.
• Eigenvalue decomposition: A = UΛU^T. First d columns of U: U_d.
• Feature matrix: Φ_{n̂×N} = [φ(x_1), φ(x_2), …, φ(x_N)]
• Low-dimension embedding: Y_{N×d}. Embed N points into d-dimensional space.
• Number of clusters: K.
K-Means
• Initialize m_1, …, m_K centers
• Iterate until convergence:
  ◦ Cluster assignment: C_i = argmin_j ‖x_i − m_j‖ for all data points, 1 ≤ i ≤ N.
  ◦ Update clusters: S_j = {i : C_i = j}, 1 ≤ j ≤ K
  ◦ Update centers: m_j = (1/|S_j|) Σ_{i∈S_j} x_i
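The loop above can be sketched in a few lines of NumPy (an illustrative sketch, not reference code; the greedy farthest-point initialization is one possible choice, not part of the slide):

```python
import numpy as np

def kmeans(X, K, iters=100):
    """Minimal Lloyd's algorithm; X is (N, n), rows are points."""
    # Initialize centers by greedy farthest-point seeding (illustrative choice).
    m = [X[0].astype(float)]
    for _ in range(1, K):
        d2 = np.min([((X - mi) ** 2).sum(1) for mi in m], axis=0)
        m.append(X[np.argmax(d2)].astype(float))
    m = np.array(m)
    for _ in range(iters):
        # Cluster assignment: C_i = argmin_j ||x_i - m_j||.
        C = np.argmin(((X[:, None, :] - m[None, :, :]) ** 2).sum(-1), axis=1)
        # Update centers: m_j = mean of S_j (keep old center if S_j is empty).
        new_m = np.array([X[C == j].mean(0) if (C == j).any() else m[j]
                          for j in range(K)])
        if np.allclose(new_m, m):
            break
        m = new_m
    return C, m
```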
Remarks: K-Means
1. A chicken-and-egg problem.
2. How to initialize the centers?
3. How to determine the hyperparameter K?
4. The decision boundary is a straight line:
  • C_i = argmin_j ‖x_i − m_j‖
  • On the boundary between clusters j and k: ‖x − m_j‖ = ‖x − m_k‖
  • Expanding both sides: m_j^T m_j − m_k^T m_k = 2 x^T (m_j − m_k), which is linear in x.
We address the 4th point by transforming the data into a space where a straight-line boundary is enough.
Principal Component Analysis
Figure 13. Error minimization formulation (legend: Data, PC, Error)
Principal Component Analysis

Assume x_i is already centered (easy to preprocess). Project the points onto the space spanned by U_{n×d} with minimum error:

min_{U∈R^{n×d}} J(U) = Σ_{i=1}^{N} ‖UU^T x_i − x_i‖²
s.t. U^T U = I
Principal Component Analysis

Transform to trace maximization:

max_{U∈R^{n×d}} Tr[U^T (XX^T) U]
s.t. U^T U = I

Standard problem in matrix theory:
• The solution for U is given by the largest d eigenvectors of XX^T (those corresponding to the largest eigenvalues), i.e. XX^T U = UΛ.
• Usually we denote Σ_{n×n} = XX^T, because XX^T can be interpreted as the covariance of the n variables. (Again, note that X is centered.)
Principal Component Analysis

About UU^T x_i:
• x_i: the data point in the original n-dimensional space.
• U_{n×d}: the d-dimensional space (axes) expressed in the coordinates of the original n-D space. (Principal axes.)
• U^T x_i: the coordinates of x_i in the d-dimensional space; the d-dimensional embedding; the projection of x_i onto the d-D space expressed in d-D coordinates. (Principal components.)
• UU^T x_i: the projection of x_i onto the d-D space expressed in n-D coordinates.
Principal Component Analysis

From principal axes to principal components:
• One data point: U^T x_i
• Matrix form: (Y^T)_{d×N} = U^T X

Relation between covariance and similarity:

(X^T X) Y = X^T X (U^T X)^T = X^T X X^T U = X^T U Λ = Y Λ

Observation: Y consists of eigenvectors of X^T X.
Principal Component Analysis

Two operational approaches:
• Decompose (XX^T)_{n×n} and let (Y^T)_{d×N} = U^T X.
• Decompose (X^T X)_{N×N} and directly get Y.

Implications:
• In practice, choose the smaller of the two.
• X^T X hints that we can do more with this structure.

Remarks:
• The principal components, i.e. Y, the d-dimensional embedding, are what we want in most cases. e.g. we can do clustering on the coordinates given by Y.
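The equivalence of the two routes can be checked numerically (a sketch; note that the eigenvectors of X^T X must be scaled by the square roots of their eigenvalues to match U_d^T X exactly, and the columns agree only up to sign):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, d = 5, 40, 2
X = rng.normal(size=(n, N))
X -= X.mean(axis=1, keepdims=True)        # center: columns are the N points

# Route 1: decompose the n x n covariance XX^T, then Y^T = U_d^T X.
w1, U = np.linalg.eigh(X @ X.T)
Ud = U[:, np.argsort(w1)[::-1][:d]]
Y1 = (Ud.T @ X).T                          # N x d principal components

# Route 2: decompose the N x N similarity X^T X and read off Y directly.
w2, V = np.linalg.eigh(X.T @ X)
order = np.argsort(w2)[::-1][:d]
Y2 = V[:, order] * np.sqrt(w2[order])      # scale eigenvectors by sqrt(lambda)
```

Both routes give the same d-dimensional embedding; in practice one decomposes whichever matrix is smaller (here the n×n one, since n < N).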
Kernel PCA
Settings:
• A feature extraction function:
  φ: R^n → R^{n̂}
  Map the original n-dimensional features to n̂ dimensions.
• Matrix organization:
  Φ_{n̂×N} = [φ(x_1), φ(x_2), …, φ(x_N)]
• Now Φ plays the role of the "data points" in the PCA formulation. Try to embed them into d dimensions.
Kernel PCA
According to the analysis of PCA, we can operate on:
• (ΦΦ^T)_{n̂×n̂}: the covariance matrix of the features
• (Φ^T Φ)_{N×N}: the similarity/affinity matrix (in spectral clustering language); the Gram/kernel matrix (in kernel PCA language).

The observation:
• n̂ can be very large, e.g. φ: R^n → R^∞.
• (Φ^T Φ)_{i,j} = φ^T(x_i) φ(x_j). We don't need an explicit φ; we only
need k(x_i, x_j) = φ^T(x_i) φ(x_j).
Kernel PCA
k(x_i, x_j) = φ^T(x_i) φ(x_j) is the "kernel". One important property, by definition:
• k(x_i, x_j) is a positive semidefinite function.
• K is a positive semidefinite matrix.

Some example kernels:
• Linear: k(x_i, x_j) = x_i^T x_j. Degrades to PCA.
• Polynomial: k(x_i, x_j) = (1 + x_i^T x_j)^p
• Gaussian: k(x_i, x_j) = e^{−‖x_i − x_j‖² / (2σ²)}
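The PSD claim for the Gaussian kernel is easy to check numerically (a sketch; the helper name and sample data are made up for illustration):

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); rows of X are points."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
K = gaussian_kernel_matrix(rng.normal(size=(30, 3)))
eigvals = np.linalg.eigvalsh(K)   # all >= 0 (up to round-off): K is PSD
```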
Remarks: KPCA
• Avoid explicit high-dimensional (maybe infinite) feature construction.
• Enable one research direction: kernel engineering.
• The above discussion assumes Φ is centered! See Bishop 2006 [2] for how to center this matrix (using only the kernel function), or the "double centering" technique in [4].
• Out-of-sample embedding is the real difficulty; see Bengio 2004 [1].
Review: PCA and KPCA
• Minimum error formulation of PCA
• Two equivalent implementation approaches:
  ◦ covariance matrix
  ◦ similarity matrix
• The similarity matrix is more convenient to manipulate and leads to KPCA.
• The kernel is Positive Semi-Definite (PSD) by definition: K = Φ^T Φ
Spectral Clustering Framework
A bold guess:
• Decomposing K = Φ^T Φ gives a good low-dimension embedding. The inner product measures similarity, i.e. k(x_i, x_j) = φ^T(x_i) φ(x_j). K is a similarity matrix.
• In the operation, we actually do not look at Φ.
• We can specify K directly and perform EVD:
  X_{n×N} → K_{N×N}
• What if we directly give a similarity measure, K, without the PSD constraint?
That leads to general spectral clustering.
Spectral Clustering Framework
1. Get the similarity matrix A_{N×N} from the data points X. (A: affinity matrix; adjacency matrix of a graph; similarity graph)
2. EVD: A = UΛU^T. Use U_d (or a post-processed version, see [4]) as the d-dimension embedding.
3. Perform clustering on the d-D embedding.
Spectral Clustering Framework
Review our naive spectral clustering demo:
1. epsilon = 0.7;
   D = dist(X');
   A = double(D < epsilon);
2. [V, Lambda] = eigs(A, K);
3. [idx, c] = kmeans(V, K);
Remarks: SC Framework
• We start by relaxing the PSD requirement on A (K) in KPCA.
• Lose PSD == lose the KPCA justification? Not exactly: A′ = A + σI is PSD for large enough σ, and shifting by σI changes the eigenvalues but not the eigenvectors.
• Real tricks (see [4] Section 2 for details):
  ◦ How to form A?
  ◦ Decompose A or other variants of A (e.g. L = D − A).
  ◦ Use the EVD result directly (e.g. U) or use a variant (e.g. UΛ^{1/2}).
Similarity graph
Input is high-dimensional data (e.g. it comes in the form of X):
• k-nearest neighbour
• ε-ball
• mutual k-NN
• complete graph (with Gaussian kernel weight)
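The first three constructions can be sketched in NumPy (illustrative; the `eps` and `k` values are arbitrary, and the complete graph with Gaussian weights is just the kernel matrix from the KPCA slides):

```python
import numpy as np

def similarity_graphs(X, eps=0.7, k=2):
    """Epsilon-ball, k-NN (OR), and mutual k-NN (AND) adjacency matrices."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    N = len(X)
    off = ~np.eye(N, dtype=bool)
    eps_ball = ((D < eps) & off).astype(float)
    # Directed k-NN: connect i to its k nearest neighbours (excluding itself).
    nn = np.argsort(D, axis=1)[:, 1:k + 1]
    directed = np.zeros((N, N))
    directed[np.arange(N)[:, None], nn] = 1.0
    knn = np.maximum(directed, directed.T)     # k-NN graph: edge if either side
    mutual = np.minimum(directed, directed.T)  # mutual k-NN: both sides agree
    return eps_ball, knn, mutual
```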
Similarity graph
Input is distances: (D^{(2)})_{i,j} is the squared distance between i and j (it may not come from raw x_i, x_j).

c = [x_1^T x_1, …, x_N^T x_N]^T
D^{(2)} = c 1^T + 1 c^T − 2 X^T X
J = I − (1/N) 1 1^T
X^T X = −(1/2) J D^{(2)} J

Remarks:
• See MDS.
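The double-centering identity can be verified numerically (a sketch; X must be column-centered for X^T X to be recovered exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 12
X = rng.normal(size=(n, N))
X -= X.mean(axis=1, keepdims=True)       # center the points (columns)

G = X.T @ X                               # the Gram / similarity matrix
c = np.diag(G)                            # c_i = x_i^T x_i
D2 = c[:, None] + c[None, :] - 2 * G      # D^(2) = c 1^T + 1 c^T - 2 X^T X
J = np.eye(N) - np.ones((N, N)) / N       # centering matrix J = I - (1/N) 1 1^T
G_rec = -0.5 * J @ D2 @ J                 # recover X^T X from distances alone
```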
Similarity graph
Input is a graph:
• Just use it.
• Or do some enhancements, e.g. geodesic distance. See [4] Section 2.1.3 for some possible methods.

After getting the graph (or input):
• Adjacency matrix A; Laplacian matrix L = D − A.
• Normalized versions:
  ◦ A_left = D^{−1} A, A_sym = D^{−1/2} A D^{−1/2}
  ◦ L_left = D^{−1} L, L_sym = D^{−1/2} L D^{−1/2}
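All five matrices can be formed directly from A (a sketch; `laplacians` is a made-up helper name, and node degrees are assumed nonzero):

```python
import numpy as np

def laplacians(A):
    """Return L, A_left, A_sym, L_left, L_sym for an adjacency matrix A."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    Dinv = np.diag(1.0 / deg)
    Dh = np.diag(deg ** -0.5)
    return L, Dinv @ A, Dh @ A @ Dh, Dinv @ L, Dh @ L @ Dh
```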
EVD of the Graph
Matrix types:
• Adjacency series: Use the largest EVs.
• Laplacian series: Use the smallest EVs.
Remarks: SC Framework
• There are many possibilities in the construction of the similarity matrix and in the post-processing of the EVD.
• Not all of these combinations have justifications.
• Once a combination is shown to work, it may not be very hard to find justifications.
• Existing works actually start from very differently flavoured formulations.
• Only one common property: they involve EVD, aka "spectral analysis"; hence the name.
Spectral Clustering Justification
• Cut based argument (main stream; origin)
• Random walk escaping probability
• Commute time: L^{−1} encodes the effective resistance. (Where UΛ^{−1/2} comes from.)
• Low-rank approximation.
• Density estimation.
• Matrix perturbation.
• Polarization. (the demo)
See [4] for pointers.
Cut Justification
Normalized Cut (Shi 2000 [5]):

NCut = Σ_{i=1}^{K} cut(C_i, V − C_i) / vol(C_i)

Characteristic vector for C_i, χ_i ∈ {0, 1}^N:

NCut = Σ_{i=1}^{K} (χ_i^T L χ_i) / (χ_i^T D χ_i)
Relax χ_i to real values:

min_{v_i ∈ R^N} Σ_{i=1}^{K} v_i^T L v_i
s.t. v_i^T D v_i = 1
     v_i^T v_j = 0, ∀ i ≠ j

This is the generalized eigenvalue problem:

L v = λ D v

Equivalent to EVD on:

L_left = D^{−1} L
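The relaxation can be checked on a toy graph. This sketch solves the generalized problem through the symmetric form D^{−1/2} L D^{−1/2} (if u is its eigenvector, v = D^{−1/2} u satisfies L v = λ D v); the graph and the weak bridge weight are made up for illustration:

```python
import numpy as np

# Two triangles joined by a weak edge: a sparse cut between {0,1,2} and {3,4,5}.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.01                  # weak bridge across the cut

deg = A.sum(axis=1)
L = np.diag(deg) - A
Dh = np.diag(deg ** -0.5)

w, U = np.linalg.eigh(Dh @ L @ Dh)        # symmetric form of L v = lambda D v
v = Dh @ U[:, 1]                          # generalized EV for 2nd-smallest lambda
labels = (v > 0).astype(int)              # sign of the relaxed indicator
```

The sign pattern of the second-smallest generalized eigenvector recovers the sparse cut.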
Matrix Perturbation Justification
• When the graph is ideally separable, i.e. it has multiple connected components, A and L have characteristic (piecewise constant) EVs.
• When it is not ideally separable but a sparse cut exists, A can be viewed as an ideally separable matrix plus a small perturbation.
• A small perturbation of the matrix entries leads to a small perturbation of the EVs.
• The EVs are then not too far from piecewise constant: easy to separate by simple algorithms like K-Means.
Low Rank Approximation
The similarity matrix A is generated by inner products in some unknown space that we want to recover. We want to minimize the recovery error:

min_{Y ∈ R^{N×d}} ‖A − Y Y^T‖_F²

This is the standard low-rank approximation problem, which leads to EVD of A:

Y = U Λ^{1/2}
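A numerical check (a sketch; A is built as B B^T so that it is PSD of known rank, and by Eckart-Young the Frobenius error of the rank-d truncation equals the norm of the dropped eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(8, 3))
A = B @ B.T                               # a PSD similarity matrix of rank 3

d = 2
w, U = np.linalg.eigh(A)
top = np.argsort(w)[::-1][:d]
Y = U[:, top] * np.sqrt(w[top])           # Y = U_d Lambda_d^{1/2}

err = np.linalg.norm(A - Y @ Y.T, 'fro')
dropped = np.sort(w)[::-1][d:]            # eigenvalues left out of Y
```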
Spectral Embedding Techniques
See [4] for some pointers: MDS, Isomap, PCA, KPCA, LLE, LEmap, HEmap, SDE, MVE, SPE. The difference, as said, lies mostly in the construction of A.
Bibliography
[1] Y. Bengio, J. Paiement, P. Vincent, O. Delalleau, N. Le Roux, and M. Ouimet. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. Advances in Neural Information Processing Systems, 16:177–184, 2004.
[2] C. Bishop. Pattern Recognition and Machine Learning, volume 4. Springer New York, 2006.
[3] M. Brand and K. Huang. A unifying theorem for spectral embedding and clustering. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
[4] P. Hu. Spectral clustering survey, May 2012.
[5] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
Thanks
Q/A
Some supplementary slides for details are attached.
SVD and EVD
Definition of Singular Value Decomposition (SVD):

X_{n×N} = U_{n×k} Σ_{k×k} V_{N×k}^T

Definition of Eigenvalue Decomposition (EVD):

A = X^T X
A = U Λ U^T

Relations:

X^T X = V Σ² V^T
X X^T = U Σ² U^T
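These relations are easy to verify numerically (a sketch using NumPy's thin SVD, with k = min(n, N)):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 7))                       # X_{n x N} with n=4, N=7

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U Sigma V^T, k = 4
S2 = np.diag(s ** 2)                              # Sigma^2
```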
Remarks:
• SVD requires U^T U = I, V^T V = I, and σ_i > 0 (Σ = diag(σ_1, …, σ_k)). This guarantees the uniqueness of the solution.
• EVD does not have such constraints; any U and Λ satisfying AU = UΛ are OK. The requirement U^T U = I is also there to guarantee uniqueness of the solution (e.g. in PCA). Another benefit is the numerical stability of the subspace spanned by U: an orthogonal layout is more error resilient.
• The computation of SVD is done via EVD.
• Watch out for the terms and the objects they refer to.
Out of Sample Embedding
• A new data point x ∈ R^n that is not in X: how to find its low-dimension embedding y ∈ R^d?
• In PCA, we have the principal axes U (XX^T = UΛU^T). Out-of-sample embedding is simple: y = U^T x.
• U_{n×d} is actually a compact representation of knowledge.
• In KPCA and the different variants of SC, we operate on the similarity graph and do not have such a compact representation. It is thus hard to compute the out-of-sample embedding explicitly.
• See [1] for some research on this.
Gaussian Kernel
The Gaussian kernel (let τ = 1/(2σ²)):

k(x_i, x_j) = e^{−‖x_i − x_j‖² / (2σ²)} = e^{−τ ‖x_i − x_j‖²}

Use the Taylor expansion:

e^x = Σ_{k=0}^{∞} (1/k!) x^k

Rewrite the kernel:

k(x_i, x_j) = e^{−τ (x_i − x_j)^T (x_i − x_j)}
            = e^{−τ x_i^T x_i} · e^{−τ x_j^T x_j} · e^{2τ x_i^T x_j}
Focus on the last factor:

e^{2τ x_i^T x_j} = Σ_{k=0}^{∞} (1/k!) (2τ x_i^T x_j)^k

It's hard to write out the form when x_i ∈ R^n, n > 1. We demo the case n = 1, where x_i and x_j are scalars:

e^{2τ x_i x_j} = Σ_{k=0}^{∞} (1/k!) (2τ)^k x_i^k x_j^k = Σ_{k=0}^{∞} c(k) · x_i^k x_j^k
The feature vector is (infinite-dimensional):

φ(x) = e^{−τ x²} [√c(0), √c(1) x, √c(2) x², …, √c(k) x^k, …]

Verify that:

k(x_i, x_j) = φ(x_i) · φ(x_j)

This shows that the Gaussian kernel implicitly maps 1-D data to an infinite-dimensional feature space.
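The verification can be done numerically by truncating the infinite feature vector (a sketch; the helper name `gaussian_feature` and the truncation length are illustrative):

```python
import numpy as np
from math import factorial

def gaussian_feature(x, tau, terms=30):
    """Truncated phi(x): phi_k(x) = exp(-tau x^2) sqrt(c(k)) x^k,
    with c(k) = (2 tau)^k / k!."""
    ks = np.arange(terms)
    c = (2.0 * tau) ** ks / np.array([factorial(int(k)) for k in ks], float)
    return np.exp(-tau * x ** 2) * np.sqrt(c) * x ** ks

tau = 0.5                                  # tau = 1 / (2 sigma^2) with sigma = 1
xi, xj = 0.8, -0.3
k_exact = np.exp(-tau * (xi - xj) ** 2)    # the kernel computed directly
k_series = gaussian_feature(xi, tau) @ gaussian_feature(xj, tau)
```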