K-means Clustering via Principal Component Analysis
Based on the paper by Chris Ding and Xiaofeng He, Int’l Conf. on Machine Learning, Banff, Canada, 2004
Traditional K-means Clustering
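Minimizing the sum of squared errors

$$J_K = \sum_{k=1}^{K} \sum_{i \in C_k} (\mathbf{x}_i - \mathbf{m}_k)^2$$

where the data matrix is $X = (\mathbf{x}_1, \dots, \mathbf{x}_n)$ and $\mathbf{x}_i = (x_1, \dots, x_d)$.

Centroid of cluster $C_k$

$$\mathbf{m}_k = \frac{1}{n_k} \sum_{i \in C_k} \mathbf{x}_i$$

$n_k$ is the number of points in $C_k$.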
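As a minimal sketch of the objective above, the following NumPy snippet evaluates the sum of squared errors for a given assignment of points to clusters. The function name `kmeans_objective`, the toy data, and the convention that points are stored as columns of `X` are assumptions made for illustration.

```python
import numpy as np

def kmeans_objective(X, labels):
    """Sum of squared errors J_K for data X (d x n, one point per column)."""
    J = 0.0
    for k in np.unique(labels):
        Ck = X[:, labels == k]                  # points belonging to cluster C_k
        m_k = Ck.mean(axis=1, keepdims=True)    # centroid m_k of the cluster
        J += np.sum((Ck - m_k) ** 2)            # squared distances to the centroid
    return J

# Toy data (made up): 4 points in 2-D
X = np.array([[0.0, 0.0, 10.0, 10.0],
              [0.0, 2.0, 0.0, 2.0]])

J_good = kmeans_objective(X, np.array([0, 0, 1, 1]))  # nearby points grouped together
J_bad = kmeans_objective(X, np.array([0, 1, 0, 1]))   # distant points grouped together
print(J_good, J_bad)
```

The grouping that keeps nearby points together gives the smaller value, which is what K-means tries to find.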
Principal Component Analysis (PCA)
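Centered data matrix

$$Y = (\mathbf{y}_1, \dots, \mathbf{y}_n), \qquad \mathbf{y}_i = \mathbf{x}_i - \bar{\mathbf{x}}, \qquad \bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$

Covariance matrix

$$\frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T = \frac{1}{n-1} Y Y^T$$

The factor $\frac{1}{n-1}$ is ignored.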
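The centering step and the covariance matrix can be checked numerically. This is a small sketch with made-up data, again assuming points are stored as columns; it compares the result against `numpy.cov`, which uses the same 1/(n-1) normalization by default.

```python
import numpy as np

# Toy data (made up): d = 2 features, n = 4 points stored as columns
X = np.array([[1.0, 2.0, 3.0, 6.0],
              [2.0, 0.0, 4.0, 2.0]])
n = X.shape[1]

x_bar = X.mean(axis=1, keepdims=True)   # mean point, shape (d, 1)
Y = X - x_bar                           # centered data matrix
cov = Y @ Y.T / (n - 1)                 # covariance matrix, shape (d, d)

print(np.allclose(cov, np.cov(X)))      # compare with NumPy's estimate
```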
PCA - continuation
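Eigenvalues and eigenvectors

$$Y Y^T \mathbf{u}_k = \lambda_k \mathbf{u}_k, \qquad Y^T Y \mathbf{v}_k = \lambda_k \mathbf{v}_k, \qquad \mathbf{v}_k = Y^T \mathbf{u}_k / \lambda_k^{1/2}$$

Singular value decomposition (SVD)

$$Y = \sum_k \lambda_k^{1/2} \mathbf{u}_k \mathbf{v}_k^T$$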
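These relations can be verified with `numpy.linalg.svd`. The sketch below uses random data (an assumption for illustration) and identifies the squared singular values with the eigenvalues λ_k.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))                 # made-up data: d = 3, n = 8
Y = X - X.mean(axis=1, keepdims=True)       # centered data matrix

# Thin SVD: columns of U are u_k, rows of Vt are v_k^T, s holds lambda_k^(1/2)
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
lam = s ** 2
V = Vt.T

# Eigenvector relations for Y Y^T and Y^T Y
ok_u = np.allclose(Y @ Y.T @ U, U * lam)
ok_v = np.allclose(Y.T @ Y @ V, V * lam)

# v_k = Y^T u_k / lambda_k^(1/2), computed for all k at once
ok_link = np.allclose(Y.T @ U / s, V)

# Y as a sum of rank-one terms lambda_k^(1/2) u_k v_k^T
Y_rebuilt = sum(s[k] * np.outer(U[:, k], V[:, k]) for k in range(len(s)))
ok_svd = np.allclose(Y_rebuilt, Y)

print(ok_u, ok_v, ok_link, ok_svd)
```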
PCA - example
K-means → PCA
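Indicator vectors

$$\mathbf{h}_k = (0, \dots, 0, \overbrace{1, \dots, 1}^{n_k}, 0, \dots, 0)^T / n_k^{1/2}, \qquad H_K = (\mathbf{h}_1, \dots, \mathbf{h}_K)$$

Criterion

$$J_K = \mathrm{Tr}(X^T X) - \mathrm{Tr}(H_K^T X^T X H_K)$$

Linear transform by a K × K orthonormal matrix T

$$Q_K = (\mathbf{q}_1, \dots, \mathbf{q}_K) = H_K T$$

Last column of T

$$\mathbf{t}_K = (\sqrt{n_1/n}, \dots, \sqrt{n_K/n})^T$$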
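To illustrate the trace form of the criterion, the sketch below builds the indicator matrix for a small made-up data set and compares the trace expression with the directly computed sum of squared errors. The helper name `indicator_matrix` and the toy data are assumptions for illustration.

```python
import numpy as np

def indicator_matrix(labels, K):
    """n x K matrix whose k-th column is h_k: 1/sqrt(n_k) on members of C_k, else 0."""
    H = np.zeros((len(labels), K))
    for k in range(K):
        members = labels == k
        H[members, k] = 1.0 / np.sqrt(members.sum())
    return H

# Toy data (made up): 4 points in 2-D stored as columns, two clusters
X = np.array([[0.0, 0.0, 10.0, 10.0],
              [0.0, 2.0, 0.0, 2.0]])
labels = np.array([0, 0, 1, 1])
H = indicator_matrix(labels, K=2)

# Trace form of the criterion
J_trace = np.trace(X.T @ X) - np.trace(H.T @ X.T @ X @ H)

# Direct sum of squared distances to the cluster centroids
J_direct = sum(
    np.sum((X[:, labels == k] - X[:, labels == k].mean(axis=1, keepdims=True)) ** 2)
    for k in range(2)
)

print(J_trace, J_direct)
```

The columns of `H` are orthonormal, which is why multiplying by an orthonormal matrix T leaves the trace unchanged.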
K-means → PCA - continuation
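Therefore

$$\mathbf{q}_K = \sqrt{\frac{n_1}{n}}\, \mathbf{h}_1 + \cdots + \sqrt{\frac{n_K}{n}}\, \mathbf{h}_K = \sqrt{\frac{1}{n}}\, \mathbf{e}$$

where $\mathbf{e}$ is the vector of all ones.

Criterion

$$J_K = \mathrm{Tr}(Y^T Y) - \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1})$$

Optimization becomes

$$\max_{Q_{K-1}} \mathrm{Tr}(Q_{K-1}^T Y^T Y Q_{K-1})$$

Solution is the first K-1 principal components

$$Q_{K-1} = (\mathbf{v}_1, \dots, \mathbf{v}_{K-1})$$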
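A small numerical sketch for K = 2 follows. It checks that q_K is the constant vector e/√n, and that the K-means criterion of any partition is bounded below by Tr(YᵀY) − λ₁, since the maximum of the trace term is the largest eigenvalue. Splitting points by the sign of the first principal component is the K = 2 reading of this result in the Ding and He paper; it is used here as a heuristic on made-up, well-separated data, and all names are assumptions for illustration.

```python
import numpy as np

def sse(X, labels):
    """K-means sum of squared errors for columns of X grouped by labels."""
    return sum(
        np.sum((X[:, labels == k] - X[:, labels == k].mean(axis=1, keepdims=True)) ** 2)
        for k in np.unique(labels)
    )

rng = np.random.default_rng(1)
# Made-up data: two well-separated groups of 2-D points (20 and 30 points)
X = np.hstack([rng.normal(0.0, 0.5, size=(2, 20)),
               rng.normal(5.0, 0.5, size=(2, 30))])
n = X.shape[1]
Y = X - X.mean(axis=1, keepdims=True)

# eigh returns eigenvalues in ascending order, so the last pair is (lambda_1, v_1)
lam, V = np.linalg.eigh(Y.T @ Y)
lam1, v1 = lam[-1], V[:, -1]

# Relaxed optimum for K = 2 gives a lower bound on J_2
lower_bound = np.trace(Y.T @ Y) - lam1

# Heuristic cluster assignment from the sign of the first principal component
labels = (v1 > 0).astype(int)
J = sse(X, labels)

# q_K = sum_k sqrt(n_k / n) h_k should equal e / sqrt(n)
sizes = np.bincount(labels, minlength=2)
H = np.eye(2)[labels] / np.sqrt(sizes)      # indicator matrix H_K
q_K = H @ np.sqrt(sizes / n)

print(np.allclose(q_K, np.ones(n) / np.sqrt(n)), J >= lower_bound)
```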
PCA → K-means
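Clustering by PCA

$$C = \mathbf{e}\mathbf{e}^T / n + \sum_{k=1}^{K-1} \mathbf{v}_k \mathbf{v}_k^T = \sum_{k=1}^{K} \mathbf{q}_k \mathbf{q}_k^T = \sum_{k=1}^{K} \mathbf{h}_k \mathbf{h}_k^T$$

Probability of connectivity between i and j

$$p_{ij} = \frac{c_{ij}}{c_{ii}^{1/2}\, c_{jj}^{1/2}}$$

$$p_{ij} = \begin{cases} 0 & \text{if } p_{ij} < \beta \\ 1 & \text{if } p_{ij} \ge \beta \end{cases} \qquad 0 < \beta < 1, \text{ usually } \beta = 0.5$$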
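The connectivity construction can be sketched as follows. The function name `connectivity`, the threshold symbol `beta`, and the synthetic two-group data are assumptions for illustration; on data this cleanly separated the thresholded matrix is expected to reproduce the group structure, which will not hold in general.

```python
import numpy as np

def connectivity(Y, K, beta=0.5):
    """Thresholded connectivity matrix from the first K-1 principal components."""
    n = Y.shape[1]
    lam, V = np.linalg.eigh(Y.T @ Y)        # eigenvalues in ascending order
    Vk = V[:, ::-1][:, :K - 1]              # v_1, ..., v_{K-1}
    C = np.ones((n, n)) / n + Vk @ Vk.T     # C = e e^T / n + sum_k v_k v_k^T
    d = np.sqrt(np.diag(C))
    P = C / np.outer(d, d)                  # p_ij = c_ij / (c_ii^(1/2) c_jj^(1/2))
    return (P >= beta).astype(int)          # 1 if connected, 0 otherwise

rng = np.random.default_rng(2)
# Made-up data: two tight, well-separated groups of 2-D points
X = np.hstack([rng.normal(0.0, 0.1, size=(2, 20)),
               rng.normal(5.0, 0.1, size=(2, 30))])
Y = X - X.mean(axis=1, keepdims=True)

P = connectivity(Y, K=2)

# Reference: 1 where two points come from the same generated group
true_labels = np.array([0] * 20 + [1] * 30)
same_group = np.equal.outer(true_labels, true_labels).astype(int)
print(np.array_equal(P, same_group))
```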
Eigenvalues
• Case 1: 164030, 58, 5
• Case 2: 212920, 1892, 157
Thank you for your attention