CS260: Machine Learning Algorithms
Lecture 5: Clustering

Cho-Jui Hsieh, UCLA
Jan 23, 2019
Supervised versus Unsupervised Learning
Supervised Learning:
Learning from labeled observations
Classification, regression, . . .
Unsupervised Learning:
Learning from unlabeled observations
Discover hidden patterns
Clustering (today)
Kmeans Clustering
Clustering

Given $\{x_1, x_2, \ldots, x_n\}$ and $K$ (number of clusters)
Output: $A(x_i) \in \{1, 2, \ldots, K\}$ (cluster membership)
Two circles
Can we split the data into two clusters?
Clustering is Subjective
Non-trivial to say one partition is better than others
Each algorithm has two parts:
Define the objective function
Design an algorithm to minimize this objective function
K-means Objective Function

Partition the dataset into $C_1, C_2, \ldots, C_K$ to minimize the following objective:

$$J = \sum_{k=1}^{K} \sum_{x \in C_k} \|x - m_k\|_2^2,$$

where $m_k$ is the mean of $C_k$.

Multiple ways to minimize this objective:
Hierarchical Agglomerative Clustering
Kmeans Algorithm (Today)
...
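To make the objective concrete, it can be computed directly from a candidate partition. A minimal NumPy sketch (the helper name `kmeans_objective` and the toy data are illustrative, not from the lecture):

```python
import numpy as np

def kmeans_objective(X, labels, K):
    """J = sum over clusters of squared distances to the cluster mean m_k."""
    J = 0.0
    for k in range(K):
        Ck = X[labels == k]
        if len(Ck) > 0:
            mk = Ck.mean(axis=0)          # m_k: mean of cluster C_k
            J += ((Ck - mk) ** 2).sum()   # sum_{x in C_k} ||x - m_k||_2^2
    return J

# two tight pairs of points, correctly partitioned
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
J = kmeans_objective(X, labels, K=2)
```

Each cluster mean sits 0.5 away from its two points, so J = 4 × 0.25 = 1 here.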
K-means Algorithm
K-means Algorithm

Re-write the objective:

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - m_k\|_2^2,$$

where $r_{nk} \in \{0, 1\}$ is an indicator variable:
$r_{nk} = 1$ if and only if $x_n \in C_k$

Alternating optimization between $\{r_{nk}\}$ and $\{m_k\}$:
Fix $\{m_k\}$ and update $\{r_{nk}\}$
Fix $\{r_{nk}\}$ and update $\{m_k\}$
K-means Algorithm

Step 0: Initialize $\{m_k\}$ to some values
Step 1: Fix $\{m_k\}$ and minimize over $\{r_{nk}\}$:

$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \|x_n - m_j\|_2^2 \\ 0 & \text{otherwise} \end{cases}$$

Step 2: Fix $\{r_{nk}\}$ and minimize over $\{m_k\}$:

$$m_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}$$

Step 3: Return to step 1 unless the stopping criterion is met
K-means Algorithm

Equivalent to the following procedure:

Step 0: Initialize centers $\{m_k\}$ to some values
Step 1: Assign each $x_n$ to the nearest center:

$$A(x_n) = \arg\min_j \|x_n - m_j\|_2^2$$

Update clusters:

$$C_k = \{x_n : A(x_n) = k\} \quad \forall k = 1, \ldots, K$$

Step 2: Calculate the mean of each cluster $C_k$:

$$m_k = \frac{1}{|C_k|} \sum_{x_n \in C_k} x_n$$

Step 3: Return to step 1 unless the stopping criterion is met
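The steps above can be sketched directly in NumPy. This is a minimal illustration (random data-point initialization, convergence declared when assignments stop changing), not production code:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 0: initialize centers as K distinct random data points
    m = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = None
    for _ in range(n_iters):
        # Step 1: assign each x_n to the nearest center
        d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # n x K distances
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # step 1 unchanged => converged
        labels = new_labels
        # Step 2: recompute each center as the mean of its cluster
        for k in range(K):
            if (labels == k).any():
                m[k] = X[labels == k].mean(axis=0)
    return labels, m

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans(X, K=2)
```

On this toy data the two well-separated pairs end up in different clusters regardless of which points are chosen as initial centers.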
More on K-means Algorithm

Each update decreases (never increases) the objective function
The objective remains unchanged when step 1 doesn't change the cluster assignment ⇒ converged
May not converge to the global minimum
Sensitive to initial values
Kmeans++: a better way to initialize the centers
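The k-means++ seeding idea can be sketched as follows: each new center is sampled with probability proportional to its squared distance from the nearest already-chosen center (a hedged sketch; the function name `kmeanspp_init` and the data are illustrative):

```python
import numpy as np

def kmeanspp_init(X, K, seed=0):
    """k-means++ seeding: D^2 weighting spreads the initial centers out."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]            # first center: uniform at random
    for _ in range(K - 1):
        # squared distance of every point to its nearest chosen center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                      # D^2 weighting
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = kmeanspp_init(X, K=2)
```

With this weighting, once a point from one pair is picked, the second center is overwhelmingly likely to come from the far-away pair.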
Graph Clustering
Graph Clustering

Given a graph $G = (V, E, W)$:
$V$: nodes $\{v_1, \cdots, v_n\}$
$E$: edges $\{e_1, \cdots, e_m\}$
$W$: weight matrix,

$$W_{ij} = \begin{cases} w_{ij} & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$

Goal: Partition $V$ into $k$ clusters of nodes:

$$V = V_1 \cup V_2 \cup \cdots \cup V_k, \quad V_i \cap V_j = \emptyset \ \ \forall i \neq j$$
Similarity Graph

Example: similarity graph
Given samples $x_1, \ldots, x_n$
Weights (similarities) indicate "closeness of samples"
Similarity Graph

E.g., Gaussian kernel: $W_{ij} = e^{-\|x_i - x_j\|^2 / \sigma^2}$
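Building the Gaussian-kernel weight matrix is a one-liner in NumPy (a sketch; zeroing the diagonal to avoid self-loops is a common convention, not something the slide specifies):

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """W_ij = exp(-||x_i - x_j||^2 / sigma^2), diagonal zeroed (no self-loops)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # pairwise sq. distances
    W = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(W, 0.0)
    return W

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
W = gaussian_similarity(X, sigma=1.0)
```

Nearby points get weights near 1; the far-away third point is weakly connected to both.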
Social graph

Nodes: users in a social network
Edges: $W_{ij} = 1$ if users $i$ and $j$ are friends, otherwise $W_{ij} = 0$
Partitioning into Two Clusters

Partition the graph into two sets $V_1, V_2$ to minimize the cut value:

$$\text{cut}(V_1, V_2) = \sum_{v_i \in V_1, v_j \in V_2} W_{ij}$$

Also, the sizes of $V_1, V_2$ need to be similar (balance)
One classical way of enforcing balance:

$$\min_{V_1, V_2} \text{cut}(V_1, V_2) \quad \text{s.t.} \quad |V_1| = |V_2|, \ V_1 \cup V_2 = \{1, \cdots, n\}, \ V_1 \cap V_2 = \emptyset$$

⇒ this is NP-hard (no polynomial-time algorithm is known)
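Evaluating the cut value for a given partition is straightforward; only the search over balanced partitions is hard. A sketch (the helper name `cut_value` and the toy graph are illustrative):

```python
import numpy as np

def cut_value(W, in_V1):
    """cut(V1, V2) = total weight of edges crossing the partition."""
    in_V1 = np.asarray(in_V1, dtype=bool)
    return W[np.ix_(in_V1, ~in_V1)].sum()

# two triangles (nodes 0-2 and 3-5) joined by a single unit edge 2-3
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

cut = cut_value(W, [True, True, True, False, False, False])  # balanced split
```

The balanced split along the bridge cuts exactly one edge.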
Kernighan-Lin Algorithm

Start with some partitioning $V_1, V_2$
Calculate the change in cut if two vertices are swapped
Swap the pair of vertices (one in $V_1$, one in $V_2$) that decreases the cut the most
Iterate until convergence
Used when we need exactly balanced clusters
(e.g., circuit design)
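The swap step can be sketched as a greedy pass: try every cross-partition pair, apply the swap with the largest cut decrease, and repeat. This is a simplified illustration only; the full Kernighan-Lin algorithm additionally chains tentative swaps to escape local minima.

```python
import numpy as np

def cut_value(W, in_V1):
    in_V1 = np.asarray(in_V1, dtype=bool)
    return W[np.ix_(in_V1, ~in_V1)].sum()

def greedy_swap_pass(W, in_V1):
    """Repeatedly swap the (u in V1, v in V2) pair that most decreases the
    cut; stop when no swap helps. Swapping keeps |V1| = |V2| fixed."""
    in_V1 = np.array(in_V1, dtype=bool)
    improved = True
    while improved:
        improved = False
        base = cut_value(W, in_V1)
        best, best_pair = 0.0, None
        for u in np.where(in_V1)[0]:
            for v in np.where(~in_V1)[0]:
                trial = in_V1.copy()
                trial[u], trial[v] = False, True     # u leaves V1, v enters V1
                gain = base - cut_value(W, trial)
                if gain > best:
                    best, best_pair = gain, (u, v)
        if best_pair is not None:
            u, v = best_pair
            in_V1[u], in_V1[v] = False, True
            improved = True
    return in_V1

# two triangles joined by edge 2-3, starting from a bad balanced split
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
part = greedy_swap_pass(W, [True, True, False, True, False, False])
```

Starting from V1 = {0, 1, 3} (cut = 5), a single swap of vertices 3 and 2 recovers the natural split V1 = {0, 1, 2} with cut 1.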
Objective function that considers balance

Ratio-Cut:

$$\min_{V_1, V_2} \left\{ \frac{\text{Cut}(V_1, V_2)}{|V_1|} + \frac{\text{Cut}(V_1, V_2)}{|V_2|} \right\} := RC(V_1, V_2)$$

Normalized-Cut:

$$\min_{V_1, V_2} \left\{ \frac{\text{Cut}(V_1, V_2)}{\deg(V_1)} + \frac{\text{Cut}(V_1, V_2)}{\deg(V_2)} \right\} := NC(V_1, V_2),$$

where

$$\deg(V_c) := \sum_{v_i \in V_c, (i,j) \in E} W_{ij} = \text{links}(V_c, V)$$
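Both balanced objectives are easy to evaluate for a given partition (a sketch; function names and the toy graph are illustrative):

```python
import numpy as np

def cut(W, mask):
    mask = np.asarray(mask, dtype=bool)
    return W[np.ix_(mask, ~mask)].sum()

def ratio_cut(W, mask):
    """RC(V1, V2): cut normalized by cluster sizes."""
    mask = np.asarray(mask, dtype=bool)
    c = cut(W, mask)
    return c / mask.sum() + c / (~mask).sum()

def normalized_cut(W, mask):
    """NC(V1, V2): cut normalized by cluster degrees deg(Vc) = links(Vc, V)."""
    mask = np.asarray(mask, dtype=bool)
    deg = W.sum(axis=1)                 # weighted degree of each node
    c = cut(W, mask)
    return c / deg[mask].sum() + c / deg[~mask].sum()

# two triangles joined by one edge; split along the bridge
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
mask = [True, True, True, False, False, False]
rc, nc = ratio_cut(W, mask), normalized_cut(W, mask)
```

Here cut = 1, |V1| = |V2| = 3, and deg(V1) = deg(V2) = 7, giving RC = 2/3 and NC = 2/7.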
Generalize to k clusters

Ratio-Cut:

$$\min_{V_1, \cdots, V_k} \sum_{c=1}^{k} \frac{\text{Cut}(V_c, V - V_c)}{|V_c|}$$

Normalized-Cut:

$$\min_{V_1, \cdots, V_k} \sum_{c=1}^{k} \frac{\text{Cut}(V_c, V - V_c)}{\deg(V_c)}$$
Reformulation

Recall $\deg(V_c) = \text{links}(V_c, V)$

Define a diagonal degree matrix

$$D = \begin{pmatrix} \deg(v_1) & 0 & 0 & \cdots \\ 0 & \deg(v_2) & 0 & \cdots \\ 0 & 0 & \deg(v_3) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

$y_c \in \{0, 1\}^n$: indicator vector for the $c$-th cluster

We have:

$$y_c^T y_c = |V_c|, \quad y_c^T D y_c = \deg(V_c), \quad y_c^T W y_c = \text{links}(V_c, V_c)$$
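These three identities are easy to check numerically on a small graph (a sketch; the toy graph is illustrative):

```python
import numpy as np

# two triangles joined by edge 2-3
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
D = np.diag(W.sum(axis=1))                      # diagonal degree matrix

y = np.array([1, 1, 1, 0, 0, 0], dtype=float)   # indicator of V_1 = {0, 1, 2}
size  = y @ y         # y^T y    = |V_1|
deg   = y @ D @ y     # y^T D y  = deg(V_1) = links(V_1, V)
links = y @ W @ y     # y^T W y  = links(V_1, V_1), internal weight (counted both ways)
```

Here |V1| = 3, deg(V1) = 2 + 2 + 3 = 7, links(V1, V1) = 6, and their difference deg − links = 1 is exactly the cut, which is what the ratio-cut derivation on the next slide exploits.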
Ratio Cut

Rewrite the ratio-cut objective:

$$\begin{aligned}
RC(V_1, \cdots, V_k) &= \sum_{c=1}^{k} \frac{\text{Cut}(V_c, V - V_c)}{|V_c|}
 = \sum_{c=1}^{k} \frac{\deg(V_c) - \text{links}(V_c, V_c)}{|V_c|} \\
&= \sum_{c=1}^{k} \frac{y_c^T D y_c - y_c^T W y_c}{y_c^T y_c}
 = \sum_{c=1}^{k} \frac{y_c^T (D - W) y_c}{y_c^T y_c}
 = \sum_{c=1}^{k} \frac{y_c^T L y_c}{y_c^T y_c}
\end{aligned}$$

($L = D - W$ is the "Graph Laplacian")
More on Graph Laplacian

$L$ is symmetric positive semi-definite
For any $x$,

$$x^T L x = \frac{1}{2} \sum_{(i,j)} W_{ij} (x_i - x_j)^2$$
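Both claims can be verified numerically: the quadratic form matches the pairwise sum (the 1/2 compensates for counting each ordered pair twice), and the eigenvalues are non-negative. A sketch on a random symmetric weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2                   # symmetric weights
np.fill_diagonal(W, 0.0)            # no self-loops
L = np.diag(W.sum(axis=1)) - W      # graph Laplacian L = D - W

x = rng.standard_normal(n)
quad = x @ L @ x
# 1/2 * sum over all ordered pairs (i, j) of W_ij (x_i - x_j)^2
pairwise = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                     for i in range(n) for j in range(n))
```

The identity also makes positive semi-definiteness obvious: the right-hand side is a sum of non-negative terms.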
Solving Ratio-Cut

We have shown Ratio-Cut is equivalent to

$$RCut = \sum_{c=1}^{k} \frac{y_c^T L y_c}{y_c^T y_c} = \sum_{c=1}^{k} \left( \frac{y_c}{\|y_c\|} \right)^T L \, \frac{y_c}{\|y_c\|}$$

Define $\bar{y}_c = y_c / \|y_c\|$ (normalized indicator) and $Y = [\bar{y}_1, \bar{y}_2, \cdots, \bar{y}_k]$ ⇒ $Y^T Y = I$

Relaxed to a real-valued problem:

$$\min_{Y^T Y = I} \text{Trace}(Y^T L Y)$$

Solution: eigenvectors corresponding to the smallest $k$ eigenvalues of $L$
Solving Ratio-Cut

Let $Y^* \in \mathbb{R}^{n \times k}$ be these eigenvectors. Are we done?
No: $Y^*$ does not have 0/1 values (it is not a set of indicators),
since we are solving a relaxed problem
Solution: run k-means on the rows of $Y^*$

Summary of spectral clustering algorithms:
Compute $Y^* \in \mathbb{R}^{n \times k}$: the eigenvectors corresponding to the $k$ smallest eigenvalues of the (normalized) Laplacian matrix
Run k-means to cluster the rows of $Y^*$
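The two-step summary above can be sketched end to end. This is a minimal illustration: it uses the unnormalized Laplacian, and a deterministic farthest-point initialization for the embedded k-means instead of random seeding (a choice made here for reproducibility, not prescribed by the lecture):

```python
import numpy as np

def spectral_clustering(W, k):
    """Embed nodes via the eigenvectors of the k smallest eigenvalues of
    L = D - W, then run a tiny k-means on the rows of the embedding."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)     # eigh returns ascending eigenvalues
    Y = eigvecs[:, :k]                 # n x k spectral embedding (rows = nodes)
    # deterministic farthest-point initialization of the k centers
    centers = [Y[0]]
    for _ in range(k - 1):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d2.argmax()])
    centers = np.array(centers)
    labels = None
    for _ in range(100):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                      # assignments unchanged => converged
        labels = new_labels
        for c in range(k):
            if (labels == c).any():
                centers[c] = Y[labels == c].mean(axis=0)
    return labels

# two triangles joined by one weak edge: the bridge should be cut
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = spectral_clustering(W, k=2)
```

The second eigenvector (the Fiedler vector) takes opposite signs on the two triangles, so k-means on the embedding separates them along the weak bridge.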
Eigenvectors of Laplacian

If the graph is disconnected ($k$ connected components), the Laplacian is block diagonal and the first $k$ eigenvectors (eigenvalue 0) are the indicator vectors of the components:
Eigenvectors of Laplacian

What if the graph is connected?
Then there is only one eigenvector with the smallest eigenvalue 0:

$$L \mathbf{1} = (D - A) \mathbf{1} = 0$$

($\mathbf{1} = [1, 1, \cdots, 1]^T$ is the eigenvector with eigenvalue 0)
However, the 2nd through $k$-th smallest eigenvectors are still useful for clustering
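A quick numerical check on a connected toy graph: the all-ones vector is in the null space of $L$, and the second-smallest eigenvalue is strictly positive precisely because the graph is connected (the toy graph is illustrative):

```python
import numpy as np

# two triangles joined by a bridge edge 2-3: connected
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W      # L = D - A for this 0/1 adjacency

ones = np.ones(6)
eigvals = np.sort(np.linalg.eigvalsh(L))
```

The smallest eigenvalue is 0 (eigenvector 1); the second eigenvalue is positive, and its eigenvector is the one that distinguishes the two triangles.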
Normalized Cut

Rewrite Normalized Cut:

$$NCut = \sum_{c=1}^{k} \frac{\text{Cut}(V_c, V - V_c)}{\deg(V_c)} = \sum_{c=1}^{k} \frac{y_c^T (D - A) y_c}{y_c^T D y_c}$$

Let $\bar{y}_c = \dfrac{D^{1/2} y_c}{\|D^{1/2} y_c\|}$, then

$$NCut = \sum_{c=1}^{k} \frac{\bar{y}_c^T D^{-1/2} (D - A) D^{-1/2} \bar{y}_c}{\bar{y}_c^T \bar{y}_c}$$

Normalized Laplacian:

$$\bar{L} = D^{-1/2} (D - A) D^{-1/2} = I - D^{-1/2} A D^{-1/2}$$

Normalized Cut → eigenvectors corresponding to the smallest eigenvalues of $\bar{L}$
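The normalized Laplacian is as easy to form as the unnormalized one, and its null vector is now $D^{1/2}\mathbf{1}$ rather than $\mathbf{1}$ (a sketch on the same illustrative toy graph):

```python
import numpy as np

# two triangles joined by edge 2-3
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
deg = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L_norm = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # I - D^{-1/2} A D^{-1/2}

# D^{1/2} 1 = sqrt(deg) is an eigenvector of L_norm with eigenvalue 0
v = np.sqrt(deg)
residual = np.linalg.norm(L_norm @ v)
```

Like the unnormalized Laplacian, $\bar{L}$ is symmetric positive semi-definite, so its smallest eigenvalues are well defined targets for the spectral embedding.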
Kmeans vs Spectral Clustering

Kmeans: decision boundary is linear
Spectral clustering: boundaries can be non-convex curves
$\sigma$ in $W_{ij} = e^{-\|x_i - x_j\|^2 / \sigma^2}$ controls the clustering results (focus on local or global structure)
Conclusions

Kmeans objective: minimize the distance of points to their cluster centers
Kmeans algorithm: an optimization algorithm to minimize this objective
Graph clustering ⇒ related to the eigenvectors of the graph Laplacian
Questions?