Pattern Recognition and Machine Learning Summer School 2014
Hierarchical methods
Agglomerative clustering
Divisive clustering
Iterative methods
k‐means clustering
EM algorithm
Mean‐shift algorithm
Spectral clustering
Normalized cut
Ratio cut
Graph‐cut
Clustering based on the spectrum of the graph
Spectrum of the graph: the multiset of the eigenvalues of its Laplacian matrix
Treats clustering as a graph partitioning problem without making specific assumptions on the form of the clusters.
Clusters points using eigenvectors of matrices derived from the data.
Maps data to a low‐dimensional space in which the clusters are well separated and can be easily clustered.
L = D (degree matrix) – W (adjacency matrix)
W(i, j): affinity or similarity of the two nodes i and j
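The definition L = D - W can be sketched in a few lines. This is a minimal illustration, assuming a symmetric affinity matrix; the 4-node graph and the helper name `graph_laplacian` are hypothetical and not taken from the slides.

```python
import numpy as np

def graph_laplacian(W):
    """Return (D, L) for a symmetric affinity matrix W, with L = D - W."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))  # degree of node i = sum of row i of W
    return D, D - W

# Hypothetical 4-node graph: nodes 0-1 and 2-3 are strongly tied,
# with one weak edge between nodes 1 and 2.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
D, L = graph_laplacian(W)
print(L)
```

Every row of L sums to zero, because each diagonal entry is the sum of the off-diagonal affinities in that row.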
• Affinity matrix
• Laplacian matrix
• Similarity measures
– Cosine measure
– Bhattacharyya coefficient
• Distance measures
– Euclidean distance
– Manhattan distance
– Maximum distance …
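The measures listed above can be written out as follows. This is a sketch using the usual textbook definitions; the function names are illustrative, and `gaussian_affinity` is an added assumption (a common way to turn a distance into an affinity), not something the slides specify.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions p and q (each sums to 1)."""
    return sum(math.sqrt(x * y) for x, y in zip(p, q))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def maximum_distance(a, b):
    """Largest coordinate-wise difference (Chebyshev distance)."""
    return max(abs(x - y) for x, y in zip(a, b))

def gaussian_affinity(a, b, sigma=1.0):
    """Assumed conversion of a Euclidean distance into an affinity in (0, 1]."""
    return math.exp(-euclidean_distance(a, b) ** 2 / (2 * sigma ** 2))
```

Similarity measures can fill the affinity matrix directly, while distance measures first need a decreasing transform such as the Gaussian kernel assumed here.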
Find a label vector x!
But the discrete problem is NP-hard.
So relax it: convert the discrete problem to the continuous domain.
Average association
Points in dominant cluster are non-zero
The label vector x is divided into 0 and 1
But this favors small, isolated clusters
Sum of the weights of the cut edges
Find the eigenvector with the second-smallest eigenvalue
x^T (D - W) x = x^T D x - x^T W x
= assoc(G1,G) - assoc(G1,G1)
= cut(G1, G2)
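The chain of equalities above can be checked numerically, assuming its left-hand side is the quadratic form x^T (D - W) x for a 0/1 label vector x that marks membership in G1. The 4-node graph below is a hypothetical example.

```python
import numpy as np

# Hypothetical graph: G1 = {0, 1}, G2 = {2, 3}, joined by one weak edge (1, 2).
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
D = np.diag(W.sum(axis=1))
x = np.array([1.0, 1.0, 0.0, 0.0])  # 1 for nodes in G1, 0 for nodes in G2

assoc_G1_G = x @ D @ x        # weight from G1 to every node in the graph
assoc_G1_G1 = x @ W @ x       # weight between nodes inside G1
cut_G1_G2 = x @ W @ (1 - x)   # weight of the edges crossing from G1 to G2
quad = x @ (D - W) @ x        # quadratic form of the Laplacian

print(quad, assoc_G1_G - assoc_G1_G1, cut_G1_G2)
```

All three printed values agree, which is why minimizing the quadratic form over label vectors amounts to minimizing the cut.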
(D-W) * 1 = 0 * 1
The eigenvector of the smallest eigenvalue (0) is the all-ones vector 1.
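A quick numerical check of this property, as a sketch on a hypothetical connected 4-node graph: multiplying the Laplacian by the all-ones vector gives zero, and `numpy.linalg.eigh` (which returns eigenvalues in ascending order) reports 0 as the smallest eigenvalue with a constant eigenvector.

```python
import numpy as np

# Hypothetical connected graph with symmetric affinities.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
L = np.diag(W.sum(axis=1)) - W
ones = np.ones(len(W))

residual = L @ ones                            # (D - W) 1, expected to be the zero vector
eigenvalues, eigenvectors = np.linalg.eigh(L)  # ascending eigenvalues
first = eigenvectors[:, 0]                     # eigenvector of the smallest eigenvalue

print(residual, eigenvalues[0], first)
```

Because this eigenvector is constant, it carries no partition information, which is why the next eigenvector is the one used for clustering.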
y: binary vector representing the cluster association
Favors partitioning into equal-size segments
The second smallest eigenvalue
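A two-way normalized-cut split can be sketched as below. This assumes the standard relaxation (D - W) y = λ D y, solved through the symmetric matrix D^(-1/2) (D - W) D^(-1/2), and assumes that thresholding y at zero is an acceptable way to recover the binary vector; the function name and the 6-node graph are hypothetical.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a graph in two using the second-smallest generalized eigenvector."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)  # assumes every node has positive degree
    L = np.diag(d) - W
    # Symmetric form of the generalized problem (D - W) y = lambda * D y.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigenvalues, eigenvectors = np.linalg.eigh(L_sym)  # ascending order
    y = d_inv_sqrt * eigenvectors[:, 1]  # map back: y = D^(-1/2) z
    labels = (y > 0).astype(int)         # assumed threshold at zero
    return labels, eigenvalues[1]

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by a weak edge (2, 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

labels, lam2 = normalized_cut_bipartition(W)
print(labels, lam2)
```

The two triangles receive different labels; which triangle gets label 1 depends on the arbitrary sign of the eigenvector.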
Based on the edge weights
‘NP-complete’
Find z in
Pros
• Generic framework, can be used with many different features
Cons
• High storage requirement and time complexity
• Bias towards partitioning into equal segments
• Need the number of clusters as parameter
Incremental partitioning
Partition using only one eigenvector at a time
Use procedure recursively
Batch partitioning
Use k eigenvectors
Directly compute k‐way partitioning, for example, by k‐means clustering
Usually performs better
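The batch variant can be sketched as follows: embed each node with the k smallest eigenvectors, then run k-means on the embedded rows. This is a minimal illustration and not a tuned implementation. It assumes the unnormalized Laplacian, a Gaussian affinity with sigma = 1, and a small k-means with deterministic farthest-point initialization; all names and the three-blob data are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with farthest-point initialization (assumed)."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(W, k):
    """k-way partition from the k smallest eigenvectors of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigenvectors = np.linalg.eigh(L)  # ascending eigenvalues
    embedding = eigenvectors[:, :k]      # one k-dimensional row per node
    return kmeans(embedding, k)

# Hypothetical data: three 3x3 grids of points around well-separated centers.
offsets = np.array([(0.3 * i, 0.3 * j) for i in range(3) for j in range(3)])
points = np.vstack([offsets + c for c in [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]])

sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-sq_dist / 2.0)  # Gaussian affinity with sigma = 1 (assumed)
np.fill_diagonal(W, 0.0)

labels = spectral_clustering(W, 3)
print(labels)
```

The incremental variant would instead split on one eigenvector and call the same procedure again on each part.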
Find a low‐dimensional embedding by eigen‐decomposition
Separates the data while projecting it into the low‐dimensional space
Allows effective clustering of non‐convex data
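The non-convex case can be illustrated with two concentric rings, which no straight boundary separates. This is a sketch under stated assumptions: a Gaussian affinity with sigma = 0.5, the unnormalized Laplacian, and a split on the sign of the eigenvector with the second-smallest eigenvalue; the ring data are hypothetical.

```python
import numpy as np

def ring(radius, n_points):
    """Evenly spaced points on a circle of the given radius."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])

# Hypothetical data: inner ring (20 points, radius 1), outer ring (60 points, radius 3).
points = np.vstack([ring(1.0, 20), ring(3.0, 60)])

sigma = 0.5  # assumed kernel width: links ring neighbours, barely links the two rings
sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-sq_dist / (2.0 * sigma ** 2))
np.fill_diagonal(W, 0.0)

L = np.diag(W.sum(axis=1)) - W
eigenvalues, eigenvectors = np.linalg.eigh(L)  # ascending eigenvalues
second = eigenvectors[:, 1]                    # one-dimensional embedding
labels = (second > 0).astype(int)

print(labels[:20], labels[20:])
```

In the one-dimensional embedding the two rings sit on opposite sides of zero, so a simple threshold recovers them. The result is sensitive to sigma: a much larger value would connect the rings and blur the split.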
Thank you !