K means Clustering Algorithm
-
Upload
kasun-ranga-wijeweera -
Category
Technology
-
view
4.895 -
download
8
description
Transcript of K means Clustering Algorithm
![Page 2: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/2.jpg)
• Organizing data into classes such that there is
• high intra-class similarity
• low inter-class similarity
• Finding the class labels and the number of classes directly from the data (in contrast to classification).
• More informally, finding natural groupings among objects.
What is Clustering?What is Clustering?
![Page 3: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/3.jpg)
What is a natural grouping among these objects?What is a natural grouping among these objects?
![Page 4: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/4.jpg)
School Employees Simpson's Family Males Females
Clustering is subjectiveClustering is subjective
What is a natural grouping among these objects?What is a natural grouping among these objects?
![Page 5: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/5.jpg)
Defining Distance MeasuresDefining Distance MeasuresDefinition: Let O1 and O2 be two objects from the universe
of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1,O2)
0.23 3 342.7
Kasun Kiosn
![Page 6: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/6.jpg)
Consider a Set of Data Points,Consider a Set of Data Points,
And a Set of Clusters,And a Set of Clusters,
![Page 7: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/7.jpg)
The Goal,The Goal,
![Page 8: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/8.jpg)
Algorithm k-means1. Randomly choose K data items from X as initial centroids.
2. Repeat
Assign each data point to the cluster which has the closest centroid.
Calculate new cluster centroids.
Until the convergence criteria is met.
![Page 9: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/9.jpg)
The data points
![Page 10: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/10.jpg)
Initialization
![Page 11: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/11.jpg)
#Runs = 1
![Page 12: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/12.jpg)
#Runs = 2
![Page 13: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/13.jpg)
#Runs = 3
![Page 14: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/14.jpg)
K-means gets stuck in a local optima
![Page 15: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/15.jpg)
The data points
![Page 16: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/16.jpg)
Initialization
![Page 17: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/17.jpg)
#Runs = 1
![Page 18: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/18.jpg)
#Runs = 2
![Page 19: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/19.jpg)
#Runs = 3
![Page 20: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/20.jpg)
#Runs = 4
![Page 21: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/21.jpg)
Applications of K-means Method
• Optical Character Recognition
• Biometrics
• Diagnostic Systems
• Military Applications
![Page 22: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/22.jpg)
Comments on the Comments on the K-MeansK-Means Method Method
• Strength – Relatively efficient: O(tkn), where n is # objects, k is # clusters,
and t is # iterations. Normally, k, t << n.– Often terminates at a local optimum. The global optimum may
be found using techniques such as: deterministic annealing and genetic algorithms
• Weakness– Applicable only when mean is defined, then what about
categorical data?– Need to specify k, the number of clusters, in advance– Unable to handle noisy data and outliers– Not suitable to discover clusters with non-convex shapes
![Page 23: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/23.jpg)
Any Questions ?Any Questions ?
![Page 24: K means Clustering Algorithm](https://reader034.fdocuments.us/reader034/viewer/2022050710/546e8501b4af9fd7268b4679/html5/thumbnails/24.jpg)
Thanks for your attention !Thanks for your attention !