A dissimilarity measure for the K-Modes clustering algorithm
![Page 1: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/1.jpg)
A dissimilarity measure for the K-Modes clustering algorithm
Presenter: Bo-Sheng Wang
Authors: Fuyuan Cao, Jiye Liang, Deyu Li, Liang Bai, Chuangyin Dang
KBS, 2012
![Page 2: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/2.jpg)
Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
![Page 3: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/3.jpg)
Motivation
• In this paper, the limitations of the simple matching dissimilarity measure and Ng’s dissimilarity measure are revealed using some illustrative examples.
![Page 4: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/4.jpg)
Limitations of simple matching dissimilarity measure
• Simple matching is a common approach. The simple matching dissimilarity measure is defined as:
$$\delta(x, y) = \begin{cases} 1, & \text{if } x \neq y \\ 0, & \text{otherwise} \end{cases}$$
• However, simple matching often results in:
– Weak intra-cluster similarity.
– Disregard of the similarity hidden between categorical values.
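The per-attribute rule on this slide (1 for a mismatch, 0 for a match) can be sketched in a few lines of Python. The function names `delta` and `simple_matching` are illustrative, and summing over all attributes is the usual way the rule is extended to whole objects, assumed here rather than taken from the slide.

```python
from typing import Sequence


def delta(x: str, y: str) -> int:
    """Per-attribute simple matching: 1 if the two values differ, 0 otherwise."""
    return 1 if x != y else 0


def simple_matching(x: Sequence[str], y: Sequence[str]) -> int:
    """Sum the per-attribute mismatches of two categorical objects."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(delta(a, b) for a, b in zip(x, y))


# Two objects that differ on one of three attributes.
print(simple_matching(("red", "small", "round"), ("red", "large", "round")))
```

Every mismatch costs the same amount no matter how the two values are distributed in the data, which is the limitation the slide points to.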
![Page 5: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/5.jpg)
Limitations of Ng’s dissimilarity measure
• For the k-Modes algorithm with Ng’s dissimilarity measure, the simple matching dissimilarity measure is still used in the first iteration.
– Disregards the similarity hidden between categorical values.
![Page 6: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/6.jpg)
Objectives
• Based on the idea of biological and genetic taxonomy and the rough membership function, a new dissimilarity measure for the k-Modes algorithm is defined.
• The dissimilarity measure between a mode of a cluster and an object is obtained by improving Ng’s dissimilarity measure.
![Page 7: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/7.jpg)
Methodology
• Review some basic concepts of rough set theory.
– Definition 1: Categorical information system IS = (U, A, V, f)
– Definition 2: Binary relation IND(P)
– Definition 3: The rough membership function µ_P^X: U → [0, 1]
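The formulas for these definitions did not survive in the transcript. As a rough illustration, the sketch below uses the standard rough set forms: two objects are in IND(P) when they agree on every attribute in P, and the rough membership of x in a set X is |[x]_P ∩ X| / |[x]_P|. The data layout (a list of tuples, with attributes and objects given by index) and the function names are assumptions made for this example.

```python
from typing import List, Sequence, Set, Tuple


def equivalence_class(U: List[Tuple[str, ...]], P: Sequence[int], x: int) -> Set[int]:
    """Indices of objects indiscernible from object x on the attributes in P."""
    return {i for i, row in enumerate(U) if all(row[a] == U[x][a] for a in P)}


def rough_membership(U: List[Tuple[str, ...]], P: Sequence[int], X: Set[int], x: int) -> float:
    """Standard rough membership: share of x's equivalence class that lies in X."""
    cls = equivalence_class(U, P, x)  # never empty, it always contains x
    return len(cls & X) / len(cls)


U = [("a", "p"), ("a", "q"), ("b", "p"), ("a", "p")]
X = {0, 2}
print(rough_membership(U, [0], X, 0))     # class of object 0 on attribute 0 is {0, 1, 3}
print(rough_membership(U, [0, 1], X, 0))  # class of object 0 on both attributes is {0, 3}
```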
![Page 8: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/8.jpg)
Methodology-A new dissimilarity measure between two objects
• Definition 4: A similarity measure between objects x and y with respect to a
![Page 9: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/9.jpg)
Methodology-A new dissimilarity measure between two objects
• Definition 5: The dissimilarity measure between x and y with respect to P
![Page 10: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/10.jpg)
Methodology-A new dissimilarity measure between two objects
• Example: A new dissimilarity measure between two objects
– Simple matching dissimilarity measure:
– New dissimilarity measure:
![Page 11: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/11.jpg)
Methodology-A new dissimilarity measure between a mode and an object
• Ng’s dissimilarity measure
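The formula for Ng’s measure is not preserved in the transcript. The sketch below assumes its commonly cited form: a mismatch between the mode and the object on an attribute costs 1, while a match costs 1 minus the fraction of cluster members that share the mode’s value on that attribute. The function name and the list-of-tuples layout are illustrative.

```python
from typing import List, Sequence, Tuple


def ng_dissimilarity(mode: Sequence[str], obj: Sequence[str],
                     cluster: List[Tuple[str, ...]]) -> float:
    """Assumed form of Ng's measure between a cluster mode and an object."""
    if not cluster:
        raise ValueError("cluster must contain at least one object")
    total = 0.0
    for j, (z, x) in enumerate(zip(mode, obj)):
        if z != x:
            total += 1.0  # mismatch: full cost, as in simple matching
        else:
            # match: cost shrinks as more cluster members share the mode's value
            share = sum(1 for member in cluster if member[j] == z) / len(cluster)
            total += 1.0 - share
    return total


cluster = [("a", "p"), ("a", "q"), ("b", "p")]
print(ng_dissimilarity(("a", "p"), ("a", "q"), cluster))
```

Because the measure needs cluster frequencies, it cannot be evaluated before clusters exist, which is consistent with the earlier remark that simple matching is still used in the first iteration.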
![Page 12: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/12.jpg)
Methodology-A new dissimilarity measure between a mode and an object
• Definition 7: The new dissimilarity measure between x_i and z_l with respect to P
![Page 13: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/13.jpg)
Methodology-A new dissimilarity measure between a mode and an object
• Example: A new dissimilarity measure between a mode and an object
– Ng’s dissimilarity measure
– New dissimilarity measure
![Page 14: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/14.jpg)
Methodology-Convergence and complexity analysis
• The objective of clustering a set of n = |U| objects into k clusters is to find W and Z that minimize:
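The objective function itself is missing from the transcript. In the usual k-Modes formulation it takes the form below, where W = [w_{li}] is a k × n binary membership matrix, Z = {z_1, ..., z_k} is the set of modes, and d stands for whichever dissimilarity measure is in use. This is the standard form, given here as an assumption rather than a copy of the slide.

```latex
F(W, Z) = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{li} \, d(z_l, x_i)
\quad \text{subject to} \quad
w_{li} \in \{0, 1\}, \qquad
\sum_{l=1}^{k} w_{li} = 1 \ (1 \le i \le n), \qquad
0 < \sum_{i=1}^{n} w_{li} < n \ (1 \le l \le k)
```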
![Page 15: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/15.jpg)
Methodology-Convergence and complexity analysis
• This process can be formulated as the following k-Modes algorithm:
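The algorithm listing is not preserved in the transcript. A minimal sketch of the generic k-Modes loop (assign each object to its nearest mode, then recompute each mode as the most frequent value per attribute) is given below. It uses simple matching as a stand-in for the dissimilarity measure and takes the first k distinct objects as initial modes; both choices are assumptions made to keep the example short, not the authors' setup.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def simple_matching(mode: Sequence[str], obj: Sequence[str]) -> int:
    """Number of attributes on which the mode and the object differ."""
    return sum(1 for z, x in zip(mode, obj) if z != x)


def k_modes(data: List[Tuple[str, ...]], k: int,
            max_iter: int = 100) -> Tuple[List[int], List[Tuple[str, ...]]]:
    """Generic k-Modes loop; returns cluster labels and the final modes."""
    # Initialisation (assumption): the first k distinct objects become the modes.
    modes: List[Tuple[str, ...]] = []
    for row in data:
        if row not in modes:
            modes.append(tuple(row))
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("data contains fewer than k distinct objects")

    labels: List[int] = []
    for _ in range(max_iter):
        # Step 1: assign every object to the nearest mode (ties go to the lower index).
        new_labels = [min(range(k), key=lambda l: simple_matching(modes[l], row))
                      for row in data]
        if new_labels == labels:
            break  # assignments are stable, so the modes would not change either
        labels = new_labels
        # Step 2: update each mode to the most frequent value of every attribute.
        for l in range(k):
            members = [row for row, lab in zip(data, labels) if lab == l]
            if members:
                modes[l] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels, modes


data = [("a", "x", "m"), ("b", "y", "n"), ("a", "x", "n"), ("b", "y", "m")]
labels, modes = k_modes(data, k=2)
print(labels, modes)
```

Swapping `simple_matching` for another mode-to-object measure only changes the assignment step, which is where the measures discussed in these slides differ.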
![Page 16: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/16.jpg)
Methodology-Convergence and complexity analysis
• Now we consider the convergence of the k-Modes algorithm with the proposed dissimilarity measure NDis_P(z_l, x_i).
![Page 17: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/17.jpg)
Methodology-Convergence and complexity analysis
• Proof. For a given W, we have:
![Page 18: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/18.jpg)
Methodology-Convergence and complexity analysis
![Page 19: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/19.jpg)
Methodology-Convergence and complexity analysis
![Page 20: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/20.jpg)
Experiments
• Evaluation on scalability
![Page 21: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/21.jpg)
Experiments
• Evaluation on scalability
![Page 22: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/22.jpg)
Experiments
• Evaluation on clustering efficiency
![Page 23: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/23.jpg)
Conclusions
• The new measure unifies the dissimilarity measure between two objects and the one between an object and a mode.
• The k-Modes algorithm using the new dissimilarity measure can be used safely and effectively on large data sets.
• The results of experiments using synthetic data sets and five real data sets from UCI show the effectiveness of the new dissimilarity measure.
![Page 24: A dissimilarity measure for the K-Modes clustering algorithm](https://reader036.fdocuments.us/reader036/viewer/2022062315/56815ed3550346895dcd66d0/html5/thumbnails/24.jpg)
Comments
• Advantages
– The method can save some time.
• Applications
– Dissimilarity measure