A modified version of the K-means algorithm with a distance based on cluster symmetry

Intelligent Database Systems Lab, National Yunlin University of Science and Technology. A modified version of the K-means algorithm with a distance based on cluster symmetry. Advisor: Dr. Hsu. Reporter: Chun Kai Chen. Authors: Mu-Chun Su and Chien-Hsing Chou. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2001


Page 1: A modified version of the K-means algorithm with a distance based on cluster symmetry

Intelligent Database Systems Lab

National Yunlin University of Science and Technology

A modified version of the K-means algorithm with a distance based on cluster symmetry

Advisor: Dr. Hsu
Reporter: Chun Kai Chen
Authors: Mu-Chun Su and Chien-Hsing Chou

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2001

Page 2: A modified version of the K-means algorithm with a distance based on cluster symmetry

Outline

─ Motivation
─ Objective
─ Introduction
─ The Point Symmetry Distance
─ Experimental Results
─ Conclusions
─ Personal Opinion

Page 3: A modified version of the K-means algorithm with a distance based on cluster symmetry

Motivation

Since clusters can be of arbitrary shapes and sizes, the Minkowski metrics do not seem a good choice when no a priori information exists about the geometric characteristics of the data set to be clustered.

Page 4: A modified version of the K-means algorithm with a distance based on cluster symmetry

Objective

Therefore, we have to find another, more flexible measure
─ One of the basic features of shapes and objects is symmetry

Propose a nonmetric measure based on the concept of point symmetry

Page 5: A modified version of the K-means algorithm with a distance based on cluster symmetry

K-means Partitional Clustering

[Figure: four scatter plots, both axes 0 to 10, linked by the steps "Update the cluster means" and "reassign"]

Page 6: A modified version of the K-means algorithm with a distance based on cluster symmetry

Symmetry-based version of the K-means algorithm

[Figure: three scatter plots, both axes 0 to 10, with the steps "Update the cluster means" and "reassign" grouped into Coarse-Tuning and Fine-Tuning]

Page 7: A modified version of the K-means algorithm with a distance based on cluster symmetry

Introduction (1/4)

Most of the conventional clustering methods assume that patterns having similar locations or constant density create a single cluster
─ Location or density becomes a characteristic property of a cluster

Page 8: A modified version of the K-means algorithm with a distance based on cluster symmetry

Introduction (2/4)

Mathematically identifying clusters in a data set
─ It is usually necessary to first define a measure of similarity or proximity, which establishes a rule for assigning patterns to the domain of a particular cluster center
─ The most popular similarity measure is the Euclidean distance

Page 9: A modified version of the K-means algorithm with a distance based on cluster symmetry

Introduction (3/4)

Euclidean distance as a measure of similarity
─ Hyperspherical-shaped clusters of equal size are usually detected

Mahalanobis distance
─ Takes care of hyperellipsoidal-shaped clusters and is one of the popular choices
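To make the contrast between the two distances concrete, here is a minimal sketch in plain Python. The covariance matrix, the two test points, and the helper names `euclidean` and `mahalanobis_2d` are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def euclidean(x, c):
    """Euclidean distance between points x and c."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def mahalanobis_2d(x, c, cov):
    """Mahalanobis distance for 2-D points, given a 2x2 covariance matrix."""
    (a, b), (p, q) = cov
    det = a * q - b * p
    # Inverse of a 2x2 matrix, written out by hand.
    inv = ((q / det, -b / det), (-p / det, a / det))
    dx, dy = x[0] - c[0], x[1] - c[1]
    # Quadratic form (x - c)^T * inv * (x - c).
    quad = (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(quad)

center = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))  # hypothetical cluster stretched along the first axis
along = (2.0, 0.0)              # point on the long axis of the cluster
across = (0.0, 2.0)             # point on the short axis of the cluster

# Euclidean distance treats both points alike; Mahalanobis distance
# ranks the point on the long axis as closer to the center.
print(euclidean(along, center), euclidean(across, center))
print(mahalanobis_2d(along, center, cov), mahalanobis_2d(across, center, cov))
```

The Euclidean distance cannot tell the two points apart, while the Mahalanobis distance accounts for the elongated shape of the cluster.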

Page 10: A modified version of the K-means algorithm with a distance based on cluster symmetry

Introduction (4/4)

The major difficulty in using the Mahalanobis distance
─ The inverse of the sample covariance matrix has to be recomputed every time a pattern changes its cluster domain, which is computationally expensive
─ In fact, not only the similarity measure but also the number of clusters, which cannot always be defined a priori, will influence the clustering results

In this paper
─ We focus on the selection of similarity measures

Page 11: A modified version of the K-means algorithm with a distance based on cluster symmetry

Symmetry

Symmetry is so common in the abstract and in nature
─ It is reasonable to assume that some kind of symmetry exists in the structures of clusters
─ The immediate problem is how to find a metric to measure symmetry

Page 12: A modified version of the K-means algorithm with a distance based on cluster symmetry

The Point Symmetry Distance

The point symmetry distance is defined as follows: given N patterns x_i, i = 1, …, N, and a reference vector c (e.g., a cluster centroid), the point symmetry distance between a pattern x_j and c is

$$ d_s(x_j, c) = \min_{i = 1, \ldots, N,\; i \neq j} \frac{\lVert (x_j - c) + (x_i - c) \rVert}{\lVert x_j - c \rVert + \lVert x_i - c \rVert} \qquad (2) $$

─ The denominator term is used to normalize the distance
─ If the right-hand term of (2) is minimized when x_i = x_j*, then the pattern x_j* is denoted as the symmetrical pattern relative to x_j with respect to c
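The definition in (2) can be sketched in a few lines of plain Python. The function names `norm` and `point_symmetry_distance`, the four-point ring, and the handling of a zero denominator are assumptions made for this illustration, not details taken from the slides.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(a * a for a in v))

def point_symmetry_distance(j, patterns, c):
    """d_s(x_j, c) as in Eq. (2): minimum over all other patterns x_i."""
    dj = [a - b for a, b in zip(patterns[j], c)]
    best = math.inf
    for i, xi in enumerate(patterns):
        if i == j:
            continue  # the minimum excludes the pattern itself
        di = [a - b for a, b in zip(xi, c)]
        denom = norm(dj) + norm(di)
        if denom == 0.0:
            # Both patterns coincide with c, so the ratio is undefined.
            # Skipping the pair is an assumption of this sketch.
            continue
        num = norm([a + b for a, b in zip(dj, di)])
        best = min(best, num / denom)
    return best

# Hypothetical data: four points on a ring around the origin.
ring = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

# Relative to the ring's center, every point has an exact mirror image.
print(point_symmetry_distance(0, ring, (0.0, 0.0)))
# Relative to an off-center reference vector, no mirror image exists.
print(point_symmetry_distance(0, ring, (3.0, 0.0)))
```

A value near 0 means some other pattern sits almost exactly opposite x_j across c. A value near 1 means no such pattern exists.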

Page 13: A modified version of the K-means algorithm with a distance based on cluster symmetry

Example of The Point Symmetry Distance

$$ d_s(x_j, c) = \min_{i = 1, \ldots, N,\; i \neq j} \frac{\lVert (x_j - c) + (x_i - c) \rVert}{\lVert x_j - c \rVert + \lVert x_i - c \rVert} $$
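The figure that accompanied this slide is not available in the transcript, so the following is a small worked instance with hypothetical numbers (three 2-D patterns and a reference vector at the origin), intended only to show how the minimum is evaluated.

```latex
% Hypothetical data: c = (0,0), x_1 = (2,1), x_2 = (-2,-1), x_3 = (1,3).
\begin{align*}
i = 2:\quad & \frac{\lVert (x_1 - c) + (x_2 - c) \rVert}{\lVert x_1 - c \rVert + \lVert x_2 - c \rVert}
            = \frac{\lVert (0, 0) \rVert}{\sqrt{5} + \sqrt{5}} = 0 \\
i = 3:\quad & \frac{\lVert (x_1 - c) + (x_3 - c) \rVert}{\lVert x_1 - c \rVert + \lVert x_3 - c \rVert}
            = \frac{\lVert (3, 4) \rVert}{\sqrt{5} + \sqrt{10}} = \frac{5}{\sqrt{5} + \sqrt{10}} \approx 0.926 \\
& d_s(x_1, c) = \min(0,\ 0.926) = 0, \qquad x_1^{*} = x_2
\end{align*}
```

Here x_2 is the exact mirror image of x_1 through c, so it is the symmetrical pattern and the distance is 0.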

Page 14: A modified version of the K-means algorithm with a distance based on cluster symmetry

Symmetry-based version of the K-means algorithm (1/3)

Step 1: Initialization
─ Randomly choose K data points from the data set to initialize the K cluster centroids c1, c2, ..., cK

Step 2: Coarse-Tuning
─ Use the ordinary K-means algorithm with the Euclidean distance to update the K cluster centroids
─ After the K cluster centroids converge or some terminating criterion is satisfied, proceed to Step 3
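Steps 1 and 2 amount to ordinary K-means, which can be sketched as follows. The function names, the `seed` and `max_iter` parameters, the empty-cluster handling, and the toy data set are assumptions for illustration only.

```python
import math
import random

def nearest(x, centroids):
    """Index of the centroid closest to x in Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def coarse_tuning(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centroids = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: ordinary K-means, assign each pattern then update the means.
        new_labels = [nearest(x, centroids) for x in data]
        if new_labels == labels:
            break  # no pattern changed cluster, so the centroids have converged
        labels = new_labels
        for c in range(k):
            members = [x for x, l in zip(data, labels) if l == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean(members)
    return centroids, labels

# Hypothetical data: two compact, well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = coarse_tuning(data, k=2)
print(centroids, labels)
```

The resulting centroids serve as the starting point for the fine-tuning stage.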

Page 15: A modified version of the K-means algorithm with a distance based on cluster symmetry

Symmetry-based version of the K-means algorithm (2/3)

Step 3: Fine-Tuning
─ For pattern x, find the cluster centroid nearest to it in the symmetrical sense:

$$ k^* = \arg\min_{k = 1, \ldots, K} d_s(x, c_k) $$

where d_s(x, c_k) is the point symmetry distance

─ If the point symmetry distance is smaller than a prespecified parameter θ, then assign the data point x to the k*-th cluster
─ Otherwise, the data point is assigned to the cluster centroid k* using the following criterion:

$$ k^* = \arg\min_{k = 1, \ldots, K} d(x, c_k) $$

where d(x, c_k) is the Euclidean distance
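The Step 3 rule can be sketched as a single function: first try the point symmetry distance, then fall back to the Euclidean distance when the symmetry match is weaker than θ. The names `psd` and `fine_tune_assign`, the value θ = 0.1, and the data are assumptions for this illustration.

```python
import math

def psd(j, data, c):
    """Point symmetry distance of data[j] with respect to reference vector c."""
    dj = [a - b for a, b in zip(data[j], c)]
    nj = math.hypot(*dj)
    best = math.inf
    for i, xi in enumerate(data):
        if i == j:
            continue
        di = [a - b for a, b in zip(xi, c)]
        denom = nj + math.hypot(*di)
        if denom > 0.0:  # assumption: skip pairs that both coincide with c
            s = [a + b for a, b in zip(dj, di)]
            best = min(best, math.hypot(*s) / denom)
    return best

def fine_tune_assign(j, data, centroids, theta):
    """Step 3 for pattern data[j]; theta is the prespecified threshold."""
    ds = [psd(j, data, c) for c in centroids]
    k_sym = min(range(len(centroids)), key=ds.__getitem__)
    if ds[k_sym] < theta:
        return k_sym  # accepted in the symmetrical sense
    # Otherwise fall back to the nearest centroid in Euclidean distance.
    de = [math.dist(data[j], c) for c in centroids]
    return min(range(len(centroids)), key=de.__getitem__)

# Hypothetical data: a four-point ring around (0, 0) and one stray point.
data = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (5.0, 0.2)]
centroids = [(0.0, 0.0), (5.0, 0.0)]

print(fine_tune_assign(0, data, centroids, theta=0.1))  # ring point
print(fine_tune_assign(4, data, centroids, theta=0.1))  # stray point
```

The ring point is assigned through symmetry because its mirror image exists. The stray point has no good mirror image around either centroid, so the Euclidean rule decides.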

Page 16: A modified version of the K-means algorithm with a distance based on cluster symmetry

Symmetry-based version of the K-means algorithm (3/3)

Step 4: Updating
─ Compute the new centroids of the K clusters:

$$ c_k(t+1) = \frac{1}{N_k} \sum_{x_i \in S_k(t)} x_i $$

─ where S_k(t) is the set whose elements are the patterns assigned to the k-th cluster at time t and N_k is the number of elements in S_k(t)

Step 5: Continuation
─ If no patterns change categories or the number of iterations has reached a prespecified maximum number, then stop. Otherwise, go to Step 3.
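Steps 4 and 5 can be sketched as an update function plus a loop that stops when no label changes or an iteration cap is reached. The assignment rule is passed in as a callable so that any Step 3 rule can be plugged in. The plain Euclidean rule below is only a stand-in so the sketch runs on its own, and all names, defaults, and data are assumptions.

```python
import math

def update_centroids(data, labels, centroids):
    """Step 4: each new centroid is the mean of the patterns assigned to it."""
    new = []
    for k, old in enumerate(centroids):
        members = [x for x, l in zip(data, labels) if l == k]
        if not members:
            new.append(old)  # assumption: an empty cluster keeps its centroid
            continue
        n = len(members)
        new.append(tuple(sum(p[d] for p in members) / n for d in range(len(old))))
    return new

def iterate(data, centroids, assign, max_iter=50):
    """Steps 3 to 5: reassign, update, stop when nothing changes or at max_iter."""
    labels = None
    for _ in range(max_iter):
        new_labels = [assign(j, data, centroids) for j in range(len(data))]
        if new_labels == labels:
            break  # Step 5: no pattern changed category
        labels = new_labels
        centroids = update_centroids(data, labels, centroids)
    return centroids, labels

def euclidean_assign(j, data, centroids):
    # Stand-in assignment rule (nearest centroid), used only for this demo.
    return min(range(len(centroids)), key=lambda k: math.dist(data[j], centroids[k]))

# Hypothetical 2-D data and starting centroids.
data = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
centroids, labels = iterate(data, [(0.0, 0.0), (12.0, 0.0)], euclidean_assign)
print(centroids, labels)
```

Passing the rule as a parameter keeps the update and stopping logic independent of which distance drives the assignment.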

Page 17: A modified version of the K-means algorithm with a distance based on cluster symmetry

Experimental Results

Four examples are used to compare the SBKM algorithm and the SBCL algorithm

In addition, one example shows how to use the point symmetry distance in face detection

Page 18: A modified version of the K-means algorithm with a distance based on cluster symmetry

Mixture of Spherical and Ellipsoidal clusters

[Figure panels: ordinary K-means, SBKM, SBCL]

Page 19: A modified version of the K-means algorithm with a distance based on cluster symmetry

Ring-shaped clusters

[Figure panels: ordinary K-means, SBKM, SBCL]

Page 20: A modified version of the K-means algorithm with a distance based on cluster symmetry

Linear structures

[Figure panels: ordinary K-means, SBKM, SBCL]

Page 21: A modified version of the K-means algorithm with a distance based on cluster symmetry

Combination of ring-shaped, compact, and linear clusters

[Figure panels: ordinary K-means, SBKM, SBCL]

Page 22: A modified version of the K-means algorithm with a distance based on cluster symmetry

Detecting a face in a complex background

Page 23: A modified version of the K-means algorithm with a distance based on cluster symmetry

Conclusion

Although both use the point symmetry distance as the dissimilarity measure, the SBKM algorithm outperformed the SBCL algorithm in many cases

The proposed SBKM algorithm can be used to group a given data set into a set of clusters of different geometrical structures

Besides, the point symmetry distance can also be applied to detect human faces. The experimental results are encouraging

Page 24: A modified version of the K-means algorithm with a distance based on cluster symmetry

Personal Opinion

Advantage
─ The idea is innovative

Application
─ Clustering

Future Work
─ Adopt the symmetry distance on SOM