PARTITIONAL CLUSTERING


Page 1: PARTITIONAL               CLUSTERING

PARTITIONAL CLUSTERING

Deniz ÜSTÜN

Page 2: PARTITIONAL               CLUSTERING

CONTENT

WHAT IS CLUSTERING?

WHAT IS PARTITIONAL CLUSTERING?

ALGORITHMS USED IN PARTITIONAL CLUSTERING

Page 3: PARTITIONAL               CLUSTERING

What is Clustering? Clustering is the process of classifying objects that are similar to one another and organizing the data into groups.

Clustering techniques belong to the unsupervised learning methods.

Page 4: PARTITIONAL               CLUSTERING

What is Partitional Clustering? Partitional clustering algorithms separate similar objects into clusters.

Partitional clustering algorithms are successful at determining center-based clusters.

Given a parameter K, a partitional clustering algorithm divides the n objects into K clusters.

Partitional clustering techniques start from a randomly chosen clustering and then optimize it according to some accuracy measure.

Page 5: PARTITIONAL               CLUSTERING

Algorithms Used in Partitional Clustering

K-MEANS ALGORITHM

K-MEDOIDS ALGORITHM

FUZZY C-MEANS ALGORITHM

Page 6: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM

The K-MEANS algorithm was introduced by J.B. MacQueen in 1967 as one of the simplest unsupervised learning algorithms for solving the clustering problem (MacQueen, 1967).

The K-MEANS algorithm allows each data point to belong to only one cluster.

Therefore, it is a crisp (hard) clustering algorithm.

Suppose N samples are given in an n-dimensional space.

Page 7: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM

This space is partitioned into the K clusters {C1, C2, …, CK}. The mean vector (M_k) of cluster C_k is given by (Kantardzic, 2003):

$$ M_k = \frac{1}{n_k} \sum_{i=1}^{n_k} X_{ik} $$

where X_{ik} is the i-th sample belonging to C_k and n_k is the number of samples in C_k.

The square-error for cluster C_k is given by:

$$ e_k^2 = \sum_{i=1}^{n_k} \lVert X_{ik} - M_k \rVert^2 $$
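As a quick illustration of these two formulas, the following minimal NumPy sketch (not part of the original slides) computes the mean vector and the square-error of one cluster; the sample values are taken from the worked example later in the presentation.

```python
import numpy as np

def cluster_mean(X_k: np.ndarray) -> np.ndarray:
    """M_k = (1/n_k) * sum_i X_ik, the mean vector of cluster C_k."""
    return X_k.mean(axis=0)

def square_error(X_k: np.ndarray) -> float:
    """e_k^2 = sum_i ||X_ik - M_k||^2, the within-cluster square error."""
    M_k = cluster_mean(X_k)
    return float(((X_k - M_k) ** 2).sum())

C1 = np.array([[3.0, 2.0], [7.0, 8.0]])   # cluster C1 = {X1, X3} from the worked example below
print(cluster_mean(C1))                    # [5. 5.]
print(square_error(C1))                    # 26.0
```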

Page 8: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM. The square-error e_k^2 of cluster C_k is called the within-cluster variation. The square-error over all clusters is the sum of the within-cluster variations:

$$ E_K^2 = \sum_{k=1}^{K} e_k^2 $$

The aim of the square-error method is to find, for a given value of K, the K clusters that minimize E_K^2.
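The following short Python sketch is one possible implementation of this procedure (an illustration, not the code used for these slides): centers are initialized from randomly chosen samples, each sample is assigned to its nearest center, the centers are recomputed as cluster means, and the loop stops when the centers no longer move.

```python
import numpy as np

def k_means(X: np.ndarray, K: int, max_iter: int = 100, seed: int = 0):
    """Crisp K-MEANS: returns labels, centers and the total square error E^2."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # K randomly chosen samples
    for _ in range(max_iter):
        # squared distance of every sample to every center, then nearest-center labels
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute each center as the mean vector M_k of its assigned samples
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):   # centers no longer move: converged
            break
        centers = new_centers
    E2 = d2[np.arange(len(X)), labels].sum()    # total square error E_K^2
    return labels, centers, E2
```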

Page 9: PARTITIONAL               CLUSTERING


K-MEANS ALGORITHM EXAMPLE

Observation   Variable 1   Variable 2   Cluster Membership
X1            3            2            C1
X2            2            3            C2
X3            7            8            C1

$$ M_1 = \left( \frac{3+7}{2}, \frac{2+8}{2} \right) = (5, 5) $$

$$ M_2 = \left( \frac{2}{1}, \frac{3}{1} \right) = (2, 3) $$

Page 10: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM EXAMPLE

$$ e_1^2 = (3-5)^2 + (2-5)^2 + (7-5)^2 + (8-5)^2 = 26 $$

$$ e_2^2 = (2-2)^2 + (3-3)^2 = 0 $$

$$ E^2 = e_1^2 + e_2^2 = 26 + 0 = 26 $$

Page 11: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM EXAMPLE

$$ d(M_1, X_1) = \sqrt{(5-3)^2 + (5-2)^2} = 3.60 \qquad d(M_2, X_1) = \sqrt{(2-3)^2 + (3-2)^2} = 1.41 $$

$$ d(M_1, X_2) = \sqrt{(5-2)^2 + (5-3)^2} = 3.60 \qquad d(M_2, X_2) = \sqrt{(2-2)^2 + (3-3)^2} = 0 $$

$$ d(M_1, X_3) = \sqrt{(5-7)^2 + (5-8)^2} = 3.60 \qquad d(M_2, X_3) = \sqrt{(2-7)^2 + (3-8)^2} = 7.07 $$

Observation   d(M1)   d(M2)   Cluster Membership
X1            3.60    1.41    C2
X2            3.60    0       C2
X3            3.60    7.07    C1

Since d(M_2, X_1) < d(M_1, X_1), observation X1 is reassigned to cluster C2.

Page 12: PARTITIONAL               CLUSTERING


K-MEANS ALGORITHM EXAMPLE

Observation   Variable 1   Variable 2   Cluster Membership
X1            3            2            C2
X2            2            3            C2
X3            7            8            C1

$$ M_1 = \left( \frac{7}{1}, \frac{8}{1} \right) = (7, 8) $$

$$ M_2 = \left( \frac{3+2}{2}, \frac{2+3}{2} \right) = (2.5, 2.5) $$

Page 13: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM EXAMPLE

$$ e_1^2 = (7-7)^2 + (8-8)^2 = 0 $$

$$ e_2^2 = (3-2.5)^2 + (2-2.5)^2 + (2-2.5)^2 + (3-2.5)^2 = 1 $$

$$ E^2 = e_1^2 + e_2^2 = 0 + 1 = 1 $$

Page 14: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM EXAMPLE-1

$$ d(M_1, X_1) = \sqrt{(7-3)^2 + (8-2)^2} = 7.21 \qquad d(M_2, X_1) = \sqrt{(2.5-3)^2 + (2.5-2)^2} = 0.7 $$

$$ d(M_1, X_2) = \sqrt{(7-2)^2 + (8-3)^2} = 7.07 \qquad d(M_2, X_2) = \sqrt{(2.5-2)^2 + (2.5-3)^2} = 0.7 $$

$$ d(M_1, X_3) = \sqrt{(7-7)^2 + (8-8)^2} = 0 \qquad d(M_2, X_3) = \sqrt{(2.5-7)^2 + (2.5-8)^2} = 7.10 $$

Observation   d(M1)   d(M2)   Cluster Membership
X1            7.21    0.7     C2
X2            7.07    0.7     C2
X3            0       7.10    C1

The cluster memberships no longer change, so the algorithm stops.

[Scatter plot of the final clustering: cluster C2 contains X1 and X2, cluster C1 contains X3.]
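The same small example can be reproduced with the k_means sketch defined earlier (a check on the hand computation, not part of the original slides):

```python
import numpy as np

X = np.array([[3.0, 2.0],    # X1
              [2.0, 3.0],    # X2
              [7.0, 8.0]])   # X3
labels, centers, E2 = k_means(X, K=2, seed=1)
print(labels)    # e.g. [0 0 1]: X1 and X2 share one cluster, X3 is alone in the other
print(centers)   # [[2.5 2.5] [7.  8. ]] (up to cluster ordering)
print(E2)        # 1.0, matching E^2 = e1^2 + e2^2 = 0 + 1 above
```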

Page 15: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM EXAMPLE-2

Dataset      Number of Samples   Number of Features   Number of Classes
Synthetic    1200                2                    4


Page 16: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM EXAMPLE-2

K=2


Page 17: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM EXAMPLE-2

K=3


Page 18: PARTITIONAL               CLUSTERING

K-MEANS ALGORITHM EXAMPLE-2

K=4


Page 19: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM. The aim of the K-MEDOIDS algorithm is to find K representative objects, the medoids (Kaufman and Rousseeuw, 1987).

Each cluster in the K-MEDOIDS algorithm is represented by one of the objects in that cluster.

The K-MEANS algorithm determines the clusters through the mean (averaging) step, whereas the K-MEDOIDS algorithm determines each cluster by its central object, the medoid. The square-error for cluster C_k with medoid O_k is:

$$ e_k^2 = \sum_{i=1}^{n_k} \lVert X_{ik} - O_k \rVert^2 $$
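A minimal Python sketch of this idea is given below (a simple alternating "assign, then recompute medoids" scheme for illustration; it is not the exact PAM procedure of Kaufman and Rousseeuw):

```python
import numpy as np

def k_medoids(X: np.ndarray, K: int, max_iter: int = 100, seed: int = 0):
    """Simple K-MEDOIDS: medoids are always actual objects of the dataset."""
    rng = np.random.default_rng(seed)
    medoid_idx = rng.choice(len(X), size=K, replace=False)   # randomly selected medoids
    # pairwise Euclidean distances between all objects
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    for _ in range(max_iter):
        labels = dist[:, medoid_idx].argmin(axis=1)   # assign each object to the closest medoid
        new_idx = medoid_idx.copy()
        for k in range(K):
            members = np.where(labels == k)[0]
            if len(members) == 0:
                continue
            # the new medoid is the member with the smallest summed distance to its cluster
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_idx[k] = members[within.argmin()]
        if np.array_equal(new_idx, medoid_idx):       # medoids unchanged: stop the process
            break
        medoid_idx = new_idx
    labels = dist[:, medoid_idx].argmin(axis=1)
    return labels, X[medoid_idx]
```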

Page 20: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-1

Page 21: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-1

Randomly select K medoids.

Page 22: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-1

Assign each point to the closest medoid.

Page 23: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-1

Assign each point to the closest medoid.

Page 24: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-1

Assign each point to the closest medoid.

Page 25: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-1

Determine a new medoid for each cluster.

Page 26: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-1

Determine a new medoid for each cluster.

Page 27: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-1

Assign each point to the closest medoid.

Page 28: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-1

Stop the process.

Page 29: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-2

Dataset      Number of Samples   Number of Features   Number of Classes
Synthetic    2000                2                    3


Page 30: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-2

K=2


Page 31: PARTITIONAL               CLUSTERING

K-MEDOIDS ALGORITHM EXAMPLE-2

K=3


Page 32: PARTITIONAL               CLUSTERING

FUZZY C-MEANS ALGORITHM. The Fuzzy C-MEANS algorithm is one of the best-known and most widely used fuzzy clustering methods.

The Fuzzy C-MEANS algorithm was introduced by Dunn in 1973 and improved by Bezdek in 1981 [Höppner et al., 2000].

Fuzzy C-MEANS allows an object to belong to two or more clusters.

The total membership of an object over all clusters is equal to one.

However, the membership value is highest for the cluster to which the object most strongly belongs.

The algorithm is based on the least-squares method [Höppner et al., 2000].

Page 33: PARTITIONAL               CLUSTERING

FUZZY C-MEANS ALGORITHM

$$ J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^2, \qquad 1 \le m < \infty $$

The algorithm starts from a randomly initialized membership matrix (U), and the center vectors are then calculated as [Höppner et al., 2000]:

$$ c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} $$

Page 34: PARTITIONAL               CLUSTERING

FUZZY C-MEANS ALGORITHM. Using the calculated center vectors, the membership matrix (U) is recomputed as:

$$ u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}} $$

The new membership matrix (U_new) is compared with the old membership matrix (U_old), and the process continues until the difference between them is smaller than ε.
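The two update steps above can be put together in a small Python sketch (an illustrative implementation, not the code of Höppner et al.):

```python
import numpy as np

def fuzzy_c_means(X: np.ndarray, C: int, m: float = 2.0,
                  eps: float = 1e-6, max_iter: int = 300, seed: int = 0):
    """Fuzzy C-MEANS: returns the membership matrix U and the cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), C))
    U /= U.sum(axis=1, keepdims=True)        # memberships of each sample sum to one
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]                    # c_j update
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)   # ||x_i - c_j||
        d = np.fmax(d, 1e-12)                # avoid division by zero
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.abs(U_new - U).max() < eps:    # compare new and old membership matrices
            return U_new, centers
        U = U_new
    return U, centers
```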

Page 35: PARTITIONAL               CLUSTERING

FUZZY C-MEANS ALGORITHM EXAMPLE

Dataset      Number of Samples   Number of Features   Number of Classes
Synthetic    2000                2                    3


Page 36: PARTITIONAL               CLUSTERING

FUZZY C-MEANS ALGORITHM EXAMPLE


C = 3, m = 5, ε = 1e-6
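For reference, this is how the fuzzy_c_means sketch above would be called with the parameters shown on this slide; the `data` array is only a placeholder, since the 2000-sample synthetic dataset itself is not included in the presentation.

```python
import numpy as np

# `data` stands in for the 2000-sample synthetic dataset, which is not included here
data = np.random.default_rng(0).random((2000, 2))
U, centers = fuzzy_c_means(data, C=3, m=5.0, eps=1e-6)
hard_labels = U.argmax(axis=1)   # crisp assignment: the cluster with the highest membership
print(centers)
```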

Page 37: PARTITIONAL               CLUSTERING

Results

Compared with K-MEANS and FUZZY C-MEANS, K-MEDOIDS gives the best results in these experiments.

However, the K-MEDOIDS algorithm is suitable only for small datasets.

The K-MEANS algorithm is the most favorable in terms of running time.

In the FUZZY C-MEANS algorithm, an object can belong to one or more clusters, whereas in the other two algorithms an object can belong to only one cluster.

Page 38: PARTITIONAL               CLUSTERING

References

[MacQueen, 1967] J. B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 281-297, (1967).

[Kantardzic, 2003] M. Kantardzic, "Data Mining: Concepts, Models, Methods, and Algorithms", Wiley, (2003).

[Kaufman and Rousseeuw, 1987] L. Kaufman, P. J. Rousseeuw, "Clustering by Means of Medoids", in Statistical Data Analysis Based on the L1-Norm and Related Methods, edited by Y. Dodge, North-Holland, 405-416, (1987).

[Kaufman and Rousseeuw, 1990] L. Kaufman, P. J. Rousseeuw, "Finding Groups in Data: An Introduction to Cluster Analysis", John Wiley and Sons, (1990).

[Höppner et al., 2000] F. Höppner, F. Klawonn, R. Kruse, T. Runkler, "Fuzzy Cluster Analysis", John Wiley & Sons, Chichester, (2000).

[Işık and Çamurcu, 2007] M. Işık, A. Y. Çamurcu, "K-MEANS, K-MEDOIDS ve Bulanık C-MEANS Algoritmalarının Uygulamalı Olarak Performanslarının Tespiti", İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi, No. 11, 31-45, (2007).