Fast modified global k-means algorithm for incremental cluster construction

Page 1: Fast modified global k-means algorithm for incremental cluster construction

Intelligent Database Systems Lab

國立雲林科技大學 (National Yunlin University of Science and Technology)

Adil M. Bagirov, Julien Ugon, Dean Webb. Pattern Recognition (PR), 2011

Presented by Wen-Chung Liao, 2011/01/05

Page 2: Outline

Motivation
Objectives
Methodology
Experiments
Conclusions
Comments

Page 3: Motivation

The global k-means algorithm and the modified global k-means algorithm are incremental clustering algorithms.
─ They allow one to find a global or near-global minimizer of the cluster (or error) function.

However, these algorithms are memory demanding.
─ They require storing the affinity matrix.
─ Alternatively, this matrix can be recomputed at each iteration, but this increases the computational time significantly.
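To put a number on the memory demand, here is a rough estimate. It assumes the affinity matrix holds one value per pair of data points, which the slide does not state; the function name and the 8-byte entry size are illustrative only.

```python
def affinity_matrix_bytes(m, bytes_per_entry=8, symmetric=False):
    """Rough storage estimate for a pairwise matrix over m data points.

    Assumption (not stated on the slide): one entry per pair of points.
    With symmetric=True only the entries above the diagonal are counted.
    """
    entries = m * (m - 1) // 2 if symmetric else m * m
    return entries * bytes_per_entry


def gigabytes(n_bytes):
    """Convert a byte count to gigabytes (10**9 bytes)."""
    return n_bytes / 1e9
```

Under these assumptions, a full matrix for 100,000 points needs about 80 GB, and about half of that if only one triangle is stored.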

Page 4: Objectives

A new version of the modified global k-means algorithm is proposed:
─ An auxiliary cluster function is applied to generate a set of starting points lying in different parts of the dataset.
─ The best solution is selected as the starting point for the next cluster center.
─ Information gathered in previous iterations of the incremental algorithm is used to avoid computing the whole affinity matrix.
─ The triangle inequality for distances is used to avoid unnecessary computations.

Page 5: Methodology

Modified global k-means algorithm [1]
─ Starts by computing one cluster center and attempts to optimally add one new cluster center at each iteration.
─ An auxiliary cluster function uses the k-1 cluster centers from the (k-1)-th iteration to compute the starting point for the k-th center.
─ The k-means algorithm is then applied from this starting point to find the k-partition of the dataset.

Fast modified global k-means algorithm
─ Uses the auxiliary cluster function to generate a set of starting points.
─ Selects the best solution.
─ Avoids computing the whole affinity matrix.
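The incremental scheme (start with one center, add one center per iteration, then run k-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are made up for this sketch, and `farthest_point` is only a placeholder for the auxiliary-function step, which the slides do not spell out.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Centroid of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def cluster_function(data, centers):
    """Average squared distance from each point to its nearest center."""
    return sum(min(sqdist(a, c) for c in centers) for a in data) / len(data)


def kmeans(data, centers, max_iter=100):
    """Plain k-means (Lloyd) iterations from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for a in data:
            j = min(range(len(centers)), key=lambda j: sqdist(a, centers[j]))
            groups[j].append(a)
        # An empty cluster keeps its previous center.
        new = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers


def farthest_point(data, centers):
    """Placeholder start rule (an assumption, NOT the auxiliary function):
    the data point farthest from its nearest current center."""
    return max(data, key=lambda a: min(sqdist(a, c) for c in centers))


def incremental_kmeans(data, k_max, pick_start=farthest_point):
    """Return {k: centers} for k = 1..k_max, adding one center per step."""
    centers = [mean(data)]          # the 1-partition: a single centroid
    solutions = {1: centers}
    for k in range(2, k_max + 1):
        start = pick_start(data, centers)           # start for the k-th center
        centers = kmeans(data, centers + [start])   # refine all k centers
        solutions[k] = centers
    return solutions
```

Passing a different `pick_start` callable is where an auxiliary-function-based rule would plug in.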

Page 6: Modified global k-means algorithm

Cluster function [formula on the slide not captured in the transcript]
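The transcript keeps only the label of this slide, not the formula. For reference, the usual k-means cluster (error) function is given below; it is presumably what the slide showed, but that is an assumption. Here a_1, ..., a_m denote the data points and x_1, ..., x_k the cluster centers.

```latex
f_k(x_1, \ldots, x_k) = \frac{1}{m} \sum_{i=1}^{m} \min_{j = 1, \ldots, k} \lVert x_j - a_i \rVert^2
```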

Page 8: Modified global k-means algorithm

The k-1 cluster centers: the solution to the (k-1)-partition problem

Auxiliary cluster function: [formula on the slide not captured in the transcript]

[Figure: centers x1, x2, x3, a candidate point y, and the set S(y)]
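The auxiliary cluster function itself is not preserved in the transcript. As it is usually defined for the modified global k-means algorithm, it keeps the k-1 existing centers fixed and measures how good a candidate k-th center y would be: each data point contributes the smaller of its current squared distance to the nearest existing center and its squared distance to y. The sketch below follows that reading and takes S(y) to be the set of points that are closer to y than to their current center. Both readings, and all names, are assumptions of this sketch.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_sqdist(data, centers):
    """d_prev[i]: squared distance from data[i] to its nearest existing center."""
    return [min(sqdist(a, c) for c in centers) for a in data]


def auxiliary(y, data, d_prev):
    """Assumed auxiliary function: average of min(d_prev[i], ||y - a_i||^2)."""
    return sum(min(d, sqdist(y, a)) for a, d in zip(data, d_prev)) / len(data)


def attracted_set(y, data, d_prev):
    """Assumed S(y): points that would move to a new center placed at y."""
    return [a for a, d in zip(data, d_prev) if sqdist(y, a) < d]
```

For example, with data points (0,0), (1,0), (10,0), (11,0) and a single existing center at (0.5, 0), a candidate y = (10, 0) attracts the two right-hand points and lowers the function value from 50.25 to 0.375.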

Page 9: Modified global k-means algorithm

Page 10: Fast modified global k-means algorithm

[Figure: two panels, u = 1.0 and u = 0.2, each showing the centers x1, x2, x3 and a candidate point y, with the sets S1.0(y) and S0.2(y)]
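The transcript does not preserve how u defines the sets in the figure, so the sketch below takes the candidate starting points as given and only illustrates the "several starting points, keep the best" idea. Each candidate is refined by repeatedly moving it to the centroid of the points it attracts (a k-means-type step on the auxiliary function), and the refined point with the lowest auxiliary value is kept. The function names and the exact form of the auxiliary function are assumptions of this sketch.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def auxiliary(y, data, d_prev):
    """Assumed auxiliary function: average of min(d_prev[i], ||y - a_i||^2)."""
    return sum(min(d, sqdist(y, a)) for a, d in zip(data, d_prev)) / len(data)


def refine(y, data, d_prev, max_iter=100):
    """Move y to the centroid of the points it attracts until it stops moving."""
    y = tuple(y)
    for _ in range(max_iter):
        attracted = [a for a, d in zip(data, d_prev) if sqdist(y, a) < d]
        if not attracted:
            break
        new = tuple(sum(c) / len(attracted) for c in zip(*attracted))
        if new == y:
            break
        y = new
    return y


def best_start(candidates, data, d_prev):
    """Refine every candidate and keep the one with the lowest auxiliary value."""
    refined = [refine(y, data, d_prev) for y in candidates]
    return min(refined, key=lambda y: auxiliary(y, data, d_prev))
```

Here `d_prev[i]` is the squared distance from the i-th data point to its nearest center among the k-1 centers already found.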

Page 11: Reduction of computational effort

[Figure: a center x1, data points ai and aj, and the set S(ai)]

Page 12: Reduction of computational effort
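The slides' own reduction schemes are not preserved in the transcript. The sketch below shows one way the triangle inequality can save distance evaluations when building S(ai) for a data point ai; it is an illustration, not necessarily the scheme from the slides. Suppose aj belongs to the cluster with center x and lies at distance r_j from it. If the distance from ai to x is at least 2 r_j, then the distance from ai to aj is at least r_j, so aj cannot be in S(ai) and the distance between ai and aj need not be computed. All names are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign(data, centers):
    """For each point: (index of its nearest center, distance to that center)."""
    result = []
    for a in data:
        ds = [dist(a, c) for c in centers]
        j = min(range(len(centers)), key=ds.__getitem__)
        result.append((j, ds[j]))
    return result


def attracted_pruned(i, data, centers, assignment):
    """Indices j with ||a_i - a_j|| < r_j, and how many point-to-point
    distances were actually evaluated."""
    to_center = [dist(data[i], c) for c in centers]   # k-1 distances, computed once
    members, evaluated = [], 0
    for j, (l, r_j) in enumerate(assignment):
        # Triangle inequality: ||a_i - a_j|| >= ||a_i - x_l|| - r_j >= r_j
        if to_center[l] >= 2 * r_j:
            continue
        evaluated += 1
        if dist(data[i], data[j]) < r_j:
            members.append(j)
    return members, evaluated


def attracted_brute(i, data, assignment):
    """Same set without pruning, for comparison."""
    return [j for j, (_, r_j) in enumerate(assignment)
            if dist(data[i], data[j]) < r_j]
```

The pruned and brute-force versions return the same set; only the number of point-to-point distance evaluations differs.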

Page 13: Computational complexity

The modified global k-means algorithm
─ O(m k^2 T + k m^2 + k m t)

The fast modified global k-means algorithm
─ O(p(m k^2 T + k m^2 + k m t)) (without complexity reduction schemes)
─ O(p(m k^2 T + k m_1^2 + k m_1 t)) (with complexity reduction schemes)

T: the number of iterations by Algorithm 2
t: the number of iterations by Algorithm 1
m_1: the number of data points in the set P(u)∩A, with m_1 << m

Page 14: Numerical experiments

k: the number of clusters
fopt: the best known value of the cluster function (multiplied by m)
E: the error in %
α: the number of Euclidean norm evaluations
t: the CPU time
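The slide does not give the formula for E. A common reading, assumed here, is the relative deviation of the value f found by an algorithm from the best known value fopt, expressed in percent. The function name is hypothetical.

```python
def error_percent(f, f_opt):
    """Relative deviation of f from the best known value f_opt, in percent.

    Assumed definition: E = 100 * (f - f_opt) / f_opt.
    """
    return 100.0 * (f - f_opt) / f_opt
```

With this reading, E = 0 means the algorithm reached the best known value.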

Page 15

[Figure: plots of fFMGKM/fGKM, fFMGKM/fMGKM, α, and CPU time]

Page 16

Dunn's validity index
The Davies–Bouldin cluster validity measure
• Show a similar pattern.
• Generate similar cluster structures.
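For readers unfamiliar with the two measures, here is a minimal sketch of each. Several variants of both indices exist and the slides do not say which were used, so the definitions below are assumptions: the Dunn index is taken as the smallest distance between points of different clusters divided by the largest cluster diameter (higher is better), and the Davies–Bouldin measure uses the average distance to the centroid as cluster scatter (lower is better).

```python
import math
from itertools import combinations


def centroid(points):
    """Centroid of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def dunn_index(clusters):
    """Smallest between-cluster point distance over largest cluster diameter.

    Needs at least two clusters and at least one cluster with two
    distinct points.
    """
    min_between = min(math.dist(p, q)
                      for a, b in combinations(clusters, 2)
                      for p in a for q in b)
    max_diameter = max(math.dist(p, q)
                       for c in clusters
                       for p, q in combinations(c, 2))
    return min_between / max_diameter


def davies_bouldin(clusters):
    """Average, over clusters, of the worst (s_i + s_j) / d(c_i, c_j) ratio."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, ce) for p in c) / len(c)
               for c, ce in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k
```

Both functions take a list of clusters, each cluster being a list of coordinate tuples; `math.dist` requires Python 3.8 or later.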

Page 17: Conclusions

Developed a new version of the modified global k-means algorithm
─ Uses the k-1 cluster centers from the previous iteration to solve the k-partition problem.
─ Does not rely on the affinity matrix to compute the starting point.
─ Uses more than one starting point to minimize the auxiliary function.
─ Includes two schemes to reduce the amount of computational effort.
─ There is no guarantee that it will converge to the global solution.

Page 18: Comments

Advantages
─ Schemes to reduce computational effort.

Shortcomings
─ Determining the set U is not easy.

Applications
─ Clustering