Fast modified global k-means algorithm for incremental cluster construction

Intelligent Database Systems Lab

National Yunlin University of Science and Technology (國立雲林科技大學)


Adil M. Bagirov, Julien Ugon, Dean Webb. Pattern Recognition (PR), 2011

Presented by Wen-Chung Liao, 2011/01/05


N.Y.U.S.T.

I. M.

Outline

─ Motivation
─ Objectives
─ Methodology
─ Experiments
─ Conclusions
─ Comments

Motivation

The global k-means algorithm and the modified global k-means algorithm are incremental clustering algorithms.
─ They allow one to find a global or near-global minimizer of the cluster (or error) function.

However, these algorithms are memory demanding.
─ They require storage of the affinity matrix.

Alternatively, this matrix can be recomputed at each iteration; however, this increases the computational time significantly.

Objectives

A new version of the modified global k-means algorithm is proposed:
─ An auxiliary cluster function is applied to generate a set of starting points lying in different parts of the dataset.
─ The best solution is selected as the starting point for the next cluster center.
─ Information gathered in previous iterations of the incremental algorithm is used to avoid computing the whole affinity matrix.
─ The triangle inequality for distances is used to avoid unnecessary computations.

Methodology

Modified global k-means algorithm [1]

─ Starts with the computation of one cluster center and attempts to optimally add one new cluster center at each iteration.

─ An auxiliary cluster function uses the k-1 cluster centers from the (k-1)-th iteration to compute the starting point for the k-th center.

─ The k-means algorithm is applied starting from this point to find the k-partition of the dataset.

Fast modified global k-means algorithm
─ An auxiliary cluster function is used to generate a set of starting points.
─ The best solution is selected.
─ Computing the whole affinity matrix is avoided.
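The incremental scheme described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the presented algorithm itself: the candidate starting points are taken to be the data points, the best one is picked by its auxiliary function value, and no affinity matrix or triangle-inequality shortcut is used. The names squared_distance, mean, kmeans and incremental_kmeans are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations started from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for a in points:
            j = min(range(len(centers)),
                    key=lambda idx: squared_distance(centers[idx], a))
            groups[j].append(a)
        # An empty cluster keeps its old center.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def incremental_kmeans(points, k):
    """Add one center per iteration, starting from the centroid of the data."""
    centers = [mean(points)]  # solution of the 1-partition problem
    while len(centers) < k:
        # Squared distance of every point to its nearest existing center.
        d_prev = [min(squared_distance(c, a) for c in centers) for a in points]

        def auxiliary(y):
            # Assumption: un-normalized auxiliary cluster function value at y.
            return sum(min(d, squared_distance(y, a))
                       for a, d in zip(points, d_prev))

        # Assumption: candidates are the data points themselves.
        start = min(points, key=auxiliary)
        centers = kmeans(points, centers + [start])
    return centers


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(sorted(incremental_kmeans(points, 2)))
```

On this toy dataset the two centers settle at the midpoints of the left and right pairs of points.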

Modified global k-means algorithm

cluster function
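The formula on this slide did not survive extraction. As a hedged sketch, the cluster (error) function minimized by k-means-type methods is usually written as f_k(x1, ..., xk) = (1/m) * sum over i of min over j of ||xj - ai||^2, that is, the mean squared Euclidean distance from each data point ai to its nearest center. The function names and the toy data below are illustrative assumptions, not taken from the slides.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def cluster_function(centers, points):
    """Mean squared distance from each point to its nearest center."""
    total = sum(min(squared_distance(c, a) for c in centers) for a in points)
    return total / len(points)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_center = cluster_function([(5.0, 1.0)], points)
two_centers = cluster_function([(0.0, 1.0), (10.0, 1.0)], points)
print(one_center, two_centers)  # 26.0 1.0
```

Adding a well-placed second center lowers the value from 26.0 to 1.0 here, which is the kind of decrease the incremental approach looks for at each step.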


Modified global k-means algorithm

Auxiliary cluster function:

x1, ..., x(k-1): the solution to the (k-1)-partition problem

[Figure: centers x1, x2, x3, a point y, and the set S(y)]
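The auxiliary cluster function itself is missing from the extracted slide. A minimal sketch, assuming the common form in which each data point ai contributes min(d_i, ||y - ai||^2), where d_i is its squared distance to the nearest of the k-1 existing centers, and assuming S(y) denotes the points that are closer to y than to any existing center. All names below are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def nearest_center_distances(centers, points):
    """d_i: squared distance from each point to its nearest existing center."""
    return [min(squared_distance(c, a) for c in centers) for a in points]


def auxiliary_function(y, points, d_prev):
    """Assumed form: (1/m) * sum_i min(d_i, ||y - a_i||^2)."""
    total = sum(min(d, squared_distance(y, a)) for a, d in zip(points, d_prev))
    return total / len(points)


def attracted_set(y, points, d_prev):
    """Assumed meaning of S(y): points closer to y than to any existing center."""
    return [a for a, d in zip(points, d_prev) if squared_distance(y, a) < d]


points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
d_prev = nearest_center_distances([(0.5, 0.0)], points)  # one existing center
y = (9.5, 0.0)
print(auxiliary_function(y, points, d_prev))  # 0.25
print(attracted_set(y, points, d_prev))       # [(9.0, 0.0), (10.0, 0.0)]
```

Only the points in S(y) change their contribution when y is added, so a candidate y is worth considering only if this set is non-empty.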


Fast modified global k-means algorithm

[Figure: two panels showing the centers x1, x2, x3, a point y, and the set S(y) for u = 0.2 and for u = 1.0]

Reduction of computational effort

[Figure: a center x1, data points ai and aj, and the set S(ai)]
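The details of the reduction schemes are not recoverable from the extracted slide. As an illustration of how the triangle inequality can save distance evaluations in general, the sketch below uses a generic pruning rule: if d(c, c') >= 2 d(a, c), then d(a, c') >= d(a, c), so center c' cannot be closer to a than c and its distance need not be computed. This is an assumed stand-in, not necessarily the scheme used in the presented algorithm; the names are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def nearest_center(a, centers, center_dist):
    """Return (index of the nearest center, number of point-to-center distances computed)."""
    best, best_d, evaluations = 0, dist(a, centers[0]), 1
    for j in range(1, len(centers)):
        # Triangle inequality: center j cannot beat the current best, skip it.
        if center_dist[best][j] >= 2.0 * best_d:
            continue
        d = dist(a, centers[j])
        evaluations += 1
        if d < best_d:
            best, best_d = j, d
    return best, evaluations


centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
# Center-to-center distances are computed once and reused for every data point.
center_dist = [[dist(c, e) for e in centers] for c in centers]
print(nearest_center((1.0, 0.0), centers, center_dist))  # (0, 1)
print(nearest_center((6.0, 0.0), centers, center_dist))  # (1, 2)
```

In both calls fewer than three point-to-center distances are evaluated, while the nearest center found is the same as with an exhaustive search.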


Computational complexity

The modified global k-means algorithm
─ O(mk^2 T + km^2 + kmt)

The fast modified global k-means algorithm
─ O(p(mk^2 T + km^2 + kmt)) (without complexity reduction schemes)
─ O(p(mk^2 T + km_1^2 + km_1 t)) (with complexity reduction schemes)

T: the number of iterations by Algorithm 2
t: the number of iterations by Algorithm 1
m_1: the number of data points in the set P(u)∩A, with m_1 << m

Numerical experiments

k: the number of clusters
f_opt: the best known value of the cluster function, multiplied by m
E: the error in %
α: the number of Euclidean norm evaluations
t: the CPU time
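The slide does not spell out how E is computed. A minimal sketch, assuming the usual relative-error definition E = (f_found - f_opt) / f_opt * 100, where f_found is the value obtained by an algorithm; the function name and the numbers are illustrative only.

```python
def error_percent(f_found, f_opt):
    """Relative deviation of f_found from the best known value f_opt, in percent."""
    return (f_found - f_opt) / f_opt * 100.0


# Hypothetical values: a solution 2% worse than the best known one.
print(error_percent(102.0, 100.0))
```

Under this definition E = 0 means the algorithm matched the best known value, and larger values indicate a worse solution.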

[Figure: comparison plots of f_FMGKM/f_GKM, f_FMGKM/f_MGKM, α, and CPU time]

Dunn's validity index

Davies–Bouldin cluster validity measure

• Show a similar pattern.
• Generate similar cluster structures.

Conclusions

Developed a new version of the modified global k-means algorithm:
─ It uses the k-1 cluster centers from the previous iteration to solve the k-partition problem.
─ It does not rely on the affinity matrix to compute the starting point.
─ It uses more than one starting point to minimize the auxiliary function.
─ Two schemes reduce the amount of computational effort.
─ There is no guarantee that it will converge to the global solution.

Comments

Advantages
─ Schemes to reduce computational effort.

Shortcomings
─ Determining the set U is not easy.

Applications
─ Clustering
