Fast modified global k-means algorithm for incremental cluster construction
description
Transcript of Fast modified global k-means algorithm for incremental cluster construction
1Intelligent Database Systems Lab
國立雲林科技大學National Yunlin University of Science and Technology
Fast modified global k-means algorithm for incremental cluster construction
Adil M.Bagirov, JulienUgon, DeanWebb PR, 2011
Presented by Wen-Chung Liao2011/01/05
2
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Outlines
Motivation Objectives Methodology Experiments Conclusions Comments
3
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Motivation
The global k-means algorithm and the modified global k-means algorithm are incremental clustering algorithms.─ allow one to find global or a near global minimizer of the clust
er (or error) function.
However, these algorithms are memory demanding ─ they require the storage of the affinity matrix .
Alternatively, this matrix can be computed at each iteration, however, this extends the computational time significantly.
4
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Objectives
A new version of the modified global k-means algorithm is proposed:─ apply an auxiliary cluster function to generate a
set of starting points lying in different parts of the dataset.
─ the best solution is selected as a starting point for the next cluster center.
─ information gathered in previous iterations of the incremental algorithm to avoid computing the whole affinity matrix.
─ the triangle inequality for distances is used to avoid unnecessary computations
5
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Methodology Modified global k-means algorithm [1]
─ Starts with the computation of one cluster center and attempts to optimally add one new cluster center at each iteration.
─ An auxiliary cluster function using k-1 cluster centers from the(k-1)-th iteration to compute the starting point for the k-th center.
─ The k-means algorithm is applied starting from this point to find the k-partition of the dataset.
Fast modified global k-means algorithm─ auxiliary cluster function to generate a set of starting poi
nts─ the best solution is selected─ avoid computing the whole affinity matrix
6
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Modified global k-means algorithm
cluster function
7
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
8
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
: the solution to the(k-1)-partition problem
Modified global k-means algorithm
Auxiliary cluster function:
x1
x2
x3
y S (y)
9
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Modified global k-means algorithm
10
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
u=1.0u=0.2
x1
x2
x3
y S 0.2(y)
x1
x2
x3
y S 1.0(y)
Fast modified global k-means algorithm
11
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Reduction of computational effort
x1
aiaj
S (ai)
12
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Reduction of computational effort
13
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Computational complexity
The modified global k-means algorithm ─ O(mk2T+km2+kmt)
The fast modified global k-means algorithm ─ O(p(mk2T+km2+kmt)) (without complexity reduction s
chemes) ─ O(p(mk2T+km1
2+km1t)) (with complexity reduction schemes)
T the number of iterations by Algorithm 2t the number of iterations by Algorithm 1 m1 the number of data points in the set P(u)∩A an
d m1<<m.
14
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Numericalexperiments
k the number of clustersfopt the best known value of the cluster function × m
E the error in %,
α the number of Euclidean norm evaluations t the CPU time
15
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
fFMGKM/fGKM
fFMGKM/fMGKM
α
CPU time
16
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
The Dunn’s validity index
The Davies–Bouldin cluster validity measure • Show a similar pattern.
• Generate similar cluster structures
17
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Conclusions
Developed a new version of the modified global k-means algorithm─ Using the k-1 cluster centers from the previous
iteration to solve the k-partition problem.─ does not rely on the affinity matrix to compute the
starting point─ use more than one starting point to minimize the
auxiliary function─ Two schemes to reduce the amount of computational
effort─ no guarantee that it will converge to the global
solution.
18
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Comments
Advantages─ Schemes to avoid computational effort.
Shortages─ Determine the set U is not easy.
Applications─ clustering