CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling Author: George et al. Advisor: Dr. Hsu Graduate: ZenJohn Huang IDSL seminar 2001/10/23

CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling

Author: George et al.

Advisor: Dr. Hsu

Graduate: ZenJohn Huang

IDSL seminar 2001/10/23

Outline

• Motivation
• Objective
• Research restriction
• Literature review
  • An overview of related clustering algorithms
  • The limitations of clustering algorithms
• CHAMELEON
• Concluding remarks
• Personal opinion

Motivation

• Existing clustering algorithms can break down when
  • the choice of parameters is incorrect
  • the model is not adequate to capture the characteristics of clusters
  • the clusters have diverse shapes, densities, and sizes

Objective

• Presenting a novel hierarchical clustering algorithm: CHAMELEON
• Facilitating discovery of natural and homogeneous clusters
• Being applicable to all types of data

Research Restriction

• In this paper, the authors do not address the issue of scaling to large data sets that cannot fit in main memory

Literature Review

• Clustering
• An overview of related clustering algorithms
• The limitations of the recently proposed state-of-the-art clustering algorithms

Clustering

• Clustering groups data so that the intracluster similarity is maximized and the intercluster similarity is minimized [Jain and Dubes, 1988]
• It serves as the foundation for data mining and analysis techniques

Clustering (cont’d)

• Applications
  • Purchasing patterns
  • Categorization of documents on the WWW [Boley, et al., 1999]
  • Grouping of genes and proteins that have similar functionality [Harris, et al., 1992]
  • Grouping of spatial locations prone to earthquakes [Byers and Adrian, 1998]

An Overview of Related Clustering Algorithms

• Partitional techniques
• Hierarchical techniques

Partitional Techniques

• K-means [Jain and Dubes, 1988]

Hierarchical Techniques

• CURE [Guha, Rastogi and Shim, 1998]
• ROCK [Guha, Rastogi and Shim, 1999]

Limitations of Existing Hierarchical Schemes

• CURE
  • Fails to take into account the special characteristics of individual clusters

Limitations of Existing Hierarchical Schemes (cont’d)

• ROCK
  • Irrespective of densities and shapes

CHAMELEON

• Overview
• Modeling the data
• Modeling the cluster similarity
• A two-phase clustering algorithm
• Performance analysis
• Experimental results

Overall Framework of CHAMELEON

Modeling the Data

• k-nearest neighbor graphs built from an original data set in 2D
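To make the k-nearest neighbor graph concrete, here is a minimal brute-force sketch in Python. The function name `knn_graph`, the sample points, and the edge weight `1 / (1 + distance)` are assumptions chosen for illustration; the slides do not specify how similarity is computed.

```python
import math

def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph as {i: {j: weight}}.

    An edge (i, j) is added when j is among the k nearest neighbors of i.
    The weight 1 / (1 + distance) is an assumed similarity measure.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Brute force: rank every other point by its distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            weight = 1.0 / (1.0 + math.dist(points[i], points[j]))
            graph[i][j] = weight
            graph[j][i] = weight  # store the edge in both directions
    return graph

# Two well-separated groups of three 2D points (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
graph = knn_graph(points, k=2)
```

With `k=2`, every point in this toy data set links only to points in its own group, so the graph already separates the two groups before any clustering is applied.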

Modeling the Cluster Similarity

Relative inter-connectivity
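The formula on this slide did not survive extraction. As a sketch of the definition given in the CHAMELEON paper, with the notation assumed here: $EC_{\{C_i, C_j\}}$ is the set of edges connecting clusters $C_i$ and $C_j$, $EC_{C_i}$ is the min-cut bisector of cluster $C_i$, and $|\cdot|$ denotes the sum of the edge weights.

```latex
\[
RI(C_i, C_j) =
  \frac{\left| EC_{\{C_i, C_j\}} \right|}
       {\dfrac{\left| EC_{C_i} \right| + \left| EC_{C_j} \right|}{2}}
\]
```

The absolute inter-connectivity between the two clusters is normalized by the average internal inter-connectivity of the two clusters.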

Modeling the Cluster Similarity (cont’d)

Relative closeness
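The formula on this slide did not survive extraction. As a sketch of the definition given in the CHAMELEON paper, with the notation assumed here: $\bar{S}_{EC_{\{C_i, C_j\}}}$ is the average weight of the edges connecting $C_i$ and $C_j$, $\bar{S}_{EC_{C_i}}$ is the average weight of the edges in the min-cut bisector of $C_i$, and $|C_i|$ is the number of points in $C_i$.

```latex
\[
RC(C_i, C_j) =
  \frac{\bar{S}_{EC_{\{C_i, C_j\}}}}
       {\dfrac{|C_i|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_i}}
        + \dfrac{|C_j|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_j}}}
\]
```

The denominator is a size-weighted average of the internal closeness of the two clusters, so the larger cluster contributes more to the normalization.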

A Two-phase Clustering Algorithm

• Phase I: Finding initial sub-clusters

A Two-phase Clustering Algorithm (cont’d)

• Phase I: Finding initial sub-clusters
  • Multilevel paradigm [Karypis & Kumar, 1999]
  • hMETIS [Karypis & Kumar, 1999]

A Two-phase Clustering Algorithm (cont’d)

• Phase II: Merging sub-clusters using a dynamic framework

$$RI(C_i, C_j) \geq T_{RI} \quad \text{and} \quad RC(C_i, C_j) \geq T_{RC}$$

$T_{RI}$, $T_{RC}$: user-specified thresholds
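The threshold test can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the cluster labels, and the RI/RC values are all hypothetical, and the RI and RC scores are assumed to have been computed beforehand.

```python
def passes_thresholds(ri, rc, t_ri, t_rc):
    """Return True when both RI and RC meet their user-specified thresholds."""
    return ri >= t_ri and rc >= t_rc

def mergeable_pairs(scores, t_ri, t_rc):
    """List the cluster pairs whose (RI, RC) scores pass both thresholds.

    `scores` maps a pair of cluster labels to a precomputed (RI, RC) tuple.
    """
    return [
        pair
        for pair, (ri, rc) in scores.items()
        if passes_thresholds(ri, rc, t_ri, t_rc)
    ]

# Hypothetical precomputed scores for three candidate pairs.
scores = {
    ("A", "B"): (1.2, 0.9),  # well connected and close
    ("A", "C"): (0.3, 1.1),  # close, but weakly connected
    ("B", "C"): (0.8, 0.4),  # connected, but not close
}
candidates = mergeable_pairs(scores, t_ri=0.5, t_rc=0.5)
```

Only the pair that satisfies both conditions remains a merge candidate, which is the point of requiring inter-connectivity and closeness together.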

A Two-phase Clustering Algorithm (cont’d)

• Phase II: Merging sub-clusters using a dynamic framework

$$RI(C_i, C_j) \times RC(C_i, C_j)^{\alpha}$$

$\alpha$ is a user-specified parameter
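The combined score can be sketched in Python as follows, assuming the pair with the highest value of RI × RC^α is the one selected for merging. The function names and the RI/RC values are hypothetical and chosen only to show the effect of α.

```python
def merge_score(ri, rc, alpha):
    """Combine relative inter-connectivity and relative closeness."""
    return ri * rc ** alpha

def most_similar_pair(scores, alpha):
    """Return the pair with the highest combined score.

    `scores` maps a pair of cluster labels to a precomputed (RI, RC) tuple.
    """
    return max(scores, key=lambda pair: merge_score(*scores[pair], alpha))

# Hypothetical scores: P is strongly connected, Q is closer.
scores = {
    "P": (2.0, 0.5),
    "Q": (0.9, 1.0),
}
favors_connectivity = most_similar_pair(scores, alpha=0.5)
favors_closeness = most_similar_pair(scores, alpha=2.0)
```

With α below 1 the strongly connected pair wins, and with α above 1 the closer pair wins, so α controls how much weight relative closeness receives.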

Performance Analysis

• The amount of time required to compute
  • K-nearest neighbor graph
  • Two-phase clustering

Performance Analysis (cont’d)

• The amount of time required to compute the K-nearest neighbor graph
  • Low-dimensional data sets: $O(n \log n)$
  • High-dimensional data sets: $O(n^2)$

Performance Analysis (cont’d)

• The amount of time required to compute the two-phase clustering ($n$: number of data items; $m$: number of initial sub-clusters)
  • Computing internal inter-connectivity and closeness for each cluster: $O(nm)$
  • Selecting the most similar pair of clusters: $O(n \log n + m^2 \log m)$
  • Total time: $O(nm + n \log n + m^2 \log m)$

Experimental Results

• Program
  • DBSCAN: a publicly available version
  • CURE: a locally implemented version
• Data sets
• Qualitative comparison

Data Sets

• Five clusters
  • Different size, shape, and density
  • Noise point

• Two clusters
  • Close to each other
  • Different region, different densities

• Six clusters
  • Different size, shape, and orientation
  • Random noise point
  • Special artifacts

• Eight clusters
  • Different size, shape, and orientation
  • Random noise and special artifacts

• Eight clusters
  • Different size, shape, density, and orientation
  • Random noise point

Concluding Remarks

• CHAMELEON can discover natural clusters of different shapes and sizes
• It is possible to use other graph representations instead of the k-nearest neighbor graph
• Different domains may require different models for capturing closeness and inter-connectivity

Personal Opinion

Without further work