Efficient and Effective Clustering Methods for Spatial Data Mining
Raymond T. Ng, Jiawei Han
Overview
• Spatial Data Mining
• Clustering techniques
• CLARANS
• Spatial and Non-Spatial dominant CLARANS
• Observations
• Summary
Spatial Data Mining
• Identifying interesting relationships and characteristics that may exist implicitly in spatial databases
• Different from relational databases
  • Spatial objects store both spatial and non-spatial attributes
  • Queries (e.g., "All Walmart stores within 10 miles of UH")
  • Spatial joins, which work on spatial indexes (R-tree)
  • Huge sizes (terabytes)
• GIS is a classic example
Partitioning Methods
Given K, the number of partitions to create, a partitioning method constructs initial partitions. It then iteratively refines the quality of these clusters so as to maximize intra-cluster similarity and inter-cluster dissimilarity.
[Quality of Clustering]: Average dissimilarity of objects from their cluster centers (medoids)
Selected algorithms:
1. K-medoids
2. PAM
3. CLARA
4. CLARANS
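The quality measure on this slide can be sketched in a few lines. This is only an illustration: Euclidean distance, the function name `average_dissimilarity`, and the sample points are assumptions made for the example, since the slides do not fix a particular dissimilarity function.

```python
import math

def average_dissimilarity(points, medoids):
    """Mean distance from each point to its nearest medoid (lower is better)."""
    total = sum(min(math.dist(p, m) for m in medoids) for p in points)
    return total / len(points)

# Hypothetical data: two visibly separate groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good = [(1, 1), (8, 8)]   # one medoid inside each group
poor = [(1, 1), (1, 2)]   # both medoids inside the same group

print(average_dissimilarity(points, good))
print(average_dissimilarity(points, poor))
```

A medoid set that places one representative in each group scores a much lower average dissimilarity than one that crowds both medoids into the same group.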
K-Medoids
• Partition-based clustering (K partitions)
• Effective, why?
  • Resistant to outliers
  • Does not depend on the order in which data points are examined
• Cluster center is part of the dataset, unlike k-means, where the cluster center is the center of gravity (mean) of the cluster
• Experiments show that large data sets are handled efficiently
[Figure: two scatter plots with both axes running from 0 to 10, one labeled K-means and one labeled K-medoids]
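The difference between a gravity-based center and a medoid can be seen on a toy cluster with one outlier. The sketch below assumes Euclidean distance; the helper names `mean_center` and `medoid` and the data are made up for illustration.

```python
import math

def mean_center(points):
    """Center of gravity (what k-means uses); need not be a data point."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def medoid(points):
    """Data point with the smallest total distance to all points in the cluster."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

cluster = [(1, 1), (2, 1), (1, 2), (2, 2), (10, 10)]  # last point is an outlier
print(mean_center(cluster))  # pulled toward the outlier
print(medoid(cluster))       # remains one of the original points
```

The mean lands between the dense group and the outlier, while the medoid stays inside the dense group, which is the sense in which k-medoids is resistant to outliers.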
PAM (Partitioning Around Medoids)
• [Goal]: Find K representative objects of the data set. Each of the K objects is called a medoid, the most centrally located object within a cluster.
PAM (2)
• Start with K data points designated as medoids. Create a cluster around each medoid by assigning every data point to its closest medoid:
  $O_j$ belongs to $O_i$ if $d(O_j, O_i) = \min_{O_e} d(O_j, O_e)$, where $O_e$ ranges over the current medoids
• Iteratively replace $O_i$ with $O_h$ if the quality of the clustering improves.
• A swapping cost, $C_{ijh}$, is associated with replacing a selected object $O_i$ with a non-selected object $O_h$.
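The swap loop described on this slide can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes Euclidean distance, takes the first K points as initial medoids, and evaluates each swap by recomputing the total cost rather than summing per-object swapping costs. The names `total_cost` and `pam` are chosen for the example.

```python
import math

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Apply the best improving (O_i, O_h) swap until no swap lowers the cost."""
    medoids = list(points[:k])            # assumption: arbitrary initial medoids
    cost = total_cost(points, medoids)
    while True:
        best = None
        for i in range(k):                # selected object O_i
            for h in points:              # candidate non-selected object O_h
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                new_cost = total_cost(points, candidate)
                if new_cost < cost and (best is None or new_cost < best[0]):
                    best = (new_cost, candidate)
        if best is None:                  # no swap improves the clustering
            return medoids, cost
        cost, medoids = best

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, cost = pam(points, 2)
print(medoids, cost)
```

Each pass examines every pairing of a current medoid with a non-medoid, which is where the per-iteration expense of PAM comes from.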
PAM (3)
• $O(k(n-k)^2)$ for each iteration
• Good for small data sets (n = 100, k = 5)
CLARA (Clustering LARge Applications)
• Improvement over PAM
• Finds medoids in a sample from the dataset
• [Idea]: If the samples are sufficiently random, the medoids of the sample approximate the medoids of the dataset
• [Heuristics]: 5 samples of size 40+2k give satisfactory results
• Works well for large datasets (n=1000, k=10)
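A minimal sketch of the sampling idea, using the heuristic of 5 samples of size 40+2k from this slide. The inner `pam` here is a small first-improvement swap search written for the example, Euclidean distance is assumed, and the function names, the `seed` parameter, and the grid data are all hypothetical.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Small PAM-style swap search (assumption: first k points start as medoids)."""
    medoids = list(points[:k])
    cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in points:
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                new_cost = total_cost(points, candidate)
                if new_cost < cost:       # accept the first improving swap
                    medoids, cost, improved = candidate, new_cost, True
    return medoids

def clara(points, k, num_samples=5, seed=0):
    """Run PAM on random samples; keep the medoids that score best on all points."""
    rng = random.Random(seed)
    sample_size = min(len(points), 40 + 2 * k)
    best_medoids, best_cost = None, math.inf
    for _ in range(num_samples):
        sample = rng.sample(points, sample_size)
        medoids = pam(sample, k)
        cost = total_cost(points, medoids)   # judged on the full dataset
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

# Hypothetical data: two 10x10 grids of points, far apart from each other.
points = [(x, y) for x in range(10) for y in range(10)]
points += [(x + 100, y + 100) for x in range(10) for y in range(10)]
medoids, cost = clara(points, 2)
print(medoids, cost)
```

The swap search only ever sees 40+2k points at a time, while the comparison between samples uses the whole dataset, so the expensive step no longer grows with n.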
CLARANS (Clustering Large Applications based on RANdomized Search)
• A graph abstraction, $G_{n,k}$
• Each vertex is a collection of k medoids
• Two nodes $S_1$ and $S_2$ are neighbors if $|S_1 \cap S_2| = k - 1$
• Each node has k(n-k) neighbors
• Cost of each node is the total dissimilarity of objects to their medoids
• PAM searches the whole graph
• CLARA searches a subgraph
[Figure: graph whose nodes are the medoid sets {Oa1, ..., Oak}, {Ob1, ..., Obk}, {Oc1, ..., Ock}, {Od1, ..., Odk}, {Om1, ..., Omk}, with two neighboring nodes labeled S1 and S2]
CLARANS (2)
Experimental values
• numLocal = 2
• maxNeighbors = max(1.25% of k(n-k), 250)
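The randomized search over the graph can be sketched as below, using the two parameters and their experimental values from this slide. This is an illustrative sketch under assumptions: Euclidean distance, a full cost recomputation for each neighbor, k < n, and hypothetical names (`clarans`, `num_local`, `max_neighbors`, `seed`).

```python
import math
import random

def total_cost(points, medoids):
    """Cost of a node: total distance of all points to their nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def clarans(points, k, num_local=2, max_neighbors=None, seed=0):
    """Randomized neighbor search; assumes k < len(points)."""
    rng = random.Random(seed)
    n = len(points)
    if max_neighbors is None:
        max_neighbors = max(int(0.0125 * k * (n - k)), 250)
    best_medoids, best_cost = None, math.inf
    for _ in range(num_local):                 # one local search per restart
        current = rng.sample(points, k)        # random starting node
        current_cost = total_cost(points, current)
        tried = 0
        while tried < max_neighbors:
            # A random neighbor differs from the current node in exactly one medoid.
            i = rng.randrange(k)
            h = rng.choice(points)
            if h in current:
                continue
            neighbor = current[:i] + [h] + current[i + 1:]
            neighbor_cost = total_cost(points, neighbor)
            if neighbor_cost < current_cost:   # move and reset the counter
                current, current_cost = neighbor, neighbor_cost
                tried = 0
            else:
                tried += 1
        if current_cost < best_cost:           # keep the best local minimum found
            best_medoids, best_cost = current, current_cost
    return best_medoids, best_cost

# Hypothetical data: two 10x10 grids of points, far apart from each other.
points = [(x, y) for x in range(10) for y in range(10)]
points += [(x + 100, y + 100) for x in range(10) for y in range(10)]
medoids, cost = clarans(points, 2)
print(medoids, cost)
```

Unlike the PAM-style loop, a node is declared a local minimum after `max_neighbors` consecutive random neighbors fail to improve it, so not every neighbor has to be examined.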
CLARANS (3)
• Outperforms PAM and CLARA in terms of running time and quality of clustering
• $O(n^2)$ for each iteration
[Figures: CLARANS vs PAM; CLARANS vs CLARA]
Generalization
• Useful to mine non-spatial attributes
• Process of merging tuples based on a concept hierarchy
• DBLearn: takes an SQL query, a generalization hierarchy, and a threshold
[Figure: initial relation and generalized relation for Sphere(color, diameter)]
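The idea of merging tuples along a concept hierarchy until a threshold is met can be illustrated on a toy Sphere(color, diameter) relation. This sketch is not DBLearn's actual algorithm: the hierarchies, the rule for choosing which attribute to generalize next, and the names `COLOR`, `SIZE`, and `generalize` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical concept hierarchies: each value maps to its parent concept.
COLOR = {"crimson": "red", "scarlet": "red", "navy": "blue", "sky": "blue",
         "red": "any_color", "blue": "any_color"}
SIZE = {1: "small", 2: "small", 8: "large", 9: "large",
        "small": "any_size", "large": "any_size"}

def generalize(relation, hierarchies, threshold):
    """Climb one hierarchy level at a time until at most `threshold` distinct tuples remain."""
    tuples = Counter(relation)
    while len(tuples) > threshold:
        # Attributes that can still be generalized, with their distinct-value counts.
        candidates = [
            (len({t[a] for t in tuples}), a)
            for a, h in enumerate(hierarchies)
            if any(t[a] in h for t in tuples)
        ]
        if not candidates:
            break                          # nothing left to generalize
        _, attr = max(candidates)          # assumption: pick the most varied attribute
        merged = Counter()
        for t, count in tuples.items():
            value = hierarchies[attr].get(t[attr], t[attr])
            merged[t[:attr] + (value,) + t[attr + 1:]] += count   # merge equal tuples
        tuples = merged
    return dict(tuples)

spheres = [("crimson", 1), ("scarlet", 2), ("navy", 8), ("sky", 9), ("crimson", 2)]
print(generalize(spheres, [COLOR, SIZE], threshold=2))
```

With a threshold of 2, the five initial tuples collapse into two generalized tuples, each carrying a count of how many original tuples it covers.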
Silhouette
Silhouette of object $O_j$
• Determines how much $O_j$ belongs to its cluster
• Between -1 and 1
• 1 indicates a high degree of membership
Silhouette width of cluster
• Average silhouette of all objects in the cluster
Silhouette coefficient
• Average silhouette widths of k clusters

Silhouette width    Interpretation
0.71 – 1            Strong cluster
0.51 – 0.7          Reasonable cluster
0.26 – 0.5          Weak or artificial cluster
≤ 0.25              No cluster found
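The three quantities on this slide can be computed as in the sketch below. The slide does not state the per-object formula, so this assumes the usual silhouette definition s = (b - a) / max(a, b), where a is the average distance to the other objects in the same cluster and b is the smallest average distance to any other cluster. Euclidean distance, distinct points, at least two clusters, and all function names are assumptions of the example.

```python
import math

def mean_dist(p, group):
    """Average distance from p to the points in group."""
    return sum(math.dist(p, q) for q in group) / len(group)

def silhouette(p, own, others):
    """s = (b - a) / max(a, b); a singleton cluster gets 0 by convention."""
    rest = [q for q in own if q != p]
    if not rest:
        return 0.0
    a = mean_dist(p, rest)                     # cohesion with own cluster
    b = min(mean_dist(p, c) for c in others)   # separation from nearest other cluster
    return (b - a) / max(a, b)

def silhouette_width(cluster, others):
    """Average silhouette of all objects in one cluster."""
    return sum(silhouette(p, cluster, others) for p in cluster) / len(cluster)

def silhouette_coefficient(clusters):
    """Average of the silhouette widths of the k clusters."""
    widths = [silhouette_width(c, clusters[:i] + clusters[i + 1:])
              for i, c in enumerate(clusters)]
    return sum(widths) / len(widths)

# Hypothetical clustering of six points into two tight, well-separated groups.
clusters = [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
print(silhouette_coefficient(clusters))
```

For these two well-separated groups the coefficient falls in the "strong cluster" band of the interpretation table.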
SD and NSD approach
• SD – Spatial Dominant
• NSD – Non-Spatial Dominant
• Clustering for spatial attributes / generalization for non-spatial attributes
• Dominance is decided by what is carried out first (clustering/generalization)
• Second phase works on tuples from the previous stage
SD(CLARANS)
1. Specify the learning request in the form of an SQL query; the query selects the relevant tuples from the data
2. Run CLARANS on the spatial attributes of the tuples to obtain Knat clusters
3. For every cluster, collect the non-spatial components and apply DBLearn
• Finds non-spatial generalizations from spatial clustering
• Value for Knat is determined through heuristics using the silhouette coefficients
• Clustering phase can be treated as finding a spatial generalization hierarchy
NSD(CLARANS)
• Finds spatial clusters from non-spatial generalizations
• Clusters may overlap
Observations
• In all previous methods, the quality of mining depends on the SQL query
• CLARANS assumes that the entire dataset is in memory, which is not always the case for large data sets
• Quality of results cannot be guaranteed when n is very large, because of the randomized search
Observations (2)
• Other clustering algorithms proposed for Spatial Data Mining
  • Hierarchical: BIRCH
  • Density based: DBSCAN, GDBSCAN, DBRS
  • Grid based: STING
Summary
• A seminal paper on the use of clustering for spatial data mining
• CLARANS is an effective clustering technique for large datasets
• SD(CLARANS)/NSD(CLARANS) are effective spatial data mining algorithms
References
• Primary
  • Efficient and Effective Clustering Methods for Spatial Data Mining (1994) - Raymond T. Ng, Jiawei Han
• Secondary
  • CLARANS: A Method for Clustering Objects for Spatial Data Mining - Raymond T. Ng, Jiawei Han
  • Clustering for Mining in Large Spatial Databases - Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
  • An Introduction to Spatial Database Systems - Ralf Hartmut Güting