A Cluster Validity Measure With Outlier Detection for Support Vector Clustering
Intelligent Database Systems Lab
N.Y.U.S.T.I. M.
A Cluster Validity Measure With Outlier Detection for Support Vector Clustering
Presenter: Lin, Shu-Han
Authors: Jeen-Shing Wang, Jen-Chieh Chiang
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS (2008)
Outline
Introduction of SVC
Motivation
Objective
Methodology
Experiments
Conclusion
Comments
SVC
SVC is derived from SVMs. SVMs are a supervised learning technique with:
Fast convergence
Good generalization performance
Robustness to noise
SVC is an unsupervised approach:
1. Data points are mapped to a high-dimensional feature space using a Gaussian kernel.
2. Look for the smallest sphere that encloses the data.
3. Map the sphere back to data space, where it forms a set of contours.
4. The contours are treated as the cluster boundaries.
SVC - Sphere Analysis
To find the minimal enclosing sphere with a soft margin:
To solve this problem, introduce the Lagrangian function:
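For reference, the standard SVC soft-margin problem and its Lagrangian can be written as below. The notation is assumed from the usual SVC literature rather than taken from the slide: Φ is the feature map, a the sphere center, R the radius, ξ_j the slack variables, and β_j, μ_j the Lagrange multipliers.

```latex
\min_{R,\,a,\,\xi}\; R^2 + C \sum_{j} \xi_j
\quad \text{s.t.} \quad
\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \qquad \xi_j \ge 0

L = R^2 - \sum_{j} \left( R^2 + \xi_j - \|\Phi(x_j) - a\|^2 \right) \beta_j
    - \sum_{j} \xi_j \mu_j + C \sum_{j} \xi_j,
\qquad \beta_j \ge 0,\; \mu_j \ge 0
```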
SVC - Sphere Analysis
Karush-Kuhn-Tucker complementarity:
Bounded SV (BSV): outlier
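As a hedged reference, the stationarity and complementarity conditions of the standard SVC formulation are sketched below. The symbols (β_j and μ_j as multipliers, ξ_j as slack, Φ as the feature map) are assumptions drawn from the usual SVC literature, not from the slide itself.

```latex
% Stationarity (setting the derivatives of L to zero)
\sum_{j} \beta_j = 1, \qquad
a = \sum_{j} \beta_j \Phi(x_j), \qquad
\beta_j = C - \mu_j

% Karush-Kuhn-Tucker complementarity
\xi_j \mu_j = 0, \qquad
\left( R^2 + \xi_j - \|\Phi(x_j) - a\|^2 \right) \beta_j = 0
```

Under these conditions a point with ξ_j > 0 has μ_j = 0 and therefore β_j = C: it lies outside the sphere and is a bounded SV. A point with 0 < β_j < C lies on the sphere surface and is an ordinary SV.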
SVC - Sphere Analysis
To find the minimal enclosing sphere with a soft margin, solve the Wolfe dual optimization problem.
C: the soft-margin constant, which controls whether outliers are allowed.
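The Wolfe dual of the standard SVC problem is usually stated as follows. This is a sketch under the assumption that the slide follows that standard form; K denotes the Mercer kernel and β_j the Lagrange multipliers.

```latex
\max_{\beta}\; W = \sum_{j} K(x_j, x_j)\, \beta_j
                 - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \beta_j \le C, \qquad \sum_{j} \beta_j = 1
```

With C = 1 the upper bound is never active (the β_j already sum to 1), which matches the case where no outliers are allowed.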
SVC - Sphere Analysis
The distance between x and the sphere center a:
q: determines |clusters| and the smoothness/tightness of the cluster boundaries.
Mercer kernel: Gaussian
Gaussian function:
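The kernel and the distance to the sphere center can be illustrated with a small Python sketch. It assumes the usual SVC Gaussian kernel K(x, y) = exp(-q‖x - y‖²) and the usual kernel-space distance R²(x) = K(x, x) - 2 Σ_j β_j K(x_j, x) + Σ_{i,j} β_i β_j K(x_i, x_j). The function names are hypothetical, and the Frank-Wolfe solver is a stand-in chosen for brevity; it only covers C = 1 and is not the optimizer used in the paper.

```python
import math

def gaussian_kernel(x, y, q):
    """K(x, y) = exp(-q * ||x - y||^2); q is the kernel width parameter."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve_dual_no_outliers(points, q, iters=2000):
    """Approximate the SVC dual for C = 1 with Frank-Wolfe (illustrative only).

    For a Gaussian kernel K(x, x) = 1, so maximizing the dual is the same as
    minimizing beta^T K beta over the simplex {beta >= 0, sum(beta) = 1}.
    """
    n = len(points)
    K = [[gaussian_kernel(p, r, q) for r in points] for p in points]
    beta = [1.0 / n] * n
    for t in range(iters):
        # Gradient of beta^T K beta is 2 K beta.
        grad = [2.0 * sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        i_min = min(range(n), key=grad.__getitem__)
        step = 2.0 / (t + 2.0)
        # Move toward the simplex vertex with the smallest gradient entry.
        beta = [(1.0 - step) * b for b in beta]
        beta[i_min] += step
    return beta, K

def squared_distance_to_center(x, points, beta, K, q):
    """R^2(x) = K(x,x) - 2 sum_j beta_j K(x_j,x) + sum_ij beta_i beta_j K(x_i,x_j)."""
    n = len(points)
    cross = sum(beta[j] * gaussian_kernel(points[j], x, q) for j in range(n))
    const = sum(beta[i] * beta[j] * K[i][j] for i in range(n) for j in range(n))
    return gaussian_kernel(x, x, q) - 2.0 * cross + const

# Usage: a point inside the data has a smaller R^2 than a point far away.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
beta, K = solve_dual_no_outliers(data, q=1.0)
inside = squared_distance_to_center((0.5, 0.5), data, beta, K, q=1.0)
outside = squared_distance_to_center((5.0, 5.0), data, beta, K, q=1.0)
```

Points whose R²(x) exceeds the squared sphere radius fall outside the contours, which is how the sphere maps back to cluster boundaries in data space.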
Motivation
Traditional cluster validity measures, such as the partition coefficient (PC) and separation measures, are based on fuzzy membership grades and centroids of clusters.
The SVC algorithm generates cluster boundaries of arbitrary shape, and it provides no fuzzy membership grades.
Which clustering is better?
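To make the dependency on membership grades concrete, here is a minimal sketch of the partition coefficient, PC = (1/N) Σ_i Σ_k u_ik². The function name and the toy membership matrices are illustrative assumptions. The input is a matrix of fuzzy membership grades, which SVC does not produce.

```python
def partition_coefficient(memberships):
    """PC = (1/N) * sum of squared membership grades.

    memberships[i][k] is the fuzzy membership of point i in cluster k;
    each row is assumed to sum to 1.
    """
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n

# A crisp partition gives the maximum value; a maximally fuzzy one gives 1/K.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
pc_crisp = partition_coefficient(crisp)
pc_fuzzy = partition_coefficient(fuzzy)
```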
Objectives
Optimal cluster number
Cluster validity measure
Outlier-detection algorithm
Cluster-merging mechanism
Methodology – Overview
Cluster Validity Measure for the SVC Algorithm
Outlier detection
Cluster-Merging Mechanism
C = 1: no outliers are allowed.
Methodology – Cluster Validity Measure for the SVC Algorithm
Compactness (intra-cluster)
Separation (inter-cluster)
Cluster validity measure (ratio) for SVC, to be minimized
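The idea of a compactness-to-separation ratio can be sketched as below. This is a generic stand-in and not the paper's definition: both measures here are simple centroid-free choices, and all function names are hypothetical. It assumes at least two clusters and at least one cluster with two or more points.

```python
import math
from itertools import combinations

def compactness(clusters):
    """Mean pairwise distance between points in the same cluster (intra-cluster)."""
    dists = [math.dist(p, r) for c in clusters for p, r in combinations(c, 2)]
    return sum(dists) / len(dists)

def separation(clusters):
    """Smallest distance between two points in different clusters (inter-cluster)."""
    return min(
        math.dist(p, r)
        for a, b in combinations(clusters, 2)
        for p in a
        for r in b
    )

def validity_ratio(clusters):
    """Compactness divided by separation; smaller values indicate a better partition."""
    return compactness(clusters) / separation(clusters)

# Two partitions of the same four points: the first groups near neighbors.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
ratio_good = validity_ratio(good)
ratio_bad = validity_ratio(bad)
```

Evaluating such a ratio for each candidate clustering and picking the smallest value is one way to compare results across kernel parameters.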
Methodology – Outlier Detection
In SVC, outliers (BSVs) are the data points in the boundary regions.
[Figure: SVC results for q = 1, q = 2, q = 4, and for q = 1.8 with C = 0.02; singleton clusters are marked.]
Methodology – Outlier Detection
C: if C = 1, the resulting clusters are smooth but not desirable.
BSVs (outliers): all outliers are SVs, and some outliers are far away from the other data in the clusters.
SVs: more SVs make the cluster contours fit the data too tightly.
q: increasing q makes the clusters more compact.
Singletons: an important criterion.
Methodology – Outlier Detection
Outlier Existence Criterion
Suggested γ = 2
Desirable Cluster Criterion
The number of singleton clusters cannot exceed a threshold.
The percentage of data points that are SVs cannot be greater than a threshold (suggested: 50%).
Recursively adjust C to satisfy these two criteria.
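The Desirable Cluster Criterion and the adjustment of C can be sketched as follows. All names are hypothetical: `run_svc` stands for any routine that runs SVC for a given C and reports the cluster sizes and the number of SVs, and the shrink factor is an arbitrary illustrative choice. The Outlier Existence Criterion with γ is not modeled here.

```python
def is_desirable(cluster_sizes, n_support_vectors, n_points,
                 max_singletons, max_sv_fraction=0.5):
    """Check both conditions of the Desirable Cluster Criterion."""
    singletons = sum(1 for size in cluster_sizes if size == 1)
    sv_fraction = n_support_vectors / n_points
    return singletons <= max_singletons and sv_fraction <= max_sv_fraction

def tune_c(run_svc, n_points, c=1.0, max_singletons=0, shrink=0.5, min_c=1e-3):
    """Decrease C until the criterion holds; return None if it never does.

    run_svc(c) is a hypothetical callable returning
    (cluster_sizes, n_support_vectors) for soft-margin constant c.
    """
    while c >= min_c:
        sizes, n_sv = run_svc(c)
        if is_desirable(sizes, n_sv, n_points, max_singletons):
            return c
        c *= shrink
    return None

# Toy stand-in for SVC: a large C yields a singleton and too many SVs.
def fake_svc(c):
    return ([5, 5], 3) if c <= 0.3 else ([5, 4, 1], 6)

chosen_c = tune_c(fake_svc, n_points=10)
```

Lowering C lets more points become bounded SVs, so isolated points stop forming singleton clusters of their own.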
Methodology – Cluster-Merging Mechanism
Similarity: overlapping degree
Gaussian function:
[Figure: overlapping degree at example points, with PC = 0 and PA > 0.]
Methodology – Cluster-Merging Mechanism
1) Agglomerative outliers/noise: identification
For all c_i < ε, i = 1, . . . , K, where ε is a density threshold chosen as 3%~5%:
{Set x ← m_i. For each j, j ≠ i, compute p_j(x), where p_j ∈ [0, 1] is the normalized overlapping index of cluster j. If p_j(x) > 0, merge cluster i and cluster j. Otherwise, discard cluster i. Set K ← K − 1.}
2) Compatible clusters: combination (similarity)
Sort the sizes of the remaining K clusters in ascending order such that c_K = max(c_i), ∀i ∈ K. For each i, i = 1, . . . , K, perform:
{Set x ← m_i. For each j, j = i + 1, . . . , K, compute p_j(x).
Find l = arg max_{i+1≤j≤K} p_j(x), where arg max_a denotes the value of a at which the expression that follows is maximized.
If p_l > 0, merge cluster i with cluster l. Set K ← K − 1 and repeat 2) until no further combination.}
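The two merging steps can be sketched in Python as below. Several things are assumptions made for illustration: m_i is taken to be the cluster mean, c_i the cluster's share of all points, and the overlapping index is passed in as a callable because its Gaussian-based formula is not given in the transcript. As a simplification, step 1 only tests small clusters against the larger ones and merges into the one with the highest overlap. `toy_overlap` is a hypothetical placeholder, not the paper's index.

```python
import math

def mean(points):
    """Component-wise mean of a list of points (used as m_i)."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def merge_clusters(clusters, overlap, eps=0.05):
    """clusters: list of point lists; overlap(cluster, x) returns a value in [0, 1]."""
    total = sum(len(c) for c in clusters)
    small = [list(c) for c in clusters if len(c) / total < eps]
    kept = [list(c) for c in clusters if len(c) / total >= eps]
    # Step 1: tiny clusters are absorbed by an overlapping cluster or discarded.
    for c in small:
        x = mean(c)
        hits = [k for k in kept if overlap(k, x) > 0]
        if hits:
            max(hits, key=lambda k: overlap(k, x)).extend(c)
    # Step 2: merge compatible clusters, smallest first, until nothing changes.
    merged = True
    while merged and len(kept) > 1:
        merged = False
        kept.sort(key=len)
        for i in range(len(kept) - 1):
            x = mean(kept[i])
            best = max(range(i + 1, len(kept)), key=lambda j: overlap(kept[j], x))
            if overlap(kept[best], x) > 0:
                kept[best].extend(kept[i])
                del kept[i]
                merged = True
                break
    return kept

def toy_overlap(cluster, x, radius=2.0):
    """Hypothetical index: 1 at the cluster mean, falling linearly to 0 at radius."""
    return max(0.0, 1.0 - math.dist(mean(cluster), x) / radius)

# Usage: A and B overlap, C is far away, one noise point is near A, one is isolated.
a = [(0.0, 0.0)] * 30
b = [(1.0, 0.0)] * 30
c = [(10.0, 10.0)] * 30
result = merge_clusters([a, b, c, [(0.5, 0.5)], [(50.0, 50.0)]], toy_overlap)
sizes = sorted(len(cluster) for cluster in result)
```

In this toy run the nearby noise point is absorbed, the isolated one is discarded, and A and B are combined, leaving two clusters.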
Methodology – Summary
1) Initialize q to a small value, and set C = 1 and γ = 2.
2) Perform the SVC algorithm and get |clusters|.
3) If |clusters| < 2, increase q and go to 2).
4) If the outlier-detection criterion holds, decrease C, fix q, and go to 2). Otherwise, go to 5).
5) If |SVs| < 50% of the data points, go to 6). Otherwise, decrease C and go to 2).
6) Compute the validity measure index V(m).
7) If |clusters| > √N, increase q and go to 2). Otherwise, stop the SVC.
8) Use the cluster-merging mechanism to identify an ideal |clusters|. Output |clusters|.
Experiments - Benchmark and Artificial Examples
Bensaid Data Set
Experiments - Benchmark and Artificial Examples
Five-Cluster Data Set & Five-Cluster Data Set With Noise
Experiments - Benchmark and Artificial Examples
Five-Cluster Data Set With Noise, after cluster merging
Experiments - Benchmark and Artificial Examples
Crescent Data Set
Experiments - IRIS Data Set
Misclassification
Conclusions
This paper integrates three components for SVC: a cluster validity measure, outlier detection, and a merging mechanism.
It automatically determines suitable values for the kernel parameter and the soft-margin constant.
The resulting clustering has compact and smooth arbitrary-shaped cluster contours and increased robustness to outliers and noise.
Comments
Advantage: provides a cluster validity index for a clustering method.
Drawback: …
Application: SVC