Locally Constraint Support Vector Clustering

Page 1: Locally Constraint Support Vector Clustering

Time Series Data Mining Group

Locally Constraint Support Vector Clustering

Dragomir Yankov, Eamonn Keogh, Kin Fai Kan

Computer Science & Eng. Dept.

University of California, Riverside

Page 2: Locally Constraint Support Vector Clustering

Outline

• Motivation: the need to improve the Support Vector Clustering (SVC) algorithm

• Problem formulation

• Locally constrained SVC

– An overview of SVC

– Applying factor analysis for local outlier detection

– Regularizing the decision function of SVC

• Experimental evaluation

Page 3: Locally Constraint Support Vector Clustering

Motivation for improving SVC

• SVC maps the data into a high-dimensional feature space, where a decision function is computed

• The support vectors define contours in the original space that enclose higher-density regions

• The method is theoretically sound and useful for detecting non-convex formations

[Figure panels: original data; detected clusters]

Page 4: Locally Constraint Support Vector Clustering

Motivation for improving SVC (cont)

• Parametrizing SVC incorrectly may either hide clusters that are objectively present or produce multiple unintuitive clusters

• Correct parametrization is especially hard in the presence of noise (frequently encountered when learning from embedded manifolds)

large kernel widths merge the clusters

small kernel widths produce multiple unintuitive clusters
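
The effect of the kernel width can be seen directly in the kernel values. The following is a minimal sketch, assuming the Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) that is commonly used with SVC. The function name, the sample points, and the widths are illustrative and do not come from the slides.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel with width sigma; an assumed, standard form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Two points from different (hypothetical) clusters, 3 units apart.
p, q = (0.0, 0.0), (3.0, 0.0)

for sigma in (0.5, 1.0, 5.0):
    print(f"sigma={sigma}: K(p, q)={gaussian_kernel(p, q, sigma):.6f}")
```

With a large width the kernel value stays close to 1, so distant points look alike in feature space and their contours tend to merge. With a small width the value drops toward 0, so nearly every point can end up inside its own contour.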

Page 5: Locally Constraint Support Vector Clustering

Problem formulation

How can we make Support Vector Clustering:

1. Less susceptible to noise in the data

2. More resilient to imprecise parametrization

Page 6: Locally Constraint Support Vector Clustering

Locally constrained SVC – one class classification

• Support Vector density estimation

• Primal formulation

• Dual formulation
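
The formulas for this slide did not survive in the transcript. For orientation, the standard one-class (minimum enclosing sphere) formulation from the SVC literature is sketched below. The symbols (sphere center a, radius R, slacks \xi_i, penalty C, multipliers \beta_i, feature map \Phi, kernel K) are the conventional ones and may differ from the notation used on the original slide.

```latex
% Primal: smallest sphere (center a, radius R) enclosing the mapped data,
% with slack variables \xi_i and penalty C
\min_{R,\,a,\,\xi} \; R^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad \|\Phi(x_i) - a\|^2 \le R^2 + \xi_i, \;\; \xi_i \ge 0

% Dual in the Lagrange multipliers \beta_i,
% with kernel K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)
\max_{\beta} \; \sum_i \beta_i K(x_i, x_i) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \beta_i \le C, \;\; \sum_i \beta_i = 1
```

In this formulation, points with 0 < \beta_i < C lie on the sphere surface (support vectors), and points with \beta_i = C lie outside it (bounded support vectors).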

Page 7: Locally Constraint Support Vector Clustering

Locally constrained SVC – labeling the closed contours

• Support Vector Clustering – decision function

• Labeling the individual classes

– Build an affinity matrix and find the connected components
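
The labeling step can be sketched as follows. In the standard SVC formulation, the squared feature-space distance of a point to the sphere center is R^2(x) = K(x, x) - 2 sum_j beta_j K(x_j, x) + sum_{i,j} beta_i beta_j K(x_i, x_j), and two points are treated as adjacent when every sampled point on the segment between them stays inside the sphere. The code below is an illustration under stated assumptions, not the authors' implementation: the coefficients beta are set uniformly as a placeholder instead of being obtained from the dual problem, the radius is taken as the largest R^2 over the data, and all function names are hypothetical.

```python
import math

def kernel(x, y, q=1.0):
    """Gaussian kernel exp(-q * ||x - y||^2)."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))

def make_r2(points, beta, q=1.0):
    """Return R^2(x): squared feature-space distance from x to the sphere center."""
    const = sum(bi * bj * kernel(xi, xj, q)
                for xi, bi in zip(points, beta)
                for xj, bj in zip(points, beta))
    def r2(x):
        cross = sum(b * kernel(p, x, q) for p, b in zip(points, beta))
        return kernel(x, x, q) - 2.0 * cross + const
    return r2

def same_contour(a, b, r2, radius2, samples=10):
    """True if every sampled interior point of segment a-b lies inside the sphere."""
    for k in range(1, samples):
        t = k / samples
        pt = tuple(u + t * (v - u) for u, v in zip(a, b))
        if r2(pt) > radius2 + 1e-12:
            return False
    return True

def label_clusters(points, r2, radius2):
    """Build the affinity (adjacency) matrix, then label its connected components."""
    n = len(points)
    adj = [[i == j or same_contour(points[i], points[j], r2, radius2)
            for j in range(n)] for i in range(n)]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if adj[i][j] and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Two well-separated toy groups (illustrative data).
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5)]
beta = [1.0 / len(points)] * len(points)  # placeholder, NOT a solved dual
r2 = make_r2(points, beta)
radius2 = max(r2(p) for p in points)      # stand-in for the sphere radius
labels = label_clusters(points, r2, radius2)
print(labels)
```

Checking all pairs costs a quadratic number of segment tests, each with several kernel evaluations, which is why practical implementations often restrict the test to a subset of pairs.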

Page 8: Locally Constraint Support Vector Clustering

Locally constrained SVC – detecting local outliers

• Factor analysis:

$x = \Lambda z + u, \quad x \in \mathbb{R}^D, \; z \in \mathbb{R}^d$

$z \sim N(0, I), \quad u \sim N(0, \Psi)$

• Mixture of factor analyzers

$x = \Lambda_j z + u, \quad z \sim N(\mu_j, \Sigma_j)$

• We can adapt MFA to pinpoint local outliers

Points like P1 and P2 that deviate a lot from the FA are among the true outliers
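
To see why a factor analyzer gives a usable notion of "deviation", it helps to write out the covariance it implies. The short derivation below assumes the standard zero-mean factor analysis model x = \Lambda z + u with z \sim N(0, I) and u \sim N(0, \Psi) independent of z. This notation is the conventional one and is an assumption here, since the slide's symbols are not fully recoverable from the transcript.

```latex
\operatorname{Cov}(x)
  = \Lambda \operatorname{Cov}(z) \Lambda^{\top} + \operatorname{Cov}(u)
  = \Lambda \Lambda^{\top} + \Psi,
\qquad
x \sim N\!\left(0,\; \Lambda \Lambda^{\top} + \Psi\right)
```

Under this model each analyzer describes a local Gaussian, so how far a point lies from its analyzer can be measured relative to that Gaussian's covariance.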

Page 9: Locally Constraint Support Vector Clustering

Locally constrained SVC – regularizing the decision function

• To compute the local deviation of each point, we use its Mahalanobis distance with respect to the corresponding factor analyzer

• New primal formulation (weighting the slack variables)

• New dual formulation
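
The two ingredients named on this slide can be illustrated with a small sketch. It computes the Mahalanobis distance of 2-D points from a single Gaussian and turns the distances into per-point slack weights. The exact weighting used by LSVC is not given in the transcript, so the rule w_i = 1 / (1 + d_i) below is purely hypothetical, as are the function names and the toy covariance. What does hold in general is that a weighted penalty C sum_i w_i xi_i changes the dual box constraint to 0 <= beta_i <= w_i C.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from a Gaussian (mean, 2x2 covariance)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must have a positive determinant")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form dx^T cov^{-1} dx using the closed-form 2x2 inverse.
    sq = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(sq)

def slack_caps(distances, C):
    """Hypothetical weighting w_i = 1 / (1 + d_i); returns the caps w_i * C on beta_i."""
    return [C / (1.0 + dist) for dist in distances]

mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))  # illustrative analyzer covariance
pts = [(2.0, 0.0), (0.0, 2.0), (0.0, 6.0)]

dists = [mahalanobis_2d(p, mean, cov) for p in pts]
caps = slack_caps(dists, C=1.0)
print(dists)
print(caps)
```

Under this illustrative rule, a point that deviates strongly from its analyzer gets a small weight, so its slack is cheap and its multiplier is capped low. Such a point is more easily left outside the contour as a bounded support vector and has less influence on the sphere.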

Page 10: Locally Constraint Support Vector Clustering

Experimental evaluation – synthetic data

• Gaussian with radial Gaussian distributions

Good parameter values for LSVC are detected automatically. The right clusters are detected.

SVC is harder to parametrize. The detected clusters are incorrect.

Page 11: Locally Constraint Support Vector Clustering

Experimental evaluation – synthetic data

• Swiss roll data with added Gaussian noise

Most of the noise is identified as bounded SVs by LSVC. The correct clusters are detected.

SVC tends to merge the two large clusters. With supervision the clusters are eventually identified.
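
For readers who want to experiment with a comparable setting, a noisy Swiss roll can be generated as in the sketch below. This is not the authors' dataset: the angle range, height, noise level, and function name are assumptions chosen for illustration, and the generator produces one continuous roll without the cluster structure described on the slide.

```python
import math
import random

def noisy_swiss_roll(n, noise_std=0.5, height=20.0, seed=0):
    """Sample n points from a Swiss roll and add isotropic Gaussian noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        t = 1.5 * math.pi * (1.0 + 2.0 * rng.random())  # angle in [1.5*pi, 4.5*pi)
        h = height * rng.random()                        # position along the roll axis
        x, y, z = t * math.cos(t), h, t * math.sin(t)
        points.append((x + rng.gauss(0.0, noise_std),
                       y + rng.gauss(0.0, noise_std),
                       z + rng.gauss(0.0, noise_std)))
    return points

data = noisy_swiss_roll(200)
print(len(data), data[0])
```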

Page 12: Locally Constraint Support Vector Clustering

Experimental evaluation – face images

• Frey face dataset

LSVC discriminates the two objectively interesting manifolds embedding the data

Even with supervision we could not find parameters that separate the two major manifolds with SVC

Page 13: Locally Constraint Support Vector Clustering

Experimental evaluation – shape clustering

• Arrowheads dataset

Some of the classes are similar. There are multiple elements bridging their shape manifolds.

LSVC achieves 73% accuracy vs 60% for SVC

Page 14: Locally Constraint Support Vector Clustering

Conclusion

• The LSVC method combines both a global and a local view of the data

– It computes a decision function that defines a global measure of density support

– MFA complements this with a local view based on the individual analyzers

• The algorithm improves significantly on the stability of SVC in the presence of noise

• LSVC allows for easier automatic parameterization of one-class SVMs

Page 15: Locally Constraint Support Vector Clustering

All datasets and the code for LSVC can be obtained by writing to the first author: [email protected]

THANK YOU!