Page 1: Clustering Algorithms Meta Applier (CAMA) Toolbox Dmitry S. Shalymov Kirill S. Skrygan Dmitry A. Lyubimov.

Clustering Algorithms Meta Applier (CAMA) Toolbox

Dmitry S. Shalymov, Kirill S. Skrygan, Dmitry A. Lyubimov

Clustering

• Goals
– To detect the underlying structure in data
– To reduce data set capacity
– To extract unique objects

• Usage
– Data mining
– Machine learning
– Financial mathematics
– Optimization
– Statistics
– Pattern recognition
– Control strategies development

SYRCoSE’09

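Clustering Problem

Data set: $X = \{x_1, x_2, \ldots, x_n\}$

Distance function: $\rho(x, x')$

Clustering algorithm: $\mathrm{Alg}: X \to Y$

Clustering and Classification

Mean within-cluster distance:

$$W = \frac{\sum_{i<j} [y_i = y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i = y_j]} \to \min$$

Mean between-cluster distance:

$$B = \frac{\sum_{i<j} [y_i \neq y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i \neq y_j]} \to \max$$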
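
A minimal sketch of how the two quality functionals on this slide could be evaluated for a given labeling. The helper names `euclidean` and `within_between`, the choice of Euclidean distance, and the four toy points are assumptions made for illustration; they do not come from the slides.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Euclidean distance between two points given as sequences (assumed metric)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def within_between(points, labels, dist=euclidean):
    """Return (W, B): mean pairwise distance within and between clusters."""
    same_sum = diff_sum = 0.0
    same_cnt = diff_cnt = 0
    # Iterate over all unordered pairs i < j, as in the sums on the slide.
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        if labels[i] == labels[j]:
            same_sum += d
            same_cnt += 1
        else:
            diff_sum += d
            diff_cnt += 1
    w = same_sum / same_cnt if same_cnt else float("nan")
    b = diff_sum / diff_cnt if diff_cnt else float("nan")
    return w, b


# Toy data (hypothetical): two tight pairs of points far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_labels = [0, 0, 1, 1]  # groups the nearby points together
bad_labels = [0, 1, 0, 1]   # groups the distant points together

w_good, b_good = within_between(points, good_labels)
w_bad, b_bad = within_between(points, bad_labels)
print(w_good, b_good)
print(w_bad, b_bad)
```

For these toy points the labeling that groups nearby points gives a smaller W and a larger B than the alternative, which is the behaviour the two criteria are meant to reward.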

Variety of Clustering Algorithms

• Hierarchical
– Agglomerative
– Partitioning

• Iterative
– Hard (K-means, SVM, SPSA)
– Fuzzy (FCM)

Important parameters
– Distance norm
– Number of clusters
– Initial values of cluster centers

Cluster Stability Algorithms

• Indexes

• Stability (similarity, merit) functions

• Probabilistic measures assessing the likelihood of a decision

• Density estimation approaches

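Stochastic Approximation

Problem: find $\theta^*$ such that $\partial L / \partial \theta = 0$, where $g(\theta) = \partial L / \partial \theta$

Recursive stochastic approximation

$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k\, \hat{g}_k(\hat{\theta}_k)$$

FDSA

$$\hat{g}_{ki}(\hat{\theta}_k) = \frac{y(\hat{\theta}_k + c_k e_i) - y(\hat{\theta}_k - c_k e_i)}{2 c_k}$$

SPSA

$$\hat{g}_{ki}(\hat{\theta}_k) = \frac{y(\hat{\theta}_k + c_k \Delta_k) - y(\hat{\theta}_k - c_k \Delta_k)}{2 c_k \Delta_{ki}}, \qquad \Delta_k = (\Delta_{k1}, \Delta_{k2}, \ldots, \Delta_{kp})^T$$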
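
The SPSA recursion on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `spsa_minimize`, the Bernoulli ±1 perturbations, the gain sequences $a_k = a/(k+1)^{\alpha}$ and $c_k = c/(k+1)^{\gamma}$ with their default constants, and the noise-free quadratic test loss are all choices made here for the example.

```python
import random


def spsa_minimize(loss, theta0, n_iter=500, a=0.2, c=0.1,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize `loss` with SPSA, starting from `theta0` (a list of floats)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(n_iter):
        a_k = a / (k + 1) ** alpha   # step-size gain (assumed schedule)
        c_k = c / (k + 1) ** gamma   # perturbation size (assumed schedule)
        # Simultaneous perturbation: every coordinate is shifted at once.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Only two loss evaluations per iteration, whatever the dimension.
        diff = loss(plus) - loss(minus)
        g_hat = [diff / (2.0 * c_k * d) for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, g_hat)]
    return theta


def quadratic(theta):
    """Hypothetical test loss with its minimum at (1, 1, 1)."""
    return sum((t - 1.0) ** 2 for t in theta)


theta_hat = spsa_minimize(quadratic, [0.0, 0.0, 0.0])
print(theta_hat)
```

The contrast with FDSA is in the number of measurements: FDSA perturbs one coordinate at a time along $e_i$ and therefore needs $2p$ loss evaluations per iteration, while SPSA needs two.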

Effectiveness of SPSA

Finding the number of clusters in data set

• Run the SPSA algorithm for different numbers of clusters, K, and calculate the corresponding distortions $d_K$

• Select a transformation power, Y

• Calculate the “jumps” in transformed distortion: $J_K = d_K^{-Y} - d_{K-1}^{-Y}$

• Estimate the number of clusters in the data set by $K^* = \arg\max_K J_K$
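
The jump computation in the steps above can be sketched as follows. The function name `estimate_num_clusters`, the convention $d_0^{-Y} = 0$ for the first jump, and the distortion values are assumptions for illustration only; the slides do not give numeric distortions, and in practice the $d_K$ would come from the clustering runs described in the first step.

```python
def estimate_num_clusters(distortions, power):
    """Return (K*, jumps) for distortions d_1..d_Kmax and transformation power Y.

    distortions[K - 1] holds d_K; all values must be positive.
    """
    # Transformed distortions d_K^(-Y), with d_0^(-Y) taken as 0 (assumed convention).
    transformed = [0.0] + [d ** (-power) for d in distortions]
    jumps = {
        k: transformed[k] - transformed[k - 1]
        for k in range(1, len(distortions) + 1)
    }
    # K* is the number of clusters with the largest jump.
    k_star = max(jumps, key=jumps.get)
    return k_star, jumps


# Illustrative distortions for K = 1..5 (made-up numbers, not measured results).
d = [10.0, 4.0, 0.5, 0.4, 0.35]
k_star, jumps = estimate_num_clusters(d, power=1.0)
print(k_star, jumps)
```

With these illustrative numbers the distortion drops sharply between K = 2 and K = 3, so the largest jump, and hence the estimate, is at K = 3.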

Structure of data set detection

Examples

• Iris (3 clusters, 4 features, 150 instances)

• Wine (3 clusters, 13 features, 178 instances)

• Breast Cancer (2 clusters, 32 features, 569 instances)

• Image Segmentation (7 clusters, 19 features, 2310 instances)

Software Tools for Clustering Analysis

• Research
– COMPACT
– DCPR (Data Clustering & Pattern Recognition)
– FCDA (Fuzzy Clustering and Data Analysis Toolbox)
– ClusterPack Matlab Toolbox
– The Curve Clustering Toolbox
– SOM (Self-Organizing Map)
– Spectral Clustering Toolbox
– Yashil's FCM Clustering

• Licensed software
– SPSS
– STATISTICA

• Characteristics
– Visualization
– Effectiveness analysis with patterns
– Tools to check performance

• Shortcomings
– Limited number of data sets and algorithms
– No way to load a user's own algorithm
– No on-line services
– MATLAB

Clustering Algorithms Meta Applier

CAMA. Kernel

CAMA Toolbox

http://ancient.punklan.net:8084/CAMA2/index.jsp

Thank you!