Clustering Algorithms Meta Applier (CAMA) Toolbox

Dmitry S. Shalymov, Kirill S. Skrygan, Dmitry A. Lyubimov

Clustering

• Goals
– To detect the underlying structure in data
– To reduce the size of the data set
– To extract unique objects

• Usage
– Data mining
– Machine learning
– Financial mathematics
– Optimization
– Statistics
– Pattern recognition
– Control strategies development

SYRCoSE’09

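Clustering Problem

• Set of objects: $X = \{x_1, x_2, \ldots, x_n\}$

• Distance function: $\rho(x, x')$

• Clustering algorithm: $\mathrm{Alg}: X \to Y$

Clustering and Classification

• Mean intra-cluster distance: $W = \dfrac{\sum_{i<j} [y_i = y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i = y_j]} \to \min$

• Mean inter-cluster distance: $B = \dfrac{\sum_{i<j} [y_i \neq y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i \neq y_j]} \to \max$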
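The two quality criteria on this slide, the mean pairwise distance within clusters (W, to be minimized) and between clusters (B, to be maximized), can be evaluated for any labeling. The following is a minimal sketch, not code from the CAMA Toolbox; the function names, the Euclidean distance, and the four sample points are assumptions chosen for illustration.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples (assumed metric)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def mean_pair_distances(points, labels, dist=euclidean):
    """Return (W, B): mean intra-cluster and mean inter-cluster pairwise distance."""
    intra_sum = inter_sum = 0.0
    intra_cnt = inter_cnt = 0
    for i, j in combinations(range(len(points)), 2):  # all pairs with i < j
        d = dist(points[i], points[j])
        if labels[i] == labels[j]:
            intra_sum += d
            intra_cnt += 1
        else:
            inter_sum += d
            inter_cnt += 1
    w = intra_sum / intra_cnt if intra_cnt else float("nan")
    b = inter_sum / inter_cnt if inter_cnt else float("nan")
    return w, b


# Hypothetical data: two tight groups far apart along the first axis.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
w_good, b_good = mean_pair_distances(points, [0, 0, 1, 1])
w_bad, b_bad = mean_pair_distances(points, [0, 1, 0, 1])
print("good labeling:", w_good, b_good)
print("bad labeling: ", w_bad, b_bad)
```

The labeling that matches the two groups gives a smaller W and a larger B than the labeling that splits each group, which is the behavior the criteria are meant to reward.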


Variety of Clustering Algorithms

• Hierarchical
– Agglomerative
– Partitioning

• Iterative
– Hard (K-means, SVM, SPSA)
– Fuzzy (FCM)

Important parameters
– Distance norm
– Number of clusters
– Initial values of cluster centers
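To show where the three parameters enter an iterative hard clustering algorithm, here is a minimal K-means sketch. It is an illustration only and not the CAMA implementation: the names `minkowski` and `k_means`, the sample points, and the stopping rule are assumptions. The number of clusters is implied by the number of initial centers, and the distance norm is passed as a function.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p (p=2 is Euclidean, p=1 is Manhattan)."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def k_means(points, init_centers, dist=minkowski, max_iter=100):
    """Plain K-means: assign each point to the nearest center, then recompute
    centers as coordinate-wise means. Stops when the assignment is unchanged."""
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: dist(x, centers[k]))
            for x in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical data: two groups separated along the first axis.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels_a, centers_a = k_means(points, [(0.0, 0.0), (5.0, 0.0)])
labels_b, centers_b = k_means(points, [(0.0, 0.0), (0.0, 1.0)])
print(labels_a, centers_a)
print(labels_b, centers_b)
```

With the first pair of initial centers the sketch separates the two groups; with the second pair it converges to a different partition of the same data, which illustrates why the initial values of the cluster centers are listed as an important parameter. The mean update is the natural choice for the Euclidean norm; for other norms it is only a heuristic.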


Cluster Stability Algorithms

• Indexes

• Stability (similarity, merit) functions

• Probabilistic measures assessing the likelihood of a decision

• Density estimation approaches


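Stochastic Approximation

• Problem: find $\theta^*$ such that $\partial L / \partial \theta = 0$

• Recursive stochastic approximation: $\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \hat{g}_k(\hat{\theta}_k)$, where $\hat{g}_k$ estimates $g(\theta) = \partial L / \partial \theta$

• FDSA: $\hat{g}_{ki}(\hat{\theta}_k) = \dfrac{y(\hat{\theta}_k + c_k e_i) - y(\hat{\theta}_k - c_k e_i)}{2 c_k}$

• SPSA: $\hat{g}_{ki}(\hat{\theta}_k) = \dfrac{y(\hat{\theta}_k + c_k \Delta_k) - y(\hat{\theta}_k - c_k \Delta_k)}{2 c_k \Delta_{ki}}$, where $\Delta_k = (\Delta_{k1}, \Delta_{k2}, \ldots, \Delta_{kp})^T$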
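The difference between the two gradient estimates is easiest to see in code: FDSA perturbs one coordinate at a time and needs 2p loss evaluations per iteration, while SPSA perturbs all coordinates simultaneously with a random ±1 vector and needs only 2. The sketch below is an illustration under stated assumptions, not the authors' implementation: the quadratic test loss, the gain sequences a_k = a / k^0.602 and c_k = c / k^0.101, the constants, and all function names are chosen for the example.

```python
import random


class CountingLoss:
    """Quadratic loss y(theta) = sum((theta_i - target_i)^2) that counts its evaluations."""

    def __init__(self, target):
        self.target = target
        self.calls = 0

    def __call__(self, theta):
        self.calls += 1
        return sum((t - s) ** 2 for t, s in zip(theta, self.target))


def fdsa_gradient(y, theta, c, rng=None):
    """Finite-difference estimate: two evaluations per coordinate (2p in total)."""
    grad = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += c
        minus[i] -= c
        grad.append((y(plus) - y(minus)) / (2.0 * c))
    return grad


def spsa_gradient(y, theta, c, rng):
    """Simultaneous-perturbation estimate: two evaluations regardless of p."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # random +/-1 perturbation
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = y(plus) - y(minus)
    return [diff / (2.0 * c * d) for d in delta]


def stochastic_approximation(y, theta0, gradient, steps=1000, a=0.2, c=0.1, seed=0):
    """Recursion theta_{k+1} = theta_k - a_k * g_k(theta_k) with decaying gains (assumed)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, steps + 1):
        a_k = a / k ** 0.602
        c_k = c / k ** 0.101
        g = gradient(y, theta, c_k, rng)
        theta = [t - a_k * g_i for t, g_i in zip(theta, g)]
    return theta


target = [1.0, -2.0]  # hypothetical minimizer
loss_fdsa, loss_spsa = CountingLoss(target), CountingLoss(target)
theta_fdsa = stochastic_approximation(loss_fdsa, [4.0, 3.0], fdsa_gradient)
theta_spsa = stochastic_approximation(loss_spsa, [4.0, 3.0], spsa_gradient)
print("FDSA:", theta_fdsa, "evaluations:", loss_fdsa.calls)
print("SPSA:", theta_spsa, "evaluations:", loss_spsa.calls)
```

Both runs approach the same minimizer here, but SPSA uses half as many loss evaluations for p = 2, and the gap grows linearly with the dimension p.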


Effectiveness of SPSA


Finding the number of clusters in a data set

• Run the SPSA algorithm for different numbers of clusters, K, and calculate the corresponding distortions $d_K$

• Select a transformation power, Y

• Calculate the “jumps” in transformed distortion: $J_K = d_K^{-Y} - d_{K-1}^{-Y}$

• Estimate the number of clusters in the data set by $K^* = \arg\max_K J_K$
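The last three steps of this procedure can be sketched as follows, assuming the jump for K is the difference between the transformed distortions (distortion raised to the power -Y) for K and K-1 clusters, and that the estimate is the K with the largest jump. The distortion values below are hypothetical stand-ins for the output of the clustering runs, and the function name `estimate_num_clusters` is an assumption. The sketch only computes a jump for K when the distortion for K-1 is also available.

```python
def estimate_num_clusters(distortions, power):
    """Return (K*, jumps) where jumps[K] = d_K^(-Y) - d_(K-1)^(-Y)
    and K* is the K with the largest jump.

    distortions: dict mapping K to the distortion d_K obtained for K clusters.
    power: the transformation power Y (must be positive).
    """
    transformed = {k: d ** (-power) for k, d in distortions.items()}
    jumps = {
        k: transformed[k] - transformed[k - 1]
        for k in sorted(transformed)
        if k - 1 in transformed
    }
    best_k = max(jumps, key=jumps.get)
    return best_k, jumps


# Hypothetical distortions: a sharp drop at K = 3, then only slow improvement.
distortions = {1: 10.0, 2: 4.0, 3: 0.5, 4: 0.45, 5: 0.42}
best_k, jumps = estimate_num_clusters(distortions, power=1.0)
print("jumps:", jumps)
print("estimated number of clusters:", best_k)
```

The negative power turns the sharp drop in distortion at the true number of clusters into a large positive jump, so taking the maximum singles it out.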


Structure of data set detection


Examples

• Iris (3 clusters, 4 features, 150 instances)

• Wine (3 clusters, 13 features, 178 instances)

• Breast Cancer (2 clusters, 32 features, 569 instances)

• Image Segmentation (7 clusters, 19 features, 2310 instances)


Software Tools for Clustering Analysis

• Research
– COMPACT
– DCPR (Data Clustering & Pattern Recognition)
– FCDA (Fuzzy Clustering and Data Analysis Toolbox)
– ClusterPack Matlab Toolbox
– The Curve Clustering Toolbox
– SOM (Self-Organizing Map)
– Spectral Clustering Toolbox
– Yashil's FCM Clustering

• Licensed software
– SPSS
– STATISTICA

• Characteristics
– Visualization
– Effectiveness analysis with patterns
– Tools to check performance

• Shortcomings
– Limited number of data sets and algorithms
– No way to load one's own algorithm
– No online services
– Dependence on MATLAB


Clustering Algorithms Meta Applier


CAMA. Kernel


CAMA Toolbox

http://ancient.punklan.net:8084/CAMA2/index.jsp


Thank you!
