AINL 2016: Muravyov
Towards Cluster Validity Index Evaluation and Selection
Andrey Filchenkov, Sergey Muravyov, Vladimir Parfenov
ITMO University St. Petersburg, Russia
{afilchenkov,smuravyov}@corp.ifmo.ru, [email protected]
11.11.2016
What is clustering?
Clustering means grouping objects based on information found in the data describing the objects or their relationships.
The goal is for objects within a group to be similar (or related) to one another and different from (or unrelated to) objects in other groups.
Applications of clustering
Biology
Medicine
Business and marketing
Social science
Computer science
Clustering Evaluation
No target field
Two ways of partition evaluation:
• External
• Internal
External measures
Based on known class labels and external benchmarks
Examples:
• F-measure
• Rand measure
• Jaccard index
Not applicable for real-life tasks!
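As an illustration, the Rand index mentioned above can be computed directly from two labelings as the fraction of object pairs on which they agree (both in the same cluster, or both in different ones). A minimal sketch; the function name and toy labels are ours:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index: fraction of object pairs on which two
    partitions agree (paired together in both, or apart in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

truth = [0, 0, 1, 1]
pred = [0, 0, 1, 2]   # splits the second true cluster
print(rand_index(truth, pred))  # 5 of 6 pairs agree
```

Note that the index needs the ground-truth labels `truth`, which is exactly why such external measures are unavailable in real-life tasks.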
Internal measures
Usually assign the best score to the algorithm that produces clusters with high similarity within a cluster and low similarity between clusters
No need for extra information
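For instance, the Dunn index, one of the internal measures listed below, needs only the data and the partition itself: the smallest between-cluster distance divided by the largest within-cluster diameter. A minimal sketch; the function name and toy points are ours:

```python
import math

def dunn_index(points, labels):
    """Dunn index: min distance between points of different clusters
    divided by the max diameter of any single cluster (higher is better)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    groups = list(clusters.values())
    # largest diameter within any single cluster
    max_diam = max(math.dist(p, q) for g in groups for p in g for q in g)
    # smallest distance between points in different clusters
    min_sep = min(
        math.dist(p, q)
        for i, g in enumerate(groups)
        for h in groups[i + 1:]
        for p in g for q in h
    )
    return min_sep / max_diam

points = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(dunn_index(points, [0, 0, 1, 1]))  # well-separated clusters
```

No label column is consulted, which is exactly the appeal of internal measures.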
List of the most popular existing internal measures (cluster validity indices):
• Dunn index (D)
• Davies-Bouldin index (DB)
• Silhouette index (Sil)
• Calinski-Harabasz index (CH)
• CS index (CS)
• Modified Davies-Bouldin index (DB*)
• Score function (SF)
• COP-index (COP)
• OS-index (OS)
• Generalized Dunn indices (6 indices: GD*)
• C-index (CI)
• Sym-index (Symm)
• S_Dbw index (SDbw)
Problem of choice
There are a lot of CVIs
Which one should be chosen for a hard clustering problem?
• How to compare the existing ones?
• How to evaluate each one?
Approaches for CVI comparison
• Visual-based comparison
• Comparison with known labels
• Purely theoretical comparison based on studying CVI properties
• Comparison based on stability, robustness of structure, or other desired properties
Practical approach for partition evaluation
As stated in the classical book [1], "Understanding our world requires conceptualizing the similarities and differences between the entities that compose it"
M. Ackerman showed that there is no difference between partition evaluations made by clustering experts and by non-experts
The only way to obtain ground truth for partition quality estimation is human assessment
[1] R. C. Tryon and D. E. Bailey, Cluster Analysis, 1970
Formalized ground truth for partitions
Let C denote the space of clustering tasks
Weak ordering:
• H_r^c ⊆ Partition(c) × Partition(c), c ∈ C (a weak order relation on the partitions of task c)
Binary estimation scale:
• H_b : Partition(c) → {0, 1}
Procedure for one assessor
• Evaluate all possible partitions
• Set a partial order by comparing all pairs of partitions
Constraints on the one-assessor procedure
Choose a finite set of partitions
Mark each partition with a number
• There is a score function that corresponds to any weak strict order
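This correspondence can be sketched by grouping partitions with equal scores into ranked tiers: equal scores tie, higher scores come first, which is exactly a weak strict order. The partition names and marks below are illustrative:

```python
def weak_order(scores):
    """Turn numeric marks into a weak strict order: a list of tiers,
    best score first; items within a tier are tied."""
    tiers = {}
    for item, s in scores.items():
        tiers.setdefault(s, []).append(item)
    return [sorted(tiers[s]) for s in sorted(tiers, reverse=True)]

marks = {"P1": 0.9, "P2": 0.4, "P3": 0.9, "P4": 0.1}
print(weak_order(marks))  # P1 and P3 tie for the top tier
```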
Procedure for multiple assessors
Four criteria for each CVI:
• Binarized adequacy
• Weighted adequacy
• Adequacy of the best
• Aggregated ranking
Binarized adequacy
R_BA = ρ_Kτ(b_best, b_CVI) / ρ_Kτ(b_best, b_worst)
• b_CVI — permutation of the binary marks (+ and −) induced by the CVI's estimates of the partitions
• b_best — best permutation: + + ⋯ + − ⋯ −
• b_worst — worst permutation: − ⋯ − + ⋯ +
• ρ_Kτ — modified Kendall tau distance
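Under the assumption that the modified Kendall tau distance ρ_Kτ reduces to a pairwise inversion count (the exact modification is not specified here), the criterion can be sketched as follows; the function name is ours:

```python
def binarized_adequacy(marks):
    """marks: binary adequacy marks (1 = '+', 0 = '-') listed in the
    order the CVI ranks the partitions, best-scored first.
    Returns rho(b_best, b_CVI) / rho(b_worst-style normalizer), taking
    rho as the number of inversions ('-' ranked above '+'):
    0.0 = perfect agreement with assessors, 1.0 = fully reversed."""
    n_pos = sum(marks)
    n_neg = len(marks) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0  # a single mark class cannot be misordered
    # pairs where an inadequate partition is ranked above an adequate one
    inversions = sum(
        1
        for i in range(len(marks))
        for j in range(i + 1, len(marks))
        if marks[i] < marks[j]
    )
    return inversions / (n_pos * n_neg)

print(binarized_adequacy([1, 0, 1, 0]))  # one misplaced pair out of four
```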
Weighted adequacy
R_WA = ρ_Kτ(r_best, r_CVI) / ρ_Kτ(r_best, r_worst)
• Instead of + and −, each partition carries the number w_i of assessors that marked it as adequate
• r_CVI — permutation of the weighted marks induced by the CVI's estimates of the partitions
• r_best — best permutation: w_1 ≥ ⋯ ≥ w_n
• r_worst — worst permutation: w_1 ≤ ⋯ ≤ w_n
• ρ_Kτ — modified Kendall tau distance
Adequacy of the best
Take the adequacy mark assigned to the partition with the highest value of the CVI being evaluated.
Aggregated ranking
Represents how much the orderings produced by the assessors and by each CVI differ
Uses the weak order aggregation algorithm and distance measure ρ of J. L. Garcia-Lapresta and D. Perez-Roman
R_AR = ρ(ar_best, ar_CVI) / ρ(ar_best, ar_worst)
• ar_* — aggregated orderings
Experimental evaluation
19 of the most popular CVIs were taken to find out if any of them matches the real quality of resulting clusters
Clustering algorithms
6 algorithms:
• 𝑘-Means
• X-Means
• EM
• DBSCAN
• FarthestFirst
• Hierarchical
14 configurations in total
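Of the six algorithms, FarthestFirst is simple enough to sketch in a few lines: Gonzalez's farthest-first traversal greedily picks each center as far as possible from those already chosen. (Weka's implementation seeds the first center randomly; here point 0 is used for determinism, and the toy data are ours.)

```python
import math

def farthest_first(points, k):
    """Farthest-first traversal: pick k centers greedily, then
    assign every point to its nearest center."""
    centers = [points[0]]  # deterministic seed for the sketch
    while len(centers) < k:
        # next center = point farthest from all chosen centers
        next_c = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(next_c)
    labels = [
        min(range(k), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]
    return centers, labels

pts = [(0, 0), (0.5, 0), (10, 0), (10.5, 0)]
print(farthest_first(pts, 2))  # two tight groups far apart
```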
Summarization for binarized adequacy
CVI     Adequacy        CVI     Adequacy
DB      0.151           Symm    0.333
D       0.515           CI      0.121
Syl     0.393           DB*     0.121
CH      0.352           GD31    0.333
SDBW    0.454           GD41    0.435
SF      0.393           GD51    0.393
CS      0.303           GD33    0.352
COP     0.575           GD43    0.272
SV      0.272           GD53    0.333
OS      0.575
Threshold: 0.7
Summarization for weighted adequacy
CVI     Adequacy        CVI     Adequacy
DB      0.121           Symm    0.353
D       0.317           CI      0.073
Syl     0.121           DB*     0.000
CH      0.097           GD31    0.097
SDBW    0.195           GD41    0.097
SF      0.170           GD51    0.121
CS      0.146           GD33    0.073
COP     0.414           GD43    0.035
SV      0.195           GD53    0.035
OS      0.317
Threshold: 0.7
Adequacy of the cluster partitions claimed to be the best by each CVI
CVI     # of the best   CVI     # of the best
DB      4 (9.7%)        Symm    5 (12.1%)
D       4 (9.7%)        CI      4 (9.7%)
Syl     4 (9.7%)        DB*     4 (9.7%)
CH      4 (9.7%)        GD31    5 (12.1%)
SDBW    6 (14.6%)       GD41    5 (12.1%)
SF      4 (9.7%)        GD51    6 (14.6%)
CS      6 (14.6%)       GD33    4 (9.7%)
COP     6 (14.6%)       GD43    4 (9.7%)
SV      6 (14.6%)       GD53    5 (12.1%)
OS      5 (12.1%)
Threshold: 70%
Summarization for aggregated ranking
CVI     Rank            CVI     Rank
DB      0.390           Symm    0.292
D       0.353           CI      0.353
Syl     0.170           DB*     0.317
CH      0.073           GD31    0.073
SDBW    0.353           GD41    0.146
SF      0.146           GD51    0.195
CS      0.146           GD33    0.035
COP     0.414           GD43    0.035
SV      0.414           GD53    0.073
OS      0.073
Threshold: 0.7
Results
None of the CVIs meets the requirements described above
There is no perfectly applicable, universal CVI
A CVI should be chosen specifically for each problem
Main disadvantage of the proposed measure evaluation framework
Expensive: each new dataset has to be evaluated by assessors
Improvement:
• Meta-learning
Meta-learning approach
Selecting a CVI based on a set of meta-features
Classification task:
• Train on datasets evaluated with the framework described above
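A toy sketch of such a recommender, assuming illustrative meta-features and a 1-nearest-neighbour rule (the talk's actual meta-feature set and classifier are not specified; all names below are ours):

```python
import math

def meta_features(points):
    """Toy meta-features for a dataset: size, dimensionality, and
    mean per-coordinate spread (illustrative stand-ins only)."""
    n, d = len(points), len(points[0])
    means = [sum(p[i] for p in points) / n for i in range(d)]
    spread = sum(
        math.sqrt(sum((p[i] - means[i]) ** 2 for p in points) / n)
        for i in range(d)
    ) / d
    return (n, d, spread)

def recommend_cvi(train, new_points):
    """1-NN recommender: return the CVI that scored best (per the
    assessor-based framework) on the most similar training dataset.
    train: list of (dataset_points, best_cvi_name) pairs."""
    target = meta_features(new_points)
    return min(train, key=lambda item: math.dist(meta_features(item[0]), target))[1]

train = [
    ([(0, 0), (1, 1)], "COP"),
    ([(0, 0), (100, 100), (50, 50), (25, 25)], "OS"),
]
print(recommend_cvi(train, [(0, 0), (2, 2)]))  # nearest neighbour is the small, tight dataset
```

This removes the need to pay assessors for every new dataset: they label only the training datasets once.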
Conclusion
Assessors’ estimates should be used as the ground truth (and as the measure of CVI quality)
No universal cluster validity index exists at the moment
Cluster validity indices should be chosen specifically for each problem (meta-learning approach)