Data Analysis - Lecture 4 Introduction to unsupervised...

110
Data Analysis - Lecture 4 Introduction to unsupervised learning: clustering B. Matei and J. Sublime LIPN - CNRS UMR 7030 B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 1 / 100

Transcript of Data Analysis - Lecture 4 Introduction to unsupervised...

Page 1: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Data Analysis - Lecture 4Introduction to unsupervised learning: clustering

B. Matei and J. Sublime

LIPN - CNRS UMR 7030

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 1 / 100

Page 2: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Plan

1 Introduction

2 Clustering

3 Examples of clustering algorithms

4 Open problems in clustering

5 Bibliography

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 2 / 100

Page 3: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Introduction

Outline

1 Introduction

2 Clustering

3 Examples of clustering algorithms

4 Open problems in clustering

5 Bibliography

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 3 / 100

Page 4: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Introduction Unsupervised Learning

What is unsupervised learning ?

Unsupervised learning is a Machine Learning task the aim of which is tofind hidden structures and patterns in unlabeled data.There are several tasks linked to unsupervised learning:

Data partitioning (or clustering)

Anomalies detection

Unsupervised neural-networks

Latent variables model learning

Remark

Unlike statistical analysis methods that we have seen so far and whosefocus was the relations between the features of a data set, unsupervisedlearning focuses on links and similarities between the data themselves.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 4 / 100

Page 5: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Introduction Unsupervised Learning

What is unsupervised learning ?

Unsupervised learning is a Machine Learning task the aim of which is tofind hidden structures and patterns in unlabeled data.There are several tasks linked to unsupervised learning:

Data partitioning (or clustering)

Anomalies detection

Unsupervised neural-networks

Latent variables model learning

Remark

Unlike statistical analysis methods that we have seen so far and whosefocus was the relations between the features of a data set, unsupervisedlearning focuses on links and similarities between the data themselves.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 4 / 100

Page 6: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Introduction Unsupervised Learning

What is unsupervised learning ?

Unsupervised learning is said to be “unsupervised” because it findsstructures and build a model from completely unlabeled data. It thereforediffers from supervised learning (Cf. Lecture 5) which builds a modelgiven already labeled data.

(a) Unsupervised learning flow (b) Supervised learning flow

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 5 / 100

Page 7: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Introduction Data partitioning

Data partitioning : Definition

Data partitioning -or clustering- is a Machine Learning task of exploratorydata mining the aim of which is to split a data set made of several data (alsocalled objects, data objects, or observations) into several subsets containinghomogeneous and similar data. The groups created via this process arecalled clusters.

Applications of data partitioning

Pattern recognition, web mining, business intelligence, biology, securityapplications, customer analysis, etc.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 6 / 100

Page 8: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Introduction Data partitioning

Data partitioning : Examples

We want to automatically discover groups of similar data:

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 7 / 100

Page 9: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Introduction Data partitioning

Data partitioning : Examples

Figure: Example of 2 clusters with Gaussian distributionsB. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 8 / 100

Page 10: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Introduction Data partitioning

Data partitioning : Examples

Figure: Community detection in social network graphsB. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 9 / 100

Page 11: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering

Outline

1 Introduction

2 Clustering

3 Examples of clustering algorithms

4 Open problems in clustering

5 Bibliography

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 10 / 100

Page 12: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering The notion of cluster

Cluster : Definition

Definition

A cluster is a group of relatively homogeneous data that share morecommon characteristics between each other than with the data belongingto other clusters.

The notion of cluster therefore relies on the notion of similarity anddissimilarity between the data.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 11 / 100

Page 13: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering The notion of cluster

Clusters and similarity

What is similarity ?

The quality or state of being similar, the likeness, the resemblance,the a similarity of features.

How to define the notion of similarity between complex objectsdescribed by several attributes ?

Similarity is hard to define, but ...

You know it when you see it !

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 12 / 100

Page 14: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering The notion of cluster

Clusters and similarity

What is similarity ?

The quality or state of being similar, the likeness, the resemblance,the a similarity of features.

How to define the notion of similarity between complex objectsdescribed by several attributes ?

Similarity is hard to define, but ...

You know it when you see it !B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 12 / 100

Page 15: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering The notion of cluster

Clusters, similarity and distance

The similarity or dissimilarity between two observations is most oftencomputed using a distance function.

Depending on the data and the application, some distances may provemore appropriate than others (e.g. Hamming distance for text data).

Euclidian distance ||a− b||2 =√∑

i (ai − bi )2

Squared Euclidian distance ||a− b||22 =∑

i (ai − bi )2

Manhattan distance ||a− b||1 =∑

i |ai − bi |Maximum distance ||a− b||∞ = maxi |ai − bi |

Mahalanobis distance√

(a− b)>S−1(a− b) where S is the covariance matrixHamming distance Hamming(a, b) =

∑i (1− δai ,bi )

Table: Examples of common distances

Creating custom distance functions is sometimes required.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 13 / 100

Page 16: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering The notion of cluster

Clustering partitions

Let us denote X = {x1, · · · , xN}, xn ∈ Rd a data data set containing Nobservations described by d real features, and C = {c1, ..., cK} the set ofpossible clusters.

Clustering partitions

The result of a clustering algorithm is called a clustering partition, orjust partition. Depending on whether or not a data can belong to severalclusters, the partition can be hard, soft, or fuzzy.

(a) Hard clustering

c1 c2 c3

x1 1 0 0x2 0 1 0x3 0 0 1x4 0 0 1

(b) Soft clustering

c1 c2 c3

x1 1 1 0x2 0 1 1x3 0 0 1x4 0 0 1

(c) Fuzzy clustering

c1 c2 c3

x1 0.9 0.1 0x2 0 0.8 0.2x3 0 0.3 0.7x4 0 0 1.0

Table: Example of several objects’ degree of belonging to 3 clustersB. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 14 / 100

Page 17: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering The notion of cluster

Clustering partitions

(a) Hard clustering

c1 c2 c3

x1 1 0 0x2 0 1 0x3 0 0 1x4 0 0 1

(b) Soft clustering

c1 c2 c3

x1 1 1 0x2 0 1 1x3 0 0 1x4 0 0 1

(c) Fuzzy clustering

c1 c2 c3

x1 0.9 0.1 0x2 0 0.8 0.2x3 0 0.3 0.7x4 0 0 1.0

Let us note S the clustering partition:

In hard clustering, S is a vector S = {s1, · · · , sN}, sn ∈ [1..K ]

In soft clustering, S is a matrix S = {sn,k}(N×K), sn,k ∈ (0, 1)

In fuzzy clustering, S is a matrix too:S = {sn,k}(N×K), sn,k ∈ [0, 1] ∀n,

∑k sn,k = 1

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 15 / 100

Page 18: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering Types of clustering algorithms

Types of clustering algorithms

Now that we know what we want to do, the question is: Given a data set,a distance function and a desired type of partition, how do we find theclusters ?

There are several “families” of clustering algorithms that favor differentmethods to extract the clusters from the data:

Density-based clustering algorithms

Hierarchical clustering algorithms

Prototype-based clustering algorithms

Distribution-bsed clustering methods

Spectral clustering methods

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 16 / 100

Page 19: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering Types of clustering algorithms

Density-based clustering algorithms

Inspired from the human vision: theclusters are defined as high densityspaces and are separated by lowdensity areas.

Examples of algorithms: DBSCAN,OPTICS

Properties

Pros: Does not presume the shape ofthe clusters or their number, detectoutliers.

Cons: High complexity O(N2),difficult to parametrize

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 17 / 100

Page 20: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering Types of clustering algorithms

Hierarchical clustering algorithms

Inspired from phylogenetic trees in biology.

Approaches: agglomerative, divisive

Examples of algorithms: HCA, CURE, CLINK

Properties

Pros: Highlight hierarchical cluster structures,comprehensive visualization of the clusters

Cons: High complexity O(N2 logN)toO(N3),choosing where to cut the dendrogram can becumbersome.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 18 / 100

Page 21: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering Types of clustering algorithms

Prototype-based clustering algorithms

Uses vector quantization to sum up thedata using representatives (e.g. meanvalues)

Examples of algorithms: K-Means,Fuzzy C-Means

Properties

Pros: Lower complexity O(N)

Cons: Requires to give the number ofclusters and to presume of their shapes.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 19 / 100

Page 22: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering Types of clustering algorithms

Distribution-based clustering algorithms

Uses strong mathematical modelsto describe the clusters (e.g.mixture of gaussian distributions)

Examples of algorithms: EM

Properties

Pros: Lower complexity O(N),strong models, convergence proofs

Cons: Requires to give the numberof clusters and to assume that theclusters will follow a givendistribution.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 20 / 100

Page 23: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Clustering Types of clustering algorithms

Spectral clustering algorithms

Sees clustering as a graph partitioning problem using the similaritymatrix between the different objects.

Examples of algorithms: Shi-Malik algorithm (normalized cuts),Meila-Shi algorithms.

Mainly use for image analysis

Properties

Pros: Good results, does not assume any shape for the clusters, fastwith sparse data.

Cons: High complexity O(N3), does not scale well, not very intuitive.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 21 / 100

Page 24: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms

Outline

1 Introduction

2 Clustering

3 Examples of clustering algorithms

4 Open problems in clustering

5 Bibliography

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 22 / 100

Page 25: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms DBSCAN

DBSCAN: Introduction

DBSCAN is a density based algorithm:

The “Density” is based on the number of data within the radius“Eps” of a given element.

A point is a core point if it has more data in its neighboring radiusthan a specified number of points “MinPts”.

Core points are the main structures for the clusters.

A border point has fewer points in its radius Eps than the numberMinPts but it is within range of a core point.

A noise point has fewer points in its radius Eps than the numberMinPts and is not within range of a core point.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 23 / 100

Page 26: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms DBSCAN

DBSCAN: Introduction

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 24 / 100

Page 27: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms DBSCAN

DBSCAN: Algorithm

Function DBSCAN(X ,ε,MinPts)C=0forall xn ∈ X do

if xn has not been visited mark xn as visitedVn= regionQuery(xn ,ε)if sizeof(Vn)≥MinPts then

expandCluster(xn ,Vn ,C++,ε,MinPts)else

mark xn as noiseend

end

Function expandCluster(xn ,Vn ,C,ε,MinPts)Add xn to cluster Cforall xi ∈ Vn do

if xi has not been visited thenMark xi as visitedVi = regionQuery(xi ,ε)if sizeof(Vi ) > MinPts then

Vn+ = Viend

endifxi does not belong to any cluster yet: Add xi to cluster C

end

Function regionQuery(xn ,ε)forall xi ∈ X do

if i 6= n and d(xn, xi ) ≤ ε: list.add({xi})endreturn list

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 25 / 100

Page 28: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms DBSCAN

DBSCAN: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 26 / 100

Page 29: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms DBSCAN

DBSCAN: Strengths

Resistant to noise

Can handle clusters of different shapes and sizes

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 27 / 100

Page 30: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms DBSCAN

DBSCAN: Limits

Difficult to parametrize

High Computational cost

Does not handle well varying densities

OPTICS is a good evolution of DBSCAN that handle varying densitiesa lot better.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 28 / 100

Page 31: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Introduction

Principle

Create a partition a each step by aggregating together the 2 closestclusters.

Each data starts out as a single cluster.

During the process a cluster can be a single data, or a group of data.

The algorithm return a hierarchy of partitions in the form of a treecontaining the history of the different aggregations.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 29 / 100

Page 32: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Algorithm

1 Repeat until there is only one cluster left:

1.a Compute the distance matrix between all existing clusters1.b Merge the two closest clusters1.c Update the dendrogram (hierarchical tree)

2 Cut the tree according to a criterion of your choice to get the finalpartition.

This algorithm requires distance criterion between clusters: thelinkage criterion.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 30 / 100

Page 33: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Linkage criteria

Name Formula Comments

Single-linkage Ds (c1, c2) = minx∈c1,y∈c2

d(x, y) Distance between the two closest elements

Complete-linkage Dc (c1, c2) = maxx∈c1,y∈c2

d(x, y) Distance between the two farthest element

Average-linkage Da(c1, c2) =1

|c1||c2|

∑x∈c1

∑y∈c2

d(x, y) Average pairwise distance

Centroid-linkage Dµ(c1, c2) = ||µ1 − µ2|| Distance between the centroids

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 31 / 100

Page 34: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Single-Linkage criterion

Can handle non-elliptical clusters

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 32 / 100

Page 35: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Single-Linkage criterion

Sensitive to noise and outliers

Cannot handle clusters that are in contact

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 33 / 100

Page 36: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Complete-Linkage criterion

Less sensitive to noise and outliers

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 34 / 100

Page 37: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Complete-Linkage criterion

Biased toward spherical clusters

Tends to break large clusters

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 35 / 100

Page 38: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Other linkages

Average-linkage

Less noise sensitive, tends to favor hyper-spherical clusters

High computational cost for roughly the same results as the centroid-linkage.

Centroid-linkage

A good compromise between single and complete link.

Less sensitive to noise and outliers.

tends to favor elliptical clusters.

Ward’s method

This similarity is based on the increase on the mean squared distance to the centroid when 2clusters are merged.

Less sensitive to noise and outliers.

tends to favor elliptical clusters.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 36 / 100

Page 39: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Other linkages

Average-linkage

Less noise sensitive, tends to favor hyper-spherical clusters

High computational cost for roughly the same results as the centroid-linkage.

Centroid-linkage

A good compromise between single and complete link.

Less sensitive to noise and outliers.

tends to favor elliptical clusters.

Ward’s method

This similarity is based on the increase on the mean squared distance to the centroid when 2clusters are merged.

Less sensitive to noise and outliers.

tends to favor elliptical clusters.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 36 / 100

Page 40: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Other linkages

Average-linkage

Less noise sensitive, tends to favor hyper-spherical clusters

High computational cost for roughly the same results as the centroid-linkage.

Centroid-linkage

A good compromise between single and complete link.

Less sensitive to noise and outliers.

tends to favor elliptical clusters.

Ward’s method

This similarity is based on the increase on the mean squared distance to the centroid when 2clusters are merged.

Less sensitive to noise and outliers.

tends to favor elliptical clusters.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 36 / 100

Page 41: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 37 / 100

Page 42: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 38 / 100

Page 43: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 39 / 100

Page 44: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 40 / 100

Page 45: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 41 / 100

Page 46: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 42 / 100

Page 47: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Hierarchical clustering analysis

HCA: Examples comparisons

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 43 / 100

Page 48: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Introduction

The method of the K Means, or K-Means algorithm, is a special case of themobile centers methods. Its principle consists in using a certain number ofrepresentatives (centroids, or prototypes) in the data space. Each of theserepresentative will represent a cluster.

In the end, each data is associated to its closest prototype in order toobtain the clustering partition.

The groups formed with this process are homogeneous and wellseparated.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 44 / 100

Page 49: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Algorithm

1 Randomly select K initial center.

2 Assign each data x to its closest center µi .

3 If the partition doesn’t change: stop

4 Else, update the centers, each µi must be the gravity center of itscluster: µi = 1

|ci |∑

x∈ci x

5 Go to 2

Like in most clustering algorithms, step 2 is dependent on a chosendistance function. For the K-Means algorithm, the Euclidian distance isusually prefered:

d(x , y) =

√√√√ d∑i=1

(xi − yi )2

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 45 / 100

Page 50: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 46 / 100

Page 51: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 47 / 100

Page 52: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 48 / 100

Page 53: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 49 / 100

Page 54: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Properties

Guaranteed to monotonically decrease average squared distance ineach iteration

L(µ)(t) =N∑i=1

minj||µj − xi ||22

L(µ)(t+1) ≤ L(µ)(t)

Convergence to a local minimum

Algorithmic complexity at each iteration : O(n · d · K )

Non-convex optimization : NP-hard problem

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 50 / 100

Page 55: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Properties

Guaranteed to monotonically decrease average squared distance ineach iteration

L(µ)(t) =N∑i=1

minj||µj − xi ||22

L(µ)(t+1) ≤ L(µ)(t)

Convergence to a local minimum

Algorithmic complexity at each iteration : O(n · d · K )

Non-convex optimization : NP-hard problem

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 50 / 100

Page 56: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Properties

Guaranteed to monotonically decrease average squared distance ineach iteration

L(µ)(t) =N∑i=1

minj||µj − xi ||22

L(µ)(t+1) ≤ L(µ)(t)

Convergence to a local minimum

Algorithmic complexity at each iteration : O(n · d · K )

Non-convex optimization : NP-hard problem

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 50 / 100

Page 57: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Properties

Guaranteed to monotonically decrease average squared distance ineach iteration

L(µ)(t) =N∑i=1

minj||µj − xi ||22

L(µ)(t+1) ≤ L(µ)(t)

Convergence to a local minimum

Algorithmic complexity at each iteration : O(n · d · K )

Non-convex optimization : NP-hard problem

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 50 / 100

Page 58: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Limits

Instability

The K-Means algorithm is highly dependent on the initialization of thecenters.

This problem can be reduced by running the algorithm several time.

It is also possible to initialize the centers using another clusteringalgorithm.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 51 / 100

Page 59: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Limits

Clusters shapes, size and density

The K-Means algorithm can only capture spherical or almost sphericalclusters. Any non-convex shape will be missed.

Clusters with different density or sizes can also be difficult to findusing the K-Means algorithm.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 52 / 100

Page 60: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms K-Means algorithm

K-Means: Limits

Choosing the number of clusters

The K-means algorithms requires the number of clusters “K” to beprovided as a parameter. In purely exploratory data mining tasks, thisnumber is not always known and must be guessed.

Sometimes the only solution is to try several values for K and to seewhich gives the best results.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 53 / 100

Page 61: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: introduction

The Expectation-Maximization algorithm (EM) is an iterative methodused to find the maximum likelihood or the maximum a posteriori estimateof parameters in probabilistic and statistical models.

Adapatation to clustering

In clustering, it is possible to represent clusters using parametricdistribution: Each cluster has its own distribution with its ownparameters.

The EM algorithm can then be used to figure out these parametersand link each data to a cluster.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 54 / 100

Page 62: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Gaussian mixture model

The Gaussian mixture model consists in modeling clusters as Gaussiansdescribed by their mean value, their covariance matrix and a mixingprobability.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 55 / 100

Page 63: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Gaussian mixture model

Gaussian model in dimension 1

µ ∈ R the mean value of the gaussian distribution

σ ∈ R+ the standard deviation of the distribution

N (µ, σ) =1

σ√

2πexp−

12

(x−µ)2

σ2

Gaussian model in dimension d

µ ∈ Rd the mean value of the gaussian distribution

Σ = (σi ,j)d×d the variance-covariance matrix of the distribution

N (µ,Σ) =exp−

12

(x−µ)T Σ−1(x−µ)√|Σ|(2π)d

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 56 / 100

Page 64: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Gaussian mixture model

The Gaussian mixture model consists in modeling clusters as Gaussiansdescribed by their mean value, their covariance matrix and a mixingprobability.

p(x |Θ) =K∑

c=1

πc · p(x |θc)

K∑c=1

πc = 1, θc = {µc ,Σc}

p(x |θc) =exp−

12

(x−µc )T Σ−1c (x−µc )√

|Σc |(2π)d

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 57 / 100

Page 65: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Optimization process

The goal of the EM algorithm is to find the parameters Θ that maximizethe likelihood of generating the observed data:

ΘML = ArgmaxΘ

{log p(X |Θ)

}p(X |Θ) =

n∏i=1

p(xi |Θ)

This is done via an iterative two-step process:

Expectation step: Update the latent variables (cluster labels) usingthe current parameters Θ: assign each data to the cluster c thatmaximizes πc · p(x |θc).

Maximisation step: Use the new latent variables to update theparameters: Update the mean value, covariance matrix and mixingprobabilities of all clusters.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 58 / 100

Page 66: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Update rules

E-Step

P(xn ∈ c) = sn,c =πcp(xn|θc)∑Kk=1 πkp(xn|θk)

=1

Zπc

exp−12

(x−µc )T Σ−1c (x−µc )√

|Σc |(2π)d

M-Step

Nc =N∑

n=1

sn,c =⇒ πc =Nc

N

µc =1

Nc

N∑n=1

sn,c · xn

Σc =1

Nc

N∑n=1

sn,c · (xn − µc)(xn − µc)T

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 59 / 100

Page 67: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Algorithm

Algorithm 1: EM Algorithm for the GMM

Initialize Θ randomlywhile the partition S is not stable do

E-Step: evaluate S = argmaxSp(S |X ,Θ)forall xn ∈ X do

sn(k) = 1Z πcN (xn, µc ,Σc)

endM-Step: Re-evaluate Θforall c ∈ [1..K ] do

Update µc , Σc and πcend

end

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 60 / 100

Page 68: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 61 / 100

Page 69: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Strengths

Is tolerant to clusters with different densities

Can detect clusters that are in contact

Strong mathematical model that can be adapted to other types ofmixtures (Bernouilli, Multinomials, etc.)

Existence of a convergence proof

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 62 / 100

Page 70: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Limits

The EM algorithm for the GMM model detects only ellipsoid clusters.

Computing Σ−1 the inverse of the variance-covariance matrix can beboth difficult and time consuming in high dimension.

Can be solved by using diagonal matrices, but it is less accurate.

Very much like the K-Means algorithm, the K needs to be providedand the EM algorithm is dependent on its initialization.

Can be improved by initializing the EM algorithm using K-Means orHCA ...

Remark

The K-Means algorithm is a degenerate case of EM algorithm for theGMM where Σ = Id and ∀c , πc = 1

K .

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 63 / 100

Page 71: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms EM algorithm for the GMM

EM algorithm for the GMM: Limits

The EM algorithm for the GMM model detects only ellipsoid clusters.

Computing Σ−1 the inverse of the variance-covariance matrix can beboth difficult and time consuming in high dimension.

Can be solved by using diagonal matrices, but it is less accurate.

Very much like the K-Means algorithm, the K needs to be providedand the EM algorithm is dependent on its initialization.

Can be improved by initializing the EM algorithm using K-Means orHCA ...

Remark

The K-Means algorithm is a degenerate case of EM algorithm for theGMM where Σ = Id and ∀c , πc = 1

K .

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 63 / 100

Page 72: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Spectral clustering

Simple, yet powerful clustering method

Requires less assumptions on the form of the clusters.

Outperforms the traditional clustering approaches.

Reasonably fast for sparse data, very slow otherwise.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 64 / 100

Page 73: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Difficult clusterings

Some data don’t lend themselves for a centroid, a prototype-based, ora density-based clustering.

Spectral clustering proposes a different approach.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 65 / 100

Page 74: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Difficult clusterings

Figure: Example of a partitioning that most clustering methods cannot find

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 66 / 100

Page 75: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Spectral clustering and graph theory

Spectral clustering treats the data clustering as a graph partitioningproblem:

Let us note X = {x1, · · · , xn}, xi ∈ Rd the data.

The points are characterized by pairwise similarities: To each pair(xi , xj) we can associate a similarity value wij = f (d(xi , xj), θ).

The most common model is:

wi,j = exp

(−1

2σ2||xi − xj ||22

)Let us note W = (wij)n×n this similarity matrix.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 67 / 100

Page 76: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Spectral clustering: Graph construction

There are different ways to build a graph based on the similarity matrix:

Fully connected graphs: All vertices having non-null similarities areconnected to each other.Radius-based graphs: Each vertex is connected to all the verticesfalling inside a ball of similarity of radius r .K-nearest neighbor graphs: Each vertex is connected to its K mostsimilar neighbors.Hybrid KNN and radius-based graphs

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 68 / 100

Page 77: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Spectral clustering and graph partitioning

In spectral clustering, the goal is to find two clusters that have theminimum weight sum connections.

This can be done by minimizing:

Cut(c1, c2) =∑i∈c1

∑j∈c2

wi ,j

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 69 / 100

Page 78: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Spectral clustering and graph partitioning

It is easy to prove that the previous MinCut problem can be written as:

JMinCut =1

4qT (D −W )q

where D is a diagonal matrix so that dii =∑n

j=1 wij , and q is a vector ofsize n so that:

qi =

{1 if xi ∈ c1

−1 if xi ∈ c2

We are therefore looking for the vector/partition q that minimizesthis equation.

This can be done by finding the eigenvalues and eigenvectors of theLaplacian matrix L = D −W .

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 70 / 100

Page 79: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Spectral clustering and graph partitioning

Ideally, we also would like to maximize the intra-cluster similarity:

First constraint: Minimize Cut(c1, c2)

Second constraint: Maximize Cut(c1, c1) and Cut(c2, c2)

This can be done by minimizing the following objective function:

JNcut(c1, c2) = Cut(c1, c2)

(1∑

i∈c1

∑nj=1 wij

+1∑

i∈c2

∑nj=1 wij

)

It is proved that this minimization is obtained through the secondsmallest eigenvector of the matrix LNcut = D−1/2(D −W )D−1/2

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 71 / 100

Page 80: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Spectral clustering: steps

Pre-processing

Compute the similarity matrix and build the graph

Spectral representation

Compute the associated Laplacian Matrix.

Compute the eigenvalues and eigenvectors.

Clustering

Assign each point to a cluster depending on the values in the secondeigenvector.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 72 / 100

Page 81: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Spectral clustering: Example

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 73 / 100

Page 82: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Spectral clustering: Example

Pre-processing:

Build the Laplacian MatrixL = D −W .

Decomposition:

Find the eigenvalues Λ.

Find the eigenvectors

Clustering:

Assign each data to acluster depending on thesign of the 2nd eigenvectorcomponents.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 74 / 100

Page 83: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Extension to several clusters

This method so far only allows to have 2 clusters. We would like to havemore !

Method 1: Recursively applying the algorithm in a hierarchical divisivemanner.

Weaknesses: Time consuming, inefficient, unstable.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 75 / 100

Page 84: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Extension to several clusters

This method so far only allows to have 2 clusters. We would like to havemore !

Method 1: Recursively applying the algorithm in a hierarchical divisivemanner.

Weaknesses: Time consuming, inefficient, unstable.B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 75 / 100

Page 85: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Extension to several clusters

Method 2: Change the objective function to handle K clusters.

Example:

JNcut(c1, · · · , cK ) =K∑i=1

Cut(ci , c̄i )∑j∈ci

∑nk=1 wjk

Weaknesses: The optimization process is tedious and often mathematicallyunpractical.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 76 / 100

Page 86: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Extension to several clusters

Method 2: Change the objective function to handle K clusters.

Example:

JNcut(c1, · · · , cK ) =K∑i=1

Cut(ci , c̄i )∑j∈ci

∑nk=1 wjk

Weaknesses: The optimization process is tedious and often mathematicallyunpractical.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 76 / 100

Page 87: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Examples of clustering algorithms Spectral Clustering

Extension to several clusters

Method 3: Use the other eigenvectors

Combining the K-Means algorithm and spectral clustering

The matrix U containing eigenvectors 2 to K can be used as a lowdimension representation of the data.

The matrix must first go through a unit normalization process:

Yij =Uij√∑nj=1 U

2ij

After a unit normalization of each row, the K-Means algorithm (orany other clustering algorithm) can be applied to Y to obtain apartition with K clusters.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 77 / 100

Page 88: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Open problems in clustering

Outline

1 Introduction

2 Clustering

3 Examples of clustering algorithms

4 Open problems in clustering

5 Bibliography

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 78 / 100

Page 89: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Open problems in clustering Picking a similarity measure

Picking a similarity measure

Whatever the type of clustering algorithm, they all rely on a similaritymeasure. Picking a good similarity measure is therefore of paramountimportance.

It is the core of any model.

It determines what type of structures are considered interesting.

It will ultimately change the clustering result.

Remarks

Choosing a similarity measure, or building one often require a goodknowledge of the data to analyze, or at least of the field they comefrom.

Picking a similarity measure already introduces a bias in a task that issupposed to be exploratory.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 79 / 100

Page 90: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Open problems in clustering Picking a similarity measure

Picking a similarity measure

Figure: Depending on the similarity measure, both partitions can make sense.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 80 / 100

Page 91: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Open problems in clustering Evaluating clustering results

Validating clustering results

Validating clustering results is a difficult process:

There is no proper definition of what a “good cluster” is, or looks like.

There is no “ground truth” in unsupervised learning to check thequality of a partition.

Clusters come in all shapes and forms.

The number of clusters to be found is usually unknown, andsometimes there are none.

Not all data sets have well-defined clusters.

Identifying clusters

While it is usually obvious to visually spot the clusters in 2D or 3D datasets, it is almost impossible to do so when there are more features !

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 81 / 100

Page 92: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Open problems in clustering Evaluating clustering results

Internal indexes

Internal indexes are criteria that can be used to evaluate the quality of aclustering partition.

Internal indexes can be subjective as they favor some cluster shapes,and can be biased toward a lower number of clusters.

They usually assess the compactness of the clusters and whether theyare well separated.

Remark

Internal indexes are said to be “internal”, by opposition to “externalindexes” that compare a result with an external ground truth.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 82 / 100

Page 93: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Open problems in clustering Evaluating clustering results

Internal indexes: Davies-Bouldin Index

Let Si be the average scatter of a cluster ci around its mean value µi :

Si =1

|ci |∑x∈ci

||x − µi ||2

Let Mi ,j = ||µi − µj ||2 be the average distance between two clusters ci andcj .

Davies-Bouldin Index for K clusters

DB =1

K

K∑i=1

maxj 6=i

Si + SjMi ,j

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 83 / 100

Page 94: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Open problems in clustering Evaluating clustering results

Internal indexes: Davies-Bouldin Index

The Davies-Bouldin index has the following properties:

A lower DB value means a better clustering.

This index is not normalized.

It favors spherical clusters.

it is biased so that it gives lower values with less clusters.

Remark

The Davies-Bouldin index is usually the favored internal index used toevaluate partitions created using the K-Means algorithm.

B. Matei and J. Sublime Data Analysis - Lecture 4 M2 2018/2019 84 / 100

Page 95: Data Analysis - Lecture 4 Introduction to unsupervised ...matei/teaching_fichiers/0910/AR/DM_CM4.pdf · 1 c 2 c 3 x 1 1 0 0 x 2 0 1 0 x 3 0 0 1 x 4 0 0 1 (b) Soft clustering c 1 c

Open problems in clustering Evaluating clustering results

Internal indexes: Silhouette Index

The Silhouette index is another internal index with interesting properties:

It is normalized between −1 and 1 (a value bellow 0 meaning a badpartition).

It can evaluate whether a data point belongs to its cluster, whether a cluster is well formed, or the quality of the whole partition.

Let $a_x \approx \|x - \mu_i\|^2$ be the mean distance between $x \in c_i$ and the data points that belong to the same cluster, and $b_x$ the mean distance to the data points that belong to other clusters:

$$b_x = \frac{1}{N - |c_i|} \sum_{y \notin c_i} \|x - y\|^2 = \sum_{k \neq i} \frac{|c_k|}{N - |c_i|} \|x - \mu_k\|^2$$

Silhouette index of an element $x$

$$SC(x \in c_i) = \frac{b_x - a_x}{\max(a_x, b_x)}$$


Silhouette index of a cluster $c_i$

$$SC(c_i) = \frac{1}{|c_i|} \sum_{x \in c_i} SC(x)$$

The best value of the silhouette index is 1.

It favors spherical clusters.

Silhouette index of a partition

$$SC = \frac{1}{K} \sum_{i=1}^{K} SC(c_i)$$
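In practice the silhouette can be computed with scikit-learn; a quick sketch on toy data (the blobs data set is our example; note that silhouette_score averages SC(x) over all points rather than averaging per-cluster values as above, and uses plain rather than squared distances):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # toy data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(silhouette_samples(X, labels)[:5])  # SC(x) for the first 5 points
print(silhouette_score(X, labels))        # average over the whole partition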


Internal indexes: C-Index

This index is defined as follows (a sketch implementation follows the properties below):

$$C = \frac{S - S_{min}}{S_{max} - S_{min}}$$

$S$ is the sum of distances over all pairs of points within the same cluster. Let there be $l$ such pairs.

$S_{min}$ is the sum of the $l$ smallest pairwise distances in the data set.

$S_{max}$ is the sum of the $l$ largest pairwise distances in the data set.

Properties of the C-Index

This index is normalized between 0 and 1 (since $S_{min} \leq S \leq S_{max}$).

This index is better when smaller.
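A minimal sketch of this definition with NumPy and SciPy (the function name c_index is ours; labels is assumed to be a 1-D NumPy array of cluster labels):

import numpy as np
from scipy.spatial.distance import pdist

def c_index(X, labels):
    d = pdist(X)  # all pairwise distances, as a condensed vector
    # mask of pairs whose two points share the same cluster label
    same = pdist(labels.reshape(-1, 1), metric="hamming") == 0
    S = d[same].sum()        # sum of within-cluster distances
    l = int(same.sum())      # number of within-cluster pairs
    d_sorted = np.sort(d)
    S_min = d_sorted[:l].sum()   # sum of the l smallest distances
    S_max = d_sorted[-l:].sum()  # sum of the l largest distances
    return (S - S_min) / (S_max - S_min)  # in [0, 1], smaller is better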


Internal indexes: Calinski & Harabasz

This index is defined as follows:

$$CH(k) = \frac{B/(k - 1)}{W/(N - k)}$$

$B$ is the sum of squares among the clusters

$W$ is the sum of squares within the clusters

$k$ is the number of clusters, $N$ the number of data points

Properties of the CH index

Not normalized

Better when higher

With balanced clusters, the CH index is generally a good criterion to indicate the correct number of clusters.
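scikit-learn implements this index as calinski_harabasz_score; a quick sketch on toy data (the blobs data set and parameters are our example):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=400, centers=4, random_state=1)
labels = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)
print(calinski_harabasz_score(X, labels))  # higher is better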


The notion of stability

Stability is a recent concept used to assess the quality of a clustering partition.

Stability: Definition

A cluster or a partition is said to be stable if it is insensitive to small changes in the underlying data.

Stability is usually tested by bootstrapping: running the same clustering algorithm many times over several sub-samples of the data (a sketch follows the pros and cons below).

Figure: Example of an unstable partition


Given the results of $R$ simulations $C_1, \cdots, C_R$, we have:

$$S = \frac{1}{(R-1)^2} \sum_{i=1}^{R} \sum_{j \neq i}^{R} \left(1 - \Delta(C_i, C_j)\right)$$

Pros

Very intuitive criterion

Assesses the quality of a partition without any bias

Cons

Time-consuming to compute: requires bootstrapping.

Requires the use of an operator (Δ) to assess the difference between two partitions, or of an external criterion.
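A minimal sketch of such a bootstrap procedure, using K-Means and taking the Adjusted Rand Index on the points shared by two sub-samples as the agreement term 1 − Δ (the function name, the sub-sample fraction, and the choice of ARI are our assumptions, not prescribed by the lecture):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability(X, k, R=20, frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # R bootstrap sub-samples (X is assumed to be a NumPy array)
    idx = [rng.choice(n, size=int(frac * n), replace=False) for _ in range(R)]
    labels = [KMeans(n_clusters=k, n_init=10).fit_predict(X[I]) for I in idx]
    scores = []
    for i in range(R):
        for j in range(i + 1, R):
            # compare the two partitions on their common points
            common, ii, jj = np.intersect1d(idx[i], idx[j], return_indices=True)
            scores.append(adjusted_rand_score(labels[i][ii], labels[j][jj]))
    return np.mean(scores)  # close to 1 means a stable clustering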


Using external indexes to assess cluster quality

Quite often clustering algorithms will be tested in a non-exploratory setting:

A data set for which the real classes are known will be used for test purposes (with the added advantage that the number of clusters will be known).

In this case, the clusters found will be compared to the real classes using external indexes or purity measures.

Cluster purity

Given $K$ real classes $Y^1, \cdots, Y^K$:

Purity of a cluster $C^i$: $CP(C^i) = \frac{1}{|C^i|} \max_j |C^i \cap Y^j|$

Purity of a partition: $CP = \frac{1}{N} \sum_{i=1}^{K} \max_j |C^i \cap Y^j|$

Examples of external indexes

Rand Index, Adjusted Rand Index, Accuracy, etc.
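A minimal sketch of the cluster purity above in NumPy, together with scikit-learn's Adjusted Rand Index (the function name purity is ours; true classes are assumed to be encoded as non-negative integers so that np.bincount applies):

import numpy as np
from sklearn.metrics import adjusted_rand_score

def purity(labels_pred, labels_true):
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        total += np.bincount(members).max()  # size of the majority class in C^i
    return total / len(labels_true)

# labels_true = ...; labels_pred = ...  # e.g. from KMeans on a labeled data set
# print(purity(labels_pred, labels_true))
# print(adjusted_rand_score(labels_true, labels_pred))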


Open problems in clustering Picking the right number of clusters

How many clusters?

Guessing the right number of clusters is one of the oldest problems in clustering, and several methods require it as a parameter.


Criteria to pick the right number of clusters

Some criteria exist to pick the optimal number of clusters, but they require running the algorithm several times with different numbers of clusters in order to evaluate the solutions:

The Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) work well with probabilistic algorithms (see the sketch after this list).

The Minimum Description Length criterion (MDL) and the Minimum Message Length criterion (MML) are recommended for hierarchical clustering.

Stability can also be used to guess the right number of clusters.
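For instance, with a Gaussian mixture model scikit-learn exposes both criteria directly; a quick sketch on toy data (the blobs data set and the tested range of k are our example):

from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
for k in range(1, 9):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(k, gm.bic(X), gm.aic(X))  # lower is better; look for the minimum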

Remark

Most of the time, the best solution is to ask the advice of an expert in the field of the studied data.


Sometimes, clustering indexes can be used to help pick the right number of clusters, as in the sketch below:
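A minimal sketch, running K-Means over a range of k and reporting two of the internal indexes seen above (the toy data and the range of k are our example):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=2)
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels), davies_bouldin_score(X, labels))
# pick the k that maximizes the silhouette and minimizes Davies-Bouldin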


The same kind of sweep can be done with the C-Index and the Calinski-Harabasz indexes.


Open problems in clustering Picking the right algorithm

Picking a clustering algorithm

Picking the right clustering algorithm depends on several factors and requires prior knowledge about the data. With high-dimensional data sets, you usually do not have this knowledge:

Are there any clusters?

What is the shape of the clusters, and how well separated are they?

Other considerations may include the complexity of your algorithm:

Complexity depending on the size of the data set.

Complexity depending on the number of attributes of the data set.

Picking an algorithm often ends up being a trade-off between the expected quality of the result and the computational complexity of your algorithm.


Open problems in clustering Summary

Summary of open problems in clustering

Defining the notion of similarity, and picking or defining a distance function accordingly.

Picking the right algorithm and the right parameters for this algorithm.

Figuring out how many clusters to search for.

Evaluating clustering results.

Comparing clustering results.

All known data analysis problems: normalizing the data or not, picking relevant attributes, removing the outliers or not, etc.


Bibliography

Christopher M. Bishop, Pattern Recognition and Machine Learning (2006).

Brian S. Everitt et al., Cluster Analysis, 5th edition (2011).
