31 Machine Learning Unsupervised Cluster Validity

Machine Learning for Data Mining: Cluster Validity
Andres Mendez-Vazquez
July 30, 2015

Transcript of 31 Machine Learning Unsupervised Cluster Validity


Page 2: Outline

1 Introduction
  What is a Good Clustering?

2 Cluster Validity
  The Process
  Hypothesis Testing
    Monte Carlo techniques
    Bootstrapping Techniques
    Which Hypothesis?
  Hypothesis Testing in Cluster Validity
    External Criteria
    Relative Criteria
  Hard Clustering


Page 4: What is a Good Clustering?

Internal criterion
A good clustering will produce high-quality clusters in which:
- The intra-class (that is, intra-cluster) similarity is high.
- The inter-class similarity is low.
The measured quality of a clustering depends on both the document representation and the similarity measure used.


Page 8: Problem

Many of the clustering algorithms
They impose a clustering structure on the data, even though the data may not possess any.

This is why
Cluster analysis is not a panacea.

Therefore
It is necessary to have an indication that the vectors of X form clusters before we apply a clustering algorithm.


Page 11: Then

The problem of verifying whether X possesses a clustering structure
Without identifying it explicitly, this is known as clustering tendency.

Page 12: What can happen if X possesses a cluster structure?

A different kind of problem is encountered now
- All the previous algorithms require knowledge of the values of specific parameters.
- Some of them impose restrictions on the shape of the clusters.

What is more
Poor estimation of these parameters and inappropriate restrictions on the shape of the clusters may lead to incorrect conclusions about the clustering structure of X.

Thus, it is necessary to discuss
Methods suitable for quantitative evaluation of the results of a clustering algorithm. This task is known as cluster validity.



Page 18: The Cluster Flowchart

[Figure: flowchart of the validity paradigm for clustering structures]


Page 20: Hypothesis Testing Revisited

Hypothesis
H0 : θ = θ0
H1 : θ ≠ θ0

In addition
Also let Dρ be the critical interval corresponding to significance level ρ of a test statistic q.

Now
Let Θ1 be the set of all values that θ may take under hypothesis H1.


Page 23: Power Function

Definition (Power Function)

W(θ) = P(q ∈ Dρ | θ ∈ Θ1)    (1)

Meaning
W(θ) is the probability that q lies in the critical region when the true value of the parameter vector is θ.

Thus
- The power function can be used for the comparison of two different statistical tests.
- The test whose power under the alternative hypotheses is greater is always preferred.


Page 27: There are two types of errors associated with a statistical test

Suppose that H0 is true
- If q(x) ∈ Dρ, H0 will be rejected even though it is true. This is called a Type I error.
- The probability of such an error is ρ.
- The probability of accepting H0 when it is true is 1 − ρ.

Suppose that H0 is false
- If q(x) ∉ Dρ, H0 will be accepted even though it is false. This is called a Type II error.
- The probability of such an error is 1 − W(θ).
- This depends on the specific value of θ.
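These two error probabilities can be checked by simulation. The sketch below uses a hypothetical right-tailed z-test (not an example from the slides): under H0 the rejection rate should be close to ρ, and under an alternative θ it estimates 1 − W(θ), the Type II error rate.

```python
import random

def rejection_rate(theta_true, trials=2000, n=30, seed=0):
    """Fraction of simulated samples for which a right-tailed z-test
    rejects H0: theta = 0 at rho = 0.05 (critical region q > 1.645).
    A hypothetical illustration, not taken from the slides."""
    rng = random.Random(seed)
    z_crit = 1.645                      # right-tail critical value for rho = 0.05
    rejects = 0
    for _ in range(trials):
        sample = [rng.gauss(theta_true, 1.0) for _ in range(n)]
        q = sum(sample) / (n ** 0.5)    # distributed N(theta * sqrt(n), 1)
        if q > z_crit:
            rejects += 1
    return rejects / trials

type_i = rejection_rate(0.0)       # H0 true: should be close to rho = 0.05
type_ii = 1 - rejection_rate(0.5)  # H0 false: 1 - W(0.5), the Type II rate
```

Note that the Type II rate depends on the true θ: the further θ lies from θ0, the larger W(θ) and the smaller 1 − W(θ).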


Page 35: Thus

Something notable
- The probability density function (pdf) of the statistic q under H0, for most of the statistics used in practice, has a single maximum.
- In addition, the region Dρ is either a half-line or the union of two half-lines.

Page 36: Example

[Figure: Dρ is the union of two half-lines (a two-tailed statistical test)]

Page 37: Example

[Figure: a right-tailed test]

Page 38: Example

[Figure: a left-tailed test]

Page 39: Problem

In many practical cases
The exact form of the pdf of a statistic q, under a given hypothesis, is not available and is difficult to obtain.

However, we can do the following
- Monte Carlo techniques.
- Bootstrapping techniques.


Page 42: Monte Carlo techniques

What do they do?
They rely on simulating the process at hand using a sufficient amount of computer-generated data.
- Given enough data, we can try to learn the pdf of q.
- Then, using that pdf, we simulate samples of q.

Thus
For each of the, say, r data sets Xi, we compute the value of q, denoted by qi.

Then
We construct the corresponding histogram of these values.
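The loop above can be sketched directly: draw r data sets under H0, compute qi for each, and keep the sorted values (a histogram of them approximates the pdf of q). The callables `generate` and `statistic`, and the toy choice of statistic (mean nearest-neighbour distance of uniform points), are illustrative assumptions, not from the slides.

```python
import random

def monte_carlo_null(statistic, generate, r=200, seed=0):
    """Monte Carlo approximation of the null distribution of a statistic q:
    generate r synthetic data sets X_i under H0 and record q_i = q(X_i).
    `statistic` and `generate` are user-supplied callables."""
    rng = random.Random(seed)
    return sorted(statistic(generate(rng)) for _ in range(r))

# Toy H0: 50 points uniform in the unit square.
def gen_uniform(rng, n=50):
    return [(rng.random(), rng.random()) for _ in range(n)]

# Toy statistic q: mean nearest-neighbour distance.
def mean_nn_dist(points):
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return sum(min(dist(p, o) for o in points if o is not p)
               for p in points) / len(points)

qs = monte_carlo_null(mean_nn_dist, gen_uniform, r=100)  # sorted q_1, ..., q_r
```

The sorted `qs` make the later acceptance/rejection rules a matter of counting how many qi the observed q exceeds.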


Page 47: Using this approximation

Assume
- q corresponds to a right-tailed statistical test.
- A histogram is constructed using the r values of q corresponding to the r data sets.


Page 49: Using this approximation

[Figure: a right-tailed test]

Page 50: Thus

Then, acceptance or rejection may be based on the following rules
- Reject H0 if q is greater than (1 − ρ)r of the qi values.
- Accept H0 if q is smaller than (1 − ρ)r of the qi values.


Page 52: Next

[Figure: a left-tailed test]

Page 53: Next

For a left-tailed test, rejection or acceptance of the null hypothesis is done on the following basis
1 Reject H0 if q is smaller than (1 − ρ)r of the qi values.
2 Accept H0 if q is greater than ρr of the qi values.

Page 54: Now, for the two-tailed statistical test

[Figure: a two-tailed statistical test]

Page 55: Thus

For a two-tailed test we have
Accept H0 if q is greater than (ρ/2)r of the qi values and smaller than (1 − ρ/2)r of the qi values.
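The three counting rules can be collected into one decision function. This is a sketch: the tail names are hypothetical, and ties between q and some qi are simply ignored.

```python
def monte_carlo_decision(q, qi, rho=0.05, tail="right"):
    """Accept or reject H0 by ranking the observed statistic q among the
    r Monte Carlo values q_i, following the counting rules above."""
    r = len(qi)
    greater = sum(q > v for v in qi)      # number of q_i values that q exceeds
    if tail == "right":
        # reject if q is greater than (1 - rho) * r of the q_i values
        return "reject" if greater > (1 - rho) * r else "accept"
    if tail == "left":
        smaller = sum(q < v for v in qi)  # number of q_i values that exceed q
        # reject if q is smaller than (1 - rho) * r of the q_i values
        return "reject" if smaller > (1 - rho) * r else "accept"
    # two-tailed: accept if q is greater than (rho/2) * r of the q_i values
    # and smaller than (1 - rho/2) * r of them
    return "accept" if (rho / 2) * r < greater < (1 - rho / 2) * r else "reject"
```

For example, with r = 100 reference values and ρ = 0.05, a right-tailed test rejects exactly when q exceeds more than 95 of the qi.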

Page 56: Bootstrapping Techniques

Why?
They constitute an alternative way to cope with a limited amount of data.

Then
The idea here is to parameterize the unknown pdf in terms of an unknown parameter.

How
By resampling, we cope with the limited amount of data and improve the accuracy of the estimate of the unknown pdf parameter.


Page 59: The Process

For this, we create
Several “fake” data sets X1, ..., Xr by sampling X with replacement.

Thus
By using these samples, we estimate the desired pdf of q.

Then
Typically, good estimates are obtained if r is between 100 and 200.
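The resampling step above can be sketched as follows; the statistic (here the sample mean) is an illustrative choice, not prescribed by the slides.

```python
import random

def bootstrap_statistic(X, statistic, r=200, seed=0):
    """Bootstrap sketch: build r 'fake' data sets X_1, ..., X_r by sampling
    X with replacement (each the same size as X) and collect the r values
    of the statistic. Their empirical distribution approximates the pdf
    of q given only the limited data X."""
    rng = random.Random(seed)
    n = len(X)
    return sorted(statistic([X[rng.randrange(n)] for _ in range(n)])
                  for _ in range(r))

# Example: bootstrap distribution of the sample mean, with r in the
# 100-200 range recommended above.
X = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0]
means = bootstrap_statistic(X, lambda s: sum(s) / len(s), r=200)
```

The sorted `means` can then be fed to the same counting rules used for the Monte Carlo reference values.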



Page 63: Which Hypothesis?

Random position hypothesis
H0: All the locations of N data points in some specific region of a d-dimensional space are equally likely.

Random graph hypothesis
H0: All N × N rank order proximity matrices are equally likely.

Random label hypothesis
H0: All permutations of the labels on N data objects are equally likely.


Page 66: Thus

We must define an appropriate statistic
Whose values are indicative of the structure of a data set, and compare the value that results from our data set X against the value obtained from the reference (random) population.

Random population
In order to obtain the baseline distribution under the null hypothesis, statistical sampling techniques such as Monte Carlo analysis and bootstrapping are used (Jain and Dubes, 1988).


Page 68: For example

Random position hypothesis (appropriate for ratio data)
All the arrangements of N vectors in a specific region of the d-dimensional space are equally likely to occur.

How?
One way to produce such an arrangement is to insert each point randomly in this region of the d-dimensional space, according to a uniform distribution.

The random position hypothesis
It can be used with either an external or an internal criterion.
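Generating one such arrangement is a few lines of code. Using the axis-aligned bounding box of X as the "specific region" is an illustrative assumption; the slides only require some fixed region of the d-dimensional space.

```python
import random

def random_position_sample(X, rng):
    """One arrangement of N = len(X) points under the random position
    hypothesis: each point is placed independently and uniformly in the
    bounding hyper-rectangle of X (an assumed choice of region)."""
    d = len(X[0])
    lo = [min(x[j] for x in X) for j in range(d)]
    hi = [max(x[j] for x in X) for j in range(d)]
    return [[rng.uniform(lo[j], hi[j]) for j in range(d)] for _ in X]

# Example: a reference arrangement for three 2-D points.
reference = random_position_sample([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]],
                                   random.Random(0))
```

Calling this r times yields the reference population against which the statistic of X is compared.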



Page 72: Internal criteria

What do we do?
In this case, the statistic q is defined so as to measure the degree to which a clustering structure, produced by a clustering algorithm, matches the proximity matrix of the corresponding data set.

We apply our clustering algorithm to the following data sets
1 Let Xi be a set of N vectors generated according to the random position hypothesis.
2 Let Pi be the corresponding proximity matrix.
3 Let Ci be the corresponding clustering.

Then, we apply our algorithm to the real data X to obtain C
Now, we compute the statistic q for each clustering structure.
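The whole internal-criterion procedure can be sketched end to end. The clustering (a midrange threshold on 1-D data) and the statistic (a between/within spread ratio) are deliberately simple stand-ins, not the slides' choices; `cluster`, `q_stat`, and `generate` are hypothetical callable names.

```python
import random

def internal_validity_test(X, cluster, q_stat, generate, r=100, rho=0.05, seed=0):
    """Internal-criterion Monte Carlo test (a sketch):
    1. cluster the real data X and compute q = q_stat(X, labels);
    2. for r reference sets X_i drawn under the random position
       hypothesis, compute q_i the same way;
    3. reject H0 (right-tailed rule) if q exceeds (1 - rho) * r of the q_i."""
    rng = random.Random(seed)
    q = q_stat(X, cluster(X))
    qi = [q_stat(Xi, cluster(Xi)) for Xi in (generate(rng) for _ in range(r))]
    greater = sum(q > v for v in qi)
    return ("reject" if greater > (1 - rho) * r else "accept"), q, qi

# Stand-in clustering: split 1-D data at the midrange.
def cluster1d(X):
    t = (min(X) + max(X)) / 2
    return [0 if x < t else 1 for x in X]

# Stand-in statistic: distance between cluster means over mean within-cluster spread.
def spread_ratio(X, labels):
    g0 = [x for x, l in zip(X, labels) if l == 0]
    g1 = [x for x, l in zip(X, labels) if l == 1]
    if not g0 or not g1:
        return 0.0
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    within = sum(abs(x - m0) for x in g0) + sum(abs(x - m1) for x in g1)
    return abs(m1 - m0) / (within / len(X) + 1e-12)

# Two well-separated 1-D clusters; references are uniform in the same range.
X = [0.0, 0.1, 0.2, 0.15, 5.0, 5.1, 5.2, 5.05]
gen = lambda rng: [rng.uniform(min(X), max(X)) for _ in X]
decision, q, qi = internal_validity_test(X, cluster1d, spread_ratio, gen, r=100)
```

For this clearly clustered X, q is far larger than the reference values, so the random position hypothesis is rejected: X has internal clustering structure.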


Page 77: Then, we use our hypothesis H0

The random hypothesis H0
It is rejected if the value of q resulting from X lies in the critical interval Dρ of the statistic's pdf under the reference random population.

Meaning
If q is unusually small or large.


Also an External Criterion can be used

Definition
The statistic q is defined to measure the degree of correspondence between a prespecified structure P imposed on X and the clustering that results after the application of a specific clustering algorithm.

Then
The value of q corresponding to the clustering C resulting from the data set X is tested against the q_i's. These q_i's correspond to the clusterings resulting from the reference population generated under the random position hypothesis.

Again
The random hypothesis is rejected if q is unusually large or small.


Nevertheless

There are more examples
See Chapter 16 of the book by Theodoridis.


External Criteria Usage

First
For the comparison of a clustering structure C, produced by a clustering algorithm, with a partition P of X drawn independently from C.

Second
For measuring the degree of agreement between a predetermined partition P and the proximity matrix of X, P.


Comparison of P with a Clustering C

First
In this case, C may be either a specific hierarchy of clusterings or a specific clustering.

However
The problem with hierarchical clustering is cutting the dendrogram at the correct level.

Then
We will concentrate on clusterings that do not imply hierarchical clustering.


Thus

Setup
Consider a clustering C given by a specific clustering algorithm. This clustering is compared against an independently drawn partition P.

Thus, we obtain the following sets:
C = {C_1, C_2, ..., C_m}
P = {P_1, P_2, ..., P_s}

Note that the number of clusters in C need not be the same as the number of groups in P.


Now

Consider the following
Let n_{ij} denote the number of vectors that belong to C_i and P_j simultaneously.
Also, n_i^C = \sum_{j=1}^{s} n_{ij}: the number of vectors that belong to C_i.
Similarly, n_j^P = \sum_{i=1}^{m} n_{ij}: the number of vectors that belong to P_j.


Consider a pair of vectors (x_v, x_u)

Thus, we have the following cases:
Case 1: Both vectors belong to the same cluster in C and to the same group in P.
Case 2: Both vectors belong to the same cluster in C and to different groups in P.
Case 3: Both vectors belong to different clusters in C and to the same group in P.
Case 4: Both vectors belong to different clusters in C and to different groups in P.


Example
The numbers of pairs of points for the four cases are denoted as a, b, c, and d.

Thus

The total number of pairs of points is \frac{N(N-1)}{2}, denoted as M, so

a + b + c + d = M (2)

Now
We can give some commonly used external indices for measuring the match between C and P.
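The four cases, and the identity a + b + c + d = M, are easy to check in code. A minimal sketch (the function name `pair_counts` is mine, not from the lecture) that counts the cases from two labelings:

```python
from itertools import combinations

def pair_counts(labels_C, labels_P):
    """Count pairs of points for the four cases:
    a: same cluster in C, same group in P
    b: same cluster in C, different groups in P
    c: different clusters in C, same group in P
    d: different clusters in C, different groups in P"""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_C)), 2):
        same_C = labels_C[i] == labels_C[j]
        same_P = labels_P[i] == labels_P[j]
        if same_C and same_P:
            a += 1
        elif same_C:
            b += 1
        elif same_P:
            c += 1
        else:
            d += 1
    return a, b, c, d
```

For any pair of labelings of N points, the four counts sum to M = N(N-1)/2, as in Eq. (2).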


Commonly used external indices

Rand index (Rand, 1971)

R = \frac{a + d}{M} (3)

Jaccard coefficient

J = \frac{a}{a + b + c} (4)

Fowlkes and Mallows index (Fowlkes and Mallows, 1983)

FM = \sqrt{\frac{a}{a + b} \times \frac{a}{a + c}} (5)
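Given the pair counts a, b, c, d, Eqs. (3)-(5) translate directly; a small sketch with illustrative function names of my own:

```python
from math import sqrt

def rand_index(a, b, c, d):
    M = a + b + c + d           # M = N(N-1)/2, Eq. (2)
    return (a + d) / M          # Eq. (3)

def jaccard(a, b, c, d):
    return a / (a + b + c)      # Eq. (4): case 4 (d) is excluded

def fowlkes_mallows(a, b, c, d):
    return sqrt((a / (a + b)) * (a / (a + c)))   # Eq. (5)
```

In the case of perfect agreement between C and P (b = c = 0), all three indices equal 1.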


Explanation

Given a + d
The Rand statistic measures the fraction of the total number of pairs that are either case 1 or case 4.

Something Notable
The Jaccard coefficient follows the same philosophy as the Rand index, except that it excludes case 4.

Properties
The values of these two statistics are between 0 and 1.


Commonly used external indices

Hubert's Γ statistic

\Gamma = \frac{M a - m_1 m_2}{\sqrt{m_1 m_2 (M - m_1)(M - m_2)}} (6)

with m_1 = a + b and m_2 = a + c.

Property
Unusually large absolute values of Γ suggest that C and P agree with each other.
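Eq. (6) can be computed from the same pair counts; here I assume the standard definitions m1 = a + b and m2 = a + c (the pairs co-clustered in C, respectively in P), which the slide does not spell out:

```python
from math import sqrt

def hubert_gamma(a, b, c, d):
    """Normalized Hubert Γ statistic from the pair counts, Eq. (6)."""
    M = a + b + c + d           # total number of pairs
    m1, m2 = a + b, a + c       # assumed: pairs co-clustered in C, resp. in P
    return (M * a - m1 * m2) / sqrt(m1 * m2 * (M - m1) * (M - m2))
```

Under this convention, Γ = 1 when C and P agree perfectly, and Γ = 0 when the co-clusterings are uncorrelated.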


How do we use all this?

Assuming a Random Hypothesis
It is possible to use a Monte Carlo method of sampling.

Example: Gibbs Sampling
1 Initialize x to some value.
2 For each variable x_i in the feature vector x, resample

x_i \sim P(x_i | x_{j \neq i})
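As a concrete illustration (not from the lecture), consider a Gibbs sampler for a standard bivariate Gaussian with correlation ρ, where each full conditional is itself Gaussian, so both steps above are exact:

```python
import random
from math import sqrt

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    The full conditionals are x1 | x2 ~ N(rho * x2, 1 - rho^2),
    and symmetrically for x2 | x1."""
    rng = random.Random(seed)
    x1 = x2 = 0.0                      # step 1: initialize x to some value
    s = sqrt(1.0 - rho ** 2)           # conditional standard deviation
    samples = []
    for _ in range(n_samples):
        # step 2: resample each coordinate from its full conditional
        x1 = rng.gauss(rho * x2, s)
        x2 = rng.gauss(rho * x1, s)
        samples.append((x1, x2))
    return samples
```

After a burn-in, the samples approximate draws from the joint distribution; their empirical correlation should be close to ρ.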


Example
A trace of sampling for a single variable (figure not reproduced in the transcript).

Algorithm and Example

Please
Go and read Section 16.3 in Theodoridis' book, page 871, for more.


Hard Clustering

The Dunn and Dunn-like indices
Given a dissimilarity function:

d(C_i, C_j) = \min_{x \in C_i, y \in C_j} d(x, y) (7)

Then it is possible to define the diameter of a cluster:

diam(C) = \max_{x, y \in C} d(x, y) (8)

Then the Dunn index:

D_m = \min_{i=1,...,m} \left\{ \min_{j=i+1,...,m} \left( \frac{d(C_i, C_j)}{\max_{k=1,...,m} diam(C_k)} \right) \right\} (9)
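Eqs. (7)-(9) admit a direct brute-force implementation. A minimal sketch assuming Euclidean dissimilarity and at least one non-singleton cluster (the helper names are mine):

```python
from math import dist  # Euclidean distance, Python 3.8+

def set_distance(Ci, Cj):
    """d(Ci, Cj): minimum pairwise dissimilarity between two clusters, Eq. (7)."""
    return min(dist(x, y) for x in Ci for y in Cj)

def diameter(C):
    """diam(C): maximum pairwise dissimilarity within a cluster, Eq. (8)."""
    return max(dist(x, y) for x in C for y in C)

def dunn_index(clusters):
    """Dunn index, Eq. (9): smallest inter-cluster distance divided by the
    largest cluster diameter (assumed nonzero)."""
    max_diam = max(diameter(C) for C in clusters)
    m = len(clusters)
    return min(set_distance(clusters[i], clusters[j]) / max_diam
               for i in range(m) for j in range(i + 1, m))
```

Compact, well-separated clusters give a large value (large numerator, small denominator), matching the property stated on the next slide.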


Thus

It is possible to prove that
If X contains compact and well-separated clusters, Dunn's index will be large.

Example
(figure not reproduced in the transcript)


Although there are more

Please
Look at Chapter 16 in Theodoridis for more examples.