Clustering

Adapted by Doug Downey from Machine Learning EECS 349, Bryan Pardo Machine Learning Clustering


Transcript of Clustering

Page 1: Clustering


Machine Learning

Clustering

Page 2: Clustering


Clustering

• Grouping data into (hopefully useful) sets.

[Figure: a set of points split into two groups, labeled "Things on the left" and "Things on the right"]

Page 3: Clustering


Clustering

• Unsupervised Learning
  – No labels

• Why do clustering?
  – Hypothesis Generation / Data Understanding
    • Clusters might suggest natural groups.
  – Visualization
  – Data pre-processing, e.g.:
    • Medical Diagnosis
    • Text Classification (e.g., search engines, Google Sets)

Page 4: Clustering


Some definitions

• Let X be the dataset:

  $X = \{x_1, x_2, x_3, \dots, x_n\}$

• An m-clustering of X is a partition of X into m sets (clusters) $C_1, \dots, C_m$ such that:

  1. Clusters are non-empty: $C_i \neq \emptyset$, $i = 1, \dots, m$
  2. Clusters cover all of X: $\bigcup_{i=1}^{m} C_i = X$
  3. Clusters do not overlap: $C_i \cap C_j = \emptyset$ if $i \neq j$
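As a concrete illustration (a minimal sketch, not from the slides; the toy dataset and clustering are made up), the three conditions can be checked directly in Python:

    # Minimal sketch (not from the slides): verify that a candidate
    # 3-clustering satisfies the three partition conditions.
    X = {"x1", "x2", "x3", "x4", "x5"}
    clusters = [{"x1", "x2"}, {"x3"}, {"x4", "x5"}]

    # 1. Clusters are non-empty.
    assert all(len(C) > 0 for C in clusters)
    # 2. Clusters cover all of X.
    assert set().union(*clusters) == X
    # 3. Clusters do not overlap (sizes sum to |X| given the cover holds).
    assert sum(len(C) for C in clusters) == len(X)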

Page 5: Clustering


How many possible clusterings? (Stirling numbers)

The number of ways to partition a dataset of size n into m clusters is given by the Stirling numbers of the second kind:

  $S(n, m) = \frac{1}{m!} \sum_{i=0}^{m} (-1)^{m-i} \binom{m}{i} i^n$

This count explodes as the size of the dataset (n) and the number of clusters (m) grow:

  S(15, 3)  = 2,375,101
  S(20, 4)  = 45,232,115,901
  S(100, 5) ≈ 10^68
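A quick Python sketch (not from the slides) that evaluates this formula and reproduces the numbers above:

    from math import comb, factorial

    # Stirling number of the second kind:
    # S(n, m) = (1/m!) * sum_{i=0}^{m} (-1)^(m-i) * C(m, i) * i^n
    def stirling2(n, m):
        return sum((-1) ** (m - i) * comb(m, i) * i ** n
                   for i in range(m + 1)) // factorial(m)

    print(stirling2(15, 3))    # 2375101
    print(stirling2(20, 4))    # 45232115901
    print(stirling2(100, 5))   # ~6.6 * 10^67, on the order of 10^68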

Page 6: Clustering


What does this mean?

• We can’t try all possible clusterings.

• Clustering algorithms look at a small fraction of all partitions of the data.

• The exact partitions tried depend on the kind of clustering used.

Page 7: Clustering


Who is right?

• Different techniques cluster the same data set DIFFERENTLY.

• Who is right? Is there a “right” clustering?

Page 8: Clustering

Classic Example: Half Moons

From Batra et al., http://www.cs.cmu.edu/~rahuls/pub/bmvc2008-clustering-rahuls.pdf


Page 9: Clustering


Steps in Clustering

• Select Features
• Define a Proximity Measure
• Define a Clustering Criterion
• Define a Clustering Algorithm
• Validate the Results
• Interpret the Results

Page 10: Clustering


Kinds of Clustering

• Sequential
  – Fast
  – Results depend on data order

• Hierarchical
  – Start with many clusters
  – Join clusters at each step

• Cost Optimization
  – Fixed number of clusters (typically)
  – Boundary detection, probabilistic classifiers

Page 11: Clustering


A Sequential Clustering Method

• Basic Sequential Algorithmic Scheme (BSAS)
  – S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, London, England, 1999

• Assumption: The number of clusters is not known in advance.

• Let:
  – d(x, C) be the distance between feature vector x and cluster C
  – Θ be the threshold of dissimilarity
  – q be the maximum number of clusters

Page 12: Clustering


BSAS Pseudo Code

m = 1
C_m = {x_1}
For i = 2 to n
    Find C_k such that d(x_i, C_k) = min_j d(x_i, C_j)
    If (d(x_i, C_k) > Θ) and (m < q)
        m = m + 1
        C_m = {x_i}
    Else
        C_k = C_k ∪ {x_i}
    End
End
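A runnable Python rendering of this scheme (a sketch under assumptions the slide leaves open: points are NumPy vectors, and d(x, C) is the Euclidean distance from x to the mean of C, one common choice):

    import numpy as np

    def bsas(points, theta, q):
        # theta: dissimilarity threshold; q: maximum number of clusters.
        clusters = [[points[0]]]
        for x in points[1:]:
            # Find the nearest existing cluster C_k.
            dists = [np.linalg.norm(x - np.mean(c, axis=0)) for c in clusters]
            k = int(np.argmin(dists))
            if dists[k] > theta and len(clusters) < q:
                clusters.append([x])       # start a new cluster
            else:
                clusters[k].append(x)      # absorb x into the nearest cluster
        return clusters

    # Example: two well-separated groups on the real line.
    data = [np.array([v]) for v in [1.0, 1.2, 0.9, 10.0, 10.3, 9.8]]
    print([len(c) for c in bsas(data, theta=3.0, q=5)])   # [3, 3]

Note how the result depends on the order in which points arrive, which is exactly the sensitivity flagged in the summary.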

Page 13: Clustering


A Cost-optimization method

• K-means clustering
  – J. B. MacQueen (1967): "Some Methods for Classification and Analysis of Multivariate Observations", Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297

• A greedy algorithm
• Partitions n samples into k clusters
• Minimizes the sum of the squared distances to the cluster centers

Page 14: Clustering


The K-means algorithm

1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids (means).

2. Assign each object to the group that has the closest centroid (mean).

3. When all objects have been assigned, recalculate the positions of the K centroids (means).

4. Repeat Steps 2 and 3 until the centroids no longer move.
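A minimal NumPy sketch of these four steps (assumed details: Euclidean distance, centroids initialized by sampling K data points; a production implementation would also handle empty clusters and cap the iteration count):

    import numpy as np

    def kmeans(X, k, rng=np.random.default_rng(0)):
        # Step 1: initialize centroids as k randomly chosen data points.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        while True:
            # Step 2: assign each object to the closest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 3: recompute each centroid as the mean of its objects.
            new_centroids = np.array([X[labels == j].mean(axis=0)
                                      for j in range(k)])
            # Step 4: stop when the centroids no longer move.
            if np.allclose(new_centroids, centroids):
                return labels, centroids
            centroids = new_centroids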

Page 15: Clustering


K-means clustering

• The way to initialize the mean values is not specified.
  – Randomly choose k samples?

• Results depend on the initial means.
  – Try multiple starting points? (a sketch follows below)

• Assumes K is known.
  – How do we choose this?
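One standard remedy for initialization sensitivity, sketched with the kmeans function from the previous page: run from several random starts and keep the clustering with the lowest sum of squared distances (SSE):

    import numpy as np

    def kmeans_restarts(X, k, n_restarts=10):
        # Keep the run whose clustering has the lowest SSE.
        best_sse, best = np.inf, None
        for seed in range(n_restarts):
            labels, centroids = kmeans(X, k, rng=np.random.default_rng(seed))
            sse = ((X - centroids[labels]) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (labels, centroids)
        return best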


Page 18: Clustering


EM Algorithm

• General probabilistic approach to dealing with missing data

• "Parameters" (model)
  – For MMs: cluster distributions P(x | c_i)
    • For MoGs: mean μ_i and variance σ_i² of each c_i

• "Variables" (data)
  – For MMs: assignments of data points to clusters
    • Probabilities of these represented as P(c_j | x_i)

• Idea: alternately optimize parameters and variables (sketched below for a 1-D mixture of Gaussians)
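A compact illustration of that alternation for a 1-D mixture of two Gaussians (a sketch under assumed details; the slides do not specify initialization or stopping rules):

    import numpy as np

    def gaussian_pdf(x, mu, var):
        return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    def em_mog(x, k=2, n_iters=50):
        # Parameters: mixing weights pi, means mu, variances var.
        # Variables: responsibilities r[i, j], i.e. P(c_j | x_i).
        rng = np.random.default_rng(0)
        pi = np.full(k, 1.0 / k)
        mu = rng.choice(x, size=k, replace=False)   # init means at data points
        var = np.full(k, x.var())
        for _ in range(n_iters):
            # E-step: optimize the "variables" (cluster responsibilities).
            r = np.array([pi[j] * gaussian_pdf(x, mu[j], var[j])
                          for j in range(k)]).T
            r /= r.sum(axis=1, keepdims=True)
            # M-step: optimize the "parameters" given the responsibilities.
            n_j = r.sum(axis=0)
            pi = n_j / len(x)
            mu = (r * x[:, None]).sum(axis=0) / n_j
            var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_j
        return pi, mu, var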


Page 22: Clustering

Mixture Models for Documents

• Learn simultaneously P(w | topic), P(topic | doc)

From Blei et al., 2003 (http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf)

Page 23: Clustering

Greedy Hierarchical Clustering

• Initialize one cluster for each data point

• Until done:
  – Merge the two nearest clusters (see the sketch below)
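A naive Python sketch (assumed details: "nearest" means smallest distance between cluster means, and "done" means a target of k clusters has been reached, since the slides leave the stopping rule open):

    import numpy as np
    from itertools import combinations

    def greedy_agglomerative(points, k):
        clusters = [[p] for p in points]   # one cluster per data point
        while len(clusters) > k:
            # Find the pair of clusters whose means are closest.
            a, b = min(combinations(range(len(clusters)), 2),
                       key=lambda ij: np.linalg.norm(
                           np.mean(clusters[ij[0]], axis=0)
                           - np.mean(clusters[ij[1]], axis=0)))
            clusters[a].extend(clusters.pop(b))   # merge the nearest pair
        return clusters

Each merge rescans all cluster pairs, which is where the naive quadratic-or-worse runtime noted in the summary comes from.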


Page 24: Clustering

Hierarchical Clustering on Strings

• Features = contexts in which strings appear



Page 26: Clustering

Summary

• Algorithms:
  – Sequential clustering
    • Requires a key distance threshold; sensitive to data order
  – K-means clustering
    • Requires the number of clusters; sensitive to initial conditions
    • Special case of mixture modeling
  – Greedy agglomerative clustering
    • Naively takes on the order of n² runtime
    • Hard to tell when you're "done"
