Unsupervised Learning: K-Means, Gaussian Mixture Models
(Balloch, CS4641 Fall 2016, Georgia Institute of Technology)

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.



Unsupervised Learning

• Supervised learning used labeled data pairs (x, y) to learn a function f : X → Y
  – But what if we don't have labels? Most data doesn't!
• No labels = unsupervised learning
• Only some points are labeled = semi-supervised learning
  – Labels may be expensive to obtain, so we only get a few

Unsupervised Learning: Major Categories (and their uses)

• Clustering
• Density estimation
• Dimensionality reduction (feature selection)

Kernel Density Estimation

Some material adapted from slides by Le Song, GT.
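The kernel density estimation slides in this section are figures only; as a minimal sketch of the idea, here is a 1-D Gaussian-kernel density estimator (the function name and bandwidth value are illustrative choices, not from the slides):

```python
import math

def gaussian_kde(data, h):
    """Return a 1-D kernel density estimate p(x): the average of one
    Gaussian bump of bandwidth h centered on each data point."""
    n = len(data)
    def p(x):
        # (1 / (n * h * sqrt(2*pi))) * sum_i exp(-0.5 * ((x - x_i)/h)^2)
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2)
                   for xi in data) / (n * h * math.sqrt(2 * math.pi))
    return p

# Example: a density estimate from a few 1-D samples
samples = [1.0, 1.2, 0.8, 4.0, 4.1]
p = gaussian_kde(samples, h=0.5)
```

The estimate is high near the samples and low in the gap between the two groups; larger bandwidths h give smoother (but blurrier) estimates.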


K-Means Clustering

Some material adapted from slides by Andrew Moore, CMU. Visit http://www.autonlab.org/tutorials/ for Andrew's repository of Data Mining tutorials.

Clustering Data

K-Means Clustering

K-Means(k, X):
• Randomly choose k cluster center locations (centroids)
• Loop until convergence:
  – Assign each point to the cluster of the closest centroid
  – Re-estimate the cluster centroids based on the data assigned to each cluster
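The steps above can be sketched in plain Python (this is Lloyd's algorithm; the function names and the assignments-stopped-changing convergence test are choices of this sketch):

```python
import random

def kmeans(k, X, iters=100, seed=0):
    """K-Means(k, X): X is a list of equal-length tuples.
    Returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Randomly choose k cluster center locations (centroids)
    centroids = rng.sample(X, k)
    assign = None
    for _ in range(iters):
        # Assign each point to the cluster of the closest centroid
        new_assign = [min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(x, centroids[j])))
                      for x in X]
        if new_assign == assign:
            break  # converged: assignments stopped changing
        assign = new_assign
        # Re-estimate each centroid as the mean of its assigned points
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(vals) / len(members)
                                     for vals in zip(*members))
    return centroids, assign
```

On well-separated data this converges in a handful of iterations; as the later slides note, the result depends on the random initialization.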


K-Means Animation

Example generated by Andrew Moore using Dan Pelleg's super-duper fast K-means system:
Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases, 1999.


K-Means Objective Function

• K-means finds a local optimum of the following objective function:
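The objective itself appeared only as an image on the slide; the standard k-means objective (the within-cluster sum of squared distances) is:

```latex
J(C, \mu) \;=\; \sum_{i=1}^{k} \;\sum_{x \in C_i} \lVert x - \mu_i \rVert^2
```

Each iteration of the assign/re-estimate loop never increases J, which is why the algorithm converges, but only to a local optimum.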

Problems with K-Means

• Very sensitive to the initial points
  – Do many runs of K-Means, each with different initial centroids
  – Seed the centroids using a better method than randomly choosing the centroids
    • e.g., farthest-first sampling
• Must manually choose k
  – Learn the optimal k for the clustering
    • Note that this requires a performance measure

Choosing the key parameter k

1. The Elbow Method
   a. SSE = Σᵢ₌₁ᴷ Σ_{x ∈ cᵢ} dist(x, cᵢ)²
2. x-means clustering
   a. Repeatedly attempting subdivision
3. Others:
   a. Information Criterion Approach
   b. An Information Theoretic Approach
   c. The Silhouette Method
   d. Cross-validation
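For the elbow method, the SSE in 1a can be computed with a helper like the one below and plotted against k (the helper name and argument layout are illustrative):

```python
def sse(X, centroids, assign):
    """Sum of squared errors: total squared distance from each point to
    its assigned centroid. This is the quantity plotted against k in the
    elbow method."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
               for x, c in zip(X, assign))
```

SSE decreases monotonically as k grows (reaching 0 when every point is its own centroid), so you pick the k at the "elbow" where the marginal decrease flattens out.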

Problems with K-Means

• How do you tell it which clustering you want?
  – Constrained clustering techniques (semi-supervised):
    • Same-cluster constraint (must-link)
    • Different-cluster constraint (cannot-link)

(Figure: example with k = 2.)

Gaussian Mixture Models and Expectation Maximization

Gaussian Mixture Models

• Recall the Gaussian distribution:
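The density formula was shown as an image; the standard multivariate Gaussian with mean µ and covariance Σ in d dimensions is:

```latex
\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2}\,\lvert \Sigma \rvert^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
```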

(GMM slides copyright © 2001, 2004, Andrew W. Moore.)

The GMM assumption

• There are k components. The i'th component is called ωi
• Component ωi has an associated mean vector µi
• Each component generates data from a Gaussian with mean µi and covariance matrix σ²I

Assume that each datapoint is generated according to the following recipe:
• Pick a component at random. Choose component i with probability P(ωi).
• Datapoint ~ N(µi, σ²I)

The General GMM assumption

• There are k components. The i'th component is called ωi
• Component ωi has an associated mean vector µi
• Each component generates data from a Gaussian with mean µi and covariance matrix Σi

Assume that each datapoint is generated according to the following recipe:
• Pick a component at random. Choose component i with probability P(ωi).
• Datapoint ~ N(µi, Σi)
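The generative recipe above can be sketched for the 1-D case (all names and parameter values here are illustrative, not from the slides):

```python
import random

def sample_gmm(weights, means, stds, n, seed=0):
    """Draw n points from a 1-D Gaussian mixture, following the recipe:
    pick component i with probability P(wi), then draw from N(mu_i, s_i^2)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # Pick a component at random according to the mixing weights
        i = rng.choices(range(len(weights)), weights=weights)[0]
        # Then sample from that component's Gaussian
        points.append(rng.gauss(means[i], stds[i]))
    return points

# Two well-separated components with equal weight
pts = sample_gmm([0.5, 0.5], [0.0, 10.0], [1.0, 1.0], n=200)
```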


Expectation-Maximization for GMMs

Iterate until convergence. On the t'th iteration let our estimates be
λt = { µ1(t), µ2(t), …, µc(t) }

E-step: Compute "expected" classes of all datapoints for each class
M-step: Estimate µ given our data's class membership distributions
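The E- and M-step formulas on this slide were images; reconstructed here for the simple case above (means only, with the mixing probabilities P(ωi) and spherical covariance σ²I held fixed, which is an assumption of this sketch):

```latex
\text{E-step:}\quad
P(\omega_i \mid x_k, \lambda_t)
  = \frac{P(\omega_i)\,\mathcal{N}\!\left(x_k;\; \mu_i(t),\; \sigma^2 I\right)}
         {\sum_{j=1}^{c} P(\omega_j)\,\mathcal{N}\!\left(x_k;\; \mu_j(t),\; \sigma^2 I\right)}

\text{M-step:}\quad
\mu_i(t+1)
  = \frac{\sum_{k} P(\omega_i \mid x_k, \lambda_t)\, x_k}
         {\sum_{k} P(\omega_i \mid x_k, \lambda_t)}
```

Each mean moves to the responsibility-weighted average of the data, with the responsibilities computed from the previous iteration's means.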


E.M. for General GMMs

Iterate. On the t'th iteration let our estimates be
λt = { µ1(t), µ2(t), …, µc(t), Σ1(t), Σ2(t), …, Σc(t), p1(t), p2(t), …, pc(t) }

E-step: Compute "expected" clusters of all datapoints
M-step: Estimate µ, Σ given our data's class membership distributions

(pi(t) is shorthand for the estimate of P(ωi) on the t'th iteration; R = # records. For the per-component likelihood, just evaluate a Gaussian at xk.)
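As a concrete sketch of these updates, here is a minimal EM loop for a 1-D mixture of c Gaussians (the quantile initialization and all names are choices of this sketch, not from the slides):

```python
import math

def gauss_pdf(x, mu, var):
    """Density of N(mu, var) at x -- 'just evaluate a Gaussian at x_k'."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_gmm(X, c, iters=50):
    """EM for a 1-D mixture of c Gaussians: estimates means mu,
    variances var, and mixing weights p (the p_i of the slide)."""
    Xs = sorted(X)
    R = len(X)  # R = number of records
    # Simple quantile initialization of the means (an illustrative choice)
    mu = [Xs[(2 * i + 1) * R // (2 * c)] for i in range(c)]
    var = [1.0] * c
    p = [1.0 / c] * c
    for _ in range(iters):
        # E-step: responsibilities r[k][i] = P(w_i | x_k, lambda_t)
        r = []
        for x in X:
            w = [p[i] * gauss_pdf(x, mu[i], var[i]) for i in range(c)]
            s = sum(w)
            r.append([wi / s for wi in w])
        # M-step: re-estimate weights, means, and variances
        for i in range(c):
            ni = sum(r[k][i] for k in range(R))
            p[i] = ni / R
            mu[i] = sum(r[k][i] * X[k] for k in range(R)) / ni
            var[i] = max(sum(r[k][i] * (X[k] - mu[i]) ** 2
                             for k in range(R)) / ni,
                         1e-6)  # guard against collapsing variance
    return mu, var, p

# Two well-separated 1-D clusters
X = [-0.5, -0.2, 0.0, 0.2, 0.5, 9.5, 9.8, 10.0, 10.2, 10.5]
mu, var, p = em_gmm(X, c=2)
```

On well-separated data like this the means settle near the two cluster centers within a few iterations; like k-means, EM only finds a local optimum of the likelihood.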


Gaussian Mixture Example: Start

(Advance apologies: in black and white this example will be incomprehensible.)

(Figure sequence: the mixture after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations.)

Some Bio Assay data

GMM clustering of the assay data

Resulting Density Estimator

Other Methods and Reading

Hidden Markov Models:
● Andrew Ng's Notes

Mean Shift:
● Mean Shift Tutorial

Deep Methods:
● Tutorial on Variational Autoencoders