Unsupervised Learning: K-Means, Gaussian Mixture Models
(Balloch, CS4641 Fall 2016, Georgia Institute of Technology)

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.



Unsupervised Learning

• Supervised learning used labeled data pairs (x, y) to learn a function f : X → Y
  – But what if we don't have labels? Most data doesn't!
• No labels = unsupervised learning
• Only some points are labeled = semi-supervised learning
  – Labels may be expensive to obtain, so we only get a few

Unsupervised Learning: Major Categories (and their uses)

• Clustering
• Density estimation
• Dimensionality reduction (feature selection)

Kernel Density Estimation

Some material adapted from slides by Le Song, GT.
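The kernel density estimation slides in this section are figures only; as a minimal sketch of the idea, here is a 1-D Gaussian-kernel density estimator (the function name and bandwidth value are illustrative choices, not from the slides):

```python
import math

def gaussian_kde(data, h):
    """Return a 1-D kernel density estimate p(x): the average of one
    Gaussian bump of bandwidth h centered on each data point."""
    n = len(data)
    def p(x):
        # (1 / (n * h * sqrt(2*pi))) * sum_i exp(-0.5 * ((x - x_i)/h)^2)
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2)
                   for xi in data) / (n * h * math.sqrt(2 * math.pi))
    return p

# Example: a density estimate from a few 1-D samples
samples = [1.0, 1.2, 0.8, 4.0, 4.1]
p = gaussian_kde(samples, h=0.5)
```

The estimate is high near the samples and low in the gap between the two groups; larger bandwidths h give smoother (but blurrier) estimates.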


K-Means Clustering

Some material adapted from slides by Andrew Moore, CMU. Visit http://www.autonlab.org/tutorials/ for Andrew's repository of Data Mining tutorials.

Clustering Data

K-Means Clustering

K-Means(k, X):
• Randomly choose k cluster center locations (centroids)
• Loop until convergence:
  – Assign each point to the cluster of the closest centroid
  – Re-estimate the cluster centroids based on the data assigned to each cluster
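The steps above can be sketched in plain Python (this is Lloyd's algorithm; the function names and the assignments-stopped-changing convergence test are choices of this sketch):

```python
import random

def kmeans(k, X, iters=100, seed=0):
    """K-Means(k, X): X is a list of equal-length tuples.
    Returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Randomly choose k cluster center locations (centroids)
    centroids = rng.sample(X, k)
    assign = None
    for _ in range(iters):
        # Assign each point to the cluster of the closest centroid
        new_assign = [min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(x, centroids[j])))
                      for x in X]
        if new_assign == assign:
            break  # converged: assignments stopped changing
        assign = new_assign
        # Re-estimate each centroid as the mean of its assigned points
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(vals) / len(members)
                                     for vals in zip(*members))
    return centroids, assign
```

On well-separated data this converges in a handful of iterations; as the later slides note, the result depends on the random initialization.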


K-Means Animation

Example generated by Andrew Moore using Dan Pelleg's super-duper fast K-means system:
Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases, 1999.


K-Means Objective Function

• K-means finds a local optimum of the following objective function:
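The objective itself appeared only as an image on the slide; the standard k-means objective (the within-cluster sum of squared distances) is:

```latex
J(C, \mu) \;=\; \sum_{i=1}^{k} \;\sum_{x \in C_i} \lVert x - \mu_i \rVert^2
```

Each iteration of the assign/re-estimate loop never increases J, which is why the algorithm converges, but only to a local optimum.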

Problems with K-Means

• Very sensitive to the initial points
  – Do many runs of K-Means, each with different initial centroids
  – Seed the centroids using a better method than randomly choosing the centroids
    • e.g., farthest-first sampling
• Must manually choose k
  – Learn the optimal k for the clustering
    • Note that this requires a performance measure

Choosing the key parameter k

1. The Elbow Method
   a. SSE = Σᵢ₌₁ᴷ Σ_{x ∈ cᵢ} dist(x, cᵢ)²
2. x-means clustering
   a. Repeatedly attempting subdivision
3. Others:
   a. Information Criterion Approach
   b. An Information Theoretic Approach
   c. The Silhouette Method
   d. Cross-validation
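For the elbow method, the SSE in 1a can be computed with a helper like the one below and plotted against k (the helper name and argument layout are illustrative):

```python
def sse(X, centroids, assign):
    """Sum of squared errors: total squared distance from each point to
    its assigned centroid. This is the quantity plotted against k in the
    elbow method."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
               for x, c in zip(X, assign))
```

SSE decreases monotonically as k grows (reaching 0 when every point is its own centroid), so you pick the k at the "elbow" where the marginal decrease flattens out.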

Problems with K-Means

• How do you tell it which clustering you want?
  – Constrained clustering techniques (semi-supervised):
    • Same-cluster constraint (must-link)
    • Different-cluster constraint (cannot-link)

(Figure: example with k = 2.)

Gaussian Mixture Models and Expectation Maximization

Gaussian Mixture Models

• Recall the Gaussian distribution:
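The density formula was shown as an image; the standard multivariate Gaussian with mean µ and covariance Σ in d dimensions is:

```latex
\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2}\,\lvert \Sigma \rvert^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
```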

(GMM slides copyright © 2001, 2004, Andrew W. Moore.)

The GMM assumption

• There are k components. The i'th component is called ωi
• Component ωi has an associated mean vector µi
• Each component generates data from a Gaussian with mean µi and covariance matrix σ²I

Assume that each datapoint is generated according to the following recipe:
• Pick a component at random. Choose component i with probability P(ωi).
• Datapoint ~ N(µi, σ²I)

The General GMM assumption

• There are k components. The i'th component is called ωi
• Component ωi has an associated mean vector µi
• Each component generates data from a Gaussian with mean µi and covariance matrix Σi

Assume that each datapoint is generated according to the following recipe:
• Pick a component at random. Choose component i with probability P(ωi).
• Datapoint ~ N(µi, Σi)
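The generative recipe above can be sketched for the 1-D case (all names and parameter values here are illustrative, not from the slides):

```python
import random

def sample_gmm(weights, means, stds, n, seed=0):
    """Draw n points from a 1-D Gaussian mixture, following the recipe:
    pick component i with probability P(wi), then draw from N(mu_i, s_i^2)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # Pick a component at random according to the mixing weights
        i = rng.choices(range(len(weights)), weights=weights)[0]
        # Then sample from that component's Gaussian
        points.append(rng.gauss(means[i], stds[i]))
    return points

# Two well-separated components with equal weight
pts = sample_gmm([0.5, 0.5], [0.0, 10.0], [1.0, 1.0], n=200)
```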


Expectation-Maximization for GMMs

Iterate until convergence. On the t'th iteration let our estimates be
λt = { µ1(t), µ2(t), …, µc(t) }

E-step: Compute "expected" classes of all datapoints for each class
M-step: Estimate µ given our data's class membership distributions
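The E- and M-step formulas on this slide were images; reconstructed here for the simple case above (means only, with the mixing probabilities P(ωi) and spherical covariance σ²I held fixed, which is an assumption of this sketch):

```latex
\text{E-step:}\quad
P(\omega_i \mid x_k, \lambda_t)
  = \frac{P(\omega_i)\,\mathcal{N}\!\left(x_k;\; \mu_i(t),\; \sigma^2 I\right)}
         {\sum_{j=1}^{c} P(\omega_j)\,\mathcal{N}\!\left(x_k;\; \mu_j(t),\; \sigma^2 I\right)}

\text{M-step:}\quad
\mu_i(t+1)
  = \frac{\sum_{k} P(\omega_i \mid x_k, \lambda_t)\, x_k}
         {\sum_{k} P(\omega_i \mid x_k, \lambda_t)}
```

Each mean moves to the responsibility-weighted average of the data, with the responsibilities computed from the previous iteration's means.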


E.M. for General GMMs

Iterate. On the t'th iteration let our estimates be
λt = { µ1(t), µ2(t), …, µc(t), Σ1(t), Σ2(t), …, Σc(t), p1(t), p2(t), …, pc(t) }

E-step: Compute "expected" clusters of all datapoints
M-step: Estimate µ, Σ given our data's class membership distributions

(pi(t) is shorthand for the estimate of P(ωi) on the t'th iteration; R = # records. For the per-component likelihood, just evaluate a Gaussian at xk.)
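As a concrete sketch of these updates, here is a minimal EM loop for a 1-D mixture of c Gaussians (the quantile initialization and all names are choices of this sketch, not from the slides):

```python
import math

def gauss_pdf(x, mu, var):
    """Density of N(mu, var) at x -- 'just evaluate a Gaussian at x_k'."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_gmm(X, c, iters=50):
    """EM for a 1-D mixture of c Gaussians: estimates means mu,
    variances var, and mixing weights p (the p_i of the slide)."""
    Xs = sorted(X)
    R = len(X)  # R = number of records
    # Simple quantile initialization of the means (an illustrative choice)
    mu = [Xs[(2 * i + 1) * R // (2 * c)] for i in range(c)]
    var = [1.0] * c
    p = [1.0 / c] * c
    for _ in range(iters):
        # E-step: responsibilities r[k][i] = P(w_i | x_k, lambda_t)
        r = []
        for x in X:
            w = [p[i] * gauss_pdf(x, mu[i], var[i]) for i in range(c)]
            s = sum(w)
            r.append([wi / s for wi in w])
        # M-step: re-estimate weights, means, and variances
        for i in range(c):
            ni = sum(r[k][i] for k in range(R))
            p[i] = ni / R
            mu[i] = sum(r[k][i] * X[k] for k in range(R)) / ni
            var[i] = max(sum(r[k][i] * (X[k] - mu[i]) ** 2
                             for k in range(R)) / ni,
                         1e-6)  # guard against collapsing variance
    return mu, var, p

# Two well-separated 1-D clusters
X = [-0.5, -0.2, 0.0, 0.2, 0.5, 9.5, 9.8, 10.0, 10.2, 10.5]
mu, var, p = em_gmm(X, c=2)
```

On well-separated data like this the means settle near the two cluster centers within a few iterations; like k-means, EM only finds a local optimum of the likelihood.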


Gaussian Mixture Example: Start

(Advance apologies: in black and white this example will be incomprehensible.)

(Figure sequence: the mixture after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations.)

Some Bio Assay data

GMM clustering of the assay data

Resulting Density Estimator

Other Methods and Reading

Hidden Markov Models:
● Andrew Ng's Notes

Mean Shift:
● Mean Shift Tutorial

Deep Methods:
● Tutorial on Variational Autoencoders