Posted on 27-Jun-2018
Unsupervised Learning: K-Means, Gaussian Mixture Models
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others
who made their course materials freely available online. Feel free to reuse or adapt these slides for
your own academic purposes, provided that you include proper attribution. Please send comments
and corrections to Eric.
Unsupervised Learning
• Supervised learning used labeled data pairs (x, y) to learn a function f : X → Y
  – But, what if we don't have labels? Most data doesn't!
• No labels = unsupervised learning
• Only some points are labeled = semi-supervised learning
  – Labels may be expensive to obtain, so we only get a few
Unsupervised Learning: Major Categories (and their uses)
• Clustering
• Density estimation
• Dimensionality reduction (feature selection)
K-Means Clustering
Some material adapted from slides by Andrew Moore, CMU.
Visit http://www.autonlab.org/tutorials/ for Andrew's repository of Data Mining tutorials.
K-Means Clustering
K-Means(k, X)
• Randomly choose k cluster center locations (centroids)
• Loop until convergence:
  – Assign each point to the cluster of the closest centroid
  – Re-estimate the cluster centroids based on the data assigned to each cluster
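The K-Means(k, X) pseudocode above fits in a few lines of NumPy. This is a minimal sketch for illustration, not code from the slides; the optional `init` argument and the keep-old-centroid fallback for empty clusters are my own additions.

```python
import numpy as np

def kmeans(X, k, init=None, n_iters=100, seed=0):
    """Minimal Lloyd's-algorithm sketch of K-Means(k, X)."""
    rng = np.random.default_rng(seed)
    # Randomly choose k cluster center locations (centroids) from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)] if init is None else init
    for _ in range(n_iters):
        # Assign each point to the cluster of the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate the centroids from the data assigned to each cluster
        # (an empty cluster simply keeps its old centroid in this sketch)
        new_centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):   # converged
            break
        centroids = new_centroids
    return centroids, labels
```

On two well-separated blobs this recovers one centroid per blob, but as discussed later in these slides, the result depends strongly on the initial centroids.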
K-Means Animation
Example generated by Andrew Moore using Dan Pelleg's super-duper fast K-means system:
Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases, 1999.
Problems with K-Means
• Very sensitive to the initial points
  – Do many runs of K-Means, each with different initial centroids
  – Seed the centroids using a better method than randomly choosing them
    • e.g., farthest-first sampling
• Must manually choose k
  – Learn the optimal k for the clustering
    • Note that this requires a performance measure
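Farthest-first sampling, mentioned above, can be sketched as follows: pick one point at random, then repeatedly add the point whose distance to its nearest already-chosen centroid is largest. A hypothetical helper for illustration, not code from the slides:

```python
import numpy as np

def farthest_first(X, k, seed=0):
    """Seed k centroids by farthest-first sampling."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]        # start from one random point
    for _ in range(k - 1):
        # distance from each point to its nearest already-chosen centroid
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])          # add the farthest point
    return np.array(centroids)
```

Because each new seed is maximally far from the existing ones, well-separated clusters each tend to receive a seed; the method is, however, sensitive to outliers.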
Choosing the key parameter k
1. The Elbow Method
   a. SSE = Σᵢ₌₁ᴷ Σ_{x∈Cᵢ} dist(x, cᵢ)²
2. x-means clustering
   a. Repeatedly attempt subdivision of existing clusters
3. Others:
   a. Information Criterion Approach
   b. An Information Theoretic Approach
   c. The Silhouette Method
   d. Cross-validation
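The elbow method can be sketched by computing SSE for a range of k: SSE always decreases as k grows, but the drop flattens sharply past the "true" number of clusters. The restart loop reflects the earlier advice to run k-means many times with different initial centroids; the data and parameter values here are purely illustrative.

```python
import numpy as np

def kmeans_sse(X, k, n_iters=30, seed=0):
    """One k-means run; returns SSE = sum_i sum_{x in C_i} dist(x, c_i)^2."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) if (labels == j).any() else C[j]
                      for j in range(k)])
    return float(((X - C[labels]) ** 2).sum())

# Three well-separated blobs; the elbow should appear at k = 3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(40, 2)) for m in (0.0, 4.0, 8.0)])
# Take the best of 20 restarts per k, since k-means is init-sensitive
sse_by_k = {k: min(kmeans_sse(X, k, seed=s) for s in range(20))
            for k in range(1, 7)}
```

On data like this, the SSE drop from k = 2 to k = 3 should dwarf the drop from 3 to 4; that bend in the curve is the "elbow".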
Problems with K-Means
• How do you tell it which clustering you want (e.g., which of several valid clusterings with k = 2)?
  – Constrained clustering techniques (semi-supervised):
    • Same-cluster constraint (must-link)
    • Different-cluster constraint (cannot-link)
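One way to honor such constraints, in the spirit of COP-KMeans (Wagstaff et al.), is to modify the assignment step: each point takes the nearest centroid that violates no must-link or cannot-link constraint with points assigned so far. This is a simplified, order-dependent sketch, not the published algorithm; `cop_kmeans` and its fallback behavior are my own illustration.

```python
import numpy as np

def violates(n, j, labels, must_link, cannot_link):
    """Would putting point n in cluster j break a constraint with an assigned point?"""
    for a, b in must_link:
        if n in (a, b):
            other = b if n == a else a
            if labels[other] not in (-1, j):   # linked point is elsewhere
                return True
    for a, b in cannot_link:
        if n in (a, b) and labels[b if n == a else a] == j:
            return True
    return False

def cop_kmeans(X, k, must_link=(), cannot_link=(), n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iters):
        labels[:] = -1
        for n in range(len(X)):
            # try centroids nearest-first; take the first legal one
            for j in np.argsort(np.linalg.norm(C - X[n], axis=1)):
                if not violates(n, j, labels, must_link, cannot_link):
                    labels[n] = j
                    break
            if labels[n] == -1:        # no legal cluster: COP-KMeans would abort;
                labels[n] = 0          # this sketch just falls back arbitrarily
        C = np.array([X[labels == j].mean(axis=0) if (labels == j).any() else C[j]
                      for j in range(k)])
    return C, labels
```

A cannot-link pair ends up in different clusters even when the two points are near neighbors, and a must-link pair stays together even across a large gap.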
Clustering with Gaussian Mixtures (Copyright © 2001, 2004, Andrew W. Moore)

The GMM assumption
• There are k components. The i'th component is called ωi.
• Component ωi has an associated mean vector µi.
• Each component generates data from a Gaussian with mean µi and covariance matrix σ²I.
Assume that each datapoint is generated according to the following recipe:
• Pick a component at random. Choose component i with probability P(ωi).
• Datapoint ~ N(µi, σ²I)
[Figure: datapoints generated around the three component means µ1, µ2, µ3]
The General GMM assumption
[Figure: three Gaussian components with means µ1, µ2, µ3 and distinct covariances]
• There are k components. The i’th component is called ωi
• Component ωi has an associated mean vector µi
• Each component generates data from a Gaussian with mean µi and covariance matrix Σi
Assume that each datapoint is generated according to the following recipe:
• Pick a component at random. Choose component i with probability P(ωi).
• Datapoint ~ N(µi , Σi )
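The generative recipe above is easy to simulate. A sketch with made-up priors, means, and covariances (all values purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
priors = np.array([0.5, 0.3, 0.2])                      # P(ω1), P(ω2), P(ω3)
means = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])  # µ1, µ2, µ3
covs = np.stack([0.5 * np.eye(2),                       # Σ1, Σ2, Σ3: each component
                 0.8 * np.eye(2),                       # may have its own covariance
                 np.array([[1.0, 0.3], [0.3, 1.0]])])

def sample_gmm(n):
    # Pick a component at random: component i with probability P(ωi)
    z = rng.choice(len(priors), size=n, p=priors)
    # Datapoint ~ N(µi, Σi)
    X = np.array([rng.multivariate_normal(means[i], covs[i]) for i in z])
    return X, z

X, z = sample_gmm(1000)
```

Fitting a GMM is the inverse problem: recover the priors, means, and covariances from X alone, which is what EM does.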
Expectation-Maximization for GMMs
Iterate until convergence. On the t'th iteration let our estimates be λt = { μ1(t), μ2(t), …, μc(t) }
E-step: Compute "expected" classes of all datapoints for each class (the class-conditional term: just evaluate a Gaussian at xk):
  P(ωi | xk, λt) = p(xk | ωi, λt) P(ωi) / Σj p(xk | ωj, λt) P(ωj)
M-step: Estimate μ given our data's class membership distributions:
  μi(t+1) = Σk P(ωi | xk, λt) xk / Σk P(ωi | xk, λt)
E.M. for General GMMs
Iterate. On the t'th iteration let our estimates be
λt = { μ1(t) … μc(t), Σ1(t) … Σc(t), p1(t) … pc(t) }
where pi(t) is shorthand for the estimate of P(ωi) on the t'th iteration and R = #records.
E-step: Compute "expected" clusters of all datapoints (the class-conditional term: just evaluate a Gaussian at xk):
  P(ωi | xk, λt) = pi(t) N(xk; μi(t), Σi(t)) / Σj pj(t) N(xk; μj(t), Σj(t))
M-step: Estimate μ, Σ given our data's class membership distributions:
  μi(t+1) = Σk P(ωi | xk, λt) xk / Σk P(ωi | xk, λt)
  Σi(t+1) = Σk P(ωi | xk, λt) (xk − μi(t+1))(xk − μi(t+1))ᵀ / Σk P(ωi | xk, λt)
  pi(t+1) = (1/R) Σk P(ωi | xk, λt)
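The E and M steps fit in a short NumPy sketch. This is an illustration of the updates, not Moore's code; the farthest-first seeding and the tiny covariance regularizer are my own additions for stability.

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """'Just evaluate a Gaussian at xk': N(x; mu, Sigma) for each row of X."""
    d = X.shape[1]
    diff = X - mu
    quad = np.einsum('nd,de,ne->n', diff, np.linalg.inv(Sigma), diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def em_gmm(X, k, n_iters=50):
    R, d = X.shape                               # R = #records
    mu = [X[0]]                                  # farthest-first seeding
    for _ in range(k - 1):
        dist = np.min([np.linalg.norm(X - m, axis=1) for m in mu], axis=0)
        mu.append(X[dist.argmax()])
    mu = np.array(mu)
    Sigma = np.array([np.eye(d) for _ in range(k)])
    p = np.full(k, 1.0 / k)                      # p_i = estimate of P(ωi)
    for _ in range(n_iters):
        # E-step: P(ωi | xk, λt) ∝ p_i(t) N(xk; µi(t), Σi(t))
        lik = np.stack([p[i] * gaussian_pdf(X, mu[i], Sigma[i])
                        for i in range(k)], axis=1)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate µ, Σ, p from the membership distributions
        Nk = resp.sum(axis=0)
        mu = (resp.T @ X) / Nk[:, None]
        for i in range(k):
            diff = X - mu[i]
            Sigma[i] = ((resp[:, i, None] * diff).T @ diff) / Nk[i] \
                       + 1e-6 * np.eye(d)        # tiny regularizer
        p = Nk / R
    return mu, Sigma, p
```

On two well-separated blobs this recovers the component means and roughly equal priors.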
Gaussian Mixture Example: Start
(Advance apologies: in black and white this example will be incomprehensible.)
[Figures: the fitted mixture after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations]
[Figure: Some Bio Assay data]
[Figure: GMM clustering of the assay data]
[Figure: Resulting Density Estimator]