UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16....

60
UVA CS 6316: Machine Learning Lecture 19c: Unsupervised Clustering (III): Gaussian Mixture Model Dr. Yanjun Qi University of Virginia Department of Computer Science

Transcript of UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16....

Page 1: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

UVA CS 6316: Machine Learning

Lecture 19c: Unsupervised Clustering (III): Gaussian Mixture Model

Dr. Yanjun Qi

University of Virginia Department of Computer Science

Page 2: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Course Content Plan èSix major sections of this course

q Regression (supervised)q Classification (supervised)

qFeature Selection

q Unsupervised models q Dimension Reduction (PCA)q Clustering (K-means, GMM/EM, Hierarchical )

q Learning theory

q Graphical models

q Reinforcement Learning 11/6/19 Dr. Yanjun Qi / UVA CS

2

Y is a continuous

Y is a discrete

NO Y

About f()

About interactions among X1,… Xp

Learn program to Interact with its environment

Page 3: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

An unlabeled Dataset X

• Data/points/instances/examples/samples/records: [ rows ]• Features/attributes/dimensions/independent variables/covariates/predictors/regressors: [ columns]

11/25/19 3

a data matrix of n observations on p variables x1,x2,…xp

Unsupervised learning = learning from raw (unlabeled, unannotated, etc) data, as opposed to supervised data where a classification label of examples is given

Dr. Yanjun Qi / UVA CS

Page 4: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

• Find groups (clusters) of data points such that data points in a group will be similar (or related) to one another and different from (or unrelated to) the data points in other groups

What is clustering?

Inter-cluster distances are maximized

Intra-cluster distances are

minimized

11/25/19 4

Dr. Yanjun Qi / UVA CS

Page 5: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

Roadmap: clustering

§ Definition of "groupness”§ Definition of "similarity/distance"§ Representation for objects§ How many clusters?§ Clustering Algorithms

§ Partitional algorithms§ Hierarchical algorithms

§ Formal foundation and convergence5

Dr. Yanjun Qi / UVA CS

Page 6: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

6

Clustering Algorithms

• Partitional algorithms– Usually start with a random

(partial) partitioning– Refine it iteratively

• K means clustering• Mixture-Model based clustering

• Hierarchical algorithms– Bottom-up, agglomerative– Top-down, divisive

Dr. Yanjun Qi / UVA CS

Page 7: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

(2) Partitional Clustering

• Nonhierarchical• Construct a partition of n objects into a set of

K clusters• User has to specify the desired number of

clusters K.

11/25/19 7

Dr. Yanjun Qi / UVA CS

Page 8: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

8

Other partitioning Methods• Partitioning around medoids (PAM): instead of averages,

use multidim medians as centroids (cluster “prototypes”). Dudoit and Freedland (2002).

• Self-organizing maps (SOM): add an underlying “topology”(neighboring structure on a lattice) that relates cluster centroids to one another. Kohonen (1997), Tamayo et al. (1999).

• Fuzzy k-means: allow for a “gradation” of points between clusters; soft partitions. Gash and Eisen (2002).

• Mixture-based clustering: implemented through an EM (Expectation-Maximization)algorithm. This provides softpartitioning, and allows for modeling of cluster centroids and shapes. (Yeung et al. (2001), McLachlan et al. (2002))

Dr. Yanjun Qi / UVA CS

Page 9: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

9

Other partitioning Methods• Partitioning around medoids (PAM): instead of averages,

use multidim medians as centroids (cluster “prototypes”). Dudoit and Freedland (2002).

• Self-organizing maps (SOM): add an underlying “topology”(neighboring structure on a lattice) that relates cluster centroids to one another. Kohonen (1997), Tamayo et al. (1999).

• Fuzzy k-means: allow for a “gradation” of points between clusters; soft partitions. Gash and Eisen (2002).

• Mixture-based clustering: implemented through an EM (Expectation-Maximization)algorithm. This provides softpartitioning, and allows for modeling of cluster centroids and shapes. (Yeung et al. (2001), McLachlan et al. (2002))

Dr. Yanjun Qi / UVA CS

Page 10: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Islands of music (Pampalk et al., KDD’ 03)

E.g.: SOM Used for Visualization

11/25/19 10

Dr. Yanjun Qi / UVA CS 6316 / f16

Page 11: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

11

Other partitioning Methods• Partitioning around medoids (PAM): instead of averages,

use multidim medians as centroids (cluster “prototypes”). Dudoit and Freedland (2002).

• Self-organizing maps (SOM): add an underlying “topology”(neighboring structure on a lattice) that relates cluster centroids to one another. Kohonen (1997), Tamayo et al. (1999).

• Fuzzy k-means: allow for a “gradation” of points between clusters; soft partitions. Gash and Eisen (2002).

• Mixture-based clustering: implemented through an EM (Expectation-Maximization)algorithm. This provides softpartitioning, and allows for modeling of cluster centroids and shapes. (Yeung et al. (2001), McLachlan et al. (2002))

Dr. Yanjun Qi / UVA CS

Page 12: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

12

Other partitioning Methods• Partitioning around medoids (PAM): instead of averages,

use multidim medians as centroids (cluster “prototypes”). Dudoit and Freedland (2002).

• Self-organizing maps (SOM): add an underlying “topology”(neighboring structure on a lattice) that relates cluster centroids to one another. Kohonen (1997), Tamayo et al. (1999).

• Fuzzy k-means: allow for a “gradation” of points between clusters; soft partitions. Gash and Eisen (2002).

• Mixture-based clustering: implemented through an EM (Expectation-Maximization)algorithm. This provides softpartitioning, and allows for modeling of cluster centroids and shapes. (Yeung et al. (2001), McLachlan et al. (2002))

Dr. Yanjun Qi / UVA CS

Page 13: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Partitional : Gaussian Mixture Model

• 1. Review of Gaussian Distribution • 2. GMM for clustering : basic algorithm• 3. GMM connecting to K-means• 4. Problems of GMM and K-means

11/25/19

Dr. Yanjun Qi / UVA CS

13

Page 14: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

A Gaussian Mixture Model for Clustering

• Assume that data are generated from a mixture of Gaussian distributions

• For each Gaussian distribution– Center: j

– covariance: j

• For each data point– Determine membership

: if belongs to j-th clusterij iz x

11/25/19 14

Dr. Yanjun Qi / UVA CS

⌃µ

Page 15: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Gaussian Distribution

Courtesy: http://research.microsoft.com/~cmbishop/PRML/index.htm

Covariance MatrixMean11/25/19

Dr. Yanjun Qi / UVA CS

15

( )2~ ,X N µ s

Page 16: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Example: the Bivariate Normal distribution

p( !x) = f x1,x2( ) = 1

2π( ) Σ 1/2 e−1

2!x−!µ( )T Σ−1 !x−

!µ( )

with

!µ =

µ1

µ2

⎣⎢⎢

⎦⎥⎥

22 1 2

22 22 22 2 2

s s s rs ss s rs s s11 1 1

´1 1

é ùé ùS = = ê úê ú

ë û ë û

and

( )2 2 2 222 12 1 2 1s s s s s r11S = - = -

11/25/19

Dr. Yanjun Qi / UVA CS

16

Page 17: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Scatter Plots of data from the bivariate Normal distribution

Dr. Yanjun Qi / UVA CS

Page 18: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

How to Estimate Gaussian: MLE

µ = 1

n ixi=1

n

• In the 1D Gaussian case, we simply set the mean and the variance to the sample mean and the sample variance:

2

σ = 1n

2

(Xi−µ)i=1

n

Dr. Yanjun Qi / UVA CS

Page 19: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Dr. Yanjun Qi / UVA CS

< X1, X2!, X p >~ N µ

"#,Σ( )

The p-multivariate Normal distribution

Page 20: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Partitional : Gaussian Mixture Model

• 1. Review of Gaussian Distribution • 2. GMM for clustering : basic algorithm• 3. GMM connecting to K-means• 4. Problems of GMM and K-means

11/25/19

Dr. Yanjun Qi / UVA CS

20

Page 21: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Application: Three Speaker Recognition Tasks

MIT Lincoln Laboratory7

Speaker Recognition Tasks

?

?

?

?

Whose voice is this?Whose voice is this??

?

?

?

Whose voice is this?Whose voice is this?

Identification

?

Is this Bob’s voice?Is this Bob’s voice?

?

Is this Bob’s voice?Is this Bob’s voice?

Verification/Authentication/Detection

Speaker B

Speaker A

Which segments are from the same speaker?Which segments are from the same speaker?

Where are speaker changes?Where are speaker changes?

Speaker B

Speaker A

Which segments are from the same speaker?Which segments are from the same speaker?

Where are speaker changes?Where are speaker changes?

Segmentation and Clustering (Diarization)

slide from Douglas Reynolds11/25/19

Dr. Yanjun Qi / UVA CS

21

Page 22: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Application : GMMs for speaker recognition

• A Gaussian mixture model (GMM) represents as the weighted sum of multiple Gaussian distributions

• Each Gaussian state i has a– Mean – Covariance– Weight

Dr. Yanjun Qi / UVA CS

Dim 1Dim 2

Model j

p(x | j)

µ j

Σ j

wj ≡ p(µ = µ j )

Page 23: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Recognition SystemsGaussian Mixture Models

Dr. Yanjun Qi / UVA CS

Parameters µ j

Σ j

wj

Dim 1Dim 2

( )p x

11/25/19 23

Page 24: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Recognition SystemsGaussian Mixture Models

Dr. Yanjun Qi / UVA CS

Model Components

Parameters

Dim 1Dim 2

( )p x

11/25/19 24

j = 1,..., K

Page 25: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Learning a Gaussian Mixture• Probability Model

p( !x = !xi )

= p( !x = !xi ,!µ =!µ j )

j∑

= p(!µ =!µ j ) p( !x = !xi |

!µ =!µ j )

j∑

= p(!µ =!µ j )

1

2π( )p/2Σ j

1/2 e−1

2!x−!µ j( )T Σ j

−1 !x−!µ j( )

j∑

11/25/19 25

Dr. Yanjun Qi / UVA CS

Total low of probability

Chain rule

A Gaussian mixture model (GMM) represents as the weighted sum of multiple Gaussian distributions

Page 26: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Max Log-likelihood of Observed Data Samples

Dr. Yanjun Qi / UVA CS

o Log-likelihood of data

Apply MLE to find

optimal Gaussian parameters

{p(!µ = µ j )}, j = 1...K{ }

{!µ j ,Σ j , j = 1...K}

26

logp(x1, x2, x3, ..., xn) =

logi=1..n∏ p(

!µ =!µ j )

1

2π( )p/2Σ j

1/2e−1

2!xi−!µ j( )T Σ j

−1 !xi−!µ j( )

j=1..K∑

Page 27: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

27

Expectation-Maximization for training GMM

• Start: – "Guess" the centroid and covariance for each of the K

clusters – “Guess” the proportion of clusters, e.g., uniform prob 1/K

• Loop– For each point, revising its proportions belonging to each

of the K clusters – For each cluster, revising both the mean (centroid

position) and covariance (shape)

Dr. Yanjun Qi / UVA CS

Page 28: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

Dr. Yanjun Qi / UVA CS

28

each cluster, revising both the mean (centroid position) and covariance (shape)

Page 29: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Another Gaussian Mixture Example: Start

11/25/19 29

Dr. Yanjun Qi / UVA CS

Page 30: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Another GMM Example: After First Iteration

11/25/19 30

Dr. Yanjun Qi / UVA CS

For each cluster, revising its mean (centroid position), covariance (shape) and proportion in the mixture

For each point, revising its proportions belonging to each of the K clusters

Page 31: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Another GMM Example: After 2nd Iteration

11/25/19 31

Dr. Yanjun Qi / UVA CS

For each point, revising its proportions belonging to each of the K clusters

For each cluster, revising its mean (centroid position), covariance (shape) and proportion in the mixture

Page 32: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

After 3rd Iteration

11/25/19 32

Dr. Yanjun Qi / UVA CS

For each point, revising its proportions belonging to each of the K clusters

For each cluster, revising its mean (centroid position), covariance (shape) and proportion in the mixture

Page 33: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

After 4th Iteration

11/25/19 33

Dr. Yanjun Qi / UVA CS

For each point, revising its proportions belonging to each of the K clusters

For each cluster, revising its mean (centroid position), covariance (shape) and proportion in the mixture

Page 34: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

After 5th Iteration

11/25/19 34

Dr. Yanjun Qi / UVA CS

For each point, revising its proportions belonging to each of the K clusters

For each cluster, revising its mean (centroid position), covariance (shape) and proportion in the mixture

Page 35: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

After 6th Iteration

11/25/19 35

Dr. Yanjun Qi / UVA CS

For each point, revising its proportions belonging to each of the K clusters

For each cluster, revising its mean (centroid position), covariance (shape) and proportion in the mixture

Page 36: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Another GMM Example: After 20th Iteration

11/25/19 36

Dr. Yanjun Qi / UVA CS

For each point, revising its proportions belonging to each of the K clusters

For each cluster, revising its mean (centroid position), covariance (shape) and proportion in the mixture

Page 37: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Recap: Gaussian Mixture Models

Dr. Yanjun Qi / UVA CS

Model Components

Parameters

Dim 1Dim 2

( )p x

11/25/19 37

j = 1,..., K

Page 38: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Dr. Yanjun Qi / UVA CS

The Simplest GMM assumption

• Each component generates data from a Gaussian with • mean μi

• Shared diagonal covariance matrix σ2I

µ1

µ2

µ3

11/25/19 38

Page 39: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Dr. Yanjun Qi / UVA CS

Another Simple GMM assumption

• Each component generates data from a Gaussian with • mean μi

• Shared covariance matrix as diagonal matrix

µ1

µ2

µ3

11/25/19 39

Page 40: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Dr. Yanjun Qi / UVA CS

The General GMM assumption

µ1

µ2

µ3

• Each component generates data from a Gaussian with • mean μi

• covariance matrix Σi

11/25/19 40

Page 41: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Dr. Yanjun Qi / UVA CS

Another Simple GMM assumption

• Each component generates data from a Gaussian with • mean μi

• Cluster-specific diagonal covariance matrix as σj2I

µ1

µ2

µ3

σj2I11/25/19 41

Page 42: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Dr. Yanjun Qi / UVA CS

A bit More General GMM assumption

• Each component generates data from a Gaussian with • mean μi

• Shared covariance matrix as full matrix

µ1

µ2

µ3

11/25/19 42

Page 43: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Concrete Equations for Learning a Gaussian Mixture(when assuming with known shared covariance)

11/25/19 43

Dr. Yanjun Qi / UVA CS

Assuming Known and Shared

p( !x = !xi )

= p( !x = !xi ,!µ =!µ j )

µ j

= p(!µ =!µ j ) p( !x = !xi |

!µ =!µ j )

j∑

= p(!µ =!µ j )

1

2π( )p/2Σ

1/2 e−1

2!xi−!µ j( )T Σ−1 !xi−

!µ j( )

j∑

when assuming with known shared covariance

Page 44: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Learning a Gaussian Mixture(when assuming with known shared covariance)

=

1

2π( )p/2Σ

1/2 e−1

2!xi−!µ j( )T Σ−1 !xi−

!µ j( ) p(µ = µ j )

1

2π( )p/2Σ

1/2 e−1

2!xi−!µs( )T Σ−1 !xi−

!µs( )

p(µ = µs )s=1

k

E[zij ] = p(

!µ = µ j | x = xi )E-Step

11/25/19 44

Dr. Yanjun Qi / UVA CS

=p(x = xi |µ = µ j ) p(µ = µ j )

p(x = xi |µ = µs ) p(µ = µs )s=1

k

Page 45: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

E-step (vs. Assignment Step in K-means)

=

1

2π( )p/2Σ

1/2 e−1

2!x−!µ j( )T Σ−1 !x−

!µ j( ) p(µ = µ j )

1

2π( )p/2Σ

1/2 e−1

2!x−!µs( )T Σ−1 !x−

!µs( )

p(µ = µs )s=1

k

[ ] ( | )ij j iE z p x xµ µ= = =E-Step

11/25/19 45

Dr. Yanjun Qi / UVA CS

=p(x = xi |µ = µ j ) p(µ = µ j )

p(x = xi |µ = µs ) p(µ = µs )s=1

k

when assuming with known shared covariance

Page 46: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Learning a Gaussian Mixture

µ j ←1

E[zij ]i=1

n

∑E[zij ]xi

i=1

n

∑M-Step

p(µ = µ j )←

1n

E[zij ]i=1

n

11/25/19 46

Dr. Yanjun Qi / UVA CS

Covariance: j (j: 1 to K) can also be derived in the M-step under a full setting

when assuming with known shared covariance

Page 47: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

M-step (vs. Centroid Step in K-means)

µ j ←1

E[zij ]i=1

n

∑E[zij ]xi

i=1

n

∑M-Step

p(µ = µ j )←

1n

E[zij ]i=1

n

11/25/19 47

Dr. Yanjun Qi / UVA CS

Covariance: j (j: 1 to K) will also be derived in the M-step under a full setting

when assuming with known shared covariance

Page 48: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

Dr. Yanjun Qi / UVA CS

48

M-step for Estimating unknown Covariance Matrix

(more general, details in EM-Extra lecture)

!!

!Σ j(t+1) = i=1

n E[zij ](t )(xi − µ j(t+1))(xi − µ j

(t+1))T∑E[zij ](t )

i=1

n

Page 49: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

49

Recap: Expectation-Maximization for training GMM

• Start: – "Guess" the centroid and covariance for each of the K

clusters – “Guess” the proportion of clusters, e.g., uniform prob 1/K

• Loop– For each point, revising its proportions belonging to each

of the K clusters – For each cluster, revising both the mean (centroid

position) and covariance (shape)

Dr. Yanjun Qi / UVA CS

Page 50: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Partitional : Gaussian Mixture Model

• 1. Review of Gaussian Distribution • 2. GMM for clustering : basic algorithm• 3. GMM connecting to K-means• 4. Problems of GMM and K-means

11/25/19

Dr. Yanjun Qi / UVA CS

50

Page 51: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Recap: K-means iterative learning

argmin!C j ,mi , j{ }

mi, j!xi −!C j( )2

i=1

n

∑j=1

K

{ } { },Memberships and centers are correlated.i j jm C

Given memberships mi, j{ }, !C j =

mi, j!xi

i=1

n

mi, ji=1

n

Given centers {!C j}, mi, j =

1 j = argmink

(!xi −!C j )

2

0 otherwise

⎧⎨⎪

⎩⎪

M-Step

E-Step

11/25/19 51

Dr. Yanjun Qi / UVA CS

Page 52: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Dr. Yanjun Qi / UVA CS

52

Compare: K-means

• The EM algorithm for mixtures of Gaussians is like a "soft version" of the K-means algorithm.

• In the K-means “E-step” we do hard assignment:• In the K-means “M-step” we update the means

as the weighted sum of the data, but now the weights are 0 or 1:

11/25/19

Page 53: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

11/25/19

Dr. Yanjun Qi / UVA CS

53

argmin!C j ,mi , j{ }

mi, j!xi −!C j( )2

i=1

n

∑j=1

K

log p(x = xi )i=1

n

∏i∑ = log p(µ = µ j )

1

2π( ) Σ 1/2e−1

2!x−!µ j( )T Σ−1 !x−

!µ j( )

µ j

∑⎡

⎢⎢

⎥⎥i

• K-Mean only detect spherical clusters.• GMM can adjust its self to elliptic shape clusters.

Page 54: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

(3) GMM Clustering

Clustering

Likelihood

EM algorithm

Each point’s soft membership & mean

/ covariance per cluster

Task

Representation

Score Function

Search/Optimization

Models, Parameters

11/25/19 54

Mixture of Gaussian

Dr. Yanjun Qi / UVA CS

log p(x = xi )i=1

n

∏i∑ = log p(µ = µ j )

1

2π( ) Σ j

1/2e−1

2!x−!µ j( )T Σ j

−1 !x−!µ j( )

µ j

∑⎡

⎢⎢⎢

⎥⎥⎥i

Page 55: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Partitional : Gaussian Mixture Model

• 1. Review of Gaussian Distribution • 2. GMM for clustering : basic algorithm• 3. GMM connecting to K-means• 4. Problems of GMM and K-means

11/25/19

Dr. Yanjun Qi / UVA CS

55

Page 56: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Dr. Yanjun Qi / UVA CS

Unsupervised Learning:not as hard as it looks

Sometimes easy

Sometimes impossible

and sometimes in between

Page 57: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Problems (I)

• Both k-means and mixture models need to compute centers of clusters and explicit distance measurement– Given strange distance measurement, the center of clusters

can be hard to computeE.g.,

!x − !x '

∞= max x1 − x1

' , x2 − x2' ,..., xp − xp

'( )x y

z

¥ ¥- = -x y x z

11/25/19

Dr. Yanjun Qi / UVA CS

57

Page 58: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

Problem (II)

• Both k-means and mixture models look for compact clustering structures– In some cases, connected clustering structures are more desirable

Graph based clustering

e.g. MinCut, Spectral

clustering

11/25/19 58

Dr. Yanjun Qi / UVA CS

Page 59: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

e.g. Image Segmentation through minCut

11/25/19 59

Dr. Yanjun Qi / UVA CS

Page 60: UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

References

q Hastie, Trevor, et al. The elements of statistical learning. Vol. 2. No. 1. New York: Springer, 2009.

q Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides

q Big thanks to Prof. Ziv Bar-Joseph @ CMU for allowing me to reuse some of his slides

q clustering slides from Prof. Rong Jin @ MSU

11/25/19 60

Dr. Yanjun Qi / UVA CS