UVA CS 6316/4501 – Fall 2016 Machine Learning
Lecture 21: EM (Extra)

Transcript of the lecture slides (Dr. Yanjun Qi, University of Virginia).

Page 1:

UVA CS 6316/4501 – Fall 2016
Machine Learning
Lecture 21: EM (Extra)
Dr. Yanjun Qi
University of Virginia
Department of Computer Science
11/29/16

Page 2:

Where are we? → Major sections of this course

- Regression (supervised)
- Classification (supervised)
- Feature selection
- Unsupervised models
  - Dimension Reduction (PCA)
  - Clustering (K-means, GMM/EM, Hierarchical)
- Learning theory
- Graphical models (BN and HMM slides shared)

Page 3:

Today's Outline

• Principles for Model Inference
  – Maximum Likelihood Estimation
  – Bayesian Estimation
• Strategies for Model Inference
  – EM Algorithm – simplify difficult MLE
    • Algorithm
    • Application
    • Theory
  – MCMC – samples rather than maximizing

Page 4:

Model Inference through Maximum Likelihood Estimation (MLE)

Assumption: the data comes from a known probability distribution, but the distribution has some parameters that are unknown to you.

Example: the data is distributed as a Gaussian, $y_i \sim N(\mu, \sigma^2)$, so the unknown parameters here are $\theta = (\mu, \sigma^2)$.

MLE is a tool that estimates the unknown parameters of the probability distribution from the data.

Page 5:

MLE: e.g. Single Gaussian Model (when p=1)

• Need to adjust the parameters (→ model inference) so that the resulting distribution fits the observed data well.

Page 6:

Maximum Likelihood revisited

Model: $y_i \sim N(\mu, \sigma^2)$, with observed data $Y = \{y_1, y_2, \ldots, y_N\}$.

$$\ell(\theta) = \log L(\theta; Y) = \sum_{i=1}^{N} \log p(y_i)$$

Page 7:

MLE: e.g. Single Gaussian Model

• Assume the observations $y_i$ are independent, with $Y = \{y_1, y_2, \ldots, y_N\}$.

• Form the likelihood:

$$L(\theta; Y) = \prod_{i=1}^{N} p(y_i) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_i - \mu)^2}{2\sigma^2} \right)$$

• Form the log-likelihood:

$$\ell(\theta) = \log \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y_i - \mu)^2}{2\sigma^2} \right) = -\sum_{i=1}^{N} \frac{(y_i - \mu)^2}{2\sigma^2} - N \log\big(\sqrt{2\pi}\,\sigma\big)$$

Page 8:

MLE: e.g. Single Gaussian Model

• To find the unknown parameter values, maximize the log-likelihood with respect to the unknown parameters:

$$\frac{\partial \ell}{\partial \mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} y_i; \qquad \frac{\partial \ell}{\partial \sigma^2} = 0 \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{\mu})^2$$
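These closed-form estimates are easy to verify numerically. A minimal sketch (my own illustration, not from the slides), assuming only numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=10_000)   # simulated data; true mu = 2.0, sigma = 1.5

# Closed-form MLE obtained by setting the log-likelihood gradient to zero
mu_hat = y.mean()                         # (1/N) * sum_i y_i
sigma2_hat = np.mean((y - mu_hat) ** 2)   # (1/N) * sum_i (y_i - mu_hat)^2  (divides by N, not N-1)

print(mu_hat, np.sqrt(sigma2_hat))        # close to 2.0 and 1.5
```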

Page 9:

MLE: A Challenging Mixture Example

[Figure: histogram of the observed data.]

The indicator variable $\Delta_i$ tells which component generated observation $i$: $\pi$ is the probability with which the observation is chosen from density model 2, and $(1 - \pi)$ is the probability with which it is chosen from density model 1.

Mixture model:

$$Y_1 \sim N(\mu_1, \sigma_1^2); \quad Y_2 \sim N(\mu_2, \sigma_2^2); \quad Y = (1 - \Delta)\, Y_1 + \Delta\, Y_2, \quad \Delta \in \{0, 1\}$$

where $\theta_1 = (\mu_1, \sigma_1^2)$ and $\theta_2 = (\mu_2, \sigma_2^2)$.

Page 10:

MLE: Gaussian Mixture Example

Maximum likelihood fitting for the parameters $\theta = (\pi, \mu_1, \mu_2, \sigma_1, \sigma_2)$, i.e. maximizing the observed-data log-likelihood

$$\ell(\theta; Y) = \sum_{i=1}^{N} \log\big[(1 - \pi)\,\phi_{\theta_1}(y_i) + \pi\,\phi_{\theta_2}(y_i)\big],$$

is numerically (and of course analytically, too) challenging to solve!

Page 11:

Bayesian Methods & Maximum Likelihood

• Bayesian: Pr(model | data), i.e. the posterior, is proportional to Pr(data | model) · Pr(model), i.e. likelihood × prior.

• If the prior is assumed uniform, the Bayesian (MAP) estimate equals the MLE:

argmax_model Pr(data | model) Pr(model) = argmax_model Pr(data | model)

Page 12:

Today's Outline

• Principles for Model Inference
  – Maximum Likelihood Estimation
  – Bayesian Estimation
• Strategies for Model Inference
  – EM Algorithm – simplify difficult MLE
    • Algorithm
    • Application
    • Theory
  – MCMC – samples rather than maximizing

Page 13:

Here is the problem

Page 14:

All we have is the observed data, from which we need to infer the likelihood function that generated the observations.

[Figure: histogram of the observed data.]

Page 15:

Expectation Maximization: add a latent variable → latent data

EM augments the data space – it assumes latent data exists alongside the observations.

Page 16:

Computing log-likelihood based on complete data

The complete data is $T = \{t_i = (y_i, \Delta_i),\; i = 1, \ldots, N\}$.

Note that we cannot analytically maximize the previous log-likelihood with only the observed $Y = \{y_1, y_2, \ldots, y_N\}$; maximizing this complete-data form of the log-likelihood, shown below, is tractable.
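The complete-data log-likelihood itself is not legible in this transcript; based on the mixture setup above (this example appears to follow the treatment in Hastie et al.'s Elements of Statistical Learning), it should take the standard two-component form:

$$\ell_0(\theta; T) = \sum_{i=1}^{N} \Big[ (1 - \Delta_i) \log \phi_{\theta_1}(y_i) + \Delta_i \log \phi_{\theta_2}(y_i) \Big] + \sum_{i=1}^{N} \Big[ (1 - \Delta_i) \log(1 - \pi) + \Delta_i \log \pi \Big]$$

which decomposes over the two components and over $\pi$, giving the easy closed-form maximizers on the next two slides.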

Page 17:

EM: The Complete Data Likelihood

By simple differentiation we have:

$$\frac{\partial \ell_0}{\partial \mu_1} = 0 \;\Rightarrow\; \hat{\mu}_1 = \frac{\sum_{i=1}^{N} (1 - \Delta_i)\, y_i}{\sum_{i=1}^{N} (1 - \Delta_i)}; \qquad \frac{\partial \ell_0}{\partial \sigma_1^2} = 0 \;\Rightarrow\; \hat{\sigma}_1^2 = \frac{\sum_{i=1}^{N} (1 - \Delta_i)\,(y_i - \hat{\mu}_1)^2}{\sum_{i=1}^{N} (1 - \Delta_i)}$$

So maximization of the complete-data likelihood is much easier! But how do we get the latent variables?

Page 18:

EM: The Complete Data Likelihood

By simple differentiation we have:

$$\frac{\partial \ell_0}{\partial \mu_2} = 0 \;\Rightarrow\; \hat{\mu}_2 = \frac{\sum_{i=1}^{N} \Delta_i\, y_i}{\sum_{i=1}^{N} \Delta_i}; \qquad \frac{\partial \ell_0}{\partial \sigma_2^2} = 0 \;\Rightarrow\; \hat{\sigma}_2^2 = \frac{\sum_{i=1}^{N} \Delta_i\,(y_i - \hat{\mu}_2)^2}{\sum_{i=1}^{N} \Delta_i}; \qquad \frac{\partial \ell_0}{\partial \pi} = 0 \;\Rightarrow\; \hat{\pi} = \frac{\sum_{i=1}^{N} \Delta_i}{N}$$

So maximization of the complete-data likelihood is much easier! But how do we get the latent variables?

Page 19:

Obtaining Latent Variables

The latent variables are computed as expected values given the data and the parameters:

$$\gamma_i(\theta) = E(\Delta_i \mid \theta, y_i) = \Pr(\Delta_i = 1 \mid \theta, y_i)$$

Apply Bayes' rule:

$$\gamma_i(\theta) = \Pr(\Delta_i = 1 \mid \theta, y_i) = \frac{\Pr(y_i \mid \Delta_i = 1, \theta)\, \Pr(\Delta_i = 1 \mid \theta)}{\Pr(y_i \mid \Delta_i = 1, \theta)\, \Pr(\Delta_i = 1 \mid \theta) + \Pr(y_i \mid \Delta_i = 0, \theta)\, \Pr(\Delta_i = 0 \mid \theta)} = \frac{\pi\, \phi_{\theta_2}(y_i)}{(1 - \pi)\, \phi_{\theta_1}(y_i) + \pi\, \phi_{\theta_2}(y_i)}$$
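As a concrete illustration (my own, not from the slides), the responsibilities can be computed with scipy's normal density:

```python
import numpy as np
from scipy.stats import norm

def responsibilities(y, pi, mu1, sigma1, mu2, sigma2):
    """gamma_i = Pr(Delta_i = 1 | theta, y_i), via Bayes' rule."""
    p1 = (1 - pi) * norm.pdf(y, loc=mu1, scale=sigma1)  # weight x density under component 1
    p2 = pi * norm.pdf(y, loc=mu2, scale=sigma2)        # weight x density under component 2
    return p2 / (p1 + p2)

y = np.array([-0.4, 0.1, 4.2, 6.0])
print(responsibilities(y, pi=0.5, mu1=0.0, sigma1=1.0, mu2=5.0, sigma2=1.0))
# points near 0 get gamma close to 0; points near 5 get gamma close to 1
```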

Page 20:

Dilemma Situation

• We need to know the latent variables/data to maximize the complete-data log-likelihood and obtain the parameters.

• We need to know the parameters to calculate the expected values of the latent variables/data.

• → Solve through iterations!

Page 21:

So we iterate → EM for Gaussian Mixtures…

Page 22:

EM for Gaussian Mixtures…

Page 23:

EM for Two-component Gaussian Mixture

• Initialize $\hat{\mu}_1, \hat{\sigma}_1, \hat{\mu}_2, \hat{\sigma}_2, \hat{\pi}$.
• Iterate until convergence:
  – Expectation of the latent variables:

$$\gamma_i(\hat{\theta}) = \frac{\hat{\pi}\, \phi_{\hat{\theta}_2}(y_i)}{(1 - \hat{\pi})\, \phi_{\hat{\theta}_1}(y_i) + \hat{\pi}\, \phi_{\hat{\theta}_2}(y_i)} = \frac{1}{1 + \dfrac{(1 - \hat{\pi})\, \hat{\sigma}_2}{\hat{\pi}\, \hat{\sigma}_1} \exp\!\left( -\dfrac{(y_i - \hat{\mu}_1)^2}{2\hat{\sigma}_1^2} + \dfrac{(y_i - \hat{\mu}_2)^2}{2\hat{\sigma}_2^2} \right)}$$

  – Maximization for finding the parameters:

$$\hat{\mu}_1 = \frac{\sum_{i=1}^{N} (1 - \gamma_i)\, y_i}{\sum_{i=1}^{N} (1 - \gamma_i)}; \qquad \hat{\mu}_2 = \frac{\sum_{i=1}^{N} \gamma_i\, y_i}{\sum_{i=1}^{N} \gamma_i};$$

$$\hat{\sigma}_1^2 = \frac{\sum_{i=1}^{N} (1 - \gamma_i)\,(y_i - \hat{\mu}_1)^2}{\sum_{i=1}^{N} (1 - \gamma_i)}; \qquad \hat{\sigma}_2^2 = \frac{\sum_{i=1}^{N} \gamma_i\,(y_i - \hat{\mu}_2)^2}{\sum_{i=1}^{N} \gamma_i}; \qquad \hat{\pi} = \frac{\sum_{i=1}^{N} \gamma_i}{N}$$
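Putting the two steps together: below is a minimal runnable sketch of this two-component EM (my own illustration, not the course's code; the initialization and stopping rule are simple ad-hoc choices).

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(y, n_iter=100, tol=1e-8, seed=0):
    """EM for a two-component univariate Gaussian mixture."""
    rng = np.random.default_rng(seed)
    mu1, mu2 = rng.choice(y, size=2, replace=False)  # ad-hoc init: two random data points
    s1 = s2 = y.std()
    pi = 0.5
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities gamma_i = Pr(Delta_i = 1 | theta, y_i)
        p1 = (1 - pi) * norm.pdf(y, mu1, s1)
        p2 = pi * norm.pdf(y, mu2, s2)
        gamma = p2 / (p1 + p2)
        # M-step: responsibility-weighted MLE updates
        mu1 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
        mu2 = np.sum(gamma * y) / np.sum(gamma)
        s1 = np.sqrt(np.sum((1 - gamma) * (y - mu1) ** 2) / np.sum(1 - gamma))
        s2 = np.sqrt(np.sum(gamma * (y - mu2) ** 2) / np.sum(gamma))
        pi = gamma.mean()
        # Convergence check on the observed-data log-likelihood
        ll = np.sum(np.log((1 - pi) * norm.pdf(y, mu1, s1) + pi * norm.pdf(y, mu2, s2)))
        if ll - ll_old < tol:
            break
        ll_old = ll
    return pi, mu1, s1, mu2, s2

# Simulated data from a known mixture (pi = 0.55)
rng = np.random.default_rng(1)
delta = rng.random(1000) < 0.55
y = np.where(delta, rng.normal(4.5, 1.0, 1000), rng.normal(1.0, 0.7, 1000))
print(em_two_gaussians(y))
```

The estimates recover the true parameters up to label switching (see the identifiability slide later).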

Page 24:

EM in… simple words

• Given observed data, you need to come up with a generative model.
• You choose a model that comprises some hidden variables $h_i$ (this is your belief!).
• Problem: to estimate the parameters of the model:
  – Assume some initial values for the parameters.
  – Replace the values of the hidden variables with their expectations (given the old parameters).
  – Recompute new values of the parameters (given the expected hidden values).
  – Check for convergence using the log-likelihood.

Page 25:

EM – Example (cont’d)

Selected iterations of the EM algorithm for the mixture example:

Iteration | π̂
    1     | 0.485
    5     | 0.493
   10     | 0.523
   15     | 0.544
   20     | 0.546

[Figure: histogram of the observed data.]

Page 26:

EM Summary

• An iterative approach for MLE.
• A good idea when you have missing or latent data.
• Has a nice convergence property.
• Can get stuck in local minima (try different starting points).
• It is generally hard to calculate the expectation over all possible values of the hidden variables.
• Still not much is known about the rate of convergence.

Page 27:

Today's Outline

• Principles for Model Inference
  – Maximum Likelihood Estimation
  – Bayesian Estimation
• Strategies for Model Inference
  – EM Algorithm – simplify difficult MLE
    • Algorithm
    • Application
    • Theory
  – MCMC – samples rather than maximizing

Page 28:

Applications of EM

– Mixture models
– HMMs
– Latent variable models
– Missing data problems
– …

Page 29:

Applications of EM (1)

• Fitting mixture models

Page 30:

Applications of EM (2)

• Probabilistic Latent Semantic Analysis (pLSA) – a technique from text analysis for topic modeling.

[Figure: pLSA graphical models relating documents D, latent topics Z, and words W, with P(w, d) factored via P(z | d) and P(w | z).]

Page 31:

Applications of EM (3)

• Learning parts-and-structure models

Page 32:

Applications of EM (4)

• Automatic segmentation of layers in video

http://www.psi.toronto.edu/images/figures/cutouts_vid.gif

Page 33:

Expectation Maximization (EM)

• An old idea (late 1950s), but formalized by Dempster, Laird and Rubin in 1977.

• The subject of much investigation. See the McLachlan & Krishnan book (1997).

Page 34:


[Figure: EM worked through for the single-variable + two-cluster case.]

Page 35:


[Figure: EM worked through for the multi-variable + multi-cluster case.]

Page 36:

Today's Outline

• Principles for Model Inference
  – Maximum Likelihood Estimation
  – Bayesian Estimation
• Strategies for Model Inference
  – EM Algorithm – simplify difficult MLE
    • Algorithm
    • Application
    • Theory
  – MCMC – samples rather than maximizing

Page 37:


Why is Learning Harder?

• In fully observed IID settings, the complete log-likelihood decomposes into a sum of local terms:

$$\ell_c(\theta; D) = \log p(x, z \mid \theta) = \log p(z \mid \theta_z) + \log p(x \mid z, \theta_x)$$

• With latent variables, all the parameters become coupled together via marginalization:

$$\ell(\theta; D) = \log p(x \mid \theta) = \log \sum_z p(z \mid \theta_z)\, p(x \mid z, \theta_x)$$

Page 38:

Gradient Learning for Mixture Models

• We can learn mixture densities using gradient descent on the observed-data log-likelihood. The gradients are quite interesting:

$$\ell(\theta) = \log p(x \mid \theta) = \log \sum_k \pi_k\, p_k(x \mid \theta_k)$$

$$\frac{\partial \ell}{\partial \theta_k} = \frac{\pi_k}{p(x \mid \theta)} \frac{\partial p_k(x \mid \theta_k)}{\partial \theta_k} = \frac{\pi_k\, p_k(x \mid \theta_k)}{p(x \mid \theta)} \frac{\partial \log p_k(x \mid \theta_k)}{\partial \theta_k} = \pi_k\, r_k\, \frac{\partial \ell_k}{\partial \theta_k}, \qquad r_k = \frac{p_k(x \mid \theta_k)}{p(x \mid \theta)}$$

• In other words, the gradient is the responsibility-weighted sum of the individual log-likelihood gradients.

• This can be passed to a conjugate gradient routine.
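To make the identity concrete, here is a small numeric check (my own sketch, not from the slides): the responsibility-weighted analytic gradient with respect to each mean should match a finite-difference gradient of the observed-data log-likelihood.

```python
import numpy as np
from scipy.stats import norm

# Fixed mixture: pi_k, mu_k, sigma_k for K = 2 components, single observation x
pis, mus, sigmas = np.array([0.3, 0.7]), np.array([0.0, 4.0]), np.array([1.0, 1.5])
x = 2.5

def obs_ll(mus_):
    """Observed-data log-likelihood as a function of the means."""
    return np.log(np.sum(pis * norm.pdf(x, mus_, sigmas)))

# Analytic gradient: d ell / d mu_k = pi_k * r_k * d log p_k / d mu_k
p_k = norm.pdf(x, mus, sigmas)
r_k = p_k / np.sum(pis * p_k)              # responsibilities r_k = p_k / p(x | theta)
dlogp_dmu = (x - mus) / sigmas ** 2        # gradient of the Gaussian log-density w.r.t. its mean
analytic = pis * r_k * dlogp_dmu

# Finite-difference check
eps = 1e-6
numeric = np.array([
    (obs_ll(mus + eps * np.eye(2)[k]) - obs_ll(mus - eps * np.eye(2)[k])) / (2 * eps)
    for k in range(2)
])
print(analytic, numeric)                   # the two should agree to ~1e-9
```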

Page 39:

Parameter Constraints

• Often we have constraints on the parameters, e.g. $\sum_{j=1}^{K} \pi_j = 1$, or $\Sigma_k$ being symmetric positive definite.
• We can use constrained optimization, or we can re-parameterize in terms of unconstrained values:
  – For normalized weights, use a softmax re-parameterization.
  – For covariance matrices, use the Cholesky decomposition $\Sigma^{-1} = A^T A$, where $A$ is upper triangular with positive diagonal: $A_{ii} = \exp(\lambda_i) > 0$, $A_{ij} = \eta_{ij}$ for $j > i$, and $A_{ij} = 0$ for $j < i$.
  – Use the chain rule to compute $\partial \ell / \partial \pi$ and $\partial \ell / \partial A$.
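A minimal sketch of both re-parameterizations (my own illustration; the function names are hypothetical): softmax maps an unconstrained vector to valid mixture weights, and the Cholesky-style construction maps unconstrained values to a symmetric positive definite precision matrix.

```python
import numpy as np

def softmax(eta):
    """Unconstrained eta -> weights pi with pi_j > 0 and sum(pi) = 1."""
    e = np.exp(eta - eta.max())              # shift for numerical stability
    return e / e.sum()

def precision_from_unconstrained(lam, eta_upper):
    """Build Sigma^{-1} = A^T A with A upper triangular and diag(A) = exp(lam) > 0."""
    d = len(lam)
    A = np.zeros((d, d))
    A[np.triu_indices(d, k=1)] = eta_upper   # free off-diagonal entries
    A[np.diag_indices(d)] = np.exp(lam)      # strictly positive diagonal
    return A.T @ A                           # symmetric positive definite by construction

pi = softmax(np.array([0.2, -1.0, 3.0]))
P = precision_from_unconstrained(np.array([0.1, -0.3]), np.array([0.7]))
print(pi.sum(), np.linalg.eigvalsh(P))       # weights sum to 1; eigenvalues all positive
```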

Page 40:

Identifiability

• A mixture model induces a multi-modal likelihood.
• Hence gradient ascent can only find a local maximum.
• Mixture models are unidentifiable, since we can always switch the hidden labels without affecting the likelihood.
• Hence we should be careful in trying to interpret the "meaning" of latent variables.

Page 41:

Expectation-Maximization (EM) Algorithm

• EM is an iterative algorithm with two linked steps:
  – E-step: fill in hidden values using inference: $p(z \mid x, \theta^t)$.
  – M-step: update the parameters of round $(t+1)$ using standard MLE/MAP methods applied to the completed data.
• We will prove that this procedure monotonically improves the likelihood (or leaves it unchanged). Thus it always converges to a local optimum of the likelihood.

Page 42:

Theory Underlying EM

• What are we doing?
• Recall that according to MLE, we intend to learn the model parameters that would maximize the likelihood of the data.
• But we do not observe z, so computing

$$\ell(\theta; D) = \log \sum_z p(x, z \mid \theta) = \log \sum_z p(z \mid \theta_z)\, p(x \mid z, \theta_x)$$

  is difficult!
• What shall we do?

Page 43:

(1) Incomplete Log-Likelihood

• Incomplete log-likelihood: with z unobserved, our objective becomes the log of a marginal probability:

$$\ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)$$

  – This objective won't decouple.

Page 44:

(2) Complete Log-Likelihood

• Complete log-likelihood: let X denote the observable variable(s) and Z denote the latent variable(s). If Z could be observed, then

$$\ell_c(\theta; x, z) \stackrel{\text{def}}{=} \log p(x, z \mid \theta) = \log p(z \mid \theta_z)\, p(x \mid z, \theta_x)$$

  – Usually, optimizing $\ell_c()$ given both z and x is straightforward (c.f. MLE for fully observed models).
  – Recall that in this case the objective, e.g. for MLE, decomposes into a sum of factors, and the parameter for each factor can be estimated separately.
  – But given that Z is not observed, $\ell_c()$ is a random quantity and cannot be maximized directly.

Page 45:

Three types of log-likelihood over multiple observed samples (x_1, x_2, …, x_N):

• Complete log-likelihood (CLL)
• Log-likelihood [Incomplete log-likelihood (ILL)]
• Expected complete log-likelihood (ECLL)

[Figure notation: observed data, latent variables, iteration index.]

Page 46:

(3) Expected Complete Log-Likelihood

• For any distribution q(z), define the expected complete log-likelihood (ECLL):

$$\text{ECLL} = \langle \ell_c(\theta; x, z) \rangle_q \stackrel{\text{def}}{=} \sum_z q(z \mid x, \theta)\, \log p(x, z \mid \theta)$$

• The CLL is a random variable → the ECLL is a deterministic function of q.
• The ECLL is linear in the CLL, so it inherits its factorizability.
• Does maximizing this surrogate yield a maximizer of the likelihood?

Page 47:

Jensen’sinequality

Page 48:

Jensen's inequality

Recall $\text{ECLL} = \langle \ell_c(\theta; x, z) \rangle_q \stackrel{\text{def}}{=} \sum_z q(z \mid x, \theta) \log p(x, z \mid \theta)$. Then:

$$\text{ILL} = \ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta) = \log \sum_z q(z \mid x)\, \frac{p(x, z \mid \theta)}{q(z \mid x)}$$

$$\geq \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} \qquad \text{(Jensen's inequality)}$$

$$= \sum_z q(z \mid x) \log p(x, z \mid \theta) - \sum_z q(z \mid x) \log q(z \mid x) = \text{ECLL} + H_q$$

$$\Rightarrow\; \ell(\theta; x) \,\geq\, \langle \ell_c(\theta; x, z) \rangle_q + H_q, \qquad H_q = \text{the entropy term}$$

Page 49:

Lower Bounds and Free Energy

• For fixed data x, define a functional called the free energy:

$$F(q, \theta) \stackrel{\text{def}}{=} \sum_z q(z \mid x)\, \log \frac{p(x, z \mid \theta)}{q(z \mid x)} \;\leq\; \ell(\theta; x)$$

• The EM algorithm is coordinate ascent on F:
  – E-step: $q^{t+1} = \arg\max_q F(q, \theta^t)$
  – M-step: $\theta^{t+1} = \arg\max_\theta F(q^{t+1}, \theta)$

Page 50:

How does EM optimize the ILL?

Page 51:


E-step: maximization of F w.r.t. q

• Claim:

$$q^{t+1} = \arg\max_q F(q, \theta^t) = p(z \mid x, \theta^t)$$

  – This is the posterior distribution over the latent variables given the data and the parameters. Often we need this at test time anyway (e.g. to perform clustering).

• Proof (easy): this setting attains the bound $\ell(\theta^t; x)$:

$$F\big(p(z \mid x, \theta^t), \theta^t\big) = \sum_z p(z \mid x, \theta^t) \log \frac{p(x, z \mid \theta^t)}{p(z \mid x, \theta^t)} = \sum_z p(z \mid x, \theta^t) \log p(x \mid \theta^t) = \log p(x \mid \theta^t) = \ell(\theta^t; x)$$

• We can also show this result using variational calculus, or from the fact that

$$\ell(\theta; x) - F(q, \theta) = \mathrm{KL}\big(q \,\|\, p(z \mid x, \theta)\big)$$

Page 52:

E-step: Alternative derivation


$$\ell(\theta; x) - F(q, \theta) = \mathrm{KL}\big(q \,\|\, p(z \mid x, \theta)\big)$$
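The derivation on this slide is not legible in the transcript; a standard reconstruction from the definitions above, using $p(x, z \mid \theta) = p(z \mid x, \theta)\, p(x \mid \theta)$:

$$\ell(\theta; x) - F(q, \theta) = \log p(x \mid \theta) - \sum_z q(z \mid x) \log \frac{p(z \mid x, \theta)\, p(x \mid \theta)}{q(z \mid x)} = \sum_z q(z \mid x) \log \frac{q(z \mid x)}{p(z \mid x, \theta)} = \mathrm{KL}\big(q \,\|\, p(z \mid x, \theta)\big)$$

Since KL ≥ 0, with equality iff $q = p(z \mid x, \theta)$, maximizing F over q at fixed $\theta^t$ recovers exactly the posterior, consistent with the claim on the previous slide.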

Page 53:

M-step: maximization of F w.r.t. θ

• Note that the free energy breaks into two terms:

$$F(q, \theta) = \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} = \sum_z q(z \mid x) \log p(x, z \mid \theta) - \sum_z q(z \mid x) \log q(z \mid x) = \langle \ell_c(\theta; x, z) \rangle_q + H_q$$

  – The first term is the expected complete log-likelihood (energy), and the second term, which does not depend on θ, is the entropy.

Page 54:

M-step: maximization of F w.r.t. θ

• Thus, in the M-step, maximizing with respect to θ for fixed q, we only need to consider the first term:

$$\theta^{t+1} = \arg\max_\theta \langle \ell_c(\theta; x, z) \rangle_{q^{t+1}} = \arg\max_\theta \sum_z q^{t+1}(z \mid x) \log p(x, z \mid \theta)$$

  – Under the optimal $q^{t+1}$, this is equivalent to solving a standard MLE of the fully observed model $p(x, z \mid \theta)$, with the sufficient statistics involving z replaced by their expectations w.r.t. $p(z \mid x, \theta^t)$.

Page 55:

Summary: EM Algorithm

• A way of maximizing the likelihood function for latent variable models. It finds the MLE of the parameters when the original (hard) problem can be broken up into two (easy) pieces:
  1. Estimate some "missing" or "unobserved" data from the observed data and the current parameters.
  2. Using this "complete" data, find the maximum likelihood parameter estimates.

• Alternate between filling in the latent variables using the best guess (posterior) and updating the parameters based on this guess:
  – E-step: $q^{t+1} = \arg\max_q F(q, \theta^t)$
  – M-step: $\theta^{t+1} = \arg\max_\theta F(q^{t+1}, \theta)$

• In the M-step we optimize a lower bound on the likelihood. In the E-step we close the gap, making the bound equal to the likelihood.

Page 56:

How does EM optimize the ILL?


Page 57:

A Report Card for EM

• Some good things about EM:
  – no learning rate (step-size) parameter
  – automatically enforces parameter constraints
  – very fast for low dimensions
  – each iteration guaranteed to improve the likelihood
  – calls inference and fully observed learning as subroutines

• Some bad things about EM:
  – can get stuck in local minima
  – can be slower than conjugate gradient (especially near convergence)
  – requires an expensive inference step
  – is a maximum likelihood/MAP method

Page 58:

References

• Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides.
• Geoffrey J. McLachlan and Thriyambakam Krishnan, The EM Algorithm and Extensions.