07 cv mil_modeling_complex_densities
Transcript of 07 cv mil_modeling_complex_densities
Computer vision: models, learning and inference
Chapter 7
Modeling complex densities
Please send errata to [email protected]
Structure
Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Models for machine vision
Face Detection
Type 3: Pr(x|w) - Generative
How to model Pr(x|w)?
– Choose an appropriate form for Pr(x)
– Make parameters a function of w
– Function takes parameters θ that define its shape
Learning algorithm: learn parameters θ from training data x,w
Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule
Classification Model
Or writing in terms of class conditional density functions
Parameters μ0, Σ0 learned just from the data S0 where w=0
Similarly, parameters μ1, Σ1 learned just from the data S1 where w=1
Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule
Experiment
1000 faces, 1000 non-faces
60×60×3 images = 10800×1 vectors
Equal priors Pr(w=1) = Pr(w=0) = 0.5
75% performance on test set. Not very good!
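The pipeline above – fit one density per class, then apply Bayes' rule with the class priors – can be sketched as follows. This is an illustrative numpy sketch, not the book's code; the diagonal-covariance Gaussians and the small synthetic stand-in data are assumptions for the demo:

```python
import numpy as np

def fit_diag_gaussian(X):
    """Maximum-likelihood fit of a diagonal-covariance Gaussian to rows of X."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6  # variance floor keeps values positive

def diag_gaussian_logpdf(X, mu, var):
    """Per-row log density under Norm[mu, diag(var)]."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)

def posterior_w1(X, params0, params1, prior1=0.5):
    """Pr(w=1|x) via Bayes' rule, computed stably in log space."""
    l0 = diag_gaussian_logpdf(X, *params0) + np.log(1.0 - prior1)
    l1 = diag_gaussian_logpdf(X, *params1) + np.log(prior1)
    m = np.maximum(l0, l1)
    return np.exp(l1 - m) / (np.exp(l0 - m) + np.exp(l1 - m))

# Synthetic stand-ins for the face / non-face training vectors.
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(1000, 5))   # "non-faces"
X1 = rng.normal(2.0, 1.0, size=(1000, 5))   # "faces"
p1 = posterior_w1(np.vstack([X0, X1]), fit_diag_gaussian(X0), fit_diag_gaussian(X1))
labels = (p1 > 0.5).astype(int)
```

The same two steps (learn a class-conditional density, invert with Bayes' rule) underlie every generative classifier in this chapter; only the density model changes.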
Results (diagonal covariance)
The plan...
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Hidden (or latent) Variables
Key idea: represent density Pr(x) as marginalization of joint density with another variable h that we do not see
The model will also depend on some parameters θ: Pr(x|θ) = ∫ Pr(x, h|θ) dh
Hidden (or latent) Variables
Expectation Maximization
An algorithm specialized to fitting pdfs that are the marginalization of a joint distribution
Defines a lower bound on log likelihood and increases bound iteratively
Lower bound
Lower bound is a function of parameters θ and a set of probability distributions qi(hi)
Expectation Maximization (EM) algorithm alternates E-steps and M-steps
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
Lower bound
E-Step & M-Step
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
E-Step & M-Step
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Mixture of Gaussians (MoG)
ML Learning
Q. Can we learn this model with maximum likelihood?
A. Yes, but a brute-force approach is tricky:
• If you take the derivative and set it to zero, you can't solve – the log of the sum causes problems
• Have to enforce constraints on the parameters:
  – covariances must be positive definite
  – weights must sum to one
MoG as a marginalization
Define a discrete variable h ∈ {1,…,K} and then write Pr(x|h=k) = Normx[μk, Σk] and Pr(h=k) = λk
Then we can recover the density by marginalizing the joint: Pr(x) = Σk Pr(x|h=k)Pr(h=k) = Σk λk Normx[μk, Σk]
MoG as a marginalization
MoG as a marginalization
Define a discrete variable h ∈ {1,…,K} as before, with Pr(x|h=k) = Normx[μk, Σk] and Pr(h=k) = λk
Note:
• This gives us a method to generate data from the MoG: first sample Pr(h), then sample Pr(x|h)
• The hidden variable h has a clear interpretation – it tells you which Gaussian created data point x
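That two-stage ancestral-sampling recipe can be sketched directly (illustrative numpy; the two-component parameters are made up for the demo):

```python
import numpy as np

def sample_mog(n, lam, mus, Sigmas, rng):
    """Ancestral sampling: first draw h ~ Pr(h), then x ~ Pr(x|h)."""
    h = rng.choice(len(lam), size=n, p=lam)          # which Gaussian makes each point
    x = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in h])
    return x, h

# A two-component example with made-up parameters.
rng = np.random.default_rng(1)
lam = np.array([0.7, 0.3])
mus = np.array([[0.0, 0.0], [4.0, 4.0]])
Sigmas = np.stack([np.eye(2), np.eye(2)])
x, h = sample_mog(500, lam, mus, Sigmas, rng)
```

Keeping the sampled h alongside x makes the interpretation in the bullet above concrete: h records which Gaussian created each point.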
Expectation Maximization for MoG
GOAL: learn the parameters θ = {λ1…K, μ1…K, Σ1…K} from training data
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
E-Step
Set each qi(hi) to the posterior: qi(hi = k) = rik = λk Normxi[μk, Σk] / Σj λj Normxi[μj, Σj]
We'll call rik the responsibility of the kth Gaussian for the ith data point
Repeat this procedure for every data point!
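In code, the responsibilities are one normalized row of weighted likelihoods per data point. A numpy sketch (the function names and toy parameters are hypothetical; the log-space normalization is added for numerical stability):

```python
import numpy as np

def gaussian_logpdf(X, mu, Sigma):
    """Per-row log density under a full-covariance Gaussian."""
    D = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, (X - mu).T)
    maha = np.sum(sol ** 2, axis=0)                  # Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (D * np.log(2 * np.pi) + logdet + maha)

def e_step(X, lam, mus, Sigmas):
    """r[i, k] = responsibility of the kth Gaussian for the ith point."""
    log_r = np.stack([np.log(lam[k]) + gaussian_logpdf(X, mus[k], Sigmas[k])
                      for k in range(len(lam))], axis=1)
    log_r -= log_r.max(axis=1, keepdims=True)        # stabilize before exponentiating
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [4.0, 4.0], [0.1, 0.0]])
lam = np.array([0.5, 0.5])
mus = np.array([[0.0, 0.0], [4.0, 4.0]])
Sigmas = np.stack([np.eye(2), np.eye(2)])
r = e_step(X, lam, mus, Sigmas)
```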
M-Step
Take the derivative, equate to zero, and solve (with a Lagrange multiplier for the constraint that the weights λk sum to one)
M-Step
Update the means, covariances, and weights according to the responsibilities of the data points:
λk = (Σi rik) / I
μk = (Σi rik xi) / (Σi rik)
Σk = (Σi rik (xi − μk)(xi − μk)ᵀ) / (Σi rik)
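The three updates can be sketched as (numpy; `m_step` is a hypothetical name, and the tiny toy example below just checks that hard 0/1 responsibilities reduce to per-cluster ML estimates):

```python
import numpy as np

def m_step(X, r):
    """Update weights, means and covariances from responsibilities r[i, k]."""
    I, D = X.shape
    Nk = r.sum(axis=0)                               # effective count per component
    lam = Nk / I                                     # weights sum to one by construction
    mus = (r.T @ X) / Nk[:, None]                    # responsibility-weighted means
    Sigmas = np.empty((len(Nk), D, D))
    for k in range(len(Nk)):
        diff = X - mus[k]
        Sigmas[k] = (r[:, k, None] * diff).T @ diff / Nk[k]
        Sigmas[k] += 1e-6 * np.eye(D)                # keep covariance positive definite
    return lam, mus, Sigmas

# Hard (0/1) responsibilities reduce to per-cluster ML estimates.
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 4.0)])
r = np.zeros((10, 2))
r[:5, 0] = 1.0
r[5:, 1] = 1.0
lam, mus, Sigmas = m_step(X, r)
```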
Iterate until no further improvement
Different flavours...
Diagonal covariance
Same covariance
Full covariance
Local Minima
L = 98.76 L = 96.97 L = 94.35
Start from three random positions
Means of face/non-face model
Classification 84% (9% improvement!)
The plan...
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Student t-distributions
Student t-distributions motivation
Panels: normal distribution, normal distribution w/ one extra data point, t-distribution
The normal distribution is not very robust – a single outlier can completely throw it off because the tails fall off so fast...
Student t-distributions
Univariate Student t-distribution
Multivariate Student t-distribution
t-distribution as a marginalization
Define a hidden variable h with Pr(h) = Gamh[ν/2, ν/2] and Pr(x|h) = Normx[μ, Σ/h]
The t-distribution can then be expressed as a marginalization: Pr(x) = ∫ Pr(x|h) Pr(h) dh
Gamma distribution
t-distribution as a marginalization
Define hidden variable h
Things to note:
• Again, this provides a method to sample from the t-distribution
• The variable h has a clear interpretation:
  – each datum is drawn from a Gaussian with mean μ
  – the covariance depends inversely on h
• Can think of this as an infinite mixture (the sum becomes an integral) of Gaussians w/ the same mean but different variances
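That sampling interpretation is easy to sketch (illustrative numpy; note Gam[α, β] here uses shape α and rate β, so numpy's `scale` parameter is 2/ν):

```python
import numpy as np

def sample_t(n, mu, Sigma, nu, rng):
    """Sample the multivariate t as a gamma-mixed Gaussian:
    h ~ Gam[nu/2, nu/2], then x|h ~ Norm[mu, Sigma/h]."""
    h = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)   # rate nu/2 -> scale 2/nu
    z = rng.multivariate_normal(np.zeros(len(mu)), Sigma, size=n)
    return mu + z / np.sqrt(h)[:, None]

rng = np.random.default_rng(2)
x = sample_t(10000, np.zeros(2), np.eye(2), nu=3.0, rng=rng)
```

Small sampled values of h inflate the covariance Σ/h, which is exactly what produces the heavy tails.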
t-distribution
EM for t-distributions
GOAL: learn the parameters θ = {μ, Σ, ν} from training data
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
E-Step
Compute the posterior Pr(hi|xi) (a gamma distribution) and extract the expectations E[hi] and E[log hi]
M-Step
Updates
There is no closed-form solution for ν. We must optimize the bound directly – but since ν is just a single number, we can evaluate the bound on a fine grid of values and pick the best
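The grid-search idea is simple to sketch for the univariate case (numpy plus `math.lgamma`; the helper names are hypothetical, and μ and σ² are held fixed at their true values here for clarity, whereas in full EM they would also be updated):

```python
from math import lgamma, log, pi
import numpy as np

def t_loglik(x, mu, sig2, nu):
    """Summed univariate Student t log density Stud[mu, sig2, nu]."""
    z2 = (x - mu) ** 2 / sig2
    const = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi * sig2)
    return len(x) * const - (nu + 1) / 2 * np.sum(np.log1p(z2 / nu))

def grid_search_nu(x, mu, sig2, grid=None):
    """No closed form for nu: evaluate a fine grid and keep the best value."""
    if grid is None:
        grid = np.arange(0.5, 50.5, 0.5)
    return grid[int(np.argmax([t_loglik(x, mu, sig2, nu) for nu in grid]))]

rng = np.random.default_rng(4)
x = rng.standard_t(3.0, size=5000)          # data with true nu = 3
nu_hat = grid_search_nu(x, mu=0.0, sig2=1.0)
```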
EM algorithm for t-distributions
The plan...
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Factor Analysis
Compromise between – Full covariance matrix (desirable but many parameters)
– Diagonal covariance matrix (no modelling of covariance)
Models full covariance in a subspace Φ
Mops everything else up with diagonal component Σ
Data Density
Consider modelling the distribution of 100×100×3 RGB images (Dx = 30000)
Gaussian w/ spherical covariance: 1 covariance parameter
Gaussian w/ diagonal covariance: Dx covariance parameters
Full Gaussian: ~Dx² covariance parameters
PPCA: ~Dx·Dh covariance parameters
Factor analysis: ~Dx·(Dh+1) covariance parameters
(full covariance in subspace + diagonal)
Subspaces
Factor Analysis
Compromise between – Full covariance matrix (desirable but many parameters)
– Diagonal covariance matrix (no modelling of covariance)
Models full covariance in a subspace Φ
Mops everything else up with diagonal component Σ
Factor Analysis as a Marginalization
Let's define Pr(h) = Normh[0, I] and Pr(x|h) = Normx[μ + Φh, Σ]
Then it can be shown (not obvious) that Pr(x) = ∫ Pr(x|h) Pr(h) dh = Normx[μ, ΦΦᵀ + Σ]
Sampling from factor analyzer
Factor analysis vs. MoG
E-Step
• Compute posterior over hidden variables
• Compute expectations needed for M-Step
E-Step
M-Step
• Optimize parameters w.r.t. EM bound
M-Step
• Optimize parameters w.r.t. EM bound
Face model
Sampling from 10 parameter model
To generate:
• Choose factor loadings hi from a standard normal distribution
• Multiply by the factors Φ
• Add the mean μ
• (should also add a random noise component εi w/ diagonal covariance Σ)
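Those generation steps, sketched in numpy (illustrative made-up parameters; `sample_factor_analyzer` is a hypothetical name, and the noise term is included so the samples have the full implied covariance ΦΦᵀ + Σ):

```python
import numpy as np

def sample_factor_analyzer(n, mu, Phi, sigma_diag, rng):
    """x = mu + Phi h + eps, with h ~ Norm[0, I] and eps ~ Norm[0, diag(sigma_diag)]."""
    h = rng.standard_normal((n, Phi.shape[1]))             # factor loadings
    eps = rng.standard_normal((n, len(mu))) * np.sqrt(sigma_diag)
    return mu + h @ Phi.T + eps

rng = np.random.default_rng(3)
mu = np.zeros(5)
Phi = rng.standard_normal((5, 2))       # subspace with Dh = 2 factors
sigma_diag = 0.1 * np.ones(5)
x = sample_factor_analyzer(20000, mu, Phi, sigma_diag, rng)
cov_model = Phi @ Phi.T + np.diag(sigma_diag)   # implied covariance
```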
Combining Models
Can combine three models in various combinations
• Mixture of t-distributions = mixture of Gaussians + t-distributions
• Robust subspace models = t-distributions + factor analysis
• Mixture of factor analyzers = mixture of Gaussians + factor analysis
Or combine all models to create mixture of robust subspace models
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Expectation Maximization
Problem: Optimize cost functions of the form θ̂ = argmaxθ Σi log[Pr(xi|θ)], where each likelihood term is a marginalization over a hidden variable hi
Solution: Expectation Maximization (EM) algorithm (Dempster, Laird and Rubin 1977)
Key idea: Define lower bound on log-likelihood and increase at each iteration
Continuous case: Pr(xi|θ) = ∫ Pr(xi, hi|θ) dhi
Discrete case: Pr(xi|θ) = Σk Pr(xi, hi = k|θ)
Expectation Maximization
Defines a lower bound on log likelihood and increases bound iteratively
Lower bound
Lower bound is a function of parameters θ and a set of probability distributions {qi(hi)}
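Written out explicitly (same notation as the slides, with θ for the parameters), the bound is:

```latex
\mathcal{B}\bigl[\{q_i(h_i)\},\theta\bigr]
  = \sum_{i=1}^{I} \int q_i(h_i)\,
    \log\!\left[\frac{\Pr(x_i,h_i\,|\,\theta)}{q_i(h_i)}\right] dh_i
  \;\le\; \sum_{i=1}^{I} \log\!\left[\int \Pr(x_i,h_i\,|\,\theta)\,dh_i\right]
```

The right-hand side is exactly the log likelihood Σi log Pr(xi|θ), so increasing the bound can never overshoot it.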
E-Step & M-Step
E-Step – Maximize bound w.r.t. distributions {qi(hi)}
M-Step – Maximize bound w.r.t. parameters θ
E-Step: Update {qi(hi)} so that the bound equals the log likelihood for the current θ
M-Step: Update θ to the maximum of the bound
Missing Parts of Argument
1. Show that this is a lower bound
2. Show that the E-Step update is correct
3. Show that the M-Step update is correct
Jensen’s Inequality
log[E[y]] ≥ E[log[y]], or equivalently, log[ ∫ Pr(y) y dy ] ≥ ∫ Pr(y) log[y] dy
Similarly, for any positive function f[y]: log[ ∫ Pr(y) f[y] dy ] ≥ ∫ Pr(y) log[f[y]] dy
Lower Bound for EM Algorithm
Where we have used Jensen’s inequality
E-Step – Optimize bound w.r.t. {qi(hi)}
(One term of the decomposed bound is constant w.r.t. qi(hi) – only the remaining term matters)
E-Step
The Kullback-Leibler divergence is a measure of distance between probability distributions. We are maximizing the negative distance (i.e., minimizing the distance)
Use the relation log[y] ≤ y − 1 (with equality only at y = 1)
Kullback Leibler Divergence
So the Kullback-Leibler divergence must be non-negative
In other words, the best we can do is choose qi(hi) so that it is zero
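The missing step can be reconstructed from the relation log[y] ≤ y − 1 (here p(hi) denotes the second distribution in the divergence):

```latex
-\,\mathrm{KL}\bigl[q_i(h_i)\,\|\,p(h_i)\bigr]
  = \int q_i(h_i)\log\!\left[\frac{p(h_i)}{q_i(h_i)}\right] dh_i
  \;\le\; \int q_i(h_i)\!\left[\frac{p(h_i)}{q_i(h_i)} - 1\right] dh_i
  = \int p(h_i)\,dh_i - \int q_i(h_i)\,dh_i = 0
```

Hence KL[qi‖p] ≥ 0, with equality exactly when qi(hi) = p(hi).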
E-Step
So the KL divergence must be non-negative, and the best we can do is choose qi(hi) so that it is zero.
How can we do this? Easy – choose the posterior: qi(hi) = Pr(hi|xi, θ)
M-Step – Optimize bound w.r.t. θ
In the M-Step we optimize the expected joint log likelihood with respect to the parameters θ (the expectation is taken w.r.t. the distributions from the E-Step)
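Written out, the M-Step objective is (the −∫qi log qi entropy term of the bound has been dropped because it does not depend on θ):

```latex
\hat{\theta}^{[t+1]}
  = \operatorname*{argmax}_{\theta}
    \sum_{i=1}^{I} \int q_i(h_i)\,\log\bigl[\Pr(x_i,h_i\,|\,\theta)\bigr]\,dh_i
```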
E-Step & M-Step
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Face Detection
(not a very realistic model – face detection is really done using discriminative methods)
Object recognition
t-distributions vs. normal distributions
Model as J independent patches
Segmentation
Face Recognition
Pose Regression
Training data
Frontal faces x, non-frontal faces w
Learning
Fit factor analysis model to joint distribution of x,w
Inference
Given a new x, infer w (the conditional distribution of a normal)
Conditional Distributions
If x and w are jointly normally distributed,
then the conditional Pr(w|x) is also normal, with mean and covariance available in closed form
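The standard result being used (a fact about jointly normal variables, written in the book's Norm[·] notation):

```latex
\text{If }\;
\Pr\!\begin{pmatrix} x \\ w \end{pmatrix}
 = \mathrm{Norm}\!\left[
 \begin{pmatrix} \mu_x \\ \mu_w \end{pmatrix},
 \begin{pmatrix} \Sigma_{xx} & \Sigma_{xw} \\ \Sigma_{wx} & \Sigma_{ww} \end{pmatrix}
 \right],
\;\text{then}\;
\Pr(w\,|\,x) = \mathrm{Norm}\!\left[
 \mu_w + \Sigma_{wx}\Sigma_{xx}^{-1}(x - \mu_x),\;
 \Sigma_{ww} - \Sigma_{wx}\Sigma_{xx}^{-1}\Sigma_{xw}
 \right]
```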
Modeling transformations