Computer vision: models, learning and inference
Chapter 7
Modeling complex densities
Please send errata to [email protected]
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Models for machine vision
Face Detection
Type 3: Pr(x|w) - Generative
How to model Pr(x|w)?
– Choose an appropriate form for Pr(x)
– Make parameters a function of w
– Function takes parameters θ that define its shape
Learning algorithm: learn parameters θ from training data x,w
Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule
Classification Model
Or writing in terms of class conditional density functions
Parameters μ0, Σ0 learnt just from data S0 where w=0
Similarly, parameters μ1, Σ1 learnt just from data S1 where w=1
Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule
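The Bayes'-rule computation referred to above, written out for the two-class case (a standard identity; the slide's equation is not in the transcript):
\[
Pr(w=1 \mid x) = \frac{Pr(x \mid w=1)\,Pr(w=1)}{\sum_{k=0}^{1} Pr(x \mid w=k)\,Pr(w=k)}
\]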
Experiment
1000 faces, 1000 non-faces
60×60×3 images = 10800×1 vectors
Equal priors Pr(w=1) = Pr(w=0) = 0.5
75% performance on test set. Not very good!
Results (diagonal covariance)
The plan...
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Hidden (or latent) Variables
Key idea: represent density Pr(x) as marginalization of joint density with another variable h that we do not see
This will also depend on some parameters θ:
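A reconstruction of the missing equation, in its standard form:
\[
Pr(x \mid \theta) = \int Pr(x, h \mid \theta)\, dh
\]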
Hidden (or latent) Variables
Expectation Maximization
An algorithm specialized to fitting pdfs which are the marginalization of a joint distribution
Defines a lower bound on log likelihood and increases bound iteratively
Lower bound
Lower bound is a function of parameters θ and a set of probability distributions qi(hi)
Expectation Maximization (EM) algorithm alternates E-steps and M-steps
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
Lower bound
E-Step & M-Step
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
E-Step & M-Step
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Mixture of Gaussians (MoG)
ML Learning
Q. Can we learn this model with maximum likelihood?
A. Yes, but using a brute force approach is tricky:
• If you take the derivative and set to zero, you can't solve – the log of the sum causes problems
• Have to enforce constraints on parameters:
– covariances must be positive definite
– weights must sum to one
MoG as a marginalization
Define a hidden variable h and then write
Then we can recover the density by marginalizing Pr(x,h)
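The missing equations, reconstructed in standard form (using the book's Norm notation): for a discrete hidden variable h ∈ {1,…,K},
\[
Pr(x \mid h=k, \theta) = \mathrm{Norm}_x[\mu_k, \Sigma_k], \qquad Pr(h=k \mid \theta) = \lambda_k,
\]
so marginalizing over h recovers the mixture density
\[
Pr(x \mid \theta) = \sum_{k=1}^{K} \lambda_k\, \mathrm{Norm}_x[\mu_k, \Sigma_k].
\]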
MoG as a marginalization
MoG as a marginalization
Define a hidden variable h and then write
Note:
• This gives us a method to generate data from the MoG: first sample Pr(h), then sample Pr(x|h)
• The hidden variable h has a clear interpretation – it tells you which Gaussian created data point x
GOAL: to learn parameters from training data
Expectation Maximization for MoG
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
E-Step
We’ll call this the responsibility of the kth Gaussian for the ith data point
Repeat this procedure for every datapoint!
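The responsibility referred to above is the posterior over the hidden variable; the equation itself is missing from the transcript, so here is a standard reconstruction:
\[
r_{ik} = Pr(h_i = k \mid x_i, \theta) = \frac{\lambda_k\, \mathrm{Norm}_{x_i}[\mu_k, \Sigma_k]}{\sum_{j=1}^{K} \lambda_j\, \mathrm{Norm}_{x_i}[\mu_j, \Sigma_j]}
\]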
M-Step
Take derivative, equate to zero and solve (Lagrange multipliers for λ)
M-Step
Update means, covariances and weights according to responsibilities of datapoints
Iterate E-steps and M-steps until no further improvement
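A minimal NumPy sketch of the E-step/M-step loop just described, for a full-covariance mixture of Gaussians; the variable names are illustrative and not from the book's code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_mog(X, K, n_iters=100, seed=0):
    """Fit a K-component mixture of Gaussians to the rows of X with EM."""
    rng = np.random.default_rng(seed)
    I, D = X.shape
    lam = np.full(K, 1.0 / K)                       # mixture weights
    mu = X[rng.choice(I, K, replace=False)]         # initialise means at random datapoints
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)

    for _ in range(n_iters):
        # E-step: responsibilities r[i, k] = Pr(h_i = k | x_i)
        r = np.stack([lam[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                      for k in range(K)], axis=1)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: update weights, means and covariances from the responsibilities
        Nk = r.sum(axis=0)
        lam = Nk / I
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            d = X - mu[k]
            Sigma[k] = (r[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)
    return lam, mu, Sigma
```

In practice the loop is stopped when the log likelihood no longer improves rather than after a fixed number of iterations.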
Different flavours...
Diagonal covariance
Same covariance
Full covariance
Local Minima
L = 98.76, L = 96.97, L = 94.35
Start from three random positions
Means of face/non-face model
Classification 84% (9% improvement!)
The plan...
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Student t-distributions
Student t-distributions motivation
Panels: normal distribution; normal distribution w/ one extra datapoint; t-distribution
The normal distribution is not very robust – a single outlier can completely throw it off because the tails fall off so fast...
Student t-distributions
Univariate Student t-distribution
Multivariate Student t-distribution
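For reference, the standard form of the multivariate Student t-density (the slide's equation is not in the transcript; D is the data dimension and ν the degrees of freedom):
\[
Pr(x) = \mathrm{Stud}_x[\mu, \Sigma, \nu] = \frac{\Gamma\!\left(\frac{\nu + D}{2}\right)}{(\nu\pi)^{D/2}\, |\Sigma|^{1/2}\, \Gamma\!\left(\frac{\nu}{2}\right)} \left[1 + \frac{(x-\mu)^{\mathsf T}\Sigma^{-1}(x-\mu)}{\nu}\right]^{-\frac{\nu + D}{2}}
\]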
t-distribution as a marginalization
Define hidden variable h
Can be expressed as a marginalization
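A reconstruction of the marginalization the slide describes (a standard result: h scales the Gaussian covariance and carries a Gamma prior):
\[
Pr(x) = \int \mathrm{Norm}_x\!\left[\mu, \tfrac{\Sigma}{h}\right] \mathrm{Gam}_h\!\left[\tfrac{\nu}{2}, \tfrac{\nu}{2}\right] dh
\]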
Gamma distribution
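The Gamma density used above, in the standard shape/rate parameterization (the slide's formula is missing from the transcript):
\[
\mathrm{Gam}_h[\alpha, \beta] = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, h^{\alpha - 1} \exp[-\beta h]
\]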
t-distribution as a marginalization
Define hidden variable h
Things to note:
• Again this provides a method to sample from the t-distribution
• Variable h has a clear interpretation:
– each datum is drawn from a Gaussian with mean μ
– the covariance depends inversely on h
• Can think of this as an infinite mixture (the sum becomes an integral) of Gaussians w/ the same mean, but different variances
t-distribution
GOAL: to learn parameters from training data
EM for t-distributions
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
E-Step
Extract expectations
M-Step
Where...
Updates
No closed form solution for ν. Must optimize the bound – since it is only one number we can optimize by just evaluating with a fine grid of values and picking the best
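A minimal sketch of that grid search, assuming the E-step has already produced the per-datapoint expectations E[h_i] and E[log h_i]; only the ν-dependent part of the bound needs to be evaluated. Function and argument names are illustrative.

```python
import numpy as np
from scipy.special import gammaln

def update_nu(E_h, E_log_h, grid=np.arange(0.1, 100.0, 0.1)):
    """Pick the degrees of freedom nu that maximizes the EM bound.

    E_h, E_log_h: arrays of expectations E[h_i], E[log h_i] from the E-step.
    Each hidden h_i has prior Gam(nu/2, nu/2), so only these terms of the
    bound depend on nu.
    """
    best_nu, best_val = None, -np.inf
    for nu in grid:
        a = nu / 2.0
        val = np.sum(a * np.log(a) - gammaln(a)
                     + (a - 1.0) * E_log_h - a * E_h)
        if val > best_val:
            best_nu, best_val = nu, val
    return best_nu
```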
EM algorithm for t-distributions
The plan...
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Factor Analysis
Compromise between:
– Full covariance matrix (desirable but many parameters)
– Diagonal covariance matrix (no modelling of covariance)
Models full covariance in a subspace Φ
Mops everything else up with diagonal component Σ
Data Density
Consider modelling the distribution of 100×100×3 RGB images
Gaussian w/ spherical covariance: 1 covariance matrix parameter
Gaussian w/ diagonal covariance: Dx covariance matrix parameters
Full Gaussian: ~Dx² covariance matrix parameters
PPCA: ~Dx·Dh covariance matrix parameters
Factor analysis: ~Dx·(Dh+1) covariance matrix parameters (full covariance in subspace + diagonal)
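To make these counts concrete (an illustrative calculation, not from the slide): for Dx = 100·100·3 = 30000 and, say, Dh = 100 factors,
\[
\text{full: } \approx \tfrac{1}{2}D_x^2 \approx 4.5\times 10^{8}, \qquad \text{diagonal: } D_x = 3\times 10^{4}, \qquad \text{factor analysis: } D_x(D_h + 1) \approx 3\times 10^{6}.
\]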
Subspaces
Factor Analysis
Compromise between:
– Full covariance matrix (desirable but many parameters)
– Diagonal covariance matrix (no modelling of covariance)
Models full covariance in a subspace Φ
Mops everything else up with diagonal component Σ
Factor Analysis as a Marginalization
Let's define
Then it can be shown (not obvious) that
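A reconstruction of the definitions and result this slide refers to (standard factor-analysis equations, in the book's notation):
\[
Pr(h) = \mathrm{Norm}_h[\mathbf{0}, \mathbf{I}], \qquad Pr(x \mid h) = \mathrm{Norm}_x[\mu + \Phi h, \Sigma],
\]
and marginalizing over h gives
\[
Pr(x) = \int Pr(x \mid h)\, Pr(h)\, dh = \mathrm{Norm}_x[\mu, \Phi\Phi^{\mathsf T} + \Sigma].
\]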
Sampling from factor analyzer
Factor analysis vs. MoG
E-Step
• Compute posterior over hidden variables
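The posterior over the hidden variables is again Gaussian; a hedged reconstruction of the standard expression (the slide's equation is not in the transcript):
\[
Pr(h_i \mid x_i) = \mathrm{Norm}_{h_i}\!\left[(\Phi^{\mathsf T}\Sigma^{-1}\Phi + \mathbf{I})^{-1}\Phi^{\mathsf T}\Sigma^{-1}(x_i - \mu),\; (\Phi^{\mathsf T}\Sigma^{-1}\Phi + \mathbf{I})^{-1}\right]
\]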
E-Step
• Compute expectations needed for M-Step
M-Step
• Optimize parameters w.r.t. EM bound
where....
M-Step
• Optimize parameters w.r.t. EM bound
where...
giving expressions:
Face model
Sampling from 10 parameter model
To generate:
• Choose factor loadings hi from a standard normal distribution
• Multiply by factors Φ
• Add mean μ
• (should add random noise component εi w/ diagonal covariance Σ)
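A minimal NumPy sketch of this generation procedure; Phi, mu and Sigma_diag stand for a fitted factor matrix, mean and diagonal noise variances (illustrative names, not from the book's code).

```python
import numpy as np

def sample_factor_analyzer(Phi, mu, Sigma_diag, n_samples=1, seed=0):
    """Draw samples x = mu + Phi @ h + eps, with h ~ N(0, I) and eps ~ N(0, diag(Sigma_diag))."""
    rng = np.random.default_rng(seed)
    D, K = Phi.shape
    h = rng.standard_normal((n_samples, K))                           # factor loadings from a standard normal
    eps = rng.standard_normal((n_samples, D)) * np.sqrt(Sigma_diag)   # diagonal noise component
    return h @ Phi.T + mu + eps
```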
Combining Models
Can combine three models in various combinations
• Mixture of t-distributions = mixture of Gaussians + t-distributions
• Robust subspace models = t-distributions + factor analysis
• Mixture of factor analyzers = mixture of Gaussians + factor analysis
Or combine all models to create a mixture of robust subspace models
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Expectation Maximization
Problem: Optimize cost functions of the form
Solution: Expectation Maximization (EM) algorithm (Dempster, Laird and Rubin 1977)
Key idea: Define lower bound on log-likelihood and increase at each iteration
Continuous case
Discrete case
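A reconstruction of the two missing cost functions (standard maximum-likelihood objectives with hidden variables). Continuous case:
\[
\hat{\theta} = \underset{\theta}{\mathrm{argmax}} \sum_{i=1}^{I} \log\!\left[\int Pr(x_i, h_i \mid \theta)\, dh_i\right]
\]
Discrete case:
\[
\hat{\theta} = \underset{\theta}{\mathrm{argmax}} \sum_{i=1}^{I} \log\!\left[\sum_{h_i} Pr(x_i, h_i \mid \theta)\right]
\]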
Expectation Maximization
Defines a lower bound on log likelihood and increases bound iteratively
Lower bound
Lower bound is a function of parameters θ and a set of probability distributions {qi(hi)}
E-Step & M-Step
E-Step – Maximize bound w.r.t. distributions {qi(hi)}
M-Step – Maximize bound w.r.t. parameters θ
E-Step: Update {qi(hi)} so that bound equals log likelihood for this θ
M-Step: Update θ to maximum
Missing Parts of Argument
1. Show that this is a lower bound
2. Show that the E-Step update is correct
3. Show that the M-Step update is correct
Jensen’s Inequality
or…
similarly
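A reconstruction of the inequality the slide states (Jensen's inequality for the concave log function):
\[
\log\!\big[\mathrm{E}[y]\big] \;\ge\; \mathrm{E}\big[\log[y]\big], \qquad \text{or} \qquad \log\!\left[\int Pr(y)\, y\, dy\right] \;\ge\; \int Pr(y)\, \log[y]\, dy.
\]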
Lower Bound for EM Algorithm
Where we have used Jensen’s inequality
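The bound itself, reconstructed in standard form (for each distribution qi(hi) over the hidden variable):
\[
\mathcal{B}\big[\{q_i(h_i)\}, \theta\big] = \sum_{i=1}^{I} \int q_i(h_i) \log\!\left[\frac{Pr(x_i, h_i \mid \theta)}{q_i(h_i)}\right] dh_i \;\le\; \sum_{i=1}^{I} \log\!\left[\int Pr(x_i, h_i \mid \theta)\, dh_i\right],
\]
where the inequality follows from Jensen's inequality applied to the log of an expectation under qi(hi).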
E-Step – Optimize bound w.r.t. {qi(hi)}
The first term is constant w.r.t. qi(hi), so only the second term matters
E-Step
Kullback-Leibler divergence – a distance between probability distributions. We are maximizing the negative distance (i.e. minimizing the distance)
Use this relation
log[y] ≤ y − 1
Kullback-Leibler Divergence
So the cost function must be positive
In other words, the best we can do is choose qi(hi) so that this is zero
E-Step
So the cost function must be positive
The best we can do is choose qi(hi) so that this is zero.
How can we do this? Easy – choose posterior Pr(h|x)
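Written out (a standard statement of the result), the E-step therefore sets
\[
\hat{q}_i(h_i) = Pr(h_i \mid x_i, \theta),
\]
which makes the KL divergence zero, so the bound touches the log likelihood at the current θ.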
M-Step – Optimize bound w.r.t. θ
In the M-Step we optimize the expected joint log likelihood with respect to parameters θ (expectation w.r.t. the distribution from the E-Step)
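Written out (standard form), the M-step solves
\[
\hat{\theta} = \underset{\theta}{\mathrm{argmax}} \sum_{i=1}^{I} \int \hat{q}_i(h_i)\, \log\!\big[Pr(x_i, h_i \mid \theta)\big]\, dh_i.
\]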
E-Step & M-Step
E-Step – Maximize bound w.r.t. distributions qi(hi)
M-Step – Maximize bound w.r.t. parameters θ
Structure
• Densities for classification
• Models with hidden variables
– Mixture of Gaussians
– t-distributions
– Factor analysis
• EM algorithm in detail
• Applications
Face Detection
(not a very realistic model – face detection really done using discriminative methods)
Object recognition
t-distributions vs. normal distributions
Model as J independent patches
Segmentation
Face Recognition
Pose Regression
Training data
Frontal faces x
Non-frontal faces w
Learning
Fit factor analysis model to joint distribution of x,w
Inference
Given new x, infer w (conditional distribution of a normal)
Conditional Distributions
If
then
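A hedged reconstruction of the pair of missing equations (the standard result for conditioning a joint Gaussian): the joint is
\[
Pr\!\left(\begin{bmatrix} x \\ w \end{bmatrix}\right) = \mathrm{Norm}\!\left[\begin{bmatrix} \mu_x \\ \mu_w \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xw} \\ \Sigma_{wx} & \Sigma_{ww} \end{bmatrix}\right],
\]
and the resulting conditional is
\[
Pr(w \mid x) = \mathrm{Norm}_w\!\left[\mu_w + \Sigma_{wx}\Sigma_{xx}^{-1}(x - \mu_x),\; \Sigma_{ww} - \Sigma_{wx}\Sigma_{xx}^{-1}\Sigma_{xw}\right].
\]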
Modeling transformations