
Page 1: 07 cv mil_modeling_complex_densities

Computer vision: models, learning and inference

Chapter 7

Modeling complex densities

Please send errata to [email protected]

Page 2: 07 cv mil_modeling_complex_densities

Structure


• Densities for classification

• Models with hidden variables

– Mixture of Gaussians

– t-distributions

– Factor analysis

• EM algorithm in detail

• Applications

Page 3: 07 cv mil_modeling_complex_densities

Models for machine vision


Page 4: 07 cv mil_modeling_complex_densities

Face Detection


Page 5: 07 cv mil_modeling_complex_densities

Type 3: Pr(x|w) - Generative

How to model Pr(x|w)?

– Choose an appropriate form for Pr(x)

– Make parameters a function of w

– Function takes parameters θ that define its shape

Learning algorithm: learn parameters θ from training data x, w

Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule


Page 6: 07 cv mil_modeling_complex_densities

Classification Model

Or, writing in terms of class-conditional density functions:

Parameters μ0, Σ0 learnt just from the data S0 where w = 0

Similarly, parameters μ1, Σ1 learnt just from the data S1 where w = 1
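As a minimal sketch (not the book's code), the learning and inference steps above might look like this in Python. The names X0 and X1 (arrays of training vectors for each class) are assumptions; in high dimensions the full covariances would be singular, which is why the experiment below uses diagonal covariances in practice.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Learning: fit one Gaussian per class (X0, X1 are assumed (N, D) arrays).
mu0, Sigma0 = X0.mean(axis=0), np.cov(X0, rowvar=False)  # class w = 0
mu1, Sigma1 = X1.mean(axis=0), np.cov(X1, rowvar=False)  # class w = 1

def posterior_w1(x, prior1=0.5):
    """Inference: Pr(w=1|x) via Bayes' rule with Gaussian class-conditionals."""
    p0 = multivariate_normal.pdf(x, mu0, Sigma0) * (1.0 - prior1)
    p1 = multivariate_normal.pdf(x, mu1, Sigma1) * prior1
    return p1 / (p0 + p1)
```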

Page 7: 07 cv mil_modeling_complex_densities


Inference algorithm: Define prior Pr(w) and then compute Pr(w|x) using Bayes’ rule

Page 8: 07 cv mil_modeling_complex_densities

Experiment

1000 faces, 1000 non-faces

60×60×3 images = 10800×1 vectors

Equal priors: Pr(w=1) = Pr(w=0) = 0.5

75% performance on test set. Not very good!


Page 9: 07 cv mil_modeling_complex_densities

Results (diagonal covariance)


Page 10: 07 cv mil_modeling_complex_densities

The plan...


Page 11: 07 cv mil_modeling_complex_densities

Structure


• Densities for classification

• Models with hidden variables

– Mixture of Gaussians

– t-distributions

– Factor analysis

• EM algorithm in detail

• Applications

Page 12: 07 cv mil_modeling_complex_densities

Hidden (or latent) Variables

Key idea: represent density Pr(x) as marginalization of joint density with another variable h that we do not see


Will also depend on some parameters:

Page 13: 07 cv mil_modeling_complex_densities

Hidden (or latent) Variables


Page 14: 07 cv mil_modeling_complex_densities

Expectation Maximization

An algorithm specialized to fitting pdfs which are the marginalization of a joint distribution

Defines a lower bound on log likelihood and increases bound iteratively


Page 15: 07 cv mil_modeling_complex_densities

Lower bound

Lower bound is a function of parameters θ and a set of probability distributions {q_i(h_i)}

Expectation Maximization (EM) algorithm alternates E-steps and M-steps:

E-Step – Maximize bound w.r.t. distributions q_i(h_i)

M-Step – Maximize bound w.r.t. parameters θ


Page 16: 07 cv mil_modeling_complex_densities


Lower bound


Page 17: 07 cv mil_modeling_complex_densities

E-Step & M-Step

E-Step – Maximize bound w.r.t. distributions q_i(h_i)

M-Step – Maximize bound w.r.t. parameters θ


Page 18: 07 cv mil_modeling_complex_densities


E-Step & M-Step


Page 19: 07 cv mil_modeling_complex_densities

Structure


• Densities for classification

• Models with hidden variables

– Mixture of Gaussians

– t-distributions

– Factor analysis

• EM algorithm in detail

• Applications

Page 20: 07 cv mil_modeling_complex_densities

Mixture of Gaussians (MoG)


Page 21: 07 cv mil_modeling_complex_densities

ML Learning

Q. Can we learn this model with maximum likelihood?

A. Yes, but a brute-force approach is tricky:

• If you take the derivative and set it to zero, you can't solve – the log of the sum causes problems

• Have to enforce constraints on the parameters

– covariances must be positive definite

– weights must sum to one


Page 22: 07 cv mil_modeling_complex_densities

MoG as a marginalization

Define a variable and then write

Then we can recover the density by marginalizing Pr(x,h)


Page 23: 07 cv mil_modeling_complex_densities

MoG as a marginalization


Page 24: 07 cv mil_modeling_complex_densities

MoG as a marginalization

Define a variable and then write

Note:

• This gives us a method to generate data from a MoG: first sample Pr(h), then sample Pr(x|h) (see the sketch below)

• The hidden variable h has a clear interpretation – it tells you which Gaussian created data point x
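A minimal sampling sketch following this recipe (the parameter names weights/means/covs are assumptions, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mog(weights, means, covs, n):
    # Ancestral sampling: draw h ~ Cat(weights), then x ~ Norm(means[h], covs[h]).
    hs = rng.choice(len(weights), size=n, p=weights)
    xs = np.array([rng.multivariate_normal(means[h], covs[h]) for h in hs])
    return xs, hs  # hs records which Gaussian created each point
```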


Page 25: 07 cv mil_modeling_complex_densities

GOAL: to learn parameters from training data

Expectation Maximization for MoG

E-Step – Maximize bound w.r.t. distributions q_i(h_i)

M-Step – Maximize bound w.r.t. parameters θ


Page 26: 07 cv mil_modeling_complex_densities

E-Step

We’ll call this the responsibility of the kth Gaussian for the ith data point

Repeat this procedure for every datapoint!
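A sketch of this E-step for all datapoints at once, assuming weights/means/covs hold the current MoG parameters (variable names are mine, not the book's):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    # r[i, k] = responsibility of Gaussian k for data point i,
    # i.e. the posterior Pr(h_i = k | x_i) under the current parameters.
    I, K = X.shape[0], len(weights)
    r = np.zeros((I, K))
    for k in range(K):
        r[:, k] = weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
    return r / r.sum(axis=1, keepdims=True)  # normalize over components k
```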


Page 27: 07 cv mil_modeling_complex_densities

M-Step

Take the derivative, equate it to zero, and solve (Lagrange multipliers for λ)


Page 28: 07 cv mil_modeling_complex_densities

M-Step

Update means, covariances, and weights according to the responsibilities of the datapoints
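The corresponding M-step sketch, using the standard closed-form MoG updates (r is the responsibility matrix from the E-step above; names are assumptions):

```python
def m_step(X, r):
    # r is the (I, K) responsibility matrix; X is the (I, D) data matrix.
    I, K = r.shape
    Nk = r.sum(axis=0)                # effective number of points per Gaussian
    weights = Nk / I                  # mixture weights lambda_k
    means = (r.T @ X) / Nk[:, None]   # responsibility-weighted means
    covs = []
    for k in range(K):
        d = X - means[k]
        covs.append((r[:, k, None] * d).T @ d / Nk[k])  # weighted covariances
    return weights, means, covs
```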


Page 29: 07 cv mil_modeling_complex_densities

Iterate until no further improvement

E-Step M-Step


Page 30: 07 cv mil_modeling_complex_densities


Page 31: 07 cv mil_modeling_complex_densities

Different flavours...

Diagonal covariance

Same covariance

Full covariance


Page 32: 07 cv mil_modeling_complex_densities

Local Minima

L = 98.76 L = 96.97 L = 94.35

Start from three random positions


Page 33: 07 cv mil_modeling_complex_densities

Means of face/non-face model

Classification 84% (9% improvement!)


Page 34: 07 cv mil_modeling_complex_densities

The plan...


Page 35: 07 cv mil_modeling_complex_densities

Structure


• Densities for classification

• Models with hidden variables

– Mixture of Gaussians

– t-distributions

– Factor analysis

• EM algorithm in detail

• Applications

Page 36: 07 cv mil_modeling_complex_densities

Student t-distributions


Page 37: 07 cv mil_modeling_complex_densities

Student t-distributions motivation

Figure panels: normal distribution / normal distribution w/ one extra datapoint! / t-distribution

The normal distribution is not very robust – a single outlier can completely throw it off because the tails fall off so fast...
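A small illustration of this point (a sketch, not from the book): fit both densities to data containing one gross outlier and compare the estimated centres.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 1.0, 50), [15.0]])  # one gross outlier

loc_n, scale_n = stats.norm.fit(data)      # ML Gaussian fit: mean dragged by the outlier
df_t, loc_t, scale_t = stats.t.fit(data)   # ML t fit: heavy tails absorb the outlier
print(loc_n, loc_t)  # the t location stays much closer to 0
```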


Page 38: 07 cv mil_modeling_complex_densities

Student t-distributions

Univariate Student t-distribution

Multivariate Student t-distribution


Page 39: 07 cv mil_modeling_complex_densities

t-distribution as a marginalization

Define hidden variable h

Can be expressed as a marginalization


Page 40: 07 cv mil_modeling_complex_densities

Gamma distribution


Page 41: 07 cv mil_modeling_complex_densities

t-distribution as a marginalization

Define hidden variable h

Things to note:

• Again this provides a method to sample from the t-distribution (see the sketch below)

• Variable h has a clear interpretation:

– each datum is drawn from a Gaussian with mean μ

– the covariance depends inversely on h

• Can think of this as an infinite mixture (the sum becomes an integral) of Gaussians w/ the same mean but different variances
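A sampling sketch following this construction (parameter names are assumptions; mu is a length-D array, Sigma a D×D matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_t(mu, Sigma, nu, n):
    # h ~ Gamma(nu/2, rate nu/2); NumPy parameterizes by scale = 1/rate.
    h = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    # x | h ~ Norm(mu, Sigma / h): small h inflates the covariance (heavy tails).
    return np.array([rng.multivariate_normal(mu, Sigma / hi) for hi in h])
```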


Page 42: 07 cv mil_modeling_complex_densities

t-distribution


Page 43: 07 cv mil_modeling_complex_densities

GOAL: to learn parameters from training data

EM for t-distributions

E-Step – Maximize bound w.r.t. distributions q_i(h_i)

M-Step – Maximize bound w.r.t. parameters θ


Page 44: 07 cv mil_modeling_complex_densities

E-Step

Extract expectations


Page 45: 07 cv mil_modeling_complex_densities

M-Step

Where...


Page 46: 07 cv mil_modeling_complex_densities

Updates

No closed-form solution for ν. Must optimize the bound – since ν is only one number, we can optimize by just evaluating the bound on a fine grid of values and picking the best (see the sketch below)
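A sketch of that grid search, keeping only the ν-dependent terms of the bound. Here E_h and E_log_h stand for the E-step expectations E[h_i] and E[log h_i]; these names and the exact grouping of constant terms are my assumptions.

```python
import numpy as np
from scipy.special import gammaln

def nu_terms(nu, E_h, E_log_h):
    # nu-dependent part of the expected log likelihood, since h_i ~ Gam(nu/2, nu/2):
    # I * [ (nu/2) log(nu/2) - log Gamma(nu/2) ]
    #   + (nu/2 - 1) * sum_i E[log h_i] - (nu/2) * sum_i E[h_i]
    I = len(E_h)
    return (I * (0.5 * nu * np.log(nu / 2.0) - gammaln(nu / 2.0))
            + (nu / 2.0 - 1.0) * np.sum(E_log_h)
            - (nu / 2.0) * np.sum(E_h))

def update_nu(E_h, E_log_h, grid=np.arange(0.1, 50.0, 0.01)):
    # Evaluate the bound on a fine grid and keep the best value of nu.
    return grid[np.argmax([nu_terms(nu, E_h, E_log_h) for nu in grid])]
```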


Page 47: 07 cv mil_modeling_complex_densities

EM algorithm for t-distributions


Page 48: 07 cv mil_modeling_complex_densities

The plan...


Page 49: 07 cv mil_modeling_complex_densities

Structure


• Densities for classification

• Models with hidden variables

– Mixture of Gaussians

– t-distributions

– Factor analysis

• EM algorithm in detail

• Applications

Page 50: 07 cv mil_modeling_complex_densities

Factor Analysis

Compromise between:

– Full covariance matrix (desirable but many parameters)

– Diagonal covariance matrix (no modelling of covariance)

Models full covariance in a subspace Φ

Mops everything else up with a diagonal component Σ


Page 51: 07 cv mil_modeling_complex_densities

Data Density

Consider modelling the distribution of 100×100×3 RGB images, i.e. Dx = 30,000-dimensional vectors

Gaussian w/ spherical covariance: 1 covariance parameter

Gaussian w/ diagonal covariance: Dx covariance parameters

Full Gaussian: ~Dx² covariance parameters

PPCA: ~Dx Dh covariance parameters

Factor analysis: ~Dx (Dh + 1) covariance parameters

(full covariance in a subspace + diagonal)

Page 52: 07 cv mil_modeling_complex_densities

Subspaces


Page 53: 07 cv mil_modeling_complex_densities

Factor Analysis

Compromise between:

– Full covariance matrix (desirable but many parameters)

– Diagonal covariance matrix (no modelling of covariance)

Models full covariance in a subspace Φ

Mops everything else up with a diagonal component Σ


Page 54: 07 cv mil_modeling_complex_densities

Factor Analysis as a Marginalization

Let's define Pr(h) and Pr(x|h) as below; then it can be shown (not obvious) that the marginal Pr(x) is also normal.
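The slide's equations did not survive extraction; the standard factor-analysis construction they refer to, consistent with the generation steps later in the deck, is:

$$\Pr(\mathbf{h}) = \text{Norm}_{\mathbf{h}}[\mathbf{0}, \mathbf{I}], \qquad \Pr(\mathbf{x}|\mathbf{h}) = \text{Norm}_{\mathbf{x}}[\boldsymbol\mu + \boldsymbol\Phi\mathbf{h},\, \boldsymbol\Sigma]$$

$$\Pr(\mathbf{x}) = \int \Pr(\mathbf{x}|\mathbf{h})\,\Pr(\mathbf{h})\,d\mathbf{h} = \text{Norm}_{\mathbf{x}}[\boldsymbol\mu,\; \boldsymbol\Phi\boldsymbol\Phi^{\mathsf T} + \boldsymbol\Sigma]$$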


Page 55: 07 cv mil_modeling_complex_densities


Sampling from factor analyzer

Page 56: 07 cv mil_modeling_complex_densities

Factor analysis vs. MoG


Page 57: 07 cv mil_modeling_complex_densities

E-Step

• Compute posterior over hidden variables


• Compute expectations needed for M-Step

Page 58: 07 cv mil_modeling_complex_densities

E-Step


Page 59: 07 cv mil_modeling_complex_densities

M-Step

• Optimize parameters w.r.t. EM bound


where....

Page 60: 07 cv mil_modeling_complex_densities

M-Step

• Optimize parameters w.r.t. EM bound


where...

giving expressions:

Page 61: 07 cv mil_modeling_complex_densities

Face model


Page 62: 07 cv mil_modeling_complex_densities

Sampling from 10 parameter model

To generate:

• Choose factor loadings h_i from a standard normal distribution

• Multiply by the factors Φ

• Add the mean μ

• (should also add a random noise component ε_i w/ diagonal covariance Σ – see the sketch below)
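A generation sketch following these steps (names are assumptions; Sigma_diag is the vector of diagonal noise variances):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_fa(mu, Phi, Sigma_diag, n):
    Dh = Phi.shape[1]
    h = rng.standard_normal((n, Dh))      # factor loadings h_i ~ Norm(0, I)
    x = mu + h @ Phi.T                    # multiply by factors, add mean
    eps = rng.standard_normal((n, len(mu))) * np.sqrt(Sigma_diag)
    return x + eps                        # noise with diagonal covariance
```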


Page 63: 07 cv mil_modeling_complex_densities

Combining Models

Can combine three models in various combinations


• Mixture of t-distributions = mixture of Gaussians + t-distributions

• Robust subspace models = t-distributions + factor analysis

• Mixture of factor analyzers = mixture of Gaussians + factor analysis

Or combine all models to create mixture of robust subspace models

Page 64: 07 cv mil_modeling_complex_densities

Structure


• Densities for classification

• Models with hidden variables

– Mixture of Gaussians

– t-distributions

– Factor analysis

• EM algorithm in detail

• Applications

Page 65: 07 cv mil_modeling_complex_densities

Expectation Maximization

Problem: Optimize cost functions of the form

Solution: Expectation Maximization (EM) algorithm (Dempster, Laird and Rubin 1977)

Key idea: Define lower bound on log-likelihood and increase at each iteration

Continuous case

Discrete case


Page 66: 07 cv mil_modeling_complex_densities

Expectation Maximization

Defines a lower bound on log likelihood and increases bound iteratively


Page 67: 07 cv mil_modeling_complex_densities

Lower bound

Lower bound is a function of parameters θ and a set of probability distributions {q_i(h_i)}


Page 68: 07 cv mil_modeling_complex_densities

E-Step & M-Step

E-Step – Maximize bound w.r.t. distributions {q_i(h_i)}

M-Step – Maximize bound w.r.t. parameters θ


Page 69: 07 cv mil_modeling_complex_densities

E-Step: Update {q_i(h_i)} so that the bound equals the log likelihood for this θ


Page 70: 07 cv mil_modeling_complex_densities

M-Step: Update θ to the maximum of the bound

Page 71: 07 cv mil_modeling_complex_densities


Missing Parts of Argument

Page 72: 07 cv mil_modeling_complex_densities

Missing Parts of Argument

1. Show that this is a lower bound

2. Show that the E-Step update is correct

3. Show that the M-Step update is correct


Page 73: 07 cv mil_modeling_complex_densities

Jensen’s Inequality

or…

similarly
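For the concave logarithm, the inequality used here reads:

$$\log\big[\mathrm{E}[y]\big] \ge \mathrm{E}\big[\log[y]\big]$$

or, written with an explicit distribution,

$$\log\left[\int \Pr(y)\,y\,dy\right] \ge \int \Pr(y)\log[y]\,dy$$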


Page 74: 07 cv mil_modeling_complex_densities

Lower Bound for EM Algorithm
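A reconstruction of the bound on this slide (the standard EM derivation, consistent with the definitions above):

$$\sum_{i=1}^{I}\log\left[\int \Pr(\mathbf{x}_i,\mathbf{h}_i|\boldsymbol\theta)\,d\mathbf{h}_i\right] = \sum_{i=1}^{I}\log\left[\int q_i(\mathbf{h}_i)\,\frac{\Pr(\mathbf{x}_i,\mathbf{h}_i|\boldsymbol\theta)}{q_i(\mathbf{h}_i)}\,d\mathbf{h}_i\right] \ge \sum_{i=1}^{I}\int q_i(\mathbf{h}_i)\log\left[\frac{\Pr(\mathbf{x}_i,\mathbf{h}_i|\boldsymbol\theta)}{q_i(\mathbf{h}_i)}\right]d\mathbf{h}_i = \mathcal{B}\big[\{q_i(\mathbf{h}_i)\},\boldsymbol\theta\big]$$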

Where we have used Jensen’s inequality


Page 75: 07 cv mil_modeling_complex_densities

(One term of the bound is constant w.r.t. q(h); only the other term matters.)

E-Step – Optimize bound w.r.t. {q_i(h_i)}


Page 76: 07 cv mil_modeling_complex_densities

E-Step

Kullback–Leibler divergence – a measure of distance between probability distributions. We are maximizing the negative distance (i.e., minimizing the distance)
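For reference, the Kullback–Leibler divergence between distributions q(h) and p(h) is

$$D_{KL}\big[q(h)\,\big\|\,p(h)\big] = \int q(h)\log\left[\frac{q(h)}{p(h)}\right]dh \;\ge\; 0,$$

with equality exactly when q(h) = p(h).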


Page 77: 07 cv mil_modeling_complex_densities

Use this relation

log[y] ≤ y - 1 (with equality only at y = 1)


Page 78: 07 cv mil_modeling_complex_densities

Kullback–Leibler Divergence

So the cost function must be non-negative

In other words, the best we can do is choose q_i(h_i) so that this is zero


Page 79: 07 cv mil_modeling_complex_densities

E-Step

So the cost function must be non-negative. The best we can do is choose q_i(h_i) so that it is zero.

How can we do this? Easy – choose the posterior Pr(h|x)


Page 80: 07 cv mil_modeling_complex_densities

M-Step – Optimize bound w.r.t. θ

In the M-Step we optimize the expected joint log likelihood with respect to the parameters θ (expectation taken w.r.t. the distribution from the E-Step)


Page 81: 07 cv mil_modeling_complex_densities

E-Step & M-Step

E-Step – Maximize bound w.r.t. distributions q_i(h_i)

M-Step – Maximize bound w.r.t. parameters θ


Page 82: 07 cv mil_modeling_complex_densities

Structure


• Densities for classification

• Models with hidden variables

– Mixture of Gaussians

– t-distributions

– Factor analysis

• EM algorithm in detail

• Applications

Page 83: 07 cv mil_modeling_complex_densities

Face Detection


(not a very realistic model – face detection is really done using discriminative methods)

Page 84: 07 cv mil_modeling_complex_densities

Object recognition


Figure labels: t-distributions vs. normal distributions

Model as J independent patches

Page 85: 07 cv mil_modeling_complex_densities

Segmentation


Page 86: 07 cv mil_modeling_complex_densities

Face Recognition


Page 87: 07 cv mil_modeling_complex_densities

Pose Regression

Training data

Frontal faces x; non-frontal faces w

Learning

Fit factor analysis model to joint distribution of x,w

Inference

Given new x, infer w (conditional distribution of a normal)


Page 88: 07 cv mil_modeling_complex_densities

Conditional Distributions

If the variables x and w are jointly normal, then the conditional distribution Pr(w|x) is also normal.
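Concretely, this is the standard conditioning result for jointly Gaussian variables (reconstructed; the partitioned notation is mine):

If

$$\Pr\left(\begin{bmatrix}\mathbf{x}\\ \mathbf{w}\end{bmatrix}\right) = \text{Norm}\left[\begin{bmatrix}\boldsymbol\mu_x\\ \boldsymbol\mu_w\end{bmatrix},\, \begin{bmatrix}\boldsymbol\Sigma_{xx} & \boldsymbol\Sigma_{xw}\\ \boldsymbol\Sigma_{wx} & \boldsymbol\Sigma_{ww}\end{bmatrix}\right]$$

then

$$\Pr(\mathbf{w}|\mathbf{x}) = \text{Norm}_{\mathbf{w}}\left[\boldsymbol\mu_w + \boldsymbol\Sigma_{wx}\boldsymbol\Sigma_{xx}^{-1}(\mathbf{x}-\boldsymbol\mu_x),\; \boldsymbol\Sigma_{ww} - \boldsymbol\Sigma_{wx}\boldsymbol\Sigma_{xx}^{-1}\boldsymbol\Sigma_{xw}\right]$$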


Page 89: 07 cv mil_modeling_complex_densities

Modeling transformations
