The Multivariate Gaussian


Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Prof. Nicholas Zabaras
School of Engineering
University of Warwick
Coventry CV4 7AL, United Kingdom
Email: nzabaras@gmail.com
URL: http://www.zabaras.com/

August 7, 2014


Contents

Multivariate Gaussian Distribution
Mahalanobis Distance, Geometric Interpretation
Derivation of Mean and Moments
Restricted Forms of the Gaussian, 2D Examples and Generalizations

References:

Kevin Murphy, Machine Learning: A Probabilistic Perspective, Chapter 4
Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2

Multivariate Gaussian

A random vector $x \in \mathbb{R}^D$ is multivariate Gaussian if its probability density is

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} (\det \Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

where $\mu \in \mathbb{R}^D$ is the mean and $\Sigma \in \mathbb{R}^{D \times D}$ is a symmetric positive definite matrix (the covariance matrix).
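As a quick numerical sanity check (a minimal numpy/scipy sketch, not part of the original slides; all parameter values are illustrative), the density can be evaluated directly from this formula and compared against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters for D = 2
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])      # symmetric positive definite
x = np.array([0.5, 0.0])

D = mu.size
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
pdf = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))

# Should agree with scipy's implementation
assert np.isclose(pdf, multivariate_normal(mu, Sigma).pdf(x))
print(pdf)
```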

Multivariate Gaussian

The Gaussian distribution is invariant under linear transformations: for independent

$$X_1 \sim \mathcal{N}(\mu_1, \Sigma_1), \quad X_2 \sim \mathcal{N}(\mu_2, \Sigma_2)$$

and for matrices $A$, $B$ and a vector $c$ of compatible dimensions,

$$AX_1 + BX_2 + c \sim \mathcal{N}\left(A\mu_1 + B\mu_2 + c, \; A\Sigma_1 A^T + B\Sigma_2 B^T\right)$$
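This closure under affine maps is easy to check by simulation; the sketch below (illustrative dimensions and parameters, not from the slides) compares the empirical mean and covariance of $AX_1 + BX_2 + c$ with the stated formulas:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: X1, X2 in R^3, mapped into R^2
mu1, Sigma1 = np.array([1.0, 0.0, -1.0]), np.diag([1.0, 2.0, 0.5])
mu2, Sigma2 = np.array([0.0, 1.0, 1.0]), np.diag([0.3, 0.3, 0.3])
A = rng.normal(size=(2, 3))
B = rng.normal(size=(2, 3))
c = np.array([1.0, -2.0])

n = 200_000
X1 = rng.multivariate_normal(mu1, Sigma1, size=n)
X2 = rng.multivariate_normal(mu2, Sigma2, size=n)
Y = X1 @ A.T + X2 @ B.T + c   # samples of A X1 + B X2 + c

print(Y.mean(axis=0), A @ mu1 + B @ mu2 + c)              # empirical vs. exact mean
print(np.cov(Y.T), A @ Sigma1 @ A.T + B @ Sigma2 @ B.T)   # empirical vs. exact covariance
```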

Mahalanobis Distance

The functional dependence of the Gaussian on $x$ is through the quadratic form (the squared Mahalanobis distance)

$$\Delta^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$$

The Mahalanobis distance from $\mu$ to $x$ reduces to the Euclidean distance when $\Sigma$ is the identity matrix.

The Gaussian density is constant on surfaces in $x$-space on which this quadratic form is constant.

$\Sigma$ can be taken to be symmetric without loss of generality, because any antisymmetric component would disappear from the exponent.
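A small sketch (illustrative values; scipy's mahalanobis helper takes the inverse covariance) showing the reduction to Euclidean distance when $\Sigma = I$:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

mu = np.array([0.0, 0.0])
x = np.array([3.0, 4.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])

d_maha = mahalanobis(x, mu, np.linalg.inv(Sigma))   # sqrt((x - mu)^T Sigma^{-1} (x - mu))
d_ident = mahalanobis(x, mu, np.eye(2))             # Sigma = I

print(d_maha)                              # ~4.27: distance in the stretched metric
print(d_ident, np.linalg.norm(x - mu))     # both 5.0: Euclidean distance recovered
```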

Multivariate Gaussian

Now consider the eigenvector equation for the covariance matrix,

$$\Sigma u_i = \lambda_i u_i, \quad i = 1, \ldots, D$$

Because $\Sigma$ is real and symmetric, its eigenvalues are real and its eigenvectors can be chosen to form an orthonormal set:

$$u_i^T u_j = I_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

Multivariate Gaussian

The covariance matrix $\Sigma$ can be expressed as an expansion in terms of its eigenvectors,

$$\Sigma = \sum_{i=1}^{D} \lambda_i u_i u_i^T$$

and similarly the inverse covariance matrix $\Sigma^{-1}$ can be expressed as

$$\Sigma^{-1} = \sum_{i=1}^{D} \frac{1}{\lambda_i} u_i u_i^T$$
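These expansions can be verified numerically; a minimal sketch (illustrative covariance) using numpy.linalg.eigh for the symmetric eigenproblem:

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lam, V = np.linalg.eigh(Sigma)   # eigenvalues lam[i], eigenvectors in the columns of V

# Sigma = sum_i lam_i u_i u_i^T
Sigma_rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(len(lam)))
assert np.allclose(Sigma, Sigma_rebuilt)

# Sigma^{-1} = sum_i (1 / lam_i) u_i u_i^T
Sigma_inv = sum((1.0 / lam[i]) * np.outer(V[:, i], V[:, i]) for i in range(len(lam)))
assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))
```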

Multivariate Gaussian

The Mahalanobis distance now becomes

$$\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T (x - \mu)$$

We can interpret $\{y_i\}$ as a new coordinate system, defined by the orthonormal vectors $u_i$, that is shifted and rotated with respect to the original $x_i$ coordinates.

Forming the vector $y = (y_1, \ldots, y_D)^T$, we have

$$y = U(x - \mu)$$

where $U$ is the matrix whose rows are given by $u_i^T$. $U$ is an orthogonal matrix: $U U^T = U^T U = I$, with $I$ the identity matrix.
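A short check (illustrative sketch) that the quadratic form computed with $\Sigma^{-1}$ agrees with $\sum_i y_i^2 / \lambda_i$ in the rotated coordinates:

```python
import numpy as np

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 2.0])

lam, V = np.linalg.eigh(Sigma)
U = V.T                          # rows of U are the eigenvectors u_i^T

y = U @ (x - mu)                 # shifted and rotated coordinates
delta2_direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)
delta2_eigen = np.sum(y ** 2 / lam)

assert np.isclose(delta2_direct, delta2_eigen)
assert np.allclose(U @ U.T, np.eye(2))   # U is orthogonal
```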

Multivariate Gaussian: Geometric Interpretation

The quadratic form, and thus the Gaussian density, is constant on ellipsoids

$$\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T (x - \mu)$$

with centers at $\mu$, axes oriented along the $u_i$, and scaling factors in the directions of the axes given by $\lambda_i^{1/2}$.

The volume within the hyper-ellipsoid above is easily computed: substituting $z_i = y_i / \lambda_i^{1/2}$,

$$\int dy = \prod_{i=1}^{D} \lambda_i^{1/2} \int_{\sum_i z_i^2 \le \Delta^2} dz = |\Sigma|^{1/2} \, V_D \, \Delta^D$$

where $V_D$ is the volume of the unit sphere in $D$ dimensions.
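The volume formula can be sanity-checked by Monte Carlo in $D = 2$, where $V_2 = \pi$ (a sketch with illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Delta = 1.5   # ellipsoid x^T Sigma^{-1} x <= Delta^2, centered at the origin
D = 2

# Exact volume: V_D |Sigma|^{1/2} Delta^D, with V_2 = pi
V_exact = np.pi * np.sqrt(np.linalg.det(Sigma)) * Delta ** D

# Monte Carlo: fraction of a bounding box that falls inside the ellipsoid
L = 6.0
pts = rng.uniform(-L, L, size=(1_000_000, D))
quad = np.einsum('ij,ij->i', pts, np.linalg.solve(Sigma, pts.T).T)
V_mc = (quad <= Delta ** 2).mean() * (2 * L) ** D

print(V_exact, V_mc)   # should agree to about 1%
```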

Multivariate Gaussian

From $y_j = u_j^T (x - \mu)$, i.e. $y = U(x - \mu)$, and using the orthogonality of $U$, we can solve for $x$:

$$x = U^T y + \mu, \quad \text{i.e.} \quad x_k = \mu_k + \sum_{j=1}^{D} U_{jk} \, y_j$$

The Jacobian of the transformation from $x$ to $y$ has elements

$$J_{ij} = \frac{\partial x_i}{\partial y_j} = U_{ji}, \quad \text{i.e.} \quad J = U^T$$

The square of the determinant of the Jacobian is

$$|J|^2 = |U^T|^2 = |U^T| \, |U| = |U^T U| = |I| = 1, \quad \text{so} \quad |J| = 1$$

Multivariate Gaussian

The determinant $|\Sigma|^{1/2}$ can also be written as

$$|\Sigma|^{1/2} = \prod_{i=1}^{D} \lambda_i^{1/2}$$

The multivariate Gaussian distribution can now be written in the $y$-coordinate system as

$$p(y) = p(x) \, |J| = \prod_{j=1}^{D} \frac{1}{(2\pi \lambda_j)^{1/2}} \exp\left(-\frac{y_j^2}{2\lambda_j}\right)$$

In the $y$-coordinates, the multivariate Gaussian factorizes into a product of independent univariate Gaussians, which verifies that $p(y)$ is correctly normalized:

$$\int p(y) \, dy = \prod_{j=1}^{D} \int_{-\infty}^{\infty} \frac{1}{(2\pi \lambda_j)^{1/2}} \exp\left(-\frac{y_j^2}{2\lambda_j}\right) dy_j = 1$$
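This factorization also gives a simple sampling recipe (a sketch, illustrative parameters): draw each $y_j$ independently from $\mathcal{N}(0, \lambda_j)$ and map back through $x = U^T y + \mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lam, V = np.linalg.eigh(Sigma)   # columns of V are the eigenvectors, so U^T = V
n = 200_000

y = rng.normal(size=(n, 2)) * np.sqrt(lam)   # independent y_j ~ N(0, lam_j)
x = y @ V.T + mu                             # x = U^T y + mu, row by row

print(x.mean(axis=0))   # approximately mu
print(np.cov(x.T))      # approximately Sigma
```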

Mean of the Multivariate Gaussian

The mean of the multivariate Gaussian can be computed, with the change of variables $z = x - \mu$, as

$$\mathbb{E}[x] = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right) x \, dx = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) (z + \mu) \, dz$$

The exponent is an even function of the components of $z$ and, because the integrals over these components are taken over the range $(-\infty, \infty)$, the term in $z$ in the factor $(z + \mu)$ vanishes by symmetry. Thus

$$\mathbb{E}[x] = \mu$$

Second Moment of the Multivariate Gaussian

The second moment of the multivariate Gaussian can be computed, again with $z = x - \mu$, as

$$\mathbb{E}[xx^T] = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right) x x^T \, dx$$

$$= \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) (z + \mu)(z + \mu)^T \, dz$$

$$= \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) \left(zz^T + z\mu^T + \mu z^T + \mu\mu^T\right) dz$$

Second Moment of the Multivariate Gaussian

The terms with $z\mu^T$ and $\mu z^T$ are zero due to symmetry. The term with $\mu\mu^T$ is equal to $\mu\mu^T$ by the normalization of the multivariate Gaussian. It remains to compute the last term:

$$\mathbb{E}[xx^T] = \mu\mu^T + \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) zz^T \, dz$$

Second Moment of the Multivariate Gaussian

We can simplify using the eigenvector expansion

$$z = \sum_{j=1}^{D} y_j u_j, \quad y_j = u_j^T z$$

together with $|J| = 1$ and $|\Sigma|^{1/2} = \prod_k \lambda_k^{1/2}$:

$$\frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) zz^T \, dz = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \sum_{i=1}^{D} \sum_{j=1}^{D} u_i u_j^T \int \exp\left(-\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k}\right) y_i y_j \, dy$$

The terms with $i \neq j$ vanish due to symmetry, leaving

$$\sum_{i=1}^{D} u_i u_i^T \, \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k}\right) y_i^2 \, dy = \sum_{i=1}^{D} u_i u_i^T \lambda_i = \Sigma$$

Second Moment of the Multivariate Gaussian

In the last step we used the expression for the second moment of a univariate Gaussian together with

$$\mathbb{E}[y_i] = u_i^T (\mathbb{E}[x] - \mu) = 0 \quad \text{and} \quad \mathrm{var}(y_i) = \mathbb{E}[y_i^2] = u_i^T \, \mathrm{var}(x) \, u_i = u_i^T \Sigma u_i = \lambda_i$$

Second Moment of the Multivariate Gaussian

We finally conclude that

$$\mathbb{E}[xx^T] = \mu\mu^T + \Sigma$$

From this, we can derive the covariance as

$$\mathrm{cov}[x] = \mathbb{E}\left[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^T\right] = \Sigma$$

The number of parameters in the Gaussian distribution grows with dimensionality: a general symmetric covariance matrix $\Sigma$ has $D(D+1)/2$ independent parameters. Together with the $D$ independent parameters in $\mu$, this gives $D(D+3)/2$ parameters, which grows quadratically with $D$.
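Both results, and the parameter count, can be confirmed by simulation (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=500_000)

second_moment = x.T @ x / len(x)    # empirical E[x x^T]
print(second_moment)
print(np.outer(mu, mu) + Sigma)     # mu mu^T + Sigma

D = len(mu)
print(D * (D + 3) // 2)             # free parameters: D(D + 3)/2 = 5 for D = 2
```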

Restricted Forms of the Multivariate Gaussian

For a diagonal covariance matrix, $\Sigma = \mathrm{diag}(\sigma_i^2)$, we have only $2D$ parameters in total. The corresponding contours of constant density are axis-aligned ellipsoids.

We can further restrict the covariance matrix to the isotropic form $\Sigma = \sigma^2 I$, in which case we have a total of $D + 1$ parameters and the constant-density contours are circles (hyperspheres in higher dimensions).

2D Gaussian

[Figure: level sets of 2D Gaussians with full, diagonal, and spherical covariance matrices, shown as contour and surface plots; generated with gaussPlot2DDemo from PMTK.]
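A matplotlib sketch analogous to that demo (not the PMTK code itself; the covariance values are illustrative) that draws the three kinds of level sets:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Illustrative covariances for the three forms
covs = {
    'full':      np.array([[2.0, 1.8], [1.8, 2.0]]),
    'diagonal':  np.array([[2.0, 0.0], [0.0, 0.5]]),
    'spherical': np.array([[1.0, 0.0], [0.0, 1.0]]),
}

xs, ys = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-5, 5, 200))
grid = np.dstack([xs, ys])   # (200, 200, 2) grid of evaluation points

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (name, cov) in zip(axes, covs.items()):
    ax.contour(xs, ys, multivariate_normal([0.0, 0.0], cov).pdf(grid))
    ax.set_title(name)
    ax.set_aspect('equal')
plt.show()
```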

Restricted Forms of the Multivariate Gaussian

The Gaussian distribution is flexible, but a full covariance requires many parameters, and the distribution is in any case limited to unimodal densities.

Using latent variables (hidden variables, unobserved variables) allows both of these problems to be addressed.

Restricted Forms of the Multivariate Gaussian

Using latent variables (hidden variables, unobserved variables) allows both of these problems to be addressed:

Multimodal distributions can be obtained by introducing discrete latent variables (mixtures of Gaussians); see the sketch after this list.

Introducing continuous latent variables leads to models in which the number of free parameters can be controlled independently of the dimensionality $D$ of the data space, while still allowing the model to capture the dominant correlations in the data set.

These two approaches can be combined, leading to hierarchical models useful in many applications.
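As a minimal illustration of the first point (a sketch with made-up parameters), a two-component Gaussian mixture in 1D is already bimodal even though each component is not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-component mixture: weights, means, standard deviations (illustrative)
weights = np.array([0.4, 0.6])
means = np.array([-2.0, 3.0])
stds = np.array([0.7, 1.0])

n = 100_000
comp = rng.choice(2, size=n, p=weights)   # discrete latent variable per sample
x = rng.normal(means[comp], stds[comp])   # draw from the selected component

# The empirical density has two separated modes, near -2 and 3,
# which no single Gaussian can represent.
hist, edges = np.histogram(x, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
left = centers[np.argmax(np.where(centers < 0, hist, 0))]    # left bump's mode
right = centers[np.argmax(np.where(centers >= 0, hist, 0))]  # right bump's mode
print(left, right)   # roughly -2 and 3
```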

Restricted Forms of the Multivariate Gaussian

In probabilistic models of images, we often use the Gaussian version of the Markov random field. It is a Gaussian distribution over the joint space of pixel intensities, and it is tractable because of the structure imposed by the spatial organization of the pixels.

Restricted Forms of the Multivariate Gaussian

Similarly, the linear dynamical system used to model time series data for tracking is also a joint Gaussian distribution over a large number of observed and hidden variables, and it is tractable due to the structure imposed on the distribution.

Graphical models are often used to introduce the structure for such complex models.