The Multivariate Gaussian


Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Prof. Nicholas Zabaras
School of Engineering
University of Warwick
Coventry CV4 7AL, United Kingdom
Email: nzabaras@gmail.com
URL: http://www.zabaras.com/

August 7, 2014


Contents

Multivariate Gaussian Distribution
Mahalanobis Distance, Geometric Interpretation
Derivation of Mean and Moments
Restricted Forms of the Gaussian, 2D Examples and Generalizations

References:

Kevin Murphy, Machine Learning: A Probabilistic Perspective, Chapter 4
Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2

Multivariate Gaussian

A random vector $x \in \mathbb{R}^D$ is multivariate Gaussian if its probability density is

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} (\det \Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

where $\mu \in \mathbb{R}^D$ is the mean and $\Sigma \in \mathbb{R}^{D \times D}$ is a symmetric positive definite matrix (the covariance matrix).
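As a quick numerical sanity check (a minimal numpy/scipy sketch, not part of the original slides; all parameter values are illustrative), the density can be evaluated directly from this formula and compared against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters for D = 2
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])      # symmetric positive definite
x = np.array([0.5, 0.0])

D = mu.size
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
pdf = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))

# Should agree with scipy's implementation
assert np.isclose(pdf, multivariate_normal(mu, Sigma).pdf(x))
print(pdf)
```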

Multivariate Gaussian

The Gaussian distribution is invariant under linear transformations: for independent

$$X_1 \sim \mathcal{N}(\mu_1, \Sigma_1), \quad X_2 \sim \mathcal{N}(\mu_2, \Sigma_2)$$

and for matrices $A$, $B$ and a vector $c$ of compatible dimensions,

$$AX_1 + BX_2 + c \sim \mathcal{N}\left(A\mu_1 + B\mu_2 + c, \; A\Sigma_1 A^T + B\Sigma_2 B^T\right)$$
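This closure under affine maps is easy to check by simulation; the sketch below (illustrative dimensions and parameters, not from the slides) compares the empirical mean and covariance of $AX_1 + BX_2 + c$ with the stated formulas:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: X1, X2 in R^3, mapped into R^2
mu1, Sigma1 = np.array([1.0, 0.0, -1.0]), np.diag([1.0, 2.0, 0.5])
mu2, Sigma2 = np.array([0.0, 1.0, 1.0]), np.diag([0.3, 0.3, 0.3])
A = rng.normal(size=(2, 3))
B = rng.normal(size=(2, 3))
c = np.array([1.0, -2.0])

n = 200_000
X1 = rng.multivariate_normal(mu1, Sigma1, size=n)
X2 = rng.multivariate_normal(mu2, Sigma2, size=n)
Y = X1 @ A.T + X2 @ B.T + c   # samples of A X1 + B X2 + c

print(Y.mean(axis=0), A @ mu1 + B @ mu2 + c)              # empirical vs. exact mean
print(np.cov(Y.T), A @ Sigma1 @ A.T + B @ Sigma2 @ B.T)   # empirical vs. exact covariance
```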

Mahalanobis Distance

The functional dependence of the Gaussian on $x$ is through the quadratic form (the squared Mahalanobis distance)

$$\Delta^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$$

The Mahalanobis distance from $\mu$ to $x$ reduces to the Euclidean distance when $\Sigma$ is the identity matrix.

The Gaussian density is constant on surfaces in $x$-space on which this quadratic form is constant.

$\Sigma$ can be taken to be symmetric without loss of generality, because any antisymmetric component would disappear from the exponent.
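A small sketch (illustrative values; scipy's mahalanobis helper takes the inverse covariance) showing the reduction to Euclidean distance when $\Sigma = I$:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

mu = np.array([0.0, 0.0])
x = np.array([3.0, 4.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])

d_maha = mahalanobis(x, mu, np.linalg.inv(Sigma))   # sqrt((x - mu)^T Sigma^{-1} (x - mu))
d_ident = mahalanobis(x, mu, np.eye(2))             # Sigma = I

print(d_maha)                              # ~4.27: distance in the stretched metric
print(d_ident, np.linalg.norm(x - mu))     # both 5.0: Euclidean distance recovered
```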

Multivariate Gaussian

Now consider the eigenvector equation for the covariance matrix,

$$\Sigma u_i = \lambda_i u_i, \quad i = 1, \ldots, D$$

Because $\Sigma$ is real and symmetric, its eigenvalues are real and its eigenvectors can be chosen to form an orthonormal set:

$$u_i^T u_j = I_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

Multivariate Gaussian

The covariance matrix $\Sigma$ can be expressed as an expansion in terms of its eigenvectors,

$$\Sigma = \sum_{i=1}^{D} \lambda_i u_i u_i^T$$

and similarly the inverse covariance matrix $\Sigma^{-1}$ can be expressed as

$$\Sigma^{-1} = \sum_{i=1}^{D} \frac{1}{\lambda_i} u_i u_i^T$$
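These expansions can be verified numerically; a minimal sketch (illustrative covariance) using numpy.linalg.eigh for the symmetric eigenproblem:

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lam, V = np.linalg.eigh(Sigma)   # eigenvalues lam[i], eigenvectors in the columns of V

# Sigma = sum_i lam_i u_i u_i^T
Sigma_rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(len(lam)))
assert np.allclose(Sigma, Sigma_rebuilt)

# Sigma^{-1} = sum_i (1 / lam_i) u_i u_i^T
Sigma_inv = sum((1.0 / lam[i]) * np.outer(V[:, i], V[:, i]) for i in range(len(lam)))
assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))
```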

Multivariate Gaussian

The Mahalanobis distance now becomes

$$\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T (x - \mu)$$

We can interpret $\{y_i\}$ as a new coordinate system, defined by the orthonormal vectors $u_i$, that is shifted and rotated with respect to the original $x_i$ coordinates.

Forming the vector $y = (y_1, \ldots, y_D)^T$, we have

$$y = U(x - \mu)$$

where $U$ is the matrix whose rows are given by $u_i^T$. $U$ is an orthogonal matrix: $U U^T = U^T U = I$, with $I$ the identity matrix.
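A short check (illustrative sketch) that the quadratic form computed with $\Sigma^{-1}$ agrees with $\sum_i y_i^2 / \lambda_i$ in the rotated coordinates:

```python
import numpy as np

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 2.0])

lam, V = np.linalg.eigh(Sigma)
U = V.T                          # rows of U are the eigenvectors u_i^T

y = U @ (x - mu)                 # shifted and rotated coordinates
delta2_direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)
delta2_eigen = np.sum(y ** 2 / lam)

assert np.isclose(delta2_direct, delta2_eigen)
assert np.allclose(U @ U.T, np.eye(2))   # U is orthogonal
```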

Multivariate Gaussian: Geometric Interpretation

The quadratic form, and thus the Gaussian density, is constant on ellipsoids

$$\Delta^2 = \sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}, \quad y_i = u_i^T (x - \mu)$$

with centers at $\mu$, axes oriented along the $u_i$, and scaling factors in the directions of the axes given by $\lambda_i^{1/2}$.

The volume within the hyper-ellipsoid above is easily computed: substituting $z_i = y_i / \lambda_i^{1/2}$,

$$\int dy = \prod_{i=1}^{D} \lambda_i^{1/2} \int_{\sum_i z_i^2 \le \Delta^2} dz = |\Sigma|^{1/2} \, V_D \, \Delta^D$$

where $V_D$ is the volume of the unit sphere in $D$ dimensions.
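The volume formula can be sanity-checked by Monte Carlo in $D = 2$, where $V_2 = \pi$ (a sketch with illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Delta = 1.5   # ellipsoid x^T Sigma^{-1} x <= Delta^2, centered at the origin
D = 2

# Exact volume: V_D |Sigma|^{1/2} Delta^D, with V_2 = pi
V_exact = np.pi * np.sqrt(np.linalg.det(Sigma)) * Delta ** D

# Monte Carlo: fraction of a bounding box that falls inside the ellipsoid
L = 6.0
pts = rng.uniform(-L, L, size=(1_000_000, D))
quad = np.einsum('ij,ij->i', pts, np.linalg.solve(Sigma, pts.T).T)
V_mc = (quad <= Delta ** 2).mean() * (2 * L) ** D

print(V_exact, V_mc)   # should agree to about 1%
```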

Multivariate Gaussian

From $y_j = u_j^T (x - \mu)$, i.e. $y = U(x - \mu)$, and using the orthogonality of $U$, we can solve for $x$:

$$x = U^T y + \mu, \quad \text{i.e.} \quad x_k = \mu_k + \sum_{j=1}^{D} U_{jk} \, y_j$$

The Jacobian of the transformation from $x$ to $y$ has elements

$$J_{ij} = \frac{\partial x_i}{\partial y_j} = U_{ji}, \quad \text{i.e.} \quad J = U^T$$

The square of the determinant of the Jacobian is

$$|J|^2 = |U^T|^2 = |U^T| \, |U| = |U^T U| = |I| = 1, \quad \text{so} \quad |J| = 1$$

Multivariate Gaussian

The determinant $|\Sigma|^{1/2}$ can also be written as

$$|\Sigma|^{1/2} = \prod_{i=1}^{D} \lambda_i^{1/2}$$

The multivariate Gaussian distribution can now be written in the $y$-coordinate system as

$$p(y) = p(x) \, |J| = \prod_{j=1}^{D} \frac{1}{(2\pi \lambda_j)^{1/2}} \exp\left(-\frac{y_j^2}{2\lambda_j}\right)$$

In the $y$-coordinates, the multivariate Gaussian factorizes into a product of independent univariate Gaussians, which verifies that $p(y)$ is correctly normalized:

$$\int p(y) \, dy = \prod_{j=1}^{D} \int_{-\infty}^{\infty} \frac{1}{(2\pi \lambda_j)^{1/2}} \exp\left(-\frac{y_j^2}{2\lambda_j}\right) dy_j = 1$$
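This factorization also gives a simple sampling recipe (a sketch, illustrative parameters): draw each $y_j$ independently from $\mathcal{N}(0, \lambda_j)$ and map back through $x = U^T y + \mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lam, V = np.linalg.eigh(Sigma)   # columns of V are the eigenvectors, so U^T = V
n = 200_000

y = rng.normal(size=(n, 2)) * np.sqrt(lam)   # independent y_j ~ N(0, lam_j)
x = y @ V.T + mu                             # x = U^T y + mu, row by row

print(x.mean(axis=0))   # approximately mu
print(np.cov(x.T))      # approximately Sigma
```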

Mean of the Multivariate Gaussian

The mean of the multivariate Gaussian can be computed, with the change of variables $z = x - \mu$, as

$$\mathbb{E}[x] = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right) x \, dx = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) (z + \mu) \, dz$$

The exponent is an even function of the components of $z$ and, because the integrals over these components are taken over the range $(-\infty, \infty)$, the term in $z$ in the factor $(z + \mu)$ vanishes by symmetry. Thus

$$\mathbb{E}[x] = \mu$$

Second Moment of the Multivariate Gaussian

The second moment of the multivariate Gaussian can be computed, again with $z = x - \mu$, as

$$\mathbb{E}[xx^T] = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right) x x^T \, dx$$

$$= \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) (z + \mu)(z + \mu)^T \, dz$$

$$= \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) \left(zz^T + z\mu^T + \mu z^T + \mu\mu^T\right) dz$$

Second Moment of the Multivariate Gaussian

The terms with $z\mu^T$ and $\mu z^T$ are zero due to symmetry. The term with $\mu\mu^T$ is equal to $\mu\mu^T$ by the normalization of the multivariate Gaussian. It remains to compute the last term:

$$\mathbb{E}[xx^T] = \mu\mu^T + \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) zz^T \, dz$$

Second Moment of the Multivariate Gaussian

We can simplify using the eigenvector expansion

$$z = \sum_{j=1}^{D} y_j u_j, \quad y_j = u_j^T z$$

together with $|J| = 1$ and $|\Sigma|^{1/2} = \prod_k \lambda_k^{1/2}$:

$$\frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\frac{1}{2} z^T \Sigma^{-1} z\right) zz^T \, dz = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \sum_{i=1}^{D} \sum_{j=1}^{D} u_i u_j^T \int \exp\left(-\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k}\right) y_i y_j \, dy$$

The terms with $i \neq j$ vanish due to symmetry, leaving

$$\sum_{i=1}^{D} u_i u_i^T \, \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \int \exp\left(-\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k}\right) y_i^2 \, dy = \sum_{i=1}^{D} u_i u_i^T \lambda_i = \Sigma$$

Second Moment of the Multivariate Gaussian

In the last step we used the expression for the second moment of a univariate Gaussian together with

$$\mathbb{E}[y_i] = u_i^T (\mathbb{E}[x] - \mu) = 0 \quad \text{and} \quad \mathrm{var}(y_i) = \mathbb{E}[y_i^2] = u_i^T \, \mathrm{var}(x) \, u_i = u_i^T \Sigma u_i = \lambda_i$$

Second Moment of the Multivariate Gaussian

We finally conclude that

$$\mathbb{E}[xx^T] = \mu\mu^T + \Sigma$$

From this, we can derive the covariance as

$$\mathrm{cov}[x] = \mathbb{E}\left[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^T\right] = \Sigma$$

The number of parameters in the Gaussian distribution grows with dimensionality: a general symmetric covariance matrix $\Sigma$ has $D(D+1)/2$ independent parameters. Together with the $D$ independent parameters in $\mu$, this gives $D(D+3)/2$ parameters, which grows quadratically with $D$.
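Both results, and the parameter count, can be confirmed by simulation (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=500_000)

second_moment = x.T @ x / len(x)    # empirical E[x x^T]
print(second_moment)
print(np.outer(mu, mu) + Sigma)     # mu mu^T + Sigma

D = len(mu)
print(D * (D + 3) // 2)             # free parameters: D(D + 3)/2 = 5 for D = 2
```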

Restricted Forms of the Multivariate Gaussian

For a diagonal covariance matrix, $\Sigma = \mathrm{diag}(\sigma_i^2)$, we have only $2D$ parameters in total. The corresponding contours of constant density are axis-aligned ellipsoids.

We can further restrict the covariance matrix to the isotropic form $\Sigma = \sigma^2 I$, in which case we have a total of $D + 1$ parameters and the constant-density contours are circles (hyperspheres in higher dimensions).

2D Gaussian

[Figure: level sets of 2D Gaussians with full, diagonal, and spherical covariance matrices, shown as contour and surface plots; generated with gaussPlot2DDemo from PMTK.]
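A matplotlib sketch analogous to that demo (not the PMTK code itself; the covariance values are illustrative) that draws the three kinds of level sets:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Illustrative covariances for the three forms
covs = {
    'full':      np.array([[2.0, 1.8], [1.8, 2.0]]),
    'diagonal':  np.array([[2.0, 0.0], [0.0, 0.5]]),
    'spherical': np.array([[1.0, 0.0], [0.0, 1.0]]),
}

xs, ys = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-5, 5, 200))
grid = np.dstack([xs, ys])   # (200, 200, 2) grid of evaluation points

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (name, cov) in zip(axes, covs.items()):
    ax.contour(xs, ys, multivariate_normal([0.0, 0.0], cov).pdf(grid))
    ax.set_title(name)
    ax.set_aspect('equal')
plt.show()
```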

Restricted Forms of the Multivariate Gaussian

The Gaussian distribution is flexible, but a full covariance requires many parameters, and the distribution is in any case limited to unimodal densities.

Using latent variables (hidden variables, unobserved variables) allows both of these problems to be addressed.

Restricted Forms of the Multivariate Gaussian

Using latent variables (hidden variables, unobserved variables) allows both of these problems to be addressed:

Multimodal distributions can be obtained by introducing discrete latent variables (mixtures of Gaussians); see the sketch after this list.

Introducing continuous latent variables leads to models in which the number of free parameters can be controlled independently of the dimensionality $D$ of the data space, while still allowing the model to capture the dominant correlations in the data set.

These two approaches can be combined, leading to hierarchical models useful in many applications.
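As a minimal illustration of the first point (a sketch with made-up parameters), a two-component Gaussian mixture in 1D is already bimodal even though each component is not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-component mixture: weights, means, standard deviations (illustrative)
weights = np.array([0.4, 0.6])
means = np.array([-2.0, 3.0])
stds = np.array([0.7, 1.0])

n = 100_000
comp = rng.choice(2, size=n, p=weights)   # discrete latent variable per sample
x = rng.normal(means[comp], stds[comp])   # draw from the selected component

# The empirical density has two separated modes, near -2 and 3,
# which no single Gaussian can represent.
hist, edges = np.histogram(x, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
left = centers[np.argmax(np.where(centers < 0, hist, 0))]    # left bump's mode
right = centers[np.argmax(np.where(centers >= 0, hist, 0))]  # right bump's mode
print(left, right)   # roughly -2 and 3
```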

Restricted Forms of the Multivariate Gaussian

In probabilistic models of images, we often use the Gaussian version of the Markov random field. It is a Gaussian distribution over the joint space of pixel intensities, and it is tractable because of the structure imposed by the spatial organization of the pixels.

Restricted Forms of the Multivariate Gaussian

Similarly, the linear dynamical system used to model time series data for tracking is also a joint Gaussian distribution over a large number of observed and hidden variables, and it is tractable due to the structure imposed on the distribution.

Graphical models are often used to introduce the structure for such complex models.