
A Unifying Review of Linear Gaussian Models
Summary Presentation 2/15/10 – Dae Il Kim

Department of Computer Science
Graduate Student
Advisor: Erik Sudderth, Ph.D.

Overview

• Introduce the Basic Model
• Discrete Time Linear Dynamical System (Kalman Filter)
• Some nice properties of Gaussian distributions
• Graphical Model: Static Model (Factor Analysis, PCA, SPCA)
• Learning & Inference: Static Model
• Graphical Model: Gaussian Mixture & Vector Quantization
• Learning & Inference: GMMs & Quantization
• Graphical Model: Discrete-State Dynamic Model (HMMs)
• Independent Component Analysis
• Conclusion

The Basic Model

• Basic Model: Discrete Time Linear Dynamical System (Kalman Filter)

Variations of this model produce:
• Factor Analysis
• Principal Component Analysis
• Mixtures of Gaussians
• Vector Quantization
• Independent Component Analysis
• Hidden Markov Models

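Generative Model

$$x_{t+1} = A x_t + w_t = A x_t + w_\bullet$$

$$y_t = C x_t + v_t = C x_t + v_\bullet$$

Additive Gaussian Noise

$$w_\bullet \sim \mathcal{N}(0, Q), \qquad v_\bullet \sim \mathcal{N}(0, R)$$

A = k x k state transition matrix
C = p x k observation / generative matrix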
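A minimal sketch of drawing a sequence from the generative model above, assuming NumPy is available. The function name `simulate_lds`, the example matrices, and the fixed initial state are illustrative assumptions, not part of the slides.

```python
import numpy as np

def simulate_lds(A, C, Q, R, x1, T, rng):
    """Sample T steps of x_{t+1} = A x_t + w, y_t = C x_t + v."""
    k, p = A.shape[0], C.shape[0]
    xs = np.zeros((T, k))
    ys = np.zeros((T, p))
    x = np.asarray(x1, dtype=float)
    for t in range(T):
        xs[t] = x
        # Observation: linear map of the state plus noise v ~ N(0, R)
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)
        # Transition: linear map of the state plus noise w ~ N(0, Q)
        x = A @ x + rng.multivariate_normal(np.zeros(k), Q)
    return xs, ys

# Hypothetical example: k = 2 hidden dimensions, p = 3 observed dimensions
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Q = 0.1 * np.eye(2)
R = 0.5 * np.eye(3)
xs, ys = simulate_lds(A, C, Q, R, x1=np.zeros(2), T=50, rng=rng)
```

Passing A = 0 to this sketch makes every state after the first an independent draw of the noise w, which is the static case.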

Nice Properties of Gaussians

• Markov Property

$$P(\{x_1, \ldots, x_\tau\}, \{y_1, \ldots, y_\tau\}) = P(x_1) \prod_{t=1}^{\tau-1} P(x_{t+1} \mid x_t) \prod_{t=1}^{\tau} P(y_t \mid x_t)$$

• Inference in these models

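$$P(\{x_1, \ldots, x_\tau\} \mid \{y_1, \ldots, y_\tau\}) = \frac{P(\{x_1, \ldots, x_\tau\}, \{y_1, \ldots, y_\tau\})}{P(\{y_1, \ldots, y_\tau\})}$$

Filtering: $P(x_t \mid \{y_1, \ldots, y_t\})$
Smoothing: $P(x_t \mid \{y_1, \ldots, y_\tau\})$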
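For the basic linear Gaussian model, the filtering distribution is Gaussian and can be computed recursively. The sketch below uses the standard Kalman filter recursions, which the slides name but do not spell out. The function name `kalman_filter` and the prior parameters `mu1`, `V1` on the first state are assumptions made for illustration.

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, mu1, V1):
    """Return means and covariances of P(x_t | y_1, ..., y_t) for each t."""
    T, k = len(ys), A.shape[0]
    means = np.zeros((T, k))
    covs = np.zeros((T, k, k))
    m_pred, V_pred = mu1, V1            # prior on x_1 (assumed Gaussian)
    for t in range(T):
        # Measurement update: condition the prediction on y_t
        S = C @ V_pred @ C.T + R        # predicted observation covariance
        K = V_pred @ C.T @ np.linalg.inv(S)
        m = m_pred + K @ (ys[t] - C @ m_pred)
        V = V_pred - K @ C @ V_pred
        means[t], covs[t] = m, V
        # Time update: push the filtered estimate through the dynamics
        m_pred = A @ m
        V_pred = A @ V @ A.T + Q
    return means, covs

# Scalar example: a constant state (A = 1, Q = 0) observed twice with unit noise
one = np.array([[1.0]])
means, covs = kalman_filter(np.array([[2.0], [2.0]]), A=one, C=one,
                            Q=np.array([[0.0]]), R=one,
                            mu1=np.array([0.0]), V1=one)
```

In the scalar example the filtered mean after two observations of 2.0 is 4/3 with variance 1/3, the same answer obtained by combining the unit-variance prior with two unit-variance observations directly.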

• Learning via Expectation Maximization (EM)

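E-step:

$$Q_{k+1} = \arg\max_{Q} \int_X Q(X) \log P(X, Y \mid \theta_k)\, dX - \int_X Q(X) \log Q(X)\, dX$$

The maximizer is the posterior, $Q_{k+1}(X) = P(X \mid Y, \theta_k)$.

M-step:

$$\theta_{k+1} = \arg\max_{\theta} \int_X P(X \mid Y, \theta_k) \log P(X, Y \mid \theta)\, dX$$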

• Conditional Independence

$$P(x_{t+1} \mid x_t) = \mathcal{N}(A x_t, Q)\big|_{x_{t+1}}$$

$$P(y_t \mid x_t) = \mathcal{N}(C x_t, R)\big|_{y_t}$$

Graphical Model for Static Models

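Generative Model

$$A = 0 \;\Rightarrow\; x_\bullet = w_\bullet, \qquad w_\bullet \sim \mathcal{N}(0, Q)$$

$$y_\bullet = C x_\bullet + v_\bullet, \qquad v_\bullet \sim \mathcal{N}(0, R)$$

• Factor Analysis: $Q = I$ & $R$ is diagonal
• SPCA: $Q = I$ & $R = \alpha I$
• PCA: $Q = I$ & $R = \lim_{\epsilon \to 0} \epsilon I$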


[Figure: Example of the generative process for PCA, with a 1-dimensional latent space and a 2-dimensional observation space; the final panel shows the marginal distribution p(x). Z = latent variable, X = observed variable. Source: Bishop (2006).]

Learning & Inference: Static Models

Analytically integrating over the joint, we obtain the marginal distribution of y:

$$y_\bullet \sim \mathcal{N}(0, C Q C^T + R)$$
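The marginal covariance can be checked numerically. The following is a rough Monte Carlo sketch, assuming NumPy; the helper name `sample_static` and the particular C, Q, R (a diagonal R, as in factor analysis) are made up for illustration.

```python
import numpy as np

def sample_static(C, Q, R, n, rng):
    """Draw n observations from x = w ~ N(0, Q), y = C x + v, v ~ N(0, R)."""
    k, p = C.shape[1], C.shape[0]
    x = rng.multivariate_normal(np.zeros(k), Q, size=n)   # shape (n, k)
    v = rng.multivariate_normal(np.zeros(p), R, size=n)   # shape (n, p)
    return x @ C.T + v

rng = np.random.default_rng(0)
C = np.array([[1.0], [2.0]])        # p = 2 observed, k = 1 hidden
Q = np.eye(1)
R = np.diag([0.5, 0.2])             # diagonal R: the factor analysis case
ys = sample_static(C, Q, R, n=200_000, rng=rng)

empirical = np.cov(ys, rowvar=False)
analytic = C @ Q @ C.T + R          # covariance of the marginal of y
print(empirical)
print(analytic)
```

With this many samples the two matrices should agree to roughly two decimal places; the agreement is statistical, not exact.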

Note: Filtering and Smoothing reduce to the same problem in the static model since the time dependence is gone. We want to find $P(x_\bullet \mid y_\bullet)$ over a single hidden state given the single observation. Inference can be performed simply by linear matrix projection, and the result is also Gaussian.

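We can calculate our posterior using Bayes' rule (with $Q = I$):

$$P(x_\bullet \mid y_\bullet) = \frac{P(y_\bullet \mid x_\bullet)\, P(x_\bullet)}{P(y_\bullet)} = \frac{\mathcal{N}(C x_\bullet, R)\big|_{y_\bullet}\; \mathcal{N}(0, I)\big|_{x_\bullet}}{\mathcal{N}(0, C C^T + R)\big|_{y_\bullet}}$$

Our posterior now becomes another Gaussian:

$$P(x_\bullet \mid y_\bullet) = \mathcal{N}(\beta y_\bullet,\; I - \beta C)\big|_{x_\bullet}$$

Where beta is equal to:

$$\beta = C^T (C C^T + R)^{-1}$$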
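A short sketch of static-model inference with Q = I, assuming NumPy: the posterior mean is the projection of y by beta = C^T (C C^T + R)^{-1}, and the posterior covariance is I - beta C. The function name `static_posterior` and the example values are illustrative assumptions.

```python
import numpy as np

def static_posterior(y, C, R):
    """Mean and covariance of P(x | y) for x ~ N(0, I), y = C x + v, v ~ N(0, R)."""
    k = C.shape[1]
    beta = C.T @ np.linalg.inv(C @ C.T + R)   # k x p projection matrix
    mean = beta @ y
    cov = np.eye(k) - beta @ C
    return mean, cov

# Hypothetical example: p = 3 observed dimensions, k = 2 hidden dimensions
C = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])
R = np.diag([0.3, 0.2, 0.1])
y = np.array([1.0, -0.5, 2.0])
mean, cov = static_posterior(y, C, R)
print(mean)
print(cov)
```

The covariance does not depend on y, so it can be computed once and reused for every observation.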

Graphical Model: Gaussian Mixture Models & Vector Quantization

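Generative Model

$$A = 0 \;\Rightarrow\; x_\bullet = \mathrm{WTA}[w_\bullet], \qquad w_\bullet \sim \mathcal{N}(\mu, Q)$$

$$y_\bullet = C x_\bullet + v_\bullet, \qquad v_\bullet \sim \mathcal{N}(0, R)$$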


WTA[x] (Winner Takes All) = new vector with unity in the position of the largest coordinate of the input and zeros in all other positions, e.g. [0 0 1].

Note: Each state $x_\bullet$ is generated independently according to a fixed discrete probability histogram controlled by the mean and covariance of $w_\bullet$.

This model becomes a Vector Quantization model when:

$$R = \lim_{\epsilon \to 0} \epsilon I$$
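The winner-takes-all step and the resulting mixture model can be sketched as follows, assuming NumPy. The names `wta` and `sample_mixture` and all example parameters are hypothetical; each column of C plays the role of one cluster mean.

```python
import numpy as np

def wta(x):
    """One-hot vector with unity at the position of the largest coordinate of x."""
    out = np.zeros(len(x))
    out[np.argmax(x)] = 1.0
    return out

def sample_mixture(C, R, mu, Q, n, rng):
    """Draw n pairs (x, y) with x = WTA[w], w ~ N(mu, Q), y = C x + v, v ~ N(0, R)."""
    p, k = C.shape
    xs = np.zeros((n, k))
    ys = np.zeros((n, p))
    for i in range(n):
        xs[i] = wta(rng.multivariate_normal(mu, Q))
        ys[i] = C @ xs[i] + rng.multivariate_normal(np.zeros(p), R)
    return xs, ys

rng = np.random.default_rng(0)
C = np.array([[0.0, 4.0, -4.0],
              [0.0, 4.0, 4.0]])          # three cluster means in 2-D
R = 0.25 * np.eye(2)
mu = np.array([0.0, 0.0, 0.5])           # shifts the histogram toward cluster 3
xs, ys = sample_mixture(C, R, mu, np.eye(3), n=500, rng=rng)
print(xs.mean(axis=0))                   # empirical cluster frequencies
```

Shrinking R toward zero in this sketch places every observation essentially on its cluster mean, which is the vector quantization limit.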

Learning & Inference: GMMs & Quantization

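Calculating the posterior responsibility for each cluster is analogous to the E-Step in this model.

$$(\hat{x}_\bullet)_j = P(x_\bullet = e_j \mid y_\bullet) = \frac{P(x_\bullet = e_j, y_\bullet)}{P(y_\bullet)} = \frac{\mathcal{N}(C_j, R)\big|_{y_\bullet}\, P(x_\bullet = e_j)}{\sum_{i=1}^{k} \mathcal{N}(C_i, R)\big|_{y_\bullet}\, P(x_\bullet = e_i)} = \frac{\mathcal{N}(C_j, R)\big|_{y_\bullet}\, \pi_j}{\sum_{i=1}^{k} \mathcal{N}(C_i, R)\big|_{y_\bullet}\, \pi_i}$$

Here $C_j$ is the jth column of $C$ and $\pi_j = P(x_\bullet = e_j)$.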

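Computing the Likelihood for the data is straightforward:

$$P(y_\bullet) = \sum_{i=1}^{k} P(x_\bullet = e_i, y_\bullet) = \sum_{i=1}^{k} \mathcal{N}(C_i, R)\big|_{y_\bullet}\, P(x_\bullet = e_i) = \sum_{i=1}^{k} \mathcal{N}(C_i, R)\big|_{y_\bullet}\, \pi_i$$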
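A sketch of the responsibility and likelihood computations for a single observation, assuming NumPy. The mixing proportions `pi` (one entry per cluster, the probabilities P(x = e_j)) are passed in directly here; the function names and example values are hypothetical.

```python
import numpy as np

def gaussian_density(y, mean, cov):
    """Density of N(mean, cov) evaluated at the point y."""
    d = y - mean
    norm = np.sqrt((2 * np.pi) ** len(y) * np.linalg.det(cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm

def responsibilities(y, C, R, pi):
    """Return (posterior over clusters given y, likelihood P(y))."""
    # Joint P(x = e_j, y): Gaussian centred on column j of C, weighted by pi_j
    joint = np.array([gaussian_density(y, C[:, j], R) * pi[j]
                      for j in range(len(pi))])
    likelihood = joint.sum()              # P(y): sum of the joint over clusters
    return joint / likelihood, likelihood

# Hypothetical example: three clusters in 2-D
C = np.array([[0.0, 4.0, -4.0],
              [0.0, 4.0, 4.0]])
R = np.eye(2)
pi = np.array([0.5, 0.3, 0.2])
resp, lik = responsibilities(np.array([3.5, 3.0]), C, R, pi)
print(resp, lik)
```

The responsibilities sum to one by construction, since the denominator is exactly the likelihood of the observation.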

$\pi_j = P(x_\bullet = e_j)$ is the probability assigned by the Gaussian $\mathcal{N}(\mu, Q)$ to the region of k-space in which the jth coordinate is larger than all the others.

[Figure: Gaussian Mixture Models. Panels: marginal distribution p(y) and joint distribution p(y, x).]

Graphical Model: Discrete-State Dynamic Models

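Generative Model

$$x_{t+1} = \mathrm{WTA}[A x_t + w_t] = \mathrm{WTA}[A x_t + w_\bullet], \qquad w_\bullet \sim \mathcal{N}(\mu, Q)$$

$$y_t = C x_t + v_t = C x_t + v_\bullet, \qquad v_\bullet \sim \mathcal{N}(0, R)$$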
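A minimal simulation of the discrete-state dynamic model, assuming NumPy. It applies the generative equations literally and does not derive the implied HMM transition probabilities. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def wta(x):
    """One-hot vector with unity at the position of the largest coordinate of x."""
    out = np.zeros(len(x))
    out[np.argmax(x)] = 1.0
    return out

def simulate_discrete_state(A, C, Q, R, mu, x1, T, rng):
    """Sample x_{t+1} = WTA[A x_t + w], y_t = C x_t + v for T steps."""
    k, p = A.shape[0], C.shape[0]
    xs = np.zeros((T, k))
    ys = np.zeros((T, p))
    x = np.asarray(x1, dtype=float)
    for t in range(T):
        xs[t] = x
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)
        # A x_t selects a column of A, which biases which state wins next
        x = wta(A @ x + rng.multivariate_normal(mu, Q))
    return xs, ys

rng = np.random.default_rng(0)
A = 3.0 * np.eye(3)                        # large diagonal: states tend to persist
C = np.array([[0.0, 4.0, -4.0],
              [0.0, 4.0, 4.0]])
xs, ys = simulate_discrete_state(A, C, np.eye(3), 0.25 * np.eye(2),
                                 mu=np.zeros(3), x1=[1.0, 0.0, 0.0],
                                 T=100, rng=rng)
```

Because every state is a unit vector, the observation mean C x_t is always one of the columns of C, just as in the mixture model.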

Independent Component Analysis

• ICA can be seen as a linear generative model with non-Gaussian priors for the hidden variables or as a nonlinear generative model with Gaussian priors for the hidden variables.

The gradient learning rule to increase the likelihood:

$$\Delta W \propto W^{-T} + f(Wy)\, y^T, \qquad f(x) = \frac{d \log p_x(x)}{dx}$$
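One batch step of this learning rule can be sketched as below, assuming NumPy and assuming the source prior p(x) = 1 / (π cosh x), for which f(x) = -tanh(x). The prior, the learning rate, the function names, and the synthetic data are all illustrative choices, not taken from the slides.

```python
import numpy as np

def score(u):
    """f(u) = d log p(u) / du for the assumed prior p(u) = 1 / (pi * cosh(u))."""
    return -np.tanh(u)

def avg_log_likelihood(W, Y):
    """Average over rows y of log|det W| + sum_i log p((W y)_i)."""
    U = Y @ W.T                                   # row n holds W y_n
    log_prior = -np.log(np.pi) - np.log(np.cosh(U))
    return np.log(abs(np.linalg.det(W))) + log_prior.sum(axis=1).mean()

def ica_step(W, Y, lr=0.01):
    """One step of Delta W proportional to W^{-T} + f(W y) y^T, averaged over rows of Y."""
    U = Y @ W.T
    grad = np.linalg.inv(W).T + score(U).T @ Y / len(Y)
    return W + lr * grad

# Synthetic data: two heavy-tailed sources mixed by a hypothetical matrix M
rng = np.random.default_rng(0)
S = rng.laplace(size=(2000, 2))
M = np.array([[1.0, 0.5], [0.3, 1.0]])
Y = S @ M.T

W0 = np.eye(2)
W1 = ica_step(W0, Y)
print(avg_log_likelihood(W0, Y), avg_log_likelihood(W1, Y))
```

Repeating the step many times, possibly with a decaying learning rate, would be needed for an actual unmixing estimate; a single step only illustrates the direction of the update.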

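Generative Model

$$A = 0 \;\Rightarrow\; x_\bullet = g(w_\bullet), \qquad w_\bullet \sim \mathcal{N}(0, Q)$$

$$y_\bullet = C x_\bullet + v_\bullet, \qquad v_\bullet \sim \mathcal{N}(0, R)$$

g(.) is a general nonlinearity that is invertible and differentiable, for example:

$$g(w) = \ln\left(\tan\left(\frac{\pi}{4}\left(1 + \mathrm{erf}\left(\frac{w}{\sqrt{2}}\right)\right)\right)\right)$$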
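The example nonlinearity can be evaluated with the Python standard library alone. The sketch below also checks numerically that, for unit-variance w, g maps the standard Gaussian CDF onto the CDF of the density 1 / (π cosh x); the helper names are hypothetical.

```python
import math

def g(w):
    """g(w) = ln(tan(pi/4 * (1 + erf(w / sqrt(2)))))."""
    return math.log(math.tan(math.pi / 4 * (1 + math.erf(w / math.sqrt(2)))))

def std_normal_cdf(w):
    """CDF of a standard Gaussian, written with erf."""
    return 0.5 * (1 + math.erf(w / math.sqrt(2)))

def target_cdf(x):
    """CDF of the density p(x) = 1 / (pi * cosh(x))."""
    return 2 / math.pi * math.atan(math.exp(x))

# If w is standard Gaussian, the CDF of x = g(w) at g(w) equals the Gaussian CDF at w
for w in (-2.0, -0.5, 0.0, 1.0, 2.5):
    print(w, g(w), std_normal_cdf(w), target_cdf(g(w)))
```

The function is only evaluated at moderate w here; far in the tails the tangent approaches 0 or infinity and floating-point precision degrades.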

Conclusion

Many more potential models!