A Unifying Review of Linear Gaussian Models
Summary Presentation 2/15/10 – Dae Il Kim
Department of Computer Science
Graduate Student
Advisor: Erik Sudderth Ph.D.
Overview
• Introduce the Basic Model
• Discrete Time Linear Dynamical System (Kalman Filter)
• Some nice properties of Gaussian distributions
• Graphical Model: Static Model (Factor Analysis, PCA, SPCA)
• Learning & Inference: Static Model
• Graphical Model: Gaussian Mixture & Vector Quantization
• Learning & Inference: GMMs & Quantization
• Graphical Model: Discrete-State Dynamic Model (HMMs)
• Independent Component Analysis
• Conclusion
The Basic Model
• Basic Model: Discrete Time Linear Dynamical System (Kalman Filter)
Variations of this model produce:
• Factor Analysis
• Principal Component Analysis
• Mixtures of Gaussians
• Vector Quantization
• Independent Component Analysis
• Hidden Markov Models
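Generative Model
$$x_{t+1} = A x_t + w_t = A x_t + w_\bullet$$
$$y_t = C x_t + v_t = C x_t + v_\bullet$$
Additive Gaussian Noise
$$w_\bullet \sim \mathcal{N}(0, Q), \qquad v_\bullet \sim \mathcal{N}(0, R)$$
A = k x k state transition matrix
C = p x k observation / generative matrix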
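The generative process of the basic model can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the helper names `matvec` and `sample_lds` are hypothetical, the numeric values of A and C are made up, and Q and R are assumed diagonal so that each noise coordinate can be drawn independently with the standard library.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sample_lds(A, C, q_std, r_std, x1, T, rng):
    """Draw T steps from x_{t+1} = A x_t + w, y_t = C x_t + v.

    Assumption: Q and R are diagonal, given as per-coordinate standard
    deviations q_std and r_std.
    """
    xs, ys = [], []
    x = list(x1)
    for _ in range(T):
        # Observation: y_t = C x_t + v
        y = [m + rng.gauss(0.0, s) for m, s in zip(matvec(C, x), r_std)]
        xs.append(x)
        ys.append(y)
        # State update: x_{t+1} = A x_t + w
        x = [m + rng.gauss(0.0, s) for m, s in zip(matvec(A, x), q_std)]
    return xs, ys

rng = random.Random(0)
A = [[0.9, 0.1], [0.0, 0.8]]              # k x k state transition matrix (k = 2)
C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # p x k observation matrix (p = 3)
xs, ys = sample_lds(A, C, q_std=[0.1, 0.1], r_std=[0.5, 0.5, 0.5],
                    x1=[1.0, -1.0], T=5, rng=rng)
```

Setting both noise levels to zero turns the sampler into a deterministic linear recursion, which is a convenient sanity check.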
Nice Properties of Gaussians
• Markov Property
$$P(\{x_1,\ldots,x_\tau\},\{y_1,\ldots,y_\tau\}) = P(x_1)\prod_{t=1}^{\tau-1} P(x_{t+1} \mid x_t)\prod_{t=1}^{\tau} P(y_t \mid x_t)$$
• Inference in these models
$$P(\{x_1,\ldots,x_\tau\} \mid \{y_1,\ldots,y_\tau\}) = \frac{P(\{x_1,\ldots,x_\tau\},\{y_1,\ldots,y_\tau\})}{P(\{y_1,\ldots,y_\tau\})}$$
Filtering: $P(x_t \mid \{y_1,\ldots,y_t\})$
Smoothing: $P(x_t \mid \{y_1,\ldots,y_\tau\})$
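Filtering asks for P(x_t | y_1, ..., y_t). For the linear Gaussian model this is computed by the standard Kalman recursions, which the slides do not spell out. The sketch below covers only the scalar case (k = p = 1) to keep the algebra visible; the function name `kalman_filter_1d` and the prior parameters `m1`, `p1` for x_1 are assumptions made for this illustration.

```python
def kalman_filter_1d(ys, a, c, q, r, m1, p1):
    """Filtered means and variances of P(x_t | y_1..y_t) for scalar x and y.

    Model: x_{t+1} = a x_t + w, w ~ N(0, q);  y_t = c x_t + v, v ~ N(0, r).
    Assumed prior: x_1 ~ N(m1, p1).
    """
    means, variances = [], []
    m_pred, p_pred = m1, p1
    for y in ys:
        # Measurement update: condition the prediction on y_t.
        gain = p_pred * c / (c * c * p_pred + r)
        m = m_pred + gain * (y - c * m_pred)
        p = (1.0 - gain * c) * p_pred
        means.append(m)
        variances.append(p)
        # Time update: push the estimate through the dynamics.
        m_pred = a * m
        p_pred = a * a * p + q
    return means, variances

means, variances = kalman_filter_1d([2.0, 2.0], a=1.0, c=1.0, q=0.0, r=1.0,
                                    m1=0.0, p1=1.0)
```

With a = 1 and q = 0 the state never changes, so the filter reduces to Bayesian estimation of a constant from repeated noisy readings.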
• Learning via Expectation Maximization (EM)
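E-step:
$$Q_{k+1} = \arg\max_{Q} \left[ \int_X Q(X)\,\log P(X, Y \mid \theta_k)\,dX - \int_X Q(X)\,\log Q(X)\,dX \right]$$
The maximizer is the exact posterior, $Q_{k+1}(X) = P(X \mid Y, \theta_k)$, which the M-step then uses.
M-step:
$$\theta_{k+1} = \arg\max_{\theta} \int_X P(X \mid Y, \theta_k)\,\log P(X, Y \mid \theta)\,dX$$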
• Conditional Independence
$$P(x_{t+1} \mid x_t) = \mathcal{N}(A x_t, Q)\big|_{x_{t+1}}$$
$$P(y_t \mid x_t) = \mathcal{N}(C x_t, R)\big|_{y_t}$$
Graphical Model for Static Models
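Factor Analysis: $Q = I$ and $R$ is diagonal
SPCA: $Q = I$ and $R = \alpha I$
PCA: $Q = I$ and $R = \lim_{\epsilon \to 0} \epsilon I$
Generative Model
$$A = 0: \quad x_\bullet = w_\bullet$$
$$y_\bullet = C x_\bullet + v_\bullet$$
Additive Gaussian Noise
$$w_\bullet \sim \mathcal{N}(0, Q), \qquad v_\bullet \sim \mathcal{N}(0, R)$$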
Example of the generative process for PCA
[Figure from Bishop (2006): a 1-dimensional latent space, a 2-dimensional observation space, and the marginal distribution p(x). In the figure's notation, z = latent variable and x = observed variable.]
Learning & Inference: Static Models
Analytically integrating over the joint, we obtain the marginal distribution of y.
$$y_\bullet \sim \mathcal{N}(0,\; C Q C^T + R)$$
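The marginal covariance C Q C^T + R can be checked empirically by sampling from the static model. The sketch below assumes a 1-dimensional latent variable with Q = I and a diagonal R; the loading vector `c`, the noise variances `r_diag`, and the helper names are illustrative choices, not values from the presentation.

```python
import random

def sample_static(c, r_diag, rng):
    """One draw of y from x ~ N(0, 1), y = c x + v, v ~ N(0, diag(r_diag))."""
    x = rng.gauss(0.0, 1.0)
    return [ci * x + rng.gauss(0.0, ri ** 0.5) for ci, ri in zip(c, r_diag)]

def sample_covariance(samples):
    """Unbiased sample covariance matrix of a list of vectors."""
    n, p = len(samples), len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(p)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
             for j in range(p)] for i in range(p)]

rng = random.Random(0)
c = [2.0, 1.0]          # the single column of C (p = 2, k = 1)
r_diag = [0.5, 0.5]     # diagonal of R
ys = [sample_static(c, r_diag, rng) for _ in range(20000)]
empirical = sample_covariance(ys)
# Predicted marginal covariance C C^T + R (Q = I).
predicted = [[c[i] * c[j] + (r_diag[i] if i == j else 0.0) for j in range(2)]
             for i in range(2)]
```

With enough samples, `empirical` should approach `predicted`, which here is [[4.5, 2.0], [2.0, 1.5]].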
Note: Filtering and Smoothing reduce to the same problem in the static model since the time dependence is gone. We want to find P(x.|y.) over a single hidden state given the single observation. Inference can be performed simply by linear matrix projection and the result is also Gaussian.
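We can calculate our posterior using Bayes rule (with Q = I):
$$P(x_\bullet \mid y_\bullet) = \frac{P(y_\bullet \mid x_\bullet)\,P(x_\bullet)}{P(y_\bullet)} = \frac{\mathcal{N}(C x_\bullet, R)\big|_{y_\bullet}\;\mathcal{N}(0, I)\big|_{x_\bullet}}{\mathcal{N}(0,\; C C^T + R)\big|_{y_\bullet}}$$
Our posterior now becomes another Gaussian:
$$P(x_\bullet \mid y_\bullet) = \mathcal{N}(\beta y_\bullet,\; I - \beta C)\big|_{x_\bullet}$$
where beta is equal to:
$$\beta = C^T (C C^T + R)^{-1}$$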
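As a small worked case of the posterior N(beta y, I - beta C) with beta = C^T (C C^T + R)^{-1}, the sketch below handles a 1-dimensional latent variable with diagonal R. To avoid a matrix inverse it uses the equivalent form beta = (1 + C^T R^{-1} C)^{-1} C^T R^{-1}, which follows from the matrix inversion lemma; that rewriting and the function name `posterior_1d_latent` are additions for this illustration.

```python
def posterior_1d_latent(c, r_diag, y):
    """Posterior mean and variance of a scalar latent x given y, with Q = I.

    c:      the single column of C, as a list of length p
    r_diag: diagonal entries of R
    Returns (beta, mean, var) where the posterior is N(beta y, 1 - beta C).
    """
    # Posterior precision: 1 + C^T R^{-1} C
    precision = 1.0 + sum(ci * ci / ri for ci, ri in zip(c, r_diag))
    # beta = C^T R^{-1} / precision, equal to C^T (C C^T + R)^{-1}
    beta = [ci / ri / precision for ci, ri in zip(c, r_diag)]
    mean = sum(b * yi for b, yi in zip(beta, y))
    var = 1.0 - sum(b * ci for b, ci in zip(beta, c))
    return beta, mean, var

beta, mean, var = posterior_1d_latent(c=[1.0, 1.0], r_diag=[1.0, 1.0], y=[3.0, 0.0])
```

For this example C C^T + R = [[2, 1], [1, 2]], and multiplying its inverse by C^T gives beta = [1/3, 1/3], matching the shortcut.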
Graphical Model: Gaussian Mixture Models & Vector Quantization
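Generative Model
$$A = 0: \quad x_\bullet = \mathrm{WTA}[w_\bullet]$$
$$y_\bullet = C x_\bullet + v_\bullet$$
Additive Gaussian Noise
$$w_\bullet \sim \mathcal{N}(\mu, Q), \qquad v_\bullet \sim \mathcal{N}(0, R)$$
WTA[x] (Winner Takes All) = new vector with unity in the position of the largest coordinate of the input and zeros in all other positions, e.g. [0 0 1].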
Note: Each state x. is generated independently according to a fixed discrete probability histogram controlled by the mean and covariance of w.
This model becomes a Vector Quantization model when:
$$R = \lim_{\epsilon \to 0} \epsilon I$$
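The winner-takes-all nonlinearity and the resulting mixture sampler can be sketched as follows. The names `wta` and `sample_mixture` are hypothetical, Q = I and a spherical R are assumed for simplicity, and the cluster centers are stored as the columns of C.

```python
import random

def wta(v):
    """Winner Takes All: 1 at the largest coordinate of v, 0 elsewhere."""
    top = max(range(len(v)), key=lambda i: v[i])
    return [1 if i == top else 0 for i in range(len(v))]

def sample_mixture(c_cols, mu, r_std, rng):
    """One draw: x = WTA[w], w ~ N(mu, I); y = C x + v, v ~ N(0, r_std^2 I).

    c_cols[j] is the jth column of C, i.e. the center of cluster j.
    """
    w = [m + rng.gauss(0.0, 1.0) for m in mu]
    x = wta(w)
    j = x.index(1)
    # C x simply selects column j of C; then add observation noise.
    y = [c + rng.gauss(0.0, r_std) for c in c_cols[j]]
    return x, y

rng = random.Random(0)
centers = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
x, y = sample_mixture(centers, mu=[0.0, 0.0, 0.0], r_std=0.5, rng=rng)
# Vector quantization limit: with no observation noise, y is exactly a center.
x_vq, y_vq = sample_mixture(centers, mu=[0.0, 0.0, 0.0], r_std=0.0, rng=rng)
```

Setting `r_std` to zero mimics the limit of R going to zero: every observation coincides with one of the centers.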
Learning & Inference: GMMs & Quantization
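Calculating the posterior responsibility for each cluster is analogous to the E-step in this model.
$$\hat{x}_j = P(x_\bullet = e_j \mid y_\bullet) = \frac{P(x_\bullet = e_j,\, y_\bullet)}{P(y_\bullet)} = \frac{\mathcal{N}(C_j, R)\big|_{y_\bullet}\, P(x_\bullet = e_j)}{\sum_{i=1}^{k} \mathcal{N}(C_i, R)\big|_{y_\bullet}\, P(x_\bullet = e_i)} = \frac{\mathcal{N}(C_j, R)\big|_{y_\bullet}\, \pi_j}{\sum_{i=1}^{k} \mathcal{N}(C_i, R)\big|_{y_\bullet}\, \pi_i}$$
Here $C_j$ is the jth column of $C$ and $\pi_j = P(x_\bullet = e_j)$.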
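The responsibility of cluster j is its prior-weighted Gaussian density at y, normalized over all clusters. A minimal sketch, assuming a diagonal R and working in log space for numerical stability; the function names and the example centers are illustrative only.

```python
import math

def log_gauss_diag(y, mean, var_diag):
    """log N(mean, diag(var_diag)) evaluated at y."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - mi) ** 2 / v)
               for yi, mi, v in zip(y, mean, var_diag))

def responsibilities(y, centers, r_diag, pis):
    """Posterior P(x = e_j | y) for every cluster j.

    centers[j] is the jth column of C; pis[j] is the prior P(x = e_j).
    """
    logs = [math.log(pi) + log_gauss_diag(y, c, r_diag)
            for c, pi in zip(centers, pis)]
    top = max(logs)  # subtract the max before exponentiating to avoid underflow
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]

resp = responsibilities([2.0, 0.0], centers=[[0.0, 0.0], [4.0, 0.0]],
                        r_diag=[1.0, 1.0], pis=[0.5, 0.5])
```

A point halfway between two equally weighted centers receives equal responsibility from both.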
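Computing the likelihood for the data is straightforward:
$$P(y_\bullet) = \sum_{i=1}^{k} P(x_\bullet = e_i,\, y_\bullet) = \sum_{i=1}^{k} \mathcal{N}(C_i, R)\big|_{y_\bullet}\, P(x_\bullet = e_i) = \sum_{i=1}^{k} \mathcal{N}(C_i, R)\big|_{y_\bullet}\, \pi_i$$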
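For a scalar observation the mixture likelihood is short enough to write out directly. The sketch below assumes p = 1 with a scalar noise variance `r`; the names `mixture_density` and `log_likelihood` and all numeric values are made up for illustration.

```python
import math

def mixture_density(y, centers, r, pis):
    """P(y) = sum_i pi_i N(C_i, r) evaluated at scalar y."""
    return sum(pi * math.exp(-0.5 * (y - c) ** 2 / r) / math.sqrt(2.0 * math.pi * r)
               for c, pi in zip(centers, pis))

def log_likelihood(ys, centers, r, pis):
    """Log-likelihood of independent observations ys under the mixture."""
    return sum(math.log(mixture_density(y, centers, r, pis)) for y in ys)

centers, r, pis = [-2.0, 3.0], 0.5, [0.3, 0.7]
ll = log_likelihood([-1.9, 3.2, 2.7], centers, r, pis)

# Sanity check: the density should integrate to one (Riemann sum on [-20, 20]).
step = 0.01
total = sum(mixture_density(-20.0 + i * step, centers, r, pis) * step
            for i in range(4001))
```

In higher dimensions or with many clusters, summing in log space (as in a log-sum-exp) is the safer choice.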
$\pi_j$ is the probability assigned by the Gaussian $\mathcal{N}(\mu, Q)$ to the region of k-space in which the jth coordinate is larger than all the others.
Gaussian Mixture Models
$$\pi_j = P(x_\bullet = e_j)$$
[Figure panels: Marginal Distribution p(y); Joint Distribution p(y,x)]
Graphical Model: Discrete-State Dynamic Models
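Generative Model
$$x_{t+1} = \mathrm{WTA}[A x_t + w_t] = \mathrm{WTA}[A x_t + w_\bullet]$$
$$y_t = C x_t + v_t = C x_t + v_\bullet$$
Additive Gaussian Noise
$$w_\bullet \sim \mathcal{N}(\mu, Q), \qquad v_\bullet \sim \mathcal{N}(0, R)$$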
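The discrete-state dynamics x_{t+1} = WTA[A x_t + w] can be simulated directly. In this sketch Q = I is assumed, and `wta`, `sample_chain`, and the matrix values are hypothetical; a large diagonal in A makes the current state very likely to win again, which plays the role of a sticky transition matrix in an HMM.

```python
import random

def wta(v):
    """Winner Takes All: 1 at the largest coordinate of v, 0 elsewhere."""
    top = max(range(len(v)), key=lambda i: v[i])
    return [1 if i == top else 0 for i in range(len(v))]

def sample_chain(A, mu, x1, T, rng):
    """Hidden state sequence from x_{t+1} = WTA[A x_t + w], w ~ N(mu, I)."""
    xs = [list(x1)]
    for _ in range(T - 1):
        x = xs[-1]
        # A x_t + w, where w has mean mu and unit variance per coordinate.
        pre = [sum(a * xi for a, xi in zip(row, x)) + m + rng.gauss(0.0, 1.0)
               for row, m in zip(A, mu)]
        xs.append(wta(pre))
    return xs

rng = random.Random(0)
sticky = sample_chain([[20.0, 0.0], [0.0, 20.0]], mu=[0.0, 0.0],
                      x1=[1, 0], T=50, rng=rng)
mixing = sample_chain([[0.0, 0.0], [0.0, 0.0]], mu=[0.0, 0.0],
                      x1=[1, 0], T=50, rng=rng)
```

With A = 0 the chain loses its memory and each state is drawn independently, which recovers the static mixture model.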
Independent Component Analysis
• ICA can be seen as a linear generative model with non-gaussian priors for the hidden variables or as a nonlinear generative model with gaussian priors for the hidden variables.
The gradient learning rule to increase the likelihood:
$$\Delta W \propto W^{-T} + f(W y_\bullet)\, y_\bullet^T$$
$$f(x) = \frac{d \log p_x(x)}{dx}$$
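One evaluation of the update direction W^{-T} + f(W y) y^T can be written out for the 2 x 2 case. The sketch assumes the prior p_x(x) = 1 / (pi cosh x), for which f(x) = d log p_x(x) / dx = -tanh(x); that choice of prior and the function names are assumptions made for this illustration.

```python
import math

def inv_transpose_2x2(W):
    """Return W^{-T} for a 2 x 2 matrix W given as a list of rows."""
    (a, b), (c, d) = W
    det = a * d - b * c
    # W^{-1} = [[d, -b], [-c, a]] / det, so its transpose is:
    return [[d / det, -c / det], [-b / det, a / det]]

def ica_gradient_2x2(W, y):
    """W^{-T} + f(W y) y^T for a single observation y, with f(x) = -tanh(x)."""
    u = [W[0][0] * y[0] + W[0][1] * y[1],
         W[1][0] * y[0] + W[1][1] * y[1]]
    f = [-math.tanh(ui) for ui in u]
    w_inv_t = inv_transpose_2x2(W)
    return [[w_inv_t[i][j] + f[i] * y[j] for j in range(2)] for i in range(2)]

identity = [[1.0, 0.0], [0.0, 1.0]]
grad = ica_gradient_2x2(identity, [1.0, 0.0])
```

In practice the direction would be averaged over many observations and scaled by a learning rate before updating W.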
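Generative Model
$$A = 0: \quad x_\bullet = g(w_\bullet)$$
$$y_\bullet = C x_\bullet + v_\bullet$$
$$w_\bullet \sim \mathcal{N}(0, Q), \qquad v_\bullet \sim \mathcal{N}(0, R)$$
g(.) is a general nonlinearity that is invertible and differentiable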
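$$g(w) = \ln\!\left(\tan\!\left(\frac{\pi}{4}\left(1 + \operatorname{erf}\!\left(\frac{w}{\sqrt{2}}\right)\right)\right)\right)$$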
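The nonlinearity g(w) = ln(tan((pi/4)(1 + erf(w / sqrt(2))))) maps a standard Gaussian variable to one with density 1 / (pi cosh x). A short check of that claim, using only the standard library: the comparison pushes w through g and confirms that the target cumulative distribution, (2/pi) arctan(e^x), matches the Gaussian one. The helper names are illustrative.

```python
import math

def g(w):
    """ICA nonlinearity: ln(tan(pi/4 * (1 + erf(w / sqrt(2)))))."""
    return math.log(math.tan(math.pi / 4.0 * (1.0 + math.erf(w / math.sqrt(2.0)))))

def gaussian_cdf(w):
    """CDF of a standard Gaussian."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

def cosh_prior_cdf(x):
    """CDF of the density 1 / (pi cosh x)."""
    return 2.0 / math.pi * math.atan(math.exp(x))

# If x = g(w), the two CDFs must agree: P(X <= g(w)) = P(W <= w).
gaps = [abs(cosh_prior_cdf(g(w)) - gaussian_cdf(w)) for w in (-2.0, -0.5, 0.0, 1.0, 2.5)]
```

Because tan diverges as its argument approaches pi/2, g should be evaluated only for moderate |w| in floating point.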
Conclusion
Many more potential models!