School of Computer Science
State Space Models
Probabilistic Graphical Models (10-708)
Lecture 13, part II Nov 5th, 2007
Eric Xing
Reading: Jordan Chap. 15, K&F Chapter 19.1-19.3
A road map to more complex dynamic models
[Figure: a grid of models organized by whether the latent variable X and the observation Y are discrete or continuous. Single time slice: mixture model (e.g., mixture of multinomials) for discrete X and Y; mixture model (e.g., mixture of Gaussians) for discrete X and continuous Y; factor analysis for continuous X and Y. Unrolled over a sequence x1, ..., xN with observations y1, ..., yN: HMM (for discrete sequential data, e.g., text); HMM (for continuous sequential data, e.g., speech signal); state space model. Richer structures: factorial HMM and switching SSM.]
State space models (SSM):
A sequential factor analysis, or a continuous-state HMM
[Graphical model: a chain X1 → X2 → ... → XN, with an observation Yt emitted from each Xt.]

$$x_t = A x_{t-1} + G w_t, \qquad w_t \sim \mathcal{N}(0, Q)$$
$$y_t = C x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)$$
$$x_1 \sim \mathcal{N}(0, \Sigma_0)$$

This is a linear dynamical system.

In general,
$$x_t = f(x_{t-1}) + G w_t, \qquad y_t = g(x_t) + v_t$$
where f is an (arbitrary) dynamic model, and g is an (arbitrary) observation model
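As a concrete illustration, here is a minimal NumPy sketch of simulating the linear case above; the function name simulate_lds and the use of a NumPy random generator are illustrative choices, not part of the lecture.

```python
import numpy as np

def simulate_lds(A, G, Q, C, R, x0, T, rng=None):
    """Simulate x_t = A x_{t-1} + G w_t, y_t = C x_t + v_t with Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    xs, ys = [], []
    for _ in range(T):
        w = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)   # process noise w_t ~ N(0, Q)
        v = rng.multivariate_normal(np.zeros(R.shape[0]), R)   # observation noise v_t ~ N(0, R)
        x = A @ x + G @ w                                      # state transition
        ys.append(C @ x + v)                                   # noisy observation
        xs.append(x)
    return np.array(xs), np.array(ys)
```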
LDS for 2D tracking

Dynamics: new position = old position + Δ × velocity + noise (constant velocity model, Gaussian noise).
Observation: project out first two components (we observe Cartesian position of object - linear!)
$$\begin{pmatrix} x_{1,t} \\ x_{2,t} \\ \dot{x}_{1,t} \\ \dot{x}_{2,t} \end{pmatrix} =
\begin{pmatrix} 1 & 0 & \Delta & 0 \\ 0 & 1 & 0 & \Delta \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_{1,t-1} \\ x_{2,t-1} \\ \dot{x}_{1,t-1} \\ \dot{x}_{2,t-1} \end{pmatrix} + \text{noise}$$

$$\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix} =
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x_{1,t} \\ x_{2,t} \\ \dot{x}_{1,t} \\ \dot{x}_{2,t} \end{pmatrix} + \text{noise}$$
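A small sketch of these matrices in NumPy; the sampling interval dt stands in for Δ, and plugging A and C into the simulation sketch above is an assumed usage, not from the lecture.

```python
import numpy as np

dt = 1.0                                   # assumed sampling interval Δ
A = np.array([[1, 0, dt, 0],               # new position = old position + Δ·velocity
              [0, 1, 0, dt],
              [0, 0, 1,  0],               # velocity persists; noise perturbs it
              [0, 0, 0,  1]], dtype=float)
C = np.array([[1, 0, 0, 0],                # observe only the Cartesian position
              [0, 1, 0, 0]], dtype=float)
```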
The inference problem 1
Filtering: given $y_1, \ldots, y_t$, estimate $x_t$, i.e., compute $P(x_t \mid y_{1:t})$.

The Kalman filter is a way to perform exact online inference (sequential Bayesian updating) in an LDS. It is the Gaussian analog of the forward algorithm for HMMs:
$$\alpha_t^i \equiv p(X_t = i \mid y_{1:t}) \propto p(y_t \mid X_t = i) \sum_j p(X_t = i \mid X_{t-1} = j)\, \alpha_{t-1}^j$$
The inference problem 2

Smoothing: given $y_1, \ldots, y_T$, estimate $x_t$ ($t < T$).

The Rauch-Tung-Striebel smoother is a way to perform exact offline inference in an LDS. It is the Gaussian analog of the forwards-backwards (alpha-gamma) algorithm:
$$\gamma_t^i \equiv p(X_t = i \mid y_{1:T}) \propto \alpha_t^i \sum_j \gamma_{t+1}^j\, p(X_{t+1} = j \mid X_t = i)$$
2D tracking
[Figure: filtered 2D tracking trajectories plotted in the (X1, X2) plane.]
Kalman filtering in the brain?
Kalman filtering derivation

Since all CPDs are linear Gaussian, the system defines a large multivariate Gaussian. Hence all marginals are Gaussian. Hence we can represent the belief state $p(X_t \mid y_{1:t})$ as a Gaussian with
mean $\hat{x}_{t|t} \equiv E[X_t \mid y_1, \ldots, y_t]$ and
covariance $P_{t|t} \equiv \mathrm{Cov}[X_t \mid y_1, \ldots, y_t]$.
Hence, instead of marginalization for message passing, we will directly estimate the means and covariances of the required marginals. It is common to work with the inverse covariance (precision) matrix $P_{t|t}^{-1}$; this is called information form.
Kalman filtering derivation

Kalman filtering is a recursive procedure to update the belief state:
Predict step: compute p(Xt+1|y1:t) from prior belief p(Xt|y1:t) and dynamical model p(Xt+1|Xt) --- time update
Update step: compute new belief p(Xt+1|y1:t+1) from prediction p(Xt+1|y1:t), observation yt+1 and observation model p(yt+1|Xt+1) --- measurement update
Predict step

Dynamical model:
$$x_{t+1} = A x_t + G w_t, \qquad w_t \sim \mathcal{N}(0, Q)$$

One-step-ahead prediction of the state:
$$\hat{x}_{t+1|t} = A\hat{x}_{t|t}, \qquad P_{t+1|t} = A P_{t|t} A^T + G Q G^T$$

Observation model:
$$y_t = C x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)$$

One-step-ahead prediction of the observation:
$$\hat{y}_{t+1|t} = C\hat{x}_{t+1|t}, \qquad \mathrm{Cov}(y_{t+1} \mid y_{1:t}) = C P_{t+1|t} C^T + R$$
Update step

Summarizing the results from the previous slide, we have $p(X_{t+1}, Y_{t+1} \mid y_{1:t}) \sim \mathcal{N}(m_{t+1}, V_{t+1})$, where
$$m_{t+1} = \begin{pmatrix} \hat{x}_{t+1|t} \\ C\hat{x}_{t+1|t} \end{pmatrix}, \qquad
V_{t+1} = \begin{pmatrix} P_{t+1|t} & P_{t+1|t} C^T \\ C P_{t+1|t} & C P_{t+1|t} C^T + R \end{pmatrix}$$

Remember the formulas for conditional Gaussian distributions: if
$$p\!\left(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right) =
\mathcal{N}\!\left(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \,\middle|\,
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right),$$
then
$$p(x_2) = \mathcal{N}(x_2 \mid m_2, V_2), \qquad m_2 = \mu_2, \qquad V_2 = \Sigma_{22},$$
$$p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid m_{1|2}, V_{1|2}), \qquad
m_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad
V_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$
Kalman Filter

Measurement updates:
$$\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}\,(y_{t+1} - C\hat{x}_{t+1|t})$$
$$P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}$$
where $K_{t+1}$ is the Kalman gain matrix:
$$K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}$$
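Putting the time update and the measurement update together, here is a minimal NumPy sketch of one filter recursion; kalman_step and its argument names are illustrative, not from the lecture.

```python
import numpy as np

def kalman_step(x, P, y, A, G, Q, C, R):
    """One Kalman filter recursion: time update followed by measurement update.

    x, P : posterior mean and covariance of the state at time t, given y_{1:t}
    y    : new observation y_{t+1}
    Returns the posterior mean and covariance at time t+1, given y_{1:t+1}.
    """
    # Predict step (time update): push the belief through the dynamics.
    x_pred = A @ x                          # x̂_{t+1|t} = A x̂_{t|t}
    P_pred = A @ P @ A.T + G @ Q @ G.T      # P_{t+1|t} = A P_{t|t} Aᵀ + G Q Gᵀ

    # Update step (measurement update): correct with the new observation.
    S = C @ P_pred @ C.T + R                # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)     # Kalman gain K_{t+1}
    x_new = x_pred + K @ (y - C @ x_pred)   # mean update with the innovation
    P_new = P_pred - K @ C @ P_pred         # covariance update
    return x_new, P_new
```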
Example of KF in 1D

Consider noisy observations of a 1D particle doing a random walk:
$$x_t = x_{t-1} + w_t, \quad w_t \sim \mathcal{N}(0, \sigma_x), \qquad z_t = x_t + v_t, \quad v_t \sim \mathcal{N}(0, \sigma_z)$$

KF equations (here $A = G = C = 1$, and $\sigma_t$ denotes $P_{t|t}$):
$$\hat{x}_{t+1|t} = A\hat{x}_{t|t} = \hat{x}_{t|t}$$
$$P_{t+1|t} = A P_{t|t} A^T + G Q G^T = \sigma_t + \sigma_x$$
$$\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}\,(z_{t+1} - C\hat{x}_{t+1|t}) = \frac{(\sigma_t + \sigma_x)\, z_{t+1} + \sigma_z\, \hat{x}_{t|t}}{\sigma_t + \sigma_x + \sigma_z}$$
$$P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t} = \frac{(\sigma_t + \sigma_x)\, \sigma_z}{\sigma_t + \sigma_x + \sigma_z}$$
where
$$K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1} = \frac{\sigma_t + \sigma_x}{\sigma_t + \sigma_x + \sigma_z}.$$
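A small numeric sketch of this 1D case, with assumed noise levels sigma_x and sigma_z and an assumed initial belief:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_x, sigma_z = 0.1, 1.0              # process and observation noise variances (assumed)
T = 50

# Simulate the random walk and its noisy observations.
x = np.cumsum(rng.normal(0, np.sqrt(sigma_x), T))
z = x + rng.normal(0, np.sqrt(sigma_z), T)

# Scalar Kalman filter: sigma_t plays the role of P_{t|t}.
x_hat, sigma_t = 0.0, 1.0
estimates = []
for t in range(T):
    sigma_pred = sigma_t + sigma_x                   # P_{t+1|t}
    K = sigma_pred / (sigma_pred + sigma_z)          # Kalman gain
    x_hat = x_hat + K * (z[t] - x_hat)               # mean update
    sigma_t = sigma_pred - K * sigma_pred            # covariance update
    estimates.append(x_hat)
```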
KF intuition

The KF update of the mean is
$$\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}\,(z_{t+1} - C\hat{x}_{t+1|t}) = \frac{(\sigma_t + \sigma_x)\, z_{t+1} + \sigma_z\, \hat{x}_{t|t}}{\sigma_t + \sigma_x + \sigma_z}$$

The term $(z_{t+1} - C\hat{x}_{t+1|t})$ is called the innovation.

The new belief is a convex combination of the updates from the prior and the observation, weighted by the Kalman gain matrix:
$$K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1} = \frac{\sigma_t + \sigma_x}{\sigma_t + \sigma_x + \sigma_z}$$

If the observation is unreliable, $\sigma_z$ (i.e., $R$) is large, so $K_{t+1}$ is small and we pay more attention to the prediction. If the old prior is unreliable (large $\sigma_t$) or the process is very unpredictable (large $\sigma_x$), we pay more attention to the observation.
KF, RLS and LMS

The KF update of the mean is
$$\hat{x}_{t+1|t+1} = A\hat{x}_{t|t} + K_{t+1}\,(y_{t+1} - C\hat{x}_{t+1|t})$$

Consider the special case where the hidden state is a constant, $x_t = \theta$, but the "observation matrix" $C$ is a time-varying vector, $C = x_t^T$. Hence the observation model at each time slice,
$$y_t = x_t^T \theta + v_t,$$
is a linear regression.

We can estimate $\theta$ recursively using the Kalman filter:
$$\hat{\theta}_{t+1} = \hat{\theta}_t + P_{t+1|t+1} R^{-1}\, x_{t+1}\,(y_{t+1} - x_{t+1}^T \hat{\theta}_t)$$
This is called the recursive least squares (RLS) algorithm.

We can approximate $P_{t+1|t+1} R^{-1} \approx \eta_{t+1}$ by a scalar constant. This is called the least mean squares (LMS) algorithm. We can adapt $\eta_t$ online using stochastic approximation theory.
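A minimal sketch of both special cases in NumPy; the function names rls and lms, the prior variance, and the fixed step size are assumptions for illustration.

```python
import numpy as np

def rls(X, y, R=1.0, prior_var=1e3):
    """Recursive least squares: the KF specialized to a constant hidden state theta.

    X : (T, d) regressors x_t; y : (T,) responses y_t; R : observation noise variance.
    """
    d = X.shape[1]
    theta, P = np.zeros(d), np.eye(d) * prior_var     # broad Gaussian prior on theta
    for x_t, y_t in zip(X, y):
        S = x_t @ P @ x_t + R                         # innovation variance
        K = P @ x_t / S                               # Kalman gain (a d-vector here)
        theta = theta + K * (y_t - x_t @ theta)       # move along the innovation
        P = P - np.outer(K, x_t) @ P                  # posterior covariance of theta
    return theta

def lms(X, y, eta=0.01):
    """Least mean squares: replace the gain P R^{-1} by a fixed scalar step size eta."""
    theta = np.zeros(X.shape[1])
    for x_t, y_t in zip(X, y):
        theta = theta + eta * x_t * (y_t - x_t @ theta)
    return theta
```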
Complexity of one KF step

Let $X_t \in \mathbb{R}^{N_x}$ and $Y_t \in \mathbb{R}^{N_y}$.

Computing $P_{t+1|t} = A P_{t|t} A^T + G Q G^T$ takes $O(N_x^3)$ time, assuming dense $P$ and dense $A$.

Computing $K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}$ takes $O(N_y^3)$ time.

So the overall time is, in general, $\max\{N_x^3, N_y^3\}$.
The inference problem 2

Smoothing: given $y_1, \ldots, y_T$, estimate $x_t$ ($t < T$).

The Rauch-Tung-Striebel smoother is a way to perform exact offline inference in an LDS. It is the Gaussian analog of the forwards-backwards (alpha-gamma) algorithm:
$$\gamma_t^i \equiv p(X_t = i \mid y_{1:T}) \propto \alpha_t^i \sum_j \gamma_{t+1}^j\, p(X_{t+1} = j \mid X_t = i)$$
RTS smoother derivation

Smoothing: given $y_1, \ldots, y_T$, estimate $P(x_t \mid y_{1:T})$ ($t < T$).

Step 1: the joint distribution of $x_t$ and $x_{t+1}$ conditioned on $y_{1:t}$.

Use the dynamical model $x_{t+1} = A x_t + G w_t, \; w_t \sim \mathcal{N}(0, Q)$.
RTS smoother derivation

Following the results from the previous slide, we need to derive $p(X_{t+1}, X_t \mid y_{1:t}) \sim \mathcal{N}(m, V)$, where
$$m = \begin{pmatrix} \hat{x}_{t|t} \\ \hat{x}_{t+1|t} \end{pmatrix}, \qquad
V = \begin{pmatrix} P_{t|t} & P_{t|t} A^T \\ A P_{t|t} & P_{t+1|t} \end{pmatrix}.$$
All the quantities here are available after a forward KF pass.

Remember the formulas for conditional Gaussian distributions (as on the update-step slide):
$$p(x_2) = \mathcal{N}(x_2 \mid \mu_2, \Sigma_{22}), \qquad
p(x_1 \mid x_2) = \mathcal{N}\!\left(x_1 \,\middle|\, \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).$$

Applying them here gives the smoother gain
$$L_t = P_{t|t} A^T P_{t+1|t}^{-1}.$$
RTS smoother derivation
Step 2: compute $\hat{x}_{t|T} = E[x_t \mid y_{1:T}]$ using the results above. Use the tower property $E[X \mid Z] = E[\,E[X \mid Y, Z] \mid Z\,]$.
RTS derivation

Repeat the same process for the variance.
Refer to Jordan chapter 15
The RTS smoother results:
$$\hat{x}_{t|T} = \hat{x}_{t|t} + L_t\,(\hat{x}_{t+1|T} - \hat{x}_{t+1|t})$$
$$P_{t|T} = P_{t|t} + L_t\,(P_{t+1|T} - P_{t+1|t})\,L_t^T$$
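A minimal backward-pass sketch in NumPy, assuming the filtered and one-step-ahead predicted quantities were stored during the forward KF pass; rts_smooth and its argument layout are illustrative.

```python
import numpy as np

def rts_smooth(x_filt, P_filt, x_pred, P_pred, A):
    """Backward RTS pass given the stored forward KF quantities.

    x_filt[t], P_filt[t] : filtered x̂_{t|t}, P_{t|t}, for t = 0..T-1
    x_pred[t], P_pred[t] : one-step predictions x̂_{t+1|t}, P_{t+1|t}, for t = 0..T-2
    Returns the smoothed means and covariances x̂_{t|T}, P_{t|T}.
    """
    T = len(x_filt)
    x_smooth, P_smooth = list(x_filt), list(P_filt)
    for t in range(T - 2, -1, -1):
        L = P_filt[t] @ A.T @ np.linalg.inv(P_pred[t])        # smoother gain L_t
        x_smooth[t] = x_filt[t] + L @ (x_smooth[t + 1] - x_pred[t])
        P_smooth[t] = P_filt[t] + L @ (P_smooth[t + 1] - P_pred[t]) @ L.T
    return x_smooth, P_smooth
```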
Learning SSMs

Complete log likelihood:
$$\ell_c(\theta; D) = \sum_n \log p(x_n, y_n)
= \sum_n \log p(x_{n,1}) + \sum_n \sum_t \log p(x_{n,t} \mid x_{n,t-1}) + \sum_n \sum_t \log p(y_{n,t} \mid x_{n,t})$$
$$= f_1\big(x_1; \Sigma_0\big) + f_2\big(\{x_t x_t^T,\, x_t x_{t-1}^T,\, x_t : \forall t\};\; A, Q, G\big) + f_3\big(\{x_t x_t^T,\, x_t : \forall t\};\; C, R\big)$$

EM

E-step: compute the expected sufficient statistics
$$\langle X_t \rangle, \quad \langle X_t X_t^T \rangle, \quad \langle X_t X_{t-1}^T \rangle$$
given $y_1, \ldots, y_T$. These quantities can be inferred via the KF and RTS smoother, e.g.,
$$\langle X_t X_t^T \rangle \equiv \mathrm{var}(X_t) + E(X_t)\,E(X_t)^T = P_{t|T} + \hat{x}_{t|T}\,\hat{x}_{t|T}^T$$

M-step: MLE using the expected complete log likelihood
$$\langle \ell_c(\theta; D) \rangle = f_1\big(\langle X_1 \rangle; \Sigma_0\big) + f_2\big(\{\langle X_t X_t^T \rangle, \langle X_t X_{t-1}^T \rangle, \langle X_t \rangle : \forall t\};\; A, Q, G\big) + f_3\big(\{\langle X_t X_t^T \rangle, \langle X_t \rangle : \forall t\};\; C, R\big)$$
cf. the M-step in factor analysis.
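As one concrete M-step piece, here is a sketch of the update for the dynamics matrix A from the expected sufficient statistics, assuming G = I; the function name and argument layout are illustrative, not from the lecture.

```python
import numpy as np

def m_step_dynamics(Exx, Exx_cross):
    """M-step update for A (a sketch, assuming G = I).

    Exx[t]       : E[x_t x_t^T | y_{1:T}] = P_{t|T} + x̂_{t|T} x̂_{t|T}^T, for t = 0..T-1
    Exx_cross[t] : E[x_{t+1} x_t^T | y_{1:T}], for t = 0..T-2
    MLE: A = (sum_t E[x_{t+1} x_t^T]) (sum_t E[x_t x_t^T])^{-1}
    """
    num = sum(Exx_cross)          # sum of cross second moments
    den = sum(Exx[:-1])           # sum of second moments over t = 0..T-2
    return num @ np.linalg.inv(den)
```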
Nonlinear systems

In robotics and other problems, the motion model and the observation model are often nonlinear:
$$x_t = f(x_{t-1}) + w_t, \qquad y_t = g(x_t) + v_t$$

An optimal closed-form solution to the filtering problem is no longer possible. The nonlinear functions f and g are sometimes represented by neural networks (multi-layer perceptrons or radial basis function networks). The parameters of f and g may be learned offline using EM, where we do gradient descent (back propagation) in the M-step, cf. learning an MRF/CRF with hidden nodes. Or we may learn the parameters online by adding them to the state space: $x_t' = (x_t, \theta)$. This makes the problem even more nonlinear.
Extended Kalman Filter (EKF)

The basic idea of the EKF is to linearize f and g using a first-order Taylor expansion, and then apply the standard KF.
i.e., we approximate a stationary nonlinear system with a non-stationary linear system.
$$x_t = f(\hat{x}_{t-1|t-1}) + A_{\hat{x}_{t-1|t-1}}\,(x_{t-1} - \hat{x}_{t-1|t-1}) + w_t$$
$$y_t = g(\hat{x}_{t|t-1}) + C_{\hat{x}_{t|t-1}}\,(x_t - \hat{x}_{t|t-1}) + v_t$$
where $\hat{x}_{t|t-1} = f(\hat{x}_{t-1|t-1})$, $\; A_{\hat{x}} \stackrel{\text{def}}{=} \left.\frac{\partial f}{\partial x}\right|_{\hat{x}}$, and $\; C_{\hat{x}} \stackrel{\text{def}}{=} \left.\frac{\partial g}{\partial x}\right|_{\hat{x}}$.

The noise covariances ($Q$ and $R$) are not changed, i.e., the additional error due to linearization is not modeled.
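A minimal sketch of one EKF recursion under these approximations; ekf_step and the Jacobian-function arguments are illustrative names, not from the lecture.

```python
import numpy as np

def ekf_step(x, P, y, f, g, F_jac, G_jac, Q, R):
    """One EKF recursion: linearize f and g at the current estimate, then apply the KF.

    f, g         : nonlinear dynamics and observation functions
    F_jac, G_jac : functions returning their Jacobians at a given point
    """
    # Predict: propagate the mean through f, the covariance through the Jacobian A.
    A = F_jac(x)                              # A = ∂f/∂x evaluated at x̂_{t-1|t-1}
    x_pred = f(x)                             # x̂_{t|t-1} = f(x̂_{t-1|t-1})
    P_pred = A @ P @ A.T + Q

    # Update: linearize g around the predicted state.
    C = G_jac(x_pred)                         # C = ∂g/∂x evaluated at x̂_{t|t-1}
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ (y - g(x_pred))      # innovation uses the nonlinear g
    P_new = P_pred - K @ C @ P_pred
    return x_new, P_new
```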
School of Computer Science
Complex Graphical Models
Probabilistic Graphical Models (10-708)
Lecture 14, Nov 5th, 2007
Eric Xing
Reading: K&F Chapter 20.1-20.3
A road map to more complex dynamic models
[Figure: the same road map of dynamic models shown at the start of Lecture 13, part II: mixture models (multinomial and Gaussian), factor analysis, HMMs for discrete and continuous sequential data, the state space model, the factorial HMM, and the switching SSM.]
NLP and Data Mining

We want:
- Semantic-based search: infer topics and categorize documents
- Multimedia inference
- Automatic translation
- Predict how topics evolve
- …
[Figure: research topics evolving over time, 1900-2000.]
The Vector Space Model

Represent each document by a high-dimensional vector in the space of words.
Latent Semantic Indexing

$$X_{(m \times n)} = T_{(m \times k)}\; \Lambda_{(k \times k)}\; D^T_{(k \times n)}$$
where the rows of the term-document matrix $X$ index terms and the columns index documents. Each document is then represented as
$$\vec{w}_d = \sum_{k=1}^K \lambda_k\, d_k\, \vec{T}_k$$
LSA does not define a properly normalized probability distribution of observed and latent entities
Does not support probabilistic reasoning under uncertainty and data fusion
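A minimal sketch of LSI as a rank-k truncated SVD; the function name lsi and the NumPy SVD route are illustrative choices, not from the lecture.

```python
import numpy as np

def lsi(X, k):
    """Latent semantic indexing sketch: rank-k truncated SVD of the term-document matrix.

    X : (m terms x n documents) count matrix.
    Returns T (m x k), the top-k singular values, and D (n x k).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T, lam, D = U[:, :k], s[:k], Vt[:k, :].T
    return T, lam, D

# Each document d is then represented in the k-dimensional latent space,
# w_d ≈ sum_k lambda_k * d_k * T_k.
```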
Latent Semantic Structure
Latent structure $\ell$ → words $w$.

Distribution over words:
$$P(w) = \sum_{\ell} P(w, \ell)$$

Inferring latent structure:
$$P(\ell \mid w) = \frac{P(w \mid \ell)\, P(\ell)}{P(w)}$$

Prediction:
$$P(w_{n+1} \mid \mathbf{w}) = \ldots$$
Admixture Models

Objects are bags of elements.
Mixtures are distributions over elements.
Objects have a mixing vector θ, which represents each mixture's contribution.
An object is generated as follows: pick a mixture component from θ, then pick an element from that component.
Example documents over the vocabulary {money, bank, loan, river, stream}, with each word tagged by its topic assignment, e.g.:
money1 bank1 bank1 loan1 river2 stream2 bank1 money1 river2 bank1 money1 bank1 loan1 money1 stream2 bank1 money1 bank1 bank1 loan1 river2 stream2 bank1 money1 river2 bank1 money1 bank1 loan1 bank1 money1 stream2
Each document has its own mixing vector, e.g., (0.1, 0.1, 0.5, …), (0.1, 0.5, 0.1, …), (0.5, 0.1, 0.1, …).
Topic Models = Admixture Models

Generating a document:

[Plate diagram: a prior generates θ for each of N documents; for each of the Nd words, a topic indicator z is drawn from θ and a word w is drawn from the corresponding topic distribution β, with K topics in total.]
Draw θ from the prior.
For each word n:
- Draw z_n from multinomial(θ)
- Draw w_n | z_n, {β_{1:k}} from multinomial(β_{z_n})
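A minimal sketch of this generative process in NumPy; the Dirichlet choice for the prior anticipates the next slide, and generate_document is an illustrative name.

```python
import numpy as np

def generate_document(alpha, beta, n_words, rng=None):
    """Sample one document from the admixture (topic model) generative process.

    alpha : (K,) Dirichlet prior parameter (an assumed choice of prior on theta)
    beta  : (K, V) matrix whose rows are topic-word multinomials
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.dirichlet(alpha)                                # mixing vector theta from the prior
    z = rng.choice(len(alpha), size=n_words, p=theta)           # topic z_n for each word
    words = [rng.choice(beta.shape[1], p=beta[zn]) for zn in z] # w_n from multinomial(beta_{z_n})
    return words, z, theta
```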
Which prior to use?
Choice of Prior
Dirichlet (LDA) (Blei et al. 2003): a conjugate prior means efficient inference.
Can only capture variations in each topic’s intensity independently
Logistic Normal (CTM=LoNTAM) (Blei & Lafferty 2005, Ahmed & Xing 2006)
Captures the intuition that some topics are highly correlated and can rise up in intensity together. Not a conjugate prior, which implies hard inference.
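To make the contrast concrete, here is a small sketch of drawing a topic-proportion vector from each prior; the covariance Sigma is an assumed example, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3

# Dirichlet prior: draws live on the simplex directly; components vary independently.
theta_dir = rng.dirichlet(alpha=np.ones(K))

# Logistic normal prior: draw a Gaussian with a full covariance, then map to the
# simplex with the softmax; correlations in Sigma let topics rise and fall together.
mu = np.zeros(K)
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
eta = rng.multivariate_normal(mu, Sigma)
theta_ln = np.exp(eta) / np.exp(eta).sum()
```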
Logistic Normal vs. Dirichlet

[Figure: the Dirichlet prior.]

[Figure: the logistic normal prior.]
Mixed Membership Model (M3)

Mixture versus admixture:
A Bayesian mixture model ⇒ a Bayesian admixture model (mixed membership model).
Latent Dirichlet Allocation: M3 in text mining
A document is a bag of words each generated from a randomly selected topic
Population admixture: M3 in genetics
The genetic material of each modern individual is inherited from multiple ancestral populations; each DNA locus may have a different genetic origin …
Ancestral labels may have (e.g., Markovian) dependencies
Inference in Mixed Membership Models
Mixture versus admixture
Inference is very hard in M3, all hidden variables are coupled and not factorizable!
$$p(D) = \sum_{\{z_{m,n}\}} \int\!\!\int \prod_m \left( \prod_n p(x_{m,n} \mid z_{m,n}, \phi)\, p(z_{m,n} \mid \pi_m) \right) p(\pi_m \mid \alpha)\; p(\phi \mid G)\; d\pi_1 \cdots d\pi_M\, d\phi$$

$$p(\pi_m \mid D) \sim \sum_{\{z_{m,n}\}} \int \prod_n p(x_{m,n} \mid z_{m,n}, \phi)\, p(z_{m,n} \mid \pi_m)\; p(\pi_m \mid \alpha)\; p(\phi \mid G)\; d\pi_{-m}\, d\phi$$
Approaches to inference

Exact inference algorithms:
- The elimination algorithm
- The junction tree algorithms

Approximate inference techniques:
- Monte Carlo algorithms: stochastic simulation / sampling methods, Markov chain Monte Carlo methods
- Variational algorithms: belief propagation, variational inference