Hidden Markov Models

Lirong Xia

The "Markov"s we have learned so far

• Markov decision process (MDP)
– the transition probability depends only on the (state, action) pair of the previous step
• Reinforcement learning
– unknown transition probabilities/rewards
• Markov models
• Hidden Markov models

Markov Models

• A Markov model is a chain-structured BN
– conditional probabilities are the same at every time step (stationarity)
– the value of X at a given time is called the state
– as a BN: X1 → X2 → X3 → ⋯
– parameters: the initial distribution p(X1) and the transition probabilities p(Xt|Xt-1)

Computing the stationary distribution

• The stationary distribution p∞ satisfies
  p∞(sun) = p(sun|sun)·p∞(sun) + p(sun|rain)·p∞(rain)
  p∞(rain) = p(rain|sun)·p∞(sun) + p(rain|rain)·p∞(rain)
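A minimal sketch (not from the slides) of solving these fixed-point equations numerically, by repeatedly applying the transition model until the distribution stops changing; the 0.9/0.1 and 0.3/0.7 transition probabilities are illustrative assumptions, not values from the lecture.

```python
# Sketch: stationary distribution of a two-state (sun/rain) Markov chain
# by power iteration: keep applying p(Xt|Xt-1) until p stops changing.
transition = {
    "sun":  {"sun": 0.9, "rain": 0.1},  # p(Xt | Xt-1 = sun), assumed values
    "rain": {"sun": 0.3, "rain": 0.7},  # p(Xt | Xt-1 = rain), assumed values
}

p = {"sun": 0.5, "rain": 0.5}  # any initial distribution works
for _ in range(100):
    p = {x: sum(transition[x_prev][x] * p[x_prev] for x_prev in p)
         for x in p}

print(p)  # fixed point of p(x) = sum over x' of p(x|x') p(x'): sun 0.75, rain 0.25
```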

Hidden Markov Models

• Hidden Markov models (HMMs)
– underlying Markov chain over states X
– effects (observations) at each time step
– can be drawn as a Bayes' net

Example

• An HMM is defined by:
– Initial distribution: p(X1)
– Transitions: p(Xt|Xt-1)
– Emissions: p(Et|Xt)

Transition model p(Rt|Rt-1):
  Rt-1 = t: p(Rt = t) = 0.7
  Rt-1 = f: p(Rt = t) = 0.3

Emission model p(Ut|Rt):
  Rt = t: p(Ut = t) = 0.9
  Rt = f: p(Ut = t) = 0.2
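The three ingredients above can be written down directly as data. A minimal sketch in Python; the uniform initial distribution is an assumption, since the slide leaves p(X1) unspecified:

```python
# The rain/umbrella HMM from the tables above as plain dictionaries.
# States: R (rain) in {"t", "f"}; observations: U (umbrella) in {"t", "f"}.
initial = {"t": 0.5, "f": 0.5}  # p(X1); uniform is an assumption

# transition[r_prev][r] = p(Rt = r | Rt-1 = r_prev)
transition = {"t": {"t": 0.7, "f": 0.3},
              "f": {"t": 0.3, "f": 0.7}}

# emission[r][u] = p(Ut = u | Rt = r)
emission = {"t": {"t": 0.9, "f": 0.1},
            "f": {"t": 0.2, "f": 0.8}}
```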

Filtering / Monitoring

• Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time
• B(Xt) = p(Xt|e1:t)
• We start with B(X) in an initial setting, usually uniform
• As time passes, or we get observations, we update B(X)

Example: Robot Localization

• Sensor model: never more than 1 mistake
• Motion model: may fail to execute the action, with small probability

HMM weather example: a question

[Figure: three-state weather Markov chain over s (sunny), c (cloudy), r (rainy), with its transition probabilities]

• Emission model: p(w|s) = .1, p(w|c) = .3, p(w|r) = .8 (w = your labmate comes in wet)
• You have been stuck in the lab for three days (!)
• On those days, your labmate was dry, wet, wet, respectively
• What is the probability that it is now raining outside?
• p(X3 = r | E1 = d, E2 = w, E3 = w)

Filtering

• Computationally efficient approach: first compute p(X1 = i, E1 = d) for all states i
• Then recurse: p(Xt, e1:t) = p(et|Xt) · Σxt-1 p(xt-1, e1:t-1) p(Xt|xt-1)

[Same weather chain and emission model as on the previous slide]

• Formal algorithm for filtering (a code sketch follows below)
– Elapse of time: compute p(Xt+1|e1:t) from p(Xt|e1:t)
– Observe: compute p(Xt+1|e1:t+1) from p(Xt+1|e1:t)
– Renormalization
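A minimal sketch of the elapse/observe/renormalize steps as generic functions; representing beliefs, transitions, and emissions as dictionaries (as in the HMM sketch above) is an assumption of this sketch:

```python
# One round of filtering: elapse of time, then observe and renormalize.
# belief[x] = B(x); transition[x1][x2] = p(x2 | x1); emission[x][e] = p(e | x).

def elapse_time(belief, transition):
    """B'(x') = sum over x of p(x' | x) * B(x)."""
    return {x2: sum(transition[x1][x2] * b for x1, b in belief.items())
            for x2 in belief}

def observe(belief, emission, e):
    """B(x) proportional to p(e | x) * B'(x), then renormalize."""
    unnormalized = {x: emission[x][e] * b for x, b in belief.items()}
    z = sum(unnormalized.values())
    return {x: v / z for x, v in unnormalized.items()}
```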

Today

• Introduction to sampling

Inference Recap: Simple Cases

• Observe (one step): p(X1|e1) = p(X1, e1) / p(e1) ∝ p(X1) p(e1|X1)
• Elapse of time (one step): p(X2) = Σx1 p(x1, X2) = Σx1 p(x1) p(X2|x1)

Elapse of Time

• Assume we have current belief p(Xt-1 | evidence to t-1):
  B(Xt-1) = p(Xt-1|e1:t-1)
• Then, after one time step passes:
  p(Xt|e1:t-1) = Σxt-1 p(Xt|xt-1) p(xt-1|e1:t-1)
• Or, compactly:
  B'(Xt) = Σxt-1 p(Xt|xt-1) B(xt-1)
• With the "B" notation, be careful about
– what time step t the belief is about
– what evidence it includes

Observe and renormalization

• Assume we have current belief p(Xt | previous evidence):
  B'(Xt) = p(Xt|e1:t-1)
• Then:
  p(Xt|e1:t) ∝ p(et|Xt) p(Xt|e1:t-1)
• Or:
  B(Xt) ∝ p(et|Xt) B'(Xt)

• Basic idea: beliefs reweighted by likelihood of evidence

• Need to renormalize B(Xt)

Recap: The Forward Algorithm

• We are given evidence at each time step and want to know B(Xt) = p(Xt|e1:t)
• We can derive the following updates:

  p(xt|e1:t) ∝ p(xt, e1:t)
             = Σxt-1 p(xt-1, xt, e1:t)
             = Σxt-1 p(xt-1, e1:t-1) p(xt|xt-1) p(et|xt)
             = p(et|xt) · Σxt-1 p(xt|xt-1) p(xt-1, e1:t-1)

• We can normalize as we go if we want to have p(x|e) at each time step, or just once at the end
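Putting the recursion together, a minimal sketch of the full forward pass; it normalizes at every step, which the note above allows:

```python
# Forward algorithm sketch: maintain B(Xt) = p(Xt | e1:t) left to right.
# initial[x] = p(X1 = x); transition[x1][x2] = p(x2 | x1);
# emission[x][e] = p(e | x); evidence is the list [e1, ..., et].

def forward(initial, transition, emission, evidence):
    f = {x: initial[x] * emission[x][evidence[0]] for x in initial}
    z = sum(f.values())
    f = {x: v / z for x, v in f.items()}          # B(X1)
    for e in evidence[1:]:
        # elapse of time and observe in one pass, then renormalize
        f = {x2: emission[x2][e] * sum(transition[x1][x2] * f[x1] for x1 in f)
             for x2 in f}
        z = sum(f.values())
        f = {x: v / z for x, v in f.items()}
    return f
```

On the umbrella HMM sketched earlier, forward(initial, transition, emission, ["t", "t"]) returns B(R2) = p(R2 | +u1, +u2).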

Example HMM

Transition model p(Rt|Rt-1):
  Rt-1 = t: p(Rt = t) = 0.7
  Rt-1 = f: p(Rt = t) = 0.3

Emission model p(Ut|Rt):
  Rt = t: p(Ut = t) = 0.9
  Rt = f: p(Ut = t) = 0.2

Observe and time elapse

• Want to know B(Rain2) = p(Rain2 | +u1, +u2)
• Alternate the two steps: observe, then time elapse and renormalize
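Working this out on the example HMM above, assuming a uniform initial belief B(R1) = <0.5, 0.5> (the transcript does not state p(X1)):

• Observe +u1: B(R1) ∝ <0.9×0.5, 0.2×0.5> = <0.45, 0.10>, which normalizes to ≈ <0.818, 0.182>
• Elapse of time: B'(R2) = <0.7×0.818 + 0.3×0.182, 0.3×0.818 + 0.7×0.182> ≈ <0.627, 0.373>
• Observe +u2 and renormalize: B(R2) ∝ <0.9×0.627, 0.2×0.373> ≈ <0.564, 0.075>, giving B(Rain2) ≈ <0.883, 0.117>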

Online Belief Updates

• Each time step, we start with the current belief p(Xt-1 | previous evidence)
• Elapse of time:
  B'(Xt) = Σxt-1 p(Xt|xt-1) B(xt-1)
• Observe:
  B(Xt) ∝ p(et|Xt) B'(Xt)
• Renormalize B(Xt)
• Problem: space is |X| and time is |X|² per time step
– what if the state is continuous?

Real-world robot localization

Continuous probability space

Sampling

Approximate Inference

• Sampling is a hot topic in machine learning, and it's really simple
• Basic idea:
– draw N samples from a sampling distribution S
– compute an approximate posterior probability
– show this converges to the true probability P
• Why sample?
– Learning: get samples from a distribution you don't know
– Inference: getting a sample is faster than computing the right answer (e.g., with variable elimination)

Prior Sampling

p(C):
  +c: 0.5
  -c: 0.5

p(S|C):
  +c: +s 0.1, -s 0.9
  -c: +s 0.5, -s 0.5

p(R|C):
  +c: +r 0.8, -r 0.2
  -c: +r 0.2, -r 0.8

p(W|S,R):
  +s, +r: +w 0.99, -w 0.01
  +s, -r: +w 0.90, -w 0.10
  -s, +r: +w 0.90, -w 0.10
  -s, -r: +w 0.01, -w 0.99

Samples:
  +c, -s, +r, +w
  -c, +s, -r, +w

Prior Sampling (w/o evidence)

• This process generates samples with probability
  S_PS(x1, …, xn) = Π i=1..n p(xi | Parents(Xi)) = p(x1, …, xn)
  i.e., the BN's joint probability
• Let N_PS(x1, …, xn) be the number of samples of the event (x1, …, xn)
• Then lim N→∞ N_PS(x1, …, xn)/N = S_PS(x1, …, xn) = p(x1, …, xn)
• I.e., the sampling procedure is consistent (a code sketch follows below)
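A minimal sketch of prior sampling for this network: draw each variable in topological order, conditioned on the values already sampled for its parents. The helper names are illustrative, not from the slides:

```python
import random

def bernoulli(p_true):
    return random.random() < p_true

def prior_sample():
    c = bernoulli(0.5)                 # p(+c) = 0.5
    s = bernoulli(0.1 if c else 0.5)   # p(+s | C)
    r = bernoulli(0.8 if c else 0.2)   # p(+r | C)
    w = bernoulli({(True, True): 0.99, (True, False): 0.90,   # p(+w | S, R)
                   (False, True): 0.90, (False, False): 0.01}[(s, r)])
    return c, s, r, w

# Each call is one joint sample; e.g., estimate p(+w) by counting:
n = 10000
print(sum(prior_sample()[3] for _ in range(n)) / n)
```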

Example

• We'll get a bunch of samples from the BN:
  +c, -s, +r, +w
  +c, +s, +r, +w
  -c, +s, +r, -w
  +c, -s, +r, +w
  -c, -s, -r, +w
• If we want to know p(W):
– we have counts <+w: 4, -w: 1>
– normalize to get p(W) = <+w: 0.8, -w: 0.2>
– this will get closer to the true distribution with more samples
– can estimate anything else, too
– what about p(C|+w)? p(C|+r,+w)? p(C|-r,-w)?
– fast: can use fewer samples if less time (what's the drawback?)
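The tallying step on exactly these five samples, as a small sketch:

```python
from collections import Counter

# The five samples above as (c, s, r, w) tuples of booleans.
samples = [(True, False, True, True), (True, True, True, True),
           (False, True, True, False), (True, False, True, True),
           (False, False, False, True)]

counts = Counter(w for _, _, _, w in samples)     # <+w: 4, -w: 1>
total = sum(counts.values())
print({w: c / total for w, c in counts.items()})  # {True: 0.8, False: 0.2}
```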

Rejection Sampling

• Let's say we want p(C)
– no point keeping all samples around
– just tally counts of C as we go
• Let's say we want p(C|+s)
– same thing: tally C outcomes, but ignore (reject) samples which don't have S = +s
– this is called rejection sampling (sketched below)
– it is also consistent for conditional probabilities (i.e., correct in the limit)

Samples:
  +c, -s, +r, +w
  +c, +s, +r, +w
  -c, +s, +r, -w
  +c, -s, +r, +w
  -c, -s, -r, +w
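A sketch of the rejection step for p(C|+s), reusing the prior_sample() sketch from above:

```python
# Rejection sampling for p(C | +s): draw prior samples, discard those
# inconsistent with the evidence, and tally C over the survivors.
def rejection_sampling(prior_sample, n):
    counts = {True: 0, False: 0}
    for _ in range(n):
        c, s, r, w = prior_sample()
        if not s:            # reject: disagrees with evidence S = +s
            continue
        counts[c] += 1
    kept = sum(counts.values())
    return {c: k / kept for c, k in counts.items()}
```

When the evidence is unlikely, kept is a small fraction of n, which is exactly the problem the next slide raises.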

Likelihood Weighting

• Problem with rejection sampling:
– if evidence is unlikely, you reject a lot of samples
– you don't exploit your evidence as you sample
– consider p(B|+a); prior samples: -b,-a  -b,-a  -b,-a  -b,-a  +b,+a
• Idea: fix evidence variables and sample the rest; samples: -b,+a  -b,+a  -b,+a  -b,+a  +b,+a
• Problem: the sample distribution is not consistent!
• Solution: weight each sample by the probability of the evidence given its parents

Likelihood Weighting

[Same Bayes' net and CPTs as on the Prior Sampling slide; evidence fixed to S = +s and W = +w]

Samples:
  +c, +s, +r, +w
  …

w = 0.1 × 0.99 (= p(+s|+c) × p(+w|+s,+r))

Likelihood Weighting

• Sampling distribution if z is sampled and the evidence e is fixed:
  S_WS(z, e) = Π i=1..l p(zi | Parents(Zi))
• Now, samples have weights:
  w(z, e) = Π i=1..m p(ei | Parents(Ei))
• Together, the weighted sampling distribution is consistent:
  S_WS(z, e) · w(z, e) = Π i=1..l p(zi | Parents(Zi)) · Π i=1..m p(ei | Parents(Ei)) = p(z, e)
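A minimal sketch of likelihood weighting for the query p(C | +s, +w) on the network from the Prior Sampling slide: both evidence variables are fixed rather than sampled, and each sample carries the weight derived above:

```python
import random

def weighted_sample():
    # Sample only the non-evidence variables C and R; S and W are fixed.
    c = random.random() < 0.5                   # C ~ p(C)
    weight = 0.1 if c else 0.5                  # w *= p(+s | C)
    r = random.random() < (0.8 if c else 0.2)   # R ~ p(R | C)
    weight *= 0.99 if r else 0.90               # w *= p(+w | +s, R)
    return c, weight

def estimate_p_c(n=10000):
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        c, w = weighted_sample()
        totals[c] += w
    z = sum(totals.values())
    return {c: t / z for c, t in totals.items()}

print(estimate_p_c())  # weighted estimate of p(C | +s, +w)
```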

Ghostbusters HMM

– p(X1) = uniform
– p(X|X') = usually move clockwise, but sometimes move in a random direction or stay in place
– p(Rij|X) = same sensor model as before: red means close, green means far away

p(X1):
  1/9 1/9 1/9
  1/9 1/9 1/9
  1/9 1/9 1/9

p(X|X' = <1,2>):
  1/6 1/6 1/2
  0   1/6 0
  0   0   0

Example: Passage of Time

• As time passes, uncertainty "accumulates":
  B'(X') = Σx p(X'|x) B(x)
• Transition model: ghosts usually go clockwise

[Figure: belief maps at T = 1, T = 2, T = 5]

Example: Observation

• As we get observations, beliefs get reweighted and uncertainty "decreases":
  B(X) ∝ p(e|X) B'(X)

[Figure: belief maps before and after an observation]