Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far....

31
Lirong Xia Hidden Markov Models

Transcript of Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far....

Page 1: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Lirong Xia

Hidden Markov Models

Page 2: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

• Markov decision process (MDP)– transition probability only depends on

(state,action) in the previous step• Reinforcement learning

– unknown probability/rewards• Markov models

• Hidden Markov models

1

The “Markov”s we have learned so far

Page 3: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Markov Models

2

• A Markov model is a chain-structured BN– Conditional probabilities are the same (stationarity)– Value of X at a given time is called the state– As a BN:

– Parameters: called transition probabilities

p(X1) p(X|X-1)

Page 4: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

• p(X=sun)=p(X=sun|X-1=sun)p(X=sun)+p(X=sun|X-1=rain)p(X=rain)

• p(X=rain)=p(X=rain|X-1=sun)p(X=sun)+p(X=rain|X-1=rain)p(X=rain)

3

Computing the stationary distribution

Page 5: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Hidden Markov Models

4

• Hidden Markov models (HMMs)– Underlying Markov chain over state X– Effects (observations) at each time step– As a Bayes’ net:

Page 6: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Example

5

• An HMM is defined by:– Initial distribution: p(X1)– Transitions: p(X|X-1)– Emissions: p(E|X)

Rt-1 p(Rt)t 0.7f 0.3

Rt p(Ut)t 0.9f 0.2

Page 7: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Filtering / Monitoring

6

• Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time

• B(Xt) = p(Xt|e1:t)• We start with B(X) in an initial setting, usually

uniform• As time passes, or we get observations, we update

B(X)

Page 8: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Example: Robot Localization

7

Sensor model: never more than 1 mistakeMotion model: may not execute action with small prob.

Page 9: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

HMM weather example: a question

s

c r

.1

.2

.6

.3.4

.3

.3

.5

.3

• You have been stuck in the lab for three days (!)• On those days, your labmate was dry, wet, wet,

respectively• What is the probability that it is now raining outside?• p(X3 = r | E1 = d, E2 = w, E3 = w)

p(w|s) = .1 p(w|c) = .3 p(w|r)

= .8

Page 10: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Filtering

• Computationally efficient approach: first computep(X1 = i, E1 = d) for all states i

• p(Xt, e1:t) = p(et | Xt)Σxt-1 p(xt-1, e1:t-1) p(Xt | xt-1)

s

c r

.1

.2

.6

.3.4

.3

.3

.5

.3

p(w|s) = .1 p(w|c) = .3 p(w|r)

= .8

Page 11: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

• Formal algorithm for filtering– Elapse of time

• compute p(Xt+1|Xt,e1:t) from p(Xt|e1:t)

– Observe• compute p(Xt+1|e1:t+1) from p(Xt+1|e1:t)

– Renormalization

• Introduction to sampling

10

Today

Page 12: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Inference Recap: Simple Cases

11

( ) ( ) ( )1

1 1 1 1 1

1 1

1 1 1

| , ( , ) = ( ) ( | )

X

p X e p X e p ep x e

p x p e x

( )1 1|p X e ( )2p X

( ) ( )( ) ( )

1

1

2 1 2

1 2 1

,

= |x

x

p x p x x

p x p x x

=åå

Page 13: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Elapse of Time

12

• Assume we have current belief p(Xt-1|evidence to t-1)

B(Xt-1)=p(Xt-1|e1:t-1)• Then, after one time step passes:

p(Xt|e1:t-1)=Σxt-1p(Xt|xt-1)p(Xt-1|e1:t-1)• Or, compactly

B’(Xt)=Σxt-1p(Xt|xt-1)B(xt-1)• With the “B” notation, be careful about

– what time step t the belief is about, – what evidence it includes

Page 14: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Observe and renormalization

13

• Assume we have current belief p(Xt| previous evidence):

B’(Xt)=p(Xt|e1:t-1)

• Then: p(Xt|e1:t)∝p(et|Xt)p(Xt|e1:t-1)

• Or:B(Xt) ∝p(et|Xt)B’(Xt)

• Basic idea: beliefs reweighted by likelihood of evidence

• Need to renormalize B(Xt)

Page 15: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Recap: The Forward Algorithm

14

• We are given evidence at each time and want to know

• We can derive the following updates

B Xt( ) = p X t | e1:t( )

( )( )

( ) ( ) ( )

( ) ( ) ( )

1

1

1

1:

1 1:

1

1:

1 1: 1

1 1:1 1

( , )

,

| , ,

| |

= | | ,

t

t

t

t t X

t t tx

t t t tx

t

t t

t t

t tt t tx

p x ep x x e

p x x p e x

p e x p x x

p x e

p x e

p x e

-

-

-

- -

-

-

-

- -

µ=

=

åå

å

We can normalize as we go if we want

to have p(x|e) at each time step, or

just once at the end…

Page 16: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Example HMM

15

Rt-1 p(Rt)t 0.7f 0.3

Rt-1 p(Ut)t 0.9f 0.2

Page 17: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Observe and time elapse

16

Observe

Time elapse and renormalize• Want to know B(Rain2)=p(Rain2|+u1,+u2)

Page 18: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Online Belief Updates

17

• Each time step, we start with p(Xt-1 | previous evidence):

• Elapse of timeB’(Xt)=Σxt-1p(Xt|xt-1)B(xt-1)

• ObserveB(Xt) ∝p(et|Xt)B’(Xt)

• Renormalize B(Xt) • Problem: space is |X| and time is |X|2 per time step

– what if the state is continuous?

Page 19: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

• Real-world robot localization

18

Continuous probability space

Page 20: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

19

Sampling

Page 21: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Approximate Inference

20

• Sampling is a hot topic in machine learning, and it’s really simple

• Basic idea:– Draw N samples from a sampling distribution S– Compute an approximate posterior probability– Show this converges to the true probability P

• Why sample?– Learning: get samples from a distribution you don’t know– Inference: getting a sample is faster than computing the

right answer (e.g. with variable elimination)

Page 22: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Prior Sampling

21

+c +s 0.1-s 0.9

-c +s 0.5-s 0.5

( )|p S C

+c +r 0.8-r 0.2

-c +r 0.2-r 0.8

( )|p R C

+s+r

+w 0.99-w 0.01

-r+w 0.90-w 0.10

-s+r

+w 0.90-w 0.10

-r+w 0.01-w 0.99

( )| ,p W S R

+c 0.5-c 0.5

( )p C

Samples:+c, -s, +r, +w-c, +s, -r, +w

Page 23: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Prior Sampling (w/o evidences)

22

• This process generates samples with probability:

i.e. the BN’s joint probability

• Let the number of samples of an event be

• Then

• I.e., the sampling procedure is consistent

SPS x1xn( ) = p xi | Parents X i( )( )i=1

n

∏ = p x1xn( )

NPS x1xn( )limN→∞

p x1,,xn( ) = limN→∞

NPS x1,,xn( ) / N= SPS x1,,xn( )= p x1,,xn( )

Page 24: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Example

23

• We’ll get a bunch of samples from the BN:+c, -s, +r, +w+c, +s, +r, +w-c, +s, +r, -w+c, -s, +r, +w-c, -s, -r, +w

• If we want to p(W)– We have counts <+w:4, -w:1>– Normalize to get p(W) = <+w:0.8, -w:0.2>– This will get closer to the true distribution with more

samples– Can estimate anything else, too– What about p(C|+w)? p(C|+r,+w)? p(C|-r,-w)?– Fast: can use fewer samples if less time (what’s the

drawback?)

Page 25: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Rejection Sampling

24

• Let’s say we want p(C)– No point keeping all samples around– Just tally counts of C as we go

• Let’s say we want p(C|+s)– Same thing: tally C outcomes, but

ignore (reject) samples which don’t have S=+s

– This is called rejection sampling– It is also consistent for conditional

probabilities (i.e., correct in the limit)

+c, -s, +r, +w+c, +s, +r, +w-c, +s, +r, -w+c, -s, +r, +w-c, -s, -r, +w

Page 26: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Likelihood Weighting

25

• Problem with rejection sampling:– If evidence is unlikely, you reject a lot of samples– You don’t exploit your evidence as you sample – Consider p(B|+a)

• Idea: fix evidence variables and sample the rest

• Problem: sample distribution not consistent!• Solution: weight by probability of evidence given

parents

-b, -a-b, -a-b, -a-b, -a+b, +a

-b, +a-b, +a-b, +a-b, +a+b, +a

Page 27: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Likelihood Weighting

26

+c +s 0.1-s 0.9

-c +s 0.5-s 0.5

( )|p S C

+c +r 0.8-r 0.2

-c +r 0.2-r 0.8

( )|p R C

+s+r

+w 0.99-w 0.01

-r+w 0.90-w 0.10

-s+r

+w 0.90-w 0.10

-r+w 0.01-w 0.99

( )| ,p W S R

+c 0.5-c 0.5

( )p C

Samples:+c, +s, +r, +w……

w = 0.1×0.99

Page 28: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Likelihood Weighting

27

• Sampling distribution if z sampled and e fixed evidence

• now, samples have weights

• Together, weighted sampling distribution is consistent

( ) ( )( )1

, |l

WS i ii

S z e p z Parents Z=

( ) ( )( )1

, |m

i ii

w z e p e Parents E=

( ) ( ) ( )( ) ( )( )

( )1 1

, , | |

,

l m

WS i i i ii i

S z e w z e p z Parents Z p e Parents E

p z e= =

=

=

Õ Õ

Page 29: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Ghostbusters HMM

28

– p(X1) = uniform– p(X|X’) = usually move clockwise, but

sometimes move in a random direction or stay in place

– p(Rij|X) = same sensor model as before: red means close, green means far away.

1/9 1/9 1/9

1/9 1/9 1/9

1/9 1/9 1/9

p(X1)

1/6 1/6 1/2

0 1/6 0

0 0 0

p(X|X’=<1,2>)

Page 30: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Example: Passage of Time

29

• As time passes, uncertainty “accumulates”

B ' X '( ) = p X ' | x( )x∑ B x( )

Transition model: ghosts usually go clockwise

T = 1 T = 2 T= 5

Page 31: Hidden Markov Models€¦ · •Hidden Markov models 1 The “Markov”swe have learned so far. Markov Models 2 •A Markov model is a chain-structured BN –Conditional probabilities

Example: Observation

30

• As we get observations, beliefs get reweighted, uncertainty “decreases”

( ) ( ) ( )| 'B p ex X B Xµ

Before observation After observation