Hidden Markov Model 11/28/07

Page 1

Hidden Markov Model

11/28/07

Page 2

Bayes Rule

The posterior distribution:

$$P(G = k \mid \mathbf{x}) = \frac{p(\mathbf{x},\, G = k)}{p(\mathbf{x})} = \frac{p(\mathbf{x} \mid G = k)\, P(G = k)}{\sum_j p(\mathbf{x} \mid G = j)\, P(G = j)}$$

Select the k with the largest posterior probability:

$$B(\mathbf{x}) = \arg\max_k P(G = k \mid \mathbf{x})$$

This rule minimizes the average misclassification rate.

The maximum likelihood rule is equivalent to the Bayes rule with a uniform prior.

The decision boundary is

$$\{\mathbf{x} : P(G = 1 \mid \mathbf{x}) = P(G = 2 \mid \mathbf{x})\}$$
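To make the rule concrete, here is a minimal sketch (not from the slides; the priors and likelihood values are hypothetical) that multiplies likelihoods by priors, normalizes, and takes the argmax:

```python
import numpy as np

# Hypothetical two-class problem; the numbers are illustrative only.
prior = np.array([0.7, 0.3])        # P(G = k)
likelihood = np.array([0.2, 0.6])   # p(x | G = k) evaluated at the observed x

# Bayes rule: the posterior is proportional to likelihood times prior.
unnorm = likelihood * prior
posterior = unnorm / unnorm.sum()   # P(G = k | x)

k_star = int(np.argmax(posterior))  # Bayes classifier B(x)
print(posterior, k_star)            # [0.4375 0.5625] 1
```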

Page 3

Naïve Bayes approximation

• When x is high dimensional, it is difficult to estimate $P(\mathbf{x} \mid G = k)$.

Page 4

Naïve Bayes Classifier

• When x is high dimensional, it is difficult to estimate $P(\mathbf{x} \mid G = k)$.

• But if we assume the components of x are independent given the class, each factor becomes a 1-D estimation problem:

$$P(\mathbf{x} \mid G = k) = \prod_j P(x_j \mid G = k)$$
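As an illustration (my sketch, not code from the lecture), a Gaussian naïve Bayes classifier fits a 1-D mean and variance per feature and per class, then sums the per-feature log likelihoods:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Per-class feature means, variances, and class priors. X: (n, d), y: (n,)."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        # A small variance floor keeps the log density finite for constant features.
        params[k] = (Xk.mean(axis=0), Xk.var(axis=0) + 1e-9, len(Xk) / len(X))
    return params

def predict_nb(params, x):
    """Class with the largest log posterior: sum of 1-D Gaussian log densities plus log prior."""
    scores = {
        k: np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        for k, (mu, var, prior) in params.items()
    }
    return max(scores, key=scores.get)
```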

Page 5

Naïve Bayes Classifier

• Usually the independence assumption is not valid.

• But sometimes the NBC can still be a good classifier.

• Often, simple models do not perform badly.

Page 6

Hidden Markov Model

Page 7

A coin toss example

Scenario: You are betting with your friend using a coin toss. And you see (H, T, T, H, …)

Page 8

A coin toss example

Scenario: You are betting with your friend using a coin toss. And you see (H, T, T, H, …)

But your friend is cheating: he occasionally switches from a fair coin to a biased coin. Of course, the switch happens under the table!

[Images: a fair coin and a biased coin]

Page 9

A coin toss example

This is what is really happening:

(H, T, H, T, H, H, H, H, T, H, H, T, …)

Of course, you can't see which coin is being used. So how can you tell your friend is cheating?

Page 10

Hidden Markov Model

Hidden state: the coin. Observed variable: H or T.

Page 11

Markov Property

Hidden state: the coin. Observed variable: H or T.

$$P(x(t) \mid x(1), x(2), \ldots, x(t-1)) = P(x(t) \mid x(t-1))$$

Page 12

Markov Property

$$P(x(t) \mid x(1), x(2), \ldots, x(t-1)) = P(x(t) \mid x(t-1))$$

so the joint distribution of a state path factorizes as

$$p(x_1, \ldots, x_t) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_t \mid x_{t-1}) = p(x_1)\, a_{x_1 x_2} \cdots a_{x_{t-1} x_t}$$

[State diagram: Fair and Biased states with transition probabilities $a_{1,1}$, $a_{1,2}$, $a_{2,1}$, $a_{2,2}$]

$a_{i,j} = P(x(t) = j \mid x(t-1) = i)$: transition probability

$p(x_1)$: prior distribution
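A direct translation of this factorization into code (a sketch with made-up numbers for the two-state coin chain):

```python
import numpy as np

# Hypothetical two-state chain: 0 = fair, 1 = biased. Values are illustrative.
p1 = np.array([0.5, 0.5])                  # prior p(x_1)
A = np.array([[0.9, 0.1],                  # transition matrix a_ij
              [0.2, 0.8]])

path = [0, 0, 1, 1]                        # a state path x_1, ..., x_t
prob = p1[path[0]] * np.prod([A[path[t - 1], path[t]] for t in range(1, len(path))])
print(prob)                                # 0.5 * 0.9 * 0.1 * 0.8 = 0.036
```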

Page 13

Observation independence

Hidden state: the coin. Observed variable: H or T.

$$P(y_1, \ldots, y_t \mid x_1, \ldots, x_t) = P(y_1 \mid x_1) \cdots P(y_t \mid x_t)$$

$P(y_t \mid x_t)$: emission probability

Page 14

Model parameters

$A = (a_{ij})$ (transition matrix)

$p(y_t \mid x_t)$ (emission probability)

$p(x_1)$ (prior distribution)
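Putting the three ingredients together, here is a minimal simulation of the coin-switching HMM (the parameter values below are illustrative assumptions, not numbers from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# States: 0 = fair, 1 = biased; observations: 0 = H, 1 = T. Values are made up.
A = np.array([[0.9, 0.1],        # transition matrix a_ij = P(x_t = j | x_{t-1} = i)
              [0.2, 0.8]])
E = np.array([[0.5, 0.5],        # emission probabilities P(y | x): fair coin
              [0.8, 0.2]])       # biased coin favors heads
pi0 = np.array([0.5, 0.5])       # prior distribution p(x_1)

def simulate(L):
    """Draw a hidden state path and the corresponding H/T observations."""
    x = [rng.choice(2, p=pi0)]
    for _ in range(L - 1):
        x.append(rng.choice(2, p=A[x[-1]]))
    y = np.array([rng.choice(2, p=E[s]) for s in x])
    return np.array(x), y

states, obs = simulate(200)
```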

Page 15

Model inference

• Infer states when model parameters are known.

• Infer states and model parameters when both are unknown.

Page 16

Viterbi algorithm

[Trellis diagram: states 1-4 on the vertical axis vs. time steps t-1, t, t+1 on the horizontal axis]

Page 17

Viterbi algorithm

• Most probable path:

$$x^* = \arg\max_x p(x, y)$$

Page 18

Viterbi algorithm

• Most probable path:

$$x^* = \arg\max_x p(x, y)$$

The joint probability extends one step at a time:

$$p(y_1, \ldots, y_{i+1},\, x_1, \ldots, x_{i+1}) = p(y_1, \ldots, y_i,\, x_1, \ldots, x_i)\; a_{x_i x_{i+1}}\; p(y_{i+1} \mid x_{i+1})$$

Page 19

Viterbi algorithm

• Most probable path:

$$x^* = \arg\max_x p(x, y)$$

$$p(y_1, \ldots, y_{i+1},\, x_1, \ldots, x_{i+1}) = p(y_1, \ldots, y_i,\, x_1, \ldots, x_i)\; a_{x_i x_{i+1}}\; p(y_{i+1} \mid x_{i+1})$$

Therefore, the most probable path can be found iteratively.

Page 20

Viterbi algorithm

• Most probable path:

$$x^* = \arg\max_x p(x, y)$$

Let $v_k(i)$ be the probability of the most probable path ending in state k at position i. Then

$$v_l(i+1) = e_l(y_{i+1})\, \max_k \big(v_k(i)\, a_{kl}\big)$$

Page 21

Viterbi algorithm

• Initialization (i = 0): $v_0(0) = 1$; $v_k(0) = 0$ for $k > 0$.

• Recursion (i = 1, ..., L):

$$v_l(i) = e_l(y_i)\, \max_k \big(v_k(i-1)\, a_{kl}\big), \qquad e_l(y_i) = p(y_i \mid l)$$

$$\mathrm{ptr}_i(l) = \arg\max_k \big(v_k(i-1)\, a_{kl}\big)$$

• Termination:

$$P(y, x^*) = \max_k \big(v_k(L)\, a_{k0}\big), \qquad x_L^* = \arg\max_k \big(v_k(L)\, a_{k0}\big)$$

• Traceback (i = L, ..., 1):

$$x_{i-1}^* = \mathrm{ptr}_i(x_i^*)$$

(State 0 is a silent begin/end state, so $a_{0k}$ encodes the prior and $a_{k0}$ the termination.)
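As an illustration, here is a log-space sketch of the algorithm (my code, not the lecture's; it assumes the hypothetical A, E, pi0 arrays from the simulation snippet above and folds the prior into the first step instead of using a silent begin/end state):

```python
import numpy as np

def viterbi(y, A, E, pi0):
    """Most probable state path for observations y, computed in log space."""
    L, K = len(y), A.shape[0]
    logA, logE = np.log(A), np.log(E)
    v = np.zeros((L, K))
    ptr = np.zeros((L, K), dtype=int)
    v[0] = np.log(pi0) + logE[:, y[0]]      # initialization: prior replaces the begin state
    for i in range(1, L):                   # recursion
        scores = v[i - 1][:, None] + logA   # scores[k, l] = v_k(i-1) + log a_kl
        ptr[i] = np.argmax(scores, axis=0)
        v[i] = np.max(scores, axis=0) + logE[:, y[i]]
    path = np.zeros(L, dtype=int)           # termination and traceback
    path[-1] = np.argmax(v[-1])
    for i in range(L - 1, 0, -1):
        path[i - 1] = ptr[i, path[i]]
    return path

# path = viterbi(obs, A, E, pi0)
```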

Page 22

Advantage of Viterbi path

• Identifies the most probable path very efficiently.

• The most probable path is legitimate, i.e., it is realizable by the HMM process.

Page 23

Issue with Viterbi path

• The most probable path does not provide a confidence level for the state estimates.

• The most probable path may not be much more probable than other paths.

Page 24

Posterior distribution

Estimate $P(x_i = k \mid y_1, \ldots, y_L)$.

Strategy: this is done by the forward-backward algorithm.

$$P(x_i = k \mid y_1, \ldots, y_L) = \frac{p(x_i = k,\, y_1, \ldots, y_L)}{p(y_1, \ldots, y_L)} = \frac{p(x_i = k,\, y_1, \ldots, y_i)\; p(y_{i+1}, \ldots, y_L \mid x_i = k,\, y_1, \ldots, y_i)}{p(y_1, \ldots, y_L)}$$

$$= \frac{p(x_i = k,\, y_1, \ldots, y_i)\; p(y_{i+1}, \ldots, y_L \mid x_i = k)}{p(y_1, \ldots, y_L)} = \frac{f_k(i)\, b_k(i)}{p(y_1, \ldots, y_L)}$$

Page 25

Forward-backward algorithm

Estimate $f_k(i)$, where

$$f_l(i) = p(y_1, \ldots, y_i,\, x_i = l)$$

$$f_l(i+1) = p(y_1, \ldots, y_{i+1},\, x_{i+1} = l) = \sum_k p(y_1, \ldots, y_i,\, x_i = k)\, a_{kl}\, e_l(y_{i+1}) = e_l(y_{i+1}) \sum_k f_k(i)\, a_{kl}$$

Page 26

Forward algorithm

Estimate $f_k(i)$, where $f_l(i) = p(y_1, \ldots, y_i,\, x_i = l)$.

Initialization:

$$f_0(0) = 1; \qquad f_k(0) = 0 \text{ for } k > 0$$

Recursion:

$$f_l(i) = e_l(y_i) \sum_k f_k(i-1)\, a_{kl}$$

Termination:

$$P(y) = \sum_k f_k(L)\, a_{k0}$$
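A sketch of the forward pass (again assuming the hypothetical A, E, pi0 arrays from the simulation snippet; per-step scaling stands in for the begin/end states and avoids numerical underflow):

```python
import numpy as np

def forward(y, A, E, pi0):
    """Scaled forward probabilities f and the per-step scale factors.

    log P(y) = np.log(scale).sum(); the prior pi0 replaces the begin state.
    """
    L, K = len(y), A.shape[0]
    f = np.zeros((L, K))
    scale = np.zeros(L)
    f[0] = pi0 * E[:, y[0]]                 # initialization
    scale[0] = f[0].sum()
    f[0] /= scale[0]
    for i in range(1, L):                   # recursion: f_l(i) = e_l(y_i) sum_k f_k(i-1) a_kl
        f[i] = E[:, y[i]] * (f[i - 1] @ A)
        scale[i] = f[i].sum()
        f[i] /= scale[i]
    return f, scale

# f, scale = forward(obs, A, E, pi0); log_P_y = np.log(scale).sum()
```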

Page 27

Backward algorithm

Estimate $b_k(i)$, where

$$b_l(i) = p(y_{i+1}, \ldots, y_L \mid x_i = l)$$

$$b_k(i) = p(y_{i+1}, \ldots, y_L \mid x_i = k) = \sum_l a_{kl}\, e_l(y_{i+1})\, b_l(i+1)$$

Page 28

Backward algorithm

Estimate $b_k(i)$, where $b_l(i) = p(y_{i+1}, \ldots, y_L \mid x_i = l)$.

Initialization:

$$b_k(L) = a_{k0} \text{ for all } k$$

Recursion:

$$b_k(i) = \sum_l a_{kl}\, e_l(y_{i+1})\, b_l(i+1)$$

Termination:

$$P(y) = \sum_l a_{0l}\, e_l(y_1)\, b_l(1)$$
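A matching backward pass plus posterior decoding (a sketch that reuses forward() and its scale factors from the previous snippet; with no end state, the initialization is b_k(L) = 1):

```python
import numpy as np

def backward(y, A, E, scale):
    """Backward probabilities b_k(i), scaled with the factors from forward()."""
    L, K = len(y), A.shape[0]
    b = np.zeros((L, K))
    b[-1] = 1.0                             # initialization (no end state)
    for i in range(L - 2, -1, -1):          # recursion: b_k(i) = sum_l a_kl e_l(y_{i+1}) b_l(i+1)
        b[i] = A @ (E[:, y[i + 1]] * b[i + 1]) / scale[i + 1]
    return b

# Posterior decoding: with the shared scaling, P(y) is already divided out,
# so the elementwise product of f and b is exactly P(x_i = k | y_1, ..., y_L).
# f, scale = forward(obs, A, E, pi0)
# posterior = f * backward(obs, A, E, scale)   # each row sums to 1
```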

Page 29

Probability of fair coin

[Plot: posterior probability P(fair) along the observed sequence, on a scale from 0 to 1]


Page 31

Posterior distribution

• Posterior distribution predicts the confidence level of a state estimate.

• Posterior distribution combines information from all paths.

But:

• The predicted path may not be legitimate.

Page 32

Estimating parameters when state sequence is known

Given the state sequence {xk}

Define

$A_{kl}$ = number of transitions from state k to state l.

$E_k(b)$ = number of emissions of symbol b from state k.

The maximum likelihood estimates of the parameters are:

$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$
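When the state path is observed, these estimates are plain normalized counts; a quick sketch (the function and argument names are mine; in practice, add pseudocounts to avoid zero rows):

```python
import numpy as np

def ml_estimates(x, y, K, M):
    """ML transition/emission estimates from a known state path x and
    observations y, with K states and M observation symbols."""
    A_counts = np.zeros((K, K))
    E_counts = np.zeros((K, M))
    for i in range(len(x) - 1):
        A_counts[x[i], x[i + 1]] += 1           # A_kl
    for s, o in zip(x, y):
        E_counts[s, o] += 1                     # E_k(b)
    a = A_counts / A_counts.sum(axis=1, keepdims=True)
    e = E_counts / E_counts.sum(axis=1, keepdims=True)
    return a, e
```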

Page 33

Infer hidden states together with model parameters

• Viterbi training

• Baum-Welch

Page 34

Viterbi training

Main idea: use an iterative procedure that alternates the two steps below (see the sketch after this list).

• Estimate state for fixed parameters using the Viterbi algorithm.

• Estimate model parameters for fixed states.
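A sketch of the alternation, reusing the hypothetical viterbi() and ml_estimates() helpers from the earlier snippets:

```python
# Viterbi training: alternate hard state assignment and ML re-estimation.
for _ in range(50):                           # or stop when the path no longer changes
    path = viterbi(obs, A, E, pi0)            # states from fixed parameters
    A, E = ml_estimates(path, obs, K=2, M=2)  # parameters from fixed states
```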

Page 35

Baum-Welch algorithm

• Instead of using the Viterbi path to estimate the states, consider the expected numbers of transitions, $A_{kl}$, and emissions, $E_k(b)$.

Page 36

Baum-Welch algorithm

• Instead of using the Viterbi path to estimate the states, consider the expected numbers of transitions, $A_{kl}$, and emissions, $E_k(b)$:

$$P(x_i = k,\, x_{i+1} = l \mid y_1, \ldots, y_L) = \frac{f_k(i)\, a_{kl}\, e_l(y_{i+1})\, b_l(i+1)}{p(y_1, \ldots, y_L)}$$

$$A_{kl} = \sum_j \frac{1}{p(y^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(y_{i+1}^j)\, b_l^j(i+1)$$

$$E_k(b) = \sum_j \frac{1}{p(y^j)} \sum_{\{i \,:\, y_i^j = b\}} f_k^j(i)\, b_k^j(i)$$

(Here j indexes the training sequences.)
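For a single training sequence, these expected counts follow directly from the scaled forward/backward sketches above (my code; with that shared scaling, dividing each transition term by the next scale factor supplies the 1/p(y) factor):

```python
import numpy as np

def expected_counts(y, A, E, pi0):
    """Baum-Welch E-step for one sequence: expected A_kl and E_k(b).

    Assumes forward() and backward() from the earlier sketches.
    """
    L = len(y)
    f, scale = forward(y, A, E, pi0)
    b = backward(y, A, E, scale)
    gamma = f * b                                   # P(x_i = k | y)
    A_exp = np.zeros_like(A)
    for i in range(L - 1):
        # term[k, l] = f_k(i) a_kl e_l(y_{i+1}) b_l(i+1) / p(y)
        A_exp += f[i][:, None] * A * E[:, y[i + 1]] * b[i + 1] / scale[i + 1]
    E_exp = np.zeros_like(E)
    for i in range(L):
        E_exp[:, y[i]] += gamma[i]                  # expected emission counts
    return A_exp, E_exp

# M-step: renormalize, exactly as in the known-path case.
# A = A_exp / A_exp.sum(axis=1, keepdims=True)
# E = E_exp / E_exp.sum(axis=1, keepdims=True)
```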

Page 37

Baum-Welch is a special case of EM algorithm

• Given an estimate of the parameter $\theta^t$, try to find a better estimate $\theta$. Start from

$$\log P(y \mid \theta) = \log P(x, y \mid \theta) - \log P(x \mid y, \theta)$$

and take the expectation with respect to $P(x \mid y, \theta^t)$:

$$\log P(y \mid \theta) = \sum_x P(x \mid y, \theta^t) \log P(x, y \mid \theta) - \sum_x P(x \mid y, \theta^t) \log P(x \mid y, \theta)$$

Define $Q(\theta \mid \theta^t) = \sum_x P(x \mid y, \theta^t) \log P(x, y \mid \theta)$. Then

$$\log P(y \mid \theta) - \log P(y \mid \theta^t) \ge Q(\theta \mid \theta^t) - Q(\theta^t \mid \theta^t)$$

• Choose $\theta$ to maximize $Q(\theta \mid \theta^t)$.

Page 38

Baum-Welch is a special case of EM algorithm

• E-step: calculate the function $Q(\theta \mid \theta^t)$.

• M-step: maximize $Q(\theta \mid \theta^t)$ with respect to $\theta$.

Page 39

Issue with EM

• EM only finds local maxima.

• Solutions:
  – Run EM multiple times, starting from different initial guesses.
  – Use a more sophisticated algorithm, such as MCMC.

Page 40

Kevin Murphy

Dynamic Bayesian Network

Page 41

Software

• Kevin Murphy’s Bayes Net Toolbox for Matlab

http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

Page 42

Applications

(Yi Li)

Copy number changes

Page 43

Applications

Protein-binding sites

Page 44

Applications

www.biocentral.com

Sequence alignment

Page 45

Reading list

• Hastie et al. (2001), The Elements of Statistical Learning (the ESL book), pp. 184-185.

• Durbin et al. (1998), Biological Sequence Analysis, Chapter 3.