
Hidden Markov Model

11/28/07

Bayes Rule

The posterior distribution:

$$P(G = k \mid \mathbf{x}) = \frac{P(\mathbf{x}, G = k)}{P(\mathbf{x})} = \frac{P(\mathbf{x} \mid G = k)\, P(G = k)}{\sum_j P(\mathbf{x} \mid G = j)\, P(G = j)}$$

Select k with the largest posterior:

$$\hat{B}(\mathbf{x}) = \arg\max_k P(G = k \mid \mathbf{x})$$

This minimizes the average misclassification rate.

The maximum likelihood rule is equivalent to the Bayes rule with a uniform prior.

The decision boundary (for two classes) is the set of x where

$$P(G = 1 \mid \mathbf{x}) = P(G = 2 \mid \mathbf{x})$$
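As an illustration (the numbers are chosen only for this example, they are not from the slides): suppose two classes with priors P(G = 1) = 0.7 and P(G = 2) = 0.3, and likelihoods P(x | G = 1) = 0.2 and P(x | G = 2) = 0.5 at some observed x. Then

$$P(G = 1 \mid x) = \frac{0.2 \times 0.7}{0.2 \times 0.7 + 0.5 \times 0.3} = \frac{0.14}{0.29} \approx 0.48,$$

so the Bayes rule selects class 2 even though class 1 has the larger prior; the maximum likelihood rule (uniform prior) would also select class 2 here because 0.5 > 0.2.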

Naïve Bayes approximation

• When x is high dimensional, it is difficult to estimate $P(\mathbf{x} \mid G = k)$.

Naïve Bayes Classifier

• When x is high dimensional, it is difficult to estimate $P(\mathbf{x} \mid G = k)$.

• But if we assume the dimensions are independent, each factor becomes a one-dimensional problem:

$$P(\mathbf{x} \mid G = k) = \prod_j P(x_j \mid G = k)$$

Naïve Bayes Classifier

• Usually the independence assumption is not valid.

• But sometimes the NBC can still be a good classifier.

• Quite often, simple models do not perform badly.
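To make the factorized form concrete, here is a minimal sketch of a naive Bayes classifier for binary features. The function names, the Laplace smoothing, and the toy data are illustrative choices, not from the lecture.

```python
import numpy as np

def fit_naive_bayes(X, y, n_classes):
    """Estimate class priors P(G=k) and per-feature Bernoulli
    probabilities P(x_j = 1 | G=k) from binary data X."""
    priors = np.array([np.mean(y == k) for k in range(n_classes)])
    # Laplace smoothing avoids zero probabilities
    theta = np.array([(X[y == k].sum(axis=0) + 1.0) / (np.sum(y == k) + 2.0)
                      for k in range(n_classes)])
    return priors, theta

def predict_naive_bayes(X, priors, theta):
    """Pick k maximizing log P(G=k) + sum_j log P(x_j | G=k)."""
    log_post = np.log(priors) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    return np.argmax(log_post, axis=1)

# Toy usage
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]])
y = np.array([0, 0, 1, 1])
priors, theta = fit_naive_bayes(X, y, n_classes=2)
print(predict_naive_bayes(X, priors, theta))
```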

Hidden Markov Model


A coin toss example

Scenario: You are betting with your friend using coin tosses, and you observe (H, T, T, H, …).

But your friend is cheating. He occasionally switches from a fair coin to a biased coin. Of course, the switch happens under the table!

[Figure: two coins, labeled Fair and Biased]

A coin toss example

This is what is really happening:

(H, T, H, T, H, H, H, H, T, H, H, T, …)

Of course, you cannot see which coin produced each toss. So how can you tell that your friend is cheating?

Hidden Markov Model

[Figure: graphical model with a chain of hidden states (the coin) emitting observed variables (H or T)]

Markov Property

[Figure: graphical model with a chain of hidden states (the coin) emitting observed variables (H or T)]

$$P(x(t) \mid x(t-1), x(t-2), \ldots, x(1)) = P(x(t) \mid x(t-1))$$

$$p(x_1, \ldots, x_t) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_t \mid x_{t-1}) = p(x_1)\, a_{x_1, x_2} \cdots a_{x_{t-1}, x_t}$$

[Figure: the two states Fair and Biased, with transition probabilities $a_{1,1}$, $a_{1,2}$, $a_{2,1}$, $a_{2,2}$ between them]

$a_{i,j} = P(x_t = j \mid x_{t-1} = i)$: transition probability

$p(x_1)$: prior distribution

Observation independence

[Figure: graphical model with a chain of hidden states (the coin) emitting observed variables (H or T)]

$$P(y_1, \ldots, y_t \mid x_1, \ldots, x_t) = P(y_1 \mid x_1) \cdots P(y_t \mid x_t)$$

$P(y_t \mid x_t)$: emission probability

Model parameters

A = (a_{ij}) (transition matrix)

p(y_t | x_t) (emission probability)

p(x_1) (prior distribution)
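A minimal sketch of these three parameter sets for the coin example, plus sampling a sequence from the model. All probability values are illustrative guesses, not numbers given in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["Fair", "Biased"]
symbols = ["H", "T"]

prior = np.array([0.5, 0.5])                 # p(x_1)
A = np.array([[0.95, 0.05],                  # transition matrix a_ij
              [0.10, 0.90]])
E = np.array([[0.5, 0.5],                    # emission p(y_t | x_t);
              [0.8, 0.2]])                   # the biased coin favors heads

def sample_hmm(length):
    """Sample a hidden state path and an observation sequence."""
    x = [rng.choice(2, p=prior)]
    for _ in range(length - 1):
        x.append(rng.choice(2, p=A[x[-1]]))
    y = [rng.choice(2, p=E[s]) for s in x]
    return x, y

x, y = sample_hmm(20)
print("states:", "".join(states[s][0] for s in x))
print("tosses:", "".join(symbols[o] for o in y))
```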

Model inference

• Infer the states when the model parameters are known.

• Infer both the states and the model parameters when neither is known.

Viterbi algorithm

[Figure: trellis diagram with states 1-4 on the vertical axis and time steps t-1, t, t+1 on the horizontal axis]

• Most probable path:

$$\pi^* = \arg\max_{\pi}\, p(\pi, y)$$

• The joint probability factorizes one step at a time:

$$p(\pi_1, \ldots, \pi_i, y_1, \ldots, y_i) = p(\pi_1, \ldots, \pi_{i-1}, y_1, \ldots, y_{i-1})\, a_{\pi_{i-1}, \pi_i}\, p(y_i \mid \pi_i)$$

Therefore, the most probable path can be found iteratively.

Let $v_k(i)$ be the probability of the most probable path ending in state $k$ at position $i$. Then

$$v_l(i+1) = e_l(y_{i+1}) \max_k \big( v_k(i)\, a_{kl} \big)$$

Viterbi algorithm

• Initialization ($i = 0$): $v_0(0) = 1$, $v_k(0) = 0$ for $k > 0$.

• Recursion ($i = 1, \ldots, L$):

$$v_l(i) = e_l(y_i) \max_k \big( v_k(i-1)\, a_{kl} \big), \qquad e_l(y) = p(y \mid l)$$

$$\mathrm{ptr}_i(l) = \arg\max_k \big( v_k(i-1)\, a_{kl} \big)$$

• Termination:

$$P(y, \pi^*) = \max_k \big( v_k(L)\, a_{k0} \big), \qquad \pi_L^* = \arg\max_k \big( v_k(L)\, a_{k0} \big)$$

• Traceback ($i = L, \ldots, 1$): $\pi_{i-1}^* = \mathrm{ptr}_i(\pi_i^*)$
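A minimal log-space sketch of the Viterbi recursion above, using the `prior`, `A`, and `E` arrays from the earlier coin example. The sketch assumes no explicit begin/end state, so the $a_{k0}$ terms are dropped; that is a simplification of this sketch, not part of the slides.

```python
import numpy as np

def viterbi(y, prior, A, E):
    """Most probable state path for an observation sequence y
    (y is a list of symbol indices)."""
    L, K = len(y), len(prior)
    v = np.zeros((L, K))                      # v[i, k]: log-prob of best path ending in k
    ptr = np.zeros((L, K), dtype=int)         # back-pointers
    v[0] = np.log(prior) + np.log(E[:, y[0]])
    for i in range(1, L):
        for l in range(K):
            scores = v[i - 1] + np.log(A[:, l])
            ptr[i, l] = np.argmax(scores)
            v[i, l] = np.log(E[l, y[i]]) + np.max(scores)
    # Traceback from the best final state
    path = [int(np.argmax(v[-1]))]
    for i in range(L - 1, 0, -1):
        path.append(int(ptr[i, path[-1]]))
    return path[::-1], float(np.max(v[-1]))

# path, logp = viterbi(y, prior, A, E)   # compare path with the true hidden states
```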

Advantage of Viterbi path

• Identifies the most probable path very efficiently.

• The most probable path is legitimate, i.e., it is realizable by the HMM process.

Issue with Viterbi path

• The most probable path does not provide a confidence level for the state estimates.

• The most probable path may not be much more probable than other paths.

Posterior distribution

Estimate $p(x_i = k \mid y_1, \ldots, y_L)$.

Strategy:

$$p(x_i = k \mid y_1, \ldots, y_L) = \frac{p(x_i = k, y_1, \ldots, y_L)}{p(y_1, \ldots, y_L)}$$

$$= \frac{p(y_1, \ldots, y_i, x_i = k)\; p(y_{i+1}, \ldots, y_L \mid y_1, \ldots, y_i, x_i = k)}{p(y_1, \ldots, y_L)}$$

$$= \frac{p(y_1, \ldots, y_i, x_i = k)\; p(y_{i+1}, \ldots, y_L \mid x_i = k)}{p(y_1, \ldots, y_L)}$$

$$= \frac{f_k(i)\, b_k(i)}{p(y_1, \ldots, y_L)}$$

This is done by the forward-backward algorithm.

Forward-backward algorithm

• First compute $f_k(i)$ with the forward algorithm, then $b_k(i)$ with the backward algorithm.

Forward algorithm

Estimate $f_k(i)$:

$$f_l(i) = p(y_1, \ldots, y_i, x_i = l)$$

$$p(y_1, \ldots, y_{i+1}, x_{i+1} = l) = \Big[ \sum_k p(y_1, \ldots, y_i, x_i = k)\, a_{kl} \Big]\, e_l(y_{i+1})$$

$$f_l(i+1) = e_l(y_{i+1}) \sum_k f_k(i)\, a_{kl}$$

• Initialization: $f_0(0) = 1$, $f_k(0) = 0$ for $k > 0$.

• Recursion: $f_l(i) = e_l(y_i) \sum_k f_k(i-1)\, a_{kl}$

• Termination: $P(y) = \sum_k f_k(L)\, a_{k0}$
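A minimal sketch of the forward recursion in (unscaled) probability space. As in the Viterbi sketch, it assumes no explicit end state, so termination is simply a sum over $f_k(L)$.

```python
import numpy as np

def forward(y, prior, A, E):
    """Forward table f[i, k] = p(y_1..y_i, x_i = k) and the sequence
    probability p(y_1..y_L)."""
    L, K = len(y), len(prior)
    f = np.zeros((L, K))
    f[0] = prior * E[:, y[0]]
    for i in range(1, L):
        f[i] = E[:, y[i]] * (f[i - 1] @ A)    # e_l(y_i) * sum_k f_k(i-1) a_kl
    return f, f[-1].sum()

# f, py = forward(y, prior, A, E)
```

On long sequences the entries underflow, so in practice one scales each row or works in log space.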

Backward algorithm

Estimate $b_k(i)$:

$$b_k(i) = p(y_{i+1}, \ldots, y_L \mid x_i = k)$$

$$p(y_{i+1}, \ldots, y_L \mid x_i = k) = \sum_l a_{kl}\, e_l(y_{i+1})\, p(y_{i+2}, \ldots, y_L \mid x_{i+1} = l)$$

$$b_k(i) = \sum_l a_{kl}\, e_l(y_{i+1})\, b_l(i+1)$$

• Initialization: $b_k(L) = a_{k0}$ for all $k$.

• Recursion ($i = L-1, \ldots, 1$): $b_k(i) = \sum_l a_{kl}\, e_l(y_{i+1})\, b_l(i+1)$

• Termination: $P(y) = \sum_l a_{0l}\, e_l(y_1)\, b_l(1)$
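A matching sketch of the backward recursion and of the posterior $p(x_i = k \mid y) = f_k(i)\, b_k(i) / p(y)$, under the same no-end-state assumption (so this sketch uses $b_k(L) = 1$ rather than $a_{k0}$). It reuses the `forward` function from the previous sketch.

```python
import numpy as np

def backward(y, A, E):
    """Backward table b[i, k] = p(y_{i+1}..y_L | x_i = k)."""
    L, K = len(y), A.shape[0]
    b = np.zeros((L, K))
    b[-1] = 1.0                                   # no explicit end state: b_k(L) = 1
    for i in range(L - 2, -1, -1):
        b[i] = A @ (E[:, y[i + 1]] * b[i + 1])    # sum_l a_kl e_l(y_{i+1}) b_l(i+1)
    return b

def posterior(y, prior, A, E):
    """Posterior p(x_i = k | y_1..y_L) = f_k(i) b_k(i) / p(y),
    using the forward() sketch defined earlier."""
    f, py = forward(y, prior, A, E)
    b = backward(y, A, E)
    return f * b / py

# post = posterior(y, prior, A, E)   # post[:, 0] is P(fair) at each toss in the coin example
```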

Probability of fair coin

[Figure: posterior probability P(fair) plotted along the toss sequence, on a scale from 0 to 1]

Posterior distribution

• The posterior distribution gives a confidence level for each state estimate.

• The posterior distribution combines information from all paths.

But..

• The path obtained by picking the most probable state at each position may not be legitimate, i.e., it may not be realizable by the HMM.

Estimating parameters when state sequence is known

Given the state sequence $\{x_i\}$, define:

$A_{jk}$ = number of transitions from state $j$ to state $k$.

$E_k(b)$ = number of emissions of symbol $b$ from state $k$.

The maximum likelihood estimates of the parameters are:

$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$
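A minimal sketch of these counting estimates when the state path is known. The add-one pseudocounts are a common practical choice and an assumption of this sketch, not part of the slides.

```python
import numpy as np

def estimate_parameters(x, y, n_states, n_symbols):
    """Estimate a_kl and e_k(b) by counting transitions and emissions
    along a known state path x, with add-one pseudocounts."""
    A_counts = np.ones((n_states, n_states))      # pseudocounts avoid zero probabilities
    E_counts = np.ones((n_states, n_symbols))
    for i in range(len(x) - 1):
        A_counts[x[i], x[i + 1]] += 1             # A_jk: transitions j -> k
    for s, o in zip(x, y):
        E_counts[s, o] += 1                       # E_k(b): emissions of b from k
    a = A_counts / A_counts.sum(axis=1, keepdims=True)
    e = E_counts / E_counts.sum(axis=1, keepdims=True)
    return a, e
```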

Infer hidden states together with model parameters

• Viterbi training

• Baum-Welch

Viterbi training

Main idea: use an iterative procedure (a sketch combining the earlier pieces follows below):

• Estimate the states with the Viterbi algorithm, holding the parameters fixed.

• Estimate the model parameters, holding the states fixed.

• Iterate.
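A sketch of the Viterbi training loop built from the pieces above (`viterbi` and `estimate_parameters`); the fixed iteration count and the initial parameter guesses are illustrative choices.

```python
def viterbi_training(y, prior, A, E, n_iter=20):
    """Alternate Viterbi decoding and parameter re-estimation,
    reusing viterbi() and estimate_parameters() from the sketches above."""
    for _ in range(n_iter):
        path, _ = viterbi(y, prior, A, E)                   # states for fixed parameters
        A, E = estimate_parameters(path, y,                 # parameters for fixed states
                                   A.shape[0], E.shape[1])
    return A, E
```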

Baum-Welch algorithm

• Instead of using the Viterbi path to estimate the states, consider the expected values of $A_{kl}$ and $E_k(b)$:

$$p(x_i = k, x_{i+1} = l \mid y_1, \ldots, y_L) = \frac{f_k(i)\, a_{kl}\, e_l(y_{i+1})\, b_l(i+1)}{p(y_1, \ldots, y_L)}$$

$$A_{kl} = \sum_j \frac{1}{p(y^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(y_{i+1}^j)\, b_l^j(i+1)$$

$$E_k(b) = \sum_j \frac{1}{p(y^j)} \sum_{\{i \,:\, y_i^j = b\}} f_k^j(i)\, b_k^j(i)$$

where $j$ indexes the training sequences.
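A sketch of the expected-count computation for a single training sequence, reusing `forward` and `backward` from the earlier sketches (again with no explicit end state, and with re-normalization at the end as in the maximum likelihood estimates above).

```python
import numpy as np

def baum_welch_step(y, prior, A, E):
    """One Baum-Welch update for a single sequence: expected transition and
    emission counts from forward()/backward(), then re-normalization."""
    f, py = forward(y, prior, A, E)
    b = backward(y, A, E)
    K, M = A.shape[0], E.shape[1]
    A_exp = np.zeros((K, K))
    E_exp = np.zeros((K, M))
    for i in range(len(y) - 1):
        # p(x_i = k, x_{i+1} = l | y) = f_k(i) a_kl e_l(y_{i+1}) b_l(i+1) / p(y)
        A_exp += np.outer(f[i], E[:, y[i + 1]] * b[i + 1]) * A / py
    for i in range(len(y)):
        E_exp[:, y[i]] += f[i] * b[i] / py        # expected emissions of symbol y_i
    A_new = A_exp / A_exp.sum(axis=1, keepdims=True)
    E_new = E_exp / E_exp.sum(axis=1, keepdims=True)
    return A_new, E_new
```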

Baum-Welch is a special case of EM algorithm

• Given an estimate of the parameters $\theta_t$, try to find a better estimate $\theta$.

$$\log P(y \mid \theta) = \log P(x, y \mid \theta) - \log P(x \mid y, \theta)$$

Taking the expectation with respect to $P(x \mid y, \theta_t)$:

$$\log P(y \mid \theta) = \sum_x P(x \mid y, \theta_t) \log P(x, y \mid \theta) - \sum_x P(x \mid y, \theta_t) \log P(x \mid y, \theta)$$

Define

$$Q(\theta \mid \theta_t) = \sum_x P(x \mid y, \theta_t) \log P(x, y \mid \theta)$$

Then

$$\log P(y \mid \theta) - \log P(y \mid \theta_t) \geq Q(\theta \mid \theta_t) - Q(\theta_t \mid \theta_t)$$

• Choose $\theta$ to maximize $Q(\theta \mid \theta_t)$.

Baum-Welch is a special case of EM algorithm

• E-step: Calculate the Q function $Q(\theta \mid \theta_t)$.

• M-step: Maximize $Q(\theta \mid \theta_t)$ with respect to $\theta$.

Issue with EM

• EM only finds local maxima.

• Solutions:

– Run EM multiple times, starting from different initial guesses.

– Use a more sophisticated algorithm such as MCMC.

Dynamic Bayesian Network

[Figure credit: Kevin Murphy]

Software

• Kevin Murphy’s Bayes Net Toolbox for Matlab

http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

Applications

• Copy number changes [figure credit: Yi Li]

• Protein-binding sites

• Sequence alignment [figure credit: www.biocentral.com]

Reading list

• Hastie et al. (2001), The Elements of Statistical Learning, pp. 184-185.

• Durbin et al. (1998), Biological Sequence Analysis, Chapter 3.