
Hidden Markov Model

11/28/07

Bayes Rule

The posterior distribution:

$$P(G = k \mid \mathbf{x}) = \frac{P(\mathbf{x}, G = k)}{P(\mathbf{x})} = \frac{P(\mathbf{x} \mid G = k)\, P(G = k)}{\sum_j P(\mathbf{x} \mid G = j)\, P(G = j)}$$

Select k with the largest posterior:

$$\hat{B}(\mathbf{x}) = \arg\max_k P(G = k \mid \mathbf{x})$$

This minimizes the average misclassification rate.

The maximum likelihood rule is equivalent to the Bayes rule with a uniform prior.

The decision boundary (for two classes) is the set of x where

$$P(G = 1 \mid \mathbf{x}) = P(G = 2 \mid \mathbf{x})$$
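As an illustration (the numbers are chosen only for this example, they are not from the slides): suppose two classes with priors P(G = 1) = 0.7 and P(G = 2) = 0.3, and likelihoods P(x | G = 1) = 0.2 and P(x | G = 2) = 0.5 at some observed x. Then

$$P(G = 1 \mid x) = \frac{0.2 \times 0.7}{0.2 \times 0.7 + 0.5 \times 0.3} = \frac{0.14}{0.29} \approx 0.48,$$

so the Bayes rule selects class 2 even though class 1 has the larger prior; the maximum likelihood rule (uniform prior) would also select class 2 here because 0.5 > 0.2.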

Naïve Bayes approximation

• When x is high dimensional, it is difficult to estimate $P(\mathbf{x} \mid G = k)$.

Naïve Bayes Classifier

• When x is high dimensional, it is difficult to estimate $P(\mathbf{x} \mid G = k)$.

• But if we assume the dimensions are independent, each factor becomes a one-dimensional problem:

$$P(\mathbf{x} \mid G = k) = \prod_j P(x_j \mid G = k)$$

Naïve Bayes Classifier

• Usually the independence assumption is not valid.

• But sometimes the NBC can still be a good classifier.

• Quite often, simple models do not perform badly.
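To make the factorized form concrete, here is a minimal sketch of a naive Bayes classifier for binary features. The function names, the Laplace smoothing, and the toy data are illustrative choices, not from the lecture.

```python
import numpy as np

def fit_naive_bayes(X, y, n_classes):
    """Estimate class priors P(G=k) and per-feature Bernoulli
    probabilities P(x_j = 1 | G=k) from binary data X."""
    priors = np.array([np.mean(y == k) for k in range(n_classes)])
    # Laplace smoothing avoids zero probabilities
    theta = np.array([(X[y == k].sum(axis=0) + 1.0) / (np.sum(y == k) + 2.0)
                      for k in range(n_classes)])
    return priors, theta

def predict_naive_bayes(X, priors, theta):
    """Pick k maximizing log P(G=k) + sum_j log P(x_j | G=k)."""
    log_post = np.log(priors) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    return np.argmax(log_post, axis=1)

# Toy usage
X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]])
y = np.array([0, 0, 1, 1])
priors, theta = fit_naive_bayes(X, y, n_classes=2)
print(predict_naive_bayes(X, priors, theta))
```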

Hidden Markov Model


A coin toss example

Scenario: You are betting with your friend using coin tosses, and you observe (H, T, T, H, …).

But your friend is cheating. He occasionally switches from a fair coin to a biased coin. Of course, the switch happens under the table!

[Figure: two coins, labeled Fair and Biased]

A coin toss example

This is what is really happening:

(H, T, H, T, H, H, H, H, T, H, H, T, …)

Of course, you cannot see which coin produced each toss. So how can you tell that your friend is cheating?

Hidden Markov Model

[Figure: graphical model with a chain of hidden states (the coin) emitting observed variables (H or T)]

Markov Property

[Figure: graphical model with a chain of hidden states (the coin) emitting observed variables (H or T)]

$$P(x(t) \mid x(t-1), x(t-2), \ldots, x(1)) = P(x(t) \mid x(t-1))$$

$$p(x_1, \ldots, x_t) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_t \mid x_{t-1}) = p(x_1)\, a_{x_1, x_2} \cdots a_{x_{t-1}, x_t}$$

[Figure: the two states Fair and Biased, with transition probabilities $a_{1,1}$, $a_{1,2}$, $a_{2,1}$, $a_{2,2}$ between them]

$a_{i,j} = P(x_t = j \mid x_{t-1} = i)$: transition probability

$p(x_1)$: prior distribution

Observation independence

[Figure: graphical model with a chain of hidden states (the coin) emitting observed variables (H or T)]

$$P(y_1, \ldots, y_t \mid x_1, \ldots, x_t) = P(y_1 \mid x_1) \cdots P(y_t \mid x_t)$$

$P(y_t \mid x_t)$: emission probability

Model parameters

A = (a_{ij}) (transition matrix)

p(y_t | x_t) (emission probability)

p(x_1) (prior distribution)
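A minimal sketch of these three parameter sets for the coin example, plus sampling a sequence from the model. All probability values are illustrative guesses, not numbers given in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["Fair", "Biased"]
symbols = ["H", "T"]

prior = np.array([0.5, 0.5])                 # p(x_1)
A = np.array([[0.95, 0.05],                  # transition matrix a_ij
              [0.10, 0.90]])
E = np.array([[0.5, 0.5],                    # emission p(y_t | x_t);
              [0.8, 0.2]])                   # the biased coin favors heads

def sample_hmm(length):
    """Sample a hidden state path and an observation sequence."""
    x = [rng.choice(2, p=prior)]
    for _ in range(length - 1):
        x.append(rng.choice(2, p=A[x[-1]]))
    y = [rng.choice(2, p=E[s]) for s in x]
    return x, y

x, y = sample_hmm(20)
print("states:", "".join(states[s][0] for s in x))
print("tosses:", "".join(symbols[o] for o in y))
```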

Model inference

• Infer the states when the model parameters are known.

• Infer both the states and the model parameters when neither is known.

Viterbi algorithm

[Figure: trellis diagram with states 1-4 on the vertical axis and time steps t-1, t, t+1 on the horizontal axis]

• Most probable path:

$$\pi^* = \arg\max_{\pi}\, p(\pi, y)$$

• The joint probability factorizes one step at a time:

$$p(\pi_1, \ldots, \pi_i, y_1, \ldots, y_i) = p(\pi_1, \ldots, \pi_{i-1}, y_1, \ldots, y_{i-1})\, a_{\pi_{i-1}, \pi_i}\, p(y_i \mid \pi_i)$$

Therefore, the most probable path can be found iteratively.

Let $v_k(i)$ be the probability of the most probable path ending in state $k$ at position $i$. Then

$$v_l(i+1) = e_l(y_{i+1}) \max_k \big( v_k(i)\, a_{kl} \big)$$

Viterbi algorithm

• Initialization ($i = 0$): $v_0(0) = 1$, $v_k(0) = 0$ for $k > 0$.

• Recursion ($i = 1, \ldots, L$):

$$v_l(i) = e_l(y_i) \max_k \big( v_k(i-1)\, a_{kl} \big), \qquad e_l(y) = p(y \mid l)$$

$$\mathrm{ptr}_i(l) = \arg\max_k \big( v_k(i-1)\, a_{kl} \big)$$

• Termination:

$$P(y, \pi^*) = \max_k \big( v_k(L)\, a_{k0} \big), \qquad \pi_L^* = \arg\max_k \big( v_k(L)\, a_{k0} \big)$$

• Traceback ($i = L, \ldots, 1$): $\pi_{i-1}^* = \mathrm{ptr}_i(\pi_i^*)$
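A minimal log-space sketch of the Viterbi recursion above, using the `prior`, `A`, and `E` arrays from the earlier coin example. The sketch assumes no explicit begin/end state, so the $a_{k0}$ terms are dropped; that is a simplification of this sketch, not part of the slides.

```python
import numpy as np

def viterbi(y, prior, A, E):
    """Most probable state path for an observation sequence y
    (y is a list of symbol indices)."""
    L, K = len(y), len(prior)
    v = np.zeros((L, K))                      # v[i, k]: log-prob of best path ending in k
    ptr = np.zeros((L, K), dtype=int)         # back-pointers
    v[0] = np.log(prior) + np.log(E[:, y[0]])
    for i in range(1, L):
        for l in range(K):
            scores = v[i - 1] + np.log(A[:, l])
            ptr[i, l] = np.argmax(scores)
            v[i, l] = np.log(E[l, y[i]]) + np.max(scores)
    # Traceback from the best final state
    path = [int(np.argmax(v[-1]))]
    for i in range(L - 1, 0, -1):
        path.append(int(ptr[i, path[-1]]))
    return path[::-1], float(np.max(v[-1]))

# path, logp = viterbi(y, prior, A, E)   # compare path with the true hidden states
```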

Advantage of Viterbi path

• Identifies the most probable path very efficiently.

• The most probable path is legitimate, i.e., it is realizable by the HMM process.

Issue with Viterbi path

• The most probable path does not provide a confidence level for the state estimates.

• The most probable path may not be much more probable than other paths.

Posterior distribution

Estimate $p(x_i = k \mid y_1, \ldots, y_L)$.

Strategy:

$$p(x_i = k \mid y_1, \ldots, y_L) = \frac{p(x_i = k, y_1, \ldots, y_L)}{p(y_1, \ldots, y_L)}$$

$$= \frac{p(y_1, \ldots, y_i, x_i = k)\; p(y_{i+1}, \ldots, y_L \mid y_1, \ldots, y_i, x_i = k)}{p(y_1, \ldots, y_L)}$$

$$= \frac{p(y_1, \ldots, y_i, x_i = k)\; p(y_{i+1}, \ldots, y_L \mid x_i = k)}{p(y_1, \ldots, y_L)}$$

$$= \frac{f_k(i)\, b_k(i)}{p(y_1, \ldots, y_L)}$$

This is done by the forward-backward algorithm.

Forward-backward algorithm

• First compute $f_k(i)$ with the forward algorithm, then $b_k(i)$ with the backward algorithm.

Forward algorithm

Estimate $f_k(i)$:

$$f_l(i) = p(y_1, \ldots, y_i, x_i = l)$$

$$p(y_1, \ldots, y_{i+1}, x_{i+1} = l) = \Big[ \sum_k p(y_1, \ldots, y_i, x_i = k)\, a_{kl} \Big]\, e_l(y_{i+1})$$

$$f_l(i+1) = e_l(y_{i+1}) \sum_k f_k(i)\, a_{kl}$$

• Initialization: $f_0(0) = 1$, $f_k(0) = 0$ for $k > 0$.

• Recursion: $f_l(i) = e_l(y_i) \sum_k f_k(i-1)\, a_{kl}$

• Termination: $P(y) = \sum_k f_k(L)\, a_{k0}$
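A minimal sketch of the forward recursion in (unscaled) probability space. As in the Viterbi sketch, it assumes no explicit end state, so termination is simply a sum over $f_k(L)$.

```python
import numpy as np

def forward(y, prior, A, E):
    """Forward table f[i, k] = p(y_1..y_i, x_i = k) and the sequence
    probability p(y_1..y_L)."""
    L, K = len(y), len(prior)
    f = np.zeros((L, K))
    f[0] = prior * E[:, y[0]]
    for i in range(1, L):
        f[i] = E[:, y[i]] * (f[i - 1] @ A)    # e_l(y_i) * sum_k f_k(i-1) a_kl
    return f, f[-1].sum()

# f, py = forward(y, prior, A, E)
```

On long sequences the entries underflow, so in practice one scales each row or works in log space.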

Backward algorithm

Estimate $b_k(i)$:

$$b_k(i) = p(y_{i+1}, \ldots, y_L \mid x_i = k)$$

$$p(y_{i+1}, \ldots, y_L \mid x_i = k) = \sum_l a_{kl}\, e_l(y_{i+1})\, p(y_{i+2}, \ldots, y_L \mid x_{i+1} = l)$$

$$b_k(i) = \sum_l a_{kl}\, e_l(y_{i+1})\, b_l(i+1)$$

• Initialization: $b_k(L) = a_{k0}$ for all $k$.

• Recursion ($i = L-1, \ldots, 1$): $b_k(i) = \sum_l a_{kl}\, e_l(y_{i+1})\, b_l(i+1)$

• Termination: $P(y) = \sum_l a_{0l}\, e_l(y_1)\, b_l(1)$
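A matching sketch of the backward recursion and of the posterior $p(x_i = k \mid y) = f_k(i)\, b_k(i) / p(y)$, under the same no-end-state assumption (so this sketch uses $b_k(L) = 1$ rather than $a_{k0}$). It reuses the `forward` function from the previous sketch.

```python
import numpy as np

def backward(y, A, E):
    """Backward table b[i, k] = p(y_{i+1}..y_L | x_i = k)."""
    L, K = len(y), A.shape[0]
    b = np.zeros((L, K))
    b[-1] = 1.0                                   # no explicit end state: b_k(L) = 1
    for i in range(L - 2, -1, -1):
        b[i] = A @ (E[:, y[i + 1]] * b[i + 1])    # sum_l a_kl e_l(y_{i+1}) b_l(i+1)
    return b

def posterior(y, prior, A, E):
    """Posterior p(x_i = k | y_1..y_L) = f_k(i) b_k(i) / p(y),
    using the forward() sketch defined earlier."""
    f, py = forward(y, prior, A, E)
    b = backward(y, A, E)
    return f * b / py

# post = posterior(y, prior, A, E)   # post[:, 0] is P(fair) at each toss in the coin example
```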

Probability of fair coin

[Figure: posterior probability P(fair) plotted along the toss sequence, on a scale from 0 to 1]

Posterior distribution

• The posterior distribution gives a confidence level for each state estimate.

• The posterior distribution combines information from all paths.

But..

• The path obtained by picking the most probable state at each position may not be legitimate, i.e., it may not be realizable by the HMM.

Estimating parameters when state sequence is known

Given the state sequence $\{x_i\}$, define:

$A_{jk}$ = number of transitions from state $j$ to state $k$.

$E_k(b)$ = number of emissions of symbol $b$ from state $k$.

The maximum likelihood estimates of the parameters are:

$$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$
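A minimal sketch of these counting estimates when the state path is known. The add-one pseudocounts are a common practical choice and an assumption of this sketch, not part of the slides.

```python
import numpy as np

def estimate_parameters(x, y, n_states, n_symbols):
    """Estimate a_kl and e_k(b) by counting transitions and emissions
    along a known state path x, with add-one pseudocounts."""
    A_counts = np.ones((n_states, n_states))      # pseudocounts avoid zero probabilities
    E_counts = np.ones((n_states, n_symbols))
    for i in range(len(x) - 1):
        A_counts[x[i], x[i + 1]] += 1             # A_jk: transitions j -> k
    for s, o in zip(x, y):
        E_counts[s, o] += 1                       # E_k(b): emissions of b from k
    a = A_counts / A_counts.sum(axis=1, keepdims=True)
    e = E_counts / E_counts.sum(axis=1, keepdims=True)
    return a, e
```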

Infer hidden states together with model parameters

• Viterbi training

• Baum-Welch

Viterbi training

Main idea: use an iterative procedure (a sketch combining the earlier pieces follows below):

• Estimate the states with the Viterbi algorithm, holding the parameters fixed.

• Estimate the model parameters, holding the states fixed.

• Iterate.
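A sketch of the Viterbi training loop built from the pieces above (`viterbi` and `estimate_parameters`); the fixed iteration count and the initial parameter guesses are illustrative choices.

```python
def viterbi_training(y, prior, A, E, n_iter=20):
    """Alternate Viterbi decoding and parameter re-estimation,
    reusing viterbi() and estimate_parameters() from the sketches above."""
    for _ in range(n_iter):
        path, _ = viterbi(y, prior, A, E)                   # states for fixed parameters
        A, E = estimate_parameters(path, y,                 # parameters for fixed states
                                   A.shape[0], E.shape[1])
    return A, E
```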

Baum-Welch algorithm

• Instead of using the Viterbi path to estimate the states, consider the expected values of $A_{kl}$ and $E_k(b)$:

$$p(x_i = k, x_{i+1} = l \mid y_1, \ldots, y_L) = \frac{f_k(i)\, a_{kl}\, e_l(y_{i+1})\, b_l(i+1)}{p(y_1, \ldots, y_L)}$$

$$A_{kl} = \sum_j \frac{1}{p(y^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(y_{i+1}^j)\, b_l^j(i+1)$$

$$E_k(b) = \sum_j \frac{1}{p(y^j)} \sum_{\{i \,:\, y_i^j = b\}} f_k^j(i)\, b_k^j(i)$$

where $j$ indexes the training sequences.
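A sketch of the expected-count computation for a single training sequence, reusing `forward` and `backward` from the earlier sketches (again with no explicit end state, and with re-normalization at the end as in the maximum likelihood estimates above).

```python
import numpy as np

def baum_welch_step(y, prior, A, E):
    """One Baum-Welch update for a single sequence: expected transition and
    emission counts from forward()/backward(), then re-normalization."""
    f, py = forward(y, prior, A, E)
    b = backward(y, A, E)
    K, M = A.shape[0], E.shape[1]
    A_exp = np.zeros((K, K))
    E_exp = np.zeros((K, M))
    for i in range(len(y) - 1):
        # p(x_i = k, x_{i+1} = l | y) = f_k(i) a_kl e_l(y_{i+1}) b_l(i+1) / p(y)
        A_exp += np.outer(f[i], E[:, y[i + 1]] * b[i + 1]) * A / py
    for i in range(len(y)):
        E_exp[:, y[i]] += f[i] * b[i] / py        # expected emissions of symbol y_i
    A_new = A_exp / A_exp.sum(axis=1, keepdims=True)
    E_new = E_exp / E_exp.sum(axis=1, keepdims=True)
    return A_new, E_new
```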

Baum-Welch is a special case of EM algorithm

• Given an estimate of the parameters $\theta_t$, try to find a better estimate $\theta$.

$$\log P(y \mid \theta) = \log P(x, y \mid \theta) - \log P(x \mid y, \theta)$$

Taking the expectation with respect to $P(x \mid y, \theta_t)$:

$$\log P(y \mid \theta) = \sum_x P(x \mid y, \theta_t) \log P(x, y \mid \theta) - \sum_x P(x \mid y, \theta_t) \log P(x \mid y, \theta)$$

Define

$$Q(\theta \mid \theta_t) = \sum_x P(x \mid y, \theta_t) \log P(x, y \mid \theta)$$

Then

$$\log P(y \mid \theta) - \log P(y \mid \theta_t) \geq Q(\theta \mid \theta_t) - Q(\theta_t \mid \theta_t)$$

• Choose $\theta$ to maximize $Q(\theta \mid \theta_t)$.

Baum-Welch is a special case of EM algorithm

• E-step: Calculate the Q function $Q(\theta \mid \theta_t)$.

• M-step: Maximize $Q(\theta \mid \theta_t)$ with respect to $\theta$.

Issue with EM

• EM only finds local maxima.

• Solutions:

– Run EM multiple times, starting from different initial guesses.

– Use a more sophisticated algorithm such as MCMC.

Dynamic Bayesian Network

[Figure credit: Kevin Murphy]

Software

• Kevin Murphy’s Bayes Net Toolbox for Matlab

http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

Applications

• Copy number changes [figure credit: Yi Li]

• Protein-binding sites

• Sequence alignment [figure credit: www.biocentral.com]

Reading list

• Hastie et al. (2001), The Elements of Statistical Learning, pp. 184-185.

• Durbin et al. (1998), Biological Sequence Analysis, Chapter 3.