Hidden Markov Model
11/28/07
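Bayes Rule
The posterior distribution:
$P(G = k \mid \mathbf{x}) = \frac{P(\mathbf{x}, G = k)}{P(\mathbf{x})} = \frac{P(\mathbf{x} \mid G = k)\, P(G = k)}{\sum_j P(\mathbf{x} \mid G = j)\, P(G = j)}$
Select k with the largest posterior probability:
$C_B(\mathbf{x}) = \arg\max_k P(G = k \mid \mathbf{x})$
Minimizes the average misclassification rate.
Maximum likelihood rule is equivalent to Bayes rule with uniform prior.
Decision boundary (between classes 1 and 2) is
$P(G = 1 \mid \mathbf{x}) = P(G = 2 \mid \mathbf{x})$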
Naïve Bayes approximation
• When x is high dimensional, it is difficult to estimate $P(\mathbf{x} \mid G = k)$.
• But if we assume independence, then it becomes a 1-D problem:
$P(\mathbf{x} \mid G = k) = \prod_j P(x_j \mid G = k)$
Naïve Bayes Classifier
• Usually the independence assumption is not valid.
• But sometimes the NBC can still be a good classifier.
• Simple models often perform reasonably well.
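As a rough illustration of the independence assumption, here is a minimal sketch of a naïve Bayes classifier. It assumes binary features with Bernoulli likelihoods and add-one smoothing. Neither choice comes from the slides, and the function names and toy data are hypothetical.

```python
import math

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate class priors P(G=k) and per-feature likelihoods P(x_j=1 | G=k).

    X: list of binary feature vectors, y: list of class labels.
    alpha is an add-alpha smoothing constant (an assumption for this sketch).
    """
    prior, theta = {}, {}
    n_features = len(X[0])
    for k in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == k]
        prior[k] = len(rows) / len(X)
        theta[k] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                    for j in range(n_features)]
    return prior, theta

def posterior(x, prior, theta):
    """P(G=k | x) via Bayes rule, with P(x | G=k) = prod_j P(x_j | G=k)."""
    log_joint = {}
    for k in prior:
        lp = math.log(prior[k])
        for xj, pj in zip(x, theta[k]):
            lp += math.log(pj if xj else 1.0 - pj)
        log_joint[k] = lp
    m = max(log_joint.values())                      # subtract max for stability
    z = sum(math.exp(v - m) for v in log_joint.values())
    return {k: math.exp(v - m) / z for k, v in log_joint.items()}

def classify(x, prior, theta):
    """Bayes rule: pick the class with the largest posterior."""
    post = posterior(x, prior, theta)
    return max(post, key=post.get)

# Toy data, invented for illustration: 3 binary features, classes 1 and 2.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 2, 2, 2]
prior, theta = fit_naive_bayes(X, y)
print(classify([1, 1, 0], prior, theta), posterior([1, 1, 0], prior, theta))
```

Each per-feature likelihood is estimated separately, which is what turns the high-dimensional estimation problem into a set of 1-D problems.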
Hidden Markov Model
A coin toss example
Scenario: You are betting with your friend using a coin toss. And you see (H, T, T, H, …)
But your friend is cheating. He occasionally switches from a fair coin to a biased coin. Of course, the switch is under the table!
[Figure: two coins, labeled Fair and Biased.]
This is what is really happening (in the original slide, color marks which coin produced each toss):
(H, T, H, T, H, H, H, H, T, H, H, T, …)
Of course you can’t see the color. So how can you tell your friend is cheating?
Hidden Markov Model
• Hidden state: the coin
• Observed variable: H or T
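Markov Property
$P(x_t \mid x_1, x_2, \ldots, x_{t-1}) = P(x_t \mid x_{t-1})$
$p(x_1, \ldots, x_t) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_t \mid x_{t-1}) = p(x_1)\, a_{x_1, x_2} \cdots a_{x_{t-1}, x_t}$
where $p(x_1)$ is the prior distribution and $a_{i,j}$ is the transition probability from state i to state j, with $\sum_j a_{i,j} = 1$ for all i.
[Figure: two-state transition diagram with states Fair (1) and Biased (2) and transition probabilities $a_{1,1}$, $a_{1,2}$, $a_{2,1}$, $a_{2,2}$.]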
Observation independence
$P(y_1, \ldots, y_t \mid x_1, \ldots, x_t) = P(y_1 \mid x_1) \cdots P(y_t \mid x_t)$
where $P(y_t \mid x_t)$ is the emission probability.
Model parameters
$A = (a_{ij})$ (transition matrix)
$p(y_t \mid x_t)$ (emission probability)
$p(x_1)$ (prior distribution)
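To make the three sets of parameters concrete, here is a small sketch of the coin example as a two-state HMM, with a sampler that generates tosses from it. All numeric values are invented for illustration, and the names `PRIOR`, `TRANS`, `EMIT`, and `sample_hmm` are hypothetical.

```python
import random

STATES = ("Fair", "Biased")
# Hypothetical parameter values, chosen only for illustration.
PRIOR = {"Fair": 0.5, "Biased": 0.5}                      # p(x_1)
TRANS = {"Fair": {"Fair": 0.9, "Biased": 0.1},            # a_ij, rows sum to 1
         "Biased": {"Fair": 0.2, "Biased": 0.8}}
EMIT = {"Fair": {"H": 0.5, "T": 0.5},                     # p(y_t | x_t)
        "Biased": {"H": 0.9, "T": 0.1}}

def draw(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]

def sample_hmm(length, rng):
    """Generate hidden states and observed tosses from the HMM."""
    states, tosses = [], []
    state = draw(PRIOR, rng)
    for _ in range(length):
        states.append(state)
        tosses.append(draw(EMIT[state], rng))   # emission depends only on the state
        state = draw(TRANS[state], rng)         # next state depends only on this one
    return states, tosses

rng = random.Random(0)
states, tosses = sample_hmm(12, rng)
print("observed:", " ".join(tosses))
print("hidden:  ", " ".join(s[0] for s in states))
```

Only the first printed line would be visible to the player in the betting scenario. The second line is the hidden state sequence.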
Model inference
• Infer states when model parameters are known.
• Infer both states and model parameters when both are unknown.
Viterbi algorithm
[Figure: trellis of states 1 to 4 (vertical axis: state) against positions t-1, t, t+1 (horizontal axis: time).]
• Most probable path:
$\pi^* = \arg\max_{\pi} p(\pi, y)$
where $\pi = (\pi_1, \ldots, \pi_L)$ is the hidden state path and $y = (y_1, \ldots, y_L)$ is the observed sequence.
The joint probability factorizes as
$p(\pi_1, \ldots, \pi_i, y_1, \ldots, y_i) = p(\pi_1, \ldots, \pi_{i-1}, y_1, \ldots, y_{i-1})\, a_{\pi_{i-1}, \pi_i}\, p(y_i \mid \pi_i)$
Therefore, the path can be found iteratively.
Let $v_k(i)$ be the probability of the most probable path ending in state k at position i.
Then $v_l(i+1) = e_l(y_{i+1}) \max_k \left( v_k(i)\, a_{kl} \right)$, where $e_l$ is the emission probability of state l.
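Viterbi algorithm
State 0 denotes the silent begin/end state, so $a_{0k}$ plays the role of the prior and $a_{k0}$ is the probability of ending in state k.
• Initialization (i=0): $v_0(0) = 1$, $v_k(0) = 0$ for $k > 0$.
• Recursion (i=1,...,L):
$v_l(i) = e_l(y_i) \max_k \left( v_k(i-1)\, a_{kl} \right)$, with $e_l(y_i) = p(y_i \mid \pi_i = l)$
$\mathrm{ptr}_i(l) = \arg\max_k \left( v_k(i-1)\, a_{kl} \right)$
• Termination:
$P(y, \pi^*) = \max_k \left( v_k(L)\, a_{k0} \right)$
$\pi^*_L = \arg\max_k \left( v_k(L)\, a_{k0} \right)$
• Traceback (i = L, ..., 1): $\pi^*_{i-1} = \mathrm{ptr}_i(\pi^*_i)$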
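The four steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the slides' implementation. It works in log space to avoid underflow, uses an explicit prior in place of the begin state, omits the end state, and assumes all probabilities are strictly positive. The parameter values are hypothetical.

```python
import math

STATES = ("Fair", "Biased")
# Hypothetical parameter values, chosen only for illustration.
PRIOR = {"Fair": 0.5, "Biased": 0.5}
TRANS = {"Fair": {"Fair": 0.9, "Biased": 0.1},
         "Biased": {"Fair": 0.2, "Biased": 0.8}}
EMIT = {"Fair": {"H": 0.5, "T": 0.5},
        "Biased": {"H": 0.9, "T": 0.1}}

def viterbi(obs, states, prior, trans, emit):
    """Return (most probable state path, log of its joint probability)."""
    # Initialization: v[k] = log p(x_1 = k) + log e_k(y_1)
    v = {k: math.log(prior[k]) + math.log(emit[k][obs[0]]) for k in states}
    pointers = []
    # Recursion: v_l(i) = e_l(y_i) * max_k v_k(i-1) a_kl, done in log space
    for y in obs[1:]:
        ptr, v_new = {}, {}
        for l in states:
            best = max(states, key=lambda k: v[k] + math.log(trans[k][l]))
            ptr[l] = best
            v_new[l] = math.log(emit[l][y]) + v[best] + math.log(trans[best][l])
        pointers.append(ptr)
        v = v_new
    # Termination: best final state (no end state in this sketch)
    last = max(states, key=lambda k: v[k])
    # Traceback: follow the stored pointers from the last position to the first
    path = [last]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]

obs = list("HTHTHHHHTHHT")
path, log_p = viterbi(obs, STATES, PRIOR, TRANS, EMIT)
print(" ".join(obs))
print(" ".join(s[0] for s in path))
```

The cost is proportional to the sequence length times the square of the number of states, whereas enumerating all paths grows exponentially with the length.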
Advantage of Viterbi path
• Identify the most probable path very efficiently.
• The most probable path is legitimate, i.e., it is realizable by the HMM process.
Issue with Viterbi path
• The most probable path does not give the confidence level of a state estimate.
• The most probable path may not be much more probable than other paths.
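Posterior distribution
Estimate $p(\pi_i = k \mid y_1, \ldots, y_L)$, the posterior probability of state k at position i.
Strategy:
$p(\pi_i = k \mid y_1, \ldots, y_L) = \frac{p(\pi_i = k, y_1, \ldots, y_L)}{p(y_1, \ldots, y_L)}$
$= \frac{p(y_1, \ldots, y_i, \pi_i = k)\, p(y_{i+1}, \ldots, y_L \mid y_1, \ldots, y_i, \pi_i = k)}{p(y_1, \ldots, y_L)}$
$= \frac{p(y_1, \ldots, y_i, \pi_i = k)\, p(y_{i+1}, \ldots, y_L \mid \pi_i = k)}{p(y_1, \ldots, y_L)}$
$= \frac{f_k(i)\, b_k(i)}{p(y_1, \ldots, y_L)}$
This is done by a forward-backward algorithm.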
Forward-backward algorithm
Forward algorithm
Estimate $f_k(i)$:
$f_l(i) = p(y_1, \ldots, y_i, \pi_i = l)$
$= \sum_k p(y_1, \ldots, y_{i-1}, \pi_{i-1} = k)\, a_{kl}\, e_l(y_i)$
$= \sum_k f_k(i-1)\, a_{kl}\, e_l(y_i)$
Initialization: $f_0(0) = 1$, $f_k(0) = 0$ for $k > 0$.
Recursion: $f_l(i) = e_l(y_i) \sum_k f_k(i-1)\, a_{kl}$
Termination: $P(y) = \sum_k f_k(L)\, a_{k0}$
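A minimal sketch of the forward recursion follows, under the same simplifications one might make in practice: an explicit prior replaces the begin state, there is no end state (so the termination step is a plain sum), and no rescaling is done, which is fine only for short sequences. Parameter values and names are hypothetical.

```python
STATES = ("Fair", "Biased")
# Hypothetical parameter values, chosen only for illustration.
PRIOR = {"Fair": 0.5, "Biased": 0.5}
TRANS = {"Fair": {"Fair": 0.9, "Biased": 0.1},
         "Biased": {"Fair": 0.2, "Biased": 0.8}}
EMIT = {"Fair": {"H": 0.5, "T": 0.5},
        "Biased": {"H": 0.9, "T": 0.1}}

def forward(obs, states, prior, trans, emit):
    """Return (f, p_y), where f[i][l] = p(y_1..y_{i+1}, state at i = l), 0-based i."""
    # Initialization with the prior instead of a begin state
    f = [{l: prior[l] * emit[l][obs[0]] for l in states}]
    # Recursion: f_l(i) = e_l(y_i) * sum_k f_k(i-1) a_kl
    for y in obs[1:]:
        prev = f[-1]
        f.append({l: emit[l][y] * sum(prev[k] * trans[k][l] for k in states)
                  for l in states})
    # Termination: sum over the final state (no end-state transition here)
    return f, sum(f[-1].values())

obs = list("HTHTHHHHTHHT")
f, p_y = forward(obs, STATES, PRIOR, TRANS, EMIT)
print("p(y) =", p_y)
```

Replacing the sum over k with a max gives the Viterbi recursion, which is why the two algorithms share the same structure and cost.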
Backward algorithm
Estimate $b_k(i)$:
$b_l(i-1) = p(y_i, \ldots, y_L \mid \pi_{i-1} = l)$
$= \sum_k p(y_{i+1}, \ldots, y_L \mid \pi_i = k)\, a_{lk}\, e_k(y_i)$
$= \sum_k b_k(i)\, a_{lk}\, e_k(y_i)$
Initialization: $b_k(L) = a_{k0}$ for all k.
Recursion: $b_k(i) = \sum_l a_{kl}\, e_l(y_{i+1})\, b_l(i+1)$
Termination: $P(y) = \sum_l a_{0l}\, e_l(y_1)\, b_l(1)$
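Combining the two passes gives the posterior probability of each state at each position. The sketch below is one possible arrangement, not the slides' code. It uses an explicit prior instead of a begin state and no end state, so the backward initialization is 1 rather than $a_{k0}$, and it does no rescaling. Parameter values and names are hypothetical.

```python
STATES = ("Fair", "Biased")
# Hypothetical parameter values, chosen only for illustration.
PRIOR = {"Fair": 0.5, "Biased": 0.5}
TRANS = {"Fair": {"Fair": 0.9, "Biased": 0.1},
         "Biased": {"Fair": 0.2, "Biased": 0.8}}
EMIT = {"Fair": {"H": 0.5, "T": 0.5},
        "Biased": {"H": 0.9, "T": 0.1}}

def forward(obs, states, prior, trans, emit):
    """f[i][l] = joint probability of the first i+1 observations and state l at i."""
    f = [{l: prior[l] * emit[l][obs[0]] for l in states}]
    for y in obs[1:]:
        prev = f[-1]
        f.append({l: emit[l][y] * sum(prev[k] * trans[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1].values())

def backward(obs, states, trans, emit):
    """b[i][k] = probability of the observations after position i, given state k at i."""
    b = [{k: 1.0 for k in states}]          # no end state, so b_k(L) = 1
    for y_next in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(trans[k][l] * emit[l][y_next] * nxt[l] for l in states)
                     for k in states})
    return b

def posterior_states(obs, states, prior, trans, emit):
    """Posterior of each state at each position: f_k(i) b_k(i) / p(y)."""
    f, p_y = forward(obs, states, prior, trans, emit)
    b = backward(obs, states, trans, emit)
    return [{k: f[i][k] * b[i][k] / p_y for k in states} for i in range(len(obs))]

obs = list("HTHTHHHHTHHT")
post = posterior_states(obs, STATES, PRIOR, TRANS, EMIT)
for y, p in zip(obs, post):
    print(y, "P(fair) = %.3f" % p["Fair"])
```

Plotting the `Fair` column against position gives a curve of the kind the slides label P(fair).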
[Figure: Probability of fair coin; vertical axis P(fair), with a tick mark at 1.]
Posterior distribution
• Posterior distribution predicts the confidence level of a state estimate.
• Posterior distribution combines information from all paths.
But:
• The predicted path may not be legitimate, i.e., the sequence of individually most probable states may not be realizable by the HMM process.
Estimating parameters when state sequence is known
Given the state sequence {xk}
Define
$A_{kl}$ = # transitions from k to l.
$E_k(b)$ = # emissions of b from k.
The maximum likelihood estimates of parameters are:
$a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}$
$e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$
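When the states are known, the estimates reduce to counting and normalizing. The sketch below shows this on a short hand-labeled sequence. The function name and the example data are hypothetical, and no pseudocounts are added, so unseen transitions or emissions get probability zero.

```python
from collections import Counter

def estimate_from_labeled(states_seq, obs_seq):
    """Maximum likelihood a_kl and e_k(b) from a known state sequence."""
    A = Counter(zip(states_seq, states_seq[1:]))   # A[(k, l)]: transitions k -> l
    E = Counter(zip(states_seq, obs_seq))          # E[(k, b)]: emissions of b from k
    state_set = sorted(set(states_seq))
    symbols = sorted(set(obs_seq))
    a, e = {}, {}
    for k in state_set:
        row_total = sum(A[(k, l)] for l in state_set)
        # A state seen only at the last position has no outgoing counts
        a[k] = {l: A[(k, l)] / row_total for l in state_set} if row_total else {}
        emit_total = sum(E[(k, b)] for b in symbols)
        e[k] = {b: E[(k, b)] / emit_total for b in symbols}
    return a, e

# Invented labeled example: F = fair coin, B = biased coin.
states_seq = list("FFFBBBFF")
obs_seq = list("HTHHHHTH")
a, e = estimate_from_labeled(states_seq, obs_seq)
print(a)
print(e)
```

With so few observations the biased coin never shows a tail, so its estimated tail probability is exactly zero. Adding small pseudocounts to the counts is a common way to avoid this.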
Infer hidden states together with model parameters
• Viterbi training
• Baum-Welch
Viterbi training
Main idea: Use an iterative procedure
• Estimate state for fixed parameters using the Viterbi algorithm.
• Estimate model parameters for fixed states.
Baum-Welch algorithm
• Instead of using the Viterbi path to estimate states, consider the expected values of $A_{kl}$ and $E_k(b)$:
$p(\pi_i = k, \pi_{i+1} = l \mid y_1, \ldots, y_L) = \frac{f_k(i)\, a_{kl}\, e_l(y_{i+1})\, b_l(i+1)}{p(y_1, \ldots, y_L)}$
$A_{kl} = \sum_j \frac{1}{p(y^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(y^j_{i+1})\, b_l^j(i+1)$
$E_k(b) = \sum_j \frac{1}{p(y^j)} \sum_{\{i \mid y^j_i = b\}} f_k^j(i)\, b_k^j(i)$
where j indexes the training sequences $y^j$.
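The expected counts can be computed from the forward and backward tables and then normalized, which gives one Baum-Welch update. The sketch below handles a single training sequence, keeps the prior fixed, uses no end state, and does no rescaling. These simplifications, the starting values, and all names are assumptions made for illustration.

```python
STATES = ("Fair", "Biased")
SYMBOLS = ("H", "T")
# Hypothetical starting guesses for the parameters.
PRIOR = {"Fair": 0.5, "Biased": 0.5}
TRANS = {"Fair": {"Fair": 0.7, "Biased": 0.3},
         "Biased": {"Fair": 0.3, "Biased": 0.7}}
EMIT = {"Fair": {"H": 0.5, "T": 0.5},
        "Biased": {"H": 0.7, "T": 0.3}}

def forward(obs, states, prior, trans, emit):
    f = [{l: prior[l] * emit[l][obs[0]] for l in states}]
    for y in obs[1:]:
        prev = f[-1]
        f.append({l: emit[l][y] * sum(prev[k] * trans[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1].values())

def backward(obs, states, trans, emit):
    b = [{k: 1.0 for k in states}]
    for y_next in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(trans[k][l] * emit[l][y_next] * nxt[l] for l in states)
                     for k in states})
    return b

def baum_welch_step(obs, states, symbols, prior, trans, emit):
    """One update: expected counts A_kl and E_k(b), then normalize each row."""
    f, p_y = forward(obs, states, prior, trans, emit)
    b = backward(obs, states, trans, emit)
    A = {k: {l: 0.0 for l in states} for k in states}
    E = {k: {s: 0.0 for s in symbols} for k in states}
    for i, y in enumerate(obs):
        for k in states:
            E[k][y] += f[i][k] * b[i][k] / p_y            # expected emissions of y from k
            if i + 1 < len(obs):
                for l in states:                          # expected transitions k -> l
                    A[k][l] += (f[i][k] * trans[k][l] * emit[l][obs[i + 1]]
                                * b[i + 1][l] / p_y)
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_emit = {k: {s: E[k][s] / sum(E[k].values()) for s in symbols} for k in states}
    return new_trans, new_emit, p_y

obs = list("HTHTHHHHTHHT")
new_trans, new_emit, p_before = baum_welch_step(obs, STATES, SYMBOLS, PRIOR, TRANS, EMIT)
_, p_after = forward(obs, STATES, PRIOR, new_trans, new_emit)
print("likelihood before:", p_before, "after:", p_after)
```

Repeating the step until the likelihood stops improving gives the full training loop. Because each step is an EM update, the likelihood should not decrease from one iteration to the next.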
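Baum-Welch is a special case of EM algorithm
• Given an estimate of parameter $\theta^t$, try to find a better $\theta$.
With hidden state sequence x and observations y:
$\log P(y \mid \theta) = \log P(x, y \mid \theta) - \log P(x \mid y, \theta)$
Taking the expectation over $P(x \mid y, \theta^t)$:
$\log P(y \mid \theta) = \sum_x P(x \mid y, \theta^t) \log P(x, y \mid \theta) - \sum_x P(x \mid y, \theta^t) \log P(x \mid y, \theta)$
The first sum defines $Q(\theta \mid \theta^t)$, and
$\log P(y \mid \theta) - \log P(y \mid \theta^t) \ge Q(\theta \mid \theta^t) - Q(\theta^t \mid \theta^t)$
• Choose $\theta$ to maximize Q.
• E-step: Calculate the Q function.
• M-step: Maximize $Q(\theta \mid \theta^t)$ with respect to $\theta$.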
Issue with EM
• EM only finds local maxima.
• Solution:
– Run EM multiple times, starting from different initial guesses.
– Use a more sophisticated algorithm such as MCMC.
Dynamic Bayesian Network
[Figure: dynamic Bayesian network; credit: Kevin Murphy.]
Software
• Kevin Murphy’s Bayes Net Toolbox for Matlab
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
Applications
• Copy number changes (Yi Li)
• Protein-binding sites
• Sequence alignment (figure source: www.biocentral.com)
Reading list
• Hastie et al. (2001), The Elements of Statistical Learning (the ESL book), pp. 184-185.
• Durbin et al. (1998), Biological Sequence Analysis, Chapter 3.