Hidden Markov Models
Bonnie Dorr Christof Monz
CMSC 723: Introduction to Computational Linguistics
Lecture 5
October 6, 2004
Hidden Markov Model (HMM)
HMMs allow you to estimate probabilities of unobserved events. Given the observed surface data, which underlying parameters generated it? E.g., in speech recognition, the observed data is the acoustic signal and the words are the hidden parameters.
HMMs and their Usage
HMMs are very common in Computational Linguistics:
- Speech recognition (observed: acoustic signal, hidden: words)
- Handwriting recognition (observed: image, hidden: words)
- Part-of-speech tagging (observed: words, hidden: part-of-speech tags)
- Machine translation (observed: foreign words, hidden: words in target language)
Noisy Channel Model
In speech recognition you observe an acoustic signal (A = a1,…,an) and you want to determine the most likely sequence of words (W = w1,…,wn): P(W | A).
Problem: A and W are too specific for reliable counts on observed data, and are very unlikely to occur in unseen data.
Noisy Channel Model
Assume that the acoustic signal (A) is already segmented wrt word boundaries. P(W | A) could be computed as

$$P(W \mid A) = \max_{w_i} \prod_i P(w_i \mid a_i)$$

Problem: finding the most likely word corresponding to an acoustic representation depends on the context.
E.g., /'pre-z&ns/ could mean “presents” or “presence” depending on the context.
Noisy Channel Model
Given a candidate sequence W we need to compute P(W) and combine it with P(W | A).
Applying Bayes' rule:

$$\operatorname{argmax}_W P(W \mid A) = \operatorname{argmax}_W \frac{P(A \mid W)\,P(W)}{P(A)}$$

The denominator P(A) can be dropped, because it is constant for all W.
Noisy Channel in a Picture
Decoding
The decoder combines evidence from:
The likelihood P(A | W), which can be approximated as

$$P(A \mid W) \approx \prod_{i=1}^{n} P(a_i \mid w_i)$$

The prior P(W), which can be approximated as

$$P(W) \approx P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1})$$
Search Space
Given a word-segmented acoustic sequence, list all candidates and compute the most likely path.

[Figure: a lattice over the acoustic segments /'pre-z&ns/, /ik-'spen-siv/, /'bot/ with candidate words such as presents/presence/presidents, expressive/expensive/excessive, bold/bald/bought/boat, and transition probabilities such as P(inactive | bald) and P('bot | bald) on the edges]
Markov Assumption
The Markov assumption states that the probability of the occurrence of word wi at time t depends only on the occurrence of word wi-1 at time t-1.
Chain rule:

$$P(w_1, \ldots, w_n) = P(w_1) \prod_{i=2}^{n} P(w_i \mid w_1, \ldots, w_{i-1})$$

Markov assumption:

$$P(w_1, \ldots, w_n) \approx P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1})$$
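The bigram factorization above can be sketched directly in Python. The words and probabilities below are invented purely for illustration; they are not from the lecture:

```python
# Hypothetical bigram model; words and numbers are made up
# solely to illustrate the Markov factorization.
p_first = {"the": 0.2}                     # P(w1)
p_bigram = {("dog", "the"): 0.01,          # P(wi | wi-1)
            ("barks", "dog"): 0.05}

def sentence_prob(words):
    """P(w1,...,wn) ~ P(w1) * product over i of P(wi | wi-1)."""
    p = p_first[words[0]]
    for i in range(1, len(words)):
        p *= p_bigram[(words[i], words[i - 1])]
    return p

prob = sentence_prob(["the", "dog", "barks"])   # 0.2 * 0.01 * 0.05
```

Under the full chain rule each factor would condition on the entire history w1,…,wi-1; the bigram table is exactly what the Markov assumption buys us.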
The Trellis
Parameters of an HMM
States: a set of states S = s1,…,sn.
Transition probabilities: A = a1,1, a1,2, …, an,n. Each ai,j represents the probability of transitioning from state si to sj.
Emission probabilities: a set B of functions of the form bi(ot), which is the probability of observation ot being emitted by si.
Initial state distribution: πi is the probability that si is a start state.
The Three Basic HMM Problems
Problem 1 (Evaluation): Given the observation sequence O = o1,…,oT and an HMM model λ = (A, B, π), how do we compute the probability of O given the model?
Problem 2 (Decoding): Given the observation sequence O = o1,…,oT and an HMM model λ = (A, B, π), how do we find the state sequence that best explains the observations?
The Three Basic HMM Problems
Problem 3 (Learning): How do we adjust the model parameters λ = (A, B, π) to maximize P(O | λ)?
Problem 1: Probability of an Observation Sequence
What is P(O | λ)? The probability of an observation sequence is the sum of the probabilities of all possible state sequences in the HMM.
Naïve computation is very expensive. Given T observations and N states, there are N^T possible state sequences. Even small HMMs, e.g. T = 10 and N = 10, contain 10 billion different paths.
The solution to this and Problem 2 is to use dynamic programming.
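The naïve computation can be written out directly as a sum over all N^T state sequences. A minimal sketch, using a made-up toy HMM (these numbers are purely illustrative):

```python
import itertools

# Toy HMM; all numbers are hypothetical, for illustration only.
pi = [0.6, 0.4]                      # initial state distribution pi_i
A  = [[0.7, 0.3], [0.4, 0.6]]        # transition probabilities a_ij
B  = [[0.5, 0.5], [0.1, 0.9]]        # emission probabilities b_i(k)
O  = [0, 1, 0]                       # observation sequence (symbol indices)

def naive_likelihood(pi, A, B, O):
    """Sum P(O, Q | lambda) over all N**T state sequences Q."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):
        p = pi[Q[0]] * B[Q[0]][O[0]]
        for t in range(1, T):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][O[t]]
        total += p
    return total

likelihood = naive_likelihood(pi, A, B, O)
```

This enumeration is exponential in T, which is exactly what the forward algorithm below avoids.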
Forward Probabilities
What is the probability that, given an HMM λ, at time t the state is i and the partial observation o1 … ot has been generated?

$$\alpha_t(i) = P(o_1 \ldots o_t,\, q_t = s_i \mid \lambda)$$
Forward Probabilities

$$\alpha_t(i) = P(o_1 \ldots o_t,\, q_t = s_i \mid \lambda)$$

$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} \right] b_j(o_t)$$
Forward Algorithm
Initialization:

$$\alpha_1(i) = \pi_i\, b_i(o_1) \qquad 1 \le i \le N$$

Induction:

$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} \right] b_j(o_t) \qquad 2 \le t \le T,\; 1 \le j \le N$$

Termination:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
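The three steps above translate almost line-for-line into code. A minimal sketch, with a made-up toy HMM (the numbers are hypothetical):

```python
# Toy HMM; all numbers are hypothetical, for illustration only.
pi = [0.6, 0.4]                      # initial state distribution pi_i
A  = [[0.7, 0.3], [0.4, 0.6]]        # transition probabilities a_ij
B  = [[0.5, 0.5], [0.1, 0.9]]        # emission probabilities b_i(k)
O  = [0, 1, 0]                       # observation sequence (symbol indices)

def forward(pi, A, B, O):
    """Forward algorithm: returns the trellis alpha and P(O | lambda)."""
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                                   # initialization
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):                                # induction
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(N)) * B[j][O[t]]
    return alpha, sum(alpha[T - 1][i] for i in range(N))  # termination

alpha, likelihood = forward(pi, A, B, O)
```

A production implementation would work with log probabilities (or scaling factors) to avoid underflow for long sequences.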
Forward Algorithm Complexity
In the naïve approach to solving Problem 1, it takes on the order of 2T·N^T computations.
The forward algorithm takes on the order of N²T computations.
Backward Probabilities
Analogous to the forward probability, just in the other direction. What is the probability that, given an HMM λ and given that the state at time t is i, the partial observation ot+1 … oT is generated?

$$\beta_t(i) = P(o_{t+1} \ldots o_T \mid q_t = s_i,\, \lambda)$$
Backward Probabilities

$$\beta_t(i) = P(o_{t+1} \ldots o_T \mid q_t = s_i,\, \lambda)$$

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$
Backward Algorithm
Initialization:

$$\beta_T(i) = 1 \qquad 1 \le i \le N$$

Induction:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j) \qquad t = T-1 \ldots 1,\; 1 \le i \le N$$

Termination:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$$
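The backward pass mirrors the forward pass in code. A minimal sketch with a made-up toy HMM (the numbers are hypothetical); the termination formula gives the same P(O | λ) as the forward algorithm would on the same model:

```python
# Toy HMM; all numbers are hypothetical, for illustration only.
pi = [0.6, 0.4]                      # initial state distribution pi_i
A  = [[0.7, 0.3], [0.4, 0.6]]        # transition probabilities a_ij
B  = [[0.5, 0.5], [0.1, 0.9]]        # emission probabilities b_i(k)
O  = [0, 1, 0]                       # observation sequence (symbol indices)

def backward(pi, A, B, O):
    """Backward algorithm: returns the trellis beta and P(O | lambda)."""
    N, T = len(pi), len(O)
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):                                   # initialization
        beta[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):                       # induction
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    # termination: P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    return beta, sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(N))

beta, likelihood = backward(pi, A, B, O)
```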
Problem 2: Decoding
The solution to Problem 1 (Evaluation) gives us the sum over all paths through an HMM efficiently. For Problem 2, we want to find the path with the highest probability.
We want to find the state sequence Q = q1…qT, such that

$$Q = \operatorname{argmax}_{Q'} P(Q' \mid O,\, \lambda)$$
Viterbi Algorithm
Similar to computing the forward probabilities, but instead of summing over transitions from incoming states, compute the maximum.
Forward:

$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} \right] b_j(o_t)$$

Viterbi recursion:

$$\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t)$$
Viterbi Algorithm
Initialization:

$$\delta_1(i) = \pi_i\, b_i(o_1) \qquad 1 \le i \le N$$

Induction:

$$\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t)$$

$$\psi_t(j) = \operatorname{argmax}_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] \qquad 2 \le t \le T,\; 1 \le j \le N$$

Termination:

$$p^* = \max_{1 \le i \le N} \delta_T(i) \qquad q_T^* = \operatorname{argmax}_{1 \le i \le N} \delta_T(i)$$

Read out path:

$$q_t^* = \psi_{t+1}(q_{t+1}^*) \qquad t = T-1, \ldots, 1$$
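The recursion, termination, and backtrace above can be sketched as follows, again on a made-up toy HMM (the numbers are hypothetical):

```python
# Toy HMM; all numbers are hypothetical, for illustration only.
pi = [0.6, 0.4]                      # initial state distribution pi_i
A  = [[0.7, 0.3], [0.4, 0.6]]        # transition probabilities a_ij
B  = [[0.5, 0.5], [0.1, 0.9]]        # emission probabilities b_i(k)
O  = [0, 1, 0]                       # observation sequence (symbol indices)

def viterbi(pi, A, B, O):
    """Viterbi algorithm: returns (best state path, its probability p*)."""
    N, T = len(pi), len(O)
    delta = [[0.0] * N for _ in range(T)]   # best-path probabilities
    psi   = [[0] * N for _ in range(T)]     # backpointers
    for i in range(N):                                   # initialization
        delta[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):                                # induction
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][O[t]]
    q = [0] * T                                          # termination
    q[T - 1] = max(range(N), key=lambda i: delta[T - 1][i])
    p_star = delta[T - 1][q[T - 1]]
    for t in range(T - 2, -1, -1):                       # read out path
        q[t] = psi[t + 1][q[t + 1]]
    return q, p_star

path, p_star = viterbi(pi, A, B, O)
```

Note that p* is the probability of the single best path, whereas the forward algorithm's termination sums over all paths, so p* is at most P(O | λ).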
Problem 3: Learning
Up to now weʼve assumed that we know the underlying model λ = (A, B, π).
Often these parameters are estimated on annotated training data, which has two drawbacks:
Annotation is difficult and/or expensive.
Training data is different from the current data.
We want to maximize the parameters with respect to the current data, i.e., weʼre looking for a model λ', such that

$$\lambda' = \operatorname{argmax}_{\lambda} P(O \mid \lambda)$$
Problem 3: Learning
Unfortunately, there is no known way to analytically find a global maximum, i.e., a model λ', such that

$$\lambda' = \operatorname{argmax}_{\lambda} P(O \mid \lambda)$$

But it is possible to find a local maximum. Given an initial model λ, we can always find a model λ', such that

$$P(O \mid \lambda') \ge P(O \mid \lambda)$$
Parameter Re-estimation
Use the forward-backward (or Baum-Welch) algorithm, which is a hill-climbing algorithm.
Using an initial parameter instantiation, the forward-backward algorithm iteratively re-estimates the parameters and improves the probability that the given observations are generated by the new parameters.
Parameter Re-estimation
Three parameters need to be re-estimated:
Initial state distribution: πi
Transition probabilities: ai,j
Emission probabilities: bi(ot)
Re-estimating Transition Probabilities
Whatʼs the probability of being in state si at time t and going to state sj, given the current model and parameters?

$$\xi_t(i, j) = P(q_t = s_i,\, q_{t+1} = s_j \mid O,\, \lambda)$$
Re-estimating Transition Probabilities

$$\xi_t(i, j) = P(q_t = s_i,\, q_{t+1} = s_j \mid O,\, \lambda)$$

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$
Re-estimating Transition Probabilities
The intuition behind the re-estimation equation for transition probabilities is

$$\hat{a}_{i,j} = \frac{\text{expected number of transitions from state } s_i \text{ to state } s_j}{\text{expected number of transitions from state } s_i}$$

Formally:

$$\hat{a}_{i,j} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{j'=1}^{N} \xi_t(i, j')}$$
Re-estimating Transition Probabilities
Defining

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$$

as the probability of being in state si, given the complete observation O, we can say:

$$\hat{a}_{i,j} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
Review of Probabilities
Forward probability αt(i): the probability of being in state si, given the partial observation o1,…,ot.
Backward probability βt(i): the probability of being in state si, given the partial observation ot+1,…,oT.
Transition probability ξt(i, j): the probability of going from state si to state sj, given the complete observation o1,…,oT.
State probability γt(i): the probability of being in state si, given the complete observation o1,…,oT.
Re-estimating Initial State Probabilities
Initial state distribution: πi is the probability that si is a start state.
Re-estimation is easy:

$$\hat{\pi}_i = \text{expected number of times in state } s_i \text{ at time } 1$$

Formally:

$$\hat{\pi}_i = \gamma_1(i)$$
Re-estimation of Emission Probabilities
Emission probabilities are re-estimated as

$$\hat{b}_i(k) = \frac{\text{expected number of times in state } s_i \text{ and observing symbol } v_k}{\text{expected number of times in state } s_i}$$

Formally:

$$\hat{b}_i(k) = \frac{\sum_{t=1}^{T} \delta(o_t, v_k)\, \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}$$

where δ(ot, vk) = 1 if ot = vk, and 0 otherwise.
Note that δ here is the Kronecker delta function and is not related to the δ in the discussion of the Viterbi algorithm!
The Updated Model
Coming from λ = (A, B, π) we get to λ' = (Â, B̂, π̂) by the following update rules:

$$\hat{\pi}_i = \gamma_1(i)$$

$$\hat{a}_{i,j} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$\hat{b}_i(k) = \frac{\sum_{t=1}^{T} \delta(o_t, v_k)\, \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}$$
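One full re-estimation step — forward and backward passes, ξ and γ, then the three updates — can be sketched as below. The toy model is made up for illustration; a real implementation would iterate until convergence and work with scaled or log probabilities to avoid underflow:

```python
# Toy HMM; all numbers are hypothetical, for illustration only.
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.5], [0.1, 0.9]]
O  = [0, 1, 0]
N, T = len(pi), len(O)
K = len(B[0])                        # number of observation symbols

# Forward and backward passes.
alpha = [[0.0] * N for _ in range(T)]
beta  = [[1.0] * N for _ in range(T)]   # beta_T(i) = 1
for i in range(N):
    alpha[0][i] = pi[i] * B[i][O[0]]
for t in range(1, T):
    for j in range(N):
        alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
for t in range(T - 2, -1, -1):
    for i in range(N):
        beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
PO = sum(alpha[T-1][i] for i in range(N))    # P(O | lambda)

# xi_t(i, j) and gamma_t(i); gamma via alpha*beta/P(O) equals sum_j xi for t < T.
xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / PO
        for j in range(N)] for i in range(N)] for t in range(T - 1)]
gamma = [[alpha[t][i] * beta[t][i] / PO for i in range(N)] for t in range(T)]

# The three update rules.
pi_hat = [gamma[0][i] for i in range(N)]
a_hat = [[sum(xi[t][i][j] for t in range(T - 1)) /
          sum(gamma[t][i] for t in range(T - 1))
          for j in range(N)] for i in range(N)]
b_hat = [[sum(gamma[t][i] for t in range(T) if O[t] == k) /
          sum(gamma[t][i] for t in range(T))
          for k in range(K)] for i in range(N)]
```

By construction π̂ sums to 1 and each row of Â and B̂ sums to 1, so λ' = (Â, B̂, π̂) is again a well-formed HMM.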
Expectation Maximization
The forward-backward algorithm is an instance of the more general EM algorithm.
The E step: compute the forward and backward probabilities for a given model.
The M step: re-estimate the model parameters.