Hidden Markov Models


Page 1: Hidden Markov Models

Hidden Markov Models

Hsin-Min Wang, [email protected]

References:

1. L. R. Rabiner and B. H. Juang (1993), Fundamentals of Speech Recognition, Chapter 6

2. X. Huang et al. (2001), Spoken Language Processing, Chapter 8

3. L. R. Rabiner (1989), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, February 1989

Page 2: Hidden Markov Models

Hidden Markov Model (HMM)

History
– Published in Baum's papers in the late 1960s and early 1970s
– Introduced to speech processing by Baker (CMU) and Jelinek (IBM) in the 1970s
– Introduced to computational biology in the late 1980s

• Lander and Green (1987) used HMMs in the construction of genetic linkage maps

• Churchill (1989) employed HMMs to distinguish coding from noncoding regions in DNA

Page 3: Hidden Markov Models

Hidden Markov Model (HMM)

Assumption
– A speech signal (or DNA sequence) can be characterized as a parametric random process
– The parameters can be estimated in a precise, well-defined manner

Three fundamental problems
– Evaluation of the probability (likelihood) of a sequence of observations given a specific HMM
– Determination of the best sequence of model states
– Adjustment of the model parameters so as to best account for the observed signal/sequence

Page 4: Hidden Markov Models

Hidden Markov Model (HMM)

A 3-state discrete HMM over the observation symbols {A, B, C}, with states S1, S2, S3. Given an initial model λ = (A, B, π) as follows:

π = (0.34, 0.33, 0.33)

A = [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33], [0.33, 0.33, 0.34]]

B: b1 = {A: 0.34, B: 0.33, C: 0.33}, b2 = {A: 0.33, B: 0.34, C: 0.33}, b3 = {A: 0.33, B: 0.33, C: 0.34}

We can train HMMs for the following two classes using their training data respectively.

Training set for class 1:
1. ABBCABCAABC  2. ABCABC  3. ABCAABC  4. BBABCAB  5. BCAABCCAB  6. CACCABCA  7. CABCABCA  8. CABCA  9. CABCA

Training set for class 2:
1. BBBCCBC  2. CCBABB  3. AACCBBB  4. BBABBAC  5. CCAABBAB  6. BBBCCBAA  7. ABBBBABA  8. CCCCC  9. BBAAA

We can then decide which class the following testing sequences belong to:
ABCABCCAB
AABABCCCCBBB


Page 5: Hidden Markov Models


Probability Theorem

Consider the simple scenario of rolling two dice, labeled die 1 and die 2. Define the following three events:

A: Die 1 lands on 3. B: Die 2 lands on 1. C: The dice sum to 8.

Prior probability: P(A)=P(B)=1/6, P(C)=5/36.

Joint probability: P(A,B) (or P(A∩B)) = 1/36. Two events A and B are statistically independent if and only if P(A,B) = P(A)×P(B).

P(B,C) = 0. Two events B and C are mutually exclusive if and only if B∩C = Φ, i.e., P(B∩C) = 0.

Conditional probability: P(C|A) = P(C,A)/P(A) = (1/36)/(1/6) = 1/6; P(B|A) = P(B), P(C|B) = 0

Bayes' rule and the posterior probability:

λ̂ = argmax_λ P(λ|O) = argmax_λ P(O|λ)P(λ)/P(O) = argmax_λ P(O|λ)P(λ)   (maximum likelihood principle)

C = {(2,6), (3,5), (4,4), (5,3), (6,2)}

A∩B = {(3,1)}

B∩C = Φ
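As a quick sanity check, these probabilities can be verified by enumerating all 36 equally likely outcomes of the two dice (a minimal sketch; the event definitions follow the slide):

```python
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two fair dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

A = {o for o in outcomes if o[0] == 3}       # die 1 lands on 3
B = {o for o in outcomes if o[1] == 1}       # die 2 lands on 1
C = {o for o in outcomes if sum(o) == 8}     # the dice sum to 8

p = lambda ev: Fraction(len(ev), len(outcomes))

print(p(A), p(B), p(C))          # 1/6 1/6 5/36
print(p(A & B), p(A) * p(B))     # 1/36 1/36 -> A and B are independent
print(p(B & C))                  # 0         -> B and C are mutually exclusive
print(p(A & C) / p(A))           # 1/6       -> P(C|A)
```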

Page 6: Hidden Markov Models

The Markov Chain

Using the chain rule P(A,B) = P(B|A)P(A),

P(X1, X2, …, Xn) = P(Xn|X1, X2, …, Xn-1) P(Xn-1|X1, …, Xn-2) … P(X2|X1) P(X1)
                 = P(X1) ∏_{i=2}^{n} P(Xi|X1, X2, …, Xi-1)

First-order Markov chain: each variable depends only on the immediately preceding one, i.e., P(Xi|X1, …, Xi-1) = P(Xi|Xi-1), so

P(X1, X2, …, Xn) = P(X1) ∏_{i=2}^{n} P(Xi|Xi-1)

Page 7: Hidden Markov Models

Observable Markov Model

The parameters of a Markov chain, with N states labeled {1,…,N} and the state at time t denoted qt, can be described as

aij = P(qt=j|qt-1=i), 1≤i,j≤N

πi = P(q1=i), 1≤i≤N

where Σ_{j=1}^{N} aij = 1 (1≤i≤N) and Σ_{i=1}^{N} πi = 1

The output of the process is the set of states at each time instant t, where each state corresponds to an observable event Xi

There is a one-to-one correspondence between the observable sequence and the Markov chain state sequence

(Rabiner 1989)

Page 8: Hidden Markov Models

The Markov Chain – Ex 1

A 3-state Markov chain
– State 1 generates symbol A only, State 2 generates symbol B only, State 3 generates symbol C only

π = (0.4, 0.5, 0.1)

A = [[0.6, 0.3, 0.1], [0.1, 0.7, 0.2], [0.3, 0.2, 0.5]]

– Given a sequence of observed symbols O={CABBCABC}, the only corresponding state sequence is Q={S3S1S2S2S3S1S2S3}, and the corresponding probability is

P(O|λ) = P(CABBCABC|λ) = P(Q|λ) = P(S3S1S2S2S3S1S2S3|λ)
= π(S3)P(S1|S3)P(S2|S1)P(S2|S2)P(S3|S2)P(S1|S3)P(S2|S1)P(S3|S2)
= 0.1×0.3×0.3×0.7×0.2×0.3×0.3×0.2 = 0.00002268
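A minimal sketch of this computation (the parameter values are from the slide; the symbol-to-state mapping and function name are mine):

```python
# Observable Markov chain from Ex 1: state 1 emits A, state 2 emits B, state 3 emits C.
pi = [0.4, 0.5, 0.1]
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]
state_of = {'A': 0, 'B': 1, 'C': 2}   # one-to-one symbol -> state mapping

def chain_prob(observation: str) -> float:
    """P(O|lambda) for an observable Markov chain."""
    states = [state_of[s] for s in observation]
    prob = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= A[prev][cur]
    return prob

print(chain_prob("CABBCABC"))   # 2.268e-05, matching the slide
```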

Page 9: Hidden Markov Models


The Markov Chain – Ex 2

A three-state Markov chain for the Dow Jones Industrial average


The probability of 5 consecutive up days

P(5 consecutive up days) = P(S1, S1, S1, S1, S1)
= π1 · a11 · a11 · a11 · a11 = 0.5 × (0.6)^4 = 0.0648

(Huang et al., 2001)

Page 10: Hidden Markov Models

Extension to Hidden Markov Models

HMM: an extended version of the Observable Markov Model
– The observation is a probabilistic function (discrete or continuous) of a state, rather than being in one-to-one correspondence with a state
– The model is a doubly embedded stochastic process with an underlying stochastic process that is not directly observable (hidden)
• What is hidden? The state sequence! Given the observation sequence, we cannot be sure which state sequence generated it!

Page 11: Hidden Markov Models

Hidden Markov Models – Ex 1

A 3-state discrete HMM λ over the observation symbols {A, B, C}

Initial model:

π = (0.4, 0.5, 0.1)

A = [[0.6, 0.3, 0.1], [0.1, 0.7, 0.2], [0.3, 0.2, 0.5]]

B: b1 = {A: 0.3, B: 0.2, C: 0.5}, b2 = {A: 0.7, B: 0.1, C: 0.2}, b3 = {A: 0.3, B: 0.6, C: 0.1}

– Given an observation sequence O={ABC}, there are 27 possible corresponding state sequences, and therefore the probability P(O|λ) is

P(O|λ) = Σ_{i=1}^{27} P(O, Qi|λ) = Σ_{i=1}^{27} P(O|Qi, λ) P(Qi|λ), where Qi is a state sequence

e.g. when Qi = {S2, S2, S3}:
P(O|Qi, λ) = P(A|S2) P(B|S2) P(C|S3) = 0.7×0.1×0.1 = 0.007
P(Qi|λ) = P(S2) P(S2|S2) P(S3|S2) = 0.5×0.7×0.2 = 0.07
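A minimal sketch of this direct evaluation, enumerating all 3^3 = 27 state sequences for O = ABC (the parameter values are from the slide; the array layout and indexing are mine):

```python
from itertools import product

pi = [0.4, 0.5, 0.1]
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]
# B[state][symbol] with symbols indexed A=0, B=1, C=2
B = [[0.3, 0.2, 0.5],
     [0.7, 0.1, 0.2],
     [0.3, 0.6, 0.1]]
O = [0, 1, 2]  # the observation sequence "ABC"

total = 0.0
for Q in product(range(3), repeat=len(O)):       # all 27 state sequences
    p_Q = pi[Q[0]]
    for prev, cur in zip(Q, Q[1:]):
        p_Q *= A[prev][cur]                      # P(Q|lambda)
    p_O_given_Q = 1.0
    for state, symbol in zip(Q, O):
        p_O_given_Q *= B[state][symbol]          # P(O|Q,lambda)
    total += p_Q * p_O_given_Q

print(total)   # P(O|lambda) by direct evaluation
```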

Page 12: Hidden Markov Models

Hidden Markov Models – Ex 2

(Huang et al., 2001)

Given a three-state Hidden Markov Model for the Dow Jones Industrial average as follows:

How to find the probability P(up, up, up, up, up|λ)?
How to find the optimal state sequence of the model which generates the observation sequence "up, up, up, up, up"?

cf. the Markov chain

(3^5 = 243 state sequences can generate "up, up, up, up, up".)

Page 13: Hidden Markov Models

Elements of an HMM

An HMM is characterized by the following:

1. N, the number of states in the model

2. M, the number of distinct observation symbols per state

3. The state transition probability distribution A={aij}, where aij=P[qt+1=j|qt=i], 1≤i,j≤N

4. The observation symbol probability distribution in state j, B={bj(vk)}, where bj(vk)=P[ot=vk|qt=j], 1≤j≤N, 1≤k≤M

5. The initial state distribution π={πi}, where πi=P[q1=i], 1≤i≤N

For convenience, we usually use a compact notation λ=(A,B,π) to indicate the complete parameter set of an HMM
– It also requires specification of the two model parameters N and M

Page 14: Hidden Markov Models

Two Major Assumptions for HMM

First-order Markov assumption
– The state transition depends only on the origin and destination states
– The state transition probability is time invariant

aij = P(qt+1=j|qt=i), 1≤i,j≤N

P(Q|λ) = P(q1, q2, …, qT|λ) = P(q1) ∏_{t=2}^{T} P(qt|qt-1)

Output-independent assumption
– The observation depends only on the state that generates it, not on its neighboring observations

P(O|Q,λ) = P(o1, o2, …, oT|q1, q2, …, qT, λ) = ∏_{t=1}^{T} P(ot|qt) = ∏_{t=1}^{T} bqt(ot)

Page 15: Hidden Markov Models

Three Basic Problems for HMMs

Given an observation sequence O=(o1,o2,…,oT) and an HMM λ=(A,B,π)

– Problem 1 (Evaluation):
How to compute P(O|λ) efficiently?
e.g., P(up, up, up, up, up|λ)

– Problem 2 (Decoding):
How to choose an optimal state sequence Q=(q1,q2,…,qT) which best explains the observations?
Q* = argmax_Q P(Q,O|λ)

– Problem 3 (Learning/Training):
How to adjust the model parameters λ=(A,B,π) to maximize P(O|λ)?
λ* = argmax_λ P(O|λ)

Page 16: Hidden Markov Models


Solution to Problem 1

Page 17: Hidden Markov Models

Solution to Problem 1 - Direct Evaluation

Given O and λ, find P(O|λ) = Pr{observing O given λ}

Evaluate all possible state sequences Q of length T that could generate the observation sequence O:

P(O|λ) = Σ_all Q P(O,Q|λ) = Σ_all Q P(O|Q,λ) P(Q|λ)

P(Q|λ): the probability of the path Q
– By the first-order Markov assumption,
P(Q|λ) = P(q1) ∏_{t=2}^{T} P(qt|qt-1) = πq1 aq1q2 aq2q3 … aqT-1qT

P(O|Q,λ): the joint output probability along the path Q
– By the output-independent assumption,
P(O|Q,λ) = ∏_{t=1}^{T} P(ot|qt) = ∏_{t=1}^{T} bqt(ot)

Page 18: Hidden Markov Models

Solution to Problem 1 - Direct Evaluation (cont'd)

[Trellis diagram: states S1, S2, S3 listed at each time 1, 2, 3, …, T-1, T for observations o1, o2, o3, …, oT-1, oT; a shaded state Sj means bj(ot) has been computed, and a marked arc aij means aij has been computed.]

Page 19: Hidden Markov Models

Solution to Problem 1 - Direct Evaluation (cont'd)

P(O|λ) = Σ_all Q P(O|Q,λ) P(Q|λ)
       = Σ_{q1,q2,…,qT} πq1 bq1(o1) aq1q2 bq2(o2) … aqT-1qT bqT(oT)

– A huge computation requirement: O(N^T) (there are N^T state sequences)
• Exponential computational complexity
• Complexity: MUL: (2T-1)N^T, ADD: N^T-1

A more efficient algorithm can be used to evaluate P(O|λ)
– The Forward Procedure/Algorithm

Page 20: Hidden Markov Models

Solution to Problem 1 - The Forward Procedure

Based on the HMM assumptions, the calculation of P(qt|qt-1,λ) and P(ot|qt,λ) involves only qt-1, qt, and ot, so it is possible to compute the likelihood P(O|λ) with a recursion on t

Forward variable: αt(i) = P(o1,o2,…,ot, qt=i|λ)
– The probability of the joint event that o1,o2,…,ot are observed and the state at time t is i, given the model λ

Recursion: αt+1(j) = [Σ_{i=1}^{N} αt(i) aij] bj(ot+1)

Page 21: Hidden Markov Models

Solution to Problem 1 - The Forward Procedure (cont'd)

αt+1(j) = P(o1,o2,…,ot+1, qt+1=j|λ)
= P(ot+1|o1,…,ot, qt+1=j, λ) P(o1,…,ot, qt+1=j|λ)
= P(ot+1|qt+1=j, λ) P(o1,…,ot, qt+1=j|λ)                       (output-independent assumption)
= bj(ot+1) Σ_{i=1}^{N} P(o1,…,ot, qt=i, qt+1=j|λ)
= bj(ot+1) Σ_{i=1}^{N} P(qt+1=j|o1,…,ot, qt=i, λ) P(o1,…,ot, qt=i|λ)
= bj(ot+1) Σ_{i=1}^{N} P(qt+1=j|qt=i, λ) P(o1,…,ot, qt=i|λ)    (first-order Markov assumption)
= [Σ_{i=1}^{N} αt(i) aij] bj(ot+1)

Identities used: P(A) = Σ_all B P(A,B); P(A,B|λ) = P(A|B,λ) P(B|λ); P(ot+1|qt+1=j, λ) = bj(ot+1)

Page 22: Hidden Markov Models

Solution to Problem 1 - The Forward Procedure (cont'd)

α3(2) = P(o1,o2,o3, q3=2|λ)
      = [α2(1)·a12 + α2(2)·a22 + α2(3)·a32] b2(o3)

[Trellis diagram: states S1, S2, S3 at times 1, 2, 3, …, T-1, T for observations o1, o2, o3, …, oT-1, oT; the cell α3(2) is computed from α2(1), α2(2), α2(3) via the arcs a12, a22, a32 and the emission b2(o3).]

Page 23: Hidden Markov Models

Solution to Problem 1 - The Forward Procedure (cont'd)

Algorithm

1. Initialization: α1(i) = P(o1, q1=i|λ) = πi bi(o1), 1≤i≤N

2. Induction: αt+1(j) = [Σ_{i=1}^{N} αt(i) aij] bj(ot+1), 1≤t≤T-1, 1≤j≤N

3. Termination: P(O|λ) = Σ_{i=1}^{N} αT(i)

– Complexity: O(N²T); MUL: N(N+1)(T-1)+N, ADD: N(N-1)(T-1)+(N-1)
cf. O(N^T) for direct evaluation

Based on the lattice (trellis) structure
– Computed in a time-synchronous fashion from left to right, where each cell for time t is completely computed before proceeding to time t+1
– All state sequences, regardless of their previous history, merge to N nodes (states) at each time instant t

Page 24: Hidden Markov Models


Solution to Problem 1 - The Forward Procedure (cont’d)

A three-state Hidden Markov Model for the Dow Jones Industrial average

b1(up)=0.7

b2(up)= 0.1

b3(up)=0.3

a11=0.6

a21=0.5

a31=0.4

(Huang et al., 2001)

b1(up)=0.7

b2(up)= 0.1

b3(up)=0.3

π1=0.5

π2=0.2

π3=0.3

α1(1)=0.5*0.7

α1(2)= 0.2*0.1

α1(3)= 0.3*0.3

α2(1)= (0.35*0.6+0.02*0.5+0.09*0.4)*0.7

α2(2)=(0.35*0.2+0.02*0.3+0.09*0.1)*0.1

α2(3)=(0.35*0.2+0.02*0.2+0.09*0.5)*0.3

P(up, up|λ) = α2(1)+α2(2)+α2(3)

a12=0.2, a22=0.3, a32=0.1, a13=0.2, a23=0.2, a33=0.5

a33=0.5
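A minimal sketch of the forward procedure, using the Dow Jones parameters listed on this slide to reproduce α2 and P(up, up|λ) (only b_i(up) is needed for this observation sequence; the function layout is mine):

```python
# Forward procedure for the Dow Jones HMM (parameters as listed on this slide).
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
b_up = [0.7, 0.1, 0.3]          # b_i(up) for states 1, 2, 3

def forward(observations):
    """observations: list of per-state emission probabilities b_i(o_t)."""
    alpha = [pi[i] * observations[0][i] for i in range(3)]   # initialization
    for b_t in observations[1:]:                              # induction
        alpha = [sum(alpha[i] * A[i][j] for i in range(3)) * b_t[j]
                 for j in range(3)]
    return alpha

alpha2 = forward([b_up, b_up])
print(alpha2)          # approx [0.1792, 0.0085, 0.0357] -> alpha_2(1..3)
print(sum(alpha2))     # P(up, up | lambda)
```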

Page 25: Hidden Markov Models


Solution to Problem 2

Page 26: Hidden Markov Models

Solution to Problem 2 - The Viterbi Algorithm

The Viterbi algorithm can be regarded as the dynamic programming algorithm applied to the HMM, or as a modified forward algorithm
– Instead of summing probabilities from different paths coming to the same destination state, the Viterbi algorithm picks and remembers the best path
• Find the single optimal state sequence Q* = argmax_Q P(Q,O|λ)
– The Viterbi algorithm can also be illustrated in a trellis framework similar to the one for the forward algorithm

Page 27: Hidden Markov Models

Solution to Problem 2 - The Viterbi Algorithm (cont'd)

[Trellis diagram: states S1, S2, S3 at times 1, 2, 3, …, T-1, T for observations o1, o2, o3, …, oT-1, oT.]

Page 28: Hidden Markov Models

Solution to Problem 2 - The Viterbi Algorithm (cont'd)

1. Initialization: δ1(i) = πi bi(o1), Ψ1(i) = 0, 1≤i≤N

2. Induction:
δt+1(j) = max_{1≤i≤N} [δt(i) aij] bj(ot+1), 1≤t≤T-1, 1≤j≤N
Ψt+1(j) = argmax_{1≤i≤N} [δt(i) aij], 1≤t≤T-1, 1≤j≤N

3. Termination:
P* = max_{1≤i≤N} δT(i)
qT* = argmax_{1≤i≤N} δT(i)

4. Backtracking:
qt* = Ψt+1(qt+1*), t = T-1, T-2, …, 1
Q* = (q1*, q2*, …, qT*) is the best state sequence

Complexity: O(N²T)

cf. the forward procedure: αt+1(j) = [Σ_{i=1}^{N} αt(i) aij] bj(ot+1), P(O|λ) = Σ_{i=1}^{N} αT(i)

Page 29: Hidden Markov Models


b1(up)=0.7

b2(up)= 0.1

b3(up)=0.3

a11=0.6

a21=0.5

a31=0.4

b1(up)=0.7

b2(up)= 0.1

b3(up)=0.3

π1=0.5

π2=0.2

π3=0.3

Solution to Problem 2 - The Viterbi Algorithm (cont’d)

A three-state Hidden Markov Model for the Dow Jones Industrial average

(Huang et al., 2001)

a12=0.2, a22=0.3, a32=0.1, a13=0.2, a23=0.2, a33=0.5

δ1(1) = 0.5*0.7 = 0.35
δ1(2) = 0.2*0.1 = 0.02
δ1(3) = 0.3*0.3 = 0.09

δ2(1) = max(0.35*0.6, 0.02*0.5, 0.09*0.4)*0.7 = 0.35*0.6*0.7 = 0.147, Ψ2(1) = 1
δ2(2) = max(0.35*0.2, 0.02*0.3, 0.09*0.1)*0.1 = 0.35*0.2*0.1 = 0.007, Ψ2(2) = 1
δ2(3) = max(0.35*0.2, 0.02*0.2, 0.09*0.5)*0.3 = 0.35*0.2*0.3 = 0.021, Ψ2(3) = 1

q2* = argmax_{1≤i≤3} δ2(i) = 1
q1* = Ψ2(q2*) = 1

The most likely state sequence that generates "up up": 1 1
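A minimal sketch of the Viterbi recursion on the same Dow Jones model, reproducing the δ2 values and the decoded sequence for "up up" (state indices are printed 1-based to match the slide; the function layout is mine):

```python
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
b_up = [0.7, 0.1, 0.3]          # b_i(up) for states 1, 2, 3

def viterbi(observations):
    """observations: list of per-state emission probabilities b_i(o_t)."""
    delta = [pi[i] * observations[0][i] for i in range(3)]
    psi = []                                    # backpointers for t = 2..T
    for b_t in observations[1:]:
        step = [max((delta[i] * A[i][j], i) for i in range(3)) for j in range(3)]
        psi.append([best_i for _, best_i in step])
        delta = [score * b_t[j] for j, (score, _) in enumerate(step)]
    # termination and backtracking
    q = [max(range(3), key=lambda i: delta[i])]
    for back in reversed(psi):
        q.append(back[q[-1]])
    return max(delta), [i + 1 for i in reversed(q)]

best_prob, best_path = viterbi([b_up, b_up])
print(best_prob, best_path)     # 0.147 [1, 1]
```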

Page 30: Hidden Markov Models


Some Examples

Page 31: Hidden Markov Models

Isolated Digit Recognition

[Trellis diagrams for two word models λ0 and λ1, each a 3-state HMM over states S1, S2, S3 and the observations o1, o2, o3, …, oT-1, oT; each word model is evaluated separately, giving P(O|λ0) = αT(3) and P(O|λ1) = αT(3).]

Page 32: Hidden Markov Models

Continuous Digit Recognition

[Trellis diagram for two concatenated word models, one over states S1-S3 and one over states S4-S6, evaluated jointly over the observations o1, o2, o3, …, oT-1, oT; the final forward scores are αT(3) and αT(6).]

Page 33: Hidden Markov Models

Continuous Digit Recognition (cont'd)

[Trellis diagram over time frames 1-9 for the two concatenated 3-state word models (states S1-S3 and S4-S6), with the decoded path marked.]

Best state sequence: S1 S1 S2 S3 S3 S4 S5 S5 S6

Page 34: Hidden Markov Models

CpG Islands

Two Questions
Q1: Given a short sequence, does it come from a CpG island?
Q2: Given a long sequence, how would we find the CpG islands in it?

Page 35: Hidden Markov Models

CpG Islands

Answer to Q1:
– Given a sequence x, a probabilistic model M1 of CpG islands, and a probabilistic model M2 of non-CpG island regions
– Compute p1=P(x|M1) and p2=P(x|M2)
– If p1 > p2, then x comes from a CpG island (CpG+)
– If p2 > p1, then x does not come from a CpG island (CpG-)

Each model is a Markov chain over the four nucleotides, with states S1:A, S2:C, S3:T, S4:G.

Transition probabilities (row = current nucleotide, column = next nucleotide):

CpG+    A      C      G      T
A       0.180  0.274  0.426  0.120
C       0.171  0.368  0.274  0.188
G       0.161  0.339  0.375  0.125
T       0.079  0.355  0.384  0.182

CpG-    A      C      G      T
A       0.300  0.205  0.285  0.210
C       0.322  0.298  0.078  0.302
G       0.248  0.246  0.298  0.208
T       0.177  0.239  0.292  0.292

Note the large C→G transition probability in the CpG+ model (0.274) vs. the small C→G transition probability in the CpG- model (0.078).
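A minimal sketch of the Q1 decision using these two transition tables (the slide does not give initial-state probabilities, so this sketch assumes a uniform start; the query sequence x is made up for illustration):

```python
import math

NUC = "ACGT"
CPG_PLUS = [[0.180, 0.274, 0.426, 0.120],
            [0.171, 0.368, 0.274, 0.188],
            [0.161, 0.339, 0.375, 0.125],
            [0.079, 0.355, 0.384, 0.182]]
CPG_MINUS = [[0.300, 0.205, 0.285, 0.210],
             [0.322, 0.298, 0.078, 0.302],
             [0.248, 0.246, 0.298, 0.208],
             [0.177, 0.239, 0.292, 0.292]]

def log_likelihood(x, table):
    # Uniform start assumed (not specified on the slide); then a chain of transitions.
    ll = math.log(0.25)
    for prev, cur in zip(x, x[1:]):
        ll += math.log(table[NUC.index(prev)][NUC.index(cur)])
    return ll

x = "CGCGCGTATA"          # hypothetical query sequence
p1, p2 = log_likelihood(x, CPG_PLUS), log_likelihood(x, CPG_MINUS)
print("CpG+" if p1 > p2 else "CpG-", p1 - p2)   # positive log-odds -> CpG island
```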

Page 36: Hidden Markov Models

CpG Islands

Answer to Q2: model the long sequence with a two-state HMM, one state for CpG+ regions and one for CpG- regions, and decode the hidden state sequence

S1: A: 0.3, C: 0.2, G: 0.2, T: 0.3
S2: A: 0.2, C: 0.3, G: 0.3, T: 0.2

p11 = 0.99999, p12 = 0.00001, p21 = 0.0001, p22 = 0.9999

Observable: … A C T C G A G T A …
Hidden:     … S1 S1 S1 S1 S2 S2 S2 S2 S1 …

Page 37: Hidden Markov Models

A Toy Example: 5' Splice Site Recognition

A 5' splice site indicates the "switch" from an exon to an intron

Assumptions:
– Uniform base composition on average in exons (25% each base)
– Introns are A/T rich (40% A/T, and 10% C/G)
– The 5'SS consensus nucleotide is almost always a G (say, 95% G and 5% A)

From "What is a hidden Markov Model?", by Sean R. Eddy

Page 38: Hidden Markov Models


A Toy Example: 5’ Splice Site Recognition

Page 39: Hidden Markov Models


Solution to Problem 3

Page 40: Hidden Markov Models

Solution to Problem 3 – Maximum Likelihood Estimation of Model Parameters

How to adjust (re-estimate) the model parameters λ=(A,B,π) to maximize P(O|λ)?
– This is the most difficult of the three problems, because there is no known analytical method that maximizes the joint probability of the training data in closed form
• The data is incomplete because of the hidden state sequence
– The problem can be solved by the iterative Baum-Welch algorithm, also known as the forward-backward algorithm
• The EM (Expectation-Maximization) algorithm is perfectly suitable for this problem
– Alternatively, it can be solved by the iterative segmental K-means algorithm
• The model parameters are adjusted to maximize P(O, Q*|λ), where Q* is the state sequence given by the Viterbi algorithm
• This provides a good initialization for Baum-Welch training

Page 41: Hidden Markov Models

Solution to Problem 3 – The Segmental K-means Algorithm

Assume that we have a training set of observations and an initial estimate of the model parameters

– Step 1: Segment the training data
The set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm

– Step 2: Re-estimate the model parameters

π̂i = (number of times q1 = i) / (number of training sequences)

âij = (number of transitions from state i to state j) / (number of transitions from state i)

b̂j(k) = (number of observations of symbol "k" in state j) / (number of observations in state j)

– Step 3: Evaluate the model
If the difference between the new and current model scores exceeds a threshold, go back to Step 1; otherwise, return
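A minimal counting sketch of Step 2, assuming Step 1 has already produced a Viterbi state label for every observation (the data layout and function name are mine):

```python
from collections import Counter

def reestimate(segmented, N, symbols):
    """segmented: list of training sequences, each a list of (state, symbol) pairs
    produced by the Viterbi segmentation; states are 0..N-1.
    Assumes every state emits at least once and has at least one outgoing transition."""
    pi_count = Counter(seq[0][0] for seq in segmented)
    trans, emit = Counter(), Counter()
    for seq in segmented:
        for (s, _), (s_next, _) in zip(seq, seq[1:]):
            trans[(s, s_next)] += 1                      # transition counts
        for s, o in seq:
            emit[(s, o)] += 1                            # emission counts
    pi = [pi_count[i] / len(segmented) for i in range(N)]
    A = [[trans[(i, j)] / sum(trans[(i, k)] for k in range(N)) for j in range(N)]
         for i in range(N)]
    B = [{v: emit[(i, v)] / sum(emit[(i, u)] for u in symbols) for v in symbols}
         for i in range(N)]
    return pi, A, B
```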

Page 42: Hidden Markov Models

Solution to Problem 3 – The Segmental K-means Algorithm (cont'd)

Example: 3 states and 2 codewords (A, B)

[Figure: a training observation sequence O1, O2, …, O10 of codewords A and B, shown on a 3-state trellis over time steps 1-10 together with its Viterbi segmentation into states s1, s2, s3.]

Re-estimated parameters:
π1=1, π2=π3=0
a11=3/4, a12=1/4
a22=2/3, a23=1/3
a33=1
b1(A)=3/4, b1(B)=1/4
b2(A)=1/3, b2(B)=2/3
b3(A)=2/3, b3(B)=1/3

What if the training data is labeled?

Page 43: Hidden Markov Models

Solution to Problem 3 – The Backward Procedure

Backward variable: βt(i) = P(ot+1,ot+2,…,oT|qt=i,λ)
– The probability of the partial observation sequence ot+1,ot+2,…,oT, given state i at time t and the model λ

e.g., β2(3) = P(o3,o4,…,oT|q2=3,λ)
            = a31·b1(o3)·β3(1) + a32·b2(o3)·β3(2) + a33·b3(o3)·β3(3)

[Trellis diagram: β2(3) is computed from β3(1), β3(2), β3(3) via the arcs a31, a32, a33 and the emissions b1(o3), b2(o3), b3(o3).]

Page 44: Hidden Markov Models

Solution to Problem 3 – The Backward Procedure (cont'd)

Algorithm

1. Initialization: βT(i) = 1, 1≤i≤N

2. Induction: βt(i) = Σ_{j=1}^{N} aij bj(ot+1) βt+1(j), t = T-1, T-2, …, 1, 1≤i≤N

– Complexity: MUL: 2N²(T-1); ADD: N(N-1)(T-1)

Termination in terms of the backward variable:
P(O|λ) = Σ_{i=1}^{N} P(o1,o2,…,oT|q1=i,λ) P(q1=i|λ) = Σ_{i=1}^{N} πi bi(o1) β1(i)

cf. P(O|λ) = Σ_{i=1}^{N} αT(i)
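A minimal sketch of the backward recursion, reusing the Dow Jones parameters so the identity P(O|λ) = Σ_i πi bi(o1) β1(i) can be checked against the earlier forward result:

```python
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
b_up = [0.7, 0.1, 0.3]          # b_i(up), as in the Dow Jones example

def backward(observations):
    """observations: list of per-state emission probabilities b_i(o_t)."""
    beta = [1.0, 1.0, 1.0]                               # initialization at t = T
    for b_next in reversed(observations[1:]):            # induction, t = T-1 .. 1
        beta = [sum(A[i][j] * b_next[j] * beta[j] for j in range(3))
                for i in range(3)]
    return beta

beta1 = backward([b_up, b_up])
p = sum(pi[i] * b_up[i] * beta1[i] for i in range(3))
print(p)   # equals P(up, up | lambda) from the forward procedure
```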

Page 45: Hidden Markov Models

Solution to Problem 3 – The Forward-Backward Algorithm

Relation between the forward and backward variables

αt(i) = P(o1,o2,…,ot, qt=i|λ) = [Σ_{j=1}^{N} αt-1(j) aji] bi(ot)

βt(i) = P(ot+1,ot+2,…,oT|qt=i,λ) = Σ_{j=1}^{N} aij bj(ot+1) βt+1(j)

αt(i) βt(i) = P(O, qt=i|λ)

P(O|λ) = Σ_{i=1}^{N} αt(i) βt(i), for any 1≤t≤T

(Huang et al., 2001)

Page 46: Hidden Markov Models

Solution to Problem 3 – The Forward-Backward Algorithm (cont'd)

αt(i) βt(i) = P(o1,o2,…,ot, qt=i|λ) P(ot+1,ot+2,…,oT|qt=i,λ)
            = P(o1,o2,…,ot|qt=i,λ) P(qt=i|λ) P(ot+1,ot+2,…,oT|qt=i,λ)
            = P(o1,o2,…,oT|qt=i,λ) P(qt=i|λ)
            = P(o1,o2,…,oT, qt=i|λ)
            = P(O, qt=i|λ)

P(O|λ) = Σ_{i=1}^{N} P(O, qt=i|λ) = Σ_{i=1}^{N} αt(i) βt(i)

Page 47: Hidden Markov Models

Solution to Problem 3 – The Intuitive View

Define two new variables:

γt(i) = P(qt=i|O,λ) – probability of being in state i at time t, given O and λ

γt(i) = P(qt=i, O|λ) / P(O|λ) = αt(i) βt(i) / Σ_{i=1}^{N} αt(i) βt(i)

ξt(i,j) = P(qt=i, qt+1=j|O,λ) – probability of being in state i at time t and state j at time t+1, given O and λ

ξt(i,j) = αt(i) aij bj(ot+1) βt+1(j) / P(O|λ) = αt(i) aij bj(ot+1) βt+1(j) / Σ_{m=1}^{N} Σ_{n=1}^{N} αt(m) amn bn(ot+1) βt+1(n)

Note that γt(i) = Σ_{j=1}^{N} ξt(i,j)

Page 48: Hidden Markov Models

Solution to Problem 3 – The Intuitive View (cont'd)

P(q3=1, O|λ) = α3(1)·β3(1)

[Trellis diagram: the forward score α3(1) covers o1, o2, o3 up to state S1 at time 3, and the backward score β3(1) covers o4, …, oT from state S1 at time 3.]

Page 49: Hidden Markov Models

Solution to Problem 3 – The Intuitive View (cont'd)

P(q3=1, q4=3, O|λ) = α3(1)·a13·b3(o4)·β4(3)

[Trellis diagram: α3(1) covers the path up to state S1 at time 3, the arc a13 and emission b3(o4) take it to state S3 at time 4, and β4(3) covers the remaining observations.]

Page 50: Hidden Markov Models

Solution to Problem 3 – The Intuitive View (cont'd)

γt(i) = P(qt=i|O,λ)
ξt(i,j) = P(qt=i, qt+1=j|O,λ)

Σ_{t=1}^{T-1} γt(i) = expected number of transitions from state i in O

Σ_{t=1}^{T-1} ξt(i,j) = expected number of transitions from state i to state j in O

Page 51: Hidden Markov Models

Solution to Problem 3 – The Intuitive View (cont'd)

Re-estimation formulae for π, A, and B:

π̄i = expected frequency (number of times) in state i at time t=1 = γ1(i)

āij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
    = Σ_{t=1}^{T-1} ξt(i,j) / Σ_{t=1}^{T-1} γt(i)

b̄j(vk) = (expected number of times in state j observing symbol vk) / (expected number of times in state j)
       = Σ_{t=1, s.t. ot=vk}^{T} γt(j) / Σ_{t=1}^{T} γt(j)
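A minimal sketch of one Baum-Welch re-estimation pass built from these formulae, for a single observation sequence and discrete symbols (the function layout is mine; numerical scaling for long sequences is omitted):

```python
def baum_welch_step(pi, A, B, O):
    """One re-estimation pass for a discrete HMM.
    pi: list[N], A: list[N][N], B: list[N][M], O: list of symbol indices.
    Assumes every state keeps nonzero posterior mass."""
    N, T = len(pi), len(O)
    # forward and backward variables
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    alpha[0] = [pi[i] * B[i][O[0]] for i in range(N)]
    for t in range(1, T):
        alpha[t] = [sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                    for j in range(N)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
                   for i in range(N)]
    likelihood = sum(alpha[T-1])
    # state and transition posteriors (gamma and xi from the preceding slides)
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / likelihood
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # re-estimation formulae from this slide
    new_pi = gamma[0]
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) / sum(gamma[t][i] for t in range(T-1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) / sum(gamma[t][j] for t in range(T))
              for k in range(len(B[0]))] for j in range(N)]
    return new_pi, new_A, new_B, likelihood
```

Iterating this step never decreases P(O|λ), converging to a local maximum of the likelihood.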