Viterbi Training


Page 1: Viterbi Training



• It is like Baum-Welch.
• Instead of computing the expected transition and emission counts A and E with the forward-backward algorithm, the most probable paths for the training sequences are derived with the Viterbi algorithm, and the counts are taken along those paths.
• Guaranteed to converge.
• Maximizes the contribution to the likelihood from the most probable paths, rather than the full likelihood (a minimal sketch follows below).
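A minimal sketch of the Viterbi-training loop in Python (not from the lecture; the helpers viterbi_path and counts_to_probs are hypothetical stand-ins for the Viterbi decoder and for normalising counts into probabilities):

def viterbi_training(seqs, trans, emit, start, n_iter=10):
    for _ in range(n_iter):
        A = {k: {l: 0 for l in trans[k]} for k in trans}   # transition counts
        E = {k: {b: 0 for b in emit[k]} for k in emit}     # emission counts
        for x in seqs:
            path = viterbi_path(x, trans, emit, start)     # most probable state path
            for i in range(1, len(x)):
                A[path[i - 1]][path[i]] += 1
            for i, b in enumerate(x):
                E[path[i]][b] += 1
        # Pseudocounts are usually added here to avoid zero probabilities.
        trans = counts_to_probs(A)
        emit = counts_to_probs(E)
    return trans, emit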


Page 2: Viterbi Training


Baum-Welch Example:

Generating model vs. model estimated from 300 rolls of dice


Page 3: Viterbi Training


Estimated model from 30000 rolls of dice

Page 4: Viterbi Training


Modeling with labeled sequences

Page 5: Viterbi Training


CML (conditional maximum likelihood)

Page 6: Viterbi Training


3.4 HMM model structure

• Fully connected model? Never works in practice, because of local maxima.

• In practice, successful models are constructed based on knowledge about the problem.

• If we set a_kl = 0, it will remain 0 under Baum-Welch estimation (see the note after this list).

• How do we choose a model structure using our knowledge?
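A brief justification of the a_kl = 0 point, using the standard Baum-Welch re-estimation formula (notation assumed, not shown on the slides): the expected number of k-to-l transitions is

A_kl = sum over training sequences j of (1 / P(x^j)) * sum over positions i of f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1),

which is proportional to a_kl. So if a_kl = 0, then A_kl = 0, and the re-estimated value a_kl = A_kl / sum_l' A_kl' stays 0 in every iteration.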

Page 7: Viterbi Training


Duration modeling

Page 8: Viterbi Training


Page 9: Viterbi Training


Silent States

• For 200 states fully connected in the forward direction, 200 × 199 / 2 = 19,900 transitions are required.

• With silent states, 200 non-silent states require only around 600 transitions (a quick arithmetic check follows below).
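A quick check of the two counts quoted above (a sketch; the factor of three transitions per position with silent states is an assumption based on the usual chain construction, real-to-real, real-to-silent, and silent-to-silent/real):

n = 200
fully_connected = n * (n - 1) // 2   # every state connected forward to every later state
with_silent = 3 * n                  # roughly three transitions per position (assumption)
print(fully_connected, with_silent)  # 19900 600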

Page 10: Viterbi Training


For an HMM without loops consisting entirely of silent states, all the HMM algorithms of Sections 3.2 and 3.3 can be extended.

For the forward algorithm (a sketch of the extended recursion follows below):

For an HMM with loops consisting entirely of silent states, we can instead eliminate the silent states by calculating the effective transition probabilities between the real states of the model.
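A minimal sketch of the forward algorithm extended to silent states, in Python (hypothetical names, not from the slides; it assumes silent states are numbered so that silent-to-silent transitions only go from lower to higher index, i.e. no all-silent loops, and termination is simplified):

def forward_with_silent(x, trans, emit, silent, start):
    # x      : observation sequence (symbol indices)
    # trans  : trans[k][l] = a_kl
    # emit   : emit[l][b] = e_l(b) for real (emitting) states
    # silent : silent[l] is True if state l emits nothing
    # start  : initial distribution over states
    n = len(trans)
    f_prev = list(start)
    for b in x:
        f = [0.0] * n
        for l in range(n):              # real states: previous column plus emission
            if not silent[l]:
                f[l] = emit[l][b] * sum(f_prev[k] * trans[k][l] for k in range(n))
        for l in range(n):              # silent states: same column, in topological order
            if silent[l]:
                f[l] = sum(f[k] * trans[k][l] for k in range(n))
        f_prev = f
    return sum(f_prev)                  # total probability (termination simplified)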

Page 11: Viterbi Training


3.5 Higher Order Markov Chains

2nd-order Markov Chain
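A minimal sketch in Python of estimating a 2nd-order chain by counting (hypothetical names, not from the slides). The underlying equivalence is the standard one: an n-th order Markov chain over an alphabet is a 1st-order chain over n-tuples of symbols.

from collections import defaultdict

def second_order_probs(seq):
    # Estimate P(x_i | x_{i-2} x_{i-1}) from a training sequence by treating
    # the preceding pair of symbols as the state of a 1st-order chain.
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(2, len(seq)):
        counts[seq[i-2:i]][seq[i]] += 1
    return {ctx: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for ctx, nxt in counts.items()}

print(second_order_probs("ACGACGACGTACG"))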


Page 12: Viterbi Training


Page 13: Viterbi Training

NORF: Non-coding Open Reading Frame


Page 14: Viterbi Training


Page 15: Viterbi Training


Inhomogeneous Markov Chain

• Use three different Markov chains, one per codon position, to model coding regions

Pr(x) = product over i of a^c(i)(x_i, x_{i+1}), where c(i) in {1, 2, 3} is the codon position of x_{i+1} and a^1, a^2, a^3 are the three chains, applied cyclically (a sketch follows after this list)

• n-th order emission probabilities
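A minimal sketch in Python of the cyclic product above (hypothetical names, not from the slides; the phase convention for which chain handles which position is an arbitrary choice here):

import math

def coding_log_prob(x, chains):
    # chains[0], chains[1], chains[2]: 1st-order transition tables, one per
    # codon position, with chains[c][a][b] = Pr(b | a) under chain c.
    logp = 0.0
    for i in range(1, len(x)):
        c = i % 3                      # codon position of x[i] (assumed phase)
        logp += math.log(chains[c][x[i-1]][x[i]])
    return logp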


Page 16: Viterbi Training


3.6 Numerical stability of HMM algorithms

• To avoid underflow errors, there are two ways to deal with the problem:
– The log transformation
– Scaling of probabilities

• The log transformation (a sketch follows below)
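A minimal sketch in Python of the log transformation applied to the Viterbi recursion (hypothetical names, not from the slides): products of probabilities become sums of log probabilities, and taking the maximum is unaffected by the monotone log, so underflow is avoided.

import math

NEG_INF = float("-inf")

def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF

def viterbi_log(x, trans, emit, start):
    n = len(trans)
    v = [safe_log(start[k]) + safe_log(emit[k][x[0]]) for k in range(n)]
    for b in x[1:]:
        v = [safe_log(emit[l][b]) + max(v[k] + safe_log(trans[k][l]) for k in range(n))
             for l in range(n)]
    return max(v)                      # log probability of the best path (traceback omitted)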

Page 17: Viterbi Training


Page 18: Viterbi Training


• Scaling of probabilities
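A minimal sketch in Python of probability scaling for the forward algorithm (hypothetical names, not from the slides): each column of forward values is divided by its sum so it stays in a safe numerical range, and log P(x) is recovered from the accumulated scaling factors.

import math

def forward_scaled(x, trans, emit, start):
    n = len(trans)
    f = [start[k] * emit[k][x[0]] for k in range(n)]
    log_px = 0.0
    for i in range(len(x)):
        if i > 0:
            f = [emit[l][x[i]] * sum(f[k] * trans[k][l] for k in range(n))
                 for l in range(n)]
        s = sum(f)                     # scaling factor for column i
        f = [v / s for v in f]         # keep each column normalised to sum 1
        log_px += math.log(s)
    return log_px                      # log P(x), recovered from the scaling factors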