Viterbi Training


Page 1: Viterbi Training



• It is like Baum-Welch.
• Instead of computing the expected transition and emission counts A and E with the forward-backward algorithm, the most probable paths for the training sequences are derived with the Viterbi algorithm, and the counts are taken along those paths.
• Guaranteed to converge.
• Maximizes the contribution to the likelihood from the most probable paths, rather than the full likelihood (a minimal sketch follows below).
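A minimal sketch of the Viterbi-training loop in Python (not from the lecture; the helpers viterbi_path and counts_to_probs are hypothetical stand-ins for the Viterbi decoder and for normalising counts into probabilities):

def viterbi_training(seqs, trans, emit, start, n_iter=10):
    for _ in range(n_iter):
        A = {k: {l: 0 for l in trans[k]} for k in trans}   # transition counts
        E = {k: {b: 0 for b in emit[k]} for k in emit}     # emission counts
        for x in seqs:
            path = viterbi_path(x, trans, emit, start)     # most probable state path
            for i in range(1, len(x)):
                A[path[i - 1]][path[i]] += 1
            for i, b in enumerate(x):
                E[path[i]][b] += 1
        # Pseudocounts are usually added here to avoid zero probabilities.
        trans = counts_to_probs(A)
        emit = counts_to_probs(E)
    return trans, emit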


Page 2: Viterbi Training


Baum-Welch Example:

Generating model vs. model estimated from 300 rolls of dice


Page 3: Viterbi Training


Estimated model from 30000 rolls of dice

Page 4: Viterbi Training


Modeling with labeled sequences

Page 5: Viterbi Training


CML (conditional maximum likelihood)

Page 6: Viterbi Training


3.4 HMM model structure

• Fully connected model? Never works in practice, because of local maxima.

• In practice, successful models are constructed based on knowledge about the problem.

• If we set a_kl = 0, it will remain 0 under Baum-Welch estimation (see the note after this list).

• How do we choose a model structure using our knowledge?
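A brief justification of the a_kl = 0 point, using the standard Baum-Welch re-estimation formula (notation assumed, not shown on the slides): the expected number of k-to-l transitions is

A_kl = sum over training sequences j of (1 / P(x^j)) * sum over positions i of f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1),

which is proportional to a_kl. So if a_kl = 0, then A_kl = 0, and the re-estimated value a_kl = A_kl / sum_l' A_kl' stays 0 in every iteration.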

Page 7: Viterbi Training


Duration modeling

Page 8: Viterbi Training


Page 9: Viterbi Training


Silent States

• For 200 states fully connected in the forward direction, 200 × 199 / 2 = 19,900 transitions are required.

• With silent states, 200 non-silent states require only around 600 transitions (a quick arithmetic check follows below).
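A quick check of the two counts quoted above (a sketch; the factor of three transitions per position with silent states is an assumption based on the usual chain construction, real-to-real, real-to-silent, and silent-to-silent/real):

n = 200
fully_connected = n * (n - 1) // 2   # every state connected forward to every later state
with_silent = 3 * n                  # roughly three transitions per position (assumption)
print(fully_connected, with_silent)  # 19900 600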

Page 10: Viterbi Training


For an HMM without loops consisting entirely of silent states, all the HMM algorithms of Sections 3.2 and 3.3 can be extended.

For the forward algorithm (a sketch of the extended recursion follows below):

For an HMM with loops consisting entirely of silent states, we can instead eliminate the silent states by calculating the effective transition probabilities between the real states of the model.
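A minimal sketch of the forward algorithm extended to silent states, in Python (hypothetical names, not from the slides; it assumes silent states are numbered so that silent-to-silent transitions only go from lower to higher index, i.e. no all-silent loops, and termination is simplified):

def forward_with_silent(x, trans, emit, silent, start):
    # x      : observation sequence (symbol indices)
    # trans  : trans[k][l] = a_kl
    # emit   : emit[l][b] = e_l(b) for real (emitting) states
    # silent : silent[l] is True if state l emits nothing
    # start  : initial distribution over states
    n = len(trans)
    f_prev = list(start)
    for b in x:
        f = [0.0] * n
        for l in range(n):              # real states: previous column plus emission
            if not silent[l]:
                f[l] = emit[l][b] * sum(f_prev[k] * trans[k][l] for k in range(n))
        for l in range(n):              # silent states: same column, in topological order
            if silent[l]:
                f[l] = sum(f[k] * trans[k][l] for k in range(n))
        f_prev = f
    return sum(f_prev)                  # total probability (termination simplified)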

Page 11: Viterbi Training


3.5 Higher Order Markov Chains

2nd-order Markov Chain
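A minimal sketch in Python of estimating a 2nd-order chain by counting (hypothetical names, not from the slides). The underlying equivalence is the standard one: an n-th order Markov chain over an alphabet is a 1st-order chain over n-tuples of symbols.

from collections import defaultdict

def second_order_probs(seq):
    # Estimate P(x_i | x_{i-2} x_{i-1}) from a training sequence by treating
    # the preceding pair of symbols as the state of a 1st-order chain.
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(2, len(seq)):
        counts[seq[i-2:i]][seq[i]] += 1
    return {ctx: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for ctx, nxt in counts.items()}

print(second_order_probs("ACGACGACGTACG"))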


Page 12: Viterbi Training


Page 13: Viterbi Training

NORF: Non-coding Open Reading Frame


Page 14: Viterbi Training


Page 15: Viterbi Training


Inhomogeneous Markov Chain

• Use three different Markov chains, one per codon position, to model coding regions

Pr(x) = product over i of a^c(i)(x_i, x_{i+1}), where c(i) in {1, 2, 3} is the codon position of x_{i+1} and a^1, a^2, a^3 are the three chains, applied cyclically (a sketch follows after this list)

• n-th order emission probabilities
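A minimal sketch in Python of the cyclic product above (hypothetical names, not from the slides; the phase convention for which chain handles which position is an arbitrary choice here):

import math

def coding_log_prob(x, chains):
    # chains[0], chains[1], chains[2]: 1st-order transition tables, one per
    # codon position, with chains[c][a][b] = Pr(b | a) under chain c.
    logp = 0.0
    for i in range(1, len(x)):
        c = i % 3                      # codon position of x[i] (assumed phase)
        logp += math.log(chains[c][x[i-1]][x[i]])
    return logp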


Page 16: Viterbi Training


3.6 Numerical stability of HMM algorithms

• To avoid underflow errors, there are two ways to deal with the problem:
– The log transformation
– Scaling of probabilities

• The log transformation (a sketch follows below)
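A minimal sketch in Python of the log transformation applied to the Viterbi recursion (hypothetical names, not from the slides): products of probabilities become sums of log probabilities, and taking the maximum is unaffected by the monotone log, so underflow is avoided.

import math

NEG_INF = float("-inf")

def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF

def viterbi_log(x, trans, emit, start):
    n = len(trans)
    v = [safe_log(start[k]) + safe_log(emit[k][x[0]]) for k in range(n)]
    for b in x[1:]:
        v = [safe_log(emit[l][b]) + max(v[k] + safe_log(trans[k][l]) for k in range(n))
             for l in range(n)]
    return max(v)                      # log probability of the best path (traceback omitted)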

Page 17: Viterbi Training


Page 18: Viterbi Training


• Scaling of probabilities
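A minimal sketch in Python of probability scaling for the forward algorithm (hypothetical names, not from the slides): each column of forward values is divided by its sum so it stays in a safe numerical range, and log P(x) is recovered from the accumulated scaling factors.

import math

def forward_scaled(x, trans, emit, start):
    n = len(trans)
    f = [start[k] * emit[k][x[0]] for k in range(n)]
    log_px = 0.0
    for i in range(len(x)):
        if i > 0:
            f = [emit[l][x[i]] * sum(f[k] * trans[k][l] for k in range(n))
                 for l in range(n)]
        s = sum(f)                     # scaling factor for column i
        f = [v / s for v in f]         # keep each column normalised to sum 1
        log_px += math.log(s)
    return log_px                      # log P(x), recovered from the scaling factors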