Fast Temporal State-Splitting for HMM Model Selection and Learning
Transcript of Fast Temporal State-Splitting for HMM Model Selection and Learning
![Page 1: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/1.jpg)
Fast Temporal State-Splitting for HMM Model Selection and Learning
Sajid Siddiqi
Geoffrey Gordon
Andrew Moore
![Page 2: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/2.jpg)
[Figure: a sequence of observations x plotted against time t]
![Page 3: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/3.jpg)
How many kinds of observations (x)?
![Page 4: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/4.jpg)
How many kinds of observations (x)? 3
![Page 5: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/5.jpg)
How many kinds of observations (x)? 3
How many kinds of transitions (x_t+1 | x_t)?
![Page 6: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/6.jpg)
How many kinds of observations (x)? 3
How many kinds of transitions (x_t+1 | x_t)? 4
![Page 7: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/7.jpg)
How many kinds of observations (x)? 3
How many kinds of transitions (x_t → x_t+1)? 4
We say that this sequence ‘exhibits four states under the first-order Markov assumption’
Our goal is to discover the number of such states (and their parameter settings) in sequential data, and to do so efficiently
![Page 8: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/8.jpg)
Definitions
An HMM is a 3-tuple λ = {A, B, π}, where
A : N×N transition matrix
B : N×M observation probability matrix
π : N×1 prior probability vector
|λ| : number of states in HMM λ, i.e. N
T : number of observations in the sequence
qt : the state the HMM is in at time t
![Page 9: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/9.jpg)
HMMs as DBNs
[Figure: DBN unrolled over five timesteps: hidden states q0–q4, each emitting an observation O0–O4]
![Page 10: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/10.jpg)
Each of these probability tables is identical
| i | P(qt+1=s1\|qt=si) | P(qt+1=s2\|qt=si) | … | P(qt+1=sj\|qt=si) | … | P(qt+1=sN\|qt=si) |
|---|---|---|---|---|---|---|
| 1 | a11 | a12 | … | a1j | … | a1N |
| 2 | a21 | a22 | … | a2j | … | a2N |
| 3 | a31 | a32 | … | a3j | … | a3N |
| ⋮ | ⋮ | ⋮ | | ⋮ | | ⋮ |
| i | ai1 | ai2 | … | aij | … | aiN |
| N | aN1 | aN2 | … | aNj | … | aNN |
Transition Model
Notation: a_ij = P(q_t+1 = s_j | q_t = s_i)
![Page 11: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/11.jpg)
Observation Model
| i | P(Ot=1\|qt=si) | P(Ot=2\|qt=si) | … | P(Ot=k\|qt=si) | … | P(Ot=M\|qt=si) |
|---|---|---|---|---|---|---|
| 1 | b1(1) | b1(2) | … | b1(k) | … | b1(M) |
| 2 | b2(1) | b2(2) | … | b2(k) | … | b2(M) |
| 3 | b3(1) | b3(2) | … | b3(k) | … | b3(M) |
| ⋮ | ⋮ | ⋮ | | ⋮ | | ⋮ |
| i | bi(1) | bi(2) | … | bi(k) | … | bi(M) |
| N | bN(1) | bN(2) | … | bN(k) | … | bN(M) |
Notation: b_i(k) = P(O_t = k | q_t = s_i)
![Page 12: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/12.jpg)
HMMs as DBNs
![Page 13: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/13.jpg)
HMMs as FSAs

[Figure: the same HMM drawn as a finite-state automaton with states S1–S4]
![Page 14: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/14.jpg)
Operations on HMMs

Problem 1: Evaluation — Given an HMM and an observation sequence, what is the likelihood of this sequence?
Problem 2: Most Probable Path — Given an HMM and an observation sequence, what is the most probable path through state space?
Problem 3: Learning HMM parameters — Given an observation sequence and a fixed number of states, what is an HMM that is likely to have produced this string of observations?
Problem 4: Learning the number of states — Given an observation sequence, what is an HMM (of any size) that is likely to have produced this string of observations?
![Page 15: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/15.jpg)
Operations on HMMs

| Problem | Algorithm | Complexity |
|---|---|---|
| Evaluation: calculating P(O\|λ) | Forward-Backward | O(TN²) |
| Path inference: computing Q* = argmax_Q P(O, Q\|λ) | Viterbi | O(TN²) |
| Parameter learning: λ* = argmax_λ,Q P(O, Q\|λ) | Viterbi Training | O(TN²) per iteration |
| Parameter learning: λ* = argmax_λ P(O\|λ) | Baum-Welch (EM) | O(TN²) per iteration |
| Learning the number of states | ?? | ?? |
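As a concrete illustration of the evaluation problem, here is a minimal forward-algorithm sketch for a discrete-observation HMM; the list-based representation and function name are illustrative, not from the paper. Per-step rescaling keeps the recursion numerically stable, and each timestep does an N×N sum, giving the O(TN²) cost in the table.

```python
import math

def forward_log_likelihood(A, B, pi, obs):
    # A[i][j] = P(q_{t+1}=s_j | q_t=s_i); B[i][k] = P(O_t=k | q_t=s_i); pi[i] = P(q_0=s_i)
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    log_like = 0.0
    for t in range(1, len(obs)):
        scale = sum(alpha)              # rescale each step to avoid underflow,
        log_like += math.log(scale)     # accumulating the log of the scale factors
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                 for j in range(N)]
    return log_like + math.log(sum(alpha))
```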
![Page 16: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/16.jpg)
Path Inference
• Viterbi Algorithm for calculating argmax_Q P(O, Q | λ)
![Page 17: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/17.jpg)
[Figure: Viterbi trellis of δt(i) values, t = 1…9, states 1…N]
![Page 18: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/18.jpg)
![Page 19: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/19.jpg)
Path Inference
• Viterbi Algorithm for calculating argmax_Q P(O, Q | λ)

Running time: O(TN²)
Yields a globally optimal path through hidden state space, associating each timestep with exactly one HMM state.
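A minimal Viterbi sketch in Python (illustrative representation, not the paper's code); it works in log-space to avoid underflow and backtracks through stored pointers to recover the globally optimal path.

```python
import math

def viterbi(A, B, pi, obs):
    # Most probable path argmax_Q P(O, Q | lambda), computed in log-space.
    logp = lambda p: math.log(p) if p > 0 else float("-inf")
    N = len(pi)
    delta = [logp(pi[i]) + logp(B[i][obs[0]]) for i in range(N)]
    back = []
    for o in obs[1:]:
        new, ptr = [], []
        for j in range(N):
            i = max(range(N), key=lambda i: delta[i] + logp(A[i][j]))
            new.append(delta[i] + logp(A[i][j]) + logp(B[j][o]))
            ptr.append(i)
        delta = new
        back.append(ptr)
    path = [max(range(N), key=lambda i: delta[i])]
    for ptr in reversed(back):      # backtrack from the best final state
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```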
![Page 20: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/20.jpg)
Parameter Learning I
• Viterbi Training (≈ K-means for sequences)
![Page 21: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/21.jpg)
Parameter Learning I
• Viterbi Training (≈ K-means for sequences)

Q*_s+1 = argmax_Q P(O, Q | λ_s)   (Viterbi algorithm)

λ_s+1 = argmax_λ P(O, Q*_s+1 | λ)

Running time: O(TN²) per iteration

Models the posterior belief as a δ-function per timestep in the sequence. Performs well on data with easily distinguishable states.
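The re-estimation half of a Viterbi Training iteration just counts along the hard path; a sketch (the smoothing constant eps is an assumption, added so rows for states the path never visits stay normalizable):

```python
def reestimate_from_path(path, obs, N, M, eps=1e-6):
    # Count transitions and emissions along the hard path Q*, then normalize.
    A = [[eps] * N for _ in range(N)]
    B = [[eps] * M for _ in range(N)]
    pi = [eps] * N
    pi[path[0]] += 1.0
    for t, s in enumerate(path):
        B[s][obs[t]] += 1.0                 # emission count for state s
        if t + 1 < len(path):
            A[s][path[t + 1]] += 1.0        # transition count s -> next state
    norm = lambda row: [v / sum(row) for v in row]
    return [norm(r) for r in A], [norm(r) for r in B], norm(pi)
```

Alternating this M-step with the Viterbi E-step gives the O(TN²)-per-iteration loop above.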
![Page 22: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/22.jpg)
Parameter Learning II
• Baum-Welch (≈ GMM for sequences). Iterate the following two steps until convergence:
1. Calculate the expected complete log-likelihood given λ_s
2. Obtain updated model parameters λ_s+1 by maximizing this log-likelihood
![Page 23: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/23.jpg)
Parameter Learning II
• Baum-Welch (≈ GMM for sequences). Iterate the following two steps until convergence:
1. Calculate the expected complete log-likelihood given λ_s
2. Obtain updated model parameters λ_s+1 by maximizing this log-likelihood

Obj(λ, λ_s) = E_Q[log P(O, Q | λ) | O, λ_s]

λ_s+1 = argmax_λ Obj(λ, λ_s)

Running time: O(TN²) per iteration, but with a larger constant. Models the full posterior belief over hidden states per timestep. Effectively models sequences with overlapping states, at the cost of extra computation.
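For contrast with Viterbi Training's δ-function posterior, here is a sketch of the forward-backward "soft" posteriors γt(i) = P(qt = i | O, λ) that Baum-Welch averages over. It is unscaled, so it is illustrative only for short sequences; the representation is an assumption.

```python
def state_posteriors(A, B, pi, obs):
    # gamma[t][i] = P(q_t = s_i | O, lambda): the soft counts used by Baum-Welch.
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[0.0] * N for _ in range(T)]
    alpha[0] = [pi[i] * B[i][obs[0]] for i in range(N)]
    for t in range(1, T):                      # forward pass
        alpha[t] = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                    for j in range(N)]
    beta[T - 1] = [1.0] * N
    for t in range(T - 2, -1, -1):             # backward pass
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    gamma = []
    for t in range(T):                         # normalize alpha*beta per timestep
        w = [alpha[t][i] * beta[t][i] for i in range(N)]
        z = sum(w)
        gamma.append([x / z for x in w])
    return gamma
```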
![Page 24: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/24.jpg)
HMM Model Selection
• Distinction between model search and the actual selection step
  – We can search the space of HMMs with different N using parameter learning, and perform selection using a criterion like BIC.
![Page 25: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/25.jpg)
HMM Model Selection
• Distinction between model search and the actual selection step
  – We can search the space of HMMs with different N using parameter learning, and perform selection using a criterion like BIC.

Running time: O(Tn²) to compute the likelihood for BIC
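A sketch of a BIC score for a discrete HMM. The free-parameter count shown (N−1 prior, N(N−1) transition, N(M−1) emission parameters) is a common convention and an assumption here, not quoted from the paper.

```python
import math

def bic_score(log_likelihood, n_states, n_symbols, T):
    # BIC = log L - (#free params / 2) * log T; higher is better.
    k = (n_states - 1) + n_states * (n_states - 1) + n_states * (n_symbols - 1)
    return log_likelihood - 0.5 * k * math.log(T)
```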
![Page 26: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/26.jpg)
HMM Model Selection I
• for n = 1 … Nmax
  – Initialize n-state HMM randomly
  – Learn model parameters
  – Calculate BIC score
  – If best so far, store model
  – If larger model not chosen, stop
![Page 27: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/27.jpg)
HMM Model Selection I
• for n = 1 … Nmax
  – Initialize n-state HMM randomly
  – Learn model parameters
  – Calculate BIC score
  – If best so far, store model
  – If larger model not chosen, stop

Running time: O(Tn²) per iteration

Drawback: local minima in parameter optimization
![Page 28: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/28.jpg)
HMM Model Selection II
• for n = 1 … Nmax
  – for i = 1 … NumTries
    • Initialize n-state HMM randomly
    • Learn model parameters
    • Calculate BIC score
    • If best so far, store model
  – If larger model not chosen, stop
![Page 29: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/29.jpg)
HMM Model Selection II
• for n = 1 … Nmax
  – for i = 1 … NumTries
    • Initialize n-state HMM randomly
    • Learn model parameters
    • Calculate BIC score
    • If best so far, store model
  – If larger model not chosen, stop

Running time: O(NumTries × Tn²) per iteration

Evaluates NumTries candidate models for each n to overcome local minima. However, this is expensive, and still prone to local minima, especially for large N.
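The selection loop can be sketched generically; `train_fn` and `score_fn` are hypothetical callbacks standing in for "initialize and learn an n-state model" and "calculate BIC score" (higher is better):

```python
def select_model_size(train_fn, score_fn, n_max, num_tries):
    # Model Selection II: random restarts per size n; stop as soon as a
    # larger model fails to beat the best smaller one.
    best_score, best_model = float("-inf"), None
    for n in range(1, n_max + 1):
        candidates = [train_fn(n) for _ in range(num_tries)]
        model = max(candidates, key=score_fn)
        if score_fn(model) <= best_score:   # larger model not chosen: stop
            break
        best_score, best_model = score_fn(model), model
    return best_model, best_score
```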
![Page 30: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/30.jpg)
Idea: Binary state splits* to generate candidate models
• To split state s into s1 and s2,
– Create λ′ such that λ′_\s = λ_\s
– Initialize λ′_s1 and λ′_s2 based on λ_s and on parameter constraints
* first proposed in Ostendorf and Singer, 1997
Notation: λ_s : HMM parameters related to state s; λ_\s : HMM parameters not related to state s
![Page 31: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/31.jpg)
Idea: Binary state splits* to generate candidate models
• To split state s into s1 and s2,
– Create λ′ such that λ′_\s = λ_\s
– Initialize λ′_s1 and λ′_s2 based on λ_s and on parameter constraints
• This is an effective heuristic for avoiding local minima
* first proposed in Ostendorf and Singer, 1997
Notation: λ_s : HMM parameters related to state s; λ_\s : HMM parameters not related to state s
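A hypothetical split initialization consistent with the slide's description; the choice to halve incoming and prior mass, and the symmetry-breaking jitter, are assumptions here, not the paper's exact constraints.

```python
import random

def split_state(A, B, pi, s, jitter=0.01, seed=0):
    # Clone state s into (s, new last state): halve incoming/prior mass,
    # copy outgoing and emission rows with a small perturbation.
    rng = random.Random(seed)
    def perturb(row):
        row = [max(p * (1 + rng.uniform(-jitter, jitter)), 1e-12) for p in row]
        z = sum(row)
        return [p / z for p in row]
    A2 = [row[:] + [row[s] / 2] for row in A]   # incoming mass split in half
    for row in A2:
        row[s] /= 2
    A2.append(perturb(A2[s]))                   # clone's outgoing row
    A2[s] = perturb(A2[s])
    B2 = [row[:] for row in B] + [perturb(B[s])]
    B2[s] = perturb(B[s])
    pi2 = pi[:] + [pi[s] / 2]
    pi2[s] /= 2
    return A2, B2, pi2
```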
![Page 32: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/32.jpg)
Overall algorithm
![Page 33: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/33.jpg)
Overall algorithm
• Start with a small number of states
• Binary state splits* followed by EM (Baum-Welch or Viterbi Training)
• BIC on training set
• Stop when a bigger HMM is not selected
![Page 34: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/34.jpg)
Overall algorithm
• Start with a small number of states
• Binary state splits followed by EM (Baum-Welch or Viterbi Training)
• BIC on training set
• Stop when a bigger HMM is not selected

What is 'efficient'? We want this loop to be at most O(TN²).
![Page 35: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/35.jpg)
HMM Model Selection III
• Initialize n0-state HMM randomly
• for n = n0 … Nmax
  – Learn model parameters
  – for i = 1 … n
    • Split state i, learn model parameters
    • Calculate BIC score
    • If best so far, store model
  – If larger model not chosen, stop
![Page 36: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/36.jpg)
HMM Model Selection III
• Initialize n0-state HMM randomly
• for n = n0 … Nmax
  – Learn model parameters
  – for i = 1 … n
    • Split state i, learn model parameters ← O(Tn²)
    • Calculate BIC score
    • If best so far, store model
  – If larger model not chosen, stop
![Page 37: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/37.jpg)
HMM Model Selection III
• Initialize n0-state HMM randomly
• for n = n0 … Nmax
  – Learn model parameters
  – for i = 1 … n
    • Split state i, learn model parameters ← O(Tn²)
    • Calculate BIC score
    • If best so far, store model
  – If larger model not chosen, stop

Running time: O(Tn³) per iteration of the outer loop

More effective at avoiding local minima than the previous approaches. However, it scales poorly because of the n³ term.
![Page 38: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/38.jpg)
Fast Candidate Generation
![Page 39: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/39.jpg)
Fast Candidate Generation

• Only consider timesteps owned by s in the Viterbi path
• Only allow parameters of the split states to vary
• Merge parameters and store as candidate
![Page 40: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/40.jpg)
OptimizeSplitParams I: Split-State Viterbi Training (SSVT)
Iterate until convergence:
![Page 41: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/41.jpg)
Constrained Viterbi
Splitting state s into s1, s2: we calculate the updated Viterbi path using a fast 'constrained' Viterbi algorithm over only those timesteps owned by s in Q*, constraining them to belong to s1 or s2.
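A sketch of the constrained step for one contiguous run of timesteps owned by s, restricted to the two half-states. The 2×2 representation (A2 between s1 and s2, B2 emissions, p0 entry probabilities) is a simplifying assumption to keep the example self-contained.

```python
import math

def constrained_viterbi(obs_segment, A2, B2, p0):
    # Reassign each owned timestep to s1 (0) or s2 (1) by a 2-state Viterbi.
    logp = lambda p: math.log(p) if p > 0 else float("-inf")
    delta = [logp(p0[i]) + logp(B2[i][obs_segment[0]]) for i in (0, 1)]
    back = []
    for o in obs_segment[1:]:
        new, ptr = [], []
        for j in (0, 1):
            i = max((0, 1), key=lambda i: delta[i] + logp(A2[i][j]))
            new.append(delta[i] + logp(A2[i][j]) + logp(B2[j][o]))
            ptr.append(i)
        delta, back = new, back + [ptr]
    path = [max((0, 1), key=lambda i: delta[i])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))     # 0 -> s1, 1 -> s2 per owned timestep
```

Because only the owned timesteps are revisited and only two states compete, each pass is cheap relative to a full Viterbi over all N states.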
![Page 42: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/42.jpg)
[Figure: Viterbi trellis of δt(i) values, t = 1…9, with the Viterbi path highlighted]

The Viterbi path is denoted by Q*. Suppose we split state N into s1, s2.
![Page 43: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/43.jpg)
[Figure: the trellis after the split: columns δt(s1), δt(s2) replace δt(N), and the entries at timesteps owned by N are unknown]
![Page 44: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/44.jpg)
![Page 45: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/45.jpg)
![Page 46: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/46.jpg)
![Page 47: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/47.jpg)
OptimizeSplitParams I: Split-State Viterbi Training (SSVT)

Iterate until convergence:

Running time: O(|Ts|·n) per iteration

When splitting state s, this assumes the rest of the HMM parameters (λ_\s) and the rest of the Viterbi path (Q*_\Ts) are both fixed.
![Page 48: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/48.jpg)
Fast approximate BIC
Compute once for the base model: O(Tn²)

Update optimistically* for each candidate model: O(|Ts|)
* first proposed in Stolcke and Omohundro, 1994
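The optimistic update can be sketched as adjusting the base score by the change in log-likelihood over the owned timesteps, charged for the extra parameters; the exact bookkeeping here is an assumption, not the paper's formula.

```python
import math

def approx_bic_update(base_bic, base_ll_owned, new_ll_owned, T, extra_params):
    # O(|Ts|) given the two partial log-likelihood scores over the owned
    # timesteps: swap in the improved score and pay the BIC penalty for
    # the parameters the split added.
    return base_bic + (new_ll_owned - base_ll_owned) - 0.5 * extra_params * math.log(T)
```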
![Page 49: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/49.jpg)
HMM Model Selection IV
• Initialize n0-state HMM randomly
• for n = n0 … Nmax
  – Learn model parameters
  – for i = 1 … n
    • Split state i, optimize by constrained EM
    • Calculate approximate BIC score
    • If best so far, store model
  – If larger model not chosen, stop
![Page 50: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/50.jpg)
HMM Model Selection IV
• Initialize n0-state HMM randomly
• for n = n0 … Nmax
  – Learn model parameters
  – for i = 1 … n
    • Split state i, optimize by constrained EM
    • Calculate approximate BIC score
    • If best so far, store model
  – If larger model not chosen, stop

Inner loop over splits: O(Tn) total

Running time: O(Tn²) per iteration of the outer loop!
![Page 51: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/51.jpg)
Algorithms
SOFT: Baum-Welch / Constrained Baum-Welch (slower, more accurate)
HARD: Viterbi Training / Constrained Viterbi Training (faster, coarser)
![Page 52: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/52.jpg)
Results
1. Learning fixed-size models
2. Learning variable-sized models
Baseline: Fixed-size HMM Baum-Welch with five restarts
![Page 53: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/53.jpg)
Learning fixed-size models
![Page 54: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/54.jpg)
![Page 55: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/55.jpg)
Fixed-size experiments table, continued
![Page 56: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/56.jpg)
Learning fixed-size models
![Page 57: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/57.jpg)
Learning fixed-size models
![Page 58: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/58.jpg)
Learning variable-size models
![Page 59: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/59.jpg)
Learning variable-size models
![Page 60: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/60.jpg)
Learning variable-size models
![Page 61: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/61.jpg)
Learning variable-size models
![Page 62: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/62.jpg)
Learning variable-size models
![Page 63: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/63.jpg)
Conclusion
• Pros:
  – Simple and efficient method for HMM model selection
  – Also learns better fixed-size models (often faster than a single run of Baum-Welch)
  – Different variants suitable for different problems
![Page 64: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/64.jpg)
Conclusion
• Cons:
  – Greedy heuristic: no performance guarantees
  – Binary splits also prone to local minima
  – Why binary splits?
  – Works less well on discrete-valued data (greater error from Viterbi path assumptions)

• Pros:
  – Simple and efficient method for HMM model selection
  – Also learns better fixed-size models (often faster than a single run of Baum-Welch)
  – Different variants suitable for different problems
![Page 65: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/65.jpg)
Thank you
![Page 66: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/66.jpg)
Appendix
![Page 67: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/67.jpg)
Viterbi Algorithm
![Page 68: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/68.jpg)
Constrained Viterbi
![Page 69: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/69.jpg)
EM for HMMs
![Page 70: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/70.jpg)
![Page 71: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/71.jpg)
More definitions
![Page 72: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/72.jpg)
![Page 73: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/73.jpg)
OptimizeSplitParams II: Constrained Baum-Welch
Iterate until convergence:
![Page 74: Fast Temporal State-Splitting for HMM Model Selection and Learning](https://reader036.fdocuments.us/reader036/viewer/2022062409/5681486c550346895db57975/html5/thumbnails/74.jpg)
Penalized BIC