Sequential Modeling with the Hidden Markov Model


Lecture 9: Spoken Language Processing

Prof. Andrew Rosenberg

2

Markov Assumption

• If all of the available information can be represented in the present state, encoding the past is unnecessary.

The future is independent of the past given the present

3

Markov Assumption in Speech
• Word Sequences
• Phone Sequences
• Part-of-Speech Tags
• Syntactic Constituents
• Phrase Sequences
• Discourse Acts
• Intonation

4

Markov Chain

• The probability of a sequence can be decomposed into a product of probabilities of sequential events.

[Diagram: Markov chain x1 → x2 → x3]
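Written out in the usual notation (symbols assumed here: x1, …, xT is the observed sequence), the first-order Markov factorization is

  P(x_1, x_2, \dots, x_T) = P(x_1) \prod_{t=2}^{T} P(x_t \mid x_{t-1})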

5

Hidden Markov model

• In a Hidden Markov Model the state sequence is unobserved.

• Only an observation sequence is available

[Diagram: hidden states q1 → q2 → q3, each emitting an observation x1, x2, x3]
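In the same assumed notation, with q1, …, qT the hidden states, the joint probability of states and observations factorizes as

  P(x_1, \dots, x_T, q_1, \dots, q_T) = P(q_1)\, P(x_1 \mid q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1})\, P(x_t \mid q_t)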

6

Hidden Markov model

• Observations are MFCC vectors
• States are phone labels
• Each state (phone) has an associated GMM modeling the MFCC likelihood (a sketch follows the diagram below)

[Diagram: hidden states q1 → q2 → q3, each emitting an observation x1, x2, x3]
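A minimal sketch of evaluating one state's GMM likelihood for a single MFCC frame; all parameter values below are made-up placeholders, not trained models.

import numpy as np
from scipy.stats import multivariate_normal

# Toy GMM for one HMM state (phone). Real systems estimate these with EM.
weights = np.array([0.6, 0.4])                  # mixture weights
means = np.stack([np.zeros(13), np.ones(13)])   # one mean per component (13-dim MFCC)
covs = [np.eye(13), 2.0 * np.eye(13)]           # one covariance per component

def gmm_log_likelihood(x, weights, means, covs):
    # log b_j(x) = log sum_m c_m N(x; mu_m, Sigma_m), computed with log-sum-exp
    comp = [np.log(c) + multivariate_normal.logpdf(x, mean=mu, cov=cov)
            for c, mu, cov in zip(weights, means, covs)]
    return np.logaddexp.reduce(comp)

mfcc_frame = np.zeros(13)                       # stand-in for one MFCC observation
print(gmm_log_likelihood(mfcc_frame, weights, means, covs))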

7

Forward-Backward Algorithm
• HMMs are trained by collecting and distributing information from observations to states.
• The Forward-Backward algorithm is a specific example of EM.
• In the HMM topology (variable relationships), training converges in one forward pass and one backward pass.
– Hence the name.

8

Forward-Backward Algorithm
• Forward step:
– Collect up from the observations to the states
– Collect from left state to right state
• “Collect” – update parameters to correctly model the observations
– Observation collection will give a distribution over states, given the initial state
– State collection will also give a distribution over states
– The new q distribution will reflect the combination of these two

[Diagram: hidden states q1 → q2 → q3, each emitting an observation x1, x2, x3]
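In the standard formulation (notation assumed here: a_ij is the transition probability, b_j(x_t) the emission likelihood, pi_j the initial distribution), the forward pass computes α_t(j), the probability of the observations up to time t together with being in state j at time t:

  \alpha_1(j) = \pi_j\, b_j(x_1), \qquad \alpha_t(j) = \Big[ \sum_i \alpha_{t-1}(i)\, a_{ij} \Big]\, b_j(x_t)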

9

Forward-Backward Algorithm
• Backward step:
– Distribute down to the observations from the states
– Distribute from right state to left state
• “Distribute” – update parameters to correctly model the observations
– Observation distribution updates the state-observation relationship
– State distribution updates the state-state transition matrix
• Forward-backward can be shown to converge in one pass.

[Diagram: hidden states q1 → q2 → q3, each emitting an observation x1, x2, x3]
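A compact sketch of the forward and backward passes for a toy two-state HMM; the parameter values are made up for illustration only.

import numpy as np

# pi: initial state distribution, A: transition matrix,
# B[t, j]: likelihood of observation x_t under state j (e.g. from a GMM).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.1, 0.6],
              [0.4, 0.3],
              [0.5, 0.1]])

T, N = B.shape
alpha = np.zeros((T, N))
beta = np.zeros((T, N))

# Forward pass: alpha[t, j] = P(x_1..x_t, q_t = j)
alpha[0] = pi * B[0]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[t]

# Backward pass: beta[t, i] = P(x_(t+1)..x_T | q_t = i)
beta[T - 1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[t + 1] * beta[t + 1])

# State posteriors gamma[t, j] = P(q_t = j | x_1..x_T), used to update the model in EM.
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma)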

10

Finite State Automata
• “Start” and “Accept” states
• Epsilon transitions
• Relationship to regular expressions
• Operations on FSAs:
– Addition
– Inversion
– Node expansion
– Determinization
• Weighted automata allow probabilities to be assigned to transitions
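A minimal sketch of a weighted FSA as a data structure; the representation below is an illustration (it treats weights as probabilities multiplied along a path and omits epsilon transitions).

from collections import defaultdict

class WeightedFSA:
    def __init__(self, start, accept):
        self.start = start
        self.accept = set(accept)
        self.transitions = defaultdict(list)   # (state, label) -> [(next_state, prob)]

    def add_arc(self, src, label, dst, prob):
        self.transitions[(src, label)].append((dst, prob))

    def score(self, labels):
        # Total probability of reaching an accept state on this label sequence.
        current = {self.start: 1.0}
        for label in labels:
            nxt = defaultdict(float)
            for state, p in current.items():
                for dst, w in self.transitions[(state, label)]:
                    nxt[dst] += p * w
            current = nxt
        return sum(p for s, p in current.items() if s in self.accept)

# Tiny example: the word "more" as the phone sequence /m/ /ao/ /r/.
fsa = WeightedFSA(start=0, accept=[3])
fsa.add_arc(0, "/m/", 1, 1.0)
fsa.add_arc(1, "/ao/", 2, 1.0)
fsa.add_arc(2, "/r/", 3, 1.0)
print(fsa.score(["/m/", "/ao/", "/r/"]))   # 1.0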

11

State transitions as FSA

[Diagram: pronunciation FSA for “data”: /d/ → (/ey/ or /ae/) → (/t/ or /dx/) → /ax/]

12

Word FSA to phone FSA

[Diagram: word FSA for “MORE DATA”, with each word expanded into its phone FSA: “more” → /m/ /ao/ /r/; “data” → /d/ → (/ey/ or /ae/) → (/t/ or /dx/) → /ax/]

13

Word FSA to phone FSA

[Diagram: the resulting phone-level FSA: /m/ → /ao/ → /r/ followed by /d/ → (/ey/ or /ae/) → (/t/ or /dx/) → /ax/]

14

Decoding a Hidden Markov Model

• Decoding is finding the most likely state sequence.

• How many state sequences are there in an HMM with N observations and k states?

15

Viterbi Decoding
• Dynamic programming can make this a lot faster.
• Idea: any optimal sequence between x0 and xn must include an optimal sequence between x0 and xn-1.
– Based on the Markov Assumption.

16

Viterbi Decoding
• Probability of the most likely state sequence
• Recovering the optimal sequence involves storing back-pointers as decisions are made.
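A standard form of the recursion (notation assumed here), where v_t(j) is the probability of the most likely state sequence ending in state j at time t and bp_t(j) stores the back-pointer:

  v_1(j) = \pi_j\, b_j(x_1), \qquad v_t(j) = \max_i \big[ v_{t-1}(i)\, a_{ij} \big]\, b_j(x_t), \qquad bp_t(j) = \arg\max_i\, v_{t-1}(i)\, a_{ij}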

17

Example (from Wikipedia)

states = ('Rainy', 'Sunny')
observations = ('walk', 'shop', 'clean')
start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
transition_probability = {
    'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny': {'Rainy': 0.4, 'Sunny': 0.6},
}
emission_probability = {
    'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}

What is the most likely state sequence?
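A minimal Viterbi sketch over the dictionaries above; this decoding function is an illustration, not the lecture's code.

def viterbi(observations, states, start_p, trans_p, emit_p):
    # V[t][state] = (probability of best path ending in state at time t, previous state)
    V = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for t in range(1, len(observations)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][observations[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the best final state using the stored pointers.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, V[t][path[0]][1])
    return V[-1][last][0], path

print(viterbi(observations, states, start_probability,
              transition_probability, emission_probability))
# -> (0.01344, ['Sunny', 'Rainy', 'Rainy'])

With these parameters the most likely state sequence is Sunny, Rainy, Rainy, with probability 0.01344.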

18

HMM Topology for Training
• Rather than having one GMM per phone, it is common for acoustic models to represent each phone with three emitting states, each with its own GMM (and to use context-dependent triphone models).

[Diagram: five-state left-to-right topology S1 → S2 → S3 → S4 → S5 for the phone /r/]
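A sketch of a left-to-right transition matrix for such a topology; the self-loop and advance probabilities below are made-up placeholders, not trained values.

import numpy as np

# Left-to-right topology: each state can stay (self-loop) or move to the next state.
n_states = 5
stay, advance = 0.6, 0.4
A = np.zeros((n_states, n_states))
for i in range(n_states - 1):
    A[i, i] = stay
    A[i, i + 1] = advance
A[-1, -1] = 1.0                    # final state absorbs
print(A)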

19

Flat Start
• In Flat Start training, GMM parameters are initialized to global means and variances.
• Viterbi is used to perform forced alignment between the observations and the phone sequence.
– The phone sequence is derived from the lexical transcription and pronunciation model.

20

Forced Alignment
• Given a phone sequence and observations, assign each observation to a phone.
• Uses:
– Identifying which observations belong to each phone label for later training
– Getting time boundaries for phone or word labels.

21

Flat Start
• In Flat Start training, GMM parameters are initialized to global means and variances (a sketch of this initialization follows below).
• Viterbi is used to perform forced alignment between the observations and the phone sequence.
– The phone sequence is derived from the lexical transcription and pronunciation model.
• After alignment, retrain the acoustic models, and repeat.
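A minimal sketch of the flat-start initialization step; the feature matrix and model sizes below are made-up placeholders standing in for real MFCC data and a real phone set.

import numpy as np

# mfcc: all training frames stacked into one matrix, shape (n_frames, 13).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(10000, 13))

global_mean = mfcc.mean(axis=0)
global_var = mfcc.var(axis=0)

# Flat start: every state of every phone model starts from the same global
# statistics; forced alignment and retraining then specialize them.
n_phones, n_states_per_phone = 40, 3
means = np.tile(global_mean, (n_phones, n_states_per_phone, 1))
variances = np.tile(global_var, (n_phones, n_states_per_phone, 1))
print(means.shape)   # (40, 3, 13)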

22

What about silence?

• If there is no “silence” state, the silent frames will be assigned to either /d/ or /ax/.
• This leads to worse acoustic models.
• A solution: explicit training of a silence model, /sp/
– Allowing /sp/ transitions at word boundaries

[Diagram: the phone sequence /d/ /ey/ /dx/ /ax/ repeated three times (“data data data”) with no silence states between words]

23

Next Class
• Pronunciation Modeling
• Reading: J&M Chapter 2, Sections 10.5.3, 11.1, 11.2