Introduction to Hidden Markov Modeling (HMM)
Daniel S. Terry
Scott Blanchard and Harel Weinstein labs
HMM is useful for many, many problems.
Speech Recognition and Translation
Weather Modeling
Sequence Alignment
Financial Modeling
So let’s say you’re riding out nuclear war in a bunker…
To keep sane, you want to know what the weather outside is like…
…but all you can observe is whether the security guard brings his umbrella.
Probabilistic reasoning

P(Sunny|Umbrella)      P(Cloudy|Umbrella)      P(Rain|Umbrella)
P(Sunny|No Umbrella)   P(Cloudy|No Umbrella)   P(Rain|No Umbrella)

P(X|E) = probability of X happening if E is observed.
Here the hidden state X is the weather, and the observation E is the umbrella.
Probabilistic reasoning in stochastic processes

Over time, the hidden states form a chain: X0 → X1 → X2 → X3 → X4 → …
Each hidden state Xt generates an observation ("emission") Et: E0, E1, E2, E3, E4, …

This is called a Markov chain.
Assumptions in Markov modeling

Assumption 1: This is a stationary process, specifically a first-order Markov process:

P(Xt | Xt-1, Xt-2, Xt-3, …) = P(Xt | Xt-1)

…in other words, the current state depends only on the previous state. We call this the transition model.

Assumption 2: The current observation depends only on the current state:

P(Et | Xt, Xt-1, Xt-2, …, Et-1, Et-2, Et-3, …) = P(Et | Xt)

We call this the observation (or emission) model.
The initial and transition probability models: π and A

Initial probabilities, π = P(X0):

  Sunny    0.7
  Cloudy   0.15
  Raining  0.15

Transition probabilities, A = P(Xt | Xt-1):

  Xt-1      P(Xt = Sunny)   P(Xt = Cloudy)   P(Xt = Raining)
  Sunny     0.7             0.25             0.05
  Cloudy    0.33            0.33             0.33
  Raining   0.2             0.6              0.2

Encodes prior knowledge about weather trends.
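These two tables are enough to simulate the weather itself. A minimal sketch in Python (the state names and function name are just for illustration; `random.choices` normalizes its weights, so the Cloudy row summing to 0.99 on the slide is harmless):

```python
import random

random.seed(0)  # make the example reproducible

states = ["Sunny", "Cloudy", "Raining"]
pi = [0.7, 0.15, 0.15]            # initial probabilities (the pi table)
A = [[0.7, 0.25, 0.05],           # transition matrix: row = X(t-1), col = X(t)
     [0.33, 0.33, 0.33],
     [0.2, 0.6, 0.2]]

def sample_weather(T):
    """Sample a length-T hidden state sequence X0..X(T-1) from pi and A."""
    x = random.choices(range(3), weights=pi)[0]
    seq = [x]
    for _ in range(T - 1):
        x = random.choices(range(3), weights=A[x])[0]
        seq.append(x)
    return [states[i] for i in seq]

print(sample_weather(5))
```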
The observation probability model: B

  Xt        P(Et = Umbrella)
  Sunny     0.05
  Cloudy    0.10
  Raining   0.85

Encodes prior knowledge about how likely people are to bring their umbrella depending on weather conditions.
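Combined with the B table, a hidden state sequence generates observations. A sketch (the dictionary and function name are illustrative):

```python
import random

random.seed(1)  # reproducible example

# P(Et = Umbrella | Xt), straight from the B table above
p_umbrella = {"Sunny": 0.05, "Cloudy": 0.10, "Raining": 0.85}

def sample_observations(state_seq):
    """Emit True (umbrella) for each state with its B-table probability."""
    return [random.random() < p_umbrella[s] for s in state_seq]

print(sample_observations(["Raining", "Raining", "Sunny", "Cloudy"]))
```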
Together these parameters define a Markov model:

λ = {π, A, B}

π: initial state probabilities (πC, πR)
A: state transition probabilities (aC,C, aC,R, aR,C, aR,R)
B: observation distributions (bC, bR)
Predicting state sequences from observations

Observation sequence (t = 1..T) + Markov model λ = {π, A, B} → predicted hidden state sequence X0, X1, X2, X3, …
Finding the optimal state sequence with Viterbi

Given a model λ = {π, A, B} that describes the system, we can determine the optimal state sequence (idealization) as follows:

For each state at time t, calculate the probability of the state at time t (Xt) being a particular state xi (sunny, raining, etc.), given the observations and the previous states:

P(Xt=xi | Et, Et-1, Et-2, …, Xt-1, Xt-2, Xt-3, …) = P(Xt=xi | Et, Xt-1=xj) ∝ P(Et | Xt=xi) × P(Xt=xi | Xt-1=xj)

Initial condition: P(X0=xi) = πi

(Picture the states S, C, R stacked in a column at each time X0, X1, X2, X3, …)
Finding the optimal state sequence with Viterbi

Repeat these calculations for all possible transitions recursively. Then at each point in time we have an estimate of how likely we are to be in a particular state at that time, given all possible previous paths. We also keep track of the most likely previous state at each point in time.

(This complex-looking graph of states S, C, R at times X0, X1, X2, X3, …, with every transition drawn in, is called a trellis. Can you see why?)
Finding the optimal state sequence with Viterbi

Find the most likely end state from the probabilities. We can then backtrack through the trellis to find the most likely state sequence. You have seen a similar procedure with sequence alignment.
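The trellis recursion and backtracking described above can be written out for the umbrella model. This is an illustrative sketch, not the lecture's own code: log-probabilities avoid numerical underflow, observations are coded 0 = no umbrella / 1 = umbrella, and the no-umbrella column of b is derived as 1 minus the slide's umbrella probabilities:

```python
import math

states = ["Sunny", "Cloudy", "Raining"]
pi = [0.7, 0.15, 0.15]
A = [[0.7, 0.25, 0.05],
     [0.33, 0.33, 0.33],
     [0.2, 0.6, 0.2]]
b = [[0.95, 0.05],   # P(no umbrella | state), P(umbrella | state)
     [0.90, 0.10],
     [0.15, 0.85]]

def viterbi(obs):
    """Most likely hidden state sequence for an observation sequence."""
    n = len(states)
    # delta[i]: log-prob of the best path ending in state i at the current time
    delta = [math.log(pi[i]) + math.log(b[i][obs[0]]) for i in range(n)]
    back = []                      # backpointers for each later time step
    for e in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + math.log(A[i][j]))
            ptr.append(best)
            delta.append(prev[best] + math.log(A[best][j]) + math.log(b[j][e]))
        back.append(ptr)
    x = max(range(n), key=lambda i: delta[i])   # most likely end state
    path = [x]
    for ptr in reversed(back):                  # backtrack through the trellis
        x = ptr[x]
        path.append(x)
    return [states[i] for i in reversed(path)]

print(viterbi([1, 1, 0, 1]))
```

Four umbrella sightings in a row decode to four rainy days, as the B table would suggest.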
Predicting state sequences from observations

Observation sequence (t = 1..T) + Markov model λ = {π, A, B} → predicted hidden state sequence X0, X1, X2, X3, …
A practical example of Markov modeling:
Analysis of single-molecule fluorescence trajectories

[Figure: fluorescence and FRET traces vs. time (min)]
…Ok, so I’m bored of talking about the weather.
www.nia.NIH.gov, public domain.

Neurotransmitter release and reuptake is central to neuronal signaling and proper functioning of the brain.

Neurotransmitter:Sodium Symporter (NSS) proteins carry out this reuptake and are the targets of many clinically-important drugs: drugs of abuse and therapeutic inhibitors.

[Figure: NSS reuptake across the membrane; neurotransmitter moves from the extracellular side (high Na+ outside) to the intracellular side (low Na+ inside)]
Key Question: What are the specific conformational changes required for such a mechanism, and how do they mediate transport?
A practical example of Markov modeling: Analysis of single-molecule fluorescence trajectories

Single-molecule FRET: a tool for examining conformational dynamics

[Figure: FRET efficiency vs. donor-acceptor distance (nm), with the Förster radius R0 marked]
FRET imaging of single molecules can be achieved using a few tricks, including total internal reflection excitation.

[Figure: donor/acceptor-labeled molecule on a surface under 532 nm TIR excitation; example fluorescence and FRET traces vs. time (min)]
We want to know:
1) How many distinct states are there?
2) What are their FRET values?
3) What are the rates?
4) What is the most likely state at each point in time?

[Figure: fluorescence, FRET, and conformation traces vs. time (sec)]
HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system: a sequence of hidden states (X0, X1, X2, …) generates a sequence of observations (E0, E1, E2, …).

Unlike with the weather, we have to learn the model from the data itself!
Hidden Markov models have three components:

1) Initial state probabilities:  π = {πO, πC}

2) Transition probabilities:  A = {ai,j} = [ aO,O  aO,C ; aC,O  aC,C ]

λ = {π, A, B}

(State diagram: states O and C with initial probabilities πO and πC, transitions aO,O, aO,C, aC,O, aC,C, and observation distributions bO and bC.)
3) Observation probability distribution (OPD): the FRET distribution for state i, a Gaussian with mean μi and standard deviation σi:

B = {bi(Et)},   bi(Et) = (1 / (σi √(2π))) ∙ exp( -(Et - μi)² / (2σi²) )

[Figure: Gaussian FRET distribution for state i, centered at μi with width σi, over FRET 0.4-0.7]
Goal: the best model to explain the experimental data.

λ̂ = argmax_λ P(λ | E)

In other words, we want to maximize the probability of the model given the data (where λ is the model and E is the observed FRET trajectory).

But we don't know how to calculate P(λ | E)! Instead, turn it around using Bayes' theorem:

P(λ | E) = P(E | λ) ∙ P(λ) / P(E)

The evidence P(E) is independent of the model choice and will not affect model ranking. If we assume all models are equally likely a priori (uniform P(λ)), then:

λ̂ = argmax_λ P(λ | E) = argmax_λ P(E | λ)

P(E | λ) is easy to calculate – it comes directly from the observation distribution. Why does X not appear here? Because we have to do this over all possible state sequences!
[Figure: FRET trajectory vs. time (min)]
Segmental k-means (SKM): optimization on the cheap

Iterate between state assignment (Viterbi) and parameter re-estimation, starting from an initial model λ0 and converging to a final model λi:

• To get B, simply calculate the mean and std for each state from the current assignment.
• To get A, count the number of transitions of each type and normalize.
• To get π, count the number of times each dwell starts in each state xi and normalize.

F. Qin (2004), Biophys J 86: 1488

Works only if the starting model is close to the final one.
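The re-estimation half of the loop can be sketched directly from these rules. A minimal, illustrative version (π is omitted for brevity; `assign` stands for a Viterbi state assignment per frame, and all names are hypothetical):

```python
import statistics

def reestimate(assign, data, n_states):
    """One SKM parameter re-estimation step from a Viterbi assignment."""
    # B: mean and std of the data points currently assigned to each state
    B = []
    for s in range(n_states):
        vals = [d for d, a in zip(data, assign) if a == s]
        B.append((statistics.mean(vals), statistics.pstdev(vals)))
    # A: count transitions of each type and normalize each row
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(assign, assign[1:]):
        counts[a][b] += 1
    A = [[c / max(sum(row), 1) for c in row] for row in counts]
    return A, B

A, B = reestimate([0, 0, 1, 1, 0], [0.2, 0.25, 0.7, 0.65, 0.3], 2)
print(A, B)
```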
Model optimization: expectation maximization (EM)

Expectation: Calculate the probability of the data given the model.
Maximization: Adjust the model parameters to better fit the calculated probabilities.
Termination: Iterate until the log-likelihood converges (e.g., ΔLL < 10^-4).

P(E | λ) = P(X0) ∙ Π_{t=1..T} P(Xt | Xt-1) ∙ P(Et | Xt)

LL = log[ P(X0) ] + Σ_{t=1..T} log[ P(Xt | Xt-1) ∙ P(Et | Xt) ]
         Initial (π)            Transition (A)     Observation (B)

Restarts: if the likelihood "landscape" is very frustrated, restarting from a random initial model can help get out of local minima.
The forward-backward algorithm (Baum-Welch)

X0 → X1 → X2 → X3 → … → X97 → X98 → X99, with observations E0, E1, …, E99.

Calculating the probabilities at a particular point in time (t) splits the data into the "past" and the "future":

P(Xt | E1..T) = P(Xt | E1..t, Et+1..T) ∝ P(Xt | E1..t) ∙ P(Et+1..T | Xt)
                                          Forward (α)     Backward (β)

We can do this because of Bayes' rule and the conditional independence of observations over time. We calculate these much like we did with Viterbi.
The forward algorithm

Partial probabilities (α) are calculated recursively as:

αt(j) = P(observation | hidden state is j) × P(all paths to state j at time t)

Initial condition: α0(j) = π(j) ∙ B(j, E0)

Iterate: αt+1(j) = B(j, Et+1) ∙ Σ_{i=1..n} αt(i) ∙ ai,j

Then the total probability of the sequence is the sum of these α's at the final time.
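A sketch of this recursion for the earlier umbrella model (illustrative tables; observations coded 0 = no umbrella, 1 = umbrella, with the no-umbrella probabilities taken as 1 minus the slide's umbrella probabilities):

```python
pi = [0.7, 0.15, 0.15]            # Sunny, Cloudy, Raining
A = [[0.7, 0.25, 0.05],
     [0.33, 0.33, 0.33],
     [0.2, 0.6, 0.2]]
B = [[0.95, 0.05],                # P(obs | state): no umbrella, umbrella
     [0.90, 0.10],
     [0.15, 0.85]]

def forward(obs):
    """Total probability P(E | lambda), summed over all state paths."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]          # alpha_0
    for e in obs[1:]:
        alpha = [B[j][e] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]                           # recursion
    return sum(alpha)                                         # sum of final alphas

print(forward([1, 1, 0]))
```

For a single umbrella observation this reduces to 0.7·0.05 + 0.15·0.10 + 0.15·0.85 = 0.1775.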
Maximization using forward-backward probabilities

From the forward-backward algorithm we obtain the probability of being in state i at time t, and the probability of transitioning from state i to state j at time t. The model parameters are then adjusted to maximize the log-likelihood.

This is very much like SKM, except we use explicit probabilities instead of just counting.
The problem of bias

• You can always get a better fit using more parameters! But it may not be a good model.
• Bayesian information criterion (BIC): -2∙ln P(E|k) ≈ BIC = -2∙LL + k∙ln(n)
  k is the number of free parameters, LL is the log-likelihood of the optimal "fit", and n is the number of data points.
• Akaike information criterion (AIC): AIC = 2∙k - 2∙LL
• Maximum evidence methods (vbFRET), etc.
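Both criteria are one-liners once a fit's log-likelihood is in hand (the LL, k, and n values in the example are made up for illustration; lower scores are better for both):

```python
import math

def bic(LL, k, n):
    """Bayesian information criterion from log-likelihood LL,
    k free parameters, and n data points."""
    return -2.0 * LL + k * math.log(n)

def aic(LL, k):
    """Akaike information criterion."""
    return 2.0 * k - 2.0 * LL

# Hypothetical comparison: 2-state vs 3-state fit of the same 1000 frames.
# The 3-state model fits better (higher LL) but pays a parameter penalty.
print(bic(-450.0, 8, 1000), bic(-445.0, 15, 1000))
print(aic(-450.0, 8), aic(-445.0, 15))
```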
We want to know:
1) How many distinct states are there?
2) What are their FRET values?
3) What are the rates?
4) What is the most likely state at each point in time?

[Figure: fluorescence, FRET, and conformation traces vs. time (sec)]
HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system: a sequence of hidden states (X0, X1, X2, …) generates a sequence of observations (E0, E1, E2, …).
Quantifying kinetics is then useful for understanding how outside factors (ligands) influence dynamics.

[Figure: FRET vs. time (min) in 2 mM Na+ with +2 mM Ala; dwell times (s) and occupancies (%) of the open and closed states vs. log [Ala] (M)]

Zhao and Terry, et al (2011), Nature 474
Other important examples of Markov modeling:
• Single-channel recordings (patch clamp)
• Sequence analysis
• Cardiac electrical modeling
• Systems modeling of metabolic networks

We can do non-equilibrium Markov modeling, too.

Geggier et al (2010), JMB 399: 576
HMM is useful for many, many problems:
Speech Recognition and Translation
Weather Modeling
Sequence Alignment
Financial Modeling
Some useful references

• Russell and Norvig, Artificial Intelligence: A Modern Approach
• http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html
• Rabiner (1989), Proc. of the IEEE 77: 257.
• Qin F. Principles of single-channel kinetic analysis. Methods Mol Biol. 2007; 403.
• Bronson et al (2009), Biophys J 97: 3196.
• QuB software suite: www.qub.buffalo.edu