Joint Maximum Likelihood Estimation for High-dimensional ...
Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood...
Transcript of Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood...
![Page 1: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/1.jpg)
1
CSEP 527 Spring 2016
4. Maximum Likelihood Estimation and the E-M Algorithm
![Page 2: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/2.jpg)
2
Outline
HW#2 Discussion
MLE: Maximum Likelihood Estimators
EM: the Expectation Maximization Algorithm
Next: Motif description & discovery
![Page 3: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/3.jpg)
Species Name Description Access
-ionscore to #1
1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein 1 P15172 1709
2 Homo sapiens (Human) TAL1_HUMAN T-cell acute lymphocytic leukemia protein 1 (TAL-1) P17542 143
3 Mus musculus (Mouse) MYOD1_MOUSE Myoblast determination protein 1 P10085 1494
4 Gallus gallus (Chicken) MYOD1_CHICK Myoblast determination protein 1 homolog (MYOD1 homolog) P16075 1020
5 Xenopus laevis (African clawed frog) MYODA_XENLA Myoblast determination protein 1 homolog A (Myogenic factor 1) P13904 978
6 Danio rerio (Zebrafish) MYOD1_DANRE Myoblast determination protein 1 homolog (Myogenic factor 1) Q90477 893
7 Branchiostoma belcheri (Amphioxus) Q8IU24_BRABE MyoD-related Q8IU24 428
8 Drosophila melanogaster (Fruit fly) MYOD_DROME Myogenic-determination protein (Protein nautilus) (dMyd) P22816 368
9 Caenorhabditis elegans LIN32_CAEEL Protein lin-32 (Abnormal cell lineage protein 32) Q10574 118
10 Homo sapiens (Human) SYFM_HUMAN Phenylalanyl-tRNA synthetase, mitochondrial O95363 56
HW # 2 Discussion
![Page 4: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/4.jpg)
![Page 5: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/5.jpg)
![Page 6: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/6.jpg)
MyoD
http://www.rcsb.org/pdb/explore/jmol.do?structureId=1MDY&bionumber=1
![Page 7: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/7.jpg)
25
Permutation Score Histogram vs Gaussian
score
Frequency
40 60 80 100 120
0200
400
600
800
1000 Histogram for scores of 20k
Smith-Waterman alignments of MyoD vs permuted versions of C. elegans Lin32. Looks roughly normal!
And real Lin32 scores well above highest permuted seq.
**
![Page 8: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/8.jpg)
Permutation Score Histogram vs Gaussian
score
Freq
uenc
y
40 60 80 100 120
020
040
060
080
010
00
*
True
Sco
re: 7
.9σ
*M
ax P
erm
Sco
re: 5
.7σ
31
Red curve is approx fit of EVD to score histogram – fit looks better, esp. in tail. Max permuted score has probability ~10-4, about what you’d expect in 2x104 trials.
True score is still moderately unlikely, < one tenth the above.
![Page 9: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/9.jpg)
9
Maximum Likelihood Estimators
Learning From Data: MLE
![Page 10: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/10.jpg)
10
Parameter Estimation
Given: independent samples x1, x2, ..., xn from a parametric distribution f(x|θ)
Goal: estimate θ.
E.g.: Given sample HHTTTTTHTHTTTHH of (possibly biased) coin flips, estimate
θ = probability of Heads
f(x|θ) is the Bernoulli probability mass function with parameter θ
![Page 11: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/11.jpg)
P(x | θ): Probability of event x given model θViewed as a function of x (fixed θ), it’s a probability
E.g., Σx P(x | θ) = 1
Viewed as a function of θ (fixed x), it’s called likelihoodE.g., Σθ P(x | θ) can be anything; relative values of interest. E.g., if θ = prob of heads in a sequence of coin flips then P(HHTHH | .6) > P(HHTHH | .5), I.e., event HHTHH is more likely when θ = .6 than θ = .5
And what θ make HHTHH most likely?
Likelihood
11
![Page 12: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/12.jpg)
Likelihood FunctionP( HHTHH | θ ):
Probability of HHTHH, given P(H) = θ:
θ θ4(1-θ)
0.2 0.0013
0.5 0.0313
0.8 0.0819
0.95 0.04070.0 0.2 0.4 0.6 0.8 1.0
0.00
0.02
0.04
0.06
0.08
Theta
P( H
HTH
H |
Thet
a)
max
![Page 13: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/13.jpg)
13
One (of many) approaches to param. est.Likelihood of (indp) observations x1, x2, ..., xn
As a function of θ, what θ maximizes the likelihood of the data actually observedTypical approach:
Maximum Likelihood Parameter Estimation
L(x1, x2, . . . , xn | ) =n
i=1
f(xi | )
@@L(~x | ) = 0 or
@@ logL(~x | ) = 0
![Page 14: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/14.jpg)
14
(Also verify it’s max, not min, & not better on boundary)
Example 1n independent coin flips, x1, x2, ..., xn; n0 tails, n1 heads, n0 + n1 = n; θ = probability of heads
Observed fraction of successes in sample is MLE of success probability in population
dL/dθ = 0
![Page 15: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/15.jpg)
15
Parameter EstimationGiven: indp samples x1, x2, ..., xn from a parametric distribution f(x|θ), estimate: θ.
E.g.: Given n normal samples, estimate mean & variance f(x) = 1
22 e(xµ)2/(22)
= (µ,2)
-3 -2 -1 0 1 2 3
µ ± σ
μ
![Page 16: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/16.jpg)
Ex2: I got data; a little birdie tells me it’s normal, and promises σ2 = 1
16
X X XX X XXX X
Observed Data
x →
![Page 17: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/17.jpg)
-3 -2 -1 0 1 2 3
µ ± σ
μ
1
Which is more likely: (a) this?
17
X X XX X XXX X
Observed Data
μ unknown, σ2 = 1
![Page 18: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/18.jpg)
Which is more likely: (b) or this?
18
-3 -2 -1 0 1 2 3
µ ± σ
μ
1
X X XX X XXX X
Observed Data
μ unknown, σ2 = 1
![Page 19: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/19.jpg)
-3 -2 -1 0 1 2 3
µ ± σ
μ
1
Which is more likely: (c) or this?
19
X X XX X XXX X
Observed Data
μ unknown, σ2 = 1
![Page 20: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/20.jpg)
-3 -2 -1 0 1 2 3
µ ± σ
μ
1
Which is more likely: (c) or this?
20
X X XX X XXX X
Observed Data
Looks good by eye, but how do I optimize my estimate of μ ?
μ unknown, σ2 = 1
![Page 21: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/21.jpg)
21
Ex. 2: xi N(µ,2), 2 = 1, µ unknown
And verify it’s max, not min & not better on boundary
Sample mean is MLE of population mean
dL/dθ = 0
L(x1, x2, . . . , xn
|) =nY
i=1
1p2
e
(xi)2/2
lnL(x1, x2, . . . , xn
|) =nX
i=1
1
2ln(2) (x
i
)2
2
d
d
lnL(x1, x2, . . . , xn
|) =nX
i=1
(xi
)
=
nX
i=1
x
i
! n = 0
b =
nX
i=1
x
i
!/n = x
![Page 22: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/22.jpg)
Hmm …, density ≠ probability
22
So why is “likelihood” function equal to product of densities?? (Prob of seeing any specific xi is 0, right?)
a) for maximizing likelihood, we really only care about relative likelihoods, and density captures that
b) has desired property that likelihood increases with better fit to the model
and/or
c) if density at x is f(x), for any small δ>0, the probability of a sample within ±δ/2 of x is ≈ δf(x), but δ is constant wrt θ, so it just drops out of d/dθ log L(…) = 0.
-3 -2 -1 0 1 2 3
µ ± σ
μ
1
X X XX X
![Page 23: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/23.jpg)
Ex3: I got data; a little birdie tells me it’s normal (but does not tell me μ, σ2)
23
X X XX X XXX X
Observed Data
x →
![Page 24: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/24.jpg)
-3 -2 -1 0 1 2 3
µ ± σ
μ
1
Which is more likely: (a) this?
24
X X XX X XXX X
Observed Data
μ, σ2 both unknown
μ ± 1
![Page 25: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/25.jpg)
Which is more likely: (b) or this?
25
μ, σ2 both unknown
-3 -2 -1 0 1 2 3
µ ± σ 3
X X XX X XXX X
Observed Data
μ ± 3
μ
![Page 26: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/26.jpg)
-3 -2 -1 0 1 2 3
µ ± σ
μ
1
Which is more likely: (c) or this?
26
X X XX X XXX X
Observed Data
μ, σ2 both unknown
μ ± 1
![Page 27: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/27.jpg)
Which is more likely: (d) or this?
27
μ, σ2 both unknown
-3 -2 -1 0 1 2 3
µ ± σ
μ
X X XX X XXX X
Observed Data
μ ± 0.5
![Page 28: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/28.jpg)
Which is more likely: (d) or this?
28
X X XX X XXX X
Observed Data
Looks good by eye, but how do I optimize my estimates of μ & σ2 ?μ, σ2 both unknown
-3 -2 -1 0 1 2 3
µ ± σ
μ
μ ± 0.5
![Page 29: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/29.jpg)
29
Ex 3: xi N(µ,2), µ,2 both unknown
-0.4
-0.2
0
0.2
0.4
0.2
0.4
0.6
0.8
0
1
2
3
-0.4
-0.2
0
0.2
0.4θ1
θ2
Sample mean is MLE of population mean, again
In general, a problem like this results in 2 equations in 2 unknowns. Easy in this case, since θ2 drops out of the ∂/∂θ1 = 0 equation
Likelihood surface
lnL(x1, x2, . . . , xn|1, 2) =nX
i=1
1
2ln(22)
(xi 1)2
22
@
@1lnL(x1, x2, . . . , xn|1, 2) =
nX
i=1
(xi 1)
2= 0
b1 =
nX
i=1
xi
!/n = x
![Page 30: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/30.jpg)
30
Ex. 3, (cont.)
Sample variance is MLE of population variance
lnL(x1, x2, . . . , xn|1, 2) =nX
i=1
1
2ln(22)
(xi 1)2
22
@
@2lnL(x1, x2, . . . , xn|1, 2) =
nX
i=1
1
2
2
22+
(xi 1)2
222= 0
b2 =
Pni=1(xi b
1)2/n = s
2
![Page 31: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/31.jpg)
Bias? if Y is sample mean Y = (Σ1≤i≤n Xi)/n then E[Y] = (Σ1≤i≤n E[Xi])/n = n μ/n = μso the MLE is an unbiased estimator of population mean
Similarly, (Σ1≤i≤n (Xi-μ)2)/n is an unbiased estimator of σ2.Unfortunately, if μ is unknown, estimated from the same data, as above, is a consistent, but biased estimate of population variance. (An example of overfitting.) Unbiased estimate is:
Moral: MLE is a great idea, but not a magic bullet31
Ex. 3, (cont.)
I.e., limn→∞
= correct
![Page 32: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/32.jpg)
Biased? Yes. Why? As an extreme, think about n = 1. Then θ2 = 0; probably an underestimate!
Also, consider n = 2. Then θ1 is exactly between the two sample points, the position that exactly minimizes the expression for θ2. Any other choices for θ1, θ2 make the likelihood of the observed data slightly lower. But it’s actually pretty unlikely (probability 0, in fact) that two sample points would be chosen exactly equidistant from, and on opposite sides of the mean, so the MLE θ2 systematically underestimates θ2.
(But not by much, & bias shrinks with sample size.)
More on Bias of θ2
32
ˆ
θ1
θ2
θ2
![Page 33: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/33.jpg)
SummaryMLE is one way to estimate parameters from dataYou choose the form of the model (normal, binomial, ...)Math chooses the value(s) of parameter(s)Defining the “Likelihood Function” (based on the form of the model) is often the critical step; the math/algorithms to optimize it are generic
Often simply (d/dθ)(log Likelihood) = 0
Has the intuitively appealing property that the parameters maximize the likelihood of the observed data; basically just assumes your sample is “representative”
Of course, unusual samples will give bad estimates (estimate normal human heights from a sample of NBA stars?) but that is an unlikely event
Often, but not always, MLE has other desirable properties like being unbiased, or at least consistent
33
![Page 34: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/34.jpg)
Conditional Probability &
Bayes Rule
34
![Page 35: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/35.jpg)
conditional probability
S
SF
F
Conditional probability of E given F: probability that E occurs given
that F has occurred.
“Conditioning on F”
Written as P(E|F)
Means “P(E has happened, given F observed)”
E
E
35
where P(F) > 0
![Page 36: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/36.jpg)
law of total probability
E and F are events in the sample space S
E = EF ∪ EFc
EF ∩ EFc = ∅
⇒ P(E) = P(EF) + P(EFc)
S
E F
36
![Page 37: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/37.jpg)
Most common form:
Expanded form (using law of total probability):
Proof:
Bayes Theorem
37
![Page 38: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/38.jpg)
38
EM The Expectation-Maximization Algorithm(for aTwo-Component Gaussian Mixture)
![Page 39: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/39.jpg)
A Hat TrickTwo slips of paper in a hat:
Pink: μ = 3, and Blue: μ = 7.
You draw one, then (without revealing color or μ) reveal a single sample X ~ Normal(mean μ, σ2 = 1).
You happen to draw X = 6.001.
Dr. Mean says “your slip = 7.” What is P(correct)?
What if X had been 4.9?
39
![Page 40: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/40.jpg)
40
0 1 2 3 4 5 6 7 8 9 10
0.0
0.1
0.2
0.3
0.4
0.5
A Hat Trick
x
density
X X
Let “X 6” be a shorthand for 6.001 /2 < X < 6.001 + /2
P (µ = 7|X = 6) = lim
!0P (µ = 7|X 6)
P (µ = 7|X 6) =
P (X 6|µ = 7)P (µ = 7)
P (X 6)
=
0.5P (X 6|µ = 7)
0.5P (X 6|µ = 3) + 0.5P (X 6|µ = 7)
f(X = 6|µ = 7)
f(X = 6|µ = 3) + f(X = 6)|µ = 7), so
P (µ = 7|X = 6) =
f(X = 6|µ = 7)
f(X = 6|µ = 3) + f(X = 6)|µ = 7)
0.982
f = normal density
3σ σ
Bayes rule
![Page 41: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/41.jpg)
Another Hat TrickTwo secret numbers, μpink and μblue
On pink slips, many samples of Normal(μpink, σ2 = 1),
Ditto on blue slips, from Normal(μblue, σ2 = 1).
Based on 16 of each, how would you “guess” the secrets (where “success” means your guess is within ±0.5 of each secret)?
Roughly how likely is it that you will succeed?
41
![Page 42: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/42.jpg)
Another Hat Trick (cont.)
Pink/blue = red herrings; separate & independent
Given X1, …, X16 ~ N(μ, σ2), σ2 = 1
Calculate Y = (X1 + … + X16)/16 ~ N( ? , ? )
E[Y] =
Var(Y) =
I.e., Xi’s are all ~ N(μ, 1); Y is ~ N(μ, 1/16)
and since 0.5 = 2 sqrt(1/16), we have:
“Y within ±.5 of μ” = “Y within ±2 σ of μ” ≈ 95% prob
Note 1: Y is a point estimate for μ; Y ± 2 σ is a 95% confidence interval for μ (More on this topic later)
42
μ 16σ2/162 = σ2/16 = 1/16
![Page 43: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/43.jpg)
Histogram of 1000 samples of the average of 16 N(0,1) RVs
Sum
Frequency
-1.5 -1.0 -0.5 0.0 0.5 1.0 1.5
050
100
150
Sample Mean
Red = N(0,1/16) density
Freq
uenc
y
![Page 44: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/44.jpg)
44
Hat Trick 2 (cont.)
0 1 2 3 4 5 6 7 8 9 10
0.0
0.1
0.2
0.3
0.4
0.5
A Hat Trick
x
density
X XX X
Note 2:
What would you do if some of the slips you pulled had coffee spilled on them, obscuring color?
If they were half way between means of the others? If they were on opposite sides of the means of the others
![Page 45: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/45.jpg)
-3 -2 -1 0 1 2 3
µ ± σ
μ
1
Previously: How to estimate μ given data
45
X X XX X XXX X
Observed Data
For this problem, we got a nice, closed form, solution, allowing calculation of the μ, σ that maximize the likelihood of the
observed data.
We’re not always so lucky...
![Page 46: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/46.jpg)
This?
Or this?
(A modeling decision, not a math problem..., but if the later, what math?)
46
More Complex Example
![Page 47: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/47.jpg)
A Living Histogram
47
Text
http://mindprod.com/jgloss/histogram.html
male and female genetics students, University of Connecticut in 1996
![Page 48: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/48.jpg)
Another Real Example:CpG content of human gene promoters
“A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters” Saxonov, Berg, and Brutlag, PNAS 2006;103:1412-1417
©2006 by National Academy of Sciences48
![Page 49: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/49.jpg)
49
No closed-formmax
Parameters
means µ1 µ2
variances 21 2
2
mixing parameters 1 2 = 1 1
P.D.F. f(x|µ1,21) f(x|µ2,2
2)
Likelihood
L(x1, x2, . . . , xn|µ1, µ2,21 ,2
2 , 1, 2)
=n
i=1
2j=1 jf(xi|µj ,2
j )
Gaussian Mixture Models / Model-based Clustering
separately
together
![Page 50: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/50.jpg)
50
-20
-10
0
10
20 -20
-10
0
10
20
0
0.05
0.1
0.15
-20
-10
0
10
20
Likelihood Surface
μ1
μ2
![Page 51: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/51.jpg)
51
-20
-10
0
10
20 -20
-10
0
10
20
0
0.05
0.1
0.15
-20
-10
0
10
20
σ2 = 1.0τ1 = .5τ2 = .5
xi =−10.2, −10, −9.8−0.2, 0, 0.211.8, 12, 12.2
(-5,12)
(-10,6)
(6,-10)
(12,-5)
μ1
μ2
![Page 52: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/52.jpg)
52
Messy: no closed form solution known for finding θ maximizing L
But what if we knew the hidden data?
A What-If Puzzle
![Page 53: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/53.jpg)
53
EM as Egg vs ChickenIF parameters θ known, could estimate zij
E.g., |xi - µ1|/σ1 ≫ |xi - µ2|/σ2 ⇒ P[zi1=1] ≪ P[zi2=1]
IF zij known, could estimate parameters θ E.g., only points in cluster 2 influence µ2, σ2
But we know neither; (optimistically) iterate: E-step: calculate expected zij, given parameters
M-step: calculate “MLE” of parameters, given E(zij)
Overall, a clever “hill-climbing” strategy
Hat
Trick
1
Hat
Trick
2
Hat
Trick
1
Hat
Trick
2
![Page 54: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/54.jpg)
54
Simple Version: “Classification EM”
If E[zij] < .5, pretend zij = 0; E[zij] > .5, pretend it’s 1
I.e., classify points as component 1 or 2Now recalc θ, assuming that partition (standard MLE)Then recalc E[zij], assuming that θThen re-recalc θ, assuming new E[zij], etc., etc. “Full EM” is slightly more involved, (to account for uncertainty in classification) but this is the crux.
Not “E
M,” bu
t may
help
clarif
y con
cept
s
“K-meansclustering,”essentially
Not wha
t’s ne
eded
for
homew
ork,
but m
ay
help
clarif
y con
cept
s
Another contrast: HMM parameter estimation via “Viterbi” vs “Baum-Welch” training. In both, “hidden data” is “which state was it in at each step?” Viterbi is like E-step in classification EM: it makes a single state prediction. B-W is full EM: it captures the uncertainty in state prediction, too. For either, M-step maximizes HMM emission/transition probabilities, assuming those fixed states (Viterbi) / uncertain states (B-W).
![Page 55: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/55.jpg)
55
Full EM
![Page 56: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/56.jpg)
56
The E-step: Find E(zij), i.e., P(zij=1)
Assume θ known & fixedA (B): the event that xi was drawn from f1 (f2)D: the observed datum xi
Expected value of zi1 is P(A|D)
Repeat for
each xi
E = 0 · P (0) + 1 · P (1)
Note: denominator = sum of numerators - i.e. that which normalizes sum to 1 (typical Bayes)
E[zi1] =
![Page 57: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/57.jpg)
57
0 1 2 3 4 5 6 7 8 9 10
0.0
0.1
0.2
0.3
0.4
0.5
A Hat Trick
x
density
X X
Let “X 6” be a shorthand for 6.001 /2 < X < 6.001 + /2
P (µ = 7|X = 6) = lim
!0P (µ = 7|X 6)
P (µ = 7|X 6) =
P (X 6|µ = 7)P (µ = 7)
P (X 6)
=
0.5P (X 6|µ = 7)
0.5P (X 6|µ = 3) + 0.5P (X 6|µ = 7)
f(X = 6|µ = 7)
f(X = 6|µ = 3) + f(X = 6)|µ = 7), so
P (µ = 7|X = 6) =
f(X = 6|µ = 7)
f(X = 6|µ = 3) + f(X = 6)|µ = 7)
0.982
f = normal density
3σ σRecall
![Page 58: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/58.jpg)
58
Complete Data Likelihood
(Better):
equal, if zij are 0/1
![Page 59: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/59.jpg)
59
M-step: Find θ maximizing E(log(Likelihood))
wrt dist of zij
![Page 60: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/60.jpg)
60
Hat Trick 2 (cont.)
0 1 2 3 4 5 6 7 8 9 10
0.0
0.1
0.2
0.3
0.4
0.5
A Hat Trick
x
density
X XX X
Note 2: red/blue separation is just like the M-step of EM if values of the hidden variables (zij) were known.
What if they’re not? E.g., what would you do if some of the slips you pulled had coffee spilled on them, obscuring color?
If they were half way between means of the others? If they were on opposite sides of the means of the others
Recall
![Page 61: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/61.jpg)
M-step:calculating mu’s
row sum avg
E[zi1] 0.99 0.98 0.7 0.2 0.03 0.01 2.91E[zi2] 0.01 0.02 0.3 0.8 0.97 0.99 3.09
xi 9 10 11 19 20 21 90 15E[zi1]xi 8.9 9.8 7.7 3.8 0.6 0.2 31.0 10.66E[zi1]xi 0.1 0.2 3.3 15.2 19.4 20.8 59.0 19.09 ne
w μ
’s
old
E’s
In words: μj is the average of the observed xi’s, weighted by the probability that xi was sampled from component j.
61
![Page 62: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/62.jpg)
62
2 Component Mixtureσ1 = σ2 = 1; τ = 0.5
Essentially converged in 2 iterations
(Excel spreadsheet on course web)
![Page 63: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/63.jpg)
63
EM Summary
Fundamentally a maximum likelihood parameter estimation problem; broader than just Gaussian
Useful if 0/1 hidden data, and if analysis would be more tractable if hidden data z were known
Iterate: E-step: estimate E(z) for each z, given θM-step: estimate θ maximizing E[log likelihood] given E[z] [where “E[logL]” is wrt random z ~ E[z] = p(z=1)]
Baye
s
MLE
![Page 64: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/64.jpg)
64
EM IssuesUnder mild assumptions (DEKM sect 11.6), EM is guaranteed to increase likelihood with every E-M iteration, hence will converge.
But it may converge to a local, not global, max. (Recall the 4-bump surface...)
Issue is intrinsic (probably), since EM is often applied to NP-hard problems (including clustering, above and motif-discovery, soon)
Nevertheless, widely used, often effective
![Page 65: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/65.jpg)
Applications
65
Clustering is a remarkably successful exploratory data analysis tool
Web-search, information retrieval, gene-expression, ...
Model-based approach above is one of the leading ways to do it
Gaussian mixture models widely usedWith many components, empirically match arbitrary distribution
Often well-justified, due to “hidden parameters” driving the visible data
EM is extremely widely used for “hidden-data” problemsHidden Markov Models – speech recognition, DNA analysis, ...
![Page 66: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/66.jpg)
27!
Given: 104 unlabeled, scanned images of handwritten digits, say 25 x 25 pixels,
Goal: automatically classify new examples
Possible Method:
Each image is a point in ℝ625; the “ideal” 7, say, is one such point; model other 7’s as a Gaussian cloud around it
Do EM, as above, but 10 components in 625 dimensions instead of 2 components in 1 dimension
“Recognize” a new digit by best fit to those 10 models, i.e., basically max E-step probability
A “Machine Learning” ExampleHandwritten Digit Recognition
66
![Page 67: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/67.jpg)
67
Relative entropy
![Page 68: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/68.jpg)
68
• AKA Kullback-Liebler Distance/Divergence, AKA Information Content
• Given distributions P, Q
Notes:
Relative Entropy
H(P ||Q) =!
x∈Ω
P (x) logP (x)Q(x)
Undefined if 0 = Q(x) < P (x)
Let P (x) logP (x)Q(x)
= 0 if P (x) = 0 [since limy→0
y log y = 0]
![Page 69: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/69.jpg)
69
• Intuition: A quantitative measure of how much P “diverges” from Q. (Think “distance,” but note it’s not symmetric.)• If P ≈ Q everywhere, then log(P/Q) ≈ 0, so H(P||Q) ≈ 0• But as they differ more, sum is pulled above 0 (next 2 slides)
• What it means quantitatively: Suppose you sample x, but aren’t sure whether you’re sampling from P (call it the “null model”) or from Q (the “alternate model”). Then log(P(x)/Q(x)) is the log likelihood ratio of the two models given that datum. H(P||Q) is the expected per sample contribution to the log likelihood ratio for discriminating between those two models.
• Exercise: if H(P||Q) = 0.1, say. Assuming Q is the correct model, how many samples would you need to confidently (say, with 1000:1 odds) reject P?
Relative EntropyH(P ||Q) =
!
x∈Ω
P (x) logP (x)Q(x)
![Page 70: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/70.jpg)
70
lnx ≤ x − 1
− lnx ≥ 1 − xln(1/x) ≥ 1 − x
lnx ≥ 1 − 1/x
0.5 1 1.5 2 2.5
-2
-1
1
(y = 1/x)
y y
![Page 71: Spring 2016 4. Maximum Likelihood Estimation and the E-M ... · Spring 2016 4. Maximum Likelihood Estimation ... 1 Homo sapiens (Human) MYOD1_HUMAN Myoblast determination protein](https://reader033.fdocuments.us/reader033/viewer/2022042916/5f57898eb124255fa00db71c/html5/thumbnails/71.jpg)
71
Theorem: H(P ||Q) ≥ 0
Furthermore: H(P||Q) = 0 if and only if P = QBottom line: “bigger” means “more different”
H(P ||Q) =!
x P (x) log P (x)Q(x)
≥!
x P (x)"1 − Q(x)
P (x)
#
=!
x(P (x) − Q(x))
=!
x P (x) −!
x Q(x)
= 1 − 1
= 0
Idea: if P ≠ Q, then
P(x)>Q(x) ⇒ log(P(x)/Q(x))>0
and
P(y)<Q(y) ⇒ log(P(y)/Q(y))<0
Q: Can this pull H(P||Q) < 0? A: No, as theorem shows. Intuitive reason: sum is weighted by P(x), which is bigger at the positive log ratios vs the negative ones.