Edinburgh MT lecture 3: Latent Variable Models and Word Alignment
![Page 1: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/1.jpg)
• Machine learning • Databases • Algorithms and systems • Statistics and optimization
Four year PhD programme Courses + PhD dissertation (No previous MSc required)
http://datascience.inf.ed.ac.uk/
Discovering new patterns and knowledge from data
• Big data • Natural language processing • Computer vision • Speech processing
at the University of Edinburgh
![Page 2: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/2.jpg)
The biggest revolution in the technological landscape for fifty years
Now accepting applications! Find out more and apply at:
pervasiveparallelism.inf.ed.ac.uk
• 4-year programme: MSc by Research + PhD
• Collaboration between: ▶ University of Edinburgh’s School of Informatics ✴ Ranked top in the UK by 2014 REF
▶ Edinburgh Parallel Computing Centre ✴ UK’s largest supercomputing centre
• Full funding available
• Industrial engagement programme includes internships at leading companies
• Research-focused: Work on your thesis topic from the start
• Research topics in software, hardware, theory and
application of: ▶ Parallelism ▶ Concurrency ▶ Distribution
![Page 3: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/3.jpg)
Latent Variable Models and Word Alignment
INFR11062, spring 2015, lecture 3
![Page 4: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/4.jpg)
Goal
•Write down a model over sentence pairs.
•Learn an instance of the model from data.
•Use it to predict translations of new sentences.
![Page 5: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/5.jpg)
![Page 6: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/6.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .
IBM Model 1
![Page 7: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/7.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .
IBM Model 1
Let's write a simple model in terms of word-to-word alignments: $p(\mathbf{f}, \mathbf{a} \mid \mathbf{e})$
![Page 8: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/8.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
![Page 9: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/9.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
![Page 10: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/10.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
![Page 11: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/11.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
p(English length|Chinese length)
![Page 12: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/12.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
p(English length|Chinese length)
![Page 13: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/13.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
![Page 14: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/14.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
p(Chinese word position)
![Page 15: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/15.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
![Page 16: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/16.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
However
![Page 17: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/17.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
However
p(English word|Chinese word)
![Page 18: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/18.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
However
![Page 19: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/19.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
However
![Page 20: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/20.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
However ,
![Page 21: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/21.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
However ,
![Page 22: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/22.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
However , the
![Page 23: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/23.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
IBM Model 1
However , the sky remained clear under the strong north wind .
![Page 24: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/24.jpg)
IBM Model 1
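$$p(\mathbf{f}, \mathbf{a} \mid \mathbf{e}) = p(I \mid J) \prod_{i=1}^{I} p(a_i \mid J) \cdot p(f_i \mid e_{a_i})$$
• $p(I \mid J)$: French, English sentence lengths
• $p(a_i \mid J)$: alignment of French word at position i
• $p(f_i \mid e_{a_i})$: French, English word pair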
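As a rough illustration of the formula on this slide, the sketch below scores a single alignment of a toy sentence pair. The function name `model1_joint`, the table `t`, and all probability values are made up for the example. The alignment term is assumed to be uniform, p(a_i | J) = 1/J, and the length term p(I | J) is simply passed in as a number. Following the running example, `e` is the Chinese sentence being conditioned on and `f` is the English sentence being generated.

```python
from math import prod

def model1_joint(f, a, e, t, p_len=1.0):
    """p(f, a | e) = p(I|J) * prod_i p(a_i|J) * p(f_i | e_{a_i}).

    f: generated words, e: conditioning words, a[i]: index into e for f[i].
    t[(f_word, e_word)] holds p(f_word | e_word). Assumes p(a_i|J) = 1/J.
    """
    I, J = len(f), len(e)
    assert len(a) == I, "one alignment link per generated word"
    return p_len * prod((1.0 / J) * t[(f[i], e[a[i]])] for i in range(I))

# Hypothetical translation table; each conditioning word's row sums to 1.
t = {
    ("north", "北"): 0.8, ("wind", "北"): 0.2,
    ("north", "风"): 0.1, ("wind", "风"): 0.9,
}
e = ["北", "风"]
f = ["north", "wind"]

p_monotone = model1_joint(f, [0, 1], e, t)  # north-北, wind-风
p_swapped = model1_joint(f, [1, 0], e, t)   # north-风, wind-北
```

With these made-up numbers the monotone alignment scores much higher than the swapped one, which is the behaviour a good translation table should produce.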
![Page 25: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/25.jpg)
IBM Model 1
p(despite | 虽然)
p(however | 虽然)
p(although | 虽然)
p(northern | 北)
p(north | 北)
![Page 26: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/26.jpg)
IBM Model 1
p(despite | 虽然) = ???
p(however | 虽然) = ???
p(although | 虽然) = ???
p(northern | 北) = ???
p(north | 北) = ???
![Page 27: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/27.jpg)
IBM Model 1
p(despite | 虽然) = ???
p(however | 虽然) = ???
p(although | 虽然) = ???
p(northern | 北) = ???
p(north | 北) = ???
θ: the set of these unknown values (the parameters)
![Page 28: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/28.jpg)
• Optimization
![Page 29: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/29.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .
MLE for IBM Model 1 (observed)
![Page 30: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/30.jpg)
MLE for IBM Model 1 (observed)
$$\hat{\theta} = \arg\max_{\theta}\; p(\mathbf{f}, \mathbf{a} \mid \mathbf{e})$$
![Page 31: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/31.jpg)
MLE for IBM Model 1 (observed)
$$\hat{\theta} = \arg\max_{\theta} \prod_{n=1}^{N} \left( p(I^{(n)} \mid J^{(n)}) \prod_{i=1}^{I^{(n)}} p(a_i^{(n)} \mid J^{(n)}) \cdot p(f_i^{(n)} \mid e^{(n)}_{a_i}) \right)$$
![Page 32: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/32.jpg)
MLE for IBM Model 1 (observed)
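$$\hat{\theta} = \arg\max_{\theta} \prod_{n=1}^{N} \left( p(I^{(n)} \mid J^{(n)}) \prod_{i=1}^{I^{(n)}} p(a_i^{(n)} \mid J^{(n)}) \cdot p(f_i^{(n)} \mid e^{(n)}_{a_i}) \right)$$
• $N$: number of sentences
• $p(I^{(n)} \mid J^{(n)})$: French, English sentence lengths
• $p(a_i^{(n)} \mid J^{(n)})$: alignment of French word at position i
• $p(f_i^{(n)} \mid e^{(n)}_{a_i})$: French, English word pair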
![Page 33: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/33.jpg)
MLE for IBM Model 1 (observed)
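$$\hat{\theta} = \arg\max_{\theta} \prod_{n=1}^{N} \left( p(I^{(n)} \mid J^{(n)}) \prod_{i=1}^{I^{(n)}} p(a_i^{(n)} \mid J^{(n)}) \cdot p(f_i^{(n)} \mid e^{(n)}_{a_i}) \right)$$
$p(I^{(n)} \mid J^{(n)})$ and $p(a_i^{(n)} \mid J^{(n)})$: constant (w.r.t. words)!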
![Page 34: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/34.jpg)
MLE for IBM Model 1 (observed)
$$\hat{\theta} = \arg\max_{\theta}\; C \prod_{n=1}^{N} \prod_{i=1}^{I^{(n)}} p(f_i^{(n)} \mid e^{(n)}_{a_i})$$
![Page 35: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/35.jpg)
MLE for IBM Model 1 (observed)
$$\hat{\theta} = \arg\max_{\theta}\; \log \left( C \prod_{n=1}^{N} \prod_{i=1}^{I^{(n)}} p(f_i^{(n)} \mid e^{(n)}_{a_i}) \right)$$
$\log(a) < \log(b) \iff a < b$
![Page 36: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/36.jpg)
MLE for IBM Model 1 (observed)
$$\hat{\theta} = \arg\max_{\theta}\; \log \left( C \cdot \prod_{f,e} p(f \mid e)^{\mathrm{count}(\langle f, e \rangle)} \right)$$
![Page 37: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/37.jpg)
MLE for IBM Model 1 (observed)
$$\hat{\theta} = \arg\max_{\theta}\; \log C + \sum_{f,e} \mathrm{count}(\langle f, e \rangle) \log p(f \mid e)$$
log of product = sum of logs
![Page 38: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/38.jpg)
MLE for IBM Model 1 (observed)
$$\Lambda(\theta, \lambda) = \log C + \sum_{f,e} \mathrm{count}(\langle f, e \rangle) \log p(f \mid e) - \sum_{e} \lambda_e \left( \sum_{f} p(f \mid e) - 1 \right)$$
Lagrange multiplier expresses normalization constraint
![Page 39: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/39.jpg)
MLE for IBM Model 1 (observed)
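$$\Lambda(\theta, \lambda) = \log C + \sum_{f,e} \mathrm{count}(\langle f, e \rangle) \log p(f \mid e) - \sum_{e} \lambda_e \left( \sum_{f} p(f \mid e) - 1 \right)$$
derivative:
$$\frac{\partial \Lambda(\theta, \lambda)}{\partial p(f \mid e)} = \frac{\mathrm{count}(\langle f, e \rangle)}{p(f \mid e)} - \lambda_e$$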
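Setting this derivative to zero and applying the normalization constraint gives the relative-frequency estimate. The remaining steps are not spelled out on the slide; written with the same symbols, they would be:

```latex
\begin{align*}
\frac{\mathrm{count}(\langle f, e \rangle)}{p(f \mid e)} - \lambda_e = 0
  &\;\Longrightarrow\; p(f \mid e) = \frac{\mathrm{count}(\langle f, e \rangle)}{\lambda_e} \\
\sum_{f} p(f \mid e) = 1
  &\;\Longrightarrow\; \lambda_e = \sum_{f'} \mathrm{count}(\langle f', e \rangle) \\
  &\;\Longrightarrow\; p(f \mid e) = \frac{\mathrm{count}(\langle f, e \rangle)}{\sum_{f'} \mathrm{count}(\langle f', e \rangle)}
\end{align*}
```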
![Page 40: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/40.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .
MLE for IBM Model 1 (observed)
$$p(\text{however} \mid \text{虽然}) = \frac{\text{\# of times 虽然 aligns to However}}{\text{\# of times 虽然 aligns to any word}}$$
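When alignments are observed, this count ratio can be computed in one pass over the data. The sketch below assumes a hypothetical corpus format of `(f_words, e_words, alignment)` triples, where `alignment[i]` is the index in `e_words` that `f_words[i]` links to. The function name and the three toy sentence pairs are invented for illustration.

```python
from collections import Counter

def mle_from_alignments(corpus):
    """Relative-frequency estimate of p(f | e) from observed alignment links."""
    pair_counts = Counter()  # count(<f, e>)
    e_counts = Counter()     # number of links leaving each conditioning word e
    for f_words, e_words, alignment in corpus:
        for i, j in enumerate(alignment):
            pair_counts[(f_words[i], e_words[j])] += 1
            e_counts[e_words[j]] += 1
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

# Hypothetical hand-aligned toy data.
corpus = [
    (["however", "north"], ["虽然", "北"], [0, 1]),
    (["although", "north"], ["虽然", "北"], [0, 1]),
    (["however", "northern"], ["虽然", "北"], [0, 1]),
]
t = mle_from_alignments(corpus)
```

Here 虽然 is linked three times, twice to "however", so the estimate of p(however | 虽然) is 2/3.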
![Page 41: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/41.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .
MLE for IBM Model 1 (unobserved)
p(however | 虽然) = ???
![Page 42: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/42.jpg)
MLE for IBM Model 1 (observed)
$$\hat{\theta} = \arg\max_{\theta}\; \log \left( C \prod_{n=1}^{N} \prod_{i=1}^{I^{(n)}} p(f_i^{(n)} \mid e^{(n)}_{a_i}) \right)$$
![Page 43: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/43.jpg)
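MLE for IBM Model 1 (unobserved)
$$\hat{\theta} = \arg\max_{\theta}\; \log \left( C \prod_{n=1}^{N} \sum_{\mathbf{a}} \prod_{i=1}^{I^{(n)}} p(f_i^{(n)} \mid e^{(n)}_{a_i}) \right)$$
Data likelihood = marginal probability of observed data
$$p(\mathbf{f} \mid \mathbf{e}) = \sum_{\mathbf{a}} p(\mathbf{f}, \mathbf{a} \mid \mathbf{e})$$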
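For Model 1 this marginal does not require enumerating alignments one by one. Because each $a_i$ is chosen independently of the others, the sum over whole alignments can be pushed inside the product. The derivation below is a sketch that assumes every $a_i$ ranges over the positions $1, \dots, J$ (no NULL word):

```latex
\begin{align*}
p(\mathbf{f} \mid \mathbf{e})
  &= \sum_{\mathbf{a}} p(I \mid J) \prod_{i=1}^{I} p(a_i \mid J)\, p(f_i \mid e_{a_i}) \\
  &= p(I \mid J) \sum_{a_1=1}^{J} \cdots \sum_{a_I=1}^{J} \prod_{i=1}^{I} p(a_i \mid J)\, p(f_i \mid e_{a_i}) \\
  &= p(I \mid J) \prod_{i=1}^{I} \sum_{j=1}^{J} p(a_i = j \mid J)\, p(f_i \mid e_j)
\end{align*}
```

The last line costs on the order of $I \cdot J$ multiplications, whereas the first line has $J^I$ terms.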
![Page 44: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/44.jpg)
MLE for IBM Model 1 (unobserved)
$$\hat{\theta} = \arg\max_{\theta}\; \log \left( C \cdot \prod_{f,e} p(f \mid e)^{E[\mathrm{count}(\langle f, e \rangle)]} \right)$$
![Page 45: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/45.jpg)
MLE for IBM Model 1 (unobserved)
$$\hat{\theta} = \arg\max_{\theta}\; \log \left( C \cdot \prod_{f,e} p(f \mid e)^{E[\mathrm{count}(\langle f, e \rangle)]} \right)$$
Not constant! Depends on parameters, no analytic solution.
![Page 46: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/46.jpg)
MLE for IBM Model 1 (unobserved)
$$\hat{\theta} = \arg\max_{\theta}\; \log \left( C \cdot \prod_{f,e} p(f \mid e)^{E[\mathrm{count}(\langle f, e \rangle)]} \right)$$
Not constant! Depends on parameters, no analytic solution.
But it does strongly imply an iterative solution.
![Page 47: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/47.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
Likelihood Estimation for Model 1
However , the sky remained clear under the strong north wind .
p(English word|Chinese word)
unobserved!
Parameters and alignments are both unknown.
![Page 48: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/48.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
Likelihood Estimation for Model 1
However , the sky remained clear under the strong north wind .
p(English word|Chinese word)
unobserved!
If we knew the alignments, we could calculate the values of the parameters.
Parameters and alignments are both unknown.
![Page 49: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/49.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
Likelihood Estimation for Model 1
However , the sky remained clear under the strong north wind .
p(English word|Chinese word)
unobserved!
If we knew the alignments, we could calculate the values of the parameters.
If we knew the parameters, we could calculate the likelihood of the data.
Parameters and alignments are both unknown.
![Page 50: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/50.jpg)
![Page 51: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/51.jpg)
The Plan: Bootstrapping
•Arbitrarily select a set of parameters (say, uniform).
•Calculate expected counts of the unseen events.
•Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.
•Iterate.
![Page 52: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/52.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
The Plan: Bootstrapping
However , the sky remained clear under the strong north wind .
![Page 53: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/53.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
The Plan: Bootstrapping
However , the sky remained clear under the strong north wind .
If we had observed the alignment, this line would either be here (count 1) or it wouldn't (count 0).
![Page 54: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/54.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
The Plan: Bootstrapping
However , the sky remained clear under the strong north wind .
If we had observed the alignment, this line would either be here (count 1) or it wouldn't (count 0).
Since we didn't observe the alignment, we calculate the probability that it's there.
![Page 55: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/55.jpg)
Remember: Probabilistic Primer
(Figure: 6 × 6 grid of joint outcomes, each cell with probability 1/36.)
$$p(Y = 1) = \sum_{x \in X} p(X = x, Y = 1) = \frac{1}{6}$$
marginal probability
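The same marginalization can be checked numerically. The sketch below assumes the 6 × 6 grid represents two fair six-sided dice, X and Y, with every joint outcome having probability 1/36; that reading of the figure is an assumption, and the variable names are made up.

```python
from fractions import Fraction
from itertools import product

# Joint distribution p(X = x, Y = y): uniform over the 36 outcomes (assumed).
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Marginal p(Y = 1): sum the joint over every value of X.
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
```

Summing the six cells with Y = 1 gives 6/36, which `Fraction` reduces to 1/6.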
![Page 56: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/56.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
However , the sky remained clear under the strong north wind .
p(alignment 1) + p(alignment 2) + p(alignment 3)
(Figure: three different alignments of this sentence pair, one inside each p( ).)
Marginalize: sum all alignments containing the link
![Page 57: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/57.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
However , the sky remained clear under the strong north wind .
p(alignment 1) + p(alignment 2) + p(alignment 3)
(Figure: three different alignments of this sentence pair, one inside each p( ).)
Divide by sum of all possible alignments
![Page 58: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/58.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
However , the sky remained clear under the strong north wind .
p(alignment 1) + p(alignment 2) + p(alignment 3)
(Figure: three different alignments of this sentence pair, one inside each p( ).)
Divide by sum of all possible alignments
Is this hard? How many alignments are there?
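A quick count suggests why brute-force enumeration looks hard. The sketch below assumes each English token aligns to exactly one Chinese token and that there is no NULL word, so the number of alignments is J to the power I. The tokenization is simply whitespace splitting of the example sentences.

```python
english = "However , the sky remained clear under the strong north wind .".split()
chinese = "虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。".split()

I = len(english)  # number of generated (English) tokens
J = len(chinese)  # number of conditioning (Chinese) tokens

# Each of the I English tokens independently picks one of J Chinese positions.
num_alignments = J ** I
```

For this one sentence pair that is 11 to the power 12, roughly 3.1 × 10^12 alignments.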
![Page 59: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/59.jpg)
Computing Expectations
probability of an alignment.
$$p(F, A \mid E) = p(I \mid J) \prod_{a_i} p(a_i = j)\, p(f_i \mid e_j)$$
![Page 60: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/60.jpg)
Computing Expectations
probability of an alignment.
$$p(F, A \mid E) = p(I \mid J) \prod_{a_i} p(a_i = j)\, p(f_i \mid e_j)$$
$p(I \mid J)$: observed; $p(a_i = j)$: uniform
![Page 61: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/61.jpg)
Computing Expectations
probability of an alignment.
$$p(F, A \mid E) = p(I \mid J) \prod_{a_i} p(a_i = j)\, p(f_i \mid e_j)$$
$p(I \mid J)$: observed; $p(a_i = j)$: uniform
factors across words.
![Page 62: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/62.jpg)
Computing Expectations
$$\sum_{a \in A:\ \text{北} \leftrightarrow \text{north}} p(\text{north} \mid \text{北}) \cdot p(\text{rest of } a)$$
marginal probability of alignments containing link
![Page 63: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/63.jpg)
Computing Expectations
$$p(\text{north} \mid \text{北}) \cdot \sum_{a \in A:\ \text{北} \leftrightarrow \text{north}} p(\text{rest of } a)$$
marginal probability of alignments containing link
![Page 64: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/64.jpg)
Computing Expectations
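$$\frac{p(\text{north} \mid \text{北}) \cdot \sum_{a \in A:\ \text{北} \leftrightarrow \text{north}} p(\text{rest of } a)}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c) \cdot \sum_{a \in A:\ c \leftrightarrow \text{north}} p(\text{rest of } a)}$$
Numerator: marginal probability of alignments containing link
Denominator: marginal probability of all alignments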
![Page 65: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/65.jpg)
Computing Expectations
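$$\frac{p(\text{north} \mid \text{北}) \cdot \sum_{a \in A:\ \text{北} \leftrightarrow \text{north}} p(\text{rest of } a)}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c) \cdot \sum_{a \in A:\ c \leftrightarrow \text{north}} p(\text{rest of } a)}$$
Numerator: marginal probability of alignments containing link
Denominator: marginal probability of all alignments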
![Page 66: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/66.jpg)
Computing Expectations
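$$\frac{p(\text{north} \mid \text{北}) \cdot \sum_{a \in A:\ \text{北} \leftrightarrow \text{north}} p(\text{rest of } a)}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c) \cdot \sum_{a \in A:\ c \leftrightarrow \text{north}} p(\text{rest of } a)}$$
Numerator: marginal probability of alignments containing link
Denominator: marginal probability of all alignments
The sums over p(rest of a) in the numerator and denominator are identical!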
![Page 67: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/67.jpg)
Computing Expectations
$$\frac{p(\text{north} \mid \text{北})}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c)}$$
![Page 68: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/68.jpg)
Computing Expectations
$$\frac{p(\text{north} \mid \text{北})}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c)}$$
marginal probability (expected count) of an alignment containing the link
![Page 69: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/69.jpg)
Computing Expectations
$$\frac{p(\text{north} \mid \text{北})}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c)}$$
marginal probability (expected count) of an alignment containing the link
For each sentence, use this quantity instead of 0 or 1
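The cancellation can be checked on a toy case. The sketch below computes the posterior probability of one link in two ways: by enumerating every alignment, and by the per-word ratio from this slide. The function names, the translation table `t`, and its values are made up. The uniform alignment prior and p(I | J) are left out because they appear in both numerator and denominator and cancel.

```python
from itertools import product
from math import isclose, prod

def link_posterior_bruteforce(f, e, t, i, j):
    """P(a_i = j | f, e) by summing over all J**I alignments."""
    I, J = len(f), len(e)
    num = den = 0.0
    for a in product(range(J), repeat=I):
        p = prod(t[(f[k], e[a[k]])] for k in range(I))
        den += p          # every alignment contributes to the denominator
        if a[i] == j:
            num += p      # only alignments containing the link count here
    return num / den

def link_posterior_closed_form(f, e, t, i, j):
    """Same quantity via p(f_i | e_j) / sum_c p(f_i | c)."""
    return t[(f[i], e[j])] / sum(t[(f[i], c)] for c in e)

# Hypothetical table: t[(english, chinese)] = p(english | chinese).
t = {
    ("north", "虽然"): 0.5, ("wind", "虽然"): 0.5,
    ("north", "北"): 0.9, ("wind", "北"): 0.1,
    ("north", "风"): 0.2, ("wind", "风"): 0.8,
}
e = ["虽然", "北", "风"]
f = ["north", "wind"]

brute = link_posterior_bruteforce(f, e, t, i=0, j=1)
closed = link_posterior_closed_form(f, e, t, i=0, j=1)
```

Both routes give 0.9 / 1.6 = 0.5625 for the north-北 link, but the closed form touches only J table entries instead of J to the power I alignments.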
![Page 70: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/70.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .
Maximum Likelihood
$$p(\text{however} \mid \text{虽然}) = \frac{\text{\# of times 虽然 aligns to However}}{\text{\# of times 虽然 aligns to any word}}$$
![Page 71: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/71.jpg)
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .
Expectation Maximization
$$p(\text{however} \mid \text{虽然}) = \frac{\text{Expected \# of times 虽然 aligns to However}}{\text{Expected \# of times 虽然 aligns to any word}}$$
![Page 72: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/72.jpg)
Expectation Maximization
•Arbitrarily select a set of parameters (say, uniform).
•Calculate expected counts of the unseen events.
•Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.
•Iterate. (Until when?)
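The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's reference implementation: the function name `train_model1`, the corpus format, and the three-sentence toy corpus are assumptions, and the loop simply runs a fixed number of iterations rather than answering "until when?" with a convergence test. In each pair, the first list is the generated sentence `f` and the second is the conditioning sentence `e`.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """EM for the word-translation table t[(f, e)], an estimate of p(f | e)."""
    f_vocab = {fw for f, _ in corpus for fw in f}
    # Step 1: arbitrary starting point, here uniform over the f vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count(<f, e>)
        total = defaultdict(float)  # expected number of links from e
        # Step 2 (E-step): expected counts of the unobserved links.
        for f, e in corpus:
            for fw in f:
                z = sum(t[(fw, ew)] for ew in e)
                for ew in e:
                    c = t[(fw, ew)] / z  # posterior that fw links to ew
                    count[(fw, ew)] += c
                    total[ew] += c
        # Step 3 (M-step): relative frequencies of the expected counts.
        t = defaultdict(float, {(fw, ew): c / total[ew]
                                for (fw, ew), c in count.items()})
    return t

# Hypothetical toy corpus.
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_model1(corpus, iterations=10)
```

On this toy data the probability mass for "das" shifts toward "the" as the iterations proceed, even though no alignment was ever observed. A practical stopping rule might monitor the change in data log-likelihood between iterations.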
![Page 73: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/73.jpg)
Textbook example
![Page 74: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/74.jpg)
Textbook example
![Page 75: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/75.jpg)
Textbook example
![Page 76: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/76.jpg)
Textbook example
![Page 77: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/77.jpg)
Textbook example
![Page 78: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/78.jpg)
Expectation Maximization
$$\frac{p(\text{north} \mid \text{北})}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c)}$$
Why does this even work?
![Page 79: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/79.jpg)
Expectation Maximization
Observation 1: We are still solving a maximum likelihood estimation problem.
![Page 80: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/80.jpg)
Expectation Maximization
Observation 1: We are still solving a maximum likelihood estimation problem.
$$p(\text{Chinese} \mid \text{English}) = \sum_{\text{alignments}} p(\text{Chinese}, \text{alignment} \mid \text{English})$$
![Page 81: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/81.jpg)
Expectation Maximization
Observation 1: We are still solving a maximum likelihood estimation problem.
$$p(\text{Chinese} \mid \text{English}) = \sum_{\text{alignments}} p(\text{Chinese}, \text{alignment} \mid \text{English})$$
MLE: choose parameters that maximize this expression.
![Page 82: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/82.jpg)
Expectation Maximization
Observation 1: We are still solving a maximum likelihood estimation problem.
$$p(\text{Chinese} \mid \text{English}) = \sum_{\text{alignments}} p(\text{Chinese}, \text{alignment} \mid \text{English})$$
MLE: choose parameters that maximize this expression.
Minor problem: there is no analytic solution.
![Page 83: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/83.jpg)
(from Minka ’98)
![Page 84: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/84.jpg)
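... and, likelihood is convex for this model (strictly speaking, the log-likelihood is concave, so every local maximum is also a global maximum):
(Figure: plot with a horizontal axis running from 0 to 1.)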
![Page 85: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment](https://reader034.fdocuments.us/reader034/viewer/2022042716/55b3beedbb61ebe3088b4751/html5/thumbnails/85.jpg)
Summary
•Learning is optimization: choose parameters that optimize some function, such as likelihood.
•Supervised: maximum likelihood.
•Beware of overfitting.
•Unsupervised: expectation maximization.
•Many other objective functions and algorithms.