Introduction to Machine Translation and Phrase-Based...
Transcript of Introduction to Machine Translation and Phrase-Based...
![Page 1: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/1.jpg)
Introduction to Machine Translation andPhrase-Based Machine Translation
Ales Tamchyna
Charles University in Prague
September 8, 2015
Ales Tamchyna Introduction to MT September 8, 2015 1 / 33
![Page 2: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/2.jpg)
Other Approaches to Machine Translation
Our topic: phrase-based MT.(Some) other approaches:
Neural networks.
Wednesday 12:00: Neural Network Models in Google Translate (KeithStevens)
Deep (dependency) syntax.Thursday 9:00: Deep Syntactic MT and TectoMT (Martin Popel)
Constituency syntax.Friday 13:30: Syntax Extraction and Decoding (Hieu Hoang)
Ales Tamchyna Introduction to MT September 8, 2015 2 / 33
![Page 3: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/3.jpg)
Other Approaches to Machine Translation
Our topic: phrase-based MT.(Some) other approaches:
Neural networks.Wednesday 12:00: Neural Network Models in Google Translate (KeithStevens)
Deep (dependency) syntax.Thursday 9:00: Deep Syntactic MT and TectoMT (Martin Popel)
Constituency syntax.Friday 13:30: Syntax Extraction and Decoding (Hieu Hoang)
Ales Tamchyna Introduction to MT September 8, 2015 2 / 33
![Page 4: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/4.jpg)
Other Approaches to Machine Translation
Our topic: phrase-based MT.(Some) other approaches:
Neural networks.Wednesday 12:00: Neural Network Models in Google Translate (KeithStevens)
Deep (dependency) syntax.
Thursday 9:00: Deep Syntactic MT and TectoMT (Martin Popel)
Constituency syntax.Friday 13:30: Syntax Extraction and Decoding (Hieu Hoang)
Ales Tamchyna Introduction to MT September 8, 2015 2 / 33
![Page 5: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/5.jpg)
Other Approaches to Machine Translation
Our topic: phrase-based MT.(Some) other approaches:
Neural networks.Wednesday 12:00: Neural Network Models in Google Translate (KeithStevens)
Deep (dependency) syntax.Thursday 9:00: Deep Syntactic MT and TectoMT (Martin Popel)
Constituency syntax.Friday 13:30: Syntax Extraction and Decoding (Hieu Hoang)
Ales Tamchyna Introduction to MT September 8, 2015 2 / 33
![Page 6: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/6.jpg)
Other Approaches to Machine Translation
Our topic: phrase-based MT.(Some) other approaches:
Neural networks.Wednesday 12:00: Neural Network Models in Google Translate (KeithStevens)
Deep (dependency) syntax.Thursday 9:00: Deep Syntactic MT and TectoMT (Martin Popel)
Constituency syntax.
Friday 13:30: Syntax Extraction and Decoding (Hieu Hoang)
Ales Tamchyna Introduction to MT September 8, 2015 2 / 33
![Page 7: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/7.jpg)
Other Approaches to Machine Translation
Our topic: phrase-based MT.(Some) other approaches:
Neural networks.Wednesday 12:00: Neural Network Models in Google Translate (KeithStevens)
Deep (dependency) syntax.Thursday 9:00: Deep Syntactic MT and TectoMT (Martin Popel)
Constituency syntax.Friday 13:30: Syntax Extraction and Decoding (Hieu Hoang)
Ales Tamchyna Introduction to MT September 8, 2015 2 / 33
![Page 8: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/8.jpg)
Where to Find More?Books:
Ondrej Bojar: Cestina a strojovy preklad
Philipp Koehn: Statistical Machine Translation
Online:
http://www.statmt.org/
http://www.statmt.org/moses/
http://mttalks.ufal.cz/
Ales Tamchyna Introduction to MT September 8, 2015 3 / 33
![Page 9: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/9.jpg)
Where to Find More?Books:
Ondrej Bojar: Cestina a strojovy preklad
Philipp Koehn: Statistical Machine Translation
Online:
http://www.statmt.org/
http://www.statmt.org/moses/
http://mttalks.ufal.cz/
Ales Tamchyna Introduction to MT September 8, 2015 3 / 33
![Page 10: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/10.jpg)
Where to Find More?Books:
Ondrej Bojar: Cestina a strojovy preklad
Philipp Koehn: Statistical Machine Translation
Online:
http://www.statmt.org/
http://www.statmt.org/moses/
http://mttalks.ufal.cz/
Ales Tamchyna Introduction to MT September 8, 2015 3 / 33
![Page 11: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/11.jpg)
Where to Find More?Books:
Ondrej Bojar: Cestina a strojovy preklad
Philipp Koehn: Statistical Machine Translation
Online:
http://www.statmt.org/
http://www.statmt.org/moses/
http://mttalks.ufal.cz/
Ales Tamchyna Introduction to MT September 8, 2015 3 / 33
![Page 12: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/12.jpg)
Where to Find More?Books:
Ondrej Bojar: Cestina a strojovy preklad
Philipp Koehn: Statistical Machine Translation
Online:
http://www.statmt.org/
http://www.statmt.org/moses/
http://mttalks.ufal.cz/
Ales Tamchyna Introduction to MT September 8, 2015 3 / 33
![Page 13: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/13.jpg)
Where to Find More?Books:
Ondrej Bojar: Cestina a strojovy preklad
Philipp Koehn: Statistical Machine Translation
Online:
http://www.statmt.org/
http://www.statmt.org/moses/
http://mttalks.ufal.cz/
Ales Tamchyna Introduction to MT September 8, 2015 3 / 33
![Page 14: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/14.jpg)
Probability – Quick Refresher
P(A) ∈ [0, 1] ... Probability of event A.
E.g. what is the chance it will rain today?
P(A ∩ B) or P(A,B)... Joint probability (both A and B occur).
P(A|B) ... Probability of event A given that B occurred.Given that I see clouds (B), what is the chance it will rain today (A)?
P(A|B) =P(A ∩ B)
P(B)
Bayes’ Theorem (inverse probability):
P(B|A) =P(A|B)P(B)
P(A)
Ales Tamchyna Introduction to MT September 8, 2015 4 / 33
![Page 15: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/15.jpg)
Probability – Quick Refresher
P(A) ∈ [0, 1] ... Probability of event A.E.g. what is the chance it will rain today?
P(A ∩ B) or P(A,B)... Joint probability (both A and B occur).
P(A|B) ... Probability of event A given that B occurred.Given that I see clouds (B), what is the chance it will rain today (A)?
P(A|B) =P(A ∩ B)
P(B)
Bayes’ Theorem (inverse probability):
P(B|A) =P(A|B)P(B)
P(A)
Ales Tamchyna Introduction to MT September 8, 2015 4 / 33
![Page 16: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/16.jpg)
Probability – Quick Refresher
P(A) ∈ [0, 1] ... Probability of event A.E.g. what is the chance it will rain today?
P(A ∩ B) or P(A,B)... Joint probability (both A and B occur).
P(A|B) ... Probability of event A given that B occurred.Given that I see clouds (B), what is the chance it will rain today (A)?
P(A|B) =P(A ∩ B)
P(B)
Bayes’ Theorem (inverse probability):
P(B|A) =P(A|B)P(B)
P(A)
Ales Tamchyna Introduction to MT September 8, 2015 4 / 33
![Page 17: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/17.jpg)
Probability – Quick Refresher
P(A) ∈ [0, 1] ... Probability of event A.E.g. what is the chance it will rain today?
P(A ∩ B) or P(A,B)... Joint probability (both A and B occur).
P(A|B) ... Probability of event A given that B occurred.
Given that I see clouds (B), what is the chance it will rain today (A)?
P(A|B) =P(A ∩ B)
P(B)
Bayes’ Theorem (inverse probability):
P(B|A) =P(A|B)P(B)
P(A)
Ales Tamchyna Introduction to MT September 8, 2015 4 / 33
![Page 18: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/18.jpg)
Probability – Quick Refresher
P(A) ∈ [0, 1] ... Probability of event A.E.g. what is the chance it will rain today?
P(A ∩ B) or P(A,B)... Joint probability (both A and B occur).
P(A|B) ... Probability of event A given that B occurred.Given that I see clouds (B), what is the chance it will rain today (A)?
P(A|B) =P(A ∩ B)
P(B)
Bayes’ Theorem (inverse probability):
P(B|A) =P(A|B)P(B)
P(A)
Ales Tamchyna Introduction to MT September 8, 2015 4 / 33
![Page 19: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/19.jpg)
Probability – Quick Refresher
P(A) ∈ [0, 1] ... Probability of event A.E.g. what is the chance it will rain today?
P(A ∩ B) or P(A,B)... Joint probability (both A and B occur).
P(A|B) ... Probability of event A given that B occurred.Given that I see clouds (B), what is the chance it will rain today (A)?
P(A|B) =P(A ∩ B)
P(B)
Bayes’ Theorem (inverse probability):
P(B|A) =P(A|B)P(B)
P(A)
Ales Tamchyna Introduction to MT September 8, 2015 4 / 33
![Page 20: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/20.jpg)
Probability – Quick Refresher
P(A) ∈ [0, 1] ... Probability of event A.E.g. what is the chance it will rain today?
P(A ∩ B) or P(A,B)... Joint probability (both A and B occur).
P(A|B) ... Probability of event A given that B occurred.Given that I see clouds (B), what is the chance it will rain today (A)?
P(A|B) =P(A ∩ B)
P(B)
Bayes’ Theorem (inverse probability):
P(B|A) =P(A|B)P(B)
P(A)
Ales Tamchyna Introduction to MT September 8, 2015 4 / 33
![Page 21: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/21.jpg)
Our Goal: Phrase-Based Machine Translation
Jan včera políbil Marii
John
Johny
yesterday kissed
gave a kiss to
Mary
gave Mary a kiss
John gave Mary a kiss yesterday
Ales Tamchyna Introduction to MT September 8, 2015 5 / 33
![Page 22: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/22.jpg)
The Essential Ingredient
Ales Tamchyna Introduction to MT September 8, 2015 6 / 33
![Page 23: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/23.jpg)
Our Own Parallel Corpus
zluty
zluty
byl
ten
papousek
the
parrot
was
yellow
yellow
zluty
zluty
pes
yellow
yellow
dog
What does ”zluty” mean in English?
We used the data to infer an alignment between the words.Given the alignment, we could find the most probable translation.
Ales Tamchyna Introduction to MT September 8, 2015 7 / 33
![Page 24: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/24.jpg)
Our Own Parallel Corpus
zluty
zluty
byl
ten
papousek
the
parrot
was
yellow
yellow
zluty
zluty
pes
yellow
yellow
dog
What does ”zluty” mean in English?
We used the data to infer an alignment between the words.Given the alignment, we could find the most probable translation.
Ales Tamchyna Introduction to MT September 8, 2015 7 / 33
![Page 25: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/25.jpg)
Our Own Parallel Corpus
zluty
zluty
byl
ten
papousek
the
parrot
was
yellow
yellow
zluty
zluty
pes
yellow
yellow
dog
What does ”zluty” mean in English?
We used the data to infer an alignment between the words.Given the alignment, we could find the most probable translation.
Ales Tamchyna Introduction to MT September 8, 2015 7 / 33
![Page 26: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/26.jpg)
Our Own Parallel Corpus
zluty
zluty
byl
ten
papousek
the
parrot
was
yellow
yellow
zluty
zluty
pes
yellow
yellow
dog
What does ”zluty” mean in English?
We used the data to infer an alignment between the words.Given the alignment, we could find the most probable translation.
Ales Tamchyna Introduction to MT September 8, 2015 7 / 33
![Page 27: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/27.jpg)
Our Own Parallel Corpus
zluty
zluty
byl
ten
papousek
the
parrot
was
yellow
yellow
zluty
zluty
pes
yellow
yellow
dog
What does ”zluty” mean in English?
We used the data to infer an alignment between the words.
Given the alignment, we could find the most probable translation.
Ales Tamchyna Introduction to MT September 8, 2015 7 / 33
![Page 28: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/28.jpg)
Our Own Parallel Corpus
zluty
zluty
byl
ten
papousek
the
parrot
was
yellow
yellow
zluty
zluty
pes
yellow
yellow
dog
What does ”zluty” mean in English?
We used the data to infer an alignment between the words.Given the alignment, we could find the most probable translation.
Ales Tamchyna Introduction to MT September 8, 2015 7 / 33
![Page 29: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/29.jpg)
Estimating Translation Probability
If we had the alignment:
zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
P(”yellow”|”zluty”) =c(yellow→ zluty)
c(zluty)=
2
2= 1
P(∗|”zluty”) = 0
We will align the English words to Czech words.
Ales Tamchyna Introduction to MT September 8, 2015 8 / 33
![Page 30: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/30.jpg)
Estimating Translation Probability
If we had the alignment:
zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
P(”yellow”|”zluty”) =c(yellow→ zluty)
c(zluty)=
2
2= 1
P(∗|”zluty”) = 0
We will align the English words to Czech words.
Ales Tamchyna Introduction to MT September 8, 2015 8 / 33
![Page 31: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/31.jpg)
Estimating Translation Probability
If we had the alignment:
zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
P(”yellow”|”zluty”) =c(yellow→ zluty)
c(zluty)=
2
2= 1
P(∗|”zluty”) = 0
We will align the English words to Czech words.
Ales Tamchyna Introduction to MT September 8, 2015 8 / 33
![Page 32: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/32.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
Our approach: distribute our one alignment link among all words.
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 33: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/33.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
How to weight these ”partial” links? Use translation probability P(e|f).
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 34: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/34.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
Initially, we don’t know anything ⇒ start with uniform estimates.
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 35: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/35.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
Let’s sum up the evidence that ”yellow” aligns to ”zluty”:c(yellow→ zluty) = 1/4 + 1/2 = 3/4
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 36: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/36.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
...and the evidence that ”yellow” aligns to other words...c(yellow→ zluty) = 3/4c(yellow→ byl) = 1/4c(yellow→ ten) = 1/4c(yellow→ papousek) = 1/4c(yellow→ pes) = 1/2
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 37: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/37.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
...and do the same for the other ”partial” alignment links...c(yellow→ zluty) = 3/4, c(yellow→ byl) = 1/4, . . .c(was→ zluty) = 1/4, c(was→ byl) = 1/4, . . .
c(parrot→ zluty) = 1/4, c(parrot→ byl) = 1/4, . . .c(the→ zluty) = 1/4, c(the→ byl) = 1/4, . . .c(dog→ zluty) = 1/2, c(dog→ pes) = 1/2
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 38: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/38.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
...and do the same for the other ”partial” alignment links...c(yellow→ zluty) = 3/4, c(yellow→ byl) = 1/4, . . .c(was→ zluty) = 1/4, c(was→ byl) = 1/4, . . .c(parrot→ zluty) = 1/4, c(parrot→ byl) = 1/4, . . .
c(the→ zluty) = 1/4, c(the→ byl) = 1/4, . . .c(dog→ zluty) = 1/2, c(dog→ pes) = 1/2
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 39: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/39.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
...and do the same for the other ”partial” alignment links...c(yellow→ zluty) = 3/4, c(yellow→ byl) = 1/4, . . .c(was→ zluty) = 1/4, c(was→ byl) = 1/4, . . .c(parrot→ zluty) = 1/4, c(parrot→ byl) = 1/4, . . .c(the→ zluty) = 1/4, c(the→ byl) = 1/4, . . .
c(dog→ zluty) = 1/2, c(dog→ pes) = 1/2
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 40: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/40.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
...and do the same for the other ”partial” alignment links...c(yellow→ zluty) = 3/4, c(yellow→ byl) = 1/4, . . .c(was→ zluty) = 1/4, c(was→ byl) = 1/4, . . .c(parrot→ zluty) = 1/4, c(parrot→ byl) = 1/4, . . .c(the→ zluty) = 1/4, c(the→ byl) = 1/4, . . .c(dog→ zluty) = 1/2, c(dog→ pes) = 1/2
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 41: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/41.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/2
3/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
Normalize to get the conditional probability distributions:P(yellow|zluty) = 3/8 P(yellow|byl) = 1/4 P(yellow|pes) = 1/2P(was|zluty) = 1/8 P(was|byl) = 1/4 P(was|pes) = 0
P(parrot|zluty) = 1/8 P(parrot|byl) = 1/4 · · · P(parrot|pes) = 0P(the|zluty) = 1/8 P(the|byl) = 1/4 P(the|pes) = 0P(dog|zluty) = 2/8 P(dog|byl) = 0 P(dog|pes) = 1/2
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 42: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/42.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/2
3/9
2/9
2/9
2/9
3/7
4/7
3/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
What next? Iterate.
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 43: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/43.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/7
3/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 44: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/44.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 45: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/45.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 46: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/46.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 47: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/47.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 48: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/48.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
P(yellow|zluty) = 0.5 P(yellow|byl) = 0.206 P(yellow|pes) = 0.462P(was|zluty) = 0.094 P(was|byl) = 0.265 P(was|pes) = 0
P(parrot|zluty) = 0.094 P(parrot|byl) = 0.265 · · · P(parrot|pes) = 0P(the|zluty) = 0.094 P(the|byl) = 0.265 P(the|pes) = 0P(dog|zluty) = 0.219 P(dog|byl) = 0 P(dog|pes) = 0.538
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 49: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/49.jpg)
Estimation of IBM Model 1zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
1/4
1/4
1/4
1/4
1/2
1/2
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4
1/4 1/2
1/2
3/8
1/4
1/4
1/4
3/8
1/23/9
2/9
2/9
2/9
3/7
4/73/9
2/9
2/9
2/9
3/7
4/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7
1/7
2/7
2/7
2/7 1/3
2/3
0.447
0.184
0.184
0.184
0.520
0.480
The algorithm: expectation maximization (EM)
1 Initialize the model with uniform probabilities.
2 Apply the model to the data (expectation step).
3 Re-estimate the model from the data (maximization step).
4 Go to 2 and repeat until probabilities stop changing.
Ales Tamchyna Introduction to MT September 8, 2015 9 / 33
![Page 50: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/50.jpg)
Word-Based Models
IBM Models 1-5 (increasing model complexity)
Brown et al. (1993): The Mathematics of Statistical MachineTranslation: Parameter Estimation
Originally developed for word-based translation
Higher models account for:I word position (IBM 1 only models the lexical translation probability)I fertility (number of English words aligned to a foreign word)
Today: used for word alignment
Ales Tamchyna Introduction to MT September 8, 2015 10 / 33
![Page 51: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/51.jpg)
IBM Model 1
We treat the alignment between words as a hidden variable.
Alignment is a function; each English word (position) picks a foreigncounterpart, e.g. a(4) = 1 (”yellow” aligns to ”zluty” in the firstsentence).
IBM Model 1 only models lexical translation probability, so formally,the probability of sentence e = (e1, . . . , em) given f = (f1, . . . , fn) is:
P(e|f) =n∑
a1=0
· · ·n∑
am=0
ε
(n + 1)m
m∏j=1
t(ej |faj ) =ε
(n + 1)m
m∏j=1
n∑i=0
t(ej |fi )
EM finds such an alignment which maximizes the (log) likelihood ofour data.
Ales Tamchyna Introduction to MT September 8, 2015 11 / 33
![Page 52: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/52.jpg)
NULL Token
NULL
vidım
problem
I
see
a
problem
Ales Tamchyna Introduction to MT September 8, 2015 12 / 33
![Page 53: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/53.jpg)
NULL Token
NULL
vidım
problem
I
see
a
problem
Do we align the indefinite article to all Czech nouns?
Ales Tamchyna Introduction to MT September 8, 2015 12 / 33
![Page 54: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/54.jpg)
NULL Token
NULL
vidım
problem
I
see
a
problem
Align words which are dropped in Czech to NULL.
Ales Tamchyna Introduction to MT September 8, 2015 12 / 33
![Page 55: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/55.jpg)
From IBM Models to Word Alignment
IBM models 1 to 5 are learned in a sequence; estimated parameters of onemodel are used to initialize the next model.At the end of training, we can obtain the most likely alignment of the data.
Alignment is a function ⇒ one English word cannot align to more thanone foreign word.
psacı
stroj
typewriter vysavac vacuum
cleaner
There is no way that we can get this word alignment with our currentmodels.Solution: run the alignment in both directions (i.e., train all the modelstwice, English→Czech and Czech→English) and symmetrize thealignment.
Ales Tamchyna Introduction to MT September 8, 2015 13 / 33
![Page 56: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/56.jpg)
From IBM Models to Word Alignment
IBM models 1 to 5 are learned in a sequence; estimated parameters of onemodel are used to initialize the next model.At the end of training, we can obtain the most likely alignment of the data.Alignment is a function ⇒ one English word cannot align to more thanone foreign word.
psacı
stroj
typewriter vysavac vacuum
cleaner
There is no way that we can get this word alignment with our currentmodels.Solution: run the alignment in both directions (i.e., train all the modelstwice, English→Czech and Czech→English) and symmetrize thealignment.
Ales Tamchyna Introduction to MT September 8, 2015 13 / 33
![Page 57: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/57.jpg)
From IBM Models to Word Alignment
IBM models 1 to 5 are learned in a sequence; estimated parameters of onemodel are used to initialize the next model.At the end of training, we can obtain the most likely alignment of the data.Alignment is a function ⇒ one English word cannot align to more thanone foreign word.
psacı
stroj
typewriter vysavac vacuum
cleaner
There is no way that we can get this word alignment with our currentmodels.Solution: run the alignment in both directions (i.e., train all the modelstwice, English→Czech and Czech→English) and symmetrize thealignment.
Ales Tamchyna Introduction to MT September 8, 2015 13 / 33
![Page 58: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/58.jpg)
From IBM Models to Word Alignment
IBM models 1 to 5 are learned in a sequence; estimated parameters of onemodel are used to initialize the next model.At the end of training, we can obtain the most likely alignment of the data.Alignment is a function ⇒ one English word cannot align to more thanone foreign word.
psacı
stroj
typewriter vysavac vacuum
cleaner
There is no way that we can get this word alignment with our currentmodels.
Solution: run the alignment in both directions (i.e., train all the modelstwice, English→Czech and Czech→English) and symmetrize thealignment.
Ales Tamchyna Introduction to MT September 8, 2015 13 / 33
![Page 59: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/59.jpg)
From IBM Models to Word Alignment
IBM models 1 to 5 are learned in a sequence; estimated parameters of onemodel are used to initialize the next model.At the end of training, we can obtain the most likely alignment of the data.Alignment is a function ⇒ one English word cannot align to more thanone foreign word.
psacı
stroj
typewriter vysavac vacuum
cleaner
There is no way that we can get this word alignment with our currentmodels.Solution: run the alignment in both directions (i.e., train all the modelstwice, English→Czech and Czech→English) and symmetrize thealignment.
Ales Tamchyna Introduction to MT September 8, 2015 13 / 33
![Page 60: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/60.jpg)
Alignment Symmetrization
A heuristic procedure, several possible strategies.
Start with an intersection of the alignment links.
Gradually add links from the union which are allowed by the chosencriteria.
psacı
stroj
typewriter vysavac vacuum
cleaner
Ales Tamchyna Introduction to MT September 8, 2015 14 / 33
![Page 61: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/61.jpg)
Alignment Symmetrization
A heuristic procedure, several possible strategies.
Start with an intersection of the alignment links.
Gradually add links from the union which are allowed by the chosencriteria.
psacı
stroj
typewriter vysavac vacuum
cleaner
Ales Tamchyna Introduction to MT September 8, 2015 14 / 33
![Page 62: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/62.jpg)
Progress Check
zluty
byl
ten
papousek
the
parrot
was
yellow
zluty
pes
yellow
dog
Ales Tamchyna Introduction to MT September 8, 2015 15 / 33
![Page 63: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/63.jpg)
Progress Check
Jan včera políbil Marii
John
Johny
yesterday kissed
gave a kiss to
Mary
gave Mary a kiss
John gave Mary a kiss yesterday
Ales Tamchyna Introduction to MT September 8, 2015 15 / 33
![Page 64: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/64.jpg)
Progress Check
Jan včera políbil Marii
John
Johny
yesterday kissed
gave a kiss to
Mary
gave Mary a kiss
John gave Mary a kiss yesterday
Let’s go from words to phrases.
Ales Tamchyna Introduction to MT September 8, 2015 15 / 33
![Page 65: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/65.jpg)
Phrase Extraction
žlutý
byl
ten
papoušek
theparrotwas
yellow
Ales Tamchyna Introduction to MT September 8, 2015 16 / 33
![Page 66: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/66.jpg)
Phrase Extraction
žlutý
byl
ten
papoušek
theparrotwas
yellow
Ales Tamchyna Introduction to MT September 8, 2015 16 / 33
![Page 67: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/67.jpg)
Phrase Extraction
žlutý
byl
ten
papoušek
theparrotwas
yellow
Ales Tamchyna Introduction to MT September 8, 2015 16 / 33
![Page 68: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/68.jpg)
Phrase Extraction
žlutý
byl
ten
papoušek
theparrotwas
yellow
Ales Tamchyna Introduction to MT September 8, 2015 16 / 33
![Page 69: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/69.jpg)
Phrase Extraction
žlutý
byl
ten
papoušek
theparrotwas
yellow
Ales Tamchyna Introduction to MT September 8, 2015 16 / 33
![Page 70: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/70.jpg)
Phrase Extraction
žlutý
byl
ten
papoušek
theparrotwas
yellow
Ales Tamchyna Introduction to MT September 8, 2015 16 / 33
![Page 71: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/71.jpg)
Phrase Extraction
žlutý
byl
ten
papoušek
theparrotwas
yellow
Ales Tamchyna Introduction to MT September 8, 2015 16 / 33
![Page 72: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/72.jpg)
Building a Phrase Table (Translation Model)
Obtain some parallel data (sentence aligned).
Run word alignment (IBM models) in both directions, symmetrize.
Extract admissible phrase pairs up to a certain length (typicallyaround 7 words).
Count phrase (co-)occurrences to estimate phrase translationprobabilities:
P(e|f) =c(e&f)
c(f)
Tiny example:
zluty papousek ||| a yellow parrot ||| 0.1
zluty papousek ||| yellow parakeet ||| 0.1
zluty papousek ||| yellow parrot ||| 0.6
zluty papousek ||| yellowish parrot ||| 0.2
Ales Tamchyna Introduction to MT September 8, 2015 17 / 33
![Page 73: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/73.jpg)
Building a Phrase Table (Translation Model)
Obtain some parallel data (sentence aligned).
Run word alignment (IBM models) in both directions, symmetrize.
Extract admissible phrase pairs up to a certain length (typicallyaround 7 words).
Count phrase (co-)occurrences to estimate phrase translationprobabilities:
P(e|f) =c(e&f)
c(f)
Tiny example:
zluty papousek ||| a yellow parrot ||| 0.1
zluty papousek ||| yellow parakeet ||| 0.1
zluty papousek ||| yellow parrot ||| 0.6
zluty papousek ||| yellowish parrot ||| 0.2
Ales Tamchyna Introduction to MT September 8, 2015 17 / 33
![Page 74: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/74.jpg)
Building a Phrase Table (Translation Model)
Obtain some parallel data (sentence aligned).
Run word alignment (IBM models) in both directions, symmetrize.
Extract admissible phrase pairs up to a certain length (typicallyaround 7 words).
Count phrase (co-)occurrences to estimate phrase translationprobabilities:
P(e|f) =c(e&f)
c(f)
Tiny example:
zluty papousek ||| a yellow parrot ||| 0.1
zluty papousek ||| yellow parakeet ||| 0.1
zluty papousek ||| yellow parrot ||| 0.6
zluty papousek ||| yellowish parrot ||| 0.2
Ales Tamchyna Introduction to MT September 8, 2015 17 / 33
![Page 75: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/75.jpg)
Building a Phrase Table (Translation Model)
Obtain some parallel data (sentence aligned).
Run word alignment (IBM models) in both directions, symmetrize.
Extract admissible phrase pairs up to a certain length (typicallyaround 7 words).
Count phrase (co-)occurrences to estimate phrase translationprobabilities:
P(e|f) =c(e&f)
c(f)
Tiny example:
zluty papousek ||| a yellow parrot ||| 0.1
zluty papousek ||| yellow parakeet ||| 0.1
zluty papousek ||| yellow parrot ||| 0.6
zluty papousek ||| yellowish parrot ||| 0.2
Ales Tamchyna Introduction to MT September 8, 2015 17 / 33
![Page 76: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/76.jpg)
Building a Phrase Table (Translation Model)
Obtain some parallel data (sentence aligned).
Run word alignment (IBM models) in both directions, symmetrize.
Extract admissible phrase pairs up to a certain length (typicallyaround 7 words).
Count phrase (co-)occurrences to estimate phrase translationprobabilities:
P(e|f) =c(e&f)
c(f)
Tiny example:
zluty papousek ||| a yellow parrot ||| 0.1
zluty papousek ||| yellow parakeet ||| 0.1
zluty papousek ||| yellow parrot ||| 0.6
zluty papousek ||| yellowish parrot ||| 0.2
Ales Tamchyna Introduction to MT September 8, 2015 17 / 33
![Page 77: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/77.jpg)
Progress Check
Jan včera políbil Marii
John
Johny
yesterday kissed
gave a kiss to
Mary
gave Mary a kiss
John gave Mary a kiss yesterday
How do we decide which of these translations is best?
Ales Tamchyna Introduction to MT September 8, 2015 18 / 33
![Page 78: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/78.jpg)
Progress Check
Jan včera políbil Marii
John
Johny
yesterday kissed
gave a kiss to
Mary
gave Mary a kiss
John gave Mary a kiss yesterday
How do we decide which of these translations is best?
Ales Tamchyna Introduction to MT September 8, 2015 18 / 33
![Page 79: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/79.jpg)
The Noisy Channel ModelWarren Weaver (1955):
When I look at an article in Russian, I say: ’This is really writtenin English, but it has been coded in some strange symbols. I willnow proceed to decode.’
Transmitter ReceiverNoisy ChannelEnglish Russian
We are looking for the most probable original English sentence (which wereceived in Russian due to “noise”).
e = arg maxe
P(e|f) = arg maxe
P(f|e)P(e)
P(f)
= arg maxe
P(f|e)︸ ︷︷ ︸Translation model
P(e)︸︷︷︸Language model
Ales Tamchyna Introduction to MT September 8, 2015 19 / 33
![Page 80: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/80.jpg)
The Noisy Channel ModelWarren Weaver (1955):
When I look at an article in Russian, I say: ’This is really writtenin English, but it has been coded in some strange symbols. I willnow proceed to decode.’
Transmitter ReceiverNoisy ChannelEnglish Russian
We are looking for the most probable original English sentence (which wereceived in Russian due to “noise”).
e = arg maxe
P(e|f) = arg maxe
P(f|e)P(e)
P(f)
= arg maxe
P(f|e)︸ ︷︷ ︸Translation model
P(e)︸︷︷︸Language model
Ales Tamchyna Introduction to MT September 8, 2015 19 / 33
![Page 81: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/81.jpg)
The Noisy Channel ModelWarren Weaver (1955):
When I look at an article in Russian, I say: ’This is really writtenin English, but it has been coded in some strange symbols. I willnow proceed to decode.’
Transmitter ReceiverNoisy ChannelEnglish Russian
We are looking for the most probable original English sentence (which wereceived in Russian due to “noise”).
e = arg maxe
P(e|f) = arg maxe
P(f|e)P(e)
P(f)
= arg maxe
P(f|e)︸ ︷︷ ︸Translation model
P(e)︸︷︷︸Language model
Ales Tamchyna Introduction to MT September 8, 2015 19 / 33
![Page 82: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/82.jpg)
The Noisy Channel ModelWarren Weaver (1955):
When I look at an article in Russian, I say: ’This is really writtenin English, but it has been coded in some strange symbols. I willnow proceed to decode.’
Transmitter ReceiverNoisy ChannelEnglish Russian
We are looking for the most probable original English sentence (which wereceived in Russian due to “noise”).
e = arg maxe
P(e|f) = arg maxe
P(f|e)P(e)
P(f)
= arg maxe
P(f|e)︸ ︷︷ ︸Translation model
P(e)︸︷︷︸Language model
Ales Tamchyna Introduction to MT September 8, 2015 19 / 33
![Page 83: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/83.jpg)
Noisy Channel Model
e = arg maxe
P(f|e)P(e)
P(e) is the language model (LM).
P(f|e) depends on the application:I Automatic speech recognition: the acoustic model.I Spelling correction: the spelling error model.I Machine translation: the translation model (TM).
A useful decomposition:
I TM: How accurately does the translation match the input?(Parallel data needed for training.)
I LM: Is the translation is good (fluent) English?(Only requires monolingual data!)
So far, we only talked about half of the story.(And technically, in the wrong direction, given that we want totranslate Czech into English.)
Ales Tamchyna Introduction to MT September 8, 2015 20 / 33
![Page 84: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/84.jpg)
Noisy Channel Model
e = arg maxe
P(f|e)P(e)
P(e) is the language model (LM).
P(f|e) depends on the application:I Automatic speech recognition: the acoustic model.I Spelling correction: the spelling error model.I Machine translation: the translation model (TM).
A useful decomposition:
I TM: How accurately does the translation match the input?(Parallel data needed for training.)
I LM: Is the translation is good (fluent) English?(Only requires monolingual data!)
So far, we only talked about half of the story.(And technically, in the wrong direction, given that we want totranslate Czech into English.)
Ales Tamchyna Introduction to MT September 8, 2015 20 / 33
![Page 85: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/85.jpg)
Noisy Channel Model
e = arg maxe
P(f|e)P(e)
P(e) is the language model (LM).
P(f|e) depends on the application:I Automatic speech recognition: the acoustic model.I Spelling correction: the spelling error model.I Machine translation: the translation model (TM).
A useful decomposition:
I TM: How accurately does the translation match the input?(Parallel data needed for training.)
I LM: Is the translation is good (fluent) English?(Only requires monolingual data!)
So far, we only talked about half of the story.(And technically, in the wrong direction, given that we want totranslate Czech into English.)
Ales Tamchyna Introduction to MT September 8, 2015 20 / 33
![Page 86: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/86.jpg)
Language Model
Don’t miss the lecture on language modelling tomorrow (KennethHeafield).
The task: decide which sequences of words are good English.For any English sentence e = (e1, . . . , ele ), estimate P(e)
We need to decompose the joint probability somehow. The (usual)solution: n-gram language models.
Side note – chain rule (example for 4 variables):
P(A,B,C ,D) = P(D|A,B,C ) · P(C |A,B) · P(B|A) · P(A)
Let’s formulate the n-gram LM:
P(e) = P(e1)P(e2|e1)P(e3|e1, e2) . . .P(ele |e1, . . . , ele−1)
≈ P(e1)P(e2|e1) . . .P(ele |ele−n+1, . . . , ele−1)
Ales Tamchyna Introduction to MT September 8, 2015 21 / 33
![Page 87: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/87.jpg)
Language Model
Don’t miss the lecture on language modelling tomorrow (KennethHeafield).
The task: decide which sequences of words are good English.For any English sentence e = (e1, . . . , ele ), estimate P(e)
We need to decompose the joint probability somehow. The (usual)solution: n-gram language models.
Side note – chain rule (example for 4 variables):
P(A,B,C ,D) = P(D|A,B,C ) · P(C |A,B) · P(B|A) · P(A)
Let’s formulate the n-gram LM:
P(e) = P(e1)P(e2|e1)P(e3|e1, e2) . . .P(ele |e1, . . . , ele−1)
≈ P(e1)P(e2|e1) . . .P(ele |ele−n+1, . . . , ele−1)
Ales Tamchyna Introduction to MT September 8, 2015 21 / 33
![Page 88: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/88.jpg)
Language Model
Don’t miss the lecture on language modelling tomorrow (KennethHeafield).
The task: decide which sequences of words are good English.For any English sentence e = (e1, . . . , ele ), estimate P(e)
We need to decompose the joint probability somehow. The (usual)solution: n-gram language models.
Side note – chain rule (example for 4 variables):
P(A,B,C ,D) = P(D|A,B,C ) · P(C |A,B) · P(B|A) · P(A)
Let’s formulate the n-gram LM:
P(e) = P(e1)P(e2|e1)P(e3|e1, e2) . . .P(ele |e1, . . . , ele−1)
≈ P(e1)P(e2|e1) . . .P(ele |ele−n+1, . . . , ele−1)
Ales Tamchyna Introduction to MT September 8, 2015 21 / 33
![Page 89: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/89.jpg)
Language Model
Don’t miss the lecture on language modelling tomorrow (KennethHeafield).
The task: decide which sequences of words are good English.For any English sentence e = (e1, . . . , ele ), estimate P(e)
We need to decompose the joint probability somehow. The (usual)solution: n-gram language models.
Side note – chain rule (example for 4 variables):
P(A,B,C ,D) = P(D|A,B,C ) · P(C |A,B) · P(B|A) · P(A)
Let’s formulate the n-gram LM:
P(e) = P(e1)P(e2|e1)P(e3|e1, e2) . . .P(ele |e1, . . . , ele−1)
≈ P(e1)P(e2|e1) . . .P(ele |ele−n+1, . . . , ele−1)
Ales Tamchyna Introduction to MT September 8, 2015 21 / 33
![Page 90: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/90.jpg)
Language Model
Don’t miss the lecture on language modelling tomorrow (KennethHeafield).
The task: decide which sequences of words are good English.For any English sentence e = (e1, . . . , ele ), estimate P(e)
We need to decompose the joint probability somehow. The (usual)solution: n-gram language models.
Side note – chain rule (example for 4 variables):
P(A,B,C ,D) = P(D|A,B,C ) · P(C |A,B) · P(B|A) · P(A)
Let’s formulate the n-gram LM:
P(e) = P(e1)P(e2|e1)P(e3|e1, e2) . . .P(ele |e1, . . . , ele−1)
≈ P(e1)P(e2|e1) . . .P(ele |ele−n+1, . . . , ele−1)
Ales Tamchyna Introduction to MT September 8, 2015 21 / 33
![Page 91: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/91.jpg)
Language Model: Example
A 3-gram language model (only depend on 2 previous words).
P(”thank you very much”) = P(”thank”|”<s>”)
× P(”you”|”<s>thank”)
× P(”very”|”thank you”)
× P(”much”|”you very”)
To estimate e.g. P(”very”|”thank you”), we go through the data andcount:
How many times we saw ”thank you” followed by ”very”.
How many times we saw ”thank you” (followed by anything).
P(”very”|”thank you”) =c(”thank you very”)
c(”thank you”)
Smoothing!
Ales Tamchyna Introduction to MT September 8, 2015 22 / 33
![Page 92: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/92.jpg)
Language Model: Example
A 3-gram language model (only depend on 2 previous words).
P(”thank you very much”) = P(”thank”|”<s>”)
× P(”you”|”<s>thank”)
× P(”very”|”thank you”)
× P(”much”|”you very”)
To estimate e.g. P(”very”|”thank you”), we go through the data andcount:
How many times we saw ”thank you” followed by ”very”.
How many times we saw ”thank you” (followed by anything).
P(”very”|”thank you”) =c(”thank you very”)
c(”thank you”)
Smoothing!
Ales Tamchyna Introduction to MT September 8, 2015 22 / 33
![Page 93: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/93.jpg)
Language Model: Example
A 3-gram language model (only depend on 2 previous words).
P(”thank you very much”) = P(”thank”|”<s>”)
× P(”you”|”<s>thank”)
× P(”very”|”thank you”)
× P(”much”|”you very”)
To estimate e.g. P(”very”|”thank you”), we go through the data andcount:
How many times we saw ”thank you” followed by ”very”.
How many times we saw ”thank you” (followed by anything).
P(”very”|”thank you”) =c(”thank you very”)
c(”thank you”)
Smoothing!
Ales Tamchyna Introduction to MT September 8, 2015 22 / 33
![Page 94: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/94.jpg)
Language Model: Example
A 3-gram language model (only depend on 2 previous words).
P(”thank you very much”) = P(”thank”|”<s>”)
× P(”you”|”<s>thank”)
× P(”very”|”thank you”)
× P(”much”|”you very”)
To estimate e.g. P(”very”|”thank you”), we go through the data andcount:
How many times we saw ”thank you” followed by ”very”.
How many times we saw ”thank you” (followed by anything).
P(”very”|”thank you”) =c(”thank you very”)
c(”thank you”)
Smoothing!
Ales Tamchyna Introduction to MT September 8, 2015 22 / 33
![Page 95: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/95.jpg)
Log-Linear ModelBegin with the noisy channel model:
e = arg maxe
P(f|e)P(e)
= arg maxe
log(P(f|e)P(e))
= arg maxe
log(P(f|e)) + log(P(e))
Perhaps the importance of LM vs. TM should be weighted differently?
e = arg maxe
λTM log(P(f|e)) + λLM log(P(e))
We could add other features (besides LM and TM), so generally:
e = arg maxe
∑i
λi fi (e, f)
Ales Tamchyna Introduction to MT September 8, 2015 23 / 33
![Page 96: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/96.jpg)
Log-Linear ModelBegin with the noisy channel model:
e = arg maxe
P(f|e)P(e)
= arg maxe
log(P(f|e)P(e))
= arg maxe
log(P(f|e)) + log(P(e))
Perhaps the importance of LM vs. TM should be weighted differently?
e = arg maxe
λTM log(P(f|e)) + λLM log(P(e))
We could add other features (besides LM and TM), so generally:
e = arg maxe
∑i
λi fi (e, f)
Ales Tamchyna Introduction to MT September 8, 2015 23 / 33
![Page 97: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/97.jpg)
Log-Linear ModelBegin with the noisy channel model:
e = arg maxe
P(f|e)P(e)
= arg maxe
log(P(f|e)P(e))
= arg maxe
log(P(f|e)) + log(P(e))
Perhaps the importance of LM vs. TM should be weighted differently?
e = arg maxe
λTM log(P(f|e)) + λLM log(P(e))
We could add other features (besides LM and TM), so generally:
e = arg maxe
∑i
λi fi (e, f)
Ales Tamchyna Introduction to MT September 8, 2015 23 / 33
![Page 98: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/98.jpg)
Log-Linear Model: Features
We now have the freedom to add new features. In PBMT, we typically use:
Phrase translation probability, both direct and inverse:I P(e|f)I P(f|e)
Lexical translation probability (direct and inverse):I Plex(e|f)I Plex(f|e)
Language model probability:I P(e)
Phrase penalty.
Word penalty.
Distortion penalty.
Ales Tamchyna Introduction to MT September 8, 2015 24 / 33
![Page 99: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/99.jpg)
Lexical Weights (Plex)
The problem: many extracted phrases are rare.(Esp. long phrases might only be seen once in the parallel corpus.)
Ales Tamchyna Introduction to MT September 8, 2015 25 / 33
![Page 100: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/100.jpg)
Lexical Weights (Plex)
The problem: many extracted phrases are rare.(Esp. long phrases might only be seen once in the parallel corpus.)
P(”modry autobus pristal na Marsu”|”a blue bus lands on Mars”) = 1
P(”a blue bus lands on Mars”|”modry autobus pristal na Marsu”) = 1
Is that a reliable probability estimate?
Ales Tamchyna Introduction to MT September 8, 2015 25 / 33
![Page 101: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/101.jpg)
Lexical Weights (Plex)
The problem: many extracted phrases are rare.(Esp. long phrases might only be seen once in the parallel corpus.)
P(”; distortion carried - over”|”; zkreslenı”) = 1
P(”; zkreslenı”|”; distortion carried - over”) = 1
Data from the “wild” are noisy. Word alignment contains errors.This is a real phrase pair from our best English-Czech system.Both P(e|f) and P(f|e) say that this is a perfect translation.
Ales Tamchyna Introduction to MT September 8, 2015 25 / 33
![Page 102: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/102.jpg)
Lexical Weights (Plex)
Decompose the phrase pair into word pairs. Look at the word-leveltranslation probabilities.
Several possible definitions, e.g.:
Plex(e|f, a) =le∏
j=1
1
|i |(i , j) ∈ a|∑∀(i ,j)∈a
w(ej , fi )
psacı
stroj
a
typewriter0.2
0.3
0.1
Plex(”a typewriter”|”psacı stroj”) =
[1
1· 0.1
]·[
1
2· (0.3 + 0.2)
]= 0.025
Ales Tamchyna Introduction to MT September 8, 2015 26 / 33
![Page 103: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/103.jpg)
Lexical Weights (Plex)
Decompose the phrase pair into word pairs. Look at the word-leveltranslation probabilities.Several possible definitions, e.g.:
Plex(e|f, a) =le∏
j=1
1
|i |(i , j) ∈ a|∑∀(i ,j)∈a
w(ej , fi )
psacı
stroj
a
typewriter0.2
0.3
0.1
Plex(”a typewriter”|”psacı stroj”) =
[1
1· 0.1
]·[
1
2· (0.3 + 0.2)
]= 0.025
Ales Tamchyna Introduction to MT September 8, 2015 26 / 33
![Page 104: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/104.jpg)
Lexical Weights (Plex)
Decompose the phrase pair into word pairs. Look at the word-leveltranslation probabilities.Several possible definitions, e.g.:
Plex(e|f, a) =le∏
j=1
1
|i |(i , j) ∈ a|∑∀(i ,j)∈a
w(ej , fi )
psacı
stroj
a
typewriter0.2
0.3
0.1
Plex(”a typewriter”|”psacı stroj”) =
[1
1· 0.1
]·[
1
2· (0.3 + 0.2)
]= 0.025
Ales Tamchyna Introduction to MT September 8, 2015 26 / 33
![Page 105: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/105.jpg)
Lexical Weights (Plex)
Decompose the phrase pair into word pairs. Look at the word-leveltranslation probabilities.Several possible definitions, e.g.:
Plex(e|f, a) =le∏
j=1
1
|i |(i , j) ∈ a|∑∀(i ,j)∈a
w(ej , fi )
psacı
stroj
a
typewriter0.2
0.3
0.1
Plex(”a typewriter”|”psacı stroj”) =
[1
1· 0.1
]·[
1
2· (0.3 + 0.2)
]= 0.025
Ales Tamchyna Introduction to MT September 8, 2015 26 / 33
![Page 106: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/106.jpg)
Lexical Weights (Plex)
Decompose the phrase pair into word pairs. Look at the word-leveltranslation probabilities.Several possible definitions, e.g.:
Plex(e|f, a) =le∏
j=1
1
|i |(i , j) ∈ a|∑∀(i ,j)∈a
w(ej , fi )
psacı
stroj
a
typewriter0.2
0.3
0.1
Plex(”a typewriter”|”psacı stroj”) =
[1
1· 0.1
]·[
1
2· (0.3 + 0.2)
]= 0.025
Ales Tamchyna Introduction to MT September 8, 2015 26 / 33
![Page 107: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/107.jpg)
Word Penalty
Not all languages use the same number of words on average.
vidım problem ||| I can see a problem
We want to control how many words are generated.
Word penalty simply adds 1 for each produced word in the translation.
Depending on the λ for word penalty, we will either generate shorteror longer outputs.
e = arg maxe
∑i
λi fi (e, f)
Ales Tamchyna Introduction to MT September 8, 2015 27 / 33
![Page 108: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/108.jpg)
Word Penalty
Not all languages use the same number of words on average.
vidım problem ||| I can see a problem
We want to control how many words are generated.
Word penalty simply adds 1 for each produced word in the translation.
Depending on the λ for word penalty, we will either generate shorteror longer outputs.
e = arg maxe
∑i
λi fi (e, f)
Ales Tamchyna Introduction to MT September 8, 2015 27 / 33
![Page 109: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/109.jpg)
Word Penalty
Not all languages use the same number of words on average.
vidım problem ||| I can see a problem
We want to control how many words are generated.
Word penalty simply adds 1 for each produced word in the translation.
Depending on the λ for word penalty, we will either generate shorteror longer outputs.
e = arg maxe
∑i
λi fi (e, f)
Ales Tamchyna Introduction to MT September 8, 2015 27 / 33
![Page 110: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/110.jpg)
Word Penalty
Not all languages use the same number of words on average.
vidım problem ||| I can see a problem
We want to control how many words are generated.
Word penalty simply adds 1 for each produced word in the translation.
Depending on the λ for word penalty, we will either generate shorteror longer outputs.
e = arg maxe
∑i
λi fi (e, f)
Ales Tamchyna Introduction to MT September 8, 2015 27 / 33
![Page 111: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/111.jpg)
Word Penalty
Not all languages use the same number of words on average.
vidım problem ||| I can see a problem
We want to control how many words are generated.
Word penalty simply adds 1 for each produced word in the translation.
Depending on the λ for word penalty, we will either generate shorteror longer outputs.
e = arg maxe
∑i
λi fi (e, f)
Ales Tamchyna Introduction to MT September 8, 2015 27 / 33
![Page 112: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/112.jpg)
Phrase Penalty
Add 1 for each produced phrase in the translation.
Varying the λ for phrase penalty can lead to more literal(word-by-word) translations (made from a lot of short phrases) or tomore idiomatic outputs (use fewer, longer phrases – if available).
Ales Tamchyna Introduction to MT September 8, 2015 28 / 33
![Page 113: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/113.jpg)
Phrase Penalty
Add 1 for each produced phrase in the translation.
Varying the λ for phrase penalty can lead to more literal(word-by-word) translations (made from a lot of short phrases) or tomore idiomatic outputs (use fewer, longer phrases – if available).
Ales Tamchyna Introduction to MT September 8, 2015 28 / 33
![Page 114: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/114.jpg)
Distortion Penalty
The simplest way to capture phrase reordering.
Can be sufficient for some language pairs (our English→Czechsystems use it).
Several possible definitions, e.g.:
I Distance between the end of the previous phrase (on the source side)and the beginning of the current phrase.
Ales Tamchyna Introduction to MT September 8, 2015 29 / 33
![Page 115: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/115.jpg)
Model Weights
How to get λs for our feature functions?
Usual approach: set them so that we translate some held-out data aswell as possible.
“Tuning”
See the lecture tomorrow: Discriminative Training (Milos Stanojevic)
Ales Tamchyna Introduction to MT September 8, 2015 30 / 33
![Page 116: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/116.jpg)
Model Weights
How to get λs for our feature functions?
Usual approach: set them so that we translate some held-out data aswell as possible.
“Tuning”
See the lecture tomorrow: Discriminative Training (Milos Stanojevic)
Ales Tamchyna Introduction to MT September 8, 2015 30 / 33
![Page 117: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/117.jpg)
Model Weights
How to get λs for our feature functions?
Usual approach: set them so that we translate some held-out data aswell as possible.
“Tuning”
See the lecture tomorrow: Discriminative Training (Milos Stanojevic)
Ales Tamchyna Introduction to MT September 8, 2015 30 / 33
![Page 118: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/118.jpg)
Model Weights
How to get λs for our feature functions?
Usual approach: set them so that we translate some held-out data aswell as possible.
“Tuning”
See the lecture tomorrow: Discriminative Training (Milos Stanojevic)
Ales Tamchyna Introduction to MT September 8, 2015 30 / 33
![Page 119: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/119.jpg)
Progress Check
Jan včera políbil Marii
John
Johny
yesterday kissed
gave a kiss to
Mary
gave Mary a kiss
John gave Mary a kiss yesterday
Search for the best translation.
Ales Tamchyna Introduction to MT September 8, 2015 31 / 33
![Page 120: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/120.jpg)
Progress Check
Jan včera políbil Marii
John
Johny
yesterday kissed
gave a kiss to
Mary
gave Mary a kiss
John gave Mary a kiss yesterday
Search for the best translation.
Ales Tamchyna Introduction to MT September 8, 2015 31 / 33
![Page 121: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/121.jpg)
Translation Process: Generate Translation Options
Jan včera políbil Marii
John
Johny
yesterday kissed
gave a kiss to
Mary
gave Mary a kiss
Ales Tamchyna Introduction to MT September 8, 2015 32 / 33
![Page 122: Introduction to Machine Translation and Phrase-Based ...ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf · Introduction to Machine Translation and Phrase-Based](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058db1d8fbf617dd4192f4f/html5/thumbnails/122.jpg)
Translation Process: Beam Search
...
John
Johny
yesterday
Mary
gave Mary a kiss
gave a kiss to
kissed
Mary
yesterday
...
......
yesterday
Ales Tamchyna Introduction to MT September 8, 2015 33 / 33