Machine Translation
Transcript of Machine Translation
Machine Translation
Om Damani(Ack: Material taken from
JurafskyMartin 2nd Ed., Brown et. al. 1993)
The spirit is willing but the flesh is weak
English-Russian Translation System
Дух охотно готов но плоть слаба
Russian-English Translation System
The vodka is good, but the meat is rotten
State of the Art
Babelfish: Spirit is willingly ready but flesh it is weak
Google: The spirit is willing but the flesh is week
The spirit is willing but the flesh is weak
Google English-Hindi Translation System
आत्मा पर शरीर दुर्बल है
Google Hindi-English Translation System
Spirit on the flesh is weak
State of the Art (English-Hindi) – March 19, 2009
Is state of the art so bad
Google English-Hindi Translation System
कला की हालत इतनी खराब है
Google Hindi-English Translation System
The state of the art is so bad
Is State of the Art (English-Hindi) so bad
State of the english hindi translation is not so bad
Google English-Hindi Translation System
राज्य के अंग्रेज़ी हिन्दी अनुवाद का इतना बुरा नहीं है
Google Hindi-English Translation System
State of the English translation of English is not so bad
State of the english-hindi translation is not so bad
OK. Maybe it is __ bad.
State of the English Hindi translation is not so bad
Google English-Hindi Translation System
राज्य में अंग्रेजी से हिंदी अनुवाद का इतना बुरा नहीं है
Google Hindi-English Translation System
English to Hindi translation in the state is not so bad
State of the English-Hindi translation is not so bad
OK. Maybe it is __ __ bad.
राज्य के अंग्रेज़ी हिन्दी अनुवाद का इतना बुरा नहीं है
Your Approach to Machine Translation
Translation Approaches
Direct Transfer – What Novices do
Direct Transfer: Limitations
Source: कई बंगाली कवियों ने इस भूमि के गीत गाए हैं (Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain)
Morph: कई बंगाली कवि-PL,OBL ने इस भूमि के गीत {गाए है}-PrPer,Pl (Kai Bangali kavi-PL,OBL ne is bhoomi ke geet {gaaye hai}-PrPer,Pl)
Lexical Transfer: Many Bengali poet-PL,OBL this land of songs {sing has}-PrPer,Pl
Local Reordering: Many Bengali poet-PL,OBL of this land songs {has sing}-PrPer,Pl
Final: Many Bengali poets of this land songs have sung
(Intended: Many Bengali poets have sung songs of this land)
Syntax Transfer (Analysis-Transfer-Generation)
Here phrases NP, VP etc. can be arbitrarily large
Syntax Transfer Limitations
He went to Patna -> Vah Patna gaya
He went to Patil -> Vah Patil ke pas gaya
Translation of went depends on the semantics of the object of went
Fatima eats salad with spoon – what happens if you change spoon
Semantic properties need to be included in transfer rules – Semantic Transfer
Interlingua Based Transfer
[Interlingua graph (UNL-style) for the sentence below: the concept "contact" is linked to "you" (agt), "farmer" (obj), and via purpose (pur) / place (plc) relations to a region named (nam) "manchar" or the "khatav" taluka.]
For this, you contact the farmers of Manchar region or of Khatav taluka.
In theory: N analysis and N transfer modules instead of N²
In practice: an amazingly complex system to tackle N² language pairs
Difficulties in Translation – Language Divergence (Concepts from Dorr 1993, Text/Figures from Dave, Parikh and Bhattacharyya 2002)
Constituent Order, Prepositional Stranding, Null Subject, Conflational Divergence, Categorical Divergence
Lost in Translation: We are talking mostly about syntax, not semantics or pragmatics
You: Could you give me a glass of water?
Robot: Yes.
… wait … wait … nothing happens … wait …
… Aha, I see …
You: Will you give me a glass of water?
… wait … wait … wait …
Image from http://inicia.es/de/rogeribars/blog/lost_in_translation.gif
CheckPoint:
State of the Art
Different Approaches
Translation Difficulty
Need for a novel approach
Statistical Machine Translation: Most ridiculous idea ever
Consider all possible partitions of a sentence.
For a given partition, consider all possible translations of each part.
Consider all possible combinations of all possible translations.
Consider all possible permutations of each combination.
And somehow select the best partition/translation/permutation.
कई बंगाली कवियों ने इस भूमि के गीत गाए हैं
Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain
Many Bengali Poets this land of have sung poem
Several Bengali to this place ‘s sing songs
Many poets from Bangal
in this space song sung
Poets from Bangladesh
farm have sung songs
To this space have sung songs of many poets from Bangal
How many combinations are we talking about?
Number of choices for an N-word sentence?
N = 20 ??
Number of possible chess games?
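As a rough back-of-envelope count (assuming, purely for illustration, that every part has 5 candidate translations): a partition into p contiguous parts can be chosen in C(N-1, p-1) ways, each part translated independently, and the parts permuted.

```python
from math import comb, factorial

def candidate_count(n_words, translations_per_part=5):
    """Number of partition/translation/permutation candidates for an
    n-word sentence: C(n-1, p-1) partitions into p contiguous parts,
    t**p translation combinations, p! orderings of the parts."""
    return sum(comb(n_words - 1, p - 1)
               * translations_per_part ** p
               * factorial(p)
               for p in range(1, n_words + 1))

print(f"{candidate_count(20):.3e}")  # astronomically large for N = 20
```

Even with these modest assumptions, the count for N = 20 exceeds 10^30.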
How do we get the Phrase Table?
Collect a large amount of bilingual parallel text.
For each sentence pair, consider all possible partitions of both sentences.
For a given partition pair, consider all possible mappings between parts (phrases) on the two sides.
Somehow assign a probability to each phrase pair.
इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए
For this you contact the farmers of Manchar region
Data Sparsity Problems in Creating Phrase Table
Sunil is eating mango -> Sunil aam khata hai
Noori is eating banana -> Noori kela khati hai
Sunil is eating banana -> ?? We need examples of everyone eating everything!!
We want to figure out that eating can be either khata hai or khati hai
And let Language Model select from ‘Sunil kela khata hai’ and ‘Sunil kela khati hai’
Select well-formed sentences among all candidates using LM
Formulating the Problem
Ê = argmax_E P(E|F) = argmax_E P(E) * P(F|E)
A language model to compute P(E)
A translation model to compute P(F|E)
A decoder, which is given F and produces the most probable E
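A toy sketch of this decision rule on a rain example; the probability numbers are hand-set for illustration, not real model outputs:

```python
# Noisy-channel decision rule: choose E maximizing P(E) * P(F|E).
# Hand-set toy numbers (illustration only).
candidates = {
    # E                   P(E)    P(F|E)
    "it is raining":     (1e-2,   0.02),
    "rain is happening": (1e-4,   0.42),   # ill-formed E, high P(F|E)
}

best = max(candidates, key=lambda e: candidates[e][0] * candidates[e][1])
print(best)  # the language model rescues the well-formed sentence
```

Even though the ill-formed candidate has a much higher translation probability, multiplying by P(E) picks the fluent sentence.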
P(F|E) vs. P(E|F)
P(F|E) is the translation probability – we need to look at the generation process by which the <F,E> pair is obtained.
Parts of F correspond to parts of E. With suitable independence assumptions, P(F|E) measures whether all parts of E are covered by F.
E can be quite ill-formed.
It is OK if P(F|E) for an ill-formed E is greater than P(F|E) for a well-formed E; multiplication by P(E) should hopefully take care of it.
We do not have that luxury in estimating P(E|F) directly – we would need to ensure that well-formed E score higher.
Summary: for computing P(F|E), we may make several independence assumptions that are not valid; P(E) compensates for that.
P(बारिश हो रही है | It is raining) = .02
P(बरसात आ रही है | It is raining) = .03
P(बारिश हो रही है | rain is happening) = .420
We need to estimate P(It is raining | बारिश हो रही है) vs. P(rain is happening | बारिश हो रही है)
CheckPoint:
From a parallel corpus, generate a probabilistic phrase table.
Given a sentence, generate various candidate translations using the phrase table.
Evaluate the candidates using the Translation and Language Models.
What is the meaning of Probability of Translation? What is the meaning of P(F|E)?
By Magic: you simply know P(F|E) for every (E,F) pair – counting in a parallel corpus.
Or, each word in E generates one word of F, independent of every other word in E or F.
Or, we need a 'random process' to generate F from E:
A semantic graph G is generated from E, and F is generated from G. We are no better off – we now have to estimate P(G|E) and P(F|G) for various G and then combine them. How? We may have a deterministic procedure to convert E to G, in which case we still need to estimate P(F|G).
A parse tree TE is generated from E; TE is transformed to TF; finally TF is converted into F. Can you write the mathematical expression?
The Generation Process
Partition: Think of all possible partitions of the source language sentence.
Lexicalization: For a given partition, translate each phrase into the foreign language.
Spurious insertion: Add foreign words that are not attributable to any source phrase.
Reordering: Permute the set of all foreign words – words possibly moving across phrase boundaries.
Try writing the probability expression for the generation process.
We need the notion of alignment.
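The four steps can be sketched as a toy sampler; the phrase table and spurious vocabulary below are made up purely for illustration:

```python
import random

# Made-up phrase table and spurious vocabulary (illustration only).
TABLE = {("many", "bengali"): ["kai bangali"], ("many",): ["kai"],
         ("bengali",): ["bangali"], ("poets",): ["kavi"]}
SPURIOUS = ["ne", "hai"]

def generate(source, rng):
    # 1. Partition the source into contiguous phrases.
    parts, cur = [], []
    for w in source:
        cur.append(w)
        if rng.random() < 0.5:
            parts.append(tuple(cur))
            cur = []
    if cur:
        parts.append(tuple(cur))
    # 2. Lexicalization: translate each phrase (word-by-word fallback).
    out = []
    for p in parts:
        choices = TABLE.get(p) or [" ".join(TABLE[(w,)][0] for w in p)]
        out.extend(rng.choice(choices).split())
    # 3. Spurious insertion: a foreign word attributable to no phrase.
    if rng.random() < 0.5:
        out.append(rng.choice(SPURIOUS))
    # 4. Reordering: permute words, possibly across phrase boundaries.
    rng.shuffle(out)
    return out

print(generate(["many", "bengali", "poets"], random.Random(0)))
```

Each call samples one partition/translation/insertion/permutation; a real model attaches a probability to each such choice.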
Generation Example: Alignment
Simplify Generation: Only 1->Many Alignments allowed
Alignment
A function from target position to source position.
The alignment sequence is: 2,3,4,5,6,6,6. Alignment function A: A(1) = 2, A(2) = 3, ...
A different alignment function will give the sequence 1,2,1,2,3,4,3,4 for A(1), A(2), ...
To allow spurious insertion, allow alignment with word 0 (NULL).
Number of possible alignments: (I+1)^J
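A minimal concrete rendering of the alignment function for the sequence above (I = 6 source words, J = 7 target words):

```python
# Represent alignment A as a list: alignment[j-1] = A(j), values 0..I,
# with 0 reserved for the NULL word (spurious insertion).
I, J = 6, 7
alignment = [2, 3, 4, 5, 6, 6, 6]        # A(1) = 2, A(2) = 3, ...
assert len(alignment) == J
assert all(0 <= a <= I for a in alignment)

# Each of the J target positions independently picks one of I+1 source
# positions, so there are (I+1)**J possible alignments.
num_alignments = (I + 1) ** J
print(num_alignments)  # 7**7 = 823543
```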
IBM Model 1: Generative Process
IBM Model 1: Basic Formulation
$$P(F \mid E) = \sum_{J'} P(J' \mid E)\, P(F \mid J', E) = P(J \mid E)\, P(F \mid J, E)$$
$$P(F \mid J, E) = \sum_{A} P(F, A \mid J, E) = \sum_{A} P(A \mid J, E)\, P(F \mid A, J, E)$$
Putting it together:
$$P(F \mid E) = P(J \mid E) \sum_{A} P(A \mid J, E)\, P(F \mid A, J, E)$$
IBM Model 1: Details
No assumptions: the above formula is exact.
Choosing length: P(J|E) = P(J|E,I) = P(J|I) = ε
Choosing alignment: all alignments are equiprobable:
$$P(A \mid J, E) = \prod_{j=1}^{J} P(a_j \mid a_1^{j-1}, J, E) = \frac{1}{(I+1)^J}$$
Translation probability:
$$P(F \mid A, J, E) = \prod_{j=1}^{J} P(f_j \mid f_1^{j-1}, a_1^{j}, J, E) = \prod_{j=1}^{J} t(f_j \mid e_{a_j})$$
Putting it together:
$$P(F \mid E) = \sum_{A} P(J \mid E)\, P(A \mid J, E)\, P(F \mid A, J, E) = \frac{\epsilon}{(I+1)^J} \sum_{A} \prod_{j=1}^{J} t(f_j \mid e_{a_j})$$
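For tiny sentences the Model 1 likelihood can be computed by brute force over all alignments; a sketch with a made-up lexicon (ε set to 1 for simplicity):

```python
from itertools import product

def model1_prob(f_words, e_words, t, epsilon=1.0):
    """P(F|E) under Model 1: epsilon / (I+1)**J times the sum over all
    (I+1)**J alignments of prod_j t(f_j | e_{a_j}); position 0 is NULL."""
    e = ["NULL"] + e_words
    I, J = len(e_words), len(f_words)
    total = 0.0
    for a in product(range(I + 1), repeat=J):
        p = 1.0
        for j in range(J):
            p *= t.get((f_words[j], e[a[j]]), 0.0)
        total += p
    return epsilon / (I + 1) ** J * total

# Made-up lexicon values, illustration only.
t = {("casa", "house"): 0.8, ("verde", "green"): 0.7,
     ("casa", "green"): 0.1, ("verde", "house"): 0.1}
p = model1_prob(["casa", "verde"], ["green", "house"], t)
print(p)
```

Here the sum over 3² = 9 alignments gives 0.72, so P(F|E) = 0.72/9 = 0.08.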
HMM Alignment
All alignments are not equally likely.
Can you guess what properties an alignment has?
Alignments tend to be locality preserving – neighboring words tend to get aligned together.
We would like P(a_j) to depend on a_{j-1}.
HMM Alignment: Details
P(F,A|J,E) was decomposed as P(A|J,E)*P(F|A,J,E) in Model 1. Now we will decompose it differently (J is implicit, not mentioned in the conditional expressions):
$$P(F, A \mid E) = \prod_{j=1}^{J} P(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, E) = \prod_{j=1}^{J} P(a_j \mid f_1^{j-1}, a_1^{j-1}, E)\, P(f_j \mid f_1^{j-1}, a_1^{j}, E)$$
Alignment assumption (Markov): the alignment probability of the j-th word depends only on the alignment of the previous word:
$$P(a_j \mid f_1^{j-1}, a_1^{j-1}, E) = P(a_j \mid a_{j-1}, I)$$
Translation assumption: the probability of the foreign word f_j depends only on the aligned English word e_{a_j}:
$$P(f_j \mid f_1^{j-1}, a_1^{j}, E) = P(f_j \mid e_{a_j})$$
Putting it together:
$$P(F \mid E) = \sum_{A} P(F, A \mid E) = \sum_{A} P(J \mid I) \prod_{j=1}^{J} P(a_j \mid a_{j-1}, I)\, P(f_j \mid e_{a_j})$$
Computing the Alignment Probability
P(a_j | a_{j-1}, I) is written as P(i | i', I).
Assume the probability does not depend on absolute word positions but on the jump-width (i - i') between words: P(4 | 6, 17) = P(5 | 7, 17)
$$P(i \mid i', I) = \frac{c(i - i')}{\sum_{i''=1}^{I} c(i'' - i')}$$
Note: the counts c(·) in the denominator are collected over sentences of all lengths, but the sum is performed over only those jump-widths relevant to (i, i') – for i' = 6 and I = 17, -5 to 11 are relevant.
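A sketch of that normalization; the jump-width counts below are invented for illustration:

```python
from collections import Counter

# Toy jump-width counts c(d), nominally collected over the whole corpus.
c = Counter({-2: 1, -1: 5, 0: 10, 1: 40, 2: 20, 3: 5})

def jump_prob(i, i_prev, I):
    """P(i | i', I): normalize c(i - i') over the jumps possible within
    a length-I sentence, i.e. d = 1-i' .. I-i'."""
    denom = sum(c[d] for d in range(1 - i_prev, I - i_prev + 1))
    return c[i - i_prev] / denom

# For i' = 6 and I = 17, only jump-widths -5 .. 11 enter the denominator.
row = [jump_prob(i, 6, 17) for i in range(1, 18)]
assert abs(sum(row) - 1.0) < 1e-9
```

Because the denominator is restricted to the feasible jumps, the probabilities over target positions 1..I sum to one.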
HMM Model - Example
$$P(F \mid E) = \sum_{A} P(J \mid I) \prod_{j=1}^{J} P(a_j \mid a_{j-1}, I)\, P(f_j \mid e_{a_j})$$
P(F,A|E) = P(J=10 | I=9) * P(2 | start, 9) * P(इसके | this) * P(-1 | 2, 9) * P(लिए | this) * P(2 | 1, 9) * … * P(0 | 4, 9) * P(कीजिए | contact)
Enhancing the HMM Model
Add NULL words in the English to which foreign words can align.
Condition the alignment on the word class of the previous English word:
$$P(a_j \mid a_{j-1}, I, C(e_{a_{j-1}}))$$
Other suggestions??
What is the problem in making more realistic assumptions?
How to estimate the parameters of the model?
Checkpoint:
The generative process is important for computing probability expressions.
Model 1 and the HMM model.
What about phrase probabilities?
Training Alignment Models
Given a parallel corpus, for each (F,E) learn the best alignment A and the component probabilities:
t(f|e) for Model 1
lexicon probability P(f|e) and alignment probability P(a_i | a_{i-1}, I) for the HMM model
How will you compute these probabilities if all you have is a parallel corpus?
Intuition: Interdependence of Probabilities
If you knew which words are probable translations of each other, then you could guess which alignments are probable and which are improbable.
If you were given alignments with probabilities, then you could compute translation probabilities.
Looks like a chicken-and-egg problem.
Can you write equations expressing one in terms of the other?
Computing Alignment Probabilities
Alignment probability in terms of translation probability:
$$P(A, F \mid J, E) = P(A \mid J, E)\, P(F \mid A, J, E) = \frac{1}{(I+1)^J} \prod_{j=1}^{J} t(f_j \mid e_{a_j})$$
Compute P(A) in terms of P(A,F). Note: the prior probabilities of all alignments are equal; we are interested in the posterior probabilities:
$$P(A \mid F, E) = \frac{P(A, F \mid E)}{\sum_{A'} P(A', F \mid E)}$$
Can you specify translation probability in terms of alignment probability?
Computing Translation Probabilities
P(संपर्क | contact) = 2/6
What if alignments had probabilities .5, .3, .9?
= (.5*1 + .3*1 + .9*0) / (.5*3 + .3*2 + .9*1) = .8/3
Note: It is not .7*1/3 + .5*1/2 + .9*0 ??
Computing Translation Probabilities – Maximum Likelihood Estimate
$$count(f \mid e) = \sum_{(F,E)} \sum_{A} P(A \mid F, E)\, C(f, e \mid A, F, E)$$
$$t(f \mid e) = \frac{count(f \mid e)}{\sum_{f'} count(f' \mid e)}$$
Expectation Maximization (EM) Algorithm
Used when we want a maximum likelihood estimate of the parameters of a model that depends on hidden variables. In the present case, the parameters are the translation probabilities, and the hidden variables are the alignments.
Init: Start with an arbitrary estimate of the parameters.
E-step: Compute the expected values of the hidden variables.
M-step: Recompute the parameters that maximize the likelihood of the data, given the expected values of the hidden variables from the E-step.
Working out alignments for a simplified Model 1:
Ignore the NULL words.
Assume that every English word aligns with some foreign word (just to reduce the number of alignments for the illustration).
Example of EM
Green house – Casa verde
The house – La casa
Init: Assume that any word can generate any word with equal probability:
P(la|house) = 1/3
E-Step:
$$P(A, F \mid J, E) = P(A \mid J, E)\, P(F \mid A, J, E) = \frac{1}{(I+1)^J} \prod_{j=1}^{J} t(f_j \mid e_{a_j})$$
$$P(A \mid F, E) = \frac{P(A, F \mid E)}{\sum_{A'} P(A', F \mid E)}$$
M-Step:
$$count(f \mid e) = \sum_{(F,E)} \sum_{A} P(A)\, C(f, e \mid A, F, E)$$
$$t(f \mid e) = \frac{count(f \mid e)}{\sum_{f'} count(f' \mid e)}$$
E-Step again: recompute the alignment posteriors P(A|F,E) with the new t(f|e); they become 1/3, 2/3, 2/3, 1/3.
Repeat till convergence
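The whole loop on the two-sentence corpus fits in a few lines; this is a sketch of standard Model 1 EM in its per-word form, ignoring NULL as in the simplification above:

```python
from collections import defaultdict

# Toy corpus from the slides: (English, Spanish) sentence pairs.
corpus = [(["green", "house"], ["casa", "verde"]),
          (["the", "house"], ["la", "casa"])]

f_vocab = {f for _, fs in corpus for f in fs}
e_vocab = {e for es, _ in corpus for e in es}

# Init: any English word generates any foreign word with equal probability.
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for _ in range(30):                      # EM iterations
    count = defaultdict(float)           # expected count(f|e)
    for es, fs in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)   # E-step: posterior over e
            for e in es:
                count[(f, e)] += t[(f, e)] / norm
    for e in e_vocab:                    # M-step: renormalize per English word
        total = sum(count[(f, e)] for f in f_vocab)
        for f in f_vocab:
            t[(f, e)] = count[(f, e)] / total

print(round(t[("casa", "house")], 3))
```

Starting from the uniform 1/3, t(casa|house) climbs toward 1 as the two sentence pairs reinforce each other.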
Computing Translation Probabilities in Model 1
The EM algorithm is fine, but it requires exponential computation:
For each alignment, we recompute the alignment probability.
The translation probability is computed from all alignment probabilities.
We need an efficient algorithm.
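The efficient algorithm rests on a standard Model 1 rearrangement: the exponential sum over alignments factorizes into a product of per-position sums. A numeric sanity check with arbitrary toy values:

```python
from itertools import product
import math

I, J = 3, 4
# t[i][j] stands in for t(f_j | e_i); arbitrary toy values.
t = [[0.1 * (1 + (i + j) % 5) for j in range(J)] for i in range(I + 1)]

# Exponential: sum over all (I+1)**J alignments of a product over j.
brute = sum(math.prod(t[a[j]][j] for j in range(J))
            for a in product(range(I + 1), repeat=J))

# Linear: swap sum and product -> product over j of a sum over i.
fast = math.prod(sum(t[i][j] for i in range(I + 1)) for j in range(J))

assert abs(brute - fast) < 1e-9
```

The same swap turns the E-step counts into an O(I·J) computation per sentence pair instead of an enumeration of (I+1)^J alignments.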
Checkpoint:
Use of the EM algorithm for estimating phrase probabilities under IBM Model 1.
An example.
And an efficient algorithm.
Generating Bi-directional Alignments
Existing models only generate uni-directional alignments.
Combine two uni-directional alignments to get many-to-many bi-directional alignments.
Eng-Hindi Alignment
छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
[Alignment matrix: each English word of "Goa is a premier beach vacation destination" is linked ("|") to its aligned Hindi word.]
Hindi-Eng Alignment
छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
[Alignment matrix for the reverse (Hindi-to-English) direction; "|" marks the aligned cells.]
Combining Alignments
छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
[Combined matrix: "+" marks points in the intersection, "|" marks points present in only one direction.]
P = 2/3 = .67, R = 2/7 = .3
P = 4/5 = .8, R = 4/7 = .6
P = 5/6 = .83, R = 5/7 = .7
P = 6/9 = .67, R = 6/7 = .85
A Different Heuristic from Moses-Site
    GROW-DIAG-FINAL(e2f, f2e):
        neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
        alignment = intersect(e2f, f2e)
        GROW-DIAG()
        FINAL(e2f)
        FINAL(f2e)

    GROW-DIAG():
        iterate until no new points added
            for english word e = 0 ... en
                for foreign word f = 0 ... fn
                    if ( e aligned with f )
                        for each neighboring point ( e-new, f-new ):
                            if ( ( e-new, f-new ) in union( e2f, f2e )
                                 and ( e-new not aligned and f-new not aligned ) )
                                add alignment point ( e-new, f-new )

    FINAL(a):
        for english word e-new = 0 ... en
            for foreign word f-new = 0 ... fn
                if ( ( ( e-new, f-new ) in alignment a )
                     and ( e-new not aligned or f-new not aligned ) )
                    add alignment point ( e-new, f-new )

Proposed changes: after growing the diagonal, align the shorter sentence first, and use alignments only from the corresponding directional alignment.
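A compact Python rendering of the heuristic, with alignments as sets of (e, f) index pairs; it follows the and-condition in GROW-DIAG as transcribed on this slide:

```python
NEIGHBORING = [(-1, 0), (0, -1), (1, 0), (0, 1),
               (-1, -1), (-1, 1), (1, -1), (1, 1)]

def grow_diag_final(e2f, f2e):
    """Symmetrize two directional alignments (sets of (e, f) pairs)
    following the GROW-DIAG-FINAL heuristic."""
    union = e2f | f2e
    alignment = set(e2f & f2e)                    # start from intersection
    aligned_e = {e for e, _ in alignment}
    aligned_f = {f for _, f in alignment}

    added = True                                  # GROW-DIAG
    while added:
        added = False
        for e, f in sorted(alignment):
            for de, df in NEIGHBORING:
                en, fn = e + de, f + df
                if ((en, fn) in union
                        and en not in aligned_e and fn not in aligned_f):
                    alignment.add((en, fn))
                    aligned_e.add(en); aligned_f.add(fn)
                    added = True

    for direction in (e2f, f2e):                  # FINAL(e2f); FINAL(f2e)
        for en, fn in sorted(direction):
            if en not in aligned_e or fn not in aligned_f:
                alignment.add((en, fn))
                aligned_e.add(en); aligned_f.add(fn)
    return alignment

result = grow_diag_final({(0, 0), (1, 1), (2, 2)}, {(0, 0), (1, 2), (2, 2)})
print(sorted(result))  # [(0, 0), (1, 1), (2, 2)]
```

In the example, (1, 1) is absorbed from the union because it neighbors the intersection point (0, 0), while (1, 2) is rejected since word 1 is already aligned.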
Generating Phrase Alignments
छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
[Combined alignment matrix for "Goa is a premier beach vacation destination"; "+" marks alignment points. Consistent phrase pairs read off it include:]
a premier beach vacation destination – एक प्रमुख समुद्र-तटीय गंतव्य है
premier beach vacation – प्रमुख समुद्र-तटीय
Phrase Alignment Probabilities
We have been dealing with just one sentence pair. In fact, we have been dealing with just one alignment – the most probable alignment.
Such an alignment can easily have mistakes, and generate garbage phrases.
Compute phrase alignment probabilities over the entire corpus:
$$\phi(f \mid e) = \frac{count(f, e)}{\sum_{f'} count(f', e)}$$
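A sketch of that relative-frequency estimate over corpus-wide phrase-pair counts; the counts here are invented:

```python
from collections import Counter

# Invented phrase-pair counts, standing in for counts collected from
# the symmetrized alignments of an entire parallel corpus.
pair_counts = Counter({("प्रमुख", "premier"): 9,
                       ("मुख्य", "premier"): 1,
                       ("प्रमुख समुद्र-तटीय", "premier beach"): 3})

def phi(f, e):
    """phi(f | e) = count(f, e) / sum over f' of count(f', e)"""
    denom = sum(n for (_, e2), n in pair_counts.items() if e2 == e)
    return pair_counts[(f, e)] / denom

print(phi("प्रमुख", "premier"))  # 9 / (9 + 1) = 0.9
```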
IBM Model 3
Model 1's story seems bizarre – who would first choose the sentence length, then align, and then generate?
A more likely case: generate a translation for each word and then reorder.
Model 1 Generative story
Model 3 Generative Story
Model 3 Formula
Ignore NULL for a moment.
Choosing fertility:
$$\prod_{i=1}^{I} n(\phi_i \mid e_i)$$
Generating words:
$$\prod_{i=1}^{I} \phi_i! \;\prod_{j=1}^{J} t(f_j \mid e_{a_j})$$
Aligning words:
$$\prod_{j:\, a_j \neq 0} d(j \mid a_j, I, J)$$
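Putting the three factors together for one toy configuration (all table values invented): E has I = 2 words, F has J = 3 words, alignment a = (1, 1, 2), so the fertilities are φ = (2, 1).

```python
from math import factorial, prod

# One toy Model 3 configuration; every table value below is made up.
a = (1, 1, 2)
I, J = 2, 3
n = {1: {2: 0.3}, 2: {1: 0.6}}               # n(phi_i | e_i)
t = {1: 0.5, 2: 0.2, 3: 0.4}                 # t(f_j | e_{a_j}) by position j
d = {(1, 1): 0.7, (2, 1): 0.2, (3, 2): 0.6}  # d(j | a_j, I, J)

phi = [sum(1 for x in a if x == i) for i in range(1, I + 1)]   # [2, 1]
score = (prod(n[i][phi[i - 1]] for i in range(1, I + 1))       # fertility
         * prod(factorial(p) for p in phi)                     # orderings
         * prod(t[j] for j in range(1, J + 1))                 # lexicon
         * prod(d[(j, a[j - 1])] for j in range(1, J + 1)))    # distortion
print(score)
```

Here the score is (0.3·0.6) · (2!·1!) · (0.5·0.2·0.4) · (0.7·0.2·0.6).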
Generating Spurious Words
Instead of using n(2|NULL) or n(1|NULL):
With probability p1, generate a spurious word every time a valid word is generated.
This ensures that longer sentences generate more spurious words.
Diagrams converted into pictures in next slides
इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए
For this you contact the farmers of Manchar region
इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए
For this you contact the farmers of Manchar region
इसके लिए किसानों से मिलिए
For this you contact the farmers
इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए
For this you contact the farmers of Manchar region
OchNey03 Heuristic: Intuition
Decide the intersection.
Extend it by adding alignments from the union if both the words in the union alignment are not already aligned in the final alignment.
Then add an alignment only if:
it already has an adjacent alignment in the final alignment, and
adding it will not cause any final alignment to have both horizontal and vertical neighbors as final alignments.
SMT Example
कई बंगाली कवियों ने इस भूमि के गीत गाए हैं
Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain
Many Bengali Poets this land of have sung poem
Several Bengali to this place ‘s sing songs
Many poets from Bangal
in this space song sung
Poets from Bangladesh
farm have sung songs
To this space have sung songs of many poets from Bangal
Translation Model – Notations
F: f1, f2, .., fJ ; E: e1, e2, .., eI
P(F|E) is not the same as P(f1..fJ | e1..eI).
What is P(फातिमा चावल खाती है | Fatima eats rice)?
P(F|E) = P(J, f1..fJ | I, e1..eI)
We explicitly mention I and J only when needed.
We will work with the above formulation instead of the alternative P(F|E) = P(w1(F)=f1 .. wJ(F)=fJ, w[J+1](F)=$ | …)
फातिमा चावल खाती है – Fatima eats rice
फातिमा चम्मच से चावल खाती है – Fatima eats rice with spoon