CS460/626 : Natural Language Processing/Speech, NLP and the Web
(Lecture 17 - Alignment in SMT)

Pushpak Bhattacharyya
CSE Dept., IIT Bombay
14th Feb, 2011
Language Divergence Theory: Lexico-Semantic Divergences
(ref: Dave, Parikh, Bhattacharyya, Journal of MT, 2002)

Conflational divergence
F: vomir; E: to be sick
E: stab; H: churaa se maaranaa (knife-with hit)
S: Utrymningsplan; E: escape plan

Structural divergence
E: SVO; H: SOV

Categorial divergence
Change is in POS category (many examples discussed)

Head swapping divergence
E: Prime Minister of India; H: bhaarat ke pradhaan mantrii (India-of Prime Minister)

Lexical divergence
E: advise; H: paraamarsh denaa (advice give): Noun Incorporation, a very common Indian Language phenomenon
Language Divergence Theory: Syntactic Divergences

Constituent Order divergence
E: Singh, the PM of India, will address the nation today; H: bhaarat ke pradhaan mantrii, singh, ... (India-of PM, Singh ...)

Adjunction Divergence
E: She will visit here in the summer; H: vah yahaa garmii meM aayegii (she here summer-in will come)

Preposition-Stranding divergence
E: Who do you want to go with?; H: kisake saath aap jaanaa chaahate ho? (who with ...)

Null Subject Divergence
E: I will go; H: jaauMgaa (subject dropped)

Pleonastic Divergence
E: It is raining; H: baarish ho rahii hai (rain happening is: no translation of "it")
Alignment

Completely aligned
Your answer is right
Votre response est just

Problematic alignment
We first met in Paris
Nous nous sommes rencontres pour la premiere fois a Paris
The Statistical MT model: notation

Source language: F
Target language: E
Source language sentence: f
Target language sentence: e
Source language word: w_f
Target language word: w_e
The Statistical MT model

To translate f:
1. Assume that all sentences in E are translations of f with some probability!
2. Choose the translation with the highest probability

ê = argmax_e P(e | f)
SMT Model

What is a good translation?
- Faithful to source
- Fluent in target

ê = argmax_e P(e) · P(f | e)

where P(e) captures fluency and P(f | e) captures faithfulness.
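The decision rule above can be sketched as a brute-force search over candidate translations, scoring each by P(e) · P(f | e). The probability values below are invented purely for illustration:

```python
def best_translation(f, candidates, lm_prob, tm_prob):
    # e_hat = argmax_e P(e) * P(f|e): fluency (language model) times
    # faithfulness (translation model)
    return max(candidates, key=lambda e: lm_prob[e] * tm_prob[(f, e)])

# Toy, made-up probabilities for the lecture's example sentence:
lm = {"your answer is right": 0.02,        # fluent English
      "answer your right is": 0.0001}      # disfluent word salad
tm = {("votre response est just", "your answer is right"): 0.5,
      ("votre response est just", "answer your right is"): 0.5}

print(best_translation("votre response est just", list(lm), lm, tm))
# -> your answer is right
```

Both candidates are equally "faithful" here, so the language model term decides, which is exactly the division of labour the slide describes.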
Language Modeling

Task: to find P(e) (assigning probabilities to sentences)

If e = w_1 w_2 ... w_n,

P(e) = P(w_1 w_2 ... w_n)
     = P(w_1) P(w_2 | w_1) P(w_3 | w_1 w_2) ... P(w_n | w_1 w_2 ... w_{n-1})

P(w_n | w_1 w_2 ... w_{n-1}) = count(w_1 w_2 ... w_{n-1} w_n) / Σ_{w_n} count(w_1 w_2 ... w_{n-1} w_n)
Language Modeling: The N-gram approximation

• Probability of the word given the previous N-1 words
• N=2: bigram approximation
• N=3: trigram approximation
• Bigram approximation:

P(e) = P(w_1 w_2 ... w_n) = P(w_1) P(w_2 | w_1) P(w_3 | w_2) ... P(w_n | w_{n-1})

P(w_n | w_{n-1}) = count(w_{n-1} w_n) / Σ_{w_n} count(w_{n-1} w_n)
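The bigram estimate above is just counting. A minimal sketch (the sentence-boundary markers are a common convention, not something the slides mention):

```python
from collections import Counter

def bigram_probs(sentences):
    """MLE bigram model: P(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1})."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]   # boundary markers
        for prev, cur in zip(words, words[1:]):
            unigrams[prev] += 1                  # denominator: count of the history
            bigrams[(prev, cur)] += 1            # numerator: count of the bigram
    return lambda prev, cur: bigrams[(prev, cur)] / unigrams[prev]

p = bigram_probs(["your answer is right", "your answer is wrong"])
print(p("is", "right"))   # -> 0.5  ("is" is followed by "right" in 1 of 2 cases)
```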
Translation Modeling

Task: to find P(f | e)
Cannot use the counts of f and e
Approximate P(f | e) using the product of word translation probabilities (IBM Model 1)

Problem: How to calculate word translation probabilities?
Note: We do not have counts - the training corpus is sentence-aligned, not word-aligned
Word-alignment example

(1)   (2)   (3)  (4)
Ram   has   an   apple

राम   के   पास   एक   सेब   है
(1)   (2)  (3)   (4)   (5)   (6)
Expectation Maximization for the translation model
Expectation-Maximization algorithm

1. Start with uniform word translation probabilities
2. Use these probabilities to find the (fractional) counts
3. Use these new counts to recompute the word translation probabilities
4. Repeat the above steps till the values converge

Works because of the co-occurrence of words that are actually translations.
It can be proven that EM converges.
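The four steps can be sketched as the standard IBM Model 1 EM loop. A minimal version (no NULL word, no smoothing, fixed iteration budget), run on the toy corpus used later in this lecture:

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=15):
    """EM for IBM Model 1 translation probabilities t(f | e).
    pairs: (french, english) sentence strings, whitespace-tokenized."""
    corpus = [(f.split(), e.split()) for f, e in pairs]
    f_vocab = {w for fs, _ in corpus for w in fs}
    # Step 1: start with uniform word translation probabilities
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        # Step 2: use the current probabilities to collect fractional counts
        count, total = defaultdict(float), defaultdict(float)
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm      # posterior share of f aligning to e
                    count[(f, e)] += frac
                    total[e] += frac
        # Step 3: recompute t(f | e) from the new counts
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
        # Step 4: repeat until the values converge (fixed budget here)
    return t

t = train_ibm1([("trois lapins", "three rabbits"),
                ("lapins de Grenoble", "rabbits of Grenoble")])
print(max(["trois", "lapins", "de", "Grenoble"], key=lambda f: t[(f, "rabbits")]))
# -> lapins
```

"lapins" wins for "rabbits" because the two words co-occur in both sentence pairs, which is exactly the co-occurrence effect the slide points to.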
The counts in IBM Model 1

Works by maximizing P(f | e) over the entire corpus.

For IBM Model 1, we get the following relationship:

c(w_f | w_e; f, e) = [ t(w_f | w_e) / ( t(w_f | w_{e_0}) + ... + t(w_f | w_{e_l}) ) ] · A · B

where
c(w_f | w_e; f, e) is the fractional count of the alignment of w_f with w_e in f and e,
t(w_f | w_e) is the probability of w_f being the translation of w_e,
A is the count of w_f in f,
B is the count of w_e in e.
The translation probabilities in IBM Model 1

t′(w_f | w_e) = Σ_{s=1..S} c(w_f | w_e; f^(s), e^(s))

To get t(w_f | w_e), normalize such that

Σ_{w_f} t(w_f | w_e) = 1
English-French example of alignment

Completely aligned
Your1 answer2 is3 right4
Votre1 response2 est3 just4
Alignment: 1↔1, 2↔2, 3↔3, 4↔4

Problematic alignment
We1 first2 met3 in4 Paris5
Nous1 nous2 sommes3 rencontres4 pour5 la6 premiere7 fois8 a9 Paris10
Alignment: 1↔(1,2), 2↔(5,6,7,8), 3↔4, 4↔9, 5↔10
Fertility?: yes
EM for word alignment from sentence alignment: example

English                      French
(1) three rabbits            (1) trois lapins
      a      b                     w     x

(2) rabbits of Grenoble      (2) lapins de Grenoble
      b     c    d                 x     y    z
Initial Probabilities:
each cell denotes t(a w), t(a x), etc.

     a     b     c     d
w   1/4   1/4   1/4   1/4
x   1/4   1/4   1/4   1/4
y   1/4   1/4   1/4   1/4
z   1/4   1/4   1/4   1/4
The counts in IBM Model 1

Works by maximizing P(f | e) over the entire corpus.

For IBM Model 1, we get the following relationship:

c(w_f | w_e; f, e) = [ t(w_f | w_e) / ( t(w_f | w_{e_0}) + ... + t(w_f | w_{e_l}) ) ] · A · B

where
c(w_f | w_e; f, e) is the fractional count of the alignment of w_f with w_e in f and e,
t(w_f | w_e) is the probability of w_f being the translation of w_e,
A is the count of w_f in f,
B is the count of w_e in e.
Example of expected count

C[a w; (a b) ↔ (w x)]
= t(a w) / ( t(a w) + t(a x) ) × #(a in 'a b') × #(w in 'w x')
= (1/4) / (1/4 + 1/4) × 1 × 1
= 1/2
"counts"

For (a b) ↔ (w x):
     a     b     c    d
w   1/2   1/2    0    0
x   1/2   1/2    0    0
y    0     0     0    0
z    0     0     0    0

For (b c d) ↔ (x y z):
     a    b     c     d
w    0    0     0     0
x    0   1/3   1/3   1/3
y    0   1/3   1/3   1/3
z    0   1/3   1/3   1/3
Revised probability: example

t_revised(a w) = (1/2) / [ (1/2 + 1/2 + 0 + 0) from (a b) ↔ (w x)  +  (0 + 0 + 0 + 0) from (b c d) ↔ (x y z) ]
Revised probabilities table

     a     b      c     d
w   1/2   1/4    0     0
x   1/2   5/12   1/3   1/3
y    0    1/6    1/3   1/3
z    0    1/6    1/3   1/3
"revised counts"

For (a b) ↔ (w x):
     a     b     c    d
w   1/2   3/8    0    0
x   1/2   5/8    0    0
y    0     0     0    0
z    0     0     0    0

For (b c d) ↔ (x y z):
     a    b     c     d
w    0    0     0     0
x    0   5/9   1/3   1/3
y    0   2/9   1/3   1/3
z    0   2/9   1/3   1/3
Re-Revised probabilities table

     a     b       c     d
w   1/2   3/16    0     0
x   1/2   85/144  1/3   1/3
y    0    1/9     1/3   1/3
z    0    1/9     1/3   1/3

Continue until convergence; notice that the (b, x) binding gets progressively stronger.
Another Example

A four-sentence corpus:
a b ↔ x y (illustrated book ↔ livre illustre)
b c ↔ x z (book shop ↔ livre magasin)

Assuming no null alignments. Possible alignments (for each pair, either the parallel alignment or the crossed one):

a b    a b    b c    b c

x y    x y    x z    x z
Iteration 1

Initialize: uniform probabilities for all word translations

t(a|x) = t(b|x) = t(c|x) = 1/3
t(a|y) = t(b|y) = t(c|y) = 1/3
t(a|z) = t(b|z) = t(c|z) = 1/3

Compute the fractional counts:

c(a|x; ab, xy) = 1/2,  c(a|x; bc, xz) = 0
c(a|y; ab, xy) = 1/2,  c(a|y; bc, xz) = 0
c(b|x; ab, xy) = 1/2,  c(b|x; bc, xz) = 1/2
Iteration 2

From these counts, recomputing the probabilities:

t′(a|x) = 1/2 + 0 = 1/2;   t′(a|y) = 1/2 + 0 = 1/2;   t′(a|z) = 0
t′(b|x) = 1/2 + 1/2 = 1;   t′(b|y) = 1/2 + 0 = 1/2;   t′(b|z) = 0 + 1/2 = 1/2
t′(c|x) = 0 + 1/2 = 1/2;   t′(c|y) = 0;               t′(c|z) = 0 + 1/2 = 1/2

These probabilities are not normalized (indicated by t′).
Normalized probabilities: after iteration 2

t(a|x) = (1/2) / (1/2 + 1 + 1/2) = 1/4;   t(a|y) = (1/2) / (1/2 + 1/2 + 0) = 1/2;   t(a|z) = 0
t(b|x) = 1/2;   t(b|y) = 1/2;   t(b|z) = 1/2
t(c|x) = 1/4;   t(c|y) = 0;     t(c|z) = 1/2
Normalized probabilities: after iteration 3

t(a|x) = 0.15;   t(a|y) = 0.64;   t(a|z) = 0
t(b|x) = 0.70;   t(b|y) = 0.36;   t(b|z) = 0.36
t(c|x) = 0.15;   t(c|y) = 0;      t(c|z) = 0.64

The probabilities (after a few iterations) converge as expected: a ↔ y, b ↔ x, c ↔ z.
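The convergence can be checked by running EM with the alignments enumerated explicitly. This is a sketch under the slide's assumptions (exactly the two one-to-one alignments per pair, no NULL words); intermediate values may differ slightly from the slide's rounded numbers, but the limit is the same:

```python
from collections import defaultdict
from itertools import permutations

def em_enumerating_alignments(corpus, iterations=10):
    """EM over explicitly enumerated one-to-one alignments.
    corpus: (english_words, french_words) pairs of equal length; maintains
    t(e | f) normalized per French word, as in the lecture's tables."""
    e_vocab = {e for es, _ in corpus for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))      # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        for es, fs in corpus:
            aligns = list(permutations(range(len(fs))))  # all 1-1 alignments
            weights = []
            for a in aligns:                 # P(alignment) ~ product of its links
                w = 1.0
                for j, i in enumerate(a):
                    w *= t[(es[j], fs[i])]
                weights.append(w)
            z = sum(weights)
            for a, w in zip(aligns, weights):
                for j, i in enumerate(a):
                    count[(es[j], fs[i])] += w / z   # posterior fractional count
        total = defaultdict(float)
        for (e, f), c in count.items():
            total[f] += c
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

t = em_enumerating_alignments([("a b".split(), "x y".split()),
                               ("b c".split(), "x z".split())])
for f in "x y z".split():
    print(f, "<->", max("a b c".split(), key=lambda e: t[(e, f)]))
# -> x <-> b, y <-> a, z <-> c
```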
Translation Model: Exact expression

Choose the length of the foreign language string, given e
Choose the alignment, given e and m
Choose the identity of each foreign word, given e, m, a

Five models for estimating the parameters in the expression [2]:
Model-1, Model-2, Model-3, Model-4, Model-5
Proof of Translation Model: Exact expression

Pr(f | e) = Σ_a Pr(f, a | e)                          ; marginalization

Pr(f, a | e) = Σ_m Pr(f, a, m | e)                    ; marginalization
             = Σ_m Pr(m | e) Pr(f, a | m, e)
             = Σ_m Pr(m | e) Π_{j=1..m} Pr(a_j, f_j | a_1^{j-1}, f_1^{j-1}, m, e)
             = Σ_m Pr(m | e) Π_{j=1..m} Pr(a_j | a_1^{j-1}, f_1^{j-1}, m, e) · Pr(f_j | a_1^{j}, f_1^{j-1}, m, e)

m is fixed for a particular f, hence

Pr(f, a | e) = Pr(m | e) Π_{j=1..m} Pr(a_j | a_1^{j-1}, f_1^{j-1}, m, e) · Pr(f_j | a_1^{j}, f_1^{j-1}, m, e)
Model-1

Simplest model
Assumptions:
- Pr(m | e) is independent of m and e and is equal to ε
- Alignment of foreign language words (FLWs) depends only on the length of the English sentence
  = (l+1)^{-1}, where l is the length of the English sentence

The likelihood function will be

Maximize the likelihood function constrained to
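The equation images on this slide did not survive extraction. In the standard formulation of IBM Model 1 (a reconstruction, not the slide's own rendering), the likelihood and the constraint are:

```latex
\Pr(f \mid e) \;=\; \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
\qquad \text{subject to} \qquad \sum_{f} t(f \mid e) = 1 \quad \text{for each } e
```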
Model-1: Parameter estimation

Using a Lagrange multiplier for constrained maximization, the solution for the Model-1 parameters:

λ_e : normalization constant; c(f|e; f, e): expected count; δ(f, f_j) is 1 if f and f_j are the same, zero otherwise.

Estimate t(f|e) using the Expectation-Maximization (EM) procedure.
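The slide's equation was likewise lost in extraction. The standard Model-1 solution obtained with the Lagrange multiplier λ_e, in the notation the slide defines, has the form (reconstruction):

```latex
t(f \mid e) \;=\; \lambda_e^{-1} \sum_{s=1}^{S} c\big(f \mid e;\; f^{(s)}, e^{(s)}\big),
\qquad
c(f \mid e;\; f, e) \;=\; \frac{t(f \mid e)}{t(f \mid e_0) + \cdots + t(f \mid e_l)}
\sum_{j=1}^{m} \delta(f, f_j) \sum_{i=0}^{l} \delta(e, e_i)
```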