Stephan Vogel - Machine Translation1 Machine Translation Word Alignment Stephan Vogel Spring...


Machine Translation

Word Alignment

Stephan Vogel
Spring Semester 2011


Overview

IBM 3: Fertility
IBM 4: Relative Distortion

Acknowledgement: These slides are based on slides by Hermann Ney and Franz Josef Och


Fertility Models

Basic concept: each word in one language can generate multiple words in the other language
  deseo – I would like
  übermorgen – the day after tomorrow
  departed – fuhr ab

The same word can generate different numbers of words -> probability distribution

Alignment is a function -> fertility only on one side
  In my terminology: target words have fertility, i.e. each target word can cover multiple source words
  Others say: a source word generates multiple target words

Some source words are aligned to the NULL word, i.e. the NULL word has fertility

Many target words are not aligned, i.e. have fertility 0


The Generative Story

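e0 e1 e2 e3 e4 e5 (English words)
  -> fertility generation
1 2 0 1 3 0 (fertility of each English word)
  -> word generation
f01 f11 f12 f31 f41 f42 f43 (tablets: f_ik is the k-th French word generated by e_i)
  -> permutation generation
f1 f2 f3 f4 f5 f6 f7 (French sentence)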
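The three steps of the generative story can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the fertilities are fixed to the values in the figure instead of being sampled from a fertility distribution, the lexicon is a hypothetical one-candidate table, and the permutation is drawn uniformly at random. The names `generate`, `fertility` and `lexicon` are made up for this sketch.

```python
import random

def generate(english, fertility, lexicon, rng):
    """Run the three generation steps for one English sentence (index 0 = empty word)."""
    # 1. Fertility generation: one count per English word (fixed here, assumption).
    phis = [fertility[e] for e in english]
    # 2. Word generation: a tablet of phi French words for each English word.
    tablets = [[rng.choice(lexicon[e]) for _ in range(phi)]
               for e, phi in zip(english, phis)]
    # 3. Permutation generation: put all generated words into some order.
    words = [(i, w) for i, tablet in enumerate(tablets) for w in tablet]
    rng.shuffle(words)
    french = [w for _, w in words]
    alignment = [i for i, _ in words]  # a_j = index of the generating English word
    return phis, french, alignment

english = ["e0", "e1", "e2", "e3", "e4", "e5"]
fertility = dict(zip(english, [1, 2, 0, 1, 3, 0]))   # values from the figure
lexicon = {e: [f"f({e})"] for e in english}          # hypothetical toy lexicon
phis, french, alignment = generate(english, fertility, lexicon, random.Random(0))
```

The French sentence always has as many words as the fertilities sum to (7 here), and each French position records which English word generated it.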


Fertility Model

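Alignment model:
$\Pr(f_1^J \mid e_0^I) = \sum_{a_1^J} \Pr(f_1^J, a_1^J \mid e_0^I)$

Select fertility for each English word: $\phi_i = \phi(e_i)$

For each English word select a tablet of French words: $\tilde{f}_{ik},\ k = 1, \dots, \phi(e_i)$

Select a permutation for the entire sequence of French words: $\pi: (i, k) \to j = \pi_{ik}$

Sum over all realizations:
$\Pr(f_1^J, a_1^J \mid e_0^I) = \sum_{(\tilde{f}_0^I, \pi_0^I) \in \langle f_1^J, a_1^J \rangle} \Pr(\tilde{f}_0^I, \pi_0^I \mid e_0^I)$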


Fertility Model: Constraints

Fertility bound to alignment: $\phi(e_i) = \sum_{j=1}^{J} \delta(i, a_j)$

Permutation: $\{\pi_{ik} : k = 1, \dots, \phi(e_i)\} = \{j : a_j = i\}$

French words: $f_{\pi_{ik}} = \tilde{f}_{ik}$
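The first constraint says that the fertility of an English word is just the number of French positions aligned to it. A minimal sketch, assuming the alignment is given as a list `a` with `a[j-1]` holding the English index of French position j (the function name `fertilities` is hypothetical):

```python
def fertilities(a, I):
    """Count, for each English position i = 0..I, how many French positions align to it."""
    phi = [0] * (I + 1)
    for i in a:
        phi[i] += 1
    return phi

# Alignment matching the generative-story example, with French words kept in tablet order.
a = [0, 1, 1, 3, 4, 4, 4]
phi = fertilities(a, I=5)
```

Because the alignment is a function from French positions to English positions, the fertilities always sum to J.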


Fertility Model

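Decomposition into factors:
$\Pr(\tilde{f}_0^I, \pi_0^I \mid e_0^I) = \Pr(\phi_0^I \mid e_0^I) \cdot \Pr(\tilde{f}_0^I \mid \phi_0^I, e_0^I) \cdot \Pr(\pi_0^I \mid \tilde{f}_0^I, \phi_0^I, e_0^I)$

Apply chain rule to each factor, limit dependencies:

Fertility generation (IBM 3,4,5):
$\Pr(\phi_0^I \mid e_0^I) = p(\phi_0 \mid \phi_1^I, e_0) \cdot \prod_{i=1}^{I} p(\phi_i \mid e_i)$

Word generation (IBM 3,4,5):
$\Pr(\tilde{f}_0^I \mid \phi_0^I, e_0^I) = \prod_{i=0}^{I} \prod_{k=1}^{\phi_i} p(\tilde{f}_{ik} \mid e_i)$

Permutation generation (only IBM 3):
$\Pr(\pi_0^I \mid \tilde{f}_0^I, \phi_0^I, e_0^I) = \frac{1}{\phi_0!} \prod_{i=1}^{I} \prod_{k=1}^{\phi_i} p(\pi_{ik} \mid i, I, J)$

Note: $1/\phi_0!$ results from special model for i = 0.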
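To make the three factors concrete, the following sketch multiplies them for one given set of tablets and positions. It is a toy illustration, not the GIZA++ implementation: the tables `fert_p`, `lex_p` and `dist_p` and the function name are hypothetical, and the empty-word fertility is assumed to follow the binomial model described on the empty-position slides.

```python
import math

def ibm3_realization_prob(e, tablets, positions, fert_p, lex_p, dist_p, p0):
    """Probability of one (tablets, permutation) realization given e; e[0] is the empty word."""
    I = len(e) - 1
    phi = [len(t) for t in tablets]
    j_real = sum(phi[1:])            # French words generated by real English words
    J = j_real + phi[0]
    # Fertility generation: binomial for the empty word (assumption), table lookup otherwise.
    prob = math.comb(j_real, phi[0]) * (1 - p0) ** phi[0] * p0 ** (j_real - phi[0])
    for i in range(1, I + 1):
        prob *= fert_p[(phi[i], e[i])]
    # Word generation: every word of every tablet, including the empty word's tablet.
    for i in range(I + 1):
        for w in tablets[i]:
            prob *= lex_p[(w, e[i])]
    # Permutation generation: 1/phi_0! for the empty word, distortion table for real words.
    prob /= math.factorial(phi[0])
    for i in range(1, I + 1):
        for j in positions[i]:
            prob *= dist_p[(j, i, I, J)]
    return prob

prob = ibm3_realization_prob(
    e=["NULL", "house"],
    tablets=[[], ["Haus"]],
    positions=[[], [1]],
    fert_p={(1, "house"): 0.8},
    lex_p={("Haus", "house"): 0.9},
    dist_p={(1, 1, 1, 1): 1.0},
    p0=0.9,
)
```

In this one-word example the result is the product 0.9 (no empty-word insertion) times 0.8 (fertility) times 0.9 (lexicon) times 1.0 (distortion).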


Fertility Model: Some Issues

Permutation model cannot guarantee that π is a permutation
  -> Words can be stacked on top of each other
  -> This leads to deficiency

Position i = 0 is not a real position
  -> special alignment and fertility model for the empty word
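The deficiency can be seen in a toy calculation: if each word picks its position independently, some probability mass goes to placements where two words land on the same position. The sketch below assumes a uniform position distribution purely for illustration; `permutation_mass` is a hypothetical helper.

```python
from itertools import product

def permutation_mass(J, n_words):
    """Total probability of placements in which no two words share a position."""
    p = 1.0 / J                      # uniform position probability (assumption)
    total = 0.0
    for placement in product(range(1, J + 1), repeat=n_words):
        if len(set(placement)) == n_words:   # a proper placement, no stacking
            total += p ** n_words
    return total

mass_2 = permutation_mass(2, 2)   # 2 valid placements out of 4
mass_3 = permutation_mass(3, 3)   # 6 valid placements out of 27
```

The remaining mass (one half for two words on two positions) is assigned to outcomes that are not sentences at all, which is exactly what "deficient" means here.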


Fertility Model: Empty Position

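Alignment assumptions for the empty position i = 0
  Uniform position distribution for each of the $\phi_0$ French words generated from $e_0$
  Place these French words only after all other words have been placed

Alignment model for the positions aligned to the Empty position:
  One position (for the k-th word placed):
  $p(j \mid i = 0, I, J) := \begin{cases} 0 & \text{if } j \text{ is occupied} \\ \frac{1}{\phi_0 - k + 1} & \text{if } j \text{ is vacant} \end{cases}$

  All positions:
  $\prod_{k=1}^{\phi_0} p(\pi_{0k} \mid i = 0, I, J) = \prod_{k=1}^{\phi_0} \frac{1}{\phi_0 - k + 1} = \frac{1}{\phi_0!}$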


Fertility Model: Empty Position

Fertility model for words generated by $e_0$, i.e. by the empty position
  We assume that each word from $f_1^J$ requires the Empty word with probability $[1 - p_0]$
  Probability that exactly $\phi_0$ from the J words in $f_1^J$ require the Empty word:

  $p(\phi_0 \mid J', e_0) = \binom{J'}{\phi_0} \, [1 - p_0]^{\phi_0} \, p_0^{J' - \phi_0}$

  with $J' := \sum_{i=1}^{I} \phi_i$ and $J := J' + \phi_0$
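The empty-word fertility is a binomial distribution over how many of the J' words generated by real English words require an empty-word insertion. A small sketch, with `empty_word_fertility_prob` as a hypothetical name and p0 = 0.9 chosen only as an example value:

```python
import math

def empty_word_fertility_prob(phi0, j_real, p0):
    """Binomial probability that phi0 of the j_real words require the empty word."""
    return math.comb(j_real, phi0) * (1 - p0) ** phi0 * p0 ** (j_real - phi0)

# Distribution over phi0 for a sentence with J' = 5 words from real English words.
dist = [empty_word_fertility_prob(k, 5, 0.9) for k in range(6)]
```

A larger p0 puts more mass on small empty-word fertilities; with p0 = 0.9 the probability of no empty-word alignment at all is 0.9^5.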


Deficiency

Distortion model for real words is deficient
Distortion model for empty word is non-deficient
Deficiency can be reduced by aligning more words to the empty word
Training corpus likelihood can be increased by aligning more words with empty word

Play with p0!


IBM 4: 1st Order Distortion Model

Introduce more detailed dependencies into the alignment (permutation) model

First order dependency along e-axis

[Figure panels: HMM, IBM4]


Inverted Alignment

Consider alignments $B: i \to B_i \subset \{1, \dots, j, \dots, J\}$

Dependency along I axis: jumps along the J axis
  Two first order models, $p_{=1}(\Delta j \mid \dots)$ and $p_{>1}(\Delta j \mid \dots)$, for aligning first word in a set and for aligning remaining words

We skip the math :-)


Characteristics of Alignment Models

Model   Alignment   Fertility   E-step   Deficient
IBM1    Uniform     No          Exact    No
IBM2    0-order     No          Exact    No
HMM     1-order     No          Exact    No
IBM3    0-order     Yes         Approx   Yes
IBM4    1-order     Yes         Approx   Yes
IBM5    1-order     Yes         Approx   No


Consideration: Overfitting

Training on data always carries the danger of overfitting
  Model describes training data in too much detail
  But does not perform well on unseen test data

Solution: Smoothing
  Lexicon: distribute some of the probability mass from seen events to unseen events (for p( f | e ), do this for each e)
    For unseen e: uniform distribution or ???

  Distortion: interpolate with uniform distribution
    $p'(a_j \mid a_{j-1}, I) = (1 - \alpha) \, p(a_j \mid a_{j-1}, I) + \alpha \cdot \frac{1}{I}$

  Fertility: for many languages ‘longer word’ = ‘more content’
    E.g. compounds or agglutinative morphology
    Train a model for fertility given word length, $p(\phi \mid g(e))$, and interpolate the word-based fertility model with $p(\phi \mid g(e))$
    Interpolate fertility estimates based on word frequency: for a frequent word, use the word model; for a low frequency word, bias towards the length model
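Interpolating a distortion distribution with the uniform distribution is a one-line operation. The sketch below assumes the distribution over the I target positions is given as a plain list for one fixed previous position; the function name and the value of alpha are illustrative only.

```python
def smooth_distortion(p, alpha):
    """Return (1 - alpha) * p + alpha * uniform over the I positions in p."""
    I = len(p)
    return [(1 - alpha) * x + alpha / I for x in p]

# An overfitted distribution with zero mass on two positions.
smoothed = smooth_distortion([0.7, 0.3, 0.0, 0.0], alpha=0.2)
```

After smoothing, every position has non-zero probability and the distribution still sums to one.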


Extension: Using Manual Dictionaries

Adding manual dictionaries
  Simple method 1: add as bilingual data
  Simple method 2: interpolate manual with trained dictionary
  Use constrained GIZA (Gao, Nguyen, Vogel, WMT 2010)
  Can put higher weight on word pairs from dictionary (Och, ACL 2000)
  Not so simple: “But dictionaries are data too” (Brown et al, HLT 93)

Problem: manual dictionaries do not contain inflected forms

Possible Solution: Generate additional word forms (Vogel and Monson, LREC 04)


Extension: Using POS

Use POS in distortion model
  We had:
  $\Pr(a_j \mid a_1^{j-1}, J, e_0^I) = p(a_j \mid a_{j-1}, J, I)$

  Now we condition on the word class of the previously aligned target word:
  $\Pr(a_j \mid a_1^{j-1}, J, e_0^I) = p(a_j \mid a_{j-1}, C(e_{a_{j-1}}), J, I)$

  Available in GIZA++
    Automatic clustering of vocabulary into word classes with mkcls
    Default: 50 classes

Use POS as 2nd ‘Lexicon’ model (e.g. Zhao et al, ACL 2005)
  Train p( C(f) | C(e) ), start with initial model trained with IBM1 just on word classes
  Align sentence pairs using p( C(f) | C(e) ) and p( f | e )
  Update both distributions from Viterbi path


And Much More …

Add fertilities to HMM model
Symmetrize during training: i.e. update lexicon probabilities based on symmetrized alignment
Benefit from shorter sentence pairs
  Split long sentences based on initial alignment and retrain
  Extract phrase pairs and add reliable ones to training data

And then all the work on discriminative word alignment


Alignment Results

Unbalanced between wrong and missing -> unbalanced between precision and recall

Chinese is harder, many missing links -> low recall
One direction seems harder: related to which side has more words
Alignment models generate one link per source word

Alignment   Correct   Wrong     Missing   Precision   Recall   AER

Arabic-English
IBM4 S2T    202,898   72,488    134,097   73.7        60.2     33.7
IBM4 T2S    232,840   106,441   104,155   68.6        69.1     31.1
Combined    244,814   89,652    92,178    73.2        72.6     27.1

Chinese-English
IBM4 S2T    186,620   172,865   341,183   52.91       35.4     57.9
IBM4 T2S    299,744   151,478   228,059   66.4        56.8     38.8
Combined    296,312   140,929   231,491   67.8        56.1     38.6
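The last three columns can be derived from the link counts. The sketch below assumes the usual definitions: precision = correct / (correct + wrong), recall = correct / (correct + missing), and AER computed without a separate set of "possible" links, so that AER = 1 - 2·correct / (2·correct + wrong + missing). The function name is hypothetical.

```python
def alignment_scores(correct, wrong, missing):
    """Precision, recall and alignment error rate from link counts."""
    precision = correct / (correct + wrong)      # share of proposed links that are right
    recall = correct / (correct + missing)       # share of reference links that were found
    aer = 1.0 - 2.0 * correct / (2 * correct + wrong + missing)
    return precision, recall, aer

# Arabic-English, IBM4 S2T row of the table.
p, r, aer = alignment_scores(202_898, 72_488, 134_097)
print(f"{100 * p:.1f} {100 * r:.1f} {100 * aer:.1f}")  # prints 73.7 60.2 33.7
```

Wrong links only lower precision and missing links only lower recall, which is why an imbalance between the two counts shows up as an imbalance between the two scores.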


Unaligned Words

Alignment          NULL Alignment   Not Aligned

Arabic-English
Manual Alignment   8.58             11.84
IBM4 S2T           3.49             30.02
IBM4 T2S           5.33             15.72
Combined           5.53             7.70

Chinese-English
Manual Alignment   7.80             11.90
IBM4 S2T           5.46             23.84
IBM4 T2S           6.41             34.53
Combined           9.80             14.64

NULL Alignment explicit, part of the model; non-aligned happens
This is serious: alignment model neglects 1/3 of target words
Alignment is very asymmetric, therefore combination


Alignment Errors for Most Frequent Words (CH-EN)


Sentence Length Distribution

Sentences are often unbalanced
  Wrong sentence alignment
  Bad translations
  But also language divergences

May want to remove unbalanced sentence pairs
Sentence length model very weak

SL   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
     1 5 9 19 34 43 47 31 21 16 7 2 4 2 0 1

Table: Target sentence length distribution for source sentence length 10
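Removing unbalanced sentence pairs is commonly done with a length-ratio filter. The sketch below is one possible version; the threshold `max_ratio=2.0` and the function name are assumptions for illustration, not values from the lecture.

```python
def is_balanced(src_len, tgt_len, max_ratio=2.0):
    """Keep a sentence pair only if neither side is much longer than the other."""
    if src_len == 0 or tgt_len == 0:
        return False                 # empty sides cannot be aligned
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio

pairs = [(10, 11), (10, 4), (10, 20), (10, 0)]
kept = [pair for pair in pairs if is_balanced(*pair)]
```

A threshold that is too strict also discards genuine language divergences, so the ratio is best tuned per language pair.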


Summary

Word Alignment Models
  Alignment is (mathematically) a function, i.e. many source words to 1 target word, but not the other way round
  Symmetry by training in both directions

Model IBM1
  word-word probabilities
  Simple training with Expectation-Maximization

Model IBM2
  Position alignment
  Training also with EM

Model HMM
  Relative positions (first order model)
  Training with Viterbi or Forward-Backward Algorithm

Alignment errors reflect restrictions in generative alignment models
