Transcript of "Language and Translation Model Adaptation using Comparable Corpora" (EMNLP 2008)
-
Language and Translation Model Adaptation
using Comparable Corpora
Matthew Snover, Bonnie Dorr, and Richard Schwartz
1
-
Monolingual Data in MT
• Limited primarily to language model estimation
• Parallel data to train TM is very expensive
• Monolingual data is very cheap and easy to acquire
• How can we better exploit monolingual data?
• News stories are repeated across languages, giving us greater context for translation
• Even without repetition of news stories, monolingual data gives greater context for stories
2
-
Comparable Documents
• Reference Translation:
Cameras are flashing and reporters are following up, for Hollywood star Angelina Jolie is finally talking to the public after a one-month stay in India, but not as a movie star. The Hollywood actress, goodwill ambassador of the United Nations high commissioner for refugees, met with the Indian minister of state for external affairs, Anand Sharma, here today, Sunday, to discuss issues of refugees and children. ... Jolie, accompanied by her five-year-old son, Maddox, visited the refugee camps that are run by the Khalsa Diwan Society for social services and the high commissioner for refugees Saturday afternoon after she arrived in Delhi. Jolie has been in India since October 5th shooting the movie "A Mighty Heart," which is based on the life of Wall Street Journal correspondent Daniel Pearl, who was kidnapped and killed in Pakistan. Jolie plays the role of Pearl's wife, Mariane.
• Comparable Document:
Actress Angelina Jolie hopped onto a crowded Mumbai commuter train Monday to film a scene for a movie about slain journalist Daniel Pearl, who lived and worked in India's financial and entertainment capital. Hollywood actor Dan Futterman portrays Pearl and Jolie plays his wife Mariane in the "A Mighty Heart" co-produced by Plan B, a production company founded by Brad Pitt and his ex-wife, actress Jennifer Aniston. Jolie and Pitt, accompanied by their three children -- Maddox, 5, 18-month-old Zahara and 5-month-old Shiloh Nouvel -- arrived in Mumbai on Saturday from the western Indian city Pune where they were shooting the movie for nearly a month. ...
3
-
Previous Work: Exploiting Monolingual Data
• New word-to-word translations from comparable (not parallel) data [ Fung and Yee, 1998; Rapp 1999 ]
• Find parallel text by mining monolingual data in multiple languages [Resnik and Smith 2003; Munteanu and Marcu 2005]
• Re-weight portions of language model data using CLIR techniques [Kim and Khudanpur 2003; Zhao et al. 2004; Kim 2005]
• Similar techniques have been used for weighting bi-text
4
-
Our Approach
• For each document to be translated:
  • Find comparable documents in target data
  • Adapt language model using comparable documents (following Kim 2005)
  • Add new phrasal translation rules based on source text and comparable documents
• Use existing language model training for monolingual data
5
-
Outline
• Comparable Data Selection using CLIR
• Model Adaptation
  • Language Model Adaptation
  • Translation Model Adaptation
• Experimental Results
• Discussion
6
-
Outline
• Comparable Data Selection using CLIR
• Model Adaptation
  • Language Model Adaptation
  • Translation Model Adaptation
• Experimental Results
• Discussion
7
-
Cross Lingual Information Retrieval
• Find English documents comparable to foreign documents using CLIR
• Established method of locating relevant documents across languages
• For each foreign document, query a database and return a ranked and scored list of relevant English documents
• Adaptation method is independent of CLIR method
8
-
Somewhat Comparable
... An instrument aboard one of the two NASA rovers en route to Mars has malfunctioned, prompting worries it could harm the robot's information-gathering ability, a scientist said. ...

[Diagram] Foreign Query → CLIR Scoring against Monolingual English Documents (also the LM training data) → Ranked List: English Doc 1, English Doc 2, English Doc 3, English Doc 4, ...

Foreign Source
... 火星 探测 车 任务 的 主管 塞 辛格 说 : "我们 发现 探测 车 有 很 严重 异常 现象." ...
(Gloss: "The head of the Mars rover mission, Theisinger, said: 'We have found that the rover has a very serious anomaly.'")

Comparable
... NASA scientists said on Thursday they had lost contact with the Mars Spirit rover for more than 24 hours, describing the problem as "a very serious anomaly." ...
9
-
CLIR
• Use multiple translations for each foreign word
• Assign scores to documents and return ranked list

Pr(Doc is rel | Q) = Pr(Doc is rel) · Pr(Q | Doc is rel) / Pr(Q)

Pr(Q | Doc) = ∏_{f∈Q} [ a · Pr(f|F) + (1 − a) · ∑_{e∈E} Pr(e|Doc) · Pr(f|e) ]

[Xu et al., 2001]
10
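The scoring model above can be sketched as follows. The function name, the value of the smoothing weight a, and the toy probability tables are illustrative assumptions, not values from the talk:

```python
from collections import Counter

def score_document(query_f, doc_e, p_f_given_F, p_f_given_e, a=0.3):
    """Pr(Q|Doc) as on the slide: for each foreign query word f, mix a
    general foreign-language probability Pr(f|F) with the probability
    that some English word e in the document translates to f.
    The smoothing weight a here is an illustrative value."""
    counts = Counter(doc_e)
    total = sum(counts.values())
    score = 1.0
    for f in query_f:
        # sum_e Pr(e|Doc) * Pr(f|e), with Pr(e|Doc) a unigram estimate
        trans = sum((c / total) * p_f_given_e.get((f, e), 0.0)
                    for e, c in counts.items())
        score *= a * p_f_given_F.get(f, 1e-9) + (1 - a) * trans
    return score
```

Documents whose words translate to the query words receive higher Pr(Q|Doc); ranking documents by this score yields the ranked list shown on the previous slide.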
-
Long Comparable Documents
• Favors longer English documents
• Long documents tend to be about many topics
  • Bad for improving MT
• Solution: Break documents into short (overlapping) passages of ~300 words each
• Set of top N passages is the bias-text

∑_{e∈E} Pr(e|Doc) · Pr(f|e)
11
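The passage splitting can be sketched as below; the 50% overlap (stride of 150 words) is an assumption, since the slide only specifies short overlapping passages of ~300 words:

```python
def split_passages(tokens, size=300, stride=150):
    """Break a tokenized document into overlapping fixed-size passages
    so CLIR retrieves topically focused text instead of long,
    multi-topic documents. The stride (overlap) is an assumption."""
    passages = []
    start = 0
    while True:
        passages.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last passage reaches the end of the document
        start += stride
    return passages
```

Each passage is then scored independently by the CLIR model, and the top N passages form the bias-text.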
-
Outline
• Comparable Data Selection using CLIR• Model Adaptation
• Language Model Adaptation• Translation Model Adaptation
• Experimental Results• Discussion
12
-
Language Model Biasing
• New LM generated from comparable passages
• Interpolated with original generic LM using very low weight (0.01)
• New LM is small & specific (~3000 words)

Pr(e) = (1 − λ) · Pr_g(e) + λ · Pr_b(e)
13
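The interpolation can be sketched as follows; the probability lookups are stand-ins for real n-gram LM queries, and the function name is illustrative:

```python
import math

def biased_logprob(word, context, p_generic, p_bias, lam=0.01):
    """Pr(e) = (1 - lambda) * Pr_g(e) + lambda * Pr_b(e).
    With lam = 0.01 the small, document-specific bias LM can only
    nudge the generic LM's scores, never dominate them."""
    p = (1 - lam) * p_generic(word, context) + lam * p_bias(word, context)
    return math.log(p)
```

Because the bias LM is trained on only a few thousand words, words it has seen get a noticeable boost even at λ = 0.01, while words it has never seen are barely affected.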
-
Example Improvement
14
Reference
the pope said, "each life should and needs to be guarded and developed."
Original Translation
"each and every individual should need of safeguarding and developing," the pope said.
Portion of Comparable Text
"Every human life, as it is, deserves and demands to always be defended and promoted," the pope said on the day the Catholic Church celebrates annually as a "Day of Life".
Biased Translation
"every life should and must be defended and promoted," the pope said.
-
Outline
• Comparable Data Selection using CLIR• Model Adaptation
• Language Model Adaptation• Translation Model Adaptation
• Experimental Results• Discussion
15
-
Implementation of Translation Model Adaptation
• Naïve assumption: same as IBM Model 1
• Every phrase (1-3 words) in source can translate to every phrase (1-3 words) in comparable text
• Each rule has uniform probability
• Use only those words and phrases from bias-text that occur ≥ k times (k = 2 in these experiments)
16
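The rule generation described above can be sketched as follows, assuming whitespace-tokenized text (the function names are illustrative):

```python
from collections import Counter

def phrases(tokens, max_n=3):
    """All contiguous phrases of 1 to max_n words."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def bias_rules(source_tokens, bias_tokens, k=2):
    """Model-1-style adaptation: every source phrase may translate to
    every bias-text phrase seen at least k times, all with uniform
    probability (represented here as a constant weight of 1.0)."""
    counts = Counter(phrases(bias_tokens))
    targets = [t for t, c in counts.items() if c >= k]
    return {(s, t): 1.0 for s in set(phrases(source_tokens)) for t in targets}
```

With uniform probabilities the decoder cannot distinguish good bias rules from bad ones lexically, which is why (next slide) the rules are marked as special 'bias rules' with their own tuned weight.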
-
Implementation of TM Adaptation
• New translation rules differ only in lexical probability
• Added to the decoder as phrasal rules but marked as special 'bias rules'
• Different weights for 'bias rules' and 'generic rules'
• Bias rule weight optimized alongside all other weights
17
-
Outline
• Comparable Data Selection using CLIR
• Model Adaptation
  • Language Model Adaptation
  • Translation Model Adaptation
• Experimental Results
• Discussion
18
-
Base MT System
• State-of-the-art hierarchical MT system (BBN's HierDec; Shen et al., 2008)
• Arabic-to-English
• LM: decode with 3-gram & rescore with 5-gram
• Optimized with BLEU
• Performance measured using BLEU and TER
• 10 most comparable passages used to create bias-text
19
-
Data Sets
• Monolingual Comparable Data (all in LM training)
  • English Gigaword: 2.8 billion words
  • FBIS corpus: 28.5 million words
  • News Archive Data from Web: 828 million words
• Tuned on portions of MT04, MT05, GALE07 newswire (48,921 words)
• Test on MT06 (55,578 words)
  • 4 reference translations
20
-
Less Commonly Taught Languages Scenario
• TM adaptation should prove most beneficial when less bi-text is available
• Simulate LCTL with Arabic by reducing bi-text training set
• 5 million words of newswire training
• Full monolingual data used
21
-
LCTL Scenario: Measuring the Upper Bound of TM Adaptation
• Best possible comparable data would be parallel data
• Use reference translations to simulate
• Ideally TM adaptation could determine which source words should align to which target words
• Align source to references with GIZA++
• Discard rule probabilities
• We can also limit bias rules to target sides that occur in the top 100 passages from comparable data
22
-
LCTL Scenario: TM Adaptation from Aligned References

                        Tune TER  MT06 TER  Tune BLEU  MT06 BLEU
No Adaptation             49.84     55.16      40.80      34.68
Aligned Reference         36.92     45.17      58.41      52.16
Overlap w/ Comparable     41.79     48.99      51.38      43.35
23
-
LCTL Scenario: Measuring a Weaker Upper Bound of TM Adaptation
• Fair TM adaptation doesn't align source and bias-text
• Use only reference text without alignments
• Every phrase in reference document aligns to every phrase in source document
• Uniform probabilities of rules (as before)
• We also limit bias rules to target sides that occur in the top 100 passages from comparable data
• Upper bound for our method if you found 'perfect' comparable documents
24
-
LCTL Scenario: TM Adaptation from Unaligned References

                        Tune TER  MT06 TER  Tune BLEU  MT06 BLEU
No Adaptation             49.84     55.16      40.80      34.68
Unaligned Reference       44.92     52.54      45.66      39.90
Overlap w/ Comparable     48.08     53.90      43.13      36.95
25
-
Fair LCTL Adaptation

                Tune TER  MT06 TER  Tune BLEU  MT06 BLEU
No Adaptation     49.84     55.16      40.80      34.68
LM Adapt          49.22     55.59      41.40      34.90
TM Adapt          48.08     55.45      41.69      34.78
LM&TM Adapt       48.88     55.09      42.44      35.36
26
-
Full Training Scenario
• LCTL: Only small gain (0.68 BLEU) on MT06 test set
• Can we benefit if we use all our data?
• Full Training: 230M words of bi-text in 18.5M segments
• Includes parallel data extracted from comparable data by ISI (LDC2007T08)
27
-
Full Training Adaptation

                Tune TER  MT06 TER  Tune BLEU  MT06 BLEU
No Adaptation     43.39     51.46      46.61      38.52
LM Adapt          42.27     51.40      48.57      39.17
TM Adapt          43.51     51.20      46.57      39.89
LM&TM Adapt      42.45     50.82      48.82      40.59
28
-
LCTL vs Full Training Gains
• LCTL Training: Gain 0.68 BLEU & 0.07 TER
• Full Training: Gain 2.07 BLEU & 0.58 TER
• Counter-intuitive: Larger gains with full training
• Better lexical probability estimates with full training
• Better generic models to aid adaptation

                TER Gain  BLEU Gain
LCTL Training       0.07       0.68
Full Training       0.58       2.07
29
-
Discussion
• Exploit monolingual data to adapt both LM and TM
• Using no new information:
  • Comparable data is already part of the LM training data
  • CLIR uses the generic TM
• TM adaptation uses a very simple method to generate bias rules
• Gives substantial gains on the newswire test set
• What is the effect of the level of comparability?
• Can this be applied to less structured data (web or audio)?
30
-
Questions
31