Efficient solutions for word reordering in German-English phrase-based SMT
Arianna Bisazza & Marcello Federico – FBK (Italy)
Bisazza & Federico – Efficient solutions for word reordering in De-En PSMT
Outline
• Why German-English?
• Why phrase-based SMT?
• Goal of this work
• Techniques to achieve it:
1. early distortion cost
2. word-after-word reordering pruning
• Experiments & discussion
Why German-English?
Jedoch konnten sie Kinder in Teilen von Helmand und Kandahar im Süden aus Sicherheitsgrund nicht erreichen.
But they could not reach children in parts of Helmand and Kandahar in the south for security reasons.
German word order
• Discontinuous verb phrases: the main verb is far from the inflected auxiliary or modal
• Verb-second order vs. English SVO
• Clause-final verb in subordinate clauses
Long-range reordering of isolated words or short phrases is frequent and important for translation quality!
Why phrase-based SMT?
• Shallow modeling: learns direct correspondences between surface forms in two languages
• Versatile, cost-effective
• Compared with hierarchical SMT: smaller models, faster decoding, very competitive for translating between similar languages
Most popular framework in SMT production scenarios today
Problem: it doesn't handle long-range reordering well! (Typical configurations use a distortion limit (DL) of 6 to 10.)
Goal of this work
Improve handling of large reordering search spaces in PSMT.
How?
1. anticipate payment of the distortion penalty for long backward jumps
2. dynamically prune unlikely long jumps before they are performed
Better translation quality and faster decoding at high distortion limits
How (1): Early Distortion Cost [Moore & Quirk 2007]
[Figure: running distortion total (TotDisto) over one example sequence of jumps, standard vs. early distortion cost]
Standard: pay jump cost when jumping
Early: accumulate cost gradually before jumping
Very important for handling long backward jumps
Implemented in Moses
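The difference between paying on the jump and paying early can be sketched as follows. This is a minimal Python sketch, assuming a four-case formulation of the early cost in the spirit of Moore & Quirk (2007); the function names and the example translation order are hypothetical and are not taken from the slides or from the Moses source.

```python
def standard_cost(prev_end, cur_start):
    """Standard distortion: pay the whole jump when it is performed."""
    return abs(cur_start - prev_end - 1)


def early_cost(first_gap, prev_end, cur_start, cur_end):
    """Early distortion (assumed four-case form): charge a forward jump
    for the backward jump it will later make necessary."""
    length = cur_end - cur_start + 1
    prefix_end = first_gap - 1          # end of the fully translated prefix
    if cur_start == prefix_end + 1:     # phrase starts at the first gap
        return 0
    if cur_end < prev_end:              # phrase lies left of the previous one
        return 2 * length
    if prev_end <= prefix_end:          # forward jump away from the prefix
        return 2 * (cur_start - prefix_end - 1 + length)
    return 2 * (cur_start - prev_end - 1 + length)  # forward jump from elsewhere


def running_totals(phrases):
    """Cumulative (standard, early) costs for a list of (start, end) spans."""
    covered, prev_end = set(), -1
    std_total = early_total = 0
    std_run, early_run = [], []
    for start, end in phrases:
        first_gap = 0
        while first_gap in covered:     # first still-untranslated position
            first_gap += 1
        std_total += standard_cost(prev_end, start)
        early_total += early_cost(first_gap, prev_end, start, end)
        covered.update(range(start, end + 1))
        prev_end = end
        std_run.append(std_total)
        early_run.append(early_total)
    return std_run, early_run


# Hypothetical order: skip ahead twice, then jump back to fill the gaps.
order = [(0, 0), (2, 2), (4, 4), (1, 1), (3, 3), (5, 5)]
std_run, early_run = running_totals(order)
print(std_run)    # [0, 1, 2, 6, 7, 8]
print(early_run)  # [0, 4, 8, 8, 8, 8]
```

In this example both schemes end at the same total, but the early cost has charged the full amount by the third step, so a hypothesis that skips words is penalized before it reaches the long backward jump.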
How (2): Word-after-word reordering pruning
• New reordering models are designed every year, but the problem of long reordering is still unsolved
• Existing word reordering models are not perfect, but they are expected to guide search over huge search spaces
… let's refine the reordering search space!
… then …
Standard search: explore all jumps within fixed DL
Our method: only explore long reorderings that are likely according to the reordering model
[Figure: jumps within DL=6, with reordering (reo.) scores 0.2 0.2 0.4 0.6 0.6 0.2 0.7 0.4]
[Figure: jumps within DL=6, with reo. scores 0.2 0.2 0.4 0.6 0.6 0.7]
ϑ=2: "safe zone" that is always explored
Rationale:
- don't waste time exploring unlikely long jumps
- fewer hypotheses in the stack => less risk of search/model errors
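The pruning idea can be sketched in a few lines of Python. The slides do not state the exact pruning criterion, so this sketch assumes a fixed score threshold (`min_score`); that parameter, the function name and the example scores are all hypothetical.

```python
def allowed_next_positions(current, reo_scores, dl, safe_zone, min_score):
    """Source positions that may be translated right after `current`.

    reo_scores maps a candidate position to the reordering model's score
    for translating it right after `current` (hypothetical values).
    """
    allowed = []
    for pos in sorted(reo_scores):
        distortion = abs(pos - current - 1)
        if distortion > dl:
            continue                      # outside the distortion limit
        if distortion <= safe_zone or reo_scores[pos] >= min_score:
            allowed.append(pos)           # safe zone, or a likely long jump
    return allowed


# Assumed example: last translated word at position 2, seven candidates.
scores = {3: 0.2, 4: 0.2, 5: 0.4, 6: 0.6, 7: 0.2, 8: 0.7, 9: 0.4}
print(allowed_next_positions(2, scores, dl=6, safe_zone=2, min_score=0.5))
# [3, 4, 5, 6, 8]
```

Standard search would explore all seven candidates within the distortion limit; here positions 7 and 9 are dropped because they lie outside the safe zone and score below the assumed threshold.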
Ad hoc reordering model:
a maximum-entropy (max-ent) binary classifier that predicts whether a given input word should be translated right after another
[Figure: example reo. scores 0.2 0.2 0.4 0.6 0.6]
Binary features, extracted from the local context of the starting/landing positions using surface form, POS or chunk labels
(Model details & features in the TACL paper)
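As a rough illustration of how such a classifier turns binary features into a score: a binary max-ent model is equivalent to logistic regression, so the score is a sigmoid of the summed weights of the features that fire. The feature names and weights below are invented for illustration and are not the ones used in the paper.

```python
import math


def waw_score(active_features, weights, bias=0.0):
    """Probability that word j is translated right after word i,
    given the binary features that fire for the pair (i, j)."""
    z = bias + sum(weights.get(f, 0.0) for f in active_features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature names and weights, for illustration only.
weights = {
    "pos_i=VMFIN&pos_j=VVINF": 1.5,
    "word_j=nicht": 0.4,
    "pos_j=APPR": -0.8,
}
p = waw_score({"pos_i=VMFIN&pos_j=VVINF", "word_j=nicht"}, weights)
print(round(p, 3))
```

Features absent from the weight table contribute nothing, so an unseen context falls back to the bias alone.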
Experiments
• Experiments on WMT test sets (09-11) using WMT-10 training data
• Systems based on Moses, including state-of-the-art hierarchical lexicalized reordering models [Koehn & al 05; Galley & Manning 08]
• Contrastive experiment with hierarchical SMT:
- standard Moses configuration
- decoding-time span constraint = 10 or 20
• Evaluation by:
- BLEU for lexical match & local order
- KRS (Kendall Reordering Score) for global order; same as LRscore with α=1 [Birch & al. 2010]
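To give a feel for what a Kendall-based reordering score measures, here is a minimal Python sketch. It assumes the square-root-normalized Kendall's tau distance, takes the word-order permutation as given (in practice it would be derived from word alignments) and omits any brevity penalty, so it is an approximation of the metric and not the evaluation code used in the experiments.

```python
import math
from itertools import combinations


def kendall_reordering_score(permutation):
    """1.0 for a monotone order, 0.0 for a fully inverted one
    (assumed square-root form of the Kendall's tau distance)."""
    n = len(permutation)
    if n < 2:
        return 1.0
    # A pair is discordant when the earlier item has the larger rank.
    discordant = sum(1 for a, b in combinations(permutation, 2) if a > b)
    total_pairs = n * (n - 1) / 2
    return 1.0 - math.sqrt(discordant / total_pairs)


print(kendall_reordering_score([0, 1, 2, 3]))  # 1.0
print(kendall_reordering_score([3, 2, 1, 0]))  # 0.0
```

Because every out-of-order word pair counts, a single word moved a long way lowers the score more than a local swap does, which is why such a score is used for global order.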
Exps (1): Early Distortion Cost [Moore & Quirk 2007]
[Figure: BLEU and KRS at distortion limits 8, 12 and 18, early vs. standard distortion cost. Data labels: BLEU 19.2 / 18.6 / 17.8 and 18.9 / 17.9 / 16.7; KRS 67.4 / 64.5 / 59.6 and 67.2 / 62.7 / 55.9]
Large improvement in reordering under high DL, but some loss is still there
Included in the following experiments
Exps (2): Word-after-word reordering pruning
Translation Quality
[Figure: KRS plotted against BLEU for DL8, DL18, DL18+WaWprun, hiero10 and hiero20]
+1.3 BLEU, +5.0 KRS
- WaWprune: non-prunable zone of width ϑ=5
- More metrics in the paper
Decoding Time (ms/word)
DL8: 202
DL18: 408
DL18+WaWprun: 142
hiero10: 406
hiero20: 706
What’s going on?
Hypotheses created: scored by all models and added to stack
Long-range phrase-to-phrase jumps performed for every 100 sentences of the test set
Example
SRC Jedoch konnten sie Kinder in Teilen von Helm. und Kand. im Süden aus Sicherheitsgrund nicht erreichen.
REF But they could not reach children in parts of Helm. and Kand. in the south for security reasons.
DL8 However, they were children in parts of Helm. and Kand. in the south, for security reasons.
DL18 However, they were children in parts of Helm. and Kand. in the south do not reach for security reasons.
+WaW However, they could not reach children in parts of Helm. and Kand. in the south for security reasons.
H10 However, they were children in parts of Helm. and Kand. in the south not reach for security reasons.
H20 However, they were children in parts of Helm. and Kand. in the south not reach for security reasons.
Conclusions
• Long-range reordering in PSMT can be made possible by:
- using a better distortion cost function (Moore & Quirk 2007)
- dynamically refining the reordering search space, i.e. only exploring long jumps that are "promising"
• Results:
- long jumps captured, similar BLEU, higher KRS
- faster decoding
• Narrowed the gap between PSMT and hiero, with faster decoding and smaller models
• Reordering pruning can be tried with other kinds of reo. models
• Can benefit other language pairs with isolated long-range reorderings (e.g. Arabic-English)
Thanks for your attention!
Part 2: Word-after-word reordering modeling and pruning [Bisazza & Federico 2013]
Idea: dynamically prune unlikely long reordering steps before performing them
Method: train a binary classifier to learn if an input word should be translated right after another
At decoding time, only explore long reorderings that are likely according to this model
Jedoch konnten sie Kinder in Teilen von … nicht erreichen .
no no no yes no no no no no
Jedoch konnten sie Kinder in Teilen von … nicht erreichen .
0.1 0.1 0.2 0.3 0.4 0.2 0.7 0.4 0.1
Short-to-medium zone always explored