Page 1

Training a Parser for Machine Translation Reordering

Jason Katz-Brown Slav Petrov Ryan McDonald Franz Och

David Talbot Hiroshi Ichikawa Masakazu Seno Hideto Kazawa

Page 2

Two juicy bits

1. Method for improving performance of any parser on any extrinsic metric

2. Experimental results in training parsers for machine-translation reordering

Page 3

Our reordering system

• Syntactic reordering during preprocessing at training and decoding time (Collins et al., 2005)

• Translation quality sensitive to reordering quality

• Reordering quality sensitive to parse quality

[Figure: dependency parse of the input English "I saw her" (I/PRP saw/VBD her/PRP; dependencies nsubj, ROOT, dobj). Reordering the verb after its arguments produces Japanese word order: "I her saw".]

Page 4

Evaluating reordering quality

Source: Wear sunscreen that has an SPF of 15 or above

Japanese-y reference: 15 or above of an SPF has that sunscreen Wear

References made by bilingual non-linguist annotators → cheap and fast

Page 5

Evaluating reordering quality

Source: Wear sunscreen that has an SPF of 15 or above

Japanese-y reference: 15 or above of an SPF has that sunscreen Wear

Experiment: 15 or above of an SPF has that Wear sunscreen

Experiment reordering score: 1 - (3-1)/(11-1) = 0.8

More detail: Talbot et al., EMNLP-WMT 2011
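As a rough sketch of how such a score can be computed (assuming the formula above, 1 - (chunks - 1)/(words - 1), where a chunk is a maximal run of words in the same relative order as the reference; the actual metric of Talbot et al. (2011) handles duplicate words and other details properly):

```python
# Sketch of a chunk-based reordering score: 1 - (chunks - 1) / (words - 1),
# where a chunk is a maximal run of hypothesis words that are also consecutive
# in the reference. Assumes each word occurs once (a simplification).

def reordering_score(hypothesis, reference):
    ref_pos = {word: i for i, word in enumerate(reference)}
    positions = [ref_pos[word] for word in hypothesis]
    chunks = 1 + sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)
    if len(hypothesis) < 2:
        return 1.0
    return 1.0 - (chunks - 1) / (len(hypothesis) - 1)

print(reordering_score("a b c d".split(), "a b c d".split()))  # same order -> 1.0
print(reordering_score("d c b a".split(), "a b c d".split()))  # reversed   -> 0.0
```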

Page 6

Let’s make a magical treebank!

Page 7

Let’s make a magical treebank!

• Reordering quality is predictive of parse quality

Page 8

Let’s make a magical treebank!

• Reordering quality is predictive of parse quality

• So, what if we make a treebank that contains only parse trees that get a good reordering score?

Page 9

Let’s make a magical treebank!

• Reordering quality is predictive of parse quality

• So, what if we make a treebank that contains only parse trees that get a good reordering score?

• We can search through a large n-best list of parses from a baseline parser for the one with the best reordering score.
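Concretely, the selection step might look like the following sketch (parse_nbest, reorder, and reordering_score stand in for interfaces that are not part of these slides):

```python
# From the baseline parser's ranked n-best list, keep the parse whose
# reordering best matches the reference word order. The three callables
# are assumed interfaces, not the actual system's API.
def best_parse_for_reordering(parse_nbest, reorder, reordering_score,
                              sentence, reference_order, n=100):
    nbest = parse_nbest(sentence, n)  # ranked n-best parses
    return max(nbest, key=lambda tree: reordering_score(reorder(tree), reference_order))
```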

Page 10

Targeting good reordering

Parse #1: Wear/NN sunscreen/NN that/WDT has/VBZ ... (dependencies: nn, ROOT, ref, rcmod)
  Reordered: ... has that Wear sunscreen
  Score: 0.8 ("Wear" is out of place)

Parse #23: Wear/VB sunscreen/NN that/WDT has/VBZ ... (dependencies: ROOT, dobj, ref, rcmod)
  Reordered: ... has that sunscreen Wear
  Score: 1.0 (matches reference)

Page 11

Sweet

• This could work for any extrinsic metric!

Page 12

Self-Training

• McClosky et al. (2006) showed that self-training can improve intrinsic parse accuracy, e.g. labeled attachment score (LAS)

• For every sentence in some new corpus:
  1. Parse the sentence with a baseline parser.
  2. Add the parse to the baseline parser's training data.

Page 13

Targeted Self-Training

• We target the self-training to incorporate an extrinsic metric

• For every sentence in some new corpus:
  1. Parse the sentence with a baseline parser and produce a ranked n-best list.
  2. Select the parse with the highest extrinsic metric.
  3. Add the parse to the baseline parser's training data.
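In code, the whole loop might look like this sketch (again, parse_nbest, reorder, and score are assumed interfaces standing in for the baseline parser and the reordering system):

```python
# Targeted self-training: for each sentence, keep the n-best parse that the
# extrinsic (reordering) metric likes best, then retrain on those trees.
def targeted_self_training_data(parse_nbest, reorder, score,
                                corpus, references, n=100):
    selected = []
    for sentence, reference_order in zip(corpus, references):
        nbest = parse_nbest(sentence, n)  # ranked n-best parses from the baseline parser
        # Same selection step as sketched earlier: the metric-best parse wins.
        best = max(nbest, key=lambda tree: score(reorder(tree), reference_order))
        selected.append((sentence, best))
    return selected  # add these trees to the baseline parser's training data and retrain
```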

Page 14

Experimental Setup

• Two parser models
  • Deterministic shift-reduce parser (Nivre et al.) + TnT tagger (Brants)
  • Latent variable parser (BerkeleyParser; Petrov et al. '06)

• English→SOV reordering rules of Xu et al. (2009)
  • Precedence rules for how to reorder the children of a dependency tree node (see the sketch below)
  • Suitable for English to Japanese, Korean, Turkish, Hindi, ...

• Stanford converter to convert constituency trees to dependency trees (de Marneffe et al., 2006)
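To illustrate how precedence rules can drive the reordering, here is a minimal sketch that recursively sorts the children of each dependency node (the Node class, the HEAD marker, and the toy rule table are illustrative assumptions, not the actual rules of Xu et al., 2009):

```python
# Precedence-rule reordering over a dependency tree: each head POS tag maps
# dependency labels (and the head itself) to a precedence; children without a
# rule keep their relative order and sort after the ruled ones.
from dataclasses import dataclass, field

HEAD = "HEAD"  # precedence key for the head word itself

# Toy rule: for a verb head, subject and object come before the verb (SOV).
PRECEDENCE = {"VB": {"nsubj": 0, "dobj": 1, HEAD: 2}}

@dataclass
class Node:
    word: str
    tag: str
    label: str = "ROOT"
    children: list = field(default_factory=list)

def reorder(node):
    """Emit the subtree's words in target order, applying per-head-tag rules."""
    rules = PRECEDENCE.get(node.tag, {})
    def prec(label):
        return rules.get(label, len(rules))  # unruled labels sort after ruled ones
    items = [(prec(child.label), i, reorder(child)) for i, child in enumerate(node.children)]
    items.append((prec(HEAD), len(node.children), [node.word]))
    return [word for _, _, words in sorted(items) for word in words]

# The earlier example: "I saw her" -> "I her saw"
tree = Node("saw", "VB", "ROOT",
            [Node("I", "PRP", "nsubj"), Node("her", "PRP", "dobj")])
print(" ".join(reorder(tree)))  # prints: I her saw
```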

Page 15

Training on questions

• QuestionBank (Judge et al. '06)

• 1000-sentence train set, 1000-sentence test set

• Compare training on:
  1. WSJ
  2. WSJ + 1000 QuestionBank trees
  3. WSJ + 1000 reference reorderings of the same sentences

Page 16

Question reordering scores

[Bar chart: reordering scores on the question test set]

  Shift-Reduce:  WSJ 0.663,  WSJ+Treebank 0.733,  WSJ+Targeted Self-Training 0.768
  Berkeley:      WSJ 0.775,  WSJ+Treebank 0.779,  WSJ+Targeted Self-Training 0.800

Page 17

Question attachment scores

[Scatter plot: labeled attachment score (LAS) vs. reordering score on the question test set for the Shift-Reduce and Berkeley parsers, comparing treebank training with targeted self-training; both axes run from 0.6 to 0.9.]

Page 18

Question translation evaluations

             BLEU                    Human eval (scores 0-6)
English to   WSJ-only   Targeted    WSJ-only   Targeted    Sig. diff?
Japanese     0.2379     0.2615      2.12       2.94        yes

Page 19

Training on web text

• 13595 English sentences sampled from web

• 6268-sentence train set, 7327-sentence test set

• Compare training on:
  • WSJ + self-training on 6268 sentences
  • WSJ + targeted self-training on 6268 reference reorderings

Page 20

Web text reordering scores

[Bar chart: reordering scores on the web text test set for the Shift-Reduce and Berkeley parsers under WSJ, WSJ+Self-Training, and WSJ+Targeted Self-Training; plotted values are 0.756, 0.757, 0.777, 0.78, 0.785, and 0.79 (reordering score axis from 0.6 to 0.85).]

Page 21

Web text attachment scores

[Scatter plot: labeled attachment score (LAS, 0.8 to 0.9) vs. web text reordering score (0.75 to 0.8) for the Shift-Reduce and Berkeley parsers, with the WSJ baselines marked.]

Page 22

Web text translation evaluations

             BLEU                    Human eval (scores 0-6)
English to   WSJ-only   Targeted    WSJ-only   Targeted    Sig. diff?
Japanese     0.1777     0.1802      2.56       2.69        yes (95%)
Turkish      0.3229     0.3259      2.61       2.70        yes (90%)
Korean       0.1344     0.1370      2.10       2.20        yes (95%)

Page 23

Related work

• Perceptron model for an end-to-end MT system, updating alignment parameters based on which n-best translation gives the best BLEU score (Liang et al., 2006)

• Weakly or distantly supervised structured prediction (e.g. Chang et al., 2007)

• Re-training with bilingual data jointly parsed with multiview learning objective (Burkett et al., 2010)

• Up-training (Petrov et al., 2010)

• “Training dependency parsers by jointly optimizing multiple objectives” (Hall et al., EMNLP 2011)

Page 24

Contributions

• Introduced targeted self-training, which
  • adapts any parser to any extrinsic metric
  • improves translation quality significantly when applied to word reordering