Slides: research.nii.ac.jp/.../03-NTCIR9-PATENTMT-SudohK_slides.pdf

NTT-UT SMT System for NTCIR-9 PatentMT

Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, Masaaki Nagata
NTT Communication Science Laboratories, Kyoto, Japan

Xianchao Wu, Takuya Matsuzaki, Jun’ichi Tsujii
The University of Tokyo, Tokyo, Japan

Overview

Task                   | English-Japanese                    | Japanese-English           | Chinese-English
Single-system features | Pre-ordering, Big LM, WFST decode   | Pre-ordering               | WA Adaptation
Additional feature     | Sys. Comb. + U-Tokyo forest-to-tree | Sys. Comb. + U-Tokyo HPBMT | Sys. Comb. + U-Tokyo HPBMT
Rank                   | 1st                                 | 5th                        | 9th

• Today’s focus: English-Japanese
• Better than RBMT even in subjective evaluation!

Head Finalization for En-Ja pre-ordering

• Isozaki et al. (WMT 2010)
• Moving heads to the right-hand side on an HPSG tree
  • English HPSG parser “Enju” (U-Tokyo)
• Pseudo-word insertion for Japanese particles
  • Predicate-argument structure by Enju
• Determiner (a/an/the) deletion

Head Finalization Example

Source: I lost my wallet in the airport yesterday

• Move heads
• Remove a, an, the
• Insert pseudo-particles for subjects & objects (_va0, _va2)

Head-finalized: I _va0 yesterday airport in my wallet _va2 lost
Japanese: 私 は 昨日 空港 で 私 の 財布 を なくし た

Monotone translation!!
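The three steps in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `head` and `role` fields, and the hand-built tree are hypothetical stand-ins for what an HPSG parser such as Enju would provide, and the mapping of subjects to `_va0` and objects to `_va2` follows the example sentence only.

```python
from dataclasses import dataclass
from typing import List, Optional

DETERMINERS = {"a", "an", "the"}
# Assumed mapping from argument role to pseudo-particle (from the slide example).
PARTICLE = {"subj": "_va0", "obj": "_va2"}


@dataclass
class Node:
    """Hypothetical binary tree node; leaves carry a word."""
    word: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    head: str = "right"          # which child is the syntactic head
    role: Optional[str] = None   # "subj" / "obj", hand-annotated here


def leaf(word: str, role: Optional[str] = None) -> Node:
    return Node(word=word, role=role)


def branch(left: Node, right: Node, head: str, role: Optional[str] = None) -> Node:
    return Node(left=left, right=right, head=head, role=role)


def head_finalize(node: Node) -> List[str]:
    """Return tokens with every head moved to the right of its sibling."""
    if node.word is not None:
        # Determiner deletion happens at the leaves.
        tokens = [] if node.word.lower() in DETERMINERS else [node.word]
    else:
        first, second = node.left, node.right
        if node.head == "left":          # head-initial: swap the children
            first, second = second, first
        tokens = head_finalize(first) + head_finalize(second)
    if node.role in PARTICLE:            # pseudo-particle after the argument
        tokens.append(PARTICLE[node.role])
    return tokens


# Hand-built tree for "I lost my wallet in the airport yesterday".
tree = branch(
    leaf("I", role="subj"),
    branch(
        branch(
            branch(leaf("lost"),
                   branch(leaf("my"), leaf("wallet"), "right", role="obj"),
                   "left"),
            branch(leaf("in"),
                   branch(leaf("the"), leaf("airport"), "right"),
                   "left"),
            "left",
        ),
        leaf("yesterday"),
        "left",
    ),
    "right",
)

reordered = " ".join(head_finalize(tree))
print(reordered)
```

The printed token order lines up one-to-one with the Japanese sentence in the example, which is what makes the subsequent translation step monotone.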

Japanese Big LM

• Word 5-gram LM trained on 300M Japanese sentences

WFST-based Monotone Decoding

• MT becomes monotone by pre-ordering
• Efficient decoding by WFST
• phrase segmentation > phrase translation > word segmentation > LM
• Efficient on-the-fly composition
• ~3x faster than Moses PBMT
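To make the idea of monotone decoding concrete, here is a minimal sketch that replaces the WFST cascade with a plain dynamic program over source positions. It is not the decoder described on the slide: the phrase table, the bigram costs, and the names `PHRASE_TABLE`, `BIGRAM_COST`, and `decode_monotone` are all made up for illustration. It only shows why decoding becomes simple once no reordering is needed: the search state is just the source position plus the last target word.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# Hypothetical toy phrase table: source phrase -> [(target phrase, cost)].
PHRASE_TABLE: Dict[Phrase, List[Tuple[Phrase, float]]] = {
    ("I", "_va0"): [(("私", "は"), 0.2)],
    ("yesterday",): [(("昨日",), 0.1)],
    ("airport", "in"): [(("空港", "で"), 0.3)],
    ("airport",): [(("空港",), 0.4)],
    ("in",): [(("で",), 0.9), (("に",), 0.7)],
    ("my",): [(("私", "の"), 0.3)],
    ("wallet", "_va2"): [(("財布", "を"), 0.2)],
    ("lost",): [(("なくし", "た"), 0.4)],
}
# Hypothetical bigram LM costs; unseen bigrams get DEFAULT_LM_COST.
BIGRAM_COST: Dict[Tuple[str, str], float] = {("空港", "で"): 0.2}
DEFAULT_LM_COST = 1.0
MAX_PHRASE_LEN = 2


def lm_cost(prev: str, word: str) -> float:
    return BIGRAM_COST.get((prev, word), DEFAULT_LM_COST)


def decode_monotone(source: List[str]) -> Tuple[float, List[str]]:
    """Left-to-right Viterbi search: segment, translate, and score with the LM."""
    n = len(source)
    # best[i] maps the last target word to (cost, output) covering source[:i].
    best: List[Dict[str, Tuple[float, List[str]]]] = [{} for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(n):
        for last, (cost, out) in best[i].items():
            for j in range(i + 1, min(i + MAX_PHRASE_LEN, n) + 1):
                for target, tm_cost in PHRASE_TABLE.get(tuple(source[i:j]), []):
                    new_cost, prev = cost + tm_cost, last
                    for word in target:
                        new_cost += lm_cost(prev, word)
                        prev = word
                    if prev not in best[j] or new_cost < best[j][prev][0]:
                        best[j][prev] = (new_cost, out + list(target))
    if not best[n]:
        raise ValueError("no monotone derivation covers the input")
    return min(best[n].values(), key=lambda item: item[0])


cost, output = decode_monotone(
    "I _va0 yesterday airport in my wallet _va2 lost".split()
)
print(" ".join(output))
```

A WFST implementation would express the same stages as transducers and compose them, but the search space it explores is the same left-to-right lattice.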

Generalized MBR-based System Combination

• Duh et al. (IJCNLP 2011)
• Hypothesis selection over N-best lists from M systems
• Optimized for RIBES+BLEU
• System-independent “agreement” features
  • Sub-components of RIBES & BLEU
• Ranking SVM-like pairwise training
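The hypothesis-selection idea can be illustrated with a plain MBR-style consensus sketch: pool the hypotheses from all systems and pick the one that agrees most with the rest. This is a simplification, not the GMBR method itself. The agreement features here are just unigram and bigram precision against the other hypotheses, and the fixed `weights` are a hypothetical placeholder for what GMBR would learn through pairwise ranking on RIBES and BLEU sub-components.

```python
from collections import Counter
from typing import List, Sequence

Hyp = List[str]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(hyp: Sequence[str], ref: Sequence[str], n: int) -> float:
    """Clipped n-gram precision of hyp against a single pseudo-reference."""
    hyp_counts = ngram_counts(hyp, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    ref_counts = ngram_counts(ref, n)
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


def agreement_features(index: int, pool: Sequence[Hyp]) -> List[float]:
    """Average 1-gram and 2-gram agreement with every other pooled hypothesis."""
    others = [hyp for j, hyp in enumerate(pool) if j != index]
    if not others:
        return [0.0, 0.0]
    return [
        sum(ngram_precision(pool[index], other, n) for other in others) / len(others)
        for n in (1, 2)
    ]


def select_hypothesis(pool: Sequence[Hyp], weights: Sequence[float] = (0.5, 0.5)) -> Hyp:
    """Pick the hypothesis with the highest weighted agreement score."""
    if not pool:
        raise ValueError("empty hypothesis pool")

    def score(index: int) -> float:
        return sum(w * f for w, f in zip(weights, agreement_features(index, pool)))

    return pool[max(range(len(pool)), key=score)]


# Pool of hypotheses from several (hypothetical) systems.
pool = [
    "私 は 昨日 空港 で 財布 を なくし た".split(),
    "私 は 昨日 空港 で 財布 を 失っ た".split(),
    "昨日 空港 財布 なくし".split(),
]
print(" ".join(select_hypothesis(pool)))
```

Because the features depend only on agreement between hypotheses, not on any one system's internal scores, the same selector can be applied to outputs from heterogeneous systems.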

EJ Auto-Eval Results

System           | BLEU (%) | RIBES (%)
HPBMT Baseline   | 31.66    | 72
F2S (U-Tokyo)    | 27.99    | 68.61
PreOrder (WFST)  | 36.83    | 77.29
PO+BigLM (Moses) | 38.81    | 77.82
GMBR Sys. Comb.  | 39.48    | 78.13

EJ Subj.-Eval Results

System          | Adequacy
HPBMT Baseline  | 2.60
PreOrder (WFST) | 3.56
GMBR Sys. Comb. | 3.67
RBMT6-1         | 3.51

Acceptability (%) bar labels: 66, 69, 47, n/a (bar-to-system assignment not recoverable from the extracted chart)

What we found...

• Head Finalization worked QUITE well!
  • Simple but effective way for EJ translation
  • Monotone translation is relatively easy?
• Further improved by GMBR Sys. Comb.
  • System variance (diversity) is important?

Conclusion

• State-of-the-art EJ translation
  • even better than RBMT!
• ... moderate in JE/CE
  • JE pre-ordering, CE adaptation

That’s It!

Acknowledgments

• PatentMT organizers, for organizing this great task!
• Prof. Hideki Isozaki, for Head Finalization
• Dr. Takaaki Hori, Dr. Shinji Watanabe, for WFST decoding