Imposing Constraints from the Source Tree on ITG Constraints for SMT
Hirofumi Yamamoto, Hideo Okuma, Eiichiro Sumita
National Institute of Information and Communications Technology
ATR Spoken Language Communication Research Labs.
Kindai University School of Science and Engineering Department of Information
Background
In current SMT, erroneous word reordering is one of the most serious problems, especially for dissimilar language pairs such as English-Chinese or English-Japanese.
1) To introduce linguistic syntax directly
Not robust to parsing errors
Tree-to-string, string-to-tree, tree-to-tree
2) To assign probabilistic constraints for word reordering
Weaker constraints than the first type (e.g., IBM distortion, lexical reordering, ITG)
This work: to introduce syntax information into the second type
ITG Constraints
Translation source sentences are represented by a binary tree. Translation target sentences can be generated by rotating the branches at the nodes of the source tree.
[Figure: a source binary tree over the words a b c d, and the target orders b d a c and c a d b]
These target word orders cannot be generated from any source binary tree. The particular source binary tree instance is not considered.
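As an illustration (not part of the original slides), the target orders reachable under plain ITG are known to be exactly those whose permutation avoids Wu's two "inside-out" patterns, 2-4-1-3 and 3-1-4-2. A short Python check reproduces the two impossible orders for four words:

```python
from itertools import combinations, permutations

def itg_reachable(order, source):
    """A target order is reachable from some source binary tree (plain ITG)
    iff its permutation contains neither the pattern 2-4-1-3 nor 3-1-4-2."""
    perm = [source.index(w) for w in order]   # source position of each target word
    for quad in combinations(range(len(perm)), 4):
        vals = [perm[i] for i in quad]
        ranks = tuple(sorted(vals).index(v) + 1 for v in vals)
        if ranks in ((2, 4, 1, 3), (3, 1, 4, 2)):
            return False
    return True

source = "abcd"
bad = [p for p in map("".join, permutations(source)) if not itg_reachable(p, source)]
print(bad)  # ['bdac', 'cadb'] -- 22 of the 24 possible orders remain reachable
```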
Basic Idea of IST-ITG
To use ITG constraints under the given source tree
[Figure: two different binary source trees over the words a b c d]
Tree ((a b) (c d)): abcd, abdc, bacd, badc, cdab, cdba, dcab, dcba
Tree (((a b) c) d): abcd, bacd, cabd, cbad, dabc, dbac, dcab, dcba
Under the original ITG constraints, 22 combinations are allowed; under IST-ITG, only the 8 orders licensed by the given source tree are allowed.
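For a concrete picture, here is a minimal Python sketch (an illustration, not the authors' implementation) that enumerates the target orders licensed by IST-ITG for a given binary source tree by optionally rotating the two children at every node; applied to the two example trees above, it reproduces the two lists of 8 orders:

```python
from itertools import product

def ist_itg_orders(tree):
    """All target word orders obtainable by rotating (or not) the two
    children of every node of the given binary source tree."""
    if isinstance(tree, str):                  # a leaf yields exactly one order
        return [tree]
    left, right = ist_itg_orders(tree[0]), ist_itg_orders(tree[1])
    orders = set()
    for l, r in product(left, right):
        orders.add(l + r)                      # children kept in source order
        orders.add(r + l)                      # children rotated
    return sorted(orders)

print(ist_itg_orders((("a", "b"), ("c", "d"))))    # the 8 orders of the first list
print(ist_itg_orders(((("a", "b"), "c"), "d")))    # the 8 orders of the second list
```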
The Number of Word Order Combinations
For a binary source tree with N words, N! word-order combinations are allowed without constraints. Under the IST-ITG constraints, this number is reduced to 2^(N-1).

If N = 6:  without constraints N! = 720;  ITG constraints: 394;  IST-ITG: 2^(N-1) = 32
If N = 10: without constraints N! = 3,628,800;  ITG constraints: 206,098;  IST-ITG: 2^(N-1) = 512
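These counts can be reproduced directly; the only outside fact needed is the standard result that the number of permutations of N words reachable under plain ITG is the large Schröder number S_{N-1}. A small Python check (illustration only):

```python
from math import factorial

def itg_count(n):
    """Number of ITG-reachable permutations of n words (large Schroeder numbers)."""
    s = [1, 2]                                 # S_0, S_1
    for k in range(2, n):
        s.append((3 * (2 * k - 1) * s[k - 1] - (k - 2) * s[k - 2]) // (k + 1))
    return s[n - 1]

for n in (6, 10):
    print(f"N={n}: N!={factorial(n)}, ITG={itg_count(n)}, IST-ITG={2 ** (n - 1)}")
# N=6:  N!=720,     ITG=394,    IST-ITG=32
# N=10: N!=3628800, ITG=206098, IST-ITG=512
```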
Extension to Non-binary Tree
Parsing results are sometimes not binary trees.
For nodes that have more than two branches, any reordering of the branches is allowed.
[Figure: a source tree over a b c d in which one node has three branches (b c d)]
abcd, abdc, acbd, acdb, adbc, adcb, bcda, bdca, cbda, cdba, dbca, dcba
For a non-binary tree, the number of word-order combinations under IST-ITG is ∏_i (B_i)!, where B_i is the number of branches at the i-th node.
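A one-function sketch of this counting formula (illustration only; trees are nested tuples as in the earlier sketches):

```python
from math import factorial

def ist_itg_count(tree):
    """Product of (number of branches)! over all internal nodes of the tree."""
    if isinstance(tree, str):                  # a leaf contributes a factor of 1
        return 1
    count = factorial(len(tree))               # the branches of this node may be permuted
    for child in tree:
        count *= ist_itg_count(child)
    return count

print(ist_itg_count(("a", ("b", "c", "d"))))   # 2! * 3! = 12, the list above
print(ist_itg_count((("a", "b"), ("c", "d")))) # 2!^3 = 8, the binary case 2^(N-1)
```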
IST-ITG in Phrase-based SMT (1)
× The unit of the parse tree is the "word", but the unit of phrase-based SMT is the "phrase": the units differ.
Additional rules for phrase-based SMT
1) Word reordering that breaks a phrase is not allowed.
2) Phrase internal word reordering is not checked.
○ Word-to-word alignments are sometimes not one-to-one, but phrase-to-phrase alignments are always one-to-one.
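As an illustration of how the two rules combine with the word-level constraints (the phrase segmentation below is hypothetical, and this is not the decoder's actual bookkeeping): a target order is kept only if every source phrase stays contiguous, while the order of words inside a phrase is left unchecked.

```python
from itertools import product

def ist_itg_orders(tree):
    # Same enumeration as in the earlier sketch: optionally rotate each node.
    if isinstance(tree, str):
        return [tree]
    left, right = ist_itg_orders(tree[0]), ist_itg_orders(tree[1])
    return sorted({l + r for l, r in product(left, right)} |
                  {r + l for l, r in product(left, right)})

def keeps_phrases_intact(order, phrases):
    """Rule 1: reject any order that breaks a source phrase, i.e. the words of
    each phrase must stay contiguous; their internal order is free (rule 2)."""
    for phrase in phrases:
        pos = sorted(order.index(w) for w in phrase)
        if pos[-1] - pos[0] != len(phrase) - 1:
            return False
    return True

tree = (("a", "b"), ("c", "d"))
phrases = ["a", "bc", "d"]                     # hypothetical phrase segmentation
print([o for o in ist_itg_orders(tree) if keeps_phrases_intact(o, phrases)])
# ['abcd', 'dcba'] -- "dcba" keeps the phrase "bc" contiguous but internally
# reversed, which rule 2 permits; orders that split "bc" are ruled out
```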
IST-ITG in Phrase-based SMT (2)
[Figure: a source parse tree over the words A-G, segmented into phrases (Ph); five candidate phrase reorderings, numbered 1-5, are checked against the constraints]
1: NG  2: NG  3: OK  4: NG  5: OK (unacceptable)
Decoding Algorithm with IST-ITG
[Figure: a source parse tree over the words A-I; each word is labelled 0 (untranslated), 1 (translated), or 2 (translating), and partial target hypotheses such as "d e" are shown]
0: untranslated, 1: translated, 2: translating
If phrases A and B are translated, a sub-tree that includes more than two "2" labels is NG.
Consider the minimum translating sub-tree (the smallest sub-tree that includes both a "0" and a "1").
Translating all of the minimum translating sub-tree: OK.
Translating a sub-part of the minimum translating sub-tree: OK.
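The slides do not spell out the decoder's bookkeeping, so the following is only one reading of the 0/1/2 labelling, sketched at the word level with a hypothetical tree: a set of already-translated source words can still be extended to some IST-ITG-licensed target order exactly when no node has two or more children in the "translating" state.

```python
def leaves(tree):
    return [tree] if isinstance(tree, str) else [w for c in tree for w in leaves(c)]

def node_state(tree, covered):
    """0 = all words untranslated, 1 = all translated, 2 = translating (mixed)."""
    done = sum(w in covered for w in leaves(tree))
    return 0 if done == 0 else 1 if done == len(leaves(tree)) else 2

def coverage_ok(tree, covered):
    """Can the covered words be a prefix of some IST-ITG-licensed order?
    Reject as soon as any node has two or more 'translating' children."""
    if isinstance(tree, str):
        return True
    mixed = sum(node_state(child, covered) == 2 for child in tree)
    return mixed <= 1 and all(coverage_ok(child, covered) for child in tree)

tree = (("a", "b"), ("c", "d"))                # hypothetical source tree
print(coverage_ok(tree, {"a"}))                # True  (prefix of "abcd", "abdc", ...)
print(coverage_ok(tree, {"a", "c"}))           # False (no licensed order starts {a, c})
print(coverage_ok(tree, {"b", "c", "d"}))      # True  (prefix of "cdba")
```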
English and Japanese Patent Corpus Experiments
Experimental corpus size

            # of sent.   Total words   # of entries
E/J Train   1.8M         60M / 64M     188K / 118K
E/J Dev     916          30K / 32K     4,072 / 3,646
E/J Eval    899          29K / 32K     3,967 / 3,682

Single reference
Other Experimental Conditions
LM training: SRI Language Modeling Toolkit (5-grams)
Word alignment for TM training: GIZA++
Decoder: Moses-compatible in-house decoder named CleopATRa
Evaluation measures: BLEU, NIST, WER, PER
English and Japanese Patent Translation: Experimental Results (English-to-Japanese)

                  BLEU    NIST   WER     PER
Monotone          24.91   6.95   79.97   40.02
No Constraint     26.83   7.19   81.10   39.52
IBM               28.34   7.29   78.35   39.25
IST-ITG           30.26   7.41   74.90   38.93
IBM+Lex           31.17   7.50   76.30   38.61
IBM+Lex+IST-ITG   32.20   7.61   71.18   38.15
English and Japanese Patent Translation: Experimental Results (Japanese-to-English)

                  BLEU    NIST   WER     PER
IBM+Lex           29.93   7.54   77.27   39.12
IBM+Lex+IST-ITG   29.77   7.50   72.80   39.73
English-to-Chinese Translation Experiments
NIST MT08 English-to-Chinese track
Experimental Results

            W-BLEU   C-BLEU   WER    CER
IBM+Lex     21.0     35.2     75.0   74.1
+IST-ITG    23.2     37.0     69.7   67.9

Training data for TM: 6.2M sentence pairs
Training data for LM: 20.1M sentences
Development data: 1,664 sentences (1 reference)
Evaluation data: 1,859 sentences (4 references)
Conclusion
We proposed new word reordering constraints, IST-ITG, which use the source tree structure; they are an extension of the ITG constraints.
We evaluated the proposed method in three experiments: E-J and J-E patent translation and the NIST MT08 E-C track. In all experiments, improvements in BLEU and WER were confirmed.
In particular, the improvement in WER is large, confirming the method's effectiveness for global word reordering.
Thank you!