Example-based Machine Translation based on Deeper NLP

Page 1: Example-based Machine Translation based on Deeper NLP

Example-based Machine Translation based on Deeper NLP

Toshiaki Nakazawa (1), Kun Yu (1), Sadao Kurohashi (2)

1. Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan, 113-8656

2. Graduate School of Informatics, Kyoto University, Kyoto, Japan, 606-8501

Page 2: Example-based Machine Translation based on Deeper NLP

Outline

- Why EBMT?
- Description of Kyoto-U EBMT System
- Japanese Particular Processing
  - Pronoun Estimation
  - Japanese Flexible Matching
- Result and Discussion
- Conclusion and Future Work

Page 4: Example-based Machine Translation based on Deeper NLP

Why EBMT?

Pursuing deep NLP
- Improvement of fundamental analyses leads to improvement of MT
- Feedback from MT can be expected

EBMT setting is suitable in many cases
- Not a large corpus, but similar translation examples in relatively close domain
- e.g. manual translation, patent translation, …

Page 6: Example-based Machine Translation based on Deeper NLP

Kyoto-U System Overview

Input: 交差点に入る時私の信号は青でした。
Output: My traffic light was green when entering the intersection.

[Figure: system overview. The input is parsed into a dependency tree with the nodes 交差 (cross), 点 に (point), 入る 時 (enter, when), 私 の (my), 信号 は (signal), 青 でした (blue, was). Translation examples matching parts of this tree are retrieved and their English sides are combined (my / traffic / The light / was green / when / entering / the intersection), with a language model applied before the final output.]

Translation Examples:
- 交差点で、突然飛び出して来たのです。 ↔ came at me from the side at the intersection
- 家に入る時脱ぐ ↔ to remove when entering a house
- 私のサイン ↔ my signature
- 信号は青でした。 ↔ The traffic light was green

Page 7: Example-based Machine Translation based on Deeper NLP

Structure-based Alignment

- Step1: Dependency structure transformation

- Step2: Word/phrase correspondences detection

- Step3: Correspondences disambiguation

- Step4: Handling remaining words

- Step5: Registration to database

Page 8: Example-based Machine Translation based on Deeper NLP

Dependency Structure Transformation (Step 1)

J: 交差点で、突然あの車が飛び出して来たのです。
E: The car came at me from the side at the intersection.

- J: JUMAN/KNP, E: Charniak's nlparser → Dependency tree

[Figure: dependency trees of the sentence pair. Japanese nodes: 交差 / 点 で 、 / 突然 / あの / 車 が / 飛び出して 来た のです. English nodes: the car / came / at me / from the side / at the intersection.]

Page 9: Example-based Machine Translation based on Deeper NLP

Word Correspondence Detection (Step 2)

- KENKYUSYA J-E, E-J dictionaries (300K entries)
- Transliteration (person/place names, Katakana words)
  - Ex) 新宿 → shinjuku, sinjuku, synjucu, ... ⇔ shinjuku (similarity: 1.0)

[Figure: the dependency trees from Step 1 with the detected word correspondences between Japanese and English nodes.]
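The transliteration lookup can be illustrated with a small sketch. The slide does not say how romanization candidates are generated or which similarity measure is used, so the candidate list below is given by hand and `difflib.SequenceMatcher` is only a stand-in; the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for the system's unspecified measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_transliteration_match(
    candidates: Iterable[str], english_words: Iterable[str]
) -> Tuple[float, str, str]:
    """Return (similarity, candidate, english_word) for the best-scoring pair."""
    words = list(english_words)
    return max(
        ((similarity(c, w), c, w) for c in candidates for w in words),
        key=lambda triple: triple[0],
    )


# Romanization candidates for 新宿, copied from the slide's example.
candidates = ["shinjuku", "sinjuku", "synjucu"]
score, candidate, word = best_transliteration_match(
    candidates, ["station", "shinjuku", "the"]
)
print(score, candidate, word)
```

An exact match between one candidate and an English word gives similarity 1.0, as in the slide's example.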

Page 10: Example-based Machine Translation based on Deeper NLP

Correspondence Disambiguation (Step 3)

- Calculate correspondence score based on unambiguous alignment
- Select correspondence with higher score

$$\mathrm{Score} = \sum_{\text{unamb. matches}} \left( \frac{1}{dist_J} + \frac{1}{dist_E} \right)$$

dist_J, dist_E = Distance to unambiguous correspondence in Japanese/English tree
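A minimal sketch of the scoring step, assuming the score of an ambiguous candidate is the sum, over unambiguous correspondences, of the reciprocal tree distances on the Japanese and English sides. The additive form, the function names, and the toy distances are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def correspondence_score(distances: Iterable[Tuple[int, int]]) -> float:
    """Sum 1/dist_J + 1/dist_E over unambiguous correspondences.

    Each pair is (dist_J, dist_E): the tree distance from the candidate to one
    unambiguous correspondence in the Japanese and English trees (both >= 1).
    """
    return sum(1.0 / dist_j + 1.0 / dist_e for dist_j, dist_e in distances)


def select_correspondence(candidates: Dict[str, List[Tuple[int, int]]]) -> str:
    """Pick the candidate with the highest score."""
    return max(candidates, key=lambda name: correspondence_score(candidates[name]))


# Toy data: two competing alignments for the same ambiguous word.
candidates = {
    "candidate_a": [(1, 2)],  # close to an unambiguous correspondence
    "candidate_b": [(2, 3)],  # farther away in both trees
}
best = select_correspondence(candidates)
print(best, correspondence_score(candidates[best]))
```

A candidate that sits near unambiguous correspondences in both trees gets a higher score and is selected.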

Page 11: Example-based Machine Translation based on Deeper NLP

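Correspondence Disambiguation (cont.) (Step 3)

[Figure: disambiguation example. Japanese nodes: 日本 で / 保険 / 会社 に / 対して / 保険 / 請求の / 申し立て が / 可能です よ. English nodes: you / will have / to file / insurance / an claim / insurance / with the office / in Japan. Candidate correspondences are labeled with the scores 1.5, 1.0 and 0.8.]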

Page 12: Example-based Machine Translation based on Deeper NLP

Handling Remaining Words (Step 4)

- Align root nodes when remained
- Merge Base NP nodes
- Merge into ancestor nodes

[Figure: the example trees (交差 / 点 で 、 / 突然 / あの / 車 が / 飛び出して 来た のです and the car / came / at me / from the side / at the intersection) after the remaining words are handled.]

Page 13: Example-based Machine Translation based on Deeper NLP

Registration to Database (Step 5)

- Register each correspondence
- Register a couple of correspondences

[Figure: the aligned example trees (交差 / 点 で 、 / 突然 / あの車 が / 飛び出して 来た のです and the car / came / at me / from the side / at the intersection) whose correspondences are registered.]

Page 14: Example-based Machine Translation based on Deeper NLP

Translation

- Translation example (TE) retrieval
  - for all the sub-trees in the input
- TE selection
  - prefer to large size example
- TE combination
  - greedily from the root node
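The three steps above can be sketched as a greedy cover of the input dependency tree. This is an illustration under stated assumptions, not the system's implementation: the tree encoding, the `examples` table (root node mapped to the node sets that retrieved examples can cover), and the fallback for uncovered nodes are all hypothetical.

```python
from typing import Dict, FrozenSet, List


def combine_greedily(
    tree: Dict[str, List[str]],
    root: str,
    examples: Dict[str, List[FrozenSet[str]]],
) -> List[FrozenSet[str]]:
    """Cover the input tree with examples, starting at the root.

    tree maps each node to its children. examples maps a node to the connected
    node sets, rooted at that node, for which an example was retrieved.
    """
    chosen: List[FrozenSet[str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Prefer the largest example rooted here; fall back to the bare node
        # when nothing was retrieved (the real fallback is not stated).
        best = max(examples.get(node, []), key=len, default=frozenset([node]))
        chosen.append(best)
        # Continue from the children that the chosen example leaves uncovered.
        for covered in sorted(best):
            for child in tree.get(covered, []):
                if child not in best:
                    stack.append(child)
    return chosen


# Input 交差点に入る時私の信号は青でした。 as a toy dependency tree.
tree = {
    "青でした": ["信号は", "入る時"],
    "信号は": ["私の"],
    "入る時": ["点に"],
    "点に": ["交差"],
}
examples = {
    "青でした": [frozenset({"青でした"}), frozenset({"青でした", "信号は"})],
    "入る時": [frozenset({"入る時"})],
    "点に": [frozenset({"点に", "交差"})],
    "私の": [frozenset({"私の"})],
}
result = combine_greedily(tree, "青でした", examples)
print(result)
```

Preferring the largest example at each node keeps more of the translation inside a single attested example, which reduces the number of seams between fragments.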

Page 15: Example-based Machine Translation based on Deeper NLP

Combination Example

[Figure: the dependency tree of the input 交差点に入る時私の信号は青でした。 shown next to the retrieved translation examples (交差点で、突然飛び出して来たのです。 / 家に入る時脱ぐ / 私のサイン / 信号は青でした。) and their English sides.]

Page 16: Example-based Machine Translation based on Deeper NLP

Combination Example (cont.)

[Figure: the same input and translation examples, with the English fragments combined into the output tree: my / traffic / The light / was green / when / entering / the intersection.]

Page 18: Example-based Machine Translation based on Deeper NLP

Pronoun Estimation

Pronouns are often omitted in Japanese sentences

Omitted in TE:
- TE: 胃が痛いのです → I've a stomachache
- Input: 私は胃が痛いのです → I I've a stomachache ×

Omitted in Input:
- TE: これを日本に送ってください → Will you mail this to Japan?
- Input: 日本へ送ってください → Will you mail to Japan? ×△

Page 19: Example-based Machine Translation based on Deeper NLP

Pronoun Estimation (cont.)

Estimate omitted pronoun by modality and subject case

Omitted in TE:
- TE: 胃が痛いのです → I've a stomachache
  - with the pronoun estimated: (私は)胃が痛いのです → I've a stomachache
- Input: 私は胃が痛いのです → I've a stomachache ○

Omitted in Input:
- TE: これを日本に送ってください → Will you mail this to Japan?
- Input: 日本へ送ってください
  - with the pronoun estimated: (これを)日本へ送ってください → Will you mail this to Japan? ○

Page 20: Example-based Machine Translation based on Deeper NLP

Various Expressions in Japanese

Synonymous Relation
- Hiragana/Katakana/Kanji variations: りんご = リンゴ = 林檎 (apple)
- Variations of Katakana expressions: コンピュータ = コンピューター (computer)
- Synonymous words: 登山 = 山登り (climbing mountain vs mountain climbing)
- Synonymous phrases: 最寄りの (nearest) = 一番近い (most near)

Hypernym-Hyponym Relation
- 災難 ← 災害 (disaster) ← 地震 (earthquake), 台風 (typhoon)

Resources: Morphological Analyzer; Automatically Acquired from Japanese Dictionaries
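How such relations could feed into example matching can be sketched as follows. The two tables are tiny hand-written stand-ins for what the slide attributes to a morphological analyzer and to relations acquired from dictionaries; the function names and the returned labels are hypothetical.

```python
from typing import Optional

# Hypothetical tables: spelling variants and synonyms mapped to one representative form.
CANONICAL = {
    "りんご": "林檎",
    "リンゴ": "林檎",
    "コンピューター": "コンピュータ",
    "山登り": "登山",
}
# Hypothetical hyponym -> hypernym links, taken from the slide's example.
HYPERNYM = {"地震": "災害", "台風": "災害", "災害": "災難"}


def canonical(word: str) -> str:
    """Map a word to its representative form (itself if unknown)."""
    return CANONICAL.get(word, word)


def is_hyponym_of(word: str, ancestor: str) -> bool:
    """True if ancestor is reachable from word by following hypernym links."""
    current, target = canonical(word), canonical(ancestor)
    seen = set()
    while current in HYPERNYM and current not in seen:
        seen.add(current)
        current = HYPERNYM[current]
        if current == target:
            return True
    return False


def flexible_match(a: str, b: str) -> Optional[str]:
    """Return the kind of match between two words, or None if they do not match."""
    if a == b:
        return "exact"
    if canonical(a) == canonical(b):
        return "synonym"
    if is_hyponym_of(a, b) or is_hyponym_of(b, a):
        return "hypernym"
    return None


print(flexible_match("りんご", "リンゴ"))
print(flexible_match("地震", "災難"))
```

Returning the kind of match, rather than a plain boolean, would let a caller rank exact matches above synonym and hypernym matches when selecting examples.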

Page 21: Example-based Machine Translation based on Deeper NLP

Japanese Flexible Matching

Page 22: Example-based Machine Translation based on Deeper NLP

IWSLT06 Evaluation Results

Open data track (JE): Correct recognition translation & ASR output translation

                       BLEU              NIST
Correct recognition
  Dev1                 0.5087            9.6803
  Dev2                 0.4881            9.4918
  Dev3                 0.4468            9.1883
  Dev4                 0.1921            5.7880
  Test                 0.1655 (8th/14)   5.4325 (8th/14)
ASR output
  Dev4                 0.1590            5.0107
  Test                 0.1418 (9th/14)   4.8804 (10th/14)

Page 23: Example-based Machine Translation based on Deeper NLP

Results Discussion

- Punctuation insertion failure caused parsing error
- Dictionary robustness affected alignment accuracy
- TE selection criterion failed when choosing among 'almost equal' examples
  - e.g. Input: "買います" (buy a ticket), TE: "買いません" (not buy a ticket)

Page 24: Example-based Machine Translation based on Deeper NLP

Conclusion and Future Work

We not only aim at the development of MT, but also tackle this task from the viewpoint of structural NLP.

- Implement statistical method on alignment
- Improve parsing accuracies (both J and E)
- Improve Japanese flexible matching method
- J-C and C-J MT Project with NICT