Language & Knowledge Engineering Lab: Example-based Machine Translation Pursuing Fully Structural NLP, Kurohashi-lab M1 56430 Toshiaki Nakazawa

Page 1: Example-based Machine Translation Pursuing Fully Structural NLP Kurohashi-lab M1 56430 Toshiaki Nakazawa.

Language & Knowledge Engineering Lab

Example-based Machine Translation Pursuing Fully Structural NLP

Kurohashi-lab M1 56430 Toshiaki Nakazawa

Page 2: Outline

I. History of Machine Translation

II. Introduction of recent MT systems
  i. Statistical Machine Translation (SMT)
  ii. Example-based Machine Translation (EBMT)

III. Related work for EBMT
  i. Logical Form
  ii. Efficient retrieval method

IV. EBMT pursuing fully structural NLP

V. Conclusion

Page 4: History of Machine Translation

[Timeline figure; axis ticks: 1940, 1950, 1960, 1970, 1980]

Beginning of Machine Translation

When I look at an article in Russian, I say: "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode." [Warren Weaver, 1947]

MT quality did not improve despite heavy spending

Doldrums of MT

MT quality improved with the development of NLP

"Machine Translation based on analogy" is proposed [Nagao, 1981]

"Mu project" started

SMT became active [Brown et al., 1993]

Not enough quality yet…

Page 6: Statistical Machine Translation (SMT)

Learns translation models statistically from a parallel corpus

Does not use any linguistic resources

Small translation unit (= “word”)

Requires a large parallel corpus for highly accurate translation

田植えフェスティバル石川県輪島市で外国の大使や一般の参加者など千人あまりが急な斜面の棚田で田植えを体験する催しが行われました。

輪島市白米町には(しろよねまち)千枚田と呼ばれる(せんまいだ)大小二千百枚の棚田が急な斜面から海に向かって拡がっています。

田植え体験は農作業を通して米作りの意義などを考えていこうという地球環境平和財団の呼び掛けで開かれたもので、海外三十四ヵ国の大使や書記官、それに一般の参加者ら合わせておよそ千人が集まりました。田植えに使われた苗は去年の秋、天皇陛下が皇居で収穫された稲籾から育てたものです。

参加者たちは裸足になって水田に足を踏み入れ地元に伝わる田植え歌に合わせて慣れない手つきで苗を植えていました。

きょうの輪島市は雲が広がったもののまずまずの天気となり、出席された高円宮さまも海からの風に吹かれながら田植えに加わっていました。地球環境平和財団では今年の夏休みに全国の子どもたちを対象に草刈りや生きものの観察会を開く他、秋には稲刈体験を行なう予定にしています。

Ambassadors and diplomats from 37 countries took part in a rice planting festival on Sunday in small paddies on steep hillsides in Wajima, central Japan.

About one-thousand people gathered at the hill, where some two-thousand 100 miniature paddies, called Senmaida, stretch toward the Sea of Japan.

The event was organized by the private Foundation for Global Peace and Environment.

The rice seedlings are grown from grain harvested by the Emperor at the Imperial Palace in Tokyo last autumn.

Barefoot participants waded into the paddies to plant the seedlings by hand while singing a local folk song about the practice of rice planting.

Parallel Corpus

Page 7: Basic Method for SMT

Translate by maximizing the probability:

$$\hat{E} = \operatorname*{arg\,max}_{E} P(E \mid J) = \operatorname*{arg\,max}_{E} P(E)\, P(J \mid E)$$

P(E): Language Model

P(J | E): Translation Model

Learn from a parallel corpus
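The decision rule above can be sketched in a few lines of Python. This is only a toy illustration: the candidate list, the dictionaries `language_model` and `translation_model`, and all probability values are made-up assumptions, not figures from the presentation. A real decoder would search over a very large space of candidates instead of scoring a fixed list.

```python
import math

# Toy scores (made-up numbers, for illustration only).
# language_model[E] stands in for P(E);
# translation_model[(J, E)] stands in for P(J | E).
language_model = {
    "at the intersection": 0.6,
    "in the intersection at": 0.1,
    "intersection the at": 0.3,
}
translation_model = {
    ("交差点 で", "at the intersection"): 0.5,
    ("交差点 で", "in the intersection at"): 0.4,
    ("交差点 で", "intersection the at"): 0.5,
}


def decode(j, candidates):
    """Return the candidate E that maximizes log P(E) + log P(J | E)."""
    def score(e):
        return math.log(language_model[e]) + math.log(translation_model[(j, e)])
    return max(candidates, key=score)


best = decode("交差点 で", list(language_model))
print(best)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is why the product P(E) P(J | E) appears as a sum here.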

Page 8: Translation Model

IBM Model 4 [Brown et al., 93]

Translation Model =
  (# of Japanese words which each English word generates)
  × (model for generating NULL to adjust the # of words)
  × (probability of translation from one E word to one J word)
  × (model for word order)
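Of the four factors, the word-to-word translation probability is the easiest to illustrate. The sketch below does not implement IBM Model 4; it uses the much simpler IBM Model 1 EM procedure (no fertility, no NULL word, no word-order model) to estimate t(j | e) from a tiny parallel corpus. The three sentence pairs and the function name `train_model1` are illustrative assumptions.

```python
from collections import defaultdict

# Tiny made-up parallel corpus: (English words, Japanese words).
pairs = [
    (["car"], ["車"]),
    (["red", "car"], ["赤い", "車"]),
    (["red", "house"], ["赤い", "家"]),
]


def train_model1(pairs, iterations=10):
    """Estimate t(j | e) with IBM Model 1 EM (no NULL word, for brevity)."""
    j_vocab = {j for _, js in pairs for j in js}
    # Uniform initialization over the Japanese vocabulary.
    t = defaultdict(lambda: 1.0 / len(j_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (j, e) links
        total = defaultdict(float)  # expected count of e
        # E-step: distribute each Japanese word over the English words.
        for es, js in pairs:
            for j in js:
                norm = sum(t[(j, e)] for e in es)
                for e in es:
                    frac = t[(j, e)] / norm
                    count[(j, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts.
        for (j, e), c in count.items():
            t[(j, e)] = c / total[e]
    return t


t = train_model1(pairs)
print(round(t[("車", "car")], 2), round(t[("赤い", "red")], 2))
```

Even with three sentence pairs, the co-occurrence pattern is enough for EM to prefer 車 for "car" and 赤い for "red", which is the sense in which the model is learned from a parallel corpus.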

Page 9: Overview of EBMT

[Figure: Parallel Corpus → Alignment → TMDB; Input → Translation → Output. Alignment and Translation rely on advanced NLP technologies. Example TMDB entry: 交差点 で 、 = at the intersection]

Page 10: Example-based Machine Translation (EBMT)

Divide the input sentence into a few parts

Find similar expressions (= examples, TMs) in the parallel corpus for each part

Combine the examples to generate the output translation

Use linguistic resources as much as possible

A larger translation unit (larger example) is better
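The divide, find, and combine steps, together with the preference for larger examples, can be sketched as a greedy longest-match lookup. This is a deliberately flat, word-sequence simplification: the table `EXAMPLES`, its entries, and the function `translate` are hypothetical, and the sketch performs no reordering and no structural matching.

```python
# Hypothetical translation-example table:
# source phrase (as a token tuple) -> target phrase.
EXAMPLES = {
    ("交差点", "で"): "at the intersection",
    ("あの", "車", "が"): "the car",
    ("車",): "car",
}


def translate(tokens, examples):
    """Cover the input left to right, preferring the longest matching example."""
    output, i = [], 0
    while i < len(tokens):
        # Try the longest remaining span first, then shorter ones.
        for length in range(len(tokens) - i, 0, -1):
            chunk = tuple(tokens[i:i + length])
            if chunk in examples:
                output.append(examples[chunk])
                i += length
                break
        else:
            # No example covers this token: pass it through unchanged.
            output.append(tokens[i])
            i += 1
    return output


print(translate(["あの", "車", "が", "交差点", "で"], EXAMPLES))
```

Because the three-token example あの 車 が is tried before the single-token example 車, the larger unit wins whenever both apply.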

Page 11: Flow of EBMT

[Figure: flow of EBMT]

Page 12: Furthermore...

The translation algorithm is implicit in EBMT
→ Probabilistic Model for EBMT [Aramaki et al., 05]

Recently, the number of studies handling larger units has been increasing

The difference between SMT and EBMT is becoming smaller

Most active area of study = Phrase-based SMT

SMT and EBMT will be merged (?)

Page 14: Alignment method using Logical Form [Arul et al., 01]

Logical Form
– Represents the relations among the content words of a sentence as an unordered graph
    Nodes are content words
    Branches indicate underlying semantic relations
– Abstracts away language-particular aspects of a sentence
    Ex. word order, inflectional morphology, function words

[Figure: Spanish and English Logical Forms for "Under Hyperlink Information, click the hyperlink address"]
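One way to picture an unordered graph over content words is a set of labeled edges. The sketch below is only an assumed representation: the relation labels (`object`, `modifier`, `location`) and the helper names are hypothetical and are not taken from the cited work.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (head, relation, dependent)


def logical_form(edges: Iterable[Edge]) -> FrozenSet[Edge]:
    """Store labeled edges in a frozenset so that their order carries no meaning."""
    return frozenset(edges)


def nodes(lf: FrozenSet[Edge]) -> Set[str]:
    """Collect the content words that appear as heads or dependents."""
    return {word for head, _, dep in lf for word in (head, dep)}


# The same graph written in two different edge orders.
lf_a = logical_form([
    ("click", "object", "address"),
    ("address", "modifier", "hyperlink"),
    ("click", "location", "Hyperlink Information"),
])
lf_b = logical_form([
    ("click", "location", "Hyperlink Information"),
    ("click", "object", "address"),
    ("address", "modifier", "hyperlink"),
])

print(lf_a == lf_b)
print(sorted(nodes(lf_a)))
```

Function words such as "under" and "the" do not appear as nodes, which mirrors the idea that the representation abstracts away word order and function words.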

Page 15: Efficient Retrieval Method [Doi et al., 04]

Similarity between the input and the examples is calculated by word-based Edit Distance

Finding suitable examples in a large parallel corpus takes a long time

Addresses this problem by
– Classifying sentences into groups according to the # of content words and function words
– Compressing all sentences in a group into a “directed word graph”
– Searching for the best example in a group with the A* algorithm
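For reference, word-based edit distance and the slow baseline it implies can be sketched as follows. This covers only the similarity measure and a linear scan; the grouping, word-graph compression, and A* search from the cited method are not implemented. The example sentences and the function names are illustrative assumptions.

```python
def word_edit_distance(a, b):
    """Levenshtein distance over word lists (insert, delete, substitute: cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def retrieve(query, examples):
    """Linear scan for the closest example: the costly baseline."""
    return min(examples, key=lambda ex: word_edit_distance(query, ex))


examples = [
    "the car came at me".split(),
    "the light was green".split(),
    "my signature".split(),
]
query = "my light was green".split()
print(retrieve(query, examples))
```

Each comparison costs time proportional to the product of the two sentence lengths, and the scan repeats it for every stored example, which is why retrieval from a large corpus is slow without further organization.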

Page 17: Why EBMT?

Pursuing structural NLP
– Improvement of basic analyses leads to improvement of MT as an application of basic analyses
– Feedback from the application (MT) can be expected

Adequacy of problem settings
– Not a large corpus, but similar examples in a relatively close domain
    Ex. translation of a new version ("version up") of an instruction manual, a related patent document ...

Page 18: Overview of EBMT

[Figure: Parallel Corpus → Alignment → TMDB; Input → Translation → Output. Labels: EBMT, Advanced NLP technologies]

Page 19: Alignment

[Figure: dependency structures of the sentence pair. Japanese phrases: 交差点 で 、 / 突然 / あの車 が / 飛び出して 来た のです 。 English phrases: the car / came / at me / from the side / at the intersection]

Japanese: 交差点で、突然あの車が飛び出して来たのです。
English: The car came at me from the side at the intersection.

1. Transform into dependency structure

2. Word-based alignment using bilingual lexicon

3. Extend the correspondence of phrases

4. Extract Translation Examples
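Steps 2 and 4 can be sketched with a toy bilingual lexicon. The sketch assumes that step 1 has already split both sentences into phrases, and it omits step 3 (extending correspondences to neighboring phrases) entirely. `LEXICON`, its three entries, and the function `align` are illustrative assumptions, not the lab's actual resources.

```python
# Toy bilingual lexicon (illustrative entries only).
LEXICON = {"交差点": {"intersection"}, "車": {"car"}, "来た": {"came"}}

# Phrases as they would come out of step 1 (assumed given here).
j_phrases = [["交差点", "で"], ["突然"], ["あの", "車", "が"],
             ["飛び出して", "来た", "のです"]]
e_phrases = [["the", "car"], ["came"], ["at", "me"],
             ["from", "the", "side"], ["at", "the", "intersection"]]


def align(j_phrases, e_phrases, lexicon):
    """Step 2: link phrases that share a lexicon translation of some word."""
    links = []
    for i, j_words in enumerate(j_phrases):
        translations = set().union(*(lexicon.get(w, set()) for w in j_words))
        for k, e_words in enumerate(e_phrases):
            if translations & set(e_words):
                links.append((i, k))
    return links


links = align(j_phrases, e_phrases, LEXICON)

# Step 4: turn each link into a (Japanese, English) translation example.
translation_examples = [(" ".join(j_phrases[i]), " ".join(e_phrases[k]))
                        for i, k in links]
print(translation_examples)
```

Phrases with no lexicon hit, such as 突然 or "at me", stay unaligned in this sketch; handling them is the job of the phrase-extension step that is left out here.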

Page 20: Translation

Input: 交差点に入る時私の信号は青でした。

Output: My traffic light was green when entering the intersection.

[Figure: the input dependency tree is matched against Translation Examples, and the selected English fragments are combined into the output with a Language Model.
Input nodes: 交差 (cross) / 点 に (point) / 入る (enter) / 時 (when) / 私 の (my) / 信号 は (signal) / 青 (blue) / でした 。 (was)
Output nodes: my / traffic / The light / was green / when / entering / the intersection
Translation Examples, Japanese nodes: 私 の / サイン / 家 に / 入る / 脱ぐ / 交差 / 点 で 、 / 突然 / 飛び出して 来た のです 。 / 信号 は / でした 。
Translation Examples, English nodes: my / signature / traffic / The light / was green / to remove / when / entering / a house / came / at me / from the side / at the intersection]

Page 21: IWSLT2005

IWSLT – International Workshop on Spoken Language Translation
– Aiming at translation of ASR (Automatic Speech Recognition) output

Outline of campaign
– Training set: parallel corpus including 20K sentences
– Development set: two sets including 500 and 506 sentences
– Test set: manual transcription and ASR output (500 sentences each)

Page 22: Evaluation Results

Manual Transcription (Supplied & Tools)

Name         BLEU
ATR-C3       0.4774
MICROSOFT    0.4057
ATR-SLR      0.3884
TUV          0.3718
NGKUT        0.3418
USC          0.2741

Name         NIST
ATR-C3       8.1720
MICROSOFT    8.0375
TUV          7.8472
NGKUT        7.7158
ATR-SLR      4.3928
USC          2.9648

Page 23: Conclusion

In this presentation …
– History of Machine Translation
– SMT and EBMT
– Two related works for EBMT
– Introduction of our EBMT system

Future work
– Improve our EBMT system
    Resolve the paraphrase problem
    Apply anaphora resolution