A Hybrid Machine Translation System from Turkish to English

Ferhan Türe, MSc Thesis, Sabancı University
Advisor: Kemal Oflazer



Introduction

Goal: Create a machine translation system that translates Turkish text into English text.

Turkish has an agglutinative morphology:
ev+im+de+ki+ne → to the one at my home

Turkish has free word order:
Ben eve gittim, Eve gittim ben, Gittim ben eve, ... → I went to the house

Idea: Write rules to translate the analyzed Turkish sentence into English.


Outline
• Machine Translation (MT): Motivation, Challenges in MT, History of MT, Classical Approaches to MT
• The Hybrid Approach: Challenges, Translation Steps (Analysis and Preprocessing, Transfer and Generation, Decoding)
• Evaluation: Methods, Experimental Results, Examples
• Conclusions


Machine Translation

Translation
Given: an input text s in source language S
Find: a well-formed text in target language T that is equivalent to s

Machine Translation (MT)
Any system that uses an electronic computer to perform translation


Motivation

• Satisfy the increasing demand for translation: 100 languages have 5 million or more native speakers
• Reduce the cost and effort of human translation: 13% of the EU budget; weeks vs. minutes
• Make information available to more people in less time: automatic translation of web sites
• Explore the limits of computers' abilities and the linguistic challenges involved


Challenges in MT

Morphological issues
• Each language has a different morphology

Syntactic issues
• Word order in sentences and noun phrases
• Language-specific features (the narrative past tense in Turkish, distinguishing feminine and masculine nouns)

Semantic issues
• Word sense ambiguities: bank → geographical term OR financial institution?
• Idiomatic phrases: kafa çekmek → "pull head" OR "drink alcohol"?


History of MT

1949: the idea of MT proposed by Warren Weaver in his memorandum "Translation"
1950s: Russian-English MT research during the Cold War between the US and the USSR
1960s: research funding was largely cut after the systems failed to meet expectations (ALPAC report, 1966)
From the mid-1970s:
• METÉO: English-French MT in Canada
• Systran and Eurotra: multilingual MT in Europe
• TITRAN and the MU Project at Kyoto University, Japan
Since the 1990s: statistical MT, which relies on statistics and large amounts of data


MT between English and Turkish

Morphological analyzer: Oflazer, 1993
Morphological disambiguator: Oflazer & Kuruöz, 1994; Hakkani-Tür et al., 2000; Yuret & Türe, 2006
English-to-Turkish MT: Sagay, 1981; Hakkani et al., 1998; Keyder Turhan, 1997
No existing Turkish-to-English system


Classical Approaches to MT


Vauquois Triangle

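[Figure: the Vauquois triangle. Analysis runs up the source side and Generation runs down the target side. The levels, from bottom to top, are lexical, syntactic and semantic. Transfer crosses between the two sides, and the Interlingua sits at the apex.]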


Word-by-word Translation

[Figure: Source sentence → Bilingual Dictionary → Target sentence]

Source sentence: Ali evdeki kediyi çok sevmez
Translation: Ali home cat very like
Reference: Ali does not like the cat at home very much
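Here is a minimal Python sketch of word-by-word translation that shows the failure concretely. It assumes a toy dictionary holding only the glosses from this slide. These entries are hypothetical and are not a real lexicon.

```python
# Toy bilingual dictionary (hypothetical; entries taken from this slide's example):
# one English word per Turkish surface form, with no morphology at all.
BILINGUAL_DICT = {
    "Ali": "Ali",
    "evdeki": "home",
    "kediyi": "cat",
    "çok": "very",
    "sevmez": "like",
}


def word_by_word(sentence: str, dictionary: dict[str, str]) -> str:
    """Replace each word by its dictionary entry, keeping the source word order.

    Words missing from the dictionary are copied through unchanged.
    """
    return " ".join(dictionary.get(word, word) for word in sentence.split())


if __name__ == "__main__":
    print(word_by_word("Ali evdeki kediyi çok sevmez", BILINGUAL_DICT))
    # Ali home cat very like
```

The output reproduces the slide's translation. The negation in sevmez, the locative relative in evdeki and the Turkish word order are all lost, because nothing below the word level is consulted.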


Direct Translation

Source: Ali evde -ki kediyi çok sevmez
Analysis: Ali ev+Loc Rel+Adj kedi+Acc çok+Adv sev+Neg+Present
Lexical: Ali home+Loc at+Adj cat+Acc very much+Adv like+Neg+Present
Reorder: Ali at+Adj home+Loc cat+Acc like+Neg+Present very much+Adv
Generate: Ali at home cat not like very much

[Figure: Source sentence → Morphological Analyzer → Lexical Transfer → Local Reordering → Target sentence]
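The four stages of the direct approach can be sketched in Python as follows. The analysis table, the lexicon and the two reordering rules are hypothetical and cover only this example sentence. A real system would call a morphological analyzer instead of a lookup table.

```python
"""Toy direct-translation pipeline (hypothetical tables, for illustration only)."""

# Stage 1 (hypothetical): morphological analyses for each surface token.
ANALYSES = {
    "Ali": ["Ali"],
    "evde": ["ev+Loc"],
    "-ki": ["Rel+Adj"],
    "kediyi": ["kedi+Acc"],
    "çok": ["çok+Adv"],
    "sevmez": ["sev+Neg+Present"],
}

# Stage 2 (hypothetical): root-level bilingual lexicon.
LEXICON = {"Ali": "Ali", "ev": "home", "Rel": "at", "kedi": "cat",
           "çok": "very much", "sev": "like"}

VERB_TAGS = {"Present", "Past"}

Token = tuple[str, list[str]]  # (English word, morphological tags)


def analyze(sentence: str) -> list[str]:
    """Replace every surface token by its morphological analysis."""
    return [a for word in sentence.split() for a in ANALYSES[word]]


def lexical_transfer(analyses: list[str]) -> list[Token]:
    """Translate each root and keep its tags."""
    tokens = []
    for analysis in analyses:
        root, *tags = analysis.split("+")
        tokens.append((LEXICON[root], tags))
    return tokens


def local_reorder(tokens: list[Token]) -> list[Token]:
    """Swap adjacent pairs: Loc noun + relative ('home at' -> 'at home'), adverb + verb."""
    tokens = list(tokens)
    i = 0
    while i < len(tokens) - 1:
        (_, tags), (_, next_tags) = tokens[i], tokens[i + 1]
        if ("Loc" in tags and "Adj" in next_tags) or ("Adv" in tags and VERB_TAGS & set(next_tags)):
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2  # do not re-examine a token that was just moved
        else:
            i += 1
    return tokens


def generate(tokens: list[Token]) -> str:
    """Drop the tags and realize negation as a preceding 'not'."""
    return " ".join(("not " + word) if "Neg" in tags else word for word, tags in tokens)


def direct_translate(sentence: str) -> str:
    return generate(local_reorder(lexical_transfer(analyze(sentence))))


if __name__ == "__main__":
    print(direct_translate("Ali evde -ki kediyi çok sevmez"))
    # Ali at home cat not like very much
```

The result matches the slide's Generate line. Because reordering is purely local, the object still precedes the verb and no article or auxiliary is produced. This is the gap the transfer-based approach addresses.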


Transfer-based Translation

[Figure: Source sentence → SL Representation (analysis with the SL Grammar) → TL Representation (Transfer rules / Dictionary) → Target sentence (generation with the TL Grammar)]


Transfer-based Translation: an example

mavi evin duvarı → the wall of the blue house

Turkish tree: [NP [NP [AP mavi] [N ev+in]] [N duvar+ı]]
English tree: [NP [NP [Det the] [N wall]] [PP [Prep of] [NP [Det the] [AP blue] [N house]]]]
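Below is a minimal Python sketch of the transfer step for this example. It assumes a small hand-written tree representation and root lexicon, both hypothetical. The genitive-possessive rule swaps the two NPs and inserts "of", while the adjective-noun rule keeps the order unchanged.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical root lexicon for this example only.
LEXICON = {"mavi": "blue", "ev": "house", "duvar": "wall"}


@dataclass
class Noun:
    root: str          # e.g. "ev" for ev+in, "duvar" for duvar+ı


@dataclass
class AdjNoun:
    adj: str           # AP modifier, e.g. "mavi"
    noun: Noun


@dataclass
class GenPoss:
    possessor: "NP"    # genitive-marked NP, e.g. "mavi evin"
    head: Noun         # possessive-marked head noun, e.g. "duvarı"


NP = Union[Noun, AdjNoun, GenPoss]


def transfer(np: NP) -> list[str]:
    """Map a Turkish NP tree to English words, one rule per constituent type.

    Simplifying assumption: every NP in this construction is definite and gets "the".
    """
    if isinstance(np, Noun):
        return ["the", LEXICON[np.root]]
    if isinstance(np, AdjNoun):
        # AP NP -> AP NP: same order in both languages.
        return ["the", LEXICON[np.adj], LEXICON[np.noun.root]]
    if isinstance(np, GenPoss):
        # NP1-Gen NP2-Poss -> NP2 "of" NP1: the constituents swap and "of" is inserted.
        return ["the", LEXICON[np.head.root], "of"] + transfer(np.possessor)
    raise TypeError(f"unsupported constituent: {np!r}")


if __name__ == "__main__":
    tree = GenPoss(possessor=AdjNoun("mavi", Noun("ev")), head=Noun("duvar"))
    print(" ".join(transfer(tree)))  # the wall of the blue house
```

Giving every NP "the" is only a shortcut. As the ambiguity example kitabın kapağı shows, the genitive can also come out as a possessive 's, so a real rule set has to produce alternatives.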


Interlingual Translation

[Figure: Source sentence → Analysis → Interlingua → Generation → Target sentence]

Source: Ali evdeki kediyi çok sevmez
Interlingua: ¬holds(in_general, like(subj: Ali, obj: cat(at: home), degree: very much))
Translation: Ali does not like the cat at home very much


Statistical MT

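Given a Turkish sentence t, find the English sentence e that is the “most likely” translation of t:

$$\hat{e} = \arg\max_{e} P(e \mid t) = \arg\max_{e} \frac{P(t \mid e)\,P(e)}{P(t)} = \arg\max_{e} P(t \mid e)\,P(e)$$

P(t) does not depend on e, so it can be dropped from the maximization.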


Statistical MT

• Translation Model P(t|e): whether an English text e is a good translation of a Turkish text t; estimated from Turkish-English aligned text
• Language Model P(e): whether an English text e is well-formed English or not; estimated from English text
• Decoding: argmax_e P(e) × P(t|e)


Statistical MT

Example: t = Ali çok açtı

Translation e         LM score P(e)   TM score P(t|e)   Score P(t|e)×P(e)
I have a book         0.9             0.2               0.18
Hungry Ali be so      0.1             0.8               0.08
Ali was so hungry     0.8             0.8               0.64
...

Most probable translation: Ali was so hungry
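The decision rule can be checked directly against the numbers in the table. Here is a minimal Python sketch. The scores are the slide's illustrative values, and a real decoder searches a very large space of candidates rather than enumerating three.

```python
# Candidate translations e of t = "Ali çok açtı" with the slide's illustrative scores.
CANDIDATES = {
    # e: (LM score P(e), TM score P(t|e))
    "I have a book": (0.9, 0.2),
    "Hungry Ali be so": (0.1, 0.8),
    "Ali was so hungry": (0.8, 0.8),
}


def decode(candidates: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Return argmax_e P(t|e) * P(e) and its score; P(t) is constant and dropped."""
    best = max(candidates, key=lambda e: candidates[e][0] * candidates[e][1])
    p_e, p_t_given_e = candidates[best]
    return best, p_e * p_t_given_e


if __name__ == "__main__":
    print(decode(CANDIDATES))
```

The product is what makes the choice. The language model alone would prefer I have a book (0.9). The translation model alone cannot separate Hungry Ali be so from Ali was so hungry, since both score 0.8.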



The Hybrid Approach


Why Hybrid?

Classical transfer-based approaches are good at representing the structural differences between the source and target languages. Statistical methods are good at extracting knowledge from large amounts of data about how well-formed a sentence is, or how “meaningful” a translation is.


Challenges: Morphological differences

Avrupalılaştıramadıklarımızdanmışsınız → You were among the ones whom we were not able to cause to become European

This is an extreme case of a word in an agglutinative language: each Turkish morpheme corresponds to one or more words in English.
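For simple nominal chains the correspondence is almost mechanical: English reads the suffixes in reverse order, from the last suffix back to the root. Below is a toy Python sketch for the ev+im+de+ki+ne example from the introduction. It assumes a pre-segmented word and a hand-written gloss table, both hypothetical.

```python
# Hypothetical gloss tables for the example ev+im+de+ki+ne.
SUFFIX_GLOSS = {
    "im": "my",        # 1st person singular possessive
    "de": "at",        # locative case
    "ki": "the one",   # relative -ki
    "ne": "to",        # dative case
}
ROOT_GLOSS = {"ev": "home"}


def gloss(segmented: str) -> str:
    """Gloss a '+'-segmented Turkish word: suffixes right to left, then the root."""
    root, *suffixes = segmented.split("+")
    words = [SUFFIX_GLOSS[suffix] for suffix in reversed(suffixes)]
    words.append(ROOT_GLOSS[root])
    return " ".join(words)


if __name__ == "__main__":
    print(gloss("ev+im+de+ki+ne"))  # to the one at my home
```

A word like Avrupalılaştıramadıklarımızdanmışsınız needs more than this reversal. Its derivational, participle and tense-aspect suffixes do not map one-to-one onto English words in mirror order.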


Challenges: Morphological differences

arkadaşımdakiler → the ones at my friend


Challenges: Structural differences

dinle+miş+sin → (someone told me that) you listened
dinle+di+n → you listened
dinle+t+ti+n → you made (someone) listen
dinle+t+tir+di+n → you had (someone) make (someone) listen
dinle+r+im → I listen
dinle+r+di+m → I used to listen
dinle+t+ebil+ir+miş+im → ???


Challenges: Structural differences

Adam evde kitap okuyordu (SUBJ ADJCT OBJ V) → The man was reading a book at home (SUBJ V OBJ ADJCT)
mavi kitap (AP NP) → blue book (AP NP)
evdeki kitap (AP NP) → the book at home (NP AP)
kitabımın kapağı (NP1 NP2) → my book’s cover (NP1 NP2)
arkadaşımın yüzünden (NP1 NP2) → because of my friend (NP2 NP1)


Challenges: Ambiguities

koyun
1. sheep (or bosom)
2. your bay
3. your dark (one)
4. of the bay
5. put!


Challenges: Ambiguities

silahını evine koy
1. put your gun to your home
2. put your gun to his home
3. put his gun to your home
4. put his gun to his home
5. put your gun to her home
6. put her gun to your home
7. put her gun to her home
...
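The combinatorics behind this list can be sketched in Python. The sketch assumes each possessive suffix has one second-person reading (your) and a third-person reading that English splits by gender (his, her; its is left out). The two nouns vary independently, so the candidates multiply.

```python
from itertools import product

# Assumed English readings of the possessive suffix on each noun:
# 2nd person singular ("your") or 3rd person singular ("his", "her").
POSSESSORS = ("your", "his", "her")


def readings(possessors=POSSESSORS) -> list[str]:
    """Enumerate one English candidate per combination of possessor readings."""
    return [
        f"put {gun_owner} gun to {home_owner} home"
        for gun_owner, home_owner in product(possessors, repeat=2)
    ]


if __name__ == "__main__":
    for number, candidate in enumerate(readings(), start=1):
        print(f"{number}. {candidate}")
```

Three readings per noun already yield 3 × 3 = 9 candidates for a three-word sentence. Choosing among them is the job the English language model takes on later in the pipeline.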


Challenges: Ambiguities

kitabın kapağı
1. the book’s cover
2. book’s cover
3. the cover of the book


Challenges: Ambiguities

ev+Dative (gitti) → (went) to the house
masa+Dative (çıktı) → (jumped) on the table
adam+Dative (baktı) → (looked) at the man
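One simple way to model this is to let the verb, rather than the case marker, select the preposition. The table below is a hypothetical Python sketch that covers only the three examples on this slide.

```python
# Hypothetical lexical table: the English preposition chosen for a dative
# argument is keyed on the governing verb root, not on the case marker alone.
DATIVE_FRAMES = {
    "git": ("went", "to"),      # eve gitti    -> went to the house
    "çık": ("jumped", "on"),    # masaya çıktı -> jumped on the table
    "bak": ("looked", "at"),    # adama baktı  -> looked at the man
}
NOUNS = {"ev": "house", "masa": "table", "adam": "man"}


def translate_dative(noun_root: str, verb_root: str) -> str:
    """Translate 'noun+Dative verb+Past', choosing the preposition by the verb."""
    verb, preposition = DATIVE_FRAMES[verb_root]
    return f"{verb} {preposition} the {NOUNS[noun_root]}"


if __name__ == "__main__":
    for noun, verb in [("ev", "git"), ("masa", "çık"), ("adam", "bak")]:
        print(translate_dative(noun, verb))
```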


Challenges
• Morphological differences → use morphological analysis on the Turkish side and generation on the English side
• Structural differences → transfer rules can represent such transformations
• Ambiguities → an English language model can determine the most probable translation statistically


The Avenue Transfer System

The Avenue project, initiated by the Language Technologies Institute (LTI) at Carnegie Mellon University (CMU), provides:
• a grammar formalism, which allows one to manually create a parallel grammar between two languages, and
• a transfer engine, which transfers the source sentence into possible target sentence(s) using this parallel grammar


Overview of Our Approach

[Figure: Turkish sentence → Analysis (Morphological Analyzer, Preprocessor) → lattice → Avenue Transfer Engine (with the transfer rules) → English translations → English Language Model → most probable English translation]


I. Analysis and Preprocessing

The morphological analysis of each word is a set of features describing the structural properties of the word.

Example sentence: adam evde oğlunu yendi


I. Analysis and Preprocessing

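Lattice representation of the sentence. Each arc of the lattice is an inflectional group IG_{i,j,k}, the k-th IG of the j-th analysis of word i:

adam: IG_{1,1,1} = ada+N+P1Sg | IG_{1,2,1} = adam+N+PNon
evde: IG_{2,1,1} = ev+N+Loc
oğlunu: IG_{3,1,1} = oğul+N+P2Sg | IG_{3,2,1} = oğul+N+P3Sg
yendi: IG_{4,1,1} IG_{4,1,2} = ye+V, +Pass+V+Past | IG_{4,2,1} IG_{4,2,2} = yen+N, Zero+V+Past | IG_{4,3,1} = yen+V+Past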
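The lattice stores the alternatives compactly. A full path picks one analysis per word, so the number of paths is the product of the per-word counts, here 2 × 1 × 2 × 3 = 12. The Python sketch below enumerates the paths explicitly, purely for illustration. The IG strings are copied from the slide, and a transfer engine need not expand paths this way.

```python
from itertools import product

# The lattice for "adam evde oğlunu yendi": for each word, its alternative
# morphological analyses, each analysis being a sequence of IGs.
LATTICE = [
    [["ada+N+P1Sg"], ["adam+N+PNon"]],                    # adam
    [["ev+N+Loc"]],                                       # evde
    [["oğul+N+P2Sg"], ["oğul+N+P3Sg"]],                   # oğlunu
    [["ye+V", "+Pass+V+Past"], ["yen+N", "Zero+V+Past"],  # yendi
     ["yen+V+Past"]],
]


def paths(lattice):
    """Yield every path: one analysis per word, flattened into a list of IGs."""
    for choice in product(*lattice):
        yield [ig for analysis in choice for ig in analysis]


if __name__ == "__main__":
    all_paths = list(paths(LATTICE))
    print(len(all_paths))  # 12
    for path in all_paths[:3]:
        print(" ".join(path))
```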


I. Analysis and Preprocessing

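Representation of IGs

[Figure: the representation of each IG in the lattice: IG_{1,1,1}, IG_{1,2,1}, IG_{2,1,1}, IG_{3,1,1}, IG_{3,2,1}, IG_{4,1,1}, IG_{4,1,2}, IG_{4,2,1}, IG_{4,2,2}, IG_{4,3,1}]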


II. Transfer and Generation


Path through the lattice: IG_{1,2,1} IG_{2,1,1} IG_{3,2,1} IG_{4,3,1}, i.e. adam+N+PNon, ev+N+Loc, oğul+N+P3Sg, yen+V+Past.


The categories of these IGs: N N N V.


adam evde oğlunu yendi (N N N V): the roots are transferred as man, house, son and won. On the English side the categories are ordered N V N N (man won son house).


adam forms an NP; the English NP is "the man".


This NP is the subject (SUBJ) on both sides.


evde forms an NP; the English NP is "the house".


The locative NP becomes an adjunct (Adjct); in English it is realized as "at the house".


oğlunu forms an NP; in English its possessive is realized as "his son".


This NP is the object (OBJ) on both sides.


The verb yendi forms a Vc constituent; in English, "won".


The Vc becomes a Vfin constituent on both sides.


SUBJ, Adjct, OBJ and Vfin combine into a sentence (S) on both sides.


Sentence rule: the Turkish S = SUBJ Adjct OBJ Vfin corresponds to the English S = SUBJ Vfin OBJ Adjct.
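The constituent reordering done by the sentence rule can be sketched as a permutation indexed by role. This Python sketch assumes the English order SUBJ Vfin OBJ Adjct shown earlier for Adam evde kitap okuyordu. The English phrases in the usage example are illustrative.

```python
# Role orders assumed for the S rule: Turkish puts the adjunct before the
# object and the verb last; English puts the verb second and the adjunct last.
TURKISH_ORDER = ["SUBJ", "Adjct", "OBJ", "Vfin"]
ENGLISH_ORDER = ["SUBJ", "Vfin", "OBJ", "Adjct"]


def transfer_s(constituents: list[str]) -> list[str]:
    """Reorder already-translated constituents from Turkish to English order."""
    by_role = dict(zip(TURKISH_ORDER, constituents))
    return [by_role[role] for role in ENGLISH_ORDER]


if __name__ == "__main__":
    print(" ".join(transfer_s(["the man", "at the house", "his son", "won"])))
    # the man won his son at the house
```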


Transfer rule for an adjunct: Turkish Adjunct → NP; English Adjunct → at NP

{Adjunct,3}
Adjunct::Adjunct : [NP] -> ["at" NP]
(
  (x1::y2)
  (x0 = x1)
  ((x1 CASE) =c Loc)
  ((x1 poss) =c yes)
  (y0 = x0)
)
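Read procedurally, the rule checks two constraints on the Turkish NP and, if both hold, prefixes "at" to the NP's translation. Below is a hypothetical Python rendering of that reading; it is not part of Avenue. Feature structures are simplified to flat dictionaries, and the input evimde (ev+im+de, "at my home") is only an example.

```python
from typing import Optional

FeatureStructure = dict[str, str]


def adjunct_rule_3(x1: FeatureStructure, np_english: list[str]) -> Optional[list[str]]:
    """Hypothetical Python reading of {Adjunct,3}: [NP] -> ["at" NP].

    x1 holds the features of the Turkish NP; np_english is that NP's English
    translation. Returns the English adjunct, or None if a constraint fails.
    """
    # ((x1 CASE) =c Loc) and ((x1 poss) =c yes): "=c" only checks values,
    # so a missing feature makes the rule fail.
    if x1.get("CASE") != "Loc" or x1.get("poss") != "yes":
        return None
    # (x1::y2): the Turkish NP is aligned with the second English constituent;
    # the first English constituent is the literal "at".
    # The feature-passing equations (x0 = x1) and (y0 = x0) are not modeled here.
    return ["at"] + np_english


if __name__ == "__main__":
    print(adjunct_rule_3({"CASE": "Loc", "poss": "yes"}, ["my", "home"]))  # ['at', 'my', 'home']
    print(adjunct_rule_3({"CASE": "Acc", "poss": "yes"}, ["my", "home"]))  # None
```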


Transfer rule for the verb (Vc; later Vfin on both sides):

;; yendi -> won
{Vc,2}
Vc::Vc : [V] -> [V]
(
  (x1::y1)
  ;Analysis
  (x0 = x1)
  ;Constraints
  ((x1 lex) =c (*or* "yen" ...))
  ((x0 casev) <= Acc)
  ((x0 trans) <= yes)
  ;Transfer
  ((y1 TENSE) = (x1 TENSE))
  ((y1 AGR-PERSON) = (x1 AGR-PERSON))
  ((y1 AGR-NUMBER) = (x1 AGR-NUMBER))
  ((y1 POLARITY) = (x1 POLARITY))
  ;Generation
  (y0 = y1)
)
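Here is the same kind of hypothetical Python rendering for the verb rule. It checks the lexical constraint, then copies tense, agreement and polarity from the Turkish verb to the English verb. The verb list is passed in as a parameter because the slide elides it ("yen" ...). The <= equations are left out, since their exact semantics belong to the Avenue formalism. The feature values in the usage example are made up.

```python
from typing import Optional

FeatureStructure = dict[str, str]

# Features that the ;Transfer block copies from the Turkish verb (x1) to the English verb (y1).
COPIED_FEATURES = ("TENSE", "AGR-PERSON", "AGR-NUMBER", "POLARITY")


def vc_rule_2(x1: FeatureStructure, allowed_lex: frozenset[str]) -> Optional[FeatureStructure]:
    """Hypothetical Python reading of {Vc,2}: returns the English Vc features (y0) or None.

    allowed_lex stands in for the (*or* "yen" ...) list, which the slide elides.
    The '<=' equations on casev and trans are not modeled.
    """
    # ;Constraints ((x1 lex) =c (*or* "yen" ...)): the Turkish root must be a listed verb.
    if x1.get("lex") not in allowed_lex:
        return None
    # ;Transfer: copy tense, agreement and polarity, when present, to the English verb.
    y1 = {feature: x1[feature] for feature in COPIED_FEATURES if feature in x1}
    # ;Generation (y0 = y1): the English Vc takes over the verb's features.
    return y1


if __name__ == "__main__":
    yendi = {"lex": "yen", "TENSE": "past", "AGR-PERSON": "3",
             "AGR-NUMBER": "sg", "POLARITY": "pos"}
    print(vc_rule_2(yendi, frozenset({"yen"})))
    print(vc_rule_2({"lex": "git", "TENSE": "past"}, frozenset({"yen"})))  # None
```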


III. Decoding

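The transfer engine outputs n translations T_1, ..., T_n. We use an English language model to calculate the probability of each translation and pick the one with the highest language model score. For a translation T = w_1 ... w_m, a trigram language model factors this probability as

$$P(T) = P(w_1 \ldots w_m) \approx P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_m \mid w_{m-2} w_{m-1})$$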
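Below is a minimal Python sketch of this decoding step: an add-one smoothed trigram model, trained on a tiny made-up corpus, ranks the candidate translations. The corpus, the smoothing method and the class name are assumptions for illustration. The thesis's own language model, and the log probabilities in the table that follows, are not reproduced here.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


class TrigramLM:
    """Toy trigram LM with add-one smoothing:
    P(w | u v) = (c(u v w) + 1) / (c(u v) + |V|).
    Unknown words get no special treatment, so scores are comparable but not normalized.
    """

    def __init__(self, sentences: list[str]) -> None:
        self.trigram_counts: Counter = Counter()
        self.context_counts: Counter = Counter()  # c(u v): how often (u, v) precedes a word
        vocabulary = {EOS}
        for sentence in sentences:
            words = [BOS, BOS] + sentence.lower().split() + [EOS]
            vocabulary.update(words[2:])
            for u, v, w in zip(words, words[1:], words[2:]):
                self.trigram_counts[(u, v, w)] += 1
                self.context_counts[(u, v)] += 1
        self.vocab_size = len(vocabulary)

    def log_prob(self, sentence: str) -> float:
        """Natural-log probability of the sentence, including the end-of-sentence token."""
        words = [BOS, BOS] + sentence.lower().split() + [EOS]
        return sum(
            math.log((self.trigram_counts[(u, v, w)] + 1)
                     / (self.context_counts[(u, v)] + self.vocab_size))
            for u, v, w in zip(words, words[1:], words[2:])
        )


def pick_best(lm: TrigramLM, translations: list[str]) -> str:
    """Decoding step: return the candidate with the highest language-model score."""
    return max(translations, key=lm.log_prob)


if __name__ == "__main__":
    corpus = [  # made-up training data
        "the man beat his son at home",
        "the man was at home",
        "his son was at home",
    ]
    candidates = [
        "My island beat your son at home",
        "My island beat his son at home",
        "The man beat your son at home",
        "The man beat his son at home",
    ]
    lm = TrigramLM(corpus)
    for candidate in candidates:
        print(f"{lm.log_prob(candidate):9.4f}  {candidate}")
    print("best:", pick_best(lm, candidates))
```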


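For example, for one candidate translation:

$$P(\text{My island beat your son at home}) \approx P(\text{my})\,P(\text{island} \mid \text{my})\,P(\text{beat} \mid \text{my island}) \cdots P(\text{home} \mid \text{son at})$$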


Language model scores of the candidate translations:

Translation                        Log probability
My island beat your son at home    -29.5973
My island beat his son at home     -27.1953
The man beat your son at home      -23.7629
The man beat his son at home       -26.1649



Evaluation


MT Evaluation

• Manual evaluation:
  • SSER (subjective sentence error rate)
  • Correct/Incorrect
• Manual evaluations require human effort and time
• Automatic evaluation:
  • WER (word error rate)
  • BLEU (Bilingual Evaluation Understudy)
  • METEOR


Automatic Evaluation

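Word Error Rate (WER): the number of insertions, deletions, and substitutions required to transform the reference translation into the system translation, divided by the number of words in the reference

BLEU: based on the n-grams of words that the system translation S shares with a set of reference translations (clipped n-gram precisions up to n = 4, combined by a geometric mean and multiplied by a brevity penalty)

METEOR: similar to BLEU, but also considers word roots and synonyms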
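Both automatic metrics fit in a short Python sketch. wer is the standard word-level edit distance normalized by the reference length. sentence_bleu is an unsmoothed, single-sentence version of BLEU: clipped n-gram precisions up to 4-grams, their geometric mean, and a brevity penalty. These are illustrative sketches, not the scoring scripts behind the results reported below. Published BLEU scores are normally computed over a whole test set and scaled to 0–100.

```python
import math
from collections import Counter


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = fewest insertions, deletions and substitutions turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,                 # delete a reference word
                          d[i][j - 1] + 1,                 # insert a hypothesis word
                          d[i - 1][j - 1] + substitution)  # substitute or match
    return d[len(ref)][len(hyp)] / len(ref)


def ngram_counts(words: list[str], n: int) -> Counter:
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def sentence_bleu(hypothesis: str, references: list[str], max_n: int = 4) -> float:
    """Unsmoothed single-sentence BLEU in [0, 1]."""
    hyp = hypothesis.split()
    refs = [reference.split() for reference in references]
    log_precisions = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        # Clip each hypothesis n-gram count by its largest count in any one reference.
        max_ref_counts: Counter = Counter()
        for ref in refs:
            max_ref_counts |= ngram_counts(ref, n)
        matched = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matched == 0:  # also covers hypotheses shorter than n words
            return 0.0
        log_precisions += math.log(matched / total)
    # Brevity penalty against the reference length closest to the hypothesis length.
    c = len(hyp)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precisions / max_n)


if __name__ == "__main__":
    print(wer("at the shopping world", "in the shopping world"))  # 0.25
    print(sentence_bleu("in a protest walk with the blacks",
                        ["in a protest walk with the blacks"]))   # 1.0
    print(sentence_bleu("in the shopping world", ["at the shopping world"]))  # 0.0
```

Take the noun-phrase pair in the shopping world / at the shopping world from the examples. WER is 1/4 = 0.25, but unsmoothed sentence BLEU falls to 0 because the single 4-gram does not match. This is one reason BLEU is usually reported at corpus level.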


Experimental Results

The system contains over 200 transfer rules and 20,000 lexical rules.
It can parse and translate challenging sentences.
Its translations are sound, but not complete.
We tested the system on 192 noun phrases and 70 sentences:
BLEU score for noun phrases: 60.38
BLEU score for sentences: 33.17


Examples

Noun phrase: siyahlarla birlikte bir protesto yürüyüşünde
Translation: in a protest walk with the blacks
Reference: in a protest walk with the blacks

Noun phrase: Elif 'in arkasındaki kapıda
Translation: at the door at the back of Elif
Reference: on the door behind Elif

Noun phrase: alışveriş dünyasında
Translation: in the shopping world
Reference: at the shopping world


Sentence: Bu tutku zamanla bana acı vermeye başladı
Translation: This passion began to give pain to me with time
Reference: In time this passion began to give me pain

Sentence: Perşembe uzun yürüyüşler ve ziyaretler yapıyorum
Translation: I am doing long walks and visits on Thursday
Reference: On Thursdays I take long walks and make visits

Sentence: Kaçtıkça daha büyüdü, bir tutku oldu
Translation: It grew more as escaping, it became a passion
Reference: He grew as he ran away, became an obsession


Conclusions & Future Work

A hybrid machine translation system from Turkish to English:
• wide linguistic coverage through manually crafted transfer rules in Avenue
• ambiguities handled by an English language model
• computationally inefficient translation
• time-consuming development

Future work:
• further improvement of the transfer rules
• learning rules automatically from a parallel corpus


Thank you!