A Hybrid Machine Translation System from Turkish to English
Ferhan Türe
MSc Thesis, Sabancı University
Advisor: Kemal Oflazer
2
Introduction
Goal: Create a machine translation system that translates Turkish text into English text
Turkish has an agglutinative morphology
  ev+im+de+ki+ne: to the one at my home
Turkish has free word order
  Ben eve gittim, Eve gittim ben, Gittim ben eve, ...: I went to the house
Idea: Write rules to translate an analyzed Turkish sentence into English
3
Outline
Machine Translation (MT)
  Motivation
  Challenges in MT
  History of MT
  Classical Approaches to MT
The Hybrid Approach
  Challenges
  Translation Steps
    Analysis and Preprocessing
    Transfer and Generation
    Decoding
Evaluation Methods
  Experimental Results
  Examples
Conclusions
4
Machine Translation
Translation
  Given: Input text s in source language S
  Find: A well-formed text in target language T that is equivalent to s
Machine Translation (MT)
  Any system using an electronic computer to perform translation
5
Motivation
Satisfy increasing demand for translation
  100 languages with 5 million or more native speakers
Reduce the cost and effort of human translation
  13% of EU budget
  weeks vs. minutes
Make information available to more people in less time
  automatic translation of web sites
Explore the limits of computers' abilities and linguistic challenges
6
Challenges in MT
Morphological issues
  Each language has a different morphology
Syntactical issues
  Word order in sentences and noun phrases
  Language-specific features (narrative past tense in Turkish, distinguishing feminine and masculine nouns)
Semantical issues
  Word sense ambiguities
    bank: geographical term OR financial institution?
  Idiomatic phrases
    kafa çekmek: pull head OR drink alcohol?
7
History of MT
Idea by Warren Weaver in 1945
1950s: Russian-English MT research during the Cold War between the US and the USSR
1960s: Funding for research stopped due to failure
Mid-1970s
  MÉTÉO: English-French MT in Canada
  Systran and Eurotra: Multilingual MT in Europe
  TITRAN and MU Project at Kyoto University, Japan
After the 90s
  Statistical MT: Use statistics and large amounts of data
8
MT between English and Turkish
Morphological analyzer
  Oflazer, 1993.
Morphological disambiguator
  Oflazer & Kuruöz, 1994.
  Hakkani-Tür et al., 2000.
  Yuret & Türe, 2006.
English-to-Turkish MT
  Sagay, 1981.
  Hakkani et al., 1998.
  Keyder Turhan, 1997.
No Turkish-to-English system
9
Classical Approaches to MT
10
Vauquois Triangle
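[Figure: Vauquois triangle, with Analysis on the source side, Generation on the target side, Transfer between the two sides at the lexical, syntactic, and semantic levels, and Interlingua at the apex]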
11
Word-by-word Translation
Source sentence -> Bilingual Dictionary -> Target sentence
Source sentence: Ali evdeki kediyi çok sevmez
Translation: Ali home cat very like
Reference: Ali does not like the cat at home very much
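Word-by-word translation amounts to one dictionary lookup per word, with no analysis and no reordering. A minimal sketch follows; the dictionary entries and the function name are assumptions made for this one sentence, not part of any system described here.

```python
# Toy bilingual dictionary (hypothetical entries covering only this sentence).
DICTIONARY = {
    "evdeki": "home",
    "kediyi": "cat",
    "çok": "very",
    "sevmez": "like",
}


def word_by_word(sentence: str) -> str:
    """Replace each word by its dictionary entry; unknown words are copied as-is."""
    return " ".join(DICTIONARY.get(word, word) for word in sentence.split())


output = word_by_word("Ali evdeki kediyi çok sevmez")
print(output)
```

The output keeps the Turkish word order and drops negation and case, which is why it falls so far from the reference translation.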
12
Direct Translation
Source sentence -> Morphological Analyzer -> Lexical Transfer -> Local Reordering -> Target sentence
Source: Ali evde -ki kediyi çok sevmez
Analysis: Ali ev+Loc Rel+Adj kedi+Acc çok+Adv sev+Neg+Present
Lexical: Ali home+Loc at+Adj cat+Acc very much+Adv like+Neg+Present
Reorder: Ali at+Adj home+Loc cat+Acc like+Neg+Present very much+Adv
Generate: Ali at home cat not like very much
13
Transfer-based Translation
Source sentence -> (SL Grammar) -> SL Representation -> (Transfer rules / Dictionary) -> TL Representation -> (TL Grammar) -> Target sentence
14
Transfer-based Translation
Source sentence -> (SL Grammar) -> SL Representation -> (Transfer rules / Dictionary) -> TL Representation -> (TL Grammar) -> Target sentence
Source: mavi evin duvarı
SL representation: [NP [NP [AP [A mavi]] [NP [N ev+in]]] [NP [N duvar+ı]]]
TL representation: [NP [NP [Det the] [N wall]] [PP [Prep of] [NP [Det the] [NP [AP [A blue]] [NP [N house]]]]]]
Target: the wall of the blue house
15
Interlingual Translation
Source sentence -> Analysis -> Interlingua -> Generation -> Target sentence
Source: Ali evdeki kediyi çok sevmez
Interlingua: ¬holds(in_general, like(subj: Ali, obj: cat(at: home), degree: very much))
Translation: Ali does not like the cat at home very much
16
Statistical MT
Given a Turkish sentence t, find the English sentence e that is the “most likely” translation of t
\[
\arg\max_{e} P(e \mid t) = \arg\max_{e} \frac{P(t \mid e)\,P(e)}{P(t)} = \arg\max_{e} P(t \mid e)\,P(e)
\]
17
Statistical MT
Translation Model P(t|e): whether an English text e is a good translation of a Turkish text t
  Input: Turkish-English aligned text
Language Model P(e): whether an English text e is well-formed English or not
  Input: English text
Decoding: argmax_e P(e) * P(t|e)
18
Statistical MT
Source: Ali çok açtı

Translation e        LM score P(e)   TM score P(t|e)   Score P(t|e)×P(e)
I have a book        0.9             0.2               0.18
Hungry Ali be so     0.1             0.8               0.08
Ali was so hungry    0.8             0.8               0.64
...

Best translation: Ali was so hungry
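The scoring in this example can be reproduced in a few lines. This is only a sketch of the arithmetic: the candidate list is copied from the slide, while the function names are assumptions, and a real decoder searches a far larger space than three hand-picked candidates.

```python
# (candidate e, language model P(e), translation model P(t|e)) for t = "Ali çok açtı".
CANDIDATES = [
    ("I have a book", 0.9, 0.2),
    ("Hungry Ali be so", 0.1, 0.8),
    ("Ali was so hungry", 0.8, 0.8),
]


def noisy_channel_score(lm_prob: float, tm_prob: float) -> float:
    """Score of a candidate: P(t|e) * P(e)."""
    return tm_prob * lm_prob


def decode(candidates):
    """Return the candidate with the highest combined score."""
    best = max(candidates, key=lambda c: noisy_channel_score(c[1], c[2]))
    return best[0]


best_translation = decode(CANDIDATES)
print(best_translation)
```

The fluent but unrelated candidate loses on the translation model, and the faithful but ill-formed candidate loses on the language model, so only the third candidate scores well on both.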
19
Outline
Machine Translation (MT)
  Motivation
  Challenges in MT
  History of MT
  Classical Approaches to MT
The Hybrid Approach
  Challenges
  Translation Steps
    Analysis and Preprocessing
    Transfer and Generation
    Decoding
Evaluation Methods
  Experimental Results
  Examples
Conclusions
20
The Hybrid Approach
21
Why Hybrid?
Classical transfer-based approaches are good at representing the structural differences between the source and target languages, and statistical methods are good at extracting knowledge from large amounts of data about how well-formed a sentence is or how “meaningful” a translation is.
22
Challenges
Avrupalılaştıramadıklarımızdanmışsınız
You were among the ones who we were not able to cause to become European
Morphological differences
Extreme case of a word in an agglutinative language
Each Turkish morpheme corresponds to one or more words in English
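The morpheme-to-words correspondence can be sketched with the shorter example from the Introduction, ev+im+de+ki+ne. The gloss table and the function name are assumptions made for illustration, and reversing the morpheme order happens to produce the right English for this word only; it is not a general rule.

```python
# Hypothetical glosses for the morphemes of ev+im+de+ki+ne (illustration only).
GLOSSES = {
    "ev": "home",
    "im": "my",
    "de": "at",
    "ki": "the one",
    "ne": "to",
}


def gloss_morphemes(analysis: str) -> str:
    """Gloss each '+'-separated morpheme, reading the morphemes right to left."""
    morphemes = analysis.split("+")
    return " ".join(GLOSSES[m] for m in reversed(morphemes))


translation = gloss_morphemes("ev+im+de+ki+ne")
print(translation)
```

A single Turkish word yields six English words here, and one suffix (ki) maps to two of them.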
23
Challenges
Morphological differences
arkadaşımdakiler: the ones at my friend
24
Challenges
Structural differences
dinle+miş+sin: (someone told me that) you listened
dinle+di+n: you listened
dinle+t+ti+n: you made (someone) listen
dinle+t+tir+di+n: you had (someone) make (someone) listen
dinle+r+im: I listen
dinle+r+di+m: I used to listen
dinle+t+ebil+ir+miş+im: ???
25
Challenges
Structural differences
Turkish                     Turkish order      English                              English order
Adam evde kitap okuyordu    SUBJ ADJCT OBJ V   The man was reading a book at home   SUBJ V OBJ ADJCT
mavi kitap                  AP NP              blue book                            AP NP
evdeki kitap                AP NP              the book at home                     NP AP
kitabımın kapağı            NP1 NP2            my book’s cover                      NP1 NP2
arkadaşımın yüzünden        NP1 NP2            because of my friend                 NP2 NP1
26
Challenges
Ambiguities
koyun
1. sheep (or bosom)
2. your bay
3. your dark (one)
4. of the bay
5. put!
27
Challenges
Ambiguities
silahını evine koy
1. put your gun to your home
2. put your gun to his home
3. put his gun to your home
4. put his gun to his home
5. put your gun to her home
6. put her gun to your home
7. put her gun to her home
...
28
Challenges
Ambiguities
kitabın kapağı
1. the book’s cover
2. book’s cover
3. the cover of the book
29
Challenges
Ambiguities
ev+Dative (gitti): (went) to the house
masa+Dative (çıktı): (jumped) on the table
adam+Dative (baktı): (looked) at the man
30
Challenges
Morphological differences: Use morphological analysis on the Turkish side and generation on the English side
Structural differences: Transfer rules can represent such transformations
Ambiguities: An English language model can determine the most probable translation statistically
31
The Avenue Transfer System
Avenue Project initiated by CMU LTI Group
Grammar formalism, which allows one to manually create a parallel grammar between two languages
and
Transfer engine, which transfers the source sentence into possible target sentence(s) using this parallel grammar
32
Overview of Our Approach
1. Turkish sentence
2. Analysis: Morphological Analyzer and Preprocessor (output: lattice)
3. Avenue Transfer Engine with transfer rules (output: English translations ...)
4. English Language Model (output: most probable English translation)
33
I. Analysis and Preprocessing
Morphological analyses of each word: A set of features describing the structural properties of the word
Example sentence: adam evde oğlunu yendi
34
I. Analysis and Preprocessing
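Lattice representation of the sentence (states 0 to 6), with the inflectional groups (IGs) of each analysis:
adam:   ada+N+P1Sg (IG111) | adam+N+PNon (IG121)
evde:   ev+N+Loc (IG211)
oğlunu: oğul+N+P2Sg (IG311) | oğul+N+P3Sg (IG321)
yendi:  ye+V +Pass+V+Past (IG411, IG412) | yen+N Zero+V+Past (IG421, IG422) | yen+V+Past (IG431)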
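Each path through the lattice picks one analysis per word, so the number of readings is the product of the number of analyses of each word. The sketch below enumerates them for the example sentence; the analyses are copied from the slide, while the data layout and the function name are assumptions, not the preprocessor's actual representation.

```python
from itertools import product

# Morphological analyses per word, as listed on the slide.
ANALYSES = {
    "adam": ["ada+N+P1Sg", "adam+N+PNon"],
    "evde": ["ev+N+Loc"],
    "oğlunu": ["oğul+N+P2Sg", "oğul+N+P3Sg"],
    "yendi": ["ye+V +Pass+V+Past", "yen+N Zero+V+Past", "yen+V+Past"],
}


def enumerate_paths(sentence: str):
    """Return every combination of one analysis per word, in sentence order."""
    options = [ANALYSES[word] for word in sentence.split()]
    return list(product(*options))


paths = enumerate_paths("adam evde oğlunu yendi")
print(len(paths))  # 2 * 1 * 2 * 3 combinations
for path in paths[:3]:
    print(path)
```

Even this four-word sentence has twelve readings, which is why the later decoding step needs a language model to rank the resulting translations.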
35
I. Analysis and Preprocessing
Representation of IGs
[Figure: representations of IG111, IG121, IG211, IG311, IG321, IG411, IG412, IG421, IG422, and IG431]
36
II. Transfer and Generation
37
II. Transfer and Generation
Selected path through the lattice: IG121 IG211 IG321 IG431
38
II. Transfer and Generation
Selected path through the lattice: IG121 IG211 IG321 IG431
Part-of-speech tags: N N N V
39
II. Transfer and Generation
Turkish: [N adam] [N evde] [N oğlunu] [V yendi]
English: [N man] [V won] [N son] [N house]
40
II. Transfer and Generation
Turkish: [NP [N adam]] [N evde] [N oğlunu] [V yendi]
English: [NP the [N man]] [V won] [N son] [N house]
41
II. Transfer and Generation
Turkish: [SUBJ [NP [N adam]]] [N evde] [N oğlunu] [V yendi]
English: [SUBJ [NP the [N man]]] [V won] [N son] [N house]
42
II. Transfer and Generation
Turkish: [SUBJ [NP [N adam]]] [NP [N evde]] [N oğlunu] [V yendi]
English: [SUBJ [NP the [N man]]] [V won] [N son] [NP the [N house]]
43
II. Transfer and Generation
Turkish: [SUBJ [NP [N adam]]] [Adjct [NP [N evde]]] [N oğlunu] [V yendi]
English: [SUBJ [NP the [N man]]] [V won] [N son] [Adjct at [NP the [N house]]]
44
II. Transfer and Generation
Turkish: [SUBJ [NP [N adam]]] [Adjct [NP [N evde]]] [NP [N oğlunu]] [V yendi]
English: [SUBJ [NP the [N man]]] [V won] [NP his [N son]] [Adjct at [NP the [N house]]]
45
II. Transfer and Generation
Turkish: [SUBJ [NP [N adam]]] [Adjct [NP [N evde]]] [OBJ [NP [N oğlunu]]] [V yendi]
English: [SUBJ [NP the [N man]]] [V won] [OBJ [NP his [N son]]] [Adjct at [NP the [N house]]]
46
II. Transfer and Generation
Turkish: [SUBJ [NP [N adam]]] [Adjct [NP [N evde]]] [OBJ [NP [N oğlunu]]] [Vc [V yendi]]
English: [SUBJ [NP the [N man]]] [Vc [V won]] [OBJ [NP his [N son]]] [Adjct at [NP the [N house]]]
47
II. Transfer and Generation
Turkish: [SUBJ [NP [N adam]]] [Adjct [NP [N evde]]] [OBJ [NP [N oğlunu]]] [Vfin [Vc [V yendi]]]
English: [SUBJ [NP the [N man]]] [Vfin [Vc [V won]]] [OBJ [NP his [N son]]] [Adjct at [NP the [N house]]]
48
II. Transfer and Generation
Turkish: [S [SUBJ [NP [N adam]]] [Adjct [NP [N evde]]] [OBJ [NP [N oğlunu]]] [Vfin [Vc [V yendi]]]]
English: [S [SUBJ [NP the [N man]]] [Vfin [Vc [V won]]] [OBJ [NP his [N son]]] [Adjct at [NP the [N house]]]]
49
II. Transfer and Generation
Turkish: S -> SUBJ Adjct OBJ Vfin
English: S -> SUBJ Vfin OBJ Adjct
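The sentence-level step reorders already-translated constituents. A minimal sketch of that reordering follows, assuming the English order SUBJ Vfin OBJ Adjct as read off the English word order in the preceding slides; the dictionary layout and the function name are hypothetical and do not reflect how the Avenue engine stores its trees.

```python
# Constituent order on each side of the S rule (the English order is an
# interpretation of the slides, not a quoted rule).
TURKISH_ORDER = ["SUBJ", "Adjct", "OBJ", "Vfin"]
ENGLISH_ORDER = ["SUBJ", "Vfin", "OBJ", "Adjct"]


def reorder_sentence(translated: dict) -> str:
    """Emit the translated constituents in English order."""
    missing = [role for role in TURKISH_ORDER if role not in translated]
    if missing:
        raise ValueError(f"missing constituents: {missing}")
    return " ".join(translated[role] for role in ENGLISH_ORDER)


# English phrases built in the earlier steps for "adam evde oğlunu yendi".
constituents = {
    "SUBJ": "the man",
    "Adjct": "at the house",
    "OBJ": "his son",
    "Vfin": "won",
}
english = reorder_sentence(constituents)
print(english)
```

The lexical choices (won, house) follow this one path through the transfer engine; other paths produce other candidates, and choosing among them is left to the decoding step.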
50
II. Transfer and Generation
Turkish side: Adjunct -> NP
English side: Adjunct -> at NP

{Adjunct,3}
Adjunct::Adjunct : [NP] -> ["at" NP]
(
  (x1::y2)
  (x0 = x1)
  ((x1 CASE) =c Loc)
  ((x1 poss) =c yes)
  (y0 = x0)
)
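The effect of this rule can be imitated in a few lines. This is only a sketch: it reads =c as a check that the feature is present with the given value, which is an assumption about the formalism, and the feature dictionary, the function name, and the sample inputs are all hypothetical.

```python
from typing import List, Optional


def apply_adjunct_rule(np_features: dict, english_np: List[str]) -> Optional[List[str]]:
    """Imitate {Adjunct,3}: [NP] -> ["at" NP], guarded by the two =c constraints."""
    if np_features.get("CASE") != "Loc":   # ((x1 CASE) =c Loc)
        return None
    if np_features.get("poss") != "yes":   # ((x1 poss) =c yes)
        return None
    return ["at"] + english_np             # (x1::y2): the NP becomes the second element


# Hypothetical inputs: a possessed locative NP and a dative NP.
accepted = apply_adjunct_rule({"CASE": "Loc", "poss": "yes"}, ["my", "home"])
rejected = apply_adjunct_rule({"CASE": "Dat", "poss": "yes"}, ["my", "home"])
print(accepted, rejected)
```

When a constraint fails the rule simply does not apply, and another Adjunct rule would have to cover that case.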
51
II. Transfer and Generation
[Figure: Vc and Vfin nodes on the Turkish and English sides]

;; yendi -> won
{Vc,2}
Vc::Vc : [V] -> [V]
(
  (x1::y1)
  ;Analysis
  (x0 = x1)
  ;Constraints
  ((x1 lex) =c (*or* "yen" ...))
  ((x0 casev) <= Acc)
  ((x0 trans) <= yes)
  ;Transfer
  ((y1 TENSE) = (x1 TENSE))
  ((y1 AGR-PERSON) = (x1 AGR-PERSON))
  ((y1 AGR-NUMBER) = (x1 AGR-NUMBER))
  ((y1 POLARITY) = (x1 POLARITY))
  ;Generation
  (y0 = y1)
)
52
III. Decoding
Transfer engine outputs n translations T1, ..., Tn
We use an English language model to calculate the probability of each translation, and pick the one with the highest language model score
\[
P(T) = P(w_1 \ldots w_m) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_m \mid w_{m-2} w_{m-1})
\]
53
III. Decoding
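\[
P(\text{My island beat your son at home}) = P(\text{my})\,P(\text{island} \mid \text{my})\,P(\text{beat} \mid \text{my}, \text{island}) \cdots P(\text{home} \mid \text{son}, \text{at})
\]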
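The trigram factorization can be sketched as a sum of log probabilities, each conditioned on at most the two preceding words. The probability table and the fallback value below are invented for illustration; a real language model estimates them from a large English corpus and uses proper smoothing.

```python
import math

# Hypothetical conditional probabilities P(word | up to two previous words).
TOY_PROBS = {
    ((), "my"): 0.1,
    (("my",), "island"): 0.01,
    (("my", "island"), "beat"): 0.001,
}
DEFAULT_PROB = 0.05  # assumed fallback for contexts missing from the toy table


def trigram_log_prob(sentence: str) -> float:
    """Sum log P(w_i | w_{i-2} w_{i-1}) over the words of the sentence."""
    words = sentence.lower().split()
    total = 0.0
    for i, word in enumerate(words):
        context = tuple(words[max(0, i - 2):i])
        total += math.log(TOY_PROBS.get((context, word), DEFAULT_PROB))
    return total


score = trigram_log_prob("My island beat your son at home")
print(score)
```

Working in log space avoids numerical underflow on long sentences and gives scores that are negative numbers, with values closer to zero being more probable.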
54
III. Decoding
Translation                        Log probability
My island beat your son at home    -29.5973
My island beat his son at home     -27.1953
The man beat your son at home      -23.7629
The man beat his son at home       -26.1649
55
Outline
Machine Translation (MT)
  Motivation
  Challenges in MT
  History of MT
  Classical Approaches to MT
The Hybrid Approach
  Challenges
  Translation Steps
    Analysis and Preprocessing
    Transfer and Generation
    Decoding
Evaluation Methods
  Experimental Results
  Examples
Conclusions
56
Evaluation
57
MT Evaluation
•Manual evaluation:
  •SSER (subjective sentence error rate)
  •Correct/Incorrect
•Manual evaluations require human effort and time
•Automatic evaluation:
  •WER (word error rate)
  •BLEU (Bilingual Evaluation Understudy)
  •METEOR
58
Automatic Evaluation
Word Error Rate (WER)
  Number of insertions, deletions, and substitutions required to transform the reference translation into the system translation
BLEU
  Number of common n-grams of words between the system translation S and a set of reference translations
METEOR
  Similar to BLEU, considers roots and synonyms
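The first two metrics can be sketched with short functions. Two assumptions are made here: WER is normalized by the reference length, which is the usual convention but is not stated on the slide, and the n-gram function computes only clipped n-gram precision against a single reference, which is one ingredient of BLEU rather than the full score. The sentences are taken from the Examples slides.

```python
from collections import Counter


def edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def ngram_precision(reference: str, hypothesis: str, n: int) -> float:
    """Share of hypothesis n-grams also found in the reference, with clipped counts."""
    def ngrams(words):
        return Counter(tuple(words[k:k + n]) for k in range(len(words) - n + 1))

    ref_counts = ngrams(reference.split())
    hyp_counts = ngrams(hypothesis.split())
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / max(1, sum(hyp_counts.values()))


wer = word_error_rate("on the door behind Elif", "at the door at the back of Elif")
unigram = ngram_precision("at the shopping world", "in the shopping world", 1)
bigram = ngram_precision("at the shopping world", "in the shopping world", 2)
print(wer, unigram, bigram)
```

A translation can be acceptable to a human reader and still score poorly on these measures, since both compare surface words against the reference only.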
59
Experimental Results
The system contains over 200 transfer rules and 20,000 lexical rules
It can parse and translate challenging sentences
Translations are sound, but not complete
We tested the system on 192 noun phrases and 70 sentences.
BLEU score for noun phrases: 60.38
BLEU score for sentences: 33.17
60
Examples
Noun phrase: siyahlarla birlikte bir protesto yürüyüşünde
Translation: in a protest walk with the blacks
Reference: in a protest walk with the blacks

Noun phrase: Elif 'in arkasındaki kapıda
Translation: at the door at the back of Elif
Reference: on the door behind Elif

Noun phrase: alışveriş dünyasında
Translation: in the shopping world
Reference: at the shopping world
61
Examples
Sentence: Bu tutku zamanla bana acı vermeye başladı
Translation: This passion began to give pain to me with time
Reference: In time this passion began to give me pain

Sentence: Perşembe uzun yürüyüşler ve ziyaretler yapıyorum
Translation: I am doing long walks and visits on Thursday
Reference: On Thursdays I take long walks and make visits

Sentence: Kaçtıkça daha büyüdü, bir tutku oldu
Translation: It grew more as escaping, it became a passion
Reference: He grew as he ran away, became an obsession
62
Conclusions & Future Work
A hybrid machine translation system from Turkish to English
  wide linguistic coverage by manually-crafted transfer rules in Avenue
  ambiguities handled by English language model
  computationally inefficient translation
  time-consuming development
Future work
  further improvement of transfer rules
  learning rules automatically from parallel corpus
63
Thank you!