4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation
![Page 1: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/1.jpg)
Example-Based Machine Translation
Josef van Genabith, CNGL, Dublin City University
Khalil Sima’an, University of Amsterdam
![Page 2: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/2.jpg)
Notes are based on
- Carl, M. and Way, A., editors (2003). Recent Advances in Example-Based Machine Translation. Kluwer Academic Publishers, Dordrecht, The Netherlands.
- Dandapat, S. (2012). Mitigating the Problems of SMT using EBMT. PhD thesis, DCU.
- Dandapat, S., Morrissey, S., Way, A., and van Genabith, J. (2012). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the EACL 2012 Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), pages 48–58, Avignon, France.
- Gough, N. and Way, A. (2004). Robust Large-Scale EBMT with Marker-Based Segmentation. In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2004), pages 95–104, Baltimore, MD.
- Green, T. (1979). The Necessity of Syntax Markers: Two Experiments with Artificial Languages. Journal of Verbal Learning and Verbal Behavior, 18:481–496.
- Groves, D. and Way, A. (2006). Hybridity in MT: Experiments on the Europarl Corpus. In Proceedings of the 11th Annual Conference of the European Association for Machine Translation (EAMT 2006), pages 115–124, Oslo, Norway.
- Hutchins, J. (2005). Example-Based Machine Translation: a Review and Commentary. Machine Translation, 19(3–4):197–211.
- Lepage, Y. and Denoual, E. (2005c). The ‘purest’ EBMT System Ever Built: No Variables, No Templates, No Training, Examples, Just Examples, Only Examples. In Proceedings of the 2nd Workshop on Example-Based Machine Translation, a Workshop at MT Summit X, pages 81–90, Phuket, Thailand.
- Nagao, M. (1984). A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. In Elithorn, A. and Banerji, R., editors, Artificial and Human Intelligence, pages 173–180. North-Holland, Amsterdam.
- Somers, H., Dandapat, S., and Naskar, S. K. (2009). A Review of EBMT Using Proportional Analogy. In Proceedings of the 3rd Workshop on Example-Based Machine Translation (EBMT 2009), pages 53–60, Dublin, Ireland.
- Wu, D. (2006). MT Model Space: Statistical versus Compositional versus Example-Based Machine Translation. Machine Translation (2005), 19:213–227.
Acknowledgements
![Page 3: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/3.jpg)
Input: He buys a book on international politics
Matches + Alignment:
- He buys a notebook. ⇔ Kare wa nōto o kau.
- I read a book on international politics. ⇔ Watashi wa kokusai seiji nitsuite kakareta hon o yomu.
Recombination result: Kare wa kokusai seiji nitsuite kakareta hon o kau.
Example (Sato & Nagao 1990)
From: Sandipan Dandapat, PhD, 2012
![Page 4: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/4.jpg)
- “. . . translation is a fine and exciting art, but there is much about it that is mechanical and routine.” Martin Kay (1997)
- SMT and EBMT systems are corpus-based approaches to MT
  - SMT: phrase translation probabilities, word reordering probabilities, lexical weighting
  - EBMT: usually lacks a well-defined probability model
EBMT
![Page 5: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/5.jpg)
- EBMT generally uses a sentence-aligned parallel text (TM) as the primary source of data.
- EBMT systems search the source side of the example base for close matches to the input sentence and obtain the corresponding target segments at runtime
  - target segments are reused during recombination
- EBMT is often linked with the related concept of “Translation Memory” (TM).
  - TM is an interactive tool for human translators
  - EBMT is fully automatic translation
EBMT and TM
![Page 6: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/6.jpg)
- EBMT: supposed to be good on limited amounts of data and on homogeneous data (lots of repetition)
- In some cases EBMT systems produce a good translation where SMT systems fail, and vice versa
EBMT
![Page 7: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/7.jpg)
- The phrase-based SMT approach has proven to be the most successful MT approach in MT competitions, e.g. NIST, WMT, IWSLT.
- SMT systems discard the actual training data once the translation model and language model have been estimated
  - => cannot always guarantee good-quality translations for sentences which closely match those in the training corpora
EBMT and SMT
![Page 8: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/8.jpg)
- Two main approaches to EBMT:
  - Runtime, using proportional analogy
  - Compile time, using a generalized translation-template-based EBMT model
EBMT
![Page 9: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/9.jpg)
- Rule-based or data-driven MT
- Data-driven MT: EBMT and SMT
- Corpus-based data-driven approaches derive knowledge from parallel corpora to translate new input
- Mostly SMT today
- A few EBMT (hybrid) systems include CMU-EBMT (Brown, 2011) and Cunei (Phillips, 2011)
- No commercial EBMT?
Background
![Page 10: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/10.jpg)
- Nagao (1984)
  - “MT by analogy principle”
“Man does not translate a simple sentence by doing deep linguistic analysis, rather, man does translation, first, by properly decomposing an input sentence into certain fragmental phrases, ... then by translating these phrases into other language phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase will be done by the analogy translation principle with proper examples as its reference.” (Nagao, 1984, p.178)
Where did it start?
![Page 11: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/11.jpg)
- Translation in three steps: matching, alignment and recombination (Somers, 2003):
  - Matching: finds the example or set of examples from the bitext which most closely match the source-language string to be translated.
  - Alignment: extracts the source–target translation equivalents from the retrieved examples of the matching step.
  - Recombination: produces the final translation by combining the target translations of the relevant subsentential fragments.
EBMT core steps
![Page 12: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/12.jpg)
EBMT core steps
Sandipan Dandapat, PhD, 2012
![Page 13: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/13.jpg)
- Where can I find tourist information
- Where can I find ladies dresses ⇔ bayan kıyafetlerini nereden bulabilirim
- just in front of the tourist information ⇔ turist bilgilerini hemen önünde
- Where can I find ⇔ nereden bulabilirim
- tourist information ⇔ turist bilgilerini
- Where can I find tourist information ⇔ turist bilgilerini nereden bulabilirim
- Update example base …
Informal Example
![Page 14: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/14.jpg)
- EBMT systems differ widely in their matching stages
  - all involve a distance or similarity measure of some kind (e.g. edit distance)
Character-Based Matching
- dynamic programming technique, e.g. Levenshtein distance
a. The President agrees with the decision.
b. The President disagrees with the decision.
c. The President concurs with the decision.
- Problem: the system will choose (b) given (a)
Varieties of EBMT
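The problem with character-based matching can be reproduced in a few lines. This is a minimal sketch, assuming plain Levenshtein distance over characters with unit costs; the function name `levenshtein` and the variable names are illustrative and not taken from any particular EBMT system.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (unit costs) via dynamic programming."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


query = "The President agrees with the decision."
examples = [
    "The President disagrees with the decision.",  # (b)
    "The President concurs with the decision.",    # (c)
]
distances = {ex: levenshtein(query, ex) for ex in examples}
best = min(examples, key=distances.get)
print(best)  # the example with the smallest character distance
```

With unit costs, (b) is only three insertions away from (a), while (c) needs more edits, so the sentence with the opposite meaning is retrieved as the closest match.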
![Page 15: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/15.jpg)
Word-Based Matching:
- Nagao (1984)
- uses dictionaries and thesauri to determine the relative word distance in terms of meaning (Sumita et al., 1990)
a. The President agrees with the decision.
b. The President disagrees with the decision.
c. The President concurs with the decision.
- The system will choose (c) given (a).
- Linguistics/knowledge heavy: WordNet, thesaurus, ontology, …
Varieties of EBMT
![Page 16: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/16.jpg)
Pattern-Based Matching
- similar examples can be used to abstract “generalised” translation templates
- Brown (1999):
  - NE equivalence classes, such as person, date and city
  - some linguistic information, such as gender and number
a. John Miller flew to Frankfurt on December 3rd.
b. ⟨FIRSTNAME-M⟩ ⟨LASTNAME⟩ flew to ⟨CITY⟩ on ⟨MONTH⟩ ⟨ORDINAL⟩.
c. ⟨PERSON-M⟩ flew to ⟨CITY⟩ on ⟨DATE⟩.
Varieties of EBMT
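The generalisation from (a) to (b) to (c) can be sketched as two substitution passes. This is only an illustration: the tiny gazetteer `CLASSES` and the merge table `MERGES` are hypothetical stand-ins for the equivalence classes a real system would use, and they cover only the tokens of this one sentence.

```python
import re

# Hypothetical toy equivalence classes (assumption, for illustration only).
CLASSES = {
    "FIRSTNAME-M": ["John"],
    "LASTNAME": ["Miller"],
    "CITY": ["Frankfurt"],
    "MONTH": ["December"],
    "ORDINAL": ["3rd"],
}

# Hypothetical second-level merges of adjacent class labels.
MERGES = {
    "⟨FIRSTNAME-M⟩ ⟨LASTNAME⟩": "⟨PERSON-M⟩",
    "⟨MONTH⟩ ⟨ORDINAL⟩": "⟨DATE⟩",
}


def generalise(sentence: str) -> str:
    """Replace every known class member with its class label."""
    for label, members in CLASSES.items():
        for member in members:
            sentence = re.sub(rf"\b{re.escape(member)}\b", f"⟨{label}⟩", sentence)
    return sentence


def merge(template: str) -> str:
    """Collapse sequences of fine-grained labels into coarser ones."""
    for pattern, label in MERGES.items():
        template = template.replace(pattern, label)
    return template


level_b = generalise("John Miller flew to Frankfurt on December 3rd.")
level_c = merge(level_b)
print(level_b)
print(level_c)
```

The more abstract template (c) matches more inputs than (b), at the cost of needing more knowledge to fill the slots back in on the target side.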
![Page 17: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/17.jpg)
Syntax-Based Matching
- Kaji et al. (1992):
  - Source and target side parsers
  - Alignment using bilingual dictionaries
  - Generate translation templates
Varieties of EBMT
![Page 18: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/18.jpg)
- X1[NP] no nagasa wa saidai 512 baito de aru ⇔ The maximum length of X1[NP] is 512 bytes
- X1[NP] no nagasa wa saidai X2[N] baito de aru ⇔ The maximum length of X1[NP] is X2[N] bytes
Looks almost like HPB-SMT …
Varieties of EBMT
![Page 19: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/19.jpg)
Marker-Based Matching
- Green (1979):
  - Closed-class marker words/morphs
  - Used to chunk
- Veale and Way (1997), Gough and Way (2004)
Varieties of EBMT
![Page 20: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/20.jpg)
that is almost a personal record for me this autumn ⇔ c’ est pratiquement un record personnel pour moi cet automne
[<DET>that is almost] [<DET>a personal record] [<PREP>for <PRON> me <DET> this autumn] ⇔ [<DET>c’ est pratiquement] [<DET>un record personnel] [<PREP>pour <PRON> moi <DET> cet automne]
Varieties of EBMT
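The chunking shown above can be approximated with a short routine. This is a sketch under stated assumptions: the marker lexicons `MARKERS_EN` and `MARKERS_FR` are hypothetical and contain only the words of this example, and the rule used (open a new chunk at a marker word only once the current chunk already contains a non-marker word) is one common reading of marker-based segmentation, not a reproduction of any specific system.

```python
# Hypothetical toy marker lexicons (closed-class words and their tags).
MARKERS_EN = {"that": "DET", "a": "DET", "this": "DET", "for": "PREP", "me": "PRON"}
MARKERS_FR = {"c’": "DET", "un": "DET", "cet": "DET", "pour": "PREP", "moi": "PRON"}


def marker_chunks(sentence, markers):
    """Split a sentence into (tag, chunk) pairs at marker words.

    A marker word starts a new chunk only if the current chunk already
    holds at least one non-marker (content) word, so runs of adjacent
    markers stay in the same chunk.
    """
    chunks = []
    tag, tokens, has_content = None, [], False
    for token in sentence.split():
        marker_tag = markers.get(token.lower())
        if marker_tag is not None and has_content:
            chunks.append((tag, " ".join(tokens)))
            tag, tokens, has_content = None, [], False
        if marker_tag is not None:
            if not tokens:
                tag = marker_tag        # chunk is labelled by its first marker
        else:
            has_content = True
        tokens.append(token)
    if tokens:
        chunks.append((tag, " ".join(tokens)))
    return chunks


en = marker_chunks("that is almost a personal record for me this autumn", MARKERS_EN)
fr = marker_chunks("c’ est pratiquement un record personnel pour moi cet automne", MARKERS_FR)
for (tag_en, chunk_en), (tag_fr, chunk_fr) in zip(en, fr):
    print(f"<{tag_en}> {chunk_en}  ⇔  <{tag_fr}> {chunk_fr}")
```

Here both sides yield three chunks with matching tags, so they can be paired position by position; in general the chunk counts differ and an explicit alignment step is needed.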
![Page 21: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/21.jpg)
[<DET>that is almost] [<DET>a personal record] [<PREP>for <PRON> me <DET> this autumn] ⇔ [<DET>c’ est pratiquement] [<DET>un record personnel] [<PREP>pour <PRON> moi <DET> cet automne]
a. <DET>that is almost ⇔ <DET>c’ est pratiquement
b. <DET>a personal record ⇔ <DET>un record personnel
c. <PREP>for me this autumn ⇔ <PREP>pour moi cet automne
a. <DET> is almost ⇔ <DET> est pratiquement
b. <DET> personal record ⇔ <DET> record personnel
c. <PREP> me this autumn ⇔ <PREP> moi cet automne
Varieties of EBMT
![Page 22: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/22.jpg)
a. <PREP> for ⇔ <PREP> pour
b. autumn ⇔ automne
Varieties of EBMT
![Page 23: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/23.jpg)
The monkey ate a peach. ⇔ saru wa momo o tabeta.
The man ate a peach. ⇔ hito wa momo o tabeta.
monkey ⇔ saru
man ⇔ hito
The … ate a peach. ⇔ … wa momo o tabeta.
The dog ate a rabbit. ⇔ inu wa usagi o tabeta.
dog ⇔ inu
rabbit ⇔ usagi
The … ate a … . ⇔ … wa … o tabeta.
Varieties of EBMT
![Page 24: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/24.jpg)
- first introduced as an analogy-based approach to MT
  - also called “case-based”, “memory-based” and “experience-guided” MT
- Many, many varieties …
- two main approaches
  - With or without preprocessing/training stage
  - Pure/runtime EBMT vs. compiled EBMT
Approaches to EBMT
![Page 25: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/25.jpg)
Pure/runtime EBMT:
- (e.g. Lepage and Denoual, 2005b)
- No time consumed for training/preprocessing
- But: runtime/translation complexity very considerable …
Compiled approaches:
- (e.g. Al-Adhaileh and Tang, 1999; Cicekli and Güvenir, 2001)
- Pre-compute units below sentence level before prediction/translation time
Approaches to EBMT
![Page 26: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/26.jpg)
- Everything happens at the translation stage
  - Lepage and Denoual (2005c)
- Based on proportional analogy (PA)
  - type of analogical learning
- A : B :: C : D
  - “A is to B as C is to D”
Pure/runtime EBMT
![Page 27: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/27.jpg)
- A : B :: C : D
  - “A is to B as C is to D”
- A global relationship between 4 objects
- “::” ~ “=”
- “Analogical equation”
  - A : B :: C : D?
  - Can have one or more solutions
- Plato, Aristotle, …
- Artificial Intelligence
Analogical Reasoning
![Page 28: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/28.jpg)
a. lungs are to humans as gills are to fish
b. cat : kitten :: dog : puppy
c. speak : spoken :: break : broken
Analogies
![Page 29: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/29.jpg)
a. lungs are to humans as gills are to X? X = fish
b. cat : kitten :: dog : X? X = puppy
c. speak : spoken :: break : X? X = broken
Analogies
![Page 30: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/30.jpg)
a. lungs are to humans as gills are to fish
b. cat : kitten :: dog : puppy
c. speak : spoken :: break : broken
Note: only (c) is a formal analogy! Computable using string operations …
These are the ones we’ll be concerned with most (but not exclusively …)
Analogies
![Page 31: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/31.jpg)
- Lepage (1998)
  - algorithm that solves analogical equations over strings of characters
  - based on longest common subsequences and edit distance
  - can handle
    - insertion/deletion of prefixes and suffixes (a),
    - exchange of prefixes/suffixes (b),
    - infixing (c),
    - parallel infixing (d).
Solving Analogical Equations
![Page 32: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/32.jpg)
a. (French) répression : répressionnaire :: réaction : x ⇒ x=réactionnaire
b. wolf : wolves :: leaf : x ⇒ x=leaves
c. (German) fliehen : floh :: schließen : x ⇒ x=schloß
d. (Proto-Semitic) yasriqu : sariq :: yanqimu : x ⇒ x=naqim
Solving Analogical Equations
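To make the idea of a formal analogy concrete, here is a deliberately simplified solver. It is not Lepage's algorithm: it only handles the case where A and B share a stem and differ in their endings, and C carries the same ending as A, which covers examples (a) and (b) but not infixing as in (c) and (d). The function name `solve_suffix_analogy` is illustrative.

```python
def solve_suffix_analogy(a: str, b: str, c: str):
    """Solve a : b :: c : x for suffix insertion/exchange only.

    Returns the solution string, or None if this simplified method
    does not apply (e.g. the analogy requires infixing).
    """
    # Longest common prefix of a and b is taken as the shared stem.
    stem_len = 0
    while stem_len < min(len(a), len(b)) and a[stem_len] == b[stem_len]:
        stem_len += 1
    suffix_a, suffix_b = a[stem_len:], b[stem_len:]

    # c must end with the same material that a loses.
    if not c.endswith(suffix_a):
        return None
    return c[:len(c) - len(suffix_a)] + suffix_b


print(solve_suffix_analogy("répression", "répressionnaire", "réaction"))
print(solve_suffix_analogy("wolf", "wolves", "leaf"))
print(solve_suffix_analogy("speak", "spoken", "break"))
print(solve_suffix_analogy("fliehen", "floh", "schließen"))  # infixing: not handled
```

The last call returns `None`, which illustrates why a full solver needs the more general machinery based on longest common subsequences and edit distance.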
![Page 33: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/33.jpg)
Solving Analogical Equations
![Page 34: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/34.jpg)
Analogy-Based EBMT
- “The ‘purest’ EBMT system ever built: no variables, no templates, no training, examples, just examples, only examples”
  - (Lepage and Denoual 2005c)
Pure EBMT
![Page 35: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/35.jpg)
Pure EBMT
Nadaron en el mar. : Atravesaron el río nadando. :: Flotó en el mar. : ????
???? = Atravesó el río flotando.
![Page 36: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/36.jpg)
1. Find a pair ⟨A,B⟩ of sentences in the example set that satisfies the PA in the equation A : B :: C(?) : It floated across the river. Solving this results in C = It floated in the sea.
2. Take the translations corresponding to A, B and C: A′, B′ and C′.
3. Solve the equation A′ : B′ :: C′ : D′. D′ is the desired translation.
Pure EBMT
![Page 37: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/37.jpg)
This is complex
- O(n²) possibilities for ⟨A,B⟩
- Quadratic time for longest common subsequences and edit distances
- Time-bounded solutions
- Heuristics … (!)
Pure EBMT
![Page 38: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/38.jpg)
Pure EBMT
![Page 39: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/39.jpg)
- analogical equation
  - Given the three entities (A, B, and C) of a PA
  - Lepage (1998): algorithm to solve an analogical equation to construct the fourth entity (D).
- Simple and some good results
- ALEPH EBMT system
  - did very well on data from the IWSLT 2004 competition, coming a close second to the competition winner on all measures (Lepage and Denoual, 2005b, p.273).
- GREYC system
Pure EBMT
![Page 40: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/40.jpg)
- But:
  - Low recall
  - Processing time when example base gets large …
- Yea : Yep :: At five a.m. : At five p.m.
Pure EBMT: Challenges
![Page 41: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/41.jpg)
Pure EBMT: Challenges
![Page 42: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/42.jpg)
Pure EBMT: Challenges
![Page 43: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/43.jpg)
- learns translation templates from parallel sentences
  - (Cicekli and Güvenir, 2001)
Compiled/off-line EBMT
![Page 44: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/44.jpg)
a. I will drink orange juice : portakal suyu içeceğim
b. I will drink coffee : kahve içeceğim
=>
a. I will drink : içeceğim
b. coffee : kahve
c. orange juice : portakal suyu
=>
a. I will drink XS : XT içeceğim
b. XS coffee : kahve XT
c. XS orange juice : portakal suyu XT
Compiled/off-line EBMT
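The first template in the example above can be derived mechanically by comparing the two sentence pairs. This is a minimal sketch, assuming the two sentences differ in exactly one contiguous span on each side and that the differing spans translate each other; the function names `split_on_difference` and `learn_template` are illustrative and do not reproduce the published learning algorithm.

```python
def split_on_difference(s1: str, s2: str):
    """Return (common prefix, differing part of s1, differing part of s2,
    common suffix) at the token level."""
    t1, t2 = s1.split(), s2.split()
    shortest = min(len(t1), len(t2))
    p = 0
    while p < shortest and t1[p] == t2[p]:
        p += 1
    s = 0
    while s < shortest - p and t1[-1 - s] == t2[-1 - s]:
        s += 1
    return t1[:p], t1[p:len(t1) - s], t2[p:len(t2) - s], t1[len(t1) - s:]


def learn_template(pair1, pair2):
    """Learn one translation template and two lexical pairs from two examples."""
    (src1, tgt1), (src2, tgt2) = pair1, pair2
    s_pre, s_d1, s_d2, s_suf = split_on_difference(src1, src2)
    t_pre, t_d1, t_d2, t_suf = split_on_difference(tgt1, tgt2)
    template = (" ".join(s_pre + ["XS"] + s_suf),
                " ".join(t_pre + ["XT"] + t_suf))
    lexical = [(" ".join(s_d1), " ".join(t_d1)),
               (" ".join(s_d2), " ".join(t_d2))]
    return template, lexical


template, lexical = learn_template(
    ("I will drink orange juice", "portakal suyu içeceğim"),
    ("I will drink coffee", "kahve içeceğim"),
)
print(template)
print(lexical)
```

Because the comparison is done independently on the source and target sides, the sketch copes with the different word order of English and Turkish in this example, but it would need a proper alignment step for sentences with several differing spans.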
![Page 45: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/45.jpg)
- Translation templates essentially reduce the data-sparsity problem by generalizing some of the word sequences
- Another example: the work on the marker-based approach we saw earlier
Compiled/off-line EBMT
![Page 46: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/46.jpg)
- Boundary friction
Input: The handsome boy entered the room
Matches:
The handsome boy ate his breakfast. ⇔ Der schöne Junge aß sein Frühstück.
I saw the handsome boy. ⇔ Ich sah den schönen Jungen.
A woman entered the room. ⇔ Eine Frau betrat den Raum.
Output: den schönen Jungen betrat den Raum
- Solutions?
  - Labelled fragments (remember where you got the fragment from and use its context)
  - Target-language grammar
  - Target language model (as in SMT)
Compiled/off-line EBMT
![Page 47: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/47.jpg)
- Dekai Wu
  - MT model space: statistical versus compositional versus example-based machine translation
  - a perspective on EBMT from a statistical MT standpoint
  - What is the definition of EBMT? Do we even know what EBMT is? Is there a strict definition of EBMT, or are there simply a large number of different models all using corpora, rules, and statistics to varying degrees? Is X a kind of EBMT model? Does X’s model qualify as EBMT but not SMT? Are all SMT models (perhaps excluding the IBM models) actually EBMT models as well? Are EBMT models actually SMT models? Can a rule-based model be example-based? Statistical?
  - a three-dimensional “MT model space”
EBMT, SMT, RBMT
![Page 48: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/48.jpg)
EBMT, SMT, RBMT
Dekai Wu
![Page 49: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/49.jpg)
- Nagao (1984) first proposed “translation by analogy” (cf. Lepage and Denoual 2005)
- analogical models arising in the mid-1980s under various similar names including “case-based reasoning” (CBR) as in Kolodner (1983a,b), “exemplar-based reasoning” as in Porter and Bareiss (1986) or Kibler and Aha (1997), “instance-based reasoning” as in Aha et al. (1991), “memory-based reasoning” as in Stanfill and Waltz (1988), or “analogy-based reasoning” as in Hall (1989) or Veloso and Carbonell (1993).
- Collins and Somers (2003)
  - CBR: specific implication about how and when learning and adaptation take place
EBMT, SMT, RBMT
![Page 50: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/50.jpg)
- nontrivial use of a large library of examples/cases/exemplars/instances at runtime
  - that is, during the task performance/testing phase rather than the learning/training phase
- New problems are solved at runtime via analogy to similar examples retrieved from the library, which are broken down, adapted, and recombined as needed to form a solution
- This stands in contrast to most other machine learning approaches which focus on heavy offline learning/training phases, so as to compile or generalize large example sets into abstracted performance models consisting of various forms of abstracted schemata (which are normally much smaller than the entire set of training examples).
EBMT, SMT, RBMT
![Page 51: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/51.jpg)
- Leaning toward memorization rather than abstraction of the training set makes some significant tradeoffs. On one hand, given sufficiently large example libraries, memorization avoids loss of coverage often caused by incorrect generalization or overgeneralization. In the extreme case, memorization approaches are guaranteed to reproduce exactly all unique sentence translations from the training corpus, something abstracted schematic approaches may not necessarily do. On the other hand, memorization approaches tend to undergeneralize, and runtime space and time complexity are vastly increased.
- EBMT: SMT with fairly ad hoc numerical measures …
EBMT, SMT, RBMT
![Page 52: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/52.jpg)
- EBMT and SMT
- Modern EBMT systems incorporate both; for example, Aramaki et al. (2005), Langlais and Gotti (2006), Liu et al. (2006), and Quirk and Menezes (2006) aim for probabilistic formulations of EBMT in terms of statistical inference
- Lots of references in Dekai Wu’s paper
EBMT, SMT, RBMT
![Page 53: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/53.jpg)
EBMT, SMT, RBMT
Dekai Wu
![Page 54: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/54.jpg)
EBMT, SMT, RBMT
Dekai Wu
![Page 55: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/55.jpg)
- Using EBMT chunks (marker hypothesis) in SMT
- Using SMT chunks in EBMT
- Making EBMT more efficient: using IR technology for retrieval
- Combining EBMT with TM
A few other approaches
![Page 56: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/56.jpg)
- A taste of EBMT
- Hard to define what exactly EBMT is and how it relates to SMT, RBMT and TM
- Some core EBMT approaches
  - Runtime, pure
  - Compile time/preprocessing
- Many EBMT approaches seem to be essentially hybrid “in nature/in practice”
- EBMT: quo vadis – where are you going?
Conclusion
![Page 57: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/57.jpg)
Compiled/off-line EBMT
![Page 58: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/58.jpg)
Compiled/off-line EBMT
![Page 59: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/59.jpg)
Compiled/off-line EBMT
![Page 60: 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation](https://reader033.fdocuments.us/reader033/viewer/2022052823/55502570b4c905de2d8b4756/html5/thumbnails/60.jpg)
Compiled/off-line EBMT