Post on 11-May-2015
Example-Based Machine Translation
Josef van Genabith, CNGL, Dublin City University
Khalil Sima’an, University of Amsterdam
Notes are based on:
• Carl, M. and Way, A., editors (2003). Recent Advances in Example-Based Machine Translation. Kluwer Academic Publishers, Dordrecht, The Netherlands.
• Dandapat, S. (2012). Mitigating the Problems of SMT using EBMT. PhD thesis, Dublin City University.
• Dandapat, S., Morrissey, S., Way, A., and van Genabith, J. (2012). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the EACL 2012 Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), pages 48–58, Avignon, France.
• Gough, N. and Way, A. (2004). Robust Large-Scale EBMT with Marker-Based Segmentation. In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2004), pages 95–104, Baltimore, MD.
• Green, T. (1979). The Necessity of Syntax Markers: Two Experiments with Artificial Languages. Journal of Verbal Learning and Verbal Behavior, 18:481–496.
• Groves, D. and Way, A. (2006). Hybridity in MT: Experiments on the Europarl Corpus. In Proceedings of the 11th Annual Conference of the European Association for Machine Translation (EAMT 2006), pages 115–124, Oslo, Norway.
• Hutchins, J. (2005). Example-Based Machine Translation: a Review and Commentary. Machine Translation, 19(3–4):197–211.
• Lepage, Y. and Denoual, E. (2005c). The ‘Purest’ EBMT System Ever Built: No Variables, No Templates, No Training, Examples, Just Examples, Only Examples. In Proceedings of the 2nd Workshop on Example-Based Machine Translation at MT Summit X, pages 81–90, Phuket, Thailand.
• Nagao, M. (1984). A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. In Elithorn, A. and Banerji, R., editors, Artificial and Human Intelligence, pages 173–180. North-Holland, Amsterdam.
• Somers, H., Dandapat, S., and Naskar, S. K. (2009). A Review of EBMT Using Proportional Analogy. In Proceedings of the 3rd Workshop on Example-Based Machine Translation (EBMT 2009), pages 53–60, Dublin, Ireland.
• Wu, D. (2005). MT Model Space: Statistical versus Compositional versus Example-Based Machine Translation. Machine Translation, 19(3–4):213–227.
Acknowledgements
Input: He buys a book on international politics
Matches + Alignment:
• He buys a notebook. ⇔ Kare wa nōto o kau.
• I read a book on international politics. ⇔ Watashi wa kokusai seiji nitsuite kakareta hon o yomu.
Recombination Result: Kare wa [kokusai seiji nitsuite kakareta hon] o kau.
Example (Sato & Nagao 1990)
From: Sandipan Dandapat, PhD thesis, DCU, 2012
• “. . . translation is a fine and exciting art, but there is much about it that is mechanical and routine.” Martin Kay (1997)
• SMT and EBMT systems are corpus-based approaches to MT
• SMT: phrase translation probabilities, word reordering probabilities, lexical weighting
• EBMT: usually lacks a well-defined probability model
EBMT
• EBMT generally uses a sentence-aligned parallel text (TM) as the primary source of data.
• EBMT systems search the source side of the example base for close matches to the input sentence and obtain the corresponding target segments at runtime.
• These target segments are reused during recombination.
• EBMT is often linked with the related concept of “Translation Memory” (TM).
• TM is an interactive tool for human translators; EBMT is fully automatic translation.
EBMT and TM
• EBMT is supposed to do well on limited amounts of homogeneous data (lots of repetition)
• EBMT systems sometimes produce a good translation where SMT systems fail, and vice versa
EBMT
• The phrase-based SMT approach has proven to be the most successful MT approach in MT competitions, e.g. NIST, WMT, IWSLT, etc.
• SMT systems discard the actual training data once the translation model and language model have been estimated
• ⇒ they cannot always guarantee good-quality translations even for sentences which closely match those in the training corpora
EBMT and SMT
• Two main approaches to EBMT:
• runtime EBMT using proportional analogy
• compile-time EBMT using a generalized translation-template-based model
EBMT
• MT is either rule-based or data-driven
• Data-driven MT: EBMT and SMT
• Corpus-based, data-driven approaches derive knowledge from parallel corpora to translate new input
• Mostly SMT today
• A few EBMT (hybrid) systems remain, including CMU-EBMT (Brown, 2011) and Cunei (Phillips, 2011)
• No commercial EBMT?
Background
• Nagao (1984): “MT by the analogy principle”
“Man does not translate a simple sentence by doing deep linguistic analysis, rather, man does translation, first, by properly decomposing an input sentence into certain fragmental phrases, ... then by translating these phrases into other language phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase will be done by the analogy translation principle with proper examples as its reference.” (Nagao, 1984, p.178)
Where did it start?
• Translation in three steps: matching, alignment and recombination (Somers, 2003):
• Matching: finds the example or set of examples from the bitext which most closely match the source-language string to be translated.
• Alignment: extracts the source–target translation equivalents from the retrieved examples of the matching step.
• Recombination: produces the final translation by combining the target translations of the relevant subsentential fragments.
EBMT core steps
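As a concrete sketch of the matching step, retrieval can be driven by any string-similarity measure. Below is a minimal Python illustration over a toy example base built from the Sato and Nagao example seen earlier; the data layout and function name are our own, not taken from any cited system.

```python
import difflib

# Toy sentence-aligned example base (after Sato and Nagao 1990);
# romanised Japanese targets, illustrative only.
EXAMPLE_BASE = [
    ("He buys a notebook", "Kare wa nōto o kau"),
    ("I read a book on international politics",
     "Watashi wa kokusai seiji nitsuite kakareta hon o yomu"),
]

def matching(query, base):
    """Matching step: rank examples by word-level similarity of
    their source sides to the input sentence."""
    def sim(src):
        return difflib.SequenceMatcher(None, src.split(), query.split()).ratio()
    return sorted(base, key=lambda ex: sim(ex[0]), reverse=True)

ranked = matching("He buys a book on international politics", EXAMPLE_BASE)
# Alignment would then link "a book on international politics" to
# "kokusai seiji nitsuite kakareta hon", and recombination would splice
# that fragment into the skeleton "Kare wa ... o kau".
```

Both examples are close matches here; alignment and recombination then operate on the retrieved pairs, which is where the real difficulty lies.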
Sandipan Dandapat, PhD thesis, 2012
• Input: Where can I find tourist information
• Example base:
  Where can I find ladies dresses ⇔ bayan kıyafetlerini nereden bulabilirim
  just in front of the tourist information ⇔ turist bilgilerini hemen önünde
• Matched fragments:
  Where can I find ⇔ nereden bulabilirim
  tourist information ⇔ turist bilgilerini
• Recombination: Where can I find tourist information ⇔ turist bilgilerini nereden bulabilirim
• Update the example base …
Informal Example
• EBMT systems differ widely in their matching stages
• most involve a distance or similarity measure of some kind (e.g. edit distance)
Character-Based Matching
• dynamic programming technique, e.g. Levenshtein distance
a. The President agrees with the decision.
b. The President disagrees with the decision.
c. The President concurs with the decision.
• Problem: the system will choose (b) given (a)
Varieties of EBMT
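A character-level matcher of this kind fits in a few lines of Python; this is our own illustration (`best_match` is a hypothetical helper, not part of any cited system):

```python
def levenshtein(a, b):
    """Edit distance between strings a and b by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def best_match(query, example_base):
    """Retrieve the example closest to the query under edit distance."""
    return min(example_base, key=lambda ex: levenshtein(query, ex))
```

Given sentence (a) as the query and (b), (c) as the example base, `best_match` indeed returns (b): inserting the three characters of “dis” costs less than rewriting “agrees” into “concurs”, which is exactly the problem the slide points out.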
Word-Based Matching
• Nagao (1984)
• uses dictionaries and thesauri to determine the relative word distance in terms of meaning (Sumita et al., 1990)
a. The President agrees with the decision.
b. The President disagrees with the decision.
c. The President concurs with the decision.
• The system will choose (c) given (a).
• But linguistics/knowledge heavy: WordNet, thesaurus, ontology, …
Varieties of EBMT
Pattern-Based Matching
• similar examples can be used to abstract “generalised” translation templates
• Brown (1999):
• NE equivalence classes, such as person, date and city
• some linguistic information, such as gender and number
a. John Miller flew to Frankfurt on December 3rd.
b. ⟨FIRSTNAME-M⟩ ⟨LASTNAME⟩ flew to ⟨CITY⟩ on ⟨MONTH⟩ ⟨ORDINAL⟩.
c. ⟨PERSON-M⟩ flew to ⟨CITY⟩ on ⟨MONTH⟩ ⟨ORDINAL⟩ = ⟨PERSON-M⟩ flew to ⟨CITY⟩ on ⟨DATE⟩.
Varieties of EBMT
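The generalisation from (a) to a template like (b) can be approximated with simple gazetteer substitution. The class lists below are made-up stand-ins; Brown (1999) derives such equivalence classes from much larger resources.

```python
import re

# Hypothetical gazetteers standing in for real NE equivalence classes.
CLASSES = {
    "FIRSTNAME-M": ["John", "Paul"],
    "LASTNAME": ["Miller", "Smith"],
    "CITY": ["Frankfurt", "Dublin"],
    "MONTH": ["December", "January"],
    "ORDINAL": ["1st", "2nd", "3rd"],
}

def generalise(sentence):
    """Replace known entity strings with their class placeholders,
    turning a sentence into a reusable translation template."""
    for label, words in CLASSES.items():
        for w in words:
            sentence = re.sub(r"\b%s\b" % re.escape(w),
                              "<%s>" % label, sentence)
    return sentence
```

Two sentences that map to the same template can then be matched against each other even when their named entities differ.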
Syntax-Based Matching
• Kaji et al. (1992):
• source- and target-side parsers
• alignment using bilingual dictionaries
• generate translation templates
Varieties of EBMT
• X1[NP] no nagasa wa saidai 512 baito de aru ⇔ The maximum length of X1[NP] is 512 bytes
• X1[NP] no nagasa wa saidai X2[N] baito de aru ⇔ The maximum length of X1[NP] is X2[N] bytes
Looks almost like hierarchical phrase-based SMT (HPB-SMT) …
Varieties of EBMT
Marker-Based Matching
• Green (1979): closed-class marker words/morphemes
• used to chunk the input
• Veale and Way (1997), Gough and Way (2004)
Varieties of EBMT
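A minimal sketch of marker-based chunking, loosely in the spirit of Gough and Way (2004): a new chunk opens at each marker word, with the constraint that a chunk must contain at least one non-marker word before the next chunk may open. The tiny marker lexicon below is our own toy, not their actual list.

```python
# Toy closed-class marker lexicon: determiners, prepositions, pronouns.
MARKERS = {
    "the": "DET", "a": "DET", "this": "DET", "that": "DET",
    "for": "PREP", "in": "PREP", "on": "PREP",
    "me": "PRON", "you": "PRON",
}

def marker_chunks(tokens):
    """Segment a token list at marker words; a new chunk may only open
    once the current chunk holds at least one non-marker word."""
    chunks, has_content = [], False
    for tok in tokens:
        if tok.lower() in MARKERS and (not chunks or has_content):
            chunks.append([tok])          # marker opens a new chunk
            has_content = False
        else:
            if not chunks:
                chunks.append([])
            chunks[-1].append(tok)
            if tok.lower() not in MARKERS:
                has_content = True
    return [" ".join(c) for c in chunks]
```

On “that is almost a personal record for me this autumn” this reproduces the three-chunk segmentation of the English side of the example below; note how “me” and “this” stay inside the “for” chunk because no non-marker word has been seen yet.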
that is almost a personal record for me this autumn ⇔ c’ est pratiquement un record personnel pour moi cet automne
[<DET> that is almost] [<DET> a personal record] [<PREP> for <PRON> me <DET> this autumn] ⇔ [<DET> c’ est pratiquement] [<DET> un record personnel] [<PREP> pour <PRON> moi <DET> cet automne]
Varieties of EBMT
Chunk pairs:
a. <DET> that is almost ⇔ <DET> c’ est pratiquement
b. <DET> a personal record ⇔ <DET> un record personnel
c. <PREP> for me this autumn ⇔ <PREP> pour moi cet automne
Generalised marker templates (the marker word itself abstracted away):
a. <DET> is almost ⇔ <DET> est pratiquement
b. <DET> personal record ⇔ <DET> record personnel
c. <PREP> me this autumn ⇔ <PREP> moi cet automne
Varieties of EBMT
a. <PREP> for ⇔ <PREP> pour
b. autumn ⇔ automne
Varieties of EBMT
The monkey ate a peach. ⇔ saru wa momo o tabeta.
The man ate a peach. ⇔ hito wa momo o tabeta.
  monkey ⇔ saru
  man ⇔ hito
  The … ate a peach. ⇔ … wa momo o tabeta.
The dog ate a rabbit. ⇔ inu wa usagi o tabeta.
  dog ⇔ inu
  rabbit ⇔ usagi
  The … ate a … . ⇔ … wa … o tabeta.
Varieties of EBMT
• EBMT was first introduced as an analogy-based approach to MT
• also called “case-based”, “memory-based” and “experience-guided” MT
• Many, many varieties …
• Two main approaches, with or without a preprocessing/training stage:
• pure/runtime EBMT vs. compiled EBMT
Approaches to EBMT
Pure/runtime EBMT:
• e.g. Lepage and Denoual (2005b)
• no time consumed for training/preprocessing
• but: runtime/translation complexity is very considerable …
Compiled approaches:
• e.g. Al-Adhaileh and Tang (1999); Cicekli and Güvenir (2001)
• pre-compute units below sentence level before prediction/translation time
Approaches to EBMT
• Everything happens at the translation stage
• Lepage and Denoual (2005c)
• Based on proportional analogy (PA), a type of analogical learning
• A : B :: C : D, read “A is to B as C is to D”
Pure/runtime EBMT
• A : B :: C : D, “A is to B as C is to D”
• A global relationship between 4 objects
• “::” plays a role analogous to “=”
• “Analogical equation”: A : B :: C : D?
• Can have one or more solutions
• An old idea: Plato, Aristotle, …, taken up in Artificial Intelligence
Analogical Reasoning
a. lungs are to humans as gills are to fish
b. cat : kitten :: dog : puppy
c. speak : spoken :: break : broken
Analogies
a. lungs are to humans as gills are to X? X = fish
b. cat : kitten :: dog : X? X = puppy
c. speak : spoken :: break : X? X = broken
Note: only (c) is a formal analogy, computable using string operations. Those are the ones we will be concerned with most (but not exclusively).
Analogies
• Lepage (1998): an algorithm that solves analogical equations over strings of characters
• based on longest common subsequences and edit distance
• can handle:
• insertion/deletion of prefixes and suffixes (a)
• exchange of prefixes/suffixes (b)
• infixing (c)
• parallel infixing (d)
Solving Analogical Equations
a. (French) répression : répressionnaire :: réaction : x ⇒ x = réactionnaire
b. wolf : wolves :: leaf : x ⇒ x = leaves
c. (German) fliehen : floh :: schließen : x ⇒ x = schloß
d. (Proto-Semitic) yasriqu : sariq :: yanqimu : x ⇒ x = naqim
Solving Analogical Equations
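Lepage’s full algorithm is considerably more general, but the prefix/suffix-exchange cases like (a) and (b) can be solved with a short sketch. This is entirely our own simplification, not Lepage’s method:

```python
def _lcp_len(x, y):
    """Length of the longest common prefix of x and y."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n

def solve(a, b, c):
    """Solve a : b :: c : x for simple prefix/suffix alternations;
    returns None when neither rule applies."""
    # Rule 1: a and b share a prefix; transfer the suffix change to c.
    p = _lcp_len(a, b)
    sa, sb = a[p:], b[p:]
    if sa == "" or c.endswith(sa):
        return (c[: len(c) - len(sa)] if sa else c) + sb
    # Rule 2: a and b share a suffix; transfer the prefix change to c.
    s = _lcp_len(a[::-1], b[::-1])
    pa, pb = a[: len(a) - s], b[: len(b) - s]
    if c.startswith(pa):
        return pb + c[len(pa):]
    return None
```

This handles (a) and (b), and even speak : spoken :: break : broken (the shared prefix is “sp”, so the suffix alternation “eak” → “oken” transfers to “break”). The infixing cases (c) and (d) need the full edit-distance-based machinery.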
Analogy-Based EBMT
• “The ‘purest’ EBMT system ever built: no variables, no templates, no training, examples, just examples, only examples”
• (Lepage and Denoual 2005c)
Pure EBMT
Nadaron en el mar. : Atravesaron el río nadando. :: Flotó en el mar. : ????
⇒ Atravesó el río flotando.
(Gloss: “They swam in the sea.” : “They swam across the river.” :: “It floated in the sea.” : “It floated across the river.”)
1. Find a pair ⟨A, B⟩ of sentences in the example set that satisfies the PA equation A : B :: C(?) : “It floated across the river”. Solving this results in C = “It floated in the sea” (C must itself occur in the example base, so that its translation is known).
2. Take the translations corresponding to A, B and C: A′, B′ and C′.
3. Solve the equation A′ : B′ :: C′ : D′(?). The solution D′ is the desired translation.
Pure EBMT
This is complex:
• O(n²) possibilities for the pair ⟨A, B⟩
• quadratic time for longest common subsequences and edit distances
• time-bounded solutions
• heuristics … (!)
Pure EBMT
• Analogical equation: given the three entities (A, B and C) of a PA, Lepage (1998) gives an algorithm that solves the equation, constructing the fourth entity (D)
• Simple, and some good results
• The ALEPH EBMT system did very well on data from the IWSLT 2004 competition, coming a close second to the competition winner on all measures (Lepage and Denoual, 2005b, p. 273)
• The GREYC system
Pure EBMT
• But:
• low recall
• processing time when the example base gets large …
• spurious formal analogies, e.g. Yea : Yep :: At five a.m. : At five p.m. (formally solvable as a string analogy, but the two pairs are unrelated in meaning)
Pure EBMT: Challenges
• learns translation templates from parallel sentences (Cicekli and Güvenir, 2001)
Compiled/off-line EBMT
a. I will drink orange juice ⇔ portakal suyu içeceğim
b. I will drink coffee ⇔ kahve içeceğim
⇒
a. I will drink ⇔ içeceğim
b. coffee ⇔ kahve
c. orange juice ⇔ portakal suyu
⇒
a. I will drink XS ⇔ XT içeceğim
b. XS coffee ⇔ kahve XT
c. XS orange juice ⇔ portakal suyu XT
Compiled/off-line EBMT
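The generalisation above can be computed mechanically by diffing the two pairs. A minimal sketch (function names ours), assuming the two sentences differ in exactly one contiguous segment on each side; the slide’s XS/XT source and target variables are collapsed into a single X here:

```python
def _split(a, b):
    """Shared prefix/suffix and differing middles of two token lists."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    j = 0
    while j < min(len(a), len(b)) - i and a[-1 - j] == b[-1 - j]:
        j += 1
    return a[:i], a[i:len(a) - j], b[i:len(b) - j], a[len(a) - j:]

def learn(pair1, pair2):
    """From two translation pairs differing in one contiguous segment,
    induce a generalised template plus the two lexical fillers."""
    s1, t1 = (x.split() for x in pair1)
    s2, t2 = (x.split() for x in pair2)
    sp, sd1, sd2, ss = _split(s1, s2)   # source side
    tp, td1, td2, ts = _split(t1, t2)   # target side
    template = (" ".join(sp + ["X"] + ss), " ".join(tp + ["X"] + ts))
    fillers = [(" ".join(sd1), " ".join(td1)),
               (" ".join(sd2), " ".join(td2))]
    return template, fillers
```

Applied to the two Turkish pairs above, this yields the template “I will drink X” ⇔ “X içeceğim” together with the lexical alignments “orange juice” ⇔ “portakal suyu” and “coffee” ⇔ “kahve”.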
• Translation templates essentially reduce the data-sparsity problem by generalizing some of the word sequences
• Another example: the marker-based approach we saw earlier
Compiled/off-line EBMT
• Boundary friction
Input: The handsome boy entered the room
Matches:
  The handsome boy ate his breakfast. ⇔ Der schöne Junge aß sein Frühstück.
  I saw the handsome boy. ⇔ Ich sah den schönen Jungen.
  A woman entered the room. ⇔ Eine Frau betrat den Raum.
Output: *den schönen Jungen betrat den Raum (the accusative “den schönen Jungen” was lifted from object position and is wrong as a subject)
• Solutions?
• Labelled fragments (remember where you got the fragment from and use its context)
• A target-language grammar
• A target language model (as in SMT)
Compiled/off-line EBMT
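The target-language-model solution can be illustrated with a toy add-one-smoothed bigram model trained on just the target sides of the example base. The data and numbers are purely illustrative, but the mechanism is the same one SMT uses at scale:

```python
import math
from collections import Counter

# Target sides of the example base (lowercased).
TARGETS = [
    "der schöne junge aß sein frühstück",
    "ich sah den schönen jungen",
    "eine frau betrat den raum",
]

bigrams, histories, vocab = Counter(), Counter(), set()
for sent in TARGETS:
    toks = ["<s>"] + sent.split() + ["</s>"]
    vocab.update(toks[1:])
    for w1, w2 in zip(toks, toks[1:]):
        bigrams[(w1, w2)] += 1
        histories[w1] += 1

def logprob(sentence):
    """Add-one-smoothed bigram log-probability of a candidate string."""
    toks = ["<s>"] + sentence.lower().split() + ["</s>"]
    V = len(vocab)
    return sum(math.log((bigrams[(w1, w2)] + 1) / (histories[w1] + V))
               for w1, w2 in zip(toks, toks[1:]))
```

Even this tiny model prefers the grammatical recombination “der schöne Junge betrat den Raum” over the boundary-friction output “den schönen Jungen betrat den Raum”, because it has only ever seen “der” in sentence-initial position.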
• Dekai Wu: “MT Model Space: Statistical versus Compositional versus Example-Based Machine Translation”
• a perspective on EBMT from a statistical MT standpoint
• What is the definition of EBMT? Do we even know what EBMT is? Is there a strict definition of EBMT, or are there simply a large number of different models all using corpora, rules, and statistics to varying degrees? Is X a kind of EBMT model? Does X’s model qualify as EBMT but not SMT? Are all SMT models (perhaps excluding the IBM models) actually EBMT models as well? Are EBMT models actually SMT models? Can a rule-based model be example-based? Statistical?
• a three-dimensional “MT model space”
EBMT, SMT, RBMT
• Nagao (1984) first proposed “translation by analogy” (cf. Lepage and Denoual 2005)
• Analogical models arose in the mid-1980s under various similar names, including “case-based reasoning” (CBR) as in Kolodner (1983a,b), “exemplar-based reasoning” as in Porter and Bareiss (1986) or Kibler and Aha (1997), “instance-based reasoning” as in Aha et al. (1991), “memory-based reasoning” as in Stanfill and Waltz (1988), and “analogy-based reasoning” as in Hall (1989) or Veloso and Carbonell (1993).
• Collins and Somers (2003)
• CBR carries a specific implication about how and when learning and adaptation take place
EBMT, SMT, RBMT
• Nontrivial use of a large library of examples/cases/exemplars/instances at runtime
• that is, during the task performance/testing phase rather than the learning/training phase
• New problems are solved at runtime via analogy to similar examples retrieved from the library, which are broken down, adapted, and recombined as needed to form a solution
• This stands in contrast to most other machine learning approaches, which focus on heavy offline learning/training phases that compile or generalize large example sets into abstracted performance models consisting of various forms of abstracted schemata (normally much smaller than the entire set of training examples)
EBMT, SMT, RBMT
� Leaning toward memorization rather than abstraction of the training set makes some significant tradeoffs. On one hand, given sufficiently large example libraries, memorization avoids loss of coverage often caused by incorrect generalization or overgeneralization. In the extreme case, memorization approaches are guaranteed to reproduce exactly all unique sentence translations from the training corpus, something abstracted schematic approaches may not necessarily do. On the other hand, memorization approaches tend to undergeneralize, and runtime space and time complexity are vastly increased.
• EBMT: SMT with fairly ad hoc numerical measures …
EBMT, SMT, RBMT
• EBMT and SMT are converging
• Modern EBMT systems incorporate both; for example, Aramaki et al. (2005), Langlais and Gotti (2006), Liu et al. (2006), and Quirk and Menezes (2006) aim for probabilistic formulations of EBMT in terms of statistical inference
� Lots of references in Dekai’s paper
EBMT, SMT, RBMT
• Using EBMT chunks (marker hypothesis) in SMT
• Using SMT chunks in EBMT
• Making EBMT more efficient: using IR technology for retrieval
• Combining EBMT with TM
A few other approaches
• A taste of EBMT
• It is hard to define what exactly EBMT is and how it relates to SMT, RBMT and TM
• Some core EBMT approaches:
• runtime/pure
• compile-time/preprocessing
• Many EBMT approaches are essentially hybrid in nature/in practice
• EBMT: quo vadis, where are you going?
Conclusion