Example-based Machine Translation
The other corpus-based approach to MT
2/23
Example-based Machine Translation
Historically predates SMT (just about)
At first seen as a rival approach
Now almost marginalised …
… despite (because of?) some convergence
In this talk I will
Explain basic ideas and problems
Point to differences and similarities between EBMT and SMT
3/23
Example-based MT
Long-established approach to empirical MT
First developed in contrast with rule-based MT
Idea of translation by analogy (Nagao 1984)
Translate by adapting previously seen examples rather than by linguistic rule
“Existing translations contain more solutions to more translation problems than any other available resource.” (P. Isabelle et al., TMI, Kyoto, 1993)
In computational terms, belongs in family of Case-based reasoning approaches
4/23
EBMT basic idea
database of translation pairs
match input against example database (like Translation Memory)
identify corresponding translation fragments (align)
recombine fragments into target text
5/23
Example (Sato & Nagao 1990)
Input
He buys a book on international politics.
Matches
He buys a notebook. Kare wa nōto o kau.
I read a book on international politics. Watashi wa kokusai seiji nitsuite kakareta hon o yomu.
Result
Kare wa kokusai seiji nitsuite kakareta hon o kau.
6/23
A bit less hand-waving
The simple example hides some problems, but some differences from SMT are already apparent
If the input already appeared in the bitext, system is guaranteed to produce an exact (correct) translation (assuming no contradictory examples)
If the input is only slightly different from the example, there’s a pretty good chance that the translation will be OK
These are both properties of Translation Memories
In its purest form, there is no preprocessing of the corpus in EBMT: everything is done at run time
7/23
Matching the input
In principle, the simplest part of the process:
Levenshtein distance for simple string match
Can be enhanced by annotating the examples with linguistic knowledge (POS tags, semantic info, structural representations) to improve accuracy and flexibility
Some approaches suggest generalizing example pairs
You end up with something which looks like RBMT transfer rules
Example generalization is done off-line
Using “rules” that express linguistic knowledge
Or more automatically by merging similar examples
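The simple string match mentioned above can be sketched as a word-level Levenshtein distance. This is a minimal illustration, not any particular system's matcher: the function names and the tiny example list are assumptions, and the two example pairs are borrowed from the Sato & Nagao slide.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def best_match(sentence, examples):
    """Return the (source, target) pair whose source side is closest to the input."""
    tokens = sentence.lower().split()
    return min(examples, key=lambda pair: levenshtein(tokens, pair[0].lower().split()))


# Hypothetical two-entry example database.
examples = [
    ("He buys a notebook", "Kare wa nōto o kau"),
    ("I read a book on international politics",
     "Watashi wa kokusai seiji nitsuite kakareta hon o yomu"),
]
closest = best_match("He buys a book on international politics", examples)
```

With these two examples the second pair is closer (two substitutions, against one substitution plus three deletions for the first). A pure distance ranking therefore returns one example, whereas the slide's translation draws on both, which is why matching is usually followed by fragment alignment.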
8/23
Generalization using knowledge
John Miller flew to Frankfurt on December 3rd.
John Miller ist am 3. Dezember nach Frankfurt geflogen.
<1stname> <lastname> flew to <city> on <month> <ord>.
<1stname> <lastname> ist am <num>. <month> nach <city> geflogen.
<person-m> flew to <city> on <date>.
<person-m> ist am <date> nach <city> geflogen.
Dr Howard Johnson flew to Ithaca on 7 April 1997.
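One way to picture how a generalized pair is applied to a new input such as the sentence above is a slot-matching sketch. The helper names are hypothetical, and slot contents are copied over verbatim; a real system would also translate them (for instance, rendering the date in German format).

```python
import re

SLOT = re.compile(r"<([\w-]+)>")


def template_to_regex(template):
    """Turn a template with <slot> markers into a regex with one named group per slot."""
    parts, pos = [], 0
    for m in SLOT.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))
        name = m.group(1).replace("-", "_")  # regex group names cannot contain '-'
        parts.append(f"(?P<{name}>.+?)")
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def apply_template(sentence, src_template, tgt_template):
    """Fill the target template with the slot values found in the input, or return None."""
    m = template_to_regex(src_template).match(sentence)
    if m is None:
        return None
    return SLOT.sub(lambda s: m.group(s.group(1).replace("-", "_")), tgt_template)


output = apply_template(
    "Dr Howard Johnson flew to Ithaca on 7 April 1997.",
    "<person-m> flew to <city> on <date>.",
    "<person-m> ist am <date> nach <city> geflogen.",
)
```

Here `output` is "Dr Howard Johnson ist am 7 April 1997 nach Ithaca geflogen.", with the slot fillers still untranslated.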
9/23
Generalization by analogy – an exercise
The monkey ate a peach. saru wa momo o tabeta.
The man ate a peach. hito wa momo o tabeta.
monkey saru
man hito
The … ate a peach. … wa momo o tabeta.
The dog ate a rabbit. inu wa usagi o tabeta.
dog inu
rabbit usagi
The … ate a … . … wa … o tabeta.
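The first step of this exercise, merging two examples that differ in a single word, can be sketched as follows. This is a toy illustration under strong assumptions: whitespace tokenization, equal sentence lengths, and exactly one differing token on each side. The function name is hypothetical, and the second step of the exercise (two slots) is not covered.

```python
def generalize(pair_a, pair_b, slot="…"):
    """Merge two example pairs that differ in exactly one token on each side.

    Returns (source_template, target_template, lexicon), or None if the
    pairs do not fit that pattern.
    """
    (src_a, tgt_a), (src_b, tgt_b) = pair_a, pair_b

    def single_difference(x, y):
        x, y = x.split(), y.split()
        if len(x) != len(y):
            return None
        diffs = [i for i, (p, q) in enumerate(zip(x, y)) if p != q]
        if len(diffs) != 1:
            return None
        i = diffs[0]
        template = " ".join(x[:i] + [slot] + x[i + 1:])
        return template, x[i], y[i]

    src = single_difference(src_a, src_b)
    tgt = single_difference(tgt_a, tgt_b)
    if src is None or tgt is None:
        return None
    # The differing source words are paired with the differing target words.
    lexicon = {src[1]: tgt[1], src[2]: tgt[2]}
    return src[0], tgt[0], lexicon


result = generalize(
    ("The monkey ate a peach", "saru wa momo o tabeta"),
    ("The man ate a peach", "hito wa momo o tabeta"),
)
```

For the monkey/man pair this yields the templates "The … ate a peach" and "… wa momo o tabeta" together with the word pairs monkey/saru and man/hito.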
10/23
Alignment
Taking the input and the closely-matching example and deciding which fragments of the translation can be reused or need to be changed
Input: The operation was interrupted because the Listening key was pressed.
Matches:
The operation failed because the print key was pressed.
L’opération a échoué car la touche d’impression a été enfoncée.
11/23
Alignment – how is this done?
Dictionary look-up
Comparison of multiple examples
12/23
Alignment – Comparison of multiple examples
Comparison of multiple examples to distinguish alternatives, using semantic similarity (Nagao 1984)
Input
He eats potatoes.
Matches
A man eats vegetables. Hito wa yasai o taberu.
Acid eats metal. San wa kinzoku o okasu.
Result
Kare wa jagaimo o taberu. ☺
13/23
Alignment – Comparison of multiple examples
Comparison of multiple examples to distinguish alternatives, using semantic similarity (Nagao 1984)
Input
He eats potatoes.
Sulphuric acid eats iron.
Matches
A man eats vegetables. Hito wa yasai o taberu.
Acid eats metal. San wa kinzoku o okasu.
Result
Kare wa jagaimo o taberu.
Ryūsan wa tetsu o okasu.
☺
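The choice between taberu and okasu can be sketched with a toy thesaurus. Everything here is an assumption made for illustration: the category paths are invented, "sulphuric acid" is reduced to "acid", and similarity is simply the number of shared leading categories rather than the measure used in Nagao's work.

```python
# Invented thesaurus: each word maps to a path from general to specific categories.
THESAURUS = {
    "potatoes":   ("thing", "substance", "food", "vegetable"),
    "vegetables": ("thing", "substance", "food", "vegetable"),
    "metal":      ("thing", "substance", "material", "metal"),
    "iron":       ("thing", "substance", "material", "metal"),
    "he":         ("thing", "being", "human"),
    "man":        ("thing", "being", "human"),
    "acid":       ("thing", "substance", "chemical"),
}


def similarity(w1, w2):
    """Number of leading thesaurus categories the two words share."""
    shared = 0
    for a, b in zip(THESAURUS[w1], THESAURUS[w2]):
        if a != b:
            break
        shared += 1
    return shared


# (subject, object, Japanese verb) taken from the two "eats" examples.
EXAMPLES = [
    ("man", "vegetables", "taberu"),
    ("acid", "metal", "okasu"),
]


def translate_eats(subject, obj):
    """Pick the verb of the example whose arguments are most similar to the input's."""
    best = max(EXAMPLES,
               key=lambda ex: similarity(subject, ex[0]) + similarity(obj, ex[1]))
    return best[2]
```

Under this toy thesaurus, `translate_eats("he", "potatoes")` selects taberu and `translate_eats("acid", "iron")` selects okasu, matching the two results on the slide.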
14/23
Alignment – how is this done?
Dictionary look-up
Comparison of multiple examples
Precomputed as in SMT: using word-alignment model
15/23
Phrase alignment
Granularity of fragments is a problem
Too small = too general when it comes to recombination
(You wouldn’t dream of translating by looking up each individual word in a dictionary and pasting it into position)
Too big = sparse, and difficult to recombine
Working at an intermediate level seems attractive:
Phrase-based chunking
Also found in SMT
One fairly successful approach (at DCU) has been …
16/23
Marker-based chunking
Most languages have a set of “marker words” (Green 1979) – roughly speaking, closed-class words
Marker words can be used to distinguish chunks
Start a new phrase every time you come across a marker word
Except that each phrase must contain at least one non-marker word
these limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a residential environment .
<D> these limits are designed <P> to provide reasonable protection <P> against harmful interference <WH> when <D> the equipment is operated <P> in <D> a residential environment .
<D> these limits are designed <P> to provide reasonable protection <P> against harmful interference <WH> when the equipment is operated <P> in a residential environment .
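The two chunking rules can be sketched in a few lines. The marker lexicon below is a small hypothetical subset covering only the words in the example sentence; a real system would use full closed-class word lists per language, and the function name is an assumption.

```python
# Hypothetical marker lexicon: word -> marker tag.
MARKERS = {
    "these": "D", "the": "D", "a": "D",
    "to": "P", "against": "P", "in": "P",
    "when": "WH",
}


def marker_chunks(sentence):
    """Split a sentence into chunks that start at marker words.

    A marker word opens a new chunk only if the current chunk already
    contains at least one non-marker word.
    """
    chunks = []  # list of (tag, tokens)
    tag, tokens, has_content = None, [], False
    for word in sentence.split():
        marker = MARKERS.get(word.lower())
        if marker and has_content:
            # Close the current chunk and start a new one at this marker.
            chunks.append((tag, tokens))
            tag, tokens, has_content = marker, [], False
        elif marker and not tokens:
            # Sentence-initial marker: it labels the first chunk.
            tag = marker
        tokens.append(word)
        if not marker:
            has_content = True
    if tokens:
        chunks.append((tag, tokens))
    return chunks


chunks = marker_chunks(
    "these limits are designed to provide reasonable protection against "
    "harmful interference when the equipment is operated in a residential environment ."
)
```

For the example sentence this gives five chunks tagged D, P, P, WH, P: "when the" and "in a" stay inside a single chunk because a chunk cannot consist of marker words alone. A sentence that starts with a non-marker word gets a first chunk with no tag (`None`).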
17/23
Chunk alignment
Align by finding similar pairs of chunks in other examples
No need for chunks to align 1:1, …
… nor follow the same sequence
Markers can help, but don’t have to
1<D> these limits are designed 2<P> to provide reasonable protection 3<P> against harmful interference 4<WH> when the equipment is operated 5<P> in a residential environment .
1<D> ces limites sont destinées 2<CONJ> à assurer 3<D> une protection raisonnable 4<P> contre les interférences 5<CONJ> lorsque le matériel est utilisé 6<P> dans un environnement résidentiel .
1<NULL> consult 2<D> the dealer 3<CONJ> or an experienced radio/TV technician 4<P> for help .
1<P> en cas 2<D> de besoin , 3<PRON> se adresser 4<CONJ> à un technicien radio 5<CONJ> ou TV qualifié .
18/23
Recombination
Having identified target-language fragments, how do we put them together?
Depends how examples are stored
Templates with labelled slots
<person-m> flew to <city> on <date> .
Tree structures
Kanojo wa kami ga nagai. SHE (topic) HAIR (subj) IS-LONG. She has long hair.
Source tree: nagai [kanojo (wa), kami (ga)]
Target tree: have [she (subj), hair (obj) [long (mod)]]
Kare wa me ga aoi. He has blue eyes.
Source tree: aoi [kare, me]
Target tree: have [he, eyes [blue]]
19/23
Recombination – a problem
Consider again:
Input
He buys a book on politics.
Matches
He buys a notebook. Kare wa nōto o kau.
I read a book on politics. Watashi wa seiji nitsuite kakareta hon o yomu.
He buys a pen. Kare wa pen o kau.
She wrote a book on politics. Kanojo wa seiji nitsuite kakareta hon o kaita.
Result
Kare wa … o kau. (two candidate fragments)
… wa seiji nitsuite kakareta hon o … (two candidate fragments)
20/23
Recombination – another problem
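Boundary friction
Input: The handsome boy entered the room.
Matches:
The handsome boy ate his breakfast. Der schöne Junge aß sein Frühstück.
I saw the handsome boy. Ich sah den schönen Jungen.
Both matches supply a translation of “the handsome boy”, but only the nominative Der schöne Junge fits the subject position of the input; the accusative den schönen Jungen does not.
Solutions?
Labelled fragments (remember where you got the fragment from – use its context)
Target-language grammar
Target language model (as in SMT)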
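The "labelled fragments" idea can be sketched as follows: each stored fragment remembers the context it came from, and recombination only reuses a fragment in a matching context. The role labels ("subject", "object") and the function name are hypothetical annotations chosen for this illustration; the slides do not specify what the labels look like.

```python
# Each fragment keeps a label describing its role in the example it was taken from.
FRAGMENTS = [
    {"source": "the handsome boy", "target": "Der schöne Junge", "role": "subject"},
    {"source": "the handsome boy", "target": "den schönen Jungen", "role": "object"},
]


def pick_fragment(source_phrase, role, fragments=FRAGMENTS):
    """Return the target fragment whose source matches and whose label fits the slot."""
    for frag in fragments:
        if frag["source"] == source_phrase.lower() and frag["role"] == role:
            return frag["target"]
    return None  # no fragment was seen in this context


subject_np = pick_fragment("The handsome boy", "subject")
```

For the input on this slide the phrase is the subject, so the nominative fragment from the first match is selected and the accusative one from the second match is ignored.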
21/23
EBMT and SMT hybrids
Recombination is like decoding
Matching/alignment phases have produced a bag of fragments that now need to be recombined to form a grammatical target sentence
Essentially the same task as is found in SMT decoding
Doesn’t matter what the source of the fragments is
Similarly, one could imagine an SMT translation model taking ideas from EBMT matching/alignment
22/23
So are EBMT and SMT the same?
Use of a bitext as the fundamental data source
Empirical rather than rationalist: the system learns from data rather than relying on rules written by a human (linguist)
From which it follows (in principle) that systems can be improved mainly by getting more data
And it is hoped that new language-pairs can be developed “just” by finding suitable parallel corpus data
Some things in common which distinguish them from Rule-based MT
23/23
So are EBMT and SMT the same?
SMT essentially uses statistical data (parameters, probabilities) derived from the bitext
Preprocessing the data is essential
Even if the input is in the training data, you are not guaranteed to get the same translation
EBMT uses the bitext as its primary data source
Preprocessing the data is optional
If the input is in the example set, you are guaranteed to get the same translation
It may be merely dogmatic to insist, but there are some definitional differences