Jan 2005 CSA4050 Machine Translation II 1
CSA4050: Advanced Techniques in NLP
Machine Translation II
• Direct MT
• Transfer MT
• Interlingual MT
History – Pre ALPAC
• 1952 – First MT Conference (MIT)
• 1954 – Georgetown System (word-for-word based) successfully translated 49 Russian sentences
• 1954 – 1965 – Much investment into brute force empirical approach – crude word-for-word techniques with limited reshuffling of output
• 1966 – ALPAC (Automatic Language Processing Advisory Committee) Report concludes that research funds should be directed into more fundamental linguistic research
History – Post ALPAC
• 1965-1970
  – Operational Systems approach: SYSTRAN (eventually became the basis for Babelfish)
  – University centres established in Grenoble (CETA), Montreal and Saarbruecken
• 1970-1990 – Systems developed on the basis of linguistic and non-linguistic representations
  – Ariane (Dependency Grammar)
  – TAUM METEO (Metamorphoses Grammars)
  – EUROTRA (multilingual intermediate representations)
  – ROSETTA (Landsbergen): interlingua based
  – BSO (Witkam): Esperanto
• 1990 onwards – Data-Driven Translation Systems
MT Methods
MT
  • Direct MT
  • Rule-Based MT
    – Transfer
    – Interlingua
  • Data-Driven MT
    – EBMT
    – SMT
Basic Architecture: Direct Translation
source text → target text
Basic idea:
– language pair specific
– no intermediate representation
– pipeline architecture
Staged Direct MT (En/Jp)
Direct Translation: Advantages
• Exploits the fact that certain potential ambiguities can be left unresolved, e.g. wall: Wand/Mauer (German), parete/muro (Italian)
• Designers can concentrate more on special cases where languages differ.
• Minimal resources necessary: a cheap bilingual dictionary & rudimentary knowledge of target language suffices.
• Translation memories are a (successful and much used) development of this approach.
Direct Translation: Disadvantages
• Computationally naive
  – Basic model: word-for-word translation + local reordering (e.g. to handle adj+noun order)
• Linguistically naive:
  – no analysis of internal structure of input, esp. wrt the grammatical relationships between the main parts of sentences.
  – no generalisation; everything on a case-by-case basis.
• Generally, poor translation
  – except in simple cases where there is lots of isomorphism between sentences.
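The basic model above (word-for-word lookup plus local adj+noun reordering) can be sketched in a few lines. This is only an illustration: the English→Italian lexicon, the word-class sets and the function name `direct_translate` are assumptions, not part of the original slides.

```python
# Toy English -> Italian lexicon (hypothetical entries, for illustration only).
LEXICON = {
    "the": "il",
    "old": "vecchio",
    "red": "rosso",
    "wall": "muro",
    "book": "libro",
}

# Hand-listed word classes; a real system would use a tagged dictionary.
ADJECTIVES = {"old", "red"}
NOUNS = {"wall", "book"}


def direct_translate(sentence: str) -> str:
    """Word-for-word translation with one local reordering rule."""
    words = sentence.lower().split()

    # Local reordering: English adj+noun becomes noun+adj.
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1

    # Dictionary lookup; unknown words are passed through unchanged.
    return " ".join(LEXICON.get(w, w) for w in reordered)


print(direct_translate("the red book"))  # il libro rosso
```

There is no analysis of sentence structure at all here: every new construction needs another special-case rule, which is the "no generalisation" point made above.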
Transfer Model of MT
• To overcome language differences, first build a more abstract representation of the input.
• The translation process as such (called transfer) operates at the level of this representation.
• This architecture assumes
  – analysis via some kind of parsing process.
  – synthesis via some kind of generation.
Basic Architecture: Transfer Model
source text → [analysis] → source representation → [transfer] → target representation → [generation] → target text
Transfer Rules
In general, there are two kinds of transfer rule:
• Structural Transfer Rules: these deal with differences in the syntactic structures.
• Lexical Transfer Rules: these deal with cross lingual mappings at the level of words and fixed phrases.
Structural Transfer Rule
NP_s(Adj_s, Noun_s) → NP_t(Noun_t, Adj_t)
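A rule of this shape can be applied to a parse tree rather than to a flat word string. The sketch below assumes a very simple tree type; the `Node` class and the functions `transfer_np` and `leaves` are hypothetical names chosen for illustration, and only this one rule is implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Minimal parse-tree node: a label, children, and a word for leaves."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""


def transfer_np(node: Node) -> Node:
    """Apply NP(Adj, Noun) -> NP(Noun, Adj) everywhere in the tree."""
    children = [transfer_np(child) for child in node.children]
    if node.label == "NP" and [c.label for c in children] == ["Adj", "Noun"]:
        children = [children[1], children[0]]
    return Node(node.label, children, node.word)


def leaves(node: Node) -> List[str]:
    """Read the words off the tree from left to right."""
    if not node.children:
        return [node.word]
    words: List[str] = []
    for child in node.children:
        words.extend(leaves(child))
    return words


source = Node("NP", [Node("Adj", word="red"), Node("Noun", word="book")])
print(leaves(transfer_np(source)))  # ['book', 'red']
```

The rule changes structure only; the words are still source-language words until lexical transfer is applied.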
existential-there-sentence: there was an old man gardening
intermediate-representation-1: an old man gardening was
intermediate-representation-2: gardening an old man was
japanese-s: niwa no teire o suru ojiisan ita
Operations applied:
• delete initial there
• make gardening modify NP
• reverse order of NP/modifier
• lexical transfer
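The staged example can be replayed as a sequence of operations on chunks. This is a deliberately naive sketch: the chunking (a three-word NP followed by a modifier), the exact pairing of operations with stages, and the function name `transfer_existential` are assumptions; the phrase lexicon is copied from the example's own output.

```python
from typing import List

# Phrase-level lexicon taken from the example above.
PHRASE_LEXICON = {
    "gardening": "niwa no teire o suru",
    "an old man": "ojiisan",
    "was": "ita",
}


def transfer_existential(sentence: str) -> List[str]:
    """Return the three stages produced from 'there was <NP> <modifier>'."""
    words = sentence.lower().split()
    if len(words) < 6 or words[:2] != ["there", "was"]:
        raise ValueError("expected a sentence of the form 'there was <NP> <modifier>'")

    verb = words[1]
    np = " ".join(words[2:5])        # assumption: the NP is exactly three words
    modifier = " ".join(words[5:])

    # Delete initial 'there'; the modifier attaches to the NP, verb goes last.
    rep1 = [np, modifier, verb]
    # Reverse the order of NP and modifier.
    rep2 = [modifier, np, verb]
    # Lexical transfer, chunk by chunk.
    japanese = [PHRASE_LEXICON[chunk] for chunk in rep2]

    return [" ".join(rep1), " ".join(rep2), " ".join(japanese)]


for stage in transfer_existential("there was an old man gardening"):
    print(stage)
```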
More Structural Transfer Rules
Lexical Transfer
• Easy cases are based on bilingual dictionary lookup.
• Resolution of ambiguities may require further knowledge
  know → savoir
  know → connaître
• Not necessarily word for word
  Schimmel → white horse
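Both cases can be sketched with small lookup functions. The disambiguation cue used here (a following clause-introducing word selects savoir, anything else connaître) is a rough rule of thumb assumed for illustration, not a full account of French usage; the function names and the extra dictionary entry are likewise hypothetical.

```python
from typing import List

# Words that typically introduce a clause after "know" (assumed cue list).
CLAUSE_CUES = {"that", "how", "whether", "why", "where", "when", "what"}


def translate_know(next_word: str) -> str:
    """Choose the French verb for English 'know' from the word that follows it."""
    return "savoir" if next_word.lower() in CLAUSE_CUES else "connaître"


# German -> English entries; one source word may map to several target words.
DE_EN = {
    "schimmel": ["white", "horse"],
    "pferd": ["horse"],
}


def lexical_transfer_de_en(words: List[str]) -> List[str]:
    """Dictionary lookup where the output need not be word for word."""
    out: List[str] = []
    for word in words:
        out.extend(DE_EN.get(word.lower(), [word]))
    return out


print(translate_know("that"))              # savoir
print(translate_know("Mary"))              # connaître
print(lexical_transfer_de_en(["Schimmel"]))  # ['white', 'horse']
```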
Transfer Model
• Degree of generalisation depends upon depth of representation:
  – The deeper the representation, the harder it is to do analysis or generation.
  – The shallower the representation, the larger the transfer component.
• Where does ambiguity get resolved?
• Number of bilingual components can get large.
Interlingual Translation: The Vauquois Triangle
source text → [analysis] → interlingua → [generation] → target text
(vertical axis: increasing depth of representation)
Interlingual Translation
• Transfer model requires different transfer rules for each language pair.
• Much work for multilingual system.
• Interlingual approach eliminates transfer altogether by creating a language independent canonical form known as an interlingua.
• Various logic-based schemes have been used to represent such forms.
• Other approaches include attribute/value matrices called feature structures.
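The cost of a multilingual transfer system can be made concrete by counting modules. The sketch assumes one analysis and one generation module per language, and one transfer module per ordered language pair; the function names are hypothetical.

```python
from typing import Dict


def transfer_system_modules(n_languages: int) -> Dict[str, int]:
    """Modules needed by a transfer system covering every translation direction."""
    return {
        "analysis": n_languages,
        "transfer": n_languages * (n_languages - 1),  # one per ordered pair
        "generation": n_languages,
    }


def interlingua_system_modules(n_languages: int) -> Dict[str, int]:
    """Modules needed by an interlingual system: no transfer component."""
    return {"analysis": n_languages, "generation": n_languages}


for n in (2, 5, 10):
    t = sum(transfer_system_modules(n).values())
    i = sum(interlingua_system_modules(n).values())
    print(f"{n} languages: transfer = {t} modules, interlingua = {i} modules")
```

Under these assumptions the transfer component grows quadratically with the number of languages, while the interlingual design grows linearly.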
Possible Feature Structure for “There was an old man gardening”
event: gardening
agent:
  type: man
  number: sg
  definiteness: indef
aspect: progressive
tense: past
Ontological Issues
• The designer of an interlingua has a very difficult task.
• What is the appropriate inventory of attributes and values?
• Clearly, the choice has radical effects on the ability of the system to translate faithfully.
• For instance, to handle the muro/parete distinction, the internal/external characteristic of the wall would have to be encoded.
Feature Structure for “muro”
word: muro
syntax:
  POS class: noun
  type: count
semantics:
  field: buildings
  type: structural
  position: external
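A feature structure like this can drive lexical choice during generation. In the sketch below the entry for "parete" is an assumption made by analogy with "muro" (same features, with position internal), and `compatible` and `choose_italian` are hypothetical helper names.

```python
from typing import Dict, List

# Semantic features per Italian word. The "parete" entry is assumed by analogy.
ITALIAN_LEXICON: Dict[str, Dict[str, str]] = {
    "muro": {"field": "buildings", "type": "structural", "position": "external"},
    "parete": {"field": "buildings", "type": "structural", "position": "internal"},
}


def compatible(entry: Dict[str, str], concept: Dict[str, str]) -> bool:
    """True if no feature of the entry conflicts with the concept.

    A feature missing from the concept counts as unspecified, not as a conflict.
    """
    return all(concept.get(key, value) == value for key, value in entry.items())


def choose_italian(concept: Dict[str, str]) -> List[str]:
    """Return every Italian word whose features are compatible with the concept."""
    return [word for word, entry in ITALIAN_LEXICON.items() if compatible(entry, concept)]


external_wall = {"field": "buildings", "type": "structural", "position": "external"}
underspecified_wall = {"field": "buildings", "type": "structural"}

print(choose_italian(external_wall))        # ['muro']
print(choose_italian(underspecified_wall))  # ['muro', 'parete']
```

When the interlingua does not record the position feature, both words remain candidates, which is why the choice of attribute inventory matters for faithful translation.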
Interlingual Approach Pros and Cons
• Pros
  – Portable (avoids N² problem)
  – Because representation is normalised, structural transformations are simpler to state.
  – Explanatory adequacy
• Cons
  – Difficult to deal with terms on primitive level: universals?
  – Must decompose and reassemble concepts
  – Useful information lost (paraphrase)
• In practice, works best in small domains.