Machine Translation MÖSG vt 2004
@Anna Sågvall Hein, MÖSG 2004
Can computers translate?
Not a simple yes or no
• depends on the text
• the purpose of the translation
• the required quality
Classical problems with MT
• unrealistic expectations
• bad translations
• difficulties in integrating MT in the work flow
– the Ericsson case
What is MT proper?
To be considered as MT, a system should
• provide minimally correct morphology
• provide minimal syntactic processing
• provide minimal semantic processing
• handle and produce full sentences
Hutchins, J., 2000, The IAMT Certification initiative and defining translation system categories (http://ourworld.compuserve.com/homepages/WJHutchins/IAMTcert.htm)
Basic translation strategies
• direct translation
• transfer-based translation
• statistical translation
• combined strategies
Direct translation, 1
• no intermediary sentence structure
• the most important language component is a translation dictionary
• translation proceeds mostly word by word, or phrase by phrase
• translation problems are handled more or less case by case by means of specific rules
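A minimal sketch of the dictionary-driven core of direct translation, assuming a toy Swedish-English dictionary whose entries are invented for illustration. It translates left to right and prefers the longest dictionary phrase at each position, which is one simple way to realise "word by word, or phrase by phrase"; the function name `translate_direct` is hypothetical.

```python
# Toy Swedish-English dictionary keyed by word sequences.
# All entries are illustrative assumptions, not taken from a real system.
DICTIONARY = {
    ("detta",): "this",
    ("filter",): "filter",
    ("ska",): "must",
    ("bytas",): "be renewed",
    ("med", "jämna", "mellanrum"): "at regular intervals",
    ("med",): "with",
}
MAX_PHRASE_LEN = max(len(key) for key in DICTIONARY)


def translate_direct(words):
    """Translate left to right, preferring the longest dictionary phrase."""
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in DICTIONARY:
                output.append(DICTIONARY[key])
                i += n
                break
        else:
            # No entry found: copy the unknown word unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate_direct("Detta filter ska bytas med jämna mellanrum".split()))
# → this filter must be renewed at regular intervals
```

There is no sentence structure here: the phrase entry for "med jämna mellanrum" is a case-by-case fix, and without it the sketch would fall back to translating "med" as "with".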
Direct translation, 2
• quality
– typically browsing quality
– depends on
  • the quality of the translation dictionary
  • the coverage of the translation rules
– editing quality may be achieved
• problems with
– ambiguity
– inflection
– word order
– structural differences
Advanced classical approach (Tucker 1987)
• source text dictionary lookups and morphological analysis
• identification of homographs
• identification of compounds
• identification of noun and verb phrases
• processing of idioms
Advanced approach, cont.
• processing of prepositions
• subject-predicate identification
• syntactic ambiguity identification
• synthesis and morphological processing of target text
• rearrangement of words and phrases in target text
Feasibility of the direct translation strategy
Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a sentence grammar and an intermediary structure?
SYSTRAN
SYStem TRANslation
• developed in the US by Peter Toma
• first version 1969 (Ru-En)
• EC bought the rights to Systran in 1976
• Systran SA, France, is the current owner of the rights to Systran
• currently 18 language pairs, excl. Swedish
• Swedish to English is being introduced, starting in June 2004 (http://babelfish.altavista.com/)
Systran, cont.
• more than 1,600,000 dictionary units
• 20 domain dictionaries
• daily use by EC translators and administrators of the European institutions
• originally a direct translation strategy
– see H&S
• today more of a transfer-based strategy
Ex. 1: fairly good translation / Systran sv-en
"Enskilda företagare som inte bildat bolag klassificeras hit."
"Individual entrepreneurs that have not formed companies are classified here."
The system has recognised bildat as a perfect form and translates the tense correctly as have formed, with the negation not in the right place.
Ex. 2: word order problem / Systran sv-en
"När byarna kontaktades hade de inte ens utsatts för influensa."
"When the villages were contacted had they not even been exposed to flu."
The system has not found the subject and the predicate and therefore produces the wrong word order.
Ex. 3: ambiguity problem / Systran sv-en
"Vad kan vi lära av Arrawetestammen?"
"What can we faith of the Arawete?"
The system does not find the connection between kan and lära and therefore does not see that lära is a verb.
Ex. 4: ambiguity problem / Systran sv-en
"Extrapoleringen går till så här."
"The extrapolation goes to so here."
The system does not know the particle verb gå till and therefore translates incorrectly, word for word.
Motivations for transfer-based translation
• lexical ambiguity
• structural differences
See further Ingo 91 (6), Wikholm (89)
Transfer-based translation, 1
• intermediary sentence structure
• provides a basis for the systematic handling of grammatical problems and lexical choices
• basic processes
– analysis
– transfer
– generation (synthesis)
Transfer-based translation, 2
• knowledge-intensive
• language modules
– dictionary and grammar of source language
– transfer dictionary and transfer rules
– dictionary and grammar of target language
Multra
• transfer-based translation engine
• high quality
• focus on restricted domains
• developed at Uppsala University
Multra formalisms
intermediary structure
– feature structure
  • grammatical function & constituency
analysis grammar
– procedural
transfer
– unification based (Beskow 93)
synthesis
– PATR-like style (Beskow 93)
Simplistic approach
• sentence splitting
• tokenisation
• handling capital letters
• dictionary look-up and lexical substitution
• copying unknown words, digits, signs of punctuation etc.
• formal editing
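The steps of the simplistic approach can be sketched as a small pipeline. This is only an illustration under stated assumptions: the lexicon entries, the regular expressions, and all function names are hypothetical, and "formal editing" is reduced here to fixing spacing before punctuation and restoring the sentence-initial capital.

```python
import re

# Toy lexicon; the entries are assumptions made for this sketch.
LEXICON = {
    "fyll": "fill",
    "på": "on",
    "olja": "oil",
    "i": "in",
    "växellådan": "the gearbox",
}


def split_sentences(text):
    """Sentence splitting: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Tokenisation: words and digits, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def substitute(token):
    """Dictionary look-up on the lower-cased token; copy it if unknown."""
    return LEXICON.get(token.lower(), token)


def formal_edit(tokens):
    """Formal editing: tidy punctuation spacing and capitalise the sentence."""
    text = " ".join(tokens)
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text[:1].upper() + text[1:]


def translate_simplistic(text):
    sentences = split_sentences(text)
    return " ".join(
        formal_edit([substitute(t) for t in tokenise(s)]) for s in sentences
    )


print(translate_simplistic("Fyll på olja i växellådan. Byt filter 2."))
# → Fill on oil in the gearbox. Byt filter 2.
```

The output shows both the strength and the limit of the approach: unknown words and digits pass through untouched, while "fyll på" is rendered word by word because no structure is available.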
Ex. 1: Multra
Sv. I oljefilterhållaren sitter en överströmningsventil.
En. The oil filter retainer has an overflow valve.
(from the Scania corpus)
sitter – has
adv – subj
subj – obj
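The shift of grammatical functions in this example (the Swedish adverbial becomes the English subject, the Swedish subject becomes the English object) can be sketched as a transfer rule over a feature structure. The sketch below uses plain Python dictionaries rather than Multra's own formalisms; the hand-built analysis, the feature names, and the functions `transfer_sitta` and `generate` are all assumptions made for illustration.

```python
# Hand-built analysis of "I oljefilterhållaren sitter en överströmningsventil."
# A real system would produce this by parsing; here it is simply assumed.
SOURCE = {
    "pred": "sitta",
    "adv": {"head": "oljefilterhållare", "definite": True},
    "subj": {"head": "överströmningsventil", "definite": False},
}

# Toy transfer dictionary for the two nouns in the example.
TRANSFER_LEXICON = {
    "oljefilterhållare": "oil filter retainer",
    "överströmningsventil": "overflow valve",
}


def transfer_sitta(source):
    """Structural transfer: sitta(adv, subj) becomes have(subj, obj)."""
    if source["pred"] != "sitta" or "adv" not in source:
        raise ValueError("rule does not apply")

    def transfer_np(phrase):
        return {
            "head": TRANSFER_LEXICON[phrase["head"]],
            "definite": phrase["definite"],
        }

    return {
        "pred": "have",
        "subj": transfer_np(source["adv"]),
        "obj": transfer_np(source["subj"]),
    }


def generate(target):
    """Generation: linearise subject, verb, object (3rd person singular only)."""

    def noun_phrase(phrase):
        if phrase["definite"]:
            article = "the"
        else:
            article = "an" if phrase["head"][0] in "aeiou" else "a"
        return f"{article} {phrase['head']}"

    verb = {"have": "has"}[target["pred"]]
    sentence = f"{noun_phrase(target['subj'])} {verb} {noun_phrase(target['obj'])}."
    return sentence[0].upper() + sentence[1:]


print(generate(transfer_sitta(SOURCE)))
# → The oil filter retainer has an overflow valve.
```

Because the rule operates on grammatical functions rather than on word positions, the same rule would apply however the source sentence orders its constituents.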
Ex. 2
Sv. Fyll på olja i växellådan.
En. Fill gearbox with oil.
(from the Scania corpus)
fyll på – fill
obj – adv
adv – obj
Ex. 3: Multra
Detta filter ska bytas med jämna mellanrum.
This filter must be renewed at regular intervals.
Lexical choices in the context
ska – must
byta – renew
med – at
jämna – regular
mellanrum – interval
Ex. 4: Multra
Beskrivningen gäller för automatväxellådor med beteckning ZF 4/HP500, 590 och 600.
The description applies to automatic gearboxes with the designations ZF 4/5HP500, 590 and 600.
gäller – applies to
beteckning – the designations
Feasibility of machine translation
• Re-use of translations
• Quality in relation to purpose
• Sublanguage
• Spell checked and grammar checked SL
• Controlled language
• Human machine interaction
• Evaluation data and criteria
Re-use of previous translations
• translation memories
• translation dictionaries
• statistical machine translation
Re-use techniques,1
• sentence alignment
– linking source and target sentences pairwise
– success rate close to 100 %
– translation memories
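Once sentences are aligned pairwise, the pairs can be stored and looked up as a translation memory. The following is a minimal sketch, assuming a two-entry memory and a character-based similarity score from Python's standard `difflib`; the threshold of 0.8 and the function name `lookup` are arbitrary choices for the illustration, not properties of any particular tool.

```python
from difflib import SequenceMatcher

# Aligned sentence pairs serving as a tiny translation memory.
MEMORY = [
    ("Fyll på olja i växellådan.", "Fill gearbox with oil."),
    ("Detta filter ska bytas med jämna mellanrum.",
     "This filter must be renewed at regular intervals."),
]


def lookup(sentence, memory=MEMORY, threshold=0.8):
    """Return (target, score) for the most similar stored source, or None."""
    best_score, best_target = 0.0, None
    for source, target in memory:
        # Similarity in [0, 1]; 1.0 means the strings are identical.
        score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    if best_target is None or best_score < threshold:
        return None
    return best_target, best_score


print(lookup("Fyll på olja i växellådan."))
# → ('Fill gearbox with oil.', 1.0)
```

An exact match returns the stored translation as is; a near match (for example the same sentence without the final full stop) still clears the threshold and would be offered to the translator for editing.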
Re-use techniques, 2
• word alignment
– linking sub-sentence segments, typically, source and target words and phrases pairwise
– large-scale processing
– success rate close to 80 %
– translation dictionaries
– statistical machine translation
A word alignment example
Jag tar mittplatsen, som jag inte tycker om.
I take the middle seat, which I dislike.
jag – I
tar – take
mittplatsen – the middle seat
som – which
jag – I
inte tycker om – dislike
(from Tiedemann 2003)
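Word links like these are the raw material for a translation dictionary. A minimal sketch, assuming the links have already been produced by an aligner: it counts how often each source unit is linked to each target unit and turns the counts into relative frequencies. The function name `raw_dictionary` is hypothetical, and the extra links for "som" in the usage example are invented to show an ambiguous entry.

```python
from collections import Counter, defaultdict

# The links from the example sentence pair.
LINKS = [
    ("jag", "I"),
    ("tar", "take"),
    ("mittplatsen", "the middle seat"),
    ("som", "which"),
    ("jag", "I"),
    ("inte tycker om", "dislike"),
]


def raw_dictionary(links):
    """Map each source unit to its targets with relative frequencies."""
    counts = defaultdict(Counter)
    for source, target in links:
        counts[source][target] += 1
    dictionary = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        # most_common() lists the most frequent target first.
        dictionary[source] = [(t, n / total) for t, n in targets.most_common()]
    return dictionary


entries = raw_dictionary(LINKS)
print(entries["jag"])             # → [('I', 1.0)]
print(entries["inte tycker om"])  # → [('dislike', 1.0)]

# Hypothetical additional links, only to show a competing translation.
more = raw_dictionary(LINKS + [("som", "as"), ("som", "which")])
print([target for target, _ in more["som"]])  # → ['which', 'as']
```

With large-scale alignment the same counting yields the raw, unedited dictionary on which statistical translation builds.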
Statistical machine translation
• large scale word alignment
– raw translation dictionary
• direct translation using the dictionary
– no translation rules
• smoothing the translation by means of a language model
– statistically based
• decoding algorithm crucial
• Arabic – English
• Hindi – English
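The slide gives no formula, but the interplay of the three components is commonly written in the noisy-channel form below. The symbols are assumptions introduced here: f is the source sentence, e ranges over candidate target sentences, P(f | e) is the translation model estimated from the word-aligned data, and P(e) is the language model.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The second equality follows from Bayes' rule, since P(f) is constant for a given source sentence. The arg max is the job of the decoding algorithm: the space of candidate sentences is far too large to enumerate, which is why the choice of decoder is crucial.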
Quality
• publishing quality
– high quality translation, good enough for publishing, typically, after inspection and minor editing
• browsing quality
– low quality translation, comprehensible, typically, not good enough for editing and publishing, may contain grammatical errors, errors in word order, and wrong words
Translation purposes
• translation
– publishing quality
• browsing
– browsing quality
• gisting
– browsing quality
• drafting
– publishing/browsing quality?
• cross-language information retrieval
– browsing quality
MT as a cross-language communication tool
MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001)
Hutchins, J., 2001, Towards a new vision for MT, Introductory speech at MT Summit VIII conference, 18-22 September 2001
(http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.htm)
Restrictions on the input language
– sublanguage
  • text type
  • domain
– controlled language
– spell checked
– grammar checked
Typically
• general language – browsing quality
• restricted language – high quality
Spell checking and grammar checking
• If there are spelling errors or typos in the SL, dictionary search will fail
• If there are grammatical errors in the SL, grammatical analysis will fail
Where and how should spell and grammar checking be accounted for? Before or during the process?
Controlled language
controlled vocabulary
– full lexical coverage, e.g. Scania Swedish
controlled grammar
– full grammatical coverage
language checker
– e.g. Scania Checker
Human intervention
before
– language checking
during
– e.g. ambiguity resolution
after
– post-editing
Evaluation of MT
• coverage (recall)
• quality (precision)
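One possible reading of the two measures, offered as an assumption rather than as the lecture's definition: coverage is the share of source segments for which the system produces a translation at all, and quality is the share of the produced translations that a human judge accepts. The record layout and the function name `evaluate` are hypothetical.

```python
def evaluate(segments):
    """Return (coverage, quality) for a list of judged segments.

    Each segment is a dict with two booleans:
      'translated' - the system produced an output for the segment
      'acceptable' - a human judged that output to be acceptable
    """
    total = len(segments)
    translated = [s for s in segments if s["translated"]]
    # Coverage (recall): translated segments out of all segments.
    coverage = len(translated) / total if total else 0.0
    # Quality (precision): acceptable outputs out of translated segments.
    accepted = sum(1 for s in translated if s["acceptable"])
    quality = accepted / len(translated) if translated else 0.0
    return coverage, quality


judged = [
    {"translated": True, "acceptable": True},
    {"translated": True, "acceptable": True},
    {"translated": True, "acceptable": False},
    {"translated": False, "acceptable": False},
]
coverage, quality = evaluate(judged)
print(coverage)  # → 0.75
```

Reporting the two figures separately keeps a system that translates little but well apart from one that translates everything badly.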
Current trends in direct translation
re-use of translations
– translation memories of sentences and sub-sentence units such as words, phrases and larger units
– example-based translation
– statistical translation
Will re-use of translations overcome the problems with the direct translation approach that were discussed above?
If so, how can the problems be handled?
Why machine translation?
• cheaper
• faster
• more consistent
• when it succeeds ..
Assignment: Hable Con Ella (en-sv)
• Make a general quality assessment of the translation.
• Suggest a possible use of a translation of this kind.
• Identify the steps that were taken in the translation.
• Specify the translation errors that were made and discuss them.
• Suggest improvements in the framework of the direct translation strategy.
• Justify them.
• Formalise them in a framework of your own choice.
• Discuss their general adequacy in the translation of Swedish to English.