KantanFest: Andy Way
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
Andy Way
ADAPT Centre @ Dublin City University
Separating the Hype from the Reality: Neural Machine Translation
www.adaptcentre.ie
The Time for MT is
The hype begins …
The hype gathers pace …
Thanks to Sheila Castilho
And translators go (again) …
Thanks to Sheila Castilho
Leads to Us vs. Them Mentality (again)
Translators are quite well-disposed towards technology
LeBlanc 2013, 2017
Koskinen & Ruokonen 2017
Thanks to Dorothy Kenny
“The public thinks that technology takes care of translation. Good technology is all about making good translators better”
Jost Zetzsche (@jeromobot)
24th June 2017
So, should translators be afraid of NMT?
Labradoodles, or Fried Chicken?
Chihuahuas, or Muffins?
MT has been overhyped for years …
Philipp Koehn, Omniscien Webinar 2017
So what can we learn from the past?
Why did we start doing SMT (and Hybrid MT)?
We wrote what I thought was a good paper on EBMT, submitted it to ACL, and were rejected by all three reviewers. Why?
Because we hadn't compared our results with 'state-of-the-art' SMT!
Phrase-Based SMT then came along in earnest
• “Let the data decide”!
• But what have we spent the last 10 years doing?
Smuggling in Syntax, Semantics, and (lately) Discourse features to break through the glass ceiling.
SMT & Linguistics
SMT practitioners now know the value of linguistic information
cf. Alex Fraser's keynote at EAMT-16: agreement phenomena (gender, person, number, case),
verbal inflection,
compounding,
terminology,
lexical/structural ambiguity,
pronouns ...
What’s happened since?
Deep learning came along and took off! “Let the data decide”!
Reviewer comment on a recent (accepted) ACL 2016 paper on SMT:
“you haven't compared your results with 'state-of-the-art' NMT”!
This isn’t rocket science!
What is the actual situation?
• Wins for NMT for numerous language pairs at IWSLT/WMT 2015 & WMT 2016
• Bentivogli et al. (2016, arXiv; EMNLP)
– IWSLT 2015 English-German: NMT compared to 4 SMT systems
– Automatic Evaluation:
• NMT outperforms the SMT systems in every length bin, with statistically significant differences
– Human Evaluation:
• NMT makes at least 19% fewer morphology errors than SMT
• NMT makes at least 17% fewer lexical errors than SMT
• NMT translations require about 50% fewer shifts than SMT
• NMT reduces verb order errors by 70% with respect to best SMT system
• NMT reduces noun order errors by 47% with respect to best SMT system
• NMT also improves on prepositions (-18%), negation particles (-17%) and articles (-4%)
• NMT generates outputs that considerably lower the overall post-editing effort w.r.t. the best SMT system (-26%)
Other Use Cases
• NMT for E-Commerce
• NMT for Patents
• NMT for MOOCs
[Castilho et al. 2017, EAMT]
• Five other human evaluations of NMT vs. SMT at EAMT 2017 (inc. from …)
NMT for E-Commerce
• Translate product listings
• Systems (Calixto et al. 2017, EACL):
• (1) a PBSMT baseline model built with the Moses SMT Toolkit
• (2) a text-only NMT model (NMTt)
• (3) a multi-modal NMT model (NMTm)
• English into German
• Data set: 24k parallel product listings + images
• Validation/test data: 480/444 tuples
• 18 German native speakers
• Ranking
• Translations from the 3 systems + product image
• Adequacy (Likert scale from 1 = All of it to 4 = None of it)
• Source + translation + product image
NMT for E-Commerce
• AEM (automatic evaluation metrics):
• PBSMT outperforms both NMT models (BLEU, METEOR and chrF3)
• NMTm performs as well as PBSMT (TER)
• Adequacy
• NMTm performs as well as PBSMT
• Ranking
• PBSMT: 56.3% preferred system
• NMTm: 24.8%
• NMTt: 18.8%
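The automatic scores above include chrF3 alongside BLEU, METEOR and TER. As a rough illustration of what chrF3 rewards, here is a minimal Python sketch of a character n-gram F-score with β = 3, so that recall counts three times as much as precision. It assumes n-gram orders 1 to 6, spaces removed, and precision and recall averaged over the orders before they are combined; the function names `char_ngrams` and `chrf` are hypothetical. Toolkit implementations (e.g. sacreBLEU) differ in whitespace handling and averaging, so this sketch will not reproduce the figures reported in the study.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the character n-grams of `text`, ignoring spaces (an assumption of this sketch)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 3.0) -> float:
    """Hypothetical chrF-style score in [0, 100]; beta=3 gives chrF3."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over the n-gram orders that were scored.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: recall is treated as beta times as important as precision.
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(round(chrf("abcd", "ab"), 1))  # 87.7: extra characters, mild penalty
print(round(chrf("ab", "abcd"), 1))  # 44.2: missing characters, heavy penalty
```

Because β = 3, output that covers the reference but adds material scores much higher than output that omits material, which is one reason chrF3 can rank systems differently from precision-oriented BLEU.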
NMT for Patents
• Compare the performance of mature patent MT engines used in production with a new NMT system
• Systems
• PBSMT (a combination of elements of phrase-based, syntactic, and rule-driven MT, along with automatic post-editing)
• NMT (baseline)
• English into Chinese
• Data set: chemical abstracts (~1M sentence pairs), chemical titles (~350K), general patent data (~12M) and glossaries (~2K)
• 2 reviewers
• Ranking
• Error analysis
• Punctuation, part of speech, omission, addition, wrong terminology, literal translation, and word form.
NMT for Patents
• AEM:
• SMT outperforms NMT for abstracts, NMT outperforms SMT for titles
• Ranking
• General: PBSMT 54% vs. NMT 39%
• Long sentences: PBSMT 58% vs. NMT 33%
• Short sentences: PBSMT 84% vs. NMT 8%
• Medium-length sentences: PBSMT 36% vs. NMT 57%
• Error analysis
• SMT: sentence structure 35% (NMT 10%)
• NMT: omission 37% (SMT 8%)
• Segments with “no errors”: SMT 25% vs. NMT 2%
NMT for MOOCs
• Decide which system would provide better-quality translations for the project domain
• Systems
• PBSMT (Moses)
• NMT (baseline)
• English into German, Greek, Portuguese and Russian
• Data set:
• OFD : ~24M (DE), ~31M (EL), ~32M (PT), ~22M (RU)
• In-domain : ~270K (DE), ~140K (EL), ~58K (PT), ~2M (RU)
• Ranking
• Post-editing
• Fluency and Adequacy (1-4 Likert scale)
• Error analysis: inflectional morphology, word order, omission, addition, and mistranslation
NMT for MOOCs
• AEM:
• NMT outperforms SMT in terms of BLEU and METEOR
• More post-editing (PE) needed for SMT
• Fluency and Adequacy
• NMT is preferred across all languages for Fluency
• Adequacy results are a bit less consistent
NMT for MOOCs
Post-editing
Technical effort improved for DE, but only marginally for the other languages
Temporal effort marginally improved
Ranking
NMT is preferred across all languages (DE 80%, EL 56%, PT 61% and RU 63%)
Observations (from an old guy)
• MT is hard; it’s about as hard a problem as we’ve come up with.
• Adopting a new paradigm doesn’t make the problems any easier.
• (Some) newcomers to the field will soon find that MT is too hard for them and will disappear …
• The same thing happened with SMT – people came into the field, published an ACL paper with their favourite statistical method and ran off to their next field.
• For them, MT was just another application, whereas some of us have been doing this for half our lives and more!
Concluding Remarks
• NMT results are really promising!
• But … human evaluations show that results are not yet so clear-cut
• Especially where data is scarce, NMT hopelessly underperforms SMT
• The translation industry is eager for improved MT quality in order to minimise costs
• The hype around NMT must be treated cautiously; overselling a technology that is still in need of more research may cause more negativity about MT
Food for Thought?
• Imagine NMT really is better than SMT:
– for all domains
– for all language pairs
• Is the translation industry set up to provide this technology now?
• If not, what needs to happen? And by when? Who can help?
• Finally: training an NMT engine typically takes weeks, compared with days for SMT.
– What’s the impact on the climate of all these GPU servers running 24/7?
Thanks for listening!