KantanFest: Andy Way


Transcript of KantanFest: Andy Way

Page 1: KantanFest: Andy Way

The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

Andy Way

ADAPT Centre @ Dublin City University

Separating the Hype from the Reality: Neural Machine Translation

Page 2: KantanFest: Andy Way

www.adaptcentre.ie

The Time for MT is

Page 3: KantanFest: Andy Way

The hype begins …

Page 4: KantanFest: Andy Way


The hype gathers pace …

Thanks to Sheila Castilho

Page 5: KantanFest: Andy Way


And translators go (again) …

Thanks to Sheila Castilho

Page 6: KantanFest: Andy Way

Leads to an “Us vs. Them” Mentality (again)

Page 7: KantanFest: Andy Way

Translators are quite well-disposed towards technology

LeBlanc 2013, 2017

Koskinen & Ruokonen 2017

Thanks to Dorothy Kenny

Page 8: KantanFest: Andy Way

Translators are quite well-disposed towards technology

“The public thinks that technology takes care of translation. Good technology is all about making good translators better”

Jost Zetzsche (@jeromobot)

24th June 2017

Page 9: KantanFest: Andy Way

So, should translators be afraid of NMT?

Page 10: KantanFest: Andy Way

So, should translators be afraid of NMT?

Labradoodles, or Fried Chicken?

Page 11: KantanFest: Andy Way

So, should translators be afraid of NMT?

Page 12: KantanFest: Andy Way

So, should translators be afraid of NMT?

Chihuahuas, or Muffins?

Page 13: KantanFest: Andy Way

MT has been overhyped for years …

Philipp Koehn, Omniscien Webinar 2017

Page 14: KantanFest: Andy Way

So what can we learn from the past?

Why did we start doing SMT (and Hybrid MT)?

We wrote what I thought was a good paper on EBMT and submitted it to ACL, and all three reviewers rejected it. Why?

Because we hadn't compared our results with 'state-of-the-art' SMT!


Page 15: KantanFest: Andy Way

Phrase-Based SMT then came along in earnest

• “Let the data decide”!

Page 16: KantanFest: Andy Way

Phrase-Based SMT then came along in earnest

• “Let the data decide”!

• But what have we spent the last 10 years doing?

Smuggling in Syntax, Semantics, and (lately) Discourse features to break through the glass ceiling.


Page 17: KantanFest: Andy Way

Phrase-Based SMT then came along in earnest

• “Let the data decide”!

• But what have we spent the last 10 years doing?

Smuggling in Syntax, Semantics, and (lately) Discourse features to break through the glass ceiling.


Page 18: KantanFest: Andy Way

SMT & Linguistics

SMT practitioners now recognise the value of linguistic information

cf. Alex Fraser's keynote at EAMT-16:

– agreement phenomena (gender, person, number, case)

– verbal inflection

– compounding

– terminology

– lexical/structural ambiguity

– pronouns ...

Page 19: KantanFest: Andy Way

What’s happened since?

Deep Learning came along and took off! “Let the data decide”! Reviewer comment on our recent (accepted) ACL 2016 paper on SMT:

“You haven't compared your results with 'state-of-the-art' NMT”!

Page 20: KantanFest: Andy Way

This isn’t rocket science!

Page 21: KantanFest: Andy Way

What is the actual situation?

• Wins for NMT for numerous language pairs at IWSLT/WMT 2015 & WMT 2016

• Bentivogli et al. (2016; arXiv and EMNLP)

– IWSLT 2015 English-German: NMT compared to 4 SMT systems

– Automatic Evaluation:

• NMT outperforms the SMT systems in every sentence-length bin, with statistically significant differences

– Human Evaluation:

• NMT makes at least 19% fewer morphology errors than SMT

• NMT makes at least 17% fewer lexical errors than SMT

• NMT translations require about 50% fewer shifts than SMT

• NMT reduces verb order errors by 70% with respect to best SMT system

• NMT reduces noun order errors by 47% with respect to best SMT system

• NMT also gains on prepositions (-18%), negation particles (-17%) and articles (-4%)

• NMT generates outputs that considerably lower the overall post-editing effort w.r.t best SMT system (-26%)
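The percentages above are most naturally read as relative reductions with respect to the SMT baseline: for a given error type, (SMT errors - NMT errors) / SMT errors. The short Python sketch below shows that arithmetic. The function name `relative_reduction` and the error counts are hypothetical, invented for illustration, and are not taken from Bentivogli et al.

```python
def relative_reduction(baseline_errors: float, system_errors: float) -> float:
    """Fraction by which system_errors falls below baseline_errors.

    Positive values mean the system makes fewer errors than the baseline;
    e.g. 0.70 corresponds to "70% fewer errors".
    """
    if baseline_errors <= 0:
        raise ValueError("baseline_errors must be positive")
    return (baseline_errors - system_errors) / baseline_errors


# Hypothetical counts, for illustration only (not the paper's data):
smt_verb_order_errors = 200
nmt_verb_order_errors = 60
print(f"{relative_reduction(smt_verb_order_errors, nmt_verb_order_errors):.0%}")  # prints 70%
```

Because the figure is relative, a large percentage drop can still correspond to a small absolute number of errors when the baseline count is low.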


Page 22: KantanFest: Andy Way

Other Use-Cases

• NMT for E-Commerce

• NMT for Patents

• NMT for MOOCs

[Castilho et al. 2017, EAMT]

• Five other human evaluations of NMT vs. SMT at EAMT 2017 (inc. from …)

Page 23: KantanFest: Andy Way

NMT for E-Commerce

• Translate product listings

• Systems (Calixto et al. 2017, EACL):

• (1) a PBSMT baseline model built with the Moses SMT Toolkit

• (2) a text-only NMT model (NMTt)

• (3) a multi-modal NMT model (NMTm)

• English into German

• Data set: 24k parallel product listings + images

• Validation/test data: 480/444 tuples

• 18 German native speakers

• Ranking

• Translations from the 3 systems + product image

• Adequacy (Likert scale from 1 = All of it to 4 = None of it)

• Source + translation + product image


Page 24: KantanFest: Andy Way

NMT for E-Commerce

• AEM (automatic evaluation metrics):

• PBSMT outperforms both NMT models (BLEU, METEOR and chrF3)

• NMTm performs as well as PBSMT (TER)

• Adequacy

• NMTm performs as well as PBSMT

• Ranking

• PBSMT: 56.3% preferred system

• NMTm: 24.8%

• NMTt: 18.8%
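The automatic metrics on this slide include chrF3, a character n-gram F-score in which recall is weighted β = 3 times as heavily as precision. The Python sketch below is a simplified version under stated assumptions (spaces removed, n-gram orders 1 to 6, precision and recall averaged over the orders present in both strings); it illustrates what the metric measures and will not exactly reproduce the scores of a reference implementation such as sacreBLEU. The names `chrf` and `char_ngrams` are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring spaces (simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 3.0) -> float:
    """Simplified chrF: character n-gram F-beta score; beta=3 gives chrF3."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            # Skip orders longer than either string (a simplification).
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over n-gram orders, then combine them.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # recall counts beta times as much as precision
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    reference = "Das Haus ist klein"
    short_hyp = "Das Haus"  # high precision, low recall
    print(chrf(short_hyp, reference))            # chrF3
    print(chrf(short_hyp, reference, beta=1.0))  # chrF1, for comparison
```

Because recall is weighted more heavily, the truncated hypothesis in the usage example scores lower under chrF3 than under chrF1.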


Page 25: KantanFest: Andy Way

NMT for Patents

• Compare the performance of mature patent MT engines used in production with a new NMT system

• Systems

• PBSMT (a combination of elements of phrase-based, syntactic, and rule-driven MT, along with automatic post-editing)

• NMT (baseline)

• English into Chinese

• Data set: ~1M sentence pairs of chemical abstracts, ~350K chemical titles, ~12M general patent sentence pairs, and ~2K glossaries

• 2 reviewers

• Ranking

• Error analysis

• Punctuation, part of speech, omission, addition, wrong terminology, literal translation, and word form.

Page 26: KantanFest: Andy Way

NMT for Patents

• AEM:

• SMT outperforms NMT for abstracts; NMT outperforms SMT for titles

• Ranking

• General: PBSMT 54% vs. NMT 39%

• Long sentences: PBSMT 58% vs. NMT 33%

• Short sentences: PBSMT 84% vs. NMT 8%

• Medium-length sentences: PBSMT 36% vs. NMT 57%

• Error analysis

• SMT: sentence structure 35% (NMT 10%)

• NMT: omission 37% (SMT 8%)

• Segments with “no errors”: SMT 25% vs. NMT 2%

Page 27: KantanFest: Andy Way

NMT for MOOCs

• Decide which system would provide better-quality translations for the project domain

• Systems

• PBSMT (Moses)

• NMT (baseline)

• English into German, Greek, Portuguese and Russian

• Data set:

• Out-of-domain (OFD): ~24M (DE), ~31M (EL), ~32M (PT), ~22M (RU)

• In-domain: ~270K (DE), ~140K (EL), ~58K (PT), ~2M (RU)

• Ranking

• Post-editing

• Fluency and Adequacy (1-4 Likert scale)

• Error analysis: inflectional morphology, word order, omission, addition, and mistranslation


Page 28: KantanFest: Andy Way

NMT for MOOCs

• AEM:

• NMT outperforms SMT in terms of BLEU and METEOR

• More post-editing (PE) needed for SMT

• Fluency and Adequacy

• NMT is preferred across all languages for Fluency

• Adequacy results are a bit less consistent

Page 29: KantanFest: Andy Way

NMT for MOOCs

Post-editing

Technical effort improved for DE, but only marginally for the other languages

Temporal effort improved only marginally

Ranking

NMT is preferred across all languages (DE 80%, EL 56%, PT 61% and RU 63%)


Page 30: KantanFest: Andy Way

Observations

Page 31: KantanFest: Andy Way

Observations (from an old guy)

Page 32: KantanFest: Andy Way

Observations (from an old guy)

• MT is hard; it’s about as hard a problem as we’ve come up with.

• Just adopting a new paradigm doesn’t make the problems any easier.

Page 33: KantanFest: Andy Way

Observations (from an old guy)

• MT is hard; it’s about as hard a problem as we’ve come up with.

• Just adopting a new paradigm doesn’t make the problems any easier.

• (Some) newcomers to the field will soon find that MT is too hard for them and will disappear …

• The same thing happened with SMT – people came into the field, published an ACL paper with their favourite statistical method and ran off to their next field.

• For them, MT was just another application, whereas some of us have been doing this for half our lives and more!


Page 34: KantanFest: Andy Way

Concluding Remarks

• NMT results are really promising!

• But … human evaluations show that results are not yet so clear-cut

• Especially where data is scarce, NMT hopelessly underperforms SMT

• The translation industry is eager for improved MT quality in order to minimise costs

• The hype around NMT must be treated cautiously; overselling a technology that still needs more research may cause more negativity about MT

Page 35: KantanFest: Andy Way

Food for Thought?

• Imagine NMT really is better than SMT:

– for all domains

– for all language pairs


Page 36: KantanFest: Andy Way

Food for Thought?

• Imagine NMT really is better than SMT:

– for all domains

– for all language pairs

• Is the translation industry set up to provide this technology now?


Page 37: KantanFest: Andy Way

Food for Thought?

• Imagine NMT really is better than SMT:

– for all domains

– for all language pairs

• Is the translation industry set up to provide this technology now?

• If not, what needs to happen? And by when? Who can help?


Page 38: KantanFest: Andy Way

Food for Thought?

• Imagine NMT really is better than SMT:

– for all domains

– for all language pairs

• Is the translation industry set up to provide this technology now?

• If not, what needs to happen? And by when? Who can help?

• Finally: training NMT engines typically takes weeks, rather than the days needed for SMT.

Page 39: KantanFest: Andy Way

Food for Thought?

• Imagine NMT really is better than SMT:

– for all domains

– for all language pairs

• Is the translation industry set up to provide this technology now?

• If not, what needs to happen? And by when? Who can help?

• Finally: training NMT engines typically takes weeks, rather than the days needed for SMT.

– What’s the impact on the climate of all these GPU servers running 24/7?


Page 40: KantanFest: Andy Way


Thanks for listening!