Kantanfest: Dimitar Shterionov - Part 1

KantanNeural™ from A to Z 1/3: To NMT or not to NMT? Dimitar Shterionov

Transcript of Kantanfest: Dimitar Shterionov - Part 1

Page 1: Kantanfest: Dimitar Shterionov - Part 1

KantanNeural™ from A to Z 1/3: To NMT or not to NMT?

Dimitar Shterionov

Page 2: Kantanfest: Dimitar Shterionov - Part 1

The Rise of MT

[Figure: Quality of MT over time. x-axis: Time, marked at 1954, 1966, 1970, 1982, 1993, 2003, 2005, 2016, 2020; y-axis: Relative quality.]

31/07/2017 KantanFest, Dublin, Ireland 2

Page 3: Kantanfest: Dimitar Shterionov - Part 1

Breakthrough in NeuralMT


Page 4: Kantanfest: Dimitar Shterionov - Part 1

Yet another MT paradigm?


Page 5: Kantanfest: Dimitar Shterionov - Part 1

Yet another MT paradigm?

Which technique is faster?
Which technique is better?
How can I integrate NMT in my pipeline?
How can I compare PBSMT and NMT?
How can I improve my NMT engine?
When to use PBSMT and when NMT?


Page 6: Kantanfest: Dimitar Shterionov - Part 1

Yet another MT paradigm?


Is NMT better than PBSMT???

Page 7: Kantanfest: Dimitar Shterionov - Part 1

Yet another MT paradigm?


Can NMT be better than PBSMT???

Page 8: Kantanfest: Dimitar Shterionov - Part 1

Various empirical evaluations (since 2015)


Scientific Rigour – NMT vs PBSMT

Page 9: Kantanfest: Dimitar Shterionov - Part 1


Experiment Setup

Identical Training, Test and Tune Data

NMT training limited to 4 days

Evaluation: Automated Scores: F-Measure, TER, BLEU

Ranking with KantanLQR™, A/B Testing

Publications and Presentations

EAMT 2017

MT Summit 2017

LocWorld34 NMT GALA Track

Scientific Rigour – NMT vs PBSMT

Page 10: Kantanfest: Dimitar Shterionov - Part 1


A small parenthesis… There are so many factors

Learning algorithm and rate

Number of epochs

ANN properties

Data – preprocessing, segmentation

You need the right data!

Scientific Rigour – NMT vs PBSMT

Page 11: Kantanfest: Dimitar Shterionov - Part 1


Training: Identical Corpora

Language Arc | Parallel Sentences | TWC | UWC | Domain(s)
English->German | 8,820,562 | 110,150,238 | 859,167 | Legal/Medical
English->Chinese (Simplified) | 6,522,064 | 84,426,931 | 956,864 | Legal/Technical
English->Japanese | 8,545,366 | 87,252,129 | 676,244 | Legal/Technical
English->Italian | 2,756,185 | 35,295,535 | 765,930 | Medical
English->Spanish | 3,681,332 | 44,917,538 | 952,089 | Legal

Page 12: Kantanfest: Dimitar Shterionov - Part 1


Language Arc | SMT F-Measure | SMT BLEU | SMT TER | SMT Time | NMT F-Measure | NMT BLEU | NMT TER | NMT Perplexity | NMT Time
English->German | 62.00% | 54.08% | 54.31% | 18h | 62.53% | 47.53% | 53.41% | 3.02 | 92h
English->Chinese (Simplified) | 77.16% | 45.36% | 46.85% | 6h | 71.85% | 39.39% | 47.01% | 2.00 | 10h
English->Japanese | 80.04% | 63.27% | 43.77% | 9h | 69.51% | 40.55% | 49.46% | 1.89 | 68h
English->Italian | 69.74% | 56.98% | 42.54% | 8h | 64.88% | 42.00% | 48.73% | 2.70 | 83h
English->Spanish | 71.53% | 54.78% | 41.87% | 9h | 69.41% | 49.24% | 44.89% | 2.59 | 71h

Training: Automated Scores

“In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample.”
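The definition quoted above can be made concrete with a minimal sketch. It assumes the model reports a probability for each observed token and computes perplexity as the exponential of the average negative log-probability. The function name and the toy probabilities are illustrative only and are not taken from the KantanNeural™ training logs.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sample, given the model probability of each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability (cross-entropy in nats).
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# A model that always spreads its belief evenly over 4 choices.
uniform = perplexity([0.25] * 8)
# A model that is usually confident about the observed token (toy values).
confident = perplexity([0.9, 0.8, 0.95, 0.85])

print(f"uniform: {uniform:.2f}, confident: {confident:.2f}")
```

A perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens, so lower values indicate a better fit to the sample.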

Page 13: Kantanfest: Dimitar Shterionov - Part 1


Training: Automated Scores

[Figure: bar chart of automated scores (y-axis 0 to 90) per language arc: English->German, English->Chinese(S), English->Japanese, English->Italian, English->Spanish. Series: SMT-FM, SMT-BLEU, SMT-TER, NMT-FM, NMT-BLEU, NMT-TER.]



Page 15: Kantanfest: Dimitar Shterionov - Part 1

Alternative translations

EN→DE
Source: All dossiers must be individually analysed by the ministry responsible for the economy and scientific policy.
Reference: Jeder Antrag wird von den Dienststellen des zuständigen Ministers für Wirtschaft und Wissenschaftspolitik individuell geprüft.
PBSMT (BLEU 58%): Alle Unterlagen müssen einzeln analysiert werden von den Dienststellen des zuständigen Ministers für Wirtschaft und Wissenschaftspolitik.
NMT (BLEU 0%): Alle Unterlagen müssen von dem für die Volkswirtschaft und die wissenschaftliche Politik zuständigen Ministerium einzeln analysiert werden.

ES→EN
Source: En este punto muestro mi desacuerdo con el informe.
Reference: On this point, I am not in agreement with the report before us.
PBSMT (BLEU 72%): At this point, I am not in agreement with the report.
NMT (BLEU 7%): In this point I disagree with the report.

ES→EN
Source: Debemos apoyarles a todos para que alcancen este objetivo.
Reference: We must give them all our support to reach that goal.
PBSMT (BLEU 100%): We must give them all our support to reach that goal.
NMT (BLEU 0%): We have to support everyone to achieve this goal.
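To see why an acceptable NMT output can score 0% BLEU, here is a minimal sentence-level BLEU sketch applied to the last Spanish example on this slide. It assumes lowercased whitespace tokenisation, a single reference, n-grams up to length 4 and no smoothing. The exact BLEU variant behind the slide figures is not stated in the deck, so this sketch is not expected to reproduce every percentage shown.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU on lowercased, whitespace-split tokens."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "We must give them all our support to reach that goal."
pbsmt = "We must give them all our support to reach that goal."
nmt = "We have to support everyone to achieve this goal."

print(sentence_bleu(pbsmt, reference))  # identical to the reference
print(sentence_bleu(nmt, reference))    # shares no two-word sequence with the reference
```

The NMT output is a valid paraphrase, but because it shares no two-word sequence with the reference, the geometric mean of the n-gram precisions collapses to zero.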



Page 18: Kantanfest: Dimitar Shterionov - Part 1


Ranking

Average Scores from A/B Testing (in percent)

Language Arc | Same | SMT | NMT
EN→ZH-CN | 37 | 24 | 39
EN→JA | 21 | 21 | 58
EN→DE | 13 | 34 | 53
EN→IT | 24 | 19 | 56
EN→ES | 10 | 28 | 62
AVERAGE | 21 | 25.2 | 53.6
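The AVERAGE column can be checked with a short sketch. It assumes the chart values are read in legend order (Same, SMT, NMT) for each language arc, which is an interpretation of the slide rather than something it states explicitly. The dictionary layout and function name are illustrative.

```python
# Per-arc A/B testing percentages, read from the chart in legend order (assumption).
ab_scores = {
    "EN→ZH-CN": {"Same": 37, "SMT": 24, "NMT": 39},
    "EN→JA":    {"Same": 21, "SMT": 21, "NMT": 58},
    "EN→DE":    {"Same": 13, "SMT": 34, "NMT": 53},
    "EN→IT":    {"Same": 24, "SMT": 19, "NMT": 56},
    "EN→ES":    {"Same": 10, "SMT": 28, "NMT": 62},
}

def category_averages(scores):
    """Unweighted mean of each category across all language arcs."""
    categories = ["Same", "SMT", "NMT"]
    return {
        c: round(sum(arc[c] for arc in scores.values()) / len(scores), 1)
        for c in categories
    }

print(category_averages(ab_scores))
```

Under this reading the computed means match the values shown for AVERAGE: 21 for Same, 25.2 for SMT and 53.6 for NMT.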

Page 19: Kantanfest: Dimitar Shterionov - Part 1

BLEU underestimation of NMT

Take the translations from the NMT engine considered better than their PBSMT counterparts.

How many of those are scored by BLEU lower than their PBSMT counterparts?

Do the same for the PBSMT translations.


System | EN→ZH-CN | EN→JA | EN→DE | EN→IT | EN→ES | Average
NMT | 40% | 59% | 55% | 34% | 53% | 48%
PBSMT | 12% | 0% | 9% | 9% | 0% | 6%
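The procedure described on this slide can be sketched as follows. The sketch assumes each evaluated segment carries a human preference label and a sentence-level BLEU score for both engines. The `Segment` record, the function name and the sample values are hypothetical and are not the study data.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    preferred: str      # human judgement: "NMT", "PBSMT" or "Same"
    bleu_nmt: float     # sentence-level BLEU of the NMT output
    bleu_pbsmt: float   # sentence-level BLEU of the PBSMT output

def underestimation_rate(segments, system):
    """Share of segments where humans preferred `system` but BLEU scored it lower."""
    other = "PBSMT" if system == "NMT" else "NMT"

    def score(seg, name):
        return seg.bleu_nmt if name == "NMT" else seg.bleu_pbsmt

    preferred = [s for s in segments if s.preferred == system]
    if not preferred:
        return 0.0
    lower = sum(1 for s in preferred if score(s, system) < score(s, other))
    return lower / len(preferred)

# Hypothetical judgements for illustration only.
segments = [
    Segment("NMT", 0.00, 1.00),
    Segment("NMT", 0.07, 0.72),
    Segment("NMT", 0.60, 0.40),
    Segment("NMT", 0.55, 0.50),
    Segment("PBSMT", 0.30, 0.70),
    Segment("PBSMT", 0.40, 0.45),
    Segment("Same", 0.50, 0.50),
]

print(underestimation_rate(segments, "NMT"))    # 2 of 4 preferred NMT segments
print(underestimation_rate(segments, "PBSMT"))  # 0 of 2 preferred PBSMT segments
```

Running the same function once per engine yields the two rows of the table: a high rate for one engine and a low rate for the other indicates that BLEU disagrees with human preference asymmetrically.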


Page 21: Kantanfest: Dimitar Shterionov - Part 1

Take-away messages…

NMT is a new efficient paradigm for MT

NMT does not solve the problem of language … but it is getting there

NMT can be much better than PBSMT

Evaluating NMT:

BLEU, TER, F-Measure may underestimate NMT when compared to PBSMT

Using KantanLQR™ (A/B Testing) facilitates MT ranking


To NMT or not to NMT?

Page 22: Kantanfest: Dimitar Shterionov - Part 1

Quality Evaluation

Thank you…
