Kantanfest: Dimitar Shterionov - Part 1

KantanNeural™ from A to Z, 1/3: To NMT or not to NMT?

Dimitar Shterionov

The Rise of MT

[Figure: Quality of MT over time. x-axis: Time (1954, 1966, 1970, 1982, 1993, 2003, 2005, 2016, 2020); y-axis: Relative quality]

31/07/2017 KantanFest, Dublin, Ireland

Breakthrough in NeuralMT


Yet another MT paradigm?



Which technique is faster?

Which technique is better?

How can I integrate NMT in my pipeline?

How can I compare PBSMT and NMT?

How can I improve my NMT engine?

When to use PBSMT and when NMT?






Is NMT better than PBSMT???





Can NMT be better than PBSMT???

Various empirical evaluations (since 2015)


…

Scientific Rigour – NMT vs PBSMT


Experiment Setup:

Identical Training, Test and Tune Data

NMT training limited to 4 days

Evaluation:

Automated Scores: F-Measure, TER, BLEU

Ranking with KantanLQR™, A/B Testing

Publications and Presentations:

EAMT 2017

MT Summit 2017

LocWorld34 NMT GALA Track



A small parenthesis…

There are so many factors

Learning algorithm and rate

Number of epochs

ANN properties

Data – preprocessing, segmentation

You need the right data!



Training: Identical Corpora

Language Arc | Parallel Sentences | TWC | UWC | Domain(s)
English->German | 8,820,562 | 110,150,238 | 859,167 | Legal/Medical
English->Chinese(Simplified) | 6,522,064 | 84,426,931 | 956,864 | Legal/Technical
English->Japanese | 8,545,366 | 87,252,129 | 676,244 | Legal/Technical
English->Italian | 2,756,185 | 35,295,535 | 765,930 | Medical
English->Spanish | 3,681,332 | 44,917,538 | 952,089 | Legal


Training: Automated Scores

Language Arc | SMT F-Measure | SMT BLEU | SMT TER | SMT Time | NMT F-Measure | NMT BLEU | NMT TER | NMT Perplexity | NMT Time
English->German | 62.00% | 54.08% | 54.31% | 18h | 62.53% | 47.53% | 53.41% | 3.02 | 92h
English->Chinese(Simplified) | 77.16% | 45.36% | 46.85% | 6h | 71.85% | 39.39% | 47.01% | 2.00 | 10h
English->Japanese | 80.04% | 63.27% | 43.77% | 9h | 69.51% | 40.55% | 49.46% | 1.89 | 68h
English->Italian | 69.74% | 56.98% | 42.54% | 8h | 64.88% | 42.00% | 48.73% | 2.70 | 83h
English->Spanish | 71.53% | 54.78% | 41.87% | 9h | 69.41% | 49.24% | 44.89% | 2.59 | 71h

“In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample.”
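As a rough illustration of the definition quoted above, perplexity can be computed as the exponential of the average negative log-probability a model assigns to each token of a sample. This is a minimal sketch: the function name and the toy probabilities are invented for illustration, and the slides do not state how the reported perplexity values were computed.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sample, given the model probability of each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural logarithm).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Toy probabilities (hypothetical, not taken from the experiments).
uncertain = perplexity([0.5, 0.5, 0.5, 0.5])    # coin flip at every token
confident = perplexity([0.9, 0.8, 0.95, 0.85])  # model is mostly sure

print(uncertain, confident)
```

A perplexity of 2 means the model is, on average, as uncertain as if it were choosing between two equally likely tokens at each step; values closer to 1 indicate a better fit to the sample.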



[Chart: SMT vs NMT automated scores in percent, y-axis from 0 to 90. Series: SMT-FM, SMT-BLEU, SMT-TER, NMT-FM, NMT-BLEU, NMT-TER. Language arcs: English->German, English->Chinese(S), English->Japanese, English->Italian, English->Spanish]

Alternative translations

EN→DE

Source: All dossiers must be individually analysed by the ministry responsible for the economy and scientific policy.

Reference: Jeder Antrag wird von den Dienststellen des zuständigen Ministers für Wirtschaft und Wissenschaftspolitik individuell geprüft.

PBSMT (BLEU 58%): Alle Unterlagen müssen einzeln analysiert werden von den Dienststellen des zuständigen Ministers für Wirtschaft und Wissenschaftspolitik.

NMT (BLEU 0%): Alle Unterlagen müssen von dem für die Volkswirtschaft und die wissenschaftliche Politik zuständigen Ministerium einzeln analysiert werden.

ES→EN

Source: En este punto muestro mi desacuerdo con el informe.

Reference: On this point, I am not in agreement with the report before us.

PBSMT (BLEU 72%): At this point, I am not in agreement with the report.

NMT (BLEU 7%): In this point I disagree with the report.

ES→EN

Source: Debemos apoyarles a todos para que alcancen este objetivo.

Reference: We must give them all our support to reach that goal.

PBSMT (BLEU 100%): We must give them all our support to reach that goal.

NMT (BLEU 0%): We have to support everyone to achieve this goal.
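To see why a fluent paraphrase can score 0% while a copy of the reference scores 100%, here is a minimal sentence-level BLEU sketch. It assumes whitespace tokenization, a single reference, n-grams up to length 4 and no smoothing; the function names are illustrative, and the tooling behind the scores on the slide may differ in all of these choices.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "We must give them all our support to reach that goal."
pbsmt = "We must give them all our support to reach that goal."
nmt = "We have to support everyone to achieve this goal."

print(sentence_bleu(pbsmt, reference))  # identical to the reference
print(sentence_bleu(nmt, reference))    # shares words but no two-word sequence with the reference
```

On the third example, the NMT output shares a few words with the reference but not a single two-word sequence, so an unsmoothed BLEU collapses to zero even though the translation is acceptable.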



Ranking

Average Scores from A/B Testing (in percent)

Preferred | EN→ZH-CN | EN→JA | EN→DE | EN→IT | EN→ES | AVERAGE
Same | 37 | 21 | 13 | 24 | 10 | 21
SMT | 24 | 21 | 34 | 19 | 28 | 25.2
NMT | 39 | 58 | 53 | 56 | 62 | 53.6
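The AVERAGE column of the A/B testing chart is the plain mean of the five language arcs, which can be checked with a few lines of code. This is a sketch under assumptions: the shares below are read off the chart labels (Same, SMT, NMT per arc), while the `tally` helper and its input format are hypothetical and do not describe how KantanLQR™ stores votes.

```python
# Shares in percent, read off the A/B testing chart in the slides.
ab_shares = {
    "EN→ZH-CN": {"Same": 37, "SMT": 24, "NMT": 39},
    "EN→JA":    {"Same": 21, "SMT": 21, "NMT": 58},
    "EN→DE":    {"Same": 13, "SMT": 34, "NMT": 53},
    "EN→IT":    {"Same": 24, "SMT": 19, "NMT": 56},
    "EN→ES":    {"Same": 10, "SMT": 28, "NMT": 62},
}

LABELS = ("Same", "SMT", "NMT")

def tally(votes):
    """Turn a list of raw A/B votes (hypothetical format) into percentage shares."""
    if not votes:
        raise ValueError("no votes to tally")
    return {label: 100.0 * votes.count(label) / len(votes) for label in LABELS}

def average_share(shares, label):
    """Unweighted mean of one label's share across all language arcs."""
    return sum(arc[label] for arc in shares.values()) / len(shares)

for label in LABELS:
    print(label, round(average_share(ab_shares, label), 1))
```

The means come out at 21, 25.2 and 53.6, matching the chart. Note that this is an unweighted average over arcs, so it does not account for differing numbers of rated segments per arc.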

BLEU underestimation of NMT

Take the translations from the NMT engine that were considered better than their PBSMT counterparts.

How many of those are scored lower by BLEU than their PBSMT counterparts?

Do the same for the PBSMT translations.


System | EN→ZH-CN | EN→JA | EN→DE | EN→IT | EN→ES | Average
NMT | 40% | 59% | 55% | 34% | 53% | 48%
PBSMT | 12% | 0% | 9% | 9% | 0% | 6%
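The procedure described above can be sketched as follows: among the segments where evaluators preferred one system, count the share for which BLEU nevertheless scores that system lower than the other. The `Judgement` record, the function name and the toy data are all invented for illustration; the slides do not show the actual data format or tooling.

```python
from dataclasses import dataclass

@dataclass
class Judgement:
    preferred: str     # human A/B verdict: "NMT", "PBSMT" or "Same"
    bleu_nmt: float    # segment-level BLEU of the NMT output
    bleu_pbsmt: float  # segment-level BLEU of the PBSMT output

def underestimation_rate(judgements, system):
    """Share of segments preferred for `system` that BLEU scores lower than the other system."""
    if system not in ("NMT", "PBSMT"):
        raise ValueError("system must be 'NMT' or 'PBSMT'")
    preferred = [j for j in judgements if j.preferred == system]
    if not preferred:
        return 0.0
    if system == "NMT":
        lower = sum(1 for j in preferred if j.bleu_nmt < j.bleu_pbsmt)
    else:
        lower = sum(1 for j in preferred if j.bleu_pbsmt < j.bleu_nmt)
    return lower / len(preferred)

# Toy data (hypothetical, not from the experiments).
toy = [
    Judgement("NMT", 0.00, 0.58),    # humans prefer NMT, BLEU prefers PBSMT
    Judgement("NMT", 0.45, 0.40),    # humans and BLEU agree on NMT
    Judgement("PBSMT", 0.30, 0.50),  # humans and BLEU agree on PBSMT
    Judgement("Same", 0.60, 0.60),   # ties are ignored for both systems
]

print(underestimation_rate(toy, "NMT"))
print(underestimation_rate(toy, "PBSMT"))
```

A large gap between the two rates, as in the table above, suggests that BLEU systematically penalizes one system relative to human judgement rather than being merely noisy.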

Take-away messages…

NMT is a new efficient paradigm for MT

NMT does not solve the problem of language … but it is getting there

NMT can be much better than PBSMT

Evaluating NMT:

BLEU, TER, F-Measure may underestimate NMT when compared to PBSMT

Using KantanLQR™ (A/B Testing) facilitates MT ranking


To NMT or not to NMT?

Quality Evaluation

Thank you…

