Kantanfest: Dimitar Shterionov - Part 1
KantanNeural™ from A to Z, 1/3: To NMT or not to NMT?
Dimitar Shterionov
The Rise of MT
[Figure: Quality of MT over time. Vertical axis: relative quality; horizontal axis: time, marked 1954, 1966, 1970, 1982, 1993, 2003, 2005, 2016, 2020.]
31/07/2017 KantanFest, Dublin, Ireland 2
Breakthrough in Neural MT
Yet another MT paradigm?
Which technique is faster?
Which technique is better?
How can I integrate NMT in my pipeline?
How can I compare PBSMT and NMT?
How can I improve my NMT engine?
When to use PBSMT and when NMT?
Is NMT better than PBSMT???
Can NMT be better than PBSMT???
Various empirical evaluations (since 2015)
…
Scientific Rigour – NMT vs PBSMT
Experiment Setup
  Identical Training, Test and Tune Data
  NMT training limited to 4 days
Evaluation
  Automated Scores: F-Measure, TER, BLEU
  Ranking with KantanLQR™, A/B Testing
Publications and Presentations
  EAMT 2017
  MT Summit 2017
  LocWorld34 NMT GALA Track
A small parenthesis… There are so many factors:
  Learning algorithm and rate
  Number of epochs
  ANN properties
  Data – preprocessing, segmentation: you need the right data!
Training: Identical Corpora
Language Arc                    Parallel Sentences  TWC          UWC      Domain(s)
English->German                 8,820,562           110,150,238  859,167  Legal/Medical
English->Chinese (Simplified)   6,522,064           84,426,931   956,864  Legal/Technical
English->Japanese               8,545,366           87,252,129   676,244  Legal/Technical
English->Italian                2,756,185           35,295,535   765,930  Medical
English->Spanish                3,681,332           44,917,538   952,089  Legal
Training: Automated Scores

SMT
Language Arc                    F-Measure  BLEU    TER     Time
English->German                 62.00%     54.08%  54.31%  18h
English->Chinese (Simplified)   77.16%     45.36%  46.85%  6h
English->Japanese               80.04%     63.27%  43.77%  9h
English->Italian                69.74%     56.98%  42.54%  8h
English->Spanish                71.53%     54.78%  41.87%  9h

NMT
Language Arc                    F-Measure  BLEU    TER     Perplexity  Time
English->German                 62.53%     47.53%  53.41%  3.02        92h
English->Chinese (Simplified)   71.85%     39.39%  47.01%  2.00        10h
English->Japanese               69.51%     40.55%  49.46%  1.89        68h
English->Italian                64.88%     42.00%  48.73%  2.70        83h
English->Spanish                69.41%     49.24%  44.89%  2.59        71h
“In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample.”
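To make the quoted definition concrete, here is a minimal sketch of one common way to compute perplexity: the exponential of the average negative log-probability a model assigns to each token of a sample. The function name and the toy probabilities are illustrative assumptions; the slides do not say how the perplexity values reported for the NMT engines were computed.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sample, given the probability the model assigned to each token.

    Computed as exp(-(1/N) * sum(log p_i)). Lower is better; 1.0 means the
    model predicted every token with certainty.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Toy example (invented numbers): a model that gives every token probability 0.5
# is as uncertain as a fair coin flip per token.
coin_flip_model = perplexity([0.5, 0.5, 0.5, 0.5])
perfect_model = perplexity([1.0, 1.0, 1.0])
```

Under this definition, a perplexity of about 2 means the model is, on average, as uncertain as if it were choosing between two equally likely tokens.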
[Figure: Training: Automated Scores. Chart with a vertical axis from 0 to 90, grouped by language arc (English->German, English->Chinese(S), English->Japanese, English->Italian, English->Spanish); series: SMT-FM, SMT-BLEU, SMT-TER, NMT-FM, NMT-BLEU, NMT-TER.]
Alternative translations (BLEU score of each MT output in parentheses)

EN→DE
Source: All dossiers must be individually analysed by the ministry responsible for the economy and scientific policy.
Reference: Jeder Antrag wird von den Dienststellen des zuständigen Ministers für Wirtschaft und Wissenschaftspolitik individuell geprüft.
PBSMT (BLEU 58%): Alle Unterlagen müssen einzeln analysiert werden von den Dienststellen des zuständigen Ministers für Wirtschaft und Wissenschaftspolitik.
NMT (BLEU 0%): Alle Unterlagen müssen von dem für die Volkswirtschaft und die wissenschaftliche Politik zuständigen Ministerium einzeln analysiert werden.

ES→EN
Source: En este punto muestro mi desacuerdo con el informe.
Reference: On this point, I am not in agreement with the report before us.
PBSMT (BLEU 72%): At this point, I am not in agreement with the report.
NMT (BLEU 7%): In this point I disagree with the report.

ES→EN
Source: Debemos apoyarles a todos para que alcancen este objetivo.
Reference: We must give them all our support to reach that goal.
PBSMT (BLEU 100%): We must give them all our support to reach that goal.
NMT (BLEU 0%): We have to support everyone to achieve this goal.
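Why can an acceptable translation score 0% BLEU? The sketch below is a simplified, unsmoothed sentence-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty, whitespace tokenisation). It is an illustration under those assumptions, not the scorer used to produce the figures on the slide, so it is not expected to reproduce the intermediate values. It does show the mechanism: as soon as one n-gram order has no match with the reference, the geometric mean collapses to zero.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, without smoothing."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # one empty n-gram order zeroes the geometric mean
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalise hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)

reference = "We must give them all our support to reach that goal."
pbsmt_output = "We must give them all our support to reach that goal."
nmt_output = "We have to support everyone to achieve this goal."

pbsmt_bleu = sentence_bleu(pbsmt_output, reference)  # identical to the reference
nmt_bleu = sentence_bleu(nmt_output, reference)      # shares no bigram with the reference
```

With this sketch the PBSMT output, identical to the reference, scores 1.0, while the NMT output scores 0.0 because it shares no two-word sequence with the reference, even though it conveys the same meaning.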
Ranking
Average Scores from A/B Testing (in percent)

        EN→ZH-CN  EN→JA  EN→DE  EN→IT  EN→ES  AVERAGE
Same    37        21     13     24     10     21
SMT     24        21     34     19     28     25.2
NMT     39        58     53     56     62     53.6
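The AVERAGE column is consistent with an unweighted mean over the five language arcs. The slides do not state how it was derived, so treat that reading as an assumption; the short check below simply recomputes it from the per-arc percentages.

```python
from statistics import mean

# A/B testing percentages per language arc, in the order
# EN→ZH-CN, EN→JA, EN→DE, EN→IT, EN→ES (values taken from the chart).
ab_scores = {
    "Same": [37, 21, 13, 24, 10],
    "SMT": [24, 21, 34, 19, 28],
    "NMT": [39, 58, 53, 56, 62],
}

# Unweighted mean across arcs, rounded to one decimal as on the slide.
averages = {label: round(mean(values), 1) for label, values in ab_scores.items()}
```

On average, evaluators preferred the NMT output in a little over half of the comparisons, roughly twice as often as the SMT output.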
BLEU underestimation of NMT
Take the translations from the NMT engine
considered better than their PBSMT counterparts.
How many of those are scored by BLEU lower than
their PBSMT counterparts?
Do the same for the PBSMT translations.
Engine  EN→ZH-CN  EN→JP  EN→DE  EN→IT  EN→ES  Average
NMT     40%       59%    55%    34%    53%    48%
PBSMT   12%       0%     9%     9%     0%     6%
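The procedure described above can be sketched as follows. For each engine, take the segments that evaluators preferred, then count how many of those received a lower BLEU score than the competing engine's output. The record layout, the function name and the toy numbers are hypothetical, chosen only for illustration; they are not KantanLQR™ output.

```python
def underestimation_rate(segments, engine, other):
    """Share of segments where `engine` was preferred by evaluators
    but scored lower on BLEU than `other`."""
    preferred = [s for s in segments if s["preferred"] == engine]
    if not preferred:
        return 0.0
    scored_lower = sum(1 for s in preferred if s["bleu"][engine] < s["bleu"][other])
    return scored_lower / len(preferred)

# Invented toy data: one record per test segment.
segments = [
    {"preferred": "NMT", "bleu": {"NMT": 0.00, "PBSMT": 0.58}},
    {"preferred": "NMT", "bleu": {"NMT": 0.45, "PBSMT": 0.40}},
    {"preferred": "PBSMT", "bleu": {"NMT": 0.30, "PBSMT": 0.55}},
    {"preferred": "PBSMT", "bleu": {"NMT": 0.60, "PBSMT": 0.50}},
    {"preferred": "PBSMT", "bleu": {"NMT": 0.20, "PBSMT": 0.70}},
    {"preferred": "Same", "bleu": {"NMT": 0.50, "PBSMT": 0.50}},
]

nmt_rate = underestimation_rate(segments, "NMT", "PBSMT")    # 1 of 2 preferred NMT segments
pbsmt_rate = underestimation_rate(segments, "PBSMT", "NMT")  # 1 of 3 preferred PBSMT segments
```

Segments rated "Same" are left out of both counts, since neither engine was preferred for them.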
Take-away messages…
NMT is a new efficient paradigm for MT
NMT does not solve the problem of language … but it is getting there
NMT can be much better than PBSMT
Evaluating NMT:
BLEU, TER, F-Measure may underestimate NMT when compared to PBSMT
Using KantanLQR™ (A/B Testing) facilitates MT ranking
To NMT or not to NMT?
Quality Evaluation
Thank you…