Smarter, Faster, Better: The secrets of productive Machine Translation, Tony O’Dowd (KantanMT)
KantanMT.com – A Complete MT Platform
Build: KantanTemplates, KantanNER, KantanLibrary, KantanFleet, KantanBuildAnalytics
Improve: KantanAnalytics, KantanPEX, KantanLQR, Adaptive MT, KantanGENTRY, KantanTotalRecall, KantanNeural
Deploy: KantanTranslate, KantanSwift, KantanAPI, KantanAutoScale, KantanOfficeMT, KantanConnectors, KantanSnippets
Translation Quality Evaluation
KantanLQR: built into the KantanMT platform
Integral step in KantanMT engine development
Translation Quality Evaluation Factored Model: templates based on Simplified Factors, MQM, DQF, and MQM-DQF
A/B Testing: A, B (C or D) testing now fully supported
Real-time data analytics built into your LQR Dashboard
Available to all KantanMT account holders
Improving Training Efficiency
Engine training time
Product Delivery
Giza++
Fast_Align
Times are hh:mm:ss; WC = word count.

Language Arc  Size   WC           Unique WC  GIZA++    Fast-Align  Difference
EN-FR         small  781,075      42,563     00:09:23  00:03:49    59%
EN-FR         large  109,379,800  1,008,696  10:35:11  04:02:14    62%
EN-DE         small  786,981      42,648     00:10:06  00:03:57    61%
EN-DE         large  138,119,563  1,084,485  15:33:43  04:13:57    73%
EN-ES         small  861,557      44,375     00:10:21  00:04:20    58%
EN-ES         large  154,169,102  1,119,475  14:07:21  04:54:12    65%
EN-IT         small  924,331      38,506     00:11:03  00:04:32    59%
EN-IT         large  104,196,079  914,889    11:09:32  05:46:41    48%
EN-ZH         small  810,134      33,281     00:10:07  00:04:45    55%
EN-ZH         large  58,274,131   550,862    10:08:16  03:34:13    65%
Average                                                            61%
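The Difference column can be read as the share of GIZA++ time saved by Fast-Align. The slides do not state the formula, so the sketch below assumes it is (GIZA++ time minus Fast-Align time) divided by GIZA++ time; the helper names `to_seconds` and `time_reduction` are illustrative only.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def time_reduction(baseline: str, candidate: str) -> float:
    """Percentage by which `candidate` is shorter than `baseline`.

    Assumed formula (not stated in the slides):
    100 * (baseline - candidate) / baseline
    """
    base = to_seconds(baseline)
    return 100.0 * (base - to_seconds(candidate)) / base


# Two rows taken from the table: (GIZA++ time, Fast-Align time).
rows = {
    "EN-FR small": ("00:09:23", "00:03:49"),
    "EN-DE large": ("15:33:43", "04:13:57"),
}
for label, (giza, fast) in rows.items():
    print(f"{label}: {time_reduction(giza, fast):.0f}% less time")
```

Under that assumption the two sample rows come out at 59% and 73%, matching the figures reported for them in the table.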
[Charts: F-Measure, BLEU and TER for the large engines (EN-DE, EN-ES, EN-FR, EN-IT, EN-ZH), axis 0% to 100%; bar values listed in extraction order]
F-Measure: 70.8, 73.7, 70.4, 71, 74.7, 66.3, 75.9, 69.5, 66.6, 74.4
BLEU: 66.2, 60.5, 61.8, 60.5, 53.7, 63.4, 63.5, 62.2, 61.3, 52.2
TER: 43.7, 40.2, 42.7, 41, 48.7, 49.6, 37.2, 43.5, 44.6, 48.8
[Charts: F-Measure, BLEU and TER for the small engines (EN-DE, EN-ES, EN-FR, EN-IT, EN-ZH), axis 0% to 100%; bar values listed in extraction order]
F-Measure: 57.3, 69.5, 63, 61.9, 75.4, 58.6, 67.1, 61.8, 61, 76.5
BLEU: 55.6, 59.2, 62.7, 54.2, 44.2, 59.2, 56.9, 60, 53, 45.3
TER: 58.9, 44.9, 51.9, 52.6, 43.9, 55.1, 48.6, 53.5, 54.4, 41.5
Dr. Dimitar Shterionov, KantanLabs
Dr. Jinhua Du, ADAPT Centre, DCU
Marc Anthony Palminteri, KantanMT.com
Laura Casanellas, KantanMT.com
Tony O’Dowd, KantanMT.com
Prof. Andy Way, ADAPT Centre, DCU
KantanNeural™
KantanNeural™ - Developments
3 Language Combinations: EN-DE, EN-ZH, EN-JA
Identical Training Data Catalogs: Training, Testing & Tuning
Phase 1: Automated Test Score Comparisons
Phase 2: Professional Translator A/B Testing

Arcs   # Segments   # Words      Domain
EN-DE  8.8 million  156 million  Legal
EN-ZH  3.5 million  53 million   Legal
EN-JA  8.1 million  90 million   Legal
Phase 1: Automated Test Score Comparisons

Arcs   Type  F-Measure  BLEU  TER
EN-DE  SMT   68%        59%   50%
EN-DE  NMT   67%        49%   51%
EN-ZH  SMT   76%        43%   45%
EN-ZH  NMT   73%        43%   44%
EN-JA  SMT   78%        53%   45%
EN-JA  NMT   68%        40%   53%
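For readers unfamiliar with the F-Measure column, the sketch below shows one common simplified form: the harmonic mean of word-level precision and recall between a system translation and a reference. It is an illustration under that assumption only. The slides do not say how the platform computes its score, and the function name `f_measure` and the sample sentences are invented for the example.

```python
from collections import Counter


def f_measure(hypothesis: str, reference: str) -> float:
    """Unigram F-measure between a hypothesis and a reference translation.

    Simplified illustration: whitespace tokenisation, lowercasing,
    and clipped word counts. Not the platform's actual scorer.
    """
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence pair: four of five words match.
score = f_measure("the contract is signed today",
                  "the contract was signed today")
print(f"F-measure: {score:.0%}")
```

Higher is better for F-Measure and BLEU, while TER counts edits, so lower is better there.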
Now available for use on the KantanMT Platform (Beta I Release)
Part of the KantanFleet collection of pre-built engines
KantanMT account holders can now translate; all document formats are supported
New language arcs will be added during Q1 2017
Phase 2: Professional Translator A/B Testing
KantanLQR A/B Testing starting in Feb
Will publish results in March/April timeframe
Domain Adapted NMT
Available Feb 2017
Beta I Release
Solving
Thank you…