Kantanfest: Dimitar Shterionov - Part 2
![Page 1: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/1.jpg)
KantanNeural™ from A to Z
3/3: NMT in 4 weeks → 4 days → 4 hours
Dimitar Shterionov
![Page 2: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/2.jpg)
What is NMT?
31/07/2017 KantanFest, Dublin, Ireland 2
![Page 3: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/3.jpg)
What is NMT?
[Figure: encoder-decoder diagram. Source tokens x1, x2, x3 → context c → target tokens y1, y2, y3]
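The symbols x1 x2 x3, c and y1 y2 y3 on this slide match the usual encoder-decoder picture of NMT: the source sequence is folded into a context c, and the target sequence is unrolled from it. Below is a minimal sketch of that data flow in plain Python, assuming a toy scalar recurrent network. The function names `encode` and `decode` and all weights are illustrative assumptions, not taken from the talk or from any KantanMT code.

```python
import math

def encode(xs, w_in=0.5, w_rec=0.8):
    """Fold the source sequence x1..xn into a single context value c."""
    h = 0.0
    for x in xs:
        # Each step mixes the new input with the running hidden state.
        h = math.tanh(w_in * x + w_rec * h)
    return h

def decode(c, steps, w_rec=0.9, w_out=2.0):
    """Unroll the decoder from the context c, emitting one output per step."""
    h, ys = c, []
    for _ in range(steps):
        h = math.tanh(w_rec * h)
        ys.append(w_out * h)
    return ys

source = [1.0, 2.0, 3.0]        # stands in for embedded tokens x1, x2, x3
c = encode(source)              # fixed-size summary of the whole source
targets = decode(c, steps=3)    # stands in for y1, y2, y3
```

Production systems use vectors rather than scalars, learned rather than hand-picked weights, and a softmax over the target vocabulary at each decoding step. The sketch only shows how everything the decoder knows about the source passes through c.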
![Page 4: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/4.jpg)
How to NMT – The Recipe
Hardware + Software: GPUs, Torch, Theano, Nematus, OpenNMT
Know-how, Support
Integration, Deployment
Training data
![Page 5: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/5.jpg)
How to NMT – KantanNeural™
Hardware + Software: GPUs, Torch, Theano, Nematus, OpenNMT
Know-how, Support
Integration, Deployment
Training data
KantanNeural™
![Page 6: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/6.jpg)
KantanNeural™: blackboard to production
Proof of Concept:
AWS, NVIDIA K520 GPUs
Nematus, ADAM, BPE, SCN
MT (engines) build: 4 weeks
Quality: impressive
01 Nov 2016
• ADAM: parameter update algorithm
• Byte-pair encoding (BPE)
• Single-character n-gram (SCN)
lower → low er
tallest → tall est
almost → al most
lowest, taller, allow
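The segmentation "lower → low er" above is what byte-pair encoding produces once it has learned frequent symbol pairs from a corpus. Below is a minimal sketch of BPE merge learning and segmentation in plain Python. The toy word frequencies are invented for illustration and are not the data used for KantanNeural™. The sketch covers BPE only, not the SCN variant named on the slide.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` by its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merges from a {word: frequency} dict."""
    # Every word starts out as a sequence of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges

def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

# Hypothetical toy corpus: "lower" itself never occurs in it.
corpus = {"low": 5, "lowest": 2, "newer": 6, "wider": 3}
merges = learn_bpe(corpus, num_merges=3)
print(segment("lower", merges))  # ['low', 'er']
```

Because merges are learned from frequent pairs, a word that never appeared in training can still be built from known subword units, which keeps the NMT vocabulary small.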
![Page 7: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/7.jpg)
KantanNeural™: blackboard to production
KantanNeural™ α:
OpenNMT, ADAM, BPE
MT build time: 4 days
Quality: on a par with Nematus
KantanFleet™
01 Nov 2016 → 01 Feb 2017
![Page 8: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/8.jpg)
KantanNeural™: blackboard to production
KantanNeural™ β:
Build-your-own NMT
Available to all clients (no extra charge)
Extended KantanFleet™
01 Nov 2016 → 01 Feb 2017 → 15 March 2017
![Page 9: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/9.jpg)
KantanNeural™: blackboard to production
Currently:
Build-your-own NMT
NVIDIA K80 GPUs
AdaptiveMT
Incremental Retraining
4 hours?
01 Nov 2016 → 01 Feb 2017 → 15 March 2017 → 30 June 2017
![Page 10: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/10.jpg)
KantanMT.com – A Complete Platform
Build Improve Deploy
KantanTemplates
KantanNER
KantanLibrary
KantanFleet
KantanBuildAnalytics
KantanAnalytics
KantanPEX
KantanLQR
AdaptiveMT
KantanGENTRY
KantanTotalRecall
KantanNeural™
KantanTranslate
KantanSwift
KantanAPI
KantanAutoScale
KantanOfficeMT
KantanConnectors
KantanSnippets
![Page 11: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/11.jpg)
KantanMT.com – A Complete Platform
Build Improve Deploy
![Page 12: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/12.jpg)
KantanMT.com – A Complete Platform
Build Improve Deploy
Select a KantanFleet™ engine:
• KantanFleet™ Neural (18 language pairs)
• Multiple domains
Create new NMT engine:
• Import library data
• Import your own data
Convert an SMT profile:
… just two clicks away from NMT
![Page 13: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/13.jpg)
KantanMT.com – A Complete Platform
Build Improve Deploy
Select a KantanFleet™ engine
![Page 14: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/14.jpg)
KantanMT.com – A Complete Platform
Build Improve Deploy
Create a blank KantanNeural™ engine
![Page 15: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/15.jpg)
KantanMT.com – A Complete Platform
Build Improve Deploy
Convert a PBSMT engine into a KantanNeural™ engine
![Page 16: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/16.jpg)
KantanMT.com – A Complete Platform
Build Improve Deploy
Artificial Neural Networks train iteratively:
While stopping condition not met:
    While training data not exhausted:
        Take a batch
        Learn from it
    Repeat
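The loop on this slide can be made concrete with a very small model. Below is a minimal sketch in plain Python that fits a line y = w·x + b by mini-batch gradient descent. The model, the learning rate, the stopping rule (an epoch cap or a loss tolerance) and the names `batches` and `train` are all illustrative assumptions, not KantanNeural™ internals. A real NMT engine would use a neural network and an optimiser such as ADAM in place of the plain gradient step.

```python
import random

def batches(data, batch_size):
    """Yield consecutive slices of the training data."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

def train(data, w=0.0, b=0.0, lr=0.05, batch_size=4,
          max_epochs=200, tol=1e-6, seed=0):
    """Fit y = w*x + b; returns (w, b, epochs_run, final_loss)."""
    rng = random.Random(seed)
    data = list(data)
    epoch, loss = 0, float("inf")
    # While stopping condition not met:
    while epoch < max_epochs and loss > tol:
        rng.shuffle(data)
        # While training data not exhausted: take a batch ...
        for batch in batches(data, batch_size):
            # ... and learn from it: one gradient step on mean squared error.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
        # Re-evaluate the stopping condition after each pass over the data.
        loss = sum((w * x + b - y) ** 2 for x, y in data) / len(data)
        epoch += 1
    return w, b, epoch, loss

xs = [i / 8 - 1 for i in range(17)]        # 17 points in [-1, 1]
data = [(x, 2 * x + 1) for x in xs]        # noise-free target: w = 2, b = 1
w, b, epochs, loss = train(data)
```

Because `train` accepts starting values for `w` and `b`, a later call such as `train(new_data, w=w, b=b)` continues from the learned parameters instead of starting from scratch. That is the same general idea as retraining an existing engine on additional data.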
![Page 17: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/17.jpg)
KantanMT.com – A Complete Platform
Build Improve Deploy
Augment data
Parallel corpora
Preprocessing rules (PEX, tokeniser exceptions, etc.)
F-Measure, BLEU, TER
KantanLQR (error typology, A/B testing)
New preprocessing rules
New data
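The automatic scores named on this slide compare engine output with a reference translation. Below is a minimal sketch of two of them in plain Python: a token-level F-measure, and a simplified edit rate in the spirit of TER. Both definitions are assumptions made for illustration. The deck does not say how KantanMT computes its scores, and full TER also counts block shifts, which this sketch leaves out.

```python
from collections import Counter

def f_measure(hypothesis, reference):
    """Harmonic mean of unigram precision and recall over whitespace tokens."""
    hyp, ref = hypothesis.split(), reference.split()
    # Clipped overlap: each reference token can be matched at most once.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def edit_rate(hypothesis, reference):
    """Word-level edit distance divided by reference length (TER without shifts)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,               # drop a hypothesis word
                           cur[j - 1] + 1,            # insert a reference word
                           prev[j - 1] + (h != r)))   # substitute or match
        prev = cur
    return prev[-1] / len(ref)

hyp = "the cat sat on mat"
ref = "the cat sat on the mat"
score_f = f_measure(hyp, ref)    # 10/11: precision 1, recall 5/6
score_e = edit_rate(hyp, ref)    # 1/6: one missing word out of six
```

Higher is better for F-measure and lower is better for the edit rate, so the two are usually read together when comparing an engine before and after new data or new preprocessing rules.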
4 hours?
![Page 20: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/20.jpg)
KantanMT.com – A Complete Platform
Build Improve Deploy
API
Connectors
KantanWidgets™
As with every other KantanMT engine
![Page 21: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/21.jpg)
Conclusions…
KantanMT:
A complete MT platform for both NMT and PBSMT engines
Easy access to powerful MT technology
How to train, improve and deploy KantanNeural™ engines
Seamless switch from PBSMT to NMT
Incremental retraining to improve, adapt and specialize engines
4 hours training?
![Page 23: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/23.jpg)
… and future work
Better control:
Terminology
Tags
NTAs
Learn from post-edits:
Exploit feedback from KantanLQR™
Exploit feedback from connectors
Models:
Add language knowledge
Hybrid MT
Convolutional Neural Networks (CNN)
…
![Page 24: Kantanfest: Dimitar Shterionov - Part 2](https://reader036.fdocuments.us/reader036/viewer/2022062306/5a64bffe7f8b9a6d5d8b47fb/html5/thumbnails/24.jpg)
Solving
Thank you…
Laura Casanellas: [email protected]
Dimitar Shterionov: [email protected]
KantanLabs: [email protected]
KantanMT: [email protected]