Pure Neural® Machine Translation powering businesses
Jean Senellart
SYSTRAN SA CEO, SYSTRAN Group Chief Scientist
50 Years leading Machine Translation Technologies
NMT has commoditized MT
What is a 'commodity'?
A commodity is a basic good used in commerce that is interchangeable with other commodities of the same type. Commodities are most often used as inputs in the production of other goods or services. The quality of a given commodity may differ slightly, but it is essentially uniform across producers.
© 2018 ALL RIGHTS RESERVED
MT as a Commodity
SCALABLE
HIGHEST QUALITY
ADAPTED
RELIABLE
INDISPENSABLE
NOT A FINALITY BY ITSELF
GLOBAL
SHARING at SYSTRAN COMMUNITY DAY
SYSTRAN vision for MT democratization
Addressing a diversity of markets with Multilingual Solutions
Client Panel: The benefits of Machine Translation
Lunch & Networking
Product innovations
Infrastructure for AI platform
Partner Panel #1: How to value your multilingual data in a secure environment
Partner Panel #2: The Added Value of Machine Translation for the Language Industry
Break
Keynote: Beyond NMT, Towards a cross-lingual universal model?
Closing Cocktail
Thanks to our sponsors
Sharing Technology – 2 years of OpenNMT
24 months since launch
4300 stars and 1800 forks on GitHub
4400 posts on the forum
100+ contributors
50 major releases
6 complete code refactorings
1000+ unit tests
Less than… 20,000 lines of code
NMT Frameworks
Large “coopetition”
OpenNMT strengths
• Large community
• Computation-framework agnostic
• Very active development cycle
• Real-time Integration of latest research results
• Focus on Research and on Industry
Other frameworks: fairseq, Sockeye, tensor2tensor, MarianNMT
Success of OpenNMT
218 citations to OpenNMT in academic papers
Best Demo-System Award at ACL 2017
Top-10 AI project on GitHub in 2017
Fastest NMT system on CPU at ACL 2018
Adopted by multiple companies, including large corporations
Very large community
Limits of OpenNMT – the Training Quest
Open-source data, domain data → preprocessing → cleaned data → training script on an NMT framework → NMT model → evaluate and start again → deploy
• Needs a smart data scientist for teaching, evaluating, and deploying
• Handcrafted integration
• Manual deployment
From Research to 144 Production Systems
Infinite Training
Dynamic Sampling
Model Catalog
Training or Teaching NMT models
Example outputs for "A neural network.": "Un réseau neural." / "Un réseau de neuralone." / "Un réseau de neurone."
Teaching an NMT Model – Understanding NMT
A neural network is smart,
… really smart
• Will try to make sense out of non-sense
• Will always outsmart the trainer
• Will take training data as gold truth
… but cannot guess user expectations
The learning process is incremental but stochastic
Teaching an NMT model – Dealing with Noise
Traditional approaches either filter out noise or try to clean it
Neural approach:
• In source – controlled noise extends the capacity of the model
• In target – there is never any good noise!
• Do not try to fix; teach the model to deal with noise by generating more noise
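The "teach, don't clean" idea can be illustrated with a small augmentation step that noises only the source side of each pair and never touches the target. This is a minimal sketch under stated assumptions: the two noise operations (token drop and adjacent swap), their rates, and the helper names `add_source_noise` and `noise_pairs` are hypothetical, not SYSTRAN's actual noising module.

```python
import random


def add_source_noise(tokens, p_drop=0.1, p_swap=0.1, rng=None):
    """Return a noised copy of a source token list (hypothetical operations).

    Drops tokens with probability p_drop and swaps adjacent tokens with
    probability p_swap. Rates are illustrative, not production settings.
    """
    rng = rng or random.Random()
    # Drop tokens, but never return an empty sentence.
    kept = [t for t in tokens if rng.random() >= p_drop] or list(tokens[:1])
    # Swap adjacent tokens to simulate local reordering noise.
    i = 0
    while i < len(kept) - 1:
        if rng.random() < p_swap:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
            i += 2  # a token moves at most one position
        else:
            i += 1
    return kept


def noise_pairs(pairs, copies=1, seed=0, **kwargs):
    """Augment (source, target) pairs with noised-source copies.

    Targets are copied unchanged: noise only ever goes on the source side.
    """
    rng = random.Random(seed)
    out = list(pairs)
    for src, tgt in pairs:
        for _ in range(copies):
            out.append((add_source_noise(src, rng=rng, **kwargs), tgt))
    return out
```

Keeping the clean originals next to their noised copies is one design choice: the model still sees well-formed input while learning to map imperfect input to a clean target.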
Teaching an NMT model – Dealing with Errors
Trace the error back to the data
Understand the error:
• Wrong generalization of a pattern?
• Missing training examples?
• Error in the training data?
Fix the problem or make up new examples
Teach the model to fix the errors by feeding it the new data
Teaching an NMT model – Dealing with Details
Core models deal with plain text, but translation needs are about formatted documents
• Need to deal with formatting tags
• Typesetting status
Technical text contains special tokens (user entities)
Extend models with support for user dictionaries
Teach the model to handle all of these
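One common way to let a text-only core model handle formatting tags and user-dictionary entries is to mask them with placeholders before translation and restore them afterwards. The sketch below assumes a `｟phN｠` placeholder format and a simple source-to-target user dictionary; the format and the `mask`/`unmask` helpers are illustrative assumptions, not the PN9 implementation.

```python
import re

# Opening or closing markup tag such as <b> or </b>.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def mask(text, user_dict=None):
    """Replace tags and user-dictionary source terms with placeholders.

    Returns the masked text and a mapping placeholder -> text to restore.
    Tags are restored as-is; dictionary terms are restored as their target.
    """
    mapping = {}

    def _store(value):
        key = f"｟ph{len(mapping)}｠"  # hypothetical placeholder format
        mapping[key] = value
        return key

    text = TAG_RE.sub(lambda m: _store(m.group(0)), text)
    for src_term, tgt_term in (user_dict or {}).items():
        if src_term in text:
            text = text.replace(src_term, _store(tgt_term))
    return text, mapping


def unmask(text, mapping):
    """Substitute every placeholder with its stored value."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text
```

For example, masking "Click <b>Save</b> in SYSTRAN Server." with the entry {"Save": "Enregistrer"} leaves only placeholders where the tags and the term were; after the model translates the surrounding words, `unmask` puts "<b>Enregistrer</b>" back in place.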
Teaching an NMT Model – Data Preparation
Data preparation: augmenting, noising, filtering
For a given task, the quality of the model directly depends on the appropriateness of the training data.
The PNS Sampler fully automates data manipulation and preparation, with integrated:
• Data sampling
• Data augmentation, filtering, noising
• Tokenization
At each training iteration, NMT models are fed a new corpus selection
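A sampler of this kind can be approximated by drawing, at each iteration, a weighted mix of several corpora with a fresh random seed, so that every iteration sees a different selection. The corpus names, weights, and the `sample_epoch` helper are hypothetical; sampling with replacement is one possible design choice that lets small in-domain corpora be oversampled.

```python
import random


def sample_epoch(corpora, weights, size, seed):
    """Draw one training sample mixing several corpora.

    corpora: name -> list of (source, target) pairs
    weights: name -> relative weight (hypothetical values)
    A different seed per iteration yields a different data selection.
    """
    rng = random.Random(seed)
    names = list(corpora)
    total = sum(weights[n] for n in names)
    sample = []
    for name in names:
        k = round(size * weights[name] / total)  # share of this corpus
        # With replacement, so a small in-domain corpus can be oversampled.
        sample.extend(rng.choices(corpora[name], k=k))
    rng.shuffle(sample)
    return sample
```

Calling it with `seed=iteration` makes each iteration reproducible while still varying the data from one iteration to the next.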
Teaching an NMT Model – Taking Time…
Not all models learn the same facts at the same speed.
The Infinite Training™ process, coupled with sampling, continuously feeds the model with newly selected data.
Systematic model evaluation monitors progress on a variety of test sets, allowing the training curriculum to be adapted.
Models trained on 8 V100 GPU cards continue to learn after more than one month of infinite training…
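The loop below sketches how evaluation results could feed back into the sampling weights, which is one way to adapt a training curriculum from test-set scores. The `reweight` rule (more weight where the gap to a score of 100 is larger), the floor value, and the stubbed `train_step`/`evaluate` callables are assumptions for illustration; a production loop would also checkpoint models and run indefinitely.

```python
def reweight(scores, floor=0.05):
    """Turn per-test-set scores into sampling weights for the next iteration.

    Hypothetical curriculum rule: the further a domain's score is from 100
    (e.g. BLEU), the more weight it gets; `floor` keeps every domain in play.
    """
    gaps = {d: max(100.0 - s, 0.0) for d, s in scores.items()}
    total = sum(gaps.values()) or 1.0
    raw = {d: max(g / total, floor) for d, g in gaps.items()}
    norm = sum(raw.values())
    return {d: w / norm for d, w in raw.items()}


def infinite_training(train_step, evaluate, domains, iterations):
    """Bounded stand-in for an open-ended train/evaluate/re-sample loop."""
    weights = {d: 1.0 / len(domains) for d in domains}
    history = []
    for it in range(iterations):
        train_step(weights, seed=it)  # new data selection at every iteration
        scores = evaluate()           # e.g. one score per test set / domain
        history.append((it, scores))
        weights = reweight(scores)    # adapt the curriculum from the results
    return history, weights
```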
Teaching NMT models - the model catalog
All produced models are stored in a catalog, represented as a model hierarchy, and evaluated.
More than 5000 models produced so far.
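A catalog organized as a model hierarchy can be represented as a tree in which each model points to the model it was trained from, with per-test-set scores attached. This in-memory `ModelCatalog` is a hypothetical sketch, not SYSTRAN's catalog service; model IDs and test-set names are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelEntry:
    model_id: str
    parent: Optional[str]  # model this one was trained from (None = root)
    scores: Dict[str, float] = field(default_factory=dict)  # test set -> score


class ModelCatalog:
    """Hypothetical in-memory catalog: models form a tree via parent links."""

    def __init__(self):
        self._models: Dict[str, ModelEntry] = {}

    def add(self, model_id, parent=None, scores=None):
        if parent is not None and parent not in self._models:
            raise KeyError(f"unknown parent {parent!r}")
        self._models[model_id] = ModelEntry(model_id, parent, dict(scores or {}))

    def lineage(self, model_id) -> List[str]:
        """Chain of models from the root down to model_id."""
        chain = []
        current = model_id
        while current is not None:
            chain.append(current)
            current = self._models[current].parent
        return chain[::-1]

    def children(self, model_id) -> List[str]:
        """Models directly derived from model_id (e.g. specialized branches)."""
        return [m.model_id for m in self._models.values() if m.parent == model_id]

    def best(self, test_set) -> str:
        """Model with the highest score on one test set."""
        scored = [m for m in self._models.values() if test_set in m.scores]
        return max(scored, key=lambda m: m.scores[test_set]).model_id
```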
Evaluation of Translation Quality – Challenge I
That can be particularly harmful for microelectronic applications since boron is a dopant for silicon.
Evaluation of Translation Quality – Challenge II
Source: The Canadian opportunities strategy is not just concerned with today's immediate needs.
French translations:
La stratégie canadienne en matière d'opportunités ne se limite pas aux besoins immédiats d'aujourd'hui.
La Stratégie canadienne pour l'égalité des chances ne se limite pas aux besoins immédiats d'aujourd'hui.
La stratégie canadienne d'opportunités ne s'intéresse pas seulement aux besoins immédiats d'aujourd'hui.
La stratégie canadienne en matière d'opportunités ne concerne pas seulement les besoins immédiats d'aujourd'hui.
La stratégie canadienne des possibilités n'est pas seulement soucieuse des besoins immédiats d'aujourd'hui.
La stratégie canadienne de promotion des chances ne s'intéresse pas seulement aux besoins immédiats d'aujourd'hui.
La stratégie canadienne en matière de débouchés n'est pas seulement axée sur les besoins immédiats d'aujourd'hui.
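Challenge II shows why scoring against a single reference is fragile: one source sentence admits several French translations, yet a reference-based metric rewards only the wording it is given. The sketch below is a simplified sentence-level BLEU with multiple references (clipped n-gram counts, closest-reference brevity penalty, add-one smoothing); it is illustrative and is not the metric or tooling used to score the outputs above.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, references, max_n=4):
    """Simplified sentence-level BLEU against one or more references.

    - hypothesis n-gram counts are clipped by their maximum count in any reference
    - precisions use add-one smoothing so short sentences do not score 0
    - the brevity penalty uses the reference length closest to the hypothesis
    """
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        max_ref = Counter()
        for ref in refs:
            max_ref |= ngram_counts(ref, n)  # element-wise max over references
        matches = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        log_precision += math.log((matches + 1) / (total + 1)) / max_n
    c = len(hyp)
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    if c == 0:
        bp = 0.0
    elif c > r:
        bp = 1.0
    else:
        bp = math.exp(1 - r / c)
    return bp * math.exp(log_precision)
```

Scored against the first French translation alone, the third one ("ne s'intéresse pas seulement…") gets less than 1.0; adding it as a second reference lets the same valid translation score 1.0.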
Evaluation of Translation Quality
“One-size-fits-all” translation cannot be good enough
A domain approach is a good starting point, but it needs to be refined for each use case with:
• Application
• Specific customer terminology
• Style preference
Specialization is key (and easy) for most business cases
Success and Limitations
Unique workflow implemented for internal training
More than 5000 models generated for 20 language pairs
Fully automated testing on a large panel of test sets
Full orchestration of computing resources
Simplify the specialization effort
Empower language experts
Too many languages and domains to cover!
Need to open the system to local language experts and data providers…
Toward SYSTRAN Marketplace
The SYSTRAN Marketplace is an open online platform where language experts bring their expertise to MT Industry Users.
Toward SYSTRAN Marketplace
Benefits for MT industry users:
State-of-the-art translation models through a competitive marketplace
Access to the full catalog for testing and finding the best translation models
Easy access to selected models through:
• Online APIs
• Offline models
Access to the largest unified network of trainers
Toward SYSTRAN Marketplace
Trainers get
• Latest technology
• Unlimited computing resources
• Curated language resources
• Specific requests from SYSTRAN users
• Model Catalog
Trainers provide and resell
• Domain models
• Their expertise
Toward SYSTRAN Marketplace
The SYSTRAN Marketplace is currently open by private invitation only
Further NMT Research areas
Faster training and inference
Better leverage of training data
Extending the concept of translation
Unsupervised methods for machine translation
SYSTRAN Network for MT democratization
• Trainers
• Content providers
• Process automation integrators
• Content management
• Services
• Hosting services, monitoring and security
• Language experts
• Technological bricks
• Research
Gaëlle Bou
Sales & Marketing Director, SYSTRAN
WHY IS TRANSLATION IMPORTANT?
Globalized Businesses & Workforces
20% speak English
Less than 5% as their first language
World population: 7.6B
World economy
Emerging markets mean emerging languages
Can’t read, won’t buy
Fewer than 10% of the population speak English in China, Colombia, Brazil, Africa, Middle East, Russia, Argentina, Chile…
Translation Connects the Global Economy
WHEN PROFESSIONAL MT IS OBVIOUS
High content volumes
Quick turnaround times
Globalized organization
Many languages
Pressure on costs
Specific domain
Compliance & security requirements
MACHINE TRANSLATION: FROM NICHE TO MAINSTREAM APPLICATION
First used by large corporations and government agencies
It has become an increasingly mainstream technology
Used by language service providers, mid-market enterprises, new industries…
Digital transformation
Customer engagement content
Multilingual, omni-channel challenge
“Translation volumes will increase by 67% over the next 3 years”
MT POWERED BY ARTIFICIAL INTELLIGENCE: WHAT'S NEW?
Training
• Generic quality improved
• Neural systems can do more with less, but clean, data
• New specialization techniques for large quality improvements
Integration capabilities
• MT is increasingly integrated with third-party technologies
• TMS, CMS, EMS, CRM, ticketing systems, digital workplace, mobile, chatbots…
[Chart: Specialization quality improvement (BLEU score), ES>FR, SYSTRAN Generic vs. SYSTRAN specialization; values shown: 36.32, 60.02, 34.76, 56.98]
BUILDING A GLOBALLY INTEGRATED ORGANIZATION
Augmented employees with multilingual capabilities
Integration of business process globalization into digital transformation
Marketing
Customer Care
Product development
Manufacturing
Procurement
Transport & Logistics
HR/ Finance/ Legal
Customer satisfaction
Revenue and market shares
Web traffic
Self-service support
Knowledge base usage
Productivity
Internal communications
Education/Training
MT OPPORTUNITIES / CHALLENGES
OPPORTUNITIES
• Reduced costs
• Faster turnaround time
• More content translated
• Increased throughput
• More languages
CHALLENGES
• Quality
• Technical complexity
• Formatting
• Qualified staff
• Data security
DEPLOY MT WITH THE RIGHT PARTNER
NOT ONLY A TECHNOLOGY PROVIDER BUT A PROJECT PARTNER
Evaluate your translation needs and your existing resources
Help to focus your efforts and set clear goals
Define the right scope and steps for your MT project
Support you in the implementation and change management
Optimize the benefits of MT in your organization year after year
SYSTRAN’s UNIQUE BUSINESS VALUES
Data Secured
Customer Centric
Pure Player
Technology Excellence
Rich User Experience
Tailored Engines
UNIVERSES OF SYSTRAN’S COMMUNITY
Content Production
Defense & Security
Collaboration
Customer Support
E-Discovery
Augmented by multilingual capabilities to understand, search, collaborate, translate, support
One Solution, many Use Cases that embrace all Translation Needs within an Organization
DEFENSE & SECURITY
During the Cold War, SYSTRAN cooperated with the US Air Force to create the first effective translation software from Russian to English
Real-time decisions based on a massive collection of various data (text, audio, video, images…)
Rare languages with little human expertise available
Speech transcription & machine translation integrated into the intelligence chain
Instant detection of crucial information in massive collections of multilingual data
E-DISCOVERY
International electronic investigation into big data: bulk translation & performance required!
Large volumes to be translated in a very short time
Confidentiality mandatory
Efficiency with a translation tool integrated within the eDiscovery suite
Translations never leave your network, preventing data leakage and ensuring compliance
CONTENT PRODUCTION
When translation teams need to deliver quality within smaller budgets and tighter deadlines
TMS connectors
CAT tools connectors
CMS connectors
EIM connectors
MT integrated in the translation production chain
Competitive advantage in a market where price pressure is high
Today a majority of LSPs have incorporated MT into a wide variety of production workflows
CSA Research finds that LSPs that embrace MT grow at 3.5 times the rate of those that shy away from it
CUSTOMER SUPPORT
Auto-translate knowledge bases, FAQs, and technical documentation
Enable call-center agents to support customers globally with multi-lingual chat
Automatically translate Incident chains for customer support agents
Reduce Your Calls, Reduce Your Cost
Delight your international users
Deliver first-rate multilingual service to international customers and achieve top satisfaction scores
COLLABORATION
Augment your employees with multilingual capabilities and real time understanding of any content in any language
Secured Translation tool from the Intranet Portal for all employees
MS Office suite plug-in to translate Word, PowerPoint, and Excel documents instantly without leaving the app
Skype for Business add-on: each employee chats in their native language
Customer data
Sensitive data
Confidential emails
Trade secrets
Understand any content in any language in real-time, autonomously and securely
Yannick Douzant
Director Products & Technology
SYSTRAN Global Product Team
Paris
• Software products & core translation engine
• 100+ language pairs
• NLP research
San Diego
• Focus on "historic" language pairs (Russian, Middle East, Chinese,…)
• Connectors to third-party products
Seoul
• Focus on Korean-centric language pairs
• Specific projects (patents)
User Tools • Connectors • API
YOUR TRANSLATION PROFILES
Custom Resources
• SYSTRAN dictionaries
• Client dictionaries
• Industry dictionaries
• Translation memories
MT Engines (140 language pairs)
New user interface
Extended integration capabilities
New generation of NMT engines
New training process
New User Interface
What’s new?
New look & feel and updated technical framework
New navigation:
• Better space utilization
• 2 clicks away from all menus
• Menus and features reorganized
User experience optimizations:
• Text box, lists, profile deployment, etc.
Text Translation
Optimized use of screen space
More space for user actions and readability
For instance, on a classic 17-inch screen (resolution 1280 × 1024), the Translate box is 25% larger
PN9 new server interface - List view generalization
What’s new?
• Homogenization and simplification of all items management screens
• Better Search & Overview
• Enabling both unitary & bulk actions (more to come)
Simpler Profile Management & Deployment
Before:
• First: create & edit a ‘Profile’
• Then create an ‘Active profile’, select the above Profile, and activate it
• Need to repeat and redeploy for any update
Now:
• Just create a profile: it can then be activated / deactivated / updated
Extended integrations
Translating Audio content – Connectors to Speech to Text Servers
TRANSCRIPTION → TRANSLATION
New! Connector to Nuance Transcription Engine
• Support for over 14 languages / 28 dialects
• 3 operating modes
Speech Adapted Post Editor
1. Audio player
2. Synchronized highlight in the Translator Editor as the audio plays
Speech Adapted Post Editor
Review source & update translation
Translating Non-Text content – OCR Integrations
OCR → TRANSLATION
SPNS 9 embeds new IRIS IDRS SDK (Included by default)
IRIS IDRS SDK (15.4): High Quality OCR for EN, AR and FA engines
Coming soon:
• HQ-OCR for FR / NL / DE
• Language detection for PDF files
New plugin available for ABBYY FineReader Engine
Possibility to use ABBYY as the OCR Engine in SYSTRAN Server
NB: Requires separate runtime license for ABBYY
Image File Translation
Text can now be extracted directly from image files
Formats: .bmp .jpg .jpeg .png .tif .docx
Integrate Translation Everywhere
• SYSTRAN REST API
• Swagger / OpenAPI compliant
• Translate & create feedback
• Corpus & dictionaries
• Pre-built plugins & connectors
• By SYSTRAN
• By partners
• In-house & ad-hoc integrations
Skype for Business plug-in
Microsoft Edge plug-in
Integration Ecosystem
Office suites
CAT & translation management systems
Content management systems & marketing
Media monitoring and mining
e-Discovery
OSINT
…
API / Connectors – What’s next
Recent & Current Work:
• (big) File Translation robustness
• New MS Office Integration
Next:
• Expose Deployment APIs
• More filters (Markdown, JSON specifier, …)
• More integrations
  • Investigate Microsoft Teams & Slack SDKs
• Continuously expanding partner network for more integrations
Second Generation of NMT Engines
NMT First-Generation Engines
[Chart: speed on CPU (characters/second) for RB, SMT, and NMT v8 engines]
Better quality but slower than previous technologies…
NMT models were running through the historic core engine (RBMT & SMT)
Algorithms are evolving (RNN > Transformer > ?)
➡ Need for a new core engine!
PN9 Engines – Purely Designed for NMT
PN9 Core Engine – Optimized for Production
[Diagram: PN9 pipeline around OpenNMT-tf: SAMPLER, data sampling, M-preprocess, B-preprocess, tokenization, TRAIN, TRANSLATE, REST API, detokenization, postprocess]
Core is OpenNMT-tf
• TensorFlow implementation of OpenNMT
Enhanced with SYSTRAN modules
• Monolingual & bilingual preprocessing
• Tokenization
• Post-processing
With a streamlined mode for translation and integration within SYSTRAN Pure Neural Server
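Tokenization and detokenization must be reversible so that post-processing can rebuild the original spacing. The sketch below splits punctuation from words and prefixes every piece that was glued to its left neighbour with a joiner mark. The `￭` symbol follows the OpenNMT Tokenizer convention, but the placement rule and the `tokenize`/`detokenize` helpers are simplified assumptions, not the SYSTRAN module.

```python
import re

JOINER = "￭"  # joiner mark, as used by the OpenNMT Tokenizer


def tokenize(text):
    """Split words and punctuation, marking pieces attached to the left.

    Simplified reversible tokenization: within each whitespace-delimited
    word, every piece after the first gets a JOINER prefix.
    """
    tokens = []
    for word in text.split():
        pieces = re.findall(r"\w+|[^\w\s]", word)
        tokens.append(pieces[0])
        tokens.extend(JOINER + p for p in pieces[1:])
    return tokens


def detokenize(tokens):
    """Rebuild the text: joined pieces are glued, others get a space."""
    out = ""
    for tok in tokens:
        if tok.startswith(JOINER):
            out += tok[len(JOINER):]
        else:
            out += (" " if out else "") + tok
    return out
```

Because the joiner travels with the token, the model can emit it too, and the detokenizer restores correct spacing around punctuation it generates.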
PN9 - Translation Quality

EN > ES   Legal1  Legal2  Legal3  News1  News2  Dialog1  Dialog2
Google    42.08   47.18   50.65   42.99  42.46  36.35    33.56
SYSTRAN   43.43   50.90   49.06   45.41  45.54  36.28    38.90

EN > FR   Legal1  Legal2  Legal3  News1  News2  Dialog1  Dialog2
Google    30.32   27.37   44.83   34.00  36.04  49.60    27.99
SYSTRAN   32.43   37.41   43.37   35.73  35.16  52.53    30.43

AR > FR   IT2     Dialog1  Legal1  Legal2  Dialog2  News   Legal3
Google    33.18   15.85    31.05   35.70   22.71    22.96  44.72
SYSTRAN   23.33   20.89    37.91   39.61   22.77    24.89  56.63

Better quality and fluency for generic engines
Outperforms all previous generations of SYSTRAN engines
Comparable to or better than generic competition (a constant objective)
Starting point for any further specialization
PN9 – Better Performance
• Dedicated core engine for the translation step (“inference”)
• Removes abstractions and calculations needed only for the training step
• Uses Intel MKL libraries for computation
• Model quantization to reduce computation: from 32-bit floats to 16-bit integers
• Internal parallelization / batching
15 to 30 times faster than first-generation NMT on CPU
Reduced memory footprint
[Chart: CPU speed (characters/second) for NMT v8, NMT v9 (CPU AVX, 4 cores), and NMT v9 (CPU AVX2, 4 cores)]
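Quantizing from 32-bit floats to 16-bit integers replaces most floating-point multiplications with integer ones and halves weight storage. The pure-Python sketch below assumes symmetric per-tensor quantization (one scale per weight list); production engines work on vectorized matrices with libraries such as Intel MKL and may choose scales differently.

```python
INT16_MAX = 32767


def quantize_int16(weights):
    """Symmetric per-tensor quantization of float weights to int16 values.

    One scale per list (an assumption): the largest |w| maps to INT16_MAX.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / INT16_MAX if max_abs > 0 else 1.0
    q = [round(w / scale) for w in weights]  # ints in [-32767, 32767]
    return q, scale


def dequantize(q, scale):
    """Map int16 values back to approximate floats."""
    return [v * scale for v in q]


def int16_dot(qa, scale_a, qb, scale_b):
    """Dot product on integers, rescaled to float once at the end."""
    # Python ints never overflow; a real kernel would accumulate the
    # int16 products in 32-bit registers.
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * scale_a * scale_b
```

Each quantized weight is within half a quantization step (scale / 2) of the original, which is why the integer dot product stays close to the float one.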
Coming Soon / Current Work
Imminent release (under QA): PN9 engines for 20+ language pairs
Better quality, faster, lower prerequisites
Next: long-tail catalog including 140+ generic engines and domain engines
User dictionaries in NMT engines
GPU: inference mode and optimizations – 5 to 10x faster
PN9 Engines – New Training Process
SYSTRAN added value in a very competitive environment
Criterion: SYSTRAN / Competition
Training algorithm: =
Training data: + (customer data, rare languages)
Training infrastructure: + / ++++
Training skills: +++ / ++
Focus on MT: +++ / +
Product: +++ / +
Unique features (UD, formatting): +++
Service offer & specialization: ++++
PN9 Core Engine – Training Features
• Data sampling
  • Data selection
• Monolingual preprocessing
  • Normalization
  • Entity recognition
  • Localization
• Bilingual preprocessing
  • Noising
  • Tag injection (UD, placeholders, NFW)
  • Divergence filtering
• Tokenization
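A very simple proxy for divergence filtering is a length check: pairs that are empty, overly long, or whose source/target length ratio is implausible are likely not translations of each other. The thresholds and the `length_ratio_filter` name are hypothetical, and this heuristic is a deliberately crude stand-in that catches only the most obvious divergences, not the filtering used in PN9.

```python
def length_ratio_filter(pairs, max_ratio=2.0, max_len=200):
    """Keep (source, target) pairs whose token lengths look compatible.

    Pairs that are empty, longer than max_len tokens on either side, or
    whose length ratio exceeds max_ratio are treated as likely divergent.
    """
    kept = []
    for src, tgt in pairs:
        ls, lt = len(src.split()), len(tgt.split())
        if not ls or not lt or ls > max_len or lt > max_len:
            continue
        if max(ls, lt) / min(ls, lt) > max_ratio:
            continue
        kept.append((src, tgt))
    return kept
```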
PN9 Infinite Training process
• 50+ test sets
• Each iteration of the training is stored and evaluated
• Thanks to sampling, every single iteration uses different data
• Possibility to specialize models (specialized “branches”)
[Diagram: Infinite Training Platform with corpus repository, model catalog, score DB, and GPU pools; steps: select corpus, pull training image, select pool and run training container, push model, evaluate, select and release]
New paradigm for SYSTRAN 9 Engines
A new engine
• Built on OpenNMT, but with unique proprietary features
• Complete rethinking of core engine design: code is simpler and easier to manage
A new workflow
• New simplified training environment
• Infinite Training – our models won’t stop evolving
• Data curation and continuous improvement
• A new logic – “don’t fix, teach how to fix”
Beyond open source
• Open to partner contributions
More options for customers
• Simplified workflow: get back to a focus on the corpus
• First building blocks for next steps: model marketplace? DIY training?
Research: From Lab to Products
(Some of) the Team’s 2017 Papers
OpenNMT: Open-Source Toolkit for Neural Machine Translation. Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, Alexander Rush. Proceedings of ACL 2017, System Demonstrations, pages 67-72. Association for Computational Linguistics, 2017, Vancouver, Canada.
SYSTRAN Purely Neural MT Engines for WMT2017. Yongchao Deng, Jungi Kim, Guillaume Klein, Catherine Kobus, Natalia Segal, Christophe Servan, Bo Wang, Dakun Zhang, Josep Crego, Jean Senellart. Proceedings of the Second Conference on Machine Translation, pages 265-270. Association for Computational Linguistics, 2017, Copenhagen, Denmark.
Boosting Neural Machine Translation. Dakun Zhang, Jungi Kim, Josep Crego, Jean Senellart. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Asian Federation of Natural Language Processing, 2017, Taipei, Taiwan.
Adaptation incrémentale de modèles de traduction neuronaux (Incremental adaptation of neural translation models). Christophe Servan, Josep Crego, Jean Senellart. 24th Conférence sur le Traitement Automatique des Langues Naturelles (TALN), Proceedings of TALN 2017, volume 2: short papers, pages 218-225. Orléans, France, June 26-30, 2017.
From paper to product: OpenNMT; first-generation NMT engines; retrain concept for Infinite Training
(Some of) the Team’s Recent Papers
System Description for WNMT 2018: 800 words/sec on a single-core CPU. Jean Senellart, Dakun Zhang, Bo Wang, Guillaume Klein, J.P. Ramatchandirin, Josep Crego, Alexander M. Rush. Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 122-128. Association for Computational Linguistics, 2018, Melbourne, Australia, July 20.
Neural Network Architectures for Arabic Dialect Identification. Elise Michon, Minh Quang Pham, Josep Crego, Jean Senellart. Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 128-136. Association for Computational Linguistics, 2018, New Mexico, USA, August 20.
Analyzing Knowledge Distillation in NMT. Dakun Zhang, Josep Crego, Jean Senellart. 15th International Workshop on Spoken Language Translation, October 29-30, 2018, Bruges, Belgium.
Fixing Translation Divergences in Parallel Corpora for NMT. Minh Quang Pham, Josep Crego, Jean Senellart, François Yvon. 2018 Conference on Empirical Methods in Natural Language Processing, October 31 - November 4, 2018, Brussels, Belgium.
From paper to product: CTranslate & inference optimizations; corpus cleaning / Sampler; Arabizi script & Arabic dialect management; new optimizations for PN9 engines
Research Projects – Co-funded Activities
PARADE
Plateforme d’Analyse et de Recherche en Arabe et ses DialectEs (analysis and research platform for Arabic and its dialects)
SIMPLES
SIMPLIFICATION DES LANGUES ÉCRITES (simplification of written languages)
ROSETTA
RObot de Sous-titrage Et Toute Traduction Adaptés (robot for subtitling and adapted translation of all kinds)
ANITA
Advanced tools for fighting oNline Illegal TrAfficking
Whassat?
Automatic Multilingual Product Description in Natural Language