Combining Crowd and AI to scale professional-quality ...External NLP Services Spell Check Syntax...
Transcript of Combining Crowd and AI to scale professional-quality ...External NLP Services Spell Check Syntax...
Combining Crowd and AI to scale professional-quality
translation
João GraçaCTO
80% English
The Internet, 1997
5% Portuguese
8% Spanish
20% Chinese
4% German
6% Japanese
3% Arabic
30% English
The Internet, 2017
“All translation firms together are able to translate far less than 1% of relevant content
produced everyday”
CSA – MT Is Unavoidable to Keep Up with Content Volumes
Is Machine Translation already here?
* Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang
Everyone agrees that NMT is here to stay and much better than SMT
Bleu
Moses(generic)
Neural MT(generic)
Moses(adapted)
Neural MT(adapted)
49,943,5
35,729,6
Unbabel Experiments with Customer Service Tickets
Is NMT really enough?
* A Neural Network for Machine Translation, at Production Scale. Google Blog
0%
20%
40%
60%
80%
100%
0 6 12 18 24 30
Good Not sure Bad
MQM
Job
Quality per Job
Will AI solve translation?
MACHINE ONLYTIME
JOBS
MQM 95
QUALITY
Will AI solve translation?H
UM
AN
EFF
OR
T
TIME
JOBS
MQM 95
0%
20%
40%
60%
80%
100%
0 6 12 18 24 30
Good Not sure Bad
MQM
Job
Quality per Job
Unbabel Pipeline
Job
Unbabel Pipeline
JobMachine
Translation
Unbabel Pipeline
JobMachine
Translation
Unbabel Pipeline
JobMachine
Translation
Unbabel Pipeline
Q.E.
QualityEstimation
JobMachine
Translation
Customer
High Q.E.
Unbabel Pipeline
Q.E.
QualityEstimation
JobMachine
Translation
Customer
High Q.E.
Unbabel Pipeline
Q.E.
QualityEstimation
Low Q.E.
Re-Eval CommunityTranslators
JobMachine
Translation
Customer
Q.E.
QualityEstimation Community
Translators
Customer
Q.E.
QualityEstimation
Data Generation Engine
Initial text
Submitted text
After
Initial text
Submitted text
Before
NODATA POINTS
All changes in between:Mouse clicksKey pressesTimestamps
Data Generation Engine
Raw data Processed information
At 18:03:30:In nugget 3mouseClick Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 15Selected: 0 At 18:03:31:In nugget 3Pressed Backspace
At 18:03:35:In nugget 3Pressed Shift Cursor at 25Selected: 0 At 18:03:35:In nugget 3Pressed s Cursor at 25Selected: 0 At 18:03:35:In nugget 3Pressed i Cursor at 26Selected: 0 At 18:03:35:In nugget 3Pressed e
At 18:03:30:In nugget 3mouseClick Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 15Selected: 0 At 18:03:31:In nugget 3Pressed Backspace
Initial text
“Espero que esto es útil”
• Deleted word “es”• Inserted word “sea”Submitted text
“Espero que esto sea útil”
Keystroke Analysis
JobMachine
Translation
Customer
High Q.E.
Q.E.
QualityEstimation
Low Q.E.
Re-Eval Crowd
MACHINE QE COMMUNITY
Unbabel Pipeline
TranslationMemory
Job ResultMT Router
Customer MT Customer APE
Machine Translation Pipeline
Phrase-based MT Neural MT
Machine Translation Models
Customer Support Tickets
30
37,5
45
52,5
60
Google NMT
NMT
1K 5K 10K
20k
25K
51,748,8
47,246,9
43,2
34,1
43,1
Bleu
Customer Adaptation
Word-Level QE Which words are translated
correctly/incorrectly?
Sentence-Level QE How good is the entire
translation?
Quality Estimation
Word-level QE example
Hey lá , eu sou pesaroso sobre aquele !OK OK OK OKBA
DBAD
BAD
BAD
BAD
Quality Estimation
Unbabel Ticket
Bad translation
Good translation
source MT final
QE Training
QE Word Level
Ugent System
LinearQE
NeuralQE
StackedQE
APE-QE
FullStackedQE0 15 30 45 60
F1_MULT
QE Word Level
Ugent System
LinearQE
NeuralQE
StackedQE
APE-QE
FullStackedQE0 15 30 45 60
F1_MULT41,1
QE Word Level
Ugent System
LinearQE
NeuralQE
StackedQE
APE-QE
FullStackedQE0 15 30 45 60
F1_MULT
45,2
41,1
QE Word Level
Ugent System
LinearQE
NeuralQE
StackedQE
APE-QE
FullStackedQE0 15 30 45 60
F1_MULT
47,3
45,2
41,1
QE Word Level
Ugent System
LinearQE
NeuralQE
StackedQE
APE-QE
FullStackedQE0 15 30 45 60
F1_MULT
50,3
47,3
45,2
41,1
WMT 16WL QE Winner
QE Word Level
Ugent System
LinearQE
NeuralQE
StackedQE
APE-QE
FullStackedQE0 15 30 45 60
F1_MULT
55,7
50,3
47,3
45,2
41,1
WMT 16WL QE Winner
QE Word Level
Ugent System
LinearQE
NeuralQE
StackedQE
APE-QE
FullStackedQE0 15 30 45 60
F1_MULT
57,5
55,7
50,3
47,3
45,2
41,1
Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz.“Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
WMT 16WL QE Winner
QE Word Level
QE Sentence Level
YANDEX
StackedQE
APE-QE
FullStackedQE
Pearson Correlation0 17,5 35 52,5 70
QE Sentence Level
YANDEX
StackedQE
APE-QE
FullStackedQE
Pearson Correlation0 17,5 35 52,5 70
52,5
QE Sentence Level
YANDEX
StackedQE
APE-QE
FullStackedQE
Pearson Correlation0 17,5 35 52,5 70
54,9
52,5
QE Sentence Level
YANDEX
StackedQE
APE-QE
FullStackedQE
Pearson Correlation0 17,5 35 52,5 70
61,3
54,9
52,5
QE Sentence Level
YANDEX
StackedQE
APE-QE
FullStackedQE
Pearson Correlation0 17,5 35 52,5 70
65,6
61,3
54,9
52,5
QE Sentence Level
Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz.“Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)
QE in the Pipeline
Job MachineTranslation
Customer
High Q.E.
Q.E.
QualityEstimation
QE in the Pipeline
Job MachineTranslation
Customer
High Q.E.
Q.E.
QualityEstimation
Document-Level QE how good is the entire document?
QE in the Pipeline
Job MachineTranslation
Customer
High Q.E.
Q.E.
QualityEstimation
Low Q.E.
Re-EvalCommunityTranslators
Document-Level QE how good is the entire document?
QE in the Pipeline
Job MachineTranslation
Customer
High Q.E.
Q.E.
QualityEstimation
Low Q.E.
Re-EvalCommunityTranslators
Document-Level QE how good is the entire document?
Human QE Can we evaluate post-edit output?
Interesting numbers
coming soon
QE in the Pipeline
Goals
Quality
Speed
Cost
Professional Translation
Goals
Quality
Speed
Cost
Pillars
Professional Translation
Goals
Quality
Speed
Cost
Pillars
Professional Translation
Goals
Quality
Speed
Cost
Pillars
• Editors Pool
Professional Translation
Goals
Quality
Speed
Cost
Pillars
• Editors Pool• Initial Text (MT)
Professional Translation
Goals
Quality
Speed
Cost
Pillars
• Editors Pool• Initial Text (MT)• Editor Assignment
Professional Translation
Goals
Quality
Speed
Cost
Pillars
• Editors Pool• Initial Text (MT)• Editor Assignment• Interfaces
Professional Translation
Goals
Quality
Speed
Cost
Pillars
• Editors Pool• Initial Text (MT)• Editor Assignment• Interfaces• Quality Evaluation
Professional Translation
Unbabel Community
50 000 Users
Unbabel Community
Quality Estimation
Distributed Pipeline
Expert Specialisation layers will grow with time
Testing Phase Editors are tested when they sign up
Training Content Editors get ratings for the tasks
Paid Content The best editors have access to paid content
Editors Pool
Expert Specialisation layers will grow with time
Testing Phase Editors are tested when they sign up
Training Content Editors get ratings for the tasks
Paid Content The best editors have access to paid content
Evaluators
Editors Pool
Expert Specialisation layers will grow with time
Testing Phase Editors are tested when they sign up
Training Content Editors get ratings for the tasks
Paid Content The best editors have access to paid content
Evaluators
Annotators
Editors Pool
How Editors are Evaluated
Editors Profiling
Editors Profiling
Editor Assignment
Tasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Editor Assignment
EditorsTasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Editor Assignment
Pull
EditorsTasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Editor Assignment
Pull30 m
6 H
2 D
6 D
20 m
40 m
SLA EditorsTasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Editor Assignment
Pull30 m
6 H
2 D
6 D
20 m
40 m
SLA
1100
1000
1000
1000
1100
1100
Priority EditorsTasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Editor Assignment
Pull30 m
6 H
2 D
6 D
20 m
40 m
SLA
1100
1000
1000
1000
1100
1100
Priority EditorsQueue
G
G
G
G
R
R
Tasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Editor Assignment
Pull30 m
6 H
2 D
6 D
20 m
40 m
SLA
1100
1000
1000
1000
1100
1100
Priority Editors Rating
4.2
3.8
4.3
4.8
Queue
G
G
G
G
R
R
Tasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Editor Assignment
Pull30 m
6 H
2 D
6 D
20 m
40 m
SLA
1100
1000
1000
1000
1100
1100
Priority Editors Rating
4.2
3.8
4.3
4.8
NativeQueue
G
G
G
G
R
R
Tasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Editor Assignment
Pull30 m
6 H
2 D
6 D
20 m
40 m
SLA
1100
1000
1000
1000
1100
1100
Priority Editors Rating
4.2
3.8
4.3
4.8
NativeQueue
G
G
G
G
R
R
Tasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Topics
Editor Assignment
Pull30 m
6 H
2 D
6 D
20 m
40 m
SLA
1100
1000
1000
1000
1100
1100
Priority Editors Rating
4.2
3.8
4.3
4.8
Native TopicsQueue
G
G
G
G
R
R
Tasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Topics
Editor Assignment
Editor Assignment
Regular distribution
3.8
old rating
Editor Assignment
Regular distribution
3.8
old rating
Smart distribution
4.6
Improved rating
Editor Assignment
Post-Editing Interfaces
Time Spent on Job
TIME
DELIVERYMT
Time Spent on Job
WAITING
Translator 1
TIME
DELIVERYMT
Time Spent on Job
WAITING
Translator 1
TIME
DELIVERYMT
Translator 2
WAITING
Time Spent on Job: Mobile
WAITING
Translator 1
TIME
DELIVERYMT
Translator 2
WAITING
20% less
Time Spent on Job
Pos-Editing Interfaces
Smartcheck
Spelling
Tone
Formality
Consistency
External NLP Services
Spell Check
Syntax Parser
Word AlignerClient Rule
Smartcheck
Spelling
Tone
Formality
Consistency
External NLP Services
Spell Check
Syntax Parser
Word AlignerClient Rule
Smartcheck
Spelling
Tone
Formality
Consistency
External NLP Services
Spell Check
Syntax Parser
Word AlignerClient Rule
Smartcheck
Annotation Tool
Eval
Annotated
Spelling
Tone
Formality
Consistency
External NLP Services
Spell Check
Syntax Parser
Word AlignerClient Rule
Smartcheck
Annotation Tool
Eval
Annotated Learn
M A C H I N E + H U M A N
CORE API
CHAT API
CYRANO API
Customer Service Conversational
MESSAGING API
Language OS
Language Engine
Bots
Tickets can come in many languages.
In Customer Service
Unbabel for Zendesk
Unbabel’s Domain Adapted
Machine Translation
Distributed Human Translation
English-speakingagent
Zendesk appconnected with Unbabel
API
Unbabel can offer the same Customer Satisfaction as native agents
SPEED QUALITY
9420minutes
Customer Replies: Speed & Quality
Chat
Editors train the
MT engineCommunity of 50K+ translatorsSkips Humans
Chat Translation Flow
Chat Messages: Speed & Quality
SPEED QUALITY
902Minutes
What is the future?H
UM
AN
EFF
OR
T
TIME
JOBS
MQM 95
0%?