Combining Crowd and AI to scale professional-quality ...External NLP Services Spell Check Syntax...

Post on 06-Aug-2020

0 views 0 download

Transcript of Combining Crowd and AI to scale professional-quality ...External NLP Services Spell Check Syntax...

Combining Crowd and AI to scale professional-quality

translation

João GraçaCTO

80% English

The Internet, 1997

5% Portuguese

8% Spanish

20% Chinese

4% German

6% Japanese

3% Arabic

30% English

The Internet, 2017

“All translation firms together are able to translate far less than 1% of relevant content

produced everyday”

CSA – MT Is Unavoidable to Keep Up with Content Volumes

Is Machine Translation already here?

* Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang

Everyone agrees that NMT is here to stay and much better than SMT

Bleu

Moses(generic)

Neural MT(generic)

Moses(adapted)

Neural MT(adapted)

49,943,5

35,729,6

Unbabel Experiments with Customer Service Tickets

Is NMT really enough?

* A Neural Network for Machine Translation, at Production Scale. Google Blog

0%

20%

40%

60%

80%

100%

0 6 12 18 24 30

Good Not sure Bad

MQM

Job

Quality per Job

Will AI solve translation?

MACHINE ONLYTIME

JOBS

MQM 95

QUALITY

Will AI solve translation?H

UM

AN

EFF

OR

T

TIME

JOBS

MQM 95

0%

20%

40%

60%

80%

100%

0 6 12 18 24 30

Good Not sure Bad

MQM

Job

Quality per Job

Unbabel Pipeline

Job

Unbabel Pipeline

JobMachine

Translation

Unbabel Pipeline

JobMachine

Translation

Unbabel Pipeline

JobMachine

Translation

Unbabel Pipeline

Q.E.

QualityEstimation

JobMachine

Translation

Customer

High Q.E.

Unbabel Pipeline

Q.E.

QualityEstimation

JobMachine

Translation

Customer

High Q.E.

Unbabel Pipeline

Q.E.

QualityEstimation

Low Q.E.

Re-Eval CommunityTranslators

JobMachine

Translation

Customer

Q.E.

QualityEstimation Community

Translators

Customer

Q.E.

QualityEstimation

Data Generation Engine

Initial text

Submitted text

After

Initial text

Submitted text

Before

NODATA POINTS

All changes in between:Mouse clicksKey pressesTimestamps

Data Generation Engine

Raw data Processed information

At 18:03:30:In nugget 3mouseClick Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 15Selected: 0 At 18:03:31:In nugget 3Pressed Backspace

At 18:03:35:In nugget 3Pressed Shift Cursor at 25Selected: 0 At 18:03:35:In nugget 3Pressed s Cursor at 25Selected: 0 At 18:03:35:In nugget 3Pressed i Cursor at 26Selected: 0 At 18:03:35:In nugget 3Pressed e

At 18:03:30:In nugget 3mouseClick Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 15Selected: 0 At 18:03:31:In nugget 3Pressed Backspace

Initial text

“Espero que esto es útil”

• Deleted word “es”• Inserted word “sea”Submitted text

“Espero que esto sea útil”

Keystroke Analysis

JobMachine

Translation

Customer

High Q.E.

Q.E.

QualityEstimation

Low Q.E.

Re-Eval Crowd

MACHINE QE COMMUNITY

Unbabel Pipeline

TranslationMemory

Job ResultMT Router

Customer MT Customer APE

Machine Translation Pipeline

Phrase-based MT Neural MT

Machine Translation Models

Customer Support Tickets

30

37,5

45

52,5

60

Google NMT

NMT

1K 5K 10K

20k

25K

51,748,8

47,246,9

43,2

34,1

43,1

Bleu

Customer Adaptation

Word-Level QE Which words are translated

correctly/incorrectly?

Sentence-Level QE How good is the entire

translation?

Quality Estimation

Word-level QE example

Hey lá , eu sou pesaroso sobre aquele !OK OK OK OKBA

DBAD

BAD

BAD

BAD

Quality Estimation

Unbabel Ticket

Bad translation

Good translation

source MT final

QE Training

QE Word Level

Ugent System

LinearQE

NeuralQE

StackedQE

APE-QE

FullStackedQE0 15 30 45 60

F1_MULT

QE Word Level

Ugent System

LinearQE

NeuralQE

StackedQE

APE-QE

FullStackedQE0 15 30 45 60

F1_MULT41,1

QE Word Level

Ugent System

LinearQE

NeuralQE

StackedQE

APE-QE

FullStackedQE0 15 30 45 60

F1_MULT

45,2

41,1

QE Word Level

Ugent System

LinearQE

NeuralQE

StackedQE

APE-QE

FullStackedQE0 15 30 45 60

F1_MULT

47,3

45,2

41,1

QE Word Level

Ugent System

LinearQE

NeuralQE

StackedQE

APE-QE

FullStackedQE0 15 30 45 60

F1_MULT

50,3

47,3

45,2

41,1

WMT 16WL QE Winner

QE Word Level

Ugent System

LinearQE

NeuralQE

StackedQE

APE-QE

FullStackedQE0 15 30 45 60

F1_MULT

55,7

50,3

47,3

45,2

41,1

WMT 16WL QE Winner

QE Word Level

Ugent System

LinearQE

NeuralQE

StackedQE

APE-QE

FullStackedQE0 15 30 45 60

F1_MULT

57,5

55,7

50,3

47,3

45,2

41,1

Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz.“Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)

WMT 16WL QE Winner

QE Word Level

QE Sentence Level

YANDEX

StackedQE

APE-QE

FullStackedQE

Pearson Correlation0 17,5 35 52,5 70

QE Sentence Level

YANDEX

StackedQE

APE-QE

FullStackedQE

Pearson Correlation0 17,5 35 52,5 70

52,5

QE Sentence Level

YANDEX

StackedQE

APE-QE

FullStackedQE

Pearson Correlation0 17,5 35 52,5 70

54,9

52,5

QE Sentence Level

YANDEX

StackedQE

APE-QE

FullStackedQE

Pearson Correlation0 17,5 35 52,5 70

61,3

54,9

52,5

QE Sentence Level

YANDEX

StackedQE

APE-QE

FullStackedQE

Pearson Correlation0 17,5 35 52,5 70

65,6

61,3

54,9

52,5

QE Sentence Level

Andre F. T. Martins, Marcin Junczys-Dowmunt, Fabio Kepler, Ramon Astudillo, Chris Hokamp, Roman Grundkiewicz.“Pushing the Limits of Translation Quality Estimation.” TACL 2017 (To Appear)

QE in the Pipeline

Job MachineTranslation

Customer

High Q.E.

Q.E.

QualityEstimation

QE in the Pipeline

Job MachineTranslation

Customer

High Q.E.

Q.E.

QualityEstimation

Document-Level QE how good is the entire document?

QE in the Pipeline

Job MachineTranslation

Customer

High Q.E.

Q.E.

QualityEstimation

Low Q.E.

Re-EvalCommunityTranslators

Document-Level QE how good is the entire document?

QE in the Pipeline

Job MachineTranslation

Customer

High Q.E.

Q.E.

QualityEstimation

Low Q.E.

Re-EvalCommunityTranslators

Document-Level QE how good is the entire document?

Human QE Can we evaluate post-edit output?

Interesting numbers

coming soon

QE in the Pipeline

Goals

Quality

Speed

Cost

Professional Translation

Goals

Quality

Speed

Cost

Pillars

Professional Translation

Goals

Quality

Speed

Cost

Pillars

Professional Translation

Goals

Quality

Speed

Cost

Pillars

• Editors Pool

Professional Translation

Goals

Quality

Speed

Cost

Pillars

• Editors Pool• Initial Text (MT)

Professional Translation

Goals

Quality

Speed

Cost

Pillars

• Editors Pool• Initial Text (MT)• Editor Assignment

Professional Translation

Goals

Quality

Speed

Cost

Pillars

• Editors Pool• Initial Text (MT)• Editor Assignment• Interfaces

Professional Translation

Goals

Quality

Speed

Cost

Pillars

• Editors Pool• Initial Text (MT)• Editor Assignment• Interfaces• Quality Evaluation

Professional Translation

Unbabel Community

50 000 Users

Unbabel Community

Quality Estimation

Distributed Pipeline

Expert Specialisation layers will grow with time

Testing Phase Editors are tested when they sign up

Training Content Editors get ratings for the tasks

Paid Content The best editors have access to paid content

Editors Pool

Expert Specialisation layers will grow with time

Testing Phase Editors are tested when they sign up

Training Content Editors get ratings for the tasks

Paid Content The best editors have access to paid content

Evaluators

Editors Pool

Expert Specialisation layers will grow with time

Testing Phase Editors are tested when they sign up

Training Content Editors get ratings for the tasks

Paid Content The best editors have access to paid content

Evaluators

Annotators

Editors Pool

How Editors are Evaluated

Editors Profiling

Editors Profiling

Editor Assignment

Tasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Editor Assignment

EditorsTasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Editor Assignment

Pull

EditorsTasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Editor Assignment

Pull30 m

6 H

2 D

6 D

20 m

40 m

SLA EditorsTasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Editor Assignment

Pull30 m

6 H

2 D

6 D

20 m

40 m

SLA

1100

1000

1000

1000

1100

1100

Priority EditorsTasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Editor Assignment

Pull30 m

6 H

2 D

6 D

20 m

40 m

SLA

1100

1000

1000

1000

1100

1100

Priority EditorsQueue

G

G

G

G

R

R

Tasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Editor Assignment

Pull30 m

6 H

2 D

6 D

20 m

40 m

SLA

1100

1000

1000

1000

1100

1100

Priority Editors Rating

4.2

3.8

4.3

4.8

Queue

G

G

G

G

R

R

Tasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Editor Assignment

Pull30 m

6 H

2 D

6 D

20 m

40 m

SLA

1100

1000

1000

1000

1100

1100

Priority Editors Rating

4.2

3.8

4.3

4.8

NativeQueue

G

G

G

G

R

R

Tasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Editor Assignment

Pull30 m

6 H

2 D

6 D

20 m

40 m

SLA

1100

1000

1000

1000

1100

1100

Priority Editors Rating

4.2

3.8

4.3

4.8

NativeQueue

G

G

G

G

R

R

Tasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Topics

Editor Assignment

Pull30 m

6 H

2 D

6 D

20 m

40 m

SLA

1100

1000

1000

1000

1100

1100

Priority Editors Rating

4.2

3.8

4.3

4.8

Native TopicsQueue

G

G

G

G

R

R

Tasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Topics

Editor Assignment

Editor Assignment

Regular distribution

3.8

old rating

Editor Assignment

Regular distribution

3.8

old rating

Smart distribution

4.6

Improved rating

Editor Assignment

Post-Editing Interfaces

Time Spent on Job

TIME

DELIVERYMT

Time Spent on Job

WAITING

Translator 1

TIME

DELIVERYMT

Time Spent on Job

WAITING

Translator 1

TIME

DELIVERYMT

Translator 2

WAITING

Time Spent on Job: Mobile

WAITING

Translator 1

TIME

DELIVERYMT

Translator 2

WAITING

20% less

Time Spent on Job

Pos-Editing Interfaces

Smartcheck

Spelling

Tone

Formality

Consistency

External NLP Services

Spell Check

Syntax Parser

Word AlignerClient Rule

Smartcheck

Spelling

Tone

Formality

Consistency

External NLP Services

Spell Check

Syntax Parser

Word AlignerClient Rule

Smartcheck

Spelling

Tone

Formality

Consistency

External NLP Services

Spell Check

Syntax Parser

Word AlignerClient Rule

Smartcheck

Annotation Tool

Eval

Annotated

Spelling

Tone

Formality

Consistency

External NLP Services

Spell Check

Syntax Parser

Word AlignerClient Rule

Smartcheck

Annotation Tool

Eval

Annotated Learn

M A C H I N E + H U M A N

CORE API

CHAT API

CYRANO API

Customer Service Conversational

MESSAGING API

Language OS

Language Engine

Bots

Tickets can come in many languages.

In Customer Service

Unbabel for Zendesk

Unbabel’s Domain Adapted

Machine Translation

Distributed Human Translation

English-speakingagent

Zendesk appconnected with Unbabel

API

Unbabel can offer the same Customer Satisfaction as native agents

SPEED QUALITY

9420minutes

Customer Replies: Speed & Quality

Chat

Editors train the

MT engineCommunity of 50K+ translatorsSkips Humans

Chat Translation Flow

Chat Messages: Speed & Quality

SPEED QUALITY

902Minutes

What is the future?H

UM

AN

EFF

OR

T

TIME

JOBS

MQM 95

0%?

Thank Youjoao@unbabel.com