Machine Translation Quality Estimation - A Linguist's Approach

WHAT IS MT QUALITY ESTIMATION?

Automatically providing a quality indicator for machine translation output without depending on human reference translations.

Our objective: estimate quality and post-editing effort for eBay listing titles and descriptions

MT QUALITY ESTIMATION – A LINGUIST’S APPROACH

ONE BIG CHALLENGE

$$\min_{W} \sum_{t=1}^{T} \left\| W^{(t)} X^{(t)} - Y^{(t)} \right\|_2^2 + \lambda_s \|S\|_1 + \lambda_b \|B\|_{1,\infty} \quad \text{subject to } W = S + B$$

or

“State-of-the-art QE explores different supervised linear or non-linear learning methods for regression or classification such as Support Vector Machines (SVM), different types of Decision Trees, Neural Networks, Elastic-Net, Gaussian Processes, Naive Bayes, among others”

(Machine Translation Quality Estimation Across Domains, de Souza et al., 2014)
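To see what the objective above actually measures, it can be evaluated directly for given matrices. The slide gives no shapes, so this sketch assumes a layout: S and B are T x d lists of rows (one weight row per task, e.g. per domain), X[t] holds the columns of X(t) (one length-d feature vector per segment), and Y[t] holds the matching quality labels. It reads ||B||_{1,∞} as the sum, over features, of the largest absolute weight across tasks, as in "dirty model" multi-task learning. The function only computes the value; minimizing it would need a separate solver, which is not shown, and the function name is hypothetical.

```python
def dirty_model_objective(S, B, X, Y, lam_s, lam_b):
    """Value of sum_t ||W(t) X(t) - Y(t)||_2^2 + lam_s*||S||_1 + lam_b*||B||_{1,inf},
    with W = S + B.

    Assumed layout (not given on the slide): S and B are T x d lists of rows,
    one row per task; X[t] is a list of the columns of X(t), i.e. one length-d
    feature vector per segment; Y[t] is the list of matching quality labels.
    """
    # W = S + B, element by element.
    W = [[s + b for s, b in zip(s_row, b_row)] for s_row, b_row in zip(S, B)]
    loss = 0.0
    for t, w in enumerate(W):
        for x, y in zip(X[t], Y[t]):
            prediction = sum(wi * xi for wi, xi in zip(w, x))  # one entry of W(t) X(t)
            loss += (prediction - y) ** 2                        # squared L2 residual
    # ||S||_1: sum of absolute values (in training this favors sparse, task-specific weights).
    l1_s = sum(abs(v) for row in S for v in row)
    # ||B||_{1,inf}: for each feature (column), the largest |weight| across tasks, summed.
    l1_inf_b = sum(max(abs(row[j]) for row in B) for j in range(len(B[0])))
    return loss + lam_s * l1_s + lam_b * l1_inf_b
```

With two tasks and two features, S = [[1, 0], [0, 0]], B = [[0.5, 0.5], [0.5, -1]], one segment per task (X = [[[1, 2]], [[2, 1]]], Y = [[2], [1]]), λs = 0.1 and λb = 0.2, the squared error is 1.25 and the total is about 1.65.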

A LINGUIST’S APPROACH

Using linguistic features from 3 dimensions:

• Complexity

• Adequacy

• Fluency

FEATURES

Complexity:

• Length

• Polysemy

Adequacy:

• QA (terminology, patterns, blacklist, numbers)

• Automated Post-Editing

• Part-of-speech tagging (POS)

• Named entity recognition (NER)

Fluency:

• Misspellings

• Grammar errors
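A few of the features listed above can be computed per segment with nothing more than regular expressions and word lists. This is a minimal sketch, not the implementation behind these slides: it covers only length (complexity), number mismatches and blacklisted terms (adequacy), and unknown words as a crude stand-in for misspellings (fluency). The `blacklist` and `lexicon` arguments are hypothetical placeholders for real terminology and spell-checking resources; polysemy, automated post-editing, POS and NER features would need external tools and are left out.

```python
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)?")  # integers and simple decimals like 1.5 or 1,5


def extract_features(source, target, blacklist, lexicon):
    """Count simple per-segment features along the three dimensions.

    blacklist: hypothetical set of forbidden target terms (lowercase).
    lexicon: hypothetical set of known, correctly spelled target words (lowercase).
    """
    tgt_words = re.findall(r"\w+", target.lower())
    # Complexity: number of whitespace-separated source tokens.
    length = len(source.split())
    # Adequacy: source numbers with no identical string among the target numbers.
    tgt_numbers = NUMBER.findall(target)
    missing_numbers = sum(1 for n in NUMBER.findall(source) if n not in tgt_numbers)
    # Adequacy: blacklisted terms that appear in the target.
    blacklist_hits = sum(1 for w in tgt_words if w in blacklist)
    # Fluency: non-numeric target words missing from the lexicon.
    misspellings = sum(1 for w in tgt_words if not w.isdigit() and w not in lexicon)
    return {
        "length": length,
        "missing_numbers": missing_numbers,
        "blacklist_hits": blacklist_hits,
        "misspellings": misspellings,
    }
```

For the source "Silicone case for iPhone 6, pack of 2" and the target "Funda de silicna para iPhone 7, paquete de 2", with a lexicon holding the correctly spelled target words, this reports one missing number ("6") and one unknown word ("silicna"). Note that comparing numbers as strings ignores locale differences in decimal separators.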

IMPLEMENTATION

Checkmate + LanguageTool

• Reusable profile

• Detailed report

• Score
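The profile, report and score outputs above can be sketched as a small scoring step. The slides do not say how the Checkmate and LanguageTool findings become a score, so everything here is an assumption: a profile is simply a mapping from check name to penalty points, the report lists the checks that fired, and the score deducts penalty points per 100 words from 100. The class, function, check names and weights are all hypothetical, and no Checkmate or LanguageTool API is used.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical reusable profile: points deducted per issue, by check name."""
    penalties: dict = field(default_factory=lambda: {
        "missing_numbers": 5.0,
        "blacklist_hits": 5.0,
        "misspellings": 2.0,
    })


def qa_score(issue_counts, profile, word_count):
    """Return (report, score) for one file or segment.

    issue_counts maps check names to how many issues were found; checks not in
    the profile are ignored. The score starts at 100 and loses the profile's
    penalty points, scaled to points per 100 words, floored at 0.
    """
    report = []
    penalty = 0.0
    for check, points in profile.penalties.items():
        count = issue_counts.get(check, 0)
        if count:
            report.append(f"{check}: {count} issue(s), -{count * points:g} points")
            penalty += count * points
    per_100_words = penalty * 100 / max(word_count, 1)  # avoid dividing by zero
    return report, max(0.0, 100.0 - per_100_words)
```

Keeping the weights in a profile object means the same configuration can be reused across files and languages; it also makes the "same weight for all features" limitation easy to change in one place.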

TESTING

• One language (es-LA)

• Short samples (~300 words)

• Bigger samples (~1,000 words)

• Post-edited files (~50,000 words)

• Other languages: pt-BR, ru-RU, zh-CN

RESULTS

MEASURING RESULTS

SAMPLES - SCORE AND TIME ALIGN

FILES - SCORE AND ED ALIGN

Average edit distance (ED) for es-LA descriptions = 72
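The slides do not say how ED was computed or what unit the value 72 is in. One common choice is a word-level Levenshtein distance between the raw MT output and its post-edited version, optionally normalized by the length of the post-edited text. The sketch below assumes that definition; it is similar in spirit to TER but has no block shifts, so it is not TER, and both function names are hypothetical.

```python
def word_edit_distance(mt, post_edited):
    """Word-level Levenshtein distance: the minimum number of word insertions,
    deletions and substitutions that turn the MT output into the post-edited text."""
    a, b = mt.split(), post_edited.split()
    prev = list(range(len(b) + 1))  # distances from an empty MT prefix
    for i, word_a in enumerate(a, start=1):
        curr = [i]  # turning i MT words into nothing takes i deletions
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def edit_rate(mt, post_edited):
    """Edits per 100 post-edited words (TER-like, but without block shifts)."""
    return 100.0 * word_edit_distance(mt, post_edited) / max(len(post_edited.split()), 1)
```

For example, turning "Funda de silicona para iPhone 7, paquete de 2" into "Funda de silicona para iPhone 6, lote de 2" takes two word substitutions. Averaging such per-file values is one way a figure like the one above could be produced.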

MT QE OVER TIME

SAMPLES - OTHER LANGUAGES

CHALLENGES

• False positives

• Matching score and post-editing effort

• Same weight for all features
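One way to quantify the second challenge, whether scores match post-editing effort, is a correlation coefficient between per-file QE scores and a measure of effort such as edit distance or post-editing time. The sketch below is a plain Pearson correlation, offered as an assumption rather than the measure used in these slides. If higher scores mean better MT, a useful estimator should correlate negatively with effort; a rank-based measure such as Spearman's would be less sensitive to outliers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

Passing per-file scores and per-file edit rates gives a value between -1 and 1; values near -1 would mean that higher scores go with less editing.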

WHAT’S NEXT

• Tracking scores over time

• Adding scores to our post-editing tool

• Adding new languages

• Researching new features

HOW CAN YOU USE THIS?

• Tailor the model to your needs

• Estimate quality at the file/segment level

• Target post-editing, discard bad content

• Estimate post-editing effort/time

• Compare MT systems

• Monitor MT system progress
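For "Target post-editing, discard bad content", a segment or file score can drive a simple routing rule. In this sketch the two thresholds and the bucket names are hypothetical; they would have to be calibrated against observed post-editing effort for each language and content type before use.

```python
def triage(scored_items, keep_at=90.0, discard_below=50.0):
    """Split (item_id, score) pairs into keep / post_edit / discard buckets.

    keep_at and discard_below are hypothetical defaults on a 0-100 scale.
    """
    if discard_below > keep_at:
        raise ValueError("discard_below must not exceed keep_at")
    buckets = {"keep": [], "post_edit": [], "discard": []}
    for item_id, score in scored_items:
        if score >= keep_at:
            buckets["keep"].append(item_id)       # little or no editing expected
        elif score < discard_below:
            buckets["discard"].append(item_id)    # cheaper to retranslate or drop
        else:
            buckets["post_edit"].append(item_id)  # worth sending to post-editors
    return buckets
```

Tracking how the share of each bucket changes between MT system versions is one simple way to compare systems or monitor their progress.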

Q&A

THANK YOU!