Post-editing: how to future-proof your career in translation

Paulo Camargo, PhD. Owner, Terminologist, BLC - Brazilian Localization Company. Web: www.blc.com.br

Purpose of this presentation

• Promote the adoption of Machine Translation (MT) and Post-editing (PE)

– How we can work faster, better, and make more money

• Target audience:

– Novice, experienced, and advanced freelance translators

– Small LSPs and in-house translators

Perspectives: novice translator

• Introduce PE as a new profession

– Background information

– Current adoption of PE

– PE productivity/compensation

• Explore availability of PE training

– Why a translator needs PE training

– What are the required skills

– PE certifications available: TAUS and SDL

Perspectives: experienced/advanced

• Use MT output as a translation aid

- Research shows MT increases productivity

- Translators prefer MT-aided to unaided translation

- Google Translate (GT), SDL Cloud, MS Hub, Systran

• Advanced: combine MT and terminology management

- Term extraction/customization of MT

- Generation/PE of MT output ↑ productivity

- Replace combined TM/on-line TM servers?

Perspectives: small LSP

• Large/medium LSPs have used MT for more than a decade

– Small LSPs: need to catch up

• How to get started on a low budget

- MT developments: ↓ need for specialized IT

- Key resource: in-house translator

- Terminology management

- Customizable MT

- Preliminary analysis/PE Guidelines

Definition of post-editing (TAUS)

• Post-editing: “the correction of machine-generated translation output to ensure it meets a level of quality negotiated in advance between client and post-editor”.

• “Post-editing seeks the minimum steps required for an acceptable text”

Background information

• PE reality: driven by advances in MT

– Hybrid MT: rule-based / statistical

– Rule-based: dictionaries, rules; e.g. Systran

– Statistical: training data (TM); e.g. GT

• Pre-editing

– Customization: glossary, training data

– Preliminary analysis: language rules, client rules, example card

Preliminary analysis (Rico, 2011)

• After engine customization

– Select MT samples

– Check term consistency/accuracy

– Check for recurrent MT errors

• Draw up guidelines (quality acceptance)

– Quality/errors to expect / how to proceed

– Language-independent/language-dependent rules

– Feedback (glossary updates, error reports)
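The "check term consistency" step can be partly automated. The following is a minimal sketch, not a method described by Rico (2011): it assumes the MT samples are available as (source, target) pairs and the glossary as a simple source-to-target mapping. The glossary entries, sample segments, and function name are hypothetical.

```python
def find_term_inconsistencies(segments, glossary):
    """Return (segment index, source term, expected target term) for each miss.

    A miss is a segment whose source contains a glossary term but whose
    target does not contain the approved translation (case-insensitive
    substring match, which is a simplification).
    """
    issues = []
    for index, (source, target) in enumerate(segments):
        source_lower = source.lower()
        target_lower = target.lower()
        for source_term, target_term in glossary.items():
            if source_term.lower() in source_lower and target_term.lower() not in target_lower:
                issues.append((index, source_term, target_term))
    return issues


# Hypothetical EN > PT-BR glossary and MT samples, for illustration only.
glossary = {"circuit breaker": "disjuntor", "power supply": "fonte de alimentação"}
samples = [
    ("Reset the circuit breaker.", "Reinicie o disjuntor."),
    ("Check the power supply.", "Verifique o suprimento de energia."),
]

for index, source_term, target_term in find_term_inconsistencies(samples, glossary):
    print(f"Segment {index}: '{source_term}' should be '{target_term}'")
```

Substring matching ignores inflected forms, so a flagged segment is only a candidate for human review. The flagged terms can feed the glossary updates and error reports listed above.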

Guidelines for PE (Rico, 2011)

• Language-independent rules

– Fix terminology, syntax, morphology

– Fix misspelling, punctuation, omissions

– Edit offensive/inappropriate text

• Language-dependent rules

– Language-specific examples:

– Example card: expected errors/how to fix

Custom MT: what to expect (O’Brien, 2002)

• Custom MT: high-quality MT output

– Most segments comparable to an 85% TM fuzzy match

– Some better than 100% TM match (review)

– Some bad translations: retranslate

• Translator: critical to MT success

• Need human assessment: always!

“Not only will MTPE not replace the translator but it also will not happen without the translator”

Full MT post-editing (Dillinger, 2004)

• Goal: human-quality output

• Most frequent: higher visibility texts

• Quality expectations: high (TEP)

• Grammatically, syntactically, and semantically correct

• Stylistically appropriate

• Productivity expected: 4K–10K words/day

Current adoption of post-editing

• Common Sense Advisory report (2012)

– Freelance: 21.7% (15.4% plan)

– Small LSP: 32.5% (22.6% plan)

– Large LSP: 72.0% (28.0% plan)

• ALC report (2015)

– Small LSP: 20.0% (USA), 25.0% (Europe)

• Lionbridge (Marciano, 2015)

– Applied to 30% of projects (goal: 50%); 60M, 2014

Post-editing productivity data

• Post-editing productivity (O’Brien, 2006)

– Equal to or higher than editing high TM fuzzy matches

– Typical: 4K to 10K words/day

– Proficiency: after 100K words (1 month of full-time PE)

• Other productivity data

– Full PE: 5K–8K w/day (DePalma, 2011)

PE compensation: follow TM fuzzy

• TM fuzzy matches (Guerberof, 2013)

– 60-66% of full TR rate for 75%-94% match

• MT full post-editing

– 50-70% of full rate (Guerberof, 2013)

– 65-68% of rate (Marciano, 2015)

– Smaller companies: prefer to pay per hour
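A quick way to read these per-word discounts is to ask how many words per day a post-editor must process to earn the same as in unaided translation. The sketch below uses the 50% and 70% rate fractions quoted above. The unaided throughput of 2,500 words/day is an assumption for illustration, not a figure from the cited sources, and the function name is hypothetical.

```python
def break_even_words_per_day(unaided_words_per_day, pe_rate_fraction):
    """Words/day of PE needed to match the daily income of unaided translation.

    Income = words * rate, so the per-word rate cancels out:
    pe_words * (fraction * rate) = unaided_words * rate.
    """
    if pe_rate_fraction <= 0:
        raise ValueError("pe_rate_fraction must be positive")
    return unaided_words_per_day / pe_rate_fraction


# Assumption for illustration: 2,500 words/day of unaided translation.
UNAIDED_WORDS_PER_DAY = 2500

for fraction in (0.50, 0.70):  # PE paid at 50% or 70% of the full rate
    needed = break_even_words_per_day(UNAIDED_WORDS_PER_DAY, fraction)
    print(f"PE at {fraction:.0%} of full rate: break-even at {needed:,.0f} words/day")
```

Under this assumption the break-even points fall between roughly 3,600 and 5,000 words/day, which is at or below the lower half of the 4K to 10K words/day range reported for post-editing.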

Proposal for PE training (O’Brien, 2002)

• PE: what do translators (TRs) think about it?

– Dislike of correcting repetitive errors

– Fear of losing proficiency (poor MT output)

– Dislike of limited freedom of expression

• Why do TRs need PE training?

– Different skills: 2 source texts (the source text and the MT output)

– Quality requirements, different error types

– Qualified translator ≠ successful post-editor

What skills does a post-editor need?

• Same as the translator (O’Brien, 2002)

- Expert in subject area and target language

- Excellent knowledge of source language

- Word-processing (WP) skills, tolerance

• Skills for post-editor only (Rico, 2011)

- Advanced WP: RegEx, search and replace (S&R), terminology management

- Positive attitude towards MT
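To show why RegEx and search-and-replace skills matter, here is a minimal sketch of a rule list that fixes recurrent errors in one pass. The rules and the sample sentence are hypothetical and assume a Brazilian Portuguese target text. The punctuation rule in particular is language-dependent (it would be wrong for French, for example), so each rule belongs in the PE guidelines for a specific language pair.

```python
import re

# Each rule is (compiled pattern, replacement). Hypothetical examples only.
RULES = [
    # Collapse runs of spaces or tabs into a single space.
    (re.compile(r"[ \t]{2,}"), " "),
    # Remove whitespace before punctuation marks.
    (re.compile(r"\s+([,.;:!?])"), r"\1"),
    # Recurrent term error: European Portuguese "ficheiro(s)" instead of
    # Brazilian Portuguese "arquivo(s)" (case-sensitive, whole word).
    (re.compile(r"\bficheiro(s?)\b"), r"arquivo\1"),
]


def apply_rules(text):
    """Apply every rule in order and return the corrected text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


raw_mt = "Abra o ficheiro  de configuração , depois salve ."
print(apply_rules(raw_mt))
```

Batch rules like these address the "repetitive errors" that translators dislike correcting by hand. The output still needs human review, since a pattern can match text it should not change.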

Proposal for PE course (O’Brien, 2002)

• Theoretical component

– Introduction to PE/MT technology / controlled language

– Advanced terminology management / text linguistics

– Basic programming skills

• Required background

– Translation skills; basic linguistics/terminology management

– IT skills; introduction to language technology; source/target language skills

Sources for PE certification

• TAUS (Translation Automation User Society)

– English > 23 languages (European, Arabic, Asian)

– Also Spanish > English

– Cost: 60 Euro (member), 80 Euro (non)

• SDL MT PE Certification

– Free with SDL Language Cloud MT

Perspectives: experienced/advanced

What possibilities can MT offer other than post-editing?

Is it worth using MT output as an aid to increase TR productivity?

Can MT advantageously replace combined TM/on-line TM servers?

Efficiency of PE for language translation

• Rigorous, controlled analysis (Green et al., 2013)

- Hypothesis 1: PE reduces translation time

- Hypothesis 2: PE increases quality

- Hypothesis 3: MT primes the translator

• Compared PE vs. unaided translation

– Blind experiment: TRs did not know GT was used

– Pre-interview: TRs showed strong dislike of MT

– 16 professional translators per language pair: EN-AR, EN-FR, EN-DE

Results clarify value of post-editing

• Which one is faster? 69% PE

• Useful? Yes 56%, No 29%, Unsure 15%

• Suggestions improved quality (all)

• MT output primes the translator

– PE text (closer to MT) ≠ unaided ≠ raw MT

– The lower the TR's experience, the closer the text stays to MT
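The priming effect can be made concrete by measuring how close a final text stays to the raw MT output. The sketch below uses a word-level similarity ratio from Python's standard `difflib` module. This is an illustrative stand-in, not the metric used in the study cited above, and the three sample sentences are invented.

```python
from difflib import SequenceMatcher


def similarity(text_a, text_b):
    """Word-level similarity between two texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


# Invented examples: raw MT, a post-edited version, and an unaided translation.
raw_mt = "The machine translate the text quickly"
post_edited = "The machine translates the text quickly"
unaided = "The text is translated quickly by the machine"

print(f"raw MT vs post-edited: {similarity(raw_mt, post_edited):.2f}")
print(f"raw MT vs unaided:     {similarity(raw_mt, unaided):.2f}")
```

In this invented example the post-edited sentence keeps the MT word order and scores much closer to the raw MT than the unaided translation does, which is the pattern the slide describes.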

Does MT output increase productivity?

• Example 1: Google Translate

– Now a paid service: $20/M characters

– Plug-in to SDL Trados/other CAT tools

– General statistical MT engine

– Not customizable

– Confidentiality issues

– See app for complete setup procedure


• Example 2: SDL Cloud MT

– Price range: $5–$75/month (Expert)

– Plug-in to SDL Trados/other CAT tools

– Complete confidentiality (nothing is stored)

– Pre-trained engines: Travel, IT, Life Sciences, Automotive, Consumer Electronics

– Customizable MT: can add own glossaries

– Comprehensive analytics (quality analysis)


• Example 3: Microsoft Translator Hub

– Plug-in to SDL Trados/others, secure

– First 2M characters free; 4M characters/month for $40

– Fully customizable MT engine

• Previous translations (> 20K words)

• Add glossaries

• Request training / evaluate results

• Option to “Use Microsoft Models”
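Per-character pricing is easier to judge when converted into a project estimate. The sketch below uses the $20 per million characters quoted above for Google Translate. The average of 6 characters per word (including spaces) is a rough assumption for illustration and varies by language and text type; the function name is hypothetical.

```python
def estimate_mt_cost_usd(char_count, usd_per_million_chars=20):
    """Estimated MT cost in USD for a given number of source characters."""
    return char_count * usd_per_million_chars / 1_000_000


# Assumption for illustration: about 6 characters per word, spaces included.
AVG_CHARS_PER_WORD = 6

project_words = 50_000
project_chars = project_words * AVG_CHARS_PER_WORD
cost = estimate_mt_cost_usd(project_chars)
print(f"{project_words:,} words is about {project_chars:,} characters: ${cost:.2f}")
```

Under these assumptions, machine-translating a 50K-word project costs a few dollars, so the MT fee itself is unlikely to be the deciding factor when comparing engines.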

How about confidentiality?

• Consider e.g. Microsoft and Google

– Among the largest providers of MT

– Among the largest buyers of translation

– Control information flow around the globe

• Confidentiality should not be a problem

– Google not an option → MS Hub/SDL Cloud

– Uncomfortable sending data to MS/SDL → use a desktop/server solution: Systran

Changes in MT offer to TRs

• Common scenario for TRs (4 years ago)

– One affordable desktop product (Systran)

– Macros, RegEx, format conversion

– No plug-in for CAT tools (high-end)

• Current scenario

– Software as a service (GT, SDL, MS Hub)

– Plug-in for CAT tools is standard

– Much lower IT requirements

Can MT replace combined/online TMs?

• Experienced/advanced translators

– Use combined/on-line TM for productivity

– Proud users: ↑ 50% productivity, see TM as an asset

– TM is error-prone (consistency, mistranslation)

– Need to check term consistency

• MT has improved a lot in the last 5 years

– TRs trust TM fuzzy matches more than raw MT (Guerberof, 2008)

– Mistake MT output for TM output (human?)

MTPE can provide a better result

• Avoid problems in combined/on-line TM:

– Terminology inconsistencies

– Mistranslations

– Time wasted correcting TUs that will never be used

• New approach using MTPE

– Extract terms (Systran, rule-based)

– Customizable MT (SDL Cloud or MS Hub)

– Post-edit fresh MT for ↑ productivity/quality

How small LSPs can get started

• Scenario: 40% TRs use MT (TAUS)

• Actual post-editing offer (2015)

– PE of our MT engine output (GT?)

– Payment: 900 words/hour

– Instruction: as readable as possible

– No pre-editing: cal, gloss or guidelines

• Translators were really upset

Need more than just Google Translate

• Pre-editing: customization, preliminary analysis

– Key: in-house translator (O’Brien, 2002)

• Allocate a translator for MTPE activities

– Use secure on-line customizable engines

– Define suitable projects

– Invest in terminology management

– Develop PE guidelines

– No rate discount initially (learning curve)

MT implementation at BLC (4 years)

• Smaller projects: 10K–50K words

– Terminology extraction (Systran, rule-based)

– Normal TR + ED procedure

– Semi-customized: SDL Cloud + Multiterm

• Larger projects: > 50K words

– Extract bigger glossary (higher coverage)

– Raw MT (Systran, SDL Cloud, MS Hub)

– PE + ED (no pre-editing, no discount)

MT implementation at BLC

• What does MT do for BLC?

– Leverage my knowledge: engineering/science

– Increase productivity (more/larger projects)

– Increase quality (terminology/TM updates)

• Future developments

– MS Hub, SDL Cloud; Systran

– Hire new translator (2016)

– Develop a PE team/service

Conclusion

The combination of machine translation (MT) and post-editing (PE) is a disruptive innovation that can improve translators' productivity and translation quality, no matter how you plan to use it.

Can you afford to ignore it?

References

• Guerberof, Ana (2008). Productivity and Quality in the Post-editing of Outputs from Translation Memories and Machine Translation. Master's dissertation. Universitat Rovira i Virgili.

• O’Brien, Sharon (2006). Eye-tracking and Translation Memory matches. Perspectives: Studies in Translatology 14 (3), 185-205.

• Green, Spence et al. (2013), The Efficacy of Human Post-Editing for Language Translation, ACM Human Factors in Computing Systems (CHI), Computer Science Department, Stanford University.

• Rico, Celia et al (2011), EDI-TA: Post-editing Methodology for Machine Translation, MultilingualWeb-LT.

• O’Brien, Sharon (2002), Teaching Post-editing: A Proposal for Course Content. Proceedings of the 6th EAMT Workshop on Teaching Machine Translation. EAMT/BCS, UMIST, Manchester, UK. 99-106.

• DePalma, Donald (2011), Common Sense Advisory, Trends in Machine Translation.

• Dillinger, Mike et al. (2004), Implementing Machine Translation, LISA Best Practice Guides.

• TAUS (2014), MT Post-editing Guidelines.

• Marciano, Jay (2015), Personal communication.