Post-editing: how to future-proof your career in translation
Paulo Camargo, PhD
Owner, BLC - Brazilian Localization Company
Web: www.blc.com.br
Purpose of this presentation
• Promote the adoption of Machine Translation (MT) and Post-editing (PE)
– How we can work faster, better, and make more money
• Target audience:
– Novice, experienced, and advanced freelance translators
– Small LSPs and in-house translators
Perspectives: novice translator
• Introduce PE as a new profession
– Background information
– Current adoption of PE
– PE productivity/compensation
• Explore availability of PE training
– Why a translator needs PE training
– What the required skills are
– PE certifications available: TAUS and SDL
Perspectives: experienced/advanced
• Use MT output as a translation aid
- Research shows MT increases productivity
- Translators prefer MT to unaided translation
- GT, SDL Cloud, MS Hub, Systran
• Advanced: combine MT / term. manag.
- Term extraction/customization of MT
- Generation/PE of MT output ↑ productivity
- Replace combined TM/on-line TM servers?
Perspectives: small LSP
• Large/medium LSPs use MT (> decade)
– Small LSPs: need to catch up
• How to get started on a low budget
- MT developments: ↓ need for specialized IT
- Key resource: in-house translator
- Terminology management
- Customizable MT
- Preliminary analysis/PE guidelines
Definition of post-editing (TAUS)
• Post-editing: “the correction of machine-generated translation output to ensure it meets a level of quality negotiated in advance between client and post-editor”.
• “Post-editing seeks the minimum steps required for an acceptable text”
Background information
• PE reality: driven by advances in MT
– Hybrid MT: rule-based / statistical
– Rule-based: dictionaries, rules; e.g. Systran
– Statistical: training data (TM); e.g. GT
• Pre-editing
– Customization: glossary, training data
– Preliminary analysis: language rules, client rules, example card
Preliminary analysis (Rico, 2011)
• After engine customization
– Select MT samples
– Check term consistency/accuracy
– Check for recurrent MT errors
• Draw up guidelines (quality acceptance)
– Quality/errors to expect / how to proceed
– Language independent/dependent rules
– Feedback (glossary update, error report)
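The term consistency check on MT samples can be partly automated. The sketch below is only an illustration, not part of the EDI-TA methodology: the glossary, the sample segments, and the function name `find_term_errors` are all hypothetical. It flags every sample where a glossary term appears in the source but its approved translation is missing from the raw MT, and counts which terms fail most often.

```python
import re
from collections import Counter

# Hypothetical client glossary: source term -> approved target term.
GLOSSARY = {
    "hard drive": "disco rígido",
    "power supply": "fonte de alimentação",
}

# Hypothetical (source, raw MT) sample pairs selected for the analysis.
SAMPLES = [
    ("Replace the hard drive.", "Substitua o disco rígido."),
    ("Check the power supply.", "Verifique o suprimento de energia."),
    ("Remove the hard drive and the power supply.",
     "Remova o disco duro e a fonte de alimentação."),
]


def find_term_errors(samples, glossary):
    """Return (sample_number, source_term, expected_target) for each
    glossary term present in the source whose approved translation
    does not appear in the MT output."""
    errors = []
    for number, (source, target) in enumerate(samples, start=1):
        for src_term, tgt_term in glossary.items():
            pattern = r"\b" + re.escape(src_term) + r"\b"
            in_source = re.search(pattern, source, re.IGNORECASE)
            if in_source and tgt_term.lower() not in target.lower():
                errors.append((number, src_term, tgt_term))
    return errors


errors = find_term_errors(SAMPLES, GLOSSARY)
# Terms that fail repeatedly are candidates for glossary or engine updates.
recurrent = Counter(term for _, term, _ in errors)

for number, src_term, tgt_term in errors:
    print(f"Sample {number}: '{src_term}' should be '{tgt_term}'")
```

A plain substring match like this misses inflected forms, so the report is a starting point for human review, not a replacement for it.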
Guidelines for PE (Rico, 2011)
• Language independent rules
– Fix terminology, syntax, morphology
– Fix misspellings, punctuation, omissions
– Edit offensive/inappropriate text
• Language dependent rules
– Language-specific examples
– Example card: expected errors/how to fix
Custom MT: what to expect (O’Brien, 2002)
• Custom MT: high-level MT output
– Most segments = 85% TM fuzzy
– Some better than 100% TM match (review)
– Some bad translations: retranslate
• Translator: critical to MT success
• Need human assessment: always!
“Not only will MTPE not replace the translator but it also will not happen without the translator”
Full MT post-editing (Dillinger, 2004)
• Goal: human-quality output
• Most frequent: higher visibility texts
• Quality expectations: high (TEP)
• Grammatically, syntactically, semantically correct
• Stylistically appropriate
• Productivity expected: 4K–10K words/day
Current adoption of post-editing
• Common Sense Advisory report (2012)
– Freelance: 21.7% (15.4% plan)
– Small LSP: 32.5% (22.6% plan)
– Large LSP: 72.0% (28.0% plan) **
• ALC report (2015)
– Small LSP: 20.0% (USA), 25.0% (Europe)
• Lionbridge (Marciano, 2015)
– Applied to 30% of projects (goal 50%), 60M, 2014
Post-editing productivity data
• Post-editing productivity (O’Brien, 2006)
– Equal to/higher than editing high-fuzzy TM matches
– Typical: 4K to 10K words/day
– Proficiency: 100K w (1 month full-time PE)
• Other productivity data
– Full PE: 5K–8K w/day (DePalma, 2011)
PE compensation: follow TM fuzzy
• TM fuzzy matches (Guerberof, 2013)
– 60-66% of full TR rate for 75%-94% match
• MT full post-editing
– 50-70% of rate (Guerberof, 2013)
– 65-68% of rate (Marciano, 2015)
– Smaller companies: prefer to pay per hour
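To see how a discounted PE rate interacts with higher throughput, the figures above can be combined in a quick calculation. This is a rough sketch under stated assumptions: the full rate of 10 cents per word and the unaided throughput of 2,500 words/day are hypothetical values that do not come from the slides; the 50-70% rate range and the 4K-10K words/day range do.

```python
def daily_earnings(words_per_day, rate_cents, rate_percent=100):
    """Daily earnings in currency units for a per-word rate in cents,
    paid at rate_percent of the full rate."""
    return words_per_day * rate_cents * rate_percent / 10_000


def break_even_words(unaided_words_per_day, rate_percent):
    """PE words/day needed to match unaided-translation earnings
    when PE is paid at rate_percent of the full rate."""
    return unaided_words_per_day * 100 / rate_percent


FULL_RATE_CENTS = 10            # hypothetical full translation rate
UNAIDED_WORDS_PER_DAY = 2_500   # hypothetical unaided throughput

translation = daily_earnings(UNAIDED_WORDS_PER_DAY, FULL_RATE_CENTS)
pe_worst = daily_earnings(4_000, FULL_RATE_CENTS, 50)    # low output, low rate
pe_best = daily_earnings(10_000, FULL_RATE_CENTS, 70)    # high output, high rate

print(f"Unaided translation: {translation:.2f}/day")
print(f"PE, worst case:      {pe_worst:.2f}/day")
print(f"PE, best case:       {pe_best:.2f}/day")
print(f"Break-even at 50%:   {break_even_words(UNAIDED_WORDS_PER_DAY, 50):.0f} words/day")
```

Under these assumptions PE only pays less than unaided translation when throughput stays at the very bottom of the range and the rate at the very bottom of the scale; the break-even line shows the daily output needed to come out even.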
Proposal for PE training (O’Brien, 2002)
• PE: what do TRs think about it?
– Dislike of correcting repetitive errors
– Fear of losing proficiency (poor MT output)
– Dislike of limited freedom of expression
• Why do TRs need PE training?
– ≠ skills: 2 source texts
– Quality requirements, different error types
– Qualified translator ≠ successful post-editor
What skills does a post-editor need?
• Same as the translator (O’Brien, 2002)
- Expert in subject area and target language
- Excellent knowledge of source language
- Word-processing (WP) skills, tolerance
• Skills for post-editor only (Rico, 2011)
- Adv WP: RegEx, S&R, term. management
- Positive attitude towards MT
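As an illustration of the advanced word-processing skills listed above (RegEx, search and replace), recurrent MT errors can be fixed in batch before manual post-editing starts. The rules below are hypothetical examples, not taken from Rico (2011); a real rule set would come from the preliminary analysis of the engine's output, and the decimal-comma rule assumes a target language such as Brazilian Portuguese.

```python
import re

# Hypothetical fix rules: (compiled pattern, replacement, description).
# Rules are applied in order, so whitespace is normalized first.
RULES = [
    (re.compile(r"[ \t]{2,}"), " ", "collapse repeated spaces"),
    (re.compile(r"\s+([,.;:!?])"), r"\1", "remove space before punctuation"),
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), r"\1", "remove doubled word"),
    (re.compile(r"(\d)\.(\d)"), r"\1,\2", "decimal point to decimal comma"),
]


def apply_rules(segment, rules=RULES):
    """Apply each rule to the segment and return the fixed text plus
    the descriptions of the rules that actually changed something."""
    applied = []
    for pattern, replacement, description in rules:
        segment, count = pattern.subn(replacement, segment)
        if count:
            applied.append(description)
    return segment, applied


raw_mt = "O  arquivo arquivo tem 3.5 MB ."
fixed, applied = apply_rules(raw_mt)
print(fixed)
print(applied)
```

Logging which rules fired gives the feedback loop a simple error report: a rule that fires on most segments points to something worth fixing in the engine or glossary instead.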
Proposal for PE course (O’Brien, 2002)
• Theoretical component
– Intro to PE/MT tech / controlled language
– Adv. term. management / text linguistics
– Basic programming skills
• Required background
– TRA skills; basic linguistics/term. manag.
– IT skills; intro lang. tech; source/target skills
Sources for PE certification
• TAUS (Transl. Automation User Society)
– English >23 lang (European, Arabic, Asian)
– Also Spanish > English
– Cost: 60 Euro (member), 80 Euro (non)
• SDL MT PE Certification
– Free with SDL Language Cloud MT
Perspectives: experienced/advanced
What possibilities can MT offer other than post-editing?
Is it worth using MT output as an aid to increase TR productivity?
Can MT advantageously replace combined TM/on-line TM servers?
Efficiency of PE for language translation
• Rigorous, controlled analysis (Green, 2013)
- Hypothesis 1: PE reduces translation time
- Hypothesis 2: PE increases quality
- Hypothesis 3: MT primes the translator
• Compared PE vs. unaided translation
– Blind experiment: TRs did not know GT was used
– Pre-interview: TRs showed strong MT dislike
– 16 professional TRs per pair: EN-AR, EN-FR, EN-DE
Results clarify value of post-editing
• Which one is faster? 69% PE
• Useful? Yes 56%, No 29%, Unsure 15%
• Suggestions improved quality (all)
• MT output primes the translator
– PE text (closer to MT) ≠ Unaided ≠ Raw MT
– The lower the TR's experience → the closer to MT
Does MT output increase productivity?
• Example 1: Google Translate
– Now a paid service: $20/M characters
– Plug-in to SDL Trados/other CAT tools
– General statistical MT engine
– Not customizable
– Confidentiality issues
– See app for complete setup procedure
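For a sense of scale, the per-character price can be turned into a per-project estimate. This is a back-of-the-envelope sketch: the $20 per million characters is the figure quoted above, while the average of 6 characters per word (including the trailing space) is a hypothetical value that varies by language and text type.

```python
PRICE_PER_MILLION_CHARS_USD = 20  # price quoted on the slide
AVG_CHARS_PER_WORD = 6            # hypothetical average, incl. trailing space


def mt_cost_usd(words, chars_per_word=AVG_CHARS_PER_WORD):
    """Estimated MT cost in USD for a project of the given word count."""
    chars = words * chars_per_word
    return chars * PRICE_PER_MILLION_CHARS_USD / 1_000_000


for words in (10_000, 50_000, 200_000):
    print(f"{words:>7} words: about ${mt_cost_usd(words):.2f}")
```

Even for large projects the MT fee is small next to translation rates, so the choice of engine is more likely to hinge on customization and confidentiality than on price.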
Does MT output increase productivity?
• Example 2: SDL Cloud MT
– Price range: $5 – $75/month (Expert)
– Plug-in to SDL Trados/other CAT tools
– Complete confidentiality (nothing is stored)
– Pre-trained engines: Travel, IT, Life Sciences, Automotive, Consumer Electronics
– Customizable MT: can add own glossaries
– Comprehensive analytics (quality analysis)
Does MT output increase productivity?
• Example 3: Microsoft Translator Hub
– Plug-in to SDL Trados/others, secure
– First 2M char free; 4M/month: $40
– Fully customizable MT engine
• Previous translations (> 20K words)
• Add glossaries
• Request training / evaluate results
• Option to “Use Microsoft Models”
How about confidentiality?
• Consider e.g. Microsoft and Google
– Among the largest providers of MT
– Among the largest buyers of translation
– Control information flow around the globe
• Confidentiality should not be a problem
– Google not an option → MS Hub/SDL Cloud
– Uncomfortable sending data to MS/SDL → use desktop/server solution: Systran
Changes in MT offer to TRs
• Common scenario for TR (4 years ago)
– One affordable desktop product (Systran)
– Macros, RegEx, format conversion
– No plug-in for CAT tools (high-end)
• Current scenario
– Software as a service (GT, SDL, MS Hub)
– Plug-in for CAT tools is standard
– Much lower IT requirements
Can MT replace combined/online TMs?
• Experienced/advanced translators
– Use combined/on-line TM for productivity
– Proud users: ↑ 50% prod, see TM as asset
– TM is error-prone (consistency, mistranslation)
– Need to check term consistency
• MT has improved a lot in the last 5 years
– TRs trust TM fuzzy > raw MT (Guerberof, 2008)
– Mistake MT output for TM output (human?)
MTPE can provide a better result
• Avoid problems in combined/on-line TM:
– Terminology inconsistencies
– Mistranslations
– Time wasted correcting TUs that will never be used
• New approach using MTPE
– Extract terms (Systran, rule-based)
– Customizable MT (SDL Cloud or MS Hub)
– Post-edit fresh MT for ↑ productivity/quality
How small LSPs can get started
• Scenario: 40% TRs use MT (TAUS)
• Actual post-editing offer (2015)
– PE of our MT engine output (GT?)
– Payment: 900 words/hour
– Instruction: as readable as possible
– No pre-editing: cal, gloss or guidelines
• Translators were really upset
Need more than just Google Translate
• Pre-editing: custom., prelim. analysis
– Key: in-house translator (O’Brien, 2002)
• Allocate translator for MTPE activities
– Use secure on-line customizable engines
– Define suitable projects
– Invest in terminology management
– Develop PE guidelines
– No rate discount initially (learning curve)
MT implementation at BLC (4 years)
• Smaller projects: 10 - 50K words
– Terminology extraction (Systran, rule-based)
– Normal TR + ED procedure
– Semi-customized: SDL Cloud + Multiterm
• Larger projects: > 50K words
– Extract bigger glossary (higher coverage)
– Raw MT (Systran, SDL Cloud, MS Hub)
– PE + ED (no pre-editing, no discount)
MT implementation at BLC
• What does MT do for BLC?
– Leverage my knowledge: engineering/science
– Increase productivity (more/larger projects)
– Increase quality (terminology/TM updates)
• Future developments
– MS Hub, SDL Cloud; Systran
– Hire new translator (2016)
– Develop a PE team/service
Conclusion
The combination of machine translation (MT) and post-editing (PE) is a disruptive innovation that can improve translators’ productivity and translation quality, no matter how you plan to use it.
Can you afford to ignore it?
References
• Guerberof, Ana (2008). Productivity and Quality in the Post-editing of Outputs from Translation Memories and Machine Translation. Master's Dissertation. Universitat Rovira i Virgili.
• O’Brien, Sharon (2006). Eye-tracking and Translation Memory matches. Perspectives: Studies in Translatology 14 (3), 185-205.
• Green, Spence et al (2013), The Efficacy of Human Post-Editing for Language Translation, ACM Human Factors in Computing Systems (CHI), Computer Science Department, Stanford University.
• Rico, Celia et al (2011), EDI-TA: Post-editing Methodology for Machine Translation, MultilingualWeb-LT.
• O’Brien, Sharon (2002), Teaching Post-editing: A Proposal for Course Content. Proceedings of the 6th EAMT Workshop on Teaching Machine Translation. EAMT/BCS, UMIST, Manchester, UK. 99-106.
• DePalma, Donald (2011), Common Sense Advisory, Trends in Machine Translation.
• Dillinger, Mike et al (2004), Implementing Machine Translation, LISA Best Practice Guides.
• TAUS (2014), MT Post-editing Guidelines.
• Marciano, Jay (2015), Personal communication.