2011 Tekom Wiesbaden: Implementation of a machine translation engine at CPSL

Posted on 18-May-2015

Presentation by CPSL and tauyou at the tekom annual conference. It presents the case of a successful implementation of machine translation at a mid-size language service provider.

Transcript of 2011 Tekom Wiesbaden: Implementation of a machine translation engine at CPSL

Speaker: Belén García-Ochoa (CPSL)

Co-speaker: Diego Bartolomé (tauyou <language technology>)

Implementation of a Machine Translation Engine at CPSL

The speaker

Localization Director at CPSL

CPSL has been a multilingual service provider since 1963

Headquarters in Barcelona, Spain

Other offices in:

Madrid, Spain

Germany

UK

CPSL staff includes over 50 people

Belén García-Ochoa

The co-speaker

CEO tauyou <language technology>

tauyou has provided language technologies for the localization industry since 2006

Main clients: medium-sized LSPs

Headquarters in Barcelona

Diego Bartolomé

CPSL and Machine Translation

Post-editing services provided to a software company for a huge project

Lots of translated words in a tight timeframe

Main difficulties found

Lots of clients

Different subject matters

Different language combinations

Workaround

Lots of clients:

A list of the most appropriate clients for using the engine was created

Based on this list, we established the different subject matters and the different language combinations

Human post-editing vs. human translation

A translator can typically translate 2,500 words per day.

A reviewer of human translation can typically review 12,000 words per day.

On average, 8,000 words can be post-edited per day.
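To make the comparison concrete, the daily rates above can be turned into person-days for a job of a given size. The following is only a rough sketch: the 100,000-word project size and the function names are assumptions for illustration, and the slide does not say whether post-edited text goes through a separate review pass, so none is counted here.

```python
# Daily throughput figures quoted on the slide (words per person per day).
TRANSLATION_WORDS_PER_DAY = 2_500
REVIEW_WORDS_PER_DAY = 12_000
POST_EDITING_WORDS_PER_DAY = 8_000


def person_days(words: int, words_per_day: int) -> float:
    """Return the person-days needed to process `words` at a given daily rate."""
    return words / words_per_day


def compare(words: int) -> dict:
    """Person-days for human translation plus review versus post-editing alone."""
    human = (person_days(words, TRANSLATION_WORDS_PER_DAY)
             + person_days(words, REVIEW_WORDS_PER_DAY))
    post_editing = person_days(words, POST_EDITING_WORDS_PER_DAY)
    return {"human_translation_and_review": human, "post_editing": post_editing}


if __name__ == "__main__":
    # 100,000 words is a hypothetical project size, not a figure from the talk.
    for workflow, days in compare(100_000).items():
        print(f"{workflow}: {days:.1f} person-days")
```

The sketch counts effort only. It says nothing about output quality, which depends on the engine and the content.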

The tauyou solution

Dedicated hybrid machine translation engine that is continuously customized

Corpus-based with rules for pre- and post-processing

Data confidentiality is guaranteed

Translation speed

Main characteristics

Any type of document

Glossary prioritization

Fast domain creation/update

Fully customizable

Quality metrics computation

Terminology extraction

Optimum domain creation

gather in-domain data

train the translation solution

enrich solution with related text

terminology prioritization

update the translation solution

add rules to enhance quality

weekly updates

CPSL workflow 1

Optimize translation quality for a client

gather client data

train the translation solution

add rules to enhance quality

continuous improvement

CPSL workflow 2

General-purpose translator

gather clients' data

add generic texts to provide a good sample

train the translation solution

add rules to enhance quality

periodic improvement

Other use cases

Data creation and enhancement

user-defined

unaligned translated documents

generic translations

optimum corpus/memories creation

rule-based extension/filtering

tauyou interface

Tabs can be customized

Quality metrics

Detailed analysis of translated documents

Several customized parameters, including word error rate, number of word edits, tag differences, etc.

Useful in machine translation but also in the normal quality process
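As an illustration of two of the parameters named on the quality metrics slide, the number of word edits and the word error rate can be computed with a word-level edit distance. This is a minimal sketch of the textbook definition, assuming whitespace tokenization and taking the post-edited text as the reference. The talk does not describe how the tauyou tool computes its metrics, and the function names here are hypothetical.

```python
def edit_operations(reference: list, hypothesis: list) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference` (Levenshtein distance)."""
    # previous[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word edits divided by the number of words in the reference."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_operations(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical sentences: raw machine output versus its post-edited version.
    machine_output = "the engine customized every week"
    post_edited = "the engine is customized weekly"
    print(edit_operations(post_edited.split(), machine_output.split()))
    print(word_error_rate(post_edited, machine_output))
```

A lower value means the post-editor changed less of the machine output, which is why the same measure is also usable in a conventional review process.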

Terminology extraction

Unilingual and bilingual terminology lists

Customized according to position in the sentence, word type, number of words, etc.

Feeds the MT engine or a tool for human translators
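A unilingual terminology list filtered by number of words can be approximated with simple n-gram counting. The sketch below is not tauyou's method, which the slides do not describe. It swaps in plain frequency counting, and the stopword list, the thresholds and the function name are assumptions chosen for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_terms(text: str, max_words: int = 2, min_count: int = 2) -> list:
    """Return (term, count) pairs for word n-grams that recur in `text`.

    N-grams never cross sentence boundaries and never start or end
    with a stopword. `max_words` limits the number of words per term.
    """
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Words are runs of letters, optionally joined by hyphens.
        words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", sentence)
        for size in range(1, max_words + 1):
            for start in range(len(words) - size + 1):
                ngram = words[start:start + size]
                if ngram[0] in STOPWORDS or ngram[-1] in STOPWORDS:
                    continue
                counts[" ".join(ngram)] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the presentation.
    sample = ("The translation memory is updated weekly. "
              "The translation memory feeds the engine. "
              "The engine is retrained.")
    for term, count in candidate_terms(sample):
        print(count, term)
```

The resulting list could then be checked by a terminologist before it is fed to an engine or handed to translators. Filtering by word type or sentence position, as the slide mentions, would need a part-of-speech tagger and is left out here.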

The future

Increase usage of translation memories

Automatic domain classification

Source text enhancement

spelling, grammar, structure, terminology ...

Special words detection

New domains/language pairs creation

Questions?

bgarcia-ochoa@cpsl.com

www.cpsl.com

dbc@tauyou.com

www.tauyou.com