2012 MosesCore GALA Monaco: Friendly Machine Translation

Posted on 18-May-2015



Presentation by tauyou language technology at the annual GALA conference, during the event organized by TAUS for the MosesCore project.

Transcript of 2012 MosesCore GALA Monaco: Friendly Machine Translation

© 2012 #1

friendly machine translation

Diego Bartolomé, CEO


outline

before starting with machine translation

what happens when you go live

how to minimize the risks

practical hints + some numbers


is machine translation for us?

<LSP>

translation memories

previous documents

websites of clients

language-specific rules

TAUS data

<tauyou>

open-source corpora

documentation alignment

public information

programming of rules

terminology extraction

<some issues>

minimum amount of data

need for data classification

language pairs


for sure it is!

<data cleaning + selection>

translation tables and language models

data and parameters for tuning

test measures

<engines creation>

several + pruning afterwards

<engine validation>

by professional translators

<continuous improvement>

new files, new corpora, new rules, etc.
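
The data cleaning step named on this slide can be illustrated with a small filter over parallel segment pairs. This is a minimal sketch, not tauyou's actual procedure: the function name `clean_corpus`, the thresholds `max_ratio` and `max_len`, and the choice of heuristics (empty sides, overlong segments, length mismatch, exact duplicates) are all assumptions made for illustration.

```python
def clean_corpus(pairs, max_ratio=3.0, max_len=100):
    """Filter (source, target) segment pairs with simple, hypothetical heuristics."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # drop pairs with an empty side
        if n_src > max_len or n_tgt > max_len:
            continue  # drop overly long segments
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # drop pairs whose lengths differ too much (likely misaligned)
        if (src, tgt) in seen:
            continue  # drop exact duplicates
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


corpus = [
    ("Good morning", "Buenos días"),
    ("Good morning", "Buenos días"),
    ("", "Hola"),
    ("Yes", "Sí , por supuesto , claro que sí"),
]
print(clean_corpus(corpus))
```

In this toy corpus only the first pair survives: the second is a duplicate, the third has an empty source, and the fourth has a length ratio above the assumed threshold.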


the production process (I)

statistical MT decoding

convert file format

segment text

NLP tasks

tokenize

rewrite source

lowercase
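
Some of the preparation steps named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not tauyou's implementation: the function names, the regex-based segmenter and tokenizer, and the assumed order (segment, tokenize, lowercase, then hand over to the decoder) are all assumptions. File conversion, NLP tasks, and source rewriting are left out.

```python
import re


def segment_text(text):
    """Split raw text into sentence-like segments (naive: break after . ! ?)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(segment):
    """Separate words from punctuation with a simple regex (hypothetical rule)."""
    return re.findall(r"\w+|[^\w\s]", segment)


def preprocess(text):
    """Return one lowercased, space-joined token string per segment."""
    return [" ".join(tokenize(s)).lower() for s in segment_text(text)]


print(preprocess("Hello, world! Machine translation is friendly."))
```

Each returned string is one segment in the lowercased, tokenized form that a statistical decoder typically expects as input.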


the production process (II)

statistical MT decoding

translated file

reformat

detokenize

rewrite target

uppercase

evaluate
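
Two of the steps applied after decoding, detokenizing and restoring uppercase, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the punctuation rule are hypothetical, and restoring only the sentence-initial capital is a stand-in for real recasing, which also has to handle proper nouns and acronyms. Target rewriting, reformatting, and evaluation are not shown.

```python
import re


def detokenize(tokens):
    """Join tokens and re-attach punctuation to the preceding word (naive rule)."""
    text = " ".join(tokens)
    return re.sub(r"\s+([.,!?;:])", r"\1", text)


def uppercase_first(sentence):
    """Restore the sentence-initial capital that lowercasing removed."""
    return sentence[:1].upper() + sentence[1:]


def postprocess(decoded):
    """Turn a lowercased, tokenized decoder output string into readable text."""
    return uppercase_first(detokenize(decoded.split()))


print(postprocess("hola , mundo !"))
```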


risk minimization

<tauyou> quality metrics computation

<LSP> time and cost analysis

<LSP> + <tauyou> track the evolution over time
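
The slide does not say which quality metrics are computed. As one plausible example, the sketch below measures how many word-level edits a translator made when post-editing a machine-translated segment, divided by the length of the post-edited text. The function names and the choice of metric are assumptions; this is a plain word-level Levenshtein distance, not a full TER implementation (it has no shift operation).

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insertions, deletions, substitutions)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        curr = [i]
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # delete hw
                            curr[j - 1] + 1,      # insert rw
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_rate(hyp, ref):
    """Edits per reference word; 0.0 means the MT output was left untouched."""
    ref_len = len(ref.split())
    if ref_len == 0:
        return 0.0 if not hyp.split() else float("inf")
    return word_edit_distance(hyp, ref) / ref_len


mt_output = "the house is red big"
post_edited = "the big house is red"
print(edit_rate(mt_output, post_edited))
```

Logging a figure like this per job, alongside the time and cost data from the LSP, is one way to track the evolution over time.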


practical hints

bigger clients

languages

with highest translation volumes

with similar structure

with specific terminology/needs

MT-friendly translators

start moving


some numbers

more than 1,500 million words per month

in latin languages ES, FR, PT, CA, GA, IT, RO

EN as source or target is the star

ES, FR, DE, PT, IT, DA, SV, ZH, AR, JP...

LSPs are translating +3 million words per month

investment pays off if you translate +50,000 words per month


Thanks!

// Diego Bartolomé, PhD

<address> C/ Les Planes 39 – 08201 Sabadell – Spain

<phone> +34 93 711 29 96

<cell> +34 670 331 225

<email> dbc@tauyou.com

<www> tauyou.com