2012 MosesCore GALA Monaco: Friendly Machine Translation

© 2012 #1 friendly machine translation Diego Bartolomé, CEO

Presentation by tauyou language technology at the annual GALA conference, at the event organized by TAUS for the MosesCore project.

Transcript

Page 1: 2012 MosesCore GALA Monaco: Friendly Machine Translation

© 2012 #1

friendly machine translation

Diego Bartolomé, CEO

Page 2

outline

before starting with machine translation

what happens when you go live

how to minimize the risks

practical hints + some numbers

Page 3

is machine translation for us?

<LSP>                      <tauyou>
translation memories       open-source corpora
previous documents         documentation alignment
websites of clients        public information
language-specific rules    programming of rules
TAUS data                  terminology extraction

<some issues>

minimum amount of data

need for data classification

language pairs
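Translation memories are usually the LSP's largest data source. As a minimal sketch, assuming the memory is exported as TMX (the XML exchange format for translation memories), the Python below extracts source/target segment pairs with the standard library's `xml.etree.ElementTree`. The names `tmx_to_pairs`, `_seg_text` and `_primary` are hypothetical, and inline tags are handled in a simplified way: the content of inline elements (such as `<bpt>`/`<ept>` formatting codes) is dropped, which would also lose any text inside `<hi>` elements.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _primary(lang):
    """Reduce a language tag such as 'en-US' to its lowercase primary subtag."""
    return lang.split("-")[0].lower()


def _seg_text(seg):
    """Text of a <seg>, skipping the content of inline elements.

    Inline codes such as <bpt>/<ept>/<ph> hold native formatting markup,
    so only the segment's own text and each child's tail text are kept.
    """
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return "".join(parts).strip()


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Prefer xml:lang; fall back to a plain lang attribute seen in some older files.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                segs[_primary(lang)] = _seg_text(seg)
        src, tgt = segs.get(_primary(src_lang)), segs.get(_primary(tgt_lang))
        if src and tgt:  # skip units that lack either language
            pairs.append((src, tgt))
    return pairs
```

Writing the pairs out as two line-aligned plain-text files (one sentence per line) gives the parallel-corpus layout that Moses training works from.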

Page 4

for sure it is!

<data cleaning + selection>

translation tables and language models

data and parameters for tuning

test measures

<engine creation>

several + pruning afterwards

<engine validation>

by professional translators

<continuous improvement>

new files, new corpora, new rules, etc.
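The data cleaning and selection step can be sketched as a filter over sentence pairs, followed by holding out separate sets for tuning and for the test measures. This is a minimal illustration, similar in spirit to the length and ratio filtering done by Moses's `clean-corpus-n.perl` script but not a port of it; `clean_corpus`, `split_corpus` and every threshold and set size below are hypothetical choices.

```python
import random


def clean_corpus(pairs, min_len=1, max_len=80, max_ratio=9.0):
    """Filter (source, target) sentence pairs before training.

    Drops pairs with an empty side, a side outside [min_len, max_len] tokens,
    a token-count ratio above max_ratio, or an exact duplicate.
    The default thresholds are illustrative assumptions.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src_toks, tgt_toks = src.split(), tgt.split()
        n_src, n_tgt = len(src_toks), len(tgt_toks)
        if n_src == 0 or n_tgt == 0:
            continue
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        pair = (" ".join(src_toks), " ".join(tgt_toks))  # normalized whitespace
        if pair in seen:
            continue
        seen.add(pair)
        kept.append(pair)
    return kept


def split_corpus(pairs, n_tune=2000, n_test=2000, seed=0):
    """Shuffle, then hold out tuning and test sets; the rest is training data."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    tune = shuffled[:n_tune]
    test = shuffled[n_tune:n_tune + n_test]
    train = shuffled[n_tune + n_test:]
    return train, tune, test
```

Keeping the tuning and test sets disjoint from the training data is what makes the test measures meaningful; a fixed seed keeps the split reproducible when engines are rebuilt with new files or corpora.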

Page 5

the production process (I)

statistical MT decoding

convert file format

segment text

NLP tasks

tokenize

rewrite source

lowercase
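A toy version of the pre-processing that feeds the decoder is shown below, applying segmentation, tokenization, source rewriting and lowercasing in the order the slide lists them. It is a sketch under simplifying assumptions: the regular-expression sentence splitter and tokenizer are far cruder than Moses's own scripts, the single rule in `SOURCE_RULES` is hypothetical, and file conversion and the other NLP tasks are left out.

```python
import re

# Hypothetical source rewrite rule, applied to tokenized text: spell out a
# standalone "&" (assumes the training data mostly uses the word "and").
SOURCE_RULES = [
    (re.compile(r"(?<!\S)&(?!\S)"), "and"),
]


def segment(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Split words from punctuation (much cruder than Moses's tokenizer.perl)."""
    return " ".join(re.findall(r"\w+|[^\w\s]", sentence))


def rewrite_source(tokenized):
    """Apply each (pattern, replacement) rule in order."""
    for pattern, replacement in SOURCE_RULES:
        tokenized = pattern.sub(replacement, tokenized)
    return tokenized


def preprocess(text):
    """Return one lowercased, tokenized line per sentence, ready for decoding."""
    return [rewrite_source(tokenize(s)).lower() for s in segment(text)]
```

For instance, `preprocess("Save & exit now. Are you sure?")` yields `["save and exit now .", "are you sure ?"]`.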

Page 6

the production process (II)

statistical MT decoding

translated file

reformat

detokenize

rewrite target

uppercase

evaluate
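The post-processing steps on this slide undo what pre-processing did to the text. The sketch below, a simplification rather than tauyou's actual pipeline, recases each decoded sentence by capitalizing its first letter (a real system would use a trained recaser), removes the spaces that tokenization put around punctuation, applies target-side rewrite rules and rejoins the sentences. The entry in `TARGET_RULES`, a client terminology preference, is a hypothetical example; the evaluation step is not shown.

```python
import re

# Hypothetical client terminology rule applied to the detokenized target.
TARGET_RULES = [
    (re.compile(r"\bordenador\b"), "equipo"),
]


def recase(sentence):
    """Uppercase the first letter; a crude stand-in for a trained recaser."""
    for i, ch in enumerate(sentence):
        if ch.isalpha():
            return sentence[:i] + ch.upper() + sentence[i + 1:]
    return sentence


def detokenize(sentence):
    """Remove the spaces tokenization put around punctuation."""
    sentence = re.sub(r"\s+([.,;:!?%)\]])", r"\1", sentence)  # before closing marks
    sentence = re.sub(r"([(\[¿¡])\s+", r"\1", sentence)        # after opening marks
    return sentence


def rewrite_target(sentence):
    """Apply each (pattern, replacement) rule in order."""
    for pattern, replacement in TARGET_RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence


def postprocess(decoded):
    """Turn decoder output (tokenized, lowercased sentences) into running text."""
    return " ".join(rewrite_target(detokenize(recase(s))) for s in decoded)
```

Recasing skips leading punctuation so that Spanish sentences opening with "¿" or "¡" still get a capital letter.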

Page 7

risk minimization

<tauyou> quality metrics computation

<LSP> time and cost analysis

<LSP> + <tauyou> track the evolution over time
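The slide does not say which quality metrics are computed. One common family compares the raw MT output with its post-edited version; the sketch below computes a word-level edit rate (insertions, deletions and substitutions divided by the length of the post-edited text), which is a simplified TER without block shifts, and averages it per month so the trend can be tracked over time. The metric choice, the function names and the `(month, mt_output, post_edited)` record layout are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions and substitutions."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distance from an empty hypothesis
    for i, hw in enumerate(h, start=1):
        cur = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cur[j] = min(prev[j] + 1,              # delete hw
                         cur[j - 1] + 1,           # insert rw
                         prev[j - 1] + (hw != rw)) # substitute, or free match
        prev = cur
    return prev[-1]


def edit_rate(mt_output, post_edited):
    """Edits per post-edited word: TER without block shifts."""
    return word_edit_distance(mt_output, post_edited) / max(len(post_edited.split()), 1)


def monthly_edit_rate(records):
    """Average edit rate per month from (month, mt_output, post_edited) tuples."""
    by_month = defaultdict(list)
    for month, mt_output, post_edited in records:
        by_month[month].append(edit_rate(mt_output, post_edited))
    return {m: mean(rates) for m, rates in sorted(by_month.items())}
```

A falling monthly edit rate is one signal that continuous improvement of the engines is working; pairing it with the LSP's own time and cost figures shows whether it translates into savings.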

Page 8

practical hints

bigger clients

languages

with highest translation volumes

with similar structure

with specific terminology/needs

MT-friendly translators

start moving

Page 9

some numbers

more than 1,500 million words per month

in Romance languages: ES, FR, PT, CA, GL, IT, RO

EN as source or target language is the most common

ES, FR, DE, PT, IT, DA, SV, ZH, AR, JA...

LSPs are translating more than 3 million words per month

the investment pays off if you translate more than 50,000 words per month
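The slide gives the 50,000-word threshold but not the cost model behind it. One simple way for an LSP to estimate its own break-even volume is to divide the fixed monthly cost of running MT by the saving per post-edited word. The function name and all of its inputs are hypothetical placeholders for the LSP's own figures.

```python
def breakeven_words_per_month(monthly_mt_cost, cost_per_word, saving_fraction):
    """Monthly volume above which MT pays for itself.

    monthly_mt_cost: fixed monthly cost of running the MT engines
    cost_per_word:   translation cost per word without MT
    saving_fraction: share of that per-word cost saved by post-editing MT (0-1)
    All three inputs are placeholders for an LSP's own figures.
    """
    saving_per_word = cost_per_word * saving_fraction
    if saving_per_word <= 0:
        raise ValueError("break-even needs a positive saving per word")
    return monthly_mt_cost / saving_per_word
```

With purely illustrative figures of 1,000 per month in MT costs, 0.10 per word and a 25% saving, the function returns about 40,000 words per month; every word above that volume adds net savings.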

Page 10

Thanks!

// Diego Bartolomé, PhD

<address> C/ Les Planes 39 – 08201 Sabadell – Spain

<phone> +34 93 711 29 96

<cell> +34 670 331 225

<www> tauyou.com