TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monaco, Joel Sigling, AVB, 25 March 2012

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE A Moses MT engine for legal translation By Joël Sigling


A Moses engine for legal translation

This presentation is part of the MosesCore project, which encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. Latest news on Twitter - #MosesCore



A Moses MT engine for legal translation

Modern technology in a traditional sector

Joël Sigling, Director

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Monte Carlo, 25 March 2012


AVB Translations background

• Amstelveens Vertaalburo: founded 1972 – traditional, high-quality agency

• Translation World: founded 2002, tech-savvy all-round player

• Merger in 2010 >> AVB Translations: premium brand with strong tech focus

• Top 5 player in The Netherlands, 2011 turnover € 4.6 million

• Core business: general translations – legal, financial, technical, …

• NO software localization (yet!)


History of MT interest

• Member of TAUS since 2008, 1st round table Amsterdam

• Visited TAUS User Conferences in US since 2009

• Sense of urgency developed, but the merger was a distraction in 2010

• Action in 2011 after merger

• 2011: choice for Dutch <> English legal (not IT-related!) domain engine

• Why SMT, why Moses? Quicker, cheaper, similar quality (research shows)


Why legal domain MT engine?

• Legal translations approx. 40% of AVB business, 80% Dutch <> English

• Not the obvious choice: people said MT wouldn’t work for legal: sentences too long, material too intricate

• Statistical MT suited to non-stylistic materials, e.g. legal

• If this works, we can make MT happen for all other domains


MT engine objectives

• Increased productivity: no BLEU % target, but tangible, practical results. How much more can a translator do compared to HT?

• Tool to offer usable quality with very quick turnarounds for high volume (typical “Friday afternoon lawyer requests”)

• Becoming an MT front runner in the non-localization sector for Dutch (5th language in Europe after FIGS)


Developing the Moses engine

• Choice between in-house and external development

  • In-house: control, developing expertise, lower long-term cost

  • External: lower initial cost, much more expertise > best for now

• Our pre-requisites for development option

  • Ownership and free access to engine

  • Assurance data will not be used or copied by builder

  • Acceptable costs for development & usage

  • Skilled partner > AsiaOnline, CrossLang, Pangeanic, LetsMT, SmartMate??

• CrossLang > all of the above, closest to our office, independent


What we needed

• Large quantities of high-quality translation data

• Aligning existing high-quality legal translations (took longest to prepare)

• Existing legal TMs

• Going forward: company-/industry-specific terminology

• Ways to measure gains

• Not just automated evaluation % increase, but also tangible improvements > we are entrepreneurs, not scientists

• CrossLang automated assessment tool (TER, BLEU, NIST, METEOR)

• Manual assessment: e.g. how many hours for post-editing 10,000 words?
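To make the automated side of this concrete, here is a minimal Python sketch of a word-level edit rate, the idea that TER builds on. It is an illustration only, not the CrossLang tool: real TER also allows block shifts, which this sketch leaves out, and the function names and example sentences are invented for the illustration.

```python
def word_edit_distance(hyp_words, ref_words):
    """Levenshtein distance counted over word tokens."""
    prev = list(range(len(ref_words) + 1))
    for i, hyp_word in enumerate(hyp_words, start=1):
        cur = [i]
        for j, ref_word in enumerate(ref_words, start=1):
            substitution = 0 if hyp_word == ref_word else 1
            cur.append(min(prev[j] + 1,                 # drop a hypothesis word
                           cur[j - 1] + 1,              # insert a reference word
                           prev[j - 1] + substitution)) # match or substitute
        prev = cur
    return prev[-1]


def edit_rate(hypothesis, reference):
    """Word edits needed to reach the reference, divided by reference length.

    Simplified, TER-like score without shifts; lower is better.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return word_edit_distance(hypothesis.split(), ref_words) / len(ref_words)


# Hypothetical MT output versus a hypothetical reference translation.
print(edit_rate("the lessee shall pay the rent", "the tenant shall pay the rent"))
```

A score like this is cheap to compute over a whole test set, but it says nothing about how long a post-editor actually needs, which is why the manual assessment sits next to it.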


Input data

• Highest quality AVB Dutch <> English legal translations: approx. 700k words per language. Predominantly civil law.

• Not fully reviewed AVB TM, still high-quality: approx. 10 million words per language. Predominantly civil law.

• Legal translations harvested by CrossLang, more diverse legal material: 7 million words per language


CrossLang automated test results

• Best results from AVB + harvested data, AVB data weighted extra

• Results particularly good in civil law domain (bulk of AVB input data)

• Results improved dramatically for other legal domains by adding harvested data
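The slides do not say how the AVB data was "weighted extra". One simple approach, shown here purely as an assumption, is to repeat the in-domain sentence pairs when the training corpus is assembled, so that they count more in the statistics. The function name, the weight of 2 and the sample sentence pairs are all hypothetical.

```python
def build_weighted_corpus(in_domain_pairs, harvested_pairs, in_domain_weight=2):
    """Combine two parallel corpora, repeating the in-domain pairs.

    Each pair is a (source_sentence, target_sentence) tuple.
    Oversampling is only one possible weighting scheme (an assumption here).
    """
    if in_domain_weight < 1:
        raise ValueError("in_domain_weight must be at least 1")
    return list(in_domain_pairs) * in_domain_weight + list(harvested_pairs)


# Hypothetical sample data, not taken from the AVB or CrossLang corpora.
avb_pairs = [("de huurder betaalt de huur", "the tenant pays the rent")]
harvested_pairs = [("de rechtbank oordeelt", "the court rules")]

corpus = build_weighted_corpus(avb_pairs, harvested_pairs, in_domain_weight=3)
print(len(corpus))  # 3 copies of the AVB pair plus 1 harvested pair
```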


AVB results in practice

• Test done in CrossLang production assessment tool: productivity 5% higher for post-editing than for human translation (human output in this case very high, >1000 words per hour; PE even higher)


AVB results in practice

• Live rush translations done in past two weeks:

• 1,500 word trial done for law firm needing high volume in very short time. Post-edited in 75 minutes. Customer happy with quality/price ratio.

• 25,000 words in two days with moderate PE effort by two post-editors. Quality estimate 80-90% of human translation.

• 4,500 words in 3 hours with almost full PE effort by one post-editor. Quality estimate >90% of human translation

• 15,000 words in one day, done by two post-editors. Quality estimate 80-90% of human translation
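The throughput behind these jobs can be worked out with a small Python sketch. Two assumptions are made: the 1,500-word trial is treated as handled by a single post-editor (the slide does not say), and the jobs reported in days are expressed per post-editor per day, because the number of working hours is not given. The function names are invented for the illustration.

```python
def words_per_hour(words, minutes, editors=1):
    """Average post-editing speed per editor, in words per hour."""
    return words / (minutes / 60) / editors


def words_per_editor_day(words, days, editors):
    """Average output per post-editor per working day."""
    return words / (days * editors)


print(words_per_hour(1500, 75))           # 1200.0 (one editor assumed)
print(words_per_hour(4500, 180))          # 1500.0
print(words_per_editor_day(25000, 2, 2))  # 6250.0
print(words_per_editor_day(15000, 1, 2))  # 7500.0
```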


AVB results in practice

• Tests and live projects show great potential in two areas:

• Producing usable translations very quickly and at 50-60% of normal translation cost. Margins are similar to normal translation, but likely to improve!

• Higher productivity, i.e. lower production cost and increased margins.


CrossLang Gateway benefits

• Standard Moses engine offers no high-level functions

  • Only plain text files, always sentence by sentence, experimental recasing, experimental tag handling

• CrossLang Gateway offers Java service layer (not wrapper scripts)

  • Most common file formats: Word, XML, XLIFF,

  • Adjustable text segmentation

  • Hardened, alignment-based tag handling

  • Advanced recasing tool based on alignment data

  • Named entity recognition & (re)tokenization

  • Terminology checking and replacement

Gateway features crucial to processing our material properly


Conclusions

• Developing a good engine is not an “out of the box” task

• Sufficient high-quality data is necessary for good results

• Results are very promising, our objectives can be achieved

• Working with a value added partner is recommended

• The need to integrate the MT solution into the translation workflow is apparent


Phone: +31 20 645.66.10

Mobile: +31 625.025.475

E-mail: [email protected]

Twitter: @JoelAVB

Address: Ouderkerkerlaan 50, 1185 AD Amstelveen, The Netherlands

Website: www.avb.nl