TAUS MT SHOWCASE, Moses in the Mix. A Technology Agnostic Approach to a Winning MT Strategy, Lori...

Post on 11-Jun-2015


This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. For the latest updates, follow us on Twitter - #MosesCore

TAUS MACHINE TRANSLATION SHOWCASE

Moses in the Mix: A Technology Agnostic Approach to a Winning MT Strategy

10:50 – 11:10
Wednesday, 12 June 2013

Lori Thicke
LexWorks

Moses in the Mix: A Technology Agnostic Approach to the Winning MT Strategy

•  McKinsey’s definition of the T-shaped company

•  Language Services Provider (Lexcelera, founded 1986; managing translators & post-editors)

•  MT Services Provider (training engines, post-editing, etc.)

•  Technology Agnostic

What is LexWorks?

•  Developing new technologies to help MT work better with community content

Other Technology Agnostics

“A good MT strategy should be technology-agnostic and look for the most efficient solution on a case-by-case basis. The type of technology that best suits your needs will change depending on the language pair.”

All approaches - SMT, RBMT, Hybrid - are good when matched to the course

The process aims to define best of breed solutions for superior performance

MT is not a tool. MT is an industrial process.

1. Best of breed means raw MT that is perfectly understandable

sentences               MS Translator (%)   Systran Hybrid (%)
not understandable      15.65               20.87
partly understandable   20.00               34.78
fully understandable    64.35               44.35

Raw MT for FAQs and Forum Content

                               MS Translator   Systran Hybrid
Average score on FAQ article   2.6             2.4
Average score on forum         2.31            1.97
Overall score                  2.48            2.23

2. Best of breed means managing post-editing costs

3. Best of breed means retaining your post-editors

4. Best of breed means clear metrics

Translation engine     Engine Type   BLEU Score   GTM Score (SymEval)
Systran                Hybrid        69.74        72.69
Moses                  Statistical   50.46        57.93
Microsoft Translator   Statistical   54.01        60.81
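As a rough illustration of how clear metrics can feed an engine choice, the sketch below ranks the three engines by the scores shown on this slide. The scores are copied from the table; the function name rank_engines and the dictionary layout are illustrative assumptions, not part of any tool mentioned in the talk. The ranking only holds for this one test, since the best engine is expected to change with the language pair and content type.

```python
# Scores copied from the slide; higher is better for both metrics.
# The data layout and helper name are hypothetical, chosen for illustration.
scores = {
    "Systran": {"type": "Hybrid", "bleu": 69.74, "gtm": 72.69},
    "Moses": {"type": "Statistical", "bleu": 50.46, "gtm": 57.93},
    "Microsoft Translator": {"type": "Statistical", "bleu": 54.01, "gtm": 60.81},
}


def rank_engines(engine_scores, metric):
    """Return engine names sorted from best to worst on the given metric."""
    return sorted(
        engine_scores,
        key=lambda name: engine_scores[name][metric],
        reverse=True,
    )


if __name__ == "__main__":
    for metric in ("bleu", "gtm"):
        ranking = rank_engines(scores, metric)
        print(f"{metric.upper()}: " + " > ".join(ranking))
```

In practice the same comparison would be repeated per language pair and per content type, so that each "course" gets its own ranking.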


Area         Feature                                                                  RBMT   SMT
Capability   Add rare language pairs
Capability   Number of languages it can handle out of the box                         20     50
Cost         Free or Open Source version exists
Quality      Respects grammatical rules
Quality      Handles software tags properly
Quality      Output is fluent
Quality      Can handle bad grammar
Quality      Quality improves with Controlled Authoring
Quality      Output is predictable
Quality      Retains corrections to terminology (and applies the correct grammar)


Area          Feature                                                                              RBMT   SMT
Suitability   Is better for User Generated Content and broad domain material such as patents
Suitability   Is better suited to on-the-fly translations of short shelf-life content
Suitability   Is better for documentation and even software
Suitability   Is suited for rare language pairs
Suitability   Is better suited to post-editing
Training      Learns automatically
Training      Rapid development customization cycle
Training      Effective with limited training corpus


Languages                   Online   Hybrid   RBMT   SMT
French, Spanish
Russian, Japanese, German
Norwegian, Danish, Thai


Content Type & Other Considerations                     Online   Hybrid   RBMT   SMT
Documentation, reports, online help, UI
FAQs, forums, UGC
Patents, other broad domain
Marketing materials
Insufficient in-domain/out-of-domain data ('I', 'me')
Poor grammar, spelling

Choose the horse that will win on your course
