Post on 26-Dec-2014
TAUS MACHINE TRANSLATION SHOWCASE
Moses and Other Open Resources
09:40 – 10:00, Wednesday, 12 June 2013
Maxim Khalilov, TAUS Labs
MT open resources
o Tools
o Data
o Education and training
o Support
MT open resources: tools
o Open source MT toolkits:
  o Moses (University of Edinburgh, UK + others)
  o Joshua (JHU, USA)
  o cdec (CMU, USA)
  o NiuTrans (Northeastern University, China)
  o Apertium (University of Alicante, Spain)
  o etc.
o Free MT support tools:
  o Word alignment (GIZA++, MGIZA++, Berkeley Aligner, etc.)
  o Language-dependent tools (tokenizers, segmenters, parsers, etc.)
  o MT evaluation metrics and tools (BLEU, TER, METEOR, etc.)
  o Many more…
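To make the evaluation entry above more concrete, here is a minimal sketch of a BLEU-style score in Python. It is a simplification written for illustration, not the reference implementation of any tool listed above: it assumes a single reference, whitespace tokenization, and no smoothing, and the function names `ngrams` and `simple_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU-style score: one reference, no smoothing."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

In practice, scores are usually computed at corpus level with an established tool, since sentence-level scores without smoothing drop to zero whenever a single n-gram order has no match.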
MT open resources: tools
TAUS Tracker: http://www.taustracker.com/
MT open resources: data
Governmental resources:
Name | Description | Domain | Aligned data (average) | Languages
Europarl | European Parliament Proceedings | Legal/“General Domain” | 1.8 million sentences | 11 European languages
JRC-Acquis | EU laws | Legal | 270 000 paragraphs | 22 European languages
Hansards | Canadian Parliament Proceedings | Legal/“General Domain” | 1.3 million sentences | North American English, French
UN | Resolutions of the General Assembly | Legal | 3 million words | English, French, Spanish, Russian, Chinese, Arabic
MT open resources: data
Academic resources:
Name | Description | Domain | Languages
OPUS | Free corpora collected by Jörg Tiedemann | IT, movie subtitles, medical | European, non-European for IT
LDC | Linguistic Data Consortium (US) | News | English, Chinese, Arabic, …
ELRA | European Language Resources Association | | European
MT open resources: data
Industrial resources:
Name | Description | Domain | Languages
TAUS Data* | TAUS Data Repository | Several with slant to IT | All major languages
TMs | Translation Memories (your own, from your customer, from your supplier) | Project-specific (great for v2.0 or later) |
* Open for the participants of the TAUS Developing Talent project.
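As an illustration of how translation memories can become MT training data, the sketch below extracts sentence pairs from a TMX file using only the Python standard library. It assumes the memories are exported as TMX (the slides do not name a format), that language codes match exactly after lowercasing, and that inline markup inside segments can be flattened to plain text. The function name `tmx_to_pairs` and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Open the file.</seg></tuv>
<tuv xml:lang="fr"><seg>Ouvrez le fichier.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Save</seg></tuv></tu>
</body></tmx>"""


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs from TMX content."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside the segment.
                segs[lang] = "".join(seg.itertext()).strip()
        # Keep only units that contain both languages, non-empty.
        if segs.get(src_lang) and segs.get(tgt_lang):
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


print(tmx_to_pairs(SAMPLE, "en", "fr"))
```

Translation units that lack one of the two languages are skipped, which is why the second unit in the sample yields no pair.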
MT open resources: data
✓ 2,200 language pairs
✓ 17 industry categories
✓ more than 54 billion words
MT open resources: education and training
o TAUS MT and Moses tutorial
o Online courses (Coursera.org, Stanford NLP course)
o TAUS Developing Talent project
o Machine Translation Marathons
o Other online resources (JHU MT class, UPC practical tutorial, UEdin MT class)
MT open resources: TAUS MT and Moses Tutorial
o https://tauslabs.com/open-source-mt/mosescore/50-moses-tutorial-guest
o Online tutorial
o Narrated presentations
o Step-by-step screencasts
o Technical audience
o Learn about statistical MT and its practical application, using Moses as the example
Topic | Moses-specific | Presentation/Demo
Principles of Machine Translation | No | Presentation
Training Data
  Data Types and Sources | No | Presentation
  Data Conversion and Corpus Preparation | No | Demo
  Data Cleaning and Tokenization | No | Presentation
  Data Cleaning and Tokenization Demo | No | Demo
Training Moses MT Systems
  Moses Introduction | Yes | Presentation
  Training a Moses MT System | Yes | Demo
  Bulk Translation and MT System Optimization | Yes | Demo
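The data cleaning step listed in the table above can be sketched roughly as follows. This is a minimal illustration of common parallel-corpus filters (empty lines, overlong sentences, implausible length ratios), not the tutorial's own procedure; the function name `clean_parallel` and the threshold values are assumptions chosen for the example.

```python
def clean_parallel(pairs, max_len=80, max_ratio=3.0):
    """Filter (source, target) sentence pairs before MT training.

    max_len and max_ratio are illustrative thresholds, not recommended
    settings; lengths are counted in whitespace-separated tokens.
    """
    kept = []
    for src, tgt in pairs:
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        n_src, n_tgt = len(src_tokens), len(tgt_tokens)
        # Drop pairs where either side is empty.
        if n_src == 0 or n_tgt == 0:
            continue
        # Drop pairs where either side is too long.
        if n_src > max_len or n_tgt > max_len:
            continue
        # Drop pairs whose lengths differ too much to be translations.
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        # Normalize runs of whitespace to single spaces.
        kept.append((" ".join(src_tokens), " ".join(tgt_tokens)))
    return kept


pairs = [
    ("Hello   world", "Bonjour le monde"),
    ("", "vide"),
    ("a", "b c d e f"),
    ("x " * 100, "y " * 100),
]
print(clean_parallel(pairs))
```

Only the first pair survives here: the second has an empty source, the third exceeds the length ratio, and the fourth exceeds the maximum length.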
MT open resources: TAUS MT and Moses Tutorial
Topic | Moses-specific | Presentation/Demo
Evaluating MT Systems
  Automatic Metrics | No | Presentation
  Human Evaluation | No | Presentation
Integration
  Document Translation and Integration Scenarios | Yes | Presentation
  Document Translation and Web API Demo | Yes | Demo
o More to come
  o Demos
  o In-depth Info
  o Commercial Vendor Presentations
MT open resources: TAUS Developing Talent
MT open resources: Support
o Moses support list
  o http://mailman.mit.edu/mailman/listinfo/moses-support
o EAMT MT list
  o http://www.eamt.org/mt-list.php
o Corpora list
  o http://www.hit.uib.no/corpora/