Charles University, Prague · 2013-04-19

MOSES CORE

2012

Annual Report

Charles University, Prague

MOSES CORE www.mosescore.eu

1 Overview

The diversity of languages in Europe makes translation vitally important to the economic, cultural and social lives of Europeans. Machine translation (MT) provides a way of fully or partially automating the translation process, and hence reducing the costs and enabling more text and speech to be translated.

Machine translation, however, is a complex field and presents substantial barriers to entry for potential researchers and for users of the technology. The principal aim of MosesCore is to reduce these barriers, making it easier to join and participate in the MT research community, and to become an MT user.

MosesCore will achieve these aims by organising a variety of events targeted at users, developers and researchers of MT, and by promoting and coordinating the development and use of open-source MT tools, in particular the Moses toolkit.

1.1 Key Facts

Project type   FP7 Coordination Action
Duration       February 1st 2012 - January 31st 2015
Financing      €1.2
Contact        Barry Haddow ([email protected])

1.2 Partners

University of Edinburgh                 United Kingdom
TAUS                                    Netherlands
Charles University, Prague              Czech Republic
Fondazione Bruno Kessler                Italy
Capita Translation and Interpreting     United Kingdom
(formerly Applied Language Solutions)

1.3 Beneficiaries

Researchers have events in which they can showcase their research, compare their systems with others, and gather to implement new MT techniques. They also have a state-of-the-art open-source platform on which to test their ideas.

Users and Developers have a stable and well supported open-source MT toolkit, and have forums to learn about new research developments in MT and share system building and deployment experience.

Everyone benefits from improved information exchange between developers, users and researchers.

2 Events

2.1 Machine Translation Marathon

The Machine Translation Marathon (MTM) is a week-long event where MT researchers and developers gather to discuss and implement the latest MT techniques in open-source software. It also provides a “summer” school for those new to MT, taught by leading MT researchers, an open-source convention to present papers on new MT tools, and a series of invited talks on MT and related topics. This year the Seventh Marathon (MTM12¹) was held at the University of Edinburgh, in Scotland, and attracted approximately 70 participants from a mixture of backgrounds and countries.

The MTM featured invited talks from both academic and commercial MT experts on subjects ranging from discourse and confidence estimation in MT, to large-scale mining of parallel texts, translation process research, and how to make the jump from academia to industry. The summer school included lecturers from the US, Canada, Italy and the UK, and covered all aspects of MT from the basics, through phrase-based and syntactic approaches, to computer-aided translation. The accompanying labs were run as “mini-projects” where participants implemented state-of-the-art MT algorithms under the guidance of leading experts. The MTM also featured orally-presented papers on new tools and developments in open-source MT software.

For many of the Marathon’s attendees, though, the main business of the event is the collection of MT hacking projects which run throughout the week. These involve small groups of MT researchers and developers, with differing levels of experience, working to extend or create a piece of open-source MT software. The projects may involve implementing new research ideas, or creating a new implementation of a recently published idea. This year's hacking projects targeted the most popular open-source decoders (Joshua, cdec and Moses) as well as MT evaluation tools such as DELiC4MT and Asiya. In total there were 13 different projects running during the marathon, and approximately two-thirds of the attendees participated in projects.

¹ www.statmt.org/mtm12

Figure 1: The MT Marathon in Edinburgh

2.2 Industrial Outreach Events

These are organised by TAUS. Our goal was to have 30 people participating in each event. Interest in the sessions exceeded this goal.

GALA Pre-conference 25 March, Monte Carlo (Monaco) with use case presentations from AVB Vertalingen, Capita, Tilde (LetsMT!), Logrus International, Tauyou, Trusted Translations, and TAUS

TAUS Asia Summit Pre-Conference 24 April, Beijing (China, PRC) with use case presentations from Adobe, Capita, Chinese Academy of Sciences, Straker Translations and TAUS

Localization World Pre-Conference 10 June, Paris (France) with use case presentations from CA Technologies, CrossLang, Hunnect, Pangeanic, Sybase, Trusted Translations and TAUS

Localization World Pre-Conference 17 October, Seattle (USA) with use case presentations from Adobe, AVB Translations, EMC, Safaba Translation Solutions, Symantec, Tauyou and TAUS

Figure 2: TAUS at Localization World, Seattle

The speaker presentations and articles summarising the events can be found via www.tauslabs.com/open-source-mt/mosescore/resources1

Over 180 people took part in TAUS Open Source MT Showcases in 2012. Twenty-four of the presentations were uploaded to Slideshare for public access. These slides had been viewed a total of 8,862 times as of 8 November 2012. On average each presentation has been viewed 369 times.
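The per-presentation average quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name `average_views` and the constant names are hypothetical and are not part of any MosesCore or TAUS tooling. The two input figures are the ones given in the text.

```python
def average_views(total_views: int, presentations: int) -> float:
    """Return the mean number of views per presentation."""
    if presentations <= 0:
        raise ValueError("presentations must be positive")
    return total_views / presentations


# Figures quoted in the report (Slideshare, 8 November 2012).
TOTAL_VIEWS = 8862
PRESENTATIONS = 24

mean = average_views(TOTAL_VIEWS, PRESENTATIONS)
# Prints the exact mean and the rounded figure used in the report.
print(f"{mean:.2f} views per presentation, i.e. about {round(mean)}")
```

The exact mean is 369.25, which rounds to the 369 views per presentation reported above.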

2.3 Workshop on Machine Translation

The Workshop on Machine Translation (WMT) is an annual academic workshop, which is normally attached to one of the leading computational linguistics conferences. As well as providing a forum to present the latest research on MT, the workshop includes reports on its associated collection of shared tasks. These tasks provide an opportunity for MT researchers to compare their latest techniques against others in the field.

The first WMT to take place within the MosesCore project timeframe was in Montreal, Canada, attached to the North American Association for Computational Linguistics (NAACL) conference. This WMT was not organised by MosesCore, but the project partners took the opportunity to discuss the running of the shared task and workshop with each other, and with the current organisers. The item that attracted the most attention was how to improve the human evaluation of the shared task output, and to this end we will be piloting some of the suggestions for improvement in late 2012.

The first MosesCore-run WMT and shared task will be attached to the Association for Computational Linguistics (ACL) conference in Sofia, Bulgaria, to be held in August 2013. There will be three different shared tasks, comparing translation systems, MT evaluation metrics, and MT confidence estimation algorithms. New parallel test data will be created specifically for these tasks, and made available to the research community, with Russian featuring as a guest language in this year’s task, alongside English, French, Spanish, German and Czech. The training corpora will also be updated for this year’s tasks, with new releases of the parallel and monolingual corpora.

3 The Moses Toolkit

3.1 Background

Moses² is an open-source toolkit for building statistical machine translation systems. It provides tools to train such systems from parallel data, and a decoder to translate sentences using models trained with the toolkit. The two main statistical MT paradigms (phrase-based and hierarchical) are both implemented in Moses, and its comprehensive coverage of current technologies, together with its liberal LGPL license, has made it popular with both academic and commercial users.

² www.statmt.org/moses

The MosesCore project aims to retain Moses’ place as (arguably) the most popular open-source SMT toolkit by continuing to incorporate new research, whilst improving stability and support. It has funded the appointment of a “Moses Coordinator” (Hieu Hoang) to oversee Moses development.

3.2 Current Development

Contributions to Moses continue to be received from a diverse range of committers, in addition to the core developers mainly based in Edinburgh. Some of the highlights in the first year of MosesCore are:

• An improved build system which links to the test suite.

• Parallelisation of many parts of the training pipeline.

• Support for sparse features, enabling new types of SMT models to be built.

• Integration with translation memories (the result of a collaboration with Systran).

• Integration of a new tuning algorithm (k-best MIRA), contributed by Colin Cherry of NRC Canada.

• Integration of a compact phrase table, contributed by Marcin Junczys-Dowmunt of Adam Mickiewicz University.

Multi-platform support and debugging have been improved by setting up virtual machines at Edinburgh to test the most common platforms used to run Moses. It is now possible to build and run Moses on OS X and Windows, as well as on many flavours of Linux (the primary development platform), and we plan to set up regular automated tests to make sure that committed changes don’t break multi-platform functionality.

3.3 Releases

The MosesCore consortium is committed to providing three releases of Moses, one in each year of the project. To help test the release process, we made release 0.91 available in October 2012.

As well as running the code through all the current automated tests, the release process involved using Moses to produce 4 different types of SMT models in 8 European language pairs. These models have been provided for download to enable Moses users to kick-start their MT system development.

The next release of Moses (version 1.0) is due for the end of year 1 of MosesCore, i.e. February 2013, and will include new installers at least for Windows and Linux.

4 Communications

4.1 Industrial Outreach Newsletter and Website

Dissemination Activity Our aims this year were to raise awareness among industry of the use cases, benefits and challenges of Moses. We sought to achieve these goals through dedicated events, web, e-campaigns, publishing activity and an online MT and Moses Tutorial.

Web and e-campaigns A web landing page was established at tauslabs.com, as well as a project website at www.statmt.org/mosescore/.

In order to keep stakeholders up to date with the latest developments, bi-monthly newsletters have been sent to an opt-in list. These can be found via www.tauslabs.com/open-source-mt/mosescore/newsletters

The e-campaigns have promoted events, helped to disseminate articles and presentations, and promoted the MT and Moses Tutorial. All e-campaigns have been supported by social media activity across the major platforms.

Satisfaction survey At the end of 2012 we plan to undertake a satisfaction survey to collect feedback from event participants, subscribers and tutorial users. The findings will help to plan and execute future activity.

4.2 Moses Website and Mailing List

The main modes of communication between Moses developers and users are the Moses website (www.statmt.org/moses) and the Moses support mailing list.

The Moses website covers not only Moses design and usage, but includes articles on MT background and theory, and tutorial materials on phrase-based and chart-based MT. The website is compiled nightly into a 250-page PDF manual, and made available for download from the site. In the first year of MosesCore much of the website was updated, with new tutorial and overview material added, and out-of-date information removed. Maintenance of the Moses website is an ongoing MosesCore task.

The Moses support mailing list (receiving an average of 6-7 emails per day) is an open forum where all aspects of the Moses toolkit, parallel corpora and machine translation in general are discussed. Many questions are answered by other Moses users; however, the MosesCore members play a “sweeper” role, ensuring that all questions receive a satisfactory response.

Figure 3: The Moses website

4.3 Tutorials

AMTA 2012 Philipp Koehn and Hieu Hoang (Edinburgh) presented a tutorial on how to use the Moses open-source machine translation system at the AMTA (Association for Machine Translation in the Americas) conference. The tutorial covered the basics of statistical machine translation, data preparation, use of different models, use of visualisation tools, optimisation of performance in the face of different data conditions, and practical problems such as integration with translation memory and translation of markup. The tutorial was attended by 10 participants, mostly technical staff from industry.

MT and Moses Tutorial This online tutorial was created by TAUS, with some input from Edinburgh. Providing guidelines on how to get up and running with Moses, it was launched in September 2012. By 8 November there were 232 registered users, spanning translation buyers, service providers and academia. The free tutorial can be found at http://www.tauslabs.com/open-source-mt/mosescore/moses-tutorial. The content of the tutorial will be periodically updated with new modules.

4.4 Publications

Philipp Koehn (2012), Simulating Human Judgement in Machine Translation Evaluation Campaigns, Proceedings of IWSLT

Figure 4: The MT and Moses Tutorial
