TAUS Data Association | Consultation: Strategic Roadmap - Webinar (May 12, 2010)

TAUS Data Association Roadmap Consultation Webinar 5PM CEST / 8 AM PDT 12 May 2010

description

The proposed developments were wide-reaching and have significant implications for how the industry conducts business. We received 680 usable responses to the consultation questionnaire and a wealth of ideas for further new features and services. Responses came from every stakeholder group in the industry: translators, corporate buyers, public sector buyers, service providers, technology vendors, academia, and consultants / sector analysts / commentators. A very large majority recognize the benefits of sharing translation memories. There was strong endorsement of plans to give users and members more intelligent and easier access to data through translation matching and open APIs for services. View the presentation to see how people voted, what has been prioritized, and when new services will be delivered.

Transcript of TAUS Data Association | Consultation: Strategic Roadmap - Webinar (May 12, 2010)

Page 1: TAUS Data Association | Consultation: Strategic Roadmap - Webinar (May 12, 2010)

TAUS Data Association Roadmap Consultation Webinar
5 PM CEST / 8 AM PDT
12 May 2010

Page 2

Hosts

Jaap van der Meer, Director

Rahzeb Choudhury, Operations Director

Page 3

Webinar Participants

280 registered
50% translators
30% service providers
10% buyers (held buyer webinar 28 April)
5% technology providers
5% other

Participants came from all corners of the globe, and all reviewed the following three tables and indicated their needs and preferences:

Page 4

How People Use TDA Today

TERMINOLOGY
Services: Free services. TAUS Search to look up terms and phrases. Upload translation memories (TMs) “For Search Only”.
Result: Most people can’t look up terms and phrases in the whole corpus of their TMs. TAUS Search lets everyone find translations in all of their uploaded TMs and compare them across an industry. This helps to solve translation and review bottlenecks, saving time and increasing quality and consistency.
Saving: 5%-10%

TRANSLATION MEMORY
Services: Member services. Download TMs to obtain additional leverage beyond that provided by your own TMs. Download by industry or data owner while checking the volume counter. Import the TMX files into your regular translation editor and start leveraging translations.
Result: Classic segment-level leveraging tools typically provide small increases. Advanced in-segment leveraging tools, which can search out matches from large TM corpora by using statistical routines and sometimes linguistic intelligence, can generate 10% to 50% or more high-fuzzy matches.
Saving: 10%-50%

MACHINE TRANSLATION
Services: Member services. Download TMs to get data to train machine translation (MT) engines. Download by language pair, data owner, industry and/or content type.
Result: The Association’s members have experienced increases in quality of up to 50% as a result of using TMs from TDA. Good quality MT output often leads to a doubling of translation/post-editing productivity, or allows publishers to provide real-time translation when the translation only needs to be of usable quality.
Saving: 50%

Page 5

Proposed New Features & Services

TERMINOLOGY

Feature or service: TAUS Search - Multi-word translation. Now we compute translations for single words only. Extend this computed translation to include phrases.
Benefits: Better translation quality, saving more time and cost.

Feature or service: TAUS Search - Synonym search. Automatically finds related terms and their translations in context.
Benefits: Better translation quality, saving more time and cost.

Feature or service: TAUS Search - Matrix search. Search across all language pairs (instead of primarily from and into English).
Benefits: Make TAUS Search beneficial for more users and more languages.

TM

Feature or service: Tool compliance. Currently all TMs are stored in TMX format at TDA. This feature allows users to also store TMs in specific tool-compliant formats.
Benefits: Optimizes leveraging within tools. TDA hosts data used by translators in distributed networks.

Feature or service: Translation Matching. Allows users to upload new documents and retrieve all matches from the entire TDA repository in TMX format.
Benefits: Easily retrieve all matches from the entire database to increase leverage/productivity.

TM & MT

Feature or service: TM Cleaning. A statistical tool that filters out suspicious translation units.
Benefits: Eliminates bad quality translations.

Feature or service: Matrix TM. Allows users to extract TMs from TDA in all language pairs (so long as data owner and product line correspond).
Benefits: Allows TM leveraging in new languages.

Feature or service: Matching scores. A statistical tool to identify the best matching data by zooming in or out depending on volume or accuracy requirements.
Benefits: Ideal for optimizing data selection.

MT

Feature or service: MT Trainer. Allows users to upload TMs and request new engines to be trained through TDA users based on TDA data sets.
Benefits: Buyers can access and compare engines for all languages and industries.

Feature or service: Genre identification. Statistical tool that identifies content types, helping users to select data of the same genre for MT training.
Benefits: Ideal for optimizing data selection.

Private: members may limit sharing of TMs to their own selection of registered users (‘private vaults’).

Integration: APIs for all services will be available to everyone.

Page 6

Use Scenarios

TERMINOLOGY

1 Search terms and phrases through www.tausdata.org or widget. (Membership: not required)
2 Integrate the TAUS Search in your software, portal or web site. (Membership: not required)
3 Upload your TMs so that they can be searched for terms and phrases (“For Search Only”). (Membership: not required)

TRANSLATION MEMORY

4 Upload your TMs so that they can be downloaded, leveraged and used by all TDA members. (Membership: not required)
5 Download TMs and import them in your own translation tool for leveraging. (Membership: yes)
6 Define your own ‘private vault’ to restrict the distribution of your TMs to specific registered users. (Membership: yes)
7 Integrate TM Sharing (uploading and downloading) in your own translation environment. (Membership: not required)
8 Use Translation Matching to retrieve full and fuzzy matches from the entire TDA database and import them in your own translation environment for editing. (Membership: yes)
9 Integrate the Translation Matching service in your own or preferred software. (Membership: not required)

MACHINE TRANSLATION

10 Upload your TMs and request the training of an MT engine using the MT Trainer & Evaluator service. (Membership: yes)
11 Integrate the MT Trainer & Evaluator in your own software, portal or web site. (Membership: not required)

Registration and acceptance of terms is required for all use scenarios.

Page 7

Agenda

Implement an industry vision
Background
Status
Consultation
Roadmap
To share or not to share
Beyond current proposals
Questions
Next steps
Thanks again

Page 8

Implement an industry vision

See TAUS animation on 2000 years of translation on YouTube or dotSUB – Translation: yesterday, today, tomorrow

Share translation memories ...
to extend our investments
to translate much more content
to help the world communicate better!

Page 9

Background - from closed to open industry

The evolution is irreversible ...
From fragmentation to consolidation ...
From closed to open ...
From desktop to enterprise server to industry-shared platforms.

Sharing language data accelerates automation and innovation in the translation industry.

[Figure: diagram with elements labeled “TM” and “Industry-shared language data”]

Page 10

Background - milestones

March 2007 Idea is born in Taos, New Mexico

July 2008 TDA established by 40 members, TAUS publishes Localization Business Innovation White Paper

March 2009 TDA platform launched. On schedule, on budget

August 2009 Member pilot projects prove benefits

October 2009 TAUS Search v2 launched. Usage takes off rapidly

February 2010 Share for TAUS Search only (free)

April-May 2010 Roadmap 2010 Consultation

Page 11

Status – TDA and the market

Sixty-five member organizations

7 billion words downloaded, 2.6 billion uploaded

Sharing translation memories growing industry practice

Translation tools moving to the “Cloud”

TDA is the only member-driven and industry-sanctioned choice

Page 12

Consultation

680 usable responses to survey

45% had not heard of TDA before the consultation

Overall 85% believe/agree sharing translation memories brings improvements to term use, leveraging, machine translation quality

94% want to make use of TAUS Search

75% want TAUS Search integrated into own environment

86% want to be able to download data with translation matches

70% want private vaults for limited sharing

Page 13

Roadmap in implementation

TM Cleaning already in place – tag filters, remove corrupt characters, remove corrupt XML, reject duplicates, flag missing translations, users report errors via TAUS Search
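As a rough illustration of the rule-based cleaning steps listed above, the following is a minimal sketch and not TDA's actual implementation. The function names `clean_text` and `clean_units`, the tag pattern, and the definition of a "corrupt character" (control characters and the Unicode replacement character) are all assumptions made for the example. Validation of corrupt XML is not covered.

```python
import re
import unicodedata

# Assumption: inline markup looks like <b>...</b>; real TM tags vary by tool.
TAG_RE = re.compile(r"<[^>]+>")


def clean_text(text):
    """Strip inline tags and characters assumed to be corrupt."""
    text = TAG_RE.sub("", text)
    # Assumption: "corrupt" means the replacement character U+FFFD or any
    # control character other than ordinary whitespace.
    text = "".join(
        ch for ch in text
        if ch != "\ufffd"
        and (ch in "\t\n " or unicodedata.category(ch) != "Cc")
    )
    # Collapse runs of whitespace left behind by the removals.
    return " ".join(text.split())


def clean_units(units):
    """Return (kept, flagged) for a list of (source, target) pairs.

    kept    -- cleaned, de-duplicated translation units
    flagged -- cleaned source segments whose translation is missing
    """
    seen = set()
    kept, flagged = [], []
    for source, target in units:
        source, target = clean_text(source), clean_text(target)
        if not source:
            continue  # nothing usable in this unit
        if not target:
            flagged.append(source)  # flag missing translation
            continue
        unit = (source, target)
        if unit in seen:
            continue  # reject duplicate
        seen.add(unit)
        kept.append(unit)
    return kept, flagged
```

Calling `clean_units` on a list of `(source, target)` pairs returns the units that survive cleaning together with the sources that need a translation, so flagged items can be reviewed rather than silently dropped.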

Free TAUS Search API available by end May

Translation Matching using scaled up open-source GlobalSight engine scheduled for October (sponsored by Welocalize)

• Free translation matching API to reuse short strings
• TDA integration layer enabling connectivity with any CAT tool using XLIFF
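To make the idea of full and fuzzy matches concrete, here is a minimal sketch of translation matching against a small in-memory TM. It does not reflect how the GlobalSight engine or the planned TDA API works; it swaps in Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure, and the function name `find_matches` and the 0.75 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def find_matches(segment, memory, threshold=0.75):
    """Return (score, source, target) tuples for TM entries similar to segment.

    memory    -- list of (source, target) translation units
    threshold -- hypothetical minimum similarity (0.0 to 1.0) to count as a match
    A score of 1.0 is a full match; lower scores are fuzzy matches.
    """
    results = []
    for source, target in memory:
        # Character-level similarity, case-insensitive.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            results.append((score, source, target))
    # Best matches first.
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

A translator's tool could call `find_matches` for each new segment and offer the highest-scoring target as a starting point for editing. A production service would need indexing to search a repository of billions of words, which this linear scan does not attempt.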

Private Vaults for limited sharing – by end of year

Page 14

Roadmap in pilot or testing

TM Cleaning to come – Use statistical algorithms to flag bad translations. Proven in research; now being tested for scalability

Matching Scores – for more granular data selection for MT training & TM leveraging; being tested

MT Trainer – currently piloting. To enable fully automated customization of machine translation engines. Prospective buyer loads sample file & own data. This is sent with additional data from TDA to MT vendors who train engines. Output quality is benchmarked to provide objective reporting across engines from various providers. Will allow small specialized solution providers to connect with buyers.

Results to be presented 20 May during the TAUS Executive Forum in Copenhagen

Page 15

Roadmap

Improvements to TAUS Search, matrix search and matrix TM, genre identification, full MT trainer service, tool compliance (saving in tool specific formats) will be delivered next year

Page 16

To share or not to share

50% willing to share TMs for others to download
60% willing to share for TAUS Search only
70% happy to share in private vaults

Page 17

Beyond current proposals

Responses to consultation include a wealth of ideas on new features: forums, open source middleware, glossary creation tools, flagging jargon and much more

People / companies offering services instead of sharing data

We will report more use cases, provide more guidance on good practices with shared data, and create a blog

Assessing potential to open-source TDA technology

Page 18

Questions

30 minutes were taken to answer questions on a wide variety of issues. A few answers:

Member fees represent a collective investment by all stakeholders in the industry to create a common infrastructure, so that everyone can win from sharing and benefit from reuse of translation memories

Yes, TDA is partnering with other industry organizations

Legal issues are covered by the sharing and pooling conditions found via the JOIN TDA web page

Page 19

Next steps

May 19 - Members vote on reduced entry-level member fees (EGM in Copenhagen)

May 20 - Report results of MT Trainer Pilot (Copenhagen)

End May - Open API for TAUS Search

June - TAUS Data Association blog
  Report more use cases
  MultiCorpora’s MultiTrans fully integrated for seamless upload and download

July - Lionbridge’s Translator Workspace fully integrated

October - Translation matching live with open API and XLIFF
  GlobalSight fully integrated
  TDA Annual General Meeting, Portland (OR), USA

Page 20

Thanks!

Questions and comments are very welcome

[email protected]

[email protected]