TAUS Data Association | Consultation: Strategic Roadmap - Webinar (May 12, 2010)
TAUS Data Association Roadmap Consultation Webinar
5 PM CEST / 8 AM PDT
12 May 2010
Hosts
Jaap van der Meer, Director
Rahzeb Choudhury, Operations Director
Webinar Participants
280 registered
50% translators
30% service providers
10% buyers (held buyer webinar 28 April)
5% technology providers
5% other
Participants came from all corners of the globe. All reviewed the following three tables and indicated their needs and preferences:
How People Use TDA Today
TERMINOLOGY
Services: Free services. Use TAUS Search to look up terms and phrases. Upload translation memories (TMs) “For Search Only”.
Result: Most people can’t look up terms and phrases in the whole corpus of their TMs. TAUS Search lets everyone find translations in all of their uploaded TMs and compare them across an industry. This helps to solve translation and review bottlenecks, saving time and increasing quality and consistency.
Saving: 5%-10%
TRANSLATION MEMORY
Services: Member services. Download TMs to obtain additional leverage beyond that provided by your own TMs. Download by industry or data owner while checking the volume counter. Import the TMX files into your regular translation editor and start leveraging translations.
Result: Classic segment-level leveraging tools typically provide small increases. Advanced in-segment leveraging tools, which search out matches from large TM corpora using statistical routines and sometimes linguistic intelligence, can generate 10% to 50% or more high-fuzzy matches.
Saving: 10%-50%
MACHINE TRANSLATION
Services: Member services. Download TMs to get data to train machine translation (MT) engines. Download by language pair, data owner, industry and/or content type.
Result: The Association’s members have experienced increases in quality of up to 50% as a result of using TMs from TDA. Good quality MT output often leads to a doubling of translation/post-editing productivity, or allows publishers to provide real-time translation when the translation only needs to be of usable quality.
Saving: 50%
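The segment-level leveraging described in the TRANSLATION MEMORY row can be illustrated with a small sketch. It is not TDA's matching engine: it assumes a character-based similarity from Python's standard `difflib`, a hypothetical 75% threshold, and made-up sample segments, simply to show how a new sentence is scored against TM entries.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Similarity of two segments as a percentage (0-100), ignoring case."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(segment, memory, threshold=75.0):
    """Return (score, source, target) tuples at or above threshold, best first.

    `memory` is a list of (source, target) pairs; `threshold` is an
    illustrative cut-off, not a value taken from the webinar.
    """
    scored = [(fuzzy_score(segment, src), src, tgt) for src, tgt in memory]
    hits = [m for m in scored if m[0] >= threshold]
    return sorted(hits, key=lambda m: m[0], reverse=True)


# Hypothetical English-Dutch translation memory.
memory = [
    ("Click the Save button.", "Klik op de knop Opslaan."),
    ("Click the Cancel button.", "Klik op de knop Annuleren."),
    ("Restart your computer.", "Start uw computer opnieuw op."),
]

matches = best_matches("Click the Save button now.", memory)
for score, src, tgt in matches:
    print(f"{score:5.1f}%  {src}  ->  {tgt}")
```

A larger shared corpus gives `best_matches` more candidates to score, which is the intuition behind the extra leverage claimed for downloaded TMs.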
Proposed New Features & Services
TERMINOLOGY
Feature or service: TAUS Search - Multi-word translation. Now we compute translations for single words only. Extend this computed translation to include phrases.
Benefits: Better translation quality and saving more time and cost.
Feature or service: TAUS Search - Synonym search. Automatically finds related terms and their translations in context.
Benefits: Better translation quality and saving more time and cost.
Feature or service: TAUS Search - Matrix search. Search across all language pairs (instead of primarily from and into English).
Benefits: Make TAUS Search beneficial for more users and more languages.
TM
Feature or service: Tool compliance. Currently all TMs are stored in TMX format at TDA. This feature allows users to also store TMs in specific tool-compliant formats.
Benefits: Optimizes leveraging within tools. TDA hosts data used by translators in distributed networks.
Feature or service: Translation Matching. Allows users to upload new documents and retrieve all matches from the entire TDA repository in TMX format.
Benefits: Easily retrieve all matches from the entire database to increase leverage/productivity.
TM & MT
Feature or service: TM Cleaning. A statistical tool that filters out suspicious translation units.
Benefits: Eliminates bad quality translations.
Feature or service: Matrix TM. Allows users to extract TMs from TDA in all language pairs (so long as data owner and product line correspond).
Benefits: Allows TM leveraging in new languages.
Feature or service: Matching scores. A statistical tool to identify the best matching data by zooming in or out depending on volume or accuracy requirements.
Benefits: Ideal for optimizing data selection.
MT
Feature or service: MT Trainer. Allows users to upload TMs and request new engines to be trained through TDA users based on TDA data sets.
Benefits: Buyers can access and compare engines for all languages and industries.
Feature or service: Genre identification. Statistical tool that identifies content types, helping users to select data of the same genre for MT training.
Benefits: Ideal for optimizing data selection.
Private: members may limit sharing of TMs to their own selection of registered users (‘private vaults’).
Integration: APIs for all services will be available to everyone.
Use Scenarios
TERMINOLOGY
1 Search terms and phrases through www.tausdata.org or widget. Membership: Not required
2 Integrate the TAUS Search in your software, portal or web site. Membership: Not required
3 Upload your TMs so that they can be searched for terms and phrases (“For Search Only”). Membership: Not required
TRANSLATION MEMORY
4 Upload your TMs so that they can be downloaded, leveraged and used by all TDA members. Membership: Not required
5 Download TMs and import them in your own translation tool for leveraging. Membership: Yes
6 Define your own ‘private vault’ to restrict the distribution of your TMs to specific registered users. Membership: Yes
7 Integrate TM Sharing (uploading and downloading) in your own translation environment. Membership: Not required
8 Use Translation Matching to retrieve full and fuzzy matches from the entire TDA database and import them in your own translation environment for editing. Membership: Yes
9 Integrate the Translation Matching service in your own or preferred software. Membership: Not required
MACHINE TRANSLATION
10 Upload your TMs and request the training of an MT engine using the MT Trainer & Evaluator service. Membership: Yes
11 Integrate the MT Trainer & Evaluator in your own software, portal or web site. Membership: Not required
Registration and acceptance of terms is required for all use scenarios.
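Several of the scenarios above involve importing downloaded TMX files into your own environment. The following is a minimal sketch of reading translation units from a TMX document with Python's standard library. The sample file content, the header attribute values, and the `read_tmx` helper are all assumptions made for illustration; the sketch does not use any TDA API.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX content; real downloads will be far larger.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en-US"
          srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Click the Save button.</seg></tuv>
      <tuv xml:lang="nl-NL"><seg>Klik op de knop Opslaan.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Restart your computer.</seg></tuv>
      <tuv xml:lang="nl-NL"><seg>Start uw computer opnieuw op.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>No Dutch version yet.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(text, src_lang, tgt_lang):
    """Return (source, target) pairs for units that contain both languages."""
    root = ET.fromstring(text)
    src_key, tgt_key = src_lang.lower(), tgt_lang.lower()
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside <seg> to text.
                segs[tuv.get(XML_LANG, "").lower()] = "".join(seg.itertext()).strip()
        if src_key in segs and tgt_key in segs:
            pairs.append((segs[src_key], segs[tgt_key]))
    return pairs


pairs = read_tmx(SAMPLE_TMX, "en-US", "nl-NL")
print(pairs)
```

Units that lack one of the two requested languages are skipped, so the third sample unit does not appear in `pairs`.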
Agenda
Implement an industry vision
Background
Status
Consultation
Roadmap
To share or not to share
Beyond current proposals
Questions
Next steps
Thanks again
Implement an industry vision
See TAUS animation on 2000 years of translation on YouTube or dotSUB – Translation: yesterday, today, tomorrow
Share translation memories …
to extend our investments
to translate much more content
to help the world communicate better!
Background - from closed to open industry
The evolution is irreversible: from fragmentation to consolidation, from closed to open, from desktop to enterprise server to industry-shared platforms.
Sharing language data accelerates automation and innovation in the translation industry.
[Figure: many individual TMs pooled into industry-shared language data]
Background - milestones
March 2007 Idea is born in Taos, New Mexico
July 2008 TDA established by 40 members, TAUS publishes Localization Business Innovation White Paper
March 2009 TDA platform launched. On schedule, on budget
August 2009 Member pilot projects prove benefits
October 2009 TAUS Search v2 launched. Usage takes off rapidly
February 2010 Share for TAUS Search only (free)
April-May 2010 Roadmap 2010 Consultation
Status – TDA and the market
Sixty-five member organizations
7 billion words downloaded, 2.6 billion uploaded
Sharing translation memories is a growing industry practice
Translation tools moving to the “Cloud”
TDA is the only member-driven and industry-sanctioned choice
Consultation
680 usable responses to survey
45% had not heard of TDA before the consultation
Overall, 85% agree that sharing translation memories improves term use, leveraging and machine translation quality
94% want to make use of TAUS Search
75% want TAUS Search integrated into their own environment
86% want to be able to download data with translation matches
70% want private vaults for limited sharing
Roadmap in implementation
TM Cleaning already in place – tag filters, remove corrupt characters, remove corrupt XML, reject duplicates, flag missing translations, users report errors via TAUS Search
Free TAUS Search API available by end May
Translation Matching using scaled up open-source GlobalSight engine scheduled for October (sponsored by Welocalize)
• Free translation matching API to reuse short strings
• TDA integration layer enabling connectivity with any CAT tool using XLIFF
Private Vaults for limited sharing – by end of year
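The rule-based TM Cleaning steps listed on this slide (tag filters, removing corrupt characters, rejecting duplicates, flagging missing translations) can be sketched as a small pipeline. This is an illustrative approximation, not TDA's implementation: the tag pattern, the definition of a "corrupt" character, and the sample units are assumptions, and the corrupt-XML and user-reporting steps are not covered.

```python
import re
import unicodedata

# Assumed tag filter: anything that looks like an inline <tag>.
TAG_RE = re.compile(r"<[^>]+>")


def clean_text(text):
    """Strip inline tags, drop corrupt characters, normalise whitespace."""
    text = TAG_RE.sub("", text)
    # Assumption: "corrupt" means U+FFFD or a non-whitespace control character.
    text = "".join(
        ch for ch in text
        if ch != "\ufffd" and (unicodedata.category(ch) != "Cc" or ch.isspace())
    )
    return " ".join(text.split())


def clean_tm(units):
    """Return (kept, flagged) lists of (source, target) pairs.

    Units with an empty side are flagged as missing translations;
    exact duplicates (after cleaning) are rejected silently.
    """
    kept, flagged, seen = [], [], set()
    for source, target in units:
        pair = (clean_text(source), clean_text(target))
        if not pair[0] or not pair[1]:
            flagged.append(pair)
        elif pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept, flagged


# Hypothetical translation units.
units = [
    ("Click <b>Save</b>.", "Klik op <b>Opslaan</b>."),
    ("Click Save.", "Klik op Opslaan."),             # duplicate once tags are removed
    ("Restart\ufffd now.", "Start nu opnieuw op."),  # contains a replacement character
    ("Untranslated segment.", ""),                   # missing translation
]

kept, flagged = clean_tm(units)
print(kept)
print(flagged)
```

Flagged units are returned rather than discarded so that a reviewer can decide what to do with them.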
Roadmap in pilot or testing
TM Cleaning to come – use statistical algorithms to flag bad translations. Proven in research, now being tested for scalability
Matching Scores – for more granular data selection for MT training & TM leveraging. Being tested
MT Trainer – currently piloting. To enable fully automated customization of machine translation engines. A prospective buyer loads a sample file and their own data. This is sent, with additional data from TDA, to MT vendors who train engines. Output quality is benchmarked to provide objective reporting across engines from various providers. Will allow small specialized solution providers to connect with buyers.
Results to be presented 20 May during the TAUS Executive Forum in Copenhagen
Roadmap
Improvements to TAUS Search, matrix search and matrix TM, genre identification, the full MT Trainer service and tool compliance (saving in tool-specific formats) will be delivered next year
To share or not to share
50% willing to share TMs for others to download
60% willing to share for TAUS Search only
70% happy to share in private vaults
Beyond current proposals
Responses to consultation include a wealth of ideas on new features: forums, open source middleware, glossary creation tools, flagging jargon and much more
People / companies offering services instead of sharing data
We will report more use cases, provide more guidance on good practices with shared data, and create a blog
Assessing potential to open-source TDA technology
Questions
30 minutes were taken to answer questions on a wide variety of issues. A few answers:
Member fees represent a collective investment by all stakeholders in the industry to create a common infrastructure, so that everyone wins from sharing and benefits from reuse of translation memories
Yes, TDA is partnering with other industry organizations
Legal issues are covered by the sharing and pooling conditions found via the JOIN TDA web page
Next steps
May 19 - Members vote on reduced entry-level member fees (EGM in Copenhagen)
May 20 - Report results of MT Trainer Pilot (Copenhagen)
End May - Open API for TAUS Search
June - TAUS Data Association blog
Report more use cases
MultiCorpora’s MultiTrans fully integrated for seamless upload and download
July - Lionbridge’s Translator Workspace fully integrated
October - Translation matching live with open API and XLIFF
GlobalSight fully integrated
TDA Annual General Meeting, Portland (OR), USA
Thanks!
Questions and comments are very welcome