TAUS Translation Data Landscape Report
Authors: Andrew Joscelyne & Anna Samiotou
Reviewer: Jaap van der Meer
The report…
• was published in December 2015
• has been written by TAUS in consultation with the EU project LT Observatory supervised by LT Innovate
• has drawn insights through surveys of industry and interviews with a broad range of stakeholders
The report attempts to answer:
• Who are the producers and consumers of translation data? How are they changing?
• Is there a viable “market” for translation data, beyond the current informal sharing or web-scraping model?
• What can we do to overcome the legal/technical issues and concerns regarding translation data sharing?
• How could translation data sharing as a natural practice integrate with the European Digital Single Market program?
• Which models of translation data circulation work best? For how long? What could disrupt them?
Methods to obtain Translation data
• Leveraging public and open resources
• Creating one’s own resources by human, semi-automatic or automatic means
• Scraping the web by web crawling: Parallel text collections to be used mainly by MT systems
• Sharing or exchanging data
• Paying for data: Stakeholders will pay for translation data when it is known to be uniquely valuable in terms of relevance and impact to the task at hand, is affordable, and there is no other solution
Translation data user types
Scenarios for a Translation data Marketplace
• Datasets: Buy data, sell data, exchange data, bid for data, order data, offer specific in-domain translation data.
• Datasets & Tools: A commercial service for translation data together with multilingual enablers and tools that can fingerprint, curate and benchmark the data, and validate its quality and relevance to the task at hand.
• Trained domain MT engines: Deliver in-domain translation engines
• Plug & play model: This is the model currently used for accessing a service in one go.
Translation data provision models SWOT analysis 1/2
Translation data provision models SWOT analysis 2/2
How about a Translation data Marketplace?
Drivers: highly globalized market – providing translation data at a reasonable price – allowing for benchmarking prior to purchase
Inhibitors: using other people’s resources can be a blind guess – current lack of tools – imbalance of high- and low-resource languages
Challenges: enhance language coverage – address the high risk of local markets being edged out by global players and by plug & play technologies
Impact of drivers and inhibitors
Critical determinants of the way ahead
• We are at the beginning of the translation data age.
• Content will be king and queen.
• Innovation will be vital: many different competing solutions will emerge for streamlining the value chain between raw data and specific translation requirements.
• The term “translation data” has two meanings:
– we need the data to drive translation automation.
– we also vitally need data about translation: find good data about global data usage.