Tools of the Trade
Angelika Zerfass
Consultant / Trainer for Translation Tools
Agenda
What tools are used in translation / localization
Overview of the main functionalities of the tools in each category
Special topics like word counts and TMX
How to evaluate what tool is right for you
Tools in Translation / Localization
Translation Memory Tools
Software Localization Tools
Terminology Management Tools
Project Management Tools
Workflow Management Tools
Additional Utilities
Translation Memory Tools
Same idea different approach
Re-use of already translated segments
Database / list for terminology
Alignment of translated material where no TM tool was used
Statistics on the number of words and the number of re-usable segments
Same idea different approach
Components:
- Repository for text pairs / sentence pairs
- Editor for translation
- Repository for terminology
- Alignment
Editor: Word, separate editor, or editor within the TM tool
Repository:
- Segment pair database per language pair
- Segment pair database per project (multilingual)
- Segment pairs in separate reference files
- Paragraph pairs / parallel documents
- Segment pairs by ID number (software localization)
Project setup:
- All settings within one tool
- Separate tools for separate tasks (terminology, conversion, translation)
Same idea different approach
Project setup:
- One file format / language pair per project
- Several file formats / languages per project
Statistics:
- Word count in source / target language
- Recycling of segments in the source language
- Repetitions
Alignment:
- Re-use of previously translated files (source and target language files)
- One-to-one alignment / many-to-one alignment
Terminology management:
- Term list / term database
- Terminology extraction (monolingual / bilingual)
QA features:
- Check for file structure integrity
- Check for missed translations, numbers
- Check for correct usage of terminology
Macros within Word
Wordfast
Metatexis
Trados
TM + Editor
Two applications open for translation
Connection between them needs to be established
Setup of the translation memory is independent of the file to translate
Working on single files in the editor
Working on batches of files in the TM system (pre-translation)
Translation Memory Tool
Terminology Tool
Editor
Terminology Extraction Tool
Source language files
Target language files
Alignment Tool
Separate tools
Trados with Word
Trados with TagEditor
Integrated tools
Setup of a project before translation can start
Selection of file format, languages, files and location for log files with every project
Re-usable settings from previous projects
Files are imported into the tool; translations are exported from the tool
Integrated tools
Translation Repository
Old source file
Old target file
Read in reference material
Target files
Generation of target files
Files to translate
Read in new files
Alignment Component
Translation Editor
Terminology Component
Heartsome Editor
SDLX
switchboard
SDLX
Project setup
SDLX
target window / source window
terminology window
MemoQ
MemoQ
Transit
target window
status window / terminology window
source window
Déjà Vu
Word counts in Translation Tools
[email protected] [email protected]
What is counted in an analysis?
Chars/Word: average number of characters per word
Chars Total: number of characters in counted words (no spaces or stand-alone numbers)
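A minimal sketch of how the two analysis figures relate, assuming a simplified word rule: a whitespace-delimited token counts only if it contains at least one ASCII letter, so stand-alone numbers are skipped (as in the Trados rule described under Word Count). `analyze_chars` and `chars_per_word` are hypothetical names, not any tool's API; punctuation attached to a word is counted as characters here, which real tools may handle differently.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t words;        /* tokens that contain at least one letter */
    size_t chars_total;  /* characters in those tokens, no spaces */
} char_stats;

/* Counts words and their characters; stand-alone numbers are skipped. */
char_stats analyze_chars(const char *text) {
    char_stats st = {0, 0};
    const unsigned char *s = (const unsigned char *)text;
    while (*s) {
        while (*s && isspace(*s)) s++;        /* skip whitespace */
        size_t len = 0;
        bool has_letter = false;
        while (*s && !isspace(*s)) {          /* scan one token */
            if (isalpha(*s)) has_letter = true;
            len++;
            s++;
        }
        if (has_letter) {
            st.words++;
            st.chars_total += len;
        }
    }
    return st;
}

/* Chars/Word: average number of characters per counted word. */
double chars_per_word(char_stats st) {
    return st.words ? (double)st.chars_total / (double)st.words : 0.0;
}
```

For "There is 200 new tools", the number is skipped, so the sketch reports 4 words, 15 characters and 3.75 characters per word.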
Word Count
What is a word?
A word (for Trados) contains at least one letter (or language character for Asian languages).
Words are often delimited by spaces (exceptions: Chinese, Japanese, Thai).
Stand-alone numbers (for example in a table column) or symbols (like 1) are NOT counted as translatable words in Trados, but are counted in other tools.
Different TM tools count words differently (sometimes even between different versions of the same tool)
Word count tools: Word, Trados, Transit, Déjà Vu, special word count tools like AnyCount, PractiCount...
Word Count Examples
Text element | Word | Trados
200 | 2 words | Not counted
company/name | 1 word | 2 words
Segment with an index field | Contents of index field not counted | Contents of index field counted
Segment with an automatic field | Field contents counted as a word | Field contents not counted
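The differences in the table come down to counting policies. This C sketch makes two of them switchable: whether tokens without letters (such as "200") count, and whether a slash separates words. The `count_policy` struct and `count_words` are hypothetical and only loosely modeled on the behavior in the table. They do not reproduce Word's or Trados's actual algorithms, and only ASCII letters are recognized.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counting policy (not any tool's real settings). */
typedef struct {
    bool count_numbers;       /* count tokens without letters, e.g. "200" */
    bool slash_is_separator;  /* split "company/name" into two words */
} count_policy;

static bool is_separator(unsigned char c, const count_policy *p) {
    return isspace(c) || (p->slash_is_separator && c == '/');
}

/* Counts words in ASCII text according to the policy. */
size_t count_words(const char *text, const count_policy *p) {
    size_t words = 0;
    const unsigned char *s = (const unsigned char *)text;
    while (*s) {
        while (*s && is_separator(*s, p)) s++;   /* skip separators */
        if (!*s) break;
        bool has_letter = false;
        while (*s && !is_separator(*s, p)) {     /* scan one token */
            if (isalpha(*s)) has_letter = true;
            s++;
        }
        if (has_letter || p->count_numbers) words++;
    }
    return words;
}
```

With `{false, true}` (letters required, slash splits), "company/name" is 2 words and "200" is 0. With `{true, false}`, they are 1 word each.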
Calculation of Match Rates
Comparison of segments from the documents with the source language segments in the TM
The similarity is given as a percentage.
The algorithm for the calculation is not published, but it takes into account the number of words, the number of other elements (numbers) and the length of the segment.
Different tools calculate matches differently, so their match rates are not really comparable.
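The vendors' formulas are not published, so the following is only an illustration of one common way to compute a fuzzy match percentage. It uses word-level edit distance (Levenshtein), normalized by the length of the longer segment. It is not the algorithm of SDLX, Trados or Transit. It ignores numbers, formatting and penalties, and it splits words on spaces only. `match_rate` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 64
#define MAX_TOKEN_LEN 64

/* Splits a segment on spaces; long tokens are truncated. */
static size_t tokenize(const char *s, char out[][MAX_TOKEN_LEN], size_t max) {
    size_t n = 0;
    while (*s && n < max) {
        while (*s == ' ') s++;
        if (!*s) break;
        size_t len = 0;
        while (*s && *s != ' ') {
            if (len < MAX_TOKEN_LEN - 1) out[n][len++] = *s;
            s++;
        }
        out[n++][len] = '\0';
    }
    return n;
}

static size_t min3(size_t a, size_t b, size_t c) {
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Similarity in percent: 100 * (1 - word edit distance / longer length). */
int match_rate(const char *tm_segment, const char *new_segment) {
    char a[MAX_TOKENS][MAX_TOKEN_LEN], b[MAX_TOKENS][MAX_TOKEN_LEN];
    size_t na = tokenize(tm_segment, a, MAX_TOKENS);
    size_t nb = tokenize(new_segment, b, MAX_TOKENS);
    size_t d[MAX_TOKENS + 1][MAX_TOKENS + 1];
    for (size_t i = 0; i <= na; i++) d[i][0] = i;
    for (size_t j = 0; j <= nb; j++) d[0][j] = j;
    for (size_t i = 1; i <= na; i++)
        for (size_t j = 1; j <= nb; j++)
            d[i][j] = min3(d[i - 1][j] + 1,                     /* deletion */
                           d[i][j - 1] + 1,                     /* insertion */
                           d[i - 1][j - 1] +                    /* substitution */
                               (strcmp(a[i - 1], b[j - 1]) != 0));
    size_t longest = na > nb ? na : nb;
    if (longest == 0) return 100;
    return (int)(100.0 * (double)(longest - d[na][nb]) / (double)longest + 0.5);
}
```

For the test segment in the following example ("There is new information on a tool." against "There is information on a new tool."), this sketch yields 71%. That is lower than any of the three tools, which shows again that match rates from different algorithms are not comparable.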
Match Rate Examples
TM segment: There is information on a new tool.
Test segment | SDLX | Trados | Transit | Info
There is new information on a tool. | 85% | 92% | 99% | 1 word moved
There is information on a new tool. | 97% | 99% | 98% | Same segment, but different formatting
Repetitions
A segment that appears in the analyzed documents more than once and does not have a 100% match from the TM, is counted as:
a no match or match (if the match rate is above the minimum match value in the TM options) at the first occurrence
the first repetition at the second occurrence
During translation, the first occurrence needs to be translated from scratch or adapted from the match; from the second occurrence on, it will be a 100% match from the TM.
With large projects, where the documents are split up between different translators, you can extract and translate the repetitions (also called frequent segments) first.
But the extracted list shows the segments out of context, so it might not be easy to translate all of them first, especially if they are quite short.
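The classification of repetitions described above can be sketched as follows. The hypothetical `count_repetitions` compares segments exactly. For simplicity it assumes that none of the segments has a 100% match in the TM, since such segments are not counted as repetitions.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t unique;       /* first occurrences: counted as no match or fuzzy match */
    size_t repetitions;  /* second and later occurrences */
} rep_stats;

/* Counts repetitions among the segments of the analyzed documents.
   Assumes no segment has a 100% match in the TM. */
rep_stats count_repetitions(const char *const segs[], size_t n) {
    rep_stats st = {0, 0};
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < i && strcmp(segs[j], segs[i]) != 0) j++;   /* seen before? */
        if (j < i) st.repetitions++;
        else st.unique++;
    }
    return st;
}
```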
What influences the statistics?
Word count:
- Version of the (Trados) software
- INI file (setting for making text between tags or within tags translatable or non-translatable)
- Analysis of TTX versus RTF/DOC
- Sub segments, like footnotes, index entries
- Settings in the Filter Settings dialog for translating different file formats in TagEditor (ex. making hidden layers translatable)
- Settings for non-translatable text via Word styles
Segment count:
- Settings for penalties (from alignment)
- Filter settings to prefer matches with a certain additional field
- Segmentation rules (like abbreviation lists or different segment end symbols)
TMX (Translation Memory Exchange)
Localization Standards
TMX: Translation Memory Exchange
SRX: Segmentation Rules Exchange
OLIF: Open Lexicon Interchange Format
TBX: TermBase Exchange
XLIFF: XML Localization Interchange File Format (predecessor: OpenTag)
TMX Translation Memory Exchange
OSCAR (Open Standards for Container/Content Allowing Re-use), a group of LISA (Localization Industry Standards Association)
From the TMX specification: "The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process."
What is TMX
It is an XML representation of translation memory data, consisting of a header and a body.
What is TMX
Body, with one translation unit:
English: This is the first sentence.
German: Dies ist der erste Satz.
tu = translation unit, tuv lang = translation unit variant (language), seg = segment
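To make the structure concrete, here is a minimal C sketch that writes one translation unit like the one above into a buffer. It assumes TMX 1.4-style `xml:lang` attributes (older TMX versions used a plain `lang` attribute, matching the `tuv lang` label above) and the language codes `en-US` and `de-DE`. The function `write_tu` is hypothetical. A real exporter would also write the header and the enclosing `tmx` and `body` elements.

```c
#include <stdio.h>
#include <string.h>

/* Appends s to buf unchanged (truncates if buf is full). */
static void append_raw(char *buf, size_t cap, const char *s) {
    size_t len = strlen(buf);
    if (len < cap) snprintf(buf + len, cap - len, "%s", s);
}

/* Appends s to buf, escaping the XML special characters &, < and >. */
static void append_escaped(char *buf, size_t cap, const char *s) {
    size_t len = strlen(buf);
    for (; *s && len + 6 < cap; s++) {          /* room for "&amp;" plus NUL */
        const char *rep = NULL;
        switch (*s) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;"; break;
        case '>': rep = "&gt;"; break;
        default: break;
        }
        if (rep) {
            memcpy(buf + len, rep, strlen(rep));
            len += strlen(rep);
        } else {
            buf[len++] = *s;
        }
    }
    buf[len] = '\0';
}

/* Writes one translation unit (tu) with two variants (tuv), each holding a seg. */
void write_tu(char *buf, size_t cap,
              const char *src_lang, const char *src,
              const char *tgt_lang, const char *tgt) {
    const char *langs[2] = {src_lang, tgt_lang};
    const char *texts[2] = {src, tgt};
    if (cap == 0) return;
    buf[0] = '\0';
    append_raw(buf, cap, "<tu>\n");
    for (int i = 0; i < 2; i++) {
        append_raw(buf, cap, "  <tuv xml:lang=\"");
        append_raw(buf, cap, langs[i]);
        append_raw(buf, cap, "\"><seg>");
        append_escaped(buf, cap, texts[i]);
        append_raw(buf, cap, "</seg></tuv>\n");
    }
    append_raw(buf, cap, "</tu>\n");
}
```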
What is TMX
Depending on the tool that created the TMX file, it can be bilingual or multilingual.
Importing a multilingual TMX file into a bilingual project imports only the relevant languages.
Levels of TMX
Level 1: Plain text only (sufficient for data coming from software localization tools)
Level 2: Text plus formatting (data coming from translation memory tools used for the translation of documentation)
To move formatting and text from one tool to the other both tools need to be level 2 compliant!
Level 1
Formatting that is applied to the source and target text of a translation unit is not exported to the TMX file, only pure text.
Original: This sentence has some formatting.
In TMX: This sentence has some formatting.
Level 2
Formatting that is applied to the source and target text of a translation unit is exported to the TMX file.
Different tools use different ways of encoding that information.
TMX from Déjà Vu (Atril)
Original: This sentence contains different formatting information.
In TMX from Déjà Vu:
{1}This {2} sentence {3} contains {4}different {5}{6}formatting information {7}.
DV puts placeholders (ph) where the formatting will go, not the formatting information itself; the formatting information is stored in a separate file.
TMX from Trados
Original: This sentence contains different formatting information.
In TMX from Translator's Workbench:
Example 1 (RTF codes marked with ut tags): This {\b sentence} contains {\i different} {\ul formatting information}.
Example 2: This {\b sentence} contains {\i different} {\ul formatting information}.
Example 1 is from version 6.5, example 2 from version 7.
TMX from Transit (Star)
Original: This sentence contains different formatting information.
In TMX from Transit:
This sentence contains different formatting information.
Transit uses the begin paired tag (bpt), the end paired tag (ept) and the information for bold (b), italics (i) and underlined (u).
TMX from SDLX (SDL)
Original: This sentence contains different formatting information.
In TMX from SDLX:
This sentence 1> contains different 3>.
SDLX uses placeholders for formatting information that is stored in a different file.
Implications of different tags for formatting
Tools that use placeholder tags do not include the actual formatting information in the TMX file.
Other tools can only re-use the text, so the result of the exchange is the same as with TMX level 1 (text only).
TMX files that carry the actual formatting information will yield better matches in other tools that can read this information.
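In practice, a receiving tool that cannot interpret another tool's codes is left with the plain text. The C sketch below shows that reduction for a TMX segment. It removes the inline elements that carry native code (`bpt`, `ept`, `it`, `ph`, `ut`) together with their content, and it drops other tags such as `hi` while keeping their text. `strip_inline_codes` is a hypothetical helper built on simple string scanning, not an XML parser. It does not decode entities such as `&lt;` and does not handle nested `sub` elements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* TMX inline elements whose content is native formatting code. */
static const char *const code_elems[] = {"bpt", "ept", "it", "ph", "ut"};

/* Length of the tag name that starts at s (just after '<'). */
static size_t name_len(const char *s) {
    size_t n = 0;
    while (s[n] && s[n] != ' ' && s[n] != '>' && s[n] != '/') n++;
    return n;
}

/* Copies the text of seg into out, dropping inline codes. */
void strip_inline_codes(const char *seg, char *out, size_t cap) {
    size_t len = 0;
    const char *s = seg;
    if (cap == 0) return;
    while (*s && len + 1 < cap) {
        if (*s != '<') { out[len++] = *s++; continue; }
        const char *name = s + 1;
        const char *gt = strchr(s, '>');
        if (!gt) break;                         /* malformed markup: stop here */
        size_t n = name_len(name);
        bool self_closing = gt[-1] == '/';
        s = gt + 1;                             /* drop the tag itself */
        if (self_closing) continue;
        for (size_t k = 0; k < sizeof code_elems / sizeof code_elems[0]; k++) {
            if (strncmp(name, code_elems[k], n) == 0 && code_elems[k][n] == '\0') {
                char close[8];
                snprintf(close, sizeof close, "</%s>", code_elems[k]);
                const char *end = strstr(s, close);
                if (end) s = end + strlen(close);   /* drop the native code too */
                break;
            }
        }
    }
    out[len] = '\0';
}
```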
When is it useful?
A company moves from tool A to tool B, and tool B cannot import the proprietary TM format of tool A
Translators of one project use different tools, or a TM needs to be reused in software localization tools as well as in regular translation memory tools
Export format after alignment with one tool, to import into another tool
TM maintenance, when the TM tool does not offer all functionalities that are needed
Bilingual terminology extraction
Does it work?
With the current versions of translation tools on the market, it works quite well.
Previous versions sometimes created their own flavor of TMX that could not readily be imported by other tools; the export files had to be changed before import (e.g. language codes written as en-us or EN_US).
Yes, it does what it was developed for: it makes the exchange of data between tools possible.
BUT: this is only half of the story. The question is how well the exchanged data can be used.
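A typical pre-import fix for the language code problem mentioned above is to normalize the codes first. The sketch below rewrites codes such as `EN_US` or `en-us` into the `en-US` form (lowercase language, hyphen, uppercase region). `normalize_lang` is a hypothetical helper that only handles simple language-region codes. Codes with script or variant subtags (e.g. `zh-Hant-TW`) would need proper BCP 47 handling.

```c
#include <ctype.h>
#include <stddef.h>

/* Rewrites "EN_US" / "en-us" as "en-US"; out must hold at least cap bytes. */
void normalize_lang(const char *in, char *out, size_t cap) {
    size_t i = 0;
    int in_region = 0;                   /* becomes 1 after the first '_' or '-' */
    if (cap == 0) return;
    for (; in[i] != '\0' && i + 1 < cap; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c == '_' || c == '-') {
            out[i] = '-';
            in_region = 1;
        } else {
            out[i] = (char)(in_region ? toupper(c) : tolower(c));
        }
    }
    out[i] = '\0';
}
```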
Reusing TMX data
Although translation memory tools share the same basic idea (storing source-target language pairs and recycling translations), they have realized it in different ways.
The main issue here is the segmentation rules.
SRX Segmentation Rules Exchange
From the SRX specification: "The purpose of the SRX format is to provide a standard method to describe segmentation rules that are being exchanged among tools and/or translation vendors..."
SRX is intended to enhance the TMX standard.
Why SRX?
Tool A: semicolon is the end of a segment
This is a sentence; this is another sentence.
The TM system sees two separate segments.
Tool B: semicolon is NOT the end of a segment
This is a sentence; this is another sentence.
The TM system sees one segment: no match from the TMX data!
The match rate is around 50%, while the usual minimum match setting is around 70%.
Segmentation rules
Rules that the tool applies to the text to translate in order to split it up into segments:
- paragraph
- sentence
- phrase
- incomplete sentences in bulleted lists
- single words (headings, Note, Attention)
Segmentation rules
End of segment rules (common to the default settings of all tools):
- Dot at the end of a sentence (not after known abbreviations)
- Question mark, exclamation mark
- Paragraph mark
- Colon
End of segment rules (different for different tools):
- Semicolon
- Tab character
- Sub segments (index entries, footnotes, graphics)
Comparison of default rules
Rule | Workbench | Transit | Déjà Vu | SDLX | Across
Colon | end | end | end | no end | no end
Semicolon | no end | end | end | no end | no end
Tab | end | no end | no end | no end | no end
Soft return | no end | no end | end in Word, no end in PPT | end in Word, no end in PPT | no end
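The practical effect of these defaults can be sketched in C. The same text yields a different number of segments depending on which characters end a segment. `end_rules` and `count_segments` are hypothetical. The sketch ignores abbreviations, soft returns and sub segments, and it treats a newline as the paragraph mark.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tool-dependent end-of-segment settings (hypothetical names). */
typedef struct {
    bool colon;
    bool semicolon;
    bool tab;
} end_rules;

/* Dot, question mark, exclamation mark and paragraph mark always end a
   segment here; abbreviations are ignored in this sketch. */
static bool ends_segment(char c, const end_rules *r) {
    switch (c) {
    case '.': case '?': case '!': case '\n': return true;
    case ':':  return r->colon;
    case ';':  return r->semicolon;
    case '\t': return r->tab;
    default:   return false;
    }
}

/* Counts the segments a tool with the given rules would see. */
size_t count_segments(const char *text, const end_rules *r) {
    size_t segments = 0;
    bool in_segment = false;
    for (const char *s = text; *s; s++) {
        if (ends_segment(*s, r)) {
            if (in_segment) segments++;
            in_segment = false;
        } else if (*s != ' ') {
            in_segment = true;   /* visible text starts or continues a segment */
        }
    }
    return segments + (in_segment ? 1 : 0);
}
```

With the semicolon flag set (as in the Transit and Déjà Vu defaults above), "This is a sentence; this is another sentence." gives two segments. Without it (as in Workbench, SDLX and Across), it gives one.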
Settings for better reuse
Check the segmentation settings of the source tool, if possible.
Re-create these settings in the target tool, as far as possible.
Lower the minimum match value from the default 75% to about 50%.
For TM data that does not yield useful results, you may have to run an alignment of the original material on the target system.
When is SRX useful?
Moving TM data from one tool to another, when the segmentation rules have always been the same and the receiving tool is able to recreate the rules of the exporting tool.
Because: SRX only transports the rules that are defined at the time of the export from the TM.
Not many tools can read/write SRX at the moment, so there is only limited experience as of now.
SRX
SRX is under development at the moment. The SRX file will contain the following information:
- Definition of the rules of a specific language
- Definition, how those rules were set at the time of the TMX export
End rules and exceptions
Rule: A dot followed by a space is the end of a segment.
This is the first sentence. This is the second sentence.
Exception: A dot preceded by a number is not the end of a segment.
Dies ist der 1. Satz. ("This is the 1st sentence.")
End rules and exceptions
Rule:
[\.] \s
Exception for numbers, abbreviations...
[0-9]+\. \s
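The rule/exception pair above can be applied with POSIX regular expressions. In SRX, each rule has a "before break" and an "after break" pattern, and the first matching rule in the list decides, which is why exceptions (break="no") are listed before the general rule. The sketch below mimics that ordering. The struct and function names are hypothetical. The patterns are rewritten in POSIX ERE (`[[:space:]]` instead of `\s`) because SRX itself uses ICU regular expressions.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *before;   /* must match the text ending at the position */
    const char *after;    /* must match the text starting at the position */
    bool is_break;        /* true: split here; false: exception */
} seg_rule;

static bool matches(const char *pattern, const char *text) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) return false;
    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Decides whether to split between text[pos-1] and text[pos].
   The first rule whose before and after patterns both match wins. */
bool is_break(const char *text, size_t pos, const seg_rule *rules, size_t n) {
    char before[256];
    if (pos >= sizeof before || pos > strlen(text)) return false; /* short texts only */
    memcpy(before, text, pos);
    before[pos] = '\0';
    for (size_t i = 0; i < n; i++)
        if (matches(rules[i].before, before) && matches(rules[i].after, text + pos))
            return rules[i].is_break;
    return false;                                /* no rule matched: no break */
}

/* Exception first, then the general rule, as on the slide. */
static const seg_rule example_rules[] = {
    {"[0-9]+\\.$", "^[[:space:]]", false},   /* "Dies ist der 1. Satz." */
    {"\\.$",       "^[[:space:]]", true},    /* dot followed by a space */
};
```

In "Dies ist der 1. Satz. Noch ein Satz.", the position after "1." is not a break because the exception matches first. The position after "Satz." is a break.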
What can SRX not do?
It can only show the segmentation rule settings at the time of export.
It cannot show any changes that were applied to the segmentation rules while the TM was in use.
Sometimes the rules from system 1 cannot be re-created in system 2; then the rule is ignored.
Issues with data exchange via TMX
Initial Situation
Translation memory in TMX format (Translation Memory Exchange), created by a translation project in Star Transit with FrameMaker files
Import of the TMX file into SDL Trados Translator's Workbench and into a Star Transit project
Comparison of statistics (word count, match rates) during an analysis (Trados) / import (Transit)
Issues
Word counts differed
Match rates differed
Prices differed quite a lot
Goal of the test
Comparison of:
- word counts
- segment counts
- match rates
Comparison of Match Rates
Trados creates an Ancillary file that contains elements from master pages, headers/footers and variables, so that these elements only have to be translated (and counted) once.
Different segmentation rules (especially for abbreviations) lead to different numbers of segments in Transit and Trados.
Different representations of the tags from the FrameMaker files in TMX lead to low match rates when a TMX file from Transit is used in the Trados analysis.
Different ways of counting words lead to different word counts.
Why are the results so different?
Overview of the word count rules and segmentation rules in Transit and Trados
Comparison of the segments during analysis/import
Differences in the handling of texts from master pages, headers, footers and variables (Trados Ancillary file)
Comparison of the TMX files
Word Count / Segment Count
Word count comparison
Segmentation Rules
Rule | Trados Workbench | Star Transit | Déjà Vu | SDLX | Across
Colon | end | end | end | no end | no end
Semicolon | no end | end | end | no end | no end
Tab | end | no end | no end | no end | no end
Soft return | no end | no end | end in Word, no end in PPT | end in Word, no end in PPT | no end
Segmentation Rules
Defining abbreviations:
- Transit: during the import of files
- Trados: separately in the TM
Trados Transit
Segment Comparison
When reusing a TMX file from Transit (FrameMaker project) Trados very often does not show a match. Only a concordance search shows that the segment is in the TM.
Comparison of segments within the TM tools
Only after the minimum match value is lowered is Trados able to show matches during translation. Match values are often below 50%.
The low match rates result from the different handling of the tags. As an example, Transit counts 25 tags for a segment, while Trados counts 36 tags for the same segment.
For every tag difference, Trados subtracts a 2% penalty.
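As a rough worked check of these figures, assume (purely for illustration; the actual Trados formula is not published) that the 2% penalty is subtracted once per tag of difference from an otherwise perfect match:

$$
(36 - 25) \times 2\% = 22\%, \qquad 100\% - 22\% = 78\%
$$

Under this assumption, the tag difference alone would lower a textually identical segment to about 78%. The observed values below 50% therefore point to further losses, such as the changed segment structure described next.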
Comparison of segments within the TM tools
Match rate 80%
Transit adds placeholders to the tags themselves.
Trados shows two tags and the placeholder as a separate letter
The different handling of tags changes the segment structure.
Trados shows additional words (gray) and therefore also words that were moved in comparison to the match from the TMX TM.
Trados Ancillary
Variables, text from headers and footers, and texts from the master pages are saved to a separate file in Trados, so that the content only has to be translated once.
Comparison of TMX DATA
TMX from Transit / TMX from Trados
Segment in Transit Editor (small tag view)
Segment in Transit Editor (full tag view)
Segment in Trados Editor (full tag view)
Segment in Transit TMX
Segment in Trados TMX
Software Localization Tools
Is there a difference between localization and translation?
Translation: transfer of text-based information from language A to language B; where necessary, this includes adaptations in
- content (to comply with legal standards)
- form (page layout, page format)
- address (general public, subject matter experts)
- non-textual elements (pictures)
Localization: translation and adaptation of
- text (software interface and accompanying documentation)
- software content (tax calculation for different countries)
- resizing (so that translated text fits into the button)
- non-textual elements (icons, graphics, pictures, symbols)
- (software testing)
The history of SW L10N
In the beginning, each software application was developed for one specific country. No thought was given to the possibility of one day selling this software in another country.
When these ideas came up, it was assumed that the users would have to cope with the original user interface language, which was mostly English.
The history of SW L10N
If a different language version was needed, the software was re-created in the target language.
Disadvantage: different language versions of the software had to be maintained.
The text that showed on the user interface (UI, GUI) was written into the code of the program (hard-coded).
Translating hard-coded text requires a lot of programming knowledge on the side of the translator, so that the program does not break during translation.
CGI code snippet in Perl:
sub print_form {
    my ($content);
    my ($template,$HTML) = @_;
    open (FILE, "
Hard-coded strings in C Code: Not Internationalized
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int n;
    char y[5];
    printf("This program converts decimal numbers to hexadecimal\n\n");
    while (1) {
        printf("\nEnter decimal number: ");
        scanf("%d", &n);
        printf("\nNumber entered is decimal and hexa", n, n);
        printf("\nDo you want to continue? ");
        scanf("%4s", y);            /* y holds at most 4 characters */
        if (strcmp(y, "yes")) {
            printf("\nexiting ..\n");
            exit(0);
        }
    }
}
strings are directly in the code
source.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern unsigned char *intl_m_msg(), *intl_f_msg();

int main(void) {
    int n;
    char y[5];
    printf((char *)intl_m_msg("", "mypg", 1));
    while (1) {
        printf((char *)intl_m_msg("", "mypg", 2));
        scanf("%d", &n);
        printf((char *)intl_m_msg("", "mypg", 3), n, n);
        printf((char *)intl_m_msg("", "mypg", 4));
        scanf("%4s", y);
        if (strcmp(y, (char *)intl_m_msg("", "mypg", 6))) {
            printf((char *)intl_m_msg("", "mypg", 5));
            exit(0);
        }
    }
}
source.c

mypg.en
1 "This program converts decimal numbers to hexadecimal\n\n"
2 "\nEnter decimal number:"
3 "\nNumber entered is decimal and hexa"
4 "\nDo you want to continue?"
5 "\nexiting ..\n"
6 "yes"
Extract the strings into a message file using an extraction tool
mypg.fr
1 "Ce programme convertit les nombres décimaux en hexadécimal\n\n"
2 "\nEntrer le nombre décimal:"
3 "\nLe nombre entré est décimal et hexadécimal"
4 "\nVoulez-vous continuer?"
5 "\nSortie ..\n"
6 "oui"
source.c stays unchanged; only the message file is translated.
Translate the extracted text and place it back into the code
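The `intl_m_msg` calls above depend on a run-time library that is not shown. As a self-contained stand-in, the following sketch keeps the two message files as in-memory tables. It looks messages up by their ID and falls back to English when no table exists for a language. The `msg` function and the table layout are hypothetical. Real projects typically use resource files or a catalog mechanism such as gettext or catgets instead.

```c
#include <stddef.h>
#include <string.h>

/* In-memory copies of the message files; index = message ID - 1. */
static const char *const msgs_en[] = {
    "This program converts decimal numbers to hexadecimal\n\n",
    "\nEnter decimal number: ",
    "\nNumber entered is decimal and hexa",
    "\nDo you want to continue? ",
    "\nexiting ..\n",
    "yes",
};

static const char *const msgs_fr[] = {
    "Ce programme convertit les nombres décimaux en hexadécimal\n\n",
    "\nEntrer le nombre décimal: ",
    "\nLe nombre entré est décimal et hexadécimal",
    "\nVoulez-vous continuer? ",
    "\nSortie ..\n",
    "oui",
};

_Static_assert(sizeof msgs_fr == sizeof msgs_en, "message files must have the same IDs");

#define N_MSGS (sizeof msgs_en / sizeof msgs_en[0])

/* Returns message `id` (1-based) for `lang`, falling back to English. */
const char *msg(const char *lang, int id) {
    const char *const *table = msgs_en;
    if (id < 1 || (size_t)id > N_MSGS) return "";
    if (lang != NULL && strcmp(lang, "fr") == 0) table = msgs_fr;
    return table[id - 1];
}
```

With this approach, switching the user interface language only means selecting a different table, while the program logic stays the same.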
Visual Basic: Example
User Interface Examples: Copying a File: Four Different Ways
Action: Copy a file called afile.txt from the hard disk C: to the floppy drive A:
Command Line/DOS: COPY C:\afile.txt A:\afile.txt
Function Key / DOS: F5, menu or dialog
Menu: dialog
WIMP:
Origin and Evolution of the GUI (Graphical User Interface)
The software user interfaces we use today are called GUI (Graphical User Interfaces). Windows, the Macintosh operating system and the Unix operating system all use GUIs. The specific type of GUI is called a WIMP interface:
Windows
Icons
Mouse
Pointer
Extracted text for translation: Excel list, without context information
- Sequence of occurrence
- Sorted alphabetically
- Sorted alphabetically, only one occurrence for each term
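The two alphabetical variants of the list amount to a sort, optionally followed by removing duplicates. A minimal sketch in C with `qsort`. `sort_unique` is a hypothetical helper that sorts by byte value (`strcmp`), not by the collation rules of any language, so, for example, "&File" sorts before "Close".

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts the strings in place and removes duplicates; returns the new count. */
size_t sort_unique(const char *items[], size_t n) {
    if (n == 0) return 0;
    qsort(items, n, sizeof items[0], cmp_str);
    size_t out = 1;
    for (size_t i = 1; i < n; i++)
        if (strcmp(items[i], items[out - 1]) != 0) items[out++] = items[i];
    return out;
}
```

Skipping the duplicate removal step gives the plain "sorted alphabetically" list. Either way, the list loses the sequence of occurrence, and with it much of the context.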
The history of SW L10N
Next came globally enabled software (internationalized software), where developers took into account that their program would be sold in many different countries and would therefore need to be able to show many different languages on the user interface:
- Support for a wide range of fonts
- Support for different code pages (characters)
- Separation of translatable text from programming code: resource files containing translatable data, binary files containing programming code
Software Localization Tool
Editor
Extraction of translatable text
Software Localization Tool
Old source file
Old target file
Read in reference material
Target file
Generation of target file
File to translate
Read in new file
Translation
Software Localization: Passolo
Software Localization: Catalyst
Features
For the developer / project manager:
- Preparing files for translation
- Quality assurance: pseudo-translation
- Customization
- Managing translation projects
For the translator:
- Translation environment
- Use of translation memory and terminology
- Quality assurance features
- Proofreading
Data Exchange
Preparing Files for Translation
Hiding text that was extracted but should not be translated
Locking text that should not be translated but is important for understanding the context
Adding comments or status flags to text
(Screenshot: translation list with the columns Source, Target, Comment and Status; example entry &Datei / &File with the comment "Changed from Version 1.5!" and status flags such as Read-only, tbd and Special)
QA: Pseudo-Translation
Checking the original application for:
- ability to display the characters of the target language
- controls or text fields that are too small to hold the translated text
Checking the stability of the translated software
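A pseudo-translation can be generated mechanically. The sketch below shows one common approach, not the method of any particular tool. It replaces ASCII vowels with accented UTF-8 characters (to test character display), pads the string by a given expansion percentage (to reveal controls that are too small) and wraps it in brackets (so truncated text is easy to spot). `pseudo_translate` and the 30% used in the example are assumptions. The padding is computed per byte, so the sketch assumes ASCII source text.

```c
#include <stddef.h>
#include <string.h>

/* Appends s to out at *len if it fits completely. */
static void put(char *out, size_t cap, size_t *len, const char *s) {
    size_t n = strlen(s);
    if (*len + n < cap) {
        memcpy(out + *len, s, n);
        *len += n;
        out[*len] = '\0';
    }
}

/* Pseudo-translates an ASCII string: accented vowels, padding, brackets. */
void pseudo_translate(const char *src, char *out, size_t cap, unsigned expansion_pct) {
    static const char vowels[] = "aeiouAEIOU";
    static const char *const accented[] = {"á", "é", "í", "ó", "ú",
                                           "Á", "É", "Í", "Ó", "Ú"};
    size_t len = 0, chars = 0;
    if (cap == 0) return;
    out[0] = '\0';
    put(out, cap, &len, "[");                           /* start marker */
    for (const char *s = src; *s; s++, chars++) {
        const char *hit = strchr(vowels, *s);
        char plain[2] = {*s, '\0'};
        put(out, cap, &len, hit ? accented[hit - vowels] : plain);
    }
    size_t pad = (chars * expansion_pct + 99) / 100;    /* round up */
    while (pad-- > 0) put(out, cap, &len, "~");          /* simulated expansion */
    put(out, cap, &len, "]");                            /* end marker */
}
```

For example, "File" with 30% expansion becomes "[Fílé~~]". If the closing bracket is missing on screen, the control is too small.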
Text expansion (averages)
Text expansion by language
There are several sources on the internet that show charts for average text expansion.
Text expansion by language: http://www.omnilingua.com/resourcecenter/textexpansion.aspx
Comments
Bookmarks
Customization
Creating parsers to extract text from customer file formats
Ability to call other applications via an API
Creating macros to facilitate repetitive tasks
Using the command line to process files and create projects
Managing Translation Projects
Creating translation projects for external translators: creating a package of all files needed for translation, including the original files, reference material, glossaries and macros
Creating statistics:
- amount of text to be translated
- amount of text that was pre-translated with material from a previous version
- repetitions
Managing Translation Projects
Data exchange:
- Export of translation lists (source segment / target segment pairs)
- Import of translation memory data from other translation tools
- Export of terminology lists
- Import of terminology lists
- Recycling of older software versions via alignment
Update management:
- Updating a project with changed source language files
Translation Environment
Editor: translation window, navigation window
WYSIWYG view of dialogs and menus
Resizing of controls
Translation Environment
Localization of bitmaps, icons and cursors: external editor or internal editor
Automatic pre-translation: recycling translations from previous projects or translation memory data
Filling terminology lists: adding new terms to terminology lists during translation
Quality Assurance Features
Checking features:
- Spell-checking
- Formatting (tabs between access key and text)
- Access keys: Same number of access keys in source and target? Do all access keys exist? Are all access keys in a dialog unique?
- Translation status (for review, signed-off)
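The access key checks can be sketched as follows. The sketch assumes the Windows resource convention visible in the example entry `&Datei`, where `&` marks the access key and `&&` is a literal ampersand. The function names are hypothetical. This ASCII-only sketch ignores details that real tools may also check, such as menu hotkeys or non-ASCII keys.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the (lowercased) access key of a label such as "&Datei" -> 'd',
   or 0 if there is none. "&&" stands for a literal ampersand.
   If count is not NULL, stores the number of access key markers found. */
int access_key(const char *label, int *count) {
    int key = 0, markers = 0;
    for (const char *s = label; *s; s++) {
        if (*s != '&') continue;
        if (s[1] == '&') { s++; continue; }   /* escaped ampersand */
        if (s[1] == '\0') break;              /* dangling '&' at the end */
        markers++;
        if (key == 0) key = tolower((unsigned char)s[1]);
    }
    if (count) *count = markers;
    return key;
}

/* Source and target should contain the same number of access key markers. */
bool same_key_count(const char *source, const char *target) {
    int a = 0, b = 0;
    access_key(source, &a);
    access_key(target, &b);
    return a == b;
}

/* Returns true if no two labels of one dialog share an access key. */
bool keys_unique(const char *const labels[], size_t n) {
    for (size_t i = 0; i < n; i++) {
        int ki = access_key(labels[i], NULL);
        if (ki == 0) continue;
        for (size_t j = i + 1; j < n; j++)
            if (access_key(labels[j], NULL) == ki) return false;
    }
    return true;
}
```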
Proofreading
Proofreading the translated software is made easier by the ability of the software localization tool to show the menus and dialogs as they will appear at runtime.
No more proofreading of Excel lists where the context of the text is not apparent.
Data exchange
Most tools have their own export/import format.
Most tools support the export/import format of standard tools (Trados, Star).
Most tools support TMX (Translation Memory Exchange format) for importing and/or exporting data.
Some tools differentiate between the export of terminology data and the export of translation memory data.
Help/Manual FR
Documentation
First localize the software
Then localize help and documentation
SW Development
Build FR
Project Workflow
Localization Processes
Pseudo translation
Translation Memory Tool
Export of Terminology
Export of segment pairs for TM
Import into TM
Import into termbank
Translation of online help, readme files, manuals, webpages, packaging...
Translation of software
Extract translatable segments
File preparation
Software Localization Tool
Reuse of translations
Build FR Version 2
Database
Software Updates
Differences between Localization Tools and Translation Tools
Localization Tools
Translatable text is extracted by ID numbers.
Segmentation of the file is predefined by the file format itself; segment pairs are identified by their ID numbers.
As text in software tends to consist of single words or short phrases, there is no separate management of terms and sentences.
Translating update files means that already translated IDs are not touched; only new or changed text needs translation.
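Because segments are identified by ID, reusing an older translation can be as simple as the following sketch. An entry keeps its old translation only if the same ID exists and its source text is unchanged; everything else is flagged for translation. `res_entry` and `leverage_by_id` are hypothetical names. Real tools may also offer changed text as a fuzzy match instead of discarding the old translation.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    int id;               /* resource ID from the software file */
    const char *source;   /* source language text */
    const char *target;   /* translation, or NULL if still to translate */
} res_entry;

/* Copies old translations into `updated` where ID and source text are
   unchanged; returns the number of entries that still need translation. */
size_t leverage_by_id(res_entry *updated, size_t n_new,
                      const res_entry *old, size_t n_old) {
    size_t todo = 0;
    for (size_t i = 0; i < n_new; i++) {
        updated[i].target = NULL;
        for (size_t j = 0; j < n_old; j++) {
            if (old[j].id == updated[i].id &&
                strcmp(old[j].source, updated[i].source) == 0) {
                updated[i].target = old[j].target;   /* untouched: reuse */
                break;
            }
        }
        if (updated[i].target == NULL) todo++;       /* new or changed text */
    }
    return todo;
}
```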
Terminology Management Tools
Terminology management
Term extraction (monolingual and bilingual): creation of term lists from source documents or translation memories
Term lists / term bases: connect to an editor (source document creation) or to the TM systems and localization tools during translation
Term check: ensure the consistent use of terms over the whole project; check for the use of forbidden terms
Terminology Management Tools
Extraction
Monolingual Extraction
Extraction of terms from documents in one language.
Creation of term lists:
- important terms: Who defines what is important? How can a tool know what is important?
- frequent terms: What is frequent? 3 times / 10 times? Are frequent terms also important?
- new terms: According to whose level of subject matter knowledge? Compared to which term list / term database?
Bilingual Extraction
Term extraction from bilingual sources like translation memory files or bilingual translation files
Creation of parallel lists of terms and their translation(s):
- all forms of the term and all its translations
- only the basic form
- most frequent translation of the source term
Term extraction issues
Terminology extraction is a highly individual process: goal of extraction, subject matter expertise, available time
Tools use different methods for terminology extraction: concordance, statistics, linguistics
Tools support different file formats for extraction and export: monolingual, bilingual, export formats
Tools sometimes don't show the context from which the term was taken
Term Extraction Tools
Assistance for manual extraction
Concordance tools: extraction of all term combinations
Statistical extraction tools: frequent terms; all languages
Linguistic extraction tools: extraction of noun phrases; supported languages only
Manual Extraction
A human reads the text, understands the meaning and selects terms (or term pairs) according to previous knowledge of the subject matter and/or the goal of the extraction:
- list of standard terms
- list of company terms
- list of new terms
- additional information like source, context
example
Tools assisting manual extraction
Tools that connect to an editor and allow the collection of terms or term pairs
Translation memory tools that save terms and term pairs directly into the term database component
Term checking tools that report missing terms / translations
Manual extraction
Time-consuming
Resource-intensive
Subject matter and language expertise required
Most accurate regarding the goal
Individual goals can be set
Concordance Tools
Automatic creation of a list of all terms and term combinations from a document
No term is missed
Long list of terms
Manual selection process necessary
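A concordance-style candidate list can be sketched by emitting every run of 1 to N adjacent words. This shows why no term is missed and why the list gets long. `all_term_combinations` is a hypothetical helper. It splits on spaces only, ignores punctuation, and caps N at 8 to keep its buffer size fixed.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 256
#define MAX_LEN 32

/* Splits text on spaces into words (long words are truncated). */
static size_t split_words(const char *text, char words[][MAX_LEN], size_t max) {
    size_t n = 0;
    while (*text && n < max) {
        while (*text == ' ') text++;
        if (!*text) break;
        size_t len = 0;
        while (*text && *text != ' ') {
            if (len < MAX_LEN - 1) words[n][len++] = *text;
            text++;
        }
        words[n++][len] = '\0';
    }
    return n;
}

/* Calls emit (if not NULL) for every run of 1..max_n adjacent words and
   returns the number of candidates. */
size_t all_term_combinations(const char *text, size_t max_n,
                             void (*emit)(const char *candidate)) {
    char words[MAX_WORDS][MAX_LEN];
    size_t n = split_words(text, words, MAX_WORDS), total = 0;
    if (max_n > 8) max_n = 8;                 /* keeps cand[] large enough */
    for (size_t i = 0; i < n; i++) {
        char cand[8 * MAX_LEN] = "";
        for (size_t k = 0; k < max_n && i + k < n; k++) {
            if (k > 0) strcat(cand, " ");
            strcat(cand, words[i + k]);
            total++;
            if (emit) emit(cand);
        }
    }
    return total;
}
```

Even the four-word phrase "connect the power cord" produces 7 candidates for terms of up to two words and 9 for terms of up to three, which is why a manual selection step is needed.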
Concordance results (Simple Concordance Program SCP)
TM tool
Terms can consist of up to X words
Terms that already exist in the database are not extracted
Extraction from all files of a project (various file formats)
Extraction with TM tool
Export of term list for translation
Export of term list to term database
Statistical Extraction Tool
Monolingual and bilingual extraction
Terms that occur more than X times are extracted
List of frequent terms: frequent terms are seen as important
Important terms / new terms that appear in this document less than X times are not extracted
Can be used for any language
The list of term candidates must be checked by a human with subject matter and language expertise
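Statistical extraction can be sketched as a frequency count with a threshold and a stop word list. In this hypothetical helper, a word is kept if it occurs at least `min_freq` times and is not a stop word. Words are runs of ASCII letters, lowercased, and the stop words are expected in lowercase. The sketch also shows the limitation of the method: "connect" and "connects" are counted as separate words, because a purely statistical tool has no notion of a basic form.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TERMS 512
#define TERM_LEN 32

typedef struct {
    char term[TERM_LEN];
    size_t freq;
} term_count;

static bool is_stop_word(const char *w, const char *const stop[], size_t n_stop) {
    for (size_t i = 0; i < n_stop; i++)
        if (strcmp(w, stop[i]) == 0) return true;
    return false;
}

/* Counts lowercased ASCII words and copies those that occur at least
   min_freq times and are not stop words into out. Returns the number kept. */
size_t frequent_terms(const char *text, size_t min_freq,
                      const char *const stop[], size_t n_stop,
                      term_count out[], size_t max_out) {
    term_count all[MAX_TERMS];
    size_t n_all = 0, n_out = 0;
    const char *s = text;
    while (*s) {
        while (*s && !isalpha((unsigned char)*s)) s++;   /* skip non-letters */
        if (!*s) break;
        char w[TERM_LEN];
        size_t len = 0;
        while (isalpha((unsigned char)*s)) {
            if (len < TERM_LEN - 1) w[len++] = (char)tolower((unsigned char)*s);
            s++;
        }
        w[len] = '\0';
        size_t i = 0;
        while (i < n_all && strcmp(all[i].term, w) != 0) i++;
        if (i == n_all) {                          /* first occurrence */
            if (n_all == MAX_TERMS) continue;      /* table full: ignored */
            strcpy(all[n_all].term, w);
            all[n_all++].freq = 0;
        }
        all[i].freq++;
    }
    for (size_t i = 0; i < n_all && n_out < max_out; i++)
        if (all[i].freq >= min_freq && !is_stop_word(all[i].term, stop, n_stop))
            out[n_out++] = all[i];
    return n_out;
}
```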
Terminology Tool of TM Suite (Déjà Vu, Lexicon)
Settings for number of words per term
Settings for frequency
Bilingual Extraction Results
SDL MultiTerm Extract
This is how a text looks to a statistical extraction tool:
Vot gnig harengoga fuor tok gnig nor shewerginhatz. Mirhon bortup tip trewshu gnig batbo loqtet. Bortup ter, bortup nofdas, semsel nih furpo ayano bliktreptat. Mirhon granbevtrov driktopret grig go wasbrekit mut mirkep taptro gnig suf. Aktrep zitpek nitnit bortup mil. Setrimb ak troptan bur metlatkento.
Linguistic Extraction Tool
The tool knows about the structure of the language
Extracted terms can be reduced to their basic form with the help of dictionaries and rules
The user can define the rules used for extraction
Extraction is limited to supported languages
Linguistic Settings
Extraction according to specific rules of the language
Frequency settings
Results of Extraction with Context Window (TerminologyWizard)
Linguistic Extraction Tools
Translations of terms come from the extraction files and internal dictionaries
Each term is shown with its context and a grammatical analysis
Results of extraction:
- list of one-word terms
- list of multi-word terms
- list of context sentences
Export and view can be filtered
SDL PhraseFinder
File Formats
TM tools extract from every file format they support.
Concordance tools are usually limited to text or Word files.
Bilingual extraction can be produced from bilingual file formats like translation memories, project files of a TM tool or bilingual translation files, but not from two separate files.
Export is usually in Excel, tab-delimited TXT or directly into the terminology component.
Conclusion
No one tool can do what a human can do, but depending on the goal, the tools can help to automate repetitive tasks and comparisons with stop lists and/or term bases.
Concordance tools extract all words and provide filter and search settings for the view of the term list
Statistical tools offer settings for frequencies, term length and comparison with stop word lists or existing term lists / term databases
Linguistic tools can be customized by rules for the extraction, which could be different for various languages and use language-specific dictionaries
Terminology Management Tools
Databases
Term Base Entry (MultiTerm)
Screenshot callouts: text information, synonyms, category
Term Base Entry (Termstar)
Setup of term lists / term bases
Subject matter or company-specific terms: ProductName, Company - Name; deactivate checkbox YY, activate checkbox YY, click inside checkbox YY twice
Collect terms, synonyms, abbreviations (e.g. screen, scr., monitor)
Base form of the term
Decide on term status field (forbidden, deprecated, pending, confirmed by, used by us, used by competitor)
Setup of term lists / term bases
Add as many information fields as needed: note, definition, context example, source
Information should help authors and translators to decide if and when to use a term / translation
Not all terminology management tools allow user-defined fields
Keep in mind that the more fields you have, the more time you need to schedule for filling and maintaining the information in those fields
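A concept-oriented entry with a term status and user-defined fields could be modeled as below. The class and field names are hypothetical; the status values come from the list above, and treating "forbidden" and "deprecated" as not allowed is an assumption. Marking terms with such a status is what makes forbidden-term checks possible later.

```python
from dataclasses import dataclass, field

# Statuses treated as "do not use" (an assumption; adapt to your own status list)
BLOCKED = {"forbidden", "deprecated"}


@dataclass
class Term:
    text: str
    lang: str
    status: str = "pending"


@dataclass
class TermEntry:
    """One concept with its terms in several languages plus user-defined fields."""
    concept_id: int
    terms: list[Term] = field(default_factory=list)
    fields: dict[str, str] = field(default_factory=dict)  # e.g. note, definition, context, source

    def allowed(self, lang: str) -> list[str]:
        return [t.text for t in self.terms if t.lang == lang and t.status not in BLOCKED]

    def forbidden(self, lang: str) -> list[str]:
        return [t.text for t in self.terms if t.lang == lang and t.status == "forbidden"]


entry = TermEntry(
    concept_id=1,
    terms=[Term("screen", "en", "confirmed by"), Term("scr.", "en", "pending"),
           Term("monitor", "en", "forbidden"), Term("Bildschirm", "de", "confirmed by")],
    fields={"note": "illustrative entry"},
)
```

Here `entry.allowed("en")` gives `['screen', 'scr.']` and `entry.forbidden("en")` gives `['monitor']`.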
Terminology Management Tools
Retrieval
Terminology Retrieval
Term components of translation memory tools: standalone tools or integrated term modules
Search sentences to translate for terms from the term base / term list
Pasting translations of terms into the translation / sending new term pairs to the term base or term list
Fuzzy matching of terminology
Filtering terms (e.g. per product)
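Searching a sentence for terms with fuzzy matching and a product filter might look like the sketch below. Python's `difflib.get_close_matches` stands in for whatever fuzzy algorithm a given TM tool uses; the term base layout (`TERM_BASE`) and the `cutoff` value are assumptions.

```python
import difflib
import re

# Hypothetical term base: source term, target term, product the entry belongs to
TERM_BASE = [
    {"source": "power cord", "target": "Netzkabel", "product": "printer"},
    {"source": "paper tray", "target": "Papierfach", "product": "printer"},
    {"source": "display", "target": "Anzeige", "product": "camera"},
]


def find_terms(sentence, term_base, product=None, cutoff=0.8):
    """Return (source, target) pairs whose source term occurs (fuzzily) in the sentence."""
    words = re.findall(r"[\w-]+", sentence.lower())
    hits = []
    for entry in term_base:
        if product is not None and entry["product"] != product:
            continue  # filtering terms, e.g. per product
        n = len(entry["source"].split())
        # All word sequences of the same length as the term
        candidates = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        if difflib.get_close_matches(entry["source"].lower(), candidates, n=1, cutoff=cutoff):
            hits.append((entry["source"], entry["target"]))
    return hits
```

The fuzzy comparison lets "power cords" in the sentence still find the entry "power cord".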
Retrieving terms during translation
Sending terms to the term base
Term retrieval
Terminology Management Tools
Checking
Source Language Terminology Check
Check the source language documentation for consistent use of terms in the term list, use of forbidden terms, use of synonyms
Checking tools with an interface to your authoring system for terminology and grammar checks (checks are customizable to your own rules)
Terminology example: Connect the AC power cord to the AC power adapter, then to the back of the photo printer.
Variants: AC power cord, AC power cable, power cord, power cable, cord, cable
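A source language check for the example above can be sketched as follows: the preferred term is blanked out first, then the remaining variants are searched longest first. The preferred term and variant list come from the example; the matching strategy is a simplified assumption and ignores inflection.

```python
import re

# Preferred term and unwanted variants from the example above
PREFERRED = "AC power cord"
VARIANTS = ["AC power cable", "power cord", "power cable", "cord", "cable"]


def check_sentence(sentence, preferred=PREFERRED, variants=VARIANTS):
    """Return the non-preferred variants that occur in a source sentence."""
    text = sentence.lower()
    # Blank out the preferred term so its parts ("power cord", "cord") are not reported
    text = text.replace(preferred.lower(), " ")
    found = []
    # Longest variants first, so "AC power cable" is reported rather than just "cable"
    for variant in sorted(variants, key=len, reverse=True):
        pattern = r"\b" + re.escape(variant.lower()) + r"\b"
        if re.search(pattern, text):
            found.append(variant)
            text = re.sub(pattern, " ", text)
    return found
```

The example sentence itself passes without findings, while "Connect the power cable to the adapter." reports `['power cable']`.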
Source Language Terminology Check
Authoring memory systems to check consistent use of standard sentences
Example: Place the DVD in the.
Target Language Terminology Check
Checking the usage of the terms in the target language (in Translation Memories, bilingual files)
Is the target language term from the term base used?
Are there synonyms for the target language term?
Is there a target language term in the translation where the source term from the term base does not appear in the source segment of the text?
Has a forbidden term been used?
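The four questions above can be turned into a check on one source/target segment pair. This is a minimal sketch with a hypothetical `TermPair` type; it uses plain substring matching, so it does not handle German inflection or compounds and would also match "screen" inside "screenshot".

```python
from dataclasses import dataclass


@dataclass
class TermPair:
    source: str
    target: str
    target_synonyms: tuple = ()
    forbidden_targets: tuple = ()


def check_segment(src: str, tgt: str, terms: list) -> list:
    """Return a list of terminology issues for one source/target segment pair."""
    s, t = src.lower(), tgt.lower()
    issues = []
    for term in terms:
        src_hit = term.source.lower() in s
        tgt_hit = term.target.lower() in t
        if src_hit and not tgt_hit:
            # Target term from the term base missing: was a synonym used instead?
            used = [syn for syn in term.target_synonyms if syn.lower() in t]
            if used:
                issues.append(f"synonym {used[0]!r} used instead of {term.target!r}")
            else:
                issues.append(f"translation {term.target!r} for {term.source!r} not found")
        if tgt_hit and not src_hit:
            # Reverse check: target term present although the source term is not
            issues.append(f"{term.target!r} in target but {term.source!r} not in source (reverse check)")
        for bad in term.forbidden_targets:
            if bad.lower() in t:
                issues.append(f"forbidden term {bad!r} used")
    return issues


TERMS = [
    TermPair("screen", "Bildschirm", forbidden_targets=("Monitor",)),
    TermPair("building", "Gebäude"),
]
```

For instance, "Clean the screen." translated as "Reinigen Sie den Monitor." yields two issues: the missing translation "Bildschirm" and the forbidden term "Monitor".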
Example (Wordfast)
Translation from term list not found
Example (Wordfast)
Blacklisted term found
Example (Quintilian, Add-in for Word)
Hit or Miss for terms from Excel list
Example (across)
Wrong translation (building = Gebäude)
Missing translation
Example (Transit)
Wrong translation (building = Gebäude)
Several translations
Missing translation in term database
Examples (Déjà Vu)
Wrong translation (building = Gebäude)
Examples (SDLX)
Several translations for one word
Examples (Trados)
- Forbidden term (Monitor)
- Wrong term (Auswahlschalter)
- No target (menu)
Setup of term list / term base
In order to check for forbidden terms, these terms need to be marked with an attribute.
Term check in QA tools
File formats: bilingual files
TRADOS DOC/RTF and TTX, STAR language files, TM in TMX or TRADOS TXT format
Checks:
Terms from term list / term base not used
Terms without target equivalent in term list / term base
Target term present but source term not present (reverse check)
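Before such checks can run on a TM, the bilingual material has to be read into segment pairs. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming TMX 1.4 (`xml:lang` on `tuv`, with a fallback to the older `lang` attribute). It keeps plain text only, drops the content of inline codes such as `bpt`, `ept` and `ph`, and compares language codes exactly, so `en` and `en-US` would not match.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
INLINE_CODES = {"bpt", "ept", "ph", "it", "ut"}  # TMX inline codes holding native markup


def seg_text(elem):
    """Plain text of a <seg>, skipping the content of inline code elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in INLINE_CODES:
            parts.append(seg_text(child))  # e.g. <hi> contains translatable text
        parts.append(child.tail or "")
    return "".join(parts)


def read_tmx(tmx_text, src_lang, tgt_lang):
    """Return (source, target) pairs for the given language codes (case-insensitive)."""
    root = ET.fromstring(tmx_text)
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()  # TMX 1.4 / older TMX
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = seg_text(seg)
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs
```

The resulting pairs can be fed directly into a segment-level terminology check.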
Example (ErrorSpy): Excel term list / suffix and prefix lists or MultiTerm term base
Example (QA Distiller)
Excel term list converted to internal dictionary
Summary
Each checking routine only checks some possibilities, none checks the whole range
Missing translation in term list / term base
Term with several translations
Term with wrong translation
Missing source term (reverse check)
Terminology Management Tools
Exchange
Terminology Exchange
Most tools can export and import tab-delimited text files
Some tools offer the TBX (TermBase Exchange) format or a similar XML format for data exchange: Transit, Heartsome, across, MultiTerm (XML format, similar to TMX)
TBX from term databases could also be useful for: extracting stop word lists for extraction tools, re-use as dictionary in machine translation systems, displaying terminology information online, optimizing search engines (keywords) and text mining tools
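A tab-delimited term list can be read and written with Python's `csv` module. The column layout (one column per language code plus a `status` column, with a header row) is an assumption; tools differ in their header conventions, and a flat file cannot hold the full entry structure (several synonyms, user-defined fields) that XML formats such as TBX can.

```python
import csv
import io


def read_term_list(tsv_text):
    """Parse a tab-delimited term list whose first row names the columns."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def write_term_list(rows, columns):
    """Write rows (dicts) back to tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Reading a file and writing it back with the same column order reproduces the original text, which makes the format convenient for round trips between tools.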
TBX TermBase Exchange
From the TBX specification:
TBX is an open XML-based standard format for terminological data.
TBX is designed to support the analysis, representation, dissemination, and exchange of information from human-oriented terminological databases (term bases).
TBX is built on the basis of ISO 12620 (data categories) and ISO 12200 (MARTIF - Machine-readable Terminology Interchange Format, core structure)
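A minimal sketch of the TBX core structure, built with `xml.etree.ElementTree`: a `martif` root with `martifHeader`, and `text/body` containing `termEntry`, `langSet`, `tig` and `term`. The entry-level `descrip type="subjectField"` is one data category used for illustration; which categories and dialect constraints (for example TBX-Basic) a receiving tool expects is an assumption to check against the specification.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tbx(entries, source_desc="Exported term list"):
    """entries: list of {"subject": str (optional), "terms": {lang: [term, ...]}}."""
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: "en"})
    header = ET.SubElement(martif, "martifHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    ET.SubElement(ET.SubElement(file_desc, "sourceDesc"), "p").text = source_desc
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for i, entry in enumerate(entries, start=1):
        term_entry = ET.SubElement(body, "termEntry", {"id": f"c{i}"})  # one concept
        if "subject" in entry:
            ET.SubElement(term_entry, "descrip", {"type": "subjectField"}).text = entry["subject"]
        for lang, terms in entry["terms"].items():
            lang_set = ET.SubElement(term_entry, "langSet", {XML_LANG: lang})  # one language
            for term in terms:
                tig = ET.SubElement(lang_set, "tig")  # term information group
                ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")
```

ElementTree serializes the XML namespace attribute as `xml:lang`, so the output carries the language IDs in the form TBX uses.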
Structure of a TBX file
Structure of a TBX file
Figure: a TMX file and a TBX file side by side, with callouts for the English term, the French term, global information in the entry head, term-level information, administrative data of a language, and the language ID
Conversion from MultiTerm to TBX (Medtronic)
Medtronic Mapping Table
Terminology Processes
Figure: terminology process diagram with the elements authoring, terminology extraction, term list, terminology approval, import into the terminology database, translation and terminology check, term translations / new terms / change requests, import of translations, term check during authoring, and online publication of the term database (intranet/internet)
Terminology Processes
Be aware that terminology work requires a lot of resources
Always include everybody who has to deal with terminology (authoring, production, development, marketing, sales, translation)
There needs to be one person responsible for terminology for each language
Terminology Costs
Terminology Cost
About 10 to 15 Euro per source term including definition
About 20 terms a day
Between about 1 and 1.5 Euro per term for translation into one language
Changing a term during a translation project will cost about $1000 per term per language (JDEdwards)
Sample calculation
1000 source terms
5 target languages
Rate per hour: 50 Euro
Initial corpus of terms: 100 terms per hour
Term maintenance (hours per year and language with 1000 base terms): 12
(study by tekom regional group Saxony, Germany)
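The slide lists the parameters of the sample calculation but not how they combine. One possible reading, stated as an assumption: the initial corpus is built once for the source terms at 100 terms per hour, and maintenance takes 12 hours per year for each target language.

```python
def terminology_cost(source_terms=1000, target_languages=5, rate_per_hour=50.0,
                     terms_per_hour=100, maintenance_hours_per_language=12):
    """Hours and Euro amounts under the assumptions stated in the lead-in."""
    initial_hours = source_terms / terms_per_hour  # one-off set-up of the source corpus
    maintenance_hours = maintenance_hours_per_language * target_languages  # per year
    return {
        "initial_hours": initial_hours,
        "initial_cost_eur": initial_hours * rate_per_hour,
        "maintenance_hours_per_year": maintenance_hours,
        "maintenance_cost_eur_per_year": maintenance_hours * rate_per_hour,
    }
```

Under this reading the defaults give 10 hours (500 Euro) for the initial corpus and 60 hours (3000 Euro) of maintenance per year.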
Alignment
Alignment
Old source and target language documents are read into the alignment component of the TM tool
The tool segments the files and tries to connect the segments that belong together, thus creating segment pairs
A translator checks the alignment
Results are imported into a TM system for reuse with new translations
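Connecting the segments that belong together is often based on comparing segment lengths. The sketch below is a simplified length-based alignment in the spirit of Gale and Church: dynamic programming over 1-1, 1-0, 0-1, 2-1 and 1-2 pairings, with a cost derived from the character-length ratio. The penalty values are arbitrary assumptions, sentences are assumed to be segmented already, and commercial TM tools use their own methods. The result still needs the translator's check.

```python
import math


def align(src, tgt, skip_penalty=3.0, merge_penalty=0.5):
    """Align two lists of sentences and return (source, target) text pairs."""
    def cost(a, b):
        la, lb = sum(len(s) for s in a), sum(len(s) for s in b)
        if la == 0 or lb == 0:
            return skip_penalty  # sentence without counterpart (1-0 or 0-1)
        penalty = merge_penalty if len(a) + len(b) > 2 else 0.0  # 2-1 or 1-2
        return abs(math.log(la / lb)) + penalty  # 0 when both sides have equal length

    n, m = len(src), len(tgt)
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == math.inf:
                continue
            for di, dj in ((1, 1), (1, 0), (0, 1), (2, 1), (1, 2)):
                ni, nj = i + di, j + dj
                if ni <= n and nj <= m:
                    c = best[i][j] + cost(src[i:ni], tgt[j:nj])
                    if c < best[ni][nj]:
                        best[ni][nj], back[ni][nj] = c, (i, j)
    # Walk back from the end to recover the chosen segment pairs
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((" ".join(src[pi:i]), " ".join(tgt[pj:j])))
        i, j = pi, pj
    return pairs[::-1]
```

Two English sentences translated as one German sentence come out as a single 2-1 pair, which is the kind of case a translator should confirm during the check.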
Example: Déjà Vu
SDLX
Alignment
Before translation, to add new segment pairs to a TM
After translation, to get the truly final segment pairs into the TM, as target language documents tend to get corrected after translation as well
Export of alignment to TMX for terminology extraction
Translation Tools
Figure: translation tools overview with translation memory, term base / term list, editor of TM tool, terminology extraction, source language files, target language files, alignment, possible file preparation (text extraction, file conversion) and creation of the target language file (DTP)
Tools in the localization process
Software Localization Tool
Translation Tools
Utilities
Utilities
Word count tools
Conversion tools (from PDF to Word)
Extraction tools for text extraction from certain file formats
QA tools for bilingual material (TMs)
Macros, self-developed tools
Word count tools
Different tools count differently because they have different definitions of what a "word" is.
Some count stand-alone numbers and symbols (like Word), others don't (like Trados)
Always agree on the tool (and better still on the tool version) that both sides use for counting words
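The effect of different word definitions can be shown with two deliberately simple, hypothetical counting rules: one counts every whitespace-separated token, the other only tokens that contain at least one letter. These are not the actual algorithms of Word or Trados, only an illustration of why the same text produces different counts.

```python
def count_words(text: str, count_numbers_and_symbols: bool = True) -> int:
    """Count whitespace-separated tokens, optionally skipping tokens without letters."""
    tokens = text.split()
    if count_numbers_and_symbols:
        return len(tokens)
    # Stand-alone numbers and symbols such as "5" or "-" contain no letters
    return sum(1 for token in tokens if any(ch.isalpha() for ch in token))


sample = "Press 5 times - then click OK."
print(count_words(sample), count_words(sample, count_numbers_and_symbols=False))  # 7 5
```

A difference of two words in one short sentence adds up quickly over a whole project, which is why both sides should agree on the counting tool.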
Word count tools
Freebudget, http://www.webbudget.com/fb4dload.htm
TextCount, http://www.textcount.com/html/textcount-en.html
AnyCount, http://www.anycount.com/
TotalAssistant, http://www.surefiresoftware.com/totalassistant/
PractiCount, http://www.practiline.com/
Online Word Count, http://allworldphone.com/count-words-characters.htm
WordCalc, Syllable and word count, http://www.wordcalc.com/
Online Word Count Tool, http://www.wordcounttool.com/
Conversion / Text Extraction Tools
Converting PDF to Word
Batch converting files to a specific format (FrameMaker to MIF, InDesign to INX, Word to RTF)
Text extraction for translation (Copyflow for QuarkXPress, software strings to Excel/XML)
QA and TM Maintenance Tools
Checking for consistent punctuation (same as in the source, or target-language-specific)
Checking numbers
Checking for missing translations
Checking for same source / target
Checking for same source / different target
Search / Replace functions
QA and TM Maintenance Tools
ErrorSpy, DOG, http://dog-gmbh.com/index.php?id=44&L=1
QA Distiller, Yamagata, http://www.qa-distiller.com/
Quintilian, TerminologyMatters, http://www.terminologymatters.com/quintilian.html
BlackJack, ITR, http://www.itr.co.uk/en/translation-quality-assurance.html
Olifant, Enlaso Tools, http://www.translate.com/technology/tools/Olifant.html
Translation Tools
Workflow Management / Project Management Tools
Project Management and Workflow Tools
Project creation in TM tool: packaging of project files, inserting translated project files
Project management tools: offers and invoicing, data on customers and vendors
Workflow tool: automation of processes (file conversion, pre-translation, packaging, sending out package to assigned translator)
Project / Workflow Management
Batch processing
Multiple TMs / term databases
Online tracking of project status
Automating sequences of steps: word count, pre-translation, copying files to translate to different language folders
Rights and roles
What is a Workflow?
A workflow defines the order (rules) in which specific processes (consisting of separate tasks) are performed to achieve a defined result.
Each process consists of individual tasks.
Each task is associated with a specific resource.
Every player in the workflow has a specific role with certain rights.
In order to describe a workflow:
Define your processes that make up the workflow
Standardize processes
Break down each process into individual tasks
Assign the task to a player in the process
Assign a role to the player (defining the rights)
This means that you need to take a very detailed look into the processes involved to be able to define a standard workflow.
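The definition above maps onto a few data types: a workflow is an ordered list of processes, each process an ordered list of tasks, each task is assigned to a player with a role, and each role carries rights. The sketch is hypothetical (class names, rights and players are invented for illustration) and only enforces order and rights. A real workflow system would also track status, notifications and exceptions to the rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    rights: frozenset  # what a player in this role may do


@dataclass
class Task:
    name: str
    player: str          # the resource assigned to the task
    role: Role
    required_right: str  # right needed to carry out the task


@dataclass
class Process:
    name: str
    tasks: list  # tasks in the order they are performed


def run_in_order(workflow):
    """Yield (process, task, player) in the defined order; stop if a player lacks the right."""
    for process in workflow:
        for task in process.tasks:
            if task.required_right not in task.role.rights:
                raise PermissionError(
                    f"{task.player} ({task.role.name}) lacks the right {task.required_right!r}")
            yield process.name, task.name, task.player
```

Keeping rights on the role rather than on the person means a player can be swapped without redefining the workflow.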
Breaking Down a Project
Localization project: define the processes
Terminology questions
File preparation
File handling
Translation
Proofreading / editing
Quality assurance
Process: File Preparation
Define the tasks
Get file list with file formats
Check files for translatability: simulated translation for software files, check for text on graphics in documentation, check if there is enough space for text expansion, check for consistent use of terminology
Define what file format needs what kind of preparation: convert files / extract text / mark translatable and untranslatable text; create additional files like settings for use in certain translation tools; description for the translators on how to handle the files
Get statistics from the files (number of words, number of repeated sentences, number of sentences that will come from the translation memory system as a match / translation suggestion)
Pre-translate files with existing translation memories
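The statistics step can be sketched as follows: count words, repetitions, exact TM matches and fuzzy matches. Python's `difflib` similarity ratio stands in for a match rate here; TM tools use their own match-rate formulas, so percentages and the `fuzzy_threshold` are assumptions and will not agree with any specific tool.

```python
import difflib
from collections import Counter


def analyze(segments, tm_sources, fuzzy_threshold=0.75):
    """Rough file statistics: words, repetitions, exact and fuzzy TM matches."""
    stats = {"segments": len(segments), "words": 0, "repetitions": 0,
             "exact": 0, "fuzzy": 0, "no_match": 0}
    seen = Counter()
    for seg in segments:
        stats["words"] += len(seg.split())
        seen[seg] += 1
        if seen[seg] > 1:
            # Only the second and later occurrences count as repetitions
            stats["repetitions"] += 1
            continue
        if seg in tm_sources:
            stats["exact"] += 1
            continue
        # difflib similarity stands in for a tool's own match-rate formula
        best = max((difflib.SequenceMatcher(None, seg, src).ratio() for src in tm_sources),
                   default=0.0)
        stats["fuzzy" if best >= fuzzy_threshold else "no_match"] += 1
    return stats
```

Counting repetitions before TM lookup mirrors the idea that a repeated sentence only needs to be translated once in the file.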
Automation?
Define the tasks
Get file list with file formats
Check files for translatability: simulated translation for software files, check for text on graphics in documentation, check if there is enough space for text expansion, check for consistent use of terminology
Define what file format needs what kind of preparation: create additional files like settings for use in certain translation tools; description for the translators on how to handle the files
Convert files / extract text / mark translatable and untranslatable text (automate)
Get statistics from the files (number of words, number of repeated sentences, number of sentences that will come from the translation memory system as a match / translation suggestion) (automate)
Pre-translate files with existing translation memories (automate)
How to Automate?
Simple macros
Create scripts or small tools
Use the API of translation tools to automate several steps in one go
Set up a workflow management system
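As an example of a small script, copying the files to translate into one folder per target language (one of the steps named under project / workflow management) takes only the standard library. The folder layout and file patterns are assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_language_folders(source_dir, project_dir, languages, patterns=("*.rtf", "*.xml")):
    """Copy every file matching `patterns` from source_dir into project_dir/<language>/."""
    source_dir, project_dir = Path(source_dir), Path(project_dir)
    copied = []
    for lang in languages:
        lang_dir = project_dir / lang
        lang_dir.mkdir(parents=True, exist_ok=True)
        for pattern in patterns:
            for src_file in sorted(source_dir.glob(pattern)):
                target = lang_dir / src_file.name
                shutil.copy2(src_file, target)  # copy2 also keeps timestamps
                copied.append(target)
    return copied


if __name__ == "__main__":
    # Demo in a throw-away directory
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "source")
        src.mkdir()
        (src / "manual.rtf").write_text("sample")
        for path in copy_to_language_folders(src, Path(tmp, "project"), ["de", "fr"]):
            print(path.relative_to(tmp))
```

Files that do not match the patterns (for example reference material) stay out of the language folders.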
When to use Workflow (Automation)?
Whenever there is a set of processes that will be used over and over again, this can be thought of as a workflow:
the rules are set
the process sequence is always the same
each process has its pre-defined set of tasks
for each task you can decide who is going to do it
it can be documented and learned
Figure: term questions without automation. The translator has a question on a term and asks the project manager of the translation agency; the agency PM asks the PM at the client; terminology questions in the source language go to the client authoring / engineering department, terminology questions in the target language to the client market center.
Figure: term questions with automation through a central connection point and roles for each user. The translator adds term questions to an online list (or sends them to the agency, which adds them); participants are the translator, the agency, the client PM, the client authoring department (source language questions) and the client market center (target language questions).
Summary
Workflow management is work.
Many workflow management systems provide a platform for describing and automating parts of a workflow, not a ready-to-fly automated system.
Get the process right, then think about automation.
Break down bigger elements into smaller units (processes, tasks, steps) and describe their conditions and interdependencies.
When necessary, there should be a way to break the workflow rules.
Machine Translation
Machine Translation
Requires source documentation that is optimized for machine translation (controlled source language)
Requires post-editing, not so much post-translation
Can be used in combination with TM systems (to pre-translate large amounts), but translators often do not find it very helpful
TM - MT
Interactive translation (TM): interactive process; almost all language pairs possible; creation of a repository; recycling of translations independent of the format of the source document
Machine translation: fully automated process; only works for the language pair the system was created for; text is usually pre-edited and/or post-edited; good systems are relatively costly; very fast
Translation Tools
Evaluation
Evaluation
Collect your requirements
Get a demo with your files from different tools vendors
Test the tools yourself with the evaluation matrix
Tools, Tools, Tools
TM tools information
Software reviews: http://www.localizationworks.com/DRTOM/index.html
Comparisons: DOG, Translation Memory Systeme im Vergleich (German), 2005, ISBN 978-3-9810595-1-9
Details: functionalities, test setup, result matrix
MDÜ, Translation Memory Systeme im Vergleich (German), issue 4/5 2005
Newsletters: http://www.internationalwriters.com/toolkit/
Magazines: www.multilingual.com
Training material and courses on tools: http://ecolotrain.uni-saarland.de/index.php?id=2525&L=1
Surveys: LISA (www.lisa.org) Translation Memory Survey 2002 and 2004; Imperial College London, Translation Memories Survey 2006, http://www3.imperial.ac.uk/portal/pls/portallive/docs/1/7307707.PDF
Some software localization tools
Passolo, www.passolo.com (XLIFF)
RC-WinTrans, www.schaudin.com
Catalyst, www.alchemysoftware.ie (XLIFF)
Multilizer, www.multilizer.com
SDL Insight, www.sdl.com/products/sdlinsight.htm (XLIFF)
Language Studio, ls.atia.com
Lingobit Localizer, www.lingobit.com
RapidTranslation, www.rapidtranslation.net
Visual Localize, www.visloc.com
AppleGlot, http://developer.apple.com/intl/localization/tools.html
Some TM tools also offer the ability to translate EXE files and the like, but usually they do not offer a visual representation of the dialogs and menus.
Some Terminology Extraction Tools
Concordance tools: Simple Concordance Program (SCP), http://www.textworld.com/scp/; ExtPhr32, http://publish.uwo.ca/~craven/freeware.htm
Term extraction tools / components of translation memory tools
Statistical extraction: MultiTerm Extract, Déjà Vu Lexicon, Heartsome Dictionary Editor, across, TermiDOG (www.dog-gmbh.de), Chamblon Terminology Extractor (http://www.chamblon.com/terminologyextractor.htm)
Linguistic extraction: Synthema Terminology Wizard (http://www.synthema.it/english/servizi/traduzioni.html), SDL PhraseFinder
Some Translation Memory Tools
Word based:
Wordfast, www.wordfast.net
Metatexis, www.metatexis.com
SDL Trados Workbench with Word, www.sdl.com
TinyTM (open source), tinytm.sourceforge.net
JiveFusion (for Office files), http://www.jivefusiontech.com/products_FT.html
Integrated TM systems:
Déjà Vu, www.atril.com
Transit, www.star-group.net
MemoQ, http://en.kilgray.com/?q=node/products/memoq/memoq4free
Across, www.across.net
SDLX, www.sdl.com (integrated into SDL Trados installation)
Heartsome, http://www.heartsome.net/EN/home.html
Some Online Translation Memory Tools
Online TM tools:
Ontram, http://www.andrae-ag.de/EN/products/ontram.htm
TinyTM, tinytm.sourceforge.net
Server-based translation environment from SDL Trados, across
Some Tools for Source Language Checking
Acrolinx IQ Suite, acrolinx, http://www.acrolinx.com/iq_suite_overview_de.html (German)
HyperSTE, Tedopres, http://www.tedopres.com/en/products-services/hypervision/
SDL Author Assistant, SDL Writer's Workbench, SDL Trados
Authoring Coach, Sajan, http://marketing.sajan.com/marketing/solutions/authoring.aspx
Author-it Xtend, http://www.author-it.com/index.php?page=xtendauthoringmemory
across Plug-in
Some Tools for Target Language Checking
ErrorSpy, DOG, http://dog-gmbh.com/index.php?id=44&L=1
QA Distiller, Yamagata, http://www.qa-distiller.com/
Quintilian, TerminologyMatters, http://www.terminologymatters.com/quintilian.html
BlackJack, ITR, http://www.itr.co.uk/en/translation-quality-assurance.html
Some Workflow/Project Management Tools
SDL Trados Teamworks
SDL TMS
Idiom Worldserver
Across workflow module
Business Manager, Plunet
Project open
LTC Organizer
TMX specification
TMX is a recommendation by OSCAR (Open Standards for Container/Content Allowing Re-use), a LISA special interest group
The latest specification can be downloaded from http://www.lisa.org/tmx/tmx.htm
List of TMX certified tools
The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process.
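Writing TMX is the reverse of reading it. The sketch below produces a minimal TMX 1.4 document with the header attributes the 1.4 specification requires (`creationtool`, `creationtoolversion`, `segtype`, `o-tmf`, `adminlang`, `srclang`, `datatype`). The tool name, version and `o-tmf` value are placeholders, and inline formatting codes are not handled.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def write_tmx(pairs, src_lang, tgt_lang, tool="example-script", tool_version="0.1"):
    """Serialize (source, target) text pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": tool, "creationtoolversion": tool_version,
        "segtype": "sentence", "o-tmf": "none",  # placeholder for the original TM format
        "adminlang": "en", "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Because only plain text is written, formatting that a source tool stored as inline codes would be lost; that is exactly the kind of data loss the TMX levels are meant to describe.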
SRX Specification
Latest version www.lisa.org/srx/srx.htm
www.lisa.org/srx/srx10-20040420.htm
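SRX expresses segmentation as ordered rules. Each rule has a before-break and an after-break regular expression and a flag saying whether to break, and exceptions (no-break rules) are placed before the break rules so that the first matching rule decides. The sketch below imitates that behavior in Python with two hypothetical rules. It does not read SRX files, and Python's `re` syntax differs in details from the regular expression flavor SRX specifies.

```python
import re

# Each rule: (break?, before-break regex, after-break regex). Order matters: first match wins.
RULES = [
    (False, r"\b(?:e\.g|i\.e|Dr|Mr)\.", r"\s"),  # exceptions: no break after these abbreviations
    (True, r"[.?!]+", r"\s+[A-Z]"),              # break after sentence-final punctuation
]


def segment(text, rules=RULES):
    """Split text into segments by testing every position against the ordered rules."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            # before-break must match up to pos, after-break must match from pos
            if re.search(f"(?:{before})$", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides, break or no break
    segments.append(text[start:].strip())
    return [s for s in segments if s]
```

Exchanging such rules between tools is what allows two TM systems to segment the same text identically, which in turn keeps match rates comparable.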