Metadata in Translation Tools: Importance, Usage, Storage, Transfer
Angelika Zerfaß
& Richard Sikes
Metadata is...
• Data that describes other data.
• It provides information about a certain item's content.
Some Examples from everyday life
• Graphic File:
– How large
– Color depth
– Image resolution
– When created
– File name
– Description
– Key words
• Book
– Author
– Genre
– Subject
– Length
– When written
– Summary
– Location
– Key words
Data vs Metadata
Metadata in Localization
• What kind of metadata can be provided throughout the localization process?
• Where and how can it be used?
• How well does it transfer between tools?
Metadata in Translation Tools
Information about the translation itself
– Creation date
– User who created the translation
– User who created the translation databases
• TMs,
• Term bases
• ...
– Status information about the file in translation
• Is the translation confirmed?
• Does it come from a TM?
• From automated translation?
• From alignment?
• …
Uses
Metadata within one TM system:
• Categorize translated content within one translation memory system
• Influence the match rates during translation.
Transfer between Tools
Different translation tool components use different exchange formats.
• Translation memories are exchanged via TMX (Translation Memory Exchange format).
• Terminology data is exchanged via CSV or TBX (Term Base Exchange format).
• Segment pairs (source segment plus translated segment) are exchanged via XLIFF or a customized version thereof.
Where does Metadata Reside?
• In the header of exchange file formats like TMX, XLIFF, TBX
– Administrative data
– User-defined data (name of TM, path where XLIFF was saved to…)
• On the translation unit/term level of the exchange files
– Administrative data (who and when)
– Process data (coming from alignment…)
– Categorization data (pre-defined and custom fields)
• Inside segments (formatting…)
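The three locations listed above can be illustrated with a short script. This is a minimal sketch, assuming a tiny hypothetical TMX document (the tool name, field names and values in `SAMPLE_TMX` are made up for illustration) and using only Python's standard `xml.etree.ElementTree` module. It collects header-level data, translation-unit-level data, and the inline tags found inside segments.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature TMX document; all names and values are illustrative.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" adminlang="en-us" srclang="en"
          o-tmf="Example" datatype="unknown">
    <prop type="name">Demo TM</prop>
  </header>
  <body>
    <tu creationid="user1" creationdate="20111012T003107Z">
      <prop type="client">client A</prop>
      <tuv xml:lang="en"><seg>Hello <bpt i="1">&lt;b&gt;</bpt>world<ept i="1">&lt;/b&gt;</ept></seg></tuv>
      <tuv xml:lang="de"><seg>Hallo Welt</seg></tuv>
    </tu>
  </body>
</tmx>"""


def props(element):
    """Collect the direct <prop> children of an element as (type, text) pairs."""
    return [(p.get("type"), (p.text or "").strip()) for p in element.findall("prop")]


def describe(tmx_text):
    """Return metadata grouped by where it resides: header, unit, inside segments."""
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    result = {
        "header_attributes": dict(header.attrib),
        "header_props": props(header),
        "units": [],
    }
    for tu in root.iter("tu"):
        # Inline tags (formatting) are child elements of <seg>.
        inline = sorted({child.tag for seg in tu.iter("seg") for child in seg})
        result["units"].append({
            "attributes": dict(tu.attrib),
            "props": props(tu),
            "inline_tags": inline,
        })
    return result


info = describe(SAMPLE_TMX)
print(info["header_props"])
print(info["units"][0]["props"], info["units"][0]["inline_tags"])
```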
More detailed uses for Metadata
• Searching for segments in a TM
• Searching for terms in a term base
• Filtering for segments during translation (prefer / penalize, i.e. decrease match rate)
• TM clean-up
• TM splitting (filter during export)
• Term base splitting (filter during export)
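Two of the uses above, penalizing matches and splitting a TM by a field value, can be sketched in a few lines. This is a simplified illustration, not how any particular tool implements it: the in-memory `units` list, the field names and the penalty points are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical in-memory translation units: metadata plus a raw match rate.
units = [
    {"target": "Getriebe", "match": 100, "meta": {"client": "client A", "aligned": "no"}},
    {"target": "Übersetzung", "match": 100, "meta": {"client": "client B", "aligned": "no"}},
    {"target": "Schaltgetriebe", "match": 95, "meta": {"client": "client A", "aligned": "yes"}},
]


def apply_penalties(units, penalties):
    """Lower the match rate of every unit whose metadata matches a penalty rule.

    `penalties` maps (field, value) to the percentage points to subtract.
    The input list is left unchanged; a re-ranked copy is returned.
    """
    ranked = []
    for unit in units:
        deduction = sum(
            points
            for (field, value), points in penalties.items()
            if unit["meta"].get(field) == value
        )
        ranked.append({**unit, "match": max(0, unit["match"] - deduction)})
    return sorted(ranked, key=lambda u: u["match"], reverse=True)


def split_by_field(units, field):
    """Group units by one metadata field, as a filtered export would."""
    groups = defaultdict(list)
    for unit in units:
        groups[unit["meta"].get(field, "(none)")].append(unit)
    return dict(groups)


ranked = apply_penalties(units, {("client", "client B"): 5, ("aligned", "yes"): 3})
print([(u["target"], u["match"]) for u in ranked])
print({client: len(group) for client, group in split_by_field(units, "client").items()})
```

With these example rules, the unit from the other client and the aligned unit both drop below the confirmed unit for client A.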
Details…
• Metadata in TMs
– TM level data
• Administrative data (name, who, when)
• Pre-defined categories (subject, client…) if the tool so provides
• Document level data with corpus based tools
– TU level data
• Administrative (created when and by whom)
• Process data (TU comes from alignment…)
• Custom categories
Our Tests
• Create TM with custom metadata fields
• Add translations to TM with field information
• Export to TMX
• Import into different tool
• What metadata transfers well, what doesn’t
Metadata when creating a TM (1)
select/fill predefined fields
Metadata when creating a TM (2)
custom fields
Metadata in the TM associated with a segment
Administrative data
Process data
Predefined data fields
User defined data fields
Information on the TM (container) level
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
<header creationtool="MemoQ" creationtoolversion="5.0.21"
segtype="sentence" adminlang="en-us" creationid="AZerfass"
srclang="en" o-tmf="MemoQTM" datatype="unknown">
<prop type="defclient">client A</prop>
<prop type="defproject">proj 123</prop>
<prop type="defdomain">automotive</prop>
<prop type="defsubject">transmission</prop>
<prop type="description">models 1-5b</prop>
<prop type="targetlang">de</prop>
<prop type="name">Full Settings</prop>
</header>
Information on the segment level
<tu changedate="20111012T003107Z" creationdate="20111012T003107Z"
creationid="AZerfass" changeid="AZerfass">
<prop type="client">autoparts</prop>
<prop type="project">1-2-3-4-5</prop>
<prop type="domain">automotive-aeronautics</prop>
<prop type="subject">spare parts</prop>
<prop type="corrected">no</prop>
<prop type="aligned">no</prop>
<prop type="x-document">Demo 1</prop>
<tuv xml:lang="en">
<prop type="x-context-pre"><seg>Dies ist ein neuer Satz.</seg></prop>
<prop type="x-context-post"><seg>Dies ist ein kurzer wunderschöner Satz.</seg></prop>
<prop type="x-reviewer">rev1</prop>
<prop type="x-internal id">44567</prop>
<prop type="x-date of review">20111012T003000Z</prop>
<prop type="x-doc type">broschure</prop>
<prop type="x-model">model 1</prop>
<prop type="x-model">model 5</prop>
<seg>Dies ist ein kurzer neuer Satz.</seg>
</tuv>
<tuv xml:lang="de">
<seg>This is a short, new sentence.</seg>
Metadata used for sorting in the TM
Metadata when creating a TM (1)
select/fill predefined fields
Metadata when creating a TM (2)
custom fields
Metadata in the TM associated with a segment
Administrative data
Process data
Predefined data fields
User defined data fields
Information on the TM level
<?xml version="1.0" encoding="utf-8"?>
<tmx version="1.4">
<header creationtool="SDL Language Platform" creationtoolversion="8.0" o-tmf="SDL TM8 Format" datatype="xml" segtype="sentence" adminlang="de-DE" srclang="de-DE" creationdate="20111012T011627Z" creationid="Z-0314F13C5AED4\A">
<prop type="x-reviewer:SingleString"></prop>
<prop type="x-doc type:SinglePicklist">legal,workshop manual,website,broschure</prop>
<prop type="x-model:SinglePicklist">model 1,model 2,model 3,model 4,model 5</prop>
<prop type="x-internal id:Integer"></prop>
<prop type="x-review date:DateTime"></prop>
<prop type="x-Recognizers">RecognizeAll</prop>
<prop type="x-TMName">TM for TMX test</prop>
</header>
Information on the segment level
<tu creationdate="20111012T032948Z" creationid="ALIGN!" changedate="20111012T032948Z" changeid="ALIGN!" lastusagedate="20111012T013621Z" usagecount="2">
<prop type="x-Context">-8428286702482475836, 1404007344699555312</prop>
<prop type="x-Context">615444784753120163, 615444784753120163</prop>
<prop type="x-Origin">Alignment</prop>
<prop type="x-OriginalFormat">TradosTranslatorsWorkbench</prop>
or
<prop type="x-Origin">TM</prop>
<prop type="x-ConfirmationLevel">Translated</prop>
<prop type="x-review date:DateTime">20100303T120000Z</prop>
<prop type="x-reviewer:SingleString">AZ</prop>
<prop type="x-internal id:Integer">12345</prop>
<prop type="x-doc type:SinglePicklist">workshop manual</prop>
<prop type="x-model:SinglePicklist">model 4</prop>
<tuv xml:lang="de-DE">
<seg>Dies ist ein neuer Satz.</seg>
</tuv>
<tuv xml:lang="en-US">
<seg>This is a new sentence.</seg>
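The date attributes in these exports (`creationdate`, `changedate`, `lastusagedate`) use the compact UTC notation `YYYYMMDDThhmmssZ`. A minimal sketch of reading them in Python follows, for example as a first step toward a date-based TM clean-up; the helper name `parse_tmx_date` is an assumption made for this illustration.

```python
from datetime import datetime, timezone

# Compact UTC timestamp layout used by the TMX date attributes shown above.
TMX_DATE_FORMAT = "%Y%m%dT%H%M%SZ"


def parse_tmx_date(value):
    """Parse a TMX timestamp such as 20111012T032948Z into an aware datetime."""
    return datetime.strptime(value, TMX_DATE_FORMAT).replace(tzinfo=timezone.utc)


created = parse_tmx_date("20111012T032948Z")    # creationdate from the sample
last_used = parse_tmx_date("20111012T013621Z")  # lastusagedate from the sample

print(created.isoformat())
print(last_used < created)
```

In the sample values the last usage date is earlier than the creation date, so a clean-up script should probably not assume any particular ordering between these attributes.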
Metadata used for sorting in the TM
Metadata on document level
Information on the document level
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
<header creationtool="MultiTrans" creationtoolversion="5.0.1947.0" segtype="sentence" o-tmf="DVMDB" adminlang="en-us" srclang="en-us" datatype="html" creationdate="20111011T220958Z" creationid="Richard" changedate="20111011T223931Z" changeid="Richard">
<prop type="TXB:Name">LocWorld HTML Example</prop>
<prop type="DOC:Created date">20111011T094600Z</prop>
<prop type="DOC:Modified date">20111011T095000Z</prop>
<prop type="DOC:Name">HTML Example.htm</prop>
<prop type="DOC:Source language">eng</prop>
<prop type="DOC:Revision Number">0</prop>
<prop type="DOC:Revision Date">20111011T000000Z</prop>
<prop type="DOC:Created date">20111011T094600Z</prop>
<prop type="DOC:Modified date">20111011T095000Z</prop>
<prop type="DOC:Name">HTML Beispiel.htm</prop>
<prop type="DOC:Source language">eng</prop>
<prop type="DOC:Revision Number">0</prop>
<prop type="DOC:Revision Date">20111011T000000Z</prop>
Metadata on document level
Tests
• Take TMX from Trados 2009 to memoQ 5
• Add all unknown fields to the setup
Tests
• Take TMX from memoQ and MultiTrans into SDL Trados 2009, different use for picklist and text field values
• memoQ
– <prop type="x-model">model 1</prop>
– <prop type="project">1-2-3-4-5</prop>
• MultiTrans
– <prop type="Document Type">Manual</prop>
• SDL Trados 2009
– <prop type="x-doc type:SinglePicklist">workshop manual</prop>
– <prop type="x-reviewer:SingleString">AZ</prop>
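The three naming styles above explain why fields have to be mapped by hand on import. As a rough sketch, a script could normalize the `type` values before comparing them across tools. The list of data-type suffixes below contains only the ones visible in the SDL Trados excerpts of this presentation and is not claimed to be complete; the function name is likewise an assumption.

```python
# Data-type suffixes seen in the SDL Trados 2009 excerpts (assumed incomplete).
SDL_TYPE_SUFFIXES = {"SingleString", "SinglePicklist", "Integer", "DateTime"}


def normalize_prop_type(raw):
    """Split a TMX prop type into (field name, data type or None).

    Only a trailing ":<known suffix>" is treated as a data type, so prefixes
    such as "DOC:" stay part of the field name.
    """
    name, sep, suffix = raw.rpartition(":")
    if sep and suffix in SDL_TYPE_SUFFIXES:
        field, datatype = name, suffix
    else:
        field, datatype = raw, None
    if field.startswith("x-"):
        field = field[2:]  # drop the TMX prefix for user-defined props
    return field.strip().lower(), datatype


for raw in ["x-model", "project", "Document Type",
            "x-doc type:SinglePicklist", "x-reviewer:SingleString"]:
    print(raw, "->", normalize_prop_type(raw))
```

Even after this normalization, "doc type" and "document type" remain different names, so some manual mapping is still needed.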
Tests
• Mapping of fields from a TMX file to existing fields in MultiTrans
Tests
• Create a TM with translations from one specific file format, like HTML, DOC, InDesign, XML…
• Export to TMX and import into another tool
• Run a translation of the exact same file used to create the first TM
• Match rates differ greatly depending on the file format used to create the TM
– Because different segmentation rules are applied
– Because inline tags are not recognized, are interpreted differently, or were not imported at all during TMX exchange
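The effect of both causes can be imitated with a generic string similarity. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in; commercial translation tools use their own fuzzy matching algorithms, so the numbers are only illustrative, and the sample sentences are made up.

```python
import re
from difflib import SequenceMatcher

TAG = re.compile(r"<[^>]+>")


def similarity(a, b):
    """Generic similarity in percent (a stand-in, not any tool's real match rate)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def strip_tags(text):
    """Remove inline tags and collapse the whitespace left behind."""
    return re.sub(r"\s+", " ", TAG.sub("", text)).strip()


stored = "Click <b>Save</b> to continue."           # segment as stored in the TM
same_tags = "Click <b>Save</b> to continue."        # tags survived the exchange
lost_tags = "Click Save to continue."               # tags were dropped on import
merged = "Click Save to continue. Then close it."   # different segmentation rules

print(similarity(stored, same_tags))                # identical strings
print(similarity(stored, lost_tags))                # lower: tags are missing
print(similarity(strip_tags(stored), lost_tags))    # identical once tags are ignored
print(similarity(lost_tags, merged))                # lower: segment boundaries differ
```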
TMX exchange results
Details…
• Metadata in XLIFF
– File level data
• Administrative data (name, who, when)
– TU level data
• Administrative (created when and by whom)
• Process data (status of translation, origin of match)
• Comments, history…
What an XLIFF contains…
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:mq="MQXliff" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-transitional.xsd">
<file original="C:\Users\azerfass.ZAAC\Desktop\TMX Test\Beispieldateien\HTML Beispiel.htm" mq:id="e3b904d9-7967-46bc-88e3-1bdd83d25544" source-language="de" target-language="en" datatype="x-html">
<header>
<skl>
<internal-file>UEsDBBQAAAAIAJSO8j5AXl8ttAIAADoIAAARAAAASFRNTCBCZWlzcGllbC54bWytVVFv2jAQfp+0/4B4B0OgLVRuKgbbigQdK2yrJqTJxJdgNbEj2ym0v37GCTQJQXTSXiC++777znf2Gd9uo7D2DFIxwW/q7WarXgPuCcp4cFNPtN/o1W/………..
What an XLIFF file can contain on the translation unit level
<body>
<trans-unit id="1" mq:status="ManuallyConfirmed" mq:rep="rep"
mq:segmentguid="1f447f04-f394-43e6-aacc-355a1dabed92"
mq:translatorcommittimestamp="0001-01-01T00:00:00Z"
mq:reviewer1committimestamp="0001-01-01T00:00:00Z"
mq:reviewer2committimestamp="0001-01-01T00:00:00Z"
mq:lastchangedtimestamp="2011-07-18T15:47:25Z"
mq:maxlengthchars="-1" mq:nosplitjoin="false">
<source mq:segpart="1"
mq:hasfollowingobject="hasfollowingobject">Beispielseite</source>
<target>sample page</target>
</trans-unit>
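In the excerpt above, the status and timestamps sit in attributes from the tool's own `mq` namespace rather than in core XLIFF attributes. The following sketch, using Python's standard `xml.etree.ElementTree`, separates the two groups for each translation unit. The sample document is a shortened, hypothetical variant of the excerpt; the namespace URIs are the ones declared in the XLIFF header shown earlier.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the XLIFF header shown earlier.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2", "mq": "MQXliff"}

# Shortened, hypothetical variant of the translation unit above.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:mq="MQXliff">
  <file original="example.htm" source-language="de" target-language="en" datatype="x-html">
    <body>
      <trans-unit id="1" mq:status="ManuallyConfirmed" mq:lastchangedtimestamp="2011-07-18T15:47:25Z">
        <source>Beispielseite</source>
        <target>sample page</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Split each trans-unit's attributes into core XLIFF and extension ones."""
    root = ET.fromstring(xliff_text)
    rows = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        core, extension = {}, {}
        for key, value in unit.attrib.items():
            if key.startswith("{"):
                # ElementTree writes namespaced names as "{uri}local".
                uri, local = key[1:].split("}", 1)
                extension[(uri, local)] = value
            else:
                core[key] = value
        rows.append({
            "core": core,
            "extension": extension,
            "source": unit.findtext("x:source", default="", namespaces=NS),
            "target": unit.findtext("x:target", default="", namespaces=NS),
        })
    return rows


rows = read_units(SAMPLE)
print(rows[0]["core"])
print(rows[0]["extension"])
```

A reader that only looks at core attributes would see just the `id` here; the status information is visible only to software that knows the extension namespace.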
What an XLIFF file can contain on the translation unit level
<trans-unit id="5" mq:minorversionend="5"
mq:minorversionstart="4" mq:status="PartiallyEdited" mq:segmentguid="5ecd83ee-3bd8-4a86-9a96-ed10485254bc" mq:translatorcommittimestamp="0001-01-01T00:00:00Z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01T00:00:00Z" mq:lastchangedtimestamp="2011-10-11T15:52:05Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
<source mq:segpart="6">Hier kommt der 5. Satz. </source>
<target>Here comes sentence number five. </target>
<note>Number below 10 are written as words.</note>
Metadata that can be contained
<mq:warnings40>
<mq:errorwarning mq:errorwarning-code="03062" mq:errorwarning-ignorable="errorwarning-ignorable" mq:errorwarning-shorttext="Numbers in source and target segment do not match" mq:errorwarning-problemname="numbers do not match" mq:errorwarning-segmenthash="0" mq:errorwarning-combinedposstart="-1" mq:errorwarning-combinedposlength="0" />
</mq:warnings40>
<trans-unit id="855094c2-c334-4c6f-abf5-9fbad9e89d76">
<source>Schicken Sie eine Mail an: <g id="pt1"><g id="pt2">[email protected]</g></g></source>
<seg-source><mrk mtype="seg" mid="6">Schicken Sie eine Mail an:</mrk> <g id="pt1"><g id="pt2"><mrk mtype="seg" mid="7">[email protected]</mrk></g></g></seg-source><target><mrk mtype="seg" mid="6">Send a mail to:</mrk> <g id="pt1"><g id="pt2"><mrk mtype="seg" mid="7"><mrk mtype="x-sdl-comment" sdl:cid="af90adc0-a008-4484-a46a-5824fddef1ea">[email protected]</mrk></mrk></g></g></target><sdl:seg-defs><sdl:seg id="6" conf="RejectedTranslation" origin="interactive"><sdl:value key="SDL:OriginalTranslationHash">-2010148818</sdl:value></sdl:seg><sdl:seg id="7" conf="Translated" origin="source"><sdl:value key="SDL:OriginalTranslationHash">-669315889</sdl:value></sdl:seg></sdl:seg-defs></trans-unit>
This is a new <mrk mtype="x-sdl-comment" sdl:cid="dc02f347-9f59-486e-8f2e-1f97b2aa2c91">sentence</mrk>.
Tests
• Translate a document
• Send document for review
• Compare documents with track changes
• Export to XLIFF
History of a file
Metadata in XLIFF with track changes
File header
<tool tool-id="MQ" tool-name="MemoQ" tool-version="5.0.21" tool-company="Kilgray" />
<mq:export-path>C:\temp\Demo 1_ger.rtf</mq:export-path>
<mq:docinformation mq:hashistory="true">
<mq:versioninfos mq:majorversion="1"><mq:minorversioninfo mq:minorversion="0" mq:comment="" mq:createdthroughview="false" mq:createreason="Import" mq:creationtime="120498833" mq:creatoruser="AZerfass" mq:tag=""><mq:details mq:type="MinorVersionDetailsImport"><![CDATA[<?xml version="1.0"?>
<MinorVersionDetailsImport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<FilePath>C:\temp\Demo 1.rtf</FilePath>
</MinorVersionDetailsImport>]]></mq:details></mq:minorversioninfo>
<mq:minorversioninfo mq:minorversion="1" mq:comment="after manual translation" mq:createdthroughview="false" mq:createreason="Snapshot" mq:creationtime="120498866" mq:creatoruser="AZerfass" mq:tag=""><mq:details /></mq:minorversioninfo>
<mq:minorversioninfo mq:minorversion="2" mq:comment="" mq:createdthroughview="false" mq:createreason="BilingualExport" mq:creationtime="120499070" mq:creatoruser="AZerfass" mq:tag=""><mq:details mq:type="MinorVersionDetailsBilingExport"><![CDATA[<?xml version="1.0"?>
<MinorVersionDetailsBilingExport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<BilingualType>Xliff</BilingualType>
<TargetPath>C:\temp\compare current to version 1.xlf</TargetPath>
<TwoColumpRtfProperties>MutipleDocuments</TwoColumpRtfProperties>
<BilingRtfProperties>EmptySegmentsWithMarkup</BilingRtfProperties>
<XliffProperties>IncludePreview IncludeSkeletons</XliffProperties>
</MinorVersionDetailsBilingExport>]]></mq:details></mq:minorversioninfo>
<trans-unit id="3" mq:minorversionend="2" mq:minorversionstart="2"
mq:status="PartiallyEdited" mq:percent="101" mq:segmentguid="db0d2d51-8345-4c6d-95e5-6beff647550b" mq:translatorcommittimestamp="0001-01-01T00:00:00Z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01T00:00:00Z"
mq:lastchangedtimestamp="2011-10-12T04:26:19Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
<mq:historical-unit mq:minorversionend="1" mq:minorversionstart="1"
mq:status="ManuallyConfirmed" mq:percent="101" mq:segmentguid="db0d2d51-8345-4c6d-95e5-6beff647550b" mq:translatorcommittimestamp="0001-01-01T00:00:00Z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01T00:00:00Z"
mq:lastchangedtimestamp="2011-10-12T04:22:32Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
<source mq:segpart="3" mq:hasfollowingobject="hasfollowingobject">
Dies ist ein kurzer schöner Satz.</source><target>This is a short, nice sentence.</target>
</mq:historical-unit>
<mq:historical-unit mq:minorversionend="0" mq:minorversionstart="0"
mq:segmentguid="db0d2d51-8345-4c6d-95e5-6beff647550b" mq:translatorcommittimestamp="0001-01-01T00:00:00Z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01T00:00:00Z" mq:lastchangedtimestamp="2007-12-17T12:28:29Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
<source mq:segpart="3" mq:hasfollowingobject="hasfollowingobject">Dies ist ein kurzer schöner Satz.</source>
<target></target>
</mq:historical-unit>
</mq:minorversions></trans-unit>
Tests
• Create a translation with
– Segments in different states (translated, pre-translated, not edited, comments…)
– Create an XLIFF exchange file
– Import XLIFF into another tool
• What metadata can be re-used between tools?
• Unfortunately… none, at least in the tool combinations tested for this presentation.
Future vision...
Statistics for BI reporting
• Where segments came from
• How much was used, changed, rejected
• Which user changes a lot or just accepts TUs as-is
• Rollback
• What percentage of TM is used over time
• Segment usage counter
• Change history
• QA messaging
• User input / feedback commentary into bug tracking
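Some of these statistics can already be approximated from metadata that survives in a TMX export. The sketch below is a hedged illustration: it counts segment origins, creators and usage from a small hypothetical TMX that borrows the `x-Origin` property and `usagecount` attribute seen in the SDL Trados excerpt. The header values, user names and counts are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical TMX; property and attribute names follow the SDL Trados excerpt.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0" segtype="sentence"
          o-tmf="Example" adminlang="en-US" srclang="de-DE" datatype="xml"/>
  <body>
    <tu creationid="ALIGN!" usagecount="2">
      <prop type="x-Origin">Alignment</prop>
      <tuv xml:lang="de-DE"><seg>Dies ist ein neuer Satz.</seg></tuv>
      <tuv xml:lang="en-US"><seg>This is a new sentence.</seg></tuv>
    </tu>
    <tu creationid="AZ" usagecount="5">
      <prop type="x-Origin">TM</prop>
      <tuv xml:lang="de-DE"><seg>Dies ist ein kurzer Satz.</seg></tuv>
      <tuv xml:lang="en-US"><seg>This is a short sentence.</seg></tuv>
    </tu>
    <tu creationid="AZ">
      <prop type="x-Origin">TM</prop>
      <tuv xml:lang="de-DE"><seg>Beispielseite</seg></tuv>
      <tuv xml:lang="en-US"><seg>sample page</seg></tuv>
    </tu>
  </body>
</tmx>"""


def usage_report(tmx_text):
    """Count where units came from, who created them, and how often they were used."""
    root = ET.fromstring(tmx_text)
    origins, creators, total_usage = Counter(), Counter(), 0
    for tu in root.iter("tu"):
        origin = next(
            (p.text for p in tu.findall("prop") if p.get("type") == "x-Origin"),
            "unknown",
        )
        origins[origin] += 1
        creators[tu.get("creationid", "unknown")] += 1
        total_usage += int(tu.get("usagecount", "0"))  # missing counter counts as 0
    return {"origins": dict(origins), "creators": dict(creators), "total_usage": total_usage}


report = usage_report(SAMPLE_TMX)
print(report)
```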