Inline Markup in XLIFF 2.0
![Page 1: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/1.jpg)
Fredrik Estreen - Lionbridge
Yves Savourel - ENLASO
Inline Markup in XLIFF 2.0
![Page 2: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/2.jpg)
While we believe the information presented here is fairly stable, it only reflects the general consensus of the sub-committee working on the inline markup.
Things may change during the formal approval by the sub-committee, and later during review and approval by the main XLIFF TC.
Disclaimer
![Page 3: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/3.jpg)
• Principles and Background
• Inline Markup
  o Characters that are invalid in XML
  o Native Codes
  o Annotations
• Extensions
• Processing requirements
• XLIFF Toolkit
Agenda
![Page 4: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/4.jpg)
Some of the guidelines we are trying to follow during the work:
• Try to have only one way to do one thing
• Provide processing requirements
• Try to re-use existing standards when possible
• Try to keep things simple
Some Principles
![Page 5: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/5.jpg)
The structural part of XLIFF changes in 2.0 and the inline markup should be easy to handle in the new model.
• Static structure
  o <file> -> <group>* -> <unit>
  o Contents of the concatenated <source> elements remain static during processing
• Dynamic structure inside <unit>
  o <segment>, <ignorable> -> <source>, <target>
  o A processor may merge or split the contents of segments or ignorables.
Containing Structure
![Page 6: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/6.jpg)
The inline markup is what's inside the <source> and <target> elements
• Characters that are invalid in XML
• Original inline codes
• Annotations
What's the Inline Markup?
![Page 7: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/7.jpg)
• Inline codes belong to the <unit> and not to the <segment>(s)
• ID uniqueness within the <unit>
• Allows simple re-segmentation of the content of <unit>
• No need to clone codes that span multiple segments
Inline codes and segmentation
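Because IDs are scoped to the <unit>, a tool can check uniqueness across all segments at once. The following is a minimal sketch of such a check, not part of the toolkit: the class and method names are hypothetical, only <ph>, <pc> and <sc> are inspected, and it assumes the <source> side is checked on its own because matching codes in <target> reuse the same IDs.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reports inline-code IDs used more than once
// in the <source> elements of a single <unit>.
final class UnitIdCheck {
    static Set<String> duplicateSourceIds(String unitXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(unitXml)));
            Set<String> seen = new HashSet<>();
            Set<String> duplicates = new TreeSet<>();
            NodeList sources = doc.getElementsByTagName("source");
            for (int i = 0; i < sources.getLength(); i++) {
                Element source = (Element) sources.item(i);
                // Assumption: these three elements are the ones that define a code ID.
                for (String tag : new String[] {"ph", "pc", "sc"}) {
                    NodeList codes = source.getElementsByTagName(tag);
                    for (int j = 0; j < codes.getLength(); j++) {
                        String id = ((Element) codes.item(j)).getAttribute("id");
                        if (!seen.add(id)) {
                            duplicates.add(id);
                        }
                    }
                }
            }
            return duplicates;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse unit", e);
        }
    }
}
```

A code opened with <sc id="1"/> in one segment and closed with <ec rid="1"/> in the next passes this check, since the end marker refers to the existing ID rather than defining a new one.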
![Page 8: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/8.jpg)
For example, control characters are not allowed in XML content, so they cannot be stored as-is in XLIFF.
<cp hex="0007"/> represents U+0007 (the "bell" character)
- Same as Unicode LDML format
- Only characters invalid in XML must use this notation.
Characters that are Invalid in XML
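As an illustration of the <cp/> notation, here is a minimal sketch of how a writer might encode text content. The class and method names are hypothetical and not taken from the toolkit; the validity test follows the Char production of XML 1.0, and the four-digit uppercase hex form is an assumption based on the example above.

```java
// Hypothetical helper: replaces characters that XML 1.0 cannot represent
// with <cp hex="..."/> elements and escapes '&' and '<' in the rest.
final class CpEncoder {
    // Char production of XML 1.0.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isValidXmlChar(cp)) {
                // Only characters invalid in XML use the <cp/> notation.
                out.append(String.format("<cp hex=\"%04X\"/>", cp));
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Ring<cp hex="0007"/> the bell
        System.out.println(encode("Ring\u0007 the bell"));
    }
}
```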
![Page 9: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/9.jpg)
• Support any type of native markup
• Standalone: <ph/>
• Spanning: <pc> and <sc/> + <ec/>
Inline Codes
![Page 10: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/10.jpg)
All possible cases:
Standalone code <ph id='1'/>
Well-formed spanning code <pc id='1'>text</pc>
Start marker of spanning code <sc id='1'/>
End marker of spanning code <ec rid='1'/>
Orphan start marker of spanning code <sc id='1' isolated='yes'/>
Orphan end marker of spanning code <ec id='1' isolated='yes'/>
Inline Codes - Use Cases
![Page 11: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/11.jpg)
• No storage:
<source>A<ph id="1"/>B</source>
• Store, but only outside the segment:
<source>A<ph id="1" nid="d1"/>B</source>
<originalData>
<data id="d1">&lt;BR&gt;</data>
</originalData>
Inline Codes - Storage of Original
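To show how the nid reference can be followed, here is a minimal sketch that maps each <ph> to the original code stored in <originalData>. It uses the standard Java DOM API rather than the toolkit; the class and method names are hypothetical, the enclosing <unit> element in the usage is an assumption, and only <ph> is handled.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: resolves ph@nid against data@id.
final class OriginalDataLookup {
    // Returns a map from the id of each <ph> that has a nid
    // to the text of the <data> element it points to.
    static Map<String, String> originalCodes(String unitXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(unitXml)));
            Map<String, String> dataById = new HashMap<>();
            NodeList dataNodes = doc.getElementsByTagName("data");
            for (int i = 0; i < dataNodes.getLength(); i++) {
                Element data = (Element) dataNodes.item(i);
                dataById.put(data.getAttribute("id"), data.getTextContent());
            }
            Map<String, String> result = new LinkedHashMap<>();
            NodeList codes = doc.getElementsByTagName("ph");
            for (int i = 0; i < codes.getLength(); i++) {
                Element ph = (Element) codes.item(i);
                if (ph.hasAttribute("nid")) {
                    result.put(ph.getAttribute("id"), dataById.get(ph.getAttribute("nid")));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse unit", e);
        }
    }
}
```

The original code must be escaped inside <data> (here as &lt;BR&gt;), since no additional elements are allowed there; the parser returns it unescaped as <BR>.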
![Page 12: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/12.jpg)
<mrk> for well-formed constructs
<sm/> + <em/> otherwise
Attributes:
• id (required)
• type (default=generic)
• translate (yes or no, default=yes)
• ref (optional type-specific URI)
• value (optional type-specific text/data)
Annotations
![Page 13: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/13.jpg)
• Translate annotations
• Term annotations
• Comment annotations
• Custom annotations
The IDs link the same annotation in source and target if needed.
Annotations Types
![Page 14: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/14.jpg)
• To protect (or not) a span of content:
<mrk id="1" translate="no">content</mrk>
Note that translate can also be used with other types of annotations.
Translate Annotation
![Page 15: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/15.jpg)
• To denote a "term":
<mrk id="1" type="term" value="simple definition" ref="reference to more info">content</mrk>
The id links source and target if needed
Term Annotation
![Page 16: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/16.jpg)
• Simple:
<source><mrk id="1" type="comment" value="The text of the comment">content</mrk></source>
• With associated note:
<source><mrk id="1" type="comment" ref="#n1">content</mrk></source>
<notes>
<note id="n1">Text of the note</note>
</notes>
Comment Annotation
![Page 17: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/17.jpg)
• User-defined annotation:
- The type attribute = <prefix>:<userType>
- The meanings of the value and ref attributes are defined by the user.
<mrk id="1" type="myPrefix:isbn" value="978-0-14-44919-8">The Epic of Gilgamesh</mrk>
Custom Annotation
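A tool that reads annotations needs to tell predefined types from user-defined ones. The following is a minimal sketch of that split, assuming the prefix:userType convention shown above; the class name and its fields are hypothetical and not part of the toolkit.

```java
// Hypothetical value class: splits an annotation type into prefix and name.
final class AnnotationType {
    final String prefix; // null for predefined types such as "term"
    final String name;

    private AnnotationType(String prefix, String name) {
        this.prefix = prefix;
        this.name = name;
    }

    static AnnotationType parse(String type) {
        int colon = type.indexOf(':');
        if (colon < 0) {
            return new AnnotationType(null, type);
        }
        if (colon == 0 || colon == type.length() - 1) {
            throw new IllegalArgumentException("Expected <prefix>:<userType>, got: " + type);
        }
        return new AnnotationType(type.substring(0, colon), type.substring(colon + 1));
    }

    boolean isCustom() {
        return prefix != null;
    }
}
```

With this sketch, AnnotationType.parse("myPrefix:isbn") yields the prefix "myPrefix" and the name "isbn", while "term" and "comment" are reported as predefined.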
![Page 18: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/18.jpg)
• A few attributes can take user-defined values: e.g. mrk@type, ph@type, pc@type
• No additional attributes are allowed in any of the inline elements
• No additional elements are allowed inside <source>, <target> or <data>
Custom annotations are essentially the only way to extend markup inside the inline content.
Extensions
![Page 19: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/19.jpg)
• Allowed markup transforms and the related attribute mapping between <pc> and an <sc>/<ec> pair
• Define requirements for creation and editing of target text.
• Rules on cloning markup with and without reference to native data
• Stricter rules on attributes and ID references
• How to handle segmentation changes
Processing Requirements
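One such transform can be sketched as follows: rewriting each well-formed <pc> as an <sc/> and <ec/> pair, which is what a processor would need before splitting a segment in the middle of a span. This is only an illustration using the standard Java DOM API; the class name is hypothetical, the mapping of pc@id to sc@id and ec@rid follows the use cases listed earlier, and any other attributes of <pc> are ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: returns the content of the root element
// with every <pc> replaced by an <sc/> ... <ec/> pair.
final class PcToScEc {
    static String convert(String sourceXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(sourceXml)));
            StringBuilder out = new StringBuilder();
            writeChildren(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse content", e);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    private static void writeChildren(Node parent, StringBuilder out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                out.append(escape(n.getNodeValue()));
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) n;
                if ("pc".equals(e.getNodeName())) {
                    String id = escape(e.getAttribute("id"));
                    out.append("<sc id=\"").append(id).append("\"/>");
                    writeChildren(e, out);
                    out.append("<ec rid=\"").append(id).append("\"/>");
                } else {
                    // Copy any other element (for example <ph/>) unchanged.
                    out.append('<').append(e.getNodeName());
                    NamedNodeMap attrs = e.getAttributes();
                    for (int i = 0; i < attrs.getLength(); i++) {
                        Node a = attrs.item(i);
                        out.append(' ').append(a.getNodeName()).append("=\"")
                           .append(escape(a.getNodeValue())).append('"');
                    }
                    if (e.hasChildNodes()) {
                        out.append('>');
                        writeChildren(e, out);
                        out.append("</").append(e.getNodeName()).append('>');
                    } else {
                        out.append("/>");
                    }
                }
            }
        }
    }
}
```

The reverse transform is only possible when both markers end up in the same segment and nest properly, which is one reason the processing requirements have to spell out when each form is allowed.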
![Page 20: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/20.jpg)
• Java-based and open source (LGPL)
• http://code.google.com/p/okapi-xliff-toolkit/
• Stream-based rather than DOM to handle very large documents
• Reader is event-driven
• Unit available as single object
• Writer also available
XLIFF Toolkit - A Library and More
![Page 21: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/21.jpg)
XLIFFReader reader = new XLIFFReader();
reader.open(new File("myInput.xlf"));
while ( reader.hasNext() ) {
    XLIFFEvent event = reader.next();
    if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
        Unit unit = event.getUnit();
        // Do something with the unit
    }
}
reader.close();
Library - Reading a Document
![Page 22: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/22.jpg)
XLIFFReader reader = new XLIFFReader();
XLIFFWriter writer = new XLIFFWriter();
reader.open(new File("myInput.xlf"));
writer.create(new File("myOutput.xlf"));
while ( reader.hasNext() ) {
    XLIFFEvent event = reader.next();
    if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
        Unit unit = event.getUnit();
        // Do something with the unit
    }
    // Every event, modified or not, is passed on to the output
    writer.write(event);
}
reader.close();
writer.close();
Library - Updating a Document
![Page 23: Inline Markup in XLIFF 2.0](https://reader035.fdocuments.us/reader035/viewer/2022070407/5681431a550346895daf7569/html5/thumbnails/23.jpg)
Useful links
• Read the latest Editor's Draft:
  https://wiki.oasis-open.org/xliff/
• Comment or ask questions in the mailing lists:
  https://lists.oasis-open.org/archives/xliff-comment/
  https://lists.oasis-open.org/archives/xliff-users/
• Try out the toolkit:
  http://code.google.com/p/okapi-xliff-toolkit/
Q & A