Tutorial on Standoff Markup
as used in: HCRC Map Task Corpus, MATE/NITE Workbench
Amy Isard
HCRC Language Technology Group
University of Edinburgh
Standoff Annotation
• Don’t keep all your data in one big document
• One document for each annotation level (with its own DTD)
• Links between documents
LTG link syntax (1)
• an element can point to one or more contiguous elements in the same or a different document
• each element is identified by a unique ID
• a link is shown as an attribute on an element
• default attributes in the DTD tell a program that this is a link
LTG link syntax (2)
• attributes describe a link whose target will be embedded in the original element in the output document
    href      CDATA  #IMPLIED
    xml:link  CDATA  #FIXED "simple"
    show      CDATA  #FIXED "embed"
    actuate   CDATA  #FIXED "auto"
Standoff Example (1): Words XML
<!DOCTYPE words SYSTEM "words.dtd">
<words>
  <word id="w1">turn</word>
  <word id="w2">right</word>
  <word id="w3">for</word>
  <word id="w4">three</word>
  <word id="w5">centimetres</word>
  <word id="w6">okay</word>
</words>
Standoff Example (2): Moves XML
<!DOCTYPE moves SYSTEM "moves.dtd">
<moves>
  <move type="instruct" speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)"/>
  <move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)"/>
  …
</moves>
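The href values above follow two shapes: a single target, id(w6), and a contiguous range, id(w1)..id(w5). As a minimal sketch of how a program might read them, the Python below splits such a link into its document and its first and last IDs. The names StandoffLink and parse_href are hypothetical and are not part of the LTXML tools; the pattern covers only the two shapes shown on these slides.

```python
import re
from typing import NamedTuple


class StandoffLink(NamedTuple):
    """Hypothetical container for one parsed link (not an LTXML type)."""
    document: str   # target file, e.g. "words.xml"
    start_id: str   # ID of the first element in the range
    end_id: str     # ID of the last element (equals start_id for a single target)


# Covers only the two shapes shown on the slides:
#   words.xml#id(w6)            a single element
#   words.xml#id(w1)..id(w5)    a contiguous range of elements
_HREF = re.compile(
    r"^(?P<doc>[^#]*)#id\((?P<start>[^)]+)\)(?:\.\.id\((?P<end>[^)]+)\))?$"
)


def parse_href(href: str) -> StandoffLink:
    """Split a link into its target document and its first and last IDs."""
    match = _HREF.match(href)
    if match is None:
        raise ValueError(f"unrecognised link: {href!r}")
    start = match.group("start")
    # A single-element link is treated as a range of length one.
    return StandoffLink(match.group("doc"), start, match.group("end") or start)


single = parse_href("words.xml#id(w6)")
ranged = parse_href("words.xml#id(w1)..id(w5)")
```

Treating a single target as a range of length one means later code only has to handle one case.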
Standoff Example (3): Moves and Words XML
<!DOCTYPE words SYSTEM "words.dtd">
<words>
  <word id="w1">turn</word>
  <word id="w2">right</word>
  <word id="w3">for</word>
  <word id="w4">three</word>
  <word id="w5">centimetres</word>
  <word id="w6">okay</word>
</words>

<!DOCTYPE moves SYSTEM "moves.dtd">
<moves>
  <move type="instruct" speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)"/>
  <move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)"/>
  …
</moves>
Advantages of Standoff Annotation
• It is possible to have levels of annotation which have crossing branches (not normally possible in XML)
• New levels of annotation can be added without disturbing existing ones
• Editing one level of annotation has minimal knock-on effects on others
• People can work on different levels at the same time without worrying about creating different versions
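To make the first advantage concrete, here is a small sketch of "crossing branches": two annotations whose word spans overlap without one containing the other, so they cannot both be parents of the same words in a single XML tree. The span for move m1 comes from the example above; the disfluency span d1 is invented purely for illustration, and the helper names are hypothetical.

```python
# Word IDs in document order, as in the words.xml example.
words = ["w1", "w2", "w3", "w4", "w5", "w6"]
position = {word_id: index for index, word_id in enumerate(words)}


def span(start_id, end_id):
    """Turn a pair of word IDs into a (first, last) pair of positions."""
    return (position[start_id], position[end_id])


def crosses(a, b):
    """True if the two spans overlap but neither contains the other."""
    overlap = a[0] <= b[1] and b[0] <= a[1]
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return overlap and not a_in_b and not b_in_a


m1 = span("w1", "w5")  # the instruct move from the slides
m2 = span("w6", "w6")  # the align move from the slides
d1 = span("w4", "w6")  # hypothetical disfluency, invented for this sketch

m1_crosses_d1 = crosses(m1, d1)  # d1 starts inside m1 and ends outside it
m2_crosses_d1 = crosses(m2, d1)  # m2 lies wholly inside d1, so no crossing
```

With standoff markup, m1 and d1 live in separate documents and each simply points at its own range of words, so the crossing never has to be expressed in one tree.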
Example Map Task Annotation Structure
[Figure: layered annotation of a dialogue fragment between two speakers (S1, S2). Layers shown: Dialogue Games (Game instruct), Dialogue Moves (instruct, ack, align), Disfluencies (a disfluency with reparandum and repair), and Words ("turn right for", "three centimetres okay", "three or four centimetres okay", "right right").]
HCRC Map Task XML Corpus Architecture
[Figure: annotation layers in the corpus architecture: Gaze, Timed Units, Tokens, Tagged Words, Automatic Syntax, Moves, Games, Transactions, Disfluencies, Landmark References, Other Speaker’s Words.]
Tools and Software
• LTXML tools: www.ltg.ed.ac.uk/software
• MATE workbench (NITE): mate.nis.sdu.dk (nite.nis.sdu.dk)
• Map Task XML: www.hcrc.ed.ac.uk/maptask
knit
• Part of the LTXML toolkit
• Allows you to “expand” links according to how they have been defined in the DTD (e.g. replace or embed)
• Command line program, can be used in pipelines
Standoff Example (4): Moves XML with embed links
<!DOCTYPE moves SYSTEM "moves.dtd">
<moves>
  <move type="instruct" speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)">
    <word id="w1">turn</word>
    <word id="w2">right</word>
    <word id="w3">for</word>
    <word id="w4">three</word>
    <word id="w5">centimetres</word>
  </move>
  <move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)">
    <word id="w6">okay</word>
  </move>
  …
</moves>
Standoff Example (4): Moves XML with replace links
<!DOCTYPE moves SYSTEM "moves.dtd">
<moves>
  <word id="w1">turn</word>
  <word id="w2">right</word>
  <word id="w3">for</word>
  <word id="w4">three</word>
  <word id="w5">centimetres</word>
  <word id="w6">okay</word>
  …
</moves>
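The two outputs above can be imitated in a few lines of Python. This is only a sketch of the idea and not how knit itself is implemented: the functions resolve and expand are hypothetical, the mode is passed as an argument rather than read from the DTD, both documents are held as in-memory strings, and link targets are assumed to be direct children of the target root.

```python
import copy
import re
import xml.etree.ElementTree as ET

WORDS_XML = """<words>
  <word id="w1">turn</word>
  <word id="w2">right</word>
  <word id="w3">for</word>
  <word id="w4">three</word>
  <word id="w5">centimetres</word>
  <word id="w6">okay</word>
</words>"""

MOVES_XML = """<moves>
  <move type="instruct" speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)"/>
  <move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)"/>
</moves>"""

# Matches "#id(w6)" and "#id(w1)..id(w5)"; the file name is ignored here
# because this sketch has only one target document.
HREF = re.compile(r"#id\(([^)]+)\)(?:\.\.id\(([^)]+)\))?$")


def resolve(href, target_root):
    """Return the contiguous run of target_root's children named by href."""
    match = HREF.search(href)
    if match is None:
        raise ValueError(f"unrecognised link: {href!r}")
    start_id, end_id = match.group(1), match.group(2) or match.group(1)
    children = list(target_root)
    ids = [child.get("id") for child in children]
    return children[ids.index(start_id):ids.index(end_id) + 1]


def expand(moves_root, words_root, mode):
    """Build a new tree with every link expanded in 'embed' or 'replace' mode."""
    out = ET.Element(moves_root.tag, dict(moves_root.attrib))
    for move in moves_root:
        targets = [copy.deepcopy(t) for t in resolve(move.get("href"), words_root)]
        if mode == "embed":
            # Keep the linking element and place the targets inside it.
            kept = ET.SubElement(out, move.tag, dict(move.attrib))
            kept.extend(targets)
        elif mode == "replace":
            # Drop the linking element and put the targets in its place.
            out.extend(targets)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return out


words_root = ET.fromstring(WORDS_XML)
moves_root = ET.fromstring(MOVES_XML)
embedded = expand(moves_root, words_root, "embed")
replaced = expand(moves_root, words_root, "replace")
```

The targets are deep-copied so that the source words tree is left unchanged and can be reused for other views.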
Working with knit
• Use knit on one XML document to work with one hierarchical view of the data
• To work across hierarchies, knit several views and navigate using the structures plus the unique ids of elements
Stylesheets
• stylesheet: template rules
  – pattern which specifies which tree it applies to
  – pattern which specifies which tree it should output
• stylesheet processor
  – reads XML document and stylesheet
  – carries out the instructions in the stylesheet
  – outputs a new XML document or
Template Matching
• XPath is a language for addressing parts of an XML document, and is used by XSLT in the match attribute of a template, e.g. <template match="sentence"> matches any sentence element.
• A stylesheet processor goes through the XML document matching elements to templates and carries out the instructions in the template.
Standard Stylesheet Example
<template match="dial">
  <table> <apply-templates/> </table>
</template>

<template match="move">
  <tr> <apply-templates/> </tr>
</template>

<template match="word">
  <td> <apply-templates/> </td>
</template>
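The effect of these three templates can be imitated in Python to show what "matching elements to templates" amounts to. This is a deliberately simplified sketch and not an XSLT processor: the TEMPLATES table and the transform function are hypothetical, every element is assumed to have a template, and text between child elements is ignored. The input document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# One entry per template rule: the matched element name and the element
# that the template writes out around the result of <apply-templates/>.
TEMPLATES = {"dial": "table", "move": "tr", "word": "td"}

# Hypothetical input, written without whitespace between elements.
SOURCE = (
    "<dial>"
    "<move><word>turn</word><word>right</word></move>"
    "<move><word>okay</word></move>"
    "</dial>"
)


def transform(node):
    """Apply the template for node, then recurse into its children."""
    out = ET.Element(TEMPLATES[node.tag])  # KeyError if no template matches
    out.text = node.text                   # leading text is copied through
    for child in node:                     # the <apply-templates/> step
        out.append(transform(child))
    return out


html = ET.tostring(transform(ET.fromstring(SOURCE)), encoding="unicode")
print(html)
# prints <table><tr><td>turn</td><td>right</td></tr><tr><td>okay</td></tr></table>
```

Each move becomes a table row and each word a table cell, which is the layout the stylesheet on this slide describes.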
The MATE Workbench
• For display, querying, and especially annotation of XML corpora
• Flexible user-defined user interfaces
• Uses stylesheets to create Java display objects which have defined user interface behaviours
• In the MATE internal data representation, elements with link pointers are viewed as parents of the elements they point to
MATE query language
• Easy to write queries over more than one hierarchy
• In MATE query language you define variables by element type and then relationships between them
• ($a ^ $b) means that element $a is a parent of element $b, either in the same document, or via a link.
MATE example query
• Find all words which are in a move whose label is “instruct” and which are part of a disfluency
    ($w word)($m move)($d disfluency);
    ($m ^ $w) and ($m label ~ instruct) and ($d ^ $w)
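To show what this query asks for, the sketch below evaluates the same three conditions over a tiny in-memory data set. It does not use the MATE workbench or its query engine. The moves follow the earlier example, the disfluency d1 is invented for illustration, the function name run_query is hypothetical, and the ~ operator is treated as plain string equality, which is an assumption.

```python
from itertools import product

WORDS = ["w1", "w2", "w3", "w4", "w5", "w6"]

# Each annotation lists the words it is a parent of (directly or via a link).
MOVES = {
    "m1": {"label": "instruct", "children": ["w1", "w2", "w3", "w4", "w5"]},
    "m2": {"label": "align", "children": ["w6"]},
}
DISFLUENCIES = {
    "d1": {"children": ["w4", "w5"]},  # hypothetical, invented for this sketch
}


def run_query():
    """Return every (word, move, disfluency) binding that satisfies the query."""
    results = []
    # ($w word)($m move)($d disfluency): try every combination of bindings.
    for w, (m, move), (d, disf) in product(
        WORDS, MOVES.items(), DISFLUENCIES.items()
    ):
        if (
            w in move["children"]            # ($m ^ $w)
            and move["label"] == "instruct"  # ($m label ~ instruct), as equality
            and w in disf["children"]        # ($d ^ $w)
        ):
            results.append((w, m, d))
    return results


matches = run_query()
print(matches)
# prints [('w4', 'm1', 'd1'), ('w5', 'm1', 'd1')]
```

The word is reached from two different hierarchies at once, moves and disfluencies, which is the cross-hierarchy case the query language is designed for.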