Propbank Frameset Annotation Guidelines Using a Dedicated Editor, Cornerstone

1
Propbank Frameset Annotation Guidelines Using a Dedicated Editor, Cornerstone Jinho Choi, Claire Bonial, Martha Palmer Institute of Cognitive Science, University of Colorado at Boulder • A corpus in which the arguments of each verb predicate are annotated with their semantic roles. • Each predicate is also annotated with its sense ID supplied through frameset files. Propbank • Frameset files outline the argument structure for each sense of every predicate in the Propbank. • Annotators use the semantic and syntactic information provided in frameset files to efficiently make consistent annotations. Frameset Files Cornerstone Advantages and Features Free of errors: Frameset files created by Cornerstone are guaranteed to be free of errors. Easy customization: allows users to easily customize tags required for frameset annotations. Free of XML: frameset authors do not need to know any XML. Multilingual: accommodates Arabic, Chinese, English, Hindi and Korean. Platform independent: runs on any platform with JVM (Java 6.0). Run on X11: annotators can make updates remotely. How to obtain Cornerstone • Available as an open source project on Google code ( http://code.google.com/p/propbank ). • The project also provides a Propbank instance editor, Jubilee. • Both Cornerstone and Jubilee have been used in several universities. Contact: [email protected] More about Cornerstone • We gratefully acknowledge the support of the National Science Foundation Grants CISE- CRI-0551615, Towards a Comprehensive Linguistic Annotation and CISE- CRI 0709167, Collaborative: A Multi-Representational and Multi- Layered Treebank for Hindi/Urdu, and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views Acknowledgements E.g. John opened the door with his foot. Sense ID open.01 (‘open’) ARG 0 John (agent: ‘opener’) ARG 1 the door (theme: ‘thing opening’) ARG 2 with his foot (instrument) Propbank annotations are based on the predicate argument structure outlined in frameset files It is important to keep the frameset files consistent, simple to read, and easy to update. opened: open.01 John the door with his foot ARG 0 (agent) ARG 1 (theme) ARG 2 (instrument) Including verb-particle constructions One per each verb All verbs in Propbank Multi-lemma mode • A predicate can have multiple lemmas (e.g., open, open up). • Languages: English, Hindi Uni-lemma mode • A predicate can have only one lemma. • Languages: Arabic, Chinese Frameset files in XML • All frameset files are written in XML that is complicated for those who are not familiar with it, leading to potential errors. <frameset> <predicate lemma="open"> <roleset id="open.01" name="open" vncls="40.3.2 45.4 47.6"> <roles> <role descr="opener" n="0"> <vnrole vncls="47.6" vntheta="Agent"/> <vnrole vncls="40.3.2" vntheta="Agent"/> <vnrole vncls="45.4" vntheta="Agent"/> </role> <role descr="thing opening" n="1"> <vnrole vncls="47.6" vntheta="Theme"/> <vnrole vncls="40.3.2" vntheta="Patient"/> <vnrole vncls="45.4" vntheta="Patient"/> </role> Lemma( s) Senses Multi-lemma: Roleset Uni-lemma: Frameset Argument structure for the selected sense Examples for the selected sense Mappings between syntactic and semantic arguments

description

This paper gives guidelines of how to create and update Propbank frameset files using a dedicated editor, Cornerstone. Propbank is a corpus in which the arguments of each verb predicate are annotated with their semantic roles in relation to the predicate. Propbank annotation also requires the choice of a sense ID for each predicate. Thus, for each predicate in Propbank, there exists a corresponding frameset file showing the expected predicate argument structure of each sense related to the predicate. Since most Propbank annotations are based on the predicate argument structure defined in the frameset files, it is important to keep the files consistent, simple to read as well as easy to update. The frameset files are written in XML, which can be difficult to edit when using a simple text editor. Therefore, it is helpful to develop a user-friendly editor such as Cornerstone, specifically customized to create and edit frameset files. Cornerstone runs platform independently, is light enough to run as an X11 application and supports multiple languages such as Arabic, Chinese, English, Hindi and Korean.

Transcript of Propbank Frameset Annotation Guidelines Using a Dedicated Editor, Cornerstone

Page 1: Propbank Frameset Annotation Guidelines Using a Dedicated Editor, Cornerstone

Propbank Frameset Annotation GuidelinesUsing a Dedicated Editor, Cornerstone

Jinho Choi, Claire Bonial, Martha PalmerInstitute of Cognitive Science, University of Colorado at Boulder

• A corpus in which the arguments of each verb predicate are annotated with their semantic roles.• Each predicate is also annotated with its sense ID supplied through frameset files.

Propbank

• Frameset files outline the argument structure for each sense of every predicate in the Propbank.• Annotators use the semantic and syntactic information provided in frameset files to efficiently make consistent annotations.

Frameset Files

Cornerstone

Advantages and Features

• Free of errors: Frameset files created by Cornerstone are guaranteed to be free of errors.

• Easy customization: allows users to easily customize tags required for frameset annotations.

• Free of XML: frameset authors do not need to know any XML.

• Multilingual: accommodates Arabic,

Chinese, English, Hindi and Korean.

• Platform independent: runs on any platform with JVM (Java 6.0).

• Run on X11: annotators can

make updates remotely.

How to obtain Cornerstone

• Available as an open source project on Google code (http://code.google.com/p/propbank).• The project also provides a Propbank instance editor, Jubilee.• Both Cornerstone and Jubilee have been used in several universities.• Contact: [email protected]

More about Cornerstone

• We gratefully acknowledge the support of the National Science Foundation Grants CISE- CRI-0551615, Towards a Comprehensive Linguistic Annotation and CISE- CRI 0709167, Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu, and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE

program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Acknowledgements

E.g. John opened the door with his foot.

Sense ID open.01 (‘open’)ARG0 John (agent: ‘opener’)

ARG1 the door (theme: ‘thing opening’)

ARG2 with his foot (instrument)

Propbank annotations are based on the predicate argument structure outlined in frameset files

It is important to keep the frameset files consistent, simple to read, and easy to update.

opened: open.01

John the door with his footARG0 (agent) ARG1 (theme) ARG2 (instrument)

Includingverb-particleconstructions

One pereach verb

All verbsin Propbank

Multi-lemma mode

• A predicate can have multiple lemmas (e.g., open, open up).• Languages: English, Hindi

Uni-lemma mode

• A predicate can have only one lemma.• Languages: Arabic, Chinese

Frameset files in XML

• All frameset files are written in XML that is complicated for those who are not familiar

with it, leading to potential errors.

<frameset><predicate lemma="open"><roleset id="open.01" name="open" vncls="40.3.2 45.4 47.6"><roles><role descr="opener" n="0"> <vnrole vncls="47.6" vntheta="Agent"/> <vnrole vncls="40.3.2" vntheta="Agent"/> <vnrole vncls="45.4" vntheta="Agent"/></role><role descr="thing opening" n="1"> <vnrole vncls="47.6" vntheta="Theme"/> <vnrole vncls="40.3.2" vntheta="Patient"/> <vnrole vncls="45.4" vntheta="Patient"/></role>…

Lemma(s)

SensesMulti-lemma: RolesetUni-lemma: Frameset

Argument structurefor the selected sense

Examplesfor the selected sense

Mappings between

syntactic and semantic arguments