ANNOTATING EVENT ANAPHORA:
A CASE STUDY
Tommaso Caselli and Irina Prodanof
ILC-CNR, Pisa
LREC-10 – May, 19th, La Valletta, Malta
Outline
- Motivations
- Coreference annotation in TimeML
- Annotating event anaphora: a preliminary scheme
- Annotation methodology and results
- Lessons learned and future work
Motivations
Eventualities represent the building blocks of the informative content of a document
Eventualities give rise to relations which create a rich informative network:
- temporal relations
- sharing of participants
- factivity
- coreferential relations
Coreferential relations among eventualities play an important role in facilitating access to content and extracting relevant information
Coref. in TimeML
TimeML & ISO-TimeML are standards for the annotation of events, temporal expressions and a set of relations between these entities (temporal, subordinating and aspectual relations)
Main contribution of TimeML: standard definition of event and methodology for its annotation
It-TimeML: Italian adaptation of TimeML (updated version on request) and part of ISO-TimeML
It-TimeML is currently used for the creation of the Italian TimeBank (172 news articles from ISST, PAROLE and Web, 67,140 tokens)
TimeML tags involved: EVENT and TLINK (temporal link)
TimeML does not have a specific link for coreference
Annotation workaround: use of a special value of the TLINK tag: “identity”
“identity” is used to:
- connect two tokens which are part of a single event instance (e.g. light verbs)
- mark coreferential relations between events, namely set-subset relations
Coref. in TimeML (2)
fare la spesa [to do shopping].
<EVENT id="e1">fare</EVENT> la <EVENT id="e2">spesa</EVENT>
<TLINK lid="l1" eventInstanceID="e1" relatedToEventInstance="e2" relType="IDENTITY"/>
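A minimal sketch of how the “identity” workaround could be read back from such an annotation. It assumes the fragment is wrapped in a hypothetical `<s>` root element so that it parses as one XML document; the function name `identity_pairs` is illustrative and not part of TimeML or of any TimeML tool.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the slide's fragment, wrapped in a hypothetical <s>
# root element (an assumption) so that it is a single well-formed document.
FRAGMENT = (
    '<s>'
    '<EVENT id="e1">fare</EVENT> la <EVENT id="e2">spesa</EVENT>'
    '<TLINK lid="l1" eventInstanceID="e1" '
    'relatedToEventInstance="e2" relType="IDENTITY"/>'
    '</s>'
)


def identity_pairs(xml_text):
    """Return (source_text, target_text) for every IDENTITY TLINK."""
    root = ET.fromstring(xml_text)
    # Map each event id to the token it annotates.
    events = {e.get("id"): e.text for e in root.iter("EVENT")}
    pairs = []
    for link in root.iter("TLINK"):
        if link.get("relType") == "IDENTITY":
            source = events[link.get("eventInstanceID")]
            target = events[link.get("relatedToEventInstance")]
            pairs.append((source, target))
    return pairs


print(identity_pairs(FRAGMENT))  # [('fare', 'spesa')]
```

Note that the pair alone does not say why the two tokens are linked: a light-verb construction and a set-subset relation would come back in exactly the same shape.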
Coref. in TimeML (3) – Use of “identity”
La sessione privata servira’ a tre [adempimenti]j. Innanzitutto, all’[approvazione]j della proposta di Abete (ISST sole006).
The private session will be used for three [fulfillments]j. First, the [approval]j of the proposal of Abete.
La <EVENT id="e1">sessione</EVENT> privata <EVENT id="e2">servira’</EVENT> a tre <EVENT id="e3">adempimenti</EVENT>. <SIGNAL id="s1">Innanzitutto</SIGNAL>, all’<EVENT id="e4">approvazione</EVENT> della <EVENT id="e5">proposta</EVENT> di Abete.
<TLINK lid="l1" eventInstanceID="e4" relatedToEventInstance="e3" relType="IDENTITY"/>
Coref. in TimeML (4)
The use of the value “identity” is not satisfactory since it is NOT homogeneous
During the (current!) annotation effort for the creation of the Italian TimeBank we have observed that this value could also be applied to other cases, such as:
- synonyms
- hypernyms
- coreference (strict coreference: same referent in the world)
Event Anaphora
Previous work: Hasler et al. 2006; Bejan & Harabagiu 2008
Hasler et al. 2006:
- only NP coreference (strict definition)
- detailed guidelines, but NO specifications for the annotation
- which events? ACE event frames (LIFE, CONFLICT, MOVEMENT, JUSTICE….)
- TimeML compliant
Bejan & Harabagiu 2008:
- event coreference as a side effect of event structure
- event coreference is considered when two predicates express the same predicate, synonyms or hypernyms, and share the same arguments
- TimeML compliant
Event Anaphora - Methodology (2)
Our approach:
- neither event frames nor event templates; all instances of events annotated in the Italian TimeBank (TimeML compliant)
- open-domain text/discourse
- coarse-grained, bottom-up approach in the definition of the annotation scheme
- reduced and limited set of guidelines
- active discovery of what is needed through annotation and observations from the data
- event anaphora: strict coreference + indirect coreference
Event Anaphora - Annotation scheme (3)
TAGS        ATTRIBUTES
MARKABLE    ID, POS, DEFINITENESS, CLASS
EMPTY       ID
TOPIC       ID
LINK        ID, ANAPHORTYPE, SRC
<MARKABLE> = <EVENT>, BUT extended: it also includes the annotation of pronouns and adverbs.
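The tag inventory above can be written down as a small lookup table, which makes it easy to check an annotated element against the scheme. This is only a sketch: the slides do not say whether every attribute is obligatory, so treating all of them as required is an assumption, and `missing_attributes` is a hypothetical helper, not a PALinkA function.

```python
# Tag inventory copied from the table above.
# Assumption: every listed attribute is treated as required.
SCHEME = {
    "MARKABLE": ("ID", "POS", "DEFINITENESS", "CLASS"),
    "EMPTY": ("ID",),
    "TOPIC": ("ID",),
    "LINK": ("ID", "ANAPHORTYPE", "SRC"),
}


def missing_attributes(tag, attrs):
    """List the attributes of `tag` that are absent from `attrs`."""
    if tag not in SCHEME:
        raise ValueError(f"unknown tag: {tag}")
    return [name for name in SCHEME[tag] if name not in attrs]


# Hypothetical attribute values, for illustration only.
print(missing_attributes("LINK", {"ID": "l1", "SRC": "t1"}))  # ['ANAPHORTYPE']
print(missing_attributes("TOPIC", {"ID": "t1"}))              # []
```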
Event Anaphora - Annotation scheme (4)
<EMPTY> = to annotate cases of zero anaphora and ellipsis (frequent in Italian)
<TOPIC> = to annotate entire portions of text; it provides anchor to those linguistic entities which can refer to discourse topic
“[Stiamo ancora parlando, come certamente deve essere, e continueremo a consultarci]”j. James Baker, segretario al Tesoro americano, ha commentato [cosi’]j i risultati dell’assemblea. (ISST els019)
“[We are still speaking, as it should be, and we will keep consulting]”j. James Baker, the American Treasury Secretary, commented [thus]j on the results of the assembly.
<LINK> = marks up an anaphoric relation. The attribute “anaphorType” makes explicit which type of anaphoric relation holds; “src” marks the anchor.
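A rough sketch of how <TOPIC>, <MARKABLE> and <LINK> might combine for the James Baker example, and of how “src” leads back to the anchor. Several things here are assumptions rather than part of the scheme as presented: the `<text>` wrapper element, nesting <LINK> inside the anaphor, the ID values, and the placeholder value of ANAPHORTYPE. The remaining MARKABLE attributes are left out for brevity.

```python
import xml.etree.ElementTree as ET

root = ET.Element("text")  # hypothetical wrapper element

# The quoted passage is annotated as a TOPIC and serves as the anchor.
topic = ET.SubElement(root, "TOPIC", ID="t1")
topic.text = ("Stiamo ancora parlando, come certamente deve essere, "
              "e continueremo a consultarci")

# The adverb is the anaphor; POS, DEFINITENESS and CLASS are omitted here.
anaphor = ET.SubElement(root, "MARKABLE", ID="m1")
anaphor.text = "cosi'"

# Assumption: the LINK is nested inside the anaphor it belongs to.
# "placeholder" stands in for a real ANAPHORTYPE value.
ET.SubElement(anaphor, "LINK", ID="l1", ANAPHORTYPE="placeholder", SRC="t1")


def resolve(tree_root):
    """Return (anaphor_text, anchor_text) for every LINK in the tree."""
    by_id = {el.get("ID"): el for el in tree_root.iter() if el.get("ID")}
    pairs = []
    for holder in tree_root.iter():
        for link in holder.findall("LINK"):
            anchor = by_id[link.get("SRC")]
            pairs.append((holder.text, anchor.text))
    return pairs


for anaphor_text, anchor_text in resolve(root):
    print(anaphor_text, "->", anchor_text)
```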
Event Anaphora – Results (5)
Annotation tool: PALinkA (Orasan, 2003)
3 annotators / 1,792 tokens
no K scores
- Low agreement on the identification of anaphora but relatively good agreement on the anchors
- More specific guidelines and information are needed
- Event anaphora is a widespread phenomenon
Lessons Learned and Future Work
- Event anaphora is a widespread phenomenon which must be addressed in separate tasks
- Relations between full events (N, V, PP and Adj); no pronominal anaphora
- New annotation scheme:
- 2 tags: <EVENT> and <AnafLink>
- different attributes for <EVENT>: FACTIVITY, GENERICITY, POLARITY
- relations between particular events according to the attributes' values
- reduced set of anaphor types (two values: direct vs. indirect)
- Tracking of the participants: how to?
- Event anaphora annotation as a further link in TimeML or as a separate task which can be built upon the TimeML annotation
New Tool: BAT (thanks to Marc Verhagen)