www.uni-stuttgart.de
[A Recursive Annotation Scheme [for Referential Information Status] ]
Arndt Riester¹, David Lorenz², Nina Seemann¹
¹ Institute for Natural Language Processing (IMS) & SFB 732, University of Stuttgart
² English Department, University of Freiburg
19.5.2010, LREC Malta
Information Status
Describes the cognitive activation of nominal expressions
Distinguishes between GIVEN and NEW items
or between GIVEN, ACCESSIBLE and NEW items (Chafe 1976, 1994)
or between EVOKED, INFERRABLE and NEW items (Prince 1981)
or: e.g. Prince (1992), Nissim et al. (2004), Dipper et al. (2007)
[Graphic: cloud of category labels from these schemes]
BRAND-NEW ANCHORED, BRAND-NEW UNANCHORED, HEARER NEW, DISCOURSE NEW, DISCOURSE OLD, UNUSED, BRIDGING, CONTAINING INFERRABLE, TEXTUALLY EVOKED, SITUATIONALLY EVOKED,
OLD-IDENTITY, OLD-RELATIVE, OLD-GENERIC, OLD-ID-GENERIC, OLD-GENERAL, OLD-EVENT,
MEDIATED-SITUATION, MEDIATED-PART, MEDIATED-GENERAL, MEDIATED-AGGREGATED, MEDIATED-FUNC_VALUES, MEDIATED-POSSESSIVE, MEDIATED-EVENT,
ACCESSIBLE-INFERABLE, ACCESSIBLE-SITUATION, ACCESSIBLE-GENERAL
Desiderata
A simple scheme based on clear theoretical assumptions
Good inter-coder agreement for different textual genres
Full coverage of all nominal expressions
Capable of dealing with recursive embeddings
(1) [the red gem [in [the Queen’s]A crown]B ]C
3 referents
3 nested labels for information status
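The recursive embedding in example (1) can be made concrete with a small bracket walker. This is only an illustrative sketch, not the annotation tooling used in the project: the function name `bracket_spans` and the plain-string input format are assumptions made for this example.

```python
def bracket_spans(text: str) -> list[tuple[int, str]]:
    """Return (depth, content) for every bracketed span, innermost spans first.

    Depth 1 is the outermost expression. Hypothetical helper, not part of
    any published tool.
    """
    open_offsets: list[int] = []          # start offsets of brackets still open
    spans: list[tuple[int, str]] = []
    for pos, char in enumerate(text):
        if char == "[":
            open_offsets.append(pos)
        elif char == "]":
            if not open_offsets:
                raise ValueError(f"unmatched ']' at offset {pos}")
            start = open_offsets.pop()
            # Strip inner brackets so only the words of the span remain.
            content = text[start + 1:pos].replace("[", "").replace("]", "")
            spans.append((len(open_offsets) + 1, content))
    if open_offsets:
        raise ValueError(f"unmatched '[' at offset {open_offsets[-1]}")
    return spans


example = "[the red gem [in [the Queen's] crown]]"
spans = bracket_spans(example)
for depth, content in spans:
    print(depth, content)
```

Each of the three spans found this way would receive its own information status label.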
Two levels of givenness
Givenness of words: repetition, synonymy, hypernymy
(2) {On my way home, I saw a poodle.}
a. It reminded me of Anna’s poodle.
b. It reminded me of Anna’s dog.
Givenness of referents: coreference
(3) {On my way home, I saw a poodle.}
a. The poodle / It tried to bite me.
b. The stupid beast tried to bite me.
Keep the two apart!
In the following: GIVEN ≡ coreferential
But see Baumann & Riester (2010) for a two-level scheme (importance for prosody)
Context Theory
discourse context (e.g. DRT; Kamp & Reyle 1993): what has been explicitly stated before
utterance context (indexicality; e.g. Kaplan 1989): speaker, location, time; entities in visual environment
frame contexts (e.g. Fillmore 1985): plausible protagonists in a scenario
encyclopaedic context (e.g. Kamp, to appear): world knowledge of an expected audience
A Simple Rule for Definite Expressions
Definite descriptions, demonstratives, proper names, pronouns trigger the presupposition that their referent should be identified in “the” context (e.g. Heim, 1983; van der Sandt, 1992).
Claim: Information status classes should directly reflect the four context components.
Definite identified in      Information status class
discourse context           GIVEN
utterance context           SITUATIVE
frame context               BRIDGING
encyclopaedic context       UNUSED
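The rule in the table amounts to a direct lookup from context component to class. A minimal sketch, assuming the annotator has already decided in which context the referent is identified; the names `Context`, `STATUS_BY_CONTEXT` and `status_of_definite` are hypothetical and chosen only for this illustration.

```python
from enum import Enum


class Context(Enum):
    """The four context components named on the Context Theory slide."""
    DISCOURSE = "discourse context"
    UTTERANCE = "utterance context"
    FRAME = "frame context"
    ENCYCLOPAEDIC = "encyclopaedic context"


# One information status class per context component, as in the table above.
STATUS_BY_CONTEXT = {
    Context.DISCOURSE: "GIVEN",
    Context.UTTERANCE: "SITUATIVE",
    Context.FRAME: "BRIDGING",
    Context.ENCYCLOPAEDIC: "UNUSED",
}


def status_of_definite(identified_in: Context) -> str:
    """Return the class for a definite expression identified in the given context."""
    return STATUS_BY_CONTEXT[identified_in]


print(status_of_definite(Context.FRAME))
```

The lookup says nothing about how the context itself is determined; that remains the annotator's judgment.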
Annotating Hearer Knowledge (UNUSED)
Prince (1981): choice of referring expression reflects the speaker’s/writer’s assumptions concerning the hearer’s knowledge (assumed familiarity)
No access to the speaker’s mind
Simplification: as an annotator, decide upon your own expectations whether a (non-GIVEN) item is known to an intended audience
Will they know this?
YES: UNUSED-KNOWN, e.g. “Barack Obama” (encyclopaedic knowledge)
NO: UNUSED-UNKNOWN, e.g. “the woman Max went out with last night” (accommodation)
ww
w.u
ni-
stu
ttg
art.
de
16
News Example (USA Today, 17.5.10)
[...] [Protestants]INDEF-RESUMPTIVE still account [for about 55% [of
the 111th Congress]UNUSED-UNKNOWN]INDEF-PARTITIVE-CONTAINED, but
[a recent flurry of Catholic and Jewish appointments]INDEF-NEW
has turned [them]GIVEN-PRONOUN [into a minority of one [on the
Supreme Court]BRIDGING]INDEF-NEW(PREDICATE). Should
[Kagan]GIVEN-SHORT be confirmed [next week]SITUATIVE, [[the
nation’s]GIVEN-EPITHET highest court]GIVEN-EPITHET would be [a
Protestant-free zone]INDEF-GENERIC [for the first time since [John
Jay, [the nation’s]GIVEN-REPEATED first chief justice (and an
Episcopalian)]UNUSED-UNKNOWN]UNUSED-UNKNOWN, banged
[[his]GIVEN-PRONOUN gavel]UNUSED-UNKNOWN [in 1790]UNUSED-KNOWN.
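The labelled bracketing used in the news example can be read back into (expression, label) pairs with a short parser. This is a sketch under stated assumptions: the label pattern `LABEL`, the helper names `labelled_spans` and `core_class`, and the rule that a subclass collapses to its core class by taking the part before the first hyphen or parenthesis are all assumptions for illustration, not something the slides specify. The five class names are taken from the core scheme named later in the talk; anything else is mapped to OTHER.

```python
import re

# Assumed label shape: upper-case parts joined by hyphens, optional "(...)" suffix.
LABEL = re.compile(r"[A-Z]+(?:-[A-Z]+)*(?:\([A-Z]+\))?")
CORE_CLASSES = {"GIVEN", "SITUATIVE", "BRIDGING", "UNUSED", "INDEF"}


def labelled_spans(text: str) -> list[tuple[str, str]]:
    """Return (expression, label) pairs, inner expressions before outer ones."""
    open_spans: list[list[str]] = []   # text pieces of each bracket still open
    found: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        char = text[pos]
        if char == "[":
            open_spans.append([])
            pos += 1
        elif char == "]":
            if not open_spans:
                raise ValueError(f"unmatched ']' at offset {pos}")
            content = "".join(open_spans.pop())
            match = LABEL.match(text, pos + 1)   # label directly after the bracket
            found.append((content, match.group() if match else ""))
            if open_spans:
                open_spans[-1].append(content)   # inner words also belong to the outer span
            pos = match.end() if match else pos + 1
        else:
            if open_spans:
                open_spans[-1].append(char)
            pos += 1
    if open_spans:
        raise ValueError("unmatched '['")
    return found


def core_class(label: str) -> str:
    """Collapse a full label to a core class (assumed prefix rule)."""
    head = re.split(r"[-(]", label, maxsplit=1)[0]
    return head if head in CORE_CLASSES else "OTHER"


excerpt = ("Should [Kagan]GIVEN-SHORT be confirmed [next week]SITUATIVE, "
           "[[the nation's]GIVEN-EPITHET highest court]GIVEN-EPITHET "
           "would be [a Protestant-free zone]INDEF-GENERIC")
pairs = labelled_spans(excerpt)
for expression, label in pairs:
    print(f"{expression!r}: {label} -> {core_class(label)}")
```

The parser expects each label to follow its closing bracket without a line break, so wrapped text would need to be joined first.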
Data
Transcripts from German radio news bulletins (three full days of (hourly) news)
About 3000 sentences
Parsed with XLE / German LFG grammar (Rohrer & Forst 2006)
Annotated with SALTO tool (Burchardt et al. 2006), extended TigerXML format
Two annotators, verification and ultimate decision by a third annotator
Annotation using SALTO (Burchardt et al. 2006)
“...said Kirchner in Cordoba...” “... the Argentinian head of state...”
Inter-Annotator Agreement (Cohen 1960)
Evaluation performed on a subset comprising 1149 nominal expressions, which the annotators had to identify by themselves
1100 expressions identified by both annotators
757 labeled identically
Agreement κ = .66 (full scheme: 21 subclasses)
κ = .78 (core scheme comprising 6 classes: GIVEN, SITUATIVE, BRIDGING, UNUSED, INDEF, OTHER)
Comparison:
Dipper et al. (2007), κ = .55 (newspaper commentaries)
Nissim et al. (2004), κ = .79 (full); κ = .85 (core) (dialogue) (fewer embeddings; pre-exclusion of “difficult” cases)
(Source: Ritz et al. 2008)
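For readers unfamiliar with the measure, Cohen's κ corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below implements that standard formula on invented toy labels; the function name `cohen_kappa` and the toy data are assumptions for illustration, and the per-class counts behind the reported values are not given on the slide, so the reported κ cannot be recomputed here.

```python
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' label proportions, summed over classes.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Invented toy data, not taken from the corpus.
annotator_1 = ["GIVEN", "GIVEN", "INDEF", "UNUSED", "BRIDGING", "GIVEN"]
annotator_2 = ["GIVEN", "GIVEN", "INDEF", "INDEF", "BRIDGING", "UNUSED"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))

# Raw agreement implied by the counts on the slide (757 of 1100 identical labels).
raw_agreement = 757 / 1100
print(round(raw_agreement, 3))
```

The raw share of identical labels is printed only for orientation; κ additionally depends on how the labels are distributed across classes.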
Conclusion
Scheme enables fast, comprehensible and reliable annotations of nested expressions in arbitrary text genres
Useful for
a. Computational linguists: e.g. creating a gold standard for anaphora resolution and related tasks
b. Theoretical linguists: empirical data for investigations into form of referring expressions, (non-)restrictivity of modification, word order, grammatical role, discourse structure etc.
c. Phoneticians: investigating prosody in spoken corpora
Learn more: http://www.ims.uni-stuttgart.de/~arndt
Thank you!
Details: GIVEN
Subclasses: PRONOUN, REFLEXIVE, SHORT, REPEATED, EPITHET
(1) Both had the blessings of Dr. Richard Klausner. But even [Klausner]GIVEN-SHORT had to be persuaded at first.
(2) Before the European Union’s ban on incandescent lightbulbs went into effect on Sept. 1, consumers across Europe raided stores to stockpile [the familiar bulbs]GIVEN-EPITHET.
Details: BRIDGING
Subclasses: 0, TEXT, CONTAINED
(1) Germany lost the football match against England because [the audience]BRIDGING was against them.
(2) United were trailing 3-1 when Fletcher was felled [in the area]BRIDGING-TEXT by Aleksei Berezutski. The Scotland midfielder was then yellow-carded by [the referee]BRIDGING-TEXT.
Details: BRIDGING-CONTAINED vs. UNUSED-UNKNOWN
(1) The Republicans won [the governorship of Virginia]BRIDGING-CONTAINED.
(expected / prototypical relationship)
(2) He was convicted of helping to organise [the seizure [of Osama Moustafa Nasr]]UNUSED-UNKNOWN from a Milan street in February 2003.
(non-prototypical relationship, can’t be separated)
(3) # Speaking of Osama Moustafa Nasr, [the seizure] happened in 2003.
Details: INDEF
Subclasses: NEW, GENERIC, PARTITIVE, RESUMPTIVE
(1) [A man]INDEF-NEW came in. He bought a pair of shoes.
(2) [Serious beer drinkers]INDEF-GENERIC should head straight to this 550-year old institution.
(3) At violent clashes between the police and demonstrating Kurds, [three demonstrators]INDEF-PARTITIVE were injured.
(4) That’s close to how a cancer vaccine works, but not precisely. Most experts see [cancer vaccines]INDEF-RESUMPTIVE as a hybrid of treatment and prevention.
Other
EXPLETIVE
NULL: nobody, nothing
RELATIVE: non-restrictive relative clause
CATAPHOR: can be indefinite or definite