Annotating the WordNet Glosses
Ben Haskell
2004/10/08
Annotating the Glosses
• Annotating open-class words with their WordNet sense tag (a.k.a. sense-tagging)
• A disambiguation task: Process of linking an instance of a word to the WordNet synset representing its context-appropriate meaning, e.g.
run a company vs. run an errand
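To make the disambiguation task concrete, here is a toy simplified-Lesk sketch in Python: pick the sense whose gloss shares the most words with the context. The sense inventory and glosses below are made up for illustration; this is not the project's annotation method, which is manual for polysemous words.

```python
# Toy sense inventory; the real task links words to WordNet synsets.
TOY_SENSES = {
    "run": {
        "run#12": "direct or control projects or businesses",
        "run#29": "carry out a task or errand",
    },
}

def disambiguate(word, context):
    """Simplified Lesk: pick the sense whose gloss shares the most
    words with the surrounding context."""
    ctx = set(context.lower().split())
    senses = TOY_SENSES[word]
    return max(senses, key=lambda s: len(ctx & set(senses[s].split())))
```

With this inventory, `disambiguate("run", "carry out an errand")` selects run#29, while a context mentioning control of projects selects run#12.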
[Diagram: fragments of the WordNet verb network distinguishing the two senses. { run#29 }, v ("carry out; 'run an errand'") sits in a hierarchy under { carry_through, accomplish, execute, carry_out, action, fulfil, fulfill }, with higher synsets including { complete, finish }, { end, terminate }, { effect, effectuate, bring_about, set_up }, { cause, do, make }, { make, create }, and { change, alter, modify }. { run#12, operate }, v ("direct or control; projects, businesses, etc.; 'She is running a relief operation in the Sudan'") sits under { manage, deal, care, handle }, { direct }, and { control, command }.]
Glosses as node points in the network of relations
• Once a word’s gloss is annotated, the synsets for all conceptually-related words used in the gloss can be accessed via their sense tags
• Situates the word in an expanded network of links to other semantically-related words/concepts in WordNet
[Diagram, built up over four slides: the gloss of { dance#2 }, v ("move in a graceful and rhythmical way") as a node in the network. dance#2 has an IS-A link to { move }, an ENTAIL link to { step }, and DERIV links to dancer#1 (social_dancer) and dancer#2 (professional_dancer). Annotating the gloss adds links from graceful to { graceful#1 }, a (ANT awkward; SIM deft, elegant, fluent, fluid, liquid, gainly, …), from rhythmical to { rhythmical#1 }, a (ANT unrhythmical; SIM beating, pulsating, pulsing, cadenced, cadent, danceable, …), and from way to { way#8 }, n (manner, mode, style, fashion).]
Annotating the Glosses
• Automatically tag monosemous words/collocations
• For gold standard quality, sense-tagging of polysemous words must be done manually
• More accurate sense-tagged data means better results for WSD systems, which means better performance from applications that depend on WSD
System overview
• Preprocessor
  – Gloss “parser” and tokenizer/lemmatizer
  – Semantic class recognizer
  – Noun phrase chunker
  – Collocation recognizer (globber)
• Automatic sense tagger for monosemous terms
• Manual tagging interface
Logical structure of a Gloss
• Smallest unit is a word, contracted form, or non-lexical punctuation
• Collocations are decomposed into their constituent parts
  – Allows coding of discontinuous collocations
  – A collocation can be treated either as a single unit or as a sequence of forms
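One way to realize this decomposition in code (a sketch, not the project's actual format): give each word form a set of collocation ids, so a single form can belong to two discontinuous collocations at once.

```python
from dataclasses import dataclass, field

@dataclass
class WordForm:
    text: str
    colls: set = field(default_factory=set)  # collocation ids this form belongs to

def mark_collocation(tokens, indices, coll_id):
    """Tag the word forms at `indices` as members of collocation `coll_id`."""
    for i in indices:
        tokens[i].colls.add(coll_id)

# "a musical composition or passage performed quickly" (allegro#2):
# 'musical' is shared by two discontinuous collocations.
gloss = [WordForm(w) for w in
         "a musical composition or passage performed quickly".split()]
mark_collocation(gloss, [1, 2], "a")  # musical_composition
mark_collocation(gloss, [1, 4], "b")  # musical_passage

def collocation_text(tokens, coll_id):
    """Reassemble a collocation from its member forms."""
    return "_".join(t.text for t in tokens if coll_id in t.colls)
```

Here the same form `musical` carries both ids, so the collocation can be read back either as a unit (`collocation_text`) or as its individual forms.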
Example glosses
• n. pass, toss, flip: (sports) the act of throwing the ball to another member of your team; "the pass was fumbled"
• n. brace, suspender: elastic straps that hold trousers up (usually used in the plural)
• v. kick: drive or propel with the foot
[Diagram, shown on two slides: the logical structure of a gloss. Optional info preceding the def (domain category, etc.), then the def itself, then optional info following the def (usage info, etc.), then zero or more examples (ex*).]
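A rough sketch of that structure as a splitter: a leading parenthesized domain category, the def, and trailing quoted examples. The regexes are illustrative only; the project's actual gloss parser is more elaborate and also handles the other optional fields.

```python
import re

# Illustrative split of a raw gloss into (classif, def, ex) parts.
GLOSS_RE = re.compile(
    r'^(?:\((?P<classif>[^)]*)\)\s*)?'   # optional (domain) prefix
    r'(?P<def>[^";]*[^";\s])'            # definition text
    r'(?P<rest>.*)$')                    # whatever follows the def

def parse_gloss(gloss):
    m = GLOSS_RE.match(gloss.strip())
    parts = {"classif": m.group("classif"), "def": m.group("def")}
    parts["ex"] = re.findall(r'"([^"]*)"', m.group("rest"))
    return parts
```

On the pass/toss/flip gloss above, this yields classif "sports", the def text, and the single quoted example.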
[Diagram: the annotated def of { allegro#2 }, n, tokenized as [a] [musical] [composition] [or] [passage] [performed] [quickly]. musical + composition form collocation a (sk=musical_composition%1:10:00::), musical + passage form collocation b (sk=musical_passage%1:10:00::), so musical carries both coll=a and coll=b; performed gets sk=perform%2:36:01:: and quickly gets sk=quickly%4:02:00::.]
Gloss “parser”
• Regularization & clean-up of the gloss
• Recognize & XML-tag <def>, <aux>, <ex>, <qf>, verb arguments, and domain <classif>
• <aux> and <classif> contents do not get tagged
• Replace XML-unfriendly characters (&, <, >) with XML entities
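The entity replacement is fully standard; in Python it might look like the sketch below (the `wrap` helper is illustrative, but the three characters are the ones named on the slide).

```python
from xml.sax.saxutils import escape

def wrap(tag, text):
    """Escape XML-unfriendly characters (& < >) and wrap text in a tag."""
    return "<%s>%s</%s>" % (tag, escape(text), tag)
```

For example, `wrap("def", "x < y & z")` produces `<def>x &lt; y &amp; z</def>`.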
Tokenizer
• Isolate word forms
• Differentiate non-lexical from lexical punctuation
  – E.g., sentence-ending periods vs. periods in abbreviations
• Recognize apostrophe vs. quotation marks
  – E.g., states’ rights vs. `college-bound students’
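A minimal sketch of the period distinction: periods belonging to known abbreviations stay attached (lexical punctuation), while sentence-ending periods are split off as separate tokens. The abbreviation list here is illustrative; the real tokenizer also handles quotes and apostrophes.

```python
# Abbreviations whose trailing period is part of the word form.
ABBREVS = {"etc.", "e.g.", "i.e.", "vs."}

def tokenize(text):
    tokens = []
    for raw in text.split():
        if raw in ABBREVS:
            tokens.append(raw)              # lexical period: keep attached
        elif raw.endswith(".") and raw[:-1]:
            tokens.extend([raw[:-1], "."])  # non-lexical period: split off
        else:
            tokens.append(raw)
    return tokens
```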
Lemmatizer
• A lemma is the WordNet entry form plus WordNet part of speech
• Inflected forms are reduced to their base forms by a stemmer developed in-house specifically for this task
• A <wf> may be assigned multiple potential lemmas
  – saw: lemma=“saw%1|saw%2|see%2”
  – feeling: lemma=“feeling%1|feel%2”
Lemmatizer, cont.
• Exceptions: stopwords/phrases
  – Closed-class words (prepositions, pronouns, conjunctions, etc.)
  – Multi-word terms such as “by means of”, “according to”, “granted that”
• Hyphenated terms not in WordNet get split and separately lemmatized
  – E.g., over-fed becomes over + fed
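Combining the lemmatizer behaviors from the two slides above, a sketch might look like this. The tables are tiny illustrations (not the in-house stemmer), and the `word%N` keys merely mimic the slide's lemma notation.

```python
# Illustrative tables: stopwords pass through untagged, exception
# entries may yield several candidate lemmas, unknown hyphenated
# terms are split and the parts lemmatized separately.
STOPWORDS = {"of", "the", "by", "to", "a"}
EXCEPTIONS = {
    "saw": ["saw%1", "saw%2", "see%2"],
    "feeling": ["feeling%1", "feel%2"],
    "fed": ["feed%2"],
}
LEXICON = {"saw", "see", "feed", "feel", "feeling", "over"}

def lemmatize(wf):
    if wf in STOPWORDS:
        return []                      # stopwords are not sense-tagged
    if wf in EXCEPTIONS:
        return EXCEPTIONS[wf]          # multiple potential lemmas
    if "-" in wf and wf not in LEXICON:
        parts = wf.split("-")          # e.g. over-fed -> over + fed
        return [l for p in parts for l in lemmatize(p)]
    return [wf + "%1"] if wf in LEXICON else []
```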
Semantic class recognizer
• Recognizes and marks up parenthesized and free text belonging to a finite set of semantic classes
• chem(ical symbol), curr(ency), date, d(ate)range, math, meas(ure phrase), n(umeric)range, num(ber), punc(tuation), symb(olic text), time, year
• Words and phrases in these classes will not be sense-tagged
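A sketch of the recognizer for a few of those classes: try each class pattern in order and return the first match. Only three of the slide's classes are shown, and the regexes are illustrative guesses, not the project's definitions.

```python
import re

# Ordered (class, pattern) pairs; 'year' must precede the more
# general 'num' so that 1984 is classed as a year.
CLASSES = [
    ("year", re.compile(r"^\d{4}$")),
    ("time", re.compile(r"^\d{1,2}:\d{2}$")),
    ("num",  re.compile(r"^\d+(\.\d+)?$")),
]

def semantic_class(token):
    """Return the first matching semantic class, or None (taggable word)."""
    for name, pat in CLASSES:
        if pat.match(token):
            return name
    return None
```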
Noun Phrase chunker
• Isolates noun phrases (“chunks”) in order to narrow the scope for finding noun collocations in the next stage
• Glosses are not otherwise syntactically parsed
• Trained and tagged POS using Thorsten Brants’s TnT statistical tagger
Noun Phrase chunker, cont.
• Trained and chunked noun phrases using Steven Abney’s partial parser Cass
• Enabled automatic recognition of otherwise ambiguous noun compounds and fixed expressions
  – E.g., opening move (JJ NN vs. VBG NN vs. VBG VB vs. NN VB), bill of fare (NN IN NN vs. VB IN NN)
• Increased noun collocation coverage by 25% (types) and 29% (tokens)
Collocation recognizer
• Bag-of-words approach
  – To find ‘North_America’, find glosses that have both ‘North’ and ‘America’
• Four passes:
  1. Ghost: ‘bring_home_the_bacon’
     – Mark ‘bacon’ so it won’t be tagged as monosemous
  2. Contiguous: ‘North_America’
  3. Disjoint: North (and) [(South) America]
  4. Examples: tag the synset’s collocations in its gloss
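The bag-of-words candidate search can be sketched in a few lines: a gloss qualifies if it contains every constituent word of the collocation. The toy glosses below are made up; in the real pipeline, the contiguous and disjoint passes then confirm or reject each candidate.

```python
def candidate_glosses(collocation, glosses):
    """Return glosses containing all constituent words of the collocation."""
    parts = set(collocation.split("_"))
    return [g for g in glosses if parts <= set(g.split())]

# Toy gloss list for illustration.
glosses = [
    "a river in North America",
    "a mountain range in South America",
    "the north side of a building",
]
```

Note that the lowercase "north" in the third gloss does not match, and the second gloss lacks "North", so only the first gloss is a candidate for North_America.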
Automatic sense-tagger
• Tag monosemous words: words that have…
  – …only one lemmatized form
  – …only one WordNet sense
  – …not been marked as possibly ambiguous (i.e., non wait-list words, non ‘bacon’ words)
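The three conditions above translate directly into a small gate. The sense inventory and `word#N` keys below are toy stand-ins for WordNet; this is a sketch of the logic, not the project's tagger.

```python
# Toy inventory: lemma -> list of sense keys.
SENSES = {
    "errand": ["errand#1"],
    "run": ["run#1", "run#2"],
    "bacon": ["bacon#1"],
}

def auto_tag(lemmas, ghost_marked=False):
    """Tag only if there is exactly one candidate lemma, that lemma has
    exactly one sense, and the form was not ghost-marked."""
    if ghost_marked or len(lemmas) != 1:
        return None
    senses = SENSES.get(lemmas[0], [])
    return senses[0] if len(senses) == 1 else None
```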
The mantag interface
• Simplicity
  – Taggers will repeat the same actions hundreds of times per day
• Automation
  – Instead of typing the 148,000 search terms, use a centralized list
  – Also allows for easy tracking of the double-checking process
[Slides 27–28: screenshots of the mantag interface.]
Statistics
Total number of glosses 117,549
Total number of words (tokens) 1,221,341
Total taggable words (tokens) 658,958 (57.9%)
auto-tagged 86,914 13.2%
mono sense/pos 3,872 0.6%
poly sense and/or pos 567,944 86.2%
not in WN 228 ~0.0%
Statistics, cont.
Initial taggable collocations (tokens) 49,726
auto-tagged 41,475 83.4%
mono sense/pos 462 0.9%
poly sense and/or pos 6,888 13.8%
not in WN 0 0.0%
Statistics, cont.
Total taggable word types 61,811
auto-tagged 19,117 30.9%
mono sense/pos 760 1.2%
poly sense and/or pos 41,650 67.4%
words not in WN 127 0.2%
non-word forms 30 ~0.0%
Statistics, cont.
Done thus far…
automatic tags 130,770
automatic collocations 49,726
manual tags 42,020
manual collocations 2,961
Aim of ISI Effort
• Jerry Hobbs, Ulf Hermjakob, Nishit Rathod, Fahad al-Qahtani
• Gold standard translation of glosses into first-order logic with reified events
ISI Effort examples

In (gloss for dance, v, 2: “move in a graceful and rhythmical way”):
  move → move#v#2; graceful → graceful#a#1; rhythmical → rhythmic#a#1; way → way#n#8; a, and → ignore

Out:
  dance-V-2'(e0,x) -> move-V-2'(e1,x) & in'(e2,e1,y) & graceful-A-1'(e3,y) & rhythmic-A-1'(e4,y) & way-N-8'(e5,y)
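The surface shape of these formulas can be sketched as string assembly from the sense-tagged words. This is deliberately oversimplified: every conjunct here shares the head's argument x, whereas the real ISI forms introduce extra variables and argument structure (as in the in'(e2,e1,y) conjunct above).

```python
def logical_form(head, tagged):
    """Assemble a flat reified-event implication.
    head: e.g. ('dance', 'V', 2); tagged: list of (word, pos, sense)."""
    conjuncts = []
    for i, (w, p, s) in enumerate(tagged, start=1):
        conjuncts.append("%s-%s-%d'(e%d,x)" % (w, p, s, i))
    lhs = "%s-%s-%d'(e0,x)" % head
    return lhs + " -> " + " & ".join(conjuncts)
```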
ISI Effort examples, cont.

In (gloss for allegro, n, 2: “a musical composition or passage performed quickly”):
  musical composition → musical_composition#n#1; musical passage → musical_passage#n#1; performed → perform#v#2; quickly → quickly#r#4; a, or → ignore

Out:
  allegro-N-2'(e0,x) -> musical_composition-N-1/musical_passage-N-1'(e1,x) & perform-V-2'(e2,y,x) & quick-D-4'(e3,e2)
  musical_composition-N-1'(e1,x) -> musical_composition-N-1/musical_passage-N-1'(e1,x)
  musical_passage-N-1'(e1,x) -> musical_composition-N-1/musical_passage-N-1'(e1,x)
ISI Method
• Identify the most common gloss patterns and convert them first
• Parse
  – Using Charniak’s parser: uneven, sometimes bizarre results (“aspen”: VBN)
  – Using Hermjakob’s CONTEX parser: greater local control
ISI Progress
• Completed glosses of nouns with patterns:
  – NG (P NG)*: 45% of nouns
  – NG ((VBN | VING) NG): 15% of nouns
• 45 + 15 = 60% complete!
• But gloss patterns are in a Zipf distribution:

Distribution of noun glosses:

  Pattern            Count   Share
  NP (NP,PP)          7181     41%
  NP (NP,SBAR)        2978     17%
  NP (NP,VP)          2684     15%
  NP (NP,PP,SBAR)      363      2%
  NP (NP,CC,NP)        280      2%
  NP (DT,JJ,NN)        272      2%