Lecture: Lexical Semantics
[most slides borrowed from J&M 3rd ed. website]

CS 585, Fall 2017
Introduction to Natural Language Processing
http://people.cs.umass.edu/~brenocon/inlp2017

Brendan O’Connor
College of Information and Computer Sciences
University of Massachusetts Amherst

Tuesday, October 17, 2017


• Word sparsity is a problem!
  • Show me tweets about voting irregularities
  • Train a classifier on 50 documents
• Idea: external database of word meaning information, for word types.
  • Today: word senses and taxonomies
  • Thursday: sentiment lexicons and lexicon expansion
  • Post-midterm: vectors, word embeddings, distributional semantics

Terminology: lemma and wordform

• A lemma or citation form
  • Same stem, part of speech, rough semantics
• A wordform
  • The inflected word as it appears in text

Wordform   Lemma
banks      bank
sung       sing

Lemmas have senses

• One lemma “bank” can have many meanings:
  • Sense 1: “…a bank can hold the investments in a custodial account…”
  • Sense 2: “…as agriculture burgeons on the east bank the river will shrink even more”
• Sense (or word sense)
  • A discrete representation of an aspect of a word’s meaning.
• The lemma bank here has two senses

Homonymy

Homonyms: words that share a form but have unrelated, distinct meanings:
• bank1: financial institution, bank2: sloping land
• bat1: club for hitting a ball, bat2: nocturnal flying mammal

1. Homographs (bank/bank, bat/bat)
2. Homophones:
   1. Write and right
   2. Piece and peace

Homonymy causes problems for NLP applications

• Information retrieval
  • “bat care”
• Machine Translation
  • bat: murciélago (animal) or bate (for baseball)
• Text-to-Speech
  • bass (stringed instrument) vs. bass (fish)

Polysemy

• 1. The bank was constructed in 1875 out of local red brick.
• 2. I withdrew the money from the bank
• Are those the same sense?
  • Sense 2: “A financial institution”
  • Sense 1: “The building belonging to a financial institution”
• A polysemous word has related meanings
  • Most non-rare words have multiple meanings

Metonymy or Systematic Polysemy: A systematic relationship between senses

• Lots of types of polysemy are systematic
  • School, university, hospital
  • All can mean the institution or the building.
• A systematic relationship:
  • Building ↔ Organization
• Other such kinds of systematic polysemy:
  • Author (Jane Austen wrote Emma) ↔ Works of Author (I love Jane Austen)
  • Tree (Plums have beautiful blossoms) ↔ Fruit (I ate a preserved plum)

How do we know when a word has more than one sense?

• The “zeugma” test: Two senses of serve?
  • Which flights serve breakfast?
  • Does Lufthansa serve Philadelphia?
  • ?Does Lufthansa serve breakfast and San Jose?
• Since this conjunction sounds weird,
  • we say that these are two different senses of “serve”

Word meaning relations

• Relationships between pairs of word meanings

• Synonymy: same meaning

• Antonymy: opposite meanings

• Hypernymy/hyponymy: more general/specific meanings

• Meronymy: part-whole relations

• etc.


Synonyms

• Words that have the same meaning in some or all contexts.
  • filbert / hazelnut
  • couch / sofa
  • big / large
  • automobile / car
  • vomit / throw up
  • water / H2O
• Two lexemes are synonyms
  • if they can be substituted for each other in all situations
  • If so they have the same propositional meaning

Synonyms

• But there are few (or no) examples of perfect synonymy.
  • Even if many aspects of meaning are identical
  • Still may not preserve the acceptability based on notions of politeness, slang, register, genre, etc.
• Example:
  • Water / H2O
  • Big / large
  • Brave / courageous

Synonymy is a relation between senses rather than words

• Consider the words big and large
• Are they synonyms?
  • How big is that plane?
  • Would I be flying on a large or small plane?
• How about here:
  • Miss Nelson became a kind of big sister to Benjamin.
  • ?Miss Nelson became a kind of large sister to Benjamin.
• Why?
  • big has a sense that means being older, or grown up
  • large lacks this sense

Antonyms

• Senses that are opposites with respect to one feature of meaning
• Otherwise, they are very similar!

  dark/light   short/long   fast/slow   rise/fall
  hot/cold     up/down      in/out

• More formally: antonyms can
  • define a binary opposition or be at opposite ends of a scale
    • long/short, fast/slow
  • be reversives:
    • rise/fall, up/down

Hyponymy and Hypernymy

• One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
  • car is a hyponym of vehicle
  • mango is a hyponym of fruit
• Conversely hypernym/superordinate (“hyper is super”)
  • vehicle is a hypernym of car
  • fruit is a hypernym of mango

Superordinate/hyper    vehicle   fruit   furniture
Subordinate/hyponym    car       mango   chair

Hyponymy more formally

• Extensional:
  • The class denoted by the superordinate extensionally includes the class denoted by the hyponym
• Entailment:
  • A sense A is a hyponym of sense B if being an A entails being a B
• Hyponymy is usually transitive
  • (A hypo B and B hypo C entails A hypo C)
• Another name: the IS-A hierarchy
  • A IS-A B (or A ISA B)
  • B subsumes A
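The transitivity of hyponymy can be sketched with a small IS-A table. This is only an illustration: the hierarchy below is a hand-made toy (the link from vehicle and furniture up to artifact is an assumption added for the example, not WordNet data), and the names `HYPERNYMS`, `hypernym_closure`, and `is_hyponym_of` are hypothetical.

```python
# Toy IS-A hierarchy: each sense maps to its direct hypernyms.
# The entries are illustrative assumptions, not taken from WordNet.
HYPERNYMS = {
    "car": ["vehicle"],
    "vehicle": ["artifact"],
    "mango": ["fruit"],
    "chair": ["furniture"],
    "furniture": ["artifact"],
}


def hypernym_closure(sense):
    """Return every sense reachable by following IS-A links upward."""
    seen = set()
    stack = list(HYPERNYMS.get(sense, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(HYPERNYMS.get(current, []))
    return seen


def is_hyponym_of(a, b):
    """True if A IS-A B, either directly or through a chain of links."""
    return b in hypernym_closure(a)


print(is_hyponym_of("car", "vehicle"))   # direct link
print(is_hyponym_of("car", "artifact"))  # follows from transitivity
print(is_hyponym_of("vehicle", "car"))   # the relation is not symmetric
```

Walking upward from the hyponym means "B subsumes A" is answered by the same closure, just read in the other direction.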

Hyponyms and Instances

• WordNet has both classes and instances.
• An instance is an individual, a proper noun that is a unique entity
  • San Francisco is an instance of city
  • But city is a class
    • city is a hyponym of municipality ... location ...

Meronymy

• The part-whole relation
  • A leg is part of a chair; a wheel is part of a car.
• Wheel is a meronym of car, and car is a holonym of wheel.

Computing with a Thesaurus
WordNet

WordNet 3.0

• A hierarchically organized lexical database
• On-line thesaurus + aspects of a dictionary
  • Some other languages available or under development
    • (Arabic, Finnish, German, Portuguese…)

Category    Unique Strings
Noun        117,798
Verb        11,529
Adjective   22,479
Adverb      4,481

Senses of “bass” in WordNet

How is “sense” defined in WordNet?

• The synset (synonym set), the set of near-synonyms, instantiates a sense or concept, with a gloss
• Example: chump as a noun with the gloss: “a person who is gullible and easy to take advantage of”
• This sense of “chump” is shared by 9 words: chump1, fool2, gull1, mark9, patsy1, fall guy1, sucker1, soft touch1, mug2
• Each of these senses has this same gloss
  • (Not every sense; sense 2 of gull is the aquatic bird)
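One way to picture a synset is as a record holding a gloss plus the (lemma, sense number) pairs that share it. The sketch below is a minimal illustration using the chump example above; the `Synset` class and `gloss_for` helper are hypothetical names rather than WordNet's or NLTK's API, and the gloss given for gull2 is a placeholder, not WordNet's wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    gloss: str
    senses: tuple  # (lemma, sense_number) pairs that share this gloss


CHUMP = Synset(
    gloss="a person who is gullible and easy to take advantage of",
    senses=(
        ("chump", 1), ("fool", 2), ("gull", 1), ("mark", 9), ("patsy", 1),
        ("fall guy", 1), ("sucker", 1), ("soft touch", 1), ("mug", 2),
    ),
)

# Placeholder gloss for the other sense of "gull" (assumed wording).
GULL_BIRD = Synset(gloss="an aquatic bird", senses=(("gull", 2),))

SYNSETS = [CHUMP, GULL_BIRD]


def gloss_for(lemma, sense_number, synsets):
    """Look up the gloss of one specific sense of a lemma."""
    for synset in synsets:
        if (lemma, sense_number) in synset.senses:
            return synset.gloss
    return None


print(gloss_for("gull", 1, SYNSETS))  # the chump gloss
print(gloss_for("gull", 2, SYNSETS))  # a different sense, different gloss
```

Keying the lookup on the pair rather than the bare lemma reflects the point that the gloss belongs to a sense, not to the word.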

WordNet Hypernym Hierarchy for “bass”

WordNet: Viewed as a graph

Hyponyms of “person” in WN
7588 total -- with most freq. sense restriction. [from Michael Heilman]

• vintager • matrisib • horseback rider • ceo • seeker • fieldhand • radiologist • captain • moujik • research director • damsel • nibbler • nailer • nude person • seismologist • oddball • prankster • radiotherapist • nebraskan • cupbearer • psychic

• accompanist • plagiariser • timberman • photographer's model • lombard • debaser • courtier • dutch uncle • schlemiel • dizygotic twin • mental case • matriarch • vocalist • internist • transplanter • techie • sniffler • marrano • first baseman • government man

• child prodigy • athenian • hospital chaplain • dominatrix • bibliopole • hombre • east indian • ballet master • bad person • rock 'n' roll musician • flack catcher • telephoner • dominus • cheater • groveler • accomplice • herb doctor • schoolfriend • preteen • gastronome

• concierge • shogun • flutist • bottom dog • imperialist • emir • libeler • manichaean • abnegator • cousin-german • masorite • trouble maker • villainess • rajpoot • calapooya • overlord • bank guard • tumbler • polycarp • radiographer • slave owner

• stick-in-the-mud • audile • deadbeat • maltman • jeweler • pasha • screwballer • prioress • crosspatch • persecutor • movie maker • capo • class act • navvy • golden boy • sweet talker • junior • feminist • villager • specialiser • scotsman

“Supersenses”
The top level hypernyms in the hierarchy
(counts from Schneider and Smith 2013’s STREUSLE corpus)

Noun                                Verb
GROUP          1469  place          STATIVE       2922  is
PERSON         1202  people         COGNITION     1093  know
ARTIFACT        971  car            COMMUNIC.*     974  recommend
COGNITION       771  way            SOCIAL         944  use
FOOD            766  food           MOTION         602  go
ACT             700  service        POSSESSION     309  pay
LOCATION        638  area           CHANGE         274  fix
TIME            530  day            EMOTION        249  love
EVENT           431  experience     PERCEPTION     143  see
COMMUNIC.*      417  review         CONSUMPTION     93  have
POSSESSION      339  price          BODY            82  get ... done
ATTRIBUTE       205  quality        CREATION        64  cook
QUANTITY        102  amount         CONTACT         46  put
ANIMAL           88  dog            COMPETITION     11  win
BODY             87  hair           WEATHER          0  -
STATE            56  pain           all 15 VSSTs  7806
NATURAL OBJ.     54  flower
RELATION         35  portion        N/A (see §3.2)
SUBSTANCE        34  oil            `a            1191  have
FEELING          34  discomfort     `              821  anyone
PROCESS          28  process        `j              54  fried
MOTIVE           25  reason
PHENOMENON       23  result
SHAPE             6  square
PLANT             5  tree
OTHER             2  stuff
all 26 NSSTs   9018

* COMMUNIC. is short for COMMUNICATION

Table 1: Summary of noun and verb supersense categories. Each entry shows the label along with the count and most frequent lexical item in the STREUSLE corpus.

enrich the MWE annotations of the CMWE corpus[1] (Schneider et al., 2014b), are publicly released under the name STREUSLE.[2] This includes new guidelines for verb supersense annotation. Our open-source tagger, implemented in Python, is available from that page as well.

2 Background: Supersense Tags

WordNet’s supersense categories are the top-level hypernyms in the taxonomy (sometimes known as semantic fields) which are designed to be broad enough to encompass all nouns and verbs (Miller, 1990; Fellbaum, 1990).[3]

The 26 noun and 15 verb supersense categories are listed with examples in table 1. Some of the names overlap between the noun and verb inventories, but they are to be considered separate categories; hereafter, we will distinguish the noun and verb categories with prefixes, e.g. N:COGNITION vs. V:COGNITION.

Though WordNet synsets are associated with lexical entries, the supersense categories are unlexicalized. The N:PERSON category, for instance, contains synsets for both principal and student. A different sense of principal falls under N:POSSESSION.

As far as we are aware, the supersenses were originally intended only as a method of organizing the WordNet structure. But Ciaramita and Johnson (2003) pioneered the coarse word sense disambiguation task of supersense tagging, noting that the supersense categories provided a natural broadening of the traditional named entity categories to encompass all nouns. Ciaramita and Altun (2006) later expanded the task to include all verbs, and applied a supervised sequence modeling framework adapted from NER. Evaluation was against manually sense-tagged data that had been automatically converted to the coarser supersenses. Similar taggers have since been built for Italian (Picca et al., 2008) and Chinese (Qiu et al., 2011), both of which have their own WordNets mapped to English WordNet.

Although many of the annotated expressions in existing supersense datasets contain multiple words, the relationship between MWEs and supersenses has not received much attention. (Piao et al. (2003, 2005) did investigate MWEs in the context of a lexical tagger with a finer-grained taxonomy of semantic classes.) Consider these examples from online reviews:

(1) IT IS NOT A HIGH END STEAK HOUSE
(2) The white pages allowed me to get in touch with parents of my high school friends so that I could track people down one by one

HIGH END functions as a unit to mean ‘sophisticated, expensive’. (It is not in WordNet, though

[1] http://www.ark.cs.cmu.edu/LexSem/
[2] Supersense-Tagged Repository of English with a Unified Semantics for Lexical Expressions
[3] WordNet synset entries were originally partitioned into lexicographer files for these coarse categories, which became known as “supersenses.” The lexname function in WordNet/NLTK (Bird et al., 2009) returns a synset’s lexicographer file. A subtle difference is that a special file called noun.Tops contains each noun supersense’s root synset (e.g., group.n.01 for N:GROUP) as well as a few miscellaneous synsets, such as living_thing.n.01, that are too abstract to fall under any single supersense. Following Ciaramita and Altun (2006), we treat the latter cases under an N:OTHER supersense category and merge the former under their respective supersense.


Supersenses

• A word’s supersense can be a useful coarse-grained representation of word meaning for NLP tasks
• See “STREUSLE” system http://www.cs.cmu.edu/~ark/LexSem/

• To use WordNet, or any lexical database, for NLP:
  • 1. Word Sense Disambiguation (WSD) a.k.a. Entity Linking
    • The [bank]3 was open early.
  • 2. Use lexical entry information for features or inferences
    • When was that business open?
    • [bank]3 <-hypo- [commercial institution]
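The second step, using lexical entry information once a sense has been chosen, might look like the sketch below. It assumes WSD has already produced a sense ID such as `bank#3`; the ID format, the `HYPERNYM` table, and the link from commercial institution up to business are all assumptions made for illustration, not WordNet's actual entries.

```python
# Hypothetical hypernym links for already-disambiguated senses.
# Only bank#3 -> commercial institution comes from the slide;
# the link up to "business" is an assumption for this example.
HYPERNYM = {
    "bank#3": "commercial institution",
    "commercial institution": "business",
}


def hypernym_chain(sense):
    """Follow hypernym links upward and collect each ancestor in order."""
    chain = []
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
        chain.append(sense)
    return chain


def features(sense):
    """Turn a disambiguated sense into string features for a classifier."""
    return ["sense=" + sense] + ["hyper=" + h for h in hypernym_chain(sense)]


print(features("bank#3"))
```

With features like these, a sentence about a bank and a question about a business share `hyper=business`, which is the kind of overlap a downstream model or inference rule could exploit.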

WordNet 3.0

• Where it is:
  • http://wordnetweb.princeton.edu/perl/webwn
• Libraries
  • Python: WordNet from NLTK
  • Java: JWNL, extJWNL

MeSH: Medical Subject Headings thesaurus from the National Library of Medicine

• MeSH (Medical Subject Headings)
  • 177,000 entry terms that correspond to 26,142 biomedical “headings”
• Hemoglobins
  Entry Terms [synset]: Eryhem, Ferrous Hemoglobin, Hemoglobin
  Definition: The oxygen-carrying proteins of ERYTHROCYTES. They are found in all vertebrates and some invertebrates. The number of globin subunits in the hemoglobin quaternary structure differs between species. Structures range from monomeric to a variety of multimeric arrangements

The MeSH Hierarchy

Uses of the MeSH Ontology

• Provide synonyms (“entry terms”)
  • E.g., glucose and dextrose
• Provide hypernyms (from the hierarchy)
  • E.g., glucose ISA monosaccharide
• Indexing in MEDLINE/PubMED database
  • NLM’s bibliographic database:
    • 20 million journal articles
    • Each article hand-assigned 10-20 MeSH terms
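The first two uses can be sketched together: map an entry term to its heading, then climb the hierarchy to collect broader headings. The tables below hold only the glucose example from the slide in simplified lowercase form; they are not real MeSH records, and `heading_for` and `index_terms` are hypothetical helper names.

```python
# Toy stand-ins for MeSH data, built from the slide's glucose example.
ENTRY_TERMS = {"glucose": "glucose", "dextrose": "glucose"}
BROADER = {"glucose": "monosaccharide"}


def heading_for(term):
    """Normalize a synonym ("entry term") to its heading, if known."""
    return ENTRY_TERMS.get(term.lower())


def index_terms(term):
    """Return the heading for a term followed by its hypernym headings."""
    heading = heading_for(term)
    terms = []
    while heading is not None:
        terms.append(heading)
        heading = BROADER.get(heading)
    return terms


print(heading_for("Dextrose"))
print(index_terms("dextrose"))
```

Expanding a query this way lets a search for dextrose match an article indexed under the glucose heading or under its broader class.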

Entity-focused lexical databases

• General-domain, derived from Wikipedia
  • DBpedia http://wiki.dbpedia.org/
  • Freebase: now discontinued
• Google Knowledge Graph and other proprietary databases (Bing, Facebook, etc.)
  • Lots of relations/attributes, aimed for consumer internet use
  • Can be used directly to answer queries
  • Internally, they entity-link document texts against them

Word Similarity

• Synonymy: a binary relation
  • Two words are either synonymous or not
• Similarity (or distance): a looser metric
  • Two words are more similar if they share more features of meaning
• Similarity is properly a relation between senses
  • The word “bank” is not similar to the word “slope”
  • Bank1 is similar to fund3
  • Bank2 is similar to slope5
• But we’ll compute similarity over both words and senses

Why word similarity

• A practical component in lots of NLP tasks
  • Question answering
  • Natural language generation
  • Automatic essay grading
  • Plagiarism detection
• A theoretical component in many linguistic and cognitive tasks
  • Historical semantics
  • Models of human word learning
  • Morphology and grammar induction

Word similarity and word relatedness

• We often distinguish word similarity from word relatedness
  • Similar words: near-synonyms
  • Related words: can be related any way
    • car, bicycle: similar
    • car, gasoline: related, not similar

Two classes of similarity algorithms

• Thesaurus-based algorithms
  • Are words “nearby” in hypernym hierarchy?
  • Do words have similar glosses (definitions)?
• Distributional algorithms
  • Do words have similar distributional contexts?
  • Distributional (Vector) semantics on Thursday!

Path based similarity

• Two concepts (senses/synsets) are similar if they are near each other in the thesaurus hierarchy
  • = have a short path between them
  • concepts have path 1 to themselves
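A minimal sketch of this idea follows. It uses the J&M textbook convention, which the slide itself does not spell out: pathlen is 1 plus the number of edges on the shortest path, and similarity is 1 / pathlen. The hierarchy is a hand-made toy (an assumption, not WordNet), and `pathlen` and `sim_path` are hypothetical names.

```python
from collections import deque

# Toy hypernym tree (child -> parent); illustrative, not WordNet data.
HYPERNYM = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "chair": "furniture",
    "furniture": "artifact",
}


def pathlen(c1, c2):
    """1 + number of edges on the shortest path, so pathlen(c, c) == 1."""
    # Treat hypernym links as undirected edges for path finding.
    neighbors = {}
    for child, parent in HYPERNYM.items():
        neighbors.setdefault(child, set()).add(parent)
        neighbors.setdefault(parent, set()).add(child)
    queue = deque([(c1, 1)])
    visited = {c1}
    while queue:
        node, length = queue.popleft()
        if node == c2:
            return length
        for nxt in neighbors.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, length + 1))
    raise ValueError("no path between %r and %r" % (c1, c2))


def sim_path(c1, c2):
    """Path-based similarity: shorter paths give values closer to 1."""
    return 1.0 / pathlen(c1, c2)


print(sim_path("car", "car"))
print(sim_path("car", "bicycle"))
print(sim_path("car", "chair"))
```

Because every edge counts the same, this measure treats a step near the abstract top of the hierarchy like a step between close siblings, which is one reason to regard it as a baseline.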

Evaluating similarity

• Extrinsic (task-based, end-to-end) Evaluation:
  • Question Answering
  • Spell Checking
  • Essay grading
• Intrinsic Evaluation:
  • Correlation between algorithm and human word similarity ratings
    • Wordsim353: 353 noun pairs rated 0-10. sim(plane,car)=5.77
  • Taking TOEFL multiple-choice vocabulary tests
    • Levied is closest in meaning to: imposed, believed, requested, correlated
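Both intrinsic evaluations can be sketched in a few lines. Spearman rank correlation is used here as one common choice of correlation (the slide does not say which one), implemented by hand to stay self-contained. All numbers are made up for illustration except the plane/car rating of 5.77, and `TOY_SIM` is a hypothetical stand-in for a real similarity algorithm.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def answer_toefl(target, options, sim):
    """Pick the option the similarity function scores highest."""
    return max(options, key=lambda option: sim(target, option))


# Human ratings for four word pairs; only 5.77 comes from the slide.
human = [5.77, 8.0, 1.5, 6.5]
# Invented scores from a hypothetical algorithm for the same pairs.
system = [0.40, 0.90, 0.10, 0.35]
print(spearman(human, system))

# Invented similarity scores for the TOEFL item on the slide.
TOY_SIM = {"imposed": 0.8, "believed": 0.2, "requested": 0.4, "correlated": 0.1}
options = ["imposed", "believed", "requested", "correlated"]
print(answer_toefl("levied", options, lambda target, option: TOY_SIM[option]))
```

A rank correlation only asks whether the algorithm orders the pairs the way people do, so the algorithm's scores do not need to be on the 0-10 scale of the human ratings.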