Lecture 22 Word Similarity


Transcript of Lecture 22 Word Similarity

Page 1: Lecture 22 Word Similarity

Lecture 22: Word Similarity

Topics: word similarity; thesaurus-based word similarity; introduction to distribution-based word similarity

Readings: NLTK book Chapter 2 (wordnet); Text Chapter 20

April 8, 2013

CSCE 771 Natural Language Processing

Page 2: Lecture 22 Word Similarity

– 2 – CSCE 771 Spring 2013

Overview

Last Time (Programming): features in NLTK; NL queries to SQL; NLTK support for interpretations and models; propositional and predicate logic support; Prover9

Today: last lecture's slides 25-29; features in NLTK; computational lexical semantics

Readings: Text Chapters 19, 20; NLTK Book Chapter 10

Next Time: Computational Lexical Semantics II

Page 3: Lecture 22 Word Similarity

Figure 20.1 Possible sense tags for bass

Chapter 20 – Word Sense Disambiguation (WSD)

Machine translation

Supervised vs. unsupervised learning

Semantic concordance – a corpus with words tagged with sense tags

Page 4: Lecture 22 Word Similarity

Feature Extraction for WSD

Feature vectors

Collocation

[w(i-2), POS(i-2), w(i-1), POS(i-1), w(i), POS(i), w(i+1), POS(i+1), w(i+2), POS(i+2)]

Bag-of-words – unordered set of neighboring words

Represent sets of the most frequent content words with a membership vector

[0,0,1,0,0,0,1] – set containing the 3rd and 7th most frequent content words

Window of nearby words/features
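As a sketch, the collocational feature vector above can be extracted from a POS-tagged sentence in a few lines of Python; the tagged sentence and the `<pad>` token below are invented for illustration:

```python
def collocation_features(tagged, i, window=2):
    """Collocational features for the target word at position i:
    [w(i-2), POS(i-2), ..., w(i+2), POS(i+2)], padded at sentence edges."""
    feats = []
    for j in range(i - window, i + window + 1):
        if 0 <= j < len(tagged):
            word, pos = tagged[j]
        else:
            word, pos = "<pad>", "<pad>"   # padding outside the sentence
        feats.extend([word, pos])
    return feats

# toy tagged fragment with "bass" as the target word (index 2)
tagged = [("guitar", "NN"), ("and", "CC"), ("bass", "NN"),
          ("player", "NN"), ("stand", "VB")]
print(collocation_features(tagged, 2))
# ['guitar', 'NN', 'and', 'CC', 'bass', 'NN', 'player', 'NN', 'stand', 'VB']
```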

Page 5: Lecture 22 Word Similarity

Naïve Bayes Classifier

w – word vector

s – sense tag vector

f – feature vector [w(i), POS(i)] for i = 1, …, n

Approximate by frequency counts

But how practical?

ŝ = argmax_{s ∈ S} P(s|f)

Page 6: Lecture 22 Word Similarity

Looking for a Practical Formula

ŝ = argmax_{s ∈ S} P(s|f) = argmax_{s ∈ S} P(f|s) P(s) / P(f)

Still not practical.

Page 7: Lecture 22 Word Similarity

Naïve == Assume Independence

P(f|s) ≈ ∏_{j=1}^{n} P(f_j|s)

Now practical, but realistic?

ŝ = argmax_{s ∈ S} P(s) ∏_{j=1}^{n} P(f_j|s)

Page 8: Lecture 22 Word Similarity

Training = Count Frequencies

Maximum likelihood estimator (20.8):

P(f_j|s) = count(f_j, s) / count(s)

P(s_i) = count(s_i, w_j) / count(w_j)
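A minimal sketch of training and decoding with these MLE counts (the senses and features below are invented; add-alpha smoothing is added so an unseen feature does not zero out a sense, which the slide's pure MLE would):

```python
from collections import Counter, defaultdict
import math

def train_nb(instances):
    """MLE counts: P(s) = count(s)/N, P(f|s) = count(f,s)/count(s)."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)
    for feats, sense in instances:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
    return sense_counts, feat_counts

def classify_nb(feats, sense_counts, feat_counts, alpha=1.0):
    """argmax_s log P(s) + sum_j log P(f_j|s), add-alpha smoothed."""
    vocab = {f for c in feat_counts.values() for f in c}
    n = sum(sense_counts.values())
    best, best_score = None, float("-inf")
    for s, cs in sense_counts.items():
        score = math.log(cs / n)
        for f in feats:
            score += math.log((feat_counts[s][f] + alpha) /
                              (cs + alpha * len(vocab)))
        if score > best_score:
            best, best_score = s, score
    return best

# toy training instances: (context features, sense of "bass")
data = [(["fish", "river"], "bass1"), (["fish", "catch"], "bass1"),
        (["guitar", "play"], "bass2"), (["music", "play"], "bass2")]
sc, fc = train_nb(data)
print(classify_nb(["fish", "river"], sc, fc))   # bass1
```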

Page 9: Lecture 22 Word Similarity

Decision List Classifiers

Naïve Bayes: hard for humans to examine its decisions and understand them

Decision list classifiers – like a "case" statement:

a sequence of (test, returned-sense-tag) pairs
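A decision list can be sketched as an ordered list of rules tried in turn (the rules and sense labels below are invented; in Yarowsky-style decision lists the rules are ordered by log-likelihood ratio, with the most frequent sense as the final default):

```python
# Ordered (test, sense) rules; classification returns the sense of the
# first rule whose test matches, like a "case" statement with a default.
rules = [
    (lambda ctx: "fish" in ctx, "bass-the-fish"),
    (lambda ctx: "play" in ctx, "bass-the-instrument"),
    (lambda ctx: True, "bass-the-fish"),   # default: most frequent sense
]

def classify(context_words, rules):
    for test, sense in rules:
        if test(context_words):
            return sense

print(classify({"catch", "fish"}, rules))    # bass-the-fish
print(classify({"play", "guitar"}, rules))   # bass-the-instrument
```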

Page 10: Lecture 22 Word Similarity

Figure 20.2 Decision List Classifier Rules

Page 11: Lecture 22 Word Similarity

WSD Evaluation, Baselines, Ceilings

Extrinsic evaluation – evaluating NLP embedded in end-to-end applications (in vivo)

Intrinsic evaluation – evaluating WSD by itself (in vitro)

Sense accuracy

Corpora – SemCor, SENSEVAL, SemEval

Baseline – most frequent sense (WordNet sense 1)

Ceiling – gold standard – human experts with discussion and agreement

Page 12: Lecture 22 Word Similarity

Similarity of Words or Senses

Generally we will say "words" but give similarity of word senses

Similarity vs. relatedness

Similarity of words

Similarity of phrases/sentences (not usually done)

Page 13: Lecture 22 Word Similarity

Figure 20.3 Simplified Lesk Algorithm

gloss/sentence overlap

Page 14: Lecture 22 Word Similarity

Simplified Lesk Example

The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable rate mortgage securities.
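A sketch of how simplified Lesk resolves this example: score each sense by gloss/sentence overlap, ignoring stopwords. The glosses below paraphrase WordNet's two senses of bank, and the stopword list is ad hoc:

```python
STOP = {"the", "can", "will", "it", "in", "a", "of", "that", "and", "into"}

def simplified_lesk(senses, context):
    """Pick the sense whose gloss shares the most non-stopword tokens
    with the sentence containing the target word."""
    ctx = set(context.lower().split()) - STOP
    best, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(ctx & (set(gloss.lower().split()) - STOP))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

# paraphrased WordNet-style glosses for two senses of "bank"
senses = {
    "bank1": "a financial institution that accepts deposits and channels "
             "the money into lending activities",
    "bank2": "sloping land especially the slope beside a body of water",
}
sentence = ("The bank can guarantee deposits will eventually cover future "
            "tuition costs because it invests in adjustable rate mortgage "
            "securities")
print(simplified_lesk(senses, sentence))   # bank1 (overlap: "deposits")
```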

Page 15: Lecture 22 Word Similarity

Corpus Lesk

Using equal weights on overlap words just does not seem right

weights applied to overlap words

inverse document frequency (idf)

idf_i = log(N_docs / number of docs containing w_i)
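The idf weight is straightforward to compute; a toy sketch (documents here are just sets of words):

```python
import math

def idf(term, docs):
    """idf_i = log(N_docs / number of docs containing w_i)."""
    n_containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_containing)

docs = [{"bank", "deposit", "money"},
        {"bank", "river", "water"},
        {"bank", "loan", "rate"},
        {"fish", "river"}]
print(idf("bank", docs))   # log(4/3) ~ 0.29: common word, low weight
print(idf("loan", docs))   # log(4/1) ~ 1.39: rare word, high weight
```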

Page 16: Lecture 22 Word Similarity

SENSEVAL competitions

http://www.senseval.org/

Check the Senseval-3 website.

Page 17: Lecture 22 Word Similarity

SemEval-2 – Evaluation Exercises on Semantic Evaluation – ACL SIGLEX event

Page 18: Lecture 22 Word Similarity

Task Name – Area

#1 Coreference Resolution in Multiple Languages – Coref
#2 Cross-Lingual Lexical Substitution – Cross-Lingual, Lexical Substitution
#3 Cross-Lingual Word Sense Disambiguation – Cross-Lingual, Word Senses
#4 VP Ellipsis – Detection and Resolution – Ellipsis
#5 Automatic Keyphrase Extraction from Scientific Articles
#6 Classification of Semantic Relations between MeSH Entities in Swedish Medical Texts
#7 Argument Selection and Coercion – Metonymy
#8 Multi-Way Classification of Semantic Relations Between Pairs of Nominals
#9 Noun Compound Interpretation Using Paraphrasing Verbs – Noun compounds
#10 Linking Events and their Participants in Discourse – Semantic Role Labeling, Information Extraction
#11 Event Detection in Chinese News Sentences – Semantic Role Labeling, Word Senses
#12 Parser Training and Evaluation using Textual Entailment
#13 TempEval 2 – Time Expressions
#14 Word Sense Induction
#15 Infrequent Sense Identification for Mandarin Text to Speech Systems
#16 Japanese WSD – Word Senses
#17 All-words Word Sense Disambiguation on a Specific Domain (WSD-domain)
#18 Disambiguating Sentiment Ambiguous Adjectives – Word Senses, Sentiment

Page 19: Lecture 22 Word Similarity

20.4.2 Selectional Restrictions and Preferences

• verb eat: theme = object has feature Food+

• Katz and Fodor (1963) used this idea to rule out senses that were not consistent

• WSD of dish:

(20.12) "In our house, everybody has a career and none of them includes washing dishes," he says.

(20.13) In her tiny kitchen, Ms. Chen works efficiently, stir-frying several simple dishes, including …

• Verbs wash, stir-fry: wash → washable+; stir-fry → edible+

Page 20: Lecture 22 Word Similarity

Resnik's Model of Selectional Association

How much does a predicate tell you about the semantic class of its arguments?

• eat – tells you a lot

• was, is, to be … – tells you little

• the selectional preference strength of a verb is indicated by two distributions:

1. P(c): how likely the direct object is to be in class c

2. P(c|v): the distribution of expected semantic classes for the particular verb v

• the greater the difference between these distributions, the more information the verb provides

Page 21: Lecture 22 Word Similarity

Relative Entropy – Kullback-Leibler Divergence

Given two distributions P and Q:

D(P || Q) = Σ_x P(x) log (P(x)/Q(x))    (eq 20.16)

Selectional preference strength:

S_R(v) = D(P(c|v) || P(c))
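The divergence and the resulting preference strength can be computed directly from the two distributions; the class inventory and probabilities below are invented (base-2 logs used here) to illustrate the contrast between a selective verb and an uninformative one:

```python
import math

def kl_divergence(p, q):
    """D(P || Q) = sum_x P(x) log2(P(x)/Q(x)); terms with P(x)=0 contribute 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

# toy class distributions over {food, person, artifact}
p_c     = [0.4, 0.3, 0.3]      # P(c): prior over direct-object classes
p_c_eat = [0.9, 0.05, 0.05]    # P(c|eat): strongly prefers food
p_c_be  = [0.4, 0.3, 0.3]      # P(c|be): uninformative, matches the prior

print(kl_divergence(p_c_eat, p_c))   # large: "eat" is highly selective
print(kl_divergence(p_c_be, p_c))    # 0.0: "be" tells us nothing
```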

Page 22: Lecture 22 Word Similarity

Resnik's Model of Selectional Association

Page 23: Lecture 22 Word Similarity

High and Low Selectional Associations – Resnik 1996

Page 24: Lecture 22 Word Similarity

20.5 Minimally Supervised WSD: Bootstrapping

"supervised and dictionary methods require large hand-built resources"

bootstrapping (semi-supervised or minimally supervised learning) addresses the no-data problem

Start with a seed set and grow it.

Page 25: Lecture 22 Word Similarity

Yarowsky Algorithm Preliminaries

Idea of bootstrapping: "create a larger training set from a small set of seeds"

Heuristics: senses of "bass"

1. one sense per collocation – in a given sentence, both senses of bass are not used

2. one sense per discourse – Yarowsky showed that in 37,232 examples of bass occurring in a discourse there was only one sense per discourse

Page 26: Lecture 22 Word Similarity

Yarowsky Algorithm

Goal: learn a word-sense classifier for a word

Input: Λ0, a small seed set of labeled instances of each sense

1. train a classifier on the seed set Λ0
2. label the unlabeled corpus V0 with the classifier
3. select the examples Δ in V that you are "most confident in"
4. Λ1 = Λ0 + Δ
5. repeat
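The loop above can be sketched in miniature. Everything here — the sentences, the seeds, the cue-word "classifier", and the vote-margin confidence test — is invented to show the shape of the algorithm, not Yarowsky's actual decision-list classifier:

```python
def bootstrap(seeds, unlabeled, threshold=1):
    """Grow a labeled set by repeatedly 'training' on it (here: recording
    which cue words occur with each sense) and absorbing examples whose
    vote margin clears the confidence threshold."""
    labeled = dict(seeds)                       # sentence -> sense
    remaining = list(unlabeled)
    while remaining:
        cues = {}                               # cue word -> sense (first seen wins)
        for sent, sense in labeled.items():
            for w in sent.split():
                cues.setdefault(w, sense)
        progress = False
        for sent in list(remaining):
            votes = [cues[w] for w in sent.split() if w in cues]
            for sense in set(votes):
                margin = votes.count(sense) - (len(votes) - votes.count(sense))
                if margin >= threshold:         # a "most confident" example
                    labeled[sent] = sense
                    remaining.remove(sent)
                    progress = True
                    break
        if not progress:
            break                               # nothing confident left to add
    return labeled

seeds = {"caught a bass fish": "bass1", "play bass guitar": "bass2"}
unlabeled = ["fish in the river", "guitar amp for sale", "river bass fishing"]
result = bootstrap(seeds, unlabeled)
print(result["fish in the river"])   # bass1
```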

Page 27: Lecture 22 Word Similarity

Figure 20.4 Two senses of plant

plant 1 – manufacturing plant …

plant 2 – flora, plant life

Page 28: Lecture 22 Word Similarity

2009 Survey of WSD by Navigli

iroma1.it/~navigli/pubs/ACM_Survey_2009_Navigli.pdf

Page 29: Lecture 22 Word Similarity

Figure 20.5 Samples of bass sentences from the WSJ (Wall Street Journal)

Page 30: Lecture 22 Word Similarity

Word Similarity: Thesaurus-Based Methods

Figure 20.6 Path Distances in a hierarchy

WordNet of course (pruned)

Page 31: Lecture 22 Word Similarity

Figure 20.6 Path-Based Similarity

sim_path(c1, c2) = 1 / pathlen(c1, c2), where pathlen counts edges + 1
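A toy implementation of path similarity over a hand-made hypernym fragment (the mini-hierarchy is invented; real systems walk WordNet, as the next slides do):

```python
# toy fragment of a hypernym hierarchy (child -> parent)
parent = {
    "right_whale": "baleen_whale", "minke_whale": "baleen_whale",
    "baleen_whale": "whale", "orca": "toothed_whale",
    "toothed_whale": "whale", "whale": "cetacean",
}

def pathlen(c1, c2):
    """Edges from c1 up to the lowest common ancestor and down to c2, plus 1."""
    def depths(c):
        d, out = 0, {}
        while c is not None:
            out[c] = d
            c, d = parent.get(c), d + 1
        return out
    d1, d2 = depths(c1), depths(c2)
    return 1 + min(d1[c] + d2[c] for c in d1 if c in d2)

def sim_path(c1, c2):
    return 1.0 / pathlen(c1, c2)

print(sim_path("right_whale", "minke_whale"))  # 2 edges apart -> 1/3
print(sim_path("right_whale", "orca"))         # 4 edges apart -> 1/5
```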

Page 32: Lecture 22 Word Similarity

WN hierarchy

# WordNet examples from the NLTK book
import nltk
from nltk.corpus import wordnet as wn

right = wn.synset('right_whale.n.01')
orca = wn.synset('orca.n.01')
minke = wn.synset('minke_whale.n.01')
tortoise = wn.synset('tortoise.n.01')
novel = wn.synset('novel.n.01')

print "LCS(right, minke)=", right.lowest_common_hypernyms(minke)
print "LCS(right, orca)=", right.lowest_common_hypernyms(orca)
print "LCS(right, tortoise)=", right.lowest_common_hypernyms(tortoise)
print "LCS(right, novel)=", right.lowest_common_hypernyms(novel)

Page 33: Lecture 22 Word Similarity

# path similarity
print "Path similarities"
print right.path_similarity(minke)
print right.path_similarity(orca)
print right.path_similarity(tortoise)
print right.path_similarity(novel)

Output:

Path similarities
0.25
0.166666666667
0.0769230769231
0.0434782608696

Page 34: Lecture 22 Word Similarity

Wordnet in NLTK

http://nltk.org/_modules/nltk/corpus/reader/wordnet.html

http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html (partially in Chapter 2 of the NLTK book, but a different version)

http://grey.colorado.edu/mingus/index.php/Objrec_Wordnet.py code for similarity – runs for a while; lots of results

Page 35: Lecture 22 Word Similarity

https://groups.google.com/forum

beautiful 3/4/10

Hi, I was wondering if it is possible for me to use NLTK + wordnet to group (nouns) words together via similar meanings?

Assuming I have 2000 words or topics. Is it possible for me to group them together according to similar meanings using NLTK?

So that at the end of the day I would have different groups of words that are similar in meaning? Can that be done in NLTK? And possibly be able to detect salient patterns emerging? (trend in topics etc...)

Is there a further need for a word classifier based on the CMU BOW toolkit to classify words to get it into categories? Or would the above groups be good enough? Is there a need to classify words further?

How would one classify words in NLTK effectively? Really hope you can enlighten me? FM

Page 36: Lecture 22 Word Similarity

Response from Steven Bird

Steven Bird 3/7/10

2010/3/5 Republic <[email protected]>:

> Assuming I have 2000 words or topics. Is it possible for me to group
> them together according to similar meanings using NLTK?

You could compute WordNet similarity (pairwise), so that each word/topic is represented as a vector of distances, which could then be discretized, so each vector would have a form like this: [0,2,3,1,0,0,2,1,3,...]. These vectors could then be clustered using one of the methods in the NLTK cluster package.

> So that at the end of the day I would have different groups of words
> that are similar in meaning? Can that be done in NLTK? and possibly be
> able to detect salient patterns emerging? (trend in topics etc...).

This suggests a temporal dimension, which might mean recomputing the clusters as more words or topics come in.

It might help to read the NLTK book sections on WordNet and on text classification, and also some of the other cited material. -Steven Bird

Page 37: Lecture 22 Word Similarity

More general? Stack Overflow

import nltk
from nltk.corpus import wordnet as wn

waiter = wn.synset('waiter.n.01')
employee = wn.synset('employee.n.01')

all_hyponyms_of_waiter = list(set([w.replace("_", " ")
    for s in waiter.closure(lambda s: s.hyponyms())
    for w in s.lemma_names]))

all_hyponyms_of_employee = …

if 'waiter' in all_hyponyms_of_employee:
    print 'employee more general than waiter'
elif 'employee' in all_hyponyms_of_waiter:
    print 'waiter more general than employee'
else:
    …

http://stackoverflow.com/questions/...-semantic-hierarchies-relations-in--nltk

Page 38: Lecture 22 Word Similarity

help(wn)

…

| res_similarity(self, synset1, synset2, ic, verbose=False)
|     Resnik Similarity:
|     Return a score denoting how similar two word senses are, based on the
|     Information Content (IC) of the Least Common Subsumer (most specific
|     ancestor node).

http://grey.colorado.edu/mingus/index.php/Objrec_Wordnet.py

Page 39: Lecture 22 Word Similarity

Similarity based on a hierarchy (= ontology)

Page 40: Lecture 22 Word Similarity

Information Content word similarity

Page 41: Lecture 22 Word Similarity

Resnik Similarity / WordNet

sim_resnik(c1, c2) = -log P(LCS(c1, c2))

wordnet:

| res_similarity(self, synset1, synset2, ic, verbose=False)
|     Resnik Similarity:
|     Return a score denoting how similar two word senses are, based on the
|     Information Content (IC) of the Least Common Subsumer (most specific
|     ancestor node).

Page 42: Lecture 22 Word Similarity

Fig 20.7 Wordnet with Lin P(c) values

Change for Resnik!

Page 43: Lecture 22 Word Similarity

Lin variation (1998)

• Commonality –

• Difference –

• IC(description(A,B)) – IC(common(A,B))

• sim_Lin(A,B) = Common(A,B) / description(A,B)
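Both Resnik and Lin similarity can be computed by hand once P(c) is known. The probabilities below follow the style of Fig 20.7 (treat them as illustrative numbers), and the LCS is supplied directly rather than computed from the hierarchy:

```python
import math

# illustrative P(c) values on a small fragment of the hierarchy
p = {"entity": 0.395, "geological-formation": 0.00176,
     "hill": 0.0000189, "coast": 0.0000216}
lcs = {("hill", "coast"): "geological-formation"}   # assumed LCS

def ic(c):
    return -math.log(p[c])                  # information content

def sim_resnik(c1, c2):
    return ic(lcs[(c1, c2)])                # -log P(LCS(c1, c2))

def sim_lin(c1, c2):
    # common information relative to the total description of both concepts
    return 2 * ic(lcs[(c1, c2)]) / (ic(c1) + ic(c2))

print(sim_resnik("hill", "coast"))
print(sim_lin("hill", "coast"))   # ~0.59
```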

Page 44: Lecture 22 Word Similarity

Fig 20.7 Wordnet with Lin P(c) values

Page 45: Lecture 22 Word Similarity

Extended Lesk

based on:

1. glosses
2. glosses of hypernyms and hyponyms

Example:

• drawing paper: paper that is specially prepared for use in drafting

• decal: the art of transferring designs from specially prepared paper to a wood, glass or metal surface.

• Lesk score = sum of squares of the lengths of common phrases

• Example: 1 + 2^2 = 5 (the overlaps are "paper" and "specially prepared")
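A sketch of the scoring function: greedily remove the longest common word sequence and add the square of its length (stopword filtering, which a full implementation would apply, is omitted here):

```python
def lesk_overlap_score(gloss1, gloss2):
    """Extended-Lesk gloss overlap: repeatedly remove the longest common
    word sequence and add the square of its length."""
    a, b = gloss1.lower().split(), gloss2.lower().split()
    score = 0
    while True:
        best = None                           # (length, start_a, start_b)
        for n in range(min(len(a), len(b)), 0, -1):
            for i in range(len(a) - n + 1):
                if "#" in a[i:i+n]:
                    continue                  # words already consumed
                for j in range(len(b) - n + 1):
                    if a[i:i+n] == b[j:j+n] and "#" not in b[j:j+n]:
                        best = (n, i, j)
                        break
                if best:
                    break
            if best:
                break
        if best is None:
            return score
        n, i, j = best
        score += n * n
        a[i:i+n] = ["#"] * n                  # mark as used
        b[j:j+n] = ["#"] * n

drawing_paper = "paper that is specially prepared for use in drafting"
decal = ("the art of transferring designs from specially prepared paper "
         "to a wood glass or metal surface")
print(lesk_overlap_score(drawing_paper, decal))   # 2^2 + 1^2 = 5
```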

Page 46: Lecture 22 Word Similarity

Figure 20.8 Summary of Thesaurus Similarity measures

Page 47: Lecture 22 Word Similarity

Wordnet similarity functions

path_similarity()?
lch_similarity()?
wup_similarity()?
res_similarity()?
jcn_similarity()?
lin_similarity()?

Page 48: Lecture 22 Word Similarity

Problems with Thesaurus-Based Methods

don't always have a thesaurus

Even so, problems with recall: missing words; missing phrases

thesauri work less well for verbs and adjectives – less hyponymy structure

Distributional Word Similarity D. Jurafsky

Page 49: Lecture 22 Word Similarity

Distributional Models of Meaning

vector-space models of meaning

offer higher recall than hand-built thesauri, with probably less precision

Page 50: Lecture 22 Word Similarity

Word Similarity: Distributional Methods

20.31 tezguino example

• A bottle of tezguino is on the table.

• Everybody likes tezguino.

• Tezguino makes you drunk.

• We make tezguino out of corn.

• What do you know about tezguino?

Page 51: Lecture 22 Word Similarity

Term-Document Matrix

Collection of documents

Identify a collection of important, discriminatory terms (words)

Matrix: terms × documents – term frequency tf_{w,d}

Each document is a vector in Z^V (Z = integers; N = natural numbers would be more accurate but perhaps misleading)

Example

Page 52: Lecture 22 Word Similarity

Example Term-Document Matrix

Subset of terms = {battle, soldier, fool, clown}

          As You Like It   12th Night   Julius Caesar   Henry V
battle           1              1              8           15
soldier          2              2             12           36
fool            37             58              1            5
clown            6            117              0            0
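With these row vectors, word similarity is just vector similarity; a common choice is the cosine. A sketch using the counts above:

```python
import math

# document vectors for each term (counts from the table above)
terms = {
    "battle":  [1, 1, 8, 15],
    "soldier": [2, 2, 12, 36],
    "fool":    [37, 58, 1, 5],
    "clown":   [6, 117, 0, 0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(terms["fool"], terms["clown"]))    # ~0.87: similar distributions
print(cosine(terms["fool"], terms["soldier"]))  # ~0.15: dissimilar
```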

Page 53: Lecture 22 Word Similarity

Figure 20.9 Term-in-context matrix for word similarity: a window of 20 words – 10 before, 10 after – from the Brown corpus

Page 54: Lecture 22 Word Similarity

Pointwise Mutual Information

• tf-idf (inverse document frequency) weighting instead of raw counts – the idf intuition again

• pointwise mutual information (PMI): do events x and y co-occur more than if they were independent?

PMI(X,Y) = log2 [ P(X,Y) / (P(X)P(Y)) ]

• PMI between words

• Positive PMI between two words (PPMI)

Page 55: Lecture 22 Word Similarity

Computing PPMI

Matrix with W (words) rows and C (contexts) columns

f_ij is the frequency of w_i in c_j
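A sketch of the computation from raw counts (the 2×2 count matrix is invented; zero counts get PPMI 0, as does any negative PMI):

```python
import math

def ppmi(f):
    """Positive PMI from a W x C count matrix f:
    pmi_ij = log2( P(w_i, c_j) / (P(w_i) P(c_j)) ), clipped at 0."""
    total = sum(sum(row) for row in f)
    p_w = [sum(row) / total for row in f]
    p_c = [sum(f[i][j] for i in range(len(f))) / total
           for j in range(len(f[0]))]
    out = []
    for i, row in enumerate(f):
        out_row = []
        for j, count in enumerate(row):
            p_ij = count / total
            if p_ij > 0:
                out_row.append(max(0.0, math.log2(p_ij / (p_w[i] * p_c[j]))))
            else:
                out_row.append(0.0)
        out.append(out_row)
    return out

counts = [[2, 0],
          [1, 1]]
print(ppmi(counts))
```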

Page 56: Lecture 22 Word Similarity

Example computing PPMI

Page 57: Lecture 22 Word Similarity

Figure 20.10

Page 58: Lecture 22 Word Similarity

Figure 20.11

Page 59: Lecture 22 Word Similarity

Figure 20.12

Page 60: Lecture 22 Word Similarity

Figure 20.13

Page 61: Lecture 22 Word Similarity

Figure 20.14

Page 62: Lecture 22 Word Similarity

Figure 20.15

Page 63: Lecture 22 Word Similarity

Figure 20.16

Page 64: Lecture 22 Word Similarity

http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf – how to do it in NLTK

NLTK 3.0a1 released: February 2013. This version adds support for NLTK's graphical user interfaces.

http://nltk.org/nltk3-alpha/

Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words?

I want to use a function for word clustering and the Yarowsky algorithm to find similar collocations in a large text.

http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Linguistics

http://en.wikipedia.org/wiki/Portal:Linguistics

http://en.wikipedia.org/wiki/Yarowsky_algorithm

http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html