Lecture 22 Word Similarity
Description: transcript of Lecture 22, Word Similarity
Lecture 22: Word Similarity
Topics: word similarity; thesaurus-based word similarity; intro to distributional word similarity
Readings: NLTK book Chapter 2 (WordNet)
Text Chapter 20
April 8, 2013
CSCE 771 Natural Language Processing
– 2 – CSCE 771 Spring 2013
Overview
Last Time (Programming): Features in NLTK; NL queries to SQL; NLTK support for interpretations and models; propositional and predicate logic support; Prover9
Today: last lecture's slides 25-29; Features in NLTK; Computational Lexical Semantics
Readings: Text Chapters 19-20; NLTK Book Chapter 10
Next Time: Computational Lexical Semantics II
– 3 – CSCE 771 Spring 2013
Chapter 20 – Word Sense Disambiguation (WSD)
Figure 20.1 Possible sense tags for bass
Machine translation
Supervised vs. unsupervised learning
Semantic concordance – a corpus with words tagged with sense tags
– 4 – CSCE 771 Spring 2013
Feature Extraction for WSD
Feature vectors
Collocation
[w_{i-2}, POS_{i-2}, w_{i-1}, POS_{i-1}, w_i, POS_i, w_{i+1}, POS_{i+1}, w_{i+2}, POS_{i+2}]
Bag-of-words – unordered set of neighboring words
Represent sets of most frequent content words with membership vector
[0,0,1,0,0,0,1] – set containing the 3rd and 7th most frequent content words
Window of nearby words/features
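The two feature types above can be sketched in a few lines of Python (Python 3 here; the sentence, POS tags, and content-word vocabulary are invented for illustration, not taken from a real tagger or corpus):

```python
# Sketch of the two WSD feature types. The sentence, POS tags, and
# content-word vocabulary below are invented for illustration.

def collocational_features(tokens, tags, i):
    """[w_{i-2}, POS_{i-2}, ..., w_{i+2}, POS_{i+2}] around target position i."""
    feats = []
    for offset in (-2, -1, 0, 1, 2):
        j = i + offset
        if 0 <= j < len(tokens):
            feats.extend([tokens[j], tags[j]])
        else:
            feats.extend(["<pad>", "<pad>"])  # fell off the sentence edge
    return feats

def bow_membership(tokens, vocab):
    """Membership vector over a fixed list of frequent content words."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocab]

tokens = ["an", "electric", "guitar", "and", "bass", "player", "stand", "off"]
tags   = ["DT", "JJ", "NN", "CC", "NN", "NN", "VB", "RP"]
vocab  = ["fishing", "big", "guitar", "sound", "fly", "rod", "player"]

print(collocational_features(tokens, tags, 4))  # features around "bass"
print(bow_membership(tokens, vocab))            # [0,0,1,0,0,0,1]: 3rd and 7th words present
```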
– 5 – CSCE 771 Spring 2013
Naïve Bayes Classifier
w – word vector
s – sense tag vector
f – feature vector [w_i, POS_i] for i = 1, …, n
Approximate by frequency counts
But how practical?
ŝ = argmax_{s∈S} P(s|f)
– 6 – CSCE 771 Spring 2013
Looking for a practical formula
Still not practical:
ŝ = argmax_{s∈S} P(s|f) = argmax_{s∈S} P(f|s) P(s) / P(f)
– 7 – CSCE 771 Spring 2013
Naïve == Assume Independence
P(f|s) ≈ ∏_{j=1}^{n} P(f_j|s)
Now practical, but realistic?
ŝ = argmax_{s∈S} P(s) ∏_{j=1}^{n} P(f_j|s)
– 8 – CSCE 771 Spring 2013
Training = count frequencies
Maximum likelihood estimator (20.8):
P(f_j|s) = count(f_j, s) / count(s)
P(s_i) = count(s_i, w_j) / count(w_j)
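As a toy end-to-end illustration of training by counting (the mini-corpus for "bass", the vocabulary size, and the smoothing constant are all invented; a real system would use far more features and a real vocabulary):

```python
# Toy Naive Bayes WSD trained by counting. The mini-corpus, vocabulary
# size, and smoothing constant are invented for illustration.
import math
from collections import Counter

data = [  # (sense, context features) pairs for the word "bass"
    ("fish",  ["fishing", "river", "caught"]),
    ("fish",  ["river", "boat", "caught"]),
    ("music", ["guitar", "play", "band"]),
    ("music", ["play", "band", "tenor"]),
]

sense_counts = Counter(s for s, _ in data)
feat_counts = Counter((s, f) for s, feats in data for f in feats)
total_examples = len(data)

def p_sense(s):
    return sense_counts[s] / total_examples  # P(s) = count(s) / N

def p_feat_given_sense(f, s, alpha=1.0, vocab_size=10):
    # P(f|s) = count(f, s) / count(s), with add-alpha smoothing so an
    # unseen feature does not zero out the whole product
    tokens_in_s = sum(n for (s2, _), n in feat_counts.items() if s2 == s)
    return (feat_counts[(s, f)] + alpha) / (tokens_in_s + alpha * vocab_size)

def classify(feats):
    # argmax over senses of log P(s) + sum_j log P(f_j|s)
    return max(sense_counts, key=lambda s: math.log(p_sense(s))
               + sum(math.log(p_feat_given_sense(f, s)) for f in feats))

print(classify(["river", "caught"]))  # fish
print(classify(["guitar", "band"]))   # music
```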
– 9 – CSCE 771 Spring 2013
Decision List Classifiers
Naïve Bayes: hard for humans to examine its decisions and understand them
Decision list classifiers – like a "case" statement:
a sequence of (test, returned-sense-tag) pairs
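A decision list can be sketched as exactly that: an ordered list of (test, sense) pairs scanned until one fires (the rules and sense labels below are invented stand-ins for learned collocation tests):

```python
# A decision list as data: ordered (test, sense) rules; the first test
# that fires decides. Rules and sense labels are invented stand-ins.

rules = [
    (lambda ctx: "fish" in ctx,   "bass1 (fish)"),   # strongest cue first
    (lambda ctx: "guitar" in ctx, "bass2 (music)"),
    (lambda ctx: "play" in ctx,   "bass2 (music)"),
]

def decide(context_words, default="bass1 (fish)"):
    for test, sense in rules:
        if test(context_words):
            return sense
    return default  # fall through to the most frequent sense

print(decide({"striped", "fish"}))  # rule 1 fires
print(decide({"play", "loud"}))     # rule 3 fires
print(decide({"nothing", "here"}))  # no rule fires: default
```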
– 10 – CSCE 771 Spring 2013
Figure 20.2 Decision List Classifier Rules
– 11 – CSCE 771 Spring 2013
WSD Evaluation, baselines, ceilings
Extrinsic evaluation – evaluating embedded NLP in end-to-end applications (in vivo)
Intrinsic evaluation – evaluating WSD by itself (in vitro)
Sense accuracy
Corpora – SemCor, SENSEVAL, SemEval
Baseline – most frequent sense (WordNet sense 1)
Ceiling – gold standard – human experts with discussion and agreement
– 12 – CSCE 771 Spring 2013
Similarity of Words or Senses
Generally we will say "words" but give similarity of word senses
similarity vs. relatedness (e.g., car and bicycle are similar; car and gasoline are related)
Similarity of words
Similarity of phrases/sentences (not usually done)
– 13 – CSCE 771 Spring 2013
Figure 20.3 Simplified Lesk Algorithm
gloss/sentence overlap
– 14 – CSCE 771 Spring 2013
Simplified Lesk example
"The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities."
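The algorithm can be sketched as follows (the glosses below are shortened paraphrases for illustration, not the actual WordNet glosses, and the stop-word list is ad hoc):

```python
# Simplified Lesk, sketched without WordNet. The two glosses are
# shortened paraphrases of the "bank" senses, not real WordNet glosses,
# and the stop-word list is ad hoc.

STOP = {"a", "an", "the", "of", "in", "that", "or", "for", "to", "is",
        "and", "it", "will"}

glosses = {
    "bank1": "a financial institution that accepts deposits and channels "
             "the money into lending activities",
    "bank2": "sloping land beside a body of water",
}

def simplified_lesk(context_sentence):
    # pick the sense whose gloss shares the most content words with the context
    context = set(context_sentence.lower().split()) - STOP
    best, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(context & (set(gloss.split()) - STOP))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

sent = "The bank can guarantee deposits will eventually cover future tuition costs"
print(simplified_lesk(sent))  # bank1, via the overlap word "deposits"
```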
– 15 – CSCE 771 Spring 2013
Corpus Lesk
Using equal weights on words just does not seem right
weights applied to overlap words:
inverse document frequency
idf_i = log (N_docs / number of docs containing w_i)
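A minimal sketch of the idf weight on a toy three-document collection (the documents are invented):

```python
# idf on a toy three-document collection (documents invented).
import math

docs = [
    {"bank", "deposits", "money"},
    {"river", "bank", "water"},
    {"deposits", "interest", "rate"},
]
N_docs = len(docs)

def idf(word):
    nd = sum(1 for d in docs if word in d)  # number of docs containing the word
    return math.log(N_docs / nd)

print(idf("bank"))   # in 2 of 3 docs: log 1.5, a weak discriminator
print(idf("river"))  # in 1 of 3 docs: log 3, a stronger one
```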
– 16 – CSCE 771 Spring 2013
SENSEVAL competitions
http://www.senseval.org/
Check the Senseval-3 website.
– 17 – CSCE 771 Spring 2013
SemEval-2 – Evaluation Exercises on Semantic Evaluation – an ACL SIGLEX event
– 18 – CSCE 771 Spring 2013
Task Name – Area
#1 Coreference Resolution in Multiple Languages – Coref
#2 Cross-Lingual Lexical Substitution – Cross-Lingual, Lexical Substitution
#3 Cross-Lingual Word Sense Disambiguation – Cross-Lingual, Word Senses
#4 VP Ellipsis - Detection and Resolution – Ellipsis
#5 Automatic Keyphrase Extraction from Scientific Articles
#6 Classification of Semantic Relations between MeSH Entities in Swedish Medical Texts
#7 Argument Selection and Coercion – Metonymy
#8 Multi-Way Classification of Semantic Relations Between Pairs of Nominals
#9 Noun Compound Interpretation Using Paraphrasing Verbs – Noun compounds
#10 Linking Events and their Participants in Discourse – Semantic Role Labeling, Information Extraction
#11 Event Detection in Chinese News Sentences – Semantic Role Labeling, Word Senses
#12 Parser Training and Evaluation using Textual Entailment
#13 TempEval 2 – Time Expressions
#14 Word Sense Induction
#15 Infrequent Sense Identification for Mandarin Text to Speech Systems
#16 Japanese WSD – Word Senses
#17 All-words Word Sense Disambiguation on a Specific Domain (WSD-domain)
#18 Disambiguating Sentiment Ambiguous Adjectives – Word Senses, Sentiment
– 19 – CSCE 771 Spring 2013
20.4.2 Selectional Restrictions and Preferences
• verb eat: theme = object has feature Food+
• Katz and Fodor (1963) used this idea to rule out senses that were not consistent
• WSD of dish:
(20.12) "In our house, everybody has a career and none of them includes washing dishes," he says.
(20.13) In her tiny kitchen, Ms. Chen works efficiently, stir-frying several simple dishes, including …
• Verbs wash, stir-frying
• wash: washable+
• stir-frying: edible+
– 20 – CSCE 771 Spring 2013
Resnik's model of Selectional Association
How much does a predicate tell you about the semantic class of its arguments?
• eat – a lot
• was, is, to be … – very little
• the selectional preference strength of a verb is indicated by two distributions:
1. P(c) – how likely the direct object is to be in class c
2. P(c|v) – the distribution of expected semantic classes for the particular verb v
• the greater the difference in these distributions, the more information the verb provides
– 21 – CSCE 771 Spring 2013
Relative entropy – Kullback-Leibler divergence
Given two distributions P and Q:
D(P || Q) = Σ_x P(x) log (P(x)/Q(x))   (eq. 20.16)
Selectional preference:
S_R(v) = D( P(c|v) || P(c) ) = Σ_c P(c|v) log (P(c|v)/P(c))
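A small numeric illustration of this divergence (the class inventory and probabilities are invented; Resnik's actual model estimates them from parsed corpora over WordNet classes):

```python
# KL divergence between a verb's object-class distribution and the prior.
# Class inventory and probabilities are invented; Resnik estimates them
# from parsed corpora over WordNet classes.
import math

classes = ["food", "people", "artifact"]
p_prior = {"food": 0.30, "people": 0.30, "artifact": 0.40}  # P(c)
p_eat   = {"food": 0.90, "people": 0.05, "artifact": 0.05}  # P(c|eat)
p_be    = {"food": 0.30, "people": 0.35, "artifact": 0.35}  # P(c|be)

def kl(p, q):
    """D(P || Q) = sum_x P(x) log (P(x)/Q(x))  (eq. 20.16)."""
    return sum(p[c] * math.log2(p[c] / q[c]) for c in classes if p[c] > 0)

print(kl(p_eat, p_prior))  # large: "eat" is choosy about its objects
print(kl(p_be, p_prior))   # near zero: "be" tells you almost nothing
```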
– 22 – CSCE 771 Spring 2013
Resnik's model of Selectional Association
– 23 – CSCE 771 Spring 2013
High and Low Selectional Associations – Resnik 1996
– 24 – CSCE 771 Spring 2013
20.5 Minimally Supervised WSD: Bootstrapping
"supervised and dictionary methods require large hand-built resources"
bootstrapping (also called semi-supervised or minimally supervised learning) to address the no-data problem
Start with a seed set and grow it.
– 25 – CSCE 771 Spring 2013
Yarowsky algorithm preliminaries
Idea of bootstrapping: "create a larger training set from a small set of seeds"
Heuristics: senses of "bass"
1. one sense per collocation – in a sentence, both senses of bass are not used
2. one sense per discourse – Yarowsky showed that of 37,232 examples of bass occurring in a discourse, there was only one sense per discourse
– 26 – CSCE 771 Spring 2013
Yarowsky algorithm
Goal: learn a word-sense classifier for a word
Input: Λ0, a small seed set of labeled instances of each sense
1. train classifier on seed set Λ0
2. label the unlabeled corpus V0 with the classifier
3. select the examples δ in V that you are "most confident in"
4. Λ1 = Λ0 + δ
5. repeat
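The five steps can be sketched as a generic bootstrapping loop plus a deliberately crude stand-in classifier (word-vote counts with a confidence threshold; Yarowsky's real system learned a decision list, and all example data below are invented):

```python
# The five steps as a generic bootstrapping loop. The classifier here is
# a crude word-vote counter; Yarowsky used a decision-list learner. All
# example data are invented.
from collections import Counter, defaultdict

def yarowsky(seed_labeled, unlabeled, train, label_with_conf,
             threshold=0.9, max_iters=10):
    labeled = list(seed_labeled)  # Λ0
    pool = list(unlabeled)        # V0
    for _ in range(max_iters):
        clf = train(labeled)                             # 1. train on Λ
        newly = []
        for x in pool:                                   # 2. label V
            sense, conf = label_with_conf(clf, x)
            if sense is not None and conf >= threshold:  # 3. confident δ
                newly.append((x, sense))
        if not newly:
            break
        labeled.extend(newly)                            # 4. Λ ← Λ + δ
        added = {x for x, _ in newly}
        pool = [x for x in pool if x not in added]       # 5. repeat
    return labeled

def train(labeled):
    votes = defaultdict(Counter)  # word -> sense vote counts
    for ctx, sense in labeled:
        for w in ctx:
            votes[w][sense] += 1
    return votes

def label_with_conf(votes, ctx):
    tally = Counter()
    for w in ctx:
        tally.update(votes.get(w, Counter()))
    if not tally:
        return None, 0.0
    sense, n = tally.most_common(1)[0]
    return sense, n / sum(tally.values())

seeds = [(frozenset({"fish", "river"}), "bass1"),
         (frozenset({"play", "guitar"}), "bass2")]
unlabeled = [frozenset({"river", "caught"}),
             frozenset({"guitar", "amp"}),
             frozenset({"caught", "amp"})]  # stays ambiguous

result = yarowsky(seeds, unlabeled, train, label_with_conf, threshold=0.8)
print(len(result))  # 4: both confident examples were absorbed
```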
– 27 – CSCE 771 Spring 2013
Figure 20.4 Two senses of plant
plant 1 – manufacturing plant …
plant 2 – flora, plant life
– 28 – CSCE 771 Spring 2013
2009 Survey of WSD by Navigli
iroma1.it/~navigli/pubs/ACM_Survey_2009_Navigli.pdf
– 29 – CSCE 771 Spring 2013
Figure 20.5 Samples of bass sentences from the WSJ (Wall Street Journal)
– 30 – CSCE 771 Spring 2013
Word Similarity: Thesaurus-Based Methods
Figure 20.6 Path Distances in hierarchy
Wordnet of course (pruned)
– 31 – CSCE 771 Spring 2013
Figure 20.6 Path-Based Similarity
sim_path(c1, c2) = 1 / pathlen(c1, c2)   (pathlen = number of edges + 1)
– 32 – CSCE 771 Spring 2013
WN hierarchy

# WordNet examples from NLTK book
import nltk
from nltk.corpus import wordnet as wn

right = wn.synset('right_whale.n.01')
orca = wn.synset('orca.n.01')
minke = wn.synset('minke_whale.n.01')
tortoise = wn.synset('tortoise.n.01')
novel = wn.synset('novel.n.01')

print "LCS(right, minke)=", right.lowest_common_hypernyms(minke)
print "LCS(right, orca)=", right.lowest_common_hypernyms(orca)
print "LCS(right, tortoise)=", right.lowest_common_hypernyms(tortoise)
print "LCS(right, novel)=", right.lowest_common_hypernyms(novel)
– 33 – CSCE 771 Spring 2013
# path similarity
print "Path similarities"
print right.path_similarity(minke)
print right.path_similarity(orca)
print right.path_similarity(tortoise)
print right.path_similarity(novel)

Output:
Path similarities
0.25
0.166666666667
0.0769230769231
0.0434782608696
– 34 – CSCE 771 Spring 2013
Wordnet in NLTK
http://nltk.org/_modules/nltk/corpus/reader/wordnet.html
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html (partially in Chap 02 NLTK book; but different version)
http://grey.colorado.edu/mingus/index.php/Objrec_Wordnet.py code for similarity – runs for a while; lots of results
– 35 – CSCE 771 Spring 2013
https://groups.google.com/forum
beautiful 3/4/10
Hi, I was wondering if it is possible for me to use NLTK + wordnet to group (nouns) words together via similar meanings?
Assuming I have 2000 words or topics. Is it possible for me to group them together according to similar meanings using NLTK?
So that at the end of the day I would have different groups of words that are similar in meaning? Can that be done in NLTK? and possibly be able to detect salient patterns emerging? (trend in topics etc...).
Is there a further need for a word classifier based on the CMU BOW toolkit to classify words to get it into categories? or would the above groups be good enough? Is there a need to classify words further?
How would one classify words in NLTK effectively? Really hope you can enlighten me? FM
– 36 – CSCE 771 Spring 2013
Response from Steven Bird
Steven Bird 3/7/10
2010/3/5 Republic <[email protected]>:
> Assuming I have 2000 words or topics. Is it possible for me to group them together according to similar meanings using NLTK?
You could compute WordNet similarity (pairwise), so that each word/topic is represented as a vector of distances, which could then be discretized, so each vector would have a form like this: [0,2,3,1,0,0,2,1,3,...]. These vectors could then be clustered using one of the methods in the NLTK cluster package.
> So that at the end of the day I would have different groups of words that are similar in meaning? Can that be done in NLTK? and possibly be able to detect salient patterns emerging? (trend in topics etc...).
This suggests a temporal dimension, which might mean recomputing the clusters as more words or topics come in.
It might help to read the NLTK book sections on WordNet and on text classification, and also some of the other cited material. -Steven Bird
– 37 – CSCE 771 Spring 2013
More general? Stack Overflow

import nltk
from nltk.corpus import wordnet as wn

waiter = wn.synset('waiter.n.01')
employee = wn.synset('employee.n.01')
all_hyponyms_of_waiter = list(set([w.replace("_", " ")
    for s in waiter.closure(lambda s: s.hyponyms())
    for w in s.lemma_names]))
all_hyponyms_of_employee = …
if 'waiter' in all_hyponyms_of_employee:
    print 'employee more general than waiter'
elif 'employee' in all_hyponyms_of_waiter:
    print 'waiter more general than employee'
else:
    …

http://stackoverflow.com/questions/...-semantic-hierarchies-relations-in--nltk
– 38 – CSCE 771 Spring 2013
help(wn)
…
| res_similarity(self, synset1, synset2, ic, verbose=False)
|     Resnik Similarity:
|     Return a score denoting how similar two word senses are, based on the
|     Information Content (IC) of the Least Common Subsumer (most specific
|     ancestor node).
http://grey.colorado.edu/mingus/index.php/Objrec_Wordnet.py
– 39 – CSCE 771 Spring 2013
Similarity based on a hierarchy (= ontology)
– 40 – CSCE 771 Spring 2013
Information Content word similarity
– 41 – CSCE 771 Spring 2013
Resnik Similarity / Wordnet
sim_resnik(c1, c2) = -log P(LCS(c1, c2))
wordnet:
res_similarity(self, synset1, synset2, ic, verbose=False)
    Resnik Similarity:
    Return a score denoting how similar two word senses are, based on the
    Information Content (IC) of the Least Common Subsumer (most specific
    ancestor node).
– 42 – CSCE 771 Spring 2013
Fig 20.7 Wordnet with Lin P(c) values
Change for Resnik!
– 43 – CSCE 771 Spring 2013
Lin variation 1998
• Commonality
• Difference: IC(description(A,B)) – IC(common(A,B))
• sim_Lin(A,B) = common(A,B) / description(A,B)
– 44 – CSCE 771 Spring 2013
Fig 20.7 Wordnet with Lin P(c) values
– 45 – CSCE 771 Spring 2013
Extended Lesk
based on:
1. glosses
2. glosses of hypernyms, hyponyms
Example
• drawing paper: paper that is specially prepared for use in drafting
• decal: the art of transferring designs from specially prepared paper to a wood, glass or metal surface.
• Lesk score = sum of squares of lengths of common phrases
• Example: 1 + 2² = 5
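The scoring rule can be sketched directly: repeatedly find the longest common contiguous phrase, add its squared length, and blot the match out so its words are not counted twice (a greedy sketch; punctuation in the glosses is stripped by hand here):

```python
# Greedy sketch of the Extended Lesk overlap score: repeatedly take the
# longest common contiguous phrase, add length**2, then blot the match
# out. Glosses are from the slide, punctuation removed by hand.

def overlap_score(gloss_a, gloss_b):
    a, b = list(gloss_a), list(gloss_b)
    score = 0
    while True:
        best_len, best_i, best_j = 0, 0, 0
        # longest common contiguous phrase via dynamic programming
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                    if dp[i][j] > best_len:
                        best_len, best_i, best_j = dp[i][j], i, j
        if best_len == 0:
            return score
        score += best_len ** 2
        for k in range(best_len):                # blot out the match
            a[best_i - best_len + k] = object()  # unique: matches nothing
            b[best_j - best_len + k] = object()

drawing_paper = "paper that is specially prepared for use in drafting".split()
decal = ("the art of transferring designs from specially prepared paper "
         "to a wood glass or metal surface").split()
print(overlap_score(drawing_paper, decal))  # 2**2 ("specially prepared") + 1**2 ("paper") = 5
```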
– 46 – CSCE 771 Spring 2013
Figure 20.8 Summary of Thesaurus Similarity measures
– 47 – CSCE 771 Spring 2013
Wordnet similarity functions
path_similarity()?
lch_similarity()?
wup_similarity()?
res_similarity()?
jcn_similarity()?
lin_similarity()?
– 48 – CSCE 771 Spring 2013
Problems with thesaurus-based methods
don't always have a thesaurus
Even so, problems with recall: missing words; missing phrases
thesauri work less well for verbs and adjectives (less hyponymy structure)
Distributional Word Similarity D. Jurafsky
– 49 – CSCE 771 Spring 2013
Distributional models of meaning
vector-space models of meaning
offer higher recall than hand-built thesauri (probably less precision)
– 50 – CSCE 771 Spring 2013
Word Similarity: Distributional Methods
20.31 tezguino example
• A bottle of tezguino is on the table.
• Everybody likes tezguino.
• Tezguino makes you drunk.
• We make tezguino out of corn.
• What do you know about tezguino?
– 51 – CSCE 771 Spring 2013
Term-document matrix
Collection of documents
Identify a collection of important, discriminatory terms (words)
Matrix: terms × documents – term frequency tf_{w,d}
Each document is a vector in Z^V (Z = integers; N = natural numbers would be more accurate but perhaps misleading)
Example
ExampleExample
– 52 – CSCE 771 Spring 2013
Example Term-document matrix
Subset of terms = {battle, soldier, fool, clown}

            As You Like It   12th Night   Julius Caesar   Henry V
battle             1              1              8           15
soldier            2              2             12           36
fool              37             58              1            5
clown              6            117              0            0
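With the counts above, document similarity falls out of the cosine between column vectors, and it already separates the comedies from the histories:

```python
# Cosine similarity between the document (column) vectors of the
# term-document matrix above: counts of {battle, soldier, fool, clown}.
import math

docs = {
    "As You Like It": [1, 2, 37, 6],
    "12th Night":     [1, 2, 58, 117],
    "Julius Caesar":  [8, 12, 1, 0],
    "Henry V":        [15, 36, 5, 0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(docs["Julius Caesar"], docs["Henry V"]))      # histories: ~0.98
print(cosine(docs["As You Like It"], docs["12th Night"]))  # comedies: ~0.58
print(cosine(docs["12th Night"], docs["Henry V"]))         # across genres: ~0.07
```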
– 53 – CSCE 771 Spring 2013
Figure 20.9 Term-in-context matrix for word similarity
window of 20 words – 10 before, 10 after – from the Brown corpus
– 54 – CSCE 771 Spring 2013
Pointwise Mutual Information
• tf-idf (inverse document frequency) rating instead of raw counts
• idf intuition again
• pointwise mutual information (PMI): do events x and y occur more often than if they were independent?
• PMI(x, y) = log2 [ P(x, y) / (P(x) P(y)) ]
• PMI between words
• Positive PMI between two words (PPMI)
– 55 – CSCE 771 Spring 2013
Computing PPMI
Matrix with W (words) rows and C (contexts) columns
f_{ij} is the frequency of w_i in c_j
– 56 – CSCE 771 Spring 2013
Example: computing PPMI
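A small worked PPMI computation (the count matrix is illustrative, in the style of Jurafsky's apricot/pineapple/digital/information example; it is not from a real corpus):

```python
# Worked PPMI on a tiny word-context count matrix. Counts are
# illustrative, not from a real corpus.
import math

words    = ["apricot", "pineapple", "digital", "information"]
contexts = ["computer", "data", "pinch", "result", "sugar"]
f = [
    [0, 0, 1, 0, 1],  # apricot
    [0, 0, 1, 0, 1],  # pineapple
    [2, 1, 0, 1, 0],  # digital
    [1, 6, 0, 4, 0],  # information
]

total = sum(sum(row) for row in f)                      # Σ_ij f_ij = 19
p_w = [sum(row) / total for row in f]                   # row marginals
p_c = [sum(f[i][j] for i in range(len(words))) / total  # column marginals
       for j in range(len(contexts))]

def ppmi(i, j):
    p_ij = f[i][j] / total
    if p_ij == 0.0:
        return 0.0                                      # PPMI clips at 0
    return max(0.0, math.log2(p_ij / (p_w[i] * p_c[j])))

print(round(ppmi(0, 2), 2))  # apricot & pinch: log2(19/4) ≈ 2.25
print(ppmi(0, 0))            # apricot & computer: 0.0 (count is zero)
```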
– 57 – CSCE 771 Spring 2013
Figure 20.10
– 58 – CSCE 771 Spring 2013
Figure 20.11
– 59 – CSCE 771 Spring 2013
Figure 20.12
– 60 – CSCE 771 Spring 2013
Figure 20.13
– 61 – CSCE 771 Spring 2013
Figure 20.14
– 62 – CSCE 771 Spring 2013
Figure 20.15
– 63 – CSCE 771 Spring 2013
Figure 20.16
– 64 – CSCE 771 Spring 2013
http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf – how to do it in NLTK
NLTK 3.0a1 released: February 2013. This version adds support for NLTK's graphical user interfaces.
http://nltk.org/nltk3-alpha/
Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words?
I want to use a function for word clustering and the Yarowsky algorithm to find similar collocations in a large text.
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Linguistics
http://en.wikipedia.org/wiki/Portal:Linguistics
http://en.wikipedia.org/wiki/Yarowsky_algorithm
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html