Do Supervised Distributional Methods Really Learn Lexical Inference Relations? Omer Levy, Ido Dagan, Steffen Remus, Chris Biemann
Do Supervised Distributional Methods Really Learn Lexical Inference Relations?
Omer Levy, Ido Dagan (Bar-Ilan University, Israel)
Steffen Remus, Chris Biemann (Technische Universität Darmstadt, Germany)
Lexical Inference
Lexical Inference: Task Definition
• Given 2 words x and y
• Does x infer y?
• In this talk: inference refers to hypernymy (“x is a y”)

Dataset
• Positive examples: (dolphin, mammal), (Jon Stewart, comedian)
• Negative examples: (shark, mammal), (Jon Stewart, politician)
Distributional Methods of Lexical Inference
Unsupervised Distributional Methods
• Represent x and y as vectors vx and vy
  • Word Embeddings
  • Traditional (Sparse) Distributional Vectors
• Measure the similarity of vx and vy
  • Cosine Similarity
  • Distributional Inclusion (Weeds & Weir, 2003; Kotlerman et al., 2010)
• Tune a threshold over the similarity of vx and vy
  • Train a classifier over a single feature
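An unsupervised DM of this kind reduces to a single similarity score plus a tuned threshold. A minimal sketch in Python (the function names are hypothetical helpers, and the tiny vectors stand in for the high-dimensional embeddings used in the actual experiments):

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense word vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def unsupervised_predict(vx, vy, threshold=0.5):
    # Predict "x infers y" when the similarity exceeds a tuned threshold;
    # this is equivalent to training a classifier over one feature.
    return cosine(vx, vy) >= threshold
```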
Supervised Distributional Methods
• Represent x and y as vectors vx and vy
  • Word Embeddings
  • Traditional (Sparse) Distributional Vectors
• Represent the pair (x, y) as a combination of vx and vy
  • Concat: vx ⊕ vy (Baroni et al., 2012)
  • Diff: vy − vx (Roller et al., 2014; Weeds et al., 2014; Fu et al., 2014)
• Train a classifier over the representation of (x, y)
  • Multi-feature representation
Main Questions
• Are current supervised DMs better than unsupervised DMs?
• Are current supervised DMs learning a relation between x and y?
  • (No)
• If not, what are they learning?
Experiment Setup
Experiment Setup
• 9 Word Representations
  • 3 Representation Methods: PPMI, SVD (over PPMI), word2vec (SGNS)
  • 3 Context Types
    • Bag-of-Words (5 words to each side)
    • Positional (2 words to each side + position)
    • Dependency (all syntactically-connected words + dependency)
• Trained on English Wikipedia
• 5 Lexical-Inference Datasets
  • Kotlerman et al., 2010
  • Baroni and Lenci, 2011 (BLESS)
  • Baroni et al., 2012
  • Turney and Mohammad, 2014
  • Levy et al., 2014
Supervised Methods
• Concat: vx ⊕ vy (Baroni et al., 2012)
• Diff: vy − vx (Roller et al., 2014; Weeds et al., 2014; Fu et al., 2014)
• Only x: vx
• Only y: vy
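The four pair representations can be sketched as simple vector operations (illustrative Python; the function names are hypothetical, and in the actual experiments a classifier is trained over the resulting feature vectors):

```python
def concat_features(vx, vy):
    # Concat: the feature vector is vx followed by vy.
    return list(vx) + list(vy)

def diff_features(vx, vy):
    # Diff: the feature vector is the element-wise difference vy - vx.
    return [b - a for a, b in zip(vx, vy)]

def only_x_features(vx, vy):
    # Only x: ignore y entirely.
    return list(vx)

def only_y_features(vx, vy):
    # Only y: ignore x entirely.
    return list(vy)
```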
Are current supervised DMs better than unsupervised DMs?
Previously Reported Success
Prior Art:
• Supervised DMs better than unsupervised DMs
• Accuracy >95% (on some datasets)

Our Findings:
• High accuracy of supervised DMs stems from lexical memorization
Lexical Memorization
• Learning that a specific word is a strong indicator of the label

Example:
• Many positive training examples of the form (*, animal)
• The classifier memorizes that animal is a good indicator
• Test examples of the form (*, animal) are correctly classified “for free”

• In other words: overfitting
• Raises questions about dataset construction
Lexical Memorization
• Avoid lexical memorization with lexical train/test splits
• If “animal” appears in train, it cannot appear in test
• Lexical splits applied to all our experiments
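A lexical split can be sketched as follows (a minimal illustration, assuming a simple word-level partition of the vocabulary; the paper's exact split procedure may differ in detail):

```python
import random

def lexical_split(pairs, test_ratio=0.3, seed=0):
    # Partition the vocabulary into train words and test words, then keep
    # only pairs whose words lie entirely on one side. Mixed pairs are
    # discarded, so no word seen in training ever appears at test time.
    rng = random.Random(seed)
    words = sorted({w for pair in pairs for w in pair})
    rng.shuffle(words)
    test_words = set(words[: max(1, int(len(words) * test_ratio))])
    train = [p for p in pairs if not (set(p) & test_words)]
    test = [p for p in pairs if set(p) <= test_words]
    return train, test
```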
Experiments without Lexical Memorization
• 4 supervised methods vs. 1 unsupervised method (cosine similarity)
• Cosine similarity outperforms all supervised DMs on 2/5 datasets
• Conclusion: supervised DMs are not necessarily better
[Chart: Performance (F1) of Best Supervised vs. Unsupervised on the Kotlerman 2010, BLESS 2011, Baroni 2012, Turney 2014, and Levy 2014 datasets]
Are current supervised DMs learning a relation between x and y?
Learning a Relation between x and y
• Requires information about the compatibility of x and y
• What happens when we use Only y (and ignore x)?
• Intuitively, it should fail: x could be anything!
Learning a Relation between x and y
• In practice: Only y performs almost as well as Concat & Diff
• Only y is the best method on 1/5 datasets
[Chart: Performance (F1) of Best Supervised vs. Only y on the Kotlerman 2010, BLESS 2011, Baroni 2012, Turney 2014, and Levy 2014 datasets]
How can the classifier know that x infers y if it does not observe x?
If these methods are not learning a relation between x and y,
what exactly are they learning?
Prototypical Hypernyms
Hypothesis: the methods learn whether y is a prototypical hypernym
• Prototypical Hypernyms:
  • animal
  • mammal
  • fruit
  • drug
  • country
  • …
• Categories, Supersenses, etc.
Prototypical Hypernyms
Hypothesis: the methods learn whether y is a prototypical hypernym

Experiment:
• Given 2 positive examples (x1, y1) and (x2, y2) ✔
• Create artificial negative examples (x1, y2) and (x2, y1) ✘
• These artificial examples contain prototypical hypernyms as y
• How easily is the classifier “fooled” by these artificial examples?
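The switched-pair construction can be sketched as follows (illustrative Python; the function name is hypothetical, and this simplified version pairs up all positives with distinct y-sides):

```python
def switched_negatives(pos_pairs):
    # From positive pairs (x1, y1) and (x2, y2), build artificial
    # negatives (x1, y2) and (x2, y1): the y-side is still a
    # prototypical hypernym, but the pair no longer holds.
    negatives = []
    for i, (x1, y1) in enumerate(pos_pairs):
        for x2, y2 in pos_pairs[i + 1:]:
            if y1 != y2:
                negatives.append((x1, y2))
                negatives.append((x2, y1))
    return negatives
```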
Prototypical Hypernyms
• Recall: portion of real positive examples (✔) classified true
• Match Error: portion of artificial examples (✘) classified true
• Bottom-right: prefers ✔ over ✘ (good classifiers)
• Top-left: prefers ✘ over ✔ (worse than random)
• Diagonal: cannot distinguish ✔ from ✘ (predicted by the hypothesis)
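The two axes of this analysis can be computed directly from classifier decisions (a minimal sketch over boolean predictions; the function name is hypothetical):

```python
def recall_and_match_error(real_pos_preds, artificial_preds):
    # Recall: fraction of real positive examples classified true.
    # Match Error: fraction of switched artificial negatives classified
    # true. A classifier on the diagonal (recall == match error) cannot
    # tell real pairs from switched ones.
    recall = sum(real_pos_preds) / len(real_pos_preds)
    match_error = sum(artificial_preds) / len(artificial_preds)
    return recall, match_error
```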
Prototypical Hypernyms
• Recall: portion of real positive examples (✔) classified true
• Match Error: portion of artificial examples (✘) classified true
• Regression slope: 0.935
• Result: classifiers cannot distinguish between artificial (✘) and real (✔)
• Conclusion: classifiers return true when y is a prototypical hypernym
Prototypical Hypernyms: Analysis
• What are the classifiers’ most indicative features?
• Indicators that y is a category word:
• Partial Hearst (1992) patterns:
Conclusions
Conclusions
• Are current supervised DMs better than unsupervised DMs?
  • Not necessarily
  • Previously reported success stems from lexical memorization
• Are current supervised DMs learning a relation between x and y?
  • No, they are not
  • Only y yields similar results to Concat and Diff
• If not, what are they learning?
  • Whether y is a prototypical hypernym (“mammal”, “fruit”, “country”, …)
What if the necessary relational information
does not exist in contextual features?
The Limitations of Contextual Features
• Contextual features cannot capture x and y jointly
• What can they capture? x and y (separately)
Thank you