SphereRE: Distinguishing Lexical Relations with Hyperspherical Relation Embeddings

Chengyu Wang 1, Xiaofeng He 1*, Aoying Zhou 2
1 School of Computer Science and Software Engineering, 2 School of Data Science and Engineering
East China Normal University, Shanghai, China



Outline

• Introduction
• The SphereRE Model
  – Learning Objective
  – Relation-aware Semantic Projection
  – Relation Representation Learning
  – Lexical Relation Classification
• Experiments
• Conclusion


Introduction (1)

• Lexical Relation Classification
  – Task: Classifying a word pair into a finite set of relation types (e.g., synonymy, antonymy)


Examples taken from the CogALex-V shared task


Introduction (2)

• Existing Approaches
  – Path-based approaches: use dependency paths connecting two terms to infer lexical relations
    • “Low coverage” problem
  – Distributional approaches: consider the global contexts of terms to predict lexical relations using word embeddings
    • “Lexical memorization” problem


Introduction (3)

• Our Idea
  – Learning relation embeddings for term pairs (in the hyperspherical embedding space)
  – Term pairs with similar lexical relation types share similar embeddings


SphereRE: Learning Objective (1)

• Basic Notations
  – Training data (term pairs): $(x_i, y_i) \in D$
  – Testing data (term pairs): $(x_i, y_i) \in U$
  – Lexical relation types (e.g., synonymy, antonymy): $r_i \in R$

• Learning Objective in the Word Embedding Space
  – $f_m(\vec{x}_i)$: maps the relation subject $x_i$ to the relation object $y_i$ in the embedding space, where $x_i$ and $y_i$ have the lexical relation type $r_m \in R$
  – Objective function:


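SphereRE: Learning Objective (2)

• Learning Objective in the Hyperspherical Relation Space
  – $\delta(r_i, r_j) = \begin{cases} 1, & \text{if } (x_i, y_i) \text{ and } (x_j, y_j) \text{ have the same relation type} \\ -1, & \text{if } (x_i, y_i) \text{ and } (x_j, y_j) \text{ have different relation types} \end{cases}$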

• General Learning Objective of SphereRE
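
The indicator $\delta(r_i, r_j)$ and the idea that same-type pairs should lie close together on the hypersphere can be illustrated with a small sketch. It is not the objective from the slides, whose formula is not preserved in this transcript: the scoring function `pairwise_agreement`, the use of cosine similarity, and the toy vectors are all assumptions made for illustration.

```python
from typing import Sequence

import numpy as np


def delta(rel_i: str, rel_j: str) -> int:
    """Return 1 if two term pairs share a relation type, otherwise -1."""
    return 1 if rel_i == rel_j else -1


def to_unit_sphere(v: np.ndarray) -> np.ndarray:
    """Place a vector on the unit hypersphere by L2 normalization."""
    return v / np.linalg.norm(v)


def pairwise_agreement(vectors: np.ndarray, labels: Sequence[str]) -> float:
    """Illustrative score (an assumption, not the slides' objective):
    sum over i < j of delta(r_i, r_j) * cos(v_i, v_j).
    It is high when same-type pairs point the same way and
    different-type pairs point apart."""
    unit = np.stack([to_unit_sphere(v) for v in vectors])
    cos = unit @ unit.T  # cosine similarity, since rows have unit norm
    total = 0.0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            total += delta(labels[i], labels[j]) * cos[i, j]
    return float(total)


# Toy relation vectors for three term pairs (hypothetical values).
vectors = np.array([[1.0, 0.1], [0.9, 0.2], [-1.0, 0.0]])
labels = ["synonymy", "synonymy", "antonymy"]
score = pairwise_agreement(vectors, labels)
```

In this toy setup the two synonymy vectors nearly coincide and the antonymy vector points the opposite way, so every term of the sum is positive.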


SphereRE: Relation-aware Semantic Projection

• Learning $J_f$
  – For each lexical relation type $r_m \in R$
  – Closed-form solution

• Approximating the probability distribution over all lexical relation types $R$ w.r.t. $(x_i, y_i) \in U$
  – Train a logistic regression classifier using the feature set
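
A per-relation projection with a closed-form solution can be sketched as follows. The transcript does not preserve the form of $f_m$ or its solution, so this sketch assumes a plain linear map fitted by ordinary least squares; the authors' actual formulation may add constraints. The names `fit_projection` and `fit_per_relation` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_projection(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Return M minimizing ||X @ M - Y||_F^2.

    X holds subject embeddings (one row per term pair) and Y holds the
    matching object embeddings. Assumption: an unconstrained linear map.
    """
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M


def fit_per_relation(pairs_by_relation: dict) -> dict:
    """Fit one projection matrix per lexical relation type."""
    return {
        relation: fit_projection(X, Y)
        for relation, (X, Y) in pairs_by_relation.items()
    }


# Synthetic check: recover a known linear map from noiseless data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
true_M = rng.normal(size=(4, 4))
Y = X @ true_M

models = fit_per_relation({"synonymy": (X, Y)})
residual = np.linalg.norm(X @ models["synonymy"] - Y)
```

With one matrix per relation type, the residual of a test pair under each matrix could then serve as evidence for that type, although the exact feature set used by the classifier is not shown in the slides.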


SphereRE: Relation Representation Learning (1)

• Approximating $J_s$
  – Learning a SphereRE vector $r_i$ for each $(x_i, y_i) \in D \cup U$
  – Re-writing $J_s$ via negative log likelihood


Similar to node2vec!


SphereRE: Relation Representation Learning (2)

• Minimizing $J'_s$ by random walk based sampling
  – Sampling probability
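
Random walk based sampling over term pairs can be sketched in a few lines. The sampling probability from the slide is not preserved in this transcript, so the sketch assumes a generic scheme in which the next term pair is drawn with probability proportional to a non-negative edge weight. The weight matrix, the function name `random_walk`, and the walk length are hypothetical.

```python
import numpy as np


def random_walk(weights: np.ndarray, start: int, length: int,
                rng: np.random.Generator) -> list:
    """Sample a walk of `length` nodes over term pairs.

    weights[a, b] is a non-negative affinity between term pairs a and b
    (an assumption; the slides' sampling probability is not shown here).
    The next node is drawn proportionally to the current node's row.
    """
    walk = [start]
    for _ in range(length - 1):
        row = weights[walk[-1]]
        probs = row / row.sum()  # normalize the row into a distribution
        walk.append(int(rng.choice(len(row), p=probs)))
    return walk


# Three term pairs; pairs 0 and 1 are strongly connected (toy values).
weights = np.array([
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.5, 0.5, 0.0],
])
rng = np.random.default_rng(42)
walk = random_walk(weights, start=0, length=3, rng=rng)
```

Because the diagonal is zero, a walk never stays on the same term pair, and strongly connected pairs co-occur in sampled sequences more often.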


SphereRE: Relation Representation Learning (3)

• Overall Procedure of Learning SphereRE Vectors


SphereRE: Lexical Relation Classification

• Train a feed-forward neural network over all the features to predict lexical relations
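
The classification step can be sketched as a forward pass through a small feed-forward network. The slides do not list the features, layer sizes, or training procedure here, so everything below is an assumption: one hidden ReLU layer, a softmax output over relation types, random untrained weights, and hypothetical dimensions.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(features, W1, b1, W2, b2):
    """One hidden ReLU layer followed by a softmax over relation types."""
    hidden = np.maximum(0.0, features @ W1 + b1)
    return softmax(hidden @ W2 + b2)


# Hypothetical sizes: 5 term pairs, 12 features each, 4 relation types.
rng = np.random.default_rng(0)
n_pairs, n_features, n_hidden, n_relations = 5, 12, 8, 4
features = rng.normal(size=(n_pairs, n_features))

# Untrained random weights, used only to show the shapes involved.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_relations))
b2 = np.zeros(n_relations)

probs = forward(features, W1, b1, W2, b2)
predicted = probs.argmax(axis=1)  # index of the most likely relation type
```

In practice the weights would be fitted on the training pairs in $D$, and the feature vector for each pair would concatenate whatever features the model uses.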


Experiments (1)

• Datasets and Experimental Settings
  – Word embeddings: fastText embeddings, $d = 300$
  – Default parameter settings:
    • $\mu = 0.001$, $d_r = 300$, $D_{mini} = 20$, $S = 100$, $\gamma = 2$, $l = 3$
  – Five datasets:


Experiments (2)

• General Performance over Four Public Datasets
  – SphereRE outperforms all the baselines in terms of F1 scores.


Experiments (3)

• Detailed analysis of SphereRE
  – Network structure analysis
  – MC sampling analysis


Experiments (4)

• Experiments over the CogALex-V Shared Task (Subtask 2)
  – Consider random relations as noise, discarding them from the averaged F1 score.
  – Enforce the lexical split of the training and testing sets.
  – SphereRE outperforms previous systems in the shared task.
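
The two evaluation conventions above can be sketched as follows. The slides do not say how the F1 average is weighted or define the lexical split precisely, so this sketch assumes a support-weighted average over the non-random classes and takes "lexical split" to mean that training and testing pairs share no words. The label strings and function names are hypothetical.

```python
from collections import Counter


def f1_for_label(gold, pred, label):
    """F1 score of a single relation type."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def averaged_f1_without(gold, pred, ignored="random"):
    """Support-weighted F1 over all gold labels except `ignored`.

    Predictions of the ignored label still count as errors for the
    other classes; only its own F1 is left out of the average.
    """
    support = Counter(g for g in gold if g != ignored)
    total = sum(support.values())
    return sum(f1_for_label(gold, pred, lab) * n
               for lab, n in support.items()) / total


def is_lexical_split(train_pairs, test_pairs):
    """True if no word occurs in both the training and testing pairs."""
    train_vocab = {word for pair in train_pairs for word in pair}
    test_vocab = {word for pair in test_pairs for word in pair}
    return train_vocab.isdisjoint(test_vocab)


gold = ["syn", "syn", "ant", "random", "random"]
pred = ["syn", "ant", "ant", "random", "syn"]
score = averaged_f1_without(gold, pred)
```

A lexical split guards against the "lexical memorization" problem mentioned earlier, since a model cannot score well merely by remembering individual words seen in training.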


Experiments (5)

• Visualization of SphereRE Vectors


Conclusion

• Model
  – SphereRE: A distributional model for lexical relation classification based on hyperspherical relation embeddings
• Result
  – Outperforming previous baselines on four public datasets and the CogALex-V shared task
• Future Work
  – Dealing with datasets containing a relatively large number of lexical relation types and random term pairs
  – Improving the mapping technique used for relation-aware semantic projection


Thank You!

Questions & Answers