NSCaching: Simple and Efficient Negative Sampling for …qyaoaa/papers/NSCaching-quanming.pdf ·...
![Page 1: NSCaching: Simple and Efficient Negative Sampling for …qyaoaa/papers/NSCaching-quanming.pdf · 2019-03-08 · NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph](https://reader034.fdocuments.us/reader034/viewer/2022050123/5f52cee808e7c56bf5682d17/html5/thumbnails/1.jpg)
NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding
Dr. Quanming YAO
Researcher@4Paradigm. Inc
Accepted by ICDE 2019: https://arxiv.org/pdf/1812.06410.pdf
Code: https://github.com/yzhangee/NSCaching
Email: [email protected]
About This Talk
• Knowledge Graph Embedding
• Negative Sampling
• NSCaching: faster and better negative sampling
• Experiments
Knowledge Graph Embedding
Knowledge Graph (KG)
Knowledge structured as a graph:
• Each node = an entity
• Each edge = a relation
Fact (triplet):
• (head, relation, tail)
Typical KGs:
• WordNet: linguistic KG
• Freebase, DBpedia, YAGO: world KGs
Applications:
• Structured search [Dong et al. KDD 2014]
• Question answering [Lukovnikov et al. WWW 2017]
• Recommendation [Zhang et al. KDD 2016]
(Michelle, hasChild, ?)
KG Embedding (what & why)
Encode the entities and relations of a KG into a low-dimensional vector space $\mathbb{R}^d$ while capturing the connectivity properties of its nodes and edges
Once triplets are processed into vectors, they can be used for subsequent learning tasks
KG Embedding (how)
A scoring function $f(h, r, t)$ captures the interaction (similarity) between two entities under a relation, computed from their embeddings:
• TransE [Bordes et al. 2013]: $f(h, r, t) = -\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_1$
• DistMult [Yang et al. 2017]: $f(h, r, t) = \mathbf{h}^\top \operatorname{diag}(\mathbf{r})\, \mathbf{t}$
In $f(h, r, t)$:
• $\mathbf{h}$: embedding vector of the head entity
• $\mathbf{r}$: embedding vector of the relation
• $\mathbf{t}$: embedding vector of the tail entity
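The two scoring functions can be written in a few lines of NumPy. This is a minimal sketch, not the authors' code: the function names `transe_score` and `distmult_score` and the toy vectors are assumptions made for illustration.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE: negative L1 distance between h + r and t."""
    return -np.sum(np.abs(h + r - t), axis=-1)

def distmult_score(h, r, t):
    """DistMult: h^T diag(r) t, i.e. the sum of element-wise products."""
    return np.sum(h * r * t, axis=-1)

# Toy 2-dimensional embeddings (hypothetical values).
h = np.array([1.0, 2.0])
r = np.array([1.0, -1.0])
t = np.array([2.0, 2.0])

print(transe_score(h, r, t))
print(distmult_score(h, r, t))
```

Both functions reduce over the last axis, so they also accept a batch of tail embeddings of shape `(n, d)` and return `n` scores.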
KG Embedding (how)
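Target:
• maximize $f$ on a set of positive triplets $\mathcal{S} = \{(h, r, t)\}$
• minimize $f$ on a set of negative triplets $\bar{\mathcal{S}} = \{(\bar{h}, r, \bar{t})\}$
Using TransE as an example:
• positive: $-\|\mathbf{Obama} + \mathbf{MarriedTo} - \mathbf{Michelle}\|_1$
• negative: $-\|\mathbf{Obama} + \mathbf{MarriedTo} - \mathbf{Trump}\|_1$
Objective to minimize, e.g.:
• $L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\, (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \left[\gamma - f(h, r, t) + f(\bar{h}, r, \bar{t})\right]_+$, $(\gamma > 0)$
• $L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\, (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \ell(+1, f(h, r, t)) + \ell(-1, f(\bar{h}, r, \bar{t}))$, where $\ell$ is a loss function for binary classification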
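The two objectives can be sketched as follows. The slide leaves $\ell$ unspecified, so the logistic loss $\ell(y, s) = \log(1 + e^{-ys})$ used here is an assumption, as are the function names.

```python
import numpy as np

def margin_loss(pos_score, neg_score, gamma=1.0):
    """Sum of [gamma - f(pos) + f(neg)]_+ over paired positive/negative scores."""
    return np.sum(np.maximum(0.0, gamma - pos_score + neg_score))

def logistic_loss(pos_score, neg_score):
    """Sum of l(+1, f(pos)) + l(-1, f(neg)), assuming l(y, s) = log(1 + exp(-y * s))."""
    # np.logaddexp(0, x) computes log(1 + exp(x)) in a numerically stable way.
    return np.sum(np.logaddexp(0.0, -pos_score) + np.logaddexp(0.0, neg_score))
```

Both take arrays of scores in which entry `i` of `neg_score` is the negative triplet paired with positive triplet `i`.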
Negative Sampling (why)
A KG only contains observed facts (positive triplets)
Unobserved triplets are assumed to be negative with high probability

| Positive | Negative |
| --- | --- |
| (Obama, marriedTo, Michelle) | (Obama, marriedTo, Sasha), (Sasha, marriedTo, Michelle), (Obama, bornOn, Michelle) |
| (Michelle, hasChild, Malia) | (Michelle, hasChild, Obama), (Sasha, hasChild, Malia), (Michelle, bornOn, Malia) |
Negative Sampling (why)
$$L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\, (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \ell(+1, f(h, r, t)) + \ell(-1, f(\bar{h}, r, \bar{t}))$$
• Performance: not all negative samples are equally good; bad ones can make performance worse
Positive: (Steve Jobs, FounderOf, Apple Inc.)
Low-quality: (Baseball, FounderOf, Apple Inc.)
High-quality: (Bill Gates, FounderOf, Apple Inc.)
Negative Sampling (why)
$$L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\, (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \ell(+1, f(h, r, t)) + \ell(-1, f(\bar{h}, r, \bar{t}))$$
• Computation: the number of negative samples (unobserved triplets) is very large, so considering all of them is computationally infeasible
Sampling a few high-quality negative triplets is therefore important for both performance and efficiency
Negative Sampling (why)
Negative sampling is not specific to KGs; it also appears in word2vec [Mikolov et al. 2013] and click-through rate (CTR) prediction
The requirements for negative sampling are the same in these settings
[Figure: word2vec]
Negative Sampling (how)
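Given a positive triplet $(h, r, t)$, the set of negative triplets is
$$\bar{\mathcal{S}}_{(h,r,t)} = \{(\bar{h}, r, t) \notin \mathcal{S} \mid \bar{h} \in \mathcal{E}\} \cup \{(h, r, \bar{t}) \notin \mathcal{S} \mid \bar{t} \in \mathcal{E}\}$$
A few negative samples are drawn from $\bar{\mathcal{S}}_{(h,r,t)}$.
Note that $\{(h, \bar{r}, t) \notin \mathcal{S} \mid \bar{r} \in \mathcal{R}\}$ is not included, since such triplets are more likely to be false negatives.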
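Drawing one negative triplet from this set can be sketched with rejection sampling. This is an illustrative sketch under assumptions: the function name is hypothetical, head or tail corruption is chosen by a fair coin (which is close to, but not exactly, uniform over the union), and the loop assumes at least one unobserved corruption exists.

```python
import random

def uniform_negative(triplet, entities, observed, rng=random):
    """Corrupt the head or the tail with a random entity, rejecting observed triplets."""
    h, r, t = triplet
    while True:
        e = rng.choice(entities)
        # Fair coin (an assumption): replace the head or the tail, never the relation.
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in observed:
            return candidate

entities = ["Obama", "Michelle", "Sasha", "Malia"]
observed = {("Obama", "marriedTo", "Michelle"), ("Michelle", "hasChild", "Malia")}
negative = uniform_negative(("Obama", "marriedTo", "Michelle"), entities, observed,
                            rng=random.Random(0))
print(negative)
```

Storing `observed` as a set makes the membership test constant-time, so rejections are cheap when the KG is sparse.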
Negative Sampling (problems)
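Given the negative triplet set $\bar{\mathcal{S}}_{(h,r,t)} = \{(\bar{h}, r, t) \notin \mathcal{S} \mid \bar{h} \in \mathcal{E}\} \cup \{(h, r, \bar{t}) \notin \mathcal{S} \mid \bar{t} \in \mathcal{E}\}$, sampling uniformly from the set is widely used in the literature.
The quality of negative samples matters!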
Negative Sampling (problems)
Low-quality negative samples gradually become less informative [Wang et al. AAAI 2018]
• Positive: (Steve Jobs, FounderOf, Apple Inc.)
• Low-quality: (Baseball, FounderOf, Apple Inc.)
• High-quality: (Bill Gates, FounderOf, Apple Inc.)
Vanishing gradient: in
$$L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\, (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \left[\gamma - f(h, r, t) + f(\bar{h}, r, \bar{t})\right]_+$$
a negative triplet whose score is so low that $\gamma - f(h, r, t) + f(\bar{h}, r, \bar{t}) \le 0$ contributes a zero term and therefore no gradient.
We need to adaptively generate high-quality negative triplets as training goes on.
High-quality negative triplets should have large scores. We need to capture their dynamic distribution and sample from it.
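A small numeric check illustrates the vanishing gradient under the margin loss. The scores below are hypothetical values chosen for illustration, not measurements from the paper.

```python
def hinge_grad_wrt_neg(pos_score, neg_score, gamma=1.0):
    """Derivative of [gamma - f(pos) + f(neg)]_+ with respect to f(neg)."""
    return 1.0 if gamma - pos_score + neg_score > 0 else 0.0

pos = -0.5           # hypothetical score of (Steve Jobs, FounderOf, Apple Inc.)
low_quality = -6.0   # hypothetical score of (Baseball, FounderOf, Apple Inc.)
high_quality = -0.8  # hypothetical score of (Bill Gates, FounderOf, Apple Inc.)

print(hinge_grad_wrt_neg(pos, low_quality))   # margin already satisfied
print(hinge_grad_wrt_neg(pos, high_quality))  # margin violated
```

Only the high-scoring negative violates the margin, so only it produces a training signal.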
GAN-based Method (existing solutions)
Key idea:
• Use a generator to model the dynamic distribution of negative triplets
• High-quality negative triplets are sampled by the generator
• Jointly optimize both models (reinforcement learning is used):
  • The discriminator (the target KG embedding) is trained on negative triplets provided by the generator;
  • The generator obtains its reward from the discriminator.
Existing methods:
• IGAN [Wang et al. AAAI 2018]
• KBGAN [Cai et al. NAACL 2018]
• Self-paced NE [Gao et al. KDD 2018]
NSCaching: faster and better negative sampling
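Key Observations
Recall that:
$$L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\, (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \left[\gamma - f(h, r, t) + f(\bar{h}, r, \bar{t})\right]_+$$
High-quality: a large score as evaluated by the scoring function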
Key Observations
The score distribution of negative triplets is highly skewed
Properties: dynamic, rare, complex
Key Observations
Word2vec has similar observations on negative samples (on words)
- “While NCE can be shown to approximately maximize the log probability of the softmax, the Skip-gram model is only concerned with learning high-quality vector representations, so we are free to simplify NCE as long as the vector representations retain their quality.”
Can we design a sampling scheme that fully exploits the above properties?
GAN-based Method (existing solutions)
| GAN-based | NSCaching |
| --- | --- |
| Increased number of training parameters | No extra parameters introduced |
| Sampling is not efficient | Efficient sampling through the cache |
| Training suffers from instability and degeneracy | Stable without pre-training |
NSCaching (overview)
Challenges:
• How to model the dynamic distribution of negative triplets
• How to sample high-quality negative triplets in an efficient way
Motivation:
The KG embedding itself contains information about triplet quality
• Use a small amount of extra memory to cache negative samples with large scores for each triplet in $\mathcal{S}$ during training
• Sample the negative triplet directly from the cache
NSCaching (design issues)
Core idea: cache high-quality negative samples for each observed triplet
• How to construct & update the cache?
• How to sample from the cache?
Recall that the distribution of negative samples is dynamic during training, and high-quality ones are rare
NSCaching (design issues)
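Sampling from the cache:
• The negative triplets in the cache may not be accurate enough;
• There are false negative triplets in the negative sample sets.
Updating the cache:
• The cache needs to change dynamically over the iterations of the algorithm;
• It should be able to explore all possible high-quality negative samples;
• The update procedure should be efficient.
[Figure: a negative tail $\bar{t}$ is sampled from the tail cache $\mathcal{T}_{(h,r)}$ and a negative head $\bar{h}$ from the head cache $\mathcal{H}_{(r,t)}$]
Recall that all possible choices are:
$$\bar{\mathcal{S}}_{(h,r,t)} = \{(\bar{h}, r, t) \notin \mathcal{S} \mid \bar{h} \in \mathcal{E}\} \cup \{(h, r, \bar{t}) \notin \mathcal{S} \mid \bar{t} \in \mathcal{E}\}$$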
NSCaching (update & construct cache)
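• Randomly sample candidates from all possible negative triplets
• Evaluate the score of each candidate
• Compare the candidates with the existing entries in the cache and keep the top ones
Recall that all possible choices are:
$$\bar{\mathcal{S}}_{(h,r,t)} = \{(\bar{h}, r, t) \notin \mathcal{S} \mid \bar{h} \in \mathcal{E}\} \cup \{(h, r, \bar{t}) \notin \mathcal{S} \mid \bar{t} \in \mathcal{E}\}$$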
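The three steps above can be sketched for a tail cache as follows. This is a simplified illustration, not the released NSCaching code: the function names and array layout are assumptions, TransE is used as the scoring function, the top-scoring candidates are kept, and filtering of observed (false negative) triplets is omitted for brevity.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE score; broadcasts over a batch of tail embeddings."""
    return -np.sum(np.abs(h + r - t), axis=-1)

def update_tail_cache(cache, h, r, entity_emb, rel_emb, n_random, score_fn, rng):
    """Refresh the tail cache of (h, r): merge it with a random subset, keep the top scores."""
    n_cache = len(cache)
    # Step 1: randomly sample candidate entity ids.
    random_subset = rng.choice(len(entity_emb), size=n_random, replace=False)
    # Merge with the current cache and drop duplicates.
    candidates = np.union1d(cache, random_subset)
    # Step 2: score every candidate tail for the fixed (h, r).
    scores = score_fn(entity_emb[h], rel_emb[r], entity_emb[candidates])
    # Step 3: keep the n_cache candidates with the largest scores.
    top = np.argsort(-scores)[:n_cache]
    return candidates[top]
```

A head cache indexed by `(r, t)` would be updated symmetrically, scoring candidate heads against the fixed relation and tail.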
Cache update
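Possible choices:
• Compute the scores over all $h' \in \mathcal{E}$, $t' \in \mathcal{E}$ and select among them.
• Sample a subset $\mathcal{R}_m$ from $\mathcal{E}$ and select among its elements.
• Sample a subset $\mathcal{R}_m$ from $\mathcal{E}$, concatenate it with the cache, and select among the new set.
[Figure: entity set $\mathcal{E}$, random subset $\mathcal{R}_m$, and the caches $\mathcal{T}_{(h,r)}$ and $\mathcal{H}_{(r,t)}$]
Design requirements:
• Capture the dynamic distribution
• Explore all possible candidates
• Efficient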
NSCaching (update & construct cache)
[Figure: a random subset $\mathcal{R}_m$ is sampled from the entity set $\mathcal{E}$ and combined with $\mathcal{T}_{(h,r)}$ or $\mathcal{H}_{(r,t)}$ to update the cache]
Update scheme:
• top-k
• importance sampling
Connection to self-paced learning:
• As training goes on, easy samples gradually get small scores and are removed from the cache. Thus, hard samples are gradually stored.
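The importance-sampling update scheme can be sketched as an alternative to keeping the strict top-k. The slides do not give the sampling weights, so the softmax weighting $\propto \exp(\text{score})$ below is an assumption, as is the function name.

```python
import numpy as np

def importance_sample_update(candidates, scores, n_cache, rng):
    """Keep n_cache candidates, drawn without replacement with probability ∝ exp(score)."""
    shifted = scores - np.max(scores)   # shift for numerical stability
    weights = np.exp(shifted)
    probs = weights / np.sum(weights)
    keep = rng.choice(len(candidates), size=n_cache, replace=False, p=probs)
    return candidates[keep]
```

Compared with top-k, lower-scoring candidates keep a non-zero chance of entering the cache, which leaves room for exploration. If the scores are spread so widely that fewer than `n_cache` probabilities stay non-zero, `rng.choice` raises an error, so a practical version would need to guard against that.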
NSCaching (sample from cache)
Since the negative samples in the cache are almost equally good, we sample from them uniformly
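Uniform sampling from the cache can be sketched as below. The dictionary layout (head cache keyed by `(r, t)`, tail cache keyed by `(h, r)`) follows the slides; the function name and the fair coin that decides whether the head or the tail is replaced are assumptions.

```python
import random

def sample_negative_from_cache(triplet, head_cache, tail_cache, rng=random):
    """Draw a cached head and tail uniformly, then corrupt one side of the triplet."""
    h, r, t = triplet
    h_bar = rng.choice(head_cache[(r, t)])   # uniform over the head cache
    t_bar = rng.choice(tail_cache[(h, r)])   # uniform over the tail cache
    # Fair coin (an assumption): replace either the head or the tail.
    if rng.random() < 0.5:
        return (h_bar, r, t)
    return (h, r, t_bar)

head_cache = {("FounderOf", "Apple Inc."): ["Bill Gates", "Larry Page"]}
tail_cache = {("Steve Jobs", "FounderOf"): ["Microsoft", "Google"]}
negative = sample_negative_from_cache(("Steve Jobs", "FounderOf", "Apple Inc."),
                                      head_cache, tail_cache, rng=random.Random(0))
print(negative)
```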
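NSCaching (overview)
[Figure: the positive triplet $(h, r, t)$ indexes the head cache $\mathcal{H}$ by $(r, t)$ and the tail cache $\mathcal{T}$ by $(h, r)$; $\bar{h}$ and $\bar{t}$ are sampled from $\mathcal{H}_{(r,t)}$ and $\mathcal{T}_{(h,r)}$ to form the negative triplet $(\bar{h}, r, \bar{t})$; the KGE model computes $f(h, r, t)$ and $f(\bar{h}, r, \bar{t})$ for the loss, and the cache is then updated]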
NSCaching (detailed design concerns)
There are other design possibilities for NSCaching, e.g.
• Sample the top 1 in the cache
• Keep the top in the cache
• etc.
Please see our paper for a detailed discussion. The main principle is to balance exploration and exploitation.
Experiments
Effectiveness
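Measurements:
• Given a triplet $(h, r, t)$;
• Compute the score of $(h', r, t)$, $\forall h' \in \mathcal{E}$;
• Get the rank of $h$ among all $h'$;
• Same for $t$.
Metrics:
• MRR (mean reciprocal rank): $\frac{1}{|\mathcal{S}|} \sum_{i=1}^{|\mathcal{S}|} \frac{1}{\mathrm{rank}_i}$
• MR (mean rank): $\frac{1}{|\mathcal{S}|} \sum_{i=1}^{|\mathcal{S}|} \mathrm{rank}_i$
• Hit@10: $\frac{1}{|\mathcal{S}|} \sum_{i=1}^{|\mathcal{S}|} \mathbb{I}(\mathrm{rank}_i < 10)$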
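The ranking protocol and the three metrics can be sketched as follows. The function names are assumptions, TransE stands in for the scoring function, ranks are "raw" (other true triplets are not filtered out), and Hit@10 uses the strict inequality exactly as written above.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE score; broadcasts over a batch of head embeddings."""
    return -np.sum(np.abs(h + r - t), axis=-1)

def rank_of_true_head(h, r, t, entity_emb, rel_emb, score_fn=transe_score):
    """Rank of the true head h among all candidate heads h' for the fixed (r, t)."""
    scores = score_fn(entity_emb, rel_emb[r], entity_emb[t])
    # Rank = 1 + number of candidates scoring strictly higher than the true head.
    return int(np.sum(scores > scores[h])) + 1

def ranking_metrics(ranks):
    """MRR, MR and Hit@10 computed from a list of ranks."""
    ranks = np.asarray(ranks, dtype=float)
    return {
        "MRR": float(np.mean(1.0 / ranks)),
        "MR": float(np.mean(ranks)),
        "Hit@10": float(np.mean(ranks < 10)),
    }
```

The tail rank is obtained in the same way by scoring every candidate $t'$ for the fixed $(h, r)$, and the metrics are then averaged over the ranks of all test triplets.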
Efficiency
We measure convergence by plotting test performance vs. training time.
Sampling and Updating schemes
• Sampling from cache: uniform, importance sampling (IS), top-1
• Cache update: importance sampling (IS), top-k
[Figure panels: different sampling schemes; different updating schemes]
Stability
We vary the cache size $N_1$ over {10, 30, 50, 70, 90} with $N_2 = 50$ fixed, and the random subset size $N_2$ over {10, 30, 50, 70, 90} with $N_1 = 50$ fixed.
[Figure panels: different $N_1$; different $N_2$]
Visualization
Given the positive triplet (manorama, profession, actor), we randomly select and visualize some entities in the tail cache $\mathcal{T}_{(manorama,\, profession)}$ during training.
Summary
A novel negative sampling method.
Why it works
• It can dynamically hold high-quality negative samples;
• Sampling is efficient and extra memory is small;
• Both the sampling and updating schemes are carefully designed to balance exploration and exploitation;
• The caching scheme is connected to self-paced learning.
Future work
• Use an advanced index structure for the cache to further improve efficiency and reduce cache size;
• Adapt the method to negative sampling in other tasks such as word embedding, network embedding, and PU learning;
• Theoretical analysis of convergence;
• Use AutoML to make NSCaching adapt better to other datasets.