MUSE: MODULARIZING UNSUPERVISED SENSE EMBEDDINGS miuyvchen/doc/EMNLP17_MUSE_poster.pdf ·...
Transcript of MUSE: MODULARIZING UNSUPERVISED SENSE EMBEDDINGS miuyvchen/doc/EMNLP17_MUSE_poster.pdf ·...
MUSE: MODULARIZING UNSUPERVISED SENSE EMBEDDINGSGuang-He Lee and Yun-Nung (Vivian) Chen
https://github.com/MiuLab/MUSE
Motivation
ConclusionExperiment 2: Synonym Selection
o Efficiency: purely sense-level
representation learning with linear-time
sense decoding
o Modeling: single objective for modular
unsupervised sense embedding learning
o Learning: leverage RL to model the sense
selection process
o Exploration: introduce various exploration
mechanisms for the sense selection for
robustness
o Experiment: state-of-the-art performance
[email protected] [email protected]
guanghelee.github.io vivianchen.idv.tw
Why sense embeddings
o Words are polysemous, but their embeddings are usually not
o The word embedding is restricted by the triangle inequality
o Sense embeddings can easily circumvent such constraint
Smartphone companies including blackberry, and sony will be invited.apple
apple-1 apple-2sense selection
sense embedding
Key mechanisms
Traditional Frameworks
Clustering based on contexts as sense ID
o Efficient sense selection
o Embedding learning involves word tokens
word vec in context
cluster 1 cluster 2 …
sense vec
word vec in context
S𝑖1 S𝑗1 S𝑘1 S𝑙1 S𝑚1
S𝑖2 S𝑗2 S𝑘2 S𝑙2 S𝑚2
S𝑖3 S𝑗3 S𝑘3 S𝑙3 S𝑚3
… … … … …… … … … …
𝑤𝑖 𝑤𝑗 𝑤𝑘 𝑤𝑙 𝑤𝑚
Sense selection by a distribution
o No word tokens are involved
o Inefficient EM algorithms
The Proposed Approach: MUSE
Advantages from both worlds
coarse-grained embedding
fine-grained embedding
efficient sense selection
inefficient sense selection
word+senseembedding
sense embedding
Issue 1: impossible for a single model
Solution: modular framework
o Distinct modules for specific mechanisms
Issue 2: how to formulate single objective?
Solution: Markov Decision Process (MDP)
o sense selection → state/action in MDP
o sense representation → reward in MDP
sense representationsense selection
sense selection: linear neural network
o Policy-based:
o Value-based:
sense representation: skip-gram
o
Issue 3: how to optimize modules?
Solution: Reinforcement Learning (RL)
o Policy-based: maximizes the expected rewards
o Value-based: estimates the rewards directly
Issue 4: how to conduct sense selection?
Solution: Exploration
o Policy gradient: sampling
o Value-based: greedy, ε-Greedy, Boltzmann sampling
Approach MaxSimC AvgSimC
Huang et al., 2012 26.1 65.7
Neelakantan et al., 2014 60.1 69.3
Tian et al., 2014 63.6 65.4
Li & Jurafsky, 2015 66.6 66.8
Bartunov et al., 2016 53.8 61.2
Qiu et al., 2016 64.9 66.1
MUSE-Policy 66.1 67.4
MUSE-Greedy 66.3 68.3
MUSE-ε-Greedy 67.4+ 68.6
MUSE-Boltzmann 67.9+ 68.7
Experiment 1: Contextual Word Similarities
clusterin
gp
rob
abilistic
𝑠𝑡𝑜𝑛𝑒 𝑠ℎ𝑎𝑘𝑒
𝑟𝑜𝑐𝑘
word embedding schema
𝑠𝑡𝑜𝑛𝑒 𝑠ℎ𝑎𝑘𝑒
𝑟𝑜𝑐𝑘1 𝑟𝑜𝑐𝑘2
≈≈
sense embedding schema
Setup
Dataset: April 2010 Wikipedia dump
Context window: 5
Embedding dim: 300
Evaluation: similarity on contextual word pairs
… east bank of the Des Moines River …
… basis of all money laundering …
o Compared target: human-judged similarity
bank
bank-1 bank-20.4
He borrowed the money from
0.6 x + 0.4 x
(1) MaxSimC
(2) AvgSimC
0.6
MUSE achieves the state-of-the-art on MaxSimC
Q: yield (A) submit (B) challenge (C) boast (D) scorn Task:
𝑧𝑖1
Sense Representation Module
…𝑃(𝑧𝑗2|𝑧𝑖1) 𝑃(𝑧𝑢𝑣|𝑧𝑖1)
negative sampling
matrix 𝑉
matrix 𝑈
𝑞(𝑧𝑖2| ഥ𝐶𝑡) 𝑞(𝑧𝑖3| ഥ𝐶𝑡)
Sense Selection Module
…𝐶𝑡 = 𝑤𝑖𝐶𝑡−1
𝑞(𝑧𝑖1| ഥ𝐶𝑡)
Sense selection for target word 𝐶𝑡
matrix 𝑄𝑖
matrix 𝑃… 𝐶𝑡+1
including apple blackberrycompanies and
Corpus: { Smartphone companies including apple blackberry, and sony will be invited.}sample collocation
sense selectio
n→
reward
←
𝑞(𝑧𝑗2|𝐶𝑡′) 𝑞(𝑧𝑗3|𝐶𝑡′)
Sense Selection Module
…𝐶𝑡′ = 𝑤𝑗𝐶𝑡′−1
𝑞(𝑧𝑗1|𝐶𝑡′)
Sense selection for collocated word 𝐶𝑡′
matrix 𝑄𝑗
matrix 𝑃… 𝐶𝑡′+1
including apple blackberry and
sense selection ←2
23
1 Model architecture
sony
tree
trees
rock
rocks
more
practical Approach ESL-50 RD-300 TOEFL-80
Global Context 47.73 45.07 60.87
SkipGram 52.08 55.66 66.67
IMS+SkipGram 41.67 53.77 66.67
EM 27.08 33.96 40.00
MSSG (Neelakantan et al., ‘14) 57.14 58.93 78.26
CRP (Li & Jurafsky, ‘15) 50.00 55.36 82.61
MUSE-Policy 52.38 51.79 79.71
MUSE-Greedy 57.14 58.93 79.71
MUSE-ε-Greedy 61.90+ 62.50+ 84.06+
MUSE-Boltzmann 64.29+ 66.07+ 88.41+
Retro-GlobalContext 63.64 66.20 71.01
Retro-SkipGram 56.25 65.09 73.33
word embeddingsword sense
disambiguation
unsupervised sense
embeddings
supervised sense
embeddings
MUSE with exploration outperforms all baselines.MUSE can effectively separate different senses in an unsupervised way.
MUSE beat some supervised systems w/o any supervision.
miu
Code Available:
http://github.com/MiuLab/MUSE