Evaluation of Utility of LSA for Word Sense Discrimination
Esther Levin, Mehrbod Sharifi, Jerry Ball
http://www-cs.ccny.cuny.edu/~esther/research/lsa/
Slide 2
Outline
- Latent Semantic Analysis (LSA)
- Word sense discrimination through the Context Group Discrimination Paradigm
- Experiments
  - Sense-based clusters (supervised learning)
  - K-means clustering (unsupervised learning)
- Homonyms vs. polysemes
- Conclusions
Slide 3
Latent Semantic Analysis (LSA) (Deerwester '90)
- Represents words and passages as vectors in the same (low-dimensional) semantic space
- Similarity in word meaning is defined by the similarity of their contexts
Slide 4
LSA Steps
1. Document-term co-occurrence matrix, e.g., 1151 documents x 5793 terms
2. Compute the SVD
3. Reduce dimension by keeping the k largest singular values
4. Compute the new vector representations for the documents
5. [Our research] Cluster the new context vectors
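The five steps above can be sketched with plain numpy; the toy corpus and vocabulary here are illustrative placeholders, not the actual 1151 x 5793 matrix:

```python
import numpy as np

docs = ["the line was busy", "drop me a line",
        "stand in line", "a busy phone line"]
vocab = sorted({w for d in docs for w in d.split()})

# Step 1: document-term co-occurrence matrix (raw counts).
A = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# Step 2: SVD of the document-term matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 3: keep only the k largest singular values.
k = 2
U_k, s_k = U[:, :k], s[:k]

# Step 4: new k-dimensional document (context) vectors.
doc_vectors = U_k * s_k          # one row per document

# Step 5 (this work): these rows are what get clustered.
print(doc_vectors.shape)         # (4, 2)
```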
Slide 5
Context Vectors of an Ambiguous Word
Inducing the senses of ambiguous words from their contextual similarity: the Context Group Discrimination Paradigm (Schütze '98)
Slide 6
Context Group Discrimination Paradigm (Schütze '98)
[Figure: context vectors forming two groups, Sense 1 and Sense 2, with within-group distance a smaller than between-group distance b (a < b)]
1. Cluster the context vectors
2. Compute the centroids (sense vectors)
3. Classify new contexts based on distance to the centroids
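A minimal numpy sketch of the three steps, using synthetic 2-D stand-ins for the LSA context vectors of one ambiguous word (Euclidean K-means with a deterministic initialization, for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two artificial "senses": context vectors around two centers.
contexts = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                      rng.normal(3.0, 0.1, (50, 2))])

# 1. Cluster the context vectors (plain 2-means).
# Init with one context from each region, to keep the sketch deterministic.
centroids = contexts[[0, -1]].copy()
for _ in range(20):
    d = np.linalg.norm(contexts[:, None] - centroids[None], axis=2)
    labels = d.argmin(axis=1)
    centroids = np.array([contexts[labels == j].mean(axis=0) for j in (0, 1)])

# 2. The centroids are the sense vectors.
# 3. Classify a new context by its nearest sense vector.
new_context = np.array([2.9, 3.1])
sense = np.linalg.norm(centroids - new_context, axis=1).argmin()
```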
Slide 7
Experiments
Slide 8
Experimental Setup
Corpus (Leacock '93):
- Line (3 senses, 1151 instances)
- Hard (2 senses, 752 instances)
- Serve (2 senses, 1292 instances)
- Interest (3 senses, 2113 instances)
Context size: full document (a small paragraph)
Number of clusters = number of senses
Slide 9
Research Objective
How well are the different senses of ambiguous words separated in the LSA-based vector space?
Parameters:
- Dimensionality of the LSA representation
- Distance measure: L1 (city block), L2 (squared Euclidean), cosine
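The three distance measures, written out as plain numpy functions:

```python
import numpy as np

def l1(u, v):            # city block (Manhattan) distance
    return np.abs(u - v).sum()

def l2(u, v):            # squared Euclidean distance
    return ((u - v) ** 2).sum()

def cosine_dist(u, v):   # 1 - cosine similarity
    return 1 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(l1(u, v), l2(u, v), cosine_dist(u, v))   # 2.0 2.0 1.0
```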
Slide 10
Sense-based Clusters
- An instance of supervised learning
- An upper bound on the unsupervised performance of K-means or EM
- Not influenced by the choice of clustering algorithm
[Figure: best-case vs. worst-case separation]
Slide 11
Sense-based Clusters: Accuracy
Training: find the sense vectors from 90% of the data
Testing: assign the remaining 10% of the data to the closest sense vector and evaluate by comparing this assignment to the sense tags
Random selection, cross-validation
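This training/testing protocol amounts to a nearest-centroid classifier; a sketch with synthetic labeled contexts for a two-sense word (the real experiments use the tagged Leacock data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic labeled contexts: two well-separated senses.
X = np.vstack([rng.normal(0.0, 0.2, (100, 2)),
               rng.normal(2.0, 0.2, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Random 90/10 split.
idx = rng.permutation(len(X))
train, test = idx[:180], idx[180:]

# Training: the centroid of each sense's contexts is its sense vector.
sense_vectors = np.array([X[train][y[train] == s].mean(axis=0)
                          for s in (0, 1)])

# Testing: assign each held-out context to the closest sense vector
# and compare the assignment to the sense tags.
d = np.linalg.norm(X[test][:, None] - sense_vectors[None], axis=2)
accuracy = (d.argmin(axis=1) == y[test]).mean()
```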
Slide 12
Evaluating Clustering Quality: Tightness and Separation
Dispersion: intra-cluster spread (what K-means minimizes)
Silhouette: combines intra- and inter-cluster distances
- a(i): average distance of point i to all other points in the same cluster
- b(i): average distance of point i to the points in the closest other cluster
- s(i) = (b(i) - a(i)) / max(a(i), b(i))
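The silhouette value can be computed directly from the a(i)/b(i) definitions above; a small numpy sketch on four points in two tight, well-separated clusters (with only two clusters, the "closest other cluster" is simply the other one):

```python
import numpy as np

# Four points in two tight, well-separated clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])

D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
sil = []
for i in range(len(X)):
    same = labels == labels[i]
    same[i] = False
    a = D[i, same].mean()                     # a(i): own cluster
    b = D[i, labels != labels[i]].mean()      # b(i): the other cluster
    sil.append((b - a) / max(a, b))           # s(i)

avg_sil = np.mean(sil)    # close to 1: tight, well-separated clusters
```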
Slide 13
More on the Silhouette Value
- s(i) near 1: the points are perfectly clustered
- s(i) near 0: a point could belong to one cluster or another
- s(i) negative: the point belongs to the wrong cluster
[Figure: point i and its closest cluster; a(i) is the average of all blue lines, b(i) the average of all yellow lines]
Slide 14
Evaluating Clustering Quality: Tightness and Separation
Average silhouette value:
          Cosine     L1        L2
          0.9639     0.7355    0.9271
         -0.0876    -0.0504   -0.0879
Slide 15
15 Sense-based Clusters: Discrimination Accuracy Baseline:
Percentage of the majority sense
Slide 16
16 Sense-based Clusters: Average Silhouette Value
Slide 17
Sense-based Clusters: Results
- Good discrimination accuracy
- Low silhouette value
How is that possible?
Slide 18
Unsupervised Learning with K-means (cosine measure)
Conditions compared:
- Start randomly
- Most compact result (of the random starts)
- Start with the sense vectors
- Sense-based clustering (training/testing)
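The initialization conditions above can be sketched as follows; this toy version uses Euclidean K-means on synthetic data (the experiments themselves use the cosine measure on LSA vectors), comparing random starts (keeping the most compact result) against starting from the sense-based centroids:

```python
import numpy as np

def kmeans(X, init, iters=20):
    # Plain Euclidean K-means from a given initialization.
    c = init.copy()
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
        # Keep the old centroid if a cluster empties out.
        c = np.array([X[lab == j].mean(axis=0) if (lab == j).any() else c[j]
                      for j in range(len(c))])
    disp = ((X - c[lab]) ** 2).sum()   # intra-cluster dispersion
    return c, lab, disp

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (80, 2)),
               rng.normal(2.0, 0.3, (80, 2))])

# "Start randomly": several random starts, keep the most compact result.
runs = [kmeans(X, X[rng.choice(len(X), 2, replace=False)])
        for _ in range(10)]
best = min(runs, key=lambda r: r[2])

# "Start with sense vectors": initialize at the labeled-data centroids.
sense_init = np.array([X[:80].mean(axis=0), X[80:].mean(axis=0)])
seeded = kmeans(X, sense_init)
```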
Slide 19
19 Unsupervised Learning with K-means
Slide 20
Polysemes vs. Homonyms
Polysemes: words with multiple related meanings
Homonyms: words with the same spelling but completely different meanings
Slide 21
Pseudo Words as Homonyms (Schütze '98)
Original contexts:
- find it hard to believe
- exactly how to say a line
- and about 30 minutes and serve warm
- set the interest rate on the
With the ambiguous words replaced by the pseudo word x:
- find it x to believe
- exactly how to say a x
- and about 30 minutes and x warm
- set the x rate on the
Slide 22
Polysemes vs. Homonyms in LSA Space
The correlation between the compactness of clusters and discrimination accuracy is higher for homonyms than for polysemes.
[Figure: discrimination accuracy vs. LSA dimensions for pseudo words; points on the red lines mark the most compact cluster out of 10 experiments]
Slide 23
Conclusions
- Good unsupervised sense discrimination performance for homonyms
- Major deterioration in the sense discrimination of polysemes in the absence of supervision
- The benefit of dimensionality reduction is computational only (no peak in performance)
- The cosine measure performs better than L1 and L2