Evaluation of Utility of LSA for Word Sense Discrimination
Esther Levin, Mehrbod Sharifi, Jerry Ball
http://www-cs.ccny.cuny.edu/~esther/research/lsa/
Slide 2
Outline
- Latent Semantic Analysis (LSA)
- Word sense discrimination through the Context Group Discrimination Paradigm
- Experiments
  - Sense-based clusters (supervised learning)
  - K-means clustering (unsupervised learning)
- Homonyms vs. polysemes
- Conclusions
Slide 3
Latent Semantic Analysis (LSA) (Deerwester '90)
- Represents words and passages as vectors in the same (low-dimensional) semantic space
- Similarity in word meaning is defined by the similarity of their contexts
Slide 4
LSA Steps
1. Document-term co-occurrence matrix, e.g., 1151 documents x 5793 terms
2. Compute the SVD
3. Reduce dimension by keeping the k largest singular values
4. Compute the new vector representations for the documents
5. [Our research] Cluster the new context vectors
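The five steps above can be sketched with plain numpy; the toy corpus and vocabulary here are illustrative placeholders, not the actual 1151 x 5793 matrix:

```python
import numpy as np

docs = ["the line was busy", "drop me a line",
        "stand in line", "a busy phone line"]
vocab = sorted({w for d in docs for w in d.split()})

# Step 1: document-term co-occurrence matrix (raw counts).
A = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# Step 2: SVD of the document-term matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 3: keep only the k largest singular values.
k = 2
U_k, s_k = U[:, :k], s[:k]

# Step 4: new k-dimensional document (context) vectors.
doc_vectors = U_k * s_k          # one row per document

# Step 5 (this work): these rows are what get clustered.
print(doc_vectors.shape)         # (4, 2)
```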
Slide 5
Context Vectors of an Ambiguous Word
Inducing the senses of ambiguous words from their contextual similarity: the Context Group Discrimination Paradigm (Schütze '98)
Slide 6
Context Group Discrimination Paradigm (Schütze '98)
[Figure: context vectors forming two groups, Sense 1 and Sense 2, with within-group distance a smaller than between-group distance b (a < b)]
1. Cluster the context vectors
2. Compute the centroids (sense vectors)
3. Classify new contexts based on distance to the centroids
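A minimal numpy sketch of the three steps, using synthetic 2-D stand-ins for the LSA context vectors of one ambiguous word (Euclidean K-means with a deterministic initialization, for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two artificial "senses": context vectors around two centers.
contexts = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                      rng.normal(3.0, 0.1, (50, 2))])

# 1. Cluster the context vectors (plain 2-means).
# Init with one context from each region, to keep the sketch deterministic.
centroids = contexts[[0, -1]].copy()
for _ in range(20):
    d = np.linalg.norm(contexts[:, None] - centroids[None], axis=2)
    labels = d.argmin(axis=1)
    centroids = np.array([contexts[labels == j].mean(axis=0) for j in (0, 1)])

# 2. The centroids are the sense vectors.
# 3. Classify a new context by its nearest sense vector.
new_context = np.array([2.9, 3.1])
sense = np.linalg.norm(centroids - new_context, axis=1).argmin()
```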
Slide 7
Experiments
Slide 8
Experimental Setup
Corpus (Leacock '93):
- Line (3 senses, 1151 instances)
- Hard (2 senses, 752 instances)
- Serve (2 senses, 1292 instances)
- Interest (3 senses, 2113 instances)
Context size: full document (a small paragraph)
Number of clusters = number of senses
Slide 9
Research Objective
How well are the different senses of ambiguous words separated in the LSA-based vector space?
Parameters:
- Dimensionality of the LSA representation
- Distance measure: L1 (city block), L2 (squared Euclidean), cosine
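The three distance measures, written out as plain numpy functions:

```python
import numpy as np

def l1(u, v):            # city block (Manhattan) distance
    return np.abs(u - v).sum()

def l2(u, v):            # squared Euclidean distance
    return ((u - v) ** 2).sum()

def cosine_dist(u, v):   # 1 - cosine similarity
    return 1 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(l1(u, v), l2(u, v), cosine_dist(u, v))   # 2.0 2.0 1.0
```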
Slide 10
Sense-based Clusters
- An instance of supervised learning
- An upper bound on the unsupervised performance of K-means or EM
- Not influenced by the choice of clustering algorithm
[Figure: best-case vs. worst-case separation]
Slide 11
Sense-based Clusters: Accuracy
Training: find the sense vectors from 90% of the data
Testing: assign the remaining 10% of the data to the closest sense vector and evaluate by comparing this assignment to the sense tags
Random selection, cross-validation
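This training/testing protocol amounts to a nearest-centroid classifier; a sketch with synthetic labeled contexts for a two-sense word (the real experiments use the tagged Leacock data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic labeled contexts: two well-separated senses.
X = np.vstack([rng.normal(0.0, 0.2, (100, 2)),
               rng.normal(2.0, 0.2, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Random 90/10 split.
idx = rng.permutation(len(X))
train, test = idx[:180], idx[180:]

# Training: the centroid of each sense's contexts is its sense vector.
sense_vectors = np.array([X[train][y[train] == s].mean(axis=0)
                          for s in (0, 1)])

# Testing: assign each held-out context to the closest sense vector
# and compare the assignment to the sense tags.
d = np.linalg.norm(X[test][:, None] - sense_vectors[None], axis=2)
accuracy = (d.argmin(axis=1) == y[test]).mean()
```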
Slide 12
Evaluating Clustering Quality: Tightness and Separation
Dispersion: intra-cluster spread (what K-means minimizes)
Silhouette: combines intra- and inter-cluster distances
- a(i): average distance of point i to all other points in the same cluster
- b(i): average distance of point i to the points in the closest other cluster
- s(i) = (b(i) - a(i)) / max(a(i), b(i))
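The silhouette value can be computed directly from the a(i)/b(i) definitions above; a small numpy sketch on four points in two tight, well-separated clusters (with only two clusters, the "closest other cluster" is simply the other one):

```python
import numpy as np

# Four points in two tight, well-separated clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])

D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
sil = []
for i in range(len(X)):
    same = labels == labels[i]
    same[i] = False
    a = D[i, same].mean()                     # a(i): own cluster
    b = D[i, labels != labels[i]].mean()      # b(i): the other cluster
    sil.append((b - a) / max(a, b))           # s(i)

avg_sil = np.mean(sil)    # close to 1: tight, well-separated clusters
```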
Slide 13
More on the Silhouette Value
- s(i) near 1: the points are perfectly clustered
- s(i) near 0: a point could belong to one cluster or another
- s(i) negative: the point belongs to the wrong cluster
[Figure: point i and its closest cluster; a(i) is the average of all blue lines, b(i) the average of all yellow lines]
Slide 14
Evaluating Clustering Quality: Tightness and Separation
Average silhouette value:
          Cosine     L1        L2
          0.9639     0.7355    0.9271
         -0.0876    -0.0504   -0.0879
Slide 15
15 Sense-based Clusters: Discrimination Accuracy Baseline:
Percentage of the majority sense
Slide 16
16 Sense-based Clusters: Average Silhouette Value
Slide 17
Sense-based Clusters: Results
- Good discrimination accuracy
- Low silhouette value
How is that possible?
Slide 18
Unsupervised Learning with K-means (cosine measure)
Conditions compared:
- Start randomly
- Most compact result (of the random starts)
- Start with the sense vectors
- Sense-based clustering (training/testing)
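The initialization conditions above can be sketched as follows; this toy version uses Euclidean K-means on synthetic data (the experiments themselves use the cosine measure on LSA vectors), comparing random starts (keeping the most compact result) against starting from the sense-based centroids:

```python
import numpy as np

def kmeans(X, init, iters=20):
    # Plain Euclidean K-means from a given initialization.
    c = init.copy()
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
        # Keep the old centroid if a cluster empties out.
        c = np.array([X[lab == j].mean(axis=0) if (lab == j).any() else c[j]
                      for j in range(len(c))])
    disp = ((X - c[lab]) ** 2).sum()   # intra-cluster dispersion
    return c, lab, disp

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (80, 2)),
               rng.normal(2.0, 0.3, (80, 2))])

# "Start randomly": several random starts, keep the most compact result.
runs = [kmeans(X, X[rng.choice(len(X), 2, replace=False)])
        for _ in range(10)]
best = min(runs, key=lambda r: r[2])

# "Start with sense vectors": initialize at the labeled-data centroids.
sense_init = np.array([X[:80].mean(axis=0), X[80:].mean(axis=0)])
seeded = kmeans(X, sense_init)
```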
Slide 19
19 Unsupervised Learning with K-means
Slide 20
Polysemes vs. Homonyms
Polysemes: words with multiple related meanings
Homonyms: words with the same spelling but completely different meanings
Slide 21
Pseudo Words as Homonyms (Schütze '98)
Original contexts:
- find it hard to believe
- exactly how to say a line
- and about 30 minutes and serve warm
- set the interest rate on the
With the ambiguous words replaced by the pseudo word x:
- find it x to believe
- exactly how to say a x
- and about 30 minutes and x warm
- set the x rate on the
Slide 22
Polysemes vs. Homonyms in LSA Space
The correlation between the compactness of clusters and discrimination accuracy is higher for homonyms than for polysemes.
[Figure: discrimination accuracy vs. LSA dimensions for pseudo words; points on the red lines mark the most compact cluster out of 10 experiments]
Slide 23
Conclusions
- Good unsupervised sense discrimination performance for homonyms
- Major deterioration in the sense discrimination of polysemes in the absence of supervision
- The benefit of dimensionality reduction is computational only (no peak in performance)
- The cosine measure performs better than L1 and L2