Asymmetric Word Similarity
Behrad Assadian
Trevor Martin
Ben Azvine
Introduction
• An approach to understanding text documents
• Captures the semantics of textual information
• Builds a matrix of word similarities
• Applicable to a particular domain
• Uses a corpus of text documents
• Resolves issues encountered by traditional methods, such as reliance on exact string matching
• Can be used to measure document similarity and to cluster documents
• The meaning of an unknown word can often be guessed from its context (Pantel P, Lin D)
A bottle of Tezguno is on the table. Everyone likes Tezguno. Tezguno makes you drunk. We make Tezguno out of corn.
Using the Distributional Hypothesis, it can be deduced that “Tezguno” is a type of alcoholic drink
Asymmetric Word Similarity Matrix
Based on identifying the frequencies of n-grams of context words
e.g. c1-x-c2 is represented as x:([c1,c2])
Consider:
The quick brown fox jumps over the lazy dog.
The quick brown cat jumps onto the active dog.
The slow brown fox jumps onto the quick brown cat.
The quick brown cat leaps over the quick brown fox.
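The context extraction above can be sketched in Python. This is a minimal sketch, assuming lower-cased alphabetic tokens and that a word only gets a context when it has a neighbour on both sides within the same sentence (the slides do not say how sentence boundaries are handled); `context_counts` is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict

SENTENCES = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown cat jumps onto the active dog.",
    "The slow brown fox jumps onto the quick brown cat.",
    "The quick brown cat leaps over the quick brown fox.",
]


def context_counts(sentences):
    """Count the (c1, c2) contexts of every word x, i.e. c1-x-c2 -> x:([c1, c2])."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Assumption: lower-case the text and keep only alphabetic tokens.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Assumption: the first and last word of a sentence lack a full context and are skipped.
        for i in range(1, len(tokens) - 1):
            counts[tokens[i]][(tokens[i - 1], tokens[i + 1])] += 1
    return counts


contexts = context_counts(SENTENCES)
print(dict(contexts["brown"]))
# → {('quick', 'fox'): 2, ('quick', 'cat'): 3, ('slow', 'fox'): 1}
```

For brown this gives (quick, cat) three times, (quick, fox) twice and (slow, fox) once, which are the frequencies behind the fuzzy set for brown.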
• A fuzzy set represents the context of a word
• e.g. for brown, whose contexts occur 3, 2 and 1 times in the sentences above:
{(quick, cat): 1, (quick, fox): 0.833, (slow, fox): 0.50}
• Frequencies are converted to fuzzy sets
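The slide does not give the conversion formula, but the values for brown are reproduced exactly by the standard mass-assignment conversion from a probability distribution to a fuzzy set: with relative frequencies sorted so that p1 ≥ p2 ≥ … ≥ pn, the membership is μi = i·pi + Σ_{j>i} pj. For brown, p = (3/6, 2/6, 1/6) gives μ = (1/2 + 1/2, 2·2/6 + 1/6, 3·1/6) = (1, 0.833, 0.5). A minimal sketch, assuming this is the intended conversion; `frequencies_to_fuzzy` is a hypothetical name.

```python
def frequencies_to_fuzzy(counts):
    """Convert context frequencies to a fuzzy set.

    With relative frequencies sorted so that p1 >= p2 >= ... >= pn,
    the membership of the i-th context is mu_i = i * p_i + sum_{j > i} p_j.
    """
    total = sum(counts.values())
    # Sort contexts by descending frequency.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    probs = [count / total for _, count in ranked]
    fuzzy = {}
    for i, (context, _) in enumerate(ranked, start=1):
        # probs[i:] holds p_{i+1} .. p_n, i.e. the p_j with j > i.
        fuzzy[context] = i * probs[i - 1] + sum(probs[i:])
    return fuzzy


brown = {("quick", "cat"): 3, ("quick", "fox"): 2, ("slow", "fox"): 1}
for context, mu in frequencies_to_fuzzy(brown).items():
    print(context, round(mu, 3))
# ('quick', 'cat') 1.0
# ('quick', 'fox') 0.833
# ('slow', 'fox') 0.5
```

The most frequent context always receives membership 1, so the resulting fuzzy set is normalised, and equally frequent contexts receive equal membership.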
• The fuzzy context sets undergo mass assignment followed by semantic unification
• The result is a single probability value
• For two words w1 and w2:
Pr(w1|w2)
is the degree to which w1 could replace w2
• In general Pr(w1|w2) ≠ Pr(w2|w1), so the similarity is asymmetric
• Performing semantic unification for every pair of words gives the word similarity matrix
• Many elements of the matrix are zero
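Mass assignment and semantic unification can be sketched as follows. This assumes point semantic unification from mass assignment theory with a uniform distribution inside each focal set, Pr(f | g) = Σ_F Σ_G m_f(F) m_g(G) |F ∩ G| / |G|, and that Pr(w1|w2) is obtained by unifying the context fuzzy set of w1 with that of w2. The slides do not give the formula, and the function names are hypothetical. The fuzzy sets for fox and cat are the ones the four example sentences yield when words at sentence edges are skipped.

```python
def mass_assignment(fuzzy):
    """Mass assignment of a normalised fuzzy set.

    Sort elements so mu_1 >= ... >= mu_n; the nested focal set
    {x_1, ..., x_i} gets mass mu_i - mu_{i+1} (with mu_{n+1} = 0).
    Assumes the maximum membership is 1, so no mass goes to the empty set.
    """
    ranked = sorted(fuzzy.items(), key=lambda item: item[1], reverse=True)
    masses = []
    for i, (_, mu) in enumerate(ranked):
        next_mu = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        if mu - next_mu > 0:
            focal = frozenset(x for x, _ in ranked[: i + 1])
            masses.append((focal, mu - next_mu))
    return masses


def semantic_unification(f, g):
    """Point semantic unification Pr(f | g), assuming a uniform prior within focal sets."""
    total = 0.0
    for F, m_f in mass_assignment(f):
        for G, m_g in mass_assignment(g):
            total += m_f * m_g * len(F & G) / len(G)
    return total


# Context fuzzy sets derived from the four example sentences.
contexts = {
    "fox": {("brown", "jumps"): 1.0},
    "cat": {("brown", "jumps"): 1.0, ("brown", "leaps"): 1.0},
}

# matrix[w1][w2] = Pr(w1 | w2): the degree to which w1 could replace w2.
matrix = {
    w1: {w2: semantic_unification(contexts[w1], contexts[w2]) for w2 in contexts}
    for w1 in contexts
}
print(matrix["cat"]["fox"], matrix["fox"]["cat"])  # 1.0 0.5
```

Every context of fox is also a context of cat, so cat can replace fox fully (1.0), while fox fits only half of cat's contexts (0.5). Words with no shared context unify to 0, which is why many matrix elements are zero.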
Document Clustering
• Documents can be clustered using the AWS matrix
• A well-known alternative is the Vector Space Model
• Limitation: it relies on string matching
• The similarity of words such as taxi and cab is therefore missed
• A document similarity matrix is built
• The distance between two documents can then be identified
• Files are clustered around a starting file
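The slides do not say how word similarities are combined into a document similarity. One hypothetical way, shown only as a sketch: score document B against document A as the mean, over A's words w, of the best Pr(v | w) for any word v of B (identical words count as 1), then cluster every document whose score against the starting file reaches a threshold. The function names, the toy documents and the taxi/cab value 0.9 are made up for illustration.

```python
def word_sim(v, w, matrix):
    """Pr(v | w) from a word similarity matrix; identical words score 1."""
    if v == w:
        return 1.0
    return matrix.get(v, {}).get(w, 0.0)


def doc_similarity(doc_a, doc_b, matrix):
    """Hypothetical score: mean over words w of doc_a of the best Pr(v | w), v in doc_b."""
    if not doc_a or not doc_b:
        return 0.0
    best = [max(word_sim(v, w, matrix) for v in doc_b) for w in doc_a]
    return sum(best) / len(best)


def cluster_around(seed, docs, matrix, threshold):
    """Return the documents whose similarity to the starting file reaches the threshold."""
    return [
        name
        for name, words in docs.items()
        if name != seed and doc_similarity(docs[seed], words, matrix) >= threshold
    ]


# Toy documents and a made-up similarity value, for illustration only.
docs = {
    "d1": ["taxi", "fare", "airport"],
    "d2": ["cab", "fare", "airport"],
    "d3": ["film", "review", "actor"],
}
matrix = {"cab": {"taxi": 0.9}, "taxi": {"cab": 0.9}}

print(round(doc_similarity(docs["d1"], docs["d2"], matrix), 3))  # 0.967
print(round(doc_similarity(docs["d1"], docs["d2"], {}), 3))      # 0.667 (string matching only)
print(cluster_around("d1", docs, matrix, threshold=0.8))          # ['d2']
```

Without the word similarity matrix, taxi and cab do not match and the score drops from about 0.967 to 0.667, which is the string-matching limitation of the Vector Space Model. Raising or lowering `threshold` tightens or loosens the cluster.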
Results
• Film descriptions
• Reviews of movies
• Tested using WordNet and by inspection
• Synonyms/antonyms identified
• Close hypernyms identified
• Exhaustive search: total antonyms/synonyms/hypernyms that exist but were not identified
• Hit rates of 67%, 28% and 30%
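One reading of the hit rate, assuming it compares the relations the method identified with the total established by the exhaustive search (the slides do not define it):

```latex
\text{hit rate} = \frac{N_{\text{identified}}}{N_{\text{identified}} + N_{\text{missed}}}
```

Here N_identified counts the synonyms, antonyms or hypernyms the method found and WordNet or inspection confirmed, and N_missed counts those that exist but were not identified.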
Clustering results
• Corpus of movie reviews
• Clustered results can be compared
• A threshold value can be set
[Figure: clustering results; labels: 1-4 Aviator, 1-4 Chorus, 1-4 Being Julia, 1-4 Ray]
• Proposed a method for clustering documents using Asymmetric Word Similarity
• Results checked against WordNet are encouraging
• Using context to determine semantics can be effective
• Further comparison with other common methods is needed
• Performance issues for large corpora must be addressed