EmojiNet: An Open Service and API for Emoji Sense Discovery
![Page 1: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/1.jpg)
EmojiNet: An Open Service and API for Emoji Sense Discovery
Presented By - Sanjaya Wijeratne
Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran, EmojiNet: An Open Service and API for Emoji Sense Discovery, In 11th International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada; 2017.
Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran
![Page 2: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/2.jpg)
Problems with current State-of-the-art
● Limitations of the current version of EmojiNet:
○ Supports only 35% of all emoji supported by the Unicode Consortium (845 out of 2,389)
○ Emoji sense definitions are very short (10 ~ 15 words)
○ No support for platform-specific emoji meanings
○ Not available for download as a dataset
○ Does not support REST API access
![Page 3: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/3.jpg)
What is new in EmojiNet
● Supports all 2,389 emoji supported by the Unicode Consortium
○ 2,389 emoji (roughly a 3-fold increase)
○ 12,904 sense definitions (4-fold increase)
● Sense-embeddings learned over text corpora
○ Twitter and Google News corpora are used to learn word embeddings to further strengthen sense definitions
● Platform-specific meanings for 40 commonly misunderstood emoji obtained through an Amazon Mechanical Turk task
● Public release of the EmojiNet dataset with REST API access
![Page 6: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/6.jpg)
Sense Filtering
● Our sense pool contained 50,115 senses in total
○ 21,779 of them were not valid English
○ We evaluated the remaining 28,336 sense labels
■ 15,432 sense labels were removed as incorrect (noisy data extracted from Emoji Dictionary)
○ The remaining 12,904 sense labels were considered for sense disambiguation
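The counts on the Sense Filtering slide can be checked against each other with simple subtraction. The short sketch below does that; the variable names are illustrative only and are not taken from EmojiNet.

```python
# Counts reported on the Sense Filtering slide.
total_senses = 50_115      # size of the initial sense pool
not_english = 21_779       # dropped in the first (English) filter
removed_as_noise = 15_432  # dropped after manual evaluation

evaluated = total_senses - not_english
kept = evaluated - removed_as_noise

print(evaluated)  # 28336
print(kept)       # 12904
print(f"{kept / total_senses:.1%} of the pool survives both filters")
```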
![Page 9: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/9.jpg)
EmojiNet Resource Evaluation
● Resource linking based on image similarity performed with 96.27% accuracy
![Page 10: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/10.jpg)
Adding Word Embeddings to EmojiNet
● We trained a Twitter word embedding model using 110 million tweets with emoji. We also used a publicly available Google News word embedding model to learn word vectors
● For every sense of every emoji, each word was replaced by its 20 most related words according to the word embedding models. This led to 3 contexts for each emoji sense
○ BabelNet-based context words
○ Twitter-based context words
○ Google News-based context words
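The replacement step can be sketched as a nearest-neighbour lookup in an embedding space. The example below is a minimal illustration, assuming cosine similarity as the relatedness measure (the slides do not say which measure was used). The toy vectors and the function names `most_related` and `expand_context` are invented for this sketch and are not part of EmojiNet.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_related(word, vectors, k=20):
    """Return up to k vocabulary words closest to `word` (hypothetical helper)."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]

def expand_context(sense_words, vectors, k=20):
    """Replace every word of a sense by its k most related words."""
    context = set()
    for word in sense_words:
        context.update(most_related(word, vectors, k))
    return context

# Toy 2-dimensional vectors, made up for illustration only.
toy_vectors = {
    "laugh": [1.0, 0.1],
    "funny": [0.9, 0.2],
    "joke": [0.8, 0.3],
    "cry": [0.1, 1.0],
    "sad": [0.2, 0.9],
}

print(most_related("laugh", toy_vectors, k=2))  # ['funny', 'joke']
print(sorted(expand_context(["laugh", "cry"], toy_vectors, k=1)))  # ['funny', 'sad']
```

Running the same expansion once per embedding model (Twitter, Google News) would yield one context word set per model for each sense.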
![Page 11: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/11.jpg)
Adding Platform-specific senses to EmojiNet
● We conducted an experiment on Amazon Mechanical Turk to understand which senses of a given emoji are platform-specific
○ We selected 40 commonly misunderstood emoji for this
○ We created 14,448 tasks, each asking a worker to evaluate whether a particular platform-specific sense is valid
○ 1,128 tasks were filtered out as spam
![Page 12: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/12.jpg)
Emoji Sense Disambiguation
● We selected the 25 most misunderstood emoji, based on past work, for an emoji sense disambiguation task
○ Randomly selected 50 tweets for each emoji
○ Used the Simplified Lesk algorithm for disambiguation
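Simplified Lesk picks the sense whose associated words overlap most with the words surrounding the target. The sketch below shows the idea for one emoji in one tweet. The sense labels, context words, and tweet are invented for illustration, and the tie-breaking rule (first listed sense wins) is an assumption of this sketch rather than something stated in the slides.

```python
def simplified_lesk(context_words, sense_contexts):
    """Return the sense whose context words overlap most with the message.

    sense_contexts maps a sense label to its list of context words.
    On ties (including zero overlap everywhere) the first listed sense wins.
    """
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense, words in sense_contexts.items():
        overlap = len(context & {w.lower() for w in words})
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical context words for two senses of a single emoji.
sense_contexts = {
    "laugh": ["funny", "joke", "humor", "giggle"],
    "cry": ["sad", "tears", "upset", "sob"],
}

tweet = "that joke was so funny I cannot stop".split()
print(simplified_lesk(tweet, sense_contexts))  # laugh
```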
![Page 13: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/13.jpg)
Emoji Similarity
● We used 100 emoji available in the EmoTwi50 dataset to create a graph based on emoji similarity
○ Emoji are represented as nodes
○ If two emoji share the same sense label, they are connected by an edge
● We used a label propagation algorithm to find clusters in our emoji graph
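The graph construction and clustering described above can be sketched in a few lines. The slides do not say which label propagation variant was used, so this is a simplified deterministic version (fixed node order, ties broken by the smallest label) and not the authors' implementation. The emoji names and sense labels are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def build_graph(emoji_senses):
    """Connect two emoji with an edge when they share at least one sense label."""
    graph = {emoji: set() for emoji in emoji_senses}
    for a, b in combinations(emoji_senses, 2):
        if emoji_senses[a] & emoji_senses[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def label_propagation(graph, max_rounds=100):
    """Each node repeatedly adopts the most common label among its neighbours."""
    labels = {node: node for node in graph}  # every node starts in its own cluster
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):           # fixed order keeps the run deterministic
            if not graph[node]:
                continue                     # isolated nodes keep their own label
            counts = Counter(labels[n] for n in graph[node])
            top = max(counts.values())
            best = min(label for label, c in counts.items() if c == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, set()).add(node)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical sense labels for five emoji.
emoji_senses = {
    "face_with_tears_of_joy": {"laugh", "funny"},
    "grinning_face": {"laugh", "happy"},
    "smiling_face": {"happy", "smile"},
    "crying_face": {"sad", "cry"},
    "loudly_crying_face": {"cry", "tears"},
}

for cluster in label_propagation(build_graph(emoji_senses)):
    print(cluster)
```

On this toy input the laughing and smiling emoji end up in one cluster and the two crying emoji in another.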
![Page 15: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/15.jpg)
Calculate Emoji Similarity using Jaccard Coefficient
● In another experiment, we used Jaccard Similarity on emoji senses to find emoji similarity
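The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. Applied to emoji, each set holds one emoji's sense labels. The sketch below uses made-up sense labels, and returning 0.0 for two empty sets is a convention chosen for this example.

```python
def jaccard(senses_a, senses_b):
    """Jaccard coefficient of two collections of sense labels."""
    a, b = set(senses_a), set(senses_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical sense labels for two emoji.
joy_senses = {"laugh", "funny", "joy"}
grin_senses = {"laugh", "happy", "joy"}

print(jaccard(joy_senses, grin_senses))  # 0.5
```

A score of 1.0 means the two emoji have identical sense sets, and 0.0 means they share none.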
![Page 16: EmojiNet: An Open Service and API for Emoji Sense Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062302/58d0fc511a28abc00b8b643f/html5/thumbnails/16.jpg)
Questions?
Thank You!
Read more about EmojiNet at - http://wiki.knoesis.org/index.php/EmojiNet