[ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community
![Page 1: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/1.jpg)
Sentiment Analysis in Twitter a Study on the Saudi Community
Online talk by: Dr. Nora Altwairesh
Date: 11 Dec, 8:00-9:30pm
![Page 2: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/2.jpg)
www.asa.imamu.edu.sa
Outline
• ASA
• ASA Research Group?
• Housekeeping
• The talk
![Page 3: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/3.jpg)
Sentiment Analysis
• Keyword: iPhone
• Tweets: total tweets' sentiments (Pos, Neg, Neut)
• iPhone is great!
• iPhone connection sucks!
• I bought an iPhone yesterday
• Yeah IPhone has long battery life its even longer than my life :@ (Challenge!)
![Page 4: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/4.jpg)
Outline
• ASA
• ASA Research Group?
• Housekeeping
• The talk
![Page 5: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/5.jpg)
Arabic Sentiment Analysis Research Group
www.asa.imamu.edu.sa @asa__iu
![Page 6: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/6.jpg)
Group Members
| Name | Role |
| --- | --- |
| Dr. Sarah alHumoud | Principal Investigator |
| Dr. Areeb alOwisheq | Co-Investigator |
| Dr. Nora alTwairesh | Senior Investigator |
| Ms. Afnan alMoammar | |
| Ms. AlHanouf alSwilim | |
| Ms. Mawaheb alTowijri | |
| Ms. Tarfa alBuhairi | |
| Ms. Wejdan alOhaideb | |
![Page 7: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/7.jpg)
Arabic Sentiment Analysis Group
• Create Arabic corpora
• Develop a Sentiment Analyzer web service
• Disseminate aims, findings and developed resources:
  • Website
  • Workshops
  • Scientific articles
• ASA Survey (collection, classification, analysis)
• Analyze and compare the performance of different SA methodologies
• Develop an SA classifier with discourse relation
![Page 8: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/8.jpg)
Side Projects
• Annotation
  • 11 Annotators
  • 142,434 Tweets
• Tools demo
  • ASA
  • Spam detection
![Page 9: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/9.jpg)
Coming events
• Sentiment Analysis in Social Media session in
  • HCII2017
• Publications in
  • Lecture Notes in Computer Science (LNCS)
• Deadline
  • 17 Dec 2016
![Page 10: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/10.jpg)
Outline
• ASA
• ASA Research Group?
• Housekeeping
• The talk
![Page 11: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/11.jpg)
Ask and talk?
• For textual questions, use QA
  • If your question is answered it will be public
• To speak
  • Raise your hand
![Page 12: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/12.jpg)
Attendees Countries
[Chart of attendee countries; legend: Saudi Arabia, United Arab Emirates, Other, Oman]
![Page 13: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/13.jpg)
Attendees Majors
[Bar chart of attendee majors; categories: CS, IS, IT, Other, IM, DS, SE; axis from 0 to 60]
![Page 14: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/14.jpg)
Outline
• ASA
• ASA Research Group?
• Housekeeping
• The talk
![Page 15: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/15.jpg)
Sentiment Analysis in Twitter a Study on the Saudi Community
Online talk by: Dr. Nora Altwairesh
Date: 11 Dec, 8:00-9:30pm
![Page 16: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/16.jpg)
The Speaker: Nora Al-Twairesh, Ph.D.
• Assistant Professor,
• Information Technology Department
• College of Computer and Information Sciences,
• King Saud University
• Riyadh, Saudi Arabia
• Website: http://fac.ksu.edu.sa/twairesh
• Research Groups:
  • http://iwan.ksu.edu.sa
  • https://asa.imamu.edu.sa
• Research Interests:
  • Arabic Sentiment Analysis of Social Media text,
  • Arabic Natural Language Processing,
  • Web and Data Mining.
![Page 17: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/17.jpg)
Contents
• Introduction
• What is Sentiment Analysis?
• Why is it Important?
• Sentiment Analysis of Arabic
• Twitter
• Research Motivation
• Research Contributions
• Results
• Conclusion and Future Work
![Page 18: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/18.jpg)
What is Sentiment Analysis?
• Sentiment analysis is “the field of study that analyzes people’s opinions, sentiments, appraisals, attitudes, and emotions toward entities and their attributes expressed in written text” (Liu, 2012)
• Different names: Sentiment Analysis, Opinion mining, opinion extraction, sentiment mining, subjectivity analysis
• Sentiment Analysis classifies text polarity (positive, negative, neutral and mixed)
![Page 19: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/19.jpg)
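What is Sentiment Analysis?
| Tweet | English translation | Sentiment |
| --- | --- | --- |
| مطار_الملك_خالد تغير إيجابي ملحوظ | King Khalid Airport: a noticeable positive change | Positive |
| للأسف أثبت أنه مذيع فاشل جدا | Unfortunately, he proved to be a very poor presenter | Negative |
| ممكن ترشح لي برنامج قاريء باركود ممتاز | Can you recommend an excellent barcode reader app for me? | Neutral |
| قارئ جرير رائع لكن الأسعار غالية | The Jarir reader is great, but the prices are high | Mixed |
| انصحك بالجهاز ممتاز جدا و لكن عيبه ثقيل | I recommend the device, it is excellent, but its drawback is that it is heavy | Mixed |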
![Page 20: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/20.jpg)
Why is it Important?
• The proliferation of social media websites has led to the production of vast amounts of unstructured text on the Web.
• Aggregating and evaluating these opinions manually is a tedious task and could be nearly impossible.
• These opinions are important for organizations (government, business) and for individuals
![Page 21: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/21.jpg)
Sentiment Analysis Methods
• Lexicon-based: rule-based method that utilizes sentiment lexicons.
• Corpus-based: supervised learning that utilizes machine learning classifiers.
![Page 22: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/22.jpg)
Research Motivation
• Hot research field
• Challenges of Arabic language
• Challenges of Twitter data
![Page 23: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/23.jpg)
Arabic Language
• Morphologically Rich Language
  • Extremely challenging to process due to rich morphology and complex word order
• Diglossic situation with a multitude of dialects
  • Modern Standard Arabic: formal language
  • Dialects: informal language
![Page 24: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/24.jpg)
Challenges of SA of Arabic Tweets
• Use of Dialectal Arabic (DA)
• Lack of Arabic Corpora and Datasets
• Lack of Arabic Sentiment Lexicons
![Page 25: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/25.jpg)
• Why Twitter?
![Page 26: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/26.jpg)
• Why Twitter?
  • Mubarak, H., and Darwish K. "Using Twitter to collect a multi-dialectal corpus of Arabic." ANLP 2014 (2014): 1.
  • 175 M Arabic tweets
  • during March 2014
  • 6.5 M tweets
![Page 27: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/27.jpg)
Characteristics of Twitter Data
• Language is informal
• Short: 140 characters or less
• Abbreviations and shortenings
• Wide array of topics and large vocabulary
• Spelling mistakes and creative spellings
• Special strings: hashtags, emoticons, conjoined words
![Page 28: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/28.jpg)
Research Contributions
• Collecting a large dataset of Arabic tweets (2.2M).
• AraSenti-Tweet Corpus: a corpus of Saudi tweets was constructed from the dataset of tweets.
• AraSenti Lexicon: a sentiment lexicon of Arabic words was extracted from the dataset of tweets.
• Constructing an extensive list of Arabic contextual valence shifters (negators, intensifiers, diminishers, modal words and contrast words).
• Lexicon-based method.
• Corpus-based method.
• Hybrid method.
![Page 29: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/29.jpg)
Data Collection
• EMO-TWEET Dataset:
  • Distant supervision: using emoticons as noisy labels (positive emoticons: positive, negative emoticons: negative).
• KEY-TWEET Dataset:
  • Sentiment words as search keywords, e.g. أعجبني ("I liked it"), سيء ("bad")
• Saudi-Tweet Dataset:
  • Tweet or user location set to a Saudi location
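
The distant-supervision idea behind the EMO-TWEET dataset could be sketched roughly as follows. This is only an illustration: the emoticon sets and the function names `noisy_label` and `build_emo_dataset` are assumptions, since the talk does not list the emoticons or the tooling actually used.

```python
# Hypothetical emoticon sets; the talk does not say which emoticons were used.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}


def noisy_label(tweet):
    """Return 'positive', 'negative', or None when emoticons are absent or conflicting."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None


def build_emo_dataset(tweets):
    """Keep only tweets that receive an unambiguous noisy label."""
    labeled = []
    for tweet in tweets:
        label = noisy_label(tweet)
        if label is not None:
            labeled.append((tweet, label))
    return labeled
```

Tweets with both kinds of emoticon are dropped here rather than labeled; that is a design choice of this sketch, not something stated in the talk.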
![Page 30: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/30.jpg)
Data Collection and Preprocessing
![Page 31: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/31.jpg)
Data Collection and Preprocessing
![Page 32: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/32.jpg)
AraSenti-Tweet Corpus
• A set of ~13,000 tweets was selected from the Saudi dataset.
• Most of the annotated tweets in the first stage were positive or negative and we needed to augment the dataset with more neutral tweets, so we collected 4,000 tweets from two Saudi news accounts.
• More tweets (~2,000) were collected to set up the test set.
![Page 33: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/33.jpg)
AraSenti-Tweet Corpus
| Class | No. of Tweets | No. of Tokens |
| --- | --- | --- |
| Positive | 4,957 | 93,601 |
| Negative | 6,155 | 127,182 |
| Neutral | 4,639 | 71,492 |
| Mixed | 1,822 | 39,883 |
| Total | 17,573 | 332,158 |
![Page 34: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/34.jpg)
AraSenti Lexicon
• AraSenti-Trans: Using MADAMIRA, the English glosses of the extracted words from the tweets were compared to English sentiment lexicons using certain heuristics. Then a manual correction was performed
• AraSenti-PMI: The second lexicon was generated through calculating the pointwise mutual information (PMI) measure for all words in the positive and negative datasets of tweets.
• Sentiment Score(w)=PMI(w,pos)-PMI(w,neg)
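
As a rough illustration of the score above, the sketch below computes PMI(w, pos) - PMI(w, neg) from token counts in a positive and a negative tweet set. The function name, the token-frequency estimate of the probabilities, and the additive `smoothing` constant are assumptions made for this example; the talk does not describe how AraSenti-PMI handled words missing from one class.

```python
import math
from collections import Counter


def pmi_sentiment_scores(pos_tweets, neg_tweets, smoothing=1.0):
    """Score(w) = PMI(w, pos) - PMI(w, neg) over tokenized tweets.

    pos_tweets / neg_tweets: lists of token lists.
    smoothing: additive constant (an assumption of this sketch) so that a word
    unseen in one class does not produce log(0).
    """
    pos_counts = Counter(tok for tweet in pos_tweets for tok in tweet)
    neg_counts = Counter(tok for tweet in neg_tweets for tok in tweet)
    n_pos = sum(pos_counts.values())
    n_neg = sum(neg_counts.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both tweet sets must contain at least one token")
    n_total = n_pos + n_neg

    def pmi(joint, word_total, class_total):
        # PMI(w, c) = log2( p(w, c) / (p(w) * p(c)) ), estimated from token frequencies.
        return math.log2((joint * n_total) / (word_total * class_total))

    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        f_pos = pos_counts[word] + smoothing
        f_neg = neg_counts[word] + smoothing
        f_word = f_pos + f_neg
        scores[word] = pmi(f_pos, f_word, n_pos) - pmi(f_neg, f_word, n_neg)
    return scores
```

A positive score suggests the word is more associated with positive tweets, a negative score with negative tweets, and the magnitude gives a notion of intensity.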
![Page 35: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/35.jpg)
Significance of AraSenti Lexicon
• Captures the idiosyncratic nature of social media text.
• Provides sentiment intensity of words, not only the sentiment orientation.
• MSA and DA
• High coverage: 200K words
![Page 36: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/36.jpg)
Arabic Valence Shifters
• Extensive list of Arabic valence shifters extracted from the datasets through similarity measures.
• Negation words, intensifiers, diminishers, modal words, presuppositional and contrast words.
• Different hypotheses were evaluated for negation handling.
![Page 37: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/37.jpg)
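Arabic Valence Shifters
| Example | English translation | Sentiment | Valence shifter |
| --- | --- | --- | --- |
| هذا الكتاب ممتع. | This book is enjoyable. | Positive | None |
| هذا الكتاب غير ممتع. | This book is not enjoyable. | Negative | Negation |
| هذا الرجل سيء الأخلاق لأنه أهان العامل. | This man is ill-mannered because he insulted the worker. | Negative | None |
| لو كان الرجل سيء الأخلاق لأهان العامل. | If the man were ill-mannered, he would have insulted the worker. | Neutral | Modal |
| هذا الكتاب جيد. | This book is good. | Positive | None |
| ظنيت إنه بيكون كتاب جيد. | I thought it would be a good book. | Neutral or Negative | Presuppositional |
| استطاع أن ينجح. | He managed to succeed. | Positive | None |
| بالكاد استطاع أن ينجح. | He barely managed to succeed. | Negative | Presuppositional |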
![Page 38: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/38.jpg)
Sentiment Analysis Methods
• Three sentiment analysis methods:
  • Lexicon-based
  • Corpus-based
  • Hybrid
• Three classification models:
  • Two-way classification (positive, negative)
  • Three-way classification (positive, negative, neutral)
  • Four-way classification (positive, negative, neutral, mixed)
![Page 39: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/39.jpg)
Lexicon-based Method
• Rule-based method that utilizes the AraSenti-lexicon and performs context-aware sentiment analysis by special handling of negation and contextual valence shifters.
• Calculates sentiment score which represents sentiment intensity in addition to polarity.
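
A minimal sketch of a rule-based scorer in this spirit is shown below. The toy lexicon, the shifter lists, the weights and the single negation rule (flip the next sentiment word) are all assumptions for illustration; the talk notes that several negation hypotheses were evaluated and does not specify the rules finally adopted.

```python
# Toy lexicon and shifter lists. Every entry and weight below is an
# illustrative assumption, not taken from the AraSenti lexicon.
LEXICON = {"ممتع": 1.5, "جيد": 1.0, "سيء": -1.5}
NEGATORS = {"غير", "ليس", "لا"}
INTENSIFIERS = {"جدا": 1.5}  # multiplier applied to the preceding sentiment word


def tweet_score(tokens):
    """Sum lexicon scores, flipping the sign of the next sentiment word after a negator."""
    contributions = []
    negate_next = False
    for token in tokens:
        if token in NEGATORS:
            negate_next = True
        elif token in LEXICON:
            value = LEXICON[token]
            contributions.append(-value if negate_next else value)
            negate_next = False
        elif token in INTENSIFIERS and contributions:
            # Scale the most recent sentiment word, e.g. "جيد جدا" (very good).
            contributions[-1] *= INTENSIFIERS[token]
    return sum(contributions)


def polarity(score):
    """Map the numeric score to a polarity label; the sign gives polarity, the size intensity."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy setup, the example "هذا الكتاب غير ممتع" from the valence shifter table scores below zero because the negator flips the contribution of "ممتع".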
![Page 40: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/40.jpg)
Corpus-based Method
• Supervised learning method that utilizes ML classifiers using the AraSenti-Tweet corpus.
• Used an SVM with a linear kernel.
• Features engineered: syntactic, semantic, and Twitter-specific.
• Semantic features include the AraSenti lexicon.
• Performed backward feature selection to reach the best set of features.
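
Backward feature selection can be sketched as a greedy loop that starts from all feature groups and drops one whenever doing so improves the evaluation score. The function below is a generic illustration: `evaluate` is a hypothetical callable (for instance, a cross-validated score of a linear SVM trained on the given groups), and the stopping rule is an assumption, since the talk does not give the exact procedure.

```python
def backward_selection(feature_groups, evaluate):
    """Greedy backward elimination over feature groups.

    feature_groups: list of feature-group names.
    evaluate: callable taking a list of groups and returning a score
              (higher is better); supplied by the caller.
    Returns the selected groups and their score.
    """
    selected = list(feature_groups)
    best = evaluate(selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for group in list(selected):
            candidate = [g for g in selected if g != group]
            score = evaluate(candidate)
            if score > best:
                # Dropping this group helps: keep the smaller set and restart the scan.
                best, selected, improved = score, candidate, True
                break
    return selected, best
```

Each pass retrains the model once per remaining group, so the cost grows quadratically with the number of groups; grouping features (syntactic, semantic, Twitter-specific) keeps that manageable.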
![Page 41: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/41.jpg)
Hybrid Method
• The approach was to incorporate the knowledge extracted from the rule-based method as features into the statistical method.
• The tweet score that is calculated in the lexicon-based method was added to the features used in the corpus-based method.
• The hybrid method exhibited significant increases in performance for two-way and three-way classification.
• However, in four-way classification the performance of the hybrid and corpus-based method was almost the same.
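
The hybrid idea of feeding the lexicon-based tweet score into the classifier as an extra feature could look roughly like this. Both `extract_features` and `score_tweet` are hypothetical callables standing in for the corpus-based feature extractor and the lexicon-based scorer; neither name comes from the talk.

```python
def build_hybrid_matrix(tweets, extract_features, score_tweet):
    """Append the lexicon-based score to each tweet's corpus-based feature vector.

    tweets: iterable of tokenized tweets.
    extract_features: hypothetical callable returning a list of numeric features.
    score_tweet: hypothetical callable returning the lexicon-based tweet score.
    """
    rows = []
    for tweet in tweets:
        features = list(extract_features(tweet))
        features.append(float(score_tweet(tweet)))  # the extra hybrid feature
        rows.append(features)
    return rows
```

The resulting rows can then be passed to the same classifier used in the corpus-based method, so the only change is one additional column.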
![Page 42: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/42.jpg)
Results
| Classification model | Lexicon-based | Corpus-based | Hybrid |
| --- | --- | --- | --- |
| Two-way classification | 67.08 | 65.7 | 69.9 |
| Three-way classification | 45.69 | 59.85 | 61.63 |
| Four-way classification | 34.8 | 55.38 | 55.07 |
![Page 43: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/43.jpg)
Conclusion and Future Work
• Twitter ANLP tool: Arabic language needs enabling technologies for preprocessing Twitter data.
• Other statistical methods for generating the lexicon: Chi-Square and Information Gain.
• A sentiment treebank that allows for a complete analysis of the compositional effects of sentiment in Arabic language would enable better classification.
![Page 44: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/44.jpg)
Conclusion and Future Work
• Better handling of negation and valence shifters through constructing a specialized corpus that contains these valence shifters and annotating them with regard to the impact on sentiment.
• Sarcasm detection in tweets is a vital research direction.
• Future solutions should be domain specific, dialect specific and periodically updated to adhere to the time shift in the language on Twitter.
![Page 45: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/45.jpg)
Conclusion and Future Work
• Major and novel contributions to the field can be accomplished through collaboration of computer scientists, linguists and social scientists.
• Hence, interdisciplinary research is a major necessity for the field to flourish and advance.
![Page 46: [ASA] Sentiment Analysis in Twitter, a Study on the Saudi Community](https://reader035.fdocuments.us/reader035/viewer/2022070513/5884e9971a28abf76f8b46f7/html5/thumbnails/46.jpg)
• Thank you.
• Questions?