Sentiment Detection in Online Content: A WordNet based approach

Soumi Dutta, Moumita Roy, Asit Kumar Das, and Saptarshi Ghosh

Institute of Engineering and Management, Kolkata 700091, India
[email protected]

Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India

Abstract. Online Social Networks (OSNs), such as Facebook, Twitter, and YouTube, are important sources of online content today. These platforms are used by millions of people worldwide to share information and express their sentiments and opinions on various social issues. Sentiment analysis of online content, i.e., automatically inferring whether a particular piece of text reflects a positive (e.g., happy) or negative (e.g., sad) sentiment of the person who posted it, is an important research problem today, and has several potential applications such as analysing public opinion on various products or social issues. In this paper, we propose a simple but effective methodology for inferring the sentiment of textual content posted in online social media. Our approach is based on first identifying the positive / negative polarity of terms, i.e., whether a certain term (e.g., a word) is normally used in a positive or negative context, and then inferring the sentiment of a given text from the polarity of the terms present in it. A key challenge in this approach is that in online social media, different users use different words while expressing similar opinions. To address this, we use the well-known lexical database WordNet to identify groups of words which are synonymous with each other. We apply our proposed methodology on a large publicly available dataset containing content from six different online social media, which has been labeled as positive / negative by human annotators, and find that our methodology achieves better performance than several approaches developed earlier.

Keywords: Online social network, Sentiment analysis, WordNet

1. Introduction

Online Social Networks (OSNs), such as Facebook, Twitter, and YouTube, are presently used by hundreds of millions of users, not only to communicate with friends, but also to post content on various topics of interest. The users of these social networking sites generate huge amounts of content every day (known as “user-generated content”), and this content is increasingly being used for a variety of data mining applications, ranging from content search [18] to opinion mining [10,21].

One of the important research problems in mining user-generated content in OSNs is sentiment analysis, or opinion mining. In its simplest form, this involves detecting whether a piece of text reflects positive sentiment (e.g., happiness, enjoyment) or negative sentiment (e.g., sadness, anger, or fear) of the user who posted / wrote the text. For instance, the text “absolutely awesome” indicates positive sentiment, while the text “i feel really sick” indicates negative sentiment. More complex variants of the problem include graded sentiment analysis, i.e., attempting to infer the degree of positive or negative sentiment; for instance, “i feel really sick” indicates much more negative sentiment than “i feel sick”. However, in this study, we limit our attention to binary sentiment detection, i.e., simply inferring whether a piece of text reflects positive or negative sentiment.

Sentiment analysis involves the use of data mining and natural language processing techniques for inferring the mood or opinion of users from the text they write. Sentiment analysis of text generated in online social media has many important applications. For instance, in marketing, it helps in judging the success of an advertisement campaign or a new product launch (e.g., Samsung launching a new version of the Galaxy phone) by determining whether a product or service is liked by the general population. Similarly, sentiment analysis of tweets posted in the Twitter social network has been used to predict the majority decision in elections [21], and to judge the general mood of people during important socio-economic events.

Sentiment analysis has been studied for several years, and a large number of algorithms have been proposed for sentiment detection from English text [11] (see Sect. 2 for related work). It should be noted that sentiment analysis is a challenging problem for several reasons. First, different people have different ways of expressing sentiment, which makes it difficult to design a common methodology for all users. Second, most traditional text processing algorithms rely on the fact that small differences between two pieces of text do not change the meaning significantly. However, in sentiment analysis, the text “Bob is good” is very different (in fact, has opposite polarity) from “Bob is not good”.

Over and above the fact that sentiment analysis is a challenging problem in general, there are several additional challenges in sentiment analysis of content posted in online social media. The content posted in such media is usually very short, e.g., a tweet in the Twitter social network can be at most 140 characters in length. It is especially challenging to detect the sentiment of a user based on such little information, because of the lack of context. More importantly, while posting text in online social media, users frequently use informal, grammatically incorrect language. They also use conversational slang words, acronyms (e.g., ‘LOL’ for ‘Laugh Out Loud’ or ‘OMG’ for ‘Oh My God’), and so on, to indicate their sentiment. As a result, algorithms designed for formal English text often do not give accurate results for text in online social media [10]. Also note that people often mix positive and negative sentiments in the same text. In normal text (such as books or newspaper articles), positive and negative comments are usually contained in separate sentences, which is somewhat manageable by analyzing one sentence at a time. However, in more informal media like online social media or blogs, users often combine different opinions in the same sentence, such as “The Indian team played very well, but were plagued by bad umpiring decisions”: though this is easy for a human to understand, it is more difficult for a machine to judge the sentiment.

In this paper, we propose a simple but effective methodology to infer the sentiment of text posted in online social media (details in Sect. 3). We take a lexical approach, where we first attempt to infer sentiment scores for specific tokens or words. The sentiment scores of a word indicate the likelihood of the word being used to convey a positive or negative sentiment. For instance, the word “impressive” is much more likely to be used in a positive sense than a negative sense; on the contrary, the word “ugly” is predominantly used in a negative sense. Then, we infer the sentiment of a given piece of text by using the sentiment scores of the terms contained in the text. One of the principal difficulties in this approach is that in online social media, different users have widely varying styles of writing text, and they use very different words to indicate similar sentiment. To address this, we use the well-known WordNet lexical database [22] to identify groups of words which are synonymous with each other, and then treat all synonymous words (tokens) uniformly.

We applied our proposed methodology over six publicly available datasets [19] containing textual messages from six different online social media: YouTube, Twitter, MySpace, Runners World, BBC, and Digg (details in Sect. 4). We also compared the performance of our approach with that of several state-of-the-art sentiment detection approaches. Our evaluation (details in Sect. 5) shows that though our proposed method is very simple, it achieves similar or better accuracy than most of the state-of-the-art approaches.

2. Related Work

A lot of recent research has focused on inferring the sentiments and opinions of users from content generated in online social networks [10,21]. Broadly, there are two different types of approaches for sentiment detection [9]: (i) machine-learning-based approaches, and (ii) lexical-based approaches. We discuss some such studies in this section. The reader is referred to [9,12] for a detailed survey and comparison among different sentiment detection approaches.

Machine learning based methods are developed based on supervised classification, where sentiment detection is modeled as a binary classification problem (i.e., positive or negative). For instance, Go et al. [8] proposed to use different n-gram based features in conjunction with part-of-speech tags to train supervised classifiers (e.g., Naive Bayes, SVM, Maximum Entropy) for sentiment detection. The primary difficulty faced by the machine learning approaches is that they need labeled data to train the classifiers, and such labeled data is difficult to obtain for online social media. A popular approach used to automatically generate labeled data is to rely on emoticons (such as ‘:-)’ and ‘:-(’), which are frequently used by people to indicate their sentiment while posting content to online social media [4]. For instance, [15] used emoticons to form the training set for classifiers, i.e., the data was labeled ‘positive’ if it contained happy emoticons, and ‘negative’ if it contained sad or angry emoticons. On the other hand, lexical approaches make use of a predefined list of words (usually called a ‘token list’), where each word (token) has a specific sentiment score [3]. Different lexical-based approaches have been proposed in the literature, such as Linguistic Inquiry and Word Count [17], SASA [16], Sentistrength [20], and so on.

Though sentiment analysis of English text has been well studied [11], sentiment inference algorithms trained on proper English text (e.g., lexical approaches which use standard English token lists) do not work as well when applied to online content [10], because online content frequently contains abbreviations and users do not use proper spelling or grammar. In fact, machine learning based approaches have been found to be more suitable for sentiment analysis of online content than the lexical approaches [1, 17]. To address the difficulties in using standard English token lists, recent lexical approaches [10] attempt to construct their own token lists which are specific to the content posted in online social media. The present study follows a similar approach, as detailed in later sections.

3. Proposed Methodology

In this section, we discuss the proposed methodology in detail. We consider that the input to the methodology will be a set of text messages, and the output will be a binary polarity (positive / negative) for each message.

Pre-processing: In most online social sites, textual messages contain URLs, punctuation marks, user-names, numbers, special characters (except emoticons), whitespace, and so on. These are not important for sentiment analysis, hence we first tokenize the messages and filter out these tokens (if any) from the messages. We also filter out a standard set of English stop-words (e.g., ‘a’, ‘the’).
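As an illustration only, the pre-processing step could be sketched in Python as follows. The stop-word set, the emoticon list, and the regular expressions below are assumptions made for this sketch; the text does not specify which lists or rules were actually used.

```python
import re

# Assumed for illustration: a tiny stop-word set and emoticon list.
# A real run would use a standard English stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}
EMOTICONS = {":)", ":-)", ":D", ";)", ":(", ":-("}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
USER_RE = re.compile(r"@\w+")


def preprocess(message):
    """Return the tokens of a message that are kept for sentiment analysis."""
    text = URL_RE.sub(" ", message)   # drop URLs
    text = USER_RE.sub(" ", text)     # drop @user-names
    tokens = []
    for raw in text.split():          # whitespace tokenization
        if raw in EMOTICONS:          # emoticons are kept as they are
            tokens.append(raw)
            continue
        # Strip punctuation, digits, and special characters; keep letters.
        word = re.sub(r"[^a-z]", "", raw.lower())
        if word and word not in STOP_WORDS:
            tokens.append(word)
    return tokens


print(preprocess("@bob The movie is AWESOME!!! :) http://t.co/xyz 123"))
```

In this sketch an emoticon is recognised only when it stands alone as a whitespace-separated token; an emoticon glued to a word would be stripped along with the other punctuation.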

Inferring Sentiment Scores of Tokens: We take a lexical approach for sentiment detection. We first construct a token list which associates two sentiment scores with each token (word). The sentiment scores for a token t indicate the number of times t is used in a positive sense, and the number of times t is used in a negative sense. Prior research on sentiment detection has already produced such token lists for English language text [3]. But it has been seen that token lists meant for normal English text do not work well with content posted in online social networks [10]. Hence, we create a token list specific to the input set of text messages.

To construct the token list, we need a set of clearly positive and negative messages (because we want to count the number of times a certain token is used in a positive / negative message). We initially derive a set of clearly positive and negative messages by considering only messages that contain positive or negative emoticons. Note that prior research on sentiment analysis has shown the utility of emoticons, which frequently match the true sentiment of the writer.
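The counting step described above can be sketched as follows. This is a minimal illustration, not the exact procedure used in the study: the two emoticon sets are assumed, and skipping messages that contain both kinds of emoticon is a choice made for this sketch, since the text does not say how such messages are handled.

```python
from collections import defaultdict

# Assumed emoticon sets; the text does not list the emoticons used.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}


def build_token_counts(tokenized_messages):
    """Count, per token, its occurrences in clearly positive and clearly
    negative messages. Returns {token: [positive_count, negative_count]}."""
    counts = defaultdict(lambda: [0, 0])
    for tokens in tokenized_messages:
        has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
        has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
        if has_pos == has_neg:
            # No emoticon at all, or both kinds: not clearly labeled.
            continue
        index = 0 if has_pos else 1
        for token in tokens:
            if token in POSITIVE_EMOTICONS or token in NEGATIVE_EMOTICONS:
                continue  # emoticons only provide the label
            counts[token][index] += 1
    return dict(counts)


messages = [
    ["awesome", "movie", ":)"],
    ["sick", "movie", ":("],
    ["boring"],                 # no emoticon, ignored
    ["awesome", ":D"],
]
print(build_token_counts(messages))
```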

Finally, in the normalized token list, if a certain token t has a higher (or equal) positive count than the negative count, we consider that token to be positive; otherwise the token is taken to be negative, as shown in Table 2 (4th column). We can also consider a sentiment score for each token, which is simply the difference between the positive and negative counts for the token (see Table 2, last column); this score is greater than 0 for positive tokens, and less than 0 for negative ones.
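A hedged sketch of how a normalized token list, token polarity, and token score might be computed is given below. The synonym map is a hand-written, hypothetical stand-in for a WordNet lookup, and merging counts by summation is an assumption of this sketch; the exact normalization procedure is not spelled out in the text above.

```python
# Hypothetical stand-in for WordNet: maps a token to one representative
# of its synonym group. Tokens missing from the map represent themselves.
SYNONYM_GROUP = {
    "awesome": "amazing",
    "amazing": "amazing",
    "ill": "sick",
    "sick": "sick",
}


def normalize_counts(counts, synonym_group=SYNONYM_GROUP):
    """Merge (positive, negative) counts of synonymous tokens by summing."""
    merged = {}
    for token, (pos, neg) in counts.items():
        key = synonym_group.get(token, token)
        old_pos, old_neg = merged.get(key, (0, 0))
        merged[key] = (old_pos + pos, old_neg + neg)
    return merged


def token_polarity(pos, neg):
    """Positive if the positive count is higher than or equal to the negative."""
    return "positive" if pos >= neg else "negative"


def token_score(pos, neg):
    """Sentiment score: positive count minus negative count."""
    return pos - neg


counts = {"awesome": (3, 1), "amazing": (2, 0), "sick": (0, 4),
          "ill": (1, 2), "movie": (2, 2)}
for token, (pos, neg) in normalize_counts(counts).items():
    print(token, token_polarity(pos, neg), token_score(pos, neg))
```

Note that a token with equal positive and negative counts is labeled positive by the rule above but has a score of exactly 0, so it does not change the total score of a message.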

Inferring Sentiment of a Text Message: As stated above, we computed a token list, where every token has a positive / negative polarity. Now, given a text message, we identify the positive and negative tokens contained in the message, and add the sentiment scores for the tokens. If the total sentiment score (for the tokens in this message) comes out to be greater than zero, we judge the message to be positive; on the other hand, if the total sentiment score comes out to be less than zero, the message is judged to be negative.
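The message-level rule can be sketched as below. The score table is made-up example data, and two details are assumptions of this sketch rather than statements from the text: tokens absent from the token list contribute 0, and a total of exactly zero is returned as undecided (None).

```python
def message_score(tokens, scores):
    """Sum the sentiment scores of the tokens; unknown tokens count as 0."""
    return sum(scores.get(token, 0) for token in tokens)


def classify_message(tokens, scores):
    """Return 'positive', 'negative', or None when the total score is zero."""
    total = message_score(tokens, scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return None  # tie: not covered by the rule stated in the text


# Hypothetical token scores (positive count minus negative count).
scores = {"awesome": 4, "sick": -5, "movie": 0}

print(classify_message(["awesome", "movie"], scores))
print(classify_message(["sick", "movie"], scores))
print(classify_message(["movie", "unknown"], scores))
```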

4. Dataset

To judge the performance of sentiment analysis schemes, we use a set of six datasets that were made publicly available by the Sentistrength research [19]. These consist of six sets of messages collected from different online social media sites: MySpace, Twitter, Digg, BBC forum, Runners World forum, and YouTube. The advantage of using these datasets is two-fold. First, the messages have been pre-annotated by human annotators as positive or negative, which can be used as ground truth for the sentiment analysis task. Second, a recent study [9] applied several sentiment detection approaches over this dataset; hence, we can directly compare the performance of our approach with that of several others.

In the datasets, each message is labeled by human annotators, and has two scores: a ‘mean positive’ score, and a ‘mean negative’ score. These two scores represent two weighted metrics for the positive and negative polarity of the message. Each score is between 1 and 5, where 5 means that the message is highly positive or highly negative, and 1 means weakly positive or negative.

In this work, we focus on binary polarity only, i.e., we attempt to label a given text as positive or negative. Hence, we considered a single polarity value for each message. For a particular message, if the mean positive score is greater than or equal to the mean negative score, then we consider the polarity of this message to be positive, and negative otherwise. Table 3 shows some examples of messages from the YouTube and Twitter datasets. Also shown are the mean positive and mean negative scores (as originally given in the dataset), and the binary polarity (1 indicates positive, and 0 indicates negative) computed by us for each message. Finally, Table 4 summarizes the number of messages in the six datasets, along with the number of messages having positive and negative polarity.
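For illustration, the conversion from the two annotator scores to a single binary label, and the per-dataset summary of label counts, can be sketched as follows. The function names and the example score pairs are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def binary_polarity(mean_positive, mean_negative):
    """Positive when the mean positive score is >= the mean negative score."""
    return "positive" if mean_positive >= mean_negative else "negative"


def polarity_counts(scored_messages):
    """Count positive / negative messages from (mean_pos, mean_neg) pairs."""
    return Counter(binary_polarity(pos, neg) for pos, neg in scored_messages)


# Made-up (mean positive, mean negative) pairs, each within the 1..5 range.
example = [(3.0, 1.5), (2.0, 2.0), (1.0, 4.0)]
print(polarity_counts(example))
```

Ties are resolved towards the positive class, following the rule stated above.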

5. Evaluation of the Approach

In this section, we evaluate the performance of the proposed approach, and compare its performance with some prior approaches.

Table 3. Examples of messages in the dataset [19]: comments in YouTube, and tweets posted in Twitter. Also shown are the mean positive and mean negative scores given in the dataset, and the binary positive / negative polarity computed by us.

Table 4. Labeled data sets

Table 6. Average prediction performance of different methodologies, for all labeled datasets. The best scores are marked in boldface.

Table 7. F-measure for the various sentiment detection methods, for each of the six individual datasets. For each dataset, the best score is indicated in boldface.

Metrics for Comparison: To measure the performance of various sentiment detection approaches, we follow the standard methodology [9] of representing results in the form of a confusion matrix, and then compute precision, recall, accuracy, and F-measure. Table 5 shows an example of a confusion matrix, where ‘True’ means the ground truth polarity of a message, and ‘Predicted’ means the polarity predicted by a particular sentiment detection methodology. For the ‘positive’ class, the precision (P) is P = a/(a+c), while the recall (R) is R = a/(a+b). The accuracy (A) is the proportion of the total number of items that are correctly classified, i.e., A = (a+d)/(a+b+c+d). Finally, the F-measure is the harmonic mean of precision and recall: F = 2PR/(P+R). The F-measure is especially important as it summarizes both precision and recall.
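As a worked example of these metrics, the sketch below computes them from the four cells of a confusion matrix. The cell meanings are inferred from the formulas for P and R: a = truly positive and predicted positive, b = truly positive but predicted negative, c = truly negative but predicted positive, d = truly negative and predicted negative. The F-measure is computed as the harmonic mean 2PR/(P+R). The counts in the example are made up, and returning 0.0 for an empty denominator is a convention chosen for this sketch.

```python
def confusion_metrics(a, b, c, d):
    """Precision, recall, accuracy, and F-measure for the positive class."""
    precision = a / (a + c) if (a + c) else 0.0
    recall = a / (a + b) if (a + b) else 0.0
    accuracy = (a + d) / (a + b + c + d)
    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts: 100 messages in total.
metrics = confusion_metrics(a=40, b=10, c=20, d=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, P = 40/60, R = 40/50, A = 70/100, and F = 8/11.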

Results of Comparison with Other Approaches: We applied our proposed methodology (described in Sect. 3) on the dataset described in Sect. 4. As stated earlier, the recent study [9] reported the performance of a number of sentiment detection approaches on the same dataset, and we compare the performance of the proposed methodology with these prior approaches (see footnote 2). Table 6 compares the performance of the proposed method with that of seven other methods (whose results are obtained from [9]). It is evident that the proposed methodology, in spite of being very simple, compares favourably with the prior approaches. In particular, the proposed approach has the highest F-measure among all the approaches.

Since the F-measure summarizes both precision and recall, it is the most important metric of performance. Hence, we evaluate the F-measure for each individual dataset. The results are given in Table 7, along with the average F-measure over all six datasets. Again, it is seen that the overall F-measure for the proposed method is better than that for the other methods (though Sentistrength [19] performs almost equally well).

2. Note that the study [9] examined one more approach (apart from the ones shown in Table 6), where the polarity of a text is directly given based on the emoticons contained in the text. Since less than 10 % of the text messages in this dataset (as well as in online social media in general) contain emoticons [9, 13], this approach can be used for fewer than 10 % of the messages (as also observed in [9]). Hence, we do not consider this approach for comparison.

Table 8. Prediction performance, averaged over all six labeled datasets

Table 9. Examples of messages from Twitter and BBC, which were mis-classified by the proposed approach.

Finally, we compared the performance of the proposed approach with two other methodologies:

(i) A machine learning approach: We trained a standard Maximum Entropy classifier on the dataset, using the distinct words contained in the messages as the features, and then applied the classifier on the same dataset.

(ii) A natural language processing (NLP) approach: Python TextBlob [14] is a standard library for NLP-based tasks, including sentiment analysis. We use this library to predict the sentiment of all messages in the dataset.

Table 8 shows the percentage of messages whose polarity was predicted correctly by the three methodologies (averaged over all six datasets). It is evident that the proposed methodology performs better than these state-of-the-art methods.

Error Analysis of the Proposed Methodology: In this final section, we perform a brief error analysis of the proposed approach by attempting to explain for what types of text messages the approach generally fails to infer sentiment correctly. Table 9 shows some examples of text messages from two of the datasets, Twitter and BBC, which were mis-classified by the proposed approach (i.e., the proposed approach inferred the sentiment as opposite to the true polarity shown in Table 9). We find that the mis-classified messages are generally of two types, as described below.

First, some of these messages do not contain any strongly sentiment-bearing words: the positive / negative sentiment is brought about by the whole message instead of a few specific words. Examples include “Brings a whole new meaning to 7up!” and “Out for drinks with the guys!”. For such messages, the token-based approach, at times, fails due to the absence of any strongly positive or negative token.

Second, some of the other mis-classified messages contain one or more strongly positive / negative tokens, but the entire message has a polarity that is opposite to that of the few individual tokens. For instance, the message “lol. Is that all you can come up with?” contains the strongly positive token ‘lol’ (abbreviated form of ‘laugh out loud’), but the polarity of the message as a whole is negative. The proposed method might end up mis-classifying such messages due to the presence of the strongly positive / negative token(s).

6. Conclusion

This work presents a simple but effective methodology for sentiment detection of textual content posted in various online social media. We adopt a lexical approach, based on creating a token list specific to the input set of text messages. The key step is to use the WordNet dictionary to unify the treatment of groups of words (tokens) that are synonymous with each other. This step helps to achieve consistent performance in the face of different words being used by different users (and in different social media). Detailed evaluation over a human-annotated dataset shows that the proposed approach gives competitive or better performance than several state-of-the-art techniques for sentiment detection.

It can be noted that though the proposed approach gives competitive or better performance than several state-of-the-art methods, the best F-measure achieved by any of the methods is around 0.8. This highlights the fact that detecting the sentiment of online text is inherently a very challenging problem, probably due to the wide variety of writing styles of users of online media. Thus, there is sufficient scope for improving the classification performance. Potential methods of improving performance include developing more accurate lexical databases specifically for online media (instead of relying on databases like WordNet, which are originally meant for formal English text), applying normalization techniques (e.g., Min-Max normalization) on the sentiment scores of tokens, and so on. We plan to try some of these approaches as future work.
