IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 7 Issue 7, July 2020

ISSN (Online) 2348 – 7968 | Impact Factor (2020) – 6.72

www.ijiset.com

2019-nCoV Twitter big data to aid worldwide emergency response: Case study on US tweets

Koffka Khan1 and Emilie Ramsahai2

1 Department of Computing and Information Technology, The University of the West Indies, Trinidad and Tobago, W.I.

2 UWI School of Business & Applied Studies Ltd (UWI-ROYTEC), 136-138 Henry Street, Port of Spain, Trinidad and Tobago

Abstract 2019-nCoV has wreaked worldwide havoc, bringing about a global health crisis. The pandemic continues to spread, and emergency healthcare systems are trying to bring ample aid to 2019-nCoV sufferers. During such outbreaks people use tweets to send their opinions and other relevant information about the virus to others. Because tweets contain unstructured data, Twitter can be categorized as a big data source. We analyze United States (US) tweets from a global tweet dataset from Kaggle to determine the polarity of their sentiments. We noticed that the number of retweets is an important factor to consider in tweet analysis. We develop a new Sentiment Analysis System (SAS) using the tweet polarity and retweet counts. From this we were able to discern pertinent negative and positive tweets. We used these tweets to build a word cloud. The word cloud summarizes the important tweet information from the many daily tweets. Further analysis can drill down into specific problem tweets. We believe that using the SAS can aid global emergency response by targeting persons with specific tweet features, for example those showing the need for additional counseling and/or medical services.

Keywords: health crisis, pandemic, 2019-nCoV, big data, US, Sentiment Analysis System, tweet, polarity, retweet, emergency, response.

1. Introduction

The 2019-nCoV pandemic has taken the world by surprise [38]. Since its emergence in December 2019 it has spread throughout the world, affecting millions. Although some countries are severely affected, others have been able to keep the pandemic at bay. Because of its highly infectious nature and consequently rapid spread, many countries are left at the mercy of 2019-nCoV [32]. As a result, it is imperative for health care services to locate those in dire need and to target emergency response to them effectively. However, as the infection is not uniform across the world, there is a need for a mechanism that provides accurate and up-to-date information about the persons made needy by this virus.

A crucial role of public health officials is to keep track of safety problems, such as the emergence and subsequent spread of epidemics. Public concern over communicable diseases may be seen as an issue of its own. Thus, it is also important to keep track of developments in public health issues and to recognise levels of public concern [12]. However, tracking public safety issues with conventional surveillance technologies is not only costly, it also suffers from restricted scope and substantial delays. To address these problems, we use Twitter messages, which are accessible free of charge, are generated worldwide and are posted in real time.

Unstructured data (or unstructured information) refers to information that lacks a predefined data model or does not fit well into relational tables [11]. Unstructured information is usually text-heavy, but can also include data such as dates, numbers and facts. Most big data sources, such as Facebook and Twitter, include unstructured data [28]. Tweets and other social commentary need to be evaluated to determine population opinion. Recently, a great deal of confidence has been placed in decision-making processes based on dynamic and big data [10].

Social network data has grown into a major medium for the identification and tracking of unexpected events. A variety of recent reports have shown that social media data feeds can be exploited to produce actionable response and relief to unexpected events. In recent years there have been many attempts at an initial evaluation of Twitter as an instrument for emergency response in the context of a crisis event [2]. Twitter users prefer to post tweets that raise understanding of the problem and inspire people to act [7]. Tweets were shown to be accurate and to offer relevant facts, reinforcing the claim that Twitter has very strong potential to be a helpful resource in circumstances where quick emergency response is necessary [40], [35], [26].

Sentiment analysis has been widely researched in the area of web review sites, with the goal of producing condensed customer views on the relevant facets of goods [9], [22], [8]. Nevertheless, little research has been undertaken to determine the polarity of emotions shared by users during disasters [36]. Identifying these emotions on online social networking platforms may help first-response personnel understand the complexities of the network, e.g. the interests of core participants, the panics and the relational impacts of experiences among members. Sentiment analysis of disaster-related messages on social media leads to recognition of the situation and a deeper understanding of the nature of disasters by recognizing the polarity of popular feelings [19], [3].

Researchers found that important tweets were more likely to be retweeted, and that tweets aligned with policy or action show the strongest relationship with retweeting [18]. They also indicated that tweets conveying an emotion were more likely than neutral tweets to be retweeted. In addition, tweets that contained important content related to Policy, Action or a Group, as well as sentiment, were the most likely to be retweeted. Retweeting shows not only interest in a message but also trust in the message and its originator, and satisfaction with the quality of the message [23]. Finally, researchers demonstrated how a retweet network can be used to assign key players and secret identities to different users who likely share the same ideology or belong to the same interest group [15].

This paper analyzes the sentiments of tweets shared on Twitter during the spread of the 2019-nCoV pandemic. Our goal is to process these streams of messages to better understand how people feel and to respond more appropriately, for example through official messaging. Focusing on a specific sentiment-to-retweet portion of the tweets may give insights into important topics that may be missed when analysing the complete data. We found that a more fine-grained look at the positive and negative tweets may help to distinguish different kinds of feelings, for instance joy or fear. Further, based on negative sentiments, persons who need medical or psychological attention can be given appropriate treatment. The approach can be used as an important tool for advising humanitarian efforts and for improving the manner in which information messages about an event are prepared for the crowd.

In Section 2 we discuss how we developed our Sentiment Analysis System (SAS). To test SAS we used a 2019-nCoV dataset from Kaggle [37]. We present our results and a discussion in Section 3, where we describe relevant parts of the SAS output and show how the results can be used to identify tweets and key-terms to guide emergency response. Finally, we present our concluding remarks in Section 4.

2. Methods

Twitter data can be accessed easily using Python. There are two broad types of Twitter APIs: Streaming APIs [24] and Representational State Transfer (REST) APIs [27]. Live streaming gives access to real-time tweets [14], [13], [20] from all around the globe, while historic Twitter data gives access to past tweets or follower data [29]. There are plenty of Python libraries to choose from, such as Twython [33], Tweepy, python-twitter [31] and many more. All of these allow access to Twitter data through the Twitter API. Suppose, for example, that the Tweepy library [17] is used. Tweepy's functions can be used to access any form of Twitter data, for instance to get the Twitter data as a JSON object [6], which can be read with the Python json module or loaded directly into a pandas data frame [4]. The first step in obtaining Twitter data this way involves registering with Twitter and obtaining a valid Twitter account. Alternatively, there are persons or groups that download the Twitter data feeds and provide them to researchers, for example on Kaggle [37]. In our case we opted for the latter and obtained a 2019-nCoV Kaggle dataset [37]. It contained daily global Twitter feeds for the period March 12th to 28th 2020 and was five gigabytes in size. The novel method used to produce the sentiment scores and word clouds consisted of the following steps:

1. Read the tweets;
2. Concatenate all the tweets into a list of words;
3. Calculate the polarity value of the tweets;
4. Obtain the number of retweets per tweet;
5. Obtain tweets with meaningful sentiments by matching the highest retweets to their associated negative and positive compound polarity values;
6. Generate the word cloud using the tweets from the ranges generated in the previous step;
7. Identify key-terms from the words showing the most meaningful sentiments in the word cloud;
8. Use the key-terms to search for the tweets with the most meaningful crowd-based sentiments, that is, those with the highest retweet counts, and output the matching tweets;
9. Pass the matching tweet information to the appropriate government bodies;
10. Allocate relevant aid.
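
As a minimal sketch of steps 1 and 4, the snippet below reads tweets from a CSV file and keeps the text and retweet count of tweets from one country. The column names `text`, `retweet_count` and `country_code`, and the function name `read_us_tweets`, are assumptions made for illustration; the paper does not list the dataset's columns, and the sample rows are made up.

```python
import csv
import io

def read_us_tweets(csv_file, country_column="country_code", country="US"):
    """Yield (text, retweet_count) pairs for tweets from one country.

    The column names are assumptions for illustration; adjust them to
    match the actual dataset header.
    """
    for row in csv.DictReader(csv_file):
        if row.get(country_column) != country:
            continue
        # Treat a missing or empty retweet count as zero.
        retweets = int(row.get("retweet_count") or 0)
        yield row["text"], retweets

# Tiny in-memory stand-in for one daily file (made-up rows).
sample = io.StringIO(
    "text,retweet_count,country_code\n"
    "Stay home and stay safe,12,US\n"
    "Bonjour tout le monde,3,FR\n"
    "Schools are closing today,,US\n"
)
us_tweets = list(read_us_tweets(sample))
print(us_tweets)
```

With a real file, `open(path, newline="", encoding="utf-8")` would take the place of the in-memory sample.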

Before processing the data we had to clean it for the desired tasks. Preprocessing the text data is an important step because it makes the raw text ready for mining, i.e., extracting information from the text so that applying machine learning algorithms to it becomes simpler. If this step is skipped there is a higher chance of working with noisy and inconsistent data. The aim of this step is to remove noise that contributes little to finding the sentiment of tweets, such as punctuation, special characters, numbers, and words that do not carry much weight in the context of the text. Tweets contain many Twitter handles. All of these handles are omitted from the data as they convey little detail. Special characters, punctuation and numbers do not help much in sentiment analysis, so we remove them from the text. A list of stop words is a collection of terms that do not carry any intrinsically useful information. When they are not excluded, stop words make it difficult to distinguish key concepts and terms in textual sources because of their disproportionate frequency [41]. We remove stop words from the text. Finally, we remove Uniform Resource Locator (URL) [21] instances from the text.

All programming was done using Python version 3.7 [43]. The Natural Language Toolkit (NLTK) [30] is a leading platform for developing Python programs that work with human language data. VADER (Valence Aware Dictionary and sEntiment Reasoner) [5] is a lexicon and rule-based sentiment analysis tool specifically tailored to sentiments expressed in social media. It is available as a module within NLTK. VADER uses a sentiment lexicon, a set of lexical features (e.g., words) that are usually classified as either positive or negative according to their semantic orientation []. VADER has been found to be very effective in dealing with social media messages, NY Times editorials, film reviews, and product reviews. This is because VADER reports not only positivity and negativity scores but also how positive or negative a sentiment is. VADER has several benefits: (1) it works extremely well on social media posts, yet generalizes readily to multiple domains, (2) it needs no training data but is constructed from a generalizable, valence-based, human-curated gold standard sentiment lexicon, and (3) it is quick enough to be used with streaming data.

A word cloud is a data visualization technique used to display text data, in which the size of each word indicates its frequency or importance. A word cloud can be used to highlight meaningful textual data points. Word clouds are commonly used to analyze data from social networking websites. The modules needed to generate a word cloud in Python are matplotlib [16], pandas, and wordcloud [1].
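
The cleaning steps described above (removing handles, URLs, punctuation, special characters, numbers and stop words) can be sketched with regular expressions. This is an illustrative sketch, not the authors' code: the function name `clean_tweet`, the tiny stop-word set and the lower-casing step are all assumptions.

```python
import re

# A deliberately tiny stop-word set for illustration; a full English
# stop-word list would be used in practice.
STOP_WORDS = {"a", "an", "and", "are", "at", "in", "is", "of", "the", "to"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

def clean_tweet(text):
    """Return the lower-cased tweet without URLs, handles, digits,
    punctuation, special characters, or stop words."""
    text = text.lower()                  # assumption: case is not needed
    text = URL_RE.sub(" ", text)         # drop URLs before other symbols
    text = HANDLE_RE.sub(" ", text)      # drop @handles
    text = NON_LETTER_RE.sub(" ", text)  # drop digits and punctuation
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

cleaned = clean_tweet(
    "@CDCgov: 3 new cases in the US!! Details at https://example.com #covid19"
)
print(cleaned)  # new cases us details covid
```

URLs are removed first so that their punctuation is not split into stray word fragments by the later substitutions.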

3. Results and discussion

In step one we obtained the 2019-nCoV dataset from Kaggle [37]. Our target was English-speaking first world countries with a high number of tweets. We first considered English-speaking countries such as the US, Canada and England, but focused on the US, which was the most affected by 2019-nCoV from March 12th to 28th 2020, see Table 1. Using the Natural Language Processing tools we were able to categorize the tweets as positive, negative or neutral for each day, see Figure 1. There were a total of 194481 tweets during this seventeen (17) day period. Over the entire period there was a maximum of 5391 negative tweets on the 13th of March and a minimum of 2139 on the 14th. Likewise, there was a maximum of 7364 positive tweets on the 13th of March and a minimum of 3733 on the 14th. This may have occurred because March 13th 2020 was a black Friday [42], [34], which led many people to send out tweets, followed by a sharp decline the next day. For our analysis we chose three days: March 13th, 21st and 28th 2020. These approximate the start, middle and end of the period March 12th to 28th 2020. On the 13th, 21st and 28th of March there were 18356, 12696 and 9298 recorded tweets, respectively. We then concatenated our selected tweets into a list of words, step two.

Table 1: 2019-nCoV statistics for US.

Fig. 1 US sentiment.

For the list of tweets we had to determine their polarity (whether positive, negative or neutral) and compound values. This was step three in our method. The tweets were separated into negative and positive based on their compound values: negative compound values indicated negative tweets whereas positive compound values indicated positive tweets. We first show the results of the sentiment analysis software by showing the most negative and most positive tweets. Tables 2, 3 and 4 show the top negative and positive tweets for the 13th, 21st and 28th of March 2020. However, examining these tweets closely, we observe that searching for extremely negative and positive tweets gives feedback that is not very useful. We found that tweets with very high or very low compound values were generally not retweeted and are not particularly useful in determining the sentiment of the crowd in this analysis, so they can be discarded. Hence, we had to find a way to mine the data for more useful tweets. Our aim was to get tweets that could guide the leaders of a country when dealing with unexpected events such as the 2019-nCoV pandemic.
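
The sign-based split in step three can be sketched as follows. The function names and the sample scores are hypothetical, and treating a compound value of exactly zero as neutral is an assumption. With NLTK, the compound value would typically come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`; the sketch takes precomputed scores so that it runs without the VADER lexicon.

```python
def polarity_label(compound):
    """Map a compound score to a polarity label by its sign."""
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"  # assumption: exactly zero counts as neutral

def split_by_polarity(scored_tweets):
    """Group (text, compound) pairs by polarity label."""
    groups = {"negative": [], "positive": [], "neutral": []}
    for text, compound in scored_tweets:
        groups[polarity_label(compound)].append((text, compound))
    return groups

# Made-up tweets and scores, for illustration only.
scored = [("so scared", -0.49), ("thank you nurses", 0.36), ("update at noon", 0.0)]
groups = split_by_polarity(scored)
print({label: len(items) for label, items in groups.items()})
```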

Table 2: US March 13th 2020 positive compound valued tweets and sum of retweet counts.

Table 3: US March 21st 2020 positive compound valued tweets and sum of retweet counts.

Table 4: US March 28th 2020 positive compound valued tweets and sum of retweet counts.

Each tweet has an associated retweet count. In step four we obtain the retweet count for each tweet, which was already available in a column of the dataset. We now look at the top retweets. Tables 5, 6 and 7 show the top three retweets per day for the 13th, 21st and 28th of March 2020. We observed that, compared to the top negative and positive tweets, these tweets had a better chance of guiding leadership policy as they pointed more directly to pertinent information about the 2019-nCoV pandemic. Our premise was that a more fine-grained classification of tweets with a high retweet value may help to distinguish different kinds of relevant crowd sentiments.
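
Selecting the top three retweets for a day is a small ranking step. A minimal sketch using the standard library is shown here; the function name `top_retweeted` and the sample tweets are made up for illustration.

```python
import heapq

def top_retweeted(tweets, k=3):
    """Return the k (text, retweet_count) pairs with the most retweets."""
    return heapq.nlargest(k, tweets, key=lambda t: t[1])

# Made-up tweets for one day.
day = [
    ("wash your hands", 40),
    ("national emergency declared", 950),
    ("testing sites open", 310),
    ("stay home", 700),
]
top3 = top_retweeted(day)
print(top3)
```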

Table 5: US March 13th 2020: Top retweets, polarities, retweet counts and sentiment compound scores.

Table 6: US March 21st 2020: Top retweets, polarities, retweet counts and sentiment compound scores.

Table 7: US March 28th 2020: Top retweets, polarities, retweet counts and sentiment compound scores.

In step five we obtain tweets with meaningful sentiments by matching the highest retweets to their associated negative and positive compound polarity values. The negative and positive tweets were grouped by compound-value range, together with the sum of retweet counts for the range in which they fell. This is shown in Tables 8, 9 and 10 for the 13th, 21st and 28th of March 2020, respectively. We then choose the negative and positive tweets from the respective ranges with the highest sum of retweet counts. For instance, on March 13th we would select negative tweets within the range of -0.478 to -0.378, as it contained the highest number of retweets, 3797. For positive tweets we would select tweets within the range of 0.402 to 0.502, as it contained the highest number of retweets, 2659. On March 21st we would select negative tweets within the range of -0.276 to -0.176, as it contained the highest number of retweets, 1620. For positive tweets we would select tweets within the range of 0.402 to 0.502, as it contained the highest number of retweets, 1493. On March 28th we would select negative tweets within the range of -0.784 to -0.684, as it contained the highest number of retweets, 5389. For positive tweets we would select tweets within the range of 0.602 to 0.702, as it contained the highest number of retweets, 1093.
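
The range selection in step five can be sketched as binning compound values into ranges of width 0.1 and summing the retweet counts in each range. The ranges quoted above are all 0.1 wide, but how their edges were chosen is not stated, so the `origin` parameter, the function names and the sample data below are assumptions for illustration.

```python
import math
from collections import defaultdict

def retweets_per_range(tweets, origin, width=0.1):
    """Sum retweet counts per compound-score range.

    tweets: iterable of (compound, retweet_count) pairs.
    origin: lower edge of a reference range (a hypothetical choice).
    Range i spans [origin + i*width, origin + (i+1)*width).
    """
    sums = defaultdict(int)
    for compound, retweets in tweets:
        index = math.floor((compound - origin) / width)
        sums[index] += retweets
    return dict(sums)

def busiest_range(sums, origin, width=0.1):
    """Return (low, high, retweet_sum) for the range with the largest sum."""
    index = max(sums, key=sums.get)
    low = origin + index * width
    return round(low, 3), round(low + width, 3), sums[index]

# Made-up negative tweets as (compound, retweet_count) pairs.
negative = [(-0.45, 1200), (-0.40, 900), (-0.70, 300), (-0.10, 50)]
sums = retweets_per_range(negative, origin=-0.478)
best = busiest_range(sums, origin=-0.478)
print(best)  # (-0.478, -0.378, 2100)
```

The same two functions would be applied separately to the positive tweets of the day, giving one selected range per polarity.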

Table 8: US March 13th 2020 negative and positive compound valued tweets and sum of retweet counts.

Table 9: US March 21st 2020 positive compound valued tweets and sum of retweet counts.

Table 10: US March 28th 2020 positive compound valued tweets and sum of retweet counts.

We now perform step six of our method. We generate word clouds using the tweets from the ranges selected in the previous step. We take the negative range with the highest sum of retweet counts and produce a word cloud for its tweets, and do the same for the positive range with the highest sum. This results in two word clouds for each given day, see Figures 2, 3, 4, 5, 6 and 7. We realize that although the tweets within these ranges are closer to crowd-based sentiments, they still need to be fine-tuned to find specific meaning. To do this we move on to analyzing our derived word clouds.
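
A frequency-based word cloud sizes each word by how often it occurs, so the underlying computation is a word count over the selected tweets. The sketch below shows only that count; the function name and the sample tweets are hypothetical. A mapping like this is the kind of input that the wordcloud package's `generate_from_frequencies` method accepts for rendering.

```python
from collections import Counter

def word_frequencies(cleaned_tweets):
    """Count how often each word occurs across already-cleaned tweets."""
    counts = Counter()
    for text in cleaned_tweets:
        counts.update(text.split())
    return counts

# Made-up cleaned tweets from one selected range.
selected = ["national emergency declared", "emergency rooms full", "stay home"]
freqs = word_frequencies(selected)
print(freqs.most_common(1))  # [('emergency', 2)]
```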

Fig. 2 March 13th 2020: Word cloud of selected negative tweets.

Fig. 3 March 13th 2020: Word cloud of selected positive tweets.

Fig. 4 March 21st 2020: Word cloud of selected negative tweets.

Fig. 5 March 21st 2020: Word cloud of selected positive tweets.

Fig. 6 March 28th 2020: Word cloud of selected negative tweets.

Fig. 7 March 28th 2020: Word cloud of selected positive tweets.

In step seven we identify key-terms from the words showing the most meaningful sentiments in the word cloud. Based on the word clouds for each day, the words with the highest crowd-based sentiment significance are selected [25]. On the 13th of March 2020 the sentiment-compound range of -0.478 to -0.378 contained 966 tweets while the 0.402 to 0.502 range contained 1261 tweets. On the 21st of March 2020 the sentiment-compound range of -0.276 to -0.176 contained 306 tweets while the 0.402 to 0.502 range contained 895 tweets. On the 28th of March 2020 the sentiment-compound range of -0.784 to -0.684 contained 360 tweets while the 0.602 to 0.702 range contained 575 tweets. Using our word clouds for 'hints', we search through these 'selected' tweets to find important tweets. The importance of the words in the word cloud can be based on frequency, rank, co-occurrence, etc. Those implementing the software can choose their own basis, but we used frequency in our experiment. The more frequent words are shown in larger text. We assume that words used more often than others in this environment carry crowd-based significance. Thus, in step eight, we search the 'selected' tweets for the word with the highest crowd-based sentiment significance and match this word to the tweet containing it that has the highest retweet value. This process can be repeated any number of times, searching for the second highest retweet value, then the third, etc. The procedure is performed for the most significant negative and positive words, but can be repeated for the second most significant word, then the third, and so on. The result is a collection of tweets that most likely represents the interests of the tweeters. In step nine the list can be sent to appropriate community leaders who can take action on these tweets.

For example, during these times many governments have daily news bulletins. Government officials can use the SAS results to better target issues when addressing the population, which is part of the final step ten. Their now guided correspondence would target issues pertaining to the majority of people while still accommodating niche groups of citizens. Our results are shown in Tables 11, 12 and 13. We see that instead of non-meaningful viewpoints the 'selected' key-term tweets now show a more balanced and comprehensive crowd-based sentiment significance for the topics being tweeted.
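
The key-term search of step eight can be sketched as a filter followed by a sort on retweet count. The function name, the `limit` parameter and the sample tweets are hypothetical. Case-insensitive substring matching is a simplifying assumption that lets multi-word key-terms such as "social distancing" match, at the cost of also matching longer words that contain the term.

```python
def tweets_for_key_term(tweets, key_term, limit=3):
    """Return tweets containing key_term, most retweeted first.

    tweets: iterable of (text, retweet_count) pairs.
    """
    term = key_term.lower()
    matches = [t for t in tweets if term in t[0].lower()]
    return sorted(matches, key=lambda t: t[1], reverse=True)[:limit]

# Made-up tweets from one selected compound range.
selected = [
    ("National emergency declared today", 950),
    ("Emergency rooms are filling up", 1400),
    ("Stay home everyone", 700),
]
hits = tweets_for_key_term(selected, "emergency")
print(hits)
```

Raising `limit` corresponds to moving on to the second and third highest retweet values, and calling the function again with another key-term corresponds to moving on to the next most significant word.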

Table 11: US March 13th 2020: Selected tweets, polarities, retweet counts and sentiment-compound range on emergency (negative) and people (positive) key-terms from word cloud.

Table 12: US March 21st 2020: Selected tweets, polarities, retweet counts and sentiment-compound range on pandemic (negative) and social distancing (positive) key-terms from word cloud.

Table 13: US March 28th 2020: Selected tweets, polarities, retweet counts and sentiment-compound range on home (negative) and help (positive) key-terms from word cloud.

Our method of sentiment detection is effective in detecting crowd sentiment. It can serve as a gauge for measuring feeling towards an event. We are mindful of the drawback of relying solely on sentiment gathered from social networks to stand in for the crowd's sentiment. Thus, we added the retweet count as an additional parameter in our method. Together they can give a feeling, or pulse, of the crowd towards an event. This picture can be further clarified by looking into the types of users who express their feelings on social networks and how representative the sample is.

The method can be used to craft messages for the crowd, e.g. before a government news bulletin, to get feedback on how the crowd thinks. The method is simple and accurate, and takes into account nearly all corner conditions and error susceptibility. We have demonstrated its feasibility on significant-sized datasets consisting of live Twitter feeds. Based on the findings, we can conclude that social media data can be a viable addition to traditional survey methodologies and can provide more in-depth insight into the real-time public perception of an event. The key focus of this work is promoting the use of social media as an important and uncensored tool for reporting on disasters on various fronts.

4. Conclusions

2019-nCoV has wreaked worldwide havoc, bringing about a global health crisis. The pandemic continues to spread, and emergency healthcare systems are trying to bring ample aid to 2019-nCoV sufferers. During such outbreaks people use tweets, a 'big data' source, to send their opinions and other relevant information about the virus to others. We analyzed tweets from the United States (US), drawn from a global tweet dataset on Kaggle, to determine the polarity of their sentiment. We noticed that the number of retweets is an important factor to consider in tweet analysis. We developed a new Sentiment Analysis System (SAS) using tweet polarity and retweet counts. From this we were able to discern pertinent negative and positive tweets, which we used to build a word cloud. The word cloud summarizes the important information from the many daily tweets. Further analysis can drill down into specific problem tweets. We believe that using the SAS can aid global emergency response by targeting persons with specific tweet features, for example those showing the need for additional counseling and/or medical services. In future work we plan to modify SAS to take into account different types of feelings. We can also include favourites, friends and followers in SAS to further refine our model. We also plan to extend our experiments over a greater set of tweets around the event in order to further test SAS. Another change that we intend to incorporate is the implementation of entity extractors to detect sentiment towards a specific entity in a post, which can provide more knowledge about sentiment holders and provide insights into sarcastic tweets, the detection of which is a difficult task in tweet analysis and mining. We plan to make use of other social networking sites such as Facebook and Google+ as part of future research to extend our 2019-nCoV dataset. Such portals often have dedicated emergency management pages run by different governmental and private organisations, which can be analyzed to collect organized data.

References

[1] Amal, S., Adam, M., Brusilovsky, P., Minkov, E., Kuflik, T., 2019. Enhancing explainability of social recommendation using 2d graphs and word cloud visualizations, in: Proceedings of the 24th International Conference on Intelligent User Interfaces: Companion, pp. 21–22.

[2] Athanasia, N., Stavros, P.T., 2015. Twitter as an instrument for crisis response: The typhoon haiyan case study, in: The 12th International Conference on Information Systems for Crisis Response and Management.

[3] Bai, H., Yu, G., 2016. A weibo-based approach to disaster informatics: incidents monitor in post-disaster situation via weibo text negative sentiment analysis. Natural Hazards 83, 1177–1196.

[4] Betancourt, R., Chen, S., 2019. pandas library, in: Python for SAS Users. Springer, pp. 65–109.

[5] Bharathan, K., Varma, P.D., 2019. Polarity detection using digital media, in: 2019 9th International Conference on Advances in Computing and Communication (ICACC), IEEE. pp. 181–187.

[6] Bourhis, P., Reutter, J.L., Vrgoč, D., 2020. Json: Data model and query languages. Information Systems 89, 101478.

[7] Boyd, D., Golder, S., Lotan, G., 2010. Tweet, tweet, retweet: Conversational aspects of retweeting on twitter, in: 2010 43rd Hawaii International Conference on System Sciences, IEEE. pp. 1–10.

[8] Chachra, A., Mehndiratta, P., Gupta, M., 2017. Sentiment analysis of text using deep convolution neural networks, in: 2017 Tenth International Conference on Contemporary Computing (IC3), IEEE. pp. 1–6.

[9] Chang, Y.C., Ku, C.H., Chen, C.H., 2019. Social media analytics: Extracting and visualizing hilton hotel ratings and reviews from tripadvisor. International Journal of Information Management 48, 263–279.

[10] Diamantoulakis, P.D., Kapinas, V.M., Karagiannidis, G.K., 2015. Big data analytics for dynamic energy management in smart grids. Big Data Research 2, 94–101.

[11] Feldman, R., Sanger, J., et al., 2007. The text mining handbook: advanced approaches in analyzing unstructured data. Cambridge university press.

[12] Freitas, D.P., Borges, M.R., Carvalho, P.V.R.d., 2020. A conceptual framework for developing solutions that organise social media information for emergency response teams. Behaviour & Information Technology 39, 360–378.

[13] Hasan, M., Orgun, M.A., Schwitter, R., 2019a. Real-time event detection from the twitter data stream using the twitternews+ framework. Information Processing & Management 56, 1146–1165.

[14] Hasan, M., Rundensteiner, E., Agu, E., 2019b. Automatic emotion detection in text streams by analyzing twitter data. International Journal of Data Science and Analytics 7, 35–51.

[15] Hrishiah, M., Safar, M., Mahdi, K., 2016. Modeling twitter as weighted complex networks using retweets, in: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE. pp. 701–702.

[16] Hunt, J., 2019. Introduction to matplotlib, in: Advanced Guide to Python 3 Programming. Springer, pp. 35–42.

[17] Karthiga, M., Aravindhraj, N., Priyanka, S., et al., 2019. Machine learning-based sentiment analysis of twitter data, in: 2019 International Conference on Advances in Computing and Communication Engineering (ICACCE), IEEE. pp. 1–8.

[18] Keib, K., Himelboim, I., Han, J.Y., 2018. Important tweets matter: Predicting retweets in the #blacklivesmatter talk on twitter. Computers in Human Behavior 85, 106–115.

[19] Leykin, D., Aharonson-Daniel, L., Lahad, M., 2016. Leveraging social computing for personalized crisis communication using social media. PLoS currents 8.

[20] Li, X., Xie, Q., Jiang, J., Zhou, Y., Huang, L., 2019. Identifying and monitoring the development trends of emerging technologies using patent analysis and twitter data mining: The case of perovskite solar cell technology. Technological Forecasting and Social Change 146, 687–705.

[21] Mason, H., Stern, P., 2019. Systems and methods for influence of a user on content shared via an encoded uniform resource locator (url) link. US Patent 10,504,192.

[22] Mehraliyev, F., Kirilenko, A.P., Choi, Y., 2020. From measurement scale to sentiment scale: Examining the effect of sensory experiences on online review rating behavior. Tourism Management 79, 104096.

[23] Metaxas, P., Mustafaraj, E., Wong, K., Zeng, L., O’Keefe, M., Finn, S., 2015. What do retweets indicate? results from user survey and meta-review of research, in: Ninth International AAAI Conference on Web and Social Media.

[24] Morstatter, F., Pfeffer, J., Liu, H., 2014. When is it biased? assessing the representativeness of twitter’s streaming api, in: Proceedings of the 23rd international conference on world wide web, pp. 555–556.

[25] Musto, C., Semeraro, G., Lops, P., de Gemmis, M., 2015. Crowdpulse: A framework for real-time semantic analysis of social streams. Information Systems 54, 127–146.

[26] Neppalli, V.K., Caragea, C., Squicciarini, A., Tapia, A., Stehle, S., 2017. Sentiment analysis during hurricane sandy in emergency response. International journal of disaster risk reduction 21, 213–222.


[27] Ong, S.P., Cholia, S., Jain, A., Brafman, M., Gunter, D., Ceder, G., Persson, K.A., 2015. The materials application programming interface (api): A simple, flexible and efficient api for materials data based on representational state transfer (rest) principles. Computational Materials Science 97, 209–215.

[28] Pääkkönen, P., Pakkala, D., 2015. Reference architecture and classification of technologies, products and services for big data systems. Big data research 2, 166–186.

[29] Pezanowski, S., MacEachren, A.M., Savelyev, A., Robinson, A.C., 2018. Senseplace3: a geovisual framework to analyze place–time–attribute information in social media. Cartography and Geographic Information Science 45, 420–437.

[30] Rajput, A., 2020. Natural language processing, sentiment analysis, and clinical analytics, in: Innovation in Health Informatics. Elsevier, pp. 79–97.

[31] Razak, C.S.A., Zulkarnain, M.A., Ab Hamid, S.H., Anuar, N.B., Jali, M.Z., Meon, H., 2020. Tweep: A system development to detect depression in twitter posts, in: Computational Science and Technology. Springer, pp. 543–552.

[32] Ross, S.W., Lauer, C.W., Miles, W.S., Green, J.M., Christmas, A.B., May, A.K., Matthews, B.D., 2020. Maximizing the calm before the storm: Tiered surgical response plan for novel coronavirus (covid-19). Journal of the American College of Surgeons .

[33] Samaras, L., García-Barriocanal, E., Sicilia, M.A., 2020. Comparing social media and google to detect and predict severe epidemics. Scientific Reports 10, 1–11.

[34] Saura, J.R., Reyes-Menendez, A., Palos-Sanchez, P., 2019. Are black friday deals worth it? mining twitter users’ sentiment and behavior response. Journal of Open Innovation: Technology, Market, and Complexity 5, 58.

[35] Shelton, T., Poorthuis, A., Graham, M., Zook, M., 2014. Mapping the data shadows of hurricane sandy: Uncovering the sociospatial dimensions of ‘big data’. Geoforum 52, 167–179.

[36] Singh, N., Roy, N., Gangopadhyay, A., 2019. Analyzing the emotions of crowd for improving the emergency response services. Pervasive and Mobile Computing 58, 101018.

[37] Smith, Shane, 2020. Tweets using hashtags associated with coronavirus. Data retrieved March 30, 2020 from Kaggle, https://www.kaggle.com/smid80/coronavirus-covid19-tweets.

[38] Sohrabi, C., Alsafi, Z., O’Neill, N., Khan, M., Kerwan, A., Al-Jabir, A., Iosifidis, C., Agha, R., 2020. World health organization declares global emergency: A review of the 2019 novel coronavirus (covid-19). International Journal of Surgery .

[39] Soni, R., Sharma, S., Fagna, H., Mittal, S., et al., 2019. News analysis using word cloud, in: Advances in Signal Processing and Communication. Springer, pp. 55–64.

[40] Tapia, A.H., Moore, K., 2014. Good enough is good enough: Overcoming disaster response organizations’ slow social media data adoption. Computer supported cooperative work (CSCW) 23, 483–512.

[41] Wang, Y., Liu, S., Afzal, N., Rastegar-Mojarad, M., Wang, L., Shen, F., Kingsbury, P., Liu, H., 2018. A comparison of word embeddings for the biomedical natural language processing. Journal of biomedical informatics 87, 12–20.

[42] Ye, X., She, B., Li, W., Kudva, S., Benya, S., 2020. What and where are we tweeting about black friday?, in: Urban and Regional Planning and Development. Springer, pp. 173–186.

[43] Zadka, M., 2019. Installing python, in: DevOps in Python. Springer, pp. 1–6.

Koffka Khan received the B.Sc., M.Sc., M.Phil., and D.Phil. degrees from the University of the West Indies (UWI). He is currently an Assistant Lecturer at UWI and has, to date, published numerous papers in journals and proceedings of international repute. His research areas are computational intelligence, routing protocols, wireless communications, information security, adaptive video streaming and machine learning.

Emilie Ramsahai is a consulting Data Scientist with more than 20 years of industry experience. She is currently working with UWI-ROYTEC in programme development and course writing. She completed a PhD degree in Statistics and an MSc degree in Computer Science, both at the University of the West Indies, where she has also lectured the Big Data and Visualisation course of the MSc in Data Science offered by the Department of Computing and Information Technology, St Augustine Campus. She also completed a fellowship at the International Centre for Genetic Engineering and Biotechnology (ICGEB) in New Delhi, India, and continues to publish and collaborate with a number of researchers in this area.
