Diving into Twitter data on consumer electronic brands
Uploaded by: eugene-yan
Category: Data & Analytics
Transcript of Diving into Twitter data on consumer electronic brands
![Page 1: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/1.jpg)
Diving into Twitter data on consumer electronic brands
![Page 2: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/2.jpg)
Which brands get tweeted about most? Is it mainly positive or negative?
![Page 3: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/3.jpg)
15.3 GB of JSON data downloaded from Twitter’s Streaming API
between 13 and 25 May using Python
![Page 4: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/4.jpg)
Before processing, tweets were in raw JSON format
Time created
Tweet text/status
Username
Tweet location (if available)
No. of followers
No. of people followed
No. of statuses
Language
Data should be optimized, as only a fraction of the data is used for analysis;
optimization improves performance in models and saves cost and time
![Page 5: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/5.jpg)
The same tweet we saw previously
By optimizing the data,
15.3 GB of JSON was converted to 757 MB of CSV (5% of original size)
After processing, only some fields were retained and converted to CSV format
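The conversion described on this slide could look roughly like the sketch below. It is not the author's script: the key names (`created_at`, `user.screen_name`, `followers_count`, and so on) follow Twitter's v1.1 tweet JSON, and the choice of `user.location` for "tweet location" and the helper names `tweet_to_row` and `json_lines_to_csv` are assumptions made for illustration.

```python
import csv
import io
import json

# Fields kept per tweet. Key names follow Twitter's v1.1 tweet JSON; which
# fields the original script kept is an assumption based on the slide.
FIELDS = ["created_at", "text", "screen_name", "location",
          "followers_count", "friends_count", "statuses_count", "lang"]


def tweet_to_row(tweet):
    """Flatten one decoded tweet into the few fields used for analysis."""
    user = tweet.get("user") or {}
    return {
        "created_at": tweet.get("created_at", ""),
        # Collapse newlines and repeated spaces so each tweet is one tidy row.
        "text": " ".join((tweet.get("text") or "").split()),
        "screen_name": user.get("screen_name", ""),
        "location": user.get("location") or "",  # often missing
        "followers_count": user.get("followers_count", 0),
        "friends_count": user.get("friends_count", 0),
        "statuses_count": user.get("statuses_count", 0),
        "lang": tweet.get("lang", ""),
    }


def json_lines_to_csv(lines, out):
    """Write one CSV row per valid tweet; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive lines from the stream
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # truncated or corrupt record
        if not isinstance(tweet, dict) or "text" not in tweet:
            continue  # delete/limit notices carry no tweet text
        writer.writerow(tweet_to_row(tweet))
        written += 1
    return written


if __name__ == "__main__":
    raw = ['{"text": "Hello Nokia", "lang": "en", "user": {"screen_name": "a"}}']
    buffer = io.StringIO()
    json_lines_to_csv(raw, buffer)
    print(buffer.getvalue())
```

Dropping every field not needed downstream is what produces the size reduction; the exact ratio depends on which fields are kept.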
![Page 6: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/6.jpg)
Brand Positive Sentiment
Brand Negative Sentiment
Brand Mixed Sentiment
The list of words for sentiment analysis is adapted from
the Harvard General Inquirer dictionaries
Source: http://www.wjh.harvard.edu/~inquirer/homecat.htm, downloaded on 28 May 2014
Tweets are then tagged for brand and sentiment in R
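The tagging step was done in R; the Python sketch below only illustrates the idea. The word lists are tiny stand-ins (the deck used lists adapted from the General Inquirer), the brand list is taken from the seven collection keywords, and the function names and the choice to match sentiment on whole words are assumptions.

```python
import re

BRANDS = ["Samsung", "Sony", "Nokia", "HTC", "Huawei", "BlackBerry", "Motorola"]

# Stand-in word lists for illustration only.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "broken"}


def tag_brands(text):
    """Return every brand whose name appears in the tweet (case-insensitive)."""
    lowered = text.lower()
    return [brand for brand in BRANDS if brand.lower() in lowered]


def tag_sentiment(text):
    """Label a tweet positive, negative, mixed, or neutral from word lists."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    has_positive = bool(words & POSITIVE)
    has_negative = bool(words & NEGATIVE)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweet = "I love my Sony but the charger is broken"
    print(tag_brands(tweet), tag_sentiment(tweet))
```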
![Page 7: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/7.jpg)
Initially, tweets were collected based on 17 keywords
Samsung
S4
Xperia
HTC
Huawei
BlackBerry
Apple
S5
Sony
Nokia
Note 3
Lumia
q5
iPhone
q10
z10
Motorola
![Page 8: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/8.jpg)
“Apple” and “iPhone” accounted for 87% of tweet volume
These two keywords were removed during actual data collection to focus on
other brands (and to save space and reduce bandwidth usage)
A trial was conducted with 16 keywords on 11 May, 8 – 9am
1 GB of JSON data was collected in an hour
During a one-hour trial, “Apple” and “iPhone” had an 87% share of tweets
![Page 9: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/9.jpg)
Samsung
Sony
Nokia
HTC
Huawei
BlackBerry
Motorola
Tweets containing seven keywords were collected from 13 – 25 May
![Page 10: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/10.jpg)
4% of tweets mentioned 2 or more brands; they were excluded from analysis
8% of tweets had mixed sentiment (i.e., positive and negative sentiment); they were excluded from analysis
92% of tweets remained, each mentioning only 1 brand with either “positive”, “negative”, or “neutral” sentiment
3,681,942 tweets were collected
After processing, 3,234,678 tweets remained for analysis
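As a rough illustration of the exclusion rule and the resulting retention, here is a small Python sketch. The function name `keep_for_analysis` and the sample tweets are hypothetical; only the two tweet counts come from the slide.

```python
def keep_for_analysis(brands, sentiment):
    """Keep tweets that mention exactly one brand and are not mixed in sentiment."""
    return len(brands) == 1 and sentiment != "mixed"


# Hypothetical tagged tweets: (brands mentioned, sentiment label).
tagged = [
    (["Samsung"], "positive"),
    (["Samsung", "Nokia"], "negative"),  # dropped: more than one brand
    (["Sony"], "mixed"),                 # dropped: mixed sentiment
    (["HTC"], "neutral"),
]
kept = [tweet for tweet in tagged if keep_for_analysis(*tweet)]

# Counts reported on the slide.
collected = 3_681_942
remaining = 3_234_678
retained_share = remaining / collected

if __name__ == "__main__":
    print(len(kept), "of", len(tagged), "sample tweets kept")
    print(f"Share of collected tweets retained: {retained_share:.1%}")
```

With the slide's counts, the retained share works out to roughly 88% of all collected tweets.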
![Page 11: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/11.jpg)
Samsung leads in Twitter buzz, followed by Sony and Nokia
Together, they make up 75% of Twitter buzz
Samsung is the clear leader in Twitter buzz, followed by Sony and Nokia
However, Samsung and Sony have wider product offerings
relative to the rest, which mainly focus on phones
Also, Huawei’s users may mainly be on Weibo, Renren, etc.
![Page 12: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/12.jpg)
Most brands have a roughly 1:1 ratio of positive to negative tweets
Samsung is the exception, with a ratio of roughly 3:2
Most brands have a roughly equal ratio of positive to negative tweets
![Page 13: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/13.jpg)
Dip due to connectivity issues
Brands’ share of tweets is roughly consistent over time
![Page 14: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/14.jpg)
Spikes in tweet volume coincide with product launches
![Page 15: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/15.jpg)
Spikes in tweet volume coincide with product launches
![Page 16: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/16.jpg)
Users who tweet about BlackBerry tend to be better connected (i.e., higher median of followers and people followed)*
* Excluding outliers
Across brands, there is not much difference in user connectedness
The median user has around 250 followers and also follows 250 people
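A per-brand median like the one on this slide can be computed along these lines. This is a sketch only: the row layout, the field names, and the function `median_by_brand` are assumptions, and the slide's outlier exclusion is not reproduced because its method is not stated.

```python
from collections import defaultdict
from statistics import median


def median_by_brand(rows, field):
    """Group rows by brand and return the median of one numeric field per brand."""
    values = defaultdict(list)
    for row in rows:
        values[row["brand"]].append(row[field])
    return {brand: median(vals) for brand, vals in values.items()}


if __name__ == "__main__":
    # Hypothetical rows, one per tweet, with the tweeting user's follower count.
    rows = [
        {"brand": "Nokia", "followers": 100},
        {"brand": "Nokia", "followers": 300},
        {"brand": "BlackBerry", "followers": 400},
    ]
    print(median_by_brand(rows, "followers"))
```

The median is used rather than the mean because follower counts are heavily skewed by a few very large accounts.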
![Page 17: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/17.jpg)
The 50th – 75th percentile of users who tweet about Sony, HTC, and Motorola have very high numbers of all-time tweets (spam bots perhaps?)*
While Nokia is 3rd in Twitter buzz share (14%), users who tweet about Nokia have the lowest numbers of all-time tweets
This suggests that these tweets are likely to come from real users and not bots (or maybe from less active bots)
* Excluding outliers
However, there is a large difference between users’ all-time tweets
![Page 18: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/18.jpg)
12,833,979 followers
11,796,709 followers
CNN’s tweet on Obama’s BlackBerry was “seen” by most followers
![Page 19: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/19.jpg)
1,753,696 tweets
1,730,006 tweets
A bot that retweets on farts has the highest number of all-time tweets
![Page 20: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/20.jpg)
1,753,696 tweets
1,730,006 tweets
A bot that retweets on farts has the highest number of all-time tweets
![Page 21: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/21.jpg)
Initially, BlackBerry tweets showed 100% negative sentiment
The culprit was the word “lack”, which appears inside “BlackBerry”; it was removed from the word list
However, removing it reduced negative sentiment for other brands by 2 – 3%
An interesting error led to BlackBerry having 100% negative sentiment
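The error is what one would expect if sentiment words were matched as substrings, which is an assumption about the original R script rather than something the slides confirm. The Python sketch below reproduces that behavior and contrasts it with whole-word matching using regex word boundaries. Whole-word matching is an alternative shown for comparison; the deck's own fix was to remove “lack” from the list. The word list and function names are hypothetical.

```python
import re

NEGATIVE_WORDS = ["lack", "bad"]  # hypothetical excerpt of a negative word list


def has_negative_substring(text):
    """Substring matching: 'lack' is found inside 'BlackBerry'."""
    lowered = text.lower()
    return any(word in lowered for word in NEGATIVE_WORDS)


# \b anchors each word at word boundaries, so 'lack' no longer matches 'BlackBerry'.
NEGATIVE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in NEGATIVE_WORDS) + r")\b",
    re.IGNORECASE,
)


def has_negative_word(text):
    """Whole-word matching against the same list."""
    return NEGATIVE_PATTERN.search(text) is not None


if __name__ == "__main__":
    tweet = "Just got a new BlackBerry"
    print(has_negative_substring(tweet))  # True: every BlackBerry tweet is flagged
    print(has_negative_word(tweet))       # False
```

Whole-word matching keeps “lack” available as a negative word for other brands, at the cost of missing inflected forms such as “lacking” unless they are listed separately.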
![Page 22: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/22.jpg)
Track brands’ managed Twitter accounts and conversations to measure engagement: Which brands have better engagement with users and why?
Track general message of tweets: Are tweets of a brand mainly about sales, reviews, complaints, or news?
Network analysis to identify users with high centrality and influence: Which users have high influence and what are they tweeting about my brand?
Geospatial analysis of tweets: Are there differences in brand buzz, sentiment, and engagement across regions?
Where do we go from here?
![Page 23: Diving into Twitter data on consumer electronic brands](https://reader033.fdocuments.us/reader033/viewer/2022051612/54c664dc4a79594b538b4724/html5/thumbnails/23.jpg)
Code available on GitHub: https://github.com/eugeneyan/Twitter-SMA
Python script to download tweets in JSON format
Python scripts to convert tweets from JSON to CSV (with and without regular expression filtering)
R script and sentiment analysis list of words
R script and sentiment analysis list of words to reproduce the BlackBerry error