John Conroy
Transcript of John Conroy
This talk…
• Twitter 101
• Twitter data: Open, Plentiful, Real-time
• Twitter Size, Growth & User Profile
• Acquiring Twitter Data (the easy & the hard way)
• Applications of Twitter data analysis
– Business
– Research
• Limitations of Twitter data – demographics and spam
Twitter 101
• Post short messages
• Messages (tweets) can contain hyperlinks
• Follow other users
– i.e. subscribe to see their tweets when you log in
• 1
Retweets, at/replies…
Twitter Data is Open, Plentiful, Real-time
• Open attitude to data: most users’ tweets are public (>90%)
– Channels: API, Twitter Search
• Data is plentiful: ~100m tweets per day by Nov. ’10
• Data is real-time:
– 140 char posts + retweets = wildfire dissemination of news & viral content
• Iran protests ’09:
– retweets
Twitter Size, Growth
• Size: 105m users by April ’10
• 2.1m new users per week
• 600m search queries/day
» Williams (CEO), Chirp, April ’10
• User growth: 155% p.a.
• Daily tweets growing at 550% p.a.
• ~100m tweets per day by Nov. ’10
» Conroy/Griffith June ’10 (6 months data)
Twitter User-Profile
• Even male-female split
• Brand knowledge now ubiquitous
• 1/6 as many users as Facebook
• Age: 1/3 between 25-34 years old
• Better educated, earn more
http://www.edisonresearch.com/twitter_usage_2010.php
Edison research (U.S.-oriented research)
Twitter users: Age
http://www.edisonresearch.com/twitter_usage_2010.php
Edison research (U.S.-oriented research)
Twitter Users Profile
Acquiring Twitter Data
• The Easy Way
– Twitter Search: http://search.twitter.com
– For anybody
• The Hard Way
– Twitter APIs: REST, Search, Streaming APIs
– Code (Python/PHP/Java etc…)
Acquiring Twitter Data- Twitter Search
• Things to do with Twitter Search
• Find business opportunities
• Intel on competitors
• Community-building: Answer “Does anyone know…?” queries in your segment
• Find gripes/compliments on your service
• Find anything else people are saying about you
• …etc…
Acquiring Data from Twitter APIs
• REST API – find out about users: how many friends they have, how often they tweet, their last N tweets, whether they are active, etc.
• Search API – programmatic access to Twitter Search
• Streaming API – ‘firehose’ of tweets from everyone
What can we do with this data?
• Model the social graph of sub groups: find most-influential users (retweets, replies, follower/friend quotient)
• E.g. modelling the Irish Twittersphere (Conroy, Griffith, 2010)
• Find the ‘true’ social graph described by conversations, find authoritative users, broadcasters
• User engagement metrics (how often they tweet etc.)
• Find similar users based on graph theory
• Study viral news propagation through this sub-group
• Find super-users (with a view to engaging them)
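Two of the influence measures listed above (replies received and the follower/friend quotient) can be sketched in a few lines of Python. This is only an illustration: the record layout (`from`, `reply_to`, `followers`, `friends`) and the sample users are invented here, and it is not the format returned by the Twitter APIs or the method used in the Irish Twittersphere study.

```python
from collections import Counter


def most_replied_to(tweets, top_n=3):
    """Rank users by the number of replies they receive from other users."""
    counts = Counter(
        t["reply_to"]
        for t in tweets
        if t.get("reply_to") and t["reply_to"] != t["from"]
    )
    return counts.most_common(top_n)


def follower_friend_quotient(user):
    """Followers divided by friends (the accounts this user follows)."""
    return user["followers"] / max(user["friends"], 1)


# Hypothetical sample data, for illustration only.
tweets = [
    {"from": "alice", "reply_to": "carol"},
    {"from": "bob", "reply_to": "carol"},
    {"from": "carol", "reply_to": "alice"},
    {"from": "dave", "reply_to": None},
]
users = {
    "alice": {"followers": 200, "friends": 100},
    "carol": {"followers": 900, "friends": 90},
}

ranking = most_replied_to(tweets)
quotients = {name: follower_friend_quotient(u) for name, u in users.items()}
print(ranking)
print(quotients)
```

Counting replies received gives a conversation-based view of influence, while the quotient separates broadcasters (many followers, few friends) from accounts that simply follow back.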
• Irish users: time since last tweet (c.23k users)
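[Bar chart: Time Since Last Tweet. y-axis: number of users, 0 to 4,500. Bar labels as extracted: 1351, 19, 491, 3192, 4077, 2745, 2974, 2948, 4012, 252, 57. The time categories for the bars are not recoverable from the transcript.]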
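A chart of this kind can be built by bucketing each user's time since their last tweet. The sketch below assumes the last-tweet timestamps have already been collected; the bucket boundaries and sample timestamps are assumptions made for illustration, since the categories used in the original chart are not known.

```python
from collections import Counter
from datetime import datetime, timedelta

# Bucket boundaries are an assumption; the chart's own categories
# are not recoverable from the transcript.
BUCKETS = [
    (timedelta(days=1), "< 1 day"),
    (timedelta(days=7), "< 1 week"),
    (timedelta(days=30), "< 1 month"),
]


def bucket_for(last_tweet_at, now):
    """Return the label of the first bucket the tweet's age fits into."""
    age = now - last_tweet_at
    for limit, label in BUCKETS:
        if age < limit:
            return label
    return "1 month or more"


def histogram(last_tweet_times, now):
    """Count users per time-since-last-tweet bucket."""
    return Counter(bucket_for(t, now) for t in last_tweet_times)


# Hypothetical timestamps, for illustration only.
now = datetime(2010, 3, 31, 12, 0)
sample = [
    now - timedelta(hours=3),
    now - timedelta(days=2),
    now - timedelta(days=3),
    now - timedelta(days=90),
]
hist = histogram(sample, now)
print(hist)
```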
• Most replied-to by Irish users
• Feb-March ’10
• 93k replies from 23k users
• Also c.7k retweets (?)
Predictive Modelling
• Business Intelligence
• Non-Twitter example: satellite images of Wal-Mart car-parks to predict earnings – smart but expensive!
Using Twitter for Predictive Modelling
• E.g. 1: holiday destinations
[Chart: Tweets mentioning [~holiday] plus [DESTINATION]. x-axis: Jan-Mar of each year, 2010 to 2015. y-axis: #Tweets, 0 to 3,000,000. Series: Dublin, Prague, Dubrovnik.]
[Chart: Actual Tourists. x-axis: Year, 2010 to 2015. y-axis: #Tourists, 0 to 7,000,000. Series: Dublin, Prague, Dubrovnik.]
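The comparison between the two charts (holiday tweets versus actual tourists) can be sketched as a keyword match followed by a correlation. Everything specific below is an assumption for illustration: the `HOLIDAY_TERMS` set stands in for the slide's [~holiday] pattern, the matcher only handles single-word destinations, and the yearly figures are invented rather than taken from the charts.

```python
import math
import re

# Assumed stand-in for the slide's [~holiday] pattern.
HOLIDAY_TERMS = {"holiday", "holidays", "vacation", "trip"}


def mentions(text, destination):
    """True if the tweet names the (single-word) destination and a holiday term."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return destination.lower() in words and bool(words & HOLIDAY_TERMS)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical tweets, for illustration only.
tweets = [
    "Booked our holiday in Prague!",
    "Prague traffic is terrible today",
    "Thinking about a trip to Dublin",
]
prague_count = sum(mentions(t, "Prague") for t in tweets)

# Invented yearly figures, only to show the calculation.
tweet_counts = [100, 150, 210, 260]
tourist_counts = [1000, 1400, 2100, 2500]
r = pearson(tweet_counts, tourist_counts)
print(prague_count, round(r, 3))
```

A high correlation on past years would be the minimum evidence needed before treating tweet volume as a leading indicator; it does not by itself show that tweets predict visits.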
Acquiring Data from Twitter APIs – Predictive Modelling
• E.g. 2: Movie pre-launch “buzz” & marketing budget (not real figures!)
[Chart: #tweets two weeks before launch and opening weekend take. Axis: 0 to 4,000,000. Films: Inception, The A-Team, The Expendables, Our New Movie.]
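One simple way to turn pre-launch buzz into a forecast is a least-squares line fitted to earlier films and then applied to the new one. The sketch below is an assumption about how such a model might look, not the approach used in the talk, and the figures are made up (as on the slide itself).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return slope, intercept


# Made-up figures: (pre-launch tweets in millions, opening weekend take in millions).
past_films = [(1.0, 18.0), (2.0, 42.0), (3.0, 60.0)]
buzz = [x for x, _ in past_films]
take = [y for _, y in past_films]

slope, intercept = fit_line(buzz, take)

# Forecast for a hypothetical new film with 2.5m pre-launch tweets.
predicted_take = slope * 2.5 + intercept
print(slope, intercept, predicted_take)
```

With only a handful of past films the fit is fragile, so a forecast like this is better read as a rough guide for budgeting than as a prediction.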
Brand Sentiment Analysis
• Sentiment analysis of Super Bowl commercials 2010 (Conroy and Griffith, 2010)
– 300k tweets collected during the game
– Probabilistic classification models & machine learning
– Naïve Bayes, Maximum Entropy, (S.V.M.)
– Try to find out which were the most popular commercials
– Hard!! Human language is complex…
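To make the Naïve Bayes option concrete, here is a minimal sketch of a word-count classifier with add-one smoothing. It is not the authors' implementation: the whitespace tokenizer, the `pos`/`neg` labels and the four training tweets are all invented for illustration, and a real study would need far more labelled data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case and split on whitespace (deliberately naive)."""
    return text.lower().split()


class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training tweets
        self.vocab = set()

    def train(self, labelled):
        for text, label in labelled:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus sum of log likelihoods, with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled tweets, for illustration only.
training = [
    ("loved that car ad", "pos"),
    ("that ad was hilarious", "pos"),
    ("worst commercial ever", "neg"),
    ("that ad was boring and lame", "neg"),
]
model = NaiveBayes()
model.train(training)
print(model.classify("hilarious ad"))
print(model.classify("boring commercial"))
```

A bag-of-words model like this ignores word order, negation and sarcasm, which is one reason the task is hard on real tweets.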
Sentiment Analysis of Superbowl Commercials: Results
Note: initial manual verification of these results has been disappointing… the research continues
What else can we do with this data?
Limitations of Twitter Data
• Twitter < Facebook for “knowing your customer”
– Facebook has demographics: age, sex etc.
• Demographics skewed towards 25-34 year olds & the tech-savvy; not ubiquitous
• Spam: The game can be rigged