Five steps to search and store tweets by keywords
• Created by The Curiosity Bits Blog (curiositybits.com)
• With support from Dr. Gregory D. Saxton (http://social-metrics.org/)
The output you will get…
Let’s say I want to study Twitter discussions of the missing Malaysian airliner
MH370. I plan to gather all tweets that include the keywords MH370 or
Malaysian.
You will get an ample amount of metadata for each tweet. Here is a breakdown
of each metadata type:
tweet_id: The unique identifier for a tweet
inserted_date: When the tweet was downloaded into your database
language: The language of the tweet
retweeted_status: Whether the tweet is a retweet
content: The text content of the tweet
from_user_screen_name: The screen name of the tweet sender
from_user_followers_count: The number of followers the sender has
from_user_friends_count: The number of users the sender is following
from_user_listed_count: How many times the sender is listed
from_user_statuses_count: The number of tweets sent by the sender
from_user_description: The profile bio of the sender
from_user_location: The location of the sender
from_user_created_at: When the sender's Twitter account was created
retweet_count: How many times the tweet has been retweeted
entities_urls: The URLs included in the tweet
entities_urls_count: The number of URLs included in the tweet
entities_hashtags: The hashtags included in the tweet
entities_hashtags_count: The number of hashtags in the tweet
entities_mentions: The screen names mentioned in the tweet
in_reply_to_screen_name: The screen name of the user the sender is replying to
in_reply_to_status_id: The unique identifier of the tweet being replied to
entities_expanded_urls: Complete URLs extracted from shortened URLs
json_output: The ENTIRE metadata in JSON format, including metadata not parsed into columns
entities_media_count: NA
media_expanded_url: NA
media_url: NA
media_type: NA
video_link: NA
photo_link: NA
twitpic: NA
Step 1: Checklist
• Do you know how to install necessary Python libraries? If not, please review pg.8 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
• Do you know how to browse and edit SQLite database through SQLite Database Browser? If not, please review pg.10-14 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
Download the code: https://drive.google.com/file/d/0Bwwg6GLCW_IPdm1mcHNXeU85Nkk/edit?usp=sharing
Have you installed the necessary Python libraries?
Most importantly, we need to install a Twitter mining
library called Twython
(https://twython.readthedocs.org/en/latest/index.html)
Step 2: enter the search terms
You can enter multiple search terms, separated by commas. Note that the last search term also ends with a comma.
You can enter non-English search terms, but make sure the Python script starts with the following block of code:
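The slide image with this block is not reproduced in the transcript; a plausible reconstruction is below. The variable name `search_terms` is an assumption, but a Python list matches what the slides describe: terms separated by commas, with the last term also followed by a comma.

```python
# -*- coding: utf-8 -*-
# The encoding declaration above lets a (Python 2) script contain
# non-English search terms, e.g. Chinese keywords.

# Hypothetical variable name; note the trailing comma after the
# last term, as the slides require.
search_terms = [
    "MH370",
    "Malaysian",
]
```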
Step 3: enter your API keys
API Key
API secret
Access token
Access token secret
Enter each key inside the quotation marks
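As a sketch, the four credentials from the slide map onto four string variables (the variable names here are assumptions, not necessarily those used in the original script):

```python
# Placeholder credentials; replace each value with the corresponding
# key from your application page at https://dev.twitter.com/.
API_KEY = "YOUR_API_KEY"
API_SECRET = "YOUR_API_SECRET"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"

# With Twython installed, the authenticated client is built from
# these four values:
#   from twython import Twython
#   twitter = Twython(API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
```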
• Set up your API keys - 1
First, go to https://dev.twitter.com/ and sign in with your Twitter account. Go to the My Applications page to create an application.
• Set up your API keys - 2
Enter any name that makes sense to you.
Enter any description that makes sense to you.
You can enter any legitimate URL; here, I put in the URL of my institution.
Same as above: any legitimate URL works.
Step 4: change the parameter
result_type is a parameter defined by the Twitter API documentation. Here we set it to recent; we could also set it to mixed or popular.
Here is a list of parameters you can tweak or add:
https://dev.twitter.com/docs/api/1.1/get/search/tweets
For example, if you want to limit the search to Chinese-language tweets, you can add lang = ‘zh’.
As another example, if you want to limit the search to tweets sent up until April 1, 2014, you can add until = ‘2014-04-01’.
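Putting the Step 4 parameters together, a hedged sketch of what the search call's arguments might look like (only result_type, lang, and until come from the slides; the query string and dictionary layout are assumptions):

```python
# Hypothetical parameter dictionary for the search/tweets endpoint
# (https://dev.twitter.com/docs/api/1.1/get/search/tweets).
params = {
    "q": "MH370 OR Malaysian",  # search query (assumed format)
    "result_type": "recent",    # or "mixed" / "popular"
    "lang": "zh",               # limit results to Chinese tweets
    "until": "2014-04-01",      # only tweets sent up to this date
}

# With a Twython client, the call would look like:
#   results = twitter.search(**params)
```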
Step 5: set up SQLite database
• When you type in just a file name, the database will be saved in the same folder as the Python script. You can also use a full file path, such as sqlite:///C:/xxxx/xxx/MH370.sqlite.
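The sqlite:/// URL in the slides suggests the original script uses SQLAlchemy; a minimal sketch of the same storage idea using only the standard-library sqlite3 module is below. The table and column names are assumptions drawn from the metadata list earlier in this document.

```python
import sqlite3

# An in-memory database for illustration; a real run would use a file
# name such as "MH370.sqlite" (or a full path) so the data persists.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS tweets (
        tweet_id TEXT PRIMARY KEY,
        content TEXT,
        from_user_screen_name TEXT,
        retweet_count INTEGER
    )
""")
# Store one example row; INSERT OR IGNORE skips tweets already stored.
cur.execute(
    "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
    ("123456", "Example tweet mentioning MH370", "example_user", 0),
)
conn.commit()
```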
Hit RUN!
If you run the script daily or twice a day, you should be able to cover all tweets generated on that day, as well as tweets a few days old.
But historical tweets are EXPENSIVE! Tweets older than a week can be purchased through http://gnip.com/
Are we getting all the tweets?