Five steps to search and store tweets by keywords

Five Steps to Search and Store Tweets by Keywords. Created by The Curiosity Bits Blog (curiositybits.com), with support from Dr. Gregory D. Saxton (http://social-metrics.org/)


Page 1: Five steps to search and store tweets by keywords

Five Steps to Search and Store Tweets by Keywords

• Created by The Curiosity Bits Blog (curiositybits.com)

• With support from Dr. Gregory D. Saxton (http://social-metrics.org/)

Page 2: Five steps to search and store tweets by keywords

The output you will get…

Let’s say I want to study Twitter discussions of the missing Malaysian airliner MH370. I plan to gather all tweets that include the keywords MH370 or Malaysian.

You will get a rich set of metadata for each tweet. Here is a breakdown of each metadata field:

Name                       Definition
tweet_id                   The unique identifier of the tweet
inserted_date              When the tweet was downloaded into your database
language                   The language of the tweet
retweeted_status           Whether the tweet is a retweet
content                    The content of the tweet
from_user_screen_name      The screen name of the tweet sender

Page 3: Five steps to search and store tweets by keywords

Name                       Definition
from_user_followers_count  The number of followers the sender has
from_user_friends_count    The number of users the sender is following
from_user_listed_count     How many times the sender is listed
from_user_statuses_count   The number of tweets sent by the sender
from_user_description      The profile bio of the sender
from_user_location         The location of the sender
from_user_created_at       When the sender's Twitter account was created
retweet_count              How many times the tweet has been retweeted
entities_urls              The URLs included in the tweet
entities_urls_count        The number of URLs included in the tweet
entities_hashtags          The hashtags included in the tweet
entities_hashtags_count    The number of hashtags in the tweet
entities_mentions          The screen names mentioned in the tweet

Page 4: Five steps to search and store tweets by keywords

Name                       Definition
in_reply_to_screen_name    The screen name of the user the sender is replying to
in_reply_to_status_id      The unique identifier of the tweet being replied to
entities_expanded_urls     Complete URLs extracted from short URLs
json_output                The ENTIRE metadata in JSON format, including metadata not parsed into columns
entities_media_count       NA
media_expanded_url         NA
media_url                  NA
media_type                 NA
video_link                 NA
photo_link                 NA
twitpic                    NA

Page 5: Five steps to search and store tweets by keywords

Step 1: Checklist

• Do you know how to install the necessary Python libraries? If not, please review p. 8 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/

• Do you know how to browse and edit a SQLite database with SQLite Database Browser? If not, please review pp. 10-14 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/

Download the code: https://drive.google.com/file/d/0Bwwg6GLCW_IPdm1mcHNXeU85Nkk/edit?usp=sharing

Page 6: Five steps to search and store tweets by keywords

Step 1: Checklist

Have you installed these necessary Python libraries?

Page 7: Five steps to search and store tweets by keywords

Step 1: Checklist

Most importantly, we need to install a Twitter mining library called Twython (https://twython.readthedocs.org/en/latest/index.html)
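One way to confirm the checklist before running the script is to ask Python which libraries it can find. The sketch below is only an illustration: Twython is the library named on this slide, while the second entry in the list is an assumption, so edit the list to match the imports at the top of the downloaded script.

```python
import importlib.util

# Hypothetical list: Twython is named in the tutorial; SQLAlchemy is an
# assumption. Edit it to match the imports at the top of your script.
REQUIRED_LIBRARIES = ["twython", "sqlalchemy"]


def find_missing(libraries):
    """Return the names in `libraries` that Python cannot locate."""
    return [name for name in libraries
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_LIBRARIES)
    if missing:
        print("Missing libraries; try: pip install " + " ".join(missing))
    else:
        print("All required libraries are installed.")
```

If anything is reported missing, install it and run the check again before moving on to Step 2.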

Page 8: Five steps to search and store tweets by keywords

Step 2: enter the search terms

You can enter multiple search terms, separated by commas. Please notice that the last search term also ends with a comma.

You can enter non-English search terms. But make sure the Python script starts with the following block of code:
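The code shown on this slide is not part of the transcript, so the following is only a sketch of what such a setup might look like. It assumes the search terms live in a Python list (the variable name `search_terms` is hypothetical) and that the block in question is a source-encoding declaration, which Python 2 requires before it will accept non-ASCII characters in a script.

```python
# -*- coding: utf-8 -*-
# The line above declares the source encoding. Python 2 needs it to accept
# non-ASCII characters; Python 3 already assumes UTF-8 by default.

# Hypothetical variable name; match whatever the downloaded script uses.
search_terms = [
    "MH370",
    "Malaysian",
    "马航",  # a Chinese-language term, added here only for illustration
]

for term in search_terms:
    print("Will search for: " + term)
```

The trailing comma after the last term is legal Python and does not add an extra item; it simply makes it easier to append more terms later.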

Page 9: Five steps to search and store tweets by keywords

Step 3: enter your API keys

API Key

API secret

Access token

Access token secret

Enter the key inside the quotation marks
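As a rough sketch, the four credentials can be kept in four string variables and handed to Twython in the order listed above. The variable names and the two helper functions here are illustrative and may differ from the downloaded script; the empty strings are placeholders for your own keys.

```python
# Placeholder values: paste your own keys inside the quotation marks.
# The variable names are illustrative and may differ from the script's.
APP_KEY = ""
APP_SECRET = ""
OAUTH_TOKEN = ""
OAUTH_TOKEN_SECRET = ""


def missing_credentials(credentials):
    """Return the names of credentials that are still empty."""
    return [name for name, value in credentials.items() if not value.strip()]


def make_client():
    """Create a Twython client once all four credentials are filled in."""
    credentials = {
        "APP_KEY": APP_KEY,
        "APP_SECRET": APP_SECRET,
        "OAUTH_TOKEN": OAUTH_TOKEN,
        "OAUTH_TOKEN_SECRET": OAUTH_TOKEN_SECRET,
    }
    missing = missing_credentials(credentials)
    if missing:
        raise ValueError("Please fill in: " + ", ".join(missing))
    # Imported here so the check above runs even if Twython is not installed.
    from twython import Twython
    return Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
```

Checking for empty values first gives a clearer message than the authentication error the API would otherwise return.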

Page 10: Five steps to search and store tweets by keywords

Step 3: enter your API keys

• Set up your API keys - 1

First, go to https://dev.twitter.com/ and sign in with your Twitter account. Then go to the "My applications" page to create an application.

Page 11: Five steps to search and store tweets by keywords

Step 3: enter your API keys

• Set up your API keys - 2

Enter any name that makes sense to you.

Enter any text that makes sense to you.

You can enter any legitimate URL; here, I put in the URL of my institution.

As above, you can enter any legitimate URL; here, I put in the URL of my institution.

Page 12: Five steps to search and store tweets by keywords

Step 4: change the parameter

The result_type parameter is defined in the Twitter API documentation. Here we set it to recent; we can also set it to mixed or popular.
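To make the three allowed values concrete, here is a minimal sketch that builds the search parameters and rejects any other result_type. The helper name `build_search_params` and the client name `twitter` are hypothetical; the downloaded script may pass the parameters directly.

```python
# The three values of result_type named on this slide.
VALID_RESULT_TYPES = ("recent", "mixed", "popular")


def build_search_params(query, result_type="recent", count=100):
    """Return keyword arguments for a search call, validating result_type."""
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError(
            "result_type must be one of: " + ", ".join(VALID_RESULT_TYPES)
        )
    # count is the number of tweets per request (the v1.1 search API caps it at 100).
    return {"q": query, "result_type": result_type, "count": count}


params = build_search_params("MH370")

# With an authenticated Twython client (called `twitter` here by assumption):
# results = twitter.search(**params)
```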

Page 13: Five steps to search and store tweets by keywords

Step 4: change the parameter

Here is a list of parameters you can tweak or add:

https://dev.twitter.com/docs/api/1.1/get/search/tweets

For example, if you want to limit the search to tweets in Chinese, you can add lang = 'zh'

Page 14: Five steps to search and store tweets by keywords

Step 4: change the parameter

As another example, if you want to limit the search to tweets sent before April 1, 2014, you can add until = '2014-04-01'
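The lang and until parameters from the last two slides can be added to the same set of search parameters. The sketch below assumes the parameters are kept in a dictionary; the helper name `add_filters` is hypothetical. It builds the until value from a date object so that it always comes out in the YYYY-MM-DD form the API expects.

```python
from datetime import date


def add_filters(params, lang=None, until=None):
    """Return a copy of `params` with optional lang and until filters added."""
    filtered = dict(params)  # copy, so the original dictionary is untouched
    if lang is not None:
        filtered["lang"] = lang
    if until is not None:
        filtered["until"] = until.isoformat()  # date -> 'YYYY-MM-DD'
    return filtered


base = {"q": "MH370", "result_type": "recent"}
params = add_filters(base, lang="zh", until=date(2014, 4, 1))
print(params)

# With an authenticated Twython client (called `twitter` here by assumption):
# results = twitter.search(**params)
```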

Page 15: Five steps to search and store tweets by keywords

Step 5: set up SQLite database

• When you type in just a file name, the database will be saved in the same folder as the Python script. You can also use a full file path such as sqlite:///C:/xxxx/xxx/MH370.sqlite.
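The sqlite:/// prefix suggests the script opens the database through a connection string, but that is an assumption, and the sketch below swaps in Python's built-in sqlite3 module to show the same idea with no extra libraries. The table name and the handful of columns are illustrative; the real script stores many more fields.

```python
import sqlite3


def open_database(path):
    """Open (or create) the SQLite file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id TEXT PRIMARY KEY,
               content TEXT,
               from_user_screen_name TEXT,
               json_output TEXT
           )"""
    )
    return conn


def store_tweet(conn, tweet_id, content, screen_name, json_output):
    """Insert one tweet; a tweet_id already in the table is skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
            (tweet_id, content, screen_name, json_output),
        )


# ":memory:" keeps this demo off the disk; a bare file name such as
# "MH370.sqlite" would create the file in the current working folder.
conn = open_database(":memory:")
store_tweet(conn, "1", "example tweet text", "example_user", "{}")
store_tweet(conn, "1", "example tweet text", "example_user", "{}")  # duplicate
count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(count)
```

Using tweet_id as the primary key means that re-running the script does not store the same tweet twice.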

Page 16: Five steps to search and store tweets by keywords

Hit RUN!

Page 17: Five steps to search and store tweets by keywords

Are we getting all the tweets?

If you run the script once or twice a day, you should capture all tweets generated that day, as well as tweets from the previous few days.

However, historical tweets are EXPENSIVE! Tweets older than a week can be purchased through http://gnip.com/