Five steps to search and store tweets by keywords

Posted on 13-Jul-2015


• Created by The Curiosity Bits Blog (curiositybits.com)

• With support from Dr. Gregory D. Saxton (http://social-metrics.org/)

The output you will get…

Let’s say I want to study Twitter discussions of the missing Malaysian airliner MH370. I plan to gather all tweets that include the keywords MH370 or Malaysian.

You will get an ample amount of metadata for each tweet. Here is a breakdown of each metadata type:

tweet_id: The unique identifier for a tweet
inserted_date: When the tweet is downloaded into your database
language: The language of the tweet
retweeted_status: Is the tweet a RETWEET?
content: The content of the tweet
from_user_screen_name: The screen name of the tweet sender
from_user_followers_count: The number of followers the sender has
from_user_friends_count: The number of users the sender is following
from_user_listed_count: How many times the sender is listed
from_user_statuses_count: The number of tweets sent by the sender
from_user_description: The profile bio of the sender
from_user_location: The location of the sender
from_user_created_at: When the sender's Twitter account was created
retweet_count: How many times the tweet has been retweeted
entities_urls: The URLs included in the tweet
entities_urls_count: The number of URLs included in the tweet
entities_hashtags: The hashtags included in the tweet
entities_hashtags_count: The number of hashtags in the tweet
entities_mentions: The screen names mentioned in the tweet
in_reply_to_screen_name: The screen name of the user the sender is replying to
in_reply_to_status_id: The unique identifier of the tweet being replied to
entities_expanded_urls: Complete URLs extracted from shortened URLs
json_output: The ENTIRE metadata in JSON format, including metadata not parsed into columns
entities_media_count: NA
media_expanded_url: NA
media_url: NA
media_type: NA
video_link: NA
photo_link: NA
twitpic: NA
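
These column names mirror fields in the JSON that the Twitter Search API (v1.1) returns for each tweet, which Twython hands back as a Python dictionary. As a rough illustration (not the original script; the function name and row layout are assumptions), a few of the columns above could be filled like this:

# Illustrative sketch: map a few of the columns above onto the JSON
# returned by the Twitter Search API v1.1 for a single tweet.
def parse_tweet(t):
    user = t.get('user', {})
    entities = t.get('entities', {})
    return {
        'tweet_id': t.get('id_str'),
        'language': t.get('lang'),
        'retweeted_status': 'retweeted_status' in t,  # True if this is a retweet
        'content': t.get('text'),
        'from_user_screen_name': user.get('screen_name'),
        'from_user_followers_count': user.get('followers_count'),
        'from_user_friends_count': user.get('friends_count'),
        'retweet_count': t.get('retweet_count'),
        'entities_hashtags': [h['text'] for h in entities.get('hashtags', [])],
        'entities_urls': [u['url'] for u in entities.get('urls', [])],
        'entities_expanded_urls': [u['expanded_url'] for u in entities.get('urls', [])],
        'in_reply_to_screen_name': t.get('in_reply_to_screen_name'),
    }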

Step 1: Checklist

• Do you know how to install the necessary Python libraries? If not, please review pg.8 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/

• Do you know how to browse and edit a SQLite database through SQLite Database Browser? If not, please review pg.10-14 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/

Download the code: https://drive.google.com/file/d/0Bwwg6GLCW_IPdm1mcHNXeU85Nkk/edit?usp=sharing

Have you installed these necessary Python libraries?


Most importantly, we need to install a Twitter-mining library called Twython (https://twython.readthedocs.org/en/latest/index.html).
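
As a rough sketch of the dependencies (an assumption based on the checklist above and the sqlite:/// connection string in Step 5), the script needs imports along these lines once the libraries are installed (e.g. pip install twython sqlalchemy):

# Assumed dependencies: Twython for the Twitter API calls,
# SQLAlchemy for writing tweets into the SQLite database.
from twython import Twython
from sqlalchemy import create_engine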

Step 2: enter the search terms

You can enter multiple search terms, separated by commas. Please note that the last search term also ends with a comma.

You can enter non-English search terms, but make sure the Python script starts with the following block of code:
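
The transcript does not reproduce the code block shown on the slide. For a Python 2 script of that era handling non-English keywords, the opening block would typically be a UTF-8 coding declaration, and the search terms would be entered roughly as follows (the variable name is illustrative; the original script's may differ):

# -*- coding: utf-8 -*-
# Search terms, separated by commas; note the comma after the last term too.
search_terms = [
    'MH370',
    'Malaysian',
]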

Step 3: enter your API keys

API Key

API secret

Access token

Access token secret

Enter each key inside the quotation marks.
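
In the script, each value is pasted between the quotation marks and handed to Twython, roughly like this (a sketch; the variable names are illustrative and may differ from the original script):

from twython import Twython

APP_KEY = ''             # API Key
APP_SECRET = ''          # API secret
OAUTH_TOKEN = ''         # Access token
OAUTH_TOKEN_SECRET = ''  # Access token secret

# Create an authenticated Twython client for the search calls.
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)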

Step 3: enter your API keys

• Set up your API keys - 1

First, go to https://dev.twitter.com/ and sign in to your Twitter account. Then go to the My Applications page to create an application.

Step 3: enter your API keys

• Set up your API keys - 2

Enter any name that makes sense to you.

Enter any text that makes sense to you.

You can enter any legitimate URL; here, I put in the URL of my institution.

Same as above: you can enter any legitimate URL; here, I put in the URL of my institution.

Step 4: change the parameter

result_type is a parameter defined in the Twitter API documentation. Here we set it to recent; we could also set it to mixed or popular.

Step 4: change the parameter

Here is a list of parameters you can tweak or add:

https://dev.twitter.com/docs/api/1.1/get/search/tweets

For example, if you want to limit the search to Chinese tweets, you can add lang = ‘zh’.

Step 4: change the parameter

As another example, if you want to limit the search to tweets sent up until April 1, 2014, you can add until = ‘2014-04-01’.
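
Putting the Step 4 parameters together, a Twython search call would look roughly like this (a sketch, not the original script; the query string and count are illustrative, and the client `twitter` is the one set up in Step 3):

# Search for the keywords with the parameters discussed above.
results = twitter.search(
    q='MH370 OR Malaysian',   # the keywords from Step 2
    result_type='recent',     # could also be 'mixed' or 'popular'
    lang='zh',                # optional: limit the search to Chinese tweets
    until='2014-04-01',       # optional: only tweets sent before this date
    count=100,                # tweets per request (the API maximum)
)
for tweet in results['statuses']:
    print(tweet['text'])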

Step 5: set up SQLite database

• When you type in just a file name, the database will be saved in the same folder as the Python script. You can also use a full file path such as sqlite:///C:/xxxx/xxx/MH370.sqlite.
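
A minimal sketch of that setup, assuming the script builds the database with SQLAlchemy (which the sqlite:/// connection string suggests):

from sqlalchemy import create_engine

# Just a file name: the database is created next to the Python script.
engine = create_engine('sqlite:///MH370.sqlite')

# Or point to a full file path, as in the slide:
# engine = create_engine('sqlite:///C:/xxxx/xxx/MH370.sqlite')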

Hit RUN!

Are we getting all the tweets?

If you run the script daily or twice a day, you should be able to cover all tweets generated on that day, as well as tweets a few days old.

But historical tweets are EXPENSIVE! Tweets older than a week can be purchased through http://gnip.com/