
Page 1: TWITTER MESSAGE DATA TRANSFORMATION USING R-TOOL …cis.csuohio.edu/~sschung/cis612/cis612projectppt_Helly.pdf · Storing the twitter Data on mongodb Now after getting the twitter

TWITTER MESSAGE DATA TRANSFORMATION USING R-TOOL AND MONGODB

PRESENTED BY:

HELLY PATEL
KUSH PATEL

Twitter

Data: real-time tweets from Twitter

Tool used: R

NoSQL system: MongoDB

Creating the Developer Account

• The first step is to create a Twitter developer account.

Get the API key and Access Tokens

Access tokens

Twitter Authentication Workflow

Tool installation for getting the data

• Next, install RStudio and add the library packages needed to retrieve the data.

Installing the required packages

• To retrieve the Twitter data we need the 'streamR', 'ROAuth' and 'twitteR' packages.

Installing Packages (Cont.)

Installing Packages (Cont.)

• Installing the twitteR package automatically installs all of the dependent packages it requires.

Commands used for installing packages

The following commands install the packages in RStudio (R is case-sensitive, so the function name is all lowercase):

• install.packages("streamR")

• install.packages("ROAuth")

• install.packages("twitteR")

Commands to check whether the libraries are installed (each call loads the package and raises an error if it is missing):

• library(streamR)

• library(ROAuth)

• library(twitteR)

Code for handshaking in R

• Run the handshake code in R. It requires the consumer key and consumer secret, which we obtained when creating the Twitter developer account.

Authorizing the user

• After the code runs, the authorization URL opens automatically and the user is authorized by entering the PIN.

Capturing Tweets

• Now we need to enter the PIN here.

Capturing Tweets (Cont.)

• Next, we set the timeout and the number of tweets to capture in the filterStream command.

Twitter Data

Twitter returns the data in JSON format. It is log-structured data that looks as follows:

Twitter Data (Cont.)

• The Twitter data, stored in a JSON file, looks as follows:
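As a rough illustration of that file layout, the sketch below parses a capture that holds one JSON tweet object per line. This layout is an assumption about how the stream was written to disk. The two sample lines and their values are hypothetical, and `parseTweets` is an illustrative helper name, not part of any package used in the slides.

```javascript
// Hypothetical two-line sample standing in for the captured file.
// Field names follow the Twitter tweet object; the values are made up.
const rawFile = [
  '{"id_str":"1","text":"hello world","lang":"en","user":{"followers_count":10,"friends_count":5}}',
  '{"id_str":"2","text":"hola","lang":"es","user":{"followers_count":3,"friends_count":7}}',
  ''
].join('\n');

// Parse one JSON document per non-empty line (illustrative helper).
function parseTweets(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

const tweets = parseTweets(rawFile);
console.log(tweets.length, tweets.map((t) => t.lang));
```

Each parsed object is an ordinary nested document, which is why it can be loaded into a document store without a schema.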

Fields in twitter data

Storing the Twitter data in MongoDB

After capturing the Twitter data in JSON format from RStudio, we import it into MongoDB, the NoSQL system used here, with the following steps:

First, connect to MongoDB:

mongo

Switch to the database:

use <database name>

Create the collection:

db.createCollection("kushtwitter")

Command for storing the JSON file in MongoDB

• Command used (run from the operating system shell, not from inside the mongo shell):

mongoimport --db helludb --collection kushtwitter --file /home/hduser/Downloads/hellutweets_test.json

Twitter data in MongoDB

Data mining on the Twitter data

• Query to find the top five hashtags in the data:

> db.kushtwitter.aggregate([
    {$unwind: '$entities.hashtags'},
    {$group: {_id: '$entities.hashtags.text', tagCount: {$sum: 1}}},
    {$sort: {tagCount: -1}},
    {$limit: 5}
  ]);
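To make the four pipeline stages easier to follow, here is a plain JavaScript sketch that computes the same kind of result over a small in-memory array. It does not talk to MongoDB. The sample documents and the `topHashtags` helper are hypothetical and only mirror the stage semantics.

```javascript
// Hypothetical sample documents; only the fields the pipeline touches.
const tweets = [
  { entities: { hashtags: [{ text: 'rstats' }, { text: 'mongodb' }] } },
  { entities: { hashtags: [{ text: 'rstats' }] } },
  { entities: { hashtags: [] } },
  { entities: { hashtags: [{ text: 'bigdata' }, { text: 'rstats' }] } }
];

function topHashtags(docs, limit) {
  const counts = new Map();
  for (const doc of docs) {
    // $unwind: visit each hashtag separately; tweets with an empty
    // hashtags array contribute nothing.
    for (const tag of doc.entities.hashtags) {
      // $group with {$sum: 1}: count one per occurrence of the tag text.
      counts.set(tag.text, (counts.get(tag.text) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([_id, tagCount]) => ({ _id, tagCount }))
    .sort((a, b) => b.tagCount - a.tagCount) // $sort: {tagCount: -1}
    .slice(0, limit);                        // $limit
}

const top = topHashtags(tweets, 2);
console.log(top);
```

Grouping is by exact tag text, so tags that differ only in letter case are counted separately, both in this sketch and in the query.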

Top five Hashtags:

The 'lang' field: languages present in the Twitter data

> db.kushtwitter.aggregate([{$group: {_id: '$lang', count: {$sum: 1}}}]);

Arranging the data by creation time

To order the tweets by the time they were created (newest first), the command is:

> db.kushtwitter.find().sort({"created_at": -1});
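One caveat is worth sketching. If created_at was imported as the plain string Twitter emits (for example "Wed Oct 10 20:19:24 +0000 2018"), a sort on that field compares text, so the weekday name dominates the order. The plain JavaScript below contrasts the two orderings on hypothetical sample records. `parseCreatedAt` is an illustrative helper that assumes this string format with a +0000 offset.

```javascript
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Parse "Wed Oct 10 20:19:24 +0000 2018" into a Date.
// Assumption: the offset is always +0000; other offsets are not handled.
function parseCreatedAt(s) {
  const [, mon, day, time, , year] = s.split(' ');
  const [h, m, sec] = time.split(':').map(Number);
  return new Date(Date.UTC(Number(year), MONTHS.indexOf(mon), Number(day), h, m, sec));
}

// Hypothetical sample records.
const tweets = [
  { id_str: '1', created_at: 'Wed Oct 10 20:19:24 +0000 2018' },
  { id_str: '2', created_at: 'Mon Oct 15 08:00:00 +0000 2018' },
  { id_str: '3', created_at: 'Fri Oct 12 23:59:59 +0000 2018' }
];

// Chronological order, newest first.
const newestFirst = [...tweets].sort(
  (a, b) => parseCreatedAt(b.created_at) - parseCreatedAt(a.created_at)
);

// What a descending sort on the raw strings yields instead.
const byString = [...tweets].sort((a, b) => (a.created_at < b.created_at ? 1 : -1));

console.log(newestFirst.map((t) => t.id_str), byString.map((t) => t.id_str));
```

If the two orders differ on real data, converting created_at to a date value before or during import is one way to make the sort chronological.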

Tweets for which retweeted_status exists

db.kushtwitter.findOne({"retweeted_status": {$exists: true}})
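The behaviour of an existence test can be sketched in plain JavaScript on hypothetical in-memory records. An existence check matches any document that carries the field, whatever its value, and findOne returns only the first match.

```javascript
// Hypothetical sample records; only id_str, text and retweeted_status matter here.
const tweets = [
  { id_str: '1', text: 'original tweet' },
  { id_str: '2', text: 'RT @a: original tweet', retweeted_status: { id_str: '1' } },
  { id_str: '3', text: 'edge case', retweeted_status: null }
];

// Field is present, regardless of its value (a null value still counts).
const hasField = tweets.filter((t) =>
  Object.prototype.hasOwnProperty.call(t, 'retweeted_status')
);

// findOne semantics: the first matching document, or null if there is none.
const firstMatch = hasField.length > 0 ? hasField[0] : null;

console.log(hasField.map((t) => t.id_str), firstMatch.id_str);
```

To list every retweet rather than a single example, find would be used in place of findOne.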

Tweets containing the word 'hello'

tweets = db.kushtwitter.findOne({"text": {$regex: "hello"}})
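A regular-expression match on 'hello' is case-sensitive and also matches inside longer words. The plain JavaScript sketch below shows the difference on hypothetical sample texts. The case-insensitive and whole-word variants are suggestions, not something the slides use.

```javascript
// Hypothetical sample texts.
const tweets = [
  { text: 'hello world' },
  { text: 'Hello again' },
  { text: 'say othello' },
  { text: 'goodbye' }
];

// Same pattern as the query: case-sensitive substring match.
const caseSensitive = tweets.filter((t) => /hello/.test(t.text));

// Case-insensitive variant (in a MongoDB query this corresponds to $options: 'i').
const caseInsensitive = tweets.filter((t) => /hello/i.test(t.text));

// Whole-word, case-insensitive variant.
const wholeWord = tweets.filter((t) => /\bhello\b/i.test(t.text));

console.log(caseSensitive.length, caseInsensitive.length, wholeWord.length);
```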

Friends count vs Followers Count

Plotting Graph

Displaying the first ten characters of each tweet

substring(tweet_df$text, 1, 10)
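Note that substring(tweet_df$text, 1, 10) in R returns the first ten characters of each text, not the first ten words. The plain JavaScript sketch below shows both operations side by side on hypothetical sample strings, in case a word-based preview is what is wanted.

```javascript
// Hypothetical sample texts standing in for the text column.
const texts = [
  'Loading tweets into MongoDB with R is a short pipeline once the keys are set up',
  'hi'
];

// First ten characters (R's 1-based, inclusive range 1..10 is slice(0, 10) here).
const firstTenChars = texts.map((t) => t.slice(0, 10));

// First ten whitespace-separated words.
const firstTenWords = texts.map((t) =>
  t.trim().split(/\s+/).slice(0, 10).join(' ')
);

console.log(firstTenChars, firstTenWords);
```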

Wordcount by column

Wordcount of the tweet_df

Thank You