TWITTER MESSAGE DATA TRANSFORMATION USING R-TOOL...

Post on 06-Jun-2020

TWITTER MESSAGE DATA TRANSFORMATION USING R-TOOL AND MONGODB

PRESENTED BY:

HELLY PATEL, KUSH PATEL

Twitter

Data: Twitter real-time tweets

Tools used: R

NoSQL system: MongoDB

Creating the Developer’s Account

• The first step is to create a Twitter developer account.

Get the API key and Access Tokens

Access tokens

Twitter Authentication WorkFlow

Tool installation for getting Data

• Next, install RStudio and add the library packages needed to collect the data.

Installing the Packages required

• To get the Twitter data we need the ‘streamR’, ‘ROAuth’ and ‘twitteR’ packages.

Installing Packages (Cont.)

• Installing the twitteR package automatically installs all the dependent packages it requires.

Commands used for installing packages

The following commands install the packages in RStudio (R is case-sensitive, so the function name is lowercase):

• install.packages("streamR")

• install.packages("ROAuth")

• install.packages("twitteR")

Commands for checking whether the libraries are installed:

• library(streamR)

• library(ROAuth)

• library(twitteR)

Code for handshaking in R

• Run the handshake code in R. It requires the consumer key and consumer secret that were issued when the Twitter developer account was created.

Authorizing user

• After the code runs, the authorization URL opens automatically and the user is authorized by entering the PIN.

Capturing Tweets

Now we need to enter the PIN here.

Capturing Tweets (Cont.)

• Next, we set the timeout and the number of tweets to collect in the filterStream command.

Twitter Data

Twitter returns the data in JSON format, as log-structured records that look like the following:

Twitter Data (Cont.)

• The Twitter data is stored in a JSON file, as shown:

Fields in twitter data

Storing the Twitter Data in MongoDB

After collecting the Twitter data as JSON from RStudio, we import it into MongoDB, the NoSQL system used here, with the following steps:

First, connect to MongoDB with:

• mongo

Connecting to the database:

• use <database name>

Creating the collection:

db.createCollection("kushtwitter")

Command for storing the JSON file in MongoDB

• Command used:

mongoimport --db helludb --collection kushtwitter --file /home/hduser/Downloads/hellutweets_test.json
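By default mongoimport reads one JSON document per line; a file that holds a single JSON array needs the --jsonArray option instead. A quick check of the file before importing could be sketched as follows, in JavaScript since that is the language of the mongo shell. The helper name checkNdjson and the sample lines are hypothetical, and the sketch only tests that each non-empty line parses as a JSON object.

```javascript
// Hypothetical helper (not part of MongoDB): count the lines that parse
// as JSON objects and report the 1-based numbers of the lines that do not.
function checkNdjson(text) {
  const badLines = [];
  let documents = 0;
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines
    try {
      const doc = JSON.parse(line);
      if (doc !== null && typeof doc === "object" && !Array.isArray(doc)) {
        documents += 1;
      } else {
        badLines.push(index + 1); // valid JSON, but not a document
      }
    } catch (err) {
      badLines.push(index + 1); // not valid JSON at all
    }
  });
  return { documents, badLines };
}

// Made-up sample: two complete tweets and one truncated line.
const ndjsonSample = [
  '{"text": "hello world", "lang": "en"}',
  '{"text": "bonjour", "lang": "fr"}',
  '{"text": "cut off',
].join("\n");

console.log(checkNdjson(ndjsonSample));
```

A truncated last line is a plausible result of stopping a stream on a timeout, so it may be worth checking before the import.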

Twitter data in Mongodb

Data mining on twitter Data

• Query to find the top five hashtags in the data:

> db.kushtwitter.aggregate([{$unwind: '$entities.hashtags'},{$group: {_id: '$entities.hashtags.text',tagCount: {$sum:1}}}, {$sort: {tagCount: -1}}, {$limit:5}]);
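To make the four pipeline stages easier to follow, here is a minimal in-memory sketch of the same computation in plain JavaScript. It is only an illustration of what $unwind, $group, $sort and $limit do to the documents, not how MongoDB implements them; the function name topHashtags and the sample tweets are made up.

```javascript
// In-memory sketch of the pipeline stages: $unwind, $group, $sort, $limit.
function topHashtags(tweets, limit) {
  const counts = new Map();
  for (const tweet of tweets) {
    // $unwind: one item per hashtag; tweets with no hashtags drop out
    const tags = (tweet.entities && tweet.entities.hashtags) || [];
    for (const tag of tags) {
      // $group on the hashtag text with {$sum: 1}
      counts.set(tag.text, (counts.get(tag.text) || 0) + 1);
    }
  }
  return Array.from(counts, ([_id, tagCount]) => ({ _id, tagCount }))
    .sort((a, b) => b.tagCount - a.tagCount) // $sort: {tagCount: -1}
    .slice(0, limit);                        // $limit
}

// Made-up sample tweets, reduced to the fields the query reads.
const hashtagSample = [
  { entities: { hashtags: [{ text: "rstats" }, { text: "mongodb" }] } },
  { entities: { hashtags: [{ text: "rstats" }] } },
  { entities: { hashtags: [] } },
  { entities: { hashtags: [{ text: "rstats" }, { text: "mongodb" }, { text: "bigdata" }] } },
];

console.log(topHashtags(hashtagSample, 2));
```

The grouping key is the exact hashtag text, so tags that differ only in letter case are counted separately.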

Top five Hashtags:

‘Lang’ field with different languages in twitter data

> db.kushtwitter.aggregate([{$group: {_id: '$lang', count: {$sum: 1}}}]);

Arranging the data in the order it was produced (by time)

To list the tweets sorted on the created_at field in descending order, the command is:

> db.kushtwitter.find().sort({"created_at": -1});
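One caveat: if created_at was imported as the plain string Twitter sends (for example "Wed Jun 03 23:59:59 +0000 2020"), the sort compares strings, so tweets are ordered by weekday name rather than by time. The sketch below contrasts the two orderings in plain JavaScript. The names parseCreatedAt and newestFirst and the sample tweets are hypothetical, and the parser assumes the offset is always +0000.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Parse "Wed Jun 03 23:59:59 +0000 2020" into milliseconds since the epoch.
// Assumption: the offset field is always +0000, so it is ignored.
function parseCreatedAt(value) {
  const [, monthName, day, time, , year] = value.split(" ");
  const [hours, minutes, seconds] = time.split(":").map(Number);
  return Date.UTC(Number(year), MONTHS.indexOf(monthName), Number(day),
                  hours, minutes, seconds);
}

// Chronological order, newest first.
function newestFirst(tweets) {
  return [...tweets].sort(
    (a, b) => parseCreatedAt(b.created_at) - parseCreatedAt(a.created_at));
}

// What a descending sort on the raw strings gives.
function stringDescending(tweets) {
  return [...tweets].sort((a, b) => (a.created_at < b.created_at ? 1 : -1));
}

// Made-up sample tweets.
const datedSample = [
  { id: 1, created_at: "Mon Jun 01 10:00:00 +0000 2020" },
  { id: 2, created_at: "Fri Jun 05 09:30:00 +0000 2020" },
  { id: 3, created_at: "Wed Jun 03 23:59:59 +0000 2020" },
];

console.log(newestFirst(datedSample).map((t) => t.id));      // by time
console.log(stringDescending(datedSample).map((t) => t.id)); // by string
```

One way to get a true chronological sort inside MongoDB is to store a real date value in each document, converted from the string once at import time.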

Tweets that have a retweeted status

db.kushtwitter.findOne({"retweeted_status": {$exists: true}})

Tweets containing the word ‘hello’

tweets = db.kushtwitter.findOne({"text": {$regex: "hello"}})
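A $regex match without options is case-sensitive and matches anywhere in the text, so "Hello" is missed and "othello" is matched; passing $options: "i" makes the match case-insensitive. The following plain JavaScript sketch shows the difference on made-up tweets. The helper matchingTweets is hypothetical, and JavaScript's RegExp stands in for MongoDB's regex engine, which may differ in details.

```javascript
// Hypothetical helper: keep the tweets whose text matches the pattern.
// `options` plays the role of $options (for example "i").
function matchingTweets(tweets, pattern, options) {
  const re = new RegExp(pattern, options || "");
  return tweets.filter((t) => typeof t.text === "string" && re.test(t.text));
}

// Made-up sample tweets.
const textSample = [
  { text: "hello world" },
  { text: "Hello there" },
  { text: "othello is a play" },
  { text: "goodbye" },
];

// Case-sensitive substring match, like {$regex: "hello"}
console.log(matchingTweets(textSample, "hello").map((t) => t.text));
// Case-insensitive, like {$regex: "hello", $options: "i"}
console.log(matchingTweets(textSample, "hello", "i").map((t) => t.text));
// Whole word only, case-insensitive
console.log(matchingTweets(textSample, "\\bhello\\b", "i").map((t) => t.text));
```

Note also that findOne returns only the first matching document, while find returns all of them.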

Friends count vs Followers Count

Plotting Graph

Displaying the first ten characters of each tweet

substring(tweet_df$text, 1, 10)

Wordcount by column

Wordcount of the tweet_df

Thank You