Twitter6

Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet #2 [email protected]


Transcript of Twitter6

Page 1: Twitter6

Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet #2

[email protected]

Page 2: Twitter6

Data Mining

Page 3: Twitter6

Discover New Knowledge

Page 4: Twitter6

Discover New Knowledge from Existing Information

Page 5: Twitter6

What do #TeaParty and #JustinBieber have in common?

Page 6: Twitter6

Tools: Pymongo, MongoDB

apt-get install python-dev
pip install pymongo

Page 7: Twitter6

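# Note: pymongo.connection.Connection is the legacy client class (removed in pymongo 3.0).
from pymongo.connection import Connection
import tweepy

# Connect to the local MongoDB server and select the "foo" database.
connection = Connection("localhost")
db = connection.foo

# Search for up to 100 tweets tagged #JustinBieber.
api = tweepy.API()
tweets = api.search('#JustinBieber', rpp=100)

# Store each tweet in the "foo" collection.
for tweet in tweets:
    db.foo.save(tweet.__getstate__())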

Get Tweets

Page 8: Twitter6

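# Note: pymongo.connection.Connection is the legacy client class (removed in pymongo 3.0).
from pymongo.connection import Connection
import tweepy

# Connect to the local MongoDB server and select the "foo" database.
connection = Connection("localhost")
db = connection.foo

api = tweepy.API()

# Walk result pages 1 through 15, 100 tweets per page.
for num in range(1, 16):
    tweets = api.search('#JustinBieber', rpp=100, page=num)
    # Store each tweet in the "foo" collection.
    for tweet in tweets:
        db.foo.save(tweet.__getstate__())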

Insert into MongoDB

Page 9: Twitter6

map = function () {
    var words = this.text.split(' ');
    for (var i in words) {
        emit({ key: words[i] }, { count: 1 });
    }
};

Count Frequency in mongo

MAP

Page 10: Twitter6

reduce = function (key, values) {
    var count = 0;
    values.forEach(function (v) {
        count += v.count;
    });
    return { count: count };
};

Count Frequency in mongo

REDUCE

Page 11: Twitter6

res = db.foo.mapReduce( map, reduce, {out: "mystring"});

Count Frequency in mongo

EXECUTE
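
The map and reduce steps above can be mirrored in plain Python to see what the job computes. This is only a sketch for illustration, not the MongoDB implementation: the function names and the sample tweets are made up, and it assumes each tweet is a dict with a `text` field, as in the map function.

```python
from collections import defaultdict


def map_tweet(tweet):
    """Mirror the map step: emit (word, 1) for every space-separated token."""
    for word in tweet["text"].split(" "):
        yield word, 1


def reduce_counts(values):
    """Mirror the reduce step: sum the partial counts collected for one key."""
    return sum(values)


def word_frequencies(tweets):
    """Group the emitted pairs by word, then reduce each group to a total."""
    grouped = defaultdict(list)
    for tweet in tweets:
        for word, count in map_tweet(tweet):
            grouped[word].append(count)
    return {word: reduce_counts(counts) for word, counts in grouped.items()}


# Hypothetical sample data, not taken from the real collection.
sample_tweets = [
    {"text": "#JustinBieber is trending"},
    {"text": "I love #JustinBieber"},
]
frequencies = word_frequencies(sample_tweets)
```

Here `frequencies` maps each distinct token to the number of times it appears across the sample tweets.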

Page 12: Twitter6

{ "_id" : { "key" : "#1000ADay" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#1000aday" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#500ADay" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#500aday" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#AutoFollow" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#Bieber" }, "value" : { "count" : 1 } }

Count Frequency in mongo

RESULT
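
The result counts #1000ADay and #1000aday as separate keys because the tokens are compared case-sensitively. One possible remedy, sketched here under the assumption that case and trailing punctuation should not matter, is to normalize each token before counting. The helper names and sample texts are hypothetical.

```python
from collections import Counter


def normalize(token):
    """Lowercase a token and strip surrounding punctuation (a leading # or @ is kept)."""
    return token.lower().strip(".,!?:;\"'()")


def normalized_counts(texts):
    """Count tokens after normalization so case variants share one key."""
    counts = Counter()
    for text in texts:
        for token in text.split():
            key = normalize(token)
            if key:  # skip tokens that were punctuation only
                counts[key] += 1
    return counts


# Hypothetical sample texts using two of the hashtags from the result above.
counts = normalized_counts(["Follow #1000ADay now!", "join #1000aday"])
```

With this normalization both spellings of the hashtag contribute to the same key, `#1000aday`.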

Page 13: Twitter6

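# Note: pymongo.connection.Connection is the legacy client class (removed in pymongo 3.0).
from pymongo.connection import Connection

connection = Connection("localhost")
db = connection.foo

# Print every word-count document written by the mapReduce job.
cursor = db.mystring.find()
for d in cursor:
    print(d)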

Get From MongoDB
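
Once the documents are read back, the most frequent terms are usually the interesting ones. The sketch below ranks documents shaped like the mapReduce output (`_id.key`, `value.count`) by count. `top_terms` is a hypothetical helper, and the counts in the sample are invented for illustration.

```python
def top_terms(documents, limit=10):
    """Return (term, count) pairs for the highest counts, largest first."""
    ranked = sorted(documents, key=lambda doc: doc["value"]["count"], reverse=True)
    return [(doc["_id"]["key"], doc["value"]["count"]) for doc in ranked[:limit]]


# Sample documents in the shape of the mapReduce output; counts are made up.
sample_documents = [
    {"_id": {"key": "#Bieber"}, "value": {"count": 1}},
    {"_id": {"key": "#JustinBieber"}, "value": {"count": 3}},
    {"_id": {"key": "RT"}, "value": {"count": 2}},
]
top = top_terms(sample_documents, limit=2)
```

In practice the list passed in would be built from the cursor, for example `list(db.mystring.find())`.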

Page 14: Twitter6

What Entities Co-Occur Most Often with #JustinBieber and #TeaParty Tweets?
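
One rough way to approach this question is to pull hashtags and user mentions out of each tweet's text and count those that appear alongside the target hashtag. The regular expression, the function name, and the sample texts below are assumptions for illustration; Twitter's own entity extraction is more thorough.

```python
import re
from collections import Counter

# Simplified pattern: a # or @ followed by word characters.
ENTITY_PATTERN = re.compile(r"[#@]\w+")


def cooccurring_entities(texts, target):
    """Count entities that appear in the same tweet as the target hashtag."""
    target = target.lower()
    counts = Counter()
    for text in texts:
        entities = {entity.lower() for entity in ENTITY_PATTERN.findall(text)}
        if target in entities:
            counts.update(entities - {target})
    return counts


# Hypothetical sample texts.
sample_texts = [
    "RT @fan1: I love #JustinBieber #Bieber",
    "#JustinBieber concert tonight #bieber",
    "#TeaParty rally today",
]
cooccur = cooccurring_entities(sample_texts, "#JustinBieber")
```

Calling `cooccur.most_common(10)` would then list the ten entities seen most often with the target.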

Page 15: Twitter6

import sys


def first_tokens(path):
    """Return the set of first whitespace-separated tokens, one per non-empty line."""
    tokens = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                tokens.add(fields[0])
    return tokens


if __name__ == '__main__':
    s1 = first_tokens(sys.argv[1])
    s2 = first_tokens(sys.argv[2])
    s3 = s1.intersection(s2)
    # Sizes of each set and of their overlap.
    print(len(s1))
    print(len(s2))
    print(len(s3))

intersection

Page 16: Twitter6

On Average, Do #JustinBieber or #TeaParty Tweets Have More Hashtags?
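
A minimal Python 3 sketch of this comparison, assuming the tweet texts for each hashtag are already available as lists of strings. The hashtag pattern is a simplification, and the sample texts are invented.

```python
import re

# Simplified pattern: a # followed by word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")


def average_hashtags(texts):
    """Return the mean number of hashtags per tweet (0.0 for an empty list)."""
    if not texts:
        return 0.0
    total = sum(len(HASHTAG_PATTERN.findall(text)) for text in texts)
    return total / len(texts)


# Hypothetical samples, one list per hashtag.
bieber_average = average_hashtags(["I love #JustinBieber #Bieber", "#JustinBieber"])
teaparty_average = average_hashtags(["#TeaParty rally"])
```

Comparing the two averages over the full collections would answer the question on the slide.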

Page 17: Twitter6

Which Get Retweeted More Often: #JustinBieber or #TeaParty?
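
One heuristic for this question, sketched in Python 3, is to measure what fraction of the sampled tweets are themselves retweets, detected by the "RT @user" convention in the text. This is an assumption, not a measure of how often each original tweet was retweeted, and the function name and samples are hypothetical.

```python
import re

# "RT @user" at the start of the text or after whitespace.
RETWEET_PATTERN = re.compile(r"(^|\s)RT @\w+")


def retweet_fraction(texts):
    """Return the share of tweets that contain an 'RT @user' marker."""
    if not texts:
        return 0.0
    retweets = sum(1 for text in texts if RETWEET_PATTERN.search(text))
    return retweets / len(texts)


# Hypothetical sample texts.
sample = [
    "RT @fan1: love #JustinBieber",
    "new single #JustinBieber",
    "so cool RT @fan2 #JustinBieber",
    "#JustinBieber",
]
fraction = retweet_fraction(sample)
```

Running the same function over the #TeaParty tweets gives a second fraction to compare against.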

Page 18: Twitter6

How Much Overlap Exists Between the Entities of #TeaParty and #JustinBieber Tweets?
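
The overlap can be summarized as a single number with the Jaccard index: the size of the intersection divided by the size of the union. The Python 3 sketch below is one possible measure, not one named in the slides, and the sample entity sets are made up.

```python
def jaccard_overlap(first, second):
    """Return |A ∩ B| / |A ∪ B| for two collections of entities (0.0 if both are empty)."""
    first, second = set(first), set(second)
    union = first | second
    if not union:
        return 0.0
    return len(first & second) / len(union)


# Hypothetical entity sets for the two hashtags.
bieber_entities = {"#justinbieber", "#bieber", "rt"}
teaparty_entities = {"#teaparty", "#tcot", "rt"}
overlap = jaccard_overlap(bieber_entities, teaparty_entities)
```

A value near 0 means the two sets share almost nothing; a value of 1 means they are identical.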

Page 19: Twitter6

Thank You!