Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Why Twitter Is All The Rage: A Data Miner's Perspective

Matthew A. Russell - @ptwobrussell - http://MiningTheSocialWeb.com

PyTN - 23 February 2014

description

Sunday 9:55 a.m.–10:45 a.m.

Why Twitter Is All the Rage: A Data Miner's Perspective

Presenter: Matthew Russell

Audience level: Novice

Description: In order to be successful, technology must amplify a meaningful aspect of our human experience, and Twitter’s success has largely depended on its ability to do this quite well. Although you could describe Twitter as just a “free, high-speed, global text-messaging service,” that would be to miss the much larger point that Twitter scratches some of the most fundamental itches of our humanity.

Abstract: This talk explains why Twitter is "all the rage" by examining Twitter in light of fundamental questions about our humanity:

* We want to be heard
* We want to satisfy our curiosity
* We want it easy
* We want it now

This session examines Twitter's ability to address these questions and presents its underlying conceptual architecture as an interest graph. Even if you have minimal programming skills, you'll come away empowered with the ability to think about data mining on Twitter in more effective ways and apply a powerful collection of easily adaptable recipes to fully exploit the 5 kilobytes of metadata that decorates those 140 characters that you commonly think of as a tweet. Learn how to access Twitter's API, search for tweets, discover trending topics, process tweets in real-time from the firehose, and much more.

Transcript of Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Page 1: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Why Twitter Is All The Rage: A Data Miner's Perspective

Matthew A. Russell - @ptwobrussell - http://MiningTheSocialWeb.com

PyTN - 23 February 2014

Page 2: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Overview

Intro

Twitter as a Platform for Data Science

Applications of Firehose Analysis (#Syria circa last)

Understanding the Amazon Prime Air Reaction (IPython Notebook Walk Through)

Q&A

Page 3: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Intro

Page 4: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Hello, My Name Is ... Matthew

Background in Computer Science

Data mining & machine learning

CTO @ Digital Reasoning Systems

Data mining; machine learning

Author @ O'Reilly Media

5 published books on technology

Principal @ Zaffra

Selective boutique consulting

Page 5: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Transforming Curiosity Into Insight

An open source software (OSS) project

http://bit.ly/MiningTheSocialWeb2E

A book

http://bit.ly/135dHfs

Accessible to (virtually) everyone

Virtual machine with turn-key coding templates for data science experiments

Think of the book as "premium" support for the OSS project

Page 6: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Mining the Social Web ToC

Chapter 1 - Mining Twitter

Chapter 2 - Mining Facebook

Chapter 3 - Mining LinkedIn

Chapter 4 - Mining Google+

Chapter 5 - Mining Web Pages

Chapter 6 - Mining Mailboxes

Chapter 7 - Mining GitHub

Chapter 8 - Mining the Semantically Marked-Up Web

Chapter 9 - Twitter Cookbook

Page 7: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Anatomy of Each Chapter

Brief Intro

Objectives

API Primer

Analysis Technique(s)

Data Visualization

Recap

Suggested Exercises

Recommended Resources

Page 8: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Opportunities for Data Alchemy

A model for the world: signal and sinks

Growth in data exhaust is accelerating

Digital fingerprints of the "real world" are accumulating

Lots of opportunities for motivated Python hackers

"Software is eating the world"

Page 9: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Social Media Is All the Rage

World population: 7B people

Facebook: 1B+ users

Twitter: 650M users

Google+: 500M users

LinkedIn: 260M users

250M+ blogs (conservatively?)

Page 10: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

But what does it all mean, Basil?

It's a platform for data science and the frontier for predictive analytics

Understanding world events

Swaying political elections

Modeling human behavior

Analyzing sentiment

Making intelligent recommendations

Page 11: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Twitter & Data Science

Page 12: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Data Science

Data => Actionable information

Highly interdisciplinary

Nascent

Necessary

http://wikipedia.org/wiki/Data_science

Page 13: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Another View of Data Science

Page 14: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Page 15: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Twitter Is All the Rage

It satisfies fundamental human desires

We want to be heard

We want to satisfy our curiosity

We want it easy

We want it now

Accessible, rich, and (mostly) "open" data

RESTful APIs and JSON responses

Great proving ground for predictive analytics about the real world

Page 16: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Twitter's Network Dynamics

~650M curious users

A collective consciousness

Real-time communication

Short, sweet, ... and fast

Asymmetric Following Model

An interest graph
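
One way to picture the asymmetric following model is as a directed graph. The sketch below is only an illustration: the names are borrowed from the example graphs later in the deck, the follow edges themselves are invented, and the helper functions are hypothetical.

```python
# Made-up "follows" edges as (follower, followed) pairs. Direction matters.
follows = {
    ("Roberto", "Mercedes"),
    ("Mercedes", "Roberto"),
    ("Jorge", "Roberto"),
    ("Ana", "U2"),
    ("Nina", "U2"),
    ("Nina", "Juan Luis Guerra"),
    ("Jorge", "Juan Luis Guerra"),
}


def mutual(edges):
    """Pairs that follow each other, like connections on a symmetric network."""
    return {frozenset((a, b)) for a, b in edges if (b, a) in edges}


def one_way(edges):
    """Follows that are not reciprocated; these are the 'interest' edges."""
    return {(a, b) for a, b in edges if (b, a) not in edges}


def shared_interests(edges, x, y):
    """Accounts that both x and y follow."""
    follows_x = {b for a, b in edges if a == x}
    follows_y = {b for a, b in edges if a == y}
    return follows_x & follows_y


print(mutual(follows))
print(shared_interests(follows, "Ana", "Nina"))
```

Because a follow does not need to be accepted or returned, the one-way edges can point at anything with an account, which is what makes the structure an interest graph rather than only a social graph.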

Page 17: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Twitter Primitives

Account Types: "Anything"

"Following" Relationships

Favorites

Retweets

Replies

(Almost) No Privacy Controls

Page 18: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Twitter and Facebook Compared

Twitter

Account Types: "Anything"

"Following" Relationships

Favorites

Retweets

Replies

(Almost) No Privacy Controls

Facebook

Account Types: People & Pages

Mutual Connections

"Likes"

"Shares"

"Comments"

Extensive Privacy Controls

Page 19: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

What's in a Tweet?

140 Characters ...

... Plus ~5KB of metadata!

Authorship

Time & location

Tweet "entities"

Replying, retweeting, favoriting, etc.

Page 20: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

What are Tweet Entities?

Essentially, the "easy to get at" data in the 140 characters

@usermentions

#hashtags

URLs

multiple variations

(financial) symbols

stock tickers

media
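
As a rough sketch of what "easy to get at" means, the snippet below pulls each entity type out of a tweet held as a Python dict. The `entities` layout follows Twitter's v1.1 JSON; the sample tweet and the helper name `extract_entities` are made up for illustration.

```python
def extract_entities(tweet):
    """Return the entity values of one tweet as plain lists of strings."""
    entities = tweet.get("entities", {})
    return {
        "user_mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "symbols": [s["text"] for s in entities.get("symbols", [])],
        # "media" is only present when the tweet carries a photo or similar.
        "media": [m["media_url"] for m in entities.get("media", [])],
    }


# Hypothetical tweet, trimmed to the fields used above.
sample_tweet = {
    "text": "@SocialWebMining $AMZN drones? #PrimeAir http://t.co/abc",
    "entities": {
        "user_mentions": [{"screen_name": "SocialWebMining"}],
        "hashtags": [{"text": "PrimeAir"}],
        "urls": [{"url": "http://t.co/abc",
                  "expanded_url": "http://example.com/prime-air"}],
        "symbols": [{"text": "AMZN"}],
    },
}

entities = extract_entities(sample_tweet)
print(entities)
```

Since the entities arrive already parsed, there is no need to run regular expressions over the tweet text to find mentions, hashtags, or links.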

Page 21: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

API Requests

RESTful requests

Everything is a "resource"

You GET, PUT, POST, and DELETE resources

Standard HTTP "verbs"

Example: GET https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=SocialWebMining

Streaming API filters

JSON responses

Cursors (not quite pagination)
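
To make "cursors (not quite pagination)" concrete: in the v1.1 API a cursored response carries a `next_cursor` field, the first request uses a cursor of -1, and a `next_cursor` of 0 marks the end. The sketch below walks such a resource. The `fetch_page` callable and the canned pages are hypothetical stand-ins for real, authenticated HTTP requests.

```python
def iterate_cursor(fetch_page, items_key="ids"):
    """Yield every item from a cursored resource, one page at a time.

    fetch_page is a hypothetical callable that takes a cursor value and
    returns the decoded JSON response as a dict.
    """
    cursor = -1  # -1 asks for the first page
    while cursor != 0:  # a next_cursor of 0 means there are no more pages
        page = fetch_page(cursor)
        for item in page[items_key]:
            yield item
        cursor = page["next_cursor"]


# Stand-in for real HTTP calls: three canned pages keyed by cursor.
FAKE_PAGES = {
    -1: {"ids": [1, 2, 3], "next_cursor": 1001},
    1001: {"ids": [4, 5], "next_cursor": 1002},
    1002: {"ids": [6], "next_cursor": 0},
}

follower_ids = list(iterate_cursor(FAKE_PAGES.__getitem__))
print(follower_ids)
```

Unlike page numbers, a cursor is an opaque token handed back by the server, so a client cannot jump straight to an arbitrary page; it can only follow the chain.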

Page 22: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Data Mining: Low Hanging Fruit

"Know thy data..."

Start with simple stats:

Count

Compare

Filter

Rank

Then, apply more complex analyses
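
The four simple stats above can be sketched with nothing more than the standard library. The sample data is invented; each entry stands in for a tweet's hashtags and retweet count.

```python
from collections import Counter

# Made-up sample: (hashtags, retweet_count) pairs standing in for tweets.
tweets = [
    (["Syria", "news"], 12),
    (["Syria"], 0),
    (["PrimeAir", "drones"], 40),
    (["Syria", "UN"], 3),
    (["PrimeAir"], 7),
]

# Count: how often each hashtag appears.
hashtag_counts = Counter(tag for tags, _ in tweets for tag in tags)

# Compare: how many more times one hashtag appears than another.
syria_lead = hashtag_counts["Syria"] - hashtag_counts["PrimeAir"]

# Filter: keep only tweets that were retweeted at least once.
retweeted = [t for t in tweets if t[1] > 0]

# Rank: the most frequent hashtags first.
top_hashtags = hashtag_counts.most_common(2)

print(top_hashtags, syria_lead, len(retweeted))
```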

Page 23: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

A Starting Point: Histograms

A chart that is handy for frequency analysis

They look like bar charts...except they're not bar charts

Each value on the x-axis is a range (or "bin") of values

Not categorical data

Each value on the y-axis is the combined frequency of values in each range
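
The binning idea can be sketched as follows. This is a minimal illustration that assumes non-negative integer values and fixed-width bins; the retweet counts are made up and the `histogram` helper is hypothetical.

```python
from collections import Counter


def histogram(values, bin_width):
    """Map each (low, high) bin to how many values fall in it.

    Bins are half-open (low <= value < high) and empty bins are kept,
    so gaps in the data stay visible.
    """
    counts = Counter(v // bin_width for v in values)
    last_bin = max(counts)
    return {
        (i * bin_width, (i + 1) * bin_width): counts[i]
        for i in range(last_bin + 1)
    }


# Made-up retweet counts for ten tweets.
retweet_counts = [0, 1, 3, 4, 9, 10, 12, 25, 27, 48]

bins = histogram(retweet_counts, bin_width=10)

# Text rendering: one row per bin, one '#' per tweet in that bin.
for (low, high), n in bins.items():
    print(f"{low:>3}-{high - 1:<3} {'#' * n}")
```

The x-axis here is a sequence of numeric ranges rather than categories, which is the difference from a bar chart that the slide points out.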

Page 24: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Example: Histogram of Retweets

Page 25: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Social Network Mechanics

[Figure: social network graph with nodes Roberto, Mercedes, Jorge, Ana, and Nina]

Page 26: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Interest Graph Mechanics

[Figure: interest graph with nodes Roberto, Mercedes, Jorge, Ana, Nina, U2, and Juan Luis Guerra]

Page 27: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

A (Social) Interest Graph

[Figure: interest graph with nodes Roberto, Mercedes, Jorge, Ana, Nina, U2, and Juan Luis Guerra]

Page 28: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

A (Political) Interest Graph

[Figure: interest graph with nodes Roberto, Mercedes, Jorge, Ana, Nina, Johnny Araya, and Rodolfo Hernández]

Page 29: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Measuring Influence Is Trickier Than It Looks

Spam bot accounts that effectively are zombies and can’t be harnessed for any utility at all

Inactive or abandoned accounts that can’t influence or be influenced since they are not in use

Accounts that follow so many other accounts that the likelihood of getting noticed (and thus influencing) is practically zero

The network effects of retweets by accounts that are active and can be influenced to spread a message

See also http://wp.me/p3QiJd-2a
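
A crude first pass at two of these caveats is to discard followers who have never tweeted or who follow too many accounts to notice any one of them. The sketch below assumes follower records shaped like v1.1 user objects (`friends_count` is the number of accounts followed, `statuses_count` the number of tweets posted). The thresholds, the sample records, and the `reachable_followers` helper are arbitrary placeholders, and the sketch does not attempt spam-bot detection or retweet network effects.

```python
def reachable_followers(followers, max_following=2000, min_statuses=1):
    """Keep followers that plausibly see, and could spread, a message.

    The default thresholds are arbitrary placeholders, not recommended values.
    """
    return [
        f for f in followers
        if f["statuses_count"] >= min_statuses   # skip never-used accounts
        and f["friends_count"] <= max_following  # skip accounts following too many
    ]


# Made-up follower records, trimmed to the fields used above.
followers = [
    {"screen_name": "active_reader", "friends_count": 150, "statuses_count": 900},
    {"screen_name": "abandoned", "friends_count": 40, "statuses_count": 0},
    {"screen_name": "follows_everyone", "friends_count": 50000, "statuses_count": 300},
]

reachable = [f["screen_name"] for f in reachable_followers(followers)]
print(reachable)
```

Even this rough filter shows why a raw follower count overstates influence: in the invented sample only one of three followers survives.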

Page 30: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Justin Bieber vs Tea Party

Page 31: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Realtime Analysis: #Syria

Monitor Twitter's firehose for realtime data using filters such as #Syria

Keep in mind the sheer volume of data can be considerable

Fuller analysis at http://wp.me/p3QiJd-1I
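
The filtering step can be sketched without a network connection by treating the stream as any iterable of tweet dicts. This is only an offline illustration: the `track` helper and the fake tweets are made up, and a real client would wrap a long-lived, authenticated streaming connection in place of `fake_stream`.

```python
def track(stream, keyword):
    """Yield tweets from an iterable whose text mentions keyword.

    stream is any iterable of tweet dicts. Matching is a simple
    case-insensitive substring test, which is an assumption of this sketch.
    """
    needle = keyword.lower()
    for tweet in stream:
        if needle in tweet.get("text", "").lower():
            yield tweet


# Made-up tweets standing in for the live stream.
fake_stream = iter([
    {"text": "Latest updates on #Syria talks"},
    {"text": "Lunch was great"},
    {"text": "Watching the news about #syria"},
])

matches = [t["text"] for t in track(fake_stream, "#Syria")]
print(matches)
```

Writing the filter as a generator means tweets are handled one at a time, so memory use stays flat however much data arrives.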

Page 32: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

#Syria: Who?

See http://wp.me/p3QiJd-1I

Page 33: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

#Syria: Who?

See http://wp.me/p3QiJd-1I

Page 34: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

#Syria: Who?

See http://wp.me/p3QiJd-1I

Page 35: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

#Syria: What?

See http://wp.me/p3QiJd-1I

Page 36: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

#Syria: What?

See http://wp.me/p3QiJd-1I

Page 37: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

#Syria: Where?

See http://wp.me/p3QiJd-1I

Page 38: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

#Syria: When?

See http://wp.me/p3QiJd-1I

Page 39: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

#Syria: Why?

That's for you (as the data scientist) to decide

Quantitative automation can amplify human intelligence

Qualitative analysis still requires human intelligence

Page 40: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Twitter Firehose Analysis with pandas

Page 41: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

MTSW Virtual Machine Experience

Goal: Make it easy to transform curiosity into insight

Vagrant-based virtual machine

VirtualBox or AWS

IPython Notebook User Experience

Point-and-click GUI

100+ turn-key examples and templates

Social web mining for the masses

Page 42: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Social Media Analysis Framework

A memorable four-step process to guide data science experiments:

Aspire

Acquire

Analyze

Summarize

Page 43: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Goals

To understand how to capture data from Twitter's firehose

To understand basic pandas usage for tweets

To work through a data science experiment with a systematic 4-step process

To better understand the emotional reaction to the Amazon Prime Air announcement

To introduce some tools for data science

Page 44: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Useful Links

Website

http://MiningTheSocialWeb.com

Twitter Data Mining Round Up

http://wp.me/p3QiJd-5H

All Source Code in IPython Notebook format (GitHub)

http://bit.ly/MiningTheSocialWeb2E

Page 45: Why Twitter Is All The Rage: A Data Miner's Perspective (PyTN 2014)

Q&A
