École d'été: Web Science and the Mind :UQAM

Post on 24-Apr-2015

description

Presentation to the Web Science summer school at UQAM, on the rise of the data scientist in the new economy

The opportunity for Social Data Scientists

@cgtheoret

Part 1 The Explosion

Every minute 8-10 months ago:

• 48 hours of video are uploaded to YouTube
• 320 new accounts and 98,000 tweets appear on Twitter
• 168 million emails are sent
• 20,000 new posts on Tumblr
• 6,600 photos appear on Flickr
• Over 20% of all websites run on a CMS (WordPress, etc.)

Every minute today:

• 100 hours of video are uploaded to YouTube
• ??? new accounts and 236,000 tweets appear on Twitter
• 204 million emails are sent
• 28,000 new posts on Tumblr
• 1,600 photos appear on Flickr!!! No shit!
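
As a rough check on how much the per-minute figures moved between the two snapshots, the percent change for each metric can be computed as below. This is only a sketch: the dictionary keys and the helper `percent_change` are illustrative names, the email counts are read as 168 and 204 million (an assumption about the slide's wording; the ratio is the same either way), and new Twitter accounts are left out because today's figure is not given.

```python
# Per-minute figures quoted on the two slides (earlier snapshot vs. today).
# Only metrics that have a number on both slides are included.
earlier = {
    "YouTube video hours": 48,
    "tweets": 98_000,
    "emails (millions)": 168,   # assumption: 168 million
    "Tumblr posts": 20_000,
    "Flickr photos": 6_600,
}
today = {
    "YouTube video hours": 100,
    "tweets": 236_000,
    "emails (millions)": 204,   # assumption: 204 million
    "Tumblr posts": 28_000,
    "Flickr photos": 1_600,
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {name: percent_change(earlier[name], today[name]) for name in earlier}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

Every metric grows except Flickr, which comes out negative, matching the surprise noted on the slide.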

But…
• Facebook has lost 1.5 million users in Canada and 6 million in the United States
• Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%)
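
The two numbers in the Yahoo figure imply a population size, which can be back-calculated as a sanity check. The sketch below assumes the 0.05% is the share that those 20,000 accounts represent of all accounts in the study (the slide does not say so explicitly); the variable names are illustrative.

```python
from fractions import Fraction

top_accounts = 20_000
share_percent = Fraction(5, 100)   # 0.05 percent, as quoted on the slide

# If 20,000 accounts are 0.05% of the total, the total is 20,000 / 0.0005.
# Fractions keep the arithmetic exact instead of relying on float rounding.
implied_total_accounts = top_accounts / (share_percent / 100)

print(int(implied_total_accounts))
```

Under that assumption the study population would be 40 million accounts.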

Gartner is predicting an explosion in Social Media Analytics IT spending

In a lot of ways Social “Big Data” is like Oil…
• Difficult and expensive to extract

Difficult and expensive to extract

Difficult and expensive to store and distribute

Cheapest (and least useful) when it’s unrefined

In a lot of ways “Big Data” is like Oil…
• Can’t be used by consumers unless refined
• More expensive at every step of refinement

The market is producing a plethora of derived, higher-value data products

In a lot of ways “Big Data” is like Oil…

• Difficult and expensive to extract
• Difficult and expensive to store and distribute
• Cheapest in its unrefined form
• More expensive at every step of refinement
• Produces a plethora of derived products
• and it’s actually quite “dirty”!!!!

Part 2

Social Data is one of the reasons why IBM added a 4th V to the Big Data Definition

VERACITY

Social Data Analytics = Oil Refineries

6 factors affect Data Veracity …

1. Accuracy: Is it true?
2. Precision: If true, error margin?
3. Reliability: Is it there all the time?
4. Provenance: Can you trace the source?
5. Fidelity: Did it change from the source?
6. Permission: Can you use it for the context?
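
One way to make the six questions operational is to record an answer to each for every data source and see which ones fail. The sketch below is a hypothetical illustration, not a method from the talk: the class `VeracityCheck`, its methods, and the yes/no simplification (precision, for example, is really a margin rather than a boolean) are all assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class VeracityCheck:
    """Yes/no answers to the six veracity questions for one data source."""
    accuracy: bool      # Is it true?
    precision: bool     # If true, is the error margin acceptable?
    reliability: bool   # Is it there all the time?
    provenance: bool    # Can you trace the source?
    fidelity: bool      # Is it unchanged from the source?
    permission: bool    # Can you use it for the context?

    def failed(self):
        """Names of the factors that did not pass, in slide order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def score(self):
        """Fraction of the six factors that passed."""
        names = [f.name for f in fields(self)]
        return sum(getattr(self, n) for n in names) / len(names)


# Made-up example: a feed that is intermittently available and has been
# altered somewhere between the source and the analyst.
check = VeracityCheck(
    accuracy=True, precision=True, reliability=False,
    provenance=True, fidelity=False, permission=True,
)
print(check.failed(), round(check.score(), 2))
```

Treating every factor as equally weighted is a design shortcut; in practice a failed permission check would likely rule out a source regardless of the other answers.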

Black Hat SEO : Blogs

Twitter: 46% of brand followers are bots

Black Hat Social Marketing : Twitter

Or in some cases over 90 %…

Disappearing Romney: FB as well…

And it is getting worse …

Trying to solve the Veracity problem …

The Big Guys are now doing Veracity …

Murali Krishnam <murali.krishnam@saama.com>

Part 3 The Opportunity for Social Data Scientists

“McKinsey Global Institute estimated that by 2018 there will be 4 million big data related positions in the U.S. that require quantitative and analytical skills. However, there will be a potential shortfall of 1.5 million data-savvy managers and analysts to fill these positions”

@cgtheoret @fffady

Zeitgeist

cg.theoret@nexalogy.com

Thank you!