NoSQL & Big Data Analytics: History, Hype, Opportunities

analyze(NoSQL,BigData); /* history, hype, opportunities */ // By: Vishy Poosala // Head of Bell Labs, India // [email protected] // @vishyp 1

description

A look at NoSQL and Big Data Analytics as an evolution from relational databases, going behind the hype. Thanks to Gregory Piatetsky-Shapiro for the second half of the slides. You can find more on this topic on my blog: http://innovation-edge.blogspot.com/

Transcript of NoSQL & Big Data Analytics: History, Hype, Opportunities

Page 1: NoSQL & Big Data Analytics: History, Hype, Opportunities


analyze(NoSQL,BigData);/* history, hype, opportunities */

// By: Vishy Poosala

// Head of Bell Labs, India

// [email protected]

// @vishyp

Page 2

The dark ages of COBOL

Page 3

...then Codd said, "Let there be tables"

Rows & Columns

Normal Forms

ACID

SQL

Page 4

www.data-for-humans.com

WHAT COLUMNS?

SET-VALUED ATTRIBUTES

Schema Evolution

XML

Page 5

Billions of Keys & Values

Cassandra

Dynamo

Hadoop

Google

Big Table

GFS

Page 6

How would you build a super-fast, Facebook-scale chat service in 2012?

(for example)

Page 7

I want my own DB!

• Main memory: Memcached, Redis

• Distributed K-V: MongoDB

• Versions: CouchDB

• Social graphs: Neo4j

Page 8

Era          Big   Data               Analytics   Language
1960s        KB    Files              Stats       COBOL
1980-1996    GB    Tables             OLAP cube   SQL
1996-2007    TB    Semi-structured    Apps        XML
2007-        PB    Variety, dynamic   Mahout      NoSQL

Page 9

The following *AMAZING* slides are courtesy of Gregory Piatetsky-Shapiro, kdnuggets.com

You can find all the slides from his talk at:

http://www.slideshare.net/gpiatetskyshapiro/analytics-and-data-mining-industry-overview

Analyzing Analytics, Job Trends

Page 10

Data Tsunami

• In 2010, enterprises stored 7 exabytes (7,000,000,000 GB) of new data (McKinsey)

• 90 percent of the world's data has been generated in the past two years (IBM)

Image with apologies to KDD-2011

Page 11

Pre-history

From the Google Ngram Viewer, English-language books. Note: our analysis uses only English-language data; other languages, especially Chinese, need to be considered for a full picture.

Statistics is the biggest term of the 20th century, but data mining and analytics appear in the late 1990s.

Page 12

Recent History: Analytics, Data Mining, Knowledge Discovery

• Analytics has been used since 1800, but started to rise in 2005.

• Data Mining jumps around 1996 (soon after the first KDD conference) but declines after 2003 (the TIA, Total Information Awareness, controversy, associated with government invasion of privacy).

• Knowledge Discovery appears in 1989, jumps in 1996, and plateaus after 2000.

Page 13

Google Trends: After 2006, Data Mining < Analytics

Page 14

Google Insights: searches for "data mining" and "analytics -google" are most popular in India and the US

Page 15

Analytics > Data Mining > Data Science

Page 16

Data Science, Big Data

Page 17

Data Types Analyzed/Mined

www.KDnuggets.com/polls/2011/data-types-analyzed-mined.html

Page 18

Largest Dataset Analyzed?

2011 median dataset size ~10-20 GB, vs. 8-10 GB in 2010.

Increase in the 10 GB to 1 PB range

www.KDnuggets.com/polls/2011/largest-dataset-analyzed-data-mined.html

Page 19

Which methods/algorithms did you use for data analysis in 2011?

Decision Trees

Regression

Clustering

Statistics

Visualization

Time series/Sequence analysis

Support Vector (SVM)

Association rules

Ensemble methods

Text Mining

Neural Nets

Boosting

Bayesian

Bagging

Factor Analysis

Anomaly/Deviation detection

Social Network Analysis

Survival Analysis

Genetic algorithms

Uplift modeling

[Bar chart: % of analysts who used each method, scale 0-70%]

www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html

Page 20

Cloud Analytics is not common (yet)

www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html

Page 21

Shortage of Skills

• McKinsey: by 2018, a shortage in the US of

  – 140,000-190,000 people with deep analytical skills

  – 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions

Source: www.mckinsey.com/mgi/publications/big_data/

Page 22

Job data: Data Scientist

Page 23

Jobs: Data Mining >> Data Scientist

Page 24

“Ground” Analytics (LinkedIn Skills)

~75,000 with Data Mining skill

~7,000 with Predictive Modeling

Also ~20,000 with Predictive Analytics (not related to Predictive Modeling??)

Page 25

Analytics LinkedIn Skills

Machine Learning

Predictive Analytics

Text Mining

MapReduce

Page 26

Big Data Bubble?

Gartner Hype Cycle

Big Data

Page 27

@vishyp

http://innovation-edge.blogspot.com