Introduction to Big Data An analogy between Sugar Cane & Big Data

Page 1: Introduction to Big Data An analogy  between Sugar Cane & Big Data

Jean-Marc Desvaux – March 2012

Image Source: MicFarris.com
Image Source: alternative-energy-fuels.com

Page 2

Session Abstract:

What is Big Data?
Where does it apply?
What are the technologies behind it?
Is it going to replace your RDBMS? …

Page 3

Big Data: it’s all Silicon Valley is talking about. It’s the new buzzword after ‘cloud’.

“Everybody is speaking of it and many are convinced it is the only way forward. As always, such dramatic statements are not only dangerous but serve to put some people off the concept.”

Page 4

Source: Tom Kyte’s “Big Data Are you ready?” presentation

Page 5

What is Big Data?

Page 6

Big Data is data that exceeds the processing capacity of conventional database systems.

It’s too big, too fast or does not fit the structures of database architectures. To gain value from this type of data you need an alternative way to process it.

Why is this happening? Data is growing faster than computers are getting bigger.

Page 7

A catch-all term. It includes social network data, Web logs, MP3s, unstructured Web page content, XML, GPS tracking data, vehicle telemetry, financial market data and many more…

It can be characterized by the 3 Vs:

Image Source: Tom Kyte’s “Big Data Are you ready?” presentation

Page 8

Volume: Data is growing faster than machines are getting bigger, and data sources keep adding up.

Velocity: Rate of acquisition and desired rate of consumption.

Variety: Extends beyond structured data to include unstructured data of all varieties.

Image Source: Tom Kyte’s “Big Data Are you ready?” presentation

Page 9

Where does Big Data apply?

Page 10

The value of Big Data to an organisation falls into two main categories:

Analytical Use

Enabling new products and services

Page 11

Analytical Use

To reveal insights that were previously hidden because the data was hard to record and exploit.

An edge over classic analytics, which relies on sampling and on more “static”, predetermined reports.

It promotes an investigative approach to data and puts the data scientist and analyst in the spotlight.

Hal Varian, chief economist at Google: “I keep saying that the sexy job in the next 10 years will be statisticians.”

Page 12

Some terms linked to the Analytical Use of Big Data

Sentiment Analysis: Mining the Web in real time and getting a quick read of what people are thinking.

Named-entity recognition (NER), also known as entity identification and entity extraction, is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. (e.g. does “Big B” in a tweet stand for Big Brother or for Amitabh Bachchan?)
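
As a rough illustration of the sentiment analysis idea above, here is a minimal sketch in Python. The word lists, the sample messages and the function name `sentiment_score` are assumptions made up for this example and are not part of the original presentation; real systems rely on far richer language models.

```python
import re

# Hypothetical, hand-made word lists: assumptions for illustration only.
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"hate", "bad", "slow", "broken"}


def sentiment_score(message: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", message.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


# Made-up messages standing in for a live feed of tweets.
messages = [
    "I love the new phone, great camera",
    "Service is slow and the app is broken",
    "Just arrived at the airport",
]

scores = [sentiment_score(m) for m in messages]
print(scores)  # [2, -2, 0]
```

A score above zero reads as positive, below zero as negative, and zero as neutral or unknown. At Big Data scale the same scoring would run over a continuous stream rather than a short list.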

Page 13

Product/Service Enabler

Some products and services cannot exist unless they are backed by Big Data technologies:
- Need to scale
- Need a fast feedback loop on complex analytics

Highly successful Web startups that pioneered Big Data technologies through R&D to enable new types of products are good examples: Google, Yahoo, Amazon, Facebook.

Page 14

Sectors with Fast Adoption and High Potential

Financial Sector
Telecommunications
Government
Health
Retail

Page 15

Big Data Sources: Internal & Data Marketplaces

Page 16

Internal sources

Time Attendance logs
RFID sensor logs
Security logs
Vehicle GPS tracking
Machinery/Telemetry logs
Pictures & videos
Enterprise Social Networks
Service Forum/Discussions
…

Mostly anything unstructured or simply structured

Page 17

Source: DataSift.com

External Sources (feeders/data marketplaces)

Examples: Infochimps.com, DataSift.com, datamarket.azure.com

Page 18

An Enterprise Architecture for Big Data

An analogy with a Sugar Cane Factory

Page 19

A Sugar Factory

SUGAR CANE FIELDS

ACQUIRE (HARVEST)

EXTRACT/SHRED

EVAPORATE/DISTILL/BOIL

DRY/STORE/SUGAR

= VALUE (BOTTOM LINE)

Page 20

An Enterprise Big Data Factory

Stages:
ACQUIRE (HARVEST)
ORGANIZE (EXTRACT)
ANALYSE (SHRED/DISTILL/BOIL)
BUSINESS INTELLIGENCE (DECIDE)
= VALUE (BOTTOM LINE)

Components:
DATA SOURCES (RDBMS & Data Marketplaces)
HDFS (Hadoop Distributed FS)
NoSQL Database (Hadoop Distributed FS)
RDBMS Enterprise Applications
MapReduce (Hadoop)
Big Data Connectors
RDBMS Connectors
Data Warehousing / RDBMS stores
Analytic Applications: the sweet part (sugar/rum)
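
To make the MapReduce block of the factory a little more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps in Python. It does not use Hadoop. The log lines and the function names (`map_phase`, `shuffle`, `reduce_phase`) are assumptions invented for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (status, 1) pair for the HTTP status ending one log line."""
    status = line.split()[-1]
    return [(status, 1)]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group values by key, as a framework would do between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values of one key into a single total."""
    return key, sum(values)


# Made-up Web log lines standing in for files stored on a distributed FS.
logs = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /index.html 200",
]

mapped = [pair for line in logs for pair in map_phase(line)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)  # {'200': 3, '404': 1}
```

In a real cluster the map and reduce functions run in parallel on many machines, close to where the data blocks are stored. The sketch only shows the shape of the computation.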

Page 21

Some Factories & architectures from vendors

Page 22

Greenplum (EMC2)

An Example of a Turnkey Factory Solution

Page 23

Another “Turnkey Factory” Example from Oracle

Targeting high-end Analytics

ACQUIRE (HARVEST)

ORGANIZE (EXTRACT)

ANALYSE (SHRED/DISTILL/BOIL)

BUSINESS INTELLIGENCE (DECIDE)

Image Source: Tom Kyte’s “Big Data Are you ready?” presentation

Page 24

+ Of course, you can build your own factory using widely available open source software, on which most turnkey factories are built.

The Microsoft way

Page 25

Technologies behind Big Data

Page 26

Factory blocks & screws used for engineering solutions

Page 27

Will NoSQL kill SQL?!

Page 28

Turning the RDBMS into a legacy data store?

Not at all.

We need the RDBMS to store high-value data and for its feature-rich approach (feature first).

NoSQL (scale first) is not a superset of RDBMS technologies (a bit like Einstein's relativity in relation to Newtonian physics).

Remember: NoSQL is not “No SQL” but “Not Only SQL”.
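
The “feature first” versus “scale first” contrast can be sketched in a few lines of Python. The first half uses SQLite from the standard library as a stand-in for an RDBMS. The second half is a toy hash-partitioned key-value store, not a real NoSQL product. Table names, keys, data and the helpers `partition_for`, `put` and `get` are all assumptions made up for this illustration.

```python
import sqlite3
import zlib

# Feature first: the relational engine provides joins, aggregates and
# constraints out of the box.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5)],
)
totals = db.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(totals)  # [('Alice', 25.0), ('Bob', 7.5)]

# Scale first: a toy key-value store whose keys are spread over partitions.
# Each partition could live on a different machine; only get/put by key exist.
NUM_PARTITIONS = 4
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def partition_for(key: str) -> int:
    """Pick a partition with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def put(key: str, value: dict) -> None:
    partitions[partition_for(key)][key] = value


def get(key: str):
    return partitions[partition_for(key)].get(key)


put("user:1", {"name": "Alice"})
put("user:2", {"name": "Bob"})
print(get("user:1"))  # {'name': 'Alice'}
```

The relational half answers an ad hoc question with one declarative query. The key-value half gives up that expressiveness in exchange for a layout that is easy to spread over many nodes, which is why the two are complementary rather than interchangeable.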

Page 29

Big Data future

Page 30

Rise of Data Marketplaces

Data Science tools development: More powerful & expressive toolsets for analysis

Emerging streaming data processing tools (Twitter Storm, Yahoo S4, StreamBase): Real-time enablement / Live BI

Further cloud-enablement

Ease of integration to Enterprise Sources
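
To give a feel for the streaming data processing mentioned in the list above, here is a minimal sketch of a tumbling-window counter in plain Python. It does not use the API of Storm, S4 or StreamBase. The event data and the function name `tumbling_counts` are assumptions for illustration, and events are assumed to arrive ordered by timestamp.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def tumbling_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts) each time a fixed-size window closes.

    Assumes events are (timestamp_in_seconds, key) pairs in timestamp order.
    Windows that contain no events are not emitted.
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, key in events:
        start = timestamp - timestamp % window_seconds
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)


# Made-up event stream: (seconds since start, event type).
events = [
    (0, "login"), (3, "click"), (7, "click"),
    (12, "login"), (14, "click"),
    (25, "click"),
]

windows = list(tumbling_counts(events, window_seconds=10))
for window_start, window_counts in windows:
    print(window_start, window_counts)
```

Because results are produced window by window while the data is still flowing, a dashboard fed this way can show figures that are only seconds old, which is the “Live BI” idea.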

Page 31

Conclusion

Page 32

To leverage Big Data you need something like a Sugar Factory. It can be a very entry-level factory (Excel – Azure Source) or a more complex one. The more complex and complete the factory, the more value at the end of the processing chain.

To turn Big Data technologies from developer-centric solutions into enterprise solutions, they must be combined with SQL solutions into a single proven infrastructure that meets the manageability and security requirements of enterprises.

Page 33

The challenge for Enterprises is to simplify Big Data integration/engineering and leverage it where possible to improve their processes at tactical and strategic levels.

Architects & DBAs will be able to choose among datastore technologies and will need to understand where one is better than another.

Big Data has to be part of the Enterprise Applications ecosystem, where it will be turned into value.

Page 34

Thank you.