Introduction to Big Data: An analogy between Sugar Cane & Big Data

Post on 20-Dec-2014


Jean-Marc Desvaux – March 2012

Image Source: MicFarris.com
Image Source: alternative-energy-fuels.com

Session Abstract:

What is Big Data?
Where does it apply?
What are the technologies behind it?
Is it going to replace your RDBMS? …

Big Data: it’s all Silicon Valley is talking about. It’s the new buzzword after ‘cloud’.

“Everybody is speaking of it and many are convinced it is the only way forward. As always, such dramatic statements are not only dangerous but serve to put some people off the concept.”

Source: Tom Kyte’s Big Data Are you ready ? presentation

What is Big Data?

Big Data is data that exceeds the processing capacity of conventional database systems.

It’s too big, too fast or does not fit the structures of database architectures. To gain value from this type of data you need an alternative way to process it.

Why is this happening? Data is growing faster than computers are getting bigger.

A catch-all term. Includes social network data, Web logs, MP3s, unstructured Web page content, XML, GPS tracking data, vehicle telemetry, financial market data and many more…

Can be characterized by the 3 Vs:

Image Source: Tom Kyte’s Big Data Are you ready ? presentation

Volume: Data growing faster than machines are getting bigger. Data sources adding up.

Velocity: Rate of acquisition and desired rate of consumption.

Variety: Extends beyond structured data, includes unstructured data of all varieties.


Where does Big Data apply?

The value of Big Data to an organisation falls into two main categories:

Analytical Use

Enabling new products and services

Analytical Use

To reveal insights previously hidden because the data was hard to record and exploit.

An edge over classic analytics, which rely on sampling and on more “static”, predetermined reports.

It promotes an investigative approach to data and puts the data scientist and analyst in the spotlight.
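The edge over sampling-based analytics can be illustrated with a small sketch. It assumes an invented data set of 10,000 transactions in which three are unusual; the record layout, the ids and the helper name `count_unusual` are hypothetical, and the 1% sample is simply every 100th record.

```python
# Hypothetical data set: 10,000 transactions, three of which are unusual.
UNUSUAL_IDS = {1234, 5678, 9013}
transactions = [{"id": i, "unusual": i in UNUSUAL_IDS} for i in range(10_000)]


def count_unusual(records):
    """Count the records flagged as unusual."""
    return sum(1 for record in records if record["unusual"])


# Classic approach: analyse a 1% sample (every 100th record).
sample = transactions[::100]
found_in_sample = count_unusual(sample)

# Big Data approach: scan every record.
found_in_full_scan = count_unusual(transactions)

print(f"sample of {len(sample)} records: {found_in_sample} unusual")
print(f"full scan of {len(transactions)} records: {found_in_full_scan} unusual")
```

With this particular data the sample contains none of the three unusual records, while the full scan finds all of them. Rare events are exactly what a small sample tends to miss.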

Hal Varian, chief economist at Google: “I keep saying that the sexy job in the next 10 years will be statisticians”

Some terms linked to the Analytical Use of Big Data

Sentiment Analysis: Mining the Web in real time and getting a quick read of what people are thinking.
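As a rough illustration of the idea, here is a minimal lexicon-based sketch. It is an assumption for demonstration only: the word lists, the sample messages and the function names are invented, and production systems rely on far larger lexicons or trained models.

```python
import re

# Hypothetical, tiny word lists for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for word in words if word in POSITIVE)
    negatives = sum(1 for word in words if word in NEGATIVE)
    return positives - negatives


def quick_read(messages):
    """Average the per-message scores to get a rough overall mood."""
    scores = [sentiment_score(message) for message in messages]
    return sum(scores) / len(scores) if scores else 0.0


# Invented sample messages standing in for a live feed.
sample_messages = [
    "I love the new phone, great battery",
    "Terrible service and slow delivery",
    "It is a phone",
]
overall = quick_read(sample_messages)
print(overall)
```

A score above zero suggests a mostly positive mood, below zero a mostly negative one. The three sample messages balance out to 0.0.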

Named-entity recognition (NER) (also known as entity identification and entity extraction) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. (e.g. “Big B” in a tweet may stand for Big Brother or Amitabh Bachchan)
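The “Big B” ambiguity can be sketched with a dictionary (gazetteer) lookup, one of the simplest forms of NER. The gazetteer contents, category labels and function name below are hypothetical. Real NER systems also use the surrounding context to pick one candidate, which this sketch deliberately does not attempt.

```python
import re

# Hypothetical gazetteer: surface form -> candidate (entity, category) pairs.
GAZETTEER = {
    "big b": [("Big Brother", "TV_SHOW"), ("Amitabh Bachchan", "PERSON")],
    "google": [("Google", "ORGANIZATION")],
}


def find_entities(text):
    """Return (surface form, candidates) for each gazetteer hit, in text order."""
    lowered = text.lower()
    hits = []
    for surface, candidates in GAZETTEER.items():
        # Word boundaries stop "big b" from matching inside "big brother".
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), surface, candidates))
    hits.sort(key=lambda hit: hit[0])
    return [(surface, candidates) for _, surface, candidates in hits]


tweet = "Watching Big B tonight, then a Google hangout"
entities = find_entities(tweet)
for surface, candidates in entities:
    print(surface, "->", candidates)
```

The lookup returns two candidates for “big b” and one for “google”. Choosing between the two candidates is the classification part of NER.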

Product/Service Enabler

Some products and services cannot exist unless backed by Big Data technologies:
- Need to scale
- Need a fast feedback loop on complex analytics

Highly successful Web startups that pioneered Big Data technologies through R&D to enable new types of products are good examples: Google, Yahoo, Amazon, Facebook.

Sectors with Fast Adoption and High Potential

Financial Sector
Telecommunications
Government
Health
Retail

Big Data Sources: Internal & Data Marketplaces.

Internal sources

Time Attendance logs
RFID sensors logs
Security Logs
Vehicles GPS tracking
Machinery/Telemetry Logs
Pictures & videos
Enterprise Social Networks
Service Forum/Discussions

….

Mostly anything unstructured or simply structured

Source: DataSift.com

External Sources (feeders/data marketplaces)
Examples: Infochimps.com, DataSift.com, datamarket.azure.com

An Enterprise Architecture for Big Data

An analogy with a Sugar Cane Factory

A Sugar Factory

SUGAR CANE FIELDS

ACQUIRE (HARVEST)

EXTRACT/SHRED

EVAPORATE/DISTILL/BOIL

DRY/STORE/SUGAR

= VALUE (BOTTOM LINE)

An Enterprise Big Data Factory

ACQUIRE (HARVEST)

ORGANIZE (EXTRACT)

ANALYSE (SHRED/DISTILL/BOIL)

BUSINESS INTELLIGENCE (DECIDE)

= VALUE (BOTTOM LINE)

DATA SOURCES (RDBMS & Data Marketplaces)

HDFS (Hadoop Distributed FS)

NoSQL Database (Hadoop Distributed FS)

RDBMS
Enterprise Applications

Map Reduce (Hadoop)

Big Data Connectors

RDBMS Connectors

Data Warehousing / RDBMS stores

Analytic Applications: the sweet part (sugar/rum)
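The Map Reduce block of the factory can be illustrated with a single-process sketch of its three phases: map, shuffle and reduce. This is plain Python that imitates the programming model; it is not the Hadoop API. The log format, the sample lines and all function names are invented for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in grouped.items()}


# Hypothetical Web log lines: "<ip> <http status> <path>".
log_lines = [
    "10.0.0.1 200 /home",
    "10.0.0.2 404 /missing",
    "10.0.0.1 200 /products",
    "10.0.0.3 500 /checkout",
    "10.0.0.2 200 /home",
]


def status_mapper(line):
    """Emit (status, 1) for one log line."""
    _ip, status, _path = line.split()
    return [(status, 1)]


def count_reducer(_key, values):
    """Sum the ones emitted for a given status."""
    return sum(values)


pairs = map_phase(log_lines, status_mapper)
hits_per_status = reduce_phase(shuffle_phase(pairs), count_reducer)
print(hits_per_status)
```

In a real cluster the map and reduce calls run in parallel on many machines, next to the data stored in HDFS. The structure of the computation stays the same.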

Some Factories & architectures from vendors

Greenplum (EMC2): An Example of a Turnkey Factory Solution

Another “Turnkey Factory” Example from Oracle: Targeting high-end Analytics

ACQUIRE (HARVEST)

ORGANIZE (EXTRACT)

ANALYSE (SHRED/DISTILL/BOIL)

BUSINESS INTELLIGENCE (DECIDE)

Image Source: Tom Kyte’s Big Data Are you ready ? presentation

+ Of course, you can build your own factory using the widely available open source software on which most turnkey factories are built.

The Microsoft way

Technologies behind Big Data

Factory blocks & screws used for engineering solutions

Will NoSQL kill SQL?!

Turning the RDBMS into a legacy data store?

Not at all.

We need the RDBMS to store high-value data and for its feature-rich approach (feature first).

NoSQL (scale first) is not a superset of RDBMS technologies (a bit like Einstein’s relativity to Newtonian physics).

Remember NoSQL is not “No SQL” but “Not Only SQL”.
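The “feature first” versus “scale first” contrast can be sketched side by side. The relational half uses Python's built-in `sqlite3` module as a stand-in for an RDBMS. The key-value half is a toy in-memory store whose class name, sharding scheme and sample data are assumptions made for this example, not any vendor's design.

```python
import sqlite3
import zlib

# Feature first: a schema, a constraint and declarative SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    "id INTEGER PRIMARY KEY, "
    "customer TEXT NOT NULL, "
    "amount REAL CHECK (amount > 0))"
)
conn.executemany(
    "INSERT INTO payments (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 35.5), ("alice", 80.0)],
)
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM payments GROUP BY customer")
)


# Scale first: schema-free values spread over shards by hashing the key.
class ShardedKeyValueStore:
    """Toy key-value store; each shard stands in for a separate machine."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


store = ShardedKeyValueStore(shard_count=4)
store.put("click:1", {"user": "alice", "page": "/home"})
store.put("click:2", {"user": "bob", "raw": "<html>...</html>"})

print(totals)
print(store.get("click:2"))
```

The relational side rejects bad rows and answers aggregate questions in one statement. The key-value side accepts any shape of value and grows by adding shards, but offers only lookups by key. Each is better at a different job, which is why one does not replace the other.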

Big Data future

Rise of Data Marketplaces

Data Science tools development: More powerful & expressive toolsets for analysis

Emerging tools for streaming data processing (Twitter Storm, Yahoo S4, StreamBase): Real-time enablement / Live BI

Further cloud-enablement

Ease of integration to Enterprise Sources
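The streaming item in the list above comes down to updating a result as each event arrives instead of re-running a batch job. A minimal sketch of that idea is a sliding-window counter. It does not use the API of Storm, S4 or StreamBase; the class name and the event timestamps are invented, and events are assumed to arrive in time order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count the events seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the number of events inside the window."""
        self.timestamps.append(timestamp)
        # Evict events that have fallen out of the window (t - window, t].
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


# Hypothetical event times in seconds, assumed to arrive in order.
counter = SlidingWindowCounter(window_seconds=10)
event_times = [0, 3, 4, 11, 12, 25]
live_counts = [counter.add(t) for t in event_times]
print(live_counts)
```

Each call to `add` returns an up-to-date figure that a live BI dashboard could display immediately.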

Conclusion

To leverage Big Data you need something like a Sugar Factory. It can be a very entry-level factory (Excel – Azure Source) or a more complex one. The more complex and complete the factory, the more value at the end of the processing chain.

To turn Big Data technologies from developer-centric solutions into enterprise solutions, they must be combined with SQL solutions into a single proven infrastructure meeting the manageability and security requirements of enterprises.

The challenge for Enterprises is to simplify Big Data integration/engineering and leverage it where possible to improve their processes at tactical and strategic levels.

Architects & DBAs will be able to choose between datastore technologies and will need to understand where one is better than the other.

Big Data has to be part of the Enterprise Applications ecosystem, where it will be turned into value.

Thank you.