Big data analytics

BIG DATA ANALYTICS By Rahul Kulkarni

Transcript of Big data analytics

Page 1: Big data analytics

BIG DATA ANALYTICS

By Rahul Kulkarni

Page 2: Big data analytics

Big Data

Big Data Players in the Market

Hadoop Ecosystems

Analytics Machine Learning Algorithms

SMAC

Page 3: Big data analytics
Page 4: Big data analytics

WHAT IS BIG DATA?

“Big Data” is high-volume, high-velocity, high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Page 5: Big data analytics

By 2020, 1.7 MB of new information will be created every second of every day for each human being on the planet.

Page 6: Big data analytics
Page 7: Big data analytics

DATA CONTRIBUTIONS

Page 8: Big data analytics

Personalized for each visitor

Page 9: Big data analytics

HADOOP WAS A KEY PART OF IBM’S WATSON

Hadoop's analytics and data discovery abilities were a big reason that IBM's Watson computer was able to win a widely publicized "Jeopardy!" showdown last year against a couple of very successful former human champions.

Page 10: Big data analytics

BIG DATA PLAYERS

Page 11: Big data analytics

EVOLUTION OF HADOOP

Page 12: Big data analytics
Page 13: Big data analytics
Page 14: Big data analytics
Page 15: Big data analytics

Simple models do better than experts

LET US GET STARTED

Page 16: Big data analytics

AN INSURANCE PROBLEM

Product                    Revenues in last quarter (in millions)
Car Insurance              110
Life Insurance             180
Health Insurance           220
2-wheeler Insurance        90
Heavy Vehicle Insurance    100

Page 17: Big data analytics

WHAT WE CANNOT EXPLAIN

Page 18: Big data analytics

FIRST MODEL . . .

Categorize data as VEHICLE and NON-VEHICLE insurance.

The average of vehicle insurance: 100

The unexplained = (90-100)^2+(100-100)^2+(110-100)^2=200

The average non-vehicle insurance = 200

The unexplained = (180-200)^2+(220-200)^2=800

R^2
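The transcript does not preserve how the slide computed this value, so the following is only a sketch. It assumes the conventional definition, R^2 = 1 - (unexplained variation / total variation), with total variation measured around the overall mean of the five products. The function and variable names are illustrative, not taken from the slides.

```python
# Revenues (in millions) from the insurance table in this example.
revenues = {
    "Car Insurance": 110,
    "Life Insurance": 180,
    "Health Insurance": 220,
    "2-wheeler Insurance": 90,
    "Heavy Vehicle Insurance": 100,
}

# Grouping used by the first model: vehicle vs. non-vehicle insurance.
groups = {
    "vehicle": ["Car Insurance", "2-wheeler Insurance", "Heavy Vehicle Insurance"],
    "non_vehicle": ["Life Insurance", "Health Insurance"],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def sum_of_squares(values, center):
    """Sum of squared deviations of the values from a given center."""
    return sum((v - center) ** 2 for v in values)


def r_squared(revenues, groups):
    # Total variation: squared deviations from the overall mean
    # (assumed to be what "what we cannot explain" refers to before any model).
    total = sum_of_squares(revenues.values(), mean(revenues.values()))

    # Unexplained variation: squared deviations from each group's own mean.
    unexplained = 0.0
    for members in groups.values():
        group_values = [revenues[name] for name in members]
        unexplained += sum_of_squares(group_values, mean(group_values))

    return total, unexplained, 1 - unexplained / total


total, unexplained, r2 = r_squared(revenues, groups)
print(f"total={total}, unexplained={unexplained}, R^2={r2:.3f}")
```

Under these assumptions the total variation is 13000, the unexplained variation is 200 + 800 = 1000, and R^2 comes out at about 0.92.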

Page 19: Big data analytics
Page 20: Big data analytics

Let's get started with two different techniques:
(Supervised) - Classification and Regression
(Un-Supervised) - Clustering

Analytics Machine Learning: Supervised & Un-supervised Learning
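As a rough illustration of the difference between the two families, here is a minimal sketch in plain Python. The supervised examples learn from data that comes with known answers (the toy numbers and labels are hypothetical). The clustering example receives only the raw revenue figures from the insurance problem and has to find the groups itself. All function names are illustrative, and real projects would normally use a library rather than hand-written routines.

```python
# Supervised (regression): fit y = slope * x + intercept by least squares,
# using training pairs where the target y is known.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Supervised (classification): 1-nearest-neighbour on labelled points.
def nearest_neighbour_label(x, labelled_points):
    _, label = min(labelled_points, key=lambda p: abs(p[0] - x))
    return label


# Unsupervised (clustering): 1-D k-means with k = 2, no labels given.
def two_means(values, iterations=10):
    centers = [min(values), max(values)]  # simple deterministic start
    clusters = ([], [])
    for _ in range(iterations):
        clusters = ([], [])
        for v in values:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical toy data for the supervised cases.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
labelled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
predicted = nearest_neighbour_label(7.2, labelled)

# Revenue figures (in millions) from the insurance problem, without labels.
centers, clusters = two_means([110, 180, 220, 90, 100])

print(slope, intercept, predicted, centers, clusters)
```

On the revenue figures, the clustering step settles on centers of 100 and 200, which are the vehicle and non-vehicle averages from the first model, even though it was never told which product belongs to which category.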

Page 21: Big data analytics

Machine Learning
- Grew out of work in AI
- New capability for computers

Examples:
- Database mining
  Large datasets from growth of automation/web. E.g., web click data, medical records, biology, engineering
- Applications we can't program by hand
  E.g., autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), computer vision
- Self-customizing programs
  E.g., Amazon, Netflix product recommendations
- Understanding human learning (brain, real AI)

Page 22: Big data analytics

SUPERVISED LEARNING

Page 23: Big data analytics

PREDICTION AND FORECASTING

Page 24: Big data analytics

UN-SUPERVISED LEARNING

Page 25: Big data analytics
Page 26: Big data analytics

"Consumer data will be the biggest differentiator in the next two to three years. Whoever unlocks the reams of data and uses it strategically will win." - Angela Ahrendts, CEO of Burberry

Big Data is key to any loyalty scheme

Page 27: Big data analytics

The Obama 2012 campaign used data analytics and the experimental method to assemble a winning coalition vote by vote. The interests of individual voters were known and addressed.

Online media and web analytics helped Obama beat McCain, changing the political scene in one of the most powerful nations in the world and influencing the course of history.

- Obama had 2.5 M Facebook friends compared to a paltry 0.5 M Facebook friends for McCain (it seems strange to think of politicians on Facebook, though)
- Obama raised USD 500 M online versus a total of USD 201 M raised by McCain

Percentage of votes cast for Obama by early voters in Hamilton
Model - 57.68%, Actual - 57.16%

Television commercials aired on TV Land (national cable level)
Obama campaign - 1,710, Romney campaign - 0

Money spent on online ads through mid-October
Romney campaign - $26 million, Obama campaign - $52 million

Page 28: Big data analytics
Page 29: Big data analytics
Page 30: Big data analytics
Page 31: Big data analytics

SMAC (social, mobile, analytics and cloud) will be the platform that enables organizations to drive the consumerization of technology, including IT. Early adopters of the SMAC stack will have a clear competitive edge in their line of business.

Page 32: Big data analytics
Page 33: Big data analytics
Page 34: Big data analytics

Cloud computing is a synonym for distributed computing over a network, and means the ability to run a program or application on many connected computers at the same time.

Page 35: Big data analytics
Page 36: Big data analytics
Page 37: Big data analytics

THANK YOU . . . . .