Big Data Analytics Module 4 – Data Mining and Predictive Analytics Including Mahout
Saptak Sen, Microsoft; Bill Ramos, Advaiya



Agenda

• Overview of predictive analytics and data mining

• How Microsoft supports predictive analytics

• How Mahout fits into the picture

• Demos


Data Mining


Predicting future performance from historical data

• Recommendation engines

• Advertising analysis

• Weather forecasting for business planning

• Social network analysis

• IT infrastructure and web app optimization

• Legal discovery and document archiving

• Pricing analysis

• Fraud detection

• Churn analysis

• Equipment monitoring

• Location-based tracking and services

• Personalized insurance

Predictive analytics should address the likelihood of something happening in the future, even if it is just an instant later.*

*Source: Ventana Research, Predictive Analytics Benchmark Research Report, March 2012.
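The simplest way to picture "predicting future performance from historical data" is to fit a straight-line trend to past values and extrapolate it forward. The following is a minimal Python sketch using ordinary least squares; it is not the method of any product in this module, and the function names and the sample sales series are hypothetical.

```python
def fit_linear_trend(history):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, ..., n-1."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(history, steps_ahead=1):
    """Extrapolate the fitted trend to the next `steps_ahead` time steps."""
    intercept, slope = fit_linear_trend(history)
    last_t = len(history) - 1
    return [intercept + slope * (last_t + k) for k in range(1, steps_ahead + 1)]


# Hypothetical monthly sales figures, for illustration only.
sales = [100, 110, 120, 130]
print(forecast(sales, steps_ahead=2))  # → [140.0, 150.0]
```

Real forecasting tools also model seasonality and noise; a straight line only captures the overall trend.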


Data mining tool in SQL Server Analysis Services

• Rich data mining algorithms for clustering, classification, forecasting through time series analysis, and more

• Rich developer experience
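To make "clustering" concrete, here is a minimal sketch of Lloyd's k-means algorithm in plain Python. It is a generic illustration of one common clustering technique, assuming 2-D numeric points; it does not reproduce how the Analysis Services clustering algorithm is implemented, and the function name and sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) with Lloyd's k-means algorithm.

    Returns the final centroids and the clusters from the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change the centroids: converged
        centroids = new_centroids
    return centroids, clusters


# Two hypothetical, well-separated groups of customers (e.g. age band, spend band).
pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))
```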


Analysis Services Data Mining Algorithms

• Classify: Decision Trees, Logistic Regression, Naïve Bayes, Neural Networks

• Estimate: Decision Trees, Linear Regression, Logistic Regression, Neural Networks

• Cluster: Clustering

• Forecast: Time Series

• Associate: Association Rules, Decision Trees
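As an example of a "Classify" algorithm, the following sketch implements a categorical Naïve Bayes classifier with Laplace smoothing in plain Python. It illustrates the general technique only, not the Analysis Services implementation; the function names and the small customer dataset are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> Counter of values
    attribute_values = defaultdict(set)  # attribute index -> set of observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def predict(model, row):
    """Return the class with the highest log-posterior score (Laplace-smoothed)."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing so unseen values do not zero out the probability.
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(attribute_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (age band, region) -> bought a bike?
rows = [("young", "north"), ("young", "south"), ("old", "north"),
        ("old", "south"), ("young", "north")]
labels = ["yes", "yes", "no", "no", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("young", "south")))  # → yes
```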


Data mining add-in for Excel

• Ease of use through Excel

• Rich data mining algorithms for clustering, prediction, forecasting, market basket analysis, and more

• Scalable through integration with SSAS


Algorithms: Data Mining Add-in for Excel

Menu item: data mining algorithm used

• Analyze Key Influencers: Naïve Bayes

• Detect Categories: Clustering

• Fill From Example: Logistic Regression

• Forecast: Time Series

• Highlight Exceptions: Clustering

• Scenario Analysis – Goal Seek: Logistic Regression

• Scenario Analysis – What If: Logistic Regression

• Prediction Calculator: Logistic Regression

• Shopping Basket Analysis: Association Rules
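Shopping Basket Analysis relies on association rules, which are scored by support (how often items appear together) and confidence (how often the right-hand item appears given the left-hand one). The sketch below is a simplified, pairs-only version of association rule mining in plain Python, not the add-in's algorithm; the function name, thresholds, and baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between pairs of items."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical shopping baskets.
baskets = [["bike", "helmet"], ["bike", "helmet", "lock"], ["helmet"], ["bike", "lock"]]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Full algorithms such as Apriori extend the same counting to larger itemsets and prune candidates using the support threshold.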


Demo 1: Excel Data Mining Add-In

[Diagram: Windows Azure HDInsight (batch, speed, and serving layers); flat files (.txt, .dat, .xlsx, etc.); Microsoft Excel with the Excel Data Mining Add-in]


Mahout


Mahout

• Scalable machine learning algorithms on the Hadoop platform

• Algorithms for clustering, classification, and batch-based collaborative filtering using the MapReduce paradigm

• Supports a wide range of use cases, from email spam filtering to fraud detection to recommendations for books or movies
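Batch collaborative filtering with the map/reduce paradigm can be pictured as counting how often two items appear in the same user's history. The sketch below simulates the map, shuffle, and reduce phases locally in plain Python and then recommends items that co-occur with what a user already has. It illustrates the paradigm only; it is not Mahout's API or its actual job chain, and all function names and data are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def map_phase(user_history):
    """Mapper: for one user's items, emit ((item_a, item_b), 1) for every ordered pair."""
    _, items = user_history
    for a, b in permutations(sorted(set(items)), 2):
        yield (a, b), 1


def shuffle(mapped_pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one item pair."""
    return key, sum(values)


def cooccurrence_counts(histories):
    mapped = (pair for h in histories for pair in map_phase(h))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


def recommend(histories, user_items, top_n=2):
    """Score unseen items by how often they co-occur with the user's items."""
    counts = cooccurrence_counts(histories)
    scores = defaultdict(int)
    for (a, b), c in counts.items():
        if a in user_items and b not in user_items:
            scores[b] += c
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical purchase histories: (user, items).
histories = [("u1", ["A", "B", "C"]), ("u2", ["A", "B"]), ("u3", ["B", "D"])]
print(recommend(histories, {"A"}))  # → ['B', 'C']
```

On a cluster, each phase runs in parallel across many machines; the local simulation only shows the data flow.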

[Diagram: Mahout components (Applications, Examples) and algorithm areas: Clustering, Recommenders, Vector Similarity, Pattern Mining, Classification, Regression, Genetic, Dimension Reduction, Matrices, Collocations]
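Vector similarity underlies many recommenders: users or items are represented as vectors (for example, ratings per book), and similar vectors suggest similar tastes. The sketch below computes cosine similarity on sparse vectors stored as dictionaries. It is a generic Python illustration, not Mahout's similarity classes, and the function names and ratings are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as {dimension: value} dicts."""
    dot = sum(value * v[dim] for dim, value in u.items() if dim in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def most_similar(target, candidates):
    """Return the name of the candidate vector most similar to `target`."""
    return max(candidates, key=lambda name: cosine_similarity(target, candidates[name]))


# Hypothetical book ratings.
alice = {"book1": 5, "book2": 3}
others = {"bob": {"book1": 4, "book2": 3}, "carol": {"book3": 5}}
print(most_similar(alice, others))  # → bob
```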


Demo 2: Mahout

[Diagram: flat files (.txt, .dat, .xlsx, etc.) are converted to Mahout input; a Mahout job is run from the Hadoop Command Window to produce an output file; Windows Azure HDInsight (batch, speed, and serving layers) and the HDInsight consoles]
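The "Convert to Mahout input" step can be sketched as follows, assuming the target is the comma-separated `userID,itemID,preference` text format read by Mahout's file-based recommender data model, which expects numeric IDs. The tab-separated source layout, the function names, and the sample data are hypothetical; the name-to-ID mappings are returned so the numeric IDs in Mahout's output file can be translated back.

```python
import csv
import io


def read_flat_file(text, delimiter="\t"):
    """Parse hypothetical 'user<TAB>item<TAB>rating' lines from a flat text file."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record:
            continue  # skip blank lines
        user, item, rating = record
        rows.append((user, item, float(rating)))
    return rows


def to_mahout_csv(rows):
    """Convert (user_name, item_name, rating) rows into numeric userID,itemID,preference lines.

    Returns the CSV text plus the name-to-ID mappings.
    """
    user_ids, item_ids = {}, {}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for user, item, rating in rows:
        # Assign the next integer ID the first time a name is seen.
        uid = user_ids.setdefault(user, len(user_ids) + 1)
        iid = item_ids.setdefault(item, len(item_ids) + 1)
        writer.writerow([uid, iid, rating])
    return out.getvalue(), user_ids, item_ids


sample = "alice\tbike\t5\nbob\tbike\t3\nalice\thelmet\t4\n"
mahout_input, users, items = to_mahout_csv(read_flat_file(sample))
print(mahout_input, end="")
```

The resulting file would then be uploaded to the cluster and passed to the Mahout job as its input.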


Questions?
