Data Mining By Example – Building Predictive Model Using Microsoft Decision Trees
by Shaoli Lu
Microsoft Decision Trees
• Developed by the Microsoft Research team, the Microsoft Decision Trees algorithm is a hybrid decision tree algorithm that supports both classification and regression
Goal
• To predict a prospect’s likelihood of purchasing a bike
Prerequisite
• A SQL Server instance (2005 or later)
• SQL Server Analysis Services (SSAS) installed in Multidimensional mode
(SSAS hosts and browses the mining structures; a cube is not required for data mining)
• AdventureWorksDW database attached (download it from CodePlex and pick the version that matches your SQL Server release)
• Visual Studio 2010 or later with SQL Server Data Tools (SSDT) installed
My Demo Setup
• Visual Studio 2010
• SQL Server 2012
Create Data Mining Project
• Name the project DM Decision Trees (DM = Data Mining)
Create Data Source and Set Impersonation
Create Data Source View
Create Mining Structure
• Choose Microsoft Decision Trees model
• Select Data Source View
• Choose training data
• Select input and predictable columns
• Set content types
• Set the holdout percentage
• Name the mining structure and model
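The holdout percentage tells SSAS how many source cases to set aside for testing rather than training. The sketch below shows the same idea in Python as a simple random partition. The function name `holdout_split`, the fixed seed and the 30% value are illustrative assumptions. They are not SSAS settings taken from this walkthrough.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def holdout_split(cases: Sequence[T], holdout_percent: float,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Hypothetical helper (not an SSAS API): randomly split cases into (training, testing).

    holdout_percent is the share of cases, 0-100, reserved for testing.
    """
    if not 0 <= holdout_percent < 100:
        raise ValueError("holdout_percent must be in [0, 100)")
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * holdout_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]

# Usage: 1,000 hypothetical customer keys, 30% held out for testing.
train, test = holdout_split(range(1000), holdout_percent=30)
print(len(train), len(test))  # 700 300
```

The model learns only from the training part. The held-out part is scored later, so accuracy is measured on cases the model has never seen.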
Deploy the mining structure and model
Process the mining model
• This is also called “training the model”
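Processing the model is the step where the tree is actually grown from the training cases. The sketch below is a generic ID3-style classification tree on discrete attributes that splits by entropy (information gain). It is not Microsoft's implementation, which has its own scoring methods and also handles continuous attributes and regression. The toy cases and the column names `Commute`, `Cars` and `BikeBuyer` are hypothetical stand-ins for the AdventureWorksDW prospect data.

```python
import math
from collections import Counter
from typing import Any, Dict, List

Case = Dict[str, Any]

def entropy(labels: List[Any]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(cases: List[Case], attributes: List[str], target: str,
               min_cases: int = 2) -> Any:
    """Grow a tree: leaves are class labels, internal nodes are dicts."""
    labels = [c[target] for c in cases]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure, too small to split, or out of attributes.
    if len(set(labels)) == 1 or len(cases) < min_cases or not attributes:
        return majority

    def gain(attr: str) -> float:
        # Information gain = parent entropy - weighted entropy of the child groups.
        groups: Dict[Any, List[Any]] = {}
        for c in cases:
            groups.setdefault(c[attr], []).append(c[target])
        remainder = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    scores = {a: gain(a) for a in attributes}
    best = max(scores, key=scores.get)
    if scores[best] <= 1e-12:  # no attribute improves purity
        return majority
    rest = [a for a in attributes if a != best]
    children = {
        value: build_tree([c for c in cases if c[best] == value], rest, target, min_cases)
        for value in sorted({c[best] for c in cases}, key=str)
    }
    return {"split": best, "children": children, "default": majority}

# Hypothetical toy training cases (not real AdventureWorksDW rows).
training = [
    {"Commute": "0-1 Miles", "Cars": 0, "BikeBuyer": 1},
    {"Commute": "0-1 Miles", "Cars": 2, "BikeBuyer": 1},
    {"Commute": "0-1 Miles", "Cars": 0, "BikeBuyer": 1},
    {"Commute": "5-10 Miles", "Cars": 0, "BikeBuyer": 1},
    {"Commute": "5-10 Miles", "Cars": 2, "BikeBuyer": 0},
    {"Commute": "5-10 Miles", "Cars": 2, "BikeBuyer": 0},
    {"Commute": "0-1 Miles", "Cars": 2, "BikeBuyer": 1},
]
tree = build_tree(training, ["Commute", "Cars"], "BikeBuyer")
print(tree)
```

Each internal node also stores a `default` (the majority class at that node). A prediction can fall back on it when a case has a value that never appeared during training.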
Mining Model Viewer
• Identify dominant attributes
• Attributes that split nearer the left of the tree (closer to the root) are the more important ones
• The rich visualization is also useful for data exploration
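The attribute used for the first split, drawn at the far left of the tree viewer, is the one that best separates buyers from non-buyers. The sketch below ranks candidate attributes that way. It uses plain information gain as the score, which is only a stand-in for whatever scoring method the SSAS model is configured with. The data and column names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Any, Dict, List, Tuple

def entropy(labels: List[Any]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def rank_attributes(cases: List[Dict[str, Any]], attributes: List[str],
                    target: str) -> List[Tuple[str, float]]:
    """Score each attribute by information gain on the target, best first."""
    base = entropy([c[target] for c in cases])
    ranking = []
    for attr in attributes:
        groups: Dict[Any, List[Any]] = defaultdict(list)
        for c in cases:
            groups[c[attr]].append(c[target])
        remainder = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
        ranking.append((attr, base - remainder))
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy cases (not real AdventureWorksDW rows).
cases = [
    {"Commute": "0-1 Miles", "Cars": 0, "BikeBuyer": 1},
    {"Commute": "0-1 Miles", "Cars": 2, "BikeBuyer": 1},
    {"Commute": "0-1 Miles", "Cars": 0, "BikeBuyer": 1},
    {"Commute": "5-10 Miles", "Cars": 0, "BikeBuyer": 1},
    {"Commute": "5-10 Miles", "Cars": 2, "BikeBuyer": 0},
    {"Commute": "5-10 Miles", "Cars": 2, "BikeBuyer": 0},
    {"Commute": "0-1 Miles", "Cars": 2, "BikeBuyer": 1},
]
for attr, score in rank_attributes(cases, ["Cars", "Commute"], "BikeBuyer"):
    print(f"{attr}: {score:.3f}")
# Commute: 0.470
# Cars: 0.292
```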
Mining Model Accuracy Chart
• This step, called “testing the model”, uses the holdout data
• Lift chart
• Profit chart
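A lift chart works as follows. The holdout cases are sorted by the model's predicted probability of buying. The chart then plots what share of the actual buyers is captured as more of the population is targeted, compared with random targeting. The profit chart turns that curve into money. This sketch assumes a simple profit model: revenue per buyer reached, minus a fixed cost and a per-contact cost. The function names, scores and dollar figures are hypothetical. The sketch approximates the idea rather than reproducing SSAS's exact computation.

```python
from typing import List, Tuple

Scored = List[Tuple[float, int]]  # (predicted probability of buying, actual outcome 0/1)

def lift_curve(scored: Scored) -> List[Tuple[float, float]]:
    """Points of (share of population targeted, share of actual buyers captured)."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_buyers = sum(actual for _, actual in ranked)
    if total_buyers == 0:
        raise ValueError("test data contains no buyers")
    points, captured = [], 0
    for i, (_, actual) in enumerate(ranked, start=1):
        captured += actual
        points.append((i / len(ranked), captured / total_buyers))
    return points

def profit_curve(scored: Scored, population: int, fixed_cost: float,
                 individual_cost: float, revenue_per_individual: float) -> List[Tuple[float, float]]:
    """Assumed profit model applied at each point of the lift curve."""
    buyer_rate = sum(actual for _, actual in scored) / len(scored)
    total_buyers = buyer_rate * population      # buyers expected in the full population
    curve = []
    for targeted, captured in lift_curve(scored):
        contacts = targeted * population
        revenue = captured * total_buyers * revenue_per_individual
        curve.append((targeted, revenue - fixed_cost - contacts * individual_cost))
    return curve

# Hypothetical holdout scores: (model probability, actual outcome).
holdout = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
           (0.4, 0), (0.3, 1), (0.2, 0), (0.1, 0), (0.05, 0)]
lift = lift_curve(holdout)
profit = profit_curve(holdout, population=50_000, fixed_cost=5_000,
                      individual_cost=3, revenue_per_individual=15)
best_share, best_profit = max(profit, key=lambda p: p[1])
print(lift[3])                          # (0.4, 0.75)
print(best_share, round(best_profit))   # 0.7 190000
```

On this toy data, targeting the top 40% of prospects reaches 75% of the buyers, against 40% for random targeting. Profit peaks when 70% of the population is targeted, because each further contact costs more than it returns.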
Mining Model Prediction
• Singleton query
• Mass prediction
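A singleton query scores one case, such as a single prospect entered in an application. Mass prediction scores a whole table of cases in one pass. In SSAS both are DMX prediction queries. The Python sketch below only mimics the distinction, using a hypothetical hand-written tree. When the tree meets a value it never saw during training, it falls back to the node's majority class.

```python
from typing import Any, Dict, List, Union

Tree = Union[int, Dict[str, Any]]

# Hypothetical trained tree: leaves are BikeBuyer labels, nodes hold a split attribute.
MODEL: Tree = {
    "split": "Commute",
    "children": {
        "0-1 Miles": 1,
        "5-10 Miles": {"split": "Cars", "children": {0: 1, 2: 0}, "default": 0},
    },
    "default": 1,
}

def predict(tree: Tree, case: Dict[str, Any]) -> int:
    """Walk from the root to a leaf; use the node's majority class for unseen values."""
    while isinstance(tree, dict):
        child = tree["children"].get(case.get(tree["split"]))
        if child is None:
            return tree["default"]
        tree = child
    return tree

def predict_many(tree: Tree, cases: List[Dict[str, Any]]) -> List[int]:
    """Mass prediction: score every case in a batch."""
    return [predict(tree, c) for c in cases]

# Singleton query: one prospect.
print(predict(MODEL, {"Commute": "5-10 Miles", "Cars": 2}))  # 0

# Mass prediction: a small hypothetical mailing list.
prospects = [
    {"Commute": "0-1 Miles", "Cars": 2},
    {"Commute": "5-10 Miles", "Cars": 0},
    {"Commute": "5-10 Miles", "Cars": 3},   # Cars = 3 never seen: falls back to default
]
print(predict_many(MODEL, prospects))  # [1, 1, 0]
```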
Browse mining model on SQL Server
• Decision trees
• Dependency network
Summary
• Microsoft Decision Trees is a powerful data mining model, yet it is easy to build, train and use
• Supports both singleton predictions (e.g., embedded in an app) and mass predictions (e.g., targeted marketing)
• Holdout data can be used to test the trained model
• Rich visualizations such as Lift/Profit Charts and the Dependency Network can facilitate analysis and data exploration
• A relational database can be used for data mining; a cube is not required
The End