Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd...
-
Upload
luisa-cheyney -
Category
Documents
-
view
214 -
download
1
Transcript of Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd...
![Page 1: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/1.jpg)
Using Data Mining in Your IT SystemsRafal LukawieckiStrategic Consultant, Project Botticelli [email protected]
![Page 2: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/2.jpg)
2
Objectives
• Solve common business and IT scenarios• Understand how to use BIDS
• See it in action (about 70% afternoon for demos)• Solve DM problems by choosing and
parametrising correct DM algorithms
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.
© 2007 Project Botticelli Ltd & Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
This seminar is partly based on “Data Mining” book by ZhaoHui Tang and Jamie MacLennan, and also on Jamie’s presentations. Thank you to Jamie and to Donald Farmer for helping me in preparing this session. Thank you to Roni Karassik for a slide. Thank you to Mike Tsalidis, Olga Londer, and Marin Bezic for all the support. Thank you to Maciej Pilecki for assistance with demos.
![Page 3: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/3.jpg)
3
Agenda
• Overview of Techniques
• Scenarios:• Customer Segmentation and Classification• Analysing Sales• Profitability and Risk• Customer Needs Analysis• Forecasting• Other Scenarios
![Page 4: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/4.jpg)
4
The Techniques
![Page 5: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/5.jpg)
5
Microsoft DM Algorithms
• Designed for wide use• Auto tuning and parameterisation• They just work with little effort on your side
• Consistent and simple interface• Why “Microsoft xxx”?
• There are few truly standard algorithms yet• Each DM vendor has their own significant
variations• Microsoft invented some of the techniques
• E.g. Use of trees for regression or nested cases• You can add 3rd party and your own algorithms
easily
![Page 6: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/6.jpg)
6
Data Mining AlgorithmsAlgorithm Description
Decision Trees
Finds the odds of an outcome based on values in a training set
Association Rules
Identifies relationships between cases
Clustering Classifies cases into distinctive groups based on any attribute sets
Naïve Bayes
Clearly shows the differences in a particular variable for various data elements
Sequence Clustering
Groups or clusters data based on a sequence of previous events
Time Series Analyzes and forecasts time-based data combining the power of ARTXP (developed by Microsoft Research) for short-term predictions with ARIMA (in SQL 2008) for long-term accuracy.
Neural Nets
Seeks to uncover non-intuitive relationships in data
Linear Regression
Determines the relationship between columns in order to predict an outcome
Logistic Regression
Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state
![Page 7: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/7.jpg)
7
Algorithm Matrix
Time Series
Sequence Clustering
Neural Nets
Naïve Bayes
Logistic Regression
Linear Regression
Decision Trees
Clustering
Association Rules
Classifica
tion
Estimatio
n
Segmentation
Associa
tion
Forecasti
ng
Text Analysis
Advanced Data
Exploration
![Page 8: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/8.jpg)
8
SCENARIO 1: CUSTOMER CLASSIFICATION & SEGMENTATION
Who are our customers? Are there any relationships between their demographics and their interest in buying from us? Who should we concentrate on more?
![Page 9: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/9.jpg)
9
Let’s Get Familiar with BIDSBusiness Intelligence Development Studio
• Offline and online modes• Everything you do stays on the server• Offline requires server admin privileges to deploy
• Process:1. Define Data Sources and Data Source Views2. Define Mining Structure and Models3. Train (process) the Structures4. Verify accuracy5. Explore and visualise6. Perform predictions7. Deploy for other users8. Regularly update and re-validate the Model
![Page 10: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/10.jpg)
Demo1. Using BIDS to Prepare for Data Mining
![Page 11: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/11.jpg)
11
Data Mining Designer
1. Build a Mining Structure and its first Mining Model
2. Train (process) model3. Validate model in the Accuracy Chart tab4. Explore and visualise5. Perform predictions
![Page 12: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/12.jpg)
12
Microsoft Decision Trees
• Use for:• Classification: churn
and risk analysis• Regression: predict
profit or income • Association analysis
based on multiple predictable variable
• Builds one tree for each predictable attribute
• Fast
![Page 13: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/13.jpg)
Demo1. Building a Data Mining Model for Customer
Classification Using Microsoft Decision Trees2. Exploring the Decision Tree
![Page 14: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/14.jpg)
15
Microsoft Naïve Bayes
• Use for:• Classification• Association with
multiple predictable attributes
• Assumes all inputs are independent
• Simple classification technique based on conditional probability
![Page 15: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/15.jpg)
17
Clustering
• Applied to • Segmentation: Customer
grouping, Mailing campaign
• Also: classification and regression
• Anomaly detection• Discrete and continuous• Note:
• “Predict Only” attributes not used for clustering
![Page 16: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/16.jpg)
19
ClusteringAnomaly Detection
Male Female
Son
Daughter
Parent
Age
![Page 17: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/17.jpg)
21
Neural Network
• Applied to• Classification• Regression
• Great for finding complicated relationship among attributes• Difficult to
interpret results• Gradient Descent
method
Age Education Sex Income
Input Layer
Hidden Layers
Output Layer
Loyalty
![Page 18: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/18.jpg)
Demo1. Expanding Customer Classification and
Segmentation with Microsoft Clustering, Naïve Bayes, and Neural Networks
2. Exploring and Visualising Patters Found with the Above
![Page 19: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/19.jpg)
24
Validating Results
• Accuracy Viewer tabs run a full prediction against the holdout data
• Results compared to known holdout values and visualised:• Classification Matrix – tedious but precise• Lift Charts show how your model compares to a
random uneducated guess• Compare multiple algorithm results• Two types of chart: generic and specific to a
predicted value (e.g. [Total of Car Purchases] = 2)• Profit Chart is a simple variation of a Lift Chart
• Not a real “profit” prediction, just a name
![Page 20: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/20.jpg)
Demo1. Verifying Results Using Classification Matrix2. Validating Model Accuracy Using Two Types of Lift
Charts
![Page 21: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/21.jpg)
26
Improving Models
• Approaches:• Change the algorithm• Change model parameters• Change inputs/outputs to avoid bad correlations• Clean the data set
• Perhaps there are no good patterns in data• Verify statistics (Data Explorer)
![Page 22: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/22.jpg)
Demo1. Improving Clustering Results by Parameterisation2. Re-validating Customer Classification Models
![Page 23: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/23.jpg)
28
SCENARIO 2: ANALYSING SALES
What makes some of our products more successful? Why is a model or a brand favoured by some groups of clients? Can we recommend additional products on our web site automatically without annoying them?
![Page 24: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/24.jpg)
29
First of All, Use:
• Decision Trees• Especially with Nested Cases
• This subtly changes DT so it finds associations• Clustering• Naive Bayes• Neural Networks
• And...
![Page 25: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/25.jpg)
30
Association Rules
• Use for:• Market basket
analysis• Cross selling and
recommendations
• Advanced data exploration
• Finds frequent itemsets and rules
• Sensitive to parameters
![Page 26: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/26.jpg)
Demo1. Analysing Customer Needs with Decision Trees
and No Nesting...2. ... and Decision Trees with Nested Cases3. Using Association Rules to Find Shopping
Preferences
![Page 27: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/27.jpg)
33
SCENARIO 3: PROFITABILITY AND RISK
Who are our most profitable customers? Can I predict profit of a future customer based on demographics? Should I give them a “Platinum Elite” card now?
![Page 28: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/28.jpg)
34
Profitability and Risk
• Finding what makes a customer profitable is another example of classification
• Typically solved with:• Decision Trees (Regression), Linear Regression,• and Neural Networks or Logistic Regression
• Often used for prediction• Important to predict probability of the predicted,
or expected profit• Risk scoring
• Logistic Regression and Neural Networks
![Page 29: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/29.jpg)
35
Functions
• DMX functions can be used to build richer prediction expressions
• Predict statistical measures:• PredictProbability• PredictHistogram
• Essential to use when predicting any values, in particular profit or risk
![Page 30: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/30.jpg)
36
PredictProbability
PredictProbability(LoanStatus)
Probability of the most likely outcome
PredictProbability(LoanStatus, “Defaulted”)
Probability that the loan will become very unhealthy
• Similar for PredictAdjustedProbability, etc.
![Page 31: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/31.jpg)
Demo1. Analyzing and Predicting Lending Risk with a
Named Query2. Analyzing Profitability Using Multiple Algorithms3. Performing Predictions in BIDS4. Predicting in Excel Using Previously Deployed
Models and “Data Mining” Tab
![Page 32: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/32.jpg)
38
Cross-Validating Results: ReliabilitySQL Server 2008• X iterations of
retraining and retesting the model
• Results from each test statistically collated
• Model deemed accurate (and perhaps reliable) when variance is low and results meet expectations
![Page 33: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/33.jpg)
Demo1. Cross-validation of Models’ Reliability
![Page 34: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/34.jpg)
40
SCENARIO 4: CUSTOMER NEEDS ANALYSIS
How do they behave? What are they likely to do once they bought that really expensive car? Should I intervene?
![Page 35: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/35.jpg)
41
What is a Sequence?
• To discover the most likely beginning, paths, and ends of a customer’s journey through our domain consider using:
• Association Rules• Sequence Clustering
![Page 36: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/36.jpg)
42
Sequence Clustering
• Analysis of:• Customer behaviour• Transaction patterns• Click stream• Customer
segmentation• Sequence prediction
• Mix of clustering and sequence technologies• Groups individuals
based on their profiles including sequence data
![Page 37: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/37.jpg)
43
Sequence Data
Cust ID Age
MaritalStatus
Car Purchases
Seq ID Brand
1 35 M 1 Porch-A2 Bamborgini3 Kexus
2 20 S 1 Wagen
2 Voovo
3 Voovo
3 57 M 1 Voovo
2 T-Yota
![Page 38: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/38.jpg)
Demo1. Analysing Customer Transaction Behaviour Using
Sequence Clustering2. Analysing Events Leading to Customer Loss using
Sequence Clustering
![Page 39: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/39.jpg)
46
SCENARIO 5: FORECASTING
What are my sales going to be like in the next few months? Will I have credit problems? Will my server need an upgrade in the next 3 months?
![Page 40: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/40.jpg)
47
Estimating FutureForecasting
• But: data often is very seasonal• Seasonality detected by a Fast Fourier Transform
• Time Series• SQL Server 2005 uses ARTXP (Auto Regressive
Tree with Cross Prediction)• For short term forecasting
• SQL Server 2008 uses a hybrid of improved ARTXP and industry-standard ARIMA (Auto Regressive Integrated Moving Average)• Great for short and long-term forecasting
![Page 41: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/41.jpg)
48
Time Series
• Uses:• Forecast sales• Inventory
prediction• Web hits prediction• Stock value
estimation• Regression tree
technology to describe and predict series values• Tree allows multiple
regressors
![Page 42: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/42.jpg)
51
Input Data
Month Milk Bread
Jan 100 80Feb 120 90
Mar 110 85
Apr 115 110
May 125 120
Jun 120 123
July 140 150
…
Month Product Sales
Jan Milk 100Jan Bread 80
Feb Milk 120
Feb Bread 90
Mar Milk 110
Mar Bread 85
Apr Milk 115
…
Format A Format B
![Page 43: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/43.jpg)
Demo1. Forecasting Sales Using a Custom Model and BIDS2. Forecasting Sales Using Table Analysis Tools in
Excel
![Page 44: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/44.jpg)
56
OTHER SCENARIOS
![Page 45: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/45.jpg)
57
Data Improvement During ETL
• Issue:• Inconsistent or missing data during Extract-
Transform-Load (data warehousing).
• Suggested Solution:1. Decision Tree (or Clustering, Naive Bayes) model
for existing data2. Apply prediction in real-time as ETL is taking
place3. Flag each row containing predicted values row
with a probability measure (it is not a fact)
![Page 46: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/46.jpg)
58
Security Threat Detection
• Issue:• Finding suspicious transactions and intruder
detection.
• Suggested Solution:1. Clustering (or Neural Network) to find small
groups of outliers2. Prediction of “just one row” transactional data to
see if it belongs to the suspected cluster• Or1. Sequence Clustering of clicks to detect a known
attack pattern
![Page 47: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/47.jpg)
59
Web Site and Email Feedback Analysis• Issue:
• What are the main issues our customers highlight? How can I quickly spot problem submissions that need a response?
• Suggested Solution:1. Text extraction and tokenization using SSIS2. Association Rules (or Sequence Clustering) of
extracted tokens3. Possible prediction of a previously suggested
resolution or just classification of the submission
![Page 48: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/48.jpg)
60
Resources
• Demos & newsletter: www.sqlserverdatamining.com• AdventureWorksDW: www.codeplex.com • Book by Jamie MacLennan and ZhaoHui Tang “Data Mining
with SQL Server 2005”, Wiley 2005, ISBN 0471462616• Also:
• www.beyeblogs.com/donaldfarmer • blogs.msdn.com/jamiemac• www.microsoft.com/sql/technologies/dm• forums.microsoft.com/MSDN/ShowForum.aspx?ForumID
=81&SiteID=1• SQL Server Books Online for all documentation
• Excellent seminars at www.microsoft.com/technetspotlight
![Page 49: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/49.jpg)
61
Summary
• Data Mining is a key technology for Predictive Analysis, a major trend
• Intuitive with great visual feedback for quality• May promote you to a keeper of knowledge
• Discover and explore the hidden knowledge that can make you and your company more successful
![Page 50: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/50.jpg)
62
Questions and Answers
Thank You!
![Page 51: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/51.jpg)
63
© 2008 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.
© 2007 Project Botticelli Ltd & Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
![Page 52: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/52.jpg)
64
BONUS SCENARIO: DATA ENTRY VALIDATION
How can I detect incorrect data entry without hard-coding the rules? Intelligent applications?
![Page 53: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/53.jpg)
65
What is So Special?
• Application behaviour evolves and follows the data mining model• Influenced by real-world events caused by your
application!
• We are creating a feedback loop from the application through its effects back to the application• The “trick” that connects the two is the discovery
of new emerging patterns and old patterns disappearing – the very job of DM
![Page 54: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/54.jpg)
66
Mining Model
Intelligent Application
DMEngine
Data To Predict
DMEngine
Predicted Data
Training Data
Mining Model
Mining Model
DB dataClient dataApplication log
“Just one row”New EntryNew Txion
![Page 55: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/55.jpg)
67
Intelligent Application – StepsA Simplified View
1. Prepare the database for mining2. Create and train the DM model on your data,
consisting of both the inputs and actual outcomes
3. Test the model. If OK...
4. The model predicts outcomes5. Make application logic depend on predicted
outcomes (if, case etc.)
6. Update (and validate) the model periodically as data evolves
![Page 56: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/56.jpg)
68
The Intelligent Bit in Our Apps
• Your “if” statement will test the value returned from a prediction – typically, predicted probability or outcome
• Steps:1. Build a case (set of attributes) representing the
transaction you are processing at the moment• E.g. Shopping basket of a customer plus their shipping
info
2. Execute a “SELECT ... PREDICTION JOIN” on the pre-loaded mining model
3. Read returned attributes, especially case probability for a some outcome
• E.g. Probability > 50% that “TransactionOutcome=ShippingDeliveryFailure”
4. Your application has just made an intelligent decision!5. Remember to refresh and retest the model regularly –
daily?
![Page 57: Using Data Mining in Your IT Systems Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.co.uk.](https://reader030.fdocuments.us/reader030/viewer/2022032701/56649c8f5503460f94947dbf/html5/thumbnails/57.jpg)
69
Watch The Demo At...
• www.microsoft.com/technetspotlight
• Find my session “Build More Intelligent Applications Using Data Mining” from Microsoft TechEd Developers 2007 in Barcelona