Data Mining with SQL Server 2005
Uploaded by dean-willson
DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS”
Presented to fwPASS on January 26, 2010
About Me
Work for Systemental as a consultant and software developer
Software development supporting corporate business process improvement (Lean or continuous improvement initiatives) since 2000
.NET since 2004
President, fwPASS.org
Manufacturing Engineering Technology degrees from Ball State University
Certified Six Sigma Black Belt
What We Will Cover
Data mining – what is it?
“Cash for Clunkers”
Other examples
Amazon.com
Coke Freestyle
Basic Data Mining Concepts
Demo time
Wikipedia
Data mining is the process of extracting patterns from data. It is becoming an increasingly important tool for transforming data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
Cash for Clunkers
Columbia City: SR 30 & SR 9
Objectives of “Cash for Clunkers”
Jump-start automotive sector sales
Specifically, sales of higher fuel-economy vehicles
Get gas guzzlers off the street
Cash for Clunkers
How did they decide who to target and how?
How would you do it?
Where did the data come from?
Where should the data come from?
Who to target?
Anyone, everyone, or targeted
Self-qualified
Organic growth, or just a “pull up” of future sales into the present
Convert foreign sales to GM
Conflict of interest? – “Government Motors”
Discriminatory?
Estimating the effectiveness
Effect of “pull up” vs. organic growth
Peripheral commercial effects
Estimation of payback
Sales tax, license plate fees and excise tax
Income tax from recalled laid-off workers
Reduction of unemployment
Auto insurance
Reduction in tax revenue at gas pumps
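The payback items above can be rolled into one net figure. What follows is a minimal sketch that assumes every input is a dollar estimate supplied by the analyst. The field names and the numbers in the usage lines are hypothetical placeholders, not program data. The sketch also assumes that only the organic share of rebate-assisted sales produces new sales, plate and excise tax, because “pulled up” sales would have been taxed later anyway.

```python
from dataclasses import dataclass


@dataclass
class PaybackInputs:
    # All values are hypothetical dollar estimates supplied by the analyst.
    program_cost: float              # rebates paid plus administration
    sales_plate_excise_tax: float    # sales tax, plate fees and excise tax on rebate-assisted sales
    organic_fraction: float          # share of those sales that would NOT have happened anyway (0..1)
    income_tax_from_recalls: float   # income tax paid by recalled laid-off workers
    unemployment_savings: float      # unemployment benefits no longer paid
    insurance_revenue: float         # peripheral effect, e.g. taxes on new auto insurance premiums
    lost_fuel_tax: float             # lower gas-pump tax revenue from more efficient cars


def net_payback(p: PaybackInputs) -> float:
    """Estimated new revenue and savings, minus lost fuel tax and program cost."""
    if not 0.0 <= p.organic_fraction <= 1.0:
        raise ValueError("organic_fraction must be between 0 and 1")
    # "Pulled up" sales would have been taxed later anyway, so only the organic share is new.
    new_sales_tax = p.sales_plate_excise_tax * p.organic_fraction
    gains = (new_sales_tax + p.income_tax_from_recalls
             + p.unemployment_savings + p.insurance_revenue)
    return gains - p.lost_fuel_tax - p.program_cost


# Placeholder numbers (in millions), chosen only to show the arithmetic.
example = PaybackInputs(program_cost=100.0, sales_plate_excise_tax=80.0, organic_fraction=0.5,
                        income_tax_from_recalls=30.0, unemployment_savings=20.0,
                        insurance_revenue=5.0, lost_fuel_tax=10.0)
print(net_payback(example))  # -15.0
```

A positive result would suggest the program pays for itself under these assumptions. Where the organic fraction comes from (and whether other items should also be scaled by it) is exactly the kind of question data mining would need to answer.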
Data content and source
Public records
CAFE (Corporate Average Fuel Economy) data
GM data
Industry sponsored studies
Amazon.com
SQL Server 2005 Data Mining
Nine algorithms (third-party algorithms can be plugged in)
Both modeling and exploration in Visual Studio
Integrated tools: SSIS, SSAS and SSRS (SS*S)
API
Data Mining Extensions to SQL (DMX)
Type of analysis
Descriptive vs. predictive
Descriptive – provides deeper understanding of existing data
Predictive – estimates the probability of future conditions
Data Mining Objectives
Classification – assign data to known (discrete) classes
Segmentation – cluster cases into similar groups
Estimation – predict continuous values
Association – find which events occur together
Forecasting – estimate future values from a time series
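Association can be illustrated by counting how often items show up together in the same basket, similar in spirit to Amazon.com's “customers who bought this also bought” suggestions. The sketch below computes support and confidence for every two-item rule from scratch. It is not the Microsoft Association Rules algorithm, which also finds larger itemsets and applies its own support and probability thresholds. The baskets are made-up example data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Return {(a, b): (support, confidence)} for every two-item rule a -> b.

    support    = fraction of baskets containing both a and b
    confidence = fraction of baskets containing a that also contain b
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # Each unordered pair yields two directed rules: a -> b and b -> a.
        rules[(a, b)] = (support, both / item_counts[a])
        rules[(b, a)] = (support, both / item_counts[b])
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bike", "helmet"},
    {"bike", "lock"},
    {"bike", "helmet", "lock"},
    {"helmet"},
]
rules = pair_rules(baskets)
print(rules[("lock", "bike")])  # (0.5, 1.0)
```

Support tells you how common a combination is, and confidence tells you how reliable the rule is. A rule like lock → bike can be highly confident even when the pair itself is not especially common.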
Algorithms
1. Decision Trees (uses only the attributes that appear in the tree)
2. Naive Bayes (uses all attributes)
3. Clustering
4. Linear Regression
5. Logistic Regression
6. Neural Nets
7. Sequence Clustering
8. Time Series
9. Association Rules (discrete only)
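To make “uses all attributes” concrete, here is a minimal categorical Naive Bayes classifier in Python. It is a textbook sketch with add-one (Laplace) smoothing, not SQL Server's implementation, and the column names and rows are hypothetical. Every known input attribute of the case contributes a factor to each class score. The normalized score of the winning class serves as a rough confidence for the prediction.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)   # (attribute, class) -> Counter of values
    values_seen = defaultdict(set)        # attribute -> distinct values seen in training
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, row[target])][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def predict(model, case):
    """Return (most likely class, normalized score) using every known input attribute."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total                  # prior P(class)
        for attr, value in case.items():
            if attr not in values_seen:        # attribute never seen in training: skip it
                continue
            k = len(values_seen[attr]) + 1     # +1 leaves room for an unseen value
            count = value_counts.get((attr, cls), Counter())[value]
            # Add-one smoothing keeps an unseen value from zeroing the whole product.
            score *= (count + 1) / (n_cls + k)
        scores[cls] = score
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


# Hypothetical training cases: does a customer buy a bike?
rows = [
    {"age": "young", "commute": "short", "buyer": "yes"},
    {"age": "young", "commute": "long", "buyer": "no"},
    {"age": "old", "commute": "short", "buyer": "yes"},
    {"age": "old", "commute": "long", "buyer": "no"},
    {"age": "young", "commute": "short", "buyer": "yes"},
]
model = train_naive_bayes(rows, "buyer")
label, confidence = predict(model, {"age": "young", "commute": "short"})
print(label, round(confidence, 2))  # yes 0.86
```

A decision tree, by contrast, would consult only the attributes chosen as splits along one path. That is the distinction drawn in items 1 and 2 of the list.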
DMX
Column syntax: Name, data type, content type, [usage]
Key – identifies the case being analyzed
Content type: KEY, KEY SEQUENCE, KEY TIME, DISCRETE, CONTINUOUS, DISCRETIZED (number of buckets)
Usage: input (the default), PREDICT, PREDICT_ONLY (predicted, but not used as input to build any other part of the model)
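The column syntax above (name, data type, content type, optional usage) can be shown by generating a DMX CREATE MINING MODEL statement. The Python helper below is a hypothetical sketch that only assembles text and never talks to a server. The model and column names are made up. The `DISCRETIZED(EQUAL_AREAS, 4)` argument form is an assumption worth checking against the DMX reference before use.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_USAGE = {None, "PREDICT", "PREDICT_ONLY"}   # None means the column is an input only


@dataclass
class MiningColumn:
    name: str
    data_type: str               # e.g. LONG, DOUBLE, TEXT, DATE, BOOLEAN
    content_type: str            # e.g. KEY, DISCRETE, CONTINUOUS, DISCRETIZED(EQUAL_AREAS, 4)
    usage: Optional[str] = None  # PREDICT or PREDICT_ONLY

    def to_dmx(self) -> str:
        if self.usage not in VALID_USAGE:
            raise ValueError(f"unsupported usage flag: {self.usage}")
        parts = [f"[{self.name}]", self.data_type, self.content_type]
        if self.usage:
            parts.append(self.usage)
        return " ".join(parts)


def create_mining_model(model: str, columns: List[MiningColumn], algorithm: str) -> str:
    """Assemble (but do not execute) a CREATE MINING MODEL statement."""
    if not any(c.content_type.startswith("KEY") for c in columns):
        raise ValueError("the model needs a key column (KEY, KEY TIME or KEY SEQUENCE)")
    body = ",\n".join("    " + c.to_dmx() for c in columns)
    return f"CREATE MINING MODEL [{model}] (\n{body}\n) USING {algorithm}"


# Hypothetical bike-buyer model: one key, two inputs, one predicted column.
columns = [
    MiningColumn("CustomerKey", "LONG", "KEY"),
    MiningColumn("Age", "LONG", "DISCRETIZED(EQUAL_AREAS, 4)"),
    MiningColumn("Gender", "TEXT", "DISCRETE"),
    MiningColumn("BikeBuyer", "LONG", "DISCRETE", "PREDICT"),
]
statement = create_mining_model("BikeBuyerTree", columns, "Microsoft_Decision_Trees")
print(statement)
```

Changing `"PREDICT"` to `"PREDICT_ONLY"` on BikeBuyer would keep it as an output while excluding it from the inputs used for the rest of the model.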
Structure
Data mart, data warehouse (DW) or cube
Data source
Mining Structure (which fields)
Mining Models (algorithms, attributes)
Viewers (tree, clusters, discrimination, classification)
Training the model
SSIS Percentage Sampling transformation (data flow)
Split the data into training and testing sets
Estimate the error on the testing set
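The train/test idea can be sketched in a few lines of Python. `percentage_split` is a hypothetical helper that loosely mirrors the SSIS Percentage Sampling transformation, which sends a sample of rows to one output and the unselected rows to another. This sketch always shuffles and cuts at an exact percentage, which may not match how SSIS selects rows. The error is measured only on the held-out rows. Here it is computed for a trivial majority-class baseline on made-up data.

```python
import random
from collections import Counter


def percentage_split(rows, percent, seed=None):
    """Split rows into (sample, unselected); the sample holds `percent`% of the rows."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    cut = round(len(shuffled) * percent / 100)
    return shuffled[:cut], shuffled[cut:]


def error_rate(predict, test_rows, target):
    """Fraction of held-out rows whose predicted class differs from the actual class."""
    if not test_rows:
        raise ValueError("test set is empty")
    wrong = sum(1 for row in test_rows if predict(row) != row[target])
    return wrong / len(test_rows)


# Hypothetical data: 100 cases, 70 of them buyers.
rows = [{"id": i, "buyer": "yes" if i % 10 < 7 else "no"} for i in range(100)]
train, test = percentage_split(rows, 70, seed=42)

# Baseline "model": always predict the majority class seen in the training set.
majority = Counter(r["buyer"] for r in train).most_common(1)[0][0]
print(len(train), len(test), error_rate(lambda r: majority, test, "buyer"))
```

A real model is worth keeping only if its test-set error beats a baseline like this one. Measuring on rows the model never saw is also the first line of defense against over-fitting.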
Demos
Visual Studio
SSMS
Win Client
Web Client
Miscellaneous
Sequence or timing
Prediction plus a measure of confidence
Caution: over-fitting the model
Nested tables, e.g., transactional detail data
The nested table's key is never the foreign key to the case table
The key is what the nested table is about (e.g., the product on an order line)
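The nested-table rules above can be illustrated by regrouping flat order lines into cases. In this hypothetical Python sketch the order ID is the case key, and each nested row is keyed by the product it describes. The order ID is dropped from the nested rows because the nested key is never the foreign key back to the case table. The sketch assumes at most one line per product per order.

```python
from collections import defaultdict


def build_cases(order_lines, case_key, nested_key):
    """Group flat detail rows into cases, each with a nested table keyed by `nested_key`."""
    cases = defaultdict(dict)
    for line in order_lines:
        case_id = line[case_key]
        item = line[nested_key]
        # Drop the foreign key back to the case: it identifies the case, not the nested row.
        attrs = {k: v for k, v in line.items() if k not in (case_key, nested_key)}
        cases[case_id][item] = attrs   # assumes one line per item per case
    return dict(cases)


# Hypothetical order lines (transactional detail data).
order_lines = [
    {"order_id": 1, "product": "Road Bike", "qty": 1},
    {"order_id": 1, "product": "Helmet", "qty": 2},
    {"order_id": 2, "product": "Helmet", "qty": 1},
]
cases = build_cases(order_lines, "order_id", "product")
print(cases[1])  # {'Road Bike': {'qty': 1}, 'Helmet': {'qty': 2}}
```

Each case now reads as “order 1 contains these products”, which is the shape an association or sequence model expects from a nested table.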
Thank you!
Website http://www.systemental.com
Blogs http://dean-o.blogspot.com/ http://practicalhoshin.blogspot.com
Twitter http://www.twitter.com/deanwillson
LinkedIn http://www.linkedin.com/in/deanwillson