Post on 07-Apr-2018
8/4/2019 2 Introduction to Data Mining
1/33
Introduction to Data Mining
Rafal Lukawiecki
Strategic Consultant, Project Botticelli Ltd
rafal@projectbotticelli.co.uk
Objectives
Overview of Data Mining
Introduce typical applications and scenarios
Explain some DM concepts
Review wider product platform
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.
© 2007 Project Botticelli Ltd & Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
This seminar is partly based on the Data Mining book by ZhaoHui Tang and Jamie MacLennan, and also
on Jamie's presentations. Thank you to Jamie and to Donald Farmer for helping me in preparing this
session. Thank you to Roni Karassik for a slide. Thank you to Mike Tsalidis, Olga Londer, and Marin
Bezic for all the support. Thank you to Maciej Pilecki for assistance with demos.
Before We Dive In...
To help me select the most suitable examples and demonstrations I would like to ask you about your background
Who do you identify yourself with:
IT Professional,
Database Professional,
Software/System Developer?
The Essence of Data Mining as
Part of Business Intelligence
Business Intelligence
Improving Business Insight
"A broad category of applications and technologies for gathering, storing, analyzing, sharing and providing access to data to help enterprise users make better business decisions." (Gartner)
Relationships and Acronyms...
Data Mining (DM)
Knowledge Discovery in Databases (KDD)
Business Intelligence (BI)
Data Mining
Technologies for analysis of data and discovery of (very) hidden patterns
Fairly young (
What does Data Mining Do?
Explores Your Data
Finds Patterns
Performs Predictions
DM and BI
BI is geared at an end user, such as a business owner,knowledge worker etc.
DM is an IT technology generally geared towards a more advanced user today
By the way: who is qualified to use DM today?
DM Past and Present
Traditional approaches from Microsoft's competitors are for DM experts: White-coat PhD statisticians
DM tools also fairly expensive
Microsoft's "full" approach is designed for those with some database skills
Tools similar to T-SQL and Management Studio
DM built into Microsoft SQL Server 2005 and 2008 at no extra cost
DM "easy" is geared at any Excel-aware user
DM Enables Predictive Analysis
[Figure: chart of Business Insight (Presentation, Exploration, Discovery) against Role of Software (Passive, Interactive, Proactive), positioning Canned reporting, Ad-hoc reporting, OLAP, and Data mining, with a region labelled Predictive Analysis.]
Applications and Scenarios
Value of Predictive Analysis
Typical Applications
Seek Profitable Customers
Understand Customer Needs
Anticipate Customer Churn
Predict Sales & Inventory
Build Effective Marketing Campaigns
Detect and Prevent Fraud
Correct Data During ETL
Data Mining Process: CRISP-DM
www.crisp-dm.org
[Figure: CRISP-DM cycle around Data, with phases Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. Annotations: "Putting Data Mining to Work" and "Doing Data Mining".]
Customer Profitability
Typically, you will:
1. Segment or classify customers in a relevant way
Clustering
2. Find a relationship between profit and customer characteristics
Decision Tree
3. Understand customer preferences
Association Rules
4. Study customer behaviour
Sequence Clustering
and
1. Predict profitability of potential new customers
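To make the segmentation step concrete, here is a minimal sketch of clustering customers with plain k-means in Python. It is not the Microsoft Clustering algorithm from SQL Server; the `kmeans` helper, the two features, and the customer figures are all hypothetical, and real data would normally be scaled first.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment

# Hypothetical customers: (orders per year, average order value).
customers = [(2, 20.0), (3, 25.0), (1, 22.0), (20, 210.0), (22, 190.0), (19, 200.0)]
centroids, segments = kmeans(customers, centroids=[customers[0], customers[3]])
```

Here `segments` holds one cluster index per customer, which could then feed the later steps (for example, as an input column when relating profit to customer characteristics).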
Predict Sales and Inventory
You may:
1. Structure the sales or inventory data as a time series
Perhaps from a Data Warehouse
2. Forecast future sales and needs
Time Series or Decision Trees with Regression
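As a rough illustration of forecasting from a time series, the sketch below fits a straight-line trend by ordinary least squares and extends it forward. This is a deliberately simple stand-in, not the Microsoft Time Series algorithm; the function names and the monthly sales figures are hypothetical.

```python
def fit_trend(series):
    """Ordinary least-squares line through (0, y0), (1, y1), ..."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept

def forecast(series, steps):
    """Extend the fitted trend line for the next `steps` periods."""
    slope, intercept = fit_trend(series)
    n = len(series)
    return [intercept + slope * (n + h) for h in range(steps)]

monthly_sales = [100, 110, 120, 130, 140, 150]  # hypothetical units sold per month
next_quarter = forecast(monthly_sales, steps=3)
```

A linear trend ignores seasonality and promotions, which is exactly where a dedicated time series model earns its keep.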
Build Effective Marketing Campaigns
You would:
1. Segment your existing customers
Clustering and Decision Trees
2. Study what makes them respond to your campaigns
Decision Tree, Naive Bayes, Clustering, Neural Network
3. Experiment with a campaign by focusing it
Lift Charts
4. Run the campaign
Predict recipients
5. Review your strategy as you get responses
Update your models
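The idea behind a lift chart can be shown with a small calculation: how much better is the response rate among the customers the model scores highest, compared with mailing everyone? The sketch below is an assumption-laden illustration; the `lift_at` helper and the scored campaign history are hypothetical.

```python
def lift_at(scored, fraction):
    """Lift = response rate in the top-scored fraction / overall response rate."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    top = ranked[:cutoff]
    top_rate = sum(responded for _, responded in top) / len(top)
    overall_rate = sum(responded for _, responded in ranked) / len(ranked)
    return top_rate / overall_rate  # assumes at least one responder overall

# Hypothetical (model score, responded? 1/0) pairs from a past campaign.
history = [
    (0.95, 1), (0.90, 1), (0.80, 0), (0.70, 1), (0.60, 0),
    (0.50, 0), (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0),
]
lift_top_20 = lift_at(history, 0.2)
```

A lift well above 1 for a small fraction suggests the campaign can be focused on fewer recipients without losing most of the responses; a full lift chart repeats this calculation across all fractions.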
Detect and Prevent Fraud
You could:
1. Build a risk model for existing customers or transactions
Decision Trees, Clustering, Neural Networks, and often Logistic Regression
2. Assess risk of a new transaction
Predict risk and its probability using the model
Or
1. Model transaction sequences
Sequence Clustering
2. Find unusual ones (outliers)
Mine the mining model: neural networks, trees, clustering
3. Assess new events as they happen
Predicting by means of the metamodel
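The second approach (model sequences, then look for unusual ones) can be sketched with a first-order Markov chain: learn how often one event follows another in normal activity, then score new sequences by how probable their transitions are. This is a simplified stand-in, not the Sequence Clustering algorithm itself; the helper names, the smoothing choice, and the event data are hypothetical.

```python
from collections import Counter, defaultdict
import math

def train_transitions(sequences):
    """Count first-order transitions (previous event -> next event)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def avg_log_likelihood(seq, counts, vocabulary_size, smoothing=1.0):
    """Mean log-probability per transition, with additive smoothing."""
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts.get(prev, Counter())
        prob = (row[nxt] + smoothing) / (sum(row.values()) + smoothing * vocabulary_size)
        total += math.log(prob)
    return total / max(1, len(seq) - 1)

# Hypothetical event sequences from normal account usage.
normal = [
    ["login", "browse", "pay", "logout"],
    ["login", "browse", "browse", "pay", "logout"],
    ["login", "pay", "logout"],
]
events = {e for seq in normal for e in seq}
model = train_transitions(normal)

typical = avg_log_likelihood(["login", "browse", "pay", "logout"], model, len(events))
unusual = avg_log_likelihood(["login", "pay", "pay", "pay"], model, len(events))
```

Sequences whose score falls far below that of typical traffic are candidates for review; where to set the threshold is a business decision, not something the model decides.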
New Opportunity: Intelligent Applications
Examples of Intelligent Applications:
Input Validation, based on previously accepted data, not on fixed rules
Business Process Validation: early detection of failure
Adaptive User Interface: based on past behaviour
Also known as Predictive Programming
Learn more by downloading "Build More Intelligent Applications using Data Mining" from www.microsoft.com/technetspotlight
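Input validation learned from previously accepted data can be sketched very simply: remember which field combinations were accepted before and question combinations that are rare or unseen. The `LearnedValidator` class, the `min_support` threshold, and the sample records below are hypothetical; a real application would more likely query a mining model than keep counts in memory.

```python
from collections import Counter

class LearnedValidator:
    """Flags field combinations that were rarely or never accepted before."""

    def __init__(self, min_support=1):
        self.seen = Counter()
        self.min_support = min_support

    def learn(self, record):
        # Sort items so that key order in the dict does not matter.
        self.seen[tuple(sorted(record.items()))] += 1

    def looks_valid(self, record):
        return self.seen[tuple(sorted(record.items()))] >= self.min_support

validator = LearnedValidator(min_support=2)
accepted = [
    {"country": "UK", "currency": "GBP"},
    {"country": "UK", "currency": "GBP"},
    {"country": "IE", "currency": "EUR"},
    {"country": "IE", "currency": "EUR"},
    {"country": "UK", "currency": "EUR"},  # seen once: below min_support
]
for row in accepted:
    validator.learn(row)
```

Calling `validator.looks_valid({"country": "UK", "currency": "GBP"})` accepts the familiar pair, while an unseen or rare pair would prompt the user to confirm rather than being rejected by a fixed rule.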
Data Mining Products
Microsoft DM Competitors
SAS, largest market share of DM, specialised product for traditional experts
SPSS (Clementine), strength in statistical analysis
IBM (Intelligent Miner), tied to DB2, interoperates with Microsoft through PMML
Oracle (10g), supports Java APIs
Angoss (KnowledgeSTUDIO), result visualisation, works with SQL Server
KXEN, supports OLAP and Excel
SQL Server 2005
We Need More Than Just Database Engine
Integrate
Data acquisition and integration from multiple sources
Data transformation and synthesis using Data Mining
Analyze
Knowledge and pattern detection through Data Mining
Data enrichment with logic rules and hierarchical views
Report
Data presentation and distribution
Publishing of Data Mining results
DM Technologies in SQL Server 2005
Strong, patented algorithms from Microsoft Research labs
Interoperability
PMML (Predictive Model Markup Language) for SAS, SPSS, IBM and Oracle
Multiple tools:
Business Intelligence Development Studio (BIDS)
Data Mining Extensions for Excel (and more)
DMX and OLE DB for Data Mining
XML for Analysis (XMLA)
What is New in SQL Server 2008?
Data Mining Enhancements
Enhanced Mining Structures
Easier to prepare and test your models
Models allow for cross-validation
Filtering
Algorithm Updates
Improved Time Series algorithm combining best of ARIMA and ARTXP
What-If analysis
Microsoft Data Mining Framework
Supplements CRISP-DM
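For readers new to the cross-validation mentioned above, the general idea is to split the rows into k folds, train on k-1 of them, and test on the remaining one, rotating until every row has been tested once. The sketch below shows only the generic splitting step; `k_fold_indices` is a hypothetical helper and does not represent how SQL Server 2008 exposes the feature.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists; each row lands in exactly one test fold."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_rows=10, k=3))
```

Averaging a model's accuracy over the folds gives a steadier estimate than a single train/test split, at the cost of training the model k times.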
DM Add-Ins for Microsoft Office 2007
Define Data
Identify Task
Get Results
Demo
1. Using Data Mining Add-in Table Tools for Microsoft Excel 2007
Server Mining Architecture
[Figure: architecture diagram. Labels: Analysis Services Server (Mining Model, Data Mining Algorithm, Data Source); clients Excel/Visio/SSRS/Your App connecting via OLE DB/ADOMD/XMLA/AMO; Deploy from BIDS, Excel, Visio, SSMS; App Data.]
Conclusions
ABS-CBN Interactive (ABSi)
Challenge
Selling custom ring tones and other downloadable content for mobile phone users requires staying in tune with the market.
Searching transactional data for hints on what to offer users in cross-selling value-added mobile services took days and didn't provide customer-specific recommendations.
Solution
ABSi deployed Microsoft SQL Server 2005 to use its data mining feature to determine product recommendations.
Benefit
More accurate and personalized service recommendations to customers
Doubling response rates from marketing campaigns
Ad hoc reporting in minutes, not days
Eight times faster data mining process
Faster data mining prediction
Wireless Services Firm Doubles Response Rates with SQL Server 2005 Data Mining
"Our management is very impressed that we could double our response rate through our SQL Server 2005 data mining... managers of other services ask us to provide the same magic for them, which is what we will do with the full project rollout."
- Grace Cunanan, Technical Specialist, ABS-CBN Interactive
Subsidiary of the largest integrated media and entertainment company in the Philippines
Clalit Health Services
Challenge
Identify which members would most benefit from proactive intervention to prevent health deterioration
Solution
Use sociodemographic and medical records to generate a predictive score, identifying elder members with highest risk for health deterioration
Once identified, physicians can try to involve these patients in proactive treatment plans to prevent health deterioration
Benefit
A chance to preserve life and enhance life quality
Reduced health care costs
Tightly integrated solution
Data Mining Helps Clalit Preserve Health and Save Lives
Provides health care for 3.7 million insured members, representing about 60 percent of Israel's population
"Providing physicians with a list of patients that the data mining model predicts are at risk of health deterioration over the next year gives them the opportunity to intervene, and prevent what has been predicted."
- Mazal Tuchler, Data Warehouse Manager, Clalit Health Services
More Data Mining Customers
.8 TB SS2005 DW for Ring-Tone Marketing
Uses Relational, OLAP and Data Mining
3 TB end-to-end BI decision support system
Oracle competitive win
End-to-end DW on SQL Server, including OLAP
Extensive use of Data Mining Decision Trees
1.2 TB, 20 billion records
Large Brazilian Grocery Chain
.8 TB DW at main TV network in Italy
Increased viewership by understanding trends
.5 TB DW at US Cable company
End-to-end BI, Analysis and Reporting
http://www.abs-cbn.com/
http://members.microsoft.com/CustomerEvidence/Search/EvidenceDetails.aspx?EvidenceID=2384&LanguageID=1
http://www.microsoft.com/resources/casestudies/CaseStudy.asp?CaseStudyID=13181
http://www.microsoft.com/resources/casestudies/CaseStudy.asp?CaseStudyID=10682
http://www.microsoft.com/resources/casestudies/CaseStudy.asp?CaseStudyID=10932
http://www.microsoft.com/resources/casestudies/CaseStudy.asp?CaseStudyID=14572
Summary
Data Mining is a powerful technology still undiscovered by many IT and database professionals
Turns data into intelligence
SQL Server 2005 and 2008 Analysis Services have been created with you in mind
Let's mine for valuable gems of knowledge in our databases!
© 2008 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.