2 Introduction to Data Mining

Post on 07-Apr-2018


  • 8/4/2019 2 Introduction to Data Mining


    Introduction to Data Mining

    Rafal Lukawiecki

    Strategic Consultant, Project Botticelli Ltd

    rafal@projectbotticelli.co.uk


    Objectives

    Give an overview of Data Mining

    Introduce typical applications and scenarios

    Explain some DM concepts

    Review the wider product platform

    The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.

    © 2007 Project Botticelli Ltd & Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

    This seminar is partly based on the Data Mining book by ZhaoHui Tang and Jamie MacLennan, and also on Jamie's presentations. Thank you to Jamie and to Donald Farmer for helping me in preparing this session. Thank you to Roni Karassik for a slide. Thank you to Mike Tsalidis, Olga Londer, and Marin Bezic for all the support. Thank you to Maciej Pilecki for assistance with demos.


    Before We Dive In...

    To help me select the most suitable examples and demonstrations, I would like to ask you about your background.

    Which of these do you identify with:

    IT Professional,

    Database Professional,

    Software/System Developer?


    The Essence of Data Mining as Part of Business Intelligence


    Business Intelligence: Improving Business Insight

    "A broad category of applications and technologies for gathering, storing, analyzing, sharing and providing access to data to help enterprise users make better business decisions." (Gartner)


    Relationships and Acronyms...

    Data Mining (DM)

    Knowledge Discovery in Databases (KDD)

    Business Intelligence (BI)


    Data Mining

    Technologies for analysis of data and discovery of (very) hidden patterns

    Fairly young (


    What does Data Mining Do?

    Explores Your Data

    Finds Patterns

    Performs Predictions


    DM and BI

    BI is geared towards end users, such as business owners, knowledge workers, etc.

    DM is an IT technology that today is generally geared towards more advanced users

    By the way: who is qualified to use DM today?


    DM Past and Present

    Traditional approaches from Microsoft's competitors are for DM experts: white-coat PhD statisticians

    DM tools are also fairly expensive

    Microsoft's full approach is designed for those with some database skills

    Tools similar to T-SQL and Management Studio

    DM built into Microsoft SQL Server 2005 and 2008 at no extra cost

    The "DM easy" approach is geared towards any Excel-aware user


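    DM Enables Predictive Analysis

    [Figure: business insight plotted against the role of software (passive, interactive, proactive) across presentation, exploration and discovery; canned reporting, ad-hoc reporting, OLAP and data mining appear in increasing order of insight, leading to predictive analysis]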


    Applications and Scenarios


    Value of Predictive Analysis: Typical Applications

    Seek Profitable Customers

    Understand Customer Needs

    Anticipate Customer Churn

    Predict Sales & Inventory

    Build Effective Marketing Campaigns

    Detect and Prevent Fraud

    Correct Data During ETL


    Data Mining Process: CRISP-DM

    [Diagram: the CRISP-DM cycle around the Data, with the phases Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment, labelled "Putting Data Mining to Work" and "Doing Data Mining"]

    www.crisp-dm.org


    Customer Profitability

    Typically, you will:

    1. Segment or classify customers in a relevant way

    Clustering

    2. Find a relationship between profit and customer characteristics

    Decision Tree

    3. Understand customer preferences

    Association Rules

    4. Study customer behaviour

    Sequence Clustering

    and

    5. Predict the profitability of potential new customers
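Step 1 (segmenting customers with clustering) can be illustrated with a minimal sketch. It runs plain k-means on two hypothetical customer attributes, annual spend and orders per year. K-means is a stand-in chosen for brevity, not the Microsoft Clustering algorithm, and both the data and the `kmeans` helper are invented for illustration.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical (annual spend, orders per year)


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], k: int, iterations: int = 20,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain k-means (illustrative): returns (centroids, segment label per point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # empty segment keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

With `k=2`, three low-spend and three high-spend customers such as `[(1.0, 2.0), (1.2, 3.0), (0.8, 2.0), (9.5, 20.0), (10.2, 22.0), (9.8, 19.0)]` should land in separate segments. Because the two attributes are on different scales, real customer data would normally be normalised before clustering.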


    Predict Sales and Inventory

    You may:

    1. Structure the sales or inventory data as a time series

    Perhaps from a Data Warehouse

    2. Forecast future sales and needs

    Time Series or Decision Trees with Regression
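Step 2 can be made concrete with simple exponential smoothing, one of the most basic forecasting methods. This is a swapped-in technique, much simpler than the Time Series algorithm named above. The function name, the smoothing factor `alpha` and the sales figures are assumptions for illustration.

```python
from typing import List


def exponential_smoothing_forecast(series: List[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast by simple exponential smoothing (illustrative sketch).

    level_t = alpha * y_t + (1 - alpha) * level_(t-1), seeded with the first value.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level  # the forecast for the next period is the final level
```

With monthly unit sales of 100, 110, 105 and 120 and `alpha=0.5`, the level moves 100, 105, 105, 112.5, so the sketch forecasts 112.5 units for the next month. It has no trend or seasonal component, which is why richer models are preferred for seasonal sales.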


    Build Effective Marketing Campaigns

    You would:

    1. Segment your existing customers

    Clustering and Decision Trees

    2. Study what makes them respond to your campaigns

    Decision Tree, Naive Bayes, Clustering, Neural Network

    3. Experiment with a campaign by focusing it

    Lift Charts

    4. Run the campaign

    Predict recipients

    5. Review your strategy as you get responses

    Update your models
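Step 3 relies on lift charts. The sketch below shows the calculation behind one, assuming each customer already has a model score (a predicted probability of responding) and a known outcome from a test campaign. Customers are ranked by score and, for each top fraction of the list, lift is the response rate within that fraction divided by the overall response rate. The `cumulative_lift` helper and the sample data are hypothetical.

```python
from typing import List, Tuple


def cumulative_lift(scores: List[float], responded: List[bool],
                    fractions: List[float]) -> List[Tuple[float, float]]:
    """For each fraction f, return (f, lift) for the top-f share ranked by score."""
    if len(scores) != len(responded) or not scores:
        raise ValueError("scores and responded must be non-empty and of equal length")
    total_responders = sum(responded)
    if total_responders == 0:
        raise ValueError("need at least one responder to compute lift")
    overall_rate = total_responders / len(scores)
    # Rank outcomes by model score, best prospects first.
    ranked = [r for _, r in sorted(zip(scores, responded),
                                   key=lambda pair: pair[0], reverse=True)]
    result = []
    for f in fractions:
        n = max(1, round(f * len(ranked)))  # customers contacted at this depth
        rate = sum(ranked[:n]) / n
        result.append((f, rate / overall_rate))
    return result
```

For ten customers of whom three responded, with two responders among the two highest scores, the lift at 20% is about 3.3: contacting the top fifth of the ranked list reaches responders at more than three times the random rate. A lift of 1.0 at 100% is expected, since contacting everyone is no better than random.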


    Detect and Prevent Fraud

    You could:

    1. Build a risk model for existing customers or transactions

    Decision Trees, Clustering, Neural Networks, and often Logistic Regression

    2. Assess risk of a new transaction

    Predict risk and its probability using the model

    Or

    1. Model transaction sequences

    Sequence Clustering

    2. Find unusual ones (outliers)

    Mine the mining model: neural networks, trees, clustering

    3. Assess new events as they happen

    Predicting by means of the metamodel
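The second approach (model transaction sequences, then flag unusual ones) can be sketched with a first-order Markov chain. Transition probabilities between event types are estimated from historical sequences, and a new sequence is scored by its average log-probability per transition, with low scores flagged. Microsoft Sequence Clustering also builds on Markov chains, but this standalone sketch is not that algorithm. The event names, the smoothing constant and the threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class MarkovSequenceModel:
    """First-order Markov chain over event types, with additive smoothing (illustrative)."""

    def __init__(self, histories: List[Sequence[str]], smoothing: float = 1.0):
        self.smoothing = smoothing
        self.states = {event for seq in histories for event in seq}
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for seq in histories:
            for current, nxt in zip(seq, seq[1:]):
                self.counts[current][nxt] += 1  # count observed transitions

    def transition_prob(self, current: str, nxt: str) -> float:
        row = self.counts.get(current, Counter())
        # +1 reserves some probability for event types never seen in training.
        vocab = len(self.states) + 1
        return (row[nxt] + self.smoothing) / (sum(row.values()) + self.smoothing * vocab)

    def score(self, seq: Sequence[str]) -> float:
        """Average log-probability per transition; lower means more unusual."""
        pairs = list(zip(seq, seq[1:]))
        if not pairs:
            return 0.0
        return sum(math.log(self.transition_prob(a, b)) for a, b in pairs) / len(pairs)

    def is_unusual(self, seq: Sequence[str], threshold: float = -2.0) -> bool:
        # The threshold is arbitrary here; in practice it would be tuned on held-out data.
        return self.score(seq) < threshold
```

Trained on histories such as `login, view, pay, logout`, the model scores a sequence like `login, pay, pay, pay, logout` far lower, because its transitions were rarely or never seen. That is the "assess new events as they happen" step in miniature.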


    New Opportunity: Intelligent Applications

    Examples of Intelligent Applications:

    Input Validation, based on previously accepted data, not on fixed rules

    Business Process Validation: early detection of failure

    Adaptive User Interface based on past behaviour

    Also known as Predictive Programming

    Learn more by downloading "Build More Intelligent Applications using Data Mining" from www.microsoft.com/technetspotlight
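Input validation based on previously accepted data rather than fixed rules can be sketched as a validator that learns how often each value of a field appeared in accepted records, then flags a new value whose observed share falls below a threshold. This frequency-based approach, the field names and the 1% threshold are illustrative assumptions, not the method described in the Microsoft download.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


class LearnedValidator:
    """Flags field values that were rare or unseen in previously accepted records."""

    def __init__(self, accepted: Iterable[Dict[str, str]], min_share: float = 0.01):
        self.min_share = min_share
        self.value_counts: Dict[str, Counter] = defaultdict(Counter)
        self.totals: Counter = Counter()
        for record in accepted:
            for field, value in record.items():
                self.value_counts[field][value] += 1
                self.totals[field] += 1

    def suspicious_fields(self, record: Dict[str, str]) -> List[str]:
        """Return the fields whose value is rarer than min_share in the accepted history."""
        flagged = []
        for field, value in record.items():
            total = self.totals[field]
            if total == 0:
                continue  # no history for this field, so nothing to compare against
            share = self.value_counts[field][value] / total
            if share < self.min_share:
                flagged.append(field)
        return flagged
```

Because each field is checked independently, an unusual combination of individually common values (say a UK customer paying in USD) would pass. Catching that would need a mining model over the whole record, such as the clustering or decision tree models mentioned earlier.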

    Data Mining Products


    Microsoft DM Competitors

    SAS: largest market share of DM, specialised product for traditional experts

    SPSS (Clementine): strength in statistical analysis

    IBM (Intelligent Miner): tied to DB2, interoperates with Microsoft through PMML

    Oracle (10g): supports Java APIs

    Angoss (KnowledgeSTUDIO): result visualisation, works with SQL Server

    KXEN: supports OLAP and Excel


    SQL Server 2005: We Need More Than Just a Database Engine

    Integrate: data acquisition and integration from multiple sources; data transformation and synthesis using Data Mining

    Analyze: knowledge and pattern detection through Data Mining; data enrichment with logic rules and hierarchical views

    Report: data presentation and distribution; publishing of Data Mining results


    DM Technologies in SQL Server 2005

    Strong, patented algorithms from Microsoft Research labs

    Interoperability

    PMML (Predictive Model Markup Language) for SAS, SPSS, IBM and Oracle

    Multiple tools:

    Business Intelligence Development Studio (BIDS)

    Data Mining Extensions for Excel (and more)

    DMX and OLE DB for Data Mining

    XML for Analysis (XMLA)
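Because PMML is an XML format, any XML parser can inspect a model exchanged this way. The sketch reads the field definitions from the `DataDictionary` of a small, hand-written PMML fragment, assumed for illustration and not exported from SQL Server. The namespace shown is the one used by PMML 3.2; a document of another PMML version declares a different namespace, so `PMML_NS` would need adjusting.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

PMML_NS = {"pmml": "http://www.dmg.org/PMML-3_2"}

# Hypothetical, minimal PMML fragment; a real export would also contain a model element.
SAMPLE_PMML = """<?xml version="1.0"?>
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header description="illustrative example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="Age" optype="continuous" dataType="double"/>
    <DataField name="Churned" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""


def data_fields(pmml_text: str) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Return (name, optype, dataType) for each DataField in the DataDictionary."""
    root = ET.fromstring(pmml_text)
    return [
        (field.get("name"), field.get("optype"), field.get("dataType"))
        for field in root.findall("pmml:DataDictionary/pmml:DataField", PMML_NS)
    ]
```

Calling `data_fields(SAMPLE_PMML)` lists the `Age` and `Churned` fields with their types. Checking this dictionary is a quick way to confirm that a model moved between tools expects the columns you intend to feed it.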


    What Is New in SQL Server 2008? Data Mining Enhancements

    Enhanced Mining Structures

    Easier to prepare and test your models

    Models allow for cross-validation

    Filtering

    Algorithm Updates

    Improved Time Series algorithm combining the best of ARIMA and ARTXP

    What-If analysis

    Microsoft Data Mining Framework

    Supplements CRISP-DM
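Cross-validation partitions the data into k folds, trains on k-1 of them and tests on the remaining one, repeating until every fold has served once as the test set. The sketch shows that loop with a deliberately trivial majority-class "model" standing in for a real mining algorithm. It is not how SQL Server 2008 implements the feature, and the helper names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, test) index lists per fold.

    Contiguous folds assume the rows are already in random order; shuffle first otherwise.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def cross_validate_majority(labels: Sequence[str], k: int) -> List[float]:
    """Accuracy of a majority-class predictor on each held-out fold."""
    accuracies = []
    for train, test in k_fold_indices(len(labels), k):
        # "Train": pick the most common label among the training rows.
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct = sum(1 for i in test if labels[i] == majority)
        accuracies.append(correct / len(test))
    return accuracies
```

With eight "yes" rows followed by two "no" rows and `k=5`, the per-fold accuracies are 1.0, 1.0, 1.0, 1.0 and 0.0. The unshuffled rows put every "no" case into the last fold, and the spread across folds exposes that, which a single train/test split could hide.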


    DM Add-Ins for Microsoft Office 2007

    Define Data

    Identify Task

    Get Results


    Demo 1. Using Data Mining Add-in Table Tools for Microsoft Excel 2007


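    Server Mining Architecture

    [Diagram: an Analysis Services server hosts mining models, each built by a data mining algorithm from a data source. Models are created and deployed with BIDS, Excel, Visio or SSMS; Excel, Visio, SSRS or your own app access them through OLE DB, ADOMD, XMLA or AMO.]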


    Conclusions


    ABS-CBN Interactive (ABSi)

    Challenge

    Selling custom ring tones and other downloadable content for mobile phone users requires staying in tune with the market.

    Searching transactional data for hints on what to offer users in cross-selling value-added mobile services took days and didn't provide customer-specific recommendations.

    Solution

    ABSi deployed Microsoft SQL Server 2005 to use its data mining feature to determine product recommendations.

    Benefit

    More accurate and personalized service recommendations to customers

    Doubling response rates from marketing campaigns

    Ad hoc reporting in minutes, not days

    Eight times faster data mining process

    Faster data mining prediction

    Wireless Services Firm Doubles Response Rates with SQL Server 2005 Data Mining

    "Our management is very impressed that we could double our response rate through our SQL Server 2005 data mining. Managers of other services ask us to provide the same magic for them, which is what we will do with the full project rollout."

    - Grace Cunanan, Technical Specialist, ABS-CBN Interactive

    Subsidiary of the largest integrated media and entertainment company in the Philippines

    http://www.abs-cbn.com/

    Clalit Health Services

    Challenge

    Identify which members would most benefit from proactive intervention to prevent health deterioration

    Solution

    Use sociodemographic and medical records to generate a predictive score, identifying elderly members with the highest risk of health deterioration

    Once identified, physicians can try to involve these patients in proactive treatment plans to prevent health deterioration

    Benefit

    A chance to preserve life and enhance quality of life

    Reduced health care costs

    Tightly integrated solution

    Data Mining Helps Clalit Preserve Health and Save Lives

    Provides health care for 3.7 million insured members, representing about 60 percent of Israel's population

    "Providing physicians with a list of patients that the data mining model predicts are at risk of health deterioration over the next year gives them the opportunity to intervene and prevent what has been predicted."

    - Mazal Tuchler, Data Warehouse Manager, Clalit Health Services


    More Data Mining Customers

    0.8 TB SQL Server 2005 DW for Ring-Tone Marketing

    Uses Relational, OLAP and Data Mining

    3 TB end-to-end BI decision support system

    Oracle competitive win

    End-to-end DW on SQL Server, including OLAP

    Extensive use of Data Mining Decision Trees

    1.2 TB, 20 billion records

    Large Brazilian Grocery Chain

    0.8 TB DW at main TV network in Italy

    Increased viewership by understanding trends

    0.5 TB DW at US Cable company

    End-to-end BI, Analysis and Reporting

    http://www.abs-cbn.com/
    http://members.microsoft.com/CustomerEvidence/Search/EvidenceDetails.aspx?EvidenceID=2384&LanguageID=1
    http://www.microsoft.com/resources/casestudies/CaseStudy.asp?CaseStudyID=13181
    http://www.microsoft.com/resources/casestudies/CaseStudy.asp?CaseStudyID=10682
    http://www.microsoft.com/resources/casestudies/CaseStudy.asp?CaseStudyID=10932
    http://www.microsoft.com/resources/casestudies/CaseStudy.asp?CaseStudyID=14572

    Summary

    Data Mining is a powerful technology still undiscovered by many IT and database professionals

    Turns data into intelligence

    SQL Server 2005 and 2008 Analysis Services have been created with you in mind

    Let's mine for valuable gems of knowledge in our databases!


    © 2008 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.
