SSAS 2008 Data Mining
Lynn Langit, MSDN Developer Evangelist, Microsoft
http://blogs.msdn.com/SoCalDevGal
Session Prerequisites
• Working SQL Server 2008 Developer
• Understanding of OLAP concepts
• Working SQL Server Analysis Server 2005 Developer
• Interest in or basic knowledge of Data Mining concepts
Objectives and Agenda
• Understand the what, why, when & how of SQL Server 2008 Data Mining
• Examine the core functionality of the Data Mining Extensions
• Hear about the new and/or advanced functionality of Data Mining
Predictive Analytics
Figure labels: axes Role of Software and Business Insight; levels Presentation, Exploration, Discovery and Passive, Interactive, Proactive; items Canned reporting, Ad-hoc reporting, OLAP, Data mining.
What and Why Data Mining?
Cubes vs. Data Mining
DM - Scenarios to Tasks
Tasks to Techniques
BI for Everyone
Individual – Excel
Project – SharePoint
Microsoft’s Predictive Analytics
Figure labels, roles: Information Worker, BI Analyst, Data Mining Specialist, Application Developer.
Figure labels, products: Data Mining Add-ins for the 2007 Microsoft Office system; SQL Server 2008 Business Intelligence Development Studio; Data Mining SQL extensions (DMX); Custom Algorithms; Microsoft Dynamics CRM Analytics Foundation; SQL Services Azure; Microsoft SQL Server 2008 Analysis Services; Microsoft SQL Server 2008 Data Mining.
Data Mining Add-ins for Office 2007
Table Analysis Tools for Excel 2007
Data Mining Template for Visio 2007
Data Mining Client for Excel 2007
Information Worker
BI Analyst
Data Mining Specialist
Figure labels, tools: SSAS (Data Mining), Excel; SSAS (DSV), Query, Excel; SSIS, SSAS, SSRS, Excel, Your Apps; SSIS, SSAS, Excel.
Figure labels, phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment, arranged around Data.
Microsoft Data Mining Lifecycle CRISP-DM
www.crisp-dm.org
Understand & Prepare specifics
Demo
1 – Explore / Clean / Partition Data
2 – Prepare Data
Modeling Specifics
Demo
3 – Select algorithm
4 – Create model
Evaluation Specifics
Demo
5 – Evaluate model
6 – Deploy model
7 – Update model
8 – Query model
Data Mining – Logical Model
Figure: Training Data (DB data, Client data, Application data) goes through the Data Mining Engine to produce a Mining Model. To predict, the Mining Model plus new DB data, Client data, Application data, or “Just one row” goes through the Data Mining Engine to produce Predicted Data.
Figure labels: algorithm, Analysis Services Server, Mining Model, Data Mining Algorithm, Data Source.
Data Mining - Physical Model
Figure labels: Your Application, OLE DB / ADOMD / XMLA, Deploy, BI Dev Studio (Visual Studio), App Data.
Data Mining Interfaces – APIs
Figure: Analysis Server (msmdsrv.exe) hosts OLAP, Data Mining, Server ADOMD.NET, .NET Stored Procedures, Microsoft Algorithms, and Third Party Algorithms. Clients (C++ App, VB App, .NET App, Any App; Any Platform, Any Device) connect through OLEDB for OLAP/DM, ADO/DSO, AMO, and ADOMD.NET, using XMLA over TCP/IP or XMLA over HTTP (WAN).
DM Interfaces
Configuration & Deployment
Model Creation/Management: Database Administrators; Session Mining Models
Model Application: Permissions on models; Permissions on data sources
• Browse
• Copy to Excel
• Drillthrough
• Query
• Default
• Advanced
• Excel Services
• Manage models and structures
• Export/Import
• Rename
• Connection
• Database
• Trace
Data Mining Extensions (DMX)
CREATE MINING MODEL CreditRisk
(CustID LONG KEY,
 Gender TEXT DISCRETE,
 Income LONG CONTINUOUS,
 Profession TEXT DISCRETE,
 Risk TEXT DISCRETE PREDICT)
USING Microsoft_Decision_Trees

INSERT INTO CreditRisk
(CustId, Gender, Income, Profession, Risk)
SELECT CustomerID, Gender, Income, Profession, Risk
FROM Customers

SELECT NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)
FROM CreditRisk PREDICTION JOIN NewCustomers
ON CreditRisk.Gender = NewCustomers.Gender
AND CreditRisk.Income = NewCustomers.Income
AND CreditRisk.Profession = NewCustomers.Profession
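The “Just one row” case from the logical model can be sketched as a singleton prediction query against the same CreditRisk model. This is a minimal sketch: the input values are made-up illustration data, and NATURAL PREDICTION JOIN is used so that input columns are matched to model columns by name rather than through an explicit ON clause.

```sql
-- Singleton ("just one row") prediction against the CreditRisk model.
-- The literal input values are hypothetical.
SELECT
  Predict([Risk])            AS PredictedRisk,
  PredictProbability([Risk]) AS RiskProbability
FROM [CreditRisk]
NATURAL PREDICTION JOIN
(SELECT 'Male'     AS [Gender],
        45000      AS [Income],
        'Engineer' AS [Profession]) AS t
```

A singleton query suits interactive applications that score one customer at a time; the batch form on the slide suits scoring a whole table.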
DMX Column Expressions
• Predictable Columns
• Source Data Columns
• Functions - Predict
  • “Workhorse”
  • Discrete scalar values
  • Continuous scalar values
  • Associative nested tables
  • Sequence nested tables
  • Time Series
  • Overloaded to PredictAssociation, PredictSequence, PredictTimeSeries
• PredictProbability, PredictSupport, PredictHistogram, Cluster, ClusterProbability, GetNodeId, IsInNode
• Arithmetic operators
• Stored Procedure
• Subselect
• Select from nested tables
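Several of the column expressions listed above can be combined in one batch prediction query. The following is a sketch under assumptions: [CustomerData] is a hypothetical Analysis Services data source name, and NewCustomers is assumed to be a relational table with the columns shown. FLATTENED turns the nested table returned by PredictHistogram into ordinary rows.

```sql
-- Batch prediction combining several prediction functions.
-- [CustomerData] is a hypothetical data source; NewCustomers is an assumed table.
SELECT FLATTENED
  t.CustomerID,
  Predict([Risk])            AS PredictedRisk,
  PredictProbability([Risk]) AS RiskProbability,
  PredictSupport([Risk])     AS RiskSupport,
  PredictHistogram([Risk])   AS RiskHistogram
FROM [CreditRisk]
NATURAL PREDICTION JOIN
  OPENQUERY([CustomerData],
    'SELECT CustomerID, Gender, Income, Profession FROM NewCustomers') AS t
```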
Demo – Data Mining & Excel 2007 integration
Excel Functions*
DMPREDICTTABLEROW (Connection, ModelName, PredictionResult, TableRowRange[, string CommaSeparatedColumnNames])
DMPREDICT (Connection, Model, PredictionResult, Value1, Name1, [..., Value32, Name32])
DMCONTENTQUERY (Connection, Model, PredictionResult[, WhereClause])
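For comparison, model content can also be queried directly in DMX. A minimal sketch against the CreditRisk model defined earlier; whether DMCONTENTQUERY issues exactly this form of query is an assumption, and the support threshold of 100 is an arbitrary illustration value.

```sql
-- Content query: list nodes of the CreditRisk decision tree with their case counts.
-- The threshold of 100 cases is arbitrary.
SELECT NODE_UNIQUE_NAME, NODE_CAPTION, NODE_SUPPORT
FROM [CreditRisk].CONTENT
WHERE NODE_SUPPORT > 100
```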
DM in the Cloud
Test Data Types
• Relational
• CSV
• SQL Services (Azure Services)
Try it in the cloud…
Analysis Results in the Cloud…
Calling the Cloud… (from Excel 2007)
New to SQL Server 2008 DM
• Microsoft Time Series algorithm improved
  • ARIMA plus ARTxp methods, with a blending algorithm, give better results
  • New prediction mode allows adding new data to time series models
• Holdout support added
  • Easily partition data into training and test sets that are stored in the mining structure and available to query after processing
• Ability to build mining models based on filtered subsets added
  • Results in fewer structures, i.e. you can just filter an existing structure
• Drillthrough functionality extended
  • Makes all mining structure columns available, not just columns included in the model
  • Allows you to build more compact models
• Cross-validation added
  • Allows users to quickly validate their modeling approach by automatically building temporary models and evaluating accuracy measures across K folds
  • Available through a new cross-validation tab under Accuracy Charts in BIDS, and programmatically via a stored procedure call
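Several of the features above map to DMX statements. A minimal sketch, assuming a mining structure named CreditRiskStructure and a model named CreditRisk_HighIncome (both hypothetical, reusing the column names from the CreditRisk example); the filter threshold and the 'High' risk state are made-up values. Each statement would be run on its own, and the structure must be processed before statements 3 to 5 return results.

```sql
-- 1. Holdout: reserve 30 percent of cases as a test set when creating the structure.
CREATE MINING STRUCTURE [CreditRiskStructure]
(
  CustID     LONG KEY,
  Gender     TEXT DISCRETE,
  Income     LONG CONTINUOUS,
  Profession TEXT DISCRETE,
  Risk       TEXT DISCRETE
)
WITH HOLDOUT (30 PERCENT)

-- 2. Filtered model with drillthrough: train only on a subset of the structure's cases.
--    Profession is deliberately left out of the model to keep it compact.
ALTER MINING STRUCTURE [CreditRiskStructure]
ADD MINING MODEL [CreditRisk_HighIncome]
(
  CustID,
  Gender,
  Income,
  Risk PREDICT
)
USING Microsoft_Decision_Trees
WITH DRILLTHROUGH, FILTER (Income > 50000)

-- 3. After processing, query the held-out test cases stored in the structure.
SELECT * FROM MINING STRUCTURE [CreditRiskStructure].CASES
WHERE IsTestCase()

-- 4. Drillthrough to a structure column (Profession) that the model does not include.
SELECT CustID, StructureColumn('Profession') AS Profession
FROM [CreditRisk_HighIncome].CASES

-- 5. Cross-validation via stored procedure: 5 folds, all cases (0),
--    target Risk = 'High' with a 0.5 probability threshold.
CALL SystemGetCrossValidationResults(
  [CreditRiskStructure],
  [CreditRisk_HighIncome],
  5,
  0,
  'Risk',
  'High',
  0.5
)
```

Keeping Profession in the structure but out of the model shows why extended drillthrough allows more compact models: the column stays available for inspection without taking part in training.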
Summary
• Data Mining in SQL Server 2008 is mature, powerful and accessible
• Can use Excel 2007
  • Familiar client for BI – OLAP cubes AND Data Mining models
  • Model Creators / Users
  • Excel Data or Server Data
• SSAS and Excel both support the full DM Cycle
  • Data Understanding & Data Preparation
  • Modeling, Validation & Deployment
• SQL Services Incubations available now
  • Data Mining from the Cloud
  • More
DM Webcasts
• Fri, 02 Nov 2007 – MSDN Webcast: Build Smart Web Applications with SQL Server Data Mining (Level 200)
• Thu, 08 Nov 2007 – MSDN Webcast: Building Adaptive Applications with SQL Server Data Mining (Level 300)
• Mon, 19 Nov 2007 – MSDN Webcast: Extending and Customizing SQL Server Data Mining (Level 300)
• Fri, 30 Nov 2007 – MSDN Webcast: Creating Visualizations for SQL Server Data Mining (Level 300)
• Thu, 01 Nov 2007 – TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 1 of 3): Your First Project with SQL Server Data Mining (Level 200)
• Thu, 15 Nov 2007 – TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 2 of 3): Understand SQL Server Data Mining Add-ins for the 2007 Office System (Level 200)
• Thu, 29 Nov 2007 – TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 3 of 3): Use Predictive Intelligence to Create Smarter KPIs (Level 200)
DM Resources
Technical Communities, Webcasts, Blogs, Chats & User Groups: http://www.microsoft.com/communities/default.mspx
Microsoft Developer Network (MSDN) & TechNet: http://microsoft.com/msdn, http://microsoft.com/technet
Trial Software and Virtual Labs: http://www.microsoft.com/technet/downloads/trials/default.mspx
Microsoft Learning and Certification: http://www.microsoft.com/learning/default.mspx
SQL Server Data Mining: http://www.sqlserverdatamining.com, http://www.microsoft.com/bi/bicapabilities/data-mining.aspx
BI Resources from Lynn Langit
http://blogs.msdn.com/SoCalDevGal
“How Do I…BI?” screencast series on MSDN
“Smart Business Intelligence Solutions with Microsoft SQL Server 2008”, MSPress, Feb 2009
“Foundations of SQL Server 2005 Business Intelligence”, APress, April 2007