Predictive Analytics: No Crystal Ball Required
Business Analytics
Predictive Analytics: No Crystal Ball Required
© 2010 IBM Corporation
Steve Barbee, MS Data Mining, MS Plasma Physics
IBM SPSS Predictive Analytics Specialist
June 15, 2010
Contents
• What is Predictive Analytics?
– Right Time, High Priority
– Definitions
– Disciplines
– vs. Statistics
– Datasets
– vs. BI Methods
• What Does It Do, Where Is It Applied?
– Questions It Answers
– Application Areas
– IBM’s Large Investment
• How Does It Work?
– Modeler Data Mining Workbench
– Mining Methods
– Text Mining
– Training a Learning Machine
– Breadth of Data
– Scoring Large Datasets
• How Do You Teach It?
– Hot Jobs
– Disciplines
– Curriculum
– Textbooks
The Time is Still Right for Analytics
• Executives are looking for new sources of advantage and differentiation
• They have more data about their businesses than ever before
• A new generation of technically literate executives is coming into organizations
• The ability to make sense of data through computers and software has finally come of age
Tom Davenport & Jeanne Harris, Competing on Analytics, p.11
Top Four of the Ten Most Important Visionary Plan Elements
Interviewed CIOs could select as many as they wanted
Source: IBM Global CIO Study 2009; n = 2345
BI/Analytics: #1 investment to improve competitiveness
What is Data Mining?
• “…the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules” -- Berry & Linoff*
• “…the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.” -- Gartner Group
Predictive analytics
• “Predictive analytics is a set of business intelligence technologies that uncovers relationships and patterns within large volumes of data that can be used to predict behavior and events.” -- TDWI Research**
* From Data Mining Techniques: For Marketing, Sales & Customer Support, Michael J.A. Berry & Gordon Linoff, p.5
** “Predictive Analytics,” What Works in Data Integration, TDWI Research, Vol.23, 2007, p.49
Some Fields Contributing To Data Mining
• Fields: Databases, Artificial Intelligence, Machine Learning, Statistics, Information Retrieval
• Database contributions shown: Relational Data Model, Data Warehousing, Batch & OLAP reports, Association Rules
• Learning methods shown: Neural Networks, ML Perceptron, Kohonen SOM, Genetic Algorithm, Decision Tree
• Retrieval methods shown: Similarity Measures, Clustering, SMART IR systems
• Statistical methods shown: Bayes (Naïve & Nets), Maximum Likelihood Estimate, Regression analysis, Resampling, Jackknife, Bias reduction, Linear classification, Exploratory data analysis, EM algorithm, K-Means clustering
Based on Data Mining: Intro. & Adv. Topics, Margaret H. Dunham, p.13
Range of Records and Variables in Data Mining
[Chart: Common Logarithm of Number of Records (vertical axis, 0 to 10) against Common Logarithm of Number of Variables (horizontal axis, 0 to 7). Labels on the chart: Narrow & Deep, Retail sales, Genomics, Semiconductor Manufacturing, Wide & Shallow, Proteomics.]
Modified from S. Barbee thesis: http://web.ccsu.edu/datamining/data%20mining%20theses/steve%20barbee%20thesis1905.pdf
Time To Change the 2 Cultures* Clash
Top-Down Approaches: Query, Search
• A Statistical Approach can involve a user forming a theory about a possible relationship in a database and converting that to a hypothesis and testing that hypothesis using a statistical method. It is a manual, user-driven, top-down approach to data analysis.
Bottom-Up Approaches: Data Mining, Text Mining
• The difference with data mining (which includes multivariate statistical models!) is that the interrogation of the data is done by the data mining method rather than by the user. It is a data-driven, self-organizing, bottom-up approach to data analysis.
• Source: DM Review
Statisticians can use their favorite methods from within Modeler 14 and Data Miners can broaden their capabilities by invoking statistical methods from Statistics 18
* "Statistical Modeling: The Two Cultures," Leo Breiman, Statistical Science, 2001, Vol.16 (3), pp.199-231.
The Kinds of Questions that Data Mining Can Answer
• Based on the percussion beat, what genre of music is this?
• Which books of the New Testament have the same author?
• What class of astronomical object is this image?
• Which genes express when drug B prevents the rejection of a transplanted organ?
• Which transformer in a grid is likely to fail due to a breakdown of its dielectric?
• What combination of repair parts are needed at worldwide aircraft service centers?
• To which of 4 products will a customer respond in a marketing campaign?
• How many units of a costume should store #7005 stock for Halloween this year?
• Which annuity holder will prematurely surrender their policy?
• Which physician will prescribe more of this acid reflux drug than an alternative?
Application Areas
• Neonatal Care
• Trading Advantage
• Law Enforcement
• Radio Astronomy
• Environment
• Telecom
• Manufacturing
• Smart Traffic
• Fraud Prevention
IBM is Investing to Accelerate an Information-Led Transformation
• Over $12B in software investments since 2005
• Over 4,000 Dedicated Consultants
• Analytics in a Box to Accelerate Time to Value
• Largest Math Department in Private Industry
“IBM, not SAP or Oracle, is now the industry's premo analytics solution/platform vendor…”
Some Business Analytics Methods Compared
Query/Reporting
• Hypothesis-driven
• Manual
• Example: ‘Which training regimen increases the lactate threshold the most?’
• Output: Reports & Graphs
OLAP
• Hypothesis-driven
• Manual
• Example (cube dimensions Diet and Training): ‘Drill down Training = 5 and Diet = 4 and VO2 = 9th decile’
• Output: Reports & Graphs
Data Mining
• Data- & Goal-driven
• Creates Hypotheses
• Automatic
• Example: Rule 3 for ‘Athlete Qualified’: VO2 Max > 5th decile and Interval Training Regimen in {1-5, 7-10} results in 100% Qualified for 83 athletes
• Output: Scoring Model
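A rule like Rule 3 on this slide can be applied to records mechanically, which is what makes a mined rule usable as a scoring model. The following is a minimal sketch only: the field names `vo2_decile` and `regimen` and the athlete records are assumptions made up for illustration, not output from Modeler.

```python
# Hypothetical encoding of "Rule 3" from the slide; field names are assumptions.
ALLOWED_REGIMENS = set(range(1, 6)) | set(range(7, 11))  # {1-5, 7-10}

def rule_3_qualified(athlete):
    """Return True when the record matches Rule 3 for 'Athlete Qualified'."""
    return athlete["vo2_decile"] > 5 and athlete["regimen"] in ALLOWED_REGIMENS

# Made-up records for illustration only.
athletes = [
    {"name": "A", "vo2_decile": 9, "regimen": 5},
    {"name": "B", "vo2_decile": 9, "regimen": 6},  # regimen 6 is excluded by the rule
    {"name": "C", "vo2_decile": 4, "regimen": 2},  # VO2 Max is not above the 5th decile
]

# Names of the athletes the rule flags as qualified.
matched = [a["name"] for a in athletes if rule_3_qualified(a)]
print(matched)
```

A query or OLAP drill-down would require the analyst to guess these conditions up front; here the conditions are assumed to have come from the mining method.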
IBM Analytics Landscape
[Chart: Competitive Edge (vertical axis) against Complexity (horizontal axis). Labels on the chart: Querying, Reporting, OLAP; Simulation, Alerts; Predictive Analytics; Optimization.]
Based on: Competing on Analytics, Davenport and Harris, 2007
SPSS Modeler Capabilities
• Easy to Learn / Visual Design Paradigm
• Visual approach - no writing code!
• Comprehensive range of data mining methods
• Powerful Automated modeling
• Automatically prepares data
• Automatically finds the best model
• Mines text, web & survey data
• Fully integrated with Statistics
• Open & Scalable architecture
• No proprietary database required
• Leverage your existing IT investment
• Scales to enterprise volumes with SQL pushback in-database scoring
Mining Methods in IBM SPSS Modeler 14
Data Preparation
• Dimension Reduction:
– Feature Selection
– Principal Components Analysis
– Factor Analysis
Classification and Regression
• Naïve Bayes
• Bayesian Networks
• Trees:
– CHAID
– C5.0
– C&RT
– QUEST
• Neural Networks
– Multi-Layer Perceptron
– Radial Basis Functions
• Regression
– Binomial, Multinomial Logistic
– Multiple, Multivariate Linear
• Generalized Linear Model
• Discriminant Analysis
• SVM (Support Vector Machine)
Segmentation and Anomaly Detection
• Clustering:
– K-Means
– Kohonen Self-Organizing Maps
– 2-Step (based on BIRCH)
Forecasting & Survival Analysis
• Time Series (ARIMA**)
• Cox Regression
Market Basket & Sequence Analysis
• Association Rules:
– A Priori
– GRI
– CARMA
Case-Based Reasoning
• KNN – K Nearest Neighbor
Getting Closer to 360-degree Customer View:
• Demographics Data
• Web Data
• Text Mining: Comments
• Customer Usage Data
Predict: SPSS Text Analytics
• Leverages unstructured data via call center notes, blogs, web pages, open-ended surveys, etc. to improve predictive model accuracy
• Extracts concepts from text and can categorize them as sentiments
• Strong visualization capabilities enable quick understanding of business issues
Classification and Regression Require a Target Field
and a
[Table columns: Inputs, Target]
Text Analytics adds columns such as the number of calls categorized as a Negative Billing Sentiment (column label: Neg Billg)
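The idea of text analytics adding an input column can be illustrated with a small aggregation. This is a sketch under stated assumptions: the category names, the customer fields (`tenure_months`, `churned`) and the column name `neg_billg` are hypothetical, and the categorized calls stand in for whatever a text-mining step would actually produce.

```python
from collections import Counter

# Hypothetical text-mining output: one (customer_id, category) pair per call note.
categorized_calls = [
    (101, "Negative Billing Sentiment"),
    (101, "Negative Billing Sentiment"),
    (101, "Positive Service Sentiment"),
    (102, "Positive Service Sentiment"),
]

# Hypothetical customer table: inputs plus a target field ("churned").
customers = [
    {"customer_id": 101, "tenure_months": 14, "churned": True},
    {"customer_id": 102, "tenure_months": 40, "churned": False},
]

# Count calls per customer that were categorized as negative billing sentiment.
neg_billing = Counter(
    cid for cid, category in categorized_calls
    if category == "Negative Billing Sentiment"
)

# Add the count as a new input column; customers with no such calls get 0.
for row in customers:
    row["neg_billg"] = neg_billing[row["customer_id"]]

print(customers)
```

The new column then sits alongside the structured inputs, so a classification method can use it like any other field.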
Mining Methods “Learn” from Data
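Data sources:
• Customer Notes: Text Mining (Category = T or F)
• Customer Database: Survey/demographic (Satisfaction = 1-4)
• Web page hits: Web Mining (Event = Y or N)
The sources are combined into Merged Data, which is split 2/3 into Data To Train and 1/3 into Data To Test.
Data To Train feeds the Learning method, which produces the Predictive Model; Data To Test is used to test the Model.
New Data is run through the Predictive Model to produce Scored Predictions.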
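The train, test, and score flow on this slide can be sketched in a few lines. This is an illustration only, not how Modeler works internally: the learner is a deliberately simple one-field threshold ("decision stump"), and the field names `satisfaction` and `churned` and the synthetic records are assumptions.

```python
import random

def train_stump(rows):
    """Pick the satisfaction threshold that classifies the most training rows correctly."""
    best_threshold, best_correct = None, -1
    for threshold in (1, 2, 3, 4):
        correct = sum((r["satisfaction"] <= threshold) == r["churned"] for r in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def predict(threshold, row):
    """Predict churn when satisfaction is at or below the learned threshold."""
    return row["satisfaction"] <= threshold

# Synthetic merged data: low satisfaction (1 or 2) is labeled as churn.
merged = [{"satisfaction": i % 4 + 1} for i in range(30)]
for r in merged:
    r["churned"] = r["satisfaction"] <= 2

# Shuffle, then split 2/3 for training and 1/3 for testing.
random.Random(0).shuffle(merged)
cut = len(merged) * 2 // 3
train_rows, test_rows = merged[:cut], merged[cut:]

# Train on one part, measure accuracy on the held-out part.
model = train_stump(train_rows)
accuracy = sum(predict(model, r) == r["churned"] for r in test_rows) / len(test_rows)

# Score new, unlabeled data with the trained model.
new_rows = [{"satisfaction": 1}, {"satisfaction": 4}]
scored = [predict(model, r) for r in new_rows]
print(model, accuracy, scored)
```

Holding back the test third is what lets the accuracy figure say something about records the learner has not seen.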
Steps in the Data Mining Process
Understand, Prepare, Model, Evaluate, Deploy
• Understand: Connect to data sources (Transactions, 3rd Party, Surveys)
• Prepare: Parse Trx by Mo.; Aggregate call data; Merge (plan & ID); Transform log Trx; Binary, trend; Feature selection
• Model: Define Target & Train Method (Trees, Neural Networks, Regressions, SVM, Bayesian Network)
• Evaluate: Test Method (Gains, accuracy, AUROC, Profit, Contingency matrix)
• Deploy: Predict on new data; Export Results, Model
Also shown: Actions, Attitudes, Attributes; Sales strategy; Subdivide by region, plans, etc.; Data exploration; Anomaly detection
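Three of the evaluation measures named on this slide (contingency matrix, accuracy, AUROC) can be computed directly from actual labels and model scores. The sketch below uses made-up labels and scores and a 0.5 cutoff chosen for illustration; gains and profit are not covered.

```python
def contingency_matrix(actual, predicted):
    """Return counts (tp, fp, fn, tn) for boolean actual and predicted labels."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

def auroc(actual, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for a, s in zip(actual, scores) if a]
    neg = [s for a, s in zip(actual, scores) if not a]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up test-set labels and model scores for illustration only.
actual = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# Turn scores into predictions with an assumed cutoff of 0.5.
predicted = [s >= 0.5 for s in scores]

tp, fp, fn, tn = contingency_matrix(actual, predicted)
accuracy = (tp + tn) / len(actual)
print((tp, fp, fn, tn), accuracy, auroc(actual, scores))
```

Accuracy depends on the chosen cutoff, while AUROC summarizes how well the scores rank positives above negatives across all cutoffs.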
Automated Data Mining Scoring Process
1. Build a Geographic Crime Predictive Model
2. Score the Model on New Data in Your Database
3. Deploy a Map of Hot Spots in the Field
Should I Teach Data Mining Skills in My Department?
“In addition, as the U.S. business environment becomes increasingly competitive and organizations strive to increase efficiency and reduce costs through the use of information technology, computer and mathematical science occupations will see strong employment growth.” -- 2008-2018 Outlook in Monthly Labor Review, Nov. 2009, p.83
Hot Careers for College Graduates 2010: A Special Report for Recent and Mid-Career College Graduates, UC San Diego Extension, May 2010
1. Health Information Technology
2. Clinical Trials Design and Management for Oncology
3. Data Mining
4. Embedded Engineering
5. Feature Writing for the Web
6. Geriatric Health Care
7. Mobile Media
8. Occupational Health and Safety
9. Spanish/English Translation and Interpretation
10. Sustainable Business Practices and the Greening of all Jobs
11. Teaching Adult Learners
12. Teaching English as a Foreign Language
13. Marine Biodiversity and Conservation
14. Health Law
A Sampling of Academic Disciplines Impacted by Data Mining – A Method of Obtaining Knowledge Empirically
• Arts: Music; Language, Linguistics; Writing / Communications
• Political Science / Government: Crime; Public Safety; Election Campaigning
• Physical Education: Athletic Performance
• Engineering Management: Utilities; Petrochemical; Yield & Reliability
• Law: Tax Fraud; Legal Documents
• Education: Admissions; Retention; Performance
• Science: Astronomy; Material Science
• Medicine: Genomic and Proteomic Analysis; Biomarkers; Diagnosis
How Do You Teach It?
I. Foundations
1. Intro
2. Data Preprocessing
3. Data Warehousing and OLAP for Data Mining
4. Association, correlation and frequent pattern analysis
5. Classification
6. Cluster and Outlier Analysis
7. Mining Time-Series and Sequence Data
8. Text Mining and Web Mining
9. Visual Data Mining
10. Data Mining: Industry efforts and social impacts
II. Advanced Topics
1. Advanced Data Preprocessing
2. Data Warehousing, OLAP, Data Generalization
3. Advanced association, correlation and frequent pattern analysis
4. Advanced Classification
5. Advanced cluster analysis
6. Advanced Time-Series and Sequential Data Mining
7. Mining Data Streams
8. Mining Spatial, Spatiotemporal and Multimedia data
9. Mining Biological Data
10. Text Mining
11. Hypertext and Web mining
12. Data Mining Languages
13. Data Mining Applications
14. Data Mining and Society
15. Trends in Data Mining
http://www.sigkdd.org/curriculum/CURMay06.pdf
Textbooks
[Chart: textbooks placed by DIFFICULTY (vertical axis) across four categories (horizontal axis): Statistical, Machine Learning, Practical S/W apps., Business. Authors on the chart: Hastie, Tibshirani & Friedman; Han, Kamber & Pei; Witten & Frank; Tan, Steinbach & Kumar; Larose; Margaret Dunham; Mitchell; Berry & Linoff; Nisbet, Elder & Miner.]
For a copy of the presentation please e-mail: