Predictive Analytics: No Crystal Ball Required

Business Analytics
Steve Barbee, MS Data Mining, MS Plasma Physics
IBM SPSS Predictive Analytics Specialist
June 15, 2010
© 2010 IBM Corporation


Contents

• What is Predictive Analytics?
  – Right Time, High Priority
  – Definitions
  – Disciplines
  – vs. Statistics
  – Datasets
  – vs. BI Methods
• What Does It Do, Where Is It Applied?
  – Questions It Answers
  – Application Areas
  – IBM's Large Investment
• How Does It Work?
  – Modeler Data Mining Workbench
  – Mining Methods
  – Text Mining
  – Training a Learning Machine
  – Breadth of Data
  – Scoring Large Datasets
• How Do You Teach It?
  – Hot Jobs
  – Disciplines
  – Curriculum
  – Textbooks


The Time is Still Right for Analytics

• Executives are looking for new sources of advantage and differentiation

• They have more data about their businesses than ever before

• A new generation of technically literate executives is coming into organizations

• The ability to make sense of data through computers and software has finally come of age

Tom Davenport & Jeanne Harris, Competing on Analytics, p.11

Top Four of the Ten Most Important Visionary Plan Elements
(Interviewed CIOs could select as many as they wanted)

Source: IBM Global CIO Study 2009; n = 2345

BI/Analytics: #1 investment to improve competitiveness

What is Data Mining?

• “…the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules” -- Berry & Linoff*

• “…the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.” -- Gartner Group

Predictive analytics

• “Predictive analytics is a set of business intelligence technologies that uncovers relationships and patterns within large volumes of data that can be used to predict behavior and events.” -- TDWI Research**

* From Data Mining Techniques: For Marketing, Sales & Customer Support, Michael J.A. Berry & Gordon Linoff, p.5

** “Predictive Analytics,” What Works in Data Integration, TDWI Research, Vol. 23, 2007, p.49

Some Fields Contributing To Data Mining

Fields: Artificial Intelligence, Machine Learning, Statistics, Information Retrieval, Databases

Methods and concepts shown in the diagram: Neural Networks, ML Perceptron, Genetic Algorithm, Kohonen SOM, Decision Tree, Clustering, K-Means clustering, EM algorithm, Bayes (Naïve & Nets), Maximum Likelihood Estimate, Regression analysis, Resampling, Jackknife, Bias reduction, Linear classification, Exploratory data analysis, Similarity Measures, SMART IR systems, Relational Data Model, Association Rules, Batch & OLAP reports, Data Warehousing

Based on Data Mining: Intro. & Adv. Topics, Margaret H. Dunham, p.13

Range of Records and Variables in Data Mining

[Figure: datasets plotted by the common logarithm of the number of records (vertical axis, 0 to 10) against the common logarithm of the number of variables (horizontal axis, 0 to 7). Labeled regions: "Narrow & Deep" and "Wide & Shallow". Labeled datasets: Retail sales, Genomics, Semiconductor Manufacturing, Proteomics.]

Modified from S. Barbee thesis: http://web.ccsu.edu/datamining/data%20mining%20theses/steve%20barbee%20thesis1905.pdf

Time to End the "Two Cultures"* Clash

Top-Down Approaches: Query, Search

• A statistical approach can involve a user forming a theory about a possible relationship in a database, converting that theory to a hypothesis, and testing that hypothesis using a statistical method. It is a manual, user-driven, top-down approach to data analysis.

Bottom-Up Approaches: Data Mining, Text Mining

• The difference with data mining (which includes multivariate statistical models!) is that the interrogation of the data is done by the data mining method rather than by the user. It is a data-driven, self-organizing, bottom-up approach to data analysis.

Source: DM Review

Statisticians can use their favorite methods from within Modeler 14, and data miners can broaden their capabilities by invoking statistical methods from Statistics 18.

* "Statistical Modeling: The Two Cultures," Leo Breiman, Statistical Science, 2001, Vol. 16(3), pp. 199-231.
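
To make the top-down versus bottom-up contrast concrete, here is a minimal sketch of a data-driven search: a one-level decision tree (a "decision stump"). Rather than the user proposing which variable and cut-off to test, the code tries every input and every observed value and keeps the split that misclassifies the fewest records. The toy records, the field names and the error-count criterion are all hypothetical choices for illustration, not a Modeler algorithm.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]

def best_stump(rows: List[Record], inputs: List[str], target: str) -> Tuple[str, float, int]:
    """Try every input and every observed value as a cut-off, predict
    target = 1 when value > cut-off, and keep the split with the fewest
    misclassified rows (ties keep the first split found)."""
    best = ("", 0.0, len(rows) + 1)
    for field in inputs:
        for cut in sorted({r[field] for r in rows}):
            errors = sum((r[field] > cut) != (r[target] == 1) for r in rows)
            if errors < best[2]:
                best = (field, cut, errors)
    return best

# Hypothetical toy data: two candidate inputs and a 0/1 target.
rows = [
    {"vo2_decile": 9, "diet": 2, "qualified": 1},
    {"vo2_decile": 7, "diet": 4, "qualified": 1},
    {"vo2_decile": 6, "diet": 1, "qualified": 1},
    {"vo2_decile": 4, "diet": 4, "qualified": 0},
    {"vo2_decile": 2, "diet": 3, "qualified": 0},
]
print(best_stump(rows, ["vo2_decile", "diet"], "qualified"))  # ('vo2_decile', 4, 0)
```

The user never states the hypothesis "VO2 decile above 4 predicts qualification"; the search produces it, which is the sense in which the method, not the analyst, interrogates the data.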

The Kinds of Questions that Data Mining Can Answer

• Based on the percussion beat, what genre of music is this?

• Which books of the New Testament have the same author?

• What class of astronomical object does this image show?

• Which genes are expressed when drug B prevents the rejection of a transplanted organ?

• Which transformer in a grid is likely to fail due to a breakdown of its dielectric?

• What combination of repair parts is needed at worldwide aircraft service centers?

• To which of 4 products will a customer respond in a marketing campaign?

• How many units of a costume should store #7005 stock for Halloween this year?

• Which annuity holders will surrender their policies prematurely?

• Which physicians will prescribe more of this acid reflux drug than of an alternative?

Application Areas

• Neonatal Care
• Trading Advantage
• Law Enforcement
• Radio Astronomy
• Environment
• Telecom
• Manufacturing
• Smart Traffic
• Fraud Prevention

IBM is Investing to Accelerate an Information-Led Transformation

• Over $12B in software investments since 2005

• Over 4,000 Dedicated Consultants

• Analytics in a Box to Accelerate Time to Value

• Largest Math Department in Private Industry

“IBM, not SAP or Oracle, is now the industry's premo analytics solution/platform vendor…”

Some Business Analytics Methods Compared

Query/Reporting
• Hypothesis-driven
• Manual
• Example: ‘Which training regimen increases the lactate threshold the most?’
• Output: Reports & Graphs

OLAP
• Hypothesis-driven
• Manual
• Example (cube dimensions Training and Diet): ‘Drill down Training = 5 and Diet = 4 and VO2 = 9th decile’
• Output: Reports & Graphs

Data Mining
• Data- & Goal-driven
• Creates Hypotheses
• Automatic
• Example: Rule 3 for ‘Athlete Qualified’: VO2 Max > 5th decile and Interval Training Regimen in {1-5, 7-10} results in 100% Qualified for 83 athletes
• Output: Scoring Model
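
Rule 3 above is reported with a coverage of 83 athletes and 100% qualified. A rule's coverage (how many records satisfy its conditions) and confidence (the share of covered records with the predicted outcome) can be checked directly. The sketch below assumes hypothetical field names, reads "VO2 Max > 5th decile" as a decile number above 5, and uses toy records rather than the slide's data.

```python
from typing import Callable, Dict, List, Tuple

Athlete = Dict[str, int]

# Regimens 1-5 and 7-10, i.e. everything from 1 to 10 except 6.
ALLOWED_REGIMENS = set(range(1, 6)) | set(range(7, 11))

def rule_3(a: Athlete) -> bool:
    """Conditions of Rule 3 (field names are hypothetical)."""
    return a["vo2_decile"] > 5 and a["regimen"] in ALLOWED_REGIMENS

def coverage_and_confidence(rule: Callable[[Athlete], bool],
                            data: List[Athlete], target: str) -> Tuple[int, float]:
    """Return (number of records the rule covers, fraction of them with target = 1)."""
    covered = [a for a in data if rule(a)]
    if not covered:
        return 0, 0.0
    hits = sum(a[target] for a in covered)
    return len(covered), hits / len(covered)

athletes = [  # hypothetical records, not the 83 athletes on the slide
    {"vo2_decile": 9, "regimen": 3, "qualified": 1},
    {"vo2_decile": 7, "regimen": 8, "qualified": 1},
    {"vo2_decile": 8, "regimen": 6, "qualified": 0},  # regimen 6 is excluded
    {"vo2_decile": 4, "regimen": 2, "qualified": 0},  # VO2 decile too low
    {"vo2_decile": 6, "regimen": 10, "qualified": 1},
]
print(coverage_and_confidence(rule_3, athletes, "qualified"))  # (3, 1.0)
```

The same two quantities (under the names support and confidence) are what association-rule methods report, so this check applies equally to rules produced by a tree or by market basket analysis.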

IBM Analytics Landscape

[Figure: analytics capabilities plotted by Complexity (horizontal axis) against Competitive Edge (vertical axis): Querying, Reporting, OLAP; Simulation, Alerts; Predictive Analytics; Optimization.]

Based on: Competing on Analytics, Davenport and Harris, 2007

IBM SPSS Product Areas

SPSS Modeler Capabilities

• Easy to Learn / Visual Design Paradigm

• Visual approach: no code to write!

• Comprehensive range of data mining methods

• Powerful Automated Modeling

• Automatically prepares data

• Automatically finds the best model

• Mines text, web & survey data

• Fully integrated with Statistics

• Open & Scalable architecture

• No proprietary database required

• Leverage your existing IT investment

• Scales to enterprise volumes with SQL pushback and in-database scoring

Mining Methods in IBM SPSS Modeler 14

Data Preparation
• Dimension Reduction:
  – Feature Selection
  – Principal Components Analysis
  – Factor Analysis

Classification and Regression
• Naïve Bayes
• Bayesian Networks
• Trees:
  – CHAID
  – C5.0
  – C&RT
  – QUEST
• Neural Networks:
  – Multi-Layer Perceptron
  – Radial Basis Functions
• Regression:
  – Binomial, Multinomial Logistic
  – Multiple, Multivariate Linear
• Generalized Linear Model
• Discriminant Analysis
• SVM (Support Vector Machine)

Segmentation and Anomaly Detection
• Clustering:
  – K-Means
  – Kohonen Self-Organizing Maps
  – 2-Step (based on BIRCH)

Forecasting & Survival Analysis
• Time Series (ARIMA)
• Cox Regression

Market Basket & Sequence Analysis
• Association Rules:
  – Apriori
  – GRI
  – CARMA

Case-Based Reasoning
• KNN (K Nearest Neighbor)
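
KNN, listed above under Case-Based Reasoning, classifies a new case by a majority vote among the k most similar stored cases. The sketch below is a generic textbook version assuming Euclidean distance, numeric inputs and a plain majority vote; it is not the Modeler implementation, and the churn cases are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Case = Tuple[Sequence[float], str]  # (input values, class label)

def knn_predict(cases: List[Case], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k stored cases nearest to query."""
    nearest = sorted(cases, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical cases: (monthly minutes, dropped calls) -> churn label.
cases = [
    ((100.0, 1.0), "stay"),
    ((120.0, 0.0), "stay"),
    ((110.0, 2.0), "stay"),
    ((30.0, 9.0), "churn"),
    ((25.0, 7.0), "churn"),
]
print(knn_predict(cases, (35.0, 8.0)))  # churn
```

Because distance is computed on raw values, the input with the larger range (minutes here) dominates; in practice inputs are usually rescaled to comparable ranges before the distance is computed.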

Getting Closer to a 360-degree Customer View:

• Demographics Data
• Customer Usage Data
• Web Data
• Text Mining: Comments

Predict: SPSS Text Analytics

• Leverages unstructured data, such as call center notes, blogs, web pages and open-ended surveys, to improve predictive model accuracy

• Extracts concepts from text and can categorize them as sentiments

• Strong visualization capabilities enable quick understanding of business issues

Classification and Regression Require a Target Field

[Figure: modeling table with a Target column and Inputs columns, including a "Neg Billg" (Negative Billing) column]

Text Analytics adds columns such as the number of calls categorized as a Negative Billing Sentiment.
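
The idea of text analytics adding an input column can be sketched as follows. This is not how SPSS Text Analytics extracts concepts: a naive keyword match stands in for concept extraction and sentiment categorization, and the keyword lists and field names (customer_id, neg_billing_calls, churned) are hypothetical. Here churned plays the role of the Target and the other columns are Inputs.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical keyword lists; a real text analytics engine uses linguistic
# concept extraction rather than simple substring matching.
BILLING_TERMS = ("bill", "charge", "invoice")
NEGATIVE_TERMS = ("wrong", "overcharged", "angry", "dispute")

def is_negative_billing(note: str) -> bool:
    """Flag a note that mentions billing together with a negative term."""
    text = note.lower()
    return any(t in text for t in BILLING_TERMS) and any(t in text for t in NEGATIVE_TERMS)

def add_neg_billing_column(customers: List[Dict], notes: List[Tuple[int, str]]) -> List[Dict]:
    """Count negative-billing notes per customer and add the count as an input column."""
    counts = Counter(cid for cid, note in notes if is_negative_billing(note))
    return [{**c, "neg_billing_calls": counts.get(c["customer_id"], 0)} for c in customers]

customers = [
    {"customer_id": 1, "tenure_months": 24, "churned": 0},
    {"customer_id": 2, "tenure_months": 3, "churned": 1},
]
notes = [
    (2, "Customer angry about wrong charge on bill"),
    (2, "Disputes invoice again, says overcharged"),
    (1, "Asked how to change plan"),
]
for row in add_neg_billing_column(customers, notes):
    print(row)
```

Customer 2 ends up with neg_billing_calls = 2 and customer 1 with 0, so the unstructured notes now contribute a numeric input that any classification or regression method can use.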

Mining Methods “Learn” from Data

• Customer Notes → Text Mining (Category = T or F)
• Customer Database → Survey/demographic (Satisfaction = 1 to 4)
• Web page hits → Web Mining (Event = Y or N)
• The three sources are combined into Merged Data, which is split 2/3 into Data To Train and 1/3 into Data To Test.
• Data To Train → Learning method → Predictive Model; Data To Test is used to test the model.
• New Data → Predictive Model → Scored Predictions
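
The 2/3 and 1/3 partition above is a holdout split: the learning method sees only the training portion, and the held-out third estimates how the model will perform on new data. A minimal sketch, assuming a random shuffle with a fixed seed so the split is reproducible; the function name and the stand-in records are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def holdout_split(records: Sequence[T], train_fraction: float = 2 / 3,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and split them into train and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

merged = list(range(300))  # stand-in for 300 merged customer records
train, test = holdout_split(merged)
print(len(train), len(test))  # 200 100
```

Shuffling before cutting matters because merged data is often sorted (by customer ID, date or region), and a straight cut would give the two sets different populations.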

Steps in the Data Mining Process

Understand → Prepare → Model → Evaluate → Deploy

• Understand: connect to data sources (transactions, 3rd party, surveys; actions, attitudes, attributes); data exploration
• Prepare: parse transactions by month; aggregate call data; merge (plan & ID); transform (log of transactions, binary, Δ trend); feature selection; subdivide by region, plans, etc.; anomaly detection
• Model: define target & training method; trees, neural networks, regressions, SVM, Bayesian network
• Evaluate: test method; gains, accuracy, AUROC, profit, contingency matrix
• Deploy: predict on new data; export results and model; sales strategy
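
Three of the evaluation measures listed under Evaluate can be computed in a few lines. The sketch assumes a 0/1 target, a 0.5 cut-off for turning scores into predictions, and hypothetical scores; AUROC is computed as the probability that a randomly chosen positive record scores higher than a randomly chosen negative one (ties count one half). Gains and profit charts are not covered here.

```python
from typing import Dict, Sequence

def contingency_matrix(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Counts of true/false positives and negatives for a 0/1 target."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(a == 1 and p == 1 for a, p in pairs),
        "FP": sum(a == 0 and p == 1 for a, p in pairs),
        "FN": sum(a == 1 and p == 0 for a, p in pairs),
        "TN": sum(a == 0 and p == 0 for a, p in pairs),
    }

def accuracy(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of records whose prediction matches the actual value."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def auroc(actual: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]  # hypothetical model scores
predicted = [1 if s >= 0.5 else 0 for s in scores]
print(contingency_matrix(actual, predicted))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
print(accuracy(actual, predicted), auroc(actual, scores))
```

Accuracy and the contingency matrix depend on the chosen cut-off, while AUROC uses only the ranking of the scores, which is why the two kinds of measure are usually reported together.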

Automated Data Mining Scoring Process

1. Build a Geographic Crime Predictive Model
2. Score the Model on New Data in Your Database
3. Deploy a Map of Hot Spots in the Field
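
Scoring inside the database (the SQL pushback and in-database scoring listed among Modeler's capabilities) means the fitted model is expressed as SQL so that rows never leave the database. The sketch below assumes a hypothetical two-rule crime model already translated into a CASE expression, and uses Python's built-in sqlite3 module in place of an enterprise database; the table, columns and thresholds are illustrative only and are not SQL generated by Modeler.

```python
import sqlite3

# Hypothetical model, already translated to SQL: the rules of a small tree.
SCORING_SQL = """
SELECT grid_cell,
       CASE
         WHEN prior_incidents >= 5 AND night_events >= 2 THEN 'hot spot'
         WHEN prior_incidents >= 3 THEN 'watch'
         ELSE 'low'
       END AS predicted_risk
FROM new_area_data
ORDER BY grid_cell
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE new_area_data (grid_cell TEXT, prior_incidents INTEGER, night_events INTEGER)"
)
conn.executemany(
    "INSERT INTO new_area_data VALUES (?, ?, ?)",
    [("A1", 7, 3), ("A2", 4, 0), ("B1", 1, 1)],
)
scored = conn.execute(SCORING_SQL).fetchall()
print(scored)  # [('A1', 'hot spot'), ('A2', 'watch'), ('B1', 'low')]
conn.close()
```

The scored rows (grid cell plus predicted risk) are what a mapping tool would then render as hot spots for officers in the field.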

Should I Teach Data Mining Skills in My Department?

“In addition, as the U.S. business environment becomes increasingly competitive and organizations strive to increase efficiency and reduce costs through the use of information technology, computer and mathematical science occupations will see strong employment growth.” -- 2008–2018 Outlook in Monthly Labor Review, Nov. 2009, p.83

Hot Careers for College Graduates 2010: A Special Report for Recent and Mid-Career College Graduates, UC San Diego Extension, May 2010

1. Health Information Technology
2. Clinical Trials Design and Management for Oncology
3. Data Mining
4. Embedded Engineering
5. Feature Writing for the Web
6. Geriatric Health Care
7. Mobile Media
8. Occupational Health and Safety
9. Spanish/English Translation and Interpretation
10. Sustainable Business Practices and the Greening of all Jobs
11. Teaching Adult Learners
12. Teaching English as a Foreign Language
13. Marine Biodiversity and Conservation
14. Health Law

A Sampling of Academic Disciplines Impacted by Data Mining – A Method of Obtaining Knowledge Empirically

• Arts: Music; Language, Linguistics; Writing / Communications
• Political Science / Government: Crime; Public Safety; Election Campaigning
• Physical Education: Athletic Performance
• Engineering Management: Utilities; Petrochemical; Yield & Reliability
• Law: Tax Fraud; Legal Documents
• Education: Admissions; Retention; Performance
• Science: Astronomy; Material Science
• Medicine: Genomic and Proteomic Analysis; Biomarkers; Diagnosis

How Do You Teach It?

I. Foundations
1. Intro
2. Data Preprocessing
3. Data Warehousing and OLAP for Data Mining
4. Association, correlation and frequent pattern analysis
5. Classification
6. Cluster and Outlier Analysis
7. Mining Time-Series and Sequence Data
8. Text Mining and Web Mining
9. Visual Data Mining
10. Data Mining: Industry efforts and social impacts

II. Advanced Topics
1. Advanced Data Preprocessing
2. Data Warehousing, OLAP, Data Generalization
3. Advanced association, correlation and frequent pattern analysis
4. Advanced Classification
5. Advanced cluster analysis
6. Advanced Time-Series and Sequential Data Mining
7. Mining Data Streams
8. Mining Spatial, Spatiotemporal and Multimedia data
9. Mining Biological Data
10. Text Mining
11. Hypertext and Web mining
12. Data Mining Languages
13. Data Mining Applications
14. Data Mining and Society
15. Trends in Data Mining

http://www.sigkdd.org/curriculum/CURMay06.pdf

Textbooks

[Figure: data mining textbooks arranged by DIFFICULTY (vertical axis) and by orientation (Machine Learning, Statistical, Practical S/W apps., Business).]

• Hastie, Tibshirani & Friedman
• Han, Kamber & Pei
• Witten & Frank
• Tan, Steinbach & Kumar
• Larose
• Margaret Dunham
• Mitchell
• Berry & Linoff
• Nisbet, Elder & Miner


For a copy of the presentation, please e-mail:

[email protected]