Data Mining and Knowledge Discovery
for Strategic Business Optimization
Peter van der Putten
ALP Group, LIACS & KiQ Ltd
November 2004
Why is a business in business?
• Successful businesses create a lot of added value for their customers and capture it
  – Maximize long-term profit
• Optimize: maximize sales, minimize costs, minimize risk
Challenges
• Businesses are bigger
• Fragmentation of products, customer interaction channels, market segments
• Fierce competition, chaotic economic climate and dynamic customer behavior
• Data glut & information overflow
• Solution: data mining & knowledge discovery for strategic business optimization
Credit scoring case: minimizing loan risk while maximizing loan acceptance
Accepted volume and infection rate, all applications:
Decision basis                 Accepted   Infection
Expert knowledge               29.8%      12.7%
Prediction model plus rules    34.5%      9.1%
Marketing case: maximizing direct mail response while minimizing cost
[Figure: cumulative gains chart for the logistic regression model. X-axis: cases (%), 0 to 100; y-axis: cum. positive (%), 0 to 100.]
A model was created that predicts the probability that a customer responds to a mailing. By using the model to select which customers to mail, we could reach 50% of the responders by mailing only 20% of all customers.
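The gains figure quoted above can be reproduced from any list of model scores and observed responses. The sketch below is a minimal illustration, not the model from the case: the function name `cumulative_gains` and the ten customers are made up for the example.

```python
def cumulative_gains(scores, responded, fractions=(0.2, 0.5, 1.0)):
    """Return {fraction of customers mailed: fraction of all responders reached}."""
    # Rank customers from highest to lowest predicted response probability.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    total_responders = sum(flag for _, flag in ranked)
    gains = {}
    for fraction in fractions:
        n_mailed = round(fraction * len(ranked))
        reached = sum(flag for _, flag in ranked[:n_mailed])
        gains[fraction] = reached / total_responders
    return gains


# Ten hypothetical customers: model score and whether they actually responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gains = cumulative_gains(scores, responded)
print(gains)  # {0.2: 0.5, 0.5: 0.75, 1.0: 1.0}
```

In this toy data, mailing the top-scoring 20% of customers reaches 50% of the responders, which is the kind of point the chart plots.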
Siebel
OMEGA predicts a slight preference for general insurance and offers a one-click cross-sell button.
Although the next customer might have preferences as well, the exit risk is overriding. Using a combination of predictive models and business rules, OMEGA suggests to Siebel an immediate attempt to retain the customer.
OMEGA offers Siebel the appropriate text for its script engine.
Within general insurance, OMEGA predicts a preference for car insurance and offers one-click access to the appropriate script.
OMEGA again offers Siebel the appropriate text to execute a retention script.
Overview
• Why Data Mining?
• The Data Mining Process
• Data Mining Tasks
• Data Mining Techniques
• Future Outlook
• Data Mining Opportunities by Sector and Function
• Q&A
Some working definitions…
• ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably
• Data mining =
  – the discovery of interesting, meaningful and actionable patterns hidden in large amounts of data
• Multidisciplinary field originating from artificial intelligence, pattern recognition, statistics, machine learning, econometrics, …
Data mining is a process…
• Model Development
  – Objective
  – Data collection & preparation
  – Model construction
  – Model evaluation
  – Combining models with business knowledge into decision logic
• Model / decision logic deployment
• Model / decision logic monitoring
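The step "combining models with business knowledge into decision logic" might look roughly like the sketch below for a loan decision. Everything in it is an assumption made for illustration: the function `decide`, the two business rules, and the 10% risk threshold do not come from the slides.

```python
def decide(applicant, predicted_bad_rate, max_bad_rate=0.10):
    """Combine a model prediction with business rules into one decision."""
    # Hypothetical business rules take precedence over the model.
    if applicant["age"] < 18:
        return "reject: rule (minimum age)"
    if applicant["existing_arrears"]:
        return "reject: rule (existing arrears)"
    # Otherwise the model's predicted risk decides.
    if predicted_bad_rate <= max_bad_rate:
        return "accept"
    return "reject: model (risk too high)"


print(decide({"age": 30, "existing_arrears": False}, predicted_bad_rate=0.05))
print(decide({"age": 30, "existing_arrears": True}, predicted_bad_rate=0.05))
print(decide({"age": 45, "existing_arrears": False}, predicted_bad_rate=0.25))
```

Keeping the rules and the model threshold in one function makes the deployed decision logic easy to monitor: each outcome records whether a rule or the model produced it.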
Data mining tasks
• Undirected, explorative, descriptive, ‘unsupervised’ data mining
  – Matching & search
  – Profile & rule extraction
  – Clustering & segmentation
• Directed, predictive, ‘supervised’ data mining
  – Predictive modeling
Data mining task example: Clustering & segmentation
[Screenshots: Looking Glass at the start, with an intermediate result, and with the final result]
[Figure: a model built from past experience (data plus known good/bad behaviour) places each case on a score scale of 1 to 10 running between "worse business" and "better business"; Case A scores 7 and Case B scores 4.]
Data mining task example: predictive modeling
Income Age Children
60K 38 2
30K 23 1
30K 29 0
... ... ...
120K 55 2
Collected data
Data mining task example: predictive modeling
Income Age Children Status
60K 38 2 Good
30K 23 1 Good
30K 29 0 Bad
... ... ... ...
120K 55 2 Bad
Known customer behaviour
score = (0 x Income) + (-1 x Age) + (25 x Children)
Data mining task example: predictive modeling
Income Age Children Status Value Score
60K 38 2 Good 100 12
30K 23 1 Good 45 2
30K 29 0 Bad -80 -29
... ... ... ... ... ...
120K 55 2 Bad -40 -5
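The Score column follows from the formula score = (0 x Income) + (-1 x Age) + (25 x Children). A minimal sketch that applies it to the four listed rows is shown below; the dictionary layout and the names `WEIGHTS` and `score` are choices made for this example. Evaluated this way, the rows give 12, 2, -29 and -5.

```python
# Weights taken from the formula on the slide.
WEIGHTS = {"income": 0, "age": -1, "children": 25}


def score(customer):
    """Weighted sum of the customer's predictor values."""
    return sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)


# The four rows that are spelled out in the table.
customers = [
    {"income": 60_000, "age": 38, "children": 2},
    {"income": 30_000, "age": 23, "children": 1},
    {"income": 30_000, "age": 29, "children": 0},
    {"income": 120_000, "age": 55, "children": 2},
]

scores = [score(customer) for customer in customers]
print(scores)  # [12, 2, -29, -5]
```

Because the income weight is 0, income has no influence on the score in this example.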
Data mining task example: predictive modeling
• Recruitment
  – Who will respond to a mailing campaign?
  – To whom can we cross-sell which products?
  – What will be the customer value one year from now?
• Retention
  – Who is going to cancel his/her mobile phone subscription? Should I attempt to keep this customer?
  – Which customers have accounts that will go dormant?
• Risk
  – Should I sell a loan to this person?
  – How much money will someone claim on a policy?
  – Is this caller going to pay his bills?
Data mining techniques for predictive modeling
• Linear and logistic regression
• Decision trees
• Neural Networks
• Genetic Algorithms
• …
Linear Regression Models
score = (0 x Income) + (-1 x Age) + (25 x Children)
Regression in pattern space
[Figure: pattern space with axes age and income, containing class ‘circle’ and class ‘square’]
Only a single straight line is available in pattern space to separate the classes
Decision Trees
[Figure: decision tree; each split has a yes branch and a no branch]
20000 customers, response 1%: Income > 150000?
  18800 customers: Purchases > 10?
    etc.
  1200 customers: Balance > 50000?
    800 customers, response 1.8%
    400 customers, response 0.1%
Decision Trees in Pattern Space
[Figure: pattern space with axes age and income]
Line pieces perpendicular to the axes
Each line is a split in the tree, two answers to a question
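To make the idea of an axis-perpendicular split concrete, the sketch below searches a toy data set for the single best split. It is an illustration under stated assumptions: the six customers are invented, and Gini impurity is used as the split criterion because it is a common choice, not because the slides name it.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, feature_names):
    """Try every split 'feature <= threshold'; return (impurity, feature, threshold)."""
    best = None
    for index, name in enumerate(feature_names):
        for threshold in sorted({row[index] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[index] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[index] > threshold]
            if not left or not right:
                continue  # not a real split
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, name, threshold)
    return best


# Hypothetical customers: (age, income in thousands) and whether they responded.
rows = [(25, 30), (32, 40), (41, 45), (38, 160), (52, 170), (29, 180)]
labels = [0, 0, 0, 1, 1, 1]

best = best_split(rows, labels, ["age", "income"])
print(best)  # (0.0, 'income', 45)
```

A full decision tree repeats this search inside each resulting group, which is why its class boundaries consist of line pieces perpendicular to the axes.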
Infotrees (Genetic Programming)
• Nested regression formulas
  – sum(average(region, spend), max(age, children))
[Figure: the formula drawn as a tree, with sum at the root and the sub-trees average(region, spend) and max(age, children) below it]
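A nested formula such as sum(average(region, spend), max(age, children)) can be stored as a tree and evaluated recursively. The sketch below is one possible representation, not the Infotree implementation itself: the tuple encoding, the `OPERATORS` table, and the numeric coding of `region` are assumptions for the example.

```python
from statistics import mean

# Operators available to the nested formulas (only the three used on the slide).
OPERATORS = {
    "sum": sum,
    "average": mean,
    "max": max,
}


def evaluate(node, customer):
    """A leaf is a predictor name; an inner node is (operator, child, child, ...)."""
    if isinstance(node, str):
        return customer[node]
    operator, *children = node
    return OPERATORS[operator]([evaluate(child, customer) for child in children])


infotree = ("sum", ("average", "region", "spend"), ("max", "age", "children"))

# Hypothetical customer; region is assumed to be numerically coded.
customer = {"region": 3, "spend": 120, "age": 38, "children": 2}

result = evaluate(infotree, customer)
print(result)  # 99.5
```

Here average(3, 120) is 61.5 and max(38, 2) is 38, so the tree evaluates to 99.5.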
Infotrees in Pattern Space
[Figure: pattern space with axes age and income]
Infotrees can separate any class in pattern space, even if the class boundary is non-linear
Can model complex customer behavior
Genetic Algorithms / Programming
• How to find the best Infotree? Genetic algorithms
  – Based on the idea of evolution
  – Start with (random) Infotrees
  – Build a new generation
    • Fittest models can reproduce to create offspring, worst models die
    • Small amount of mutation occurs to keep exploring
  – Repeat process
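The loop described above can be sketched in a few lines. To stay short, this example evolves the two weights of a linear score instead of full Infotrees, which is a deliberate simplification; the training data, population size, and mutation rate are all hypothetical.

```python
import random

# Hypothetical training data: (age, children, is_good_customer).
DATA = [(38, 2, 1), (23, 1, 1), (29, 0, 0), (55, 2, 0), (41, 3, 1), (60, 1, 0)]


def fitness(weights):
    """Number of customers classified correctly by the rule 'score > 0 means good'."""
    w_age, w_children = weights
    return sum(((w_age * age + w_children * kids) > 0) == bool(good)
               for age, kids, good in DATA)


def evolve(generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    # Start with random models.
    population = [(rng.uniform(-2, 2), rng.uniform(-50, 50))
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Fittest models survive and reproduce; the worst half dies.
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        survivors = population[: population_size // 2]
        offspring = []
        for _ in range(population_size - len(survivors)):
            mother, father = rng.sample(survivors, 2)
            child = (mother[0], father[1])  # cross-over: one part from each parent
            if rng.random() < 0.2:  # a small amount of mutation keeps exploring
                child = (child[0] + rng.gauss(0, 0.5), child[1] + rng.gauss(0, 5))
            offspring.append(child)
        population = survivors + offspring
    population.sort(key=fitness, reverse=True)
    return population[0], best_per_generation


best_model, history = evolve()
print(best_model, fitness(best_model), history[0], history[-1])
```

Because the survivors are carried over unchanged, the best fitness in the population never gets worse from one generation to the next.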
Notes about Infotree models: Cross-over
• New models can be created by cross-over:
  – part of one model is swapped with part of another
  – parts may be chosen randomly or intelligently
[Figure: two old models, each with a marked cross-over point, are combined into a new model by exchanging the sub-trees at those points]
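Cross-over on tree-shaped models can be sketched as copying a sub-tree from one model into a chosen position of another. The code below is an illustration only: the tuple encoding, the path convention (a tuple of child indices, where index 0 of a node is its operator), and the second model are assumptions made for the example.

```python
def get_subtree(tree, path):
    """Follow a path of child indices from the root down to a sub-tree."""
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new_subtree):
    """Return a copy of tree in which the sub-tree at path is replaced."""
    if not path:
        return new_subtree
    index = path[0]
    replaced = replace_subtree(tree[index], path[1:], new_subtree)
    return tree[:index] + (replaced,) + tree[index + 1:]


def cross_over(receiver, receiver_path, donor, donor_path):
    """Build a new model: the receiver with one part swapped for a part of the donor."""
    return replace_subtree(receiver, receiver_path, get_subtree(donor, donor_path))


# Two old models; a node is (operator, child, child, ...), a leaf is a predictor name.
model_a = ("sum", ("average", "region", "spend"), ("max", "age", "children"))
model_b = ("max", ("sum", "salary", "age"), "children")

# Cross-over points: the first child of each model.
new_model = cross_over(model_a, (1,), model_b, (1,))
print(new_model)
# ('sum', ('sum', 'salary', 'age'), ('max', 'age', 'children'))
```

The cross-over points are passed in explicitly here; in a genetic algorithm they would be chosen randomly or by some heuristic.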
Notes about Infotree models: Mutation
• New models can be created by mutation:
  – part of a model (a sub-tree, operator or predictor) is changed
  – part and type of change may be chosen randomly or intelligently
[Figure: three example mutations of the same model, changing a sub-tree, an operator, and a predictor respectively]
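The three kinds of mutation (sub-tree, operator, predictor) can all be expressed as replacing one element of the tree. The sketch below assumes a tuple encoding in which index 0 of a node holds its operator; the function name `mutate` and the example model are hypothetical and do not reproduce the models in the figure.

```python
def mutate(tree, path, replacement):
    """Return a copy of tree in which the element at path is replaced."""
    if not path:
        return replacement
    index = path[0]
    changed = mutate(tree[index], path[1:], replacement)
    return tree[:index] + (changed,) + tree[index + 1:]


# A node is (operator, child, child, ...); a leaf is a predictor name.
model = ("sum", ("average", "region", "spend"), ("max", "age", "children"))

# Sub-tree mutation: replace a whole branch.
subtree_mutant = mutate(model, (1,), ("max", "house", "region"))
# Operator mutation: index 0 of a node is its operator.
operator_mutant = mutate(model, (1, 0), "max")
# Predictor mutation: replace a single leaf.
predictor_mutant = mutate(model, (1, 1), "house")

print(subtree_mutant)
print(operator_mutant)
print(predictor_mutant)
```

Which part is changed, and how, is fixed by hand in this example; a genetic algorithm would pick both randomly or by some heuristic.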
Short Demo (if time allows…)
Model to predict caravan policy ownership
Combining this model with other models and business rules
Data Mining: the Future
• Business (marketing)
  – More fine-grained segmentation down to the cluster or individual level
  – More personalised actions, inbound and outbound, in all customer contact channels
  – Optimization of both value for the business and the customer
  – Privacy
• Technical
  – From Data Mining to Decisioning, combining multiple models with business rules
  – Monitoring business and model performance
  – Data Mining Process Automation
Let’s discuss: Data Mining Opportunities by Function
• Marketing, Sales, CRM
• Product Development, R&D
• Manufacturing, Production, Logistics
• Customer service
• Finance
• Procurement
• Human Resources
• IT
• …
Let’s discuss: Data Mining Opportunities by Sector
• Retail
• Telco
• Pharma
• Government
• Automotive
• Oil
• Charity
• Consumers / Citizens
• …
The Paper: Requirements
• 2500 words ±10%, APA style references
• No plagiarism / copying! Rephrase in your own words, reference, cite & quote
• Two parts of 1250 words each
  – Your grasp of the research topic: what is data mining? Own interpretation, clear, put into context
  – Memo to CEO/CIO of a specific company / industry: what are the benefits/changes/opportunities and next steps (best practice, proof of concept)? Impact, convincing, plan of action.
The Paper: Suggestions
• Suggestions for ‘companies’
  – KPN Mobile, Marketing: how to reduce loss of customers to competitors
  – Dutch Police, Strategic Innovation: opportunities for law enforcement, privacy implications
  – Pfizer, Drug Discovery: using data mining to find new drugs
  – Google, Product Management / R&D: opportunities for new data mining features to enhance customer experience
  – Your Idea!
The Paper: Resources
• Webpage for this talk:
  – http://www.liacs.nl/~putten/ictvision.html
• General Writing Resources:
  – http://www.liacs.nl/~putten/writingpapers.html
• Homepage:
  – www.liacs.nl/~putten, mail [email protected]
Dilbert’s Perspective on Data Mining