Data Analytics for Customer Facing Applications
2/25/2008 1
Jaideep Srivastava
Computer Science & Engineering
2/25/2008 © Jaideep Srivastava 2
Presentation Outline
• Technology trends
• Customer facing applications
• Status of CRM efforts
• Analytical CRM
  - Customer segmentation
  - Customer loyalty
  - Customer retention
• Analytical CRM architecture
  - Data warehouse
  - Dimensional data modeling
  - On-line analytical processing (OLAP)
• Data mining
• Amazon.com: case study in building customer loyalty
• Analytics behind e-marketing
• Yodlee.com: case study in web business intelligence
• Privacy issues
• Conclusion
Technology Trends
• Internet growth
  - Faster than any other infrastructure
• Data collection
  - Rapid drop in storage costs
  - Dramatic improvement in resolution and rate of data collection ‘probes’
• Data analytics
  - Increasing deployment of warehouses
  - Major leap forward in data mining technologies and tools
Becoming possible to really understand what your customers want, even at the individual level!
Infrastructure Adoption in the US
[Chart: adoption of radio, TV, cable, and the Internet in the US, in millions of users (0 to 120), from 1922 to 2000]
Product Marketing – 75 years ago
• Production à la Adam Smith
• “You can have any color as long as it's black” (Ford Motor Co.)
Product Marketing - today
Add the spice of flexibility, courtesy of robotics, computers …
New approach to marketing
FROM: Finding customers that are right for each product
TO: Finding products that are right for each customer
TURN the process through 90 degrees
To achieve this we need to align around:
• Organization and culture
• Business processes and skills
• Measurement and incentives
• Information management
• Technology
“Mass Customization” – B. Joseph Pine
• Mass production: cheap to produce; efficient to produce; uniform features/quality; ‘one size fits all’ approach; optimize production cost
• Customization: expensive to produce; inefficient to produce; customized features; ‘tailor made’ approach; optimize customer satisfaction
• Mass customization: cheap & efficient to produce; customized features; ‘tailor made’ approach; optimize production cost & customer satisfaction
Customer Facing Applications
Consumer marketing
• Campaign management
• Opportunity management
• Web-based encyclopedia, configurator
• Market segmentation
• Lead generation/enhancement/tracking
Customer care & support
• Incident assignment/escalation/tracking/reporting
• Problem management/resolution
• Order management/promise fulfillment
• Warranty/contract management
Field service support
• Work orders, dispatching
• Real-time information transfer to field personnel via mobile technologies
Corporate sales
• Contact management: profiles and history
• Account management, including activities
• Order entry
• Proposal generation
Sales management
• Pipeline analysis, e.g. forecasting
• Sales cycle analysis
• Territory alignment
• Roll-up and drill-down reporting
Status of Customer Relationship Management (CRM) Efforts
Companies are spending mega-budgets on CRM
CRM = software + support services
European CRM expenditure = $1.2B + $3.0B = $4.2B*
UK marketing service industry growing at 17.4% to $7.7B
CRM
• Relationship marketing
• Customer service
• Value added programs
• Loyalty programs
• Culture change
*Hewson Consulting, October 2000
But - satisfaction is declining
And - more customers are complaining
Increasing customer resistance
• 98% of customer solicitations are irrelevant
• 82% of individuals would like to block all marketing access to their own data
• Campaign hit rates and customer loyalty indicators are declining
Consequently
• The ‘best’ customers are being over-communicated to
• Today's less valuable customers are not being developed into tomorrow's ‘best’ customers
• The business potential of the customer base is not being maximized
Solution: Analytical CRM
CRM = Customer Understanding + Relationship Management
Analytics helps in Customer Understanding
Analytics = OLAP, statistical analysis, data mining, etc.
Example Customer Facing Applications Helped by Analytical CRM
Customer segmentation
Customer loyalty building
Customer retention/recovery
Customer segmentation
The purpose of segmentation is to identify groups of customers with similar needs and behavior patterns, so that they can be offered more tightly focused:
• Products
• Services
• Communications
Segments should be:
• Identifiable
• Quantifiable
• Addressable
• Of sufficient size to be worth addressing
Two approaches to segmentation:
• Cluster common characteristics, and then map out behavior patterns
• Separate out behavior patterns, then identify segment characteristics
Customer base segmentation
[Figure: customers placed on a grid of actual business (low to high) against potential business (low to high), forming four segments: Retain, Develop, Care & Maintenance, and Observe & Incentivize]
Targeted communication to each segment
Segmentation by value
Express profits as deciles, and ask questions
[Chart: profit (from -1200 to 1200) for each customer decile]
• Who are these customers; what do they look like?
• Middle 60%, either side of break-even: what can we do about these?
• Are these worth keeping? Can we service them with a lower cost channel? What can we do to make this segment profitable?
• Should the focus be on retaining wallet share from segments 8-10? Or on gaining from segments 1-4?
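The decile view above can be sketched in a few lines of Python. This is a minimal sketch, assuming per-customer profit figures are already available; the function name and the sample data are hypothetical. Customers are sorted from least to most profitable so that decile 10 holds the most profitable customers, which is our reading of the "segments 8-10" question rather than something the slide states.

```python
def profit_deciles(profits):
    """Split per-customer profits into 10 equal-count deciles.

    Customers are sorted from least to most profitable, so decile 1 holds
    the least profitable customers and decile 10 the most profitable.
    Returns a list of (decile_number, customer_count, total_profit) tuples.
    """
    ranked = sorted(profits)
    n = len(ranked)
    result = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        result.append((d + 1, len(chunk), sum(chunk)))
    return result


# Hypothetical profits for 20 customers (negative values are loss-making customers).
profits = [-900, -400, -150, -60, -20, 0, 5, 10, 15, 20,
           25, 30, 40, 60, 90, 150, 250, 400, 700, 1100]
for decile, count, total in profit_deciles(profits):
    print(f"decile {decile:2d}: {count} customers, profit {total}")
```

With fewer than 10 customers some deciles would be empty; real analyses would also report each decile's share of total profit.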
Customer loyalty: close relationships are more profitable
Relationship intensity and defection odds
Evidence suggests that customer ‘lock in’ occurs once 4 or more products are purchased
Odds of not defecting, by number of products purchased:
• 1 product: 1.1%
• 2 products: 10.2%
• 3 products: 18.1%
• 4 products: 98.3%
A difference of opinion …
[Chart comparing company and customer views; values shown: 70%, 90%, 32%, 2%]
Company view:
• Customers are happy with our customer service
• We research customer service needs and wants as part of our customer service improvement
Customer view:
• Customer service needs no improvement
• Customer service today is better than ever
… and action
[Chart comparing company and customer views; values shown: 98%, 43%, 7%]
Company view:
• We want to develop a relationship with our customers
Customer view:
• We want to form and develop a relationship with our suppliers
• The relationship now is stronger than 12 months ago
Increasing propensity to buy over a customer life cycle
[Chart: customer relationship profitability rising over the customer life cycle]
Actions which build relationship warmth:
• No-fault service
• “Have a nice day”
• Targeted sales
Loyalty is built through a virtuous circle of new customer experience
Virtuous circle of customer experience:
• Superlative customer service
• Provides legitimacy to offer advice
• Individualized and helpful dialog
• Innovative new products
• Excites the customer and builds loyalty
Lifetime Impact of Customer Loyalty
[Chart: value over time, showing customer potential, “maximized” customer value, and “realized” customer value]
Managing Credit-Card Retention in the Pacific Rim
• Behavioral propensity model-based campaigns generate new customers
• Selective score-based phone follow-up more than doubles response
• “Event-driven” (transaction volume & value) campaigns stimulate initial usage of the credit card
• Propensity model + “event-driven” customer retention program identifies likely non-renewers 3 months prior to renewal, and kicks in a usage stimulation program
• Different offers (“Frequent User Club” versus Premium) are being tested
Impact: Over 100% improvement in both Acquisition and Retention. New market opened up.
Using Negative Events to drive Positive Sales
Event = “ATM request for cash” is rejected due to lack of funds.
For credit-worthy customers, unsecured personal loan is offered by mail or phone the next day!
30% acceptance rate of product offered.
Impact: Significant cross-sales of additional product
Significant reduction in negative reactions
Analytical CRM Architecture
Analytical CRM Loop
[Figure: a loop connecting hypothesis generation, action, and results analysis]
Traditional Growth of Functions in an Organization
[Figure: separate channels, each with its own data store: inbound call centre, branch, ATM, fax, kiosk, outbound call centre, WAP, email, 3rd party resellers, web]
THE PRESENT: MULTIPLE CHANNELS & DATA STORES / IMPERSONAL SERVICE
• IMPERSONAL
• LOW QUALITY
• UNINFORMED
• INCONSISTENT
Vision for Customer Driven CRM
THE NEAR FUTURE: MULTIPLE CHANNELS & DATA STORES / PERSONALISED SERVICE
[Figure: the channels connected to common DATA]
• PERSONALISED
• HIGH QUALITY
• INFORMED
• CONSISTENT
Canonical Analytics Architecture
[Figure: data sources (operational DBs, other sources) feed a monitor & integrator that extracts, transforms, loads, and refreshes the data warehouse; the warehouse holds metadata and serves data marts and an OLAP server, on top of which sit tools for analysis, query, reports, and data mining]
Data Warehouse
Data Warehouse
• A decision support database that is maintained separately from the organization's operational database
• A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.
Data Warehouse - Subject Oriented
• Subject oriented: oriented to the major subject areas of the corporation that have been defined in the data model
  - E.g. for an insurance company: customer, product, transaction or activity, policy, claim, account, etc.
• Operational DBs and applications may be organized differently
  - E.g. based on type of insurance: auto, life, medical, fire, ...
Data Warehouse - Integrated
• There is no consistency in encoding, naming conventions, … among different data sources.
• When data is moved to the warehouse, it is converted.
Data Warehouse - Non-Volatile
• Operational data is regularly accessed and manipulated one record at a time, and updates are applied to data in the operational environment.
• Warehouse data is loaded and accessed; updates of data do not occur in the data warehouse environment.
Data Warehouse - Time Variance
• The time horizon for the data warehouse is significantly longer than that of operational systems.
• Operational databases contain current-value data. Data warehouse data is nothing more than a sophisticated series of snapshots, taken as of some moment in time.
• The key structure of operational data may or may not contain some element of time. The key structure of the data warehouse always contains some element of time.
Data Sources
• Data sources are often the operational systems, providing the lowest level of data.
• Data sources are designed for operational use, not for decision support, and the data reflect this fact.
• Multiple data sources are often from different systems, run on a wide range of hardware, and much of the software is built in-house or highly customized.
• Multiple data sources introduce a large number of issues, e.g. semantic conflicts.
Data Cleaning
• It is important to load clean data into the warehouse (operational data from multiple sources is often dirty).
• Three classes of tools:
  - Data migration: allows simple data transformation
  - Data scrubbing: uses domain-specific knowledge to scrub data
  - Data auditing: discovers rules and relationships by scanning data (detects outliers)
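As an illustration of the outlier-detection side of data auditing, here is a minimal sketch in Python. The slide does not say which rule auditing tools use; this sketch swaps in one common choice, the modified z-score of Iglewicz and Hoaglin (median and median absolute deviation), and the function name and sample data are hypothetical.

```python
import statistics


def audit_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a few extreme values cannot hide themselves by inflating the
    mean and standard deviation.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # at least half the values equal the median; this rule cannot score them
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical customer ages merged from two source systems; 999 looks like a sentinel code.
ages = [34, 29, 41, 38, 35, 999, 30]
print(audit_outliers(ages))  # [999]
```

A flagged value is only a candidate for review; deciding that 999 means "unknown" is the kind of domain knowledge a scrubbing tool would supply.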
Load and Refresh
• Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, building indexes, etc.
• Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse
  - When to refresh: determined by usage, types of data source, etc.
  - How to refresh:
    - Data shipping: using triggers to update a snapshot log table and propagate the updated data to the warehouse
    - Transaction shipping: shipping the updates in the transaction log
Monitor
• Detect changes to an information source that are of interest to the warehouse
  - Define triggers in a full-functionality DBMS
  - Examine the updates in the log file
  - Write programs for legacy systems
• Propagate the change in a generic form to the integrator
Integrator
• Receive changes from the monitors
• Make the data conform to the conceptual schema used by the warehouse
• Integrate the changes into the warehouse
  - Merge the data with existing data already present
  - Resolve possible update anomalies
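The monitor/integrator split can be sketched as follows. This is a minimal sketch under stated assumptions: the `Change` record, the `SCHEMA_MAP` field mappings, and the `Integrator` class are all hypothetical. It shows a change arriving in a generic form, being conformed to the warehouse schema, merged with the data already present, and appended as a new time-stamped snapshot (in keeping with the non-volatile, time-variant properties above). Only one simple update anomaly is handled: a change older than the latest snapshot is rejected.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A source change in a generic form, as a monitor might emit it (hypothetical format)."""
    source: str
    key: str
    timestamp: int
    values: dict


# Hypothetical mappings from each source's field names to the warehouse's conceptual schema.
SCHEMA_MAP = {
    "billing": {"cust_nm": "customer_name", "cty": "city"},
    "crm": {"name": "customer_name", "city": "city"},
}


class Integrator:
    """Conform incoming changes and append them to the warehouse as time-stamped snapshots."""

    def __init__(self):
        self.history = {}  # warehouse key -> list of (timestamp, conformed record)

    def apply(self, change):
        mapping = SCHEMA_MAP[change.source]
        # Conform: rename source fields to warehouse names, dropping fields the warehouse does not keep.
        conformed = {mapping[k]: v for k, v in change.values.items() if k in mapping}
        snapshots = self.history.setdefault(change.key, [])
        # Resolve one simple update anomaly: a change older than the latest snapshot is rejected.
        if snapshots and change.timestamp < snapshots[-1][0]:
            return False
        # Merge with the data already present, then append rather than overwrite.
        merged = dict(snapshots[-1][1]) if snapshots else {}
        merged.update(conformed)
        snapshots.append((change.timestamp, merged))
        return True


integrator = Integrator()
integrator.apply(Change("crm", "c1", 100, {"name": "Ann", "city": "Paris"}))
integrator.apply(Change("billing", "c1", 200, {"cty": "Lyon"}))
print(integrator.history["c1"][-1])  # (200, {'customer_name': 'Ann', 'city': 'Lyon'})
```

Production integrators face harder anomalies (late-arriving facts, conflicting sources); rejecting stale changes is just the simplest policy.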
Metadata Repository
Administrative metadata
• Source databases and their contents
• Gateway descriptions
• Warehouse schema, view and derived data definitions
• Dimensions and hierarchies
• Pre-defined queries and reports
• Data mart locations and contents
• Data partitions
• Data extraction, cleansing, transformation rules, defaults
• Data refresh and purge rules
• User profiles, user groups
• Security: user authorization, access control
Metadata Repository
Business data
• Business terms and definitions
• Ownership of data
• Charging policies
Operational metadata
• Data lineage: history of migrated data and sequence of transformations applied
• Currency of data: active, archived, purged
• Monitoring information: warehouse usage statistics, error reports, audit trails
Data Marts
• A data mart (departmental data warehouse) is a specialized system that brings together the data needed for a department or related applications.
• Data marts can be implemented within the data warehouse by creating special, application-specific views.
• Data marts can also be implemented as materialized views: departmental subsets that focus on selected subjects.
• Data marts may use different data representations and include their own OLAP engines.
Other Tools
User interface that allows users to interact with the warehouse:
• Query and reporting tools
• Analysis tools
• Data mining tools
Dimensional Data Modeling
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measurements
• Star schema: a single object (fact table) in the middle connected to a number of objects (dimension tables)
• Snowflake schema: a refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables
• Fact constellations: multiple fact tables share dimension tables
Example of Star Schema
Sales Fact Table
• Keys: Date, Product, Store, Customer
• Measurements: unit_sales, dollar_sales, Yen_sales
Dimension tables:
• Date: Date, Month, Year
• Cust: CustId, CustName, CustCity, CustCountry
• Product: ProductNo, ProdName, ProdDesc, Category, QOH
• Store: StoreID, City, State, Country, Region
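To make the star schema concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Table and column names are adapted from the slide (snake_case, and `date_dim` to avoid a table called `date`); the rows are hypothetical. The query is a typical star join: the fact table joined to two dimension tables and grouped by dimension attributes.

```python
import sqlite3


def build_star_schema():
    """Create the star schema in an in-memory SQLite database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, date TEXT, month INTEGER, year INTEGER);
        CREATE TABLE cust (cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_city TEXT, cust_country TEXT);
        CREATE TABLE product (product_no INTEGER PRIMARY KEY, prod_name TEXT, prod_desc TEXT,
                              category TEXT, qoh INTEGER);
        CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT, region TEXT);
        CREATE TABLE sales_fact (
            date_id INTEGER REFERENCES date_dim(date_id),
            product_no INTEGER REFERENCES product(product_no),
            store_id INTEGER REFERENCES store(store_id),
            cust_id INTEGER REFERENCES cust(cust_id),
            unit_sales INTEGER, dollar_sales REAL, yen_sales REAL);

        INSERT INTO date_dim VALUES (1, '1998-01-05', 1, 1998), (2, '1998-02-10', 2, 1998);
        INSERT INTO cust VALUES (1, 'Ann', 'Minneapolis', 'USA');
        INSERT INTO product VALUES (1, 'TV-1', 'television', 'TV', 5), (2, 'PC-1', 'desktop', 'PC', 3);
        INSERT INTO store VALUES (1, 'Minneapolis', 'MN', 'USA', 'Midwest'),
                                 (2, 'Seattle', 'WA', 'USA', 'West');
        INSERT INTO sales_fact VALUES (1, 1, 1, 1, 2, 800.0, NULL),
                                      (2, 1, 2, 1, 1, 400.0, NULL),
                                      (2, 2, 2, 1, 1, 1200.0, NULL);
    """)
    return conn


def sales_by_category_and_region(conn):
    """Star join: fact table joined to the product and store dimensions, grouped by their attributes."""
    return conn.execute("""
        SELECT p.category, s.region, SUM(f.dollar_sales)
        FROM sales_fact AS f
        JOIN product AS p ON f.product_no = p.product_no
        JOIN store AS s ON f.store_id = s.store_id
        GROUP BY p.category, s.region
        ORDER BY p.category, s.region
    """).fetchall()


print(sales_by_category_and_region(build_star_schema()))
```

In a snowflake schema the same query would need extra joins (for example store to city to state) to reach attributes that the star schema keeps directly in the dimension table.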
Example of Snowflake Schema
Sales Fact Table
• Keys: Date, Product, Store, Customer
• Measurements: unit_sales, dollar_sales, Yen_sales
Dimension tables, with hierarchies normalized into separate tables:
• Date: Date, Month → Month: Month, Year → Year: Year
• Cust: CustId, CustName, CustCity, CustCountry
• Product: ProductNo, ProdName, ProdDesc, Category, QOH
• Store: StoreID, City → City: City, State → State: State, Country → Country: Country, Region
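A Query Model
[Figure: star-net query model with one radial line per dimension; the levels along each line are:]
• Shipping method: AIR-EXPRESS, TRUCK
• Customer orders: CONTRACTS, ORDER
• Customer
• Product: PRODUCT ITEM, PRODUCT LINE, PRODUCT GROUP
• Organization: SALES PERSON, DISTRICT, DIVISION
• Promotion
• Geography: DISTRICT, REGION, COUNTRY
• Time: DAILY, QTRLY, ANNUALLY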
Summary Tables
• A data warehouse may store some selected summary data, i.e. pre-aggregated data.
• Summary data can be stored as separate fact tables sharing the same dimension tables with the base fact table.
• Summary data can be encoded in the original fact table and dimension tables.
Example fact table:
DateID   ProdID   Sales
0        1        1000
1        1        20000
1        2        40000
3        1        300000
Example date dimension table (level 1 = day, 2 = month, 3 = year):
id   level   date   month   year
0    1       1      1       1998
1    2       NULL   1       1998
2    2       NULL   2       1998
3    3       NULL   NULL    1998
Multidimensional Data
Sales volume as a function of product, time, and geography
[Figure: cube with axes Product, Region, and Month]
Dimensions: Product, Region, Time
Hierarchical summarization paths:
• Product: Industry > Category > Product
• Geography: Country > Region > City > Office
• Time: Year > Quarter > Month or Week > Day
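A Sample Data Cube
[Figure: 3-D cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (China, India, Japan, sum); the highlighted cell (TV, China, sum over Date) holds the total annual sales of TV in China]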
On-Line Analytical Processing (OLAP)
Sample Operations
• Roll up: summarize data
  - E.g. total sales volume last year by product category by region
• Roll down, drill down, drill through: go from a higher-level summary to a lower-level summary or detailed data
  - E.g. for a particular product category, find the detailed sales data for each salesperson by date
• Slice and dice: select and project
  - E.g. sales of beverages in the West over the last 6 months
• Pivot: reorient the cube
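The four operations can be illustrated on a tiny in-memory fact list. This is a minimal sketch, not an OLAP server: the records and the helper names `roll_up`, `slice_dice`, and `pivot` are hypothetical, and aggregation is always `SUM(amount)`.

```python
from collections import defaultdict

# Hypothetical detail-level sales records.
SALES = [
    {"category": "beverages", "region": "West", "salesperson": "Lee", "month": 1, "amount": 100.0},
    {"category": "beverages", "region": "West", "salesperson": "Kim", "month": 2, "amount": 50.0},
    {"category": "beverages", "region": "East", "salesperson": "Ray", "month": 2, "amount": 70.0},
    {"category": "snacks", "region": "West", "salesperson": "Lee", "month": 3, "amount": 30.0},
]


def roll_up(rows, *dims):
    """Aggregate `amount` to the level of the given dimensions (fewer dims = coarser summary)."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["amount"]
    return dict(totals)


def slice_dice(rows, **allowed):
    """Select rows whose dimension values are in the allowed sets (one value = slice, several = dice)."""
    return [r for r in rows if all(r[d] in values for d, values in allowed.items())]


def pivot(cells):
    """Reorient a 2-D view by swapping its two dimensions."""
    return {(b, a): v for (a, b), v in cells.items()}


by_cat_region = roll_up(SALES, "category", "region")                                 # roll up
detail = roll_up(slice_dice(SALES, category={"beverages"}), "salesperson", "month")  # drill down
west_bev = roll_up(slice_dice(SALES, category={"beverages"}, region={"West"}))       # slice and dice
print(by_cat_region, detail, west_bev, pivot(by_cat_region), sep="\n")
```

Drilling down here simply re-aggregates the detail rows at a finer grain; an OLAP server would typically answer from precomputed aggregates where it can.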
Cube Operation
SELECT date, product, customer, SUM(amount)
FROM SALES
GROUP BY CUBE (date, product, customer)
Need to compute the following group-bys:
(date, product, customer), (date, product), (date, customer), (product, customer), (date), (product), (customer), and () for the grand total
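What the CUBE operator asks for can be spelled out directly: one group-by per subset of the dimensions, 2^n in total. This is a minimal, naive sketch with hypothetical rows and a hypothetical `cube` helper; it rescans the base rows once per group-by, whereas practical cube algorithms compute each group-by from a smaller, already computed parent in the lattice.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Compute SUM(measure) for every subset of `dims`: 2**len(dims) group-bys, including ()."""
    result = {}
    for k in range(len(dims), -1, -1):
        for group in combinations(dims, k):
            totals = defaultdict(float)
            for r in rows:
                totals[tuple(r[d] for d in group)] += r[measure]
            result[group] = dict(totals)
    return result


# Hypothetical SALES rows.
sales = [
    {"date": "d1", "product": "TV", "customer": "c1", "amount": 10.0},
    {"date": "d1", "product": "PC", "customer": "c2", "amount": 20.0},
    {"date": "d2", "product": "TV", "customer": "c1", "amount": 5.0},
]
c = cube(sales, ("date", "product", "customer"), "amount")
print(len(c))                    # 8
print(c[("product",)])           # {('TV',): 15.0, ('PC',): 20.0}
print(c[()])                     # {(): 35.0}
```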
Cuboid Lattice
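[Figure: lattice of cuboids over dimensions A, B, C, D: the base cuboid (A,B,C,D); the 3-D cuboids (A,B,C), (A,B,D), (A,C,D), (B,C,D); the 2-D cuboids (A,B), (A,C), (A,D), (B,C), (B,D), (C,D); the 1-D cuboids (A), (B), (C), (D); and the apex cuboid (all)]
• A data cube can be viewed as a lattice of cuboids
• The bottom-most cuboid is the base cuboid
• The top-most cuboid, (all), contains only one cell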
Cube Computation -- Array Based Algorithm
• A MOLAP approach: the base cuboid is stored as a multidimensional array
• Read in a number of cells to compute partial cuboids
[Figure: array with dimensions A, B, C and its cuboid lattice: {ABC}; {AB}, {AC}, {BC}; {A}, {B}, {C}; { }]
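A simplified version of the array-based idea is sketched below for three dimensions. It is an assumption-laden sketch, not the full chunked algorithm: there is no chunking or dimension ordering, and the `array_cube` name is hypothetical. It does show the two key moves: one pass over the dense base array updates all three 2-D cuboids at once, and the 1-D cuboids and the apex are derived from those small 2-D results instead of rescanning the base array.

```python
def array_cube(base):
    """Compute all cuboids of a dense 3-D base cuboid stored as base[a][b][c]."""
    na, nb, nc = len(base), len(base[0]), len(base[0][0])
    ab = [[0] * nb for _ in range(na)]
    ac = [[0] * nc for _ in range(na)]
    bc = [[0] * nc for _ in range(nb)]
    # A single scan of the base cuboid feeds {AB}, {AC}, and {BC} simultaneously.
    for a in range(na):
        for b in range(nb):
            for c in range(nc):
                v = base[a][b][c]
                ab[a][b] += v
                ac[a][c] += v
                bc[b][c] += v
    a1 = [sum(row) for row in ab]                               # {A} from {AB}
    b1 = [sum(ab[a][b] for a in range(na)) for b in range(nb)]  # {B} from {AB}
    c1 = [sum(bc[b][c] for b in range(nb)) for c in range(nc)]  # {C} from {BC}
    return {"AB": ab, "AC": ac, "BC": bc, "A": a1, "B": b1, "C": c1, "all": sum(a1)}


base = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]
cuboids = array_cube(base)
print(cuboids["AB"], cuboids["C"], cuboids["all"])  # [[3, 7], [11, 15]] [16, 20] 36
```

The dense layout is what gives MOLAP its direct addressing; it is also why sparse data wastes storage, as the next slide notes.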
ROLAP versus MOLAP
ROLAP
• Exploits services of the relational engine effectively
• Provides additional OLAP services
  - Design tools for DSS schema
  - Performance analysis tool to pick aggregates to materialize
• SQL comes in the way of sequential processing and columnar aggregation
• Some queries are hard to formulate and can often be time consuming to execute
MOLAP
• The storage model is an n-dimensional array
• Front-end multidimensional queries map to server capabilities in a straightforward way
• Direct addressing abilities
• Handling sparse data in array representation is expensive
• Poor storage utilization when the data is sparse
Data Mining
What Is Data Mining?
• Data mining (knowledge discovery in databases): extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information from data in large databases
• Alternative names and their “inside stories”:
  - Data mining: a misnomer?
  - Knowledge discovery in databases (KDD: SIGKDD), knowledge extraction, data archeology, data dredging, information harvesting, business intelligence, etc.
• What is not data mining?
  - (Deductive) query processing
  - Expert systems or small ML/statistical programs
Examples of Interesting Knowledge
• Association rules: 98% of people who purchase diapers also buy beer
• Classification: people with age less than 25 and salary > 40k drive sports cars
• Similar time sequences: stocks of companies A and B perform similarly
• Outlier detection: residential customers of a telecom company with businesses at home
Motivation: “Necessity is the Mother of Invention”
• Data explosion problem: automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
• We are drowning in data, but starving for knowledge!
• Data warehousing and data mining:
  - On-line analytical processing
  - Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Data Mining and Business Intelligence
[Figure: pyramid with increasing potential to support business decisions toward the top; users from top to bottom: end user, business analyst, data analyst, DBA]
Layers, from top to bottom:
• Making decisions
• Data presentation: visualization techniques
• Data mining: information discovery
• Data exploration: OLAP, MDA; statistical analysis, querying and reporting
• Data warehouses / data marts
• Data sources: paper, files, information providers, database systems, OLTP
Data Mining: Confluence of Multiple Disciplines
Database systems, data warehouse and OLAP
Statistics
Machine learning
Visualization
Information science
High performance computing
Other disciplines:
Neural networks, mathematical modeling, information retrieval, pattern recognition, etc.
The Data Mining Process
Data Mining: A KDD Process
• Data mining: the core of the knowledge discovery process
[Figure: databases → data cleaning and data integration → data warehouse → selection → task-relevant data → data mining → pattern evaluation]
Steps of a KDD Process
• Learning the application domain: relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of effort!)
• Data reduction and projection: find useful features, dimensionality/variable reduction, invariant representation
• Choosing functions of data mining: summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Interpretation: analysis of results (visualization, transformation, removing redundant patterns, etc.)
• Use of discovered knowledge
Data Mining – Some Issues to Consider
Three Schemes in Classification
• Knowledge to be mined:
  - Summarization (characterization), comparison, association, classification, clustering, trend, deviation and pattern analysis, etc.
  - Mining knowledge at different abstraction levels: primitive level, high level, multiple-level, etc.
• Databases to be mined: relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, etc.
• Techniques adopted: database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
Data Mining: Classification Schemes
• General functionality:
  - Descriptive data mining
  - Predictive data mining
• Different views, different classifications:
  - Kinds of knowledge to be discovered,
  - Kinds of databases to be mined, and
  - Kinds of techniques adopted.
Data Mining Functionality
• Concept description: characterization and comparison
  - Generalize, summarize, and possibly contrast data characteristics, e.g., dry vs. wet regions
• Association
  - From association and correlation to causality
  - Finding rules like “inside(x, city) → near(x, highway)”
• Classification and prediction
  - Classify data based on the values in a classifying attribute, e.g., classify countries based on climate, or classify cars based on gas mileage
  - Predict some unknown or missing attribute values based on other information
Data Mining Functionality (Cont.)
• Clustering:
  - Group data to form new classes, e.g., cluster houses to find distribution patterns
• Time-series analysis:
  - Trend and deviation analysis: find and characterize evolution trends, sequential patterns, similar sequences, and deviation data, e.g., stock analysis
  - Similarity-based pattern-directed analysis: find and characterize user-specified patterns in large databases
  - Cyclicity/periodicity analysis: find segment-wise or total cycles or periodic behaviours in time-related data
• Other pattern-directed or statistical analysis
Data Mining: On What Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB systems and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW
Are All the “Discovered” Patterns Interesting?
• A data mining system/query may generate thousands of patterns; not all of them are interesting.
• Suggested approach: query-based, focused mining
• Interestingness measures: a pattern is interesting if it is
  - easily understood by humans
  - valid on new or test data with some degree of certainty
  - potentially useful
  - novel, or validates some hypothesis that a user seeks to confirm
• Objective vs. subjective interestingness measures:
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on the user's beliefs in the data, e.g., unexpectedness, novelty, etc.
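The two objective measures named above can be computed directly from transactions: support of a rule X => Y is the fraction of transactions containing both X and Y, and confidence is support(X ∪ Y) / support(X). This is a minimal sketch with hypothetical baskets and helper names; the brute-force search over item pairs is for illustration only (algorithms such as Apriori or FP-growth scale far better), and the baskets are not meant to reproduce the 98% diapers-and-beer figure.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimated P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    s_lhs = support(transactions, lhs)
    return support(transactions, set(lhs) | set(rhs)) / s_lhs if s_lhs else 0.0


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force search over single-item rules X => Y, kept if both thresholds are met."""
    items = sorted({i for t in transactions for i in t})
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support(transactions, {lhs, rhs})
            c = confidence(transactions, {lhs}, {rhs})
            if s >= min_support and c >= min_confidence:
                found.append((lhs, rhs, s, c))
    return found


# Hypothetical market baskets.
baskets = [{"diapers", "beer", "milk"}, {"diapers", "beer"}, {"diapers", "bread"}, {"milk", "bread"}]
print(pair_rules(baskets, min_support=0.5, min_confidence=0.9))  # [('beer', 'diapers', 0.5, 1.0)]
```

Note the asymmetry: beer => diapers has confidence 1.0 here, while diapers => beer has only 2/3, so the thresholds keep one direction and drop the other.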
Can It Find All and Only Interesting Patterns?
• Find all the interesting patterns: completeness
  - Can a data mining system find all the interesting patterns?
• Search for only interesting patterns: optimization
  - Can a data mining system find only the interesting patterns?
  - Approaches:
    - First generate all the patterns and then filter out the uninteresting ones
    - Generate only the interesting patterns: mining query optimization
Requirements and Challenges in Data Mining
• Mining methodology issues:
  - Mining different kinds of knowledge in databases
  - Interactive mining of knowledge at multiple levels of abstraction
  - Incorporation of background knowledge
  - Data mining query languages and ad-hoc data mining
  - Expression and visualization of data mining results
  - Handling noise and incomplete data
  - Pattern evaluation: the interestingness problem
• Performance issues:
  - Efficiency and scalability of data mining algorithms
  - Parallel, distributed and incremental mining methods
Requirements/Challenges in Data Mining (Cont.)
• Issues relating to the variety of data types:
  - Handling relational and complex types of data
  - Mining information from heterogeneous databases and global information systems
• Issues related to applications and social impacts:
  - Application of discovered knowledge: domain-specific data mining tools, intelligent query answering, process control and decision making
  - Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
  - Protection of data security and integrity
Amazon.com: Case study in building customer loyalty
The continuing relationship … Amazon.com “Loyalty” model
• Need creation: anticipate / stimulate
• Information search: provide / assist
• Evaluate alternatives: assist / negate
• Purchase transaction: optimise / reward
• Post purchase experience: add value
Need Creation (attract to website): anticipate / stimulate
Further Need Creation(upon reaching website)
Information Search: provide / assist
Evaluation of Alternatives: assist / negate
Purchase Optimisation/Reward: optimise / reward
• 1-click purchase
• ‘Slippery check out counter’ vs. ‘sticky aisles’
Post-purchase experience: add value
Account Management
Why is loyalty important? Amazon's ‘customer lifetime value’ model (for book buyers)
• Average $50 for first time purchase
• Average $40 per visit thereafter
• Average of one visit per 2 months
• Assume customer will be active for 10 years (not validated yet ☺)
‘4 buys and you are hooked’ empirical law
Use Alexa data to bring back ‘prodigal sons’ (and daughters)
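The lifetime-value arithmetic behind these figures can be worked through in a few lines. This is a minimal sketch, not Amazon's model: we read "one visit per 2 months" as 6 repeat visits a year for 10 years, counted in addition to the first purchase, which gives $50 + 10 × 6 × $40 = $2,450 undiscounted. The optional year-end discounting and the `lifetime_value` name are our additions.

```python
def lifetime_value(first_purchase=50.0, per_visit=40.0, visits_per_year=6,
                   years=10, annual_discount_rate=0.0):
    """Lifetime value: first purchase plus repeat visits over `years` years.

    Defaults follow the slide ($50 first purchase, $40 per later visit, one
    visit every 2 months = 6 per year, active for 10 years). Discounting each
    year's repeat revenue at year end is an added assumption; the default rate
    of 0.0 reproduces the undiscounted total.
    """
    value = first_purchase
    for year in range(1, years + 1):
        value += visits_per_year * per_visit / (1 + annual_discount_rate) ** year
    return value


print(lifetime_value())                                   # 2450.0
print(round(lifetime_value(annual_discount_rate=0.10), 2))
```

Discounting matters: at a hypothetical 10% annual rate the same customer is worth roughly $1,525 today, which is one reason the unvalidated 10-year horizon carries so much weight in the model.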
Build more loyalty faster
[Chart: “loyalty” and LTV over time]
The ‘Virtuous Cycle’
Purchase response
Customer knowledge
Buying decision/process
Internet Marketing Insight – Jeff Bezos
• Role of:
  - Advertisement: get customer to the store
  - Customer experience: get customer to buy
• Brick & mortar stores:
  - Getting the customer to the store is the hard part
  - Shopping cart abandonment is not common, since the overhead of going to another store is very high, especially in Minnesota winters!
• Marketing expenses: 80% for advertisement; 20% for customer experience
“The 80-20 rule is reversed for on-line stores” (Jeff Bezos)
Remarks on Amazon.com
• A very innovative company: the poster child for e-commerce
• Is pushing the envelope in personalization
• Customers love it
• Will it make money? We're all waiting to see
A company of the future, with a product of the past, in a market of the present
The Analytics Behind e-Marketing
Web Logs – Record of consumer behavior
looney.cs.umn.edu han - [09/Aug/1996:09:53:52 -0500] "GET mobasher/courses/cs5106/cs5106l1.html HTTP/1.0" 200
mega.cs.umn.edu njain - [09/Aug/1996:09:53:52 -0500] "GET / HTTP/1.0" 200 3291
mega.cs.umn.edu njain - [09/Aug/1996:09:53:53 -0500] "GET /images/backgnds/paper.gif HTTP/1.0" 200 3014
mega.cs.umn.edu njain - [09/Aug/1996:09:54:12 -0500] "GET /cgi-bin/Count.cgi?df=CS home.dat\&dd=C\&ft=1 HTTP
mega.cs.umn.edu njain - [09/Aug/1996:09:54:18 -0500] "GET advisor HTTP/1.0" 302
mega.cs.umn.edu njain - [09/Aug/1996:09:54:19 -0500] "GET advisor/ HTTP/1.0" 200 487
looney.cs.umn.edu han - [09/Aug/1996:09:54:28 -0500] "GET mobasher/courses/cs5106/cs5106l2.html HTTP/1.0" 200
. . . . . . . . .
Access log format: IP address, userid, time, method, url, protocol, status, size
mega.cs.umn.edu njain 09/Aug/1996:09:54:31 advisor/csci-faq.html
• Other server logs: referrer logs, agent logs
• Application server logs: business event logging
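Turning these raw log lines into per-visitor sessions is usually the first analytics step. This is a minimal sketch, assuming lines in the Common Log Format shape shown above; the regular expression, the `parse_line`/`sessionize` names, and the 30-minute inactivity timeout are our choices, not a standard. Because some sample lines end at the status code and one ends mid-request, the size is treated as optional and unparseable lines are skipped. Sessions are keyed by host only, which is a simplification.

```python
import re
from datetime import datetime, timedelta

# host ident authuser [time] "method url protocol" status [size]
LOG_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3})(?: (?P<size>\d+|-))?$'
)


def parse_line(line):
    """Parse one access-log line into a dict; return None for lines that do not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = int(rec["size"]) if rec["size"] not in (None, "-") else 0
    return rec


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group records by host; start a new session after `timeout` of inactivity."""
    sessions = {}
    for rec in sorted(records, key=lambda r: r["time"]):
        host_sessions = sessions.setdefault(rec["host"], [])
        if host_sessions and rec["time"] - host_sessions[-1][-1]["time"] <= timeout:
            host_sessions[-1].append(rec)
        else:
            host_sessions.append([rec])
    return sessions


sample = [
    'mega.cs.umn.edu njain - [09/Aug/1996:09:53:52 -0500] "GET / HTTP/1.0" 200 3291',
    'mega.cs.umn.edu njain - [09/Aug/1996:09:53:53 -0500] "GET /images/backgnds/paper.gif HTTP/1.0" 200 3014',
    'looney.cs.umn.edu han - [09/Aug/1996:09:53:52 -0500] "GET mobasher/courses/cs5106/cs5106l1.html HTTP/1.0" 200',
]
records = [r for r in map(parse_line, sample) if r is not None]
for host, visits in sessionize(records).items():
    print(host, [[r["url"] for r in s] for s in visits])
```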
Shopping Pipeline Analysis
Overall goal:
• Maximize probability of reaching final state
• Maximize expected sales from each visit
[Figure: Enter store → Browse catalog → Select items → Complete purchase, with cross-sell and up-sell promotions, ‘sticky’ states, and a ‘slippery’ state, i.e. 1-click buy]
• Shopping pipeline modeled as state transition diagram
• Sensitivity analysis of state transition probabilities
• Promotion opportunities identified
• E-metrics and ROI used to measure effectiveness
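Treating the pipeline as a Markov chain makes "probability of reaching final state" and "sensitivity analysis" concrete. This is a minimal sketch under stated assumptions: the transition probabilities are hypothetical, "purchase" and "exit" are absorbing states, and the solver uses plain fixed-point iteration of p(s) = Σ P[s][t] · p(t) rather than a linear-algebra library.

```python
def absorption_probabilities(P, target="purchase", tol=1e-12, max_iter=100_000):
    """Probability of eventually reaching `target` from each state of a Markov chain.

    P maps each transient state to {next_state: probability}. States that never
    appear as keys of P (here "purchase" and "exit") are treated as absorbing.
    """
    states = set(P) | {t for row in P.values() for t in row}
    p = {s: (1.0 if s == target else 0.0) for s in states}
    for _ in range(max_iter):
        change = 0.0
        for s, row in P.items():
            new = sum(prob * p[t] for t, prob in row.items())
            change = max(change, abs(new - p[s]))
            p[s] = new
        if change < tol:
            break
    return p


# Hypothetical transition probabilities for the pipeline above.
PIPELINE = {
    "enter":  {"browse": 0.7, "exit": 0.3},
    "browse": {"browse": 0.2, "select": 0.4, "exit": 0.4},
    "select": {"browse": 0.2, "purchase": 0.5, "exit": 0.3},
}

base = absorption_probabilities(PIPELINE)["enter"]

# Sensitivity analysis: what if a promotion moves 0.1 of "select" traffic from exit to purchase?
promoted = {s: dict(row) for s, row in PIPELINE.items()}
promoted["select"].update(purchase=0.6, exit=0.2)
improved = absorption_probabilities(promoted)["enter"]
print(f"P(purchase | enter): {base:.4f} -> {improved:.4f}")
```

Repeating the perturbation for each transition shows where a promotion would move the end-to-end conversion rate most; expected sales per visit would additionally weight purchases by basket value.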
Original Amazon Model for Customer Segmentation
[Chart: dollars spent in past quarter (0 to 1500) against number of purchases in past quarter (1 to 7), divided into light, medium, heavy, and super heavy buyers; customer M (medium) and customer H (heavy) are marked]
Data Driven Customer Segmentation Model
• Modeled customers in a 4-dimensional space: recency, frequency, monetary, tenure
• Used PCA to determine the relative weights of each dimension
• Composite Score = w1*recency + w2*frequency + w3*monetary + w4*tenure
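One way to derive w1..w4 with PCA is sketched below. The slide does not say how the loadings were turned into weights, so this is our assumption: standardize the four dimensions, take the first principal component by power iteration, and use its absolute loadings normalized to sum to 1. Recency in days is negated first so that a recent visit counts in the customer's favour. All names and data are hypothetical, and the resulting scores are z-score sums, not the percentage scale used on the slides.

```python
import math


def standardize(rows):
    """Z-score each column so days, counts, dollars, and months are on a comparable scale."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def first_principal_component(z, iters=500):
    """Power iteration on the covariance matrix of standardized data (its correlation matrix)."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def composite_scores(customers):
    """customers: rows of [recency_days, frequency, monetary, tenure_months]."""
    # Fewer days since the last visit is better, so recency is negated before scoring.
    rows = [[-r[0], r[1], r[2], r[3]] for r in customers]
    z = standardize(rows)
    loadings = first_principal_component(z)
    total = sum(abs(x) for x in loadings)
    weights = [abs(x) / total for x in loadings]  # w1..w4, normalized to sum to 1
    return weights, [sum(w * x for w, x in zip(weights, row)) for row in z]


# Hypothetical customers: [recency (days), frequency, monetary ($), tenure (months)].
customers = [[10, 4, 480, 3], [30, 2, 900, 10], [5, 6, 700, 24], [60, 1, 120, 2], [20, 3, 300, 12]]
weights, scores = composite_scores(customers)
print([round(w, 3) for w in weights])
print([round(s, 2) for s in scores])
```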
Customer Score Interpretation
Customer   Recency   Frequency   Monetary   Tenure      Composite score
…          …         …           …          …           …
Cust M     10 days   4 times     $480       3 months    80%
…          …         …           …          …           …
Cust H     30 days   2 times     $900       10 months   72%
…          …         …           …          …           …
• Cust M => frequent visitor but low spender => potential for acquiring higher wallet share => focus on improving relationship
• Cust H => infrequent visitor but heavy spender => focus on sustaining relationship
Yodlee.com: Case study in web business intelligence
Current Situation: Consumer Confusion
“It takes me two hours to get to all my accounts”
“I can’t look at my assets across accounts”
“I can’t remember all my user IDs and passwords”
“I want the web to work for me, not the other way around”
“This is overwhelming……I need some help”
“Make it easier for me!”
Solution: Personal Information Aggregation
Aggregation Service Model
(Figure: an Aggregation Service Provider performs content acquisition from communication sites (content partners), finance sites, and travel sites, followed by aggregation, analysis, and personalization, and delivers applications through presentation and interaction to connected users and mobile users; example deployments: AOL (AOL finance), Citibank (MyCiti))
Business Intelligence Benefits to Corporation
• ‘Tip-of-the-iceberg’ analysis for a brokerage house
• Lifestyle preference analysis of banking customers for a survey
• ‘True-wallet-share’ analysis for a credit card organization
• Dynamic targeting for banner advertisements, e-mail campaigns, etc.
‘Tip-of-the-Iceberg’ Analysis for a Brokerage House
Asset Based Tiers | Number of Users
< $20K | 7579
$20K - $100K | 2539
$100K - $500K | 1994
$500K - $1M | 525
$1M - $5M | 547
$5M - $25M | 106
> $25M | 9
• This brokerage house treated customers with net worth > $1M as ‘high net worth’ (HNW) customers with specialized services
• Almost none of the customers in the green region had > $1M with this brokerage
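The size of the hidden part of the "iceberg" can be read straight off the tier table. This assumes, as the analysis implies, that the tiers reflect each user's assets aggregated across all institutions rather than only the assets held at this brokerage. Under that reading, the count below gives how many users would qualify as HNW by total assets. The variable names are illustrative only.

```python
# (lower bound of asset tier in $, number of users), copied from the tier table
tiers = [
    (0, 7579),
    (20_000, 2539),
    (100_000, 1994),
    (500_000, 525),
    (1_000_000, 547),
    (5_000_000, 106),
    (25_000_000, 9),
]
HNW_THRESHOLD = 1_000_000  # the brokerage's 'high net worth' cut-off

total_users = sum(n for _, n in tiers)
hnw_users = sum(n for lower, n in tiers if lower >= HNW_THRESHOLD)
print(f"{hnw_users} of {total_users} users ({hnw_users / total_users:.1%}) are HNW by aggregated assets")
```

This gives 662 of 13,299 users, about 5%. These are customers the brokerage would not have classified as HNW from its own account balances alone.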
Household Lifestyle Preference Analysis for a Survey
Financial Preferences
- 53% have at least one online banking account
- 51% have an online credit card account, higher than Yodlee users as a whole
- 31% also have an E*Trade account, and 11% also have a Schwab account
- Have a preference for First USA over Citibank, the opposite of the preference of users as a whole
- The most popular credit card is American Express
Lifestyle Preferences
- 25% make travel reservations online, fewer than users as a whole
- Expedia is more popular as an online travel site than Travelocity
- 49% have a frequent flier account, higher than users as a whole
- The favorite frequent flier programs are United, Delta, and American, in that order
- Co-brand users shop on eBay at half the rate of users as a whole
‘True-Wallet-Share’ Analysis for a Credit Card Organization
Analysis of credit card balance habits of the user base
• There are 1386 people, each of whom carries a total balance between $1000 and $2000 across all the credit cards (s)he owns
• 292 of these 1386 people own Discover cards, and carry an average balance of $174.55
• 540 of these 1386 people own AmEx cards, with an average balance of $988.97
• 323 of these 1386 people carry one or more Visa cards, with an average Visa network balance of $1018.50
Average balance in $ per card network, by total-balance range (number of users holding that network's cards in parentheses)
Range | Total Users | Discover | American Express | Mastercard | Visa | Other | Average Total
< $100 | 462 | 4.13 (73) | -467.40 (152) | 0 | -29.76 (87) | -60.29 (272) | -190.74
$100 - $200 | 232 | -12.61 (39) | 120.17 (66) | 0 | 89.95 (40) | 167.10 (156) | 149.44
$200 - $500 | 643 | 36.97 (107) | 253.77 (207) | 0 | 218.93 (135) | 272.42 (421) | 342.99
$500 - $1000 | 968 | 75.57 (182) | 571.09 (378) | 0 | 597.83 (217) | 623.36 (593) | 893.47
$1000 - $2000 | 1386 | 174.55 (292) | 988.97 (540) | 837.25 (1) | 1018.50 (323) | 1078.01 (866) | 1471.38
$2000 - $5000 | 2422 | 263.27 (432) | 2156.30 (1099) | 957.69 (1) | 2087.75 (601) | 2358.22 (1579) | 3297.58
$5000 - $10000 | 1732 | 620.80 (354) | 4091.64 (814) | 3648.40 (3) | 3976.93 (483) | 4966.61 (1200) | 7100.20
$10000+ | 1696 | 1332.48 (452) | 10111.75 (1010) | 1921.16 (9) | 8934.39 (642) | 14649.52 (1341) | 22329.56
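A table of this shape can be produced by bucketing each aggregation user by total balance across all cards. Within each bucket, you average the balance per card network over that network's holders only. The sketch below assumes per-user, per-network balances are already available. The range boundaries are treated as half-open (for example, $1000 is placed in "$1000 - $2000"), which is an assumption because the slide does not say how ties are handled. The users are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Upper bounds (exclusive) of the total-balance ranges used in the table.
RANGES: List[Tuple[float, str]] = [
    (100, "< $100"), (200, "$100 - $200"), (500, "$200 - $500"),
    (1000, "$500 - $1000"), (2000, "$1000 - $2000"), (5000, "$2000 - $5000"),
    (10000, "$5000 - $10000"), (float("inf"), "$10000+"),
]


def range_label(total: float) -> str:
    for upper, label in RANGES:
        if total < upper:
            return label
    raise ValueError(total)  # only reachable for NaN totals


def wallet_share_table(users: List[Dict[str, float]]):
    """Per range: (user count, average total balance, {network: (avg balance, holders)})."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    per_net = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))  # [sum, holders]
    for balances in users:
        total = sum(balances.values())
        label = range_label(total)
        counts[label] += 1
        totals[label] += total
        for network, balance in balances.items():
            cell = per_net[label][network]
            cell[0] += balance
            cell[1] += 1
    return {
        label: (counts[label], round(totals[label] / counts[label], 2),
                {net: (round(s / k, 2), k) for net, (s, k) in per_net[label].items()})
        for label in counts
    }


# Hypothetical users: balance carried on each card network.
users = [
    {"Visa": 600.0, "American Express": 700.0},
    {"Discover": 150.0, "Visa": 1200.0},
    {"American Express": 2500.0, "Other": 900.0},
    {"Other": -80.0},  # a credit balance, like the negatives in the "< $100" row
]
table = wallet_share_table(users)
for label, row in table.items():
    print(label, row)
```

Averaging only over holders, rather than over every user in the range, matches how the slide reads its numbers, e.g. "292 of these 1386 people own Discover cards, and carry an average balance of $174.55".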
Business Implications of True Wallet Share Analysis
• A credit card offeror knows exactly how much money customers holding its cards spend (every month) on its cards vs. on the competition’s cards
• The offeror can target users falling within various segments for specific customer acquisition, retention, and similar purposes
• Detailed profile and history information of these users can be used for precision targeting and customer messaging through various channels, including ad serving, e-mail campaigns, and promotions
• If transaction-level detail for these users is analyzed, it can be determined exactly which credit cards aggregation users as a whole use for what kind of lifestyle activity, e.g. travel, entertainment, shopping, groceries; this can help the partner decide which market segments to focus on
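The last point can be made concrete as follows. Once transactions carry a lifestyle category, the share of each category's spend charged to each card network is a simple group-and-divide. This sketch assumes the charges have already been categorized. The transactions and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical transactions: (card network, lifestyle category, amount in $).
transactions: List[Tuple[str, str, float]] = [
    ("American Express", "travel", 1200.0),
    ("Visa", "groceries", 310.0),
    ("Visa", "travel", 400.0),
    ("Discover", "groceries", 90.0),
    ("American Express", "entertainment", 150.0),
]


def network_share_by_category(txns) -> Dict[str, Dict[str, float]]:
    """For each lifestyle category, the fraction of spend charged to each network."""
    category_total: Dict[str, float] = defaultdict(float)
    spend = defaultdict(lambda: defaultdict(float))
    for network, category, amount in txns:
        category_total[category] += amount
        spend[category][network] += amount
    return {cat: {net: amt / category_total[cat] for net, amt in nets.items()}
            for cat, nets in spend.items()}


shares = network_share_by_category(transactions)
print(shares["travel"])  # {'American Express': 0.75, 'Visa': 0.25}
```

A partner whose network has a low share in a large category, such as Visa in travel here, could treat that category as a segment to focus on.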
Business Implications (contd.)
• The analysis above, if carried out at individual-user level, can be used to target individual customers with specific promotions
• Transaction-level detail can be classified into charges at specific organizations: department stores, airlines, etc. This identifies the top organizations where aggregation users spend money, either on the partner’s card or on a competing network, which is useful in deciding which organizations to partner with for customer retention and acquisition, respectively
• All of these analyses, if performed periodically and tracked over time, can provide valuable insight into the evolving credit balance distribution and usage behavior at the user-population or individual-user level
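The retention and acquisition split described above amounts to two rankings. One ranks merchants by spend on the partner's card, and the other ranks merchants by spend on competing networks. A minimal sketch follows. It assumes charges are already resolved to merchant names, and it treats `PARTNER_NETWORK`, the merchants, and the amounts as hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Txn(NamedTuple):
    merchant: str
    network: str
    amount: float


PARTNER_NETWORK = "Visa"  # hypothetical: the network of the partner's card


def top_merchants(txns: List[Txn], n: int = 2) -> Tuple[list, list]:
    """Top merchants by spend on the partner's card and on competing networks."""
    on_partner: Counter = Counter()
    on_competitors: Counter = Counter()
    for t in txns:
        bucket = on_partner if t.network == PARTNER_NETWORK else on_competitors
        bucket[t.merchant] += t.amount
    return on_partner.most_common(n), on_competitors.most_common(n)


# Hypothetical, already merchant-resolved charges.
txns = [
    Txn("Airline A", "Visa", 800.0),
    Txn("Department Store B", "American Express", 450.0),
    Txn("Airline A", "Discover", 300.0),
    Txn("Grocer C", "Visa", 120.0),
    Txn("Department Store B", "Visa", 60.0),
]
retention_partners, acquisition_partners = top_merchants(txns)
print(retention_partners)    # [('Airline A', 800.0), ('Grocer C', 120.0)]
print(acquisition_partners)  # [('Department Store B', 450.0), ('Airline A', 300.0)]
```

Re-running the same ranking each period and comparing the results gives the over-time tracking mentioned in the last point.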
Targeted Ad Serving
Targeted Ad Serving (contd.)
Privacy Issues
Let’s begin with some real examples …
Problem: Too much clutter when shopping for spouse’s anniversary
Solution: Focused and relevant advertisement
Problem: Tired of mistreatment by financial institutions …
• You have tons of money in your investment portfolio
• But you are over-worked and slipped a couple of credit card payment deadlines; after all, you are busy managing your investment portfolio ☺
• Credit card institution treats you like a deadbeat
Solution
• Why not let the credit card institution know what your investment portfolio balance is? Impress them ☺
• Perhaps even authorize the credit card company to transfer funds from your investment account to cover the payment? Or maybe not ☺
So, what’s the catch…
Shopping example
• Allow the vendor to collect detailed information about you and build an accurate profile
• Junk mail is only a nuisance for the receiver, but an expense for the sender! The sender wants to avoid it more than the receiver!!
Credit card example
• Allow the credit card company and investment company to share your information
Multiple online accounts example
• Hand over your account names and passwords to an aggregation service
• Sounds scary, but over 1.5 million people have done this in about 18 months’ time!!
Let’s now talk about privacy …
Merriam-Webster definition
a: the quality or state of being apart from company or observation
b: freedom from unauthorized intrusion
Justice Louis Brandeis
“the right to be let alone”
Operational definition
Collection and analysis of personal data beyond some limit
Public Attitude Towards Privacy
A (self-professed) non-scientific study carried out by a USA Today reporter asked 10 people the following two questions:
• Are you concerned about privacy? 8 said YES
• If I buy you a Big Mac, can I keep the wrapper (to get fingerprints)? 8 said YES
ACM E-Commerce 2001 paper [Spiekermann et al]
• Most people were willing to answer fairly personal questions from an anthropomorphic web-bot, even though the questions were not relevant to the task at hand
• Different privacy policies had no impact on behavior
• Study carried out in Europe, where privacy consciousness is (presumably) higher
Public Attitude (contd.)
Amazon.com (and practically every commercial site) uses cookies to identify and track visitors
97.6% of Amazon.com customers accepted cookies
Airline frequent flier programs with cross promotions
We willingly agree to be tracked
We get upset if the tracking fails!
Over 1.5 million people have trusted the aggregation service (called Yodlee) with the names and passwords of their financial accounts in less than 18 months
Adoption rate has been over 3 times the most optimistic projections
Medical data is (perhaps) an exception to this
What people really want
• Some people will not share any kind of private data at any cost: the ‘paranoids’
• Some people will share any data for returns: the ‘Jerry Springerites’
• The vast majority in the middle wants
  - a reasonable level of comfort that private data about them will NOT be misused
  - tangible and compelling benefits in return for sharing their private data, e.g. the Big Mac example, frequent flier programs
Remarks on Privacy
Is it ‘much ado about nothing’?
• If data collection were indeed outlawed, and personalization thus impossible, wouldn’t the public lose, faced with generic, undifferentiated products/services?
• Given the public’s attitude about privacy (as shown in their actions), are privacy advocates barking up the wrong tree?
• Is it just a matter of time, or a generational issue, as with the adoption of credit cards?
Where do we stand?
• Current position: loss of your privacy may be beneficial for you
• Emerging position (post September 11th): loss of your privacy will be beneficial for everyone
• Critical emerging debate: is privacy a right or a privilege?
Concluding Remarks
• Internet is a high-bandwidth, low-latency, negligible-cost, interactive channel to the customer
• Very high adoption rates for this channel
• Processing speeds and storage capacities continue to increase while costs continue to fall
• Data analytics technology has grown rapidly
• Customer facing applications are ready for a paradigm shift
• Innovative companies have moved ahead
• Privacy is an issue, but not much of a concern