Multi Intelligence approach in Preparation of Instructional Material
Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.
-
Upload
arleen-holt -
Category
Documents
-
view
212 -
download
0
Transcript of Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.
Business Intelligence/Decision Models
Week 3Data Preparation
and Transformation
Last Week OLTP, data warehouse repository
and data mart structures (flat and relational files)
Data integrity and normalization DB interrogation (SQL) for:
OLAP and Reporting Migration into data mining suites
Time/Cost
Cumulated Productivity
Learning by associationor problem solving
This Week CRISP
(Cross Industry Standard Procedure for Data Mining)
Data preparation (import, aggregate and merge)
Data transformation (for analytics)
CRISP-DM Phases
Source SPSS Inc. 2008
Source SPSS Inc. 2008
Case Study A large telecom (XYZ PHONE) has discovered that it is losing
customers at a much higher rate than in previous years.
Reporting through the corporate dashboard (OLAP)has shown churn rates growing by a large margin last year.
Source SPSS Inc. 2008
Define Business Objectives Strategic objective definition
Increase revenues by retaining more customers Related business goal identification
Retain high value customers Identify process problems that need to be changed
Clear success factor (metric) Decrease customer churn by 1%
Cost-benefit analysis Increase revenues by $750,000
Actionable BI objectives XYZ wants to retain more customers by identifying likely
churners 2 months prior and putting an action in place to retain them
Source SPSS Inc. 2008
Timeline Example XYZ’s project: 13 weeks
8 weeks a) business understanding and b) data preparation• Involved line of business manager and data expert• Included better defining high-value and churner definition
2 weeks data understanding• Heavy reliance on data expert and database administrator
2 weeks modeling and evaluation• Models developed by data miner and results evaluated by line of business
manager 1 week deployment ?
• Heavy involvement of database administrator
Model deployment entailed setting up a data model for monthly scoring of customer base with resulting reports feeding a mail offer
Source PSS Inc. 2008
Time Allocation Generally accepted industry timeline standards
50 to 70 percent data preparation 20 to 30 percent data understanding 10 to 20 percent modeling, evaluation, and business
understanding 5 to 10 percent deployment
Data Import and Transformation
Lab Objectives Extract data from
Customer file Transactional file Transform data into information
Data preparation Aggregate data from transactional file Merge aggregate data & customer file
Data Import Step by Step Import files from Access or Excel
Customer and Transaction files
Document variables labels and value labels using the data dictionary
Aggregate the transaction file by cust_id with summary data and key variables
Merge Customer and aggregated transaction file using cust_id as a common key
Aggregating Transaction FileOrder_id
Date Cust_id
Prod_num
Amt
4433 10/21 1011 231 120
4434 10/30 2234 143 240
4435 11/05 2876 432 175
4436 11/05 3454 143 240
4437 11/07 2234 223 600
4438 11/08 1011 254 211
4439 11/08 2876 534 300
4440 11/08 1011 143 240
4441 11/12 3454 322 150
4442 11/13 2876 512 321
4443 11/13 1011 412 125
Cust_id
Freq Date1 Date2 Amt_sum
1011 4 10/21 11/13 696
2234 2 10/30 11/07 840
2876 3 11/05 11/13 796
3454 2 11/05 11/12 380
Lab Objectives (Cont) Data transformation
Compute customers’ length on file Compute recency of last purchase Compute frequency of purchases Compute amount spent Compute customer status
Purpose CLV (Week4) RFM (Week5)
Data Transformation Step by Step Revisit measurement variables (nominal, ord,
scale) Define date formats Auto recode nominal string variables Define missing values Calculate length on file or tenure
(Date last purchase – Date first purchase) tenure Calculate time since last purchase
(Date of current file – Date last purchase) Define customer status (active or lapsed)
Merging Customer and Transaction Summary FilesCust_ id
Na-me
Add-ress
Type CC
1011 Jean NY 1 Visa
2234 John OH 1 MC
2876 Janet CA 2 Visa
3454 Jane NY 3 Amex
Freq Date1 Date2 Amt_sum
4 10/21 11/13 696
2 10/30 11/07 840
3 11/05 11/13 796
2 11/05 11/12 380
Data TransformationCust_ ids
Na-me
Add-ress
Type CC
1011 Jean 1/NY 1/Res 1/Visa
2234 John 2/OH 1/Res 2/MC
2876 Janet 3/CA 2/Bus 1/Visa
3454 Jane 1/NY 3/DNK 3/Amx
Freq Dte1 Dte2 Amt Days Rec-ency
4 10/21 11/13 696 23 17
2 10/30 11/07 840 8 23
3 11/05 11/13 796 8 17
2 11/05 11/12 380 7 18
Purpose of this exercise? Prepare data for next two weeks:
Lifetime Customer Value RFM Analysis …