Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Page 1: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Business Intelligence/Decision Models

Week 3: Data Preparation and Transformation

Page 2: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Last Week
• OLTP, data warehouse repository and data mart structures (flat and relational files)
• Data integrity and normalization
• DB interrogation (SQL) for:
  • OLAP and reporting
  • Migration into data mining suites

Page 3: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.
Page 4: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.
Page 5: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

[Figure: chart with axes labeled Time/Cost and Cumulated Productivity]

Page 6: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.
Page 7: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Learning by association or problem solving

Page 8: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

This Week
• CRISP-DM (Cross-Industry Standard Process for Data Mining)
• Data preparation (import, aggregate and merge)
• Data transformation (for analytics)

Page 9: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

CRISP-DM Phases

Source SPSS Inc. 2008

Page 10: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Source SPSS Inc. 2008

Case Study
• A large telecom (XYZ PHONE) has discovered that it is losing customers at a much higher rate than in previous years.
• Reporting through the corporate dashboard (OLAP) has shown churn rates growing by a large margin last year.

Page 11: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Source SPSS Inc. 2008

Define Business Objectives
• Strategic objective definition
  • Increase revenues by retaining more customers
• Related business goal identification
  • Retain high value customers
  • Identify process problems that need to be changed
• Clear success factor (metric)
  • Decrease customer churn by 1%
• Cost-benefit analysis
  • Increase revenues by $750,000
• Actionable BI objectives
  • XYZ wants to retain more customers by identifying likely churners 2 months prior and putting an action in place to retain them

Page 12: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Source SPSS Inc. 2008

Timeline Example
XYZ's project: 13 weeks
• 8 weeks: a) business understanding and b) data preparation
  • Involved line of business manager and data expert
  • Included refining the definitions of high-value customer and churner
• 2 weeks: data understanding
  • Heavy reliance on data expert and database administrator
• 2 weeks: modeling and evaluation
  • Models developed by data miner and results evaluated by line of business manager
• 1 week: deployment
  • Heavy involvement of database administrator
Model deployment entailed setting up a data model for monthly scoring of the customer base, with the resulting reports feeding a mail offer.

Page 13: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Source SPSS Inc. 2008

Time Allocation
Generally accepted industry timeline standards
• 50 to 70 percent data preparation
• 20 to 30 percent data understanding
• 10 to 20 percent modeling, evaluation, and business understanding
• 5 to 10 percent deployment
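As a rough check, the 13-week XYZ timeline from the previous slide can be converted to percentages and set beside these ranges. This is only a sketch: the dictionary keys are illustrative names, and XYZ groups business understanding with data preparation, whereas the ranges above group it with modeling and evaluation, so the comparison is approximate.

```python
# Weeks per phase, copied from the XYZ timeline slide.
# The phase names used as keys are illustrative, not from the slides.
XYZ_WEEKS = {
    "business understanding + data preparation": 8,
    "data understanding": 2,
    "modeling + evaluation": 2,
    "deployment": 1,
}

total_weeks = sum(XYZ_WEEKS.values())

# Share of the whole project spent in each phase, in percent.
shares = {
    phase: round(100 * weeks / total_weeks, 1)
    for phase, weeks in XYZ_WEEKS.items()
}

for phase, pct in shares.items():
    print(f"{phase}: {pct}%")
```

Under this grouping, deployment (about 8 percent) falls inside the 5 to 10 percent range, while data understanding (about 15 percent) falls below the 20 to 30 percent range.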

Page 14: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Data Import and Transformation

Page 15: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Lab Objectives
• Extract data from
  • Customer file
  • Transactional file
• Transform data into information
• Data preparation
  • Aggregate data from transactional file
  • Merge aggregate data & customer file

Page 16: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Data Import Step by Step
• Import files from Access or Excel
  • Customer and Transaction files
• Document variable labels and value labels using the data dictionary
• Aggregate the transaction file by cust_id with summary data and key variables
• Merge the Customer file and the aggregated transaction file using cust_id as a common key

Page 17: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Aggregating Transaction File

Order_id  Date   Cust_id  Prod_num  Amt
4433      10/21  1011     231       120
4434      10/30  2234     143       240
4435      11/05  2876     432       175
4436      11/05  3454     143       240
4437      11/07  2234     223       600
4438      11/08  1011     254       211
4439      11/08  2876     534       300
4440      11/08  1011     143       240
4441      11/12  3454     322       150
4442      11/13  2876     512       321
4443      11/13  1011     412       125

Cust_id  Freq  Date1  Date2  Amt_sum
1011     4     10/21  11/13  696
2234     2     10/30  11/07  840
2876     3     11/05  11/13  796
3454     2     11/05  11/12  380
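The aggregation on this slide can be sketched in plain Python as a group-by on Cust_id. The transaction rows are copied from the slide; the year (2008) is an assumption, since the slide gives only month and day, and the function names are illustrative. Note that summing the listed amounts for customer 3454 gives 240 + 150 = 390, whereas the slide's summary row shows 380, so one of those slide figures is presumably a typo.

```python
from collections import defaultdict
from datetime import date

YEAR = 2008  # assumption: the slide shows MM/DD only

# (order_id, "MM/DD", cust_id, prod_num, amt), copied from the slide.
TRANSACTIONS = [
    (4433, "10/21", 1011, 231, 120),
    (4434, "10/30", 2234, 143, 240),
    (4435, "11/05", 2876, 432, 175),
    (4436, "11/05", 3454, 143, 240),
    (4437, "11/07", 2234, 223, 600),
    (4438, "11/08", 1011, 254, 211),
    (4439, "11/08", 2876, 534, 300),
    (4440, "11/08", 1011, 143, 240),
    (4441, "11/12", 3454, 322, 150),
    (4442, "11/13", 2876, 512, 321),
    (4443, "11/13", 1011, 412, 125),
]


def parse_date(mmdd):
    """Turn an 'MM/DD' string into a date in the assumed year."""
    month, day = mmdd.split("/")
    return date(YEAR, int(month), int(day))


def aggregate_by_customer(rows):
    """Return {cust_id: {"freq", "date1", "date2", "amt_sum"}}."""
    groups = defaultdict(list)
    for _order_id, mmdd, cust_id, _prod_num, amt in rows:
        groups[cust_id].append((parse_date(mmdd), amt))

    summary = {}
    for cust_id, purchases in groups.items():
        dates = [d for d, _ in purchases]
        summary[cust_id] = {
            "freq": len(purchases),                   # number of orders
            "date1": min(dates),                      # first purchase
            "date2": max(dates),                      # last purchase
            "amt_sum": sum(a for _, a in purchases),  # total spent
        }
    return summary


summary = aggregate_by_customer(TRANSACTIONS)
for cust_id in sorted(summary):
    s = summary[cust_id]
    print(cust_id, s["freq"], s["date1"].strftime("%m/%d"),
          s["date2"].strftime("%m/%d"), s["amt_sum"])
```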

Page 18: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Lab Objectives (Cont)
• Data transformation
  • Compute customers' length on file
  • Compute recency of last purchase
  • Compute frequency of purchases
  • Compute amount spent
  • Compute customer status
• Purpose
  • CLV (Week 4)
  • RFM (Week 5)

Page 19: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Data Transformation Step by Step
• Revisit measurement variables (nominal, ordinal, scale)
• Define date formats
• Auto recode nominal string variables
• Define missing values
• Calculate length on file or tenure
  • Tenure = (Date last purchase - Date first purchase)
• Calculate time since last purchase
  • (Date of current file - Date last purchase)
• Define customer status (active or lapsed)
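The two date calculations and the status flag can be sketched as follows. Several values here are assumptions rather than anything the slides state: the year (2008), the file date of 11/30 (inferred because it reproduces the recency column shown on the later Data Transformation slide), and the 20-day cutoff between active and lapsed, which is purely hypothetical.

```python
from datetime import date

YEAR = 2008                     # assumption: the slides omit the year
FILE_DATE = date(YEAR, 11, 30)  # assumption: inferred from the recency column
LAPSED_AFTER_DAYS = 20          # hypothetical cutoff; the slides give none

# cust_id -> (date of first purchase, date of last purchase),
# taken from the aggregated transaction summary.
PURCHASE_DATES = {
    1011: (date(YEAR, 10, 21), date(YEAR, 11, 13)),
    2234: (date(YEAR, 10, 30), date(YEAR, 11, 7)),
    2876: (date(YEAR, 11, 5), date(YEAR, 11, 13)),
    3454: (date(YEAR, 11, 5), date(YEAR, 11, 12)),
}


def transform(first, last, file_date=FILE_DATE, cutoff=LAPSED_AFTER_DAYS):
    """Derive tenure, recency and status for one customer."""
    tenure = (last - first).days       # date last purchase - date first purchase
    recency = (file_date - last).days  # date of current file - date last purchase
    status = "active" if recency <= cutoff else "lapsed"
    return {"tenure": tenure, "recency": recency, "status": status}


derived = {cid: transform(first, last)
           for cid, (first, last) in PURCHASE_DATES.items()}

for cid in sorted(derived):
    print(cid, derived[cid])
```

With these assumptions only customer 2234 (23 days since last purchase) is flagged as lapsed; a different cutoff would change that split.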

Page 20: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Merging Customer and Transaction Summary Files

Cust_id  Name   Address  Type  CC    Freq  Date1  Date2  Amt_sum
1011     Jean   NY       1     Visa  4     10/21  11/13  696
2234     John   OH       1     MC    2     10/30  11/07  840
2876     Janet  CA       2     Visa  3     11/05  11/13  796
3454     Jane   NY       3     Amex  2     11/05  11/12  380
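The merge shown here is a one-to-one join on cust_id. A minimal sketch in plain Python, with both inputs copied from the slide and `merge_on_key` as an illustrative helper name:

```python
# Customer file rows, copied from the slide.
CUSTOMERS = [
    {"cust_id": 1011, "name": "Jean", "address": "NY", "type": 1, "cc": "Visa"},
    {"cust_id": 2234, "name": "John", "address": "OH", "type": 1, "cc": "MC"},
    {"cust_id": 2876, "name": "Janet", "address": "CA", "type": 2, "cc": "Visa"},
    {"cust_id": 3454, "name": "Jane", "address": "NY", "type": 3, "cc": "Amex"},
]

# Aggregated transaction summary rows, values as printed on the slide.
SUMMARY = [
    {"cust_id": 1011, "freq": 4, "date1": "10/21", "date2": "11/13", "amt_sum": 696},
    {"cust_id": 2234, "freq": 2, "date1": "10/30", "date2": "11/07", "amt_sum": 840},
    {"cust_id": 2876, "freq": 3, "date1": "11/05", "date2": "11/13", "amt_sum": 796},
    {"cust_id": 3454, "freq": 2, "date1": "11/05", "date2": "11/12", "amt_sum": 380},
]


def merge_on_key(left, right, key="cust_id"):
    """Inner join of two lists of dicts, assuming `key` is unique in `right`."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:  # keep only keys present in both files
            merged.append({**row, **match})
    return merged


merged = merge_on_key(CUSTOMERS, SUMMARY)
for row in merged:
    print(row)
```

An inner join drops customers with no transactions. If those customers should stay on file with missing summary values, a left join would be the alternative.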

Page 21: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Data Transformation

Cust_id  Name   Address  Type   CC      Freq  Dte1   Dte2   Amt  Days  Recency
1011     Jean   1/NY     1/Res  1/Visa  4     10/21  11/13  696  23    17
2234     John   2/OH     1/Res  2/MC    2     10/30  11/07  840  8     23
2876     Janet  3/CA     2/Bus  1/Visa  3     11/05  11/13  796  8     17
3454     Jane   1/NY     3/DNK  3/Amx   2     11/05  11/12  380  7     18
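The numeric codes in front of Address and CC on this slide are what recoding nominal string variables produces. The sketch below numbers values in order of first appearance, which reproduces the codes shown (NY=1, OH=2, CA=3). That ordering is an assumption made to match the slide; a recode that sorts values first would number them differently. The function name is illustrative.

```python
def recode_by_first_appearance(values):
    """Map each distinct string to 1, 2, 3, ... in order of first appearance."""
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1
    return codes


# Columns from the merged customer file, in row order.
addresses = ["NY", "OH", "CA", "NY"]
cards = ["Visa", "MC", "Visa", "Amex"]

address_codes = recode_by_first_appearance(addresses)
card_codes = recode_by_first_appearance(cards)

# Show code and original label side by side, as on the slide.
labelled_addresses = [f"{address_codes[a]}/{a}" for a in addresses]
labelled_cards = [f"{card_codes[c]}/{c}" for c in cards]

print(labelled_addresses)
print(labelled_cards)
```

Keeping the code-to-label mapping (`address_codes`, `card_codes`) is what lets the numeric column be documented with value labels afterwards.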

Page 22: Business Intelligence/ Decision Models Week 3 Data Preparation and Transformation.

Purpose of this exercise?
Prepare data for next two weeks:
• Lifetime Customer Value
• RFM Analysis
…