Making Data Mining Models Useful to Model Non-paying
Customers of Exchange Carriers
Wei Fan, IBM T.J.Watson
Janek Mathuria, and Chang-Tien Lu, Virginia Tech, Northern Virginia Center
Our Selling Points
A real, practical problem for an actual CLEC (competitive local exchange carrier) company.
A whole process: we started with an ambitious goal, reality taught us a lesson, and we settled on a realistic solution.
A new set of algorithms to calibrate probability outputs (distinct from Zadrozny and Elkan's calibration methods).
Challenging Problem
Differentiate between "late" and "default": late = one month past due; default = two months past due.
Default percentage: 20%.
Designed feature set (details in the paper): calling summary, billing summary, and other obvious candidates. Other useful features out there? Maybe.
Failure
Failure of commonly used methods: they predict nearly every customer as "paying on time" and still achieve 80% accuracy (since only 20% default).
What this means: our feature set may be incomplete, or the problem itself is simply stochastic in nature.
Natural next step: cost-sensitive learning? The costs are impossible to define precisely due to the complexity of the business.
A Compromised Solution
Predict a reliable probability score. A customer is uniquely identified by its feature vector.
If the model predicts that a customer has a 20% chance to default, and such customers indeed default 20% of the time, the predicted score is considered "reliable".
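This notion of reliability can be checked empirically: bin customers by predicted score and compare each bin's average prediction with its observed default rate. A minimal sketch of that check (the helper `reliability_by_bin` and the toy data are illustrative, not from the paper):

```python
import numpy as np

def reliability_by_bin(scores, defaults, n_bins=10):
    """Group customers by predicted default probability and compare each
    bin's mean predicted score with its observed default rate. Scores are
    'reliable' when the two agree in every bin."""
    scores = np.asarray(scores, dtype=float)
    defaults = np.asarray(defaults, dtype=int)
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # (mean predicted score, observed default rate, bin size)
            rows.append((scores[mask].mean(), defaults[mask].mean(),
                         int(mask.sum())))
    return rows

# Toy example: the 0.8-scored group really defaults 80% of the time
# (reliable), while the 0.1-scored group defaults 20% of the time (not).
rows = reliability_by_bin([0.1, 0.1, 0.1, 0.1, 0.1,
                           0.8, 0.8, 0.8, 0.8, 0.8],
                          [0, 0, 0, 0, 1,
                           1, 1, 1, 1, 0])
```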
Previously Proposed Calibration Methods
Existing approaches output scores that are not reliable (Zadrozny and Elkan): decision trees, naïve Bayes, SVM, logistic regression.
They use a "function" mapping to calibrate unreliable scores into reliable ones.
Assumption: the original unreliable scores must be monotonic (rank-preserving); otherwise, the mapping is not applicable.
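One concrete instance of such a function mapping is histogram binning in the spirit of Zadrozny and Elkan: sort examples by raw score, cut them into equal-frequency bins, and map each bin to its empirical positive rate. A hedged sketch (the function names and toy data are our own, not from the paper):

```python
import numpy as np

def fit_binning_calibrator(raw_scores, labels, n_bins=10):
    """Equal-frequency histogram binning: each bin of sorted raw scores
    is mapped to the fraction of positives observed inside it."""
    order = np.argsort(raw_scores)
    s, y = np.asarray(raw_scores)[order], np.asarray(labels)[order]
    edges, mapped = [], []
    for chunk_s, chunk_y in zip(np.array_split(s, n_bins),
                                np.array_split(y, n_bins)):
        edges.append(chunk_s[-1])      # right edge of the bin
        mapped.append(chunk_y.mean())  # calibrated score for the bin
    return np.array(edges), np.array(mapped)

def calibrate(raw_score, edges, mapped):
    """Look up the bin a raw score falls into and return its mapped value."""
    i = min(np.searchsorted(edges, raw_score), len(mapped) - 1)
    return mapped[i]

# Toy fit: low raw scores never default, high ones always do.
edges, mapped = fit_binning_calibrator(
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1], n_bins=2)
```

Note how this presumes monotonicity: raw scores that rank customers incorrectly will still rank them incorrectly after any such mapping.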
A Good Calibration
A Bad Calibration
Random Decision Trees
Amazingly simple and counter-intuitive:
Do not use any purity check function; pick a feature "randomly".
For a continuous feature, pick a random splitting point.
A discrete feature can be picked only once in one decision path; a continuous feature can be picked multiple times.
Tree depth up to the number of features.
Train on the original feature set. No bootstrap!
Each tree computes a probability at the leaf node: with 10 fraud and 90 normal transactions at a leaf, p(fraud|x) = 0.1.
Use multiple trees (10 is the minimum; 30 are enough) and average the probabilities.
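The recipe above can be sketched in a few lines. This illustrative version handles only continuous features (the once-per-path rule for discrete features is omitted) and makes up its own helper names; note that the tree structure is built from feature ranges alone, with no purity check on labels:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_tree(lo, hi, n_features, depth):
    """Random structure: at each node pick a random feature and a random
    threshold inside its range (no purity check); depth up to n_features."""
    if depth == 0:
        return {'pos': 0, 'n': 0}          # leaf: class counters
    f = int(rng.integers(n_features))
    return {'f': f, 't': rng.uniform(lo[f], hi[f]),
            'left': build_tree(lo, hi, n_features, depth - 1),
            'right': build_tree(lo, hi, n_features, depth - 1)}

def route(node, x):
    while 'f' in node:
        node = node['left'] if x[node['f']] <= node['t'] else node['right']
    return node

def fit(trees, X, y):
    """Drop every training example into each tree and count classes
    at the leaves (original data, no bootstrap)."""
    for tree in trees:
        for x, label in zip(X, y):
            leaf = route(tree, x)
            leaf['n'] += 1
            leaf['pos'] += int(label)

def predict_proba(trees, x):
    """Average leaf probabilities over all trees; a leaf holding
    10 positive and 90 negative examples contributes p = 0.1."""
    ps = []
    for tree in trees:
        leaf = route(tree, x)
        ps.append(leaf['pos'] / leaf['n'] if leaf['n'] else 0.5)
    return sum(ps) / len(ps)

# Toy usage: one continuous feature, 30 trees.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
trees = [build_tree(X.min(0), X.max(0), X.shape[1], depth=X.shape[1])
         for _ in range(30)]
fit(trees, X, y)
```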
Random Forest +
A marriage between Random Decision Trees and Random Forest:
Pick a feature subset randomly; compute the information gain for each feature in the subset; choose the one with the highest gain.
Train on the original dataset. No bootstrap.
Leaf nodes compute probabilities. Use 10 to 30 trees.
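The hybrid's split rule can be sketched as follows; the helper names and the tiny dataset are illustrative, not from the paper:

```python
import math
import random

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def best_split(X, y, n_sub):
    """The hybrid's split rule: draw a random subset of n_sub features
    (as in Random Forest), then keep the feature/threshold pair with the
    highest information gain. The full dataset is used (no bootstrap)."""
    base = entropy(y)
    best = None
    for f in random.sample(range(len(X[0])), n_sub):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best  # (info_gain, feature, threshold) or None

# Toy usage: feature 0 separates the classes; feature 1 is constant.
result = best_split([[0, 5], [1, 5], [2, 5], [3, 5]], [0, 0, 1, 1], n_sub=2)
```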
Availability
Software is available upon request.