Making Data Mining Models Useful to Model Non-paying
Customers of Exchange Carriers
Wei Fan, IBM T.J.Watson
Janek Mathuria, and Chang-Tien Lu, Virginia Tech, Northern Virginia Center
Our Selling Points
A real, practical problem for an actual CLEC (competitive local exchange carrier) company.
A whole process: we started with an ambitious goal, reality taught us a lesson, and we settled on a realistic solution.
A new set of algorithms to calibrate probability outputs (distinct from Zadrozny and Elkan's calibration methods).
Challenging Problem
Differentiate between "late" and "default": late = one month past due; default = two months past due.
Default percentage: 20%.
Designed feature set (details in the paper): calling summary, billing summary, and other obvious candidates. Other useful features out there? Maybe.
Failure
Failure of commonly used methods: they predict nearly every customer as "paying on time" and still achieve 80% accuracy (since only 20% default).
What this means: our feature set may be incomplete, or the problem itself is simply stochastic in nature.
Natural next step: cost-sensitive learning? The costs are impossible to define precisely due to the complexity of the business.
A Compromised Solution
Predict a reliable probability score. A customer is uniquely identified by its feature vector.
If the model predicts that a customer has a 20% chance to default, and such customers indeed default 20% of the time, the predicted score is considered "reliable".
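This notion of reliability can be checked empirically: bin customers by predicted score and compare each bin's average prediction with its observed default rate. A minimal sketch of that check (the helper `reliability_by_bin` and the toy data are illustrative, not from the paper):

```python
import numpy as np

def reliability_by_bin(scores, defaults, n_bins=10):
    """Group customers by predicted default probability and compare each
    bin's mean predicted score with its observed default rate. Scores are
    'reliable' when the two agree in every bin."""
    scores = np.asarray(scores, dtype=float)
    defaults = np.asarray(defaults, dtype=int)
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # (mean predicted score, observed default rate, bin size)
            rows.append((scores[mask].mean(), defaults[mask].mean(),
                         int(mask.sum())))
    return rows

# Toy example: the 0.8-scored group really defaults 80% of the time
# (reliable), while the 0.1-scored group defaults 20% of the time (not).
rows = reliability_by_bin([0.1, 0.1, 0.1, 0.1, 0.1,
                           0.8, 0.8, 0.8, 0.8, 0.8],
                          [0, 0, 0, 0, 1,
                           1, 1, 1, 1, 0])
```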
Previously Proposed Calibration Methods
Existing approaches output scores that are not reliable (Zadrozny and Elkan): decision trees, naïve Bayes, SVM, logistic regression.
They use a "function" mapping to calibrate unreliable scores into reliable ones.
Assumption: the original unreliable scores must be monotonic (rank-preserving); otherwise, the mapping is not applicable.
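One concrete instance of such a function mapping is histogram binning in the spirit of Zadrozny and Elkan: sort examples by raw score, cut them into equal-frequency bins, and map each bin to its empirical positive rate. A hedged sketch (the function names and toy data are our own, not from the paper):

```python
import numpy as np

def fit_binning_calibrator(raw_scores, labels, n_bins=10):
    """Equal-frequency histogram binning: each bin of sorted raw scores
    is mapped to the fraction of positives observed inside it."""
    order = np.argsort(raw_scores)
    s, y = np.asarray(raw_scores)[order], np.asarray(labels)[order]
    edges, mapped = [], []
    for chunk_s, chunk_y in zip(np.array_split(s, n_bins),
                                np.array_split(y, n_bins)):
        edges.append(chunk_s[-1])      # right edge of the bin
        mapped.append(chunk_y.mean())  # calibrated score for the bin
    return np.array(edges), np.array(mapped)

def calibrate(raw_score, edges, mapped):
    """Look up the bin a raw score falls into and return its mapped value."""
    i = min(np.searchsorted(edges, raw_score), len(mapped) - 1)
    return mapped[i]

# Toy fit: low raw scores never default, high ones always do.
edges, mapped = fit_binning_calibrator(
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1], n_bins=2)
```

Note how this presumes monotonicity: raw scores that rank customers incorrectly will still rank them incorrectly after any such mapping.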
A Good Calibration
A Bad Calibration
Random Decision Trees
Amazingly simple and counter-intuitive:
Do not use any purity check function; pick a feature "randomly".
For a continuous feature, pick a random splitting point.
A discrete feature can be picked only once in one decision path; a continuous feature can be picked multiple times.
Tree depth up to the number of features.
Train on the original feature set. No bootstrap!
Each tree computes a probability at the leaf node: with 10 fraud and 90 normal transactions at a leaf, p(fraud|x) = 0.1.
Use multiple trees (10 is the minimum; 30 are enough) and average the probabilities.
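The recipe above can be sketched in a few lines. This illustrative version handles only continuous features (the once-per-path rule for discrete features is omitted) and makes up its own helper names; note that the tree structure is built from feature ranges alone, with no purity check on labels:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_tree(lo, hi, n_features, depth):
    """Random structure: at each node pick a random feature and a random
    threshold inside its range (no purity check); depth up to n_features."""
    if depth == 0:
        return {'pos': 0, 'n': 0}          # leaf: class counters
    f = int(rng.integers(n_features))
    return {'f': f, 't': rng.uniform(lo[f], hi[f]),
            'left': build_tree(lo, hi, n_features, depth - 1),
            'right': build_tree(lo, hi, n_features, depth - 1)}

def route(node, x):
    while 'f' in node:
        node = node['left'] if x[node['f']] <= node['t'] else node['right']
    return node

def fit(trees, X, y):
    """Drop every training example into each tree and count classes
    at the leaves (original data, no bootstrap)."""
    for tree in trees:
        for x, label in zip(X, y):
            leaf = route(tree, x)
            leaf['n'] += 1
            leaf['pos'] += int(label)

def predict_proba(trees, x):
    """Average leaf probabilities over all trees; a leaf holding
    10 positive and 90 negative examples contributes p = 0.1."""
    ps = []
    for tree in trees:
        leaf = route(tree, x)
        ps.append(leaf['pos'] / leaf['n'] if leaf['n'] else 0.5)
    return sum(ps) / len(ps)

# Toy usage: one continuous feature, 30 trees.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
trees = [build_tree(X.min(0), X.max(0), X.shape[1], depth=X.shape[1])
         for _ in range(30)]
fit(trees, X, y)
```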
Random Forest +
A marriage between Random Decision Trees and Random Forest:
Pick a feature subset randomly; compute the information gain for each feature in the subset; choose the one with the highest gain.
Train on the original dataset. No bootstrap.
Leaf nodes compute probabilities. Use 10 to 30 trees.
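The hybrid's split rule can be sketched as follows; the helper names and the tiny dataset are illustrative, not from the paper:

```python
import math
import random

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def best_split(X, y, n_sub):
    """The hybrid's split rule: draw a random subset of n_sub features
    (as in Random Forest), then keep the feature/threshold pair with the
    highest information gain. The full dataset is used (no bootstrap)."""
    base = entropy(y)
    best = None
    for f in random.sample(range(len(X[0])), n_sub):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best  # (info_gain, feature, threshold) or None

# Toy usage: feature 0 separates the classes; feature 1 is constant.
result = best_split([[0, 5], [1, 5], [2, 5], [3, 5]], [0, 0, 1, 1], n_sub=2)
```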
Availability
Software is available upon request.