A General Framework for Fast and Accurate Regression by Data Summarization in Random Decision Trees...
A General Framework for Fast and Accurate Regression by Data
Summarization in Random Decision Trees
Wei Fan, IBM T.J.Watson
Joe McCloskey, US Department of Defense
Philip Yu, IBM T.J.Watson
Three DM Problems
• Classification:
  • Label: a given set of labels in the training data.
• Probability Estimation:
  • Similar to the above setting: estimate the probability that x is an example of class y.
  • Difference: no truth is given, i.e., no true probability.
• Regression:
  • Target value: continuous values.
Model Approximation
• True model or correct model:
  • Generates y for each x with probability P(y|x).
  • Normally never known in reality.
• Perfect model: never makes mistakes, or has the same prediction as the true model.
• Not always possible due to:
  • Stochastic nature of the problem
  • Noise in training data
  • Insufficient data
Optimal Model
• Loss function L(t,y) to evaluate performance.
• The optimal decision y* is the label that minimizes the expected loss when x is sampled repeatedly.
• Examples:
  • 0-1 loss: y* is the label that appears the most often, i.e., if P(fraud|x) > 0.5, predict fraud.
  • Cost-sensitive loss: the label that minimizes the “empirical risk”. If P(fraud|x) * $1000 > $90, i.e., P(fraud|x) > 0.09, predict fraud.
  • MSE (mean squared error): predict the average.
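The three examples above can be sketched in a few lines of Python. This is only an illustration: the function names, the posterior values, and the cost model in `make_fraud_loss` (investigating costs a fixed overhead, a missed fraud loses the whole transaction amount) are assumptions chosen to match the $1000 / $90 example, not definitions from the talk.

```python
def optimal_label(posterior, loss):
    """Return the label y minimizing the expected loss sum_t P(t|x) * loss(t, y)."""
    def expected_loss(y):
        return sum(p * loss(t, y) for t, p in posterior.items())
    return min(posterior, key=expected_loss)


def zero_one_loss(t, y):
    # Loss is 1 for a wrong label, 0 for a correct one.
    return 0.0 if t == y else 1.0


def make_fraud_loss(amount, overhead):
    # Hypothetical cost model: investigating always costs `overhead`;
    # letting a true fraud through loses the whole transaction `amount`.
    def loss(t, y):
        if y == "fraud":
            return overhead
        return amount if t == "fraud" else 0.0
    return loss


def mean_squared_error(targets, prediction):
    return sum((t - prediction) ** 2 for t in targets) / len(targets)


# Assumed posterior for one transaction x.
posterior = {"fraud": 0.12, "normal": 0.88}
label_01 = optimal_label(posterior, zero_one_loss)
label_cost = optimal_label(posterior, make_fraud_loss(1000.0, 90.0))
print(label_01, label_cost)

# Under squared error, the best constant prediction is the average target.
targets = [100.0, 150.0, 110.0]
average = sum(targets) / len(targets)
print(average, mean_squared_error(targets, average))
```

With P(fraud|x) = 0.12, the 0-1 loss picks "normal" (0.12 is below 0.5) while the cost-sensitive loss picks "fraud" (0.12 is above 0.09), which is the contrast the slide draws.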
How do we look for optimal models?
• Don’t impose “exact forms”:
  • Decision trees, classification based on association rules, production rules.
  • Learner estimates structure as well as parameters.
  • NP-hard for most “model representations”.
• Impose “exact forms”:
  • Logistic regression functions, linear regression models, etc.
  • Learner estimates parameters ONLY. Structure is pre-fixed.
• Inductive bias.
• The decision tree is a rather flexible, efficient, yet powerful representation.
Consider Decision Tree
• Compromise between accuracy and model complexity.
• We think that the simplest-structured hypothesis that fits the data is the best.
• We employ all kinds of heuristics to look for it:
  • Info gain, Gini index, Kearns-Mansour, etc.
  • Pruning: MDL pruning, reduced-error pruning, cost-based pruning.
• Reality: tractable, but still pretty expensive.
• Truth: none of the purity check functions guarantees accuracy over testing data.
Random Decision Tree - classification, regression, probability estimation
• Key characteristics:
  • Structure is randomly picked.
  • Statistics are summarized from training data.
• At each node, an unused feature is chosen randomly.
  • A discrete feature is unused if it has never been chosen previously on a given decision path from the root to the current node.
  • A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is chosen.
Continued
• We stop when one of the following happens:
  • A node becomes too small.
  • Or the total height of the tree exceeds some limit, such as the total number of features.
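A minimal sketch of how such a structure could be grown, assuming a hypothetical `features` dictionary that maps each feature name to either a list of discrete values or a `(low, high)` range for a continuous feature. Only the height limit is enforced here; the "node becomes too small" rule needs training data and is left out. Drawing each threshold uniformly from the whole range is also a simplification of this sketch.

```python
import random


def build_random_tree(features, depth_limit, rng, used=frozenset(), depth=0):
    """Grow a random tree structure without looking at any training data."""
    # Continuous features (tuples) are always candidates; discrete ones
    # only if they have not been used on the path from the root.
    candidates = sorted(
        name for name, spec in features.items()
        if isinstance(spec, tuple) or name not in used
    )
    if depth >= depth_limit or not candidates:
        return {"leaf": True}

    name = rng.choice(candidates)
    spec = features[name]
    node = {"leaf": False, "feature": name}
    if isinstance(spec, tuple):
        # Continuous feature: pick a fresh random threshold each time.
        node["threshold"] = rng.uniform(spec[0], spec[1])
        branches = ["below", "at_or_above"]
        next_used = used
    else:
        # Discrete feature: one branch per value, never reused on this path.
        branches = list(spec)
        next_used = used | {name}
    node["children"] = {
        b: build_random_tree(features, depth_limit, rng, next_used, depth + 1)
        for b in branches
    }
    return node


def tree_height(node):
    if node["leaf"]:
        return 0
    return 1 + max(tree_height(child) for child in node["children"].values())


# Hypothetical feature description; the height limit is the number of features.
features = {"edu": ["HS", "BS", "PhD"], "sex": ["F", "M"], "age": (18.0, 90.0)}
tree = build_random_tree(features, depth_limit=len(features), rng=random.Random(0))
print(tree_height(tree))
```

Note that no purity function is evaluated anywhere: the only inputs are the feature descriptions and a random number generator.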
Node Statistics
• Classification and Probability Estimation:
  • Each node of the tree keeps the number of examples belonging to each class.
• Regression:
  • Each node of the tree keeps the mean value of the examples sorted into the node.
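The summarization step can be sketched as one pass over the training data. To keep the example self-contained, the tree is reduced to a hypothetical one-test structure (`leaf_id`, splitting on an assumed `age` field); the data values are made up for illustration.

```python
from collections import Counter


def leaf_id(x):
    # Hypothetical fixed structure: a single test on "age" with two leaves.
    return "age>30" if x["age"] > 30 else "age<=30"


def summarize_classes(examples):
    """Per-leaf class counts for classification and probability estimation."""
    stats = {}
    for x, label in examples:
        stats.setdefault(leaf_id(x), Counter())[label] += 1
    return stats


def summarize_means(examples):
    """Per-leaf mean target value for regression."""
    sums = {}
    for x, target in examples:
        total, n = sums.get(leaf_id(x), (0.0, 0))
        sums[leaf_id(x)] = (total + target, n + 1)
    return {leaf: total / n for leaf, (total, n) in sums.items()}


labeled = [({"age": 25}, "P1"), ({"age": 40}, "P2"),
           ({"age": 50}, "P2"), ({"age": 35}, "P1")]
valued = [({"age": 25}, 50.0), ({"age": 40}, 100.0), ({"age": 50}, 200.0)]

class_counts = summarize_classes(labeled)
leaf_means = summarize_means(valued)
print(class_counts)
print(leaf_means)
```

Both summaries need only a single scan of the data, which is where the efficiency of the approach comes from.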
Classification/Prob Estimation
During classification, each tree outputs posterior probability:
[Figure: example tree. The root tests B1 < 0.5; lower nodes test B2 > 0.7 and B1 > 0.3. One leaf holds P1: 200, P2: 10; another holds P1: 30, P2: 70, which gives P(P1|x) = 0.3.]
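The posterior at a leaf is just the normalized class counts. A short sketch using the two leaves from the example (the function name is an assumption):

```python
def posterior_from_counts(counts):
    """Turn per-class example counts at a leaf into posterior probabilities."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


leaf_a = posterior_from_counts({"P1": 30, "P2": 70})
leaf_b = posterior_from_counts({"P1": 200, "P2": 10})
print(leaf_a)
print(leaf_b)
```

For the leaf with 30 examples of P1 and 70 of P2, this gives P(P1|x) = 30 / 100 = 0.3.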
Regression
• During prediction, each tree outputs the average value of the training examples that fall within the node.
[Figure: example tree with tests Age > 30, Capt > 70%, and Edu = PhD; one leaf holds AvgAGI = 100K and another holds AvgAGI = 150K.]
Classification
• The predictions from multiple random trees are averaged as the final output.
• Classification: a loss function is needed to turn the averaged probability into a label.
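A sketch of the averaging step, with made-up per-tree outputs. The final label here uses 0-1 loss (pick the most probable class); a different loss function would change only that last step.

```python
def average_posteriors(per_tree):
    """Average the class posteriors output by several trees for one example."""
    n = len(per_tree)
    return {y: sum(p[y] for p in per_tree) / n for y in per_tree[0]}


def average_values(per_tree):
    """Average the regression outputs of several trees for one example."""
    return sum(per_tree) / len(per_tree)


# Hypothetical outputs of three random trees for the same example x.
tree_posteriors = [{"P1": 0.3, "P2": 0.7},
                   {"P1": 0.5, "P2": 0.5},
                   {"P1": 0.1, "P2": 0.9}]
averaged = average_posteriors(tree_posteriors)
predicted = max(averaged, key=averaged.get)  # decision under 0-1 loss
print(averaged, predicted)

tree_values = [100000.0, 150000.0, 110000.0]
print(average_values(tree_values))
```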
A few words about some of its advantages
• Training can be very efficient, particularly for very large datasets.
• Natural multi-class probability.
• Natural multi-label classification and probability estimation.
• Imposes very little about the structure of the model.
Number of trees
• Sampling theory:
  • The random decision tree can be thought of as sampling from a large (infinite when continuous features exist) population of trees.
  • Unless the data is highly skewed, 30 to 50 trees give a pretty good estimate with reasonably small variance. In most cases, 10 are enough.
• Worst scenario:
  • Only one feature is relevant. All the rest are noise.
  • Probability:
  • Variance reduction:
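The sketch below is not the slide's own formula; it only illustrates the sampling argument under two simplifying assumptions. First, if each tree independently tests the one relevant feature at a given position with probability 1/k (k features), then at least one of N trees does so with probability 1 - (1 - 1/k)^N. Second, if per-tree estimates are independent draws with the same variance, averaging N of them divides the variance by roughly N. The Gaussian noise model and all parameter values are assumptions.

```python
import random


def prob_relevant_feature_tested(k, n_trees):
    # Assumption: each tree picks the relevant feature with probability 1/k,
    # independently of the other trees.
    return 1.0 - (1.0 - 1.0 / k) ** n_trees


def variance_of_average(n_trees, trials, rng):
    """Empirical variance of the mean of `n_trees` noisy per-tree estimates."""
    averages = []
    for _ in range(trials):
        # Hypothetical per-tree estimate: true value 0.3 plus Gaussian noise.
        estimates = [rng.gauss(0.3, 0.2) for _ in range(n_trees)]
        averages.append(sum(estimates) / n_trees)
    center = sum(averages) / trials
    return sum((a - center) ** 2 for a in averages) / trials


rng = random.Random(42)
for n in (1, 10, 30):
    print(n, prob_relevant_feature_tested(10, n), variance_of_average(n, 2000, rng))
```

Under these assumptions, most of the variance reduction is already obtained with 10 to 30 trees, which is in line with the rule of thumb above.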
Donation Dataset - classification and prob estimation
• Decide whom to send a charity solicitation letter.
• It costs $0.68 to send a letter.
• Loss function
Result
Credit Card Fraud - classification and prob estimation
• Detect if a transaction is a fraud.
• There is an overhead to detect a fraud: {$60, $70, $80, $90}.
• Loss function
Result
Comparing with Boosting
• Does not handle multi-class problems naturally; needs ECOC.
• Does not output probabilities.
• Inefficient.
• Choosing the number of boosting rounds is tricky. Sometimes, more rounds can lead to overfitting.
• Implementation needs careful numerical manipulation.
Comparing with Bagging
• Could be very inefficient, particularly for very large datasets, i.e., bootstrap sampling needs a linear scan of the data.
• Does not output reliable probabilities.
Probability Estimation
Probability Estimation
Overfitting
Non-overfitting of RDT
Selectivity
Tolerance to data insufficiency
GUIDE
[Figure: GUIDE tree with tests Age > 30, Capt > 70%, and Edu = PhD; each leaf holds an MLR model instead of a constant.]
MLR: y = a + a1*x1 + a2*x2 + … + ak*xk
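To make the contrast concrete, here is a sketch of the two kinds of leaf for the simplest case of a single independent variable: a leaf that fits a line by ordinary least squares, and a leaf that just stores the mean as an RDT node does. This is not GUIDE's actual fitting procedure; the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + a1 * x; returns (a, a1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_mean(ys):
    """Constant leaf: predicts the mean target of its training examples."""
    return sum(ys) / len(ys)


# Hypothetical examples that were sorted into one leaf.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
constant = fit_mean(ys)
print(intercept, slope, constant)
print(intercept + slope * 5.0, constant)  # predictions for a new x = 5.0
```

The linear leaf extrapolates along the fitted line, while the constant leaf returns the same value for every example that reaches it; the constant leaf needs no choice of independent variables.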
Regression: single independent variable
RDT
Depends on a combination of 5 independent variables
RDT
It grows like …
Comparing with GUIDE
• Need to decide grouping variables and independent variables. A non-trivial task.
• If all variables are categorical, GUIDE becomes a single CART regression tree.
• Strong assumption and greedy-based search. Sometimes, can lead to very unexpected results, like the one given earlier.
Conclusion
• Imposing a particular form of model is not a good idea for training highly accurate models.
• It may not even be efficient for some forms of models.
• RDT has been shown to solve all three major problems in data mining (classification, probability estimation, and regression) simply, efficiently, and accurately.
Selected Bibliography of RDT
• ICDM’03: “Is random model better? On its accuracy and efficiency” (Fan, Wang, Yu and Ma)
• AAAI’04: “On the Optimality of Posterior Probability Estimation by Random Decision Tree” (Fan)
• ICDM’05: “Effective Estimation of Posterior Probabilities: Explaining the Accuracy of Randomized Decision Tree Approaches” (Fan, Greengrass, McCloskey, Yu, and Drummey)
• ICDM’05: “Learning through Changes: An Empirical Study of Dynamic Behaviors of Probability Estimation Trees” (Zhang, Buckles, Peng, and Xu)
• Master Thesis by Tony Liu, supervised by Kai Ming Ting, “The Utility of Randomness in Decision Tree Construction”, Monash University, 2005
• KDD’06: “A General Framework for Fast and Accurate Regression by Data Summarization in Random Decision Trees”