Prelude of Machine Learning 202 Statistical Data Analysis in the Computer Age (1991) Bradley Efron...
Prelude of Machine Learning 202
Statistical Data Analysis in the Computer Age (1991)
Bradley Efron and Robert Tibshirani
Agenda
• Overview
• Bootstrap
• Nonparametric Regression
• Generalized Additive Models
• Classification and Regression Trees
• Conclusion
Overview
• Classical statistical methods from 1920-1950:
  – Linear regression, hypothesis testing, standard errors, confidence intervals, etc.
• New statistical methods post-1980:
  – Based on the power of electronic computation
  – Require fewer distributional assumptions than their predecessors
• How to spend computational wealth wisely?
Bootstrap
• Random sample from 164 data points
• t(x) = 28.58
• How accurate is t(x)?
• The bootstrap: a device for extending the standard error (SE) to estimators other than the mean
• Suppose t(x) is the 25% trimmed mean
Bootstrap
• Why use a trimmed mean rather than mean(x)?
• If the data come from a long-tailed probability distribution, the trimmed mean can be substantially more accurate than mean(x)
• In practice, one does not know a priori whether the true probability distribution is long-tailed. The bootstrap can help answer this question.
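The bootstrap standard error described on these slides can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `trimmed_mean` and `bootstrap_se`, the choice of B = 200 resamples, and the synthetic long-tailed sample are all assumptions made for the example (the 164 data points from the slides are not available here).

```python
import numpy as np

def trimmed_mean(x, trim=0.25):
    """Mean of the values left after dropping the lowest and highest `trim` fraction."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(trim * x.size))  # number of points cut from each tail
    return x[k:x.size - k].mean()

def bootstrap_se(x, stat, B=200, seed=0):
    """Bootstrap SE: resample x with replacement B times and take the SD of the statistic."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    reps = np.array([stat(rng.choice(x, size=x.size, replace=True)) for _ in range(B)])
    return reps.std(ddof=1)

# Synthetic long-tailed sample (Student t, 2 degrees of freedom).
# Assumption for illustration only; NOT the 164 points from the slides.
rng = np.random.default_rng(1)
x = rng.standard_t(df=2, size=164)

se_mean = bootstrap_se(x, np.mean)
se_trim = bootstrap_se(x, trimmed_mean)
print(f"bootstrap SE of mean:             {se_mean:.3f}")
print(f"bootstrap SE of 25% trimmed mean: {se_trim:.3f}")
```

Comparing the two printed standard errors is one way to let the data indicate whether trimming helps; for long-tailed samples one would typically expect the trimmed mean to show the smaller SE.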
Nonparametric Regression
• Quadratic regression curve at 60% compliance
• 27.72 +/- 3.08
Nonparametric Regression
• Nonparametric regression with loess at 60% compliance
• i.e.
  – Windowing with the nearest 20% of data points
  – Smooth weight function
  – Weighted linear regression
• 32.38 +/- ?
• How to find SE?
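The three loess ingredients listed above (a window of the nearest 20% of points, a smooth weight function, and a weighted linear regression) can be sketched for a single target point. This is a simplified illustration under stated assumptions: the function name `loess_at` is hypothetical, the tricube weight is a common choice that the slides do not specify, and the data are synthetic.

```python
import numpy as np

def loess_at(x0, x, y, span=0.2):
    """Locally weighted linear fit at x0 using the nearest `span` fraction of points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(2, int(np.ceil(span * x.size)))   # window size: nearest 20% of the data by default
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]                # indices of the k nearest neighbours
    h = dist[idx].max()                       # window half-width
    if h == 0:
        return y[idx].mean()                  # all neighbours sit exactly at x0
    w = (1 - (dist[idx] / h) ** 3) ** 3       # tricube weights (assumed), zero at the window edge
    # Weighted least squares for y ~ a + b * (x - x0); the intercept a is the fit at x0.
    X = np.column_stack([np.ones(k), x[idx] - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
    return coef[0]

# Synthetic data for illustration only (not the compliance study data).
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=164)
y = 10 + 0.3 * x + rng.normal(0, 15, size=164)
print(f"loess estimate at x = 60: {loess_at(60.0, x, y):.2f}")
```

Evaluating `loess_at` over a grid of `x0` values traces out the full smooth curve.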
Nonparametric Regression
• How to find SE?
• Bootstrap
• 32.38 +/- 5.71 with B = 50 bootstrap replications
• At 60% compliance
  – Quadratic regression (QR): 27.72 +/- 3.08
  – Nonparametric regression (NPR): 32.38 +/- 5.71
• On balance, the quadratic estimate should probably be preferred in this case.
• It would have to have an unusually large bias to undo its superiority in SE.
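A bootstrap standard error for a fitted curve at one point can be sketched by resampling (x, y) pairs and refitting each time. The example below uses a quadratic fit because it is short; the same loop would work with any fitting function that takes `(x, y, x0)`, including a loess routine. The function names, the pair-resampling scheme, and the synthetic data are assumptions for illustration, so the printed numbers will not match the slides.

```python
import numpy as np

def quadratic_prediction(x, y, x0):
    """Fit y ~ a + b*x + c*x^2 by least squares and evaluate the curve at x0."""
    coefs = np.polyfit(x, y, deg=2)
    return np.polyval(coefs, x0)

def bootstrap_se_prediction(x, y, x0, fit, B=50, seed=0):
    """Resample (x, y) pairs with replacement B times; return the SD of the refitted predictions."""
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # one bootstrap sample of row indices
        preds[b] = fit(x[idx], y[idx], x0)
    return preds.std(ddof=1)

# Synthetic compliance/improvement data; hypothetical, not the study data from the slides.
rng = np.random.default_rng(2)
x = rng.uniform(0, 100, size=164)
y = 5 + 0.1 * x + 0.004 * x**2 + rng.normal(0, 20, size=164)

estimate = quadratic_prediction(x, y, 60.0)
se = bootstrap_se_prediction(x, y, 60.0, quadratic_prediction, B=50)
print(f"quadratic estimate at 60: {estimate:.2f} +/- {se:.2f}")
```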
Generalized Additive Models
• Generalized linear model (GLM):
  – Generalizes linear regression
  – Linear model related to the response variable through a link function g
    g(E[Y]) = b0 + b1*X1 + ... + bm*Xm
• Additive model:
  – Nonparametric regression method
  – Estimate a nonparametric function for each predictor
  – Combine all predictor functions to predict the dependent variable
• Generalized additive model (GAM):
  – Blends properties of additive models with the generalized linear model (GLM)
  – Each predictor function fi(xi) is fit using parametric or nonparametric means
  – Provides good fits to training data at the expense of interpretability
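For side-by-side comparison, the three model families can be written in the usual textbook notation. This is a restatement under standard conventions rather than a formula taken from the slides: g is the link function applied to the mean response E[Y], the coefficients written here as beta_j correspond to the slide's b_j, and each f_j is a smooth function of one predictor.

```latex
\begin{align*}
\text{GLM:} \quad & g\bigl(\mathrm{E}[Y]\bigr) = \beta_0 + \beta_1 X_1 + \dots + \beta_m X_m \\
\text{Additive model:} \quad & \mathrm{E}[Y] = \beta_0 + f_1(X_1) + \dots + f_m(X_m) \\
\text{GAM:} \quad & g\bigl(\mathrm{E}[Y]\bigr) = \beta_0 + f_1(X_1) + \dots + f_m(X_m)
\end{align*}
```

Setting every f_j to a straight line recovers the GLM, and taking g to be the identity recovers the additive model.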
GAM Case Study
• Analyze survival of infants after cardiac surgery for heart defects
• Dataset: 497 infant records
• Explanatory variables:
  – Age (days)
  – Weight (kg)
  – Whether warm-blood cardioplegia (WBC) was applied
• WBC support data:
  – Of 57 infants who received the WBC procedure, 7 died
  – Of 440 infants who received the standard procedure, 133 died
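The raw counts above are enough to work out the unadjusted death rates and a crude odds ratio. The short calculation below is only arithmetic on the slide's numbers; the variable names are made up for the example.

```python
# Counts taken from the slide: deaths and totals for each treatment group.
wbc_total, wbc_died = 57, 7
std_total, std_died = 440, 133

def odds(died, total):
    """Odds of death = deaths / survivors."""
    return died / (total - died)

wbc_rate = wbc_died / wbc_total          # proportion who died with WBC
std_rate = std_died / std_total          # proportion who died with the standard procedure
crude_or = odds(std_died, std_total) / odds(wbc_died, wbc_total)

print(f"death rate, WBC:      {wbc_rate:.1%}")   # 12.3%
print(f"death rate, standard: {std_rate:.1%}")   # 30.2%
print(f"crude odds ratio (standard vs WBC): {crude_or:.2f}")  # 3.09
```

This crude ratio of about 3.1:1 ignores age and weight, so it need not match the odds ratios that the regression models report after adjusting for those variables.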
GAM Case Study: Logistic regression results
• Three-parameter regression model
  – Age, Weight: continuous variables
  – WBC applied: binary variable
• Results:
  – WBC has a strong beneficial effect: odds ratio of 3.8:1
  – Higher weight => lower risk of death
  – Age has no significant effect
GAM Case Study: GAM Analysis
• Add three individual smooth functions
  – Use the locally weighted scatterplot smoothing (loess) method
• Results:
  – WBC has a strong beneficial effect: odds ratio of 4.2:1
  – Lighter infants are 55 times more likely to die than heavier infants
  – Surprising findings from the log-odds curve for age!
GAM Case Study: Conclusion
• Traditional regression models may lead to oversimplification
  – Linear logistic regression forces curves to be straight lines
  – Vital information regarding the effect of age is lost in a linear model
  – The problem is more acute with a large number of explanatory variables
• GAM analysis exploits computational power to achieve a new level of analysis flexibility
  – A personal computer can do what required a mainframe 10 years ago
Classification and Regression Trees (CART)
• A nonparametric technique
• An ideal analysis method to apply computer algorithms
• Splits are chosen based on how well they explain variability
• Once a node is split, the procedure is applied to each “split” recursively
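The split-then-recurse idea can be sketched for a single numeric predictor. This is a toy illustration under stated assumptions, not the CART software: it scores splits by within-node sum of squared errors (a regression-tree criterion; classification trees usually use an impurity measure instead), stops at a fixed depth, and uses the hypothetical names `best_split` and `grow`.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) for the split x <= threshold that minimises within-node SSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_thr, best_sse = None, np.inf
    for i in range(1, x.size):
        if x[i] == x[i - 1]:
            continue                      # no threshold separates equal x values
        left, right = y[:i], y[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_thr, best_sse = (x[i - 1] + x[i]) / 2, sse
    return best_thr, best_sse

def grow(x, y, depth=0, max_depth=2):
    """Recursively split each node; returns a nested dict describing the tree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    thr, _ = best_split(x, y)
    if depth >= max_depth or thr is None:
        return {"leaf": True, "value": float(y.mean()), "n": int(x.size)}
    mask = x <= thr
    return {
        "leaf": False,
        "threshold": float(thr),
        "left": grow(x[mask], y[mask], depth + 1, max_depth),    # same procedure on each child
        "right": grow(x[~mask], y[~mask], depth + 1, max_depth),
    }

tree = grow([1, 2, 3, 10, 11, 12], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0], max_depth=1)
print(tree)
```

On this toy input the root splits at 6.5, which separates the two groups of responses exactly. A full implementation would search over all predictors at each node and then prune the grown tree.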
CART Case study
• Gain insight into causes of duodenal ulcers
  – Use a sample of 745 rats
  – 1 out of 56 different alkyl nucleophiles administered to each rat
  – Response: one of three severity levels (1, 2, 3), 3 being the highest severity
• Skewed misclassification costs
  – Severe ulcer misclassification is more expensive than mild ulcer misclassification
• Analysis tree construction:
  – Use the 745 observations as the training data
  – Compute ‘apparent’ misclassification rates
  – Training data misclassification rate has a downward bias
CART Case study
• Classification tree
CART Case study: Observations
• Optimal size of the classification tree is a tradeoff
  – Higher training errors versus overfitting
• It is usually better to construct a large tree and prune from the bottom
• How to choose the optimal size of the classification tree?
  – Use test data on different tree models to understand the misclassification rate in each tree
  – In the absence of test data, use a cross-validation approach
CART: Cross validation
• Mimic the use of a test sample
• Standard cross-validation approach:
  – Divide the dataset into 10 equal partitions
  – Use 90% of the data as the training set and the remaining 10% as test data
  – Repeat so that each partition serves once as the test data
• Cross-validation misclassification errors found to be 10% higher than the original
• Cross validation and bootstrapping are closely related
  – Research on hybrid approaches in progress
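The 10-fold procedure, and the downward bias of the apparent error rate, can be sketched with a deliberately simple classifier. The assumptions here are significant: a 1-nearest-neighbour rule stands in for the classification tree, the data are synthetic rather than the rat data, and the names `one_nn_predict` and `cv_error` are hypothetical.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """1-nearest-neighbour classifier: copy the label of the closest training point."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

def cv_error(X, y, k=10, seed=0):
    """k-fold cross-validation estimate of the misclassification rate."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)   # k roughly equal partitions
    wrong = 0
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False                      # hold this fold out
        pred = one_nn_predict(X[train_mask], y[train_mask], X[test_idx])
        wrong += int((pred != y[test_idx]).sum())
    return wrong / len(y)

# Synthetic two-class data with overlapping classes; hypothetical, not the rat data.
rng = np.random.default_rng(3)
n = 200
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2)) + y[:, None] * 1.0

apparent = float((one_nn_predict(X, y, X) != y).mean())   # error on the training data itself
cv = cv_error(X, y, k=10)
print(f"apparent error:   {apparent:.3f}")
print(f"10-fold CV error: {cv:.3f}")
```

Because each training point is its own nearest neighbour, the apparent error of this classifier is zero, while the cross-validated error is not. That gap is an extreme version of the optimism that cross validation is meant to correct.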
Conclusion
• Computers have enabled a new generation of statistical methods and tools
• Replace traditional mathematical ways with computer algorithms
• Freedom from the bell-shaped curve assumptions of the traditional approach
• Modern statisticians need to understand:
  – Mathematical tractability is not required for computer-based methods
  – Which computer-based methods to use
  – When to use each method