Statistical Learning
Saharon Rosset
Special thanks: Trevor Hastie
Outline
• Part 1: Introduction to Statistical Learning
Roughly chapters 1-3 of “Elements of Statistical Learning” by Hastie, Tibshirani and Friedman (2001)
– Motivation and problem examples
– Introduction of fundamental concepts:
    • Supervised learning: regression and classification
    • Local models (k-NN, kernel smoothing)
    • Linear models
    • Bias-variance tradeoff(s)
    • Examples
– Illustration through discussion of some simple regression methods: linear regression and k-NN
• Part 2: Regularization and Boosting
– Regularized optimization: introduction and examples
– Boosting: introduction and examples
– Boosting as approximate L1 regularization
• Part 3: L1 Regularization: statistical and computational properties
– Piecewise linear regularized solution paths
– L1 regularization in infinite dimensional feature spaces
ESL Chap1 - Introduction
Statistical Learning Problems
• Identify the risk factors for prostate cancer (lcavol), based on clinical and demographic variables.
• Classify a recorded phoneme, based on a log-periodogram.
• Predict whether someone will have a heart attack on the basis of demographic, diet and clinical measurements.
• Customize an email spam detection system.
• Identify the numbers in a handwritten zip code, from a digitized image.
• Classify a tissue sample into one of several cancer classes, based on a gene expression profile.
• Classify the pixels in a LANDSAT image, according to usage: {red soil, cotton, vegetation stubble, mixture, gray soil, damp gray soil, very damp gray soil}
The Supervised Learning Problem
• Outcome measurement Y (also called dependent variable, response, target)
• Vector of p predictor measurements X (also called independent variables, inputs, regressors, covariates, features)
• In regression problems, Y is quantitative (price, blood pressure)
• In classification problems, Y takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample). We often use G for classification labels (e.g. G ∈ {survived, died}).
• We have training data (x1, y1), ..., (xN, yN). These are observations (examples, instances) of these measurements.
Objectives
On the basis of the training data we would like to:
• Accurately predict unseen test cases
• Understand which inputs affect the outcome, and how
• Assess the quality of our predictions and inferences
Philosophy
• It is important to understand the ideas behind the various techniques, in order to know how and when to use them.
• One has to understand the simpler methods first, in order to grasp the more sophisticated ones.
• It is important to accurately assess the performance of a method, to know how well or how badly it is working [simpler methods often perform as well as fancier ones!]
• This is an exciting research area, having important applications in science, industry and finance.
200 points generated in R² from an unknown distribution; 100 in each of two classes, G ∈ {GREEN, RED}. Can we build a rule to predict the color of future points?
Linear Regression
The decision boundary is the set of points at which the fitted value is exactly 0.5.
It is linear (by construction) and appears to make many prediction errors in this case.
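A minimal sketch of this rule, assuming the two classes are coded 0 and 1 and using hypothetical stand-in data (two Gaussian clouds, not the data set shown on the slides). The helper names `make_two_class_data`, `fit_linear` and `predict_class` are illustrative only. The fit has three coefficients, and the decision boundary is the line where the fitted value equals 0.5.

```python
import numpy as np

def make_two_class_data(n_per_class=100, seed=0):
    """Hypothetical stand-in data: two Gaussian clouds in R^2, coded 0 and 1."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n_per_class, 2)),
        rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n_per_class, 2)),
    ])
    y = np.repeat([0.0, 1.0], n_per_class)
    return X, y

def fit_linear(X, y):
    """Least-squares fit of y on (1, x1, x2); returns the 3 coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_class(beta, X):
    """Predict class 1 where the fitted value exceeds 0.5, else class 0."""
    A = np.column_stack([np.ones(len(X)), X])
    return (A @ beta > 0.5).astype(int)

X, y = make_two_class_data()
beta = fit_linear(X, y)
train_error = np.mean(predict_class(beta, X) != y)
print("coefficients (intercept, x1, x2):", beta)
print("training misclassification rate:", train_error)
```

The boundary itself is the set of x with beta[0] + beta[1]*x1 + beta[2]*x2 = 0.5, which is a straight line in the plane.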
Possible Scenarios
K-Nearest Neighbors
15-nearest neighbor classification. Fewer training data are misclassified, and the decision boundary adapts to the local densities of the classes.
1-nearest neighbor classification. None of the training data are misclassified.
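The two k-NN fits above can be sketched as follows, again on hypothetical stand-in data (two overlapping Gaussian clouds) rather than the data from the slides. `knn_predict` is an illustrative helper that takes a majority vote among the k nearest training points in Euclidean distance. With k = 1 every training point is its own nearest neighbor, so the training error is zero as long as no two training points coincide.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Majority vote (labels 0/1) among the k nearest training points."""
    # Squared Euclidean distance from every query point to every training point.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    # With odd k the vote cannot tie.
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Hypothetical stand-in data: 100 points per class in R^2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, (100, 2)),
               rng.normal([1.5, 1.5], 1.0, (100, 2))])
y = np.repeat([0, 1], 100)

err_15 = np.mean(knn_predict(X, y, X, k=15) != y)
err_1 = np.mean(knn_predict(X, y, X, k=1) != y)
print("training error, k=15:", err_15)
print("training error, k=1: ", err_1)
```

Zero training error for k = 1 says nothing about test error; evaluating on fresh points from the same distribution is what separates the two fits.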
Discussion
• Linear regression uses 3 parameters to describe its fit.
• Does K-nearest neighbors use just 1, the value of k?
• More realistically, k-nearest neighbors uses roughly N/k effective parameters.
Many modern procedures are variants of linear regression and K-nearest neighbors:
• Kernel smoothers (or viewed as RKHS regression)
• Local linear regression
• Linear basis expansions
• Projection pursuit and neural networks
• Support vector machines and logistic regression
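One way to make the parameter counts in the discussion concrete, offered as a sketch rather than as the slides' own argument: both methods produce fitted values that are linear in y, ŷ = S y, and the trace of S can be read as an effective number of parameters. The helper names below are illustrative, and the data are hypothetical. For least squares on an intercept plus two inputs the trace is 3; for k-NN averaging each point gives itself weight 1/k, so the trace is N/k (assuming no duplicate points).

```python
import numpy as np

def knn_smoother_matrix(X, k):
    """Row i holds weight 1/k on each of the k nearest neighbors of x_i."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    S = np.zeros((n, n))
    S[np.arange(n)[:, None], nearest] = 1.0 / k
    return S

def linear_hat_matrix(X):
    """Hat matrix of least squares on (1, x1, ..., xp)."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ np.linalg.pinv(A)

# Hypothetical inputs: N = 200 points in R^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
k = 15

print("trace, linear regression:", np.trace(linear_hat_matrix(X)))
print("trace, k-NN:", np.trace(knn_smoother_matrix(X, k)), "vs N/k =", len(X) / k)
```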
See page 17 for more details, or the book website for the actual data.
The Bayes error is the best performance possible; the decision boundary shown on the slide attains it.
How should we choose the right modeling approach?
• We want to minimize EPE (expected prediction error)
• What kind of considerations do we need to keep in mind?
– Data in high dimension is sparse: Curse of Dimensionality
    ⇒ Makes estimation hard, affects some methods more
– If the models we keep are too complex, they will be overfitted
    ⇒ Have high variance, be unstable
– If the models are too simple, they will be too poor to represent f(x)
    ⇒ Have high bias, predict poorly
In the next few slides we will give a little more detail and examples; we will revisit these concepts later.
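The sparsity point can be illustrated with a small sketch, under the assumption of inputs uniformly distributed on the unit hypercube (an illustrative setting, not a claim about any data set above). `edge_length` gives the side of a sub-cube that captures a given fraction of such data, and `median_nn_distance` simulates how far away the nearest neighbor typically is; both helper names are hypothetical.

```python
import numpy as np

def edge_length(fraction, p):
    """Side of a sub-cube holding `fraction` of the volume of the unit cube in R^p."""
    return fraction ** (1.0 / p)

def median_nn_distance(n, p, seed=0):
    """Median distance to the nearest neighbor among n uniform points in [0, 1]^p."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, p))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # exclude each point's distance to itself
    return float(np.median(np.sqrt(d2.min(axis=1))))

for p in (1, 2, 10, 50):
    print(f"p={p:3d}  edge for 10% of data: {edge_length(0.1, p):.3f}  "
          f"median NN distance (N=200): {median_nn_distance(200, p):.3f}")
```

In 10 dimensions a "local" cube covering 10% of the data already spans about 79% of the range of each input, so neighborhoods stop being local.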
The bias-variance decomposition
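• In a regression setting, using squared loss
• Assume we are building a model which predicts $\hat{f}(x)$
• What makes up our expected risk?

$$
\begin{aligned}
\mathrm{EPE}(\hat{f}(X)) &= E\big(Y - \hat{f}(X)\big)^2 \\
&= E\big(Y - EY + EY - E\hat{f}(X) + E\hat{f}(X) - \hat{f}(X)\big)^2 \\
&= \mathrm{Var}(Y) + \big(EY - E\hat{f}(X)\big)^2 + E\big(E\hat{f}(X) - \hat{f}(X)\big)^2
\end{aligned}
$$

• $\mathrm{Var}(Y)$: irreducible error of the best possible estimator, $\hat{f}(X) = E(Y \mid X)$
• $\big(EY - E\hat{f}(X)\big)^2$: squared bias, measuring our model’s lack of expressiveness
• $E\big(E\hat{f}(X) - \hat{f}(X)\big)^2$: variance of our model’s prediction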
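A Monte Carlo sketch of the decomposition at a single input point x0, under assumptions that are not in the slides: a true function f(x) = sin(2πx), Gaussian noise with standard deviation `sigma`, and a straight-line fit via `np.polyfit`. The function name and all parameter values are hypothetical. Over many simulated training sets, the directly estimated EPE should agree with the sum of the three terms up to simulation noise.

```python
import numpy as np

def bias_variance_at_point(x0=0.25, n_train=30, sigma=0.5, degree=1,
                           n_reps=4000, seed=0):
    """Estimate EPE at x0 and its three components by simulation."""
    rng = np.random.default_rng(seed)
    f = lambda x: np.sin(2 * np.pi * x)  # assumed true regression function

    preds = np.empty(n_reps)
    for r in range(n_reps):
        # Draw a fresh training set and refit the model each time.
        x = rng.uniform(0.0, 1.0, n_train)
        y = f(x) + rng.normal(0.0, sigma, n_train)
        coef = np.polyfit(x, y, degree)
        preds[r] = np.polyval(coef, x0)

    # Independent new responses at x0, one per fitted model.
    y_new = f(x0) + rng.normal(0.0, sigma, n_reps)
    epe = np.mean((y_new - preds) ** 2)

    irreducible = sigma ** 2                 # Var(Y) at x0
    bias_sq = (f(x0) - preds.mean()) ** 2    # (EY - E f_hat)^2
    variance = preds.var()                   # E(E f_hat - f_hat)^2
    return epe, irreducible, bias_sq, variance

epe, irreducible, bias_sq, variance = bias_variance_at_point()
print("EPE (direct):         ", epe)
print("irreducible + bias^2 + variance:", irreducible + bias_sq + variance)
print("components:", irreducible, bias_sq, variance)
```

Raising `degree` in this setup trades bias for variance, which is one way to see the tradeoff the slide describes.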
Effect as dimension p increases