Gilad Lerman
School of Mathematics
University of Minnesota
Topics in Machine Learning
Text/slides stolen from G. James, D. Witten, T. Hastie, R. Tibshirani and A. Ng
Machine Learning - Motivation
• Arthur Samuel (1959): “Field of study that
gives computers the ability to learn
without being explicitly programmed”
• Sits in between computer science, statistics, optimization, …
• Three categories (soft dichotomy)
Supervised learning
Unsupervised learning
Reinforcement learning
Difficulties
• Understanding the methods
(requires knowledge of various areas)
• Understanding data and application areas
• Sometimes hard to establish mathematical
guarantees
• Sometimes hard to code and test
• Fast developing area of research
Simplification
• To avoid such difficulties, yet obtain a fine level of knowledge in 2 days, we'll follow An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani
• The book is available online
• Plan: the last 3 chapters (8-10) and a bit more…
Review
• Supervised learning (training and test
sets) vs. unsupervised learning
• Examples of supervised learning:
regression, classification
• Examples of unsupervised learning:
density/function estimation, clustering,
dimension reduction
• Recall: regression, bias-variance tradeoff,
resampling (e.g., cross validation), linear
and non-linear models
Quick Review of Regression
and Nearest Neighbors
• Regression predicts a response variable Y (a quantitative variable) in terms of input variables (predictors) X1,…,Xp, given n samples in ℝᵖ; denote X=(X1,…,Xp)
• The regression function f(x)=E(Y|X=x) is the minimizer of the mean squared prediction error
• We cannot compute f precisely, since we have few if any training samples at any given x
Estimating f by NN
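The figure for this slide is lost in the transcript; as a minimal sketch of the idea, here is a k-nearest-neighbor estimate of f(x)=E(Y|X=x). All names and the toy data are illustrative, not from the slides:

```python
import numpy as np

def knn_regress(x0, X, y, k=3):
    """Estimate f(x0) = E(Y | X = x0) by averaging y over the k nearest training points."""
    dists = np.linalg.norm(X - x0, axis=1)   # distance from x0 to every training sample
    nearest = np.argsort(dists)[:k]          # indices of the k closest samples
    return y[nearest].mean()

# toy 1-d training set with responses y = x^2
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])
print(knn_regress(np.array([1.1]), X, y, k=2))  # averages y at x=1 and x=2 -> 2.5
```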
Remarks on NN and
Classification
• Need 𝑝 ≤ 4 and sufficiently large n
• Nearest neighbors tend to be far away in
high dimensions
• Can use kernel or spline smoothing
• Other common methods: parametric and structured models
Neighborhoods in Increasing
Dimensions
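The slide's figure illustrated how nearest neighbors drift far away as the dimension grows; a quick simulation sketch of the same effect (sample sizes and dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

def median_nn_dist(n, p):
    """Median distance from each of n uniform points in [0,1]^p to its nearest neighbor."""
    X = rng.random((n, p))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # all pairwise distances
    np.fill_diagonal(D, np.inf)                                # a point is not its own neighbor
    return float(np.median(D.min(axis=1)))

for p in (1, 5, 20, 50):
    print(p, round(median_nn_dist(200, p), 3))  # the distance grows steadily with p
```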
More on Regression
• Assessing model accuracy:
More on Regression
Flexibility = degrees of freedom (each square marks the method of the same color);
the dashed line is explained later (irreducible error)
On Regression Error
• For an estimator f̂ learned on a training set, the mean squared prediction error is
E[(Y − f̂(X))² | X = x]
• Assume Y = f(X) + ε, where ε is independent noise with mean zero; then
E[(Y − f̂(X))² | X = x] = E[(f(X) + ε − f̂(X))² | X = x]
= E[(f(X) − f̂(X))² | X = x] + Var(ε)
• Var(ε) is the irreducible error
• E[(f(X) − f̂(X))² | X = x] is the reducible error
(f̂(X) depends on the random training sample)
Regression Error:
Bias and Variance
• E[(f(X) − f̂(X))² | X = x] =
E[(f̂(X) − E[f̂(X)])² | X = x] + (E[f̂(X) | X = x] − f(x))² =
Var(f̂(X) | X = x) + Bias²(f̂(X) | X = x)
• E[(Y − f̂(X))² | X = x] =
Var(f̂(X) | X = x) + Bias²(f̂(X) | X = x) + Var(ε)
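The decomposition can be checked numerically; a minimal sketch in which the deliberately biased shrunken-mean estimator, the point f(x) = 2, and all constants are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
f_x, sigma, n_train, n_reps = 2.0, 1.0, 20, 5000

# refit the biased estimator fhat = 0.5 * mean(y) on many independent training draws
fhat = np.array([0.5 * rng.normal(f_x, sigma, n_train).mean() for _ in range(n_reps)])

mse      = np.mean((fhat - f_x) ** 2)   # E[(f(x) - fhat)^2] over training sets
variance = np.var(fhat)                 # Var(fhat)
bias_sq  = (fhat.mean() - f_x) ** 2     # Bias^2(fhat)
print(mse, variance + bias_sq)          # the two agree: MSE = Var + Bias^2
```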
Bias-Variance Tradeoff
Two other tradeoffs:
Quick Review of Classification
and Nearest Neighbors
• Classification:
![Page 21: Topics in Machine Learning - University of Minnesotalerman/bootcamp/machine_learning_cours… · Gilad Lerman School of Mathematics University of Minnesota Topics in Machine Learning](https://reader034.fdocuments.us/reader034/viewer/2022051720/5a78b6c77f8b9a07028d1a41/html5/thumbnails/21.jpg)
Quick Review of Classification
and Nearest Neighbors
• Example:
Chapter 9: SVM
Separation of 2 Classes by a
hyperplane
• Training set: n points (xᵢ,₁, …, xᵢ,ₚ), 1 ≤ i ≤ n, with n labels yᵢ ∈ {−1, 1}, 1 ≤ i ≤ n
• A separating hyperplane (if one exists) satisfies:
yᵢ(β₀ + β₁xᵢ,₁ + … + βₚxᵢ,ₚ) > 0 for all 1 ≤ i ≤ n
Separation of 2 Classes by a
hyperplane
Example:
Separation of 2 Classes by a
hyperplane
• If a separating hyperplane exists, then for a test observation x*, a classifier is obtained by the sign of
f(x*) = β₀ + β₁x₁* + … + βₚxₚ*
(negative/positive sign → class −1/1)
• The magnitude of f(x*) provides confidence in the class assignment:
d(x*, Hyp.) = |β₀ + Σᵢ₌₁ᵖ βᵢxᵢ*| / √(Σᵢ₌₁ᵖ βᵢ²)
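The sign/distance rule above in a few lines of code; the hyperplane coefficients and test point are hypothetical examples:

```python
import numpy as np

def classify(x_star, beta0, beta):
    """Classify x* by sign(f(x*)) and report its distance to the hyperplane f(x) = 0."""
    f = beta0 + beta @ x_star
    dist = abs(f) / np.linalg.norm(beta)   # |f(x*)| / sqrt(sum_i beta_i^2)
    return np.sign(f), dist

# hyperplane x1 = 1 in the plane: beta0 = -1, beta = (1, 0)
label, margin = classify(np.array([2.0, 0.0]), -1.0, np.array([1.0, 0.0]))
print(label, margin)   # class +1, one unit from the boundary
```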
Maximal Margin Classifier
• The MMC is the solution of
maximize M over β₀, …, βₚ subject to Σⱼ₌₁ᵖ βⱼ² = 1 and
yᵢ(β₀ + β₁xᵢ,₁ + … + βₚxᵢ,ₚ) ≥ M for all 1 ≤ i ≤ n
• No explanation in the book, but immediate for
a math student…
• The actual algorithm is not discussed…
Numerical Solution (following
A. Ng's CS229 notes)
• Change of notation: y⁽ⁱ⁾ = yᵢ, x⁽ⁱ⁾ = (xᵢ,₁, …, xᵢ,ₚ)
• Recall – the distance of (x⁽ⁱ⁾, y⁽ⁱ⁾) to the hyperplane
wᵀx + b = 0 is |wᵀx⁽ⁱ⁾ + b| / ‖w‖
Numerical Solution (following
A. Ng's CS229 notes)
Original problem (non-convex):
maximize γ over γ, w, b subject to y⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) ≥ γ for all i, and ‖w‖ = 1
Equivalent non-convex problem via γ = γ̂ / ‖w‖:
maximize γ̂ / ‖w‖ over γ̂, w, b subject to y⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) ≥ γ̂ for all i
Numerical Solution (following
A. Ng's CS229 notes)
Scale w and b by the same constant so that minᵢ y⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) = 1
(no effect on the problem) and change
to the convex problem (quadratic program):
minimize ½‖w‖² over w, b subject to y⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) ≥ 1 for all i
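This quadratic program can be handed to a generic constrained solver; a sketch using scipy.optimize.minimize, where the tiny separable data set and the SLSQP choice are mine, not the notes':

```python
import numpy as np
from scipy.optimize import minimize

# two classes separable by the hyperplane x1 = 1 (so the margin should be 1)
X = np.array([[0., 0.], [0., 1.], [2., 0.], [2., 1.]])
y = np.array([-1., -1., 1., 1.])

# variables z = (w1, w2, b); minimize (1/2)||w||^2 s.t. y_i (w.x_i + b) >= 1
obj  = lambda z: 0.5 * (z[0] ** 2 + z[1] ** 2)
cons = [{"type": "ineq", "fun": lambda z, i=i: y[i] * (X[i] @ z[:2] + z[2]) - 1.0}
        for i in range(len(y))]
res = minimize(obj, x0=np.zeros(3), constraints=cons, method="SLSQP")
w, b = res.x[:2], res.x[2]
print(np.round(w, 3), round(b, 3))   # expect w close to (1, 0), b close to -1
```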
Equivalent Formulation
(following A. Ng's CS229 notes)
Lagrangian: L(w, b, α) = ½‖w‖² − Σᵢ₌₁ⁿ αᵢ[y⁽ⁱ⁾(wᵀx⁽ⁱ⁾ + b) − 1]
Dual: maximize Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢαⱼ y⁽ⁱ⁾y⁽ʲ⁾⟨x⁽ⁱ⁾, x⁽ʲ⁾⟩
subject to αᵢ ≥ 0 and Σᵢ αᵢy⁽ⁱ⁾ = 0
Solution: w = Σᵢ αᵢy⁽ⁱ⁾x⁽ⁱ⁾. Hence: wᵀx + b = Σᵢ αᵢy⁽ⁱ⁾⟨x⁽ⁱ⁾, x⟩ + b
(used later)
A Non-separable Example
Non-robustness of the
Maximal Margin Classifier
The Support Vector Classifier
• If εᵢ = 0 → correct side of the margin
• If εᵢ > 0 → wrong side of the margin
• If εᵢ > 1 → wrong side of the hyperplane
• The solution is affected only by the support vectors, i.e.,
observations on the wrong side of the margin or boundary.
Concept Demonstration
More on the Optimization
Problem
• C controls the number of observations allowed on the wrong side of the margin
• C controls the bias-variance trade-off
• The optimizer is affected only by the support vectors
Increasing C in
clockwise order:
Equivalent Formulation
(following A. Ng's CS229 notes)
• Dual: maximize Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢαⱼ y⁽ⁱ⁾y⁽ʲ⁾⟨x⁽ⁱ⁾, x⁽ʲ⁾⟩
subject to 0 ≤ αᵢ ≤ C and Σᵢ αᵢy⁽ⁱ⁾ = 0
• As before, wᵀx is a linear combination of the inner products ⟨x, x⁽ⁱ⁾⟩
Support Vector Machine (SVM)
• From linear to nonlinear boundaries by
embedding into a higher-dimensional space
• The algorithm can be written in terms of
dot products
• Instead of embedding into a very high-dimensional
space, replace dot products with kernels
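A small check of the kernel idea for the degree-2 polynomial kernel K(x, z) = (x·z)²: the explicit feature map below (a standard identity, written here from scratch) gives the same number as the kernel, so the embedding never needs to be formed:

```python
import numpy as np

def phi(x):
    """Explicit feature map with phi(x).phi(z) = (x.z)^2 for 2-d inputs."""
    return np.array([x[0] * x[0], x[1] * x[1], np.sqrt(2) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
K = (x @ z) ** 2                 # kernel evaluated in the original 2-d space
print(K, phi(x) @ phi(z))        # same value either way
```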
More (following book)
By the solution of the SVC (recall the earlier comment): f(x) = β₀ + Σᵢ₌₁ⁿ αᵢ⟨x, xᵢ⟩
Can use only the support vectors S for the SVC: f(x) = β₀ + Σᵢ∈S αᵢ⟨x, xᵢ⟩
For the SVM – replace the dot products with kernels: f(x) = β₀ + Σᵢ∈S αᵢK(x, xᵢ)
Demonstration
![Page 48: Topics in Machine Learning - University of Minnesotalerman/bootcamp/machine_learning_cours… · Gilad Lerman School of Mathematics University of Minnesota Topics in Machine Learning](https://reader034.fdocuments.us/reader034/viewer/2022051720/5a78b6c77f8b9a07028d1a41/html5/thumbnails/48.jpg)
SVM for K>2 Classes
• OVO (One vs. One): For the training data,
construct K(K−1)/2 binary 1/−1 classifiers (one per pair
of the K classes). For a test point, use voting
(the class with the most pairwise assignments)
• OVA (One vs. All): For training, construct K
classifiers (one class labeled 1 vs. the rest of the classes
labeled −1). For a test point x*, classify according to the
largest estimated f(x*)
• OVO is better when K is not too large
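The OVO voting step can be sketched in a few lines; the `pairwise` dictionary of trained binary classifiers and the toy rule inside it are hypothetical stand-ins:

```python
from collections import Counter
from itertools import combinations

def ovo_predict(x, classes, pairwise):
    """One-vs-one: each of the K(K-1)/2 binary classifiers votes for one of its
    two classes; the class with the most pairwise votes wins."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# toy classifiers: each pair prefers its larger label when x > 0, else the smaller
clf = {pair: (lambda x, p=pair: p[1] if x > 0 else p[0])
       for pair in combinations(("A", "B", "C"), 2)}
print(ovo_predict(1.0, ("A", "B", "C"), clf))   # "C" collects 2 pairwise votes
```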
Chapter 8: Tree-based
Methods (or CART)
• Decision Trees for Regression
• Demonstration of predicting log(salary/1000) as a func.
of # of years in major leagues and hits in previous year
• Terminology: leaf/terminal node, internal node, branch
Building a Decision Tree
• We wish to find boxes R₁, …, R_J minimizing the RSS (residual sum of squares):
Σⱼ₌₁ᴶ Σᵢ: xᵢ∈Rⱼ (yᵢ − ȳ_Rⱼ)², where ȳ_Rⱼ is the mean response over the training observations in Rⱼ
• Computationally infeasible. Use instead recursive binary
splitting (a top-down greedy procedure)
Recursive Binary Splitting
• At each node (top to bottom) determine the
predictor Xⱼ and cutoff s minimizing
Σᵢ: xᵢ∈R₁(j,s) (yᵢ − ȳ_R₁(j,s))² + Σᵢ: xᵢ∈R₂(j,s) (yᵢ − ȳ_R₂(j,s))²
where R₁(j,s) = {X | Xⱼ < s}, R₂(j,s) = {X | Xⱼ ≥ s}, and ȳ_R is the mean of yᵢ over R
Recursive Binary Splitting
• For j = 1, …, p, determine the s that maximizes
(Σᵢ: xᵢ∈R₁(j,s) yᵢ)² / |R₁(j,s)| + (Σᵢ: xᵢ∈R₂(j,s) yᵢ)² / |R₂(j,s)|
(equivalent to minimizing the split's RSS)
• Can be done by sorting the values of the j-th coordinate,
checking all n−1 pairs (xᵢ, xᵢ₊₁)
(O(1) operations for each, using running sums), and reporting the
average of xᵢ and xᵢ₊₁ for the maximizing i
• Total cost is O(pn), plus sorting
• We assumed continuous random variables (can
modify for discrete ones)
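The sorted scan for one predictor can be sketched as follows (function name and toy data are mine); running sums make each candidate split O(1) to score:

```python
import numpy as np

def best_split_1d(x, y):
    """Best cutoff s for one predictor: maximize
    (sum of y in R1)^2/|R1| + (sum of y in R2)^2/|R2|
    over the n-1 candidate splits of the sorted values."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    csum, total = np.cumsum(ys), ys.sum()
    best, best_s = -np.inf, None
    for i in range(len(xs) - 1):             # split between xs[i] and xs[i+1]
        n1 = i + 1
        score = csum[i] ** 2 / n1 + (total - csum[i]) ** 2 / (len(xs) - n1)
        if score > best:
            best, best_s = score, (xs[i] + xs[i + 1]) / 2   # midpoint cutoff
    return best_s

x = np.array([1., 2., 3., 4.])
y = np.array([0., 0., 10., 10.])
print(best_split_1d(x, y))   # 2.5, separating the low responses from the high ones
```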
More on Recursive Binary
Splitting
• The previous process is repeated until a stopping
criterion is met
• Predict the response by the mean of the training
observations in the region the test sample belongs to
Tree Pruning
• Continue from page 17 of the book's slides, trees.pdf