Taming the Learning Zoo

Transcript of Taming the Learning Zoo
SUPERVISED LEARNING ZOO
- Bayesian learning: maximum likelihood, maximum a posteriori
- Decision trees
- Support vector machines
- Neural nets
- k-Nearest-Neighbors
VERY APPROXIMATE “CHEAT-SHEET” FOR TECHNIQUES DISCUSSED IN CLASS

| Technique | Attributes | N scalability | D scalability | Capacity |
|---|---|---|---|---|
| Bayes nets | D | Good | Good | Good |
| Naïve Bayes | D | Excellent | Excellent | Low |
| Decision trees | D,C | Excellent | Excellent | Fair |
| Neural nets | C | Poor | Good | Good |
| SVMs | C | Good | Good | Good |
| Nearest neighbors | D,C | Learn: Excellent, Eval: Poor | Poor | Excellent |

(Attributes: D = discrete, C = continuous)
WHAT HAVEN’T WE COVERED?
- Boosting: a way of turning several “weak learners” into a “strong learner”; e.g., used in the popular random forests algorithm
- Regression: predicting continuous outputs y = f(x)
  - Neural nets and nearest neighbors work directly as described
  - Least squares, locally weighted averaging
- Unsupervised learning: clustering, density estimation, dimensionality reduction [harder to quantify performance]
AGENDA
- Quantifying learner performance: cross-validation, precision & recall
- Model selection
CROSS-VALIDATION
ASSESSING PERFORMANCE OF A LEARNING ALGORITHM
- Samples from X are typically unavailable
- Take out some of the training set
- Train on the remaining training set
- Test on the excluded instances: cross-validation
CROSS-VALIDATION
Split the original set of examples and train on one part.
[Figure: examples D (+ and − instances) partitioned into a training set, which is used to pick a hypothesis from hypothesis space H]
CROSS-VALIDATION
Evaluate the hypothesis on the testing set.
[Figure: the held-out + and − instances form the testing set; the hypothesis from H labels each test instance]
CROSS-VALIDATION
Compare the true concept against the prediction.
[Figure: true labels vs. predicted labels on the testing set; 9/13 correct]
COMMON SPLITTING STRATEGIES
- k-fold cross-validation
[Figure: the dataset divided into k folds; each fold in turn serves as Test while the rest serve as Train]
COMMON SPLITTING STRATEGIES
- k-fold cross-validation
- Leave-one-out (n-fold cross-validation)
[Figure: each single example in turn serves as Test while the remaining n−1 serve as Train]
COMPUTATIONAL COMPLEXITY
k-fold cross-validation requires:
- k training steps, each on n(k−1)/k datapoints
- k testing steps, each on n/k datapoints
(There are efficient ways of computing leave-one-out estimates for some nonparametric techniques, e.g., nearest neighbors.)
Average results are reported.
BOOTSTRAPPING
A similar technique for estimating the confidence in the model parameters θ.
Procedure:
1. Draw k hypothetical datasets from the original data, either via cross-validation or sampling with replacement.
2. Fit the model on each dataset to compute parameters θ1,…,θk.
3. Return the standard deviation of θ1,…,θk (or a confidence interval).
Can also estimate confidence in a prediction y = f(x).
SIMPLE EXAMPLE: AVERAGE OF N NUMBERS
Data D = {x(1),…,x(N)}; the model is a constant θ.
Learning: minimize E(θ) = Σi (x(i) − θ)², i.e., compute the average.
Repeat for j = 1,…,k:
- Randomly sample a subset x(1)′,…,x(N)′ from D
- Learn θj = 1/N Σi x(i)′
Return the histogram of θ1,…,θk.
[Figure: average with lower and upper range vs. |Data set| from 10 to 10000; the band around ≈0.5 narrows as the dataset grows]
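The bootstrap procedure above, specialized to the running example of averaging, might be sketched as follows (sampling with replacement; the choice of k and the seed are arbitrary, and the function name is mine):

```python
import random
import statistics

def bootstrap_mean(data, k=1000, seed=0):
    """Estimate the spread of the sample mean by bootstrapping.

    Draws k resamples of size N with replacement, computes the mean of
    each (theta_1, ..., theta_k), and returns the mean and standard
    deviation of those k estimates.
    """
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(k)]
    return statistics.fmean(means), statistics.stdev(means)
```

The standard deviation returned here shrinks roughly as 1/√N, matching the narrowing confidence band in the slide's plot.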
PRECISION RECALL CURVES
PRECISION VS. RECALL
- Precision: # of true positives / (# true positives + # false positives)
- Recall: # of true positives / (# true positives + # false negatives)
- A precise classifier is selective
- A classifier with high recall is inclusive
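The two definitions above are a short computation; this is a minimal sketch (the function name is mine):

```python
def precision_recall(predictions, labels):
    """Compute precision and recall from boolean predictions vs. true labels.

    Precision: of everything predicted positive, how much was right (selective).
    Recall: of everything actually positive, how much was found (inclusive).
    """
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```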
PRECISION-RECALL CURVES
Measure precision vs. recall as the classification boundary is tuned.
[Figure: precision (vertical axis) vs. recall (horizontal axis); curves toward the upper right indicate better learning performance]
PRECISION-RECALL CURVES
Measure precision vs. recall as the classification boundary is tuned.
Which learner is better?
[Figure: precision-recall curves for Learner A and Learner B]
AREA UNDER CURVE
AUC-PR: measure the area under the precision-recall curve.
[Figure: precision-recall curve with the area beneath it shaded; AUC = 0.68]
AUC METRICS
- A single number that measures “overall” performance across multiple thresholds
- Useful for comparing many learners
- “Smears out” the PR curve
- Note training / testing set dependence
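As a sketch of how such a curve and its area are produced, the following assumes real-valued classifier scores (higher = more positive), sweeps the threshold over them, and integrates with the trapezoid rule; the function names and the convention of starting the curve at recall 0, precision 1 are my choices, not from the slides:

```python
def pr_curve(scores, labels):
    """Lower the decision threshold one example at a time and record
    (recall, precision) at each cut."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)          # assumes at least one positive example
    tp = fp = 0
    points = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points

def auc_pr(points):
    """Trapezoidal area under the precision-recall curve."""
    area, (prev_r, prev_p) = 0.0, (0.0, 1.0)
    for r, p in points:
        area += (r - prev_r) * (p + prev_p) / 2
        prev_r, prev_p = r, p
    return area
```

A classifier that ranks every positive above every negative traces the ideal curve, so its area is 1.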
MODEL SELECTION AND REGULARIZATION
COMPLEXITY VS. GOODNESS OF FIT
- More complex models can fit the data better, but can overfit
- Model selection: enumerate several possible hypothesis classes of increasing complexity; stop when cross-validated error levels off
- Regularization: explicitly define a metric of complexity and penalize it in addition to the loss
MODEL SELECTION WITH K-FOLD CROSS-VALIDATION
Parameterize the learner by a complexity level C.
Model selection pseudocode:
  For increasing levels of complexity C:
    errT[C], errV[C] = Cross-Validate(Learner, C, examples)
      [average k-fold CV training error, testing error]
    If errT has converged, stop   [needed capacity reached]
  Find the value Cbest that minimizes errV[C]
  Return Learner(Cbest, examples)
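The pseudocode above might look like this in Python; `cross_validate` is a hypothetical callable returning (training error, validation error) for a complexity level, and the convergence tolerance `tol` is an assumed detail:

```python
def select_model(cross_validate, complexities, tol=1e-3):
    """Increase complexity C until training error stops improving, then
    return the C with the lowest validation error seen so far."""
    err_t, err_v = {}, {}
    prev = float("inf")
    for c in complexities:
        err_t[c], err_v[c] = cross_validate(c)
        if prev - err_t[c] < tol:      # training error has converged:
            break                      # needed capacity reached
        prev = err_t[c]
    return min(err_v, key=err_v.get)   # Cbest minimizing validation error
```

Note that training error drives the stopping rule while validation error drives the final choice, exactly as in the slide.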
MODEL SELECTION: DECISION TREES
C is the max depth of the decision tree. Suppose there are N attributes.
  For C = 1,…,N:
    errT[C], errV[C] = Cross-Validate(Learner, C, examples)
    If errT has converged, stop
  Find the value Cbest that minimizes errV[C]
  Return Learner(Cbest, examples)
MODEL SELECTION: FEATURE SELECTION EXAMPLE
Have many potential features f1,…,fN. The complexity level C indicates the number of features allowed for learning.
  For C = 1,…,N:
    errT[C], errV[C] = Cross-Validate(Learner, examples[f1,…,fC])
    If errT has converged, stop
  Find the value Cbest that minimizes errV[C]
  Return Learner(Cbest, examples)
BENEFITS / DRAWBACKS
- Automatically chooses a complexity level that performs well on hold-out sets
- Expensive: many training / testing iterations
- [But wait: if we fit the complexity level to the testing set, aren’t we “peeking”?]
REGULARIZATION
- Let the learner penalize the inclusion of new features vs. accuracy on the training set
- A feature is included if it improves accuracy significantly; otherwise it is left out
- Leads to sparser models
- Generalization to the test set is considered implicitly
- Much faster than cross-validation
REGULARIZATION
Minimize:
  Cost(h) = Loss(h) + λ·Complexity(h)
Example with linear models y = θᵀx:
- L2 error: Loss(θ) = Σi (y(i) − θᵀx(i))²
- Lq regularization: Complexity(θ) = Σj |θj|^q
- L2 and L1 are the most popular in linear regularization
- L2 regularization leads to simple computation of the optimal θ
- L1 is more complex to optimize, but produces sparse models in which many coefficients are 0!
DATA DREDGING
- As the number of attributes increases, the likelihood that a learner picks up on patterns arising purely from chance increases
- In the extreme case where there are more attributes than datapoints (e.g., pixels in a video), even very simple hypothesis classes (e.g., linear classifiers) can overfit; sparsity is important to enforce
- Many opportunities for charlatans in the big data age!
ISSUES IN PRACTICE
- The distinctions between learning algorithms diminish when you have a lot of data
- The web has made it much easier to gather large-scale datasets than in the early days of ML
- Understanding data with many more attributes than examples is still a major challenge!
- Do humans just have really great priors?
NEXT LECTURES
- Intelligent agents (R&N Ch. 2)
- Markov Decision Processes
- Reinforcement learning
- Applications of AI: computer vision, robotics