Today’s Topics

• Ensembles

• Decision Forests (actually, Random Forests)

• Bagging and Boosting

• Decision Stumps

9/22/15 CS 540 - Fall 2015 (© Jude Shavlik), Lecture 7, Week 3


Ensembles (Bagging, Boosting, and all that)

Old view: learn one good model

New view: learn a good set of models (Naïve Bayes, k-NN, neural net, d-tree, SVM, etc.)

Probably the best example of the interplay between 'theory & practice' in machine learning


Ensembles of Neural Networks (or any supervised learner)

• Ensembles often produce accuracy gains of 5-10 percentage points!

• Can combine "classifiers" of various types
– E.g., decision trees, rule sets, neural networks, etc.

[Figure: the INPUT feeds several networks in parallel; a Combiner merges their predictions into the final OUTPUT]


Three Explanations of Why Ensembles Help


1. Statistical (sample effects)

2. Computational (limited cycles for search)

3. Representational (wrong hypothesis space)

From: Dietterich, T. G. (2002). Ensemble learning. In The Handbook of Brain Theory and Neural Networks, 2nd ed. (M. A. Arbib, Ed.). Cambridge, MA: The MIT Press, pp. 405-408.

[Figure: the concept space considered in each case; key: true concept, learned models, search path]


Combining Multiple Models
Three ideas for combining predictions

1. Simple (unweighted) votes
• Standard choice

2. Weighted votes
• E.g., weight by tuning-set accuracy (both voting schemes are sketched below)

3. Learn a combining function
• Prone to overfitting?
• 'Stacked generalization' (Wolpert)
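A minimal sketch of the first two voting schemes (an illustration added here, not from the slides), assuming each ensemble member exposes a hypothetical predict(x) method that returns a class label, and that the optional weights come from something like tuning-set accuracies:

```python
from collections import defaultdict

def combine_votes(models, x, weights=None):
    """Combine one prediction per ensemble member by (weighted) voting.

    models  -- trained classifiers, each with a predict(x) -> label method
               (an assumed interface, not from the slides)
    weights -- optional per-model weights, e.g. tuning-set accuracies;
               None means every model gets weight 1 (simple unweighted vote)
    """
    if weights is None:
        weights = [1.0] * len(models)
    tally = defaultdict(float)
    for model, w in zip(models, weights):
        tally[model.predict(x)] += w
    # Return the label with the largest total (weighted) vote
    return max(tally, key=tally.get)
```

The third idea, stacking, would instead train a second-level learner on the members' outputs rather than hand-coding the combination.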


Random Forests (Breiman, Machine Learning, 2001; related to Ho, 1995)

A variant of something called BAGGING (‘multi-sets’)

Algorithm

Let N = # of examples, F = # of features, i = some number << F

Repeat k times:

(1) Draw, with replacement, N examples and put them in the train set

(2) Build a d-tree, but in each recursive call
– Choose (w/o replacement) i features
– Choose the best of these i as the root of this (sub)tree

(3) Do NOT prune

In HW2, we'll give you 101 'bootstrapped' samples of the Thoracic Surgery Dataset
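A minimal sketch of this training loop (assumptions: examples are (feature_dict, label) pairs, and build_unpruned_tree is a hypothetical helper that grows a decision tree without pruning, calling the supplied sampler at every recursive split to get its i candidate features):

```python
import random

def train_random_forest(examples, k, i, build_unpruned_tree):
    """Grow k unpruned trees, each on a bootstrap sample, splitting on i random features.

    examples            -- list of N (feature_dict, label) pairs
    k                   -- number of trees to grow
    i                   -- number of features (<< F) considered at each recursive call
    build_unpruned_tree -- hypothetical helper (not from the slides): builds a d-tree
                           WITHOUT pruning, asking feature_sampler() at every split
                           for the candidate features to score
    """
    N = len(examples)
    all_features = list(examples[0][0].keys())   # the F feature names

    def feature_sampler():
        # Step (2): choose i features without replacement at each recursive call
        return random.sample(all_features, i)

    forest = []
    for _ in range(k):
        # Step (1): draw N examples WITH replacement (a bootstrap sample)
        bootstrap = [random.choice(examples) for _ in range(N)]
        # Steps (2)+(3): grow a full, unpruned tree on the bootstrap sample
        forest.append(build_unpruned_tree(bootstrap, feature_sampler))
    return forest
```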


Using Random Forests

After training we have K decision trees

How to use on TEST examples?
Some variant of: if at least L of these K trees say 'true', then output 'true'

How to choose L?
Use a tune set to decide
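A minimal sketch of this test-time rule and of choosing L on a tuning set, assuming binary 'true'/'false' labels and trees that expose a hypothetical predict(example) method:

```python
def forest_predicts_true(forest, example, L):
    """Output True iff at least L of the K trees say 'true'."""
    votes = sum(1 for tree in forest if tree.predict(example) == 'true')
    return votes >= L

def choose_L(forest, tune_set):
    """Pick the vote threshold L that maximizes accuracy on a held-out tuning set.

    tune_set -- list of (example, label) pairs not used for training
    """
    best_L, best_acc = 1, -1.0
    for L in range(1, len(forest) + 1):
        correct = sum(1 for ex, label in tune_set
                      if forest_predicts_true(forest, ex, L) == (label == 'true'))
        acc = correct / len(tune_set)
        if acc > best_acc:
            best_L, best_acc = L, acc
    return best_L
```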


More on Random Forests

• Increasing i
– Increases correlation among individual trees (BAD)
– Also increases accuracy of individual trees (GOOD)

• Can also use a tuning set to choose a good value for i (a small sketch follows below)

• Overall, random forests
– Are very fast (e.g., 50K examples, 10 features, 10 trees/min on a 1 GHz CPU back in 2004)
– Deal well with a large # of features
– Reduce overfitting substantially; NO NEED TO PRUNE!
– Work very well in practice
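The same tuning-set idea can pick i. A minimal sketch, reusing the train_random_forest, forest_predicts_true, and choose_L sketches from the previous two slides; the candidate list is a hypothetical choice:

```python
def choose_i(train_set, tune_set, k, build_unpruned_tree, candidates=(1, 2, 3, 5)):
    """Pick the per-split feature-sample size i by forest accuracy on the tuning set."""
    best_i, best_acc = None, -1.0
    for i in candidates:
        forest = train_random_forest(train_set, k, i, build_unpruned_tree)
        L = choose_L(forest, tune_set)
        correct = sum(1 for ex, label in tune_set
                      if forest_predicts_true(forest, ex, L) == (label == 'true'))
        acc = correct / len(tune_set)
        if acc > best_acc:
            best_i, best_acc = i, acc
    return best_i
```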


A Relevant Early Paper on ENSEMBLES
Hansen & Salamon, IEEE PAMI 12, 1990

– If (a) the combined predictors have errors that are independent of one another

– And (b) the probability that any given model correctly predicts any given test-set example is > 50%, then

$\lim_{N \to \infty} \big(\text{test-set error rate of an ensemble of } N \text{ predictors}\big) = 0$
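One concrete way to see this, added here for illustration: if the N predictors err independently and each is correct with probability p > 0.5, the chance that a majority vote is wrong is a binomial tail that shrinks toward 0 as N grows. A small computation (assuming odd N to avoid ties):

```python
from math import comb

def majority_vote_error(N, p):
    """P(majority of N independent predictors is wrong), each correct with prob p.

    Sums the binomial tail where at most N//2 predictors are correct (exact for odd N).
    For p > 0.5 this probability shrinks toward 0 as N grows, which restates the
    Hansen & Salamon observation.
    """
    return sum(comb(N, j) * p**j * (1 - p)**(N - j) for j in range(0, N // 2 + 1))

# Example: individually mediocre (60%-accurate) but independent predictors
for N in (1, 11, 101):
    print(N, round(majority_vote_error(N, 0.6), 4))
```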


Some More Relevant Early Papers

• Schapire, Machine Learning 5, 1990 ('Boosting')
– If you have an algorithm that gets > 50% accuracy on any distribution of examples, you can create an algorithm that gets > (100% - ε) accuracy, for any ε > 0
– Need an infinite (or very large, at least) source of examples
– Later extensions (e.g., AdaBoost) address this weakness

• Also see Wolpert, 'Stacked Generalization,' Neural Networks, 1992


Some Methods for Producing ‘Uncorrelated’ Members of an Ensemble

• K times randomly choose (with replacement) N examples from a training set of size N

• Give each training set to a std ML algo
– 'Bagging' by Breiman (Machine Learning, 1996)
– Want unstable algorithms (so learned models vary)

• Reweight examples each cycle (if wrong, increase weight; else decrease weight); a simplified sketch follows below
– 'AdaBoosting' by Freund & Schapire (1995, 1996)
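A minimal sketch of that reweighting step, simplified from AdaBoost (real AdaBoost derives the up/down-weighting factor from the round's weighted error; the fixed factor here is a hypothetical stand-in):

```python
def reweight(examples, weights, model, factor=2.0):
    """One round of 'if wrong, increase weight; else decrease weight'.

    examples -- list of (x, label) pairs
    weights  -- current per-example weights, same length as examples
    model    -- the classifier learned this round, with predict(x) (assumed interface)
    factor   -- fixed multiplier; AdaBoost computes this from the weighted error
    """
    new_weights = []
    for (x, label), w in zip(examples, weights):
        if model.predict(x) != label:
            new_weights.append(w * factor)   # misclassified: boost its weight
        else:
            new_weights.append(w / factor)   # correct: shrink its weight
    total = sum(new_weights)
    return [w / total for w in new_weights]  # renormalize so weights sum to 1
```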


Empirical Studies (from Freund & Schapire; reprinted in Dietterich’s AI Magazine paper)

[Scatter plots, each point one data set: error rate of C4.5 (an ID3 successor) vs. error rate of bagged (boosted) C4.5, and error rate of AdaBoost vs. error rate of bagging]

Boosting and Bagging helped almost always!

On average, Boosting slightly better?


Some More Methods for Producing “Uncorrelated” Members of an Ensemble

• Directly optimize accuracy + diversity
– Opitz & Shavlik (1995; used genetic algorithms)
– Melville & Mooney (2004-5)

• Different number of hidden units in a neural network, different k in k-NN, tie-breaking scheme, example ordering, different ML algorithms, etc.
– Various people
– See the 2005-2008 papers of Rich Caruana's group for large-scale empirical studies of ensembles


Boosting/Bagging/etc. Wrapup

• An easy-to-use and usually highly effective technique
– Always consider it (Bagging, at least) when applying ML to practical problems

• Does reduce 'comprehensibility' of models
– See work by Craven & Shavlik, though ('rule extraction')

• Increases runtime, but cycles are usually much cheaper than examples (and easily parallelized)


Decision "Stumps" (formerly part of HW; try on your own!)

• Holte (ML journal) compared:
– Decision trees with only one decision (decision stumps)
vs
– Trees produced by C4.5 (with its pruning algorithm used)

• Decision 'stumps' do remarkably well on UC Irvine data sets
– Archive too easy? Some datasets seem to be

• Decision stumps are a 'quick and dirty' control for comparing to new algorithms (a 1R-style sketch follows below)
– But ID3/C4.5 is easy to use and probably a better control
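A minimal sketch of a one-decision stump over nominal features, loosely in the spirit of Holte's 1R (his 1R also bins numeric features, which is omitted here): keep the single feature whose per-value majority-class rule makes the fewest training errors.

```python
from collections import Counter, defaultdict

def learn_stump(examples):
    """examples -- list of (feature_dict, label) pairs with nominal feature values."""
    best = None   # (training_errors, feature, {value: predicted_label})
    for feature in examples[0][0]:
        # Majority class for each value this feature takes in the training set
        by_value = defaultdict(Counter)
        for x, label in examples:
            by_value[x[feature]][label] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(1 for x, label in examples if rule[x[feature]] != label)
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    _, feature, rule = best
    return feature, rule

def stump_predict(stump, x, default=None):
    """Predict with a learned stump; unseen feature values fall back to a default."""
    feature, rule = stump
    return rule.get(x[feature], default)
```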


C4.5 Compared to 1R (‘Decision Stumps’)

See the Holte paper in Machine Learning for the key (e.g., HD = heart disease)

Test-set accuracy:

Dataset   C4.5     1R
BC        72.0%    68.7%
CH        99.2%    68.7%
GL        63.2%    67.6%
G2        74.3%    53.8%
HD        73.6%    72.9%
HE        81.2%    76.3%
HO        83.6%    81.0%
HY        99.1%    97.2%
IR        93.8%    93.5%
LA        77.2%    71.5%
LY        77.5%    70.7%
MU        100.0%   98.4%
SE        97.7%    95.0%
SO        97.5%    81.0%
VO        95.6%    95.2%
V1        89.4%    86.8%
