Ensemble Learning


Transcript of Ensemble Learning

Page 1: Ensemble Learning


Ensemble Learning

Lecturer: Dr. Bo Yuan

E-mail: [email protected]

Page 2: Ensemble Learning

Real World Scenarios

VS.


Page 3: Ensemble Learning

Real World Scenarios


Page 4: Ensemble Learning

What is ensemble learning?

Many individual learning algorithms are available: Decision Trees, Neural Networks, Support Vector Machines

The process by which multiple models are strategically generated and combined in order to better solve a particular Machine Learning problem.

Motivations: to improve the performance of a single model, and to reduce the likelihood of an unfortunate selection of a poor model.

Multiple Classifier Systems

One idea, many implementations: Bagging, Boosting (a small illustration follows).
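As a concrete illustration, here is a minimal scikit-learn sketch (my own, not from the lecture) that generates three different single learners and combines their votes; the data set and parameters are assumptions for demonstration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy data set (assumption for demonstration).
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three different single learners, combined by majority voting.
members = [("dt", DecisionTreeClassifier(random_state=0)),
           ("svm", SVC(random_state=0)),
           ("nn", MLPClassifier(max_iter=1000, random_state=0))]
ensemble = VotingClassifier(estimators=members, voting="hard").fit(X_tr, y_tr)

# Compare the ensemble with its individual members.
for name, model in members:
    print(name, model.fit(X_tr, y_tr).score(X_te, y_te))
print("ensemble", ensemble.score(X_te, y_te))
```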


Page 5: Ensemble Learning

Algorithm Hierarchy

Machine learning
  Supervised learning
    Classification
      Single algorithms: SVM, DT, NN
      Ensemble algorithms: Boosting, Bagging
  Semi-supervised learning
  Unsupervised learning
    Clustering

Page 6: Ensemble Learning

Combination of Classifiers


Page 7: Ensemble Learning

Model Selection


Page 10: Ensemble Learning

Diversity

Diversity is the key to the success of ensemble learning: each classifier needs to correct the errors made by the others, and the ensemble does not work if all models are identical.

Different learning algorithms: DT, SVM, NN, KNN, …

Different training processes: different parameters, different training sets, different feature sets.

Weak learners: easy to create different decision boundaries, e.g., decision stumps (see the sketch after this list).
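A minimal sketch of these sources of diversity (my own illustration, not from the lecture): each decision stump is trained on a different bootstrap sample and a different random subset of the features. Names and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def diverse_stumps(X, y, n_models=10, seed=0):
    """Build diverse weak learners: each stump sees a different bootstrap
    sample (different training set) and a different random subset of the
    features (different feature set)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    models = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)                          # bootstrap sample
        cols = rng.choice(d, size=max(1, d // 2), replace=False)   # random feature subset
        stump = DecisionTreeClassifier(max_depth=1).fit(X[rows][:, cols], y[rows])
        models.append((cols, stump))      # remember which features each stump uses
    return models
```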


Page 11: Ensemble Learning

Combiners

How to combine the outputs of classifiers (a small sketch follows this list).

Averaging

Voting: Majority Voting (e.g., Random Forest); Weighted Majority Voting (e.g., AdaBoost)

Learning the combiner: General Combiner (e.g., Stacking); Piecewise Combiner (e.g., RegionBoost)

No Free Lunch: no single combination scheme is best for every problem.
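A minimal sketch of the non-learned combiners (an assumed toy example with three models and labels in {-1, +1}):

```python
import numpy as np

# preds: shape (n_models, n_samples); each entry is a class label in {-1, +1}.
preds = np.array([[ 1,  1, -1],
                  [ 1, -1, -1],
                  [-1,  1,  1]])

majority = np.sign(preds.sum(axis=0))     # simple majority voting
weights = np.array([0.5, 1.5, 1.0])       # e.g. AdaBoost-style model weights (assumed values)
weighted = np.sign(weights @ preds)       # weighted majority voting
average = preds.mean(axis=0)              # averaging (for real-valued outputs)
```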


Page 12: Ensemble Learning

Bagging


Page 13: Ensemble Learning

Bootstrap Samples

(Figure: three bootstrap samples, Sample 1, Sample 2, Sample 3, drawn with replacement from the original data set; a small example of drawing such samples follows.)
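A small illustration of drawing bootstrap samples (assumed toy data, my own sketch):

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)                      # toy data set of 10 points
for s in range(3):                        # three bootstrap samples, as on the slide
    sample = rng.choice(data, size=len(data), replace=True)   # resample with replacement
    print(f"Sample {s + 1}: {sample}, distinct points: {len(np.unique(sample))}")
# On average about 63.2% of the original points appear in each sample (1 - 1/e);
# the remaining ~1/3 form the out-of-bag set.
```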

Page 14: Ensemble Learning

A Decision Tree


Page 15: Ensemble Learning

Tree vs. Forest


Page 16: Ensemble Learning

Random Forests

Developed by Prof. Leo Breiman, the inventor of CART. www.stat.berkeley.edu/users/breiman/ , http://www.salfordsystems.com/ . Breiman, L.: Random Forests. Machine Learning 45(1), 5–32, 2001.

Bootstrap Aggregation (Bagging): resample with replacement; each bootstrap sample covers around two thirds of the distinct original data points.

A collection of CART-like trees: binary partitions, no pruning, inherent randomness.

Majority Voting.

Page 17: Ensemble Learning

RF Main Features

Generates substantially different trees: use random bootstrap samples of the training data, and use random subsets of variables at each node.

Number of variables per node: √K, where K is the total number of available variables. This can dramatically speed up the tree-building process.

Number of trees: 500 or more.

Self-testing: around one third of the original data are left out of each bootstrap sample, the Out-of-Bag (OOB) data. Similar in spirit to cross-validation. (A minimal configuration sketch follows.)
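A minimal scikit-learn configuration matching this recipe; this is an illustrative assumption on my part, not code from the lecture, and `X_train`, `y_train` are assumed names.

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=500,     # number of trees: 500 or more
                            max_features="sqrt",  # sqrt(K) candidate variables per node
                            oob_score=True,       # self-testing on out-of-bag data
                            n_jobs=-1,
                            random_state=0)
# rf.fit(X_train, y_train)
# print(rf.oob_score_)       # OOB accuracy estimate, no separate test set needed
```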


Page 18: Ensemble Learning

RF Advantages

All data can be used in the training process: no need to leave some data out for testing, and no need for conventional cross-validation. The OOB data are used to evaluate each tree.

Performance of the entire RF: each data point is tested only over the subset of trees for which it is out-of-bag (see the sketch below).

High levels of predictive accuracy, with only a few parameters to experiment with. Suitable for both classification and regression.

Resistant to overtraining (overfitting).

No need for prior feature selection.
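To make the OOB idea concrete, here is a small sketch (my own illustration, with an assumed synthetic data set) in which each point is scored only by the trees whose bootstrap sample did not contain it:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)   # assumed toy data
n = len(X)
rng = np.random.default_rng(0)
votes = np.zeros((n, 2))   # class votes, collected only from trees that did NOT train on the point

for t in range(100):
    idx = rng.integers(0, n, size=n)              # bootstrap sample for tree t
    oob = np.setdiff1d(np.arange(n), idx)         # ~1/3 of the points are out-of-bag
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=t).fit(X[idx], y[idx])
    votes[oob, tree.predict(X[oob])] += 1         # each point is tested only where it is OOB

covered = votes.sum(axis=1) > 0
oob_accuracy = np.mean(votes[covered].argmax(axis=1) == y[covered])
print(f"OOB accuracy estimate: {oob_accuracy:.3f}")
```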

Page 19: Ensemble Learning

Stacking


Page 20: Ensemble Learning

Stacking


Page 21: Ensemble Learning

Boosting


Page 22: Ensemble Learning

Boosting


Page 23: Ensemble Learning

Boosting

Bagging reduces variance.

Bagging does not reduce bias.

In Boosting, classifiers are generated sequentially.

Each new classifier focuses on the most informative (hardest) data points.

Training samples are weighted.

Outputs are combined via weighted voting (a minimal sketch follows after this list).

Can create arbitrarily strong classifiers.

The base learners can be arbitrarily weak.

As long as they are better than random guessing!
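A minimal AdaBoost-style sketch of these mechanics (my own illustration, assuming labels in {-1, +1} and decision stumps as the weak learners):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """Minimal AdaBoost: y must contain labels in {-1, +1}."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(X)
    D = np.full(n, 1.0 / n)                      # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()                 # weighted training error of this stump
        if eps >= 0.5:                           # must be better than random guessing
            break
        eps = max(eps, 1e-12)                    # avoid division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)    # model weight (the choice of alpha)
        D *= np.exp(-alpha * y * pred)           # up-weight misclassified points
        D /= D.sum()                             # renormalize to a distribution
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Weighted voting: H(x) = sign(sum_i alpha_i * h_i(x))."""
    scores = sum(a * h.predict(X) for h, a in zip(stumps, alphas))
    return np.sign(scores)
```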


Page 24: Ensemble Learning

Boosting


(Figure: base classifiers h1(x), h2(x), h3(x) are produced during training; at test time their results are combined by the boosting classifier H(x) = sign(∑ αi hi(x)).)

Page 25: Ensemble Learning

AdaBoost


Page 26: Ensemble Learning

Demo


Page 27: Ensemble Learning

Demo


Page 28: Ensemble Learning

Demo


Which one is bigger?

Page 29: Ensemble Learning

Example


Page 30: Ensemble Learning

The Choice of α


Page 31: Ensemble Learning

The Choice of α


Page 32: Ensemble Learning

The Choice of α


$$\varepsilon = \sum_i D_i\,\mathbb{1}\!\left[y_i \neq h(x_i)\right] = P_{i\sim D}\!\left(y_i \neq h(x_i)\right)$$

$$\alpha = \frac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon} = \frac{1}{2}\ln\frac{P_{i\sim D}\!\left(y_i = h(x_i)\right)}{P_{i\sim D}\!\left(y_i \neq h(x_i)\right)}$$
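For example, a base classifier with weighted error ε = 0.2 receives the weight

$$\alpha = \frac{1}{2}\ln\frac{1-0.2}{0.2} = \frac{1}{2}\ln 4 \approx 0.69,$$

while ε = 0.5 (no better than random guessing) gives α = 0, and ε > 0.5 would give a negative weight.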

Page 33: Ensemble Learning

Error Bounds


$$r = \sum_i D_i\, y_i\, h(x_i), \qquad \varepsilon = \frac{1-r}{2}, \qquad Z = 2\sqrt{\varepsilon(1-\varepsilon)} = \sqrt{1-r^{2}}$$
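Here Z is the per-round normalization factor; in the standard AdaBoost analysis the training error of the final ensemble is bounded by the product of the Z values over all rounds, so each round should make Z as small as possible. Substituting ε = (1 − r)/2 shows why the two expressions for Z agree:

$$2\sqrt{\varepsilon(1-\varepsilon)} = 2\sqrt{\frac{1-r}{2}\cdot\frac{1+r}{2}} = \sqrt{(1-r)(1+r)} = \sqrt{1-r^{2}}.$$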

Page 34: Ensemble Learning

Summary of AdaBoost

Advantages: simple and easy to implement; no parameters to tune; proven upper bounds on the training error; empirically resistant to overfitting.

Disadvantages: suboptimal α values; greedy (steepest-descent) optimization; sensitive to noise.

Future work: theory, comprehensibility, new frameworks.


Page 35: Ensemble Learning

Fixed Weighting Scheme


Page 36: Ensemble Learning

Dynamic Weighting Scheme


(Figure: each base classifier h1(x), h2(x), h3(x) is paired with an estimator α1(x), α2(x), α3(x) during training; at test time the results are combined by the boosting classifier H(x) = sign(∑ αi(x) hi(x)).)

Page 37: Ensemble Learning

Boosting with Dynamic Weighting


Boosting with dynamic weighting: RegionBoost, iBoost, DynaBoost, WeightBoost.

$$H(x) = \sum_{i=1}^{T} \alpha_i(x)\, h_i(x)$$

Page 38: Ensemble Learning

RegionBoost

AdaBoost assigns fixed weights to models.

However, different models emphasize different regions.

The weights of models should be input-dependent.

Given an input, only invoke appropriate models.

Train a competency predictor for each model.

Estimate whether the model is likely to make the right decision on that input.

Use this information as the weight.

Many classifiers can be used as the competency predictor, such as KNN and neural networks.

Maclin, R.: Boosting classifiers regionally. AAAI, 700-705, 1998.


Page 39: Ensemble Learning

RegionBoost

(Figure: each base classifier is paired with a competency predictor.)

Page 40: Ensemble Learning

RegionBoost with KNN


To calculate αj(xi): find the K nearest neighbors of xi in the training set, then calculate the percentage of those points correctly classified by hj (see the sketch below).
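A minimal sketch of this idea (my own illustration, not Maclin's implementation; it assumes labels in {-1, +1} and models with a scikit-learn-style predict method):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def regionboost_predict(x, models, X_train, y_train, k=5):
    """H(x) = sign( sum_j alpha_j(x) * h_j(x) ) with input-dependent weights:
    alpha_j(x) is the fraction of the k nearest training points that h_j got right."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(x.reshape(1, -1))   # indices of the k nearest neighbors of x
    neighbours = idx[0]
    score = 0.0
    for h in models:
        correct = h.predict(X_train[neighbours]) == y_train[neighbours]
        alpha = correct.mean()                  # local competency of h around x
        score += alpha * h.predict(x.reshape(1, -1))[0]
    return np.sign(score)
```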

Page 41: Ensemble Learning

RegionBoost Results


Page 42: Ensemble Learning

RegionBoost Results


Page 43: Ensemble Learning

Review

What is ensemble learning?

How can ensemble learning help us?

Two major types of ensemble learning: parallel (Bagging) and sequential (Boosting).

Different ways to combine models: averaging, majority voting, weighted majority voting.

Some representative algorithms: Random Forests, AdaBoost, RegionBoost.


Page 44: Ensemble Learning

Next Week’s Class Talk

Volunteers are required for next week’s class talk.

Topic: Applications of AdaBoost

Suggested Reading

Robust Real-Time Object Detection. P. Viola and M. Jones. International Journal of Computer Vision 57(2), 137-154.

Length: 20 minutes plus question time
