Transcript of Classifier Ensembles Ludmila Kuncheva School of Computer Science Bangor University...
- Slide 1
- Classifier Ensembles. Ludmila Kuncheva, School of Computer Science, Bangor University. mas00a@bangor.ac.uk. Part 2
- Slide 2
- Levels of questions [diagram: Data set → Features → Classifier 1, Classifier 2, ..., Classifier L → Combiner]:
  A. Combination level: selection or fusion? voting or another combination method? trainable or non-trainable combiner? and why not another classifier?
  B. Classifier level: same or different classifiers? decision trees, neural networks or other? how many?
  C. Feature level: all features or subsets of features? random or selected subsets?
  D. Data level: independent/dependent bootstrap samples? selected data sets?
  Building ensembles: Boosting, Random Subspace, Random Forest, Rotation Forest, Bagging, Linear Oracle.
- Slide 3
- [Same levels-of-questions diagram as Slide 2.] Building ensembles. This seems under-researched...
- Slide 4
- Classifier combiners Nobody talks about this...
- Slide 5
- [Diagram: label outputs vs continuous-valued outputs for classes 1, 2, 3 at input x; the continuous outputs are collected into a decision profile and passed to the combiner.]
- Slide 6
- Ensemble (label outputs, R, G, B). The classifiers output: Red, Blue, Red, Green, Red. The majority vote combiner returns Red.
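The majority (plurality) vote above can be sketched in a few lines. A minimal illustration, not the lecturer's code; `majority_vote` is a name chosen here:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the classifier outputs."""
    return Counter(labels).most_common(1)[0][0]

# The slide's label outputs: Red, Blue, Red, Green, Red
votes = ["Red", "Blue", "Red", "Green", "Red"]
print(majority_vote(votes))  # Red wins with 3 of 5 votes
```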
- Slide 7
- Ensemble (label outputs, R, G, B). The same outputs: Red, Blue, Red, Green, Red. Majority vote: Red. With the classifier weights shown on the slide (0.05, 0.50, 0.02, 0.10, 0.70, 0.10, 0.27, 0.70, 0.50), the weighted majority vote returns Green instead.
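A weighted majority vote sums, per class, the weights of the classifiers voting for it. Sketch below; the weights are hypothetical (chosen so the single Green voter dominates), not the slide's:

```python
from collections import defaultdict

def weighted_majority_vote(labels, weights):
    """Sum each classifier's weight behind its label; the highest total wins."""
    score = defaultdict(float)
    for lab, w in zip(labels, weights):
        score[lab] += w
    return max(score, key=score.get)

votes = ["Red", "Blue", "Red", "Green", "Red"]
weights = [0.1, 0.1, 0.1, 0.7, 0.1]  # hypothetical: one highly trusted classifier
print(weighted_majority_vote(votes, weights))  # Green overrides the plurality
```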
- Slide 8
- Ensemble (label outputs, R, G, B). The label outputs (RBRRGR) are fed to another classifier acting as the combiner, which outputs Green.
- Slide 9
- Ensemble (continuous outputs, [R,G,B]) [0.6 0.3 0.1] [0.1 0.0
0.6] [0.7 0.6 0.5] [0.4 0.3 0.1] [0 1 0] [0.9 0.7 0.8]
Combiner
- Slide 10
- Ensemble (continuous outputs, [R,G,B]) [0.6 0.3 0.1] [0.1 0.0
0.6] [0.7 0.6 0.5] [0.4 0.3 0.1] [0 1 0] [0.9 0.7 0.8] Mean R =
0.45 Combiner
- Slide 11
- Ensemble (continuous outputs, [R,G,B]) [0.6 0.3 0.1] [0.1 0.0
0.6] [0.7 0.6 0.5] [0.4 0.3 0.1] [0 1 0] [0.9 0.7 0.8] Mean R =
0.45 Mean G = 0.48 Combiner
- Slide 12
- Ensemble (continuous outputs, [R,G,B]) [0.6 0.3 0.1] [0.1 0.0
0.6] [0.7 0.6 0.5] [0.4 0.3 0.1] [0 1 0] [0.9 0.7 0.8] Mean R =
0.45 Mean G = 0.48 Mean B = 0.35 Class GREEN Combiner
- Slide 13
- Ensemble (continuous outputs, [R,G,B]). Stacking the six classifier outputs as rows gives the decision profile:
  [0.6 0.3 0.1]
  [0.1 0.0 0.6]
  [0.7 0.6 0.5]
  [0.4 0.3 0.1]
  [0.0 1.0 0.0]
  [0.9 0.7 0.8]
  Column means: R = 0.45, G = 0.48, B = 0.35 → class GREEN. Combiner
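The mean combiner over the decision profile can be sketched directly from the slide's numbers (`mean_combiner` is a name chosen here):

```python
# Decision profile from the slide: 6 classifiers x 3 classes [R, G, B]
profile = [
    [0.6, 0.3, 0.1],
    [0.1, 0.0, 0.6],
    [0.7, 0.6, 0.5],
    [0.4, 0.3, 0.1],
    [0.0, 1.0, 0.0],
    [0.9, 0.7, 0.8],
]

def mean_combiner(profile, classes):
    """Average the support for each class (column mean) and pick the maximum."""
    support = [sum(col) / len(profile) for col in zip(*profile)]
    return classes[support.index(max(support))], support

label, support = mean_combiner(profile, ["R", "G", "B"])
# support ≈ [0.45, 0.48, 0.35] -> label "G", as on the slide
```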
- Slide 14
- Time for an example: combiner matters
- Slide 15
- Data set: Let's call this data The Tropical Fish, or just the fish data. 50-by-50 = 2500 objects in 2-d. Bayes error rate = 0%. Induce label noise to make the problem more interesting: noise 10%, noise 45%.
- Slide 16
- Example: 2 ensembles. Train 50 linear classifiers on bootstrap samples. Throw 50 "straws" (random lines) and label the fish side so that the accuracy is greater than 0.5.
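The bootstrap step for the first ensemble can be sketched as follows; a minimal illustration (the data is a stand-in for the 2500 fish-data objects, and `bootstrap_sample` is a name chosen here):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) objects with replacement: one bagging training set."""
    return [rng.choice(data) for _ in data]

rng = random.Random(42)
data = list(range(2500))  # stand-in for the 2500 objects of the fish data
samples = [bootstrap_sample(data, rng) for _ in range(50)]  # 50 classifiers
# each bootstrap sample omits roughly 1/e (about 37%) of the objects on average
```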
- Slide 17
- Example: 2 ensembles. Each classifier returns an estimate for class Fish, P(fish | x). And, of course, we have P(not fish | x) = 1 − P(fish | x), but we will not need this.
- Slide 18
- Example: 2 ensembles 10% label noise
- Slide 19
- Example: 2 ensembles 45% label noise
- Slide 20
- Example: 2 ensembles 45% label noise
- Slide 21
- Example: 2 ensembles 45% label noise
- Slide 22
- Example: 2 ensembles. What does the example show? The combiner matters (a lot). Noise helps the ensemble! The trained combiner for continuous labels is best (linear, tree). BKS (Behavior Knowledge Space) works because of the small number of classes and classifiers. However, nothing is as simple as it looks...
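BKS builds a lookup table over all possible joint label outputs, which is why, as noted above, it only stays feasible with few classes and classifiers. A minimal sketch with made-up data (`BKSCombiner` is a name chosen here):

```python
from collections import Counter, defaultdict

class BKSCombiner:
    """Behavior Knowledge Space combiner (sketch): map each joint vector of
    label outputs to the true class most often seen with it in training."""

    def fit(self, ensemble_outputs, true_labels):
        table = defaultdict(Counter)
        for outputs, y in zip(ensemble_outputs, true_labels):
            table[tuple(outputs)][y] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in table.items()}
        return self

    def predict(self, outputs):
        # fall back to a plain majority vote for unseen output combinations
        return self.table.get(tuple(outputs),
                              Counter(outputs).most_common(1)[0][0])

# Tiny hypothetical training set: two classifiers, classes A/B.
train_outputs = [("A", "B"), ("A", "B"), ("A", "A"), ("B", "B")]
train_truth = ["B", "B", "A", "B"]
bks = BKSCombiner().fit(train_outputs, train_truth)
# bks.predict(("A", "B")) returns "B": the table overrides the tied vote
```

Note the table has up to C^L entries (C classes, L classifiers), so it needs a lot of data to fill reliably.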
- Slide 23
- http://samcnitt.tumblr.com/ The Combining Classifier: to Train
or Not to Train?
- Slide 24
- Slide 25
- Train the COMBINER if you have enough data! Otherwise, like with any classifier, we may overfit the data. Get this: almost NOBODY trains the combiner, not in the CLASSIC ensemble methods anyway. Ha-ha-ha, what is enough data?
- Slide 26
- Diversity Everybody talks about this...
- Slide 27
- Publications (580) Citations (4594) CLASSIFIER ENSEMBLE
DIVERSITY Search on 10 Sep 2014 Diversity
- Slide 28
- 580 papers, by venue: MULTIPLE CLASSIFIER SYSTEMS (30), INT JOINT CONF ON NEURAL NETWORKS (IJCNN) (22), PATTERN RECOGNITION (17), NEUROCOMPUTING (14), EXPERT SYSTEMS WITH APPLICATIONS (13), INFORMATION SCIENCES (12), APPLIED SOFT COMPUTING (11), PATTERN RECOGNITION LETTERS (10), INFORMATION FUSION (9), IEEE INT JOINT CONF ON NEURAL NETWORKS (9), KNOWLEDGE-BASED SYSTEMS (7), IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (7), INT J OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (6), MACHINE LEARNING (5), IEEE TRANSACTIONS ON NEURAL NETWORKS (5), JOURNAL OF MACHINE LEARNING RESEARCH (5), APPLIED INTELLIGENCE (4), INTELLIGENT DATA ANALYSIS (4), IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION (4), ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING (4), NEURAL INFORMATION PROCESSING (4). Diversity
- Slide 29
- Where in the world are we? China 140, UK 68, USA 63, Spain 55, Brazil 41, Canada 32, Poland 28, Iran 23, Italy 19... Diversity
- Slide 30
- Are we still talking about diversity in classifier ensembles?
Apparently yes... That elusive diversity... We want the classifiers
in the ensemble to be ACCURATE and DIVERSE simultaneously. And HOW
CAN THIS HAPPEN?!? Diversity
- Slide 31
- All ensemble methods we have seen so far strive to keep the
individual accuracy high while increasing diversity. How can we
measure diversity? WHAT can we do with the diversity value?
- Slide 32
- Measure diversity for a PAIR of classifiers with a 2-by-2 table (Classifier 1 correct/wrong vs Classifier 2 correct/wrong). Independent outputs do not mean independent errors; hence, use ORACLE outputs. Example entry: the number of instances labelled correctly by classifier 1 and mislabelled by classifier 2. Diversity
- Slide 33
- From the 2-by-2 table (Classifier 1 correct/wrong vs Classifier 2 correct/wrong): Q, kappa, correlation (rho), disagreement, double fault... Diversity
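With the 2-by-2 table entries written as fractions a (both correct), b, c (exactly one correct) and d (both wrong), some of these measures can be sketched as follows (`pairwise_diversity` is a name chosen here; Q is the Yule Q statistic named on the slide):

```python
def pairwise_diversity(o1, o2):
    """Pairwise diversity measures from oracle outputs (1 = correct, 0 = wrong)."""
    n = len(o1)
    a = sum(1 for x, y in zip(o1, o2) if x == 1 and y == 1) / n  # both correct
    b = sum(1 for x, y in zip(o1, o2) if x == 1 and y == 0) / n  # only first correct
    c = sum(1 for x, y in zip(o1, o2) if x == 0 and y == 1) / n  # only second correct
    d = sum(1 for x, y in zip(o1, o2) if x == 0 and y == 0) / n  # both wrong
    q = (a * d - b * c) / (a * d + b * c) if (a * d + b * c) else 0.0
    return {"Q": q, "disagreement": b + c, "double_fault": d}

# Statistically independent-looking outputs give Q = 0;
# identical outputs give Q = 1.
print(pairwise_diversity([1, 1, 0, 0], [1, 0, 1, 0]))
```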
- Slide 34
- SEVENTY SIX !!! Diversity
- Slide 35
- Do we need more NEW pairwise diversity measures? Looks like we don't... And the same holds for non-pairwise measures... Far too many already. Diversity
- Slide 36
- All ensemble methods we have seen so far strive to keep the
individual accuracy high while increasing diversity. How can we
measure diversity? WHAT can we do with the diversity value?
-Compare ensembles -Explain why a certain ensemble heuristic works and others don't -Construct an ensemble by overproducing and then selecting classifiers with high accuracy and high diversity
- Slide 37
- Why is diversity so baffling? The problem is that diversity is
NOT monotonically related to the ensemble accuracy. In other words,
diverse ensembles may be good or may be bad...
- Slide 38
- Good diversity and bad diversity
- Slide 39
- Good and Bad diversity. 3 classifiers A, B, C; 15 objects; each classifier casts a correct or a wrong vote on each object. Individual accuracy = 10/15 = 0.667. P = MAJORITY VOTE ensemble accuracy: independent classifiers, P = 11/15 = 0.733; identical classifiers, P = 10/15 = 0.667; dependent classifiers 1, P = 7/15 = 0.467; dependent classifiers 2, P = 15/15 = 1.000.
- Slide 40
- Good and Bad diversity. [Same table as Slide 39, with the vote patterns now labelled as good diversity and bad diversity.]
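The effect can be reproduced with a toy construction (not the slide's 15-object table): three classifiers with the same individual accuracy, whose majority vote accuracy depends entirely on how their mistakes line up. `majority_vote_accuracy` is a name chosen here:

```python
def majority_vote_accuracy(oracle):
    """oracle: one row of 0/1 (wrong/correct) per classifier, one column per object."""
    L = len(oracle)
    return sum(sum(col) > L / 2 for col in zip(*oracle)) / len(oracle[0])

# Each classifier is correct on 2 of 3 objects (individual accuracy 2/3)...
identical = [[1, 1, 0], [1, 1, 0], [1, 1, 0]]
complementary = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
# ...identical mistakes give majority accuracy 2/3; spreading the mistakes
# so every object still gets 2 of 3 correct votes gives accuracy 1.0
print(majority_vote_accuracy(identical), majority_vote_accuracy(complementary))
```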
- Slide 41
- Good and Bad diversity Data set Z Ensemble, L = 7 classifiers
Are these outputs diverse?
- Slide 42
- Good and Bad diversity Data set Z Ensemble, L = 7 classifiers
How about these?
- Slide 43
- Good and Bad diversity. Data set Z. Ensemble, L = 7 classifiers. 3 vs 4... Can't be more diverse, really...
- Slide 44
- Good and Bad diversity Data set Z Ensemble, L = 7 classifiers
MAJORITY VOTE Good diversity
- Slide 45
- Good and Bad diversity Data set Z Ensemble, L = 7 classifiers
MAJORITY VOTE Bad diversity
- Slide 46
- Good and Bad diversity. Decomposition of the Majority Vote Error: majority vote error = individual error − GOOD diversity + BAD diversity. Brown G., L.I. Kuncheva, "Good" and "bad" diversity in majority vote ensembles, Proc. Multiple Classifier Systems (MCS'10), Cairo, Egypt, LNCS 5997, 2010, 124-133.
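The decomposition can be checked numerically from oracle outputs. The sketch below uses an algebraically equivalent per-object form (good diversity = average fraction of dissenting wrong votes on objects the ensemble gets right; bad diversity = average fraction of correct votes on objects it gets wrong); the paper's exact normalisation may differ, and an odd number of classifiers is assumed:

```python
import random

def decompose_majority_error(oracle):
    """E_maj = E_ind - good + bad for oracle outputs (1 = correct), L odd."""
    L, N = len(oracle), len(oracle[0])
    votes = [sum(col) for col in zip(*oracle)]        # correct votes per object
    e_ind = sum(1 - v / L for v in votes) / N         # mean individual error
    good = sum(1 - v / L for v in votes if v > L / 2) / N  # dissent when maj correct
    bad = sum(v / L for v in votes if v < L / 2) / N       # correct votes wasted
    e_maj = sum(v < L / 2 for v in votes) / N         # majority vote error
    return e_maj, e_ind, good, bad

random.seed(1)
oracle = [[random.randint(0, 1) for _ in range(15)] for _ in range(5)]
e_maj, e_ind, good, bad = decompose_majority_error(oracle)
# the identity e_maj == e_ind - good + bad holds exactly
```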
- Slide 47
- Good and Bad diversity. Note that the amount of diversity is the same (3) in both cases.
- Slide 48
- Ensemble Margin POSITIVE NEGATIVE
- Slide 49
- Ensemble Margin. Average margin. However, nearly all diversity measures are functions of the average absolute margin or the average squared margin, and those have no sign...
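The point about the lost sign can be illustrated with a simple signed voting margin (votes for the true class minus votes for the best other class, divided by L); `ensemble_margin` is a name chosen here:

```python
from collections import Counter

def ensemble_margin(labels, true_label):
    """Signed voting margin; positive means the majority vote is correct."""
    counts = Counter(labels)
    L = len(labels)
    best_other = max((v for k, v in counts.items() if k != true_label), default=0)
    return (counts.get(true_label, 0) - best_other) / L

m_pos = ensemble_margin(["A", "A", "B", "A", "B"], "A")  # 3 vs 2 -> +0.2
m_neg = ensemble_margin(["A", "A", "B", "B", "B"], "A")  # 2 vs 3 -> -0.2
# over these two objects the average margin is 0, but the average
# ABSOLUTE margin is 0.2: |.| and squaring throw the sign away
```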
- Slide 50
- Ensemble Margin
- Slide 51
- The bottom line is: Diversity is not MONOTONICALLY related to
ensemble accuracy So, stop looking for what is not there...
- Slide 52
- Where next in classifier ensembles?
- Slide 53
- Kappa-error diagrams: proposed by Margineantu and Dietterich in 1997; visualise individual accuracy and diversity in a 2-dimensional plot; have been used to decide which ensemble members can be pruned without much harm to the overall performance.
- Slide 54
- Example: sonar data (UCI): 260 instances, 60 features, 2 classes; ensemble size L = 11 classifiers; base model tree C4.5. AdaBoost 75.0%, Bagging 77.0%, Random Subspace 80.9%, Random Oracle 83.3%, Rotation Forest 84.7%. Kuncheva L.I., A bound on kappa-error diagrams for analysis of classifier ensembles, IEEE Transactions on Knowledge and Data Engineering, 2013, 25 (3), 494-501 (DOI: 10.1109/TKDE.2011.234).
- Slide 55
- Kappa-error diagrams. The 2-by-2 table for classifiers C1 and C2 (correct/wrong) has entries a, b, c, d; kappa = (observed agreement − chance agreement) / (1 − chance agreement).
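One point of a kappa-error diagram (kappa for the pair on the x-axis, the pair's average error on the y-axis) can be computed from oracle outputs as follows; `kappa_error_point` is a name chosen here:

```python
def kappa_error_point(o1, o2):
    """Kappa and mean pair error from oracle outputs (1 = correct, 0 = wrong)."""
    n = len(o1)
    a = sum(1 for x, y in zip(o1, o2) if x and y) / n          # both correct
    b = sum(1 for x, y in zip(o1, o2) if x and not y) / n      # only C1 correct
    c = sum(1 for x, y in zip(o1, o2) if not x and y) / n      # only C2 correct
    d = sum(1 for x, y in zip(o1, o2) if not x and not y) / n  # both wrong
    p_obs = a + d                                      # observed agreement
    p_chance = (a + b) * (a + c) + (c + d) * (b + d)   # chance agreement
    kappa = (p_obs - p_chance) / (1 - p_chance)
    avg_error = ((c + d) + (b + d)) / 2                # mean of the two errors
    return kappa, avg_error

# independent-looking pair -> kappa 0; identical pair -> kappa 1
print(kappa_error_point([1, 1, 0, 0], [1, 0, 1, 0]))
```

Low kappa (high diversity) and low error are both desirable, which is why good ensembles sit towards the bottom-left of the diagram.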
- Slide 56
- bound (tight) bound (tight) error kappa Kappa-error
diagrams
- Slide 57
- error kappa Kappa-error diagrams simulated ensembles L = 3
- Slide 58
- error vs kappa. Real data: 77,422,500 pairs of classifiers. Room for improvement.
- Slide 59
- Is there space for new classifier ensembles? Looks like
yes...
- Slide 60
- Strength of classifiers vs number of classifiers L:
  L = 1: the perfect classifier.
  3-8 classifiers: heterogeneous, trained combiner (stacked generalisation). Small ensembles of weak classifiers suffer from INSUFFICIENCY.
  100+ classifiers: same model, non-trained combiner (bagging, boosting, etc.); must engineer diversity. Large ensembles of nearly identical classifiers suffer from REDUNDANCY.
  How about here? 30-50 classifiers: same or different models? trained or non-trained combiner? selection or fusion?
- Slide 61
- MathWorks recommendations: AdaBoost and... wait for it... wait for iiiiit... AdaBoost
- Slide 62
- MathWorks recommendations: ... plus, it is quite expensive
- Slide 63
- One final play instead of conclusions...
- Slide 64
- For the winner, by my favourite illustrator Marcello Barenghi. Well, I'll give you a less crinkled one :)
- Slide 65
- A guessing game. Time for you now... Recall our digit example. Data for this example: a small part of MNIST; decision tree 68.2%. The competitors are: Bagging, AdaBoost, Random Forest, Random Subspace and Rotation Forest, ALL with 10 decision trees. YOUR TASK: rank the competitors and predict the ensemble accuracy for each one. The WINNER will be a correct ranking and predictions within 3% of the true accuracies (MSE for a tie-break). The judge is WEKA.
- Slide 66
- Ensembles of 10 (decision tree alone: 68.2%): 1. Rotation Forest 85.0%; 2. AdaBoost 82.9%; 3. Random Subspace 79.1%; 4. Random Forest 78.7%; 5. Bagging 75.6%.
- Slide 67
- [Same results as Slide 66.] But you know what the funny thing is?...
- Slide 68
- SVM 89.5%; 1-nn 87.4%; Rotation Forest 85.0%; AdaBoost 82.9%; Random Subspace 79.1%; Random Forest 78.7%; Bagging 75.6%; decision tree 68.2%.
- Slide 69
- The moral of the story... 1. There may be a simpler solution. Don't overlook it! 2. The most acclaimed methods are not always the best. Heeeeey, this proves fallibility of my classifier ensemble theory, Marcello Pelillo! (who left already...) :(
- Slide 70
- Everyone, WAKE UP! And thank you for still being here :) 1. Classifier combiners. Nobody talks about this... 2. Time for an example: combiner matters. 3. Diversity. Everybody talks about this... 4. Good diversity and bad diversity. 5. Where next in classifier ensembles? 6. One final play instead of conclusions...