Classifier Ensembles: Facts, Fiction, Faults and Future
Ludmila I Kuncheva
School of Computer Science, Bangor University, Wales, UK
1. Facts
[Diagram: feature values (object description) → classifier → class label]
Classifier ensembles
[Diagram: feature values (object description) → several classifiers → “combiner” → class label]
Classifier ensembles
[Diagram: the same ensemble, where the combiner is itself a classifier, e.g. a neural network]
Classifier ensembles
[Diagram: a single classifier mapping feature values to a class label: is this an ensemble?]
Classifier ensembles
[Diagram: many classifiers feeding a combiner: is the “ensemble” just a fancy combiner?]
Classifier ensembles
[Diagram: is the combiner a classifier? Is the ensemble just a fancy feature extractor followed by a classifier?]
Why classifier ensembles then?
a. because we like to complicate entities beyond necessity (anti-Occam’s razor)
b. because we are lazy and stupid and can’t be bothered to design and train one single sophisticated classifier
c. because democracy is so important to our society, it must be important to classification
Classifier ensembles
Juan: “I just like combining things…”
combination of multiple classifiers [Lam95, Woods97, Xu92, Kittler98]
classifier fusion [Cho95, Gader96, Grabisch92, Keller94, Bloch96]
mixture of experts [Jacobs91, Jacobs95, Jordan95, Nowlan91]
committees of neural networks [Bishop95, Drucker94]
consensus aggregation [Benediktsson92, Ng92, Benediktsson97]
voting pool of classifiers [Battiti94]
dynamic classifier selection [Woods97]
composite classifier systems [Dasarathy78]
classifier ensembles [Drucker94, Filippi94, Sharkey99]
bagging, boosting, arcing, wagging [Sharkey99]
modular systems [Sharkey99]
collective recognition [Rastrigin81, Barabash83]
stacked generalization [Wolpert92]
divide-and-conquer classifiers [Chiang94]
pandemonium system of reflective agents [Smieja96]
change-glasses approach to classifier selection [KunchevaPRL93]
etc.
fanciest … oldest
Classifier ensembles - the oldest:
• “The method of collective recognition”, Moscow, Energoizdat, 1981 (≈ 1 c): classifier ensemble; classifier selection (regions of competence); weighted majority vote
• “Collective statistical decisions in [pattern] recognition”, Moscow, Radio i svyaz’, 1983: weighted majority vote
This superb graph was borrowed from “Fuzzy models and digital signal processing (for pattern recognition): Is this a good marriage?”, Digital Signal Processing, 3, 1993, 253-270, by my good friend Jim Bezdek.
[Figure: the technology hype curve, Expectation vs. time (1965-1993): naive euphoria, peak of hype, overreaction to immature technology, depth of cynicism, true user benefit, asymptote of reality]
So where are we?
[Figure: the same hype curve relabelled for classifier ensembles, 1978-2008, with the five stages numbered 1-5]
To make matters worse...
Expert 1: J. Ghosh
Forum: 3rd International Workshop on Multiple Classifier Systems, 2002 (invited lecture)
Quote: “... our current understanding of ensemble-type multiclassifier systems is now quite mature...”
Expert 2: T.K. Ho
Forum: Invited book chapter, 2002
Quote: “Many of the above questions are there because we do not yet have a scientific understanding of the classifier combination mechanisms”
half full
half empty
[Chart: number of publications per year, 2000-2008 (as of 13 Nov 2008), for the queries 1. Classifier ensembles; 2. AdaBoost – (1); 3. Random Forest – (1) – (2); 4. Decision Templates – (1) – (2) – (3). The count for 2008 is incomplete]
Literature
“One cannot embrace the unembraceable.”
Kozma Prutkov
ICPR 2008: 984 papers, ~2000 words in the titles, first 2 principal components
[Scatter plot of title words; clusters labelled image/segment/feature, video/track/object, feature/local/select, and classifier ensembles]
So where are we?
still here… somewhere…
2. Fiction
Fiction?
Diversity: diverse ensembles are better ensembles? Diversity = independence?
AdaBoost: “the best off-the-shelf classifier”?
Minority Report
- a science fiction short story by Philip K. Dick, first published in 1956. It is about a future society where murders are prevented through the efforts of three mutants (“precogs”) who can see two weeks ahead in the future. The story was made into a popular film in 2002.
Each of the three “precogs” generates its own report or prediction. The three reports are analysed by a computer.
If these reports differ from one another, the computer identifies the two reports with the greatest overlap and produces a "majority report," taking this as the accurate prediction of the future.
But the existence of majority reports implies the existence of a "minority report."
Classifier Ensemble
And, of course, the most interesting case is when the classifiers disagree – the minority report.
Diversity is good
[Diagram: Wrong/Correct patterns of 3 classifiers on 15 objects]
individual accuracy = 10/15 ≈ 0.667
• independent classifiers: ensemble accuracy (majority vote) = 11/15 ≈ 0.733
• identical classifiers: ensemble accuracy (majority vote) = 10/15 ≈ 0.667
• dependent classifiers 1: ensemble accuracy (majority vote) = 7/15 ≈ 0.467
• dependent classifiers 2: ensemble accuracy (majority vote) = 15/15 = 1.000
Myth: Independence is the best scenario. Myth: Diversity is always good.
Summary: identical 0.667; independent 0.733; dependent 1: 0.467 (worse than an individual classifier); dependent 2: 1.000 (better than independence).
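The slide's arithmetic is easy to verify in code. Below is a minimal sketch with illustrative correctness patterns; these are stand-ins, not the slide's exact tables (in fact, with all three classifiers at exactly 10/15, the worst majority vote any pattern can produce is 8/15, so the 7/15 case presumably uses slightly different individual accuracies):

```python
import numpy as np

def majority_vote_accuracy(correct):
    """correct: (n_classifiers, n_objects) 0/1 matrix, where 1 means the
    classifier labels that object correctly. The majority vote is correct
    whenever more than half of the classifiers are."""
    correct = np.asarray(correct)
    return float(np.mean(correct.sum(axis=0) > correct.shape[0] / 2))

n = 15  # objects; every classifier below is correct on 10/15 of them

# identical classifiers: all three correct on the same 10 objects
identical = np.tile(np.r_[np.ones(10), np.zeros(5)], (3, 1))

# helpful dependence: every object is covered by exactly 2 of the 3
# classifiers, so the majority vote is always correct
helpful = np.ones((3, n))
for i in range(3):
    helpful[i, 5 * i:5 * i + 5] = 0

# harmful dependence: correct votes pile up three-at-a-time on 8 objects
# and are wasted as lone minority votes on the rest
harmful = np.zeros((3, n))
harmful[:, :8] = 1
harmful[0, 8:10] = 1
harmful[1, 10:12] = 1
harmful[2, 12:14] = 1

for pattern in (identical, helpful, harmful):
    assert pattern.sum(axis=1).tolist() == [10, 10, 10]  # 10/15 each

print(majority_vote_accuracy(identical))  # 10/15 ≈ 0.667
print(majority_vote_accuracy(helpful))    # 15/15 = 1.0
print(majority_vote_accuracy(harmful))    # 8/15 ≈ 0.533
```

The "helpful" pattern reproduces the best case on the slide (dependent classifiers 2): dependence, not independence, gives the perfect majority vote.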
Example
The set-up:
• UCI data repository
• “heart” data set
• First 9 features; all 280 different partitions into [3, 3, 3]
• Ensemble of 3 linear classifiers
• Majority vote
• 10-fold cross-validation
What we measured:
• Individual accuracies of the ensemble members
• The ensemble accuracy
• The ensemble diversity (just one of all these measures…)
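A minimal sketch of this kind of experiment, assuming scikit-learn (not used in the original work) and a synthetic stand-in for the UCI "heart" data; `subset_classifier` and the particular [3, 3, 3] partition are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Synthetic stand-in for the UCI "heart" data: 9 features, 2 classes.
X, y = make_classification(n_samples=300, n_features=9, n_informative=6,
                           random_state=0)

def subset_classifier(cols):
    """A linear classifier restricted to a 3-feature subset."""
    pick = FunctionTransformer(lambda Z, c=tuple(cols): Z[:, list(c)])
    return make_pipeline(pick, LogisticRegression(max_iter=1000))

# One of the 280 ways to partition 9 features into groups of [3, 3, 3]
# (9! / (3! * 3! * 3! * 3!) = 280, matching the slide).
partition = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

ensemble = VotingClassifier(
    [(f"lc{i}", subset_classifier(g)) for i, g in enumerate(partition)],
    voting="hard")  # hard voting = majority vote

scores = cross_val_score(ensemble, X, y, cv=10)  # 10-fold cross-validation
print(round(scores.mean(), 3))
```

Looping `partition` over all 280 partitions and recording individual accuracies, ensemble accuracy, and a diversity measure for each reproduces the shape of the study.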
Example
280 ensembles
[Scatter plot: ensemble accuracy vs. individual accuracy for the 280 ensembles, showing the minimum, average and maximum individual accuracy; the region above the diagonal is marked “ensemble is better”]
Example
[Scatter plot: ensemble accuracy vs. diversity for the 280 ensembles: the more diverse ensembles are, puzzlingly, less accurate]
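The "diversity" axis needs a concrete measure, and the slides do not say which of the many measures was used. Two common pairwise choices, sketched here for illustration, are Yule's Q statistic and the disagreement measure:

```python
import numpy as np

def q_statistic(c1, c2):
    """Yule's Q over two classifiers' correctness vectors (1 = correct).
    Q ≈ 0 for independent classifiers, Q > 0 when they tend to be correct
    together, Q < 0 when they err on different objects (more diverse)."""
    c1, c2 = np.asarray(c1, bool), np.asarray(c2, bool)
    n11 = np.sum(c1 & c2)    # both correct
    n00 = np.sum(~c1 & ~c2)  # both wrong
    n10 = np.sum(c1 & ~c2)   # only the first correct
    n01 = np.sum(~c1 & c2)   # only the second correct
    return float(n11 * n00 - n01 * n10) / float(n11 * n00 + n01 * n10)

def disagreement(c1, c2):
    """Fraction of objects on which exactly one of the two is correct."""
    c1, c2 = np.asarray(c1, bool), np.asarray(c2, bool)
    return float(np.mean(c1 ^ c2))

print(q_statistic([1, 1, 0, 0], [0, 0, 1, 1]))   # -1.0: maximally diverse pair
print(disagreement([1, 1, 0, 0], [0, 0, 1, 1]))  # 1.0: they always disagree
```

Averaging such a pairwise measure over all classifier pairs gives one diversity value per ensemble, which is what plots like the one above put on the x-axis.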
Example
[3-D plot: ensemble accuracy as a function of individual accuracy and diversity]
Example
[Scatter plot: individual accuracy vs. diversity; large ensemble accuracy is expected where individual accuracy is high and diversity is large]
AdaBoost is everything
[Cartoon: AdaBoost as a Swiss Army Knife with every blade labelled AdaBoost; Bagging as a Russian Army Knife]
Surely, there is more to combining classifiers than Bagging and AdaBoost
“This altogether gives a very bad impression of ill-conceived experiments and confusing and unreliable conclusions. ... The current spotty conclusions are incomprehensible, and are of no generalization or reference value.”
“This is a potentially great new method and any experimental analysis would be very useful for understanding its potential. Good study, with very useful information in the Conclusions.”
Example – Rotation Forest
[Chart: % of data sets (out of 32) where the respective ensemble method (Rotation Forest, Random Forest, Boosting, Bagging) is best, as a function of ensemble size (20-100)]
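A comparison of this shape (counting on how many data sets each method wins) can be sketched as follows. The synthetic data sets here are stand-ins for the 32 benchmark sets, and Rotation Forest itself is omitted because scikit-learn does not provide it; it rotates random feature subsets with PCA before training each tree:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

# A handful of synthetic data sets stands in for the 32 benchmark sets.
datasets = [make_classification(n_samples=300, n_features=10,
                                n_informative=5, random_state=s)
            for s in range(5)]

ensembles = {
    "Bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "Boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Count, per method, the data sets on which it has the best CV accuracy.
wins = dict.fromkeys(ensembles, 0)
for X, y in datasets:
    scores = {name: cross_val_score(clf, X, y, cv=5).mean()
              for name, clf in ensembles.items()}
    wins[max(scores, key=scores.get)] += 1

for name, w in wins.items():
    print(f"{name}: best on {w}/{len(datasets)} data sets")
```

No single method wins everywhere, which is the point of the chart: the win counts shift with the data sets and the ensemble size.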
So, no, AdaBoost is NOT everything
3. Faults
OUR faults!
Complacent: We don’t care about terminology.
Vain: To get publications, we invent complex models for simple problems or, even worse, for complex non-existent problems.
Untidy: There is little effort to systemise the area.
Ignorant and lazy: By virtue of ignorance we tackle problems well and truly solved by others. Krassi’s motto “I don’t have time to read papers because I am busy writing them”.
Haughty: Simple things that work do not impress us until they get proper theoretical proofs.
God, seeing what the people were doing, gave each person a different language
to confuse them and scattered the people throughout the earth…
image taken from http://en.wikipedia.org/wiki/Tower_of_Babel
Terminology
• Pattern recognition land
• Data mining kingdom
• Machine learning ocean
• Statistics underworld and…
• Weka…
The same notions under different names (ML / Stats / Weka):
instance = example = observation = data point = object
attribute = feature = variable
classifier = hypothesis = learner
classifier ensemble = meta learner
SVM = SMO
nearest neighbour = lazy learner
naïve Bayes = AODE
decision tree = C4.5 = J48
Classifier ensembles - names
[The list of names again, with most of them marked as out of fashion or subsumed; the survivors:]
combination of multiple classifiers [Lam95, Woods97, Xu92, Kittler98]
classifier ensembles [Drucker94, Filippi94, Sharkey99]
United terminology! Yey!
MCS – Multiple Classifier Systems Workshops, 2000-2009
Simple things that work…
We detest simple things that work well for an unknown reason!!!
[Cartoon: in the ideal scenario, the flagship of THEORY leads empirics and applications; in reality they are hijacked by HEURISTICS, with real theory left behind]
Lessons from the past: fuzzy sets
• stability of the system? • reliability? • optimality? • why not probability?
Who cares?...
• temperature for washing machine programmes
• automatic focus in digital cameras
• ignition angle of internal combustion in cars
Because it is
• computationally simpler (faster)
• easier to build, interpret and maintain
Learn to trust heuristics and empirics…
4. Future
[Figure: the hype curve once more, Expectation vs. time, 1978-2008, settling on the asymptote of reality]
Future: branch out?
Multiple instance learning
Non i.i.d. examples
Skewed class distributions
Noisy class labels
Sparse data
Non-stationary data
classifier ensembles for changing environments
classifier ensembles for change detection
D.J. Hand, “Classifier Technology and the Illusion of Progress”, Statistical Science, 21(1), 2006, 1-14.
“… I am not suggesting that no major advances in classification methods will ever be made. Such a claim would be absurd in the face of developments such as the bootstrap and other resampling approaches, which have led to significant advances in classification and other statistical models. All I am saying is that much of the purported advance may well be illusory.”...
So have we truly made progress or are we just kidding ourselves?
Empty-y-y…
(not even half empty-y-y-y …)
Bo-o-ori-i-i-ing....