Transcript of: Classifier Ensembles: Facts, Fiction, Faults and Future
Ludmila I Kuncheva, School of Computer Science, Bangor University, Wales, UK
(52 slides)

Page 1: Title slide.

Page 2: Title slide (repeated).

Page 3:

1. Facts

Page 4:

Classifier ensembles

[Diagram: feature values (object description) → classifier → class label]

Page 5:

Classifier ensembles

[Diagram: feature values (object description) → classifier | classifier | classifier → “combiner” → class label]

Page 6:

Classifier ensembles

[Diagram: feature values (object description) → classifier | classifier → combiner → class label, the whole thing labelled “a neural network”]

ensemble?

Page 7:

Classifier ensembles

[Diagram: feature values (object description) → a long row of classifiers → a fancy combiner → class label]

ensemble?

Page 8:

Classifier ensembles

[Diagram: feature values (object description) → classifier | classifier | classifier → combiner → class label; the classifiers seen as “a fancy feature extractor”, the combiner as the classifier]

combiner = classifier?

Page 9:

Why classifier ensembles then?

a. because we like to complicate entities beyond necessity (anti-Occam’s razor)

b. because we are lazy and stupid and can’t be bothered to design and train one single sophisticated classifier

c. because democracy is so important to our society, it must be important to classification

Page 10:

Classifier ensembles

Juan: “I just like to combine things…”

Page 11:

Classifier ensembles

Juan: “I just like combining things…”

Page 12:

combination of multiple classifiers [Lam95, Woods97, Xu92, Kittler98]
classifier fusion [Cho95, Gader96, Grabisch92, Keller94, Bloch96]
mixture of experts [Jacobs91, Jacobs95, Jordan95, Nowlan91]
committees of neural networks [Bishop95, Drucker94]
consensus aggregation [Benediktsson92, Ng92, Benediktsson97]
voting pool of classifiers [Battiti94]
dynamic classifier selection [Woods97]
composite classifier systems [Dasarathy78]
classifier ensembles [Drucker94, Filippi94, Sharkey99]
bagging, boosting, arcing, wagging [Sharkey99]
modular systems [Sharkey99]
collective recognition [Rastrigin81, Barabash83]
stacked generalization [Wolpert92]
divide-and-conquer classifiers [Chiang94]
pandemonium system of reflective agents [Smieja96]
change-glasses approach to classifier selection [KunchevaPRL93]
etc.

Classifier ensembles

[Annotations on the list mark the “fanciest” entry and the “oldest” entries – by citation year, composite classifier systems [Dasarathy78] and collective recognition [Rastrigin81].]

Page 13:

The method of collective recognition
Moscow, Energoizdat, 1981
≈ 1 c

Page 14:

classifier ensemble
classifier selection (regions of competence) – see the sketch below this list
weighted majority vote
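These ideas map directly onto today’s vocabulary. As a quick illustration of “classifier selection with regions of competence” in modern terms – a minimal sketch, assuming k-means cells as the regions and a small heterogeneous pool; the data, the pool and the region-finding are all stand-ins, not the 1981 formulation:

```python
# Minimal sketch of classifier selection with regions of competence.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# A small pool of heterogeneous classifiers trained on the same data.
pool = [LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
        GaussianNB().fit(X_tr, y_tr),
        DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_tr, y_tr)]

# Regions of competence: carve the feature space into k-means cells.
km = KMeans(n_clusters=5, n_init=10, random_state=1).fit(X_tr)

# In each region, nominate the classifier with the best local accuracy.
best = [int(np.argmax([c.score(X_tr[km.labels_ == r], y_tr[km.labels_ == r])
                       for c in pool])) for r in range(5)]

# A test point is classified by the champion of its region alone.
y_pred = np.array([pool[best[r]].predict(x[None, :])[0]
                   for r, x in zip(km.predict(X_te), X_te)])
print("region-wise selection accuracy: %.3f" % (y_pred == y_te).mean())
```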

Page 15:

Collective statistical decisions in [pattern] recognition

Moscow, Radio i Svyaz’, 1983

Page 16:

weighted majority vote
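The formulas from this slide are not preserved in the transcript. For reference, the classical result for the weighted majority vote of independent classifiers (a standard textbook derivation, not necessarily the one shown on the slide): for L independent classifiers with individual accuracies p_i, the ensemble accuracy is maximized by weights

\[
  w_i \;\propto\; \log \frac{p_i}{1 - p_i}, \qquad i = 1, \dots, L ,
\]

i.e., each member votes with the log-odds of being correct; a classifier at chance level (p_i = 0.5) gets zero weight.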

Page 17:

This superb graph was borrowed from “Fuzzy models and digital signal processing (for pattern recognition): Is this a good marriage?”, Digital Signal Processing, 3, 1993, 253-270, by my good friend Jim Bezdek.

[Figure: Jim Bezdek’s hype curve – Expectation (y-axis) over the years 1965-1993 (x-axis), through: Naive euphoria → Peak of hype → Overreaction to immature technology → Depth of cynicism → True user benefit → Asymptote of reality]

Page 18:

[The same hype curve as on Page 17]

So where are we?

Page 19:

[The same hype curve, x-axis relabelled: 1978 at the start and 2008 at five candidate positions along the curve, marked 1-5]

So where are we?

Page 20:

To make matters worse...

Expert 1: J. Ghosh
Forum: 3rd International Workshop on Multiple Classifier Systems, 2002 (invited lecture)
Quote: “... our current understanding of ensemble-type multiclassifier systems is now quite mature...”

Expert 2: T.K. Ho
Forum: Invited book chapter, 2002
Quote: “Many of the above questions are there because we do not yet have a scientific understanding of the classifier combination mechanisms”

(the glass: half full… or half empty?)

Page 21:

[Graph: number of publications per year, 2000-2008 (y-axis 0-300), for:
1. Classifier ensembles
2. AdaBoost – (1)
3. Random Forest – (1) – (2)
4. Decision Templates – (1) – (2) – (3)
Number of publications (13 Nov 2008); incomplete for 2008]

Page 22:

[The same graph with the y-axis extended to 500, for:
1. Classifier ensembles
2. AdaBoost – (1)
3. Random Forest – (1) – (2)
4. Decision Templates – (1) – (2) – (3)
Number of publications (13 Nov 2008); incomplete for 2008]

Literature

“One cannot embrace the unembraceable.”

Kozma Prutkov

Page 23:

ICPR 2008: 984 papers, ~2000 words in the titles, first 2 principal components.

[Word map with clusters: image/segment/feature; video/track/object; feature/local/select; classifier ensembles]

Page 24:

[The same hype curve with the five candidate 2008 positions marked 1-5]

So where are we?

still here… somewhere…

Page 25:

2. Fiction

Page 26:

Fiction?

Diversity. Diverse ensembles are better ensembles? Diversity = independence?

AdaBoost. “The best off-the-shelf classifier”?

Page 27:

Minority Report – a science fiction short story by Philip K. Dick, first published in 1956. It is about a future society where murders are prevented through the efforts of three mutants (“precogs”) who can see two weeks ahead into the future. The story was made into a popular film in 2002.

Each of the three “precogs” generates its own report or prediction. The three reports are analysed by a computer. If these reports differ from one another, the computer identifies the two reports with the greatest overlap and produces a “majority report”, taking this as the accurate prediction of the future.

But the existence of majority reports implies the existence of a “minority report”.

A classifier ensemble!

Page 28:

And, of course, the most interesting case is when the classifiers disagree – the minority report.

Diversity is good

[Figure: the correct/wrong votes of 3 classifiers on 15 objects]

individual accuracy = 10/15 ≈ 0.667

independent classifiers: ensemble accuracy (majority vote) = 11/15 ≈ 0.733

identical classifiers: ensemble accuracy (majority vote) = 10/15 ≈ 0.667

Page 29:

dependent classifiers 1: ensemble accuracy (majority vote) = 7/15 ≈ 0.467

dependent classifiers 2: ensemble accuracy (majority vote) = 15/15 = 1.000

Myth: Independence is the best scenario.
Myth: Diversity is always good.

identical      0.667
independent    0.733
dependent 1    0.467   (worse than individual)
dependent 2    1.000   (better than independence)
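The exact vote patterns drawn on these two slides are not in the transcript, so the sketch below builds illustrative patterns of the same kind and recomputes the accuracies. One arithmetical caveat: with all three individual accuracies pinned at exactly 10/15, the 15 objects receive 30 correct votes in total and each majority-correct object absorbs at most 3 of them while each majority-wrong object absorbs at most 1, so the majority vote cannot drop below 8/15; the slide’s 7/15 presumably lets the individual accuracies vary slightly. The “harmful” pattern below shows the effect at 8/15.

```python
# Majority-vote arithmetic for 3 classifiers over 15 objects.
# Rows are classifiers, columns objects; 1 = correct, 0 = wrong.
# Illustrative patterns, not the exact ones pictured on the slides.
import numpy as np

identical = np.array([[1] * 10 + [0] * 5] * 3)

# Errors spread roughly as independence would: majority = 11/15.
near_independent = np.array(
    [[1,1,1,1,1, 1,1,1,1,0,0, 1,0,0, 0],
     [1,1,1,1,1, 1,1,0,0,1,1, 0,1,0, 0],
     [1,1,1,1,1, 0,0,1,1,1,1, 0,0,1, 0]])

# Harmful dependence: wasted overlaps; majority = 8/15, the worst
# possible when each row has exactly 10 ones.
harmful = np.array(
    [[1,1,1,1,1,1,1, 1, 1,1, 0,0, 0,0,0],
     [1,1,1,1,1,1,1, 1, 0,0, 1,1, 0,0,0],
     [1,1,1,1,1,1,1, 0, 0,0, 0,0, 1,1,1]])

# Each classifier wrong on a different third: every object keeps 2 of
# 3 votes, so the majority vote is perfect ("dependent classifiers 2").
complementary = np.array([[0, 1, 1] * 5,
                          [1, 0, 1] * 5,
                          [1, 1, 0] * 5])

for name, m in [("identical", identical),
                ("near-independent", near_independent),
                ("harmful dependence", harmful),
                ("complementary", complementary)]:
    maj = (m.sum(axis=0) >= 2).mean()   # majority-vote accuracy
    print(f"{name:20s} individual={m.mean(axis=1).round(3)}  majority={maj:.3f}")
```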

Page 30:

Example

The set-up:
• UCI data repository, “heart” data set
• First 9 features; all 280 different partitions into [3, 3, 3]
• Ensemble of 3 linear classifiers
• Majority vote
• 10-fold cross-validation

What we measured:
• Individual accuracies of the ensemble members
• The ensemble accuracy
• The ensemble diversity (just one of the many diversity measures…)
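A hedged sketch of this set-up, for concreteness. The “heart” data are not bundled with scikit-learn, so a stand-in dataset is used (swap in the real data to reproduce the study); taking LDA as the linear classifier and average pairwise disagreement as the diversity measure are both assumptions on my part.

```python
# Sketch of the 280-partition experiment. Assumptions: LDA as the
# linear classifier, pairwise disagreement as the diversity measure,
# and a stand-in dataset in place of the UCI "heart" data.
from itertools import combinations
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict

X, y = load_breast_cancer(return_X_y=True)
X = X[:, :9]                                   # first 9 features

def partitions_3_3_3(items):
    """All 280 unordered partitions of 9 items into three groups of 3."""
    first, rest = items[0], items[1:]
    for g1_tail in combinations(rest, 2):
        g1 = (first,) + g1_tail
        r2 = [i for i in items if i not in g1]
        for g2_tail in combinations(r2[1:], 2):
            g2 = (r2[0],) + g2_tail
            yield g1, g2, tuple(i for i in r2 if i not in g2)

results = []
for groups in partitions_3_3_3(list(range(9))):
    # One linear classifier per 3-feature group; labels via 10-fold CV.
    preds = np.array([cross_val_predict(LinearDiscriminantAnalysis(),
                                        X[:, list(g)], y, cv=10)
                      for g in groups])
    correct = preds == y                       # (3, n) correctness matrix
    maj = (preds.sum(axis=0) >= 2).astype(int) # majority vote (0/1 labels)
    ensemble = (maj == y).mean()
    diversity = np.mean([(correct[i] != correct[k]).mean()
                         for i, k in combinations(range(3), 2)])
    results.append((correct.mean(), ensemble, diversity))

r = np.array(results)
print("mean individual=%.3f  mean ensemble=%.3f  corr(div, ens)=%.3f"
      % (r[:, 0].mean(), r[:, 1].mean(), np.corrcoef(r[:, 2], r[:, 1])[0, 1]))
```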

Page 31:

Example: 280 ensembles

[Scatter plot: Ensemble accuracy (y-axis, 0.4-0.9) vs Individual accuracy (x-axis, 0.4-0.9); the region above the diagonal is where the ensemble is better; separate markers show the minimum, average and maximum individual accuracy]

Page 32:

Example

[Scatter plot: Ensemble accuracy (y-axis, 0.72-0.80) vs diversity (x-axis, 0-0.7) – more diverse, yet less accurate?]

Page 33:

Example

[3D plot: Ensemble accuracy (0.72-0.82) against Individual accuracy (0.66-0.74) and Diversity (0-0.8)]

Page 34:

Example

[Scatter plot: Individual accuracy (y-axis, 0.66-0.73) vs diversity (x-axis, 0-0.7); large ensemble accuracy is expected where both diversity and individual accuracy are large]

Page 35:

AdaBoost is everything

[Cartoon: “AdaBoost” repeated over and over, with a lone “Bagging”; a Swiss Army Knife next to a “Russian Army Knife”]

Surely, there is more to combining classifiers than Bagging and AdaBoost.

Page 36:

Example – Rotation Forest

Reviewer 1: “This altogether gives a very bad impression of ill-conceived experiments and confusing and unreliable conclusions. ... The current spotty conclusions are incomprehensible, and are of no generalization or reference value.”

Reviewer 2: “This is a potentially great new method and any experimental analysis would be very useful for understanding its potential. Good study, with very useful information in the Conclusions.”

Page 37:

[Line chart: % of data sets (out of 32) where the respective ensemble method is best (y-axis, 0-100) vs ensemble size (x-axis, 20-100), for Rotation Forest, Random Forest, Boosting and Bagging]
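For the curious, a rough way to run this kind of comparison today. Hedged: Rotation Forest has no scikit-learn implementation (the paper’s method would have to come from elsewhere), so only the other three methods appear, and a single stand-in dataset replaces the 32 of the study.

```python
# Sketch of a Bagging / AdaBoost / Random Forest comparison across
# ensemble sizes. One stand-in dataset; the study used 32.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for size in (10, 50, 100):
    methods = {
        "Bagging": BaggingClassifier(DecisionTreeClassifier(),
                                     n_estimators=size, random_state=0),
        "AdaBoost": AdaBoostClassifier(n_estimators=size, random_state=0),
        "RandomForest": RandomForestClassifier(n_estimators=size,
                                               random_state=0),
    }
    row = {name: cross_val_score(clf, X, y, cv=10).mean()
           for name, clf in methods.items()}
    print(size, {k: round(v, 3) for k, v in row.items()})
```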

Page 38:

So, no, AdaBoost is NOT everything.

Page 39:

3. Faults

Page 40:

OUR faults!

Complacent: We don’t care about terminology.

Vain: To get publications, we invent complex models for simple problems or, even worse, for complex non-existent problems.

Untidy: There is little effort to systemise the area.

Ignorant and lazy: By virtue of ignorance we tackle problems well and truly solved by others. Krassi’s motto: “I don’t have time to read papers because I am busy writing them.”

Haughty: Simple things that work do not impress us until they get proper theoretical proofs.

Page 41:

God, seeing what the people were doing, gave each person a different language to confuse them, and scattered the people throughout the earth…

(image taken from http://en.wikipedia.org/wiki/Tower_of_Babel)

Terminology
• Pattern recognition land
• Data mining kingdom
• Machine learning ocean
• Statistics underworld and…
• Weka…

Page 42:

[Word cloud of competing terms: instance, example, observation, data point, object, attribute, feature, variable, classifier, hypothesis, SVM, SMO, nearest neighbour, lazy learner, classifier ensemble, learner, naïve Bayes, AODE, decision tree, C4.5, J48]

Page 43:

The same concepts across ML, Stats and Weka:
• instance = example = observation = data point = object
• attribute = feature = variable
• classifier = hypothesis = learner
• classifier ensemble = meta learner
• SVM = SMO
• nearest neighbour = lazy learner
• naïve Bayes = AODE
• decision tree = C4.5 = J48

Page 44:

Classifier ensembles – names

combination of multiple classifiers [Lam95, Woods97, Xu92, Kittler98]
classifier fusion [Cho95, Gader96, Grabisch92, Keller94, Bloch96]
mixture of experts [Jacobs91, Jacobs95, Jordan95, Nowlan91]
committees of neural networks [Bishop95, Drucker94]
consensus aggregation [Benediktsson92, Ng92, Benediktsson97]
voting pool of classifiers [Battiti94]
dynamic classifier selection [Woods97]
composite classifier systems [Dasarathy78]
classifier ensembles [Drucker94, Filippi94, Sharkey99]
bagging, boosting, arcing, wagging [Sharkey99]
modular systems [Sharkey99]
collective recognition [Rastrigin81, Barabash83]
stacked generalization [Wolpert92]
divide-and-conquer classifiers [Chiang94]
pandemonium system of reflective agents [Smieja96]
change-glasses approach to classifier selection [KunchevaPRL93]
etc.

[Annotations on the list: several of these names are now out of fashion; others have been subsumed.]

Page 45:

combination of multiple classifiers [Lam95,Woods97,Xu92,Kittler98]

classifier ensembles [Drucker94,Filippi94,Sharkey99]

United terminology! Yey!

MCS – Multiple Classifier Systems Workshops 2000-2009

Page 46:

Simple things that work…

We detest simple things that work well for an unknown reason!!!

Page 47:

Ideal scenario…
[Cartoon: the flagship of THEORY leading empirics and applications]

Real…
[Cartoon: hijacked by heuristics – HEURISTICS leads, theory trails behind]

Simple things that work…

We detest simple things that work well for an unknown reason!!!

Page 48:

Lessons from the past:

Fuzzy sets
• stability of the system?
• reliability?
• optimality?
• why not probability?

Who cares?...
• temperature for washing machine programmes
• automatic focus in digital cameras
• ignition angle of internal combustion engines in cars

Because it is
• computationally simpler (faster)
• easier to build, interpret and maintain

Learn to trust heuristics and empirics…

Page 49:

4. Future

Page 50:

[The hype curve once more: Expectation, 1978 → 2008, now at the asymptote of reality]

Future: branch out?
• Multiple instance learning
• Non-i.i.d. examples
• Skewed class distributions
• Noisy class labels
• Sparse data
• Non-stationary data → classifier ensembles for changing environments; classifier ensembles for change detection

Page 51:

D.J. Hand, “Classifier Technology and the Illusion of Progress”, Statistical Science 21(1), 2006, 1-14.

“… I am not suggesting that no major advances in classification methods will ever be made. Such a claim would be absurd in the face of developments such as the bootstrap and other resampling approaches, which have led to significant advances in classification and other statistical models. All I am saying is that much of the purported advance may well be illusory.”...

So have we truly made progress or are we just kidding ourselves?

Empty-y-y…

(not even half empty-y-y-y …)

Page 52:

Bo-o-ori-i-i-ing....