UKKDD 2007 Niall Talk

Transcript of UKKDD 2007 Niall Talk

Slide 1: Stacking for supervised learning

Niall Rooney, NIKEL, University of Ulster

Slide 2: Ensemble learning

• Postulate multiple hypotheses to explain the data
• Shortcomings of single-model learning algorithms (Dietterich, 2002):
  - Statistical problem
  - Computational problem
  - Representational problem

Slide 3: Ensemble learning

• Generalization error: bias + variance (the usual squared-error decomposition is written out below)
  - Bias: how close the algorithm's average prediction is to the target
  - Variance: how much the algorithm's predictions bounce around for different training sets
  - A model which is too simple, or too inflexible, will have a large bias
  - A model which has too much flexibility will have high variance
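As a point of reference (the decomposition is implied but not written out on the slide, and the irreducible noise term is omitted here), the squared-error form is:

\[
\mathbb{E}_{D}\big[(f_D(\mathbf{x}) - y)^2\big]
  \;=\; \underbrace{\big(\mathbb{E}_{D}[f_D(\mathbf{x})] - y\big)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}_{D}\big[\big(f_D(\mathbf{x}) - \mathbb{E}_{D}[f_D(\mathbf{x})]\big)^2\big]}_{\text{variance}}
\]

where $f_D$ is the model fitted to a particular training set $D$ and the expectation is taken over training sets.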

Slide 4: Ensemble learning

• Generalization error: ensembles reduce bias and/or variance
• To be effective, ensembles need diverse and accurate base models
• Diversity is measured by the level of variability in the base members' predictions (for regression); a minimal sketch follows
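One way to read that diversity measure is as the average squared spread of the base members' predictions around the ensemble mean. A minimal sketch (the prediction values and the array layout are made up for illustration):

import numpy as np

# Assumed layout: one row per base model, one column per test instance
preds = np.array([
    [2.1, 3.0, 4.8],   # model 1
    [1.9, 3.4, 5.2],   # model 2
    [2.3, 2.8, 5.0],   # model 3
])

ensemble_mean = preds.mean(axis=0)                  # averaged ensemble prediction per instance
diversity = ((preds - ensemble_mean) ** 2).mean()   # variability of the members around that mean

print("ensemble mean:", ensemble_mean)
print("diversity:", round(float(diversity), 3))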

Slide 5: Ensemble learning

• Homogeneous learning
  - One learning algorithm; diversity from data sampling, feature sampling, randomization, parameter settings
• Heterogeneous learning
  - Same data, different learning algorithms

(Both set-ups are sketched below.)
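A minimal scikit-learn sketch of the two set-ups (scikit-learn, the chosen learners, and the synthetic data are illustrative assumptions, not taken from the slides): a homogeneous bagged ensemble of trees, and a heterogeneous ensemble of different learners trained on the same data.

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Homogeneous ensemble: one algorithm (trees), diversity from bootstrap data sampling
homogeneous = BaggingRegressor(n_estimators=25, random_state=0).fit(X, y)

# Heterogeneous ensemble: same data, different learning algorithms
heterogeneous = VotingRegressor([
    ("lr", LinearRegression()),
    ("knn", KNeighborsRegressor()),
    ("tree", DecisionTreeRegressor(random_state=0)),
]).fit(X, y)

print(homogeneous.predict(X[:3]))
print(heterogeneous.predict(X[:3]))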

Slide 6: Ensemble Learning

[Diagram: the input features are fed to Classifier 1 ... Classifier N in parallel; their class predictions go to a combiner, which outputs a single class prediction.]

Slide 7: Ensemble learning

• Methods of combination: voting, weighting, selection (sketched below)
• Mixture of experts
• Error-correcting output codes
• Bagging
• Boosting
• Stacking
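A minimal sketch of the first three combination rules on made-up class predictions and validation accuracies (all the numbers are illustrative only):

import numpy as np

# Toy class predictions from three classifiers for four instances
preds = np.array([
    [0, 1, 1, 0],   # classifier 1
    [0, 1, 0, 0],   # classifier 2
    [1, 1, 1, 0],   # classifier 3
])
accuracies = np.array([0.70, 0.80, 0.60])  # assumed validation accuracies

# Voting: pick the class predicted by the most classifiers
votes = np.apply_along_axis(lambda col: np.bincount(col, minlength=2).argmax(), 0, preds)

# Weighting: weight each classifier's vote by its accuracy, then pick the heavier class
weighted = np.zeros((2, preds.shape[1]))
for model_preds, w in zip(preds, accuracies):
    for i, c in enumerate(model_preds):
        weighted[c, i] += w
weighted_vote = weighted.argmax(axis=0)

# Selection: simply use the single most accurate classifier
selected = preds[accuracies.argmax()]

print(votes, weighted_vote, selected)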

Slide 8: Ensemble Learning: Stacking

[Diagram: an instance is fed to Base Model 1 ... Base Model n; their outputs feed a meta model, which produces the final prediction.]

Slide 9: Meta Technique: SR

• Base models $f_1, \ldots, f_m$, produced by learners $M_1, \ldots, M_m$
• CV meta-training set: $\{(f_1(\mathbf{x}_j), \ldots, f_m(\mathbf{x}_j),\, y_j)\}$
• For a new instance $\mathbf{x}^*$, the base predictions $f_1(\mathbf{x}^*), \ldots, f_m(\mathbf{x}^*)$ are passed to the combining (meta-level) model Meta-M
• Final prediction: $\text{Meta-M}(f_1(\mathbf{x}^*), \ldots, f_m(\mathbf{x}^*))$ (a code sketch follows)
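A minimal sketch of this scheme in a regression setting (assuming SR denotes stacked regression; the data set, base learners, and scikit-learn usage are illustrative choices, not taken from the slides): out-of-fold (CV) base predictions form the meta-training set, and the meta-model is fitted on them.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

base_models = [KNeighborsRegressor(), DecisionTreeRegressor(random_state=0), Ridge()]

# CV meta-training set: each column holds one base model's out-of-fold predictions f_i(x_j)
meta_X = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_models])

# Fit the base models on all data and the meta-model (Meta-M) on the CV predictions
for m in base_models:
    m.fit(X, y)
meta_model = LinearRegression().fit(meta_X, y)

# Final prediction for new instances: Meta-M(f_1(x*), ..., f_m(x*))
new_meta = np.column_stack([m.predict(X[:5]) for m in base_models])
print(meta_model.predict(new_meta))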

Slide 10: Stacking for classification

• Use class distributions from the base classifiers rather than class predictions
• Meta-training set: $\{(P_1(C_1 \mid \mathbf{x}), \ldots, P_1(C_k \mid \mathbf{x}), \ldots, P_m(C_1 \mid \mathbf{x}), \ldots, P_m(C_k \mid \mathbf{x}),\, y)\}$
• Choice of meta-classifier: multi-response linear regression (sketched below)
  - For a classification problem with k class values, k regression problems
  - Only use the probabilities related to class $C_j$ to predict class $C_j$
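A minimal sketch of stacking on class distributions with a multi-response linear regression meta-learner (scikit-learn, the chosen base classifiers, and the toy data are assumptions for illustration); note how each per-class regression only sees the probabilities for its own class:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_classes=3, n_informative=5, random_state=0)
classes = np.unique(y)

base = [GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]

# Meta-features: out-of-fold class distributions P_i(C_j | x) from each base classifier
probs = [cross_val_predict(c, X, y, cv=5, method="predict_proba") for c in base]  # each (n, k)

# Multi-response linear regression: one regression problem per class value C_j,
# using only the probabilities that refer to class C_j (one column per base classifier)
mlr = []
for j, c in enumerate(classes):
    meta_X_j = np.column_stack([p[:, j] for p in probs])
    mlr.append(LinearRegression().fit(meta_X_j, (y == c).astype(float)))

# Predict: the class whose regression model gives the highest response
for clf in base:
    clf.fit(X, y)
test_probs = [clf.predict_proba(X[:5]) for clf in base]
scores = np.column_stack([
    m.predict(np.column_stack([p[:, j] for p in test_probs])) for j, m in enumerate(mlr)
])
print(classes[scores.argmax(axis=1)])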

Slide 11: Stacking for classification

• Different types of base classifiers
• Multi-response model trees used to guarantee better performance than selecting the best classifier

Slide 12: Stacking for regression

• Linear regression requires non-negative weights (a sketch follows)
• Model trees meta-learner
• Homogeneous stacking using random feature sub-sets
• Feature sub-sets can be improved upon using hill-climbing or GA techniques
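A minimal sketch of the non-negative weights constraint (the use of SciPy's non-negative least squares and the toy data are illustrative assumptions): the combination weights over the out-of-fold base predictions are forced to be >= 0.

import numpy as np
from scipy.optimize import nnls
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
base = [KNeighborsRegressor(), DecisionTreeRegressor(random_state=0), Ridge()]

# Out-of-fold base predictions form the meta-level training set
meta_X = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base])

# Non-negative least squares gives the constrained combination weights
weights, _ = nnls(meta_X, y)
print("non-negative stacking weights:", weights)

# Final prediction: non-negatively weighted sum of the base-model predictions
for m in base:
    m.fit(X, y)
pred = np.column_stack([m.predict(X[:5]) for m in base]) @ weights
print(pred)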

Slide 13: Related techniques: Multiple meta-levels

• Cascade Generalization
[Diagram: Classifier 1, Classifier 2 and Classifier 3 arranged in a cascade, each level feeding the next.]

Slide 14: Related techniques: Multiple meta-levels

• Combiner Trees
• Disjoint training sets
[Diagram: Classifiers 1-4, trained on disjoint training sets (T1, T2, ...), feed Combiner 1 and Combiner 2, which in turn feed Combiner 3 in a tree of combiners.]

Slide 15: Related Techniques: Dynamic Integration

• Base models $f_1, \ldots, f_m$, produced by learners $M_1, \ldots, M_m$
• Meta-level training set records each base model's error on each training instance: $\{(\mathbf{x}_j, Err_1(\mathbf{x}_j), \ldots, Err_m(\mathbf{x}_j),\, y_j)\}$, where $Err_i(\mathbf{x}_j) = |f_i(\mathbf{x}_j) - y_j|$ (a sketch of building this table follows)
• For a new instance $\mathbf{x}^*$, the combining (meta-level) model Meta-M uses the stored base errors together with the base predictions $f_1(\mathbf{x}^*), \ldots, f_m(\mathbf{x}^*)$
• Final prediction: $\text{Meta-M}(f_1(\mathbf{x}^*), \ldots, f_m(\mathbf{x}^*))$
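A minimal sketch of building that error table (the prediction values, targets, and array layout are made up; out-of-fold predictions are assumed, as in the stacking slides):

import numpy as np

# Assumed inputs: oof_preds[i, j] = out-of-fold prediction f_i(x_j), y[j] = target for x_j
oof_preds = np.array([
    [2.0, 3.5, 1.0],   # model 1
    [2.4, 2.9, 1.4],   # model 2
    [1.8, 3.2, 0.7],   # model 3
])
y = np.array([2.2, 3.0, 1.1])

# Meta-level error table: Err_i(x_j) = |f_i(x_j) - y_j|, one row per instance, one column per model
meta_errors = np.abs(oof_preds - y).T
print(meta_errors)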

Slide 16: Dynamic Integration

• Meta model Meta-M: distance-weighted k-NN
• NN: the set of the k nearest meta-instances to the new instance
• Over the members of NN, find the cumulative error of each model (a sketch follows)
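A minimal sketch of this step (the function name, the toy meta-level data, and the simple 1/distance weighting are illustrative assumptions): for a new instance, the errors stored with the k nearest meta-instances are accumulated, weighted by closeness, for each base model.

import numpy as np

def cumulative_errors(x_star, meta_X, meta_errors, k=3):
    """Distance-weighted cumulative error of each base model over the k nearest
    meta-instances to x_star; meta_errors has one column per base model."""
    dists = np.linalg.norm(meta_X - x_star, axis=1)
    nn = np.argsort(dists)[:k]                       # indices of the k nearest meta-instances
    weights = 1.0 / (dists[nn] + 1e-12)              # closer neighbours count more
    return (weights[:, None] * meta_errors[nn]).sum(axis=0)

# Toy meta-level training set: 5 stored instances, 3 base models
meta_X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])
meta_errors = np.array([[0.1, 0.5, 0.3],
                        [0.2, 0.4, 0.1],
                        [0.3, 0.2, 0.6],
                        [0.9, 0.1, 0.2],
                        [0.8, 0.3, 0.4]])

print(cumulative_errors(np.array([0.2, 0.1]), meta_X, meta_errors))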

Slide 17: Dynamic Integration

• Dynamic Selection (DS): choose the model with the lowest cumulative error
• Dynamic Weighting (DW): combine the models with weights based on their cumulative error
• Dynamic Weighting with Selection (DWS): combine the models as in DW, but exclude models whose cumulative error is larger than the median (all three variants are sketched below)
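A minimal sketch of the three variants (the inverse-error weighting rule and the toy numbers are illustrative assumptions; the slide only says the weights are based on the cumulative error):

import numpy as np

def dynamic_combine(base_preds, cum_errors, strategy="DWS"):
    """Combine base-model predictions for one instance, given each model's
    cumulative error over the local neighbourhood, following DS / DW / DWS."""
    base_preds = np.asarray(base_preds, dtype=float)
    cum_errors = np.asarray(cum_errors, dtype=float)

    if strategy == "DS":                       # pick the locally best model
        return base_preds[cum_errors.argmin()]

    weights = 1.0 / (cum_errors + 1e-12)       # lower error -> higher weight
    if strategy == "DWS":                      # drop models worse than the median error
        weights[cum_errors > np.median(cum_errors)] = 0.0
    return np.average(base_preds, weights=weights)

preds = [2.4, 3.1, 2.0]          # toy base predictions f_1(x*), f_2(x*), f_3(x*)
errors = [0.9, 0.2, 0.5]         # toy cumulative errors from the k-NN neighbourhood
for s in ("DS", "DW", "DWS"):
    print(s, dynamic_combine(preds, errors, s))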

Slide 18: Applications

• Distributed data mining
• Intrusion detection
• Concept drift

Slide 19: Key papers

• Wolpert, D. H.: Stacked Generalization. Neural Networks, 5 (1992) 241-259
• Breiman, L.: Stacked Regressions. Machine Learning, 24 (1996) 49-64
• Dietterich, T. G.: Ensemble Methods in Machine Learning. Lecture Notes in Computer Science, 1857 (2000) 1-15
• Dzeroski, S., & Zenko, B.: Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning, 54 (2004) 255-273
• Ting, K. M., & Witten, I. H.: Issues in Stacked Generalization. Journal of Artificial Intelligence Research, 10 (1999) 271-289