Feature and Variable Selection in Classification
Feature and Variable Selection in Classification
Aaron Karper
University of Bern
Aaron Karper (UniBe) Feature selection 1 / 12
Why?
Why not use all the features?
Overfitting
Interpretability
Computational complexity
[Figure: training error and test error as a function of model complexity]
What are the options?
Ranking
Measure relevance for each feature separately.
The good:
Fast
The bad:
XOR problem: features that are informative only jointly look irrelevant individually, so per-feature ranking discards them.
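To make the XOR failure concrete, here is a small numpy sketch (not from the slides; the data is synthetic): each feature scores near zero under a per-feature relevance measure, yet the pair determines the label exactly.

```python
import numpy as np

# Synthetic data: two binary features whose XOR is the label.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 2))
y = X[:, 0] ^ X[:, 1]

# Ranking scores each feature separately, e.g. by |correlation| with y.
scores = [abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(2)]
print(scores)  # both near 0 -> ranking would discard both features

# Yet the pair predicts y perfectly: learn a lookup table over both features.
table = {(a, b): round(np.mean(y[(X[:, 0] == a) & (X[:, 1] == b)]))
         for a in (0, 1) for b in (0, 1)}
pred = np.array([table[(a, b)] for a, b in X])
print((pred == y).mean())  # 1.0
```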
What are the options?
XOR problem
What are the options?
Filters
Walk in feature-subset space, evaluating a cheap proxy measure instead of training the classifier.
The good:
Flexibility
The bad:
Suboptimal performance
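A filter can be sketched as greedy forward selection where each candidate subset is scored by a proxy instead of a trained classifier. The following numpy sketch (my own illustration, not from the slides) uses the R² of a least-squares fit as the proxy; the data is made up so that only features 0 and 1 carry signal.

```python
import numpy as np

def subset_score(X, y):
    """Proxy measure for a feature subset: R^2 of a least-squares fit.
    Cheap to compute -- no classifier is trained."""
    Xb = np.column_stack([X, np.ones(len(y))])   # add an intercept column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ w
    return 1 - resid.var() / y.var()

def forward_filter(X, y, k):
    """Walk in feature-subset space: greedily grow the subset,
    evaluating the proxy measure for each candidate extension."""
    selected = []
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        best = max(rest, key=lambda j: subset_score(X[:, selected + [j]], y))
        selected.append(best)
    return selected

# Toy data (hypothetical): the label depends on features 0 and 1 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
sel = forward_filter(X, y, k=2)
print(sel)  # the informative pair, in some order
```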
What are the options?
Wrappers
Walk in feature-subset space, training the classifier on every candidate subset.
The good:
Accuracy
The bad:
Slow training
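A wrapper has the same outer loop as a filter, but the score of a subset is the held-out accuracy of an actually trained classifier, which is why wrappers are accurate but slow. A minimal numpy sketch (my illustration, with a nearest-centroid classifier and made-up data in which only features 0 and 1 separate the classes):

```python
import numpy as np

def holdout_accuracy(X, y, split=350):
    """Train a minimal classifier (nearest class centroid) on one part of the
    data and score it on the rest -- the expensive inner loop of a wrapper."""
    Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]
    centroids = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    dists = ((Xte[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return (dists.argmin(axis=1) == yte).mean()

def forward_wrapper(X, y, k):
    """Walk in feature-subset space, training and evaluating the classifier
    for every candidate subset."""
    selected = []
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        best = max(rest, key=lambda j: holdout_accuracy(X[:, selected + [j]], y))
        selected.append(best)
    return selected

# Toy data (hypothetical): classes are shifted apart along features 0 and 1.
rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 5))
X[:, 0] += 1.5 * y
X[:, 1] += 1.5 * y
sel = forward_wrapper(X, y, k=2)
print(sel)
```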
What are the options?
Embedded methods
Integrate feature selection into classifier.
The good:
Accuracy, training time
The bad:
Lacks flexibility
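A classic embedded method, not named on the slide but a standard example, is L1-regularized regression (the lasso): selection happens inside training, because irrelevant weights are driven exactly to zero. A numpy sketch using iterative soft-thresholding (ISTA), with made-up data where only features 0 and 1 matter:

```python
import numpy as np

def lasso_ista(X, y, lam=0.2, lr=0.01, steps=2000):
    """L1-regularized least squares via iterative soft-thresholding.
    Feature selection is embedded in training: weights of uninformative
    features are shrunk to exactly zero."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)       # gradient of the squared loss
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w

# Toy data (hypothetical): y depends only on features 0 and 1.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=400)
w = lasso_ista(X, y)
print(np.round(w, 2))  # nonzero only for the informative features
```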
What should I use?
What is the best one?
Accuracy-wise: embedded or wrapper.
Complexity-wise: ranking, filters.
Why not both?
Examples
Probabilistic feature selection
For the model p(c | x) ∝ p(c) p(x | c):
Can be retrofitted with p(c) = p(M) p(c | M) for a model M.
More degrees of freedom spread the model thin.
Standard optimizations apply.
[Figure: probability assigned over possible data — a specific model concentrates its mass, a wide-spread model spreads it thin]
Examples
Probabilistic feature selection
Akaike information criterion: every additional variable needs to explain e times as much data.
Bayesian information criterion: unused parameters are marginalized out.
Minimum description length
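Both criteria trade fit against parameter count: AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂. A numpy sketch (my own toy example, not from the slides) comparing two Gaussian models of the same made-up data:

```python
import numpy as np

def gaussian_loglik(x, mu, sigma):
    """Log-likelihood of data x under a Gaussian N(mu, sigma^2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

rng = np.random.default_rng(4)
x = rng.normal(loc=1.0, scale=1.0, size=200)
n = len(x)

# Model A: mean fixed at 0, fitted scale -> k = 1 parameter.
ll_a = gaussian_loglik(x, 0.0, np.sqrt(np.mean(x**2)))
# Model B: fitted mean and fitted scale -> k = 2 parameters.
ll_b = gaussian_loglik(x, x.mean(), x.std())

aic_a, aic_b = 2 * 1 - 2 * ll_a, 2 * 2 - 2 * ll_b
bic_a, bic_b = 1 * np.log(n) - 2 * ll_a, 2 * np.log(n) - 2 * ll_b
print(f"A: AIC={aic_a:.1f}  BIC={bic_a:.1f}")
print(f"B: AIC={aic_b:.1f}  BIC={bic_b:.1f}")
# Both criteria prefer B (lower is better): the extra mean parameter
# improves the likelihood far more than e-fold.
```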
Examples
Autoencoder
Deep neural network.
Create a fixed-size information bottleneck.
Train to reconstruct the original data.
[Figure: deep autoencoder — input, encoder layers (2000, 1000, 500), a 30-unit bottleneck, and mirrored decoder layers (500, 1000, 2000) producing the reconstruction]
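The principle can be shown with a deliberately tiny sketch: the slide's network is deep and nonlinear, but even a linear encoder/decoder pair trained by gradient descent learns to squeeze the data through a bottleneck and reconstruct it. All sizes and data below are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 300, 20, 3                    # samples, input dim, bottleneck size
Z = rng.normal(size=(n, k))             # data really lives in k dimensions...
X = Z @ rng.normal(size=(k, d))         # ...but is observed in d dimensions

W_enc = 0.1 * rng.normal(size=(d, k))   # encoder: input -> bottleneck
W_dec = 0.1 * rng.normal(size=(k, d))   # decoder: bottleneck -> reconstruction

err0 = np.mean((X @ W_enc @ W_dec - X) ** 2)  # error before training
lr = 0.01
for _ in range(500):
    H = X @ W_enc                       # code at the bottleneck
    R = H @ W_dec                       # reconstruction
    G = 2 * (R - X) / n                 # gradient of mean squared error wrt R
    grad_dec = H.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

err = np.mean((X @ W_enc @ W_dec - X) ** 2)
print(err0, err)                        # training shrinks the reconstruction error
```

Because the toy data has true rank k and the bottleneck also has width k, near-perfect reconstruction is achievable; with a narrower bottleneck the network would be forced to keep only the most informative directions, which is the feature-selection angle.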
Prediction
Predictions
Embedded methods will improve more than other approaches.
Others as a first step, for complexity reasons.