Feature and Variable Selection in Classification
Feature and Variable Selection in Classification
Aaron Karper
University of Bern
Aaron Karper (UniBe) Feature selection 1 / 12
Why?
Why not use all the features?
Overfitting
Interpretability
Computational complexity
[Figure: training error and test error as a function of model complexity]
What are the options?
Ranking
Measure relevance for each feature separately.
The good:
Fast
The bad:
XOR problem: features that are informative only jointly look irrelevant individually, so per-feature ranking discards them.
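To make the XOR failure concrete, here is a small numpy sketch (not from the slides; the data is synthetic): each feature scores near zero under a per-feature relevance measure, yet the pair determines the label exactly.

```python
import numpy as np

# Synthetic data: two binary features whose XOR is the label.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 2))
y = X[:, 0] ^ X[:, 1]

# Ranking scores each feature separately, e.g. by |correlation| with y.
scores = [abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(2)]
print(scores)  # both near 0 -> ranking would discard both features

# Yet the pair predicts y perfectly: learn a lookup table over both features.
table = {(a, b): round(np.mean(y[(X[:, 0] == a) & (X[:, 1] == b)]))
         for a in (0, 1) for b in (0, 1)}
pred = np.array([table[(a, b)] for a, b in X])
print((pred == y).mean())  # 1.0
```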
What are the options?
XOR problem
What are the options?
Filters
Walk in feature-subset space, evaluating a cheap proxy measure instead of training the classifier.
The good:
Flexibility
The bad:
Suboptimal performance
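A filter can be sketched as greedy forward selection where each candidate subset is scored by a proxy instead of a trained classifier. The following numpy sketch (my own illustration, not from the slides) uses the R² of a least-squares fit as the proxy; the data is made up so that only features 0 and 1 carry signal.

```python
import numpy as np

def subset_score(X, y):
    """Proxy measure for a feature subset: R^2 of a least-squares fit.
    Cheap to compute -- no classifier is trained."""
    Xb = np.column_stack([X, np.ones(len(y))])   # add an intercept column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ w
    return 1 - resid.var() / y.var()

def forward_filter(X, y, k):
    """Walk in feature-subset space: greedily grow the subset,
    evaluating the proxy measure for each candidate extension."""
    selected = []
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        best = max(rest, key=lambda j: subset_score(X[:, selected + [j]], y))
        selected.append(best)
    return selected

# Toy data (hypothetical): the label depends on features 0 and 1 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
sel = forward_filter(X, y, k=2)
print(sel)  # the informative pair, in some order
```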
What are the options?
Wrappers
Walk in feature-subset space, training the classifier on every candidate subset.
The good:
Accuracy
The bad:
Slow training
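A wrapper has the same outer loop as a filter, but the score of a subset is the held-out accuracy of an actually trained classifier, which is why wrappers are accurate but slow. A minimal numpy sketch (my illustration, with a nearest-centroid classifier and made-up data in which only features 0 and 1 separate the classes):

```python
import numpy as np

def holdout_accuracy(X, y, split=350):
    """Train a minimal classifier (nearest class centroid) on one part of the
    data and score it on the rest -- the expensive inner loop of a wrapper."""
    Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]
    centroids = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    dists = ((Xte[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return (dists.argmin(axis=1) == yte).mean()

def forward_wrapper(X, y, k):
    """Walk in feature-subset space, training and evaluating the classifier
    for every candidate subset."""
    selected = []
    while len(selected) < k:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        best = max(rest, key=lambda j: holdout_accuracy(X[:, selected + [j]], y))
        selected.append(best)
    return selected

# Toy data (hypothetical): classes are shifted apart along features 0 and 1.
rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 5))
X[:, 0] += 1.5 * y
X[:, 1] += 1.5 * y
sel = forward_wrapper(X, y, k=2)
print(sel)
```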
What are the options?
Embedded methods
Integrate feature selection into classifier.
The good:
Accuracy, training time
The bad:
Lacks flexibility
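A classic embedded method, not named on the slide but a standard example, is L1-regularized regression (the lasso): selection happens inside training, because irrelevant weights are driven exactly to zero. A numpy sketch using iterative soft-thresholding (ISTA), with made-up data where only features 0 and 1 matter:

```python
import numpy as np

def lasso_ista(X, y, lam=0.2, lr=0.01, steps=2000):
    """L1-regularized least squares via iterative soft-thresholding.
    Feature selection is embedded in training: weights of uninformative
    features are shrunk to exactly zero."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)       # gradient of the squared loss
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w

# Toy data (hypothetical): y depends only on features 0 and 1.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=400)
w = lasso_ista(X, y)
print(np.round(w, 2))  # nonzero only for the informative features
```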
What should I use?
What is the best one?
Accuracy-wise: embedded or wrapper.
Complexity-wise: ranking, filters.
Why not both?
Examples
Probabilistic feature selection
For the model p(c | x) ∝ p(c) p(x | c):
Can be retrofitted with p(c) = p(M) p(c | M) for a model M.
More degrees of freedom spread the model thin.
Standard optimizations apply.
[Figure: probability assigned over possible data — a specific model concentrates its mass, a wide-spread model spreads it thin]
Examples
Probabilistic feature selection
Akaike information criterion: every additional variable needs to explain e times as much data.
Bayesian information criterion: unused parameters are marginalized out.
Minimum description length
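Both criteria trade fit against parameter count: AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂. A numpy sketch (my own toy example, not from the slides) comparing two Gaussian models of the same made-up data:

```python
import numpy as np

def gaussian_loglik(x, mu, sigma):
    """Log-likelihood of data x under a Gaussian N(mu, sigma^2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

rng = np.random.default_rng(4)
x = rng.normal(loc=1.0, scale=1.0, size=200)
n = len(x)

# Model A: mean fixed at 0, fitted scale -> k = 1 parameter.
ll_a = gaussian_loglik(x, 0.0, np.sqrt(np.mean(x**2)))
# Model B: fitted mean and fitted scale -> k = 2 parameters.
ll_b = gaussian_loglik(x, x.mean(), x.std())

aic_a, aic_b = 2 * 1 - 2 * ll_a, 2 * 2 - 2 * ll_b
bic_a, bic_b = 1 * np.log(n) - 2 * ll_a, 2 * np.log(n) - 2 * ll_b
print(f"A: AIC={aic_a:.1f}  BIC={bic_a:.1f}")
print(f"B: AIC={aic_b:.1f}  BIC={bic_b:.1f}")
# Both criteria prefer B (lower is better): the extra mean parameter
# improves the likelihood far more than e-fold.
```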
Examples
Autoencoder
Deep neural network.
Create a fixed-size information bottleneck.
Train to reconstruct the original data.
[Figure: deep autoencoder — input, encoder layers (2000, 1000, 500), a 30-unit bottleneck, and mirrored decoder layers (500, 1000, 2000) producing the reconstruction]
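The principle can be shown with a deliberately tiny sketch: the slide's network is deep and nonlinear, but even a linear encoder/decoder pair trained by gradient descent learns to squeeze the data through a bottleneck and reconstruct it. All sizes and data below are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 300, 20, 3                    # samples, input dim, bottleneck size
Z = rng.normal(size=(n, k))             # data really lives in k dimensions...
X = Z @ rng.normal(size=(k, d))         # ...but is observed in d dimensions

W_enc = 0.1 * rng.normal(size=(d, k))   # encoder: input -> bottleneck
W_dec = 0.1 * rng.normal(size=(k, d))   # decoder: bottleneck -> reconstruction

err0 = np.mean((X @ W_enc @ W_dec - X) ** 2)  # error before training
lr = 0.01
for _ in range(500):
    H = X @ W_enc                       # code at the bottleneck
    R = H @ W_dec                       # reconstruction
    G = 2 * (R - X) / n                 # gradient of mean squared error wrt R
    grad_dec = H.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

err = np.mean((X @ W_enc @ W_dec - X) ** 2)
print(err0, err)                        # training shrinks the reconstruction error
```

Because the toy data has true rank k and the bottleneck also has width k, near-perfect reconstruction is achievable; with a narrower bottleneck the network would be forced to keep only the most informative directions, which is the feature-selection angle.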
Prediction
Predictions
Embedded methods will improve more than other approaches.
Others as a first step, for complexity reasons.