
Page 1:

CS479/679 Pattern Recognition
Dr. George Bebis

Feature Selection

Jain, A.K.; Duin, R.P.W.; Mao, J., “Statistical pattern recognition: a review”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, Jan. 2000 (Section 4.2).

Genetic Algorithms: Chapter 7 (Duda et al.), Section 7.5.

Page 2:

Feature Selection

• Given a set of n features, the role of feature selection is to select a subset of d features (d < n) in order to minimize the classification error.

• Fundamentally different from dimensionality reduction (e.g., PCA or LDA), which is based on feature combinations (i.e., feature extraction).

Page 3:

Feature Selection vs Dimensionality Reduction

• Dimensionality Reduction
– When classifying novel patterns, all features need to be computed.
– The measurement units (length, weight, etc.) of the features are lost.

• Feature Selection
– When classifying novel patterns, only a small number of features need to be computed (i.e., faster classification).
– The measurement units (length, weight, etc.) of the features are preserved.

Page 4:

Feature Selection Steps

• Feature selection is an optimization problem.

– Step 1: Search the space of possible feature subsets.

– Step 2: Pick the subset that is optimal or near-optimal with respect to some objective function.

Page 5:

Feature Selection Steps (cont’d)

• Search strategies
– Optimum
– Heuristic
– Randomized

• Evaluation strategies
– Filter methods
– Wrapper methods

Page 6:

Search Strategies

• Assuming n features, an exhaustive search would require:
– Examining all C(n, d) = n!/(d!(n-d)!) possible subsets of size d.
– Selecting the subset that performs best according to the criterion function.

• The number of subsets grows combinatorially, making exhaustive search impractical.

• In practice, heuristics are used to speed up the search, but they cannot guarantee optimality.
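To make this growth concrete, here is a minimal Python sketch of exhaustive search; the criterion function is a stand-in for whatever filter or wrapper objective is being maximized:

```python
from itertools import combinations
from math import comb

def exhaustive_search(features, d, criterion):
    """Evaluate every size-d subset and return the best one.
    Optimal, but costs C(n, d) criterion evaluations."""
    best_subset, best_score = None, float("-inf")
    for subset in combinations(features, d):
        score = criterion(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Combinatorial growth: with 28 features (as in the satellite-image
# example later), the size-10 subsets alone number in the millions.
print(comb(28, 10))  # 13123110
```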

Page 7:

Evaluation Strategies

• Filter Methods
– Evaluation is independent of the classification algorithm.
– The objective function evaluates feature subsets by their information content, typically interclass distance, statistical dependence, or information-theoretic measures.

Page 8:

Evaluation Strategies (cont’d)

• Wrapper Methods
– Evaluation uses criteria related to the classification algorithm.
– The objective function is a pattern classifier, which evaluates feature subsets by their predictive accuracy (recognition rate on test data), estimated by statistical resampling or cross-validation.
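To contrast the two styles in code, the sketch below scores the same feature subset with a filter measure and a wrapper measure; the use of scikit-learn, mutual information, and a k-NN classifier is an illustrative assumption, not a choice made in the slides:

```python
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def filter_score(X, y, subset):
    """Filter: classifier-independent information measure (here, the
    mean mutual information between the selected features and y)."""
    return mutual_info_classif(X[:, list(subset)], y).mean()

def wrapper_score(X, y, subset):
    """Wrapper: cross-validated accuracy of an actual classifier
    trained on the candidate subset."""
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, list(subset)], y, cv=3).mean()
```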

Page 9:

Filter vs Wrapper Approaches

Page 10:

Filter vs Wrapper Approaches (cont’d)

Page 11:

Naïve Search

• Sort the given n features in order of their probability of correct recognition.

• Select the top d features from this sorted list.

• Disadvantages
– Feature correlation is not considered.
– The best pair of features may not even contain the best individual feature.
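A minimal sketch of this ranking strategy, assuming a hypothetical single_feature_score helper that estimates each feature's individual merit:

```python
def naive_search(X, y, d, single_feature_score):
    """Rank each feature by its individual merit and keep the top d.
    Fast, but blind to correlations between features."""
    n_features = X.shape[1]
    ranked = sorted(range(n_features),
                    key=lambda j: single_feature_score(X[:, [j]], y),
                    reverse=True)
    return ranked[:d]
```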

Page 12:

Sequential forward selection (SFS) (heuristic search)

• First, the best single feature is selected (i.e., using some criterion function).

• Then, pairs of features are formed using one of the remaining features and this best feature, and the best pair is selected.

• Next, triplets of features are formed using one of the remaining features and these two best features, and the best triplet is selected.

• This procedure continues until a predefined number of features are selected.


SFS performs best when the optimal subset is small.
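A compact Python sketch of SFS; criterion is again a black-box objective to be maximized:

```python
def sfs(features, d, criterion):
    """Sequential forward selection: greedily add the feature that
    improves the criterion most until d features are selected."""
    selected, remaining = [], list(features)
    while len(selected) < d:
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```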

Page 13:

Example


Results of sequential forward feature selection for classification of a satellite image using 28 features. x-axis shows the classification accuracy (%) and y-axis shows the features added at each iteration (the first iteration is at the bottom). The highest accuracy value is shown with a star.


Page 14:

Sequential backward selection (SBS) (heuristic search)

• First, the criterion function is computed for all n features.

• Then, each feature is deleted one at a time, the criterion function is computed for all subsets with n-1 features, and the worst feature is discarded.

• Next, each feature among the remaining n-1 is deleted one at a time, and the worst feature is discarded to form a subset with n-2 features.

• This procedure continues until a predefined number of features are left.


SBS performs best when the optimal subset is large.
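The mirror-image sketch for SBS:

```python
def sbs(features, d, criterion):
    """Sequential backward selection: start from the full set and
    greedily drop the feature whose removal hurts the criterion least."""
    selected = list(features)
    while len(selected) > d:
        worst = max(selected,
                    key=lambda f: criterion([g for g in selected if g != f]))
        selected.remove(worst)
    return selected
```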

Page 15:

Example


Results of sequential backward feature selection for classification of a satellite image using 28 features. x-axis shows the classification accuracy (%) and y-axis shows the features removed at each iteration (the first iteration is at the top). The highest accuracy value is shown with a star.


Page 16:

Bidirectional Search (BDS)

• BDS applies SFS and SBS simultaneously:
– SFS is performed from the empty set.
– SBS is performed from the full set.

• To guarantee that SFS and SBS converge to the same solution:
– Features already selected by SFS are not removed by SBS.
– Features already removed by SBS are not added by SFS.

Page 17:

SFS and SBS

• The main disadvantage of SFS is that it is unable to remove features that become obsolete after the addition of other features.

• The main limitation of SBS is its inability to reevaluate the usefulness of a feature after it has been discarded.

Page 18:

“Plus-L, minus-R” selection (LRS)

• A generalization of SFS and SBS:

– If L > R, LRS starts from the empty set and:
• Repeatedly adds L features
• Repeatedly removes R features

– If L < R, LRS starts from the full set and:
• Repeatedly removes R features
• Repeatedly adds L features

• Its main limitation is the lack of a theory to help predict the optimal values of L and R.
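A sketch of the L > R (forward) variant, reusing the greedy add/remove steps from the SFS and SBS sketches above; a real implementation would stop exactly at d features rather than possibly overshooting:

```python
def lrs(features, d, L, R, criterion):
    """Plus-L, minus-R selection for L > R: alternately add the L best
    features and drop the R least useful ones until d are reached."""
    assert L > R, "this sketch covers only the forward (L > R) variant"
    selected, remaining = [], list(features)
    while len(selected) < d:
        for _ in range(L):        # plus-L phase (SFS steps)
            best = max(remaining, key=lambda f: criterion(selected + [f]))
            selected.append(best)
            remaining.remove(best)
        for _ in range(R):        # minus-R phase (SBS steps)
            worst = max(selected,
                        key=lambda f: criterion([g for g in selected if g != f]))
            selected.remove(worst)
            remaining.append(worst)
    return selected
```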

Page 19:

Sequential floating selection (SFFS and SFBS)

• An extension to LRS with flexible backtracking capabilities:
– Rather than fixing the values of L and R, floating methods determine these values from the data.
– The dimensionality of the subset during the search can be thought of as “floating” up and down.

• There are two floating methods:
– Sequential floating forward selection (SFFS)
– Sequential floating backward selection (SFBS)

P. Pudil, J. Novovicova, and J. Kittler, “Floating search methods in feature selection”, Pattern Recognition Letters, vol. 15, pp. 1119-1125, 1994.

Page 20:

Sequential floating forward selection (SFFS)

• Sequential floating forward selection (SFFS) starts from the empty set.

• After each forward step, SFFS performs backward steps as long as the objective function increases.
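A sketch of SFFS, pairing each SFS inclusion with conditional exclusions:

```python
def sffs(features, d, criterion):
    """Sequential floating forward selection: after every inclusion,
    keep excluding features for as long as the criterion improves."""
    selected, remaining = [], list(features)
    while len(selected) < d:
        # Inclusion: one standard SFS step.
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        # Conditional exclusion: backtrack while it helps.
        while len(selected) > 2:
            worst = max(selected,
                        key=lambda f: criterion([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            if criterion(reduced) > criterion(selected):
                selected = reduced
                remaining.append(worst)
            else:
                break
    return selected
```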

Page 21:

Sequential floating backward selection (SFBS)

• Sequential floating backward selection (SFBS) starts from the full set.

• After each backward step, SFBS performs forward steps as long as the objective function increases.

• The SFBS algorithm is analogous to SFFS.

Page 22:

Feature Selection using Genetic Algorithms (GAs) (randomized search)

[Diagram: Pre-Processing → Feature Extraction → Feature Selection (GA) → Feature Subset → Classifier]

GAs provide a simple, general, and powerful framework for feature selection.

Page 23:

Genetic Algorithms (GAs)

• What are GAs?
– An optimization technique for searching very large spaces.
– Inspired by the biological mechanisms of natural selection and reproduction.

• Main characteristics of GAs
– Global optimization technique.
– Searches probabilistically using a population of structures, each properly encoded as a string of symbols.
– Uses an objective function, not derivatives.

Page 24:

• Encoding scheme
– Represent solutions in parameter space as finite-length strings (chromosomes) over some finite set of symbols.

Encoding example: the parameter vector (11, 6, 9) maps to the binary chromosome (1011 0110 1001), i.e., the string 101101101001.
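A minimal sketch of this encoding and its inverse, assuming 4 bits per parameter as in the example:

```python
def encode(params, bits=4):
    """Concatenate fixed-width binary encodings of each parameter."""
    return "".join(format(p, f"0{bits}b") for p in params)

def decode(chromosome, bits=4):
    """Split the chromosome back into its integer parameters."""
    return tuple(int(chromosome[i:i + bits], 2)
                 for i in range(0, len(chromosome), bits))

assert encode((11, 6, 9)) == "101101101001"
assert decode("101101101001") == (11, 6, 9)
```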

Page 25:

• Fitness function
– Evaluates the goodness of a solution.

Fitness Evaluation

Fitness = f(decode(chromosome))

Page 26:

Searching

[Diagram: the current generation of bit-string chromosomes (e.g., 10010110…, 01100010…, 10100100…) is transformed into the next generation through selection, crossover, and mutation]

Page 27:

Selection

• Probabilistically filters out solutions that perform poorly, choosing high-performance solutions to exploit.

[Example: chromosomes 1001, 1101, 1000, and 0001 have fitnesses 0.1, 0.9, 0.01, and 0.01; after selection the population becomes 1001, 1101, 1101, 1101, i.e., the highly fit 1101 is replicated while the weak 1000 and 0001 are filtered out]

Page 28:

Crossover and Mutation

• Explore new solutions.
• Crossover: information exchange between strings.
• Mutation: restores lost genetic material.

[Example: single-point crossover of 10011110 and 10110010 after the fourth bit yields 10010010 and 10111110; mutating one bit of 10011110 yields 10011010]
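Putting the three operators together, a minimal generic generation step might look as follows; tournament selection and the crossover/mutation rates shown are illustrative choices, not values taken from the slides:

```python
import random

def next_generation(population, fitness, cx_rate=0.9, mut_rate=0.01):
    """One GA step over bit-string chromosomes:
    selection, then crossover, then mutation."""
    def select():
        # Tournament selection: keep the fitter of two random individuals.
        a, b = random.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    def crossover(p1, p2):
        # Single-point crossover: exchange the tails of two parents.
        if random.random() < cx_rate:
            cut = random.randrange(1, len(p1))
            return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        return p1, p2

    def mutate(chrom):
        # Flip each bit with small probability (restores lost material).
        return "".join(b if random.random() > mut_rate else "10"[int(b)]
                       for b in chrom)

    children = []
    while len(children) < len(population):
        c1, c2 = crossover(select(), select())
        children += [mutate(c1), mutate(c2)]
    return children[:len(population)]
```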

Page 29:

Feature Selection Using GAs

• Binary encoding: the chromosome has one bit per feature (Feature #1 through Feature #N); a zero bit means the corresponding feature is discarded.

• Fitness evaluation (to be maximized):

fitness = 10^4 × accuracy + 0.4 × #zeros

where accuracy is the classification accuracy on a validation set, and #zeros is the number of zero bits (discarded features), which rewards smaller subsets.
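A sketch of this fitness function for bit-string chromosomes; validation_accuracy is a hypothetical helper standing in for training a classifier on the selected features and scoring it on the validation set:

```python
def fitness(chromosome, X_train, y_train, X_val, y_val):
    """Fitness from the slide: weight validation accuracy heavily and
    add a small bonus for every feature left out (zero bit)."""
    subset = [i for i, bit in enumerate(chromosome) if bit == "1"]
    acc = validation_accuracy(X_train[:, subset], y_train,
                              X_val[:, subset], y_val)  # hypothetical helper
    return 1e4 * acc + 0.4 * chromosome.count("0")
```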

Page 30:

Case Study 1: Gender Classification

• Determine the gender of a subject from facial images.
– Challenges: race, age, facial expression, hair style, etc.

Z. Sun, G. Bebis, X. Yuan, and S. Louis, "Genetic Feature Subset Selection for Gender Classification: A Comparison Study", IEEE Workshop on Applications of Computer Vision, pp. 165-170, Orlando, December 2002.

Page 31:

Feature Extraction Using PCA

• PCA maps the data to a lower-dimensional space using a linear transformation.

• The columns of the projection matrix are the “best” eigenvectors (i.e., eigenfaces) of the covariance matrix of the data.
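For reference, a minimal sketch of the projection described here (the standard eigen-decomposition construction, not code from the case study):

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k eigenvectors (eigenfaces,
    when the rows are face images) of the data covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k columns
    return X_centered @ top                           # projection coefficients
```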

Page 32:

Which eigenvectors encode mostly gender information?

[Figure: sample eigenfaces EV#1, EV#2, EV#3, EV#4, EV#5, EV#6, EV#8, EV#10, EV#12, EV#14, EV#19, EV#20]

IDEA: Use GAs to search the space of eigenvectors!

Page 33:

Dataset

• 400 frontal images from 400 different people

– 200 male, 200 female

– Different races, lighting conditions, and facial expressions

• Images were registered and normalized
– No hair information

– Normalized to account for different lighting conditions

Page 34:

Experiments

• Classifiers

– LDA

– Bayes classifier

– Neural Networks (NNs)

– Support Vector Machines (SVMs)

• Three-fold cross-validation
– Training set: 75% of the data

– Validation set: 12.5% of the data

– Test set: 12.5% of the data

• Comparison with SFBS

Page 35:

Error Rates

ERM: error rate using top eigenvectors

ERG: error rate using GA-selected eigenvectors

[Bar chart comparing ERM and ERG for each classifier; values shown: 17.7%, 11.3%, 22.4%, 13.3%, 14.2%, 9%, 8.9%, 4.7%, 6.7%]

Page 36:

Ratio of Features - Information Kept

RN: percentage of eigenvectors in the feature subset selected.

RI: percentage of information contained in the eigenvector subset selected.

[Bar chart of RN and RI for each classifier; values shown: 17.6%, 38%, 13.3%, 31%, 36.4%, 61.2%, 8.4%, 32.4%, 42.8%, 69.0%]

Page 37:

Histograms of Selected Eigenvectors

[Histograms of selected eigenvectors for: (a) LDA, (b) Bayes, (c) NN, (d) SVMs]

Page 38:

Reconstructed Images

Reconstructed faces using GA-selected EVs do not contain information about identity but disclose strong gender information!

[Figure rows: original images; reconstructions using the top 30 EVs; reconstructions using EVs selected by SVM+GA; reconstructions using EVs selected by NN+GA]

Page 39:

Comparison with SFBS

[Figure rows: original images; reconstructions using the top 30 EVs; reconstructions using EVs selected by SVM+GA; reconstructions using EVs selected by SVM+SFBS]

Page 40:

Case Study 2: Vehicle Detection

[Figure: rear views captured by a low-light camera]

The non-vehicle class is much larger than the vehicle class.

Z. Sun, G. Bebis, and R. Miller, "Object Detection Using Feature Subset Selection", Pattern Recognition, vol. 37, pp. 2165-2176, 2004.

Ford Motor Company

Page 41:

Which eigenvectors encode the most important vehicle features?

Page 42:

Experiments

• Training data set (collected in Fall 2001):
2102 images (1051 vehicles and 1051 non-vehicles)

• Test data set (collected in Summer 2001):
231 images (vehicles and non-vehicles)

• SVM for classification
• Three-fold cross-validation
• Comparison with SFBS

Page 43:

Error Rate

[Chart: error rate comparison; the value shown is 6.49%]

Page 44:

Histograms of Selected Eigenvectors

[Histograms of selected eigenvectors: SFBS-SVM and GA-SVM]

Number of eigenvectors selected by SFBS: 87 (43.5% of the information)

Number of eigenvectors selected by GA: 46 (23% of the information)

Page 45:

Vehicle Detection

[Figure rows: original images; reconstructions using the top 50 EVs; reconstructions using EVs selected by SFBS; reconstructions using EVs selected by GAs]

Reconstructed images using the selected feature subsets; lighting differences have been disregarded by the GA approach.

Page 46:

Case Study 3: Fusion of Visible-Thermal IR Imagery for Face Recognition

G. Bebis, A. Gyaourova, S. Singh, and I. Pavlidis, "Face Recognition by Fusing Thermal Infrared and Visible Imagery", Image and Vision Computing, vol. 24, no. 7, pp. 727-742, 2006.

Page 47:

Visible vs Thermal IR

• Visible spectrum
– High resolution; less sensitive to the presence of eyeglasses.
– Sensitive to changes in illumination and facial expression.

• Thermal IR spectrum
– Robust to illumination changes and facial expressions.
– Low resolution; sensitive to air currents, face heat patterns, aging, and the presence of eyeglasses (i.e., glass is opaque to thermal IR).

Page 48:

How should we fuse information from visible and thermal IR?

[Diagram: Feature Extraction → Fusion Using Genetic Algorithms → Reconstruct Image → Fused Image]

We have tested two feature extraction methods: PCA and wavelets.

Page 49:

Dataset

• Co-registered thermal infrared and visible images (Equinox database).

• Variations among images:
– Eyeglasses
– Illumination (front, left, right)
– Facial expression (smile, frown, surprise, and vowels)

Page 50:

Eyeglasses/Illumination Tests

[Diagram: the DATA is split into glasses (EG) and no-glasses (EnG) sets, each further divided by illumination: frontal (EFG, EFnG) and lateral (ELG, ELnG)]

Page 51:

Experiments

[Table: database/test pairings over EG, ELG, EFG, EnG, ELnG, EFnG; pairings that cross the glasses/no-glasses boundary form the Eyeglasses Tests, and the remaining pairings form the Illumination Tests]

Page 52:

Results

[Bar chart: recognition accuracy (0-100%) of infrared, visible, and fused imagery for the test pairs ELG-ELnG, EFG-EFnG, ELnG-ELG, EFnG-EFG, ELG-EFG, EFG-ELG, ELnG-EFnG, EFnG-ELnG, ELG-EFnG, EFG-ELnG, ELnG-EFG, EFnG-ELG, EG-EnG, EnG-EG, ELnG-EG, EFnG-EG, EG-ELnG, EG-EFnG, ELG-EnG, EFG-EnG, EnG-ELG, and EnG-EFG, grouped into: presence of eyeglasses; illumination direction; eyeglasses and illumination direction; eyeglasses and mixed illumination]

Page 53:

Facial Expression Tests

[Diagram: the expression data is split into Smile, Frown & Surprise sets (EA overall, EF frontal illumination, EL lateral illumination) and Speaking Vowels sets (VA overall, VF frontal illumination, VL lateral illumination)]

Page 54:

Results

[Bar chart: recognition accuracy (0-100%) of infrared, visible, and fused imagery for the test pairs EF-VF, VF-EF, EL-VL, VL-EL, EF-EL, EL-EF, VF-VL, VL-VF, EF-VL, VL-EF, VF-EL, EL-VF, EA-VA, and VA-EA, grouped into: facial expression (emotions vs. pronunciation of vowels); illumination direction; facial expression and illumination direction; facial expression and mixed illumination]

Page 55:

Overall Results

Page 56:

Overall Results