
Genetic Feature Subset Selection for Gender Classification: A Comparison Study

Zehang Sun, George Bebis, Xiaojing Yuan, and Sushil Louis

Computer Vision Laboratory
Department of Computer Science
University of Nevada, Reno

[email protected]
http://www.cs.unr.edu/CVL

Gender Classification

• Problem statement
  – Determine the gender of a subject from facial images.

• Potential applications
  – Face recognition
  – Human-Computer Interaction (HCI)

• Challenges
  – Race, age, facial expression, hair style, etc.

Gender Classification by Humans

• Humans are able to make fast and accurate gender classifications.
  – It takes 600 ms on average to classify faces according to their gender (Bruce et al., 1987).
  – 96% accuracy has been reported using photos of non-familiar faces without hair information (Bruce et al., 1993).

• Empirical evidence indicates that gender decisions are made much faster than identity decisions.
  – Computation of gender and identity might be two independent processes.
  – There is evidence that gender classification is carried out by a separate population of cells in the inferior temporal cortex (Damasio et al., 1990).

Designing a Gender Classifier

• The majority of gender classification schemes are based on supervised learning.

• Definition: Feature extraction determines an appropriate subspace of dimensionality m in the original feature space of dimensionality d (m << d).

Pre-Processing → Feature Extraction → Classifier

Previous Approaches

• Geometry-based
  – Use distances, angles, and areas among facial features.
  – Point-to-point distances + discriminant analysis (Burton '93, Fellous '97)
  – Feature-to-feature distances + HyperBF NNs (Brunelli '92)
  – Wavelet features + elastic graph matching (Wiskott '95)

• Appearance-based
  – Raw images + NNs (Cottrell '90, Golomb '91, Yen '94)
  – PCA + NNs (Abdi '95)
  – PCA + nearest neighbor (Valentin '97)
  – Raw images + SVMs (Moghaddam '02)

What Information is Useful for Gender Classification?

• Geometry-based approaches
  – Representing faces as a set of features assumes a-priori knowledge about what the features are and/or how they are related.
  – There is no simple set of features that can predict the gender of faces accurately.
  – There is no simple algorithm for extracting the features automatically from images.

• Appearance-based approaches
  – Certain features are nearly characteristic of one sex or the other (e.g., facial hair for men; makeup or certain hairstyles for women).
  – It is easier to represent this kind of information using appearance-based feature extraction methods.
  – Appearance-based features, however, are more likely to suffer from redundant and irrelevant information.

Feature Extraction Using PCA

• Feature extraction is performed by projecting the data onto a lower-dimensional space using PCA.

• PCA maps the data to a lower-dimensional space using a linear transformation.

• The columns of the projection matrix are the "best" eigenvectors (i.e., the eigenfaces) of the covariance matrix of the data.
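To make the projection step concrete, here is a minimal NumPy sketch of PCA-based feature extraction; the function name and interface are illustrative, not from the paper.

```python
import numpy as np

def pca_project(X, m):
    """Project flattened face images X (n_samples x d) onto the top-m
    eigenvectors (eigenfaces) of the data covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # center the data
    # SVD of the centered data: the rows of Vt are the eigenvectors of
    # the covariance matrix, sorted by decreasing eigenvalue.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenfaces = Vt[:m]                    # the m "best" eigenvectors
    coeffs = Xc @ eigenfaces.T             # m-dimensional feature vectors
    return coeffs, eigenfaces, mean
```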

Which Eigenvectors Encode Mostly Gender-Related Information?

[Figure: eigenfaces EV#1–EV#6, EV#8, EV#10, EV#12, EV#14, EV#19, and EV#20]

Sometimes, it is possible to determine what features are encoded by specific eigenvectors.

Which Eigenvectors Encode Mostly Gender-Related Information? (cont’d)

• All eigenvectors contain information relative to the gender of faces; however, only the information conveyed by eigenvectors with large eigenvalues can be generalized to new faces (Abdi et al., 1995).

• Removing specific eigenvectors could in fact improve performance (Yambor et al., 2000).

Critique of Previous Approaches

• No explicit feature selection is performed.
  – The same features used for face identification are also used for gender classification.

• Some features might be redundant or irrelevant.
  – Heavy reliance on the classifier.
  – Classification accuracy can suffer.
  – Training and classification become time consuming.

Project Goal

• Improve the performance of gender classification using feature subset selection.

Pre-Processing → Feature Extraction → Feature Selection (GA) → Feature Subset → Classifier

Feature Selection

• Definition: Given a set of d features, select a subset of size m that leads to the smallest classification error.

• Filter methods
  – Preprocessing steps performed independently of the classification algorithm and its error criteria.

• Wrapper methods
  – Search the space of feature subsets, using the classification algorithm's own criterion to select the optimal subset.
  – Provide more accurate solutions than filter methods, but are in general more computationally expensive.

What constitutes a good set of features for classification?

What are the Benefits?

• Eliminate redundant and irrelevant features.
• Fewer training examples are required.
• Faster and more accurate classification.

Project Objectives

• Perform feature extraction by projecting the images onto a lower-dimensional space using Principal Components Analysis (PCA).

• Perform feature selection in PCA space using Genetic Algorithms (GAs).

• Test four traditional classifiers (Bayes, LDA, NNs, and SVMs).

• Compare with traditional feature subset selection approaches (e.g., Sequential Backward Floating Search (SBFS)).

Genetic Algorithms (GAs) Review

• What is a GA?
  – An optimization technique for searching very large spaces.
  – Inspired by the biological mechanisms of natural selection and reproduction.

• What are the main characteristics of a GA?
  – Global optimization technique.
  – Uses objective function information, not derivatives.
  – Searches probabilistically using a population of structures (i.e., candidate solutions under some encoding).
  – Structures are modified at each iteration using selection, crossover, and mutation.

Structure of GA

[Figure: a population of bitstrings (e.g., 10010110…, 01100010…, 10100100…) in the current generation passes through evaluation and selection, crossover, and mutation to produce the next generation.]

Encoding and Fitness Evaluation

• Encoding scheme
  – Transforms solutions in parameter space into finite-length strings (chromosomes) over some finite set of symbols.

• Fitness function
  – Evaluates the goodness of a solution.

Example: (11, 6, 9) → (1011_0110_1001) → (101101101001)

Fitness = f(decode(chromosome))
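A small Python sketch of the encoding idea, using the slide's own example; the fixed 4-bit width and the function names are illustrative assumptions.

```python
def encode(params, bits=4):
    """Concatenate fixed-width binary codes: (11, 6, 9) -> '101101101001'."""
    return ''.join(format(p, f'0{bits}b') for p in params)

def decode(chromosome, bits=4):
    """Split the bitstring back into integers: '101101101001' -> (11, 6, 9)."""
    return tuple(int(chromosome[i:i + bits], 2)
                 for i in range(0, len(chromosome), bits))

def fitness(chromosome, f):
    """Fitness = f(decode(chromosome)) for some objective function f."""
    return f(decode(chromosome))
```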

Selection Operator

• Probabilistically filters out solutions that perform poorly, choosing high-performance solutions to exploit.
  – Chromosomes with high fitness are copied over to the next generation.

[Figure: fitness-proportional selection. Given chromosomes 1001, 1101, 1000, and 0001 with fitnesses 0.1, 0.9, 0.01, and 0.01, the high-fitness chromosome 1101 is copied several times into the next generation (1001, 1101, 1101, 1101).]
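A minimal sketch of fitness-proportional (roulette-wheel) selection, one common way to realize this operator; the slides do not name the exact scheme, so treat this as an assumption.

```python
import random

def roulette_select(population, fitnesses, k):
    """Draw k chromosomes with probability proportional to fitness, so
    high-fitness chromosomes tend to be copied into the next generation."""
    return random.choices(population, weights=fitnesses, k=k)

# Example matching the slide: 1101 (fitness 0.9) dominates the draw.
pop = ['1001', '1101', '1000', '0001']
fit = [0.1, 0.9, 0.01, 0.01]
next_gen = roulette_select(pop, fit, k=4)   # e.g. ['1001', '1101', '1101', '1101']
```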

Crossover and Mutation Operators

• Generate new solutions for exploration.

• Crossover
  – Allows information exchange between points.

• Mutation
  – Its role is to restore lost genetic material.

[Figure: one-point crossover of parents 10011110 and 10110010 yields offspring 10010010 and 10111110; mutation flips one bit of 10011110 to give 10011010.]
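A sketch of the two operators, assuming chromosomes are Python strings of '0'/'1'; the fixed cut point below reproduces the slide's example.

```python
import random

def crossover(a, b, cut=None):
    """One-point crossover: swap the tails of two parents after `cut`."""
    if cut is None:
        cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, pm=0.04):
    """Flip each bit independently with probability pm
    (restores genetic material lost from the population)."""
    return ''.join(('1' if b == '0' else '0') if random.random() < pm else b
                   for b in chrom)

# The slide's example: cutting after bit 3 gives the two offspring shown.
c1, c2 = crossover('10011110', '10110010', cut=3)   # '10010010', '10111110'
```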

Genetic Feature Subset Selection

• Binary encoding
  – Each chromosome is a bitstring over the first 250 eigenvectors (EV#1 … EV#250); the search uses only these eigenvectors.

• Fitness evaluation

  fitness = 10^4 × accuracy + 0.4 × zeros

  where accuracy is the classification accuracy on the validation set and zeros is the number of features not selected.
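A sketch of this fitness function; accuracy_fn is a hypothetical caller-supplied callback that trains and evaluates the chosen classifier on the selected eigenvector coefficients and returns validation-set accuracy in [0, 1].

```python
def subset_fitness(chromosome, accuracy_fn):
    """fitness = 10^4 * accuracy + 0.4 * zeros, where `zeros` (the number
    of unselected eigenvectors) rewards smaller feature subsets."""
    subset = [i for i, bit in enumerate(chromosome) if bit == '1']
    accuracy = accuracy_fn(subset)            # validation-set accuracy
    zeros = len(chromosome) - len(subset)     # excluded eigenvectors
    return 1e4 * accuracy + 0.4 * zeros
```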

Genetic Feature Subset Selection (cont’d)

• Cross-generational selection strategy
  – Assuming a population of size N, the offspring double the size of the population, and we select the best N individuals from the combined parent-offspring population (see the sketch after this list).

• GA parameters
  – Population size: 350
  – Number of generations: 400
  – Crossover rate: 0.66
  – Mutation rate: 0.04
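A minimal sketch of one cross-generational step under the parameters above, reusing the crossover, mutate, and subset_fitness helpers from the earlier sketches; reading "the offspring double the size of the population" as N offspring joining N parents is our assumption.

```python
import random

def next_generation(parents, fitness_fn, pc=0.66, pm=0.04):
    """Produce N offspring (so parents + offspring = 2N), then keep the
    best N individuals of the combined pool for the next generation."""
    n = len(parents)
    offspring = []
    while len(offspring) < n:
        a, b = random.sample(parents, 2)
        if random.random() < pc:              # crossover rate 0.66
            a, b = crossover(a, b)            # from the earlier sketch
        offspring += [mutate(a, pm), mutate(b, pm)]
    pool = parents + offspring[:n]
    return sorted(pool, key=fitness_fn, reverse=True)[:n]
```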

Dataset

• 400 frontal images from 400 different people
  – 200 male, 200 female
  – Different races
  – Different lighting conditions
  – Different facial expressions

• Images were registered and normalized
  – No hair information
  – Compensation for different lighting conditions

Experiments

• Gender classifiers:
  – Linear Discriminant Analysis (LDA)
  – Bayes classifier
  – Neural Network (NN) classifier
  – Support Vector Machine (SVM) classifier

• Three-fold cross-validation
  – Training set: 75% of the data
  – Validation set: 12.5% of the data
  – Test set: 12.5% of the data
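One way to realize the 75/12.5/12.5 split for a single fold, sketched in NumPy; the slides do not specify the exact fold construction, so this is illustrative.

```python
import numpy as np

def split_fold(n_samples, seed=0):
    """Shuffle indices and split them 75% / 12.5% / 12.5% into
    training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.75 * n_samples)
    n_val = int(0.125 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_fold(400)   # 300 / 50 / 50 images
```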

Classification Error Rates

ERM: error rate using manually selected feature subsets
ERG: error rate using GA-selected feature subsets

            NN       Bayes    LDA      SVMs
  ERM       17.7%    22.4%    14.2%    8.9%
  ERG       11.3%    13.3%    9.0%     4.7%

(6.7% is the SVM error rate using SBFS-selected features; see the SBFS comparison below.)

Ratio of Features - Information Kept

RN: percentage of the number of features kept in the feature subset
RI: percentage of information contained in the feature subset

[Bar chart: RN / RI per method; the value pairs shown are 17.6%/38%, 13.3%/31%, 36.4%/61.2%, 8.4%/32.4%, and 42.8%/69%.]

Distribution of Selected Eigenvectors

[Figure: distribution of GA-selected eigenvectors for (a) LDA, (b) Bayes, (c) NN, and (d) SVMs]

Reconstructed Images

[Figure rows: original images; reconstructions using the top 30 EVs; reconstructions using EVs selected by B-PCA+GA; reconstructions using EVs selected by LDA-PCA+GA]
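A sketch of how such reconstructions can be produced from a selected eigenvector subset, reusing the eigenfaces and mean names from the earlier PCA sketch; all names are illustrative.

```python
import numpy as np

def reconstruct(coeffs, eigenfaces, mean, subset):
    """Rebuild one face from only the eigenvectors in `subset`
    (e.g., the GA-selected ones): x ~ mean + sum_i c_i * v_i.
    `coeffs` is the 1-D coefficient vector of a single image."""
    keep = np.asarray(sorted(subset))
    return mean + coeffs[keep] @ eigenfaces[keep]
```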

Reconstructed Images (cont’d)

Reconstructed faces using GA-selected EVs have lost information about identity but do disclose strong gender information.

Certain gender-irrelevant features do not appear in the reconstructed images when GA-selected EVs are used.

[Figure rows: original images; reconstructions using the top 30 EVs; reconstructions using EVs selected by NN-PCA+GA; reconstructions using EVs selected by SVM-PCA+GA]

Comparison with SBFS

• Sequential Backward Floating Search (SBFS) is a combination of two heuristic search schemes:

  (1) Sequential Forward Selection (SFS): starts with an empty feature set and at each step adds the best single feature to the feature subset.

  (2) Sequential Backward Selection (SBS): starts with the entire feature set and at each step drops the feature whose absence least decreases the performance.

Comparison with SBFS (cont’d)

• SBFS is an advanced version of the "plus l, take away r" method, which first enlarges the feature subset by l features using forward selection and then removes r features using backward selection.

• The number of forward and backward steps in SBFS is dynamically controlled and updated based on the classifier's performance (see the sketch below).
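A compact sketch of SBFS under these definitions; score_fn is a hypothetical callback returning the validation performance of a feature subset (higher is better), and the per-size bookkeeping follows the standard floating-search formulation rather than anything specific from the slides.

```python
def sbfs(all_features, score_fn):
    """Sequential Backward Floating Search: backward elimination with
    conditional forward steps that re-add features when doing so beats
    the best subset previously found at that size."""
    current = frozenset(all_features)
    best = {len(current): (score_fn(current), current)}  # per-size best
    while len(current) > 1:
        # Backward step (SBS): drop the feature whose absence hurts least.
        current = max((current - {f} for f in current), key=score_fn)
        score = score_fn(current)
        if score > best.get(len(current), (float('-inf'), None))[0]:
            best[len(current)] = (score, current)
        # Floating forward steps (SFS): climb back up while it pays off.
        while len(current) < len(all_features) - 1:
            cand = max((current | {f} for f in set(all_features) - current),
                       key=score_fn)
            if score_fn(cand) > best[len(cand)][0]:
                current, best[len(cand)] = cand, (score_fn(cand), cand)
            else:
                break
    return max(best.values(), key=lambda sc: sc[0])[1]  # best subset overall
```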

Comparison with SBFS (cont’d)

[Figure: eigenvectors selected by (a) SVMs+SBFS and (b) SVMs+GA]

            NN        Bayes     LDA       SVMs
  ERM       17.70%    22.38%    14.20%    8.90%
  ERG       11.33%    13.00%    9.00%     4.70%
  ERSBFS    N/A       N/A       N/A       6.70%

ERM: error rate using manually selected feature subsets
ERG: error rate using GA-selected feature subsets
ERSBFS: error rate using SBFS-selected feature subsets

Comparison with SBFS (cont’d)

[Figure rows: original images; reconstructions using the top 30 EVs; reconstructions using EVs selected by SVM-PCA+GA; reconstructions using EVs selected by SVM-PCA+SBFS]

Conclusions

• We have considered the problem of gender classification from frontal facial images using genetic feature subset selection.

• GAs provide a simple, general, and powerful framework for feature subset selection.

• Feature subset selection is especially useful when the number of training examples is small.

• We have tested four well-known classifiers using PCA for feature extraction.

• Genetic feature subset selection led to lower error rates in all cases.

Future Work

• Generalize the feature encoding scheme.
  – Use weights instead of 0/1 encoding.

• Consider more powerful fitness functions.

• Use larger data sets (e.g., the FERET data set).

• Apply feature selection to different feature types (e.g., wavelet or Gabor features).

• Experiment with different applications (e.g., vehicle detection).