[IEEE Gesture Recognition (FG 2011) - Santa Barbara, CA, USA (2011.03.21-2011.03.25)] Face and...

Facial Feature Fusion and Model Selection for Age Estimation

Cuixian Chen, Wankou Yang, Yishi Wang, Karl Ricanek

University of North Carolina Wilmington

Khoa Luu

Concordia University

[email protected] {Chene, yangw, wangy, ricanekk}@uncw.edu

Abstract

Automatic face age estimation is challenging because facial aging is complex: it depends on genetic differences, behavior, and environmental factors, and its dynamics differ between individuals. In this work we propose to fuse the global facial features extracted from the Active Appearance Model (AAM) and the local facial features extracted from the Local Binary Pattern (LBP) as the representation of faces. Furthermore, we introduce an advanced age estimation system combining feature fusion and model selection schemes such as Least Angle Regression (LAR) and sequential approaches. Because different facial feature representations may come with different measurement scales, we compare multiple normalization schemes for both facial features. We demonstrate that feature fusion with model selection can achieve significant improvement in age estimation over a single feature representation alone. Our experiment on the multi-ethnicity UIUC-PAL database suggests that age estimation with feature fusion and model selection outperforms the single feature or the full feature model.

1. Introduction

Human faces contain important information, such as gender, race, mood, and age [2]. Face age estimation has recently attracted great attention in both the research community and industry, due to its significant role in human computer interaction (HCI), surveillance monitoring, and biometrics. However, many intrinsic and extrinsic factors make it very difficult to predict the ages of human subjects accurately from their face images. The intrinsic factors include genetics, ethnicity, gender, and health conditions. The extrinsic factors include makeup, accessories, facial hair, and variation in expression, pose, and illumination. Furthermore, a face image of size $n_1 \times n_2$ is generally represented by a vector of dimensionality $n_1 \times n_2$. Reducing the dimensionality of the original image space significantly and effectively remains a challenging topic.

1.1. Prior work

Recently, Yan et al. [21] proposed patch-kernel regression (PKR) for human face age estimation and head pose estimation. Guo et al. [9] studied both manifold learning to extract face aging features and local adjustment for age estimation. Ricanek et al. [17] proposed a robust regression approach for automatic face age estimation, employing Least Angle Regression (LAR) [6] for feature subset selection. Chen et al. [4] studied an age estimation system tuned by model selection that outperforms all prior systems on the FG-NET face database. Most of the aforementioned publications on age estimation share similar ideas: after facial features are extracted from images, a dimension reduction method is applied to map the original vectors into a lower dimensional subspace. Then all or part of the components of the transformed vectors are used to construct a statistical model.

Cootes et al. [5] proposed the Active Appearance Model (AAM), a statistical model of face shape and texture. It is a popular facial descriptor that uses Principal Component Analysis (PCA) in a multi-factored way for dimension reduction while maintaining important structure (shape) and texture elements of face images. As pointed out by Mark et al. [13], shape accounts for the major changes during one's younger years, while wrinkles and other textural pattern variations are more prominent during one's older years. Since AAM extracts both shape and texture facial features, it is appropriate to use AAM for feature acquisition in an age estimation system. However, the use of PCA in AAM can muddle important features, because PCA attempts to maintain the greatest variance while creating orthogonal projection vectors.

Yan et al. [21] and Guo et al. [11] show that local features can be more robust against small misalignment and variation in pose and lighting. The Local Binary Pattern (LBP) [15] operator is a popular local feature based descriptor because it is more robust against variation in pose or illumination than holistic methods. Therefore, applying LBP on the shape-normalized patch can take advantage of both the shape model and local features.

Each feature representation has its advantages and disadvantages. The facial representations from AAM and from LBP are no exception: each has its inherent strengths as well as its limitations and weaknesses. Fusing the two feature representations with model selection is a potential way to obtain an effective age estimation system. Hence, the fusion of global and local facial features is investigated in this study.

1.2. Contribution of work

In this work we examine the fusion of the global facial features extracted from AAM and the local facial features extracted from LBP to improve age estimation performance. The proposed framework is shown in Figure 1. Our experiment results suggest that feature fusion consistently achieves better accuracy than a single feature representation.

This work demonstrates the need to perform feature selection for the fused features. Dimension reduction methods provide transformed features, with coordinates arranged in a certain order. These transformed features may not all be useful for building an efficient age estimation model. Even though one may include all covariates in the model to achieve a low bias, doing so can generate a large variance that deteriorates the accuracy of the estimation. However, most age estimation algorithms overlook this step and use all possible features [8, 20, 19, 9, 10, 18]. In this work, we propose to apply model selection, namely LAR and sequential selection, to the fused features, which produces a more effective and computationally efficient age estimation system. This work also investigates different normalization methods on single and fused facial feature representations to further improve performance. We evaluate our approaches for age estimation on the multi-ethnicity UIUC-PAL image database.

The remainder of this paper is organized as follows: Section 2 presents the normalization and model selection methods. The experiment results for the proposed approaches are presented in Section 3, and conclusions are drawn in the final section of this paper.

2. Techniques of dimension reduction and model selection

Features from images consist of locations and gray levels. By using the AAM model, the original features are normalized. However, the normalized features are still highly correlated, and thus it is difficult to build an efficient model based on these features. A model selection method is necessary since it can greatly reduce the dependency among the covariates while still retaining the important normalized features.

2.1. Feature Normalization

Let x and y be the original and the normalized feature vectors, respectively.

Min-Max (MM): This method maps the original feature vector to the range [0, 1] or [-1, 1] as follows:

$$y = \frac{(\max(y) - \min(y))\,(x - \min(x))}{\max(x) - \min(x)} + \min(y),$$

where min(·) and max(·) are the operators of finding the minimum and maximum of a feature vector, respectively, and min(y) and max(y) are the chosen bounds of the target range. If min(y) = 0 and max(y) = 1, then the mapping range is [0, 1]. Otherwise, if min(y) = -1 and max(y) = 1, then the mapping range is [-1, 1].

Z-score (ZS): This method transforms the original feature vector to a vector with mean 0 and standard deviation 1 as follows:

$$y = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)},$$

where mean(x) and std(x) are the mean and standard deviation of x, respectively.

Normalization (Norm): This method transforms the original feature vector to a vector with mean 0 and unit length as follows:

$$y = \frac{x - \mathrm{mean}(x)}{\lVert x - \mathrm{mean}(x) \rVert},$$

where mean(·) and ‖·‖ are the operators of mean and norm, respectively.
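The three normalization schemes of this subsection can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function names are hypothetical, x is taken to be the values of one covariate across all images, and the Z-score uses the sample standard deviation (divisor n - 1), which is an assumption since the text does not say which estimator is meant.

```python
import math
import statistics


def min_max(x, lo=0.0, hi=1.0):
    """Map x linearly so that min(x) -> lo and max(x) -> hi."""
    x_min, x_max = min(x), max(x)
    span = x_max - x_min
    if span == 0:
        raise ValueError("constant feature: min-max scaling is undefined")
    return [(hi - lo) * (v - x_min) / span + lo for v in x]


def z_score(x):
    """Shift to mean 0 and scale to standard deviation 1.

    Assumption: sample standard deviation (divisor n - 1).
    """
    mu = statistics.mean(x)
    sd = statistics.stdev(x)
    if sd == 0:
        raise ValueError("constant feature: z-score is undefined")
    return [(v - mu) / sd for v in x]


def unit_norm(x):
    """Shift to mean 0 and scale to unit Euclidean length."""
    mu = statistics.mean(x)
    centered = [v - mu for v in x]
    length = math.sqrt(sum(c * c for c in centered))
    if length == 0:
        raise ValueError("constant feature: unit-length scaling is undefined")
    return [c / length for c in centered]


# Toy covariate, for illustration only.
feature = [2.0, 4.0, 6.0, 8.0]
print(min_max(feature))             # mapped into [0, 1]
print(min_max(feature, -1.0, 1.0))  # mapped into [-1, 1]
print(z_score(feature))             # mean 0, standard deviation 1
print(unit_norm(feature))           # mean 0, unit length
```

The two Min-Max variants differ only in the target bounds, so a single function with `lo` and `hi` arguments covers both.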

2.2. Local Binary Pattern (LBP)

Ojala and Pietikäinen proposed LBP [15], which is widely used as a texture descriptor. It encodes the differences between a center pixel and its surrounding pixels in a circular sequence. It characterizes the local spatial structure of an image as in (1):

$$f_{N,R}(p_c) = \sum_{i=0}^{N-1} s(p_i - p_c)\, 2^i, \qquad s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases} \tag{1}$$

where $p_i$ is one of the N neighbor pixels around the center pixel $p_c$, on a circle or square of radius R. An illustration of the basic LBP is shown in Figure 2. LBP is well suited as a feature descriptor due to its tolerance to illumination changes and its computational simplicity.
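Equation (1) can be checked on the 3 x 3 example of Figure 2 with a short Python sketch. The function names are hypothetical, and the neighbor ordering (clockwise, starting at the top-left pixel) is an assumption chosen so that the bit string matches the one shown in the figure; a different starting point or direction gives a different integer code for the same patch.

```python
def lbp_bits(neighbors, center):
    """Threshold each neighbor against the center: s(p_i - p_c)."""
    return [1 if p - center >= 0 else 0 for p in neighbors]


def lbp_code(neighbors, center):
    """Equation (1): sum of s(p_i - p_c) * 2**i over the N neighbors."""
    return sum(bit << i for i, bit in enumerate(lbp_bits(neighbors, center)))


# 3 x 3 patch from Figure 2; the center pixel is 4.
patch = [[5, 9, 1],
         [4, 4, 6],
         [7, 2, 3]]
center = patch[1][1]

# Assumed ordering: clockwise, starting at the top-left neighbor.
clockwise = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
neighbors = [patch[r][c] for r, c in clockwise]

bits = lbp_bits(neighbors, center)
print("".join(map(str, bits)))      # 11010011, the string shown in Figure 2
print(lbp_code(neighbors, center))  # 203 under this ordering (bit i has weight 2**i)
```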

We use the LBP histogram (uniform patterns with 59 bins) to describe the images. First, we obtain the shape-normalized patch by using AAM. Second, we divide the image into m * n sub-regions. Third, we calculate the LBP histogram of each sub-region and concatenate the LBP histograms to get a global description of the image. Fourth, since the dimensionality of the concatenated LBP histogram is m * n * 59, which is very large, we use PCA [7] to reduce the dimensionality of the concatenated LBP histogram to 150, preserving about 95% of the energy. Figure 3 shows the framework of LBP feature extraction.

Figure 1. Framework for age estimation using facial feature fusion and model selection.

Figure 2. The basic LBP operator. (Example shown: the 3 x 3 neighborhood with rows 5 9 1, 4 4 6, and 7 2 3 is thresholded at the center value 4, giving the binary pattern 11010011.)

Figure 3. LBP feature extraction.
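The 95% energy criterion used in the PCA step can be illustrated as follows. This is only a sketch under the assumption that "energy" means the fraction of the total eigenvalue sum (total variance) captured by the leading components; the function name and the eigenvalue spectrum below are hypothetical and are not taken from the experiments.

```python
def components_for_energy(eigenvalues, energy=0.95):
    """Smallest k whose top-k eigenvalues hold at least `energy` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running >= energy * total:
            return k
    return len(ordered)


# Hypothetical eigenvalue spectrum, for illustration only.
spectrum = [50, 30, 10, 6, 3, 1]
print(components_for_energy(spectrum))  # 4, since 50 + 30 + 10 + 6 = 96 >= 95
```

In the setting described above, the same rule applied to the eigenvalues of the concatenated LBP histograms would yield the number of retained components (reported as 150).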

2.3. Least angle regression

Least Angle Regression (LAR) [6] generates a sequence of regression models with one new variable added in each step. Let $y$ be the response, and denote $x_i$, $i = 1, \ldots, m$, as the standardized predictors with mean 0 and standard deviation 1. The strategy of LAR is as follows. At the initial step, set the residual $r = y - \bar{y}$, where $\bar{y}$ is the mean of the responses, and let the regression coefficients $\beta_1 = \beta_2 = \cdots = \beta_m = 0$. Then we find the predictor $x_j$ that is most correlated with the response (the age of a subject). LAR moves the regression coefficient $\beta_j$ continuously toward its least squares coefficient $\langle x_j, r \rangle$ until some other variable $x_k$ has the same correlation with the current residual as $x_j$, at which point the process is paused. After the second variable $x_k$ is added to the active set, their coefficients are moved together in the direction defined by their joint least squares coefficient of the current residual on $(x_j, x_k)$, until a third predictor $x_l$ has as much correlation with the current residual. This is repeated until all m predictors are selected into the model. As a consequence, we obtain an ordered sequence of coordinates (features) from which a subset is selected.
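For readers who prefer formulas, one LAR step can be written compactly in the notation of Efron et al. [6]. The symbols below ($\hat{\mu}$ for the current fit, $\mathcal{A}$ for the active set, $s_j$, $G_{\mathcal{A}}$, $A_{\mathcal{A}}$, $u_{\mathcal{A}}$, $\hat{\gamma}$) are borrowed from that reference and are not used elsewhere in this paper; $X$ is the matrix whose columns are the predictors $x_j$. Note also that [6] scales the predictors to unit length, so this is a summary under that convention rather than a restatement of the exact setup used here.

```latex
\begin{aligned}
\hat{c} &= X^{\top}(y - \hat{\mu}), \qquad
\hat{C} = \max_j |\hat{c}_j|, \qquad
\mathcal{A} = \{\, j : |\hat{c}_j| = \hat{C} \,\}, \\
X_{\mathcal{A}} &= (\cdots\, s_j x_j \,\cdots)_{j \in \mathcal{A}}, \qquad
s_j = \operatorname{sign}(\hat{c}_j), \\
G_{\mathcal{A}} &= X_{\mathcal{A}}^{\top} X_{\mathcal{A}}, \qquad
A_{\mathcal{A}} = \left( \mathbf{1}^{\top} G_{\mathcal{A}}^{-1} \mathbf{1} \right)^{-1/2}, \\
u_{\mathcal{A}} &= X_{\mathcal{A}}\, A_{\mathcal{A}}\, G_{\mathcal{A}}^{-1} \mathbf{1}, \qquad
a = X^{\top} u_{\mathcal{A}}, \\
\hat{\gamma} &= \min_{j \notin \mathcal{A}}{}^{\!+}
\left\{ \frac{\hat{C} - \hat{c}_j}{A_{\mathcal{A}} - a_j},\;
        \frac{\hat{C} + \hat{c}_j}{A_{\mathcal{A}} + a_j} \right\}, \\
\hat{\mu}_{+} &= \hat{\mu} + \hat{\gamma}\, u_{\mathcal{A}} .
\end{aligned}
```

Here $u_{\mathcal{A}}$ is the unit vector making equal angles with all active predictors, and $\min^{+}$ takes the minimum over positive values only. The step length $\hat{\gamma}$ is exactly the point at which a new predictor ties with the active set in absolute correlation, which is the "pause" described in the paragraph above.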

3. Experiment

In this section we shall systematically evaluate the effectiveness of applying global and local facial feature fusion

with model selection methods.

3.1. Face aging database

The UIUC Productivity Aging Laboratory (UIUC-PAL) face database [14] is selected for this experiment due to the quality of its images and the diversity of ancestry. Only the frontal images with neutral facial expression are selected for our age estimation algorithm. The selected set contains 540 images with ages ranging from 18 to 93 years old. (See Figure 4 for sample images.) It is worth mentioning that UIUC-PAL is a multi-ethnicity adult database, which contains African-American, Asian, Caucasian, Hispanic, and Indian subjects.

Figure 4. UIUC-PAL Sample images: African-American, Asian,

Caucasian, Hispanic, Indian.

3.2. Performance measure

The performance of age estimation is measured by the mean absolute error (MAE) and the cumulative score (CS). The MAE is defined as the average of the absolute errors between the estimated ages and the observed ages, i.e., $\mathrm{MAE} = \sum_{i=1}^{N} |\hat{a}_i - a_i| / N$, where $\hat{a}_i$ is the estimated age for the i-th test image, $a_i$ is the corresponding observed age, and N is the total number of test images.
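Both measures are simple to compute; a minimal Python sketch follows. The MAE follows the definition given above. The text does not define CS, so the function below assumes the definition commonly used in the age estimation literature, namely the percentage of test images whose absolute error does not exceed a tolerance of j years. The function names and the sample ages are hypothetical.

```python
def mean_absolute_error(estimated, observed):
    """Average of |estimated age - observed age| over the test images."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - a) for e, a in zip(estimated, observed)) / len(observed)


def cumulative_score(estimated, observed, tolerance):
    """Percentage of test images with absolute error <= tolerance (years).

    Assumption: this is the usual CS definition; the paper does not state it.
    """
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for e, a in zip(estimated, observed) if abs(e - a) <= tolerance)
    return 100.0 * hits / len(observed)


# Hypothetical predictions, for illustration only.
estimated_ages = [23.0, 41.0, 67.0, 30.0]
observed_ages = [25, 40, 60, 30]
print(mean_absolute_error(estimated_ages, observed_ages))   # 2.5
print(cumulative_score(estimated_ages, observed_ages, 2))   # 75.0
```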

MAEs (year) of different normalization methods on global/local features on UIUC-PAL database

             AAM^1   AAM^{2,3}   AAM^4   AAM^5   LBP^1   LBP^{2,3}   LBP^4   LBP^5
MAE           6.47     6.96       6.84    6.96   16.07     7.70       7.74    7.83
Std.          0.69     0.79       0.87    0.78    1.91     0.61       0.60    0.68
#-Var          200       87         84      87       1       80         80      40
Total-Var      230      230        230     230     150      150        150     150

Table 1. MAEs of different normalization methods on single feature representation on the UIUC-PAL database. Note: Type 1 means no scaling; Type 2 means to use Min-Max to map each covariate into the range [0, 1]; Type 3 means to use Min-Max to map each covariate into the range [-1, 1]; Type 4 means to standardize each covariate into a vector with mean 0 and unit variance; Type 5 means to normalize each covariate into a vector with mean 0 and unit length.

MAEs (year) of different feature fusion and model selection algorithms on UIUC-PAL database

             ab^2L  ab^3L  ab^4L  ab^5L  ab^2S  ab^3S  ab^4S  ab^5S  b^2aS  b^3aS  b^4aS  b^5aS
MAE           6.18   6.31   7.60   6.15   5.65   5.93   6.71   6.00   6.80   7.18   7.74   6.17
SE            0.72   0.63   0.74   0.95   0.68   0.67   0.67   0.90   0.80   0.80   0.60   0.97
#-Var          116    126     32    369    265    257    267    301    351    369     80    348
Total-Var      380    380    380    380    380    380    380    380    380    380    380    380

Table 2. MAEs (year) of different algorithms on the UIUC-PAL database. Note: a represents AAM with no scaling, and b^i represents LBP^i (see definitions and details in Table 1). Furthermore, ab^i L means to use the fusion of AAM and LBP with the LAR algorithm for model selection; ab^i S means to concatenate AAM and LBP features into a vector and then use sequential selection; b^i aS means to concatenate LBP and AAM features into a vector and then use sequential selection.

3.3. Experiment setups

In the UIUC-PAL database, each image is annotated with 161 landmarks as shown in [16]. The annotated faces with shape and texture information are presented to the AAM system to obtain the encoded appearance features, a set of transformed features of dimension 230. Here the AAM-Library tool [1] is used to implement the AAM system. Meanwhile, the shape-free patch is also extracted from the annotated faces via the Active Shape Model provided by the AAM-Library tool. Next, the LBP operator is applied to each shape-free image with a segmentation size of 5 x 5. A histogram with 59 bins is computed on each sub-block, and an LBP feature vector is obtained by concatenating the histograms of the sub-blocks. Here we use 58 uniform patterns for LBP, and each uniform pattern accounts for one bin. The remaining 198 binary patterns are all put in another bin, which makes a 59-bin histogram. Finally, PCA is applied to the concatenated LBP histogram to get an LBP feature vector of dimension 150.
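The counts of 58 uniform patterns, 198 remaining patterns, and 59 bins can be reproduced with a short Python sketch. It assumes the standard definition of a uniform pattern for 8 neighbors (at most two 0/1 transitions when the bits are read circularly); the names `transitions`, `bin_of`, and `lbp_histogram` are hypothetical. The last line further assumes that "segmentation size 5 x 5" means a 5 x 5 grid of sub-blocks.

```python
def transitions(pattern, bits=8):
    """Count 0/1 changes when the bit pattern is read circularly."""
    count = 0
    for i in range(bits):
        a = (pattern >> i) & 1
        b = (pattern >> ((i + 1) % bits)) & 1
        if a != b:
            count += 1
    return count


# A pattern is "uniform" when it has at most two circular transitions.
uniform = [p for p in range(256) if transitions(p) <= 2]
bin_of = {p: i for i, p in enumerate(uniform)}  # one bin per uniform pattern
other_bin = len(uniform)                        # shared bin for all other patterns


def lbp_histogram(codes):
    """59-bin histogram of 8-bit LBP codes for one sub-block."""
    hist = [0] * (len(uniform) + 1)
    for code in codes:
        hist[bin_of.get(code, other_bin)] += 1
    return hist


print(len(uniform), 256 - len(uniform), len(lbp_histogram([])))  # 58 198 59
# Length of the concatenated histogram, assuming a 5 x 5 grid of sub-blocks.
print(5 * 5 * 59)  # 1475
```

Under that grid assumption, PCA reduces a 1475-dimensional concatenated histogram to the 150 dimensions reported above.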

Because different facial feature representations may come with different measurement scales, we need to find a proper way to normalize the global/local features in order to build an effective age estimation system, for either a single representation or the fusion of both representations. We consider four different mappings here: Min-Max-[0,1], Min-Max-[-1,1], Z-score standardization, and Normalization. We compare these four normalization schemes for either/both facial feature representations within the face age estimation framework.

If two feature representations extracted from the face images are (somewhat) independent of each other, it is reasonable to simply concatenate the two vectors into a single new vector, provided both the global and local features are on the same measurement scale. However, because AAM features and LBP features are representations of the same face, the two feature vectors may be correlated to some degree. It therefore becomes important to adopt a proper model selection technique that can extract a reasonable number of salient features from the larger set of candidates and partially solve the correlation problem.

LAR is selected as one of the two model selection techniques in this work for the following reasons: (1) Empirical studies in [17, 4] have shown that LAR is an effective model selection technique for age estimation tasks. (2) Hastie et al. [12] pointed out that the LAR algorithm identifies the variable (predictor) that is most correlated with the evolving residuals at each step of selection. For example, LAR selects the predictor that is most correlated with the response (true age) in the first step. The direction chosen in this fashion keeps the correlations between the residuals and the selected features tied and monotonically decreasing. It may partially solve the correlation problem for the feature fusion.

Figure 5. MAE curves versus number of parameters used in the regression models on the PAL database.

For all approaches, we use support vector regression (SVR) as the age estimation regressor. We perform a standard 10-fold cross validation to evaluate the prediction error of the proposed normalization, fusion, and model selection approaches. We use the contributed package "lars" in Matlab from Karl Sjöstrand for the computation of LAR, which provides an ordered sequence of covariates entering SVR. We use the contributed package "Libsvm" [3] in Matlab for the computation of SVR, with the default parameters from Libsvm unless otherwise mentioned.
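The 10-fold cross-validation protocol can be sketched as an index generator in Python. This is an illustration only: the paper does not say how the folds were formed, so the random shuffling, the fixed seed, and the function name `k_fold_indices` are all assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # assumed: folds drawn at random
    folds = [indices[i::k] for i in range(k)]  # every k-th index goes to fold i
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# 540 images, as in the UIUC-PAL subset used in this work.
splits = list(k_fold_indices(540, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 486 54
```

Each image appears in exactly one test fold, so averaging the per-fold MAEs gives an estimate of prediction error on images not seen during training.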

3.4. Experiment results

In this work we systematically evaluate the performance of a total of 22 different combinations of four feature normalization methods with two simple feature fusion methods.

First, we compare the four normalization methods and no scaling on either the AAM features or the LBP features alone for age estimation with sequential selection. The experiment results are shown in Table 1 and Figure 5-(1) and (2). For AAM features, no scaling achieves the best MAE compared with the other normalization methods. On the other hand, for the LBP features, the Min-Max-[0,1] and Min-Max-[-1,1] methods obtain the best results. Note that for both AAM and LBP single features, Min-Max-[0,1] and Min-Max-[-1,1] share exactly the same results, with distinct hyper-parameters for SVR. In general, AAM features consistently achieve better MAEs than LBP features. This suggests that, as a single facial feature representation, AAM is one of the best. Based on these results, hereafter we adopt only the original AAM features with no scaling for the feature fusion studies. However, the no-scaling method for LBP produces poor results, so we consider only the remaining four normalization methods for LBP in the feature fusion studies.

Next, we compare three possible combinations of two feature fusion methods and two model selection methods. The experiment results are shown in Table 2 and Figure 5:(3)-(5). In the first approach, we concatenate the AAM features with the LBP features and use LAR as the model selection method, which is denoted as ab^i L. It turns out that AAM+LBP-[0,1]+LAR gives the best MAE = 6.18 with 116 selected variables. In the second approach, we concatenate the AAM features with the LBP features and use the sequential model selection method, which is denoted as ab^i S. It turns out that AAM+LBP-[0,1]+Seq gives the best MAE = 5.65 with the first 265 variables. In the third approach, we concatenate the LBP features with the AAM features and use the sequential model selection method, which is denoted as b^i aS. It turns out that LBP-Norm+AAM+Seq gives the best MAE = 6.17 with the first 348 variables.
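The fusion-by-concatenation and sequential selection steps can be sketched as follows. The paper does not spell out the sequential procedure, so this sketch assumes it means keeping the first k coordinates of the concatenated vector, in their given order, and choosing the k with the lowest cross-validated MAE; this would also explain why the concatenation order (AAM first versus LBP first) matters. The names `fuse`, `best_prefix_length`, and `evaluate_mae` are hypothetical, and the toy error curve stands in for a real cross-validated SVR.

```python
def fuse(first, second):
    """Concatenate two per-image feature vectors in the given order."""
    return list(first) + list(second)


def best_prefix_length(n_features, evaluate_mae):
    """Try the first k features for k = 1..n_features and keep the best k.

    `evaluate_mae(k)` is a hypothetical callback returning the
    cross-validated MAE of a regressor trained on the first k features.
    """
    scores = {k: evaluate_mae(k) for k in range(1, n_features + 1)}
    best_k = min(scores, key=scores.get)
    return best_k, scores[best_k]


# Toy stand-in for a cross-validated error curve (not real results).
def toy_mae(k):
    return (k - 3) ** 2 / 10 + 5.0


fused = fuse([0.1, 0.2], [0.3, 0.4, 0.5])  # 2 "AAM" + 3 "LBP" toy values
print(len(fused), best_prefix_length(len(fused), toy_mae))  # 5 (3, 5.0)
```

With real data, `evaluate_mae` would train and score the regressor inside each cross-validation fold; the full sweep over k traces out a curve of MAE against number of variables of the kind plotted in Figure 5.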

Finally, we compare the best MAEs among all these 22 combinations on age estimation in Table 2. The results are shown in Figure 5:(6). We can see that the fusion of global/local features consistently works better than the single feature representation. Under the feature fusion framework, ab^2 S achieves the best MAE = 5.65 and a small SE = 0.68, compared to ab^2 L and b^2 aS. We further study the confidence bands of the best MAE under the fusion schemes, which are shown in Figure 6. Even though the LAR model selection method performs slightly worse than the sequential selection method, it chooses far fewer variables in the final model.

Figure 6. Confidence interval for global/local feature fusion for the UIUC-PAL database.

4. Conclusion

In this work we evaluate the performance of a total of 22 different combinations of four feature normalization methods, two simple feature fusion methods, and two model selection methods. Our experiment results suggest that the fusion of global/local facial features achieves better results than a single facial feature. Interestingly, for the AAM features, the original features without any scaling work best for the age estimation task. For LBP features, Min-Max generally works better than the other normalization methods. Among the feature fusion and model selection methods, the combinations AAM + LBP^2 + Seq and AAM + LBP^2 + LAR are the top two methods.

Further research on this work includes: 1) using canonical correlation analysis to attack the dependence problem; 2) as Figure 5:(4)-(6) suggests, further improving the performance by selecting part of the AAM features and part of the LBP features, rather than simply fusing by concatenation.

Acknowledgment

This work is supported by the Intelligence Advanced Research Projects Activity, Federal Bureau of Investigation, and the Biometrics Task Force. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of our sponsors.

References

[1] Aam-library. http://groups.google.com/group/asmlibrary?pli=l.

[2] A. M. Albert, K. Ricanek, and E. Patterson. A review of the literature on the aging adult skull and face: Implications for forensic science research and applications. Forensic Science International, 172:1-9, 2007.

[3] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[4] C. Chen, Y. Chang, K. Ricanek, and Y. Wang. Face age estimation using model selection. In CVPRW, pages 93-99, 2010.

[5] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. Proc. ECCV, 2:484-498, 1998.

[6] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407-499, 2004.

[7] K. Fukunaga. Introduction to statistical pattern recognition, second ed. Academic Press, Boston, MA, 1990.

[8] X. Geng, Z. Zhou, and K. S. Miles. Automatic age estimation based on facial aging patterns. IEEE Trans. PAMI, 29(12):2234-2240, 2007.

[9] G.-D. Guo, Y. Fu, C. Dyer, and T. S. Huang. Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Transactions on Image Processing, 17(7):1178-1188, 2008.

[10] G.-D. Guo, Y. Fu, C. Dyer, and T. S. Huang. A probabilistic fusion approach to human age prediction. SLAM'08, 2008.

[11] G.-D. Guo, G. Mu, Y. Fu, and T. S. Huang. Human age estimation using bio-inspired features. CVPR '09, 2009.

[12] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer-Verlag, New York, 2009.

[13] L. S. Mark, J. B. Pittenger, H. Hines, C. Carello, R. E. Shaw, and J. T. Todd. Wrinkling and head shape as coordinated sources of age level information. Journal Perception and Psychophysics, 27(2):117-124, 1980.

[14] M. Minear and D. C. Park. A lifespan database of adult facial stimuli. Behavior Research Methods, Instruments, & Computers, 36:630-633, 2004.

[15] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(7):971-987, Jul. 2002.

[16] E. Patterson, A. Sethuram, M. Albert, and K. Ricanek. Comparison of synthetic face aging to age progression by forensic sketch artist. IASTED International Conference on Visualization, Imaging, and Image Processing, Palma de Mallorca, Spain, 2007.

[17] K. Ricanek, Y. Wang, C. Chen, and S. J. Simmons. Generalized multi-ethnic face age-estimation, 2009. BTAS.

[18] S. Yan, H. Wang, Y. Fun, X. Tang, and T. S. Huang. Synchronized submanifold embedding for person-independent pose estimation and beyond. IEEE Transactions on Image Processing, 18(1):202-210, 2009.

[19] S. Yan, H. Wang, T. Huang, and X. Tang. Auto-structured regressor from uncertain labels. ICCV, 2007.

[20] S. Yan, H. Wang, T. Huang, Q. Yang, and X. Tang. Ranking with uncertain labels. ICME, pages 96-99, 2007.

[21] S. Yan, X. Zhou, M. Liu, M. Hasegawa-Johnson, and T. S. Huang. Regression from patch-kernel. IEEE International Conference on Pattern Recognition, 2008.
