[IEEE Gesture Recognition (FG 2011) - Santa Barbara, CA, USA (2011.03.21-2011.03.25)] Face and...
Facial Feature Fusion and Model Selection for Age Estimation
Cuixian Chen, Wankou Yang, Yishi Wang, Karl Ricanek
University of North Carolina Wilmington
Khoa Luu
Concordia University
[email protected] {Chene, yangw, wangy, ricanekk}@uncw.edu
Abstract
Automatic face age estimation is challenging because facial aging is shaped by
genetic differences, behavioral and environmental factors, and aging dynamics
that vary between individuals. In this work we propose to fuse the global facial
features extracted from the Active Appearance Model (AAM) with the local facial
features extracted from the Local Binary Pattern (LBP) as the representation of
faces. Furthermore, we introduce an age estimation system that combines feature
fusion with model selection schemes such as Least Angle Regression (LAR) and
sequential approaches. Because different facial feature representations may come
with different measurement scales, we compare multiple normalization schemes for
both facial features. We demonstrate that feature fusion with model selection
achieves a significant improvement in age estimation over a single feature
representation alone. Our experiment on the multi-ethnicity UIUC-PAL database
suggests that age estimation with feature fusion and model selection outperforms
both the single-feature model and the full-feature model.
1. Introduction
Human faces contain important information, such as gender, race, mood, and age
[2]. Face age estimation has recently attracted great attention in both the
research community and industry because of its significant role in
human-computer interaction (HCI), surveillance monitoring, and biometrics.
However, many intrinsic and extrinsic factors make it very difficult to predict
the ages of human subjects accurately from their face images. The intrinsic
factors include genetics, ethnicity, gender, and health conditions. The
extrinsic factors include makeup, accessories, facial hair, and variation in
expression, pose, and illumination. Furthermore, a face image of size
$n_1 \times n_2$ is generally represented by a vector of dimensionality
$n_1 \times n_2$. Reducing this dimensionality significantly and effectively
from the original image space remains a challenging topic.
1.1. Prior work
Recently, Yan et al. [21] proposed patch-kernel regression (PKR) for human face
age estimation and head pose estimation. Guo et al. [9] studied both manifold
learning to extract face aging features and local adjustment for age estimation.
Ricanek et al. [17] proposed a robust regression approach for automatic face age
estimation that employs Least Angle Regression (LAR) [6] for feature subset
selection. Chen et al. [4] studied an age estimation system tuned by model
selection that outperforms all prior systems on the FG-NET face database. Most
of the aforementioned publications on age estimation share a similar idea: after
facial features are extracted from images, a dimension reduction method is
applied to map the original vectors into a lower-dimensional subspace. Then all
or part of the components of the transformed vectors are used to construct a
statistical model.
Cootes et al. [5] proposed the Active Appearance Model (AAM), a statistical
model of face shape and texture. It is a popular facial descriptor that uses
Principal Component Analysis (PCA) in a multi-factored way for dimension
reduction while maintaining the important structure (shape) and texture elements
of face images. As pointed out by Mark et al. [13], shape accounts for the major
changes during one's younger years, while wrinkles and other textural pattern
variations are more prominent during one's older years. Since AAM extracts both
shape and texture facial features, it is appropriate to use AAM for feature
acquisition in an age estimation system. However, the use of PCA in AAM can
muddle important features, because it attempts to retain the greatest variance
while creating orthogonal projection vectors.
Yan et al. [21] and Guo et al. [11] show that local features can be more robust
against small misalignment and variation in pose and lighting. The Local Binary
Pattern (LBP) [15] operator is a popular local-feature-based descriptor because
it is more robust against variation in pose or illumination than holistic
methods. Therefore, applying LBP to the shape-normalized patch takes advantage
of both the shape model and local features.
Each feature representation has its advantages and disadvantages. The facial
representations from AAM and from LBP are no exception: each has inherent
strengths as well as limitations and weaknesses. Fusing the two feature
representations with model selection is a potential way to obtain an effective
age estimation system. Hence, the fusion of global and local facial features is
investigated in this study.
1.2. Contribution of work
In this work we examine the fusion of the global facial features extracted from
AAM and the local facial features extracted from LBP to improve age estimation
performance. The proposed framework is shown in Figure 1. Our experiment results
suggest that feature fusion consistently achieves better accuracy than a single
feature representation.
This work demonstrates the need to perform feature selection on the fused
features. Dimension reduction methods provide transformed features whose
coordinates are arranged in a certain order. These transformed features may not
all be useful for building an efficient age estimation model. Even though one
may include all covariates in the model to achieve a low bias, doing so can
generate a large variance that degrades the accuracy of the estimation. However,
most age estimation algorithms overlook this step and use all possible features
[8, 20, 19, 9, 10, 18]. In this work, we propose to apply model selection,
namely LAR and sequential selection, to the fused features, which produces a
more effective and computationally efficient age estimation system. This work
also investigates different normalization methods on single and fused facial
feature representations to further improve performance. We evaluate our
approaches for age estimation on the multi-ethnicity UIUC-PAL image database.
The paper is organized as follows: Section 2 presents the normalization and
model selection techniques. The experiment results for the proposed approaches
are presented in Section 3, and conclusions are drawn in the final section.
2. Techniques of dimension reduction and
model selection
Features from images consist of locations and gray levels. The AAM normalizes
these original features. However, the normalized features are still highly
correlated, which makes it difficult to build an efficient model on them. A
model selection method is therefore necessary, since it can greatly reduce the
dependency among the covariates while still retaining the important normalized
features.
2.1. Feature Normalization
Let x and y be the original and the normalized feature vectors, respectively.
Min-Max (MM): This method maps the original feature vector to the range [0,1] or
[-1,1] as follows:
$$ y = \frac{(\max(y) - \min(y)) \, (x - \min(x))}{\max(x) - \min(x)} + \min(y), $$
where min(·) and max(·) are the operators that find the minimum and maximum of a
feature vector, respectively. If min(y) = 0 and max(y) = 1, then the mapping
range is [0,1]. If min(y) = -1 and max(y) = 1, then the mapping range is [-1,1].
Z-score (ZS): This method transforms the original feature vector to a vector
with mean 0 and standard deviation 1 as follows:
$$ y = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)}, $$
where mean(x) and std(x) are the mean and standard deviation of x, respectively.
Normalization (Norm): This method transforms the original feature vector to a
vector with mean 0 and unit length as follows:
$$ y = \frac{x - \mathrm{mean}(x)}{\lVert x - \mathrm{mean}(x) \rVert}, $$
where mean(·) and ||·|| are the mean and norm operators, respectively.
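The three mappings above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the input is assumed to be a non-constant list of values (for instance one covariate observed across images), and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which estimator is used.

```python
import math

def min_max(x, lo=0.0, hi=1.0):
    """Map x linearly onto [lo, hi]; lo and hi play the roles of min(y), max(y)."""
    x_min, x_max = min(x), max(x)
    return [(hi - lo) * (v - x_min) / (x_max - x_min) + lo for v in x]

def z_score(x):
    """Shift to mean 0 and scale to standard deviation 1 (sample std assumed)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))
    return [(v - mean) / std for v in x]

def unit_norm(x):
    """Shift to mean 0 and scale to unit Euclidean length."""
    mean = sum(x) / len(x)
    length = math.sqrt(sum((v - mean) ** 2 for v in x))
    return [(v - mean) / length for v in x]

if __name__ == "__main__":
    x = [2.0, 4.0, 6.0, 8.0]
    print(min_max(x))             # Min-Max onto [0, 1]
    print(min_max(x, -1.0, 1.0))  # Min-Max onto [-1, 1]
    print(z_score(x))
    print(unit_norm(x))
```

The two Min-Max variants differ only by an affine change of the target range, which is why a single function with `lo` and `hi` arguments covers both.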
2.2. Local Binary Pattern (LBP)
Ojala and Pietikäinen proposed LBP [15], which is widely used as a texture
descriptor. It encodes the differences between a center pixel and its
surrounding pixels in a circular sequence. It characterizes the local spatial
structure of an image as in (1):
$$ f_{N,R}(p_c) = \sum_{i=0}^{N-1} s(p_i - p_c)\, 2^i, \qquad
s(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases} \tag{1} $$
where $p_i$ is one of the N neighbor pixels around the center pixel $p_c$, on a
circle or square of radius R. An illustration of the basic LBP is shown in
Figure 2. LBP is well suited as a feature descriptor because of its tolerance to
illumination changes and its computational simplicity.
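As a small illustration of (1), the sketch below applies the basic 8-neighbor operator to the 3 x 3 example patch of Figure 2 (rows 5 9 1, 4 4 6, 7 2 3). The function names are hypothetical, and the neighbor ordering (clockwise, starting at the top-left corner) is an assumption chosen because it reproduces the bit string printed in the figure; any fixed circular ordering defines a valid LBP code.

```python
# Offsets (row, col) of the 8 neighbours, clockwise from the top-left corner.
# The starting point and direction are assumptions made for this sketch.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_bits(image, r, c):
    """Threshold each neighbour against the centre pixel: s(p_i - p_c)."""
    center = image[r][c]
    return [1 if image[r + dr][c + dc] >= center else 0 for dr, dc in NEIGHBOURS]

def lbp_code(image, r, c):
    """Weight the i-th thresholded neighbour by 2**i, as in equation (1)."""
    return sum(bit << i for i, bit in enumerate(lbp_bits(image, r, c)))

if __name__ == "__main__":
    patch = [[5, 9, 1],
             [4, 4, 6],
             [7, 2, 3]]
    print("".join(str(b) for b in lbp_bits(patch, 1, 1)))  # 11010011
    print(lbp_code(patch, 1, 1))                           # 203
```

Because only the sign of each difference is kept, adding a constant to every pixel of the patch leaves the code unchanged, which is the source of the tolerance to illumination changes mentioned above.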
We use the LBP histogram (uniform patterns with 59 bins) to describe the images.
First, we obtain the shape-normalized patch by using AAM. Second, we divide the
image into m x n sub-regions. Third, we calculate the LBP histogram of each
sub-region and concatenate the histograms to obtain a global description of the
image. Fourth, since the dimensionality m x n x 59 of the concatenated LBP
histogram is very large, we use PCA [7] to reduce it to 150 while preserving
about 95% of the energy. Figure 3 shows the framework of LBP feature extraction.
Figure 1. Framework for age estimation using facial feature fusion and model selection.
Figure 2. The basic LBP operator. Example: the 3 x 3 patch with rows (5 9 1), (4 4 6), (7 2 3) is thresholded at its center value and yields the binary code 11010011.
Figure 3. LBP feature extraction.
2.3. Least angle regression
Least Angle Regression (LAR) [6] generates a sequence of regression models with
one new variable added at each step. Let $y$ be the response, and denote by
$x_i$, $i = 1, \ldots, m$, the standardized predictors with mean 0 and standard
deviation 1. The strategy of LAR is as follows. At the initial step, set the
residual $r = y - \bar{y}$, where $\bar{y}$ is the mean of the observed
responses, and let the regression coefficients
$\beta_1 = \beta_2 = \cdots = \beta_m = 0$. Then find the predictor $x_j$ that
is most correlated with the response (the age of a subject). LAR moves the
regression coefficient $\beta_j$ continuously toward its least-squares
coefficient $\langle x_j, r \rangle$ until some other variable $x_k$ has the
same correlation with the current residual as $x_j$, at which point the process
is paused. After the second variable $x_k$ is added to the active set, the two
coefficients are moved together in the direction defined by the joint
least-squares coefficient of the current residual on $(x_j, x_k)$, until a third
predictor $x_l$ has as much correlation with the current residual. The procedure
repeats in this way until all $m$ predictors have entered the model. As a
consequence, we select a subset of ordered coordinates (features).
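The procedure can be sketched with NumPy as follows. This is a simplified illustration of the LAR steps from Efron et al. [6] that returns only the order in which predictors enter the active set; it is not the Matlab "lars" package used in the experiments. The function name `lar_order` and the synthetic data are hypothetical, and the columns are scaled to unit length rather than unit standard deviation, which gives every predictor the same norm and so does not change the entry order.

```python
import numpy as np

def lar_order(X, y, tol=1e-12):
    """Return predictor indices in the order in which LAR activates them."""
    n, m = X.shape
    Xs = X - X.mean(axis=0)                 # centre each predictor
    Xs = Xs / np.linalg.norm(Xs, axis=0)    # scale each predictor to unit length
    r = y - y.mean()                        # initial residual
    mu = np.zeros(n)                        # current fitted values
    active = [int(np.argmax(np.abs(Xs.T @ r)))]  # most correlated predictor
    while len(active) < m:
        c = Xs.T @ (r - mu)                 # correlations with current residual
        C = np.abs(c[active]).max()         # common absolute correlation
        XA = Xs[:, active] * np.sign(c[active])
        g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        AA = 1.0 / np.sqrt(g.sum())
        u = XA @ (AA * g)                   # equiangular direction
        a = Xs.T @ u
        best_gamma, best_j = np.inf, None
        for j in range(m):
            if j in active:
                continue
            # Smallest positive step at which predictor j ties with the active set.
            for num, den in ((C - c[j], AA - a[j]), (C + c[j], AA + a[j])):
                if den > tol and tol < num / den < best_gamma:
                    best_gamma, best_j = num / den, j
        mu = mu + best_gamma * u            # move along the equiangular direction
        active.append(best_j)
    return active

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = 3.0 * X[:, 0] + 1.5 * X[:, 2] + 0.1 * rng.standard_normal(200)
    print(lar_order(X, y))  # predictors 0 and 2 are expected to enter first
```

The returned list is the ordered sequence of coordinates; truncating it after the first k entries gives the feature subset passed on to the regressor.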
3. Experiment
In this section we shall systematically evaluate the effectiveness of applying global and local facial feature fusion
with model selection methods.
3.1. Face aging database
The UIUC Productivity Aging Laboratory (UIUC-PAL) face database [14] is selected for this experiment because of the quality of its images and its diversity of ancestry. Only the frontal images with neutral facial expression are selected for our age estimation algorithm. It contains 540 images with ages ranging from 18 to 93 years. (See Figure 4 for sample images.) It is worth mentioning that UIUC-PAL is a multi-ethnicity adult database, which contains African-American, Asian, Caucasian, Hispanic, and Indian subjects.
Figure 4. UIUC-PAL Sample images: African-American, Asian,
Caucasian, Hispanic, Indian.
3.2. Performance measure
The performance of age estimation is measured by the mean absolute error (MAE) and the cumulative score (CS).
The MAE is defined as the average of the absolute errors between the estimated ages and the observed ages, i.e.,
$\mathrm{MAE} = \sum_{i=1}^{N} |\hat{a}_i - a_i| / N$, where $\hat{a}_i$ is the estimated age
for the i-th test image, $a_i$ is the corresponding observed age, and N is the total number of test images.
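Both measures are easy to compute, as the following sketch shows. The MAE follows the definition above. The text does not spell out the cumulative score, so the sketch assumes the definition commonly used in the age estimation literature: the percentage of test images whose absolute error does not exceed a given error level. The function names and the toy ages are hypothetical.

```python
def mean_absolute_error(estimated, observed):
    """Average absolute difference between estimated and observed ages."""
    return sum(abs(e - a) for e, a in zip(estimated, observed)) / len(observed)

def cumulative_score(estimated, observed, level):
    """Percentage of images with absolute error <= level (assumed definition)."""
    hits = sum(1 for e, a in zip(estimated, observed) if abs(e - a) <= level)
    return 100.0 * hits / len(observed)

if __name__ == "__main__":
    estimated = [20, 35, 50, 70]
    observed = [22, 30, 50, 80]
    print(mean_absolute_error(estimated, observed))   # 4.25
    print(cumulative_score(estimated, observed, 5))   # 75.0
```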
MAEs (years) of different normalization methods on global/local features on the UIUC-PAL database

            AAM^1   AAM^{2,3}   AAM^4   AAM^5   LBP^1   LBP^{2,3}   LBP^4   LBP^5
MAE          6.47        6.96    6.84    6.96   16.07        7.70    7.74    7.83
Std.         0.69        0.79    0.87    0.78    1.91        0.61    0.60    0.68
#-Var         200          87      84      87       1          80      80      40
Total-Var     230         230     230     230     150         150     150     150

Table 1. MAEs of different normalization methods on a single feature representation on the UIUC-PAL database. Note: the superscript denotes the normalization type. Type 1 means no scaling; Type 2 means using Min-Max to map each covariate into the range [0, 1]; Type 3 means using Min-Max to map each covariate into the range [-1, 1]; Type 4 means standardizing each covariate into a vector with mean 0 and unit variance; Type 5 means normalizing each covariate into a vector with mean 0 and unit length.
MAEs (years) of different feature fusion and model selection algorithms on the UIUC-PAL database

            ab^2L   ab^3L   ab^4L   ab^5L   ab^2S   ab^3S   ab^4S   ab^5S   b^2aS   b^3aS   b^4aS   b^5aS
MAE          6.18    6.31    7.60    6.15    5.65    5.93    6.71    6.00    6.80    7.18    7.74    6.17
SE           0.72    0.63    0.74    0.95    0.68    0.67    0.67    0.90    0.80    0.80    0.60    0.97
#-Var         116     126      32     369     265     257     267     301     351     369      80     348
Total-Var     380     380     380     380     380     380     380     380     380     380     380     380

Table 2. MAEs (years) of different algorithms on the UIUC-PAL database. Note: a represents AAM with no scaling, and b^i represents LBP^i (see definitions and details in Table 1). Furthermore, ab^i L means to use the fusion of AAM and LBP with the LAR algorithm for model selection; ab^i S means to concatenate AAM and LBP features into a vector and then use sequential selection; b^i aS means to concatenate LBP and AAM features into a vector and then use sequential selection.
3.3. Experiment setups
In the UIUC-PAL database, each image is annotated with 161 landmarks as shown in [16]. The annotated faces with shape and texture information are presented to the AAM system to obtain the encoded appearance features, a set of transformed features of dimension 230. Here the AAM-Library tool [1] is used to implement the AAM system. Meanwhile, the shape-free patch is also extracted from the annotated faces via the Active Shape Model provided by the AAM-Library tool. Next, the LBP operator is applied to each shape-free image with a segmentation size of 5 x 5. A histogram with 59 bins is computed on each sub-block, and an LBP feature vector is obtained by concatenating the histograms of the sub-blocks. Here we use 58 uniform patterns for LBP, and each uniform pattern accounts for one bin. The remaining 198 binary patterns are all put in another bin, which makes a 59-bin histogram. In the end, PCA is applied to the concatenated LBP histogram to obtain an LBP feature vector of dimension 150.
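The bin counts quoted above can be checked with a short sketch. It assumes the usual definition of a uniform pattern from [15]: an 8-bit code with at most two 0/1 transitions when read circularly. The names `transitions`, `bin_of`, and `lbp_histogram` are hypothetical helpers for this illustration.

```python
def transitions(pattern, bits=8):
    """Count 0/1 transitions when the bit pattern is read circularly."""
    count = 0
    for i in range(bits):
        a = (pattern >> i) & 1
        b = (pattern >> ((i + 1) % bits)) & 1
        count += a != b
    return count

# Uniform patterns: at most two circular transitions (assumed definition).
uniform = [p for p in range(256) if transitions(p) <= 2]
bin_of = {p: i for i, p in enumerate(uniform)}  # one bin per uniform pattern
OTHER_BIN = len(uniform)                        # shared bin for all other codes

def lbp_histogram(codes):
    """Histogram of LBP codes: one bin per uniform pattern plus a shared bin."""
    hist = [0] * (len(uniform) + 1)
    for code in codes:
        hist[bin_of.get(code, OTHER_BIN)] += 1
    return hist

if __name__ == "__main__":
    print(len(uniform), 256 - len(uniform))  # 58 198
    print(len(lbp_histogram([0, 255, 5])))   # 59
    print(5 * 5 * 59)                        # 1475 values before PCA
```

With a 5 x 5 segmentation, concatenating the 59-bin histograms gives 1475 values per image, which is the vector that PCA reduces to 150 dimensions.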
Because different facial feature representations may come with different measurement scales, we need to find a proper way to normalize the global/local features in order to build an effective age estimation system, for either a single representation or the fusion of both representations. We consider four different mappings here: Min-Max-[0,1], Min-Max-[-1,1], Z-score standardization, and Normalization. We compare these four normalization schemes for either or both facial feature representations within the face age estimation framework.
If the two feature representations extracted from the face images are (somewhat) independent of each other, it is reasonable to simply concatenate the two vectors into a single new vector, provided both the global and local features are on the same type of measurement scale. However, because AAM features and LBP features are representations of the same face, the two feature vectors may be correlated to some degree. It therefore becomes important to adopt a proper model selection technique that can extract a reasonable number of salient features from the larger set of candidates and partially solve the correlation problem.
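Fusion by concatenation can be sketched as follows. The example scales each LBP covariate (column) to [0,1] across the images and then joins the two representations per image; swapping the arguments changes which features come first. The helper names and the tiny two-dimensional toy features are hypothetical; in the experiments the AAM and LBP vectors have 230 and 150 dimensions, giving 380 fused features.

```python
def min_max_columns(rows):
    """Scale every column (covariate) to [0, 1] across all rows (images)."""
    cols = list(zip(*rows))
    scaled = [[(v - min(c)) / (max(c) - min(c)) for v in c] for c in cols]
    return [list(r) for r in zip(*scaled)]

def fuse(first_rows, second_rows):
    """Concatenate two feature sets image by image; order is preserved."""
    return [list(a) + list(b) for a, b in zip(first_rows, second_rows)]

if __name__ == "__main__":
    aam = [[1.0, 2.0], [3.0, 5.0], [2.0, 4.0]]     # toy AAM features, 3 images
    lbp = [[10.0, 0.0], [20.0, 4.0], [30.0, 2.0]]  # toy LBP features, 3 images
    lbp01 = min_max_columns(lbp)
    ab = fuse(aam, lbp01)   # AAM features first
    ba = fuse(lbp01, aam)   # LBP features first
    print(ab[0])  # [1.0, 2.0, 0.0, 0.0]
    print(ba[1])  # [0.5, 1.0, 3.0, 5.0]
```

The order of concatenation matters only when a later step depends on feature position, as sequential selection does.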
LAR is selected as one of the two model selection techniques in this work for the following reasons: (1) Empirical studies have shown that LAR is an effective model selection technique for age estimation tasks [17, 4]. (2) Hastie et al. [12] pointed out that the LAR algorithm identifies the variable (predictor) that is most correlated with the evolving residuals at each step of selection. For example, LAR selects the predictor that is most correlated with the response (true age) in the first step. The direction chosen in this fashion keeps the correlations between the residuals and the selected features tied and monotonically decreasing. It may partially solve the correlation problem for the feature fusion.
Figure 5. MAE curves versus the number of parameters used in the regression models on the PAL database.
For all approaches, we use support vector regression (SVR) as the age estimation regressor. We perform a standard 10-fold cross validation to evaluate the prediction error of the proposed normalization, fusion, and model selection approaches. We use the contributed Matlab package "lars" from Karl Sjöstrand for the computation of LAR, which provides an ordered sequence of covariates entering SVR. We use the contributed Matlab package "Libsvm" [3] for the computation of SVR. We use the default parameters of Libsvm unless otherwise mentioned.
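The evaluation loop can be sketched as below for the sequential case, read here as keeping the first k features in their given order and choosing k by cross-validated MAE. Two things in the sketch are assumptions rather than the setup of the experiments: ordinary least squares is swapped in for SVR to keep the example self-contained, and folds are assigned deterministically by index. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def cv_mae_first_k(X, y, k, folds=10):
    """Cross-validated MAE of a least-squares fit on the first k columns of X."""
    fold_id = np.arange(len(y)) % folds     # deterministic fold assignment
    errors = []
    for f in range(folds):
        test = fold_id == f
        A_train = np.column_stack([np.ones((~test).sum()), X[~test, :k]])
        A_test = np.column_stack([np.ones(test.sum()), X[test, :k]])
        coef, *_ = np.linalg.lstsq(A_train, y[~test], rcond=None)
        errors.append(np.abs(A_test @ coef - y[test]))
    return float(np.concatenate(errors).mean())

def sequential_selection(X, y, folds=10):
    """Try k = 1..m leading features and return the k with the lowest MAE."""
    maes = [cv_mae_first_k(X, y, k, folds) for k in range(1, X.shape[1] + 1)]
    return int(np.argmin(maes)) + 1, maes

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((120, 8))
    # Only the first three features carry signal in this toy example.
    y = 40 + 10 * X[:, 0] + 5 * X[:, 1] + 3 * X[:, 2] + rng.standard_normal(120)
    best_k, maes = sequential_selection(X, y)
    print(best_k, [round(m, 2) for m in maes])
```

Plotting `maes` against k gives a curve of the same kind as those in Figure 5. For LAR-based selection, the columns of `X` would first be reordered by the LAR entry order.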
3.4. Experiment results
In this work we systematically evaluate the performance of a total of 22 different combinations of four feature normalization methods with two simple feature fusion methods.
First, we compare the four normalization methods and no scaling on either the AAM features or the LBP features alone for age estimation with sequential selection. The experiment results are shown in Table 1 and Figure 5-(1) and (2). For AAM features, no scaling achieves the best MAE compared with the normalization methods. For the LBP features, on the other hand, the Min-Max-[0,1] and Min-Max-[-1,1] methods obtain the best results. Note that for both AAM and LBP single features, Min-Max-[0,1] and Min-Max-[-1,1] share exactly the same results, with distinct hyper-parameters for SVR. In general, AAM features consistently achieve better MAEs than LBP features. This suggests that, among single facial feature representations, AAM is one of the best. Based on these results, hereafter we adopt only the original AAM features with no scaling for the feature fusion studies. The no-scaling method for LBP, however, produces poor results, so we consider only the remaining four normalization methods for LBP in the feature fusion studies.
Next, we compare three possible combinations of two feature fusion methods and two model selection methods. The experiment results are shown in Table 2 and Figure 5-(3) to (5). In the first approach, we concatenate the AAM features with the LBP features and use LAR as the model selection method, which is denoted as ab^i L. It turns out that AAM+LBP-[0,1]+LAR gives the best MAE=6.18 with 116 selected variables. In the second approach, we concatenate the AAM features with the LBP features and use the sequential model selection method, which is denoted as ab^i S. It turns out that AAM+LBP-[0,1]+Seq gives the best MAE=5.65 with the first 265 variables. In the third approach, we concatenate the LBP features with the AAM features and use the sequential model selection method, which is denoted as b^i aS. It turns out that LBP-Norm+AAM+Seq gives the best MAE=6.17 with the first 348 variables.
Finally, we compare the best MAEs among all these 22 combinations on age estimation in Table 2. The results are shown in Figure 5-(6). We can see that the fusion of global/local features consistently works better than the single feature representation. Under the feature fusion framework, ab^2 S achieves the best MAE=5.65 with a small SE=0.68, compared with ab^2 L and b^2 aS. We further study the confidence bands of the best MAE under the fusion schemes, which are shown in Figure 6. Even though the LAR model selection method performs slightly worse than the sequential selection method, it chooses far fewer variables in the final model.
Figure 6. Confidence interval for global/local feature fusion for the UIUC-PAL database.
4. Conclusion
In this work we evaluate the performance of a total of 22 different combinations of four feature normalization methods, two simple feature fusion methods, and two model selection methods. Our experiment results suggest that the fusion of global/local facial features achieves better results than a single facial feature. It is interesting to find that, for AAM features, the original features without any scaling work best for the age estimation task. For LBP features, Min-Max generally works better than the other normalization methods. Among the feature fusion and model selection methods, AAM + LBP^2 + Seq and AAM + LBP^2 + LAR are the top two.
Further research on this work includes: 1) using canonical correlation analysis to attack the dependence problem; 2) as Figure 5-(4) to (6) suggest, further improving the performance by selecting part of the AAM features and part of the LBP features, rather than a simple fusion by concatenation.
Acknowledgment
This work is supported by the Intelligence Advanced Research Projects Activity, Federal Bureau of Investigation, and the Biometrics Task Force. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of our sponsors.
References
[1] AAM-Library. http://groups.google.com/group/asmlibrary?pli=1.
[2] A. M. Albert, K. Ricanek, and E. Patterson. A review of the literature on the aging adult skull and face: Implications for forensic science research and applications. Forensic Science International, 172:1-9, 2007.
[3] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[4] C. Chen, Y. Chang, K. Ricanek, and Y. Wang. Face age estimation using model selection. In CVPRW, pages 93-99, 2010.
[5] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. Proc. ECCV, 2:484-498, 1998.
[6] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407-499, 2004.
[7] K. Fukunaga. Introduction to Statistical Pattern Recognition, second ed. Academic Press, Boston, MA, 1990.
[8] X. Geng, Z. Zhou, and K. Smith-Miles. Automatic age estimation based on facial aging patterns. IEEE Trans. PAMI, 29(12):2234-2240, 2007.
[9] G.-D. Guo, Y. Fu, C. Dyer, and T. S. Huang. Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Transactions on Image Processing, 17(7):1178-1188, 2008.
[10] G.-D. Guo, Y. Fu, C. Dyer, and T. S. Huang. A probabilistic fusion approach to human age prediction. SLAM'08, 2008.
[11] G.-D. Guo, G. Mu, Y. Fu, and T. S. Huang. Human age estimation using bio-inspired features. CVPR'09, 2009.
[12] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer-Verlag, New York, 2009.
[13] L. S. Mark, J. B. Pittenger, H. Hines, C. Carello, R. E. Shaw, and J. T. Todd. Wrinkling and head shape as coordinated sources of age level information. Perception and Psychophysics, 27(2):117-124, 1980.
[14] M. Minear and D. C. Park. A lifespan database of adult facial stimuli. Behavior Research Methods, Instruments, & Computers, 36:630-633, 2004.
[15] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971-987, 2002.
[16] E. Patterson, A. Sethuram, M. Albert, and K. Ricanek. Comparison of synthetic face aging to age progression by forensic sketch artist. IASTED International Conference on Visualization, Imaging, and Image Processing, Palma de Mallorca, Spain, 2007.
[17] K. Ricanek, Y. Wang, C. Chen, and S. J. Simmons. Generalized multi-ethnic face age-estimation. BTAS, 2009.
[18] S. Yan, H. Wang, Y. Fu, X. Tang, and T. S. Huang. Synchronized submanifold embedding for person-independent pose estimation and beyond. IEEE Transactions on Image Processing, 18(1):202-210, 2009.
[19] S. Yan, H. Wang, T. Huang, and X. Tang. Auto-structured regressor from uncertain labels. ICCV, 2007.
[20] S. Yan, H. Wang, T. Huang, Q. Yang, and X. Tang. Ranking with uncertain labels. ICME, pages 96-99, 2007.
[21] S. Yan, X. Zhou, M. Liu, M. Hasegawa-Johnson, and T. S. Huang. Regression from patch-kernel. IEEE International Conference on Pattern Recognition, 2008.