
2500 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 14, NO. 9, SEPTEMBER 2019

Chronological Age Estimation Under the Guidance of Age-Related Facial Attributes

Jiu-Cheng Xie and Chi-Man Pun, Senior Member, IEEE

Abstract— Although research on facial attribute analysis has been under way for decades, estimating the chronological age attribute remains a major challenge. Previous researchers have found that some facial attributes (e.g., gender and race) are closely connected with age, and that estimating age under a specific condition, determined by a combination of these age-related attributes, is more reasonable. In this paper, we propose a generic framework based on a convolutional neural network that considers different conditions for age estimation and jointly outputs age and age-related facial attributes. Compared with conventional methods, it is more efficient and more universal. In addition, we view age estimation as a special multi-class ordinal classification problem and use a combination of losses to optimize the predicted probability distribution over individual age classes, which further improves age estimation performance. Finally, the proposed method achieves state-of-the-art results on both controlled and wild face datasets.

Index Terms— Chronological age estimation, facial attributes estimation, convolutional neural network, controlled and wild environments.

I. INTRODUCTION

HUMAN age can be inferred from several cues, for example, the degree of tooth wear [1], [2], human speech [3], [4], and skeletal development [5]. Predicting age from the human face is also a common approach because it is convenient and its estimation accuracy is acceptable. Facial age estimation can be used in many application scenarios, such as video surveillance, precision advertising, and human-machine interaction. As an interdisciplinary field between computer vision and biometric analysis, it has attracted much research attention.

The earliest literature on age estimation from facial images dates back to 1994, when Kwon and Lobo [6] tried to assign an unknown face to one of three age groups: baby, young adult, or senior. Later, more and more research effort was devoted to facial age estimation, and researchers were no longer content with predicting age groups but sought to estimate actual age values. In general, the age attribute falls into two categories: the apparent age, which is how old the person looks, and the chronological age, which is the cumulative number of years a person has lived since birth. In this paper, we focus only on the chronological age.

Manuscript received August 27, 2018; revised December 28, 2018 and February 17, 2019; accepted February 18, 2019. Date of publication March 7, 2019; date of current version June 5, 2019. This work was supported in part by the Research Committee of the University of Macau under Grant MYRG2018-00035-FST and in part by the Science and Technology Development Fund of Macau under Grant 041-2017-A1. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Domingo Mery. (Corresponding author: Chi-Man Pun.)

The authors are with the Department of Computer and Information Science, University of Macau, Macau 999078, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIFS.2019.2902823

Early research found that the facial aging process differs across gender and race. Moreover, extensive experiments have shown that predicting age while ignoring these factors noticeably increases the estimation error [7]. To alleviate the influence of gender and race, researchers have proposed various solutions. For example, Guo and Mu [7] first classified gender and race and then predicted age within the classified group. Later, some efforts were devoted to predicting age and gender/race attributes simultaneously [8]–[10]. The common idea of these methods is to let the model discover the relationships among different attributes by itself and then leverage them to predict the attributes jointly. However, this discovery and exploitation are not controllable, so the learned relationships are ambiguous and difficult to interpret. Predicting age under a specific gender and race condition is clearly more interpretable. In addition, many other intrinsic or extrinsic factors, such as living environment and habits, are correlated with the facial aging process, and they should also be taken into consideration, where possible, when designing an age estimation method. However, conventional solutions that divide age estimation into several steps are relatively cumbersome.

We propose a generic framework for estimating age and age-related facial attributes. It is inspired by Yoo et al. [11] and Han et al. [12] but differs from them in crucial ways. Reference [12] tries to estimate various heterogeneous facial attributes, whereas our main concern is the age attribute and the other attributes are auxiliary. We consider only heterogeneous facial attributes that are mutually independent and closely related to the facial aging process, and we regard each combination of their values as one specific condition for age estimation. We predict the conditional age of the face under every possible condition and then compute the expectation of these conditional ages as the final age estimate. Based on this idea, we adopt a conditional multi-task learning strategy similar to [11] for the generic framework.

Age estimation has usually been cast as either a classification problem [13], [14] or a regression problem [9], [15], [16]. Classification approaches treat different age values as independent classes and pick the most probable age class for the target face. Regression methods, on the other hand, learn nonlinear mapping functions between face images and continuous age values. However, neither kind of approach considers the correlations among the distinct age values; for example, there is a relative ordering between different ages. Recently, some researchers have incorporated age correlations into their estimation methods, which has improved performance [17]–[20]. An intuitive observation about facial aging is that facial images of the same person at close ages usually look more similar than those at distant ages. We leverage this correlation and cast each conditional age estimation sub-task in the proposed framework as an ordinal classification problem. The predicted probability distribution over age classes should then not be chaotic but relatively ordered.

1556-6013 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

https://orcid.org/0000-0003-2336-8521

https://orcid.org/0000-0003-1788-3746

In general, the contributions of this work are threefold:

1) We propose a generic framework based on a convolutional neural network (CNN) that jointly outputs age and age-related facial attributes. Because it efficiently considers age estimation under the specific condition determined by those age-related attributes, the framework both improves the precision of the estimated age and saves training cost for the network. The number of age-related attributes is adjustable.

2) We propose a losses combination function to supervise the network. Under this supervision, the network tends to generate a more reasonable probability distribution over age classes, in which the predicted probability of each age class is, in general, inversely related to its deviation from the ground-truth age class.

3) Our method achieves superior results on both controlled and wild datasets.

The remainder of this paper is structured as follows. Section II reviews related work. The proposed method is introduced in Section III. In Section IV, we first conduct extensive experiments to select appropriate hyper-parameters and then evaluate the performance of the proposed approach on controlled and wild datasets. Section V provides further discussion, and Section VI concludes the paper.

II. RELATED WORK

In this section, we briefly review related work; for more detailed surveys, refer to [21] and [22]. Generally speaking, age estimation consists of two consecutive and relatively independent steps: feature extraction and age classification/regression. For feature extraction, early research usually adopted common feature extractors such as the Active Appearance Model (AAM) [23], Local Binary Patterns (LBP) [24], and Gabor features [25]. Later, special feature descriptors were designed for facial aging, for example, AGing pattern Subspace (AGES) [26] and Biologically Inspired Features (BIF) [15]. The BIF descriptor was widely used in subsequent age estimation work because of its outstanding capability to represent age-related information. After feature extraction, classification or regression models are applied to the features for age estimation. Typical combinations include BIF+SVM [13], [15], BIF+SVR [15], and BIF+KLPS [9].

In the past few years, because of its great advantages in image feature extraction and classification, CNN has been employed in many computer vision tasks, including age estimation. Because a CNN operates end to end, the conventional two-step pipeline for age estimation was reduced to a single step. Research attention then gradually shifted to exploring the characteristics of the age attribute and incorporating them into the design of specific age estimation methods. Rothe et al. [14] still viewed age estimation as a multi-class classification problem but took the expectation over all possible age classes as the predicted age. Geng et al. [27] assigned each facial image a label distribution so that every face could contribute not only to learning its chronological age but also to learning adjacent ages. Niu et al. [18] observed the ordinal relation among age classes and cast age estimation as an ordinal regression problem. Li et al. [20] introduced a cumulative signal into their learning network, where the signals of neighboring ages are more similar than those further apart. Tan et al. [28] also leveraged the similarity of neighboring age classes and carefully designed a group encoding and decoding strategy for age estimation. Besides, some studies tried to integrate a deep network module with a specific classification/regression module [16], [29], hoping that the modules' respective advantages would further improve age estimation performance. Recently, Li et al. [30] turned their attention to the cross-population age estimation problem. For the special case in which labeled facial images of the source population are sufficient but labeled faces of the target population are relatively scarce, they first learned an estimation model on the former and then transferred it to the latter.

Research on alleviating the influence of gender and race on age estimation has been ongoing. After comprehensive experiments, Guo and Mu [7] added a new phase before age estimation: they first carried out gender and race classification and then performed age estimation within the classified gender and race group. They later extended this research and proposed two methods [8], [9] that estimate human age, gender, and race simultaneously. Both approaches were based on a multi-label regression formulation. However, these methods mainly focused on modifications of feature dimensionality reduction or age function learning, while the extracted facial features were still hand-crafted. Yi et al. [10] addressed this limitation by first introducing CNN into the age estimation task, which enabled end-to-end estimation. They used a multi-task learning strategy that outputs the age, gender, and race attributes of the face simultaneously. Unfortunately, they neglected the effects of gender and race variations on age estimation. Wan et al. [31] recognized this oversight and proposed a cascaded CNN-based framework; owing to its specially designed structure, it estimates the gender, race, and age attributes in three consecutive phases. Xing et al. [32] presented a hybrid multi-task learning architecture that obtains the final age by fusing individual age results from classified gender and race groups. Nevertheless, a shared problem of these remedial methods is the tedious training process: before the final age estimation stage, they first need to train several sub-networks separately for the other facial attributes. Yoo et al. [11] overcame this problem by transforming the common multi-task learning structure into a conditional multi-task learning version, in which the gender classification task assists individual age estimation tasks under different gender conditions. Han et al. [12] partly tackled this problem from another perspective. They first used a shared module to extract facial features and then fine-tuned them toward the optimal estimation of each heterogeneous attribute category. In this way, they could estimate heterogeneous facial attributes with a single network. In addition, Tian and Chen [33] formulated the semantic relationship between human gender and age as a near-orthogonality regularizer and incorporated it into the estimation framework.

III. PROPOSED METHOD

In this section, we first give some definitions of facial attributes. Then, we introduce a CNN-based generic framework for estimating age and age-related facial attributes. Finally, we propose a losses combination function to optimize the estimated probability distribution over individual age classes.

A. Main and Auxiliary Attributes of Face

The human facial aging process is not uniform. In childhood and adolescence, the face changes more in shape (i.e., bone structure). Afterwards, in adulthood and old age, the changes mostly concern skin texture: wrinkles appear and increase with age. Given these different changes in different growth periods, estimating age independently within a limited life period should be more reasonable than estimating it over the whole age range. Previous research also observed that faces under different gender and race conditions age in noticeably different ways. Similarly, we expect to perform age estimation according to the specific gender and race attribute values of the face. Besides, the facial aging process can be affected by living habits and environment. For example, an outdoor worker usually looks older than an indoor worker of the same age, and a person who exercises regularly and follows a healthy diet often looks younger than peers.

In fact, many other intrinsic or extrinsic factors affect the process of facial aging, and each of them corresponds to an age-related facial attribute. Nevertheless, our purpose is not to enumerate all these factors but to illustrate that age estimation should be performed under specific conditions. Different conditions can be represented by different value combinations of these age-related facial attributes. For example, a “white adult male” face represents one specific condition.

Because our main purpose is to predict age from a facial image, we regard age as the main attribute of the face. According to [12], facial attributes can be roughly classified into two categories: correlated ones and heterogeneous ones. Among all age-related attributes, we focus only on heterogeneous ones; in other words, these attributes are mutually independent. Moreover, each selected age-related attribute should take discrete values (e.g., male and female are the two discrete values of gender), so that estimating it can be treated as a classification problem. Age-related attributes that satisfy both independence and discreteness are taken as auxiliary facial attributes. We use the auxiliary attributes to assist the estimation of the main attribute under specific conditions.

B. Construction of Generic Framework

Let the $i$th sample in a human face dataset be denoted as $(x_i, y_i)$, where $x_i$ is the face image and $y_i \in \{0, 1, \ldots, K\}$ is the corresponding chronological age. For $x_i$, we select $E$ auxiliary facial attributes, which produce $F$ different conditions of the face, where $F = \prod_{e=1}^{E} |A_e|$ and $|A_e|$ is the number of discrete values of attribute $A_e$. A specific condition $f$ can be represented by the joint attribute values $(A_1^f \cap A_2^f \cap \cdots \cap A_E^f)$, where $A_1^f, A_2^f, \ldots, A_E^f$ are the respective values of the auxiliary attributes under condition $f$.
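As a minimal sketch of how the $F$ conditions arise, the snippet below enumerates them as the Cartesian product of the auxiliary attribute values (assuming Python 3.8+ for `math.prod`). The attribute names and values are illustrative assumptions, and `enumerate_conditions` is a hypothetical helper, not code from the paper.

```python
from itertools import product
from math import prod


def enumerate_conditions(attributes):
    """List every condition f as a tuple of auxiliary attribute values.

    `attributes` maps each auxiliary attribute A_e to its list of discrete values.
    """
    value_lists = list(attributes.values())
    conditions = list(product(*value_lists))
    num_conditions = prod(len(values) for values in value_lists)  # F = prod_e |A_e|
    return conditions, num_conditions


# Hypothetical auxiliary attributes (E = 2), each with two values
aux_attributes = {"gender": ["male", "female"], "race": ["white", "black"]}
conditions, F = enumerate_conditions(aux_attributes)
print(F)  # 4
for f, values in enumerate(conditions):
    print(f, dict(zip(aux_attributes, values)))
```

Each tuple corresponds to one joint value $(A_1^f \cap \cdots \cap A_E^f)$; adding a third attribute with three values would multiply $F$ by three.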

By the law of total probability, the final estimated facial age is the expectation of the conditional ages:

$$\hat{y}_i = \sum_{f=1}^{F} \hat{y}_i^f \, P(x_i^f) = \sum_{f=1}^{F} \Big( \hat{y}_i^f \prod_{e=1}^{E} P(A_e^f \mid x_i) \Big), \tag{1}$$

where $\hat{y}_i^f$ is the predicted age of face $x_i$ assuming the image is under condition $f$, and $P(x_i^f)$ is the probability that this assumption is true. $P(A_e^f \mid x_i)$ is the probability that the face has auxiliary attribute value $A_e^f$. Each conditional age is calculated as

$$\hat{y}_i^f = \sum_{k=0}^{K} k \, P(x_{i,k}^f), \tag{2}$$

where $P(x_{i,k}^f)$ is the probability that the face under condition $f$ belongs to age class $k$.

CNN has shown excellent performance in image feature extraction and classification tasks, so, like many previous works, we adopt it in our generic framework. The full architecture of our framework is given in Fig. 1. For clarity, we divide the framework into three parts: feature extraction, feature mapping, and attribute estimation. Because the main and auxiliary attributes all describe the same face, we use a shared basic CNN module, a stack of convolutional layers, to extract rich facial features. Assuming that different facial attributes correspond to different facial features, we use disconnected sub-networks to map these features. Each sub-network, whether in the main or the auxiliary tasks, is a sequence of three fully connected layers and is linked to the shared basic CNN module. In the final attribute estimation part, we add a softmax layer after each sub-network to obtain the probability distributions of the corresponding attributes jointly and then deduce the target attribute values. Both learning and prediction with the proposed framework are therefore end to end.
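To make the arithmetic of (1) and (2) concrete, the plain-Python sketch below turns hypothetical output logits into a final age: it applies softmax to each branch, takes the expected age per condition as in (2), and weights each conditional age by the product of auxiliary attribute probabilities as in (1). The function names and the ordering of conditions are assumptions for illustration; this is not the authors' PyTorch implementation.

```python
import math
from itertools import product


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_age(age_logits):
    """Eq. (2): expected age sum_k k * P(x_{i,k}^f) for one condition."""
    return sum(k * p for k, p in enumerate(softmax(age_logits)))


def final_age(age_logits_per_condition, aux_logits_per_attribute):
    """Eq. (1): sum over conditions of conditional age times P(x_i^f).

    P(x_i^f) is the product of the auxiliary softmax probabilities of the attribute
    values that define condition f. Conditions are assumed to be ordered like
    itertools.product over the attribute value indices.
    """
    aux_probs = [softmax(z) for z in aux_logits_per_attribute]
    value_indices = product(*(range(len(p)) for p in aux_probs))
    estimate = 0.0
    for age_logits, idx in zip(age_logits_per_condition, value_indices):
        p_condition = math.prod(aux_probs[e][v] for e, v in enumerate(idx))
        estimate += conditional_age(age_logits) * p_condition
    return estimate


# Hypothetical outputs for one face: E = 2 binary attributes (F = 4), ages 0..100
uniform_age_logits = [[0.0] * 101 for _ in range(4)]
print(final_age(uniform_age_logits, [[2.0, -1.0], [0.5, 0.1]]))  # about 50.0
```

With uniform age logits every conditional age is 50, so the weighted sum stays at 50 whatever the auxiliary probabilities are; with confident auxiliary outputs, the estimate is dominated by the sub-network of the most probable condition.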


Fig. 1. Structure of the proposed generic framework for age and age-related facial attributes estimation.

The effectiveness of this generic framework relies on high estimation accuracy for the auxiliary facial attributes. Compared with the main attribute, facial age, the auxiliary attributes are relatively simple to estimate, so high prediction accuracy for them is attainable with CNN, as the experiments in [10] and [31] have shown. In the framework, we multiply the output of each sub-network in the main tasks by the probability of the corresponding specific condition. During training, the sub-network associated with the true condition of the sample face is multiplied by a high probability value, whereas the other sub-networks are multiplied by small probability values. According to (1), the predicted conditional age under the true condition carries most of the weight of the final age, which makes the sub-network for the true condition learn a mapping model specific to that condition. More generally, after a period of training, each sub-network in the main tasks learns a distinct mapping model for its specific condition.

Suppose the sample face $x_i$ actually belongs to condition $f_1$. The absolute error between the ground-truth age and the final predicted age can then be computed as

$$AE_i = \Big| y_i - \sum_{f=1}^{F} (y_i + \varepsilon_i^f) P(x_i^f) \Big| = \sum_{f=1}^{F} \big|\varepsilon_i^f\big| P(x_i^f), \tag{3}$$

where $\varepsilon_i^f$ is the error between the conditional age $\hat{y}_i^f$ predicted by the $f$th sub-network in the main tasks and the ground-truth age $y_i$, and $P(x_i^f)$ is the probability, predicted by the auxiliary tasks, that sample face $x_i$ is under condition $f$. Denote the absolute error between the age estimated with a single-task learning strategy and the ground-truth age as $\widetilde{AE}_i = |\varepsilon_i|$. Then

$$AE_i - \widetilde{AE}_i = \sum_{f=1}^{F} \big|\varepsilon_i^f\big| P(x_i^f) - |\varepsilon_i| = E_i^1 + E_i^2, \tag{4}$$

where

$$E_i^1 = \big|\varepsilon_i^{f_1}\big| P(x_i^{f_1}) - |\varepsilon_i|/F, \tag{5}$$

$$E_i^2 = \sum_{f=1,\, f \neq f_1}^{F} \Big( \big|\varepsilon_i^f\big| P(x_i^f) - |\varepsilon_i|/F \Big). \tag{6}$$

Without the generic framework, age estimation is just a common multi-class classification problem with a single-task learning strategy, which is equivalent to one branch of the main tasks in Fig. 1. On the other hand, previous studies [11], [12] have shown that a multi-task learning strategy can boost age estimation performance compared with the single-task version. Therefore, the age estimation sub-networks in Fig. 1 tend to provide more accurate predictions than the conventional single-task version, especially the sub-network for condition $f_1$. Moreover, because the network can deduce the auxiliary attributes with high accuracy, the predicted probability of condition $f_1$ should be much higher than those of the other conditions. Considering these observations, we obtain the following two inequalities:

$$\big|\varepsilon_i^{f_1}\big| < \big|\varepsilon_i^{f}\big| < |\varepsilon_i|, \quad f \neq f_1, \tag{7}$$

$$P(x_i^{f}) < 1/F < P(x_i^{f_1}), \quad f \neq f_1. \tag{8}$$

Combining the above inequalities with (5) and (6), we obtain $E_i^1 < 0$ and $E_i^2 < 0$. As a result, $AE_i - \widetilde{AE}_i < 0$. In other words, the generic framework can estimate facial age more accurately than the conventional version, which ignores the various facial aging conditions.
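A small numeric sanity check of the decomposition in (4)–(6) may help; it is a sketch with hypothetical values chosen to satisfy (7) and (8), it takes $AE_i$ from the right-hand side of (3), and `error_decomposition` is a hypothetical helper. It confirms the identity and the signs for this one case only; it does not prove the inequalities in general.

```python
def error_decomposition(eps_cond, p_cond, eps_single, f1):
    """Evaluate Eqs. (3)-(6) for one sample.

    eps_cond[f]: error of the conditional age predicted by sub-network f,
    p_cond[f]: predicted probability P(x_i^f) of condition f,
    eps_single: error of a single-task estimator, f1: index of the true condition.
    """
    F = len(p_cond)
    ae = sum(abs(e) * p for e, p in zip(eps_cond, p_cond))  # right-hand side of (3)
    ae_single = abs(eps_single)
    e1 = abs(eps_cond[f1]) * p_cond[f1] - ae_single / F  # Eq. (5)
    e2 = sum(abs(eps_cond[f]) * p_cond[f] - ae_single / F
             for f in range(F) if f != f1)  # Eq. (6)
    return ae - ae_single, e1, e2


# Hypothetical values satisfying (7) and (8), with F = 4 and f1 = 0
diff, e1, e2 = error_decomposition([0.5, 2.0, 2.5, 3.0], [0.85, 0.05, 0.05, 0.05], 4.0, 0)
print(round(diff, 3), round(e1, 3), round(e2, 3))  # -3.2 -0.575 -2.625
```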

C. Losses Combination Function

We regard the estimation of both the main attribute and the auxiliary attributes as multi-class classification tasks. Therefore, we uniformly use the softmax cross-entropy loss function to supervise each individual attribute estimation task. For example, given a sample face $x_i$, the loss function supervising the sub-network of condition $f$ in the main tasks can be written as

$$L_{i,\mathrm{main}}^f = -\log P(x_{i,y_i}^f), \qquad P(x_{i,y_i}^f) = \frac{e^{O_{i,y_i}^f}}{\sum_{k=0}^{K} e^{O_{i,k}^f}}. \tag{9}$$

Here, $P(x_{i,y_i}^f)$ is the predicted probability that the face belongs to its ground-truth age $y_i$, and $O_{i,k}^f$ is the $k$th element of the network's output layer. The loss function $L_{i,\mathrm{aux}}^e$ for an individual auxiliary attribute estimation task has a similar form. To optimize the network parameters, the derivative of $L_{i,\mathrm{main}}^f$ with respect to $O_{i,k}^f$ is calculated by the chain rule as follows:

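$$\frac{\partial L_{i,\mathrm{main}}^f}{\partial O_{i,k}^f} = \frac{\partial L_{i,\mathrm{main}}^f}{\partial P(x_{i,y_i}^f)} \, \frac{\partial P(x_{i,y_i}^f)}{\partial O_{i,k}^f} = -\frac{1}{P(x_{i,y_i}^f)} \, \frac{\partial P(x_{i,y_i}^f)}{\partial O_{i,k}^f}. \tag{10}$$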

Given the form of the softmax function, the derivative of $P(x_{i,y_i}^f)$ with respect to $O_{i,k}^f$ is

$$\frac{\partial P(x_{i,y_i}^f)}{\partial O_{i,k}^f} = \begin{cases} -P(x_{i,y_i}^f) \, P(x_{i,k}^f), & y_i \neq k, \\ P(x_{i,k}^f) \big(1 - P(x_{i,k}^f)\big), & y_i = k. \end{cases} \tag{11}$$

Substituting (11) into (10), we get

$$\frac{\partial L_{i,\mathrm{main}}^f}{\partial O_{i,k}^f} = \begin{cases} P(x_{i,k}^f), & y_i \neq k, \\ P(x_{i,k}^f) - 1, & y_i = k. \end{cases} \tag{12}$$
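The closed form in (12) is easy to verify numerically. The sketch below, with hypothetical logits for $K = 4$, compares it against central finite differences of the cross-entropy in (9); the helper names are illustrative only.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, y):
    """Eq. (9): negative log softmax probability of the ground-truth class y."""
    return -math.log(softmax(logits)[y])


def cross_entropy_grad(logits, y):
    """Eq. (12): P(x_{i,k}^f) for k != y and P(x_{i,k}^f) - 1 for k == y."""
    return [p - (1.0 if k == y else 0.0) for k, p in enumerate(softmax(logits))]


def numerical_grad(fn, logits, h=1e-6):
    """Central finite differences of a scalar function of the logits."""
    grads = []
    for k in range(len(logits)):
        up = list(logits)
        up[k] += h
        down = list(logits)
        down[k] -= h
        grads.append((fn(up) - fn(down)) / (2 * h))
    return grads


logits = [0.2, -1.0, 0.7, 1.5, 0.0]  # hypothetical outputs O_{i,k}^f for K = 4
analytic = cross_entropy_grad(logits, 2)
numeric = numerical_grad(lambda z: cross_entropy(z, 2), logits)
print(max(abs(a - n) for a, n in zip(analytic, numeric)))  # close to zero
```

Note that the gradient entries sum to zero, since the probabilities sum to one; the cross-entropy only pushes mass toward $y_i$ and says nothing about where the remaining mass goes.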

On the other hand, age estimation should be cast not simply as a classification problem but as an ordinal classification problem, because the individual age classes are not independent but correlated. For example, for the same person, the face image at age 54 tends to look more like the face at age 52 than the one at age 20. Extending this observation, faces of the same person with a small age difference should look more similar than those with a large age difference; in other words, the similarity of two facial images of the same person should be roughly inversely proportional to their age difference. If we treat age estimation as a classification problem, the expected probability distribution over age classes should also follow this principle. However, the cross-entropy loss used for age classification cannot satisfy this expectation: it only tries to maximize the predicted probability of the ground-truth age class and neglects the correlations between adjacent age classes. To address this shortcoming, we propose an auxiliary deviation loss. The deviation loss under condition $f$ can be written as

$$L_{i,\mathrm{dev}}^f = \sum_{k=0}^{K} (k - y_i)^2 \, P(x_{i,k}^f), \tag{13}$$

where $P(x_{i,k}^f)$ is the estimated probability that face $x_i$ belongs to age class $k$. In this loss function, the deviation term $(k - y_i)^2$ serves as a penalty factor. By penalizing age classes far from the ground-truth age class more severely, the deviation loss guides the network to output a more reasonable probability distribution.
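A short worked example shows what (13) adds over cross-entropy. The two distributions below are hypothetical: both assign probability 0.5 to the ground-truth age 40, so the cross-entropy in (9) scores them identically, but the deviation loss strongly prefers the one whose remaining mass sits on neighboring ages. `distribution` is a hypothetical helper.

```python
def deviation_loss(probs, y):
    """Eq. (13): sum_k (k - y)^2 * P(x_{i,k}^f) over age classes k = 0..K."""
    return sum((k - y) ** 2 * p for k, p in enumerate(probs))


def distribution(num_classes, mass):
    """Hypothetical helper: a distribution with probability mass[k] at class k."""
    probs = [0.0] * num_classes
    for k, p in mass.items():
        probs[k] = p
    return probs


y = 40  # ground-truth age
ordered = distribution(101, {39: 0.25, 40: 0.5, 41: 0.25})
scattered = distribution(101, {10: 0.25, 40: 0.5, 70: 0.25})
# Cross-entropy only sees P(x_{i,y}^f) = 0.5, so it scores both distributions equally.
print(ordered[y] == scattered[y])  # True
print(deviation_loss(ordered, y), deviation_loss(scattered, y))  # 0.5 450.0
```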

For backpropagation, the derivative of $L_{i,\mathrm{dev}}^f$ with respect to $O_{i,k}^f$ is

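$$\frac{\partial L_{i,\mathrm{dev}}^f}{\partial O_{i,k}^f} = \sum_{j=0}^{K} \frac{\partial L_{i,\mathrm{dev}}^f}{\partial P(x_{i,j}^f)} \, \frac{\partial P(x_{i,j}^f)}{\partial O_{i,k}^f} = \sum_{j=0}^{K} (j - y_i)^2 \, \frac{\partial P(x_{i,j}^f)}{\partial O_{i,k}^f}. \tag{14}$$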

The derivative of $P(x_{i,j}^f)$ with respect to $O_{i,k}^f$ has a form similar to (11):

$$\frac{\partial P(x_{i,j}^f)}{\partial O_{i,k}^f} = \begin{cases} -P(x_{i,j}^f) \, P(x_{i,k}^f), & j \neq k, \\ P(x_{i,k}^f) \big(1 - P(x_{i,k}^f)\big), & j = k. \end{cases} \tag{15}$$

Substituting (15) into (14), we get

$$\frac{\partial L_{i,\mathrm{dev}}^f}{\partial O_{i,k}^f} = P(x_{i,k}^f) \Big( (k - y_i)^2 - \sum_{j=0}^{K} (j - y_i)^2 \, P(x_{i,j}^f) \Big). \tag{16}$$
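The gradient in (16) can be checked the same way as (12). In the sketch below (hypothetical logits, $K = 5$, $y_i = 2$), `check` returns the largest gap between the closed form and central finite differences of (13); all names are illustrative.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deviation_loss(logits, y):
    """Eq. (13) written as a function of the logits O_{i,k}^f."""
    return sum((k - y) ** 2 * p for k, p in enumerate(softmax(logits)))


def deviation_loss_grad(logits, y):
    """Eq. (16): P_k * ((k - y)^2 - sum_j (j - y)^2 * P_j)."""
    probs = softmax(logits)
    expected_sq_dev = sum((j - y) ** 2 * p for j, p in enumerate(probs))
    return [p * ((k - y) ** 2 - expected_sq_dev) for k, p in enumerate(probs)]


def check(logits, y, h=1e-6):
    """Largest gap between Eq. (16) and central finite differences."""
    analytic = deviation_loss_grad(logits, y)
    worst = 0.0
    for k in range(len(logits)):
        up, down = list(logits), list(logits)
        up[k] += h
        down[k] -= h
        numeric = (deviation_loss(up, y) - deviation_loss(down, y)) / (2 * h)
        worst = max(worst, abs(numeric - analytic[k]))
    return worst


print(check([0.3, 1.2, -0.4, 0.9, 0.1, -1.5], y=2))  # close to zero
```

The sign structure is visible in (16): classes whose squared deviation exceeds the current expected squared deviation receive a positive gradient, so gradient descent lowers their logits.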

In addition, to guide the predicted age toward the ground truth more directly, we introduce another loss function based on the absolute error between them:

$$L_{i,\mathrm{age}} = \big| \hat{y}_i - y_i \big|. \tag{17}$$

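The derivative of $L_{i,\mathrm{age}}$ with respect to $O_{i,k}^f$ is

$$\frac{\partial L_{i,\mathrm{age}}}{\partial O_{i,k}^f} = \frac{\partial L_{i,\mathrm{age}}}{\partial \hat{y}_i} \, \frac{\partial \hat{y}_i}{\partial \hat{y}_i^f} \sum_{j=0}^{K} \frac{\partial \hat{y}_i^f}{\partial P(x_{i,j}^f)} \, \frac{\partial P(x_{i,j}^f)}{\partial O_{i,k}^f} = \operatorname{sgn}(\hat{y}_i - y_i) \, P_i^f \sum_{j=0}^{K} j \, \frac{\partial P(x_{i,j}^f)}{\partial O_{i,k}^f}, \tag{18}$$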

where $\operatorname{sgn}(\cdot)$ is the signum function and $P_i^f$ is the probability that the face is under condition $f$, calculated by multiplying the predicted probabilities of the corresponding auxiliary attribute values (i.e., $P_i^f = P(x_i^f)$ in (1)). Substituting (15) into (18) gives

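$$\frac{\partial L_{i,\mathrm{age}}}{\partial O_{i,k}^f} = \operatorname{sgn}(\hat{y}_i - y_i) \, P_i^f \, P(x_{i,k}^f) \Big( k - \sum_{j=0}^{K} j \, P(x_{i,j}^f) \Big). \tag{19}$$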
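A finite-difference check of (19) is sketched below under one stated assumption: the condition probabilities $P_i^f$ are held fixed, as in (18), where only the derivative with respect to the age-branch logits is taken. The setup ($F = 2$, ages 0 to 4, $y_i = 0.5$) is hypothetical and keeps $\hat{y}_i \neq y_i$ so that the absolute value is differentiable.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_age(age_logits, p_cond):
    """Eqs. (1)-(2) with the condition probabilities P_i^f held fixed."""
    return sum(pc * sum(k * p for k, p in enumerate(softmax(z)))
               for z, pc in zip(age_logits, p_cond))


def age_loss_grad(age_logits, p_cond, y):
    """Eq. (19): gradient of |y_hat - y| w.r.t. every O_{i,k}^f (y_hat != y assumed)."""
    sign = math.copysign(1.0, predicted_age(age_logits, p_cond) - y)
    grads = []
    for z, pc in zip(age_logits, p_cond):
        probs = softmax(z)
        mean_k = sum(j * p for j, p in enumerate(probs))
        grads.append([sign * pc * p * (k - mean_k) for k, p in enumerate(probs)])
    return grads


# Hypothetical: F = 2 conditions, age classes 0..4, ground truth y = 0.5
age_logits = [[0.1, 0.4, -0.3, 0.8, 0.2], [-0.5, 0.0, 0.6, 0.3, -0.2]]
p_cond = [0.8, 0.2]
analytic = age_loss_grad(age_logits, p_cond, 0.5)

h, worst = 1e-6, 0.0
for f in range(2):
    for k in range(5):
        up = [list(z) for z in age_logits]
        down = [list(z) for z in age_logits]
        up[f][k] += h
        down[f][k] -= h
        numeric = (abs(predicted_age(up, p_cond) - 0.5)
                   - abs(predicted_age(down, p_cond) - 0.5)) / (2 * h)
        worst = max(worst, abs(numeric - analytic[f][k]))
print(worst)  # close to zero
```

The factor $P_i^f$ means that sub-networks of unlikely conditions receive proportionally small gradients from this loss.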

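Finally, we supervise the proposed generic framework with a total loss function that combines the individual losses above:

$$L_{\mathrm{total}} = \frac{1}{N} \sum_{f=1}^{F} \sum_{i=1}^{N} L_{i,\mathrm{main}}^f + \frac{1}{N} \sum_{e=1}^{E} \sum_{i=1}^{N} L_{i,\mathrm{aux}}^e + \frac{\alpha}{N} \sum_{i=1}^{N} L_{i,\mathrm{age}} + \frac{\beta}{N} \sum_{f=1}^{F} \sum_{i=1}^{N} L_{i,\mathrm{dev}}^f, \tag{20}$$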

where $N$ is the mini-batch size and $\alpha$ and $\beta$ are two hyper-parameters. Note that the loss functions $L_{i,\mathrm{main}}^f$ and $L_{i,\mathrm{aux}}^e$ are both mathematically monotonic, whereas $L_{i,\mathrm{age}}$ and $L_{i,\mathrm{dev}}^f$ are not always monotonic. For an optimization algorithm, it tends to be easier to search for optimal parameters on monotonic functions, so we use the monotonic losses to guide the optimization of the non-monotonic ones. To this end, the orders of magnitude of these loss functions should be close, and we use the two hyper-parameters to achieve this. A detailed analysis of the hyper-parameters is given in the following section. Fig. 2 illustrates two typical probability distributions over age classes, obtained using only the softmax cross-entropy loss and using the losses combination function, respectively.
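The sketch below assembles the per-sample losses into the mini-batch total of (20), using $\alpha = 0.1$ and $\beta = 0.004$ from the experimental settings as defaults. It is plain Python on raw logits rather than an autograd implementation, the sample layout (dictionary keys) is a hypothetical choice, and conditions are again assumed to follow `itertools.product` order over attribute values.

```python
import math
from itertools import product


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def total_loss(batch, alpha=0.1, beta=0.004):
    """Eq. (20) for a mini-batch of N samples.

    Each sample is a dict with hypothetical keys: 'age_logits' (F lists, one per
    condition), 'aux_logits' (E lists), 'age' (ground-truth y_i) and
    'aux' (E ground-truth value indices).
    """
    main = aux = age = dev = 0.0
    for sample in batch:
        y = sample["age"]
        aux_probs = [softmax(z) for z in sample["aux_logits"]]
        aux += sum(-math.log(p[t]) for p, t in zip(aux_probs, sample["aux"]))
        y_hat = 0.0
        value_indices = product(*(range(len(p)) for p in aux_probs))
        for logits, idx in zip(sample["age_logits"], value_indices):
            probs = softmax(logits)
            main += -math.log(probs[y])  # Eq. (9)
            dev += sum((k - y) ** 2 * p for k, p in enumerate(probs))  # Eq. (13)
            p_condition = math.prod(aux_probs[e][v] for e, v in enumerate(idx))
            y_hat += p_condition * sum(k * p for k, p in enumerate(probs))  # Eqs. (1)-(2)
        age += abs(y_hat - y)  # Eq. (17)
    return (main + aux + alpha * age + beta * dev) / len(batch)


# Hypothetical batch with one sample: E = 1 binary attribute (F = 2), ages 0..4
sample = {"age_logits": [[0.0] * 5, [0.0] * 5], "aux_logits": [[0.0, 0.0]],
          "age": 2, "aux": [0]}
print(total_loss([sample]))  # 2*log(5) + log(2) + 0.004*4, about 3.928
```

In this toy case the deviation term is about 4 while $\beta$ shrinks it to 0.016, which illustrates how the hyper-parameters bring the non-monotonic losses down to the scale of the cross-entropy terms.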

IV. EXPERIMENTS

A. Datasets

Many facial datasets can be used for age estimation. On the one hand, according to the type of age labels, they can be roughly classified into two groups: faces with chronological age labels and faces with apparent age labels. On the other hand, according to the capture environment of the facial images, they can also be classified into controlled datasets and wild datasets. In this paper, we focus only on face datasets with chronological age labels. In addition, some datasets were built by web crawling without reliable data filtering (e.g., IMDB-WIKI [34], CACD [35]), which inevitably introduces noisy age labels; we do not consider those datasets either. Finally, we chose the following three datasets for our experiments: MORPH II [36], FG-NET [37], and AgeDB [38]. Statistics of these datasets, including the age and gender distributions of their facial images, are given in Fig. 3.

The MORPH II dataset contains more than 50,000 facial images taken in a controlled environment. Each face in this database is labeled not only with age but also with gender and race attributes. The age values range from 16 to 77. In the experiments, we employed two widely used evaluation protocols. The first is the “S1-S2-S3” protocol, which takes the unbalanced distribution of sample classes into consideration. Following [10] and [28], we split the dataset into three non-overlapping subsets S1, S2, and S3. After splitting, the male-to-female ratio was about three and the White-to-Black ratio was one. Training and testing were repeated twice: (a) training on S1 and testing on S2 + S3, and (b) training on S2 and testing on S1 + S3 [10]. The second is the five-fold, subject-exclusive cross-validation protocol (SE).

The FG-NET dataset consists of 1,002 color or grayscale facial images of 82 subjects. These images were taken in the wild, and the ages range from 0 to 69. For evaluation, we adopted the widely accepted leave-one-person-out (LOPO) validation protocol following [14] and [28] and report the average performance over the 82 splits.
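The LOPO protocol amounts to one split per subject, with all images of that subject held out for testing. The sketch below generates such splits from a list of per-image subject IDs; `leave_one_person_out` and the IDs are hypothetical, and the final figure would be the mean of the per-split metric (e.g., MAE).

```python
from collections import defaultdict


def leave_one_person_out(subject_ids):
    """Yield (subject, train_indices, test_indices), one split per subject.

    Every image of the held-out subject goes to the test side, so no person
    appears in both the training and the testing data of the same split.
    """
    images_by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        images_by_subject[subject].append(idx)
    for subject in sorted(images_by_subject):
        test = images_by_subject[subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Hypothetical subject IDs for six images of three people
ids = ["p1", "p1", "p2", "p3", "p3", "p3"]
for subject, train, test in leave_one_person_out(ids):
    print(subject, train, test)
```

On FG-NET this would yield 82 splits, one per subject.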

The AgeDB dataset includes 16,488 facial images of 568 distinct subjects. It is also an “in the wild” dataset. Every image is annotated with age and gender attributes, and each image was manually checked to remove possibly noisy attribute labels. When evaluating on this dataset, we used the “80%-20%” protocol: we randomly selected 80% of the images for training, and the remaining 20% were used for testing.

B. Evaluation Metrics

The Mean Absolute Error (MAE) is perhaps the most widely used evaluation metric for age estimation [39], [40], and we also use it in our experiments. It is calculated as

$$MAE = \frac{1}{M} \sum_{i=1}^{M} \big| \hat{y}_i - y_i \big|, \tag{21}$$

where $M$ is the number of test faces. A lower MAE means better performance. Another common metric we use to evaluate age estimation performance is the Cumulative Score (CS) [41], which is calculated as

$$CS(l) = \frac{M_l}{M} \times 100\%. \tag{22}$$

In this metric, $M_l$ is the number of faces whose absolute error between the ground-truth age and the estimated age is not greater than $l$ years. For a fixed $l$, a higher value of $CS(l)$ indicates better performance.

For each auxiliary facial attribute, we employ the Classification Accuracy (CA) metric to measure estimation performance:

$$CA = \frac{M_{\mathrm{true}}}{M} \times 100\%, \tag{23}$$

where $M_{\mathrm{true}}$ is the number of faces whose estimated attribute matches the ground truth.
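The three metrics in (21)–(23) translate directly into a few lines of Python. The sketch below uses hypothetical ages and labels; the function names are illustrative.

```python
def mae(y_true, y_pred):
    """Eq. (21): mean absolute error over M test faces."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def cumulative_score(y_true, y_pred, l):
    """Eq. (22): percentage of faces whose absolute error is at most l years."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= l)
    return 100.0 * hits / len(y_true)


def classification_accuracy(labels_true, labels_pred):
    """Eq. (23): percentage of faces whose predicted attribute matches the label."""
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return 100.0 * correct / len(labels_true)


ages = [20, 35, 50, 64]
predicted = [22.0, 35.5, 44.0, 65.0]
print(mae(ages, predicted))                   # 2.375
print(cumulative_score(ages, predicted, 5))   # 75.0
print(classification_accuracy(["m", "f", "m", "f"], ["m", "f", "f", "f"]))  # 75.0
```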

C. Experimental Settings

For all face images, we first used MTCNN [42] to detect five facial landmarks and then aligned the faces based on them, so that the eyes in every aligned face lie at almost the same locations. If no landmarks could be detected in a face, no alignment was applied. Next, both the aligned and unaligned face images were resized to 256 × 256 × 3 pixels.

Fig. 2. Estimated probability distributions of individual age classes supervised by (a) the softmax cross-entropy loss only and (b) the losses combination, respectively. The ground-truth age is 40.

Fig. 3. Distributions of age and gender classes in three datasets.

Data augmentation is a useful measure for reducing overfitting when training neural networks. In our experiments, we augmented the training data with common steps: (a) randomly cropping 224×224×3 pixel regions from the resized image, (b) random flipping, and (c) rotating the image by a random angle within [−5°, 5°].
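The three augmentation steps can be sketched as a sampler of per-image parameters. Assumptions: the crop position is uniform over all valid offsets, the flip is horizontal with probability 0.5, and the angle is drawn uniformly; the text states only the image sizes and the ±5° range. `sample_augmentation` is a hypothetical helper.

```python
import random

RESIZED = 256     # side length after resizing (from the text)
CROP = 224        # side length of the random crop (from the text)
MAX_ANGLE = 5.0   # rotation range [-5, 5] degrees (from the text)


def sample_augmentation(rng):
    """Draw one set of augmentation parameters for a 256x256 image.

    Assumed: uniform crop offset, horizontal flip with probability 0.5,
    and a uniformly drawn rotation angle.
    """
    max_offset = RESIZED - CROP  # 32 pixels of slack in each direction
    return {
        "crop_left": rng.randint(0, max_offset),  # randint is inclusive
        "crop_top": rng.randint(0, max_offset),
        "hflip": rng.random() < 0.5,
        "angle_deg": rng.uniform(-MAX_ANGLE, MAX_ANGLE),
    }


params = sample_augmentation(random.Random(0))
print(params)
```

Applying these parameters to pixels requires an imaging library; in a PyTorch pipeline, torchvision's `RandomCrop(224)`, `RandomHorizontalFlip()`, and `RandomRotation(5)` transforms cover the same three steps, although the text does not say which implementation was used.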

For convenience, we only considered gender and race as auxiliary attributes; other useful age-related facial attributes can also be used in our generic framework. We employed two architectures as the basic CNN module of our framework, inheriting the convolutional layers of either the AlexNet [43] network or the VGG-16 [44] network. Each network was first pretrained on the ImageNet dataset. For the feature mapping part of our framework, we used three fully connected (FC) layers for each individual attribute estimation task. The first two FC layers have 512 channels each, and the number of channels in the third one is set according to the specific attribute estimation task. For example, the last FC layer in the age estimation task has 101 channels because the age range is [0, 100], whereas the last FC layer in the gender estimation task has 2 channels. We trained the network by stochastic gradient descent (SGD) with a batch size of 64, a weight decay of 0.0005, and a momentum of 0.9. The initial learning rate was 0.001 and was then decreased by a factor of 10 as training progressed; in total, the learning rate was reduced 3 times. The two hyper-parameters α and β in the losses combination were set to 0.1 and 0.004, respectively. We implemented the method in the PyTorch framework and ran the evaluations on a machine with an Intel i7 CPU and two NVIDIA GTX1080 GPUs.¹
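The head widths and the step learning-rate schedule described above can be summarized in plain Python. The epochs at which the rate drops are not given in the text, so `MILESTONES` is a hypothetical placeholder; `learning_rate` and `head_widths` are likewise illustrative helpers, not part of the released code.

```python
BASE_LR = 0.001   # initial learning rate (from the text)
DECAY = 10        # divide by 10 at each reduction (from the text)
# Hypothetical milestone epochs: the text says the rate was reduced three
# times but does not say when.
MILESTONES = (10, 20, 30)


def learning_rate(epoch, milestones=MILESTONES):
    """Step schedule: the base rate divided by 10 for every milestone passed."""
    passed = sum(1 for m in milestones if epoch >= m)
    return BASE_LR / DECAY ** passed


def head_widths(num_classes):
    """Widths of the three FC layers in one attribute head (512, 512, K)."""
    return (512, 512, num_classes)


AGE_MIN, AGE_MAX = 0, 100
heads = {
    "age": head_widths(AGE_MAX - AGE_MIN + 1),  # one class per year: 101
    "gender": head_widths(2),
}
print(heads["age"])     # (512, 512, 101)
print(heads["gender"])  # (512, 512, 2)
print([learning_rate(e) for e in (0, 10, 20, 30)])
```

In PyTorch, such a setup would typically be written with `torch.optim.SGD(params, lr=0.001, momentum=0.9, weight_decay=0.0005)` and `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1)`; whether the released code does exactly this is not stated here.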

D. Hyper-Parameters Selection

Referring to (20), the total loss function of the proposed generic framework consists of monotonic and non-monotonic individual losses. Because networks supervised by monotonic losses usually converge more smoothly than those supervised by non-monotonic ones, we use the former to guide the convergence of the latter. Following this idea, we added two hyper-parameters α and β to scale these losses so that they have similar orders of magnitude. In our view, the choice of hyper-parameter values therefore depends mainly on the corresponding loss functions rather than on the dataset. Because FG-NET is the smallest of the three databases used, we ran the parameter selection experiments on it for convenience. For these experiments we also used the AlexNet network in our generic framework, which further reduced the computational cost.

¹ https://github.com/Xiejiu/age_estimation_generic

TABLE I

MAE RESULTS UNDER DIFFERENT COMBINATIONS OF α AND β VALUES ON ONE RANDOM SUBJECT GROUP FROM FG-NET DATASET

TABLE II

NUMBER OF TOP-PERFORMANCE RECORDS UNDER DIFFERENT COMBINATIONS OF α AND β VALUES ON EIGHT RANDOM SUBJECT GROUPS FROM FG-NET DATASET

We used a “coarse-to-fine” two-step search strategy for hyper-parameter selection. As FG-NET has 82 subjects, there are 82 groups under the LOPO protocol. Each group contains two parts: all faces of one specific person for testing and the other people's faces for training. To avoid affecting the performance evaluation in the later phase, we experimented only on training images here.
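The LOPO grouping can be sketched as follows, assuming each image is tagged with a subject ID; `lopo_groups` and the toy data are hypothetical. The last line checks the arithmetic behind the fine search step: evaluating 8 of the 82 groups skips about 90% of the per-group runs.

```python
from collections import defaultdict


def lopo_groups(samples):
    """Build leave-one-person-out (LOPO) groups.

    samples: iterable of (subject_id, image_id) pairs.
    Returns one (held_out_subject, train_images, test_images) tuple per
    subject: that subject's images for testing, everyone else's for training.
    """
    by_subject = defaultdict(list)
    for subject, image in samples:
        by_subject[subject].append(image)
    groups = []
    for held_out in sorted(by_subject):
        test = list(by_subject[held_out])
        train = [img for s, imgs in by_subject.items() if s != held_out for img in imgs]
        groups.append((held_out, train, test))
    return groups


# Toy data: three subjects instead of FG-NET's 82.
samples = [("s1", "a"), ("s1", "b"), ("s2", "c"), ("s3", "d")]
for held_out, train, test in lopo_groups(samples):
    print(held_out, train, test)

# Fine search on 8 of 82 groups skips this share of the per-group runs.
print(f"{1 - 8 / 82:.1%}")  # 90.2%
```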

First, in the coarse search step, we roughly determined the value ranges of α and β. Considering the functional form of each loss in (20), the initial value ranges for α and β were [0.01, 1] and [0.0001, 1], respectively. We randomly chose one of the eighty-two groups and further divided its training faces into two non-overlapping subsets, a training subset and a validation subset. We trained our model with different combinations of hyper-parameter values on the training subset and validated it on the validation subset. The experimental results are given in Table I, which shows that our method performs better when α is in the range [0.1, 1] and β is in the range [0.001, 0.01].
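The coarse step amounts to a grid search over the two initial ranges. In this sketch the decade-spaced grid values are an assumption (Table I's exact grid is not reproduced here), and `validate` stands for a hypothetical callable that trains on the training subset and returns the validation MAE, which is not shown.

```python
import itertools

# Coarse ranges from the text: alpha in [0.01, 1], beta in [0.0001, 1].
# Decade spacing is an assumption; Table I's exact grid is not reproduced.
ALPHAS = [0.01, 0.1, 1.0]
BETAS = [0.0001, 0.001, 0.01, 0.1, 1.0]


def coarse_grid(alphas=ALPHAS, betas=BETAS):
    """All (alpha, beta) combinations to try in the coarse step."""
    return list(itertools.product(alphas, betas))


def select_best(grid, validate):
    """Return the combination with the lowest validation MAE.

    `validate(alpha, beta)` is a hypothetical callable that would train on
    the training subset and return the MAE on the validation subset.
    """
    return min(grid, key=lambda ab: validate(*ab))


grid = coarse_grid()
print(len(grid))  # 15
```

The fine step would reuse `select_best` on a denser grid inside the narrowed ranges [0.1, 1] and [0.001, 0.01].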

Then, in the fine search step, these two ranges served as the initial value ranges for the hyper-parameters, and we ran a grid search again as in the coarse search step. The difference is that we randomly chose eight groups in this step. Because running experiments on all eighty-two groups would be very time-consuming, this choice saved nearly 90% of the experimental time.

Fig. 4. CS comparison on AgeDB dataset.

Meanwhile, eight groups can still reveal the performance tendency of the model under different combinations of parameter values. For each subject group, we recorded the combinations of parameter values that produced the top 20% of all results. We then counted how many times each combination achieved top performance across these records; the counts are summarized in Table II. The table shows that combinations in the left part tend to achieve top performance more often than those in the right part. In addition, two combinations share the highest count. Considering the overall performance tendency, we finally selected the combination in the left part of the table, setting α and β in the total loss function to 0.1 and 0.004, respectively.
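The counting procedure behind Table II can be sketched with `collections.Counter`. Rounding the top-20% cut-off up with `ceil` is an assumption, the helper names are hypothetical, and the MAE values below are made up for illustration only.

```python
import math
from collections import Counter


def top_fraction(results, fraction=0.2):
    """Combinations in the best `fraction` of one group's results.

    results: dict mapping (alpha, beta) -> validation MAE (lower is better).
    Rounding the cut-off up with ceil is an assumption.
    """
    k = max(1, math.ceil(fraction * len(results)))
    return sorted(results, key=results.get)[:k]


def count_top_appearances(per_group_results, fraction=0.2):
    """How often each combination lands in a group's top results."""
    counts = Counter()
    for results in per_group_results:
        counts.update(top_fraction(results, fraction))
    return counts


# Made-up MAE values for two groups, for illustration only.
group_a = {(0.1, 0.004): 4.1, (0.1, 0.001): 4.3, (0.5, 0.004): 4.2,
           (1.0, 0.01): 4.6, (1.0, 0.001): 4.9}
group_b = {(0.1, 0.004): 3.8, (0.1, 0.001): 3.9, (0.5, 0.004): 4.0,
           (1.0, 0.01): 4.4, (1.0, 0.001): 4.1}
counts = count_top_appearances([group_a, group_b])
print(counts.most_common(1))  # [((0.1, 0.004), 2)]
```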

E. Results in Controlled Environment

To evaluate the performance of the proposed method, we first conducted experiments on the MORPH II face dataset, which consists of facial images captured in a controlled environment. Because CNNs have excellent feature-representation capability, CNN-based age estimation methods usually outperform approaches that rely on conventional feature extraction. For a fair comparison, we only compared against state-of-the-art CNN-based approaches. Results with the S1-S2-S3 and SE protocols are given in Tables III and IV, respectively. From the results, we observe the following. (a) For the age estimation task, the mean absolute errors (MAE) between the estimated and ground-truth ages are all relatively small, and the performance gaps between different methods are all less than one year. However, the age estimation performance of these


TABLE III

ESTIMATION RESULTS FROM STATE-OF-THE-ART METHODS AND THE PROPOSED METHOD ON MORPH II DATASET WITH S1-S2-S3 PROTOCOL

TABLE IV

ESTIMATION RESULTS FROM STATE-OF-THE-ART METHODS AND THE PROPOSED METHOD ON MORPH II DATASET WITH SE PROTOCOL

methods has not yet reached a satisfactory level, and there is still room for improving age estimation in the controlled environment. (b) For the gender and race estimation tasks, the approaches that provide gender and race estimates also achieve high classification accuracy, although their performance is not perfect either. (c) Across the age and age-related attribute estimation tasks, our method consistently performs well; when predicting age and gender, the proposed method produced the best results.

In addition, we examined how effectively the proposed method regularizes the predicted probability distribution of age classes toward a more reasonable shape. To this end, we followed the setting in [45] and conducted comparison experiments. Specifically, we collected 20000 face images from MORPH II and randomly selected 60% of them for training and the rest for testing. Because the regularization in the proposed method is implemented by the deviation loss, we combined only this loss with the cross-entropy loss as the final loss function. To be consistent with [45], we employed the AlexNet network for feature extraction. We then evaluated six different training set ratios; the results are presented in Table V. Except when the training set ratio is 10%, the proposed method always produced lower MAE values than the other approaches, which we attribute to a more reasonable probability distribution of age classes. Note also that, compared with D2LDL trained with a 60% training set ratio, our method achieves comparable performance while the number of training images is only

TABLE V

MAE VALUES ON MORPH II DATASET WITH SIX DIFFERENT TRAINING SET RATIOS

one-third of the former. Our method therefore appears more robust when training samples are inadequate.

As Tables III and IV show, the proposed method with the VGG-16 network for feature extraction performs better than the one with AlexNet, which we attribute to the stronger feature-representation capability of VGG-16. Therefore, in the following experiments, we only adopted the VGG-16 network as the basic CNN module of the proposed method.

F. Results in Wild Environment

Compared with controlled facial images, we are more interested in faces captured in the wild. We therefore conducted further experiments on two uncontrolled datasets, FG-NET and AgeDB. Because neither dataset provides race information about the subjects, we only took gender as the auxiliary attribute in our generic framework.

Results on the FG-NET database are shown in Table VI. Compared with the results in Tables III and IV, the performance of all age estimation algorithms degraded, and the same occurred in the gender estimation task. On the one hand, the degradation can be attributed to the uncontrolled environment, which includes varied lighting and changes in pose and facial expression. On the other hand, it also results from the small number of samples available for network training. A typical characteristic of a CNN


Fig. 5. Examples of good and bad attribute estimation results from the proposed method on controlled and wild datasets. The top two rows give some good estimation examples, whereas the bottom two rows show some bad ones.

TABLE VI

ESTIMATION RESULTS FROM STATE-OF-THE-ART METHODS AND THE PROPOSED METHOD ON FG-NET DATASET WITH LOPO PROTOCOL

is that its outstanding feature-representation capability depends heavily on learning from a large number of training samples. The small size of the FG-NET dataset cannot meet this requirement. As a result, these CNN-based facial attribute estimation methods have not learned sufficiently discriminative facial features, and they therefore produced worse results than on the MORPH II dataset.

The performance comparison between DEX [14] and the proposed method is noteworthy. Both methods treat age estimation as a multi-class classification problem. The difference is that the proposed approach adapts the conventional classification task to the characteristics of human age. Specifically, because the facial aging process is affected by internal and external factors, we estimated the final age as the mathematical expectation of the conditional ages under the different conditions determined by these factors. Moreover, in view of the connections between adjacent age classes, we designed a losses combination function to obtain a more reasonable age probability distribution. As a result, the proposed method produced the lowest MAE value.
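One reading of this description: the framework predicts a probability $P(c \mid x)$ for each condition $c$ (e.g., a gender, or a gender-race combination) and, under each condition, a distribution $P(a \mid x, c)$ over the age classes; the final age is then $\hat{y} = \sum_c P(c \mid x) \sum_a a\,P(a \mid x, c)$. The sketch below assumes exactly this form, uses a toy 5-class age range instead of 101 classes, and is not taken from the released code; `expected_age` is a hypothetical helper.

```python
import math


def expected_age(condition_probs, conditional_age_probs):
    """Expected age under condition-specific age distributions (assumed form).

    condition_probs: dict condition -> P(c | x), summing to 1.
    conditional_age_probs: dict condition -> list of P(a | x, c), where the
        list index is the age in years (index 0 = age 0).
    """
    if not math.isclose(sum(condition_probs.values()), 1.0, abs_tol=1e-6):
        raise ValueError("condition probabilities must sum to 1")
    estimate = 0.0
    for condition, p_condition in condition_probs.items():
        dist = conditional_age_probs[condition]
        if not math.isclose(sum(dist), 1.0, abs_tol=1e-6):
            raise ValueError(f"age distribution for {condition!r} must sum to 1")
        # Conditional mean age under this condition.
        conditional_mean = sum(age * p for age, p in enumerate(dist))
        estimate += p_condition * conditional_mean
    return estimate


# Toy example over ages 0..4 with gender as the only condition.
condition_probs = {"male": 0.75, "female": 0.25}
age_dists = {"male": [0.0, 0.0, 1.0, 0.0, 0.0],    # conditional mean 2
             "female": [0.0, 0.0, 0.0, 0.0, 1.0]}  # conditional mean 4
print(expected_age(condition_probs, age_dists))  # 2.5
```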

Because the relatively small FG-NET dataset cannot reveal the real performance of CNN-based age estimation methods, we conducted extra experiments on another large wild dataset, AgeDB. As this dataset is relatively new, few evaluations of age estimation methods on it have been reported. Therefore, besides the proposed method, we also selected three other state-of-the-art approaches and implemented them for comparison. Results on this dataset are given in Table VII and Fig. 4.

Tables VI and VII show that, even with many more face samples for learning and a looser training-testing protocol, the age estimation methods did not perform better on AgeDB than on FG-NET. After carefully checking the estimation results of individual faces, we found that faces in the AgeDB dataset are captured in wilder environments (with more varied poses, noise, occlusions, etc.) than those in FG-NET; in other words, faces in AgeDB are closer to real conditions. These harsher capture conditions lead to the collective performance degradation of the estimation methods. Nevertheless, our method still achieved the lowest MAE among these approaches, although it did not provide the best CS values. A reasonable explanation of this phenomenon is that when all of these methods predicted ages near the ground


TABLE VII

ESTIMATION RESULTS FROM STATE-OF-THE-ART METHODS AND THE PROPOSED METHOD ON AGEDB DATASET WITH 80%-20% PROTOCOL

Fig. 6. Validation MAE of two baseline methods and the proposed method on MORPH II dataset.

truth, the proposed approach provided more accurate estimates. In addition, Fig. 5 shows some facial attribute estimation examples produced by the proposed method for visual assessment.
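Having the lowest MAE without the best CS values is not contradictory: MAE averages all errors, whereas CS(l) only counts errors within l years. The toy example below uses made-up absolute errors for two hypothetical methods (not values from Table VII) to show that a lower MAE can coexist with a lower CS(5).

```python
def mean_absolute_error(errors):
    """MAE from a list of absolute errors in years."""
    return sum(abs(e) for e in errors) / len(errors)


def cumulative_score(errors, max_error):
    """CS(l) in percent with l = max_error: share of errors not above l years."""
    return 100.0 * sum(1 for e in errors if abs(e) <= max_error) / len(errors)


# Hypothetical absolute errors for two methods (not taken from Table VII).
method_a = [0, 0, 6, 6]
method_b = [4, 4, 4, 4]
print(mean_absolute_error(method_a), cumulative_score(method_a, 5))  # 3.0 50.0
print(mean_absolute_error(method_b), cumulative_score(method_b, 5))  # 4.0 100.0
```

Method A wins on MAE because half of its predictions are exact, while method B wins on CS(5) because all of its errors stay within five years.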

V. DISCUSSIONS

Although the proposed method achieved superior performance on both controlled and wild face datasets, we wanted to understand the reasons for its effectiveness. We therefore constructed two baselines for comparison. Baseline_1 simply regards age estimation as a multi-class classification problem and uses the softmax cross-entropy loss to supervise the network; it is the same as the DEX method [14] except for its smaller FC layers (512 vs. 4096). We then integrated the generic framework with Baseline_1 and called the result Baseline_2. The proposed approach replaces the original loss function in Baseline_2 with the proposed losses combination. For convenience, we took the S1 subset of the MORPH II dataset and divided it into a training set and a validation set with a 9:1 size ratio. The experimental results are shown in Fig. 6. Comparing Baseline_1 and Baseline_2, the final validation MAE decreases noticeably with the latter, which demonstrates the effectiveness of the generic framework in accounting for age estimation differences under various conditions. The validation MAE is further reduced by the proposed method, although the reduction is relatively small. Besides, the validation MAE curve of the proposed method is smoother than those of the two baselines, indicating a more stable learning process. We attribute these two benefits to the proposed losses combination function.

VI. CONCLUSIONS

In this paper, we proposed a generic framework that leverages age-related facial attributes to assist age estimation. Besides, considering the specific nature of the age classification problem, we designed a losses combination function that guides the framework toward a more reasonable probability distribution of individual age classes. Our method achieved state-of-the-art results on controlled and wild face datasets. However, its estimation performance in uncontrolled environments is still not satisfactory, so age estimation on wild faces still requires continued research attention.

ACKNOWLEDGMENT

The authors would like to thank Zichang Tan and ByungIn Yoo for explaining details of their research works, and Hongyu Pan for providing experimental protocols for comparison.

REFERENCES

[1] U. Wittwer-Backofen, “Age estimation using tooth cementum annulation,” in Forensic Microscopy for Skeletal Tissues: Methods and Protocols (Methods in Molecular Biology). New York, NY, USA: Humana Press, 2012, pp. 129–143.

[2] Y. K. Kim, H. S. Kho, and K. H. Lee, “Age estimation by occlusal tooth wear,” J. Forensic Sci., vol. 45, no. 2, pp. 303–309, Mar. 2000.

[3] M. Nishimoto, Y. Azuma, and N. Miyamoto, “Subjective age estimation using speech sounds: Comparison with facial images,” in Proc. IEEE Int. Conf. Syst. Man Cybern., Oct. 2008, pp. 1900–1904.

[4] M. Ilyas, A. Othmani, and A. Nait-Ali, “Human age estimation using auditory system through dynamic frequency sound,” in Proc. IEEE Int. Conf. Bio-Eng. Smart Technol. (BioSMART), Aug./Sep. 2017, pp. 1–3.

[5] M. Ruquet, B. Saliba-Serre, D. Tardivo, and B. Foti, “Estimation of age using alveolar bone loss: Forensic and anthropological applications,” J. Forensic Sci., vol. 60, no. 5, pp. 1305–1309, Aug. 2015.

[6] Y. H. Kwon and D. V. Lobo, “Age classification from facial images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 1994, pp. 762–768.

[7] G. Guo and G. Mu, “Human age estimation: What is the influence across race and gender?” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2010, pp. 71–78.

[8] G. Guo and G. Mu, “Joint estimation of age, gender and ethnicity: CCA vs. PLS,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), Apr. 2013, pp. 1–6.

[9] G. Guo and G. Wu, “Simultaneous dimensionality reduction and human age estimation via kernel partial least squares regression,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 657–664.

[10] D. Yi, Z. Lei, and S. Z. Li, “Age estimation by multi-scale convolutional network,” in Proc. Asian Conf. Comput. Vis. (ACCV), Nov. 2014, pp. 144–158.

[11] B. Yoo, Y. Kwak, C. Choi, J. Kim, and Y. Kim, “Deep facial age estimation using conditional multitask learning with weak label expansion,” IEEE Signal Process. Lett., vol. 25, no. 6, pp. 808–812, Jun. 2018.

[12] H. Han, A. K. Jain, X. Chen, F. Wang, and S. Shan, “Heterogeneous face attribute estimation: A deep multi-task learning approach,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 11, pp. 2597–2609, Nov. 2018. doi: 10.1109/TPAMI.2017.2738004.

[13] G. Guo, G. Wu, Y. Fu, C. Dyer, and T. Huang, “A study on automatic age estimation using a large database,” in Proc. IEEE 12th Int. Conf. Comput. Vis. (ICCV), Sep. 2009, pp. 1986–1991.

[14] R. Rothe, R. Timofte, and L. Van Gool, “Deep expectation of real and apparent age from a single image without facial landmarks,” Int. J. Comput. Vis., vol. 126, nos. 2–4, pp. 144–157, Apr. 2018.

[15] G. Guo, G. Mu, Y. Fu, and T. S. Huang, “Human age estimation using bio-inspired features,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 112–119.

[16] W. Shen, Y. Guo, Y. Wang, K. Zhao, B. Wang, and A. Yuille, “Deep regression forests for age estimation,” presented at the IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018.

[17] S. Chen, C. Zhang, M. Dong, J. Le, and M. Rao, “Using ranking-CNN for age estimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 742–751.

[18] Z. Niu, M. Zhou, L. Wang, X. Gao, and G. Hua, “Ordinal regression with multiple output CNN for age estimation,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 4920–4928.

[19] Z. Hu, Y. Wen, J. Wang, M. Wang, R. Hong, and S. Yan, “Facial age estimation with age difference,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3087–3097, Jul. 2017.

[20] K. Li, J. Xing, W. Hu, and S. J. Maybank, “D2C: Deep cumulatively and comparatively learning for human age estimation,” Pattern Recognit., vol. 66, pp. 95–105, Jun. 2017.

[21] Y. Fu, G. Guo, and T. S. Huang, “Age synthesis and estimation via faces: A survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 11, pp. 1955–1976, Nov. 2010.

[22] R. Angulu, J. R. Tapamo, and A. O. Adewumi, “Age estimation via face images: A survey,” EURASIP J. Image Video Process., vol. 2018, no. 1, pp. 1–35, Dec. 2018.

[23] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, Jun. 2001.

[24] A. Gunay and V. V. Nabiyev, “Automatic age classification with LBP,” in Proc. IEEE 23rd Int. Symp. Comput. Inf. Sci., Oct. 2008, pp. 1–4.

[25] F. Gao and H. Ai, “Face age classification on consumer images with Gabor feature and fuzzy LDA method,” in Proc. Int. Conf. Biometrics (ICB), Jun. 2009, pp. 132–141.

[26] X. Geng, Z.-H. Zhou, and K. Smith-Miles, “Automatic age estimation based on facial aging patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 12, pp. 2234–2240, Dec. 2007.

[27] X. Geng, C. Yin, and Z.-H. Zhou, “Facial age estimation by learning from label distributions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 10, pp. 2401–2412, Oct. 2013.

[28] Z. Tan, J. Wan, Z. Lei, R. Zhi, G. Guo, and S. Z. Li, “Efficient group-n encoding and decoding for facial age estimation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 11, pp. 2610–2623, Nov. 2018. doi: 10.1109/TPAMI.2017.2779808.

[29] Z. Kuang, C. Huang, and W. Zhang, “Deeply learned rich coding for cross-dataset facial age estimation,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops (ICCVW), Dec. 2015, pp. 338–343.

[30] K. Li, J. Xing, C. Su, W. Hu, Y. Zhang, and S. Maybank, “Deep cost-sensitive and order-preserving feature learning for cross-population age estimation,” presented at the IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018.

[31] J. Wan, Z. Tan, G. Guo, S. Z. Li, and Z. Lei, “Auxiliary demographic information assisted age estimation with cascaded structure,” IEEE Trans. Cybern., vol. 48, no. 9, pp. 2531–2541, Sep. 2018.

[32] J. Xing, K. Li, W. Hu, C. Yuan, and H. Ling, “Diagnosing deep learning models for high accuracy age estimation from a single image,” Pattern Recognit., vol. 66, pp. 106–116, Jun. 2017.

[33] Q. Tian and S. Chen, “Joint gender classification and age estimation by nearly orthogonalizing their semantic spaces,” Image Vis. Comput., vol. 69, pp. 9–21, Jan. 2018.

[34] The IMDB-WIKI Dataset. [Online]. Available: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

[35] B.-C. Chen, C.-S. Chen, and W. H. Hsu, “Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset,” IEEE Trans. Multimedia, vol. 17, no. 6, pp. 804–815, Jun. 2015.

[36] K. Ricanek and T. Tesafaye, “MORPH: A longitudinal image database of normal adult age-progression,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), Apr. 2006, pp. 341–345.

[37] The FG-Net Aging Database. [Online]. Available: https://fipa.cs.kit.edu/433_451.php

[38] S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou, “AgeDB: The first manually collected, in-the-wild age database,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2017, pp. 51–59.

[39] A. Lanitis, C. J. Taylor, and T. F. Cootes, “Toward automatic simulation of aging effects on face images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 442–455, Apr. 2002.

[40] A. Lanitis, C. Draganova, and C. Christodoulou, “Comparing different classifiers for automatic age estimation,” IEEE Trans. Syst. Man, Cybern. B, Cybern., vol. 34, no. 1, pp. 621–628, Feb. 2004.

[41] X. Geng, Z.-H. Zhou, Y. Zhang, G. Li, and H. Dai, “Learning from facial aging patterns for automatic age estimation,” in Proc. 14th ACM Int. Conf. Multimedia (MM), Oct. 2006, pp. 307–316.

[42] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Process. Lett., vol. 23, no. 10, pp. 1499–1503, Oct. 2016.

[43] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. 25th Int. Conf. Neural Inf. Process. Syst. (NIPS), Dec. 2012, pp. 1097–1105.

[44] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent. (ICLR), Jan. 2015, pp. 1–14.

[45] Z. He et al., “Data-dependent label distribution learning for age estimation,” IEEE Trans. Image Process., vol. 26, no. 8, pp. 3846–3858, Aug. 2017.

[46] G. Guo, Y. Fu, C. R. Dyer, and T. S. Huang, “Image-based human age estimation by manifold learning and locally adjusted robust regression,” IEEE Trans. Image Process., vol. 17, no. 7, pp. 1178–1188, Jul. 2008.

[47] X. Geng, Q. Wang, and Y. Xia, “Facial age estimation by adaptive label distribution learning,” in Proc. Int. Conf. Pattern Recognit. (ICPR), Aug. 2014, pp. 4465–4470.

[48] H. Han, C. Otto, X. Liu, and A. K. Jain, “Demographic estimation from face images: Human vs. machine performance,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 6, pp. 1148–1161, Jun. 2015.

[49] H. Pan, H. Han, S. Shan, and X. Chen, “Mean-variance loss for deep age estimation from a face,” presented at the IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018.

[50] H. Liu, J. Lu, J. Feng, and J. Zhou, “Group-aware deep feature learning for facial age estimation,” Pattern Recognit., vol. 66, pp. 82–94, Jun. 2017.

[51] H. Liu, J. Lu, J. Feng, and J. Zhou, “Label-sensitive deep metric learning for facial age estimation,” IEEE Trans. Inf. Forensics Security, vol. 13, no. 2, pp. 292–305, Feb. 2018.

[52] H. Liu, J. Lu, J. Feng, and J. Zhou, “Ordinal deep feature learning for facial age estimation,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), Jun. 2017, pp. 157–164.

Jiu-Cheng Xie received the M.Sc. degree in pattern recognition and intelligent system from the Nanjing University of Posts and Telecommunications, China, in 2017. He is currently pursuing the Ph.D. degree with the Department of Computer and Information Science, University of Macau, China. His current research interests include biometrics, computer vision, and machine learning.

Chi-Man Pun (M’09–SM’10) received the B.Sc. and M.Sc. degrees in software engineering from the University of Macau in 1995 and 1998, respectively, and the Ph.D. degree in computer science and engineering from The Chinese University of Hong Kong in 2002. He is currently an Associate Professor and the Head of the Department of Computer and Information Science, University of Macau. He has investigated several funded research projects. He has authored or co-authored more than 100 refereed scientific papers in international journals, books, and conference proceedings. His research interests include digital image processing, multimedia forensics and watermarking, pattern recognition, and computer vision. He is a Professional Member of the ACM. He has served as an Editorial Member/Referee for many international journals, such as the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON IMAGE PROCESSING, and the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY.

