ISSN 1054-6618, Pattern Recognition and Image Analysis, 2011, Vol. 21, No. 4, pp. 681–693. © Pleiades Publishing, Ltd., 2011.

APPLICATION PROBLEMS

Analysis of Landmarks in Recognition of Face Expressions¹

N. Alugupally (a), A. Samal (a), D. Marx (b), and S. Bhatia (c)

(a) Department of Computer Science and Engineering, University of Nebraska-Lincoln
(b) Department of Statistics, University of Nebraska-Lincoln
(c) Department of Mathematics and Computer Science, University of Missouri-St. Louis
e-mail: [email protected]

Abstract—Facial expression is a powerful mechanism used by humans to communicate their emotions, intentions, and opinions to each other. The recognition of facial expressions is extremely important for a responsive and socially interactive human-computer interface. Such an interface with a robust capability to recognize human facial expressions should enable an automated system to be deployed effectively in a variety of applications, including human-computer interaction, security, law enforcement, psychiatry, and education. In this paper, we examine several core problems in face expression analysis from the perspective of landmarks and distances between them using a statistical approach. We have used statistical analysis to determine the landmarks and features that are best suited to recognize the expressions in a face. We have used a standard database to examine the effectiveness of a landmark-based approach to classify an expression (a) when a face with a neutral expression is available, and (b) when there is no a priori information about the face.

Keywords: Face expressions, statistical analysis, linear discriminant analysis, face features.

DOI: 10.1134/S105466181104002X

Received October 10, 2010

¹ The article is published in the original.


1. INTRODUCTION

The human face is a rich source of nonverbal communication and may provide important information about human thought and behavior. The facial expression is a powerful and immediate means for human beings to communicate their emotions, intentions, and opinions to each other [7]. The communicative power of the face makes it a focus of attention during social interactions. It displays emotion, regulates social behavior and signals communicative intent [7, 14]. A face is not only a multi-message system but also a multi-signal system. The signals inferred from a face can be characterized as static (such as skin color), slow (such as permanent wrinkles), and rapid (such as raising the eyebrows) [9].

The messages and signals emanating from the face form the facial expression and provide a guide to the disposition of a person. This makes the facial expression important in the manner people conduct social interactions. Human beings, both consciously and unconsciously, use facial expressions to indirectly communicate their emotions and intentions. Hence, any system that interacts with humans needs to account for facial expression as a communication tool.

In recent years, there has been a growing interest in improving all aspects of interaction between humans and computers. The emerging field of human-computer interaction (HCI) has been of interest to researchers from a number of diverse fields, including computer science, psychology, and neuroscience. The recognition of facial expressions is important for a responsive and socially interactive HCI. Besides HCI, computers with the capability to recognize facial expressions will have a wide range of applications such as security, law enforcement, psychiatry, and education. The recognition of facial expression forms a critical component of intelligent systems based on HCI. The more complex interactions in an HCI-based system take place from a human to a computer. An automated system that can determine the emotions of a person via his/her expressions provides the system with the opportunity to customize its response. For example, a robot capable of recognizing facial expressions can bring a social dimension to the interaction and can be used in daily life. A system to recognize the expression can improve the effectiveness of an online/distance learning system. An automobile may be able to detect fatigue in its driver and warn him/her. The recognition of facial expression can also be used in lie detection to provide incremental validity to polygraph examination. A long-term application of such systems may be in providing security in public places, where the expressions and other body language provide clues to a person's emotional state.

Although humans can easily recognize expressions, the development of automated systems to recognize facial expressions and to infer emotions from those expressions in real time is a challenging research topic. We are interested in the importance of feature points in a human face to differentiate between expressions.



In this paper, we examine several core problems in face expression analysis to provide a finer-grained understanding of the problem. Specifically, we have analyzed the effect of facial feature points on the expressions in the face. Our approach is based on measurements (distances and ratios) between a set of canonical feature points on a face to train an efficient classifier. We have examined the trackability of features as well as an optimal set of features for expression recognition in the presence or absence of a face with neutral expression. Using the information thus collected, we examine the effectiveness of a scheme to dynamically recognize an expression in a video sequence. All these results contribute to the understanding needed to develop automated systems that can dynamically recognize the facial expressions of humans.

2. RELATED RESEARCH

The importance of expressions in a face is well established. Psychologists have grappled with the expressions in a human face for a long time. However, the use of a computer to recognize faces and facial expressions is a relatively recent topic of research, with most of the work initiated in the past decade. In this section, we discuss the role and importance of facial expressions in the field of psychology, the development of databases of facial expressions, and the research in recognizing the expression in a face by a computer.

2.1. Psychology of Expressions

Psychologists have been working on the analysis and understanding of human facial expressions since the nineteenth century. The earliest documented research on the subject is attributed to Darwin, who hypothesized that there are universal facial expressions for emotions, that is, different cultures express the emotions on the face in the same manner [6]. However, a majority of the studies concluded that the expressions in the face could not be attributed to the emotions [22].

More recently, the research has concentrated on the classification of facial expressions. Ekman and Friesen [8] proposed six primary emotions, each of which has a prototypical facial expression involving changes in multiple regions of the face. These prototypical emotional displays are also known as the six basic emotions: happy, surprise, sad, fear, anger, and disgust. The next step was to create a quantifiable description of these emotions. The quantifiable description is provided by the Facial Action Coding System (FACS), developed by Ekman and Friesen [10]. FACS was designed based on human observations to detect subtle changes in facial features.

FACS is a detailed, technical guide that explains how to categorize facial behaviors based on the muscles that produce them. It relates the muscular action to facial appearance. It uses a set of primitive measurement units, called Action Units (AUs), each of which is a linguistic description of a combination of muscles. The richness of AUs allows for the representation of all the visible facial expressions, either by an individual AU or by a combination of a set of AUs. FACS uses 44 AUs to describe the facial expressions with regard to their location as well as their intensity. The intensity is further defined on a scale with five levels of magnitude, from trace to maximum. Some other relevant research on the interaction of expressions and emotions is presented in [18, 43, 44].

2.2. Face Expression Databases

The best known database for facial expression analysis has been developed at Carnegie Mellon University, and is known as the CMU-Pittsburgh AU-Coded Facial Expression Database, or Cohn-Kanade Database [19]. It provides a large, representative test-bed for comparative studies of different approaches to facial expression analysis. The database includes approximately 2000 image sequences from over 100 subjects.

The subjects are 100 university students enrolled in introductory psychology classes. They ranged in age from 18 to 30 years. The subject distribution across genders is 69% female and 31% male. In terms of racial distribution, 81% of the subjects are Euro-American, 13% are Afro-American, and 6% belong to other races. The Cohn-Kanade database was created in an observation room equipped with a chair for the subject and two Panasonic WV3230 cameras, each connected to a Panasonic S-VHS AG-7500 video recorder with a Horita synchronized time-code generator. The cameras were located directly in front of the subject. Some other databases are described in references [26, 47].

2.3. Automated Face Expression Analysis

Most of the early research in facial expression analysis was primarily in the field of psychology. One of the earliest references in expression analysis dates back to 1978, when Suwa et al. [39] presented a preliminary investigation in expression analysis using an image sequence. The automatic analysis of facial expressions is a complex task as physiognomies of faces vary considerably from one individual to another due to different age, ethnicity, gender, facial hair, and occluding objects such as glasses and hair. A detailed survey of many different aspects of face expression analysis has been presented by Pantic and Rothkrantz [30]. Here we briefly summarize three important stages of automated facial expression analysis: (a) face tracking, (b) facial expression data extraction, and (c) expression classification.

Face Tracking: The face tracking stage starts with the automatic detection of the face in the frame under consideration. The detection method should be able to locate faces in complex scenes, possibly with cluttered background. Some expression analysis methods require the exact position of the face to extract features of interest whereas some other methods can work with an approximate location of the face.
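As an illustration of this detection step (not the method used by any of the systems cited here), a pretrained Haar-cascade detector is one common way to obtain a face bounding box before any features are extracted; the input file name and parameter values below are illustrative only.

```python
import cv2

# Illustrative sketch: a stock OpenCV Haar cascade stands in for whatever
# detector a given expression-analysis system actually uses.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame_0001.png")             # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Returns a list of (x, y, w, h) bounding boxes, one per detected face.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```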

Tracking is more like the face detection problem, but in a dynamic environment. Hong et al. used the PersonSpotter system to perform real-time tracking of faces [17, 38]. The PersonSpotter system detects a face and creates a bounding box around it. Then, it obtains the exact dimensions of the face by fitting a labeled graph onto the bounding box. Another technique to locate the faces was developed by Essa and Pentland [12] using the view-based and modular eigenspace method developed by Pentland et al. [33].

Facial Expression Data Extraction: The data extraction for facial expressions takes place after the face is successfully tracked. This step can be categorized into two classes: feature-based and template-based data extraction. The feature-based data extraction methods use texture and spatial information of the features in a face, such as eyes, mouth, eyebrows, and nose, to classify facial expression. The template-based methods use 2-D or 3-D models of head and face as templates to extract information that can be used to classify the expression. Essa et al. proposed a 3-D facial model augmented by anatomically-based muscles [12]. They used a Kalman filter in correspondence with optical flow computation to extract muscle action in order to form a new model of facial action. Tomasi and Kanade [42] developed a feature tracker based on the matching measure known as sum of squared intensity differences (SSD), using a translation model. This was followed by an affine transformation model by Shi and Tomasi [37]. The tracker described by Shi and Tomasi [37] is based on earlier work by Lucas, Kanade, and Tomasi [24, 42], and is commonly known as the KLT tracker. The KLT tracker locates the features by examining the minimum eigenvalue of gradient matrices. The features are then tracked by using the Newton-Raphson method to minimize the difference between two frames, using a small window around the location of the point in the two consecutive frames.
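A minimal sketch of such a KLT-style pipeline, using OpenCV's minimum-eigenvalue (Shi-Tomasi) feature selection followed by pyramidal Lucas-Kanade tracking; the frame file names and parameter values are assumptions for illustration and are not taken from the cited papers.

```python
import cv2

prev = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)   # hypothetical frames
curr = cv2.imread("frame_0002.png", cv2.IMREAD_GRAYSCALE)

# Select features where the minimum eigenvalue of the gradient matrix is large,
# the selection criterion used by the KLT tracker described above.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=100, qualityLevel=0.01,
                              minDistance=7, useHarrisDetector=False)

# Lucas-Kanade tracking: iteratively minimizes the intensity difference between
# small windows around each point in the two consecutive frames.
new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None,
                                                winSize=(15, 15), maxLevel=2)
tracked = new_pts[status.flatten() == 1]   # keep only successfully tracked points
```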

Expression Classification: Expression classification is the final step performed in a facial expression recognition system. The classification is done either (a) directly from the face or its features or (b) by recognizing the action units (AUs) first and then using the FACS [10] to classify the expressions.

FACS-based Classification: In FACS-based classification, a system uses temporal information in terms of AUs to discriminate between the expressions. Such systems may use optical flow to track the motion of facial features such as eyes, brows, nose, and mouth in a rectangular bounding box. This system requires the manual initialization of all the bounding boxes surrounding the facial features at the outset. It then performs the tracking of features for all frames from neutral to full-blown expression. Seyedarabi et al. [35] developed a facial expression recognition system to classify expressions based on facial features using two classifiers: neural networks and a Fuzzy Inference System. Essa et al. developed an optical flow based system to recognize action units from facial motions [12]. Pantic and Rothkrantz [31] describe a rule-based system to classify expressions based on 30 feature points. A total of 31 AU classes are considered in the system. Lien et al. use a Hidden Markov Model (HMM) based model to recognize the AUs [23]. El Kaliouby and Robinson use a multi-level dynamic Bayesian network classifier to model complex mental states [11]. Mental states include agreement, concentrating, disagreement, interested, thinking, and unsure. Support vector machines and variants have been used widely to classify facial expressions [20, 27, 46]. Kotsia and Pitas [21] use geometric deformations at some candidate nodes in a face during face expressions to train a multiclass support vector machine to classify the expressions. Valstar and Pantic describe a hybrid model that combines a support vector machine and an HMM classifier to recognize the AUs [45]. The hybrid model performed better than a system that uses only an SVM. Tian et al. track the lips and use neural networks to identify 16 AUs in face image sequences [41]. Pantic and Patras use a particle filtering approach to recognize 27 AUs by tracking the movements of feature points in the face [28, 29]. An information fusion approach using Dynamic Bayesian networks (DBNs) to model the temporal dynamics of facial expressions is described in [16, 50]. The recognition of facial expressions is accomplished by fusing the information from the current as well as past observations. Bartlett et al. present early work in detection of spontaneous facial expressions using a FACS based system [2]. They use support vector machines and AdaBoost to predict the intensity of the AUs in each frame of the video.

Non-FACS Based Classification: Gokturk et al. [15] proposed a novel 3-D model-based face tracker, which does not require the user to pose in a 3-D position. In this approach, pose and shape characteristics are factored into two separate signature vectors through tracking. The facial pose is recognized using monocular 3-D tracking of the face in each frame. Using a trained support vector machine depending on the pose, the shape vector is passed to the machine and the expression is recognized. Cohen et al. [5] used a naive Bayes classifier to recognize the emotions through facial expressions displayed in a video sequence. A wire frame model with 12 facial motion measurements based on the model proposed by Tao and Huang [40] is used for facial expression recognition. The 12 features correspond to the magnitude of the 12 facial motion measurements defined in the face model, and the combination of these features defines the 7 basic classes of facial expressions, including neutral. Training and testing of the classifier is done in both person-dependent and person-independent manner using 6 subjects. Zhao et al. [52] use a fuzzy kernel along with support vector machines to classify the six basic expressions. Bindu et al. [3] use a 3-D cognitive model to represent expressions with positive and negative reinforcers. Yeasin et al. [49] use a discrete HMM to train the characteristics of the expressions. Anderson and McOwan [1] have described a real-time system for facial expressions in dynamic and cluttered scenes using optical flow. Motion signatures are obtained from the face and classified using support vector machines. Displacements of a set of feature points are used by Zhou et al. [53] to classify expressions as well. Yang et al. proposed a framework for video-based facial expression recognition [48]. They use a Haar-like feature as face representation and cluster the features for different expressions to derive temporal patterns. They use a boosting approach to design classifiers. They show the effectiveness of the approach using the Cohn-Kanade database. Zhao and Pietikainen extend the concept of texture to the temporal domain and use it for expression analysis [51]. The texture is modeled with volume local binary patterns that combine both motion and appearance. To manage computational costs, the co-occurrences on only three orthogonal planes are considered.

In summary, a significant amount of work has already been performed in automating the process of recognizing expressions from a face. In recent years, the focus has been on FACS-based methods. Despite much progress, there are still significant challenges in solving this problem. Our research adds to the body of knowledge by examining several central issues including trackability of features and optimal features (distances and displacements) based on fiducial points. We also present a statistical approach to classify faces with expressions and provide a mechanism to classify expressions dynamically.

3. DATA SETS

There is no standard dataset that is universally used for uniform comparison of different algorithms. Many researchers use the Cohn-Kanade database [19] to analyze face expressions, even though there are some limitations. We have also used the same database for our experiments. Details of this database are described in Section 2.2. In addition, we use a database of feature points that is described below.

Feature Points Database: Expressions manifest themselves as changes in the face, which can be tracked by monitoring the location of key features. For example, the eyebrows become more elliptical (raised up) while expressing surprise and do not change their shape during happiness. For our research, we have extracted the location of a set of features in the face images. These are obtained in two steps. First, the key points are manually located in the first image using a graphical user interface. Then these features are tracked using an automated method. Figure 1 shows the features that are used in our research. The set of points are based on features used for face reconstruction surgery [13] and have also been used for a number of face recognition applications [34, 36].

We have adapted the automated tracker developed by Lucas and Kanade [24] to track the feature points in successive image frames as the face changes expression from neutral to a full-blown expression. In our experiments, we use a 9 × 9 pixel window to search for the feature point in the current frame, based on its location in the previous frame. The window size has been determined experimentally and is related to the image resolution; it can be adapted for larger images appropriately. Once the features are tracked over all the frames, their locations are stored in a database of feature points. All subsequent analysis is performed using these feature points. It should be noted that a manual initial step is not a serious limitation of our work. Many automated feature extraction algorithms are being developed [4] and they can be used to locate the initial feature points. Furthermore, in many applications in which the identity of a person is known, the feature points can be extracted once and stored in a database. These can be used to perform an initial tracking to locate the feature points in the first frame.
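One possible implementation of this tracking loop is sketched below. It is our illustrative reconstruction rather than the authors' code; it assumes the 23 initial landmark coordinates have already been marked in the first frame, and the 9 × 9 search window matches the size reported above.

```python
import cv2
import numpy as np

def track_landmarks(frames, initial_points):
    """Track manually initialized landmarks across an expression sequence.

    frames: list of grayscale images (neutral to full-blown expression).
    initial_points: (N, 2) float32 array of landmark locations in frames[0].
    Returns an array of shape (len(frames), N, 2), i.e. the feature point database.
    """
    pts = initial_points.reshape(-1, 1, 2).astype(np.float32)
    trajectory = [pts.reshape(-1, 2).copy()]
    for prev, curr in zip(frames[:-1], frames[1:]):
        # Lucas-Kanade tracking with a 9x9 search window around each point's
        # location in the previous frame.
        pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None,
                                                  winSize=(9, 9), maxLevel=2)
        trajectory.append(pts.reshape(-1, 2).copy())
    return np.stack(trajectory)
```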

4. ANALYSIS OF FACIAL EXPRESSIONS

The goal of this section is to answer some fundamental questions that will be helpful in developing an automated system. These questions are:

1. Which feature points in the face are easy to track during the development of expressions?

2. What is the best feature set to recognize expressions (a) if the identity of the person is not known and (b) if the identity of the person is known and we have a neutral face image?

3. How well can we classify expressions with the optimal set of features?

4. How do we build an optimized system to recognize expressions in a dynamic setting?

Fig. 1. Feature Points in the Face.

4.1. Tracking Efficiency

The basis of an automated expression recognition system is tracking the motion of the relevant landmarks in a human face. Detecting feature points dynamically is a challenge due to large variations in facial size, feature location, pose, lighting conditions, and motion patterns. Some features are tied to more rigid parts of the face, e.g., corners of the eyes, while others have greater degrees of freedom in their movements, e.g., middle parts of the lips. Some features are more stable for some expressions but unstable for others. Identification of features that can be reliably tracked will provide guidance to automated expression analysis. For our analysis, we use the Cohn-Kanade database described in Section 2.2. We have manually identified a set of 23 feature points for each starting frame as shown in Fig. 1. The initial feature points are then passed to an automated tracker which follows the points in successive frames until the end of the expression.

To determine the effectiveness of the tracker, we analyzed the tracked points individually and as a function of expressions. The feature points are classified into three categories based on how well they are tracked.

1. Correct: If a point is correctly tracked, it is classified as correct.

2. Lost: If the tracker is not able to find a feature point, it is labeled as lost. This generally happens because the location of the feature point in successive frames is significantly different.

3. Drifted: If the tracker finds a match for the feature point in a frame, but does not do it accurately, we classify it as drifted.

Figure 2a shows the percentage of correctly tracked features for all types of expressions combined. Figure 2b shows the percentage of feature points that are correctly tracked for different expressions. Figure 2 shows that the features are best tracked (80% on the average) during sad, happy, fear, and anger expressions. Tracking is most difficult during the surprise expression (70% on the average). This is because there is a sudden change in the location of features during a surprise expression. The feature points fall out of the search window for tracking, resulting in the failure of the tracker.

Feature point 20 (middle of the mouth) is the most difficult point to track in general and is especially hard for surprise, happy, and fear. During these expressions, the mouth opens, making the tracking of the point very difficult as the composition of the point disappears. Feature points 22 and 23 (left and the midpoint of the upper lip) are also relatively hard to track (60% on average for all types of expressions) because the points lack a distinct window that can be tracked in the consecutive frames. The uppermost points on the lip (feature points 16, 17, and 18) are successfully tracked around 70% of the time. Feature points 1, 10, 13, 14, and 15 (midline of the forehead, inner and outer side of the left eye, and lowest point on the left eye) are tracked most successfully for all types of expressions due to relatively small displacement during the evolution of the expression.

Based on these observations, our general conclusion is that feature points 20, 22, and 23 should not be used for analyzing expressions due to their poor trackability (less than 50%). Furthermore, since the uppermost points of the lips are also lost most of the time, we took an average of those three points (16, 17, and 18) to increase the tracking percentage. Thus, we limit ourselves to a total of 18 feature points that are easy to track for our analysis. Next, we will look at the set of features that are used to recognize an expression.

Fig. 2. Success rates for tracking individual feature points.

4.2. Optimal Feature Set to Recognize an Expression

We select features to recognize an expression by analyzing two different scenarios: (a) when we have a face with expression, but no a priori information is available, and (b) when a neutral face is also available for the same subject. The motivation to use a small subset of feature points is a significant reduction in recognition time, a critical issue in real time applications.

Analysis with no a priori knowledge: When there is no a priori information about the face, our goal is to classify a given face into either neutral or one of the six basic expressions. Our dataset consists of 598 images: 299 images with neutral expressions, and 299 images with the six full-blown expressions. For our analysis, we start with the set of 18 feature points described earlier. We use two types of distance between each pair of feature points and use the following convention to represent them.

• Ei,j—Euclidean distance between feature points Pi and Pj, 1 ≤ i, j ≤ 18 and j > i.

• Bi,j—Block distance between the feature points Pi and Pj, where 1 ≤ i, j ≤ 18 and j > i.

The Euclidean distance d_E and the block distance d_B between the points P_i and P_j are given by

d_E(P_i, P_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2},

d_B(P_i, P_j) = |x_i - x_j| + |y_i - y_j|.

Thus a total of 153 values for each of the Euclidean distance and the block distance are computed. After computing these distances, we normalize them by using the distance between the left end of the left eye and the right end of the right eye. Our goal is to analyze how well a face can be classified into one of the expressions based on the Euclidean distance, block distance, and their combination. We used stepwise discriminant analysis to obtain the 10 most important distances between different feature points (Table 1).
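The distance computation for a single frame can be sketched as below. The normalization indices are illustrative placeholders for the two outer eye corners; the actual landmark numbering follows Fig. 1 and is not assumed here.

```python
import numpy as np
from itertools import combinations

def pairwise_distances(points, norm_idx=(9, 12)):
    """Compute normalized Euclidean and block distances for all point pairs.

    points: (18, 2) array of tracked feature points for one frame.
    norm_idx: indices of the two outer eye corners (illustrative values);
              their distance normalizes for face size.
    """
    scale = np.linalg.norm(points[norm_idx[0]] - points[norm_idx[1]])
    euclid, block = {}, {}
    for i, j in combinations(range(len(points)), 2):   # 153 pairs for 18 points
        d = points[i] - points[j]
        euclid[(i, j)] = np.hypot(d[0], d[1]) / scale        # E_{i,j}
        block[(i, j)] = (abs(d[0]) + abs(d[1])) / scale      # B_{i,j}
    return euclid, block
```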

Figure 3 shows the recognition rates for the three approaches for different expressions. It shows that images with the sad expression have the lowest recognition rate. The most common misclassifications for these images are fear and neutral. Overall, the worst recognition rate is observed for images with the expression of fear when using the block distance. Images with neutral, surprise, and disgust expressions have the highest recognition rates in all modes. Using both Euclidean and block distances in conjunction resulted in better overall performance (82.6%) than using only the Euclidean distance (79.8%) or the block distance (78.4%).

Table 1. Most significant distances of different types (without a priori knowledge)

Step   Euclidean distance   Block distance   Mixed distances
1      E04,18               B04,18           E04,18
2      E17,18               B16,18           E17,18
3      E08,16               B17,18           E08,16
4      E08,09               B08,09           E08,09
5      E03,15               B03,15           E03,15
6      E12,18               B02,17           E12,18
7      E12,17               B02,13           E12,17
8      E16,18               B04,16           B12,17
9      E16,17               B14,18           B12,18
10     E12,16               B06,17           E04,16

Fig. 3. Recognition rates for the different expressions with different distances.

Analysis with a priori knowledge of the neutral face: In many applications, it is reasonable to expect that a neutral face of a person is available at the outset. Examples of such situations include human computer interfaces where the expressions are likely to be monitored continuously. In such cases, in addition to the Euclidean and block distances, we can use displacement of the features from the neutral positions.


We can use both horizontal and vertical displacement as defined below:

Displacement vectors: Δi,h and Δi,v represent the horizontal and vertical displacement of feature point i from the neutral face to the expressive face.

Since we use 18 feature points, we have a total of 36 displacements, 153 Euclidean distances, and 153 block distances. Using these variables, we use discriminant analysis to get an estimation of the classification of various faces. Table 2 summarizes the 10 best features, as given by their index in the set of 18 features, for each category determined using a stepwise discriminant procedure. Figure 4 shows the recognition rates for the four approaches for different expressions. The overall recognition rate using displacement only is 84%. If both displacement and distance are allowed in the analysis, the best variables are able to explain around 46.5% of the difference between various expressions. It should be noted that some of the distances in Table 2 are different from the ones listed in Table 1, which lists the significant features when no a priori information is available.
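A hedged sketch of this selection step: forward sequential selection wrapped around an LDA classifier stands in for the stepwise discriminant procedure (the paper's analysis was not necessarily carried out with this library), and the input files are hypothetical stand-ins for the feature database described in Section 3.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

# X: one row per image; columns stack the 36 displacements, 153 Euclidean and
# 153 block distances (342 candidate features). y: expression labels.
X = np.load("features.npy")          # hypothetical files standing in for the
y = np.load("labels.npy")            # Cohn-Kanade derived feature database

lda = LinearDiscriminantAnalysis()
# Forward sequential selection approximates a stepwise discriminant procedure:
# at each step the feature that most improves cross-validated accuracy is added.
selector = SequentialFeatureSelector(lda, n_features_to_select=10,
                                     direction="forward", cv=5)
selector.fit(X, y)
print("Selected feature indices:", np.flatnonzero(selector.get_support()))
```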

Table 2. The 10 most important distances of different types (with a priori knowledge)

Step   Displacements   Euclidean distances   Block distances   Distances and displacements
1      Δ02,v           E15,17                B02,17            E05,17
2      Δ18,v           E15,18                B15,17            E15,17
3      Δ17,h           E03,18                B02,15            B02,15
4      Δ15,v           E02,14                B03,17            B02,17
5      Δ15,h           E15,16                B01,16            E03,17
6      Δ01,v           E01,14                B03,14            Δ01,h
7      Δ14,v           E02,17                B13,16            E03,14
8      Δ17,v           E13,14                B03,13            B15,17
9      Δ03,v           E01,18                B02,08            E01,18
10     Δ02,h           E02,03                B04,07            B01,02

Fig. 4. Recognition rates for the expressions with displacements and distances.

4.3. Expression Classification

We use a statistical approach to classify a face with an expression. For our analysis, we use the complete Cohn-Kanade database, with each set of images starting from a neutral expression, gradually progressing through the expression, and ending with a full-blown expression. We use linear discriminant analysis (LDA) for the final classification of the face. The features provided to LDA are the optimal set of features described in Section 4.2.

We determine the probability of a frame being one of the six expressions, based on the features in that frame, using discriminant analysis. The frame is classified into the expression with the highest probability. We perform a separate analysis for each type of expression to determine how quickly the expression is recognized. We examine the progression of the assigned probabilities for all expressions for the images as the face changes its expression from neutral to that particular expression. In each case, we average the probabilities over the number of such expression sequences.


Assuming that the prior probabilities of group membership (in our case the six expressions) are known and that the group-specific densities at an expression x can be estimated, we can compute p(t | x), the probability of x belonging to group t, by applying Bayes' theorem:

p(t \mid x) = \frac{q_t f_t(x)}{f(x)},    (1)

where q_t is the prior probability of membership in group t, f_t is the probability density function for group t, and f(x) = \sum_t q_t f_t(x) is the estimated unconditional density at x. Linear discriminant analysis partitions a p-dimensional vector space into a set of regions {R_t}, where the region R_t is the subspace containing all p-dimensional vectors y such that p(t | y) is the largest among all groups. An observation is classified as coming from a group t if it lies in the region R_t. Assuming that each group has a multivariate normal distribution, linear discriminant analysis develops a function or classification criterion using a measure of squared Mahalanobis distance. The classification criterion is based on the individual within-group covariance matrices. Each observation is placed in the class from which it has the smallest squared Mahalanobis distance. The squared Mahalanobis distance from x to group t, where V_t is the within-group covariance matrix, is given by

d_t^2(x) = (x - m_t)' V_t^{-1} (x - m_t),    (2)

where m_t is the p-dimensional vector containing the variable means in group t. The group-specific density estimate at x from group t is then given by

f_t(x) = (2\pi)^{-p/2} |V_t|^{-1/2} \exp(-0.5\, d_t^2(x)).    (3)

Combining Equations 1 and 3, the posterior probability of x belonging to group t is given by

p(t \mid x) = \frac{\exp(-0.5\, D_t^2(x))}{\sum_u \exp(-0.5\, D_u^2(x))}.    (4)

The discriminant scores D_t^2(x) can be calculated using the linear discriminant functions derived by LDA. An observation is classified into group u if setting t = u produces the largest value of p(t | x) or the smallest value of D_u^2(x). The linear functions derived by LDA to calculate the discriminant scores for the six expressions are:

D_anger = –3 – 13 E05,17 + 135 E15,17 – 52 B02,15 + 26 B02,17 – 3 E03,17 + 39 Δ01,v – 27 E03,14 – 113 B15,17

D_disgust = –4 – 24 E05,17 + 100 E15,17 – 66 B02,15 + 58 B02,17 + 76 E03,17 + 13 Δ01,v + 16 E03,14 – 36 B15,17

D_fear = –2 + 4 E05,17 – 10 E15,17 + 16 B02,15 – 17 B02,17 + 14 E03,17 + 13 Δ01,v – 14 E03,14 + 34 B15,17

D_happy = –5 – 0 E05,17 + 204 E15,17 + 3 B02,15 – 10 B02,17 + 86 E03,17 – 20 Δ01,v + 87 E03,14 – 100 B15,17

D_sad = –0 + 8 E05,17 + 31 E15,17 – 4 B02,15 – 4 B02,17 + 12 E03,17 + 16 Δ01,v + 3 E03,14 – 36 B15,17

D_surprise = –9 + 33 E05,17 + 106 E15,17 – 16 B02,15 + 31 B02,17 + 35 E03,17 – 12 Δ01,v – 48 E03,14 – 50 B15,17

Once the discriminant scores (-0.5 D_u^2(x)) for each expression are calculated from these equations, the posterior probability of each expression is calculated using Equation 4. Figure 5 shows the probabilities for different expressions for the images as a neutral face is transformed into a face with different expressions. Figure 5a shows the probabilities for an average anger image sequence. It can be seen that the probabilities for the other four expressions (disgust, fear, happy, and surprise) begin at zero and remain negligible throughout the development of the expression. The probability of anger gradually goes up as we progress through the expression while the probability of sad goes down simultaneously. By the seventh frame the probabilities of anger and sad are equal, at around 0.45. By about the 14th frame, the probability for anger reaches around 0.7. Similarly, Figs. 5b–5f show the probabilities for different expressions for the images as a neutral face is transformed into a face with disgust, fear, happy, sad, and surprise, respectively.
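The posterior computation of Equations (1)-(4) can be sketched directly; the group means, covariances, and priors below are placeholders for the quantities estimated from the training data.

```python
import numpy as np

def posteriors(x, means, covs, priors):
    """Posterior probability of each expression for a feature vector x (Eq. 4).

    means[t]: mean vector m_t of group t; covs[t]: within-group covariance V_t;
    priors[t]: prior probability q_t. All are stand-ins for values estimated
    from the training sequences.
    """
    log_dens = []
    for m, V, q in zip(means, covs, priors):
        diff = x - m
        d2 = diff @ np.linalg.inv(V) @ diff                        # Eq. (2)
        _, logdet = np.linalg.slogdet(V)
        log_f = -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d2)  # log of Eq. (3)
        log_dens.append(np.log(q) + log_f)
    log_dens = np.array(log_dens)
    w = np.exp(log_dens - log_dens.max())    # normalize as in Eqs. (1) and (4)
    return w / w.sum()
```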

4.4. Design of a Dynamic Expression Classifier

A critical issue in any real-time application, where the expression must be determined almost immediately, is the number of features used in the analysis. More features result in added tracking time and greater complexity in the classifier. Figure 6a shows the overall recognition rates of the expressions as a function of the number of features. It shows that the performance levels off after about the top 8 features. Therefore, we use only those features for our dynamic classifier (shown in Fig. 6b). A red line indicates the Euclidean distance between two feature points, a green line indicates a block distance, and a blue circle indicates the displacement of the point between the current frame and the initial frame.

Our goal in the design of the dynamic classifier is to recognize the expression accurately and as early as possible when the expression develops in a face. The classifier should identify the initial few frames as neutral and, as the expression slowly develops, it should classify the frames as mixed (neutral/expression). As the full-blown expression is formed, the classifier should classify the expression correctly. In order to develop this dynamic classifier, it is important to understand the progression of probabilities of different expressions during the process of expression development (shown in Figs. 5a–5f). We make the following observations from the figures.


—All the expressions start with a probability of 0.7 for sad while the other five expressions have their probabilities below 0.2.

—For the five expressions (anger, disgust, fear, happy, surprise), when the probability of classification reaches 0.5 in a frame, it can be safely classified as that expression.

—If the probability of sad remains over 0.75, we can safely classify the expression as sad.

Using these observations, we can classify full-blown expressions accurately. In order to classify mixed expressions, we use the following two heuristics.

—If the probability of one of the five expressions is between 0.3 and 0.5, we can say that the face is changing from neutral to expression and can be classified as mixed.

—If the probability of sad is between μs ± σs, we classify the frame as mixed (sad), where μs is the average value of the probability of the sad expression in the first frame and σs is the standard deviation.

If none of the above conditions hold, the frame is classified as neutral. The schematic for the dynamic classifier is given in Fig. 7.

Fig. 5. Progression of probabilities of different expressions for different image sequences.

Fig. 6. Recognition rates for all the expressions as a function of the number of features, and the features used in the dynamic classifier (Δ01,v, B02,15, E03,14, E15,17, B02,17, E03,17, E05,17, and B15,17).
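The decision rules above can be collected into a small frame-level classifier. The following is a sketch of the logic in Fig. 7, with the thresholds taken from the text; the posterior vector p and the values of μs and σs are assumed to be supplied by the caller.

```python
EXPRESSIONS = ["anger", "disgust", "fear", "happy", "sad", "surprise"]

def classify_frame(p, mu_sad, sigma_sad):
    """Apply the dynamic-classifier rules to one frame's posterior vector.

    p: dict mapping expression name to posterior probability (from Eq. 4).
    mu_sad, sigma_sad: mean and standard deviation of the sad probability in
    the first (neutral) frames, estimated from training sequences.
    """
    others = [e for e in EXPRESSIONS if e != "sad"]
    # Full-blown expressions.
    for e in others:
        if p[e] > 0.5:
            return e
    if p["sad"] >= 0.75:
        return "sad"
    # Mixed (neutral/expression) frames.
    for e in others:
        if p[e] > 0.3:
            return "mixed: neutral/" + e
    if mu_sad - sigma_sad < p["sad"] < mu_sad + sigma_sad:
        return "mixed: neutral/sad"
    return "neutral"
```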

In order to analyze the performance of the algorithm, we extracted a set of frames from the Cohn-Kanade database. Since it is difficult to determine exactly when a frame turns from neutral to a mixed expression and then to a full-blown expression, we decided to use the following rules to label frames. We assumed the first three frames in a sequence to be neutral, the last three frames to be the full-blown expression, and the middle three frames to be mixed. Table 3 shows the performance of the classifier for these three sets of frames. From the table we see that we were able to correctly classify the first set of frames as neutral (or neutral/expression) 95% of the time. The middle set is classified into a wrong expression only about 5% of the time. It is interesting to note that, in a large number of instances, by the time we reach the middle frames, the expression has already been fully formed. The final full-blown expression is classified correctly 96% of the time, either as the full-blown expression or as neutral/expression. From this table it is clear that the classifier is sufficiently accurate for dynamic applications. Some reasons for misclassification are (a) wrong ground truth, (b) classifier error, and (c) incomplete or early formation of expression.

Fig. 7. Schematic for the dynamic classifier.

Table 3. Performance of the dynamic classifier (%) for frames labeled neutral, mixed (Neutral/Expr), and full-blown expression

Frame set       Neutral   Neutral/Expr   Expression   Other
Neutral            93.6            1.2          2.5     2.7
Neutral/Expr       18.1           13.0         63.9     4.9
Expression          0.9            2.8         92.8     3.5

5. CONCLUSION AND FUTURE WORK

In this research, we studied how the expressions can be recognized in a human face based on the change in facial features. The automatic analysis of the face and facial expressions is rapidly evolving into a mature scientific discipline. Our research is able to answer some of the fundamental questions useful in developing automated systems to recognize expressions based on landmarks.

We have analyzed how facial expressions can be recognized using Euclidean and block distances, and displacement of the feature points in a face using statistical analysis. We have identified the features and distances which are useful in differentiating between expressions. Our results show that it is possible to get accurate results with as few as 8 features. We have also developed an algorithm to dynamically classify expressions as they are being formed and evaluated its effectiveness.


An important extension to our manual extraction of features can be to locate the feature points without any assistance from the user. In this framework, if the user goes out of frame and returns, the application can relocate the points and restart. In addition, the search window to locate a feature in a new frame can be dynamically adjusted by examining the probabilities of different expressions. Further research can be done on measuring the intensity of an expression on the face. Identification of expressions other than the six primary ones can also be useful in many applications. The motion of the features can be modeled using techniques such as particle filtering to get a better characterization of the expressions. The research can also be extended to examine datasets with spontaneous expressions [32] and challenging illumination conditions [25].

REFERENCES

1. K. Anderson and P. W. McOwan, "A Real-Time Automated System for the Recognition of Human Facial Expressions," IEEE Trans. Syst., Man Cybernet. B 36, 96–105 (2006).

2. M. S. Bartlett, G. C. Littlewort, M. G. Frank, C. Lainscsek, I. R. Fasel, and J. R. Movellan, "Automated Face Analysis by Feature Point Tracking Has High Concurrent Validity with Manual FACS Coding," J. Multimedia 1, 22–35 (2006).

3. M. H. Bindu, P. Gupta, and U. S. Tiwary, "Cognitive Model-Based Emotion Recognition from Facial Expressions for Live Human Computer Interaction," in CIISP 2007: Proc. IEEE Symp. on Computational Intelligence in Image and Signal Processing (Honolulu, 2007), pp. 351–356.

4. P. Campadelli and R. Lanzarotti, "Fiducial Point Localization in Color Images of Face Foregrounds," Image Vision Comput. 22, 863–872 (2004).

5. I. Cohen, N. Sebe, A. Garg, L. S. Chen, and T. S. Huang, "Facial Expression from Video Sequences: Temporal and Static Modeling," Computer Vision Image Understand. 91, 160–187 (2003).

6. C. Darwin, The Expression of the Emotions in Man and Animals (Univ. Chicago Press, 1965).

7. P. Ekman, "Facial Expression and Emotion," Am. Psychol. 48, 384–392 (1993).

8. P. Ekman and W. V. Friesen, "Constants across Cultures in the Face and Emotion," J. Person. Social Psychol. 17, 124–129 (1971).

9. P. Ekman and W. V. Friesen, Unmasking the Face: A Guide to Recognizing Emotions from Clues (Prentice-Hall, Englewood Cliffs, 1975).

10. P. Ekman and W. V. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement (Consulting Psychologists Press, Palo Alto, 1978).

11. R. El Kaliouby and P. Robinson, "Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures," in CVPRW 04: Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition Workshops (Washington, 2004), p. 154.

12. I. A. Essa and A. P. Pentland, "Coding, Analysis, Interpretation and Recognition of Facial Expressions," IEEE Trans. Pattern Anal. Mach. Intell. 19, 757–763 (1997).

13. L. G. Farkas, Anthropometry of the Head and Face (Raven Press, New York, 1994).

14. A. J. Fridlund, Human Facial Expression: An Evolutionary View (Acad. Press, San Diego, 1994).

15. S. B. Gokturk, J.-Y. Bouguet, C. Tomasi, and B. Girod, "Model-Based Face Tracking for View-Independent Facial Expression Recognition," in Proc. 5th Int. Conf. on Automatic Face and Gesture Recognition (Washington, 2002), p. 287.

16. H. Gu, Y. Zhang, and Q. Ji, "Task Oriented Facial Behavior Recognition with Selective Sensing," Comp. Vision Image Understand. 100, 385–415 (2005).

17. H. Hong, H. Neven, and C. von der Malsburg, "Online Facial Expression Recognition Based on Personalized Galleries," in Proc. 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition (Nara, 1998), pp. 354–359.

18. W. James, "What Is an Emotion?," Mind 9, 188–205 (1884).

19. T. Kanade, J. Cohn, and Y. Tian, "Comprehensive Database for Facial Expression Analysis," in Proc. 4th IEEE Int. Conf. on Automatic Face and Gesture Recognition (Grenoble, 2000), pp. 46–53.

20. I. Kotsia, N. Nikolaidis, and I. Pitas, "Facial Expression Recognition in Videos Using a Novel Multi-Class Support Vector Machines Variant," in ICASSP 2007: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (Honolulu, 2007), Vol. 2, pp. 585–588.

21. I. Kotsia and I. Pitas, "Facial Expression Recognition in Image Sequences Using Geometric Deformation Features and Support Vector Machines," IEEE Trans. Image Processing 16, 172–187 (2007).

22. C. Landis, "Studies of Emotional Reactions: II. General Behavior and Facial Expression," J. Comparative Psychol. 4, 447–510 (1924).

23. J. J. Lien, T. Kanade, J. Cohn, and C. Li, "Detection, Tracking and Classification of Action Units in Facial Expression," J. Robotics Autonom. Syst. 31, 131–146 (2000).

24. B. D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," in Int. Joint Conf. on Artificial Intelligence (Vancouver, 1981), pp. 674–679.

25. Machine Perception Laboratory, MPLab GENKI-4K Face, Expression, and Pose Dataset. Available from: http://mplab.ucsd.edu/wordpress/?page_id=398

26. D. Matsumoto and P. Ekman, Japanese and Caucasian Facial Expressions of Emotion (JACFEE) (Intercultural and Emotion Research Laboratory, Department of Psychology, San Francisco State Univ., 1998) (Unpublished Slide Set).

27. P. Michel and R. El Kaliouby, "Real Time Facial Expression Recognition in Video Using Support Vector Machines," in Proc. 5th Int. Conf. on Multimodal Interfaces (Vancouver, 2003), pp. 258–264.

28. M. Pantic and I. Patras, "Detecting Facial Actions and Their Temporal Segments in Nearly Frontal-View Face Image Sequences," in Proc. IEEE Int. Conf. on Systems, Man and Cybernetics (Hawaii, 2005), pp. 3358–3363.

29. M. Pantic and I. Patras, "Dynamics of Facial Expression: Recognition of Facial Actions and Their Temporal Segments from Face Profile Image Sequences," IEEE Trans. Syst., Man Cybern. B 36, 433–449 (2006).

30. M. Pantic and L. J. M. Rothkrantz, "Automatic Analysis of Facial Expressions: The State of the Art," IEEE Trans. Pattern Anal. Mach. Intell. 22, 1424–1445 (2000).

31. M. Pantic and L. J. M. Rothkrantz, "Expert System for Automatic Analysis of Facial Expression," Image Vision Comput. 18, 881–905 (2000).

32. M. Pantic, M. F. Valstar, R. Rademaker, and L. Maat, "Web-based Database for Facial Expression Analysis," in Proc. IEEE Int. Conf. on Multimedia and Expo (ICME'05) (Amsterdam, 2005), pp. 317–321.

33. A. Pentland, B. Moghaddam, and T. Starner, "View-Based and Modular Eigenspaces for Face Recognition," in CVPR'94: Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (Seattle, 1994), pp. 84–91.

34. A. Samal, V. Subramanian, and D. Marx, "Sexual Dimorphism in Human Faces," J. Visual Commun. Image Rep. 18, 453–463 (2007).

35. H. Seyedarabi, A. Aghagolzadeh, and S. Khanmohammadi, "Recognition of Six Basic Facial Expressions by Feature-Point Tracking Using RBF Neural Network and Fuzzy Inference System," in ICME 04: Proc. IEEE Int. Conf. on Multimedia and Expo (Taipei, 2004), pp. 1219–1222.

36. J. Shi, A. Samal, and D. Marx, "Face Recognition Using Landmark-Based Bidimensional Regression," Comp. Vision Image Understand. 102, 117–133 (2006).

37. J. Shi and C. Tomasi, "Good Features to Track," in CVPR'94: Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (Seattle, 1994), pp. 593–600.

38. J. Steffens, E. Elagin, and H. Neven, "PersonSpotter – Fast and Robust System for Human Detection, Tracking and Recognition," in Proc. 3rd Int. Conf. on Automatic Face and Gesture Recognition (Nara, 1998), pp. 516–521.

39. M. Suwa, N. Sugie, and K. Fujimora, "A Preliminary Note on Pattern Recognition of Human Emotional Expression," in Proc. 4th Int. Joint Conf. on Pattern Recognition (Kyoto, 1978), pp. 408–410.

40. H. Tao and T. S. Huang, "Connected Vibrations: A Modal Analysis Approach to Non-Rigid Motion Tracking," in CVPR 98: Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (Santa Barbara, 1998), pp. 753–750.

41. Y.-L. Tian, T. Kanade, and J. F. Cohn, "Recognizing Action Units for Facial Expression Analysis," IEEE Trans. Pattern Anal. Mach. Intell. 23, 97–115 (2001).

42. C. Tomasi and T. Kanade, "Detection and Tracking of Point Features," Carnegie Mellon Univ. Tech. Rep. No. CMU-CS-91-132 (1991).

43. S. S. Tomkins, "The Role of Facial Response in the Experience of Emotion: A Reply to Tourangeau and Ellsworth," J. Person. Social Psychol. 40, 355–357 (1981).

44. S. S. Tomkins, "Affect Theory," in Emotion in the Human Face, Ed. by P. Ekman, 2nd ed. (Cambridge Univ. Press, 1982).

45. M. F. Valstar and M. Pantic, "Combined Support Vector Machines and Hidden Markov Models for Modeling Facial Action Temporal Dynamics," in Proc. IEEE Int. Workshop on Human-Computer Interaction (Rio de Janeiro, 2007), pp. 188–197.

46. Q. Xu, P. Zhang, W. Pei, L. Yang, and Z. He, "An Automatic Facial Expression Recognition Approach Based on Confusion-Crossed Support Vector Machine Tree," in ICASSP 2007: Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (Honolulu, 2007), pp. 625–628.

47. Y.-L. Xue, X. Mao, and F. Zhang, "Beihang University Facial Expression Database and Multiple Facial Expression Recognition," in Proc. 5th Int. Conf. on Machine Learning and Cybernetics (Dalian, 2006), pp. 3282–3287.

48. P. Yang, Q. Liu, and D. N. Metaxas, "Boosting Coded Dynamic Features for Facial Action Units and Facial Expression Recognition," in Proc. CVPR (Minneapolis, 2007).

49. M. Yeasin, B. Bullot, and R. Sharma, "Recognition of Facial Expressions and Measurement of Levels of Interest from Video," IEEE Trans. Multimedia 8, 500–508 (2006).

50. Y. Zhang and Q. Ji, "Active and Dynamic Information Fusion for Facial Expression Understanding from Image Sequences," IEEE Trans. Pattern Anal. Mach. Intell. 27, 699–714 (2005).

51. G. Zhao and M. Pietikainen, "Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions," IEEE Trans. Pattern Anal. Mach. Intell. 29, 915–928 (2007).

52. H. Zhao, Z. Wang, and J. Men, "Facial Complex Expression Recognition Based on Fuzzy Kernel Clustering and Support Vector Machines," in ICNC 07: Proc. 3rd Int. Conf. on Natural Computation (Haikou, 2007), pp. 562–566.

53. G. Zhou, Y. Zhan, and J. Zhang, "Facial Expression Recognition Based on Selective Feature Extraction," in ISDA 06: Proc. 6th Int. Conf. on Intelligent Systems Design and Applications (Jinan, 2006), pp. 412–417.


Nripen Alugupally completed his MS from the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. He is currently working as a Software Engineer at Pacific Gas & Electric.

David Marx is a Professor in the Statistics Department at the University of Nebraska-Lincoln. His research interests include spatial statistics, linear and non-linear models, and biometrics. He has authored or co-authored over 150 papers.

Sanjiv Bhatia is an Associate Professor in the Department of Computer Science and Mathematics at the University of Missouri at St. Louis. His research interests include algorithms, computer vision, and image analysis.

Ashok Samal is a Professor in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. His research interests include image analysis including biometrics, document image analysis and computer vision applications. He has published extensively in these areas.