Facial Feature Detection Survey

Coursework Submission Cover Sheet

Student No: 58407053
Degree Scheme: MEng Electronic Engineering
Student Name: Luke Gahan
Year: 2013
Module: EE544
Lecturer: Prof Paul F. Whelan
Title: Computer Vision
Hours spent on this exercise:

I hereby declare that the attached submission is all my own work, that it has not previously been submitted for assessment, and that I have not knowingly allowed it to be used by another student. I understand that deceiving or attempting to deceive examiners by passing off the work of another as one's own is not permitted. I also understand that using another student's work or knowingly allowing another student to use my work is against the University regulations and that doing so will result in loss of marks and possible disciplinary proceedings.

Signed:                Date:

For use by examiners only (students should not write below this line)

Comments:


EE544 Facial Feature Detection Review Luke Gahan

Abstract

This survey examines a number of facial feature detection techniques with a view to developing a head tilt estimator. The techniques examined include appearance based methods, model based methods, anthropometry based methods and 3D vision based methods. Each technique is discussed with regard to its functionality and performance. A conclusion is presented which provides the author's opinion on the best technique for feature detection and the best technique for the task of head tilt estimation.


TABLE OF CONTENTS

1. Introduction
   1.1 Feature Selection
   1.2 Survey Structure
2. Facial Feature Detection
   2.1 Appearance Based Methods
      2.1.1 Colour Based Methods
      2.1.2 Gabor Filters
   2.2 Model Based Methods
      2.2.1 ASM & AAM
   2.3 Anthropometry Based Methods
   2.4 3D Vision Based Methods
3. Conclusion


1. INTRODUCTION

Facial feature detection has applications in several areas including facial recognition, facial pose estimation, facial reconstruction, medical diagnostics and multimedia applications. Facial feature detection involves determining the location of several key locations on the face (often referred to as landmarks). Once the location of these landmarks has been determined, this information can be used for some specific purpose such as individual identification in a facial recognition system.

This survey paper examines a number of the methods used for facial feature detection. This is done with a view towards selecting a method to be applied to head tilt estimation. The survey begins with a discussion of the choice of facial features to be detected. This is followed by an outline of the taxonomy used in this survey. The body of the survey is then presented, covering all the facial feature detection techniques examined. The survey finishes with a conclusion which discusses the field as a whole, as well as possible future developments in the area.

1.1 FEATURE SELECTION

The choice of which particular facial features should be detected is specific to each algorithm. In general the points detected are some subset of the set shown in Figure 1. These are the most significant facial features, but a particular application may require that some other features are located. As is the case with all interest points, certain performance criteria should be applied to the detectors (facial feature detectors are interest point detectors in the facial domain). Specifically, the feature detector should have the following attributes [1]:

High information content
  o Detect all (or most) facial features
  o No false detections
  o Well localised
  o Robust to noise
  o Efficient detection
Repeatable under transforms (including different facial orientations)


FIGURE 1 - FACIAL LANDMARKS [2]

1.2 SURVEY STRUCTURE

The techniques discussed in this survey are examined under the headings outlined below. Some of the methods described combine a number of techniques and could be included under more than one heading. In such a case, the technique is included under the heading which best characterises it.

Appearance Based Methods: Techniques which primarily use pixel intensity information.

Model Based Methods: Techniques which use a trained model to detect features. This model is usually created using appearance and/or shape information.

Anthropometry Based Methods: Techniques which use knowledge of the location of and distance between various facial landmarks to detect features. This knowledge is obtained by examining the distances between features on the human face.

3D Vision Based Methods: Techniques which use depth information to allow facial features to be detected.

2. FACIAL FEATURE DETECTION

2.1 APPEARANCE BASED METHODS

The methods described in this section primarily use pixel intensity information for the purpose of facial feature detection. This may be binary, grayscale or colour information.


2.1.1 COLOUR BASED METHODS

The lips, eyes, eyebrows and mouth are particularly well suited to detection using colour based methods. Obviously the most significant drawback of a colour based method is that such systems only work on colour images and can only be implemented in real world systems where colour image capturing devices are available (which is not always the case). There are also a number of issues with regard to lighting and colour detection which need to be taken into account. Another significant issue is the variation in skin and feature colour between people from different ethnic backgrounds.

Hsu [3] presents a very interesting technique which first uses skin colour to detect faces in an image and then uses shape information to decide if a face has been successfully detected. The algorithm identifies the eye and mouth regions as well as the facial boundary. The first step of the algorithm is to normalise the image to overcome lighting issues which often influence colour detection. The CbCr colour space is used for skin detection. This is a subspace of the YCbCr space which is obtained by a non-linear transform ((Cb/Y) – (Cr/Y)). The authors claim that this space works well for both light and dark skin tones. To detect the eyes, a combination of colour transforms and morphological operations is carried out to emphasise the eye region. This procedure is shown in Figure 2. The method will not work correctly in cases where the eyes are closed or the resolution of the eye area is poor. The mouth region is detected by exploiting the red/blue colour difference in the lip region. Morphological operations are used to enhance the detected region. This procedure is shown in Figure 3. The locations are verified by examining the triangle produced by joining the three detected features. The Hough transform is used to find an estimate of the elliptical face boundary.
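The red/blue chroma idea behind the mouth map can be sketched as follows. This is a minimal illustration of the general approach rather than Hsu et al.'s actual algorithm: the threshold value and the simple shift-based dilation are assumptions made for the example.

```python
import numpy as np

def dilate(mask, iterations=1):
    """3x3 binary dilation implemented with array shifts (no external libraries)."""
    out = mask.copy()
    for _ in range(iterations):
        padded = np.pad(out, 1, constant_values=False)
        acc = np.zeros_like(out)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                acc |= padded[1 + dy : padded.shape[0] - 1 + dy,
                              1 + dx : padded.shape[1] - 1 + dx]
        out = acc
    return out

def mouth_map(rgb, thresh=40):
    """Crude mouth-region map in the spirit of [3]: the lips are redder
    (and less blue) than the surrounding skin, so a red/blue difference
    highlights them. `thresh` is an illustrative value, not from the paper."""
    r = rgb[..., 0].astype(int)
    b = rgb[..., 2].astype(int)
    mask = (r - b) > thresh            # threshold the chroma difference
    return dilate(mask, iterations=2)  # morphological clean-up of the blob

# Synthetic test image: grey "skin" with a small red "lip" patch.
img = np.full((40, 40, 3), 120, dtype=np.uint8)
img[20:24, 15:25, 0] = 200   # boost red in the lip region
img[20:24, 15:25, 2] = 60    # suppress blue in the lip region
m = mouth_map(img)
cy, cx = np.argwhere(m).mean(axis=0)  # centroid of the detected mouth blob
```

In a full system the blob would then be verified against the detected eye locations, as in Hsu's eye/mouth triangle check.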

FIGURE 2 - IDENTIFICATION OF EYE REGION [3]


FIGURE 3 - IDENTIFICATION OF MOUTH REGION [3]

The results for detection of facial features are reasonable considering the simplicity of the algorithm. For both the low resolution (150x220) and higher resolution (640x480) data sets this method has a detection rate of over 89% for frontal images. The system recorded a detection rate of ~74% for half-profile faces and 18% for profile faces. The poor result for profile faces is due to the fact that the algorithm breaks down when both eyes aren't visible. While the simplicity of this algorithm is appealing, the fact that both of the subject's eyes must be open and visible means that it is limited in its application. Assuming image sizes aren't too large, the computational efficiency of this method should be quite good. The authors recorded a processing time of 0.08s per image for the 150x220 images. Another point worth noting is that this method does not require any training.

2.1.2 GABOR FILTERS

Gabor filters are widely used in the area of facial feature detection. These filters, which are named after Dennis Gabor, are orientation sensitive filters which can be used for texture analysis. In the spatial domain the filter g(x,y) is a Gaussian kernel function modulated by a sinusoidal plane wave, as given by the equation below [5]:

g(x,y) = s(x,y) · wr(x,y)

where s(x,y) is a complex sinusoid known as the carrier and wr(x,y) is a 2D Gaussian shaped function known as the envelope (the terminology is due to the fact that Dennis Gabor's research was concerned with the communications field). A group of filters (a Gabor filter bank) can be generated for different scales and orientations and then convolved with an image to generate a "Gabor jet". The Gabor jet is the response of the feature to the Gabor filter bank.
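The construction of a filter bank and a jet can be sketched as follows (real part of the filter only; the scale and orientation parameters are illustrative choices, not those of any published system):

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a Gabor filter: a 2D Gaussian envelope wr(x,y)
    modulating a sinusoidal carrier s(x,y), as in the equation above."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate to orientation theta
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_jet(patch, scales=(4, 8), orientations=4):
    """A 'jet': responses of one image patch to a small filter bank
    over several scales and orientations."""
    jet = []
    for wl in scales:
        for k in range(orientations):
            theta = k * np.pi / orientations
            g = gabor_kernel(patch.shape[0], wl, theta, sigma=wl)
            jet.append(float(np.sum(patch * g)))    # filter response at the patch
    return np.array(jet)

# A vertical-stripe patch responds most strongly to the matching orientation.
y, x = np.mgrid[0:15, 0:15]
patch = np.cos(2 * np.pi * x / 8.0)     # vertical stripes, wavelength 8
jet = gabor_jet(patch, scales=(8,), orientations=4)
best = int(np.argmax(np.abs(jet)))      # index of the strongest response
```

For the stripe patch above, the strongest response is the theta = 0 filter, whose carrier runs in the same direction as the stripes.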

One of the most significant applications of Gabor filters to the task of facial feature detection was carried out by Wiskott [6]. The authors propose building a Face Bunch Graph (FBG) which is said to "cover all possible variations in the appearances of faces" [7]. This primarily involves applying the Elastic Bunch Graph Matching technique to the detection of facial features. Figure 4 provides a diagrammatic representation of the FBG. Each node is a facial feature. At each particular node a number of stacks can be seen; each of these is a Gabor jet for that particular feature. A set of jets at a given node is known as a bunch. A bunch includes jets that cover as many variations as possible (i.e. at the eye node: eye closed, eye open, male eye, female eye, etc.). This FBG can then be used to detect facial features in some unknown image. Classification is carried out by determining the


distance between the unknown feature wavelet and the “best fitting jet” (in dark grey in Figure 4) for

that particular feature.
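This matching step can be sketched as below. The similarity used here is a normalised dot product of jet magnitudes, a simplified, phase-ignoring form of the jet similarity used in elastic bunch graph matching; the toy jets are invented for illustration.

```python
import numpy as np

def jet_similarity(j1, j2):
    """Normalised dot product of jet magnitudes; 1.0 means identical
    up to scale (a simplified form of the EBGM jet similarity)."""
    a, b = np.abs(j1), np.abs(j2)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_fitting_jet(unknown, bunch):
    """Index of the jet in the bunch (e.g. eye open, eye closed, ...)
    most similar to the unknown feature's jet, plus that similarity."""
    sims = [jet_similarity(unknown, j) for j in bunch]
    return int(np.argmax(sims)), max(sims)

# Toy bunch of three jets at one node; the unknown is close to jet 1.
bunch = [np.array([1., 0., 1., 0., 1., 0., 1., 0.]),
         np.array([0., 1., 0., 1., 0., 1., 0., 1.]),
         np.array([1., 1., 0., 0., 1., 1., 0., 0.])]
unknown = np.array([0.1, 1.0, 0.1, 0.9, 0.0, 1.1, 0.1, 1.0])
idx, sim = best_fitting_jet(unknown, bunch)  # picks jet 1 as the best fit
```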

FIGURE 4 - FACE BUNCH GRAPH [7]

The authors of [7] evaluate the system in terms of its performance with regard to face recognition rather than feature detection (the percentage of faces recognised rather than the detection of actual feature points). The performance is on a par with most benchmark systems for images where the faces are front facing. The performance is very poor (<20%) for cases where there is an angle of >45° between the subject and the camera. The authors suggest that performance could be improved by using PCA to reduce the dimensionality of the FBG.

2.2 MODEL BASED METHODS

2.2.1 ASM & AAM

Active shape models (ASMs) were first proposed by Cootes [6]. This method is also known as "smart snakes" because of its similarity to the active contour models (snakes) method. However, rather than a contour being initialised and then allowed to deform to fit some arbitrary shape, a shape template model is used for detection. The shape model is created from annotated images of the feature to be detected, as shown in Figure 5. Once all points are registered and aligned using Procrustes analysis, Principal Component Analysis is used to reduce the dimensionality of the model, leaving only the most significant modes of variation. This model is then placed on the image and allowed to deform in an attempt to identify a face in the image. The model places a constraint on the shapes which can be detected, so only valid shapes (those consistent with the training set) can be detected. This method can be extended by including the underlying grey level information in the model. This approach, also developed by Cootes, is known as Active Appearance Models [8].
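The core of the shape model, Procrustes alignment aside, can be sketched as follows. This is a minimal point-distribution-model illustration, not Cootes' implementation; the toy training shapes are invented for the example and assumed to be pre-aligned.

```python
import numpy as np

def build_shape_model(shapes, n_modes=2):
    """Point distribution model from pre-aligned landmark shapes.
    Each shape is a flattened (x1, y1, x2, y2, ...) vector; PCA via
    SVD keeps only the most significant modes of variation."""
    X = np.asarray(shapes, dtype=float)
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_modes]          # mean shape + main modes

def reconstruct(mean, modes, b):
    """Generate a valid shape: x = mean + b @ P. Constraining the
    coefficients b keeps the shape within the variation seen in training."""
    return mean + np.asarray(b) @ modes

# Toy training set: a 3-point shape whose middle point slides up and down.
base = np.array([0.0, 0.0, 1.0, 0.0, 2.0, 0.0])
shapes = [base + t * np.array([0, 0, 0, 1, 0, 0])
          for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
mean, modes = build_shape_model(shapes, n_modes=1)
shape = reconstruct(mean, modes, [0.7])   # a new, plausible shape instance
```

Fitting an ASM to an image alternates between moving each landmark towards nearby evidence (e.g. strong edges) and projecting the result back into this constrained shape space.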


FIGURE 5 - ASM TRAINING IMAGE [9]

There are a number of issues associated with this method. These include difficulty in initialising the location of the model, scaling and rotation issues, detection of multiple faces and dealing with images where the face isn't frontal facing. These issues aren't specific to facial feature detection, and a number of solutions have been proposed. These include the use of colour transformations (to identify the face location), Gaussian pyramids (to deal with scale issues), multiple orientation evaluation (to address rotation) and the use of multiple models in a single image (to deal with multiple faces in an image). Milborrow [10] looked at a number of different enhancements to the ASM method with respect to facial feature detection. The author found that increasing the number of landmarks used in training leads to an increase in the accuracy of detection. Figure 6 shows the decrease in errors as the number of landmark points is increased. The author noted that processing time increases linearly with the number of landmark points. The fact that more landmark points lead to an increase in the accuracy of detection is beneficial to the task of facial feature detection, as it means that more points are available for further processing (e.g. for use in a facial recognition system).

FIGURE 6 - ERROR VS. NO. LANDMARKS[10]


Active shape/appearance models have also been used in the detection of individual facial features, and a combination of such systems could potentially be used. For example, [11] uses active shape models in the localisation of the lip region, though it should be noted that the authors state that "another image processing algorithm" is used to first define a region of interest for the lips. A shape model is placed in the centre of the region of interest at the mouth and used to detect the lip contour. The authors present favourable results, with the majority classed as "good" (which the authors define as the entire lip contour being within the detected region). This paper really only shows a proof of concept, and more robust testing and refinement of the algorithm seem to be required.

Both ASMs and AAMs appear to be well suited to facial feature detection, but perhaps they do not provide a very accurate result in terms of feature localisation. Rather, these methods can be used to detect a feature, but perhaps another method should be used to determine exact feature locations.

2.3 ANTHROPOMETRY BASED METHODS

A lot of work has been done in the medical field in the area of anthropometry. Anthropometry, which literally means "the measurement of man", is the study of the distances and proportions between different locations on the human body. In the case of facial feature detection we are obviously most interested in facial anthropometry.

Some of the most important work in this area was carried out by Leslie Gabriel Farkas, MD. In his work "Anthropometric Facial Proportions in Medicine" he presented a large number of facial proportions along with their mean and standard deviation values for a group of over 2500 healthy subjects [12]. This information can be used in the development of facial feature detection systems. It should be emphasised that these proportions are for healthy humans; studies have shown that facial asymmetry is a marker for certain neurological diseases.

Sohail [13] presents a method where anthropometric information is used in the detection of facial features in 2D images. Rather than using all of the Farkas points, the authors choose a subset of them, as well as some of their own landmarks. Figure 7 shows the points selected (a) and the anthropometric distances (b) used (note, the landmarks shown are used to define search areas and are not themselves the facial features to be detected). The authors calculated these distances from a set of 300 images of 150 subjects. The algorithm begins by detecting the eye centres using a method proposed by another author; it correctly detects the eye centres for 99% of the images in their database. The location of the eye centres is used to calculate the tilt of the head, and this information is then used to align the head with the y axis. The two eye centres serve as the basis for the detection of all other facial features. Using the measured anthropometric distances, search areas can be defined around each of the points (P1, P2, ...). The authors use the areas around these seven points to identify 18 different facial features. The eye corners and eyelid mid points are detected by examining the contours of the eyelids in binary images. The ends of the eyebrows are detected by isolating the eyebrow using Otsu thresholding and selecting the widest points of the remaining largest blob. The nostrils are assumed to be the darkest points around the nose tip (P6) region; a Laplacian of Gaussian filter is used to detect these regions. For detection of points around the mouth the authors use a non-linear intensity transform which exploits the difference in intensity between the lips and the surrounding skin. Thresholding is performed and the remaining mouth blob is examined to identify the corners of the mouth and the midpoints of the upper and lower lips. The authors tested the system on three well known face databases and obtained an average accuracy of 90.44% for the detection of all 18 facial features. The authors note that the system has trouble with faces tilted more than 25° from the y axis. The algorithm also assumes a frontal facing subject. While there are better performing systems


than this one, its main strength is that it uses knowledge of the facial structure to determine regions of interest for particular facial features. By narrowing down the search region for particular features using prior knowledge, the methods of feature detection can be more specific (and simpler). The system performs quite well considering the relatively simple nature of the algorithm and the fact that it can detect 18 facial features. A negative aspect of this system is that it relies on the correct detection of the centre of both eyes before it can identify any additional features.
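The general flow of tilt estimation and proportion-based search regions can be sketched as follows. The ratios used are illustrative placeholders, not Farkas's measured proportions or the values used in [13].

```python
import math

def tilt_angle(left_eye, right_eye):
    """Head tilt in degrees, from the line joining the two eye centres."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def search_region(left_eye, right_eye, down_ratio, half_ratio):
    """Square search box for a feature below the eye midpoint. All
    distances are proportions of the inter-eye distance; the ratios
    are illustrative placeholders, not values from Farkas [12]."""
    d = math.dist(left_eye, right_eye)
    mx = (left_eye[0] + right_eye[0]) / 2
    my = (left_eye[1] + right_eye[1]) / 2
    cy = my + down_ratio * d           # estimated feature centre, below eyes
    half = half_ratio * d              # half-width of the search box
    return (mx - half, cy - half, mx + half, cy + half)

# Eyes level, 60 px apart: no tilt, mouth box centred below the midpoint.
le, re = (100.0, 100.0), (160.0, 100.0)
angle = tilt_angle(le, re)
box = search_region(le, re, down_ratio=1.1, half_ratio=0.35)
```

A dedicated detector (thresholding, Laplacian of Gaussian, etc.) is then run only inside each box, which is what keeps the per-feature detectors simple.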

FIGURE 7 - SELECTED LANDMARKS (A) & DISTANCES (B) [13]

Gupta [14] was the first to present a feature detection method which combines facial anthropometric information and 3D images. The system uses both 2D texture and 3D range images for the detection of the 10 facial features shown in Figure 8. These points are selected because the proportions between them account for the large variations from person to person (this system identifies facial features for face recognition). Like the system mentioned above, this system identifies one feature and then uses anthropometric distances to define search regions for additional features. The system begins by using the Iterative Closest Point (ICP) algorithm to find an estimate for the nose tip location. It then uses the Gaussian and elliptical curvatures of the range image to detect the exact nose tip (the peak in Gaussian curvature at the nose tip is used, as in the method covered in section 2.4). Anthropometric proportions are used to define a search area for the nose width points. A Laplacian of Gaussian edge detector is used to find the width points; the widest points nearest the nose tip in the horizontal direction are taken as the width points. The surface at the inner eye corners is concave, and this fact is used to detect inner eye corner estimates from the range image (also used in the method by [15] in section 2.4). Using proportions defined by Farkas [12] and the nose tip and width points, the search region for the inner eye corners is defined around the estimated locations. The Elastic Bunch Graph Matching (EBGM) technique is used to detect the inner eye corner within the search region. This technique uses a bunch of Gabor jets for detection in a similar fashion to the technique used by [6], described in section 2.1.2 above. The centre of the nose root is taken to be the mean of the two eye corner locations. The search regions for the outer corners of the eyes are defined using the location of the inner eye corners and some anthropometric information, and EBGM is used in 2D to find the outer eye corners. Mean and Gaussian curvature values are used to find estimates for the mouth corners, while the search areas are defined using anthropometric proportions. EBGM is once again employed in the detection of the exact locations of the mouth corners.


FIGURE 8 - POINTS TO BE DETECTED [14]

Most of the testing carried out on this algorithm is done with regard to face recognition. For the detection of facial features the authors present their results in terms of the standard deviation, in pixels, of the detected points from the actual feature locations. These results are shown in Table 1 below. This shows that the standard deviation of the error for all features is ~2 pixels (in the images used, 1 pixel = 0.32mm, so 2 pixels is still under 1mm). This method has some very good results but its computational performance is very slow. The ICP algorithm used to find an estimate for the nose tip is particularly computationally expensive (though this can be carried out offline). The training of the EBGM used for the detection of the various features is also a significant cost (this is also done offline). The highly accurate but complex nature of this system suggests that it would be better suited to medical applications or systems where real time processing is not required.

TABLE 1 - FEATURE DETECTION RESULTS


2.4 3D VISION BASED METHODS

3D vision techniques for facial feature detection have become much more popular in recent years. This is due to the increased availability of 3D image capturing devices. By definition all 3D images include depth information, but some 3D image acquisition systems also capture colour (or greyscale) information which may also be used in feature detection. Some techniques use just 3D information while other techniques use both texture and range information.

A technique proposed by Segundo [15] uses just range images for facial feature detection. The algorithm employed is based on methods developed for 2D images but applied to 3D range images. The authors use a combination of clustering, edge detection and the Hough transform to first isolate the face region. K-means clustering is used to identify the background, body and face in the range image. Edge detection is used to identify the facial edges in the image. A Hough transform is then used to detect the elliptical face boundary (as used by Hsu on 2D images in the method described in section 2.1.1). Once the face has been localised, this region of interest is examined for facial features. The Gaussian and elliptical curvatures of the range image are calculated and used to determine the location of the nose tip, exploiting the fact that the largest Gaussian curvature values occur around the nose. Using similar curvature information the nose width points can also be located. Curvature information is once again used to detect the eye corners, this time searching for pits rather than peaks. As shown in Table 2, this algorithm performs extremely well. While these figures are very impressive, it should be noted that the 3D images in this database were captured under laboratory conditions; the performance would no doubt be lower in real world applications. A method such as this does seem very well suited to medical applications.
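The curvature-based nose tip idea can be sketched on a synthetic range image as follows. This is a simplified illustration: the real pipeline in [15] first segments the face and combines several curvature cues, whereas here a single finite-difference Gaussian curvature map and a depth peak are used.

```python
import numpy as np

def gaussian_curvature(z):
    """Gaussian curvature of a range image z(x, y) via finite
    differences: K = (zxx*zyy - zxy^2) / (1 + zx^2 + zy^2)^2."""
    zy, zx = np.gradient(z)            # first derivatives (rows = y)
    zxy, zxx = np.gradient(zx)         # second derivatives of zx
    zyy, _ = np.gradient(zy)
    return (zxx * zyy - zxy**2) / (1 + zx**2 + zy**2) ** 2

def nose_tip_estimate(z):
    """Crude nose tip locator: the depth peak among the strongest
    Gaussian curvature points (the method in [15] additionally works
    inside a segmented face region)."""
    K = gaussian_curvature(z)
    mask = K > 0.5 * K.max()           # keep the strongest curvature peaks
    ys, xs = np.nonzero(mask)
    best = np.argmax(z[ys, xs])        # the closest (highest) such point
    return ys[best], xs[best]

# Synthetic range image: a smooth Gaussian "nose" bump on a flat face.
y, x = np.mgrid[0:41, 0:41]
z = 5.0 * np.exp(-((x - 25) ** 2 + (y - 18) ** 2) / 30.0)
tip = nose_tip_estimate(z)             # lands on the bump's apex
```

Pit-like features such as the inner eye corners can be found with the same curvature map by looking for concave rather than convex extrema.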

TABLE 2 - FRGC 2.0 RESULTS [15]

Wang [16] presents a method which uses 2D texture images and 3D range images to detect facial features. The authors devise a system which detects four features from the 3D range image and 10 from the 2D texture image. Four particular points are detected in both 2D and 3D. Figure 9 shows the features which are detected by the algorithm; those marked X are detected in both 2D and 3D. Gabor filters are used for the detection of the features in 2D space, in the same manner as Wiskott uses them in [7] (as described in section 2.1.2). To detect the features in 3D the authors use the point signature method. A sphere is placed at a manually identified feature location in a 3D training image, and a curve is described by the intersection of the sphere and the face surface. This curve can be used to characterise a given feature and is then used on an unknown image for feature detection. The system is developed for facial recognition and the paper doesn't present results for the accuracy of feature detection. In terms of facial recognition, the authors report a recognition rate of ~90% when an SVM classifier is used with the system. The system proposed by Gupta [14] (see section 2.3) appears to perform better than this one, based on recognition results.


FIGURE 9 - FEATURES FOR DETECTION

3. CONCLUSION

The purpose of this survey paper was to review a number of facial feature detection methods with a view to choosing a technique for the purpose of head tilt estimation. This conclusion begins with my view on the best facial feature detection method in terms of correctly identifying feature locations. Following this, I give my opinion on the best method for the task of head tilt estimation.

Clearly there are a vast number of approaches to facial feature detection, and to a large extent the choice of technique is application dependent. In terms of just identifying the correct location of facial features, it is very hard to look past the 3D methods proposed by Gupta [14] and Segundo [15]. Both of these methods performed extremely well in testing, and if accuracy of detection is the prime concern then one of them should be chosen. My personal preference is for Gupta's method because of the anthropometric information which is included in the algorithm. Using prior knowledge of feature positions to define the size of search regions not only reduces the complexity of the problem but also includes real world information in the system. However, the fact that these methods require 3D images means that their application is limited at the present time.

The assignment we have been set uses 2D colour images, and as a result the method proposed by Gupta cannot be used. For the purpose of head tilt estimation I think that a method based on the work by Wiskott [6] is the best approach. Gabor filters could be used for the detection of the corners of the mouth and eyes. The search region for each feature should be defined using the anthropometric proportions defined by Farkas. When using anthropometric measurements the procedure seems to be that an initial feature is detected and then search regions for subsequent features are determined from it. A colour transform like the one used by [3] could be used to identify the face, and then a red/blue transform could be used to locate the mouth. From this starting point the other features could be detected.


References

[1] P. F. Whelan, "Interest point detection," DCU EE544 course notes, 2013.

[2] P. F. Buckley et al., "A three-dimensional morphometric study of craniofacial shape in schizophrenia," Am. J. Psychiatry, 162(3), pp. 606-608, 2005.

[3] R. Hsu, M. Abdel-Mottaleb and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), pp. 696-706, 2002.

[4] J. R. Movellan, "Tutorial on Gabor filters," open source document, 2002.

[5] T. F. Cootes and C. J. Taylor, "Active shape models - 'smart snakes'," in Proc. British Machine Vision Conference.

[6] L. Wiskott et al., "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), pp. 775-779, 1997.

[7] T. F. Cootes, G. J. Edwards and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), pp. 681-685, 2001.

[8] T. F. Cootes, G. Edwards and C. J. Taylor, "Comparing active shape models with active appearance models."

[9] S. Milborrow and F. Nicolls, "Locating facial features with an extended active shape model," Computer Vision - ECCV 2008, pp. 504-513, 2008.

[10] J. Luettin, N. A. Thacker and S. W. Beet, "Active shape models for visual speech feature extraction," NATO ASI Series F: Computer and Systems Sciences, 150, pp. 383-390, 1996.

[11] L. G. Farkas and I. R. Munro, Anthropometric Facial Proportions in Medicine, 1987.

[12] A. S. M. Sohail and P. Bhattacharya, "Detection of facial feature points using anthropometric face model," Signal Processing for Image Enhancement and Multimedia Processing, pp. 189-200, 2008.

[13] S. Gupta, M. K. Markey and A. C. Bovik, "Anthropometric 3D face recognition," International Journal of Computer Vision, 90(3), pp. 331-349, 2010.

[14] M. P. Segundo, C. Queirolo, O. R. Bellon and L. Silva, "Automatic 3D facial segmentation and landmark detection," in Proc. 14th International Conference on Image Analysis and Processing (ICIAP 2007), 2007.

[15] Y. Wang, C. Chua and Y. Ho, "Facial feature detection and face recognition from 2D and 3D images," Pattern Recognition Letters, 23(10), pp. 1191-1202, 2002.