Facial Feature Detection Survey
Coursework Submission Cover Sheet

Student No: 58407053
Student Name: Luke Gahan
Degree Scheme: MEng Electronic Engineering
Year: 2013
Module: EE544
Lecturer: Prof. Paul F. Whelan
Title: Computer Vision
Hours spent on this exercise:
I hereby declare that the attached submission is all my own work, that it has not previously been submitted
for assessment, and that I have not knowingly allowed it to be used by another student. I understand that
deceiving or attempting to deceive examiners by passing off the work of another as one's own is not
permitted. I also understand that using another student's work or knowingly allowing another student to use
my work is against the University regulations and that doing so will result in loss of marks and possible
disciplinary proceedings.
Signed: Date:
For use by examiners only (students should not write below this line)
Comments:
EE544 Facial Feature Detection Review Luke Gahan
Abstract
This survey examines a number of facial feature detection techniques with a view to
developing a head tilt estimator. The techniques examined include appearance based
methods, model based methods, anthropometry based methods and 3D vision based
methods. Each technique is discussed with regard to its functionality and performance. A
conclusion is presented which provides the author's opinion on the best technique for feature
detection and the best technique for the task of head tilt estimation.
TABLE OF CONTENTS

1. Introduction
   1.1 Feature Selection
   1.2 Survey Structure
2. Facial Feature Detection
   2.1 Appearance Based Methods
       2.1.1 Colour Based Methods
       2.1.2 Gabor Filters
   2.2 Model Based Methods
       2.2.1 ASM & AAM
   2.3 Anthropometry Based Methods
   2.4 3D Vision Based Methods
3. Conclusion
1. INTRODUCTION

Facial feature detection has applications in several areas, including facial recognition, facial pose
estimation, facial reconstruction, medical diagnostics and multimedia applications. Facial feature
detection involves determining the position of a number of key locations on the face (often referred to as
landmarks). Once the location of these landmarks has been determined, this information can be used
for some specific purpose such as individual identification in a facial recognition system.
This survey paper examines a number of the methods used for facial feature detection. This is done
with a view towards selecting a method to be applied to head tilt estimation. The survey begins with a
discussion about the choice of facial features to be detected. This is followed by an outline of the
taxonomy used in this survey. The body of the survey is then presented. This covers all the facial
feature detection techniques examined. The survey finishes with a conclusion which discusses the
field as a whole, as well as possible future developments in the area.
1.1 FEATURE SELECTION

The choice of which particular facial features should be detected is specific to each different
algorithm. In general the points detected are some subset of the set shown in Figure 1. These are the
most significant facial features, but a particular application may require that other features are
located. As is the case with all interest points, certain performance criteria should be applied to the
detectors (facial feature detectors are interest point detectors in the facial domain). Specifically, the
feature detector should have the following attributes [1]:
- High information content
  o Detect all (or most) facial features
  o No false detections
  o Well localised
  o Robust to noise
  o Efficient detection
- Repeatable under transforms (including different facial orientations)
FIGURE 1 - FACIAL LANDMARKS [2]
1.2 SURVEY STRUCTURE

The techniques discussed in this survey are examined under the headings outlined below. Some of the
methods described combine a number of techniques and could be included under more than one
heading. In such a case, the technique is included under the heading which best characterises it.

- Appearance Based Methods: techniques which primarily use pixel intensity information.
- Model Based Methods: techniques which use a trained model to detect features. This model is
  usually created using appearance and/or shape information.
- Anthropometry Based Methods: techniques which use knowledge of the locations of, and
  distances between, various facial landmarks to detect features. This knowledge is obtained by
  examining the distances between features on the human face.
- 3D Vision Based Methods: techniques which use depth information to allow facial features
  to be detected.
2. FACIAL FEATURE DETECTION
2.1 APPEARANCE BASED METHODS

The methods described in this section primarily use pixel intensity information for the purpose of
facial feature detection. This may be binary, grayscale or colour information.
2.1.1 COLOUR BASED METHODS
The lips, eyes, eyebrows and mouth are particularly well suited to detection using colour based
methods. The most significant drawback of colour based methods is that they only work on colour
images and so can only be implemented in real world systems where colour image capturing devices
are available (which is not always the case). There are also a number of
issues with regard to lighting and colour detection which need to be taken into account. Another issue
of significance is the variation in skin and feature colour for people from different ethnic
backgrounds.
Hsu [3] presents a very interesting technique which first uses skin colour to detect faces in an image
and then uses the shape information to decide if a face has been successfully detected. The algorithm
identifies the eye and mouth regions as well as the facial boundary. The first step of the algorithm is
to normalise the image to overcome lighting issues which often influence colour detection. The CbCr
colour space is used for skin detection. This is a subspace of the YCbCr space, obtained by a
non-linear transform of the luma-normalised chroma components (Cb/Y and Cr/Y). The authors
claim that this space works well for both light and dark
skin tones. To detect the eyes a combination of colour transforms and morphological operations are
carried out to emphasise the eye region. This procedure is shown in Figure 2. This method will not
work correctly for cases where the eyes are closed or the resolution of the eye area is poor. The mouth
region is detected by exploiting the red/blue colour difference in the lip region. Morphological
operations are used to enhance the detected region. This procedure is shown in Figure 3. The locations
are verified by examining the triangle produced by joining the three detected features. The Hough
transform is used to find an estimation of the elliptical face boundary.
FIGURE 2 - IDENTIFICATION OF EYE REGION [3]
FIGURE 3 - IDENTIFICATION OF MOUTH REGION [3]
The results for detection of facial features are reasonable considering the simplicity of the algorithm.
For both low (150x220) and higher (640x480) resolution data sets this method has a detection rate of
over 89% for frontal images. The system recorded a detection rate of ~74% for half-profile faces and
18% for profile faces. The poor result for profile faces is due to the fact that the algorithm fails
when both eyes are not visible. While the simplicity of this algorithm is appealing, the fact that both of
the subject's eyes must be open and visible means that it is limited in its application. Assuming image
sizes aren’t too large the computational efficiency of this method should be quite good. The authors
recorded a processing time of 0.08s per image for the 150x220 images. Another point worth noting is
that this method does not require any training.
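The chroma manipulation behind the mouth map can be sketched as follows. This is a simplified illustration of the red/blue difference idea in [3] rather than the paper's exact formulation (in particular, the weighting term eta is computed adaptively from the image there; a fixed value and toy chroma values are used here):

```python
import numpy as np

def mouth_map(cr, cb, eta=0.95):
    """Emphasise the lip region using the red/blue chroma difference.

    cr, cb: Cr and Cb chroma planes as float arrays scaled to [0, 1].
    The lips have a stronger red (Cr) and weaker blue (Cb) response than
    the surrounding skin, so Cr^2 and Cr/Cb both peak around the mouth.
    """
    cr2 = cr ** 2
    ratio = cr / np.maximum(cb, 1e-6)        # red/blue difference term
    m = cr2 * (cr2 - eta * ratio) ** 2       # high response around the lips
    return m / np.maximum(m.max(), 1e-6)     # normalise to [0, 1]

# Toy 2x2 example: a "lip-like" pixel (high Cr, low Cb) responds most strongly.
cr = np.array([[0.8, 0.3], [0.4, 0.2]])
cb = np.array([[0.3, 0.5], [0.5, 0.6]])
print(mouth_map(cr, cb).argmax())  # -> 0, the lip-like pixel
```

In a full implementation the resulting map would then be thresholded and cleaned with the morphological operations described above.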
2.1.2 GABOR FILTERS
Gabor filters are widely used in the area of facial feature detection. These filters, named after
Dennis Gabor, are orientation sensitive filters which can be used for texture analysis. In the
spatial domain the filter g(x,y) is a Gaussian kernel function modulated by a sinusoidal plane wave,
as given by the equation below [5]:

g(x,y) = s(x,y) · wr(x,y)

where s(x,y) is a complex sinusoid known as the carrier and wr(x,y) is a 2D Gaussian shaped function
known as the envelope (the terminology is due to the fact that Dennis Gabor’s research was concerned
with the communications field). A group of filters (Gabor filter bank) can be generated for different
scales and orientations and then convolved with an image to generate a “Gabor Jet”. This Gabor jet is
the response of the feature to the Gabor filter bank.
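The construction of such a bank follows directly from the carrier/envelope definition above. The sketch below builds complex kernels for a handful of scales and orientations; the parameter values and names are illustrative rather than taken from any particular paper:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Gabor kernel g(x, y): a Gaussian envelope wr(x, y) modulating a
    complex sinusoidal carrier s(x, y) at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = x * np.cos(theta) + y * np.sin(theta)               # rotated axis
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))  # wr(x, y)
    carrier = np.exp(1j * 2 * np.pi * x_r / wavelength)       # s(x, y)
    return envelope * carrier

# A small bank over 2 scales and 4 orientations; convolving an image patch
# with each kernel and stacking the responses gives the patch's Gabor jet.
bank = [gabor_kernel(31, wl, th, sigma=wl / 2)
        for wl in (8, 16)
        for th in np.linspace(0, np.pi, 4, endpoint=False)]
print(len(bank), bank[0].shape)  # -> 8 (31, 31)
```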
One of the most significant applications of Gabor filters to the task of facial feature detection was
carried out by Wiskott [7]. The authors propose building a Face Bunch Graph (FBG) which is said to
“cover all possible variations in the appearances of faces” [7]. This primarily involves applying the
Elastic Bunch Graph Matching (EBGM) technique to the detection of facial features. Figure 4 provides a
diagrammatic representation of the FBG. Each node is a facial feature. At each particular node a
number of stacks can be seen. Each of these is a Gabor jet for this particular feature. A set of jets at a
given node is known as a bunch. A bunch includes jets that cover as many variations of the feature as
possible (e.g. at the eye node: closed eye, open eye, male eye, female eye etc.). This FBG can then be used
to detect facial features in some unknown image. Classification is carried out by determining the
distance between the unknown feature wavelet and the “best fitting jet” (in dark grey in Figure 4) for
that particular feature.
FIGURE 4 - FACE BUNCH GRAPH [7]
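The comparison against a bunch is usually a normalised similarity over the magnitudes of the complex filter responses. A minimal sketch, with a hypothetical three-jet bunch for a single landmark, might look like:

```python
import numpy as np

def jet_similarity(j1, j2):
    """Magnitude-only similarity between two Gabor jets (vectors of complex
    filter responses); 1.0 means identical response magnitudes up to scale."""
    a1, a2 = np.abs(j1), np.abs(j2)
    return float(a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2)))

def best_fitting_jet(unknown_jet, bunch):
    """Index of the bunch jet most similar to the unknown jet."""
    return int(np.argmax([jet_similarity(unknown_jet, j) for j in bunch]))

# Hypothetical bunch of three jets for one landmark. The unknown jet is a
# scaled copy of the second jet, so that one fits best.
bunch = [np.array([1 + 0j, 0.2j, 0.5]),
         np.array([0.1, 1j, 1.0]),
         np.array([0.7, 0.7, 0.1j])]
unknown = 2.0 * np.array([0.1, 1j, 1.0])
print(best_fitting_jet(unknown, bunch))  # -> 1
```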
The authors of [7] evaluate the system in terms of its performance with regard to face recognition
rather than feature detection (the percentage of faces recognised rather than the detection of actual
feature points). The performance is on a par with most benchmark systems for images where the faces
are front facing. The performance is very poor (<20%) for cases where there is an angle of more than
45° between the subject and
the camera. The authors suggest that performance could be improved by using PCA to reduce the
dimensionality of the FBG.
2.2 MODEL BASED METHODS
2.2.1 ASM & AAM
Active shape models (ASMs) were first proposed by Cootes [6]. This method is also known as smart
snakes due to its similarity to the active contour model (snakes) method. Rather than a contour being
initialised and then allowed to deform to fit some arbitrary shape, a shape template model is used for
detection. A shape model is created using annotated images of the feature to be detected, as
shown in Figure 5. Once all points are registered and aligned using Procrustes analysis, Principal
Component Analysis is used to reduce the dimensionality of the model leaving only the most
significant variances. This model is then placed on the image and allowed to deform in an attempt to
identify a face in the image. The model places a constraint on the shapes which can be detected, so
only valid shapes (those in the training set) can be detected. This method can be extended by
including the underlying gray level information in the model. This approach, also developed by
Cootes, is known as Active Appearance Models [8].
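The training stage just described (aligned landmark shapes reduced by PCA to a mean shape plus a few modes of variation) can be sketched as follows. The Procrustes alignment is assumed to have been done already, and the toy training shapes are illustrative:

```python
import numpy as np

def build_shape_model(shapes, var_kept=0.95):
    """Point distribution model from aligned landmark shapes.

    shapes: (n_samples, 2 * n_points) array of aligned (x, y) coordinates.
    Returns the mean shape and the principal modes that together explain
    var_kept of the total variance.
    """
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues ascending
    vals, vecs = vals[::-1], vecs[:, ::-1]     # largest first
    k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), var_kept)) + 1
    return mean, vecs[:, :k], vals[:k]

def generate_shape(mean, modes, b):
    """x = mean + P b: only shapes reachable this way are considered valid."""
    return mean + modes @ b

# Toy data: a 4-landmark square whose x coordinates vary much more than
# its y coordinates, so only a few modes are kept.
rng = np.random.default_rng(0)
base = np.array([0., 0., 1., 0., 1., 1., 0., 1.])
shapes = base + rng.normal(scale=[0.3, 0.01] * 4, size=(50, 8))
mean, modes, variances = build_shape_model(shapes)
print(modes.shape[1])  # number of retained modes
```

Constraining each parameter in b to a few standard deviations of its mode's variance is what keeps the deforming model within the space of plausible shapes.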
FIGURE 5 - ASM TRAINING IMAGE [9]
There are a number of issues associated with this particular method. These include difficulty with
initialising the location of the model, scaling and rotation issues, detection of multiple faces and
dealing with images where the face isn’t frontal facing. These issues aren’t specific to facial feature
detection, though a number of solutions have been proposed. These include the use of colour
transformations (to identify the face location), Gaussian pyramids (to deal with scale issues), multiple
orientation evaluation (to address rotation) and the use of multiple models in a single image (to deal
with multiple faces in an image). Milborrow [10] looked at a number of different enhancements to the
ASM method with respect to facial feature detection. The author found that increasing the number of
landmarks used in training leads to an increase in the accuracy of detection. Figure 6 shows the
decrease in errors as the number of landmark points is increased. The author noted that processing
time increases linearly with the number of landmark points. The fact that more landmark points lead
to an increase in detection accuracy is beneficial to the task of facial feature detection, as it
means that more points are available for further processing (e.g. for use in a facial recognition
system).
FIGURE 6 - ERROR VS. NO. LANDMARKS[10]
Active shape/appearance models have also been used in the detection of individual facial features. A
combination of these systems could potentially be used. For example, [11] uses active shape models in
the localisation of the lip region, though it should be noted that the authors state that “another image
processing algorithm” is used to first define a region of interest for the lips. A shape model is placed
in the centre of the region of interest at the mouth and used to detect the lip contour. The authors
present favourable results, with the majority of results classed as “good” (which the authors define as
the entire lip contour being within the detected region). This paper really only shows a proof of
concept, and more robust testing and refinement of the algorithm seem to be required.
Both ASMs and AAMs appear to be well suited to facial feature detection, but perhaps they do not
provide a very accurate result in terms of feature localisation. Rather, these methods can be used to
detect a feature, with another method then used to determine exact feature locations.
2.3 ANTHROPOMETRY BASED METHODS

A lot of work has been done in the medical field in the area of anthropometry. Anthropometry, which
literally means the measurement of man, is the study of the distances and proportions between
different locations on the human body. In the case of facial feature detection we are obviously most
interested in facial anthropometry.
Some of the most important work in this area was carried out by Leslie Gabriel Farkas MD. In his
work “Anthropometric Facial Proportions in Medicine” he presented a large number of facial
proportions along with their mean and standard deviation values for a group of over 2500 healthy
subjects [12]. This information can be used in the development of facial feature detection systems. It
should be emphasised that these proportions are for healthy subjects; studies have shown that facial
asymmetry is a marker for certain neurological diseases.
Sohail [13] presents a method where anthropometric methods are used in the detection of facial
features in 2D images. Rather than using all of the Farkas points, the authors choose a subset of them,
as well as some of their own landmarks. Figure 7 shows the points selected (a) and the anthropometric
distances (b) used (note, the landmarks shown below are used to define search areas but are not the
facial features to be detected). The authors calculated the distances shown below from a set of 300
images of 150 subjects. The algorithm begins by detecting the eye centres using a method proposed by
another author. The algorithm correctly detects the eye centres for 99% of the images in their
database. The location of the eye centres is used to calculate the tilt of the head. This information is then
used to correctly align the head with the y axis. The two eye centres serve as the basis for the
detection of all other facial features. Using the measured anthropometric distances search areas can be
defined around each of the points (P1,P2 …). The authors use the areas around these seven points to
identify 18 different facial features. The eye corners and eyelid mid points are detected by examining
the contours of the eyelids in binary images. The ends of the eyebrows are detected by isolating the
eyebrow using Otsu thresholding and selecting the widest points of the remaining largest blob. The
nostrils are assumed to be the darkest points around the nose tip (P6) region. A Laplacian of Gaussian
filter is used to detect these regions. For detection of points around the mouth the authors use a non-
linear intensity transform which exploits the difference in intensity between the lips and the
surrounding skin. Thresholding is performed and the remaining mouth blob is examined to identify
the corners of the mouth and the midpoint of the upper and lower lips. The authors tested the system
on three well known face databases and obtained an average accuracy of 90.44% for the detection of
all 18 facial features. The authors note that the system has trouble with faces tilted more than 25°
from the y axis. This algorithm also assumes a frontal facing subject. While there are better performing systems
than this one, its main strength is that it uses knowledge of the facial structure to determine regions of
interest for particular facial features. By narrowing down the search region for particular features
using prior knowledge the methods of feature detection can be more specific (and simple). The system
performs quite well considering the relatively simple nature of the algorithm and the fact that it can
detect 18 facial features. A negative aspect of this system is that it relies on the correct detection of
the centre of both eyes before it can identify additional features.
FIGURE 7 - SELECTED LANDMARKS (A) & DISTANCES (B) [13]
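The two steps most relevant to head tilt estimation, the tilt angle from the eye centres and an anthropometric search region scaled by the inter-ocular distance, can be sketched as below. The ratio and margin values are illustrative stand-ins for the measured proportions in [13], not the paper's actual numbers:

```python
import numpy as np

def head_tilt_deg(left_eye, right_eye):
    """In-plane head tilt (degrees) from the line joining the eye centres."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return float(np.degrees(np.arctan2(dy, dx)))

def mouth_search_region(left_eye, right_eye, ratio=1.1, margin=0.5):
    """Square search region for the mouth below the eye midpoint, with all
    distances expressed relative to the inter-ocular distance."""
    le, re = np.asarray(left_eye, float), np.asarray(right_eye, float)
    iod = np.linalg.norm(re - le)          # inter-ocular distance
    cx, cy = (le + re) / 2                 # midpoint between the eyes
    cy += ratio * iod                      # expected mouth height below it
    half = margin * iod
    return (cx - half, cy - half, cx + half, cy + half)

print(head_tilt_deg((100, 120), (160, 120)))   # level eyes -> 0.0
print(mouth_search_region((100, 120), (160, 120)))  # -> (100.0, 156.0, 160.0, 216.0)
```

Once the tilt angle is known, the image can be rotated to align the eye line with the x axis before any search regions are applied, as done in [13].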
Gupta [14] was the first to present a feature detection method which combines facial anthropometric
information and 3D images. The system uses both 2D texture and 3D range images for the detection
of the 10 facial features shown in Figure 8. These points are selected because the proportions
between them account for the large variations from person to person (this system identifies facial
features for face recognition). Like the system mentioned above, this system identifies one feature and
then uses anthropometric distances to define search regions for additional features. The system begins
by using the Iterative Closest Point (ICP) algorithm to find an initial estimate for the location of the
nose tip. The system then uses the Gaussian and elliptical curvatures of the range image to detect the
exact nose tip (the peak in Gaussian curvature at the nose tip is used, as used by the method covered
in section 2.4). Anthropometric proportions are used to define a search area for the nose width points.
A Laplacian of Gaussian edge detector is used to find the width points. The widest points nearest the
nose tip in the horizontal direction are taken as the width points. The Gaussian curvature of the inner
eye is concave and this fact is used to detect inner eye corner estimates from the range image (also
used in the method by [15] in section 2.4). Using proportions defined by Farkas [12] and the nose tip
and width points, the search region for the inner eye corners is defined around the estimate locations.
The elastic bunch graph matching technique is used to detect the inner eye corner within the search
region. This technique uses a bunch of Gabor jets for detection, in a similar fashion to the technique
used by [7] which is described in section 2.1.2 above. The centre of the nose root is taken to be the
mean of the two eye corner locations. The search regions for the outer corners of the eyes are defined
using the location of the inner eye and some anthropometric information. The EBGM is used in 2D to
find the outer eye corner. Mean and Gaussian curvature values are used to find estimates for mouth
corners while the search areas are defined using anthropometric proportions. The EBGM is once again
employed in the detection of the exact locations of the mouth corners.
FIGURE 8 - POINTS TO BE DETECTED [14]
Most of the testing carried out on this algorithm is done with regard to face recognition. For the
detection of facial features the authors present their results in terms of the standard deviation in pixels
from the detected points to the actual locations of the features. These results are shown in Table 1
below. This shows that the standard deviation of the error for all features is ~2 pixels (In the images
used 1 pixel = 0.32mm so 2 pixels is still under 1mm). This method has some very good results but its
computational performance is very slow. The ICP algorithm used to find an estimate for the nose
tip is particularly computationally expensive (though this can be carried out offline). The training of
the EBGM used for the detection of the various features is also a significant cost (this is also done
offline). The highly accurate but complex nature of this system suggests that it would be better suited
to medical applications or systems where real time processing is not required.
TABLE 1 - FEATURE DETECTION RESULTS
2.4 3D VISION BASED METHODS

3D vision techniques for facial feature detection have become much more popular in recent years.
This is due to the increased availability of 3D image capturing devices. By definition all 3D images
include depth information but some 3D image acquisition systems also capture colour (or greyscale)
information which may also be used in feature detection. Some techniques use just 3D information
while other techniques use both texture and range information.
A technique proposed by Segundo [15] uses just range images for facial feature detection. The
algorithm they employ is based on methods developed for 2D but applied to 3D range images. The
authors use a combination of clustering, edge detection and the Hough transform to first isolate the
face region. K-means clustering is used to identify the background, body and face in the range image.
Edge detection is used to identify the facial edges in the image. A Hough transform is then used to
detect the elliptical face boundary (as used by Hsu on 2D images in the method described in
section 2.1.1). Once the face has been localised, this region of interest is examined for facial features.
The Gaussian and elliptical curvatures of the range image are calculated and this is used to determine
the location of the nose tip by exploiting the fact that the largest Gaussian curvature values occur
around the nose. Using similar curvature information the nose width points can also be located.
Curvature information is once again used to detect the eye corners, this time searching for pits rather
than peaks. As shown in Table 2, this algorithm performs extremely well. While these figures are very
impressive, it should be noted that the 3D images in this database were captured under laboratory
conditions. The performance would no doubt be lower in real world applications. A method such as this does
seem very well suited to medical applications.
TABLE 2 - FRGC 2.0 RESULTS [15]
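The curvature step at the heart of this method can be sketched from the range image derivatives. This shows only the Gaussian curvature computation on a synthetic nose-like bump, not the segmentation pipeline of [15]:

```python
import numpy as np

def gaussian_curvature(z):
    """Gaussian curvature K of a range image z(x, y):
    K = (zxx * zyy - zxy^2) / (1 + zx^2 + zy^2)^2."""
    zy, zx = np.gradient(z)        # first derivatives
    zxy, zxx = np.gradient(zx)     # second derivatives of zx (d/dy, d/dx)
    zyy, _ = np.gradient(zy)
    return (zxx * zyy - zxy ** 2) / (1 + zx ** 2 + zy ** 2) ** 2

# Synthetic "nose": a Gaussian bump on a flat plane; the Gaussian curvature
# peaks at the tip, which is how the nose tip is localised.
y, x = np.mgrid[-20:21, -20:21]
z = 10 * np.exp(-(x ** 2 + y ** 2) / 60.0)
k = gaussian_curvature(z)
tip = np.unravel_index(k.argmax(), k.shape)
print(tip)  # the centre of the bump
```

Searching the same curvature maps for pit-type (concave) regions rather than peaks gives candidate eye corners, as described above.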
Wang [16] presents a method which uses 2D texture images and 3D range images to detect facial
features. The authors devise a system which detects four features from the 3D range image and 10
from the 2D texture image. Four particular points are detected in both 2D and 3D. Figure 9 shows the
features which are detected by the algorithm; those marked X are detected in both 2D and 3D. Gabor
filters are used for detection of the features in 2D space. The Gabor filters are used in the same
manner as Wiskott uses them in [7] (as described previously in section 2.1.2). To detect the
features in 3D the authors use the point signature method. This is done by manually identifying features
in 3D training images. A sphere is then placed at the manually selected feature location. A curve is
described by the intersection of the sphere and the face surface. This can be used to characterise a
given feature and then used on an unknown image for feature detection. This system is developed for
facial recognition and the paper doesn’t present results for the accuracy of feature detection. In terms
of facial recognition, the authors report a detection rate of ~90% when the SVM classifier is used with
the system. The system proposed by Gupta [14] (see section 2.3) appears to perform better than this
one, based on recognition results.
FIGURE 9 - FEATURES FOR DETECTION
3. CONCLUSION

The purpose of this survey paper is to review a number of facial feature detection methods with a
view to choosing a technique for the purpose of head tilt estimation. This conclusion begins with my
view on the best facial feature detection method in terms of correctly identifying feature locations.
Following this, I give my opinion on the best method for the task of head tilt estimation.
Clearly there are a vast number of approaches to facial feature detection and to a large extent the
choice of technique is application dependent. In terms of just identifying the correct location of facial
features, it is very hard to look past the 3D methods proposed by Gupta [14] and Segundo [15].
Both of these methods performed extremely well in testing and if accuracy of detection is the prime
concern then one of these methods should be chosen. My personal preference is for Gupta because of
the anthropometric information which is included in the algorithm. Using prior knowledge of feature
positions to define the size of search regions not only reduces the complexity of the problem but also
includes real world information in the system. The fact that these methods require 3D images means
that their application is limited at the present time.
The assignment we have been set uses 2D colour images and as a result the method proposed by
Gupta cannot be used. For the purpose of head tilt estimation I think that a method based on the work
by Wiskott [7] is the best approach. Gabor filters could be used for the detection of the corners of the
mouth and eyes. The search region for each feature should be defined using the anthropometric
proportions defined by Farkas. When using anthropometric measurement the procedure seems to be
that an initial feature is detected and then search regions for subsequent features are determined from
that. A colour transform like the one used by [3] could be used to identify the face, and then a red/blue
transform could be used to locate the mouth. From this starting point other features could be detected.
References

[1] P. F. Whelan, "Interest point detection," DCU EE544 course notes, 2013.
[2] P. F. Buckley, et al. (2005). A three-dimensional morphometric study of craniofacial shape in
schizophrenia. Am. J. Psychiatry 162(3), pp. 606-608.
[3] R. Hsu, M. Abdel-Mottaleb and A. K. Jain. (2002). Face detection in color images. IEEE
Transactions on Pattern Analysis and Machine Intelligence 24(5), pp. 696-706.
[5] J. R. Movellan. (2002). Tutorial on Gabor filters. Open source document.
[6] T. F. Cootes and C. J. Taylor. Active shape models – smart snakes. In Proc. British Machine
Vision Conference.
[7] L. Wiskott, et al. (1997). Face recognition by elastic bunch graph matching. IEEE Transactions
on Pattern Analysis and Machine Intelligence 19(7), pp. 775-779.
[8] T. F. Cootes, G. J. Edwards and C. J. Taylor. (2001). Active appearance models. IEEE
Transactions on Pattern Analysis and Machine Intelligence 23(6), pp. 681-685.
[9] T. F. Cootes, G. Edwards and C. J. Taylor. Comparing active shape models with active appearance
models.
[10] S. Milborrow and F. Nicolls. (2008). Locating facial features with an extended active shape
model. In Computer Vision – ECCV 2008, pp. 504-513.
[11] J. Luettin, N. A. Thacker and S. W. Beet. (1996). Active shape models for visual speech feature
extraction. NATO ASI Series F: Computer and Systems Sciences 150, pp. 383-390.
[12] L. G. Farkas and I. R. Munro. (1987). Anthropometric Facial Proportions in Medicine.
[13] A. S. M. Sohail and P. Bhattacharya. (2008). Detection of facial feature points using
anthropometric face model. In Signal Processing for Image Enhancement and Multimedia Processing,
pp. 189-200.
[14] S. Gupta, M. K. Markey and A. C. Bovik. (2010). Anthropometric 3D face recognition.
International Journal of Computer Vision 90(3), pp. 331-349.
[15] M. P. Segundo, C. Queirolo, O. R. Bellon and L. Silva. Automatic 3D facial segmentation and
landmark detection. In Proc. 14th International Conference on Image Analysis and Processing
(ICIAP 2007).
[16] Y. Wang, C. Chua and Y. Ho. (2002). Facial feature detection and face recognition from 2D and
3D images. Pattern Recognition Letters 23(10), pp. 1191-1202.