
COMPARISON OF HOLISTIC AND FEATURE BASED

APPROACHES TO FACE RECOGNITION

A minor thesis submitted in partial fulfillment of the degree of

Master of Applied Science in Information Technology

STEWART TSENG

School of Computer Science and Information Technology, Faculty of Applied Science,

Royal Melbourne Institute of Technology University, Melbourne, Victoria, Australia.

10th of July 2003

Page 2: Comparison of Holistic and Feature Based Approaches to Face Recognition€¦ · be to provide a survey of recent holistic and feature-based approaches, which we complement previously

Declaration

I certify that all work on this thesis was carried out between March 2003 and July 2003 and it has

not been submitted for any academic award at any other college, institute or university. The work

presented was carried out under the supervision of Dr. Ron van Schyndel and Dr. Vic Ciesielski.

All other work in the thesis is my own except where acknowledged in the text.

Signed,

Stewart Tseng

10th of July 2003


Contents

Acknowledgements

Abstract

1 Introduction

2 Background
   2.1 Early Face Recognition Developments
   2.2 Brief Overview of Face Recognition Approaches

3 Holistic Face Recognition
   3.1 Karhunen-Loeve expansion
   3.2 Linear Discriminant Analysis
   3.3 Summary

4 Feature Based Face Recognition
   4.1 General Approaches
   4.2 Elastic Bunch Graph Matching
   4.3 Summary

5 Experiments
   5.1 Dataset
   5.2 Results
   5.3 Discussion

6 Conclusions

A Covariance Matrix

B Generalised Eigenproblem


Acknowledgements

First and foremost, I would like to sincerely thank my supervisors, Ron van Schyndel and Vic Ciesielski, for their invaluable supervision and direction, as well as the many interesting and constructive discussions, which have helped shape this minor thesis.

Special thanks to the following people who have assisted this research including Justin Zobel

(RMIT University), Albert Lionelle (Colorado State University), Matthew Turk (University of

California), Laurenz Wiskott (Humboldt-University Berlin), Joao Hespanha (University of Cali-

fornia), Peter Kalocsai (University of Southern California) and Aleix Martinez (Ohio State Uni-

versity).


Abstract

Face recognition is concerned with identifying or verifying individuals by their faces. Many face recognition approaches have been proposed, and these can be classified as either holistic or feature based. At present, only a small number of investigations have compared holistic and feature-based approaches. Our goal is to compare one holistic and one feature-based approach on a collection of 3,282 face images. Additionally, we survey recent holistic and feature-based approaches. We compared the performance of the Eigenface approach and Elastic Bunch Graph Matching and found that Elastic Bunch Graph Matching achieved a recognition rate of 96.2%, significantly higher than the 71.6% recognition rate achieved by the Eigenface approach.


Chapter 1

Introduction

Face recognition is concerned with identifying individuals from a collection of face images. It belongs to a broad family of biometric approaches that also includes fingerprint, iris/retina and voice recognition. Biometric approaches identify individuals by their unique physical characteristics. Traditionally, passwords and Personal Identification Numbers have been employed to formally identify individuals, but the disadvantages of such methods are that someone else may use them or that they can easily be forgotten. Given these problems, biometric approaches such as face recognition, fingerprint, iris/retina and voice recognition provide a far superior solution for identifying individuals: they not only identify individuals uniquely but also minimise the risk of someone else using another person's identity. However, a disadvantage of fingerprint, iris/retina and voice recognition is that they require active cooperation from individuals. For example, fingerprint recognition requires participants to press their fingers onto a fingerprint reading device, iris/retina recognition requires participants to stand in front of an iris/retina scanning device, and voice recognition requires participants to speak into a microphone. Face recognition is therefore considered a more versatile biometric, because individuals can be identified either actively, by standing in front of a face scanner, or passively, as they walk past one.


There are also disadvantages to using face recognition. Faces are highly dynamic and can vary considerably in orientation, lighting, scale and facial expression; face recognition is therefore considered a difficult problem to solve. Given these difficulties, researchers from a range of disciplines including pattern recognition, computer vision and artificial intelligence have proposed many solutions to minimise them and to improve the robustness and accuracy of face recognition. As many approaches have been proposed, extensive surveys have also been written over the last thirty years (Goldstein et al. 1971, Kaya & Kobayashi 1972, Harmon et al. 1981, Samal & Iyengar 1992, Chellappa et al. 1995, Zhao et al. 2000). One can fundamentally classify the proposed approaches as either holistic, where faces are recognised using global features, or feature based, where faces are recognised using local features. The features used in holistic and feature-based approaches are, however, fundamentally different. Features found by holistic approaches represent the directions of greatest variance in the pixel data of face images, which are used to distinguish one individual from another. Alternatively, the features found by feature-based approaches represent facial features such as the eyes, nose and mouth, and these are used to uniquely identify individuals.

The first goal of this paper is to compare the face recognition performance of a holistic and a feature-based approach. We achieve our comparison by testing these two approaches on the AR face database (Martinez & Benavente 1998). In addition to our comparison, the second goal is to provide a survey of recent holistic and feature-based approaches, which complements previously written surveys on face recognition. It is assumed that all approaches discussed in this paper use two-dimensional grey scale images to represent faces. Furthermore, we define a dataset as a collection of faces, the gallery set as the training set and the probe set as the test set.

Face recognition has far-reaching benefits for corporations, government and the greater society. Applications of face recognition in corporations include access to computers, secure networks and video conferencing; access to office buildings and restricted sections of these buildings; access to storage archives; and identifying members at conferences and annual general meetings.


Specific corporate applications include access and authorisation to operate machinery; clocking on and clocking off when beginning and finishing work; assignment of work responsibilities and accountability based on identity; monitoring employees; and confirming the identity of clients, suppliers and transport and logistics companies when they send and receive packages. Additionally, sales, marketing and advertising companies could identify their customers in conjunction with customer relationship management software.

Applications of face recognition for state and federal governments may include access to parliamentary buildings and press conferences, and access to secure, confidential government documents, reports and doctrines. Specific government uses can include Australian Customs verifying the identity of individuals against their passport files and documents, or state and federal police using face recognition to improve crime prevention and facilitate police activities.

Applications of face recognition for the greater society may include election voting registration, access to venues and functions, verifying the identity of drivers against their issued licences and personal identification cards, confirming identity for point-of-sale transactions such as credit cards and EFTPOS, and confirming identity when accessing funds from an automatic teller machine. Other applications of face recognition include facilitating home security and gaining access to motor vehicles.

As there are many potential applications of face recognition, this paper limits its discussion to the computational aspects of face recognition. The paper begins with the earlier developments of face recognition and an overview of other face recognition approaches in chapter 2. In chapter 3, we focus on describing recent holistic approaches, while in chapter 4 we describe recent feature based approaches. We then compare the performance of a holistic approach with a feature-based approach and report the outcomes in chapter 5. We conclude by summarising the paper, providing insight into our future work and outlining three leading commercial vendors in chapter 6. We begin with the background of face recognition.


Chapter 2

Background

In this chapter, we outline in section 2.1 the earlier approaches that have contributed to the development of face recognition. In section 2.2, we give an overview of a number of approaches presently used in face recognition. We now begin with a discussion of the earlier developments of face recognition.

2.1 Early Face Recognition Developments

The central question of face recognition is: how do people recognise faces? More importantly, what are the processes involved in recognising faces? These questions have led to many investigations in the fields of cognitive psychology and neurophysiology. With respect to face recognition, cognitive psychology and neurophysiology are concerned with the empirical evaluation and theoretical development of the cognitive processes that enable a conceptual and practical understanding of how humans recognise faces. Overall, cognitive psychologists and neurophysiologists expected that through such investigations they would be able to reconstruct these cognitive processes and therefore predict the patterns used by human cognition to recognise faces (Young & Bruce 1991).


Such studies have included the familiarity and unfamiliarity of faces (Benton 1980), investigations into people with prosopagnosia, in which patients are no longer able to recognise the faces of previously known individuals (Hecaen & Angelergues 1962), and the effects of face distinctiveness when recognising faces (Shepherd et al. 1991). Other studies have also included face uniqueness (Hay & Young 1982) and the use of face caricatures, which distort faces to improve their uniqueness and distinctiveness amongst the general population (Rhodes et al. 1987, Benson & Perret 1991).

Specific to cognitive psychology, several different but complementary information processing models have been proposed (Hay & Young 1982, Ellis 1986, Bruce & Young 1986). Even though some argued that these information processing models were too generalised and abstract in their representation of human cognition, the models were still used to test the validity of their hypotheses, which was achieved with the use of computers. For example, Sergent (1989) used computers to facilitate research into, and understanding of, face recognition.

In parallel to cognitive psychology and neurophysiology, the field of computer science was in-

terested in constructing computational models to achieve face recognition. Fundamentally, face

recognition interrelates to and is influenced by computer vision and artificial intelligence (Clowes

1971, Minsky 1975, Marr 1980). This led to initial developments of face recognition systems

(Baron 1981, Kohonen 1984, Stonham 1986). However, it was Kanade (1973) who combined low-level image processing with high-level decision making to develop one of the first complete face recognition systems. Most others had manually selected features to perform face recognition, whereas Kanade (1973) opted for an autonomous approach to selecting feature points of the face. The system identified people via three stages: extracting features autonomously, adjusting the location of feature points to different face sizes, and measuring the

similarity between two faces based on the Euclidean distance amongst the feature points. Specif-

ically, the first stage of the system captured a coarse resolution of the face via a face scanner or

television camera. The coarsely scanned images were transformed to binary images by applying a Laplacian operator with a threshold of 30. After the transformation, the vertical and horizontal


integral projection of the binary images were used to locate and extract feature points of the face

(Kanade 1973). The second part was similar, except that faces were scanned at higher resolutions and only at specific locations, which were based on the feature points extracted from the coarsely scanned faces. In the second stage, the system adjusted the feature

was defined by the smallest Euclidean distance amongst the feature points between a probe and

gallery set. Kanade (1973) reported achieving a recognition rate of approximately 45% to 75%

with a dataset of 40 faces. The 40 faces consisted of 20 faces that formed the gallery set and

another 20 faces for the probe set. Kanade (1973) reported the system correctly recognised 15

probe faces from the 20 gallery faces. Looking forward to the present, there have been many significant contributions since the work of Kanade (1973). A minimal sketch of the integral-projection step used to locate feature points is given below, after which we turn, in the next section, to an overview of present face recognition approaches.
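The integral-projection step can be sketched as follows. This is our own minimal illustration in Python using NumPy and SciPy; the function name, array layout and use of scipy.ndimage.laplace are assumptions rather than a reconstruction of Kanade's original implementation.

    import numpy as np
    from scipy.ndimage import laplace

    def integral_projections(face, threshold=30):
        """Binarise a grey scale face image with a Laplacian operator and
        return its vertical and horizontal integral projections."""
        edges = np.abs(laplace(face.astype(float)))   # Laplacian edge response
        binary = (edges > threshold).astype(int)      # binary image
        vertical = binary.sum(axis=0)                 # one value per column
        horizontal = binary.sum(axis=1)               # one value per row
        # Rows and columns where the projections peak are candidate
        # locations of facial features such as the eyes, nose and mouth.
        return vertical, horizontal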

2.2 Brief Overview of Face Recognition Approaches

Many diverse, novel and promising approaches have been proposed in the area of face recognition. In this section, we provide an overview of present approaches in the face recognition literature and describe various face datasets that have been used to test the feasibility of such approaches. In addition, we outline an industry standard dataset, which has provided a common measure of accuracy for evaluating the performance of different face recognition approaches.

Support Vector Machines (Vapnik 1998, Burges 1998) have previously been used for general pattern recognition, and were applied to face recognition by Guo et al. (2000). Guo et al. (2000) used Support Vector Machines in conjunction with a binary decision tree and found improvements compared to the Eigenface approach (Turk & Pentland 1991). Also, Xi et al. (2002) reported using Support Vector Machines for face recognition, where they were able to accurately and autonomously extract features from faces. Tefas et al. (2001) used Support Vector Machines with the elastic bunch graph matching approach (Wiskott et al. 1999) and reported achieving a lower error rate for recognising faces when compared to the standard elastic bunch graph matching approach (Wiskott et al. 1999). A general evaluation of Support Vector Machines for face recognition was provided by Johnsson et al. (1999).

The Fast Fourier Transform was another approach applied to face recognition. Colmenarez & Huang (1998) proposed using the Fast Fourier Transform to represent the gallery set as a number of templates; face recognition was then achieved by minimising the Euclidean distance between the peaks found when comparing a gallery face to a probe face. In contrast, Kalocsai & Biederman (1998) used the Fast Fourier Transform to simulate neural and behavioural effects of the human visual system when recognising faces.

Another face recognition approach was the Hidden Markov Model. This approach had been used

in speech and character recognition, but was applied to face recognition by Samaria & Fallside

(1993), Samaria (1994) and Samaria & Young (1994).

In contrast, the majority of face recognition approaches focused only on directly recognising faces; they did not specifically consider the inherent difficulties caused by lighting variations on faces. Consequently, lighting variations led to a reduction in face recognition performance. Thus, Adini et al. (1997) provided an empirical study to evaluate whether a selected number of approaches were invariant to the various lighting conditions being tested. They constructed a database of 125 images of 25 individuals. Five face images were taken of each person under strictly controlled lighting variations, which included illumination of the left side and the right side of the face. They found that without pre-processing the grey scale faces, where faces would normally be normalised to the average face of a gallery set, the approaches being evaluated would always incorrectly match a gallery face to a probe face. This was indicated by a 100% error rate, where the error rate denotes the incorrect identification of individuals. In addition, they found that by pre-processing the grey scale faces the incorrect identification rate varied from 20% to 100%. Similarly, in another evaluation, Georghiades et al. (1998) also


conducted an investigation into the effects of lighting variations on faces.

As lighting and other variations were important factors to consider when performing face recog-

nition, there was also a common need to compare different face recognition approaches. In

addition, there needed to be a standard dataset for both the gallery and probe set to enable the

benchmarking of performances for face recognition approaches. Therefore, Phillips et al. (2000)

proposed to test the performance and feasibility of face recognition approaches on a large dataset.

Furthermore, they proposed to develop a common measure of performance in order to evaluate

face recognition approaches as well as identify future research directions. A large dataset was

constructed for the Face Recognition Technology (FERET) program (Phillips et al. 2000), which

contained 14,126 images of 1,199 individuals. The performance measure was based on the cu-

mulative recognition rate. The cumulative recognition rate was found by first ranking each probe face in relation to the gallery set. The rank indicates how well a probe face matches its intended gallery face, given that the probe and gallery face represent the same individual. A probe face with a rank of 1 has the smallest Euclidean distance to its intended gallery face; more generally, a probe face with the nth smallest Euclidean distance to the intended gallery face has a rank of n (Phillips et al. 2000). Once the rank is found for each probe face, a cumulative rank score is calculated for each rank: the number of probe faces with a rank of n or better is divided by the total number of probe faces. The percentage of this cumulative rank score denotes the cumulative recognition rate. For example, if 15 of a total of 20 probe faces fall within a rank of 10 (inclusive of ranks 1 to 10), the cumulative recognition rate would be 75% at rank 10. In a further

investigation, Phillips et al. (2003a) compared 10 commercial vendor products against a much

larger database, containing 121,589 images of 37,437 individuals. The faces were originally

taken from a collection of approximately 6.8 million images of 6.1 million individuals (Phillips

et al. 2003a).
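To make the rank-based measure concrete, the following minimal Python sketch (our own illustration, not part of the FERET software) computes cumulative recognition rates from the ranks assigned to each probe face:

    import numpy as np

    def cumulative_recognition_rates(ranks, max_rank):
        """`ranks[i]` is the rank at which probe i's true gallery face was
        retrieved (1 = closest match).  The value returned for rank n is the
        fraction of probe faces whose correct match lies within the top n."""
        ranks = np.asarray(ranks)
        return [float(np.mean(ranks <= n)) for n in range(1, max_rank + 1)]

    # Example from the text: if 15 of 20 probe faces have rank 10 or better,
    # the cumulative recognition rate at rank 10 is 15 / 20 = 75%.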

In the past, others have constructed probe and gallery sets for the purpose of evaluating their own proposed approaches, as shown in tables 2.1 and 2.2. These datasets are further discussed in chapters 3 and 4. In addition, tables 2.1 and 2.2 include the recognition rates achieved on the datasets; these results are briefly explained in the numbered notes following table 2.2.

As a typical gallery set would only contain a single view of a known individual, it was most likely

those single views were also the frontal views of faces. Motivated by this limitation, Beymer &

Poggio (1995) proposed a pose invariant face recognition approach, which generated virtual ori-

entation views from the affine transformation of frontal views of faces. Another similar but

independent proposal on pose invariant face recognition was investigated by Lando & Edelman

(1995). Complementary to pose invariant investigations, Moses et al. (1996) experimented with

the effects of using upright faces versus inverted faces of individuals, whereby faces had varying

poses. In addition, the inverted faces were created from the upright faces. What they had dis-

covered from their experiments was that inverted faces did not perform as well as using upright

faces, since the inverted faces were treated as a different face class and therefore significantly

reduced the face recognition performance. In a further investigation by Moses & Ullman (1998),

they evaluated global, local and hybrid approaches of face recognition based on their analysis on

the effects of using frontal and varying poses of faces. The hybrid approaches, as discussed by Moses & Ullman (1998), were based on using both a global and a local approach to achieve face recognition.

A comparison of feature based versus template matching approaches was put forth by Brunelli & Poggio (1993). Thus, we were also interested in providing a recent account of developments in holistic and feature based approaches. It is therefore the focus of the next two chapters to describe holistic face recognition approaches in chapter 3 and feature based face recognition approaches in chapter 4.


Face Dataset             | Individuals | Variation                   | Total Images | Image Type             | Image Dimension                                | Recognition Rate
Sirovich & Kirby (1987)  | 115         | 113 male, 2 female          | 115          | Grey scale             | 128 x 128                                      | -
Turk & Pentland (1991)   | 16          | male                        | 2500         | Grey scale             | 512x512, 256x256, 128x128, 64x64, 32x32, 16x16 | 96% (1), 85% (2), 64% (3)
O'Toole et al. (1993)    | 150         | approx. 75 male, 75 female  | 150          | Grey scale (16 levels) | 151 x 225                                      | 98.7% (4)
Pentland et al. (1994)   | [21] [45]   | [9] [2]                     | [189] [90]   | Grey scale             | -                                              | 90% (5), 95% (6)
Moghaddam et al. (1998)  | 112         | frontal view                | 224          | Grey scale             | -                                              | 89.5% (7)
Belhumeur et al. (1997)  | [5] [16]    | [66] [10]                   | [330] [160]  | Grey scale             | -                                              | -
Yang et al. (2000)       | [15] [40]   | [11] [10]                   | [165] [400]  | Grey scale             | 29 x 41                                        | -
Swets & Weng (1996)      | 504         | -                           | 1614         | Grey scale             | -                                              | 98% (8)
Zhao et al. (1999)       | -           | -                           | 115          | -                      | -                                              | -
Kim et al. (2002)        | 133         | lighting                    | 665          | -                      | -                                              | -

Table 2.1: Face Datasets A: Provides an overview of the face dataset used by various face

recognition approaches, as well as, details on how many different individuals were represented,

the gender of the individuals, the total number of images that were used in their experiments, the

format type of the images, the different image dimensions used, and the recognition rate.


Face Dataset             | Individuals | Variation                        | Total Images | Image Type | Image Dimension      | Recognition Rate
Lu et al. (2003)         | [20] [40]   | [-] [10]                         | [575] [400]  | Grey scale | 112 x 92             | -
Craw et al. (1992)       | [1] [-]     | [-] [-]                          | [64] [50]    | Grey scale | -                    | 95% (9)
Manjunath et al. (1992)  | 86          | facial expression & orientation  | 306          | Grey scale | 128 x 128            | 94% (10)
Wiskott et al. (1999)    | [250] [108] | frontal, orientation             | [500] [216]  | Grey scale | 256 x 384, 256 x 384 | 99% (11), 97% (12)
Wiskott et al. (1995)    | 111         | male, female, bearded, glasses   | 111          | Grey scale | 128 x 128            | 90.2% (13)
Kalocsai et al. (2000)   | 325         | 2                                | 650          | Grey scale | -                    | 93% (14)

Table 2.2: Face Datasets B: Provides an overview of the face dataset used by various face

recognition approaches, as well as, details on how many different individuals were represented,

the gender of the individuals, the total number of images that were used in their experiments, the

format type of the images, the different image dimensions used, and the recognition rate.

Notes:
1. averaged over lighting
2. averaged over orientation
3. averaged over different face scales
4. used 1 to 100 eigenvectors for recognition
5. view based recognition of 9 different orientations (-90° to +90°) of the face
6. modular eigenspace recognition based on the eigeneyes and eigenfaces
7. based on the intrapersonal and extrapersonal class recognition rate
8. within the top 15 correctly identified individuals
9. based on recognising a moving sequence of a single person
10. within the top 3 correctly identified
11. within the top 10 correctly identified
12. within the top 4 correctly identified
13. correctly identified gender
14. used 100 kernels that statistically provided the most information for recognition


Chapter 3

Holistic Face Recognition

Holistic face recognition utilises global information from faces to perform face recognition. The global information is fundamentally represented by a small number of features which are directly derived from the pixel information of face images. This small number of features distinctly captures the variance among different individual faces and is therefore used to uniquely identify individuals. In this chapter, we shall describe two holistic approaches to face recognition: the Karhunen-Loeve expansion and Linear Discriminant Analysis.

In the Karhunen-Loeve expansion section, we define the Karhunen-Loeve expansion concept and explain the use of partial and full faces by Sirovich & Kirby (1987) and Kirby & Sirovich (1990), as well as an influential paper on the Eigenface approach by Turk & Pentland (1991). Additionally, we discuss how the Karhunen-Loeve expansion has been applied to face recognition by others (O'Toole et al. 1993, Pentland et al. 1994, Moghaddam et al. 1998, Yang et al. 2000).

Alternatively, in the Linear Discriminant Analysis section, we define the concept of Linear Dis-

criminant Analysis. Furthermore, we shall discuss the work of Belhumeur et al. (1997) on Fisher-

faces and many others who have proposed variations of Linear Discriminant Analysis to achieve

face recognition (Swets & Weng 1996, Zhao et al. 1999, Kim et al. 2002, Lu et al. 2003).


3.1 Karhunen-Loeve expansion

The Karhunen-Loeve expansion, also known as Principal Component Analysis or the Hotelling transform, is traditionally concerned with feature selection for signal representation. Applied to face recognition, the Karhunen-Loeve expansion finds a small number of features, defined by the principal components of the face. The principal components are found by representing each two dimensional face image as a one dimensional vector and then selecting the components which capture the highest variances amongst individual faces. Specifically, the principal components of the face are found by solving for the eigenvectors and eigenvalues of the covariance matrix (Appendix B). In other words, the eigenvectors constitute a small number of features that represent the variations amongst faces in the dataset. This small number of features can also be called the feature space. Furthermore, by finding the best features to represent faces there is greater efficiency in memory storage and run-time execution.

To find the eigenvectors u from the dataset, we consider each face as a grey scale image of N x N pixels whose pixel values are stacked into a single column vector,

\Gamma = \left[ x_1, x_2, \ldots, x_{N^2} \right]^T    (3.1)

where the face image is represented as a one dimensional column vector of length N^2 in raster order. For example, the end of the first row of the face image is joined to the start of the second row of the face image, and the process continues for the rest of the face image.

For each face \Gamma_i in the dataset (where \Gamma_i denotes the i-th face, previously written simply as \Gamma for a single face) we then find the average face \Psi, where M is the size of the dataset. Subtracting the average face later centralises the faces near the origin of the coordinate system:

\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i    (3.2)


Figure 3.1: Average Face (Sirovich & Kirby 1987). From left to right: a normal face, the average face for the dataset, and a normalised face, which is the difference between the normal face and the average face.

Once the average face \Psi (equation 3.2) is found, we normalise each face \Gamma_i in the dataset by subtracting the average face \Psi from it,

\Phi_i = \Gamma_i - \Psi    (3.3)

The eigenvectors u are derived from the covariance matrix C (Appendix A.1). The covariance matrix represents the relationship between two matrices and, in this application, also represents the variance of the dataset (Fukunaga 1990), where \Phi_i^T denotes the transpose of the normalised face vector:

C = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^T    (3.4)

To find the eigenvectors u we solve

C u_k = \lambda_k u_k    (3.5)

where \lambda_k are the eigenvalues, which satisfy the characteristic equation \left| C - \lambda_k I \right| = 0.


In choosing the optimal eigenvectors u_k, we select those associated with the highest eigenvalues \lambda_k of the covariance matrix,

\lambda_k = \frac{1}{M} \sum_{i=1}^{M} \left( u_k^T \Phi_i \right)^2    (3.6)

where the M highest eigenvalues \lambda_k represent the largest variances of the normalised faces \Phi_i, whereas the remaining eigenvalues would have variances close to zero. Variances close to zero do not provide any significant discriminatory information for face recognition.

Having found the M highest ranked eigenvalues \lambda_k, the associated eigenvectors u_k can be obtained as linear combinations of the normalised faces,

u_k = \sum_{i=1}^{M} v_{ki} \Phi_i    (3.7)

where v_k denotes the k-th eigenvector of the reduced M x M matrix whose entries are \Phi_i^T \Phi_j.

Having found the eigenvectors u of the dataset, the optimal projection is defined as

\Omega = U^T \Phi    (3.8)

where \Omega is the feature space, that is, the small number of features representing the dataset, and U = [u_1, \ldots, u_M] is the optimal transformation matrix for the dataset. We shall now describe how the Karhunen-Loeve expansion has been applied to face recognition.
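As a concrete illustration of equations 3.1 to 3.8, the following minimal NumPy sketch computes Eigenfaces for a gallery stored as an (M, N*N) array of raster-ordered face vectors. The use of the reduced M x M eigenproblem is an implementation choice for tractability, assumed here rather than prescribed by the derivation above.

    import numpy as np

    def karhunen_loeve(gallery, num_features):
        """Return the average face, the leading Eigenfaces and the projected
        gallery features (a sketch of eqs. 3.1-3.8)."""
        mean_face = gallery.mean(axis=0)                   # eq. 3.2
        phi = gallery - mean_face                          # eq. 3.3
        # Eigenvectors of the reduced M x M matrix share the leading
        # eigenvalues of the full covariance matrix of eq. 3.4.
        reduced = phi @ phi.T / len(gallery)
        eigvals, eigvecs = np.linalg.eigh(reduced)         # eq. 3.5
        order = np.argsort(eigvals)[::-1][:num_features]   # largest variances, eq. 3.6
        u = phi.T @ eigvecs[:, order]                      # back-project, eq. 3.7
        u /= np.linalg.norm(u, axis=0)                     # unit-length Eigenfaces
        omega = phi @ u                                    # feature space, eq. 3.8
        return mean_face, u, omega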

Sirovich & Kirby (1987) were the first to apply the Karhunen-Loeve expansion to face recog-

nition. In their initial investigation of face recognition, they used a small number of features to

represent the dataset. Once they had reconstructed the small number of features to form images,

they concluded the images resembled faces, which they called Eigenpictures (see figure 3.2, where the first 8 principal components of the upper portion of a face are shown). They tested their approach on a gallery set of 115 grey scale faces that consisted of 113 males and 2 females. The dimensions of the images in the gallery set were 128 x 128 pixels and contained small

lighting variations. The images were frontal views of the face but were manually cropped to


only contain the eyes and nose. From their experiments of using 40 features, they found an error

rate of 3.9% and 2.4% respectively for two female probe faces. The error rate was based on a

probe set of 115 faces that consisted of 113 males and 2 female faces. They concluded from the

error rates that their approach was gender independent. In another experiment, they also used 40

features and found an error rate of 7.8% where faces with insufficient lighting were tested on the

same probe set. While a limitation of this investigation was the use of partial faces, we shall see

in the following investigation the advantages of using full faces for face recognition.

Kirby & Sirovich (1990) extended their previous investigation (Sirovich & Kirby 1987) on using

Karhunen-Loeve expansion to use full frontal faces and face reflections. Face reflections were

created by mirroring images about the mid-line of the face. Using full frontal faces and face

reflections, they investigated the effects of even and odd symmetry Eigenfunctions to perform

face recognition (Kirby & Sirovich 1990). From their experiments, they discovered a majority of

the small number of features corresponded to the even symmetry Eigenfunctions. Therefore, they

concluded the even symmetry Eigenfunctions represented the underlying, general symmetrical

and structural properties of the face. They concluded that even and odd symmetry Eigenfunctions

facilitated the identification of individuals. Furthermore, even and odd symmetry Eigenfunctions

reduced the face recognition error rate for probe faces which were not part of the gallery set.

Motivated by the work of Sirovich & Kirby (1987), Turk & Pentland (1991) applied the Karhunen-Loeve expansion, in an approach they called Eigenfaces, to achieve face recognition. Turk & Pentland (1991) wanted to use global and holistic information of the face to perform face recognition, whereas others had previously used individual and local facial features (Turk 2001). They proposed a general framework for face recognition. The framework defined the face space for face recognition and the face map for face detection. The face space was defined by the Eigenfaces constructed from the gallery set. The framework transformed probe faces into their Eigenface equivalents (probe Eigenfaces), which were then matched to the face space to determine whether they corresponded to a known individual or not. Specifically, the closeness of a probe Eigenface to the face space was determined by minimising the Euclidean distance for the face class metric and the face space metric. For illustrative purposes, we outline the following two


metrics:

The face class metric is defined as

\epsilon_k^2 = \left\| \Omega - \Omega_k \right\|^2    (3.9)

where \Omega is the weight vector of the probe Eigenface and \Omega_k is the weight vector of face class k.

The face space metric is defined as

\epsilon^2 = \left\| \Phi - \Phi_f \right\|^2    (3.10)

where \Phi is the mean adjusted probe face (the input face minus the average face) and \Phi_f is its projection onto the face space.

Given that the face class and face space metrics determined whether a probe Eigenface matched a face in the gallery set, the comparisons used an arbitrary threshold \theta. For example, if a probe Eigenface fell below the threshold \theta for both the face class and the face space metric, then the face was recognised. Otherwise, if the probe Eigenface was above the threshold \theta, it was classified as unknown. Additionally, if the same probe face was consistently classified as unknown, it would be added to the face space as a new face class.

The face space gave four possible outcomes to indicate how closely a probe Eigenface matched a face in the gallery set (a minimal sketch of the resulting decision rule follows the list):

1. close to face class and close to face space

2. far from face class and close to face space

3. close to face class and far from face space


4. far from face class and far from face space
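A minimal sketch of this decision rule, assuming the Eigenfaces u, the average face and the gallery weight vectors \Omega_k have already been computed as in the sketch of section 3.1, and that a single threshold theta is used for both metrics, is:

    import numpy as np

    def classify_probe(probe, mean_face, u, gallery_omegas, theta):
        """Classify a probe face with the face class and face space metrics
        of eqs. 3.9 and 3.10, using one acceptance threshold theta."""
        phi = probe - mean_face                        # mean-adjusted probe face
        omega = phi @ u                                # projection onto face space
        phi_f = u @ omega                              # reconstruction from face space
        eps_space = np.linalg.norm(phi - phi_f)        # distance to face space, eq. 3.10
        eps_class = np.linalg.norm(gallery_omegas - omega, axis=1)  # eq. 3.9
        k = int(np.argmin(eps_class))
        if eps_space > theta:
            return "not a face"            # far from face space (outcomes 3 and 4)
        if eps_class[k] > theta:
            return "unknown individual"    # close to face space, far from classes (outcome 2)
        return k                           # outcome 1: recognised as face class k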

Whilst Turk & Pentland (1991) selected a small number of features with the highest variances,

O’Toole et al. (1993) investigated whether the selection of features with different variances im-

proved face recognition. In their experiments, O’Toole et al. (1993) explored the effects of

selecting features with higher variances as compared to selecting features with lower variances

for face recognition. They discovered that selecting the first 15 features with the highest variances captured the general features of faces, whereas selecting features with lower variances captured features unique to individual faces.

In contrast, Pentland et al. (1994) observed the limitations of face recognition and included face

variations in lighting, orientation and scale. They proposed to extend the existing Eigenface ap-

proach (Turk & Pentland 1991) to handle a larger dataset that contained 7,562 face images of approximately 3,000 individuals. Their proposal included a view-based and a modular eigenspace approach to face recognition. The view-based approach extended the single set of features for the gallery set to multiple sets of features representing the gallery set. The multiple sets of

features captured the scale and orientation variance that was present in the gallery set. Thus,

the view-based approach would encompass the scale and orientation variations in M indepen-

dent subspaces. They reasoned that by using M independent subspaces, compared to the Eigenface approach's use of a single subspace, they were able to capture and rep-

resent the shape and geometry of faces. For their other approach, they proposed using a modular

eigenspace, where individual features of faces were modularised to form the Eigenface equiva-

lents; the eyes, nose and mouth were defined as the eigeneyes, eigennose and eigenmouth respec-

tively. They stated by using a modular eigenspace their approach was robust to large variations in

the gallery set. Additionally, they suggested a coarse-to-fine strategy could be employed, where

at the coarse level, eigenfaces were used to recognise whole faces and, at the fine level, the

eigeneyes, eigennose and eigenmouth were used to recognise individuals by their facial features.

Moghaddam et al. (1998) proposed to improve the Eigenface approach. They stated a limita-

tion of the Eigenface approach was its face similarity measure. They explained that using the Euclidean distance to measure face similarity did not capture significant face variations that

could improve face recognition. Therefore they defined a different face similarity measure. It was

based on a probabilistic measure of face similarity, which they incorporated into the Eigenface

approach. They proposed that the probabilistic face similarity measure would capture significant

face variations, which was defined by two mutually exclusive classes, the intrapersonal and ex-

trapersonal class. The intrapersonal class captured the face variations of the same individuals,

which also included variations in lighting and facial expressions. In contrast, the extrapersonal

class captured variations amongst different individuals. They tested their approach on the 1996 FERET dataset (Phillips et al. 2000) and found that their proposed face similarity measure improved the recognition of faces by up to 10% when compared to using the Euclidean distance to measure face similarity.

A different approach to face recognition was proposed by Yang et al. (2000). Based on higher

order statistics (Rajagopalan et al. 1999), Yang et al. (2000) proposed the kernel Eigenface ap-

proach to capture high level information such as edges and curves from three or more pixels of

face images. They inferred that capturing such relationships within face images would improve face recognition. Their approach was intended as an improvement to the Eigenface approach. They

stated the fundamental difference between the kernel Eigenface feature space and Eigenface

feature space was that kernel Eigenface projected faces into higher dimensional feature space,

whereas the latter Eigenface approach projected into a lower dimensional subspace. The higher

dimensional projection space of the kernel Eigenface approach can be formalised as

y = \varphi(x), \qquad \varphi : \mathbb{R}^{a} \rightarrow \mathbb{R}^{b}    (3.11)

where y is the face image mapped into the new feature space, x is the original face image with dimensionality a and b is the dimensionality of the new feature space; b > a implies that the new feature space has a higher dimensionality than the original a-dimensional feature space.


The covariance matrix previously shown in eq. 3.4 now becomes

C = \frac{1}{M} \sum_{i=1}^{M} \varphi(x_i)\, \varphi(x_i)^T    (3.12)

where the dimensionality b of the mapped faces \varphi(x_i) defines the new size of the covariance matrix C.
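For illustration, a nonlinear feature space of this kind can be obtained with an off-the-shelf kernel Principal Component Analysis. The sketch below uses synthetic data, and the polynomial kernel and component count are our own assumptions rather than the settings used by Yang et al. (2000):

    import numpy as np
    from sklearn.decomposition import KernelPCA

    # Synthetic stand-ins for raster-ordered face rows (gallery and probe sets).
    rng = np.random.default_rng(0)
    gallery = rng.random((100, 32 * 32))
    probes = rng.random((20, 32 * 32))

    kpca = KernelPCA(n_components=40, kernel="poly", degree=2)
    gallery_features = kpca.fit_transform(gallery)  # features in the nonlinear feature space
    probe_features = kpca.transform(probes)         # probe faces projected the same way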

Given this difference, they noted that even though the kernel Eigenface approach is a nonlinear technique, they were still able to find a small number of features by finding the principal components of the face, and these features could be used to uniquely identify individuals. In their experiments they compared the Eigenface and kernel Eigenface approaches using two datasets: the Yale (Yale Face Database 1997) and AT&T (AT&T Face Database 1994) datasets. The Yale dataset consisted of 165 images of 15 different individuals, with 11 images per individual, contained variations in facial expression and lighting, and the images were resized to a lower resolution. The AT&T dataset consisted of 400 images of 40 individuals with variations in facial expression and pose; these images were also resized to a lower resolution. According to their experiments, the kernel Eigenface approach produced a lower error rate than the Eigenface approach on both the Yale and AT&T datasets. In the next section, we shall describe a different holistic approach to face recognition.

3.2 Linear Discriminant Analysis

Fisher's Linear Discriminant, known as Linear Discriminant Analysis, finds a small number of features that differentiate faces of different individuals while still recognising faces of the same individual. The small number of features is found by maximising the Fisher Discriminant Criterion (Fisher 1936), which is achieved by maximising the separation between faces of different individuals whilst keeping faces of the same individual tightly grouped. Because faces of the same individual are grouped together, these features can be used to determine the identity of individuals.


Linear Discriminant Analysis is defined by the between-class scatter S_B and the within-class scatter S_W. The between-class scatter S_B relates faces of different individuals, while the within-class scatter S_W relates faces of the same individual. Specifically, the between-class scatter S_B represents the scatter of the face class means around the overall mean of all face classes, whilst the within-class scatter S_W represents the scatter of features around the mean of their own face class. The definition of the between-class scatter S_B is

S_B = \sum_{i=1}^{C} N_i \left( \bar{x}_i - \bar{x} \right) \left( \bar{x}_i - \bar{x} \right)^T    (3.13)

where C defines the number of face classes (individuals), N_i is the number of faces per face class, \bar{x}_i is the mean of face class i and \bar{x} is the overall mean of all face classes.

We also outline the within-class scatter S_W as

S_W = \sum_{i=1}^{C} \sum_{k=1}^{N_i} \left( x_k^{(i)} - \bar{x}_i \right) \left( x_k^{(i)} - \bar{x}_i \right)^T    (3.14)

where N_i is the number of faces per face class, x_k^{(i)} is the k-th face of class i and \bar{x}_i is the mean of face class i.

Given eq. 3.13 and eq. 3.14, a small number of features is found by maximising the Fisher Discriminant Criterion ratio (Fisher 1936) of the between-class scatter S_B to the within-class scatter S_W. The Fisher Discriminant Criterion ratio is

J(u) = \frac{\left| u^T S_B u \right|}{\left| u^T S_W u \right|}    (3.15)

where u is the optimal projection for the gallery set, whose columns define the small number of features. The optimal projection u can be found by solving the generalised eigenproblem (Appendix B) for Linear Discriminant Analysis,

S_B u_i = \lambda_i S_W u_i    (3.16)

where \lambda_i are the generalised eigenvalues and S_W is the within-class scatter.


To maximise the Fisher Discriminant Criterion ratio, the between-class scatter S_B must be maximised whilst the within-class scatter S_W is minimised. However, the Fisher Discriminant Criterion can become unstable once the within-class scatter S_W is singular. The within-class scatter S_W becomes singular when the number of faces in the gallery set is smaller than the number of pixels in each face image. We shall now describe how Linear Discriminant Analysis has been applied to face recognition.
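The scatter matrices and the generalised eigenproblem above translate directly into code. The following minimal NumPy sketch is our own illustration; it assumes the faces have already been reduced to a modest dimensionality d and adds a small multiple of the identity to keep S_W invertible:

    import numpy as np

    def lda_projection(faces, labels, num_features):
        """Return the projection maximising the Fisher Discriminant Criterion
        (eqs. 3.13-3.16) for an (M, d) array of face vectors and their labels."""
        faces = np.asarray(faces, dtype=float)
        labels = np.asarray(labels)
        d = faces.shape[1]
        overall_mean = faces.mean(axis=0)
        s_b = np.zeros((d, d))                     # between-class scatter, eq. 3.13
        s_w = np.zeros((d, d))                     # within-class scatter, eq. 3.14
        for c in np.unique(labels):
            class_faces = faces[labels == c]
            class_mean = class_faces.mean(axis=0)
            diff = (class_mean - overall_mean)[:, None]
            s_b += len(class_faces) * diff @ diff.T
            centred = class_faces - class_mean
            s_w += centred.T @ centred
        # Generalised eigenproblem S_B u = lambda S_W u (eq. 3.16); the small
        # regulariser keeps S_W invertible, in the spirit of Zhao et al. (1999).
        eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w + 1e-6 * np.eye(d), s_b))
        order = np.argsort(eigvals.real)[::-1][:num_features]
        return eigvecs[:, order].real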

Belhumeur et al. (1997) were motivated to develop a tolerant variant of the Fisher Discriminant Criterion which they called Fisherfaces. Fisherfaces were described as a stable approach that prevents the within-class scatter S_W from becoming singular while still maximising the ratio of the between-class scatter S_B to the within-class scatter S_W. Fisherfaces are defined in terms of finding a small number of features by the combined projection

W_{opt}^T = W_{fld}^T \, W_{pca}^T    (3.17)

where W_{pca} is defined by eq. 3.18 and W_{fld} is defined by eq. 3.19.

The equation for W_{pca} is

W_{pca} = \arg\max_W \left| W^T C\, W \right|    (3.18)

where the columns of W_{pca} are the optimal eigenvectors u of the covariance matrix C; projecting onto them reduces the dimensionality so that the within-class scatter S_W is no longer singular.

For W_{fld}, this is

W_{fld} = \arg\max_W \frac{\left| W^T W_{pca}^T S_B W_{pca} W \right|}{\left| W^T W_{pca}^T S_W W_{pca} W \right|}    (3.19)

where W_{pca}^T is the transposition of W_{pca}.
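In practice, the two projections of eqs. 3.17 to 3.19 are often realised by chaining off-the-shelf components. The sketch below is our own illustration with synthetic data, using the common choice of M - C principal components; it is not the authors' implementation:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Synthetic stand-ins: 10 individuals (classes) with 8 images each.
    rng = np.random.default_rng(0)
    gallery = rng.random((80, 32 * 32))
    labels = np.repeat(np.arange(10), 8)

    num_classes = len(np.unique(labels))
    pca = PCA(n_components=len(gallery) - num_classes)   # keep M - C components
    lda = LinearDiscriminantAnalysis()
    features = lda.fit_transform(pca.fit_transform(gallery), labels)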

Similarly to Belhumeur et al. (1997), Swets & Weng (1996) proposed using two projections to solve the problem of the within-class scatter S_W being singular. They proposed using the Karhunen-Loeve expansion to reduce the face dimensionality by an optimal projection. The resulting principal components of the faces would then be given as input to Linear Discriminant Analysis. In their experiment, the features found by the Karhunen-Loeve expansion were compared to the features found by Linear Discriminant Analysis to evaluate their performance, measured as how well each captured the variance of the dataset. Their experiment used the dataset supplied by the Weizmann Institute (Weizmann Face Database 2000). The dataset contained small variations in scale, pose and position; however, it also contained variable lighting and facial expressions. According to their experiments, Linear Discriminant Analysis outperformed the Karhunen-Loeve expansion: Linear Discriminant Analysis captured 95% of the variance with 15 features, compared to the Karhunen-Loeve expansion capturing 95% of the variance with 35 features. Furthermore, they found that Linear Discriminant Analysis was not affected by the lighting variations present in the face images.

Whereas Swets & Weng (1996) used two projections to prevent the within-class scatter S_W from becoming singular, Zhao et al. (1999) instead used a small positive number \epsilon that was added to the within-class scatter S_W to prevent it from becoming singular. This ensured the within-class scatter S_W would always be positive definite. They also emphasised using a modified weighted Euclidean distance metric to improve the recognition accuracy of faces. The modified weighted Euclidean distance metric was defined by weighting the regularised variances (eigenvalues). They compared their modified weighted distance metric to the standard weighted distance metric and the unweighted distance metric for face recognition. According to their experiment, the modified weighted Euclidean distance metric performed better than the other two distance metrics (Zhao et al. 1999).

Whilst Kim et al. (2002) observed the limitation of Linear Discriminant Analysis using a single

set of features for a gallery set, they proposed using a Linear Discriminant Analysis mixture

model. The Linear Discriminant Analysis mixture model fundamentally captured several differ-

ent sets of features for a gallery set. They described the Linear Discriminant Analysis mixture

model consisted of a Principal Component Analysis mixture model to firstly reduce the dimen-


sionality of faces and then a Linear Discriminant Analysis mixture model used to find the differ-

ent sets of features. The Linear Discriminant Analysis mixture model used the standard Fisher Discriminant Criterion, maximising the ratio of the between-class scatter S_B to the within-class scatter S_W. They used the PSL dataset (Wang & Tan 2000) taken from the MPEG7 community to

test their proposal. The dataset contained 271 different individuals and contained lighting and

pose variations. Comparing the Principal Component Analysis mixture model against the Linear

Discriminant Analysis mixture model, their experiments showed the Principal Component Anal-

ysis mixture model performed better than the Linear Discriminant Analysis mixture model for

a small number of features (see Martinez & Kak (2001) for a comparison of Principal Compo-

nent Analysis and Linear Discriminant Analysis). On the other hand, they claimed the Linear

Discriminant Analysis mixture model performed better than the Principal Component Analysis

mixture model for features greater than 35.

In contrast, Lu et al. (2003) proposed a new Linear Discriminant Analysis method called Direct-

Fractional Step Linear Discriminant Analysis. Direct-Fractional Step Linear Discriminant Anal-

ysis combined two existing Linear Discriminant Analysis approaches of Fractional-Step Linear

Discriminant Analysis (Lotlikar & Kothari 2000) and Direct-Linear Discriminant Analysis (Yu

& Yang 2001). Fractional-Step Linear Discriminant Analysis outlined the reduction of the face

dimensionality by a small number of iterations, whereas Direct-Linear Discriminant Analysis

directly classified the original high-dimensionality of face images. Furthermore, Direct-Linear

Discriminant Analysis minimised the loss of discriminatory information by evaluating whether

the null space, features with variances close to zero, provided any discriminatory information

that facilitated the identification of individuals (Chen et al. 2000). Direct-Fractional Step Lin-

ear Discriminant Analysis was described as using the Direct-Linear Discriminant Analysis ap-

proach to find a small number of features to form the feature subspace, which also took into

account the null space. Thereafter, the Direct-Fractional Step Linear Discriminant Analysis used

Fractional-Step Linear Discriminant Analysis to further reduce the feature subspace by remov-

ing the smallest variances (eigenvalues) after a number of iterations. The resulting small number

of features were then used to perform face recognition. In their experiments, they used the


ORL dataset (AT&T Face Database 1994) provided by AT&T Laboratories Cambridge and the

UMIST dataset (UMIST Face Database 2000). The ORL dataset contained 400 images of 40

individuals, with 10 images per individual, and included variations in lighting and facial expressions. In addition, the images were taken at different times and a few individuals wore glasses. From the 400 images in the ORL dataset, 200 were used for training and 200 for testing. The UMIST dataset had 575 images of 20 individuals, containing profile and frontal views. From these 575 images, they randomly selected 160 for training and used the remaining 415 for testing. From their

experiments, the Direct-Fractional Step Linear Discriminant Analysis achieved a low error rate

of 4% when using 22 features on the ORL dataset. However, they did not state the lowest error rate on the UMIST dataset, but showed a figure indicating approximately 2.5% when using 12 features.
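To make the preceding discussion concrete, the following is a minimal numpy sketch of the basic Fisher/Linear Discriminant Analysis feature extraction that these variants build upon. It is not the Direct-Fractional Step algorithm itself; the function name is ours, and we assume the face vectors have already been reduced (for example by Principal Component Analysis) so that the within-class scatter matrix is invertible.

import numpy as np

def lda_features(X, y, n_features):
    """Minimal Fisher/LDA sketch.  X: (n_samples, n_dims) array of (PCA-reduced)
    face vectors, y: identity labels.  Returns the projected features and the
    projection matrix W."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_dims = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((n_dims, n_dims))              # within-class scatter
    Sb = np.zeros((n_dims, n_dims))              # between-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
        diff = (mu_c - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalised eigenproblem Sb w = lambda Sw w via inv(Sw) Sb;
    # Sw must be non-singular, which is why the dimensionality is reduced first.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(-eigvals.real)
    W = eigvecs[:, order[:n_features]].real
    return X @ W, W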

3.3 Summary

We have discussed in this chapter several Karhunen-Loeve expansion and Linear Discriminant

Analysis approaches. These two holistic approaches are fundamentally different: Karhunen-Loeve expansion finds a small number of features from the principal components of the face, whereas Linear Discriminant Analysis finds a small number of features by grouping faces of the same individual closely together while separating faces of different individuals. As holistic approaches represent the global information of faces, their disadvantage is that the variances captured may not correspond to relevant features of the face. Therefore one

advantage of using feature-based approaches is that they attempt to accurately capture relevant

features from face images. In the next chapter, we shall discuss feature based approaches, which

use a priori information to uniquely identify individuals by their facial features.


Figure 3.2: First Eight Principal Components of the Upper Portion of the Face (Sirovich & Kirby 1987). Reading top to bottom and left to right, the 1st principal component is at the top left, the 4th at the bottom left, the 5th at the top right and the 8th at the bottom right.


Chapter 4

Feature Based Face Recognition

Feature based face recognition uses a priori information or local features of faces to select a

number of features to uniquely identify individuals. Local features include the eyes, nose, mouth,

chin and head outline, which are selected from face images. In this chapter, we will describe

general feature based approaches and the Elastic Bunch Graph Matching.

In the General Approaches section, we describe the work of Craw et al. (1992) where they applied

a priori information to find face features and a biologically motivated approach by Manjunath

et al. (1992).

In the Elastic Bunch Graph Matching section, we shall briefly describe a predecessor to Elastic

Bunch Graph Matching, called Dynamic Link Architecture by Lades et al. (1993). We shall

then describe Elastic Bunch Graph Matching which was proposed by Wiskott et al. (1999). Ad-

ditionally, we will also focus on other work on Elastic Bunch Graph Matching (Wiskott et al.

1995, Kalocsai et al. 2000). We now begin with the general approaches of feature based face

recognition.


4.1 General Approaches

The general approaches to feature based face recognition are concerned with using a priori infor-

mation of the face to find local face features. Alternatively, another general approach is to find

local significant geometries of the face that correspond to the local features of faces. We will

now discuss the general approaches that have been applied to face recognition.

Craw et al. (1992) were motivated to locate features within faces. Their approach utilised a priori

information to accurately find local features. Their implementation consisted of two parts: the

first part was designed to identify individual features, such as the eyes (general location of the

eyes), eye (iris and surrounding white around the iris), chin, cheek, hair, jaw-line, mouth, mouth-

bits (edges and outline of the lips), head outline and the nose; the second part refined the features

found from the first part by using a priori information to locate 40 pre-defined face features.

Specifically, the process of finding these 40 pre-defined features was to initially find the head outline within face images, which then facilitated finding the other face features.

They employed a polygonal shape as the initial head outline. An initial shape score (Craw et al.

1992) was used to measure how close the initial head outline matched the face. The head outline

was then iteratively deformed to match the average head outline for all face images. Allowable

deformation included scale, orientation and location. Additionally, the iterative deformation of

the head outline was guided by the secondary shape score (Craw et al. 1992) that indicated

how close the orientation and scale of the head outline matched the face image. Once the head

outline was found, they could concentrate on finding other face features based on the a priori

information of the face, within the boundaries of the head outline. From their experiments,

they reported a success rate of approximately 95% for 64 images based on a moving sequence

of a single person. They also experimented with 50 still frontal views of different individual

faces. Their approach successfully found 43 complete head outlines. However, they stated the

remaining head outlines found were only partial head outlines and were caused by the variations

around the chin and mouth of individual faces. Furthermore they indicated an error rate of 6%

was found for inaccurately identifying the areas of the chin and mouth of faces. Therefore, they


concluded the error rates could be directly attributed to faces having beards and moustaches.
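As a toy illustration only (not the Craw et al. (1992) implementation), the idea of scoring a polygonal head outline and iteratively deforming its scale, orientation and location can be sketched as follows; the edge-based scoring function, the hill-climbing strategy and the parameter ranges are our own assumptions.

import numpy as np

def transform(points, params):
    """Apply scale, rotation and translation to an (N, 2) array of outline points."""
    scale, angle, dy, dx = params
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return scale * points @ rotation.T + np.array([dy, dx])

def shape_score(outline, edge_map):
    """Toy score: mean edge strength sampled at the outline points."""
    rows = np.clip(np.round(outline[:, 0]).astype(int), 0, edge_map.shape[0] - 1)
    cols = np.clip(np.round(outline[:, 1]).astype(int), 0, edge_map.shape[1] - 1)
    return edge_map[rows, cols].mean()

def fit_head_outline(template, edge_map, n_iterations=500, seed=0):
    """Hill-climb over scale, orientation and location of a polygonal template."""
    rng = np.random.default_rng(seed)
    params = np.array([1.0, 0.0, 0.0, 0.0])          # scale, angle, row shift, col shift
    best = shape_score(transform(template, params), edge_map)
    for _ in range(n_iterations):
        candidate = params + rng.normal(0.0, [0.02, 0.02, 2.0, 2.0])
        score = shape_score(transform(template, candidate), edge_map)
        if score > best:                             # keep only improving deformations
            params, best = candidate, score
    return transform(template, params), best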

In contrast, Manjunath et al. (1992) proposed a method that did not utilise a priori information to

find face features but instead found significant curvature changes within faces that corresponded to the

face features. Their approach recognised faces in three stages. The first stage extracted features

from faces by utilising Gabor wavelets (Manjunath et al. 1992, equation 1). In the second stage

they created a graph like data structure that was used to represent the face features found as a

collection of interconnected nodes. There were two types of graphs created; an input face graph

represented a probe face and a model face graph represented a gallery face. Therefore, in the

third stage their approach matched the input face graphs to the model face graphs in order to

determine the identity of the probe set in relation to the gallery set. Specifically, the matching

process determined how similar an input face graph was to a gallery face by its distance. Hence,

a probe face with the smallest distance to a gallery face inferred a match. In their experiments,

they used a dataset of 306 faces. The dataset contained approximately 2 to 4 images of 86

individuals. Additionally, the dataset contained variations in facial expression, orientation and

minor variations in position and scale. They found an 86% recognition rate when the probe face

was within the first rank. Furthermore, a 96% recognition rate was found when the probe face was

within the top three ranked matches. In the next section, we shall describe a similar biologically

motivated approach.

4.2 Elastic Bunch Graph Matching

Elastic Bunch Graph Matching recognises faces by matching the probe set represented as the

input face graphs, to the gallery set that is represented as the model face graph. Fundamental to

the Elastic Bunch Graph Matching is the concept of nodes. Essentially, each node of the input

face graph is represented by a specific feature point of the face. For example, a node represents

an eye and another node represents the nose and the concept continues for representing the other

face features. Therefore the nodes for the input face graph are interconnected to form a graph

like data structure which is fitted to the shape of the face as illustrated in figure 4.1.


Figure 4.1: Face Graph (Wiskott et al. 1999)

In contrast, only one model face graph is used to represent the entire gallery set. The model face graph can be conceptually thought of

as a number of input face graphs stacked on top of each other and concatenated to form one

model face graph, with the exception that this is applied to the gallery set instead of the probe

set. Therefore, this would allow the grouping of the same types of face features from different

individuals. For example, the eyes of different individuals could be grouped together to form the

eye feature point for the model face graph and the noses of different individuals can be grouped

together to form the nose feature point for the model face graph. An illustration of the model

face graph is shown in figure 4.2.

Given the definitions of the input face graph and the model face graph, the identity of an input face graph is determined by achieving the smallest distance to the model face graph for a particular gallery face. The distance is determined by the node similarity measure between the input face graph and the model face graph (for example, see Wiskott et al. (1999)).
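For example, the magnitude-based similarity between two jets used in Wiskott et al. (1999), with $a_j$ and $a'_j$ denoting the magnitudes of the Gabor coefficients of jets $J$ and $J'$, can be written as

$$
S_a(J, J') = \frac{\sum_j a_j a'_j}{\sqrt{\sum_j a_j^2 \; \sum_j {a'_j}^2}},
$$

which equals 1 for identical jets and decreases as the jets become less similar.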

A predecessor to Elastic Bunch Graph Matching was the proposal by Lades et al. (1993) of an extension to Artificial Neural Networks, called Dynamic Link Architecture. Dynamic Link Architecture grouped sets of neurons into more symbolic representations. The purpose of grouping the neurons was to facilitate object recognition and to be invariant to position and other distortions. The focus of Dynamic Link Architecture was to demonstrate the method’s performance by recognising faces.


Figure 4.2: Model Face Graph (Wiskott et al. 1999) A single model face graph is used to

represent the entire gallery set. Generally, feature points of the model face graph are represented

by grouping together the same type of features from different individuals. The same type of

features could be the eyes or noses and the other relevant face features. By different individuals

we mean they are taken from the gallery set. A feature point of the model face graph refers to a

group of the same type of features taken from different individuals. For example, the eye feature

point of the model face graph is represented by a group of eyes taken from different individuals.

They added that Dynamic Link Architecture was not specifically

tuned for face recognition, but instead was intended to be used for generic object recognition.

Lades et al. (1993) initially proposed a complete Artificial Neural Networks approach to be im-

plemented. However, for ease of implementation, they chose to implement the Elastic Graph

Matching method instead. The implementation defined that each node represented a single feature of the face, which they called a jet. The jets were found using the Gabor wavelet transform (Wiskott et al. 1999). The nodes were connected by edges that defined the relative distance between two nodes, and were interconnected to form a graph that represented the face features. Recognition of faces was achieved by building an input face graph and then matching it to the model face graph to find the best match.
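As an illustration of how a jet might be computed, the following numpy sketch convolves a small patch around one pixel with a bank of complex Gabor kernels and keeps the response magnitudes. The kernel parameters, patch size and function names are illustrative assumptions rather than the values used by Lades et al. (1993) or Wiskott et al. (1999), and the feature point is assumed to lie far enough from the image border for the patch to fit.

import numpy as np

def gabor_kernel(size, wavelength, orientation, sigma):
    """Complex Gabor kernel: a plane wave along `orientation`, limited by a
    Gaussian envelope of width `sigma`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    carrier = x * np.cos(orientation) + y * np.sin(orientation)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.exp(1j * 2.0 * np.pi * carrier / wavelength)

def gabor_jet(image, row, col, wavelengths=(4, 8, 16), n_orientations=8, size=31):
    """Jet at one pixel: the magnitudes of the responses to a bank of Gabor
    kernels at several wavelengths and orientations."""
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    responses = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma=0.5 * wavelength)
            responses.append(np.sum(patch * np.conj(kernel)))
    return np.abs(np.array(responses))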


Wiskott et al. (1999) extended the Dynamic Link Architecture implementation to have multiple jets

for each node. The multiple jets represented the same types of features from different individuals,

which we described earlier at the start of this section. Therefore they called their extension Elastic

Bunch Graph Matching. In addition to proposing this extension, they employed a coarse to fine

strategy to find the best matching model face graph to the input face graph. At a coarse level, they used a similarity measure that did not consider phase variation (Wiskott et al. (1999), eq. 5), where phase refers to the phase of the complex Gabor coefficients computed from the face image pixels. At a fine level, phase variation (Wiskott et al. (1999), eq. 7) was considered, because phase variations are caused by translations of the face image pixels. Therefore, by minimising phase variations, the jets are located more accurately, ultimately leading to greater accuracy in identifying individuals.

From their experiment they used the FERET (Phillips et al. 2000) and the Bochum (Bochum Face

Database 1995) dataset. They used 250 frontal views of individuals from the FERET dataset as

the model face graph. Their results on the FERET dataset found a 98% recognition accuracy,

where 245 out of the 250 input face graphs that contained variations in facial expression were

correctly identified. For the Bochum dataset they used 108 neutral frontal views for the model

face graph. They found a 94% recognition accuracy, where 102 out of the 108 input face graphs

with variations in facial expressions were correctly recognised. Also see Okada et al. (1998) for

details on their preparation and outcome from the FERET Phase III Test.

Wiskott et al. (1995) proposed a specific implementation of Elastic Bunch Graph Matching that

identified the gender of individuals. They proposed to use high level topological information

to describe the nodes of the model face graph, whereby the individual nodes of the model face

graph were labelled as either male, female, bearded or having glasses. The gender of the input

face graph was determined when the majority of the matching nodes from the model face graph

were labelled either male or female. For example if the majority of the nodes were labelled male,

the input face graph gender was determined as being male. In their experiment they used 112

faces with neutral frontal views. Of those 112 faces, 65% were male, 28% wore glasses and 19% had beards. They achieved 90.2% correct gender identification, 90.2% correct detection of faces wearing glasses and 92.9% correct detection of faces having beards.


An improvement to the Elastic Bunch Graph Matching method was proposed by Kalocsai et al.

(2000). In their investigation, they explored the effect of weighting Gabor kernels to improve

face recognition, where 40 Gabor kernels were produced from 48 feature points of the face. They

found from using a dataset of Caucasian faces that the most discriminatory face features were

situated around the forehead and eyes. In contrast, the least discriminatory face features were the

mouth, nose, cheeks and the lower outline of the face. They also used a dataset of Asian faces and

found similar results to the Caucasian faces, where the most discriminatory face features from

the Asian dataset were the forehead and eyes. However, they also found other discriminatory

face features that included the nose and cheeks. They concluded that using the highest weighted kernels, as compared to the lowest weighted kernels, would provide a more compact representation of faces and achieve higher recognition rates.

4.3 Summary

We have described in this chapter feature-based approaches, covering general approaches and Elastic Bunch Graph Matching. We have described that feature-based approaches apply a priori information to find local face features that are used to uniquely identify individuals. We have seen in our description of general approaches that a priori information constrains the search space for finding and locating face features, and is also used to conceptually relate the features found to the high-level semantics of the face. In contrast, Elastic

Bunch Graph Matching uses a biologically motivated approach to finding and locating the face

features. It also represented the features found as a graph like structure that semantically resem-

bled the shape of the human face. In the next chapter we shall evaluate the performance of a

holistic approach and a feature-based approach in our experiments.


Chapter 5

Experiments

The aim of the experiments is to compare how accurately the Eigenface (Turk & Pentland 1991) and Elastic Bunch Graph Matching (Wiskott et al. 1999) implementations can identify individuals under varying face conditions.

Our comparison of a holistic and a feature-based approach uses the Eigenface implementation developed by the Massachusetts Institute of Technology Media Laboratory Vision and Modelling Group and the Elastic Bunch Graph Matching implementation developed by the Colorado State University Computer Science Department. We have modified their programs for the purposes of enabling an objective comparison.

In the dataset section, we shall describe the AR Face database (Martinez & Benavente 1998) that

was used in our experiments. In the results section, we test the accuracy on the dataset, where

we will highlight the results obtained by examining the Eigenface and Elastic Bunch Graph

Matching implementation. We conclude with a discussion on the performance of the Eigenface

and the Elastic Bunch Graph Matching. We now begin by describing the dataset.


1. neutral expression
2. smile
3. anger
4. scream
5. lighting left side of face
6. lighting right side of face
7. lighting all sides of face
8. wearing sun glasses
9. wearing sun glasses & lighting left side of face
10. wearing sun glasses & lighting right side of face
11. wearing a scarf (covering mouth & neck)
12. wearing a scarf & lighting left side of face
13. wearing a scarf & lighting right side of face

Table 5.1: 13 Face Variations. The face images were taken in two sessions, 14 days apart, and captured the 13 defined face variations listed above.

5.1 Dataset

We chose the AR Face Dataset (Martinez & Benavente 1998) as we wanted to use a different

dataset to the FERET database (Phillips et al. 2000), so we could independently evaluate the Eigenface and Elastic Bunch Graph Matching implementations. Since we wanted to compare these two approaches, we utilised the cumulative recognition rate, which was used in

the FERET program (Phillips et al. 2000). This cumulative recognition rate would therefore

allow us to objectively compare the performance of these two approaches.

The gallery set contained 126 individuals: 70 men and 56 women. All 13 face variations, as outlined in table 5.1, were based on a frontal view of the face. The original images had a dimension of 768 by 576 pixels, but we rescaled them to 128 by 96 pixels for our

experiments. Specifically, the AR Face Dataset consisted of two sessions that were taken 14 days

apart.

In relation to the experiments, we used two sets to represent the 126 individuals as the gallery set. The first gallery set consisted of 133 images of the 126 individuals with only neutral expres-

sions that were taken from the first session. The second gallery set had 119 images of the 126

individuals with only neutral expressions and were taken from the second session. For the probe

sets, a combined total of 3,030 different faces were used and corresponded to the 126 different


individuals, but these images covered the other 12 face variations and did not include the neutral

face expression. The probe set was divided into 2 sets, where the 1st set had 1,596 face images

taken from the first session of the AR Face Dataset, and the 2nd set had 1,434 face images taken

from the second session of the AR Face Dataset. We emphasise that the probe sets were not part

of the gallery set as they did not consist of neutral expressions.

5.2 Results

Our criterion for evaluating the accuracy of the Elastic Bunch Graph Matching and Eigenface approaches was the cumulative recognition rate of Phillips et al. (2000), which we defined earlier in chapter 2.
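As a reference for how this measure can be computed, a minimal sketch is given below. It assumes a precomputed probe-to-gallery distance matrix and that every probe identity appears at least once in the gallery; the function name is our own.

import numpy as np

def cumulative_recognition_rate(distances, probe_ids, gallery_ids, max_rank=50):
    """distances[i, j]: distance between probe i and gallery image j.
    Returns, for ranks 1..max_rank, the fraction of probes whose correct
    identity appears among the r closest gallery images."""
    distances = np.asarray(distances)
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    best_rank = np.empty(distances.shape[0], dtype=int)
    for i in range(distances.shape[0]):
        order = np.argsort(distances[i])                   # closest gallery images first
        hits = np.where(gallery_ids[order] == probe_ids[i])[0]
        best_rank[i] = hits[0] + 1                         # rank of the first correct match
    return np.array([(best_rank <= r).mean() for r in range(1, max_rank + 1)])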

We systematically tested the Eigenface and Elastic Bunch Graph Matching approaches by

testing the 1st probe set against the 1st gallery set and testing the 2nd probe set against the 2nd

gallery set.

In our first experiment we performed a series of tests on the Eigenface approach. One test was

to measure the identification performance. Using the gallery sets, we incrementally trained the system, varying the number of Eigenfaces used. For the first gallery set we trained between 1 and 133 Eigenfaces, while for the second gallery set we trained between 1 and 119 Eigenfaces. By varying the number of Eigenfaces, we found the highest cumulative

recognition rates were achieved when we trained 133 and 119 Eigenfaces for the first and second

gallery set respectively. As seen in figure 5.1, training 133 Eigenfaces for the 1st Set found a

cumulative recognition rate of 71.2% and training 119 Eigenfaces for the 2nd Set found a 71.9%

cumulative recognition rate.
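For clarity, the following is a minimal numpy sketch of the Eigenface training and identification steps (mean face, principal components via the singular value decomposition, and nearest-neighbour matching in face space). It is a simplified illustration rather than the MIT implementation used in our experiments, and the function names are our own.

import numpy as np

def train_eigenfaces(gallery, n_components):
    """gallery: (n_images, n_pixels) array of vectorised faces."""
    mean_face = gallery.mean(axis=0)
    centred = gallery - mean_face
    # The singular value decomposition yields the principal components
    # (Eigenfaces) without forming the full pixel covariance matrix.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    eigenfaces = vt[:n_components]                  # (n_components, n_pixels)
    gallery_weights = centred @ eigenfaces.T        # gallery faces projected into face space
    return mean_face, eigenfaces, gallery_weights

def identify(probe, mean_face, eigenfaces, gallery_weights, gallery_ids):
    """Nearest-neighbour match in face space using the Euclidean distance."""
    weights = (probe - mean_face) @ eigenfaces.T
    distances = np.linalg.norm(gallery_weights - weights, axis=1)
    return gallery_ids[np.argmin(distances)]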

Our next experiment tested the Elastic Bunch Graph Matching. As stated in Wiskott et al. (1999), they manually selected the face feature locations. We also manually selected the feature locations of the face, but only for a small number of images. Thereafter, we decided to

use the Elastic Bunch Graph Matching to automate the selection of the remaining features for


[Figure 5.1 plot: cumulative recognition rate (approximately 64% to 72%) against rank (0 to 50) for the 1st and 2nd sets]

Figure 5.1: Eigenface Results. For the 1st Set, 133 Eigenfaces were used to train the system; for the 2nd Set, 119 Eigenfaces were used. The rank in figure 5.1 represents the cumulative individual rank of probe images against the gallery. The cumulative recognition rate is defined as the identification accuracy for the Eigenface approach.

the two gallery sets. We used a narrowing local search approach from the elastic bunch graph

matching to automate feature selection. Interestingly, the features found from this automated selection were just as accurate as our manually selected features, so we used the automatically selected features to train the system.

Another important aspect of Elastic Bunch Graph Matching was deciding on the type of simi-

larity measure for matching the probe and gallery sets. We decided to use a predictive iterative

search for the similarity measure. The predictive iterative search estimated a given feature location and iteratively searched around it for the maximum similarity measure for each feature. It stopped once the maximum similarity measure found could no

longer be increased (Bolme 2003). In figure 5.2, we achieved a 95.0% cumulative recognition

rate for the 1st set. In comparison, we achieved a 97.3% cumulative recognition rate for the 2nd

set.
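A minimal sketch of such an iterative local search is given below; it moves a feature point to whichever neighbouring pixel most increases the jet similarity against the bunch of model jets, and stops when no neighbour improves it. The helper functions extract_jet and similarity are assumed to exist (for example, Gabor jet extraction and jet similarity), and this is our own simplification rather than the CSU implementation.

def refine_feature_location(image, start, bunch_jets, extract_jet, similarity,
                            step=1, max_steps=50):
    """Greedy local search: repeatedly move the feature point to the neighbouring
    pixel with the highest similarity to any jet in the bunch, stopping when the
    similarity can no longer be increased."""
    def bunch_similarity(row, col):
        jet = extract_jet(image, row, col)
        return max(similarity(jet, model_jet) for model_jet in bunch_jets)

    row, col = start
    best = bunch_similarity(row, col)
    for _ in range(max_steps):
        neighbours = [(row + dr, col + dc)
                      for dr, dc in ((-step, 0), (step, 0), (0, -step), (0, step))]
        scored = [(bunch_similarity(r, c), r, c) for r, c in neighbours]
        top_score, top_r, top_c = max(scored)
        if top_score <= best:            # no neighbour improves the similarity: stop
            break
        best, row, col = top_score, top_r, top_c
    return (row, col), best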


[Figure 5.2 plot: cumulative recognition rate (60% to 100%) against rank (0 to 50) for the 1st and 2nd sets]

Figure 5.2: Elastic Bunch Graph Matching Results. We achieved these results by using a narrowing local search to locate face features at specific locations of the face, and an iterative predictive search to measure the match between the probe and gallery sets based on their similarity measure. The iterative predictive search found the maximum similarity measure for each feature by continuously searching around a specific location of the face, and stopped once the maximum similarity measure could not be further increased. The rank in figure 5.2 represents the cumulative individual rank of probe images against the gallery. The cumulative recognition rate is defined as the identification accuracy for the Elastic Bunch Graph Matching approach.

In figures 5.3 and 5.4, we show a comparison of the performance of the Eigenface and Elastic Bunch Graph Matching on the 1st and 2nd sets of the AR Face Dataset (Martinez & Benavente 1998). For both the 1st and 2nd Set, the Elastic Bunch Graph Matching outperforms the Eigenface approach beyond the first rank, but both achieve a similar cumulative recognition rate at the first rank. This was indicated by the Eigenface achieving 63.8% for the 1st Set and 65.8%

for the 2nd Set as compared to the Elastic Bunch Graph Matching, which achieved 62.9% for the

1st Set and 64.9% for the 2nd Set.


[Figure 5.3 plot: cumulative recognition rate (60% to 100%) against rank (0 to 50) for the Eigenface and Elastic Bunch Graph Matching]

Figure 5.3: 1st Set, Eigenface & Elastic Bunch Graph Matching. We compare the cumulative recognition rate on the 1st set for the Eigenface and Elastic Bunch Graph Matching approaches.

5.3 Discussion

Overall, the Eigenface and Elastic Bunch Graph Matching performed better on the 2nd set than

the 1st set, as shown in figures 5.1 and 5.2. Our primary interpretation is that the probe set of the 2nd set contains fewer faces than that of the 1st set. This implies either that the smaller number of tests being performed yields a higher recognition rate (although it could equally lead to higher error rates), or that the images in the probe set of the 2nd set are more similar to its gallery set than is the case for the probe and gallery sets of the 1st set.

Comparing the performance of the Eigenface and Elastic Bunch Graph Matching approaches, the cumulative recognition rate indicates that the Elastic Bunch Graph Matching in this instance had a higher identification rate. Two factors can be attributed to the Elastic Bunch Graph Matching performance: the Gabor wavelets used to capture the features accurately found features around particular locations of the face. In addition, when face features were covered by objects such as sun glasses and scarves, the system relied on the other features to perform recognition.


[Figure 5.4 plot: cumulative recognition rate (60% to 100%) against rank (0 to 50) for the Eigenface and Elastic Bunch Graph Matching]

Figure 5.4: 2nd Set, Eigenface & Elastic Bunch Graph Matching. We compare the cumulative recognition rate on the 2nd set for the Eigenface and Elastic Bunch Graph Matching approaches.

It also seemed that even though the features were automatically selected, as opposed to manually selected, the system still achieved a high recognition rate.

On the other hand, the Eigenface approach did not perform as well. A few reasons can be attributed to the Eigenface performance; critical to the performance was the variance used to uniquely identify gallery faces. Some probe faces matched to gallery faces were not within the defined face space, as these probe faces contained vast variations in illumination and changes in the shape of faces. This was attributed to faces wearing a scarf or sun glasses. Another factor was that the probe sets deviated substantially from the average face of the gallery set. Hence, we gather that because the variance of the probe sets is vastly different to the variance of the gallery set, this has directly affected the performance of the Eigenface approach.


Chapter 6

Conclusions

Face recognition is a difficult problem because faces can vary substantially in their orientation,

lighting, scale and facial expression. The first goal was to provide a survey of recent holistic and

feature based approaches that complement previous surveys. We described holistic approaches

including Karhunen-Loeve expansion and Linear Discriminant Analysis. The holistic approach has the advantages of distinctly capturing the most prominent features within face images, which are used to uniquely identify individuals amongst a gallery set, as well as automatically finding features.

However, the disadvantages of holistic approaches are that recognition performance could be significantly affected by a probe set deviating from the average face of a gallery set because of lighting, orientation and scale, or by features being captured that do not form part of the face, for example, features captured from the background of a face image.

This led to our discussion on feature based approaches, where we described general approaches

and elastic bunch graph matching. The advantages of feature based approaches include the accurate selection of facial features to uniquely identify individuals and their robustness in recognition performance despite variations in facial expression, occlusion of faces by other objects, orientation or lighting. On the other hand, the disadvantages of this approach are that manually selected features may be located inaccurately by the human user, while automatically selected features rely on the accuracy of the


feature based approach, which can also lead to inaccurate location of face features.

The second goal was to compare the performance of a holistic and a feature based approach on the

AR face database (Martinez & Benavente 1998). In our comparison, we found the Elastic Bunch

Graph Matching outperformed Eigenface approach when we tested these two approaches on the

AR face database. The tests were divided into two sets: for the 1st set the cumulative recogni-

tion rate for Elastic Bunch Graph Matching was 95.2% while the Eigenface approach achieved

71.2%; in the 2nd set, the cumulative recognition rate for Elastic Bunch Graph Matching was

97.5%, whereas, the Eigenface approach was 71.9%. Closer analysis of the results showed that

for the first rank, the Eigenface and Elastic Bunch Graph Matching approaches achieved relatively equal cumulative recognition rates, but for the remaining ranks the Eigenface approach did not substantially increase its cumulative recognition rate because faces were too different from the average face of the gallery set. In comparison, the Elastic Bunch Graph Matching incrementally increased its cumulative recognition rate as the rank increased.

In our future work, we would like to extend our experiments to evaluate a wider breadth of

holistic and feature based approaches. The Eigenface approach in our experiment was limited

as it did not scale to larger datasets. This was because the gallery and probe set needed to be

stored in memory in order to perform face recognition. We would like to extend the Eigenface approach with a more memory-efficient implementation to handle much larger datasets. Future

work for the face recognition area includes many research directions including use of colour

two-dimensional images, three-dimensional models as opposed to using two-dimensional images

to handle varying orientations, adequate size and scale of faces and the handling of lighting

variations.

Interestingly, just as the research community continues to improve existing face recognition ap-

proaches, face recognition has also become commercially viable, with face recognition vendors

reporting they can achieve robust and high recognition rates. To verify these claims, a recent independent face recognition vendor evaluation by Phillips et al. (2003a) evaluated 10 leading commercial vendors of face recognition. They found that Cognitec Systems


GmbH’s FaceVACS, Eyematic’s Visual Sensing Software and Identix’s FaceIt were amongst the top three for the highest recognition accuracy achieved. Therefore, we provide a brief account of these three leading commercial vendors’ face recognition systems. Cognitec Systems GmbH’s FaceVACS can be classified as a feature based approach. This system recognises faces in the following process:

1. locate the position and size of faces; determine the positions of the eyes

2. check whether the quality of the captured image is adequate for face recognition

3. faces are normalised by scaling the faces to a fixed size and positioning the eyes to a fixed

location

4. faces are pre-processed using standard image processing techniques

5. face features are extracted

6. if building a gallery set, features for each face are stored as part of the gallery set

7. features from the gallery set are compared to the probe set to determine a match

8. a match is determined by setting a threshold: if the match score is higher than the threshold, this constitutes a match (a minimal sketch of this step follows the list)
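The sketch below is our own simplification of this thresholded matching step, not Cognitec’s implementation; match_score is an assumed scoring function between two feature sets.

def verify_identity(probe_features, gallery_features, match_score, threshold):
    """Return the best-scoring gallery identity if its score clears the threshold,
    otherwise report no match.  gallery_features maps identity -> stored features."""
    scores = {identity: match_score(probe_features, features)
              for identity, features in gallery_features.items()}
    best_identity, best_score = max(scores.items(), key=lambda item: item[1])
    if best_score >= threshold:        # a score above the threshold constitutes a match
        return best_identity, best_score
    return None, best_score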

Recently, the Cognitec Systems GmbH’s FaceVACS was installed at Sydney’s International Air-

port, as part of the SmartGate system. The software was used to verify the identity of all air crew

before they boarded the aeroplanes. Similarly, Eyematic’s Visual Sensing Software is also considered a feature based approach. The system consisted of four components: locating faces within images, finding local feature points, extracting those local feature points and then matching the features found to a gallery set of faces. The matching of local feature points was based on a threshold of acceptance, where feature points scoring above the threshold indicated a match. Identix’s FaceIt used local features of the faces to perform face recognition and was also considered a feature based approach. In the Phillips et al. (2003a) vendor


evaluation, the identification recognition rate achieved by Cognitec Systems GmbH’s FaceVACS

was 87% (Phillips et al. 2003b, fig. 7), second was Identix’s FaceIt with 84% (Phillips et al.

2003b, fig. 7) and the third highest was Eyematic’s Visual Sensing Software with 80% (Phillips

et al. 2003b, fig. 7).

Commercial face recognition systems have provided comparable performances to those being

developed by the research community. However, we have seen that biologically motivated face recognition approaches have achieved robust and accurate results to the present. We envisage that as researchers work more closely with certain aspects of human cognition and perception, their approaches will yield higher recognition rates as well as being more robust to larger datasets. In perspective, as we strive to develop autonomous face recognition systems, we should also conceptually model the characteristics of human cognition and perception.


Bibliography

Adini, Y., Moses, Y. & Ullman, S. (1997), ‘Face Recognition: The problem of Compensating for Changes in Illumination Direction’, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7), 721–732.

AT&T Face Database (1994), http://www.uk.research.att.com/facedatabase.html. AT&T Laboratories Cambridge.

Baron, R. (1981), ‘Mechanisms of Human Facial Recognition’, International Journal of Man-Machine Studies 15, 137–138.

Belhumeur, P., Hespanha, J. & Kriegman, D. (1997), ‘Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection’, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7), 711–720.

Benson, P. & Perret, D. (1991), Perception and Recognition of Photographic Quality Facial Caricatures: Implications for the Recognition of Natural Images, in V. Bruce, ed., ‘Face Recognition’, Vol. 3(1), Lawrence Erlbaum Associates, East Sussex, England, United Kingdom, pp. 105–135.

Benton, A. (1980), ‘The Neuropsychology of Facial Recognition’, American Psychologist 35, 176–186.

Beymer, D. & Poggio, T. (1995), Face Recognition From One Example View, A.I. Memo No. 1536, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Massachusetts, United States.

Bochum Face Database (1995), http://www.neuroinformatik.ruhr-uni-bochum.de/top.html. Institut für Neuroinformatik, Ruhr-University.

Bolme, D. (2003), Elastic Bunch Graph Matching, Master’s thesis, Colorado State University, Fort Collins, Colorado, United States.

Bruce, V. & Young, A. (1986), ‘Understanding Face Recognition’, British Journal of Psychology 77, 305–327.

Brunelli, R. & Poggio, T. (1993), ‘Face Recognition: Features versus Templates’, Pattern Analysis and Machine Intelligence 15(10), 1042–1052.

Burges, C. (1998), ‘A Tutorial on Support Vector Machines for Pattern Recognition’, Data Mining and Knowledge Discovery 77, 1–43.

Chellappa, R., Wilson, C. & Sirohey, S. (1995), ‘Human and Machine Recognition of Faces: A Survey’, Proceedings of the IEEE 83(5), 705–741.

Chen, L., Liao, H., Ko, M., Lin, J. & Yu, G. (2000), ‘A new LDA-based face recognition system which can solve the small sample size problem’, Pattern Recognition 33(10), 1713–1726.

Clowes, M. (1971), ‘On Seeing Things’, Artificial Intelligence 2, 79–112.


Colmenarez, A. & Huang, T. (1998), Face Detection and Recognition, in H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie & T. S. Huang, eds, ‘Face Recognition: From Theory to Applications’, NATO ASI Series F, Springer-Verlag.

Craw, I., Tock, D. & Bennett, A. (1992), Finding Face Features, in ‘European Conference on Computer Vision’, pp. 92–96.

Ellis, H. (1986), Processes Underlying Face Recognition, in R. Bruyer, ed., ‘The Neuropsychology of Face Perception and Facial Expression’, Lawrence Erlbaum Associates, New Jersey, United States.

Fisher, R. (1936), ‘The use of multiple measures in taxonomic problems’, Ann. Eugenics 7, 179–188.

Fukunaga, K. (1990), Introduction to Statistical Pattern Recognition, Computer Science and Science Computing, 2nd edn, Academic Press, New York.

Georghiades, A., Kriegman, D. & Belhumeur, P. (1998), Illumination Cones for Recognition Under Variable Lighting: Faces, in ‘Conference Proceedings on Computer Vision and Pattern Recognition’, California, United States, pp. 52–58.

Goldstein, A., Harmon, L. & Lesk, A. (1971), ‘Identification of Human Faces’, Proceedings of the IEEE 59, 748–760.

Guo, G., Li, S. & Chan, K. (2000), Face recognition by Support Vector Machines, in ‘Proceedings of Fourth IEEE Conference Automatic Face and Gesture Recognition’, Grenoble, France, pp. 196–201.

Harmon, L., Khan, M., Lasch, R. & Ramig, P. (1981), ‘Machine Identification of Human Faces’, Pattern Recognition 13, 97–110.

Hay, D. & Young, A. (1982), The Human Face, in A. Ellis, ed., ‘Normality and Pathology in Cognitive Functions’, Academic Press, New York, United States, pp. 173–202.

Hecaen, H. & Angelergues, R. (1962), ‘Agnosia for Faces (Prosopagnosia)’, Archives of Neurology 7, 92–100.

Johnsson, K., Kittler, J., Li, Y. & Matas, J. (1999), Support Vector Machines for Face Authentication, in ‘British Machine Vision Conference’, Nottingham, United Kingdom, pp. 543–553.

Kalocsai, P. & Biederman, I. (1998), Differences of Face and Object Recognition in Utilizing Early Visual Information, in H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie & T. S. Huang, eds, ‘Face Recognition: From Theory to Applications’, NATO ASI Series F, Springer-Verlag.

Kalocsai, P., von der Malsburg, C. & Horn, J. (2000), ‘Face Recognition by statistical analysis of feature detectors’, Image and Vision Computing 18, 273–278.

Kanade, T. (1973), Picture Processing System by Computer Complex and Recognition of Human Faces, Ph.D. thesis, Department of Information Science, Kyoto University, Japan.

Kaya, Y. & Kobayashi, K. (1972), A Basic Study of Human Face Recognition, in S. Watanabe, ed., ‘Frontiers of Pattern Recognition’, Academic Press, New York, United States, pp. 265–289.

Kim, H., Kim, D. & Bang, S. (2002), Face Recognition using LDA Mixture Model, in ‘16th International Conference on Pattern Recognition’, Vol. 2, pp. 486–489.

Kirby, M. & Sirovich, L. (1990), ‘Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces’, IEEE Transactions on Pattern Analysis and Machine Intelligence 12(1), 103–108.


Kohonen, T. (1984), Self-Organisation and Associative Memory, Springer-Verlag, Berlin, Germany.

Lades, M., Vorbruggen, J., Buhmann, J., Lange, J., von der Malsburg, C., Wurtz, R. & Konen, W. (1993), ‘Distortion invariant object recognition in the dynamic link architecture’, IEEE Transactions on Computers 42(3), 300–311.

Lando, M. & Edelman, S. (1995), Generalisation From a Single View in Face Recognition, in ‘International Workshop on Automatic Face- and Gesture-Recognition’, Zurich, Switzerland, pp. 80–85.

Lotlikar, R. & Kothari, R. (2000), ‘Fractional-step dimensionality reduction’, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(6), 623–627.

Lu, J., Plataniotis, K. & Venetsanopoulos, A. (2003), Face Recognition using LDA-Based Algorithms, in ‘IEEE Transactions on Neural Networks’, Vol. 14(1), pp. 195–200.

Manjunath, B., Chellappa, R. & von der Malsburg, C. (1992), ‘A Feature Based Approach to Face Recognition’, IEEE Conference Proceedings on Computer Vision and Pattern Recognition pp. 373–378.

Marr, D. (1980), A Computational Investigation into the Human Representation and Processing of Visual Information, in J. Wilson, ed., ‘Vision’, W.H. Freeman and Company, San Francisco, United States.

Martinez, A. & Benavente, R. (1998), The AR Face Database, Technical Report 24, Computer Vision Center, Universitat Autonoma de Barcelona (UAB), Barcelona, Spain.

Martinez, A. & Kak, A. (2001), ‘PCA versus LDA’, IEEE Transactions on Pattern Analysis and Machine Intelligence 23(2), 228–233.

Minsky, M. (1975), A Framework for Representing Knowledge, in P. Winston, ed., ‘The Psychology of Computer Vision’, McGraw-Hill, New York, United States.

Moghaddam, B., Wahid, W. & Pentland, A. (1998), Beyond Eigenfaces: Probabilistic Matching for Face Recognition, in ‘Proceedings of Third IEEE International Conference on Automatic Face and Gesture Recognition’, Nara, Japan, pp. 30–35.

Moses, Y., Edelman, S. & Ullman, S. (1996), ‘Generalisation to Novel Images in Upright and Inverted Faces’, Perception 25, 443–461.

Moses, Y. & Ullman, S. (1998), ‘Generalization to Novel Views: Universal, Class-based, and Model-based Processing’, International Journal on Computer Vision 29, 233–253.

Okada, K., Steffens, J., Maurer, T., Hong, H., Elagin, E., Neven, H. & von der Malsburg, C. (1998), The Bochum/USC Face Recognition System And How it Fared in the FERET Phase III Test, in H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie & T. S. Huang, eds, ‘Face Recognition: From Theory to Applications’, Springer-Verlag, pp. 186–205.

O’Toole, A., Abdi, H., Deffenbacher, K. & Valentin, D. (1993), ‘Low-dimensional representation of faces in higher dimensions of the face space’, Journal of the Optical Society of America A 10(3), 405–411.

Pentland, A., Moghaddam, B. & Starner, T. (1994), View-Based and Modular Eigenspaces for Face Recognition, in ‘Proceedings of IEEE Conference on Computer Vision and Pattern Recognition CVPR ’94’, Seattle, Washington, pp. 84–91.

Phillips, P., Grother, P., Micheals, R., Blackburn, D., Tabassi, E. & Bone, M. (2003a), Face Recognition Vendor Test 2002 - Evaluation Report, Technical Report NISTIR 6965, DoD Counterdrug Technology Development Program Office, Virginia, United States.


Phillips, P., Grother, P., Micheals, R., Blackburn, D., Tabassi, E. & Bone, M. (2003b), Face Recognition Vendor Test 2002 - Overview and Summary, Technical Report NISTIR 6965, DoD Counterdrug Technology Development Program Office, Virginia, United States.

Phillips, P., Moon, H., Rizvi, S. & Rauss, P. (2000), ‘The FERET Evaluation Methodology for Face-Recognition Algorithms’, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10), 1090–1104.

Rajagopalan, A., Burlina, P. & Chellappa, R. (1999), Higher Order Statistical Learning for Vehicle Detection in Images, in ‘Proceedings of the Seventh International Conference on Computer Vision’, Vol. 2, pp. 1204–1209.

Rhodes, G., Brennan, S. & Carey, S. (1987), ‘Identification and Ratings of Caricatures: Implications for Mental Representations of Faces’, Cognitive Psychology 190, 473–497.

Samal, A. & Iyengar, P. (1992), ‘Automatic Recognition and analysis of human faces and facial expressions: A survey’, Pattern Recognition 25(1), 65–77.

Samaria, F. (1994), Face Recognition using Hidden Markov Models, Ph.D. thesis, Trinity College, University of Cambridge, England.

Samaria, F. & Fallside, F. (1993), ‘Face Identification and Feature Extraction using Hidden Markov Models’, Image Processing: Theory and Applications.

Samaria, F. & Young, S. (1994), ‘HMM-Based Architecture for Face Identification’, Image and Vision Computing 12(8), 537–543.

Sergent, J. (1989), Structural Processing of Faces, in A. Young & H. Ellis, eds, ‘Handbook of Research on Face Processing’, Vol. 3(1), Elsevier Science Publishers B.V., Amsterdam, Netherlands, pp. 57–91.

Shepherd, J., Gibling, F. & Ellis, H. (1991), The Effects of Distinctiveness, Presentation Time and Delay on Face Recognition, in V. Bruce, ed., ‘Face Recognition’, Vol. 3(1), Lawrence Erlbaum Associates, East Sussex, England, United Kingdom, pp. 137–145.

Sirovich, L. & Kirby, M. (1987), ‘Low-dimensional procedure for the characterization of human faces’, Journal of the Optical Society of America A 4(3), 519–524.

Stonham, J. (1986), Practical Face Recognition and Verification with WISARD, in H. Ellis, M. Jeeves, F. Newcombe & A. Young, eds, ‘Aspects of Face Processing’, Martinus Nijhoff, Dordrecht, Netherlands.

Swets, D. & Weng, J. (1996), ‘Using Discriminant Eigenfeatures for Image Retrieval’, IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8), 831–836.

Tefas, A., Kotropoulos, C. & Pitas, I. (2001), Using support vector machines to enhance the performance of elastic graph matching for frontal face authentication, in ‘IEEE Transactions on Pattern Analysis and Machine Intelligence’, Vol. 23(7), Grenoble, France, pp. 735–746.

Turk, M. (2001), ‘A Random Walk Through Eigenspace’, IEICE Transactions on Information and Systems E84-D(12), 1586–1595.

Turk, M. & Pentland, A. (1991), ‘Eigenfaces for Recognition’, Journal of Cognitive Neuroscience 3(1), 71–86.

UMIST Face Database (2000), http://images.ee.umist.ac.uk/danny/database.html. University of Manchester Institute of Science and Technology.

Vapnik, V. (1998), Statistical Learning Theory, John Wiley & Sons, New York, United States.


Wang, L. & Tan, T. K. (2000), Experimental Results of Face Description Based on the 2nd-order Eigenface Method, ISO/MPEG m6001, Panasonic Singapore Laboratories Pte Ltd (PSL).

Weizmann Face Database (2000), ftp://ftp.idc.ac.il/pub/users/cs/yael/Facebase/. Weizmann Institute.

Wiskott, L., Fellous, J., Kruger, N. & von der Malsburg, C. (1995), Face Recognition and Gender Determination, in M. Bichsel, ed., ‘Proceedings of International Workshop on Automatic Face- and Gesture-Recognition’, Zurich, pp. 92–97.

Wiskott, L., Fellous, J., Kruger, N. & von der Malsburg, C. (1999), Face recognition by elastic bunch graph matching, in L. C. Jain et al., ed., ‘Intelligent Biometric Techniques in Fingerprint and Face Recognition’, CRC Press, chapter 11, pp. 355–396.

Xi, D., Podolak, I. & Lee, S. (2002), Facial component extraction and face recognition with support vector machines, in ‘Proceedings of Fifth IEEE Conference Automatic Face and Gesture Recognition’, Washington DC, United States, pp. 76–81.

Yale Face Database (1997), http://cvc.yale.edu/projects/yalefaces/yalefaces.html. Yale University.

Yang, M., Ahuja, N. & Kriegman, D. (2000), Face Recognition Using Kernel Eigenfaces, in ‘Proceedings of the International Conference on Image Processing’, Vol. 1, Vancouver, Canada, pp. 37–40.

Young, A. & Bruce, V. (1991), Perceptual Categories and the Computation of “Grandmother”, in V. Bruce, ed., ‘Face Recognition’, Vol. 3(1), Lawrence Erlbaum Associates, East Sussex, England, United Kingdom, pp. 5–49.

Yu, H. & Yang, J. (2001), ‘A direct LDA algorithm for high-dimensional data - with application to face recognition’, Pattern Recognition 34(10), 2067–2070.

Zhao, W., Chellappa, R. & Phillips, P. (1999), Subspace Linear Discriminant Analysis for Face Recognition, Technical Report CAR-TR-914, Centre for Automation Research, University of Maryland, College Park, Maryland, United States.

Zhao, W., Chellappa, R., Rosenfeld, A. & Phillips, P. (2000), Face recognition: A literature survey, Technical Report CAR-TR-948, Centre for Automation Research, University of Maryland, College Park, Maryland, United States.

Zobel, J. (1997), Writing for Computer Science: the art of effective communication, Springer-Verlag, Singapore.


Appendix A

Covariance Matrix

The following defines the covariance matrix for a set of face images. We define that $A$ is the covariance matrix, $M$ is the size of the dataset, $\Phi_n$ is the $n$-th normalised face, $\Phi_n^{T}$ is the transpose of the normalised face, $\Gamma_n$ is the $n$-th face image, $\Psi$ is the average face for the dataset and $c_{ij}$ are the entries of the resultant covariance matrix.

$$
A = \frac{1}{M}\sum_{n=1}^{M}\Phi_n\Phi_n^{T},
\qquad
\Phi_n = \Gamma_n - \Psi,
\qquad
\Psi = \frac{1}{M}\sum_{n=1}^{M}\Gamma_n,
\qquad
A = \begin{pmatrix}
c_{11} & \cdots & c_{1N}\\
\vdots & \ddots & \vdots\\
c_{N1} & \cdots & c_{NN}
\end{pmatrix}
\tag{A.1}
$$

where $N$ is the number of pixels in a face image, so that each entry $c_{ij}$ is the covariance between pixels $i$ and $j$ averaged over the $M$ normalised faces.
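For illustration, eq. (A.1) can be computed directly in numpy as follows. This is a sketch with our own function name; in practice the pixel-by-pixel covariance matrix is very large, which is why the eigenface literature works with a smaller M by M matrix instead.

import numpy as np

def covariance_of_faces(faces):
    """faces: (M, n_pixels) array whose rows are the vectorised face images Gamma_n.
    Returns the average face Psi and the covariance matrix A of eq. (A.1)."""
    M = faces.shape[0]
    psi = faces.mean(axis=0)           # average face Psi
    phi = faces - psi                  # normalised faces Phi_n = Gamma_n - Psi
    A = (phi.T @ phi) / M              # (1/M) * sum_n Phi_n Phi_n^T
    return psi, A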


Appendix B

Generalised Eigenproblem

In this example, we demonstrate how the associated eigenvectors and eigenvalues can be solved for a small covariance matrix. Given the covariance matrix $A$ is the $2 \times 2$ square matrix

$$
A = \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix}
\tag{B.1}
$$

we firstly find the eigenvalues for the covariance matrix $A$, where the characteristic equation for finding the eigenvalues must satisfy the following

$$
(A - \lambda I)\,u = 0
\tag{B.2}
$$

where $\lambda$ is the eigenvalues, $I$ is the identity matrix and $u$ is the eigenvectors.

We begin by substituting the covariance matrix from eq. B.1 and the identity matrix into the characteristic equation eq. B.2

$$
\left( \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) u = 0
\tag{B.3}
$$

To find the eigenvalues we solve the following

$$
\begin{pmatrix} 4-\lambda & 1 \\ 1 & 4-\lambda \end{pmatrix} u = 0
\tag{B.4}
$$

which has a non-trivial solution for $u$ only when the matrix is singular, that is when

$$
\det\begin{pmatrix} 4-\lambda & 1 \\ 1 & 4-\lambda \end{pmatrix} = 0
\tag{B.5}
$$

We must now find the determinant of eq. B.5, which is

$$
(4-\lambda)(4-\lambda) - (1)(1) = \lambda^{2} - 8\lambda + 15 = 0
\tag{B.6}
$$

where the determinant of eq. B.6 produces an algebraic equation, which we can solve by factorisation of the following

$$
(\lambda - 5)(\lambda - 3) = 0
\tag{B.7}
$$

Therefore from eq. B.7 we find two eigenvalues, $\lambda = 5$ and $\lambda = 3$.

As we have found the eigenvalues for the covariance matrix $A$, the next step is to find the eigenvectors. We achieve this by selecting an eigenvalue and substituting that value into eq. B.8. In this instance, we shall use $\lambda = 5$ which produces

$$
(A - 5I)\,u = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} u = 0
\tag{B.8}
$$

the following is the eigenvector for eq. B.8

$$
u = \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\tag{B.9}
$$

thus we can prove that we have satisfied the characteristic equation, by finding the product of eq. B.8

$$
\begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\tag{B.10}
$$

having satisfied the characteristic equation, to verify that the eigenvalue $\lambda = 5$ is derived from the covariance matrix $A$, we calculate the product of the eigenvector and the covariance matrix $A$

$$
A\,u = \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ 5 \end{pmatrix}
\tag{B.11}
$$

and taking out the common factor the resultant is

$$
5 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\tag{B.12}
$$

which verifies that the integer multiple of 5 confirms the eigenvalue $\lambda$ we had previously found.
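The worked example above can be checked numerically in a few lines of numpy; the matrix entries are those of eq. (B.1) as given here.

import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 4.0]])             # the covariance matrix of eq. (B.1)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.sort(eigenvalues))            # [3. 5.]  -- the two eigenvalues found above

u = np.array([1.0, 1.0])               # the eigenvector of eq. (B.9) for lambda = 5
print(A @ u)                           # [5. 5.] = 5 * u, confirming lambda = 5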
