
Automatic Facial Occlusion Detection and Removal

Naeem Ashfaq Chaudhry

November 11, 2012
Master's Thesis in Computing Science, 30 credits

Supervisor at CS-UmU: Niclas Börlin
Examiner: Frank Drewes

Umeå University
Department of Computing Science

SE-901 87 UMEÅ
SWEDEN


Abstract

In daily life, we encounter many occluded faces. The occlusion may come from different objects such as sunglasses, mufflers, masks and scarves. Sometimes such occlusion is used by criminals to hide their identity from their surroundings. In this thesis, a technique is presented to detect facial occlusion automatically. After the occluded areas are detected, an image reconstruction method called aPCA (asymmetrical Principal Component Analysis) is used to reconstruct the faces: the entire face is reconstructed using the non-occluded area of the face. A database of images of different persons is organized and used in the reconstruction of the occluded images. Experiments were performed to examine the effect of the granularity of the occlusion on the aPCA reconstruction process: the input mask image is divided into different parts, the occlusion of each part is marked, and aPCA is applied to reconstruct the faces. Since this reconstruction process takes considerable processing time, pre-defined eigenspaces are introduced that require much less processing time with very little quality loss in the reconstructed faces.


Contents

1 Introduction
   1.1 Background
   1.2 Goals of the thesis
   1.3 Related work
       1.3.1 Occluded face reconstruction
       1.3.2 Facial occlusion detection

2 Theory
   2.1 Principal Component Analysis (PCA)
       2.1.1 PCA method/model
       2.1.2 PCA for images
       2.1.3 Eigenfaces
   2.2 Asymmetrical PCA (aPCA)
       2.2.1 Description of aPCA
       2.2.2 aPCA calculation
       2.2.3 aPCA for reconstruction of occluded facial region
   2.3 Skin color detection
   2.4 Image registration
       2.4.1 Translation
       2.4.2 Rotation
       2.4.3 Scaling
       2.4.4 Affine transformation
   2.5 Peak signal-to-noise ratio (PSNR)

3 Method
   3.1 The AR face database
   3.2 Automatic occlusion detection
       3.2.1 Replace white color with black color
       3.2.2 Image cropping
       3.2.3 Image division
       3.2.4 Occlusion detection for each block
   3.3 Occluded face reconstruction
       3.3.1 PSNR calculation

4 Experiment
   4.1 Granularity effect
       4.1.1 Metric
       4.1.2 Sunglasses scenario
       4.1.3 Scarf scenario
       4.1.4 Cap and sunglasses occlusion
   4.2 Pre-defined eigenspaces
       4.2.1 Metric
       4.2.2 Experiment description

5 Results
   5.1 Occlusion detection results
   5.2 Reconstruction quality results
   5.3 Reconstruction results using pre-defined eigenspaces

6 Conclusions
   6.1 Discussion about granularity effect and reconstruction quality
   6.2 Discussion about pre-defined eigenspaces
   6.3 Limitations
   6.4 Future work

7 Acknowledgements

References


List of Figures

1.1 Different types of occlusion. (a) Sunglasses occlusion. (b) Mask occlusion.
2.1 The first vector Z1 is in the direction of maximum variance and the second vector Z2 is in the direction of residual maximum variance.
2.2 Eigenfaces. (a) First eigenface. (b) Second eigenface. (c) Third eigenface.
2.3 The blue part represents the eigenspace of non-occluded regions whereas the green part represents the pseudo eigenspace of the complete image.
2.4 (a) and (b) represent the original images while (c) and (d) represent the registered images.
3.1 (a) An occluded facial image. (b) Image division into 6 parts. (c) Image division into 54 smaller parts. (d) Image division into 486 parts.
3.2 (a) An occluded facial image. (b) Image division into blocks. (c) Each black block represents an occluded block.
4.1 (a) Non-occluded facial image. (b) An occluded image. (c) Eigenspaces.
4.2 (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.
4.3 An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.2 (c). (c) Reconstructed image. (d) Non-occluded image.
4.4 (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.
4.5 An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.4 (c). (c) Reconstructed image. (d) Non-occluded image.
4.6 (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.
4.7 An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.6 (c). (c) Reconstructed image. (d) Non-occluded image.
4.8 (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.
4.9 An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.8 (d). (c) Reconstructed image. (d) Non-occluded image.
4.10 (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.
4.11 An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.10 (c). (c) Reconstructed image. (d) Non-occluded image.
4.12 (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.
4.13 An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.12 (c). (c) Reconstructed image. (d) Non-occluded image.
4.14 (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.
4.15 An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.14 (c). (c) Reconstructed image. (d) Non-occluded image.
4.16 (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.
4.17 An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.16 (d). (c) Reconstructed image. (d) Non-occluded image.
4.18 (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.
4.19 An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.18 (c). (c) Reconstructed image. (d) Non-occluded image.
4.20 (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.
4.21 An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.20 (c). (c) Reconstructed image. (d) Non-occluded image.
4.22 (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.
4.23 An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.22 (c). (c) Reconstructed image. (d) Non-occluded image.
4.24 (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.
4.25 An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.24 (d). (c) Reconstructed image. (d) Non-occluded image.
4.26 Occluded facial images used for construction of 6 eigenspaces.
4.27 (a) An occluded image. (b) Detected occlusion by level 3b image division. (c) Pre-defined eigenspace most similar to the detected occlusion in (b). (d) Reconstructed image using the eigenspace in (c).
5.1 Occlusion detection by different image division methods. (a) Occluded image. (b) Occlusion detection by level 1 image division. (c) Occlusion detection by level 2 image division. (d) Occlusion detection by level 3a image division. (e) Occlusion detection by level 3b image division.
5.2 Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.
5.3 Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.

List of Tables

5.1 Reconstruction quality of the complete image (PSNR) [dB] for the granularity effect
5.2 Reconstruction quality of the occluded reconstructed parts (PSNR) [dB] for the granularity effect
5.3 Number of pixels used in reconstruction
5.4 Processing time (sec) for the granularity effect

Chapter 1

Introduction

1.1 Background

Face recognition has been one of the most challenging and active research topics in computer vision for the last several years (Zhao et al., 2003). The goal of face recognition is to recognize a person even if the face is occluded by some object. A face recognition system should recognize a face as independently and robustly as possible with respect to image variations such as illumination, pose, occlusion and expression (Kim et al., 2007). A face is occluded if some area of it is hidden behind an object such as sunglasses, a hand or a mask, as seen in Figure 1.1. Facial occlusions can degrade the performance of face recognition systems, including humans.

Recent research projects, e.g. Al-Naser and Söderström (2011), have used pre-determined occluded areas in standardized positions. After occlusion detection, aPCA (asymmetrical Principal Component Analysis) (Söderström and Li, 2011) was used to reconstruct the entire face. aPCA estimates an entire image based on a subset of the image, e.g. it reconstructs a partially occluded facial image using the non-occluded parts of the image. The experiments used a small database (n = 116) of facial images with no classification (Martinez and Benavente, 1998). A property of the reconstructed images in Al-Naser and Söderström (2011) is that they have sharp edges between the original and reconstructed regions.

This application can be used by law enforcement agencies, in access control systems, and for surveillance at public places such as ATMs and airports.

1.2 Goals of the thesis

The overall goal of this thesis is to improve the performance of aPCA for reconstruction of occluded regions of facial images.

The primary goal is to develop an algorithm for automatic detection and reconstruction of facial occlusions. The algorithm should be fully automatic and detect smaller occlusions than previous work. Furthermore, it should handle arbitrary occlusions, i.e. occlusions of any part of the face.

A secondary goal is to develop an algorithm for smoothing the reconstructed images to reduce the edges between the original and reconstructed regions.


Figure 1.1: Different types of occlusion. (a) Sunglasses occlusion. (b) Mask occlusion.

A tertiary goal is to extend the AR database with more images and to classify the images individually according to gender, ethnicity, etc.

1.3 Related work

1.3.1 Occluded face reconstruction

Al-Naser and Söderström (2011) reconstructed occluded regions using asymmetrical principal component analysis (aPCA). The occluded facial regions were estimated based on non-occluded facial regions. They did not detect the occlusion automatically; instead, the occlusion was marked manually on the facial images. Jabbar and Hadi (2010) detected the face area using a combination of skin color segmentation and eye template matching. They used a fuzzy c-means clustering algorithm for detection of occluded facial regions. When the occluded region was one of the symmetric facial features, such as an eye, this feature was used to recover the occluded area. When it was not, they used the most similar mean face from the database.

1.3.2 Facial occlusion detection

Min et al. (2011) detected facial occlusions caused by sunglasses and scarves using the Gabor wavelet. The face image was divided into an upper and a lower half; the upper half was used to detect sunglasses occlusions, while the lower half was used for scarf occlusion detection. Kim et al. (2010) proposed a method to determine whether a face is occluded by measuring the skin color area ratio (SCAR). Oh et al. (2006) found the occlusion by first dividing the facial image into a finite number of local disjoint patches and then examining each patch separately.


Chapter 2

Theory

2.1 Principal Component Analysis (PCA)

PCA (Jolliffe, 2002) is a mathematical procedure that transforms potentially correlated variables into uncorrelated variables. Given a data matrix of observations of N correlated variables X1, X2, ..., XN, PCA transforms the Xi into N new, uncorrelated variables Yi, called principal components. The first principal component is in the direction of the largest variance of the data. The other principal components are orthogonal to each other and each represents the largest residual variance, see Figure 2.1.

PCA can be used as a dimension reduction method to represent multidimensional, highly correlated data with fewer variables. PCA is used for, e.g., information extraction, image compression, image reconstruction and image recognition.

2.1.1 PCA method/model

Image-to-vector conversion

A 2-dimensional image is transformed into a 1-dimensional vector by placing the rows side by side, i.e.

$$x = [p_1, p_2, \ldots, p_r]^T, \qquad (2.1)$$

where $p_i$ is the $i$th row of the image and $r$ is the total number of rows. Each image is stored in a vector and each vector is stored column-wise in a matrix.
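As an illustration, the conversion can be written in a few lines of NumPy. This is a minimal sketch with an illustrative function name, not code from the thesis:

```python
import numpy as np

def images_to_matrix(images):
    """Stack equally sized images as column vectors of a data matrix.

    Flattens each image row by row (Eq. 2.1) and returns a
    (pixels x N) matrix with one image per column.
    """
    return np.column_stack([img.reshape(-1) for img in images])
```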

Subtract the Mean

The mean image is calculated and subtracted from each image vector to produce vectors with zero mean. Let $I_0$ represent the mean; it is calculated as

$$I_0 = \frac{1}{N}\sum_{j=1}^{N} I_j, \qquad (2.2)$$

where $N$ is the number of images $I_j$.


Figure 2.1: The first vector Z1 is in the direction of maximum variance and the second vector Z2 is in the direction of residual maximum variance.

Calculate the covariance matrix

The covariance of the mean-centred matrix is calculated as

$$\mathrm{Cov} = W^T W, \qquad (2.3)$$

where $W$ is an $r \times c$ matrix composed of the column vectors $(I_i - I_0)$. $\mathrm{Cov}$ is a square matrix of size $c \times c$.

Calculate the eigenvectors and eigenvalues of covariance matrix

The Singular Value Decomposition (SVD) (Strang, 2003) of an $r \times c$ matrix $A$ decomposes it as

$$A_{r\times c} = U_{r\times r}\,\Sigma_{r\times c}\,V^T_{c\times c} = [u_1, u_2, \ldots, u_r] \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_c \\ 0 & \cdots & 0 \end{bmatrix} [v_1, v_2, \ldots, v_c]^T, \qquad (2.4)$$

where $U$ is an $r \times r$ unitary matrix, $\Sigma$ is an $r \times c$ rectangular diagonal matrix and $V$ is a $c \times c$ unitary matrix. The columns of $U$ and $V$ are the left and right singular vectors, respectively, and the singular values $\sigma_i \geq 0$ are sorted in descending order. If $A$ is symmetric positive definite, then $U = V$, the columns contain the eigenvectors, and the $\sigma_i$ are the eigenvalues.
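In practice, the eigenvectors and eigenvalues can be obtained by applying an SVD routine to the small covariance matrix. The sketch below assumes the data-matrix layout from the previous snippet; the function name is illustrative:

```python
import numpy as np

def pca_eigenvectors(X):
    """PCA basis of a (pixels x N) data matrix X via SVD.

    Returns the mean image, the eigenvectors of Cov = W^T W
    (as columns, sorted by decreasing eigenvalue) and the eigenvalues.
    """
    mean = X.mean(axis=1, keepdims=True)
    W = X - mean                    # mean-centred data
    cov = W.T @ W                   # small N x N covariance (Eq. 2.3)
    U, s, _ = np.linalg.svd(cov)    # cov is symmetric PSD, so U = V
    return mean, U, s               # s holds the eigenvalues, descending
```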

Choosing components and forming a feature vector

The eigenvector associated with the highest eigenvalue represents the greatest variance in the data, whereas the eigenvector associated with the lowest eigenvalue represents the least variance. The eigenvalues decrease in an exponential pattern (Kim, 1996); it is estimated that 90% of the total variance is contained in the first 5% to 10% of the dimensions. The eigenvectors associated with low eigenvalues are therefore less significant and can be ignored. A feature vector $b$ is constructed by selecting the $M$ eigenvectors associated with the highest eigenvalues, out of $N$ eigenvectors in total, i.e.

$$b = (e_1, e_2, \ldots, e_M). \qquad (2.5)$$

Deriving the new dataset

The transpose of the feature vector $b$ is multiplied with $W$ to get the final dataset $\Phi$:

$$\Phi = b^T W. \qquad (2.6)$$

2.1.2 PCA for images

The PCA is computed as the SVD of the covariance matrix $\mathrm{Cov}$ of the facial images. An eigenspace $\phi$ is created using the equation

$$\phi_j = \sum_i b_{ij}(I_i - I_0), \qquad (2.7)$$

where $b_{ij}$ is an eigenvector value of the covariance matrix $\{(I_i - I_0)^T (I_j - I_0)\}$. Eqs. 2.6 and 2.7 are the same.

The projection coefficients $\{\alpha_j\} = \{\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_N\}$ for each facial image are calculated as

$$\alpha_j = \phi_j (I - I_0)^T. \qquad (2.8)$$

Each facial image is represented as the sum of the mean image and the weighted principal components. The representation becomes error-free if all $N$ principal components are used:

$$I = I_0 + \sum_{j=1}^{N} \alpha_j \phi_j. \qquad (2.9)$$

The final facial image is constructed by

$$I = I_0 + \sum_{j=1}^{M} \alpha_j \phi_j, \qquad (2.10)$$

where $M$ is the number of selected principal components used for reconstruction of the facial image. An image with negligible quality loss can be represented by a few principal components, because the first 5-10% of the eigenvectors can represent more than 90% of the variance in the data (Kim, 1996).

PCA achieves compression since fewer dimensions ($M$) than the original ($N$) are used to represent the images. A PCA model thus allows images to be represented with only a few coefficient values $\alpha_j$, and this is how PCA is used for image representation.
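A hedged sketch of how Eqs. 2.7-2.10 fit together, reusing pca_eigenvectors from above; the normalisation step is an implementation detail assumed here so that projection and reconstruction are inverses:

```python
import numpy as np

def build_eigenspace(W, B, M):
    """Eigenspace phi of Eq. 2.7, truncated to the M leading components.

    W: mean-centred (pixels x N) data; B: covariance eigenvectors.
    """
    phi = W @ B[:, :M]                          # pixels x M eigenfaces
    return phi / np.linalg.norm(phi, axis=0)    # orthonormal columns

def project(phi, image, mean):
    """Projection coefficients alpha of Eq. 2.8."""
    return phi.T @ (image - mean.ravel())

def reconstruct(phi, alpha, mean):
    """Approximate the image from M coefficients (Eq. 2.10)."""
    return mean.ravel() + phi @ alpha
```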

2.1.3 Eigenfaces

The eigenvectors, or principal components, of the distribution of faces are called eigenfaces; they look like ghostly faces. The first 3 eigenfaces obtained from the AR database described in Section 3.1 can be seen in Figure 2.2. Each individual face can be represented by a linear combination of eigenfaces. Each face is approximated using the best eigenfaces, those that capture the most variance within the set of face images. The best M eigenfaces span an M-dimensional subspace, the "face space", of all possible images (Turk and Pentland, 1991).


Figure 2.2: Eigenfaces. (a) First eigenface. (b) Second eigenface. (c) Third eigenface.

Figure 2.3: The blue part represents the eigenspace of non-occluded regions whereas the green part represents the pseudo eigenspace of the complete image.

2.2 Asymmetrical PCA (aPCA)

aPCA is a method for estimating an entire space based on a subspace of this space. The method finds the correspondence between pixels in non-occluded regions and pixels behind occluded regions.

2.2.1 Description of aPCA

aPCA is an extension of PCA (Principal Component Analysis). With aPCA, entire faces are reconstructed by estimating the occluded regions based on the non-occluded regions of the images: the intensity (appearance) of non-occluded pixels is used to estimate the intensity of occluded pixels. In aPCA, two eigenspaces are constructed: one from the non-occluded areas of the occluded images, in which the eigenvectors are orthogonal to each other, and a pseudo eigenspace of the complete image, built from the eigenvectors of the non-occluded image regions. In the pseudo eigenspace, the eigenvectors are not orthogonal, as seen in Figure 2.3.

2.2.2 aPCA calculation

In aPCA, a pseudo eigenspace is created. It models the correspondence between the pixels in the images, but only the non-occluded parts are orthogonal. Let $I^{no}$ represent the non-occluded parts of the image $I$. $I^{no}$ is modelled in an eigenspace $\Phi^{no} = \{\phi^{no}_1, \phi^{no}_2, \phi^{no}_3, \ldots, \phi^{no}_N\}$ using the formula

$$\phi^{no}_j = \sum_i b^{no}_{ij}(I^{no}_i - I^{no}_0), \qquad (2.11)$$

Page 19: Automatic Facial Occlusion Detection and Removal · vision for the last several years (Zhao et al., 2003). The goal of face recognition is to recognize a person even if the face is

2.3. Skin color detection 7

where $b^{no}_{ij}$ are eigenvector values of the covariance matrix $\{(I^{no}_i - I^{no}_0)^T (I^{no}_j - I^{no}_0)\}$ and $I^{no}_0$ is the mean of the non-occluded regions,

$$I^{no}_0 = \frac{1}{N}\sum_{j=1}^{N} I^{no}_j. \qquad (2.12)$$

Eigenvectors of the non-occluded parts are used to make them orthogonal, while the occluded parts are modelled according to their correspondence with the non-occluded parts. The pseudo eigenspace $\Phi^{p}$ is calculated as

$$\Phi^{p}_j = \sum_i b^{no}_{ij}(I_i - I_0), \qquad (2.13)$$

where $I_i$ is the original image and $I_0$ is the mean of the original images.

Projection is used to extract the coefficients $\{\alpha^{no}_j\}$ from the eigenspace $\Phi^{no}$:

$$\alpha^{no}_j = \Phi^{no}_j (I^{no} - I^{no}_0)^T. \qquad (2.14)$$

The complete facial image $I$ is reconstructed as

$$I = I_0 + \sum_{j=1}^{M} \alpha^{no}_j \Phi^{p}_j, \qquad (2.15)$$

where $M$ is the selected number of pseudo components used for the reconstruction. Using the projection coefficients calculated above, a complete image can be reconstructed from only the non-occluded parts of the image.
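A compact sketch of Eqs. 2.11-2.15, under the assumption that the visible-pixel rows of the training faces have already been extracted; variable names are illustrative. The visible-space norms are reused for the pseudo eigenvectors so that the correspondence between the two spaces is preserved:

```python
import numpy as np

def apca_reconstruct(X_full, X_vis, x_vis, M=50):
    """Estimate a complete face from its visible pixels with aPCA.

    X_full: (pixels x N) matrix of complete training faces.
    X_vis:  (p_vis x N) matrix of the same faces restricted to the
            non-occluded pixel positions.
    x_vis:  visible pixels of the occluded query face, length p_vis.
    """
    mean_full = X_full.mean(axis=1, keepdims=True)
    mean_vis = X_vis.mean(axis=1, keepdims=True)       # Eq. 2.12
    W_vis, W_full = X_vis - mean_vis, X_full - mean_full

    # Eigenvectors b of the covariance of the non-occluded parts.
    U, _, _ = np.linalg.svd(W_vis.T @ W_vis)
    B = U[:, :M]

    phi_vis = W_vis @ B          # eigenspace of visible parts (Eq. 2.11)
    phi_pse = W_full @ B         # pseudo eigenspace           (Eq. 2.13)
    norms = np.linalg.norm(phi_vis, axis=0)
    phi_vis, phi_pse = phi_vis / norms, phi_pse / norms

    alpha = phi_vis.T @ (x_vis - mean_vis.ravel())     # Eq. 2.14
    return mean_full.ravel() + phi_pse @ alpha         # Eq. 2.15
```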

2.2.3 aPCA for reconstruction of occluded facial region

With the eigenspace modelling the non-occluded facial regions and the pseudo eigenspace modelling the entire face, aPCA can estimate what a face looks like behind the occlusions. When the spaces are created, the entire face needs to be visible so that the correspondence between the spaces can be modelled with aPCA.

The eigenspace is created according to Eq. 2.11 and a pseudo eigenspace is constructed according to Eq. 2.13. The correspondence between the facial regions is captured in these two spaces. The non-occluded regions can then be used to extract projection coefficients $\alpha$ (Eq. 2.14), meaning that only non-occluded pixels affect the representation. When the pseudo eigenspace is used with these coefficients to recreate an image of the entire face (Eq. 2.15), the content of the previously occluded pixels is calculated based on their relationship with the non-occluded pixels.

2.3 Skin color detection

This section follows (Cheddad et al., 2009).

Cheddad et al. (2009) use two approximations, $l$ and $\hat{l}$, for skin color detection. $l$ is calculated as

$$l(x) = (r(x), g(x), b(x)) \cdot \alpha, \qquad (2.16)$$

where $\cdot$ represents the matrix (dot) product and the transformation vector is

$$\alpha = [0.298, 0.587, 0.140]^T. \qquad (2.17)$$


Figure 2.4: (a) and (b) represent the original images while (c) and (d) represent the registered images.

$\hat{l}$ is calculated as

$$\hat{l}(x) = \max(G(x), R(x)). \qquad (2.18)$$

An error signal for each pixel is calculated as

$$e(x) = \hat{l}(x) - l(x), \qquad (2.19)$$

and each pixel is classified as skin or not skin by

$$f_{skin}(x) = \begin{cases} 1, & \text{if } 0.02511 \leq e(x) \leq 0.1177 \\ 0, & \text{otherwise.} \end{cases} \qquad (2.20)$$
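Eqs. 2.16-2.20 translate directly into a vectorised per-pixel classifier. A minimal sketch, assuming RGB values scaled to [0, 1]:

```python
import numpy as np

def skin_mask(rgb):
    """Boolean skin mask following Cheddad et al. (2009).

    rgb: float array of shape (h, w, 3) with values in [0, 1].
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    l = 0.298 * r + 0.587 * g + 0.140 * b    # Eqs. 2.16-2.17
    l_hat = np.maximum(g, r)                 # Eq. 2.18
    e = l_hat - l                            # error signal, Eq. 2.19
    return (e >= 0.02511) & (e <= 0.1177)    # Eq. 2.20
```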

2.4 Image registration

Image registration is the process of transforming a set of images into one coordinate system without changing the shape of the images. One image is selected as the base image and spatial transformations are applied to the other images so that they align with the base image. Image registration is performed as a preliminary step so that subsequent image processing operations can be applied to a dataset with a common coordinate system. If facial images are aligned, then after alignment all the images have their facial features, such as mouth, eyes and nose, in the same positions.

2.4.1 Translation

Translation is a geometric transformation in which an image element located at position $(x_1, y_1)$ is shifted to a new position $(x_2, y_2)$ in the transformed image. The translation operation is defined as

$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}, \qquad (2.21)$$

where $t_x$ and $t_y$ are the horizontal and vertical pixel displacements, respectively.

Page 21: Automatic Facial Occlusion Detection and Removal · vision for the last several years (Zhao et al., 2003). The goal of face recognition is to recognize a person even if the face is

2.5. Peak signal-to-noise ratio (PSNR) 9

2.4.2 Rotation

Rotation is a geometric transformation in which the image elements are rotated by a specified rotation angle $\theta$. The rotation operation is defined as

$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}. \qquad (2.22)$$

2.4.3 Scaling

Scaling is a geometric transformation that can be used to reduce or increase the size of the image coordinates. The scaling operation is defined as

$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} c_x & 0 \\ 0 & c_y \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}. \qquad (2.23)$$

2.4.4 Affine transformation

An affine transformation is a linear 2-D geometric transformation composed of rotation, scaling and translation operations. It maps a point located at position $(x_1, y_1)$ in an input image to a point located at $(x_2, y_2)$ in an output image by applying a linear combination of translation, rotation, scaling and/or shearing (non-uniform scaling in some direction) operations. The affine transformation takes the form

$$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}. \qquad (2.24)$$

Facial images used in this thesis are aligned using Affine Transformations.
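For illustration, Eq. 2.24 applied to a set of 2-D points, with a rotation-plus-scaling matrix as an example; this is a sketch, not the alignment code used in the thesis:

```python
import numpy as np

def affine_transform(points, A, t):
    """Apply the affine map of Eq. 2.24 to an (n x 2) array of points.

    A: 2x2 matrix combining rotation, scaling and/or shearing.
    t: translation vector (tx, ty).
    """
    return points @ np.asarray(A).T + np.asarray(t)

# Example: rotate by 30 degrees, scale by 1.2 and translate by (5, -3).
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
warped = affine_transform(np.array([[10.0, 20.0]]), 1.2 * R, (5, -3))
```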

2.5 Peak signal-to-noise ratio (PSNR)

PSNR is the ratio between the maximum possible value of a signal and the power of the distorting noise that affects the quality of its representation. It is often used as a benchmark for the similarity between a reconstructed image and the original image (Santoso et al., 2011). PSNR compares the original image with the coded/decoded image to quantify the quality of the decompressed output. A higher PSNR value means that the reconstructed data is of better quality. The mathematical representation of PSNR is

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathit{max}^2}{\mathrm{MSE}}\right), \qquad (2.25)$$

where $\mathit{max}$ is the maximum possible value of the image pixels and MSE is the mean squared difference between the reconstructed and the original data:

$$\mathrm{MSE} = \frac{1}{XY}\sum_{m=1}^{X}\sum_{n=1}^{Y}\left[I_1(m,n) - I_2(m,n)\right]^2, \qquad (2.26)$$

where $I_1$ is the original image, $I_2$ is the reconstructed image, and $X$ and $Y$ are the number of rows and columns, respectively.
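Eqs. 2.25-2.26 in code, as a small self-contained function:

```python
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB (Eqs. 2.25-2.26)."""
    diff = np.asarray(original, float) - np.asarray(reconstructed, float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return np.inf          # identical images
    return 10 * np.log10(max_value ** 2 / mse)
```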


Chapter 3

Method

3.1 The AR face database

The experiments were performed on the AR face database (Martinez and Benavente, 1998). The database contains more than 4000 facial images of 126 persons (70 men and 56 women). It contains images with scarf and sunglasses occlusions as well as non-occluded images with different facial expressions. The original size of the images is 768×576 pixels. The images were taken under controlled conditions, with no restrictions on wear or style.

3.2 Automatic occlusion detection

3.2.1 Replace white color with black color

The skin color detection method of Section 2.3 classifies white pixels as skin pixels. However, white is not a skin color but rather indicates occlusion. Therefore, white pixels are always replaced by black pixels before skin color detection. A pixel is classified as white if its R, G and B values are all greater than 190, where 255 is the maximum value.
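A minimal sketch of this pre-processing step, assuming 8-bit RGB images:

```python
import numpy as np

def blacken_white_pixels(rgb, threshold=190):
    """Replace near-white pixels (treated as occlusion) with black.

    rgb: uint8 array of shape (h, w, 3) with values in [0, 255].
    """
    out = rgb.copy()
    white = (rgb > threshold).all(axis=-1)   # R, G and B all above 190
    out[white] = 0
    return out
```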

3.2.2 Image cropping

The original size of the images is 768×576 pixels. These images contain a lot of background area that affects the quality of the reconstructed images. Therefore, the images are cropped to a size of 171×144 pixels.

3.2.3 Image division

The 171×144 image is divided into 6 parts: 2 head parts, 2 eyes parts and 2 mouth parts, see Figure 3.1 (b). The size of each head part is 45×72 pixels, the size of each eyes part is 54×72 pixels and the size of each mouth part is 72×72 pixels.

In the second step, each part is further divided into 9 sub-parts, see Figure 3.1 (c). In this way, smaller facial occlusions can also be detected. In the third step, each part of the second step is further divided into 9 sub-parts, see Figure 3.1 (d).
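The three-level division can be expressed with two small helpers; the block layout below follows the part sizes given above, and the names are illustrative:

```python
def level1_blocks(height=171, width=144):
    """The 6 level-1 blocks as (top, left, height, width) tuples:
    2 head parts (45x72), 2 eyes parts (54x72), 2 mouth parts (72x72)."""
    bands = [(0, 45), (45, 54), (99, 72)]    # head, eyes, mouth rows
    return [(top, left, h, width // 2)
            for top, h in bands for left in (0, width // 2)]

def subdivide(block):
    """Split a block into a 3x3 grid of 9 equal sub-blocks
    (used for the second and third division steps)."""
    top, left, h, w = block
    return [(top + i * (h // 3), left + j * (w // 3), h // 3, w // 3)
            for i in range(3) for j in range(3)]
```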


Figure 3.1: (a) An occluded facial image. (b) Image division into 6 parts. (c) Image division into 54 smaller parts. (d) Image division into 486 parts.

Figure 3.2: (a) An occluded facial image. (b) Image division into blocks. (c) Each black block represents an occluded block.

3.2.4 Occlusion detection for each block

To detect the occlusion for each block, the skin color information is used. If a pixel is not a skin pixel, it is marked as an occluded pixel. If at least 25% of the pixels in a block are non-skin pixels, the block is marked as an occluded block.
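Combined with the skin mask from Section 2.3, the rule reads as follows; skin_mask and the block helpers refer to the earlier sketches:

```python
def occluded_blocks(skin, blocks, ratio=0.25):
    """Blocks whose fraction of non-skin pixels is at least `ratio`.

    skin:   boolean (h, w) mask, True at skin pixels.
    blocks: iterable of (top, left, height, width) tuples.
    """
    occluded = []
    for top, left, h, w in blocks:
        patch = skin[top:top + h, left:left + w]
        if (~patch).mean() >= ratio:         # fraction of non-skin pixels
            occluded.append((top, left, h, w))
    return occluded
```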

3.3 Occluded face reconstruction

After facial occlusion detection, a column vector is created for each image that contains only its non-occluded parts. These column vectors are stored in a matrix holding the corresponding non-occluded parts of all facial images in the database. Each complete image of the database is also converted into a vector and stored in a second matrix; if there are 100 images in the database, this matrix contains 100 vectors. The mean vector of the non-occluded matrix is calculated and subtracted from each of its columns, and likewise for the matrix of complete images, producing datasets with zero mean. The covariance of the non-occluded facial matrix is calculated as described in Section 2.1.1, and the eigenvectors and eigenvalues of the covariance matrix are calculated using the SVD. An eigenspace is constructed from the non-occluded parts of the images, and a pseudo eigenspace is constructed from all parts of the images in the database. Projection is used to extract the coefficients from the eigenspace; these coefficients are then used for reconstruction of the facial images. A fixed number M = 50 of eigenvectors is used for the reconstruction; the choice M = 50 was found by initial experiments. The final facial image data is constructed using Eq. 2.15. In the last step, each vector of the matrix is reshaped to recover the R, G and B values of each image and thereby reconstruct the facial images.
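The whole procedure condenses to a few lines when built on the apca_reconstruct sketch from Section 2.2.2; this is an assumed arrangement of the steps described above, not the thesis code:

```python
import numpy as np

def reconstruct_face(images, occluded_query, vis_mask, M=50):
    """End-to-end sketch: build the matrices and run aPCA.

    images:         list of complete training faces (e.g. 171x144x3).
    occluded_query: the occluded face to reconstruct, same shape.
    vis_mask:       boolean array of the same shape, True at
                    non-occluded positions.
    """
    X_full = np.column_stack([im.reshape(-1) for im in images])
    keep = vis_mask.reshape(-1)
    X_vis = X_full[keep, :]                   # non-occluded rows only
    x_vis = occluded_query.reshape(-1)[keep]
    recon = apca_reconstruct(X_full, X_vis, x_vis, M=M)
    return recon.reshape(images[0].shape)     # back to image shape
```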

3.3.1 PSNR calculation

The PSNR between the input image and the reconstructed image is calculated to check the quality of the reconstructed image. If the PSNR value is above 30 dB, the reconstructed image is normally considered to be of good quality (Wikipedia, 2012).


Chapter 4

Experiment

4.1 Granularity effect

This experiment examines the effect of the granularity of the occlusion on the aPCA reconstruction process. In the first step, the image is divided into 6 parts and the occlusion of each facial part is determined. The non-occluded parts of the image are used to construct the eigenspace, whereas the entire image is used to construct the pseudo eigenspace.

In the second step, the image is first divided into 6 parts and the occlusion is determined for each block. If a part is occluded, it is further divided into 9 sub-parts and the occlusion detection is repeated.

In the third step, the image is first divided into 6 parts and then, based on occlusion detection, each part into 9 sub-parts. The occlusion of each of these sub-parts is determined. If a block is occluded, it is further divided into 9 sub-parts and the occlusion is determined for these parts. These small parts are used to construct the eigenspace, and the entire image is used to construct the pseudo eigenspace.

4.1.1 Metric

PSNR is used as the metric for the granularity experiments. PSNR is calculated for the entire image and for only the reconstructed part of the image. The number of non-occluded pixels used for encoding in each experiment is also recorded.

4.1.2 Sunglasses scenario

In this scenario, the mask input image is occluded by sunglasses. The image is divided into sub-parts, the occlusion is detected for each of these parts individually, and the full faces are reconstructed using the aPCA image reconstruction method. The average PSNR of all reconstructed faces is calculated to determine the quality of the reconstructed facial images, and the average PSNR of only the reconstructed occluded parts is calculated as well. Furthermore, the number of pixels used in the reconstruction process and the time taken by each division method are recorded.

In Figure 4.1, image (a) is the original image, (b) is the mask input image occluded with sunglasses and (c) represents the two eigenspaces. The occluded input mask image (b) is used in the three test cases below. The green ellipse represents the pseudo eigenspace, constructed from the non-occluded images such as image (a) and from the non-occluded parts of the occluded images. The blue ellipse represents the eigenspace constructed from the non-occluded parts of the occluded images.

Figure 4.1: (a) Non-occluded facial image. (b) An occluded image. (c) Eigenspaces.

Level 1 image division

In the level 1 image division method, the mask input image is divided into 2 head parts, 2 eyes parts and 2 mouth parts, see Figure 4.2 (b). The occlusion of each part is detected separately, and the full faces are reconstructed as described in Section 3.3. In Figure 4.2, image (a) is the mask input image occluded with sunglasses, image (b) shows the division into 6 parts, and in image (c) the areas marked in black represent the detected occlusion in the eye parts. Note that dividing the image into only 6 parts does not detect all of the occlusion, and some non-occluded regions are considered occluded. The background regions in the 2 mouth parts are not detected by the level 1 image division method.

The reconstruction results of level 1 image division can be seen in Figure 4.3. The reconstructed image has some circles around the eyes. This is because the database contains some images with eyeglasses, so the corresponding eigenvectors leave imprints on the reconstructed images.

After reconstruction, the average PSNR is calculated both for the complete reconstructed faces and for the reconstructed occluded regions only. Furthermore, the number of pixels used in the reconstruction process is recorded. If more pixels are used in the reconstruction process, the reconstructed images should be better, with a higher average PSNR value.

Level 2 image division

In the level 2 image division method, the 6 parts of level 1 are each further divided into 9 sub-parts, see Figure 4.4 (b). Each of these parts undergoes the occlusion detection process, and aPCA is applied to reconstruct the facial images.

In Figure 4.4, image (a) is the mask input image occluded with sunglasses, image (b) shows the division into 54 sub-parts, and in image (c) the black blocks represent the detected occlusions. The white background area that is not part of the mouth is considered an occlusion; this background occlusion is detected by dividing the image into smaller parts. The level 2 image division method also marks some occluded area as non-occluded, see Figure 4.4 (c), where some parts of the sunglasses are marked as non-occluded. Figure 4.5 shows an example of image reconstruction using level 2 image division. Note that there are prominent circles around the eyes, and the black background areas near the cheeks are not reconstructed well.

Level 3a image division

In the level 3a image division method, the 54 parts of level 2 are each further divided into 9 sub-parts, see Figure 4.6 (b). The complete image is thus divided into 486 very small parts, and occlusion is detected for each part separately. After occlusion detection, aPCA is applied to reconstruct the faces. Due to the very small size of each part, very small occlusions can also be detected.

In Figure 4.6, image (a) is the mask input image occluded with sunglasses, image (b) shows the division into 486 sub-parts, and in image (c) the black blocks represent the detected occlusions. Figure 4.6 (c) shows that almost all of the facial occlusion is detected, but some non-occluded areas, such as hair and eyebrows, are marked as occluded. Figure 4.7 shows the face reconstructed by level 3a image division. The quality of the reconstructed image is better than for levels 1 and 2, with fewer imprints of eyeglasses around the eyes.

Level 3b image division

In the level 3b image division method, the 6 parts of level 1 are each further divided into 9 sub-parts, and the occlusion is detected for each of these parts separately. If a part is occluded, it is further divided into 9 sub-parts, see Figure 4.8 (c). The occlusion is detected for these very small parts and aPCA is applied to reconstruct the faces.

In Figure 4.8, image (a) is the mask input image occluded with sunglasses; image (b) shows the occlusions detected by level 2 image division, marked in black; image (c) shows the occluded area detected at level 2 further divided into sub-parts, for which the occlusion is detected again; and image (d) shows the occlusion detection by the level 3b image division method.

Note that the background and sunglasses occlusions are detected, while only a small amount of non-occluded area is marked as occluded. From Figure 4.8 (d), we can see that the nose and cheek areas near the sunglasses, which were marked as occluded in Figure 4.8 (b), are now marked as non-occluded. Figure 4.9 shows an example of image reconstruction using this method.

4.1.3 Scarf scenario

In this scenario, the input image is occluded by a scarf so that the whole mouth area is occluded. The image is divided into sub-parts, the occlusion is detected for each of these parts individually, and the full faces are reconstructed using the aPCA method. The average PSNR of all reconstructed faces is calculated to determine the quality of the reconstructed facial images, and the average PSNR of only the reconstructed occluded parts is calculated as well. Furthermore, the number of pixels used in the reconstruction process and the time taken by each division method are recorded.

Figures 4.10 to 4.17 show the 4 image division methods applied to the mask input image occluded with a scarf, the occlusion detected by each of these methods, and the faces reconstructed with aPCA using the 4 image division methods.


Figure 4.2: (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.

Figure 4.3: An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.2 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.4: (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.

Figure 4.5: An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.4 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.6: (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.


Figure 4.7: An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.6 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.8: (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.

Figure 4.9: An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.8 (d). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.10: (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.

Figure 4.11: An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.10 (c). (c) Reconstructed image. (d) Non-occluded image.


Figure 4.12: (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.

Figure 4.13: An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.12 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.14: (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.

Figure 4.15: An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.14 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.16: (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.


Figure 4.17: An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.16 (d). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.18: (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.

4.1.4 Cap and sunglasses occlusion

In this scenario, the head is covered by a cap and the eyes are covered by sunglasses. The mouth parts contain some background occlusion, so parts of all 6 regions of the mask input image are occluded. The input image is divided into parts, occlusion is detected for each part, and aPCA is applied to reconstruct the faces. The average PSNR of the complete reconstructed images, and of only the occluded reconstructed parts, is calculated to determine the quality of the reconstructed images. The number of pixels used in the reconstruction process is recorded to determine the effect of the number of non-occluded pixels on the quality of the reconstructed faces. The processing time of the aPCA process is also recorded.

Figures 4.18 to 4.25 show the 4 image division methods applied to the mask input image occluded with cap and sunglasses, the occlusion detected by each of these methods, and the faces reconstructed with aPCA using these image division methods.

4.2 Pre-defined eigenspaces

In this experiment, 6 different pre-defined eigenspaces are created, and a pseudo eigenspace is constructed for each of them from all 116 images. The pre-defined eigenspaces cover different kinds of sunglasses occlusions.

Figure 4.19: An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.18 (c). (c) Reconstructed image. (d) Non-occluded image.


Figure 4.20: (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.

Figure 4.21: An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.20 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.22: (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.

Figure 4.23: An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.22 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.24: (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.


Figure 4.25: An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.24 (d). (c) Reconstructed image. (d) Non-occluded image.

When an occlusion has been detected, the pre-defined eigenspace with the smallest difference between the detected occlusion and the eigenspace's occlusion pattern is selected, and this eigenspace is used to reconstruct the image with aPCA. The closest eigenspace is selected based on the positions of the occlusion in the eigenspace and in the detected occlusion: if the occlusion status of a position is the same in both, the score is 0, and if they differ, the score is 1. The eigenspace with the lowest total score is selected.
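The selection amounts to a Hamming distance between occlusion vectors. A minimal sketch, with illustrative names:

```python
import numpy as np

def closest_eigenspace(detected, predefined):
    """Index of the pre-defined eigenspace whose 0/1 occlusion vector
    disagrees with the detected occlusion in the fewest positions."""
    scores = [np.count_nonzero(np.asarray(detected) != np.asarray(m))
              for m in predefined]
    return int(np.argmin(scores))
```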

4.2.1 Metric

PSNR is used as a metric in two ways: it is calculated for the entire image and for only the reconstructed parts. This only needs to be done for the 6 different pre-defined occlusions. The number of non-occluded pixels used for encoding in each experiment is also recorded.

4.2.2 Experiment description

The pre-defined eigenspaces are constructed once and saved to disk. They are created by dividing the occluded images of Figure 4.26 as described in Section 4.1.2. A pseudo eigenspace and an eigenspace for each of the 6 images in Figure 4.26 are constructed and saved. A vector containing the occlusion information for each part is also created and saved for later use: if a part is occluded, a 1 is stored in the corresponding vector element, otherwise a 0. The occlusion of the mask input image is detected following Section 4.1.2, and a vector containing its per-part occlusion information is created. This vector is compared to the vector of each pre-defined eigenspace to count the number of occluded parts that have the same position in both the input mask image and the image used to construct the pre-defined eigenspace. The eigenspace with the maximum number of matching occlusion positions is selected for the reconstruction of the facial images. The average PSNR of the complete reconstructed facial images, and of the occluded reconstructed areas, is calculated to determine the quality of the reconstructed facial images. The time taken to perform the aPCA operation is recorded to determine the efficiency of the pre-defined eigenspaces.

The 6 faces with sunglasses occlusion that are used in the construction of the 6 pre-defined eigenspaces can be seen in Figure 4.26. In Figure 4.27, image (a) is the mask input image, image (b) shows the occlusion detected by level 3b image division, image (c) shows the pre-defined eigenspace selected based on the detected occlusion in image (b), and image (d) shows the image reconstructed using this pre-defined eigenspace.


Figure 4.26: Occluded facial images used for construction of 6 eigenspaces.

Figure 4.27: (a) An occluded image. (b) Detected occlusion by level 3b image division. (c) Pre-defined eigenspace most similar to the detected occlusion in (b). (d) Reconstructed image using the eigenspace in (c).


Chapter 5

Results

In this chapter, the results of the experiments are described. The chapter is divided into three parts. The first part presents the results of the 4 image division methods for automatic occlusion detection, together with images showing the output of these methods. The second part discusses the reconstruction results based on the 4 occlusion detection methods, with tables of average PSNR values for the complete reconstructed faces and for only the reconstructed areas, as well as the processing time of each image division method. The third part discusses the pre-defined eigenspaces and their efficiency and reconstruction quality, with tables of the processing times for reconstruction with and without pre-defined eigenspaces and the average PSNR values of the reconstructed faces.

5.1 Occlusion detection results

Figure 5.1 shows the occlusion detection by the different image division methods: image (a) is the mask input image occluded with sunglasses, image (b) shows the occlusion detected by level 1 image division, (c) by level 2 image division, (d) by level 3a image division and (e) by the level 3b image division method. The grey blocks represent the marked occluded areas.

In the level 1 image division method, the complete image is divided into 6 large parts. The size of each part is large, and for a part to be marked as occluded, 25% of its area must be occluded. Due to the large size of each part, less occlusion is detected. Image (b) shows that the occlusion in both eye parts is detected. The white background in the mouth part is also an occlusion, but it is not detected because it covers less than 25% of the corresponding parts. Image (b) also shows that some non-occluded area in both eye parts is marked as occluded.
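
The per-part decision rule can be sketched as follows (the grid dimensions and the binary per-pixel occlusion map, which the thesis derives from skin-colour detection, are assumptions for illustration):

    import numpy as np

    def mark_occluded_parts(occl_pixels, rows, cols, threshold=0.25):
        # Split a binary per-pixel occlusion map into rows x cols parts
        # and mark a part as occluded if at least `threshold` of its
        # pixels are occluded (the 25% rule described above).
        h, w = occl_pixels.shape
        ph, pw = h // rows, w // cols
        marked = np.zeros((rows, cols), dtype=bool)
        for r in range(rows):
            for c in range(cols):
                part = occl_pixels[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
                marked[r, c] = part.mean() >= threshold
        return marked

    # Level 1 corresponds to a coarse grid, e.g. 3 x 2 = 6 large parts:
    # marked = mark_occluded_parts(occl_map, rows=3, cols=2)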

In the level 2 image division method, the size of each part is smaller, so it can detect small occlusions. Image (c) shows that the eye occlusions and the background occlusion in the mouth parts are detected, and less occluded area is marked as non-occluded. However, the size of each part is still fairly large, so some non-occluded area is marked as occluded and fewer pixels are available for the reconstruction process. Among the many experiments performed, the level 3a image division showed the best occlusion detection results compared to all other methods.

Figure 5.1: Occlusion detection by different image division methods. (a) Occluded image. (b) Occlusion detection by level 1 image division. (c) Occlusion detection by level 2 image division. (d) Occlusion detection by level 3a image division. (e) Occlusion detection by level 3b image division.

In the level 3a image division method, the size of each part is very small, so it can detect very small occlusions. Image (d) shows that it has detected almost all of the occlusion while marking some non-occluded area as occluded: it has marked the eye and background occlusions correctly but has also marked the eyebrows and hair as occlusions.

In occlusion detection by level 3b image division, the process is divided into two steps. In the first step, the image is divided as described in Section 4.1.2 and the occlusion is detected for each part. This step detects the small occlusions, but some non-occluded area is marked as occluded. In the second step, the area marked as occluded in the first step is further divided into sub-parts and the occlusion is detected for each sub-part. In this way, non-occluded areas that were marked as occluded in the first step are re-marked as non-occluded, and more pixels become available for the reconstruction of the faces, see Figure 5.1 (e). Level 3b is thus also a good occlusion detection method.
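
The two-step refinement can be sketched as follows, reusing the mark_occluded_parts helper from the earlier sketch in this section (the sub-grid size is an assumption; the thesis only states that occluded parts are divided into sub-parts):

    import numpy as np

    def refine_level_3b(occl_pixels, rows, cols, sub=2, threshold=0.25):
        # Step one: coarse per-part detection. Step two: re-test only the
        # parts marked occluded on a finer sub-grid, so that non-occluded
        # pixels inside coarse parts are recovered for reconstruction.
        h, w = occl_pixels.shape
        ph, pw = h // rows, w // cols
        refined = np.zeros((h, w), dtype=bool)
        coarse = mark_occluded_parts(occl_pixels, rows, cols, threshold)
        for r in range(rows):
            for c in range(cols):
                if not coarse[r, c]:
                    continue       # part already passed as non-occluded
                block = occl_pixels[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
                fine = mark_occluded_parts(block, sub, sub, threshold)
                sh, sw = ph // sub, pw // sub
                for i in range(sub):
                    for j in range(sub):
                        refined[r * ph + i * sh:r * ph + (i + 1) * sh,
                                c * pw + j * sw:c * pw + (j + 1) * sw] = fine[i, j]
        return refined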

5.2 Reconstruction quality results

The quality of the reconstructed faces is measured by PSNR. The average PSNR is calculated for the complete reconstructed faces and for the reconstructed occluded parts only. Table 5.1 shows the average PSNR of the complete reconstructed faces and Table 5.2 shows the PSNR for the reconstructed occluded parts. In Tables 5.1, 5.2, 5.3 and 5.4, Level 1 denotes reconstruction of faces by level 1 image division, Level 2 by level 2 image division, Level 3a by level 3a image division and Level 3b by the level 3b image division method. The number of pixels used in the reconstruction of the faces is recorded to determine the impact of the number of non-occluded pixels on the quality of the reconstructed faces. Furthermore, the processing time taken by each image division method is also recorded.

Table 5.1 contains the average PSNR values of all 116 reconstructed faces for 3 different types of occlusion. The level 1 image division has the maximum average PSNR value in the case of sunglasses occlusion, whereas the level 3a image division has the maximum average PSNR value for the scarf and cap & sunglasses occlusions.

Table 5.2 contains the average PSNR values of the reconstructed occluded parts only for the 3 different types of occlusion. The level 1 image division has the maximum average PSNR value for the sunglasses and cap & sunglasses occlusions, while level 3a has the maximum average PSNR value for the scarf occlusion.

Table 5.3 contains the number of non-occluded pixels used in the reconstruction of the facial images. The quality of the reconstructed faces generally increases with the number of non-occluded pixels.

Table 5.1: Reconstruction quality of the complete image (PSNR) [dB] for the granularity effect.

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses              23.46     23.22      23.19      23.33
Scarf                   19.85     19.85      20.01      19.87
Cap and sunglasses      19.95     20.32      20.38      20.34

Table 5.2: Reconstruction quality of the occluded reconstructed parts (PSNR) [dB] for the granularity effect.

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses              23.30     20.99      20.89      20.78
Scarf                   18.46     18.55      18.77      18.54
Cap and sunglasses      21.66     19.09      18.88      18.99

Table 5.3: Number of pixels used in reconstruction.

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses              50544     49464      47640      53496
Scarf                   42768     42768      43512      45264
Cap and sunglasses      31104     42768      44736      46656

Table 5.4: Processing time (sec) for the granularity effect.

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses              24.04     37.77      40.33      43.60
Scarf                   22.53     36.81      38.80      41.48
Cap and sunglasses      25.93     38.71      40.58      41.06

Figure 5.2: Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.

Figure 5.3: Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.

Table 5.4 contains the processing time for the 4 image division methods applied to the 3 types of occlusion. The results show that the level 1 image division takes the least processing time, whereas the level 3b method takes the most. The processing time thus depends on the image division: when each image part is large, the method takes less processing time, and when each part is small, it takes more.

Figure 5.2 shows a single image reconstructed using the different image division methods. Image (a) is an occluded image. Image (b) shows that the quality of the reconstructed image is good, except for some circles around the eyes that are not very prominent. Image (c) shows that the quality of the reconstructed image is not good, as prominent circles around the eyes can be noticed; the white background area is also not reconstructed well. Images (d) and (e) are reconstructed with good quality, with some circles around the eyes that are not prominent, and image (f) shows the non-occluded image. The visual evaluation and the average PSNR values of the reconstructed images show that the level 3a image division generates the images with the highest quality compared to all other image division methods.

5.3 Reconstruction results using pre-defined eigenspaces

Six pre-defined eigenspaces were constructed using six sunglasses occlusion masks, where the occlusion vector was created by level 3a image division. The occlusion of the mask input image is detected, and based on the detected occlusion the closest eigenspace is selected for the reconstruction process. The average PSNR of the reconstructed faces is calculated to determine their quality, and the processing time is recorded to determine the efficiency of the pre-defined eigenspaces. Many experiments were performed, and the results showed a remarkable decrease in processing time with negligible quality loss in the reconstructed faces.
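
A rough sketch of how saved eigenspaces might be loaded and used follows; the .npz file layout, the array names and the simplified aPCA projection are our assumptions, not the thesis's exact formulation:

    import numpy as np

    def load_eigenspace(path):
        # Hypothetical layout: one .npz archive per pre-defined
        # eigenspace, holding the mean face, the pseudo eigenvectors
        # (non-occluded pixels only) and the full-face eigenvectors.
        data = np.load(path)
        return data["mean"], data["pseudo_eig"], data["full_eig"]

    def reconstruct(face, mean, pseudo_eig, full_eig, non_occl):
        # aPCA-style sketch: encode the face using only its non-occluded
        # pixels, then decode the complete face with the full-face
        # eigenvectors.
        coeffs = pseudo_eig.T @ (face[non_occl] - mean[non_occl])
        return mean + full_eig @ coeffs

    # mean, pseudo_eig, full_eig = load_eigenspace("sunglasses_1.npz")
    # full_face = reconstruct(img.ravel(), mean, pseudo_eig, full_eig, mask)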

The processing time using a pre-defined eigenspace was 6.2 seconds, compared to 40.3 seconds for a run-time eigenspace. The average PSNR of the reconstructed faces was 23.0 dB using pre-defined eigenspaces and 23.1 dB for level 3a with run-time eigenspaces. Thus, the time for the pre-defined eigenspaces is much less than the time taken by the level 3a image division method, at the cost of only a minimal decrease in the quality of the reconstructed faces.

Chapter 6

Conclusions

6.1 Discussion about granularity effect and reconstruction quality

The occlusion is detected by 4 image division methods: level 1, level 2, level 3a and level 3b. In the level 1 image division method, the image is divided into 6 large parts and the occlusion is detected for each part. As the size of each part is large, small occlusions are not detected; the method can mark large non-occluded areas as occluded and some occluded areas as non-occluded. This method generated the best reconstruction results in the sunglasses scenario because the detected occlusion exactly matches the 2 boxes. Another advantage of this method is that it takes less processing time than all other image division methods.

The level 3a image division method can detect very small occlusions, as the size of each part is very small. The main advantage of this method is that it can mark almost all occlusions, although it also marks the hair around the head area and the eyebrows as occlusions. Many experiments were performed in which the facial occlusion was detected, the faces were reconstructed and the average PSNR values were calculated; these showed that level 3a is the best occlusion detection method. Moreover, it generated the best reconstruction results for the scarf and sunglasses & cap occlusions.

The level 2 and level 3b methods reconstructed the images with the worst results.

The pixels' positions in the non-occluded images and in the occluded images are not the same, so the quality of the reconstructed images is not as good as it could be. The quality can be enhanced by marking the occluded areas on the same non-occluded images and then using these images for the reconstruction process; by doing this, the average PSNR value increases by 4 to 5 dB.

6.2 Discussion about pre-defined eigenspaces

The experiments showed that level 3a is the best method, yielding good reconstruction results, but it takes more processing time. This limitation can be overcome by using pre-defined eigenspaces. For the experiments, 6 eigenspaces based on the sunglasses occlusions were defined and saved on a storage medium for later use. The experiments showed that the use of pre-defined eigenspaces gives a remarkable decrease in processing time with only a small decrease in the quality of the reconstructed faces.

This small quality loss should be acceptable in applications where processing time is critical.

6.3 Limitations

This occlusion detection method is based on skin color detection, so it cannot detect a face occluded by objects of skin color. It works well for Caucasian and Asian people but cannot detect occlusion for people with dark skin, because dark skin is marked as an occlusion. Hair covering the head, the eyebrows and beards are also not detected as face parts when the mask input image is divided into very small parts. Furthermore, the images are not registered properly; if the images were registered correctly, i.e. if all the facial points such as the eyes, nose and lips were at the same position in all the images, the quality of the reconstructed faces could be enhanced.

6.4 Future work

The occlusion detection algorithm can be generalized so that it can detect objects of skin color and handle people with dark skin. More images can be added to the database so that a more extensive study of aPCA can be made. Furthermore, if the database is large, it can be divided into different groups based on gender, ethnicity, etc.

Chapter 7

Acknowledgements

By the blessings of Almighty Allah and the prayers of my parents, I have accomplished this work. First of all, I want to thank my external supervisor Dr. Ulrik Soderstrom, who is always ready to support the students and give them his time. I would also like to thank my internal supervisor Dr. Niclas Borlin for arranging regular meetings and providing guidance in writing the thesis report, thanks to which I was able to finish my thesis on time. I am also grateful to my parents, family and friends, especially A. Mushtaq, for their moral support.

References

Cheddad, A., Condell, J., Curran, K., and Kevitt, P. M. (2009). A skin tone detection algorithm for an adaptive approach to steganography. Signal Processing, 89(12).

Jabbar, D. E. K. and Hadi, W. J. (2010). Face occlusion detection and recovery using fuzzy C-Means. Engineering and Technology Journal, 28.

Jolliffe, I. T. (2002). Principal Component Analysis. Springer Series in Statistics. Springer-Verlag, 2nd edition.

Kim, G., Suhr, J. K., Jung, H. G., and Kim, J. (2010). Face occlusion detection by using B-Spline active contour and skin color information. In Proceedings of the 11th Int. Conf. on Control, Automation, Robotics and Vision, pages 189–190, Singapore.

Kim, K. (1996). Face recognition using principle component analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–591.

Kim, T. Y., Lee, K. M., and Lee, S. U. (2007). Occlusion invariant face recognition using Two-Dimensional PCA. In International Conferences VISAPP and GRAPP, volume 4 of Communications in Computer and Information Science. Springer-Verlag, Berlin.

Al-Naser, M. and Soderstrom, U. (2011). Reconstruction of occluded facial images using asymmetrical principal component analysis. In Proceedings of the Intl. Conference on Systems, Signals, and Image Processing (IWSSIP), pages 257–260.

Martinez, A. M. and Benavente, R. (1998). The AR face database. CVC Technical Report 24.

Min, R., Hadid, A., and Dugelay, J.-L. (2011). Improving the recognition of faces occluded by facial accessories. In Proceedings of the 9th IEEE Conference on Automatic Face and Gesture Recognition, Santa Barbara, CA, USA.

Oh, H. J., Lee, K. M., and Lee, S. U. (2006). Occlusion invariant face recognition using selective non-negative matrix factorization basis images. In Computer Vision - ACCV 2006, volume 3851 of Lecture Notes in Computer Science, pages 120–129. Springer-Verlag, Berlin.

Santoso, A. J., Nugroho, D. L. E., Suparta, D. G. B., and Hidayat, D. R. (2011). Compression ratio and peak signal to noise ratio in grayscale image compression using wavelet. International Journal of Computer Science and Technology, 2(2).

Soderstrom, U. and Li, H. (2011). Asymmetric principal component analysis theory and its applications to facial video coding. In Effective Video Coding for Multimedia Applications, page 16. InTech.

Strang, G. (2003). Introduction to Linear Algebra. Wellesley-Cambridge Press, third edition.

Turk, M. and Pentland, A. (1991). Face recognition using eigenfaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–591, Hawaii.

Wikipedia (2012). Peak signal-to-noise ratio. http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio. Last accessed on 2012-10-16.

Zhao, W., Chellappa, R., Phillips, P. J., and Rosenfeld, A. (2003). Face recognition: A literature survey. ACM Computing Surveys, 35(4):399–458.