Face Detection Evaluation


    CAP 6415 Computer Vision Project Report Version 2

    Face Detection Using Skin Color and Haar-like features

    Implementation and Evaluation

By Vasant Manohar

    Department of Computer Science and Engineering

    University of South Florida

    Email: [email protected]

Abstract

From a research point of view, well-established problems need standard databases, evaluation protocols, and scoring methods. Evaluating algorithms lets researchers know the strengths and weaknesses of a particular approach and identifies aspects of a problem where further research is needed. In this report, two face detection algorithms (one based on Haar-like features and the other based on skin color modeling) have been implemented. They have been empirically evaluated, and the strengths and weaknesses of each method are identified. Also, by observing how the measure values change with the parameters, a global parameter setting for the skin color-based face detector that is optimal for most of the images is identified. Since the performance has been quantified, a conclusion as to which algorithm outperforms the other is made. Experimental results of both face detection algorithms and of the empirical evaluation are presented.

1. Introduction and Overview

Two face detection algorithms are implemented as part of this project: one based on skin color and connected component operators, and the other based on Haar-like features.

Both methods were implemented using the Open Computer Vision Library (OpenCV). For Haar face detection, OpenCV's functions were used to implement the detector, and the classifier was not trained on any training data; the Haar face detector had already been trained on face data, and the learnt classifier parameters already existed in the library.

However, for the other method, based on skin color, images from the dataset were used to train the classifier. In fact, the method can be called semi-supervised, because sample pixels of the skin are input to the classifier before it detects the face in the image. This is not mandatory, but it was done in this project because acquiring skin color samples from huge databases seemed difficult. Hence, it was checked whether, given samples of the skin pixels that appear in the image, the classifier would be able to segment the face effectively. It should be noted that the number of skin pixels extracted from each image was kept minimal. The emphasis was on making the training set as small as possible while building a classifier with acceptable accuracy.

Both face detectors detect only frontal faces. Profile faces, partially occluded faces, and heads are not detected.

It is worth mentioning that the emphasis of this project was not on developing robust face detection algorithms. Rather, effort was directed toward developing a framework for the empirical evaluation of algorithms for face detection (in fact, object detection, to make it generic). By way of performance evaluation, the aspects of each algorithm that need improvement were identified.

Using the proposed measures, we can do the following:

1. Quantitatively measure the performance.
2. Compare the performance of an algorithm on different kinds of data.
3. Quantitatively compare different detection algorithms.
4. Measure any performance improvement in the course of an algorithm's development.
5. Determine trade-offs between performance aspects.
6. Infer, from performance plots, parameter settings that are optimal for the majority of images.

The report is organized into the following sections. Section 2 introduces the performance evaluation measures on which the algorithms will be evaluated. Section 3 briefly explains the Haar face detector and discusses its results. Section 4 discusses the skin color-based face detector and details its results. Section 5 shows the ground truth; it also discusses ground-truthing issues and how performance evaluation can be made independent of ground-truth errors. Section 6 explains how the measures were used to arrive at a global setting for the parameters of the skin color-based face detector. The details of the evaluation results of the two algorithms and the inferences that can be drawn from them are explained in Section 7. Section 8 describes the future scope of work on the project. The conclusions drawn from the work are presented in Section 9.

2. Performance Evaluation [1]

This section details the measures used to quantify the different aspects of an algorithm's performance. The strengths and weaknesses of each of these measures are described. The value of each measure is between zero (worst) and one (best).

Fig. 1 introduces the concept of recall and precision applied to the detection measures. There can be two forms of false alarms. The first results from the non-overlapping region in the detected area and is called a false positive (FP); in other words, this area is classified as an object but the ground truth is absent. The other form of false alarm results from the missed region in the ground truth area, and this is called a false negative (FN). Precision gives an idea of how well the detected area matches the ground truth area. Recall, on the other hand, gives an idea of how well the ground truth overlaps with the detected area. All the measures described have recall and precision counterparts so that both FP and FN errors are accounted for.

    Figure 1: Recall and Precision Concept

The measures are organized in growing level of complexity and accuracy. The first measure, the Object Count Accuracy (Sec 2.1.1), is a trivial measure: it simply counts the number of detected objects with respect to the ground truth objects without checking how accurately they overlap with each other. Next, the pixel-based measures, which check for the raw pixel overlaps between the object and ground truth boxes, are defined in Sec 2.1.2 and 2.1.3. Here the entire frame is considered as a bit-map, without any distinction made between the different objects. If a detected box overlapped another detected box, these measures would not make any distinction, as they consider the union of the areas. Here, bigger boxes have an advantage over smaller boxes. The measures discussed in Sec 2.1.4 and 2.1.5 are area-thresholded measures: if the overlap between the ground truth and detected box is greater than a threshold, full credit is given for the particular box pair. Next, the area-based measures are discussed in Sec 2.1.6 and 2.1.7, where the measures treat the individual boxes equally regardless of size, in contrast to the pixel-based measures, which treat bigger boxes differently than smaller boxes. The area-based measures take the individual objects into account, as opposed to making no such distinction in the case of the pixel-based measures. In Sec 2.1.8, the fragmentation measure is discussed. This measure penalizes algorithms that break an individual ground truth box into multiple detected boxes.

We also propose a set of measures based on the requirement that there is a one-to-one mapping between each ground truth box and detected box. We measure the positional accuracy of the detection output against the ground truth in Sec 2.2.1. A size-based measure is discussed in Sec 2.2.2, while Sec 2.2.3 discusses an orientation-based measure. Finally, we propose a composite measure in Sec 2.2.4, which is area-based and takes into account recall, precision, and fragmentation.

2.1 Measures Independent of Ground Truth and Detected Box Matching

The measures proposed in this section are independent of any matching between ground truth and detected boxes. This is because, whenever overlaps are calculated, the spatial union of boxes is considered, which makes sure that overlapped areas are not counted twice.

2.1.1 Object Count Accuracy

This measure compares the number of ground-truth objects in a frame with the number of algorithm output boxes. It penalizes the algorithm both for extra and for fewer boxes than the ground truth. Let G be the set of ground truth objects in the image and let D be the set of output boxes produced by the algorithm. The Accuracy is defined as

$$\text{Accuracy} = \begin{cases} \text{undefined} & \text{if } N_G + N_D = 0 \\[4pt] \dfrac{\min(N_G, N_D)}{(N_G + N_D)/2} & \text{otherwise} \end{cases}$$

where $N_G$ and $N_D$ are the number of ground-truth objects and output boxes, respectively, in the image.

The measure does not consider the spatial information of the boxes; only the count of boxes in each frame is considered. The measure could be useful in evaluating how well algorithms identify the correct number of objects in a given image, irrespective of how close the boxes are to the ground truth objects. Consider a scenario in which there are 10 ground truth objects, algorithm A finds 8 boxes (say), and algorithm B finds 2 boxes; then A is obviously better than B as far as identifying the number of objects in the image is concerned. To measure the accuracy in terms of overlap with respect to area, there are other measures.
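
As an illustration, a minimal sketch of this count-based accuracy (Python; None stands in for the undefined case):

```python
def object_count_accuracy(n_gt: int, n_det: int):
    """Object Count Accuracy from the definition above."""
    if n_gt + n_det == 0:
        return None  # undefined when both G and D are empty
    return min(n_gt, n_det) / ((n_gt + n_det) / 2)
```

For the scenario above, object_count_accuracy(10, 8) gives about 0.89 while object_count_accuracy(10, 2) gives about 0.33, ranking algorithm A above B.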

2.1.2 Pixel-based Recall

This measure captures how well the algorithm minimizes false negatives. It is a pixel-count-based measure.

Let $\text{Union}_G$ and $\text{Union}_D$ be the spatial unions of the boxes in G and D:

$$\text{Union}_G = \bigcup_{i=1}^{N_G} G_i \qquad \text{Union}_D = \bigcup_{i=1}^{N_D} D_i$$

where $G_i$ represents the $i$th ground truth object and $D_i$ the $i$th detected object in the image.

We define Recall as the ratio of the detected area within the ground truth to the total ground truth:

$$\text{Recall} = \begin{cases} \text{undefined} & \text{if } \text{Union}_G = \emptyset \\[4pt] \dfrac{|\text{Union}_G \cap \text{Union}_D|}{|\text{Union}_G|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area.

This measure treats the frame not as a collection of objects but as a binary pixel map (object/non-object; output-covered/not-output-covered). So the score increases as the overlap increases and will be 1 for complete overlap.


2.1.3 Pixel-based Precision

This measure captures how well the algorithm minimizes false positives. It is a pixel-count-based measure.

With $\text{Union}_G$ and $\text{Union}_D$ as defined above, we define Precision as the ratio of the detected area within the ground truth to the total detection:

$$\text{Precision} = \begin{cases} \text{undefined} & \text{if } \text{Union}_D = \emptyset \\[4pt] \dfrac{|\text{Union}_D \cap \text{Union}_G|}{|\text{Union}_D|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area. This measure treats the frame not as a collection of objects but as a binary pixel map (object/non-object; output-covered/not-output-covered). So the score increases as the false positives reduce and will be 1 for no false positives.
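
A minimal sketch of these two pixel-based measures, assuming boxes are given as (x, y, w, h) tuples and rasterized onto a boolean mask so the union counts each pixel once:

```python
import numpy as np

def rasterize(boxes, shape):
    """Paint (x, y, w, h) boxes into one boolean mask; because the mask
    is a union, pixels covered by several boxes are counted only once."""
    mask = np.zeros(shape, dtype=bool)
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = True
    return mask

def pixel_recall_precision(gt_boxes, det_boxes, shape):
    """Pixel-based Recall and Precision; None marks the undefined cases."""
    union_g = rasterize(gt_boxes, shape)
    union_d = rasterize(det_boxes, shape)
    inter = np.logical_and(union_g, union_d).sum()
    recall = inter / union_g.sum() if union_g.any() else None
    precision = inter / union_d.sum() if union_d.any() else None
    return recall, precision
```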

    2.1.4 Area-thresholded Recall

    In this measure, a ground-truth object is considered detected if the

    output boxes cover a minimum proportion of its area. Recall is

    computed as the ratio of the number of detected objects to the total

    number of ground-truth objects.

Area-thresholded Recall is computed from the number of detected objects in an image:

$$\text{Area-thresholded Recall} = \dfrac{\sum_{i=1}^{N_G} \text{ObjectDetect}(G_i)}{N_G}$$

where

$$\text{ObjectDetect}(G_i) = \begin{cases} 1 & \text{if } \dfrac{|G_i \cap \text{Union}_D|}{|G_i|} > \text{OVERLAP\_MIN} \\[4pt] 0 & \text{otherwise} \end{cases}$$

Here OVERLAP_MIN is the minimum proportion of a ground-truth object's area that should be overlapped by the output boxes in order to say that it is correctly detected by the algorithm.

    Here again, the ground-truth objects are treated equally regardless

    of size.

2.1.5 Area-thresholded Precision

This is the counterpart of the measure in Sec 2.1.4. The measure counts the number of output boxes that significantly cover the ground truth. An output box $D_i$ significantly covers the ground truth if a minimum proportion of its area overlaps with $\text{Union}_G$.

Area-thresholded Precision is computed from the number of output boxes that significantly overlap with the ground-truth objects:

$$\text{Area-thresholded Precision} = \dfrac{\sum_{i=1}^{N_D} \text{BoxPrecision}(D_i)}{N_D}$$

where

$$\text{BoxPrecision}(D_i) = \begin{cases} 1 & \text{if } \dfrac{|D_i \cap \text{Union}_G|}{|D_i|} > \text{OVERLAP\_MIN} \\[4pt] 0 & \text{otherwise} \end{cases}$$

Here OVERLAP_MIN is the minimum proportion of an output box's area that should be overlapped by the ground truth in order to say that the output box is precise.

Again, in this measure the output boxes are treated equally regardless of size, similar to the previous measure.
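
A sketch of both area-thresholded measures, reusing rasterize() from the pixel-based sketch above; OVERLAP_MIN defaults to the 40% used in the evaluation of Sec 7:

```python
def covered_fraction(box, union_mask):
    """Fraction of a box's pixels covered by a union mask; box = (x, y, w, h)."""
    x, y, w, h = box
    return union_mask[y:y + h, x:x + w].sum() / (w * h)

def area_thresholded_scores(gt_boxes, det_boxes, shape, overlap_min=0.4):
    """ATR and ATP; returns None where a measure is undefined (no boxes)."""
    union_g, union_d = rasterize(gt_boxes, shape), rasterize(det_boxes, shape)
    atr = (sum(covered_fraction(g, union_d) > overlap_min for g in gt_boxes)
           / len(gt_boxes)) if gt_boxes else None
    atp = (sum(covered_fraction(d, union_g) > overlap_min for d in det_boxes)
           / len(det_boxes)) if det_boxes else None
    return atr, atp
```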

2.1.6 Area-based Recall

This measure is intended to capture the average area recall over the ground-truth objects in the image. The recall for an object is the proportion of its area that is covered by the algorithm's output boxes. The objects are treated equally regardless of size.

We define Recall as the average recall over all the objects in the ground truth G:

$$\text{Recall} = \dfrac{\sum_{i=1}^{N_G} \text{ObjectRecall}(G_i)}{N_G}$$

where

$$\text{ObjectRecall}(G_i) = \begin{cases} \text{undefined} & \text{if } |G_i| = 0 \\[4pt] \dfrac{|G_i \cap \text{Union}_D|}{|G_i|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area.

All the ground-truth objects contribute equally to the measure regardless of their size. At one extreme, if an image contains two objects, a large object that was completely detected and a very small object that was missed, then Recall will be 50%.

2.1.7 Area-based Precision

This is the counterpart of the previous measure (Sec 2.1.6), where the output boxes are examined instead of the ground-truth objects. Precision is computed for each output box and averaged over the whole image. The precision of a box is the proportion of its area that covers the ground truth objects.

We define Precision as the average precision over the algorithm's output boxes D:

$$\text{Precision} = \dfrac{\sum_{i=1}^{N_D} \text{BoxPrecision}(D_i)}{N_D}$$

where

$$\text{BoxPrecision}(D_i) = \begin{cases} \text{undefined} & \text{if } |D_i| = 0 \\[4pt] \dfrac{|D_i \cap \text{Union}_G|}{|D_i|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area.


    In this measure the output boxes are treated equally regardless of

    size.
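
The area-based counterparts differ from the area-thresholded sketch above only in averaging the per-box fractions instead of thresholding them (again reusing rasterize() and covered_fraction() from the earlier sketches):

```python
def area_based_scores(gt_boxes, det_boxes, shape):
    """ABR and ABP: average per-box covered fractions, so every box
    contributes equally regardless of its size."""
    union_g, union_d = rasterize(gt_boxes, shape), rasterize(det_boxes, shape)
    abr = (sum(covered_fraction(g, union_d) for g in gt_boxes)
           / len(gt_boxes)) if gt_boxes else None
    abp = (sum(covered_fraction(d, union_g) for d in det_boxes)
           / len(det_boxes)) if det_boxes else None
    return abr, abp
```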

2.1.8 Average Fragmentation

Detection of objects is usually not the final step in a vision system. For example, text extracted from video will go through enhancement, binarization, and finally recognition by an OCR system. Ideally, the extracted text should be in one piece, but a detection algorithm could produce several boxes (e.g., one for each word or character) or multiple overlapping boxes, which could increase the difficulty of the next processing step.

This measure is intended to penalize an algorithm for producing multiple output boxes that cover a single ground-truth object. Multiple detections include overlapping and non-overlapping boxes.

For a ground-truth object $G_i$, the fragmentation of the output boxes overlapping $G_i$ is measured by:

$$\text{Frag}(G_i) = \begin{cases} \text{undefined} & \text{if } N_{D \cap G_i} = 0 \\[4pt] \dfrac{1}{1 + \log_{10}(N_{D \cap G_i})} & \text{otherwise} \end{cases}$$

where $N_{D \cap G_i}$ is the number of output boxes in D that overlap the ground-truth object $G_i$.

    For an image, Frag is simply the average fragmentation of all

    ground-truth objects in the image where Frag(Gi) is defined. This is

    a particularly useful metric for face detection.
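
A sketch of the fragmentation measure, counting for each ground-truth box the detected rectangles that overlap it:

```python
import math

def average_fragmentation(gt_boxes, det_boxes):
    """Average Frag over ground-truth boxes where Frag is defined
    (i.e., at least one overlapping detection); boxes are (x, y, w, h)."""
    def overlaps(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    scores = []
    for g in gt_boxes:
        n = sum(overlaps(g, d) for d in det_boxes)
        if n > 0:  # Frag(G_i) is undefined when nothing overlaps G_i
            scores.append(1.0 / (1.0 + math.log10(n)))
    return sum(scores) / len(scores) if scores else None
```

Note that a single covering box yields a perfect score of 1, since log10(1) = 0.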

3. Haar face detector

Haar object detection, partly motivated by face detection, was primarily developed with the goal of rapid object detection. Since the Haar face detector is not the main face detector under investigation in this project, only a brief overview of the method is provided; for a detailed description of the technique, please refer to [2].

There are three main contributions of the method. The first is the introduction of a new image representation called the Integral Image, which allows the features used by the detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers. The third contribution is a method for combining classifiers in a cascade, which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions [2].

The Haar face detector is implemented in OpenCV. The function cvHaarDetectObjects() finds rectangular regions in the given image that are likely to contain objects the cascade has been trained for, and returns those regions as a sequence of rectangles. The function scans the image several times at different scales. Each time, it considers overlapping regions in the image and applies the classifiers to the regions. It may also apply some heuristics to reduce the number of analyzed regions, such as Canny pruning.

The default parameters are:

1. scale_factor = 1.1, min_neighbors = 3, flags = 0, tuned for accurate yet slow face detection.
2. For faster face detection on real video images, the better settings are scale_factor = 1.2, min_neighbors = 2, flags = CV_HAAR_DO_CANNY_PRUNING.

A minimal usage sketch with these settings is shown below.
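
cvHaarDetectObjects() belongs to OpenCV's legacy C API. As an illustration only, the following sketch performs the equivalent detection through OpenCV's modern Python binding; the input filename is a placeholder, and the pretrained cascade file ships with the opencv-python package:

```python
import cv2

# Pretrained frontal-face cascade bundled with opencv-python;
# cv2.data.haarcascades points at its install location.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("input.jpg")                 # placeholder input file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor/minNeighbors correspond to scale_factor/min_neighbors above:
# setting 1 (accurate, slower) is 1.1/3; setting 2 (faster) is 1.2/2.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)

for x, y, w, h in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```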

However, each setting is associated with a cost. While the default settings might detect faces in most cases, they might also declare a face when there is none in the image. On the other hand, though the other setting might not declare a face when there is no face in the image, it might also fail to detect faces that are present. Figs. 2 and 3 illustrate this possibility.

    (2a) Original Image

    (2b) Results with setting 2

    (2c) Results with setting 1 (default)

Figure 2: Results showing the better detections produced by the default settings.


    (3a) Original Image

    (3b) Results with setting 2

    (3c) Results with setting 1 (default)

    Figure 3: Results showing the false positives produced by default

    settings.

Though the default settings produced false positives, the fact that they detect the face accurately when one is present motivated their use. The Haar face detection results shown from now on were obtained using the default settings.

4. Skin color-based face detector [3]

The skin color-based face detector proceeds in the following steps.

1. Convert the RGB image into the YCbCr color space. The reason behind this is that segmentation of skin-colored regions becomes robust only if the chrominance component is used in the analysis. Therefore, the luminance component is eliminated as much as possible by choosing the CbCr plane (chrominance) of the YCbCr color space to build the model.

2. Regions of interest are carefully extracted from the images as training pixels. Regions containing human skin pixels as well as non-skin pixels are collected. The mean and covariance of the database characterize the model, which is a unimodal Gaussian. The mean and covariance are estimated using the EM algorithm. (The EM algorithm was implemented as part of another project by me in 2003; the same code was used.)

It can be seen in Figure 4 that the color of human skin pixels is confined to a very small region in the chrominance space, which is distinct from the non-skin region.

    Figure 4: CbCr plane of skin and non-skin regions

Let $c = [C_b\ C_r]^T$ be the chrominance vector of an input pixel. The probability that the given pixel lies in the skin distribution is given by

$$p(c \mid \text{skin}) = \dfrac{1}{2\pi\,|\Sigma_s|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(c - \mu_s)^T \Sigma_s^{-1} (c - \mu_s)\right)$$

where $\mu_s$ and $\Sigma_s$ represent the mean vector and the covariance matrix, respectively, of the training skin pixels. This gives the probability of a pixel occurring given that it is a skin pixel. Similarly, we calculate $p(c \mid \text{non-skin})$.

The posterior probability that a pixel represents skin given its chrominance vector c, $p(\text{skin} \mid c)$, is evaluated using Bayes' theorem (with equal priors):

$$p(\text{skin} \mid c) = \dfrac{p(c \mid \text{skin})}{p(c \mid \text{skin}) + p(c \mid \text{non-skin})}$$

An input image is analyzed pixel by pixel, evaluating this probability at each pixel. This results in a gray-level image where the gray value gives the probability of the pixel representing skin. This image is thresholded to obtain a binary image. A correct choice of threshold is critical: increasing the threshold increases the chances of losing skin regions exposed to adverse lighting conditions during thresholding, while the extra regions that are retained in the image because of a lower threshold can be removed using connected component operators.
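
A sketch of this per-pixel classification, assuming the means and covariances (mu_s, cov_s, mu_n, cov_n) have already been estimated from the training pixels as described in step 2:

```python
import cv2
import numpy as np

def skin_posterior(img_bgr, mu_s, cov_s, mu_n, cov_n):
    """Per-pixel p(skin | c) from two unimodal Gaussians on (Cb, Cr)."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    # OpenCV orders channels Y, Cr, Cb; build c = [Cb, Cr] per pixel.
    c = ycrcb[..., [2, 1]].reshape(-1, 2).astype(np.float64)

    def gaussian(c, mu, cov):
        d = c - mu
        inv, det = np.linalg.inv(cov), np.linalg.det(cov)
        expo = -0.5 * np.einsum("ij,jk,ik->i", d, inv, d)  # d^T cov^-1 d
        return np.exp(expo) / (2 * np.pi * np.sqrt(det))

    p_skin = gaussian(c, mu_s, cov_s)
    p_non = gaussian(c, mu_n, cov_n)
    post = p_skin / (p_skin + p_non + 1e-12)  # Bayes, equal priors
    return post.reshape(img_bgr.shape[:2])

# mask = skin_posterior(img, mu_s, cov_s, mu_n, cov_n) > 0.1  # report's threshold
```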

3. The resulting image after stage 2 contains a lot of noise, so the image is opened using a disk-shaped structuring element. The effect of the area opening is the removal of small, bright regions in the thresholded image. The size of the structuring element should not be larger than the smallest face the system is designed to detect. A set of shape-based connected operators is then applied to the remaining components to decide whether they represent a face or not. These operators make use of basic assumptions about the shape of the face.

4. Compactness. This is defined as the ratio of a component's area to the square of its perimeter:

$$\text{Compactness} = \dfrac{A}{P^2}$$

This criterion is maximized for circular objects, and face components exhibit a relatively high value. If a particular component shows a compactness value greater than a threshold, it is retained for further analysis; otherwise it is discarded.

5. Solidity. For a connected component, solidity is defined as the ratio of its area to the area of its rectangular bounding box:

$$\text{Solidity} = \dfrac{A}{D_x D_y}$$

It gives a measure of the area occupancy of a connected component within its min-max box dimensions. Solidity assumes a high value for face components. If a particular component shows a solidity value greater than a threshold, it is retained; otherwise it is discarded.

6. Aspect ratio. It is assumed that face components normally have an aspect ratio well within a certain range. If a component's aspect ratio falls outside this range, the component is eliminated:

$$\text{Aspect Ratio} = \dfrac{D_y}{D_x}$$

7. Normalization. The remaining unwanted components are removed using the normalized area, the ratio of the area of a connected component to that of the largest component present in the image. Connected components whose normalized area is less than a threshold are eliminated.

    The connected components that remain at this stage contain faces.
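
A sketch of steps 3-7 on the thresholded binary mask, using the global threshold values listed after Figure 5; the height/width orientation of the aspect ratio is an assumption consistent with the 0.9-2.1 range:

```python
import cv2
import numpy as np

def filter_components(binary, se_size=17):
    """Opening (erosion + dilation with one disk-shaped SE), then the
    shape tests of steps 4-7; returns bounding boxes of survivors."""
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (se_size, se_size))
    opened = cv2.morphologyEx(binary.astype(np.uint8), cv2.MORPH_OPEN, se)

    contours, _ = cv2.findContours(opened, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    comps = []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        perim = cv2.arcLength(cnt, True)
        x, y, w, h = cv2.boundingRect(cnt)
        comps.append((area, perim, x, y, w, h))

    largest = max((c[0] for c in comps), default=0)
    faces = []
    for area, perim, x, y, w, h in comps:
        if perim == 0 or w == 0 or h == 0 or largest == 0:
            continue
        compactness = area / perim ** 2   # step 4: keep if > 0.025
        solidity = area / (w * h)         # step 5: keep if > 0.5
        aspect = h / w                    # step 6: keep if in [0.9, 2.1]
        norm_area = area / largest        # step 7: keep if >= 0.35
        if (compactness > 0.025 and solidity > 0.5
                and 0.9 <= aspect <= 2.1 and norm_area >= 0.35):
            faces.append((x, y, w, h))
    return faces
```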

Figure 5 walks through all the steps for the image shown in Figure 5a.

    (5a) Original Image

    (5b) Binary image showing probable skin areas

    (5c) After erosion with a disk structuring element

    (5d) After dilation with the same structuring element

    (5e) After compactness thresholding


    (5f) After solidity thresholding

    (5g) After aspect ratio thresholding

    (5h) After normalized area thresholding

    (5i) Final result

    Figure 5: Various stages in face detection using skin-color and

    connected component operators.

The parameter settings used for the above image were:

Probability threshold: 0.1
Size of SE (for opening): 17 x 17 pixels, disk-shaped (image size 816 x 616 pixels)
Compactness threshold: 0.025
Solidity threshold: 0.5
Aspect ratio: 0.9 - 2.1
Normalization threshold: 0.35

It is worth mentioning here how the parameter settings were finalized: they were derived from the scatter plots of the Recall and Precision measures. Again, this is one of the primary uses of empirical evaluation. This is explained in detail in a later section.

5. Ground truth

The following guidelines were used while ground-truthing the images for evaluation:

- A face is bounded by a rectangle, where the area includes the eyebrows, eyes, nose, mouth, and chin.
- There should be a small but clear space between these facial features and the bounding box.
- The ears and top hair are not included in the face.

For clear visualization, the ground truth images are shown in Section 7 along with the results of each of the methods on those images.

One of the major issues with evaluation is the quality of the ground truth: how reliable is it? To account for this ambiguity, care was taken to make the evaluation insensitive to ground-truthing errors. Measures that use area overlaps were made a little lenient, in the sense that their contribution to the final score was given less weight compared to measures such as fragmentation and object count accuracy. This approach of weighting the different metric values will also be helpful in extending the evaluation protocol to different domains.

6. Parameter setting of the skin color-based face detector

Another important application of performance evaluation is to arrive at a global setting of the parameters that influence the performance of an algorithm. Toward this end, a set of measures representative of the algorithm's performance was chosen for tracking it. Area-Thresholded Recall and Precision give an overall picture of the performance, so these two measures were chosen to decide the parameter settings for the skin color-based face detector.

Also, only the parameters that affected the performance most were chosen. The values of the solidity and compactness thresholds did not seem to vary the results to any considerable extent. Also, keeping the probability threshold as low as possible is always advisable, as missing a skin region is not good. So the aspect ratio and normalized area thresholds were varied, and scatter plots of ATR and ATP were produced. The global setting of these parameters was decided at the value where the measures peaked for most of the images. A sketch of such a parameter sweep is shown below.
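
As an illustration, a sketch of such a sweep; dataset, detect_faces, and the norm_area_thresh keyword are hypothetical stand-ins for the report's actual test harness, and area_thresholded_scores is the sketch from Sec 2.1.5:

```python
def sweep_norm_area(dataset, detect_faces, thresholds):
    """Collect (threshold, ATR, ATP) triples for a scatter plot.
    `dataset` is a list of (image, gt_boxes) pairs; `detect_faces` is
    the detector under test with the swept threshold exposed as a
    keyword -- both are assumed stand-ins, not the report's code."""
    points = []
    for t in thresholds:
        for img, gt_boxes in dataset:
            det_boxes = detect_faces(img, norm_area_thresh=t)
            atr, atp = area_thresholded_scores(gt_boxes, det_boxes,
                                               img.shape[:2],
                                               overlap_min=0.4)
            points.append((t, atr, atp))
    return points
```

The global setting is then read off as the threshold at which most images keep a high ATR without sacrificing ATP.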


    Figure 6: Scatter plot of ATR/ATP against Normalized Area

    Threshold

A value of 0.35 was chosen for the Normalized Area Threshold because this is the value at which the ATR does not go very low for most of the images, while the ATP is maintained at the best possible value.

A similar plot was made for the Aspect Ratio range against ATR/ATP. Along the same lines as the Normalized Area threshold, the Aspect Ratio range was decided as 0.9 - 2.1. Since it has two values, a lower threshold and a higher threshold, there would have to be two plots to show the effect of each, and the reader might not be able to appreciate the decision through two separate plots. Hence, that plot is not shown here.

    It is important to note that the performance of the algorithm might

    not be the best for all images. With a different setting, the

    performance might increase. This is shown in Fig. 7.

(7a) Results with global settings (one face is totally missed)

(7b) Results with image-specific settings (though one of the faces is fragmented, it is localized properly)

Figure 7: Results explaining the tradeoff between global and per-image parameter settings.

The result in Fig. 7a is obtained with the global settings, while the result in Fig. 7b is obtained with the following settings:

Probability threshold: 0.1
Size of SE (for opening): 5 x 5 pixels, disk-shaped (image size 400 x 276 pixels)
Compactness threshold: 0.025
Solidity threshold: 0.5
Aspect ratio: 0.3 - 2.1
Normalization threshold: 0.05

However, this is acceptable because one cannot set the parameters on a per-image basis: the whole process of face detection should be automated, with no user intervention in the parameter settings.

7. Evaluation Results

Based on the performance metric values (refer to the last page for the evaluation results, Fig. 9), one can conclude the following:

1. The Haar face detector is more robust in detecting a face in an image. This is apparent from the fact that the Area Thresholded Recall of the Haar face detector is always higher than or equal to that of the skin color-based face detector.

2. Both algorithms produce false positives. However, based on the values of Precision, we can infer that the Haar face detector often produces fewer false positives than the skin color-based face detector.

3. On this dataset, both algorithms, when they detect a face, tend to detect it whole. This can be seen from the average fragmentation measure for the test images. Again, this is specific to the test data used; the skin color-based face detector is expected to be prone to errors in terms of fragmentation. In fact, when tried on an image from a database from which no images were used in training, the face detection results showed fragmentation (Figure 8).


Figure 8: Results of the skin color-based face detector on such an image. The detected face is fragmented, and there are also false positives.

This image was not included in the test set because color is sensitive to the camera used. Since this image was taken from a different data set, on which the classifier was not trained, it would not be genuine to test the algorithm on this data. Again, this is one of the major drawbacks of the skin color-based face detector: it has to be trained with skin and non-skin pixels from images taken by a camera whose images will be present in the test set. The Haar face detector is not limited by any such constraint. From these observations, we can declare that the performance of the Haar face detector is better than that of the skin color-based face detector in all aspects.

Finally, the essence of evaluation is to improve the performance of the algorithm. Here we have noticed that the skin color-based detector does not perform as well as the Haar face detector. In fact, even tweaking the parameters does not yield the best results. This shows that the method based on skin color is not robust.

8. Future Work

Effort has to be directed toward making the performance evaluation insensitive to ground-truthing errors. This is an extremely difficult task. Measures such as Area Thresholded Recall and Precision are efforts in this direction; however, there is still scope for improvement, and this aspect has to be explored.

Another point is the fact that there are probably too many measures. Considering that they cover different aspects of performance, this is acceptable. However, there should also be measures that comprehensively cover all aspects of an algorithm. To this end, we have developed a comprehensive measure that accounts for fragmentation (splits), merges, area overlap, and false positives.

This measure is mainly intended to comprehensively cover many aspects in one measure. However, it requires a one-to-one mapping of ground truth and detected objects. It is an area-based measure which penalizes false detections, missed detections, and spatial fragmentation. For a single image, given that there are $N_G$ ground-truth objects and $N_D$ detected objects, we define CAM, the detection composite measure, as

$$\text{CAM} = \dfrac{2 \cdot \text{Overlap Ratio}}{N_G + N_D}$$

where

$$\text{Overlap Ratio} = \sum_{i=1}^{\min(N_G, N_D)} \dfrac{|G_i \cap D_i|}{|G_i \cup D_i|}$$

Here, $\min(N_G, N_D)$ indicates the maximum number of one-to-one mappings between ground truth objects and detected boxes. However, work has to be done in checking the failure cases of the measure at boundary conditions. Initial results have been promising in that it successfully captures the aspects stated above.
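
A sketch of CAM for axis-aligned boxes; the formula above pairs $G_i$ with $D_i$ after a one-to-one matching, so here the matching is passed in explicitly as an assumed list of index pairs:

```python
def cam(gt_boxes, det_boxes, matches):
    """Composite measure CAM; `matches` is an assumed one-to-one list of
    (gt_index, det_index) pairs, at most min(N_G, N_D) of them."""
    def iou(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        union = aw * ah + bw * bh - inter
        return inter / union if union else 0.0

    overlap_ratio = sum(iou(gt_boxes[i], det_boxes[j]) for i, j in matches)
    return 2 * overlap_ratio / (len(gt_boxes) + len(det_boxes))
```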

9. Conclusion

Two face detection algorithms, one based on Haar-like features and the other based on skin color, have been implemented. Both methods have been empirically evaluated and their performance quantified. Based on the results, we can declare that the Haar face detector outperforms the skin color-based face detector in almost all aspects of the evaluation.

Efforts at improving the performance of the skin color-based method have proven futile. This is probably due to the inability of the method to handle challenging situations. Even a cursory investigation of the method reveals that color is not a good feature to rely on: it can vary due to different lighting, cameras, and other factors such as shadows. Since the evaluation is not subjective and the performance has been quantified, there is no ambiguity in the conclusion.

References

[1] Kasturi R., Goldgof D., Soundararajan P., Manohar V., "Performance Evaluation Protocol for Text and Face Detection & Tracking in Video Analysis and Content Extraction (VACE-II)," Report Submitted to Advanced Research and Development Activity, March 2004.

[2] Viola P. and Jones M. J., "Robust real-time object detection," in Proc. IEEE Workshop on Statistical and Computational Theories of Vision, 2001.

[3] Kuchi P., Gabbur P., Bhat S., David S., "Human Face Detection and Tracking using Skin Color Modeling and Connected Component Operators," IETE Journal of Research, Special issue on Visual Media Processing, May 2002. http://www.public.asu.edu/~pkuchi/IETE_2002.pdf


Evaluation Results

9.1 (a) 9.1 (b) 9.1 (c)
A-1: OCA 1, PBR 1, PBP .98, ATR 1, ATP 1, ABR .98, ABP .96, AF 1

9.2 (a) 9.2 (b) 9.2 (c)
A-1: OCA 1, PBR .97, PBP .81, ATR 1, ATP 1, ABR .9, ABP .73, AF 1

9.3 (a) 9.3 (b) 9.3 (c)
A-1: OCA .8, PBR .8, PBP .87, ATR .67, ATP 1, ABR .67, ABP .82, AF 1


9.4 (a) 9.4 (b) 9.4 (c)
A-1: OCA 1, PBR .64, PBP .62, ATR .5, ATP .5, ABR .49, ABP .47, AF 1

Figure 9: Results of Evaluation. (a) Ground truth image; (b) results of Haar face detection; (c) results of skin color-based face detection.

A-1: Haar face detector; A-2: skin color-based face detector. OCA: Object Count Accuracy; PBR: Pixel Based Recall; PBP: Pixel Based Precision; ATR: Area Thresholded Recall; ATP: Area Thresholded Precision; ABR: Area Based Recall; ABP: Area Based Precision; AF: Average Fragmentation.

OVERLAP_MIN was kept at 40% for all the ATR/ATP measurements.