CAP 6415 Computer Vision Project Report Version 2
Face Detection Using Skin Color and Haar-like features
Implementation and Evaluation
By Vasant Manohar
Department of Computer Science and Engineering
University of South Florida
Email: vmanohar@csee.usf.edu
Abstract

From a research point of view, well-established problems need standard databases, evaluation protocols and scoring methods. Evaluating algorithms lets researchers know the strengths and weaknesses of a particular approach and identifies aspects of a problem where further research is needed. In this report, two face detection algorithms (one based on Haar-like features and the other based on skin color modeling) have been implemented. They have been empirically evaluated and the strengths and weaknesses of each method are identified. Also, by tracking the changing values of the measures, a global parameter setting for the skin color-based face detector, which is optimal for most of the images, is identified. Since the performance has been quantified, a conclusion as to which algorithm outperforms the other is made. Experimental results of both the face detection algorithms and of the empirical evaluation are presented.
1. Introduction and Overview

Two face detection algorithms are implemented as part of this project: one based on skin color and connected component operators, and the other based on Haar-like features.

Both methods were implemented using the Open Computer Vision Library (OpenCV). For Haar face detection, the functions of OpenCV were used to implement the face detector. The classifier was not trained on any training data, because the Haar face detector has already been trained with face data and the learnt parameters of the classifier already exist in the library.

However, for the method based on skin color, images from the dataset were used to train the classifier. In fact, we can call the method semi-supervised, because sample pixels of the skin are input to the classifier before it detects the face in the image. Again, this is not mandatory, but it was done in this project because acquiring samples of skin color from huge databases seemed difficult. Hence, it was checked whether, given samples of the skin pixels that appear in the image, the classifier would be able to segment the face effectively. It should be noted here that the number of skin pixels extracted from the image was kept minimal. The emphasis was on making the training set size as small as possible while building a classifier with acceptable accuracy.
Both the face detectors will detect only frontal faces. Profile
faces, faces that are partially occluded, and heads will not be
detected.
It is worth mentioning that the emphasis of this project was not on developing robust face detection algorithms. Rather, effort was directed toward developing a framework for the empirical evaluation of face detection algorithms (in fact object detection, to make it generic). By way of performance evaluation, the aspects of each algorithm that need improvement were identified.

Using the proposed measures, we can do the following:
1. Quantitatively measure the performance.
2. Compare the performance of an algorithm for different kinds of data.
3. Quantitatively compare different detection algorithms.
4. In the course of an algorithm's development, any performance improvement can be measured.
5. Trade-offs between performance aspects can be determined.
6. Parameter settings of algorithms that are optimal on a majority of the images can be inferred from performance plots.

The report is organized into the following sections. Section 2 introduces the performance evaluation measures on which the algorithms will be evaluated. Section 3 briefly explains the Haar face detector and discusses the results. Section 4 discusses the skin color-based face detector and details the results. Section 5 shows the ground truth. It also discusses ground truthing issues and how performance evaluation can be made independent of ground truthing errors. Section 6 explains how the measures were used to arrive at a global setting for the parameters of the skin color-based face detector. The details of the evaluation results of the two algorithms and the inferences that can be drawn from them are explained in Section 7. Section 8 describes the future scope of work on the project. The conclusions drawn from the work are presented in Section 9.
2. Performance Evaluation [1]
This section details the measures used to quantify the different aspects of an algorithm's performance. The strengths and weaknesses of each of these measures are described.

The value of each measure is between zero (worst) and one (best). Fig. 1 introduces the concept of recall and precision as applied to the detection measures. There can be two forms of false alarms. One results from the non-overlapping region in the detected area, called the false positive (FP); in other words, this area is classified as an object but the ground truth is absent. The other form of false alarm results from the missed region in the ground truth area, and this is called the false negative (FN). Precision gives an idea of how well the detected area matches the ground truth area. Recall, on the other hand, gives an idea of how well the ground truth overlaps with the detected area. All the measures described have recall and precision counterparts so that the FP and FN errors are accounted for.
Figure 1: Recall and Precision Concept
The measures are organized in growing level of complexity and
accuracy. The first measure, the Object Count Accuracy (Sec 2.1.1), is a trivial measure that simply counts the number of detected
objects with respect to the ground truth objects without checking for
how accurate they overlap with each other. Next, the pixel-based
measures, which check for the raw pixel overlaps between the object
and ground truth boxes are defined in Sec 2.1.2 and 2.1.3. Here the
entire frame is considered as a bit-map without any distinctions made
between the different objects. If there were a detected box
overlapping another detected box, then this measure would not make
any distinction as it considers the union of the areas. Here, bigger
boxes have an advantage over smaller boxes. The measures
discussed in Sec 2.1.4 and 2.1.5 are area-thresholded measures. If the
overlap between the ground truth and detected box is greater than a
threshold, then full credit is given for the particular box pair. Next,
the area-based measures are discussed in Sec 2.1.6 and 2.1.7, where the measures treat the individual boxes equally regardless of size,
in contrast to the pixel-based measure, which treats bigger boxes
differently than smaller boxes. The area-based measure takes into
account the individual objects as opposed to not making such
distinction in the case of pixel-based measure. In Sec 2.1.8, the
fragmentation measure is discussed. This measure penalizes
algorithms if they break individual ground truth box into multiple
detected boxes.
We also propose a set of measures, which are based on a
requirement that there is a one-to-one mapping between each ground
truth box and the detected box. We measure the positional accuracy
of the detection output to the ground truth in Sec 2.2.1. A size-based
measure is discussed in Sec 2.2.2, while Sec 2.2.3 discusses an orientation-based measure. Finally, we propose a composite measure
in Sec 2.2.4, which is area-based and takes into account the recall,
precision and fragmentation.
2.1 Measures independent of Ground Truth and Detected Box Matching

The measures proposed in this section are independent of ground truth and detected box matching. This is because, whenever overlaps are calculated, the spatial union of boxes is considered, which makes sure that overlapped areas are not counted twice.
2.1.1 Object Count Accuracy

This measure compares the number of ground-truth objects in a frame with the number of algorithm output boxes. It penalizes the algorithm both for extra and for fewer boxes than the ground truth. Let G be the set of ground truth objects in the image and let D be the set of output boxes produced by the algorithm. The Accuracy is defined as
$$\mathit{Accuracy} = \begin{cases} \text{undefined} & \text{if } N_G + N_D = 0 \\[4pt] \dfrac{\min(N_G, N_D)}{(N_G + N_D)/2} & \text{otherwise} \end{cases}$$
where $N_G$ and $N_D$ are the number of ground-truth objects and output boxes, respectively, in the image.

The measure does not consider the spatial information of the boxes. Only the count of boxes in each frame is considered. This measure could be useful in evaluating an algorithm's performance in correctly identifying the number of objects in a given image, irrespective of how close they are with respect to the ground truth objects. Consider a scenario in which there are 10 ground truth objects, algorithm A finds 8 boxes (say) and algorithm B finds 2 boxes; then A is obviously better than B as far as identifying the number of objects in the image. To measure the accuracy in terms of overlap with respect to area there are other measures.
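As an illustration (not part of the original implementation), a minimal Python sketch of this count-based measure could look as follows; the function name is an assumption made here for clarity.

```python
def object_count_accuracy(num_gt, num_det):
    """Object Count Accuracy: min(N_G, N_D) / ((N_G + N_D) / 2).

    Returns None (undefined) when there are neither ground-truth
    objects nor detected boxes in the image.
    """
    if num_gt + num_det == 0:
        return None
    return min(num_gt, num_det) / ((num_gt + num_det) / 2.0)

# Example from the text: 10 ground-truth objects; algorithm A finds
# 8 boxes, algorithm B finds 2 boxes -> A scores higher than B.
print(object_count_accuracy(10, 8))  # ~0.889
print(object_count_accuracy(10, 2))  # ~0.333
```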
2.1.2 Pixel-based Recall

This measure measures how well the algorithm minimizes false negatives. It is a pixel-count-based measure.
Let $\mathit{Union}_G$ and $\mathit{Union}_D$ be the spatial unions of the boxes in G and D:

$$\mathit{Union}_G = \bigcup_{i=1}^{N_G} G_i \qquad\qquad \mathit{Union}_D = \bigcup_{i=1}^{N_D} D_i$$

where $G_i$ represents the i-th ground truth object and $D_i$ represents the i-th detected object in the image.
We define Recall as the ratio of the detected area in the ground truth to the total ground truth area:

$$\mathit{Recall} = \begin{cases} \text{undefined} & \text{if } |\mathit{Union}_G| = 0 \\[4pt] \dfrac{|\mathit{Union}_G \cap \mathit{Union}_D|}{|\mathit{Union}_G|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area.

This measure treats the frame not as a collection of objects but as a binary pixel map (object/non-object; output-covered/not-output-covered). So the score increases as the overlap increases and will be 1 for complete overlap.
2.1.3 Pixel-based Precision
This measure measures how well the algorithm minimizes false positives. It is a pixel-count-based measure.

Let $\mathit{Union}_G = \bigcup_{i=1}^{N_G} G_i$ and $\mathit{Union}_D = \bigcup_{i=1}^{N_D} D_i$ be the spatial unions of the boxes in G and D. We define Precision as the ratio of the detected area in the ground truth to the total detected area:

$$\mathit{Precision} = \begin{cases} \text{undefined} & \text{if } |\mathit{Union}_D| = 0 \\[4pt] \dfrac{|\mathit{Union}_D \cap \mathit{Union}_G|}{|\mathit{Union}_D|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area.

This measure treats the frame not as a collection of objects but as a binary pixel map (object/non-object; output-covered/not-output-covered). So the score increases as the false positives reduce and will be 1 for no false positives.
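A possible Python sketch of the two pixel-based measures is given below. It assumes boxes are given as (x, y, w, h) tuples and rasterizes their spatial unions into boolean bit-maps, which is one way to realize the definitions above (it is not the project's actual code).

```python
import numpy as np

def union_mask(boxes, height, width):
    """Rasterize the spatial union of axis-aligned (x, y, w, h) boxes
    into a binary pixel map, so overlapping areas are counted once."""
    mask = np.zeros((height, width), dtype=bool)
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = True
    return mask

def pixel_recall_precision(gt_boxes, det_boxes, height, width):
    """Pixel-based Recall = |Union_G & Union_D| / |Union_G| and
    Pixel-based Precision = |Union_G & Union_D| / |Union_D|.
    Either value is None (undefined) when its denominator is empty."""
    union_g = union_mask(gt_boxes, height, width)
    union_d = union_mask(det_boxes, height, width)
    overlap = np.logical_and(union_g, union_d).sum()
    recall = overlap / union_g.sum() if union_g.any() else None
    precision = overlap / union_d.sum() if union_d.any() else None
    return recall, precision
```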
2.1.4 Area-thresholded Recall
In this measure, a ground-truth object is considered detected if the
output boxes cover a minimum proportion of its area. Recall is
computed as the ratio of the number of detected objects to the total
number of ground-truth objects.
Let ObjectDetect($G_i$) indicate whether ground-truth object $G_i$ is detected. Area-thresholded Recall is then

$$\mathit{Area\text{-}thresholded\ Recall} = \frac{\sum_{i=1}^{N_G} \mathit{ObjectDetect}(G_i)}{N_G}$$

where

$$\mathit{ObjectDetect}(G_i) = \begin{cases} 1 & \text{if } \dfrac{|G_i \cap \mathit{Union}_D|}{|G_i|} > \mathit{OVERLAP\_MIN} \\[4pt] 0 & \text{otherwise} \end{cases}$$

Here OVERLAP_MIN is the minimum proportion of the ground-truth object's area that should be overlapped by the output boxes in order to say that it is correctly detected by the algorithm.
Here again, the ground-truth objects are treated equally regardless
of size.
2.1.5 Area-thresholded Precision

This is a counterpart of measure 2.1.4. The measure counts the number of output boxes that significantly cover the ground truth. An output box $D_i$ significantly covers the ground truth if a minimum proportion of its area overlaps with $\mathit{Union}_G$. Area-thresholded Precision is then

$$\mathit{Area\text{-}thresholded\ Precision} = \frac{\sum_{i=1}^{N_D} \mathit{BoxPrecision}(D_i)}{N_D}$$

where

$$\mathit{BoxPrecision}(D_i) = \begin{cases} 1 & \text{if } \dfrac{|D_i \cap \mathit{Union}_G|}{|D_i|} > \mathit{OVERLAP\_MIN} \\[4pt] 0 & \text{otherwise} \end{cases}$$

Here OVERLAP_MIN is the minimum proportion of the output box's area that should be overlapped by the ground truth in order to say that the output box is precise.

Again, in this measure the output boxes are treated equally regardless of size, similar to the previous measure.
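The area-thresholded pair could be sketched as follows, again assuming (x, y, w, h) boxes and a simple per-box bit-map; the default overlap_min of 0.4 mirrors the 40% OVERLAP_MIN used for the evaluation results later in this report.

```python
import numpy as np

def box_mask(box, height, width):
    """Rasterize a single (x, y, w, h) box into a binary pixel map."""
    x, y, w, h = box
    mask = np.zeros((height, width), dtype=bool)
    mask[y:y + h, x:x + w] = True
    return mask

def area_thresholded_measures(gt_boxes, det_boxes, height, width,
                              overlap_min=0.4):
    """Area-thresholded Recall: fraction of ground-truth objects whose
    area is covered by the detection union beyond OVERLAP_MIN.
    Area-thresholded Precision: fraction of output boxes covered by
    the ground-truth union beyond OVERLAP_MIN."""
    union_g = np.zeros((height, width), dtype=bool)
    for b in gt_boxes:
        union_g |= box_mask(b, height, width)
    union_d = np.zeros((height, width), dtype=bool)
    for b in det_boxes:
        union_d |= box_mask(b, height, width)

    detected = sum(
        (box_mask(g, height, width) & union_d).sum() / float(g[2] * g[3])
        > overlap_min
        for g in gt_boxes)
    precise = sum(
        (box_mask(d, height, width) & union_g).sum() / float(d[2] * d[3])
        > overlap_min
        for d in det_boxes)

    atr = detected / len(gt_boxes) if gt_boxes else None
    atp = precise / len(det_boxes) if det_boxes else None
    return atr, atp
```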
2.1.6 Area-based Recall

This measure is intended to measure the average area recall of the ground-truth objects in the image. The recall for an object is the proportion of its area that is covered by the algorithm's output boxes. The objects are treated equally regardless of size.

We define Recall as the average recall over all the objects in the ground truth G:

$$\mathit{Recall} = \frac{\sum_{i=1}^{N_G} \mathit{ObjectRecall}(G_i)}{N_G}$$

where

$$\mathit{ObjectRecall}(G_i) = \begin{cases} \text{undefined} & \text{if } |G_i| = 0 \\[4pt] \dfrac{|G_i \cap \mathit{Union}_D|}{|G_i|} & \text{otherwise} \end{cases}$$

and the $|\cdot|$ operator denotes the number of pixels in the given area.

All the ground-truth objects contribute equally to the measure regardless of their size. At one extreme, if an image contains two objects, a large object that was completely detected and a very small object that was missed, then Recall will be 50%.
2.1.7 Area-based Precision

This is a counterpart of the previous measure 2.1.6, where the output boxes are examined instead of the ground-truth objects. Precision is computed for each output box and averaged over the whole image. The precision of a box is the proportion of its area that covers the ground truth objects.

We define Precision as the average precision of the algorithm's output boxes D:

$$\mathit{Precision} = \frac{\sum_{i=1}^{N_D} \mathit{BoxPrecision}(D_i)}{N_D}$$

where

$$\mathit{BoxPrecision}(D_i) = \begin{cases} \text{undefined} & \text{if } |D_i| = 0 \\[4pt] \dfrac{|D_i \cap \mathit{Union}_G|}{|D_i|} & \text{otherwise} \end{cases}$$

and the $|\cdot|$ operator denotes the number of pixels in the given area.
In this measure the output boxes are treated equally regardless of
size.
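A corresponding sketch for the area-based pair, under the same (x, y, w, h) box-representation assumption, might be:

```python
import numpy as np

def area_based_measures(gt_boxes, det_boxes, height, width):
    """Area-based Recall: average, over ground-truth objects, of the
    proportion of each object's area covered by the detection union.
    Area-based Precision: the symmetric average over output boxes."""
    def rasterize(boxes):
        mask = np.zeros((height, width), dtype=bool)
        for x, y, w, h in boxes:
            mask[y:y + h, x:x + w] = True
        return mask

    union_g, union_d = rasterize(gt_boxes), rasterize(det_boxes)

    def coverage(box, other_union):
        x, y, w, h = box
        if w * h == 0:
            return None                      # undefined for empty boxes
        covered = other_union[y:y + h, x:x + w].sum()
        return covered / float(w * h)

    recalls = [r for r in (coverage(g, union_d) for g in gt_boxes)
               if r is not None]
    precisions = [p for p in (coverage(d, union_g) for d in det_boxes)
                  if p is not None]
    abr = sum(recalls) / len(recalls) if recalls else None
    abp = sum(precisions) / len(precisions) if precisions else None
    return abr, abp
```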
2.1.8 Average Fragmentation

Detection of objects is usually not the final step in a vision
system. For example, extracted text from video will go through
enhancement, binarization and finally recognition by an OCR
system. Ideally, the extracted text should be in one piece, but a
detection algorithm could produce several boxes (e.g. one for each
word or character) or multiple overlapping boxes, which could
increase the difficulty for the next processing step.
The measure is intended to penalize an algorithm for multiple
output boxes covering a ground-truth object. Multiple detections
include overlapping and non-overlapping boxes.
For a ground-truth object $G_i$, the fragmentation of the output boxes overlapping $G_i$ is measured by

$$\mathit{Frag}(G_i) = \begin{cases} \text{undefined} & \text{if } N_{D \cap G_i} = 0 \\[4pt] \dfrac{1}{1 + \log_{10}\!\left(N_{D \cap G_i}\right)} & \text{otherwise} \end{cases}$$

where $N_{D \cap G_i}$ is the number of output boxes in D that overlap with the ground-truth object $G_i$.
For an image, Frag is simply the average fragmentation of all
ground-truth objects in the image where Frag(Gi) is defined. This is
a particularly useful metric for face detection.
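A small sketch of this fragmentation score, assuming axis-aligned (x, y, w, h) boxes and a simple rectangle-overlap test, is shown below.

```python
import math

def average_fragmentation(gt_boxes, det_boxes):
    """Average Fragmentation: for each ground-truth box, count the
    detected boxes that overlap it and apply 1 / (1 + log10(count));
    the image score averages over ground-truth boxes for which the
    count is non-zero (the measure is undefined otherwise)."""
    def overlaps(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    scores = []
    for g in gt_boxes:
        n = sum(overlaps(g, d) for d in det_boxes)
        if n > 0:
            scores.append(1.0 / (1.0 + math.log10(n)))
    return sum(scores) / len(scores) if scores else None
```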
3. Haar face detector

Haar object detection, partly motivated by face detection, was
primarily developed with the goal of rapid object detection.
Since the Haar face detector is not the main face detector under
investigation in this project, only a brief overview of the method is
provided. For a detailed description of the technique, please refer [2].
There are three main contributions of the method. The first is the introduction of a new image representation called the Integral Image, which allows the features used by the detector to be computed very quickly. The second is a learning algorithm, based on
AdaBoost, which selects a small number of critical visual features
and yields extremely efficient classifiers. The third contribution is a
method for combining classifiers in a cascade which allows
background regions of the image to be quickly discarded while
spending more computation on promising object-like regions. [2]
Haar face detector is implemented in OpenCV. The function
cvHaarDetectObjects( ) finds rectangular regions in the given image
that are likely to contain objects the cascade has been trained for and
returns those regions as a sequence of rectangles. The function scans
the image several times at different scales. Each time it considers overlapping regions in the image and applies the classifiers to the regions. It may also apply some heuristics to reduce the number of analyzed regions, such as Canny pruning.
The default parameters are
1. scale_factor = 1.1, min_neighbors = 3, flags = 0, tuned for accurate yet slow face detection.
2. For faster face detection on real video images, the better settings are scale_factor = 1.2, min_neighbors = 2, flags = CV_HAAR_DO_CANNY_PRUNING.
However, each setting is associated with a cost. While the default settings might detect faces in most of the cases, they might also declare a face when there is none in the image. On the other hand, though the other setting might not declare a face when there is no face in the image, it might also fail to detect faces that are in the image. Figs. 2 and 3 illustrate this possibility, and the sketch below shows how the two settings are exercised.
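For reference, the two settings can be run as in the following sketch. It uses the modern Python binding (cv2.CascadeClassifier.detectMultiScale) rather than the C function cvHaarDetectObjects used in this project, and it assumes the pretrained frontal-face cascade file shipped with OpenCV; the input filename is a placeholder.

```python
import cv2

# Assumes the frontal-face cascade distributed with OpenCV is available
# at this path; adjust as needed for the local installation.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("group_photo.jpg")          # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Setting 1 (default): accurate but slower.
faces_accurate = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)

# Setting 2: faster, tuned for real video; may miss faces (cf. Fig. 2).
faces_fast = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=2)

# Draw the boxes found with the default setting.
for (x, y, w, h) in faces_accurate:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces_detected.jpg", img)
```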
(2a) Original Image
(2b) Results with setting 2
(2c) Results with setting 1 (default)
Figure 2: Results showing the better results of default setting
(3a) Original Image
(3b) Results with setting 2
(3c) Results with setting 1 (default)
Figure 3: Results showing the false positives produced by default
settings.
Though the default settings produced false positives, the fact that they detect the face accurately when present motivated their usage. The Haar face detection results shown from now on were obtained using the default settings.
4. Skin color-based face detector [3]

The skin color-based face detector proceeds in the following steps.

1. Convert the RGB image into the YCbCr color space. The reason behind this is that segmentation of skin colored regions becomes robust only if the chrominance component is used for analysis. Therefore, the luminance component is eliminated as much as possible by choosing the CbCr plane (chrominance) of the YCbCr color space to build the model.

2. Regions of interest are carefully extracted from the images as training pixels. Regions containing human skin pixels as well as non-skin pixels are collected. The mean and covariance of the database characterize the model. It is modeled as a unimodal Gaussian. The mean and covariance are estimated using the EM algorithm. (The EM algorithm was implemented as part of another project by me in 2003. The same code was used.)
It can be seen in Figure 4 that the color of human skin pixels is confined to a very small region in the chrominance space, which is distinct from the non-skin region.
Figure 4: CbCr plane of skin and non-skin regions
Let $c = [C_b\ C_r]^T$ be the chrominance vector of an input pixel. The probability that the given pixel lies in the skin distribution is given by

$$p(c/\mathit{skin}) = \frac{1}{2\pi\,|\Sigma_s|^{1/2}} \exp\!\left(-\frac{1}{2}(c-\mu_s)^T \Sigma_s^{-1} (c-\mu_s)\right)$$

where $\mu_s$ and $\Sigma_s$ represent the mean vector and the covariance matrix, respectively, of the training pixels. This gives the probability of a pixel's chrominance occurring given that it is a skin pixel. Similarly, we calculate $p(c/\mathit{non\text{-}skin})$.

The posterior probability that a pixel represents skin given its chrominance vector c, $p(\mathit{skin}/c)$, is evaluated using Bayes theorem:

$$p(\mathit{skin}/c) = \frac{p(c/\mathit{skin})}{p(c/\mathit{skin}) + p(c/\mathit{non\text{-}skin})}$$
An input image is analyzed pixel by pixel, evaluating this probability at each pixel. This results in a gray level image where the gray value gives the probability of the pixel representing skin. This image is thresholded to obtain a binary image. A correct choice of threshold is critical. Increasing the threshold will increase the chances of losing, during thresholding, certain skin regions exposed to adverse lighting conditions. On the other hand, the extra regions that are retained in the image because of a lower threshold can be removed using connected component operators.
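A minimal Python sketch of steps 1 and 2 plus the thresholding is shown below. It assumes the skin and non-skin training pixels are supplied as N x 2 arrays of (Cb, Cr) values, and it estimates the Gaussian parameters directly from the samples rather than with the EM routine used in the project.

```python
import cv2
import numpy as np

def fit_gaussian(cbcr_samples):
    """Mean vector and covariance matrix of 2-D CbCr training pixels
    (a direct estimate; the report uses an EM routine for this step)."""
    return cbcr_samples.mean(axis=0), np.cov(cbcr_samples, rowvar=False)

def gaussian_pdf(cbcr, mean, cov):
    """Unimodal Gaussian density evaluated at every pixel's CbCr vector."""
    inv, det = np.linalg.inv(cov), np.linalg.det(cov)
    diff = cbcr - mean
    expo = -0.5 * np.einsum("...i,ij,...j->...", diff, inv, diff)
    return np.exp(expo) / (2.0 * np.pi * np.sqrt(det))

def skin_probability_map(bgr_image, skin_samples, nonskin_samples):
    """Posterior p(skin | c) per pixel, computed on the CbCr plane only."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb).astype(np.float64)
    cbcr = ycrcb[:, :, [2, 1]]          # (Cb, Cr); luminance Y is dropped
    p_skin = gaussian_pdf(cbcr, *fit_gaussian(skin_samples))
    p_nonskin = gaussian_pdf(cbcr, *fit_gaussian(nonskin_samples))
    return p_skin / (p_skin + p_nonskin + 1e-12)

# Threshold the probability map, e.g. at 0.1 as in the settings below:
# binary = skin_probability_map(img, skin_px, nonskin_px) > 0.1
```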
3. The resulting image after stage 2 contains a lot of noise, so the image is opened using a disk-shaped structuring element. The effect of using area open is the removal of small, bright regions in the thresholded image. The size of the structuring element should not be more than that of the smallest face the system is designed to detect.
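Step 3 can be sketched with OpenCV's morphology routines as follows; the 17 x 17 disk matches the structuring-element size reported later for an 816 x 616 image, and the function itself is only an illustration.

```python
import cv2
import numpy as np

def open_skin_mask(binary, se_size=17):
    """Morphological opening (erosion then dilation) with a disk-shaped
    structuring element: removes small, bright noise regions from the
    thresholded skin map while keeping components larger than the SE."""
    mask = binary.astype(np.uint8) * 255
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (se_size, se_size))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, se) > 0
```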
A set of shape-based connected operators is applied over these remaining components to decide whether they represent a face or
not. These operators make use of basic assumptions about the shape
of the face.
4. Compactness: It is defined as the ratio of a component's area to the square of its perimeter:

$$\mathit{Compactness} = \frac{A}{P^2}$$

This criterion is maximized for circular objects, and the face component exhibits a high value for this operator. If a particular component shows a compactness value greater than a threshold, it is retained for further analysis; otherwise it is discarded.

5. Solidity: For a connected component, solidity is defined as the ratio of its area to the area of the rectangular bounding box:

$$\mathit{Solidity} = \frac{A}{D_x D_y}$$

It gives a measure of the area occupancy of a connected component within its min-max box dimensions. The solidity assumes a high value for face components. If a particular component shows a solidity value greater than a threshold, it is retained; otherwise it is discarded.

6. Aspect ratio: It is assumed that face components normally have an aspect ratio well within a certain range:

$$\mathit{Aspect\ Ratio} = \frac{D_y}{D_x}$$

If a component's aspect ratio falls out of this range, the component is eliminated.

7. Normalization: The remaining unwanted components are removed using the Normalized Area, the ratio of the area of a connected component to that of the largest component present in the image. Connected components whose normalized area is less than a threshold are eliminated.
The connected components that remain at this stage contain faces.
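The shape filters in steps 4-7 could be combined as in the following sketch; the threshold defaults are the values listed below for the Figure 5 example, and the function name and box representation are assumptions made here.

```python
import cv2
import numpy as np

def filter_face_components(opened, compactness_min=0.025, solidity_min=0.5,
                           aspect_range=(0.9, 2.1), norm_area_min=0.35):
    """Apply the shape-based filters (compactness, solidity, aspect ratio,
    normalized area) to the connected components of the opened skin mask
    and return the bounding boxes that survive."""
    mask = opened.astype(np.uint8)
    res = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = res[-2]          # works for both OpenCV 3.x and 4.x returns
    candidates = []
    for c in contours:
        area = cv2.contourArea(c)
        perimeter = cv2.arcLength(c, True)
        x, y, w, h = cv2.boundingRect(c)
        if area == 0 or perimeter == 0 or w == 0:
            continue
        compactness = area / (perimeter ** 2)        # A / P^2
        solidity = area / float(w * h)               # A / (Dx * Dy)
        aspect = h / float(w)                        # Dy / Dx
        if (compactness > compactness_min and solidity > solidity_min
                and aspect_range[0] <= aspect <= aspect_range[1]):
            candidates.append(((x, y, w, h), area))
    if not candidates:
        return []
    largest = max(a for _, a in candidates)
    # Normalized area: drop components small relative to the largest one.
    return [box for box, a in candidates if a / largest >= norm_area_min]
```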
Figure 5 walks through all the steps for the image shown in Figure
5a.
(5a) Original Image
(5b) Binary image showing probable skin areas
(5c) After erosion with a disk structuring element
(5d) After dilation with the same structuring element
(5e) After compactness thresholding
(5f) After solidity thresholding
(5g) After aspect ratio thresholding
(5h) After normalized area thresholding
(5i) Final result
Figure 5: Various stages in face detection using skin-color and
connected component operators.
The parameter setting that was used for the above image is:

Probability threshold: 0.1
Size of SE (for opening): 17 x 17 pixels, disk shaped (image size 816 x 616 pixels)
Compactness threshold: 0.025
Solidity threshold: 0.5
Aspect ratio: 0.9 - 2.1
Normalization threshold: 0.35

It is worth mentioning here how the parameter settings were finalized. They are derived from the scatter plots of the Recall and Precision measures. Again, this is one of the primary uses of empirical evaluation. This is explained in detail in a later section.
5. Ground truth

The following guidelines were used while ground-truthing the images for evaluation.

A face is bounded by a rectangle, where the area includes eyebrows, eyes, nose, mouth and chin.
There should be a small but clear space between these facial features and the bounding box.
The ears and top hair are not included in the face.

For clear visualization, the ground truth images are shown in Section 7 along with the results of each of the methods on the same images.

One of the major issues with evaluation is the quality of the ground truth. How reliable is the ground truth? To account for this ambiguity, care was taken to make the evaluation insensitive to ground truthing errors. Measures that use area overlaps were made a little lenient in the sense that their contribution to the final score carries less weight as against measures such as fragmentation, object count accuracy, etc. The approach of weighing the different metric values will also be helpful in extending the evaluation protocol to different domains.
6. Parameter setting of the skin color-based face detector

Another important application of performance evaluation is to arrive at a global setting of the parameters that influence the performance of an algorithm. Toward this end, a set of measures that are representative of the performance of the algorithm were chosen for tracking the algorithm's performance. Area-Thresholded Recall and Precision give the overall picture of the performance. These two measures were chosen to decide the setting of parameters for the skin color-based face detector.

Also, as for the parameters, only the parameters that affected the performance most were chosen. The values of the solidity and compactness thresholds did not seem to vary the results to a considerable extent. Also, keeping the probability threshold as low as possible is always advisable, as missing the skin region is not good. So the aspect ratio and normalized area thresholds were varied and the scatter plots of ATR and ATP were plotted. The global setting of these parameters was decided at the value where the values of the measures peaked for most of the images. A sketch of this sweep-and-plot procedure is given below.
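The sweep itself can be sketched as follows. Here detect_faces is a hypothetical stand-in for the project's skin color-based detector, and area_thresholded_measures is the routine sketched in Section 2; only the sweep-and-scatter procedure is meant to be illustrative.

```python
import matplotlib.pyplot as plt

def sweep_normalized_area(images, gt_boxes_per_image, detect_faces,
                          area_thresholded_measures, thresholds):
    """For each candidate normalized-area threshold, run the detector on
    every test image and record (ATR, ATP) as points for a scatter plot."""
    points = []                      # (threshold, atr, atp) per image
    for t in thresholds:
        for img, gt in zip(images, gt_boxes_per_image):
            det = detect_faces(img, norm_area_thresh=t)
            h, w = img.shape[:2]
            atr, atp = area_thresholded_measures(gt, det, h, w,
                                                 overlap_min=0.4)
            points.append((t, atr or 0.0, atp or 0.0))
    return points

def plot_sweep(points):
    """Scatter ATR and ATP against the swept threshold (cf. Figure 6)."""
    t, atr, atp = zip(*points)
    plt.scatter(t, atr, marker="o", label="ATR")
    plt.scatter(t, atp, marker="x", label="ATP")
    plt.xlabel("Normalized Area Threshold")
    plt.ylabel("Measure value")
    plt.legend()
    plt.show()
```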
Figure 6: Scatter plot of ATR/ATP against Normalized Area
Threshold
A value of 0.35 is chosen for Normalized Area Threshold because
this is the value at which the ATR does not go very low for most of
the images, while the ATP is maintained at the best possible value.
A similar plot was made for the Aspect Ratio range against ATR/ATP. Along the same lines as deciding the value for the Normalized Area threshold, the value for the Aspect Ratio range was decided as 0.9 - 2.1. Since it has two values (a lower threshold and a higher threshold), there have to be two plots to show the effect of each. The reader might not be able to appreciate the decision through two different plots. Hence, these plots have not been shown here.
It is important to note that the performance of the algorithm might
not be the best for all images. With a different setting, the
performance might increase. This is shown in Fig. 7.
(7a) Results with global settings (one face is totally missed)
(7b) Results with image-specific settings (though one of the faces is fragmented, it is localized properly)
Figure 7: Results explaining the tradeoff between global settings and per-image settings of the parameters.
The result in Fig. 7(a) is obtained with the global settings, while the result in Fig. 7(b) is obtained with the following settings:

Probability threshold: 0.1
Size of SE (for opening): 5 x 5 pixels, disk shaped (image size 400 x 276 pixels)
Compactness threshold: 0.025
Solidity threshold: 0.5
Aspect ratio: 0.3 - 2.1
Normalization threshold: 0.05

However, this is acceptable because one cannot set the parameters on a per-image basis. The whole process of face detection should be an automated process with no user intervention in the parameter settings.
7. Evaluation Results

Based on the performance metric values (refer to the last page for the evaluation results, Fig. 9), one can conclude the following:

1. The Haar face detector is more robust in detecting a face in an image. This is apparent from the fact that the Area Thresholded Recall of the Haar face detector is always higher than or equal to that of the skin color-based face detector.

2. Both the algorithms produce false positives. However, based on the values of Precision, we can infer that the Haar face detector often produces fewer false positives than the skin color-based face detector.

3. On this dataset, both the algorithms, when they detect a face, detect it in whole. This can be seen from the average fragmentation measure for the test images. Again, this is specific to the test dataset used. The skin color-based face detector is expected to be prone to errors in terms of fragmentation. In fact, when tried on a different image from a database from which no images were used in training, the results of face detection showed fragmentation.
Figure 8: Results of skin color-based face detector. The detected face is fragmented. Also, there are false positives.

This image was not included in the test set because color is sensitive to the camera used. Since this image was taken from a different data set on which the classifier was not trained, it wouldn't be genuine to test the algorithm on this data. Again, this is one of the major drawbacks of the skin color-based face detector: it has to be trained with skin and non-skin pixels from images taken from a camera whose images will be present in the test set. The Haar face detector is not limited by any such constraint. From these, we can declare that the performance of the Haar face detector is better than that of the skin color-based face detector in all the aspects.

8. Future Work

Effort has to be directed toward making the performance evaluation insensitive to ground truthing errors. This is an extremely difficult task. Measures such as Area Thresholded Recall and Precision are efforts in this direction. However, there is still scope for improvement, and this aspect has to be explored.

Another point is the fact that there are probably too many measures. Considering the fact that they cover different aspects of performance, this is acceptable. However, there have to be measures that comprehensively cover all aspects of an algorithm. To this end, we have developed a comprehensive measure that accounts for fragmentation (splits), merges, area overlap and false positives.

This measure is mainly intended to comprehensively cover many aspects in one measure. However, it requires a one-to-one mapping of ground truth and detected objects. It is an area-based measure, which penalizes false detections, missed detections and spatial fragmentation. For a single image, given that there are $N_G$ ground-truth objects and $N_D$ detected objects in the image, we define CAM, the detection composite measure, as

$$\mathit{CAM} = \frac{2 \cdot \mathit{Overlap\ Ratio}}{N_G + N_D}$$

where

$$\mathit{Overlap\ Ratio} = \sum_{i=1}^{\min(N_G, N_D)} \frac{|G_i \cap D_i|}{|G_i \cup D_i|}$$

Here, $\min(N_G, N_D)$ indicates the maximum number of one-to-one mappings between ground truth objects and detected boxes. However, work has to be done in checking the failure cases of the measure at boundary conditions. Initial results have been promising in that it successfully captures the aspects stated above.

9. Conclusion

Two face detection algorithms, one based on Haar-like features and the other based on skin color, have been implemented. Both methods have been empirically evaluated and their performance quantified. Based on the results, we can declare that the Haar face detector outperforms the skin color-based face detector in almost all the aspects of evaluation.

Efforts on improving the performance of the skin color-based method have proven to be futile. This probably is due to the inability of the method to handle challenging situations. Even a cursory investigation of the method reveals the fact that color is not a good feature to rely on. It can vary due to different lighting, cameras and other factors such as shadows. Since the evaluation is not subjective and the performance has been quantified, there is no ambiguity in this conclusion.

Finally, the essence of evaluation is to improve the performance of the algorithm. Here we have noticed that the skin color-based face detector does not perform as well as the Haar face detector. In fact, even tweaking the parameters does not yield the best results. This shows that the method based on skin color is not robust.

References

[1] Kasturi R., Goldgof D., Soundararajan P., Manohar V., Performance Evaluation Protocol for Text and Face Detection & Tracking in Video Analysis and Content Extraction (VACE-II), Report submitted to the Advanced Research and Development Activity, March 2004.
[2] Viola P. and Jones M. J., Robust real-time object detection. In Proc. IEEE Workshop on Statistical and Computational Theories of Vision, 2001.
[3] Kuchi P., Gabbur P., Bhat S., David S., Human Face Detection and Tracking using Skin Color Modeling and Connected Component Operators, IETE Journal of Research, Special issue on Visual Media Processing, May 2002. http://www.public.asu.edu/~pkuchi/IETE_2002.pdf
Evaluation Results

9.1: OCA 1; PBR 1; PBP .98; ATR 1; ATP 1; ABR .98; ABP .96; AF 1
9.2: OCA 1; PBR .97; PBP .81; ATR 1; ATP 1; ABR .9; ABP .73; AF 1
9.3: OCA .8; PBR .8; PBP .87; ATR .67; ATP 1; ABR .67; ABP .82; AF 1
9.4: OCA 1; PBR .64; PBP .62; ATR .5; ATP .5; ABR .49; ABP .47; AF 1

Figure 9: Results of Evaluation
(a) Ground Truth Image; (b) Results of Haar face detection; (c) Results of skin color-based face detection
A-1: Haar face detector; A-2: Skin color-based face detector
OCA: Object Count Accuracy; PBR: Pixel Based Recall; PBP: Pixel Based Precision; ATR: Area Thresholded Recall; ATP: Area Thresholded Precision; ABR: Area Based Recall; ABP: Area Based Precision; AF: Average Fragmentation
OVERLAP_MIN was kept at 40% for all the ATR/ATP measurements.