Face Detection Evaluation


    CAP 6415 Computer Vision Project Report Version 2

    Face Detection Using Skin Color and Haar-like features

    Implementation and Evaluation

By Vasant Manohar

    Department of Computer Science and Engineering

    University of South Florida

    Email: [email protected]

Abstract

From a research point of view, well-established problems need standard databases, evaluation protocols, and scoring methods. Evaluating algorithms lets researchers know the strengths and weaknesses of a particular approach and identifies aspects of a problem where further research is needed. In this report, two face detection algorithms (one based on Haar-like features and the other based on skin color modeling) have been implemented. They have been empirically evaluated, and the strengths and weaknesses of each method are identified. Also, by observing how the measure values change with the parameters, a global parameter setting for the skin color-based face detector that is optimal for most of the images is identified. Since the performance has been quantified, a conclusion as to which algorithm outperforms the other is made. Experimental results of both face detection algorithms and of the empirical evaluation are presented.

1. Introduction and Overview

Two face detection algorithms are implemented as part of this project: one based on skin color and connected component operators, and the other based on Haar-like features.

Both methods were implemented using the Open Computer Vision Library (OpenCV). For Haar face detection, OpenCV's functions were used to implement the detector, and the classifier was not trained on any training data; the Haar face detector had already been trained on face data, and the learnt classifier parameters already existed in the library.

However, for the other method, based on skin color, images from the dataset were used to train the classifier. In fact, the method can be called semi-supervised, because sample pixels of the skin are input to the classifier before it detects the face in the image. This is not mandatory, but it was done in this project because acquiring skin color samples from huge databases seemed difficult. Hence, it was checked whether, given samples of the skin pixels that appear in the image, the classifier would be able to segment the face effectively. It should be noted that the number of skin pixels extracted from each image was kept minimal. The emphasis was on making the training set as small as possible while building a classifier with acceptable accuracy.

Both face detectors detect only frontal faces. Profile faces, partially occluded faces, and heads are not detected.

It is worth mentioning that the emphasis of this project was not on developing robust face detection algorithms. Rather, effort was directed toward developing a framework for the empirical evaluation of algorithms for face detection (in fact, object detection, to make it generic). By way of performance evaluation, the aspects of each algorithm that need improvement were identified.

Using the proposed measures, we can do the following:

1. Quantitatively measure the performance.
2. Compare the performance of an algorithm on different kinds of data.
3. Quantitatively compare different detection algorithms.
4. Measure any performance improvement in the course of an algorithm's development.
5. Determine trade-offs between performance aspects.
6. Infer, from performance plots, parameter settings that are optimal for the majority of images.

The report is organized into the following sections. Section 2 introduces the performance evaluation measures on which the algorithms will be evaluated. Section 3 briefly explains the Haar face detector and discusses its results. Section 4 discusses the skin color-based face detector and details its results. Section 5 shows the ground truth; it also discusses ground-truthing issues and how performance evaluation can be made independent of ground-truth errors. Section 6 explains how the measures were used to arrive at a global setting for the parameters of the skin color-based face detector. The details of the evaluation results of the two algorithms and the inferences that can be drawn from them are explained in Section 7. Section 8 describes the future scope of work on the project. The conclusions drawn from the work are presented in Section 9.

2. Performance Evaluation [1]

This section details the measures used to quantify the different aspects of an algorithm's performance. The strengths and weaknesses of each of these measures are described. The value of each measure is between zero (worst) and one (best).

Fig. 1 introduces the concept of recall and precision applied to the detection measures. There can be two forms of false alarms. The first results from the non-overlapping region in the detected area and is called a false positive (FP); in other words, this area is classified as an object but the ground truth is absent. The other form of false alarm results from the missed region in the ground truth area, and this is called a false negative (FN). Precision gives an idea of how well the detected area matches the ground truth area. Recall, on the other hand, gives an idea of how well the ground truth overlaps with the detected area. All the measures described have recall and precision counterparts so that both FP and FN errors are accounted for.

    Figure 1: Recall and Precision Concept

The measures are organized in growing level of complexity and accuracy. The first measure, the Object Count Accuracy (Sec 2.1.1), is a trivial measure: it simply counts the number of detected objects with respect to the ground truth objects without checking how accurately they overlap with each other. Next, the pixel-based measures, which check for the raw pixel overlaps between the object and ground truth boxes, are defined in Sec 2.1.2 and 2.1.3. Here the entire frame is considered as a bit-map, without any distinction made between the different objects. If a detected box overlapped another detected box, these measures would not make any distinction, as they consider the union of the areas. Here, bigger boxes have an advantage over smaller boxes. The measures discussed in Sec 2.1.4 and 2.1.5 are area-thresholded measures: if the overlap between the ground truth and detected box is greater than a threshold, full credit is given for the particular box pair. Next, the area-based measures are discussed in Sec 2.1.6 and 2.1.7, where the measures treat the individual boxes equally regardless of size, in contrast to the pixel-based measures, which treat bigger boxes differently than smaller boxes. The area-based measures take the individual objects into account, as opposed to making no such distinction in the case of the pixel-based measures. In Sec 2.1.8, the fragmentation measure is discussed. This measure penalizes algorithms that break an individual ground truth box into multiple detected boxes.

We also propose a set of measures based on the requirement that there is a one-to-one mapping between each ground truth box and detected box. We measure the positional accuracy of the detection output against the ground truth in Sec 2.2.1. A size-based measure is discussed in Sec 2.2.2, while Sec 2.2.3 discusses an orientation-based measure. Finally, we propose a composite measure in Sec 2.2.4, which is area-based and takes into account recall, precision, and fragmentation.

2.1 Measures Independent of Ground Truth and Detected Box Matching

The measures proposed in this section are independent of any matching between ground truth and detected boxes. This is because, whenever overlaps are calculated, the spatial union of boxes is considered, which makes sure that overlapped areas are not counted twice.

2.1.1 Object Count Accuracy

This measure compares the number of ground-truth objects in a frame with the number of algorithm output boxes. It penalizes the algorithm both for extra and for fewer boxes than the ground truth. Let G be the set of ground truth objects in the image and let D be the set of output boxes produced by the algorithm. The Accuracy is defined as

$$\text{Accuracy} = \begin{cases} \text{undefined} & \text{if } N_G + N_D = 0 \\[4pt] \dfrac{\min(N_G, N_D)}{(N_G + N_D)/2} & \text{otherwise} \end{cases}$$

where $N_G$ and $N_D$ are the number of ground-truth objects and output boxes, respectively, in the image.

The measure does not consider the spatial information of the boxes; only the count of boxes in each frame is considered. The measure could be useful in evaluating how well algorithms identify the correct number of objects in a given image, irrespective of how close the boxes are to the ground truth objects. Consider a scenario in which there are 10 ground truth objects, algorithm A finds 8 boxes (say), and algorithm B finds 2 boxes; then A is obviously better than B as far as identifying the number of objects in the image is concerned. To measure the accuracy in terms of overlap with respect to area, there are other measures.
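
As an illustration, a minimal sketch of this count-based accuracy (Python; None stands in for the undefined case):

```python
def object_count_accuracy(n_gt: int, n_det: int):
    """Object Count Accuracy from the definition above."""
    if n_gt + n_det == 0:
        return None  # undefined when both G and D are empty
    return min(n_gt, n_det) / ((n_gt + n_det) / 2)
```

For the scenario above, object_count_accuracy(10, 8) gives about 0.89 while object_count_accuracy(10, 2) gives about 0.33, ranking algorithm A above B.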

2.1.2 Pixel-based Recall

This measure captures how well the algorithm minimizes false negatives. It is a pixel-count-based measure.

Let $\text{Union}_G$ and $\text{Union}_D$ be the spatial unions of the boxes in G and D:

$$\text{Union}_G = \bigcup_{i=1}^{N_G} G_i \qquad \text{Union}_D = \bigcup_{i=1}^{N_D} D_i$$

where $G_i$ represents the $i$th ground truth object and $D_i$ the $i$th detected object in the image.

We define Recall as the ratio of the detected area within the ground truth to the total ground truth:

$$\text{Recall} = \begin{cases} \text{undefined} & \text{if } \text{Union}_G = \emptyset \\[4pt] \dfrac{|\text{Union}_G \cap \text{Union}_D|}{|\text{Union}_G|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area.

This measure treats the frame not as a collection of objects but as a binary pixel map (object/non-object; output-covered/not-output-covered). So the score increases as the overlap increases and will be 1 for complete overlap.


2.1.3 Pixel-based Precision

This measure captures how well the algorithm minimizes false positives. It is a pixel-count-based measure.

With $\text{Union}_G$ and $\text{Union}_D$ as defined above, we define Precision as the ratio of the detected area within the ground truth to the total detection:

$$\text{Precision} = \begin{cases} \text{undefined} & \text{if } \text{Union}_D = \emptyset \\[4pt] \dfrac{|\text{Union}_D \cap \text{Union}_G|}{|\text{Union}_D|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area. This measure treats the frame not as a collection of objects but as a binary pixel map (object/non-object; output-covered/not-output-covered). So the score increases as the false positives reduce and will be 1 for no false positives.
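
A minimal sketch of these two pixel-based measures, assuming boxes are given as (x, y, w, h) tuples and rasterized onto a boolean mask so the union counts each pixel once:

```python
import numpy as np

def rasterize(boxes, shape):
    """Paint (x, y, w, h) boxes into one boolean mask; because the mask
    is a union, pixels covered by several boxes are counted only once."""
    mask = np.zeros(shape, dtype=bool)
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = True
    return mask

def pixel_recall_precision(gt_boxes, det_boxes, shape):
    """Pixel-based Recall and Precision; None marks the undefined cases."""
    union_g = rasterize(gt_boxes, shape)
    union_d = rasterize(det_boxes, shape)
    inter = np.logical_and(union_g, union_d).sum()
    recall = inter / union_g.sum() if union_g.any() else None
    precision = inter / union_d.sum() if union_d.any() else None
    return recall, precision
```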

    2.1.4 Area-thresholded Recall

    In this measure, a ground-truth object is considered detected if the

    output boxes cover a minimum proportion of its area. Recall is

    computed as the ratio of the number of detected objects to the total

    number of ground-truth objects.

Area-thresholded Recall is computed from the number of detected objects in an image:

$$\text{Area-thresholded Recall} = \dfrac{\sum_{i=1}^{N_G} \text{ObjectDetect}(G_i)}{N_G}$$

where

$$\text{ObjectDetect}(G_i) = \begin{cases} 1 & \text{if } \dfrac{|G_i \cap \text{Union}_D|}{|G_i|} > \text{OVERLAP\_MIN} \\[4pt] 0 & \text{otherwise} \end{cases}$$

Here OVERLAP_MIN is the minimum proportion of a ground-truth object's area that should be overlapped by the output boxes in order to say that it is correctly detected by the algorithm.

    Here again, the ground-truth objects are treated equally regardless

    of size.

2.1.5 Area-thresholded Precision

This is the counterpart of the measure in Sec 2.1.4. The measure counts the number of output boxes that significantly cover the ground truth. An output box $D_i$ significantly covers the ground truth if a minimum proportion of its area overlaps with $\text{Union}_G$.

Area-thresholded Precision is computed from the number of output boxes that significantly overlap with the ground-truth objects:

$$\text{Area-thresholded Precision} = \dfrac{\sum_{i=1}^{N_D} \text{BoxPrecision}(D_i)}{N_D}$$

where

$$\text{BoxPrecision}(D_i) = \begin{cases} 1 & \text{if } \dfrac{|D_i \cap \text{Union}_G|}{|D_i|} > \text{OVERLAP\_MIN} \\[4pt] 0 & \text{otherwise} \end{cases}$$

Here OVERLAP_MIN is the minimum proportion of an output box's area that should be overlapped by the ground truth in order to say that the output box is precise.

Again, in this measure the output boxes are treated equally regardless of size, similar to the previous measure.
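
A sketch of both area-thresholded measures, reusing rasterize() from the pixel-based sketch above; OVERLAP_MIN defaults to the 40% used in the evaluation of Sec 7:

```python
def covered_fraction(box, union_mask):
    """Fraction of a box's pixels covered by a union mask; box = (x, y, w, h)."""
    x, y, w, h = box
    return union_mask[y:y + h, x:x + w].sum() / (w * h)

def area_thresholded_scores(gt_boxes, det_boxes, shape, overlap_min=0.4):
    """ATR and ATP; returns None where a measure is undefined (no boxes)."""
    union_g, union_d = rasterize(gt_boxes, shape), rasterize(det_boxes, shape)
    atr = (sum(covered_fraction(g, union_d) > overlap_min for g in gt_boxes)
           / len(gt_boxes)) if gt_boxes else None
    atp = (sum(covered_fraction(d, union_g) > overlap_min for d in det_boxes)
           / len(det_boxes)) if det_boxes else None
    return atr, atp
```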

2.1.6 Area-based Recall

This measure is intended to capture the average area recall over the ground-truth objects in the image. The recall for an object is the proportion of its area that is covered by the algorithm's output boxes. The objects are treated equally regardless of size.

We define Recall as the average recall over all the objects in the ground truth G:

$$\text{Recall} = \dfrac{\sum_{i=1}^{N_G} \text{ObjectRecall}(G_i)}{N_G}$$

where

$$\text{ObjectRecall}(G_i) = \begin{cases} \text{undefined} & \text{if } |G_i| = 0 \\[4pt] \dfrac{|G_i \cap \text{Union}_D|}{|G_i|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area.

All the ground-truth objects contribute equally to the measure regardless of their size. At one extreme, if an image contains two objects, a large object that was completely detected and a very small object that was missed, then Recall will be 50%.

2.1.7 Area-based Precision

This is the counterpart of the previous measure (Sec 2.1.6), where the output boxes are examined instead of the ground-truth objects. Precision is computed for each output box and averaged over the whole image. The precision of a box is the proportion of its area that covers the ground truth objects.

We define Precision as the average precision over the algorithm's output boxes D:

$$\text{Precision} = \dfrac{\sum_{i=1}^{N_D} \text{BoxPrecision}(D_i)}{N_D}$$

where

$$\text{BoxPrecision}(D_i) = \begin{cases} \text{undefined} & \text{if } |D_i| = 0 \\[4pt] \dfrac{|D_i \cap \text{Union}_G|}{|D_i|} & \text{otherwise} \end{cases}$$

where the $|\cdot|$ operator denotes the number of pixels in the given area.


    In this measure the output boxes are treated equally regardless of

    size.
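
The area-based counterparts differ from the area-thresholded sketch above only in averaging the per-box fractions instead of thresholding them (again reusing rasterize() and covered_fraction() from the earlier sketches):

```python
def area_based_scores(gt_boxes, det_boxes, shape):
    """ABR and ABP: average per-box covered fractions, so every box
    contributes equally regardless of its size."""
    union_g, union_d = rasterize(gt_boxes, shape), rasterize(det_boxes, shape)
    abr = (sum(covered_fraction(g, union_d) for g in gt_boxes)
           / len(gt_boxes)) if gt_boxes else None
    abp = (sum(covered_fraction(d, union_g) for d in det_boxes)
           / len(det_boxes)) if det_boxes else None
    return abr, abp
```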

2.1.8 Average Fragmentation

Detection of objects is usually not the final step in a vision system. For example, text extracted from video will go through enhancement, binarization, and finally recognition by an OCR system. Ideally, the extracted text should be in one piece, but a detection algorithm could produce several boxes (e.g., one for each word or character) or multiple overlapping boxes, which could increase the difficulty of the next processing step.

This measure is intended to penalize an algorithm for producing multiple output boxes that cover a single ground-truth object. Multiple detections include overlapping and non-overlapping boxes.

For a ground-truth object $G_i$, the fragmentation of the output boxes overlapping $G_i$ is measured by:

$$\text{Frag}(G_i) = \begin{cases} \text{undefined} & \text{if } N_{D \cap G_i} = 0 \\[4pt] \dfrac{1}{1 + \log_{10}(N_{D \cap G_i})} & \text{otherwise} \end{cases}$$

where $N_{D \cap G_i}$ is the number of output boxes in D that overlap the ground-truth object $G_i$.

    For an image, Frag is simply the average fragmentation of all

    ground-truth objects in the image where Frag(Gi) is defined. This is

    a particularly useful metric for face detection.
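
A sketch of the fragmentation measure, counting for each ground-truth box the detected rectangles that overlap it:

```python
import math

def average_fragmentation(gt_boxes, det_boxes):
    """Average Frag over ground-truth boxes where Frag is defined
    (i.e., at least one overlapping detection); boxes are (x, y, w, h)."""
    def overlaps(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    scores = []
    for g in gt_boxes:
        n = sum(overlaps(g, d) for d in det_boxes)
        if n > 0:  # Frag(G_i) is undefined when nothing overlaps G_i
            scores.append(1.0 / (1.0 + math.log10(n)))
    return sum(scores) / len(scores) if scores else None
```

Note that a single covering box yields a perfect score of 1, since log10(1) = 0.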

3. Haar face detector

Haar object detection, partly motivated by face detection, was primarily developed with the goal of rapid object detection. Since the Haar face detector is not the main face detector under investigation in this project, only a brief overview of the method is provided; for a detailed description of the technique, please refer to [2].

There are three main contributions of the method. The first is the introduction of a new image representation called the Integral Image, which allows the features used by the detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers. The third contribution is a method for combining classifiers in a cascade, which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions [2].

The Haar face detector is implemented in OpenCV. The function cvHaarDetectObjects() finds rectangular regions in the given image that are likely to contain objects the cascade has been trained for, and returns those regions as a sequence of rectangles. The function scans the image several times at different scales. Each time, it considers overlapping regions in the image and applies the classifiers to the regions. It may also apply some heuristics to reduce the number of analyzed regions, such as Canny pruning.

The default parameters are:

1. scale_factor = 1.1, min_neighbors = 3, flags = 0, tuned for accurate yet slow face detection.
2. For faster face detection on real video images, the better settings are scale_factor = 1.2, min_neighbors = 2, flags = CV_HAAR_DO_CANNY_PRUNING.

A minimal usage sketch with these settings is shown below.
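
cvHaarDetectObjects() belongs to OpenCV's legacy C API. As an illustration only, the following sketch performs the equivalent detection through OpenCV's modern Python binding; the input filename is a placeholder, and the pretrained cascade file ships with the opencv-python package:

```python
import cv2

# Pretrained frontal-face cascade bundled with opencv-python;
# cv2.data.haarcascades points at its install location.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("input.jpg")                 # placeholder input file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor/minNeighbors correspond to scale_factor/min_neighbors above:
# setting 1 (accurate, slower) is 1.1/3; setting 2 (faster) is 1.2/2.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)

for x, y, w, h in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```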

However, each setting is associated with a cost. While the default settings might detect faces in most cases, they might also declare a face when there is none in the image. On the other hand, though the other setting might not declare a face when there is no face in the image, it might also fail to detect faces that are present. Figs. 2 and 3 illustrate this possibility.

    (2a) Original Image

    (2b) Results with setting 2

    (2c) Results with setting 1 (default)

Figure 2: Results showing the better detections produced by the default settings.


    (3a) Original Image

    (3b) Results with setting 2

    (3c) Results with setting 1 (default)

    Figure 3: Results showing the false positives produced by default

    settings.

Though the default settings produced false positives, the fact that they detect the face accurately when one is present motivated their use. The Haar face detection results shown from now on were obtained using the default settings.

4. Skin color-based face detector [3]

The skin color-based face detector proceeds in the following steps.

1. Convert the RGB image into the YCbCr color space. The reason behind this is that segmentation of skin-colored regions becomes robust only if the chrominance component is used in the analysis. Therefore, the luminance component is eliminated as much as possible by choosing the CbCr plane (chrominance) of the YCbCr color space to build the model.

2. Regions of interest are carefully extracted from the images as training pixels. Regions containing human skin pixels as well as non-skin pixels are collected. The mean and covariance of the database characterize the model, which is a unimodal Gaussian. The mean and covariance are estimated using the EM algorithm. (The EM algorithm was implemented as part of another project by me in 2003; the same code was used.)

It can be seen in Figure 4 that the color of human skin pixels is confined to a very small region in the chrominance space, which is distinct from the non-skin region.

    Figure 4: CbCr plane of skin and non-skin regions

Let $c = [C_b\ C_r]^T$ be the chrominance vector of an input pixel. The probability that the given pixel lies in the skin distribution is given by

$$p(c \mid \text{skin}) = \dfrac{1}{2\pi\,|\Sigma_s|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(c - \mu_s)^T \Sigma_s^{-1} (c - \mu_s)\right)$$

where $\mu_s$ and $\Sigma_s$ represent the mean vector and the covariance matrix, respectively, of the training skin pixels. This gives the probability of a pixel occurring given that it is a skin pixel. Similarly, we calculate $p(c \mid \text{non-skin})$.

The posterior probability that a pixel represents skin given its chrominance vector c, $p(\text{skin} \mid c)$, is evaluated using Bayes' theorem (with equal priors):

$$p(\text{skin} \mid c) = \dfrac{p(c \mid \text{skin})}{p(c \mid \text{skin}) + p(c \mid \text{non-skin})}$$

An input image is analyzed pixel by pixel, evaluating this probability at each pixel. This results in a gray-level image where the gray value gives the probability of the pixel representing skin. This image is thresholded to obtain a binary image. A correct choice of threshold is critical: increasing the threshold increases the chances of losing skin regions exposed to adverse lighting conditions during thresholding, while the extra regions that are retained in the image because of a lower threshold can be removed using connected component operators.
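
A sketch of this per-pixel classification, assuming the means and covariances (mu_s, cov_s, mu_n, cov_n) have already been estimated from the training pixels as described in step 2:

```python
import cv2
import numpy as np

def skin_posterior(img_bgr, mu_s, cov_s, mu_n, cov_n):
    """Per-pixel p(skin | c) from two unimodal Gaussians on (Cb, Cr)."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    # OpenCV orders channels Y, Cr, Cb; build c = [Cb, Cr] per pixel.
    c = ycrcb[..., [2, 1]].reshape(-1, 2).astype(np.float64)

    def gaussian(c, mu, cov):
        d = c - mu
        inv, det = np.linalg.inv(cov), np.linalg.det(cov)
        expo = -0.5 * np.einsum("ij,jk,ik->i", d, inv, d)  # d^T cov^-1 d
        return np.exp(expo) / (2 * np.pi * np.sqrt(det))

    p_skin = gaussian(c, mu_s, cov_s)
    p_non = gaussian(c, mu_n, cov_n)
    post = p_skin / (p_skin + p_non + 1e-12)  # Bayes, equal priors
    return post.reshape(img_bgr.shape[:2])

# mask = skin_posterior(img, mu_s, cov_s, mu_n, cov_n) > 0.1  # report's threshold
```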

3. The resulting image after stage 2 contains a lot of noise, so the image is opened using a disk-shaped structuring element. The effect of the area opening is the removal of small, bright regions in the thresholded image. The size of the structuring element should not be larger than the smallest face the system is designed to detect. A set of shape-based connected operators is then applied to the remaining components to decide whether they represent a face or not. These operators make use of basic assumptions about the shape of the face.

4. Compactness. This is defined as the ratio of a component's area to the square of its perimeter:

$$\text{Compactness} = \dfrac{A}{P^2}$$

This criterion is maximized for circular objects, and face components exhibit a relatively high value. If a particular component shows a compactness value greater than a threshold, it is retained for further analysis; otherwise it is discarded.

5. Solidity. For a connected component, solidity is defined as the ratio of its area to the area of its rectangular bounding box:

$$\text{Solidity} = \dfrac{A}{D_x D_y}$$

It gives a measure of the area occupancy of a connected component within its min-max box dimensions. Solidity assumes a high value for face components. If a particular component shows a solidity value greater than a threshold, it is retained; otherwise it is discarded.

6. Aspect ratio. It is assumed that face components normally have an aspect ratio well within a certain range. If a component's aspect ratio falls outside this range, the component is eliminated:

$$\text{Aspect Ratio} = \dfrac{D_y}{D_x}$$

7. Normalization. The remaining unwanted components are removed using the normalized area, the ratio of the area of a connected component to that of the largest component present in the image. Connected components whose normalized area is less than a threshold are eliminated.

    The connected components that remain at this stage contain faces.
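
A sketch of steps 3-7 on the thresholded binary mask, using the global threshold values listed after Figure 5; the height/width orientation of the aspect ratio is an assumption consistent with the 0.9-2.1 range:

```python
import cv2
import numpy as np

def filter_components(binary, se_size=17):
    """Opening (erosion + dilation with one disk-shaped SE), then the
    shape tests of steps 4-7; returns bounding boxes of survivors."""
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (se_size, se_size))
    opened = cv2.morphologyEx(binary.astype(np.uint8), cv2.MORPH_OPEN, se)

    contours, _ = cv2.findContours(opened, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    comps = []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        perim = cv2.arcLength(cnt, True)
        x, y, w, h = cv2.boundingRect(cnt)
        comps.append((area, perim, x, y, w, h))

    largest = max((c[0] for c in comps), default=0)
    faces = []
    for area, perim, x, y, w, h in comps:
        if perim == 0 or w == 0 or h == 0 or largest == 0:
            continue
        compactness = area / perim ** 2   # step 4: keep if > 0.025
        solidity = area / (w * h)         # step 5: keep if > 0.5
        aspect = h / w                    # step 6: keep if in [0.9, 2.1]
        norm_area = area / largest        # step 7: keep if >= 0.35
        if (compactness > 0.025 and solidity > 0.5
                and 0.9 <= aspect <= 2.1 and norm_area >= 0.35):
            faces.append((x, y, w, h))
    return faces
```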

Figure 5 walks through all the steps for the image shown in Figure 5a.

    (5a) Original Image

    (5b) Binary image showing probable skin areas

    (5c) After erosion with a disk structuring element

    (5d) After dilation with the same structuring element

    (5e) After compactness thresholding


    (5f) After solidity thresholding

    (5g) After aspect ratio thresholding

    (5h) After normalized area thresholding

    (5i) Final result

    Figure 5: Various stages in face detection using skin-color and

    connected component operators.

The parameter settings used for the above image were:

Probability threshold: 0.1
Size of SE (for opening): 17 x 17 pixels, disk-shaped (image size 816 x 616 pixels)
Compactness threshold: 0.025
Solidity threshold: 0.5
Aspect ratio: 0.9 - 2.1
Normalization threshold: 0.35

It is worth mentioning here how the parameter settings were finalized: they were derived from the scatter plots of the Recall and Precision measures. Again, this is one of the primary uses of empirical evaluation. This is explained in detail in a later section.

5. Ground truth

The following guidelines were used while ground-truthing the images for evaluation:

- A face is bounded by a rectangle, where the area includes the eyebrows, eyes, nose, mouth, and chin.
- There should be a small but clear space between these facial features and the bounding box.
- The ears and top hair are not included in the face.

For clear visualization, the ground truth images are shown in Section 7 along with the results of each of the methods on those images.

One of the major issues with evaluation is the quality of the ground truth: how reliable is it? To account for this ambiguity, care was taken to make the evaluation insensitive to ground-truthing errors. Measures that use area overlaps were made a little lenient, in the sense that their contribution to the final score was given less weight compared to measures such as fragmentation and object count accuracy. This approach of weighting the different metric values will also be helpful in extending the evaluation protocol to different domains.

6. Parameter setting of the skin color-based face detector

Another important application of performance evaluation is to arrive at a global setting of the parameters that influence the performance of an algorithm. Toward this end, a set of measures representative of the algorithm's performance was chosen for tracking it. Area-Thresholded Recall and Precision give an overall picture of the performance, so these two measures were chosen to decide the parameter settings for the skin color-based face detector.

Also, only the parameters that affected the performance most were chosen. The values of the solidity and compactness thresholds did not seem to vary the results to any considerable extent. Also, keeping the probability threshold as low as possible is always advisable, as missing a skin region is not good. So the aspect ratio and normalized area thresholds were varied, and scatter plots of ATR and ATP were produced. The global setting of these parameters was decided at the value where the measures peaked for most of the images. A sketch of such a parameter sweep is shown below.
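
As an illustration, a sketch of such a sweep; dataset, detect_faces, and the norm_area_thresh keyword are hypothetical stand-ins for the report's actual test harness, and area_thresholded_scores is the sketch from Sec 2.1.5:

```python
def sweep_norm_area(dataset, detect_faces, thresholds):
    """Collect (threshold, ATR, ATP) triples for a scatter plot.
    `dataset` is a list of (image, gt_boxes) pairs; `detect_faces` is
    the detector under test with the swept threshold exposed as a
    keyword -- both are assumed stand-ins, not the report's code."""
    points = []
    for t in thresholds:
        for img, gt_boxes in dataset:
            det_boxes = detect_faces(img, norm_area_thresh=t)
            atr, atp = area_thresholded_scores(gt_boxes, det_boxes,
                                               img.shape[:2],
                                               overlap_min=0.4)
            points.append((t, atr, atp))
    return points
```

The global setting is then read off as the threshold at which most images keep a high ATR without sacrificing ATP.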


    Figure 6: Scatter plot of ATR/ATP against Normalized Area

    Threshold

A value of 0.35 was chosen for the Normalized Area Threshold because this is the value at which the ATR does not go very low for most of the images, while the ATP is maintained at the best possible value.

A similar plot was made for the Aspect Ratio range against ATR/ATP. Along the same lines as the Normalized Area threshold, the Aspect Ratio range was decided as 0.9 - 2.1. Since it has two values, a lower threshold and a higher threshold, there would have to be two plots to show the effect of each, and the reader might not be able to appreciate the decision through two separate plots. Hence, that plot is not shown here.

    It is important to note that the performance of the algorithm might

    not be the best for all images. With a different setting, the

    performance might increase. This is shown in Fig. 7.

(7a) Results with global settings (one face is totally missed)

(7b) Results with image-specific settings (though one of the faces is fragmented, it is localized properly)

Figure 7: Results explaining the tradeoff between global and per-image parameter settings.

The result in Fig. 7a is obtained with the global settings, while the result in Fig. 7b is obtained with the following settings:

Probability threshold: 0.1
Size of SE (for opening): 5 x 5 pixels, disk-shaped (image size 400 x 276 pixels)
Compactness threshold: 0.025
Solidity threshold: 0.5
Aspect ratio: 0.3 - 2.1
Normalization threshold: 0.05

However, this is acceptable because one cannot set the parameters on a per-image basis: the whole process of face detection should be automated, with no user intervention in the parameter settings.

7. Evaluation Results

Based on the performance metric values (refer to the last page for the evaluation results, Fig. 9), one can conclude the following:

1. The Haar face detector is more robust in detecting a face in an image. This is apparent from the fact that the Area Thresholded Recall of the Haar face detector is always higher than or equal to that of the skin color-based face detector.

2. Both algorithms produce false positives. However, based on the values of Precision, we can infer that the Haar face detector often produces fewer false positives than the skin color-based face detector.

3. On this dataset, both algorithms, when they detect a face, tend to detect it whole. This can be seen from the average fragmentation measure for the test images. Again, this is specific to the test data used; the skin color-based face detector is expected to be prone to errors in terms of fragmentation. In fact, when tried on an image from a database from which no images were used in training, the face detection results showed fragmentation (Figure 8).


Figure 8: Results of the skin color-based face detector on such an image. The detected face is fragmented, and there are also false positives.

This image was not included in the test set because color is sensitive to the camera used. Since this image was taken from a different data set, on which the classifier was not trained, it would not be genuine to test the algorithm on this data. Again, this is one of the major drawbacks of the skin color-based face detector: it has to be trained with skin and non-skin pixels from images taken by a camera whose images will be present in the test set. The Haar face detector is not limited by any such constraint. From these observations, we can declare that the performance of the Haar face detector is better than that of the skin color-based face detector in all aspects.

Finally, the essence of evaluation is to improve the performance of the algorithm. Here we have noticed that the skin color-based detector does not perform as well as the Haar face detector. In fact, even tweaking the parameters does not yield the best results. This shows that the method based on skin color is not robust.

8. Future Work

Effort has to be directed toward making the performance evaluation insensitive to ground-truthing errors. This is an extremely difficult task. Measures such as Area Thresholded Recall and Precision are efforts in this direction; however, there is still scope for improvement, and this aspect has to be explored.

Another point is the fact that there are probably too many measures. Considering that they cover different aspects of performance, this is acceptable. However, there should also be measures that comprehensively cover all aspects of an algorithm. To this end, we have developed a comprehensive measure that accounts for fragmentation (splits), merges, area overlap, and false positives.

This measure is mainly intended to comprehensively cover many aspects in one measure. However, it requires a one-to-one mapping of ground truth and detected objects. It is an area-based measure which penalizes false detections, missed detections, and spatial fragmentation. For a single image, given that there are $N_G$ ground-truth objects and $N_D$ detected objects, we define CAM, the detection composite measure, as

$$\text{CAM} = \dfrac{2 \cdot \text{Overlap Ratio}}{N_G + N_D}$$

where

$$\text{Overlap Ratio} = \sum_{i=1}^{\min(N_G, N_D)} \dfrac{|G_i \cap D_i|}{|G_i \cup D_i|}$$

Here, $\min(N_G, N_D)$ indicates the maximum number of one-to-one mappings between ground truth objects and detected boxes. However, work has to be done in checking the failure cases of the measure at boundary conditions. Initial results have been promising in that it successfully captures the aspects stated above.
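
A sketch of CAM for axis-aligned boxes; the formula above pairs $G_i$ with $D_i$ after a one-to-one matching, so here the matching is passed in explicitly as an assumed list of index pairs:

```python
def cam(gt_boxes, det_boxes, matches):
    """Composite measure CAM; `matches` is an assumed one-to-one list of
    (gt_index, det_index) pairs, at most min(N_G, N_D) of them."""
    def iou(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        union = aw * ah + bw * bh - inter
        return inter / union if union else 0.0

    overlap_ratio = sum(iou(gt_boxes[i], det_boxes[j]) for i, j in matches)
    return 2 * overlap_ratio / (len(gt_boxes) + len(det_boxes))
```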

9. Conclusion

Two face detection algorithms, one based on Haar-like features and the other based on skin color, have been implemented. Both methods have been empirically evaluated and their performance quantified. Based on the results, we can declare that the Haar face detector outperforms the skin color-based face detector in almost all aspects of the evaluation.

Efforts at improving the performance of the skin color-based method have proven futile. This is probably due to the inability of the method to handle challenging situations. Even a cursory investigation of the method reveals that color is not a good feature to rely on: it can vary due to different lighting, cameras, and other factors such as shadows. Since the evaluation is not subjective and the performance has been quantified, there is no ambiguity in the conclusion.

References

[1] Kasturi R., Goldgof D., Soundararajan P., Manohar V., "Performance Evaluation Protocol for Text and Face Detection & Tracking in Video Analysis and Content Extraction (VACE-II)," Report Submitted to Advanced Research and Development Activity, March 2004.

[2] Viola P. and Jones M. J., "Robust real-time object detection," in Proc. IEEE Workshop on Statistical and Computational Theories of Vision, 2001.

[3] Kuchi P., Gabbur P., Bhat S., David S., "Human Face Detection and Tracking using Skin Color Modeling and Connected Component Operators," IETE Journal of Research, Special issue on Visual Media Processing, May 2002. http://www.public.asu.edu/~pkuchi/IETE_2002.pdf


Evaluation Results

9.1 (a) 9.1 (b) 9.1 (c)
A-1: OCA 1, PBR 1, PBP .98, ATR 1, ATP 1, ABR .98, ABP .96, AF 1

9.2 (a) 9.2 (b) 9.2 (c)
A-1: OCA 1, PBR .97, PBP .81, ATR 1, ATP 1, ABR .9, ABP .73, AF 1

9.3 (a) 9.3 (b) 9.3 (c)
A-1: OCA .8, PBR .8, PBP .87, ATR .67, ATP 1, ABR .67, ABP .82, AF 1


9.4 (a) 9.4 (b) 9.4 (c)
A-1: OCA 1, PBR .64, PBP .62, ATR .5, ATP .5, ABR .49, ABP .47, AF 1

Figure 9: Results of Evaluation. (a) Ground truth image; (b) results of Haar face detection; (c) results of skin color-based face detection.

A-1: Haar face detector; A-2: skin color-based face detector. OCA: Object Count Accuracy; PBR: Pixel Based Recall; PBP: Pixel Based Precision; ATR: Area Thresholded Recall; ATP: Area Thresholded Precision; ABR: Area Based Recall; ABP: Area Based Precision; AF: Average Fragmentation.

OVERLAP_MIN was kept at 40% for all the ATR/ATP measurements.