
Robust real-time face detection

Paul A. Viola and Michael J. Jones

Intl. J. Computer Vision

57(2), 137–154, 2004

(originally in CVPR’2001) (slides adapted from Bill Freeman, MIT 6.869, April 2005)

Scan classifier over locations and scales

“Learn” classifier from data

Training data
• 5000 faces (frontal)
• 10^8 non-faces
• Faces are normalized for scale and translation

Many variations
• Across individuals
• Illumination
• Pose (rotation both in plane and out of plane)

Characteristics of Algorithm

• Feature set (huge: about 16M features)
• Efficient feature selection using AdaBoost
• New image representation: Integral Image
• Cascaded Classifier for rapid detection

Fastest known frontal face detector for gray scale images

Integral Image

• Allows for fast feature evaluation
• The detector does not work directly on image intensities; it uses features similar to Haar basis functions
• The integral image is computed using a few operations per pixel

Simple and Efficient Classifier

Select a small number of important features from a huge library of potential features using AdaBoost [Freund and Schapire, 1995]

AdaBoost, Adaptive Boosting

• Formulated by Yoav Freund and Robert Schapire.
• It is a meta-algorithm: it can be used in conjunction with many other learning algorithms to improve their performance.
• AdaBoost is adaptive: subsequent classifiers are tweaked in favor of instances misclassified by previous classifiers.
• Sensitive to noisy data and outliers, but less susceptible to the overfitting problem than most algorithms in some problems.
• Calls a weak classifier repeatedly in a series of rounds t = 1, …, T. For each call, a distribution of weights D_t is updated that indicates the importance of each example in the data set.
• On each round, the weights of each incorrectly classified example are increased (or, alternatively, the weights of each correctly classified example are decreased), so the new classifier focuses more on those examples.

AdaBoost

Given $(x_1, y_1), \ldots, (x_m, y_m)$ with $x_i \in X$, $y_i \in Y = \{-1, +1\}$

Initialize $D_1(i) = \frac{1}{m}$, $i = 1, \ldots, m$

For $t = 1, \ldots, T$:
• Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to the distribution $D_t$:
$$h_t = \arg\min_{h \in H} \sum_{i=1}^{m} D_t(i)\,[y_i \neq h(x_i)]$$
• $\epsilon_t = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_t(x_i)]$ is the weighted error rate of classifier $h_t$
• If $\epsilon_t \geq 0.5$, then stop
• Choose $\alpha_t \in \mathbb{R}$, typically $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$
• Update
$$D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$$
  where $Z_t$ is a normalization factor (chosen so that $D_{t+1}$ sums to 1)

AdaBoost

• Output the final classifier
$$H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$$
• The equation to update the distribution $D_t$ is constructed so that
$$-\alpha_t y_i h_t(x_i) \begin{cases} < 0, & y_i = h_t(x_i) \\ > 0, & y_i \neq h_t(x_i) \end{cases}$$
• After selecting an optimal classifier $h_t$ for the distribution $D_t$
  – Examples that the classifier $h_t$ identified correctly are weighted less
  – Examples that it identified incorrectly are weighted more
• When the algorithm is testing the classifiers on the distribution $D_{t+1}$, it will select a classifier that better identifies those examples that the previous classifier missed.
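The reweighting step can be checked numerically. The following is a minimal sketch of a single round, assuming labels and weak-classifier outputs in {-1, +1} and the typical choice of alpha from the previous slide; the function name `adaboost_round` and the toy data are illustrative only and do not come from the slides.

```python
import numpy as np

def adaboost_round(D, y, h):
    """One AdaBoost reweighting step (sketch).

    D: current example weights, summing to 1.
    y: true labels in {-1, +1}.
    h: weak-classifier outputs in {-1, +1}.
    Assumes 0 < weighted error < 0.5.
    """
    eps = D[y != h].sum()                      # weighted error rate
    alpha = 0.5 * np.log((1.0 - eps) / eps)    # classifier weight
    D_next = D * np.exp(-alpha * y * h)        # shrink correct, grow incorrect
    D_next /= D_next.sum()                     # division by Z_t
    return eps, alpha, D_next

# Toy data: five examples, the weak classifier is wrong on index 1 only.
y = np.array([+1, +1, -1, -1, +1])
h = np.array([+1, -1, -1, -1, +1])
D = np.full(5, 0.2)
eps, alpha, D_next = adaboost_round(D, y, h)
# eps = 0.2, alpha = ln 2, D_next = [0.125, 0.5, 0.125, 0.125, 0.125]
```

After the update the misclassified example carries half of the total weight, so the classifier just selected has a weighted error of exactly 0.5 on the new distribution, which pushes the next round toward a different classifier.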

Characteristics of Algorithm

• Feature set (huge: about 16M features)
• Efficient feature selection using AdaBoost
• New image representation: Integral Image
• Cascaded Classifier for rapid detection

Cascaded Classifier

• Combining successively more complex classifiers in a cascade structure dramatically increases the speed of the detector by focusing attention on promising regions of the image.
• Focus of attention approaches
  – It is often possible to rapidly determine where in an image a face might occur (Tsotsos et al., 1995; Itti et al., 1998; Amit and Geman, 1999; Fleuret and Geman, 2001).
  – More complex processing is reserved only for these promising regions.
• The key measure of such an approach is the “false negative” rate of the attentional process.

Cascaded Classifier

• Training process
  – An extremely simple and efficient classifier is trained and used as a “supervised” focus of attention operator.
• A face detection attentional operator
  – Filters out over 50% of the image
  – Preserves 99% of the faces (evaluated over a large dataset)
• This filter is exceedingly efficient
  – It can be evaluated in 20 simple operations per location/scale

Overview

• Features: form and computing
• Combining features to form a classifier: AdaBoost
• Constructing cascade of classifiers
• Experimental results
• Discussions

Features

Using features rather than image pixels
• Features act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data
• A feature-based system is much faster than a pixel-based system

Image features

• “Rectangle filters” [Papageorgiou et al. 1998], similar to Haar wavelets
• Differences between sums of pixels in adjacent rectangles
• About 160000 rectangle features for a 24x24 sub-window (the base resolution of the detector)

Integral Image

• The integral image at (x, y) is a partial sum: the sum of all pixels above and to the left of (x, y)
• The sum over any rectangle D is computed from four array references: D = 1 + 4 − (2 + 3), where 1 to 4 are the integral image values at the corners of D (1 top-left, 2 top-right, 3 bottom-left, 4 bottom-right)
• Also known as:
  – summed area tables [Crow84]
  – boxlets [Simard98]
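A minimal sketch of the integral image and the four-reference rectangle sum, assuming NumPy and a zero-padded table so that no boundary checks are needed. The function names (`integral_image`, `rect_sum`, `two_rect_feature`) are illustrative and not taken from the paper or the slides.

```python
import numpy as np

def integral_image(img):
    """Return ii with one extra zero row and column, so that
    ii[y, x] equals the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four array
    references: D = 1 + 4 - (2 + 3)."""
    bottom, right = top + height, left + width
    return (ii[top, left] + ii[bottom, right]      # corners 1 and 4
            - ii[top, right] - ii[bottom, left])   # corners 2 and 3

def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle feature: left half minus right half (width even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = np.arange(24 * 24).reshape(24, 24)
ii = integral_image(img)
```

Because the cost of `rect_sum` does not depend on the rectangle size, the same feature can be evaluated at any scale for the same price, which is what later lets the detector be rescaled instead of the image.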

Huge library of filters

Feature Discussion

• Rectangle features are primitive when compared with alternatives such as steerable filters
  – Steerable filters are excellent for the detailed analysis of boundaries, image compression, and texture analysis.
• Rectangle features are sensitive to the presence of edges, bars, and other simple image structure
• Quite coarse: only three orientations (|, X, --)
• Overcomplete: 400 times (varying aspect ratio and location)

Computational Advantage

• The face detector scans the input at many scales
  – Starting at the base scale: faces are detected at a size of 24 × 24 pixels
  – A 384 × 288 pixel image is scanned at 12 scales, each a factor of 1.25 larger than the last
• The conventional approach
  – Compute a pyramid of 12 images (each smaller than the last)
  – A fixed scale detector is scanned across each image.
• Computing the pyramid directly requires significant time
  – Implemented efficiently on conventional hardware (using bilinear interpolation to scale each level of the pyramid), it takes around 0.05 seconds to compute a 12 level pyramid of this size (on an Intel PIII 700 MHz processor)

Computational Advantage

• Define a meaningful set of rectangle features
  – A single feature can be evaluated at any scale and location in a few operations.
  – Effective detectors can be constructed with as few as two rectangle features.
• Computational efficiency of features
  – The face detection process can be completed for an entire image at every scale at 15 frames per second
  – This is about the same time required to evaluate the 12 level image pyramid alone.

Learning Classification Functions

• Given the feature set and a training set, any machine learning method could be used
  – Mixture of Gaussian model (Sung and Poggio, 1998)
  – Simple image features and neural network (Rowley et al., 1998)
  – Support Vector Machine (Osuna et al., 1997b)
  – Winnow learning procedure (Roth et al., 2000)
• 160000 features
  – Even though each feature can be computed very efficiently, computing the complete set is prohibitively expensive

AdaBoost

• A very small number of features can be combined to form an effective classifier
• Boost the classification performance
  – Combine a collection of weak classification functions to form a stronger classifier
• Weak learner
  – Do not expect even the best classification function to classify the training data well
• After the first round of learning
  – Examples are re-weighted in order to emphasize those which were incorrectly classified by the previous weak classifier.
• The final strong classifier
  – Takes the form of a perceptron: a weighted combination of weak classifiers followed by a threshold.
• Training error of the strong classifier approaches zero exponentially in the number of rounds

AdaBoost

• Select a small set of good classification functions which nevertheless have significant variety
  – Equivalently, select effective features which nevertheless have significant variety
• Restrict the weak learner to classification functions that each depend on a single feature
• Select the single rectangle feature which best separates the positive and negative examples

$$h(x, f, p, \theta) = \begin{cases} 1 & \text{if } p f(x) < p \theta \\ 0 & \text{otherwise} \end{cases}$$

where $x$ is a 24x24 subwindow, $f$ is the feature, $\theta$ is the threshold, and $p$ is the polarity indicating the direction of the inequality.
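As a small illustration of how the polarity flips the inequality, the weak classifier can be written as a one-line function. This is only a sketch: it takes the already computed feature value f(x) as input, and the name `weak_classify` is an assumption for illustration.

```python
def weak_classify(feature_value, polarity, theta):
    """h(x, f, p, theta): 1 if p * f(x) < p * theta, else 0.

    feature_value: f(x), the rectangle feature evaluated on a sub-window.
    polarity: +1 (fires when f(x) < theta) or -1 (fires when f(x) > theta).
    theta: the threshold.
    """
    return 1 if polarity * feature_value < polarity * theta else 0

below = weak_classify(3.0, +1, 5.0)   # 3 < 5 with p = +1, so 1
above = weak_classify(7.0, -1, 5.0)   # 7 > 5 with p = -1, so 1
```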

AdaBoost

• No single feature can perform the classification task with low error
  – Features selected early: error rates 0.1~0.3
  – Features selected later: error rates 0.4~0.5
• Threshold single features
  – Single node decision trees
  – Decision stumps

Constructing the classifier

• Perceptron yields a sufficiently powerful classifier
• Use AdaBoost to efficiently choose best features
  – add a new h_i(x) at each round
  – each h_i(x_k) is a “decision stump”

[Figure: decision stump h_i(x) plotted against x, taking the value a = E_w(y [x < θ]) below the threshold θ and b = E_w(y [x > θ]) above it]

Constructing the Classifier

For each round of boosting:
• Evaluate each rectangle filter on each example
• Sort examples by filter values
• Select best threshold for each filter (min error)
  – Use sorting to quickly scan for optimal threshold
• Select best filter/threshold combination
• Weight is a simple function of error rate
• Reweight examples

(There are many tricks to make this more efficient.)

AdaBoost using single rectangular feature

• Given example images $(x_1, y_1), \ldots, (x_n, y_n)$ where $y_i = 0, 1$ for negative and positive examples respectively
• Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the numbers of negative and positive examples
• For $t = 1, \ldots, T$:
  – Normalize the weights: $w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$
  – Select the best weak classifier with respect to the weighted error: $\epsilon_t = \min_{f, p, \theta} \sum_i w_{t,i} \, |h(x_i, f, p, \theta) - y_i|$
  – Define $h_t(x) = h(x, f_t, p_t, \theta_t)$, where $f_t$, $p_t$, $\theta_t$ are the parameters minimizing $\epsilon_t$
  – Update the weights: $w_{t+1,i} = w_{t,i} \, \beta_t^{1 - e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$

AdaBoost using single rectangular feature

The final strong classifier

$$C(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \geq \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases} \qquad \text{where } \alpha_t = \log \frac{1}{\beta_t}$$

Good Reference on Boosting

Friedman, J., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting

http://www-stat.stanford.edu/~hastie/Papers/boost.ps

“We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far.”

Learning Discussion

• The set of weak classifiers is extraordinarily large
  – One weak classifier for each distinct feature/threshold combination
  – KN weak classifiers (K: the number of features, N: the number of examples)
• Others have larger classifier sets
• Wrapper method
  – M weak classifiers: O(MNKN), about 10^16 operations
• AdaBoost
  – O(MKN), about 10^11 operations

Learning Discussion

• Dependency on N?
  – Suppose that the examples are sorted by a given feature value.
  – Any two thresholds that lie between the same pair of sorted examples are equivalent.
  – Therefore the total number of distinct thresholds is N.
• For each feature, sort the examples based on feature value
• Compute the optimal threshold for that feature in a single pass over this sorted list
• For each element in the list, compute
  – the total sum of positive example weights T+
  – the total sum of negative example weights T-
  – the sum of positive weights below the current example S+
  – the sum of negative weights below the current example S-

Learning Discussion

Each candidate threshold splits the sorted list between the current and the previous example; its error is computed from T+, T-, S+ and S-.
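The error formula on the original slide did not survive extraction. In the Viola-Jones paper the error of a threshold that splits the sorted list just before the current example is $e = \min\left(S^+ + (T^- - S^-),\; S^- + (T^+ - S^+)\right)$: either everything below the split is labeled negative, or everything below is labeled positive. The single-pass search can be sketched as follows, assuming labels in {0, 1} and the stump convention h = 1 if p·f(x) < p·θ. The function name `best_stump` and the midpoint placement of θ are choices made for this sketch.

```python
import numpy as np

def best_stump(values, labels, weights):
    """Return (error, theta, polarity) of the best threshold for one feature.

    values: feature value per example; labels: 0/1; weights: non-negative.
    """
    order = np.argsort(values, kind="stable")
    v, y, w = values[order], labels[order], weights[order]
    t_pos, t_neg = w[y == 1].sum(), w[y == 0].sum()   # T+ and T-
    s_pos = s_neg = 0.0                               # S+ and S-
    best_err, best_theta, best_p = np.inf, None, None
    for i in range(len(v)):
        # A split is only possible between two distinct feature values.
        if i == 0 or v[i] != v[i - 1]:
            theta = v[0] - 1.0 if i == 0 else 0.5 * (v[i - 1] + v[i])
            candidates = (
                (s_neg + (t_pos - s_pos), +1),   # below labeled positive
                (s_pos + (t_neg - s_neg), -1),   # below labeled negative
            )
            for err, p in candidates:
                if err < best_err:
                    best_err, best_theta, best_p = err, theta, p
        if y[i] == 1:
            s_pos += w[i]
        else:
            s_neg += w[i]
    return best_err, best_theta, best_p

vals = np.array([4.0, 1.0, 6.0, 2.0, 5.0, 3.0])
labs = np.array([1, 0, 1, 0, 1, 0])          # positives have large values
err, theta, p = best_stump(vals, labs, np.ones(6) / 6)
```

After the sort, each feature costs one linear pass, which is why the number of distinct thresholds per feature is bounded by the number of examples.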

The final application demanded a very aggressive process which would discard the vast majority of features.

Other feature selection

• Papageorgiou et al. 1998
  – Feature selection based on feature variance.
  – 37 features out of 1734 features for every image subwindow: still large
• Roth et al. 2000
  – Feature selection process based on the Winnow exponential perceptron learning rule
  – A very large and unusual feature set: each pixel is mapped into a binary vector of d dimensions
  – Concatenate the vectors of all n pixels into a single nd-dimensional vector
  – Perceptron: assign a weight to each dimension
  – Winnow learning process: converges to a solution where many of the weights are zero
  – A very large number of features are retained (perhaps a few hundred or thousand).

Learning Results

• The classifier constructed from 200 features yields reasonable results: a false positive rate of 1 in 14084
• For a face detector to be practical for real applications, the false positive rate must be closer to 1 in 1,000,000.

Learning Results

• Features selected by AdaBoost are meaningful and easily interpreted
• In terms of detection
  – Results are compelling but not sufficient for many real-world tasks.
• In terms of computation
  – Very fast, requiring 0.7 seconds to scan a 384 by 288 pixel image.

Attentional Cascade

• Achieves increased detection performance while radically reducing computation time
• Construct boosted classifiers
  – Rejecting many of the negative sub-windows
  – Detecting almost all positive instances
• Adjust the strong classifier threshold to minimize false negatives: lower threshold

Attentional Cascade

Further processing

1. Evaluate the rectangle features (requires between 6 and 9 array references per feature).

2. Compute the weak classifier for each feature (requires one threshold operation per feature)

3. Combine the weak classifiers (requires one multiply per feature, an addition, and finally a threshold).
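The three steps above, chained over several stages with early rejection, can be sketched as below. To keep the example self-contained, feature values are passed in as a precomputed list, whereas a real detector would evaluate them lazily from the integral image. The stage layout (a dict with `stumps` and `threshold`) and the function names are assumptions for illustration only.

```python
def stage_passes(stage, feature_values):
    """Evaluate one boosted stage on a sub-window.

    stage["stumps"]: list of (feature_index, polarity, theta, alpha).
    stage["threshold"]: stage threshold on the weighted vote.
    """
    score = 0.0
    for idx, p, theta, alpha in stage["stumps"]:
        h = 1 if p * feature_values[idx] < p * theta else 0   # weak classifier
        score += alpha * h                                    # weighted vote
    return score >= stage["threshold"]

def cascade_classify(stages, feature_values):
    """Return True only if every stage accepts; reject at the first failure."""
    for stage in stages:
        if not stage_passes(stage, feature_values):
            return False          # later stages are never evaluated
    return True

stages = [
    {"stumps": [(0, +1, 5.0, 1.0)], "threshold": 0.5},
    {"stumps": [(1, -1, 2.0, 0.7), (2, +1, 1.0, 0.4)], "threshold": 0.55},
]
accepted = cascade_classify(stages, [3.0, 4.0, 9.0])
rejected_early = cascade_classify(stages, [6.0, 4.0, 0.0])
```

Most sub-windows fail an early stage, so the average cost per sub-window stays close to the cost of the first one or two stages.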

Attentional Cascade

Subsequent classifiers

Trading speed for accuracy

Given a nested set of classifier hypothesis classes

Computational Risk Minimization

Training a Cascade of Classifiers

• Detection goals
  – Good detection rates (85%~95%)
  – Extremely low false positive rates (on the order of $10^{-5}$ or $10^{-6}$)
• False positive rate of the cascade: $F = \prod_{i=1}^{K} f_i$
• Detection rate: $D = \prod_{i=1}^{K} d_i$
• To achieve a detection rate of 0.9 with a 10 stage classifier
  – each stage needs a detection rate of 0.99 ($0.99^{10} \approx 0.9$)
  – each stage can have a false positive rate of about 30% ($0.30^{10} \approx 6 \times 10^{-6}$)
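The arithmetic behind the 10 stage example is easy to reproduce. The sketch below multiplies per-stage rates, which treats the stages as independent in the same way the product formulas do; the helper name `cascade_rate` is illustrative.

```python
import math

def cascade_rate(per_stage_rates):
    """Overall rate of a cascade as the product of its per-stage rates."""
    return math.prod(per_stage_rates)

D = cascade_rate([0.99] * 10)   # overall detection rate, about 0.904
F = cascade_rate([0.30] * 10)   # overall false positive rate, about 5.9e-6
```

Each stage can therefore be a fairly poor classifier on its own (30% false positives), provided it almost never rejects a face.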

Training a Cascade of Classifiers

• The expected number of features evaluated, N, depends on
  – p_i, the positive rate of the ith classifier
  – n_i, the number of features in the ith classifier
• A scheme for trading off detection and false positive errors is to adjust the threshold of the perceptron produced by AdaBoost
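The formula itself was an image on the original slide. In the paper it is $N = n_0 + \sum_{i=1}^{K} \left( n_i \prod_{j<i} p_j \right)$: stage i is only reached by sub-windows that every earlier stage passed. A short sketch, with an illustrative input that borrows the first three layer sizes quoted later in these slides (2, 10 and 25 features) and assumes positive rates of 0.5 and 0.2 for the first two layers; those rates are an assumption based on the quoted rejection rates on non-faces, not measured values.

```python
def expected_features(n, p):
    """Expected number of features evaluated per sub-window.

    n[i]: number of features in stage i.
    p[i]: positive rate of stage i (fraction of sub-windows it passes on).
    """
    total, reach = 0.0, 1.0        # reach = probability of reaching stage i
    for n_i, p_i in zip(n, p):
        total += reach * n_i
        reach *= p_i
    return total

# Three-stage illustration: 2 + 0.5 * 10 + 0.5 * 0.2 * 25 = 9.5
N = expected_features([2, 10, 25], [0.5, 0.2, 1.0])
```

The early stages dominate the sum, so keeping them small matters far more for speed than the size of the later stages.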

Tradeoffs in Training

• Classifiers with more features
  – Achieve higher detection rates and lower false positive rates
  – Require more time to compute
• An optimization framework in which
  – the number of classifier stages,
  – the number of features, n_i, of each stage,
  – the threshold of each stage
  are traded off in order to minimize the expected number of features N given a target for F and D.
• Finding this optimum is a tremendously difficult problem.

Training Cascaded Detector

• A simple framework to produce an effective and efficient classifier
  – The user selects the maximum acceptable rate for f_i and the minimum acceptable rate for d_i.
  – Each layer of the cascade is trained by AdaBoost, with the number of features used being increased until the target detection and false positive rates are met for this level.
  – The rates are determined by testing the current detector on a validation set.
  – If the overall target false positive rate is not yet met then another layer is added to the cascade.
  – The negative set for training subsequent layers is obtained by collecting all false detections found by running the current detector on a set of images which do not contain any instances of faces.

Training Cascaded Detector

• User selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.
• User selects target overall false positive rate, F_target.
• P = set of positive examples, N = set of negative examples
• F_0 = 1.0; D_0 = 1.0; i = 0
• while F_i > F_target
  – i ← i + 1
  – n_i = 0; F_i = F_{i−1}
  – while F_i > f × F_{i−1}
    ∗ n_i ← n_i + 1
    ∗ Use P and N to train a classifier with n_i features using AdaBoost
    ∗ Evaluate current cascaded classifier on validation set to determine F_i and D_i.
    ∗ Decrease threshold for the ith classifier until the current cascaded classifier has a detection rate of at least d × D_{i−1} (this also affects F_i)
  – N ← ∅
  – If F_i > F_target then evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N
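The control flow of this procedure can be sketched in Python with the learning-specific parts injected as callables. All four callables (`train_stage`, `evaluate`, `lower_threshold`, `false_detections`) are hypothetical placeholders introduced for this sketch; they stand for AdaBoost training, validation-set evaluation, threshold adjustment, and bootstrapping of negatives, none of which is implemented here.

```python
def train_cascade(f, d, F_target, P, N,
                  train_stage, evaluate, lower_threshold, false_detections):
    """Skeleton of the cascade training loop.

    f, d: per-layer max false positive rate and min detection rate.
    train_stage(P, N, n): returns a stage trained with n features.
    evaluate(cascade): returns (F, D) of the cascade on a validation set.
    lower_threshold(stage): lowers the threshold of a stage in place.
    false_detections(cascade): returns false positives on non-face images.
    """
    cascade = []
    F_prev, D_prev = 1.0, 1.0
    while F_prev > F_target:
        n_i = 0
        F_i, D_i = F_prev, D_prev
        cascade.append(None)                     # slot for the new layer
        while F_i > f * F_prev:
            n_i += 1
            cascade[-1] = train_stage(P, N, n_i)
            F_i, D_i = evaluate(cascade)
            while D_i < d * D_prev:              # recover the detection rate
                lower_threshold(cascade[-1])
                F_i, D_i = evaluate(cascade)
        F_prev, D_prev = F_i, D_i
        if F_prev > F_target:
            N = false_detections(cascade)        # harder negatives next time
    return cascade
```

The sketch assumes that lowering a threshold can always reach the required detection rate and that adding features eventually meets the false positive target; with callables that violate this, the loops would not terminate.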

Simple Experiment

• A monolithic 200-feature classifier and a cascade of ten 20-feature classifiers
• Training using 5000 faces + 10000 nonface sub-windows


Simple Experiment

• A monolithic 200-feature classifier and a cascade of ten 20-feature classifiers
• Training using 5000 faces + 10000 nonface sub-windows
• Little difference between them in terms of accuracy
• But the cascaded classifier is nearly 10 times faster, since its first stage throws out most non-faces so that they are never evaluated by subsequent stages.

Detector Cascade Discussion

• Similar to Rowley et al. (1998) (fast), who trained two neural networks
  – One was moderately complex, focused on a small region of the image, and detected faces with a low false positive rate.
  – The second neural network was much faster, focused on a larger region of the image, and detected faces with a higher false positive rate.
• Rowley et al. use a two stage cascade; this method includes 38 stages

Training Dataset

4916 hand labeled faces scaled and aligned to a base resolution of 24 by 24 pixels.

Structure of the Detector Cascade

• The 38 layer cascade of classifiers includes a total of 6060 features
• The first classifier is constructed using two features: it rejects about 50% of non-faces while correctly detecting close to 100% of faces.
• The next classifier has ten features: it rejects 80% of non-faces while detecting almost 100% of faces.
• The next two layers are 25-feature classifiers
• Then three 50-feature classifiers
• Then classifiers with a variety of different numbers of features, chosen according to the cascade training algorithm

Speed of Face Detector

• Speed is proportional to the average number of features computed per sub-window.
• On the MIT+CMU test set, an average of 9 features (out of 6060) are computed per sub-window.
• On a 700 MHz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps).
• Roughly 15 times faster than Rowley-Baluja-Kanade and 600 times faster than Schneiderman-Kanade.

Scanning The Detector

• Multiple scales
  – Scaling is achieved by scaling the detector itself, rather than scaling the image
  – The features can be evaluated at any scale with the same cost
• Locations
  – Subsequent locations are obtained by shifting the window some number of pixels Δ
  – The choice of Δ affects both speed and accuracy
  – A step size > 1 pixel tends to decrease the detection rate slightly while also decreasing the number of false positives
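A sketch of the scan loop, using the numbers quoted earlier in these slides (24 pixel base window, 12 scales, each 1.25 times larger than the last). Scaling the step with the window size, so that the shift is the rounded value of s·Δ at scale s, is an assumption of this sketch, as is the generator name `scan_windows`.

```python
def scan_windows(img_w, img_h, base=24, scale_factor=1.25,
                 n_scales=12, delta=1.0):
    """Yield (left, top, size) for every square sub-window to classify.

    The detector window is scaled instead of the image; the step between
    neighbouring windows grows with the scale (assumption of this sketch).
    """
    for k in range(n_scales):
        s = scale_factor ** k
        size = int(round(base * s))
        if size > img_w or size > img_h:
            break                                   # window no longer fits
        step = max(1, int(round(s * delta)))
        for top in range(0, img_h - size + 1, step):
            for left in range(0, img_w - size + 1, step):
                yield left, top, size

windows = list(scan_windows(384, 288))
sizes = sorted({size for _, _, size in windows})    # 12 distinct window sizes
```

Raising `delta` reduces the number of windows roughly quadratically, which is the speed versus accuracy trade-off noted above.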


Integration of Multiple Detections

• Postprocess: combine overlapping detections into a single detection
  – The set of detections is first partitioned into disjoint subsets
  – Two detections are in the same subset if their bounding regions overlap.
• Each partition yields a single final detection.
  – The corners of the final bounding region are the average of the corners of all detections in the set.
• Decreases the number of false positives.
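The partition-and-average step can be sketched with a small union-find structure. Boxes are assumed to be (x1, y1, x2, y2) tuples, and chains of overlapping boxes are merged transitively so that the subsets are disjoint; both choices, and the function names, are assumptions of this sketch.

```python
def overlaps(a, b):
    """True if boxes a and b, given as (x1, y1, x2, y2), intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_detections(boxes):
    """Group overlapping boxes and return one averaged box per group."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)   # union the two subsets

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Average each corner coordinate over the boxes of a group.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]

merged = merge_detections([(0, 0, 10, 10), (2, 2, 12, 12), (100, 100, 110, 110)])
```

In this example the first two boxes collapse into (1, 1, 11, 11) and the isolated third box is returned unchanged.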

Integration of Multiple Detections

• A simple voting scheme further improves results
  – Three detectors performed similarly on the final task, but in some cases their errors were different.
  – Retain only those detections where at least 2 out of 3 detectors agree.
  – This improves the final detection rate as well as eliminating more false positives.
  – Since detector errors are not uncorrelated, the combination results in a measurable, but modest, improvement over the best single detector.

Sample results

MIT + CMU test set

Failure Cases

• Trained on frontal, upright faces
  – The faces were only very roughly aligned, so there is some variation in rotation both in plane and out of plane.
  – Detects faces that are tilted up to about ±15 degrees in plane and about ±45 degrees out of plane (toward a profile view).
  – The detector becomes unreliable with more rotation.
• Harsh backlighting, in which the faces are very dark while the background is relatively light, sometimes causes failures
  – Nonlinear variance normalization based on robust statistics to remove outliers helps in this situation
  – The problem with such a normalization is the greatly increased computational cost within our integral image framework.
• Fails on significantly occluded faces
  – Occluded eyes: usually fails.
  – A face with a covered mouth will usually still be detected.

Summary (Viola-Jones)

• Fastest known face detector for gray images
• Three contributions with broad applicability:
  – Cascaded classifier yields rapid classification
  – AdaBoost as an extremely efficient feature selector
  – Rectangle Features + Integral Image can be used for rapid image analysis

Face detector comparison

• Informal study by Andrew Gallagher, CMU, for CMU 16-721 Learning-Based Methods in Vision, Spring 2007
  – The Viola Jones algorithm OpenCV implementation was used (<2 sec per image).
  – For Schneiderman and Kanade, Object Detection Using the Statistics of Parts [IJCV’04], the www.pittpatt.com demo was used (~10-15 seconds per image, including web transmission).

[Figure labels: Schneiderman-Kanade, Viola-Jones]