Viola Jones Presentation


Object Detection: The Viola-Jones Face Detector

Augusto Morgan

Institute of Computing - University of Campinas


June 9, 2014


Overview

1 Object Detection

2 Viola-Jones Face Detector
    Haar-like features and the integral image
    AdaBoost
    Cascade of Weak Classifiers

3 Haar-like Features Extended Set



Object Detection

How can we detect objects in an image?



We can use a classifier: given an image, is it the object we are looking for or not?

But what if the image contains many other objects? We are interested in finding where in the image the objects are.



Sliding Window

We can apply the classifier to small portions of the image!

We slice the image into small sub-windows and apply the classifier to each one of them.
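The slicing step can be sketched as follows. This is a minimal illustration, not the detector's actual scanning code: the 24-pixel window, the 4-pixel step, the 384x288 image size and the `classifier` callable are all assumptions made for the example.

```python
def sliding_windows(width, height, win=24, step=4):
    """Yield (x, y, win, win) boxes covering a width x height image."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield (x, y, win, win)


def detect(image, classifier, win=24, step=4):
    """Return every window for which the (hypothetical) classifier fires.

    image is a list of rows of pixel intensities; classifier is any
    callable that takes a patch and returns True or False.
    """
    height, width = len(image), len(image[0])
    hits = []
    for (x, y, w, h) in sliding_windows(width, height, win, step):
        patch = [row[x:x + w] for row in image[y:y + h]]
        if classifier(patch):
            hits.append((x, y, w, h))
    return hits


# Even a single scale on a modest image produces thousands of windows.
n_windows = sum(1 for _ in sliding_windows(384, 288))
print(n_windows)  # 6097
```

Every one of those windows is passed to the classifier, and the count grows further once several window sizes are scanned.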

Problems?



Viola-Jones Real-Time Face Detector

Proposed in 2001 by Paul Viola and Michael Jones.

It discards a large number of negative sub-windows before spending much processing time on them, which lets it achieve high frame rates.

How does it achieve that?



Haar wavelet function

The classifier used in the paper is based on Haar-like features.

Haar wavelet function:

\[
\psi(t) =
\begin{cases}
1 & 0 \le t < \tfrac{1}{2}, \\
-1 & \tfrac{1}{2} < t \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]

Figure: Haar wavelet



Haar-like Features

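Rectangles that produce a score based on positive (white) areas and negative (black) areas.

Three kinds of features: 2, 3 and 4 rectangles.

Each feature is calculated by:

\[
f(i) = I_{\text{White}} - I_{\text{Black}}
\]

where I_White and I_Black are the sums of the pixel intensities under the white and black rectangles.

Figure: The different types of Haar-like features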



Problem: The number of Haar-like features is too large!

For a 24x24-pixel window there are more than 160,000 distinct Haar-like features.

Note: this set is overcomplete.





The Integral Image

A new intermediate representation of the image, similar to the summed-area table used in computer graphics.

Each pixel (x, y) contains the sum of the original pixels above and to the left of (x, y), inclusive.

\[
ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')
\]

It can be computed in one pass over the original image.
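A minimal sketch of the one-pass computation, assuming the image is a list of rows of numbers (the function name and the sample image are illustrative only):

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img[y'][x'] for y' <= y, x' <= x."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(width):
            row_sum += img[y][x]
            # Value directly above already holds everything in earlier rows
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


img = [[1, 2, 3],
       [4, 5, 6]]
ii = integral_image(img)
print(ii)  # [[1, 3, 6], [5, 12, 21]]
```

Each pixel is visited once, and the bottom-right entry equals the sum of the whole image.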

Figure: The integral image



Features Calculation using the Integral Image

Figure: The sum of one rectangle using the integral image

The sum of each rectangle can be calculated using the integral image in four array references.

\[
\mathrm{Sum}(R) = ii(A) - ii(B) - ii(D) + ii(C)
\]

Each feature can then be calculated in a few array references.
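The four-reference lookup can be sketched as below. For simplicity this version pads the integral image with an extra zero row and column, which differs slightly from the inclusive definition on the slide but avoids special cases at the image border; the function names and the left-white/right-black layout of the two-rectangle feature are assumptions for illustration.

```python
def padded_integral(img):
    """Integral image with a leading zero row and column.

    ii[y][x] holds the sum of img over rows < y and columns < x.
    """
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """White (left w x h block) minus black (right w x h block)."""
    white = rect_sum(ii, x, y, w, h)
    black = rect_sum(ii, x + w, y, w, h)
    return white - black


img = [[9, 9, 1, 1],
       [9, 9, 1, 1]]
ii = padded_integral(img)
print(two_rect_feature(ii, 0, 0, 2, 2))  # 32
```

The cost of `rect_sum` does not depend on the rectangle size, which is why the same features can be evaluated at any scale for the same price.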



Advantages and Drawbacks

Rectangular Features are very simple and coarse.

However they are really fast!



They can be calculated at different scales without building a Gaussian pyramid and an integral image for each level, which speeds up multiscale detection.

Any feature strategy that requires the pyramid to be computed for multiscale detection runs slower than this approach.



Training the Classifier

Given the features and a set of positive and negative examples, any classifier can be trained.

There is, however, a huge number of features.

A very small number of features can be combined to create an effective classifier.

How do we find these features?





A weak classifier

The weak classifier used in the paper takes a sub-window (x) as input and consists of a feature (f), a threshold (θ) and a polarity (p) indicating the direction of the following inequality:

\[
h(x, f, \theta, p) =
\begin{cases}
1 & \text{if } p\,f(x) < p\,\theta, \\
0 & \text{otherwise.}
\end{cases}
\]

The weak classifier can be viewed as a single-node decision tree, a stump.

Each feature is assigned an optimal threshold, chosen to minimize the number of misclassifications.
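A minimal sketch of the stump and its threshold search for a single feature. It tries every candidate threshold by brute force, which is an assumption made for clarity rather than the faster sorted scan a real trainer would use; the function names and sample values are illustrative.

```python
def stump_predict(value, theta, p):
    """h = 1 if p * f(x) < p * theta, else 0."""
    return 1 if p * value < p * theta else 0


def best_stump(values, labels, weights):
    """Return (error, theta, p) with the lowest weighted error for one feature.

    values[i] is the feature value of example i, labels[i] is 0 or 1.
    """
    # Candidate thresholds: midpoints between sorted values plus both extremes
    candidates = sorted(set(values))
    thetas = [candidates[0] - 1.0]
    thetas += [(a + b) / 2.0 for a, b in zip(candidates, candidates[1:])]
    thetas.append(candidates[-1] + 1.0)

    best = None
    for theta in thetas:
        for p in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if stump_predict(v, theta, p) != y)
            if best is None or err < best[0]:
                best = (err, theta, p)
    return best


values = [1.0, 2.0, 8.0, 9.0]       # feature value per example
labels = [1, 1, 0, 0]               # 1 = face, 0 = non-face
weights = [0.25, 0.25, 0.25, 0.25]
print(best_stump(values, labels, weights))  # (0, 5.0, 1)
```

The polarity lets the same feature vote "face" either below or above the threshold, so one search covers both directions.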



AdaBoost

AdaBoost is used to boost the performance of a simple learning algorithm. It combines weak classification functions to create a more powerful one.



At each round the examples are re-weighted to emphasize those that were incorrectly classified by the previous weak classifier.

The final strong classifier is a weighted combination of weak classifiers followed by a threshold.
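For reference, the weighted combination can be written out as below. The notation (T rounds, weighted error ε_t, β_t, α_t) follows the Viola-Jones paper rather than these slides, so the symbols should be read as assumptions relative to the deck.

```latex
% Strong classifier after T boosting rounds
C(x) =
\begin{cases}
1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \tfrac{1}{2} \sum_{t=1}^{T} \alpha_t, \\
0 & \text{otherwise,}
\end{cases}
\qquad
\alpha_t = \log \frac{1}{\beta_t},
\qquad
\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}

% Re-weighting: e_i = 0 if example i was classified correctly, 1 otherwise
w_{t+1,i} = w_{t,i} \, \beta_t^{\,1 - e_i}
```

A weak classifier with a low weighted error gets a small β_t and therefore a large vote α_t, while correctly classified examples have their weights shrunk by β_t before the next round.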



We can see the AdaBoost procedure as a greedy feature selection process:

AdaBoost is actually selecting a small set of good features.



This way, the weak learning algorithm tries to select the single rectangle feature that best separates the positive and negative examples.



Training

Done in multiple rounds.




All examples start with the same weight.





At each round it searches over a large set of features and thresholds, choosing the feature/threshold pair that minimizes the weighted error.



The weights are then updated so that the wrongly classified examples count more in the next round, and the process is repeated.
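The rounds described above can be sketched as a short training loop. This is a simplified illustration under stated assumptions: feature values are precomputed in a `feature_values[k][i]` table, thresholds are searched by brute force over the observed values, all weights start equal as on the slide, and the update rule (shrinking the weights of correctly classified examples) follows the Viola-Jones paper. All names are hypothetical.

```python
import math


def train_adaboost(feature_values, labels, rounds):
    """Train a boosted classifier of decision stumps.

    feature_values[k][i] is feature k evaluated on example i; labels are 0/1.
    Returns a list of (alpha, k, theta, p) weak classifiers.
    """
    n = len(labels)
    weights = [1.0 / n] * n                      # same starting weight for all
    strong = []
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]   # normalize to sum to 1
        best = None
        for k, values in enumerate(feature_values):
            for theta in sorted(set(values)):
                for p in (1, -1):
                    preds = [1 if p * v < p * theta else 0 for v in values]
                    err = sum(w for w, h, y in zip(weights, preds, labels)
                              if h != y)
                    if best is None or err < best[0]:
                        best = (err, k, theta, p, preds)
        err, k, theta, p, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        beta = err / (1.0 - err)
        # Correct examples are down-weighted, so mistakes gain relative weight
        weights = [w * (beta if h == y else 1.0)
                   for w, h, y in zip(weights, preds, labels)]
        strong.append((math.log(1.0 / beta), k, theta, p))
    return strong


def strong_predict(strong, x_features):
    """Weighted vote; x_features[k] is feature k evaluated on one sub-window."""
    score = sum(a for a, k, theta, p in strong
                if p * x_features[k] < p * theta)
    return 1 if score >= 0.5 * sum(a for a, _, _, _ in strong) else 0


feature_values = [[1, 2, 8, 9],     # feature 0 separates the classes
                  [5, 1, 6, 2]]     # feature 1 does not
labels = [1, 1, 0, 0]
strong = train_adaboost(feature_values, labels, rounds=2)
```

Because each round keeps only the single best feature, the list returned after M rounds names at most M features, which is the greedy feature selection the slides describe.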



Considerations

Huge set of possible features and related thresholds (NK, where N is the number of examples and K the number of features).

With 20,000 samples and 160,000 features (the number for a 24x24-pixel sub-window), there are 3.2 billion distinct classifiers!

With M rounds, AdaBoost takes O(MKN) time.



For each sub-window, all the classifiers are evaluated and combined to get the final answer.

What if we could eliminate sub-windows earlier?


The Attentional Cascade



The insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances.






This can be done by adjusting the threshold of the boosted classifier to minimize false negatives.

Figure: The first features selected by AdaBoost





They achieved a 100% hit rate and a 50% false positive rate with the first, two-feature classifier.

Far from acceptable, but with a few operations they can discard around 50% of the non-face sub-windows. And this is only the first classifier.




A cascade of classifiers is built this way: the positive output of each one activates the next, so the more complex classifiers run only on the sub-windows that are more likely to contain a face.

Since the great majority of sub-windows in an image are negative, the cascade tries to eliminate as many sub-windows as possible at the earliest possible stage.
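The early-rejection logic can be sketched as follows. The stage callables and the per-stage rates used in the example (ten stages, 99% hit rate, 50% false positive rate each) are hypothetical numbers chosen for illustration, and multiplying the rates assumes the stages behave independently.

```python
def cascade_predict(stages, window):
    """Accept a window only if every stage accepts it.

    stages is a list of callables ordered from cheapest to most complex;
    each returns True (might be a face) or False (definitely not).
    """
    for stage in stages:
        if not stage(window):
            return False      # rejected early; later stages never run
    return True


def expected_rates(stage_rates):
    """Combine per-stage (hit rate, false positive rate) pairs by multiplying."""
    hit, fp = 1.0, 1.0
    for d, f in stage_rates:
        hit *= d
        fp *= f
    return hit, fp


# Ten hypothetical stages, each keeping 99% of faces and 50% of non-faces
hit, fp = expected_rates([(0.99, 0.5)] * 10)
print(round(hit, 3), fp)  # 0.904 0.0009765625
```

A per-stage false positive rate that looks unacceptable on its own becomes very small once several stages are chained, while the hit rate degrades only slightly as long as each stage keeps almost every face.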





Figure: The Classifier Cascade

Finally, a post-processing step merges multiple detections of the same face so that no duplicates remain.
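One way to do this merging, sketched here under assumptions: detections are (x, y, w, h) boxes, any two overlapping boxes are treated as the same face, and each group is replaced by the average of its boxes. The slides do not specify the rule; this follows the grouping described in the Viola-Jones paper, and the function names are illustrative.

```python
def overlaps(a, b):
    """True if boxes a and b, given as (x, y, w, h), share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def merge_detections(boxes):
    """Group overlapping boxes and return one averaged box per group."""
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Average each coordinate over the boxes in a group
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]


detections = [(10, 10, 24, 24), (12, 12, 24, 24), (100, 50, 24, 24)]
merged = merge_detections(detections)
print(sorted(merged))  # [(11.0, 11.0, 24.0, 24.0), (100.0, 50.0, 24.0, 24.0)]
```

Grouping by any overlap is a deliberately simple choice; a stricter overlap criterion would keep nearby but distinct faces from being merged.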


Haar-like Features Extended Set


Proposed by Rainer Lienhart and Jochen Maydt in 2002.

Same principle, more variability.

Figure: The extended Haar-like feature set


Rotated Summed Area Table


Figure: The rotated “integral image”


References




Viola, P. and Jones, M., CVPR 2001, Rapid Object Detection using a Boosted Cascade of Simple Features.

Viola, P. and Jones, M., International Journal of Computer Vision, v. 57, 2004, Robust Real-Time Face Detection.

Lienhart, R. and Maydt, J., IEEE ICIP 2002, An Extended Set of Haar-like Features for Rapid Object Detection.

Weisstein, Eric W. “Haar Function.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/HaarFunction.html

