Face Detection

Lecturers: Mor Yakobovits, Roni Karlikar
Supervisor: Hagit Hel-Or


Introduction

Humans can easily detect faces, although faces can be very different from each other.


• Humans also have a tendency to see face patterns even where none really exist.


Faces everywhere

http://www.marcofolio.net/imagedump/faces_everywhere_15_images_8_illusions.html


Face Detection

• The problem of face detection: given an image, determine whether it contains faces or not.

• The idea of face detection in computer vision is to let the computer learn to detect faces in images, just as a human can do.


Applications of Face Detection

• Auto-focus in cameras
• Security systems (recognizing faces of certain people)
• Human-computer interfaces
• Marketing systems
• Much more…


Difficulties of Face Detection

Building a model for faces is not a simple task: faces are complex and vary from one another, and faces in images are also affected by the environment.


Difficulties – Changing Lighting

• Affects color and facial features


Difficulties - Skin Tone

• Large variety of skin tones.


Difficulties - Facial Expressions

• Affects the shape of the face and its features


Difficulties – Scaling and Angles


Difficulties - Obstructions

• Obstruction of facial features


Today’s Lecture

• We will talk about:

– Skin detection

– Eigenfaces

– Viola-Jones algorithm


Today’s Lecture

• All three approaches we'll see today are based on learning.
– The computer learns to detect faces.


Learning - Intro

• The learning model we'll use is a classifier.
– Purpose: classify data into several classes.
– Training phase: let the computer learn the features of each class (face & non-face). This is done using a dataset with example instances of each class (the instances are already classified).
– Classification: given a new instance, tell which class it belongs to.

Example: studying for an exam by solving previous exams.
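The training/classification split described above can be sketched in a few lines of Python. Everything below is an invented toy example, not the lecture's method: the one-dimensional "feature" values, the labels, and the midpoint-threshold rule are all illustrative stand-ins.

```python
# hypothetical 1-D "feature" values with labels; the midpoint-threshold
# rule below is an invented toy classifier, not the lecture's method
def train(examples):
    """Training phase: learn a threshold from labeled (value, label) pairs."""
    faces = [v for v, y in examples if y == 'face']
    others = [v for v, y in examples if y == 'non-face']
    return (sum(faces) / len(faces) + sum(others) / len(others)) / 2

def classify(threshold, value):
    """Classification phase: assign a new, unlabeled instance to a class."""
    return 'face' if value >= threshold else 'non-face'

train_set = [(0.9, 'face'), (0.8, 'face'), (0.2, 'non-face'), (0.1, 'non-face')]
t = train(train_set)         # learn from already-classified instances
print(classify(t, 0.7))      # face
```

The two phases are exactly the ones on the slide: `train` only ever sees labeled instances, `classify` only ever sees a new unlabeled one.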


Face Detection Using Skin Detection

Probabilistic Approach



Skin Detection

• Purpose:
– Find "skin pixels" in a given image.

• The main question:
– How to determine whether a pixel is a "skin pixel"?
– Our approach: teach the computer which colors are skin colors and which are not.


Skin Detection

• Skin detection is a color (pixel)-based approach for detecting faces.

• This approach is quite simple,

• but has limited results due to:
– high sensitivity to illumination and other changes in skin tones
– not only faces containing skin (arms, legs…)
– some objects having colors similar to skin (for example, wooden furniture)


Example of how illumination causes false-negative and false-positive detections.

"Detecting Faces in Color Images" by Hsu, Abdel-Mottaleb & Jain

(a) a yellow-biased face image (b) a light-compensated image (c) skin regions of (a) in white (d) skin regions of (b)


Other examples of false-positive (chair in the top-left corner) and false-negative (dark area of the face in the image of the soccer player) skin detections.

Rehg & Jones (1999)


Skin Colors In RGB Color Space

97% of the skin-color bins overlap with non-skin color bins. A possible explanation: many objects have colors that resemble skin color, like walls, rail tracks, furniture and wooden objects.

Rehg & Jones (1999)


Skin Classifier

• The problem: given a pixel x with color (r,g,b), determine whether it is skin or not.


Skin Classifier

• Given x = (R,G,B), how do we determine its class? (skin/non-skin)

• Nearest neighbor
– find the labeled pixel closest to x
– choose that pixel's class

• Data modeling
– fit a model (curve, surface, or volume) to each class

• Probabilistic data modeling
– fit a probability model to each class
– we'll focus on this approach

(Orange dots – skin, purple dots – non-skin)


Probabilistic Skin Classifier

• Two approaches we'll discuss:
– Gaussian-based (parametric model)
– Histogram-based (non-parametric model)


Parametric modeling

Main idea:
• Assume the type/shape of the distribution we're trying to find.
• Find the parameter values for the assumed type from a training set.


Gaussian-Based Approach (Parametric model)

• Single Gaussian Model
– We assume the probabilistic distribution we are trying to find is a normal distribution (Gaussian function).

• To find that distribution, all we need is:
– μ – the mean of the learned skin colors
– Σ – the covariance matrix of the learned skin colors

These parameters are estimated separately for each class.

(x is a color vector!)


Gaussian-Based Approach (Parametric model)

• After we have the mean & covariance, we get:

P(x | j) = (2π)^(-3/2) |Σ_j|^(-1/2) · exp(-½ (x - μ_j)ᵀ Σ_j⁻¹ (x - μ_j))

• where μ_j is the mean vector and Σ_j is the covariance matrix of class j
– for j = skin and j = non-skin
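The single-Gaussian model can be sketched with NumPy: estimate μ and Σ from training colors, then evaluate the class-conditional density. The four skin-color samples below are made up for illustration; only P(rgb | skin) is computed here (the non-skin class would be handled identically).

```python
import numpy as np

# four hypothetical skin-color training pixels (R, G, B)
skin_pixels = np.array([[200, 150, 130],
                        [210, 155, 120],
                        [190, 160, 140],
                        [205, 140, 135]], dtype=float)

mu = skin_pixels.mean(axis=0)              # mean skin color
sigma = np.cov(skin_pixels, rowvar=False)  # 3x3 covariance matrix

def gaussian_likelihood(x, mu, sigma):
    """Multivariate normal density N(x; mu, sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

# class-conditional likelihood P(rgb | skin) for a query color
p = gaussian_likelihood(np.array([202.0, 152.0, 132.0]), mu, sigma)
```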


What we have

• P(rgb | skin) & P(rgb | ~skin)
– "the probability that a (non-)skin pixel will have the color rgb"

But that’s not what we want.


What we need

• We need P(skin | rgb) & P(~skin | rgb)
– "the probability that a pixel with the color rgb is (non-)skin"

Once we have those, we can use MAP estimation.

Remember Bayes' Rule?


Bayes' Rule

P(skin | R) = P(R | skin) · P(skin) / P(R)

P(skin) is the proportion of skin pixels among all pixels in the learning dataset.

P(R) can be calculated using the probabilities we already have:
P(R) = P(R | skin) P(skin) + P(R | ~skin) P(~skin)
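As a minimal sketch, Bayes' rule and the computation of P(R) translate directly to code. The likelihood and prior values below are illustrative numbers, not measurements.

```python
def posterior_skin(p_rgb_given_skin, p_rgb_given_nonskin, p_skin):
    """Bayes' rule: P(skin | rgb) from the two likelihoods and the prior."""
    p_nonskin = 1.0 - p_skin
    p_rgb = p_rgb_given_skin * p_skin + p_rgb_given_nonskin * p_nonskin  # P(R)
    return p_rgb_given_skin * p_skin / p_rgb

# illustrative numbers: likelihoods 0.10 vs 0.05, prior P(skin) = 0.4
p = posterior_skin(0.10, 0.05, 0.4)
# MAP: classify as skin iff P(skin | rgb) > P(~skin | rgb), i.e. p > 0.5
```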


MAP Estimation (Maximum A Posteriori estimation)

Classification: a pixel with color R will be classified as skin iff P(skin | R) is higher than P(~skin | R).

MAP estimation maximizes the posterior probability, and so minimizes the probability of misclassification.

(False-negative misclassification: a skin pixel classified as non-skin.)


Another Gaussian-Based Approach (Parametric model)

• Problem with the Single Gaussian Model:
– The actual skin distribution might be too complex to be represented by a single Gaussian distribution.

• Solution: Mixture of Gaussians (MoG)
– Represent the distribution with several different Gaussian distributions, allowing more flexible modeling of the distribution.


Skin Color Distribution in HSV Color Space

– HSV (Hue, Saturation, Value) separates color components from intensity (in RGB, intensity affects all channels).
– Not the best color space for color-based approaches, but conversion from RGB is very simple compared to the better color spaces.


Gaussian-Based Approach (Parametric model)

• In the case of Mixture of Gaussians, each class likelihood is a weighted sum of Gaussians (one mixed model for skin and one for non-skin).

• Drawbacks:
– Slower learning, because we need the EM algorithm to estimate the MoG.
– Slower classification, since it requires evaluating all of the Gaussians.

Classification: use Bayes' rule and then MAP.
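Evaluating a Mixture-of-Gaussians likelihood is just a weighted sum of Gaussian densities. A one-dimensional sketch follows; the component weights, means, and variances are made up for illustration (in practice they would be fitted with the EM algorithm, as noted above).

```python
import numpy as np

def mog_likelihood(x, weights, means, variances):
    """1-D MoG density: sum_k w_k * N(x; mu_k, var_k)."""
    total = 0.0
    for w, m, v in zip(weights, means, variances):
        total += w * np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
    return total

# hypothetical 2-component skin model along one (normalized) color axis
p = mog_likelihood(0.6, weights=[0.7, 0.3], means=[0.55, 0.8], variances=[0.01, 0.02])
```

Note the classification-time cost the slide mentions: every query evaluates every component of every class.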


MoG vs. Single Gaussian

(Figure: the training-set distribution, the MoG fit, and the single-Gaussian fit.)


EM algorithm – Expectation Maximization algorithm


Non-parametric modeling

Main idea:
• Do not assume anything about the distribution we are looking for.
• Derive the distribution directly from the dataset.


Histogram-Based Approach (Non-parametric model)

• Learn from a labeled dataset
– for each color bin (256×256×256 ≈ 16.7M bins in RGB), count:
• how many pixels of that color were skin
• how many pixels of that color were non-skin

• We get a skin histogram, and an equivalent histogram for non-skin pixels.

(Our histograms will have three dimensions.)
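The counting-and-normalizing step can be sketched as follows. To keep the tables small, this sketch quantizes each channel to 32 bins rather than 256, and the labeled pixels are invented for illustration.

```python
import numpy as np

BINS = 32                      # quantize each channel to 32 bins (not 256)
skin_hist = np.zeros((BINS, BINS, BINS))
nonskin_hist = np.zeros((BINS, BINS, BINS))

def bin_index(rgb):
    """Map an (R, G, B) color in 0..255 to its 3-D histogram bin."""
    return tuple(c * BINS // 256 for c in rgb)

# hypothetical labeled pixels: (color, is_skin)
dataset = [((200, 150, 130), True), ((200, 150, 130), True),
           ((60, 90, 40), False), ((210, 160, 140), True)]

for rgb, is_skin in dataset:
    (skin_hist if is_skin else nonskin_hist)[bin_index(rgb)] += 1

# normalize the counts into P(color | skin) and P(color | ~skin)
p_color_given_skin = skin_hist / skin_hist.sum()
p_color_given_nonskin = nonskin_hist / nonskin_hist.sum()
```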


Histogram-Based Approach (Non-parametric model)

• We have P(rgb | skin) & P(rgb | ~skin)
• We need P(skin | rgb) & P(~skin | rgb)


Histogram-Based Approach (Non-parametric model)

• A 3D histogram looks like this: (Rehg & Jones, 1999)
– The viewing direction is along the green-magenta axis, which joins corners (0,255,0) and (255,0,255) in RGB.
– The viewpoint was chosen to orient the gray line horizontally.
– 8 bins in each color channel.
– Only bins with counts greater than 336,818 are shown.


Histogram-Based Approach (Non-parametric model)

• Step-by-step explanation:
– Learning:
1. Using a labeled dataset, for each color X, count the occurrences of X as a skin pixel and as a non-skin pixel: N_skin(X) and N_~skin(X) respectively.
2. Normalize each histogram, for each color X:
P(X | skin) = N_skin(X) / (total skin pixels), and P(X | ~skin) respectively.


Histogram-Based Approach (Non-parametric model)

• Step-by-step explanation (cont'd):
– Learning:
3. Apply Bayes' rule, for each color X:

P(skin | X) = P(X | skin) · P(skin) / P(X)

• We have P(X | skin) from the histogram N_skin.
• P(skin) = (number of skin pixels) / (total pixels in the dataset).
• P(X) = P(X | skin) P(skin) + P(X | ~skin) P(~skin).
• Symmetrically for P(~skin | X).


Histogram-Based Approach (Non-parametric model)

• Step-by-step explanation (cont'd):
– Classification:
• We are given a color X.
• Determine the class with MAP estimation: classify as skin iff P(skin | X) > P(~skin | X).
• Only 2 table look-ups!
– One in the skin histogram and one in the non-skin histogram.


Histogram-Based Approach – Example (Non-parametric model)

• Assume we observed from the dataset:
– 534 skin pixels with the color (100, 100, 100)
– 330 non-skin pixels with the color (100, 100, 100)
– The total number of observed pixels is 10,000:
• 5,000 skin pixels
• 5,000 non-skin pixels

• We get the corresponding probabilities (the skin & non-skin histograms):
– P((100, 100, 100) | skin) = 534/5000 = 0.1068
– P((100, 100, 100) | ~skin) = 330/5000 = 0.066


Example – cont'd

• P((100, 100, 100) | skin) = 0.1068
• P((100, 100, 100) | ~skin) = 0.066

• Using Bayes' rule (reminder: P(A | B) = P(B | A) P(A) / P(B)), with equal priors P(skin) = P(~skin) = 0.5:
– P(skin | (100, 100, 100)) = 0.1068 · 0.5 / (0.1068 · 0.5 + 0.066 · 0.5) ≈ 0.618
– P(~skin | (100, 100, 100)) ≈ 0.382

• P(skin | (100, 100, 100)) is bigger than P(~skin | (100, 100, 100)),
– and so every pixel with the color (100, 100, 100) will be classified as a skin pixel.
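The arithmetic of the worked example can be verified with a quick script:

```python
p_x_given_skin = 534 / 5000       # = 0.1068
p_x_given_nonskin = 330 / 5000    # = 0.066
p_skin = 5000 / 10000             # equal priors: 0.5

p_x = p_x_given_skin * p_skin + p_x_given_nonskin * (1 - p_skin)       # P(X)
post_skin = p_x_given_skin * p_skin / p_x                  # P(skin | X)
post_nonskin = p_x_given_nonskin * (1 - p_skin) / p_x      # P(~skin | X)

print(round(post_skin, 3), round(post_nonskin, 3))  # 0.618 0.382
```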


Parametric vs. Non-parametric Modeling

• Dataset size needed:
– Parametric: can generalize with a rather small dataset.
– Non-parametric: requires a big dataset to achieve good performance.
• Learning:
– Parametric: slower than non-parametric if we use the EM algorithm.
– Non-parametric: fast and simple.
• Classification:
– Parametric: rather slow; need to evaluate (at least) 2 Gaussians per classification.
– Non-parametric: very fast; only 2 table look-ups required.
• Storage space:
– Parametric: disproportionately smaller than the non-parametric model (Rehg & Jones – MoG needed 896 bytes).
– Non-parametric: very big; we explicitly store the distribution for each color (16.7M bins for a 3D color space; Rehg & Jones needed 262 KB).


Bibliography

• "Statistical Color Models with Application to Skin Detection" by Rehg & Jones (1999)
• "Detecting Faces in Color Images" by Hsu, Abdel-Mottaleb & Jain (2002)
• "A Survey on Pixel-Based Skin Color Detection Techniques" by Vezhnevets, Sazonov & Andreeva (2003)
• http://alumni.media.mit.edu/~maov/classes/comp_photo_vision08f/lect/05_skin_detection.pdf
• http://pages.cs.wisc.edu/~lizhang/courses/cs766-2007f/syllabus/10-23-recognition/10-22-recognition.ppt


Eigenfaces

M.A. Turk and A.P. Pentland: "Eigenfaces for Recognition." Journal of Cognitive Neuroscience, 3(1):71–86, 1991.


What is an image?

• An image is a point in a high-dimensional space:
– an N × M image is a point in R^(NM).
• We can define a vector for every image in this space.


The Space of Faces

• Images of faces being similar in overall configuration (like nose, mouth, eyes…) , will not be randomly distributed in this huge image space while the image space is very big (an 200x200 image is point in )

• Therefore, they can be described by a low dimensional subspace.


Eigenfaces – Key Ideas

• Find basis vectors that describe the face space without losing a lot of data.
• Use them to detect faces.

(Eigenfaces look somewhat like generic faces.)


Dimensionality Reduction

• We can represent the yellow points with only their v1 coordinates,
– since their v2 coordinates are all essentially 0.

Motivation:
• This makes it much cheaper to store and compare points.
• It is an even bigger deal for higher-dimensional problems (today there are 8-megapixel images).


The problem:
• In a perfect world we could find a small subspace that describes the face space without losing data (e.g. reduce 2 dimensions (x1, x2) to 1 dimension (y1)).

• But this is not the situation!

• What can we do? Use PCA.


PCA- Principal Component Analysis

The goal of PCA is to reduce the dimensionality of the data while retaining as much information as possible in the original dataset.


Principal Component Analysis (PCA)

Dimensionality reduction:
• PCA allows us to compute a linear transformation that maps data from a high-dimensional space to a lower-dimensional subspace, using a K × N matrix T:

y = T x, where T = [ t_11 … t_1N ; ⋮ ⋱ ⋮ ; t_K1 … t_KN ]

That is:
y_1 = t_11·x_1 + t_12·x_2 + … + t_1N·x_N
y_2 = t_21·x_1 + t_22·x_2 + … + t_2N·x_N
⋮
y_K = t_K1·x_1 + t_K2·x_2 + … + t_KN·x_N
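A sketch of the transform y = Tx with NumPy, where T's rows are taken as the top-K eigenvectors of the sample covariance matrix. The random 5-dimensional data is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # 100 data points in R^5 (illustrative)
X = X - X.mean(axis=0)             # center the data

C = np.cov(X, rowvar=False)        # 5x5 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order

K = 2
T = eigvecs[:, ::-1][:, :K].T      # top-K eigenvectors as rows: K x N
Y = X @ T.T                        # y = T x for every point; shape (100, K)
```

Projecting onto the top eigenvectors is exactly the variance-maximizing choice discussed on the following slides.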


Principal Component Analysis (PCA)

– Dimensionality reduction implies information loss!
– PCA preserves as much information as possible; that is, it minimizes the reconstruction error:

||x − x̂||, where x is the original vector and x̂ is the vector after dimensionality reduction (represented back in the original space).


• PCA assumes that the data follows a Gaussian distribution (mean µ, covariance matrix Σ).

Principal Component Analysis (PCA)


PCA – Example

Consider the variation along a direction v among all of the orange points:
• What unit vector v minimizes the variance?
• What unit vector v maximizes the variance?

(From 2 dimensions to 1 dimension; the component along v2 is the data we lose.)


PCA – Example (cont.)

Solution: var(v) = vᵀ A Aᵀ v, where the columns of A are the centered data points.

• v1 is the eigenvector of AAᵀ with the largest eigenvalue.
• v2 is the eigenvector of AAᵀ with the smallest eigenvalue.

The best low-dimensional space is determined by the "best" eigenvectors of the covariance matrix of x (i.e. the eigenvectors corresponding to the largest eigenvalues, also called "principal components").


Why is it true? Intuition

C = Σ_{i ∈ red points} (x_i, y_i)ᵀ(x_i, y_i) = Σ_i [ x_i²  x_i·y_i ; x_i·y_i  y_i² ] = [ Σx_i²  Σx_i·y_i ; Σx_i·y_i  Σy_i² ]

Σ x_i·y_i ≈ 0, because x and y are independent of each other.

Result:
C = [ Σx_i²  0 ; 0  Σy_i² ]


Intuition (cont.)

• What to do when the data is more complex, i.e. Σ x_i·y_i ≠ 0?
• Rotate the axes until we get the previous (diagonal) case.
• Find the eigenvectors and rotate them back to the original axes.


Use PCA to Find Eigenfaces

Image representation:
• An N×N image is represented by a vector of size N².
• The training images become vectors x₁, x₂, x₃, …, x_M.
• Example: a small image is flattened, pixel by pixel, into one long column vector.


Use PCA to Find Eigenfaces

x₁, …, x_M (very important: the face images must be centered and of the same size)


Example: training images


Use PCA to Find Eigenfaces

Mean: μ


The result: the top eigenvectors u₁, …, u_k


Problem 1: Choosing the Dimension K

• How many eigenfaces (K ≤ NM) should we use?
• Look at the decay of the eigenvalues:
– the eigenvalue tells you the amount of variance "in the direction" of that eigenface
– ignore eigenfaces with low variance


Choosing the Dimension K – Example


Problem 2: Size of the Covariance Matrix C

• Suppose each data point is N-dimensional (N pixels):
– The size of the covariance matrix C is N × N.
– The number of eigenfaces is M′.
– Example: for N = 1024 × 1024 pixels, the size of C will be 1,048,576 × 1,048,576, and the number of eigenvectors will be 1,048,576!

Typically, only 20-30 eigenvectors suffice, so this method is very inefficient!


Efficient Computation of Eigenvectors

For every eigenvector u of AAᵀ with eigenvalue λ:
(AAᵀ)u = λu
⇒ Aᵀ(AAᵀ)u = Aᵀ(λu)
⇒ (AᵀA)(Aᵀu) = λ(Aᵀu)
⇒ v = Aᵀu is an eigenvector of AᵀA.

Find u from v: v = Aᵀu ⇒ Av = AAᵀu = λu ⇒ u = (1/λ)·Av
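The trick can be sketched directly with NumPy: diagonalize the small M×M matrix AᵀA and map its eigenvectors back with u = Av. The image dimensions and the random "images" below are toy values chosen only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)
N2, M = 10000, 8                        # e.g. 100x100-pixel images, 8 of them
A = rng.normal(size=(N2, M))            # columns: training images as vectors
A = A - A.mean(axis=1, keepdims=True)   # subtract the mean image

small = A.T @ A                          # M x M, instead of N2 x N2 for A A^T
vals, V = np.linalg.eigh(small)          # eigenpairs of A^T A (ascending)

U = A @ V                                # u = A v: eigenvectors of A A^T
U = U / np.linalg.norm(U, axis=0)        # normalized eigenfaces (columns)
```

The eigendecomposition now costs O(M³) instead of O(N⁴·…), which is why only the small matrix is ever formed in practice.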


Eigenfaces – summary in words

• Eigenfaces are the eigenvectors of the covariance matrix of the probability distribution of the vector space of human faces

• Eigenfaces are the ‘standardized face ingredients’ derived from the statistical analysis of many pictures of human faces

• A human face may be considered to be a combination of these standardized faces


Eigenfaces – Key Ideas

• Find basis vectors that describe the face space without losing a lot of data.

• Use them to detect faces


Projecting onto the Eigenfaces

• The eigenfaces v₁, …, v_K span the space of faces.
– A face x is converted to eigenface coordinates (a₁, …, a_K) by a_i = v_iᵀ(x − μ).


Projecting onto the Eigenfaces


Detection with Eigenfaces

• Reconstruct the image from its eigenface coordinates: x̂ = μ + a₁v₁ + … + a_Kv_K.
• Classify as a face iff ||x − x̂|| < threshold.
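A sketch of this detection rule; the "eigenfaces" here are random orthonormal vectors standing in for real ones, and the mean face is random too, just to show the projection/reconstruction mechanics.

```python
import numpy as np

rng = np.random.default_rng(2)
N2, K = 100, 5
V, _ = np.linalg.qr(rng.normal(size=(N2, K)))  # orthonormal stand-in eigenfaces
mu = rng.normal(size=N2)                        # stand-in mean face

def is_face(x, V, mu, threshold):
    a = V.T @ (x - mu)      # eigenface coordinates a_i = v_i^T (x - mu)
    x_hat = mu + V @ a      # reconstruction from the K coordinates
    return np.linalg.norm(x - x_hat) < threshold

face_like = mu + V @ rng.normal(size=K)   # lies exactly in the face subspace
print(is_face(face_like, V, mu, threshold=1e-6))  # True (error ~ 0)
```

A point inside the face subspace reconstructs perfectly; a generic point leaves a large residual and is rejected.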


Problem: determining a good threshold

• Option 1: use the training set to check how much data is lost.
• Option 2: use validation data.


Advantages of the approach:

• Fast
• Simple
• Learning ability
• Robust to small changes in the face


Limitations of the Eigenfaces Approach

• Variations in lighting conditions:
– Different lighting conditions for enrollment and query.
– Bright light causing image saturation.
• Light changes degrade performance (though not drastically).
– Light normalization helps.


Limitations of the Eigenfaces Approach

• Performance decreases quickly with changes in face size.
– Multi-scale eigenspaces.
– Scale the input image to multiple sizes.
• Performance decreases with changes in face orientation (but not as quickly as with scale changes).
– In-plane rotations are easier to handle.
– Out-of-plane rotations are more difficult to handle.


Limitations of the Eigenfaces Approach

• Not robust to misalignment.


Reconstruction using the eigenfaces


Viola Jones Algorithm for Face Detection

Paul Viola & Michael J. Jones

First publication (2001), revised and improved (2004).


Viola-Jones Algorithm

• Object-detection algorithm
– Feature-based
– Today: an example implementation as a face detector
– Face detection was the motivation for the algorithm

• First real-time face detector
– Speed is very important!

• Implemented in the OpenCV library
– An improved version of what we will see today


Features – what are they?

• How many features are needed to indicate the existence of a face?

• All faces share some common features:
– The eyes region is darker than the upper cheeks.
– The nose-bridge region is brighter than the eyes.
– That is useful domain knowledge.

• How can we encode such domain features?


Rectangle Features (or Haar-like Features)

• We will look for features inside a 24×24-pixel window.
• Each feature contains black & white rectangles; the feature value is defined by:

∑(pixels in white area) − ∑(pixels in black area)

• Equivalently: the correlation of the image with a mask that has +1 in pixels of white areas and −1 in pixels of black areas, e.g.:

0  0  0  0  0
0  0 -1  1  0
0  0  1 -1  0
0  0  0  0  0


Rectangle Features (Haar Features)

• The basic 4 features are shown on the right.
– All other features are obtained by changing the orientation and/or scale of those 4.


Rectangle Features (Haar Features)

• For a 24x24 detection region, the number of possible rectangle features is ~160,000


Rectangle Features (Haar Features)

• Some features correspond to common facial features. Examples:


Challenges

1) Feature Computation – as fast as possible

2) Feature Selection – too many features, need to select the most informative ones

3) Real-timeliness – focus mainly on potentially positive image areas (potentially faces)


Rectangle Feature Evaluation

• Feature evaluation is one of the basic operations in the Viola-Jones algorithm, which uses it heavily.
• Naively summing each rectangle is not practical.
• We must find a way to evaluate features fast.


Integral Image

• Definition:
– The integral image at location (x,y) is the sum of the pixels above and to the left of (x,y), inclusive.
• The integral image can be computed in a single pass.

Formal definition:
ii(x, y) = ∑_{x'≤x, y'≤y} i(x', y')

Recursive definition:
s(x, y) = s(x, y−1) + i(x, y)
ii(x, y) = ii(x−1, y) + s(x, y)

where i(x, y) is the image, ii(x, y) is its integral image, and s(x, y) is the cumulative row sum (the sum of pixels in row x, columns 1…y).
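The recursive definition, together with the constant-time rectangle sum derived on the following slides, can be sketched in Python. The explicit loop mirrors the slide's recursion (NumPy's cumulative sums would do the same in two lines); the 4×4 test image is illustrative.

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of img[x', y'] for all x' <= x, y' <= y."""
    h, w = img.shape
    s = np.zeros((h, w))   # s(x, y): cumulative sum along row x
    ii = np.zeros((h, w))  # the integral image
    for x in range(h):
        for y in range(w):
            s[x, y] = (s[x, y - 1] if y > 0 else 0) + img[x, y]
            ii[x, y] = (ii[x - 1, y] if x > 0 else 0) + s[x, y]
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of the image over rows x0..x1, cols y0..y1: 4 corner look-ups."""
    total = ii[x1, y1]
    if x0 > 0:
        total -= ii[x0 - 1, y1]
    if y0 > 0:
        total -= ii[x1, y0 - 1]
    if x0 > 0 and y0 > 0:
        total += ii[x0 - 1, y0 - 1]
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 9 + 10 = 30.0
```

Once `ii` is built in a single pass, every rectangle sum costs the same four array accesses regardless of the rectangle's size.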


Computing the Integral Image

A single pass over the image, using the recursive definition:
s(x, y) = s(x, y−1) + i(x, y)
ii(x, y) = ii(x−1, y) + s(x, y)

(i(x, y) is the image, ii(x, y) its integral image, and s(x, y) the cumulative row sum.)


Integral Image – Motivation

• Using the values of the integral image, we can compute any rectangular sum (e.g. the white part of a feature) in constant time.
– Example: the sum of rectangle D can be computed as:
D = ii(d) − ii(b) − ii(c) + ii(a)
since:
ii(a) = A
ii(b) = A + B
ii(c) = A + C
ii(d) = A + B + C + D

• Result: rapid feature evaluation!
– Two-, three- and four-rectangle features can be computed with 6, 8 and 9 array accesses respectively.


Feature Evaluation Using the Integral Image

• ∑(pixels in white area) − ∑(pixels in black area)

Example (A…F are integral-image values at the rectangle corners, with the black square above the white square):
Black square = D − B − C + A
White square = F − D − E + C
White − Black = −A + B + 2C − 2D − E + F


Our achievements – so far

1) Feature computation – as fast as possible
2) Feature selection – select the most informative features
3) Real-timeliness – focus mainly on potentially positive image areas (potential faces)


Feature Selection

• The problem: too many features.
– In a 24×24 sub-window there are ~160,000 possible features.
– It is impractical to evaluate all of the features in every candidate sub-window.

• The solution: select the most informative features.
– How? AdaBoost.


AdaBoost Algorithm

• Introduced by Yoav Freund & Robert E. Schapire in 1995
– They received the Gödel Prize in 2003 for their work
• It is a machine-learning algorithm
• The name stands for Adaptive Boosting
• AdaBoost is used to improve learning algorithms
– It combines several “weak” learners into a “strong” one

Page 107:

AdaBoost Algorithm

• Main idea: create a strong classifier as a weighted linear combination of simple weak classifiers:

h(x) = Σ_t α_t·h_t(x)

(h – strong classifier, h_t – weak classifier, α_t – weight, x – image)

Page 108:

AdaBoost – Intro

• What are our “weak” classifiers?
– Each single rectangle feature is regarded as a “weak” classifier.
• It’s an iterative algorithm
– Iteratively choose the best “weak” classifiers.
– Tweak each “weak” classifier in favor of instances that were misclassified by previous “weak” classifiers (thus Adaptive).
• What about the weights? Learning!

Page 109:

The weak classifiers

• A weak classifier h_j(x) consists of a feature f_j, a threshold θ_j, and a parity p_j indicating the direction of the inequality sign:

h_j(x) = 1  if p_j·f_j(x) < p_j·θ_j
h_j(x) = 0  otherwise

where x is a 24x24 sub-window of an image
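A minimal sketch of this weak classifier (illustrative; the mean-intensity feature below is a hypothetical stand-in for a rectangle feature):

```python
# Sketch of the weak classifier h_j: feature f_j, threshold theta_j,
# and parity p_j in {+1, -1} that flips the inequality's direction.
def weak_classify(f_j, x, theta_j, p_j):
    # Returns 1 ("face") when p_j * f_j(x) < p_j * theta_j, else 0.
    return 1 if p_j * f_j(x) < p_j * theta_j else 0

# Hypothetical feature: mean pixel intensity of the sub-window.
mean_intensity = lambda x: sum(x) / len(x)
print(weak_classify(mean_intensity, [10, 20, 30], theta_j=25.0, p_j=1))  # 1: mean 20 < 25
```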

Page 110:

AdaBoost – How it works

• Given a training set:
– Xi is a 24x24 image, Yi is its label (face/non-face)
– Each Xi has a weight
• Initially, all weights are equal
– These weights will be used to force the chosen “weak” classifiers to focus on the misclassified examples in the training set

Page 111:

AdaBoost – How it works

Given: example images labeled +/–
Initially, all weights are set equally

Repeat T times (T – the number of “weak” classifiers we want):
Step 1: Choose the most efficient weak classifier that will be a component of the final strong classifier.
Step 2: Update the weights of the dataset images to emphasize the examples from the training set which were incorrectly classified. This makes the next weak classifier focus on “harder” examples.
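The two steps can be sketched as a minimal AdaBoost loop. This is a generic discrete-AdaBoost variant with exponential reweighting, not Viola & Jones's exact weight update, and the threshold "stumps" below are stand-ins for rectangle-feature classifiers:

```python
# Minimal AdaBoost training loop (sketch, not the authors' code).
import math

def adaboost(X, y, stumps, T):
    n = len(X)
    w = [1.0 / n] * n                      # all weights equal initially
    model = []                             # list of (alpha, weak classifier)
    for _ in range(T):
        # Step 1: pick the stump with the lowest weighted error.
        def weighted_error(h):
            return sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        best = min(stumps, key=weighted_error)
        err = min(max(weighted_error(best), 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, best))
        # Step 2: up-weight misclassified examples, down-weight the rest,
        # then renormalize so the weights stay a distribution.
        w = [wi * math.exp(alpha if best(xi) != yi else -alpha)
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model
```

The returned (alpha, weak-classifier) pairs are combined by the thresholded weighted vote described on the following slides.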

Page 112:

AdaBoost – Feature Selection

• Step 1 is slow – there is a large set of possible weak classifiers to check (each “weak” classifier is in fact a single feature)
• Which feature to choose?
– Choose the most informative one:
• Test each “weak” classifier on the weighted training set
• Choose the “weak” classifier with the best detection rate

Page 113:

AdaBoost – Feature Selection

• Finally we get a “strong” classifier that is a weighted combination of the best T “weak” classifiers
– The weight of each classifier depends on its detection rate on the training set
– Higher weight for a better classifier

h(x) = 1  if Σ_{t=1..T} α_t·h_t(x) ≥ ½·Σ_{t=1..T} α_t
h(x) = 0  otherwise

(each h_t is a “weak” classifier, α_t is its weight, h is the “strong” classifier, x is a 24x24 image)
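A sketch of this final decision rule; the weak classifiers and weights below are illustrative stand-ins:

```python
# Sketch of the strong classifier: a weighted vote over the T weak
# classifiers, thresholded at half the total weight.
def strong_classify(x, weak_classifiers, alphas):
    vote = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return 1 if vote >= 0.5 * sum(alphas) else 0

# Two toy weak classifiers on a scalar "feature score" input.
hs = [lambda x: 1 if x > 5 else 0,
      lambda x: 1 if x > 2 else 0]
print(strong_classify(10, hs, [0.7, 0.3]))  # 1: vote 1.0 >= 0.5
```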

Page 114:

Boosting illustration

Weak Classifier 1


Page 115:

Boosting illustration

Weights increased


Page 116:

Boosting illustration

Weak Classifier 2


Page 117:

Boosting illustration

Weights increased


Page 118:

Boosting illustration

Weak Classifier 3


Page 119:

Boosting illustration

Final classifier is a combination of weak classifiers


Page 120:

AdaBoost - Conclusion

• AdaBoost selects a small set of informative features and uses them to build a strong classifier

Page 121:

Feature Selection

• Top two features weighted by AdaBoost:

(specific to the training dataset that Viola & Jones used in their experiment)

Page 122:

Our achievements – so far

1) Feature Computation – as fast as possible

2) Feature Selection – select the most informative features

3) Real-timeliness: focus mainly on potentially positive image areas (potentially faces)

Page 123:

Real-timeliness

• On average, only 0.01% of all sub-windows in an image are positives (faces)

• Yet we spend equal time on negative & positive windows

• Can we spend less time on non-faces?

Page 124:

Real-timeliness

• The Attentional Cascade is the answer!
– The idea: cascade classifiers with gradually increasing complexity
• An instance will get to layer 10 only if it passed layers 1–9
• The 1st layer will use, say, 2-3 features to filter out easy-to-find negative windows (non-faces)
• The 2nd layer will use, say, 10 features to filter out more challenging negatives
• And so on…
– Each layer will be a “strong” classifier obtained using AdaBoost
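The early-rejection behaviour can be sketched as follows; the layers here are toy stand-ins for AdaBoost-trained strong classifiers:

```python
# Sketch of attentional-cascade evaluation: a sub-window is rejected at
# the first layer that says "non-face", so most negatives exit after
# only a few cheap feature evaluations.
def cascade_classify(x, layers):
    for layer in layers:
        if layer(x) == 0:
            return 0            # early rejection: stop spending time here
    return 1                    # survived every layer: report a face

# Illustrative layers of increasing strictness on a toy "score" input.
layers = [lambda x: 1 if x > 1 else 0,
          lambda x: 1 if x > 3 else 0,
          lambda x: 1 if x > 5 else 0]
print(cascade_classify(0, layers))  # 0 (rejected by the cheap first layer)
print(cascade_classify(9, layers))  # 1
```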

Page 125:

Cascading classifiers

Page 126:

Training a cascade

• First, we should decide:
– How many layers (strong classifiers)?
– How many features in each layer?
– What is the threshold of each strong classifier?
• What is the optimal combination?
– This is a complex problem

Page 127:

Training a cascade

• Finding the optimum is not practical – any workaround?

• Viola & Jones' goal: no worse than a 95% TP rate and a 10^-6 FP rate
– Viola & Jones suggested an algorithm that:
• does not guarantee optimality, but
• is able to generate a cascade that meets their goal

Page 128:

Training a cascade - outline

• The user selects:
– f_i (maximum acceptable false positive rate per layer)
– d_i (minimum acceptable true positive rate per layer)
– F_target (target overall FP rate)
• It is a trial & error process until the target rates are met:
• Until F_target is met:
– Add a new layer to the cascade; until the f_i, d_i rates are met for this layer:
• Increase the feature number & train a new strong classifier with AdaBoost
• Determine the rates of the updated layer on the training set
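The outline above can be sketched as a loop; train_strong_classifier and evaluate_rates are hypothetical helpers standing in for the AdaBoost training and validation steps:

```python
# Hedged sketch of cascade training: keep adding layers until the overall
# FP rate reaches F_target; within each layer, grow the feature count
# until the per-layer rates f_i, d_i are met.
def train_cascade(f_i, d_i, F_target, train_strong_classifier, evaluate_rates):
    cascade = []
    F_overall = 1.0                       # FP rate of the (empty) cascade
    while F_overall > F_target:
        n_features, fp, tp, layer = 0, 1.0, 0.0, None
        while fp > f_i or tp < d_i:
            n_features += 1               # add features until rates are met
            layer = train_strong_classifier(n_features, cascade)
            fp, tp = evaluate_rates(cascade + [layer])
        cascade.append(layer)
        F_overall *= fp                   # layer FP rates multiply
    return cascade

# Toy stand-ins: a "layer" is just its feature count; rates are simulated.
def train_stub(n_features, cascade):
    return n_features

def rates_stub(layers):
    return (0.3 if layers[-1] >= 2 else 0.8, 0.995)

print(len(train_cascade(0.5, 0.99, 0.01, train_stub, rates_stub)))  # 4 layers
```

With a per-layer FP rate of 0.3, four layers are needed before 0.3^4 ≈ 0.008 drops below the 0.01 target.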

Page 129:

Our achievements – so far

1) Feature Computation – as fast as possible

2) Feature Selection – select the most informative features

3) Real-timeliness: focus mainly on potentially positive image areas (potentially faces)

Page 130:

Viola & Jones Algorithm – Visualization

Training phase: Training set (sub-windows) → Integral representation → Feature computation → AdaBoost feature selection → Cascade trainer

Testing phase (classifier cascade framework): Strong Classifier 1 (cascade stage 1) → Strong Classifier 2 (cascade stage 2) → … → Strong Classifier N (cascade stage N) → FACE IDENTIFIED

Page 131:

Viola & Jones Algorithm visualized

VIOLA & JONES ALGORITHM: OPENCV IMPLEMENTATION VISUALIZATION

Page 132:

Viola & Jones Algorithm

Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces (Viola & Jones 2002):

False detections        10     31     50     65     78     95     167    422
Viola-Jones             76.1%  88.4%  91.4%  92.0%  92.1%  92.9%  93.9%  94.1%
Rowley-Baluja-Kanade    83.2%  86.0%  -      -      89.2%  89.2%  90.1%  89.9%
Schneiderman-Kanade     -      -      -      94.4%  -      -      -      -

Viola & Jones prepared their final detector cascade: 38 layers, 6,060 total features.
1st classifier layer: 2 features, 50% FP rate, 99.9% TP rate
2nd classifier layer: 10 features, 20% FP rate, 99.9% TP rate
The next 2 layers have 25 features each, the next 3 layers 50 features each, and so on…

Tested on the MIT+CMU test set, a 384x288 pixel image on a PC (dated 2001) took about 0.067 seconds to process.