Face Detection

Lecturers: Mor Yakobovits, Roni Karlikar
Supervisor: Hagit Hel-Or


Introduction

Humans can easily detect faces, although faces can be very different from each other.


• Humans also have a tendency to see face patterns even where none really exist.


Faces everywhere

http://www.marcofolio.net/imagedump/faces_everywhere_15_images_8_illusions.html


Face Detection

• The problem of face detection: given an image, determine whether it contains faces or not.

• The idea of face detection in computer vision is to let the computer learn to detect faces in images, just as a human can do.


Applications of Face Detection

• Auto-focus in cameras
• Security systems (recognizing faces of certain people)
• Human-computer interfaces
• Marketing systems
• Much more…


Difficulties of Face Detection

Building a model for faces is not a simple task: faces are complex and vary from one another, and faces in images are also affected by the environment.


Difficulties – Changing Lighting

• Affects color and facial features


Difficulties - Skin Tone

• Large variety of skin tones.


Difficulties - Facial Expressions

• Affects the shape of the face and its features


Difficulties – Scaling and Angles


Difficulties - Obstructions

• Obstruction of facial features


Today’s Lecture

• We will talk about:

– Skin detection

– Eigenfaces

– Viola-Jones algorithm


Today’s Lecture

• All three approaches we'll see today are based on learning.
– The computer learns to detect faces.


Learning - Intro

• The learning model we'll use is a classifier.
– Purpose: classify data into several classes.
– Training phase: let the computer learn the features of each class (face & non-face). This is done using a dataset with example instances of each class (the instances are already classified).
– Classification: given a new instance, tell which class it belongs to.

Example: studying for an exam by solving previous exams.
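The training/classification split described above can be sketched in a few lines of Python. Everything below is an invented toy example, not the lecture's method: the one-dimensional "feature" values, the labels, and the midpoint-threshold rule are all illustrative stand-ins.

```python
# hypothetical 1-D "feature" values with labels; the midpoint-threshold
# rule below is an invented toy classifier, not the lecture's method
def train(examples):
    """Training phase: learn a threshold from labeled (value, label) pairs."""
    faces = [v for v, y in examples if y == 'face']
    others = [v for v, y in examples if y == 'non-face']
    return (sum(faces) / len(faces) + sum(others) / len(others)) / 2

def classify(threshold, value):
    """Classification phase: assign a new, unlabeled instance to a class."""
    return 'face' if value >= threshold else 'non-face'

train_set = [(0.9, 'face'), (0.8, 'face'), (0.2, 'non-face'), (0.1, 'non-face')]
t = train(train_set)         # learn from already-classified instances
print(classify(t, 0.7))      # face
```

The two phases are exactly the ones on the slide: `train` only ever sees labeled instances, `classify` only ever sees a new unlabeled one.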


Face Detection Using Skin Detection

Probabilistic Approach



Skin Detection

• Purpose:
– Find "skin pixels" in a given image.

• The main question:
– How to determine whether a pixel is a "skin pixel"?
– Our approach: teach the computer which colors are skin colors and which are not.


Skin Detection

• Skin detection is a color (pixel)-based approach for detecting faces.

• This approach is quite simple,

• but has limited results due to:
– high sensitivity to illumination and other changes in skin tones
– not only faces containing skin (arms, legs…)
– some objects having colors similar to skin (for example, wooden furniture)


Example of how illumination causes false-negative and false-positive detections.

"Detecting Faces in Color Images" by Hsu, Abdel-Mottaleb & Jain

(a) a yellow-biased face image (b) a light-compensated image (c) skin regions of (a) in white (d) skin regions of (b)


Other examples of false-positive (chair in the top-left corner) and false-negative (dark area of the face in the image of the soccer player) skin detections.

Rehg & Jones (1999)


Skin Colors In RGB Color Space

97% of the skin-color bins overlap with non-skin color bins. A possible explanation: many objects have colors that resemble skin color, like walls, rail tracks, furniture and wooden objects.

Rehg & Jones (1999)


Skin Classifier

• The problem: given a pixel x with color (r,g,b), determine whether it is skin or not.


Skin Classifier

• Given x = (R,G,B), how do we determine its class? (skin/non-skin)

• Nearest neighbor
– find the labeled pixel closest to x
– choose that pixel's class

• Data modeling
– fit a model (curve, surface, or volume) to each class

• Probabilistic data modeling
– fit a probability model to each class
– we'll focus on this approach

(Orange dots – skin, purple dots – non-skin)


Probabilistic Skin Classifier

• Two approaches we'll discuss:
– Gaussian-based (parametric model)
– Histogram-based (non-parametric model)


Parametric modeling

Main idea:
• Assume the type/shape of the distribution we're trying to find.
• Find the parameter values for the assumed type from a training set.


Gaussian-Based Approach (Parametric model)

• Single Gaussian Model
– We assume the probabilistic distribution we are trying to find is a normal distribution (Gaussian function).

• To find that distribution, all we need is:
– μ – the mean of the learned skin colors
– Σ – the covariance matrix of the learned skin colors

These parameters are estimated separately for each class.

(x is a color vector!)


Gaussian-Based Approach (Parametric model)

• After we have the mean & covariance, we get:

P(x | j) = (2π)^(-3/2) |Σ_j|^(-1/2) · exp(-½ (x - μ_j)ᵀ Σ_j⁻¹ (x - μ_j))

• where μ_j is the mean vector and Σ_j is the covariance matrix of class j
– for j = skin and j = non-skin
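The single-Gaussian model can be sketched with NumPy: estimate μ and Σ from training colors, then evaluate the class-conditional density. The four skin-color samples below are made up for illustration; only P(rgb | skin) is computed here (the non-skin class would be handled identically).

```python
import numpy as np

# four hypothetical skin-color training pixels (R, G, B)
skin_pixels = np.array([[200, 150, 130],
                        [210, 155, 120],
                        [190, 160, 140],
                        [205, 140, 135]], dtype=float)

mu = skin_pixels.mean(axis=0)              # mean skin color
sigma = np.cov(skin_pixels, rowvar=False)  # 3x3 covariance matrix

def gaussian_likelihood(x, mu, sigma):
    """Multivariate normal density N(x; mu, sigma)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

# class-conditional likelihood P(rgb | skin) for a query color
p = gaussian_likelihood(np.array([202.0, 152.0, 132.0]), mu, sigma)
```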


What we have

• P(rgb | skin) & P(rgb | ~skin)
– "the probability that a (non-)skin pixel will have the color rgb"

But that’s not what we want.


What we need

• We need P(skin | rgb) & P(~skin | rgb)
– "the probability that a pixel with the color rgb is (non-)skin"

Once we have those, we can use MAP estimation.

Remember Bayes' Rule?


Bayes' Rule

P(skin | R) = P(R | skin) · P(skin) / P(R)

P(skin) is the proportion of skin pixels among all pixels in the learning dataset.

P(R) can be calculated using the probabilities we already have:
P(R) = P(R | skin) P(skin) + P(R | ~skin) P(~skin)
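As a minimal sketch, Bayes' rule and the computation of P(R) translate directly to code. The likelihood and prior values below are illustrative numbers, not measurements.

```python
def posterior_skin(p_rgb_given_skin, p_rgb_given_nonskin, p_skin):
    """Bayes' rule: P(skin | rgb) from the two likelihoods and the prior."""
    p_nonskin = 1.0 - p_skin
    p_rgb = p_rgb_given_skin * p_skin + p_rgb_given_nonskin * p_nonskin  # P(R)
    return p_rgb_given_skin * p_skin / p_rgb

# illustrative numbers: likelihoods 0.10 vs 0.05, prior P(skin) = 0.4
p = posterior_skin(0.10, 0.05, 0.4)
# MAP: classify as skin iff P(skin | rgb) > P(~skin | rgb), i.e. p > 0.5
```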


MAP Estimation (Maximum A Posteriori estimation)

Classification: a pixel with color R will be classified as skin iff P(skin | R) is higher than P(~skin | R).

MAP estimation maximizes the posterior probability, and so minimizes the probability of misclassification.

(False-negative misclassification: a skin pixel classified as non-skin.)


Another Gaussian-Based Approach (Parametric model)

• Problem with the Single Gaussian Model:
– The actual skin distribution might be too complex to be represented by a single Gaussian distribution.

• Solution: Mixture of Gaussians (MoG)
– Represent the distribution with several different Gaussian distributions, allowing more flexible modeling of the distribution.


Skin Color Distribution in HSV Color Space

– HSV (Hue, Saturation, Value) separates color components from intensity (in RGB, intensity affects all channels).
– Not the best color space for color-based approaches, but conversion from RGB is very simple compared to the better color spaces.


Gaussian-Based Approach (Parametric model)

• In the case of Mixture of Gaussians, each class likelihood is a weighted sum of Gaussians (one mixed model for skin and one for non-skin).

• Drawbacks:
– Slower learning, because we need the EM algorithm to estimate the MoG.
– Slower classification, since it requires evaluating all of the Gaussians.

Classification: use Bayes' rule and then MAP.
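Evaluating a Mixture-of-Gaussians likelihood is just a weighted sum of Gaussian densities. A one-dimensional sketch follows; the component weights, means, and variances are made up for illustration (in practice they would be fitted with the EM algorithm, as noted above).

```python
import numpy as np

def mog_likelihood(x, weights, means, variances):
    """1-D MoG density: sum_k w_k * N(x; mu_k, var_k)."""
    total = 0.0
    for w, m, v in zip(weights, means, variances):
        total += w * np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
    return total

# hypothetical 2-component skin model along one (normalized) color axis
p = mog_likelihood(0.6, weights=[0.7, 0.3], means=[0.55, 0.8], variances=[0.01, 0.02])
```

Note the classification-time cost the slide mentions: every query evaluates every component of every class.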


MoG vs. Single Gaussian

(Figure: the training-set distribution, the MoG fit, and the single-Gaussian fit.)


EM algorithm – Expectation Maximization algorithm


Non-parametric modeling

Main idea:
• Do not assume anything about the distribution we are looking for.
• Derive the distribution directly from the dataset.


Histogram-Based Approach (Non-parametric model)

• Learn from a labeled dataset
– for each color bin (256×256×256 ≈ 16.7M bins in RGB), count:
• how many pixels of that color were skin
• how many pixels of that color were non-skin

• We get a skin histogram, and an equivalent histogram for non-skin pixels.

(Our histograms will have three dimensions.)
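The counting-and-normalizing step can be sketched as follows. To keep the tables small, this sketch quantizes each channel to 32 bins rather than 256, and the labeled pixels are invented for illustration.

```python
import numpy as np

BINS = 32                      # quantize each channel to 32 bins (not 256)
skin_hist = np.zeros((BINS, BINS, BINS))
nonskin_hist = np.zeros((BINS, BINS, BINS))

def bin_index(rgb):
    """Map an (R, G, B) color in 0..255 to its 3-D histogram bin."""
    return tuple(c * BINS // 256 for c in rgb)

# hypothetical labeled pixels: (color, is_skin)
dataset = [((200, 150, 130), True), ((200, 150, 130), True),
           ((60, 90, 40), False), ((210, 160, 140), True)]

for rgb, is_skin in dataset:
    (skin_hist if is_skin else nonskin_hist)[bin_index(rgb)] += 1

# normalize the counts into P(color | skin) and P(color | ~skin)
p_color_given_skin = skin_hist / skin_hist.sum()
p_color_given_nonskin = nonskin_hist / nonskin_hist.sum()
```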


Histogram-Based Approach (Non-parametric model)

• We have P(rgb | skin) & P(rgb | ~skin)
• We need P(skin | rgb) & P(~skin | rgb)


Histogram-Based Approach (Non-parametric model)

• A 3D histogram looks like this: (Rehg & Jones, 1999)
– The viewing direction is along the green-magenta axis, which joins corners (0,255,0) and (255,0,255) in RGB.
– The viewpoint was chosen to orient the gray line horizontally.
– 8 bins in each color channel.
– Only bins with counts greater than 336,818 are shown.


Histogram-Based Approach (Non-parametric model)

• Step-by-step explanation:
– Learning:
1. Using a labeled dataset, for each color X, count the occurrences of X as a skin pixel and as a non-skin pixel: N_skin(X) and N_~skin(X) respectively.
2. Normalize each histogram, for each color X:
P(X | skin) = N_skin(X) / (total skin pixels), and P(X | ~skin) respectively.


Histogram-Based Approach (Non-parametric model)

• Step-by-step explanation (cont'd):
– Learning:
3. Apply Bayes' rule, for each color X:

P(skin | X) = P(X | skin) · P(skin) / P(X)

• We have P(X | skin) from the histogram N_skin.
• P(skin) = (number of skin pixels) / (total pixels in the dataset).
• P(X) = P(X | skin) P(skin) + P(X | ~skin) P(~skin).
• Symmetrically for P(~skin | X).


Histogram-Based Approach (Non-parametric model)

• Step-by-step explanation (cont'd):
– Classification:
• We are given a color X.
• Determine the class with MAP estimation: classify as skin iff P(skin | X) > P(~skin | X).
• Only 2 table look-ups!
– One in the skin histogram and one in the non-skin histogram.


Histogram-Based Approach – Example (Non-parametric model)

• Assume we observed from the dataset:
– 534 skin pixels with the color (100, 100, 100)
– 330 non-skin pixels with the color (100, 100, 100)
– The total number of observed pixels is 10,000:
• 5,000 skin pixels
• 5,000 non-skin pixels

• We get the corresponding probabilities (the skin & non-skin histograms):
– P((100, 100, 100) | skin) = 534/5000 = 0.1068
– P((100, 100, 100) | ~skin) = 330/5000 = 0.066


Example – cont'd

• P((100, 100, 100) | skin) = 0.1068
• P((100, 100, 100) | ~skin) = 0.066

• Using Bayes' rule (reminder: P(A | B) = P(B | A) P(A) / P(B)), with equal priors P(skin) = P(~skin) = 0.5:
– P(skin | (100, 100, 100)) = 0.1068 · 0.5 / (0.1068 · 0.5 + 0.066 · 0.5) ≈ 0.618
– P(~skin | (100, 100, 100)) ≈ 0.382

• P(skin | (100, 100, 100)) is bigger than P(~skin | (100, 100, 100)),
– and so every pixel with the color (100, 100, 100) will be classified as a skin pixel.
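The arithmetic of the worked example can be verified with a quick script:

```python
p_x_given_skin = 534 / 5000       # = 0.1068
p_x_given_nonskin = 330 / 5000    # = 0.066
p_skin = 5000 / 10000             # equal priors: 0.5

p_x = p_x_given_skin * p_skin + p_x_given_nonskin * (1 - p_skin)       # P(X)
post_skin = p_x_given_skin * p_skin / p_x                  # P(skin | X)
post_nonskin = p_x_given_nonskin * (1 - p_skin) / p_x      # P(~skin | X)

print(round(post_skin, 3), round(post_nonskin, 3))  # 0.618 0.382
```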


Parametric vs. Non-parametric Modeling

• Dataset size needed:
– Parametric: can generalize with a rather small dataset.
– Non-parametric: requires a big dataset to achieve good performance.
• Learning:
– Parametric: slower than non-parametric if we use the EM algorithm.
– Non-parametric: fast and simple.
• Classification:
– Parametric: rather slow; need to evaluate (at least) 2 Gaussians per classification.
– Non-parametric: very fast; only 2 table look-ups required.
• Storage space:
– Parametric: disproportionately smaller than the non-parametric model (Rehg & Jones – MoG needed 896 bytes).
– Non-parametric: very big; we explicitly store the distribution for each color (16.7M bins for a 3D color space; Rehg & Jones needed 262 KB).


Bibliography

• "Statistical Color Models with Application to Skin Detection" by Rehg & Jones (1999)
• "Detecting Faces in Color Images" by Hsu, Abdel-Mottaleb & Jain (2002)
• "A Survey on Pixel-Based Skin Color Detection Techniques" by Vezhnevets, Sazonov & Andreeva (2003)
• http://alumni.media.mit.edu/~maov/classes/comp_photo_vision08f/lect/05_skin_detection.pdf
• http://pages.cs.wisc.edu/~lizhang/courses/cs766-2007f/syllabus/10-23-recognition/10-22-recognition.ppt


Eigenfaces

M.A. Turk and A.P. Pentland: "Eigenfaces for Recognition." Journal of Cognitive Neuroscience, 3(1):71–86, 1991.


What is an image?

• An image is a point in a high-dimensional space:
– an N × M image is a point in R^(NM).
• We can define a vector for every image in this space.


The Space of Faces

• Images of faces being similar in overall configuration (like nose, mouth, eyes…) , will not be randomly distributed in this huge image space while the image space is very big (an 200x200 image is point in )

• Therefore, they can be described by a low dimensional subspace.


Eigenfaces – Key Ideas

• Find basis vectors that describe the face space without losing a lot of data.
• Use them to detect faces.

(Eigenfaces look somewhat like generic faces.)


Dimensionality Reduction

• We can represent the yellow points with only their v1 coordinates,
– since their v2 coordinates are all essentially 0.

Motivation:
• This makes it much cheaper to store and compare points.
• It is an even bigger deal for higher-dimensional problems (today there are 8-megapixel images).


The problem:
• In a perfect world we could find a small subspace that describes the face space without losing data (e.g. reduce 2 dimensions (x1, x2) to 1 dimension (y1)).

• But this is not the situation!

• What can we do? Use PCA.


PCA- Principal Component Analysis

The goal of PCA is to reduce the dimensionality of the data while retaining as much information as possible in the original dataset.


Principal Component Analysis (PCA)

Dimensionality reduction:
• PCA allows us to compute a linear transformation that maps data from a high-dimensional space to a lower-dimensional subspace, using a K × N matrix T:

y = T x, where T = [ t_11 … t_1N ; ⋮ ⋱ ⋮ ; t_K1 … t_KN ]

That is:
y_1 = t_11·x_1 + t_12·x_2 + … + t_1N·x_N
y_2 = t_21·x_1 + t_22·x_2 + … + t_2N·x_N
⋮
y_K = t_K1·x_1 + t_K2·x_2 + … + t_KN·x_N
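A sketch of the transform y = Tx with NumPy, where T's rows are taken as the top-K eigenvectors of the sample covariance matrix. The random 5-dimensional data is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # 100 data points in R^5 (illustrative)
X = X - X.mean(axis=0)             # center the data

C = np.cov(X, rowvar=False)        # 5x5 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order

K = 2
T = eigvecs[:, ::-1][:, :K].T      # top-K eigenvectors as rows: K x N
Y = X @ T.T                        # y = T x for every point; shape (100, K)
```

Projecting onto the top eigenvectors is exactly the variance-maximizing choice discussed on the following slides.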


Principal Component Analysis (PCA)

– Dimensionality reduction implies information loss!
– PCA preserves as much information as possible; that is, it minimizes the reconstruction error:

||x − x̂||, where x is the original vector and x̂ is the vector after dimensionality reduction (represented back in the original space).


• PCA assumes that the data follows a Gaussian distribution (mean µ, covariance matrix Σ).

Principal Component Analysis (PCA)


PCA – Example

Consider the variation along a direction v among all of the orange points:
• What unit vector v minimizes the variance?
• What unit vector v maximizes the variance?

(From 2 dimensions to 1 dimension; the component along v2 is the data we lose.)


PCA – Example (cont.)

Solution: var(v) = vᵀ A Aᵀ v, where the columns of A are the centered data points.

• v1 is the eigenvector of AAᵀ with the largest eigenvalue.
• v2 is the eigenvector of AAᵀ with the smallest eigenvalue.

The best low-dimensional space is determined by the "best" eigenvectors of the covariance matrix of x (i.e. the eigenvectors corresponding to the largest eigenvalues, also called "principal components").


Why is it true? Intuition

C = Σ_{i ∈ red points} (x_i, y_i)ᵀ(x_i, y_i) = Σ_i [ x_i²  x_i·y_i ; x_i·y_i  y_i² ] = [ Σx_i²  Σx_i·y_i ; Σx_i·y_i  Σy_i² ]

Σ x_i·y_i ≈ 0, because x and y are independent of each other.

Result:
C = [ Σx_i²  0 ; 0  Σy_i² ]


Intuition (cont.)

• What to do when the data is more complex, i.e. Σ x_i·y_i ≠ 0?
• Rotate the axes until we get the previous (diagonal) case.
• Find the eigenvectors and rotate them back to the original axes.


Use PCA to Find Eigenfaces

Image representation:
• An N×N image is represented by a vector of size N².
• The training images become vectors x₁, x₂, x₃, …, x_M.
• Example: a small image is flattened, pixel by pixel, into one long column vector.


Use PCA to Find Eigenfaces

x₁, …, x_M (very important: the face images must be centered and of the same size)


Example: training images


Use PCA to Find Eigenfaces

Mean: μ


The result: the top eigenvectors u₁, …, u_k


Problem 1: Choosing the Dimension K

• How many eigenfaces (K ≤ NM) should we use?
• Look at the decay of the eigenvalues:
– the eigenvalue tells you the amount of variance "in the direction" of that eigenface
– ignore eigenfaces with low variance


Choosing the Dimension K – Example


Problem 2: Size of the Covariance Matrix C

• Suppose each data point is N-dimensional (N pixels):
– The size of the covariance matrix C is N × N.
– The number of eigenfaces is M′.
– Example: for N = 1024 × 1024 pixels, the size of C will be 1,048,576 × 1,048,576, and the number of eigenvectors will be 1,048,576!

Typically, only 20-30 eigenvectors suffice, so this method is very inefficient!


Efficient Computation of Eigenvectors

For every eigenvector u of AAᵀ with eigenvalue λ:
(AAᵀ)u = λu
⇒ Aᵀ(AAᵀ)u = Aᵀ(λu)
⇒ (AᵀA)(Aᵀu) = λ(Aᵀu)
⇒ v = Aᵀu is an eigenvector of AᵀA.

Find u from v: v = Aᵀu ⇒ Av = AAᵀu = λu ⇒ u = (1/λ)·Av
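The trick can be sketched directly with NumPy: diagonalize the small M×M matrix AᵀA and map its eigenvectors back with u = Av. The image dimensions and the random "images" below are toy values chosen only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)
N2, M = 10000, 8                        # e.g. 100x100-pixel images, 8 of them
A = rng.normal(size=(N2, M))            # columns: training images as vectors
A = A - A.mean(axis=1, keepdims=True)   # subtract the mean image

small = A.T @ A                          # M x M, instead of N2 x N2 for A A^T
vals, V = np.linalg.eigh(small)          # eigenpairs of A^T A (ascending)

U = A @ V                                # u = A v: eigenvectors of A A^T
U = U / np.linalg.norm(U, axis=0)        # normalized eigenfaces (columns)
```

The eigendecomposition now costs O(M³) instead of O(N⁴·…), which is why only the small matrix is ever formed in practice.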


Eigenfaces – summary in words

• Eigenfaces are the eigenvectors of the covariance matrix of the probability distribution of the vector space of human faces

• Eigenfaces are the ‘standardized face ingredients’ derived from the statistical analysis of many pictures of human faces

• A human face may be considered to be a combination of these standardized faces


Eigenfaces – Key Ideas

• Find basis vectors that describe the face space without losing a lot of data.

• Use them to detect faces


Projecting onto the Eigenfaces

• The eigenfaces v₁, …, v_K span the space of faces.
– A face x is converted to eigenface coordinates (a₁, …, a_K) by a_i = v_iᵀ(x − μ).


Projecting onto the Eigenfaces


Detection with Eigenfaces

• Reconstruct the image from its eigenface coordinates: x̂ = μ + a₁v₁ + … + a_Kv_K.
• Classify as a face iff ||x − x̂|| < threshold.
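A sketch of this detection rule; the "eigenfaces" here are random orthonormal vectors standing in for real ones, and the mean face is random too, just to show the projection/reconstruction mechanics.

```python
import numpy as np

rng = np.random.default_rng(2)
N2, K = 100, 5
V, _ = np.linalg.qr(rng.normal(size=(N2, K)))  # orthonormal stand-in eigenfaces
mu = rng.normal(size=N2)                        # stand-in mean face

def is_face(x, V, mu, threshold):
    a = V.T @ (x - mu)      # eigenface coordinates a_i = v_i^T (x - mu)
    x_hat = mu + V @ a      # reconstruction from the K coordinates
    return np.linalg.norm(x - x_hat) < threshold

face_like = mu + V @ rng.normal(size=K)   # lies exactly in the face subspace
print(is_face(face_like, V, mu, threshold=1e-6))  # True (error ~ 0)
```

A point inside the face subspace reconstructs perfectly; a generic point leaves a large residual and is rejected.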


Problem: determining a good threshold

• Option 1: use the training set to check how much data is lost.
• Option 2: use validation data.


Advantages of the approach:

• Fast
• Simple
• Learning ability
• Robust to small changes in the face


Limitations of the Eigenfaces Approach

• Variations in lighting conditions:
– Different lighting conditions for enrollment and query.
– Bright light causing image saturation.
• Light changes degrade performance (though not drastically).
– Light normalization helps.


Limitations of the Eigenfaces Approach

• Performance decreases quickly with changes in face size.
– Multi-scale eigenspaces.
– Scale the input image to multiple sizes.
• Performance decreases with changes in face orientation (but not as quickly as with scale changes).
– In-plane rotations are easier to handle.
– Out-of-plane rotations are more difficult to handle.


Limitations of the Eigenfaces Approach

• Not robust to misalignment.


Reconstruction using the eigenfaces


Viola Jones Algorithm for Face Detection

Paul Viola & Michael J. Jones

First publication (2001), revised and improved (2004).


Viola-Jones Algorithm

• Object-detection algorithm
– Feature-based
– Today: an example implementation as a face detector
– Face detection was the motivation for the algorithm

• First real-time face detector
– Speed is very important!

• Implemented in the OpenCV library
– An improved version of what we will see today


Features – what are they?

• How many features are needed to indicate the existence of a face?

• All faces share some common features:
– The eyes region is darker than the upper cheeks.
– The nose-bridge region is brighter than the eyes.
– That is useful domain knowledge.

• How can we encode such domain features?


Rectangle Features (or Haar-like Features)

• We will look for features inside a 24×24-pixel window.
• Each feature contains black & white rectangles; the feature value is defined by:

∑(pixels in white area) − ∑(pixels in black area)

• Equivalently: the correlation of the image with a mask that has +1 in pixels of white areas and −1 in pixels of black areas, e.g.:

0  0  0  0  0
0  0 -1  1  0
0  0  1 -1  0
0  0  0  0  0


Rectangle Features (Haar Features)

• The basic 4 features are shown on the right.
– All other features are obtained by changing the orientation and/or scale of those 4.


Rectangle Features (Haar Features)

• For a 24x24 detection region, the number of possible rectangle features is ~160,000


Rectangle Features (Haar Features)

• Some features correspond to common facial features. Examples:


Challenges

1) Feature Computation – as fast as possible

2) Feature Selection – too many features, need to select the most informative ones

3) Real-timeliness – focus mainly on potentially positive image areas (potentially faces)


Rectangle Feature Evaluation

• Feature evaluation is one of the basic operations in the Viola-Jones algorithm, which uses it heavily.
• Naively summing each rectangle is not practical.
• We must find a way to evaluate features fast.


Integral Image

• Definition:
– The integral image at location (x,y) is the sum of the pixels above and to the left of (x,y), inclusive.
• The integral image can be computed in a single pass.

Formal definition:
ii(x, y) = ∑_{x'≤x, y'≤y} i(x', y')

Recursive definition:
s(x, y) = s(x, y−1) + i(x, y)
ii(x, y) = ii(x−1, y) + s(x, y)

where i(x, y) is the image, ii(x, y) is its integral image, and s(x, y) is the cumulative row sum (the sum of pixels in row x, columns 1…y).
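The recursive definition, together with the constant-time rectangle sum derived on the following slides, can be sketched in Python. The explicit loop mirrors the slide's recursion (NumPy's cumulative sums would do the same in two lines); the 4×4 test image is illustrative.

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of img[x', y'] for all x' <= x, y' <= y."""
    h, w = img.shape
    s = np.zeros((h, w))   # s(x, y): cumulative sum along row x
    ii = np.zeros((h, w))  # the integral image
    for x in range(h):
        for y in range(w):
            s[x, y] = (s[x, y - 1] if y > 0 else 0) + img[x, y]
            ii[x, y] = (ii[x - 1, y] if x > 0 else 0) + s[x, y]
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of the image over rows x0..x1, cols y0..y1: 4 corner look-ups."""
    total = ii[x1, y1]
    if x0 > 0:
        total -= ii[x0 - 1, y1]
    if y0 > 0:
        total -= ii[x1, y0 - 1]
    if x0 > 0 and y0 > 0:
        total += ii[x0 - 1, y0 - 1]
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 9 + 10 = 30.0
```

Once `ii` is built in a single pass, every rectangle sum costs the same four array accesses regardless of the rectangle's size.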


Computing the Integral Image

A single pass over the image, using the recursive definition:
s(x, y) = s(x, y−1) + i(x, y)
ii(x, y) = ii(x−1, y) + s(x, y)

(i(x, y) is the image, ii(x, y) its integral image, and s(x, y) the cumulative row sum.)


Integral Image – Motivation

• Using the values of the integral image, we can compute any rectangular sum (e.g. the white part of a feature) in constant time.
– Example: the sum of rectangle D can be computed as:
D = ii(d) − ii(b) − ii(c) + ii(a)
since:
ii(a) = A
ii(b) = A + B
ii(c) = A + C
ii(d) = A + B + C + D

• Result: rapid feature evaluation!
– Two-, three- and four-rectangle features can be computed with 6, 8 and 9 array accesses respectively.


Feature Evaluation Using the Integral Image

• ∑(pixels in white area) − ∑(pixels in black area)

Example (A…F are integral-image values at the rectangle corners, with the black square above the white square):
Black square = D − B − C + A
White square = F − D − E + C
White − Black = −A + B + 2C − 2D − E + F


Our achievements – so far

1) Feature computation – as fast as possible
2) Feature selection – select the most informative features
3) Real-timeliness – focus mainly on potentially positive image areas (potential faces)


Feature Selection

• The problem: too many features.
– In a 24×24 sub-window there are ~160,000 possible features.
– It is impractical to evaluate all of the features in every candidate sub-window.

• The solution: select the most informative features.
– How? AdaBoost.


AdaBoost Algorithm

• Introduced by Yoav Freund & Robert E. Schapire in 1995
– They received the Gödel Prize in 2003 for their work
• It is a machine-learning algorithm
• The name stands for Adaptive Boosting
• AdaBoost is used to improve learning algorithms
– It combines several “weak” learners into a “strong” one

Page 107:

AdaBoost Algorithm

• Main idea: create a strong classifier as a weighted linear combination of simple weak classifiers:

h(x) = Σ_t α_t·h_t(x)

(h – strong classifier, h_t – weak classifier, α_t – weight, x – image)

Page 108:

AdaBoost – Intro

• What are our “weak” classifiers?
– Each single rectangle feature is regarded as a “weak” classifier.
• It’s an iterative algorithm
– Iteratively choose the best “weak” classifiers.
– Tweak each “weak” classifier in favor of instances that were misclassified by previous “weak” classifiers (thus Adaptive).
• What about the weights? Learning!

Page 109:

The weak classifiers

• A weak classifier h_j(x) consists of a feature f_j, a threshold θ_j, and a parity p_j indicating the direction of the inequality sign:

h_j(x) = 1  if p_j·f_j(x) < p_j·θ_j
h_j(x) = 0  otherwise

where x is a 24x24 sub-window of an image
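A minimal sketch of this weak classifier (illustrative; the mean-intensity feature below is a hypothetical stand-in for a rectangle feature):

```python
# Sketch of the weak classifier h_j: feature f_j, threshold theta_j,
# and parity p_j in {+1, -1} that flips the inequality's direction.
def weak_classify(f_j, x, theta_j, p_j):
    # Returns 1 ("face") when p_j * f_j(x) < p_j * theta_j, else 0.
    return 1 if p_j * f_j(x) < p_j * theta_j else 0

# Hypothetical feature: mean pixel intensity of the sub-window.
mean_intensity = lambda x: sum(x) / len(x)
print(weak_classify(mean_intensity, [10, 20, 30], theta_j=25.0, p_j=1))  # 1: mean 20 < 25
```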

Page 110:

AdaBoost – How it works

• Given a training set:
– Xi is a 24x24 image, Yi is its label (face/non-face)
– Each Xi has a weight
• Initially, all weights are equal
– These weights will be used to force the chosen “weak” classifiers to focus on the misclassified examples in the training set

Page 111:

AdaBoost – How it works

Given: example images labeled +/–
Initially, all weights are set equally

Repeat T times (T – the number of “weak” classifiers we want):
Step 1: Choose the most efficient weak classifier that will be a component of the final strong classifier.
Step 2: Update the weights of the dataset images to emphasize the examples from the training set which were incorrectly classified. This makes the next weak classifier focus on “harder” examples.
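The two steps can be sketched as a minimal AdaBoost loop. This is a generic discrete-AdaBoost variant with exponential reweighting, not Viola & Jones's exact weight update, and the threshold "stumps" below are stand-ins for rectangle-feature classifiers:

```python
# Minimal AdaBoost training loop (sketch, not the authors' code).
import math

def adaboost(X, y, stumps, T):
    n = len(X)
    w = [1.0 / n] * n                      # all weights equal initially
    model = []                             # list of (alpha, weak classifier)
    for _ in range(T):
        # Step 1: pick the stump with the lowest weighted error.
        def weighted_error(h):
            return sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        best = min(stumps, key=weighted_error)
        err = min(max(weighted_error(best), 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, best))
        # Step 2: up-weight misclassified examples, down-weight the rest,
        # then renormalize so the weights stay a distribution.
        w = [wi * math.exp(alpha if best(xi) != yi else -alpha)
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model
```

The returned (alpha, weak-classifier) pairs are combined by the thresholded weighted vote described on the following slides.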

Page 112:

AdaBoost – Feature Selection

• Step 1 is slow – there is a large set of possible weak classifiers to check (each “weak” classifier is in fact a single feature)
• Which feature to choose?
– Choose the most informative one:
• Test each “weak” classifier on the weighted training set
• Choose the “weak” classifier with the best detection rate

Page 113:

AdaBoost – Feature Selection

• Finally we get a “strong” classifier that is a weighted combination of the best T “weak” classifiers
– The weight of each classifier depends on its detection rate on the training set
– Higher weight for a better classifier

h(x) = 1  if Σ_{t=1..T} α_t·h_t(x) ≥ ½·Σ_{t=1..T} α_t
h(x) = 0  otherwise

(each h_t is a “weak” classifier, α_t is its weight, h is the “strong” classifier, x is a 24x24 image)
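A sketch of this final decision rule; the weak classifiers and weights below are illustrative stand-ins:

```python
# Sketch of the strong classifier: a weighted vote over the T weak
# classifiers, thresholded at half the total weight.
def strong_classify(x, weak_classifiers, alphas):
    vote = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return 1 if vote >= 0.5 * sum(alphas) else 0

# Two toy weak classifiers on a scalar "feature score" input.
hs = [lambda x: 1 if x > 5 else 0,
      lambda x: 1 if x > 2 else 0]
print(strong_classify(10, hs, [0.7, 0.3]))  # 1: vote 1.0 >= 0.5
```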

Page 114:

Boosting illustration

Weak Classifier 1


Page 115:

Boosting illustration

Weights increased


Page 116:

Boosting illustration

Weak Classifier 2


Page 117:

Boosting illustration

Weights increased


Page 118:

Boosting illustration

Weak Classifier 3


Page 119:

Boosting illustration

Final classifier is a combination of weak classifiers


Page 120:

AdaBoost - Conclusion

• AdaBoost selects a small set of informative features and uses them to build a strong classifier

Page 121:

Feature Selection

• Top two features weighted by AdaBoost:

(specific to the training dataset that Viola & Jones used in their experiment)

Page 122:

Our achievements – so far

1) Feature Computation – as fast as possible

2) Feature Selection – select the most informative features

3) Real-timeliness: focus mainly on potentially positive image areas (potentially faces)

Page 123:

Real-timeliness

• On average, only 0.01% of all sub-windows in an image are positives (faces)

• Yet we spend equal time on negative & positive windows

• Can we spend less time on non-faces?

Page 124:

Real-timeliness

• The Attentional Cascade is the answer!
– The idea: cascade classifiers with gradually increasing complexity
• An instance will get to layer 10 only if it passed layers 1–9
• The 1st layer will use, say, 2-3 features to filter out easy-to-find negative windows (non-faces)
• The 2nd layer will use, say, 10 features to filter out more challenging negatives
• And so on…
– Each layer will be a “strong” classifier obtained using AdaBoost
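The early-rejection behaviour can be sketched as follows; the layers here are toy stand-ins for AdaBoost-trained strong classifiers:

```python
# Sketch of attentional-cascade evaluation: a sub-window is rejected at
# the first layer that says "non-face", so most negatives exit after
# only a few cheap feature evaluations.
def cascade_classify(x, layers):
    for layer in layers:
        if layer(x) == 0:
            return 0            # early rejection: stop spending time here
    return 1                    # survived every layer: report a face

# Illustrative layers of increasing strictness on a toy "score" input.
layers = [lambda x: 1 if x > 1 else 0,
          lambda x: 1 if x > 3 else 0,
          lambda x: 1 if x > 5 else 0]
print(cascade_classify(0, layers))  # 0 (rejected by the cheap first layer)
print(cascade_classify(9, layers))  # 1
```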

Page 125:

Cascading classifiers

Page 126:

Training a cascade

• First, we should decide:
– How many layers (strong classifiers)?
– How many features in each layer?
– What is the threshold of each strong classifier?
• What is the optimal combination?
– This is a complex problem

Page 127:

Training a cascade

• Finding the optimum is not practical – any workaround?

• Viola & Jones' goal: no worse than a 95% TP rate and a 10^-6 FP rate
– Viola & Jones suggested an algorithm that:
• does not guarantee optimality, but
• is able to generate a cascade that meets their goal

Page 128:

Training a cascade - outline

• The user selects:
– f_i (maximum acceptable false positive rate per layer)
– d_i (minimum acceptable true positive rate per layer)
– F_target (target overall FP rate)
• It is a trial & error process until the target rates are met:
• Until F_target is met:
– Add a new layer to the cascade; until the f_i, d_i rates are met for this layer:
• Increase the feature number & train a new strong classifier with AdaBoost
• Determine the rates of the updated layer on the training set
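The outline above can be sketched as a loop; train_strong_classifier and evaluate_rates are hypothetical helpers standing in for the AdaBoost training and validation steps:

```python
# Hedged sketch of cascade training: keep adding layers until the overall
# FP rate reaches F_target; within each layer, grow the feature count
# until the per-layer rates f_i, d_i are met.
def train_cascade(f_i, d_i, F_target, train_strong_classifier, evaluate_rates):
    cascade = []
    F_overall = 1.0                       # FP rate of the (empty) cascade
    while F_overall > F_target:
        n_features, fp, tp, layer = 0, 1.0, 0.0, None
        while fp > f_i or tp < d_i:
            n_features += 1               # add features until rates are met
            layer = train_strong_classifier(n_features, cascade)
            fp, tp = evaluate_rates(cascade + [layer])
        cascade.append(layer)
        F_overall *= fp                   # layer FP rates multiply
    return cascade

# Toy stand-ins: a "layer" is just its feature count; rates are simulated.
def train_stub(n_features, cascade):
    return n_features

def rates_stub(layers):
    return (0.3 if layers[-1] >= 2 else 0.8, 0.995)

print(len(train_cascade(0.5, 0.99, 0.01, train_stub, rates_stub)))  # 4 layers
```

With a per-layer FP rate of 0.3, four layers are needed before 0.3^4 ≈ 0.008 drops below the 0.01 target.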

Page 129:

Our achievements – so far

1) Feature Computation – as fast as possible

2) Feature Selection – select the most informative features

3) Real-timeliness: focus mainly on potentially positive image areas (potentially faces)

Page 130:

Viola & Jones Algorithm – Visualization

Training phase: Training set (sub-windows) → Integral representation → Feature computation → AdaBoost feature selection → Cascade trainer

Testing phase (classifier cascade framework): Strong Classifier 1 (cascade stage 1) → Strong Classifier 2 (cascade stage 2) → … → Strong Classifier N (cascade stage N) → FACE IDENTIFIED

Page 131:

Viola & Jones Algorithm visualized

VIOLA & JONES ALGORITHM: OPENCV IMPLEMENTATION VISUALIZATION

Page 132:

Viola & Jones Algorithm

Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces (Viola & Jones 2002):

False detections        10     31     50     65     78     95     167    422
Viola-Jones             76.1%  88.4%  91.4%  92.0%  92.1%  92.9%  93.9%  94.1%
Rowley-Baluja-Kanade    83.2%  86.0%  -      -      89.2%  89.2%  90.1%  89.9%
Schneiderman-Kanade     -      -      -      94.4%  -      -      -      -

Viola & Jones prepared their final detector cascade: 38 layers, 6,060 total features.
1st classifier layer: 2 features, 50% FP rate, 99.9% TP rate
2nd classifier layer: 10 features, 20% FP rate, 99.9% TP rate
The next 2 layers have 25 features each, the next 3 layers 50 features each, and so on…

Tested on the MIT+CMU test set, a 384x288 pixel image on a PC (dated 2001) took about 0.067 seconds to process.