Object Recognizing We will discuss: Features Classifiers Example ‘winning’ system.

Post on 21-Dec-2015




Object Recognizing

We will discuss:

• Features

• Classifiers

• Example ‘winning’ system

Object Classes

[Figure: example images labeled Class and Non-class]

Features and Classifiers

Same features with different classifiers; same classifier with different features

Generic Features

• Simple (wavelets)

• Complex (Geons)

Class-specific Features: Common Building Blocks

Optimal Class Components?

• Large features are too rare

• Small features are found everywhere

Find features that carry the highest amount of information

Entropy

Entropy:

$$H = -\sum_i p(x_i) \log_2 p(x_i)$$

For a binary variable x ∈ {0, 1}:

p(x=0)   p(x=1)   H
0.5      0.5      ?
0.1      0.9      0.47
0.01     0.99     0.08
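A minimal Python sketch of the entropy formula, computing H in bits for the three distributions in the table, including the 0.5/0.5 case left as a question. The function name `entropy` is our own choice, not from the slides.

```python
import math


def entropy(p):
    """Entropy in bits of a discrete distribution p (probabilities summing to 1)."""
    # Terms with p = 0 contribute nothing (p * log p -> 0), so they are skipped.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


for p in ([0.5, 0.5], [0.1, 0.9], [0.01, 0.99]):
    print(p, round(entropy(p), 2))
```

This prints 1.0, 0.47 and 0.08 for the three rows: an even split carries a full bit, and the more skewed the distribution, the less uncertainty remains.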

Mutual Information I(C,F)

Class:11010100

Feature:10011100

I(F,C) = H(C) – H(C|F)
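The definition I(F,C) = H(C) − H(C|F) can be checked on the two binary sequences above with a short Python sketch. It uses empirical frequencies over the eight examples; the helper names are illustrative.

```python
import math
from collections import Counter


def entropy_of(values):
    """Empirical entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def conditional_entropy(c, f):
    """H(C|F) = sum over feature values v of p(F=v) * H(C | F=v)."""
    n = len(f)
    total = 0.0
    for v, k in Counter(f).items():
        c_given_v = [ci for ci, fi in zip(c, f) if fi == v]
        total += (k / n) * entropy_of(c_given_v)
    return total


def mutual_information(c, f):
    """I(F,C) = H(C) - H(C|F)."""
    return entropy_of(c) - conditional_entropy(c, f)


cls = [1, 1, 0, 1, 0, 1, 0, 0]
feat = [1, 0, 0, 1, 1, 1, 0, 0]
print(round(mutual_information(cls, feat), 3))  # 0.189
```

Here H(C) = 1 bit (four examples of each label). Within each feature value the class is split 3:1, so H(C|F) ≈ 0.811 and the feature delivers about 0.189 bits about the class.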

Optimal classification features

• Theoretically: maximizing delivered information minimizes classification error

• In practice: informative object components can be identified in training images

Mutual Info vs. Threshold

[Plot: mutual information vs. detection threshold for the face fragments forehead, hairline, mouth, eye, nose, nosebridge, long_hairline, chin and twoeyes]
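A sketch of how such a curve can be used. It assumes that each fragment produces a similarity score per training image and counts as "detected" when the score reaches the threshold; the scores below are made up for illustration. For each fragment, the threshold that maximizes I(F;C) is kept.

```python
import math
from collections import Counter


def entropy_of(values):
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def mutual_information(c, f):
    """I(F;C) = H(C) - H(C|F) for discrete sequences c and f."""
    n = len(f)
    h_c_given_f = sum(
        (k / n) * entropy_of([ci for ci, fi in zip(c, f) if fi == v])
        for v, k in Counter(f).items()
    )
    return entropy_of(c) - h_c_given_f


def best_threshold(scores, labels, thresholds):
    """Return (threshold, MI) maximizing the MI between 'detected' and the class."""
    best_t, best_mi = None, -1.0
    for t in thresholds:
        detected = [int(s >= t) for s in scores]  # binary fragment feature
        mi = mutual_information(labels, detected)
        if mi > best_mi:
            best_t, best_mi = t, mi
    return best_t, best_mi


# Hypothetical similarity scores of one fragment on 8 training images.
scores = [0.9, 0.8, 0.75, 0.4, 0.3, 0.2, 0.85, 0.1]
labels = [1, 1, 1, 0, 0, 0, 1, 0]
print(best_threshold(scores, labels, [0.1, 0.35, 0.5, 0.95]))
```

A threshold that is too low detects the fragment everywhere, and one that is too high detects it nowhere; both give zero information. Here 0.5 separates the classes perfectly, so the call returns (0.5, 1.0).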

Selecting Fragments

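Adding a New Fragment (max-min selection)

The information that a candidate fragment Fi adds to an already selected fragment Fk is

$$\Delta MI(F_i, F_k) = MI(F_i, F_k; C) - MI(F_k; C)$$

Select:

$$\max_i \min_k \Delta MI(F_i, F_k) = \max_i \min_k \left[ MI(F_i, F_k; C) - MI(F_k; C) \right]$$

(Min. over existing fragments, max. over the entire pool.)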
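The max-min rule can be sketched as a greedy loop. This is a minimal Python version that assumes each fragment is already reduced to a 0/1 detection per training image. The joint feature (Fi, Fk) is represented as a sequence of pairs, and the first fragment is simply the one with the highest individual MI. The names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy_of(values):
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def mutual_information(c, f):
    """I(F;C) = H(C) - H(C|F); f may hold tuples for joint features."""
    n = len(f)
    h_c_given_f = sum(
        (k / n) * entropy_of([ci for ci, fi in zip(c, f) if fi == v])
        for v, k in Counter(f).items()
    )
    return entropy_of(c) - h_c_given_f


def select_fragments(pool, c, k):
    """Greedy max-min selection of k fragments from pool (name -> 0/1 detections)."""
    # Start with the single most informative fragment.
    selected = [max(pool, key=lambda name: mutual_information(c, pool[name]))]
    while len(selected) < k:
        best_name, best_gain = None, -math.inf
        for name, f in pool.items():
            if name in selected:
                continue
            # Delta MI(F_i, F_k) = MI(F_i, F_k; C) - MI(F_k; C), minimized over selected F_k.
            gain = min(
                mutual_information(c, list(zip(f, pool[s]))) - mutual_information(c, pool[s])
                for s in selected
            )
            if gain > best_gain:  # maximize over the remaining pool
                best_name, best_gain = name, gain
        selected.append(best_name)
    return selected


c = [1, 1, 1, 1, 0, 0, 0, 0]
pool = {
    "a": [1, 1, 1, 0, 0, 0, 0, 0],
    "a_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # redundant with "a"
    "b": [0, 0, 0, 1, 0, 0, 0, 0],       # covers the class example "a" misses
}
print(select_fragments(pool, c, 2))  # ['a', 'b']
```

The redundant copy of "a" has high individual MI but zero gain once "a" is selected, so the rule prefers "b", which adds information about the example "a" misses.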

Highly Informative Face Fragments

Horse-class features

Car-class features

Pictorial features learned from examples

Fragments with positions

Use all fragments detected within their allowed regions

Star model

Detected fragments ‘vote’ for the center location

Find location with maximal vote
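The voting step can be sketched with an accumulator array. This minimal Python version assumes that each fragment type has a single learned offset to the object center and that each detection casts one vote; the part names and offsets are hypothetical.

```python
import numpy as np


def star_vote(detections, offsets, shape):
    """Accumulate votes for the object center.

    detections: list of (part_name, row, col) where a fragment was detected.
    offsets: part_name -> (d_row, d_col) from that fragment to the center.
    """
    votes = np.zeros(shape, dtype=int)
    for part, r, c in detections:
        dr, dc = offsets[part]
        cr, cc = r + dr, c + dc
        if 0 <= cr < shape[0] and 0 <= cc < shape[1]:  # ignore votes outside the image
            votes[cr, cc] += 1
    center = np.unravel_index(np.argmax(votes), shape)  # location with maximal vote
    return votes, (int(center[0]), int(center[1]))


# Hypothetical offsets learned from training images (part -> center).
offsets = {"left_eye": (2, 1), "right_eye": (2, -1), "mouth": (-2, 0)}
detections = [("left_eye", 3, 4), ("right_eye", 3, 6), ("mouth", 7, 5),
              ("left_eye", 0, 0)]  # the last one is a spurious detection
votes, center = star_vote(detections, offsets, (10, 10))
print(center)  # (5, 5)
```

The three consistent detections agree on (5, 5), while the spurious one casts an isolated vote elsewhere.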

Bag of words

Object → Bag of ‘words’

Bag of visual words: a large collection of image patches

1. Feature detection and representation

• Regular grid

– Vogel & Schiele, 2003

– Fei-Fei & Perona, 2005

Each class has its own histogram of words.
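A minimal sketch of turning an image into a word histogram. It assumes the patch descriptors are already extracted and the vocabulary (codebook) is already built, commonly by clustering descriptors from many images, which is not shown here. The 2-D descriptors are toy values.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Normalized histogram of visual words for one image."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # nearest visual word for each patch
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


codebook = np.array([[0.0, 0.0], [10.0, 10.0]])  # toy 2-word vocabulary
patches = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])
print(bow_histogram(patches, codebook))
```

Two patches fall on word 0 and one on word 1, giving the histogram (2/3, 1/3). Vectors of this kind are what a classifier such as an SVM is then trained on.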

SVM – linear separation in feature space

Optimal Separation

SVM vs. Perceptron

Find a separating plane such that the closest points are as far as possible

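Separating line: w ∙ x + b = 0

Far line: w ∙ x + b = +1

Their distance: for a step ∆x from the separating line to the far line, w ∙ ∆x = +1

Separation: taking ∆x parallel to w (the shortest step), |∆x| = 1/|w|

Margin: 2/|w|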
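The margin geometry can be checked numerically. This sketch uses an arbitrary example w and b. It starts from a point on the separating line, steps along w by ∆x = w/|w|², which satisfies w ∙ ∆x = 1, and confirms that this lands on the far line at distance 1/|w|.

```python
import numpy as np

w = np.array([3.0, 4.0])  # example normal vector, |w| = 5
b = -5.0

x0 = -b * w / (w @ w)  # a point on the separating line w.x + b = 0
dx = w / (w @ w)       # step along w with w.dx = 1
x1 = x0 + dx           # lands on the far line w.x + b = +1

print(w @ x0 + b, w @ x1 + b)                     # approximately 0 and 1
print(np.linalg.norm(dx), 2 / np.linalg.norm(w))  # separation 1/|w| and margin 2/|w|
```

With |w| = 5 the separation is 0.2 and the margin is 0.4. Shrinking |w| widens the margin, which is why the optimization below minimizes |w|.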

[Figure: the lines w ∙ x + b = −1, 0, +1]

The Margin

Max Margin Classification

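Maximize the margin subject to every example being on the correct side of its far line:

$$\max_{w,b} \frac{2}{\|w\|} \quad \text{s.t. } y_i(w\cdot x_i + b) \ge 1 \;\; \forall i$$

Equivalently (the form usually used):

$$\min_{w,b} \tfrac{1}{2}\|w\|^2 \quad \text{s.t. } y_i(w\cdot x_i + b) \ge 1 \;\; \forall i$$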

How do we solve such a constrained optimization?

The examples are vectors xi

The labels yi are +1 for class, -1 for non-class

Using Lagrange multipliers: minimize

$$L_P = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i y_i (w\cdot x_i + b) + \sum_i \alpha_i$$

with αi ≥ 0 the Lagrange multipliers.

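Minimize L_P: set all derivatives with respect to w and b to 0:

$$\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i, \qquad \frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0$$

The αi remain subject to αi ≥ 0.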

Dual formulation: a mathematically equivalent formulation maximizes the Lagrangian with respect to the αi, subject to the above conditions. Substitute these conditions into L_P.

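After some manipulation, this gives a nice, concise optimization:

$$\max_\alpha \; L_D = \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i \cdot x_j \rangle \quad \text{s.t. } \alpha_i \ge 0,\; \sum_i \alpha_i y_i = 0$$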

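SVM in simple matrix form:

$$\max_\alpha \; \mathbf{1}^\top \alpha - \tfrac{1}{2}\alpha^\top H \alpha \quad \text{s.t. } \alpha \ge 0,\; y^\top \alpha = 0$$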

We first find the α. From this we can find: w, b, and the support vectors.

The matrix H is a simple ‘data matrix’: Hij = yi yj <xi ∙ xj>

Final classification: w ∙ x + b = ∑αi yi <xi ∙ x> + b, because w = ∑αi yi xi.

Only the products <xi ∙ x> with support vectors are used, since αi = 0 for all other examples.
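Once the α are known, the remaining quantities follow directly. This sketch assumes the α came from a QP solver; for this tiny separable toy set, they were worked out by hand (x1 and x2 are the support vectors, w = (1, 0), b = −1). It builds H, recovers w, b and the support vectors, and checks that the dual-form classifier agrees with w ∙ x + b.

```python
import numpy as np

# Tiny separable toy set: rows are examples x_i, labels y_i in {+1, -1}.
X = np.array([[2.0, 0.0], [0.0, 0.0], [3.0, 1.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])

# Assumed QP solution for this set (worked out by hand, not computed here).
alpha = np.array([0.5, 0.5, 0.0, 0.0])

H = np.outer(y, y) * (X @ X.T)  # H_ij = y_i y_j <x_i . x_j>
dual_objective = alpha.sum() - 0.5 * alpha @ H @ alpha

w = (alpha * y) @ X              # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-8                # support vectors have alpha_i > 0
b = np.mean(y[sv] - X[sv] @ w)   # from y_s (w . x_s + b) = 1 on each support vector


def f_dual(x):
    """Dual-form decision value: sum_i alpha_i y_i <x_i . x> + b."""
    return np.sum(alpha * y * (X @ x)) + b


x_new = np.array([1.5, 2.0])
print(w, b, dual_objective)
print(f_dual(x_new), w @ x_new + b)  # the two forms agree
```

Only x1 and x2 contribute to f_dual, which illustrates why classification needs only the products with the support vectors.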

Full story – separable case

Or use ∑αi yi <xi ∙ x> + b

Quadratic Programming QP

Minimize (with respect to x)

$$\tfrac{1}{2} x^\top Q x + c^\top x$$

Subject to one or more constraints of the form:

Ax ≤ b (inequality constraints)

Ex = d (equality constraints)

The non-separable case

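It turns out that we can get a very similar formulation of the problem and solution if we penalize margin violations (including misclassifications) in a certain way. The penalty is Cξi, where ξi ≥ 0 measures how far point i lies on the wrong side of its margin plane (in units of 1/|w|). We now minimize the objective together with a penalty for the violations:

$$\min_{w,b,\xi} \tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t. } y_i(w\cdot x_i + b) \ge 1 - \xi_i,\; \xi_i \ge 0$$

The dual is the same as before, except that the multipliers become bounded: 0 ≤ αi ≤ C.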
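A small Python sketch of the soft-margin objective for a fixed (w, b), with the slack computed as ξi = max(0, 1 − yi(w ∙ xi + b)). The toy points are chosen to show the three cases: points beyond their margin (ξ = 0), a point inside the margin but still correctly classified (0 < ξ < 1), and a misclassified point (ξ > 1).

```python
import numpy as np


def soft_margin_objective(w, b, X, y, C):
    """0.5*|w|^2 + C * sum(xi_i), with slack xi_i = max(0, 1 - y_i (w.x_i + b))."""
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)  # zero for points on the correct side of their margin
    return 0.5 * w @ w + C * xi.sum(), xi


X = np.array([[2.0, 0.0], [0.0, 0.0], [1.5, 0.0], [0.5, 0.0]])
y = np.array([1.0, -1.0, 1.0, 1.0])
w, b = np.array([1.0, 0.0]), -1.0
obj, xi = soft_margin_objective(w, b, X, y, C=2.0)
print(xi, obj)
```

Here ξ = (0, 0, 0.5, 1.5): the third point is correctly classified but inside the margin, and the fourth is misclassified. Both are penalized, giving an objective of 0.5 + 2 × 2.0 = 4.5.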

Kernel Classification

Using kernels

A kernel K(x, x’) is associated with a mapping x → φ(x) such that K(x, x’) = φ(x) ∙ φ(x’). We can use φ(x) and perform a linear classification in the target space.

It turns out that this can be done directly using kernels, without computing the mapping, and the results are equivalent: the optimal separation in the target space is the same as what we get using the procedure below. It is similar to the linear case, with the kernel replacing the dot-product.

Use K(xi, xj) in place of <xi ∙ xj> in the matrix H

Use ∑αi yi K(xi, x) + b
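The equivalence can be checked for a concrete kernel. This sketch uses the degree-2 polynomial kernel K(x, z) = (x ∙ z)² in 2-D, whose explicit map is φ(x) = (x1², √2·x1·x2, x2²). The α, b and "support vectors" are arbitrary placeholders, not the result of training: the identity holds for any α.

```python
import numpy as np


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2


def phi(x):
    """Explicit feature map for poly2_kernel in 2-D: K(x, z) = phi(x) . phi(z)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])


def decision(x, X, y, alpha, b, kernel):
    """Kernel decision value: sum_i alpha_i y_i K(x_i, x) + b."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b


# Placeholder support vectors and multipliers (not the result of training).
X = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]])
y = np.array([1.0, -1.0, 1.0])
alpha = np.array([0.3, 0.2, 0.1])
b = 0.05
x = np.array([2.0, 1.0])

f_kernel = decision(x, X, y, alpha, b, poly2_kernel)
w_phi = sum(a * yi * phi(xi) for a, yi, xi in zip(alpha, y, X))  # linear weights in target space
f_linear = w_phi @ phi(x) + b
print(f_kernel, f_linear)  # equal up to rounding
```

The kernel version never builds φ explicitly. This matters when the target space is very large or infinite-dimensional, as with Gaussian kernels.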

Summary points

• Linear separation with the largest margin, f(x) = w∙x + b

• Dual formulation, f(x) = ∑αi yi (xi ∙ x) + b

• Natural extension to non-separable classes

• Extension through kernels, f(x) = ∑αi yi K(xi, x) + b

Felzenszwalb et al.

• Felzenszwalb, McAllester, Ramanan, CVPR 2008: A Discriminatively Trained, Multiscale, Deformable Part Model

Object model using HoG

A bicycle and its ‘root filter’. The root filter is a patch of HoG descriptors. The image is partitioned into 8x8 pixel cells, and in each cell we compute a histogram of gradient orientations.

Using patches with HoG descriptors and classification by SVM

The filter is searched on a pyramid of HoG descriptors, to deal with unknown scale
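A simplified sketch of the per-cell orientation histograms behind HoG, using 8x8 cells. It assumes 9 unsigned orientation bins and hard binning, and it omits the interpolation and block normalization that real HoG implementations (including the paper's features) add. Running the same function on successively downscaled copies of the image would give the feature pyramid used to handle unknown scale.

```python
import numpy as np


def cell_histograms(img, cell=8, nbins=9):
    """Per-cell histograms of gradient orientation, weighted by gradient magnitude.

    Simplified: unsigned orientations in [0, 180), hard binning, no block normalization.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)  # derivatives along rows, then along columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / nbins)).astype(int), nbins - 1)

    n_rows, n_cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, nbins))
    for i in range(n_rows):
        for j in range(n_cols):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(bins[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(),
                                     minlength=nbins)
    return hist


# A 16x16 horizontal ramp: every gradient points along x, so all weight lands in bin 0.
ramp = np.tile(np.arange(16.0), (16, 1))
h = cell_histograms(ramp)
print(h.shape, h[..., 0])
```

The ramp yields a 2x2 grid of cells, each with all 64 units of gradient magnitude in the first orientation bin.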

Dealing with scale: multi-scale analysis

A part Pi = (Fi, vi, si, ai, bi):

Fi is the filter for the i-th part, vi is the center of a box of possible positions for part i relative to the root position, and si is the size of this box.

ai and bi are two-dimensional vectors specifying coefficients of a quadratic function measuring a score for each possible placement of the i-th part. That is, ai and bi are two numbers each, and the penalty for a deviation (∆x, ∆y) from the expected location is a1∆x + a2∆y + b1∆x² + b2∆y².

Adding Parts

Bicycle model: root, parts, spatial map

Person model

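The full score of a potential match is:

$$\sum_i F_i \cdot H_i + \sum_i \left(a_{i1} x_i + a_{i2} y_i + b_{i1} x_i^2 + b_{i2} y_i^2\right)$$

Fi ∙ Hi is the appearance part: the response of filter Fi to the HoG features Hi under it.

(xi, yi) is the deviation of part Pi from its expected location in the model. This is the spatial part.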

Match Score

The score of a match can be expressed as the dot-product of a vector β of coefficients with a vector ψ built from the image:

Score = β∙ψ
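This sketch shows that the full match score is a single dot product. β stacks all filters and deformation coefficients, and ψ stacks the matching HoG features and the terms (∆x, ∆y, ∆x², ∆y²) for each part. The filters, features and coefficients are random or made-up placeholders, and each filter is flattened to a short vector for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: root filter F0 and two part filters, each flattened to a vector,
# plus deformation coefficients (a1, a2, b1, b2) for each part.
filters = [rng.normal(size=6) for _ in range(3)]  # F0 (root), F1, F2
deform = [np.array([0.0, 0.0, -0.1, -0.1]),       # part 1: a1, a2, b1, b2
          np.array([0.1, 0.0, -0.2, -0.2])]       # part 2

# HoG features H_i under each filter at one placement, and part deviations (dx, dy).
feats = [rng.normal(size=6) for _ in range(3)]
devs = [(1.0, 2.0), (0.0, -1.0)]

# Direct form: sum_i F_i . H_i + sum_i (a1 dx + a2 dy + b1 dx^2 + b2 dy^2)
direct = sum(F @ H for F, H in zip(filters, feats))
direct += sum(d @ np.array([dx, dy, dx**2, dy**2]) for d, (dx, dy) in zip(deform, devs))

# The same score as one dot product beta . psi
beta = np.concatenate(filters + deform)
psi = np.concatenate(feats + [np.array([dx, dy, dx**2, dy**2]) for dx, dy in devs])
print(direct, beta @ psi)
```

Because the score is linear in β, learning all filters and deformation coefficients together reduces to training a linear classifier on ψ.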

Using the vectors ψ to train an SVM classifier:

β∙ψ > 1 for class examples

β∙ψ < −1 for non-class examples


However, ψ depends on the placement z, that is, on the values of ∆xi, ∆yi.

We need to take the best ψ over all placements. In their notation:

$$f_\beta(x) = \max_z \beta \cdot \Phi(x, z)$$

where Φ(x, z) is the feature vector ψ for placement z. Classification then uses f_β(x) > 1.
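A sketch of taking the best placement. It uses a swapped-in technique: exhaustive search over each part's box, instead of the gradient descent described below. In a star model, once the root is fixed, each part can be placed independently, so the max over z splits into a separate max per part. The response maps and coefficients are made-up toy values.

```python
import numpy as np


def best_part_placement(response, anchor, a, b):
    """Best placement of one part inside its box.

    response[r, c] = F_i . H at that position; anchor = expected (row, col) in the box.
    Returns (appearance + deformation score, (row, col)).
    """
    rows, cols = np.indices(response.shape)
    dx, dy = cols - anchor[1], rows - anchor[0]
    total = response + a[0] * dx + a[1] * dy + b[0] * dx**2 + b[1] * dy**2
    r, c = np.unravel_index(np.argmax(total), total.shape)
    return float(total[r, c]), (int(r), int(c))


def best_score(root_score, parts):
    """Max over placements z: root score plus the best score of each part."""
    total, placements = root_score, []
    for response, anchor, a, b in parts:
        s, p = best_part_placement(response, anchor, a, b)
        total += s
        placements.append(p)
    return total, placements


resp = np.zeros((5, 5))
resp[0, 0] = 1.0  # strong response, but far from the expected position
resp[2, 3] = 0.8  # slightly weaker response, one cell from the expected position
score, where = best_score(2.0, [(resp, (2, 2), (0.0, 0.0), (-0.5, -0.5))])
print(score, where)
```

With negative b coefficients acting as a penalty, the weaker response near the expected position wins (0.8 − 0.5 = 0.3) over the stronger but distant one (1.0 − 4.0 = −3.0), so the total score is 2.3.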

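In analogy to classical SVMs, we would like to train β from labeled examples D = (<x1, y1>, . . . , <xn, yn>) by optimizing the following objective function:

$$L_D(\beta) = \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n} \max\left(0,\, 1 - y_i f_\beta(x_i)\right)$$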

Finding β, SVM training:

Search with gradient descent over the placement; this also includes the levels of the pyramid. Start with the root filter and find places where it scores highly. For these high-scoring locations, search for the optimal placement of the parts at a pyramid level with twice the resolution of the root filter, using GD.

With the optimal placement, use:

β∙ψ > 1 for class examples

β∙ψ < −1 for non-class examples

Recognition

• Training -- positive examples with bounding boxes around the objects, and negative examples.

• Learn root filter using SVM

• Define a fixed number of parts, at locations of high energy in the root filter HoG

• Use these to start the iterative learning

Hard Negatives

The set M of hard negatives for a known β and data set D: these are support vectors (y ∙ f = 1) or misses (y ∙ f < 1).

Optimal SVM training does not need all the examples; the hard examples are sufficient. For a given β, use the positive examples plus C hard examples, and compute β from this data by a standard SVM. Iterate (with a new set of C hard examples).
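A minimal sketch of the hard-negative loop. The "standard SVM" here is a hypothetical stand-in: a linear SVM trained by sub-gradient descent on the hinge loss, with the bias folded in as a constant feature. The loop starts from the first `cache_size` negatives (in practice a random sample), retrains, and replaces the negative set with the hardest pool examples satisfying y ∙ f ≤ 1. The toy data places a few "near" negatives that the initial set misses.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, epochs=300, lr=0.01):
    """Stand-in SVM: sub-gradient descent on 0.5*|beta|^2 + C*sum(hinge).

    The bias is folded in as a constant feature (so it is also regularized).
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])
    beta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        viol = y * (Xb @ beta) < 1  # margin violators
        grad = beta - C * (y[viol, None] * Xb[viol]).sum(axis=0)
        beta -= lr * grad
    return beta


def hard_negatives(beta, X_neg):
    """Indices of negatives with y*f <= 1 (y = -1), hardest (highest score) first."""
    f = np.hstack([X_neg, np.ones((len(X_neg), 1))]) @ beta
    hard = np.flatnonzero(-f <= 1)
    return hard[np.argsort(-f[hard])]


def mine(X_pos, X_neg_pool, cache_size, rounds=3):
    neg_idx = np.arange(cache_size)  # initial negatives (hypothetical choice)
    for _ in range(rounds):
        X = np.vstack([X_pos, X_neg_pool[neg_idx]])
        y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(neg_idx))])
        beta = train_linear_svm(X, y)
        hard = hard_negatives(beta, X_neg_pool)
        if len(hard) == 0:  # no negative violates the margin: done
            break
        neg_idx = hard[:cache_size]  # a new set of hard examples
    return beta


X_pos = np.array([[3.0, 3.0], [3.5, 2.5], [2.5, 3.5], [3.0, 4.0]])
far = np.array([[-3.0, -3.0], [-3.5, -2.5], [-2.5, -3.5], [-4.0, -3.0], [-3.0, -4.0]])
near = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])
pool = np.vstack([far, near])
beta = mine(X_pos, pool, cache_size=3)
```

After the first round the near negatives score highest, so they become the training negatives in the next round, even though they were absent from the initial set.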

Correct person detections

Difficult images, medium results. About 0.5 precision at 0.5 recall

All images contain at least 1 bird

Average precision: roughly, an AP of 0.3 means that in a test with 1000 class images, out of the top 1000 detections, 300 will be true class examples (recall = precision = 0.3).
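A small sketch of these quantities for a ranked list of detections. It uses the plain, non-interpolated AP definition (benchmarks may use interpolated variants) and reproduces the rough example: 300 true detections among the top 1000, with 1000 class instances in the test set.

```python
def average_precision(ranked_labels, n_positives):
    """Non-interpolated AP: mean of the precision at the rank of each true positive.

    ranked_labels: 1 for a true class detection, 0 otherwise, sorted by score (best first).
    n_positives: total number of class instances in the test set.
    """
    hits, precisions = 0, []
    for rank, is_true in enumerate(ranked_labels, start=1):
        if is_true:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / n_positives


def precision_recall_at(ranked_labels, k, n_positives):
    """Precision and recall of the top k detections."""
    tp = sum(ranked_labels[:k])
    return tp / k, tp / n_positives


ranked = [1] * 300 + [0] * 700  # 300 true detections ranked first, among the top 1000
print(precision_recall_at(ranked, 1000, 1000), average_precision(ranked, 1000))
```

When the true detections are ranked ahead of the false ones, AP equals 0.3 here, matching the rough reading above. Ranking false detections earlier lowers AP even with the same precision and recall at 1000.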

Future Directions

• Dealing with a very large number of classes – ImageNet, 15,000 categories, 12 million images

• To consider: human-level performance for at least one class