Vincent Lepetit - Real-time computer vision


Real-Time Computer Vision

Microsoft Computer Vision School

Vincent Lepetit - CVLab - EPFL (Lausanne, Switzerland)

1

demo

2

applications

...

3

• How the demo works (including Randomized Trees);

• More recent work.

4

Background

• 3D world to 2D images (projection matrix, internal parameters, external parameters, homography, ...);

• Robust estimation (non-linear least-squares, RANSAC, robust estimators, ...);

• Feature point matching (affine region detectors, SIFT, ...).

5

From the 3D World to a 2D Image

(Figure: a 3D point M in the world coordinate system and its projection m in the image.)

What is the relation between the 3D coordinates of a point M and the coordinates of its projection m in the image captured by the camera?

6

Perspective Projection

(Figure: the camera center C, a 3D point M, and its projection m in the image.)

Image formation is modeled as a perspective projection, which is realistic for standard cameras:

The rays through each 3D point M and its projection m in the image all intersect at a single point C, the camera center.

7

Expressing M in the Camera Coordinate System

(Figure: the world coordinate system and the camera coordinate system, with axes X, Y, Z centered at C; M has coordinates Mcam in the camera frame.)

Step 1: Express the coordinates of M in the camera coordinate system as Mcam.

This transformation is a Euclidean displacement (a rotation plus a translation):

Mcam = R M + T, where R is a 3x3 rotation matrix and T is a 3-vector.

8

Homogeneous Coordinates

Let's replace M by the 4-vector of homogeneous coordinates M̃: just add a 1 as the fourth coordinate:

$$M = \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \;\rightarrow\; \tilde{M} = \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Now the Euclidean displacement can be expressed as a linear transformation instead of an affine one:

$$M_{\mathrm{cam}} = R\,M + T \;\rightarrow\; \begin{pmatrix} X_{\mathrm{cam}} \\ Y_{\mathrm{cam}} \\ Z_{\mathrm{cam}} \end{pmatrix} = R \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} + T \;\rightarrow\; \begin{pmatrix} X_{\mathrm{cam}} \\ Y_{\mathrm{cam}} \\ Z_{\mathrm{cam}} \end{pmatrix} = (R \mid T) \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \;\rightarrow\; M_{\mathrm{cam}} = (R \mid T)\,\tilde{M}$$

(R | T) is a 3x4 matrix.

9

Projection

Compute the coordinates of m in the image plane from Mcam = (X, Y, Z)ᵀ, expressed in the camera coordinate system. Simply use Thales' theorem:

(Figure: side view of the camera; the image plane lies at the focal distance f from the camera center C along the Z axis.)

$$\frac{m_X}{f} = \frac{X}{Z} \;\rightarrow\; m_X = f\,\frac{X}{Z}$$

10

From Projection to Image Coordinates

What are the coordinates of m in pixels?

(Figure: the image coordinate system (u, v) with principal point (u0, v0); one pixel measures 1/ku by 1/kv in image-plane units.)

$$m_X = f\,\frac{X}{Z}, \quad m_Y = f\,\frac{Y}{Z}$$

$$m_u = u_0 + k_u\,m_X, \quad m_v = v_0 + k_v\,m_Y$$

11

Putting the perspective projection and the transformation into pixel coordinates together:

$$m_X = f\,\frac{X}{Z}, \quad m_Y = f\,\frac{Y}{Z} \qquad m_u = u_0 + k_u\,m_X, \quad m_v = v_0 + k_v\,m_Y$$

In matrix form:

$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} k_u f & 0 & u_0 \\ 0 & k_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

where (u, v, w)ᵀ defines m in homogeneous coordinates:

$$m_u = \frac{u}{w} = u_0 + k_u f\,\frac{X}{Z}, \qquad m_v = \frac{v}{w} = v_0 + k_v f\,\frac{Y}{Z}$$

12

The Full Transformation

The two transformations are chained to form the full transformation from a 3D point in the world coordinate system to its projection in the image:

$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} k_u f & 0 & u_0 \\ 0 & k_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} & R_{13} & T_1 \\ R_{21} & R_{22} & R_{23} & T_2 \\ R_{31} & R_{32} & R_{33} & T_3 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{pmatrix} P_{11} & P_{12} & P_{13} & P_{14} \\ P_{21} & P_{22} & P_{23} & P_{24} \\ P_{31} & P_{32} & P_{33} & P_{34} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

The product of the internal calibration matrix and the external calibration matrix is a 3x4 matrix called the "projection matrix".

The projection matrix is defined up to a scale factor.

13
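To make the chain concrete, here is a minimal C++ sketch (not from the slides; all names are illustrative) that projects a 3D world point with P = K (R | T), following the equation above:

// Minimal sketch: project a 3D world point with P = K (R | T).
// K is the internal calibration matrix, RT the 3x4 external matrix.
void project(const double K[3][3], const double RT[3][4],
             const double M[3], double m[2]) {
    double Mh[4] = { M[0], M[1], M[2], 1.0 };  // homogeneous coordinates
    double cam[3];                             // Mcam = (R | T) * M~
    for (int i = 0; i < 3; ++i) {
        cam[i] = 0.0;
        for (int j = 0; j < 4; ++j) cam[i] += RT[i][j] * Mh[j];
    }
    double uvw[3];                             // (u, v, w)^T = K * Mcam
    for (int i = 0; i < 3; ++i) {
        uvw[i] = 0.0;
        for (int j = 0; j < 3; ++j) uvw[i] += K[i][j] * cam[j];
    }
    m[0] = uvw[0] / uvw[2];                    // m_u = u / w
    m[1] = uvw[1] / uvw[2];                    // m_v = v / w
}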

The Full Transformation

R, T, and the products k_u f and k_v f can be extracted back from the projection matrix:

$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} k_u f & 0 & u_0 \\ 0 & k_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} & R_{13} & T_1 \\ R_{21} & R_{22} & R_{23} & T_2 \\ R_{31} & R_{32} & R_{33} & T_3 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{pmatrix} P_{11} & P_{12} & P_{13} & P_{14} \\ P_{21} & P_{22} & P_{23} & P_{24} \\ P_{31} & P_{32} & P_{33} & P_{34} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

14

Homography

For 3D points lying on a plane (choose the world frame so that the plane is Z = 0), the projection reduces to a 3x3 homography:

$$m' = P\,\tilde{M} = [P_1\;P_2\;P_3\;P_4] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = [P_1\;P_2\;P_3\;P_4] \begin{pmatrix} X \\ Y \\ 0 \\ 1 \end{pmatrix} = [P_1\;P_2\;P_4] \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix} = H_{3\times3}\,m$$

where the P_i are the columns of the projection matrix.

(Figure: the point M on the plane, its coordinates m in the plane, and its image m', related by H_{3x3}.)

15

Computing a Projection Matrix or a Homography from Point Correspondences

...by solving a linear system.

Given correspondences m ↔ m', with m = [u, v, 1]ᵀ and m' = [u', v', 1]ᵀ, the relation m' = H m yields two linear equations per correspondence:

$$\begin{pmatrix} u & v & 1 & 0 & 0 & 0 & -uu' & -vu' & -u' \\ 0 & 0 & 0 & u & v & 1 & -uv' & -vv' & -v' \end{pmatrix} \begin{pmatrix} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \\ H_{33} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

16

Computing a Projection Matrix or a Homography from Point Correspondences with a non-linear optimization

• Non-linear least-squares minimization: minimizes a physically meaningful error (the reprojection error, in pixels);

• Minimization algorithms: Gauss-Newton or Levenberg-Marquardt (very efficient).

$$\min_{R,T} \sum_i \mathrm{dist}^2\!\left(H_{R,T}\,m_i,\; m'_i\right) \qquad\qquad \min_{R,T} \sum_i \mathrm{dist}^2\!\left(P_{R,T}\,M_i,\; m'_i\right)$$

17

A Look at the Reprojection Error

(Figure: the reprojection error surface for a 1D camera under 2D translation; 100 "3D points" taken at random in [400;1000]x[-500;+500]; true camera position at (0, 0).)

18

Gaussian Noise on the Projections

White cross: true camera position; black cross: global minimum of the objective function.

In that case, the global minimum of the objective function is close to the true camera pose.

19

What if there are Outliers?

(Figure: points M1..M4 project to m1..m4 through the camera center C; m4 is an incorrect measure, an outlier.)

20

Gaussian Noise on the Projections + 20% outliers

White cross: true camera position; black cross: global minimum of the objective function.

The global minimum is now far from the true camera pose.

21

What Happened?

The errors on the 2D point locations m'_i are assumed to be independent and Gaussian (Normal), with identical covariance matrices σI. This assumption is violated when m'_i is an outlier.

Bayesian interpretation: least-squares estimation is maximum-likelihood estimation under this Gaussian model:

$$\underset{R,T}{\operatorname{argmin}} \sum_i \mathrm{dist}^2\!\left(P_{R,T}\,M_i,\; m'_i\right) = \underset{R,T}{\operatorname{argmax}} \prod_i \mathcal{N}\!\left(m'_i;\; P_{R,T}\,M_i,\; \sigma I\right)$$

22

Robust Estimation

Idea: Replace the Normal distribution by a more suitable distribution or, equivalently, replace the least-squares estimator by a "robust estimator" or "M-estimator":

$$\underset{R,T}{\operatorname{argmin}} \sum_i \mathrm{dist}^2\!\left(P_{R,T}\,M_i,\; m'_i\right) \;\rightarrow\; \underset{R,T}{\operatorname{argmin}} \sum_i \rho\!\left(\mathrm{dist}\!\left(P_{R,T}\,M_i,\; m'_i\right)\right)$$

23

Example of an M-estimator: The Tukey Estimator

(Figure: the least-squares cost x² versus the Tukey cost ρ(x), which saturates for large |x|.)

The Tukey estimator assumes the measures follow a distribution that is a mixture of:
• a Normal distribution, for the inliers;
• a uniform distribution, for the outliers.

$$\rho(x) = \begin{cases} \dfrac{c^2}{6}\left(1 - \left(1 - \left(\dfrac{x}{c}\right)^{\!2}\right)^{\!3}\right) & \text{if } |x| \le c \\[2mm] \dfrac{c^2}{6} & \text{if } |x| > c \end{cases}$$

24
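A minimal C++ sketch of this ρ function (an illustration, not code from the course); c is the inlier threshold:

#include <cmath>

// Tukey rho, as defined above: quadratic-like near 0, constant beyond c.
double tukeyRho(double x, double c) {
    if (std::fabs(x) > c)
        return c * c / 6.0;                      // outliers: constant cost
    double t = 1.0 - (x / c) * (x / c);
    return (c * c / 6.0) * (1.0 - t * t * t);    // inliers: smooth cost
}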

(Figure: Normal distribution (inliers) + uniform distribution (outliers) = mixture. Taking -log(.) of the Normal distribution gives the least-squares cost; taking -log(.) of the mixture gives the Tukey estimator.)

25

Gaussian Noise on the Projections + 20% outliers + Tukey estimator

White cross: true camera position; black cross: global minimum of the objective function.

The global minimum is very close to the true camera pose. BUT:
- there are local minima;
- the objective function is flat where all the correspondences are considered outliers.

26

Gaussian Noise on the Projections + 50% outliers + Tukey estimator

Even more local minima. Numerical optimization can get trapped in a local minimum.

27

RANSAC

28

How to Optimize ?

Idea: sampling the space of solutions (the camera pose space here):

29

How to Optimize ?

Idea: sampling the space of solutions:

+ Numerical Optimization from the best sampled pose.

Problem: exhaustive regular sampling is too expensive in 6 dimensions. Can we do a smarter sampling?

30

RANSAC

RANSAC: RANdom SAmple Consensus.

Line fitting: the "throw out the worst residual" heuristic can fail (example from the original paper [Fischler81]):

(Figure: a single outlier drags the final least-squares solution away from the ideal line.)

31

RANSAC

As before, we could do a regular sampling of the parameter space, but it would not be optimal:

(Figure: regularly sampled candidate lines around the ideal line.)

32

Idea:

Generate hypotheses from subsets of the measurements. If a subset contains no gross errors, the estimated parameters (the hypothesis) are close to the true ones.

Take several subsets at random, and retain the best one.

33

The quality of a hypothesis is evaluated by the number of measures that lie "close enough" to the predicted line.

We need to choose a threshold T to decide if a measure is "close enough". RANSAC returns the best hypothesis, i.e. the hypothesis with the largest number of inliers:

$$\mathrm{score}(p) = \sum_i \begin{cases} 1 & \text{if } \mathrm{dist}(m_i, \mathrm{line}(p)) \le T \\ 0 & \text{if } \mathrm{dist}(m_i, \mathrm{line}(p)) > T \end{cases}$$

34
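As a minimal sketch (assumed conventions and names, not the course's code), RANSAC for line fitting with the scoring rule above looks like this:

#include <cmath>
#include <cstdlib>
#include <vector>

struct Point { double x, y; };
struct Line  { double a, b, c; };  // a*x + b*y + c = 0, (a, b) normalized

// Line through two distinct points.
Line lineFromTwoPoints(const Point& p, const Point& q) {
    double a = q.y - p.y, b = p.x - q.x;
    double n = std::sqrt(a * a + b * b);
    return { a / n, b / n, -(a * p.x + b * p.y) / n };
}

Line ransacLine(const std::vector<Point>& pts, int nIterations, double T) {
    Line best = { 1, 0, 0 };
    int bestInliers = -1;
    for (int it = 0; it < nIterations; ++it) {
        // 1. Generate a hypothesis from a minimal random subset (2 points).
        int i = std::rand() % (int)pts.size(), j = std::rand() % (int)pts.size();
        if (i == j) continue;
        Line hyp = lineFromTwoPoints(pts[i], pts[j]);
        // 2. Count the measures that lie "close enough" to the predicted line.
        int inliers = 0;
        for (const Point& p : pts)
            if (std::fabs(hyp.a * p.x + hyp.b * p.y + hyp.c) <= T) ++inliers;
        // 3. Keep the hypothesis with the largest number of inliers.
        if (inliers > bestInliers) { bestInliers = inliers; best = hyp; }
    }
    return best;  // should then be refined with a robust minimization
}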

RANSAC for Homographies

To apply RANSAC to homography estimation, we need a way to compute a homography from a subset of measurements, e.g. by solving the linear system below (two rows per correspondence):

$$\begin{pmatrix} u & v & 1 & 0 & 0 & 0 & -uu' & -vu' & -u' \\ 0 & 0 & 0 & u & v & 1 & -uv' & -vv' & -v' \end{pmatrix} \begin{pmatrix} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \\ H_{33} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

Since RANSAC only provides a solution estimated from a limited number of data points, it must be followed by a robust minimization to refine the solution.

35

How to Get the Correspondences ?

• Extract feature points / keypoints / regions (Harris corner detector, extrema of the Laplacian, affine region detectors, ...);

• Standard approach: match them based on the Euclidean distances between descriptors such as SIFT, SURF, ...

(Figure: correspondences m ↔ m' between a reference image and the input image.)

36

Affine Region Detectors

(Figure: example regions found by the Hessian-Affine detector and by the MSER detector.)

37

Affine Normalization

(Figure: the two detected elliptical regions are normalized to circles by warping with M1^{1/2} and M2^{1/2}.)

We still have to correct for the orientation!

38

Select Canonical Orientation

• Create a histogram of the local gradient directions computed over the image patch;
• Each gradient contributes its norm, weighted by its distance to the patch center;
• Assign the canonical orientation at the peak of the smoothed histogram.

(Figure: orientation histogram over [0, 2π].)

39

Select Canonical Orientation

40

Description Vector

?

...

41

SIFT Description Vector

Made of local histograms of gradients.

In practice: 8 orientations x 4 x 4 histograms = a 128-dimensional vector, normalized to be robust to light changes.

...

42

Matching Regions

(Figure: the descriptor of the query region m' is compared against the database of reference descriptors to find its match.)

43

Matching: Approximate Nearest Neighbour

Best-Bin-First: approximate nearest-neighbour search in a k-d tree.

44

Keypoint Matching

The standard approach is a particular case of classification:

• Pre-processing: make the actual classification easier;
• Nearest-neighbor classification: search in the database.

Idea: let's try another classification method!

45

One Class per Keypoint

One class per keypoint: the set of the keypoint's possible appearances under various perspectives, lighting, noise...

(Figure: sample appearances for class 1 and class 2.)

46

Training Phase

(Figure: training patches labeled class 1, class 2, ... are fed to the classifier.)

Run-Time

(Figure: the trained classifier assigns a class to each incoming patch.)

47

Which Classifier?

We want a classifier that:
• can handle many classes;
• is very fast;
• has reasonable recognition performance (a very high recognition rate is not a necessary requirement).

48

Which Classifier?

• Randomized Trees [Amit & Geman, 1997];
• Random Forests [Breiman, 2001].

49

An (Ideal) Single Tree

(Figure: a binary tree; each internal node applies a binary test, and each leaf stores a class label.)

50

How to Build the Tree?

(Figure: which binary test should be placed at the root, given the training set?)

51

The binary test at each node is found by minimizing the entropy after the test. The chosen test splits the training set S into Sleft and Sright:

$$\underset{\mathrm{test}}{\operatorname{argmin}} \;\; \frac{|S_{\mathrm{left}}|}{|S|}\,\mathrm{Entropy}(S_{\mathrm{left}}) \;+\; \frac{|S_{\mathrm{right}}|}{|S|}\,\mathrm{Entropy}(S_{\mathrm{right}})$$

52
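A minimal C++ sketch of this criterion (illustrative names, not the course's code): given the per-class counts of the samples a candidate test sends left and right, return the weighted entropy to minimize:

#include <cmath>
#include <vector>

// Shannon entropy of a class-count histogram.
double entropy(const std::vector<int>& counts) {
    int total = 0;
    for (int c : counts) total += c;
    double h = 0.0;
    for (int c : counts)
        if (c > 0) {
            double p = double(c) / total;
            h -= p * std::log2(p);
        }
    return h;
}

// Weighted entropy after the split; the best test minimizes this score.
double splitScore(const std::vector<int>& leftCounts,
                  const std::vector<int>& rightCounts) {
    int nl = 0, nr = 0;
    for (int c : leftCounts)  nl += c;
    for (int c : rightCounts) nr += c;
    double n = nl + nr;
    return (nl / n) * entropy(leftCounts) + (nr / n) * entropy(rightCounts);
}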

(Figure: the training set S shrinks at each level of the tree.)

Problem: we quickly run out of training samples for the deeper tests.

53

Idea: Use Several Sub-Optimal Trees

Each tree is trained with a random subset of the training set.

54

Idea: Use Several Sub-Optimal Trees

The leaves contain the probabilities over the classes, computed from the training set.

55

Classification with Several Sub-Optimal Trees

The test sample is dropped into each tree, and the probabilities in the leaves it reaches are averaged; for example, with three trees:

$$P(c \mid \mathrm{sample}) = \tfrac{1}{3}\,\big(P_1(c) + P_2(c) + P_3(c)\big)$$

56

Visual Interpretation

Each tree partitions the space in a different way and computes the probability of each class for each cell of the partition:

57

Visual Interpretation

Combining the trees gives a fine partition with a better estimate of the class probabilities:

58

For Patches

Possible tests: compare the intensities of two pixels around the keypoint m, after Gaussian smoothing of the image:

$$f_i(m) = \begin{cases} 1 & \text{if } I(m + dm_{i,1}) \le I(m + dm_{i,2}) \\ 0 & \text{otherwise} \end{cases}$$

where I is the image after Gaussian smoothing, and dm_{i,1} and dm_{i,2} are the two pixel offsets of test i.

• Very efficient to compute;
• Invariant to light changes by any raising (monotonically increasing) function of the intensities.

59
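A minimal sketch of one such test in C++ (the Image structure and all names are illustrative); the image is assumed to be already Gaussian-smoothed:

#include <vector>

struct Image {
    int width, height;
    std::vector<unsigned char> data;  // row-major, already smoothed
    unsigned char at(int u, int v) const { return data[v * width + u]; }
};

// Binary test i: compare the intensities at two offsets around keypoint m.
// Bounds checks are omitted for brevity.
int binaryTest(const Image& I, int mu, int mv,
               int du1, int dv1, int du2, int dv2) {
    return I.at(mu + du1, mv + dv1) <= I.at(mu + du2, mv + dv2) ? 1 : 0;
}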

Results

60

Randomized Trees (and Random Ferns) applied to image patches are becoming a powerful tool for Computer Vision.

61

[Shotton et al, CVPR’11]

Used to infer body parts in the Kinect body tracking system.

The tests rely on the depth map.

62

Tests in [Shotton et al, CVPR'11]

Classes are the body parts. The goal is to label each pixel with the label of the part it belongs to.

The tests compare the depth at two pixels around the considered pixel m.

The displacements are normalized by the depth of the considered pixel, for invariance to the distance to the camera:

$$f_i(m) = \begin{cases} 1 & \text{if } \mathrm{depth}\!\left(m + \dfrac{dm_1}{\mathrm{depth}(m)}\right) \le \mathrm{depth}\!\left(m + \dfrac{dm_2}{\mathrm{depth}(m)}\right) \\ 0 & \text{otherwise} \end{cases}$$

63
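A minimal sketch of such a depth-normalized test (an illustration of the idea, not the Kinect implementation; names and types are assumptions):

#include <vector>

struct DepthMap {
    int width, height;
    std::vector<float> depth;  // row-major depth values
    float at(int u, int v) const { return depth[v * width + u]; }
};

// The pixel offsets are scaled by 1/depth(m), so the feature responds the
// same way whether the person is near or far. Bounds checks omitted.
int depthTest(const DepthMap& D, int mu, int mv,
              float du1, float dv1, float du2, float dv2) {
    float d = D.at(mu, mv);
    float z1 = D.at(mu + int(du1 / d), mv + int(dv1 / d));
    float z2 = D.at(mu + int(du2 / d), mv + int(dv2 / d));
    return z1 <= z2 ? 1 : 0;
}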

3D Pose Estimation

Mean-Shift is used to find the joint locations from the body parts.

64

Training

“Training 3 trees to depth 20 from 1 million images takes about 1 day on a 1000 core cluster” [Shotton et al, CVPR’11]

Most of the training data is synthetic:

65

A Subtree

(Figure: for each node, the average of the training patches that reach that node.)

66

[Gall and Lempitsky, CVPR'09; Barinova et al, CVPR'10]

Hough Forests for object detection:
• Random Forests are used to make each patch vote for the object centroid;
• The tests compare the outputs of filters and histograms-of-gradients between 2 pixels;
• The leaves contain the displacements toward the object center.

(Figure: each patch votes for the object centroid; the votes from all patches are accumulated, and the final detection is the peak of the accumulated votes.)

67

Tests used in [Gall and Lempitsky, CVPR'09]

Channels: the 3 color channels, the absolute values of the first and second derivatives of the image, and 9 channels from HoG (Histograms-of-Gradients):

$$f_i(m) = \begin{cases} 1 & \text{if } \mathrm{channel}_i(m + dm_1) < \mathrm{channel}_i(m + dm_2) + \tau \\ 0 & \text{otherwise} \end{cases}$$

68

[Bosch et al, ICCV'07]

Image Classification using Random Forests and Ferns [Bosch et al, ICCV'07]: uses a sliding window to detect objects. Much faster than SVMs, with similar recognition performance.

69

[Bosch et al, ICCV'07]

Tests:

$$f_i(m) = \begin{cases} 1 & \text{if } n^\top x_m + b \le 0 \\ 0 & \text{otherwise} \end{cases}$$

where n and b are a random vector and a random scalar, and x_m is a vector computed from a Pyramidal Histogram-of-Gradients.

70

[Kalal et al, CVPR'10]

TLD (aka Predator), for Track, Learn, Detect:

• Random Ferns used to speed up detection;

• Trained online: the distributions in the leaves are updated online, using the incoming images.

71

[Kalal et al, CVPR'10]

• Tests: 2-bit binary patterns;

• Trained online: the distributions in the leaves are updated online, using the incoming images.

72

Random Ferns: A Simplified Tree-Like Classifier

73

For Keypoint Recognition, We Can Use Random Tests!

(Figure: recognition rate versus number of trees for 200 keypoints, comparing tests selected by minimizing the entropy with tests at random locations.)

74


We Can Use Random Tests

• For a small number of classes:
  – we can try several tests, and
  – retain the best one according to some criterion.

• When the number of classes is large:
  – any test does a decent job.

76

Why it is Interesting

• Building the trees takes no time (we still have to estimate the posterior probabilities);

• Allows incremental learning;

• Simplifies the classifier structure.

77

The Tree Structure is not Needed

(Figure: with fixed tests, the same tests f1, f2, f3 are applied to every patch; the string of results of the pixel comparisons (0 or 1) indexes the class label directly.)

The distributions can then be expressed simply, as:

80

We are looking for

$$\underset{i}{\operatorname{argmax}}\; P(C = c_i \mid \mathrm{patch})$$

If the patch can be represented by a set of image features { f_i }:

$$P(C = c_i \mid \mathrm{patch}) = P(C = c_i \mid f_1, f_2, \ldots, f_n, f_{n+1}, \ldots, f_N)$$

which is proportional to P(f_1, f_2, ..., f_N | C = c_i), but a complete representation of this joint distribution is infeasible.

Naive Bayes ignores the correlations between the features:

$$P(f_1, \ldots, f_N \mid C = c_i) \approx \prod_k P(f_k \mid C = c_i)$$

Compromise: group the features into small sets (the "Ferns") and ignore only the correlations between the groups:

$$P(f_1, \ldots, f_N \mid C = c_i) \approx \prod_m P(F_m \mid C = c_i)$$

81

Training

(Figures, slides 82-88: each training patch of a class is dropped through the Ferns; the binary string of its test results selects one leaf per Fern, and the counter of that leaf for the patch's class is incremented.)

88

Training Results

Normalize the counts so that, for each class, the distribution over the leaves sums to 1:

$$\sum_{\mathrm{leaves}} P(f_1, f_2, \ldots, f_n \mid C = c_i) = 1$$

90

Recognition

91

Normalization

Normalize: for each class c_i,

$$\sum_{\mathrm{leaves}} P(f_1, f_2, \ldots, f_n \mid C = c_i) = 1$$

92

Subtlety with Normalization

$$p_{\mathrm{leaf},\,\mathrm{class}} = \frac{N_{\mathrm{samples}}(\mathrm{leaf}, \mathrm{class})}{N_{\mathrm{samples}}(\mathrm{class})}$$

is too selective: Number of samples(leaf, class) can be 0 simply because the training set is finite.

Instead, we use:

$$p_{\mathrm{leaf},\,\mathrm{class}} = \frac{N_{\mathrm{samples}}(\mathrm{leaf}, \mathrm{class}) + N_{\mathrm{regularization}}}{N_{\mathrm{samples}}(\mathrm{class}) + N_{\mathrm{leaves}} \times N_{\mathrm{regularization}}}$$

This can be done by simply initializing the counters to N_regularization instead of 0.

93
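A minimal sketch of this regularized counting (illustrative names): initializing every (leaf, class) counter to N_regularization yields exactly the smoothed probabilities above:

#include <vector>

struct FernDistribution {
    int nLeaves, nClasses;
    std::vector<double> counts;  // counts[leaf * nClasses + c]

    FernDistribution(int leaves, int classes, double nReg)
        : nLeaves(leaves), nClasses(classes),
          counts(leaves * classes, nReg) {}  // counters start at nReg, not 0

    void addSample(int leaf, int c) { counts[leaf * nClasses + c] += 1.0; }

    // p(leaf | class): the denominator is automatically
    // N_samples(class) + nLeaves * nReg, matching the formula above.
    double prob(int leaf, int c) const {
        double total = 0.0;
        for (int l = 0; l < nLeaves; ++l) total += counts[l * nClasses + c];
        return counts[leaf * nClasses + c] / total;
    }
};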

Influence of N_regularization

$$p_{\mathrm{leaf},\,\mathrm{class}} = \frac{N_{\mathrm{samples}}(\mathrm{leaf}, \mathrm{class}) + N_{\mathrm{regularization}}}{N_{\mathrm{samples}}(\mathrm{class}) + N_{\mathrm{leaves}} \times N_{\mathrm{regularization}}}$$

(Figure: recognition rate as a function of N_regularization, on a log scale.)

94

Implementation of Feature Point Recognition with Ferns

The original one-line listing, reindented and commented (reading the variables as: H = number of classes, M = number of Ferns, S = tests per Fern, K = pointer to the smoothed patch, D = pixel-offset pairs of the tests, PF = table of posteriors, P = accumulated per-class scores):

for (int i = 0; i < H; i++) P[i] = 0.;
for (int k = 0; k < M; k++) {
    int index = 0, *d = D + k * 2 * S;
    for (int j = 0; j < S; j++) {
        index <<= 1;                        // make room for the next test bit
        if (*(K + d[0]) < *(K + d[1]))      // binary pixel-intensity comparison
            index++;
        d += 2;
    }
    p = PF + k * shift2 + index * shift1;   // posteriors of the leaf reached in Fern k
    for (int i = 0; i < H; i++) P[i] += p[i];  // accumulate per-class scores
}

• Very simple to implement;
• No need for orientation, perspective, or light correction.

95

Ferns versus SIFT

(Figure: number of inliers for Ferns versus number of inliers for SIFT; each point corresponds to an image from a 1000-frame sequence.)

Ferns are much faster, and sometimes more accurate, but SIFT does not need training.

96

Randomized Trees vs Ferns

Different combination strategies: average (RT) / product (Ferns).

(Figure: recognition rate versus number of structures for Ferns with product, RT (with random tests) with product, Ferns with average, and RT (with random tests) with average.)

Ferns are more discriminant but more sensitive to outliers.

97

Randomized Trees vs Ferns

Influence of the number of classes:

(Figure: recognition rate as a function of the number of classes, for Ferns with product and Ferns with average.)

98

Memory and Computation Time

• Recognition time grows linearly with the number of Trees/Ferns and the number of classes.

• Recognition time grows linearly with the depth of the Trees/Ferns.

• Memory grows linearly with the number of Trees/Ferns and the number of classes.

• Memory grows exponentially with the depth of Trees/Ferns.

• Increasing the depth may result in overfitting.

• Increasing the number of Trees/Ferns (usually) improves recognition.

99

Influence of the Number of Ferns

(Figure: recognition rate versus number of structures, for Ferns with product, RT (with random tests) with product, Ferns with average, and RT (with random tests) with average.)

Increasing the number of Ferns/Trees improves the recognition rate, but increases the computation time and memory.

100

Number of Ferns / Number of Leaves / Memory / Computation Time

(Figure: recognition rate and computation time as functions of the Fern size and the number of Ferns.)

101

Conclusions on Randomized Trees and Ferns

• Simple to implement, Ferns even simpler;

• Both very fast, but dumb: need a lot of training examples to learn.

• Use a lot of memory to store the posterior distributions in the leaves.

102

We now have correspondences between a reference image of the object and the input image:

Some correspondences are correct, some are not. We can estimate the homography between the two images by applying RANSAC to subsets of 4 correspondences.

103

Computing a Homography from Point Correspondences by solving a linear system

With m = [u, v, 1]ᵀ and m̃' = [ku', kv', k]ᵀ, the relation m̃' = H m gives, once the unknown scale factor k is eliminated, two linear equations per correspondence:

$$\begin{pmatrix} u & v & 1 & 0 & 0 & 0 & -uu' & -vu' & -u' \\ 0 & 0 & 0 & u & v & 1 & -uv' & -vv' & -v' \end{pmatrix} \begin{pmatrix} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \\ H_{33} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

104

Computing a Homography from Point Correspondences by solving a linear system

Writing out m̃' = H m and dividing by the third coordinate:

$$u' = \frac{H_{11}u + H_{12}v + H_{13}}{H_{31}u + H_{32}v + H_{33}}, \qquad v' = \frac{H_{21}u + H_{22}v + H_{23}}{H_{31}u + H_{32}v + H_{33}}$$

105

Computing a Homography from Point Correspondences by solving a linear system

Cross-multiplying the expressions for u' and v' gives the two rows shown above. Using four correspondences:

$$B\,X = 0_8, \qquad X = [H_{11}, H_{12}, H_{13}, H_{21}, H_{22}, H_{23}, H_{31}, H_{32}, H_{33}]^\top$$

where B is the 8x9 matrix obtained by stacking the two rows of each of the four correspondences.

106

How to Solve this Linear System?

$$B\,X = 0_8, \qquad X = [H_{11}, H_{12}, H_{13}, H_{21}, H_{22}, H_{23}, H_{31}, H_{32}, H_{33}]^\top$$

• X is the null eigenvector of BᵀB;

• In practice: take the eigenvector corresponding to the smallest eigenvalue.

107
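A minimal sketch using the Eigen library (an assumption; any linear-algebra package works): the eigenvector of BᵀB with the smallest eigenvalue is the right singular vector of B with the smallest singular value:

#include <Eigen/Dense>

// Recover the 9 entries of H as the right singular vector of B associated
// with its smallest singular value. B stacks two rows per correspondence.
Eigen::Matrix3d homographyFromB(const Eigen::Matrix<double, 8, 9>& B) {
    Eigen::JacobiSVD<Eigen::Matrix<double, 8, 9>> svd(B, Eigen::ComputeFullV);
    Eigen::Matrix<double, 9, 1> X = svd.matrixV().col(8);  // smallest singular value is last
    Eigen::Matrix3d H;
    H << X(0), X(1), X(2),
         X(3), X(4), X(5),
         X(6), X(7), X(8);
    return H / H(2, 2);  // a homography is defined up to a scale factor
}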

Computing a Homography from Point Correspondences with a non-linear optimization

• Non-linear least-squares minimization: minimizes a physically meaningful error (the reprojection error, in pixels);

• Minimization algorithms: Gauss-Newton or Levenberg-Marquardt (very efficient).

$$\min_{R,T} \sum_i \mathrm{dist}^2\!\left(H_{R,T}\,m_i,\; m'_i\right)$$

108

Numerical Optimization

Start from an initial guess p0. p0 can be taken randomly, but should be as close as possible to the global minimum, e.g.:

- the pose computed at time t-1;
- the pose predicted from the pose computed at time t-1 and a motion model;
- ...

(Figure: successive iterates p0, p1, p2 descending toward the minimum.)

109

Numerical Optimization

General methods:
• Gradient descent / steepest descent;
• Conjugate gradient;
• ...

Non-linear least-squares optimization:
• Gauss-Newton;
• Levenberg-Marquardt;
• ...

110

Numerical Optimization

We want to find the p that minimizes:

$$E(p) = \sum_i \mathrm{dist}^2\!\left(H_{R(p),T(p)}\,m_i,\; m'_i\right) = \left\| f(p) - b \right\|^2$$

where:
• p is a vector of parameters that define the camera pose (translation vector + parameters of the rotation matrix);
• b is a vector made of the measurements (here the m'_i):

$$b = \big(u(m'_1),\; v(m'_1),\; \ldots\big)^\top$$

• f is the function that relates the camera pose to these measurements:

$$f(p) = \big(u(H_{R(p),T(p)}\,m_1),\; v(H_{R(p),T(p)}\,m_1),\; \ldots\big)^\top$$

111

Gradient Descent / Steepest Descent

$$p_{i+1} = p_i - \lambda\,\nabla E(p_i)$$

$$E(p_i) = \|f(p_i) - b\|^2 = \big(f(p_i) - b\big)^\top \big(f(p_i) - b\big) \;\Rightarrow\; \nabla E(p_i) = 2\,J^\top \big(f(p_i) - b\big)$$

with J the Jacobian matrix of f, computed at p_i.

Weaknesses:
- How to choose λ?
- Needs a lot of iterations in long and narrow valleys.

112

The Gauss-Newton and Levenberg-Marquardt Algorithms

But first, the linear least-squares case for E(p) = ||f(p) - b||²:

If the function f is linear, i.e. f(p) = A p, then p can be estimated directly as

p = A⁺ b

where A⁺ is the pseudo-inverse of A: A⁺ = (AᵀA)⁻¹Aᵀ.

113

Non-Linear Least-Squares: The Gauss-Newton Algorithm

Iteration step: p_{i+1} = p_i + Δ_i.

Δ_i is chosen to minimize the residual ||f(p_{i+1}) - b||². It is computed by approximating f to first order:

$$\Delta_i = \underset{\Delta}{\operatorname{argmin}}\; \|f(p_i + \Delta) - b\|^2 = \underset{\Delta}{\operatorname{argmin}}\; \|f(p_i) + J\Delta - b\|^2 = \underset{\Delta}{\operatorname{argmin}}\; \|\varepsilon_i + J\Delta\|^2$$

using the first-order approximation f(p_i + Δ) ≈ f(p_i) + JΔ, and writing ε_i = f(p_i) - b for the residual at iteration i.

Δ_i is the solution of the system JΔ = -ε_i in the least-squares sense:

Δ_i = -J⁺ ε_i, where J⁺ is the pseudo-inverse of J.

114

Non-Linear Least-Squares: The Levenberg-Marquardt Algorithm

In the Gauss-Newton algorithm: Δ_i = -(JᵀJ)⁻¹ Jᵀ ε_i

In the Levenberg-Marquardt algorithm: Δ_i = -(JᵀJ + λI)⁻¹ Jᵀ ε_i

Levenberg-Marquardt algorithm:

0. Initialize λ with a small value: λ = 0.001.

1. Compute Δ_i and E(p_i + Δ_i).

2. If E(p_i + Δ_i) > E(p_i): λ ← 10λ and go back to 1 [happens when the linear approximation of f is too coarse].

3. If E(p_i + Δ_i) < E(p_i): λ ← λ/10, p_{i+1} ← p_i + Δ_i, and go back to 1.

Once converged, set λ ← 0 and continue up to convergence.

115
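A minimal Levenberg-Marquardt sketch in C++ with Eigen (an assumption), following steps 0-3 above; the residual and Jacobian callbacks and all names are illustrative:

#include <Eigen/Dense>
#include <functional>

// residual: p -> f(p) - b;  jac: the Jacobian of f at p.
Eigen::VectorXd levenbergMarquardt(
    std::function<Eigen::VectorXd(const Eigen::VectorXd&)> residual,
    std::function<Eigen::MatrixXd(const Eigen::VectorXd&)> jac,
    Eigen::VectorXd p, int maxIter) {
    double lambda = 0.001;  // step 0: start with a small damping value
    for (int it = 0; it < maxIter; ++it) {
        Eigen::VectorXd eps = residual(p);
        Eigen::MatrixXd J = jac(p);
        Eigen::MatrixXd JtJ = J.transpose() * J;
        Eigen::MatrixXd I = Eigen::MatrixXd::Identity(p.size(), p.size());
        // Step 1: Delta_i = -(J^T J + lambda I)^{-1} J^T eps_i
        Eigen::VectorXd delta =
            -(JtJ + lambda * I).ldlt().solve(J.transpose() * eps);
        if (residual(p + delta).squaredNorm() > eps.squaredNorm()) {
            lambda *= 10.0;   // step 2: linear approximation too coarse
        } else {
            lambda /= 10.0;   // step 3: accept the step
            p += delta;
        }
    }
    return p;
}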

Non-Linear Least-Squares: The Levenberg-Marquardt Algorithm

$$\Delta_i = -\left(J^\top J + \lambda I\right)^{-1} J^\top \varepsilon_i$$

• When λ is small, LM behaves similarly to the Gauss-Newton algorithm.
• When λ becomes large, LM behaves similarly to a steepest descent, to guarantee convergence.

116

Another Way to Refine the Pose: Template Matching

117

Global region tracking by minimizing cross-correlation:
• Useful for objects difficult to model using local features;
• Accurate.

(Figure: a template T is registered against the input image I under the warp parameters p.)

118

Lucas-Kanade Algorithm

Minimize, over the warp parameters p, the sum of squared differences between the warped input image I and the template T at the pixels m_j:

$$\min_p \sum_j \big( W(I, p)[m_j] - T[m_j] \big)^2$$

Gauss-Newton step:

$$\Delta_i = J_p^{+} \cdot \varepsilon_{p,I}, \qquad \varepsilon_{p,I} = \big(\ldots,\; T[m_j] - W(I, p)[m_j],\; \ldots\big)^\top$$

where J_p⁺ is the pseudo-inverse of the Jacobian of W(I, p), evaluated at p and the m_j.

119

Lucas-Kanade Algorithm

Computing J and J⁺ is computationally expensive, and has to be redone at each iteration because J depends on the current estimate p.

120

Inverse Compositional Algorithm [Baker et al. IJCV03]

$$p_i = p_{i-1} + dp_i, \qquad dp_i = J_{p=0}^{+}\; \varepsilon_{p=0,\,I}$$

J_{p=0} is a constant matrix and can therefore be precomputed!

(Figure: the update dp_i is estimated in the template frame and composed with the current warp p_{i-1}.)

121

ESM (Efficient Second-order Method)

(1) I = T + J_{p=0} dp + dpᵀ H_{p=0} dp   [second-order Taylor expansion]

(2) J_{p=dp} = J_{p=0} + 2 dpᵀ H_{p=0}   [differentiation of (1) with respect to p]

(3) dpᵀ H_{p=0} = ½ (J_{p=dp} - J_{p=0})   [from equation (2)]

(4) I = T + [J_{p=0} + ½ (J_{p=dp} - J_{p=0})] dp   [by injecting (3) into (1)]

(5) dp = [½ (J_{p=0} + J_{p=dp})]⁺ (I - T)   [from equation (4)]

Like Gauss-Newton, but with J_{p=0} replaced by ½ (J_{p=0} + J_{p=dp}). J_{p=dp} and a pseudo-inverse must be computed at each iteration, but far fewer iterations are needed.

122

BRIEF [ECCV'10]

A very fast feature point descriptor.

123

Remark

• Moving legacy code to new CPUs does not result in a speed-up anymore;

• Should consider the features of new platforms: parallelism (multi-cores, GPU), locality, ...

124

BRIEF Descriptor

(Figure: pairwise intensity comparisons, after Gaussian smoothing, produce a binary string such as 1 1 0 ... 0 1.)

125

BRIEF Descriptor

Alternatively, the smoothing can be computed using integral images:

126

Integral Images

$$\mathrm{IntegralImage}(u, v) = \sum_{i=1..u}\;\sum_{j=1..v} \mathrm{Image}(i, j)$$

127

How to Use Integral Images

(Figure: the sum over any rectangle is obtained from four integral-image values: the bottom-right corner value, minus the two values at the adjacent corners, plus the value at the top-left corner.)

128

[Viola & Jones, IJCV’01]

Features computed in constant time

129

Computing Integral Images

A single pass over the image, with a running line buffer:

IntegralImage[u][v] = IntegralImage[u][v-1] + LineBuffer[u] + Image[u][v]

130
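A minimal sketch of the one-pass computation with a running row sum, plus the four-corner box sum from the previous slides (a zero border row and column are added to simplify the lookups; names are illustrative):

#include <vector>

// Build the integral image in one pass using a running row sum.
std::vector<int> integralImage(const std::vector<unsigned char>& img,
                               int w, int h) {
    std::vector<int> ii((w + 1) * (h + 1), 0);  // row 0 and column 0 stay 0
    for (int v = 1; v <= h; ++v) {
        int rowSum = 0;  // running sum over the current row
        for (int u = 1; u <= w; ++u) {
            rowSum += img[(v - 1) * w + (u - 1)];
            ii[v * (w + 1) + u] = ii[(v - 1) * (w + 1) + u] + rowSum;
        }
    }
    return ii;
}

// Sum of the pixels in [u1,u2] x [v1,v2] (1-based, inclusive), constant time.
int boxSum(const std::vector<int>& ii, int w,
           int u1, int v1, int u2, int v2) {
    int s = w + 1;
    return ii[v2 * s + u2] - ii[(v1 - 1) * s + u2]
         - ii[v2 * s + (u1 - 1)] + ii[(v1 - 1) * s + (u1 - 1)];
}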

Evaluation

131


132

Computation Speed

For BRIEF, most of the time is spent in Gaussian smoothing.

133

Matching Speed

distance(BRIEF descriptor 1, BRIEF descriptor 2)
= Hamming distance(BRIEF descriptor 1, BRIEF descriptor 2)
= number of bits set to 1 in (BRIEF descriptor 1 xor BRIEF descriptor 2)
= popcount(BRIEF descriptor 1 xor BRIEF descriptor 2)

10- to 15-fold speed increase on Intel's Bloomfield (SSE 4.2) and AMD's Phenom (SSE4a).

134
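A minimal sketch of this matching in C++ (assuming a 256-bit descriptor stored as four 64-bit words, and the GCC/Clang popcount builtin, which compiles to the POPCNT instruction on SSE 4.2 / SSE4a CPUs):

#include <cstdint>

// Hamming distance between two binary descriptors via popcount(xor).
int hammingDistance256(const uint64_t a[4], const uint64_t b[4]) {
    int d = 0;
    for (int i = 0; i < 4; ++i)
        d += __builtin_popcountll(a[i] ^ b[i]);  // count differing bits
    return d;
}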

Matching Speed

135

Picking the Locations

(Figure: five strategies for choosing the test locations: uniform distribution; Gaussian distribution; Gaussian distribution for both location and length; uniform distribution in polar coordinates; census transform locations.)

136


137

Rotation and Scale Invariance

138

Rotation and Scale Invariance

Duplicate the descriptors: 18 rotations x 3 scales.

(Figure: the reference descriptors are computed for each rotated and scaled version of the patch.)

139

Code released under the GPL on the CVLab website.

140

DOT [CVPR'10]

A dense descriptor for object detection.

Joint work with Stefan Hinterstoisser (TU Munich)

141

Object detection with a sliding window and template matching, using an efficient representation of the images and the templates.

142

143

144

Initial Similarity Measure

145

Making the Similarity Measure Robust to Small Motions

146

Downsampling

147

Ignoring the Dependencies between the Regions...

148

Lists of Dominant Orientations

149

Fast Computation with Bitwise Operations

(Figure: the dominant orientations of a region are encoded as a bit string, e.g. 0000110000010000.)

150

Code available under LGPL license athttp://campar.in.tum.de/personal/hinterst/index/

151

New Method: LINE [PAMI, under revision]

152

Initial Similarity Measure

Previous measure:

$$E_{\mathrm{Steger}}(I, O, c) = \sum_r \Big| \cos\big(\mathrm{orientation}(O, r) - \mathrm{orientation}(I, c + r)\big) \Big|$$

153

Making the Similarity Measure Robust to Small Motions

$$E_{\mathrm{Steger}}(I, O, c) = \sum_r \Big| \cos\big(\mathrm{orientation}(O, r) - \mathrm{orientation}(I, c + r)\big) \Big|$$

becomes

$$E(I, O, c) = \sum_r \max_{t \in \mathrm{region}(c + r)} \Big| \cos\big(\mathrm{orientation}(O, r) - \mathrm{orientation}(I, t)\big) \Big|$$

154

Avoiding Recomputing the max Operator

1. Spread the gradients:

155

2. Precompute response maps. Because:
• we consider only a discrete set of gradient directions, and
• we do not consider the gradient norms,
we can precompute a response map for each region in the image and each gradient direction in the template.

156

Optimized Version

1. The sets of orientations in the image regions are encoded with a binary representation (e.g. 11010):

157

Optimized Version

2. The binary representation is used as an index into lookup tables holding the precomputed responses for each gradient direction in the template:

158
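A minimal sketch of these two steps (assumed conventions: 8 quantized orientations, one bit per orientation; the lookup tables and all names are illustrative, not the LINE source code):

#include <cstdint>
#include <vector>

// Step 1: spread the quantized orientations by ORing each pixel's bits
// over a small neighborhood, making the measure robust to small motions.
std::vector<uint8_t> spreadOrientations(const std::vector<uint8_t>& quantized,
                                        int w, int h, int radius) {
    std::vector<uint8_t> spread(w * h, 0);
    for (int v = 0; v < h; ++v)
        for (int u = 0; u < w; ++u)
            for (int dv = -radius; dv <= radius; ++dv)
                for (int du = -radius; du <= radius; ++du) {
                    int uu = u + du, vv = v + dv;
                    if (uu >= 0 && uu < w && vv >= 0 && vv < h)
                        spread[v * w + u] |= quantized[vv * w + uu];
                }
    return spread;
}

// Step 2: response[o][bits] holds the precomputed max |cos| between template
// orientation o and any orientation present in the bit combination `bits`.
float lookupResponse(const float response[8][256],
                     const std::vector<uint8_t>& spread, int w,
                     int u, int v, int templateOrientation) {
    return response[templateOrientation][spread[v * w + u]];
}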

Avoiding Cache Misses

The response maps are re-arranged into linear memories:

159

Using the Linear Memories

The similarity measure can be computed for all the image locations by summing linear memories, shifted by an offset that depends on the template.

160

Advantage of Linearizing the Memory

(Figure: speed-up factor obtained by linearizing the memory.)

161

DOT [CVPR'10] vs LINE

162

LINE-MOD [Hinterstoisser et al, ICCV’11]

Extension to the Kinect: the templates combine the image and the depth map.

163

thanks!

164