Efficient Representation of Local Geometry for Large Scale Object Retrieval

20
22 nd June 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2009 Center for Machine Perception Czech Technical University in Prague Efficient Representation of Local Geometry for Large Scale Object Retrieval Michal Perďoch Ondřej Chum and Jiří Matas

description

Efficient Representation of Local Geometry for Large Scale Object Retrieval. Michal Per ďoch Ondřej Chum and Jiří Matas. Large Scale Object Retrieval. Large (web) scale “real-time” search involves millions(billions) of images - PowerPoint PPT Presentation

Transcript of Efficient Representation of Local Geometry for Large Scale Object Retrieval

Page 1: Efficient Representation of Local Geometry for Large Scale Object Retrieval

22nd June 2009IEEE Computer Society Conference on

Computer Vision and Pattern Recognition 2009

Center for Machine PerceptionCzech Technical University in Prague

Efficient Representation of Local Geometry for Large Scale Object Retrieval

Michal PerďochOndřej Chum and Jiří Matas

Page 2: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 2/20

Large Scale Object Retrieval

Large (web) scale “real-time” search involves millions(billions) of images

Indexing structure should fit into RAM, failing to do so results in a order of magnitude increase in response time

1 bit per feature requires ~1.16GB in a retrieval system with 5 million images, ~2000 features per image

Scalability and the cost of the search engine is determined by memory footprint -> memory is the limiting factor

Methods based on local features Are able to retrieve objects, not only images Robust to occlusions and background changes

Page 3: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 3/20

Image vs. Object Retrieval

Image query Object query

Page 4: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 4/20

Object Retrieval vs. Global Methods

Query

Page 5: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 5/20

Image Representation by Local Features

Local features = appearance + geometry Bag of words approaches focus on appearance. Geometry benefits object retrieval (in spatial verification)

[Philbin’07]

The problem: How to efficiently store geometric information of local features?

Currently, the straightforward representation of local geometry requires roughly 4-5x more memory than the appearance

I.e. ~32bits for appearance vs. ~160bits for geometry.

Our solution: a learnt geometric vocabulary

Page 6: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 6/20

?

Bag of words, Video Google [Šivic’03]

Image Representation by Local Features

Keypoint Detection

Local Appearance

SIFT Description [Lowe’04]

Visual Vocabulary

Visual Words

word1, word2, word8, ...

word948534 ,word998125

graffiti

graffiti

Local Geometry

graffiti

Page 7: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 7/20

Inverted fileInverted file

+TF-IDF weights

Object Retrieval with Spatial Verification

word123...

1000000treesbark

128...

948534

998125

128...

948534

998125

128...

948534

998125bike

128...

948534

998125

graffiti

TF-IDF ranking 0.85 0.81...

0.0130.001...

graffitigraffiti2

barkall_souls

Geometry

bikesbikesbarkgraffiti

Final ranking

210138...

3 1

graffitigraffiti2

barkall_souls

1481179...

948534

998423query

query

LO-RANSAC

H Ai Aj

Page 8: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 8/20

Coordinates x0 and an ellipse E

ellipse E can be decomposed up to an arbitrary rotation Q

We fix ambiguity in Q by choosing

Local Geometry of Affine Features

R is given by a dominant orientation as in [Lowe’04] Local affine transformation is defined by a single correspondence

Page 9: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 9/20

Geometry CompressionRepresenting transformation Ai

Ai (independent of rotation R), captures the shape of the ellipse

Let a transformation Bi be a representant of Ai, ideally

However, to compress the representation, B will be chosen from a vocabulary and shared by multiple Ai, so that

How to measure and minimize error E of H?

Page 10: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 10/20

Minimizing Geometric Error of H

using simple manipulations

Frobenius norm can be used in space of transformations as a similarity

We capture the quality of H by integrating reprojection error over unit circle

Page 11: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 11/20

K-means-like ClusteringFor a set of transformations A of cardinality N we are looking for a best set B of K representative candidates Bj

that minimizes error E

over all possible assignments Aj.1. assignment step

2. refinement step

which leads to a closed form solution.3. go to 1, until no reassignments occurred or error is below threshold

Page 12: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 12/20

Geometric “vocabularies” Assuming that the scale and shape of the ellipse are independent,

scale can be extracted and quantized separately Each geometric vocabulary is characterized by two numbers (x, y)

and denoted SxEyx – number of bits for encoding scaley – number of bits for encoding ellipse shape, K = 2y

Geometric vocabularies learnt on a set of local geometries of 10M affine covariant points (for visualisation scale was quantized separately)

y = 2, K = 22 y = 3, K = 23 y = 4, K = 24 y = 5, K = 25

Page 13: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 13/20

S0E8 S4E4 S8E0 S4E12 256 ellipses 16 scales, 16 ellipses 256 scales, circles 16 scales, 4k ellipses

Geometric “vocabularies”

Page 14: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 14/20

Efficient Geometry Representation

Coordinates xi, yi are quantized separately and encoded using 16bits

Local Geometry

x1,y1,B1

x2,y2,B5

x3,y3,B3...

xN,yN,BNgraffiti

Geom. Vocabulary

graffiti

Keypoint Detection

Local Appearance

SIFT Description [Lowe’04]

Visual Vocabulary

Visual Words

word1, word2, word8, ...

word948534 ,word998125

graffiti

graffiti

Page 15: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 15/20

Gravity Vector Assumption Transformation A has one eigenvector (0,1)> pointing down in the

image – we call it a gravity vector Surprisingly, upright direction is often preserved (R = I) Gravity vector assumption was used solely in spatial verification

[Philbin’07]

Page 16: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 16/20

Gravity Vector AssumptionPerformance comparison

SIFT dominant orientation [Lowe’04] produces 50% more descriptorsRobustness of gravity vector assumption

Robustness of SIFT to small imprecision in orientation Correct final geometry due to LO-step in RANSAC

Page 17: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 17/20

Experiments – Datasets and Protocol Oxford Buildings Dataset (Ox5K) [Philbin’07]

5062 images of buildings collected from Flickr, 55 queries with ground truth, 11 landmarks each with 5 different queries

mAP – mean Average Precision, average area below the precision-recall curves for all queries

Additional 100k of distractor images, together dataset (Ox105K)

Page 18: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 18/20

Two different visual vocabularies, trained on Oxford5K and ~6000 images of tourist spots in Paris, protocol from [Philbin’07]

Spatial verification with “8bit per feature” geometric vocabulary for ellipse shape performs almost as exact geometry

It might be necessary to use more challenging datasets to highlight the benefit of affine shape. In Oxford buildings dataset 86% of all GT pairs have anisotropy under 1.5 (72% is under 1.25) 90% of all GT pairs have skew under 20° (76% is under 10°)

Experiments – Geometric Vocabularies

No geometry

Full geometry

Page 19: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 19/20

Experiments – Performance Comparison Oxford Buildings Dataset

Proposed method outperforms all state of the art methods Geometry representation S0E8 was used

Page 20: Efficient Representation of Local Geometry for Large Scale Object Retrieval

Perďoch M., Chum O.,Matas J.: Efficient Representation of Local Geometry for Large Scale Object Retrieval 20/20

ConclusionsContributions

We have proposed a method for learning an efficient, compact representation of local geometry

Method achieved state of the art results (0.916 mAP) on Oxford buildings dataset and competitive results on INRIA Holidays dataset.

Proposed method requires only 45.2 bits/feature on average Local geometry in 24 bits (16 bits for position, 8 bits for shape) Visual word labels and TF/IDF weights in 21.2 bits

We have shown that the use of gravity vector assumption in feature description significantly improves recognition rate and further reduces the memory footprint

Future work Compare performance of scale vs. affine covariant points on

standard dataset