Packing bag-of-features, ICCV 2009
Hervé Jégou, Matthijs Douze, Cordelia Schmid
INRIA, LEAR (Learning and Recognition in Vision)
[email protected], [email protected], [email protected]

Transcript of the presentation.

OUTLINE: Introduction, Background, Structure & Approach, Experiments, Conclusion

Introduction

One of the main limitations of image search based on bag-of-features is the memory usage per image.

The paper provides a method that reduces memory usage and is faster than the standard bag-of-features approach.

Background

BOF (Bag-of-features)

BOF outline:
1. Extract local image descriptors (features) (Hessian-Affine detector [8] & SIFT descriptor [6])
2. Learn a "visual vocabulary" (clustering of the descriptors)
3. Quantize the features using the visual vocabulary (k-means quantizer)
4. Represent each image by the frequencies of its "visual words": the histogram of visual-word occurrences is weighted using tf-idf and normalized with the L2 norm, producing a frequency vector fi of length k.
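Steps 3 and 4 above can be sketched in a few lines. This is a minimal illustration with toy random data, not the paper's pipeline: the vocabulary would come from k-means on real SIFT descriptors, and `idf` would be learned on a corpus.

```python
import numpy as np

def bof_vector(descriptors, centroids, idf):
    """Quantize local descriptors to their nearest centroid (visual word),
    build the word histogram, weight it by idf, and L2-normalize."""
    # nearest-centroid assignment (step 3)
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    # histogram of visual-word occurrences (step 4)
    k = len(centroids)
    hist = np.bincount(words, minlength=k).astype(float)
    # tf-idf weighting, then L2 normalization
    v = hist * idf
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

rng = np.random.default_rng(0)
centroids = rng.normal(size=(5, 8))      # toy vocabulary, k = 5
descriptors = rng.normal(size=(20, 8))   # toy local descriptors
idf = np.ones(5)                         # uniform idf for this toy example
f = bof_vector(descriptors, centroids, idf)
print(f.shape)  # (5,)
```

The resulting vector has unit L2 norm, as the slide states.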

tf-idf weighting (example):

tf = 3/100 = 0.030 (the word 'a' appears 3 times in a 100-word document)

idf = log2(10,000,000 / 1,000) ≈ 13.287 (1,000 out of 10,000,000 documents contain 'a')

tf-idf = 0.030 × 13.287 ≈ 0.399
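The quoted idf value 13.287 corresponds to a base-2 logarithm (log2(10,000) ≈ 13.288); the arithmetic can be checked directly:

```python
import math

tf = 3 / 100                          # 'a' appears 3 times in a 100-word document
idf = math.log2(10_000_000 / 1_000)   # 1,000 of 10,000,000 documents contain 'a'
print(round(tf, 3), round(idf, 3), round(tf * idf, 3))  # 0.03 13.288 0.399
```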

[Figure: visual words (features) extracted from an image]

Datasets: the INRIA Holidays dataset [4], the University of Kentucky recognition benchmark [9], and Flickr1M & Flickr1M*.

Binary BOF [12]

Binary BOF discards the information about the exact number of occurrences of a given visual word in the image: each component only indicates the presence or absence of a particular visual word.

A sequential coding uses 1 bit per component, so the memory usage is ⌈k/8⌉ bytes per image, typically about 10 kB per image.

[12] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, pages 1470-1477, 2003.
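The 1-bit-per-component coding can be sketched with numpy's bit packing; the values below are illustrative, not from the paper:

```python
import numpy as np

k = 1000                               # vocabulary size
hist = np.zeros(k, dtype=np.uint8)     # toy visual-word histogram
hist[[3, 42, 999]] = [5, 1, 2]         # a few words occur

binary = (hist > 0).astype(np.uint8)   # keep presence/absence only
packed = np.packbits(binary)           # 1 bit per component
print(len(packed))  # ceil(1000 / 8) = 125 bytes
```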

Binary BOF [12]: results on the INRIA Holidays dataset (figure).

Compressed inverted file [16]

[16] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2):6, 2006.

Compared with a standard inverted file, about 4 times more images can be indexed using the same amount of memory.

The amount of memory to be read is proportionally reduced at query time, which may compensate for the decoding cost of the decompression algorithm.


MiniBOFs: 1) producing, 2) indexing, 3) fusing.

A query is then performed with a query image.

Projection of a BOF: vocabulary aggregators are sparse projection matrices A = {A1, ..., Am} of size d × k, where d is the dimension of the output descriptor and k is the dimension of the initial BOF.

For each projection vector (a matrix row), the number of non-zero components is nz = k/d. Typically nz = 8 for k = 1000, resulting in d = 125.
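A simplified sketch of one such aggregator: each of the d rows sums nz consecutive BOF components. The paper's matrices are sparse with nz non-zeros per row; this particular block structure is an assumption for illustration.

```python
import numpy as np

k, d = 1000, 125          # initial BOF dimension and output dimension
nz = k // d               # 8 non-zero components per projection row

# one aggregator: row j sums nz consecutive BOF components
A = np.zeros((d, k))
for j in range(d):
    A[j, j * nz:(j + 1) * nz] = 1.0

f = np.random.default_rng(1).random(k)   # toy BOF frequency vector
mini = A @ f                             # miniBOF of dimension d
print(mini.shape)  # (125,)
```

With this structure each BOF component contributes to exactly one output component, so the total mass of the vector is preserved.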

Projection of a BOF: the other aggregators are defined by shuffling the input BOF vector components using a random permutation. For k = 12 and d = 3, an example random permutation is (11, 2, 12, 8, 9, 4, 10, 1, 7, 5, 6, 3).

For image i, the m miniBOFs are ωi,j, 1 ≤ j ≤ m, computed from the BOF frequency vector fi.

Indexing structure [4]

Quantization

k’ = number of codebook entries of the indexing structure

The set of k-means codebooks qj(·), 1 ≤ j ≤ m, is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset.

The miniBOF quantization is not related to the one associated with the initial SIFT descriptors, hence we may choose k ≠ k'. Typically k' = 20,000.

Indexing structure

Binary signature generation: bi,j has length d and refines the localization of the miniBOF within the quantizing (Voronoi) cell.

Using the method of [4], the miniBOF is projected using a random rotation matrix R, producing d components.

Each bit of the vector bi,j is obtained by comparing the value projected by R to the median value of the elements having the same quantized index. The median values for all quantizing cells and all projection directions are learned off-line on an independent dataset.
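A minimal single-cell sketch of the signature generation: project with a random rotation and compare each direction to a learned median. The paper learns one set of medians per quantization cell; here a single cell and toy data are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8                                     # signature length (toy value)

# random rotation matrix R via QR decomposition of a Gaussian matrix
R, _ = np.linalg.qr(rng.normal(size=(d, d)))

# medians learned off-line on an independent set of projected miniBOFs
train = rng.normal(size=(500, d)) @ R.T
medians = np.median(train, axis=0)        # one median per projection direction

def signature(minibof):
    """1 bit per direction: projected value above/below the learned median."""
    return (minibof @ R.T > medians).astype(np.uint8)

b = signature(rng.normal(size=d))
print(b)  # length-d vector of 0/1 bits
```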

Indexing structure

The j-th miniBOF associated with image i is represented by a tuple: 4 bytes to store the image identifier i, plus ⌈d/8⌉ bytes to store the binary vector bi,j.

The total memory usage per image is Ci = m × (4 + ⌈d/8⌉).
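As a quick check of the formula (m = 4 is an illustrative value, not one from the slides):

```python
import math

def bytes_per_image(m, d):
    """Ci = m * (4 + ceil(d/8)): 4-byte image id plus the packed binary
    signature, per miniBOF."""
    return m * (4 + math.ceil(d / 8))

print(bytes_per_image(4, 125))  # 4 * (4 + 16) = 80 bytes
```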

Indexing structure

Multi-probe strategy [7]: retrieve not only the inverted list associated with the quantized index ci,j, but the set of inverted lists associated with the t closest centroids of the quantizer codebook.

This increases the number of image hits, because t times more inverted lists are visited.
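The multi-probe lookup can be sketched as follows; the 1-D centroids and the tiny inverted lists are toy data for illustration only:

```python
import numpy as np

def multi_probe_lists(query, centroids, inverted_lists, t):
    """Visit the inverted lists of the t closest centroids, not just the nearest."""
    d2 = ((centroids - query) ** 2).sum(axis=1)
    probes = np.argsort(d2)[:t]
    hits = []
    for c in probes:
        hits.extend(inverted_lists.get(int(c), []))
    return hits

centroids = np.array([[0.0], [1.0], [2.0], [5.0]])
inverted_lists = {0: [10], 1: [11, 12], 2: [13], 3: [14]}
print(multi_probe_lists(np.array([0.9]), centroids, inverted_lists, t=2))
# [11, 12, 10]
```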

Fusion: expected distance criterion

bq,j is the signature associated with the query image q, and bi,j is the signature of the database image I, with bq = [bq,1, ..., bq,m] and bi = [bi,1, ..., bi,m].

h(x, y) denotes the Hamming distance.

Fusion: the summed Hamming distance equals d × m/2 in expectation for images having no observed binary signature, and equals 0 if the database image I is the query image itself.

Query speed is improved by a threshold on the Hamming distance; we use τ = d/2.
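The two extreme cases of the summed Hamming distance can be illustrated with toy 8-bit signatures (the values below are made up for the example):

```python
def hamming(x, y):
    """Hamming distance between two bit strings stored as ints."""
    return bin(x ^ y).count("1")

d, m = 8, 3                                  # signature length, number of miniBOFs
bq = [0b10110010, 0b00001111, 0b11110000]    # query signatures, one per miniBOF

# the query image itself: summed Hamming distance is 0
self_dist = sum(hamming(q, s) for q, s in zip(bq, bq))
print(self_dist)  # 0

# an image with no observed signature counts d/2 per miniBOF in expectation,
# i.e. d * m / 2 in total
print(d * m // 2)  # 12
```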

Fusion

Experiments: the following parameters were used in all the miniBOF experiments.

On the University of Kentucky object recognition benchmark

On Holidays + Flickr1M

Experiments: on Holidays + Flickr1M

Experiments: sample results

Conclusion: this paper introduced a way of packing BOFs, the miniBOFs. An efficient indexing structure based on Hamming Embedding allows for rapid access, together with an expected distance criterion for the fusion of the scores.

It reduces memory usage, the quantity of memory scanned (hits), and query time.