Packing bag-of-features. ICCV 2009. Hervé Jégou, Matthijs Douze, Cordelia Schmid. INRIA.
![Page 1: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/1.jpg)
Packing bag-of-features
ICCV 2009
Hervé Jégou, Matthijs Douze, Cordelia Schmid
INRIA
![Page 2: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/2.jpg)
Introduction
• Introduction
• Proposed method
• Experiments
• Conclusion
![Page 3: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/3.jpg)
![Page 4: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/4.jpg)
Bag-of-features
• Extract local image descriptors
• Cluster the descriptors with a k-means quantizer (visual words)
• Weight the histogram of visual words using the tf-idf weighting scheme of [12], then normalize it with the L2 norm
• This produces a frequency vector $f_i$ of length k
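The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy 2-D descriptors and centroids, and the idf values are all hypothetical, and a real system would use SIFT descriptors with a learned k-means codebook.

```python
import math

def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid (visual word) closest to the descriptor."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda c: sq_dist(descriptor, centroids[c]))

def bof_vector(descriptors, centroids, idf):
    """Build a tf-idf weighted, L2-normalized histogram of visual words (length k)."""
    k = len(centroids)
    counts = [0] * k
    for d in descriptors:
        counts[nearest_centroid(d, centroids)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * k
    weighted = [(n / total) * idf[w] for w, n in enumerate(counts)]
    norm = math.sqrt(sum(v * v for v in weighted))
    return [v / norm for v in weighted] if norm > 0 else weighted

# Toy data (hypothetical): k = 3 visual words, 4 local descriptors.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.2)]
idf = [1.0, 2.0, 3.0]
f = bof_vector(descriptors, centroids, idf)
```

The third visual word is never observed, so its component stays at zero; this sparsity is what the inverted file exploits later.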
![Page 5: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/5.jpg)
TF–IDF weighting
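• $\text{tf-idf}(t, d) = \text{tf}(t, d) \times \text{idf}(t)$, with $\text{tf}(t, d) = n_{t,d} / \sum_{t'} n_{t',d}$ and $\text{idf}(t) = \ln(N / n_t)$, where $n_{t,d}$ is the number of occurrences of term $t$ in document $d$, $N$ is the total number of documents, and $n_t$ is the number of documents containing $t$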
![Page 6: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/6.jpg)
TF–IDF weighting
• tf
– A document contains 100 words, and the word ‘a’ occurs 3 times
– tf = 3/100 = 0.03
• idf
– 1,000 documents contain ‘a’, out of 10,000,000 documents in total
– idf = ln(10,000,000 / 1,000) = 9.21
• tf-idf = 0.03 × 9.21 = 0.28
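The arithmetic of this example can be checked with a short script. The helper names `tf` and `idf` are hypothetical, and the natural logarithm is used because that is what the slide's figure of 9.21 implies.

```python
import math

def tf(term_count, doc_length):
    """Term frequency: share of the document taken up by the term."""
    return term_count / doc_length

def idf(num_docs, docs_with_term):
    """Inverse document frequency with a natural logarithm."""
    return math.log(num_docs / docs_with_term)

tf_a = tf(3, 100)                 # 'a' occurs 3 times among 100 words
idf_a = idf(10_000_000, 1_000)    # 1,000 of 10,000,000 documents contain 'a'
tfidf_a = tf_a * idf_a
print(f"tf={tf_a:.2f} idf={idf_a:.2f} tf-idf={tfidf_a:.2f}")
# prints: tf=0.03 idf=9.21 tf-idf=0.28
```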
![Page 7: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/7.jpg)
Binary BOF [12]
• Discards the information about the exact number of occurrences of a given visual word in the image.
• Binary BOF vector components only indicate the presence or absence of a particular visual word in the image.
• With a sequential coding using 1 bit per component, i.e., ⌈k/8⌉ bytes per image, the memory usage would typically be 10 kB per image.
[12] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, pages 1470–1477, 2003.
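A minimal sketch of the 1-bit-per-component coding follows. The function names and the bit order within each byte are assumptions made for illustration; the slide only states the ⌈k/8⌉ byte budget.

```python
def pack_binary_bof(counts):
    """Pack a BOF histogram into ceil(k/8) bytes, keeping only presence/absence."""
    k = len(counts)
    packed = bytearray((k + 7) // 8)          # integer form of ceil(k / 8)
    for i, n in enumerate(counts):
        if n > 0:
            packed[i // 8] |= 1 << (i % 8)    # set bit i (least significant bit first)
    return bytes(packed)

def has_word(packed, i):
    """Return True if visual word i is present in the packed vector."""
    return bool(packed[i // 8] & (1 << (i % 8)))

# Toy histogram with k = 10 visual words (hypothetical values).
counts = [0, 3, 0, 0, 1, 0, 0, 0, 2, 0]
packed = pack_binary_bof(counts)
```

Here k = 10 components fit in 2 bytes, and the occurrence counts 3, 1 and 2 all collapse to a single set bit.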
![Page 8: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/8.jpg)
Binary BOF(Holidays dataset)
![Page 9: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/9.jpg)
Inverted-file index (Sparsity)
• Documents
– T0 = "it is what it is"
– T1 = "what is it"
– T2 = "it is a banana"
• Index
– "a": {2}
– "banana": {2}
– "is": {0, 1, 2}
– "it": {0, 1, 2}
– "what": {0, 1}
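The text example above can be reproduced with a few lines of Python. The function name `build_inverted_index` is hypothetical. For BOF the same structure would apply with images in place of documents and visual words in place of words.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

documents = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(documents)
for word in sorted(index):
    print(word, index[word])
```

Only the words that actually occur in a document generate an entry, which is why the structure is efficient when the vectors are sparse.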
![Page 10: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/10.jpg)
Binary BOF
![Page 11: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/11.jpg)
Compressed inverted file
• Compression can come close to the vector entropy
• Compared with a standard inverted file, about 4 times more images can be indexed using the same amount of memory
• This may compensate for the decoding cost of the decompression algorithm
[16] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2):6, 2006.
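The slide does not say which compression code is used. As an illustration only, the sketch below uses gap encoding followed by variable-byte coding, a common scheme for inverted lists; it is a stand-in technique, and all function names are hypothetical.

```python
def encode_varbyte(n):
    """Encode a non-negative integer with 7 payload bits per byte.

    The high bit of a byte is set when more bytes follow.
    """
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def compress_postings(sorted_ids):
    """Store the gaps between consecutive ids instead of the ids themselves."""
    out = bytearray()
    prev = 0
    for i in sorted_ids:
        out += encode_varbyte(i - prev)
        prev = i
    return bytes(out)

def decompress_postings(data):
    """Invert compress_postings: decode the gaps and accumulate them."""
    ids, prev, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            ids.append(prev)
            value, shift = 0, 0
    return ids

postings = [3, 7, 150, 1000, 1003]
packed = compress_postings(postings)
```

In this toy case the five identifiers take 7 bytes instead of the 20 bytes needed with 4 bytes per entry, at the price of a decoding pass at query time.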
![Page 12: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/12.jpg)
![Page 13: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/13.jpg)
MiniBOFs
![Page 14: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/14.jpg)
Projection of a BOF
• Sparse projection matrices
– $A = \{A_1, \dots, A_m\}$, each of size $d \times k$
– d: dimension of the output descriptor
– k: dimension of the input BOF
• For each matrix row, the number of non-zero components is $n_z = k/d$, typically set to $n_z = 8$ for k = 1000, resulting in d = 125
![Page 15: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/15.jpg)
Projection of a BOF
• The other matrices are defined by random permutations.
– For k = 12 and d = 3, the random permutation (11, 2, 12, 8; 9, 4, 10, 1; 7, 5, 6, 3)
• Image i, m mini-BOFs
– $\omega_{i,j} = A_j f_i$, $(1 \le j \le m)$
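The projection can be sketched as a grouped sum over BOF components. This assumes that the non-zero entries of each projection matrix are equal to 1, so that each output component aggregates nz input components; the function name `mini_bof` and the toy BOF vector are hypothetical. The permutation is the one given on the slide, with 1-based indices.

```python
def mini_bof(f, permutation, d):
    """Aggregate a length-k BOF into d components using a permutation of 1..k.

    Row r of the implied sparse matrix has ones at the nz = k/d positions
    listed in the r-th group of the permutation.
    """
    k = len(f)
    if len(permutation) != k or k % d != 0:
        raise ValueError("permutation must cover all k components and d must divide k")
    nz = k // d
    return [sum(f[p - 1] for p in permutation[r * nz:(r + 1) * nz]) for r in range(d)]

# k = 12, d = 3, so nz = 4 non-zero components per row.
f = list(range(1, 13))                                  # toy BOF: component i holds the value i
perm = [11, 2, 12, 8, 9, 4, 10, 1, 7, 5, 6, 3]
omega = mini_bof(f, perm, d=3)
print(omega)  # [33, 24, 21]
```

Using m different permutations yields m different mini-BOFs for the same image, each summing a different grouping of the visual words.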
![Page 16: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/16.jpg)
Indexing structure
• Quantization
– The miniBOF $\omega_{i,j}$ is quantized by the quantizer $q_j$ associated with matrix $A_j$: $c_{i,j} = q_j(\omega_{i,j})$, $1 \le c_{i,j} \le k'$, where $k'$ is the number of codebook entries of the indexing structure.
– The set of k-means codebooks is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset. The dictionary size $k'$ associated with the miniBOFs is not related to the one associated with the initial SIFT descriptors, hence we may choose $k' \neq k$. We typically set $k' = 20000$.
![Page 17: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/17.jpg)
Indexing structure
• Binary signature generation
– The miniBOF is projected using a random rotation matrix R, producing d components
– Each bit of the vector is obtained by comparing the value projected by R to the median value of the elements having the same quantized index. The median values for all quantizing cells and all projection directions are learned off-line on an independent dataset
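A minimal sketch of the signature generation is given below. Several things are assumptions made for illustration: the rotation is a fixed 2-D rotation rather than a random d-dimensional one, the median values are invented rather than learned, and a bit is set to 1 when the projected value is strictly above the median (the slide does not state the direction of the comparison).

```python
import math

def binary_signature(omega, rotation, medians):
    """Project a miniBOF with a rotation matrix and threshold each component.

    medians holds one learned value per projection direction for the
    quantizing cell that omega falls into.
    """
    projected = [sum(r * x for r, x in zip(row, omega)) for row in rotation]
    return [1 if p > m else 0 for p, m in zip(projected, medians)]

# Toy setting with d = 2 (hypothetical values).
theta = math.pi / 4
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
medians_for_cell = [1.0, 1.0]
bits = binary_signature([2.0, 1.0], R, medians_for_cell)
print(bits)  # [0, 1]
```

Thresholding at the median means that, within a cell, each bit is set for roughly half of the training vectors, which keeps the signature bits informative.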
![Page 18: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/18.jpg)
Quantizing cells
[4] H. Jégou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.
![Page 19: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/19.jpg)
Indexing structure
• The miniBOF associated with image i is represented by the tuple $(c_{i,j}, b_{i,j})$, i.e., its quantized index and its binary signature
•
• The total memory usage per image is $m \times (4 + \lceil d/8 \rceil)$ bytes
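The memory budget can be checked numerically. The layout assumed here, a 4-byte image identifier plus a d-bit signature per miniBOF entry, is an assumption; it does reproduce the 160 MB reported in the experiments for m = 8 when applied to about one million images with d = 125. The function name is hypothetical.

```python
import math

def bytes_per_image(m, d, id_bytes=4):
    """Memory per image: m entries, each an image id plus a packed d-bit signature."""
    return m * (id_bytes + math.ceil(d / 8))

per_image = bytes_per_image(m=8, d=125)       # 8 * (4 + 16)
total_mb = per_image * 1_000_000 / 1e6        # about one million images
print(per_image, total_mb)  # 160 160.0
```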
![Page 20: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/20.jpg)
Multi-probe strategy
• Retrieve not only the inverted list associated with the quantized index, but the set of inverted lists associated with the t closest centroids of the quantizer codebook
• This multiplies the number of image hits by t
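The multi-probe lookup can be sketched as follows. The function names, the toy 2-D codebook and the inverted lists are hypothetical, and an exhaustive sort over the centroids stands in for whatever nearest-centroid search a real system would use.

```python
def closest_centroids(vector, codebook, t):
    """Return the indices of the t centroids closest to the vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    order = sorted(range(len(codebook)), key=lambda c: sq_dist(vector, codebook[c]))
    return order[:t]

def multi_probe(vector, codebook, inverted_lists, t):
    """Collect the entries of the inverted lists of the t closest centroids."""
    hits = []
    for c in closest_centroids(vector, codebook, t):
        hits.extend(inverted_lists.get(c, []))
    return hits

# Toy codebook with 3 entries and the image ids stored in each inverted list.
codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
inverted_lists = {0: [10, 11], 1: [12], 2: [13, 14]}
hits_t1 = multi_probe((0.2, 0.0), codebook, inverted_lists, t=1)
hits_t2 = multi_probe((0.2, 0.0), codebook, inverted_lists, t=2)
```

With t = 1 only the list of the nearest centroid is scanned; with t = 2 the neighboring list is scanned too, which raises recall at the cost of more image hits.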
![Page 21: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/21.jpg)
Fusion
• Query signature
• Database signature
![Page 22: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/22.jpg)
Fusion
•
– equal to 0 for images having no observed binary signatures
– equal to if the database image i is the query image itself
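The scoring formula itself did not survive in this transcript, so the sketch below is a guess at one plausible form rather than the authors' definition. It assumes that each observed database signature contributes d/2 minus its Hamming distance to the corresponding query signature, where d/2 is the expected distance between two unrelated d-bit signatures. This form gives a score of 0 for images with no observed signature and, under the same assumption, m × d/2 when the database image is the query itself. All names are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two signatures stored as integers."""
    return bin(a ^ b).count("1")

def fuse_scores(query_sigs, matches, d):
    """Accumulate per-image scores over the retrieved miniBOF entries.

    matches: iterable of (image_id, j, db_signature), where j is the index
    of the miniBOF (0 <= j < m) that produced the hit.
    Assumed contribution of one hit: d/2 - hamming(query, database).
    """
    scores = {}
    for image_id, j, db_sig in matches:
        contribution = d / 2 - hamming(query_sigs[j], db_sig)
        scores[image_id] = scores.get(image_id, 0.0) + contribution
    return scores

# Toy setting: m = 2 miniBOFs, d = 8 bits per signature.
query_sigs = [0b10101010, 0b11110000]
matches = [
    (7, 0, 0b10101010), (7, 1, 0b11110000),   # image 7 is identical to the query
    (3, 0, 0b10101000),                        # image 3 matches one miniBOF, 1 bit off
]
scores = fuse_scores(query_sigs, matches, d=8)
```

An image that never appears in the retrieved lists has no entry in `scores`, which corresponds to the score of 0 mentioned above.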
![Page 23: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/23.jpg)
Fusion
![Page 24: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/24.jpg)
![Page 25: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/25.jpg)
Dataset
• Two annotated datasets
– INRIA Holidays dataset [4]
– University of Kentucky recognition benchmark [9]
• Distractor dataset
– One million images downloaded from Flickr (Flickr1M)
• Learning parameters
– Flickr1M*
![Page 26: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/26.jpg)
Detail
• Descriptor extraction
– Resize to a maximum of 786432 pixels
– Perform a slight intensity normalization
– SIFT
• Evaluation
– Recall@N
– mAP
– Memory
– Image hits
• Parameters
# Using a value of nz between 8 and 12 provides the best accuracy for vocabulary sizes ranging from 1k to 20k.
![Page 27: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/27.jpg)
mAP
• Mean average precision
• Example:
– Two query images, A and B
– A has 4 duplicate images
– B has 5 duplicate images
– Retrieval ranks for A: 1, 2, 4, 7
– Retrieval ranks for B: 1, 3, 5 (two duplicates are not retrieved)
– Average precision A = (1/1 + 2/2 + 3/4 + 4/7) / 4 = 0.83
– Average precision B = (1/1 + 2/3 + 3/5 + 0 + 0) / 5 = 0.45
– mAP = (0.83 + 0.45) / 2 = 0.64
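The worked example can be verified with a short script. The function name `average_precision` is hypothetical; duplicates that are never retrieved contribute 0 to the sum but still count in the denominator, which is why B is divided by 5.

```python
def average_precision(relevant_ranks, num_relevant):
    """Average precision from the 1-based ranks at which relevant images appear.

    num_relevant is the total number of relevant images, retrieved or not.
    """
    ranks = sorted(relevant_ranks)
    return sum((i + 1) / r for i, r in enumerate(ranks)) / num_relevant

ap_a = average_precision([1, 2, 4, 7], num_relevant=4)
ap_b = average_precision([1, 3, 5], num_relevant=5)
m_ap = (ap_a + ap_b) / 2
print(f"{ap_a:.2f} {ap_b:.2f} {m_ap:.2f}")  # 0.83 0.45 0.64
```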
![Page 28: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/28.jpg)
Table 1(Holidays)
# The number of bytes used per inverted list entry is 4 bytes for binary BOF & 5 bytes for BOF
![Page 29: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/29.jpg)
Table 2(Kentucky)
![Page 30: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/30.jpg)
Table 3(Holidays+Flickr1M)
![Page 31: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/31.jpg)
Figure(Holidays+Flickr1M)
# Our approach requires 160 MB for m = 8 and the query is performed in 132 ms, compared with 8 GB and 3 s, respectively, for BOF.
![Page 32: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/32.jpg)
Sample
![Page 33: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/33.jpg)
![Page 34: Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.](https://reader033.fdocuments.us/reader033/viewer/2022051415/56649d2c5503460f94a0266d/html5/thumbnails/34.jpg)
Conclusion
• This paper has introduced a way of packing BOFs: miniBOFs
– An efficient indexing structure for rapid access and an expected distance criterion for the fusion of the scores
– Reduces memory usage
– Reduces the quantity of memory scanned (hits)
– Reduces query time