Efficient Image Search and Retrieval using Compact Binary Codes
-
Efficient Image Search and Retrieval using Compact Binary Codes
Rob Fergus (NYU)
Antonio Torralba (MIT)
Yair Weiss (Hebrew U.)
-
Large scale image search
Internet contains many billions of images
How can we search them, based on visual content?
The Challenge:
Need a way of measuring similarity between images
Needs to scale to the Internet
-
Existing approaches to Content-Based Image Retrieval
Focus on scaling rather than understanding the image
Variety of simple/hand-designed cues:
Color and/or texture histograms, shape, PCA, etc.
Various distance metrics
Earth Mover's Distance (Rubner et al. 98)
Most recognition approaches are slow (~1 sec/image)
-
Our Approach
Learn the metric from training data
Use compact binary codes for speed
DO BOTH TOGETHER
-
Large scale image/video search
Representation must fit in memory (disk too slow)
Facebook has ~10 billion images (10^10)
PC has ~10 Gbytes of memory (10^11 bits)
Budget of 10^1 bits/image
YouTube has ~a trillion video frames (10^12)
Big cluster of PCs has ~10 Tbytes (10^14 bits)
Budget of 10^2 bits/frame
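The budgets on this slide follow from dividing the available memory by the number of items. A minimal sketch of that arithmetic, assuming decimal units (1 Gbyte = 10^9 bytes) and 8 bits per byte; the function name is illustrative only:

```python
def bits_per_item(memory_bytes, n_items):
    """Memory budget per item, in bits, if every item must stay in RAM."""
    return memory_bytes * 8 / n_items

# ~10 Gbytes of memory shared by ~10 billion images
pc_budget = bits_per_item(10 * 10**9, 10**10)

# ~10 Tbytes of memory shared by ~a trillion video frames
cluster_budget = bits_per_item(10 * 10**12, 10**12)

print(pc_budget, cluster_budget)  # 8.0 80.0
```

The exact values are 8 and 80 bits, which the slide rounds to the nearest power of ten (10^1 and 10^2).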
-
Binary codes for images
Want images with similar content to have similar binary codes
Use Hamming distance between codes
Number of bit flips
E.g.:
Ham_Dist(10001010,10001110)=1
Ham_Dist(10001010,11101110)=3
Semantic Hashing [Salakhutdinov & Hinton, 2007]
Text documents
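The two Ham_Dist examples can be checked with a few lines of code. This is a sketch that stores each code as a Python integer; the name `hamming_distance` is chosen here for illustration and is not from the slides:

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    # XOR leaves a 1 exactly where the codes disagree; count those 1s.
    return bin(a ^ b).count("1")

d1 = hamming_distance(0b10001010, 0b10001110)
d2 = hamming_distance(0b10001010, 0b11101110)
print(d1, d2)  # 1 3
```

XOR followed by a bit count is the reason Hamming distance is cheap: on most hardware it amounts to a couple of instructions per machine word.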
-
Semantic Hashing
[Salakhutdinov & Hinton, 2007] for text documents
Quite different to a (conventional) randomizing hash
[Figure labels: Query Image, Semantic Hash Function, Binary code, Query address, Address Space, Images in database, Semantically similar images]
-
Semantic Hashing
Each image code is a memory address
Find neighbors by exploring Hamming ball around query address
[Figure labels: Address Space, Query address, Images in database]
Lookup time is independent of # of data points
Depends on radius of ball & length of code: (Code length) choose (Radius)
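The Hamming ball lookup can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the database is modelled as a plain dictionary from integer address to a list of image names, and `hamming_ball`, `lookup`, and the example entries are hypothetical.

```python
from itertools import combinations
from math import comb

def hamming_ball(code, n_bits, radius):
    """Yield every n_bits-wide address within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            address = code
            for p in positions:
                address ^= 1 << p  # flip bit p
            yield address

def lookup(table, code, n_bits, radius):
    """Collect the images stored at any address inside the Hamming ball."""
    results = []
    for address in hamming_ball(code, n_bits, radius):
        results.extend(table.get(address, []))
    return results

# Hypothetical 8-bit database: address -> images stored at that address
table = {
    0b10001010: ["img_a"],
    0b10001110: ["img_b"],  # 1 bit away from img_a
    0b11101110: ["img_c"],  # 3 bits away from img_a
}
neighbors = lookup(table, 0b10001010, n_bits=8, radius=1)

# Addresses probed: sum over r of (n_bits choose r), whatever the table size
n_probed = sum(comb(8, r) for r in range(2))
```

The number of probes depends only on the code length and the radius, which is why the lookup time does not grow with the number of images. It does grow quickly with the radius, so short codes and small radii are what make the scheme practical.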
-
Code requirements
Similar images should have similar codes
Very compact (
-
Input Image representation: Gist vectors
Pixels not a convenient representation
Use Gist descriptor instead (Oliva & Torralba, IJCV 2001)
512 dimensions/image (real-valued, i.e. 512 x 32 = 16,384 bits)
L2 distance btw. Gist vectors is not a bad substitute for human perceptual distance
NO COLOR INFORMATION
-
1. Locality Sensitive Hashing
Gionis, A. & Indyk, P. & Motwani, R. (1999)
Take random projections of data
Quantize each projection with few bits
No learning involved
[Figure labels: Gist descriptor, bits 1 0 1]
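A minimal sketch of the random-projection idea, assuming Gaussian projection directions and one bit per projection obtained by thresholding at zero (the slide only says "few bits", so the one-bit choice is an assumption). The function names and the random stand-in for a Gist vector are illustrative:

```python
import random

def make_projections(n_bits, dim, seed=0):
    """Draw one random Gaussian direction per output bit (no learning)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_code(x, projections):
    """Project x onto each direction and keep the sign as one bit."""
    code = 0
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code

projections = make_projections(n_bits=16, dim=512)

# Stand-in for a 512-dimensional Gist descriptor (random values, not real data)
rng = random.Random(1)
gist = [rng.gauss(0.0, 1.0) for _ in range(512)]
code = lsh_code(gist, projections)
```

Because the directions are random rather than learned, nearby vectors tend to agree on most bits only when enough bits are used. That is the baseline the learned codes later in the talk are compared against.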
-
2. Boosting
Modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003]
Positive examples are pairs of similar images
Negative examples are pairs of unrelated images
Learn threshold & dimension for each bit (weak classifier)
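Once a (dimension, threshold) pair has been learned for each bit, encoding an image is a simple comparison per bit. The sketch below shows only that encoding step, not the boosting procedure that selects the pairs; the stump values and feature vectors are made up for illustration:

```python
def stump_code(x, stumps):
    """Encode x with one weak classifier per bit.

    stumps: list of (dimension, threshold) pairs, one pair per output bit.
    """
    return [1 if x[d] > t else 0 for d, t in stumps]

# Hypothetical learned stumps for a 3-bit code over 4-dimensional inputs
stumps = [(0, 0.5), (3, 0.1), (2, 0.7)]

a = stump_code([0.9, 0.2, 0.8, 0.05], stumps)
b = stump_code([0.8, 0.9, 0.75, 0.0], stumps)  # intended to resemble a
c = stump_code([0.1, 0.2, 0.3, 0.4], stumps)   # intended to differ from a

dist_ab = sum(p != q for p, q in zip(a, b))
dist_ac = sum(p != q for p, q in zip(a, c))
```

Training would pick each pair so that positive (similar) image pairs tend to fall on the same side of the threshold and negative pairs on opposite sides.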
-
3. Restricted Boltzmann Machine (RBM)
Type of Deep Belief Network
Hinton & Salakhutdinov, Science 2006
Single RBM layer
Attempts to reconstruct input at visible layer from activation of hidden layer
[Figure: single RBM layer with weights W]
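The reconstruction idea can be sketched as one up-down pass through a single layer. This is a simplified illustration under several assumptions: binary visible units, logistic activations, probabilities used in place of sampled states, and small random weights standing in for trained ones. All names are illustrative.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, W, b_hid):
    """Upward pass: activation probability of each hidden unit.

    W[i][j] connects visible unit i to hidden unit j.
    """
    return [
        sigmoid(b_hid[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b_hid))
    ]

def reconstruct(h, W, b_vis):
    """Downward pass: reconstruct the visible layer using the same weights."""
    return [
        sigmoid(b_vis[i] + sum(h[j] * W[i][j] for j in range(len(h))))
        for i in range(len(b_vis))
    ]

rng = random.Random(0)
n_vis, n_hid = 6, 3
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]
b_vis = [0.0] * n_vis
b_hid = [0.0] * n_hid

v = [1, 0, 1, 1, 0, 0]
h = hidden_probs(v, W, b_hid)
v_recon = reconstruct(h, W, b_vis)
recon_error = sum((a - b) ** 2 for a, b in zip(v, v_recon))
```

Training adjusts W and the biases so that the reconstruction error shrinks over the training set. With the untrained weights used here the reconstruction is close to 0.5 everywhere.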
-
Multi-Layer RBM: non-linear dimensionality reduction
Input Gist vector (512 dimensions)
Layer 1: 512 -> 512 (weights w1)
Layer 2: 512 -> 256 (weights w2)
Layer 3: 256 -> N (weights w3)
Output binary code (N dimensions)
Linear units at first layer
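The 512 -> 512 -> 256 -> N stack can be sketched as a plain feed-forward pass. Assumptions in this sketch: logistic hidden units at every layer, random weights in place of trained ones, N = 32, and a fixed 0.5 threshold for the final bits. None of these specifics come from the slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Small random weights stand in for trained ones (assumption)."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Propagate a real-valued input through each layer in turn."""
    activation = x
    for weights, biases in layers:
        activation = [
            sigmoid(b + sum(w * a for w, a in zip(row, activation)))
            for row, b in zip(weights, biases)
        ]
    return activation

N_BITS = 32  # hypothetical code length
rng = random.Random(0)
layers = [
    init_layer(512, 512, rng),     # w1
    init_layer(512, 256, rng),     # w2
    init_layer(256, N_BITS, rng),  # w3
]

gist = [rng.random() for _ in range(512)]  # stand-in for a Gist vector
probabilities = forward(gist, layers)
code = [1 if p > 0.5 else 0 for p in probabilities]
```

Each layer is a matrix multiply followed by a non-linearity, so encoding a query costs a few hundred thousand multiply-adds regardless of database size.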
-
Training RBM models
1st Phase: Pre-training
Unsupervised
Can use unlabeled data (unlimited quantity)
Learn parameters greedily per layer
Gets them to right ballpark
2nd Phase: Fine-tuning
Supervised
Requires labeled data (limited quantity)
Back propagate gradients of chosen error function
Moves parameters to local minimum
-
Greedy pre-training (Unsupervised)
Input Gist vector (512 real dimensions)
Layer 1: 512 -> 512 (weights w1)
-
Greedy pre-training (Unsupervised)
Activations of hidden units from layer 1 (512 binary dimensions)
Layer 2: 512 -> 256 (weights w2)
-
Greedy pre-training (Unsupervised)
Activations of hidden units from layer 2 (256 binary dimensions)
Layer 3: 256 -> N (weights w3)
-
Fine-tuning: back-propagation of Neighborhood Components Analysis objective
Input Gist vector (512 real dimensions)
Layer 1: 512 -> 512 (weights w1)
Layer 2: 512 -> 256 (weights w2)
Layer 3: 256 -> N (weights w3)
Output binary code (N dimensions)
-
Neighborhood Components Analysis
Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004
Tries to preserve neighborhood structure of input space
Assumes this structure is given (will explain later)
Toy example with 2 classes & N=2 units at top of network:
Points in output space (coordinate is activation probability of unit)
-
Neighborhood Components Analysis
Adjust network parameters (weights and biases) to move:
Points of SAME class closer
Points of DIFFERENT class away
-
Neighborhood Components Analysis
Adjust network parameters (weights and biases) to move:
Points of SAME class closer
Points of DIFFERENT class away
Points close in input space (Gist) will be close in output code space
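The quantity being maximized can be sketched with the standard NCA soft-neighbor formulation: each point i picks point j as its neighbor with probability proportional to exp(-||y_i - y_j||^2), and the objective is the expected number of points that pick a neighbor of their own class. Whether the talk used exactly this form is an assumption, and the toy points below are invented to mirror the 2-class, N=2 example.

```python
import math

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nca_objective(points, labels):
    """Expected number of points whose soft-selected neighbor shares their class."""
    total = 0.0
    n = len(points)
    for i in range(n):
        # Unnormalized probability of picking each other point as neighbor
        weights = [
            0.0 if k == i else math.exp(-squared_distance(points[i], points[k]))
            for k in range(n)
        ]
        same_class = sum(w for w, lab in zip(weights, labels) if lab == labels[i])
        total += same_class / sum(weights)
    return total

# Toy output-space points (two units, values in [0, 1])
points = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (0.8, 0.9)]

clustered = nca_objective(points, [0, 0, 1, 1])  # classes match the clusters
mixed = nca_objective(points, [0, 1, 0, 1])      # classes cut across clusters
```

Fine-tuning back-propagates the gradient of this objective through the network, which pulls same-class points together and pushes different-class points apart in the output space.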
-
Simple Binarization Strategy
Set threshold - e.g. use median
Deliberately add noise
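A minimal sketch of median thresholding, assuming one threshold per output unit estimated from a set of training activations. The helper names and the activation values are illustrative:

```python
from statistics import median

def fit_thresholds(activations):
    """One threshold per output unit: the median activation over the training set."""
    n_units = len(activations[0])
    return [median(row[j] for row in activations) for j in range(n_units)]

def binarize(activation, thresholds):
    """Turn real-valued unit activations into bits."""
    return [1 if a > t else 0 for a, t in zip(activation, thresholds)]

# Hypothetical activation probabilities for 4 training images, 2 output units
train_activations = [
    [0.1, 0.9],
    [0.4, 0.8],
    [0.6, 0.2],
    [0.9, 0.3],
]
thresholds = fit_thresholds(train_activations)
codes = [binarize(a, thresholds) for a in train_activations]
```

Using the median means each bit is on for about half of the training images, so no bit is wasted by being almost always 0 or almost always 1.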
-
Overall Query Scheme
Query Image -> Compute Gist -> Gist descriptor -> RBM -> Binary code -> Semantic Hash -> Retrieved images
-
Retrieval Experiments
-
Test set 1: LabelMe
22,000 images (20,000 train | 2,000 test)
Ground truth segmentations for all
Can define ground truth distance btw. images using these segmentations
-
Defining ground truth
Boosting and NCA back-propagation require ground truth distance between images
Define this using labeled images from LabelMe
-
Defining ground truth
Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)
-
Defining ground truth
Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)
Varying spatial resolution to capture approximate spatial correspondence
-
Examples of LabelMe retrieval
12 closest neighbors under different distance metrics
-
LabelMe Retrieval
[Plot: % of 50 true neighbors in retrieval set vs. size of retrieval set (0 to 20,000)]
-
LabelMe Retrieval
[Plot: % of 50 true neighbors in retrieval set vs. size of retrieval set (0 to 20,000)]
[Plot: % of 50 true neighbors in first 500 retrieved vs. number of bits]
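The quantity on the vertical axis of these plots can be computed as below. This is a sketch assuming each method returns a ranked list of image ids and the true neighbors are given as a list of ids. The function name and example ids are hypothetical, and the example uses 4 true neighbors rather than 50 to stay readable.

```python
def percent_true_neighbors_retrieved(true_neighbors, ranked_results, retrieval_size):
    """Percentage of the true neighbors found among the first retrieval_size results."""
    retrieved = set(ranked_results[:retrieval_size])
    found = sum(1 for t in true_neighbors if t in retrieved)
    return 100.0 * found / len(true_neighbors)

true_neighbors = [3, 7, 9, 12]              # ids of the ground-truth neighbors
ranked_results = [7, 1, 3, 5, 12, 9, 4, 8]  # ids ranked by some code distance

at_3 = percent_true_neighbors_retrieved(true_neighbors, ranked_results, 3)
at_6 = percent_true_neighbors_retrieved(true_neighbors, ranked_results, 6)
```

Sweeping the retrieval size for a fixed code length gives one curve of the first plot; fixing the retrieval size at 500 and sweeping the number of bits gives the second.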
-
Test set 2: Web images
12.9 million images
Collected from Internet
No labels, so use Euclidean distance between Gist vectors as ground truth distance
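With no labels available, the true neighbors of a query are simply its nearest Gist vectors under Euclidean distance. A brute-force sketch, using tiny 2-dimensional stand-ins for the 512-dimensional Gist vectors; the function name and data are illustrative:

```python
import math

def true_neighbors(query, database, k):
    """Indices of the k database vectors closest to the query in Euclidean distance."""
    order = sorted(range(len(database)), key=lambda i: math.dist(query, database[i]))
    return order[:k]

# Toy stand-ins for Gist vectors
database = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.5, 0.0]]
query = [0.0, 0.1]

nearest = true_neighbors(query, database, k=2)
```

This exhaustive scan is what the binary codes are meant to avoid at query time; here it only serves to define the ground truth that the codes are scored against.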
-
Web images retrieval
[Plot: % of 50 true neighbors in retrieval set vs. size of retrieval set]
-
Web images retrieval
[Two plots: % of 50 true neighbors in retrieval set vs. size of retrieval set]
-
Examples of Web retrieval
12 neighbors using different distance metrics
-
Retrieval Timings
-
Summary
Explored various approaches to learning binary codes for hashing-based retrieval
Very quick with performance comparable to complex descriptors
More recent work on binarization
Spectral Hashing (Weiss, Torralba, Fergus NIPS 2009)