CS598 Visual Information Retrieval
Transcript of CS598 Visual Information Retrieval
![Page 1: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/1.jpg)
CS598 VISUAL INFORMATION RETRIEVAL. Lecture VIII: LSH Recap and Case Study on Face Retrieval
![Page 2: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/2.jpg)
LECTURE VIII, Part I: Locality Sensitive Hashing Recap
Slides credit: Y. Gong
![Page 3: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/3.jpg)
THE PROBLEM
Large-scale image search: given a query image, we want to search a large database to find similar images, e.g., search the Internet for similar images.
We need fast turn-around time and accurate search results.
![Page 4: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/4.jpg)
LARGE SCALE IMAGE SEARCH Find similar images in a large database
Kristen Grauman et al
![Page 5: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/5.jpg)
INTERNET LARGE SCALE IMAGE SEARCH
Internet contains billions of images
The challenge:
– Need a way of measuring similarity between images (distance metric learning)
– Needs to scale to the Internet (how?)
Search the internet
![Page 6: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/6.jpg)
LARGE SCALE IMAGE SEARCH
Representation must fit in memory (disk too slow)
Facebook has ~10 billion images (10^10)
A PC has ~10 GB of memory (10^11 bits)
Budget of ~10^1 bits/image
Fergus et al
![Page 7: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/7.jpg)
REQUIREMENTS FOR IMAGE SEARCH
Search must be fast, accurate, and scalable.
Fast: kd-trees (tree data structure to improve search speed); Locality Sensitive Hashing (hash tables to improve search speed); small codes (binary small code, e.g., 010101101)
Scalable: requires very little memory
Accurate: learned distance metric
![Page 8: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/8.jpg)
SUMMARIZATION OF EXISTING ALGORITHMS
Tree-based indexing: spatial partitions (e.g., kd-trees) and recursive hyperplane decomposition provide an efficient means to search low-dimensional vector data exactly.
Hashing: locality-sensitive hashing offers sub-linear time search by hashing highly similar examples together.
Binary small code: compact binary code, with a few hundred bits per image.
![Page 9: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/9.jpg)
TREE-BASED STRUCTURE: Kd-tree
The kd-tree is a binary tree in which every node is a k-dimensional point.
Kd-trees are known to break down in practice for high-dimensional data, and cannot provide better than a worst-case linear query-time guarantee.
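The kd-tree search just described can be sketched in a few lines of plain Python; `build_kdtree` and `nearest` are hypothetical helper names, and the backtracking step across the splitting plane is exactly what degrades toward a linear scan in high dimensions:

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree: split on axis = depth mod k at the median."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
        "axis": axis,
    }

def nearest(node, query, best=None):
    """Depth-first NN search with backtracking across the splitting plane."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    dist = math.dist(point, query)
    if best is None or dist < best[1]:
        best = (point, dist)
    diff = query[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only cross the plane if a closer point could lie on the other side.
    if abs(diff) < best[1]:
        best = nearest(far, query, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
p, d = nearest(tree, (9, 2))
```

The pruning test `abs(diff) < best[1]` is the source of the worst case: when it rarely prunes (typical in high dimensions), every node is visited.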
![Page 10: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/10.jpg)
LOCALITY SENSITIVE HASHING
Hashing methods for fast Nearest Neighbor (NN) search
Sub-linear time search by hashing highly similar examples together in a hash table
Take random projections of data
Quantize each projection with few bits
Strong theoretical guarantees
More detail later
![Page 11: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/11.jpg)
BINARY SMALL CODE
Binary? Only uses binary code (0/1), e.g., 1110101010101010
Small? A small number of bits to code each image, e.g., 32 or 256 bits
How could this kind of small code improve image search speed? More detail later.
![Page 12: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/12.jpg)
DETAIL OF THESE ALGORITHMS
1. Locality sensitive hashing: basic LSH; LSH for learned metric
2. Small binary code: basic small code idea; spectral hashing
![Page 13: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/13.jpg)
1. LOCALITY SENSITIVE HASHING
The basic idea behind LSH is to project the data into a low-dimensional binary (Hamming) space, i.e., each data point is mapped to a b-bit vector, called the hash key. Each hash function h must satisfy the locality-sensitive hashing property:
Pr[h(x) = h(y)] = sim(x, y),
where sim(x, y) ∈ [0, 1] is the similarity function of interest.
M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. In SOCG, 2004.
![Page 14: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/14.jpg)
LSH FUNCTIONS FOR DOT PRODUCTS
The LSH hash function that produces the hash code is
h_r(x) = 1 if r·x ≥ 0, and 0 otherwise,
where the random vector r defines a hyperplane separating the space.
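A sketch of this hash family on toy Gaussian data, assuming NumPy; `lsh_keys` is a made-up name, with one random hyperplane per bit:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_keys(X, R):
    """b-bit hash keys: bit r is 1 iff r.x >= 0, one random hyperplane per bit."""
    return (X @ R.T >= 0).astype(np.uint8)

d, b = 128, 16                        # feature dimension, bits per key
R = rng.standard_normal((b, d))       # random projection directions
X = rng.standard_normal((1000, d))    # toy database

codes = lsh_keys(X, R)                # shape (1000, 16), entries in {0, 1}

# Locality: a tiny perturbation rarely crosses a hyperplane, so nearby
# points agree on most of their hash bits.
x = X[0]
x_near = x + 0.01 * rng.standard_normal(d)
agree = (lsh_keys(x[None, :], R) == lsh_keys(x_near[None, :], R)).sum()
```

For this sign-of-dot-product family, the collision probability of a single bit is 1 − θ(x, y)/π, which is the locality-sensitive property stated above.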
![Page 15: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/15.jpg)
1. LOCALITY SENSITIVE HASHING
• Take random projections of the data (the feature vector)
• Quantize each projection with a few bits: each projection contributes a 0/1 bit
No learning involved.
Slide credit: Fergus et al.
![Page 16: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/16.jpg)
HOW TO SEARCH FROM A HASH TABLE?
A set of N data points Xi is hashed by the hash function h (built from random projections r1…rk) into a hash table. A new query Q (e.g., key 110111) is hashed by the same function, and only the images in Q's bucket, a small set (<< N), are returned as results.
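This lookup scheme can be sketched in plain Python, with made-up string keys echoing the slide's example codes:

```python
from collections import defaultdict

def build_table(keys):
    """Hash table: map each k-bit key to the ids of database items in that bucket."""
    table = defaultdict(list)
    for i, key in enumerate(keys):
        table[key].append(i)
    return table

def lookup(table, query_key):
    """Only the query's own bucket is scanned: a candidate set << N."""
    return table.get(query_key, [])

# Toy keys: items 0 and 2 collide in the same bucket.
keys = ["110111", "110101", "110111"]
table = build_table(keys)
candidates = lookup(table, "110111")
```

In practice several such tables are built (with independent random projections) and the candidate sets are unioned before exact re-ranking.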
![Page 17: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/17.jpg)
COULD WE IMPROVE LSH? Could we utilize a learned metric to improve LSH?
How can LSH be improved with a learned metric?
Assume we have already learned a distance metric A from domain knowledge.
The Mahalanobis distance (x − y)ᵀA(x − y) has better quality than simple metrics such as the Euclidean distance.
![Page 18: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/18.jpg)
HOW TO LEARN A DISTANCE METRIC? First assume we have a set of domain knowledge. Use the methods described in the last lecture to learn a distance metric A.
As discussed before, A is positive semi-definite, so we can define G by the decomposition A = GᵀG.
Then (x − y)ᵀA(x − y) = ||Gx − Gy||², so x ↦ Gx is a linear embedding function that embeds the data into a lower-dimensional space.
![Page 19: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/19.jpg)
LSH FUNCTIONS FOR LEARNED METRICS
Given the learned metric, G can be viewed as a linear parametric function, i.e., a linear embedding function for data x. Thus the LSH function can be: h_r(x) = 1 if r·(Gx) ≥ 0, and 0 otherwise.
The key idea is to first embed the data into a lower-dimensional space by G (data embedding) and then do LSH in that space.
P. Jain, B. Kulis, and K. Grauman. Fast Image Search for Learned Metrics. In CVPR, 2008.
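A small sketch of this embedding idea, assuming NumPy and a hypothetical PSD metric A factored as A = GᵀG via Cholesky (here G is square for simplicity; a low-rank factor would actually reduce dimension):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# Hypothetical learned metric A: any PSD matrix stands in for the learned one.
M = rng.standard_normal((d, d))
A = M.T @ M + np.eye(d)

# Factor A = G^T G, so Mahalanobis geometry becomes Euclidean after x -> Gx.
G = np.linalg.cholesky(A).T

def lsh_learned(x, r):
    """LSH under the learned metric: hash the embedded point Gx."""
    return int(r @ (G @ x) >= 0)

# Sanity check data: the embedding reproduces the Mahalanobis inner product.
x, y = rng.standard_normal(d), rng.standard_normal(d)
bit = lsh_learned(x, rng.standard_normal(d))
```

Because ⟨Gx, Gy⟩ = xᵀAy, ordinary random-hyperplane LSH applied after the embedding inherits its guarantees with respect to the learned metric.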
![Page 20: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/20.jpg)
SOME RESULTS FOR LSH
Caltech-101 dataset. Goal: exemplar-based object categorization. Given some exemplars, we want to categorize the whole dataset.
![Page 21: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/21.jpg)
RESULTS: OBJECT CATEGORIZATION
Caltech-101 database
ML = metric learning. Kristen Grauman et al.
![Page 22: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/22.jpg)
QUESTION: Is hashing fast enough?
Is sub-linear search time fast enough? The time for retrieving (1 + ε)-near neighbors is bounded by O(n^(1/(1+ε))).
Is it fast enough? Is it scalable enough (does it fit in the memory of a PC)?
![Page 23: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/23.jpg)
NO! Small binary codes can do better.
Cast an image to a compact binary code, with a few hundred bits per image.
Small codes make it possible to perform real-time searches over millions of Internet images using a single large PC.
Within 1 second! (0.146 sec for 80 million images)
80 million images: ~300 GB of raw data vs. ~120 MB of small codes
![Page 24: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/24.jpg)
BINARY SMALL CODE
First introduced in text search/retrieval: Salakhutdinov and Hinton [3] introduced it for text document retrieval.
Introduced to computer vision by Torralba et al. [4].
[3] R. Salakhutdinov and G. Hinton. "Semantic Hashing." International Journal of Approximate Reasoning, 2009.
[4] A. Torralba, R. Fergus, and Y. Weiss. "Small Codes and Large Databases for Recognition." In CVPR, 2008.
![Page 25: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/25.jpg)
SEMANTIC HASHING
A semantic hash function maps a query image to a binary code used as a query address; semantically similar images in the database fall at nearby addresses in the address space. This is quite different to a (conventional) randomizing hash.
Fergus et al
R. Salakhutdinov and G. Hinton. Semantic Hashing. International Journal of Approximate Reasoning, 2009.
![Page 26: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/26.jpg)
SEMANTIC HASHING
Similar points are mapped to similar small codes.
Then store these codes in memory and compute Hamming distances (very fast, carried out by hardware).
![Page 27: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/27.jpg)
OVERALL QUERY SCHEME
Query image → feature vector (~1 ms, in Matlab) → generate small binary code (<10 μs). The binary codes of database images are stored in memory; hardware computes the Hamming distance between the query code and each stored code, and similar images are retrieved in <1 ms.
Fergus et al
![Page 28: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/28.jpg)
SEARCH FRAMEWORK
Produce binary codes (e.g., 01010011010)
Store these binary codes in memory
Use hardware to compute the Hamming distance (very fast)
Sort the Hamming distances to get the final ranking
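The four steps above can be sketched in plain Python; the codes and names are made up for illustration:

```python
def hamming(a, b):
    """XOR marks every bit position where the codes differ; counting those bits
    gives the Hamming distance (hardware does this with one popcount)."""
    return bin(a ^ b).count("1")

# Step 1-2: toy database of 11-bit codes held in memory, keyed by image name.
codes = {"img1": 0b01010011010, "img2": 0b01010011000, "img3": 0b11111111111}
query = 0b01010011011

# Step 3-4: compute Hamming distances and sort to get the final ranking.
ranking = sorted(codes, key=lambda name: hamming(codes[name], query))
```

Because the codes are integers, the whole scan is XOR + popcount per item, which is why a linear sweep over tens of millions of codes fits in a fraction of a second.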
![Page 29: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/29.jpg)
HOW TO LEARN SMALL BINARY CODES
Simplest method (use the median)
LSH is already able to produce binary codes
Restricted Boltzmann Machines (RBM)
Optimal small binary codes by spectral hashing
![Page 30: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/30.jpg)
1. SIMPLE BINARIZATION STRATEGY
Set a threshold (unsupervised), e.g., use the median: values below the threshold map to 0, values above map to 1.
Fergus et al.
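A one-line version of this strategy, assuming NumPy and toy data; `binarize_median` is a made-up name:

```python
import numpy as np

rng = np.random.default_rng(3)

def binarize_median(P):
    """Threshold each projection at its column median: unsupervised, and each
    bit fires for exactly half the data, which balances the hash buckets."""
    return (P > np.median(P, axis=0)).astype(np.uint8)

P = rng.standard_normal((100, 8))   # 100 items, 8 real-valued projections
bits = binarize_median(P)
```

The median (rather than, say, zero) guarantees each bit carries maximal entropy over the database, so no bucket is starved.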
![Page 31: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/31.jpg)
2. LOCALITY SENSITIVE HASHING
LSH is ready to generate binary codes (unsupervised):
• Take random projections of the data (the feature vector)
• Quantize each projection with a few bits
No learning involved.
Fergus et al.
![Page 32: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/32.jpg)
3. RBM [3] TO GENERATE CODES
Not going into detail; see Salakhutdinov and Hinton. Uses a deep neural network to train small codes; a supervised method.
R. R. Salakhutdinov and G. E. Hinton. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure. In AISTATS, 2007.
![Page 33: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/33.jpg)
LABELME RETRIEVAL
LabelMe is a large database with human-annotated images.
The goal of this experiment: first generate small codes, use Hamming distance to search for similar images, then sort the results to produce the final ranking.
Gist descriptor: ground truth.
![Page 34: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/34.jpg)
EXAMPLES OF LABELME RETRIEVAL
12 closest neighbors under different distance metrics
![Page 35: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/35.jpg)
TEST SET 2: WEB IMAGES
12.9 million images from the Tiny Images dataset, collected from the Internet.
No labels, so use the Euclidean distance between Gist vectors as the ground-truth distance.
Note: the Gist descriptor is a feature widely used in computer vision for scene representation.
![Page 36: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/36.jpg)
EXAMPLES OF WEB RETRIEVAL 12 neighbors using different distance metrics
Fergus et al
![Page 37: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/37.jpg)
WEB IMAGES RETRIEVAL
Observation: longer codes (more bits) give better performance
![Page 38: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/38.jpg)
RETRIEVAL TIMINGS
Fergus et al
![Page 39: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/39.jpg)
SUMMARY
Image search should be fast, accurate, and scalable.
Methods: tree-based methods; Locality Sensitive Hashing; binary small codes (state of the art)
![Page 40: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/40.jpg)
QUESTIONS?
![Page 41: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/41.jpg)
LECTURE VIII, Part II: Face Retrieval
![Page 42: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/42.jpg)
Challenges
• Poses, lighting, and facial expressions challenge recognition
• Efficiently matching against a large gallery dataset is nontrivial
• A large number of subjects matters
Gallery faces: which gallery face matches the query?
![Page 43: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/43.jpg)
Contextual face recognition
• Design a robust face similarity metric
– Leverage local feature context to deal with visual variations
• Enable large-scale face recognition tasks
– Design an efficient and scalable matching metric
– Employ image-level context and active learning
– Utilize social network context to scope recognition
– Design social network priors to improve recognition
![Page 44: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/44.jpg)
Preprocessing
Face detection: boosted cascade [Viola-Jones '01]
Eye detection: neural network
Face alignment: similarity transform to a canonical frame
Illumination normalization: self-quotient image [Wang et al. '04]
Input to our algorithm
![Page 45: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/45.jpg)
Feature extraction
Gaussian pyramid: dense sampling in scale
Patches: 8×8, extracted on a regular grid at each scale
Filtering: convolution with 4 oriented fourth-derivative-of-Gaussian quadrature pairs
Spatial aggregation: log-polar arrangement of 25 Gaussian-weighted regions (DAISY shape)
One feature descriptor per patch: { f1 … fn }, n ≈ 500, fi ∈ R^400
![Page 46: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/46.jpg)
Face representation & matching
Adjoin spatial information: { f1 … fn } → { [f1 x1 y1], …, [fn xn yn] } = { g1, g2 … gn }
Quantize by a forest of randomized trees T1 … Tk in Feature Space × Image Space; each node tests ⟨w, ·⟩ against a threshold τ.
Each feature gi contributes to k bins of the combined histogram vector h.
IDF-weighted L1 norm: wi = log( #{ training h : h(i) > 0 } / #training ), d(h, h') = Σi wi |h(i) − h'(i)|
![Page 47: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/47.jpg)
Randomized projection trees
Linear decision at each node: ⟨w, [f x y]'⟩ against τ, with w a random projection: w ~ N(0, Σ).
τ = median{ ⟨w, [f x y]'⟩ }: normalizes the spatial and feature parts; τ can also be randomized.
Why random projections?
• Simple
• Interact well with high-dimensional sparse data (feature descriptors!)
• Generalize trees previously used for vision tasks (kd-trees, Extremely Randomized Forests) [Dasgupta & Freund, Wakin et al., ...]
Additional data-dependence can be introduced through multiple trials: select a (w, τ) pair that minimizes a cost function (e.g., MSE, conditional entropy) [Geurts, Lepetit & Fua, ...]
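One such node can be sketched as follows, assuming NumPy; `make_node` and `route` are hypothetical names, and Σ is taken as the identity for simplicity:

```python
import numpy as np

rng = np.random.default_rng(4)

def make_node(G):
    """One RP-tree node: draw a random direction w ~ N(0, I) and set tau to the
    median projection, so the node splits the data roughly in half."""
    w = rng.standard_normal(G.shape[1])
    tau = np.median(G @ w)
    return w, tau

def route(g, w, tau):
    """Linear decision at the node: go right iff <w, g> > tau."""
    return int(g @ w > tau)

G = rng.standard_normal((100, 10))   # 100 toy augmented descriptors [f x y]
w, tau = make_node(G)
sides = [route(g, w, tau) for g in G]
```

With the median threshold, an even-sized sample splits exactly in half, so the tree stays balanced without any learning.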
![Page 48: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/48.jpg)
Implicit elastic matching
Even with good descriptors, dense correspondence between { g1 … gn } and { g'1 … g'n } is difficult.
Instead, quantize features such that corresponding features likely map to the same bin (gi and g'j → Bin 1, Bin 2, …).
The number of "soft matches" is given by comparing the histograms of joint quantizations h and h':
Similarity(I, I') = 1 − || h − h' ||_{w,1}
This enables efficient inverted-file indexing!
![Page 49: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/49.jpg)
(Figure: a query face is matched against the gallery faces and recognized as "Ross".)
![Page 50: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/50.jpg)
Exploring the optimal settings
• A subset of PIE for exploration (11,554 faces / 68 users)
– 30 faces per person are used for inducing the trees
• Three settings to explore: histogram distance metric, tree depth, number of trees

| Distance metric | Reco. rate |
|---|---|
| L2 un-weighted | 86.3% |
| L2 IDF-weighted | 86.7% |
| L1 un-weighted | 89.3% |
| L1 IDF-weighted | 89.4% |

| Forest size | 1 | 5 | 10 | 15 |
|---|---|---|---|---|
| Reco. rate | 89.4% | 92.4% | 93.1% | 93.6% |
![Page 51: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/51.jpg)
Recognition accuracy (1)

| | ORL (40 subjects, unconstrained) | Ext. Yale B (38 subjects, extreme illumination) | PIE (68 subjects, pose, illumination) | Multi-PIE (250 subjects, pose, illumination, expression, time) |
|---|---|---|---|---|
| Baseline (PCA) | 88.1% | 65.4% | 62.1% | 32.1% |
| LDA | 93.9% | 81.3% | 89.1% | 37.0% |
| LPP | 93.7% | 86.4% | 89.2% | 21.9% |
| This work | 96.5% | 91.4% | 94.3% | 67.6% |

Gallery faces: ORL: 5 faces/subject; YaleB: 20 faces/subject; PIE: 30 faces/subject; Multi-PIE: faces in the 1st session
![Page 52: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/52.jpg)
Recognition accuracy (2)

| | PIE→ORL (ORL→ORL) | ORL→PIE (PIE→PIE) | PIE→Multi-PIE (Multi-PIE→Multi-PIE) |
|---|---|---|---|
| Baseline (PCA) | 85.0% (88.1%) | 55.7% (62.1%) | 26.5% (32.6%) |
| LDA | 58.5% (93.9%) | 72.8% (89.1%) | 8.5% (37.0%) |
| LPP | 17.0% (93.7%) | 69.1% (89.2%) | 17.1% (21.9%) |
| This work | 92.5% (96.5%) | 89.7% (94.3%) | 67.2% (67.6%) |

The first dataset is used for inducing the forest; the forest is then applied to test on the second dataset.
![Page 64: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/64.jpg)
Scalability challenges
• Given N labeled faces, how do we choose at most M < N faces to form the gallery such that recognition accuracy is maximized?
• How do we handle recognizing a large number of subjects in a social network?
![Page 65: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/65.jpg)
An active learning framework
A discriminative model (GP + MRF):
– GP prior: P(Y|X) = N(0, K)
– Unary potential: ψ(y_i, t_i) = exp(−(y_i − t_i)² / (2σ²))
– Pairwise potentials on labels: ψ(t_i, t_j) and 1 − ψ(t_i, t_j)
Active learning: x* = argmax_{i∈U} H(t_i) = argmax_{i∈U} −Σ_{c∈C} q(t_i = c) log q(t_i = c)
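The selection rule of this kind of active learner (query the unlabeled sample whose label posterior has maximum entropy) can be sketched in plain Python; the posteriors here are toy values, not the GP+MRF output:

```python
import math

def entropy(q):
    """H(t) = -sum_c q(c) log q(c): uncertainty of a sample's label posterior."""
    return -sum(p * math.log(p) for p in q if p > 0)

def select_query(posteriors):
    """Active learning step: ask the user to label the most uncertain sample."""
    return max(range(len(posteriors)), key=lambda i: entropy(posteriors[i]))

# Toy label posteriors q(t_i = c) for three unlabeled samples over two classes.
posteriors = [[0.9, 0.1], [0.5, 0.5], [1.0, 0.0]]
pick = select_query(posteriors)
```

The uniform posterior [0.5, 0.5] has the highest entropy, so that sample is queried; a confident posterior like [1.0, 0.0] would never be worth a manual label.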
![Page 66: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/66.jpg)
Experiments
• 80-minute "Friends" video
• 6 characters
• 1282 tracks, 16,720 faces
• 500 samples are used for testing
![Page 67: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/67.jpg)
Social network scope and priors
• Scope the recognition by the social network
• Build a prior probability over whom Rachel would like to tag
![Page 68: CS598Visual Information retrieval](https://reader035.fdocuments.us/reader035/viewer/2022062222/5681637d550346895dd45d0b/html5/thumbnails/68.jpg)
Effects of social priors
Perfect recognition
Recognition w/ Priors
Recognition w/o Priors