The Bag of Words Torn Open: Instance Retrieval goes Deep
AI Ukraine 2016, Kharkiv, Ukraine
James Pritts
Center for Machine Perception
Czech Technical University in Prague
Who are we?
Filip Radenović, PhD candidate
James Pritts, PhD candidate
Jiří Matas, Professor
Ondřej Chum, Associate Professor
Giorgos Tolias, Post-Doctoral candidate
Goals
Introduce the Instance Retrieval Problem
Compare two ways to learn an image encoding
Bag-of-words (BoW) descriptor: ~1,000,000D vector
Convolutional Neural Network (CNN) descriptor: 512D vector
Demonstrate state-of-the-art retrieval performance
Part 1: The Instance Retrieval Task
Instance Retrieval Challenges
Significant viewpoint and/or scale change
Significant illumination change
Severe occlusions
Visually similar but different objects
Notional Instance Retrieval System
Off-line stage: learning
[Diagram labels: lots of images; image; encoding; descriptor database]
On-line stage: inference
[Diagram labels: query; encoding; matching against the descriptor database; ranking]
Part 2: The Bag of Words (BoW) representation
Bag of Words: Off-line stage
Quantization by K-Means
Initialize cluster centres
Find nearest cluster to each datapoint (slow) O(N k)
Re-compute cluster centres as centroids
Iterate
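The four steps above can be sketched in plain Python. This is a minimal illustration, not the code used for the talk: the function names, the random initialization, the fixed iteration count, and the toy 2D points are all assumptions. The brute-force assignment step is the O(N k) part.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centre(point, centres):
    """Index of the closest centre; O(k) per point, hence O(N k) per pass."""
    return min(range(len(centres)),
               key=lambda j: squared_distance(point, centres[j]))

def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize cluster centres with k randomly chosen datapoints.
    centres = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):  # Step 4: iterate.
        # Step 2: find the nearest cluster centre for each datapoint.
        assignment = [nearest_centre(p, centres) for p in points]
        # Step 3: re-compute each centre as the centroid of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return centres, assignment

# Toy data (hypothetical): two well-separated groups of 2D "descriptors".
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centres, assignment = kmeans(points, k=2)
```

For real vocabularies with around a million words, the exact assignment step is what makes this too slow, which motivates the approximate and hierarchical variants.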
Quantization by Approximate K-Means
Initialize cluster centres
Find approximate nearest cluster to each datapoint
Re-compute cluster centres as centroids
Iterate
+ fast O(N log k)
+ reasonable quantization
- can be inconsistent when approximate nearest-neighbour (ANN) search fails
Philbin, Chum, Isard, Sivic, and Zisserman: Object retrieval with large vocabularies and fast spatial matching. CVPR 2007
Quantization by Hierarchical K-means
+ fast O(N log k)
+ incremental construction
- not so good quantization
- often imbalanced
Nistér & Stewénius: Scalable recognition with a vocabulary tree. CVPR 2006
Bag-of-Words Image Representation
[Figure: images mapped to histograms over a visual vocabulary A, B, C, D, …; example histograms (1, 0, 0, 2) and (0, 3, 0, 1)]
An image is represented by the histogram of detected visual words
Term-frequency (tf): visual word D occurs twice in the image
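A minimal sketch of building such a histogram, assuming a hypothetical four-word vocabulary with 2D centres and brute-force quantization. The vocabulary, the descriptors, and the function names are made up for illustration. Each local descriptor is mapped to its nearest visual word and the occurrences are counted.

```python
from collections import Counter

def quantize(descriptor, vocabulary):
    """Label of the nearest visual word (brute-force search)."""
    return min(
        vocabulary,
        key=lambda w: sum((d - c) ** 2
                          for d, c in zip(descriptor, vocabulary[w])),
    )

def bow_histogram(descriptors, vocabulary):
    """Term frequencies in a fixed (sorted) visual-word order."""
    counts = Counter(quantize(d, vocabulary) for d in descriptors)
    return [counts[w] for w in sorted(vocabulary)]

# Hypothetical vocabulary: word label -> cluster centre.
vocabulary = {"A": (0.0, 0.0), "B": (0.0, 1.0),
              "C": (1.0, 0.0), "D": (1.0, 1.0)}
# Three local descriptors detected in one image.
descriptors = [(0.1, 0.1), (0.9, 1.1), (1.0, 0.8)]
histogram = bow_histogram(descriptors, vocabulary)  # one A, two D
```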
Bag of Words: On-line Stage
IN: q (BoW, geometries)
1. Inverted file: posting list per visual word
word: image IDs
1: 1, 5, 10, …, 7350125
2: 2, 7, 12, …, 7399121
3: 1, 4, 15, …, 7200190
…
16777216: 3, 7, 10, …, 7012245
2. Image ranking
score: image ID
0.87: 5
0.75: 1573
0.52: 11202
…
0.001: 32
Shortlist: top N images
3. Spatial verification
#inliers, zoom, image ID
247, 7x, 1573
105, 2x, 5
17, 37x, 11202
…
2, 17x, 75213
4. Re-ranked shortlist
5. Query expansion
[Figure: the visual words of the query are combined with those of verified images: query + image 1573 + image 45 + …]
OUT: R
BoW and Inverted File
[Figure: inverted file over database images 1 to 10. Each word of the visual vocabulary (A, B, C, D, …) stores a posting list of the images that contain it, for example (1, 3, 6, …), (2, 4, 10, …), (5, 6, 8, …), (6, 7, 7, …). The posting lists of the query visual words (D, B, G in the example) are traversed to score the images.]
Efficient (fast)
Linear complexity (in # documents)
Can be interpreted as voting
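The voting interpretation can be sketched as follows. This is an illustrative toy, not the system from the talk: the function names and the three-image database are assumptions, and the histograms assume the word order A, B, C, D. Only the posting lists of the words present in the query are visited, and each entry adds a vote to its image.

```python
from collections import defaultdict

def build_inverted_file(database):
    """Map each visual word to a posting list of (image_id, tf) pairs."""
    inverted = defaultdict(list)
    for image_id, histogram in database.items():
        for word, tf in histogram.items():
            inverted[word].append((image_id, tf))
    return inverted

def vote(query_histogram, inverted):
    """Accumulate tf_query * tf_image votes; equals a sparse dot product."""
    scores = defaultdict(float)
    for word, query_tf in query_histogram.items():
        for image_id, tf in inverted.get(word, []):
            scores[image_id] += query_tf * tf
    # Highest score first; ties broken by image id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Sparse histograms (word -> tf) for three hypothetical database images.
database = {1: {"A": 1, "D": 2}, 2: {"B": 2, "D": 1}, 3: {"A": 1}}
inverted = build_inverted_file(database)
ranking = vote({"B": 3, "D": 1}, inverted)  # image 3 shares no word
```

Images that share no visual word with the query are never touched, which is where the speed comes from when histograms are very sparse.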
Efficient Scoring
bag of words representation (up to 1,000,000 D)
[Figure: scoring as a matrix-vector product over visual words A, B, C, D]
Database: α1 (1 0 0 2), α2 (0 2 0 1), α3 (1 0 0 0), …
Query: αq (0 3 0 1)
Database • Query = Score: s1, s2, s3, …
Word Weighting
Words (in text) common to many documents are less informative: ‘the’, ‘and’, ‘or’, ‘in’, …
$\mathrm{idf}_X = \log \dfrac{\#\,\text{documents}}{\#\,\text{docs containing } X}$
Images are represented by weighted histograms $\mathrm{tf}_X \cdot \mathrm{idf}_X$ (rather than just a histogram of $\mathrm{tf}_X$)
Words that are too frequent (virtually in every document) can be put on a stop list (ignored as if they were not in the document)
Baeza-Yates, Ribeiro-Neto. Modern Information Retrieval. ACM Press, 1999.
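A small sketch of the weighting, assuming sparse histograms stored as dictionaries. The function names and the three-image database are hypothetical. A word that occurs in every image gets idf = log(1) = 0 and so contributes nothing to the score, which is the same effect a stop list has.

```python
import math

def inverse_document_frequency(database):
    """idf_X = log(# documents / # documents containing X)."""
    n_documents = len(database)
    containing = {}
    for histogram in database.values():
        for word in histogram:
            containing[word] = containing.get(word, 0) + 1
    return {word: math.log(n_documents / n) for word, n in containing.items()}

def tf_idf(histogram, idf):
    """Weight each term frequency by the idf of its visual word."""
    return {word: tf * idf[word]
            for word, tf in histogram.items() if word in idf}

# Hypothetical database: word D appears in every image.
database = {1: {"A": 1, "D": 2}, 2: {"B": 2, "D": 1}, 3: {"A": 1, "D": 1}}
idf = inverse_document_frequency(database)
weighted = tf_idf(database[2], idf)
```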
Query Expansion
[Figure: query image → results → spatial verification → new query → new results]
Chum, Philbin, Sivic, Isard, Zisserman: Total Recall…, ICCV 2007
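One simple way to form the new query is to average the BoW vector of the original query with those of the spatially verified results. The sketch below assumes that averaging variant; the function name and the toy vectors are hypothetical, and spatial verification itself is not shown.

```python
def average_query_expansion(query, verified_results):
    """Average the query BoW with the BoW of spatially verified results.

    All vectors are sparse dictionaries: visual word -> weight.
    """
    vectors = [query] + list(verified_results)
    words = set().union(*vectors)  # every word seen in any vector
    return {w: sum(v.get(w, 0.0) for v in vectors) / len(vectors)
            for w in words}

query = {"B": 3.0, "D": 1.0}
# BoW vectors of results that passed spatial verification (hypothetical).
verified = [{"B": 2.0, "D": 1.0}, {"A": 1.0, "B": 1.0}]
expanded = average_query_expansion(query, verified)
```

The expanded query now contains word A, which the original query lacked, so images that were originally not retrieved can be reached when the expanded query is issued.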
Query Expansion: Step by Step
[Figure legend: query image; retrieved image; originally not retrieved]
The Bag of Words solution
Significant viewpoint and/or scale change: covariant local features, invariant descriptors
Significant illumination change: color-normalized feature descriptors
Severe occlusions: locality of the features, geometric verification
Visually similar but different objects: feature discriminability & geometric verification
** Encoding is learned, but the representation has many assumptions
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples
Filip Radenović, Giorgos Tolias, Ondřej Chum
Center for Machine Perception, CTU in Prague
ECCV 2016
[Figure: CNN … global max pooling & L2-norm]
CNN Image Retrieval: compact image descriptors; Nearest Neighbor search
CNN Learning (Fine-Tuning): start with a CNN trained for a different but similar task (reasonable parameters); re-train with data relevant to your task
Bag of Words: state-of-the-art retrieval performance; couples well with Structure-from-Motion (SfM)
Unsupervised training data generation: no human interaction
Hard Examples: hard positives, hard negatives
[Figure: CNN … global max pooling & L2-norm → image descriptor]
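The pooling step in the figure can be sketched in a few lines. This assumes the activations of the last convolutional layer are given as D feature maps, each a 2D list; the function name and the toy 2 x 2 maps are hypothetical. Global max pooling keeps one value per map, and L2 normalization scales the resulting D-dimensional vector to unit length.

```python
import math

def max_pooled_descriptor(feature_maps):
    """Global max pooling over each feature map, then L2 normalization.

    feature_maps: list of D maps, each a 2D list of activations.
    Returns a D-dimensional unit-length descriptor.
    """
    pooled = [max(max(row) for row in fmap) for fmap in feature_maps]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled

# Two toy feature maps (D = 2); a real network would give e.g. D = 512.
feature_maps = [
    [[0.0, 3.0], [1.0, 2.0]],
    [[4.0, 0.5], [0.0, 1.0]],
]
descriptor = max_pooled_descriptor(feature_maps)  # maxima 3 and 4, norm 5
```

Because the pooling is global, the descriptor length depends only on the number of feature maps and not on the input image size.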
“Lots of Training Examples”
[Figure: large Internet photo collection → image annotations → training → Convolutional Neural Network (CNN)]
Not accurate, expensive $$
Manual cleaning of the training data, done by researchers: very expensive $$$$
Automated extraction of training data: very accurate, free $
Off-the-shelf CNN
• Target application: classification
• Training dataset: ImageNet
• Architecture: AlexNet & VGG
• Directly applicable to other tasks: fine-grained classification, object detection, image retrieval
[Figures: example images from ImageNet.org and PASCAL VOC 2012]
Annotations for CNN Image Retrieval
• CNN pre-trained for classification task used for retrieval [Gong et al. ECCV’14, Babenko et al. ICCV’15, Kalantidis et al. arXiv’15, Tolias et al. ICLR’16]
• Fine-tuned CNN using a dataset with landmark classes [Babenko et al. ECCV’14]
• NetVLAD: Weakly supervised fine-tuned CNN using GPS tags [Arandjelovic et al. CVPR’16]
• We propose: automatic annotations for CNN training
[Figure labels: building class; landmark class; spatially closest ≠ matching; hard positives; hard negatives]
Retrieval and SfM
[Schonberger et al. CVPR’15] [Radenovic et al. CVPR’16]
CNN learns from BoW – Training Data
Camera Orientation Known
Number of Inliers Known
7.4M images, 713 training 3D models [Schonberger et al. CVPR’15]
Hard Negative Examples
[Figure: query; the most similar CNN descriptor; naive hard negatives: top k by CNN; diverse hard negatives: top k, one per 3D model; images ordered by increasing CNN descriptor distance to the query]
Negative examples: images from different 3D models than the query
Hard negatives: closest negative examples to the query
Only hard negatives: as good as using all negatives, but faster
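The difference between naive and diverse hard negatives can be sketched as below. The data layout (tuples of image id, 3D model id, descriptor), the function names, and the toy descriptors are assumptions for illustration. Negatives are sorted by descriptor distance to the query, and the diverse variant accepts at most one image per 3D model.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def diverse_hard_negatives(query_desc, query_model, database, k):
    """Pick the k closest negatives, at most one per 3D model.

    database: list of (image_id, model_id, descriptor) tuples.
    """
    # Negatives are images from a different 3D model than the query.
    negatives = [e for e in database if e[1] != query_model]
    # Hardest first: smallest descriptor distance to the query.
    negatives.sort(key=lambda e: squared_distance(query_desc, e[2]))
    chosen, used_models = [], set()
    for image_id, model_id, _ in negatives:
        if model_id not in used_models:
            chosen.append(image_id)
            used_models.add(model_id)
        if len(chosen) == k:
            break
    return chosen

database = [
    ("a", "m0", (0.9, 0.1)),  # same 3D model as the query: not a negative
    ("b", "m1", (0.8, 0.2)),
    ("c", "m1", (0.7, 0.3)),  # near-duplicate of "b" from the same model
    ("d", "m2", (0.0, 1.0)),
    ("e", "m3", (0.5, 0.5)),
]
hard = diverse_hard_negatives((1.0, 0.0), "m0", database, k=2)
```

A naive top-2 selection would return "b" and "c", two views of the same wrong landmark; the one-per-model rule swaps "c" for "e".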
Hard Positive Examples
[Figure: query; top 1 by CNN; top 1 by BoW; random from top k by BoW. Further labels: harder positives; used in NetVLAD]
Positive examples: images from the same 3D model as the query
Hard positives: positive examples not close enough to the query
CNN Siamese Learning
[Figure, matching pair: the query and the positive image each pass through convolutional layers, then global max pooling & L2-norm, giving a D x 1 CNN descriptor per image; the two descriptors and the pair label (1 – positive, 0 – negative) feed the contrastive loss]
[Figure, non-matching pair: the same two-branch network with pair label 0]
Contrastive vs. Triplet loss: Contrastive better with our data
Contrastive loss more strict, requires accurate training data
Triplet loss less sensitive to inaccurate annotation
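For reference, the standard form of the contrastive loss on a descriptor pair can be written as below. This is a sketch of the usual definition, with the margin value 0.7 and the function name chosen only for illustration; the slides do not state the margin. Matching pairs are pulled together, while non-matching pairs are pushed apart only when they are closer than the margin.

```python
import math

def contrastive_loss(desc_a, desc_b, label, margin=0.7):
    """Contrastive loss for one pair of descriptors.

    label: 1 for a matching (positive) pair, 0 for a non-matching pair.
    margin: hypothetical value; non-matching pairs farther apart than
            this contribute zero loss.
    """
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(desc_a, desc_b)))
    if label == 1:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2

far_positive = contrastive_loss((1.0, 0.0), (0.0, 1.0), label=1)   # penalized
far_negative = contrastive_loss((1.0, 0.0), (0.0, 1.0), label=0)   # no penalty
near_negative = contrastive_loss((1.0, 0.0), (1.0, 0.0), label=0)  # penalized
```

Every matching pair is penalized unless its distance is exactly zero, which is one way to read the remark that the contrastive loss is strict and needs accurate training data.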
Whitening and dimensionality reduction
1. PCAw – PCA of an independent set of descriptors [Babenko et al. ICCV’15, Tolias et al. ICLR’16]
2. Lw – We propose to learn whitening using labeled training data and linear discriminant projections [Mikolajczyk & Matas ICCV’07]
3. End-to-end learning – performs comparably to or worse than Lw, while slowing down convergence
[Figure: … global max pooling & L2-norm → D x 1 CNN descriptor → whitening → optional dimensionality reduction; the whitening is either learned end-to-end or applied as post-processing]
Efficient Scoring and Ranking
CNN descriptor encoding (512D)
Nearest neighbors used on CNN descriptors
Can use any fast NN search, like ANN
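With L2-normalized descriptors, ranking by dot product gives the same order as ranking by Euclidean distance, so exhaustive search is a few lines. The sketch below is a brute-force baseline with made-up 2D descriptors and a hypothetical function name; an approximate nearest-neighbour index would replace the loop for large databases.

```python
def rank_by_similarity(query, database):
    """Rank database images by dot product with the query descriptor.

    Assumes all descriptors are L2-normalized.
    database: dict mapping image_id -> descriptor.
    """
    scores = {
        image_id: sum(q * d for q, d in zip(query, desc))
        for image_id, desc in database.items()
    }
    return sorted(scores, key=lambda image_id: -scores[image_id])

# Unit-length toy descriptors (2D here instead of 512D).
database = {5: (0.6, 0.8), 1573: (1.0, 0.0), 11202: (0.0, 1.0)}
ranking = rank_by_similarity((0.8, 0.6), database)
```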
Experiments – datasets
• Oxford 5k dataset [Philbin et al. CVPR’07]
• Paris 6k dataset [Philbin et al. CVPR’08]
• Holidays dataset [Jegou et al. ECCV’10]
• 100k distractor dataset [Philbin et al. CVPR’07]
• Protocol: mean Average Precision (mAP)
Training 3D models do not contain any landmark from these datasets
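As a reminder of what the protocol measures, here is the plain, non-interpolated definition of average precision and its mean over queries. The function names and the toy rankings are hypothetical, and the official evaluation scripts of the benchmarks may differ in details such as the handling of ignored images.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at the rank of each relevant image."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    """results: list of (ranked_ids, relevant_ids), one entry per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)

ap = average_precision(["a", "x", "b"], ["a", "b"])  # (1/1 + 2/3) / 2
m_ap = mean_average_precision([
    (["a", "x", "b"], ["a", "b"]),
    (["x", "a"], ["a"]),
])
```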
Experiments – Learning (AlexNet)
• Careful choice of positive and negative training images makes a difference
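[Bar chart: results on Oxford 5k and Paris 6k for each choice of positive + negative training images]
Compared settings: Off-the-shelf; top 1 CNN + top k CNN; top 1 CNN + top 1 / model CNN; top 1 BoW + top 1 / model CNN; random(top k BoW) + top 1 / model CNN; our learned whitening
Values shown: 44.2, 51.6, 56.2, 63.1, 56.7, 63.9, 59.7, 67.1, 62.2, 68.9, 60.2, 67.5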
Experiments – Over-fitting and Generalization
• We added Oxford and Paris landmarks as 3D models and repeated fine-tuning
Only +0.3 mAP on average over all testing datasets
State-of-the-art
[Figure: NetVLAD 256D vs. our CNN 32D; values shown: 63.5, 69.2]
Concurrent work: [Gordo et al. ECCV’16]
Teacher vs. Student
Our CNN with re-ranking (R) and query expansion (QE) surpasses its teacher on all datasets!
Method Oxf5k Oxf105k Par6k Par106k
BoW(16M)+R+QE 84.9 79.5 82.4 77.3
CNN(512D) 79.7 73.9 82.4 74.6
CNN(512D)+R+QE 85.0 81.8 86.5 78.8
Teacher vs. Student
[Figure: query and top 10 results (correct | incorrect) for BoW and CNN; first incorrect at rank 127]
Teacher vs. Student
[Figure: query and top 10 results (correct | incorrect) for BoW and CNN; first incorrect at rank 159]
Fine-tuning might not be enough
CNN descriptors
Significant viewpoint and/or scale change: lots of training data
Significant illumination change: lots of training data
Severe occlusions: lots of training data
Visually similar but different objects: lots of training data
versus
Bag of Words
Significant viewpoint and/or scale change: covariant local features, invariant descriptors
Significant illumination change: color-normalized feature descriptors
Severe occlusions: locality of the features, geometric verification
Visually similar but different objects: feature discriminability & geometric verification
CNN descriptor learning
• Proposed a method to generate the necessary “lots of training examples” without any human interaction
• Strong supervision for hard negative, hard positive mining, and supervised whitening
• Data and trained networks available at: cmp.felk.cvut.cz/~radenfil/projects/siamac.html
• For more details about the paper visit Poster O-1A-01
So is the Bag-of-Words REALLY torn?
Not yet, but don’t mess with tape ;)
Questions?
• Thanks for your attention
• Interested students should ask about our PhD program
Center for Machine Perception
Czech Technical University in Prague
http://cmp.felk.cvut.cz
Contact Jiri Matas or Ondrej Chum