Introduction to a Research Topic in Mobile Robotics
Introduction to Research in Mobile Robotics: Visual Place Recognition
Luis Gomez Camara
Intelligent and Mobile Robotics (IMR) Group, Czech Institute of Informatics, Robotics and Cybernetics
Czech Technical University in Prague
Motivation: Lifelong Autonomy
Long-term autonomy of mobile robots is a highly relevant research topic (also at IMR)
Requires navigation over extended periods of time
Long-term navigation is challenging due to:
Accumulation of errors (drift)
Dynamic environments
Visual Place Recognition (VPR) is a valuable tool
Long-term navigation: Applications
Self-driving cars
Planetary rovers
Autonomous underwater vehicles
Injectable nanorobots
UAVs
Domestic robots
. . .
Robot navigation
The ability of a robot to
Determine its own position in its frame of reference
Plan a path towards some goal location
Answering the questions:
1. Where am I?
2. How do I get to other places?
3. Where are other places relative to me?
Navigation consists of:
1. Self-localization
2. Path planning
3. Map building and map interpretation
Visual SLAM
Simultaneous Localization And Mapping using optical sensors
Image: www.dragonfly.com
Problem: Drifting over time
Loop closure to correct the drift
Visual Place Recognition to solve loop closure
Visual SLAM: Drift
"SLAM-Loop Closing with Visually Salient Features", P. Newman et al. 2005
Red arrows: camera pose
Grey ellipses: global uncertainty
Images: views used in loop closure
Note angular error at the bottom right
Visual SLAM: Loop closure
Definition: the task of recognising a previously-visited location and updating beliefs accordingly
"Deformation-based loop closure for large scale dense RGB-D SLAM", Thomas Whelan et al. 2013
Basic component of SLAM systems
Used to correct drift that accumulates over time
Reduction in the uncertainty of the map estimate
Necessary for long-term navigation
Visual Place Recognition (VPR)
Definition: given a query image of a place, find its location by comparison with a database of previously visited places
Figure: query image vs. database of places (which place matches?)
Fundamental and challenging problem in computer vision
VPR: Applications
Navigation
Autonomous driving
Geolocalization
Image retrieval
AR, VR, etc.
VPR: Challenges
Day-night cycles and illumination changes
Weather and season-related changes
Viewpoint changes
Dynamic objects, occlusions, etc.
Image Retrieval: Pipeline
Offline stage: Database creation
• Places image dataset → Feature extraction → Image feature representation (f1, f2, …, fn) → Places database
Online stage: Place recognition
• Query image → Feature extraction → Image feature representation (f1, f2, …, fn)
• Feature matching against the places database → Ranked list of candidate images
• Exhaustive search → Re-ranked list of images
• Place recognition = best candidate
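The offline/online split of this pipeline can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline of any particular system: `extract_features` is a hypothetical stand-in (a coarse grey-level histogram) for a real feature extractor, images are plain nested lists of intensities, and the place names are invented.

```python
import math

def extract_features(image, bins=4):
    """Hypothetical stand-in for a real extractor: normalised grey-level histogram."""
    pixels = [p for row in image for p in row]
    hist = [0.0] * bins
    for p in pixels:
        hist[min(int(p * bins / 256), bins - 1)] += 1.0
    return [h / len(pixels) for h in hist]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_database(places):
    """Offline stage: map each place name to its feature vector."""
    return {name: extract_features(image) for name, image in places.items()}

def recognise(query_image, database):
    """Online stage: place names ranked by similarity to the query."""
    q = extract_features(query_image)
    return sorted(database,
                  key=lambda name: cosine_similarity(q, database[name]),
                  reverse=True)

# Invented toy data: two 2x2 "images".
places = {
    "dark_corridor": [[10, 20], [30, 40]],
    "bright_square": [[200, 220], [240, 250]],
}
database = build_database(places)
ranking = recognise([[15, 25], [35, 45]], database)
best_candidate = ranking[0]
print(ranking)
```

A real system would add a re-ranking step on the top candidates before picking the best one.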
Image Retrieval: Milestones
Dominated by Bag of Visual Words (BoVW) model
Pre-trained and fine-tuned models
Spatial pyramid matching (Lazebnik et al.)
Image Feature Extraction
Dense sampling
• Patches of fixed size and shape
• Regular grid, possibly overlapping and over a range of scales
• Simpler than keypoints (but computationally heavier)
• Optimal for high-level representations (e.g. scene classification)
Interest points (keypoints)
• Salient locations that are likely to match in other images
• Edges, corners, blobs, etc.
• Optimal for image correspondences
Image Feature Descriptors
Created from regions around points of interest
Should be robust to changes in orientation, illumination, etc.
Can be matched against descriptors in other images
Handcrafted: SIFT, SURF, ORB, etc.
Learned: CNN features
SIFT
Scale Invariant Feature Transform
Hand-crafted (engineered)
Used as both detector and descriptor
There are faster alternatives such as SURF, ORB, BRISK
SIFT is still one of the most accurate hand-crafted descriptors
Scale affects detection
"Distinctive Image Features from Scale-Invariant Keypoints", David Lowe 2004
SIFT
1. Scale-space extrema detection
• LoG (Laplacian of Gaussian) approximated by DoG (Difference of Gaussians)
• Successively blur with Gaussian filter
• Scale parameter: standard deviation
Figure: first, second, third and fourth octaves
Maxima/minima detection
• Find local extrema
• Over both scale and space
SIFT
2. Keypoint localisation
• Remove low-contrast keypoints
• Remove keypoints on edges
• Only strong interest points remain
Figure: keypoints before and after filtering
3. Orientation assignment
• Based on local properties, find a consistent orientation for each keypoint and scale
• Invariance to image rotation
• Orientation histogram (36 bins) around keypoint:
• Gradient magnitude and direction from pixel differences
• The highest bin and any bins above 80% of the highest are each used to create a keypoint
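The orientation-assignment step can be illustrated with a small sketch. It assumes the gradients around a keypoint are already available as (magnitude, angle in degrees) pairs, which are invented values here, and it simplifies the procedure: there is no Gaussian weighting or peak interpolation, and every bin above 80% of the highest one is reported.

```python
def orientation_histogram(gradients, bins=36):
    """Accumulate gradient magnitudes into orientation bins (10 degrees each for 36 bins)."""
    hist = [0.0] * bins
    width = 360.0 / bins
    for magnitude, angle_deg in gradients:
        hist[int((angle_deg % 360.0) // width) % bins] += magnitude
    return hist

def dominant_orientations(hist, ratio=0.8):
    """Bin centres (degrees) of the highest bin and of bins above ratio * highest."""
    peak = max(hist)
    if peak == 0:
        return []
    width = 360.0 / len(hist)
    return [(i + 0.5) * width for i, v in enumerate(hist) if v > ratio * peak]

# Invented gradients: (magnitude, angle in degrees)
gradients = [(1.0, 12.0), (2.0, 15.0), (2.5, 95.0), (0.5, 200.0)]
hist = orientation_histogram(gradients)
orientations = dominant_orientations(hist)
print(orientations)  # two orientations, so two keypoints would be created
```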
SIFT
4. Keypoint descriptor
• 16x16 neighborhood
• 16 sub-blocks of 4x4 size
• 8 bin orientation histogram per sub-block
• Total of 128 bin values
• Everything relative to keypoint orientation
• Normalization for contrast changes
• Thresholding large gradient values for robustness to non-linear illumination changes
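The descriptor length and the final normalisation can be checked with a short sketch. The clipping value 0.2 is the one given in Lowe's 2004 paper; the raw histogram values below are invented, and the code skips everything before the final normalisation (gradient computation, Gaussian weighting, rotation to the keypoint orientation).

```python
import math

SUB_BLOCKS = (16 // 4) ** 2      # 16x16 neighbourhood split into 4x4-pixel sub-blocks
BINS_PER_BLOCK = 8
DESCRIPTOR_LENGTH = SUB_BLOCKS * BINS_PER_BLOCK   # 16 * 8 = 128

def l2_normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def finalise_descriptor(raw, clip=0.2):
    """Normalise, clip large values, then renormalise."""
    v = l2_normalise(raw)
    v = [min(x, clip) for x in v]
    return l2_normalise(v)

# Invented raw histogram with one dominant gradient bin.
raw = [0.0] * DESCRIPTOR_LENGTH
raw[0], raw[1], raw[2] = 100.0, 10.0, 10.0
descriptor = finalise_descriptor(raw)
# Same patch with three times the contrast: the descriptor should not change.
scaled = finalise_descriptor([3.0 * x for x in raw])
print(DESCRIPTOR_LENGTH, round(descriptor[0], 3))
```

Normalising first makes the result independent of a global contrast factor; clipping then limits the influence of a few very large gradient values.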
Bag of (visual) Words (BoW)
Traditional approach in VPR
Borrowed from Natural Language Processing
"Video Google: A Text Retrieval Approach to Object Matching in Videos", Sivic and Zisserman 2003
Stores zero-order information (word repetitions)
Uses hand-crafted descriptors (SIFT, SURF, ORB, etc.)
Bag of (visual) Words
Steps:
1. Extract descriptors from a collection of images
2. Learn a visual dictionary by clustering the descriptors (e.g. k-means)
3. Represent the query image by:
• Quantizing descriptors to the closest word (centroid)
• Histogram of word repetitions
4. The image is represented as a vector
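These steps can be sketched with a tiny pure-Python k-means. The 2-D "descriptors" are invented toy values (real ones would be e.g. 128-D SIFT vectors), and the initial centroids are passed in explicitly so the run is deterministic; a practical implementation would use an optimised clustering library.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(descriptor, centroids):
    """Index of the closest visual word (centroid)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(descriptor, centroids[i]))

def kmeans(descriptors, init, iterations=10):
    """Learn the visual dictionary by clustering descriptors."""
    centroids = [list(c) for c in init]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for d in descriptors:
            clusters[nearest(d, centroids)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def bow_histogram(descriptors, centroids):
    """Quantise descriptors to words and count word repetitions."""
    hist = [0] * len(centroids)
    for d in descriptors:
        hist[nearest(d, centroids)] += 1
    return hist

# Invented toy descriptors forming two obvious groups.
training = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
vocabulary = kmeans(training, init=[training[0], training[3]])
query_descriptors = [[0.1, 0.1], [5.1, 5.0], [4.8, 5.2], [5.0, 4.9]]
histogram = bow_histogram(query_descriptors, vocabulary)
print(histogram)
```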
Vector quantization
Images: www.mathworks.com
Bag of (visual) Words
Pros:
• Largely unaffected by object positions, scale and orientation
• Good for classifying images according to content
• Fast search thanks to inverted indices (requires sparsity of words in images)
Cons:
• Spatial information is discarded
• Information loss due to quantization
• High dimensionality
Inverted file index
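A minimal sketch of an inverted file index shows why sparsity matters: a query only touches the posting lists of the words it contains. The image names and word ids are invented, and the score is a plain count of shared words rather than a weighted scheme such as tf-idf.

```python
from collections import defaultdict

def build_inverted_index(image_words):
    """Map each visual word id to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def candidates(query_words, index):
    """Count shared visual words per image, visiting only the query's posting lists."""
    votes = defaultdict(int)
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Invented database: image id -> visual word ids present in that image.
image_words = {"img_a": [3, 17, 42], "img_b": [5, 17, 99], "img_c": [1, 2, 8]}
index = build_inverted_index(image_words)
ranked = candidates([17, 42, 7], index)
print(ranked)
```

Images that share no word with the query (here `img_c`) are never visited.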
BoW improvements
• Spatial pyramid
• Fisher Vectors:
• Use a Gaussian mixture model (GMM) as vocabulary
• Statistical measure of the descriptors with respect to the GMM
• Derivative of the log-likelihood with respect to the GMM parameters
• Store up to second-order information (covariances)
• VLAD: Vector of Locally Aggregated Descriptors
• Similar to Fisher Vectors but stores only first-order information (residuals to the centroids)
Spatial pyramid
Figure: pyramid levels 0, 1 and 2
Based on approximate global geometric correspondence
Image divided into increasingly fine sub-regions
Histograms of local features found inside each sub-region
Extension of an orderless bag-of-features
"Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories", Lazebnik et al. 2006
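The pyramid representation can be sketched as follows, assuming each local feature is already quantised to a visual word and has normalised image coordinates in [0, 1). The feature list is invented, and the per-level weights used by the pyramid matching kernel are left out for brevity.

```python
def spatial_pyramid(features, vocab_size, levels=3):
    """features: (x, y, word) with x, y in [0, 1).
    Returns the concatenated per-cell word histograms for levels 0..levels-1."""
    pyramid = []
    for level in range(levels):
        cells = 2 ** level                  # cells per side at this level
        hists = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in features:
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hists[row * cells + col][word] += 1
        for h in hists:
            pyramid.extend(h)
    return pyramid

# Invented features: (x, y, visual word id)
features = [(0.1, 0.1, 0), (0.9, 0.1, 1), (0.9, 0.9, 1), (0.2, 0.8, 0)]
vector = spatial_pyramid(features, vocab_size=2)
print(len(vector))  # (1 + 4 + 16) cells * 2 words
```

Level 0 is the ordinary orderless histogram; finer levels record in which part of the image each word occurred.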
Spatial Pyramid
Figure: spatial pyramid representation
"Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories", Lazebnik et al. 2006
Weak features: oriented edge points (gradient in a given direction above minimum threshold)
Strong features: SIFT
VLAD (Vector of Locally Aggregated Descriptors)
"Aggregating local descriptors into a compact image representation", Jégou et al. 2010
0. Train a visual dictionary $C$ (k-means): $C = \{\mu_1, \mu_2, \ldots, \mu_k\}$
1. For an image with $m$ descriptors, $X = \{x_1, x_2, \ldots, x_m\}$ with $x_j \in \mathbb{R}^d$, assign each descriptor to its closest cell centroid
2. Compute residuals $x - \mu_i$
3. Accumulate residuals for each cell:
$$v_i = \sum_{x:\,\mathrm{NN}(x) = \mu_i} (x - \mu_i)$$
4. Concatenate accumulated residuals in vector $v \in \mathbb{R}^{kd}$, where $d$ is the descriptor dimension:
$$v = [v_1, v_2, \ldots, v_k]$$
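The VLAD steps translate almost directly into code. The sketch below uses invented 2-D descriptors and two hand-picked centroids instead of a trained dictionary, and it omits the normalisation applied in the cited paper. The output holds one residual sum per centroid, so its length is the number of centroids times the descriptor dimension.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vlad(descriptors, centroids):
    """Sum the residuals (x - mu_i) per nearest centroid and concatenate the sums."""
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]
    for x in descriptors:
        i = min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))
        for j in range(dim):
            residual_sums[i][j] += x[j] - centroids[i][j]
    return [value for v_i in residual_sums for value in v_i]

# Invented toy data: k = 2 centroids, descriptor dimension 2.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
v = vlad(descriptors, centroids)
print(v)
```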
VLAD
Advantages
Fast to compute
More discriminative than BoW
Good results with small dimensionality
Fixed-length vector irrespective of the number of feature detections
"Aggregating local descriptors into a compact image representation", Jégou et al. 2010
CNN Features
Extracted from Convolutional Neural Networks
Pre-trained vs. end-to-end
Early layers learn features similar to Gabor filters
Later layers learn more semantic features
Semantic features are robust (a car is always a car)
Spatial information can also be exploited
Figure: Gabor filters (early layers) and semantic features (later layers)
CNN Features
CNN: mathematical model with a huge number of parameters
Parameters are learned automatically during training on massive labelled datasets
Number of CNN features depends on architecture
CNN Features
Important concepts:
Parameter sharing: the same kernel weights are used at all locations
Pooling: subsamples feature maps and provides a degree of translation invariance
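Both concepts can be demonstrated on a toy example. In the sketch below one small kernel (a horizontal edge detector chosen for illustration) is slid over the whole image, which is parameter sharing, and 2x2 max pooling then gives the same output for an edge shifted by one pixel. The images are invented, and the invariance only holds for shifts that stay within a pooling window.

```python
def conv2d(image, kernel):
    """Valid cross-correlation: the same kernel weights are applied at every location."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling (stride equal to the window size)."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in cols]
            for r in rows]

kernel = [[-1, 1]]                                  # responds to a dark-to-bright step
edge = [[0, 1, 1, 1, 1] for _ in range(4)]          # vertical edge after column 0
shifted = [[0, 0, 1, 1, 1] for _ in range(4)]       # same edge, one pixel to the right

pooled_edge = max_pool(conv2d(edge, kernel))
pooled_shifted = max_pool(conv2d(shifted, kernel))
print(pooled_edge, pooled_shifted)
```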
CNN Features
"Visualizing and Understanding Convolutional Networks", Zeiler and Fergus 2013
CNN: Some recent approaches
"Bag of Local Convolutional Features for Scalable Instance Search" (Mohedano et al. 2016)
• Instance retrieval based on CNN features and the BoW model
• Activations of pre-trained CNN as local features
• High dimensional, sparse representation (N=512, 20k visual words)
• Each local CNN feature is assigned its closest visual word (assignment map)
• Performance comparable to other CNN-based approaches but more scalable
CNN: Some recent approaches
"On the performance of ConvNet features for place recognition" (Sünderhauf et al. 2015)
• Systematic analysis on the performance of pre-trained CNN layers
• Tested on the AlexNet architecture trained on ImageNet
• Nearest Neighbor search of extracted feature vectors
• Layer Conv3 best performing for place recognition
AlexNet architecture: "ImageNet Classification with Deep Convolutional Neural Networks", Krizhevsky et al. 2012
Example of CNN image features
CNN: Some recent approaches
"NetVLAD: CNN architecture for weakly supervised place recognition" (Arandjelović et al. 2016)
• Learns image representation in an end-to-end manner for the VPR task
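• Steps:
1. Crop the CNN at the last convolutional layer (H x W x D)
2. Each spatial location generates one D-dimensional descriptor (e.g. N = 13 x 13 = 169 descriptors at conv5)
3. Express VLAD image representation as a matrix:
$$V(j,k) = \sum_{i=1}^{N} a_k(x_i)\,\bigl(x_i(j) - c_k(j)\bigr)$$
$x_i(j)$: j-th dimension of the i-th descriptor
$c_k(j)$: j-th dimension of the k-th cluster
$a_k(x_i)$: membership of descriptor $x_i$ to the k-th word (cluster); value: 0 or 1
4. Make the membership term differentiable:
$$\bar{a}_k(x_i) = \frac{e^{-\alpha \lVert x_i - c_k \rVert^2}}{\sum_{k'} e^{-\alpha \lVert x_i - c_{k'} \rVert^2}} = \frac{e^{w_k^{T} x_i + b_k}}{\sum_{k'} e^{w_{k'}^{T} x_i + b_{k'}}}, \qquad w_k = 2\alpha c_k,\; b_k = -\alpha \lVert c_k \rVert^2$$
$\bar{a}_k(x_i)$: soft membership assignment
$\{w_k\}$, $\{b_k\}$ and $\{c_k\}$ are sets of trainable parameters
$\alpha$: response attenuation constant (positive)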
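The soft membership can be sketched numerically. Following the cited NetVLAD paper, the assignment is a softmax over $w_k^T x + b_k$; setting $w_k = 2\alpha c_k$ and $b_k = -\alpha\lVert c_k\rVert^2$ makes it equivalent to weighting by $e^{-\alpha\lVert x - c_k\rVert^2}$. The centres, the descriptor and the values of $\alpha$ below are invented, and in the trained network the three parameter sets are learned independently rather than tied as they are here.

```python
import math

def soft_assign(x, centres, alpha):
    """Softmax over w_k.x + b_k with w_k = 2*alpha*c_k and b_k = -alpha*||c_k||^2."""
    logits = []
    for c in centres:
        w = [2.0 * alpha * cj for cj in c]
        b = -alpha * sum(cj * cj for cj in c)
        logits.append(sum(wj * xj for wj, xj in zip(w, x)) + b)
    top = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_vlad(descriptors, centres, alpha):
    """V[k][j] = sum_i a_k(x_i) * (x_i[j] - c_k[j]) with soft memberships."""
    dim = len(centres[0])
    V = [[0.0] * dim for _ in centres]
    for x in descriptors:
        a = soft_assign(x, centres, alpha)
        for k, c in enumerate(centres):
            for j in range(dim):
                V[k][j] += a[k] * (x[j] - c[j])
    return V

centres = [[0.0, 0.0], [4.0, 0.0]]           # invented cluster centres
x = [1.0, 0.0]                               # invented descriptor, closer to centre 0
soft = soft_assign(x, centres, alpha=0.1)    # small alpha: membership is shared
hard = soft_assign(x, centres, alpha=50.0)   # large alpha: close to a 0/1 assignment
V = soft_vlad([x], centres, alpha=50.0)
print(soft, hard)
```

With a large $\alpha$ the result approaches the original hard-assignment VLAD.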
CNN: Some recent approaches
"NetVLAD: CNN architecture for weakly supervised place recognition" (Arandjelović et al. 2016)
• More flexible than original VLAD thanks to extra trainable parameters
Example: two descriptors known to belong to images that should not match
Supervised VLAD can learn a better anchor (cluster center) that minimizes the dot product between their residuals
CNN: Some recent approaches
"Levelling the Playing Field: A Comprehensive Comparison of Visual Place Recognition Approaches under Changing Conditions" (Zaffar et al. 2019)
Berlin Kudamm
Gardens Point
Nordland
Neuroscience of place recognition
"The cognitive map in humans: Spatial navigation and beyond", Russell A. Epstein et al. 2017
Hippocampus (HPC) and Entorhinal cortex (EC)
Stores map-like spatial codes (cognitive maps)
Supports memory during navigation
A cognitive map is an internal neural representation of one's surrounding physical environment
Entorhinal cortex
Neuroscience of place recognition
Parahippocampal Place Area (PPA) and Retrosplenial cortex (RSC)
PPA → perception of landmarks and visuospatial structure of the scene
RSC → cognitive map retrieval
PPA + RSC → place recognition
Landmarks:
• Spatial layout (very important)
• Discrete landmarks: buildings, statues, etc.
• Extended topographical landmarks:
arrangement of buildings, valleys, ridges, etc.
Entorhinal cortex
"Where am I now? Distinct Roles for Parahippocampal and Retrosplenial Cortices in Place Recognition", Russell A. Epstein et al. 2007
Visual Cortex Hierarchy
"Bio-inspired computer vision: Towards a synergistic approach of artificial and biological vision", Medathati et al. 2016
Our approach
Use CNNs to extract semantic (robust) features from images
Store them along with their spatial arrangement
Compare images by simultaneously matching features and locations
VGG16 architecture
"Very Deep Convolutional Networks for Large-Scale Image Recognition", Simonyan and Zisserman 2014
Increased depth (16 layers) compared to AlexNet
Smaller convolutional filters (3x3)
Trained on Places205 for scene recognition
Layer conv4_2 for spatial consistency check
We use conv5_2 for quick retrieval of candidates
SSM-VPR (Semantic and Spatial Matching Visual Place Recognition)
Two-stage system
"Spatio-Semantic ConvNet-based Visual Place Recognition", Camara et al. 2019
STAGE 1: Image filtering
• Input: query image of a place
• Process: fast search of images similar to the query in a large database of places
• Output: N top-ranked candidates
STAGE 2: Spatial matching
• Input: query + candidates
• Process: semantic and geometric comparison of query and candidate using CNNs
• Output: recognized place (best match)
SSM-VPR
VGG16 CNN pre-trained on Places205 dataset
Image filtering stage:
• Layer conv5_2
• 14x14x512 feature maps
• 16 sliding 7x7x512 cubes per image
• Store into image filtering database (IFDB)
Spatial matching stage:
• Layer conv4_2
• 56x56x512 feature maps
• 729 sliding 3x3x512 cubes per image
• Store into spatial matching database (SMDB)
• Also store locations
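The cube counts can be reproduced with the usual sliding-window formula. The slides do not state the stride, so the stride of 2 with no padding used below is an assumption; it is simply one setting that yields both 16 and 729.

```python
def sliding_positions(size, window, stride):
    """Number of window positions along one axis, without padding."""
    return (size - window) // stride + 1

def cube_count(map_size, cube_size, stride=2):   # stride=2 is an assumption
    per_axis = sliding_positions(map_size, cube_size, stride)
    return per_axis * per_axis

filtering_cubes = cube_count(14, 7)   # conv5_2: 14x14 map, 7x7x512 cubes
matching_cubes = cube_count(56, 3)    # conv4_2: 56x56 map, 3x3x512 cubes
print(filtering_cubes, matching_cubes)
```

Under this assumption the filtering stage compares 4 x 4 coarse cubes per image, while the matching stage works on a much finer 27 x 27 grid.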
SSM-VPR Pipeline
Offline stage: Database creation
• Places image dataset → Feature extraction → Image feature representation (f1, f2, …, fn) → IFDB and SMDB
Online stage: Place recognition
• Query image → Feature extraction → Image feature representation (f1, f2, …, fn)
• Image filtering: feature matching against the IFDB → Ranked list of candidate images
• Spatial matching: exhaustive search using the SMDB → Re-ranked list of images
• Place recognition = best candidate
SSM-VPR: Image filtering
Ground truth candidate: 0
Ground truth candidate: 89
Ground truth candidate: 25
SSM-VPR: Spatial matching
1. For each location in candidate, find location of closest match in query
2. Set the pair of locations as anchor points
3. Look at the spatial consistency between the locations of matched pairs of vectors
4. Location consistency: check all cells around candidate anchor point
5. Accumulate consistent matches for all locations in candidate
6. Select candidate with largest score
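One plausible reading of these steps is an offset-consistency vote, sketched below. This is not the authors' implementation: the grid layout, the neighbourhood radius and the scoring rule are assumptions made for illustration, and the feature vectors are invented one-hot vectors on a 2x2 grid.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matches(candidate, query):
    """Step 1: for each candidate cell, the query cell with the closest feature vector."""
    return {p: min(query, key=lambda q: sq_dist(vec, query[q]))
            for p, vec in candidate.items()}

def spatial_score(candidate, query, radius=1):
    """Steps 2-5: count neighbours whose match keeps the same offset as the anchor pair."""
    matches = best_matches(candidate, query)
    score = 0
    for (r, c), (qr, qc) in matches.items():          # anchor pair
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                if (dr, dc) == (0, 0):
                    continue
                # Consistent if the neighbour is matched to the equally shifted query cell.
                if matches.get((r + dr, c + dc)) == (qr + dr, qc + dc):
                    score += 1
    return score

# Invented data: cell location -> feature vector.
query = {(0, 0): [1, 0, 0, 0], (0, 1): [0, 1, 0, 0],
         (1, 0): [0, 0, 1, 0], (1, 1): [0, 0, 0, 1]}
candidates = {
    "same_place": dict(query),                                   # same spatial layout
    "other_place": {(0, 0): query[(1, 1)], (0, 1): query[(1, 0)],
                    (1, 0): query[(0, 1)], (1, 1): query[(0, 0)]},  # same features, scrambled layout
}
scores = {name: spatial_score(cells, query) for name, cells in candidates.items()}
best = max(scores, key=scores.get)                    # step 6
print(scores, best)
```

Both candidates contain exactly the same feature vectors, so a purely orderless comparison could not separate them; only the layout check does.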
SSM-VPR: Parameter optimization
Berlin Kudamm
Gardens Point
Nordland
Same datasets as in Zaffar et al. 2019
SSM-VPR: Recognition results
Same datasets as in Zaffar et al. 2019
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad \mathrm{Precision} = \frac{TP}{TP + FP}$$
$TP$ = true positives
$FP$ = false positives
$FN$ = false negatives
Gardens Point
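As a quick worked example of the two metrics, the sketch below computes precision and recall from counts. The counts are invented for illustration and are not results from these slides.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented counts: 80 correct matches, 20 wrong matches, 10 missed places.
precision, recall = precision_recall(tp=80, fp=20, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Sweeping the acceptance threshold on the matching score and recomputing both values at each setting gives a precision-recall curve.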
SSM-VPR: Recognition results
Same datasets as in Zaffar et al. 2019
Kudamm
SSM-VPR: Recognition results
Same datasets as in Zaffar et al. 2019
Nordland
SSM-VPR: Teach-and-Replay navigation
Conclusions
Separating recognition into two stages is a highly successful approach
High-level CNN features are very robust to changes in appearance
Considering the spatial location of features is key to high-performance recognition
Substantial improvement over the state of the art
Interesting applications in autonomous navigation