Lecture 12 Recognition - Robotics and Perception...
![Page 1: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/1.jpg)
Lecture 12 Recognition
Davide Scaramuzza
![Page 2: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/2.jpg)
Oral exam dates
• UZH
– January 19-20
• ETH
– 30.01 to 9.02 2017 (schedule handled by ETH)
• Exam location
– Davide Scaramuzza’s office:
• Andreasstrasse 15, 2.10, 8050 Zurich
![Page 3: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/3.jpg)
Course Evaluation
• Please fill in the evaluation form you received by email!
• Provide feedback on
– Exercises: good and bad
– Course: good and bad
– How to improve
![Page 4: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/4.jpg)
Lab Exercise 6 - Today
Room ETH HG E 33.1 from 14:15 to 16:00
Work description: K-means clustering and Bag of Words place recognition
![Page 5: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/5.jpg)
Outline
• Recognition applications and challenges
• Recognition approaches
• Classifiers
• K-means clustering
• Bag of words
• Oral Exam – Instructions and Example questions
![Page 6: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/6.jpg)
Application: large-scale retrieval
Query image Results on a database of 100 Million images
![Page 7: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/7.jpg)
Application: recognition for mobile phones
• Smartphone:
– Lincoln, Microsoft Research
– Point & Find, Nokia
– SnapTell.com (Amazon)
– Google Goggles
![Page 8: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/8.jpg)
Application: Face recognition
See iPhoto, Google Photos, Facebook
Detection Recognition “Sally”
![Page 9: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/9.jpg)
Application: Face recognition
• Detection works by using four basic types of feature detectors
– The white areas are subtracted from the black ones.
– A special representation of the sample called the integral image makes feature extraction faster.
P. Viola, M. Jones: Robust Real-time Object Detection, Int. Journal of Computer Vision 2001
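The integral image trick mentioned above can be sketched in a few lines. This is a minimal illustration, not the Viola-Jones implementation: the function names (`integral_image`, `rect_sum`, `two_rect_feature`), the toy image, and the choice of a left/right two-rectangle feature are all assumptions made for the example.

```python
def integral_image(img):
    """Return ii (padded with a zero row and column) such that
    ii[r][c] is the sum of img over rows 0..r-1 and columns 0..c-1."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using only four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Hypothetical two-rectangle feature: the left half plays the role of
    the white area, the right half the black area; white is subtracted
    from black, as stated on the slide."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return black - white


# Toy 3x4 image: dark left half, bright right half.
image = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
ii = integral_image(image)
feature = two_rect_feature(ii, 0, 0, 3, 4)
```

Once `ii` is built, every rectangle sum costs four lookups regardless of the rectangle size, which is what makes evaluating many such features per window affordable.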
![Page 10: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/10.jpg)
Application: Optical character recognition (OCR)
Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software
Digit recognition, AT&T labs, using CNN, by Yann LeCun (1993)
http://yann.lecun.com/
License plate readers http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
![Page 11: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/11.jpg)
Application: pedestrian recognition
• Detector: Histograms of oriented gradients (HOG)
Credit: Van Gool’s lab, ETH Zurich
![Page 12: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/12.jpg)
Challenges: object intra-class variations
• How to recognize ANY car
• How to recognize ANY cow
![Page 13: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/13.jpg)
Challenges: object intra-class variations
• How to recognize ANY chair
![Page 14: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/14.jpg)
Challenges: context and human experience
![Page 15: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/15.jpg)
Outline
• Recognition challenges
• Recognition approaches
• Classifiers
• K-means clustering
• Bag of words
• Review of the course
• Evaluation of the course
![Page 16: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/16.jpg)
Research progress in recognition
• 1960-1990: Polygonal objects
• 1990-2000: Faces, characters, planar objects
• 2000-today: Any kind of object
![Page 17: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/17.jpg)
Two schools of approaches
• Model based
– Tries to fit a model (2D or 3D) using a set of corresponding features (lines, point features)
– Example: SIFT matching and RANSAC for model validation
• Appearance based
– The model is defined by a set of images representing the object
– Example: template matching can be thought of as a simple object recognition algorithm (the template is the object to recognize)
– Disadvantage of template matching: it works only when the image exactly matches the query
![Page 18: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/18.jpg)
Example of 2D model-based approach Q: Is this Book present in the Scene?
Scene Book
![Page 19: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/19.jpg)
Example of 2D model-based approach Q: Is this Book present in the Scene?
Extract keypoints in both images
![Page 20: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/20.jpg)
Example of 2D model-based approach Q: Is this Book present in the Scene?
Look for corresponding matches
Most of the Book’s keypoints are present in the Scene
A: The Book is present in the Scene
![Page 21: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/21.jpg)
Example of appearance-based approach: Simple 2D template matching
• The model of the object is simply an image
• A simple example: Template matching
– Shift the template over the image and compare (e.g. NCC or SSD)
– Problem: works only if template and object are identical
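A minimal sketch of the shift-and-compare step, assuming grayscale images stored as lists of lists. The helper names (`ssd`, `ncc`, `match_template`) and the toy data are illustrative; `ncc` here is the plain normalized cross-correlation without mean subtraction.

```python
import math


def ssd(patch, template):
    """Sum of squared differences: 0 means a perfect match."""
    return sum((p - t) ** 2
               for prow, trow in zip(patch, template)
               for p, t in zip(prow, trow))


def ncc(patch, template):
    """Normalized cross-correlation: 1 means a perfect match up to gain."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    den = math.sqrt(sum(a * a for a in p) * sum(b * b for b in t))
    return sum(a * b for a, b in zip(p, t)) / den if den > 0 else 0.0


def match_template(image, template, score=ssd, best=min):
    """Shift the template over the image and return the best (row, col)."""
    th, tw = len(template), len(template[0])
    h, w = len(image), len(image[0])
    candidates = []
    for r in range(h - th + 1):
        for c in range(w - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            candidates.append((score(patch, template), (r, c)))
    return best(candidates)[1]


template = [[1, 2],
            [3, 4]]
image = [[0, 0, 0, 0, 0],
         [0, 1, 2, 0, 0],
         [0, 3, 4, 0, 0],
         [0, 0, 0, 0, 0]]
# Same scene with the object twice as bright.
bright = [[2 * v for v in row] for row in image]

pos_ssd = match_template(image, template, ssd, min)
pos_ncc = match_template(image, template, ncc, max)
pos_ssd_bright = match_template(bright, template, ssd, min)
pos_ncc_bright = match_template(bright, template, ncc, max)
```

On the identical copy both scores locate the object at `(1, 1)`. On the brightened copy SSD drifts to a neighbouring position while NCC still finds `(1, 1)`; neither score copes with rotation, scale change, or a different instance of the object, which is the limitation the slide points out.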
![Page 22: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/22.jpg)
![Page 23: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/23.jpg)
Outline
• Recognition challenges
• Recognition approaches
• Classifiers
• K-means clustering
• Bag of words
• Review of the course
• Evaluation of the course
![Page 24: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/24.jpg)
What is the goal of object recognition?
Goal: classify!
• Either
– say yes/no as to whether an object is present in an image
• Or
– categorize an object: determine what class it belongs to (e.g., car, apple, etc.)
How to display the result to the user
• Bounding box on object
• Full segmentation
Is it or is it not a car? Bounding box on object Full segmentation
![Page 25: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/25.jpg)
Detection via classification: Main idea
Basic component: a binary classifier
Car/non-car classifier
Yes, car. No, not a car.
![Page 26: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/26.jpg)
Detection via classification: Main idea
More in detail, we need to:
1. Obtain training data
2. Define features
3. Define classifier
Training examples → Feature extraction → Car/non-car classifier
![Page 27: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/27.jpg)
Detection via classification: Main idea
• Consider all subwindows in an image
– Sample at multiple scales and positions
• Make a decision per window:
– “Does this contain object X or not?”
Car/non-car classifier
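The subwindow scan can be sketched as follows. This is only an illustration under stated assumptions: square windows, a fixed stride, and a stand-in `toy_classifier` that thresholds mean intensity in place of a trained car/non-car classifier. All names and parameter values are hypothetical.

```python
def sliding_windows(width, height, window=4, stride=2, scales=(1, 2)):
    """Yield (left, top, size) for square windows at several scales."""
    for s in scales:
        size = window * s
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                yield left, top, size


def detect(image, classifier, **kw):
    """Run a binary classifier on every subwindow; return the accepted ones."""
    h, w = len(image), len(image[0])
    hits = []
    for left, top, size in sliding_windows(w, h, **kw):
        patch = [row[left:left + size] for row in image[top:top + size]]
        if classifier(patch):
            hits.append((left, top, size))
    return hits


def toy_classifier(patch):
    """Hypothetical stand-in: 'object present' if mean intensity > 0.5."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values) > 0.5


# 8x8 image with a bright 4x4 object whose top-left corner is at (2, 2).
image = [[0] * 8 for _ in range(8)]
for r in range(2, 6):
    for c in range(2, 6):
        image[r][c] = 1

hits = detect(image, toy_classifier)
```

Here only the window aligned with the object is accepted. A real detector would replace `toy_classifier` with a classifier applied to features of each patch.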
![Page 28: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/28.jpg)
Generalization: the machine learning approach
![Page 29: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/29.jpg)
Generalization: the machine learning approach
• Apply a prediction function to a feature representation of the image to get the desired output:
f([image]) = “apple”
f([image]) = “tomato”
f([image]) = “cow”
![Page 30: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/30.jpg)
The machine learning framework
𝑦 = 𝑓(𝒙)
• Training: given a training set of labeled examples {(𝒙1, 𝑦1), … , (𝒙𝑁, 𝑦𝑁)}, estimate the prediction function 𝑓 by minimizing the prediction error on the training set
• Testing: apply 𝑓 to a never-before-seen test example 𝒙 and output the predicted value 𝑦 = 𝑓(𝒙)
In 𝑦 = 𝑓(𝒙): 𝑦 is the output, 𝑓 is the prediction function, 𝒙 is the image feature
![Page 31: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/31.jpg)
Recognition task and supervision
• Images in the training set must be annotated with the “correct answer” that the model is expected to produce
Example annotation: Contains a motorbike
![Page 32: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/32.jpg)
Examples of possible features
• Blob features
• Histograms of oriented gradients (HOG)
• Image Histograms
![Page 33: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/33.jpg)
Classifiers: Nearest neighbor
𝑓(𝒙) = label of the training example nearest to 𝒙
• No training required!
• All we need is a distance function for our inputs
• Problem: need to compute distances to all training examples! (what if you have 1 million training images and 1 thousand features per image?)
[Figure: a test example plotted among training examples from class 1 and training examples from class 2]
Features are represented in the descriptor space (Ex. What is the dimensionality of the descriptor space for SIFT features?)
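A minimal sketch of the nearest-neighbor rule, assuming Euclidean distance in descriptor space and 2D toy descriptors. The function name and the data are illustrative.

```python
import math


def nearest_neighbor(x, train_x, train_y):
    """Return the label of the training example closest to x.
    Every query computes a distance to all training examples."""
    distances = [math.dist(x, t) for t in train_x]
    return train_y[distances.index(min(distances))]


# Toy training set: two classes in a 2D descriptor space.
train_x = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["class 1", "class 1", "class 2", "class 2"]

label_a = nearest_neighbor((0.5, 1.0), train_x, train_y)
label_b = nearest_neighbor((5.5, 4.0), train_x, train_y)
```

There is no training step, but the loop over `train_x` is exactly the cost flagged in the "Problem" bullet: it grows linearly with the number of training examples and with the descriptor dimensionality.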
![Page 34: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/34.jpg)
Classifiers: Linear
• Find a linear function to separate the classes:
$f(\mathbf{x}) = \operatorname{sgn}(\mathbf{w} \cdot \mathbf{x} + b)$
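The decision rule can be sketched directly. The weights `w` and offset `b` below are picked by hand for illustration (in practice they are estimated from training data), and mapping a score of exactly zero to +1 is an assumption of this sketch.

```python
def linear_classifier(w, b):
    """Return f(x) = sgn(w . x + b) as a function with outputs +1 / -1."""
    def f(x):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if score >= 0 else -1  # assumption: sgn(0) treated as +1
    return f


# Hand-picked separating line x1 + x2 = 5 (illustrative values).
f = linear_classifier(w=(1.0, 1.0), b=-5.0)
side_a = f((0.0, 0.0))
side_b = f((5.0, 5.0))
```

Classifying a point costs a single dot product, independent of how many training examples were used.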
![Page 35: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/35.jpg)
Classifiers: non-linear
Bad classifier (overfitting) vs. good classifier
![Page 36: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/36.jpg)
Outline
• Recognition challenges
• Recognition approaches
• Classifiers
• K-means clustering
• Bag of words
• Review of the course
• Evaluation of the course
![Page 37: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/37.jpg)
How do we define a classifier?
• We first need to cluster the training data
• Then, we need a distance function to determine which cluster the query image belongs to
![Page 38: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/38.jpg)
K-means clustering
• k-means clustering is an algorithm to partition 𝑛 observations into 𝑘 clusters in which each observation 𝒙𝑗 belongs to the cluster with center 𝒎𝑖
• It minimizes the sum of squared Euclidean distances between points 𝒙𝑗 and their nearest cluster centers 𝒎𝑖
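$$D(X, M) = \sum_{i=1}^{k} \sum_{j=1}^{n} (\mathbf{x}_j - \mathbf{m}_i)^2$$
(for each cluster $i$, the inner sum runs only over the points $\mathbf{x}_j$ assigned to center $\mathbf{m}_i$)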
Algorithm:
• Randomly initialize 𝑘 cluster centers
• Iterate until convergence:
– Assign each data point 𝒙𝑗 to the nearest center 𝒎𝑖
– Recompute each cluster center as the mean of all points assigned to it
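The two alternating steps can be sketched in plain Python. This is an illustrative version under assumptions not stated on the slide: centers are initialized by sampling `k` data points with a fixed seed, convergence means the assignments stop changing, and an empty cluster keeps its previous center.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        # Step 2: recompute each center as the mean of its points.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # assumption: an empty cluster keeps its old center
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


def objective(points, centers, assignment):
    """Sum of squared distances between points and their assigned centers."""
    return sum(squared_distance(p, centers[a]) for p, a in zip(points, assignment))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = kmeans(points, k=2)
cost = objective(points, centers, assignment)
```

On this toy data the two groups are well separated, so the first three points end up in one cluster and the last three in the other. In general the result depends on the initialization, and the algorithm only reaches a local minimum of the objective.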
![Page 39: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/39.jpg)
K-means demo
Source: http://shabal.in/visuals/kmeans/1.html
![Page 40: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/40.jpg)
Outline
• Recognition challenges
• Recognition approaches
• Classifiers
• K-means clustering
• Bag of words
• Review of the course
• Evaluation of the course
![Page 41: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/41.jpg)
Review: Feature-based object recognition Q: Is this Book present in the Scene?
Look for corresponding matches
Most of the Book’s keypoints are present in the Scene
A: The Book is present in the Scene
![Page 42: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/42.jpg)
Taking this a step further…
• Find an object in an image
• Find an object in multiple images
• Find multiple objects in multiple images
As the number of images increases, feature-based object recognition becomes computationally more and more expensive
![Page 43: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/43.jpg)
Application: large-scale image retrieval
Query image Results on a database of 100 million images
![Page 44: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/44.jpg)
![Page 45: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/45.jpg)
Slide credit: Nister
![Page 46: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/46.jpg)
![Page 47: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/47.jpg)
Fast visual search
“Scalable Recognition with a Vocabulary Tree”, Nister and Stewenius, CVPR 2006.
“Video Google”, Sivic and Zisserman, ICCV 2003
• Query in a database of 100 million images in 6 seconds
![Page 48: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/48.jpg)
Bag of Words
• Extension to scene/place recognition:
– Is this image in my database?
– Robot: Have I been to this place before?
![Page 49: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/49.jpg)
University of Zurich – Robotics and Perception Group - rpg.ifi.uzh.ch
Visual Place Recognition
Goal: find the most similar images of a query image in a database of 𝑵 images
Complexity: $\frac{N^2 \cdot M^2}{2}$ feature comparisons (worst-case scenario)
Each image must be compared with all other images!
𝑁 is the number of all images collected by a robot; 𝑀 is the number of features per image
- Example: 1 image per meter of travelled distance over a 100 m² house with one robot and 100 features per image → $M = 100$, $N = 100$ → $N^2 M^2 / 2$ = ~50 million feature comparisons!
Solution: Use an inverted file index! Complexity reduces to $N \cdot M$
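The arithmetic of the example can be checked in a few lines. The variable names are illustrative, and the numbers are the ones from the slide.

```python
N = 100  # images in the database
M = 100  # features per image

brute_force = N**2 * M**2 // 2   # all-pairs feature comparisons
inverted_index = N * M           # cost with an inverted file index
speedup = brute_force // inverted_index
```

With these values the brute-force count is 50 million, the inverted-file count is 10 thousand, and the ratio between them is 5000.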
[“Video Google”, Sivic & Zisserman, ICCV’03]
[“Scalable Recognition with a Vocabulary Tree”, Nister & Stewenius, CVPR’06]
[See also FABMAP and DBoW2 (Galvez-Lopez’12)]
![Page 50: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/50.jpg)
Indexing local features: inverted file text
• For text documents, an efficient way to find all pages on which a word occurs is to use an index
• We want to find all images in which a feature occurs
• How many distinct SIFT or BRISK features exist?
– SIFT → Infinite
– BRISK-128 → $2^{128} \approx 3.4 \cdot 10^{38}$
• Since the number of image features may be infinite, before we build our visual vocabulary we need to map our features to “visual words”
• Using analogies from text retrieval, we should:
– Define a “Visual Word”
– Define a “vocabulary” of Visual Words
– This approach is known as “Bag of Words” (BOW)
![Page 51: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/51.jpg)
Building the Visual Vocabulary
Image Collection → Extract Features → Cluster Descriptors (in descriptor space)
[Figure: examples of Visual Words]
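Once the descriptors have been clustered, each cluster center acts as one visual word. The sketch below assumes a tiny hand-written vocabulary of 2D centers (real descriptors have many more dimensions) and shows how an image's descriptors are mapped to words and counted. The names `quantize` and `bow_histogram` are illustrative.

```python
def quantize(descriptor, vocabulary):
    """Index of the visual word (cluster center) nearest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[i])),
    )


def bow_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs in one image."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[quantize(d, vocabulary)] += 1
    return hist


# Hypothetical vocabulary of three visual words (cluster centers).
vocabulary = [(0, 0), (10, 10), (0, 10)]
# Descriptors extracted from one image (toy values).
descriptors = [(1, 0), (9, 9), (0, 1), (1, 9)]

words = [quantize(d, vocabulary) for d in descriptors]
hist = bow_histogram(descriptors, vocabulary)
```

The histogram is the "bag": it records which words occur and how often, but not where they were found in the image.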
![Page 52: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/52.jpg)
Inverted File index
• Inverted File Data Base (DB) lists all possible visual words
• Each word points to a list of images where this word occurs
• Voting array: has as many cells as images in the DB – each word in query image votes for an image
[Figure: the visual words found in query image Q (e.g. 101, 103, 105, 180) are looked up in the inverted file DB, which maps each visual word to the list of images that this word appears in; every listed image receives +1 in the voting array for Q]
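A minimal sketch of the inverted file and the voting array, assuming each database image has already been reduced to a list of visual word IDs. The image names, the word IDs, and the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(database):
    """Map each visual word to the set of images in which it occurs."""
    index = defaultdict(set)
    for image_id, words in database.items():
        for word in words:
            index[word].add(image_id)
    return index


def vote(query_words, index, image_ids):
    """Voting array: one cell per database image, +1 per shared word."""
    votes = {image_id: 0 for image_id in image_ids}
    for word in query_words:
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes


# Hypothetical database: image name -> visual words it contains.
database = {
    "img_a": [101, 103, 105],
    "img_b": [102, 104],
    "img_c": [105, 180],
}
index = build_inverted_index(database)

query_words = [101, 103, 105, 180]
votes = vote(query_words, index, database.keys())
best_match = max(votes, key=votes.get)
```

The query only touches the images listed under its own words, so images that share no word with the query (here `img_b`) are never compared against it.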
![Page 53: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/53.jpg)
Populating the vocabulary
![Page 54: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/54.jpg)
Populating the vocabulary
![Page 55: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/55.jpg)
Populating the vocabulary
![Page 56: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/56.jpg)
Populating the vocabulary
![Page 57: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/57.jpg)
Populating the vocabulary
![Page 58: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/58.jpg)
Populating the vocabulary
![Page 59: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/59.jpg)
![Page 60: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/60.jpg)
![Page 61: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/61.jpg)
![Page 62: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/62.jpg)
![Page 63: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/63.jpg)
![Page 64: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/64.jpg)
![Page 65: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/65.jpg)
![Page 66: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/66.jpg)
![Page 67: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/67.jpg)
![Page 68: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/68.jpg)
![Page 69: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/69.jpg)
![Page 70: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/70.jpg)
![Page 71: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/71.jpg)
Building the inverted file index
![Page 72: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/72.jpg)
Building the inverted file index
![Page 73: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/73.jpg)
Building the inverted file index
![Page 74: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/74.jpg)
Building the inverted file index
![Page 75: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/75.jpg)
Recognition
![Page 76: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/76.jpg)
Robust object/scene recognition
• Visual Vocabulary discards the spatial relationships between features
– Two images with the same features shuffled around will return a 100% match when using only appearance information.
• This can be overcome using geometric verification
– Test the ℎ most similar images to the query image for geometric consistency (e.g. using 5- or 8-point RANSAC) and retain the image with the smallest reprojection error and largest number of inliers
– Further reading (out of scope of this course):
• [Cummins and Newman, IJRR 2011]
• [Stewénius et al, ECCV 2012]
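The shuffling problem described above is easy to reproduce. In this toy sketch each feature is a `(visual word, (x, y))` pair; the word IDs and positions are made up for illustration. The bag-of-words comparison ignores the positions entirely, which is why a geometric check is needed afterwards.

```python
from collections import Counter

# Two hypothetical images with the same visual words at shuffled positions.
image_a = [(101, (10, 20)), (103, (50, 60)), (105, (80, 15))]
image_b = [(101, (80, 15)), (103, (10, 20)), (105, (50, 60))]


def bow(features):
    """Bag of words: keep the word counts, drop the positions."""
    return Counter(word for word, _ in features)


same_appearance = bow(image_a) == bow(image_b)
same_layout = sorted(image_a) == sorted(image_b)
```

Here `same_appearance` is true while `same_layout` is false: the appearance-only comparison reports a perfect match even though no word sits at the same place in both images.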
![Page 77: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/77.jpg)
Video Google System
1. Collect all words within query region
2. Inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification
Sivic & Zisserman, ICCV 2003
• Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/
Query region
Retrieved frames
![Page 78: Lecture 12 Recognition - Robotics and Perception Grouprpg.ifi.uzh.ch/docs/teaching/2016/12_recognition.pdfCourse Evaluation •Please fill the evaluation form you received by email!](https://reader034.fdocuments.us/reader034/viewer/2022042912/5f4547b625535f04fe694152/html5/thumbnails/78.jpg)
FABMAP [Cummins and Newman IJRR 2011]
• Place recognition for robot localization
• Use training images to build the BoW database
• Captures the dependencies of visual words to distinguish the most characteristic structure of each scene
• Probabilistic model of the world. At a new frame, compute:
– P(being at a known place)
– P(being at a new place)
• Very high performance
• Binaries available online
• Open FABMAP