Categgyory-level localization -...
Transcript of Categgyory-level localization -...
![Page 1: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/1.jpg)
Category-level localizationg y
Cordelia SchmidCordelia Schmid
![Page 2: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/2.jpg)
RecognitionRecognition
• Classification• Classification– Object present/absent in an image– Often presence of a significant amount of background clutter
• Localization / Detection– Localize object within the
frame– Bounding box or pixel-
level segmentation
![Page 3: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/3.jpg)
Pixel-level object classificationPixel level object classification
![Page 4: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/4.jpg)
DifficultiesDifficulties
Intra class variations• Intra-class variations
• Scale and viewpoint change
• Multiple aspects of categories
![Page 5: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/5.jpg)
ApproachesApproaches
Intra class variation• Intra-class variation => Modeling of the variations, mainly by learning from a large dataset for example by SVMslarge dataset, for example by SVMs
• Scale + limited viewpoints changes• Scale + limited viewpoints changes => multi-scale approach
• Multiple aspects of categories> separate detectors for each aspect front/profile face=> separate detectors for each aspect, front/profile face,
build an approximate 3D “category” model
![Page 6: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/6.jpg)
Outline
S1. Sliding window detectors
2. Features and adding spatial informationg p
3. Histogram of Oriented Gradients (HOG)
4. State of the art algorithms and PASCAL VOC
![Page 7: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/7.jpg)
Sliding window detector• Basic component: binary classifier
Car/non-carClassifier
Yes,No,ta carnot a car
![Page 8: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/8.jpg)
Sliding window detector• Detect objects in clutter by search
Car/non-carClassifier
• Sliding window: exhaustive search over position and scale
![Page 9: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/9.jpg)
Sliding window detector• Detect objects in clutter by search
Car/non-carClassifier
• Sliding window: exhaustive search over position and scale
![Page 10: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/10.jpg)
Detection by Classification• Detect objects in clutter by search
Car/non-carClassifier
• Sliding window: exhaustive search over position and scaleSliding window: exhaustive search over position and scale(can use same size window over a spatial pyramid of images)
![Page 11: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/11.jpg)
Window (Image) Classification
Training Data
Feature
ClassifierExtraction
• Features usually engineered Car/Non-car
• Classifier learnt from data
![Page 12: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/12.jpg)
Problems with sliding windows …
• aspect ratio
• granularity (finite grid)• granularity (finite grid)
• partial occlusion
• multiple responses
![Page 13: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/13.jpg)
Outline
S1. Sliding window detectors
2. Features and adding spatial informationg p
3. Histogram of Oriented Gradients (HOG)
4. State of the art algorithms and PASCAL VOC
![Page 14: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/14.jpg)
BOW + Spatial pyramidsStart from BoW for region of interest (ROI)
• no spatial information recordedno spatial information recorded
• sliding window detector
B f W dBag of Words
Feature Vector
![Page 15: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/15.jpg)
Adding Spatial Information to Bag of Words
Bag of Words
C t t
Concatenate
Feature Vector
Keeps fixed length feature vector for a window
![Page 16: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/16.jpg)
Spatial Pyramid – represent correspondence
1 BoW 4 BoW 16 BoW
16 BoW
![Page 17: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/17.jpg)
Dense Visual Words• Why extract only sparse image
fragments?fragments?
• Good where lots of invariance is needed, but not relevant to sliding window detection?
• Extract dense visual words on an overlapping grid
QuantizeWord
Patch / SIFT
![Page 18: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/18.jpg)
Outline
S1. Sliding window detectors
2. Features and adding spatial informationg p
3. Histogram of Oriented Gradients + linear SVM classifier
4. State of the art algorithms and PASCAL VOC
![Page 19: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/19.jpg)
Feature: Histogram of Oriented Gradients (HOG)Gradients (HOG)
imagedominant direction HOG
ency
• tile 64 x 128 pixel window into 8 x 8 pixel cells
frequ
e
orientation
tile 64 x 128 pixel window into 8 x 8 pixel cells
• each cell represented by histogram over 8 orientation bins (i.e. angles in range 0-180 degrees) orientation
![Page 20: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/20.jpg)
Histogram of Oriented Gradients (HOG) continued
• Adds a second level of overlapping spatial bins re• Adds a second level of overlapping spatial bins re-normalizing orientation histograms over a larger spatial area
• Feature vector dimension (approx) = 16 x 8 (for tiling) x 8 (orientations) x 4 (for blocks) = 4096(orientations) x 4 (for blocks) 4096
![Page 21: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/21.jpg)
Window (Image) Classification
Training Data
Feature
ClassifierExtraction
• HOG Features pedestrian/Non-pedestrian
• Linear SVM classifier
![Page 22: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/22.jpg)
![Page 23: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/23.jpg)
Averaged examples
![Page 24: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/24.jpg)
Dalal and Triggs, CVPR 2005
![Page 25: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/25.jpg)
Learned model
f(x) wTx b
average over positive training datap g
![Page 26: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/26.jpg)
Training a sliding window detector
• Unlike training an image classifier there are a (virtually)
g g
• Unlike training an image classifier, there are a (virtually) infinite number of possible negative windows
Training (learning) generally proceeds in three distinct• Training (learning) generally proceeds in three distinct stages:
1 B i l i i i l i d l ifi f1. Bootstrapping: learn an initial window classifier from positives and random negatives
2. Hard negatives: use the initial window classifier for detection on the training images (inference) and identify false positives with a high scorefalse positives with a high score
3. Retraining: use the hard negatives as additional t i i d ttraining data
![Page 27: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/27.jpg)
Training a sliding window detector• Object detection is inherently asymmetric: much more
“non-object” than “object” datanon object than object data
• Classifier needs to have very low false positive rate• Non-object category is very complex – need lots of data• Non-object category is very complex – need lots of data
![Page 28: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/28.jpg)
Bootstrapping
1. Pick negative training set at randomset at random
2. Train classifier3 Run on training data3. Run on training data4. Add false positives to
training settraining set5. Repeat from 2
• Collect a finite but diverse set of non-object windows• Force classifier to concentrate on hard negative examples
For some classifiers can ensure equivalence to training on• For some classifiers can ensure equivalence to training on entire data set
![Page 29: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/29.jpg)
Example: train an upper body detector– Training data – used for training and validation sets
33 Hollywood2 training movies• 33 Hollywood2 training movies• 1122 frames with upper bodies marked
– First stage training (bootstrapping)• 1607 upper body annotations jittered to 32k positive samples• 55k negatives sampled from the same set of frames• 55k negatives sampled from the same set of frames
– Second stage training (retraining)• 150k hard negatives found in the training data
![Page 30: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/30.jpg)
Training data positive annotationsTraining data – positive annotations
![Page 31: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/31.jpg)
Positive windows
Note: common size and alignment
![Page 32: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/32.jpg)
Jittered positives
![Page 33: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/33.jpg)
Jittered positives
![Page 34: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/34.jpg)
Random negatives
![Page 35: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/35.jpg)
Random negatives
![Page 36: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/36.jpg)
Window (Image) first stage classification
HOG Feature Linear SVMJittered positives HOG FeatureExtraction
ClassifierJittered positives
random negatives f(x) wTx bxx
• find high scoring false positives detectionsfind high scoring false positives detections
• these are the hard negatives for the next round of training• these are the hard negatives for the next round of training
cost = # training images x inference on each image• cost = # training images x inference on each image
![Page 37: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/37.jpg)
Hard negatives
![Page 38: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/38.jpg)
Hard negatives
![Page 39: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/39.jpg)
First stage performance on validation set
![Page 40: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/40.jpg)
Effects of retrainingg
![Page 41: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/41.jpg)
Side by side
before retraining after retraining
![Page 42: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/42.jpg)
Side by sidebefore retraining after retraining
![Page 43: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/43.jpg)
Accelerating Sliding Window Search• Sliding window search is slow because so many windows are
needed e g x × y × scale ≈ 100 000 for a 320×240 imageneeded e.g. x × y × scale 100,000 for a 320×240 image
• Most windows are clearly not the object class of interest
• Can we speed up the search?
![Page 44: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/44.jpg)
Cascaded Classification• Build a sequence of classifiers with increasing complexity
More complex, slower, lower false positive rate
ClassifierN
FaceClassifier2
Classifier1
Possibly a face
Possibly a faceN21
Window
face face
Non-faceNon-faceNon-face
• Reject easy non-objects using simpler and faster classifiers
![Page 45: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/45.jpg)
Cascaded Classification
• Slow expensive classifiers only applied to a few windows significant speed-up
• Controlling classifier complexity/speed:Controlling classifier complexity/speed:– Number of support vectors [Romdhani et al, 2001]– Number of features [Viola & Jones, 2001]– Two-layer approach [Harzallah et al, 2009]
![Page 46: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/46.jpg)
Summary: Sliding Window Detection• Can convert any image classifier into an
object detector by sliding window Efficientobject detector by sliding window. Efficient search methods available.
• Requirements for invariance are reduced by hi t l ti d lsearching over e.g. translation and scale
S ti l d b• Spatial correspondence can be “engineered in” by spatial tiling
![Page 47: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/47.jpg)
Test: Non-maximum suppression (NMS)• Scanning-window detectors typically result in
multiple responses for the same object
Conf=.9
• To remove multiple responses, a simple greedy procedure p p p g y pcalled “Non-maximum suppression” is applied:
1. Sort all detections by detector confidenceNMS: 1. Sort all detections by detector confidence 2. Choose most confident detection di; remove all dj s.t. overlap(di,dj)>T3. Repeat Step 2. until convergence
NMS:
![Page 48: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/48.jpg)
Outline
S1. Sliding window detectors
2. Features and adding spatial informationg p
3. HOG + linear SVM classifier
4. PASCAL VOC and state of the art algorithms
![Page 49: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/49.jpg)
PASCAL VOC dataset - Content• 20 classes: aeroplane, bicycle, boat, bottle, bus, car, cat,
chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, train, TV
• Real images downloaded from flickr, not filtered for “quality”
• Complex scenes, scale, pose, lighting, occlusion, ...
![Page 50: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/50.jpg)
Annotation• Complete annotation of all objects
O l d d Diffi ltOccludedObject is significantly occluded within BB
DifficultNot scored in evaluation
TruncatedObject extends beyond BB
PoseFacing left
![Page 51: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/51.jpg)
Examples
Aeroplane Bicycle Bird Boat Bottle
Bus Car Cat Chair Cow
![Page 52: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/52.jpg)
Examples
Dining Table Dog Horse Motorbike Person
Potted Plant Sheep Sofa Train TV/Monitorp /
![Page 53: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/53.jpg)
Main Challenge Tasks
• ClassificationI th d i thi i ?– Is there a dog in this image?
– Evaluation by precision/recall
• Detection– Localize all the people (if any) in
this image/– Evaluation by precision/recall
based on bounding box overlap
![Page 54: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/54.jpg)
Detection: Evaluation of Bounding Boxes
• Area of Overlap (AO) Measure• Area of Overlap (AO) MeasureGround truth Bgt
Bgt Bp
Predicted Bp
> ThresholdDetection if50%50%
![Page 55: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/55.jpg)
Classification/Detection Evaluation• Average Precision [TREC] averages precision over the entire range of
recall
1
0.8
– A good score requires both high recall and high precision
Interpolated
0.4
0.6
prec
isio
n – Application-independent
– Penalizes methods giving high
0.2
g g gprecision but low recallAP
0 0.2 0.4 0.6 0.8 10recall
![Page 56: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/56.jpg)
Object Detection with Discriminatively Object Detection with Discriminatively Trained Part Based Models
Pedro F. Felzenszwalb, David Mcallester, Deva Ramanan, Ross Girshick
PAMI 2010
Matlab code available online:http://www.cs.brown.edu/~pff/latent/
![Page 57: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/57.jpg)
Approach
• Mixture of deformable part-based modelsMixture of deformable part-based models– One component per “aspect” e.g. front/side view
• Each component has global template + deformable partsEach component has global template deformable parts• Discriminative training from bounding boxes alone
![Page 58: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/58.jpg)
Example Model• One component of person model
x1
x x
x3
x4
x6
x5
x2
root filterscoarse resolution
part filtersfiner resolution
deformationmodelscoarse resolution finer resolution models
![Page 59: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/59.jpg)
Object Hypothesis• Position of root + each part• Each part: HOG filter (at higher resolution)• Each part: HOG filter (at higher resolution)
p0 : location of root
z = (p0,..., pn)
p1,..., pn : location of parts
S i f filtScore is sum of filter scores minus
deformation costs
![Page 60: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/60.jpg)
Score of a HypothesisAppearance term Spatial prior
filters deformation parameters
displacements
concatenation of HOG features and
concatenation of filters and deformation HOG features and
part displacement features
and deformation parameters
• Linear classifier applied to feature subset defined by hypothesis
![Page 61: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/61.jpg)
Training• Training data = images + bounding boxes• Need to learn: model structure filters deformation costs• Need to learn: model structure, filters, deformation costs• Latent SVM:
determine classifier and model parameters (location of the parts)– determine classifier and model parameters (location of the parts)
Training
![Page 62: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/62.jpg)
Person Model
root filtersl ti
part filtersfi l ti
deformationd lcoarse resolution finer resolution models
![Page 63: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/63.jpg)
Car Model
root filters part filters deformationroot filterscoarse resolution
part filtersfiner resolution
deformationmodels
![Page 64: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/64.jpg)
Car Detections
high scoring false positiveshigh scoring true positives
![Page 65: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/65.jpg)
Person Detections
hi h i t itihigh scoring false positives
high scoring true positivesg g p
(not enough overlap)
![Page 66: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/66.jpg)
Selective search for object location [v.d.Sande et al. 11]
• Pre-select class-independent candidate image windows with segmentation
• Local features + bag-of-words • SVM classifier with histogram intersection kernel + hard negative mining
Guarantees ~95% Recall for any object class in Pascal VOC with only 1500 windows per image
g g g
windows per image
Student presentation
![Page 67: Categgyory-level localization - Inrialear.inrialpes.fr/people/verbeek/mlcr.slides.14.15/localization_2014.pdfRecognition • Classification – Object present/absent in an image –](https://reader030.fdocuments.us/reader030/viewer/2022040700/5d54a64888c993a1598b8e24/html5/thumbnails/67.jpg)
Student presentation