Object Recognition. So what does object recognition involve?
![Page 1: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/1.jpg)
Object Recognition
![Page 2: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/2.jpg)
So what does object recognition involve?
![Page 3: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/3.jpg)
Verification: is that a bus?
![Page 4: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/4.jpg)
Detection: are there cars?
![Page 5: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/5.jpg)
Identification: is that a picture of Mao?
![Page 6: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/6.jpg)
Object categorization
sky
building
flag
wallbanner
bus
cars
bus
face
street lamp
![Page 7: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/7.jpg)
Challenges 1: view point variation
Michelangelo 1475-1564
![Page 8: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/8.jpg)
Challenges 2: illumination
slide credit: S. Ullman
![Page 9: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/9.jpg)
Challenges 3: occlusion
Magritte, 1957
![Page 10: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/10.jpg)
Challenges 4: scale
![Page 11: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/11.jpg)
Challenges 5: deformation
Xu, Beihong 1943
![Page 12: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/12.jpg)
Challenges 7: intra-class variation
![Page 13: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/13.jpg)
Two main approaches
Part-based
Global sub-window
![Page 14: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/14.jpg)
Global Approaches
x1 x2 x3
Vectors in high-dimensional space
Aligned images
![Page 15: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/15.jpg)
x1 x2 x3
Vectors in high-dimensional space
Global Approaches
Training
Involves some dimensionality reduction
Detector
![Page 16: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/16.jpg)
Detection
– Scale / position range to search over
![Page 17: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/17.jpg)
Detection
– Scale / position range to search over
![Page 18: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/18.jpg)
Detection
– Scale / position range to search over
![Page 19: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/19.jpg)
Detection
– Combine detection over space and scale.
![Page 20: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/20.jpg)
PROJECT 1
Build a detection system that takes an image as input, runs a detector over (x, y) positions and scales, and removes spurious detections. The system should be able to run different detectors. For initial testing, use a linear SVM (existing package).
Challenge:
• Algorithm for integration of raw detections.
• Speed.
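A minimal sketch of the pipeline this project asks for, under stated assumptions: the detector is hidden behind a hypothetical `score_fn(x, y, size)` callback (a linear SVM applied to the cropped window would go there), and raw detections are integrated with greedy non-maximum suppression, which is only one possible integration algorithm. The names `sliding_window_detect`, `non_max_suppression`, and `toy_score` are illustrative and not part of the project code.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int, float]  # (x, y, width, height, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (the score field is ignored)."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def sliding_window_detect(width, height, score_fn, base=24,
                          scales=(1.0, 1.5, 2.0), stride=4,
                          threshold=0.0) -> List[Box]:
    """Run score_fn over every (x, y) position at every scale."""
    detections = []
    for s in scales:
        win = int(round(base * s))              # window side at this scale
        step = max(1, int(round(stride * s)))   # stride grows with the window
        for y in range(0, height - win + 1, step):
            for x in range(0, width - win + 1, step):
                score = score_fn(x, y, win)
                if score > threshold:
                    detections.append((x, y, win, win, score))
    return detections


def non_max_suppression(detections: List[Box], iou_thresh=0.3) -> List[Box]:
    """Greedy integration: keep the best box, drop boxes overlapping it."""
    kept: List[Box] = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(iou(det, k) <= iou_thresh for k in kept):
            kept.append(det)
    return kept


# Toy stand-in for a real detector (assumption): the score is high when the
# window overlaps a known object box.
TRUE_BOX = (40, 40, 24, 24, 0.0)


def toy_score(x, y, size):
    return iou((x, y, size, size, 0.0), TRUE_BOX) - 0.5


raw = sliding_window_detect(128, 96, toy_score)
final = non_max_suppression(raw)
```

Swapping in a different detector only requires a different `score_fn`. Speed is dominated by the number of windows, so the stride and the scale list are the first parameters to tune.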
![Page 21: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/21.jpg)
• Turk and Pentland, 1991
• Belhumeur et al. 1997
• Schneiderman et al. 2004
• Viola and Jones, 2000
• Keren et al. 2001
• Osadchy et al. 2004
• Amit and Geman, 1999
• LeCun et al. 1998
• Belongie and Malik, 2002
• Schneiderman et al. 2004
• Agarwal and Roth, 2002
• Poggio et al. 1993
![Page 22: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/22.jpg)
Antiface method for detection
• No training on negative examples is required.
• A set of rejectors is applied in cascaded manner.
• Robust to large pose variation.
• Simple and very fast.
![Page 23: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/23.jpg)
Intuition
How are natural images distributed in a high-dimensional space?
Boltzmann distribution with an image smoothness measure in the exponent:
$$P(I) \propto e^{-\iint \left(I_x^2 + I_y^2\right)\,dx\,dy}$$
[Figure: less smooth images have lower probability]
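To make the smoothness prior concrete, here is a small sketch that evaluates a discrete version of the smoothness measure (the sum of squared horizontal and vertical differences) and the resulting unnormalized log-probability. The forward-difference discretization and the function names are assumptions made for illustration, not taken from the slides.

```python
import random


def smoothness_energy(img):
    """Discrete stand-in for the integral of I_x^2 + I_y^2 (forward differences)."""
    h, w = len(img), len(img[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                energy += (img[y][x + 1] - img[y][x]) ** 2
            if y + 1 < h:
                energy += (img[y + 1][x] - img[y][x]) ** 2
    return energy


def log_prob_unnormalized(img):
    """log P(I) up to an additive constant under the Boltzmann prior."""
    return -smoothness_energy(img)


random.seed(0)
n = 16
# A smooth intensity ramp versus uniform random noise, both with values in [0, 1].
ramp = [[(x + y) / (2 * (n - 1)) for x in range(n)] for y in range(n)]
noise = [[random.random() for _ in range(n)] for _ in range(n)]

smooth_is_more_likely = log_prob_unnormalized(ramp) > log_prob_unnormalized(noise)
```

Under this prior the ramp receives a much higher probability than the noise image, which is the sense in which less smooth images have lower probability.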
![Page 24: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/24.jpg)
Intuition
[Figure: less smooth images have lower probability]
Antiface: far fewer false positives
PCA: many false positives
![Page 25: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/25.jpg)
Main Idea
Claim: for random natural images $x$, $y$ viewed as unit vectors, $|\langle x, y \rangle|$ is large on average.
Anti-Face detector is defined as a vector $d$ satisfying:
– $|\langle d, x \rangle|$ is small for all $x$ in the positive class
– $d$ is smooth, so $|\langle d, x \rangle|$ is large on average for a random natural image $x$.
![Page 26: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/26.jpg)
Discrimination
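If $x$ is an image and $d$ is the detector:
– $x$ in the target class: $|\langle d, x \rangle|$ is SMALL
– $x$ not in the target class: $|\langle d, x \rangle|$ is LARGE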
![Page 27: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/27.jpg)
Cascade of Independent Detectors
$d_1$, $d_2$, $d_3$
7 inner products
4 inner products
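The cascade can be sketched as a loop that rejects a candidate as soon as one detector responds strongly, so most candidates cost only a few inner products. The detectors and thresholds below are hand-picked toy vectors (an assumption); real Anti-Face detectors would also be chosen to be smooth, which this toy example does not model.

```python
def inner(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def cascade_accepts(x, detectors, thresholds):
    """Return (accepted, number_of_inner_products_used).

    A candidate is rejected by the first detector d with |<d, x>| above
    its threshold, and accepted only if it passes every detector.
    """
    used = 0
    for d, t in zip(detectors, thresholds):
        used += 1
        if abs(inner(d, x)) > t:
            return False, used
    return True, used


s = 2 ** -0.5
# Toy detectors, all orthogonal to the toy "target class" direction (s, s, 0, 0).
detectors = [(0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0), (s, -s, 0.0, 0.0)]
thresholds = [0.1, 0.1, 0.1]

target = cascade_accepts((s, s, 0.0, 0.0), detectors, thresholds)
early_reject = cascade_accepts((0.0, 0.0, 1.0, 0.0), detectors, thresholds)
late_reject = cascade_accepts((1.0, 0.0, 0.0, 0.0), detectors, thresholds)
```

The target passes all three detectors, the second candidate is rejected after a single inner product, and the third only at the last stage.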
![Page 28: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/28.jpg)
Example
Samples from the training set
4 Anti-Face Detectors
![Page 29: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/29.jpg)
4 Anti-Face Detectors
![Page 30: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/30.jpg)
Eigenface method with the subspace of dimension 100
![Page 31: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/31.jpg)
PROJECT 2
• Implement the Antiface method for detection*.
• Implement several extensions of Antifaces:
– Change the accepting rule so that, instead of passing all the detectors, a candidate must pass at least 80% of the detectors.
– Apply Naïve Bayes in 10D Antiface space.
– Project each image onto 20D Antiface space and train an SVM in this space.
See project page for details.
* D. Keren, M. Osadchy, and C. Gotsman, Anti-Faces: A novel, fast method for image detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, No. 7, July 2001, pp. 747-761.
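One possible reading of the relaxed accepting rule, as a sketch: evaluate every detector and accept when the fraction of detectors passed reaches 80%. The function name `accepts_by_vote`, the toy detectors, and the thresholds are assumptions for illustration. Unlike the cascade, this rule cannot stop at the first failed detector.

```python
def accepts_by_vote(x, detectors, thresholds, min_fraction=0.8):
    """Accept x if at least min_fraction of the detectors give a small response."""
    passed = sum(
        1
        for d, t in zip(detectors, thresholds)
        if abs(sum(p * q for p, q in zip(d, x))) <= t
    )
    return passed / len(detectors) >= min_fraction


# Five toy detectors (the standard basis of a 5D space) with equal thresholds.
detectors = [tuple(1.0 if i == j else 0.0 for j in range(5)) for i in range(5)]
thresholds = [0.1] * 5

one_failure = accepts_by_vote((0.5, 0.0, 0.0, 0.0, 0.0), detectors, thresholds)
two_failures = accepts_by_vote((0.5, 0.5, 0.0, 0.0, 0.0), detectors, thresholds)
```

A candidate that fails one of five detectors is still accepted (4/5 = 80%), while one that fails two is rejected.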
![Page 32: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/32.jpg)
Part-Based Approaches
Object
Bag of ‘words’
Constellation of parts
![Page 33: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/33.jpg)
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception,
retinal, cerebral cortex, eye, cell, optical
nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce,
exports, imports, US, yuan, bank, domestic,
foreign, increase, trade, value
Bag of ‘words’ analogy to documents
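For text, the analogy amounts to counting how often each vocabulary word occurs while ignoring word order. A minimal sketch, with a small hand-picked vocabulary and two short example sentences (both assumptions, not the full passages above):

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count occurrences of each vocabulary word; word order is discarded."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in vocabulary]


vocabulary = ["brain", "visual", "retinal", "china", "trade", "yuan"]

vision_doc = "The visual impulses reach the brain; the brain analyses the retinal image."
trade_doc = "China is forecasting a trade surplus, and the US wants the yuan to trade freely."

vision_vector = bag_of_words(vision_doc, vocabulary)
trade_vector = bag_of_words(trade_doc, vocabulary)
```

The two documents produce count vectors concentrated on disjoint parts of the vocabulary, which is what makes the representation useful for categorization.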
![Page 34: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/34.jpg)
![Page 35: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/35.jpg)
Interest Point Detectors
• Basic requirements:
– Sparse
– Informative
– Repeatable
• Invariance
– Rotation
– Scale (Similarity)
– Affine
![Page 36: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/36.jpg)
Popular Detectors
• Scale Invariant
– Harris-Laplace
– Difference of Gaussians
– Laplace of Gaussians
– Scale Saliency (Kadir-Brady)
• Affine Invariant
– Harris-Laplace Affine
– Difference of Gaussians Affine
– Laplace of Gaussians Affine
– Affine Saliency (Kadir-Brady)
There are many others…
See:
1) “Scale and affine invariant interest point detectors”, K. Mikolajczyk, C. Schmid, IJCV, Volume 60, Number 1, 2004
2) “A comparison of affine region detectors”, K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool, http://www.robots.ox.ac.uk/~vgg/research/affine/det_eval_files/vibes_ijcv2004.pdf
![Page 37: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/37.jpg)
Representation of appearance: Local Descriptors
• Invariance
– Rotation
– Scale
– Affine
• Insensitive to small deformations
• Illumination invariance
– Normalize out
![Page 38: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/38.jpg)
SIFT – Scale Invariant Feature Transform
• Descriptor overview:
– Determine scale (by maximizing DoG in scale and in space) and local orientation as the dominant gradient direction. Use this scale and orientation to make all further computations invariant to scale and rotation.
– Compute gradient orientation histograms of several small windows (128 values for each point).
– Normalize the descriptor to make it invariant to intensity change.
David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
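The histogram and normalization steps can be illustrated with a heavily simplified, SIFT-like descriptor: a 16x16 patch is split into 4x4 cells, each cell accumulates an 8-bin gradient orientation histogram weighted by gradient magnitude (4 x 4 x 8 = 128 values), and the result is L2-normalized. This sketch assumes the patch has already been extracted at the detected scale and rotated to the dominant orientation. It omits Gaussian weighting, interpolation between bins, and the clipping step of the full method, so it is not Lowe's implementation.

```python
import math


def sift_like_descriptor(patch, cells=4, bins=8):
    """patch: square 2D list of intensities whose side is divisible by `cells`."""
    n = len(patch)
    cell = n // cells
    hist = [0.0] * (cells * cells * bins)
    for y in range(n):
        for x in range(n):
            # Central differences, clamped at the patch border.
            gx = patch[y][min(x + 1, n - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, n - 1)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            c = (y // cell) * cells + (x // cell)
            hist[c * bins + b] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# Toy patch (assumption) and a brighter, higher-contrast copy of it.
patch = [[float((3 * x + x * y + 2 * y * y) % 23) for x in range(16)] for y in range(16)]
brighter = [[2.0 * v + 8.0 for v in row] for row in patch]

desc = sift_like_descriptor(patch)
desc_brighter = sift_like_descriptor(brighter)
```

An additive intensity offset cancels in the gradients and a multiplicative change cancels in the normalization, so `desc` and `desc_brighter` agree.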
![Page 39: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/39.jpg)
Feature Detection and Representation
Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]
Normalize patch
Compute SIFT descriptor [Lowe ’99]
Slide credit: Josef Sivic
![Page 40: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/40.jpg)
…
Feature Detection and Representation
![Page 41: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/41.jpg)
Codewords dictionary formation
…
![Page 42: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/42.jpg)
Codewords dictionary formation
Vector quantization
…
Slide credit: Josef Sivic
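A common way to realize this vector quantization step is k-means clustering: cluster centers become the codewords, and each image is then described by a histogram of the codewords nearest to its descriptors. The sketch below uses plain Python and tiny 2D points in place of 128-dimensional descriptors. The function names and the choice of k-means are assumptions, since the slides do not fix a particular algorithm.

```python
import random


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(v, codebook):
    """Index of the codeword closest to descriptor v."""
    return min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))


def kmeans(descriptors, k, iters=20, seed=0):
    """Lloyd's algorithm; returns k cluster centers (the codewords)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in descriptors:
            groups[nearest(v, centers)].append(v)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments for one image."""
    hist = [0] * len(codebook)
    for v in descriptors:
        hist[nearest(v, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy "descriptors": two well-separated groups of 2D points.
training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = sorted(kmeans(training, k=2))
image_hist = bow_histogram([(0.2, 0.1), (9.5, 10.2), (10.1, 9.9)], codebook)
```

The histogram `image_hist` is the fixed-length vector that a classifier can consume, regardless of how many patches were detected in the image.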
![Page 43: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/43.jpg)
Codewords dictionary formation
Fei-Fei et al. 2005
![Page 44: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/44.jpg)
Image patch examples of codewords
Sivic et al. 2005
![Page 45: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/45.jpg)
SVM classification
Learning
Representation: Vector X
Training examples: positive, negative
SVM classifier
![Page 46: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/46.jpg)
SVM classification
Recognition
Representation: Vector X
SVM(X): Contains object / Doesn’t contain object
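The learning and recognition stages can be sketched with a small linear SVM trained by batch subgradient descent on the L2-regularized hinge loss. In practice an existing SVM package would be used; this stdlib-only version, its hyperparameters, and the two-bin toy histograms are assumptions chosen to keep the example self-contained.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def train_linear_svm(X, y, lam=0.01, lr=0.1, iters=500):
    """Minimize lam/2 * |w|^2 + mean hinge loss; labels must be +1 or -1."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(iters):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violated: hinge is active
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """+1: contains object, -1: doesn't contain object."""
    return 1 if dot(w, x) + b >= 0 else -1


# Toy bag-of-words histograms over two codewords (assumption).
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
train_predictions = [predict(w, b, x) for x in X]
```

Recognition then reduces to computing the histogram vector X for a new image and evaluating `predict(w, b, X)`.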
![Page 47: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/47.jpg)
PROJECT 3
• Implement a bag of ‘words’ approach. The method is described in “Visual Categorization with Bags of Keypoints” by G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray.
• Test it on 4 categories (from 101 database): airplanes, faces, cars side, motorbikes, against background.
![Page 48: Object Recognition](https://reader035.fdocuments.us/reader035/viewer/2022062322/56814547550346895db214d8/html5/thumbnails/48.jpg)
PROJECT 4
• Implement the part-based method described in “Class Recognition Using Discriminative Local Features” by G. Dorkó and C. Schmid.
• Test it on the Oxford object data set.
• Compare the performance of the algorithm using different point detectors. The code for point detectors is provided.
• Compare the performance of the algorithm with original SIFT and with SIFT without rotation invariance. The initial code for SIFT is provided, but should be edited to remove rotation invariance.