Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 1
Object Recognition
Outline:
• Introduction
• Representation: Concept
• Representation: Features
• Learning & Recognition
• Segmentation & Recognition
Credits: major sources of material, including figures and slides were:
• Riesenhuber & Poggio, Hierarchical models of object recognition in cortex. Nature Neuroscience, 1999.
• B. Mel. SeeMore. Neural Computation, 1997.
• Ullman, Vidal-Naquet, Sali. Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 2002.
• David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. of Computer Vision, 2004.
• and various resources on the WWW
Why is it difficult?
Because appearance varies drastically with:
• position/pose/scale
• lighting/shadows
• articulation/expression
• partial occlusion
→ we need invariant recognition!
The “Classical View”
Historically: Image → Segmentation → Feature Extraction → Recognition
Problem: bottom-up segmentation only works in a very limited range of situations! This architecture is fundamentally flawed!
Two ways out: 1) “direct” recognition, 2) integration of segmentation & recognition
Ventral Stream
V1 → V2 → V4 → IT: larger receptive fields, higher “complexity”, higher invariance
edges, bars (V1) ... objects, faces (IT)
(figures: D. van Essen (V2), K. Tanaka (IT))
Basic Models
seminal work by Fukushima, newer version by Riesenhuber and Poggio
Questions
• what are the intermediate features?
• how/why are they learned?
• how is the invariance computation implemented? what nonlinearities, and at what level (dendrites?)
• how is invariance learned? temporal continuity; role of eye movements
• the basic model is feedforward; what do feedback connections do? attention/segmentation/Bayesian inference?
Representation: Concept
• 3-D models: not covered here
• view-based:
  • holistic descriptions of a view
  • invariant features / histogram techniques
  • spatial constellations of localized features
Holistic Descriptions I: Templates
Idea:
• compare image (regions) directly to a template
• image patches and the object template are represented as high-dimensional vectors
• simple comparison metrics (Euclidean distance, normalized correlation, ...)
Problem:
• such metrics are not robust w.r.t. even small changes in position/aspect/scale or deformations → difficult to achieve invariance
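The template-comparison idea above can be sketched in a few lines of NumPy; image sizes and names are illustrative, not from any particular system:

```python
import numpy as np

def normalized_correlation(patch, template):
    """Compare an image patch and a template as high-dimensional vectors."""
    p = patch.ravel().astype(float)
    t = template.ravel().astype(float)
    p = p - p.mean()
    t = t - t.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    return float(p @ t / denom) if denom > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image; return the best score and location."""
    H, W = image.shape
    h, w = template.shape
    best = (-2.0, (0, 0))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            s = normalized_correlation(image[y:y + h, x:x + w], template)
            if s > best[0]:
                best = (s, (y, x))
    return best
```

Note how brittle this is: shift the window by a pixel or deform the object slightly and the score drops sharply, which is exactly the invariance problem named above.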
Holistic Descriptions II: Eigenspace Approach
Somewhat better: “Eigenspace” approaches
• perform Principal Component Analysis (PCA) on the training images (e.g., “Eigenfaces”)
• compare images by projecting onto a subset of the PCs
Turk & Pentland (1992), Murase & Nayar (1995)
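A toy NumPy sketch of the eigenspace idea (illustrative only, not the Turk & Pentland pipeline; image sizes and data are made up):

```python
import numpy as np

def fit_eigenspace(images, k):
    """PCA on flattened training images; keep the top-k principal components."""
    X = np.asarray(images, float).reshape(len(images), -1)
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal components
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]  # mean image and k "eigenfaces"

def project(image, mean, components):
    """Coordinates of an image in the k-dimensional eigenspace."""
    return components @ (np.asarray(image, float).ravel() - mean)

def nearest(coeffs, gallery_coeffs):
    """Compare images by Euclidean distance between their projections."""
    return int(np.argmin(np.linalg.norm(gallery_coeffs - coeffs, axis=1)))
```

Recognition then reduces to nearest-neighbor search in the low-dimensional projection space instead of pixel space.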
Assessment
• quite successful for segmented and carefully aligned images (e.g., eyes and nose at the same pixel coordinates in all images)
• but similar problems as above:
  • not well suited for clutter
  • problems with occlusions
• some notable extensions try to deal with this (e.g., Leonardis, 1996, 1997)
Feature Histograms
Idea: achieve invariance by computing invariant features
Examples: Mel (1997), Schiele & Crowley (1997, 2000)
Histogram pooling: occurrences of a simple feature from all image regions are thrown together into one “bin”
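Histogram pooling can be sketched in a few lines of plain Python (the feature labels are hypothetical). Because all positions are discarded, the histogram is invariant to where in the image each feature occurs:

```python
from collections import Counter

def feature_histogram(features):
    """Pool occurrences of each feature type from all image regions into one
    bin per type, discarding all location information."""
    hist = Counter(features)
    total = sum(hist.values())
    return {f: n / total for f, n in hist.items()}

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 if disjoint."""
    return sum(min(h1.get(f, 0.0), h2.get(f, 0.0)) for f in set(h1) | set(h2))
```

The same pooling is also the source of the “superposition catastrophe” discussed below: features from two different objects end up in one indistinguishable histogram.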
Assessment:
• works very well for segmented images with only one object, but...
Problem:
• histograms of simple features over the whole image lead to a “superposition catastrophe”: the representation lacks a “binding” mechanism
• consider several objects in a scene: the histogram contains all their features, with no representation of which features came from the same object
• the system breaks down for clutter or complex backgrounds
B. Mel (1997)
Training and test images, performance (figure panels A–E)
Feature Constellations
Observation: holistic templates and histogram techniques can’t handle cluttered scenes well.
Idea: how about constellations of features? E.g., a face is a constellation of eyes, nose, mouth, etc.
Elastic Matching Techniques: Fischler & Elschlager (1973), Lades et al. (1993): “Elastic Graph Matching” (EGM)
Tremendously successful for:
• face finding/recognition
• object recognition
• gesture recognition
• cluttered scene analysis
Representation: Features
Only local features are discussed:
• image patches
• wavelet bases, e.g., Haar, Gabor
• complex features, e.g., SIFT (= Scale-Invariant Feature Transform)
Image Patches
Ullman, Vidal-Naquet, Sali (2002) select image-patch features F for a class C based on:
• likelihood ratio: R(F) = p(F|C) / p(F|¬C)
• “merit”: the mutual information I(F; C) between the presence of the feature and the class
• weight: w(F) = log R(F)
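These quantities can be sketched for a binary (present/absent) feature, taking the conditional detection probabilities as given inputs; this is a simplification of the paper’s estimation procedure:

```python
import math

def likelihood_ratio(p_f_given_c, p_f_given_not_c, eps=1e-9):
    """R(F) = p(F|C) / p(F|not C), with a small eps to avoid division by zero."""
    return (p_f_given_c + eps) / (p_f_given_not_c + eps)

def weight(p_f_given_c, p_f_given_not_c):
    """Classification weight of the feature: w(F) = log R(F)."""
    return math.log(likelihood_ratio(p_f_given_c, p_f_given_not_c))

def merit(p_f_given_c, p_f_given_not_c, p_c):
    """'Merit' as mutual information I(F;C) for a binary feature, in bits."""
    p_f = p_c * p_f_given_c + (1 - p_c) * p_f_given_not_c
    mi = 0.0
    for p_class, p_f_cond in ((p_c, p_f_given_c), (1 - p_c, p_f_given_not_c)):
        for f_present, p_cond in ((True, p_f_cond), (False, 1 - p_f_cond)):
            p_joint = p_class * p_cond
            p_marg = p_f if f_present else 1 - p_f
            if p_joint > 0:
                mi += p_joint * math.log2(p_joint / (p_marg * p_class))
    return mi
```

A feature that fires often on the class and rarely elsewhere gets a large positive weight and a large merit; an uninformative feature gets zero for both.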
Intermediate complexity is best: (trivial result, really)
Recognition examples:
Gabor Wavelets
image space ↔ frequency space
• in frequency space, a Gabor wavelet is a Gaussian
• “wavelet”: the different wavelets are scaled/rotated versions of a mother wavelet
Gabor Wavelets as Filters
Gabor filters have a sin() and a cos() part.
Compute the correlation of the image with the filter at every location x0.
Tiling of Frequency Space: Jets
(figure: measured frequency tuning of biological neurons (left), and the dense coverage of frequency space by a Gabor filter family (right))
Applying different Gabor filters (with different k) to the same image location gives a vector of filter responses: a “jet”.
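A NumPy sketch of a jet: a small family of complex Gabor wavelets (scaled and rotated copies of a mother wavelet, cos part = real, sin part = imaginary) applied at one image location. The kernel size, σ, and the scale/orientation counts are illustrative choices, not prescribed values:

```python
import numpy as np

def gabor_kernel(k, theta, sigma=np.pi, size=21):
    """Complex Gabor wavelet: plane wave with wave number k and orientation
    theta, windowed by a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    kx, ky = k * np.cos(theta), k * np.sin(theta)
    envelope = np.exp(-(k ** 2) * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(1j * (kx * x + ky * y))

def jet(image, y0, x0, n_scales=3, n_orient=4):
    """Vector of Gabor filter response magnitudes at one location: a 'jet'."""
    half = 10  # half-size matching the default kernel size of 21
    patch = image[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1]
    responses = []
    for s in range(n_scales):
        k = np.pi / 2 * 2.0 ** -s          # scaled copies of the mother wavelet
        for o in range(n_orient):
            theta = o * np.pi / n_orient   # rotated copies
            g = gabor_kernel(k, theta)
            responses.append(abs(np.sum(patch * np.conj(g))))
    return np.array(responses)
```

On a vertical stripe pattern the filter whose orientation and wave number match the stripes dominates the jet, which is what makes jets useful as local texture descriptors.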
SIFT Features
• step 1: find scale-space extrema
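Step 1 can be sketched as a difference-of-Gaussians (DoG) pyramid followed by a 3×3×3 extremum test. The σ values and threshold below are illustrative; Lowe’s full method also uses octaves, subpixel interpolation, and more:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur (minimal stand-in for a library call)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    tmp = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="same"), 0, tmp)

def dog_extrema(img, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.01):
    """Build a DoG stack, then keep points that are extrema over their full
    3x3x3 scale-space neighborhood."""
    blurred = [gaussian_blur(img.astype(float), s) for s in sigmas]
    D = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    keypoints = []
    for s in range(1, D.shape[0] - 1):
        for y in range(1, D.shape[1] - 1):
            for x in range(1, D.shape[2] - 1):
                v = D[s, y, x]
                cube = D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if abs(v) > thresh and (v == cube.max() or v == cube.min()):
                    keypoints.append((s, y, x))
    return keypoints
```

A single bright blob on a flat background produces a scale-space extremum near its center, at the DoG level whose scale best matches the blob size.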
• step 2: apply contrast and curvature requirements
• step 3: the local image descriptor extracted at each keypoint is a 128-dimensional vector
Learning and Recognition
• top-down model matching: Elastic Graph Matching
• bottom-up indexing: with or without shared features
Elastic Graph Matching (EGM)
“View-based”: different graphs are needed for different views.
Representation: graph nodes are labelled with jets (Gabor filter responses at different scales/orientations).
Matching: minimize a cost function that penalizes dissimilarities of the Gabor responses and distortions of the graph, using stochastic optimization techniques.
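Such a cost function can be sketched as a jet-similarity term plus an edge-distortion penalty; the sign convention and the weighting λ below are illustrative, and published EGM formulations differ in detail:

```python
import numpy as np

def jet_similarity(j1, j2):
    """Normalized dot product between two jets (vectors of Gabor magnitudes)."""
    return float(j1 @ j2 / (np.linalg.norm(j1) * np.linalg.norm(j2)))

def graph_cost(image_jets, model_jets, image_pos, model_pos, edges, lam=0.1):
    """EGM cost: reward jet similarity at each node, penalize distortion of
    each graph edge relative to the model geometry."""
    sim = sum(jet_similarity(image_jets[n], model_jets[n])
              for n in range(len(model_jets)))
    distortion = sum(
        np.sum((np.asarray(image_pos[a]) - np.asarray(image_pos[b])
                - (np.asarray(model_pos[a]) - np.asarray(model_pos[b]))) ** 2)
        for a, b in edges)
    return -sim + lam * distortion
```

A stochastic optimizer would move the node positions (and thus the extracted image jets) to minimize this cost; an undistorted perfect match attains the minimum of -1 per node.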
Bunch Graphs
Idea: add invariance by labelling graph nodes with a collection or “bunch” of different feature exemplars (Wiskott et al., 1995, 1997).
Advantage: finding the facial features can be decoupled from identification.
Matching uses a MAX rule.
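The MAX rule takes only a couple of lines; the jets and the similarity function here are stand-ins for the Gabor jets above:

```python
def dot(a, b):
    """Toy jet similarity: plain dot product."""
    return sum(x * y for x, y in zip(a, b))

def bunch_similarity(image_jet, bunch, sim=dot):
    """At each node, compare the image jet against every stored exemplar in
    the bunch and keep only the best match (the MAX rule)."""
    return max(sim(image_jet, ex) for ex in bunch)

# a node's "bunch": e.g., jets of the same facial feature from different people
bunch = [(1.0, 0.0), (0.7, 0.7), (0.0, 1.0)]
```

Because the node only needs *some* exemplar to match well, the bunch covers variation across individuals while still localizing the feature.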
Indexing Methods
• when you want to recognize very many objects, it is inefficient to check each model individually by searching for all of its features in a top-down fashion
• better: indexing methods
• also: share features among object models
Recognition with SIFT Features
• recognition: extract SIFT features; match each to its nearest neighbor in a database of stored features; use a Hough transform to pool the votes
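The match-then-vote step can be sketched as follows. For brevity the Hough space here covers only translation, not the full (position, scale, orientation) pose Lowe uses, and the descriptors, positions, and object names are toy values:

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_and_vote(scene_features, database, bin_size=20):
    """Match each scene feature to its nearest stored feature, then pool the
    matches as votes in a coarse Hough space over (object, translation)."""
    votes = Counter()
    for desc, (sx, sy) in scene_features:
        # nearest neighbor among stored (descriptor, object, position) entries
        _, obj, (mx, my) = min(database, key=lambda e: dist2(desc, e[0]))
        # each match votes for the object at a quantized translation
        votes[(obj, round((sx - mx) / bin_size), round((sy - my) / bin_size))] += 1
    return votes.most_common(1)[0] if votes else None
```

Consistent matches from one object land in the same Hough bin and reinforce each other, while clutter matches scatter their votes.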
Recognition with Gabor Jets and Color Features
Scaling Behavior when Sharing Features between Models
• recognition speed is limited more by the number of features than by the number of object models; a modest number of features is o.k.
• can incorporate many feature types
• can incorporate stereo (reasoning about occlusions)
Hierarchies of Features
Long history of using hierarchies: Fukushima’s Neocognitron (1983), Nelson & Selinger (1998, 1999).
Advantages of using a hierarchy:
• faster learning and processing
• better grip on correlated deformations
• easier to find the proper specificity vs. invariance tradeoff?
Feature Learning
• unsupervised clustering: not necessarily optimal for discrimination
• use a big bag of features and fish out the useful ones (e.g., via boosting: Viola, 1997): very slow to train, since every feature from the big bag has to be considered
• note: the usefulness of one feature depends on which other features are already in use
• learn higher-level features as (nonlinear) combinations of lower-level features (Perona et al., 2000): also very slow to train, and only up to ~5 features; but a locality constraint could be used
Feedback
Question: why all the feedback connections in the brain? Are they important for on-line processing?
Neuroscience: object recognition in 150 ms (Thorpe et al., 1996), but interesting temporal response properties of IT neurons (Oram & Richmond, 1999); some V1 neurons “restore” a line behind an occluder.
Idea: a feed-forward architecture cannot correct errors made at early stages later on; a feedback architecture can!
“High-level hypotheses try to reinforce their lower-level evidence while hypotheses compete at all levels.”
Recognition & Segmentation
• basic idea: integrate recognition with segmentation in a feedback architecture:
• object hypotheses reinforce their supporting evidence and inhibit competing evidence, suppressing features that do not belong to them (an idea going back at least to the PDP books)
• at the same time: restore features missing due to partial occlusion (an associative memory property)
Current Work in this Area
• mostly demonstrations of how recognition can aid segmentation
• what is missing is a clear and elegant demonstration of a truly integrated system in which the two kinds of processing help each other
• maybe the two should not be treated as separate kinds of processing but as one inference problem
• how best to do this? the “million-dollar question”