Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz...
![Page 1: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/1.jpg)
Learning and Inference in Vision: from Features to Scene Understanding
Jonathan Huang, Tomasz Malisiewicz
MLD Student Research Symposium, 2009
![Page 2: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/2.jpg)
Road
Sky
Trees
Bridge
Sign
Car
![Page 3: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/3.jpg)
Huge datasets
PASCAL Visual Object Classes (VOC) Challenge dataset
~15,000 annotated images, ~35,000 annotated object instances, 20 object classes with segmentations and bounding boxes
![Page 4: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/4.jpg)
Huge datasets
LabelMe dataset
~11,845 static images, >100,000 labeled polygons
![Page 5: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/5.jpg)
Outline
I. Recognizing single object classes (Jon)
II. Scene understanding with multiple classes (Tomasz)
![Page 6: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/6.jpg)
Recognition task #1: Find all markers
![Page 7: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/7.jpg)
Recognition task #2: Find all cats
Object recognition is often hard due to:
Geometric Variability
![Page 8: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/8.jpg)
Variation within an object class
![Page 9: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/9.jpg)
Viewpoint/Scale/Illumination Variability
Images from Flickr
![Page 10: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/10.jpg)
From Pixels to Visual Features
[Diagram labels: Scene, Imaging, Pixels, Features, Inference, "car"; annotations: Low level features, Higher level inference]
![Page 11: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/11.jpg)
Local Visual Features
Images are high dimensional!
(640 width) × (480 height) = 307,200 pixels
Compute image statistics in a region (e.g., estimate the distribution of image gradient orientations)
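As a rough illustration of computing image statistics in a region, the sketch below reduces a 640×480 grayscale array to a 9-bin histogram of gradient orientations. The function name, the bin count, and the synthetic ramp image are assumptions made for this example, not details from the talk.

```python
import numpy as np

def orientation_histogram(region, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations in a grayscale region."""
    region = np.asarray(region, dtype=float)
    gy, gx = np.gradient(region)              # finite differences along rows, then columns
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi): opposite gradient directions share a bin.
    orientation = np.arctan2(gy, gx) % np.pi
    hist, _ = np.histogram(orientation, bins=n_bins, range=(0.0, np.pi),
                           weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Hypothetical input: a 480x640 image whose intensity increases left to right.
image = np.tile(np.arange(640, dtype=float), (480, 1))
feature = orientation_histogram(image)
print(image.size, "pixels ->", feature.size, "numbers")
```

All of the gradient energy in this synthetic image points horizontally, so the normalized histogram concentrates in the first bin.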
![Page 12: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/12.jpg)
Key ideas in feature design
Be invariant to stuff you don’t care about…
while not being too invariant
![Page 13: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/13.jpg)
Object classification
Inference: What object class is this?
Learning: What does each object class look like?
Cow or Horse??
Let’s look at a simpler example first…
![Page 14: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/14.jpg)
Document classification analogy
John Terry scored on a header to lift Chelsea to a 1-0 victory over Manchester United and extend the Blues’ Premier League lead to 5 points. Chelsea had been frustrated by Manchester United for 76 minutes, but took advantage of a free kick awarded when Darren Fletcher fouled Ashley Cole. Brian Ching scored six minutes into overtime and the Houston Dynamo advanced to Major League Soccer’s Western ...
In the Senate, where proposals differ substantially from the House-passed measure on issues like a government-run plan and how to pay for coverage, the bill is stalled while budget analysts assess its overall costs. The slim margin in the House — the bill passed with just two votes to spare, and 39 Democrats opposed it — suggests even greater challenges in the Senate, where the majority leader, ...
??? ???
Classify each document as sports or politics
![Page 15: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/15.jpg)
Bag-of-words models for text classification
“Much of the meaning behind written language is preserved even when the ordering of the individual words is lost.” [El-Arini et al.,’09]
[Figure: a bag of words (Sue Ann)]
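A minimal sketch of the bag-of-words idea: a document becomes a table of word counts, so shuffling the words leaves the representation unchanged. The keyword lists and the overlap-counting classifier are hypothetical stand-ins chosen for illustration; a real system would learn its model from labeled documents.

```python
import random
from collections import Counter

def bag_of_words(text):
    """Map a document to word counts, discarding word order."""
    return Counter(text.lower().split())

def classify(text, class_vocab):
    """Pick the class whose (hypothetical) keyword set overlaps the bag the most."""
    bag = bag_of_words(text)
    scores = {label: sum(bag[w] for w in words)
              for label, words in class_vocab.items()}
    return max(scores, key=scores.get)

# Hypothetical keyword lists, chosen by hand for illustration only.
class_vocab = {
    "sports": {"scored", "league", "victory", "kick"},
    "politics": {"senate", "bill", "votes", "democrats"},
}

doc = "John Terry scored on a header to lift Chelsea to a 1-0 victory"
words = doc.split()
random.Random(0).shuffle(words)
shuffled = " ".join(words)

# Word order does not matter to the representation.
assert bag_of_words(doc) == bag_of_words(shuffled)
print(classify(shuffled, class_vocab))
```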
![Page 16: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/16.jpg)
Document classification analogy
but to on Darren awarded Fletcher advanced Ashley lift over to 1-0 scored advantage Major for lead 76 Chelsea Premier to Terry League John Houston the kick Chelsea took United points. free minutes fouled United been frustrated overtime Manchester six a when League a extend victory Ching 5 and to and Western Manchester Brian Cole. Dynamo Soccer’s by a minutes, Blues’ the had header into of scored ...
the margin how In on majority 39 costs. with measure slim overall — to like opposed suggests challenges pay even substantially stalled government run where the issues votes it the where bill for spare, from bill and a Senate, analysts coverage, in — the Democrats greater differ two proposals budget its House assess while Senate, to in just the leader and the plan passed the is House passed The ...
??? ???
![Page 17: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/17.jpg)
Document classification analogy
but to on Darren awarded Fletcher advanced Ashley lift over to 1-0 scored advantage Major for lead 76 Chelsea Premier to Terry League John Houston the kick Chelsea took United points. free minutes fouled United been frustrated overtime Manchester six a when League a extend victory Ching 5 and to and Western Manchester Brian Cole. Dynamo Soccer’s by a minutes, Blues’ the had header into of scored ...
the margin how In on majority 39 costs. with measure slim overall — to like opposed suggests challenges pay even substantially stalled government-run where the issues votes it the where bill for spare, from bill and a Senate, analysts coverage, in — the Democrats greater differ two proposals budget its House assess while Senate, to in just the leader and the plan passed the is House-passed The ...
??? ???
![Page 18: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/18.jpg)
![Page 19: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/19.jpg)
Visual words (discretization)
Words are discrete, visual features are typically continuous…
Discretization via clustering/vector quantization
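One way to sketch discretization by vector quantization: cluster training descriptors with k-means, treat each cluster center as a visual word, and describe an image by a histogram of the words its descriptors map to. The random 8-dimensional vectors below are placeholders for real local descriptors, and the vocabulary size of 10 is an arbitrary assumption.

```python
import numpy as np

def quantize(points, centers):
    """Assign each descriptor to the index of its nearest center (its visual word)."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers (the visual vocabulary)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = quantize(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:              # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_visual_words(descriptors, centers):
    """Histogram of visual-word occurrences for one image."""
    words = quantize(descriptors, centers)
    return np.bincount(words, minlength=len(centers))

# Hypothetical data: random 8-D vectors standing in for local descriptors.
rng = np.random.default_rng(1)
training_descriptors = rng.normal(size=(500, 8))
vocabulary = kmeans(training_descriptors, k=10)
image_descriptors = rng.normal(size=(40, 8))
histogram = bag_of_visual_words(image_descriptors, vocabulary)
```

The resulting histogram plays the same role for an image that word counts play for a text document.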
![Page 20: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/20.jpg)
Visual words
[Sivic et al., ’05]
![Page 21: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/21.jpg)
Object classification with bag of words
[Sivic et al., ’05]
![Page 22: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/22.jpg)
Object classification with bag of words
Performance on Caltech 101 dataset with linear SVM on bag-of-word vectors:
Faces
Airplanes Cars
[Csurka et al., ’04]
![Page 23: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/23.jpg)
Object Detection problem
Detection: Locate all the faces in this image.
Classification: Is this a face, or not a face?
![Page 24: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/24.jpg)
Face detection via a series of classifications
(a.k.a. sliding window brain damage)
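Detection as a series of classifications can be sketched as a double loop over window positions, with a binary classifier scoring each crop. The brightness-based `toy_classifier`, the window size, the stride, and the threshold are all hypothetical; any face/non-face scoring function could be plugged in.

```python
import numpy as np

def sliding_window_detect(image, classify, window=(24, 24), stride=8, threshold=0.5):
    """Score every window position; keep (row, col, score) where score > threshold."""
    win_h, win_w = window
    detections = []
    for r in range(0, image.shape[0] - win_h + 1, stride):
        for c in range(0, image.shape[1] - win_w + 1, stride):
            score = classify(image[r:r + win_h, c:c + win_w])
            if score > threshold:
                detections.append((r, c, score))
    return detections

def toy_classifier(patch):
    """Hypothetical stand-in for a face classifier: mean brightness of the patch."""
    return float(patch.mean())

# Hypothetical image: dark background with one bright 24x24 square at (16, 32).
image = np.zeros((64, 96))
image[16:40, 32:56] = 1.0
detections = sliding_window_detect(image, toy_classifier)
best = max(detections, key=lambda d: d[2])
```

Windows that only partly overlap the bright square also exceed the threshold here, so a single object yields several nearby detections. A practical detector would also repeat the scan over multiple window scales.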
![Page 25: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/25.jpg)
Sliding window detection results
False Detection
Missed Faces
![Page 26: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/26.jpg)
The need for… capturing spatial relationships
![Page 27: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/27.jpg)
One Approach
Create a more descriptive (complicated) feature
Histograms of Oriented Gradients (HOG) features
[Figure panels: Original Image; Estimated Image Gradients (gradient magnitudes, gradient orientations); Subdivided Image cells; Histogrammed gradients in each cell]
[Dalal & Triggs, ’06]
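The stages named on this slide (estimate gradients, subdivide the image into cells, histogram the gradients in each cell) can be sketched as follows. This is a simplified illustration: the cell size and bin count are assumptions, and the sketch omits the block normalization step of the full HOG descriptor.

```python
import numpy as np

def hog_cells(image, cell_size=8, n_bins=9):
    """Return an array (rows, cols, n_bins): one orientation histogram per cell."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % np.pi          # unsigned, in [0, pi)
    bin_index = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    n_rows = image.shape[0] // cell_size
    n_cols = image.shape[1] // cell_size
    hist = np.zeros((n_rows, n_cols, n_bins))
    for i in range(n_rows):
        for j in range(n_cols):
            rows = slice(i * cell_size, (i + 1) * cell_size)
            cols = slice(j * cell_size, (j + 1) * cell_size)
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[i, j] = np.bincount(bin_index[rows, cols].ravel(),
                                     weights=magnitude[rows, cols].ravel(),
                                     minlength=n_bins)
    return hist

# Hypothetical image: left half is a horizontal ramp, right half a vertical ramp.
image = np.zeros((32, 64))
image[:, :32] = np.arange(32)[None, :]
image[:, 32:] = np.arange(32)[:, None]
cells = hog_cells(image)
```

Unlike a single histogram over the whole window, the per-cell grid records where each orientation occurs, which is how the feature retains coarse spatial layout.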
![Page 28: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/28.jpg)
People Tracking with HOG features
better
![Page 29: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/29.jpg)
Modeling Spatial Relationships with Deformable Part Based Models
Spring-based models: Parts prefer low-energy configurations
[Fischler & Elschlager, ’73], [Ramanan et al., ’07], [Felzenszwalb et al., ’05, ’09], [Kumar et al., ’09]
![Page 30: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/30.jpg)
Parts Based Model
Vertices – Local Appearance
Edges - Spatial Relationship
Goal: Assign model parts to image regions preserving both local appearance and spatial relationships
![Page 31: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/31.jpg)
Parts based models - Inference Problem
Inference problem: What is the best scoring assignment f?
[Equation with two labeled terms: Local Appearance term, Pairwise Spatial Relationship term]
Inference is NP-hard for general graphs
For trees, can use belief propagation for exact solution in polytime
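For a tree-structured model, the best-scoring assignment can be found exactly with one leaves-to-root pass of max-sum message passing followed by backtracking. The sketch below assumes the score is a sum of per-part appearance scores and per-edge spatial scores, and that parts are indexed so each parent precedes its children. These conventions and all array names are assumptions made for the example, not the notation of the cited papers.

```python
import numpy as np

def best_assignment(parents, appearance, pairwise):
    """Exact max-sum inference on a tree-structured parts model.

    parents[i]    : parent of part i (root is part 0 with parent -1); parents[i] < i.
    appearance[i] : array (L,), score of placing part i at each of L locations.
    pairwise[i]   : array (L, L), spatial score indexed [parent location, part i location].
    Returns (best total score, chosen location for each part).
    """
    n = len(parents)
    belief = [np.array(a, dtype=float) for a in appearance]
    best_child_loc = [None] * n
    # Leaves-to-root pass: children have larger indices, so iterate in reverse.
    for i in range(n - 1, 0, -1):
        scores = pairwise[i] + belief[i][None, :]     # rows: parent loc, cols: child loc
        best_child_loc[i] = scores.argmax(axis=1)
        belief[parents[i]] += scores.max(axis=1)
    # Root-to-leaves pass: read off the maximizing locations.
    assignment = [0] * n
    assignment[0] = int(belief[0].argmax())
    for i in range(1, n):
        assignment[i] = int(best_child_loc[i][assignment[parents[i]]])
    return float(belief[0].max()), assignment

# Hypothetical 4-part tree with 5 candidate locations and random scores.
rng = np.random.default_rng(0)
parents = [-1, 0, 0, 1]
L = 5
appearance = [rng.normal(size=L) for _ in parents]
pairwise = [None] + [rng.normal(size=(L, L)) for _ in parents[1:]]
score, assignment = best_assignment(parents, appearance, pairwise)
```

The cost is on the order of (number of parts) × L² operations, compared with L raised to the number of parts for exhaustive search.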
![Page 32: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/32.jpg)
Parts based models - Learning Problem
Linear models: [equation with two labeled terms: Local Appearance term, Pairwise Spatial Relationship term]
Learning linear models: Find weight vectors that best separate positive and negative examples. E.g.,
Convex max-margin objective, s.t.
Positive examples on one side
Negative examples on the other
[Kumar et al., ’09]
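To make "find weight vectors that best separate positive and negative examples" concrete, here is a sketch of a plain linear max-margin learner (hinge loss with an L2 penalty, minimized by subgradient descent). It operates on fixed feature vectors and is not the structured formulation of the cited work. The regularization weight, step size, and synthetic 2-D data are assumptions.

```python
import numpy as np

def train_linear_max_margin(X, y, reg=0.01, lr=0.1, n_epochs=200):
    """Minimize reg/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))) by subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        violators = margins < 1                   # examples inside or beyond the margin
        grad_w = reg * w - (y[violators, None] * X[violators]).sum(axis=0) / n
        grad_b = -y[violators].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical 2-D features: positives near (+2, +2), negatives near (-2, -2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=2.0, size=(50, 2)),
               rng.normal(loc=-2.0, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])
w, b = train_linear_max_margin(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
```

The objective is convex in (w, b), so the choice of step size affects how quickly the loop converges but not which optimum it approaches.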
![Page 33: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/33.jpg)
Person deformable part model
Root filter (8x8 resolution)
Part filter (4x4 resolution)
Quadratic spatial configuration model
[Felzenszwalb et al., ’09]
![Page 34: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/34.jpg)
[Felzenszwalb et al., ’09]
![Page 35: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/35.jpg)
[Ramanan et al,’09]
![Page 36: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/36.jpg)
Outline
I. Recognizing single object classes (Jon)
II. Scene understanding with multiple classes (Tomasz)
![Page 37: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/37.jpg)
Part II: Scene Understanding with Multiple Classes
Goal: Predict Many Different Objects in a Single Image
Car
Fire Hydrant
Building
Fence
Sidewalk
Tree
![Page 38: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/38.jpg)
Wait...
• What’s wrong with just learning a different sliding window classifier for each object type in the world?
![Page 39: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/39.jpg)
The image as seen from an object detector’s point of view
![Page 40: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/40.jpg)
Relationships between objects make recognition possible
Antonio Torralba. The Context Challenge. http://web.mit.edu/torralba/www/carsAndFacesInContext.html
![Page 41: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/41.jpg)
Objects as the “Parts” of a Scene
Key Challenge in Scene Understanding: Modeling relationships between objects from different categories
Deformable Part Model Scene Model
![Page 42: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/42.jpg)
Fixed Extent “Things” vs Free-form “Stuff”
Building
Fence
Sidewalk
Car
Fire Hydrant
Tree
Things have a well-defined shape. A part of a car is not a car.
Stuff is free-form and mostly defined by color/texture. A part of a building is still a building.
![Page 43: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/43.jpg)
3 Types of Scene Models
Pixel-based, Window-based, Segment-based
![Page 44: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/44.jpg)
Pixel-based Scene Understanding
Unable to reason about instances
Only limited notion of context
TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation. Shotton et al. ECCV 2006
Produces Segmentation
Works well on “stuff”
![Page 45: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/45.jpg)
Pixel-wise Conditional Random Fields (TextonBoost)
• Inference
• y^* = argmax_y p(y|x)
• Training: Use boosting to learn unary potential
• Future Direction: Higher-Order Cliques
TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation. Shotton et al. ECCV 2006
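As a toy illustration of y^* = argmax_y p(y|x) on a pixel grid, the sketch below scores a labeling by per-pixel unary scores minus a Potts penalty for neighboring pixels that disagree. It then improves the labeling with iterated conditional modes (ICM), a simple local search that is swapped in here for brevity and is not claimed to be the inference procedure of the cited paper. The unary scores are made up rather than learned by boosting.

```python
import numpy as np

def icm(unary, smoothness=1.0, n_sweeps=5):
    """Approximately maximize sum_i unary[i, y_i] - smoothness * (# disagreeing neighbor pairs).

    unary: array (H, W, K) of per-pixel label scores (higher is better).
    Returns an (H, W) integer label map.
    """
    H, W, K = unary.shape
    labels = unary.argmax(axis=2)                 # start from the unary-only labeling
    for _ in range(n_sweeps):
        for r in range(H):
            for c in range(W):
                scores = unary[r, c].copy()
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < H and 0 <= cc < W:
                        # Potts term: penalize labels that disagree with this neighbor.
                        penalty = np.full(K, smoothness)
                        penalty[labels[rr, cc]] = 0.0
                        scores -= penalty
                labels[r, c] = scores.argmax()
    return labels

# Hypothetical unaries: left half prefers label 0, right half label 1,
# plus one isolated pixel that prefers label 1.
unary = np.zeros((6, 6, 2))
unary[:, :3, 0] = 1.0
unary[:, 3:, 1] = 1.0
unary[2, 1] = [0.0, 1.5]
labels = icm(unary)
```

The isolated pixel is relabeled to agree with its four neighbors, which is the kind of limited, local context a pairwise pixel model can express.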
![Page 46: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/46.jpg)
Window-based Scene Understanding
Often not possible to model “stuff” using windows.
Window assumption also questionable for some “things.”
Possible to model interactions between object instances.
Discriminative models for multi-class object layout. Desai et al. ICCV 2009
Object Recognition by Scene Alignment. Russell et al. NIPS 2007
![Page 47: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/47.jpg)
Discriminative models for multi-class object layout
• Inference via Greedy Forward Search
• Training
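A generic sketch of greedy forward search over candidate windows: start with nothing selected, repeatedly turn on the candidate whose addition raises the total score the most, and stop when no addition helps. The unary scores and pairwise interaction matrix are hypothetical numbers, and the scoring form (unary plus symmetric pairwise terms) is an assumption made for illustration.

```python
import numpy as np

def greedy_forward_search(unary, pairwise):
    """Greedily pick a subset of candidate detections with high total score.

    unary[i]       : score of turning on detection i by itself.
    pairwise[i, j] : symmetric interaction between detections i and j
                     (negative for conflicting windows, positive for supporting ones).
    """
    unary = np.asarray(unary, dtype=float)
    pairwise = np.asarray(pairwise, dtype=float)
    selected = []
    gain = unary.copy()                   # gain of adding each candidate to the current set
    while True:
        candidates = [i for i in range(len(unary)) if i not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda i: gain[i])
        if gain[best] <= 0:
            break
        selected.append(best)
        gain += pairwise[best]            # update gains given the new member
    return selected

# Hypothetical candidates: 0 and 1 are overlapping "car" windows; 2 is a weak
# "road" window that becomes plausible once a car is present.
unary = [2.0, 1.5, -0.5]
pairwise = np.array([[0.0, -3.0, 1.0],
                     [-3.0, 0.0, 1.0],
                     [1.0, 1.0, 0.0]])
selected = greedy_forward_search(unary, pairwise)
```

In this example the stronger car window suppresses its overlapping duplicate and lifts the road window above zero, so interactions between instances change which windows survive.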
![Page 48: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/48.jpg)
Window-based results
![Page 49: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/49.jpg)
Region-Based Scene Understanding
Use Segmentation algorithm to extract stable regions
Use CRF to label those segments
Problem: Hard to get object-segments.
Problem: Inference difficult for fully connected models.
![Page 50: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/50.jpg)
Region-Based CRF
• Training: Bag of Words with Nearest Neighbor classifier
• Maximum Likelihood training of pairwise potentials
Object Categorization using Co-Occurrence, Location and Appearance. Galleguillos et al. CVPR 2008.
Spatial Relations
![Page 51: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/51.jpg)
Segmentation-Based Results
Input image | No context | With context
Object Categorization using Co-Occurrence, Location and Appearance. Galleguillos et al. CVPR 2008.
![Page 52: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/52.jpg)
Model Granularity vs. Object Type
| Object type \ Granularity | Pixels | Windows | Regions |
|---|---|---|---|
| Things (car, cow, person) | :-( | :-) | :-/ |
| Stuff (road, sky, tree) | :-) | :-( | :-) |
![Page 53: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/53.jpg)
Scene Understanding Recap
• Rich object-object interactions are important for scene understanding.
• Different underlying assumptions (pixel vs. window vs. region) are better suited for different types of objects (“stuff” vs. “things”).
• Many of the techniques for single-class object recognition (e.g., part-based models) are relevant for scene understanding.
![Page 54: Learning and Inference in Vision: from Features to Scene Understanding Jonathan Huang, Tomasz Malisiewicz MLD Student Research Symposium, 2009.](https://reader030.fdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cc6d6/html5/thumbnails/54.jpg)
Thanks!
Image Classification
Sliding Window based Object Detection
Modeling Spatial Relationships between parts
Modeling Spatial Relationships between objects