3D Scene Models
6.870 Object recognition and scene understanding
Krista Ehinger
Questions
What makes a good 3D scene model? How accurate does it need to be?
How far can you get with automatic surface detection? Where do you need human input?
Modelling the scene
Real scenes have far too many surfaces to model each one individually
Modelling the scene
Option 1: Diorama world
Tour Into the Picture (TIP)
Model the scene as 5 planes + foreground objects
Easy implementation: planes/objects defined by humans
Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997
TIP Implementation
User defines the vanishing point and the rear wall of the scene (inner rectangle)
Given some assumptions about the camera, the positions and sizes of all planes can be computed
Defining the box
Define planes: Floor -> y = 0, Ceiling -> y = H
Given the horizon (vanishing point), the corners of the floor and ceiling can be computed from their 2D image positions
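The floor and ceiling computation can be sketched as back-projection onto the planes y = 0 and y = H. This minimal sketch assumes a level pinhole camera (so the vanishing point's row is the horizon) at height `cam_height`, with focal length `f` in pixels, principal point column `cx`, and image rows increasing downward. These camera assumptions and the function names are illustrative, not TIP's exact formulation.

```python
def floor_point_from_pixel(u, v, f, cx, v_horizon, cam_height):
    """Back-project pixel (u, v) onto the floor plane y = 0.

    Assumes a level pinhole camera at height cam_height above the floor,
    focal length f in pixels, principal point column cx, and image rows
    increasing downward, so floor pixels lie below the horizon row.
    """
    dv = v - v_horizon
    if dv <= 0:
        raise ValueError("floor pixels must lie below the horizon")
    # Similar triangles: a floor point at depth z appears
    # f * cam_height / z pixels below the horizon.
    z = f * cam_height / dv
    x = (u - cx) * z / f  # horizontal offset scales with depth
    return (x, 0.0, z)


def ceiling_point_from_pixel(u, v, f, cx, v_horizon, cam_height, room_height):
    """Back-project pixel (u, v) onto the ceiling plane y = room_height."""
    dv = v_horizon - v
    if dv <= 0:
        raise ValueError("ceiling pixels must lie above the horizon")
    # The ceiling is room_height - cam_height above the camera.
    z = f * (room_height - cam_height) / dv
    x = (u - cx) * z / f
    return (x, room_height, z)
```

Under these assumptions, applying the functions to the marked corners gives the 3D floor and ceiling rectangles. The rear and side walls then follow by connecting their edges.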
Defining the box
Once the positions of the planes are known, compute the texture of the planes
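Computing a plane's texture amounts to warping that plane's image region into a flat rectangle. This minimal sketch assumes NumPy and estimates the 3x3 homography from the four corner correspondences with a plain direct linear transform (DLT). The helper names are hypothetical, and the pixel resampling step (for example OpenCV's `cv2.warpPerspective`) is left out.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from four or more point pairs (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    a = np.asarray(rows, dtype=float)
    # H is the null vector of A: the right singular vector belonging
    # to the smallest singular value.
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def apply_homography(h, pts):
    """Map an (N, 2) array of points through H, with perspective division."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]
```

For example, the image quadrangle of the floor can be mapped to a 200 x 100 texture rectangle. For large pixel coordinates, normalizing the points before the DLT (Hartley normalization) improves numerical conditioning.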
What about foreground objects?
Model each object as a quadrangle attached to the floor; compute its floor attachment points, then its upper points
Hierarchical model of foreground objects
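For a vertical quadrangle standing on the floor, the upper points follow from the attachment point. The foot's image row fixes the depth, and the top's image row then fixes the height. This minimal sketch assumes a level pinhole camera at height `cam_height` with image rows increasing downward. The function name is hypothetical.

```python
def billboard_height(v_foot, v_top, v_horizon, cam_height):
    """Height of a vertical object standing on the floor y = 0.

    Assumes a level pinhole camera at height cam_height, image rows
    increasing downward, and the foot (floor attachment point) below
    the horizon row.
    """
    if v_foot <= v_horizon:
        raise ValueError("the foot must lie below the horizon")
    if v_top >= v_foot:
        raise ValueError("the top must lie above the foot")
    # Foot on the floor: depth z = f * cam_height / (v_foot - v_horizon).
    # Top at height y:   v_top = v_horizon + f * (cam_height - y) / z.
    # Eliminating f and z leaves a ratio of pixel distances.
    return cam_height * (v_foot - v_top) / (v_foot - v_horizon)
```

The focal length cancels, so under these assumptions only the horizon row and the camera height are needed. As a check, an object whose top lies exactly on the horizon is as tall as the camera is high.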
Extracting foreground objects
Foreground objects are removed and added to a mask
Holes in the background are filled in using photo completion software
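The slide does not name the completion method. As a crude stand-in for photo completion, and not the exemplar-based or learned methods that term usually means, a diffusion fill repeatedly replaces each masked pixel with the average of its four neighbors. This is a minimal NumPy sketch for a grayscale image; the function name is hypothetical.

```python
import numpy as np

def diffusion_fill(image, mask, iterations=500):
    """Fill pixels where mask is True by iterated 4-neighbor averaging.

    image: 2D float array (grayscale). mask: 2D bool array, True = hole.
    Known pixels are never modified; holes converge toward a smooth
    interpolation of their boundary values.
    """
    out = image.astype(float).copy()
    out[mask] = out[~mask].mean()  # neutral initial guess inside the holes
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="edge")
        avg = (padded[:-2, 1:-1] + padded[2:, 1:-1]      # up, down
               + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0  # left, right
        out[mask] = avg[mask]
    return out
```

This works for small holes in smooth regions. It blurs texture, which is why real completion methods copy patches from elsewhere in the image instead.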
TIP Demonstration
TIP Discussion
Pros: Accurate model (due to human input); deals with foreground objects and occlusions
Cons: Requires human input, not automatic; model too simple for many real-world scenes
Modelling the scene
Option 2: Pop-up book world
Automatic Photo Pop-Up
Three classes of surface: ground, sky, vertical
Not just a box: can model more kinds of scenes
Automatic classification, no manual labeling
D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005.
Photo Pop-Up Implementation
Pixels -> superpixels -> constellations
Automatic labeling of constellations as ground, vertical, or sky
Define angles of vertical planes (using their attachment to the ground)
Map textures to vertical planes (as in TIP)
Superpixels, constellations
Superpixels are groups of neighboring pixels with nearly the same color (Tao et al., 2001)
Superpixels are assigned to constellations according to how likely they are to share a label (ground, vertical, sky), based on the difference between their feature vectors
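One way to turn pairwise same-label likelihoods into constellations is greedy assignment. Seed N constellations with N superpixels, then add each remaining superpixel to the constellation with the highest average log-likelihood of sharing its label. This minimal sketch assumes that scheme, a hypothetical function name, and a given probability matrix `p_same`. The seeding order and scoring in Hoiem et al.'s implementation may differ.

```python
import numpy as np

def form_constellations(p_same, n_constellations, order=None, seed=0):
    """Greedily group superpixels into constellations.

    p_same[i, j]: estimated probability that superpixels i and j share a
    label (ground, vertical, sky). order: optional processing order
    (random if None). Returns an array mapping superpixel -> constellation.
    """
    n = p_same.shape[0]
    if order is None:
        order = np.random.default_rng(seed).permutation(n)
    order = np.asarray(order)
    labels = np.full(n, -1)
    # Seed each constellation with one superpixel.
    labels[order[:n_constellations]] = np.arange(n_constellations)
    for i in order[n_constellations:]:
        # Average log-likelihood that i shares a label with each constellation.
        scores = [np.mean(np.log(p_same[i, labels == k] + 1e-9))
                  for k in range(n_constellations)]
        labels[i] = int(np.argmax(scores))
    return labels
```

Running this for several values of N (the slides mention N = 3 to 25) yields several alternative groupings of the same superpixels.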
Feature vectors
Color features: RGB, hue, saturation
Texture features: difference of oriented Gaussians, textons
Location (absolute and percentile)
Number of superpixels in the constellation
Line and intersection detectors
Not used: constellation shape (contiguous, number of sides), some texture features
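The color and location features are easy to compute directly. This minimal sketch covers one superpixel and assumes an RGB float image in [0, 1] plus a boolean mask. The function name, the feature order, taking hue and saturation of the mean color, and using the 10th and 90th percentiles for location are all illustrative assumptions. Texture, line, and intersection features are omitted.

```python
import colorsys
import numpy as np

def color_location_features(image, mask):
    """Color and location features for one superpixel.

    image: (H, W, 3) float RGB array in [0, 1]; mask: (H, W) bool array.
    Returns [R, G, B, hue, sat, x_mean, y_mean, x_p10, x_p90, y_p10, y_p90].
    """
    h, w = mask.shape
    pixels = image[mask]                       # (n, 3) RGB values
    mean_rgb = pixels.mean(axis=0)
    hue, sat, _ = colorsys.rgb_to_hsv(*mean_rgb)
    ys, xs = np.nonzero(mask)
    # Normalized coordinates: 0 = left/top edge, 1 = right/bottom edge.
    xn, yn = xs / max(w - 1, 1), ys / max(h - 1, 1)
    location = [xn.mean(), yn.mean(),
                np.percentile(xn, 10), np.percentile(xn, 90),
                np.percentile(yn, 10), np.percentile(yn, 90)]
    return np.concatenate([mean_rgb, [hue, sat], location])
```

Location features matter because of scene layout: sky tends to sit near the top of the image and ground near the bottom.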
Training process
For each of the 82 labeled training images:
Compute superpixels, features, and pairwise likelihoods
Form a set of N constellations (N = 3 to 25), each labeled with ground truth
Compute constellation features
Compute the constellation label and homogeneity likelihoods
Training process
AdaBoost weak classifiers learn to estimate whether two superpixels have the same label (based on the difference between their feature vectors)
Another set of AdaBoost weak classifiers learns the constellation label and homogeneity likelihood (expressed as percent ground, vertical, sky, mixed)
Emphasis on classifying larger constellations
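The sketch below shows minimal discrete AdaBoost with decision stumps, for readers unfamiliar with boosting. It assumes NumPy, labels in {-1, +1}, and a feature matrix `X`. For the same-label task, each row could be the difference between two superpixels' feature vectors. The optional `sample_weight` suggests one possible way to emphasize larger constellations, for example by weighting each example by its size. This is textbook AdaBoost, not necessarily the variant Hoiem et al. used, and the function names are hypothetical.

```python
import numpy as np

def fit_adaboost(X, y, n_rounds=20, sample_weight=None):
    """Discrete AdaBoost with decision stumps. y must contain -1/+1."""
    y = np.asarray(y)
    n, d = X.shape
    if sample_weight is None:
        w = np.full(n, 1.0 / n)
    else:
        w = np.asarray(sample_weight, dtype=float)
        w = w / w.sum()
    stumps = []
    for _ in range(n_rounds):
        best = None
        # Exhaustive search over feature, threshold, and polarity.
        for j in range(d):
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = np.where(X[:, j] > t, s, -s)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        pred = np.where(X[:, j] > t, s, -s)
        # Upweight misclassified examples, downweight correct ones.
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append((alpha, j, t, s))
    return stumps

def predict_adaboost(stumps, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * np.where(X[:, j] > t, s, -s) for a, j, t, s in stumps)
    return np.sign(score)
```

Each round picks the stump with the lowest weighted error. The reweighting then forces later stumps to focus on the examples the earlier ones got wrong.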
Building the 3D model
Along the vertical/ground boundary, fit line segments (Hough transform); the goal is to find the simplest shape (fewest lines)
Project lines up from the corners of the boundary segments, then cut and fold
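This minimal Hough transform sketch covers the line-fitting step. It assumes NumPy and boundary points given as (x, y) pixel coordinates. It returns only the single strongest line in (rho, theta) form, whereas fitting the whole boundary needs several segments. The function name and bin sizes are hypothetical.

```python
import numpy as np

def strongest_hough_line(points, n_theta=180, rho_step=1.0):
    """Return (rho, theta) of the line x*cos(theta) + y*sin(theta) = rho
    supported by the most points, using a vote accumulator."""
    pts = np.asarray(points, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # Each point votes for every line through it: one rho per theta.
    rhos = pts[:, 0:1] * np.cos(thetas) + pts[:, 1:2] * np.sin(thetas)
    max_rho = np.abs(rhos).max()
    n_rho = int(np.ceil(2 * max_rho / rho_step)) + 1
    rho_idx = np.round((rhos + max_rho) / rho_step).astype(int)
    theta_idx = np.broadcast_to(np.arange(n_theta), rhos.shape)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    np.add.at(acc, (rho_idx, theta_idx), 1)  # accumulate votes (handles repeats)
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    return r * rho_step - max_rho, thetas[t]
```

A simple way to extend this, assumed here rather than taken from the paper, is to remove the points that support the detected line and repeat. That yields several segments, and stopping early favors the "fewest lines" shape the slide describes.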
Photo Pop-Up Demonstration
Photo Pop-Up Discussion
Pros: Automatic; can handle a variety of scenes, not just boxes
Cons: No handling of foreground objects; misclassification leads to very strange models; only 2 kinds of surface (ground, vertical)
Modelling the scene
Option 3: Actually try to model surface angles
3D Scene Structure from Still Image
Compute a surface normal for each surface
No right-angle assumptions: surfaces can have any angle
Automatic (trained on images with known depth maps)
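To make "surfaces can have any angle" concrete, one convenient parameterization stores each plane as a 3-vector alpha such that every point q on the plane satisfies alpha . q = 1, with the camera at the origin. The depth along a viewing ray r is then 1 / (alpha . r), and the normal is alpha / |alpha|. This parameterization is an assumption here, not taken from the slides, and the function names are hypothetical.

```python
import numpy as np

def plane_from_point_normal(point, normal):
    """Return alpha with alpha . q = 1 for every q on the plane.

    Valid only for planes that do not pass through the camera center (origin).
    """
    point = np.asarray(point, dtype=float)
    normal = np.asarray(normal, dtype=float)
    offset = normal @ point            # plane: normal . q = offset
    if np.isclose(offset, 0.0):
        raise ValueError("plane passes through the camera center")
    return normal / offset

def depth_along_ray(alpha, ray):
    """Distance along the (normalized) viewing ray to the plane alpha . q = 1."""
    ray = np.asarray(ray, dtype=float)
    ray = ray / np.linalg.norm(ray)
    return 1.0 / (alpha @ ray)         # from alpha . (d * ray) = 1

def unit_normal(alpha):
    """Surface normal of the plane (up to sign)."""
    return alpha / np.linalg.norm(alpha)
```

One 3-vector encodes both the orientation and the distance of the plane. Any tilt is allowed, and each superpixel's depth map follows directly from its alpha.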
3D Scene Implementation
Segment the image into superpixels
Estimate the surface normal of each superpixel (using a Markov Random Field model)
Optional: detect and extract foreground objects
Map textures to planes
[Figure: original image and modeled depth map]
A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene Structure from a Single Still Image". In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007
Image features
Superpixel features (x_i): color and texture features as in Photo Pop-Up; the vector also includes features of neighboring superpixels
Boundary features (x_ij): color difference, texture difference, edge detector
Markov Random Field Model
First term: model planes in terms of image features of superpixels
Second term: model planes in terms of pairs of superpixels, with constraints...
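Schematically, a pairwise MRF with these two kinds of terms can be written as below. Here alpha_i denotes the plane parameters of superpixel i, and x_i and x_ij are the superpixel and boundary features. The set N contains the pairs of neighboring superpixels, f_1 and f_2 are the two potential functions, and Z normalizes. This generic form is an illustration, not the exact potentials of Saxena et al.

```latex
P(\alpha \mid x) = \frac{1}{Z} \prod_{i} f_1\left(\alpha_i \mid x_i\right)
                   \prod_{(i,j) \in \mathcal{N}} f_2\left(\alpha_i, \alpha_j \mid x_{ij}\right)
```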
Model constraints
Connected structure: except where there is an occlusion, neighboring superpixels are likely to be connected
Coplanar structure: except where there are folds, neighboring superpixels are likely to lie on the same plane
Co-linearity: long straight lines in the image correspond to straight lines in 3D
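The first two constraints can be expressed as penalties between neighboring superpixels. This minimal sketch assumes planes parameterized so that alpha . q = 1 for points q on the plane, with the camera at the origin and positive depths. The connectivity penalty is a relative depth gap between the two planes at a shared boundary pixel. The coplanarity penalty measures how far superpixel j's center point lies off superpixel i's plane. In a full model such penalties would be down-weighted where the boundary features suggest an occlusion or fold. That weighting, the exact penalty forms, and the function names are assumptions here.

```python
import numpy as np

def depth(alpha, ray):
    """Distance along a unit viewing ray to the plane alpha . q = 1."""
    return 1.0 / (alpha @ ray)

def connectivity_penalty(alpha_i, alpha_j, boundary_ray):
    """Relative depth gap of planes i and j at a shared boundary pixel."""
    r = boundary_ray / np.linalg.norm(boundary_ray)
    d_i, d_j = depth(alpha_i, r), depth(alpha_j, r)
    return abs(d_i - d_j) / np.sqrt(d_i * d_j)  # 0 when the surfaces meet

def coplanarity_penalty(alpha_i, alpha_j, center_ray_j):
    """How far superpixel j's center point lies off plane i."""
    r = center_ray_j / np.linalg.norm(center_ray_j)
    q = depth(alpha_j, r) * r          # 3D point at the center of superpixel j
    return abs(alpha_i @ q - 1.0)      # 0 when q lies on plane i
```

A floor meeting a wall illustrates the distinction. The connectivity penalty at their shared edge is zero, but the coplanarity penalty is large, which is exactly the "fold" exception on the slide.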
Foreground objects
Automatically-detected foreground objects may be removed from model (for example: pedestrians, using Dalal & Triggs detector)
Detected objects add 3D cues (pedestrians are roughly vertical and occlude the surfaces behind them)
3D Scene Demonstration
Results
3D Scene Discussion
Pros: Handles a variety of scene types; fairly accurate (about 2/3 of scenes correct); automatic; handles foreground objects
Cons: Still fails on about 1/3 of scenes
Discussion
Simple 3D models are adequate for many scenes
You can get pretty far without human input (though human annotation of scenes would still give better results)
Extensions?
Use photo completion techniques to handle occlusions?
Massive training sets -> better 3D models?