Stereoscopic 3D reconstruction
• Narrower formulation: given a rectified binocular stereo pair, fuse it to produce a depth image
[Figure: image 1, image 2, and the resulting dense depth map]
Depth from disparity
[Figure: pinhole stereo geometry with scene point X, camera centers O and O' separated by baseline B, focal length f, image coordinates x and x', and depth z]

By similar triangles:

disparity = x − x' = B · f / z

• Disparity is inversely proportional to depth…

With three cameras and baselines B1 and B2 between adjacent pairs, the disparities add:

x − x1' = B1 · f / z
x2' − x = B2 · f / z
x2' − x1' = (B1 + B2) · f / z
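To make the inverse depth/disparity relation concrete, a minimal numeric sketch (the focal length and baseline values below are assumed for illustration, not taken from the slides):

```python
# Depth from disparity: z = f * B / (x - x'), with hypothetical example values.
f = 700.0   # focal length in pixels (assumed)
B = 0.12    # baseline in meters (assumed)

def depth_from_disparity(disparity_px):
    """Convert a disparity value (pixels) to depth (meters)."""
    return f * B / disparity_px

# Larger disparity -> smaller depth (inverse relationship).
near = depth_from_disparity(40.0)   # about 2.1 m
far = depth_from_disparity(10.0)    # about 8.4 m
```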
[Figure: the same geometry drawn for baselines B1 and B2, with the shaded region marking the field of view of the stereo pair]
The role of the baseline in depth reconstruction accuracy
[Figure: one pixel of disparity uncertainty maps to an uncertainty region around the scene point; the region shrinks as the baseline grows]
• Optical axes of the two cameras need not be parallel
• Field of view decreases as baseline and vergence increase
• Accuracy increases with baseline and vergence
The correspondence problem
• …, so if we could find the corresponding points in two images, we could estimate relative depth…
• Epipolar geometry constrained our search, but we still have to solve the correspondence problem.
• Goal/motivation: establish dense correspondences between the views of a stereo pair, given the epipolar constraint, so as to perform 3D reconstruction
Matching cost
[Figure: left and right images with a scanline; the disparity is the offset between matching windows]

Correspondence search
• Slide a window along the right scanline and compare contents of that window with the reference window in the left image
• Matching cost: SSD or normalized correlation
[Figure: matching cost along the right scanline, shown for SSD and for normalized correlation]
Basic stereo matching algorithm
• If necessary, rectify the two stereo images to transform
epipolar lines into scanlines
• For each pixel x in the first image
– Find corresponding epipolar scanline in the right image
– Examine all pixels on the scanline and pick the best match x’
– Compute disparity x–x’ and set depth(x) = B*f/(x–x’)
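The per-pixel step of the algorithm above can be sketched in NumPy. This is a minimal illustration of SSD window matching along one scanline, not an optimized implementation; the window size and disparity range are arbitrary choices:

```python
import numpy as np

def block_match_ssd(left, right, y, x, window=5, max_disp=32):
    """Find the disparity of pixel (y, x) in the left image by sliding a
    window along the same scanline of the right image and minimizing SSD."""
    h = window // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.float64)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        xr = x - d                      # candidate position on the right scanline
        if xr - h < 0:
            break                       # window would leave the image
        cand = right[y - h:y + h + 1, xr - h:xr + h + 1].astype(np.float64)
        cost = np.sum((ref - cand) ** 2)   # SSD matching cost
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```

Given the best disparity d, the depth would then follow as B*f/d per the formula above.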
Effect of window size
• Smaller window
  + More detail
  − More noise
• Larger window
  + Smoother disparity maps
  − Less detail
[Figure: disparity maps computed with W = 3 and W = 20]
Results with window search
[Figure: window-based matching result vs. ground truth on standard stereo data]
Failures of correspondence search
• Textureless surfaces
• Occlusions, repetition
• Non-Lambertian surfaces, specularities…
How can we improve window-based matching?
• The similarity constraint is local (each
reference window is matched
independently)
• Need to enforce non-local
correspondence constraints
Non-local constraints
• Uniqueness
– For any point in one image, there should be at
most one matching point in the other image
• Ordering
– Corresponding points should be in the same
order in both views
• Smoothness
– We expect disparity values to change slowly
(for the most part)
Scanline stereo
• Try to coherently match pixels on the entire scanline
• Different scanlines are still optimized independently
[Figure: left image and right image]
“Shortest paths” for scan-line stereo
[Figure: the left and right scanlines S_left and S_right span a matching grid; diagonal moves are matches with cost C_corr, while horizontal/vertical moves are occlusions in one image with cost C_occl]
Can be implemented with dynamic programming (Ohta & Kanade ’85, Cox et al. ’96):

C(i, j) = min{ C(i−1, j) + C_occl,  C(i, j−1) + C_occl,  C(i−1, j−1) + C_corr(i, j) }

Slide credit: Y. Boykov
Dynamic Programming - Result
• No inter-scanline consistency is enforced
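The recurrence C(i, j) above can be sketched directly with dynamic programming. This minimal version matches a single pair of scanlines and returns only the optimal path cost; the constant occlusion cost and the squared intensity difference used as the correlation cost are simplifying assumptions:

```python
import numpy as np

def scanline_dp(left_row, right_row, c_occl=10.0):
    """Dynamic-programming scanline stereo sketch (in the spirit of
    Ohta & Kanade '85 / Cox et al. '96):
    C(i,j) = min(C(i-1,j)+c_occl, C(i,j-1)+c_occl, C(i-1,j-1)+c_corr(i,j))."""
    n, m = len(left_row), len(right_row)
    C = np.zeros((n + 1, m + 1))
    C[1:, 0] = np.arange(1, n + 1) * c_occl   # all left pixels occluded
    C[0, 1:] = np.arange(1, m + 1) * c_occl   # all right pixels occluded
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # squared intensity difference as a stand-in correlation cost
            c_corr = (float(left_row[i - 1]) - float(right_row[j - 1])) ** 2
            C[i, j] = min(C[i - 1, j] + c_occl,      # left pixel occluded
                          C[i, j - 1] + c_occl,      # right pixel occluded
                          C[i - 1, j - 1] + c_corr)  # pixels matched
    return C[n, m]   # cost of the best match/occlusion path
```

Backtracking through C (not shown) would recover the disparities along the scanline.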
Stereo matching as energy minimization
[Figure: images I1 and I2, disparity map D; windows W1(i) and W2(i + D(i))]

E(D) = Σ_i ( W1(i) − W2(i + D(i)) )²  +  Σ_{neighbors i, j} | D(i) − D(j) |
          (data term)                        (smoothness term)

• Energy functions of this form can be minimized using graph cuts
Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001
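As a sanity check on the energy above, here is a 1-D sketch that evaluates the data and smoothness terms for a single scanline. Using single-pixel intensities instead of windows, and a weight `lam` on the smoothness term, are simplifying assumptions:

```python
def stereo_energy(W1, W2, D, lam=1.0):
    """Evaluate E(D) for a 1-D scanline:
    data term  sum_i (W1(i) - W2(i + D(i)))^2
    plus smoothness term  lam * sum over neighboring pixels |D(i) - D(j)|."""
    n = len(W1)
    data = sum((float(W1[i]) - float(W2[i + D[i]])) ** 2 for i in range(n))
    smooth = lam * sum(abs(D[i] - D[i + 1]) for i in range(n - 1))
    return data + smooth
```

A graph-cut solver would search over all disparity maps D for the one minimizing this energy; the function above only scores a given D.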
Results
[Figure: window-based matching, ground truth, dynamic programming, and graph cuts results]

Another example
[Figure: dynamic programming vs. graph cuts]
Active stereo with structured light
• Project “structured” light patterns onto the object
• Simplifies the correspondence problem
• Allows us to use only one camera
L. Zhang, B. Curless, and S. M. Seitz, Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming, 3DPVT 2002
Laser scanning
Optical triangulation
• Project a single stripe of laser light
• Scan it across the surface of the object
• This is a very precise version of structured light scanning
Digital Michelangelo Project
Levoy et al., http://graphics.stanford.edu/projects/mich/
Source: S. Seitz

Laser scanned models
The Digital Michelangelo Project, Levoy et al. (Source: S. Seitz)
[Figures: further laser scanned models from the Digital Michelangelo Project, at 1.0 mm resolution]
Aligning range images
• A single range scan is not sufficient to describe a
complex surface
• Need techniques to register multiple range images
B. Curless and M. Levoy, A Volumetric Method for Building Complex Models from
Range Images, SIGGRAPH 1996
Kinect: Structured infrared light
http://bbzippo.wordpress.com/2010/11/28/kinect-in-infrared/
The third eye can be used to provide further constraints
or for verification
Beyond two-view stereo

Volumetric stereo
[Figure: scene volume V and calibrated input images]
Goal: Determine occupancy + “color” of points in V

Discrete formulation: Voxel Coloring
[Figure: discretized scene volume and calibrated input images]
Goal: Assign RGB values to voxels in V that are photo-consistent with the images
The idea…
[Figure: a photo-consistent scene surface; a voxel that projects to Red in all views is consistent, while one that projects to Blue, Gray, and Red is inconsistent]

Voxel Coloring Approach
1. Choose voxel
2. Project and correlate
3. Color if consistent (standard deviation of pixel colors below threshold)
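Step 3 of the approach above can be sketched directly; the threshold value here is an arbitrary assumption:

```python
import numpy as np

def photo_consistent(pixel_colors, threshold=15.0):
    """Voxel coloring consistency test: a voxel is kept (and colored) if the
    standard deviation of the pixel colors it projects to, across all images
    that see it, is below a threshold in every channel."""
    colors = np.asarray(pixel_colors, dtype=np.float64)   # shape (views, 3)
    if np.all(colors.std(axis=0) < threshold):
        return True, colors.mean(axis=0)   # consistent: assign the mean color
    return False, None                     # inconsistent: leave transparent
```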
Complexity and computability
[Figure: a discretized scene volume with N³ voxels and C colors gives C^(N³) possible scenes; the photo-consistent scenes form a subset that contains the true scene]
Photo-consistency
[Figure: the set of all scenes, its subset of photo-consistent scenes, and the true scene]
• A photo-consistent scene is a scene that exactly reproduces your input images from the same camera viewpoints
• You can’t use your input cameras and images to tell the difference between a photo-consistent scene and the true scene
Which shape do you get?
• The Photo Hull is the UNION of all photo-consistent scenes in V
– It is a photo-consistent scene reconstruction
– Tightest possible bound on the true scene
[Figure: true scene and photo hull inside volume V]
Source: S. Seitz
Voxel Coloring Approach
• Visibility Problem: in which images is each voxel visible?
Space Carving

Space Carving Algorithm
[Figure: Image 1 … Image N]
• Initialize to a volume V containing the true scene
• Repeat until convergence:
  – Choose a voxel on the outside of the volume
  – Project it to the visible input images
  – Carve it if not photo-consistent
K. N. Kutulakos and S. M. Seitz, A Theory of Shape by Space Carving, ICCV 1999
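The carving loop above can be sketched as follows. Here `is_photo_consistent` is a hypothetical callback standing in for the project-and-test step, since real projection needs camera calibration; the volume is represented as a set of voxel coordinates:

```python
def on_surface(v, volume):
    """A voxel is on the outside of the volume if some 6-neighbor is absent."""
    x, y, z = v
    neighbors = [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                 (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]
    return any(n not in volume for n in neighbors)

def space_carve(volume, is_photo_consistent):
    """Space-carving sketch: repeatedly remove surface voxels that fail the
    photo-consistency test, until no voxel changes."""
    changed = True
    while changed:
        changed = False
        for v in list(volume):
            if on_surface(v, volume) and not is_photo_consistent(v, volume):
                volume.discard(v)   # carve the inconsistent surface voxel
                changed = True
    return volume
```

Because only surface voxels are tested and carving exposes new surface voxels, the volume shrinks monotonically toward the photo hull.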
Space Carving Algorithm
• The Basic Algorithm is Unwieldy
– Complex update procedure
• Alternative: Multi-Pass Plane Sweep
– Efficient, can use texture-mapping hardware
– Converges quickly in practice
– Easy to implement
Space Carving Results: African Violet
[Figure: input image (1 of 45) and views of the reconstruction]

Space Carving Results: Hand
[Figure: input image (1 of 100) and views of the reconstruction]
Voxel coloring solutions
1. C = 2 (shape from silhouettes)
• Volume intersection [Baumgart 1974]
  – For more info: R. Szeliski, Rapid Octree Construction from Image Sequences, CVGIP: Image Understanding, 58(1):23–32, July 1993
  – W. Matusik, C. Buehler, R. Raskar, L. McMillan, and S. J. Gortler, Image-Based Visual Hulls, SIGGRAPH 2000
Why use silhouettes?
• Can be computed robustly
• Can be computed efficiently
[Figure: image − background = foreground silhouette]

Reconstruction from Silhouettes (Binary Images)
• The case of binary images: a voxel is photo-consistent if it lies inside the object’s silhouette in all views
• Finding the silhouette-consistent shape (visual hull):
  – Backproject each silhouette
  – Intersect the backprojected volumes
Volume intersection
• Reconstruction contains the true scene
  – But is generally not the same

Voxel algorithm for volume intersection
• Color voxel black if on silhouette in every image
Photo-consistency vs. silhouette-consistency
[Figure: true scene, photo hull, and visual hull]

Visual Hull Results
• Download data and results from http://www-cvr.ai.uiuc.edu/ponce_grp/data/visual_hull/
Properties of Volume Intersection
• Pros
– Easy to implement
• Cons
– No concavities
– Reconstruction is not photo-consistent if
texture information is available
– Requires silhouette extraction
Carved visual hulls
• The visual hull is a good starting point for optimizing photo-consistency
  – Easy to compute
  – Tight outer boundary of the object
  – Parts of the visual hull (rims) already lie on the surface and are already photo-consistent
• Thus:
  1. Compute the visual hull
  2. Carve the visual hull to optimize photo-consistency
Yasutaka Furukawa and Jean Ponce, Carved Visual Hulls for Image-Based Modeling, ECCV 2006
From multiple views to textured 3D meshes
K. Tzevanidis, X. Zabulis, T. Sarmis, P. Koutlemanis, N. Kyriazis, A.A. Argyros, “From multiple views to textured 3D meshes: a GPU-powered approach”, in Proceedings of the Computer Vision on GPUs Workshop (CVGPU 2010), in conjunction with ECCV 2010, Heraklion, Crete, Greece, 10 September 2010.

Shape from silhouettes
[Figures: shape-from-silhouettes and textured-mesh results from the above paper]