A Comparison of Stereo and Multiview 3-D Reconstruction U ...
Computer Vision: Stereo and 3D Reconstruction · Computer Vision: Stereo and 3D Reconstruction...
Transcript of Computer Vision: Stereo and 3D Reconstruction · Computer Vision: Stereo and 3D Reconstruction...
Computer Vision: Stereo and 3D Reconstruction
Daniel G. Aliaga
Department of Computer Science
Purdue University
Thanks to S. Narasimhan @ CMU
for some of the slides
Definitions
• Camera geometry (=motion)
– Given corresponded points on ≥2 views, what are the poses of the cameras?
• Correspondence geometry (=correspondence)
– Given a point in one view, what are the constraints of its position in another view?
• Scene geometry (=structure)
– Given corresponded points on ≥2 views and the camera poses, what is the 3D location of the points?
Stereo Rig
scene
LP LL yx ,left image
RP RR yx ,
right image
baseline bAssume that we know corresponds to LP RP
Z
bX
f
xL 2
Z
bX
f
xR 2
Using perspective projection (defined using coordinate system shown)
Z
Y
X
Z
Y
f
y
f
y RL
ZYX ,,
f
Stereo Rig
Z
bX
f
xL 2
Z
bX
f
xR 2
Z
Y
f
y
f
y RL
RL
RL
xx
xxbX
2
RL
RL
xx
yybY
2 RL xx
bfZ
scene
LP LL yx ,
left image
RP RR yx ,
right image
baseline b
ZYX ,,
Stereo: Disparity and Depth
Z
bX
f
xL 2
Z
bX
f
xR 2
Z
Y
f
y
f
y RL
RL
RL
xx
xxbX
2
RL
RL
xx
yybY
2 RL xx
bfZ
is the disparity between corresponding left and right image points RL xxd
• disparity increases with baseline b • inversely proportional to depth
scene
LP LL yx ,
left image
RP RR yx ,
right image
baseline b
ZYX ,,
Stereo: Ray Triangulation
scene point
optical center optical center image plane
(Ray) Triangulation: compute reconstruction as intersection of two rays
Stereo: Ray Triangulation
scene point
optical center optical center image plane
Do two lines intersect in 3D?
If so, how do you compute their intersection?
Stereo: Ray Triangulation
Equations for the intersection:
0)()( 21 ba pppp
0)()( 43 ba pppp
)pp(spp 121b
)pp(tpp 343a
...s
...t
)(5.0 ba ppp
1p
2p
3p
4p
ap
bp
p
Solve for s and t, compute p:
field of view
of stereo
Stereo: Vergence
1. Field of view decreases with increase in baseline and vergence 2. Accuracy increases with increase in baseline and vergence
one pixel
uncertainty of
scenepoint
optical axes of the two cameras need not be parallel
Camera Geometry
• We need to transform “left frame” to “right frame” – includes a rotation and translation:
scene
Left frame Right frame
Lx~ Rx~
333231
232221
131211
rrr
rrr
rrr
R
• In matrix notation, we can write as:
Camera Geometry
• In matrix notation, we can write as:
Camera Geometry
RLLL
RLLL
RLLL
zrzryrxr
yrzryrxr
xrzryrxr
34333231
24232221
14131211
Camera Geometry: Orthonormality Constraints
(a) Rows of R are perpendicular vectors
(b) Each row of R is a unit vector
IRRT
NOTE: Constraints
are NON-LINEAR!
Problem:
Given ‘s
Find
Lx~ Rx~
LRtR ),...,,( 341211 rrr
Camera Geometry: Problem Definition
scene
Lx~ Rx~
•subject to (nonlinear) constraints
Problem: same image coords can be generated by doubling Lx~ Rx~ LRt~
thus, we can find only up to a scale factor!
Solution: fix scale by using constraint: (1 additional equation)
Camera Geometry: Scale Ambiguity
Lx~
Rx~
LRt~
Lx~2
Rx~2
LRt~
2
LRt~
1~~
LRLR tt
Camera Geometry: How many scene points are needed?
Each scene point gives 3 equations:
and 6+1 additional equations from orthonormality of rotation matrix
constraints and scale constraint.
Thus, for n scene points, we have (3n + 6 + 1) equations and 12
unknowns
•What is the minimum value for n?
RLLL
RLLL
RLLL
zrzryrxr
yrzryrxr
xrzryrxr
34333231
24232221
14131211
Camera Geometry: Solving an Over-determined System
• Generally, more than 3 points are used to find the 12 unknowns
• Formulate error for scene point i as:
• Find that minimize:
RLRLi xtxRe ~)~(
)]1()([|| 2
1
12
LRLRT
N
i
i ttIRReE
LRtR &
Camera Geometry: A Linear Estimation
Assume a near correct rotation is known. Then an orthogonal
rotation matrix looks like:
•Using this matrix, iteratively and linearly solve for ’s and tLR:
1
1
1
xy
xz
yz
R
0~)~( RLRL xtxR
•How many equations/scene-points are needed?
•6 unknowns, 3 equations per scene point, so ≥ 2 points
•Limitations:
•1. ignores normality/scale (fix by re-scaling each iteration)
•2. assumes good initial guess
where is the 3D rotation axis
and its length is the amount by
which to rotate
Correspondence
scene point
optical center optical center image plane
Correspondence
scene point
optical center optical center image plane
epipolar line
Correspondence
scene point
optical center image plane
optical center
epipolar line epipolar line
Epipolar Geometry
Epipolar Constraint: reduces correspondence problem to 1D search along conjugate epipolar lines
scene point
optical center image plane
optical center
epipolar line epipolar line epipolar plane
Epipolar Geometry
Epipolar Constraint: can be expressed using the fundamental matrix F
scene point
optical center image plane
optical center
epipolar line epipolar line epipolar plane
Epipolar Geometry
converging
cameras
motion parallel with image plane
Epipolar Geometry
e
e’
Epipolar Geometry
Forward motion
Epipolar Geometry
Correspondence reduced to looking in a small neighborhood of a line…
Fundamental Matrix
How to compute the fundamental matrix?
1. geometric explanation…
2. algebraic explanation…
Fundamental Matrix: Geometric Exp.
Thus, there is a mapping
xx
X
ee
l
lx
point line
c c
Fundamental Matrix: Geometric Exp.
How do you map a point to a line?
xx
X
ee
l
c c
Fundamental Matrix: Geometric Exp.
Idea:
- We know ‘s are in a plane
- Define a line by its “perpendicular”, then we can use dot product; e.g.,
xx
X
ee
l
)(x
0 lx 0)'( lcx
c c
or
What is a definition of as perpendicular to the pictured epipolar line?
Fundamental Matrix: Geometric Exp.
xx
X
ee
l
c c
)()( cxcel
l
xel (assume all in canonical frame
of the right-side camera)
Cross product can be expressed using matrix notation:
Fundamental Matrix: Geometric Exp.
xel
z
y
x
xy
xz
yz
x
x
x
ee
ee
ee
xe
0
0
0
xexe ][
xel ][
How do you compute ?
Fundamental Matrix: Geometric Exp.
x
Use a homography (or projective transformation) to map to x x
(Homography: maps points in a plane to another plane)
...
...
...
,,
1
H
w
xw
xw
xx
x
x y
x
y
x
Hxx
Fundamental Matrix: Geometric Exp.
xel ][
Hxx Hxel ][ HeF ][ 0 Fxx T
xx
X
ee
l
c c
Epipolar Constraint 0 lxWant …
Fundamental Matrix: Algebraic Exp.
xx
X
ee
l
c c
PXx
XtRx
XPx
Fundamental Matrix: Algebraic Exp.
PXx ?X
tcxPtX )( Pwhere is the pseudoinverse of P
Why pseudoinverse?
Since not square, pseudoinverse means but solved as an optimization
IPP P
xel ][
?x
What is in terms of ?
Recall
x x
(Let’s assume which means in on the image plane) 0t X
xPPx PPeF ][ 0 Fxx T
Epipolar Constraint
Correspondence: Epipolar Geometry
Epipolar constraint reduces correspondence problem to 1D search along conjugate epipolar lines
scene point
optical center image plane
optical center
epipolar line epipolar line epipolar plane xx
Correspondence: Epipolar Geometry
Epipolar constraint can be expressed as
scene point
optical center image plane
optical center
epipolar line epipolar line epipolar plane
0 Fxx T
xx
Fundamental matrix
Correspondence: Epipolar Geometry
Interesting case: what happens if camera motion is pure translation?
x x
If motion parallel to x-axis…
]0|[IP ]|[ tIP
eF
Te 001
010
100
000
Fimplies horizontal
epipolar line…
Thus the desire to do image rectification
IH ( ) ?F
Correspondence: Epipolar Geometry
Thus for rectified images, correspondence is reduced to looking in a small neighborhood of a line…
Essential Matrix
• Similar to the fundamental matrix but includes the intrinsic calibration matrix, thus the equation is in terms of the normalized image coordinates, e.g.:
0'' Exx T FKKE T'and
essential matrix
Scene Geometry scene point?
Camera A Camera B
Camera geometry known
Correspondence and epipolar geometry known
What is the location of the scene point (scene geometry)?
Scene Geometry: Linear Formulation
XMx aa
~~
ax~
X~
aa KPM bb KPM
XMx bb
~~
bx~
or Problem?
Assumes we know
But what is the value for ?
Twyxx ~
w
Scene Geometry: Linear Formulation
XMx~~
'
''
~
w
yx
s
sysx
x
'
''
~
w
yx
x
Recall
where
'/''/'
wywx
yx where and are the
observed projections x y
Let , thus ?s 'ws
14131211 mZmYmXmsx Hence?
24232221 mZmYmXmsy
Scene Geometry: Linear Formulation
14131211 mZmYmXmsx
Given 24232221 mZmYmXmsy
For a scene point, how many unknowns?
For a scene point, how many camera views needed?
In general, one scene point observed in at least two views is sufficient…
3+N
3N≥3+N
and N cameras
34333231 mZmYmXms
Scene Geometry: Linear Formulation
b
a
s
sZYX
14131211 mZmYmXmsx
24232221 mZmYmXmsy
34333231 mZmYmXms
x2
Scene Geometry: Linear Formulation
'ssZYX
10'''
'0'''
'0'''
01
0
0
333231
232221
131211
333231
232221
131211
mmm
ymmm
xmmm
mmm
ymmm
xmmm
34
24
14
34
24
14
'
'
'
m
m
m
m
m
m
14131211 mZmYmXmsx
24232221 mZmYmXmsy
34333231 mZmYmXms
x2
Cameras M and M’
Scene Geometry: Nonlinear Form.
• Remember “Bundle Adjustment”
– Given initial guesses, use nonlinear least squares to refine/compute the calibration parameters
– Simple but good convergence depends on accuracy of initial guess
Recall
ij ji
jiij
ji
jiij
Xm
Xmy
Xm
Xmx
mnE 2
3
22
3
1)~
~
()~
~
(1
Goal is 0E
For scene geometry, are the unknowns… X~
Scene Geometry: Nonlinear Form.
Example Result
• Using dense feature-based stereo
• Next: images and features!
[Pollefeys99]