C280, Computer Vision
description
Transcript of C280, Computer Vision
![Page 2: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/2.jpg)
Roadmap• Previous: Image formation, filtering, local features, (Texture)…• Tues: Feature-based Alignment
– Stitching images together– Homographies, RANSAC, Warping, Blending– Global alignment of planar models
• Today: Dense Motion Models– Local motion / feature displacement– Parametric optic flow
• No classes next week: ICCV conference• Oct 6th: Stereo / ‘Multi-view’: Estimating depth with known
inter-camera pose• Oct 8th: ‘Structure-from-motion’: Estimation of pose and 3D
structure– Factorization approaches– Global alignment with 3D point models
![Page 3: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/3.jpg)
Last Time: Alignment
• Homographies• Rotational Panoramas• RANSAC• Global alignment• Warping• Blending
![Page 4: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/4.jpg)
Today: Motion and Flow
• Motion estimation• Patch-based motion
(optic flow)• Regularization and line
processes• Parametric (global)
motion• Layered motion models
![Page 5: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/5.jpg)
Why estimate visual motion?
• Visual Motion can be annoying– Camera instabilities, jitter– Measure it; remove it (stabilize)
• Visual Motion indicates dynamics in the scene– Moving objects, behavior– Track objects and analyze trajectories
• Visual Motion reveals spatial layout – Motion parallax
Szeliski
![Page 6: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/6.jpg)
Classes of Techniques
• Feature-based methods– Extract visual features (corners, textured areas) and track them– Sparse motion fields, but possibly robust tracking– Suitable especially when image motion is large (10s of pixels)
• Direct-methods– Directly recover image motion from spatio-temporal image
brightness variations– Global motion parameters directly recovered without an
intermediate feature motion calculation– Dense motion fields, but more sensitive to appearance variations– Suitable for video and when image motion is small (< 10 pixels)
Szeliski
![Page 7: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/7.jpg)
Patch-based motion estimation
![Page 8: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/8.jpg)
Image motion
How do we determine correspondences?
Assume all change between frames is due to motion:
),(),( ),(),( yxyx vyuxIyxJ
I J
![Page 9: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/9.jpg)
• Brightness Constancy Equation:),(),( ),(),( yxyx vyuxIyxJ
Or, equivalently, minimize :2)),(),((),( vyuxIyxJvuE
),(),(),(),(),(),( yxvyxIyxuyxIyxIyxJ yx
Linearizing (assuming small (u,v))using Taylor series expansion:
The Brightness Constraint
Szeliski
![Page 10: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/10.jpg)
Gradient Constraint (or the Optical Flow Constraint)
2)(),( tyx IvIuIvuE Minimizing:
0)(
0)(
0
tyxy
tyxx
IvIuII
IvIuIIdvE
duE
In general 0, yx II
0 tyx IvIuIHence,
Szeliski
![Page 11: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/11.jpg)
yx
tyx IvyxIuyxIvuE,
2),(),(),(
Minimizing
Assume a single velocity for all pixels within an image patch
ty
tx
yyx
yxx
IIII
vu
IIIIII2
2
tT IIUII
LHS: sum of the 2x2 outer product of the gradient vector
Patch Translation [Lucas-Kanade]
Szeliski
![Page 12: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/12.jpg)
Local Patch Analysis
• How certain are the motion estimates?
Szeliski
![Page 13: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/13.jpg)
The Aperture Problem
TIIMLet
• Algorithm: At each pixel compute by solving
• M is singular if all gradient vectors point in the same direction• e.g., along an edge• of course, trivially singular if the summation is over a single pixel
or there is no texture• i.e., only normal flow is available (aperture problem)
• Corners and textured areas are OK
and
ty
tx
IIII
b
U bMU
Szeliski
![Page 14: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/14.jpg)
SSD Surface – Textured area
![Page 15: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/15.jpg)
SSD Surface -- Edge
![Page 16: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/16.jpg)
SSD – homogeneous area
![Page 17: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/17.jpg)
Iterative Refinement
• Estimate velocity at each pixel using one iteration of Lucas and Kanade estimation
• Warp one image toward the other using the estimated flow field(easier said than done)
• Refine estimate by repeating the process
Szeliski
![Page 18: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/18.jpg)
Optical Flow: Iterative Estimation
xx0
Initial guess: Estimate:
estimate update
(using d for displacement here instead of u)
Szeliski
![Page 19: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/19.jpg)
Optical Flow: Iterative Estimation
xx0
estimate update
Initial guess: Estimate:
Szeliski
![Page 20: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/20.jpg)
Optical Flow: Iterative Estimation
xx0
Initial guess: Estimate:Initial guess: Estimate:
estimate update
Szeliski
![Page 21: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/21.jpg)
Optical Flow: Iterative Estimation
xx0
Szeliski
![Page 22: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/22.jpg)
Optical Flow: Iterative Estimation
• Some Implementation Issues:– Warping is not easy (ensure that errors in warping
are smaller than the estimate refinement)– Warp one image, take derivatives of the other so
you don’t need to re-compute the gradient after each iteration.
– Often useful to low-pass filter the images before motion estimation (for better derivative estimation, and linear approximations to image intensity)
Szeliski
![Page 23: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/23.jpg)
Optical Flow: Aliasing
Temporal aliasing causes ambiguities in optical flow because images can have many pixels with the same intensity.I.e., how do we know which ‘correspondence’ is correct?
nearest match is correct (no aliasing)
nearest match is incorrect (aliasing)
To overcome aliasing: coarse-to-fine estimation.
actual shift
estimated shift
Szeliski
![Page 24: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/24.jpg)
Limits of the gradient method
Fails when intensity structure in window is poorFails when the displacement is large (typical
operating range is motion of 1 pixel)Linearization of brightness is suitable only for small
displacements• Also, brightness is not strictly constant in
imagesactually less problematic than it appears, since we can
pre-filter images to make them look similar
Szeliski
![Page 25: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/25.jpg)
image Iimage J
aJwwarp refine
a aΔ+
Pyramid of image J Pyramid of image I
image Iimage J
Coarse-to-Fine Estimation
u=10 pixels
u=5 pixels
u=2.5 pixels
u=1.25 pixels
Szeliski
![Page 26: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/26.jpg)
J Jw Iwarp refine
ina
a+
J Jw Iwarp refine
a
a+
J
pyramid construction
J Jw Iwarp refine
a+
I
pyramid construction
outa
Coarse-to-Fine Estimation
Szeliski
![Page 27: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/27.jpg)
Spatial Coherence
Assumption * Neighboring points in the scene typically belong to the same surface and hence typically have similar motions. * Since they also project to nearby points in the image, we expect spatial coherence in image flow.
Black
![Page 28: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/28.jpg)
Formalize this Idea
Noisy 1D signal:
x
u
Noisy measurementsu(x)
Black
![Page 29: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/29.jpg)
Regularization
Find the “best fitting” smoothed function v(x)
x
u
Noisy measurementsu(x)
v(x)
Black
![Page 30: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/30.jpg)
Membrane model
Find the “best fitting” smoothed function v(x)
x
uv(x)
Black
![Page 31: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/31.jpg)
Membrane model
Find the “best fitting” smoothed function v(x)
x
uv(x)
Black
![Page 32: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/32.jpg)
Membrane model
Find the “best fitting” smoothed function v(x)
uv(x)
Black
![Page 33: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/33.jpg)
Spatial smoothness assumptionFaithful to the data
Regularization
x
uv(x)
Minimize:
1
1
2
1
2 ))()1(())()(()(N
x
N
x
xvxvxuxvvE
Black
![Page 34: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/34.jpg)
Bayesian Interpretation
1
1
2
1
2 ))()1(())()(()(N
x
N
x
xvxvxuxvvE
)()|()|( vpvupuvp
)()( xvxu ),0(~ 1 N
2)1()( xvxv ),0(~ 22 N
)/))()((21exp(
21)|( 2
12
1 1
xvxuvupN
x
)/))((21exp(
21)( 2
22
1
1 2
xvvp x
N
x
Black
![Page 35: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/35.jpg)
Discontinuities
x
uv(x)
What about this discontinuity?What is happening here?What can we do?
Black
![Page 36: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/36.jpg)
Robust EstimationNoise distributions are often non-Gaussian, having much heavier tails. Noise
samples from the tails are called outliers.• Sources of outliers (multiple motions):
– specularities / highlights– jpeg artifacts / interlacing / motion blur– multiple motions (occlusion boundaries, transparency)
velocity space
u1
u2
++
Black
![Page 37: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/37.jpg)
Occlusion
occlusion disocclusion shear
Multiple motions within a finite region.
Black
![Page 38: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/38.jpg)
Coherent Motion
Possibly Gaussian.
Black
![Page 39: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/39.jpg)
Multiple Motions
Definitely not Gaussian.
Black
![Page 40: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/40.jpg)
Weak membrane modelu
v(x)
1
1
2
1
2 ))(1())()1()(())()((),(N
x
N
x
xlxvxvxlxuxvlvE
x
}1,0{)( xl
Black
![Page 41: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/41.jpg)
Analog line process
1
1
2
1
2 ))(())()1()(())()((),(N
x
N
x
xlxvxvxlxuxvlvE
1)(0 xl
Penalty function Family of quadratics
Black
![Page 42: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/42.jpg)
Analog line process
1
1
2
1
2 ))(())()1()(())()((),(N
x
N
x
xlxvxvxlxuxvlvE
Infimum defines a robust error function.
1
12
1
2 )),()1(())()(()(N
x
N
x
xvxvxuxvvE
Minima are the same:
Black
![Page 43: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/43.jpg)
Robust Regularization
x
uv(x)
Minimize:
1
12
11 )),()1(()),()(()(
N
x
N
x
xvxvxuxvvE
Treat large spatial derivatives as outliers.
Black
![Page 44: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/44.jpg)
Robust EstimationProblem: Least-squares estimators penalize deviations between data & model with quadratic error fn (extremely sensitive to outliers)
error penalty function influence function
Redescending error functions (e.g., Geman-McClure) help to reduce the influence of outlying measurements.
error penalty function influence function
Black
![Page 45: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/45.jpg)
Optical flow
Outlier with respect to neighbors.
)()()()(),( yxyxS vvuuvuE
Robust formulation of spatial coherence term
Black
![Page 46: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/46.jpg)
“Dense” Optical Flow
)),()()()()(())(( DtyxD IvIuIE xxxxxxu
)]),()(()),()(([),()(
SG
SS vvuuvuE yxyxxy
x
xuxuu ))(())(()( SD EEE Objective function:
When is quadratic = “Horn and Schunck”
Black
![Page 47: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/47.jpg)
Optimization
vE
vTvv
uE
uTuu
nn
nn
)(1
)(1
)()1(
)()1(
)(
),(),(sGn
SnsxDtsusxs
uuIIvIuIuE
T(u)= max of second derivative
Black
![Page 48: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/48.jpg)
Optimization
• Gradient descent• Coarse-to-fine (pyramid)• Deterministic annealing
Black
![Page 49: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/49.jpg)
Example
Black
![Page 50: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/50.jpg)
Example
Black
![Page 51: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/51.jpg)
Quadratic:
Robust:
Black
![Page 52: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/52.jpg)
Magnitude of horizontal flow
Black
![Page 53: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/53.jpg)
Outliers
Points where the influence is reduced
Spatial term Data term
Black
![Page 54: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/54.jpg)
With 5% uniform random noise added to the images.
Black
![Page 55: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/55.jpg)
Horizontal Component
Black
![Page 56: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/56.jpg)
More Noise
Quadratic:
Quadratic data term, robust spatial term:
Black
![Page 57: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/57.jpg)
Both Terms Robust
Spatial and data outliers:
Black
![Page 58: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/58.jpg)
Pepsi
Black
![Page 59: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/59.jpg)
Real Sequence
Deterministic annealing.First stage (large ):
Black
![Page 60: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/60.jpg)
Real Sequence
Final result after annealing:
Black
![Page 61: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/61.jpg)
Parametric motion estimation
![Page 62: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/62.jpg)
Global (parametric) motion models• 2D Models:• Affine• Quadratic• Planar projective transform (Homography)
• 3D Models:• Instantaneous camera motion models • Homography+epipole• Plane+Parallax
Szeliski
![Page 63: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/63.jpg)
Motion models
Translation
2 unknowns
Affine
6 unknowns
Perspective
8 unknowns
3D rotation
3 unknowns
Szeliski
![Page 64: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/64.jpg)
0)()( 654321 tyx IyaxaaIyaxaaI
• Substituting into the B.C. Equation:yaxaayxvyaxaayxu
654
321
),(),(
Each pixel provides 1 linear constraint in 6 global unknowns
0 tyx IvIuI
2 tyx IyaxaaIyaxaaIaErr )()()( 654321
Least Square Minimization (over all pixels):
Example: Affine Motion
Szeliski
![Page 65: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/65.jpg)
Last lecture: Alignment / motion warping• “Alignment”: Assuming we know the correspondences,
how do we get the transformation?
),( ii yx ),( ii yx
2
1
43
21
tt
yx
mmmm
yx
i
i
i
i
• Expressed in terms of absolute coordinates of corresponding points…
• Generally presumed features separately detected in each frame
e.g., affine model in abs. coords…
![Page 66: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/66.jpg)
Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or
analyze spatio-temporal gradient
),( ii yx),( ii yx
• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change
![Page 67: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/67.jpg)
Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or
analyze spatio-temporal gradient
),( ii yx),( ii yx
• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change
![Page 68: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/68.jpg)
Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or
analyze spatio-temporal gradient
),( ii yx),( ii yx
• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change
![Page 69: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/69.jpg)
Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or
analyze spatio-temporal gradient
),( ii yx
• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change
(ui,vi)
![Page 70: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/70.jpg)
Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or
analyze spatio-temporal gradient
• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change
(ui,vi)
![Page 71: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/71.jpg)
Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or
analyze spatio-temporal gradient
• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change
(ui,vi)
yaxaayxvyaxaayxu
654
321
),(),(
2
1
43
21
tt
yx
mmmm
yx
i
i
i
i
4
1
65
32
aa
yx
aaaa
vu
i
i
i
i
Previous Alignment model:
Now, Displacement model:
![Page 72: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/72.jpg)
Quadratic – instantaneous approximation to planar motion 2
87654
82
7321
yqxyqyqxqqv
xyqxqyqxqqu
yyvxxu
yhxhhyhxhhy
yhxhhyhxhhx
',' and
'
'
987
654
987
321
Projective – exact planar motion
Other 2D Motion Models
Szeliski
![Page 73: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/73.jpg)
ZxTTxxyyv
ZxTTyxxyu
ZYZYX
ZXZYX
)()1(
)()1(2
2
yyvxxuthyhxhthyhxhy
thyhxhthyhxhx
',' :and
'
'
3987
1654
3987
1321
)(1
)(1
233
133
tytt
xyv
txtt
xxu
w
w
Local Parameter:
ZYXZYX TTT ,,,,,
),( yxZ
Instantaneous camera motion:
Global parameters:
Global parameters:
32191 ,,,,, ttthh
),( yx
Homography+Epipole
Local Parameter:
Residual Planar Parallax Motion
Global parameters:
321 ,, ttt
),( yxLocal Parameter:
3D Motion Models
Szeliski
![Page 74: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/74.jpg)
• Consider image I translated by
• The discrete search method simply searches for the best estimate.
• The gradient method linearizes the intensity function and solves for the estimate
21
,00
2
,1
)),(),(),((
)),(),((),(
yxvvyuuxIyxI
vyuxIyxIvuE
yx
yx
00 ,vu
),(),(),(),(),(
1001
0
yxyxIvyuxIyxIyxI
Discrete Search vs. Gradient Based
Szeliski
![Page 75: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/75.jpg)
Correlation and SSD
• For larger displacements, do template matching– Define a small area around a pixel as the template– Match the template against each pixel within a
search area in next image.– Use a match measure such as correlation,
normalized correlation, or sum-of-squares difference– Choose the maximum (or minimum) as the match– Sub-pixel estimate (Lucas-Kanade)
Szeliski
![Page 76: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/76.jpg)
Shi-Tomasi feature tracker
1. Find good features (min eigenvalue of 22 Hessian)
2. Use Lucas-Kanade to track with pure translation
3. Use affine registration with first feature patch4. Terminate tracks whose dissimilarity gets too
large5. Start new tracks when needed
Szeliski
![Page 77: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/77.jpg)
Tracking results
Szeliski
![Page 78: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/78.jpg)
Tracking - dissimilarity
Szeliski
![Page 79: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/79.jpg)
Tracking results
Szeliski
![Page 80: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/80.jpg)
How well do thesetechniques work?
![Page 81: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/81.jpg)
A Database and Evaluation Methodology for Optical Flow
Simon Baker, Daniel Scharstein, J.P Lewis, Stefan Roth, Michael Black, and Richard Szeliski
ICCV 2007http://vision.middlebury.edu/flow/
![Page 82: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/82.jpg)
Limitations of Yosemite• Only sequence used for quantitative evaluation
• Limitations:• Very simple and synthetic• Small, rigid motion• Minimal motion discontinuities/occlusions
Image 7 Image 8
Yose
mite
Ground-Truth FlowFlow Color
Coding
Szeliski
![Page 83: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/83.jpg)
Limitations of Yosemite
• Only sequence used for quantitative evaluation
• Current challenges:• Non-rigid motion• Real sensor noise• Complex natural scenes• Motion discontinuities• Need more challenging and more realistic benchmarks
Image 7 Image 8
Yose
mite
Ground-Truth FlowFlow Color
Coding
Szeliski
![Page 84: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/84.jpg)
Motion estimation 84
Realistic synthetic imagery• Randomly generate scenes with “trees” and “rocks”• Significant occlusions, motion, texture, and blur• Rendered using Mental Ray and “lens shader” plugin
Rock
Grov
e
Szeliski
![Page 85: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/85.jpg)
85
Modified stereo imagery
• Recrop and resample ground-truth stereo datasets to have appropriate motion for OF
Venu
sM
oebi
us
Szeliski
![Page 86: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/86.jpg)
• Paint scene with textured fluorescent paint• Take 2 images: One in visible light, one in UV light• Move scene in very small steps using robot• Generate ground-truth by tracking the UV images
Dense flow with hidden texture
Setup
Visible
UV
Lights Image Cropped
Szeliski
![Page 87: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/87.jpg)
Experimental results
• Algorithms:• Pyramid LK: OpenCV-based implementation of
Lucas-Kanade on a Gaussian pyramid• Black and Anandan: Author’s implementation• Bruhn et al.: Our implementation• MediaPlayerTM: Code used for video frame-rate
upsampling in Microsoft MediaPlayer• Zitnick et al.: Author’s implementation
Szeliski
![Page 88: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/88.jpg)
Motion estimation 88
Experimental results
Szeliski
![Page 89: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/89.jpg)
Conclusions
• Difficulty: Data substantially more challenging than Yosemite
• Diversity: Substantial variation in difficulty across the various datasets
• Motion GT vs Interpolation: Best algorithms for one are not the best for the other
• Comparison with Stereo: Performance of existing flow algorithms appears weak
Szeliski
![Page 90: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/90.jpg)
Layered Scene Representations
![Page 91: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/91.jpg)
Motion representations
• How can we describe this scene?
Szeliski
![Page 92: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/92.jpg)
Block-based motion prediction
• Break image up into square blocks• Estimate translation for each block• Use this to predict next frame, code difference
(MPEG-2)
Szeliski
![Page 93: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/93.jpg)
Layered motion
• Break image sequence up into “layers”:
• =
• Describe each layer’s motion
Szeliski
![Page 94: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/94.jpg)
Layered motion
• Advantages:• can represent occlusions / disocclusions• each layer’s motion can be smooth• video segmentation for semantic processing• Difficulties:• how do we determine the correct number?• how do we assign pixels?• how do we model the motion?
Szeliski
![Page 95: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/95.jpg)
Layers for video summarization
Szeliski
![Page 96: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/96.jpg)
Background modeling (MPEG-4)
• Convert masked images into a background sprite for layered video coding
• + + +
•=
Szeliski
![Page 97: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/97.jpg)
What are layers?
• [Wang & Adelson, 1994; Darrell & Pentland 1991]
• intensities• alphas• velocities
Szeliski
![Page 98: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/98.jpg)
Fragmented Occlusion
![Page 99: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/99.jpg)
Results
![Page 100: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/100.jpg)
Results
![Page 101: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/101.jpg)
How do we form them?
Szeliski
![Page 102: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/102.jpg)
How do we estimate the layers?
1. compute coarse-to-fine flow2. estimate affine motion in blocks (regression)3. cluster with k-means4. assign pixels to best fitting affine region5. re-estimate affine motions in each region…
Szeliski
![Page 103: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/103.jpg)
Layer synthesis
• For each layer:• stabilize the sequence with the affine motion• compute median value at each pixel• Determine occlusion relationships
Szeliski
![Page 104: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/104.jpg)
Results
Szeliski
![Page 105: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/105.jpg)
Recent results: SIFT Flow
![Page 106: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/106.jpg)
Recent GPU Implementation
• http://gpu4vision.icg.tugraz.at/• Real time flow exploiting robust norm +
regularized mapping
![Page 107: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/107.jpg)
Today: Motion and Flow
• Motion estimation• Patch-based motion (optic flow)• Regularization and line processes• Parametric (global) motion• Layered motion models
![Page 108: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/108.jpg)
Slide Credits
• Rick Szeliski• Michael Black
![Page 109: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/109.jpg)
Roadmap• Previous: Image formation, filtering, local features, (Texture)…• Tues: Feature-based Alignment
– Stitching images together– Homographies, RANSAC, Warping, Blending– Global alignment of planar models
• Today: Dense Motion Models– Local motion / feature displacement– Parametric optic flow
• No classes next week: ICCV conference• Oct 6th: Stereo / ‘Multi-view’: Estimating depth with known
inter-camera pose• Oct 8th: ‘Structure-from-motion’: Estimation of pose and 3D
structure– Factorization approaches– Global alignment with 3D point models
![Page 110: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/110.jpg)
Final project
• Significant novel implementation of technique related to course content
• Teams of 2 encouraged (document role!)• Or journal length review article (no teams)• Three components:
– proposal document (no more than 5 pages)– in class results presentation (10 minutes)– final write-up (no more than 15 pages)
![Page 111: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/111.jpg)
Project Proposals• Due next Friday!• No more than 5 pages• Explain idea:
– Motivation– Approach– Datasets– Evaluation– Schedule
• Proposal should convince me (and you) that project is interesting and doable given the resources you have.
• You can change topics after proposal (at your own risk!)• I’ll consider proposal, final presentation, and report when
evaluating project: a well-thought out proposal can have significant positive weight.
• Teams OK; overlap with thesis or other courses OK.
![Page 112: C280, Computer Vision](https://reader031.fdocuments.us/reader031/viewer/2022012919/56816962550346895de11b33/html5/thumbnails/112.jpg)
Project Ideas?
• Face annotation with online social networks?• Classifying sports or family events from photo
collections?• Optic flow to recognize gesture?• Finding indoor structures for scene context?• Shape models of human silhouettes? • Salesin: classify aesthetics?
– Would gist regression work?