16 cv mil_multiple_cameras

Computer vision: models, learning and inference Chapter 16 Multiple Cameras Please send errata to [email protected]

Transcript of 16 cv mil_multiple_cameras

Page 1: 16 cv mil_multiple_cameras

Computer vision: models, learning and inference

Chapter 16

Multiple Cameras

Please send errata to [email protected]

Page 2: 16 cv mil_multiple_cameras

Structure from motion

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

Given
• an object that can be characterized by I 3D points
• projections into J images

Find
• Intrinsic matrix
• Extrinsic matrix for each of J images
• 3D points

Page 3: 16 cv mil_multiple_cameras

Structure from motion

For simplicity, we’ll start with a simpler problem:

• Just J=2 images
• Known intrinsic matrix

Page 4: 16 cv mil_multiple_cameras

Structure

• Two view geometry
• The essential and fundamental matrices
• Reconstruction pipeline
• Rectification
• Multi-view reconstruction
• Applications

Page 5: 16 cv mil_multiple_cameras

Epipolar lines

Page 6: 16 cv mil_multiple_cameras

Epipole

Page 7: 16 cv mil_multiple_cameras

Special configurations

Page 8: 16 cv mil_multiple_cameras

Structure

• Two view geometry
• The essential and fundamental matrices
• Reconstruction pipeline
• Rectification
• Multi-view reconstruction
• Applications

Page 9: 16 cv mil_multiple_cameras

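The essential matrix

The geometric relationship between the two cameras is captured by the essential matrix.

Assume normalized cameras, with the first camera at the origin and the second camera related to it by a rotation $\Omega$ and a translation $\tau$.

First camera:
$\lambda_1 \tilde{x}_1 = [I, 0]\,\tilde{w}$

Second camera:
$\lambda_2 \tilde{x}_2 = [\Omega, \tau]\,\tilde{w}$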

Page 10: 16 cv mil_multiple_cameras

The essential matrix

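First camera:
$\lambda_1 \tilde{x}_1 = w$

Second camera:
$\lambda_2 \tilde{x}_2 = \Omega w + \tau$

Substituting:
$\lambda_2 \tilde{x}_2 = \lambda_1 \Omega \tilde{x}_1 + \tau$

This is a mathematical relationship between the points in the two images, but it’s not in the most convenient form.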

Page 11: 16 cv mil_multiple_cameras

The essential matrix

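Take the cross product of both sides with $\tau$ (the last term disappears because $\tau \times \tau = 0$):
$\lambda_2\, \tau \times \tilde{x}_2 = \lambda_1\, \tau \times \Omega \tilde{x}_1$

Take the inner product of both sides with $\tilde{x}_2$ (the left side vanishes because $\tau \times \tilde{x}_2$ is perpendicular to $\tilde{x}_2$):
$\tilde{x}_2^T (\tau \times \Omega \tilde{x}_1) = 0$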

Page 12: 16 cv mil_multiple_cameras

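The essential matrix

The cross product term can be expressed as a matrix:
$\tau_\times = \begin{bmatrix} 0 & -\tau_z & \tau_y \\ \tau_z & 0 & -\tau_x \\ -\tau_y & \tau_x & 0 \end{bmatrix}$

Defining:
$E = \tau_\times \Omega$

We now have the essential matrix relation:
$\tilde{x}_2^T E \tilde{x}_1 = 0$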
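As a numerical sanity check, the relation $\tilde{x}_2^T E \tilde{x}_1 = 0$ with $E = \tau_\times \Omega$ can be tried on synthetic data. This is a minimal NumPy sketch: the rotation, translation and 3D point are made-up values, and `skew` and `rotation_y` are illustrative helper names that do not appear in the slides.

```python
import numpy as np

def skew(t):
    """Return the 3x3 matrix t_x such that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def rotation_y(angle):
    """Rotation about the y axis by `angle` radians (illustrative choice)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Hypothetical pose of the second camera relative to the first.
Omega = rotation_y(0.2)
tau = np.array([1.0, 0.1, 0.05])
E = skew(tau) @ Omega

# A made-up 3D point in front of both cameras.
w = np.array([0.3, -0.2, 4.0])
x1 = w / w[2]              # normalized camera 1: lambda1 * x1 = w
w2 = Omega @ w + tau
x2 = w2 / w2[2]            # normalized camera 2: lambda2 * x2 = Omega w + tau

residual = x2 @ E @ x1     # vanishes up to rounding error
print(residual)
```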

Page 13: 16 cv mil_multiple_cameras

Properties of the essential matrix

• Rank 2: $\det[E] = 0$

• 5 degrees of freedom

• Non-linear constraints between elements: $2 E E^T E - \mathrm{trace}[E E^T]\, E = 0$
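These properties can be checked numerically for any matrix of the form $E = \tau_\times \Omega$. The sketch below uses an arbitrary, made-up pose; `skew` is an illustrative helper. The five degrees of freedom correspond to three for rotation plus three for translation, minus one because the overall scale cannot be recovered.

```python
import numpy as np

def skew(t):
    """Cross-product matrix of the 3-vector t."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# Hypothetical pose: rotation about the z axis plus an arbitrary translation.
angle = 0.3
Omega = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
tau = np.array([0.5, -1.0, 0.2])
E = skew(tau) @ Omega

det_E = np.linalg.det(E)                               # rank 2 implies det[E] = 0
singular_values = np.linalg.svd(E, compute_uv=False)   # two equal values, then zero
cubic = 2 * E @ E.T @ E - np.trace(E @ E.T) * E        # non-linear constraint

print(det_E, singular_values, np.abs(cubic).max())
```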

Page 14: 16 cv mil_multiple_cameras

Recovering epipolar lines

Equation of a line:
$ax + by + c = 0$

or
$[a, b, c]\,[x, y, 1]^T = 0$

or
$l\,\tilde{x} = 0$, where $l = [a, b, c]$

Page 15: 16 cv mil_multiple_cameras

Recovering epipolar lines

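Equation of a line:
$l\,\tilde{x} = 0$

Now consider
$\tilde{x}_2^T E \tilde{x}_1 = 0$

This has the form $l_1 \tilde{x}_1 = 0$ where $l_1 = \tilde{x}_2^T E$

So the epipolar lines are
$l_1 = \tilde{x}_2^T E$ and $l_2 = \tilde{x}_1^T E^T$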
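A short sketch of how the epipolar lines might be computed in practice, assuming normalized cameras and an essential matrix $E$ built from a made-up pose. The line in image 1 is $E^T \tilde{x}_2$ written as a column vector, and the line in image 2 is $E \tilde{x}_1$. The helper names are illustrative.

```python
import numpy as np

def skew(t):
    """Cross-product matrix of the 3-vector t."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def point_line_distance(x_h, line):
    """Perpendicular distance from homogeneous point x_h to the line [a, b, c]."""
    x, y = x_h[:2] / x_h[2]
    a, b, c = line
    return abs(a * x + b * y + c) / np.hypot(a, b)

# Hypothetical pose and 3D point.
c, s = np.cos(0.1), np.sin(0.1)
Omega = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
tau = np.array([1.0, 0.2, 0.0])
E = skew(tau) @ Omega

w = np.array([0.5, 0.4, 3.0])
x1 = w / w[2]
w2 = Omega @ w + tau
x2 = w2 / w2[2]

l1 = E.T @ x2   # epipolar line in image 1 induced by x2
l2 = E @ x1     # epipolar line in image 2 induced by x1

d1 = point_line_distance(x1, l1)   # x1 lies on l1
d2 = point_line_distance(x2, l2)   # x2 lies on l2
print(d1, d2)
```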

Page 16: 16 cv mil_multiple_cameras

Recovering epipoles

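Every epipolar line in image 1 passes through the epipole e1.

In other words, $\tilde{x}_2^T E \tilde{e}_1 = 0$ for ALL $\tilde{x}_2$.

This can only be true if e1 is in the nullspace of E:
$\tilde{e}_1 = \mathrm{null}[E]$

Similarly:
$\tilde{e}_2 = \mathrm{null}[E^T]$

We find the null spaces by computing the SVD $E = U L V^T$, and taking the last column of $V$ and the last row of $U^T$.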
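The null-space recipe can be sketched with `np.linalg.svd`, which returns $U$, the singular values and $V^T$. The pose below is made up for illustration. With this setup the epipole in image 2 should be parallel to $\tau$, and the epipole in image 1 parallel to $\Omega^T \tau$.

```python
import numpy as np

def skew(t):
    """Cross-product matrix of the 3-vector t."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# Hypothetical relative pose of the second camera.
c, s = np.cos(0.2), np.sin(0.2)
Omega = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
tau = np.array([1.0, 0.1, 0.3])
E = skew(tau) @ Omega

U, L, Vt = np.linalg.svd(E)
e1 = Vt[-1]      # last column of V (last row of V^T): E @ e1 = 0
e2 = U[:, -1]    # last row of U^T (last column of U): e2 @ E = 0

print(E @ e1, e2 @ E)
```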

Page 17: 16 cv mil_multiple_cameras

Decomposition of E

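Essential matrix:
$E = \tau_\times \Omega$

To recover translation and rotation use the matrix:
$W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

We take the SVD $E = U L V^T$ and then we set
$\tau_\times = U L W U^T$ and $\Omega = U W^{-1} V^T$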

Page 18: 16 cv mil_multiple_cameras

Four interpretations

To get the different solutions we multiply $\tau$ by -1 and substitute $W^{-1}$ for $W$.
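A minimal sketch of how the four candidate (rotation, translation) pairs might be enumerated from an SVD of $E$. The sign flips that force $U$ and $V$ to be proper rotations are an implementation detail assumed here, not something stated on the slides, and the translation is recovered only up to scale (unit length). `decompose_essential` is an illustrative name.

```python
import numpy as np

def skew(t):
    """Cross-product matrix of the 3-vector t."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

W = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

def decompose_essential(E):
    """Return the four (Omega, tau) candidates; tau has unit length."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so U and V can be made proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    tau = U[:, 2]                               # direction of the translation
    rotations = (U @ W.T @ Vt, U @ W @ Vt)      # W^{-1} equals W^T
    return [(R, t) for R in rotations for t in (tau, -tau)]

# Hypothetical ground-truth pose used to build E.
c, s = np.cos(0.2), np.sin(0.2)
Omega = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
tau = np.array([1.0, 0.1, 0.3])
candidates = decompose_essential(skew(tau) @ Omega)

for R, t in candidates:
    print(np.round(R, 3), np.round(t, 3))
```

Only one of the four candidates places the reconstructed points in front of both cameras, so a triangulated point is usually used to pick the physically valid one.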

Page 19: 16 cv mil_multiple_cameras

The fundamental matrix

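Now consider two cameras that are not normalised, with intrinsic matrices $\Lambda_1$ and $\Lambda_2$:
$\lambda_1 \tilde{x}_1 = \Lambda_1 [I, 0]\,\tilde{w}$
$\lambda_2 \tilde{x}_2 = \Lambda_2 [\Omega, \tau]\,\tilde{w}$

By a similar procedure to before we get the relation
$\tilde{x}_2^T \Lambda_2^{-T} E \Lambda_1^{-1} \tilde{x}_1 = 0$

or
$\tilde{x}_2^T F \tilde{x}_1 = 0$

where
$F = \Lambda_2^{-T} E \Lambda_1^{-1}$

Relation between essential and fundamental:
$E = \Lambda_2^T F \Lambda_1$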
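The relation $F = \Lambda_2^{-T} E \Lambda_1^{-1}$ can be checked on synthetic pixel coordinates. In this sketch the intrinsic matrices, pose and 3D point are all made-up values chosen for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix of the 3-vector t."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

# Hypothetical intrinsics (focal lengths and principal points are made up).
Lambda1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
Lambda2 = np.array([[750.0, 0.0, 300.0], [0.0, 750.0, 250.0], [0.0, 0.0, 1.0]])

# Hypothetical pose of the second camera.
c, s = np.cos(0.15), np.sin(0.15)
Omega = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
tau = np.array([1.0, 0.0, 0.2])

E = skew(tau) @ Omega
F = np.linalg.inv(Lambda2).T @ E @ np.linalg.inv(Lambda1)

# Project a made-up 3D point into both (non-normalized) cameras.
w = np.array([0.2, -0.1, 5.0])
p1 = Lambda1 @ w
x1 = p1 / p1[2]
p2 = Lambda2 @ (Omega @ w + tau)
x2 = p2 / p2[2]

residual = x2 @ F @ x1              # fundamental matrix relation
E_back = Lambda2.T @ F @ Lambda1    # recovers the essential matrix
print(residual)
```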

Page 20: 16 cv mil_multiple_cameras

Fundamental matrix criterion

Page 21: 16 cv mil_multiple_cameras

Estimation of fundamental matrix

When the fundamental matrix is correct, the epipolar line induced by a point in the first image should pass through the matching point in the second image and vice-versa.

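This suggests the criterion
$\hat{F} = \mathrm{argmin}_F \left[ \sum_{i=1}^{I} \left( (\mathrm{dist}[x_{i1}, l_{i1}])^2 + (\mathrm{dist}[x_{i2}, l_{i2}])^2 \right) \right]$

If $l = [a, b, c]$ and $x = [x, y]^T$ then
$\mathrm{dist}[x, l] = \dfrac{ax + by + c}{\sqrt{a^2 + b^2}}$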

Unfortunately, there is no closed form solution for this quantity.
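One way to evaluate such a criterion is to sum the squared point-to-epipolar-line distances in both images. The sketch below is an assumed implementation of that idea, with `epipolar_cost` as an illustrative name. The example uses the fundamental matrix of a pure horizontal translation with identity intrinsics, for which the cost reduces to twice the sum of squared vertical offsets.

```python
import numpy as np

def epipolar_cost(F, pts1, pts2):
    """Sum of squared point-to-epipolar-line distances in both images.

    pts1, pts2: (I, 2) arrays of matching image coordinates.
    """
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])           # homogeneous points, shape (I, 3)
    x2 = np.hstack([pts2, ones])
    l2 = x1 @ F.T                          # row i is F @ x1[i], a line in image 2
    l1 = x2 @ F                            # row i is F.T @ x2[i], a line in image 1
    algebraic = np.sum(x2 * l2, axis=1)    # x2^T F x1 for every pair
    d1_sq = algebraic**2 / (l1[:, 0]**2 + l1[:, 1]**2)
    d2_sq = algebraic**2 / (l2[:, 0]**2 + l2[:, 1]**2)
    return np.sum(d1_sq + d2_sq)

# Pure translation along x: epipolar lines are horizontal, so only the
# vertical offset between matched points contributes to the cost.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
pts1 = np.array([[0.0, 0.0], [1.0, 2.0]])
pts2 = np.array([[3.0, 0.0], [4.0, 2.5]])
cost = epipolar_cost(F, pts1, pts2)
print(cost)
```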

Page 22: 16 cv mil_multiple_cameras

The 8 point algorithm

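Approach:
• solve for fundamental matrix using homogeneous coordinates
• closed form solution (but to wrong problem!)
• Known as the 8 point algorithm

Start with fundamental matrix relation
$\tilde{x}_{i2}^T F \tilde{x}_{i1} = 0$

Writing out in full
$[x_{i2}, y_{i2}, 1] \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix} \begin{bmatrix} x_{i1} \\ y_{i1} \\ 1 \end{bmatrix} = 0$

or
$x_{i2}x_{i1}f_{11} + x_{i2}y_{i1}f_{12} + x_{i2}f_{13} + y_{i2}x_{i1}f_{21} + y_{i2}y_{i1}f_{22} + y_{i2}f_{23} + x_{i1}f_{31} + y_{i1}f_{32} + f_{33} = 0$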

Page 23: 16 cv mil_multiple_cameras

The 8 point algorithm

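Can be written as:
$[x_{i2}x_{i1}, x_{i2}y_{i1}, x_{i2}, y_{i2}x_{i1}, y_{i2}y_{i1}, y_{i2}, x_{i1}, y_{i1}, 1]\, f = 0$

where
$f = [f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33}]^T$

Stacking together constraints from at least 8 pairs of points we get the system of equations
$A f = 0$, where row $i$ of $A$ is the row vector above for the $i$-th pair.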

Page 24: 16 cv mil_multiple_cameras

The 8 point algorithm

Minimum direction problem of the form $A f = 0$. Find the minimum of $|A f|^2$ subject to $|f| = 1$.

To solve, compute the SVD $A = U L V^T$ and then set $f$ to the last column of $V$.
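The minimum direction problem can be sketched in a few lines of NumPy. This is a bare version of the linear step only (no rescaling and no rank-2 enforcement), run on noise-free synthetic matches from a made-up pose; `eight_point` is an illustrative name.

```python
import numpy as np

def eight_point(pts1, pts2):
    """Estimate F (up to scale) from I >= 8 matches given as (I, 2) arrays."""
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    # One row per match, in the order f11, f12, f13, f21, ..., f33.
    A = np.column_stack([x2 * x1, x2 * y1, x2,
                         y2 * x1, y2 * y1, y2,
                         x1, y1, np.ones_like(x1)])
    _, _, Vt = np.linalg.svd(A)
    f = Vt[-1]                # last column of V: minimizes |A f| with |f| = 1
    return f.reshape(3, 3)    # row-major layout matches the ordering above

# Synthetic, noise-free data: random non-coplanar 3D points, made-up pose.
rng = np.random.default_rng(0)
w = rng.uniform([-2.0, -2.0, 3.0], [2.0, 2.0, 8.0], size=(12, 3))
c, s = np.cos(0.1), np.sin(0.1)
Omega = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
tau = np.array([1.0, 0.2, 0.1])
w2 = w @ Omega.T + tau
pts1 = w[:, :2] / w[:, 2:]
pts2 = w2[:, :2] / w2[:, 2:]

F_est = eight_point(pts1, pts2)
ones = np.ones((len(pts1), 1))
residuals = np.sum(np.hstack([pts2, ones]) * (np.hstack([pts1, ones]) @ F_est.T), axis=1)
print(np.abs(residuals).max())
```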

Page 25: 16 cv mil_multiple_cameras

Fitting concerns

• This procedure does not ensure that solution is rank 2. Solution: set last singular value to zero.

• Can be unreliable because of numerical problems to do with the data scaling – better to re-scale the data first

• Needs 8 points in general positions (cannot all be planar).

• Fails if there is not sufficient translation between the views

• Use this solution to start non-linear optimisation of true criterion (must ensure non-linear constraints obeyed).

• There is also a 7 point algorithm (useful if fitting repeatedly in RANSAC)
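Two of the fixes above can be sketched directly. The slides only say to re-scale the data; the specific choice below (zero mean, mean distance $\sqrt{2}$ from the origin) is an assumption borrowed from common practice, and both function names are illustrative.

```python
import numpy as np

def normalize_points(pts):
    """Shift to zero mean and scale so the mean distance from the origin is sqrt(2).

    Returns the transformed (I, 2) points and the 3x3 transform T that
    maps homogeneous input points to the transformed ones.
    """
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    scale = np.sqrt(2) / mean_dist
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return (pts - centroid) * scale, T

def enforce_rank2(F):
    """Zero the smallest singular value, giving the closest rank-2 matrix."""
    U, L, Vt = np.linalg.svd(F)
    L[2] = 0.0
    return U @ np.diag(L) @ Vt

# Made-up pixel coordinates and an arbitrary full-rank matrix for illustration.
pts = np.array([[100.0, 200.0], [300.0, 250.0], [150.0, 400.0], [320.0, 80.0]])
pts_n, T = normalize_points(pts)
F_full = np.arange(1.0, 10.0).reshape(3, 3) + np.eye(3)
F_rank2 = enforce_rank2(F_full)
print(np.linalg.svd(F_rank2, compute_uv=False))
```

If a matrix $F_n$ is fitted to points transformed by $T_1$ and $T_2$, the matrix for the original coordinates is $T_2^T F_n T_1$.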

Page 26: 16 cv mil_multiple_cameras

Structure

• Two view geometry
• The essential and fundamental matrices
• Reconstruction pipeline
• Rectification
• Multi-view reconstruction
• Applications

Page 27: 16 cv mil_multiple_cameras

Two view reconstruction pipeline

Start with pair of images taken from slightly different viewpoints

Page 28: 16 cv mil_multiple_cameras

Two view reconstruction pipeline

Find features using a corner detection algorithm

Page 29: 16 cv mil_multiple_cameras

Two view reconstruction pipeline

Match features using a greedy algorithm

Page 30: 16 cv mil_multiple_cameras

Two view reconstruction pipeline

Fit fundamental matrix using robust algorithm such as RANSAC

Page 31: 16 cv mil_multiple_cameras

Two view reconstruction pipeline

Find matching points that agree with the fundamental matrix

Page 32: 16 cv mil_multiple_cameras

Two view reconstruction pipeline

• Extract essential matrix from fundamental matrix
• Extract rotation and translation from essential matrix
• Reconstruct the 3D positions w of points
• Then perform non-linear optimisation over points and rotation and translation between cameras
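The slides do not say how the 3D positions are reconstructed. One common choice, assumed here, is linear triangulation: each view contributes two linear equations in the homogeneous 3D point, and the solution is the last right singular vector. The cameras and point below are made up, and `triangulate` is an illustrative name.

```python
import numpy as np

def triangulate(x1, x2, P1, P2):
    """Linear triangulation of one point from two views.

    x1, x2: image points [x, y]; P1, P2: 3x4 camera matrices.
    """
    A = np.stack([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    w_h = Vt[-1]                 # homogeneous 3D point, defined up to scale
    return w_h[:3] / w_h[3]

# Hypothetical normalized cameras: the first at the origin, the second posed by (Omega, tau).
c, s = np.cos(0.1), np.sin(0.1)
Omega = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
tau = np.array([1.0, 0.0, 0.1])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([Omega, tau[:, None]])

w_true = np.array([0.4, -0.3, 5.0])
x1 = w_true[:2] / w_true[2]
w2 = Omega @ w_true + tau
x2 = w2[:2] / w2[2]

w_est = triangulate(x1, x2, P1, P2)
print(w_est)
```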

Page 33: 16 cv mil_multiple_cameras

Two view reconstruction pipeline

Reconstructed depth indicated by color

Page 34: 16 cv mil_multiple_cameras

Dense Reconstruction

• We’d like to compute a dense depth map (an estimate of the disparity at every pixel)

• Approaches to this include dynamic programming and graph cuts

• However, they all assume that the correct match for each point is on the same horizontal line.

• To ensure this is the case, we warp the images

• This process is known as rectification

Page 35: 16 cv mil_multiple_cameras

Structure

• Two view geometry
• The essential and fundamental matrices
• Reconstruction pipeline
• Rectification
• Multi-view reconstruction
• Applications

Page 36: 16 cv mil_multiple_cameras

Rectification

We have already seen one situation where the epipolar lines are horizontal and on the same line: when the camera movement is pure translation in the u direction.

Page 37: 16 cv mil_multiple_cameras

Planar rectification

Apply homographies $\Phi_1$ and $\Phi_2$ to images 1 and 2

Page 38: 16 cv mil_multiple_cameras

Planar rectification

• Start with $\Phi_2$, which breaks down as $\Phi_2 = T_3 T_2 T_1$

• Move origin to center of image ($T_1$)

• Rotate epipole to horizontal direction ($T_2$)

• Move epipole to infinity ($T_3$)
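The three steps (move the origin to the image center, rotate the epipole onto the horizontal axis, send the epipole to infinity) can be composed into a single homography. The sketch below is one assumed construction of the three matrices; the epipole and image center are made-up pixel values and `rectify_image2` is an illustrative name.

```python
import numpy as np

def rectify_image2(epipole, center):
    """Compose T3 @ T2 @ T1 for a finite epipole that is not at the image center."""
    e = np.asarray(epipole[:2], dtype=float) / epipole[2]
    ex, ey = e - np.asarray(center, dtype=float)
    # T1: move the origin to the center of the image.
    T1 = np.array([[1.0, 0.0, -center[0]],
                   [0.0, 1.0, -center[1]],
                   [0.0, 0.0, 1.0]])
    # T2: rotate by -theta so the epipole lies on the positive x axis.
    theta = np.arctan2(ey, ex)
    c, s = np.cos(theta), np.sin(theta)
    T2 = np.array([[c, s, 0.0],
                   [-s, c, 0.0],
                   [0.0, 0.0, 1.0]])
    # T3: send the epipole, now at (r, 0), to infinity along x.
    r = np.hypot(ex, ey)
    T3 = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [-1.0 / r, 0.0, 1.0]])
    return T3 @ T2 @ T1

epipole = np.array([900.0, 400.0, 1.0])   # hypothetical epipole outside the image
center = (320.0, 240.0)                   # hypothetical image center
Phi2 = rectify_image2(epipole, center)
mapped = Phi2 @ epipole                   # [r, 0, 0]: a point at infinity in the x direction
print(mapped)
```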

Page 39: 16 cv mil_multiple_cameras

Planar rectification

• There is a family of possible homographies that can be applied to image 1 to achieve the desired effect

• These can be parameterized as

• One way to choose this is to pick the parameter that makes the mapped points in each transformed image closest in a least squares sense

where

Page 40: 16 cv mil_multiple_cameras

Before rectification

Before rectification, the epipolar lines converge

Page 41: 16 cv mil_multiple_cameras

After rectification

After rectification the epipolar lines are horizontal and aligned with one another

Page 42: 16 cv mil_multiple_cameras

Polar rectification

Planar rectification does not work if epipole lies within the image.

Page 43: 16 cv mil_multiple_cameras

Polar rectification

Polar rectification works in this situation, but distorts the image more

Page 44: 16 cv mil_multiple_cameras

Dense Stereo

Page 45: 16 cv mil_multiple_cameras

Structure

• Two view geometry
• The essential and fundamental matrices
• Reconstruction pipeline
• Rectification
• Multi-view reconstruction
• Applications

Page 46: 16 cv mil_multiple_cameras

Multi-view reconstruction

Page 47: 16 cv mil_multiple_cameras

Multi-view reconstruction

Page 48: 16 cv mil_multiple_cameras

Reconstruction from video

1. Images are taken from the same camera, so we can also optimise for the intrinsic parameters (auto-calibration)

2. Matching points is easier as we can track them through the video

3. Not every point is within every image

4. Additional constraints on matching – three-view equivalent of fundamental matrix is tri-focal tensor

5. New ways of initialising all of the camera parameters simultaneously (factorisation algorithm)

Page 49: 16 cv mil_multiple_cameras

Bundle Adjustment

Bundle adjustment refers to the process of refining initial estimates of structure and motion using non-linear optimisation.

This problem has the least squares form:

where:

Page 50: 16 cv mil_multiple_cameras

Bundle Adjustment

This type of least squares problem is suited to optimisation techniques such as the Gauss-Newton method:

Where

The bulk of the work is inverting $J^T J$. To do this efficiently, we must exploit the structure within the matrix.
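To make the Gauss-Newton idea concrete, here is a deliberately small sketch that refines a single 3D point by minimizing its reprojection error in two cameras with known pose. This is a simplification of bundle adjustment, which optimizes all points and cameras jointly and exploits the sparsity of $J^T J$. The central-difference Jacobian, the made-up data and every function name are assumptions for illustration.

```python
import numpy as np

def project(w, Omega, tau):
    """Normalized pinhole projection of the 3D point w into a camera with pose (Omega, tau)."""
    p = Omega @ w + tau
    return p[:2] / p[2]

def residuals(w, observations, cameras):
    """Stacked reprojection errors z for one 3D point seen in several cameras."""
    return np.concatenate([project(w, Om, t) - x
                           for x, (Om, t) in zip(observations, cameras)])

def numerical_jacobian(fun, theta, eps=1e-6):
    """Central-difference Jacobian of fun at theta."""
    cols = []
    for k in range(len(theta)):
        step = np.zeros_like(theta)
        step[k] = eps
        cols.append((fun(theta + step) - fun(theta - step)) / (2 * eps))
    return np.column_stack(cols)

def gauss_newton(fun, theta, iterations=15):
    """Repeatedly apply theta <- theta - (J^T J)^{-1} J^T z."""
    for _ in range(iterations):
        z = fun(theta)
        J = numerical_jacobian(fun, theta)
        # Solve the normal equations instead of forming the inverse explicitly.
        theta = theta - np.linalg.solve(J.T @ J, J.T @ z)
    return theta

# Hypothetical two-camera setup and noise-free observations.
c, s = np.cos(0.1), np.sin(0.1)
Omega = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
cameras = [(np.eye(3), np.zeros(3)), (Omega, np.array([1.0, 0.0, 0.1]))]
w_true = np.array([0.4, -0.3, 4.0])
observations = [project(w_true, Om, t) for Om, t in cameras]

w_start = w_true + np.array([0.1, -0.05, 0.3])   # perturbed initial estimate
w_est = gauss_newton(lambda w: residuals(w, observations, cameras), w_start)
print(w_est)
```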

Page 51: 16 cv mil_multiple_cameras

Structure

• Two view geometry
• The essential and fundamental matrices
• Reconstruction pipeline
• Rectification
• Multi-view reconstruction
• Applications

Page 52: 16 cv mil_multiple_cameras

3D reconstruction pipeline

Page 53: 16 cv mil_multiple_cameras

Photo-Tourism

Page 54: 16 cv mil_multiple_cameras

Volumetric graph cuts

Page 55: 16 cv mil_multiple_cameras

Conclusions

• Given a set of photos of the same rigid object, it is possible to build an accurate 3D model of the object and reconstruct the camera positions

• Ultimately relies on a large-scale non-linear optimisation procedure.

• Works if optical properties of the object are simple (no specular reflectance etc.)