Agenda - Carnegie Mellon University16720.courses.cs.cmu.edu/lec/transformations.pdf164 Computer...

Agenda

• Rotations

• Camera calibration

• Homography

• RANSAC

164 Computer Vision: Algorithms and Applications (September 3, 2010 draft)

Transformation       Matrix           # DoF   Preserves
translation          [ I | t ] (2×3)  2       orientation
rigid (Euclidean)    [ R | t ] (2×3)  3       lengths
similarity           [ sR | t ] (2×3) 4       angles
affine               [ A ] (2×3)      6       parallelism
projective           [ H̃ ] (3×3)      8       straight lines

Table 3.5 Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed in the rows below it, i.e., similarity preserves not only angles but also parallelism and straight lines. The 2×3 matrices are extended with a third [0ᵀ 1] row to form a full 3×3 matrix for homogeneous coordinate transformations.

…examples of such transformations, which are based on the 2D geometric transformations shown in Figure 2.4. The formulas for these transformations were originally given in Table 2.1 and are reproduced here in Table 3.5 for ease of reference.

In general, given a transformation specified by a formula x' = h(x) and a source image f(x), how do we compute the values of the pixels in the new image g(x), as given in (3.88)? Think about this for a minute before proceeding and see if you can figure it out.

If you are like most people, you will come up with an algorithm that looks something like Algorithm 3.1. This process is called forward warping or forward mapping and is shown in Figure 3.46a. Can you think of any problems with this approach?

procedure forwardWarp(f, h, out g):

    For every pixel x in f(x):
        1. Compute the destination location x' = h(x).
        2. Copy the pixel f(x) to g(x').

Algorithm 3.1 Forward warping algorithm for transforming an image f(x) into an image g(x') through the parametric transform x' = h(x).
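A minimal NumPy sketch of Algorithm 3.1. The Python port, the function name `forward_warp`, the nearest-pixel rounding, and the 2× magnification used as the test transform are illustrative assumptions, not the book's code. Running it with a magnifying transform exposes one problem with the approach: most destination pixels are never written, which leaves holes in g.

```python
import numpy as np

def forward_warp(f, h, out_shape):
    """Forward warping in the spirit of Algorithm 3.1 (nearest-pixel splatting).

    f: 2D array (grayscale source image), indexed f[y, x].
    h: callable (x, y) -> (x', y'), the parametric transform.
    out_shape: (rows, cols) of the output image g.
    Pixels of g that no source pixel lands on are left at 0.
    """
    g = np.zeros(out_shape, dtype=f.dtype)
    rows, cols = f.shape
    for y in range(rows):
        for x in range(cols):
            xp, yp = h(x, y)                          # 1. destination x' = h(x)
            xi, yi = int(round(xp)), int(round(yp))   # snap to the nearest pixel
            if 0 <= yi < out_shape[0] and 0 <= xi < out_shape[1]:
                g[yi, xi] = f[y, x]                   # 2. copy f(x) to g(x')
    return g

f = np.arange(1, 10).reshape(3, 3)
g = forward_warp(f, lambda x, y: (2 * x, 2 * y), (6, 6))   # 2x magnification
print(g)   # only 9 of the 36 output pixels receive a value; the rest are holes
```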

Geometric Transformations

Let's define families of transformations by the properties that they preserve.

Rotations

Definition: an orthogonal transformation preserves dot products:

$$a^T b = T(a)^T T(b), \quad \text{where } T(a) = Aa,\ a \in \mathbb{R}^n,\ A \in \mathbb{R}^{n \times n}$$

Equivalently, orthogonal transformations are the linear transformations that preserve distances and angles.

$$a^T b = a^T A^T A b \iff A^T A = I$$

[One can conclude $A^TA = I$ by setting a, b to the coordinate vectors $e_i, e_j$, which gives $(A^TA)_{ij} = e_i^T e_j$.]

Defn: A is a rotation matrix if $A^TA = I$ and $\det(A) = 1$.
Defn: A is a reflection matrix if $A^TA = I$ and $\det(A) = -1$.

2D Rotations

$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

1 DOF
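A quick numerical check of the definitions above, assuming NumPy; `rot2d` and the sample vectors are hypothetical names and values chosen for illustration. Both a rotation and a reflection satisfy $A^TA = I$ and preserve dot products; only the sign of the determinant distinguishes them.

```python
import numpy as np

def rot2d(theta):
    """2D rotation by theta (radians), counter-clockwise; one parameter, so 1 DOF."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R = rot2d(0.3)
F = np.array([[1.0, 0.0],
              [0.0, -1.0]])          # reflection about the x-axis, for comparison

a, b = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
for A in (R, F):
    assert np.allclose(A.T @ A, np.eye(2))          # orthogonal: A^T A = I
    assert np.isclose(a @ b, (A @ a) @ (A @ b))     # dot products are preserved
print(np.linalg.det(R), np.linalg.det(F))           # approximately 1 and -1
```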

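3D Rotations

Think of R as a change of basis, where the rows $r_i = R(i,:)$ are orthonormal basis vectors (the axes $r_1, r_2, r_3$ of the rotated coordinate frame):

$$R\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

[Figure: rotated coordinate frame with axes r1, r2, r3]

How many DOFs?

3 = 2 (to point $r_1$) + 1 (to rotate about $r_1$)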

3D Rotations

Lots of parameterizations try to capture the 3 DOFs.

Helpful one for vision: axis-angle representation

Represent a 3D rotation with a unit vector that represents the axis of rotation, and an angle of rotation about that vector.

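Shears

$$A = \begin{bmatrix} 1 & h_{xy} & h_{xz} & 0 \\ h_{yx} & 1 & h_{yz} & 0 \\ h_{zx} & h_{zy} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

For example, $h_{xy}$ shears y into x: $x' = x + h_{xy}\,y$ (with the other off-diagonal terms zero).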

Rotations

• 3D rotations are fundamentally more complex than in 2D!
• 2D: amount of rotation
• 3D: amount and axis of rotation

[Figure: 2D rotation vs. 3D rotation]

05-3DTransformations.key - February 9, 2015

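Recall: cross-product

Dot product:

$$a \cdot b = \|a\|\,\|b\|\cos\theta$$

Cross product:

$$a \times b = \begin{vmatrix} i & j & k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} i - \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} j + \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} k$$

Cross product matrix:

$$a \times b = \hat{a}\,b = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$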
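The cross-product matrix can be checked against NumPy's built-in `np.cross`; the helper name `hat` is an assumption for this sketch. The matrix is skew-symmetric, $\hat{a}^T = -\hat{a}$.

```python
import numpy as np

def hat(a):
    """Cross-product matrix of a 3-vector: hat(a) @ b equals the cross product a x b."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3,  a2],
                     [ a3, 0.0, -a1],
                     [-a2,  a1, 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(hat(a) @ b)       # a x b = (-3, 6, -3)
print(np.cross(a, b))   # same vector from NumPy's cross product
```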

Approach

[Figure: x rotated by θ about the unit axis ω ∈ ℝ³, ‖ω‖ = 1, with x split into components x∥ and x⊥]

1. Write x as the sum of its components parallel and perpendicular to ω: $x = x_\parallel + x_\perp$.

2. Rotate the perpendicular component by a 2D rotation of θ in the plane orthogonal to ω.

$$R = I + \hat{\omega}\sin\theta + \hat{\omega}^2(1 - \cos\theta)$$

[Rx can be simplified to cross- and dot-product computations]
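A sketch of the axis-angle formula, assuming NumPy (`hat`, `axis_angle_to_R`, and `rotate` are hypothetical helper names). The second function spells out the bracketed remark: for a unit ω, $\hat{\omega}^2 x = \omega \times (\omega \times x) = \omega(\omega \cdot x) - x$, so Rx needs only a dot product and a cross product.

```python
import numpy as np

def hat(w):
    """Cross-product matrix: hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def axis_angle_to_R(omega, theta):
    """R = I + sin(theta) W + (1 - cos(theta)) W^2 with W = hat(omega), ||omega|| = 1."""
    omega = np.asarray(omega, dtype=float)
    W = hat(omega / np.linalg.norm(omega))      # enforce the unit-axis assumption
    return np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)

def rotate(x, omega, theta):
    """Same rotation with only dot and cross products:
    Rx = cos(theta) x + sin(theta) (omega x x) + (1 - cos(theta)) (omega . x) omega."""
    omega = np.asarray(omega, dtype=float) / np.linalg.norm(omega)
    x = np.asarray(x, dtype=float)
    return (np.cos(theta) * x + np.sin(theta) * np.cross(omega, x)
            + (1.0 - np.cos(theta)) * np.dot(omega, x) * omega)

R = axis_angle_to_R([0, 0, 1], np.pi / 2)     # 90 degrees about the z-axis
print(R @ np.array([1.0, 0.0, 0.0]))          # the x-axis goes to (approximately) the y-axis
```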

Exponential map

[Figure: the same axis-angle setup: x, θ, ω ∈ ℝ³ with ‖ω‖ = 1, x∥, x⊥]

$$R = \exp(\hat{v}) = I + \hat{v} + \frac{1}{2!}\hat{v}^2 + \dots, \quad \text{where } v = \omega\theta$$

[standard Taylor series expansion of exp(x) at x = 0: 1 + x + (1/2!)x² + …]

Implication: we can approximate the change in position due to a small rotation as v × x, where v = ωθ.
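A sketch comparing a truncated Taylor series of $\exp(\hat{v})$ with the closed-form axis-angle formula, and checking the small-rotation approximation $Rx - x \approx v \times x$, assuming NumPy. The 20-term truncation, the axis, and the angles are arbitrary choices for illustration.

```python
import numpy as np
from math import factorial

def hat(v):
    """Cross-product matrix: hat(v) @ x == np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_series(V, n_terms=20):
    """Truncated Taylor series exp(V) = I + V + V^2/2! + ... using n_terms terms."""
    out = np.zeros_like(V)
    P = np.eye(V.shape[0])          # current power V^k, starting at k = 0
    for k in range(n_terms):
        out = out + P / factorial(k)
        P = P @ V
    return out

omega = np.array([1.0, 2.0, 2.0]) / 3.0    # unit axis
theta = 0.7
v = omega * theta                          # v = omega * theta
R_exp = exp_series(hat(v))

W = hat(omega)
R_rod = np.eye(3) + np.sin(theta) * W + (1 - np.cos(theta)) * (W @ W)   # closed form

# Small rotation: R x - x is approximately v x x (error is second order in |v|)
v_small = 1e-3 * omega
x = np.array([1.0, 0.0, 0.0])
delta = exp_series(hat(v_small)) @ x - x
```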

Agenda

• Rotations

• Camera calibration

• Homography

• RANSAC

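Perspective projection

[Figure: center of projection (COP) at the origin of a right-handed x, y, z coordinate system; the 3D point (X, Y, Z) projects to the image point (x, y, 1)]

$$x = f\frac{X}{Z}, \qquad y = f\frac{Y}{Z}$$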

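Perspective projection revisited

Given (X, Y, Z) and f, compute (x, y) and λ:

$$\lambda\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

$$\lambda x = fX, \qquad \lambda = Z, \qquad x = \frac{\lambda x}{\lambda} = \frac{fX}{Z}$$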
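The computation above in NumPy, as a minimal sketch (`perspective_project` is a hypothetical name; the point and focal length are made-up values).

```python
import numpy as np

def perspective_project(P, f):
    """Project P = (X, Y, Z), Z > 0, with focal length f.

    Returns ((x, y), lam) with lam * [x, y, 1]^T = diag(f, f, 1) @ [X, Y, Z]^T,
    i.e. x = f X / Z, y = f Y / Z and lam = Z.
    """
    p = np.diag([f, f, 1.0]) @ np.asarray(P, dtype=float)   # (fX, fY, Z) = lam * (x, y, 1)
    lam = p[2]
    return p[:2] / lam, lam

xy, lam = perspective_project((2.0, 1.0, 4.0), f=2.0)
print(xy, lam)   # x = 2*2/4 = 1.0, y = 2*1/4 = 0.5, lambda = Z = 4.0
```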

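Special case: f = 1

[Figure: the COP, the ray through the image point (x, y, 1), and the 3D point (X, Y, Z)]

$$Z\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

Natural geometric intuition:

• A 3D point is obtained by scaling the ray pointed at the image coordinate.
• The scale factor is the true depth of the point.

[Aside: given an image with focal length f, resize by 1/f to obtain a unit-focal-length image]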

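Homogeneous notation

For now, think of the above as shorthand notation for

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} \sim \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \quad\equiv\quad \exists\,\lambda \text{ s.t. } \lambda\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$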
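A small sketch of this equivalence relation, assuming NumPy: for nonzero 3-vectors, $p \sim q$ holds exactly when p and q are parallel, which a cross product can test. The helper names `equivalent` and `dehomogenize` are hypothetical.

```python
import numpy as np

def equivalent(p, q, tol=1e-9):
    """p ~ q for nonzero 3-vectors: lambda * p = q for some lambda (p and q are parallel)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.linalg.norm(np.cross(p, q)) <= tol * np.linalg.norm(p) * np.linalg.norm(q)

def dehomogenize(p):
    """Divide by the last coordinate: (x, y, z) -> (x / z, y / z), assuming z != 0."""
    p = np.asarray(p, dtype=float)
    return p[:-1] / p[-1]

p = np.array([2.0, 4.0, 2.0])
q = -3.0 * p                       # same homogeneous point, different scale
print(equivalent(p, q), dehomogenize(p), dehomogenize(q))   # both dehomogenize to (1, 2)
```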

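Camera projection

$$\lambda\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

• $[X\ Y\ Z\ 1]^T$: 3D point in world coordinates
• $[r_{ij} \mid t]$: camera extrinsics (rotation and translation)
• Left 3×3 matrix: camera intrinsic matrix K (can include skew and non-square pixel size)

[Figure: world coordinate frame and camera frame with axes r1, r2, r3, offset by the translation T]

Aside: homogeneous notation is shorthand for $x = \frac{\lambda x}{\lambda}$.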

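Fancier intrinsics

• Non-square pixels: $x_s = s_x x$, $y_s = s_y y$
• Shifted origin: $x' = x_s + o_x$, $y' = y_s + o_y$
• Skewed image axes: $x'' = x' + s_\theta y'$

$$K = \begin{bmatrix} s_x & s_\theta & o_x \\ 0 & s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} f s_x & f s_\theta & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}$$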

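Notation

$$\lambda\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f s_x & f s_\theta & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = K_{3\times3}\begin{bmatrix} R_{3\times3} & T_{3\times1} \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = M_{3\times4}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

Claims (without proof):
1. A 3×4 matrix M can be a camera matrix iff det(A) is not zero, where A is the left 3×3 submatrix of M.
2. M is determined only up to a scale factor.

[Using MATLAB's rows × columns convention for matrix sizes]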
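A sketch of building $M = K[R \mid T]$ and projecting points with it, assuming NumPy. The helper names and the example intrinsics (f = 500, principal point (320, 240)) are illustrative assumptions. The last check reflects claim 2: scaling M leaves every projection unchanged.

```python
import numpy as np

def intrinsics(f, sx=1.0, sy=1.0, s_theta=0.0, ox=0.0, oy=0.0):
    """K = [[f*sx, f*s_theta, ox], [0, f*sy, oy], [0, 0, 1]]."""
    return np.array([[f * sx, f * s_theta, ox],
                     [0.0,    f * sy,      oy],
                     [0.0,    0.0,         1.0]])

def camera_matrix(K, R, T):
    """M = K [R | T], a 3x4 matrix."""
    return K @ np.column_stack([R, T])

def project(M, Pw):
    """Project an Nx3 array of world points with M; returns Nx2 pixel coordinates."""
    Pw = np.asarray(Pw, dtype=float)
    Ph = np.column_stack([Pw, np.ones(len(Pw))])   # rows [X, Y, Z, 1]
    p = Ph @ M.T                                    # rows lambda * [x, y, 1]
    return p[:, :2] / p[:, 2:3]

K = intrinsics(500.0, ox=320.0, oy=240.0)
M = camera_matrix(K, np.eye(3), np.array([0.0, 0.0, 5.0]))
print(project(M, [[1.0, 0.5, 0.0]]))   # camera coords (1, 0.5, 5) -> (500/5 + 320, 250/5 + 240)
```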

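Notation (more)

$$M_{3\times4}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \begin{bmatrix} A_{3\times3} & b_{3\times1} \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = A_{3\times3}\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + b_{3\times1}$$

$$M = \begin{bmatrix} m_1^T \\ m_2^T \\ m_3^T \end{bmatrix}, \quad A = \begin{bmatrix} a_1^T \\ a_2^T \\ a_3^T \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$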

Applying the projection matrix

$$\lambda = [X\ Y\ Z]\,a_3 + b_3, \qquad x = \frac{1}{\lambda}\left([X\ Y\ Z]\,a_1 + b_1\right), \qquad y = \frac{1}{\lambda}\left([X\ Y\ Z]\,a_2 + b_2\right)$$

Set of 3D points that project to x = 0: $[X\ Y\ Z]\,a_1 + b_1 = 0$

Set of 3D points that project to y = 0: $[X\ Y\ Z]\,a_2 + b_2 = 0$

Set of 3D points that project to x = ∞ or y = ∞: $[X\ Y\ Z]\,a_3 + b_3 = 0$

Rows of the projection matrix describe the 3 planes defined by the image coordinate system.

[Figure: the image plane, the COP, and the planes with normals a1, a2, a3]

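What's the set of (X, Y, Z) points that project to the same (x, y)?

[Figure: the ray from the COP through the image point (x, y) to the 3D point (X, Y, Z)]

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \lambda w + \bar{b}, \quad \text{where } w = A^{-1}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},\ \bar{b} = -A^{-1}b$$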

What's the position of the COP / pinhole?

$$A\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + b = 0 \;\Rightarrow\; \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = -A^{-1}b$$
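A numerical sketch of both results, assuming NumPy and a made-up camera (K, a small rotation about the y-axis, and T are illustrative values). It checks that $M[C;1] = 0$ for $C = -A^{-1}b$ (which equals $-R^TT$ when $M = K[R \mid T]$) and that a point on the back-projected ray re-projects to the pixel it came from. `backproject` is a hypothetical helper.

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])     # rotation about the y-axis
T = np.array([0.1, -0.2, 4.0])
M = K @ np.column_stack([R, T])

A, b = M[:, :3], M[:, 3]                          # M = [A | b]
C = -np.linalg.solve(A, b)                        # COP: A C + b = 0  =>  C = -A^{-1} b

def backproject(x, y, lam):
    """3D point lam * w + b_bar on the ray of pixel (x, y), where b_bar = -A^{-1} b = C."""
    w = np.linalg.solve(A, np.array([x, y, 1.0]))
    return lam * w + C
```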

Other geometric properties

Affine Cameras

Image coordinates (x, y) are an affine function of world coordinates (X, Y, Z):

$$m_3^T = [0\ 0\ 0\ 1], \qquad x = [X\ Y\ Z]\,a_1 + b_1, \qquad y = [X\ Y\ Z]\,a_2 + b_2$$

• Example: weak-perspective projection model
• Projection defined by 8 parameters
• Parallel lines are projected to parallel lines
• The transformation can be written as a direct linear transformation

Affine transformations = linear transformations plus an offset

Geometric Transformations

[Figure: nested families of transformations: Euclidean ⊂ Affine ⊂ Projective]

• Euclidean (translation + rotation): preserves lengths and angles
• Affine: preserves parallel lines
• Projective: preserves lines

Agenda

• Rotations

• Camera calibration

• Homography

• RANSAC

Calibration: recover M from scene points $P_1, \dots, P_N$ and the corresponding projections in the image plane $p_1, \dots, p_N$.

Find M that minimizes the distance between the actual points in the image, $p_i$, and their predicted projections $MP_i$.

Problems:
• The projection is (in general) non-linear.
• M is defined up to an arbitrary scale factor.

PnP = Perspective-n-Point

The math for the calibration procedure follows a recipe that is used in many (most?) problems involving camera geometry, so it's worth remembering:

Write the relation between image point, projection matrix, and point in space:

$$p_i \equiv M P_i$$

Write the non-linear relations between coordinates:

$$u_i = \frac{m_1^T P_i}{m_3^T P_i}, \qquad v_i = \frac{m_2^T P_i}{m_3^T P_i}$$

Make them linear:

$$m_1^T P_i - u_i\,(m_3^T P_i) = 0, \qquad m_2^T P_i - v_i\,(m_3^T P_i) = 0$$

Write them in matrix form:

$$\begin{bmatrix} P_i^T & 0^T & -u_i P_i^T \\ 0^T & P_i^T & -v_i P_i^T \end{bmatrix}\begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Put all the relations for all the points into a single matrix:

$$\underbrace{\begin{bmatrix} P_1^T & 0^T & -u_1 P_1^T \\ 0^T & P_1^T & -v_1 P_1^T \\ \vdots & \vdots & \vdots \\ P_N^T & 0^T & -u_N P_N^T \\ 0^T & P_N^T & -v_N P_N^T \end{bmatrix}}_{L}\underbrace{\begin{bmatrix} m_1 \\ m_2 \\ m_3 \end{bmatrix}}_{m} = 0$$

In the noise-free case: $Lm = 0$ (vector of 0's).

What about the noisy case?

$$\min_{\|m\|^2 = 1} \|Lm\|^2$$

Solution: the minimum right singular vector of L (or the eigenvector of $L^TL$ with the smallest eigenvalue).

Is this the right error to minimize? If not, what is?
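A sketch of the algebraic (DLT) recipe above, assuming NumPy; `calibrate_dlt`, the synthetic camera, and the random points are illustrative, not part of the original notes. With noise-free data and at least 6 points that are not all coplanar, the smallest right singular vector of L recovers M up to scale.

```python
import numpy as np

def calibrate_dlt(Pw, uv):
    """Algebraic (DLT) estimate of the 3x4 camera matrix M.

    Pw: Nx3 world points, uv: Nx2 observed pixels, N >= 6 points not all coplanar.
    Builds L (2N x 12) with rows [P_i^T, 0^T, -u_i P_i^T] and [0^T, P_i^T, -v_i P_i^T],
    then returns the unit-norm m minimizing ||L m||^2 (right singular vector of L with
    the smallest singular value), reshaped so that its rows are m1^T, m2^T, m3^T.
    """
    Pw, uv = np.asarray(Pw, dtype=float), np.asarray(uv, dtype=float)
    N = len(Pw)
    P = np.column_stack([Pw, np.ones(N)])               # homogeneous P_i^T as rows
    Z = np.zeros((N, 4))
    L = np.vstack([np.hstack([P, Z, -uv[:, [0]] * P]),  # (m1 - u_i m3) . P_i = 0
                   np.hstack([Z, P, -uv[:, [1]] * P])]) # (m2 - v_i m3) . P_i = 0
    _, _, Vt = np.linalg.svd(L)
    return Vt[-1].reshape(3, 4)

# Synthetic, noise-free check with a made-up camera and random points
rng = np.random.default_rng(0)
M_true = np.array([[500.0, 0.0, 320.0, 10.0],
                   [0.0, 500.0, 240.0, 20.0],
                   [0.0, 0.0, 1.0, 5.0]])
Pw = rng.uniform(-1.0, 1.0, size=(10, 3))
p = np.column_stack([Pw, np.ones(10)]) @ M_true.T
uv = p[:, :2] / p[:, 2:3]
M_est = calibrate_dlt(Pw, uv)
M_est = M_est * (M_true[2, 3] / M_est[2, 3])            # M is only defined up to scale
```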

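[Figure: calibration target with axes x, y, z; 3D points P1, …, Pi; observed image points (u1, v1), …, (ui, vi); predicted projection MPi]

Ideal error:

$$\text{Error}(M) = \sum_i \left(u_i - \frac{m_1 \cdot P_i}{m_3 \cdot P_i}\right)^2 + \left(v_i - \frac{m_2 \cdot P_i}{m_3 \cdot P_i}\right)^2$$

Initialize the nonlinear optimization with the "algebraic" solution.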
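The ideal (geometric) error can be evaluated directly; in practice it would be minimized with a nonlinear least-squares solver (for example Levenberg-Marquardt) started from the algebraic solution, which is not shown here. This sketch, assuming NumPy, only evaluates Error(M); the function name and test values are assumptions. The error is measured in pixels and does not change when M is rescaled.

```python
import numpy as np

def reprojection_error(M, Pw, uv):
    """Error(M) = sum_i (u_i - m1.P_i / m3.P_i)^2 + (v_i - m2.P_i / m3.P_i)^2."""
    Pw, uv = np.asarray(Pw, dtype=float), np.asarray(uv, dtype=float)
    P = np.column_stack([Pw, np.ones(len(Pw))])
    p = P @ M.T                              # rows (m1.P_i, m2.P_i, m3.P_i)
    pred = p[:, :2] / p[:, 2:3]
    return float(np.sum((uv - pred) ** 2))

M = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 5.0]])
Pw = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [-1.0, 1.0, 1.0]])
P = np.column_stack([Pw, np.ones(3)]) @ M.T
uv_exact = P[:, :2] / P[:, 2:3]              # perfect observations
print(reprojection_error(M, Pw, uv_exact + [1.0, 0.0]))   # each point 1 px off in u
```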

Radial Lens Distortions

[Figure: no distortion, barrel distortion, pincushion distortion]

Correcting Radial Lens Distortions

[Figure: before and after correction]

http://www.grasshopperonline.com/barrel_distortion_correction_software.html

Overall approach

Minimize the reprojection error Error(M, k's), where the k's parameterize the radial distortion.

Initialize with the algebraic solution (approaches in the literature are based on various assumptions).

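Revisiting homographies

Place the world coordinate frame on the object plane (Z = 0):

$$\lambda\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}\begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix}$$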

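Projection of planar points

Convert between a 2D location on the object plane and an image coordinate with a 3×3 matrix H (the derivation below holds for any intrinsic matrix K):

$$\lambda\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}\begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & t_x \\ r_{21} & r_{22} & t_y \\ r_{31} & r_{32} & t_z \end{bmatrix}\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} = \begin{bmatrix} f r_{11} & f r_{12} & f t_x \\ f r_{21} & f r_{22} & f t_y \\ r_{31} & r_{32} & t_z \end{bmatrix}\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$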
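A numerical check, assuming NumPy, that dropping the third rotation column gives the same projection for points on the plane Z = 0; the focal length, tilt, and translation are made-up values.

```python
import numpy as np

f = 800.0
K = np.diag([f, f, 1.0])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, c, -s],
              [0.0, s, c]])                         # plane tilted about the x-axis
t = np.array([0.2, -0.1, 3.0])

M = K @ np.column_stack([R, t])                     # full 3x4 projection
H = K @ np.column_stack([R[:, 0], R[:, 1], t])      # drop r3, the column multiplying Z = 0

X, Y = 0.5, -0.25
full = M @ np.array([X, Y, 0.0, 1.0])
planar = H @ np.array([X, Y, 1.0])
print(np.allclose(full, planar))                    # the 3x3 H reproduces the 3x4 projection
```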

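Two views of a plane

Image correspondences:

$$\lambda_1\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = H_1\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}, \qquad \lambda_2\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = H_2\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$

[Aside: H is usually invertible]

Solving the first equation for $[X\ Y\ 1]^T$ and substituting into the second:

$$\lambda\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = H_2 H_1^{-1}\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$

[LHS and RHS are related by a scale factor]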

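Computing homography projections

$$\lambda\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$

Given (x1, y1) and H, how do we compute (x2, y2)?

Is this operation linear in H or (x1, y1)?

$$x_2 = \frac{\lambda x_2}{\lambda} = \frac{a x_1 + b y_1 + c}{g x_1 + h y_1 + i}$$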
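A sketch of applying H to points, assuming NumPy (`apply_homography` is a hypothetical helper). It also makes the linearity question concrete: the map is not linear in (x1, y1) because of the division, but after multiplying out the denominator the constraint $x_2(g x_1 + h y_1 + i) = a x_1 + b y_1 + c$ is linear in the entries of H.

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 points (x1, y1) to (x2, y2): both numerators divided by g x1 + h y1 + i."""
    pts = np.asarray(pts, dtype=float)
    q = np.column_stack([pts, np.ones(len(pts))]) @ H.T   # rows lambda * [x2, y2, 1]
    return q[:, :2] / q[:, 2:3]

H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 1.0]])     # g = 0.5: a purely projective distortion
print(apply_homography(H, [[2.0, 4.0]]))   # (2, 4) / (0.5*2 + 1) = (1, 2)
```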

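Estimating homographies

Given corresponding 2D points in the left and right images, estimate H.

How many corresponding points are needed? How many degrees of freedom does H have?

Multiplying out the denominator gives equations that are linear in the entries of H:

$$x_2(g x_1 + h y_1 + i) = a x_1 + b y_1 + c$$

$$\vdots$$

Homogeneous linear system: $A\,H(:) = [0\ 0\ \cdots]^T$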

Estimating homographies

H is determined only up to a scale factor (8 DOFs), so we need 4 points minimum. How do we handle more points?

$$\min_{\|H(:)\|^2 = 1} \|A\,H(:)\|^2$$

Solution: the minimum right singular vector of A (the eigenvector of $A^TA$ with the smallest eigenvalue).
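A sketch of the DLT estimate, assuming NumPy. The name `compute_h` and the 2×N input layout mirror the assignment's `computeH(p1, p2)` description, but this is an illustrative implementation, not the reference solution. H is flattened row-major here (NumPy's default), whereas MATLAB's H(:) is column-major, so the column order of A differs from a MATLAB version. Normalizing the point coordinates first (Hartley normalization) usually improves conditioning on real data; it is omitted for brevity.

```python
import numpy as np

def compute_h(p1, p2):
    """DLT estimate of H with [x1, y1, 1]^T ~ H [x2, y2, 1]^T.

    p1, p2: 2xN arrays of corresponding (x, y) points, N >= 4, no 3 collinear.
    Each pair contributes the two rows
        [x2, y2, 1, 0, 0, 0, -x1*x2, -x1*y2, -x1]
        [0, 0, 0, x2, y2, 1, -y1*x2, -y1*y2, -y1]
    of A (2N x 9); h minimizes ||A h||^2 subject to ||h|| = 1.
    """
    x1, y1 = np.asarray(p1, dtype=float)
    x2, y2 = np.asarray(p2, dtype=float)
    z, o = np.zeros_like(x2), np.ones_like(x2)
    A = np.vstack([np.column_stack([x2, y2, o, z, z, z, -x1 * x2, -x1 * y2, -x1]),
                   np.column_stack([z, z, z, x2, y2, o, -y1 * x2, -y1 * y2, -y1])])
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)       # row-major: h = [a, b, c, d, e, f, g, h, i]
    return H / H[2, 2]             # fix the free scale (assumes H[2, 2] != 0)

# Made-up ground truth and points, used only to exercise the function
H_true = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [1e-3, 2e-3, 1.0]])
p2 = np.array([[0.0, 100.0, 100.0, 0.0, 50.0, 20.0],
               [0.0, 0.0, 100.0, 100.0, 30.0, 80.0]])
q = H_true @ np.vstack([p2, np.ones(6)])
p1 = q[:2] / q[2]
H_est = compute_h(p1, p2)
```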

"Frontalizing" planes using homographies

Estimate the homography from (at least) 4 pairs of corresponding points (e.g., the corners of a quad/rectangle).

Apply the homography to all (x, y) coordinates inside the target rectangle to compute the source pixel locations.
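Applying H from target to source coordinates, as described above, is inverse warping. A nearest-neighbour sketch, assuming NumPy; `inverse_warp` and `H_t2s` (the homography from output coordinates to source coordinates) are hypothetical names. Because the computation runs over output pixels, every output pixel that maps inside the source receives a value.

```python
import numpy as np

def inverse_warp(src, H_t2s, out_shape):
    """Inverse warping with nearest-neighbour sampling.

    For every output pixel (x, y), compute the source location H_t2s [x, y, 1]^T
    (after dividing by the third coordinate) and copy src at the nearest pixel.
    Output pixels that map outside src are set to 0.
    """
    rows, cols = out_shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])   # 3 x (rows*cols)
    q = H_t2s @ pts
    sx = np.rint(q[0] / q[2]).astype(int)
    sy = np.rint(q[1] / q[2]).astype(int)
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out = np.zeros(rows * cols, dtype=src.dtype)
    out[valid] = src[sy[valid], sx[valid]]
    return out.reshape(rows, cols)

src = np.arange(1, 10).reshape(3, 3)
H_t2s = np.diag([0.5, 0.5, 1.0])       # output (x, y) samples source (x/2, y/2): a 2x enlargement
out = inverse_warp(src, H_t2s, (6, 6))
print(out)                             # every output pixel receives a source value
```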

Special case of 2 views: rotations about camera center

LECTURE 4. PLANAR SCENES AND HOMOGRAPHY 5

…cues (parallax) can only be recovered when T is nonzero. Looking at the homography equation, the limit of H as d approaches infinity is R. Thus any pair of images of an arbitrary scene captured by a purely rotating camera is related by a planar homography.

A planar panorama can be constructed by capturing many overlapping images at different rotations, picking an image to be a reference, and then finding corresponding points between the overlapping images. The pairwise homographies are derived from the corresponding points, forming a mosaic that typically is shaped like a "bow-tie," as images farther away from the reference are warped outward to fit the homography. The figure below is from Pollefeys and Hartley & Zisserman.

4.7. Second Derivation of Homography Constraint

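The homography constraint, element by element, in homogeneous coordinates is as follows:

$$\begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{bmatrix}\begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}, \qquad x_2 \sim H x_1$$

In inhomogeneous coordinates ($x_2' = x_2/z_2$ and $y_2' = y_2/z_2$),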

Can be modeled as planar transformations, regardless of scene geometry!

(a) incline L.jpg (img1) (b) incline R.jpg (img2) (c) img2 warped to img1’s frame

Figure 5: Example output for Q6.1: original images img1 and img2 (left and center) and img2 warped to fit img1 (right). Notice that the warped image clips out of the image. We will fix this in Q6.2.

H2to1=computeH(p1,p2)

Inputs: p1 and p2 should be 2×N matrices of corresponding $(x, y)^T$ coordinates between two images.

Outputs: H2to1 should be a 3×3 matrix encoding the homography that best matches the linear equation derived above for Equation 8 (in the least-squares sense). Hint: remember that a homography is only determined up to scale. The MATLAB functions eig() or svd() will be useful. Note that this function can be written without an explicit for-loop over the data points.

6 Stitching it together: Panoramas (30 pts)

We can also use homographies to create a panorama image from multiple views of the same scene. This is possible, for example, when there is no camera translation between the views (e.g., only rotation about the camera center), as we saw in Q4.2.

First, you will generate panoramas from matched point correspondences between images, using the BRIEF matching you implemented in Q2.4. We will assume that there is no error in your matched point correspondences between images (although there might be some errors).

In the next section you will extend the technique to use (potentially noisy) keypoint matches.

You will need to use the provided function warp_im = warpH(im, H, out_size), which warps image im using the homography transform H. The pixels in warp_im are sampled at coordinates in the rectangle (1, 1) to (out_size(2), out_size(1)). The coordinates of the pixels in the source image are taken to be (1, 1) to (size(im,2), size(im,1)) and transformed according to H.

• Q6.1 (15pts) In this problem you will implement and use the function (stub provided in matlab/imageStitching.m):

[panoImg] = imageStitching(img1, img2, H2to1)

on two images from the Duquesne incline. This function accepts two images and the output from the homography estimation function. This function will:

Figure 6: Final panorama view. With homography estimated with RANSAC.

• a folder matlab containing all the .m and .mat files you were asked to write and generate

• a pdf named writeup.pdf containing the results, explanations and images asked for in the assignment, along with the answers to the questions on homographies.

Submit all the code needed to make your panorama generator run. Make sure all the .m files that need to run are accessible from the matlab folder without any editing of the path variable. If you downloaded and used a feature detector for the extra credit, include the code with your submission and mention it in your writeup. You may leave the data folder in your submission, but it is not needed. Please zip your homework as usual and submit it using blackboard.

Appendix: Image Blending

Note: This section is not for credit and is for informational purposes only.

For overlapping pixels, it is common to blend the values of both images. You can simply average the values, but that will leave a seam at the edges of the overlapping images. Alternatively, you can obtain a blending value for each image that fades one image into the other. To do this, first create a mask like this for each image you wish to blend:

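mask = zeros(size(im,1), size(im,2));                            % one weight per pixel of im
mask(1,:) = 1; mask(end,:) = 1; mask(:,1) = 1; mask(:,end) = 1;  % mark the border pixels
mask = bwdist(mask, 'cityblock');                                % city-block distance to the nearest border pixel
mask = mask/max(mask(:));                                        % normalize to [0, 1]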

The function bwdist computes the distance transform of the binarized input image, so this mask will be zero at the borders and 1 at the center of the image. You can warp this mask just as you warped your images. How would you use the mask weights to compute a linear combination of the pixels in the overlap region? Your function should behave well where one or both of the blending constants are zero.
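One possible answer, sketched in NumPy (not for credit and not an official solution): take the per-pixel normalized weighted average $(w_1 I_1 + w_2 I_2)/(w_1 + w_2)$, which reduces to the single image where only one weight is nonzero, and output 0 where both weights are zero. `border_weight` is a hypothetical NumPy stand-in for the MATLAB mask above, for an unwarped rectangular image of at least 3×3 pixels; for a rectangle, the city-block distance to the nearest border pixel is simply the distance to the nearest edge.

```python
import numpy as np

def border_weight(rows, cols):
    """City-block distance to the nearest border pixel, normalized to [0, 1] (rows, cols >= 3)."""
    i = np.arange(rows)[:, None]
    j = np.arange(cols)[None, :]
    d = np.minimum(np.minimum(i, rows - 1 - i), np.minimum(j, cols - 1 - j)).astype(float)
    return d / d.max()

def blend(im1, im2, w1, w2):
    """Per-pixel (w1*im1 + w2*im2) / (w1 + w2); 0 where both weights are 0."""
    num = w1 * im1 + w2 * im2
    den = w1 + w2
    out = np.zeros_like(num, dtype=float)
    np.divide(num, den, out=out, where=den > 0)   # skip pixels with no weight at all
    return out

w1 = np.array([[1.0, 0.0], [1.0, 0.0]])
w2 = np.array([[1.0, 1.0], [0.0, 0.0]])
print(blend(np.full((2, 2), 10.0), np.full((2, 2), 20.0), w1, w2))
```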


Derivation


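$$\begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix} = R\begin{bmatrix} X_1 \\ Y_1 \\ Z_1 \end{bmatrix}$$

$$\lambda_2\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} f_2 & 0 & 0 \\ 0 & f_2 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{K_2}\begin{bmatrix} X_2 \\ Y_2 \\ Z_2 \end{bmatrix}$$

Similarly, $\lambda_1 [x_1\ y_1\ 1]^T = K_1 [X_1\ Y_1\ Z_1]^T$. Substituting $[X_1\ Y_1\ Z_1]^T = \lambda_1 K_1^{-1}[x_1\ y_1\ 1]^T$ gives

$$\lambda\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = K_2 R K_1^{-1}\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$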
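A numerical check of $H = K_2 R K_1^{-1}$, assuming NumPy and made-up intrinsics and rotation. The 3D points are placed at very different depths; the check passes regardless, which is the point that a camera rotating about its center induces a homography independent of scene depth.

```python
import numpy as np

rng = np.random.default_rng(1)
K1 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[900.0, 0.0, 300.0], [0.0, 900.0, 250.0], [0.0, 0.0, 1.0]])
a = 0.15
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])           # camera 2 = camera 1 rotated, same center

def to_pixels(K, P):
    """Project Nx3 camera-frame points with intrinsics K; returns Nx2 pixels."""
    p = P @ K.T
    return p[:, :2] / p[:, 2:3]

# Points in camera-1 coordinates at depths between 2 and 50 units
P1 = np.column_stack([rng.uniform(-1, 1, 5), rng.uniform(-1, 1, 5), rng.uniform(2, 50, 5)])
P2 = P1 @ R.T                                          # [X2 Y2 Z2]^T = R [X1 Y1 Z1]^T

H = K2 @ R @ np.linalg.inv(K1)
x1 = np.column_stack([to_pixels(K1, P1), np.ones(5)])  # homogeneous pixels in image 1
q = x1 @ H.T
x2_pred = q[:, :2] / q[:, 2:3]
print(np.abs(x2_pred - to_pixels(K2, P2)).max())       # ~0 up to rounding error
```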

Take-home points for homographies

• If camera rotates about its center, then the images are related by a homography irrespective of scene depth.

• If the scene is planar, then images from any two cameras are related by a homography.

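• A homography mapping is a 3×3 matrix with 8 degrees of freedom:

$$\lambda\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$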

Matching features

What do we do about the “bad” matches?


General problem: we are trying to fit a (geometric) model to noisy data

How about we choose the average vector (least-squares soln)? Why will/won’t this work?

Let's generalize the problem a bit

Estimate the best model (a line) that fits data $\{x_i, y_i\}$:

$$\min_{w,b} \sum_i \left(y_i - f_{w,b}(x_i)\right)^2, \qquad f_{w,b}(x_i) = w x_i + b$$

Let's generalize the problem a bit

[Figure: "least-squares" solution]
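A least-squares line fit in NumPy makes the question about averaging concrete (`fit_line_lsq` and the data are illustrative). A single gross outlier is enough to drag the fitted line far from the true one, which is why a least-squares average over all matches fails when some matches are bad.

```python
import numpy as np

def fit_line_lsq(x, y):
    """Least-squares line: argmin over (w, b) of sum_i (y_i - (w x_i + b))^2."""
    A = np.column_stack([x, np.ones_like(x)])
    (w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return w, b

x = np.arange(10.0)
y = 2.0 * x + 1.0                # points exactly on y = 2x + 1
w, b = fit_line_lsq(x, y)        # recovers (2, 1)

y_bad = y.copy()
y_bad[9] = -50.0                 # one "bad match"
w_bad, b_bad = fit_line_lsq(x, y_bad)
print(w, b, w_bad, b_bad)        # the slope of the second fit is now negative
```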

RANSAC Line Fitting Example

1. Sample two points.
2. Fit a line.
3. Count the total number of points within a threshold of the line.
4. Repeat until we get a good result.
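A sketch of these four steps for a line, assuming NumPy. The parameters `n_iters`, `thresh`, and `seed`, the use of vertical rather than perpendicular distance, and the final least-squares refit on the largest inlier set are illustrative choices; the refit mirrors step 5 of the RANSAC loop for transformations.

```python
import numpy as np

def ransac_line(x, y, n_iters=200, thresh=0.5, seed=0):
    """RANSAC for a line y = w x + b; returns (w, b, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(x), size=2, replace=False)   # 1. sample two points
        if x[i] == x[j]:
            continue                       # vertical pair: not of the form y = w x + b
        w = (y[j] - y[i]) / (x[j] - x[i])                  # 2. fit the line exactly
        b = y[i] - w * x[i]
        inliers = np.abs(y - (w * x + b)) < thresh         # 3. count points near the line
        if inliers.sum() > best.sum():                     # keep the largest inlier set
            best = inliers
    A = np.column_stack([x[best], np.ones(best.sum())])    # refit on all inliers
    (w, b), *_ = np.linalg.lstsq(A, y[best], rcond=None)
    return w, b, best

x = np.arange(20.0)
y = 2.0 * x + 1.0
y[[3, 7, 15]] = [40.0, -30.0, 100.0]    # three gross outliers
w, b, inliers = ransac_line(x, y)
print(w, b, inliers.sum())              # close to 2, 1 with 17 inliers
```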

RAndom SAmple Consensus

Select one match, count inliers

Least squares fit

Find “average” translation vector for the largest group of inliers

RANSAC for estimating transformation

RANSAC loop:
1. Select feature pairs (at random).
2. Compute transformation T (exact).
3. Compute inliers (point matches where $\|p_i' - T p_i\|^2 < \varepsilon$).
4. Keep the largest set of inliers.

5. Re-compute the least-squares estimate of transformation T using all of the inliers.


Recall homography estimation from four exact pairs: $Ah = 0$, with $A \in \mathbb{R}^{8\times9}$, $h \in \mathbb{R}^9$, $0 \in \mathbb{R}^8$. How do we estimate H with all of the inlier points?

RANSAC for alignment

Planar object recognition (what transformation is used; how many pairs must be selected in the initial step?)
