3D Vision – Real Objects, Real Cameras · 2017. 2. 21. · Without a 3-D model of the world,...

3D Vision – Real Objects, Real Cameras
Chapter 11 (parts of), 12 (parts of)
Computerized Image Analysis 2
Anders Brun, [email protected]

3D Vision

!  Philosophy
!  Image formation
"  The pinhole camera
"  Projective geometry
"  Artefacts and challenges
!  Camera calibration
!  Stereo vision
!  Structured light

Philosophy: Why 3-D?

!  Why do we model things in 3-D?
!  Without a 3-D model of the world, events are more difficult to predict! Movement, grasping, collision estimation, real size estimation, …
!  Example, 2-D: a car on the highway looks bigger and appears to drive faster as it approaches
!  3-D: a car on the highway has constant size and speed as it approaches

[Figure: two views of the scene, with axes (x, z) and (x, y)]

Philosophy: 3-D cues …

Photo: Greg Keene

• Shape from:
  • Focus
  • Lighting
  • Stereo
  • Structured light
  • …

Philosophy: 3-D cues …

Philosophy: Marr and 2.5-D

!  Primal sketch: Edges and areas
!  2.5-D sketch: Texture and depth
!  3-D model: A hierarchical 3-D model of the world

Teddy dataset, from http://cat.middlebury.edu/

Philosophies

!  Build an accurate 3-D world representation
1.  Build a complete 3-D model of the scene
2.  Plan the task using the 3-D model
3.  Example: Build a model of the scene, then find the teddy bear and send a robot arm to grab it.

!  Plan as you go, act and react
1.  Collect features from the scene
2.  Use the features to guide your actions
3.  Example: Find the teddy bear using template matching in the image, then send the robot hand in that direction. Possibly take more images when halfway.

Passive, Active and Dynamic Vision

!  Passive vision: "  The camera has a fixed location

!  Dynamic vision: "  The camera is moving but cannot be steered

!  Active vision: "  The camera can be steered

The pinhole camera

!  The Pinhole camera is an idealized model

!  A real aperture is not a point.
!  A real aperture has a non-vanishing area and typically also a lens…

The pinhole camera model

!  Where is the point P projected on the image plane inside the camera?

[Figure: point P = (X, Y, Z), the focal point or origin (the “pinhole”), and the image plane at distance f; x is the image coordinate of P]

$$ x = -\frac{fX}{Z} $$

The pinhole camera model (alternative)

!  Imagine an observer is located at the focal point
!  A screen is placed at distance f from the observer.
!  Where on this screen is P projected?

[Figure: point P = (X, Y, Z), the focal point (the “observer”), and the screen at distance f = focal length; x is the screen coordinate of P]

$$ x = +\frac{fX}{Z} $$

$$ y = +\frac{fY}{Z} $$

The pinhole camera model

!  In the pinhole camera, the world appears to be upside down (or 180° rotated).

!  The alternative interpretation is useful in computer graphics. It tells you exactly where to draw P on a screen, in front of the observer, in order to make it appear real for the observer. (Note the change of sign.)

!  The alternative interpretation leads directly to “projective geometry”.
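The two interpretations differ only in the sign of the projected coordinates. The sketch below is a minimal illustration, not code from the course: the function name `project_pinhole` and the flag `in_front` are hypothetical, and the y coordinate of the inverted image is assumed to follow the same rule as x (y = −fY/Z), in line with the 180° rotation mentioned above.

```python
def project_pinhole(point, f, in_front=False):
    """Project a 3-D point (X, Y, Z) with focal length f.

    in_front=False: image plane behind the pinhole, x = -f*X/Z (inverted image).
    in_front=True:  screen in front of the observer, x = +f*X/Z.
    """
    X, Y, Z = point
    if Z == 0:
        raise ValueError("point lies in the plane of the pinhole (Z = 0)")
    sign = 1.0 if in_front else -1.0
    return (sign * f * X / Z, sign * f * Y / Z)


P = (2.0, 1.0, 4.0)
print(project_pinhole(P, f=2.0))                 # (-1.0, -0.5)
print(project_pinhole(P, f=2.0, in_front=True))  # (1.0, 0.5)
```

Doubling Z halves both coordinates, which is why an approaching car looks bigger in the image.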

Projective Geometry (Very Briefly)

!  Points in 2-D are represented by lines in 3-D
!  The 3-D space is called the embedding space
!  All points along a line are equivalent
!  This is analogous to a photograph: every point (position) in a photograph (2-D) corresponds to a line or ray in reality (3-D)

[Figure: an equivalence class, with x and α x on the same line]


!  We can convert points in the ordinary plane to the projective plane
!  2-D (x, y) → 3-D (x, y, 1)
!  In general: D-dimensional → (D+1)-dimensional
!  Points x and α x are equivalent, α ≠ 0

[Figure: the plane at height 1 in the embedding space; x and α x belong to the same equivalence class]
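The conversion between ordinary and projective coordinates can be sketched in a few lines. The helper names `to_homogeneous` and `from_homogeneous` are hypothetical, chosen only for this illustration.

```python
def to_homogeneous(p):
    """Embed a D-dimensional point in (D+1) dimensions: (x, y) -> (x, y, 1)."""
    return tuple(p) + (1.0,)


def from_homogeneous(ph):
    """Map any representative of an equivalence class back to the ordinary point."""
    *coords, w = ph
    if w == 0:
        raise ValueError("last coordinate is zero: no point in the ordinary plane")
    return tuple(c / w for c in coords)


x = to_homogeneous((3.0, 2.0))        # (3.0, 2.0, 1.0)
scaled = tuple(4.0 * c for c in x)    # (12.0, 8.0, 4.0), i.e. alpha * x with alpha = 4
print(from_homogeneous(x), from_homogeneous(scaled))  # both give (3.0, 2.0)
```

Both representatives map back to the same 2-D point, which is what "x and α x are equivalent" means in practice.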


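[Figure: a point x in the plane at height 1 is mapped by the (linear) transformation H to x’]

$$ \alpha \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} $$

⇔

$$ x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}} $$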


!  Homography, a map from (D+1)-dim to (D+1)-dim
!  Linear in the (D+1)-dim embedding space
!  x’ = H x
!  Represents a perspective transformation in D-dim space
!  This is very nice!
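A small sketch of what "linear in the embedding space" means in practice: multiply by H in 3-D, then divide by the last coordinate to return to the plane. The function name `apply_homography` and the example matrix are assumptions made for illustration only.

```python
def apply_homography(H, point):
    """Apply a 3x3 homography H to a 2-D point (x, y)."""
    x, y = point
    xh = (x, y, 1.0)                                    # embed in 3-D
    u, v, w = (sum(H[r][c] * xh[c] for c in range(3)) for r in range(3))
    if w == 0:
        raise ValueError("point is mapped to infinity")
    return (u / w, v / w)                               # divide out alpha


H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [0.25, 0.0, 1.0]]
print(apply_homography(H, (4.0, 3.0)))                  # (3.0, 1.5)

# Scaling H by a non-zero factor does not change the mapping.
H3 = [[3.0 * v for v in row] for row in H]
print(apply_homography(H3, (4.0, 3.0)))                 # (3.0, 1.5)
```

Because H and any non-zero multiple of H give the same mapping, a 3x3 homography has only 8 degrees of freedom.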



!  Using homographies, we can express a rich class of transformations using linear mappings

Identity: $H = I$
Similarity: $H = \begin{pmatrix} sR & -Rt \\ 0 & 1 \end{pmatrix}$
Isometric: $H = \begin{pmatrix} R & -Rt \\ 0 & 1 \end{pmatrix}$
Affine: $H = \begin{pmatrix} A & t \\ 0 & 1 \end{pmatrix}$
Perspective: $\det(H) \neq 0$

Perspective Transformations

!  Remember this example? We wanted to compute the perspective transformation parameters.

From Feature based methods for structure and motion estimation by P. H. S. Torr and A. Zisserman


$$ x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}} $$

!  Estimating H from point correspondences (simplified version, check the book for a more advanced version)

!  Each point correspondence translates to 2 linear equations (in the coefficients of H)

!  Assuming h33 =1, we need 4 corresponding 2-D point pairs (x,y,x’,y’) to solve this equation system (8 unknowns).

!  This way of solving for the parameters has severe practical disadvantages, but it shows that it is possible at least...

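⇔

$$ h_{31}xx' + h_{32}yx' + h_{33}x' - h_{11}x - h_{12}y - h_{13} = 0 $$

$$ h_{31}xy' + h_{32}yy' + h_{33}y' - h_{21}x - h_{22}y - h_{23} = 0 $$

⇔

$$ \begin{pmatrix} P_x(x,y,x',y') \\ P_y(x,y,x',y') \end{pmatrix} h = \begin{pmatrix} 0 \\ 0 \end{pmatrix} $$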
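The simplified estimation can be sketched as follows: with h33 = 1, each correspondence contributes two rows of an 8x8 linear system in the remaining coefficients. This is only an illustration of the simplified version described above, not the more advanced method in the book. The function names and the example correspondences are assumptions, and the linear solver is a plain Gaussian elimination written out so that the block has no external dependencies.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def estimate_homography(pairs):
    """Estimate H (with h33 = 1) from exactly four pairs ((x, y), (x', y'))."""
    if len(pairs) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    A, b = [], []
    for (x, y), (xp, yp) in pairs:
        # Unknown order: h11, h12, h13, h21, h22, h23, h31, h32
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp])
        b.append(yp)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


# Hypothetical correspondences, generated from a known homography.
pairs = [((0.0, 0.0), (2.0, 0.0)),
         ((4.0, 0.0), (3.0, 0.0)),
         ((0.0, 4.0), (2.0, 4.0)),
         ((4.0, 4.0), (3.0, 2.0))]
H = estimate_homography(pairs)
for row in H:
    print([round(v, 6) for v in row])
```

One of the practical disadvantages is visible in the code: fixing h33 = 1 fails whenever the true h33 is zero or close to it, and with exactly four points there is no averaging of measurement noise.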


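!  A cleaner and more stable solution
!  Multiply both sides with the “cross product matrix”

$$ \alpha \begin{pmatrix} 0 & -1 & y' \\ 1 & 0 & -x' \\ -y' & x' & 0 \end{pmatrix} \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 & y' \\ 1 & 0 & -x' \\ -y' & x' & 0 \end{pmatrix} \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} $$

⇔

$$ 0 = \begin{pmatrix} 0 & -1 & y' \\ 1 & 0 & -x' \\ -y' & x' & 0 \end{pmatrix} \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} $$

⇔

$$ 0 = Q(x,y,x',y')\,h $$

“Now three equations killing two unknowns”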
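To make Q concrete, the sketch below builds its three rows for one correspondence and checks that Q h = 0 for a homography that maps the point correctly. The function names and the ordering of h as (h11, h12, h13, h21, …, h33) are assumptions made for this illustration. Note that the cross product matrix has rank 2, so the three rows are linearly dependent.

```python
def cross_product_matrix(xp, yp):
    """Matrix C such that C v equals the cross product of (x', y', 1) with v."""
    return [[0.0, -1.0, yp],
            [1.0, 0.0, -xp],
            [-yp, xp, 0.0]]


def build_Q(x, y, xp, yp):
    """3x9 matrix Q with Q h = C H (x, y, 1)^T, h = (h11, h12, ..., h33)."""
    p = [x, y, 1.0]
    C = cross_product_matrix(xp, yp)
    # The coefficient of h_jk in row r is C[r][j] * p[k].
    return [[C[r][j] * p[k] for j in range(3) for k in range(3)]
            for r in range(3)]


# A hypothetical homography that maps (4, 4) to (3, 2).
h = [1.0, 0.0, 2.0,
     0.0, 1.0, 0.0,
     0.25, 0.0, 1.0]
Q = build_Q(4.0, 4.0, 3.0, 2.0)
residual = [sum(q * v for q, v in zip(row, h)) for row in Q]
print(residual)   # all three entries are zero for a consistent correspondence
```

Stacking the rows of Q for all correspondences gives a homogeneous system in h that does not require fixing h33 = 1.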

Single perspective camera

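[Figure: camera centre C, image origin O_i, focal length f, a 3-D point X and its projection u]

$$ \alpha u = \underbrace{\begin{pmatrix} f & s & -w_0 \\ 0 & g & -v_0 \\ 0 & 0 & 1 \end{pmatrix}}_{\text{internal parameters}} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \underbrace{\begin{pmatrix} R & -Rt \\ 0^T & 1 \end{pmatrix}}_{\text{external parameters}} X $$

$$ \alpha u = MX $$

M: Projection matrix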
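The factorisation of M can be sketched directly from the three matrices above. The function names and the numeric values are assumptions chosen for illustration. Since the external matrix maps X to R(X − t), the vector t is interpreted here as the camera position in world coordinates.

```python
def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def projection_matrix(f, g, s, w0, v0, R, t):
    """Build the 3x4 matrix M = K [I | 0] [[R, -Rt], [0^T, 1]]."""
    K = [[f, s, -w0],
         [0.0, g, -v0],
         [0.0, 0.0, 1.0]]                                  # internal parameters
    P0 = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0]]
    Rt = [sum(R[i][k] * t[k] for k in range(3)) for i in range(3)]
    E = [list(R[i]) + [-Rt[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]  # external
    return matmul(matmul(K, P0), E)


def project(M, X):
    """Project a 3-D point X = (X, Y, Z) and divide out alpha."""
    Xh = list(X) + [1.0]
    u, v, w = (sum(M[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M = projection_matrix(f=2.0, g=2.0, s=0.0, w0=0.0, v0=0.0,
                      R=identity, t=[0.0, 0.0, -5.0])
print(project(M, (1.0, 2.0, 5.0)))   # the point is 10 units in front of the camera
```

With R = I, s = 0 and w0 = v0 = 0 this reduces to the "observer" form of the pinhole model, x = fX/Z, with Z measured from the camera position.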


!  Estimation of M from known coordinates (X,Y,Z,1) and their projections (x’,y’,1) in a camera
!  This is analogous to the homographic projection
!  Algorithms exist to solve this with 6 correspondences

$$ \alpha \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} $$


!  This enables calibration from 6 known points
!  M can be factored: You can estimate camera focal length, image coordinate systems, camera position and rotation.
!  Triangulation: If you know several M_i, then you can also estimate a position X (3-D) using several camera projections u_i (2-D).
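Triangulation can be sketched as a small linear least-squares problem: each camera contributes two equations that are linear in X, obtained by eliminating α from αu = MX. This is one simple algebraic formulation chosen for illustration, not necessarily the method used in the course. The function names and the two example cameras are assumptions.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: cameras do not constrain the point")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def triangulate(cameras, observations):
    """Least-squares 3-D point from 3x4 matrices M_i and image points u_i = (x, y)."""
    rows, rhs = [], []
    for M, (x, y) in zip(cameras, observations):
        for coord, m in ((x, M[0]), (y, M[1])):
            # Eliminating alpha gives (coord * m3 - m) . (X, Y, Z, 1) = 0
            a = [coord * M[2][c] - m[c] for c in range(4)]
            rows.append(a[:3])
            rhs.append(-a[3])
    # Normal equations (A^T A) X = A^T b
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    return solve_linear(AtA, Atb)


# Two hypothetical cameras with f = 1; the second is shifted one unit along x.
M1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
M2 = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
X = triangulate([M1, M2], [(0.5, 0.25), (0.25, 0.25)])
print(X)   # close to [2.0, 1.0, 4.0]
```

With more than two cameras, as in marker-based motion capture, the same code simply stacks more rows.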

Marker based motion capture

Images: courtesy of Lennart Svensson

Mocap


External calibration

!  Rotation + position, 6 DoF, ”calibration”


Motion capture applications

!  Animation
!  Biomechanical analysis
!  Industrial analysis


Image formation – Lenses

!  Thin lens

$$ z z' = f^2 $$

[Figure: thin lens with the object plane, the image plane, the object focal point and the image focal point; the distances z', f, f and z are marked along the optical axis]


!  Magnification, m = x/X
!  From similarity, x/z’ = X/f

$$ m = \frac{x}{X} = \frac{f}{z} = \frac{z'}{f} $$

[Figure: the same thin-lens diagram, with the object size X in the object plane and the image size x in the image plane]
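A short numeric sketch of the thin-lens relations zz’ = f² and m = f/z = z’/f. It assumes that z and z’ are measured from the object and image focal points, as this form of the lens equation requires. The function names and the example values (a 50 mm lens and an object 2 m beyond the focal point) are assumptions.

```python
import math


def image_distance(z, f):
    """Distance z' from the image focal point to the image plane, from z * z' = f**2."""
    if z == 0:
        raise ValueError("object at the focal point: image at infinity")
    return f * f / z


def magnification(z, f):
    """Magnification m = x / X = f / z."""
    return f / z


f = 50.0      # focal length in mm (assumed example value)
z = 2000.0    # object distance from the object focal point in mm
z_prime = image_distance(z, f)
m = magnification(z, f)
print(z_prime, m)                      # 1.25 0.025
print(math.isclose(m, z_prime / f))    # True: f / z equals z' / f
```

The second check is just the lens equation rearranged: dividing zz’ = f² by zf gives z’/f = f/z.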


!  Depth of field
!  Thus, objects within the depth of field are scattered within an area smaller than a pixel, i.e. they are depicted sharp

[Figure: the same thin-lens diagram, with ε = size of a pixel in the image plane and the depth of field Δz marked at the object plane]


!  Depth of field
!  Aperture size and focal length both affect the depth of field. A larger aperture will yield a smaller depth of field.

[Figure: the same thin-lens diagram, with ε = size of a pixel and the depth of field Δz marked at the object plane]


!  Depth of focus
!  “Depth of focus” is analogous: it is how much the image plane can be shifted without scattering the light from a point in focus over more than a pixel

[Figure: the same thin-lens diagram, with ε = size of a pixel and the depth of focus Δz’ marked at the image plane]

AACAM – @ Matlab File Exchange

!  Matlab code for non-perfect pinhole camera
"  Set aperture radius and focal length
"  Set depth of field
"  Set object distance and aperture radius


!  (systems of) lenses → distortions:
!  Spherical aberration
!  Shorter focal length close to edges of lens

(Image from Wikipedia)


!  (systems of) lenses → distortions:
!  Coma


!  (systems of) lenses → distortions:
!  Chromatic aberration


!  (systems of) lenses → distortions:
!  Astigmatism


!  (systems of) lenses → distortions:
!  Geometric distortion

[Figure: barrel distortion and pincushion distortion]

Is this really a problem?

!  In old and cheap cameras, yes !  Uppsala 1999-01-01

From http://www.uu.se/carpediem/1999/

Is this really a problem?

!  But also for e.g. modern GoPro cameras!

Camera Calibration Toolbox

!  A Matlab toolbox for camera calibration: !  http://www.vision.caltech.edu/bouguetj/calib_doc/ !  Freely available

Camera Toolbox Calibration

!  Focal length: The focal length in pixels is stored in the 2x1 vector fc.
!  Principal point: The principal point coordinates are stored in the 2x1 vector cc.
!  Skew coefficient: The skew coefficient defining the angle between the x and y pixel axes is stored in the scalar alpha_c.
!  Distortions: The image distortion coefficients (radial and tangential distortions) are stored in the 5x1 vector kc.
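How these parameters act together can be sketched as a projection function. The formulas follow the camera model as described in the toolbox documentation, but they are reproduced from memory and written in Python rather than Matlab, so treat the exact form as an assumption to verify against that documentation. The function name `project_to_pixel` and the numeric values are hypothetical.

```python
import math


def project_to_pixel(point, fc, cc, alpha_c, kc):
    """Project a camera-frame point (X, Y, Z) to pixel coordinates.

    fc: (fx, fy) focal length in pixels, cc: principal point,
    alpha_c: skew coefficient, kc: 5 distortion coefficients
    (kc[0], kc[1], kc[4] radial; kc[2], kc[3] tangential).
    """
    X, Y, Z = point
    x, y = X / Z, Y / Z                          # normalised pinhole projection
    r2 = x * x + y * y
    radial = 1.0 + kc[0] * r2 + kc[1] * r2 ** 2 + kc[4] * r2 ** 3
    dx = 2.0 * kc[2] * x * y + kc[3] * (r2 + 2.0 * x * x)   # tangential part
    dy = kc[2] * (r2 + 2.0 * y * y) + 2.0 * kc[3] * x * y
    xd, yd = radial * x + dx, radial * y + dy
    return (fc[0] * (xd + alpha_c * yd) + cc[0], fc[1] * yd + cc[1])


fc, cc = (800.0, 800.0), (320.0, 240.0)
P = (1.0, 0.5, 2.0)
print(project_to_pixel(P, fc, cc, 0.0, [0.0] * 5))                   # no distortion
print(project_to_pixel(P, fc, cc, 0.0, [0.1, 0.0, 0.0, 0.0, 0.0]))   # radial term only
```

With all distortion coefficients at zero the function reduces to the ideal pinhole camera, which makes it easy to see how much each coefficient moves a pixel.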

Stereo – Basic equations

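[Figure: two cameras with focal length f, separated by the baseline B along the x axis, both viewing the point P = (X, Y, Z); x1 and x2 are the image coordinates of P in the two cameras]

$$ x_1 = -\frac{fX}{Z} $$

$$ x_2 = -\frac{f(X - B)}{Z} $$

$$ \Rightarrow Z = \frac{fB}{x_2 - x_1} = \frac{fB}{d} $$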
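The step from the two projections to the depth formula is a single subtraction. Written out, with the disparity defined as d = x2 − x1:

```latex
\begin{align*}
x_2 - x_1 &= -\frac{f(X - B)}{Z} + \frac{fX}{Z} = \frac{fB}{Z} \\
d = x_2 - x_1 \quad &\Rightarrow \quad Z = \frac{fB}{d}
\end{align*}
```

The unknown X cancels, so the depth depends only on the disparity, the focal length and the baseline.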

Stereo – the general case

!  It may happen that the relation between the two cameras is not a pure parallax translation (a sideways shift along the baseline)
!  Then the “epipolar constraint” applies
!  By “rectification”, epipolar lines are aligned with scanlines

From: Epipolar Rectification by Fusiello et al.

Stereo – Disparity Estimation

!  Search horizontally for patch disparity, use e.g. sum of squared differences (SSD)

Teddy dataset, from http://cat.middlebury.edu/
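The horizontal search can be sketched on a single pair of scanlines. This is a minimal illustration, not the implementation behind the figures. It assumes rectified images, a patch that lies fully inside the row, and a sign convention where a feature at position x in the left row appears at x − d in the right row. The function names and the toy intensity rows are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equally long patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def patch_disparity(left_row, right_row, x, half, max_disp):
    """Disparity of the patch centred at x in left_row, found by SSD search."""
    patch = left_row[x - half : x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        start = x - d - half
        if start < 0:                      # candidate patch would leave the image
            break
        candidate = right_row[start : start + 2 * half + 1]
        cost = ssd(patch, candidate)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Toy scanlines: the pattern 9, 5, 7 is shifted by three pixels between the rows.
left_row = [0, 0, 0, 0, 0, 9, 5, 7, 0, 0]
right_row = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]
print(patch_disparity(left_row, right_row, x=6, half=1, max_disp=5))   # 3
```

Repeating the search for every pixel gives a dense disparity map. Textureless regions produce many equally good candidates, which is one reason the patch-based estimate is noisier than the ground truth.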

Stereo – Depth estimation

!  A simple formula converting disparity d to distance Z when the inter-camera distance is B:

$$ Z = \frac{fB}{d} $$

[Figure: patch-based estimate and ground truth]
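Applied to a list of disparities, the formula becomes a one-line conversion. The function name, the units (f and d in pixels, B in metres, so Z comes out in metres) and the numeric values are assumptions made for this sketch.

```python
def depth_from_disparity(d, f, B):
    """Convert disparity d to depth Z = f * B / d."""
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f * B / d


f = 400.0   # focal length in pixels (assumed example value)
B = 0.5     # inter-camera distance in metres (assumed example value)
disparities = [50.0, 100.0, 25.0]
print([depth_from_disparity(d, f, B) for d in disparities])   # [4.0, 2.0, 8.0]
```

Depth is inversely proportional to disparity, so a one-pixel disparity error matters much more for distant points than for near ones.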

Stereo – Constraints

!  Constraints (Marr and Poggio):
"  Each point in each image is assigned at most one disparity value
"  The disparity varies smoothly at most locations in the images
!  However…
!  Different regularization may be applied to the depth function

[Figure: diagram with axes x and z and image coordinates x1 and x2]

Stereo from Segmentation

!  Alternative approach:
"  Make a segmentation of the image first
"  Apply a linear model in each segmented region
"  Refine the models in the regions …

From Segment-based Stereo Matching Using Graph Cuts by Hong and Chen

Large Scale 3D Maps (C3/SAAB)


Courtesy of Petter Torle, C3 Technologies

Large Scale 3D Maps (C3/SAAB)

Structured Light

!  A light source helps the stereo algorithm to find matching points.

!  Often used in industrial applications

From: http://mesh.brown.edu/3DPGP-2009/homework/hw2/hw2.html

More Structured Light

!  Microsoft Kinect, using infrared light

• http://www.youtube.com/watch?v=nvvQJxgykcU

Other Computer Vision Code

!  OpenCV
"  Free to use
"  Supports IPP speedups
"  http://en.wikipedia.org/wiki/OpenCV
"  http://sourceforge.net/projects/opencvlibrary/
"  http://opencv.willowgarage.com/wiki/

!  Intel® Integrated Performance Primitives 6.0
"  http://www.intel.com/cd/software/products/asmo-na/eng/302910.htm
"  Commercial (but cheap)
"  Includes Computer Vision, Signal Processing, Data Compression, ….

Typical Exam Questions …

!  Project this object (points) using a pinhole camera

!  Can geometric transformation compensate for lens distortions in general?

!  Explain the parameters building up the projection matrix M

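$$ u = \begin{pmatrix} f & s & -w_0 \\ 0 & g & -v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} R & -Rt \\ 0^T & 1 \end{pmatrix} X $$

$$ u = MX $$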

Thank You!

!  Email questions to: [email protected]
