3D Vision – Real Objects, Real Cameras · 2017. 2. 21. · Without a 3-D model of the world,...

3D Vision – Real Objects, Real Cameras. Chapters 11 and 12 (parts of). Computerized Image Analysis 2. Anders Brun, [email protected]


3D Vision

• Philosophy
• Image formation
  ◦ The pinhole camera
  ◦ Projective geometry
  ◦ Artefacts and challenges
• Camera calibration
• Stereo vision
• Structured light


Philosophy: Why 3-D?

• Why do we model things in 3-D?
• Without a 3-D model of the world, events are more difficult to predict: movement, grasping, collision estimation, real size estimation, …
• Example, 2-D: a car on the highway looks bigger and appears to drive faster as it approaches.
• Example, 3-D: a car on the highway has constant size and speed as it approaches.

[Figure: two diagrams, one with axes x and z, one with axes x and y]
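The car example can be made concrete with a few lines of code. This is a minimal sketch that assumes the pinhole relation x = fX/Z introduced later in these slides; the function name `apparent_size`, the car width and the depths are illustrative values, not taken from the slides.

```python
def apparent_size(true_size, depth, f=1.0):
    """Image-plane size of an object with a fixed true size at a given depth.

    Assumes the pinhole relation x = f * X / Z (sign ignored).
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return f * true_size / depth


car_width = 1.8  # metres (assumed example value)
for depth in (80.0, 40.0, 20.0):
    # The true (3-D) width never changes, but the 2-D width doubles
    # every time the distance is halved.
    print(depth, apparent_size(car_width, depth))
```

In the 3-D description only the depth changes; in the 2-D description both the size and the apparent speed of the car change, which is what makes prediction harder.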


Philosophy: 3-D cues …

Photo: Greg Keene

• Shape from:
  ◦ Focus
  ◦ Lighting
  ◦ Stereo
  ◦ Structured light
  ◦ …


Philosophy: Marr and 2.5-D

• Primal sketch: edges and areas
• 2.5-D sketch: texture and depth
• 3-D model: a hierarchical 3-D model of the world

Teddy dataset, from http://cat.middlebury.edu/


Philosophies

• Build an accurate 3-D world representation
  1. Build a complete 3-D model of the scene
  2. Plan the task using the 3-D model
  3. Example: build a model of the scene, then find the teddy bear and send a robot arm to grab it.
• Plan as you go, act and react
  1. Collect features from the scene
  2. Use the features to guide your actions
  3. Example: find the teddy bear using template matching in the image, then send the robot hand in that direction. Possibly take more images when halfway.


Passive, Active and Dynamic Vision

• Passive vision:
  ◦ The camera has a fixed location
• Dynamic vision:
  ◦ The camera is moving but cannot be steered
• Active vision:
  ◦ The camera can be steered


The pinhole camera

• The pinhole camera is an idealized model
• A real aperture is not a point.
• A real aperture has a non-vanishing area and typically also a lens…


The pinhole camera model

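• Where is the point P projected on the image plane inside the camera?

[Figure: pinhole camera, labelled with the point P = (X, Y, Z), the focal point or origin (the “pinhole”), the image plane, the focal length f and the image coordinate x]

$$x = -\frac{fX}{Z}$$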


The pinhole camera model (alternative)

• Imagine an observer located at the focal point.
• A screen is placed at distance f from the observer.
• Where on this screen is P projected?

[Figure: the focal point (the “observer”), the screen at distance f (f = focal length), the point P = (X, Y, Z) and the screen coordinate x]

$$x = +\frac{fX}{Z}, \qquad y = +\frac{fY}{Z}$$
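The two projection models can be sketched in code as follows. The function names are illustrative, and the y coordinate of the pinhole model is written by analogy with the x coordinate given on the slide (an assumption consistent with the 180° rotation mentioned on the next slide).

```python
def project_pinhole(P, f):
    """Project P = (X, Y, Z) onto the image plane behind the pinhole.

    Uses x = -f*X/Z; y = -f*Y/Z is assumed by symmetry.
    """
    X, Y, Z = P
    if Z == 0:
        raise ValueError("point lies in the plane of the pinhole (Z = 0)")
    return (-f * X / Z, -f * Y / Z)


def project_screen(P, f):
    """Project P onto a screen placed in front of the observer.

    Same magnitudes as the pinhole model, but with a positive sign.
    """
    x, y = project_pinhole(P, f)
    return (-x, -y)


P = (2.0, 1.0, 4.0)
print(project_pinhole(P, f=2.0))  # image is mirrored through the pinhole
print(project_screen(P, f=2.0))   # same point, upright on the screen
```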


The pinhole camera model

• In the pinhole camera, the world appears upside down (rotated 180°).
• The alternative interpretation is useful in computer graphics. It tells you exactly where to draw P on a screen in front of the observer to make it appear real to the observer. (Note the change of sign.)
• The alternative interpretation leads directly to “projective geometry”.


Projective Geometry (Very Briefly)

• Points in 2-D are represented by lines in 3-D
• The 3-D space is called the embedding space
• All points along a line are equivalent
• This is analogous to a photograph: every point (position) in a photograph (2-D) corresponds to a line or ray in reality (3-D)

[Figure: the points x and αx on the same line, forming an equivalence class]


Projective Geometry (Very Briefly)

• We can convert points in the ordinary plane to the projective plane
• 2-D (x, y) → 3-D (x, y, 1)
• In general: D-dimensional → (D+1)-dimensional
• Points x and αx are equivalent, α ≠ 0

[Figure: the plane at height 1 in the embedding space, with x and αx in the same equivalence class]
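A small sketch of the conversion between ordinary and homogeneous coordinates, and of the equivalence of x and αx. The helper names are hypothetical and the tolerance is an arbitrary choice.

```python
def to_homogeneous(p):
    """(x, y) -> (x, y, 1); works for any dimension D -> D+1."""
    return tuple(p) + (1.0,)


def from_homogeneous(h):
    """(x, y, w) -> (x/w, y/w); undefined for w = 0."""
    *coords, w = h
    if w == 0:
        raise ValueError("w = 0 has no counterpart in the ordinary plane")
    return tuple(c / w for c in coords)


def equivalent(a, b, tol=1e-9):
    """True if a and b lie on the same line through the origin (b = alpha * a)."""
    n = len(a)
    return all(abs(a[i] * b[j] - a[j] * b[i]) <= tol
               for i in range(n) for j in range(i + 1, n))


x = to_homogeneous((2.0, 3.0))
scaled = tuple(2.0 * c for c in x)   # alpha = 2
print(equivalent(x, scaled))         # same projective point
print(from_homogeneous(scaled))      # back to (2.0, 3.0)
```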


Projective Geometry (Very Briefly)

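[Figure: the (linear) transformation H maps a point x on the plane at height 1 to x′]

$$\alpha \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

$$x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}$$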
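Applying a homography to a 2-D point follows directly from the two fractions above. A minimal sketch, with `apply_homography` as an illustrative name and H stored as a nested list:

```python
def apply_homography(H, point):
    """Map (x, y) to (x', y') with a 3x3 homography H (nested lists)."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]   # the scale factor alpha
    if w == 0:
        raise ValueError("point is mapped to infinity")
    return (xh / w, yh / w)


# A pure translation by (2, 3), written as a homography.
H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0],
     [0.0, 0.0, 1.0]]
print(apply_homography(H, (1.0, 1.0)))
```

Multiplying every entry of H by the same non-zero constant leaves the result unchanged, because the constant cancels in the division by w.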


Projective Geometry (Very Briefly)

• Homography: a map from (D+1)-dim to (D+1)-dim
• Linear in the (D+1)-dim embedding space
• x’ = H x
• Represents a perspective transformation in D-dim space
• This is very nice!

[Figure: the (linear) transformation H maps a point x on the plane at height 1 to x’]


Projective Geometry (Very Briefly)

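• Using homographies, we can express a rich class of transformations using linear mappings:
  ◦ Identity: $H = I$
  ◦ Isometric: $H = \begin{pmatrix} R & -Rt \\ 0 & 1 \end{pmatrix}$
  ◦ Similarity: $H = \begin{pmatrix} sR & -Rt \\ 0 & 1 \end{pmatrix}$
  ◦ Affine: $H = \begin{pmatrix} A & t \\ 0 & 1 \end{pmatrix}$
  ◦ Perspective: any $H$ with $\det(H) \neq 0$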


Perspective Transformations

• Remember this example? We wanted to compute the perspective transformation parameters.

From Feature based methods for structure and motion estimation by P. H. S. Torr and A. Zisserman


Perspective Transformations

$$x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}$$

• Estimating H from point correspondences (simplified version; check the book for a more advanced version)
• Each point correspondence translates to 2 linear equations (in the coefficients of H)
• Assuming h33 = 1, we need 4 corresponding 2-D point pairs (x, y, x’, y’) to solve this equation system (8 unknowns).
• This way of solving for the parameters has severe practical disadvantages, but it shows that it is at least possible...

$$h_{31}xx' + h_{32}yx' + h_{33}x' - h_{11}x - h_{12}y - h_{13} = 0$$

$$h_{31}xy' + h_{32}yy' + h_{33}y' - h_{21}x - h_{22}y - h_{23} = 0$$

$$\begin{pmatrix} P_x(x, y, x', y') \\ P_y(x, y, x', y') \end{pmatrix} h = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
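The simplified estimation with h33 = 1 can be sketched as an 8×8 linear system built from four point pairs. This is an illustration of the slide's simplified scheme, not a robust estimator; the solver (plain Gaussian elimination with partial pivoting) and all function names are choices made for this sketch.

```python
def solve_linear(A, b):
    """Solve the square system A h = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    h = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * h[j] for j in range(i + 1, n))
        h[i] = (M[i][n] - s) / M[i][i]
    return h


def estimate_homography(pairs):
    """Estimate H from exactly four ((x, y), (x', y')) pairs, assuming h33 = 1."""
    if len(pairs) != 4:
        raise ValueError("this simplified scheme needs exactly 4 point pairs")
    A, b = [], []
    for (x, y), (xp, yp) in pairs:
        # Unknown order: h11 h12 h13 h21 h22 h23 h31 h32 (h33 moved to the right side).
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp])
        b.append(yp)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]
```

Fixing h33 = 1 fails whenever the true h33 is zero, which is one of the practical disadvantages the slide alludes to.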


Perspective Transformations

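• A cleaner and more stable solution
• Multiply both sides with the “cross product matrix”

$$\alpha \begin{pmatrix} 0 & -1 & y' \\ 1 & 0 & -x' \\ -y' & x' & 0 \end{pmatrix} \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 & y' \\ 1 & 0 & -x' \\ -y' & x' & 0 \end{pmatrix} \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

$$0 = \begin{pmatrix} 0 & -1 & y' \\ 1 & 0 & -x' \\ -y' & x' & 0 \end{pmatrix} \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

$$0 = Q(x, y, x', y')\, h$$

“Now three equations killing two unknowns”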


Single perspective camera

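[Figure: single perspective camera, with the labels C, O_i, X, u and f]

$$\alpha u = \underbrace{\begin{pmatrix} f & s & -w_0 \\ 0 & g & -v_0 \\ 0 & 0 & 1 \end{pmatrix}}_{\text{internal parameters}} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \underbrace{\begin{pmatrix} R & -Rt \\ 0^T & 1 \end{pmatrix}}_{\text{external parameters}} X$$

$$\alpha u = MX$$

M: projection matrix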
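The factorisation of M into internal and external parameters can be sketched in plain Python. The reading of the symbols is an assumption of this sketch: f and g as focal scale factors along the two image axes, s as skew, (w0, v0) as the offset of the image coordinate system, R as the camera rotation and t as the camera position (so that the external matrix maps X to R(X − t)). Function names are illustrative.

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def projection_matrix(f, g, s, w0, v0, R, t):
    """Compose M = K [I | 0] [R, -Rt; 0^T, 1] as on the slide."""
    K = [[f, s, -w0],
         [0.0, g, -v0],
         [0.0, 0.0, 1.0]]
    P0 = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0]]
    Rt = [sum(R[i][j] * t[j] for j in range(3)) for i in range(3)]
    E = [list(R[i]) + [-Rt[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
    return matmul(matmul(K, P0), E)


def project(M, X):
    """Project a 3-D point X with the 3x4 matrix M and divide out alpha."""
    Xh = list(X) + [1.0]
    u = [sum(m * x for m, x in zip(row, Xh)) for row in M]
    if u[2] == 0:
        raise ValueError("point projects to infinity")
    return (u[0] / u[2], u[1] / u[2])


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M = projection_matrix(2.0, 2.0, 0.0, 0.0, 0.0, identity, (0.0, 0.0, -5.0))
print(project(M, (1.0, 2.0, 5.0)))
```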


Single perspective camera

• Estimation of M from known coordinates (X, Y, Z, 1) and their projections in a camera (x, y, 1)
• This is analogous to the homographic projection
• Algorithms exist to solve this with 6 correspondences

$$\alpha \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$


Single perspective camera

• This enables calibration from 6 known points
• M can be factored: you can estimate camera focal length, image coordinate systems, camera position and rotation.
• Triangulation: if you know several M_i, then you can also estimate a position X (3-D) using several camera projections u_i (2-D).
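One possible way to carry out the triangulation is a linear least-squares solve: each camera contributes two linear equations in (X, Y, Z), obtained by eliminating α from αu = MX. The sketch below is one such formulation, not necessarily the one used in the course; it solves the 3×3 normal equations with Cramer's rule, and all names are illustrative.

```python
def det3(A):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def solve3(A, b):
    """Solve a 3x3 system with Cramer's rule."""
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("cameras do not constrain the point")
    solution = []
    for k in range(3):
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(Ak) / d)
    return tuple(solution)


def triangulate(cameras, observations):
    """Estimate (X, Y, Z) from 3x4 matrices M_i and image points (u_i, v_i)."""
    rows, rhs = [], []
    for M, (u, v) in zip(cameras, observations):
        for coord, m in ((u, M[0]), (v, M[1])):
            # (coord * m3 - m) . (X, Y, Z, 1) = 0, with the constant moved right.
            rows.append([coord * M[2][k] - m[k] for k in range(3)])
            rhs.append(-(coord * M[2][3] - m[3]))
    N = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    c = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve3(N, c)
```

At least two cameras with different positions are needed; with noisy observations the result is the least-squares compromise between all rays.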


Marker based motion capture

Images: courtesy of Lennart Svensson


Mocap

Images: courtesy of Lennart Svensson


External calibration

• Rotation + position, 6 DoF, “calibration”

Images: courtesy of Lennart Svensson


Motion capture applications

• Animation
• Biomechanical analysis
• Industrial analysis

Images: courtesy of Lennart Svensson


Image formation – Lenses

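• Thin lens

$$z z' = f^2$$

[Figure: thin lens, labelled with the object plane, the object focal point, the image focal point, the image plane, the distances z and z′, and the focal length f on each side of the lens]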


Image formation – Lenses

• Magnification, m = x/X
• From similarity, x/z′ = X/f

[Figure: thin lens, labelled as before, with the object size X and the image size x]

$$m = \frac{x}{X} = \frac{f}{z} = \frac{z'}{f}$$
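A short numerical check of the thin-lens relations zz′ = f² and m = f/z = z′/f. The sketch assumes that z and z′ are measured from the object and image focal points respectively, which is the convention under which zz′ = f² holds; the function names and the example values are illustrative.

```python
def image_distance(z, f):
    """Distance z' of the sharp image, from z * z' = f**2."""
    if z <= 0:
        raise ValueError("z must be positive")
    return f * f / z


def magnification(z, f):
    """Magnification m = x / X = f / z."""
    if z <= 0:
        raise ValueError("z must be positive")
    return f / z


f = 0.05   # focal length in metres (assumed example: a 50 mm lens)
z = 0.5    # object distance measured from the object focal point
z_prime = image_distance(z, f)
print(z_prime, magnification(z, f), z_prime / f)  # the last two values agree
```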


Image formation – Lenses

• Depth of field
• Thus, objects within the depth of field are scattered over an area smaller than a pixel, i.e. they are depicted sharply

[Figure: thin lens, labelled as before, with the depth of field Δz on both sides of the object plane; ε = size of a pixel]


Image formation – Lenses

[Figure: thin lens, labelled as before, with the depth of field Δz on both sides of the object plane; ε = size of a pixel]

• Depth of field
• Aperture size and focal length both affect the depth of field. A larger aperture yields a smaller depth of field.


Image formation – Lenses

[Figure: thin lens, labelled as before, with the depth of focus Δz′ on both sides of the image plane; ε = size of a pixel]

• Depth of focus
• “Depth of focus” is analogous: how much the image plane can be shifted without scattering light from a point in focus over more than a pixel


AACAM – @ Matlab File Exchange

• Matlab code for non-perfect pinhole camera
  ◦ Set aperture radius and focal length
  ◦ Set depth of field
  ◦ Set object distance and aperture radius


Image formation – Lenses

• (Systems of) lenses → distortions:
• Spherical aberration
• Shorter focal length close to the edges of the lens

(Image from wikipedia)


Image formation – Lenses

• (Systems of) lenses → distortions:
• Coma

(Image from wikipedia)


Image formation – Lenses

• (Systems of) lenses → distortions:
• Chromatic aberration

(Image from wikipedia)


Image formation – Lenses

• (Systems of) lenses → distortions:
• Astigmatism

(Image from wikipedia)


Image formation – Lenses

• (Systems of) lenses → distortions:
• Geometric distortion

(Image from Wikipedia)

[Figure panels: barrel distortion, pincushion distortion]


Is this really a problem?

• In old and cheap cameras, yes
• Uppsala 1999-01-01

From http://www.uu.se/carpediem/1999/


Is this really a problem?

• But also for, e.g., modern GoPro cameras!


Camera Calibration Toolbox

• A Matlab toolbox for camera calibration:
• http://www.vision.caltech.edu/bouguetj/calib_doc/
• Freely available


Camera Toolbox Calibration

• Focal length: the focal length in pixels is stored in the 2x1 vector fc.
• Principal point: the principal point coordinates are stored in the 2x1 vector cc.
• Skew coefficient: the skew coefficient defining the angle between the x and y pixel axes is stored in the scalar alpha_c.
• Distortions: the image distortion coefficients (radial and tangential distortions) are stored in the 5x1 vector kc.
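To show how the four parameters interact, here is a hedged Python sketch of a projection with radial and tangential distortion. The exact formula is an assumption of this sketch: it follows the model as commonly described for this toolbox (radial terms kc[0], kc[1], kc[4]; tangential terms kc[2], kc[3], using 0-based indexing), and it should be checked against the toolbox documentation before being relied on. The function name is hypothetical.

```python
def project_with_distortion(point, fc, cc, alpha_c, kc):
    """Project a 3-D point in camera coordinates to pixel coordinates.

    fc: (fx, fy) focal length in pixels, cc: principal point,
    alpha_c: skew coefficient, kc: 5 distortion coefficients.
    The distortion model is an assumption, see the text above.
    """
    X, Y, Z = point
    if Z == 0:
        raise ValueError("point lies in the camera plane (Z = 0)")
    x, y = X / Z, Y / Z                       # normalised pinhole projection
    r2 = x * x + y * y
    radial = 1.0 + kc[0] * r2 + kc[1] * r2 ** 2 + kc[4] * r2 ** 3
    dx = 2.0 * kc[2] * x * y + kc[3] * (r2 + 2.0 * x * x)   # tangential part
    dy = kc[2] * (r2 + 2.0 * y * y) + 2.0 * kc[3] * x * y
    xd = radial * x + dx
    yd = radial * y + dy
    u = fc[0] * (xd + alpha_c * yd) + cc[0]
    v = fc[1] * yd + cc[1]
    return (u, v)


no_distortion = [0.0, 0.0, 0.0, 0.0, 0.0]
print(project_with_distortion((0.1, 0.2, 1.0), (1000.0, 1000.0),
                              (320.0, 240.0), 0.0, no_distortion))
```

With all entries of kc set to zero and alpha_c = 0, the function reduces to the ideal pinhole projection scaled to pixels.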


Stereo – Basic equations

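[Figure: stereo geometry in the x-z plane, labelled with the baseline B, the focal length f, the point P = (X, Y, Z) and the image coordinates x1 and x2]

$$x_1 = -\frac{fX}{Z}, \qquad x_2 = -\frac{f(X - B)}{Z} \quad\Rightarrow\quad Z = \frac{fB}{x_2 - x_1} = \frac{fB}{d}$$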
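The intermediate step behind the depth formula, written out with the symbols of this slide (the second camera is displaced by the baseline B along the x axis, and d denotes the disparity):

```latex
\begin{aligned}
x_2 - x_1 &= -\frac{f(X - B)}{Z} + \frac{fX}{Z} = \frac{fB}{Z}, \\
d &= x_2 - x_1 \quad\Rightarrow\quad Z = \frac{fB}{d}.
\end{aligned}
```

The unknown X cancels, so the depth depends only on the disparity, the focal length and the baseline.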


Stereo – the general case

• It may happen that the relation between the two cameras is not a parallax translation
• Then the “epipolar constraint” applies
• By “rectification”, epipolar lines are aligned with scanlines

From: Epipolar Rectification by Fusiello et al.


Stereo – Disparity Estimation

• Search horizontally for the patch disparity, using e.g. the sum of squared differences (SSD)

Teddy dataset, from http://cat.middlebury.edu/
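The horizontal SSD search can be sketched for a single pixel as follows. Images are plain nested lists of grey values; the search direction (a feature appears shifted to the left in the right image) and the helper names are assumptions of this sketch, and real implementations add sub-pixel refinement and regularisation.

```python
def ssd(a, b):
    """Sum of squared differences between two equally long patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def patch(img, row, col, half):
    """Flatten the (2*half+1) x (2*half+1) patch centred at (row, col)."""
    return [img[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def best_disparity(left, right, row, col, half, max_disp):
    """Return the disparity in 0..max_disp with the smallest SSD cost."""
    ref = patch(left, row, col, half)
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        c = col - d                  # candidate column in the right image
        if c - half < 0:
            break                    # patch would leave the image
        cost = ssd(ref, patch(right, row, c, half))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic rectified pair: the right image is the left one shifted by 2 pixels.
row_values = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
left = [row_values[:] for _ in range(3)]
right = [[row_values[c + 2] if c + 2 < 10 else 0 for c in range(10)] for _ in range(3)]
print(best_disparity(left, right, row=1, col=5, half=1, max_disp=4))
```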


Stereo – Depth estimation

• A simple formula converting disparity d to distance Z when the inter-camera distance is B:

$$Z = \frac{fB}{d}$$

[Figure panels: patch-based estimate, ground truth]
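Converting a disparity value to depth is a one-line computation. In this sketch f is assumed to be given in pixels and B in metres, so that Z comes out in metres; treating a zero disparity as a point at infinity is a choice made for the sketch, and the numbers are illustrative.

```python
def depth_from_disparity(d, f, B):
    """Depth Z = f * B / d; a disparity of zero is treated as infinitely far."""
    if d < 0:
        raise ValueError("disparity must be non-negative in this convention")
    if d == 0:
        return float("inf")
    return f * B / d


f = 500.0   # focal length in pixels (assumed)
B = 0.1     # baseline in metres (assumed)
for d in (50.0, 25.0, 5.0):
    # Halving the disparity doubles the depth, so distant points are
    # resolved much more coarsely than near ones.
    print(d, depth_from_disparity(d, f, B))
```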


Stereo – Constraints

• Constraints (Marr and Poggio):
  ◦ Each point in each image is assigned at most one disparity value
  ◦ The disparity varies smoothly at most locations in the images
• However…
• Different regularization may be applied to the depth function

[Figure: diagram with axes x and z and the image coordinates x1 and x2]


Stereo from Segmentation

• Alternative approach:
  ◦ Make a segmentation of the image first
  ◦ Apply a linear model in each segmented region
  ◦ Refine the models in the regions …

From Segment-based Stereo Matching Using Graph Cuts by Hong and Chen


Large Scale 3D Maps (C3/SAAB)

Courtesy of Petter Torle, C3 Technologies


Structured Light

• A light source helps the stereo algorithm to find matching points.
• Often used in industrial applications

From: http://mesh.brown.edu/3DPGP-2009/homework/hw2/hw2.html


More Structured Light

• Microsoft Kinect, using infrared light

• http://www.youtube.com/watch?v=nvvQJxgykcU


Other Computer Vision Code

• OpenCV
  ◦ Free to use
  ◦ Supports IPP speedups
  ◦ http://en.wikipedia.org/wiki/OpenCV
  ◦ http://sourceforge.net/projects/opencvlibrary/
  ◦ http://opencv.willowgarage.com/wiki/
• Intel® Integrated Performance Primitives 6.0
  ◦ http://www.intel.com/cd/software/products/asmo-na/eng/302910.htm
  ◦ Commercial (but cheap)
  ◦ Includes Computer Vision, Signal Processing, Data Compression, …


Typical Exam Questions …

• Project this object (points) using a pinhole camera
• Can geometric transformation compensate for lens distortions in general?
• Explain the parameters building up the projection matrix M

$$u = \begin{pmatrix} f & s & -w_0 \\ 0 & g & -v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} R & -Rt \\ 0^T & 1 \end{pmatrix} X$$

$$u = MX$$


Thank You!

• Email questions to: [email protected]