Visual Servoing (part I & III) - unistra.fr


Transcript of Visual Servoing (part I & III) - unistra.fr

Page 1

Visual Servoing (part I & III)
From computer vision geometry to visual servoing

Master ISTI PUF Medical Robotics (2009)

Christophe DOIGNON

Maître de Conférences HdR UDS

(member of the IEEE Robotics and Automation and Signal Processing Societies)

mail : [email protected] web : http://eavr.u-strasbg.fr/~christophe

Phone : +33(0)3902 44341 (Office C418, level 4) Pôle API – ENSPS Boulevard Brant 67412 Illkirch, FRANCE

Page 2

Page 3

References and further readings (CV)

Multiple View Geometry in Computer Vision, R. Hartley and A. Zisserman, Cambridge University Press, ISBN 0-521-62304-9, 2000, 607 pages.

Introductory Techniques for 3-D Computer Vision, E. Trucco and A. Verri, Prentice Hall, ISBN 0-13-261108-2, 1998, 330 pages.

Geometric Invariance in Computer Vision, J. Mundy and A. Zisserman, MIT Press, ISBN 0-262-13285-0, 1992, 540 pages.

Computer and Robot Vision, R.M. Haralick and L.G. Shapiro, Addison-Wesley, ISBN 0-201-10877-1, 1992 (2 volumes), 1340 pages.

Digital Image Processing, K. Castleman, Prentice Hall, ISBN 0-13-211467-4, 1996, 650 pages.

Handbook of Image and Video Processing, A. Bovik, Academic Press, ISBN 0-12-119790-5, 2000, 880 pages.

Image Analysis for the Biological Sciences, C.A. Glasbey and G.W. Horgan, Wiley & Sons (Vic Barnett, series editor), ISBN 0-471-93726-6, 1995, 210 pages.

Page 4

References (Visual servoing)

A new approach to visual servoing in robotics, B. Espiau, F. Chaumette, and P. Rives, IEEE Trans. on Robotics and Automation, vol. 8, no. 3, pp. 313–326, June 1992.

Model-based object pose in 25 lines of code, D. DeMenthon and L. Davis, Int. Journal of Computer Vision, 15:123–141, 1995.

A tutorial on visual servo control, S. Hutchinson, G. Hager, and P. Corke, IEEE Trans. on Robotics and Automation, 12(5):651–670, 1996.

Visual servo control, Part I: Basic approaches, F. Chaumette and S. Hutchinson, IEEE Robotics and Automation Magazine, December 2006.

A new technique for fully autonomous and efficient 3D robotics hand/eye calibration, R. Tsai and R. Lenz, IEEE Trans. on Robotics and Automation, vol. 5, no. 3, pp. 345–358, June 1989.

Dynamic sensor-based control of robots with visual feedback, L. Weiss, A. Sanderson, and C. Neuman, IEEE Journal of Robotics and Automation, vol. 3, pp. 404–417, Oct. 1987.

Visual tracking of a moving target by a camera mounted on a robot: A combination of control and vision, N. Papanikolopoulos, P. Khosla, and T. Kanade, IEEE Trans. on Robotics and Automation, vol. 9, no. 1, pp. 14–35, Feb. 1993.

Page 5

Outline

Chapter 0: Introduction and applications

Chapter 1 (VS 1): From the scene to the image
• Geometrical transformations
• Video acquisition and perspective model
• Pose estimation

Chapter 2 (VS 3): From the motion field to visual servoing
• Motion field analysis
• Image-based visual servoing
• Position-based visual servoing
• Stability analysis

Page 6

Chapter 0

Introduction and applications

Page 7

Introduction

Computer Vision (CV) deals with the processing of video image features and with the relations between acquired images and objects of interest in the 3-D scene, by means of visual sensors and mathematical models. Image data processing, image understanding and computer vision geometry are the main parts of CV, and they are closely related. This lecture is about Computer Vision Geometry (CVG) and its relations with the motion field, pose estimation, reconstruction and visual servoing. In particular, we are looking for:

• the projection in the image of a scene object with a simple geometry,
• mathematical models for the projections and for the motion between a pair of images,
• geometric reconstruction from multiple images and motion field analysis.

Applications of CVG include:

• localization, registration and visual tracking,
• shape recognition,
• modeling of close environments,
• metrology and monitoring,
• augmented reality and image synthesis.

Page 8

Introduction (2)

The registration of a visual sensor (often a digital camera) with a region/object of interest (ROI) makes it possible to build a positioning (robotic) task with an articulated mechanism (robot arm) while the ROI is moving.

Page 9

Surveillance of traffic crossroads, forests and farming areas

Applications – 1

Page 10

Fingerprint recognition (right: a pre-processing step)

Applications – 2

Page 11

Dynamic visual inspection: defect detection

Applications – 3

[Figure: web inspection setup with linear cameras and a light source; the web moves along its displacement direction; detected defects include impacts, bugs, connecting passages (joints) and marks.]

Page 12

Gas detection (thermographic IR, NIR, UV, fluorescence)

Applications – 4

Page 13

Conveyor supervision – quality of products

Applications – 5

Page 14

Satellite image mosaicking

Applications – 6

Page 15

Medical imaging guidance (X-ray CT and US): diagnostic assistance and intraoperative assisted surgery

Applications – 7

[Figure: US imaging and a CT scan slice.]

Page 16

Applications – 8

[Figure: clinical examination with thermal imaging; blood flow through two patients' legs; internal organ with a non-blocked artery vs. with a blocked artery.]

Page 17

Applications – 9

Active vision: 3-D acquisition of organ surfaces by means of structured lighting

Page 18

Chapter 1

From the Scene to the Image

Page 19

Chapter 1: from the scene to the image

1.1 Geometrical transformations

[Diagram: nested classes of transformations, from Euclidean transformations to similarities, affine transformations and projective transformations.]

Page 20

One-dimensional Geometry

Transformations in homogeneous coordinates, with their degrees of freedom (and the number of points necessary to compute a basis):

Euclidean 1-D: 1 dof (2 points)

w \begin{pmatrix} x' \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & t_x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix}

Affine 1-D: 2 dof (2 points)

w \begin{pmatrix} x' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix}

Projective 1-D (homography): 3 dof (3 points)

w \begin{pmatrix} x' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & 1 \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix}

[Figure: collinear points P1, P2, P3 and their transformed counterparts Q1, Q2, Q3.]

Page 21

Two-dimensional Geometry

[Figure: four points P1, ..., P4 of a plane and their transformed counterparts Q1, ..., Q4.]

Page 22

Two-dimensional Geometry

Transformations in homogeneous coordinates, with their degrees of freedom (and the number of points necessary to compute a basis):

Euclidean 2-D: 3 dof (3 non-collinear points)

w \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & t_x \\ r_{21} & r_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \quad R = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix}, \ R^T R = I, \ \det R = 1

Similarity 2-D: 4 dof (3 non-collinear points)

w \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} s\,r_{11} & s\,r_{12} & t_x \\ s\,r_{21} & s\,r_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \quad R^T R = I, \ \det R = 1

Affine 2-D: 6 dof (3 non-collinear points)

w \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}

Projective 2-D (planar homography): 8 dof (4 points, no collinear triplet)

w \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} A & t \\ v^T & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \quad \det \neq 0

Page 23

Three-dimensional Geometry

[Figure: five points P1, ..., P5 in space and their transformed counterparts Q1, ..., Q5.]

Page 24

Three-dimensional Geometry

Transformations in homogeneous coordinates, with their degrees of freedom (and the number of points necessary to compute a basis):

Euclidean 3-D: 6 dof (4 non-coplanar points)

w \begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix} = \begin{pmatrix} R & t \\ 0^T & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}, \quad R^T R = I, \ \det R = 1

Similarity 3-D: 7 dof (4 non-coplanar points)

w \begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix} = \begin{pmatrix} sR & t \\ 0^T & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}, \quad R^T R = I, \ \det R = 1

Affine 3-D: 12 dof (4 non-coplanar points)

w \begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}

Projective 3-D (homography): 15 dof (5 points, no coplanar quadruplet)

w \begin{pmatrix} x' \\ y' \\ z' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}

Page 25

Projective geometry is the mathematically useful tool for understanding the geometrical parts of CV. In particular, two classes of projective transformations are often used:

1) The perspective projection (or central projection), which is a linear transformation from an n-dimensional projective space P^n to an (n-1)-dimensional projective space P^(n-1) (example: from P^3 to P^2 with center C):

w \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}

[Figure: Q is the perspective projection of a 3-D point P through the center C, with P = (x, y, z, 1)^T and Q = (x', y', 1)^T in homogeneous coordinates.]
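To make the projection concrete, here is a minimal numeric sketch (names are mine, not from the slides): a homogeneous 3-D point is multiplied by the canonical 3x4 matrix above, and the projective scale w is divided out.

```python
import numpy as np

# Canonical perspective projection P^3 -> P^2 (projection center C at the origin).
PROJ = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 1., 0.]])

def project(P_h):
    """Project a homogeneous 3-D point (4-vector) onto the image plane."""
    q = PROJ @ P_h        # homogeneous image point; here w = z
    return q[:2] / q[2]   # divide out the projective scale w

P = np.array([2.0, 1.0, 4.0, 1.0])   # a 3-D point at depth z = 4
print(project(P))                    # -> [0.5  0.25]
```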

Page 26

2) The planar homography H (or 2-D collineation), which is a linear bijective transformation from a projective space P^n to another projective space P^n, or to itself (example with P^2 to P^2, det(H) non-null):

w \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}

[Figure: a point P lying in a plane (planar scene) projects to Ql in the left image Il (center Cl) and to Qr in the right image Ir (center Cr); the homography H induced by the plane containing P maps one image onto the other.]

Page 27

Notes

An n-dimensional homography of P^n has (n+1) x (n+1) - 1 linearly independent components (also called degrees of freedom) and may be computed from n+2 pairs of points in correspondence.

A projective basis of P^n is composed of n+2 points such that every subset of n+1 of these points is linearly independent.

Points at infinity in P^n have their (n+1)-th homogeneous coordinate equal to zero: (x_1, x_2, ..., x_n, 0).

A point at infinity is transformed to a point at infinity by an affine transformation.

A point at infinity may be transformed to a point at a FINITE distance by a projective transformation.

Page 28

Example with a 2-D projective transformation (a point at infinity may become finite):

\begin{pmatrix} A & t \\ v^T & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} = \begin{pmatrix} A \begin{pmatrix} x \\ y \end{pmatrix} \\ v^T \begin{pmatrix} x \\ y \end{pmatrix} \end{pmatrix}

Example with a 2-D affine transformation (a point at infinity stays at infinity):

\begin{pmatrix} A & t \\ 0^T & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} = \begin{pmatrix} A \begin{pmatrix} x \\ y \end{pmatrix} \\ 0 \end{pmatrix}
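A quick numeric check of these two notes (the matrices are illustrative): the same point at infinity stays at infinity under an affine map but becomes a finite point under a projective one.

```python
import numpy as np

p_inf = np.array([1., 2., 0.])        # a point at infinity of P^2 (direction (1, 2))

affine = np.array([[2., 0., 5.],
                   [0., 3., 7.],
                   [0., 0., 1.]])     # last row (0, 0, 1): affine

projective = np.array([[2., 0., 5.],
                       [0., 3., 7.],
                       [1., 1., 1.]]) # general last row: projective

print(affine @ p_inf)      # [2. 6. 0.] -> still at infinity
print(projective @ p_inf)  # [2. 6. 3.] -> the FINITE point (2/3, 2)
```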

Page 29

Videos

- Vision with color features and a second-order moment descriptor
- Vision for human motion capture by means of a model of an articulated object
- Vision with the color CMUcam
- 3-D finger visual tracking

Page 30

1.2 Video acquisition and perspective model

The camera modeling takes into account the physical properties of the visual sensors. An image acquisition device is composed of:

- photo-sensitive elements (CMOS/CCD) arranged in a matrix,
- an optical lens,
- electronic circuits (filtering, ADC, sampling, synchronization signals, ...) which provide output video data in an analog norm (RS-170/CCIR, NTSC/PAL) or in digital frames (FireWire, USB, Ethernet, CameraLink, ...).

The camera geometrical models usually used are based on:

- the perspective projection, or one of its approximations: para-perspective, ortho-perspective, weak perspective,
- non-linear effects: distortion (lens, mis-synchronism), radial and tangential.

Page 31

General synopsis of the image acquisition

Page 32

• Perspective model (details)

The geometrical relationship between a point of the 3-D scene (world) and the image plane is usually issued from the perspective model, which is represented with the following (3x4) and (4x4) matrices:

- a 3-D position vector t and a 3-D rotation R (object frame to camera frame),
- a canonical perspective projection (camera frame to image frame),
- an affine transformation (image to pixels: an upper-triangular matrix).

[Figure: a scene object point P projects to Q on the image plane through the optical centre C; the optical axis carries z_c and the camera frame is (x_c, y_c, z_c).]

\begin{pmatrix} x_c \\ y_c \\ z_c \\ 1 \end{pmatrix} = \begin{pmatrix} R & t \\ 0^T & 1 \end{pmatrix} \begin{pmatrix} x_o \\ y_o \\ z_o \\ 1 \end{pmatrix}, \qquad z_c \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} G_x & 0 & u_c & 0 \\ 0 & G_y & v_c & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_c \\ y_c \\ z_c \\ 1 \end{pmatrix}
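As a sketch (variable names are mine, not from the slides), the two matrices above compose into the full scene-to-pixel mapping z_c (u, v, 1)^T = K (R t) (x_o, y_o, z_o, 1)^T:

```python
import numpy as np

def project_point(K, R, t, P_o):
    """Map an object-frame 3-D point to pixel coordinates (perspective model)."""
    P_c = R @ P_o + t     # extrinsic parameters: object frame -> camera frame
    q = K @ P_c           # intrinsic parameters: camera frame -> homogeneous pixels
    return q[:2] / q[2]   # perspective division by z_c

# Illustrative intrinsic parameters: Gx, Gy (pixels) and principal point (uc, vc).
K = np.array([[800.,   0., 384.],
              [  0., 800., 286.],
              [  0.,   0.,   1.]])
R = np.eye(3)                        # object frame aligned with the camera frame
t = np.array([0.1, 0.0, 1.0])        # object placed 1 m in front of the camera
print(project_point(K, R, t, np.array([0.0, 0.05, 0.0])))   # -> [464. 326.]
```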

Page 33

• Perspective model (details)

The main assumptions for which the perspective model is valid (and which come from thin-lens properties) are:

- Gauss conditions (small inclination and paraxial rays),
- monochromatic light,
- the image surface is planar,
- the image plane is perpendicular to the optical axis.

- The parameters G_x, G_y, u_c and v_c are named the intrinsic parameters of the vision system (matrix K), with G_x = s_x f / l_x and G_y = f / l_y.

- The rigid transformation between a reference frame attached to the object and that of the camera is a Euclidean transformation named the viewpoint, or extrinsic parameters.

z_c \begin{pmatrix} u_s \\ v_s \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_c \\ y_c \\ z_c \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \underbrace{\begin{pmatrix} G_x & 0 & u_c \\ 0 & G_y & v_c \\ 0 & 0 & 1 \end{pmatrix}}_{K} \begin{pmatrix} u_s \\ v_s \\ 1 \end{pmatrix}

Page 34

• Perspective model (details)

Non-linear effects are due to the strong lens curvature of short focal lengths, at the borders of the image: the shorter the focal length, the more significant the non-linear effects.

Radial distortion model (R. Tsai, IEEE Journal of Robotics and Automation, 1987):

u_s = u_d + D_x, \qquad v_s = v_d + D_y

with

D_x = u_d (\kappa_1 r^2 + \kappa_2 r^4 + \ldots), \qquad D_y = v_d (\kappa_1 r^2 + \kappa_2 r^4 + \ldots), \qquad r^2 = u_d^2 + v_d^2
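A minimal sketch of this radial model, truncated at the two coefficients shown above (function and variable names are illustrative):

```python
import numpy as np

def radial_correction(ud, vd, k1, k2):
    """Tsai's radial model: distorted coords (ud, vd) -> corrected (us, vs)."""
    r2 = ud**2 + vd**2                 # r^2 = ud^2 + vd^2
    factor = k1 * r2 + k2 * r2**2      # kappa1 r^2 + kappa2 r^4 (series truncated)
    return ud * (1. + factor), vd * (1. + factor)   # us = ud + Dx, vs = vd + Dy

us, vs = radial_correction(0.30, -0.20, k1=0.1, k2=0.01)
print(us, vs)
```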

Page 35

• Perspective model

Page 36

Exercise

1) Given n collinear points in space P_i, i = 1, ..., n, express the geometrical relationships between that 1-dimensional object of interest (represented by the set of previous points) and its image under the perspective projection:

- a) the projective transformation between the straight line supporting the points and the image plane,

- b) the metric transformation, for a rigid object, between the reference frame attached to the object and the image frame, when the camera parameters (matrix K) are introduced.

2) How many points in correspondence (object/image) are necessary to compute the orientation and the position of the 1-dimensional object (the camera is assumed to be calibrated)? Express and solve this pose problem.

[Figure: collinear points P0, P1, P2 and the projection Q0 of P0 through the center C onto the image plane.]

Page 37

Videos

- 3-D vision with a biped robot
- 3-D fast vision for a game with pucks
- 3-D reconstruction with a 2-D ultrasound probe
- Visual path following with monocular vision

Page 38

Exercise

1) With the following technical specifications (next page) of a CCIR camera, give approximate values for the diagonal elements of the camera matrix K (matrix of intrinsic parameters) for a vision system composed of:

- a JAI CV-A50 camera,
- a lens with a focal length of 16 mm,
- a capture PC-board able to grab 25 images per second at a resolution of 768 (H) x 572 (V) pixels.

2) Give a rough estimate of the decentering, that is, the coordinates of the image of the camera optical axis (the principal point).

Page 39

Document

Page 40

Exercise (homework)

Given a planar scene (that is, a set of coplanar points P in the 3-D space) and two views of that scene, show that if each view is related to the planar scene by a perspective projection, then the two views are directly related by a planar homography.

(Hints: Explain that the 3-D points P = (x, y, z, 1)^T lie in a common plane defined by a 4-vector \pi = (n, -d). Express each perspective projection (scene to right image point Q_r, and scene to left image point Q_l). Express, for one projection, the back-perspective transformation (the ray) and introduce the coplanarity of the scene points: \pi^T P = 0.)

[Figure: a point P of the planar scene projects to Ql in the left image Il and to Qr in the right image Ir; the plane containing P induces the homography H between the two images.]

Page 41

1.3 Pose estimation (2-D/3-D rigid registration)

The purpose is to compute the extrinsic parameters (R and t) given the image data and the geometric model of the visible object (the intrinsic parameters are assumed to be known).

- 3-D localization: the object is rigid (and opaque).

w \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \underbrace{K \begin{pmatrix} R & t \end{pmatrix}}_{M_{co}} \begin{pmatrix} x_o \\ y_o \\ z_o \\ 1 \end{pmatrix}

- The real (3x4) matrix M_co has 11 independent components (it is defined up to scale). When its decomposition is not taken into account, one has to provide six correspondences between the object points and their projections in the image to compute all the components of that matrix.

- The matrix transformation M_co (with t_z > 0) may be computed with fewer points if one takes the above decomposition into account.

- A very useful particular case is the planar case, which arises for a thin object of interest (with respect to the camera-object distance). This case can be handled with a set of n coplanar feature points and their projections in the image => planar homographies (also named 2-D homographies).

Page 42

• Pose problem (1): the planar case

A planar homography H = (h_ij) exists between two sets of coplanar points (between two planes): one is the object plane, the other is the image plane. This transformation is fully determined, but up to a scale (h_33 can be set to 1, for instance, but there are other choices), with a small set of n = 4 correspondences (object points / image points). Assuming that the object plane is represented by z^(o) = 0 (that is, all 3-D points are orthogonally projected onto the plane along the z^(o) direction), one has:

w \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = H \begin{pmatrix} x_o \\ y_o \\ 1 \end{pmatrix} = \underbrace{K \begin{pmatrix} R & t \end{pmatrix}}_{M_{co}} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_o \\ y_o \\ 1 \end{pmatrix}

Once H is computed linearly with at least four correspondences, the goal is to compute the rotation matrix R = (c_1, c_2, c_3) and the position vector t, given the camera intrinsic parameter matrix K:

H = K \begin{pmatrix} c_1 & c_2 & t \end{pmatrix} \quad and \quad c_3 = c_1 \wedge c_2

Page 43

• Pose problem (2): the planar case

Stage 1: computation of H. Every point correspondence (P_i / Q_i) provides two equations:

\underbrace{\begin{pmatrix} -x_i & -y_i & -1 & 0 & 0 & 0 & u_i x_i & u_i y_i \\ 0 & 0 & 0 & -x_i & -y_i & -1 & v_i x_i & v_i y_i \end{pmatrix}}_{m_i^T} \cdot h = \underbrace{\begin{pmatrix} -u_i \\ -v_i \end{pmatrix}}_{k_i}

with the vector h of unknowns, which stacks all but one of the components of the matrix H (taking h_33 = 1):

h = (h_{11} \ h_{12} \ h_{13} \ h_{21} \ h_{22} \ h_{23} \ h_{31} \ h_{32})^T

The linear least-squares solution is straightforward:

h = (M^T M)^{-1} M^T k = M^{\#} k, \quad with \ M = (m_1^T, m_2^T, \ldots, m_n^T)^T \ and \ k = (k_1^T, k_2^T, \ldots, k_n^T)^T,

and with at least 4 pairs of object/image points.
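A compact numpy sketch of this stage (the naming follows the slide: the stacked system M h = k is solved in the least-squares sense):

```python
import numpy as np

def homography_dlt(obj_pts, img_pts):
    """Estimate H (with h33 = 1) from n >= 4 plane/image correspondences."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(obj_pts, img_pts):
        rows.append([-x, -y, -1,  0,  0,  0, u * x, u * y])   # m_i (first row)
        rows.append([ 0,  0,  0, -x, -y, -1, v * x, v * y])   # m_i (second row)
        rhs.extend([-u, -v])                                  # k_i
    M, k = np.asarray(rows), np.asarray(rhs)
    h, *_ = np.linalg.lstsq(M, k, rcond=None)   # h = M^# k
    return np.append(h, 1.0).reshape(3, 3)      # restore h33 = 1

# Four coplanar object points (z^(o) = 0) and their images:
obj = [(0., 0.), (1., 0.), (1., 1.), (0., 1.)]
img = [(10., 12.), (110., 14.), (112., 113.), (12., 110.)]
print(homography_dlt(obj, img))
```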

Page 44

• Pose problem (3): the planar case

Stage 2: estimation of R and t. Recall: R and t are then directly computed, whatever the non-null real parameter \mu:

\begin{pmatrix} c_1 & c_2 & t \end{pmatrix} = \mu K^{-1} H

[exercise: express the calculations]

Problem: R is not exactly a rotation matrix when the data are noisy. One can normalize c_1 or c_2, but this is not enough to ensure that the matrix R is orthonormal: R^T R \neq I.

Alternative solution: the singular value decomposition (SVD) of R, R = U D V^T. If the magnitudes of the diagonal elements of D are not equal to one, take the closest matrix R' (according to the Frobenius norm), with the singular values enforced to 1:

R' = U V^T \quad and \quad \det(R') = 1
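The two stages combine into a short pose-from-homography routine; this is a sketch under the slide's assumptions (planar object with z^(o) = 0, known K, t_z > 0), with the SVD correction applied at the end:

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover (R, t) from a planar homography H = mu * K [c1 c2 t] (sketch)."""
    A = np.linalg.inv(K) @ H
    mu = 1.0 / np.linalg.norm(A[:, 0])   # fix the scale so that ||c1|| = 1
    if mu * A[2, 2] < 0:                 # choose the sign giving tz > 0
        mu = -mu
    c1, c2, t = mu * A[:, 0], mu * A[:, 1], mu * A[:, 2]
    R = np.column_stack([c1, c2, np.cross(c1, c2)])   # c3 = c1 ^ c2
    U, _, Vt = np.linalg.svd(R)          # with noisy data, R^T R != I, so take
    Rp = U @ Vt                          # the closest rotation R' = U V^T
    if np.linalg.det(Rp) < 0:            # enforce det(R') = +1
        Rp = U @ np.diag([1., 1., -1.]) @ Vt
    return Rp, t
```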

Page 45

Exercise

In practice, to get well-conditioned matrices, it is necessary to normalize the data. This exercise proposes a very simple way to normalize the data, a useful stage before applying any fitting.

1) In the case of the planar pose problem, compute the normalized homography matrix H' as a function of the homography H, when the normalization is a translation to the centroid of the features (points are the only type of feature considered herein) followed by a scaling; the object model is assumed to be noise-free. That is, given n points Q_i = (u_i, v_i, 1)^T, the centroid is given by

\bar{u} = \sum_i u_i / n, \qquad \bar{v} = \sum_i v_i / n,

and the scaling is

d = \sqrt{ \frac{1}{2n} \sum_i \left( (u_i - \bar{u})^2 + (v_i - \bar{v})^2 \right) }, \qquad \tilde{Q}_i = \begin{pmatrix} (u_i - \bar{u})/d & (v_i - \bar{v})/d & 1 \end{pmatrix}^T .

Compute the (3x3) normalization matrix N such that N \cdot Q_i = \tilde{Q}_i.

2) In the case of two data images issued from a planar scene, the two image planes are related by a planar homography H. Carry out the normalization for the two sets of image data.
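A sketch of the matrix N asked for in question 1, assuming the centroid-plus-scaling definition above:

```python
import numpy as np

def normalization_matrix(pts):
    """N mapping (u, v, 1)^T to centered points with unit average spread."""
    pts = np.asarray(pts, dtype=float)
    ubar, vbar = pts.mean(axis=0)
    d = np.sqrt(((pts - [ubar, vbar]) ** 2).sum() / (2 * len(pts)))
    return np.array([[1/d,  0., -ubar/d],
                     [ 0., 1/d, -vbar/d],
                     [ 0.,  0.,      1.]])

# For question 2: with Ql' = Nl Ql and Qr' = Nr Qr, the homography H' fitted
# on the normalized data satisfies H' = Nr H Nl^-1, i.e. H = Nr^-1 H' Nl.
```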

Page 46

Page 47

Chapter 2

From the motion field to visual servoing

Page 48

2.1 Motion field analysis

A kinematic (motion) analysis is relevant when the frequency of the image acquisition is high, that is, when the frame rate is fast with respect to the apparent motion of the scene object.

When the variables to control are expressed in the 3-D space, the information to extract from the video data is the infinitesimal variation of the orientation and position of the visual sensor; this is the only relevant cue. Then, from a desired 3-D position and orientation (reference orientation and translation), one can carry out a 3-D visual servoing, better named position-based visual servoing (PBVS). An intermediate step of object localization is needed in that case (see Chapter 1).

When the variables to control are expressed in the sensor space (here, the image plane), the information to extract from the video data is the infinitesimal variation of image descriptors (coordinates of image points or lines, image moments, SSD, Fourier descriptors, ...). Then, from a desired image (reference image), one may carry out a 2-D visual servoing, better known as image-based visual servoing (IBVS).

There are also approaches which combine 2-D and 3-D signals: these are named hybrid techniques, one of them being 2-D 1/2 visual servoing.

Page 49

2.1 Motion field analysis (2)

When simple geometric features (points, lines, conics, moments, ...) describe the area of interest in the image (after pre-processing and segmentation), an interaction matrix is associated with each of them. This matrix comes from the Jacobian computation and the feature model.

In contrast, if the visual cues are based on the appearance at the pixel level (the distribution of the luminance in an image region, for instance), the apparent motion in the image may be computed with an optical-flow technique, making it possible to carry out a visual servoing based on visual tracking.

Page 50

2.1 Motion field analysis (3)

The aim of a visual tracking or vision-based control scheme is to minimize an error vector e(t), which is typically defined by:

e(t) = s(m(t), a) - s* .

This formulation is quite general and encompasses a wide variety of approaches. The vector m(t) is a set of image measurements (e.g., the image coordinates of interest points or the image coordinates of the centroid of an area). These image measurements are used to compute a vector of k visual features, s(m(t), a), in which the vector a is a set of parameters representing potential additional knowledge about the system (e.g., coarse camera intrinsic parameters or 3-D models of objects). The vector s* gathers the desired values of the features. In this analysis, we consider the case of a fixed goal pose and a motionless target, i.e., s* is constant and changes in s depend only on the camera motion.

Visual servoing and tracking (and filtering) schemes mainly differ in the way the vector s is designed. With IBVS, s consists of a set of features that are immediately available in the image data. With PBVS, s consists of a set of 3-D parameters, which must be estimated from image measurements.

Page 51

2.1 Motion field analysis (4)

Once s is selected, the design of the visual tracking or visual servoing scheme can be quite simple. The approach is to design a velocity controller. To do this, we require the relationship between the time variations of s and the (virtual or real) camera velocity. As we consider only rigid motions, the spatial velocity of the camera can be represented by a velocity screw denoted by

\tau_c = (v_c, \omega_c),

where v_c is the instantaneous linear velocity of the origin of the camera frame and \omega_c is the instantaneous angular velocity of the camera frame (axes).

The relationship between ds/dt and the velocity screw \tau_c is given by

\dot{s} = L_s \, \tau_c ,

in which L_s is the interaction matrix (or feature Jacobian) related to s.

Page 52

2.1 Motion field analysis (5)

L_s is a real matrix of size (k x 6). With the error e = s - s* and a motionless target (or object of interest), the desired feature vector is constant, so ds*/dt = 0 and

\dot{e} = L_e \, \tau_c, \quad with \ L_e = L_s .

If we wish to have an exponentially decreasing behaviour of the error,

\dot{e} = -\lambda e \quad (\lambda \ a \ positive \ real \ number \ or \ a \ positive-definite \ matrix),

one has

\tau_c = -\lambda \, L_e^{+} \, e ,

where L_e^+ is the Moore-Penrose pseudo-inverse of L_e, that is L_e^+ = (L_e^T L_e)^{-1} L_e^T when the rank of L_e is equal to 6.

In practice, it is not possible to know perfectly the values of the components of L_e or L_e^+, so an estimate of one of these matrices is used (notation ^). The resulting velocity control law is then

\tau_c = -\lambda \, \widehat{L_e}^{+} \, e \qquad or \qquad \tau_c = -\lambda \, \widehat{L_e^{+}} \, e .
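A one-step sketch of this control law (names are mine): given a stacked interaction-matrix estimate and the current error, compute the velocity screw through the pseudo-inverse.

```python
import numpy as np

def ibvs_velocity(L_e_hat, e, lam=0.5):
    """Velocity screw tau_c = -lambda * pinv(L_e_hat) @ e."""
    return -lam * np.linalg.pinv(L_e_hat) @ e   # Moore-Penrose pseudo-inverse

# k = 8 features (e.g. 4 image points), 6 screw components:
L_e_hat = np.random.randn(8, 6)    # stand-in for a real interaction-matrix estimate
e = np.random.randn(8)             # current feature error s - s*
tau_c = ibvs_velocity(L_e_hat, e)  # (vx, vy, vz, wx, wy, wz)
```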

Page 53

2.2 Image-Based Visual Servoing

The interaction matrix represents the geometrical relationship between the time variations of the image features and the velocity screw, which characterizes the rigid motion of the object of interest in the 3-D scene.

Traditional image-based visual control schemes use the time derivative of the perspective projection and the image-plane coordinates of a set of points to define the set s. The image measurements m are usually the pixel coordinates of the set of image points (but this is not the only possible choice), and the parameters a in the definition of s = s(m, a) are nothing but the camera intrinsic parameters, used to go from the image measurements expressed in pixels to the features.

With points Q as image features to build the vector s, the infinitesimal variation dQ = (du, dv)^T of the image coordinates between two time instants t and t' = t + dt is related to a variation of the 3-D point coordinates dP = dCP = (dx^(c), dy^(c), dz^(c))^T expressed in the camera frame (Rc), in motion.

Time first-derivative of the perspective projection:

\dot{z}_c \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} + z_c \begin{pmatrix} \dot{u} \\ \dot{v} \\ 0 \end{pmatrix} = K \begin{pmatrix} \dot{x}_c \\ \dot{y}_c \\ \dot{z}_c \end{pmatrix}

Page 54

On the one hand, the first time derivative dP/dt is the velocity of the point P with respect to the origin of (Rc), expressed in (Rc), dP/dt = V(P/R_c). Then one has:

z_c \, \dot{Q} = \begin{pmatrix} 1 & 0 & -u \\ 0 & 1 & -v \end{pmatrix} K \; V(P/R_c)

On the other hand, assume that the camera frame (Rc), with C as origin, moves with respect to a Galilean frame (Rf), with Of as origin, with an instantaneous linear velocity v_c and an instantaneous angular velocity (rotation velocity) \omega_c:

v_c = V(C/R_f)_{R_c}, \qquad \omega_c = \omega(R_c/R_f)_{R_c}

[Figure: frames (Rf), (Ro) and (Rc) in the configuration for controlling a robot by means of visual servoing with the sensor on-board (eye-in-hand).]

Page 55

If P is a motionless point in this frame [static environment], V(P/R_f) = 0, then by means of the composition law of velocities it comes:

V(P/R_f)_{R_c} = 0 = V(P/R_c) + V(C/R_f)_{R_c} + \omega(R_c/R_f)_{R_c} \wedge CP

or simply, with the notations v_c = V(C/R_f)_{R_c} and \omega_c = \omega(R_c/R_f)_{R_c}:

\dot{CP} = -v_c - \omega_c \times CP

Finally:

\dot{Q} = L_q \, \tau_c, \qquad L_q = \frac{1}{z_c} \begin{pmatrix} 1 & 0 & -u \\ 0 & 1 & -v \end{pmatrix} K \begin{pmatrix} -I_{3 \times 3} & \mathrm{sk}(CP) \end{pmatrix}

The components of the (2x6) interaction matrix L_q related to the point Q depend on the "depth" (exactly, the z-coordinate z^(c)) of the 3-D point P (from which Q is issued) with respect to the camera frame.

• In order to control all the components of the velocity screw, at least three points are needed (k >= 6). But the number of controllable components depends on the structure of the viewed object.

Page 56

If K = I (identity matrix):

L_q = \begin{pmatrix} -1/z_c & 0 & u/z_c & uv & -(1+u^2) & v \\ 0 & -1/z_c & v/z_c & 1+v^2 & -uv & -u \end{pmatrix}

• Since the "depth" z^(c) in L_q (translation part) requires that this parameter be estimated or coarsely identified, only an estimate of L_q is available. That is why several choices are available for constructing the estimate used in the control law. Three strategies are commonly proposed for approximating the interaction matrix (a numeric sketch of the point case follows the list):

• \widehat{L_q} is updated at each iteration (every image acquisition): \widehat{L_q} = L_q ;

• \widehat{L_q} is kept constant during the whole visual servo, equal to that of the desired configuration q*: \widehat{L_q} = L_{q*} ;

• \widehat{L_q} is a linear combination of the first two strategies: \widehat{L_q} = (L_q + L_{q*}) / 2 .
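A sketch of this point-feature matrix and of its stacking for several points (assuming normalized coordinates, K = I, and known depths):

```python
import numpy as np

def interaction_matrix_point(u, v, z):
    """(2 x 6) interaction matrix of an image point (normalized coords, K = I)."""
    return np.array([
        [-1/z,   0., u/z,       u*v, -(1. + u**2),  v],
        [  0., -1/z, v/z, 1. + v**2,         -u*v, -u],
    ])

def stacked_interaction(points):
    """Stack the (2x6) blocks of several points into a (2k x 6) matrix."""
    return np.vstack([interaction_matrix_point(u, v, z) for (u, v, z) in points])

# Three points (k = 6), enough to control all six screw components:
L = stacked_interaction([(0.1, 0.2, 1.0), (-0.3, 0.1, 1.2), (0.0, -0.2, 0.9)])
```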

Page 57

Exercise

We consider a flexible robot (flexible joints) composed of two parallel rotation axes (see figure below). The end-segment position can be controlled either with resolvers and incremental position encoders placed on every joint, or with a fast vision system. With the latter measuring solution, the vision system is placed in front of (and perpendicular to) the planar working area (x, y) of the robot. The z axis, normal to this plane, then coincides with the camera optical axis (in the opposite direction). In the case visual servoing is the selected sensing and control scheme, a set of red landmarks is stuck on the first segment (corps 1) and a set of blue landmarks on the second one (corps 2). The vision system consists of a lens and a digital camera, and the geometrical transformation from a point of the scene (robot plane) to its projection in the image plane is represented by the perspective projection, without any distortion. In the sequel, we also assume that the camera intrinsic parameter matrix is the identity matrix.

[Figure: articulated mechanical system (robot) whose segments (= corps) carry colored landmarks.]

1) Since a 3-D rotation matrix R, such that R^T = (r_1, r_2, r_3), is an orthonormal matrix, show that it is related to the instantaneous angular velocity \Omega by:

\mathrm{sk}(\Omega) = \frac{dR}{dt} R^T \qquad and \qquad \mathrm{sk}(\Omega) = \begin{pmatrix} 0 & -\Omega_z & \Omega_y \\ \Omega_z & 0 & -\Omega_x \\ -\Omega_y & \Omega_x & 0 \end{pmatrix}

Page 58

Exercise (continued)

2) Show from 1) that:

\Omega = \begin{pmatrix} r_3^T \dot{r}_2 \\ r_1^T \dot{r}_3 \\ r_2^T \dot{r}_1 \end{pmatrix} = \begin{pmatrix} \Omega_x \\ \Omega_y \\ \Omega_z \end{pmatrix}

Each landmark position P_i^k is defined by the homogeneous coordinate (\lambda_i^k, 1)^T on segment i (corps i), with respect to an arbitrarily chosen origin O_i on segment i. The two segments (corps 1 and 2) are considered as rigid and independent bodies. They may respectively contain n_1 and n_2 collinear landmarks.

3) Show that the rotation matrix R_i (see below; equivalently, the unit vector x_i = (x_{i1}, x_{i2}, 0)^T) and the position vector t_i = (t_x^i, t_y^i, t_z)^T of segment i, with respect to the camera frame (with origin C and expressed in the camera frame (Rc)), are related to the coordinates of the image point Q_i^k = (u_i^k, v_i^k), image of the centre of each landmark P_i^k (the image plane is parallel to the robot plane; the inter-plane distance is equal to t_z, the third, constant and strictly positive, component of the position vector). Give this geometrical relationship.

R_i = \begin{pmatrix} x_{i1} & -x_{i2} & 0 \\ x_{i2} & x_{i1} & 0 \\ 0 & 0 & 1 \end{pmatrix}

4) How many landmarks P_i^k does one have to place on each segment (corps i) to compute the position and the orientation of this segment with respect to the camera frame (Rc)? Then give a method to compute the two vectors x_i and t_i = (t_x^i, t_y^i, t_z)^T.

[Figure: projections of the landmarks of segment i onto the image plane.]

Page 59

Exercise (end)

5) By derivation of the previous result, show that for a small displacement of each landmark P_i^k, it comes:

\begin{pmatrix} \dot{u}_i^k \\ \dot{v}_i^k \end{pmatrix} = \frac{1}{t_z} \begin{pmatrix} 1 & 0 & -\lambda_i^k x_{i2} \\ 0 & 1 & \lambda_i^k x_{i1} \end{pmatrix} \begin{pmatrix} V_x \\ V_y \\ \Omega_z \end{pmatrix}, \qquad where \ \dot{t} = (\dot{t}_x, \dot{t}_y, 0)^T = (V_x, V_y, V_z)^T

6) How many degrees of freedom can be controlled with these two independent objects of interest (each segment)?

7) Finally, the two segments are linked with a joint (a revolute joint; they are not independent in this question). How many degrees of freedom does this articulated object (segments 1 and 2) have?

Page 60

2.3 Position-Based Visual Servoing

[Figure: (Ro) object/target frame (object of interest), (Rc) camera frame (current situation), (Rc*) camera frame (desired situation).]

With PBVS, a desired orientation (rotation matrix) Rco* = Rc*o and a desired position vector tco* = tc*o of the visual sensor (defining the desired frame Rc*), with respect to a target or object of interest in the 3-D scene (frame Ro), define the reference situation. To do that, the vector s is built with 3-D information only and needs a partial (or full) camera pose estimation. It is then typical to define s in terms of the parameterization used to represent the camera pose.

Page 61

Vector-based representation of a 3-D rotation. This will allow us to easily compute the vector s. Given a rotation matrix R = (R_ij), we define the real vector r = \theta u, where \theta is the rotation angle and u is a unit vector (the rotation direction), such that R = \exp(\mathrm{sk}(r)), that is:

\theta = \arccos\left[ \frac{1}{2} (R_{11} + R_{22} + R_{33} - 1) \right]

u = \begin{pmatrix} \mathrm{sign}(R_{32} - R_{23}) \sqrt{(R_{11} - \cos\theta)/(1 - \cos\theta)} \\ \mathrm{sign}(R_{13} - R_{31}) \sqrt{(R_{22} - \cos\theta)/(1 - \cos\theta)} \\ \mathrm{sign}(R_{21} - R_{12}) \sqrt{(R_{33} - \cos\theta)/(1 - \cos\theta)} \end{pmatrix}

Then one can choose s = (tco, \theta u) for the current situation and s* = (tc*o, (\theta u)*) for the desired situation. The vector (\theta u)* corresponds (is equivalent) to the matrix Rco*.
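A direct sketch of this extraction (valid away from theta = 0 and theta = pi, where the division and the signs degenerate):

```python
import numpy as np

def theta_u_from_rotation(R):
    """Extract the vector r = theta * u from a rotation matrix (generic case)."""
    theta = np.arccos(0.5 * (np.trace(R) - 1.0))
    c = np.cos(theta)
    signs = np.sign([R[2, 1] - R[1, 2],   # sign(R32 - R23)
                     R[0, 2] - R[2, 0],   # sign(R13 - R31)
                     R[1, 0] - R[0, 1]])  # sign(R21 - R12)
    u = signs * np.sqrt(np.clip((np.diag(R) - c) / (1.0 - c), 0.0, 1.0))
    return theta * u

# Sanity check on a rotation of 0.3 rad about the z axis:
a = 0.3
Rz = np.array([[np.cos(a), -np.sin(a), 0.],
               [np.sin(a),  np.cos(a), 0.],
               [0., 0., 1.]])
print(theta_u_from_rotation(Rz))   # -> approximately [0, 0, 0.3]
```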

Page 62

Velocity control law: with R = Rc*c = Rc*o Roc, the resulting \theta u vector describes the relative orientation between the actual (current) and desired orientations.

One may define s with s = (tco, \theta u) and s* = (tc*o, 0).

Then the error (and task function) is e = (tco - tc*o, \theta u), and the (6x6) interaction matrix is given by

L_e = \begin{pmatrix} -I_3 & \mathrm{sk}(t_{co}) \\ 0 & L_{\theta u} \end{pmatrix}

inside which the (3x3) sub-matrix L_{\theta u} is expressed as:

L_{\theta u} = I_3 - \frac{\theta}{2} \mathrm{sk}(u) + \left( 1 - \frac{\mathrm{sinc}\,\theta}{\mathrm{sinc}^2(\theta/2)} \right) \mathrm{sk}(u)^2

Page 63

Then the computation of the velocity screw is straightforward:

\tau_c = -\lambda \, L_e^{-1} \, e, \qquad L_e^{-1} = \begin{pmatrix} -I_3 & \mathrm{sk}(t_{co}) L_{\theta u}^{-1} \\ 0 & L_{\theta u}^{-1} \end{pmatrix}

which gives, using L_{\theta u}^{-1} \theta u = \theta u:

v_c = -\lambda \left[ (t_{c*o} - t_{co}) + \mathrm{sk}(t_{co}) \, \theta u \right], \qquad \omega_c = -\lambda \, \theta u

Notes:

1) One may observe that the velocity control is partially decoupled, and may be fully decoupled (if necessary) with sequential visual servos.

2) No warranty is given about the visibility of the image features used to compute the pose at each iteration. It is possible that several image features leave the field of view during the visual servoing!

3) The computation of the desired and current poses depends on the camera intrinsic parameters. The final accuracy therefore strongly depends on the camera calibration.
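A sketch of this PBVS command, reusing theta_u_from_rotation from the previous sketch (sk builds the skew-symmetric matrix):

```python
import numpy as np

def sk(w):
    """Skew-symmetric matrix such that sk(w) @ p = w x p."""
    return np.array([[   0., -w[2],  w[1]],
                     [ w[2],    0., -w[0]],
                     [-w[1],  w[0],    0.]])

def pbvs_velocity(t_co, t_cstar_o, R_cstar_c, lam=0.5):
    """v_c = -lam [(t_c*o - t_co) + sk(t_co) theta u],  w_c = -lam theta u."""
    tu = theta_u_from_rotation(R_cstar_c)   # relative orientation R = R_c*c
    v_c = -lam * ((t_cstar_o - t_co) + sk(t_co) @ tu)
    w_c = -lam * tu
    return v_c, w_c
```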

Page 64

2.4 Stability analysis

To assess the stability of the closed-loop visual servo system, we will use Lyapunov analysis. In particular, consider the candidate Lyapunov function defined by the squared error norm (a locally positive-definite function)

\mathcal{L} = \frac{1}{2} \| e \|^2 = \frac{1}{2} e^T e ,

whose derivative is given by

\dot{\mathcal{L}} = e^T \dot{e} = -\lambda \, e^T L_e \widehat{L_e}^{+} e .

The global asymptotic stability of the system is thus obtained when the following sufficient condition is ensured:

L_e \widehat{L_e}^{+} > 0 .
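As an illustration only (a toy linear model, not from the slides), a few Euler steps of the closed loop e_dot = L_e tau_c with tau_c = -lambda L_e_hat^+ e show the expected decrease of ||e|| when the condition holds:

```python
import numpy as np

rng = np.random.default_rng(0)
L_e = rng.standard_normal((8, 6))                # "true" interaction matrix
L_hat = L_e + 0.1 * rng.standard_normal((8, 6))  # coarse estimate of it
e = L_e @ rng.standard_normal(6)                 # a realizable initial error
lam, dt = 1.0, 0.05

for k in range(101):
    tau_c = -lam * np.linalg.pinv(L_hat) @ e     # control law
    e = e + dt * (L_e @ tau_c)                   # Euler step of e_dot = L_e tau_c
    if k % 25 == 0:
        print(k, np.linalg.norm(e))              # the norm decays toward zero
```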

Page 65

Videos: several IBVS and PBVS

- Visual servoing with points
- Visual servoing with coplanar lines
- Visual servoing with 3-D lines (motionless target)
- Very fast visual servoing
- Visual servoing/tracking with 3-D lines
- Visual servoing with redundancy
- Visual servoing with lasers and LEDs for minimally invasive surgery
- Visual servoing with limbs for parallel robots

Page 66

Videos, and some applications to Augmented Reality

- Visual tracking for mobility-impaired persons
- Visual servoing-based tracking of lines
- Visual servoing-based tracking of points of interest
- Virtual visual servoing for camera pose (multiple features)
- Virtual visual servoing for a surgical instrument with an endoscopic camera (multiple features)
- Virtual visual servoing for suturing
- Visual servoing with stereo and ESM
- Simulation of automatic landing