
Uncalibrated 3D Stereo Image-based Dynamic Visual Servoing for Robot Manipulators

    Caixia Cai†, Emmanuel Dean-León‡, Dario Mendoza§, Nikhil Somani† and Alois Knoll†

Abstract— This paper introduces a new comprehensive solution for the open problem of uncalibrated 3D image-based stereo visual servoing for robot manipulators. One of the main contributions of this article is a novel 3D stereo camera model that maps positions in the task space to positions in a new 3D Visual Cartesian Space (a visual feature space where 3D positions are measured in pixels). This model is used to compute a full-rank Image Jacobian Matrix (Jimg), which solves several common problems present in the classical image Jacobians, e.g., image space singularities and local minima. This Jacobian is fundamental to the image-based control design, where uncalibrated stereo camera systems can be used to drive a robot manipulator. Furthermore, an adaptive second order sliding mode visual servo control is designed to track 3D visual motions using the 3D trajectory errors defined in the Visual Cartesian Space. The stability of the control in closed loop with a dynamic robot system is formally analyzed and proved, where exponential convergence of errors in the Visual Cartesian Space and task space without local minima is demonstrated. The complete control system is evaluated both in simulation and on a real industrial robot. The robustness of the control scheme is evaluated for cases where the extrinsic parameters of the stereo camera system change on-line and the kinematic/dynamic robot parameters are considered as unknown. This approach offers a proper solution for the common problem of visual occlusion, since the stereo system can be moved to obtain a clear view of the task at any time.

    I. INTRODUCTION

Visual servoing control (VSC) is an approach to control the motion of a robot manipulator using visual feedback signals from a vision system. This has been one of the most active topics in robotics since the early 1990s [1]. The vision system can be mounted directly on a robot end-effector (eye-in-hand configuration) or fixed in the work space (fixed-camera configuration). Additionally, visual servoing approaches differ in the way in which error functions are defined. In Image-Based Visual Servoing (IBVS) the error function is defined directly in terms of image features. In Position-Based Visual Servoing (PBVS) the error function, which is specified in task space coordinates, is obtained from the visual information [2]. The conclusion drawn in many of the previous works, e.g., [1] and [3], is that the IBVS method is more favorable than the PBVS method, since IBVS has low sensitivity to camera calibration errors.

Authors' Affiliation: †Technische Universität München, Fakultät für Informatik. ‡Cyber-Physical Systems, fortiss - An-Institut der Technischen Universität München. §UPIITA-IPN

Address: †Boltzmannstraße 3, 85748 Garching bei München (Germany). ‡Guerickestr. 25, 80805 München (Germany). §Av. IPN 2508, Mexico City (Mexico)

Email: †{caica,somani,knoll}@in.tum.de, ‡{dean}@fortiss.org

Two main aspects have a great impact on the behavior of any visual servoing scheme: the selection of the visual features used as input of the control law and the form of the control scheme. This work focuses on the study of IBVS design, where these two fundamental problems are handled: First, the definition of a new image Jacobian (Jimg) for a stereo camera system in fixed-camera configuration (see Fig. 1). Second, the design of a Dynamic Visual Servoing control for robot manipulators, which includes unknown camera and robot parameters (kinematic and dynamic).

An IBVS usually employs the image Jacobian matrix (Jimg) to relate end-effector velocities in the manipulator's task space to the feature parameter velocities in the feature (image) space. A full and comprehensive survey on Visual Servoing and image Jacobian definitions can be found in [1], [2], [4] and more recently in [5]. In general, the classical image Jacobian is defined using a set of image feature measurements, usually denoted by s, and it describes how image features change when the robot manipulator pose changes, ṡ = Jimg v. In Dynamic Visual Servoing we are interested in determining the manipulator's required dynamical behavior to achieve some desired value of the image feature parameters. This implies calculating or estimating the image Jacobian and using its inverse to map the image feature velocities ṡ into a meaningful state variable required for the control law, usually the generalized joint velocities q̇.

In general, the image Jacobian can be computed using direct depth information (depth-dependent Jacobian) [6], [7], by approximation via on-line estimation of the depth of the features (depth-estimation Jacobian) [2], [5], [8], [9], or using a depth-independent image Jacobian matrix [10], [11], [12]. Additionally, many papers directly estimate the complete image Jacobian on-line in different ways [13], [14], [15], [16], [17]. However, all these methods use redundant image point coordinates to define (as a general rule) a non-square image Jacobian, which is a differentiable mapping from SE(3) to s ∈ R^{2p} (with p as the number of feature points). Then, a generalized inverse of the image Jacobian matrix needs to be computed, which leads to well-known problems such as image space singularities and local minima. In order to obtain a full-rank Image Jacobian, [18] proposed an approach where the definition of the features is based on a combination of both the IBVS and PBVS approaches, attempting to incorporate the advantages of each method. However, this method requires exact camera calibration, and in the presence of calibration errors the convergence of the error functions (in the image or in the task space) cannot be guaranteed.

In the context of control schemes, kinematic-based controls cannot yield high performance and guarantee stability because the nonlinear forces in the robot dynamics are neglected. This problem, known as Dynamic Visual Servoing, was studied by Hashimoto et al. [19] for the eye-in-hand setup and by Zergeroglu et al. [20] for the fixed-camera configuration. Their methods work well when the camera intrinsic and extrinsic parameters are known. In order to avoid tedious and costly camera calibration, Astolfi et al. [21], Bishop et al. [22], Hsu et al. [23], Dean et al. [24] and Kelly [25] proposed to employ an adaptive algorithm to estimate the unknown camera parameters on-line. However, these methods are applicable to planar manipulators only. The problem of 3D uncalibrated visual servoing with robot dynamics has been tackled with new adaptive controllers [11], [26]. However, the image Jacobians in these approaches are not full-rank and they suffer from the above-mentioned problems.

In this paper, we address the problem of controlling a robot manipulator to trace time-varying desired trajectories defined in an uncalibrated image space, explicitly taking the robot dynamics into account in the design and in the passivity proof. The contribution of this paper can be summarized in three aspects: 1) The introduction of a new stereo camera model based on a virtual composite camera system. This model provides a new 3D visual feature for the IBVS method. This feature is a representation of task positions in a 3D visual space. 2) Using this stereo camera model, we design a new image Jacobian which maps velocities from the task space to the 3D visual space. The dimension of the image feature vector, obtained from the stereo camera model, allows a full-rank Jacobian, whose singularities depend exclusively on parameters defined by the user. 3) An adaptive second order sliding mode visual servo control using the proposed image Jacobian has been implemented for 3D visual motion tracking. The control law is designed to cope with uncertainties in the robot dynamic/kinematic and camera parameters. Moreover, we extend the implementation of this adaptive image-based control law to include environment constraints, such as robot singularity avoidance and (self-/obstacle-) collision avoidance. By means of Lyapunov stability analysis, we demonstrate the convergence of errors in both the 3D Cartesian visual space and the task space without local minima.

Fig. 1. Description of the robotic experimental setup. The distributed system comprises: a Visual Stereo Tracker (2 cameras, uncalibrated stereo system, 30 ms update), the Robot Low Level Control (torque commands over TCP/IP, 4 ms cycle) and a 3D Visualization System. Panels a) general view and b) top view show the robot end-effector, the target, the stereo system with the two virtual cameras, and a virtual obstacle.

In order to evaluate the control approach and the proposed models, experiments have been conducted both in simulation and on a real industrial robot. In simulation, we control the robot without environment constraints to better illustrate the robustness of the system to uncertainties in the robot parameters and changes in the extrinsic parameters of the stereo system. In the experimental setup, we integrate the visual servoing system in a Human-Robot-Interaction (HRI) scenario (see Fig. 1). In this scenario, we reproduce a more realistic environment including obstacles, robot singularities, visual occlusion, collision avoidance and sporadic loss of targets. We prove that the system is always stable and can be used for HRI tasks.

A. Organization

This paper is organized as follows. First, in Section II we introduce the new 3D Camera Model. This model defines the complete 3D Visual Jacobian, which is used in Section III to design an adaptive image-based 3D visual servoing. In the same section, the passivity proof is shown and a validation in simulation with remarks about the robustness and adaptability of the control is given. In Section IV we present the adaptive image-based torque controller which includes environment constraints. We explain the capabilities of this control and its different properties, such as robot singularity avoidance and collision avoidance. Section V provides an overview of the HRI setup (Fig. 1) and shows the results obtained under different conditions. Finally, Section VI draws the conclusions and outlines future work.

    II. 3D CAMERA MODEL

Fig. 2. Image Projections: The figure depicts the different coordinate frames used to obtain a general 3D visual camera model. X_b ∈ R^{3×1} is the position in meters [m] of an Object with respect to the world coordinate frame (wcf) denoted by O_b. O_Cl and O_Cr are the coordinate frames of the left and right cameras, respectively. R^b_Cl ∈ SO(3) represents the orientation of the wcf with respect to the left camera. O_V is a reference coordinate frame for the virtual orthogonal cameras O_v1, O_v2; it is located at the same position as O_Cl and its orientation is given by R^Cl_V ∈ SO(3) with respect to O_Cl, so that X_V = R^Cl_V X_Cl. λ is the distance from O_vi to O_V along each optical axis i. The vectors p_l, p_r ∈ R^{2×1} are the projections of the point X_b in the left and right cameras. Finally, p_vi ∈ R^{2×1} represents the projection of the Object in the virtual cameras O_vi.

    A. Problem Statement

In the classical IBVS methods, the sensory feedback signals are directly chosen as the image feature measurements s. The vector X_b ∈ R^{3×1} represents the position vector of a target object in the task space, defined in the world coordinate frame (wcf). The vector s = [x_1, y_1, ..., x_p, y_p]^T ∈ R^{2p×1} contains the image feature measurements of all p feature points on the target object. Then, the relation between ṡ and Ẋ_b is given by ṡ = Jimg Ẋ_b, where Jimg ∈ R^{2p×3} is known as the image Jacobian.

If we consider ΔX_b as the input to a robot controller, then we need to compute the inverse mapping of ṡ as ΔX_b = J⁺img Δs, where Δ∗ is an error function defined in the space ∗, and J⁺img ∈ R^{3×2p} is chosen as the Moore-Penrose pseudoinverse of Jimg. This leads to the two characteristic problems of the IBVS method: the feature (image) space singularities and local minima. In this case, the image Jacobian is singular when rank(Jimg) < 3, while the image local minima are defined as the set of image locations Ω_s = {s | Δs = 0, ΔX_b ≠ 0, ∀s ∈ R^{2p×1}} when using redundant image features.

In this work, we define a new type of visual features such that a full-rank image Jacobian Matrix (Jimg ∈ R^{3×3}) can be obtained. The key idea of this model is to combine the stereo camera model with a virtual composite camera model to get a full-rank image Jacobian that maps velocities of a target object (Ẋ_Cl) to velocities of the image features (in our case, pixel velocities in the 3D visual space, denoted here by Ẋ_s), see Fig. 2.

This new 3D visual model can be computed in 3 main steps.

1) Use the standard stereo vision model [27] to recover the 3D relative position of an object with respect to the reference frame of the stereo system, O_Cl (this step can be considered as an intermediate mapping between the stereo image space and the virtual composite image space). This position is denoted by X_Cl ∈ R^{3×1}.

2) The Cartesian position X_Cl is projected onto two virtual cameras O_vi. This projection is a crucial step, since it modifies the dimension of the mapping from the 2D image feature measurements of all p points (s = [p_l1, p_r1, ..., p_lp, p_rp]^T) to a single 3D visual vector s_new = X_s ∈ R^{3×1} in a 3D Visual Cartesian Space. Since s represents the position X_Cl ∈ R^{3×1} in the image feature space, the maximum dimension of s is 3. Therefore, if s ∈ R^{2p×1} (as is commonly defined in the classical methods), there will be 2p−3 linearly dependent elements in s. In this work, we propose a virtual projection that reduces the dimension of s and generates 3 linearly independent elements to compute a full-rank image Jacobian (Jimg).

3) The final step is to transform the visual space velocities (Ẋ_s) to the joint velocity space (q̇) via the image Jacobian (Jimg). This is achieved using the robot Jacobian J(q), i.e., J_s = Jimg J(q).

The following sub-sections are devoted to explaining each of these steps in detail.

B. Stereo Vision Model

The relation between the right camera and the left camera is given by the orientation matrix R^r_l ∈ SO(3) and the translation vector t^r_l ∈ R^{3×1}. Using these transformations we can define the projection matrices P_Cl, P_Cr ∈ R^{3×4} for each camera as P_Cl = K_l [I_{3×3}  0_{3×1}] and P_Cr = K_r [R^r_l  t^r_l], where K_l and K_r are the intrinsic camera matrices of the left and right cameras (these parameters can be computed off-line).

Then, defining the observed image points in each camera as p_l = [x_1 y_1]^T, p_r = [x_2 y_2]^T, we can use triangulation [27] to compute the relative position X_Cl = [x_c y_c z_c]^T of the point X_b with respect to the left camera O_Cl. This can be done by solving the homogeneous system A [X_Cl^T, 1]^T = 0, where

A = \begin{bmatrix}
x_1 p^l_{31} - p^l_{11} & x_1 p^l_{32} - p^l_{12} & x_1 p^l_{33} - p^l_{13} & x_1 p^l_{34} - p^l_{14} \\
y_1 p^l_{31} - p^l_{21} & y_1 p^l_{32} - p^l_{22} & y_1 p^l_{33} - p^l_{23} & y_1 p^l_{34} - p^l_{24} \\
x_2 p^r_{31} - p^r_{11} & x_2 p^r_{32} - p^r_{12} & x_2 p^r_{33} - p^r_{13} & x_2 p^r_{34} - p^r_{14} \\
y_2 p^r_{31} - p^r_{21} & y_2 p^r_{32} - p^r_{22} & y_2 p^r_{33} - p^r_{23} & y_2 p^r_{34} - p^r_{24}
\end{bmatrix}   (1)

where p^l_{ij} denotes the element (i, j) of P_Cl, and p^r_{ij} denotes the element (i, j) of P_Cr.

Before integrating the stereo camera model with the virtual composite model, a re-orientation of the coordinate frame O_Cl is required. This is achieved by defining a new coordinate frame O_V with the same origin as O_Cl. The projection X_V = [x_V y_V z_V]^T of X_Cl in O_V is defined as (see Fig. 2)

X_V = R^{C_l}_V X_{C_l}.   (2)

where R^Cl_V is the orientation of the reference frame O_V with respect to O_Cl (this reference frame is fixed and defined by the user, and hence assumed to be known).
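Equation (1) is the standard linear (DLT) triangulation system; a minimal sketch of this step is given below, assuming NumPy and that the projection matrices P_Cl and P_Cr have already been assembled from the off-line stereo-rig calibration. Function and variable names are illustrative, not from the paper.

```python
import numpy as np

def triangulate(P_l, P_r, p_l, p_r):
    """Linear (DLT) triangulation of one point, cf. Eq. (1).

    P_l, P_r : 3x4 projection matrices of the left/right cameras.
    p_l, p_r : observed pixel coordinates [x1, y1] and [x2, y2].
    Returns X_Cl, the 3D position of the point w.r.t. the left camera frame.
    """
    x1, y1 = p_l
    x2, y2 = p_r
    # Each image point contributes two rows of the homogeneous system A [X_Cl; 1] = 0.
    A = np.vstack([x1 * P_l[2] - P_l[0],
                   y1 * P_l[2] - P_l[1],
                   x2 * P_r[2] - P_r[0],
                   y2 * P_r[2] - P_r[1]])
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]   # de-homogenize -> X_Cl = [xc, yc, zc]

# Re-orientation of Eq. (2): X_V = R_Cl_V @ X_Cl, with R_Cl_V defined by the user.
```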

C. Virtual Composite Camera Model

In order to get the 3D visual space, we define two virtual cameras attached to the stereo camera system using the coordinate frame O_V (see Fig. 2). We use the pinhole camera model [27] to project the relative position X_V to each of the virtual cameras O_v1 and O_v2.

The model for virtual camera 1 is given by

p_{v_1} = \begin{bmatrix} x_{v_1} \\ y_{v_1} \end{bmatrix} = \frac{1}{-y_V + \lambda}\,\alpha R(\theta) \begin{bmatrix} x_V - o_{11} \\ z_V - o_{12} \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix}   (3)

where θ is the rotation angle of the virtual camera along its optical axis, O_1 = [o_11 o_12]^T is the position of the optical center with respect to the coordinate frame O_V, C_1 = [c_x c_y]^T is the position of the principal point in the image plane, λ is the distance from the virtual camera coordinate frame O_v1 to the reference frame O_V along its optical axis, and α and the rotation matrix R(θ) are defined as:

\alpha = \begin{bmatrix} f\beta & 0 \\ 0 & f\beta \end{bmatrix}, \qquad R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}   (4)

Since this model represents a user-defined virtual camera, all its parameters (extrinsic and intrinsic) are known; since both virtual cameras are user-defined, we can set the same intrinsic parameters and λ values for both of them. In fact, in the defined configuration of the virtual cameras θ = 0 (see Fig. 2); the auxiliary coordinate frame O_V is introduced precisely to simplify the composite camera model by rotating the coordinate frame O_Cl into a specific orientation such that θ = 0.

Similarly, virtual camera 2 is defined as:

p_{v_2} = \begin{bmatrix} x_{v_2} \\ y_{v_2} \end{bmatrix} = \frac{1}{x_V + \lambda}\,\alpha R(\theta) \begin{bmatrix} y_V - o_{21} \\ z_V - o_{22} \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix}   (5)

In order to construct the 3D visual Cartesian space X_s ∈ R^{3×1}, we combine both virtual camera models as follows.

Using the properties of the rotation matrix R(θ) and the fact that α is a diagonal matrix, from (3), x_v1 can be written in the form

x_{v_1} = \gamma_1 \frac{x_V - o_{11}}{-y_V + \lambda} - \gamma_2 y_{v_1} + \gamma_3,   (6)

where the constant parameters γ_1, γ_2, γ_3 ∈ R are explicitly defined as

\gamma_1 = \frac{f\beta}{\cos\theta}, \quad \gamma_2 = \tan\theta, \quad \gamma_3 = c_x + c_y\gamma_2.   (7)

Based on (5) and (6), we define the 3D Visual Camera Model representation X_s = [x_s y_s z_s]^T using the orthogonal elements [x_v1 x_v2 y_v2]^T:

X_s = \begin{bmatrix} x_{v_1} \\ x_{v_2} \\ y_{v_2} \end{bmatrix} = \underbrace{\begin{bmatrix} \gamma_1 & 0_{1\times2} \\ 0_{2\times1} & \alpha R(\theta) \end{bmatrix}}_{R_\alpha} \begin{bmatrix} \dfrac{x_V - o_{11}}{-y_V + \lambda} \\ \dfrac{y_V - o_{21}}{x_V + \lambda} \\ \dfrac{z_V - o_{22}}{x_V + \lambda} \end{bmatrix} + \rho   (8)

where ρ = [γ_3 − γ_2 y_v1, c_x, c_y]^T.
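As an illustration of (3), (5) and (8), the sketch below maps X_V to the 3D visual feature X_s for the configuration actually used in the paper (θ = 0, so R(θ) = I and R_α = diag(fβ)); the numeric parameter values are user choices and the function name is not from the paper.

```python
import numpy as np

def visual_features(X_V, f_beta, lam, o11, o21, o22, cx, cy):
    """Map X_V = [xV, yV, zV] (meters, frame O_V) to Xs = [xv1, xv2, yv2] (pixels), Eq. (8), theta = 0."""
    xV, yV, zV = X_V
    # Virtual camera 1 (Eq. (3)): optical axis along -y_V.
    xv1 = f_beta * (xV - o11) / (-yV + lam) + cx
    # Virtual camera 2 (Eq. (5)): optical axis along +x_V.
    xv2 = f_beta * (yV - o21) / (xV + lam) + cx
    yv2 = f_beta * (zV - o22) / (xV + lam) + cy
    return np.array([xv1, xv2, yv2])

# Example with illustrative user-defined virtual camera parameters:
# Xs = visual_features(np.array([0.1, -0.8, 0.5]), f_beta=800.0, lam=1.5,
#                      o11=0.0, o21=0.0, o22=0.0, cx=320.0, cy=240.0)
```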

D. Visual Jacobian

In the previous section, we established that the position of a point X_V projected in the 3D Visual Space is given by X_s as in (8) (we use X_s instead of the classical notation s because X_s is more than an image feature measurement; in fact, it defines a position vector in the 3D visual space). The Optical Flow can be obtained with the time derivative of (8) as follows (given that θ = 0, then γ_1 = fβ, γ_2 = 0, ρ = [c_x c_x c_y]^T and R_α = diag(fβ) ∈ R^{3×3}):

\dot X_s = \underbrace{R_\alpha J_v}_{J_\alpha} \dot X_V = J_\alpha \dot X_V   (9)

where the image Jacobian matrix J_v ∈ R^{3×3} is defined as

J_v = \begin{bmatrix} \dfrac{1}{-y_V+\lambda} & \dfrac{x_V-o_{11}}{(-y_V+\lambda)^2} & 0 \\ -\dfrac{y_V-o_{21}}{(x_V+\lambda)^2} & \dfrac{1}{x_V+\lambda} & 0 \\ -\dfrac{z_V-o_{22}}{(x_V+\lambda)^2} & 0 & \dfrac{1}{x_V+\lambda} \end{bmatrix}   (10)

This image Jacobian matrix J_v represents the mapping from velocities defined in the reference frame O_V to velocities (pixels/s) in the 3D visual space. In order to complete the 3D visual mapping we need to include the transformation from O_V to O_b. This transformation is given by the following equation (see Fig. 2)

X_V = R^{C_l}_V (R^b_{C_l} X_b + t^b_{C_l})   (11)

where R^b_Cl and t^b_Cl are the rotation matrix and the translation vector between the frames O_Cl and O_b.

Taking the time derivative of (11) and substituting the robot Differential Kinematics Ẋ_b = J(q)q̇, equation (9) can be rewritten in the form

\dot X_s = J_\alpha R^{C_l}_V R^b_{C_l} \dot X_b = J_{img} \dot X_b   (12)
         = J_{img} J(q)\dot q = J_s \dot q   (13)

where J(q) ∈ R^{3×3} is the analytic Jacobian matrix of the robot manipulator, and the image-based Jacobian matrix J_s ∈ R^{3×3} is defined as the Visual Jacobian.

Then the inverse differential kinematics that relates the generalized joint velocities q̇ and the 3D visual velocities Ẋ_s is given by

\dot q = J_s^{-1}\dot X_s = J(q)^{-1} J_{img}^{-1}\dot X_s   (14)

Remark 1: Singularity-free Jimg. From (9) and (12), we can see that Jimg^{-1} = (R^b_Cl)^{-1} (R^Cl_V)^{-1} J_v^{-1} R_α^{-1}. The matrices R^b_Cl, R^Cl_V ∈ SO(3) are non-singular, and R_α = diag(fβ) ∈ R^{3×3} consists of user-defined virtual camera parameters. Hence, Jimg is singular only when J_v is singular, i.e., det(J_v) = 0 → det(Jimg) = 0. This condition is present only when: 1) o_11 + λ = 0 and o_21 − λ = 0, or 2) x_V = −λ and y_V = o_21, or 3) x_V = o_11 and y_V = −λ. However, o_11, o_21 and λ are also defined by the user. Then, a non-singular Jimg can be obtained using the condition o_11 = o_21 > λ > max(x_Vmax, y_Vmax), where x_Vmax and y_Vmax are delimited by the robot workspace defined wrt O_V.

Therefore, the singularities of J_s are defined only by the singularities of J(q). Hence, to guarantee a non-singular 3D visual mapping, an approach to avoid robot singularities must be implemented. In Section IV, we discuss this issue and propose a solution.

Remark 2: Sensitivity to the Camera Orientation R^b_Cl. The orientation matrix R^b_Cl requires special attention because it can cause system instability. Instead of demanding an exact off-line calibration of this parameter, the problem is tackled in two parts: a) a coarse on-line estimation of the orientation matrix is computed using the real-time information generated by the robot (see Section II-E), and b) estimation errors for the complete Jacobian J_s are taken into account in the control design. Thus, a robust control approach is designed to cope with these errors (including R^b_Cl and the stereo-rig intrinsic parameters), see Section III-C.
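A minimal sketch of how (10) and (12)-(14) could be assembled in code, assuming θ = 0 (so R_α = diag(fβ)); J(q) comes from the robot model and the rotation matrices from the user definition and the estimator of Section II-E. Names and signatures are illustrative only.

```python
import numpy as np

def Jv(X_V, o11, o21, o22, lam):
    """Image Jacobian Jv of Eq. (10), evaluated at X_V = [xV, yV, zV]."""
    xV, yV, zV = X_V
    return np.array([
        [1.0 / (-yV + lam), (xV - o11) / (-yV + lam) ** 2, 0.0],
        [-(yV - o21) / (xV + lam) ** 2, 1.0 / (xV + lam), 0.0],
        [-(zV - o22) / (xV + lam) ** 2, 0.0, 1.0 / (xV + lam)],
    ])

def joint_velocity(dXs, J_q, J_v, R_Cl_V, R_b_Cl, f_beta):
    """Inverse differential kinematics of Eq. (14): q_dot from the 3D visual velocity Xs_dot."""
    R_alpha = f_beta * np.eye(3)              # theta = 0  ->  R_alpha = diag(f*beta)
    J_img = R_alpha @ J_v @ R_Cl_V @ R_b_Cl   # Eq. (12)
    J_s = J_img @ J_q                         # Eq. (13), full rank away from robot singularities
    return np.linalg.solve(J_s, dXs)          # q_dot = Js^-1 * Xs_dot without forming the inverse
```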

E. On-line Orientation Matrix Estimation

We define this system as uncalibrated because we not only assume that the calibration of the stereo vision system (left camera O_Cl) with respect to the wcf (O_b) is unknown, but also consider the possibility of on-line modification of the parameters that define this relationship (e.g. R^b_Cl). The stereo-rig is assumed to be known, which is not a strong constraint because the entire stereo system is referenced with respect to the left camera (see Fig. 2), and this can be done off-line. Nevertheless, an exact calibration of the stereo system is also not required, because errors in the estimation of these visual parameters are handled in the control design (see Section III-C).

In order to compute the visual Jacobian, in this work we use an on-line orientation matrix estimator, where two sets of position points defined in the coordinate frames O_b and O_Cl are used. These sets are generated while the robot is moving. The estimation approach can be summarized as follows.

Let datasets A and B be the sets of end-effector positions defined with respect to O_b and O_Cl, respectively. The problem is to find the best rotation R^b_Cl that aligns the points in dataset A with the points in dataset B. This rotation matrix can be obtained by computing the matrix

M = \sum_{i=1}^{n} (A_i - C_A)(B_i - C_B)^T   (15)

where C_A and C_B are the centroids of the datasets A and B. Then a least-squares fit of the rotation matrix can be written as R^b_Cl = U diag(1, 1, det(U V^T)) V^T, where U, V are computed from SVD(M). This rotation matrix is prone to estimation errors, which are considered in the next section.
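The fit above is the classical SVD-based (Kabsch-style) least-squares rotation estimate; a minimal sketch following the formula in this subsection literally is given below. Depending on the frame convention, R^b_Cl may correspond to this matrix or its transpose, so in practice the direction should be validated against a few known robot motions. Assumes NumPy; names are illustrative.

```python
import numpy as np

def estimate_orientation(A, B):
    """SVD-based least-squares rotation between two corresponding point sets (Section II-E).

    A : (n, 3) end-effector positions expressed in the world frame O_b.
    B : (n, 3) the same positions expressed in the left-camera frame O_Cl.
    """
    CA, CB = A.mean(axis=0), B.mean(axis=0)   # centroids C_A, C_B
    M = (A - CA).T @ (B - CB)                 # M = sum_i (A_i - C_A)(B_i - C_B)^T, Eq. (15)
    U, _, Vt = np.linalg.svd(M)
    # The det(.) term forces a proper rotation (determinant +1, no reflection).
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt                         # R = U diag(1, 1, det(U V^T)) V^T
```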

III. ADAPTIVE IMAGE-BASED 3D VISUAL SERVOING

In this section we describe the design of an adaptive image-based dynamic control. This control method includes the robot dynamics in its passivity proof.

A. Non-Linear Robot Dynamic Model

The dynamics of a serial n-link rigid, non-redundant, fully actuated robot manipulator can be written as follows

M(q)\ddot q + C(q,\dot q)\dot q + G(q) + B\dot q = \tau.   (16)

where q ∈ R^{n×1} is the vector of joint positions, τ ∈ R^{n×1} stands for the applied joint torques, M(q) ∈ R^{n×n} is the symmetric positive definite inertia matrix, C(q, q̇)q̇ ∈ R^{n×1} is the vector of centripetal and Coriolis effects, G(q) ∈ R^{n×1} is the vector of gravitational torques, and finally B ∈ R^{n×n} is a diagonal matrix of viscous friction coefficients.

The robot model described in (16) can be written in terms of a known state robot regressor Y = Y(q, q̇, q̈) ∈ R^{n×m} and an unknown robot parameter vector Θ ∈ R^{m×1} by using the nominal references q̇_r and q̈_r as follows:

M(q)\ddot q_r + C(q,\dot q)\dot q_r + G(q) + B\dot q_r = Y_r\Theta   (17)

Subtracting the linear parameterization (17) from (16) produces the open-loop error dynamics

M(q)\dot S_q + C(q,\dot q)S_q = \tau - Y_r\Theta   (18)

with the joint error surface S_q defined as S_q = q̇ − q̇_r, where q̇_r represents the nominal reference of joint velocities. This nominal reference can be used to design a control in the 3D visual space.

B. Joint Velocity Nominal Reference

Considering equation (14), q̇_r can be defined as

\dot q_r = J_s^{-1}\dot X_{s_r}   (19)

where the 3D visual nominal reference Ẋ_{s_r} is given by

\dot X_{s_r} = \dot X_{s_d} - K_p\Delta X_s + S_{s_d} - K_1\int_{t_0}^{t} S_{s\delta}(\zeta)\,d\zeta - K_2\int_{t_0}^{t} \mathrm{sign}\big(S_{s\delta}(\zeta)\big)\,d\zeta   (20)

S_{s\delta} = S_s - S_{s_d}, \qquad S_s = \Delta\dot X_s + K_p\Delta X_s, \qquad S_{s_d} = S_s(t_0)e^{-\kappa t}   (21)

where Ẋ_{s_d} is the desired visual velocity, ΔX_s = X_s − X_{s_d} is the visual position error, ΔẊ_s is the visual velocity error, K_p = K_p^T ∈ R^{3×3}_+ and K_j = K_j^T ∈ R^{3×3}_+ (with j = 1, 2) are positive definite gain matrices, and S_{sδ} is the visual error surface.

Using (19)-(21) in S_q we obtain:

S_q = \dot q - \dot q_r = J_s^{-1}\big(\dot X_s - \dot X_{s_r}\big) = J_s^{-1}S_e   (22)

with

S_e = S_{s\delta} + K_1\int_{t_0}^{t} S_{s\delta}(\zeta)\,d\zeta + K_2\int_{t_0}^{t} \mathrm{sign}\big(S_{s\delta}(\zeta)\big)\,d\zeta   (23)

where S_e is the extended visual error manifold.
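A minimal discrete-time sketch of (20)-(21) and (23), assuming a fixed control period dt and forward-Euler integration of the integral terms; the class and variable names are illustrative, not from the paper.

```python
import numpy as np

class VisualReference:
    """Nominal visual reference (20)-(21) and extended error Se (23), discretized."""
    def __init__(self, Kp, K1, K2, kappa, dt):
        self.Kp, self.K1, self.K2, self.kappa, self.dt = Kp, K1, K2, kappa, dt
        self.int_S = np.zeros(3)       # running integral of Ss_delta
        self.int_sign = np.zeros(3)    # running integral of sign(Ss_delta)
        self.Ss0, self.t = None, 0.0

    def update(self, Xs, dXs, Xs_d, dXs_d):
        dX, ddX = Xs - Xs_d, dXs - dXs_d               # visual position / velocity errors
        Ss = ddX + self.Kp @ dX                        # Ss of Eq. (21)
        if self.Ss0 is None:
            self.Ss0 = Ss.copy()                       # Ss(t0): makes Ss_delta(t0) = 0
        Ssd = self.Ss0 * np.exp(-self.kappa * self.t)  # Ss_d of Eq. (21)
        Ss_delta = Ss - Ssd
        self.int_S += Ss_delta * self.dt
        self.int_sign += np.sign(Ss_delta) * self.dt
        self.t += self.dt
        dXs_r = (dXs_d - self.Kp @ dX + Ssd
                 - self.K1 @ self.int_S - self.K2 @ self.int_sign)       # Eq. (20)
        Se = Ss_delta + self.K1 @ self.int_S + self.K2 @ self.int_sign   # Eq. (23)
        return dXs_r, Se
```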

C. Uncertainties in Js

The above definition of q̇_r depends on the exact calibration of J_s. However, this is a very restrictive assumption. Hence, the uncertainties in the Visual Jacobian J_s should be taken into account in the control design. To achieve this, the uncalibrated nominal reference is defined by

\hat{\dot q}_r = \hat J_s^{-1}\dot X_{s_r}   (24)

where Ĵ_s is an estimate of J_s such that Ĵ_s is full-rank ∀q ∈ Ω_q, and Ω_q = {q | det(J(q)) ≠ 0, ∀q ∈ R^{n×1}} defines the singularity-free workspace. Then, the uncalibrated joint error surface is:

\hat S_q = \dot q - \hat{\dot q}_r = S_q - \Delta J_s\dot X_{s_r}   (25)

with ΔJ_s = Ĵ_s^{-1} − J_s^{-1} as the estimation error, which includes both intrinsic and extrinsic parameters.

D. Control Design

Consider a robot manipulator in closed loop with the following second order sliding visual servoing scheme,

\tau = -K_d\hat S_q + \hat Y_r\hat\Theta   (26)

\dot{\hat\Theta} = -\Gamma\hat Y_r^{T}\hat S_q   (27)

where Θ̂ is the on-line estimate of the constant robot parameter vector, and K_d = K_d^T ∈ R^{n×n}_+ and Γ ∈ R^{m×m}_+ are constant matrices. This adaptive on-line estimation, together with the second order sliding mode in S_{sδ}, handles the uncertainties in the robot dynamic/kinematic and camera parameters.
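A minimal per-sample sketch of (24)-(27), assuming the regressor Ŷ_r has already been evaluated along the nominal reference and that the adaptation law (27) is integrated with a simple Euler step; the function signature is an illustrative assumption.

```python
import numpy as np

def adaptive_control(dq, dXs_r, Js_hat, Yr_hat, Theta_hat, Kd, Gamma, dt):
    """One control step: torque law (26) and parameter update (27).

    dq        : measured joint velocities.
    dXs_r     : 3D visual nominal reference velocity, Eq. (20).
    Js_hat    : estimate of the Visual Jacobian (full rank inside Omega_q).
    Yr_hat    : regressor of the linear parameterization (17) along the nominal reference.
    Theta_hat : current on-line estimate of the robot parameter vector.
    """
    dqr_hat = np.linalg.solve(Js_hat, dXs_r)                  # uncalibrated nominal reference, Eq. (24)
    Sq_hat = dq - dqr_hat                                     # uncalibrated joint error surface, Eq. (25)
    tau = -Kd @ Sq_hat + Yr_hat @ Theta_hat                   # control law, Eq. (26)
    Theta_hat = Theta_hat - (Gamma @ Yr_hat.T @ Sq_hat) * dt  # Euler step of the adaptation law, Eq. (27)
    return tau, Theta_hat
```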

    E. Stability Proof

    The stability proof is conducted in three parts:

1) Boundedness of the closed-loop trajectories: The uncalibrated closed-loop error dynamics obtained from (18) and (26)-(27) is

M(q)\dot{\hat S}_q = \tau - \hat Y_r\Theta - C(q,\dot q)\hat S_q   (28)
                   = -K_d\hat S_q + \hat Y_r\Delta\Theta - C(q,\dot q)\hat S_q   (29)

with ΔΘ = Θ̂ − Θ.

The uncalibrated error kinetic energy can be used as a Lyapunov function in the following form:

V = \frac{1}{2}\Big[\hat S_q^{T} M(q)\hat S_q + \Delta\Theta^{T}\Gamma^{-1}\Delta\Theta\Big]   (30)

Considering the time derivative of (30) in closed loop with (26)-(29), V̇ yields

\dot V = \hat S_q^{T} M\dot{\hat S}_q + \frac{1}{2}\hat S_q^{T}\dot M\hat S_q + \Delta\Theta^{T}\Gamma^{-1}\dot{\hat\Theta}   (31)
       = -\hat S_q^{T} K_d\hat S_q + \hat S_q^{T}\hat Y_r\Delta\Theta - \Delta\Theta^{T}\hat Y_r^{T}\hat S_q = -\hat S_q^{T} K_d\hat S_q   (32)

where the parameterization (17) in terms of Ŝ_q and the skew-symmetry of Ṁ(q) − 2C(q, q̇) have been used. Selecting a positive definite K_d, equation (32) becomes negative semi-definite, which proves the passivity of the robot dynamics (16) in closed loop with (26)-(27). Then, the following properties of the closed-loop state arise:

\hat S_q \in L_\infty \rightarrow S_e \in L_\infty \Longrightarrow \Big(S_{s\delta},\ \int_{t_0}^{t}\mathrm{sign}\big(S_{s\delta}(\zeta)\big)\,d\zeta\Big) \in L_\infty   (33)

which implies that all the state signals are bounded, in particular (q̇_r, q̈_r) ∈ L_∞ and (Ẋ_{s_r}, Ẍ_{s_r}) ∈ L_∞.

2) Second-order sliding modes: From (22) and (25) we obtain

\hat S_q = J_s^{-1}S_e - \Delta J_s\dot X_{s_r}   (34)

Using (23), (34) can be written as

S_{s\delta} = -K_1\int_{t_0}^{t} S_{s\delta}(\zeta)\,d\zeta - K_2\int_{t_0}^{t}\mathrm{sign}\big(S_{s\delta}(\zeta)\big)\,d\zeta + \Psi   (35)

with Ψ = J_s(Ŝ_q + ΔJ_s Ẋ_{s_r}).

Taking the time derivative of (35) and multiplying it by S_{sδ}^T, we can prove the sliding mode regime

S_{s\delta}^{T}\dot S_{s\delta} \le -S_{s\delta}^{T} K_1 S_{s\delta} - \mu\,|S_{s\delta}|   (36)

with μ = K_2 − |dΨ/dt|. If K_2 ≥ |dΨ/dt|, then a sliding mode at S_{sδ} = 0 is induced at t_s = |S_{sδ}(t_0)|/μ. Moreover, notice that by the definition of S_{s_d} in (21) the initial condition is S_{sδ}(t_0) = 0, so t_s = 0, which implies that the sliding mode is guaranteed for all time.

3) Exponential convergence of visual tracking errors: Since a sliding mode exists at all times at S_{sδ}(t) = 0, we have S_s = S_{s_d}, therefore ΔẊ_s = −K_p ΔX_s + S_s(t_0)e^{−κt} ∀t, which implies that the 3D visual tracking errors converge to zero exponentially fast.

Remark 3: Convergence of ΔX_b without local minima. Given that Jimg is full-rank ∀t, from (12) it can be seen that ΔX_s = 0 → ΔX_b = 0 without local minima. This is the most important consequence of designing a full-rank image Jacobian, which, in general, is not obtained with the classical methods.

F. Simulation

The torque-level adaptive control is evaluated in simulation to better illustrate the robustness of the system to uncertainties in the robot parameters and changes in the extrinsic parameters of the stereo system.

In this case, we simulate a 6-DOF industrial robot where the last three joints are controlled at a static position with a PID-like controller. The real robot dynamic parameters were used to simulate the robot in closed loop with the control approach in (26)-(27). Real camera parameters were also used to simulate the camera projections.

The task is defined as follows: the robot end-effector is commanded to draw a circle in the world coordinate frame using the trajectory X_{b_d} = (0.2 sin(ωt), 0.2 cos(ωt) − 0.8, 0.5), where ω = 10 rad/s. Simulations are carried out in Matlab 2010a.

Fig. 3 shows the results obtained from the simulation, where the 3D visual position/velocity tracking can be observed. During the simulation, an estimate of the robot parameters and coarsely calibrated camera intrinsic parameters are used in the control law, and at times t = 4 s and t = 7 s the extrinsic camera parameters are altered, see Fig. 3. From the plots it can be observed that even when the parameters change, the controller is capable of coping with these uncertainties and maintaining the stability of the system. In the case of changes in the orientation matrix R^b_Cl, the controller can handle the uncertainty only to a certain extent (approx. 20% error). Therefore, a suitable technique to generate a rough estimate of R^b_Cl is needed to guarantee stability (see Section II-E).

Fig. 3. Simulation results: The exponential convergence of the tracking errors is depicted. It can be seen that the trajectories are stable even when the parameters change at t = 4 s and t = 7 s. a), b) The position errors in the 3D visual Cartesian space (pixel) and in the task space (meter). c) The robot and stereo camera pose parameters; notice how they change in time. d) The desired and end-effector trajectories in task space.

IV. 3D VISUAL SERVOING WITH ENVIRONMENT CONSTRAINTS

In the real experiment, we integrate the visual servoing system in an HRI scenario, where environment constraints, such as robot singularity avoidance and (self-/obstacle) collision avoidance, must be included to generate a safe and singularity-free trajectory for the robot. In this work, we model the environment constraints as a total force F, which includes a singularity avoidance force F_r, a self-collision avoidance force F_c, an obstacle-collision avoidance force F_o, etc. Fig. 4 shows the integration of the control with the environment constraints.

Fig. 4. 3D adaptive visual servoing including environment constraints. The 3D camera model (stereo vision model and orthogonal camera model) maps the image points p_l, p_r to X_Cl (X_V) and X_s; the environment-constraint force F is mapped to joint torques through J(q)^T and added as τ_ec to the adaptive visual servoing torque.

1) Singularity Force: An example of how to compute this force is given by F_r = −K_r P_r D_r − B_r Ẋ_ef, where Ẋ_ef is the velocity of the end-effector, K_r, B_r ∈ R^{3×3} are constant matrices, Δq is defined as the absolute value of the difference between q_i and q_singularity, and D_r is the direction of the gradient of the maximum manipulability factor μ. Then, P_r = e^{α_r Δq} − 1, where α_r is a constant that controls the stiffness of the applied force.

2) Collision Avoidance Forces: We use the artificial potential field approach to compute the forces for the collision avoidance problem, where the obstacles are repulsive surfaces for the manipulator. This force is given by F_o = −K_o P_o D_o − B_o Ẋ_ef, with P_o = e^{α_o(Dist_o − h_o)} − 1, where h_o is the minimum distance between the obstacle and the robot arm on the xoy plane, and D_o is the direction of the force, obtained between the center of the obstacle and the end-effector.
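A minimal sketch of the two repulsive terms defined above, following the expressions literally; all gains (K_r, B_r, K_o, B_o, α_r, α_o) are user-tuned constants and the direction vectors are assumed to be provided by the planner, so names and sign conventions are illustrative only.

```python
import numpy as np

def singularity_force(dq_abs, D_r, dX_ef, Kr, Br, alpha_r):
    """Repulsive term F_r pushing the arm away from a robot singularity (Section IV-1).

    dq_abs : |q_i - q_singularity| for the critical joint.
    D_r    : direction of the gradient of the manipulability measure.
    dX_ef  : end-effector velocity (damping term).
    """
    P_r = np.exp(alpha_r * dq_abs) - 1.0
    return -Kr @ (P_r * D_r) - Br @ dX_ef

def obstacle_force(dist_o, h_o, D_o, dX_ef, Ko, Bo, alpha_o):
    """Artificial potential-field repulsion F_o from an obstacle (Section IV-2).

    dist_o : current obstacle-arm distance;  h_o : minimum allowed distance.
    D_o    : direction from the obstacle center to the end-effector.
    alpha_o is tuned so that the repulsion is strong near the obstacle and fades away from it.
    """
    P_o = np.exp(alpha_o * (dist_o - h_o)) - 1.0
    return -Ko @ (P_o * D_o) - Bo @ dX_ef
```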

    V. EXPERIMENTS

    A. System Review

This system consists of 3 sub-systems: a) the Visual Stereo Tracker, b) the Robot Control System and c) the 3D Visualization System, see Fig. 1.

1) Visual Stereo Tracker: The stereo system is composed of 2 USB cameras fixed on a camera tripod. The stereo rig is uncalibrated with respect to the wcf and can be moved. The parameters of the virtual cameras (see Section II) are selected such that J_α is always non-singular. In order to compute τ and avoid a multiple-sampling-rate problem, an extended Kalman filter (EKF) is used to estimate the visual position (sampling period 1 ms), where the reference is updated every 30 ms with the real visual data of both cameras.
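The paper uses an EKF to bridge the 1 ms control rate and the 30 ms camera rate. A minimal constant-velocity linear Kalman filter conveys the same multi-rate idea (predict every control cycle, correct only when a new stereo measurement arrives); the model and the noise values below are illustrative assumptions, not the filter actually used.

```python
import numpy as np

class VisualPositionFilter:
    """Constant-velocity Kalman filter: predict at the control rate, correct at the camera rate."""
    def __init__(self, dt, q_acc=1.0, r_pix=2.0):
        self.x = np.zeros(6)                                 # state: [Xs (3), Xs_dot (3)]
        self.P = np.eye(6) * 1e3
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)  # constant-velocity model
        self.Q = q_acc * dt * np.eye(6)                      # process noise (tuning value)
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])    # cameras measure Xs only
        self.R = r_pix * np.eye(3)                           # measurement noise (pixels)

    def predict(self):            # call every control cycle (e.g. 1 ms)
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def correct(self, Xs_meas):   # call whenever a new stereo measurement arrives (e.g. 30 ms)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (Xs_meas - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]
```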

2) Robot Control System: The robot system comprises a Stäubli TX90 industrial robot arm (6 DOF), a CS8C control unit and a workstation running a GNU/Linux OS with a real-time patch, see Fig. 1. The data communication between the PC and the control unit runs over a local TCP/IP network. In this article, the robot is controlled in torque mode using the Low Level Interface (LLI) library.

3) 3D Visualization System: This module performs OpenGL-based real-time rendering of the workspace in 3D. It uses Qt and Coin3D as its backbone. The system updates the configuration of the robot arm and the positions of the target in real time. This is achieved by means of TCP/IP communication.

B. Experiment Results

In this case we use a simple color-based visual tracker to identify the target (green) and the robot end-effector (red). The target is held by a human, and the control goal is to follow the object in the human hand with the end-effector. The stereo vision tracking system provides the positions of both the red and green cubes with respect to the stereo coordinate frame. This information is mapped to the 3D visual Cartesian space to compute the errors. Using the adaptive control, a dynamic collision/singularity-free trajectory can be obtained.

During the experimental validation, several behaviors are evaluated. These behaviors are depicted in Fig. 5 and Fig. 6.

Fig. 5 a) shows how the robot handles self-collisions: the system generates a collision-free trajectory (red line) instead of moving directly to the target. Fig. 5 b) demonstrates obstacle avoidance. The blue line is the target motion and the red line is the robot end-effector motion. In the center, there is a static obstacle. The robot can avoid it while continuing to track the target.

Fig. 5. Experiment results: a) Self-collision avoidance. b) Obstacle avoidance. Both panels show the desired and end-effector positions in the X-Y plane [m], together with the initial robot position and, in b), the obstacle position.

Fig. 6 demonstrates the visual servoing control in combination with different forces. Fig. 6 f) and a) show visual tracking with and without obstacle avoidance, respectively. Fig. 6 c) illustrates the result of singularity avoidance: the robot does not reach the singular condition (q_3 = 0), even when the user tries to force it. Fig. 6 e) shows self-collision avoidance. Fig. 6 d) depicts table avoidance, where the motion of the robot along the z-axis is constrained by the height of the table (the end-effector is not allowed to go under the table).

The primary advantage of our system is that when the target object is occluded, the stereo system can be moved to keep the target in the field of view, see Fig. 6 h). After moving the stereo system, our approach uses real data to estimate the orientation matrix on-line. The control absorbs the perturbations and maintains the overall stability.

The system proves to be stable and safe for HRI scenarios, even in situations where the target is lost (due to occlusions by the robot or the human), see Fig. 6 b) and g). In this case, the robot system disables the contribution of τ (the visual servoing contribution) and reacts only to the collision and singularity forces. The visual tracking is resumed as soon as the target is visible again. A video illustrating these behaviors can be downloaded at: http://youtu.be/INwI2pDWYYo

Fig. 6. System behaviors: a) The control uses the information of both cameras to track the object. b) The robot stops when the target is lost until it is visible again. c) Singularity avoidance. d) Table avoidance. e) Self-collision avoidance. f) Obstacle avoidance. g) When the object is occluded, the robot changes its behavior. h) The camera can be moved when the target is occluded; the orientation matrix is estimated on-line.

VI. CONCLUSIONS

We proposed a novel image-based controller for 3D image-based visual servoing using an uncalibrated stereo vision system. The control was evaluated both in simulation and on a real industrial robot. The obtained results show the stability of the control even with parametric uncertainties. Furthermore, this work extends the adaptive image-based control law to include environment constraints. As a result, information about the environment and the kinematic constraints can be integrated with the 3D visual servoing to generate a robot dynamic system with trajectories free of collisions and singularities. This approach was evaluated in a real HRI scenario. Future work includes an advanced object tracker to improve the performance. The extension to 6D (visually controlling the position and orientation of the end-effector using the visual Cartesian space) is also being analyzed.

REFERENCES

[1] S. Hutchinson, G. Hager, and P. Corke, "A tutorial on visual servo control," IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651–670, Oct. 1996.
[2] F. Chaumette and S. Hutchinson, "Visual servo control. I. Basic approaches," IEEE Robotics Automation Magazine, vol. 13, no. 4, pp. 82–90, Dec. 2006.
[3] F. Chaumette, "Potential problems of stability and convergence in image-based and position-based visual servoing," in The Confluence of Vision and Control, LNCIS Series, No. 237, Springer-Verlag, 1998, pp. 66–78.
[4] M. Marey and F. Chaumette, "Analysis of classical and new visual servoing control laws," in IEEE International Conference on Robotics and Automation, May 2008, pp. 3244–3249.
[5] F. Janabi-Sharifi, L. Deng, and W. Wilson, "Comparison of basic visual servoing methods," IEEE/ASME Transactions on Mechatronics, vol. 16, no. 5, pp. 967–983, Oct. 2011.
[6] J. Feddema, C. S. G. Lee, and O. Mitchell, "Model-based visual feedback control for a hand-eye coordinated robotic system," Computer, vol. 25, no. 8, pp. 21–31, Aug. 1992.
[7] Y. Mezouar and F. Chaumette, "Optimal camera trajectory with image-based control," The International Journal of Robotics Research, vol. 22, no. 10, pp. 781–804, 2003.
[8] N. Papanikolopoulos and P. Khosla, "Adaptive robotic visual tracking: theory and experiments," IEEE Transactions on Automatic Control, vol. 38, no. 3, pp. 429–445, Mar. 1993.
[9] E. Nematollahi and F. Janabi-Sharifi, "Generalizations to control laws of image-based visual servoing," International Journal of Optomechatronics, vol. 3, no. 3, pp. 167–186, 2009.
[10] D. Kim, A. Rizzi, G. Hager, and D. Koditschek, "A "robust" convergent visual servoing system," in IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1, Aug. 1995, pp. 348–353.
[11] Y.-H. Liu, H. Wang, C. Wang, and K. K. Lam, "Uncalibrated visual servoing of robots using a depth-independent interaction matrix," IEEE Transactions on Robotics, vol. 22, no. 4, pp. 804–817, Aug. 2006.
[12] H. Wang, Y.-H. Liu, and D. Zhou, "Dynamic visual tracking for manipulators using an uncalibrated fixed camera," IEEE Transactions on Robotics, vol. 23, no. 3, pp. 610–617, June 2007.
[13] B. Yoshimi and P. Allen, "Alignment using an uncalibrated camera system," IEEE Transactions on Robotics and Automation, vol. 11, no. 4, pp. 516–521, Aug. 1995.
[14] K. Hosoda and M. Asada, "Versatile visual servoing without knowledge of true Jacobian," in IEEE/RSJ/GI International Conference on Intelligent Robots and Systems, vol. 1, Sep. 1994, pp. 186–193.
[15] J. Piepmeier, G. McMurray, and H. Lipkin, "Uncalibrated dynamic visual servoing," IEEE Transactions on Robotics and Automation, vol. 20, no. 1, pp. 143–147, Feb. 2004.
[16] L. Pari, J. Sebastián, A. Traslosheros, and L. Angel, "Image based visual servoing: Estimated image Jacobian by using fundamental matrix vs analytic Jacobian," in Image Analysis and Recognition, ser. Lecture Notes in Computer Science, A. Campilho and M. Kamel, Eds. Springer Berlin Heidelberg, 2008, vol. 5112, pp. 706–717.
[17] A. Shademan, A.-M. Farahmand, and M. Jagersand, "Robust Jacobian estimation for uncalibrated visual servoing," in IEEE International Conference on Robotics and Automation (ICRA), May 2010, pp. 5564–5569.
[18] E. Cervera, A. P. Del Pobil, F. Berry, and P. Martinet, "Improving image-based visual servoing with three-dimensional features," The International Journal of Robotics Research, vol. 22, no. 10-11, pp. 821–840, 2003.
[19] K. Hashimoto, T. Ebine, and H. Kimura, "Dynamic visual feedback control for a hand-eye manipulator," in Proceedings of the 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 3, July 1992, pp. 1863–1868.
[20] E. Zergeroglu, D. Dawson, M. de Queiroz, and S. Nagarkatti, "Robust visual-servo control of robot manipulators in the presence of uncertainty," in IEEE Conference on Decision and Control, vol. 4, 1999, pp. 4137–4142.
[21] A. Astolfi, L. Hsu, M. Netto, and R. Ortega, "Two solutions to the adaptive visual servoing problem," IEEE Transactions on Robotics and Automation, vol. 18, no. 3, pp. 387–392, June 2002.
[22] B. Bishop, S. Hutchinson, and M. Spong, "Camera modelling for visual servo control applications," Math. Comput. Model., vol. 24, no. 5-6, pp. 79–102, Sep. 1996.
[23] L. Hsu and P. Aquino, "Adaptive visual tracking with uncertain manipulator dynamics and uncalibrated camera," in IEEE Conference on Decision and Control, vol. 2, 1999, pp. 1248–1253.
[24] E. Dean-Leon, V. Parra-Vega, A. Espinosa-Romero, and J. Fierro, "Dynamical image-based PID uncalibrated visual servoing with fixed camera for tracking of planar robots with a heuristical predictor," in IEEE International Conference on Industrial Informatics, June 2004, pp. 339–345.
[25] R. Kelly, "Robust asymptotically stable visual servoing of planar robots," IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 759–766, Oct. 1996.
[26] H. Wang, Y.-H. Liu, and W. Chen, "Uncalibrated visual tracking control without visual velocity," IEEE Transactions on Control Systems Technology, vol. 18, no. 6, pp. 1359–1370, Nov. 2010.
[27] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2006.
