Universitat Politècnica de Catalunya14 UNIVERSITTA POLITÈCNICA DE CAALUNYTA Figure 2.1: Visual...

Universitat Politècnica de Catalunya

Master:

Automatic Control and Robotics

Master Thesis

Development of an error propagation model for

Visual Odometry

Jose Rafael Capriles Grane

Director: Juan Andrade Cetto

Academic Course 2011/12

January 2012

Development of an Error Propagation Model for Visual Odometry 1

Abstract

Visual odometry is the process of estimating the position and orientation of a moving

camera rig by matching features along the image sequence and computing relative motion

from these matches.

In this thesis, we propose an error propagation model for visual odometry. To test our

model we implemented a visual odometry system using a full-calibrated stereo camera

rig. In our system, no prior knowledge of the scene nor the motion is necessary. The

algorithm extracts feature points in the camera images, using SURF [1] detector. These

feature points are then matched in the left and right images. The 3D reconstruction of the

matched points is then calculated by stereo 3D triangulation. To �nd the camera position

and orientation, we estimate the rigid motion of a subset of points tracked in two di�erent

camera positions. Pose estimation is calculated using single value decomposition.

To propagate feature detection uncertainties, keypoint covariance in the image plane

are propagated through the algorithm to stereo 3D reconstruction, pose estimation and

�nally position concatenation. First order error propagation is computed using the Ja-

cobian of each function. To test the performance of the error propagation model we

evaluated the results with Monte Carlo simulation.

Finally we test our visual odometry method with simulated stereo sequences recorded

with a robot in Matlab. This research was developed at the Institut de Robotica i Infor-

matica Industrial CSIC-UPC.

2 UNIVERSITAT POLITÈCNICA DE CATALUNYA

Index

Abstract 1

1 Introduction 7

1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Visual Odometry 13

2.1 Feature Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2 Feature Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 3D Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.4 Motion Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.5 Odometry estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 Error Propagation 21

3.1 Error Propagation in Visual Odometry . . . . . . . . . . . . . . . . . . . . 22

3.2 From Images to 3D Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.3 From 3D Points to R and T . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.4 From Pose Estimation to Cummulative Odometry . . . . . . . . . . . . . . 28

4 Results 29

4.1 The Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.2 Stereo Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.3 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.4 Feature Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.5 Visual Odometry error propagation model . . . . . . . . . . . . . . . . . . 32

5 Conclusions 37

6 Acknowledgements 39

3

Figure list

1.1 Visual odometry results obtained by D. Nister O. Naroditsky and J. Bergen 10

2.1 Visual odometry algorithm overview. . . . . . . . . . . . . . . . . . . . . . 14

2.2 Pinhole Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 Motion between two frames. . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.4 Recover motion estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.5 Recover motion estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.6 Odometry concatenation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.1 TEO robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.2 Stereo images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.3 Relative position of grids with respect to the cameras . . . . . . . . . . . . 31

4.4 Current and Previous Stereo pair matched . . . . . . . . . . . . . . . . . . 32

4.5 Error propagation to 3D reconstruction . . . . . . . . . . . . . . . . . . . 33

4.6 Error propagation to R and T . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.7 Odometry cumulative error . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.8 Matching step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.9 3D point clouds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.10 Relative motion estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.11 Accumulative odometry uncertanty . . . . . . . . . . . . . . . . . . . . . . 36

5


1. Introduction

Accurate localization and mapping is a problem of great signi�cance in mobile robotics.

Providing autonomy of the mapping task is in particular important for applications in

which human intervention is limited such as search and rescue, underwater exploration,

interplanetary missions or manipulation of dangerous material.

There are a number of notable localization methods being used today in the robotics

community. The principal di�erence among these methods is the type of sensor used

to collect the environment data. Odometry, being the computation of aggregated robot

translation is of paramount importance to all of them. Odometry is typically computed

from wheel encoders. The method alone can at most yield a 2D path estimate, so it is not

su�cient when navigating in a non-planar surface, and must be complemented by other

sensors such as sonars, laser range-�nders or GPS. It has also the drawback that wheel

slippage adds signi�cant error to the cumulative estimation of translation.

Vision based path estimation on the contrary, is able to yield accurate results while

being intrinsically 3D. The technique of motion estimation from the di�erence in camera

images is known as visual odometry [3]. It is thus an alternative or complementary

solution to wheel-based odometry.

A visual odometry algorithm can be divided in four stages:

1. Feature detection: extract features of interest and match them across image frames.

2. Stereo reconstruction: from feature matches, compute the location of 3D points


with respect to the camera frame.

3. Motion estimation: from 3D point cloud in two consecutive time frames, compute

the relative camera motion (rotation and translation).

4. Aggregate relative motion: concatenate the relatives motions along the image se-

quence.

1.1 Objectives

The aim of this work is to implement and evaluate an error propagation model for visual

odometry. The system will use images from a stereo camera rig mounted in front of a

4-wheel moving robot. To test the error propagation model we use a visual odometry

system simulation. The speci�c objectives of this master thesis are:

� To implement a visual odometry simulation

� To develop an error propagation model for visual odometry.

� To test the visual odometry application and the error propagation model together.

1.2 State of the Art

Visual odometry is a relatively new technology for localizing mobile robots. The term

"visual odometry" was coined in 2004 by Nister in his paper [3], the term was chosen

for its similarity to wheel odometry, but instead of estimates the motion of a vehicle by

incrementally integrating the number of turns of its wheels over time, visual odometry

estimates the motion by incrementally estimate the robot pose through examination of the

changes that motion induces on the images of its onboard cameras. Application of visual

odometry include robotics, wearable computing, augmented reality, and automotive.


Visual odometry needs good illumination in the environment and a static scene with

enough texture to allow apparent motion to be obtained, also the consecutive frames

should be captured at a frequency that ensure to have su�cient scene overlap. Ful�lling

this conditions, visual odometry provides advantages with respect to wheel odometry as

being not a�ected by wheel slip in uneven terrain or other adverse conditions. Also, since

cameras are inexpensive and provide high information bandwidth, they can serve as a

cheap and precise localization sensor.

The research in this �eld, starts for planetary rovers in the early 1980s, and was

motivated by the NASA Mars exploration program. The �rst estimation of a vehicle's

egomotion from visual input alone was described by Moravec [4]. His work stands out not

only for presenting the �rst motion estimation pipeline, but also for describing one of the

earliest corner detectors [7].

Most of the research done in visual odometry is building upon Moravec's work, as

Matthies and Shafer [5], who used binocular system and Moravec's procedure for detecting

and tracking corners, they included the error covariance matrix into the motion estimation

step to obtain superior results in trajectory recovery with 2% relative error over 5.5 meters

path. Later Olson et al. [6] extended that work by introducing an absolute orientation

sensor and used Forstner corner detector [9], which is is signi�cantly faster to compute.

They showed that incorporating an absolute orientation sensor into the estimation step

the error growth can be reduced to a linear function of the distance traveled (relative

position error of 1.2 % on a 20 meters path).

Similar visual odometry implementations was obtained by Lacroix et al. [8] but us-

ing dense stereo and detecting keypoints by analyzing the correlation function around its

peaks. This approach was later exploited in [11]. A di�erent approach to motion estima-

tion and outlier removal for an all-terrain rover was proposed by Milella and Siegwart in

[10]. They estimate motion by �rst using least squares, as in the methods earlier, and

then with the iterative closest point (ICP) algorithm [12].

In 2004, Nister et al.[3] improved earlier implementation by proposing the �rst real-


Figure 1.1: Visual odometry results obtained by D. Nister O. Naroditsky and J. Bergen

time visual odometry long run implementation with a robust outlier rejection scheme.

They improved the earlier implementations in several areas. First, they track features

(Harris corner [13]) independently in all frames and only allowed matches between fea-

tures. This has the bene�t of avoiding feature drift during cross-correlation-based track-

ing. Second, they did not compute the relative motion as a 3D to 3D point registration

problem but as a 3D-to-2D camera pose estimation problem. Finally, they incorporated

RANSAC [14] outliers rejection into the motion estimation step.

A di�erent motion estimation scheme was introduced by Comport et al. [17]. Instead

of using 3-D-to-3-D point registration like previous methods they used 3-D-to-2-D camera-

pose estimation techniques, this allows the motion to be computed from 2-D-to-2-D image

matches without having to triangulate 3-D points. The bene�t of this technique lays in a

more accurate motion computation and CPU consumption.

Since visual odometry works by computing the motion path incrementally (pose after

pose), the errors introduced by each new estimation accumulate over time. This generates

a drift between the real path and the estimated trajectory. As can be seen in Figure 1.1

from [3].

Keep this drift as small as possible can be done through local optimization over the

last n poses as in [15]. This approach, called sliding window bundle adjustment has been


used in several works and specially on a 10 Km visual odometry experiment by Konolige

et al. [16] they demonstrated that windowed-bundle adjustment can decrease the �nal

position error by a factor of 2 to 5.

Visual odometry can be used as a building block for a complete simultaneous localiza-

tion and mapping (SLAM) [18]. But is needed to add some way to detect loop closing and

possibly a global optimization step to obtain a metrically consistent map. Visual-SLAM

[20] can be used as well to only recover the camera path and not the environment map,

this method is much more precise that visual odometry, because it enforces many more

constraints on the path, but not necessarily more robust. And require more computation.


2. Visual Odometry

An overview of the visual odometry algorithm proposed is shown in �gure 2.1

� The �rst step of our algorithm is to acquire the stereo stream of images

� From each stereo image we extract SURF keypoints.

� Next, based on this features we match the keypoints (between the left and the right

images)

� The matched keypoints are then matched with the previous frame keypoints.

� For each feature that have positive matches (left - right and left - previous left) we

compute the 3D points projection in camera coordinates.

� Based on the movement of the point cloud between frames, we can estimate the

movement of the camera using single value decomposition.

� Finally we concatenate the new motion estimation with the previous localization of

the robot to obtain the accumulative odometry estimation


Figure 2.1: Visual odometry algorithm overview.

2.1 Feature Detection

After capturing the images, the system has to identify features in the left and right images.

Feature detection is a key-step since the detected features are the baseline for the motion

estimation.

In this stage we needed to take care of several variables like the number of keypoints

needed to obtain enough inlier points, the detector implementation, the CPU time re-

quired for detection. Our �rst approach was to use Scale Invariants Features (SIFT) [21]

but the time required for the detection was too large for our purpose. We decided to

implement our visual odometry algorithm using Speeded Up Robust Features (SURF)

instead[23].

Feature detection is performed using the OpenCV library, where the class SURF im-

plements Speeded Up Robust Features as a fast multi-scale Hessian keypoint detector.

In each frame, we want to detect as much SURF keypoints as we can, but we need to

ensure that each iteration is performed in less than 1 second. The reason to have as much

keypoints as possible is because we need dense structure reconstruction to compute the

relative motion of the camera rig. Moreover, further steps in the process will �lter many

features as outliers.


Figure 2.2: Pinhole Model.

2.2 Feature Matching

After the SURF keypoints are detected in the images, we need to match them between

the current and previous frames. For this process we use a nearest neighbor matcher,

implemented in OpenCV. The matching is done in two steps, �rst the features from

the current stereo pair are matched and then, these keypoints are matched with the the

previous frame's keypoints.

2.3 3D Reconstruction

3D reconstruction is based in the pinhole camera model. The image is formed by the

intersection of the light rays from the objects through the center of the lens (projection

center), with the focal plane, see �gure 2.2.

The projection from a 3D point in world coordinates to the 2D image is obtained with

the perspective projection Eq. 2.1. The camera matrix or matrix of intrinsic parameters

does not depend on the scene viewed and, once estimated, can be re-used.


Figure 2.3: Motion between two frames.

ul

vl

sl

1

=

αul 0 u0l 0

0 αvl v0l 0

0 0 1 0

0 0 0 1

x

y

z

1

(2.1)

The origin of the stereo camera rig is located in the left camera, see �gure (2.3). Thus,

the left perspective projection matrix has no extrinsic parameters. The right camera is

located with respect to the left camera, and the same 3D point is projected to the left

image plane �rst by representing it in the left camera frame and then projecting using

the right camera extrinsic parameters, as show in equation 2.2.

ur

vr

sr

1

=

αur 0 u0r 0

0 αvr v0r 0

0 0 1 0

0 0 0 1

r1T tx

r2T ty

r3T tz

0T 1

x

y

z

1

(2.2)

Substituting the scale of each image point (third row) on the �rst two rows for both

Eq. 2.1 and Eq. 2.2, and combining the four resulting equations into a system, we have

that:

(ur − u0r)r3

T − αurr1T

(vr − v0r)r3T − αvrr2T

−αul 0 ul − u0l

0 −αvl vl − v0l

x

y

z

=

(u0r − ur)tz + αurtx

(v0r − vr)tz + αvrty

0

0

(2.3)


Figure 2.4: Recover motion estimation

By letting x = (x, y, z)T , the location of the 3D point is obtained solving the under-

constrained system. Representing our system as:

Ax = b. (2.4)

The least squares solution is simply

x = (ATA)−1ATb. (2.5)

2.4 Motion Estimation

Now that we are able to get the cloud of 3D points from the stereo camera rig, we need

to recover the motion between consecutive sets of 3D points, as we can see in Figure 2.4.

R and T, represent the transformation between two robot poses, P = [p1, p2...pn] and

P ′ = [p′1, p′2...p

′n] are respectivaly two set of 3D points reconstructed from the matched

keypoints at current and previous time frame.

In Figure 2.5, you can see the two set of point cloud, the blue points correspond to

previous time frame and the red points correspond to the current time frame. We desired


Figure 2.5: Recover motion estimation

to obtain a rotation matrixR and translation vectorT that minimize the distance between

the two sets of matched points di.

minn∑i=1

di2 (2.6)

minn∑i=1

||Rpi + T− pi′||2 (2.7)

Let µ, µ′ be the two mean vectors of the corresponding 3D point sets. The centered

points will be ai = pi − µ and bi = pi − µ′. A matrix with all ai stacked up is A, the

similar matrix of all bi is B. A solution to this problem is based on the singular value

decomposition of the product of those matrices is

BAT = UTDVT. (2.8)

Then, the rotation between the two 3D point cloud is obtained as

R = VUT. (2.9)

Finally, the translation vector T is equal to


Figure 2.6: Odometry concatenation

T = µ−Rµ′. (2.10)

With this least-squares estimation, a point pi transformed by Rpi +T , should be near

to point pi′.

2.5 Odometry estimation

The agregated odometry will be the result of the homogeneous matrixHi+1 concatenation,

as seen in Equation 2.12, where Hi is the current pose estimation, Hi+1 is the new pose

estimation and Hmotion is the relative motion estimation obtained in the previous step.

Hmotion =

Rmotion Tmotion

0 1

(2.11)

The odometry concatenation will be,

Hi+1 = HiHmotion (2.12)

Which is equal to,


Hi+1 =

RiRmotion RiTmotion + Ti

0 1

(2.13)


3. Error Propagation

The uncertainties (or errors) are the lack of certainty associated to measurements. This

errors occurs due to measurement limitations (e.g., instrument precision, incorrect cali-

bration). Each measurements has its own uncertainty and all the calculations made with

these measurements have to take care of such uncertainty into consideration. The ef-

fect of a measurement error on the uncertainty of a function based on them is known as

propagation of error (or propagation of uncertainty).

There are many sources of error than can be accumulated in our visual odometry

algorithm, each of them from di�erent sources i.e. errors in the camera model, errors in

the feature extraction due to noncontinuous image representation (pixel), errors in the

3D reconstruction, errors in motion estimation, etc. In order to simplify the work we only

consider three principal sources of error:

� Detection errors: this errors are associated to changes in the camera images and

the capability of tracing the points in the video stream. The keypoints must be

repeatable, that means that they must not disappear in di�erent views (for consec-

utive matching) and they should be strong against viewpoint changes (for stereo

matching principally).

� Matching errors: the SURF keypoints extracted in the images can be wrongly


matched. This problem can be alleviated detecting and rejecting the wrong matches

and removing the outliers. We use a RANSAC for robust outlier removal.

� Number of features: after matching and �ltering the keypoints, a rich 3D point set

is needed to be able to compute a good pose estimation. If the number of features

is smaller than a threshold the current frame will be discarded and we loose track

of the robot at this frame.

3.1 Error Propagation in Visual Odometry

The source of error in this algorithm is feature detection uncertainty. The idea is to

propagate such uncertainty all the way through the algorithm to compute an estimate of

the robot pose uncertainty.

We can divide the error propagation model in three stages:

1. From images to 3D points.

2. From 3D point cloud to pose estimation.

3. From pose estimation to cummulative odometry.

3.2 From Images to 3D Points

The visual odometry algorithm is based on the information obtained from the stereo

images, the feature detection in this images provides key information and also introduces

the initial uncertainty of the process. The problem we faced is how precise is the estimation

of the keypoints in each image and how does this uncertainty impact on the whole system.

We suppose all points between image pairs were matched properly to avoid extra error

sources and we focus our attention in propagating the covariance of the SURF features to

the 3D point cloud. This covariance is represented as a unique matrix Σuv for all feature


points, given by the scale of the SURF operator, and independent along the horizontal

and vertical image directions.

Σuv =

σul

2

σvl2

σur2

σur2

(3.1)

This is, σul , σvl , σur and σvr are the standard deviation in pixels of the matched features

obtained from the SURF detector. As we know from [22], the �rst order approximation

to the covariance matrix Σy in a system y = f(x) is equal to :

Σy = ∇fΣx∇Tf (3.2)

The problem now is to obtain a Jacobian ∇f to our system, in which there is no

explicit function that can be used for simply taking all the partial derivatives. We might

not know f(x) explicitly, but we might know some relation between inputs and outputs

such as

Φ(x, f(x)) = 0 (3.3)

In Equation (3.3), we have an implicit de�nition of our system that we can di�erentiate

with respect to our input x:

dΦ(x, f(x))

dx= 0 (3.4)

dΦ

dx+dΦ

df

df

dx= 0 (3.5)

And hence


df (x)

dx= −

(dΦ

df (x)

)−1dΦ

dx(3.6)

df

dx= −

(dΦ

dy

)−1dΦ

dx(3.7)

Now we can use Equation 3.7 to obtain the Jacobian that we need. For the case of

stereo reconstruction we want to minimize the error in the system represented in Equation

2.4. In other words, we want to minimize the following cost function:

C = ‖A x − b‖2 (3.8)

The necesary condition for a minimum is dC(x,y)dy|x = 0 and in the notation of equation

(3.3) we have Φ = dCdy. If the Hessian H = d2C

dy2is invertible at the minimum, then the

implicit function equation (3.7) states that the Jacobian of f is:

∇f = −(d2C

dy2

)−1d2C

dydx(3.9)

Where x = (ul, vl, ur, vr)T and y = (x, y, z)T . Then, the Jacobian is equal to

∇f = −(

d2C

d(x, y, z)2

)−1d2C

d(x, y, z)d(ul, vl, ur, vr)(3.10)

Using notation of equation 2.4, we can obtain

d2C

d(x, y, z)2 = 2(ATA) (3.11)

And,


d2C

d(x, y, z)d(ul, vl, ur, vr)=

2A1

T (tz +R3Tp)− 2R3(b1 − A1p)

2A2T (tz +R3

Tp)− 2R3(b2 − A2p)

−2αurpz 0 −2αurpx − 4pz(u0r − ur)

0 −2αvrpz −2αvrpy − 4pz(v0r − vr)

(3.12)

Where Ai and Ri are the i-th rows of matrices A and R, respectively.

Then, the covariance matrix Σp of the 3D point, is approximately equal to:

Σp = ∇f Σuv ∇fT (3.13)

Σp = ∇f

σul 0 0 0

0 σvl 0 0

0 0 σur 0

0 0 0 σvr

∇fT (3.14)

3.3 From 3D Points to R and T

The motion estimation is calculated as a rotation and translation between two sets of

matched 3D points. We asume all points between the two 3D point sets were matched

properly and we focus our attention in propagating the covariance of the 3D point cloud

to the motion estimation parameters. In this case we want to minimize the cost function:

C =n∑i=1

||Rpi + T − pi′||2 (3.15)

The minimization of the cost function in Equation 3.15 implies a relation between our

two sets of variables, the 3d locations of points, and the parameters of the rigid body

motion. Thus, we have the same situation as in the previous section. We want to obtain


a Jacobian of an implicit fuction represented in this case by the cost function in equation

(3.15). Where p = (X1, Y1, X2, Y2) = (x1, y1, ...xn, yn, x1′, y1

′, ...xn′, yn

′) and y = (x, y, θ)T .

∇f = −(

d2C

d(x, y, θ)2

)−1d2C

d(x, y, θ)d(p)(3.16)

Using notation of equation 3.15, we can de�ne

dC

d(x, y, θ)=

dCd(x)

dCd(y)

dCd(Θ)

(3.17)

Then, if we de�ne the number of inliers used in each iteration for motion estimation

as n, we can obtain

d2C

d(x, y, θ)2=

2n 0 d2C(1,3)

(x,y,θ)2

0 2n d2C(2,3)(x,y,θ)2

d2C(3,1)(x,y,θ)2

d2C(3,2)(x,y,θ)2

d2C(3,3)(x,y,θ)2

(3.18)

Where d2C(1,3)(x,y,θ)2

,d2C(2,3)

(x,y,θ)2and d2C(3,3)

(x,y,θ)2are equal to

d2C(1, 3)

(x, y, θ)2= 2(

n∑i=1

xi′sen(θ) +

n∑i=1

yi′cos(θ)) (3.19)

d2C(2, 3)

(x, y, θ)2= 2(

n∑i=1

yi′sen(θ)−

n∑i=1

xi′cos(θ)) (3.20)

d2C(3, 3)

(x, y, θ)2= 2(sen(θ)((

n∑i=1

xixi′ +

n∑i=1

yiyi′ + Tx

n∑i=1

xi′ + Ty

n∑i=1

yi′) +

(n∑i=1

xiyi′ −

n∑i=1

yixi′ + Tx

n∑i=1

yi′ − Ty

n∑i=1

xi′)) (3.21)

And,


d2C(3, 1)

(x, y, θ)2=d2C(1, 3)

(x, y, θ)2(3.22)

d2C(3, 2)

(x, y, θ)2=d2C(2, 3)

(x, y, θ)2(3.23)

Finally,

d2C

d(x, y, θ)d(p)=

2 0 2cos(θ) −2sen(θ)

0 2 −2sen(θ) −2cos(θ)

d2C(3,1)d(x,y,θ)d(p)




(3.24)

Where,

d2C(3, 1)

d(x, y, θ)d(p)= 2(

n∑i=1

xi + Tx −n∑i=1

xi′cos(θ) +

n∑i=1

yi′sen(θ)) (3.25)

d2C(3, 2)

d(x, y, θ)d(p)= 2(

n∑i=1

yi + Ty −n∑i=1

yi′cos(θ)−

n∑i=1

xi′sen(θ)) (3.26)

d2C(3, 3)

d(x, y, θ)d(p)= 2(

n∑i=1

xi′ − (

n∑i=1

xi + Tx)cos(θ)− (n∑i=1

yi + Ty)sen(θ)) (3.27)

d2C(3, 4)

d(x, y, θ)d(p)= 2(

n∑i=1

yi′ + (

n∑i=1

xi + Tx)sen(θ)− (n∑i=1

yi + Ty)cos(θ)) (3.28)

Then, the covariance matrix Σmotion of the relative motion, is approximately equal to:

Σmotion = ∇f Σp ∇fT (3.29)

Σmotion = ∇motion

Σp1

. . .

Σpn

Σ′p1. . .

Σ′pn

∇Tmotion (3.30)


3.4 From Pose Estimation to Cummulative Odometry

We want to propagate the covariance of the motion estimation to the odometry. The

odometry is formed by a rotational part and a translational part, as shown in equation

(2.13). In this case, we have an explicit function for the concatenation of rigid body

motions, we can directly apply the derivates

Hi+1 = HiHmotion (3.31)

Hi+1 = Hi

Rmotion Tmotion

0 1

(3.32)

Hi+1 = Hi

cos(θ) −sen(θ) Tx

sen(θ) cos(θ) Ty

0 0 1

(3.33)

The covariance matrix of the odometry at time frame i is equal to:

Σodom = ∇f Σodom ∇Tf +∇g Σmotion ∇T

g (3.34)

Where ∇f is the Jacobian with respect to the previous odometry position

∇f =dHi+1

d(Hi)=

1 0 −sen(θ)Tx − cos(θ)Ty0 1 cos(θ)Tx − sen(Θ)Ty

0 0 1

(3.35)

And ∇g is the Jacobian with respect to the new motion estimation.

∇g =dHi+1

d(Hmotion)=

cos(θ) −sen(θ) 0

sen(θ) cos(θ) 0

0 0 1

(3.36)


4. Results

4.1 The Robot

TEO is our segway RMP 400 based robot. It is the 4-wheel robot shown in Figure 4.1.

It has a TCM3 Compass, a H3D 3D Laser, one Hokuyo Laser and two PointGrey Flea

Cameras.The stereo rig used in this project is located in the front part of the robot.

Figure 4.1: TEO robot


Figure 4.2: Stereo images

4.2 Stereo Images

Camera images are the input of a visual odometry system. Among other things, the

precision of the system depends mostly on the quality of the camera images. The stereo

camera images used in this project are taken by two PointGrey Flea Cameras located

at front of the robot, the two cameras are pointing in the same direction with a 38 cm

separation between them. These two cameras work as a stereo system capturing pairs of

images at a resolution of 1280 x 960 px, as can be seen in Figure 4.2

4.3 Calibration

The goal of calibration is to accurately measure the intrinsic and extrinsic parameters

of the stereo camera system. The intrinsic parameters are focal length, principal point,

skew coe�cient and distortions. The extrinsic parameters describe the mutual position

and orientation between cameras.

The calibration is accomplished by taking several pictures of a planar checkerboard-

like pattern with known dimensions from di�erent positions and orientations, by ensuring

that the �eld of view of the camera is �lled as much as possible. The intrinsic and extrinsic

parameters are then found though a least square minimization method. Our calibration

was performed using the camera calibration toolbox available in Matlab.


Figure 4.3: Relative position of grids with respect to the cameras

The intrinsic calibration parameters are show in next table (4.3)

Intrinsic Camera Calibration Parameters

Parameter Description Value left Value right

W, h Image width, height (1280x960) (1280x960)

Cx, Cy Principal Point (313.18805, 243.49370) (300.19781, 246.39833)

fx, fy Focal length (885.81833, 884.10614) (882.93736, 881.18040)

The extrinsic calibration results computed are shown in next table (4.3)

Extrinsic Camera Calibration Parameters

Parameter Description Value

Tx, Ty, Tz Translation from left to right image -0.3857, -0.0072, 0.0001

Rx, Ry,Rz Rotation from left to right image 0.04046, 0.01616, 0.01419

Figure 4.3 shows in the left side the relative positions of the calibration grid with

respect to the stereo camera rig and in the right side two di�erent images of the calibration

pattern used.


Figure 4.4: Current and Previous Stereo pair matched

4.4 Feature Matching

Figure 4.4 shows the results of the matching process. The current images are in the

bottom section of the �gure, while the previous images are in the top section. First the

keypoints are calculated in each image, then the keypoints are matched from left to right

images (in current and previous time frame) and showed with light-blue lines from left

to right. Then we match current and previous keypoints and the results are the top to

bottom dark-blue lines.

4.5 Visual Odometry error propagation model

The error propagation model was tested successfully on simulation. The �rst stage of

our simulation covers the image capture, feature extraction and 3D reconstruction, in this

stage the source of error of our system came from SURF keypoints detection error which we

represent as a associated covariance to each feature. Then, this error is propagated to the

3D reconstruction. To evaluate our model we will compare it with the error propagation


Figure 4.5: Error propagation to 3D reconstruction

using Monte Carlo simulation.

Monte Carlo simulation is a computerized mathematical technique: given a function

y = f(x) and some known data, a large corrupted data set is created by repeatedly adding

random noise. Then, this large population is used to estimate the distribution of y from

the distribution of the samples yi = f(xi) [24]. We calculate the mean and the covariance

matrix of this distribution and compare it with the one obtained in our error propagation

model.

Figure 4.5 shows the results of the error propagation from images to 3D. The blue

ellipses are the associated covariance of the 3D reconstruction using Monte Carlo simula-

tion, while the red ellipses are the covariance of the 3D reconstruction obtained with our

error Jacobian functionl.

Our error propagation model obtain a covariance ellipsoids for each point thinner (x

and y axis) that the Monte Carlo results but wider (z axis). This mean that the error in

X and Y axis are smaller than the error in Z axis, as we specting due to the fact that the

reconstructed points are constrained to lie near the projection lines (lines from the focal

spot to the respective image points)

In the next stage we computed the error that is propagated from the 3D pointcloud to


the relative motion estimation. At this point, we have two di�erent point cloud computed

from di�erent location (i.e. pose1 and pose2), each of this points has an associated error

which will a�ect the R and T estimation. Figure 4.6 shows a single step of 3D points to

relative motion error propagation.

Figure 4.6: Error propagation to R and T

The blue ellipsoids indicate the covariance of our estimation using Monte Carlo while

red ellipse is the covariance of our estimation using our error propagation model. The

covariance of our error propagation model is smaller than the Monte Carlo covariance.

This is due to the fact that the linear approximation of the �rst order propagation of

uncertainty leaves higher order terms out of the computation. This probe the validity of

the error propagation model in a single step condition.

Figure 4.7: Odometry cumulative error

In the �nal stage we compute the error that is propagated from the relative motion

estimation to the cumulative odometry. In Figure 4.7 we have a simulated odometry path,


and the error propagated from the stereo features trough this path. The blue ellipses are

the associated covariance of our estimation using Monte Carlo and the red ellipses are the

covariance of our estimation using the error propagation model described in this thesis.

As can be seen, our covariances ellipses are smaller than Monte Carlo ellipses which means

our error propagation model make a good aproximation to the problem.

Figure 4.8: Matching step

Once every section of the error propagation system was tested separately, we tested

the system working all together. The results of the stereo image matching step are showed

in Figure 4.8

Figure 4.9: 3D point clouds

Then, the 3D point cloud are computed from current and previous time frame, the

result of this step is showed in Figure 4.9. The current 3D point cloud are showed in

blue while the previous 3D point cloud are in red. In this �gure we can appreciate

the similarities between the two cloud. The di�erences between then are basically a

rotation and translation. This rotation and traslation between the two point cloud is the


correspondig movement that the stereo rig made.

The covariance of all 3D points obtained from stereo images is shown in Figure 4.10.

Then, the relative motion is calculated between these two sets of points to obtain the

rotation matrix and translation vector that minimize the error between the two sets of

points, as we see in previous chapters.

Figure 4.10: Relative motion estimation

Finally, the odometry is concatenated, the result of the error propagation can be seen

in Figure 4.11. As we said before, the error is propagated cummulatively though the

system, this is the reason why the covariance ellipses of the odometry estimation increase

at each step. The increase rate between each iteration depend of how well was estimated

the relative motion at each iteration and this depends of the velocity of the robot and

the image condition. That is the reason why the covariance ellipses increase in a non-

uniformly pattern.

Figure 4.11: Accumulative odometry uncertanty


5. Conclusions

We present a visual odometry system based on stereo camera input, and also an error

propagation model. The system estimates a mobile robot position.We explored di�erent

ways to extract distinct features in an image such as SIFT or SURF, we can conclude

that SIFT is more robust than SURF but also requires much more time, and in our case it

was better to work with SURF. Once the features were extracted, we did the same work

by evaluating di�erent method of points matching. Our experiments conclude to the

use of SURF detector, and FLANN matching was the better choise for our requirements.

After 3D triangulation, we also implemented the relative orientation algorithm using SVD

resolution and RANSAC, for robust outlier removal.

The error propagation model is based on this conditions and assumes some restriction

to our test. We can conclude that our error propagation model is a valid model and it can

be used to improve the visual odometry performance. This error propagation model try

to include all the error that can be obtained in the normal running of the visual odometry

application but also assume some conditions like no error matching or the assumption of

a 3D point cloud without changes in the amount of points. This problems are present all

the time in the algorithm. This model try to improve the results of the process but still

there is a lot of �eld to cover.


Several possibilities for future work could be considered. First of all, our visual odom-

etry implementation has not been subject to any optimizations. Then, it can be easily

improved with regards to the processing time. Also, the robustness of motion estimation

can be increased, using the re�nement method, called sparse bundle adjustments'[25].

Other possible improvement is to use the Gaussians from the stereo error propagation to

improve the e�ciency of the motion estimation section and could be used to as a check

the quality of the odometry estimation.


6. Acknowledgements

To the Institut de Robotica e Informatica Industrial (IRI) and its sta�, for providing me

with a workplace, materials and support.

It is fair to acknowledge that this investigation could not have been done without

the participation, whether directly or indirectly, in the contents of it, or just with moral

support, of a fair amount of people. These individuals, and I am sure I will miss a couple,

are:

� Juan Andrade Cetto (my tutor), without whom I wouldn't have made it past the

�rst chapter.

� My family; mom, dad Fernando, my sister, for giving me support every time that I

needed it. You guys are the world to me.

� My uncles and cousins for all the support in the most di�cult times, thanks.

� My friends, in alphabetical order; Aaron, Angel, Dani, Esteban, Jorge and Luiz, it

was a great time.

6. Bibliography

[1] H. Bay, T. Tuytelaars, and L. Van Gool, �SURF: speeded up robust features,� in Pro-

ceedings of the European Conference on Computer Vision, pp. 404-417, Graz, Austria,

May 2006.

[2] M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger, R. Wheeler,

and A. Ng. �ROS: an open-source robot operating system." In Proceedings of the Open-

Source Software workshop of the International Conference on Robotics and Automation

(ICRA), Kobe, Japan. May 2009.

[3] D. Nister, O. Naroditsky, and J. Bergen �Visual Odometry", Sarno� Corporation.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

(CVPR), pp. 652 - 659 Vol.1, Washington, USA. Jan. 2004.

[4] H. Moravec, �Obstacle avoidance and navigation in the real world by a seeing robot

rover", PhD thesis, Standford University, Standford, USA, September 1980

[5] L. Matthies and S. Shafer, �Error modeling in stereo navigation", In Proceedings of

the IEEE Journal of Robotics and Automation Vol. RA-3, No. 3, June 1987

[6] C. Olson, L. Matthies, M. Schoppers and M. W.Maimone, �Robust stereo ego-motion

for long distance navigation." In Proceedings of the IEEE Conference on Computer

Vision and Pattern Recognitionn (CVPR), pp. 453-458, Hilton Head, USA. June 2000,

41


[7] H. Moravec, �Towards automatic visual obstacle avoidance", In Proceedings of the 5th

international joint conference on Arti�cial intelligence, pp. 584. San Francisco, USA.

Aug. 1977.

[8] S. Lacroix, A. Mallet, R. Chatila, and L. Gallo, �Rover self localization in planetary-

like environments." In Proceedings of the International Symposium of Arti�cial Intel-

ligence, Robotics, and Automation for Space (i-SAIRAS), pp. 433 to 440. Noordwijk,

The Netherlands. June 1999

[9] W. Forstner, �A feature based correspondence algorithm for image matching." In

Archives of Photogrammetry and Remote Sensing, vol. 26, no. 3, pp. 150 - 166, 1986.

[10] A. Milella and R. Siegwart, �Stereo-based ego-motion estimation using pixel tracking

and iterative closest point." In Proceedings of the 4th IEEE International Conference

on Computer Vision Systems (ICVS), pp 21-24. New York, USA. Jan 2006.

[11] A. Howard, �Real-time stereo visual odometry for autonomous ground vehicles." In

Proceedings of the IEEE/RSJ International Conference of Intelligent Robots and Sys-

tems.(IROS) pp. 3946-3952. Nice, France. September,2008

[12] P. Besl and N.McKay, �Amethod for registration of 3-d shapes." In IEEE Transac-

tions on Pattern Analysis and Machine Intelligence vol. 14, no. 2, pp. 1992.

[13] C. Harris and M. Stephens, �A combined corner and edge detector", in Proceedings

of the Alvey Vision Conference. pp 147 - 151. 1988.

[14] M.A. Fischler and R.C. Bolles. �Random sample consensus: A paradigm for model

�tting with applications to image analysis and automated cartography." In Commu-

nications of the ACM, 24(6), pp. 381-395, June 1981.

[15] F. Fraundorfer, D. Scaramuzza, and M. Pollefeys, �A constricted bundle adjustment

parameterization for relative scale estimation in visual odometry", in Proceedings of

the IEEE International Conference on Robotics and Automation (ICRA), pp. 1899 -

1904. Anchorage, Alaska, May 2010.


[16] K. Konolige, M. Agrawal, and J. Sol, �Large scale visual odometry for rough ter-

rain", in Proceedings of the International Symposium on Robotics Research (ISRR),

Hiroshima, Japan. Nov 2007.

[17] A. Comport, E. Malis, and P. Rives, �Accurate quadrifocal tracking for robust 3d

visual odometry," in Proceedings of the IEEE International Conference on Robotics

and Automation (ICRA), pp. 40 - 45. Roma, Italy. April, 2007.

[18] H. Durrant-Whyte and T. Bailey, �Simultaneous localization and mapping (SLAM):

Part I. The essential algorithms," In IEEE Robotics and Automation Magazine, vol.

13, no. 2, pp. 99-110, 2006.

[19] T. Bailey and H. Durrant-Whyte, �Simultaneous localization and mapping (SLAM):

Part II. State of the art," In IEEE Robotics and Automation Magazine, vol. 13, no. 3,

pp. 108-117, 2006.

[20] T. Lemaire and S. Lacroix, �Vision-based SLAM: Stereo and monocular approaches,"

In International Journal of Computer Vision (IJCV), vol. 74, no. 3, pp. 343-364, 2006.

[21] David G. Lowe, �Distinctive image features from scale-invariant keypoints." In Inter-

national Journal of Compute Vision, 60, 2 pp 91-11. 2004

[22] Benjamin Ochoa, Serge Belongie, �Covariance Propagation for Guided Matching" In

3rd Statistical Methods in Multi-Image and Video Processing (SMVP) 2006

[23] A. Cumani and A. Guiducci, �Visual odometry with SURFs and dÂ�ej`avu correc-

tion." In Proceedings of the 8th Conference on Optical 3D Measurement Techniques,

Zurich, Switzerland. July 2007.

[24] John Christopher Clarke �Applications of Sequence Geometry to Visual Motion,"

PhD thesis. pp150 Robotic Research Group, Departemnt of Engineering Science, Uni-

versity of Oxford, 1997

[25] Kurt Konolige, �Sparse Sparse Bundle Adjustment" In Proceedings of the British

Machine Vision Conference, pp. 102.1 - 11 Aberystwyth, Wales. Aug 2010

Universitat Politècnica de Catalunya14 UNIVERSITTA POLITÈCNICA DE CAALUNYTA Figure 2.1: Visual...

Documents

Transcript of Universitat Politècnica de Catalunya14 UNIVERSITTA POLITÈCNICA DE CAALUNYTA Figure 2.1: Visual...