Universitat Politècnica de Catalunya14 UNIVERSITTA POLITÈCNICA DE CAALUNYTA Figure 2.1: Visual...
Transcript of Universitat Politècnica de Catalunya14 UNIVERSITTA POLITÈCNICA DE CAALUNYTA Figure 2.1: Visual...
Universitat Politècnica de Catalunya
Master:
Automatic Control and Robotics
Master Thesis
Development of an error propagation model for
Visual Odometry
Jose Rafael Capriles Grane
Director: Juan Andrade Cetto
Academic Course 2011/12
January 2012
Development of an Error Propagation Model for Visual Odometry 1
Abstract
Visual odometry is the process of estimating the position and orientation of a moving
camera rig by matching features along the image sequence and computing relative motion
from these matches.
In this thesis, we propose an error propagation model for visual odometry. To test our
model we implemented a visual odometry system using a full-calibrated stereo camera
rig. In our system, no prior knowledge of the scene nor the motion is necessary. The
algorithm extracts feature points in the camera images, using SURF [1] detector. These
feature points are then matched in the left and right images. The 3D reconstruction of the
matched points is then calculated by stereo 3D triangulation. To �nd the camera position
and orientation, we estimate the rigid motion of a subset of points tracked in two di�erent
camera positions. Pose estimation is calculated using single value decomposition.
To propagate feature detection uncertainties, keypoint covariance in the image plane
are propagated through the algorithm to stereo 3D reconstruction, pose estimation and
�nally position concatenation. First order error propagation is computed using the Ja-
cobian of each function. To test the performance of the error propagation model we
evaluated the results with Monte Carlo simulation.
Finally we test our visual odometry method with simulated stereo sequences recorded
with a robot in Matlab. This research was developed at the Institut de Robotica i Infor-
matica Industrial CSIC-UPC.
2 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Index
Abstract 1
1 Introduction 7
1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Visual Odometry 13
2.1 Feature Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Feature Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 3D Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Motion Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Odometry estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 Error Propagation 21
3.1 Error Propagation in Visual Odometry . . . . . . . . . . . . . . . . . . . . 22
3.2 From Images to 3D Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 From 3D Points to R and T . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4 From Pose Estimation to Cummulative Odometry . . . . . . . . . . . . . . 28
4 Results 29
4.1 The Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Stereo Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.4 Feature Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.5 Visual Odometry error propagation model . . . . . . . . . . . . . . . . . . 32
5 Conclusions 37
6 Acknowledgements 39
3
4 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Figure list
1.1 Visual odometry results obtained by D. Nister O. Naroditsky and J. Bergen 10
2.1 Visual odometry algorithm overview. . . . . . . . . . . . . . . . . . . . . . 14
2.2 Pinhole Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Motion between two frames. . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Recover motion estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Recover motion estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 Odometry concatenation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.1 TEO robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Stereo images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3 Relative position of grids with respect to the cameras . . . . . . . . . . . . 31
4.4 Current and Previous Stereo pair matched . . . . . . . . . . . . . . . . . . 32
4.5 Error propagation to 3D reconstruction . . . . . . . . . . . . . . . . . . . 33
4.6 Error propagation to R and T . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.7 Odometry cumulative error . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.8 Matching step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.9 3D point clouds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.10 Relative motion estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.11 Accumulative odometry uncertanty . . . . . . . . . . . . . . . . . . . . . . 36
5
6 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Development of an Error Propagation Model for Visual Odometry 7
1. Introduction
Accurate localization and mapping is a problem of great signi�cance in mobile robotics.
Providing autonomy of the mapping task is in particular important for applications in
which human intervention is limited such as search and rescue, underwater exploration,
interplanetary missions or manipulation of dangerous material.
There are a number of notable localization methods being used today in the robotics
community. The principal di�erence among these methods is the type of sensor used
to collect the environment data. Odometry, being the computation of aggregated robot
translation is of paramount importance to all of them. Odometry is typically computed
from wheel encoders. The method alone can at most yield a 2D path estimate, so it is not
su�cient when navigating in a non-planar surface, and must be complemented by other
sensors such as sonars, laser range-�nders or GPS. It has also the drawback that wheel
slippage adds signi�cant error to the cumulative estimation of translation.
Vision based path estimation on the contrary, is able to yield accurate results while
being intrinsically 3D. The technique of motion estimation from the di�erence in camera
images is known as visual odometry [3]. It is thus an alternative or complementary
solution to wheel-based odometry.
A visual odometry algorithm can be divided in four stages:
1. Feature detection: extract features of interest and match them across image frames.
2. Stereo reconstruction: from feature matches, compute the location of 3D points
8 UNIVERSITAT POLITÈCNICA DE CATALUNYA
with respect to the camera frame.
3. Motion estimation: from 3D point cloud in two consecutive time frames, compute
the relative camera motion (rotation and translation).
4. Aggregate relative motion: concatenate the relatives motions along the image se-
quence.
1.1 Objectives
The aim of this work is to implement and evaluate an error propagation model for visual
odometry. The system will use images from a stereo camera rig mounted in front of a
4-wheel moving robot. To test the error propagation model we use a visual odometry
system simulation. The speci�c objectives of this master thesis are:
� To implement a visual odometry simulation
� To develop an error propagation model for visual odometry.
� To test the visual odometry application and the error propagation model together.
1.2 State of the Art
Visual odometry is a relatively new technology for localizing mobile robots. The term
"visual odometry" was coined in 2004 by Nister in his paper [3], the term was chosen
for its similarity to wheel odometry, but instead of estimates the motion of a vehicle by
incrementally integrating the number of turns of its wheels over time, visual odometry
estimates the motion by incrementally estimate the robot pose through examination of the
changes that motion induces on the images of its onboard cameras. Application of visual
odometry include robotics, wearable computing, augmented reality, and automotive.
Development of an Error Propagation Model for Visual Odometry 9
Visual odometry needs good illumination in the environment and a static scene with
enough texture to allow apparent motion to be obtained, also the consecutive frames
should be captured at a frequency that ensure to have su�cient scene overlap. Ful�lling
this conditions, visual odometry provides advantages with respect to wheel odometry as
being not a�ected by wheel slip in uneven terrain or other adverse conditions. Also, since
cameras are inexpensive and provide high information bandwidth, they can serve as a
cheap and precise localization sensor.
The research in this �eld, starts for planetary rovers in the early 1980s, and was
motivated by the NASA Mars exploration program. The �rst estimation of a vehicle's
egomotion from visual input alone was described by Moravec [4]. His work stands out not
only for presenting the �rst motion estimation pipeline, but also for describing one of the
earliest corner detectors [7].
Most of the research done in visual odometry is building upon Moravec's work, as
Matthies and Shafer [5], who used binocular system and Moravec's procedure for detecting
and tracking corners, they included the error covariance matrix into the motion estimation
step to obtain superior results in trajectory recovery with 2% relative error over 5.5 meters
path. Later Olson et al. [6] extended that work by introducing an absolute orientation
sensor and used Forstner corner detector [9], which is is signi�cantly faster to compute.
They showed that incorporating an absolute orientation sensor into the estimation step
the error growth can be reduced to a linear function of the distance traveled (relative
position error of 1.2 % on a 20 meters path).
Similar visual odometry implementations was obtained by Lacroix et al. [8] but us-
ing dense stereo and detecting keypoints by analyzing the correlation function around its
peaks. This approach was later exploited in [11]. A di�erent approach to motion estima-
tion and outlier removal for an all-terrain rover was proposed by Milella and Siegwart in
[10]. They estimate motion by �rst using least squares, as in the methods earlier, and
then with the iterative closest point (ICP) algorithm [12].
In 2004, Nister et al.[3] improved earlier implementation by proposing the �rst real-
10 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Figure 1.1: Visual odometry results obtained by D. Nister O. Naroditsky and J. Bergen
time visual odometry long run implementation with a robust outlier rejection scheme.
They improved the earlier implementations in several areas. First, they track features
(Harris corner [13]) independently in all frames and only allowed matches between fea-
tures. This has the bene�t of avoiding feature drift during cross-correlation-based track-
ing. Second, they did not compute the relative motion as a 3D to 3D point registration
problem but as a 3D-to-2D camera pose estimation problem. Finally, they incorporated
RANSAC [14] outliers rejection into the motion estimation step.
A di�erent motion estimation scheme was introduced by Comport et al. [17]. Instead
of using 3-D-to-3-D point registration like previous methods they used 3-D-to-2-D camera-
pose estimation techniques, this allows the motion to be computed from 2-D-to-2-D image
matches without having to triangulate 3-D points. The bene�t of this technique lays in a
more accurate motion computation and CPU consumption.
Since visual odometry works by computing the motion path incrementally (pose after
pose), the errors introduced by each new estimation accumulate over time. This generates
a drift between the real path and the estimated trajectory. As can be seen in Figure 1.1
from [3].
Keep this drift as small as possible can be done through local optimization over the
last n poses as in [15]. This approach, called sliding window bundle adjustment has been
Development of an Error Propagation Model for Visual Odometry 11
used in several works and specially on a 10 Km visual odometry experiment by Konolige
et al. [16] they demonstrated that windowed-bundle adjustment can decrease the �nal
position error by a factor of 2 to 5.
Visual odometry can be used as a building block for a complete simultaneous localiza-
tion and mapping (SLAM) [18]. But is needed to add some way to detect loop closing and
possibly a global optimization step to obtain a metrically consistent map. Visual-SLAM
[20] can be used as well to only recover the camera path and not the environment map,
this method is much more precise that visual odometry, because it enforces many more
constraints on the path, but not necessarily more robust. And require more computation.
12 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Development of an Error Propagation Model for Visual Odometry 13
2. Visual Odometry
An overview of the visual odometry algorithm proposed is shown in �gure 2.1
� The �rst step of our algorithm is to acquire the stereo stream of images
� From each stereo image we extract SURF keypoints.
� Next, based on this features we match the keypoints (between the left and the right
images)
� The matched keypoints are then matched with the previous frame keypoints.
� For each feature that have positive matches (left - right and left - previous left) we
compute the 3D points projection in camera coordinates.
� Based on the movement of the point cloud between frames, we can estimate the
movement of the camera using single value decomposition.
� Finally we concatenate the new motion estimation with the previous localization of
the robot to obtain the accumulative odometry estimation
14 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Figure 2.1: Visual odometry algorithm overview.
2.1 Feature Detection
After capturing the images, the system has to identify features in the left and right images.
Feature detection is a key-step since the detected features are the baseline for the motion
estimation.
In this stage we needed to take care of several variables like the number of keypoints
needed to obtain enough inlier points, the detector implementation, the CPU time re-
quired for detection. Our �rst approach was to use Scale Invariants Features (SIFT) [21]
but the time required for the detection was too large for our purpose. We decided to
implement our visual odometry algorithm using Speeded Up Robust Features (SURF)
instead[23].
Feature detection is performed using the OpenCV library, where the class SURF im-
plements Speeded Up Robust Features as a fast multi-scale Hessian keypoint detector.
In each frame, we want to detect as much SURF keypoints as we can, but we need to
ensure that each iteration is performed in less than 1 second. The reason to have as much
keypoints as possible is because we need dense structure reconstruction to compute the
relative motion of the camera rig. Moreover, further steps in the process will �lter many
features as outliers.
Development of an Error Propagation Model for Visual Odometry 15
Figure 2.2: Pinhole Model.
2.2 Feature Matching
After the SURF keypoints are detected in the images, we need to match them between
the current and previous frames. For this process we use a nearest neighbor matcher,
implemented in OpenCV. The matching is done in two steps, �rst the features from
the current stereo pair are matched and then, these keypoints are matched with the the
previous frame's keypoints.
2.3 3D Reconstruction
3D reconstruction is based in the pinhole camera model. The image is formed by the
intersection of the light rays from the objects through the center of the lens (projection
center), with the focal plane, see �gure 2.2.
The projection from a 3D point in world coordinates to the 2D image is obtained with
the perspective projection Eq. 2.1. The camera matrix or matrix of intrinsic parameters
does not depend on the scene viewed and, once estimated, can be re-used.
16 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Figure 2.3: Motion between two frames.
ul
vl
sl
1
=
αul 0 u0l 0
0 αvl v0l 0
0 0 1 0
0 0 0 1
x
y
z
1
(2.1)
The origin of the stereo camera rig is located in the left camera, see �gure (2.3). Thus,
the left perspective projection matrix has no extrinsic parameters. The right camera is
located with respect to the left camera, and the same 3D point is projected to the left
image plane �rst by representing it in the left camera frame and then projecting using
the right camera extrinsic parameters, as show in equation 2.2.
ur
vr
sr
1
=
αur 0 u0r 0
0 αvr v0r 0
0 0 1 0
0 0 0 1
r1T tx
r2T ty
r3T tz
0T 1
x
y
z
1
(2.2)
Substituting the scale of each image point (third row) on the �rst two rows for both
Eq. 2.1 and Eq. 2.2, and combining the four resulting equations into a system, we have
that:
(ur − u0r)r3
T − αurr1T
(vr − v0r)r3T − αvrr2T
−αul 0 ul − u0l
0 −αvl vl − v0l
x
y
z
=
(u0r − ur)tz + αurtx
(v0r − vr)tz + αvrty
0
0
(2.3)
Development of an Error Propagation Model for Visual Odometry 17
Figure 2.4: Recover motion estimation
By letting x = (x, y, z)T , the location of the 3D point is obtained solving the under-
constrained system. Representing our system as:
Ax = b. (2.4)
The least squares solution is simply
x = (ATA)−1ATb. (2.5)
2.4 Motion Estimation
Now that we are able to get the cloud of 3D points from the stereo camera rig, we need
to recover the motion between consecutive sets of 3D points, as we can see in Figure 2.4.
R and T, represent the transformation between two robot poses, P = [p1, p2...pn] and
P ′ = [p′1, p′2...p
′n] are respectivaly two set of 3D points reconstructed from the matched
keypoints at current and previous time frame.
In Figure 2.5, you can see the two set of point cloud, the blue points correspond to
previous time frame and the red points correspond to the current time frame. We desired
18 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Figure 2.5: Recover motion estimation
to obtain a rotation matrixR and translation vectorT that minimize the distance between
the two sets of matched points di.
minn∑i=1
di2 (2.6)
minn∑i=1
||Rpi + T− pi′||2 (2.7)
Let µ, µ′ be the two mean vectors of the corresponding 3D point sets. The centered
points will be ai = pi − µ and bi = pi − µ′. A matrix with all ai stacked up is A, the
similar matrix of all bi is B. A solution to this problem is based on the singular value
decomposition of the product of those matrices is
BAT = UTDVT. (2.8)
Then, the rotation between the two 3D point cloud is obtained as
R = VUT. (2.9)
Finally, the translation vector T is equal to
Development of an Error Propagation Model for Visual Odometry 19
Figure 2.6: Odometry concatenation
T = µ−Rµ′. (2.10)
With this least-squares estimation, a point pi transformed by Rpi +T , should be near
to point pi′.
2.5 Odometry estimation
The agregated odometry will be the result of the homogeneous matrixHi+1 concatenation,
as seen in Equation 2.12, where Hi is the current pose estimation, Hi+1 is the new pose
estimation and Hmotion is the relative motion estimation obtained in the previous step.
Hmotion =
Rmotion Tmotion
0 1
(2.11)
The odometry concatenation will be,
Hi+1 = HiHmotion (2.12)
Which is equal to,
20 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Hi+1 =
RiRmotion RiTmotion + Ti
0 1
(2.13)
Development of an Error Propagation Model for Visual Odometry 21
3. Error Propagation
The uncertainties (or errors) are the lack of certainty associated to measurements. This
errors occurs due to measurement limitations (e.g., instrument precision, incorrect cali-
bration). Each measurements has its own uncertainty and all the calculations made with
these measurements have to take care of such uncertainty into consideration. The ef-
fect of a measurement error on the uncertainty of a function based on them is known as
propagation of error (or propagation of uncertainty).
There are many sources of error than can be accumulated in our visual odometry
algorithm, each of them from di�erent sources i.e. errors in the camera model, errors in
the feature extraction due to noncontinuous image representation (pixel), errors in the
3D reconstruction, errors in motion estimation, etc. In order to simplify the work we only
consider three principal sources of error:
� Detection errors: this errors are associated to changes in the camera images and
the capability of tracing the points in the video stream. The keypoints must be
repeatable, that means that they must not disappear in di�erent views (for consec-
utive matching) and they should be strong against viewpoint changes (for stereo
matching principally).
� Matching errors: the SURF keypoints extracted in the images can be wrongly
22 UNIVERSITAT POLITÈCNICA DE CATALUNYA
matched. This problem can be alleviated detecting and rejecting the wrong matches
and removing the outliers. We use a RANSAC for robust outlier removal.
� Number of features: after matching and �ltering the keypoints, a rich 3D point set
is needed to be able to compute a good pose estimation. If the number of features
is smaller than a threshold the current frame will be discarded and we loose track
of the robot at this frame.
3.1 Error Propagation in Visual Odometry
The source of error in this algorithm is feature detection uncertainty. The idea is to
propagate such uncertainty all the way through the algorithm to compute an estimate of
the robot pose uncertainty.
We can divide the error propagation model in three stages:
1. From images to 3D points.
2. From 3D point cloud to pose estimation.
3. From pose estimation to cummulative odometry.
3.2 From Images to 3D Points
The visual odometry algorithm is based on the information obtained from the stereo
images, the feature detection in this images provides key information and also introduces
the initial uncertainty of the process. The problem we faced is how precise is the estimation
of the keypoints in each image and how does this uncertainty impact on the whole system.
We suppose all points between image pairs were matched properly to avoid extra error
sources and we focus our attention in propagating the covariance of the SURF features to
the 3D point cloud. This covariance is represented as a unique matrix Σuv for all feature
Development of an Error Propagation Model for Visual Odometry 23
points, given by the scale of the SURF operator, and independent along the horizontal
and vertical image directions.
Σuv =
σul
2
σvl2
σur2
σur2
(3.1)
This is, σul , σvl , σur and σvr are the standard deviation in pixels of the matched features
obtained from the SURF detector. As we know from [22], the �rst order approximation
to the covariance matrix Σy in a system y = f(x) is equal to :
Σy = ∇fΣx∇Tf (3.2)
The problem now is to obtain a Jacobian ∇f to our system, in which there is no
explicit function that can be used for simply taking all the partial derivatives. We might
not know f(x) explicitly, but we might know some relation between inputs and outputs
such as
Φ(x, f(x)) = 0 (3.3)
In Equation (3.3), we have an implicit de�nition of our system that we can di�erentiate
with respect to our input x:
dΦ(x, f(x))
dx= 0 (3.4)
dΦ
dx+dΦ
df
df
dx= 0 (3.5)
And hence
24 UNIVERSITAT POLITÈCNICA DE CATALUNYA
df (x)
dx= −
(dΦ
df (x)
)−1dΦ
dx(3.6)
df
dx= −
(dΦ
dy
)−1dΦ
dx(3.7)
Now we can use Equation 3.7 to obtain the Jacobian that we need. For the case of
stereo reconstruction we want to minimize the error in the system represented in Equation
2.4. In other words, we want to minimize the following cost function:
C = ‖A x − b‖2 (3.8)
The necesary condition for a minimum is dC(x,y)dy|x = 0 and in the notation of equation
(3.3) we have Φ = dCdy. If the Hessian H = d2C
dy2is invertible at the minimum, then the
implicit function equation (3.7) states that the Jacobian of f is:
∇f = −(d2C
dy2
)−1d2C
dydx(3.9)
Where x = (ul, vl, ur, vr)T and y = (x, y, z)T . Then, the Jacobian is equal to
∇f = −(
d2C
d(x, y, z)2
)−1d2C
d(x, y, z)d(ul, vl, ur, vr)(3.10)
Using notation of equation 2.4, we can obtain
d2C
d(x, y, z)2 = 2(ATA) (3.11)
And,
Development of an Error Propagation Model for Visual Odometry 25
d2C
d(x, y, z)d(ul, vl, ur, vr)=
2A1
T (tz +R3Tp)− 2R3(b1 − A1p)
2A2T (tz +R3
Tp)− 2R3(b2 − A2p)
−2αurpz 0 −2αurpx − 4pz(u0r − ur)
0 −2αvrpz −2αvrpy − 4pz(v0r − vr)
(3.12)
Where Ai and Ri are the i-th rows of matrices A and R, respectively.
Then, the covariance matrix Σp of the 3D point, is approximately equal to:
Σp = ∇f Σuv ∇fT (3.13)
Σp = ∇f
σul 0 0 0
0 σvl 0 0
0 0 σur 0
0 0 0 σvr
∇fT (3.14)
3.3 From 3D Points to R and T
The motion estimation is calculated as a rotation and translation between two sets of
matched 3D points. We asume all points between the two 3D point sets were matched
properly and we focus our attention in propagating the covariance of the 3D point cloud
to the motion estimation parameters. In this case we want to minimize the cost function:
C =n∑i=1
||Rpi + T − pi′||2 (3.15)
The minimization of the cost function in Equation 3.15 implies a relation between our
two sets of variables, the 3d locations of points, and the parameters of the rigid body
motion. Thus, we have the same situation as in the previous section. We want to obtain
26 UNIVERSITAT POLITÈCNICA DE CATALUNYA
a Jacobian of an implicit fuction represented in this case by the cost function in equation
(3.15). Where p = (X1, Y1, X2, Y2) = (x1, y1, ...xn, yn, x1′, y1
′, ...xn′, yn
′) and y = (x, y, θ)T .
∇f = −(
d2C
d(x, y, θ)2
)−1d2C
d(x, y, θ)d(p)(3.16)
Using notation of equation 3.15, we can de�ne
dC
d(x, y, θ)=
dCd(x)
dCd(y)
dCd(Θ)
(3.17)
Then, if we de�ne the number of inliers used in each iteration for motion estimation
as n, we can obtain
d2C
d(x, y, θ)2=
2n 0 d2C(1,3)
(x,y,θ)2
0 2n d2C(2,3)(x,y,θ)2
d2C(3,1)(x,y,θ)2
d2C(3,2)(x,y,θ)2
d2C(3,3)(x,y,θ)2
(3.18)
Where d2C(1,3)(x,y,θ)2
,d2C(2,3)
(x,y,θ)2and d2C(3,3)
(x,y,θ)2are equal to
d2C(1, 3)
(x, y, θ)2= 2(
n∑i=1
xi′sen(θ) +
n∑i=1
yi′cos(θ)) (3.19)
d2C(2, 3)
(x, y, θ)2= 2(
n∑i=1
yi′sen(θ)−
n∑i=1
xi′cos(θ)) (3.20)
d2C(3, 3)
(x, y, θ)2= 2(sen(θ)((
n∑i=1
xixi′ +
n∑i=1
yiyi′ + Tx
n∑i=1
xi′ + Ty
n∑i=1
yi′) +
(n∑i=1
xiyi′ −
n∑i=1
yixi′ + Tx
n∑i=1
yi′ − Ty
n∑i=1
xi′)) (3.21)
And,
Development of an Error Propagation Model for Visual Odometry 27
d2C(3, 1)
(x, y, θ)2=d2C(1, 3)
(x, y, θ)2(3.22)
d2C(3, 2)
(x, y, θ)2=d2C(2, 3)
(x, y, θ)2(3.23)
Finally,
d2C
d(x, y, θ)d(p)=
2 0 2cos(θ) −2sen(θ)
0 2 −2sen(θ) −2cos(θ)
d2C(3,1)d(x,y,θ)d(p)
d2C(3,2)d(x,y,θ)d(p)
d2C(3,3)d(x,y,θ)d(p)
d2C(3,4)d(x,y,θ)d(p)
(3.24)
Where,
d2C(3, 1)
d(x, y, θ)d(p)= 2(
n∑i=1
xi + Tx −n∑i=1
xi′cos(θ) +
n∑i=1
yi′sen(θ)) (3.25)
d2C(3, 2)
d(x, y, θ)d(p)= 2(
n∑i=1
yi + Ty −n∑i=1
yi′cos(θ)−
n∑i=1
xi′sen(θ)) (3.26)
d2C(3, 3)
d(x, y, θ)d(p)= 2(
n∑i=1
xi′ − (
n∑i=1
xi + Tx)cos(θ)− (n∑i=1
yi + Ty)sen(θ)) (3.27)
d2C(3, 4)
d(x, y, θ)d(p)= 2(
n∑i=1
yi′ + (
n∑i=1
xi + Tx)sen(θ)− (n∑i=1
yi + Ty)cos(θ)) (3.28)
Then, the covariance matrix Σmotion of the relative motion, is approximately equal to:
Σmotion = ∇f Σp ∇fT (3.29)
Σmotion = ∇motion
Σp1
. . .
Σpn
Σ′p1. . .
Σ′pn
∇Tmotion (3.30)
28 UNIVERSITAT POLITÈCNICA DE CATALUNYA
3.4 From Pose Estimation to Cummulative Odometry
We want to propagate the covariance of the motion estimation to the odometry. The
odometry is formed by a rotational part and a translational part, as shown in equation
(2.13). In this case, we have an explicit function for the concatenation of rigid body
motions, we can directly apply the derivates
Hi+1 = HiHmotion (3.31)
Hi+1 = Hi
Rmotion Tmotion
0 1
(3.32)
Hi+1 = Hi
cos(θ) −sen(θ) Tx
sen(θ) cos(θ) Ty
0 0 1
(3.33)
The covariance matrix of the odometry at time frame i is equal to:
Σodom = ∇f Σodom ∇Tf +∇g Σmotion ∇T
g (3.34)
Where ∇f is the Jacobian with respect to the previous odometry position
∇f =dHi+1
d(Hi)=
1 0 −sen(θ)Tx − cos(θ)Ty0 1 cos(θ)Tx − sen(Θ)Ty
0 0 1
(3.35)
And ∇g is the Jacobian with respect to the new motion estimation.
∇g =dHi+1
d(Hmotion)=
cos(θ) −sen(θ) 0
sen(θ) cos(θ) 0
0 0 1
(3.36)
Development of an Error Propagation Model for Visual Odometry 29
4. Results
4.1 The Robot
TEO is our segway RMP 400 based robot. It is the 4-wheel robot shown in Figure 4.1.
It has a TCM3 Compass, a H3D 3D Laser, one Hokuyo Laser and two PointGrey Flea
Cameras.The stereo rig used in this project is located in the front part of the robot.
Figure 4.1: TEO robot
30 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Figure 4.2: Stereo images
4.2 Stereo Images
Camera images are the input of a visual odometry system. Among other things, the
precision of the system depends mostly on the quality of the camera images. The stereo
camera images used in this project are taken by two PointGrey Flea Cameras located
at front of the robot, the two cameras are pointing in the same direction with a 38 cm
separation between them. These two cameras work as a stereo system capturing pairs of
images at a resolution of 1280 x 960 px, as can be seen in Figure 4.2
4.3 Calibration
The goal of calibration is to accurately measure the intrinsic and extrinsic parameters
of the stereo camera system. The intrinsic parameters are focal length, principal point,
skew coe�cient and distortions. The extrinsic parameters describe the mutual position
and orientation between cameras.
The calibration is accomplished by taking several pictures of a planar checkerboard-
like pattern with known dimensions from di�erent positions and orientations, by ensuring
that the �eld of view of the camera is �lled as much as possible. The intrinsic and extrinsic
parameters are then found though a least square minimization method. Our calibration
was performed using the camera calibration toolbox available in Matlab.
Development of an Error Propagation Model for Visual Odometry 31
Figure 4.3: Relative position of grids with respect to the cameras
The intrinsic calibration parameters are show in next table (4.3)
Intrinsic Camera Calibration Parameters
Parameter Description Value left Value right
W, h Image width, height (1280x960) (1280x960)
Cx, Cy Principal Point (313.18805, 243.49370) (300.19781, 246.39833)
fx, fy Focal length (885.81833, 884.10614) (882.93736, 881.18040)
The extrinsic calibration results computed are shown in next table (4.3)
Extrinsic Camera Calibration Parameters
Parameter Description Value
Tx, Ty, Tz Translation from left to right image -0.3857, -0.0072, 0.0001
Rx, Ry,Rz Rotation from left to right image 0.04046, 0.01616, 0.01419
Figure 4.3 shows in the left side the relative positions of the calibration grid with
respect to the stereo camera rig and in the right side two di�erent images of the calibration
pattern used.
32 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Figure 4.4: Current and Previous Stereo pair matched
4.4 Feature Matching
Figure 4.4 shows the results of the matching process. The current images are in the
bottom section of the �gure, while the previous images are in the top section. First the
keypoints are calculated in each image, then the keypoints are matched from left to right
images (in current and previous time frame) and showed with light-blue lines from left
to right. Then we match current and previous keypoints and the results are the top to
bottom dark-blue lines.
4.5 Visual Odometry error propagation model
The error propagation model was tested successfully on simulation. The �rst stage of
our simulation covers the image capture, feature extraction and 3D reconstruction, in this
stage the source of error of our system came from SURF keypoints detection error which we
represent as a associated covariance to each feature. Then, this error is propagated to the
3D reconstruction. To evaluate our model we will compare it with the error propagation
Development of an Error Propagation Model for Visual Odometry 33
Figure 4.5: Error propagation to 3D reconstruction
using Monte Carlo simulation.
Monte Carlo simulation is a computerized mathematical technique: given a function
y = f(x) and some known data, a large corrupted data set is created by repeatedly adding
random noise. Then, this large population is used to estimate the distribution of y from
the distribution of the samples yi = f(xi) [24]. We calculate the mean and the covariance
matrix of this distribution and compare it with the one obtained in our error propagation
model.
Figure 4.5 shows the results of the error propagation from images to 3D. The blue
ellipses are the associated covariance of the 3D reconstruction using Monte Carlo simula-
tion, while the red ellipses are the covariance of the 3D reconstruction obtained with our
error Jacobian functionl.
Our error propagation model obtain a covariance ellipsoids for each point thinner (x
and y axis) that the Monte Carlo results but wider (z axis). This mean that the error in
X and Y axis are smaller than the error in Z axis, as we specting due to the fact that the
reconstructed points are constrained to lie near the projection lines (lines from the focal
spot to the respective image points)
In the next stage we computed the error that is propagated from the 3D pointcloud to
34 UNIVERSITAT POLITÈCNICA DE CATALUNYA
the relative motion estimation. At this point, we have two di�erent point cloud computed
from di�erent location (i.e. pose1 and pose2), each of this points has an associated error
which will a�ect the R and T estimation. Figure 4.6 shows a single step of 3D points to
relative motion error propagation.
Figure 4.6: Error propagation to R and T
The blue ellipsoids indicate the covariance of our estimation using Monte Carlo while
red ellipse is the covariance of our estimation using our error propagation model. The
covariance of our error propagation model is smaller than the Monte Carlo covariance.
This is due to the fact that the linear approximation of the �rst order propagation of
uncertainty leaves higher order terms out of the computation. This probe the validity of
the error propagation model in a single step condition.
Figure 4.7: Odometry cumulative error
In the �nal stage we compute the error that is propagated from the relative motion
estimation to the cumulative odometry. In Figure 4.7 we have a simulated odometry path,
Development of an Error Propagation Model for Visual Odometry 35
and the error propagated from the stereo features trough this path. The blue ellipses are
the associated covariance of our estimation using Monte Carlo and the red ellipses are the
covariance of our estimation using the error propagation model described in this thesis.
As can be seen, our covariances ellipses are smaller than Monte Carlo ellipses which means
our error propagation model make a good aproximation to the problem.
Figure 4.8: Matching step
Once every section of the error propagation system was tested separately, we tested
the system working all together. The results of the stereo image matching step are showed
in Figure 4.8
Figure 4.9: 3D point clouds
Then, the 3D point cloud are computed from current and previous time frame, the
result of this step is showed in Figure 4.9. The current 3D point cloud are showed in
blue while the previous 3D point cloud are in red. In this �gure we can appreciate
the similarities between the two cloud. The di�erences between then are basically a
rotation and translation. This rotation and traslation between the two point cloud is the
36 UNIVERSITAT POLITÈCNICA DE CATALUNYA
correspondig movement that the stereo rig made.
The covariance of all 3D points obtained from stereo images is shown in Figure 4.10.
Then, the relative motion is calculated between these two sets of points to obtain the
rotation matrix and translation vector that minimize the error between the two sets of
points, as we see in previous chapters.
Figure 4.10: Relative motion estimation
Finally, the odometry is concatenated, the result of the error propagation can be seen
in Figure 4.11. As we said before, the error is propagated cummulatively though the
system, this is the reason why the covariance ellipses of the odometry estimation increase
at each step. The increase rate between each iteration depend of how well was estimated
the relative motion at each iteration and this depends of the velocity of the robot and
the image condition. That is the reason why the covariance ellipses increase in a non-
uniformly pattern.
Figure 4.11: Accumulative odometry uncertanty
Development of an Error Propagation Model for Visual Odometry 37
5. Conclusions
We present a visual odometry system based on stereo camera input, and also an error
propagation model. The system estimates a mobile robot position.We explored di�erent
ways to extract distinct features in an image such as SIFT or SURF, we can conclude
that SIFT is more robust than SURF but also requires much more time, and in our case it
was better to work with SURF. Once the features were extracted, we did the same work
by evaluating di�erent method of points matching. Our experiments conclude to the
use of SURF detector, and FLANN matching was the better choise for our requirements.
After 3D triangulation, we also implemented the relative orientation algorithm using SVD
resolution and RANSAC, for robust outlier removal.
The error propagation model is based on this conditions and assumes some restriction
to our test. We can conclude that our error propagation model is a valid model and it can
be used to improve the visual odometry performance. This error propagation model try
to include all the error that can be obtained in the normal running of the visual odometry
application but also assume some conditions like no error matching or the assumption of
a 3D point cloud without changes in the amount of points. This problems are present all
the time in the algorithm. This model try to improve the results of the process but still
there is a lot of �eld to cover.
38 UNIVERSITAT POLITÈCNICA DE CATALUNYA
Several possibilities for future work could be considered. First of all, our visual odom-
etry implementation has not been subject to any optimizations. Then, it can be easily
improved with regards to the processing time. Also, the robustness of motion estimation
can be increased, using the re�nement method, called sparse bundle adjustments'[25].
Other possible improvement is to use the Gaussians from the stereo error propagation to
improve the e�ciency of the motion estimation section and could be used to as a check
the quality of the odometry estimation.
Development of an Error Propagation Model for Visual Odometry 39
6. Acknowledgements
To the Institut de Robotica e Informatica Industrial (IRI) and its sta�, for providing me
with a workplace, materials and support.
It is fair to acknowledge that this investigation could not have been done without
the participation, whether directly or indirectly, in the contents of it, or just with moral
support, of a fair amount of people. These individuals, and I am sure I will miss a couple,
are:
� Juan Andrade Cetto (my tutor), without whom I wouldn't have made it past the
�rst chapter.
� My family; mom, dad Fernando, my sister, for giving me support every time that I
needed it. You guys are the world to me.
� My uncles and cousins for all the support in the most di�cult times, thanks.
� My friends, in alphabetical order; Aaron, Angel, Dani, Esteban, Jorge and Luiz, it
was a great time.
40 UNIVERSITAT POLITÈCNICA DE CATALUNYA
6. Bibliography
[1] H. Bay, T. Tuytelaars, and L. Van Gool, �SURF: speeded up robust features,� in Pro-
ceedings of the European Conference on Computer Vision, pp. 404-417, Graz, Austria,
May 2006.
[2] M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger, R. Wheeler,
and A. Ng. �ROS: an open-source robot operating system." In Proceedings of the Open-
Source Software workshop of the International Conference on Robotics and Automation
(ICRA), Kobe, Japan. May 2009.
[3] D. Nister, O. Naroditsky, and J. Bergen �Visual Odometry", Sarno� Corporation.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 652 - 659 Vol.1, Washington, USA. Jan. 2004.
[4] H. Moravec, �Obstacle avoidance and navigation in the real world by a seeing robot
rover", PhD thesis, Standford University, Standford, USA, September 1980
[5] L. Matthies and S. Shafer, �Error modeling in stereo navigation", In Proceedings of
the IEEE Journal of Robotics and Automation Vol. RA-3, No. 3, June 1987
[6] C. Olson, L. Matthies, M. Schoppers and M. W.Maimone, �Robust stereo ego-motion
for long distance navigation." In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognitionn (CVPR), pp. 453-458, Hilton Head, USA. June 2000,
41
42 UNIVERSITAT POLITÈCNICA DE CATALUNYA
[7] H. Moravec, �Towards automatic visual obstacle avoidance", In Proceedings of the 5th
international joint conference on Arti�cial intelligence, pp. 584. San Francisco, USA.
Aug. 1977.
[8] S. Lacroix, A. Mallet, R. Chatila, and L. Gallo, �Rover self localization in planetary-
like environments." In Proceedings of the International Symposium of Arti�cial Intel-
ligence, Robotics, and Automation for Space (i-SAIRAS), pp. 433 to 440. Noordwijk,
The Netherlands. June 1999
[9] W. Forstner, �A feature based correspondence algorithm for image matching." In
Archives of Photogrammetry and Remote Sensing, vol. 26, no. 3, pp. 150 - 166, 1986.
[10] A. Milella and R. Siegwart, �Stereo-based ego-motion estimation using pixel tracking
and iterative closest point." In Proceedings of the 4th IEEE International Conference
on Computer Vision Systems (ICVS), pp 21-24. New York, USA. Jan 2006.
[11] A. Howard, �Real-time stereo visual odometry for autonomous ground vehicles." In
Proceedings of the IEEE/RSJ International Conference of Intelligent Robots and Sys-
tems.(IROS) pp. 3946-3952. Nice, France. September,2008
[12] P. Besl and N.McKay, �Amethod for registration of 3-d shapes." In IEEE Transac-
tions on Pattern Analysis and Machine Intelligence vol. 14, no. 2, pp. 1992.
[13] C. Harris and M. Stephens, �A combined corner and edge detector", in Proceedings
of the Alvey Vision Conference. pp 147 - 151. 1988.
[14] M.A. Fischler and R.C. Bolles. �Random sample consensus: A paradigm for model
�tting with applications to image analysis and automated cartography." In Commu-
nications of the ACM, 24(6), pp. 381-395, June 1981.
[15] F. Fraundorfer, D. Scaramuzza, and M. Pollefeys, �A constricted bundle adjustment
parameterization for relative scale estimation in visual odometry", in Proceedings of
the IEEE International Conference on Robotics and Automation (ICRA), pp. 1899 -
1904. Anchorage, Alaska, May 2010.
Development of an Error Propagation Model for Visual Odometry 43
[16] K. Konolige, M. Agrawal, and J. Sol, �Large scale visual odometry for rough ter-
rain", in Proceedings of the International Symposium on Robotics Research (ISRR),
Hiroshima, Japan. Nov 2007.
[17] A. Comport, E. Malis, and P. Rives, �Accurate quadrifocal tracking for robust 3d
visual odometry," in Proceedings of the IEEE International Conference on Robotics
and Automation (ICRA), pp. 40 - 45. Roma, Italy. April, 2007.
[18] H. Durrant-Whyte and T. Bailey, �Simultaneous localization and mapping (SLAM):
Part I. The essential algorithms," In IEEE Robotics and Automation Magazine, vol.
13, no. 2, pp. 99-110, 2006.
[19] T. Bailey and H. Durrant-Whyte, �Simultaneous localization and mapping (SLAM):
Part II. State of the art," In IEEE Robotics and Automation Magazine, vol. 13, no. 3,
pp. 108-117, 2006.
[20] T. Lemaire and S. Lacroix, �Vision-based SLAM: Stereo and monocular approaches,"
In International Journal of Computer Vision (IJCV), vol. 74, no. 3, pp. 343-364, 2006.
[21] David G. Lowe, �Distinctive image features from scale-invariant keypoints." In Inter-
national Journal of Compute Vision, 60, 2 pp 91-11. 2004
[22] Benjamin Ochoa, Serge Belongie, �Covariance Propagation for Guided Matching" In
3rd Statistical Methods in Multi-Image and Video Processing (SMVP) 2006
[23] A. Cumani and A. Guiducci, �Visual odometry with SURFs and d�ej`avu correc-
tion." In Proceedings of the 8th Conference on Optical 3D Measurement Techniques,
Zurich, Switzerland. July 2007.
[24] John Christopher Clarke �Applications of Sequence Geometry to Visual Motion,"
PhD thesis. pp150 Robotic Research Group, Departemnt of Engineering Science, Uni-
versity of Oxford, 1997
[25] Kurt Konolige, �Sparse Sparse Bundle Adjustment" In Proceedings of the British
Machine Vision Conference, pp. 102.1 - 11 Aberystwyth, Wales. Aug 2010
44 UNIVERSITAT POLITÈCNICA DE CATALUNYA