N EW R ESULTS ON T RIANGULATION P OLYNOMIAL E QUATION … · 2008-05-22 · N EW R ESULTS ON T...

NEW RESULTS ON TRIANGULATION,POLYNOMIAL EQUATION SOLVING AND THEIR

APPLICATION IN GLOBAL LOCALIZATION

KLAS JOSEPHSON

Centre for Mathematical SciencesMathematics

MathematicsCentre for Mathematical SciencesLund UniversityBox 118SE-221 00 LundSweden

http://www.maths.lth.se/

Licentiate Theses in Mathematical Sciences 2008:7ISSN 1404-028X

ISBN 978-91-633-2870-1LUTFMA-2029-2008

c© Klas Josephson, 2008

Printed in Sweden by MediaTryck, Lund 2008

Preface

This thesis considers the problem of image based localization. This means that the objec-tive is to find the position and the direction from which an image was taken. To achievethis two subproblems are first studied, triangulation and polynomial equations. In thelast paper a complete localization system is constructed using improvements to the poseproblem.

The work of this thesis has been founded by the Swedish Research Council throughgrant no. 2004-4579 ’Image-Based Localisation and Recognition of Scenes’.

The thesis consists of the following three papers:

• K. Josephson, F. Kahl, Triangulation of Points, Lines and Conics, to appear inJournal of Mathematical Imaging and Vision, 2008.

• M. Byröd, K. Josephson, K. Åström, Fast and Stable Polynomial Equation Solv-ing for Computer Vision, manuscript for International Journal of Computer Vision,2008.

• K. Josephson, M. Byröd, F. Kahl, K. Åström, Localization with Hybrid Features,submitted to Journal of Mathematical Imaging and Vision, 2008.

The first paper considers optimal triangulation in arbitrary many views. The secondpaper concerns solving systems of polynomial equations. This can for example be usedfor fast and optimal three view triangulation. The last paper considers localization withhybrid features. In this paper the methods of paper two are necessary.

During the work with this thesis the following papers have also been written of whichsome in parts overlap with those of this thesis.

• K. Josephson, F. Kahl, Triangulation of Points, Lines and Conics, Proc. 15th Scan-dinavian Conference on Image Analysis, Aalborg, Denmark, 2007.

• K. Josephson, M, Byröd, F. Kahl, K. Åström, Image Based Localization UsingHybrid Features Correspondences, Proc. ISPRS workshop BenCOS at CVPR, Min-neapolis, MN, USA, 2007.

• M, Byröd, K. Josephson, K. Åström, Improving Numerical Accuracy of GröbnerBasis Polynomial Equation Solvers, Proc. International Conference on ComputerVision, Rio de Janeiro, Brazil, 2007.

• M, Byröd, K. Josephson, K. Åström, Fast Optimal Three View Triangulation, Proc.Asian Conference on Computer Vision, Tokyo, Japan, 2007.

3

• M, Byröd, Z. Kukelova, K. Josephson, T. Pajdla, K. Åström, Fast and RobustNumerical Solutions to Minimal Problems for Cameras with Radial Distortion,Proc. at Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA,2008.

• Z. Kukelova, M, Byröd, K. Josephson, T. Pajdla, K. Åström, Fast and RobustNumerical Solutions to Minimal Problems for Cameras with Radial Distortion,submitted to Computer Vision and Image Understanding, 2008.

• M, Byröd, K. Josephson, K. Åström, Fast Optimal Three View Triangulation, sub-mitted to IPSJ Transactions on Computer Vision and Applications, 2008

4

Acknowledgements

First of all, I would like to thank my supervisor Fredrik Kahl for always being avail-able for discussions and for his careful examination of this manuscript as well as othermanuscripts. Also Magnus Oskarsson and Olof Barr has read and examined the manuscriptand by that improved the quality considerably. I also would like to thank Kalle Åström forintroducing me to the art of solving polynomial equations by Gröbner basis techniquesand for many discussions on the subject. My gratitude also goes to the MathematicalImaging Group at the Centre for Mathematical Sciences and especially to Martin Byrödthat has been my main collaborator since he joined the research group. I also want tothank all those sharing the same coffee room for the joyful breaks in my work. AnkiOttosson needs a special mention for all help with all kinds of administrative difficulties.The football team at the Centre for Mathematical Sciences also needs to be acknowledged,together with a wish for better results this season.

Finally, I would also like to thank friends and family for all those things that are notwork.

5

Introduction

1 Background

The scope of this thesis is geometrical computer vision. The geometrical foundation ofthis field is old and probably begun in 1855 by Chasles [5]. In his paper Chasles looks atthe problem of finding the geometry between two cameras if seven point correspondencesare known between two images. The next important paper in the field was written byKruppa in 1913 [20]. In this paper he almost correctly solves the problem of relative posebetween two cameras when the inner calibration of the cameras is known. A correctedsolution of the problem was presented by Thomson in 1959 [33]. After that the fieldreally took off and the amount of publications has constantly increased.

One of the fundamental problems in geometrical computer vision is the structure andmotion problem. Given one or many images of the same object, this refers to the problemto say something about the structure of the imaged object and/or the camera motionbetween the images. The structure and motion problems can be stated in several differentways depending what is known about the different entities. In some the placement of twoor more cameras is known and the problem is to calculate the three-dimensional structure.This is the so-called triangulation problem. In other cases the three-dimensional structureof the object is known but nothing is known about the camera that took the images. Theproblem of finding the location of the camera is then known as the pose problem. Anothersituation is when neither the three-dimensional structure nor the camera positions areknown. This is called the structure and motion problem. A special case of the structureand motion problem consists of calculating the motion between two images. This isknown as the relative pose problem. All these problems can also be varied depending onthe knowledge about the inner parameters of the cameras used, such as, for example, thefocal length.

Of course there are some fundamental limitations in the problem of structure andmotion. Depending on the a priori knowledge of the camera there are different levelsof these ambiguities. For example, if the full inner information of a camera is known,that is, the camera can be seen as a tool for measuring angles, then there will only be anambiguity in size. In other words, it is impossible to know if both the object and the

7

INTRODUCTION

movements of the camera is small or if both are large.This thesis considers several of these structure and motion problems. The first paper

deals with the triangulation problem. This is a well studied area in computer vision [18],but still more work is needed especially in the area of globally optimal methods. Globallyoptimal methods were developed for two and three views for points [17, 32] but for moreviews or other structures, such as lines and conics, things get more complicated.

The second paper deals with Gröbner basis methods. Gröbner basis methods are usedto solve systems of polynomial equations which naturally arise in structure and motionproblems. The notion of Gröbner bases was introduced by Buchberger in his thesis from1965 [2] and builds on methods in algebraic geometry. The Gröbner basis methods givean algorithmic way to solve systems of polynomial equations, but these algorithms requireexact arithmetics. To overcome this problem Faugère [10] introduced the F4 method thatapplies methods from numerical linear algebra. This can be used in simpler problems butthe numerical accuracy become problematic even for this method. Due to this problem,specific solvers have been presented inspired by the F4 method. In the second paper ofthis thesis an improved method to this problem is presented that can handle more difficultproblems than previously.

1.1 Localization

"Where am I and what am I seeing?" is the question in the localization problem. Manylocalization system have been constructed over the time e.g. [27, 3]. Common for thesetwo are that they only solve the problem in two dimensions. In this thesis the focusinstead is on localization in three dimensions. In three dimensions GPS is a competingtechnic, but the GPS has several drawbacks. For example it is not possible to get thedirection from a GPS and it needs an open sky to work.

To solve the localization problem many steps are necessary to carry out. Two of thesesteps are those addressed in the first two papers. The problem of triangulation is necessaryto solve to be able to build a model by the training data and the Gröbner Basis methodhas shown to be an important step if minimal cases are to be used. The improvements onthe Gröbner Basis methods are also possible to apply for better solving the triangulationproblem.

The last paper of the thesis builds a complete localization system. To do this severalmore problems need to be addressed. One of these is the correspondence problem. Thisproblem is not directly addressed in this thesis instead commonly used methods are car-ried out. The focus is on a set of new minimal cases that are derived and used in theprocess of localization. By showing how to solve these new minimal cases, we give newcontributions towards solving the general localization problem.

Next will be a short introduction to the most fundamental concepts of geometricalcomputer vision and algebraic geometry. In the introduction an overview of importantconcepts in computer vision will be given, such as how a camera is modeled, the geometry

8

2. GEOMETRY IN COMPUTER VISION

between two images and some aspects of the triangulation problem. In algebraic geom-etry the concepts of ideal and variety are explained. This is followed by the basics onGröbner bases, and an introduction to the important concept the action matrix. For thelocalization problem the correspondence problem is shortly reviewed along with outliershandling using RANSAC.

2 Geometry in Computer Vision

2.1 The Pinhole Camera

One of the foundations of geometrical computer vision is the pinhole camera. The mod-eled pinhole camera consists of two components, the focal point and the image plane.Figure 1 shows a schematic figure of a pinhole camera. There C is the focal point and πthe image plane, further, X1 and X2 are world points and x1 and x2 are projected imagepoints.

C

π

X1

x1

X2x2

Figure 1: The geometry of a pinhole camera. Xi are points in the three dimensionalspace projected to the image points, xi on the image plane π. C is the focal point, alsoreferred to as the camera center.

The Pinhole camera can be modeled mathematically with the so called camera equa-tion,

λx = PX. (1)

In this equation λ is the distance between the focal point and the world point and boththe image and the world point are in homogeneous/extended coordinates. P is the cameramatrix of size 3× 4 and holds both the intrinsic and extrinsic parameters of the camera.To understand why (1) models a pinhole camera put the focal point in the origin and theimage plane as z = f where f is the focal length, that is, the distance between the imageplane and the focal point. In this case the setup looks as in Figure 2.

9

INTRODUCTION

z

y

X

π

C

fYZ

f

Y

Z

Figure 2: A pinhole camera with the image plane, π, placed at z = f where f is the focallength of the camera. The focal point C is placed in the origin and the world point X isprojected to the image plane.

As can be seen in Figure 2 a point X = (0, Y, Z)T is mapped to the point (0, fY/Z, f)T .Since it is possible to rotate this setup to include X , with values different from zero, themapping becomes,

(X,Y, Z)T 7→ (fX

Z,fY

Z)T . (2)

With homogenous coordinates this mapping can be expressed with matrix multiplicationaccording to

XYZ1

7→fXfYZ

=

f 0f 0

1 0

XYZ1

. (3)

The principal point is the orthogonal projection of the focal point on the image plane.During this derivation it is assumed that the pinhole camera has its principal point at theorigin of the image plane. This can not be taken for granted so the mapping in (3) canbe adjusted to account for different principal points by putting

XYZ1

7→fX + ZpxfY + Zpy

Z

=

f px 0f py 0

1 0

XYZ1

, (4)

where (px, py) is the principal point. The camera matrix can now be written as λx =

10


K[I | 0]X, where K is

K =

f pxf py

1

. (5)

There are two more calibration parameters that are associated with the design of theCCD, the image registration sensor of a digital camera. The first of those is the aspectratio of the pixels, γ. If the pixels are square than γ equals 1, which is true for almostall CCD’s. The second parameter is the skew, s. The skew parameter will be zero if thex and y axes are perpendicular on the CCD, which for almost all cameras will be true.These parameters are included in the calibration matrix as follows,

K =

f s pxγf py

1

. (6)

This is the full calibration matrix that is used to model a pinhole camera. The only thingthat are still needed is to move the camera away from the origin and to incorporate fullfreedom in rotation. This is done by defining the general camera matrix as,

P = K[R | t], (7)

where R is an orthogonal matrix and t a translation vector. This is the the camera matrixfor an arbitrary pinhole camera. With this camera (1) represents a projection through apinhole camera in homogenous coordinates. This model of a camera is used throughoutthis thesis. For a more exhausted derivation of the camera equation see [18].

2.2 Two View Geometry

Geometrical computer vision often includes several cameras, or several images with dif-ferent views with the same camera. In two-view geometry, the most important propertyis the epipolar geometry. The epipolar geometry is the intrinsic geometry between twoviews. One main motivation to the study of epipolar geometry is the search for corre-spondences in triangulation. The reason for this is that the epipolar geometry defines aline to search along in the second image to find a corresponding point, given a point inthe first image. But the epipolar geometry is also used the other way around, the epipolargeometry can be calculated through several correspondences between two images whichthen can be used to find the camera matrices in the relative pose problem.

Given two corresponding points x and x′ in two images, then these two points havea common point X in the three dimensional space. With the information from one ofthe images the possible locations of X are reduced to a line i.e. the point is known butnot the depth. This line is projected down to a line in the second image and x′ should beplaced on this line, at least in absence of noise. The line which the corresponding pointcan be located on is called the epipolar line.

11

INTRODUCTION

The geometric entities of the epipolar geometry are shown in Figure 3. The epipolesare the intersection of the image planes with the baseline joining the camera centers.Another way to characterize the epipoles is that they are the images of the focal pointsof the other. The second entity is the epipolar plane, which is a plane containing thebaseline and the point of interest in the three dimensional space. The last entity is theepipolar line, which is the intersection of an epipolar plane with the image plane. If theepipolar geometry is known between two cameras, a point in one image defines a epipolarline in the other image. Furthermore the epipolar line always intersects the epipole.

C C′e e′

x′x

l l′

XΠ

Figure 3: A geometry with two cameras and a point X projected down to x and x′,respectively, in the cameras. The baseline between the two camera centra, C and C′,intersect the image planes in the epipoles, e and e′. The lines l and l′ in the images,given by the plane Π, are called epipolar lines. The plane Π is defined by the cameracenters and the point X.

2.2.1 The Fundamental Matrix

To algebraically describe the epipolar geometry between two cameras the fundamentalmatrix is used. The fundamental matrix holds the information of how a point in one ofthe images generate the corresponding epipolar line in the other image and the funda-mental matrix F will map points to lines in homogeneous coordinates according to,

l′ = Fx. (8)

According to the properties of the epipolar geometry described in the last section thecorresponding point x′ to x has to be contained in the epipolar line l′, hence

x′T l′ = x′TFx = 0, (9)

has to be fulfilled for all corresponding points. The fundamental matrix has several im-portant properties. If F is the fundamental matrix of a camera pair (P, P ′) then FT isthe fundamental matrix for the pair (P ′, P ). This follows directly from (9). Furthermore

12


the fundamental matrix is only given up to scale and has rank 2. The reason why F hasrank 2 can be understood by the fact that the fundamental matrix maps points into linesand all these lines intersect in the epipole. Hence it is enough to give a point on a circlearound the epipole.

The fundamental matrix has seven degrees of freedom. The nine elements of thefundamental matrix encodes these seven degrees of freedom since it is homogenous andthat det(F ) = 0 due to the fact that it does not has full rank.

2.2.2 The Essential Matrix

If the cameras are calibrated the degrees of freedom for the fundamental matrix get re-duced, and the name also changes to the essential matrix. Without calibration the funda-mental matrix had seven degrees of freedom but the essential matrix has only five. Thiscan be realized since if the cameras are calibrated the first camera can be set P = [I | 0]and the second to P = [R | t]. Both the translation, t, and the rotation, R, have threedegrees of freedom but since the essential matrix only is given up to scale it only remainsfive degrees of freedom.

The essential matrix is defined in the same way as the fundamental matrix,

x′TEx = 0, (10)

where x′ and x are the coordinates when the effects of the calibration is removed, so-callednormalized coordinates. From this definition and the relation x = K−1x it follows thatx′TK ′−TEK−1x = 0. This equation gives the connection between the essential andfundamental matrices to be as follows,

E = K ′TFK. (11)

The reduced number of degrees of freedom gives additional constraints on the essen-tial matrix compared to the fundamental matrix. These constraints are that the essentialmatrix is a matrix where two of the singular values values are equal and the last zero [18].

In [28] Nistér uses the property,

2EETE − tr(EE′)E = 0, (12)

introduced by Demazure [8] and later also used by [9], of the essential matrix. Thisproperty is derived from the fact that two singular values are equal and is used to find theessential matrix given the minimum set of five correspondences between two calibratedcameras. This constraint has later been used with different constraints on the calibra-tion matrix to solve several minimal cases in geometrical computer vision [22, 30]. Thismethod is also used in several occasions in this thesis.

13

INTRODUCTION

2.3 The Triangulation Problem

A classical problem in geometrical computer vision is the triangulation problem for points.The formulation of this problem is:

Problem 2.1 (Triangulation). Given two or more images and corresponding points in those,find which point in the world that is the source given that the cameras are known.

Figure 4 holds a sketch of the problem of triangulation, where the task of triangulationis to find the point X that best fulfills the image points.

X

Figure 4: Sketch of multiple view triangulation. If there were no noise, all lines wouldintersect in a single point, but due to noise this is not true. The problem of triangulationis to find the point X that is the best approximation to such a point given some type ofmeasure.

The triangulation problem is well studied and hence a lot of literature can be found,see [18, 26] and the references therein. In the absence of noise this problem is trivial andcan be solved with two views by constructing the matrix,

A =

xpT3 − pT1ypT3 − pT2x′p

′T3 − p

′T1

y′p′T3 − p

′T2

, (13)

where pi, p′i is the i:th row of the camera matrices P and P ′, respectively, of the twoviews. This matrix will not have full rank and the null space will be the sought after pointin homogeneous coordinates.

If there is noise in the images the problem becomes more difficult. First of all anoptimization criterion has to be chosen. The most common choice and the statisticallyoptimal given gaussian noise is the reprojection errors in L2-norm. In recent years alsothe L∞-norm have bee given more interest, see e.g. [16, 19] but in this thesis the focus ison the L2-norm.

14

3. ALGEBRAIC GEOMETRY

If the L2-norm is chosen the most widely used method to triangulate a point is to uselocal optimization methods based on gradient decent. After an initialization using a linearmethod, so called bundle adjustment is applied [18]. This method is used even thoughit is well known that the error function will not be convex and hence local minima canexists.

To overcome the problem of local minima, global optimal methods are interesting.Such methods have been given for two and three views. The solution of the two viewcase was given by Hartley and Sturm in 1997 [17]. The solution is given by calculatingthe stationary points to the cost function. This leads to solving a polynomial of degree6. The three view case was solved in 2005 by Stewénius et al. [32]. Also this problem issolved by calculating the stationary points to the cost function, but in this problem, thenumber of those points is 47. To solve that problem Gröbner basis methods are used.The paper of Stewénius et al. was one of the main inspirations to the second paper of thisthesis due to the numerical difficulties they had. The problem of three view triangulationis used as a benchmarking problem in our paper.

If more than three views are used there are to our knowledge no published globalmethods on closed form. You may argue that the solutions to two and three views areneither closed but they can be solved by an eigenvalue problem. Instead of these methodsother approaches need to be taken. This was done in 2006 by Agarwal et al. [1]. Thismethod is based on a branch and bound algorithm where the bounding is accomplishedby fractional programing. Branch and bound algorithms have theoretically an exponentialcomplexity but if both the branching and bounding are done wisely the complexity canin real examples be considerably lower. In the paper by Agarwal et al. it is shown that thisindeed is the case with their solution.

2.3.1 Triangulation of Other Geometric Structures

In the paper by Agarwal et al. only points are considered. But not only points are of in-terest in the problem of triangulation. Other structures such as lines, conics and arbitrarypatches, can be interesting to triangulate. The triangulation of lines in two views givesan exactly determined line in the three dimensional space, since a line in space has fourdegrees of freedom.

In the first paper of this thesis an extension to the method used by Agarwal is giventhat deals with lines and conics that also gives the globally optimal triangulated structures.

3 Algebraic Geometry

Due to the structure of the camera equation (1) minimal problems in geometrical com-puter vision often lead to solving systems of polynomial equations. This is also true ifequation (12) is studied. The occurrence of polynomial systems get even more obviousfor the camera equation if the camera is calibrated. In this case only the rotation and

15

INTRODUCTION

translation have to be determined. With use of a quaternion q = (a, b, c, d)T and atranslation vector t = (x, y, z)T the camera matrix can then be described as follows:

P =

a2 + b2 − c2 − d2 2bc− 2ad 2ac+ 2bd x2ad+ 2bc a2 − b2 + c2 − d2 2cd− 2ab y2bd− 2ac 2ab+ 2cd a2 − b2 − c2 + d2 z

. (14)

Under these prerequisites it is natural that many computer vision problems end up insolving a system of m polynomial equations in s variables, x = (x1, . . . , xs), formalizedas,

f1(x) = 0,...

fm(x) = 0.

(15)

To solve systems like this many methods exist, see [4] and references therein. The methodstudied to solve these problems in this thesis is through algebraic geometry, see [6, 7] fora introduction to the field.

One of the benefits of using methods from algebraic geometry in computer visionis that these can be solved by fast algorithms under certain conditions. In these cases aproblem specific solver can be constructed that computes a solution in a few millisecondson a standard PC. This is a very attractive feature since these often are used in the kernelof a hypothesis and test algorithm and hence have to be executed many times.

The two most important objects in algebraic geometry are the variety and the ideal.

Definition 3.1 (Variety). The variety V is the set of points x where all equations in a systemof polynomial equations vanish.

The variety does not need to be a finite set but in this thesis only varieties consisting of afinite set of points are considered.

Definition 3.2 (Ideal). The ideal I generated of a set of polynomials equation fi = 0 is,ideal I =

∑i hi(x)fi(x), where hi ∈ C[x].

In this definition C[x] denotes the set of all polynomials over the complex numbers.The reason to study the ideal I to a set of equation as in (15) is that a point x is a

zero of (15) iff it is a zero to the ideal. Now the ideal is used to define equivalence of twopolynomials. This is done by saying that two polynomials f and g are equivalent moduloI iff f − g ∈ I , denoted by f ∼ g. With this definition a quotient space C[x]/I isachieved of all equivalent classes modulo I .

It is also possible to construct equivalence classes based on the variety V . Here twopolynomials are said to be equivalent if they are equal on all x ∈ V . It is obvious that iftwo polynomials are equivalent modulo I then they are also equal on V . If furthermorethe ideal is a radical, that is [7], if fm ∈ I for somem ≥ 1 then f ∈ I , then the converseis also true and the two structures are isomorphic.

16

3. ALGEBRAIC GEOMETRY

The most important property is that the dimension of C[x]/I is equal to the numberof solutions of (15).

3.1 The Action Matrix

Knowledge of ideals and varieties can be used to solve systems of polynomial equations.To understand this let us start of by looking at the problem of finding the roots to apolynomial of degree three. If the polynomial is

f(x) = x3 + a2x2 + a1x+ a0, (16)

then the roots to can be found through the so-called companion matrix,−a2 1 0−a1 0 1−a0 0 0

. (17)

The eigenvalues of the companion matrix are equal to the zeros of (16). This is easilyseen since the characteristic polynomial of (17) is equal to (16). This is well known andfor example used by the Matlab command roots as it is a numerically stable way to findthe roots.

The method of the companion matrix can be extended to the multivariate case [6, 23].Consider again the system of polynomial equations in (15), and the corresponding idealI and variety V . Since the dimension of the quotient space is equal to the number ofsolutions to the system, the quotient space forms a linear vector space with the samedimensionality as the number of solutions. Take p ∈ C[x] and consider the operationTp : f(x) 7→ p(x)f(x). This operation is now linear and given some basis in C[x]/Iit can be represented with a matrix mp. This matrix is called the action matrix and is ageneralization of the companion matrix.

The action matrix mp has some nice properties. The eigenvalues of mp are p(x)evaluated at the points of V. Furthermore, the eigenvectors of mT

p are equal to the vectorelements evaluated on V . To understand this consider an arbitrary polynomial p(x). Itcan be represented as p(x) = cTb, where c is a vector with coefficients and b a vectorof monomials forming a basis of the quotient space. Further, let [·] denote the reductionmodulo I , then the following holds for any c,

[p · cTb] = [(mpc)Tb] = [cTmTp b]. (18)

Since its holds for any c it holds especially for [pb] = [mTp b]. If this is evaluated where

x ∈ V the brackets can be left out and the result becomes,

p(x)b(x) = mTp b(x), for x ∈ V. (19)

This is recognized as an eigenvalue problem for mTp . Hence the eigenvalues of mT

p isequal to p(x) evaluated at the points of V and the eigenvectors are the basis elements ofb evaluated at the zeros.

17

INTRODUCTION

3.2 Gröbner Basis

The difficulties in solving systems of polynomial equations are now reduced to construct-ing the action matrix since the eigenvalue problem is well studied and several algorithmsto solve the problem exist [13]. One way to find the action matrix is to select a linearbasis B of C[x]/I and for each element of B calculate [p · bi] where bi ∈ B. To do thesecalculations it is necessary to be able to represent each equivalent class of C[x]/I by a welldefined representative.

One way to find these representatives is to use so called Gröbner bases [7]. The keyidea is to use multivariate polynomial division. A Gröbner basis exists for each idealand yields a well defined reminder for every polynomial. Well defined means that for all

f1, f2 ∈ [f ] the reduction with the Gröbner basis f1G

= f2G

. There is also an algo-rithm called Buchberger’s algorithm [7] that is guaranteed to find this basis. This methodunfortunately only works with exact arithmetics. The Gröbner basis is also dependent ofmonomial order that is needed to define a consistent division of multivariate polynomials.

The reason way Buchberger’s algorithm only works with exact arithmetics is that ineach step a check is done if a remainder is zero. With floating points arithmetics it isimpossible to be certain about that since the calculations always are plagued by round oferrors. There has been work to overcome this problem by Faugère [10]. The main ideaof his work is to do many reductions in one step using well studied methods of linearalgebra. This is done by putting the system in (15) on the form,

CX = 0, (20)

where X = [xα1 , . . . ,xαn ]T is a vector holding all occurring monomials and C is acoefficient matrix. With this notion all operation may be done on the coefficient ma-trix C and the numerics can be enhanced with different methods from numerical linearalgebra [13].

This method of working with Gröbner basis methods has been adopted by the com-puter vision community and used to write special purpose solvers to several problemssuch as, relative pose with two calibrated cameras [29], autocalibration with radial distor-tion [21], relative pose with unknown but common focal length [30], optimal three viewtriangulation [32], calculation of the fundamental matrix for a catadioptric camera [12]and more. Even though the method of Faugère improves the numerical stability it is notalways enough. This becomes obvious in the problem of optimal three view triangulationwhere the authors were forced to used 128 bits mantissa on the floats to be able to getaccurate results, which resulted in an extremely slow solver.

4 Global Localization

The problem of global localization is also known as the kidnapped robot problem. Thereason for this is that the problem to solve is to find the location of the robot where the

18

4. GLOBAL LOCALIZATION

only knowledge is some kind of model. In this thesis it is done by using an image so theproblem can be formulated as:

Problem 4.1 (Localization). Given a query image and a model of the 3D scene structure,find where the image was taken and how the camera was oriented relative to the model.

In general it is not necessary that the query is in form of an image, even other methodshave been used such as laser sensors [3].

To solve the localization problem several methods from computer vision have to beused. The triangulation problem, that is the subject for the first paper of this thesis,is an important step in the construction of a model. The Gröbner Basis methods havealso been showed to be useful in this type of problem. Also other problems have to besolved among those is the correspondence problem. The correspondence is the problemof finding common points between a query image and the model. This problem is notdirectly considered in this thesis instead a widely used method is applied, the methodincludes solving many minimal geometrical problems.

With the triangulation and correspondence problems and several minimal cases thatare solved by using Gröbner Basis technics it is possible to build a global localizationsystem. This is the theme of the last paper of this thesis. In that paper several newminimal cases are derived that are shown useful in global localization.

In the next two subsections a short introduction to the correspondence problem andto minimal problems are given.

4.1 The Correspondence Problem

The correspondence problem is, given two or more images of the same scene find a setof points which can be identified as the same points in another image. This problem isnot directly addressed in this thesis. Instead one of the commonly used methods in thecomputer vision community is used. This means that interest points are located withthe Maximally Stable Extremal Regions (MSER) method [25]. This method looks forregions that have similar intensity. These regions are then used as interest points, wherethe points is the center of the regions. For each such point a descriptor is calculatedthat makes it possible to locate the point in the model. The descriptor used is the SIFTdescriptor [24]. The SIFT descriptor is rotation invariant and moderately scale invari-ant. The SIFT descriptor looks at gradients in the region around an interest point andconstructs a histogram over directions of the gradients. This information is stored in128 element feature vectors that are used for matching.

When the matching between model and query image has been done there will bea large number of outliers in a vast majority of the cases. One method to handle theoutliers is to use the RANSAC method (RANdom SAmple Consensus). RANSAC wasintroduced by Fischler and Bolles [11]. RANSAC takes a small random subset, usuallyminimal, of all correspondences and calculates a possible solution. The solution is then

19

INTRODUCTION

tested on all correspondences and the number of correspondences that agree with thesolution is counted as inliers. This procedure is performed a large number of times andthe inliers to the solution with most inliers are then considered to be true inliers. On thislarger set it is now possible to apply other methods in order to calculate a better solution,typically involving local non-linear optimization. In Figure 5 an example of fitting linesto data with outliers using RANSAC is shown.

d

c

a

b

Figure 5: Estimation of a line using RANSAC: The open points are outliers and the solidpoints inliers. A random minimal set of two points is chosen and an exact fitting is madeto this minimal set. One random set is (a,b) and the other (c,d). In both cases amargin is added (dashed lines) and the number of points that falls within the margin arecounted as inliers. In this case the (a,b) set gives 10 inliers whereas (c,d) only resultsin two. Hence the inlier set of (a,b) is chosen.

4.2 Minimal Cases

In RANSAC minimal cases are the standard method to find the solutions for a random-ized set of correspondences. In Figure 5 the minimal set is two points to give a test line.In the pose problem this, of course, gets more complicated. One well know minimalcase for pose problems is if the camera is calibrated, i.e. K in equation (6) is known. Inthis case it is only necessary to have three correspondences between the image and knownworld points to decide the pose of the camera, see [15]. On the other hand, this does notgive a single solution, as in Figure 5, instead it can give up to four possible solutions.

The given minimal case is applicable if the model holds three-dimensional points.Another way to attack the problem is to let the model hold other cameras with pointsin those images. Then the problem becomes to decide the relative pose between twocameras.

Problem 4.2 (Relative Pose). Given a model with known cameras and known image pointsin those cameras, find where a query image was taken and how the camera was orientated.

20

5. SUMMARY OF THE PAPERS

This problem was as mentioned before almost solved by Kruppa [20] in 1913 andcorrected by Thomson in 1959 [33] given that calibrated cameras are used. If uncali-brated cameras are used, the problem was solved by Chasles [5] in 1855. This problemrequires at least seven corresponding points and can give up to three possible solutions.

It is not necessary that the camera is either calibrated or uncalibrated. There arealso levels in between. One of these cases of semi-calibrated camera is when the innercalibration of the camera is known except for the focal length. In this case it is stillpossible to solve the pose problem with a minimal setup, which was shown by Stewéniuset al. in [30]. The minimal setup in this case is six corresponding points and it yields 15solutions.

In the problem of relative pose it is not necessary that all correspondences refer to thesame camera. If this is not the case Stewénius et al. [31] solved the problem and showedthat there can be up to 64 solutions if 6 correspondences are used when the unknowncamera is calibrated.

5 Summary of the Papers

PAPER I — Triangulation of Points, Lines and Conics

This paper approaches one of the most fundamental problems in computer vision, thetriangulation problem. The paper continues on a work by Agarwal et al. [1] presentedin 2006, where a method was presented that guaranteed a globally optimal solution fortriangulation of points in arbitrary many views, if the error measure was the L2-norm ofthe reprojection errors. The method is based on a branch and bound algorithm where theunder estimators are calculated through fractional programming.

In the paper the method is extended in two directions. First it is shown how to handleuncertainties in measurements of the image objects. It is also shown how the method canbe applied to lines and conics. In the line case Plücker coordinates [18] are used whichleeds to a more complicated relaxation in the bounding step of the algorithm.

Author contribution: This paper is the result of work carried out by myself and mysupervisor Fredrik Kahl. I acted as first author and carried out the experiments.

PAPER II — Fast and Stable Polynomial Equation Solving for Com-puter Vision

This paper addresses the problem of numerical instability when systems of polynomialequations are solved with Gröbner basis methods. This is a common problem; e.g. in [32]Stewénius et al. were forced to use emulated floats with higher accuracy than usual, whichresulted in a very slow algorithm. Another example of numerical instability is the pa-per [22] where Kukelova et al. were not able to construct a floating point solver due tonumerical problems.

21

INTRODUCTION

To overcome the numerical problems two new methods are presented, and a combi-nation of these. The first method truncates the ideal and makes the basis over-redundant.This gives additional false solutions but we show that all true solution still appear. Theother method used is a change of monomials used for the basis in the quotient space.This is also extended so that polynomials can be used in the basis instead of monomials.The combination of these methods is also used with an adaptive step deciding to whatextent the truncation should be done.

The methods are tested on several recently reported problems involving a step wherea system of polynomial equations was solved. On all these problems a significant increasein numerical stability is shown with a small extra cost in complexity.

Author contribution: This is joint work by myself, Martin Byröd and Kalle Åström.Martin acted as first author but I carried out most of the experiments.

PAPER III — Localization Using Hybrid Feature Correspondences

In the last paper of this thesis the pose problem is considered. Several new minimal casesare considered with different degree of calibration of the camera. The minimal cases arebased on what we call hybrid features. By hybrid features we mean that both correspon-dence to other images and correspondences to known world points are used simultane-ously. For most of these minimal cases upper bounds on the number of solutions aregiven. Most of these bounds are found by Gröbner basis methods using Macaulay 2 [14].Some of the new minimal cases are also solved by Gröbner basis methods and in somecases the improved methods of Paper II were necessary to achieve a solution.

Author contribution: This is a joint work of myself, Martin Byröd and our super-visors Fredrik Kahl and Kalle Åström. I acted as first author and carried out most of theexperiment with aid primary by Martin.

22

Bibliography

[1] Sameer Agarwal, Manmohan Krishna Chandraker, Fredrik Kahl, David J. Krieg-man, and Serge Belongie. Practical global optimization for multiview geometry. InProc. 9th European Conf. on Computer Vision, Graz, Austria, pages 592–605, 2006.

[2] B. Buchberger. On finding a vector space basis of the residue class ring modulo azero dimensional polynomial ideal (German). PhD thesis, University of Innsbruck,Austria, 1965.

[3] W. Burgard, A.B. Cremers, D. Fox, D. Hahnel, G. Lakemeyer, D. Schulz,W. Steiner, and S. Thrun. The interactive museum tour-guide robot. In NationalConference on Artificial Intelligence (AAAI’98), Madison, Wisconsin, USA, 1998.

[4] Eduardo Cattani, David A. Cox, Guillaume Chèze, Alicia Dickenstein, MohamedElkadi, Ioannis Z. Emiris, André Galligo, Achim Kehrein, Martin Kreuzer, andBernard Mourrain. Solving Polynomial Equations: Foundations, Algorithms, and Ap-plications (Algorithms and Computation in Mathematics). Springer-Verlag New York,Inc., Secaucus, NJ, USA, 2005.

[5] M. Chasles. Question 296. Nouv. Ann. Math., 14(50), 1855.

[6] D. Cox, J. Little, and D. O’Shea. Using Algebraic Geometry. Springer Verlag, 1998.

[7] D. Cox, J. Little, and D. O’Shea. Ideals, Varieties, and Algorithms. Springer, 2007.

[8] M. Demazure. Sur deux problemes de reconstruction. Technical Report 882, IN-RIA, Rocquencourt, France, 1988.

[9] O.D. Faugeras and S. Maybanks. Motion from point matches: multiplicity ofsolutions. Int. J. Computer vision, 1990.

[10] J.-C. Faugère. A new efficient algorithm for computing gröbner bases (f4). Journalof Pure and Applied Algebra, 139(1-3):61–88, 1999.

23

INTRODUCTION

[11] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for modelfitting with applications to image analysis and automated cartography. Communi-cations of the ACM, 24(6):381–95, 1981.

[12] C. Geyer and H. Stewénius. A nine-point algorithm for estimating para-catadioptricfundamental matrices. In Proc. Conf. Computer Vision and Pattern Recognition, Min-neapolis, USA, June 2007.

[13] G. H. Golub and C. F. van Loan. Matrix Computations. The Johns HopkinsUniversity Press, 3rd edition, 1996.

[14] D. Grayson and M. Stillman. Macaulay 2. Available athttp://www.math.uiuc.edu/Macaulay2/, 1993-2002. An open source computeralgebra software.

[15] R. M. Haralick, C. Lee, K. Ottenberg, and M. Nölle. Analysis and solutions for thethree point perspective pose estimation problem. In Proc. Conf. Computer Visionand Pattern Recognition, pages 592–598, 1991.

[16] R. Hartley and F. Schaffalitzky. L∞ minimization in geometric reconstruction prob-lems. In Proc. Conf. Computer Vision and Pattern Recognition, Washington DC, pages504–509, Washington DC, USA, 2004.

[17] R. Hartley and P. Sturm. Triangulation. Computer Vision and Image Understanding,68:146–157, 1997.

[18] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cam-bridge University Press, 2004. Second Edition.

[19] F. Kahl. Multiple view geometry and the L∞-norm. In International Conference onComputer Vision, pages 1002–1009, Beijing, China, 2005.

[20] E. Kruppa. Zur Ermittlung eines Objektes aus Zwei Perspektiven mit innerer Ori-entierung. Sitz-Ber. Akad. Wiss., Wien, math. naturw. Kl. Abt, IIa(122):1939–1948,1913.

[21] Z. Kukelova and T. Pajdla. A minimal solution to the autocalibration of radialdistortion. In Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEEConference on, 2007.

[22] Z. Kukelova and T. Pajdla. Two minimal problems for cameras with radial distor-tion. In Proceedings of The Seventh Workshop on Omnidirectional Vision, CameraNetworks and Non-classical Cameras (OMNIVIS), 2007.

[23] D. Lazard. Resolution des systemes d’equations algebriques. Theor. Comput. Sci.,15:77–110, 1981.

24

BIBLIOGRAPHY

[24] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. Journalof Computer Vision, 2004.

[25] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo frommaximally stable extremal regions. Image Vision Computing, 22(10):761–767, 2004.

[26] C. McGlone, E. Mikhail, and J. Bethel, editors. Manual of Photogrammetry, 5thEdition. 2004.

[27] P. Newman, J. J. Leonard, J. D. Tardos, and J. Neira. Explore and return: Exper-imental validation of real-time concurrent mapping and localization. In Proc. Int.Conf. on Robotics and Automation, Washington D.C., USA, pages 1802–1809, 2002.

[28] D. Nistér. An efficient solution to the five-point relative pose problem. IEEE Trans.Pattern Analysis and Machine Intelligence, 26(6):756–770, June 2004.

[29] H. Stewénius, C. Engels, and D. Nistér. Recent developments on direct relativeorientation. ISPRS Journal of Photogrammetry and Remote Sensing, 60:284–294,June 2006.

[30] H. Stewénius, F. Kahl, D. Nistér, and F. Schaffalitzky. A minimal solution forrelative pose with unknown focal length. In Proc. Conf. Computer Vision and PatternRecognition, San Diego, USA, 2005.

[31] H. Stewénius, D. Nistér, M. Oskarsson, and K. Åström. Solutions to minimalgeneralized relative pose problems. In Workshop on Omnidirectional Vision, BeijingChina, October 2005.

[32] H. Stewénius, F. Schaffalitzky, and D. Nistér. How hard is three-view triangulationreally? In Proc. Int. Conf. on Computer Vision, pages 686–693, Beijing, China, 2005.

[33] E. H. Thompson. A rational algebraic formulation of the problem of relative orien-tation. Photogrammetric Record, 14(3):152–159, 1959.

25

PAPER I

To appear in Journal of Mathematical Imaging and Vision, 2008.

Triangulation of Points, Lines andConics

Klas Josephson and Fredrik Kahl

Abstract

The problem of reconstructing 3D scene features from multiple views with known cameramotion and given image correspondences is considered. This is a classical and one of the mostbasic geometric problems in computer vision and photogrammetry. Yet, previous methods failto guarantee optimal reconstructions - they are either plagued by local minima or rely on anon-optimal cost-function. A common framework for the triangulation problem of points,lines and conics is presented. We define what is meant by an optimal triangulation based onstatistical principles and then derive an algorithm for computing the globally optimal solution.The method for achieving the global minimum is based on convex and concave relaxations forboth fractionals and monomials. The performance of the method is evaluated on real imagedata.

1 Introduction

Triangulation is the problem of reconstructing 3D scene features from their projections.Naturally, since it is such a basic problem in computer vision and photogrammetry, thereis a huge literature on the topic, in particular, for point features, see [7, 13]. The standardapproach for estimating point features is a two-step approach:

(i) Use a linear least-squares algorithm to get an initial estimate.

(ii) Refine the estimate (so called bundle adjustment) by minimizing the sum of squaresof reprojection errors in the images.

This methodology works fine in most cases. However, it is well-known that the cost-function is indeed non-convex and one may occasionally get trapped in local minima [6].The goal of this paper is to derive the statistically optimal cost-function, cf. [12], andan algorithm which gives the globally optimal solution with a guarantee. The algorithmmay serve as a benchmarking tool for other methods which are non-optimal. On the

29

PAPER I

other hand, as we shall see, the developed algorithm shows good empirical performanceon real data and it can be used for reliably computing optimal estimates in a structurefrom motion system.

In [6], the two-view triangulation problem for points was treated. The solution to theoptimal problem was obtained by solving a sixth degree polynomial. This was generalizedfor three views in [16], but the resulting polynomial system turns out to be of very highdegree and the solution method based on Gröbner bases becomes numerically unstable.Even though the state-of-the-art methods have improved in this area [2], this approach isnot generalizable for an arbitrary number of views. In [11] convex linear matrix inequali-ties relaxations are used to approximate the non-convex cost-function (again, in the pointcase), but no guarantee of actually obtaining the global minimum is provided. For lineand conic features, the literature is limited to closed-form solutions with algebraic cost-functions and to local optimization methods, see [7] and the references therein. Globaloptimization techniques have also been applied to other problems in multiple view ge-ometry, see [10] as well as the survey paper [5].

In this paper, we present a common framework for the triangulation problem for anynumber of views and for three different feature types, namely, points, lines and conics.An algorithm is presented which yields the global minimum of the statistically optimalcost-function. Our approach is most closely related to the work in [9], where fractionalprogramming is used to solve a number of geometric reconstruction problems includingtriangulation for points. Our main contributions are the following. First, we show howa covariance-weighted cost-function - which is the statistically correct thing to consider- can be minimized globally. For many feature detectors, e.g., [4, 19], it is possible toobtain information of position uncertainty of the estimated features. Second, we presenta unified framework for the triangulation problem of points, lines and conics and thecorresponding optimal algorithms. Finally, from an algorithmic point of view, we applyconvex and concave relaxations of monomials in the optimization framework in order tohandle the Plücker constraints for line features. To our knowledge, global optimizationover the Plücker manifold has not been done previously in multiple view geometry.

The paper is organized as follows. Section 2 contains a recapitulation on projectivegeometry and how points, lines and conics projection can be written in a similar man-ner. In Section 3 we give expressions for the geometrical reprojection error and state theproblem we solve. Further on in Section 4 the optimization using a branch and boundalgorithm is described, including calculation of a lower bound for the objective function.In Section 5 evaluation of the method is presented with experiments on three differentdata set.

2 Projective Geometry

In the triangulation of points, lines and conics, it is essential to have the formulation of theprojection from the three dimensional space to the two dimensional image space in the

30

2. PROJECTIVE GEOMETRY

same way as the standard projection formulation used in the point case. For that reasonwe begin with a short recapitulation of the projection of points with a standard pinholecamera. After that the methods to reformulate the projection of lines and quadrics intosimilar equations are considered. For more reading on projective geometry see [7].

2.1 Points

A perspective/pinhole camera is modeled by,

λx = PX, λ > 0, (1)

where P denotes the camera matrix of size 3 × 4. Here X denotes the homogeneouscoordinates for the point in the 3D space, X = [U V W 1]T , and x denote the coor-dinates in the image plane, x = [u v 1]T . The scalar λ can be interpreted as the depth,hence λ > 0 if the point appears in the image. This property is useful in the optimizationperformed in this paper.

2.2 Lines

Lines in three dimensions have four degrees of freedom - a line is determined by the in-tersection of the line with two predefined planes. The two intersection points on the twoplanes encode two degrees of freedom. Even if lines only have four degrees of freedom,there are no universal way of representing every line in P4. One alternative way to repre-sent a line is to use Plücker coordinates. With Plücker coordinates, the line is representedin an even higher dimensional space P5. The over parameterization is hold back by aquadratic constraint that has to be fulfilled for every line.

The Plücker coordinates to a line in three dimensional space can be obtained fromthe Plücker matrix,

L = ABT −BAT , (2)

where A and B are two arbitrary 3D points in homogeneous coordinates on the line. Itis easy to see that L is a skew-symmetric 4 × 4 matrix. Further on are the six Plückercoordinates defined as the elements1,

L = {l12, l13, l14, l23, l42, l34} (3)

of L.From the fact that L has rank 2 and hence detL = 0 it follows that the coordinates

satisfy the constraintl12l34 + l13l42 + l14l23 = 0. (4)

1Alternative definitions are also used.

31

PAPER I

The Plücker coordinates are also easy to deal with when changing coordinate systems.If a points X transforms according to X ′ = HX , the construction of the Plücker matrixgives,

L′ = (HA)(HB)T − (HB)(HA)T == H(ABT )HT −H(BAT )HT = (5)

= H(ABT −BAT )HT = HLHT .

Thus the Plücker matrix transforms as L′ = HLHT . Further, it is easy to show that thecoordinates are independent of the choice of points defining the line.

One advantage with the Plücker representation of lines is that it is possible to con-struct a Plücker camera, PL , that makes it possible to write the projection formula in asimilar way as in the point case. The Plücker camera equation looks like,

λl = PLL, (6)

where PL is the so called Plücker camera. The Plücker camera is given by

PL =

p2 ∧ p3

p3 ∧ p1

p1 ∧ p2

, (7)

where pTi denote the rows of the point camera matrix and pi ∧ pj denotes the Plücker

line constructed by pi and pj . The Plücker camera is hence a 3× 6 matrix with elementsthat are quadratic in the elements of the standard camera matrix. Further, l denotes theprojected image line, represented by a 3× 1 vector.

One difference, however, in the projection equation for lines compared with that ofpoints is that it is not possible to interpret the scale λ as depth.

2.3 Conics

As for lines, we are interested in writing the projection of a quadric to an image conic inthe form of the projection formula for points. Before that we first make a short recapitu-lation on conics and quadrics.

A general conic in the plane is defined by the quadratic form

xT cx = 0. (8)

Here x are points in homogeneous coordinates on the edge of the conic and c is a 3× 3symmetric matrix. In a similar manner a quadric in space is defined by the quadraticform,

XTCX = 0, (9)

where X is a point in homogeneous coordinates and C a 4× 4 symmetric matrix.

32

3. TRIANGULATION

When quadrics are projected through a pinhole camera (1), is it a lot easier to workwith the duals to the conics and quadrics. These duals are the envelopes of the structures.For conics, the envelope consists of lines and for quadrics, the envelope consist of allplanes tangent to the quadric locus. Provided the quadrics and conics are non-degenerate,one can show that the equations for the duals are

UTLU = 0, (10)

where U are homogeneous plane coordinates and L = C−1. Similar for conics, one gets

uT lu = 0, (11)

where u are homogeneous line coordinates and l = c−1. The projection for the envelopeforms are

λl = PLPT λ 6= 0. (12)

Now we want to reformulate (12) so it appears in a similar form as the point projec-tion formula. This is indeed possible.

Lemma 2.1. The projection of a quadric,

λl = PLPT , (13)

can be writtenλl = P

CL. (14)

where l and L denote column vectors of length 6 and 10 obtained from stacking the elementsin l and L. P

Cis an 6× 10 matrix. The entries in P

Care quadratic expressions in P .

Proof. Equation (13) can be written as a tensor product, λl = (P ⊗ P )L, where l andL denote the stacked columns of l and L. Due to the symmetry in l and L it can berewritten as (14).

As for the line case, it is not possible to make the interpretation that the scalar λ ofthe projected conic corresponds to the depth.

3 Triangulation

In triangulation the goal is to reconstruct a three dimensional scene from measurementsin N images, N ≥ 2. The camera matrices Pi, i = 1 . . . N , are considered to be knownfor each image. In the point case, the camera matrix can be written P = (p1, p2, p3)T ,where pj is a 4–vector. Let (u, v)T denote the image coordinates andX = (U, V,W, 1)T

the extended coordinates of the point in 3D. This gives the reprojection error

r =(u− pT

1 X

pT3 X

, v − pT2 X

pT3 X

). (15)

33

PAPER I

Further,∑N

i=1 ‖ri‖22 is the objective function to minimize if the smallest reprojectionerror is to be achieved in L2-norm. To use the optimization algorithm proposed in thispaper (see next section), it is necessary to write the error in each image as a rationalfunction f/g where f is convex and g concave.

The residual r in (15) can be rewritten as

r =(u pT

3 X − pT1 X

pT3 X

,v pT

3 X − pT2 X

pT3 X

), (16)

is it easy to see that the L2-norm of the residual can be written as ‖r‖2 = ((aTX)2 +(bTX)2)/(pT

3 X)2, where a, b are 4-vectors determined by the image coordinates and theelements of the camera matrix. By choosing f = ((aTX)2 +(bTX)2)/(pT

3 X) (with thedomain pT

3 X > 0) and g = pT3 X , one can show that f is indeed convex and g concave.

It is straight forward to form the same residual vectors in the line and conic cases - theonly difference is that the dimension is different.

3.1 Error Distribution

In the remainder of this paper, it will be assumed that the errors on the residual elementsare Gaussian. This is true when triangulating points given that the errors that appear inthe image plane are normally distributed. For lines and conics this will not be entirelytrue [8] but we argue that it is an good approximation when the magnitude of the errorsis small.

In order to validate this claim, the following experiments were performed: 100,000random images were created consisting of one conic and one line. These structures werethan sampled with 21 and 63 points for the line and the conic, respectively. Gaussiannoise was then added to these points with a standard deviation corresponding to one pixelin an image with a resolution 1000 × 1000 pixels. In the next step, a line and a conicwere fitted to these points using least-squares. In the line case, the scale was adjusted suchthat the first two coordinates have a norm of one and translated such that the distance tothe origin was one in order to avoid passing through the origin (since this would meanthat the third coordinate would be zero).

As can be seen in Figure 1 the elements in the line vector residual is not perfectlynormally distributed but it is an good approximation. In the conic case, Figure 2 showsthat two of the elements are closer to be perfect normally distributed. These two elementscorrespond to the center of the conic divided by a scale factor. The other three elementsare close to normal distributions even though they are not as close as the other two.

3.2 Incorporation of Covariance

The residual vector (16) may in dimension independent notation be written

r =(x1p

TnX − pT

1 X

pTnX

, . . . ,xn−1p

TnX − pT

n−1X

pTnX

). (17)

34

3. TRIANGULATION

−0.1 −0.05 0 0.05 0.10

20

40

60

80

100Line element: 1

−0.1 −0.05 0 0.05 0.10

20

40

60

80

100Line element: 2

Figure 1: The distributions of the errors in the two residuals for line case when Gaussiannoise is added to 21 sample points of the true line. A Gaussian distribution is overlaid tothe histogram to compare with.

−1500 −1000 −500 0 500 1000 15000

1

2

3

4

5x 10

−3 Conic element: 1

−1000 −500 0 500 10000

1

2

3

4

5x 10


−1500 −1000 −500 0 500 1000 15000

1

2

3

4

5x 10


−1 −0.5 0 0.5 10

0.5

1

1.5

2

2.5Conic element: 4

−1 −0.5 0 0.5 10

0.5

1

1.5

2

2.5Conic element: 5

Figure 2: The distributions of the errors in the five elements of the conic residual vectorwhen Gaussian noise is added to 63 sample points. A Gaussian distribution is overlaid tothe histogram to compare with.

35

PAPER I

The statistically optimal cost-function is to weight the residual vector by its covariance[12] (at least to a first order approximation). Incorporating covariance weighted errortransforms to,

‖Lr‖ =∥∥∥∥L(x1p

TnX − pT

1 X

pTnX

, . . .

)∥∥∥∥ , (18)

where L is the cholesky factorization of the inverse covariance matrix to the structure ineach image. Notice that we have chosen to normalize by the last coordinate and in thatcase the covariance becomes a 2 × 2 symmetric matrix in the point and line cases and a5 × 5 in the conic case. The reason why the covariance matrix is one dimension lowerthan the image vector is that there is no uncertainty in the last element of the extendedimage coordinates.

3.3 Problem Formulation

For the triangulation of points, lines or conics, the optimization problem to solve can beformulated as follows:

minn∑

i=1

‖Liri‖2. (19)

The only thing which differs (except for dimensions) in the different cases is that in theline case it is necessary to fulfill the quadratic Plücker constraint (4) for the coordinatesof the three dimensional structure.

4 Branch and Bound Optimization

Branch and bound algorithms are used to find the global optimum for non-convex opti-mization problems. The result of the algorithm is a provable upper and lower bound ofthe optimum and it is possible to get arbitrary close to the optimum.

On a non-convex, scalar-valued, objective function Φ at the domain Q0 the branchand bound algorithm works by finding a lower bound to the function Φ on the domainQ0. The lower bounding function should ideally be a convex function to make it possibleto find a minimum reliably. If the difference between the optimum for the boundingfunction and the lowest value of the function Φ - calculated so far - is small enough, sayε, then the optimum is considered to be found. Otherwise the domain Q0 is splitted intosubdomains and the procedure is repeated in these domains. The algorithm repeats untilthe gap between the bounding function and the value of the objective function is smallerthan the predefined value ε. Hence, the global optimum is attained within a preset ε > 0which can be chosen arbitrarily small.

If the lower bounding function on a subdomain has its minimum higher than aknown value of the objective function in another subdomain, it is possible to neglectthe first subdomain since we know that the optimum in that region is greater than thelowest value obtained so far.

36

4. BRANCH AND BOUND OPTIMIZATION

4.1 Bounding

The goal of the bounding function Φlb is that it should be (i) a close under-estimatorto the original objective function Φ and (ii) easy to compute the lowest value Φlb in agiven domain. For practical reasons, convex functions are good candidates as boundingfunctions. Further, as the domain of the bounding functions is partitioned into smallerregions, the approximation gap to the objective function must converge (uniformly). Agood choice of Φlb is the convex envelope, defined as follows.

Definition 1 (Convex Envelope). Let f : S → R, where S ⊂ Rn is a nonempty convexset. The convex envelope of f over S (denoted convenv f ) is a convex function such that(i) convenv f(x) ≤ f(x) ∀ x ∈ S and (ii) for any other convex function u, satisfyingu(x) ≤ f(x) ∀ x ∈ S, we have convenv f(x) ≥ u(x) ∀ x ∈ S.

In Figure 3, an example of a convex envelope of a one dimensional function is plotted.

x

f(x)

Figure 3: Illustration of the convex envelope; the blue solid line is a function f(x) andthe dashed red line is its convex envelope convenv f(x).

The concave envelope is defined in a similar manner. It can often be hard to computethe convex envelope (or concave envelope), in fact, it may be as hard as computing theglobal optimum itself.

37

PAPER I

Fractional Relaxation

Fractional programming is used to minimize/maximize a sum of p ≥ 1 fractions subjectto convex constraints. In this paper we are interested of minimizing

minx

p∑i=1

fi(x)gi(x)

(20)

subject to x ∈ D,

where fi and gi are convex and concave, respectively, functions from Rn → R, and thedomain D ⊂ Rn is a convex set. On top of this it is assumed that fi and gi are positiveand that one can compute a priori bounds on fi and gi. Even under these assumptions itcan be shown that the problem is NP-complete [3].

Instead of studying equation (20) more let us rewrite it. Assume that we have boundson the functions fi(x) and gi(x), where the intervals are [li, ui] and [Li, Ui], re-spectively. Let the 2p-dimensional rectangle [l1, u1] × · · · × [lp, up] × [L1, U1] ×· · · × [Lp, Up] be denoted Q0. With auxiliary variables t = (t1, . . . , tp)T and s =(s1, . . . , sp)T it can be showed that the global optimum to (20) is identic as to the fol-lowing optimization problem,

minx,t,s

p∑i=1

tisi

(21)

subject to fi(x) ≤ ti gi(x) ≥ si

x ∈ D (t, s) ∈ Q0.

In [18] it has been shown that the problem of finding the convex envelope to a singlefraction t/s as the parts in the sum in (21), where t ∈ [l, u] and s ∈ [L,U ], is given asthe solution of the following Second Order Cone Program (SOCP):

convenv

[t

s

]= minimize r (22)

subject to

∥∥∥∥ 2λ√l

r′ − s′∥∥∥∥ ≤ r′ + s′∥∥∥∥ 2(1− λ)√u

r − r′ − s+ s′

∥∥∥∥ ≤ r − r′ + s− s′

λL ≤ s ≤ λU (1− λ)L ≤ s− s′ ≤ (1− λ)Ur′ ≥ 0 r − r′ ≥ 0

l ≤ t ≤ u L ≤ s ≤ U

38

4. BRANCH AND BOUND OPTIMIZATION

where λ = u−tu−l for ease the notation and r, r′, s′ are auxiliary scalar variables.

When Φ is a sum of ratios as in (21) a bound for the function can be calculatedas the sum of the convex envelopes of the individual fractions. When summarizing theconvex envelopes the sum will, however, generally not be the convex envelope to thefunction. Still the function will be a lower bound and it fulfills the requirements of abounding function. This way of calculating Φlb by solving p SOCP problems can bedone efficiently [17].

A more exhaustive description on fractional programming in multiple view geometrycan be found in [9] where point triangulation (without covariance weighting) is treated.

Monomial Relaxation

In the line case the Plücker coordinates have to fulfill the Plücker constraint (4). Thisgives extra constraints in the problem to find lower bounds.

If we make the choice in the construction of the Plücker coordinates that the firstpoint lies on the plane z = 1 and the second on the plane z = 0, remember that thePlücker coordinates are independent of the construction points, the two points X =(x1, x2, 1, 1)T and Y = (y1, y2, 0, 1)T gives the following Plücker coordinates forthe line (2.2),

L = (x1y2 − x2y1, −y1, x1 − y1, −y2, y2 − x2, 1)T . (23)

This parameterization involves that the last coordinate is one and that only the first oneis nonlinear to the points of intersection. Thanks to this it is only necessary to make arelaxation of the first coordinate (in addition to the fractional terms).

The choosen parameterization does not work if the line is parallel to the plane z = 0.If this is the case we change the coordinate system.

In [14] the convex and the concave envelopes are derived for a monomial y1y2. Theconvex envelope is given by

convenv(y1y2) = max{y1y

U2 + yU

1 y2 − yU1 y

U2

y1yL2 + yL

1 y2 − yL1 y

L2

}. (24)

Similarly the concave envelope is

concenv(y1y2) = min{y1y

L2 + yU

1 y2 − yU1 y

L2

y1yU2 + yL

1 y2 − yL1 y

U2

}. (25)

Given bounds on x1, x2, y1 and y2 in the parameterization of a line, it is possible topropagate the bounds to the monomials x1y2 and x2y1.

4.2 Branching

It is necessarily to have a good strategy when branching. If a bad strategy is chosen thecomplexity can be exponentially but if a good choice is made it is possible to achieve alower complexity - at least in practice.

39

PAPER I

A standard branching strategy for fractional programming [1] is to branch on thedenominator si of each fractional term ti/si. This limits the practical use of branch andbound optimization to at most about 10 dimensions [15] but in the case of triangulationthe number of branching dimensions can be limited to a fixed number (at most the degreeof freedom of the geometric primitive). Hence, in the point case is it enough to branchon three dimensions, in the line case four and in the cases of conics nine dimensionsmaximally.

In the line case, we choose not to branch on the denominators. Instead the coordi-nates of the points where the line intersect with the planes z = 0 and z = 1 are usedfor the parameterization (4.1). This gives four dimensions to split at, independent of thenumber of images. It is also important to choose a coordinate system such that the nu-merical values of these parameters are kept at a reasonable magnitude. When the decisionof which rectangle to split is taken there are two choices to make. The first is on whichdimension to split and the second where to split in the chosen dimension. A reasonablepick of dimension is the one with the largest interval. The decision to split can be madein several different ways. One alternative is to make use of the bounding function. Thehypothesis is that the domain with the minimum bounding function is the best candidatefor the minimum of the objective function. To get as good estimation as possible closeto that point the splitting location can be chosen close to the minimum of the boundingfunction. Another way to split is to split the dimension into two equal parts. Both thesesplitting procedure can be shown to be convergent [1].

4.3 Initialization

It is necessary to have an initial domain Q0 for the branch and bound algorithm. Themethod used for this is similar in the point and conic case but different in the line casedue to the Plücker constraint.

Points and Conics

In order to get a bound on the denominators, we assume a bound on the maximal repro-jection error. Thus the bounds are constructed from a user given maximal reprojectionerror. The bounds on the denominators gi(x) can then be calculated by the followingoptimization problem,

for i = 1, . . . , p, min/max gi(x) (26)

subject tofj(x)gj(x)

≤ γ j = 1, . . . , p,

where γ is the user given bound on the reprojection error. This is a quadratic convexprogramming problem. In the experiments, γ is set to 3.

40

5. EXPERIMENTS

Lines

In the case of lines, the Plücker constraint makes things a bit more problematic. Insteadwe choose a more geometric way of getting bounds on the coordinates of the two pointsdefining the line.

For each image line l we first set the scale such that the norm of the first two coor-dinates equals one and then we make a translation to avoid the line to pass through theorigin. Two parallel lines (l1 and l2) are then constructed with γ pixels apart (one on eachside of l). Then, we make the hypothesis that the two points defining the optimal 3Dline (with our choice of coordinate systems) are located on the planes z = 0 and z = 1,respectively. Now, in order to find bounds on x1, x2, y1, y2, see equation (23), we needto solve the optimization problems,

max/min xi

subject to lT1iPiX ≤ 0 (27)

lT2iPiX ≥ 0,

where X = [x1, x2, 0, 1]T and the corresponding for yi with X substituted to Y =[y1, y2, 1, 1]T . Again, it is important to choose the coordinate system such that theplanes z = 0 and z = 1 are located appropriately. In addition, to avoid getting anunbounded feasible region, the maximum depth is limited to the order of the 3D pointfurthest away. In the experiments, we set γ to 5 pixels.

5 Experiments

The implementation is made in Matlab using a toolbox called SeDuMi [17] for the con-vex optimization steps.

The splitting of dimensions has been made by taking advantage of the informationwhere the minimum of the bounding function is located, as described in Section 4.2.

While testing the various cases, we have found that the relaxation performed in theline case - a combination of fractions and monomials - the bounds on the denominatorsis a critical factor for the speed of convergence. To increase the convergence speed, a localgradient descent step is performed on the computed solution in order to quickly obtain agood solution which can be employed to discard uninteresting domains.

Two public sets of real data2 were used for the experiments with points and lines.One of a model house with a circular motion and one of a corridor with a mostly forwardmoving camera motion. The model house has 10 frames and the corridor 11. In thesetwo sequences there were no conics. In Figure 4 samples of the data sets for points andlines are given. A third real data sequence was used for conic triangulation, see Figure 6.

2http://www.robots.ox.ac.uk/∼vgg/data.html

41

PAPER I

Figure 4: Image sets used for the experiments.

Points and lines were reconstructed and then the reprojection errors between differentmethods were compared. The other methods compared are bundle adjustment and alinear method [7]. The covariance structure for the lines was computed by fitting a lineto measured image points. In the reconstruction only the four first frames were used.In the house scene, there are 460 points and in the corridor 490. The optimum wasconsidered to be found if the gap between the global optimum and the under-estimatorwas less than 10 %. The results are presented in Tables 1 and 2.

Data set Optimal Bundle LinearMean Std Mean Std Mean Std

House 0.15 0.14 0.15 0.14 0.16 0.15Corridor 0.13 0.11 0.13 0.11 0.13 0.11

Table 1: Reprojection errors for points with three different methods on two data sets.

Data set Optimal Bundle LinearMean Std Mean Std Mean Std

House 1.40 0.92 1.41 0.93 1.62 1.03Corridor 3.42 4.29 3.30 4.34 4.02 5.45

Table 2: Reprojection errors for lines with three different methods on two data sets.

In the house scene, the termination criterion was reached already in the first iterationfor all points and for most of them the bounding functions was very close to the globalminimum (less than the 10 % required). In the corridor scene, the average number ofiterations were 3 and all minimas were reached within at most 23 iterations.

In the line case, the under-estimators works not as well as in the point case. This is due

42

5. EXPERIMENTS

(a) (b)

Figure 5: The result from reprojection of lines. The green dashed line is the original andthe red solid line is the reprojected. Image (a) is from the house scene and (b) is from thecorridor.

to the extra complexity of the Plücker coordinates. This makes it necessary to performmore iterations before it is possible to state that the global optimum is reached withina given certainty. In the house scene with the circular camera motion the breakpoint isreached within at most 120 iterations for all the tested 12 lines. However, for the corridorsequence with a weaker camera geometry (at least for triangulation purposes) it is noteven enough with 500 iterations for 6 of the tested 12 lines. Even if a lot of iterationsare needed to certify the global minimum, the location of the optimum in most cases isreached within less than a handful of iterations.

It can be seen in Tables 1 and 2 that both a linear method and bundle adjustmentwork fine for these problems. However, in some cases the bundle adjustment reprojectionerrors get higher than the errors for the optimal method. This shows that bundle adjust-ment (which is based on local gradient descent) sometimes gets stuck in a local minimum.The opposite also occurs in some cases: bundle adjustment gets better results than the op-timal method. The reason for this is twofold. The first reason is that a threshold is usedin the optimal method for the gap between the optimal value and the found value of theobjective function. This gap is fixed to 10% in our experiments which leaves a margin.The second reason is that this margin of error is not attained in all line experiments wherewe instead reach the maximum iteration number.

The result can also be seen in Figure 5 where two lines from each data set are comparedwith reprojected line.

5.1 Conics

For conics, some example images can be found in Figure 6 with marked image conics. Thecovariance structure was estimated by fitting a conic curve to measured image points. The

43

PAPER I

Figure 6: The images with the data used for conics.

corresponding 3D quadrics were first computed with the optimal and a linear method.The result of the reprojected conics from these two methods are imaged in Figure 7.

The number of iterations performed to reach the global minimum with a gap lessthen 5 % of the bounding function for the three conics were 3, 6 and 14. As can be seenfrom the images, the quadrics in the data set are planar and hence the condition numberof the corresponding 4× 4 matrix should be zero. For the three estimated quadrics withthe optimal method, the condition numbers are 1.2 · 10−3, 3.7 · 10−7 and 8.8 · 10−6.This can be compared with the result for the linear estimate with condition numbers of3.7 · 10−4, 4.1 · 10−5 and 1.1 · 10−4.

The result of the optimal method was also compared with bundle adjustment. In thelocal optimization the result of the linear method were used as initialization. The normof the residual from the three estimated quadrics are shown in Table 3. As can be seenthere the result of the optimal method and bundle adjustment is identical. This showsthat bundle adjustment reached the global optimum in all the experiments.

Conic # Optimal Bundle Linear1 39.5 39.5 11902 5.00 5.00 1603 15.3 15.3 2660

Table 3: The norm of the residuals when quadrics were reconstructed. It is obviousthat the linear method isn’t good enough and that bundle adjustment reaches the globaloptimum.

Figure 7 (a) shows the reprojected conic compared with the original for one of theconics. The fitting is very good and it is obvious from Figure 7 (b) that the linear resultis far from acceptable. In part (c) of Figure 7 a comparison of the optimal method andbundle adjustment is shown. As can be seen there the result is almost identical.

44

6. DISCUSSION

(a) (b) (c)

Figure 7: The result of the reprojected conics of the data set in Figure 6. In image (a)a part of the reconstruction with optimal method is viewed. The light green line is thereprojection and the dark red the original conic. In (b) the red lines are the reprojectionafter linear method and the white when the optimal method were used. In (c) the imagein (b) is zoomed in. The added blue dashed line is the result from bundle adjustmentinitialized with the result from the linear method (red), as can be seen this line almosstperfectly coincide with the result from the optimal method.

6 Discussion

A unified treatment of the triangulation problem has been described using covariancepropagation. In addition to traditional local algorithms and algorithms based on algebraicobjective functions, globally optimal algorithms have been presented for the triangulationof points, lines and conics. For most cases, local methods work fine and they are generallyfaster in performance. However, none of the competing methods have a guarantee ofglobality.

The performed experiments show that bundle adjustment works well. This conclu-sion may come as no surprise. It has already been observed in the two-view case forpoints [6]. Now it is shown that this is true for any number of views for point features.Perhaps the main contribution of this paper is to serve as a benchmarking algorithm ofother algorithms since it gives a way to evaluate the performance of other methods.

A future line of research is to include more constraints in the estimation process, forexample, planar quadric constraints. This opens up the possibility to perform optimalauto-calibration using the image absolute conic. An other line of research is to improvecomputational efficiency for global optimal triangulation problems.

7 Acknowledgments

This work has been funded by the Swedish Research Council (grants no. 2004-4579, no.2005-3230 and no. 2007-6476), by the Swedish Foundation for Strategic Research (SSF)through the programme Future Research Leaders, and by the European Research Council(GlobalVision grant no. 209480).

45

PAPER I

46

Bibliography

[1] Harold P. Benson. Using concave envelopes to globally solve the nonlinear sum ofratios problem. Journal of Global Optimization, 22:343–364, 2002.

[2] Martin Byröd, Klas Josephson, and Kalle Åström. Improving numerical accuracy ofgröbner basis polynomial equation solvers. In International Conference on ComputerVision, 2007.

[3] R. W. Freund and F. Jarre. Solving the sum-of-ratios problem by an interior-pointmethod. J. Glob. Opt., 19(1):83–102, 2001.

[4] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey VisionConference, pages 147–151, 1988.

[5] R. Hartley and F. Kahl. Optimal algorithms in multiview geometry. In Asian Conf.Computer Vision, Tokyo, Japan, 2007.

[6] R. Hartley and P. Sturm. Triangulation. Computer Vision and Image Understanding,68(2):146–157, 1997.


[8] Stephan Heuel. Uncertain Projective Geometry: Statistical Reasoning for PolyhedralObject Reconstruction, volume 3008 of Lecture Notes in Computer Science. Springer,2004.

[9] F. Kahl, S. Agarwal, M.K. Chandraker, D.J. Kriegman, and S. Belongie. Practi-cal global optimization for multiview geometry. International Journal of ComputerVision, 2007.

[10] F. Kahl and R. Hartley. Multiple view geometry under the L∞-norm. IEEE Trans.Pattern Analysis and Machine Intelligence, 2007. To appear.

[11] F. Kahl and D. Henrion. Globally optimal estimates for geometric reconstructionproblems. Int. Journal Computer Vision, 74(1):3–15, 2007.

[12] K. Kanatani. Statistical Optimization for Geometric Computation: Theory and Prac-tice. Elsevier Science, Amsterdam, The Netherlands, 1996.

47

PAPER I

[13] J.C. McGlone, E.M. Mikhail, J. Bethel, and R. Muellen. Manual of Photogramme-try. American Society of Photogrammetry and Remote Sensing , Falls Church, VA,2004.

[14] Hong Seo Ryoo and Nikolaos V. Sahinidis. Analysis of bounds for multilinearfunctions. Journal of Global Optimization, 19(4):403–424, 2001.

[15] Siegfried Schaible and Jianming Shi. Fractional programming: the sum-of-ratioscase. Optimization Methods and Software, 18:219–229, 2003.

[16] H. Stewénius, F. Schaffalitzky, and D. Nistér. How hard is three-view triangulationreally? In Int. Conf. Computer Vision, pages 686–693, Beijing, China, 2005.

[17] J.F. Sturm. Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetriccones. Optimization Methods and Software, 11-12:625–653, 1999.

[18] Mohit Tawarmalani and Nikolaos V. Sahinidis. Semidefinite relaxations of frac-tional programs via novel convexification techniques. Journal of Global Optimiza-tion, 20:133–154, 2001.

[19] Bill Triggs. Detecting keypoints with stable position, orientation, and scale underillumination changes. In European Conf. Computer Vision, volume 4, pages 100–113, Prague, Czech, 2004.

48

PAPER II

Manuscript for International Journal of Computer Vision , 2008.

Fast and Stable Polynomial EquationSolving for Computer Vision

Martin Byröd, Klas Josephson and Kalle Åström

Abstract

This paper presents several new results on techniques for solving polynomial problems in com-puter vision. Gröbner basis techniques for equation solving have been applied successfully toseveral geometric computer vision problems. However, in many cases these methods are plaguedby numerical problems. In this paper we show that an extension of current state of the artby not only considering reduction based on a Gröbner basis can yield dramatic improvementsin numerical stability. Furthermore, we show how the action matrix can be computed in thegeneral setting of an arbitrary linear basis for C[x]/I . In particular, two improvements onthe stabality of the computations are made by studying how the linear basis for C[x]/I shouldbe selected. The first of these strategies utilizes QR-factorization with column pivoting and thesecond is based on singular value decomposition (SVD). Moreover, it is shown how to improvestability further by an adaptive scheme for truncation of the Gröbner basis. These new tech-niques are studied on some of the latest reported uses of Gröbner basis methods in computervision and we demonstrate dramatically improved numerical stability making it possible tosolve a larger class of problems than previously possible.

1 Introduction

Numerous geometric problems in computer vision involve the solution of systems ofpolynomial equations. This is particularily true for so called minimal structure and mo-tion problems, e.g . [5, 22, 35]. Solutions to minimal structure and motion problems canoften be used in RANSAC algorithms to find inliers in noisy data [10, 36, 37]. For suchapplications one needs to solve a large number of minimal structure and motion problemsas fast as possible in order to find the best set of inliers. There is thus a need for fast andnumerically stable algorithms for solving particular systems of polynomial equations.

Another area of recent interest is global optimization used for e.g . optimal triangula-tion, resectioning and fundamental matrix estimation. Global optimization is a promis-ing, but difficult pursuit and different lines of attack have been tried, e.g . branch and

51

PAPER II

bound [1], L∞ norm methods [14, 19] and methods using linear matrix inequalities(LMIs) [20].

An alternative way to find the global optimum is to calculate stationary points directly(usually by solving some polynomial equation system) [15, 34]. So far, this has beenan approach of limited applicability since calculation of stationary points is numericallydifficult for larger problems. By using the methods presented in this paper it should bepossible to handle a somewhat larger class of problems, thus offering an alternative tothe above mentioned optimization methods. An example of this is optimal three viewtriangulation which has previously not been solved in a practical way [34]. We showthat this problem can now be solved in a reasonably efficient way with an algorithmimplemented in standard IEEE double precision.

Traditionally, researchers have hand-coded elimination schemes in order to solve sys-tems of polynomial equations. Recently, however, new techniques based on algebraicgeometry and numerical linear algebra have been used to find all solutions, cf . [30]. Theoutline of such algorithms is that one first studies a specific geometric problem and findsout what structure the Gröbner basis the ideal I has for that problem, how many solutionsthere are and what the degrees of monomials occurring in the Gröbner basis elements are.For each instance of the problem with numerical data, the process of forming the Gröb-ner basis follows the same steps and the solution to the problem can be written down asa sequence of pre determined elimination steps using numerical linear algebra.

Currently, the limiting factor in using these methods for larger and more difficultcases is numerical problems. For example in [34] it was necessary to use emulated 128bit numerics to make the system work, which made the implementation very slow. Thispaper improves on the state of the art of these techniques making it possible to handlelarger and more difficult problems in a practical way.

In the paper we pin-point the main source of these numerical problems (the condi-tioning of a crucial elimination step) and propose a range of techniques for dealing withthis issue. The main novelty is a new approach to the action matrix method for equationsolving, relaxing the need of adhering to a properly defined monomial order and a com-plete Gröbner basis. This unlocks substantial freedom, which in this paper is used in anumber of different ways to improve stability.

Firstly, we show how the sensitive elimination step can be avoided completely, byusing an overly large/redundant (linearly dependent) basis for C[x]/I to construct theaction matrix. This method yields the right solutions along with a set of false solutionsthat can then easily be filtered out by evaluation in the original equations.

Secondly, we show how a change of basis in the quotient space C[x]/I can be usedto improve the numerical precision of the Gröbner basis computations. This approachcan be seen as an attempt at finding an optimal reordering or even linear combination ofthe monomials and we investigate what conditions such a reordering/linear combinationneeds to satisfy. We develop the tools needed to compute the action matrix in a generallinear basis for C[x]/I and propose two strategies for selecting a basis which enhances the

52

1. INTRODUCTION

stability of the solution procedure.The first of these is a fast strategy based on QR-factorization with column pivoting.

The Gröbner basis like computations employed to solve a system of polynomial equa-tions can essentially be seen as matrix factorization of an under-determined linear system.Based on this insight, we combine the robust method of QR factorization from numer-ical linear algebra with the Gröbner basis theory needed to solve polynomial equations.More precisely, we employ QR-factorization with column pivoting in the above men-tioned elimination step and obtain a simultaneous selection of linear basis and triangularfactorization.

Factorization with column pivoting is a very well studied technique and there existhighly optimized and reliable implementations of these algorithms in e.g . LAPACK [25],which makes this technique accessible and relatively straight forward to implement.

The second technique for basis selection goes one step further and employs singularvalue decomposition (SVD) to select a general linear basis of polynomials for C[x]/I .This technique is computationally more demanding than the QR-method, but yieldseven better stability.

Finally, we show how a redundant linear basis for C[x]/I can be combined withthe above basis selection techniques. In the QR-method, since the pivot elements aresorted in descending order, we get an adaptive criterion for where to truncate the Gröbnerbasis like structure by setting a maximal threshold for the quotient between the largestand the smallest pivot element. When the quotient exceeds this threshold we abort theelimination and move the remaining columns into the basis. This way, we expand thebasis only when necessary.

The paper is organized as follows. We give an overview of the classical theory of alge-braic geometry underlying the ideas presented in this paper in Section 2 after a brief dis-cussion of related techniques for polynomial equation solving in Section 1.1. Thereafter,in Section 3.1, we present the theoretical underpinnings of the new numerical techniquesintroduced in Sections 4, 5, 5.3 and 6. In Section 7 we evaluate the speed and numeri-cal stability of the proposed techniques on a range of typical geometric computer visionproblems and finally we give some concluding remarks.

1.1 Related Work

The area of polynomial equation solving is currently very active and only few of theavailable methods have yet found their way into the computer vision community. Seee.g . [4] and references therein for a comprehensive exposition of the state of the art in thisfield.

One of the oldest and still used methods for non-linear equation solving is the Newton-Raphson method which essentially works by gradient descent. Newton-Raphson is fastand easy to implement, but depends heavily on initialisation and finds only a single zerofor each initialisation. In the univariate case, a numerically sound procedure to find the

53

PAPER II

complete set of roots is to compute the eigenvalues of the companion matrix. However,if only real solutions are needed, the fastest way is probably to use Sturm sequences [17].

In several variables, a first method is to use resultants [6], which using a determinantconstruct enables the successive elimination of variables. However, the resultant growsexponentially in the number of variables and is in most cases not practical for more thantwo variables. An alternative way of eliminating variables is to compute a lexicographicalGröbner basis for the ideal generated by the equations which can be shown to containa univariate polynomial representing the solutions [6]. This approach is however oftennumerically unstable.

A radically different approach is provided by homotopy continuation methods [38].These methods typically work in conjunction with mixed volume calculations by con-structing a simple polynomial system with the same number of zeros as the actual systemthat is to be solved. The simple system with known zeros is then continously deformedinto the actual system. The main drawback of these methods is the computational com-plexity with computation time ranging in seconds or more.

At present, the best methods for geometric computer vision problems are based oneigendecomposition of a certain matrices (action matrices) representing multiplication inthe quotient space C[x]/I . The action matrix can be seen as a direct generalization of thecompanion matrix in the univariate case. The factors that make this approach attractiveis that it (i) is fast and numerically feasible, (ii) handles more than two variables andreasonably high degrees (up to around 10) and (iii) is well suited to tuning for specificapplications. To the authors best knowledge, this method was first used in the contextof computer vision by Stewenius et al . [30] eventhough Gröbner basis methods werementioned in [16].

The work presented in this paper is based on preliminary results presented in [2, 3]and essentially develops the action matrix method further to resolve numerical issuesarising in the construction of the action matrix. Using the methods presented here, it isnow possible to solve a larger class of problems than previously possible.

2 Review of Algebraic Geometry for Equation Solving

In this section we review some of the classical theory of multivariate polynomials. Weconsider the following problem

Problem 1. Given a set of m polynomials fi(x) in s variables x = (x1, . . . , xs), deter-mine the complete set of solutions to

f1(x) = 0...

fm(x) = 0.

(1)

54

2. REVIEW OF ALGEBRAIC GEOMETRY FOR EQUATION SOLVING

We denote by V the zero set of (1). In general V need not be finite, but in this paper wewill only consider zero dimensional V , i.e. V is a point set.

The general field of study of multivariate polynomials is algebraic geometry. See [6]for a nice introduction to the field and for proofs of all claims made in this section. Inthe language of algebraic geometry, V is an affine algebraic variety and the polynomialsfi generate an ideal I = Σihi(x)fi(x), where hi ∈ C[x] are any polynomials and C[x]denotes the set of all polynomials in x over the complex numbers.

The motivation for studying the ideal I is that it is a generalization of the set ofequations (1). A point x is a zero of (1) iff it is a zero of I . Being even more general, wecould ask for the complete set of polynomials which vanish on V . If I is equal to this set,then I is called a radical ideal.

We say that two polynomials f, g are equivalent modulo I iff f − g ∈ I and denotethis by f ∼ g. With this definition we get the quotient space C[x]/I of all equivalenceclasses modulo I and let [·] denote the natural projection C[x]→ C[x]/I , i.e. by [fi] wemean the set {gi : fi − gi ∈ I} of polynomials equivalent to fi modulo I .

A related structure is C[V ], the set of equivalence classes of polynomial functions onV . We say that a function F is polynomial on V if there is a polynomial f such thatF (x) = f(x) for x ∈ V and equivalence here means equality on V . If two polynomialsare equivalent modulo I , then they are obviously also equal on V . If I is radical, thenconversely two polynomials which are equal on V must also be equivalent modulo I .This means that for radical ideals, C[x]/I and C[V ] are isomorphic. Moreover, if V is apoint set, then clearly C[V ] is isomorphic to Cr, where r = |V |.

2.1 The Action Matrix

Turning to equation solving, our starting point is the companion matrix which arises forpolynomials in one variable. For a third degree polynomial

p(x) = x3 + a2x2 + a1x+ a0, (2)

the companion matrix is −a2 1 0−a1 0 1−a0 0 0

. (3)

The eigenvalues of the companion matrix are the zeros of p(x) and for high degree poly-nomials, this provides a numerically stable way of calculating the roots.

With some care, this technique can be extended to the multivariate case as well,which was first done by Lazard in 1981 [26]. For V finite, the space C[x]/I is fi-nite dimensional. Moreover, if I is radical, then the dimension of C[x]/I is equal to|V |, i.e. the number of solutions [6]. For some p ∈ C[x] consider now the operationTp : f(x) → p(x)f(x). The operator Tp is linear and since C[x]/I is finite dimen-sional, we can select a linear basis B of polynomials for C[x]/I and represent Tp as a

55

PAPER II

matrix mp. This matrix is known as the action matrix and is precisely the generalizationof the companion matrix we are looking for. The eigenvalues of mp are p(x) evaluatedat the points of V . Moreover, the eigenvectors of mT

p equals the vector of basis elementsevaluated on V . Briefly, this can be understood as follows: Consider an arbitrary poly-nomial p(x) = cTb, where c is a vector of coefficients and b is a vector of polynomialsforming a basis of C[x]/I . We then have

[p · cTb] = [(mpc)Tb] = [cTmTp b]. (4)

This holds for any coefficient vector c and hence it follows that [pb] = [mTp b], which

can be written pb = mTp b + g for some vector g with components gi ∈ I . Evaluating

the expression at a zero x ∈ V we get g(x) = 0 and thus obtain

p(x)b(x) = mTp b(x), (5)

which we recognize as an eigenvalue problem on the matrix mTp with eigenvectors b(x).

In other words, the eigenvectors of mTp yield b(x) evaluated at the zeros of I and the

eigenvalues give p(x) at the zeros. The conclusion we can draw from this is that zerosof I corresponds to eigenvectors and eigenvalues of mp, but not necessarily the opposite,i.e. there can be eigenvectors/eigenvalues that do not correspond to actual solutions. If Iis radical, this is not the case and we have an exact correspondence.

2.2 Gröbner Bases

We have seen theoretically that the action matrix mp provides the solutions to (1). Themain issue is now how to compute mp. This is done by selecting a basis B for C[x]/Iand then computing [p · bi] for each bi ∈ B. To do actual computations in C[x]/I weneed to represent each equivalence class [f ] by a well defined representative polynomial.The idea is to use multivariate polynomial division and represent [f ] by the remainderunder division of f by I . Fortunately, for any polynomial ideal I , this can always bedone and the tool for doing so is a Gröbner basis G for I [6]. The Gröbner basis for Iis a canonical set of generators for I with the property that multivariate division by G,

denoted fG

, always yields a well defined remainder. By well defined we mean that for any

f1, f2 ∈ [f ], we have f1G

= f2G

. The Gröbner basis is computed relative a monomialorder and will be different for different monomial orders. As a consequence, the set ofrepresentatives for C[x]/I will be different, whereas the space itself remains the same.

The linear basis B should consist of elements bi such that the elements {[bi]}ri=1

together span C[x]/I and biG

= bi. Then all we have to do to get is mp is to compute

the action pbiG

for each basis element bi, which is easily done if G is available.

56

2. REVIEW OF ALGEBRAIC GEOMETRY FOR EQUATION SOLVING

Example 2. The following two equations describe the intersection of a line and a circle

x2 + y2 − 1 = 0x− y = 0. (6)

A Gröbner basis for this system is

y2 − 12 = 0

x− y = 0, (7)

from which we trivially see that the solutions are 1√2(1, 1) and 1√

2(−1,−1). In this

case B = {y, 1} are representatives for a basis for C[x]/I and we have Tx[1] = [x] andTx[y] = [xy] = [y2] = [1

2 ], which yields the action matrix

mx =[

0 112 0

], (8)

with eigenvalues 1√2,− 1√

2. ut

2.3 A Note on Algebraic and Linear Bases

At this point there is a potentially confusing situation since there are two different typesof bases at play. There is the linear basis B of the quotient space C[x]/I and there is thealgebraic basis (Gröbner basis) G of the ideal I . To make the subsequent arguments astransparent as possible for the reader we will emphasize this fact by referring to the formeras a linear basis and the latter as an algebraic basis.

2.4 Floating Point Gröbner Basis Computations

The well established Buchberger’s algorithm is guaranteed to compute a Gröbner basis infinite time and works well in exact arithmetic [6]. However, due to round-off errors, iteasily becomes unstable in floating point arithmetic and except for very small examples itbecomes practically useless. The reason for this is that in the Gröbner basis computation,leading terms are successively eliminated from the generators of I by pairwise subtractionof polynomials, much like gaussian elimination. This leads to cancellation effects whereit becomes impossible to tell whether a certain coefficient should be zero or not.

A technique introduced by Faugere et al . in [9] is to write the system of equations (1)on matrix form

CX = 0, (9)

where X =[xα1 . . . xαn

]Tis a vector of monomials with the notation xαk =

xαk11 · · ·xαks

s and C is a matrix of coefficients. Elimination of leading terms now trans-lates to matrix operations and we then have access to a whole battery of techniques from

57

PAPER II

numerical linear algebra allowing us to perform many eliminations at the same time withcontrol on pivoting etc.

This technique takes us further, but for larger more demanding problems it is neces-sary to study a particular class of equations (e.g . relative orientation for omnidirectionalcameras [11], fundamental matrix estimation with radial distortion [23], optimal threeview triangulation [34], etc.) and use knowledge of what the structure of the Gröbnerbasis should be to design a special purpose Gröbner basis solver [30]. The typical workflow has been to study the particular problem at hand with the aid of a computer algebrasystem such as Maple or Macaulay2 and extract information such as the leading terms ofthe Gröbner basis, the monomials to use as a basis for C[x]/I , the number of solutions,etc. and work out a specific set of larger (gauss-jordan) elimination steps leading to theconstruction of a Gröbner basis for I .

Although, these techniques have permitted the solution to a large number of previ-ously unsolved problems, many difficulties remain. Most notably, the above mentionedelimination steps (if at all doable) are often hopelessly ill conditioned [34, 24]. This is inpart due to the fact that one has focused on computing a complete and correct Gröbnerbasis respecting a properly defined monomial order, which we show is not necessary.

In this paper we move away from the goal of computing a Gröbner basis for I andfocus on finding a representative of f in terms of a linear combination of a basis B, sincethis is the core of constructing mp. We denote this operation f for a given f ∈ C[x] andspecifically note that we only need to compute f for f ∈ R. It should however be notedthat the computations we do much resemble those necessary to get a Gröbner basis.

A first advantage of not having to compute a Gröbner basis is that we can replace thegauss-jordan elimination by standard linear equation solving techniques which are bothfaster and numerically more sound. Furthermore we are not bound by any particularmonomial order which as we will see, when used right, buys considerable numericalstability.

Drawing on these observations, we investigate in detail the exact matrix operationsneeded to compute f and thus obtain a procedure which is both faster and more stable,enabling the solution of a larger class of problems than previously possible.

3 A New Approach to the Action Matrix Method

In this section we present a new way of looking at the action matrix method for poly-nomial equation solving. The advantage of the new formulation is that it yields morefreedom in how the action matrix is computed. We start with a few examples that we willuse to clarify these ideas.

Example 3. In the five point relative orientation problem for calibrated cameras, cf . [22,7, 27, 31], the calculation of the essential matrix using 5 image point correspondencesleads to 10 equations of degree 3 in 3 unknowns. These equations involve 20 monomials.

58

3. A NEW APPROACH TO THE ACTION MATRIX METHOD

By writing the equations as in (9) and using a total degree ordering of the monomials weget a coefficient matrix C of size 10 × 20 and a monomial vector X = [xα1 . . .xαn ]T

with 20 monomials. It turns out that the first 10 × 10 block C1 of C =[C1 C2

]is

in general of full rank and thus the first 10 monomials X1 can be expressed in terms ofthe last 10 monomials X2 as

X1 = −C−11 C2X2. (10)

This makes it possible to regard the monomials in X2 as representatives of a linear basisfor C[x]/I . It is now straightforwad to calculate the action matrix for Tx (the multi-plication operator for multiplication by x) since monomials in the linear basis are eithermapped to monomials in the basis or to monomials in X1, which can be expressed interms of the basis using (10). ut

In this example the linear basis X2 can be thought of as a basis for the space ofremainders after division with a Gröbner basis for one choice of monomial order andthis is how these computations have typically been viewed. However, the calculationsabove are not really dependent on any properly defined monomial order and it seems thatthey should be meaningful irrespective of whether a true monomial order is used or not.Moreover, we do not use all the Gröbner basis properties.

Based on these observations we emphasize two important facts: (i) We are not in-terested in finding the Gröbner basis or a basis for the remainder space relative to someGröbner basis per se; it is enough to get a well defined mapping f and (ii) it suffices tocalculate f on the elements x · xαi , i.e. we do not need to be able to compute f for allf ∈ C[x]. These statements and their implications will be made more precise further on.

Example 4. Consider the equations

f1 = xy + x− y − 1 = 0f2 = xy − x+ y − 1 = 0, (11)

with solutions (−1,−1), (1, 1). Now let B = {x, y, 1} be a set of representatives for theequivalence classes in C[x]/I for this system. The set B does not constitute a proper basisfor C[x]/I since the elements of B represent linearly dependent equivalence classes. Theydo however span C[x]/I . Now study the operator Ty acting on B. We have Ty(1) = y,Ty(x) = xy ∼ x−y+1 and Ty(y) = y2 ∼ xy ∼ x−y+1 which gives a multiplicationmatrix 1 1 0

−1 −1 11 1 0

.An eigendecomposition of this matrix yields the solutions (−1,−1), (1, 1), (−1, 0). Ofthese the first two are true solutions to the problem, whereas the last one does not satisfythe equations and is thus a false zero. ut

59

PAPER II

In this example we used a set of monomials B whos corresponding equivalence classesspanned C[x]/I , but were not linearly independent. However, it was still possible toexpress the image Ty(B) in terms of B. The elements of the resulting action matrix arenot uniquely determined. Nevertheless we were able to use it to find the solutions tothe problem. In this section we give general conditions for when a set B can be used toconstruct a multiplication matrix which produces the desired set of zeros, possibly alongwith a set of false zeros, which need to be filtered out.

More generally this also means that the chosen representatives of the linear basis ofC[x]/I need not be low order monomials given by a Gröbner basis. In fact, they neednot be monomials at all, but could be general polynomials.

Drawing on the concepts illustrated in the above two examples we define a solvingbasis, similar to B in Example 4. The overall purpose of the definition is to rid our selvesof the need of talking about a Gröbner basis and properly defined monomial orders,thus providing more room to derive numerically stable algorithms for computation of theaction matrix and similar objects.

In the following we will also provide techniques for determining if a candidate basis Bconstitutes a solving basis and we will give numerically stable techniques for basis selectionin too large (linearly dependent) solving bases, here referred to as redundant bases.

3.1 Solving Bases

We start off with a set of polynomial equations as in (1) and a (point) set of zerosV (f1, . . . , fm) and make the following definition

Definition 5. Consider a finite subset B ⊂ C[x] of the set of polynomials over the com-plex numbers. If for each bi ∈ B and some p ∈ C[x] we can write

p(x)bi(x) = Σjmijbj(x) (12)

for some (not necessarily unique) coefficients mij and where equality means equality onV , then we call B a solving basis for (1) w.r.t p. ut

We now get the following for the matrix mp made up of the coefficients mij .

Theorem 6. Given a solving basis B for (1) w.r.t p, the evaluation of p on V is an eigenvalueof the matrix mp. Moreover, the vector b = (b1, . . . , br)T evaluated on V is an eigenvectorof mp.

Proof. By the definition of mp, we get

p(x)b(x) =

pb1...fbr

=

Σjm1jbj...

Σjmrjbj

= mpb(x) (13)

for x ∈ V . ut

60


As will become clear further on, when B is a true basis for C[x]/I , then the matrixmp defined here is simply the transposed action matrix for multiplication by p.

Given a solving basis, the natural question to ask is now under which circumstancesall solutions to (1) can be obtained from an eigenvalue decomposition of mp. We nextexplore some conditions under which this is possible. A starting point is the followingdefinition

Definition 7. A solving basis B is called a complete solving basis if the inverse image of themapping x→ b(x) from variables to monomial vector is finite for all points. ut

A complete solving basis allows us to recover all solutions from mp as shown in thefollowing theorem.

Theorem 8. Let B be a complete solving basis for (1) and mp as above and assume that forall eigenvalues λi we have λi 6= λj for i 6= j. Then the complete set of solutions to (1) canbe obtained from the set of eigenvectors {vi} of mp.

Proof. The vector b(x) evaluated on V is an eigenvector of mp. The number of eigen-vectors and eigenvalues of mp is finite so we can compute all eigenvectors {vi} and get{b(x)} for all x ∈ V as a subset of these. Applying b−1 to vi for all i thus yields a finiteset of points containing V . Evaluation in (1) allows us to filter out the points of this setwhich are not solutions to (1). ut

If on the other hand the inverse image is not finite for some vi so that we get a pa-rameter family x corresponding to this eigenvector, then the correct solution can typicallynot be obtained without further use of the equations (1) as illustrated in the followingexample.

Example 9. Consider the polynomial system

y2 − 2 = 0x2 − 1 = 0 (14)

with V = {(1,√2), (−1,√

2), (1,√

2), (−1,−√2), }. Clearly, B = {x, 1} with

monomial vector b(x, y) =[x 1

]T, is a solving basis w.r.t x for this example since

1 · x = x and x · x = x2 = 1 on V . Hence, b(x, y) evaluated on V is an eigenvector of

mx =[0 11 0

], (15)

which is easily confirmed. However, these eigenvectors do not provide any informationabout the y-coordinate of the solutions. We could try adding y to B but this would notwork since the values of xy on V cannot be expressed as a linear combination of x and yevaluated on V . A better choice of solving basis would be B = {xy, x, y, 1}. ut

61

PAPER II

At a first glance, Theorem 8 might not seem very useful since solving for x fromb(x) = vi potentially involves solving a new system of polynomial equations. How-ever, it provides a tool for ruling out choices of B which are not practical to work with.Moreover, there is usually much freedom in the choice of B. In general, B is a set ofpolynomials. However, it is often practical work with a basis of monomials. For each biwe then have bi(x) = xαi1

1 · · ·xαiss and get the following result

Corollary 10. If B consists of monomials and the r× s matrix A with Aij = αij is of ranks, then all solutions to (1) are obtained from the eigenvectors of mxk

.

Proof. Taking the logarithm of bj(x) we get component wise

log(bi(x)) = Σjαij log(xj), (16)

where xj = ±xj if necessary. Using the matrix A, this can be written

log(b(x)) = A

log(x1)...

log(xs)

. (17)

If rank(A) = s then in particular A is full rank and we can solve linearly for log(x).Theorem 8 then yields the conclusion. ut

We get an even more convenient situation if the right monomials are included in B:

Corollary 11. If {x1, . . . , xs} ⊂ B, then all solutions to (1), can be directly read off fromthe eigenvectors of mxk

.

Proof. Since the monomials {x1, . . . , xs} occur in B, they enter in the vector b(x) andhence the mapping in Definition 7 is injective with a trivial inverse. ut

The situation in Corollary 11 is certainly the most convenient one. However, evenif not all variables are included as elements in B, we can often still express each variablexk as a linear combination of the basis elements bi(x) for x ∈ V by making use ofthe original equations. We thus again obtain a well defined inverse to the mapping inDefinition 7.

Example 12. Consider the polynomial system (11) from Example 4. Subtracting f1 andf2 and dividing by 2 we get a third polynomial f3 = x−y. Thus B = {y, 1} constitutesa solving basis w.r.t. x since Tx(1) = x = y (on V ) and Tx(y) = xy = x− y + 1 = 1(on V ). The vector of monomials b(x, y) =

[y 1

]Tis not invertible since it does not

give any information about the x coordinate. However, we can use f3 = x − y = 0 toget the solutions from from the eigenvectors. ut

62


Finally, we show how the concept of solving basis connects to the standard theory ofaction matrices in the quotient space C[x]/I .

Theorem 13. If the ideal I generated by (1) is radical, then a complete solving basis B w.r.tto p for (1) with the property that all eigenvalues of mp are distinct spans C[x]/I .

Proof. Since I is radical, C[x]/I is isomorphic to C[V ], the ring of all polynomial func-tions on V . Moreover, since V is finite, all functions on V are polynomial and henceC[V ] can be identified with Cr, where r = |V |. Consider now the matrix B =[b(x1), . . . ,b(xr)

]. Each row of B corresponds to a (polynomial) function on V .

Hence, if we can show that B has row rank r, then we are done. Due to theorem 6,all b(xi) are eigenvectors of mp corresponding to eigenvalues p(xi). By the assumptionof distinct eigenvalues we have p(xi) 6= p(xj) whenever b(xi) 6= b(xj). Since B isa complete solving basis we have b(xi) 6= b(xj) whenever xi 6= xj . This means thatthe r points in V correspond to distinct eigenvalues and hence, since eigenvectors cor-responding to different eigenvalues are linearly independent, B has column rank r. Forany matrix row rank equals column rank and we are done. utThe above theorem provides an obvious correspondence between solving bases and linearbases for C[x]/I and in principle states that under some extra restrictions, a solving basisis simply a certain choice of basis for C[x]/I and then the matrix mp turns into the(transposed) action matrix.

However, relaxing these restrictions we get something which is not necessarily a basisfor C[x]/I in the usual sense, but still provides a method for solving (1). More specifically,using the concept of a solving basis provides two distinctive advantages.

(i) For a radical polynomial system with r zeros, C[x]/I is r-dimensional, so a basisfor C[x]/I contains r elements. This need not be the case for a solving basis, whichcould well contain more than r elements, but due to Theorem 8 still provides the rightsolutions. This fact is exploited in Section 4.

(ii) Typically, the artithmetic in C[x]/I has been computed using a Gröbner basis forI , which directly provides a monomial basis for C[x]/I in form of the set of monomialswhich are not divisible by the Gröbner basis. In this work we move focus from Gröbnerbasis computation to the actual goal of expressing the products pbi in terms of a set oflinear basis elements and thus no longer need to adhere to the overly strict ordering rulesimposed by a particular monomial order. This freedom is exploited in Sections 5.1 and5.2.

Finally, (i) and (ii) are combined in Section 5.3.

3.2 Solving Basis Computations using Numerical Linear Algebra

We now describe the most straight forward technique for deciding whether a candidatebasis B w.r.t. one of the variables xk, can be used as a solving basis and simultaneouslycalculate the action of Txk

on the elements of B.

63

PAPER II

We start by generating more equations by multiplying the original set of equations bya hand crafted (problem dependent) set of monomials. This yields additional equations,which are equivalent in terms of solutions, but hopefully linearly independent from theoriginal ones. In Example 9, we could multiply by e.g . {x, y, 1}, yielding xy2−2x, x3−x, y3 − 2y, x2y − y, y2 − 2, x2 − 1.

Given a candidate for a linear basis B of monomials one then partitions the set ofall monomialsM ocurring in the equations in to three partsM = E ⋃R⋃B, whereR = xkB \ B is the set of monomials that need to be expressed in terms of B to satisfythe definition of a solving basis and E = M \ (R⋃B) are the remaining (excessive)monomials. Each column in the coefficent matrix represents a monomial, so we reorderthe columns and write

C =[CE CR CB

], (18)

reflecting the above partition. The E-monomials are not of interest to us so we eliminatethem by putting CE on row echelon form using LU factorization

[UE1 CR1 CB1

0 CR2 CB2

]XEXRXB

= 0. (19)

We now discard the top rows and provided that enough linearly independent equa-tions were added in the first step so that CR2 is of full rank, we multiply by C−1

R2 fromthe left producing [

I C−1R2CB2

] [XRXB

]= 0, (20)

or equivalently

XR = −C−1R2CB2XB, (21)

which means that theR-monomials can be expressed as a linear combination of the basismonomials. Thus B is a solving basis and the matrix mxk

can easily be constructed as in(12). In other words, given an enlarged set of equations and a choice of linear basis B,the full rank of CR2 is sufficient to solve (1) via eigendecomposition of mxk

. The abovemethod is summarised in Algorithm 1 and given the results of Section 3.1 we now havethe following

Result 14. Algorithm 1 yields the complete set of zeros of a polynomial system, given that thepre- and postconditions are satisfied.

Proof. The postcondition that CR2 is of full rank ensures that B is a solving basis andtogether with the preconditions, Theorem 8 and Corollary 11 then guarantees the state-ment. ut

64


Example 15. Consider the equations from Example 2. Multiplying the second equationby x and y yields the enlarged system

1 0 1 0 0 11 −1 0 0 0 00 1 −1 0 0 00 0 0 1 −1 0

x2

xyy2

xy1

= 0, (22)

withM = {x2, xy, y2, x, y, 1} and since we chose B = {y, 1}, we get R = {xy, x}and E = {x2, y2}. After Step 11 and 12 of Algorithm 1 we have CR2 = [ 2 0

0 1 ] andCB2 =

[0 1−1 0

]and inserting into (33) we obtain[

xyx

]=[0 − 1

21 0

] [y1

], (23)

which then allows us to construct mx for this example. ut

Algorithm 1 Compute a solving basis w.r.t. xk and use it to solve a polynomial system.

Require: List of equations F = {f1, . . . , fm}, set of basis monomials B containing thecoordinate variables x1, . . . xs, m lists of monomials {Li}mi=1.

Ensure: CR2 is of full rank, eigenvalues of mxkare distinct.

1: Fext ← F2: for all fi ∈ F do3: for all xαj ∈ Li do4: Fext ← Fext

⋃{xαj · fi}5: end for6: end for7: Construct coefficient matrix C from Fext.8: M← The set of all monomials occurring in Fext.9: R ← xk · B \ B

10: E ←M\ (R⋃B)11: Reorder and partition C: C = [CE CR CB ].12: LU-factorize to obtain CR2 and CB2 as in (19).13: Use (21) to express xk · xαi in terms of B and store the coefficients in mxk

.14: Compute eigenvectors of mxk

and read off the tentative set of solutions.15: Evaluate in F to filter out possible false zeros.

A typical problem that might occur is that some eigenvalues of mxkare equal, which

means that two or more zeros have equal xk-coordinate. Then the corresponding eigen-vectors can not be uniquelly determined. This problem can be resolved by computing

65

PAPER II

mxkfor several k and then forming a random linear combination ma1x1+···+asxs

=a1mx1 + · · · + asmxs

, which then with very small probability has two equal eigenval-ues.

As previously mentioned, computing mp for a larger problem is numerically verychallenging and the predominant issue is expressing pB in terms of B, via somethingsimilar to (21). The reason for this is that without proper care, CR2 tends to becomevery ill conditioned (condition numbers of 1010 or higher are not uncommon). This wasalso the reason that extremely slow emulated 128 bit numerics had to be used in [34] toget a working algorithm.

In the following sections we will investigate techniques to circumvent this problemand produce well conditioned CR2, thus drastically improving numerical stability.

4 Using Redundant Solving Bases –The Truncation Method

A typical situation with an ill conditioned or rank deficient CR2 is that there are a fewproblematic monomials where the corresponding columns in C are responsible for thedeteriorated conditioning of CR2. A straight forward way to improve the situation is tosimply include the problematic monomials in B, thus avoiding the need to express thesein terms of the other monomials. This technique is supported by Theorem 8, which guar-antees that we will find the original set of solutions among the eigenvalues/eigenvectors ofthe larger mp found using this redundant basis. The price we have to pay is performingan eigenvalue decomposition on a larger matrix.

Not all monomials from M can be included in the basis B and in general it is adifficult question exactly which monomials can be used. One can however see that B hasto be a subset of the following set, which we denote the permissible monomials P = {b ∈M : pb ∈M}. The permissible monomials P is the set of monomials which stay inMunder multiplication by p.

An example of how the redundant solving basis technique can be used is provided bythe problem of L2-optimal triangulation from three views [34]. The optimum is foundamong the up to 47 stationary points, which are zeros of a polynomial system in threevariables. In this example an enlarged set of 255 equations in 209 monomials were usedto get a Gröbner basis. Since the the solution dimension r is 47 in this case, the 47lowest order monomials were used as a basis for C[x]/I in [34], yielding a numericallydifficult situation. In fact, as will be shown in more detail in the experiments section,this problem can solved by simply including more elements in B. In this example, thecomplete permissible set contains 154 monomials. By including all of these in B leaving55 monomials to be expressed in terms of B, we get a much smaller and in this case betterconditioned elimination step.

The redundant 154-element solving basis and the 154 × 154-matrix mxk, was con-

66

4. USING REDUNDANT SOLVING BASES –THE TRUNCATION METHOD

structed w.r.t. one of the variables xk and the set of eigenvalues computed from mxk

for one instance are plotted in the complex plane in Figure 1 together with the actualsolutions of the polynomial system.

−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4−0.4

−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

Re

Im

ActualSolutionsThe set ofeigenvalues

Figure 1: Eigenvalues of the action matrix using the redundant basis method and actualsolutions to the polynomials system plotted in the complex number plane. The formerare a strict superset of the latter.

67

PAPER II

5 Basis Selection

In the previous section we saw how it is possible to pick a “too large” (> r elements)linear basis P and still use it to solve the equations. In this section we show how one canselect a true (linearly independent) basis as a subset of P in a numerically stable way andthus gain both speed and stability. In the following, P denotes any subset ofM with theproperty that the obtained CR2 is of full rank, thus making P a solving basis.

Since the set V of zeros of (1) is finite with r points, P seen as a set of functionson V contains at most r linearly independent elements. It should therefore be possibleto choose a subset P ′ ⊂ P such that the elements in P ′ can be expressed as linearcombinations of elements in P \ P ′. By dropping P ′ from the solving basis, the setB = P \P ′ would thus constitute a new tighter solving basis w.r.t. the same multiplier pand ideal I as P .

We now present two numerically stable techniques for constructing a true basis Bfrom a redundant solving basis P .

5.1 The QR Method

We start by selecting P as large as possible, still yielding a full rank CR and form[CE CR CP ]. Any selection of basis monomials B ⊂ P will then correspond to amatrix CB consisting of a subset of the columns of CP .

By performing gaussian elimination we again obtain (19), but with B replaced byP , letting us get rid of the E-monomials by discarding the top rows. Furthermore, theR-monomials will all have to be expressed in terms of the P-monomials so we continuethe elimination putting CR2 on triangular form, obtaining[

UR CP1

0 CP2

] [XRXP

]= 0. (24)

At this point we could simply continue the gaussian elimination, with each new pivotelement representing a monomial expressed in terms of the remaining basis monomials.However, this typically leads to poor numerical performance since the elimination mightbe very ill conditioned. This is where the basis selection comes to play.

As noted above we can choose which of the p monomials in P to put in the basis andwhich to reduce. This is equivalent to choosing a permutation Π of the columns of CP2,

CP2Π =[cπ(1) . . . cπ(p)

](25)

and then proceed using standard elimination. The goal must thus be to make this choiceso as to minimize the condition number κ(

[cπ(1) . . . cπ(p−r)

]) of the first p − r

columns of the permuted matrix. In its generality, this is a difficult combinatorial opti-mization problem. However, the task can be approximately solved in an attractive way by

68

5. BASIS SELECTION

QR factorization with column pivoting [12]. With this algorithm, CP2 is factorized as

CP2Π = QU, (26)

where Q is orthogonal and U is upper triangular. By solving for CP2 in (26) and sub-

stituting into (24) followed by multiplication from the left with[

I 00 QT

]and from the

right with [ I 00 Π ], we get [

UR CP1Π0 U

] [XR

ΠTXP

]= 0. (27)

We observe that U is in general not quadratic and write U =[UP′2 CB2

],

where UP′2 is quadratic upper triangular. We also write CP1Π =[CP′1 CB1

]and

ΠtXP1 =[XP′1 XB

]Tyielding

[UR CP′1 CB1

0 UP′2 CB2

]XRXP′XB

= 0. (28)

Finally [XRXP′

]= −

[UR CP′10 UP′2

]−1 [CB1

CB2

]XB (29)

is analogous to (21) and ammounts to solving r upper triangular equation systems whichcan be efficiently done by back substitution.

The reason why QR factorization fits so nicely within this framework is that it si-multaneously solves the two tasks of reduction to upper triangular form and numericallysound column permutation and with comparable effort to normal Gaussian elimination.

Furthermore, QR factorization with column pivoting is a widely used and well stud-ied algorithm and there exist free, highly optimized implementations, making this anaccessible approach.

Standard QR factorization successively eliminates elements below the main diagonalby multiplying from the left with a sequence of orthogonal matrices (usually Householdertransformations). For matrices with more columns than rows (under-determined systems)this algorithm can produce a rank-deficient U which would then cause the computationsin this section to break down. QR with column pivoting solves this problem by, atiteration k, moving the column with greatest 2-norm on the last m− k + 1 elements toposition k and then eliminating the last m−k elements of this column by multiplicationwith an orthogonal matrix Qk.

69

PAPER II

5.2 The SVD Method

By considering not only monomial bases, but more general polynomial bases it is possibleto further improve numerical stability. We now show how singular value decomposi-tion (SVD) can be used to construct a basis for C[x]/I as r linearly independent linearcombinations of elements in a solving basis P .

As in Section 5.1 we start out by selecting an as large as possible (redundant) solvingbasis and perform preliminary matrix operations forming (24), where the aim is now toconstruct a linearly independent basis from P . We now do this by performing an SVDon CP2, writing

CP2 = UΣVT , (30)

where U and V are orthogonal and Σ is diagonal with typically r last elements zeroΣ =

[Σ′ 00 0

]for a system with r solutions.

Now multiplying from the left with[

I 00 UT

]and from the right with [ I 0

0 V ] in (24),we get [

UR CP1V0 Σ

] [XR

VTXP

]= 0. (31)

The matrix V induces a change of basis in the space spanned by P and we writeXP = VTXP = [ X′P XB ]T , where P ′ and B are now sets of polynomials. Using thisnotation we get UR 0 CP1

0 Σ′ 00 0 0

XRXP′XB

= 0, (32)

where Σ′ is diagonal with n − r non-zero diagonal entries. The zeros above Σ′ entersince Σ′ can be used to eliminate the corresponding elements without affecting any otherelements in the matrix. In particular this means that we have{

XP′ = 0XR = −U−1

R CP1XB(33)

on V , which allows us to express any elements in span(M) in terms of XB, which makesB a solving basis.

Computing the action matrix is complicated slightly by the fact that we are nowworking with a polynomial basis rather than a monomial one. To deal with this situationwe introduce some new notation. To each element ek of XP we assign a vector vk =[ 0 ... 1 ... 0 ]T ∈ R|P|, with a one at position k. Similarly, we introduce vectors uk ∈R|M|, wk ∈ R|B| representing elements of X and XB respectively. Further we define thelinear mapping R : span(M) → span(B), which using (33) associates an element ofspan(M) with an element in span(B). We now represent R by a |B| × |M| matrix

R =[−CT

P′U−1TR 0 I

], (34)

70

5. BASIS SELECTION

acting on the space spanned by the vectors uk.We also introduce the mappingMp : span(P)→ span(M) given byMp(f) = p·f

with the representation(Mp)ij = I(xαi = p · xαj ), (35)

where I(·) is the indicator function.Mp represents multiplication by p on P . In the basis P induced by V we get

Mp =[I 00 VT

]MpV. (36)

Finally, we get a representation of the multiplication mapping from B to B as

mp = RMpL, (37)

where L = [ 0I ] simply interprets the wk ∈ R|B| vectors as R|P|-vectors. The matrix mp

derived here is the transpose of the corresponding matrix in Section 3.1.An eigendecomposition of mT

p yields a set of eigenvectors v in the new basis. Itremains to inverse transform these eigenvectors to obtain eigenvectors of mt

p. To do this,we need to construct the change of basis matrix Vq in the quotient space. Using R andL, we get V−1

q = RVTL. And from this we get v = V−Tq v in our original basis.As will be seen in the experiments, the SVD method is somewhat more stable than

the QR method, but significantly slower due to the costly SVD factorization.

5.3 Basis Selection and Adaptive Truncation

We have so far seen three techniques for dealing with the case when the sub matrix CP2

is ill conditioned. By the method in Section 4 we avoid operating on CP2 altogether.Using, the QR and SVD methods we perform elimination, but in a numerically muchmore stable manner. One might now ask whether it is possible to combine these meth-ods. Indeed it turns out that we can combine either the QR or the SVD method witha redundant solving basis to get an adaptive truncation criterion yielding even better sta-bility in some cases. The way to do this is to choose a criterion for early stopping in thefactorization algorithms. The techniques in this section are related to truncation schemesfor rank-deficient linear least squares problems, cf . [21].

A neat feature of QR factorization with column pivoting is that it provides a way ofnumerically estimating the conditioning of CP2 simultaneously with eliminiation. Bydesign, the QR factorization algorithm produces an upper triangular matrix U with diag-onal elements uii of decreasing absolute value. The factorization proceeds column wise,producing one |uii| at a time. If rank(U) = r, then |urr| > 0 and ur+1,r+1 = · · · =unn = 0. However, in floating point arithmetic, the transition from finite |uii| to zerois typically gradual passing through extremely small values and the rank is consequently

71

PAPER II

hard to determine. For robustness it might therefore be a good idea to abort the factor-ization process early. We do this by setting a threshold τ for the ratio |u11

uii| and abort the

factorization once the value exceeds this threshold. A value of τ ≈ 108 has been found toyield good results1. Note that this produces an equivalent result to carrying out the fullQR factorization and then simply discarding the last rows of U. This is practical sinceoff-the-shelf packages as LAPACK only provide full QR factorization, eventhough somecomputational effort could be spared by modifying the algorithm so as not to carry outthe last steps.

Compared to setting a fixed (redundant) basis size, this approach is beneficial sinceboth rank and conditioning of CP2 might depend on the data. By the above method wedecide adaptively where to truncate and i.e. how large the linear basis should be.

In the context of the SVD we get a similar criterion by looking at the singular valuesinstead and set a threshold for σ1

σi, which for i = rank(CP2) is exactly the condition

number of CP2.

6 Using Eigenvalues Instead of Eigenvectors

In the literature, the preferred method of extracting solutions using eigenvalue decom-position is to look at the eigenvectors. It is also possible to use the eigenvalues, but thisseemingly requires us to solve s eigenvalue problems since each eigenvalue only gives thevalue of one variable. However, there can be an advantage with using the eigenvaluesinstead of eigenvectors. If there are multiple eigenvalues (or almost multiple eigenvalues)the computation of the corresponding eigenvectors will be numerically unstable. How-ever, the eigenvalues can usually be determined with reasonable accuracy. In practice, thissituation is not uncommon with the action matrix.

Fortunately, we can make use of our knowledge of the eigenvectors to devise a schemefor quickly finding the eigenvalues of any action matrix on C[x]/I . From Section 2 weknow that the right eigenvectors of an action matrix is the vector of basis elements ofC[x]/I evaluated at the zeros of I . This holds for any action matrix and hence all actionmatrices have the same set of eigenvectors. Consider now a problem involving the twovariables xi and xj . If we have constructed mxi

, the construction of mxjrequires almost

no extra time. Now perform an eigenvalue decomposition mxi = VDxiV−1. Since V

is the set of eigenvectors for mxj as well, we get the eigenvalues of mxj by straightforwardmatrix multiplication and then elementwise division from

mxjV = VDxj

. (38)

This means that with very little extra computational effort over a single eigenvalue de-composition we can obtain the eigenvalues of all action matrices we need.

1Performance is not very sensitive to the choice of τ and values in the range 106 to 1010 yield similarresults.

72

7. EXPERIMENTS

7 Experiments

In this section we evaluate the numerical stability of the proposed techniques on a rangeof typical geometric computer vision problems. The experiments are mainly carried outon synthetic data since we are interested in the intrinsic numerical precision of the solver.By intrinsic precision we mean precision under perfect data. The error under noise is ofcourse interesting for any application, but this is an effect of the problem formulationand not of the particular equation solving technique.

In Section 7.1 all the main methods (standard, truncated, SVD and QR) are testedon optimal three view triangulation first studied by Stewénius et al . in [34]. They hadto use emulated 128 bit arithmetics to get usable results, whereas with the techniques inthis paper, we solve the equations in standard IEEE double precision. Furthermore, theimproved methods are tested on: localization with hybrid features [18], relative pose withunknown but common focal length [32] and relative pose for generalized cameras [33].Significant improvements in stability are shown in all cases. In the localization examplewe failed completely to solve the equations using previous methods and hence this caseomits a comparision with previous methods.

7.1 Optimal Three View Triangulation

The triangulation problem is formulated as finding the world point that minimizes thesum of squares of the reprojection errors. This means that we are minimizing the likeli-hood function, thus obtaining a statistically optimal estimate. A solution to this problemwas presented by Stewénius et al . in [34]. They solved the problem by computing thestationary points of the likelihood function which amounts to solving a system of poly-nomial equations. The calculations in [34] were conducted using emulated 128 bit arith-metics yielding very long computation times and in the conclusions the authors writethat one goal of further work is to improve the numerical stability to be able to use stan-dard IEEE double-precision (52 bit mantissa) and thereby increase the speed significantly.With the techniques presented in this paper it is shown that it is now possible to take thestep to double-precision arithmetics.

To construct the solver for this example some changes in the algorithm of [34] weredone to make better use of the changes of basis according to Section 5. The initial threeequations are still the same as well as the first step of partial saturation (w.r.t. x). However,instead of proceeding to perform another step of partial saturation on the new ideal, wesaturate (w.r.t. y and z respectively) from the initial three equations and join the threedifferent partially saturated ideals. Finally, we discard the initial three equations andobtain totally nine equations.

This method does not give the same ideal as the one in [34] were sat(I, xyz) wasused. The method in this paper produces an ideal of degree 61 instead of 47 as obtainedby Stewénius et al . The difference is 11 solutions located at the origin and 3 solutionswhere one of the variables is zeros, this can be checked with Macaulay 2 [13]. The 11

73

PAPER II

solutions at the origin can be ignored in the calculations and the other three can easily befiltered out in a later stage.

To build the solver we use the nine equation from the saturated ideal (3 of degree 5and 6 of degree 6) and multiply with x, y and z up to degree 9. This gives 225 equationsin 209 different monomials. easiest way to get rid of the 11 false solutions at the origin isto remove the corresponding columns and rows from the action matrix.

The synthetic data used in the validation was generated with three randomly placedcameras at a distance around 1000 from the origin and a focal length of around 1000.The unknown world point was randomly placed in a cube with side length 1000 centeredat the origin. The methods have been compared on 100,000 test cases.

7.1.1 Numerical Experiments

The first experiment investigates what improvement can be achieved by simply avoidingthe problematic matrix elimination using the techniques of Section 4. For this purposewe choose the complete set of permissible monomials P as a redundant basis and performthe steps of Algorithm 1. In this case we thus get a redundant basis of 154 elements and a154× 154 multiplication matrix to perform eigenvalue decomposition on. In both casesthe eigenvectors are used to find the solutions. The results of this experiment are shownin Figure 2. As can be seen, this relatively straight forward technique already yields a largeenough improvement in numerical stability to give the solver practical value.

−15 −10 −5 0 50

0.05

0.1

0.15

0.2

0.25

Log10

of error in 3D placement

Fre

quen

cy

Standard basisRedundant basis

Figure 2: Histogram of errors over 100,000 points. The improvement in stability usingthe redundant basis renders the algorithm feasible in standard IEEE double precision.

74

7. EXPERIMENTS

Looking closely at Figure 2 one can see that even though the general stability is muchimproved, a small set of relatively large errors remain. By doing some extra work usingthe QR method of Section 5.1 to select a true basis as a subset of P , we improve stabilityfurther in general and in particular completely resolve the issue with large errors, cf .Figure 3.

−15 −10 −5 0 50

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Log10


Fre

quen

cy

Standard basisRedundant basisQR basis

Figure 3: Histogram of errors for the standard, redundant basis and QR methods. TheQR method improves stability in general and in particular completely removes the smallset of large errors present in both the standard and redundant basis methods.

In Figure 4, the performance of the QR method is compared to the slightly morestable SVD method which selects a polynomial basis for C[x]/I from the monomials inP . In this case, errors are typically a factor ∼ 5 smaller for the SVD method comparedto the QR method.

The reason that a good choice of basis improves the numerical stability is that thecondition number in the elimination step can be lowered considerably. Using the basisselection methods, the condition number is decreased by about a factor 105. Figure 5shows a scatter plot of error versus condition number for the three view triangulationproblem. The SVD method displays a significant decrease and concentration in botherror and condition number. It is interesting to note that to a reasonable approximationwe have a linear trend between error and condition number. This can be seen since wehave a linear trend with slope one in the logarithmic scale. Moreover, we have a y-axisintersection at about 10−13, since we have a focal length of about 1000 this means that wehave error ≈ 10−16κ = εmachκ. This observation justifies our strategy of minimizing

75

PAPER II

−15 −10 −5 0 50

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Log10


Fre

quen

cy

SVD basisQR basis

Figure 4: Comparison between the SVD and QR methods. The SVD method improvessomewhat over the QR method at the cost of the computationally more demanding SVDfactorization.

the condition number.As mentioned in Section 6, it might be beneficial to use the eigenvalues instead of

eigenvectors to extract solutions.When solving this problem using eigenvalues there is two extra eigenvalues problems

of size 50 × 50 that have to be solved. The impact of the switch from eigenvectors toeigenvalues can be seen in Figure 6. For this example we gain some stability at the costof having to perform three eigenvalue decompositions (one for each coordinate) insteadof only one. Moreover, we need to sort the eigenvalues using the eigenvectors to puttogether the correct triplets.

However, we can use the trick of Section 6 to get nearly the same accuracy usingonly a single eigenvalue decomposition. Figure 7 shows the results of this method. Themain advantage of using the eigenvalues is that we push down the number of large errorsconsiderably.

Finally we study the combination of basis selection and truncation of the Gröbnerbasis for the three view triangulation problem. The basis size was determined adaptivelyas described in Section 5.3 with a threshold τ = 108. Table 1 shows the size of the basiswhen this method was used. Since the basis is chosen minimal in 94% of the cases forthe SVD-method and 95% for the QR method the time consumption is almost identicalto the original basis selection methods, but as can be seen in Table 2 the number of largeerrors are reduced. This is probably due to the fact that truncation is carried out only

76

7. EXPERIMENTS

100

105

1010

1015

1020

10−15

10−10

10−5

100

105

Condition number

Err

or

standard basisSVD basis

Figure 5: Error versus the condition number for the part of the matrix which is invertedin the solution procedure.

77

PAPER II

−15 −10 −5 0 50

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

Log10


Fre

quen

cy

SVD basisSVD + eig

Figure 6: Error histograms showing the difference in precision between the eigenvalueand eigenvector methods.

−15 −10 −5 0 50

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Log10


Fre

quen

cy

SVD basisSVD + fast eig

Figure 7: This graph shows the increase in performance when the fast eigenvalue methodis used insted of the eigenvector method.

78

7. EXPERIMENTS

when the matrices are close to being singular.

50 51 52 53 54 ≥ 55SVD 94.0 3.5 0.8 0.4 0.3 1.0QR 95.0 3.0 0.7 0.3 0.2 0.8

Table 1: Basis sizes for the QR and SVD methods with variable basis size. The tableshows the percentage of times the basis size occurred during 100,000 experiments.

To conclude the numerical experiments on three view triangulation two tables withdetailed error statistics are given. The acronyms STD, QR, SVD and TRUNC respectivelydenote the standard method, QR method, SVD method and redundant basis method.The suffixes eig, fast and var respectively denote the eigenvalue method, the fast eigen-value method (Section 6) and the use of a variable size basis (Section 5.3). Table 2 showshow many times the error gets larger the some given levels for several solvers. This isinteresting for example when RANSAC is used. As can be seen, the QR-method withadaptive basis size is the best method for reducing the largest errors but the SVD-methodwith use of the eigenvalues is the best in general. Table 3 shows the median and the 95:thpercentile errors for the same methods as in the previous table. Notable in here is thatthe 95:th percentile is improved with as much as factor 107 and the median with a factor105. The SVD-method with eigenvalues is shown to be the best but the QR-methodgives almost as good results.

Method > 10−3 > 10−2 > 10−1 > 1STD 35633 24348 15806 9703STD:eig 29847 19999 12690 7610SVD 1173 562 247 119SVD:eig 428 222 128 94SVD:fast 834 393 178 94SVD:var+fast 730 421 245 141TRUNC 6712 4697 3339 2384TRUNC:fast 5464 3892 2723 2015QR 1287 599 269 127QR:eig 517 250 149 117QR:fast 1043 480 229 106QR:var+fast 584 272 141 71

Table 2: Number of errors out of 100,000 experiments larger than certain levels. TheQR-method with adaptive basis size yields the fewest number of large errors.

79

PAPER II

Method 95th 50thSTD 1.42 · 101 9.85 · 10−5

STD:eig 5.30 · 100 3.32 · 10−5

SVD 1.19 · 10−5 6.09 · 10−9

SVD:eig 1.20 · 10−6 1.29 · 10−9

SVD:fast 4.37 · 10−6 2.53 · 10−9

SVD:var+fast 2.34 · 10−6 2.50 · 10−9

TRUNC 6.55 · 10−3 1.40 · 10−8

TRUNC:fast 1.87 · 10−3 3.27 · 10−9

QR 1.78 · 10−5 1.06 · 10−8

QR:eig 1.70 · 10−6 2.08 · 10−9

QR:fast 6.97 · 10−6 3.64 · 10−9

QR:var+fast 3.41 · 10−6 3.61 · 10−9

Table 3: The 95th percentile and the median error for various methods. The improve-ment in precision is up to a factor 107. The SVD method gives the best results, but theQR-method is not far off.

7.1.2 Speed Comparison

The main motivation for using the QR-method rather than the SVD-method is that theQR-method is computationally less expensive. To verify this the standard, SVD and QR-methods were run and the time was measured. Since the implementations were done inMatlab it was necessary to take care to eliminate the effect of Matlab being an interpretedlanguage. To do this only the time after construction of the coefficient matrix was takeninto account. This is because the construction of the coefficient matrix essentially am-mounts to copying coefficients to the right places which can be done extremely fast in e.g .a C language implementation.

In the routines that were measured no subroutines were called that were not built-infunctions in Matlab. The measurements were done with the Matlab profiler.

The time measurements were done on an Intel Core 2 2.13 GHz machine with 2 GBmemory. Each algorithm was executed with 1000 different coefficient matrices con-structed from the same type of scene setups as previously. The same set of coefficientmatrices was used for each method. The result is given in Table 4. Our results showthat the QR-method is approximately three times faster than the SVD-method but 50%slower than the standard method.

7.1.3 Triangulation of Real Data

Finally, the algorithm is evaluated under real world conditions. The Oxford dinosaur [8]is a familiar image sequence of a toy dinosaur shot on a turn table. The image sequence

80

7. EXPERIMENTS

Method Time per call / ms Relative timeSVD 66.89 1TRUNC 55.84 0.83QR 24.45 0.37STD 16.44 0.25

Table 4: Time consumption in the solver part for the three different methods. The timeis an average over 1000 function calls.

consists of 36 images and 4983 point tracks. For each point visible in three or more viewswe select the first, middle and last view and triangulate using these. This yields a total of2683 point triplets to triangulate. The image sequence contains some erroneous trackswhich we deal with by removing any points reprojected with an error greater than twopixels in any frame. The whole sequence was processed in approximately 80 seconds ina Matlab implementation on an Intel Core 2 2.13 GHz CPU with 2 GB of memory bythe QR-method with variable basis size. The resulting point cloud is shown in Figure 8.

We have also run the same sequence using the standard method, but the errors wereto large to yield usable results (typically larger errors than the dimensions of the dinosauritself ).

Figure 8: The Oxford dinosaur reconstructed from 2683 point triplets using the QR-method with variable basis size. The reconstruction was completed in approximately 80seconds by a Matlab implementation on an Intel Core 2 2.13 GHz CPU with 2 GB ofmemory.

7.2 Localization with Hybrid Features

In this experiment, we study the problem of finding the pose of a calibrated camera withunknown focal length. One minimal setup for this problem is three point-correspondences

81

PAPER II

with known world points and one correspondence to a world line. The last feature isequivalent to having a point correspondence with another calibrated camera. These typesof mixed features are called hybrid features and were introduced in [18], where the au-thors propose a parameterization of the problem but no solution was given apart fromshowing that the problem has 36 solutions.

The parameterization in [18] gives four equations in four unknowns. The unknownsare three quaternion parameters and the focal length. The equation derived from the linecorrespondence is of degree 6 and those obtained from the 3D points are of degree 3.The coefficient matrix C is then constructed by expanding all equations up to degree 10.This means that the equation derived from the line is multiplied with all monomials upto degree 4, but no single variable in the monomials is of higher degree than 2. In thesame manner the point correspondence equations are multiplied with monomials up todegree 7 but no single variable of degree more than 5. The described expansion gives 980equations in 873 monomials.

The next step is to reorder the monomials as in (18). In this problem CP corre-sponds to all monomials up to degree 4 except f4 where f is the focal length, which gives69 columns in CP . The part CR corresponds to the 5:th degree monomials that ap-pear when the monomials in P are multiplied with the first of the unknown quaternionparameters.

For this problem, we were not able to obtain a standard numerical solver. The reasonfor this was that even going to significantly higher degrees than mentioned above, we didnot obtain a numerical invertible C matrix. In fact, with an exact linear basis (same num-ber of basis elements as solutions), even the QR and SVD methods failed and truncationhad to be used.

In this example we found that increasing the linear basis of C[x]/I by a few elementsmore than produced by the adaptive criterion was beneficial for the stability. In thisexperiment, an increase by three basis elements was used.

The synthetic experiments for this problem were generated by randomly drawingfour points from a cube with side length 1000 centered at the origin and two cameraswith a distance of approximately 1000 to the origin. One of these cameras was treated asunknown and one was used to get the camera to camera point correspondence. This givesone unknown camera with three point correspondences and one line correspondence.The experiment was run 10, 000 times.

In Figure 9 the distribution of basis sizes is shown for the QR-method. For the SVD-method the basis size was identical to the QR-method in over 97% of the cases and neverdiffered by more than one element.

Figure 10 gives the distribution of relative errors in the estimated focal length. Itcan be seen that both the SVD-method and the faster QR-method give useful results.We emphasize that we were not able to construct a solver with the standard method andhence no error distribution for that method is available.

82

7. EXPERIMENTS

39 40 41 42 43 44 450

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Basis size

Fre

quen

cy

Figure 9: The basis size for localization with hybrid features. The number of solutionsare 36 and since we always add three dimensions to the truncated ideal the minimalpossible basis size is 39.

−15 −10 −5 00

0.1

0.2

0.3

0.4

0.5

Log10

of relative error in focal length

Fre

quen

cy

SVD basisQR basis

Figure 10: The error for localization with hybrid features. The standard method is omit-ted since we did not manage to construct a standard solver due to numerical problems.

83

PAPER II

7.3 Relative Pose with Unknown Focal Length

Relative pose for calibrated cameras is a well known problem and the standard minimalcase for this is five points in two views. There are in general ten solutions to this problem.For the same problem but with unknown focal length, the corresponding minimal caseis six points in two views [32], which was solved by Stewénius et al . using Gröbner basistechniques.

Following the same recipe as Stewénius et al . it is possible to express the fundamentalmatrix as a linear combination,

F = F0 + F1l1 + F2l2. (39)

Then putting f−2 = p one obtains nine equations from the constraint on the essentialmatrix [28]

2EEtE − tr(EEt)E = 0. (40)

A 10th equation is then obtained by making use of the fact that the fundamental matrixi singular, i.e. det(F ) = 0. These equations involve the unknowns p, l1 and l2 and areof total degree 5. The problem has 15 solutions in general.

We set up the coefficient matrix C by multiplying these ten equations by p so thatthe degree of p reaches a maximum of four. This gives 34 equations in a total of 50monomials. It turns out that it is possible to eliminate only one of the four monomialsl32p

4, l22p4, l2p4 and p4 from all of the equations. However, we can discard the equations

where these monomials cannot be eliminated and then proceed as usual. We choose toeliminate l22p

4, but this choice is arbitrary.The validation data was generated with two cameras of equal focal length of around

1000 placed at a distance of around 1000 from the origin. The six points were randomlyplaced in a cube with side length 1000 centered at the origin. The standard, SVD, andQR-methods have been compared on 100,000 test cases and the errors in focal length areshown in Figure 11. In this case the QR-method yields slightly better results than theSVD-method. This is probably due to loss in numerical precision when the solution istransformed back to the original basis.

7.4 Relative Pose for Generalized Camera

Generalized cameras provide a generalization of the standard pin-hole camera in the sensethat there is no common focal point through which all image rays pass, cf . [29]. Insteadthe camera captures arbitrary image rays or lines. Solving for the relative motion of ageneralized camera can be done using six point correspondences in two views. This isa minimal case which was solved in [33] with Gröbner basis techniques. The problemequations can be set up using quaternions to parameterize the rotation, Plücker repre-sentation of the lines and a generalized epipolar constraint which captures the relationbetween the lines. After some manipulations one obtains a set of sixth degree equations

84

8. CONCLUSIONS

−15 −10 −5 00

0.1

0.2

0.3

0.4

0.5

Log10

of error in focal length

Fre

quen

cy

Standard basisSVD basisQR basis

Figure 11: The error in focal length for relative pose with two semi calibrated cameraswith unknown but common focal length.

in the three quaternion parameters v1, v2 and v3. For details, see [33]. The problem has64 solutions in general.

To build our solver including the change of basis we multiply an original set of 15equations with all combinations of 1, v1, v2, v3 up to degree two. After this we end upwith 101 equations of total degree 8 in 165 different monomials.

We generate synthetic test cases by drawing six points from a normal distributioncentered at the origin. Since the purpose of this investigation is not to study generalizedcameras under realistic conditions we have not used any particular camera rig. Instead weuse a completely general setting where the cameras observe six randomly chosen lines eachthrough the six points. There is also a random relative rotation and translation relatingthe two cameras. It is the task of the solver to calculate the rotation and translation.

The four different methods have been compared on a data set of 10,000 randomlygenerated test cases. The results from this experiment are shown in Figure 12. As canbe seen, a good choice of basis yields drastically improved numerical precision over thestandard method.

8 Conclusions

We have introduced some new theoretical ideas as well as a set of techniques designedto overcome numerical problems encountered in state-of-the-art methods for polynomial

85

PAPER II

−15 −10 −5 00

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

Log10

of angular error

Fre

quen

cy

Standard basisSVD basisQR basis

Figure 12: The angular error for relative pose with generalized camera.

equation solving. We have shown empirically that these techniques in many cases yielddramatic improvements in numerical stability and further permits the solution of a largerclass of problems than previously possible.

The technique for solving polynomial equations that we use in this paper can besummarized as follows. The original equations are first expanded by multiplying thepolynomials with a set of monomials. The resulting equations is expressed as a product ofa coefficient matrix C and a monomial vector X. Here we have some freedom in choosingwhich monomials to multiply with. We then try to find a solving basis B for the problem.For a given candidate basis B we have shown how to determine if B constitutes a solvingbasis. If so then we can use numerical linear algebra to construct the action matrix andget a fast and numirically stable solution to the problem at hand. However, we do notknow (i) what monomials we should multiply the original equations with and (ii) whatsolving basis B should be used to get the simplest and most numerically stable solutions.Are there algorithmic methods for answering these questions? For a given expansion CXcan one determine if this allows for a solving basis? A concise theoretical understandingand practical algorithms for these problems would certainly be of great aid in the workon polynomial problems and is a highly interesting subject for future research.

86

Bibliography

[1] Sameer Agarwal, Manmohan Krishna Chandraker, Fredrik Kahl, David J. Krieg-man, and Serge Belongie. Practical global optimization for multiview geometry. InProc. 9th European Conf. on Computer Vision, Graz, Austria, pages 592–605, 2006.

[2] M. Byröd, K. Josephson, and K. Åström. Fast optimal three view triangulation. InAsian Conference on Computer Vision, 2007.

[3] M. Byröd, K. Josephson, and K. Åström. Improving numerical accuracy of gröbnerbasis polynomial equation solvers. In Proc. 11th Int. Conf. on Computer Vision, Riode Janeiro, Brazil, Rio de Janeiro, Brazil, 2007.

[4] Eduardo Cattani, David A. Cox, Guillaume Chèze, Alicia Dickenstein, MohamedElkadi, Ioannis Z. Emiris, André Galligo, Achim Kehrein, Martin Kreuzer, andBernard Mourrain. Solving Polynomial Equations: Foundations, Algorithms, and Ap-plications (Algorithms and Computation in Mathematics). Springer-Verlag New York,Inc., Secaucus, NJ, USA, 2005.


[6] D. Cox, J. Little, and D. O’Shea. Ideals, Varieties, and Algorithms. Springer, 2007.

[7] M. Demazure. Sur deux problemes de reconstruction. Technical Report 882, IN-RIA, 1988.

[8] Visual geometry group, university of oxford. http://www.robots.ox.ac.uk/∼vgg.

[9] J.-C. Faugère. A new efficient algorithm for computing gröbner bases (f4). Journalof Pure and Applied Algebra, 139(1-3):61–88, 1999.

[10] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for modelfitting with applications to image analysis and automated cartography. Communi-cations of the ACM, 24(6):381–95, 1981.

[11] C. Geyer and H. Stewénius. A nine-point algorithm for estimating para-catadioptricfundamental matrices. In Proc. Conf. Computer Vision and Pattern Recognition, Min-neapolis, USA, June 2007.

[12] G. H. Golub and C. F. van Loan. Matrix Computations. The Johns HopkinsUniversity Press, 3rd edition, 1996.

87

PAPER II

[13] D. Grayson and M. Stillman. Macaulay 2. Available athttp://www.math.uiuc.edu/Macaulay2/, 1993-2002. An open source computeralgebra software.

[14] R. Hartley and F. Schaffalitzky. L∞ minimization in geometric reconstruction prob-lems. In Proc. Conf. Computer Vision and Pattern Recognition, Washington DC, pages504–509, Washington DC, USA, 2004.

[15] R. Hartley and P. Sturm. Triangulation. Computer Vision and Image Understanding,68:146–157, 1997.

[16] R. Holt, R. Huang, and A. Netravali. Algebraic methods for image processing andcomputer vision. IEEE Transactions on Image Processing, 5:976–986, 1996.

[17] D. G. Hook and P. R. McAree. Using sturm sequences to bracket real roots ofpolynomial equations. Graphics gems, pages 416–422, 1990.

[18] K. Josephson, M. Byröd, F. Kahl, and K. Åström. Image-based localization usinghybrid feature correspondences. In The second international ISPRS workshop Ben-COS 2007, Towards Benchmarking Automated Calibration, Orientation, and SurfaceReconstruction from Images, 2007.

[19] F. Kahl. Multiple view geometry and the l∞-norm. In ICCV, pages 1002–1009,2005.

[20] F. Kahl and D. Henrion. Globally optimal estimates for geometric reconstructionproblems. In Proc. 10th Int. Conf. on Computer Vision, Bejing, China, pages 978–985, 2005.

[21] I. Karasalo. A criterion for truncation of the QR-decomposition algorithm for thesingular linear least squares problem. BIT Numerical Mathematics, 14(2):156–166,June 1974.

[22] E. Kruppa. Zur Ermittlung eines Objektes aus Zwei Perspektiven mit innerer Ori-entierung. Sitz-Ber. Akad. Wiss., Wien, math. naturw. Kl. Abt, IIa(122):1939–1948,1913.

[23] Z. Kukelova and T. Pajdla. A minimal solution to the autocalibration of radialdistortion. In Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEEConference on, 2007.

[24] Z. Kukelova and T. Pajdla. Two minimal problems for cameras with radial distor-tion. In Proceedings of The Seventh Workshop on Omnidirectional Vision, CameraNetworks and Non-classical Cameras (OMNIVIS), 2007.

[25] Lapack - linear algebra package. http://www.netlib.org/lapack.

88

BIBLIOGRAPHY

[26] D. Lazard. Resolution des systemes d’equations algebriques. Theor. Comput. Sci.,15:77–110, 1981.

[27] D. Nistér. An efficient solution to the five-point relative pose problem. In Proc.Conf. Computer Vision and Pattern Recognition, volume 2, pages 195–202. IEEEComputer Society Press, 2003.

[28] J. Philip. A non-iterative algorithm for determining all essential matrices corre-sponding to five point pairs. Photogrammetric Record, 15(88):589–599, October1996.

[29] R. Pless. Using many cameras as one. In Proc. Conf. Computer Vision and PatternRecognition, Madison, USA, 2003.

[30] H. Stewénius. Gröbner Basis Methods for Minimal Problems in Computer Vision.PhD thesis, Lund University, April 2005.


[32] H. Stewénius, F. Kahl, D. Nistér, and F. Schaffalitzky. A minimal solution forrelative pose with unknown focal length. In Proc. Conf. Computer Vision and PatternRecognition, San Diego, USA, 2005.


[34] H. Stewénius, F. Schaffalitzky, and D. Nistér. How hard is three-view triangulationreally? In Proc. Int. Conf. on Computer Vision, pages 686–693, Beijing, China, 2005.

[35] E. H. Thompson. A rational algebraic formulation of the problem of relative orien-tation. Photogrammetric Record, 14(3):152–159, 1959.

[36] P. Torr and A. Zisserman. Robust parameterization and computation of the trifocaltensor. Image and Vision Computing, 15(8):591–605, 1997.

[37] P. Torr and A. Zisserman. Robust computation and parametrization of multipleview relations. In Proc. 6th Int. Conf. on Computer Vision, Mumbai, India, pages727–732, 1998.

[38] J. Verschelde. Phcpack: A general-purpose solver for polynomial systems by homo-topy continuation. ACM Transactions on Mathematical Software, 25(2):251–276,1999.

89

PAPER III

Submitted to Journal of Mathematical Imaging and Vision.

Image Based Localization Using HybridFeatures

Klas Josephson, Martin Byröd, Fredrik Kahl, Kalle Åström

Abstract

Where am I and what am I seeing? This is a classical vision problem and this paper presentsa solution based on efficient use of a combination of 2D and 3D features. Given a model of ascene, the objective is to find the relative camera location of a new input image. Unlike tradi-tional hypothesize-and-test methods that try to estimate the unknown camera position based on3D model features only, or alternatively, based on 2D model features only, we show that using amixture of such features, that is, a hybrid correspondence set, may improve performance. We useminimal cases of structure-from-motion for hypothesis generation in a RANSAC engine. Forthis purpose, several new and useful minimal cases are derived for calibrated, semi-calibratedand uncalibrated settings. Based on algebraic geometry methods, we show how these minimalhybrid cases can be solved efficiently. The whole approach has been validated on both syntheticand real data, and we demonstrate improvements compared to previous work. The softwarefor solving hybrid geometry problems will be made publicly available.

1 Introduction

Localization refers to the ability of automatically inferring the pose and the position ofan observer relative to a model, cf. [3]. We propose to solve the problem using an image-based approach. The model or the map of the environment can be anything from a singleroom in a building to a complete city. In general, one image will be used as a queryimage, but in principle several images can be used as input. No prior knowledge of theobserver’s position is assumed and therefore the problem is often referred to as globallocalization whereas local versions assume an approximate position. The mapping of theenvironment can be regarded as an off-line process since it is generally done once and forall. Such a mapping can be done using standard Structure from Motion (SfM) algorithms[13, 17, 23], or by some other means.

The key idea of this paper is to use a mixture of 2D and 3D features simultaneously forlocalization. If one were to rely solely on 3D matches, one is restricting the set of possible

93

PAPER III

correspondences to relatively few correspondences and a relatively rich 3D model wouldbe required in order to be successful. On the other hand, using only 2D features requiresrelatively many correct correspondences to generate a single hypothesis. In addition, withexisting methods such as the seven point algorithm of two views [13], one is limited topicking all the 2D correspondences from one single image in the model. Again, one isrestricting the set of correspondences to a relatively small subset. Further, the absolutescale cannot be recovered solely from 2D correspondences of one query image and onemodel image.

Using combinations of 2D and 3D features, what we call hybrid correspondence sets,for generating hypotheses gives a number of advantages. We can make use of all possiblecorrespondences simultaneously, even from different 2D model images. Compared toapproaches using only 2D correspondences, the scale relative to the 3D map can be re-covered and, more importantly, the number of correspondences is smaller which is a goodproperty when using RANSAC. One can argue that in most cases, traditional methodswould work fine. However, we demonstrate that hybrid correspondence sets are indeeduseful and there is simply no reason why this information should not be used as it leadsto improvements.

The three main contributions of this paper are:1. We demonstrate how hybrid feature correspondences (combinations of 2D and 3Dfeatures) can be used for improved image-based localization.2. A complete list of minimal hybrid cases is given and for each case, we also give thenumber of possible solutions possible.3. Algorithms for efficiently computing the solutions of the minimal cases are given.Further, the behavior and stability on synthetic data is evaluated for some cases.

1.1 Related Work

Localization and scene recognition are key components of any autonomous system. Inrobotics, (global) localization is also known as the kidnapped robot problem. Successfulsolutions have generally been achieved with laser, sonar or stereo vision range sensorsand built maps for controlled robots moving in 2D, e.g., [19]. Another example is theDeutches Museum Bonn tour-guide robot RHINO [4] where laser sensors are used. An-other competing technique (at least, for some applications) is GPS. However, the accuracyis typically only in the order of 10-20 meters and no direction information is obtained.

Image-based localization using special landmarks is a common approach, e.g., [2],but this severely limits the flexibility and the applicability of the method. Similar toour approach, distinctive visual features were utilized in [22] to overcome this limitation.They also showed that RANSAC is an effective way of generating hypotheses. However,only 3D model features were used and this requires a rich 3D model to work well.

For large-scale models, an image search technique is required to speed up the process.This can be seen as a pre-processing step which produces a small number of hypotheti-

94

2. PROBLEM FORMULATION

cal part-models that need further verification. Possible such pre-processing schemes aredeveloped in [21].

The wealth of research in the SfM field is, of course, related to the present work, inparticular, the work concerned with RANSAC [27] and wide baseline matching [29, 18].The same approach as proposed in this paper can be used to solve the wide baselinematching problem to build up 3D models [23].

Understanding of the geometry and the number of solutions of minimal structureand motion problems has a long history. For the uncalibrated case, the minimal problemof seven points in two views (which has three solutions) was studied and solved already in1855, cf. [8]. The corresponding calibrated case was in principle solved in 1913 [14]. Thestudy of minimal cases has got increased attention with its use in RANSAC algorithms tosolve both for geometry and correspondence in numerous applications [13, 28].

2 Problem Formulation

To solve the localization problem we are interested in solving the following problem:

Problem 1. Under the assumption that for a query image, there are m potential corre-spondences to image points in views with known absolute orientation and n potentialcorrespondences to scene points with known 3D coordinates, find the largest subset ofthe correspondences that admits a solutions to the absolute orientation problem within aspecified accuracy.

The method that we will use to solve the localization problem is based on hypothesize-and-test with RANSAC [10] and local invariant features [16]. This involves solving min-imal structure and motion problem with hybrid correspondence sets.

3 Minimal Hybrid Correspondence Sets

The classical absolute orientation problem (also known as camera resectioning) for cali-brated cameras for three known points can be posed as finding the matrix P = [R t ],such that λiui = PUi, i = 1, 2, 3.HereR is a 3×3 rotation matrix and t is a 3-elementstranslation vector. Thus, the camera matrix encodes six degrees of freedom of unknownparameters. Each point gives two constrains and therefore three points form a minimalcase. In general there are four possible solutions [11, 12, 13].

We will study the absolute orientation problem both for calibrated cameras as above,for the case of unknown focal length and for the uncalibrated camera case. As explainedin the introduction we will also consider both known 3D-2D correspondences (Ui, ui)as above and 2D-2D correspondences (vi, ui) with features vi in other views. Here wewill assume that the camera matrices of the other views are known, so that a 2D-2Dcorrespondence can be thought of as a 3D-2D correspondence where the unknown 3D

95

PAPER III

point Ui lies on a line expressed in Plücker coordinates. In this paper the (m,n) case de-notes the case of m 2D-2D correspondences and n 3D-2D correspondences. Notice thateach 2D-2D correspondence imposes one constraint and each 2D-3D correspondenceimposes two constraints.

Calibrated Cameras For calibrated cameras there are six degrees of freedom, three fororientation and three for position. One way of parameterizing the camera matrix is to usea quaternions vector (a, b, c, d) for rotation, i.e.

P =

a2 + b2 − c2 − d2 2bc− 2ad 2ac+ 2bd x2ad+ 2bc a2 − b2 + c2 − d2 2cd− 2ab y2bd− 2ac 2ab+ 2cd a2 − b2 − c2 + d2 z

. (1)

Potential minimal cases are:The (0,3) case. This is the well known resection problem, cf. [11, 12, 13] with up to

four solutions in front of the camera.The (2,2) case. This case is given a numerical solution in this paper. The algorithms

works equally well if the 2D-2D correspondences are to the same or to different cameras.There are up to 16 solutions.

The (4,1) case. There are two cases here. In the first case all 2D-2D correspondencesare to the same view. In this first case the problem can be solved by first projecting the 3Dpoint in the known camera and then using the five point algorithm to solve for relativeorientation, (hence up to 10 solutions), cf. [14]. The scale is then fixed using the 2D-3Dcorrespondence. The second case is when the 2D-2D correspondences are to at least twodifferent views. This is studied in this paper and we demonstrate that there are up to 32solutions. No numerical algorithm is presented.

The (6,0) case. This cannot be solved for absolute orientation if all points are fromthe same model view. In this case only relative orientation can be found and only fivecorrespondences are needed as discussed above. However, if the correspondences comefrom different views, it is in fact equivalent with the relative orientation problem forgeneralized cameras, cf. [26], which has up to 64 solutions.

Unknown Focal Length For calibrated cameras with unknown focal length there areseven degrees of freedom, three for orientation, three for position and one for the focallength. One way of parameterizing the camera matrix is as

P =

a2 + b2 − c2 − d2 2bc− 2ad 2ac+ 2bd x2ad+ 2bc a2 − b2 + c2 − d2 2cd− 2ab y

2f(bd− ac) 2f(ab+ cd) f(a2 − b2 − c2 + d2) fz

. (2)

Potential minimal cases areThe (1,3) case. This case is given a numerical solution in this paper.There are 36

solutions.

96

3. MINIMAL HYBRID CORRESPONDENCE SETS

The (3,2) case. This is studied in this paper and we demonstrate that there are up to40 solutions. No numerical algorithm is presented.

The (5,1) case. There are two cases here. In the first case all 2D-2D correspondencesare to the same view. In this first case the problem can be solved by first projecting the 3Dpoint in the known camera and then using the six point algorithm to solve for relative ori-entation and focal length [25]. The scale is then fixed using the 2D-3D correspondence.The second case is when the 2D-2D correspondences are to at least two different views.This is studied in this paper and we demonstrate that there are up to 112 solutions. Nonumerical algorithm is presented.

The (7,0) case. This cannot be solved for absolute orientation if all points are to thesame view. It can be solved for relative orientation using 6 points as discussed above.However for the case of correspondence to different view it should be solvable, but to ourknowledge it is still an open problem.

Uncalibrated Cameras For the uncalibrated camera case there are 11 degrees of free-dom. Each 2D-2D correspondence gives one constraint and each 2D-3D correspondencegives two constraints. Potential minimal cases are

The (1,5) case. This can be solved by as follows. Using the five 3D-2D corre-spondences, the camera matrix can be determined up to a one-parameter family P =P1 + νP2, where P1 and P2 are given 3 × 4 matrices and ν is an unknown scalar. Theremaining 2D correspondence can be parameterized as a point on a line U = C + µDfor some unknown parameter µ. The projection equation gives λu × u = u × PU =u× (P1 +νP2)(U1 +µU2) = 0. Thus we obtain two, second degree constraints in ν, µ.Using resultants, it follows easily that there are two solutions for the unknowns ν, µ.

The (3,4) case. There are two cases here. In the first case all 2D-2D correspondencesare from the same view. In this first case the problem can be solved by first projecting thefour 3D point in the known camera and then using the seven point algorithm to solvefor relative orientation [8]. There are then 3 solutions. The projective coordinate systemis then fixed using the 2D-3D correspondences. The second case is when the 2D-2Dcorrespondences are to at least two different views. In this case the solutions procedure isanalogous to the previous case and can be obtained using resultants, this method showsthat there can be up to eight solutions. No numerical algorithm is presented.

The (1+2k,5-k) case with k = 2, 3, 4. These cannot be solved for absolute orientationif all points originate from one model view. Relative orientation is then still possible usingat least seven points. The remaining case is when the 2D-2D correspondences are to atleast two different views. Once again the methods of the (1,5) case can be used and thereare up to 2(1+2k) solutions. No numerical algorithm is presented.

Summary We conclude this section by summarizing all the minimal cases for hy-brid 2D and 3D feature correspondences, see Table 1. We state an upper bound onthe number of physically realizable solutions. In general, as we shall see later in Sec-

97

PAPER III

tion 5, the number of plausible solutions is much smaller. In the next section, weclarify the calculations for some of these cases. This will also lead to efficient algo-rithms for computing the solutions. Algorithms in matlab for solving the (2,2) and(1,3) cases, that later are evaluated in this paper, will be available for download onhttp://www.maths.lth.se/vision/downloads/.

2D-2D 2D-3D number of cameracorresp. corresp. solutions setting

0 3 4 calibrated2 2 16 calibrated4 1 32 or 10∗ calibrated6 0 64 calibrated

1 3 36 unknown focal3 2 40 unknown focal5 1 112 or 15∗ unknown focal7 0 ? unknown focal

1 5 2 uncalibrated3 4 8 or 3∗ uncalibrated

1 + 2k 5− k 21+2k uncalibrated

Table 1: Minimal hybrid cases for structure from motion. The number of solutionsindicates an upper bound of the number of physically realizable solutions. The solutionnumbers marked with asterisk ”∗” correspond to cases where all 2D-2D correspondencesoriginate from a single (model) view, whereas for other cases, it is implicitly assumed thatthe correspondence set covers multiple views. Note that one case is still an open problem(marked with ”?”). In the table italic is used to denote cases which are analogous to wellknown structure and motion problems and bold face is used for cases that are solved inthis paper. The remaining cases have not be solved to a sufficient detail yet.

4 Solving Minimal Cases with Algebraic Geometry

Minimal structure and motion problems typically boil down to solving a system of poly-nomial equations in a number of unknowns and in this section we describe how thesetypes of polynomial problems can be solved with a combination of algebraic geometryand numerical linear algebra. For a specific application problem the structure of the poly-nomial system is fixed. Thus the number of solutions to a structure and motion problemtypically depends only on the type of problem at hand. Common examples are relativeorientation for calibrated cameras, five points in two views, which has ten solutions [24]and fundamental matrix estimation, seven points in two views, which has three solutions.

98

4. SOLVING MINIMAL CASES WITH ALGEBRAIC GEOMETRY

Solving systems of polynomial equations is a challenging numerical task and thereis no general method of universal applicability as for linear equation systems. A firststep towards solving a certain class of polynomial systems is to determine the number ofsolutions. There are several techniques for doing this. The theory of mixed volumes [9]can be used to prove the number of solutions for a set of polynomial equations assuminggeneral coefficients of the polynomial. The software package phc [30] is useful both forcalculating mixed volume and for finding solutions with homotopy methods. Anothermethod is to calculate the so called Gröbner basis of the polynomial system which theneasily yields the number of solutions to the problem [9]. For problems that are notsynthetic (where the coefficients are represented as floating point approximations) findingthe Gröbner basis is numerically difficult due to accumulating round-off errors. Onetechnique implemented in the computer algebra system Macaulay2 [1] is to work withinteger coefficients and then projecting the equations from C[x] to Zp[x] and computingthe Gröbner basis there.

Apart from providing theoretical information regarding the problem structure, Gröb-ner bases are also a popular and efficient tool for constructing numerical solvers for poly-nomial equation systems. A given polynomial system f1(x) = · · · = fm(x) = 0,generates an ideal I = {g(x) : g = Σkhkfk, hk ∈ C[x]}, consisting of all linear com-binations of the fk, where the coefficients hk are arbitrary polynomials. It is convenienton a theoretical level to work with I since it is more general than the original system ofequations, but has exactly the same zeros. A Gröbner basis G for I is a special set of gen-erators which allows multivariate polynomial division by I denoted fG. Consider nowthe quotient space C[x]/I of equivalence classes of polynomials modulo I (f and g areequivalent modulo I iff f − g ∈ I) and let [f ] denote the equivalence class of f . We get

a convenient representative for [f ] to work with by computing fG

since f1G

= f2G

forf1, f2 ∈ [f ].

For an ideal with a finite number of solutions C[x]/I is a finite-dimensional spacewith the same number of dimensions as solutions [9]. In this space, the multiplicationmappings Txk

: f 7→ xkf are especially interesting. By selecting a basis for C[x]/I , wecan represent Txk

as an r × r matrix mxk, known as the action matrix. The eigenvalues

of this matrix are then the variable xk evaluated at the zeros of I . Moreover, the vectorof basis elements of C[x]/I , which in general is a vector of polynomials, evaluated at thezeros of I equals the eigenvectors of mxk

. For proof of this see [9].Given a set of basis elements B = {b1(x), . . . br(x)} whose equivalence classes span

C[x]/I , all we need to do now is to compute xkbiG

for each bi. In other words, xkbineeds to be expressed in terms of the basis elements {bj}. This can be done as follows.First we multiply the system of polynomials by a set of monomials, yielding a larger setof equations. These equations are all members of the ideal and are hence equivalent interms of solutions, but hopefully linearly independent. We now write the equations onmatrix form

CX = 0, (3)

99

PAPER III

where C is a matrix of coefficients and X = [xα1 , . . . ,xαn ]T is a vector of monomialswith the notation xαi = xαi1

1 . . . xαiss . In the basic case, we now partition the set of

monomialsM into three subsetsM = E∪R∪B, where B is a set of monomials forminga basis for C[x]/I , R = xkB \ B is the set of monomials reached by multiplication byxk and which needs to be expressed in terms of B and finally E = M\ (R∪ B) is theset of monomials which are not involved in forming mxk

. We then get

[CE CR CB

] XEXRXB

= 0. (4)

Since the E monomials are not used for computing the action matrix we eliminate themby putting CE on upper triangular form

[UE CR1 CB1

0 CR2 CB2

]XEXRXB

= 0 (5)

and the keeping only the lower part, yielding

[CR2 CB2

] [XRXB

]= 0. (6)

or equivalentlyXR = C−1

R2CB2XB (7)

allowing us to express all the necessary monomials in terms of monomials in the basis.We will now go into the details of how the systems of equations are derived and how

Gröbner basis solvers as described above can be constructed for the more interesting ofthe hybrid minimal problems studied in this paper.

4.1 Calibrated Cameras

A calibrated camera can be parameterized using quaternions as shown in (1). Assume thatwe have two correspondences between image points and scene points

u1 ∼ PU1, u2 ∼ PU2.

Since there is a freedom in choosing coordinate systems both in the scene and in theimages, these can be transformed into

U1 =

0001

, u1 =

001

, U2 =

1001

, u2 =

10u

.

100

4. SOLVING MINIMAL CASES WITH ALGEBRAIC GEOMETRY

This gives us the following constraints

x = 0, y = 0, ad = −bc,z = u(a2 + b2 − c2 − d2)− 2bd+ 2ac.

As the overall scale of the camera matrix is irrelevant, one can set a = 1 and eliminate daccording to d = −bc. This makes it possible to parameterize the camera matrix as

P =

(1 + b2)(1− c2) 4bc 2c(1− b2) 00 (1− b2)(1 + c2) −2b(1 + c2) 0

−2c(1 + b2) 2b(1− c2) (1− b2)(1− c2) z

.

By putting a = 1 two things happen. First the scale of the camera matrix is fixed,hence the left-hand 3 × 3 sub matrix in (1) will only be a rotation matrix up to scale.This will not have any further impact on the problem since the measurement equationsare homogeneous. The second consequence is that solutions with a = 0 will not beincluded. Since a ∈ R the probability for this is zero, but there might be problems if a isclose to zero. However, as the synthetic experiments will show this is no serious problem.

Assume now that we have two correspondences between image points and points thathave been seen in only one other model image. This gives two points on the viewingline Ci and Di associated to a point vi in the query image. If the line is representedwith Plücker coordinates [13] and the camera is converted to the correspondent Plückercamera the constraints above converts to a single equation. It is further on easy to seethat every nonzero element in the Plücker camera has a common factor of 1 + b2. Afterremoving the common factor, the constraint polynomials (p1, p2) are of order 2 in b andorder 4 in c.

The dimension of the quotient space C[b, c]/I is 16 with I = 〈p1, p2〉 which can bechecked with computer algebra [30]. By multiplying the polynomials with {1, b, c, bc}we obtain 8 equations in 24 monomials. It is then possible to express 8 of the monomialsin terms of the remaining 16 monomials

{bc4, b3c2, c4, bc3, b2c2, b3c, c3, bc2, b2c, b3, c2, bc, b2, c, b, 1}

which then form a basis for the quotient space C[b, c]/I . From this it is straightforwardto construct the 16 × 16 action matrix mc for the linear mapping C[b, c]/I 3 p(c) 7→cp(c) ∈ C[b, c]/I . From the eigenvalue decomposition of the matrix mc the 16 (somepossibly complex) solutions are obtained. Similar calculations give that there are 32 solu-tions for the (4,1) case.

4.2 Unknown Focal Length

For the case of unknown focal length we have one additional unknown. Thus we needone extra constraint. There are several interesting minimal cases: (1,3), (3,2) and (5,1).

101

PAPER III

However for the last case (assuming that all the five points were in correspondence withthe same view) one could solve the relative orientation problem using the six point algo-rithm [25] and then fix the scale using the known 3D correspondence.

Using (2) as parameterization for the camera matrix and assuming that two of the 3Dpoint correspondences are with

U1 =

0001

, U2 =

1001

, u1 =

101

(8)

it is possible to eliminate y = 0 and x = zf = g(b, c, d, f). We fix the scale by settinga = 1. For both the (1, 3) case and the (3, 2) case we get polynomial constraints inthe four remaining unknowns (b, c, d, f). Calculations with computer algebra [30] showthat there are 36 solutions for the (1, 3) case, 40 solutions to the (3, 2) case and 112 inthe (5, 1) case.

Constructing a Solver for the (1,3) Case

Using the parameterization above, one 3D correspondence and one 2D correspondenceremain. The 2D correspondence is handled as in the (2, 2) case by constructing thePlücker camera. This results in one equation. The other three required equations aregenerated from the last 3D point and from U2. If U2 is projected through the camera,two equations appear. By eliminating the depth λ one equation remains. The last twoequations are generated through the last 3D point correspondence (U3, u3). Let u3 =(x1, x2, 1)T and then construct the matrix,

B =

0 −1 x2

−1 0 x1

−x2 x1 0

, (9)

from which three equations can be generated by putting,

BPU3 = 0. (10)

These three equations are linearly dependent and only the two first are kept.The four constructed equations will now be multivariate polynomials where the equa-

tion constructed by the Plücker line will have degree 4 in (b, c, d) and degree 2 in f . Themaximal total degree of the polynomials in this equation will be 6. For the other threeequations derived from 3D-2D correspondences the total degree is 3 and the maximal foreach variable is 2 for (b, c, d) and 1 for f .

The next step in constructing the solver is to expand the coefficient matrix C. At thisstage the matrix will have size 4 × 82. The expansion is made by adding the equations

102

5. SYNTHETICAL EXPERIMENTS

when the polynomial constructed by the Plücker line is multiplied with all monomialsup to degree 4 but with no single variable with a degree higher than 2. The other threeequations are expanded with all monomials up to degree 7 but with no single variable witha degree higher than 5. After this expansion the coefficient matrix C is of size 980×873.

After C is expanded, we divide the monomials into the three sets E , R and B asoutlined above. The basis B is selected to be all monomials of degree 4 and lower exceptfor f4. The set R contains the monomials which are not in B but are produced whenthe monomials of B are multiplied with b. The rest of the monomials are collected in theset E . This partition gives 69 monomials in B, 34 inR and the remaining 770 in E .

We use LU-decomposition to eliminate monomials in the set E , which gives a subsetof equations, now with only 103 monomials. The (1,3) problem is numerically signifi-cantly more challenging than the (2,2) problem and to get a solution a combination oftechniques for improving numerical stability introduced in [6, 7, 5] are used. The tech-niques are essentially to (i) use a matrix factorization algorithm to select a numericallywell conditioned basis set B and (ii) to add some extra elements to B making it linearlydependent, but with the advantage that the added elements do not need to be expressedin terms of the basis since they are now a part of the basis.

5 Synthetical Experiments

5.1 The (2,2)-Solver

The purpose of this section is to evaluate the stability of the algorithm for solving the(2, 2) case introduced in Section 4. To this end we use synthetically generated data in theform of randomly generated cameras and points. This allows us to measure the typicalerrors and the typical number of plausible solutions, over a large range of cases.

The point features are drawn uniformly from the cube ±500 units from the originin each direction. The cameras (two known and one unknown) are generated at approx-imately 1000 units from the origin pointing roughly in the direction of the center of thepoint cloud.

The algorithm has been run on 10,000 randomly generated cases as described above.To evaluate the accuracy of the solutions we take the square root of the sum of the forreprojection errors. The result is illustrated in Figure 1. As can be seen, the error typicallystays as low as 10−15 to 10−10, but occasionally much larger errors occur. However, sincethe solver is used as a subroutine in a RANSAC engine, which relies on solving a largenumber of different instances, these very rare cases with poor accuracy are not a seriousproblem.

As shown in Section 4 the (2,2) calibrated case in general has 16 solutions. Sinceobviously only one of these solutions is the correct one it is interesting to investigate howmany plausible solutions are typically obtained. With plausible solutions we mean realvalued camera matrices that yield positive depths for all four problem points. In Figure 1 a

103

PAPER III

−15 −10 −5 00

0.05

0.1

0.15

0.2

0.25

0.3

0.35

log10

of the reprojection error

Fre

quen

cy

0 2 4 6 80

500

1000

1500

2000

2500

3000

3500

4000

Number of solutions

Fre

quen

cy

Figure 1: Statistics from the evaluation of the solver for the (2,2) case for calibratedcameras. The solver was run on 10,000 randomly generated cases. Left: Histogram overthe error in matrix norm between the estimated camera P ′ and the true camera P . Theerror is plotted on a logarithmic scale. Right: Histogram over the number of real valuedsolutions yielding positive depths.

histogram which shows the typical number of plausible solutions is given. As can be seenthe most common situation is one to four plausible solutions. In one of the 10,000 cases,the algorithm was unable to find a real solution with positive depths for all points. Thisis probably due to numerical problems when the points and/or cameras are unfortunatelypositioned (two or more real solutions irrespective of the sign of the depths were found inall cases). In three of the cases seven solutions were found and in one case eight plausiblesolutions were found. The average number of plausible solutions was 2.6 and the averagenumber of real solutions was 6.4. In some of the cases all 16 solutions were real.

5.2 The (1,3)-Solver

The synthetic experiments for the (1,3) problem were done in the same manner as in the(2,2) case. Four points were generated by randomly drawing from a cube with side length1000 centered at the origin and two cameras with a distance of approximately 1000 tothe origin. One of these cameras was treated as unknown and one was used to get thecamera to camera point correspondence. This gives one unknown camera with threepoint correspondences and one line correspondence. The experiment was run 10,000times.

The result is shown in Figure 2. As can be seen, the error is not as low as in the (2,2)case but the numerical precision is still good enough and in most cases the error staysbelow 10−5. The same figure also holds a histogram over the number of solutions thatare real and with positive focal length. The histogram shows that even though there canbe up to 36 real solutions in theory, the actual number of plausible solutions is usuallybelow ten.

104

6. LOCALIZATION SYSTEM EXPERIMENTS

−15 −10 −5 00

0.05

0.1

0.15

0.2

0.25

Log10

of relative error in focal length

Fre

quen

cy

0 5 10 150

200

400

600

800

1000

1200

1400

1600

1800

Number of solutions

Fre

quen

cy

Figure 2: Statistics from the evaluation of the solver for the (1,3) case for calibratedcameras with unknown focal length. The solver was run on 10,000 randomly generatedcases. Left: Histogram over the error focal length between the estimated camera and thetrue camera. The error is plotted on a logarithmic scale. Right: Histogram over thenumber of real valued solutions with positive focal length.

6 Localization System Experiments

To test the new methods on real data a complete localization system was constructed.Since this paper treats the pose problem, and not the construction of the model, therewill only be a short description of how the model was built. For more on model con-struction [23] is recommended. Localization with some of the new minimal cases wasthen tested and compared with two previous methods. In the calibrated case the previousmethod is the (0,3)-solver [12] and in the uncalibrated case we have compared with alinear resection algorithm using six correspondences.

6.1 Construction of the Model

The model was built from a set of unordered images with nothing known about thepositions of the cameras. The calibrations of the cameras were known for these images.On all images SIFT descriptors [16] were calculated. The descriptors were then usedtogether with the knowledge of the calibration to find the relative motion between everypair of cameras, with enough corresponding points, using the five point algorithm [20].By using the pairwise connection between all images the best corresponding triplet wasfound. By best triplet we here mean the set of three images with most correspondingpoints common for all three views. Two of the images in the triplet were then used as thefoundation of the model to fixate the unknown scale.

After that a third camera was added. This was done by using the known relativemotion to one of the first two cameras. Since this relative motion was known it is enoughwith one single point to get the scale. To find the scale a RANSAC engine was used.The remaining cameras were then added stepwise choosing the camera with most known

105

PAPER III

correspondences to points that were known in the 3D-space. The placement was thenperformed in the same way as with the third camera. After each camera was added bundleadjustment was applied to points and cameras. In the beginning all points and all cameraswere used in the bundle adjustment step, but to speed up the process only the new cameraand new 3D points were optimized when the size of the model increased. The modelconstruction is not focused on removing outliers even though RANSAC is used, whichwill result in many outliers remaining in the model. The outliers will be handled in thelocalization phase.

The complete model holds the following data so it can be used in a hybrid localizationsystem:

SIFT: All SIFT features found in all images.

Points: The normalized coordinates of all SIFT features in all images.

Structure: The coordinates of all triangulated points in the 3D-space.

Cameras: All positions of the cameras in the model.

Tracks: Groups of feature points that correspond to a single 3D point.

In addition to this the model also holds mappings between these objects.For the experiments two models were constructed of the same area but with slightly

different properties. The first model, called Model 1 in the following, is built from asequence of only seven images. These images are taken along a street with approximatelyfifty meters baseline between the most separated views. The sparsity of this model makesit very challenging for a localization system. The second model, called Model 2, wasconstructed using thirty images but covers a much shorter distance. The maximal base-line between two images is approximately 10 meters. Examples of both images used toconstruct the model and test images are shown in Figure 3. The test images were takenapproximately two months after the model images and with a different camera. Somestatistics on the two models are given in Table 2.

Model 1 Model 2Number of cameras 7 30Number of SIFT features 18675 65385Number of 3D points 593 3199Number of 2D points 17097 55925

Table 2: Some data about the models. The SIFT features are the complete set of SIFTfeatures from all images in the model, the 3D points are all triangulated point and the2D points are all SIFT features which do not correspond to a triangulated 3D point.

106


Figure 3: On the left is one of the images used to construct the models and on the rightis one of the test images. The test images were taken about two months after the modelimages. Different cameras were used when the model and test images were taken.

6.2 Testing of Localization

Two of the new methods given in this paper are tested: the (2,2) solver for calibratedcameras and the (1,3) solver for calibrated cameras with unknown focal length. In the(2,2) case the number of inliers is measured after the RANSAC step has been carried outand is compared with the number of inliers when the three point solver is used in thesame RANSAC loop. All located cameras were also manually examined, since we do nothave any ground truth data, to decide whether the localization was correct.

In the (1,3) case more experiments are done. Due to the fact that the focal length iscalculated this value can be verified. This is done by first calculating the calibration ofthe camera with more data than during the localization step. By then normalizing thecoordinates we know that the focal length should be one since the inner calibration isremoved.

The comparison for this method is made with uncalibrated cameras. To do that theprincipal point of the camera is assumed to be in the middle of the image, the skew is fixedto zero and the aspect ratio to one. These assumptions on the camera matrix are true foralmost all digital cameras constructed today, even though the principal point may differsomewhat in location. Since the (1,3) solver is tested in uncalibrated images the solver iscompared with a linear solver of six three dimensional points. This is not a minimal case,but it is probably the most used method on uncalibrated images today.

The localization step is carried out as follows for all methods used. First the SIFTdescriptors are calculated for the query image. After that, all these SIFT features arecompared with those in the model to find the closest match. To make this more robusta distance ratio between the best and the second best match is used. The threshold forthe ratio is fixed to 0.7 and the match is only accepted if the matching score for the firstmatch is 0.7 better than the second match. For each established point track, we only

107

PAPER III

match to the best point in the track.This will give a set of matches between the query image and the model. Some of these

matches are to 3D points and some correspond to 2D features without triangulated 3Dpoints. These matches will also contain a significant number of outliers. To handle theoutliers RANSAC is used with the proposed minimal solvers. When the number of inliersis counted during the RANSAC iterations the number of 2D correspondences is dividedby 10. This is done to balance the fact that the number of 2D points in the model isabout ten times as large as the number of 3D points. The thresholds for the reprojectionerror in the RANSAC algorithm is fixed to 0.6% of the image size as in [23]. For the 2Dpoints this value is reduced by a factor ten since the 3D points are first triangulated usingthe 2D points. As a final step of the localization, bundle adjustment is performed on thecamera position using the 3D inliers.

Test of the (2,2) Solver

The (2,2) solver is tested and compared with the three point solver [12]. Both thesemethods assumes calibrated cameras. Figure 4 shows an example of a test image whichhas been correctly positioned by both methods.

Figure 4: The reprojection of visible 3D points when the (2,2) solver (left) and the threepoint solver (right) are used. One out of four model points in Model 2 is reprojected andboth solvers have resulted in a correct position.

To evaluate the solver, all images that had three or more correspondences to 3D pointsare used. On these images, 500 RANSAC iterations were performed and the number ofinliers was counted. For the (2,2) solver, both the numbers of 3D and 2D inliers aregiven. The result of this experiment is shown in Table 3. The three point solver gives aslightly higher number of 3D inliers. The fact that an incorrectly placed image usuallygives the (2,2) solver two inliers but gives the three point solver three inliers is probablythe reason that the three point solver has a slightly higher number of inliers.

108


Model 1 Model 2 Models 1 & 2Inliers 2D Inliers 3D Inliers 2D Inliers 3D Inliers 2D Inliers 3D

(2,2) solver 9.83 4.60 7.63 9.41 8.68 7.11(0,3) solver 4.72 9.88 7.42

Table 3: The mean number of inliers after 500 RANSAC iterations for the (2,2) solverand the three point solver. Model 1 is a sparse model and Model 2 much denser.

Furthermore, the number of correctly located images was counted. For the (2,2)solver, the number of correctly placed cameras was 55 out of 65 images in Model 1 andthe corresponding figure for the three point solver was 45. The total number of images,are as before those test images that had at least three 3D correspondences to the model.The images with fewer correspondences are not counted. In Model 2 the figures are closer.There were 54 images correctly placed with the (2,2) solver and 53 images with the threepoint method. The total number of images was 71. The reason why the performancediffers more in the first model is probably due to the sparsity of that model. When themodel is sparse the importance of also using the 2D correspondences increases.

Test of the (1,3) Solver

If we assume nominal values for the principal point, skew and aspect ratio, then the (1,3)solver can be used on uncalibrated images. Making these assumptions, we compare theresult with the six point linear solver for the uncalibrated case. The results of an imagepositioned by the (1,3) solver and the six point solver are shown in Figure 5.

Figure 5: The reprojection of visible 3D points when the (1,3) solver (left) and the sixpoint solver (right) is used. In this example Model 1 is used and all points in front of thecamera are reprojected. The (1,3) solver has a correct position but the six point solver hasnot.

As in the previous case, we used all images that had more than six 3D correspondences

109

PAPER III

and hence could be used in both algorithms. On these images the 500 RANSAC itera-tions were performed and the number of inliers was counted. For the (1,3) solver boththe numbers of 3D and 2D inliers are given. The result of this experiment is presented inTable 4. The results are very similar to those of the previous case.

Model 1 Model 2 Models 1 & 2Inliers 2D Inliers 3D Inliers 2D Inliers 3D Inliers 2D Inliers 3D

(1,3) solver 14.7 6.52 9.84 11.36 11.42 9.78(0,6) solver 7.37 12.20 10.63

Table 4: The number of inliers after 500 RANSAC iterations for the (1,3) solver and thesix point solver. Model 1 is a sparse model and Model 2 much denser.

The number of correctly located images was also counted, as in the previous case. Forthe (1,3) solver the number of correctly placed cameras was 22 in Model 1, out of 32images. The six point method on the other hand only managed to locate 4 of these 32images correctly. In the second model, the (1,3) method succeeded in the localization in41 out of 62 trials. On the same data, the six point solver managed to place 14 camerascorrectly.

Determining the Focal Length

The (1,3) solver also estimates the focal length and this can be used to evaluate the solversince we can find the calibration of the camera using other data. The experiment is carriedout as follows. The localization step is performed as usual but the calibration dependencyis removed in the beginning by normalizing the image coordinates. The result shouldthen be that the (1,3) solver returns a focal length equal to one. To test this every imagewith enough of correspondences was localized. All images that got at least one extra inlierin 3D were then used in kernel voting [15] to estimate the focal length. The constraintthat one extra inlier should appear is used to remove a large part of the miss placed images.The result of the kernel voting is shown in Figure 6. As can be seen in the figure, the mainpeak is localized close to one.

7 Conclusions

In this paper we have presented new minimal cases for the resection problem. These usea mixture of correspondences to known 3D points and correspondences to points thathave only been found in one image in the model. In all except one of these minimalcases we have given an upper bound on the possible number of solutions with use ofGröbner basis techniques. In two of the cases we have also presented and evaluatedsolvers. The first of these cases is the (2,2) problem that finds the pose for a calibratedcamera. The solution with Gröbner basis techniques leads to a very fast and numerically

110

7. CONCLUSIONS

−1 −0.5 0 0.5 1 1.5 20

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

Focal length

Fre

quen

cy

Figure 6: The result of the kernel voting for the focal length. The true focal length in thisexperiment is one. In the kernel voting only estimations with more the minimal number,which is three, of inliers are used in the voting process.

stable algorithm. We also present a solver for the (1,3) case for cameras with unknownfocal length. This problem is much more complicated than the (2,2) problem but wecan still present a numerically sound algorithm that is fast with Gröbner basis methods.Both these methods are tested in a complete localization system and they are shown toimprove on the current state of the art. The experiments show also that the (1,3) solverfor pose and unknown focal length can be used on uncalibrated cameras under somereasonable assumptions. With these assumptions the solver shows promising results. Theimprovements are most significant on sparse models.

In the paper all methods are used separately. A convenient way to extend this workwould be to use all methods simultaneously and make an automatic choice of whichmethod to use depending on the data present. By that our new methods can help toimprove the robustness of every localization system using hybrid cases. In future workwe intend to apply the idea of hybrid features to evaluate the performance on publiclyavailable benchmarks for example Zurich Building Image Database to compare with thestate of the art.

111

PAPER III

112

Bibliography

[1] D. Bayer and M. Stillman. Macaulay. www.math.columbia.edu/bayer/Macaulay/,1994. An open source computer algebra software.

[2] M. Betke and L. Gurvits. Mobile robot localization using landmarks. IEEE Trans.on Robotics and Automation, 13(2):251–262, 1997.

[3] J. Borenstein, B. Everett, and L. Feng. Navigating Mobile Robots: Systems and Tech-niques. A. K. Peters, Ltd., Wellesley, MA, 1996.

[4] W. Burgard, A.B. Cremers, D. Fox, D. Hahnel, G. Lakemeyer, D. Schulz,W. Steiner, and S. Thrun. The interactive museum tour-guide robot. In NationalConference on Artificial Intelligence (AAAI’98), Madison, Wisconsin, USA, 1998.

[5] M. Byröd. Fast and stable polynomial equation solving and its application to com-puter vision. Licentiate thesis, Centre for Mathematical Sciences LTH, Lund Uni-versity, Sweden, 2008.

[6] M. Byröd, K. Josephson, and K. Åström. Fast optimal three view triangulation. InAsian Conference on Computer Vision, 2007.

[7] M. Byröd, K. Josephson, and K. Åström. Improving numerical accuracy of gröbnerbasis polynomial equation solvers. In Proc. 11th Int. Conf. on Computer Vision, Riode Janeiro, Brazil, Rio de Janeiro, Brazil, 2007.


[9] D. Cox, J. Little, and D. O’Shea. Using Algebraic Geometry. Springer Verlag, 1998.

[10] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for modelfitting with application to image analysis and automated cartography. Commun.Assoc. Comp. Mach., 24:381–395, 1981.

[11] J. A. Grunert. Das pothenot’sche problem, in erweiterter gestalt; nebst bemerkun-gen über seine anwendung in der geodäsie. Archiv der Mathematik und Physik,1:238–248, 1841.

[12] R. M. Haralick, C. Lee, K. Ottenberg, and M. Nölle. Analysis and solutions for thethree point perspective pose estimation problem. In Proc. Conf. Computer Visionand Pattern Recognition, pages 592–598, 1991.

113

PAPER III


[14] E. Kruppa. Zur Ermittlung eines Objektes aus zwei Perspektiven mit innerer Ori-entierung. Sitz-Ber. Akad. Wiss., Wien, math. naturw. Kl. Abt, IIa(122):1939–1948,1913.

[15] H. Li and R. Hartley. A non-iterative method for lens distortion correction frompoint matches. In Workshop on Omnidirectional Vision, Beijing China, October2005.

[16] D. Lowe. Distinctive image features from scale-invariant keypoints. Int. JournalComputer Vision, 2004.

[17] Y. Ma, S. Soatto, J. Kosecka, and S. Sastry. An Invitation to 3-D Vision: From Imagesto Geometric Models. Springer Verlag, 2003.

[18] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo frommaximally stable extremal regions. In British Machine Vision Conf., pages 384–393,Cardiff, UK, September 2002.

[19] P.M. Newman, J.J. Leonard, J. Neira, and J.Tardos. Explore and return: Experi-mental validation of real time concurrent mapping and localization. In Int. Conf.Robotics and Automation, pages 1802–1809, 2002.

[20] D. Nistér. An efficient solution to the five-point relative pose problem. In Proc.Conf. Computer Vision and Pattern Recognition, volume 2, pages 195–202. IEEEComputer Society Press, 2003.

[21] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In Conf.Computer Vision and Pattern Recognition, volume II, pages 2161–2168, New YorkCity, USA, 2006.

[22] S. Se, D.G. Lowe, and J.J. Little. Vision-based global localization and mapping formobile robots. IEEE Transactions on Robotics, 21(3):364–375, 2005.

[23] N. Snavely, S. M. Seitz, and R. Szeliski. Modeling the world from internet photocollections. International Journal of Computer Vision, 2007.


[25] H. Stewénius, D. Nistér, F. Kahl, and F. Schaffalitzky. A minimal solution forrelative pose with unknown focal length. In Conf. Computer Vision and PatternRecognition, volume 2, pages 789–794, San Diego, USA, 2005.

114

BIBLIOGRAPHY


[27] P. Torr and A. Zisserman. Robust computation and parametrization of multipleview relations. In Int. Conf. Computer Vision, pages 727–732, Mumbai, India,1998.

[28] B. Triggs. Camera pose and calibration from 4 or 5 known 3d points. In Proc.7th Int. Conf. on Computer Vision, Kerkyra, Greece, pages 278–284. IEEE ComputerSociety Press, 1999.

[29] T. Tuytelaars and L. Van Gool. Matching widely separated views based on affineinvariant regions. Int. Journal Computer Vision, 59(1):61–85, 2004.

[30] J. Verschelde. Algorithm 795: Phcpack: a general-purpose solver for polynomialsystems by homotopy continuation. ACM Trans. Math. Softw., 25(2):251–276,1999.

115

N EW R ESULTS ON T RIANGULATION P OLYNOMIAL E QUATION … · 2008-05-22 · N EW R ESULTS ON T...

Documents

Transcript of N EW R ESULTS ON T RIANGULATION P OLYNOMIAL E QUATION … · 2008-05-22 · N EW R ESULTS ON T...