O. D. Faugeras and M. Hebert. The Representation, Recognition, and Locating of 3-D Objects. The International Journal of Robotics Research 5(3): 27-52, 1986. DOI: 10.1177/027836498600500302. Online version: http://ijr.sagepub.com/content/5/3/27

The Representation, Recognition, and Locating of 3-D Objects

O. D. Faugeras
INRIA, Domaine de Voluceau-Rocquencourt, B.P. 105, 78153 Le Chesnay, France

M. Hebert
Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213

Abstract

The problem of recognizing and locating rigid objects in 3-D space is important for applications of robotics and navigation. We analyze the task requirements in terms of what information needs to be represented, how to represent it, what kind of paradigms can be used to process it, and how to implement the paradigms. We describe shape surfaces by curves and patches, which we represent by linear primitives, such as points, lines, and planes. Next we describe algorithms to construct this representation from range data. We then propose the paradigm of recognizing objects while locating them. We analyze the basic constraint of rigidity that can be exploited, which we implement as a prediction and verification scheme that makes efficient use of the representation. Results are presented for data obtained from a laser range finder, but both the shape representation and the matching algorithm are general and can be used for other types of data, such as ultrasound, stereo, and tactile.

1. Introduction

In this article we focus on the task of constructing descriptions of static scenes containing various kinds of 3-D objects when range data and 3-D models of these objects are available, and we explore the problem of recognizing and locating such objects in the workspace. One of the objects we want to be able to handle in the vision program is presented in Fig. 1. The specific range data that have been used in the examples presented in this article come from a laser range finder developed at INRIA (Faugeras et al. 1982), but the ideas and the results should be fairly independent of the origin of the data. With this in mind, we ask ourselves the following questions: How do we represent models and scenes? What constraints and what paradigms are relevant?

Recently, there has been a surge in the development of methods for dealing with range data. Recovering depth information from images has always been an important goal in computer vision. Several stereo programs have been developed: e.g., Marr and Poggio (1979); Grimson (1981); Baker and Binford (1982); Nishihara (1983); and Ohta and Kanade (1983). These programs are capable of producing fairly accurate and dense range maps in times that will eventually become realistic if the right hardware is built. Some promising techniques like shape from X (Horn 1975; Witkin 1981) have also been investigated.

Other approaches to the computation of depth, based on the idea of active ranging, have been proposed. These approaches fall into two broad categories: (1) active triangulation and (2) time of flight techniques. Active triangulation techniques use an extra source of light to project some patterns onto the objects to be measured, thereby reducing the stereo-matching problem complexity (Faugeras et al. 1982; Oshima and Shirai 1983). Time of flight techniques send some energy (ultrasonic or electromagnetic) toward the scene and measure the time it takes for some of it to return to the source after reflection. In both cases, distances to scene points are immediately available without further processing.

None of the techniques we have mentioned so far use direct contact. We should also mention tactile sensing methods, which can provide information about positions of points on the grasped object as well as local normal orientation (Gaston and Lozano-Perez 1983). A lot of work still remains to be done to obtain faster and more accurate range sensors using the methods we just mentioned, but it is realistic to ask ourselves the following types of questions:

What can be done with these data?
What task domains can be tackled?
What kinds of representations and computational paradigms are useful?
What kinds of constraints do they allow us to use?
How much information is actually needed, given those constraints, to solve a particular problem?

To summarize our conclusions, we shall find that:

1. Representations should be in terms of linear primitives, such as points, lines, and planes, even if at some intermediate level we deal with things like curved surface patches.

2. The fundamental constraint to be exploited is that of rigidity.

3. The basic paradigm to be used is that of recognizing while locating (or vice versa).

Our work is related to that of Oshima and Shirai (1983), who were pioneers in this area, and also to that of Grimson and Lozano-Perez (1983), Bolles and Horaud (1984), and Brady et al. (1984).

2. Representing 3-D Shapes

2.1. EXTRACTING RELEVANT PRIMITIVES FROM 3-D DATA

The question of representing 3-D shapes is a very basic one. Because a representation is a set of data structures and the algorithms that operate on them, it is meaningless to discuss a representation without reference to a class of applications. In light of this, we now briefly discuss several categories of representations and relate them to our own choices.

Fig. 1. Example of an industrial part.

Traditionally there have been two approaches to the representation of shapes. Hierarchical representations deal explicitly with various resolutions, thereby allowing the objects to be manipulated at different levels of precision. Homogeneous representations, on the other hand, deal with only one resolution. We prefer homogeneous representations because they are simpler to use and build and because it is still not certain that we gain anything with hierarchical representations in terms of the specific problems we are trying to solve. Some promising work in this area has been performed, however: e.g., Marr and Poggio (1979); Brady et al. (1984).

Next we address the issue of volume and surface representations. We believe that accessibility is the key point here: i.e., whether or not the representation can be extracted reliably from the output of the sensors. Volume representations are potentially very rich and lend themselves to the computation of many other representations. Descriptions obtained by decomposing the inside volume of the objects into elementary volumes such as cubes or rhombododecahedra are not very well suited to our problem since they depend on a reference grid and are not intrinsic to the objects (Ponce 1983). We believe that intrinsic object representations, such as the one proposed by Boissonnat (1984), where the volumes of objects are decomposed into tetrahedra, will prove to be much more powerful than grid-dependent decompositions.

Another popular type of representation is based on the idea of a skeleton as a complete summary of the shape as well as a way of making explicit some important symmetries (Agin 1972; Marr and Poggio 1979; Boissonnat 1984; and Brady and Asada 1984). We believe that this kind of representation can also be extremely useful, but it is not very well suited to our specific problem because it is not robust to partial occlusion. Indeed, a small perturbation in the measurements may imply a fairly big change in the shape of the skeleton. However, recent work by Brady et al. (1984) seems to indicate that some significant symmetry properties can be reliably extracted by local operators.

Surface representations seem ideal for recognizing and localizing objects for at least three reasons:

1. Robustness to occlusion. We can still find the equation of a planar patch even though it is partially hidden.

2. Robustness to the position of the viewpoint. The type of surface patch (e.g., planar, quadric) is fairly stable with respect to variations in the position of the viewpoint.

3. It is possible to choose a description of the surface representation that follows simple rules of transformation when rotated or translated.

In view of all this, our surface description is in terms of points, curves, and surface patches. Points are corners on the surface or centers of symmetry, as we shall see later in this paper. Curves are either internal boundaries or symmetry axes. Surface patches are either planes or quadrics. Another extremely important issue is deciding what features are used to represent these primitives.

Two requirements are necessary for solving the 3-D recognition and localization problem. First, we want the features to be somewhat stable with respect to partial occlusion. Second, we want the features to carry enough information to allow us to recover position and orientation. Standard numerical features, such as elongations, length, perimeter, and surface area, do not satisfy any of these requirements and should therefore be used with caution. Topological features, such as connectivity, genus, or number of neighbors, suffer from the same handicap. On the other hand, geometric features, such as equations of curves or surface patches, are more stable with respect to partial occlusion and can be used to recover position and orientation, as we shall see later.

It is important to note that primitives are ultimately described in terms of linear entities (points, lines, and planes) because we think that most of the computations used for the control of the matching process are intractable when nonlinear representations are used: e.g., when the equation of a quadric surface is used instead of its principal directions. Moreover, it seems that linear features are sufficient for describing a wide range of objects.

2.2. CHARACTERISTIC POINTS ON THE SURFACE

The idea of extracting and matching characteristic points has been used in the field of intensity image analysis, and it can also be used in the case of 3-D data. For example, one can extract spikes of an object, which are defined as local maxima of the curvature. This class of primitives is not suitable for the control of the search problem because matching two points does not put a strong constraint on the rigid transformation.

2.3. BOUNDARIES

2.3.1. Definition

Internal boundaries such as troughs or creases contain important information about an object's shape, just like edges in an image, because they are localized in space and can be robustly extracted even in the presence of measurement noise. Internal boundaries are curves on the object where its surface undergoes a C^1 discontinuity. Brady et al. (1984) have applied the 1-D curvature primal sketch to principal lines of curvature to detect local extrema in curvature. This method is expensive, and we proceed differently.

2.3.2. Detection Algorithms

Now we discuss two different detection algorithms based on the preceding definition. One method consists in noticing that when we cross an internal edge, one of the principal curvatures reaches a local maximum. This is the principal curvature in the direction perpendicular to the edge (see Fig. 2). One method for detecting internal edges follows (Ponce and Brady 1985):

1. Compute principal curvatures and directions at every point by locally fitting polynomials (in practice, quadrics) to the set of range values.

2. Perform nonmaxima suppression in the principal directions.

3. Link the local maxima in the principal direction orthogonal to that of the maximum.

Fig. 2. Definition of the internal boundaries.

Another method is based on the observation that when we cross an internal boundary, the normal to the surface is discontinuous under the general viewer assumption. The problem is thus reduced to that of finding a local maximum of the magnitude of the directional derivative of the normal in some local coordinate system. One method for detecting internal edges follows:

1. Estimate the normal at every point.

2. Compute the magnitude of the directional derivative of the normal in a number of directions (typically 8) at every point.

3. Perform nonmaxima suppression.

Mathematically, both methods are equivalent. In practice, however, we have found that the second method is more effective because of some noise sensitivity problems attached to the estimation of principal curvatures in the first method. However, Ponce and Brady (1985) have shown that smoothing algorithms may significantly reduce the sensitivity to noise. Results of applying the second method to one view of the object in Fig. 2 are shown in Fig. 3.
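The second method maps directly onto a range image whose normals have already been estimated. The following sketch is our own (the paper gives no code, and the noise threshold is a hypothetical parameter): it measures the finite difference of the unit normal over eight directions and keeps the local maxima along the strongest direction.

```python
import numpy as np

def internal_boundaries(normals, n_dirs=8, threshold=0.2):
    """Sketch of the second detection method. `normals` is an H x W x 3
    array of unit normals; `threshold` is a hypothetical noise floor,
    not a value from the paper."""
    H, W, _ = normals.shape
    # 8 neighbor offsets approximating the sampled directions
    offsets = [(0, 1), (1, 1), (1, 0), (1, -1),
               (0, -1), (-1, -1), (-1, 0), (-1, 1)][:n_dirs]
    mag = np.zeros((H, W))
    best = np.zeros((H, W), dtype=int)
    for k, (di, dj) in enumerate(offsets):
        shifted = np.roll(np.roll(normals, -di, axis=0), -dj, axis=1)
        d = np.linalg.norm(shifted - normals, axis=2)  # finite difference of n
        better = d > mag
        mag[better], best[better] = d[better], k
    edges = np.zeros((H, W), dtype=bool)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            di, dj = offsets[best[i, j]]
            m = mag[i, j]
            # nonmaxima suppression along the direction of largest change
            if m > threshold and m >= mag[i + di, j + dj] and m >= mag[i - di, j - dj]:
                edges[i, j] = True
    return edges
```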

2.3.3. Representation of Boundaries and the Orientation Problem

Fig. 3. Results of internal boundary detection.

The outcome of the previous algorithms is a set of discrete curves in 3-D space that can be represented in various ways. Our previous experience with 2-D curves (Ayache and Faugeras 1984) leads us to believe that some reasonable polygonal approximation scheme is sufficient for the problem we want to solve. Therefore, we propose to represent edges as polygonal chains. This raises the question of the ambiguity of the representation of a line in 3-D space. As shown in Fig. 4, we represent a line by the pair (v, d), where d is the vector distance to the origin and v is a unit vector parallel to the line. Notice that the representation (−v, d) is equivalent to (v, d). Consequently, contrary to the case of occluding edges, in which a segment has an intrinsic orientation with respect to the direction of observation, there is no intrinsic way of choosing the orientation of those line segments. This in turn increases the combinatorics of the problem.
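As a small illustration of this representation (a sketch under our own conventions, not code from the paper), the pair (v, d) can be built from two points on the line, and swapping the points flips v while leaving d unchanged:

```python
import numpy as np

def line_rep(p, q):
    """(v, d) representation of the line through points p and q:
    v is a unit direction, d the vector from the origin to the foot of
    the perpendicular (d is orthogonal to v). Both (v, d) and (-v, d)
    describe the same line, which is the ambiguity discussed above."""
    v = (q - p) / np.linalg.norm(q - p)
    d = p - np.dot(p, v) * v        # remove the component of p along v
    return v, d

p, q = np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])
v1, d1 = line_rep(p, q)
v2, d2 = line_rep(q, p)            # opposite orientation, same d
assert np.allclose(d1, d2) and np.allclose(v1, -v2)
```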

2.4. SURFACE REPRESENTATION

2.4.1. Type of Primitives

Fig. 4. Representation of the primitives. A. Line. B. Plane. C. Quadric.

The type of surface primitives that can be used for recognition of objects is highly constrained by the feasibility of the corresponding segmentation algorithm. It is difficult to control the segmentation algorithm when the degree of the surface is high. We need more points to reliably fit a high-degree polynomial surface than a low-degree one, and therefore regions will tend to be more blurred in the first case than in the second. High-order surfaces also cannot be used as efficiently in the matching process as linear ones, because the coefficients of a polynomial of some degree are polynomial functions of the same degree of the rotation and translation applied to the patch.

The two types of surface primitives we use are planes and quadric patches. We now present some notation and conventions for representing these primitives and then describe the segmentation algorithm.

Planes A plane is represented by a vector v and a scalar d. The equation of the plane is

v · x = d,

where "·" is the inner product, v is the normal, and d is the distance to the origin (see Fig. 4). A plane has two different equivalent definitions, (v, d) and (−v, −d). This orientation problem is easily solved by orienting the normal toward the outside of the object.

Quadrics The standard representation of a quadric surface is a 3 × 3 symmetric matrix A, a vector v, and a scalar d. The equation of the surface is

x^T A x + 2 v · x + d = 0.   (1)

As we have mentioned previously, we want to avoid using high-degree polynomials. We prefer a representation of quadric surfaces using linear features (see Fig. 4), as described below.

The principal directions of the quadric are the eigenvectors v_1, v_2, v_3 of A. Notice that these vectors do not have a canonic orientation.

The center c of the quadric (when it exists) is defined by

A c + v = 0.

Other information that could be used is the type of surface (e.g., cylinder, ellipsoid), which is related to the signs of the eigenvalues of A. The type of surface is reliable only if the eigenvalues are large. In the case of quadric surfaces, the representation is unique up to a scale factor. The way of defining a unique representation with respect to scaling is discussed in the section on surface fitting.
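A minimal sketch of extracting these linear features, assuming the quadric form of Eq. (1) above (the function name and example are ours):

```python
import numpy as np

def quadric_features(A, v):
    """Linear features of the quadric x^T A x + 2 v.x + d = 0: the
    principal directions are the eigenvectors of A, and the center
    (when A is invertible) solves A c + v = 0. Eigenvector signs are
    arbitrary, which is the orientation ambiguity noted in the text."""
    eigvals, eigvecs = np.linalg.eigh(A)       # A is symmetric
    center = np.linalg.solve(A, -v)            # fails if the quadric has no center
    return eigvals, eigvecs, center

# Example: ellipsoid x^2 + 2y^2 + 4z^2 with center moved to (1, 2, 3)
A = np.diag([1.0, 2.0, 4.0])
c = np.array([1.0, 2.0, 3.0])
v = -A @ c
print(quadric_features(A, v)[2])               # recovers the center (1, 2, 3)
```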

2.4.2. The Segmentation Problem

The Region-Growing Algorithm Primitive surfaces can be extracted from the original data in two ways. First, the whole set of points is considered as a primitive and is split into smaller regions until a reasonable approximation is reached. Second, we can iteratively merge the current set of regions (the initial set of regions being the set of original points) until the approximation error becomes too large. So far, the only splitting schemes are with respect to a fixed coordinate system (e.g., octrees, quadtrees); we think that a much better way to go would be to split in a way that is intrinsic to the object (prism trees), but we have not been able to come up with such a scheme. Therefore, the region-growing method seems more appropriate.

Let us assume that we are able to compute a measure E(S) of the quality of the fit between region S and the generic primitive surface. (This measure is described in the next section.) The region-growing algorithm merges the neighboring nodes of a current graph of surface patches and stops when no more merges can be performed according to a maximum error value E_max. The initial graph can be either derived from a range image or built from a set of measurements on the object in the form of a triangulation (see Fig. 5).

Several strategies can be implemented for the selection of the pair of regions to be merged at a given iteration and for the kind of control on the error. The best solution is to use as global a strategy as possible, which means that the evolution of the segmentation is determined by the quality of the overall description of the surface. The global control prevents the segmentation from being perturbed by local noisy measurements. In terms of implementation, the global control has two consequences:

1. At each iteration, the regions R_i and R_j that produce the minimum error E(R_i ∪ R_j) among the whole set of pairs are merged.

2. The program stops when the global error Σ_i E(R_i) is greater than E_max.
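A sketch of this globally controlled merging loop, using a priority queue over candidate merges (the data layout and all names are ours; the paper describes the algorithm only in prose):

```python
import heapq

def region_growing(regions, neighbors, fit_error, e_max):
    """At each iteration, merge the pair of regions with the smallest
    merged fitting error; stop when the global error would exceed e_max.
    `regions` maps integer ids to point sets; `fit_error(points)` returns
    E(S) for the best-fitting primitive (see the fitting section below)."""
    err = {i: fit_error(pts) for i, pts in regions.items()}
    total = sum(err.values())
    heap = [(fit_error(regions[i] | regions[j]), i, j) for i, j in neighbors]
    heapq.heapify(heap)
    alive, fresh = set(regions), max(regions) + 1
    while heap:
        e, i, j = heapq.heappop(heap)          # globally best candidate merge
        if i not in alive or j not in alive:
            continue                           # stale pair: a side was merged already
        if total - err[i] - err[j] + e > e_max:
            break                              # global error budget would be exceeded
        regions[fresh], err[fresh] = regions[i] | regions[j], e
        total += e - err[i] - err[j]
        alive -= {i, j}
        alive.add(fresh)
        for k in alive - {fresh}:              # simplification: try all pairs, not just graph neighbors
            heapq.heappush(heap, (fit_error(regions[fresh] | regions[k]), fresh, k))
        fresh += 1
    return [regions[k] for k in alive]
```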

Surface Fitting The computation of the error measure is presented in this section, with planes and quadrics treated separately.

Fig. 5. Triangulation of the Renault part.

Planes In the case of planar patches, the error E is defined as the distance between the points x_i of the region and the best-fitting plane in the least-squares sense:

F(v, d) = Σ_{i=1}^{N} (v · x_i − d)^2.   (2)

The function F is homogeneous with respect to the parameters v and d. Therefore, we have to constrain the problem in order to avoid the trivial solution (0, 0). The most natural constraint is ||v|| = 1, which has the advantage of being invariant with respect to rotations and translations. At the minimum, ∂F/∂d = 0. Therefore, we obtain the expression for d:

d = (1/N) Σ_{i=1}^{N} v · x_i = v · x̄,

where N is the number of points and x̄ is their centroid. Reporting this relation in Eq. (2), we obtain

F(v) = v^T M v, where M = Σ_{i=1}^{N} (x_i − x̄)(x_i − x̄)^T.

Finally, the direction of the best-fitting plane is the vector v_min corresponding to the smallest eigenvalue λ_min of the matrix M. The resulting error is the smallest eigenvalue λ_min.
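In code, the whole fit reduces to one symmetric eigendecomposition; the following is a straightforward transcription of the derivation above:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an N x 3 array of points: d = v . centroid,
    and v is the eigenvector of the scatter matrix M with the smallest
    eigenvalue (which is also the fitting error)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    M = centered.T @ centered                # 3 x 3 scatter matrix
    eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    v = eigvecs[:, 0]                        # direction of the best-fitting plane
    return v, float(np.dot(v, centroid)), float(eigvals[0])

pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0.0]])
v, d, err = fit_plane(pts)                   # v = (0, 0, +/-1), d = 0, err = 0
```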

Quadrics The approach is the same as with the planes except that we do not have a clear definition of the distance from a point to a quadric surface. The simplest way is to define the error measure, using the notations of Eq. (1), as

F = Σ_{i=1}^{N} (x_i^T A x_i + 2 v · x_i + d)^2.   (3)

For the purpose of segmentation, we represent a quadric surface by ten numbers a_1, ..., a_10 related to the previous description as follows: the first six encode the symmetric matrix A (quadratic terms), and the last four encode v and d (linear and constant terms).

The function F is homogeneous with respect to the parameters a_i. Therefore, we have to constrain the problem in order to avoid the trivial solution [0, ..., 0]. Several constraints can be designed since there exists no natural constraint as in the case of planes (||v|| = 1). The most frequently used constraints normalize either the full parameter vector, its linear part, or its quadratic part:

||P|| = 1, ||P_2|| = 1, and Tr(A^2) = 1,

where Tr is the trace of the matrix and the vectors P, P_1, P_2 are defined below. The first two constraints are not invariant with respect to a rigid transformation, rotations, or translations. This implies that a surface observed from two different directions will have two different sets of parameters when expressed in the same coordinate system. So, we use the third constraint, which is invariant by translation, because A is invariant, and by rotation, because the trace operator is also invariant.

The three vectors are defined as P = (a_1, ..., a_10)^T, P_1 = (a_1, ..., a_6)^T, and P_2 = (a_7, ..., a_10)^T. With the coefficients of P_1 scaled so that ||P_1||^2 = Tr(A^2), our constraint is

||P_1|| = 1.

Since the function F defined in Eq. (3) is a quadratic function of the parameters, it can be redefined as

F(P) = Σ_{i=1}^{N} P^T M_i P,

where M_i is a symmetric matrix of the form

M_i = [ B_i   C_i
        C_i^T D_i ].

B_i, C_i, and D_i are 6 × 6, 6 × 4, and 4 × 4 matrices respectively. In addition, B_i and D_i are symmetric matrices. We shall not detail the values of the matrices, which are easily computed from Eq. (3).

If we define the matrices B = Σ_{i=1}^{N} B_i, C = Σ_{i=1}^{N} C_i, and D = Σ_{i=1}^{N} D_i, the minimization of Eq. (3) becomes

min_P P^T M P with ||P_1|| = 1, where M = [ B   C
                                            C^T D ].   (6)

The minimum is found by using the Lagrange multipliers method. With this method, Eq. (6) is equivalent to the problem:

Find the minimum λ and the corresponding vector P such that

B P_1 + C P_2 = λ P_1   (7)

and

C^T P_1 + D P_2 = 0.   (8)


Fig. 6. Results of the segmentation into planes.

Equation (8) gives the solution for P_2:

P_2 = −D^{−1} C^T P_1.

Reporting this expression of P_2 in Eq. (7), the solution P_1 is the unit eigenvector corresponding to the smallest eigenvalue λ_min of the symmetric matrix B − C D^{−1} C^T. From Eqs. (7) and (8), the resulting error E is the eigenvalue λ_min.
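The fit again reduces to a symmetric eigenproblem, this time on the Schur complement B − C D^{−1} C^T. A minimal sketch, assuming the quadric form of Eq. (1); the monomial ordering and the sqrt(2) scaling of the cross terms (our reading of the trace constraint) are not spelled out in the paper:

```python
import numpy as np

def fit_quadric(points):
    """Constrained quadric fit: each row of the design matrix holds the
    quadric monomials of one point, split as P1 (quadratic part, scaled so
    that ||P1||^2 = Tr(A^2)) and P2 (linear + constant part). P1 is the
    smallest eigenvector of the Schur complement B - C D^-1 C^T, and
    P2 = -D^-1 C^T P1."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r2 = np.sqrt(2.0)
    Q1 = np.stack([x * x, y * y, z * z, r2 * x * y, r2 * y * z, r2 * x * z], axis=1)
    Q2 = np.stack([x, y, z, np.ones_like(x)], axis=1)
    B, C, D = Q1.T @ Q1, Q1.T @ Q2, Q2.T @ Q2
    S = B - C @ np.linalg.solve(D, C.T)      # Schur complement, symmetric
    eigvals, eigvecs = np.linalg.eigh(S)
    P1 = eigvecs[:, 0]                       # ||P1|| = 1 by construction
    P2 = -np.linalg.solve(D, C.T @ P1)
    return P1, P2, float(eigvals[0])         # error is the smallest eigenvalue

rng = np.random.default_rng(0)
u = rng.normal(size=(200, 3))
sphere = u / np.linalg.norm(u, axis=1, keepdims=True)
P1, P2, err = fit_quadric(sphere)            # err ~ 0 for a perfect sphere
```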

Computational Problems Notice that the matrices B, C, and D can be easily updated when new points are added to the region or when two regions are merged, but the errors cannot be iteratively updated since they are derived from an eigenvalue calculation. We are investigating efficient iterative methods that could be applied to the surface-fitting problem.

Results The results presented in Figs. 6-9 have been obtained from the triangulation of a set of about 3,000 points measured by the INRIA range finder. Figure 6 shows the segmentation of the object in Fig. 1 into 60 planar patches. The average error is 1 mm, which is about twice the accuracy of the range finder. Figures 7-9 illustrate the behavior of the segmentation program on an oil bottle and a funnel when we permit planar and quadric patches to compete as long as the average error is below 1 mm. Initially, the surface is better approximated by planar patches (see Fig. 8), but as the size of the patches increases, the quadric patches become a better approximation (see Fig. 9). Note that the switch between planes and quadrics is done automatically by simply comparing the respective errors.

Fig. 7. Triangulations of the oil bottle and the funnel.


Fig. 8. Segmentation into planar patches.

3. Recognition and Locating

3.1. POSSIBLE STRATEGIES FOR THE SEARCH PROBLEM

Now that we have worked out a representation for 3-D shapes that satisfies the requirements for solving the problem of recognizing and locating objects, we are ready to deal with the corresponding search problem.

Our goal is to produce a list of matched model and scene primitives ((M_1, S_1), ..., (M_p, S_p)), where some S_i's may be equal to the special primitive NIL, meaning that the corresponding model primitive M_i is not present in the scene. Producing such a list is the recognition task. We also want to locate the identified model in the workspace: i.e., we want to compute the rigid displacement that takes the model onto the scene. Any rigid displacement T can be decomposed in an infinite number of ways as a product of a translation and a rotation. In order to make this decomposition unique, we assume that the axis of the rotation goes through the origin of coordinates and that the rotation is applied first. Notice that there is no need to introduce changes in scale because absolute measurements are available in both active and passive stereo.

Fig. 9. Segmentation into quadric patches.

The corresponding combinatorial complexity can certainly be very high. (It is an exponential function of the number of primitives in the scene and the models.) Great care should be taken in the choice of the search strategy in order to allow the use of the constraint of rigidity to reduce as much as possible the size of the search space. Several techniques can be used, and we will review some of them here.

3.1.1. Relaxation Matching

Relaxation techniques have been developed to solve a number of matching tasks. Matching is seen as a labeling problem: a model primitive M_i is labeled with (i.e., matches) a scene primitive S_j. If only region-based measurements are used (the numerical or topological features mentioned above), there are many possible matches for a given model primitive, each one being described by a quality measure p(M_i, S_j).


Starting from these initial measurements, the goal of the relaxation technique is to reduce iteratively the ambiguity and incoherence of the initial matches. This is done by bringing in another piece of information: a numerical measure of the coherence of an n-tuple of matches, c(M_1, S_1, ..., M_n, S_n). A number of different techniques have been proposed to achieve this goal, and they fall into two broad categories. The so-called discrete relaxation techniques assume that the functions p and c can only be equal to 0 or 1 (Rosenfeld, Hummel, and Zucker 1979). They can be used for problems like subgraph isomorphisms, when the connectivity of the graphs is not too high, but they are not adapted to our problem, where richer metric information is available. In continuous relaxation techniques, the functions p and c take real values, and p(M_i, S_j) can be considered as the probability that S_j corresponds to M_i. The general idea then consists in an iterative modification of the likelihoods by combining those computed at the previous step with the coherence measures c. Generally, convergence toward a best match cannot be guaranteed except in some very special cases (Faugeras and Berthod 1980; Hummel and Zucker 1983).

Relaxation techniques have two main drawbacks that have prevented us from using them. First, the results are very sensitive to the quality of the initial matches (the initial likelihoods p(M_i, S_j)), since convergence is toward a local maximum of some global average measure of coherence. Second, they only take into account local coherence (otherwise the function c becomes intractable), whereas rigidity, which is the basic constraint of our problem, is global.

3.1.2. Hough Transform and Clustering

Hough transform techniques have often been used for recognizing planar shapes (Ballard 1981). The basic idea is to quantize the space of the relevant transformations to be used in matching models and scenes (scalings, translations, rotations, etc.) and use that space as an accumulator. Each match between a model and a scene primitive corresponds to a family of permitted transformations, and the corresponding cells in the accumulator are increased by one in the simplest case. Then the best matches are identified by searching for local maxima in the accumulator space. Usually a verification step must be performed to break ties and improve accuracy.

If these ideas were applied to our problem, the following pitfalls could occur. First, the transformation space is six-dimensional, implying either a large accumulator or poor precision. Second, matching two primitives (points, lines, or planes) does not completely define the transformation; it only constrains it, in ways that make the updating of the accumulator somewhat complicated. For these reasons, and because there is no easy way to efficiently exploit the rigidity constraint with the Hough transform approach, we have not implemented it.


3.1.3. Tree Search

Tree search is a generic name for a large number of techniques whose basic thrust is to explore efficiently the space of solutions: i.e., the set of lists of pairs (M_1, S_1), ..., (M_p, S_p). This is done by traversing the tree of Fig. 10. With this technique, it is highly desirable to avoid traversing the whole tree (that is, generating all solutions, even wrong ones), while guaranteeing that the best solutions will be found. We can achieve this by exploiting rigidity (the basic constraint of our problem) and applying the paradigm of recognizing while locating (see Fig. 11). For every path in the tree corresponding to a partial recognition (M_1, S_{i_1}), ..., (M_k, S_{i_k}), we compute the best rigid displacement T_k from model to scene for those primitives. We then apply T_k to the next unmatched model primitive M_{k+1} and only consider as possible candidates for M_{k+1} those unmatched scene primitives that are sufficiently close to T_k(M_{k+1}). Once a model primitive has been matched, it is not reconsidered. Therefore, there is a one-to-one correspondence between model and scene primitives (including NIL). This allows us to drastically reduce the breadth of the search tree, thus confirming our claim of the power of the recognition/locating paradigm (Faugeras and Hebert 1983). Several important issues, such as how to order the model primitives and how to reduce the depth of the search tree, are explored later in this paper.
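The control structure can be summarized by the following skeleton. This is our own sketch, specialized to point primitives with a translation-only displacement for brevity (the paper estimates a full rotation plus translation, as in section 3.2); all names and the eps tolerance are ours:

```python
import numpy as np

def search(model, scene, eps=0.1):
    """Recognize-while-locate tree search over lists of point pairings.
    NIL is modeled as None. Returns the complete interpretation with the
    most non-NIL matches."""
    best = [()]

    def estimate_t(pairs):
        # best translation for the matched pairs (least squares = centroid shift)
        matched = [(m, s) for m, s in pairs if s is not None]
        if not matched:
            return np.zeros(3)
        ms = np.array([m for m, _ in matched])
        ss = np.array([s for _, s in matched])
        return ss.mean(axis=0) - ms.mean(axis=0)

    def expand(k, pairs):
        if k == len(model):
            if sum(s is not None for _, s in pairs) >= sum(s is not None for _, s in best[0]):
                best[0] = pairs
            return
        t = estimate_t(pairs)                      # displacement from the partial match
        used = {id(s) for _, s in pairs if s is not None}
        for s in scene:                            # prediction: keep only nearby candidates
            if id(s) not in used and np.linalg.norm(model[k] + t - s) < eps:
                expand(k + 1, pairs + ((model[k], s),))
        expand(k + 1, pairs + ((model[k], None),)) # the NIL branch

    expand(0, ())
    return best[0]
```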


Fig. 10. Principle of the tree-search algorithm. A. Hypotheses for M_1. B. Hypotheses for M_2 compatible with (M_1, S_{i_1}). C. Hypotheses for M_3 compatible with (M_1, S_{i_1}) and (M_2, S_{j_2}).

3.2. MATHEMATICAL REPRESENTATION OF THE RIGIDITY CONSTRAINT

3.2.1. Estimation of the Position as a Minimization Problem

The main part of the application of the rigidity constraint is the estimation of the transformation given a partial match. Given a set of pairings (M_i, S_i), where the M_i's and S_i's are primitives of the model and the scene respectively, the problem is to compute the "best" transformation T that applies the model onto the scene. This is shown in Fig. 11. By best, we mean that the sum of the distances between T(M_i) and S_i is minimum, so that the rigidity constraint propagation is stated as a minimization problem:

min_T Σ_i D(T(M_i), S_i).

Fig. 11. The rigidity constraint. A. Scene. B. Model. C. Viewer-centered frame. D. Object-centered frame.

The distance D and the corresponding minimization problem depend on the type of primitives: points, line segments, or planes.

Points The distance D is simply the usual distance between the scene point and the transformed model point. The squared euclidean distance is preferred in order to apply a least-squares method. Therefore, the minimization problem is

min_{R,t} Σ_i ||R x_i + t − x'_i||^2,   (9)

where x_i is a model point and x'_i the corresponding scene point.

Line Segments The distance D contains two terms corresponding to the two components of the line representation, the direction and the distance vector. The relations between a line segment (v, d) and the transformed line (v', d') are

v' = R v and d' = R d + t − (t · v') v'.   (10)

The corresponding criterion is

min_{R,t} Σ_i ( K_1 ||R v_i − v'_i||^2 + K_2 ||R d_i + t − (t · v'_i) v'_i − d'_i||^2 ),   (11)


where K_1 and K_2 are weighting constants between 0 and 1. To be rigorous, these constants should be related to a noise model (Bolle and Cooper 1984; Cernushi, Belhumer, and Cooper 1985).

Planes Similar to the previous case, the transformed plane (v', d') of (v, d) is given by

v' = R v and d' = d + t · v'.   (12)

And the corresponding criteria are

min_R Σ_i ||R v_i − v'_i||^2   (13)

and

min_t Σ_i (d_i + t · v'_i − d'_i)^2.   (14)

Notice that a problem of underdetermination may occur: a minimum number of pairings is required in order to be able to solve the minimization problem numerically.

In sections 3.2.2, 3.2.3, and 3.2.4, we describe the representation of the rotations used for solving the optimization problem, then we present an exact solution in the cases of planes and points, and finally we propose an iterative solution that includes line segments.

3.2.2. Representation of the Rotation

Possible Representations The three minimization problems are constrained by the fact that R is a rotation matrix. Several representations of rotations are available, and they lead to different constraints:

1. R is an orthonormal matrix: R R^T = Id.

2. The rotation can be defined as an axis v of magnitude 1 and an angle θ.

3. The rotation can be defined as a quaternion, the product Rv being a product of quaternions (Hamilton 1969).

The first representation leads to a high-dimensional space of constraints, while the second one leads to a nonpolynomial criterion. The third representation provides the simplest way of solving the problem and is described later in this section.

Definition of the Quaternions A quaternion can be defined as a pair (w, s), where w is a vector of R^3 and s is a number. A multiplication is defined over the set of quaternions Q as

(w_1, s_1) * (w_2, s_2) = (s_1 w_2 + s_2 w_1 + w_1 ∧ w_2, s_1 s_2 − w_1 · w_2),   (15)

where ∧ is the cross product. The definitions of the conjugate and the module of a quaternion are similar to the ones for the complex numbers:

q̄ = (−w, s) and |q|^2 = ||w||^2 + s^2,

where q = (w, s). Note that R^3 is a subspace of Q due to the identification v = (v, 0). Similarly, the module is the extension of the euclidean magnitude and is "multiplicative":

|q * q'| = |q| |q'|.

Representation of the Rotations by the Quaternions A rotation R of axis v and angle θ can be represented by two quaternions q and −q, the application of the rotation being translated into a quaternion product by the relation

R x = q * x * q̄,   (16)

where the vectors and the quaternions are identified. The mapping between the rotations and the quaternions is defined by

q = (sin(θ/2) v, cos(θ/2)).

Similarly, for any quaternion of module 1, there exists a rotation satisfying Eq. (16). Therefore, a rotation is represented by two quaternions of magnitude 1. The relations defining the three minimization problems can be translated into minimizations in the space of the quaternions. The new constraint is |q| = 1.
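These definitions transcribe directly into code; a minimal sketch, with quaternions stored as 4-arrays in the (w, s) = (vector, scalar) layout:

```python
import numpy as np

def qmult(q1, q2):
    """Quaternion product of Eq. (15)."""
    w1, s1 = q1[:3], q1[3]
    w2, s2 = q2[:3], q2[3]
    return np.r_[s1 * w2 + s2 * w1 + np.cross(w1, w2), s1 * s2 - np.dot(w1, w2)]

def qconj(q):
    """Conjugate: (w, s) -> (-w, s)."""
    return np.r_[-q[:3], q[3]]

def rotate(q, x):
    """Eq. (16): R x = q * x * conj(q), identifying x with (x, 0)."""
    return qmult(qmult(q, np.r_[x, 0.0]), qconj(q))[:3]

theta, axis = np.pi / 2, np.array([0.0, 0.0, 1.0])
q = np.r_[np.sin(theta / 2) * axis, np.cos(theta / 2)]   # quarter turn about z
print(rotate(q, np.array([1.0, 0.0, 0.0])))              # ~ (0, 1, 0)
```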

3.2.3. An Exact Noniterative Solution to the Minimization Problem

Planes The minimization problem stated in Eq. (13) can be restated in quaternion notation as

min_q Σ_{i=1}^{N} |q * v_i * q̄ − v'_i|^2,   (18)

subject to the constraint |q| = 1. Since the module is multiplicative and |q| = 1, Eq. (18) can be rewritten as

min_q Σ_{i=1}^{N} |q * v_i − v'_i * q|^2.   (19)

Equation (15) shows that the expression q * v_i − v'_i * q is a linear function of q. Therefore, there exist matrices A_i such that

q * v_i − v'_i * q = A_i q,

q being considered as a column vector. So, if B = Σ_{i=1}^{N} A_i^T A_i, a symmetric matrix, the minimization problem for the rotation part in the case of planes can be restated as

min_q q^T B q with |q| = 1.   (20)

Since B is a symmetric matrix, the solution to this problem is the four-vector q_min corresponding to the smallest eigenvalue λ_min of B. The matrix B can be incrementally computed from the pairings (v_i, v'_i). For more details on the computation of B, see Hebert (1983).
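The following sketch builds B from the left and right quaternion multiplication matrices and extracts the optimal unit quaternion; the matrix layout (components ordered as wx, wy, wz, s) is our own, not the paper's:

```python
import numpy as np

def estimate_rotation(v_model, v_scene):
    """Least-squares rotation between paired unit vectors, as in Eq. (19):
    q * v_i - v'_i * q = A_i q, and the optimal unit quaternion is the
    eigenvector of B = sum A_i^T A_i with the smallest eigenvalue."""
    def left(w):   # matrix of p -> (w, 0) * p
        x, y, z = w
        return np.array([[0, -z,  y, x],
                         [z,  0, -x, y],
                         [-y, x,  0, z],
                         [-x, -y, -z, 0]])
    def right(w):  # matrix of p -> p * (w, 0)
        x, y, z = w
        return np.array([[0,  z, -y, x],
                         [-z, 0,  x, y],
                         [y, -x,  0, z],
                         [-x, -y, -z, 0]])
    B = np.zeros((4, 4))
    for v, vp in zip(v_model, v_scene):
        A = right(v) - left(vp)           # q * v - v' * q = A q
        B += A.T @ A
    eigvals, eigvecs = np.linalg.eigh(B)
    return eigvecs[:, 0]                  # unit quaternion, smallest eigenvalue

vx, vy = np.array([1.0, 0, 0]), np.array([0, 1.0, 0])
q = estimate_rotation([vx, vy], [vy, -vx])   # 90 degrees about z
print(np.round(q, 3))                        # ~ +/- (0, 0, 0.707, 0.707)
```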

Points The minimization problem in Eq. (9) can be rewritten in Q as

min_{q,t} Σ_{i=1}^{N} |q * x_i * q̄ + t − x'_i|^2,   (21)

subject to |q| = 1. Using the same method used for deriving Eq. (19) from Eq. (18), Eq. (21) becomes

min_{q,t'} Σ_{i=1}^{N} |q * x_i − x'_i * q + t'|^2,   (22)

where t' = t * q is a new quaternion. As we showed previously, there exist matrices A_i such that

q * x_i − x'_i * q = A_i q.

If we define the eight-vector V = (q, t')^T, Eq. (22) becomes

min_V V^T B V.

The minimization is subject to the constraints |q| = 1 and q · t' = 0, where B is a symmetric matrix of the form

B = [ A   C^T
      C   N Id ],

where

A = Σ_{i=1}^{N} A_i^T A_i

and

C = Σ_{i=1}^{N} A_i,

and N is the number of pairings. After some simple algebraic manipulation, the solution is found to be

t' = −(1/N) C q_min

and

q = q_min,

where q_min is the four-vector of unit magnitude corresponding to the smallest eigenvalue λ_min of the symmetric matrix A − C^T C / N. As in the case of planes, the matrices A_i depend only on the two terms of the pairing M_i and S_i.

Lines We do not have a noniterative solution for this case, but an iterative one is presented in the following section.

3.2.4. Iterative Solution

As we mentioned before, the estimation of the transformation is used to guide a search process and has to be repeated many times. In particular, the minimization problems in Eqs. (9), (11), (13), and (14) have to be solved repeatedly on data sets that differ only in one term. Techniques such as recursive least-squares are good candidates for dealing with this problem and indeed turn out to be quite effective. The basic idea of recursive least-squares follows. Given a set of matrices H_i and a set of measured vectors z_i, we want to find the minimum with respect to x of the following criterion:

f_k(x) = Σ_{i=1}^{k} ||z_i − H_i x||^2.   (27)

If x_{k−1} is the minimum of f_{k−1}(x), then x_k can be computed as

x_k = x_{k−1} + K_k (z_k − H_k x_{k−1}),

where the gain matrix K_k is given by the standard recursion

K_k = P_{k−1} H_k^T (Id + H_k P_{k−1} H_k^T)^{−1}, with P_k = (Id − K_k H_k) P_{k−1},

P_k being the inverse of the normal matrix Σ_{i=1}^{k} H_i^T H_i. Eqs. (14), (19), and (21) can be put in the form of Eq. (27), with x = t, z_i = d'_i − d_i, H_i = v'_i^T for Eq. (14); x = q, z_i = 0, H_i q = q * v_i − v'_i * q for Eq. (19); and x = (q, t'), z_i = 0, H_i (q, t')^T = q * x_i − x'_i * q + t' for Eq. (21).

Eq. (11) can be rewritten by transforming its two terms into quaternion form, as was done for Eqs. (19) and (22). The first part of the criterion does not pose any specific problem, but the second part does, because the term containing t is a function of the index i, and therefore the method used for the points is not applicable. What we do then is linearize the quadratic term h_i(q) = q * d_i * q̄ in the vicinity of the current solution q_{i−1} and apply the recursive least-squares to the modified criterion. This approach is similar to the one used in extended Kalman filtering (Jazwinski 1970). Convergence theorems do not exist, so we cannot guarantee that the iterative solution is exact, as we did with Eq. (14). This is also true of Eqs. (19) and (22), since the recursive least-squares do not take into account the constraint |q|^2 = 1.
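A generic recursive least-squares step, plus its specialization to the translation problem of Eq. (14); the vague prior P_0 is our own choice (the paper does not print the gain explicitly):

```python
import numpy as np

def rls_update(x, P, H, z):
    """One step of Eq. (27): x_k = x_{k-1} + K_k (z_k - H_k x_{k-1}),
    where P is the inverse of the accumulated normal matrix sum H_i^T H_i."""
    H = np.atleast_2d(H)
    S = np.eye(H.shape[0]) + H @ P @ H.T
    K = P @ H.T @ np.linalg.inv(S)          # gain K_k
    x = x + K @ (np.atleast_1d(z) - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Estimating the translation from plane pairings (Eq. (14)):
# z_i = d'_i - d_i and H_i = v'_i^T.
t, P = np.zeros(3), 1e6 * np.eye(3)         # vague prior, our choice
for v, dz in [(np.array([1.0, 0, 0]), 1.0),
              (np.array([0, 1.0, 0]), 2.0),
              (np.array([0, 0, 1.0]), 3.0)]:
    t, P = rls_update(t, P, v, dz)
print(t)                                     # ~ (1, 2, 3)
```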

In Table 1 we show the behavior of the recursive least-squares method under some severe conditions. Starting from three different estimation errors on the rotation axis and angle, we have used the solution found by the search program shown in Figs. 14 to 16 to estimate the transformation by applying five iterations of the filtering process. It can be seen that even with very large initial errors the technique converges rapidly toward the optimal solution.

When a solution exists with a small error, the iterative technique will find it, whereas we have noticed that the quaternion magnitude will differ widely from 1 when such a solution does not exist. This is an important clue as to whether or not to pursue the exploration of a particular partial solution in the matching process.

3.3. CONTROL STRATEGY

In this section we present the three basic steps in the control process: (1) hypothesis formation, (2) prediction and verification, and (3) controlling the depth of the search.

3.3.1. Hypothesis Formation

Table 1. Iterative Method.

The first step in the recognition process is hypothesis formation, i.e., the search for sets of consistent pairings that provide enough information for the computation of the transformation and the application of the prediction-verification phase. This first step is crucial because it determines the number of branches of the tree that have to be explored. The main problem is that we need at least two pairings in order to estimate the transformation. Lines should not be parallel, and planes should be independent. The number of primitives required is summarized in Table 2. The hypothesis formation proceeds in three steps: (1) selection of a first pairing, (2) selection of a second pairing, and (3) estimation of the transformation.

Selection of a First Pairing For each primitive of the model, the compatible primitives of the scene are listed. The choice of these primitives cannot make use of the rigidity constraint; only the position-invariant features, such as the length of the segments or the area of surface patches, can be used. Most of these features are highly sensitive to occlusion and noise, so they should be used carefully and with large tolerances.

Selection of a Second Pairing Given a first pairing (M_1, S_1) and a second model primitive M_2, the candidates for matching must satisfy the rigidity constraint.

Table 2. Minimum Number of Primitives Needed to Estimate the Transformation.

It turns out that this choice is quite simple. In the case of points, the only constraint on S_2 is that D(S_1, S_2) = D(M_1, M_2), where D is the usual euclidean distance. In the case of planes, the only constraint is on the angle between the normals, i.e., S_2 must be chosen so that (v'_2, v'_1) = (v_2, v_1), since for any such scene primitives the rotation is fixed and a translation can always be found. (In fact, only the coordinates of the translation along v'_1 and v'_2 are fixed.) Writing t = α v'_1 + β v'_2 + γ (v'_1 ∧ v'_2) and using Eq. (12), we get

t · v'_i = d'_i − d_i, i = 1, 2.   (28)

By assuming that v'_1 and v'_2 are not parallel and by multiplying Eq. (28) by v'_1 and v'_2 we get

α + β s = d'_1 − d_1

and

α s + β = d'_2 − d_2,   (29)

where s = v'_1 · v'_2. Eqs. (29) always yield a unique solution in α and β.

In the case of lines, the situation is slightly different.

Again, we have a constraint on the angles, and S_2 must be chosen such that (v'_2, v'_1) = (v_2, v_1). But we also have a distance constraint. For example, d(M_1, M_2) (the shortest distance between M_1 and M_2) should be equal to d(S_1, S_2). This can be seen as follows. First, if L = (v, d) and M = (w, e) are two nonparallel straight lines, then the algebraic shortest distance between them is given by

γ = (v, w, d − e) / ||v ∧ w||,   (30)

where (x, y, z) is the determinant of the three vectors x, y, z. Indeed, the segment realizing the shortest distance is parallel to v ∧ w and joins a point of L to a point of M, so it is obtained by solving the vector equation

d + α v = e + β w + γ (v ∧ w) / ||v ∧ w||   (31)

with respect to α, β, and γ. From Eq. (31) we derive the algebraic distance γ, which is the projection of d − e on the line of direction v ∧ w:

γ = (d − e) · (v ∧ w) / ||v ∧ w||,

which is Eq. (30).
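Numerically, Eq. (30) is a one-liner. Note that its sign depends on the arbitrary orientations of v and w, so in practice one would compare magnitudes (this remark is ours, not the paper's):

```python
import numpy as np

def line_distance(v, d, w, e):
    """Algebraic shortest distance of Eq. (30) between lines (v, d) and
    (w, e): the determinant (v, w, d - e) divided by ||v ^ w||."""
    n = np.cross(v, w)
    return float(np.dot(n, d - e) / np.linalg.norm(n))

v, d = np.array([1.0, 0, 0]), np.array([0.0, 0, 0])    # the x-axis
w, e = np.array([0, 1.0, 0]), np.array([0.0, 0, 5.0])  # parallel to y, at height 5
print(line_distance(v, d, w, e))                        # -5.0 (magnitude 5)
```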

Let us now go back to our constraint. Given M_1 and M_2, S_1 and S_2 satisfying (v'_2, v'_1) = (v_2, v_1), we know that the rotation is determined. Let us characterize the translation by writing, from Eq. (10),

t − (t · v'_i) v'_i = d'_i − R d_i, i = 1, 2.

Since R is known, we define u_1 = d'_1 − R d_1 and u_2 = d'_2 − R d_2. Multiplying the first equation by v'_2 and the second by v'_1 and letting C = v'_1 · v'_2, we get

t · v'_2 − C (t · v'_1) = u_1 · v'_2

and

t · v'_1 − C (t · v'_2) = u_2 · v'_1.

1 − C^2 is nonzero since M_1 and M_2 are nonparallel, so these two equations determine t · v'_1 and t · v'_2. To obtain the coordinate of t along v'_1 ∧ v'_2, we multiply both equations by v'_1 ∧ v'_2. Letting S = sin(v'_1, v'_2), so that ||v'_1 ∧ v'_2|| = S, we get

t · (v'_1 ∧ v'_2) = u_1 · (v'_1 ∧ v'_2) and t · (v'_1 ∧ v'_2) = u_2 · (v'_1 ∧ v'_2).

In order to be consistent, we need

u_1 · (v'_1 ∧ v'_2) = u_2 · (v'_1 ∧ v'_2).

Replacing u_1 by d'_1 − R d_1 and u_2 by d'_2 − R d_2, we find this to be equivalent to

(v'_1, v'_2, d'_1 − d'_2) = (v'_1, v'_2, R(d_1 − d_2)).

Since R^{−1} does not change determinants, and by using the equations v_1 = R^{−1} v'_1 and v_2 = R^{−1} v'_2, we get

(v_1, v_2, d_1 − d_2) = (v'_1, v'_2, d'_1 − d'_2),

which, by Eq. (30), is precisely the distance constraint d(M_1, M_2) = d(S_1, S_2).

Estimation of the Transformation The transformation is estimated (or partially estimated in the case of planar patches) by using the techniques described in the previous sections. As we mentioned before, some primitives do not have a canonic orientation, so the transformation estimated from an initial hypothesis is not unique, and several equivalent transformations are generated. The number of possible transformations depends on the type of primitive. The most important cases are a pair of lines, which produces two solutions, and a pair of quadrics, which produces eight transformations when the three elongations λ_1, λ_2, λ_3 are not zero, since each eigenvector has two equivalent orientations.

One important part of this step is the order in which the primitives of the model are considered for matching. Obviously, uninteresting branches might be explored if the order is not carefully determined. Consider, for example, the case in which the first two primitives are parallel planes. In this situation, the estimated rotation is arbitrary and the rotation error vanishes. Eventually a complete branch of the tree is explored based on a wrong estimation of the transformation. Three basic rules must be applied in the ordering of the primitives:

1. Small primitives (in terms of area or length) should be avoided.

2. The first two or three primitives must be linearly independent in order to avoid indetermination. The best estimation is produced when the primitives are nearly orthogonal.

3. If local symmetries exist in the object, the primitives that could best discriminate between almost equivalent positions of the object should be among the first ones considered for matching. Note that in some cases this might contradict the first two rules.

3.3.2. Prediction and Verification

In this step, given an initial hypothesis and the associated transformation T = (R, t), we want to predict the set of candidate primitives of the scene that can be matched with each primitive of the model in order to verify the validity of the initial hypothesis. In order to do this, we need to apply the transformation to every model primitive M_i and find the primitives of the scene that are close enough to T(M_i) (see Fig. 12). We want to avoid a sequential exploration of the scene description for each model primitive because it would increase drastically the combinatorics of the algorithm. Moreover, we have computed a first estimate of the transformation, so the rigidity constraint reduces the average length of the list of candidates to a very small number.

We need a representation of the space of parameters that permits direct access to the S_k such that D(S_k, T(M_i)) < ε. Generally speaking, such a structure could be implemented as a discretized version of the space of parameters, each "cell" of the space containing the list of the primitives with the corresponding parameters, this structure being built only once. Then the list of candidates is determined by reporting the cell to which T(M_i) belongs. This operation is made in constant time and does not depend on the initial number of scene primitives. Although it is impossible to implement the previous scheme completely, since the dimension of the parameter space is six, which leads to an array of intractable size, it is possible to discretize only part of the space.

Fig. 12. The prediction step. A. Hypothesis. B. Prediction.

One of the easiest and most effective solutions discretizes the spherical coordinates of the normals, directions, or principal directions, in the case of planes, lines, or quadrics. The resulting data structure is the discretized unit sphere containing pointers to lists of primitives (see Fig. 13). This solution is easy to implement because the dimension of the subspace is only two, and it is efficient because the rotation usually provides a strong enough constraint to remove most of the incompatible pairings.

Another possibility is to sort the values of the parameters of the scene primitives; then the candidates can be retrieved by a binary search technique. This second method is less efficient in terms of complexity since we lose the direct access to the lists of candidates. On the other hand, it produces shorter lists because a wider set of parameters is taken into account.
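A sketch of the discretized sphere of Fig. 13; the cell counts, the layout, and the neighbor lookup are our own choices:

```python
import numpy as np

class DirectionTable:
    """Unit directions are bucketed by quantized spherical coordinates;
    lookup returns the scene primitives stored in the cell of a query
    direction and in its neighboring cells."""
    def __init__(self, n_theta=18, n_phi=36):
        self.n_theta, self.n_phi, self.cells = n_theta, n_phi, {}

    def _cell(self, v):
        theta = np.arccos(np.clip(v[2], -1.0, 1.0))     # polar angle in [0, pi]
        phi = np.arctan2(v[1], v[0]) % (2 * np.pi)      # azimuth in [0, 2 pi)
        return (min(int(theta / np.pi * self.n_theta), self.n_theta - 1),
                int(phi / (2 * np.pi) * self.n_phi) % self.n_phi)

    def insert(self, v, primitive):
        self.cells.setdefault(self._cell(v), []).append(primitive)

    def candidates(self, v):
        """Constant-time lookup: the cell of v plus its 8 neighbors,
        to absorb quantization effects."""
        i, j = self._cell(v)
        out = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                out.extend(self.cells.get((i + di, (j + dj) % self.n_phi), []))
        return out

table = DirectionTable()
table.insert(np.array([0.0, 0.0, 1.0]), "plane-3")
print(table.candidates(np.array([0.01, 0.0, 1.0])))     # ['plane-3']
```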

3.3.3. Controlling the Depth of the Search

Assume that in order to be recognized, an object must have some fixed percentage of its surface visible (20% for example) as well as a minimum number of recognized primitives. If at some level of the tree of Fig. 10 the number of NIL assignments is such that even if all the remaining model primitives are matched, the required area percentage cannot be reached, then it is not necessary to explore further down the tree. This allows entire subtrees to collapse and improves efficiency at the cost of missing a few correct interpretations of the data.

Fig. 13. Using the discretized unit sphere. A. Discretized Gaussian sphere. B. List of scene primitives.

Fig. 14. Results of the recognition algorithm on a first scene. A. Scene segmentation. B. First identified model after rotation with the estimated R. C. Superimposition of identified scene and model primitives. D. Image of the normals to the scene.

3.3.4. Results

Figures 14-16 show the results of the segmentation and recognition programs on a scene containing three parts similar to the one in Fig. 1. Figure 14A shows the segmentation of the scene, Fig. 14B shows the identified model in the orientation given by the estimated rotation, and Fig. 14C shows the superimposition of the identified scene primitives (bold lines) and the corresponding model primitives (dotted lines). The computed object position is used to remove scene primitives that are part of the recognized model but have not been identified by the recognition program (dashed lines). After the identification of the first model, the identified scene primitives are removed from the scene segmentation. Figures 15 and 16 show the identification of the second and third objects. The computation time for the recognition varies between 30 seconds and 5 seconds as the number of primitives involved decreases. The average orientation error, that is, the average angle between a scene primitive and the corresponding transformed model primitive, is less than 1 degree. This example shows that it is possible to achieve both efficiency and accuracy by using the rigidity constraint. Figures 17-19 show the results on another scene.

Fig. 15. Results of the recognition algorithm on the first scene (cont.). A. Scene segmentation. B. Second identified model after rotation with the estimated R. C. Superimposition of identified scene and model primitives.

Fig. 16. Results of the recognition algorithm on the first scene (cont.). A. Scene segmentation. B. Third identified model after rotation with the estimated R. C. Superimposition of identified scene and model primitives.

Fig. 17. Results of the recognition algorithm on a second scene. A. Scene segmentation. B. First identified model after rotation with the estimated R. C. Superimposition of identified scene and model primitives. D. Image of the normals to the scene.

Fig. 18. Results of the recognition algorithm on the second scene (cont.). A. Scene segmentation. B. Second identified model after rotation with the estimated R. C. Superimposition of identified scene and model primitives.

Fig. 19. Results of the recognition algorithm on the second scene (cont.). A. Scene segmentation. B. Third identified model after rotation with the estimated R. C. Superimposition of identified scene and model primitives.

4. Conclusion

We have presented a number of ideas and results re-lated to the problem of recognizing and locating 3-D


Fig. 17. Results of the recognition algorithm on a second scene. A. Scene segmentation. B. First identified model after rotation with the estimated R. C. Superimposition of identified scene and model primitives. D. Image of the normals to the scene.


Fig. 18. Results of the recognition algorithm on a second scene (cont.). A. Scene segmentation. B. Second identified model after rotation with the estimated R. C. Superimposition of identified scene and model primitives.


Fig. 19. Results of the recognition algorithm on a second scene (cont.). A. Scene segmentation. B. Third identified model after rotation with the estimated R. C. Superimposition of identified scene and model primitives.

We have discussed the need for representing surface information, specifically curves and surface patches. We have described a number of simple algorithms for extracting such information from range data and argued for a representation in terms of linear primitives constructed from curves and surface patches. We have also discussed the representation of the rigidity constraint and proposed to exploit it to guide the recognition process. The resulting paradigm consists of recognizing while locating and has been implemented as a hypothesis formation and verification process that has proved extremely efficient in practice.

We think that further work is needed to explore other 3-D object representations, both for the applications described in this paper and for the more general problems of dealing with articulated objects or with classes of objects rather than specific instances. In both cases, the rigidity constraint cannot be exploited as fully as we have done here. More powerful matching mechanisms and other constraints must be brought in, which will make the future exciting.

Acknowledgments

We are thankful to Nicholas Ayache, Bernard Faverjon, and Fabrice Clara for many fruitful discussions. We have also benefited from discussions with Michael Brady, Eric Grimson, Masaki Oshima, Tomas Lozano-Perez, and Yoshiaki Shirai.

REFERENCES

Agin, G. J. 1972. Representation and description of curved objects. Tech. Report AIM-73. Stanford, Calif.: Stanford University.

Ayache, N., and Faugeras, O. D. 1984 (August). A new method for the recognition and positioning of 2-D objects. Proc. 7th Int. Conf. on Pattern Recognition, pp. 1274-1280.


Baker, H. H., and Binford, T. O. 1982. A system for automated stereo mapping. Proc. Image Understanding Workshop, pp. 215-222. Science Applications, Inc.

Ballard, D. H. 1981. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition 13(2):111-122.

Boissonnat, J. D. 1984 (August, Montreal, Canada). Representing 2-D and 3-D shapes with the Delaunay triangulation. Proc. 7th Int. Conf. on Pattern Recognition, pp. 745-748.

Bolle, R., and Cooper, D. 1984 (December). On optimally combining pieces of information with application to estimating 3-D complex-object position from range data. Tech. Report LEMS-8. Providence, R.I.: Brown University, Division of Engineering.

Brady, M., and Asada, H. 1984. Smoothed local symmetries and their implementation. Int. J. Robotics Res. 3(3):36-61.

Brady, M., et al. 1984 (Kyoto, Japan). Describing surfaces. Proc. 2nd Int. Symp. on Robotics Res., MIT Press, pp. 434-445.

Cernushi, B., Belhumer, P., and Cooper, B. 1985 (June). Estimating and recognizing parametrized 3-D objects using a moving camera. Proc. CVPR '85, pp. 167-170.

Faugeras, O. D., and Berthod, M. 1980. Improving consistency and reducing ambiguity in stochastic labeling: an optimization approach. IEEE Trans. on Pattern Analysis and Machine Intell. PAMI-3(4):412-424.

Faugeras, O. D., and Hebert, M. 1983 (August, Karlsruhe, Germany). A 3-D recognition and positioning algorithm using geometrical constraints between primitive surfaces. Proc. 8th Joint Conf. on Artificial Intell., pp. 996-1002.

Faugeras, O. D., et al. 1982. Towards a flexible vision system. In Robot vision, ed. A. Pugh. United Kingdom: IFS, pp. 129-142.

Gaston, P. C., and Lozano-Perez, T. 1983. Tactile recognition and localization using object models. Tech. Report AIM-705. Cambridge: MIT Artificial Intelligence Laboratory.

Grimson, W. E. L. 1981. From images to surfaces: a computational study of the human early visual system. Cambridge: MIT Press.

Grimson, W. E. L., and Lozano-Perez, T. 1983. Model-based recognition and localization from sparse three-dimensional data. Int. J. Robotics Res. 3(3):3-35.

Hamilton, W. R. 1969. Elements of quaternions. New York: Chelsea.

Hebert, M. 1983. Reconnaissance de formes tridimensionnelles. Ph.D. thesis, University of Paris South. Available as INRIA Tech. Rep. ISBN 2-7261-0379-0.

Horaud, P., and Bolles, R. C. 1984 (Atlanta, Georgia). 3-DPO's strategy for matching three-dimensional data. Proc. of the Int. Conf. on Robotics, pp. 78-85.

Horn, B. K. P. 1975. Obtaining shape from shading information. In The psychology of computer vision, ed. P. H. Winston. New York: McGraw-Hill, pp. 115-155.

Hummel, R., and Zucker, S. 1983. On the foundations of relaxation labeling processes. IEEE Trans. on Pattern Analysis and Machine Intell. PAMI-5:267-287.

Jazwinski, A. H. 1970. Stochastic processes and filtering theory. Orlando, Fla.: Academic Press.

Marr, D., and Poggio, T. 1979. A computational theory of human stereo vision. Proc. Roy. Soc. Lond. B 204:301-328.

Nishihara, H. K. 1983. PRISM: a practical realtime imaging stereo system. In Proc. 3rd Int. Conf. on Robot Vision and Sensory Control, ed. B. Rooks. IFS Publications and Amsterdam: North-Holland.

Ohta, Y., and Kanade, T. 1983. Stereo by intra- and inter-scanline search using dynamic programming. Tech. Rep. CMU-CS-83-162. Pittsburgh: Carnegie Mellon University.

Oshima, M., and Shirai, Y. 1983. Object recognition using three-dimensional information. IEEE Trans. on Pattern Analysis and Machine Intell. PAMI-5(4):353-361.

Ponce, J. 1983. Représentation et manipulation d'objets tridimensionnels. Ph.D. thesis, University of Paris South. Also available as INRIA Tech. Rep. ISBN 2-7261-0378-2.

Ponce, J., and Brady, M. 1985 (March). Toward a surface primal sketch. Proc. Int. Conf. Robotics and Automation, pp. 420-425.

Rosenfeld, A., Hummel, R., and Zucker, S. 1976. Scene labeling by relaxation operations. IEEE Trans. on Systems, Man and Cybernetics SMC-6:420-433.

Witkin, A. P. 1981. Recovering surface shape and orientation from texture. In Computer vision, ed. M. Brady. Amsterdam: North-Holland.
