
Bayesian segmentation of range images of polyhedral objects using entropy-controlled quadratic Markov measure field models

Carlos Angulo,1,2 Jose L. Marroquin,1,3 and Mariano Rivera1,*

1Centro de Investigaciones en Matemáticas A.C., Apdo. Postal 402, Guanajuato, Gto. 36000, Mexico
2[email protected]
3[email protected]

*Corresponding author: [email protected]

Received 28 January 2008; revised 10 June 2008; accepted 17 June 2008; posted 17 June 2008 (Doc. ID 92085); published 28 July 2008

We present a method based on Bayesian estimation with prior Markov random field models for segmentation of range images of polyhedral objects. This method includes new ways to determine the confidence associated with the information given for every pixel in the image, as well as an improved method for localization of the boundaries between regions. The performance of the method compares favorably with other state-of-the-art procedures when evaluated using a standard benchmark. © 2008 Optical Society of America

OCIS codes: 150.1135, 150.5670, 100.2960.

1. Introduction

One of the main problems in computer vision is the recovery of three-dimensional (3D) information from images of a scene. A method that is widely used for this purpose is triangulation, which is possible if one has two views of a scene taken from different viewpoints and identifies image points that correspond to the same 3D point, as is the case in stereo vision. Alternatively, one may replace one of the cameras with a video projector and project onto the scene an image with known structure, such that the pixels of the projected image are easily identified in the captured image. In this way, once the camera–projector system is calibrated, one may compute the (x, y, z) coordinates of the corresponding point in the 3D scene. This may be done by means of a coding strategy that assigns to every pixel in a row (or column) of the projected image a unique code word; a particularly simple coding scheme is obtained by projecting a series of binary (dark and light) fringe patterns, where the frequency of the fringes is doubled for every successive image. These images may be designed to correspond to a plain binary code (where each fringe is split into a dark and a light subfringe at each step) or, more conveniently, to a Gray code, in which case consecutive code words always have a Hamming distance equal to 1, which makes the system more robust against noise [1].

In either case, the resolution of the system is determined by the number of projected images: if one uses n images, the resulting depth will be quantized to 2^n levels. The resulting depth field may be visualized by assigning to each depth a corresponding gray level; the resulting image is called a "range image." The range images used in all the experiments reported here were obtained with a structured light scanner [2] with a series of eight projected fringe patterns per range image, so that the depth is quantized to 256 levels.
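To make the coding scheme concrete, the following minimal sketch (our own illustration, not taken from the paper) generates the binary-reflected Gray code and checks the one-bit property mentioned above:

    import numpy as np

    def gray_code(v):
        # Binary-reflected Gray code: consecutive code words differ in exactly one bit.
        return v ^ (v >> 1)

    # With n = 8 projected patterns there are 2**8 = 256 code words; bit k of a
    # pixel's code word records whether that pixel is dark or light in the k-th
    # fringe image.
    codes = gray_code(np.arange(256))
    assert all(bin(codes[i] ^ codes[i + 1]).count("1") == 1 for i in range(255))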


One sample range image and the corresponding ordinary intensity image are presented in Fig. 1. For each pixel r = (r_1, r_2) in a range image I(r), one can compute the corresponding 3D Cartesian coordinates (in millimeters) using the following formulas [3]:

$$x(r) = (255 - r_2)\,\frac{I(r)\,\mathrm{scal} + \mathrm{offset}}{f},\qquad
y(r) = (255 - r_1)\,c\,\frac{I(r)\,\mathrm{scal} + \mathrm{offset}}{f},\qquad
z(r) = [255 - I(r)]\,\mathrm{scal}, \tag{1}$$

where scal, offset, c, and f are values obtained through calibration of the camera–projector system. For the examples we are using, the origin of the coordinate system is at the center of the image, 255 mm from the camera; the z axis points in the direction of the camera; and the x and y axes point to the top and right of the camera, parallel to the image.
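As a concrete illustration of Eq. (1), the following sketch maps a quantized range image to Cartesian coordinates; the function and the sample calibration constants are ours, not the paper's (actual values must come from the calibration of the camera–projector system):

    import numpy as np

    def range_to_cartesian(I, scal, offset, c, f):
        # Map a quantized range image I (H x W, values 0..255) to Cartesian
        # coordinates via Eq. (1); scal, offset, c, f come from calibration.
        r1, r2 = np.indices(I.shape)        # r1: row index, r2: column index
        s = (I * scal + offset) / f         # common factor of x and y
        x = (255.0 - r2) * s
        y = (255.0 - r1) * c * s
        z = (255.0 - I) * scal
        return x, y, z

    # usage with hypothetical calibration constants:
    # x, y, z = range_to_cartesian(I.astype(float), scal=0.8, offset=10.0, c=1.0, f=500.0)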

In many industrial applications, the objects to be digitized are often polyhedrons; in this case, 3D models may be constructed by finding the equations of the planar faces of the polyhedron and the lines that separate such regions. This particular instance of the image segmentation problem has been studied by a number of researchers [4–14] and is the focus of this work. The methods most commonly used for this purpose are region-growing-based and edge-detection-based algorithms. In the region-growing-based algorithms, one selects one seed region (which can be a single pixel) in a range image at a time and keeps adding nearby pixels that fulfill some constraints to the region until there are no more pixels to add. The different ways to choose a seed region, and the constraints that the pixels to be added to it should satisfy, characterize the different particular methods. The algorithms in [4–6] are instances of this approach. The edge-detection methods focus on the search for boundaries between regions that correspond to different planes. Most of them use a preprocess to find the possible edge pixels and a postprocess to fill gaps in the detected edges. The algorithm in [7] is an instance of this method.

Although these methods produce reasonably good results, their performance is limited by the fact that, in general, they do not take into account specific prior information about the polyhedral nature of the objects in the scene (e.g., the fact that the intersection between two planes must be a straight line). A different segmentation method, based on Bayesian estimation theory and closer to our approach, is presented by Li [15]. This method is based on the reconstruction of three piecewise smooth fields ϕ_1, ϕ_2, and ϕ_3, which correspond to the coefficients of different planes. These fields are found by minimizing an energy function given by

$$U(\phi_1,\phi_2,\phi_3) = \sum_{r} [\phi_1(r)\,x(r) + \phi_2(r)\,y(r) + \phi_3(r) - z(r)]^2 + \lambda \sum_{r} \sum_{s\in N_r} g\!\left(\sqrt{w_a[\phi_1(r)-\phi_1(s)]^2 + w_b[\phi_2(r)-\phi_2(s)]^2 + w_c[\phi_3(r)-\phi_3(s)]^2}\right), \tag{2}$$

where x(r), y(r), and z(r) are given by Eq. (1); N_r denotes the neighborhood of pixel r; w_a, w_b, and w_c are positive parameters; and g is a function chosen in such a way that discontinuities in the ϕ fields are preserved. For range images, the author proposes g(η) = ln(1 + η²) and a gradient descent method for the minimization of U. Although this method produces good results with synthetic images corrupted with white Gaussian noise, the results with real range images are not as good. One of the reasons why the performance of this and other segmentation methods decreases when applied to real range images is that the quantization of the range data is not taken into account. As explained above, because of the way the range images are acquired, they are quantized, usually to 256 different values.

Fig. 1. (a) Intensity image and (b) range image of the same scene.


For this reason, a plane in the quantized image appears as a stairlike pattern, as shown in Fig. 2. This situation complicates the detection of borders between regions that correspond to different planes (i.e., of discontinuities in the ϕ fields), in particular when there is a change of gradient but not a depth discontinuity, i.e., in the case of "roof" borders (see Fig. 3). Other sources of difficulty are the presence of noisy points produced by specular reflections at the moment the images are taken and the fact that, because the structured light scanner uses two optical pathways, one for the projector and the other for the camera, there are shadows in the images; the corresponding points give no information about the objects and must be eliminated.

The goal of this work is to present a segmentation method for range images of polyhedral objects that overcomes these difficulties and significantly improves on the performance of published methods. It is based on two ideas:

1. a preprocessing stage in which the spurious information originated by the quantization is eliminated, and

2. a probabilistic approach that segments the images in terms of planar regions, using Bayesian estimation, and takes into account the polyhedral nature of the scene.

The results produced by this method feature a better localization of the borders between regions, making the reconstruction qualitatively more similar to the ground truth in a number of experiments. These results are also quantitatively better than those of other state-of-the-art segmentation algorithms, according to standard metrics.

In Section 2 we present the algorithms used and the theory on which they are based. In Section 3 we present the results obtained and make qualitative and quantitative comparisons with other published methods; finally, we give our conclusions.

2. Proposed Method

A. Bayesian Estimation and Hidden Markov Random Measure Fields

The image segmentation problem we are interested in may be formulated as follows: partition the pixel lattice L into K nonoverlapping regions R_1, …, R_K such that all the pixels in a given region correspond to points that belong to the same planar region in the 3D scene. This means that for all pixels r ∈ R_k,

$$z(r) = m_k(r) + n(r),$$

where m_k(r) = θ_k1 x(r) + θ_k2 y(r) + θ_k3 is the kth planar model and n(r) is a model for the observation error, e.g., a zero-mean Gaussian random variable with variance σ²; θ_k = (θ_k1, θ_k2, θ_k3) denotes the (unknown) parameter vector of the corresponding plane. One may define the vector field b of indicator variables for each region: b_k(r) = 1 if r ∈ R_k, and b_k(r) = 0 otherwise.

The solution to this problem consists in the estimation of the field b(r), r ∈ L, and of the parameter vectors θ_k, k = 1, …, K. Following the approach of [16], the field b may be estimated by considering it as a field of random vectors that satisfy the constraints

$$\sum_{k=1}^{K} b_k(r) = 1, \qquad b_k(r) \ge 0 \quad \forall\, k, r, \tag{3}$$

that is, as a field of probability distributions, where b_k(r) may be interpreted as the probability that pixel r belongs to region R_k. In addition, the b field is required to satisfy other prior constraints; specifically:

1. Each variable b_k(r) should be almost constant, except at the boundaries between regions.

2. Since the estimated b variables should approximate indicator variables, one should have that, for each pixel r, only one variable b_j(r) is close to 1 (indicating that r ∈ R_j), while all the other variables b_k(r), k ≠ j, are close to zero.

These constraints may be expressed in the form of a prior Gibbs distribution for b, which takes the form

$$P_b(b) = \frac{1}{Z_b}\,\exp[-U_b(b)],$$

where Z_b is a normalizing constant and

$$U_b(b) = \lambda \sum_{\langle r,s\rangle} \sum_{k=1}^{K} [b_k(r) - b_k(s)]^2\,B(r,s) \;-\; \mu \sum_{r\in L} \sum_{k=1}^{K} b_k^2(r), \tag{4}$$

Fig. 3. Zoom of a range image (roof edge).

Fig. 2. Three kinds of border: (a) step, (b) roof, (c) combination.


where λ and μ are positive parameters. In the first term of this equation, the sum is taken over all pairs of adjacent pixels ⟨r, s⟩, and the weights B(r, s) are computed in such a way that they take a value close to zero at the boundaries between regions and close to 1 elsewhere (see Subsection 2.D). Taking into account that a realization of the field b will have high probability if U_b is minimal, one can see that this term enforces prior constraint (1). For the same reason, the second term, together with Eq. (3), enforces constraint (2), since the sum $\sum_{k=1}^{K} b_k^2(r)$ takes its maximum value of 1 when this constraint is exactly satisfied [i.e., when b(r) is a low entropy distribution].

As was shown in [16], this prior distribution for b may be combined with the likelihood of the observations using Bayes' rule. In that work it was shown that, using the fact that each b(r) is a low entropy distribution, the maximum a posteriori (MAP) estimator for b given θ may be approximated by the minimizer of the energy function

$$U(b\,|\,\theta) = -\sum_{k=1}^{K}\sum_{r\in L}\big(\log[v_k(r;\theta)]\,b_k(r)^2 + \mu\,b_k(r)^2\big) + \lambda\sum_{k=1}^{K}\sum_{\langle r,s\rangle}[b_k(r)-b_k(s)]^2\,B(r,s), \tag{5}$$

subject to the constraints in Eq. (3), where v_k(r; θ) denotes the likelihood of the observation at pixel r, given that b_k(r) = 1, for parameter values equal to θ. In our case, this likelihood may be obtained by assuming that the depth obtained from the range image equals the true depth plus a zero-mean Gaussian error whose variance depends on the quantization error of the scanner:

$$\log[v_k(r;\theta)] = -\delta_r\,[z(r) - m_k(r)]^2, \tag{6}$$

where, as above, m_k denotes the kth planar model and δ_r takes a value close to 1 if the quantization error at pixel r is negligible and a value close to zero otherwise. In Subsection 2.B we explain how to estimate the δ_r variables. Substituting Eq. (6) into Eq. (5), one finally obtains that the MAP estimator for b given θ may be obtained by minimization of the potential

$$U(b\,|\,\theta) = \sum_{k=1}^{K} \sum_{r\in L} \big([z(r) - m_k(r)]^2\,\delta_r - \mu\big)\,b_k(r)^2 + \lambda \sum_{k=1}^{K} \sum_{\langle r,s\rangle} [b_k(r) - b_k(s)]^2\,B(r,s), \tag{7}$$

subject to the constraints in Eq. (3). Before Eq. (7) can be minimized, some terms must be computed: a set of starting model parameters θ for the segmentation, together with the initial value of K; the δ field, which encodes the confidence of the information (i.e., the quantization error) at every pixel; and the B field, which defines the interaction between pixels. In the following subsections we describe the procedures for obtaining these terms.
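To make the structure of Eq. (7) concrete, the following sketch evaluates U(b|θ) for a given state; the array conventions [a (K, H, W) measure field b, horizontal and vertical interaction maps Bh and Bv] are our own, not part of the published implementation:

    import numpy as np

    def energy(b, z, x, y, theta, delta, Bh, Bv, lam, mu):
        # Evaluate U(b|theta) of Eq. (7).
        # b: (K, H, W) measure field; z, x, y: (H, W) coordinates from Eq. (1);
        # theta: (K, 3) plane parameters; delta: (H, W) confidence field;
        # Bh: (H, W-1) and Bv: (H-1, W) interaction weights for pixel pairs.
        U = 0.0
        for k in range(theta.shape[0]):
            m_k = theta[k, 0] * x + theta[k, 1] * y + theta[k, 2]      # k-th planar model
            U += np.sum(((z - m_k) ** 2 * delta - mu) * b[k] ** 2)     # data + entropy terms
            U += lam * np.sum(Bh * (b[k, :, 1:] - b[k, :, :-1]) ** 2)  # horizontal pairs
            U += lam * np.sum(Bv * (b[k, 1:, :] - b[k, :-1, :]) ** 2)  # vertical pairs
        return U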

B. Computation of the δr Field

The main purpose of this stage is to eliminate the spurious information introduced by the quantization of the range images. The problem with an image of a quantized plane (see Fig. 4) is that the gradient is zero at most points and, hence, it can neither be used to compute the plane parameters locally nor serve as a criterion for determining the segmentation. In fact, even in the absence of noise, the true plane will pass only through a few points, labeled "corner points" in Fig. 5.

In one dimension, if the difference operators D_−(I, r_1) and D_+(I, r_1) are defined as

$$D_-(I, r_1) = I(r_1) - I(r_1 - 1), \qquad D_+(I, r_1) = I(r_1) - I(r_1 + 1), \tag{8}$$

then, as one can see from Fig. 5, at a corner point one of D_+(I, r_1) and D_−(I, r_1) should be greater than zero while the other should be equal to zero. Therefore, the corner points should satisfy

$$D_+(I, r_1) \cdot D_-(I, r_1) = 0, \qquad D_+(I, r_1) + D_-(I, r_1) > 0. \tag{9}$$

Using the same idea, analogous operators may be defined in two dimensions:

Fig. 4. (a) Synthetic plane and (b) its quantization.

Fig. 5. Corner points in one dimension.


$$\begin{aligned}
D_{1+}(I,r) &= I(r_1, r_2) - I(r_1 + 1, r_2), & D_{1-}(I,r) &= I(r_1, r_2) - I(r_1 - 1, r_2),\\
D_{2+}(I,r) &= I(r_1, r_2) - I(r_1, r_2 + 1), & D_{2-}(I,r) &= I(r_1, r_2) - I(r_1, r_2 - 1).
\end{aligned} \tag{10}$$

To find the corner points in two dimensions, the points have to satisfy the conditions of Eq. (9) in both directions, i.e., they have to be corner points in r_1 and in r_2. The corresponding conditions are

$$D_{1+}(I,r) \cdot D_{1-}(I,r) = D_{2+}(I,r) \cdot D_{2-}(I,r) = 0,$$
$$D_{1+}(I,r) + D_{1-}(I,r) > 0, \qquad D_{2+}(I,r) + D_{2-}(I,r) > 0. \tag{11}$$

Using only the corner points, a local plane may be fitted to the quantized data with an error that depends only on the sensor noise and not on the quantization, as can be seen in Fig. 6.

The process for finding the corner points is crucial to the method presented in this paper, because it helps to obtain the parameters of the initial models into which the image will be segmented. A value of δ_r = 1 is assigned if pixel r corresponds to a corner point, and a small value (we use δ_r = 10⁻⁶) is assigned if it does not.

It is also necessary to eliminate the pixels in shadowed areas from the computation. A pixel is considered to be in a shadowed area if I(r) < ε_s, where ε_s is a threshold that, in general, may depend on the scanner and has to be adjusted by hand. For all the experiments reported here we have used ε_s = 3, but practically the same results are obtained for any ε_s ∈ [2, 7].
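The paper gives no code for this stage; the following sketch, under our own naming conventions, marks corner points via Eq. (11), masks shadow pixels, and fills the δ field accordingly (image borders are treated loosely, since the one-sided differences are left at zero there):

    import numpy as np

    def delta_field(I, eps_s=3, delta_small=1e-6):
        # delta = 1 at corner points of the quantized image, a small value
        # elsewhere; shadow pixels (I < eps_s) are also downweighted.
        I = I.astype(int)
        d1p = np.zeros_like(I); d1m = np.zeros_like(I)
        d2p = np.zeros_like(I); d2m = np.zeros_like(I)
        d1p[:-1, :] = I[:-1, :] - I[1:, :]    # D1+: difference with the next row
        d1m[1:, :]  = I[1:, :]  - I[:-1, :]   # D1-: difference with the previous row
        d2p[:, :-1] = I[:, :-1] - I[:, 1:]    # D2+: difference with the next column
        d2m[:, 1:]  = I[:, 1:]  - I[:, :-1]   # D2-: difference with the previous column
        corner = ((d1p * d1m == 0) & (d2p * d2m == 0)
                  & (d1p + d1m > 0) & (d2p + d2m > 0))   # Eq. (11)
        shadow = I < eps_s
        delta = np.where(corner & ~shadow, 1.0, delta_small)
        return delta, corner, shadow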

C. Initial Estimation of a Plane's Parameters

This procedure consists in successively sweeping the entire image with a window, looking for corner points that lie in a common plane. The process needs three parameters: an initial window size (initsiz), which should be smaller than the smallest region one wishes to detect; a minimum number of corner points in the window (minp), which should be at least five to produce reasonable fits; and the maximum distance allowed between the points and the plane in the fitting process (maxdist), which depends on the sensor noise. The process is presented in Algorithm 1.

With this method one can compute an approximation of the planes in the scene and a value for K. For all the experiments reported here, we used the following values for the parameters of this procedure: initsiz = 9, minp = 5, and maxdist = 2.

To obtain the correct plane parameters, it is important to use the mapping to Cartesian coordinates given by Eq. (1), so that the least squares estimator for the parameters is obtained by solving the system in Eq. (12).

Fig. 6. (a) Corner points of the plane in Fig. 4 and (b) the fitted plane (fitted with a small error).

Algorithm 1

Set K = 1
for r1 = initsiz/2 to #columns − initsiz/2
  for r2 = initsiz/2 to #rows − initsiz/2
  {
    Center the window at (r1, r2);
    if inside the window there are minp or more corner points
    {
      Fill the matrix in Eq. (12) with the information of the corner points inside the window;
      Solve the system to obtain the parameters of the plane;
      while all the corner points inside the window are at a distance less than maxdist from the plane
      {
        Increase the window size;
        if the calculated plane has not been added to the list of planes
          add it to the list and set K := K + 1;
        else
          replace the previously calculated plane with the current one in the list;
        Fill the matrix in Eq. (12) with the information of the corner points inside the window;
        Recalculate the plane parameters by solving the system;
      }
      if the plane was added to the list of planes
        delete all the corner points used to calculate it;
    }
    Return the window size to initsiz;
  }


$$\begin{pmatrix}
\sum\limits_{\substack{r\in W\\ \delta_r=1}} x(r)^2 & \sum\limits_{\substack{r\in W\\ \delta_r=1}} x(r)\,y(r) & \sum\limits_{\substack{r\in W\\ \delta_r=1}} x(r)\\
\sum\limits_{\substack{r\in W\\ \delta_r=1}} x(r)\,y(r) & \sum\limits_{\substack{r\in W\\ \delta_r=1}} y(r)^2 & \sum\limits_{\substack{r\in W\\ \delta_r=1}} y(r)\\
\sum\limits_{\substack{r\in W\\ \delta_r=1}} x(r) & \sum\limits_{\substack{r\in W\\ \delta_r=1}} y(r) & \sum\limits_{\substack{r\in W\\ \delta_r=1}} 1
\end{pmatrix}
\begin{pmatrix}\theta_{k1}\\ \theta_{k2}\\ \theta_{k3}\end{pmatrix}
=
\begin{pmatrix}
\sum\limits_{\substack{r\in W\\ \delta_r=1}} x(r)\,z(r)\\
\sum\limits_{\substack{r\in W\\ \delta_r=1}} y(r)\,z(r)\\
\sum\limits_{\substack{r\in W\\ \delta_r=1}} z(r)
\end{pmatrix}, \tag{12}$$

where W denotes the current window and the sums run over the corner points (δ_r = 1) inside it.

To initialize the probability field, we use a field of uniform distributions, i.e., we set

$$b_k(r) = \frac{1}{K} \qquad \forall\, r,\ k = 1, \dots, K.$$
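A minimal numpy sketch of the fit in Eq. (12), assuming the Cartesian coordinates of the corner points inside the window have already been collected into arrays xs, ys, zs (our names):

    import numpy as np

    def fit_plane(xs, ys, zs):
        # Least squares plane z = t1*x + t2*y + t3 over the corner points,
        # obtained by solving the 3x3 normal equations of Eq. (12).
        A = np.array([
            [np.sum(xs * xs), np.sum(xs * ys), np.sum(xs)],
            [np.sum(xs * ys), np.sum(ys * ys), np.sum(ys)],
            [np.sum(xs),      np.sum(ys),      float(xs.size)],
        ])
        rhs = np.array([np.sum(xs * zs), np.sum(ys * zs), np.sum(zs)])
        return np.linalg.solve(A, rhs)   # (theta_k1, theta_k2, theta_k3)

    # residuals usable for the maxdist test of Algorithm 1 (vertical distances):
    # t1, t2, t3 = fit_plane(xs, ys, zs); res = np.abs(zs - (t1*xs + t2*ys + t3))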

D. Estimation of the B Field

This field should be equal to 1 where the neighboring pixels should interact and 0 where they should not, i.e., where a region border lies between them.

Three kinds of border should be detected: step, roof, and a combination of both (see Fig. 2). The step and combination borders can be detected using a threshold on the difference operators defined above; i.e., if r is the pixel to the left of pixel s, we set B(r, s) = 0 if |D_{1+}[I(r)]| > α, where α is the maximum inclination (absolute value of the gradient) of all the planes in the scene. In all the experiments reported here we used the value α = 6. The same is done in the vertical direction. At the same time, the shadow pixels have to be disconnected from the pixels used for the segmentation, so we set B(r, s) = 0 if r or s is a shadow pixel.
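A sketch of the step-border and shadow part of this computation, with the same array conventions as above (Bh connects each pixel with its right neighbor, Bv with the one below):

    import numpy as np

    def step_and_shadow_B(I, alpha=6, eps_s=3):
        # Interaction weights: 0 across step borders (|difference| > alpha)
        # and next to shadow pixels, 1 elsewhere; roof borders are handled
        # separately by the procedure described below.
        I = I.astype(int)
        shadow = I < eps_s
        Bh = (np.abs(I[:, 1:] - I[:, :-1]) <= alpha).astype(float)   # horizontal pairs
        Bv = (np.abs(I[1:, :] - I[:-1, :]) <= alpha).astype(float)   # vertical pairs
        Bh[shadow[:, 1:] | shadow[:, :-1]] = 0.0   # disconnect shadow pixels
        Bv[shadow[1:, :] | shadow[:-1, :]] = 0.0
        return Bh, Bv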

The roof border detection is more complicated. These borders lie at the intersections of the planes, but not every plane intersection produces a roof border in the image. Only the pixels on the intersection of two planes that are close to the interface between the regions corresponding to those same planes should be disconnected; the interface between regions R_k and R_l is defined as the set of pixels that belong to either one of the regions and that have at least one neighbor that belongs to the other one. To detect roof borders and, hence, to set the corresponding B(r, s) = 0, the following procedure is used:

1. Select a pair of planes m_k and m_l with their respective regions R_k and R_l.

2. Search for the interface between regions R_k and R_l. If it is empty, go to step 5.

3. Compute x_min, x_max, y_min, and y_max as the coordinates of the corners of the smallest rectangle that contains this interface (see Fig. 7).

4. Search for the roof border inside the rectangle. Because of the mapping to Cartesian coordinates, the intersection between the planes cannot be found directly; instead, all the pixels r [with Cartesian coordinates (x(r), y(r), z(r))] inside the rectangle have to be checked one by one. The equation of the intersection line is obtained by equating the equations of the two planes and solving for one of the variables:

$$y_{\mathrm{int}}(r) = -\,\frac{(\theta_{k1} - \theta_{l1})\,x(r) + (\theta_{k3} - \theta_{l3})}{\theta_{k2} - \theta_{l2}}.$$

Evaluating this line at the abscissa x(r) of a pixel r gives the ordinate y_int(r) of the intersection line at that pixel; the sign of y_int(r) − y(r) then indicates whether the pixel lies above or below the line. Doing the same for a neighboring pixel, one can determine whether the intersection of the planes passes between the two pixels. This is done for all pixels inside the rectangle:

$$B[(r_1,r_2),(r_1+1,r_2)] = 0 \ \ \text{if}\ \ [y_{\mathrm{int}}(r_1,r_2) - y(r_1,r_2)]\,[y_{\mathrm{int}}(r_1+1,r_2) - y(r_1+1,r_2)] \le 0,$$
$$B[(r_1,r_2),(r_1,r_2+1)] = 0 \ \ \text{if}\ \ [y_{\mathrm{int}}(r_1,r_2) - y(r_1,r_2)]\,[y_{\mathrm{int}}(r_1,r_2+1) - y(r_1,r_2+1)] \le 0. \tag{13}$$

Fig. 7. Interface between regions (see text).


5. If all the pairs of planes have been checked, stop; if not, go to step 1.
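A sketch of step 4 for one pair of planes, using the intersection line derived above and the test of Eq. (13); the bounding-box convention and the array names are ours:

    import numpy as np

    def disconnect_roof(Bh, Bv, x, y, theta_k, theta_l, box):
        # Zero the interaction weights crossed by the intersection line of
        # planes k and l inside the bounding box of their interface.
        # box = (i0, i1, j0, j1): inclusive pixel-index bounds of the rectangle.
        i0, i1, j0, j1 = box
        denom = theta_k[1] - theta_l[1]
        if abs(denom) < 1e-12:      # equal y-slopes: line cannot be solved for y
            return
        y_int = -((theta_k[0] - theta_l[0]) * x + (theta_k[2] - theta_l[2])) / denom
        s = y_int - y               # sign tells on which side of the line a pixel lies
        sub = s[i0:i1 + 1, j0:j1 + 1]
        cross_v = sub[:-1, :] * sub[1:, :] <= 0    # line passes between vertical neighbors
        cross_h = sub[:, :-1] * sub[:, 1:] <= 0    # ... between horizontal neighbors
        Bv[i0:i1, j0:j1 + 1][cross_v] = 0.0        # Eq. (13)
        Bh[i0:i1 + 1, j0:j1][cross_h] = 0.0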

E. Minimization of the Energy Function

During the energy minimization, the constraints in Eq. (3) must be satisfied; to enforce the equality constraints, one may use the method of Lagrange multipliers. The Lagrangian that corresponds to the energy in Eq. (7) is given by

$$L(b,\gamma) = \sum_{k=1}^{K} \sum_{r\in L} \Big\{ \big([z(r) - m_k(r)]^2\,\delta_r - \mu\big)\,b_k(r)^2 + \lambda \sum_{s\in N_r} B(r,s)\,[b_k(r) - b_k(s)]^2 \Big\} - \sum_{r\in L} \gamma_r \Big(1 - \sum_{k} b_k(r)\Big), \tag{14}$$

where {γ_r} are the Lagrange multipliers. Setting the gradient of L with respect to b and γ equal to zero, solving for b_k(r), and substituting the result into the equality constraint of Eq. (3) to eliminate the γ variables, one gets the update equation

$$b_k(r) = \frac{1 - \sum_{i=1}^{K} \dfrac{M_i(r)}{A_i(r)}}{A_k(r)\,\sum_{i=1}^{K} \dfrac{1}{A_i(r)}} + \frac{M_k(r)}{A_k(r)}, \tag{15}$$

where

$$M_k(r) = \lambda \sum_{s\in N_r} B(r,s)\,b_k(s), \qquad A_k(r) = [z(r) - m_k(r)]^2\,\delta_r + \lambda \sum_{s\in N_r} B(r,s) - \mu. \tag{16}$$

To satisfy the nonnegativity constraint in Eq. (3), after every update all negative b's are set to zero, and a normalization is performed, if needed, to satisfy the corresponding equality constraint. Once the optimal estimator for the b field is found, it is possible to compute a label field f that directly indicates the region to which every pixel r belongs; it is defined by f(r) = k if b_k(r) > b_i(r) for all i ≠ k.
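The following sketch performs one sweep of this update [Eqs. (15) and (16)] followed by the projection step. For compactness it updates all pixels simultaneously (a Jacobi-style variant of the paper's Gauss–Seidel scheme), and it assumes μ is small enough that no A_k(r) vanishes; the array layout is the one used in the sketches above:

    import numpy as np

    def update_b(b, z, models, delta, Bh, Bv, lam, mu):
        # One update of the measure field via Eqs. (15)-(16), then projection
        # onto the constraints of Eq. (3). models: (K, H, W) precomputed m_k(r).
        K, H, W = b.shape
        M = np.zeros_like(b)            # M_k(r) = lam * sum_s B(r,s) b_k(s)
        Bsum = np.zeros((H, W))         # sum_s B(r,s)
        M[:, :, 1:]  += Bh * b[:, :, :-1];  Bsum[:, 1:]  += Bh   # left neighbors
        M[:, :, :-1] += Bh * b[:, :, 1:];   Bsum[:, :-1] += Bh   # right neighbors
        M[:, 1:, :]  += Bv * b[:, :-1, :];  Bsum[1:, :]  += Bv   # upper neighbors
        M[:, :-1, :] += Bv * b[:, 1:, :];   Bsum[:-1, :] += Bv   # lower neighbors
        M *= lam
        A = (z - models) ** 2 * delta + lam * Bsum - mu          # Eq. (16)
        invA = 1.0 / A
        # Eq. (15): closed form obtained from the Lagrangian conditions
        b_new = M * invA + (1.0 - np.sum(M * invA, axis=0)) * invA / np.sum(invA, axis=0)
        b_new = np.maximum(b_new, 0.0)                  # clip negative values
        b_new /= np.sum(b_new, axis=0, keepdims=True)   # renormalize, Eq. (3)
        return b_new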

F. Implementation Details

Because the values of the parameters of the initial models may be inaccurate, it is necessary to adjust them as the segmentation proceeds. Thus, in our proposal, the plane parameters are recomputed using least squares every 10 Gauss–Seidel iterations, considering, for the fit of plane m_k, only the corner points r with b_k(r) > 0.5. For example, instead of the upper left term

$$\sum_{\substack{r\in W\\ \delta_r=1}} x(r)^2$$

in Eq. (12), one should use

$$\sum_{\substack{r:\, b_k(r) > 0.5\\ \delta_r=1}} x(r)^2,$$

and the same for the remaining terms.

Note that it is possible that, in the course of the iterations, the support of a plane k, i.e., the set {r | b_k(r) > 0.5}, becomes empty. This may happen, for example, when the initialization process generates a spurious plane or more than one instance of a given plane. In these cases, the planes with empty support are eliminated to avoid numerical errors and to accelerate the process.

It is also possible, however, that two instances of the same plane, generated in the initialization phase, survive during the iterations with nonempty support. To fuse these two instances into a single plane, the following criteria are used:

a. The planes should be almost parallel. This happens if the angle β_ij between planes i and j satisfies cos(β_ij) = n_i · n_j > 0.97, where n_i and n_j are the unit normal vectors of the planes.

b. The planes should have compatible supports. To detect this, we define close_i as the number of corner points at a distance less than 1 from plane i, and close_ij as the number of corner points at a distance less than 1 from both planes i and j. We then say that two planes i and j have compatible supports if

$$\frac{2\,\mathrm{close}_{ij}}{\mathrm{close}_i + \mathrm{close}_j} > 0.9.$$

If both conditions are satisfied, the plane with the higher average probability (i.e., the higher average value of the corresponding b variables) over the points in the support of the two planes is allowed to continue in the process, and the other plane is eliminated.
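A compact sketch of this fusion test; the plane normal is derived from the model z = θ_1 x + θ_2 y + θ_3, pts is an (N, 3) array of corner-point coordinates, and all names are ours:

    import numpy as np

    def should_fuse(theta_i, theta_j, pts, cos_thresh=0.97, support_thresh=0.9):
        # Fusion test for two plane instances: near-parallel normals (criterion a)
        # and compatible corner-point supports (criterion b).
        def unit_normal(t):
            n = np.array([t[0], t[1], -1.0])   # normal of z = t1*x + t2*y + t3
            return n / np.linalg.norm(n)

        def near(t):
            # boolean mask of corner points at distance < 1 from the plane
            d = np.abs(pts[:, 0] * t[0] + pts[:, 1] * t[1] + t[2] - pts[:, 2])
            return d / np.sqrt(t[0] ** 2 + t[1] ** 2 + 1.0) < 1.0

        parallel = unit_normal(theta_i) @ unit_normal(theta_j) > cos_thresh
        ni, nj = near(theta_i), near(theta_j)
        close_ij = np.sum(ni & nj)
        compatible = 2.0 * close_ij / max(np.sum(ni) + np.sum(nj), 1) > support_thresh
        return parallel and compatible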

After the plane parameters have been recomputed and the number of models has been updated, the B field is also recomputed, using the procedure of Subsection 2.D.

3. Experiments

To compare our algorithm with other methods, we used the image database from the Web page of the Image Analysis Research Laboratory of the University of South Florida [17]. This database consists of 30 images obtained with a structured light scanner, together with the corresponding ground truths and the best segmentations obtained by this group.

A. Quantitative Evaluation

To evaluate the segmentations we use the method of [8], which we transcribe here to make this paper self-contained. This method classifies the regions obtained in the machine segmentation images into five groups:

Correctly detected regions: A region in the machine segmentation that corresponds exactly to a region in the ground truth.


Undersegmented regions: A region of the machine segmentation that corresponds to a group of regions in the ground truth.

Oversegmented regions: A group of regions in the machine segmentation that corresponds to a single region in the ground truth image.

Missed regions: Regions in the ground truth image that do not correspond to any region in the machine segmentation.

Noise regions: Regions in the machine segmentation that do not correspond to any region in the ground truth image.

Because making two regions correspond exactly is very difficult, a tolerance value T is defined. This tolerance represents the relative overlap required between the segmentation produced by the evaluated method and the ground truth; therefore, it controls the strictness of the evaluation. The evaluation shown in Table 1 was performed with a tolerance of 0.95 for all test images. This table shows the number of correctly detected regions (labeled Correctly), oversegmented regions (Over), undersegmented regions (Under), missed regions (Missed), and noise regions (Noise) for every image, for both our method (CIMAT) and the method of [6] (UE), which is the algorithm with the best evaluation in [8,10,18] and is considered one of the state-of-the-art methods in this area.
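For reference, one common reading of the correct-detection test of [8] at tolerance T is mutual overlap of at least T between the two regions; the following sketch is our paraphrase, not a transcription of the evaluation tool:

    def correctly_detected(ms_region, gt_region, T=0.95):
        # ms_region, gt_region: sets of pixel indices of a machine-segmentation
        # region and a ground-truth region; each must cover at least a fraction
        # T of the other for a correct detection.
        inter = len(ms_region & gt_region)
        return inter >= T * len(ms_region) and inter >= T * len(gt_region)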

As one can see, the performance of our algorithm is better with respect to the number of correctly detected and missed regions. The number of oversegmented and undersegmented regions is slightly higher for our procedure; this is due to very small or thin regions that appear mainly inside larger regions, such as the wall and floor. This problem occurs in places where the noise is high and there are no corner points nearby.

Table 1. Performance Comparison of the CIMAT and UE Methods (Structured Light Scanner Test, Tolerance 0.95)

Image  Total    Correctly     Over        Under       Missed      Noise
       regions  UE   CIMAT   UE  CIMAT   UE  CIMAT   UE  CIMAT   UE  CIMAT
00     30       12   13       0   0       0   0      18   17      18   22
01     26       13   16       1   1       0   0      12    9       9    9
02     24       12   13       0   0       0   0      12   11      10   11
03     16       11   12       0   0       0   0       5    4       2    3
04     22       12   13       0   2       0   0      10    7       9    7
05     21       10   11       0   0       0   0      11   10      10   11
06     19       11   12       0   1       0   1       8    4       5    4
07     20        8   11       0   0       0   0      12    9      11    9
08     24       15   11       0   2       0   0       9   11       9   11
09     20        8   10       0   0       0   0      12   10      11   12
10     13        7    7       0   1       0   0       6    5       3    5
11     12        6    6       0   1       0   0       6    5       4    5
12     25        9    6       0   1       0   0      16   18      15   19
13     20       10   14       0   0       0   0      10    6      10    6
14     20       10   13       1   3       0   0       9    4       6    5
15     20        8   10       0   1       0   0      12    9      10    9
16     12        7    9       1   0       0   0       4    3       3    4
17     12        5    8       0   0       0   0       7    4       6    4
18     20       10   12       0   1       0   0      10    7      10    8
19     26       11    9       0   0       0   0      15   17      13   18
20     18       10   13       0   0       0   0       8    5       7    6
21     12        8    7       0   0       0   0       4    5       4   12
22     14        5    4       0   1       0   0       9    9       8    8
23     18        8   11       0   0       0   0      10    7      10   11
24     19        8   11       0   1       0   0      11    7      11    8
25     14        8    9       0   0       0   0       6    5       5    5
26     13        7    7       0   0       0   0       6    6       6    7
27     12        5    7       0   0       0   0       7    5       6    4
28      9        6    9       0   0       0   0       3    0       4    0
29     12        5    7       0   1       0   0       7    4       7    4

TOTAL  543      265  301      3  17       0   1     275  223     242  247

Fig. 8. Example of the regions evaluated as oversegmented; the regions produced by noise are inside the white ellipses: (a) range image and (b) segmentation produced by the CIMAT method.


In Fig. 8 we show a typical example of two regions evaluated as oversegmented due to this problem.

It is also possible to plot performance curves for different values of the tolerance [10] in order to compare different methods. Figures 9 and 10 show these curves for the methods labeled UE [6], UB [5], UFPR [4], and for our method (CIMAT). As one can see, our method performs significantly better than the others, both with respect to correctly detected regions and with respect to missed regions. Another performance measure that is often used [10] is the area under the performance curve (AUC) of Fig. 9; the corresponding values for the four compared methods are given in Table 2 (a higher value indicates better performance).

B. Visual Comparison

Figure 11 shows an example of the results obtained with our algorithm compared with those of UE.

Fig. 10. Number of missed regions versus tolerance of the evaluation tool for different segmentation methods (see text).

Fig. 9. Number of correctly detected regions versus tolerance of the evaluation tool for different segmentation methods (see text).

Table 2. Area Under the Performance Curve (AUC) Value Comparison

Method   AUC Value
UB       18873
UFPR     19210
UE       19549.5
CIMAT    20363

Fig. 11. (a) Range image, (b) ground truth segmentation, (c) CIMAT segmentation, (d) UE segmentation.


As one can see, our results look more similar to the ground truth, particularly in the neighborhood of the borders between regions. This may be better appreciated in the close-up presented in Fig. 12. Because the focus of this work is the segmentation of range images of polyhedral objects, we believe that the accurate detection of these borders is particularly important.

4. Conclusions

We have presented a method for the segmentation of range images of polyhedral objects that is based on Bayesian estimation theory with a prior Markov random field model. The key elements for the success of this approach are the following:

1. The detection of corner points and, hence, the elimination of spurious information when fitting the planar models. This step is crucial for the correct initialization and updating of the model parameters.

2. The use of an appropriate geometric model for the global Cartesian coordinates associated with each pixel in the range image [Eq. (1)].

3. The use of a quadratic Markov measure field model for the segmentation, which includes a confidence field (δ) and a spatially varying potential (based on the B field) that takes into account the analytic formula for the intersection of neighboring planes.

The performance of the resulting method compares favorably with that of other published approaches, both quantitatively and qualitatively.

The authors were supported in part by Conacyt (Mexico) grants 46270 and 61367.

References

1. S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in Proceedings of the International Conference on Pattern Recognition (IEEE, 1984), pp. 806–808.

2. A. Hoover, "ABW images: the camera and models," http://marathon.csee.usf.edu/range/seg-comp/SegComp.html.

3. A. Hoover, D. Goldgof, and K. W. Bowyer, "The space envelope: a representation for 3D scenes," Comput. Vision Image Understand. 69, 310–329 (1998).

4. D. B. Goldgof, T. S. Huang, and H. Lee, "A curvature-based approach to terrain recognition," IEEE Trans. Pattern Anal. Mach. Intell. 11, 1213–1217 (1989).

5. X. Y. Jiang and H. Bunke, "Fast segmentation of range images into planar regions by scan line grouping," Machine Vision Appl. 7, 115–122 (1994).

6. E. Trucco and R. B. Fisher, "Experiments in curvature-based segmentation of range data," IEEE Trans. Pattern Anal. Mach. Intell. 17, 177–182 (1995).

7. T. S. Newman, P. J. Flynn, and A. K. Jain, "Model-based classification of quadric surfaces," CVGIP Image Understand. 58, 235–249 (1993).

8. A. Hoover, G. Jean-Baptiste, X. Jiang, P. J. Flynn, H. Bunke, D. B. Goldgof, K. Bowyer, D. W. Eggert, A. Fitzgibbon, and R. B. Fisher, "An experimental comparison of range image segmentation algorithms," IEEE Trans. Pattern Anal. Mach. Intell. 18, 673–689 (1996).

9. D. Huber, O. Carmichael, and M. Hebert, "3-D map reconstruction from range data," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '00) (IEEE, 2000), pp. 891–897.

10. J. Min, M. Powell, and K. W. Bowyer, "Automated performance evaluation of range image segmentation algorithms," IEEE Trans. Syst. Man Cybern. Part B 34, 263–271 (2004).

11. X. Jiang, "An adaptive contour closure algorithm and experimental evaluation," IEEE Trans. Pattern Anal. Mach. Intell. 22, 1252–1265 (2000).

12. R. W. Taylor, M. Savini, and A. P. Reeves, "Fast segmentation of range imagery into planar regions," Comput. Vision Graph. Image Process. 45, 42–60 (1989).

13. Y. Chen and G. Medioni, "Object modeling by registration of multiple range images," in IEEE Conference on Robotics and Automation (IEEE, 1991), pp. 2724–2729.

14. A. W. Fitzgibbon, D. W. Eggert, and R. B. Fisher, "High-level CAD model acquisition from range images," Comput. Aided Des. 29, 321–330 (1997).

15. S. Z. Li, Markov Random Field Modeling in Image Analysis (Springer, 2001).

16. M. Rivera, O. Ocegueda, and J. L. Marroquin, "Entropy-controlled quadratic Markov measure field models for efficient image segmentation," IEEE Trans. Image Process. 16, 3047–3057 (2007).

17. http://marathon.csee.usf.edu/range/seg-comp/SegComp.html.

18. X. Jiang, K. Bowyer, Y. Morioka, S. Hiura, K. Sato, S. Inokuchi, M. Bock, C. Guerra, R. E. Loke, and J. M. H. du Buf, "Some further results of experimental comparison of range image segmentation algorithms," in Proceedings of the International Conference on Pattern Recognition, Vol. 4 (IEEE, 2000), pp. 877–881.

Fig. 12. (a) CIMAT segmentation, (b) ground truth, and (c) UE segmentation.
