
Pattern Recognition 37 (2004) 145–159 www.elsevier.com/locate/patcog

Perceptual grouping of line features in 3-D space: a model-based framework

In Kyu Park a,*, Kyoung Mu Lee b, Sang Uk Lee b

a Multimedia Laboratory, Samsung Advanced Institute of Technology, San 14-1, Nongseo-ri, Kiheung-eup, Yongin 449-712, South Korea

b School of Electrical Engineering and Computer Science, Seoul National University, Seoul 151-742, South Korea

Received 14 February 2003; accepted 12 June 2003

Abstract

In this paper, we propose a novel model-based perceptual grouping algorithm for the line features of 3-D polyhedral objects. Given a 3-D polyhedral model, perceptual grouping is performed to extract a set of 3-D line segments which are geometrically consistent with the 3-D model. Unlike the conventional approaches, grouping is done in 3-D space in a model-based framework. In our unique approach, a decision tree classifier is employed for encoding and retrieving the geometric information of the 3-D model. A Gestalt graph is constructed by classifying input instances into proper Gestalt relations using the decision tree. The Gestalt graph is then decomposed into a few subgraphs, yielding appropriate groups of features. As an application, we suggest a 3-D object recognition system which can be accomplished by selecting a best-matched group. In order to evaluate the performance of the proposed algorithm, experiments are carried out on both synthetic and real scenes.
© 2003 Published by Elsevier Ltd on behalf of Pattern Recognition Society.

Keywords: Perceptual grouping; Model-based framework; Line feature; Decision tree classifier; Gestalt graph; Subgraph; Object recognition

1. Introduction

In artificial intelligence and computer vision, object recognition in a complex scene has been an ultimate goal for several decades. Although many algorithms have been proposed so far, their performance in real situations is still quite low compared with human visual ability. In order to mimic the recognition process of the human vision system and, therefore, to increase the performance of recognition algorithms, computer vision researchers have employed a psychological model of visual perception known as perceptual organization.

* Corresponding author. Tel.: +82-31-280-9247; fax: +82-31-280-9207.

E-mail addresses: [email protected] (I.K. Park), [email protected] (K.M. Lee), [email protected] (S.U. Lee).

The essential notion of perceptual organization is that object features are not distributed at random; rather, they exhibit some (possibly unknown) hierarchical regularities or patterns, known as the Gestalt principles [1]. According to Gestalt psychology, elements in an image are grouped from parts to whole during the recognition process, based on Gestalt principles such as proximity, parallelism, closure, symmetry, continuation, and so on. This process is known as perceptual grouping.

Starting from this psychological origin, many researchers have shown that grouping features that are likely to come from the same object plays a crucial role in the object recognition task [2–6], since it not only provides higher-level information but also reduces the computational complexity of the matching process significantly [1,3,7–10]. In addition to recognition, perceptual grouping has been applied to various tasks including stereo matching, model indexing, contour completion, figure-ground segmentation, change detection, and more [2,3,5,6,11–15].

0031-3203/$30.00 © 2003 Published by Elsevier Ltd on behalf of Pattern Recognition Society. doi:10.1016/S0031-3203(03)00225-5


Although it is not a simple task to group generic object features perceptually, perceptual grouping can be used as a powerful methodology for solving the problems of extracting and recognizing polyhedral objects and man-made structures, owing to their inherent structural regularities. Furthermore, the part-to-whole grouping hierarchy can be used efficiently for the reconstruction of 3-D structure. However, most existing perceptual grouping algorithms have several inherent drawbacks: they are often subject to heuristics, and they usually depend on a few parameters that must be adjusted manually, so their flexibility degrades from case to case. Moreover, since most of them are designed for grouping 2-D geometric primitives exclusively [2,3,9,10,13,14,16,17], direct extension to the 3-D problem is not successful, due to perspective distortion and occlusion. For example, parallel lines in 3-D space appear to converge to a vanishing point under perspective projection.

As has been discussed in recent workshops [18–20], it is generally understood that the underlying problems of the conventional straightforward perceptual grouping approaches could be improved substantially by taking into account the following new research directions [21]:

• The role of learning in perceptual grouping.
• The use of object models.
• Perceptual grouping in 3-D.
• Quality measures.
• Perceptual grouping of motion sequences.

Among the above suggestions, the first three are the issues addressed in this paper. We propose a novel perceptual grouping algorithm based on a 3-D model-based framework. The main idea of the proposed algorithm is to utilize not only the Gestalt principles but also the useful information provided by the given 3-D model, in order to improve grouping performance significantly. Unlike the conventional methods, the whole grouping process is carried out in 3-D space, in which object features are modeled as 3-D line segments. In order to achieve the 'model-based learning', the proposed algorithm employs the decision tree classifier [22], which is constructed by learning training samples of a given 3-D model. Decision tree learning has been used as a powerful and efficient machine learning method for acquiring useful knowledge from large amounts of data. In our decision tree learning and classification processes, a pair of 3-D line features is used as a training sample as well as an input instance. During classification, each input instance is classified by the decision tree, yielding a Gestalt relationship for the corresponding line pair. A Gestalt graph is then constructed to model the network of 3-D line features, in which the nodes represent 3-D line features and the edges encode the Gestalt relationships. Based on the graph representation, graph partitioning is then performed to generate a few subgroups consisting of model-consistent features. As an application, 3-D object recognition can be accomplished effectively by selecting the group best matched to the model using a simple alignment method.

Fig. 1. Block diagram of the proposed algorithm.

The overall block diagram of the proposed algorithm is shown in Fig. 1.

The main contributions of this paper are summarized as follows. First, by employing a 3-D model-based approach, the grouping process becomes not only independent of the viewpoint of a scene but also free from perspective distortion and occlusion. Second, and most importantly, the learning and model-based framework increases the capability of selecting relevant features while rejecting irrelevant features efficiently, so that the grouping yields quite a small number of feature groups that show strong geometric consistency with the given model. Third, the proposed algorithm is computationally inexpensive, since the training is done off-line and classifying an instance involves only a few comparisons of floating-point parameters. Finally, the decision tree classifier is robust to noise and clutter. Note that, by employing a probabilistic noise model for each attribute, background clutter need not be modeled explicitly.

This paper is organized as follows. Section 2 reviews recent work on perceptual grouping. In Section 3, methods for extracting 3-D features and their representation are presented. Modeling and constructing a decision tree classifier are described in Sections 4 and 5, respectively. In Section 6, a grouping algorithm based on the decision tree classification is presented. 3-D object recognition is exemplified as an application of grouping in Section 7. The experimental results are provided in Section 8. Finally, the conclusions are drawn in Section 9.

2. Previous work

Perceptual grouping has been used in various problems by many researchers. Based on the classificatory structure, algorithms are roughly categorized by the signal dimensionality and the level of abstraction [8]. A thorough review of the pre-1993 works can be found in [8].


For a survey of more recent works, the reader can refer to the special issue of Computer Vision and Image Understanding [21], the workshop proceedings [18,19], and the talks at [20].

Sarkar and Boyer [23,24] developed the perceptual inference network (PIN) to handle spatial data and extended the applicability of Bayesian networks to visual processing. As an improvement on Lowe's well-known work [7], Havaldar et al. [3] proposed an algorithm to recognize generic objects from a single intensity image, which exploited criteria such as proximity, symmetry, parallelism, and closure; detection is performed using proximity indexing. Murino et al. [16] constructed a straight-segment graph (SSG) using the direct Hough transform (DHT); a minimum-energy configuration was then found by iteratively assigning a label to each node. Descombes and Pun [13] reduced the complexity of the grouping process to a linear problem, with respect to the number of image contours, based on asynchrony. Other than Bayesian networks and energy minimization methods, Boshra and Zhang [5] used a constraint-satisfaction approach, which was implemented with lower-order polynomial complexity by using local-consistency enforcement.

The general strategy used in previous works can be described roughly in two steps. First, Gestalt principles are designed and implemented for image features. Then, a graph or a network of the features is constructed, and meaningful structures are extracted by using search methods such as Bayesian inference, energy minimization, and constraint satisfaction.

Although this approach is efficient and promising, we believe an object model and a learning methodology would improve the performance significantly. To our knowledge, however, little work has been done on this. Recently, Sarkar and Soundararajan [25] attempted it, in which learning is accomplished by stochastic automata, while grouping is done by graph partitioning. Syeda-Mahmood [26] proposed an approach to find closely spaced parallelism between lines on 3-D objects to achieve data- and model-driven selection. It was assumed that both the 2-D model with 3-D affine pose change and a scaled-orthographic projection model preserve the parallelism. Note that probabilistic approaches have also been adopted recently by a few researchers. For example, the probability distribution of possible image feature groups was derived in [27], and the combination of incommensurable sources of information was achieved in [14].

3. Feature extraction and modeling

Our objective is to group 3-D features perceptually. Since 3-D line segments can describe 3-D polyhedral objects efficiently, they are useful features for this purpose. Note that grouping performance is highly degraded if the 3-D features are extracted poorly. In this context, the importance of the low-level feature extraction processing cannot be overemphasized.

3.1. 3-D line feature extraction

The coordinates of 3-D line segments are formed by back-projecting the endpoints of 2-D line segments using the 3-D depth of the endpoints. In our implementation, the Nevatia–Babu edge detector is used to extract 2-D edge contours. Note that the performance of the Nevatia–Babu edge detector is comparable to that of the Canny edge detector, especially when we deal with line edges. The detected edges are thinned and linked together by searching 8-connected neighbors. Finally, each linked chain is fitted by a 2-D line segment, taking into account the fitting error and the minimum length.
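As an illustration of this last fitting step, the sketch below fits a line segment to a linked edge chain by total least squares and rejects fits that are too inaccurate or too short. It is only a sketch under our own assumptions: the function name, the SVD-based fitting, and the threshold values are illustrative and not the paper's implementation.

```python
import numpy as np

def fit_line_segment(chain, max_error=1.5, min_length=10.0):
    """Fit a 2-D line segment to a linked edge chain (total least squares via SVD).
    Returns the two endpoints, or None if the fitting error is too large or the
    segment is too short. Thresholds are illustrative, not the paper's values."""
    pts = np.asarray(chain, dtype=float)          # (N, 2) pixel coordinates
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    direction = np.linalg.svd(centered)[2][0]     # principal direction of the chain
    t = centered @ direction                      # signed positions along the line
    normal = np.array([-direction[1], direction[0]])
    error = np.abs(centered @ normal).mean()      # mean perpendicular deviation
    length = t.max() - t.min()
    if error > max_error or length < min_length:
        return None
    return centroid + t.min() * direction, centroid + t.max() * direction
```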

In order to obtain 3-D depth values of the endpoints, a space encoding range finder is utilized in this work. The system configuration of the range finder is shown in Fig. 2(a). Note that any other 3-D acquisition method, such as multi-baseline stereo reconstruction, could be used as well.

Although such systems provide usable depth values, there exists an inevitable amount of measurement noise which disturbs the subsequent grouping process. In our approach, a practical noise model is introduced and analyzed, so that the grouping parameters are derived adaptively based on the noise model. The procedure is described in detail in the following subsection.

3.2. Probabilistic noise modeling of 3-D line feature

Noise modeling and analysis are performed on the positional variations of the endpoints of 3-D line segments. They are done in a Monte Carlo framework, using reference points with known groundtruth. As shown in Fig. 2(b), we have 72 reference points marked on a pattern box. 3-D range data of the reference points are measured several times, while rotating the box. The generated noise samples are shown in Fig. 3(a).

The measurement noise is defined as the Euclidean distance between the measured point and the groundtruth, which is represented by a random variable $\hat{X}$. Let $N$ denote the number of measured points. Then, the cumulative distribution function $\hat{F}(x)$ is estimated by

$$\hat{F}(x) = \mathrm{Prob}(\hat{X} \leq x) = \frac{\#\{\text{samples with measurement noise less than } x\}}{\#\{\text{total noise samples}\}} = \frac{1}{N}\int_0^x \sum_{k=0}^{N-1} \delta(s - s_k)\,\mathrm{d}s, \qquad (1)$$

where $s_k$ is the measurement noise of the $k$th sample and $\delta(\cdot)$ denotes the impulse function. The estimated $\hat{F}(x)$ is shown in Fig. 3(b); it is approximately a Gaussian distribution.



Fig. 2. Space encoding range finder. (a) System configuration. (b) Pattern box with known reference points.

Fig. 3. Noise modeling of the range finder. (a) Measurement noise distribution of the reference points. (b) Observed cumulative distribution function. (c) Approximated Gaussian cumulative distribution function. (d) Approximated Gaussian density function.

Assuming a Gaussian distribution, functional fitting is employed to find the optimal variance $\sigma_m^2$ which makes the resultant cumulative distribution best fit $\hat{F}(x)$. The result is shown in Fig. 3(c). It is shown that the fitted distribution function $F(x)$ approximates $\hat{F}(x)$ very well. The corresponding density function, $f(x)$, is shown in Fig. 3(d).

It is known that the measurement noise increases in proportion to the distance from the focal plane to the object [28]. This is well exemplified by the measurement data at different distances, as shown in Fig. 4(a). The proportional relation is modeled by a linear approximation as shown in Fig. 4(b). Therefore, the effect of differences in measuring configurations or environments can be compensated by considering $X$, given by

$$X = a\hat{X} + b, \qquad (2)$$

where $a$ and $b$ are the coefficients of the linear function obtained in Fig. 4(b). Finally, the optimal noise variance is determined by

$$\sigma_{\mathrm{opt}}^2 = a^2 \sigma_m^2. \qquad (3)$$


Fig. 4. Measurement noise of the range finder with respect to the distance from the focal plane. (a) Measurement noise of selected reference points. (b) Average measurement noise.

4. Modeling a decision tree

We now introduce the key ingredients of the decision tree, which plays a crucial role in our approach. In the subsequent Section 5, the training and construction of the decision tree will be presented.

4.1. Decision tree classifier

A decision tree classifies instances by sorting them down the tree from the root to some leaf node [22,29]. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of this attribute. The decision tree of the well-known PlayTennis example [22] is shown in Fig. 5. It determines whether an instance (a weather condition) is suitable for playing tennis or not. For example, the instance

{Outlook = Sunny, Temperature = Cool, Humidity = Low, Wind = Weak}

would be sorted down to the negative class NO. In this example, the instance has a vector representation consisting of four attributes, while the target class is twofold: YES and NO.
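To make the sorting procedure concrete, the snippet below encodes a small nested-dictionary tree in the spirit of the PlayTennis example. The exact branch structure is our own illustrative assumption, chosen only so that the instance above reaches the class NO as stated; it is not reproduced from Fig. 5.

```python
# A minimal nested-dict decision tree in the spirit of the PlayTennis example.
# Each internal node maps an attribute to a dictionary of branches; leaves are classes.
play_tennis_tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"Low": "NO", "High": "NO", "Normal": "YES"}},
        "Overcast": "YES",
        "Rain": {"Wind": {"Weak": "YES", "Strong": "NO"}},
    }
}

def classify(tree, instance):
    """Sort an instance down the tree from the root to a leaf node."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree

instance = {"Outlook": "Sunny", "Temperature": "Cool", "Humidity": "Low", "Wind": "Weak"}
print(classify(play_tennis_tree, instance))   # -> NO
```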

4.2. Instance attributes

Instance attributes should provide proper measures to distinguish the geometric differences among target classes. In our work, a pair of 3-D line segments is used as an instance. As will be discussed later, the target classes are specified by the key geometric relations between 3-D line features, including collinear (CL), parallel (PP), convergent (CV), and none (NO).

The target classes can be effectively discriminated by comparing a few geometric parameters such as angle and distance. These parameters are adopted as the instance attributes. For example, in order to distinguish convergent from collinear and parallel, the between-angle property, $\omega$, is examined (Fig. 6(a)). Next, to evaluate proximity, the minimum distance between two line segments, $d_{\min}$, is defined (Fig. 6(b)). In addition to the between angle and minimum distance, the projected overlapping ratio, $L_0$, is also taken into account (Fig. 6(c)) to distinguish parallel from collinear. Usually, the orthographic projection of one segment onto the other should overlap enough for a pair of line segments to be parallel. Finally, 3-D coplanarity is a necessary condition for a pair of line segments to be grouped into one of the discussed target classes. Coplanarity ($P$) can be measured by computing the average deviation of the endpoints from the fitted plane. In summary, an instance (a pair of 3-D line segments) is represented by an attribute vector, denoted by

$$I = \{\omega,\, d_{\min},\, L_0,\, P\}. \qquad (4)$$
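A sketch of how such an attribute vector could be computed for a pair of 3-D line segments is shown below. The minimum distance is approximated by the closest endpoint pair and the overlap is measured along the first segment's axis; these are simplifications we assume for illustration rather than the paper's exact definitions.

```python
import numpy as np

def pair_attributes(p0, p1, q0, q1):
    """Attribute vector I = (omega, d_min, L0, P) for two 3-D segments (p0, p1) and (q0, q1).
    Simplified sketch: d_min uses the closest endpoint pair and L0 is measured along
    the axis of the first segment."""
    p0, p1, q0, q1 = (np.asarray(v, dtype=float) for v in (p0, p1, q0, q1))
    u = (p1 - p0) / np.linalg.norm(p1 - p0)
    v = (q1 - q0) / np.linalg.norm(q1 - q0)

    # Between angle (degrees), clipped for numerical safety.
    omega = np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

    # Proximity: minimum distance over the four endpoint pairs (approximation).
    d_min = min(np.linalg.norm(a - b) for a in (p0, p1) for b in (q0, q1))

    # Projected overlap of (q0, q1) onto the axis of (p0, p1).
    t = sorted([np.dot(q0 - p0, u), np.dot(q1 - p0, u)])
    L0 = max(0.0, min(t[1], np.linalg.norm(p1 - p0)) - max(t[0], 0.0))

    # Coplanarity: mean distance of the four endpoints to their best-fit plane (via SVD).
    pts = np.stack([p0, p1, q0, q1])
    centered = pts - pts.mean(axis=0)
    normal = np.linalg.svd(centered)[2][-1]
    P = float(np.abs(centered @ normal).mean())
    return omega, d_min, L0, P
```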

4.3. Target classes

In the proposed decision tree classifier, the target classes consist of four geometric relations, i.e., collinear (CL), parallel (PP), convergent (CV), and none (NO). Based on them, the shape of a 3-D polyhedral model can be characterized properly. Note that they are key ingredients of the Gestalt relations, which are known to be intermediate structures for human perception of 3-D objects [1].

In order to evaluate the confidence of classification, we define a measure of significance for each class. The significance level, $C_{\mathrm{CL}}$, of a collinear pair is defined by

$$C_{\mathrm{CL}} = 1 - \min\!\left(1,\ \frac{\delta}{\min(l_1, l_2)}\right), \qquad (5)$$

where $\delta$ is the minimum distance between the endpoints of the line segments with lengths $l_1$ and $l_2$. As shown in Fig. 7(a)–(c), the significance level increases as the line segments become closer.

The significance level of a parallel pair, $C_{\mathrm{PP}}$, is defined as the normalized overlapping ratio of the orthographic projection of one line segment onto the other.



Fig. 5. The decision tree example of PlayTennis [22]. (a) Decision tree. (b) Training samples.


Fig. 6. Instance attributes. (a) Between angle ($\omega$). (b) Minimum distance ($d_{\min}$). (c) Projected overlapping ratio ($L_0$).


Fig. 7. Target classes and significance level. (a) Strong collinear pair. (b) Weak collinear pair. (c) Invalid collinear pair. (d) Strong parallel pair. (e) Weak parallel pair. (f) Invalid parallel pair. (g) Strong convergent pair. (h) Weak convergent pair. (i) Invalid convergent pair.

This ratio is given by

$$C_{\mathrm{PP}} = \frac{L_0}{\min(l_1, l_2)}, \qquad (6)$$

where $L_0$ is the overlapped length. The parallelism becomes stronger when the line segments overlap more, as illustrated in Fig. 7(d)–(f).

The significance level of a convergent pair, $C_{\mathrm{CV}}$, measures the completeness of the convergent corner, and is defined by the minimum of the line completeness values of the two line segments. The line completeness is evaluated by the ratio of the segment length to the elongated segment length (Fig. 7(h)). The mathematical definition of $C_{\mathrm{CV}}$ is given by

$$C_{\mathrm{CV}} = \min\!\left(\frac{l_1}{L_1},\ \frac{l_2}{L_2}\right), \qquad (7)$$

where $L_1$ and $L_2$ denote the lengths of the elongated line segments. Note that $C_{\mathrm{CL}}$, $C_{\mathrm{PP}}$, and $C_{\mathrm{CV}}$ are all normalized values between 0 (zero significance) and 1 (perfect significance).
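Eqs. (5)–(7) translate directly into code. The helper functions below are a minimal sketch; the argument names (the endpoint gap delta, the segment lengths l1 and l2, the overlap L0, and the elongated lengths L1 and L2) follow the definitions above.

```python
def significance_collinear(delta, l1, l2):
    """Eq. (5): collinear significance grows as the endpoint gap delta shrinks."""
    return 1.0 - min(1.0, delta / min(l1, l2))

def significance_parallel(L0, l1, l2):
    """Eq. (6): parallel significance is the normalized overlapped length L0."""
    return L0 / min(l1, l2)

def significance_convergent(l1, L1, l2, L2):
    """Eq. (7): convergent significance is the minimum line completeness l_i / L_i."""
    return min(l1 / L1, l2 / L2)
```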

5. Construction of a decision tree

A decision tree is constructed by learning training samples obtained from a given 3-D polyhedral model. In this paper, we employ the well-known ID3 algorithm [22]. In this section, we describe the tree construction procedure, as well as the practical issue of discretizing continuous attributes.

5.1. ID3

ID3 [22] is one of the most popular algorithms for constructing a decision tree. It builds the tree in a top-down manner, performing a greedy search to find the appropriate attribute to test at each node. The test at a specific node is selected to yield the maximum information gain. The information gain is the measure of the purity increase that can be expected as a result of partitioning the training samples by the test.


Fig. 8. Model and training samples. (a) A simple polyhedral model. (b) Training samples (36 samples in total). P is the coplanarity coefficient.

The information gain is evaluated by the entropy of the training samples with respect to class homogeneity. The constructed decision tree of the PlayTennis example is shown in Fig. 5. For a complete discussion, the reader should refer to Refs. [22] or [29].
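For reference, the entropy and information-gain computation at the heart of ID3 can be sketched as follows; the sample representation (a list of attribute dictionaries with parallel class labels) is an assumption made for illustration. At each node, ID3 selects the attribute with the largest gain and recurses on the partitioned samples.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (e.g. CL, PP, CV, NO)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(samples, labels, attribute):
    """Expected entropy reduction from splitting on one discretized attribute.
    `samples` is a list of dicts mapping attribute names to discretized values."""
    by_value = {}
    for sample, label in zip(samples, labels):
        by_value.setdefault(sample[attribute], []).append(label)
    remainder = sum(len(ls) / len(labels) * entropy(ls) for ls in by_value.values())
    return entropy(labels) - remainder
```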

5.2. Learning training samples

In the 3-D model-based framework, decision tree construction is equivalent to learning the training samples extracted from a given 3-D polyhedral model. Since ID3 utilizes only discrete attributes, continuous attribute values should first be discretized or categorized, which is the issue of the next subsection.

The training samples are basically the set of all possible pairs of the 3-D line segments in the polyhedral model. In Fig. 8(a), an example of a polyhedral model and its line segments is shown. A partial set of training samples is shown in Fig. 8(b), to which the ID3 algorithm can be applied. The target class of each sample is obtained by hand-labeling. Note that all the learning procedures are done off-line.

5.3. Discretization of continuous attributes

ID3 accepts only categorized or discretized attributes. Therefore, numerical attributes should be discretized prior to tree construction [30]. Since the training samples are obtained from a static 3-D polyhedral model, $\omega$, $d_{\min}$, and $C$ have a few dominant values, which is well exemplified in Fig. 8(b). Let $S_A$, $S_D$, and $S_C$ denote the sets of the key values of $\omega$, $d_{\min}$, and $C$, respectively. Then we have

$$S_A = \{A_0, A_1, \ldots, A_{l-1}\}, \quad S_D = \{D_0, D_1, \ldots, D_{m-1}\}, \quad S_C = \{C_0\,(=0)\}. \qquad (8)$$

In ID3, only a single attribute is tested at each node. For instance: 'If the angle attribute of an instance is $A_i$, then branch to the node $N_s$; otherwise, branch to the node $N_t'$.' If the feature extraction were perfect and the extracted features had no noise, we could use the elements of $S_A$, $S_D$, and $S_C$ directly as the categorized/discretized attributes. However, in a real situation, no instance will have attribute values exactly equal to those in $S_A$, $S_D$, and $S_C$. Therefore, it is proper to discretize the attributes into intervals, allowing some margin for the underlying noise. Note that a naive selection of attribute intervals does not provide robust classification in real applications.

In order to solve this problem, we perform a probabilistic analysis of the key values in $S_A$, $S_D$, and $S_C$. The goal is to predict the probabilistic distributions of the observed attribute values in a noisy environment, and consequently to perform the discretization of the attributes adaptively.

Without loss of generality, the procedure of the probabilistic modeling is described for a specific key value $A_i$ in $S_A$. The observed angle near $A_i$ is distorted by the positional error of the endpoints of the line segments, yielding a distribution centered at $A_i$. Notice that the position of an endpoint is represented by an independently and identically distributed Gaussian random variable $\hat{X}$, as described in Section 3. Therefore, the observed angle can be considered as a random variable $\Theta$, which is a function of $\hat{X}$, given by

$$\Theta = \cos^{-1}\!\left(\frac{(\bar{X}_1 - \bar{X}_0)\cdot(\bar{X}_3 - \bar{X}_2)}{\|\bar{X}_1 - \bar{X}_0\|\,\|\bar{X}_3 - \bar{X}_2\|}\right), \qquad (9)$$

where $(\bar{X}_0, \bar{X}_1)$ and $(\bar{X}_2, \bar{X}_3)$ denote the position vectors of the endpoints of the two line segments, respectively. However, it is not a simple task to derive the distribution of $\Theta$ in Eq. (9) analytically. Thus, in our approach, many random instances are generated using the distribution of $\hat{X}$, which is essentially a Monte Carlo simulation, and the probability distribution of $\Theta$ is estimated from the observed values. In this procedure, $\Theta$ is assumed to be Gaussian; it has been observed in many examples that the observed distribution is approximated well by a Gaussian.
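The Monte Carlo step can be sketched as below: the endpoints of two segments are repeatedly perturbed with Gaussian noise as in the model of Section 3, and the mean and standard deviation of the resulting between-angle samples give the Gaussian approximation. The function name and its parameters are illustrative assumptions.

```python
import numpy as np

def simulate_angle_distribution(X0, X1, X2, X3, sigma, n_trials=10000):
    """Monte Carlo estimate of the observed between-angle distribution (Eq. (9))
    when every endpoint is perturbed by i.i.d. Gaussian noise of standard deviation sigma."""
    pts = np.stack([X0, X1, X2, X3]).astype(float)
    angles = np.empty(n_trials)
    for i in range(n_trials):
        noisy = pts + np.random.normal(0.0, sigma, size=pts.shape)
        u = noisy[1] - noisy[0]
        v = noisy[3] - noisy[2]
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angles[i] = np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
    return angles.mean(), angles.std()   # parameters of the fitted Gaussian
```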

Similar procedures can be applied to all the key values in $S_A$, $S_D$, and $S_C$ to obtain their probabilistic distributions.


Fig. 9. Attribute noise modeling for a simple box model. (a) A simple box model. (b) Probability density of the angle attribute. (c) Probability density of the distance attribute. (d) Probability density of the coplanarity attribute.

Fig. 10. Discretization example of the continuous attributes.

In Fig. 9, the probability density functions of the attributes for a simple object model are shown. Using the 99% significance level of the Gaussian distribution, each density function $N(m, \sigma^2)$ yields a discretization interval of $[m - 2.58\sigma,\ m + 2.58\sigma]$. Each interval corresponds to the relevant classification class. When intervals overlap, the maximum a posteriori rule is applied to select a prevailing class. Uncovered intervals are assigned to the class NO. An example of the generated discretization intervals is shown in Fig. 10, in which the positions of the key values and the resultant target classes are specified on the corresponding intervals.
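A minimal sketch of this interval construction, assuming each key value has already been assigned a simulated standard deviation as above:

```python
def discretization_interval(m, sigma, z=2.58):
    """99% interval [m - 2.58*sigma, m + 2.58*sigma] around a key attribute value."""
    return (m - z * sigma, m + z * sigma)

# Example with hypothetical key angle values (degrees) and simulated standard deviations.
key_angles = {0.0: 1.8, 45.0: 2.1, 90.0: 2.4}
intervals = {A: discretization_interval(A, s) for A, s in key_angles.items()}
# Overlapping intervals would be resolved by the maximum a posteriori rule;
# values falling outside every interval map to the class NO.
```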

5.4. Summary and example

The proposed learning algorithm is summarized in Fig. 11(a). In Fig. 11(b), we present an example of the decision tree constructed for the 3-D model shown in Fig. 8(a), in which, for convenience, each branching is specified by a key value instead of a proper interval. Note that the tree does not have any node for testing the projected overlapping ratio, because that attribute does not provide any useful information for the classification in this example. Nevertheless, the tree performs the classification properly, which is one of the nice properties of decision tree learning.

6. Instance grouping using the decision tree

In this section, we present the initial grouping and refinement procedures, which are the essential parts of the proposed two-stage grouping.

6.1. Initial grouping: Gestalt graph construction

In the initial grouping stage, an instance set is first generated by collecting all the pairs of extracted line features. Each instance is then classified into one of the target classes simply by putting it into the decision tree classifier.


Fig. 11. Summary of decision tree construction. (a) The procedure. (b) Constructed decision tree for the model in Fig. 8(a).

A target class implies a Gestalt relation between the line segments. After classifying all the instances, we have a network of line segments, i.e., a Gestalt graph, in which an edge denotes a Gestalt relation. It is similar to the graph representation in [8], in which five basic graphs are considered independently. The proposed graph representation does not use multiple Gestalt graphs, but puts all the derived Gestalt relations into a single graph simultaneously. This facilitates a simpler structure for learning and classification. Let us denote the nodes, the edges, and the decision tree classifier as $\mathbf{V}$, $\mathbf{E}$, and $DT$, respectively. Then, the Gestalt graph $G$ is defined as a 3-tuple, given by

$$G = (\mathbf{V}, \mathbf{E}, DT), \qquad (10)$$

where

$$\mathbf{V} = \{V_i \mid V_i = l_i\}, \qquad \mathbf{E} = \{E_{i,j} \mid E_{i,j} = (DT(V_i, V_j),\, C)\}. \qquad (11)$$

In Eq. (11), $C$ denotes the significance level of the classification. In case the output of $DT$ is CV, the edge attribute carries the coefficients of the plane on which $V_i$ and $V_j$ lie, as well as the Gestalt relation.
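Assuming the attribute extraction, the trained classifier, and a per-relation significance measure are available as functions (the helper names below are hypothetical), the initial grouping stage can be sketched as:

```python
from itertools import combinations

def build_gestalt_graph(segments, decision_tree, attributes, significance):
    """Construct the Gestalt graph of Eqs. (10)-(11).
    `segments` is a list of 3-D line segments; `decision_tree(attr)` returns one of
    'CL', 'PP', 'CV', 'NO'; `attributes` and `significance` are assumed helpers."""
    V = list(range(len(segments)))
    E = {}
    for i, j in combinations(V, 2):
        attr = attributes(segments[i], segments[j])
        relation = decision_tree(attr)
        if relation != 'NO':
            conf = significance(relation, segments[i], segments[j])
            E[(i, j)] = (relation, conf)   # edge attribute: Gestalt relation + significance
    return V, E
```

Each remaining edge thus carries both the relation label and its significance level, in the spirit of the edge attribute $(DT(V_i, V_j), C)$ of Eq. (11).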

6.2. Refinement: graph partitioning

The Gestalt graph is a network of 3-D features which is geometrically consistent with the given 3-D model. Due to the discrimination power of the model-based grouping, most other features are eliminated and only a small number of features remain. It is observed from the experiments that the Gestalt graph can be partitioned into several subgraphs, based on the effective connectivity. To this end, graph-cutting edges whose significance level is below a certain threshold are found.

Fig. 12. Object recognition by group selection.

In our implementation, the threshold is set to 0.5. If a small number of edges (usually fewer than 3) divides the graph into two or more subgraphs and the resulting child subgraphs are large enough, then these edges are removed from the graph, creating new subgraphs. The process is performed iteratively, yielding the subgraphs as the final grouping result. In many cases, however, only a single group remains, so that further decomposition is not necessary. Note that this observation confirms the efficacy of the model-based approach.
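A simplified sketch of the refinement step is given below: it only thresholds the edge significance and returns the connected components of the remaining graph, whereas the procedure described above additionally checks the size of the child subgraphs and iterates.

```python
def partition_gestalt_graph(V, E, threshold=0.5):
    """Drop edges whose significance falls below the threshold and return the
    connected components of the remaining graph as candidate feature groups."""
    adj = {v: set() for v in V}
    for (i, j), (_, conf) in E.items():
        if conf >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    groups, seen = [], set()
    for v in V:
        if v in seen:
            continue
        stack, component = [v], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            component.append(u)
            stack.extend(adj[u] - seen)
        groups.append(component)
    return groups
```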

7. An application: 3-D object recognition

In this section, we show that the proposed grouping technique can be applied directly to the 3-D object recognition task. In our framework, recognition becomes the process of selecting the most probable group from the grouping result. To this end, a simple alignment method is employed, in which all possible alignments between the line features in each candidate group and those in the given 3-D model are examined. The alignment method is known to be computationally expensive.


Fig. 13. Multilevel grouping result. (a) Input scene (from top view). (b) 3-D polyhedral model. (c) Extracted line segments. (d) Detected collinear pairs. (e) Detected parallel pairs. (f) Detected convergent pairs. (g) Grouping result. (h) Best-matched group.

However, the complexity is tolerable in our case, since there are only a few nodes in each subgraph. The procedure in Fig. 12 is used in the real implementation. In this way, we can select the feature group which shows the best geometric compatibility with the given model as the final recognition result.

8. Experimental results and discussion

In order to examine the performance of the proposed algorithm, experiments have been carried out on both synthetic and real scenes.


Fig. 14. Multilevel grouping result. (a) Input scene (from oblique view). (b) 3-D polyhedral model. (c) Extracted line segments. (d) Detected collinear pairs. (e) Detected parallel pairs. (f) Detected convergent pairs. (g) Grouping result. (h) Best-matched group.

Since polyhedral objects were our main concern, several aerial images containing man-made structures were used for the experiments.

8.1. Grouping results

The intermediate grouping results were first investigated on synthetic aerial scenes, as shown in Figs. 13 and 14. Each scene consists of several block-shaped buildings from the RADIUS model board [31]. Each building was texture-mapped using textures from the board images. Figs. 13(a) and (b) show the input image and the 3-D polyhedral model, respectively. The initial groupings of the extracted line segments, shown in Fig. 13(c), are presented in Figs. 13(d)–(f), in which the detected collinear, parallel, and convergent pairs are shown, respectively.


Fig. 15. Experimental result for the model board. (a) Input intensity image. (b) 3-D polyhedral model. (c) Extracted line features. (d) Grouping result.

Notice that all the extracted pairs are drawn together. By constructing the Gestalt graph and partitioning it into subgraphs, the resultant feature groups were obtained as shown in Fig. 13(g). Fig. 13(h) shows the best-matched group selected using the alignment method. For another test scene with a different viewpoint, shown in Fig. 14, correct grouping and matching results were also obtained.

For the test on real scenes, a city model board was specially designed to include various types of buildings with different sizes and roof shapes. The reduced scale of the model board is 1:1000; therefore, a 5 cm-high building on the board corresponds to a 50 m-high building in the real world. Figs. 15(a) and (b) show the input intensity image and the 3-D polyhedral model, respectively. The extracted line features are shown in Fig. 15(c). The grouping result is presented in Fig. 15(d), in which only a single group remained, which was also the best-matched group. In contrast, as shown in Fig. 16(d), which presents the grouping results for the scene from another viewpoint shown in Fig. 16(a), four candidate groups were extracted. After the alignment process, the third group was finally verified as the best match, as shown in Fig. 16(e). Notice that the graph edges are shown (in blue) together with the line segments themselves.

We have observed from the experimental results that only a small number of feature groups remained for all the test scenes. In addition, each extracted group consisted of a small number of features, which had strong geometric coherence with the given 3-D model. Note that ruling out irrelevant features as much as possible, while keeping the relevant ones, is one of the important and desirable properties of a grouping algorithm. Although no objective measure of the grouping performance is provided, this filtering property subjectively indicates the good quality of the proposed grouping.


Fig. 16. Experimental result for the model board. (a) Input intensity image. (b) 3-D polyhedral model. (c) Extracted line features. (d) Grouping result. (e) Best-matched group.


8.2. Computational complexity

Now let us summarize the computational complexity of the proposed algorithm. Decision tree construction can be performed off-line, so it is not of concern here. Let $N$ denote the number of extracted line features of the input image. Then, the number of input instances is $\binom{N}{2}$, and the complexity of the classification procedure becomes $O(N^2)$. However, since the classification of an input instance involves only a few comparisons, $\binom{N}{2}$ instances are quite tolerable in our application. Moreover, once the initial grouping is performed, only a small fraction of the input instances remains, reducing the computational complexity of the subsequent procedures to sublinear in $N$. In our experiments, the algorithm was implemented and run on a Pentium II 400 MHz processor. It was observed that the grouping was performed within 10 s, which confirms the computational efficiency of the proposed algorithm.

9. Conclusion

In this paper, we proposed a new model-based perceptual grouping algorithm for 3-D line features in an image. Our main contribution is to utilize not only the Gestalt relations, but also the information provided by a given 3-D polyhedral model. We employed a machine learning technique, i.e., a decision tree classifier, for learning the model and grouping the input instances which are geometrically consistent with the model. The learning and model-based framework increased the capability of selecting relevant features and rejecting irrelevant features significantly. A Gestalt graph was constructed as a result of grouping, and further refinement yielded a few groups that were geometrically consistent with the 3-D model.

Since the proposed grouping process was performed in 3-D, it was independent of the viewpoint of a scene and also free from perspective distortion. In addition, the proposed algorithm was computationally inexpensive: the training was done off-line, and the instance classification involved only a few comparisons and floating-point operations.

In the future, we plan to extend the proposed algorithm to have the capability of learning and classifying multiple 3-D objects. Our future work also includes grouping-based object recognition and tracking in image sequences. We are also interested in applying the proposed method to image-based scene modeling, in which a 3-D textured polyhedral world could be created automatically.

References

[1] D.G. Lowe, Perceptual Organization and Visual Recognition, Kluwer Academic, Norwell, MA, 1985.

[2] R. Mohan, R. Nevatia, Using perceptual organization to extract 3-D structures, IEEE Trans. Pattern Anal. Mach. Intell. 11 (11) (1989) 1121–1139.

[3] P. Havaldar, G. Medioni, F. Stein, Perceptual grouping for generic recognition, Int. J. Comput. Vision 20 (1/2) (1996) 59–80.

[4] A. Amir, M. Lindenbaum, A generic grouping algorithm and its quantitative analysis, IEEE Trans. Pattern Anal. Mach. Intell. 20 (2) (1998) 168–185.

[5] M. Boshra, H. Zhang, A constraint-satisfaction approach for 3-D object recognition by integrating 2-D and 3-D data, Comput. Vision Image Understanding 73 (2) (1999) 200–214.

[6] A. Selinger, R.C. Nelson, A perceptual grouping hierarchy for appearance-based 3-D object recognition, Comput. Vision Image Understanding 76 (1) (1999) 83–92.

[7] D.G. Lowe, 3-D object recognition from single 2-D images, Artif. Intell. 31 (3) (1987) 355–395.

[8] S. Sarkar, K.L. Boyer, Perceptual organization in computer vision: a review and a proposal for a classificatory structure, IEEE Trans. System Man Cybernet. 23 (2) (1993) 382–399.

[9] S. Sarkar, K.L. Boyer, A computational structure for preattentive perceptual organization: graphical enumeration and voting methods, IEEE Trans. System Man Cybernet. 24 (2) (1994) 246–267.

[10] G.L. Foresti, C. Regazzoni, A hierarchical approach to feature extraction and grouping, IEEE Trans. Image Process. 9 (6) (2000) 1056–1074.

[11] D.W. Jacobs, Robust and efficient detection of salient convex groups, IEEE Trans. Pattern Anal. Mach. Intell. 18 (1) (1996) 23–37.

[12] A. Yla-Jaaski, F. Ade, Grouping symmetrical structures for object segmentation and description, Comput. Vision Image Understanding 63 (3) (1996) 399–417.

[13] A.J. Descombes, T. Pun, Asynchronous perceptual grouping: from contours to relevant 2-D structures, Comput. Vision Image Understanding 66 (1) (1997) 1–24.

[14] D. Crevier, A probabilistic method for extracting chains of collinear segments, Comput. Vision Image Understanding 76 (1) (1999) 36–53.

[15] I.K. Park, K.M. Lee, S.U. Lee, Recognition and reconstruction of 3-D objects using model-based perceptual grouping, Proceedings of International Conference on Pattern Recognition, Vol. I, Barcelona, Spain, September 2000, pp. 720–724.

[16] V. Murino, C.S. Regazzoni, G.L. Foresti, Grouping as a searching process for minimum-energy configuration of labelled random fields, Comput. Vision Image Understanding 64 (1) (1996) 157–174.

[17] J. Feldman, Regularity-based perceptual grouping, Comput. Intell. 13 (4) (1997) 582–623.

[18] D. Jacobs, M. Lindenbaum (organizers), Proceedings of IEEE Computer Society Workshop on Perceptual Organization in Computer Vision, Santa Barbara, CA, June 1998.

[19] K. Boyer, S. Sarkar (organizers), Proceedings of IEEE Computer Society Workshop on Perceptual Organization in Computer Vision, Corfu, Greece, September 1999.

[20] K. Boyer, S. Sarkar (organizers), Third Workshop on Perceptual Organization in Computer Vision, Vancouver, Canada, July 2001.

[21] K.L. Boyer, S. Sarkar, Perceptual organization in computer vision: status, challenges, and potential, Comput. Vision Image Understanding 76 (1) (1999) 1–4.


[22] J.R. Quinlan, Induction of decision trees, Machine Learning 1 (1986) 81–106.

[23] S. Sarkar, K.L. Boyer, Integration, inference, and management of spatial information using Bayesian networks: perceptual organization, IEEE Trans. Pattern Anal. Mach. Intell. 15 (3) (1993) 256–274.

[24] S. Sarkar, K.L. Boyer, Using perceptual inference networks to manage vision processes, Comput. Vision Image Understanding 62 (1) (1995) 27–46.

[25] S. Sarkar, P. Soundararajan, Supervised learning of large perceptual organization: graph spectral partitioning and learning automata, IEEE Trans. Pattern Anal. Mach. Intell. 22 (5) (2000) 504–525.

[26] T.F. Syeda-Mahmood, Data- and model-driven selection using parallel line groups, Comput. Vision Image Understanding 67 (3) (1997) 205–222.

[27] R.L. Castano, S. Hutchinson, A probabilistic approach to perceptual grouping, Comput. Vision Image Understanding 64 (3) (1996) 399–419.

[28] M. Trobina, Error model of a coded-light range sensor, Technical Report BIWI-TR-164, Image Science Group, ETH, September 1995.

[29] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.

[30] U.M. Fayyad, K.B. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, International Joint Conference on Artificial Intelligence, Chambery, France, September 1993, pp. 1022–1027.

[31] D.C. Nadadur, X. Zhang, R.M. Haralick, Groundtruth outline drawing in model board images, Technical Report ISL-TR-94-01, University of Washington.

About the Author—IN KYU PARK received the B.S., M.S., and Ph.D. degrees from Seoul National University, Seoul, Korea, in 1995, 1997, and 2001, respectively, all in Electrical Engineering and Computer Science. In September 2001, he joined the Samsung Advanced Institute of Technology, Yongin, Korea, as a member of the technical staff, where he has been involved in MPEG standardization activities. Dr. Park's research interests are in the areas of 3-D computer vision, computer graphics, pattern recognition, and multimedia applications, especially 3-D shape reconstruction, image-based modeling and rendering, object recognition, and multimedia database indexing/retrieval.

About the Author—KYOUNG MU LEE received the B.S. and M.S. degrees in Control and Instrumentation Engineering from Seoul National University, Seoul, Korea, in 1984 and 1986, respectively, and the Ph.D. degree in Electrical Engineering from the University of Southern California, Los Angeles, in 1993. From 1993 to 1994, he was a research associate in the Signal and Image Processing Institute at the University of Southern California. He was with Samsung Electronics Co. Ltd. at Suwon, Korea, as a senior researcher from 1994 to 1995, where he worked on developing industrial real-time vision systems. From 1995 to 2003, he was an Assistant Professor and an Associate Professor in the Department of Electronics and Electrical Engineering of Hong-Ik University in Seoul, Korea. In September 2003, he joined the School of Electrical Engineering and Computer Science at Seoul National University as an Associate Professor. Dr. Lee is currently serving as a member of the Editorial Board of the EURASIP Journal of Applied Signal Processing. His current primary research interests include computational vision, shape from X, 2-D and 3-D object recognition, human–computer interface, and visual navigation.

About the Author—SANG UK LEE received his B.S. degree from Seoul National University, Seoul, Korea, in 1973, the M.S. degree from Iowa State University, Ames, in 1976, and the Ph.D. degree from the University of Southern California, Los Angeles, in 1980, all in Electrical Engineering. From 1980 to 1981, he was with the General Electric Company, Lynchburg, VA, working on the development of digital mobile radio. From 1981 to 1983, he was a Member of Technical Staff, M/A-COM Research Center, Rockville, MD. In 1983, he joined the Department of Control and Instrumentation Engineering at Seoul National University as an Assistant Professor, where he is now a Professor at the School of Electrical Engineering and Computer Science. Currently, he is also affiliated with the Automation and Systems Research Institute and the Institute of New Media and Communications at Seoul National University. His current research interests are in the areas of image and video signal processing, digital communication, and computer vision. He served as Editor-in-Chief of the Transactions of the Korean Institute of Communication Science from 1994 to 1996. Dr. Lee is currently a member of the Editorial Board of both the Journal of Visual Communication and Image Representation and the Journal of Applied Signal Processing, and an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology. He is a member of Phi Kappa Phi.