
Principal Component Analysis with Missing Data and Its Application to Object Modeling

Heung-yeung Shum Katsushi Ikeuchi Raj Reddy

December, 1993
CMU-CS-93-213


School of Computer Science
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213


This research is partially sponsored by the Department of the Army, Army Research Office under grant number DAAH04-94-G-0006, partially sponsored by the Avionics Laboratory, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, Ohio 45433-6543 under Contract F33615-90-C-1465, ARPA Order No. 7597, and partially supported by NSF under Contract IRI-9224521. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies or endorsements, either expressed or implied, of the Department of the Army or the U.S. government.

Keywords: 3D object modeling, multiple view merging, principal component analysis.

Abstract

Observation-based object modeling often requires integration of shape descriptions from different views. In conventional methods that merge multiple views sequentially, an accurate description of each surface patch has to be known in each view, and the transformation between adjacent views needs to be accurately recovered. When noisy data and mismatches are present, the recovered transformation becomes erroneous. In addition, the transformation error accumulates and propagates along the sequence, which results in an inaccurate object model. To overcome these problems, we have developed a weighted least squares (WLS) approach which simultaneously recovers object shape and transformation among different views without recovering inter-frame motion as an intermediate step.

We show that object modeling from a sequence of range images is a problem of principal component analysis with missing data (PCAMD), which can be generalized as a WLS minimization problem. An efficient algorithm is devised to solve the problem of PCAMD. After we have segmented surface regions in each view and tracked them over the whole sequence, we construct a 3F x P normal measurement matrix of surface normals and an F x P distance measurement matrix of normal distances to the origin, for all P regions visible over the whole sequence of F views. These two measurement matrices, which have many missing elements due to noise, occlusion and mismatching, enable us to formulate multiple view merging as a combination of two WLS problems. A two-step algorithm, which employs the quaternion representation of the rotation matrix, is presented to compute surface descriptions and transformations among different views simultaneously. After surface equations are extracted, spatial connectivity among these surfaces is established to enable the object model to be reconstructed.

Experiments using synthetic data and real range images show that our approach is robust against noise and mismatching and generates accurate object models by averaging over all visible surfaces. Specifically, using a sequence of real range images, we illustrate the reconstruction of a toy house model.

Contents

1 Introduction
  1.1 Past work
  1.2 Our approach to multiple view merging
  1.3 Organization of paper
2 Principal Component Analysis with Missing Data
  2.1 Motivational example
  2.2 Wiberg's formulation
  2.3 Modified Wiberg's formulation
  2.4 WLS formulation
3 Merging Multiple Views
  3.1 Two WLS problems
  3.2 Quaternion WLS-1
  3.3 Iterative algorithm
4 Surface Patch Tracking
  4.1 Range image segmentation
  4.2 Adjacency graph
  4.3 Matching two views
5 Spatial Connectivity
  5.1 Half-space intersection and union
  5.2 Modified Jarvis' march
  5.3 3D spatial connectivity
6 Experiments
  6.1 Synthetic data
    6.1.1 Applicability
    6.1.2 Robustness
  6.2 Real range image sequence
7 Concluding Remarks
Acknowledgment
References

List of Figures

Figure 1 Distinct views of a dodecahedron
Figure 2 A simple polygon and its supporting lines
Figure 3 Example of modified Jarvis' march and cell decomposition
Figure 4 Illustration of data structure of intersection point
Figure 5 Reconstruction of connectivity
Figure 6 Effect of noise
Figure 7 Effect of number of views
Figure 8 Reconstructed error vs. number of matched faces
Figure 9 Comparison between sequential reconstruction and WLS method
Figure 10 Recovered and original dodecahedron models
Figure 11 A sequence of images of a polyhedral object
Figure 12 Two views of shaded display of a recovered model
Figure 13 A sequence of images of a toy house
Figure 14 Four views of a reconstructed house model

1 Introduction

Solid modeling is a useful tool for tasks such as representing the virtual environment for virtual reality systems; representing the real environment for robot programming; and modeling real objects for object recognition. Currently, most object models are constructed by human operators [2]. It would be much better to have a system that can automatically build models of real objects that it observes. If we can develop a reliable technique to generate accurate 3D object models by observing real objects from multiple views, we can reduce the effort and cost of model construction, and we can significantly broaden the application areas of solid modeling.

Observation-based modeling systems usually work with a sequence of images of the object(s), where the sequence spans a smoothly varying change in the positions of the sensor and/or object(s). Most previous systems have attempted to apply inter-frame motion estimates to successive pairs of views in a sequential manner [12]. Whenever a new view is introduced, it is matched with the previous view, and the transformation between these two successive views has to be recovered before the object model is updated. This sequential method does not work well in practice because local motion estimates are subject to noise and missing data. Local mismatching errors accumulate and propagate along the sequence, yielding erroneous object models.

Rather than sequentially integrating successive pairs of views, we should instead search for the statistically optimal object model that is most consistent with all the views. Although every single view provides only partial information about the object, it is likely that any part of the object will be observed a number of times along the sequence. Object modeling from this sequence of views can be formulated as an overdetermined minimization problem because significant redundancy exists among all the views.

1.1 Past work

Much work has been done in object modeling from a sequence of range images [4]. Most work assumed that the transformation between successive views is either known or can be recovered, so that all data can be transformed to a fixed coordinate system. For example, Bhanu [3] rotated the object through known angles. Ahuja and Veenstra [1] constructed an octree object model from orthogonal views. Soucy and Laurendeau [16] proposed to triangulate each view and merge multiple views via a Venn diagram when the transformation is known. Because building a Venn diagram is combinatorial, only four-view merging is presented in their work. By finding the correspondences from intensity patterns in all eight views, Vemuri and Aggarwal [21] derived the inter-frame motion and transformed all eight range images to the first frame. Ferrie and Levine [9] merged multiple views using correspondence points which are identified by correlation over the differential properties of the surface. Parvin and Medioni [12] proposed to construct boundary representation (B-rep) object models from unregistered multiple range images. Each view of the object is represented as an adjacency graph where nodes represent surface patches with attributes and arcs represent adjacency between surfaces. To merge any two views, a rigid transformation has to be computed accurately. Most previous approaches to modeling from a sequence of views are sequential. Thus, transformation errors accumulate and propagate from one matching to another, which may result in imprecise object models.

Inferring scene geometry and camera motion from a sequence of intensity images is also possible in principle. For example, Tomasi and Kanade [19] proposed a factorization method to simultaneously solve for shape and motion under orthography, and Poelman and Kanade [13] extended it to the case of paraperspective projection. Szeliski and Kang [18] proposed a nonlinear optimization method to solve for shape and motion under perspective. However, in [19][13], the task is formulated as a least squares problem where missing data due to occlusion and mismatching is extrapolated from measured data and estimated motion. Although three views of four points are theoretically sufficient to determine structure and motion [20], it is difficult in practice to find a good submatrix to do "row-wise" and "column-wise" extrapolation. Szeliski and Kang [18] proposed to assign a weight to each measurement and incorporated an object-oriented perspective projection in a nonlinear least squares problem. The very nature of the nonlinear least squares formulation requires standard techniques in nonlinear optimization, e.g., Levenberg-Marquardt, in which convergence to a local minimum may be a problem. In addition, most existing algorithms seem to be more useful for determining camera motion than for building 3D object models because the recovered object shape is defined by a collection of 3D points whose connectivity is not explicitly known.

The factorization method [5][13][19], in essence, is principal component analysis of some measurement matrix. Principal component analysis expresses the variance of the measurement matrix in a compact and robust way and has been extensively studied in computational statistics [7]. The singular value decomposition (SVD) method [10] is a straightforward solution when the measurement matrix is complete. When data is incomplete or missing, as is often the case in practice, principal component analysis becomes much more complicated. Ruhe [15] first proposed a Gauss-Newton algorithm to solve this problem, taking advantage of the sparse, structured derivatives of the objective function. Wiberg [22] generalized Ruhe's work to the cases where the rank of the measurement matrix is known. However, Wiberg's algorithm requires solving large pseudo-inverse matrices.

1.2 Our approach to multiple view merging

We propose to build object models from a sequence of range images. Our approach is to recover bounding surfaces and transformations simultaneously by employing principal component analysis with missing data. After segmenting all range images and establishing correspondences among different views, a composite graph is built. The object surface description and transformations among different views are recovered by solving a combination of two weighted least squares (WLS) problems.

There are two key observations in our approach to object modeling from a sequence of views. First, because of the redundancy in the sequence of images, we can get a reliable solution from an overconstrained minimization problem even when data is missing. Because only part of the object is visible in each view, we cannot find correspondences among all surfaces between two views. Therefore, this is not a least squares (LS) problem but a WLS one where the weights are zeros for invisible regions. The difficulty is how to formulate the WLS problem properly and how to solve this problem without resorting to extrapolation of the unknowns. We present an algorithm to iteratively update the surface description and transformation so that the weighted least squares error is minimized.

The second observation lies in the first WLS problem of recovering surface normals and rotation matrices. The modeling problem can be decomposed into two smaller problems because recovering rotation is independent of translation. If we directly apply the WLS algorithm, we have to explicitly update nine parameters of every rotation matrix. It is well known that the rotation matrix is a nonlinear function of only three independent parameters. Therefore, updating nine parameters (even with proper normalization afterwards) is not the best way to solve this problem. We solve this problem by representing rotations using quaternions.
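As a small illustration of why the quaternion parameterization is attractive, the sketch below builds a rotation matrix from a quaternion using the standard conversion formula. The function name `quat_to_rot` and the example values are ours, not the report's. Normalizing four numbers always yields a valid rotation, whereas nine freely updated matrix entries generally do not.

```python
import numpy as np

def quat_to_rot(q):
    """Return the 3x3 rotation matrix of a quaternion q = (w, x, y, z).

    The quaternion is normalized first, so any nonzero 4-vector is accepted.
    """
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

# An arbitrary (unnormalized) quaternion still maps to a proper rotation.
R = quat_to_rot([0.9, 0.1, -0.3, 0.2])
```

The result is orthogonal with determinant one by construction, so no re-normalization of the matrix is needed after an update of the four parameters.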

To make an object model from a recovered set of surface equations, spatial connectivity among surfaces also has to be recovered. Spatial connectivity refers to the spatial relationship among surfaces, i.e., for each surface, what other surfaces it is connected to. The problem of surface connectivity is reduced to one of connectivity of supporting lines of a simple polygon, solved by a modified Jarvis' march algorithm that combines information on both the algebraic level and the signal level.

1.3 Organization of paper

In section 2 we discuss principal component analysis when data is missing. From a motivational example of modeling a 12-faced polyhedron from a sequence of views, we formulate multiple view merging as a problem of principal component analysis with missing data (PCAMD). Then we outline Wiberg's formulation of PCAMD, and modify the formulation by proper indexing of the objective function. The modified formulation is then generalized to a WLS minimization problem. An efficient PCAMD algorithm is presented to solve this WLS problem. In section 3 we formulate modeling the object and recovering transformations as a combination of two WLS problems. We compute the surface description and transformation by extracting the principal components of two highly rank-deficient measurement matrices with many missing elements, each of which forms a WLS problem. The first WLS problem of recovering rotation matrices and surface normals is further simplified using the quaternion representation of rotation. A two-step algorithm is presented to model the object from a sequence of segmented range images. Section 4 gives a brief description of a surface patch tracking system. The different modules in the tracking system (range image segmentation, adjacency graph building, and two-view merging) are presented. In section 5 we show that the problem of surface connectivity can be reduced to one of connectivity of supporting lines of a simple polygon. Since the problem of establishing connectivity of supporting lines can be regarded as both a modified convex hull-like problem and a cell decomposition problem, we propose a modified Jarvis' march algorithm which successfully reconstructs the simple polygon. We demonstrate the applicability and robustness of the proposed PCAMD method by applying our approach to synthetic data and real range images in Section 6. We show that, given correspondence, four views are necessary to recover the shape of an arbitrary dodecahedron (12-faced polyhedron). A study on synthetic data of a dodecahedron shows that our approach is robust with respect to noise and surface mismatching. Our method yields a statistically optimal model for a given set of views; this method improves as more views are incorporated. From a sequence of real range images, a polyhedral object model is precisely recovered using the proposed method. A complex toy house model is also reconstructed from a sequence of range images. Final comments and conclusions are presented in Section 7.

2 Principal Component Analysis with Missing Data

2.1 Motivational example

Suppose that our task is to make a model of a dodecahedron (a 12-faced polyhedron) from a sequence of segmented range images. A dodecahedron is a simple and interesting Platonic solid. Assume that we have tracked 12 faces over 4 nonsingular views. The segmented range images provide trajectories of plane coordinates $\{p_p^{(f)} \mid f = 1, \ldots, 4,\ p = 1, \ldots, 12\}$, where $p = (v^T, d)^T$ represents a planar equation with surface normal $v$ and normal distance to the origin $d$. Then we may form a 16 x 12 measurement matrix as follows:

$$
W = \begin{bmatrix}
p_1^{(1)} & p_2^{(1)} & p_3^{(1)} & p_4^{(1)} & p_5^{(1)} & p_6^{(1)} & * & * & * & * & * & * \\
p_1^{(2)} & p_2^{(2)} & p_3^{(2)} & p_4^{(2)} & * & * & p_7^{(2)} & p_8^{(2)} & * & * & * & * \\
p_1^{(3)} & p_2^{(3)} & * & * & * & p_6^{(3)} & * & p_8^{(3)} & p_9^{(3)} & p_{10}^{(3)} & * & * \\
* & * & * & * & * & * & p_7^{(4)} & p_8^{(4)} & p_9^{(4)} & p_{10}^{(4)} & p_{11}^{(4)} & p_{12}^{(4)}
\end{bmatrix}
$$

where every * indicates an unobservable face, since there are only six visible faces from each nonsingular view. Our modeling task is now to recover the poses of all the 12 faces in a fixed coordinate system.

Figure 1 Distinct views of a dodecahedron (views V1, V2, V3, and V4)

If the measurement matrix were complete, our task would be to average all those 12 faces over 4 views, assuming data is noisy. In the absence of noise, any set of 12 faces from one of the 4 views will do. The standard way to solve this problem is to apply SVD to this measurement matrix, whose rank is at most 4 (see section 3.1 for the argument). The measurement matrix can subsequently be factorized, with proper normalization, into a left matrix $Q$ of transformation parameters and a right matrix $P$ of plane coordinates

$$
W = QP
$$

where

$$
Q = \begin{bmatrix} Q^{(1)} \\ Q^{(2)} \\ Q^{(3)} \\ Q^{(4)} \end{bmatrix}, \qquad
P = \begin{bmatrix} p_1 & p_2 & \cdots & p_{12} \end{bmatrix},
$$

$Q^{(f)}$ is the transformation of the $f$th view with respect to the fixed world coordinate system, and $p_p$ is the $p$th plane equation in the same world coordinate system. Singular value decomposition has also been successfully applied to shape and motion recovery from a sequence of intensity images [19].
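The complete-data case can be sketched numerically as follows. This is only an illustration under our own assumptions: random matrices stand in for the true transformations and plane coordinates, and the names `Q_hat` and `P_hat` are hypothetical. It shows that a noise-free 16 x 12 matrix of the form W = QP has rank 4 and that a truncated SVD returns a valid rank-4 factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, r = 4, 12, 4                         # views, faces, rank (toy sizes)
Q = rng.standard_normal((4 * F, r))        # stand-in for stacked transformations
Pm = rng.standard_normal((r, P))           # stand-in for plane coordinates
W = Q @ Pm                                 # complete, noise-free 16 x 12 matrix

# Truncated SVD: keep the r largest singular values and split them evenly.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
Q_hat = U[:, :r] * np.sqrt(s[:r])          # 16 x 4 left factor
P_hat = np.sqrt(s[:r])[:, None] * Vt[:r]   # 4 x 12 right factor
```

The factors are determined only up to an invertible 4 x 4 matrix (Q_hat A and A^{-1} P_hat give the same product), which is why the text asks for "proper normalization" before the factors are interpreted as transformations and planes.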

Unfortunately, the measurement matrix is often incomplete in practice; it is not unusual for a large portion of the matrix to be unobservable. As we have seen in the above example, half of the measurement matrix is unknown. When the percentage of missing data is very small, it is possible to replace the missing elements with the mean or an extreme value; this is a common strategy in multivariate statistics [7]. However, such an approach is no longer valid when a significant portion of the measurement matrix is unknown.

One common practice in modeling from a sequence of images is to use extrapolation. For example, we can recover the transformation between view 1 and view 2 if there are at least three matched planar surfaces that are non-parallel [8]. Using the recovered transformation, we then extrapolate the invisible planar surfaces in view 1 from their corresponding surfaces in view 2, which are visible, and apply the same extrapolation to the invisible surfaces in view 2. By repeating this process we can in principle extrapolate the locations of all invisible surfaces from visible surfaces [12]. A final step could be added to fine-tune the result by factorizing the extrapolated measurement matrix using SVD. A similar extrapolation approach, the "propagation method" [19], is used in motion and shape recovery from a sequence of intensity images.

The major problem with the extrapolation method is that once the estimated transformation is incorrect at any step, the extrapolated results will be erroneous. In sequential modeling, errors accumulate and propagate all the way. The fine-tuning process at the last step would not improve the result dramatically since the extrapolated measurement matrix is inaccurate. To obviate this problem, we make use of more rigorous mathematical tools developed in computational statistics that cater for missing data without resorting to error-sensitive extrapolation. We will demonstrate the formulation in this section and apply it to multiple view merging in the next section.

2.2 Wiberg's formulation

The problem of object modeling from a sequence of views shown in the previous section can be formulated as a problem of principal component analysis with missing data (PCAMD), which has been extensively studied in computational statistics. Ruhe [15] proposed a minimization method to analyze the one-component model when observations are missing. The one-component model decomposes an F x P measurement matrix into an F x 1 left matrix and a 1 x P right matrix. Wiberg [22] extended Ruhe's method to the more general case of an arbitrary component model. We first outline Wiberg's formulation of principal component analysis with missing data, before proposing a modified formulation by appropriate indexing, and generalizing the problem as a WLS problem.

Suppose that an F x P measurement matrix $W$ consists of P individuals from an F-variate normal distribution with mean $\mu$ and covariance $\Sigma$. Let the rank of $W$ be $r$. If the data is complete and the measurement matrix filled, the problem of principal component analysis is to determine $U$, $S$, and $V$ such that

$$
\left\| W - e\mu^T - U S V^T \right\|
$$

is minimized, where $U$ and $V$ are F x r and P x r matrices with orthogonal columns, $S = \mathrm{diag}(\sigma_i)$ is an r x r diagonal matrix, $\mu$ is the maximum likelihood approximation of the mean vector, and $e^T = (1, \ldots, 1)$ is an F-tuple vector with all ones. The solution to this problem is essentially the SVD of the centered (or registered) data matrix $W - e\mu^T$.

If data is incomplete, we have the following minimization problem:

$$
\min \phi = \frac{1}{2} \sum_{(f,p) \in I} \left( W_{f,p} - \mu_p - u_{f.}^T v_{p.} \right)^2, \qquad
I = \{ (f,p) : W_{f,p} \text{ is observed} \} \qquad \text{(EQ 1)}
$$

where $u_{f.}$ and $v_{p.}$ are column vector notations defined by

$$
\begin{bmatrix} u_{1.}^T \\ \vdots \\ u_{F.}^T \end{bmatrix} = U S^{1/2} \qquad \text{(EQ 2)}
$$

and

$$
\begin{bmatrix} v_{1.}^T \\ \vdots \\ v_{P.}^T \end{bmatrix} = V S^{1/2} \qquad \text{(EQ 3)}
$$

Lemma 1

A necessary condition to uniquely solve this problem (EQ 1) is $m \geq r(F + P - r) + P$, where $m$ is the number of observable elements in $W$.

Proof:

There are at most $r(F + P - r)$ independent elements from the LU decomposition of an F x P matrix of rank $r$; the $P$ mean values $\mu_p$ add $P$ further unknowns. Hence, to uniquely solve (EQ 1), the number of equations ($m$) has to be no fewer than the number of unknowns ($r(F + P - r) + P$). (Q.E.D.)
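The counting argument of Lemma 1 is easy to check mechanically. The helper below is a hypothetical convenience function (its name and the example sizes are ours); it follows the proof's "no fewer than" wording and tests only the necessary condition, not sufficiency.

```python
def enough_observations(m, F, P, r):
    """Necessary (not sufficient) counting condition of Lemma 1.

    m: number of observed entries; F x P: matrix size; r: assumed rank.
    The unknowns are r*(F + P - r) factor parameters plus P mean values.
    """
    return m >= r * (F + P - r) + P

# Toy sizes: a 10 x 8 matrix of rank 2 needs at least 2*(10+8-2) + 8 = 40 entries.
ok = enough_observations(40, F=10, P=8, r=2)
too_few = enough_observations(39, F=10, P=8, r=2)
```

Passing this test does not guarantee a unique solution: the observed entries must also be distributed so that every row and column is sufficiently constrained.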

To sufficiently determine the problem (EQ 1), more constraints are needed to normalize either the left matrix $U$ or the right matrix $V$.

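If we write the observed elements of the measurement matrix $W$ as an m-dimensional vector $w$, the minimization problem can be written as

$$
\min \phi = \frac{1}{2} f^T f \qquad \text{(EQ 4)}
$$

where

$$
f = w - \mu - Fu = w - Gv \qquad \text{(EQ 5)}
$$

and

$$
u = \begin{bmatrix} u_{1.} \\ \vdots \\ u_{F.} \end{bmatrix}, \qquad
v = \begin{bmatrix} \tilde{v}_1 \\ \vdots \\ \tilde{v}_P \end{bmatrix}, \qquad
\tilde{v}_p = \begin{bmatrix} v_{p.} \\ \mu_p \end{bmatrix}, \qquad
\mu_i = \mu_{p(i)}.
$$

$F$ and $G$ are of dimension m x rF and m x (r + 1)P respectively, and are computed by expanding every element $f_i$ of $f$

$$
f_i = w_i - \mu_{p(i)} - u_{f(i).}^T v_{p(i).}
$$

where the $i$th component of $w$ is indexed to the $(f(i), p(i))$-th component of $W$, i.e., $w_i = W_{f(i),p(i)}$, and $W_{f,p} = w_{i(f,p)}$.

To solve the minimization problem stated by (EQ 4), the derivative of the objective function (with respect to $u$ and $v$) should be zero, i.e.,

$$
\begin{bmatrix} \partial\phi / \partial u \\ \partial\phi / \partial v \end{bmatrix}
= \begin{bmatrix} F^T F u - F^T (w - \mu) \\ G^T G v - G^T w \end{bmatrix} = 0 \qquad \text{(EQ 6)}
$$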

Obviously (EQ 6) is nonlinear because $F$ is a function of $v$ and $G$ is a function of $u$. In theory, any appropriate nonlinear optimization method can be applied to solve it. However, the dimensionality is so high in practice that we have to adapt the algorithm to make use of the special structure of the problem. It can be observed that:

(1) for fixed $u$, we have a linear least squares problem in $v$; for fixed $v$, we have a linear least squares problem in $u$;

(2) since (EQ 6) is also a bilinear problem in $u$ and $v$, we can successively improve their estimates by using the updating technique in the NIPALS algorithm [15], i.e., for a given $v$, $u$ is updated as $u = F^+ (w - \mu)$; for a given $u$, $v$ is updated as $v = G^+ w$. $F^+$ and $G^+$ are the pseudo-inverses of $F$ and $G$, respectively.
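Observation (1) can be sketched as follows: with the right factor and the mean held fixed, each row of the left factor is an ordinary least squares problem over the observed entries of that row. The function name `update_left`, the boolean `mask` convention (True = observed), and the synthetic data are our assumptions for illustration.

```python
import numpy as np

def update_left(W, mask, V, mu):
    """For fixed V (P x r) and mu (P,), solve each row u_f by least squares
    using only the entries of row f that are marked observed in mask."""
    F, r = W.shape[0], V.shape[1]
    U = np.zeros((F, r))
    for f in range(F):
        obs = mask[f]                                  # observed columns of row f
        U[f] = np.linalg.lstsq(V[obs], W[f, obs] - mu[obs], rcond=None)[0]
    return U

# Noise-free synthetic model W[f, p] = mu[p] + u_f . v_p with a fixed missing pattern.
rng = np.random.default_rng(0)
F, P, r = 6, 9, 2
U_true = rng.standard_normal((F, r))
V_true = rng.standard_normal((P, r))
mu_true = rng.standard_normal(P)
W = U_true @ V_true.T + mu_true
mask = (np.add.outer(np.arange(F), np.arange(P)) % 3) != 0   # one third missing
U_est = update_left(W, mask, V_true, mu_true)
```

With exact $v$ and $\mu$ and noise-free data, the rows of the left factor are recovered exactly; the symmetric update for $v$ works the same way on columns.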

2.3 Modified Wiberg's formulation

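In practice $F$ and $G$ are usually sparse matrices with many zeros. If we appropriately index $w$ as $w_1$ (frame by frame), such that

$$
f_1 = w_1 - \mu_1 - Hu \qquad \text{(EQ 7)}
$$

or

$$
f_1 =
\begin{bmatrix} W_{1,1} \\ W_{1,2} \\ \vdots \\ W_{1,P} \\ \vdots \\ W_{F,1} \\ \vdots \\ W_{F,P} \end{bmatrix}
- \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_P \\ \vdots \\ \mu_1 \\ \vdots \\ \mu_P \end{bmatrix}
- \begin{bmatrix} V & & 0_{P \times r} \\ & \ddots & \\ 0_{P \times r} & & V \end{bmatrix}
\begin{bmatrix} u_{1,1} \\ \vdots \\ u_{1,r} \\ \vdots \\ u_{F,1} \\ \vdots \\ u_{F,r} \end{bmatrix},
\qquad
V = \begin{bmatrix} v_{1,1} & \cdots & v_{r,1} \\ \vdots & & \vdots \\ v_{1,P} & \cdots & v_{r,P} \end{bmatrix}
\qquad \text{(EQ 8)}
$$

and similarly, if we index $w$ as $w_2$ (column by column), such that

$$
f_2 = w_2 - Kv \qquad \text{(EQ 9)}
$$

or,

$$
f_2 =
\begin{bmatrix} W_{1,1} \\ W_{2,1} \\ \vdots \\ W_{F,1} \\ \vdots \\ W_{1,P} \\ \vdots \\ W_{F,P} \end{bmatrix}
- \begin{bmatrix} U & & 0_{F \times (r+1)} \\ & \ddots & \\ 0_{F \times (r+1)} & & U \end{bmatrix}
\begin{bmatrix} v_{1,1} \\ \vdots \\ v_{r,1} \\ \mu_1 \\ \vdots \\ v_{1,P} \\ \vdots \\ v_{r,P} \\ \mu_P \end{bmatrix}
\qquad \text{(EQ 10)}
$$

where each diagonal block $U$ is the F x (r + 1) matrix with rows $(u_{f,1}, \ldots, u_{f,r}, 1)$.

Note that $H$ and $K$ are block diagonal matrices.

Because $f_1$ and $f_2$ contain the same observables as $f$,

$$
\phi = \frac{1}{2} f_1^T f_1 = \frac{1}{2} f_2^T f_2 \qquad \text{(EQ 11)}
$$

and

$$
\begin{bmatrix} \partial\phi / \partial u \\ \partial\phi / \partial v \end{bmatrix}
= \begin{bmatrix} H^T H u - H^T (w_1 - \mu_1) \\ K^T K v - K^T w_2 \end{bmatrix} \qquad \text{(EQ 12)}
$$

since

$$
H = -\frac{\partial f_1}{\partial u}, \qquad K = -\frac{\partial f_2}{\partial v}. \qquad \text{(EQ 13)}
$$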

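If data is complete, $K$ is a block diagonal matrix of dimension FP x (r + 1)P, whose block elements are $U$ matrices of dimension F x (r + 1), replicated along the diagonal, i.e.,

$$
K = \begin{bmatrix} U & & 0 \\ & \ddots & \\ 0 & & U \end{bmatrix} \qquad \text{(EQ 14)}
$$

$$
U = \begin{bmatrix} u_{1,1} & \cdots & u_{1,r} & 1 \\ \vdots & & \vdots & \vdots \\ u_{F,1} & \cdots & u_{F,r} & 1 \end{bmatrix} \qquad \text{(EQ 15)}
$$

When data is incomplete, the elements associated with the missing data are taken out, resulting in a matrix of dimension m x (r + 1)P

$$
K = \begin{bmatrix} U_{m_1 \times (r+1)} & & 0 \\ & \ddots & \\ 0 & & U_{m_P \times (r+1)} \end{bmatrix}
= \begin{bmatrix} U_1 & & 0 \\ & \ddots & \\ 0 & & U_P \end{bmatrix} \qquad \text{(EQ 16)}
$$

where

$$
\sum_{p=1}^{P} m_p = m, \qquad m_p = \sum_{f=1}^{F} \gamma_{f,p},
$$

and $\gamma_{f,p} = 1$ when $W_{f,p}$ is observed, otherwise $\gamma_{f,p} = 0$.

Similarly, when data is incomplete, we have the following matrix of dimension m x rF

$$
H = \begin{bmatrix} V_{n_1 \times r} & & 0 \\ & \ddots & \\ 0 & & V_{n_F \times r} \end{bmatrix}
= \begin{bmatrix} V_1 & & 0 \\ & \ddots & \\ 0 & & V_F \end{bmatrix} \qquad \text{(EQ 17)}
$$

where

$$
\sum_{f=1}^{F} n_f = m, \qquad n_f = \sum_{p=1}^{P} \gamma_{f,p}.
$$

The pseudo-inverse matrices of $H$ and $K$ can be easily computed because of their block diagonal structure,

$$
K^+ = \begin{bmatrix} U_1^+ & & 0 \\ & \ddots & \\ 0 & & U_P^+ \end{bmatrix} \qquad \text{(EQ 18)}
$$

$$
H^+ = \begin{bmatrix} V_1^+ & & 0 \\ & \ddots & \\ 0 & & V_F^+ \end{bmatrix} \qquad \text{(EQ 19)}
$$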
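The computational benefit of the block diagonal structure can be checked numerically: the pseudo-inverse of a block diagonal matrix is the block diagonal matrix of the blocks' pseudo-inverses. The sketch below uses random blocks with differing row counts (as happens when different numbers of entries are observed per column); the helper `block_diag` is our own small utility, not part of the report.

```python
import numpy as np

def block_diag(blocks):
    """Assemble a dense block diagonal matrix from a list of 2D arrays."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i += b.shape[0]
        j += b.shape[1]
    return out

rng = np.random.default_rng(1)
blocks = [rng.standard_normal((n, 3)) for n in (5, 4, 6)]   # differing row counts
K = block_diag(blocks)                                      # 15 x 9
K_pinv_blockwise = block_diag([np.linalg.pinv(b) for b in blocks])   # 9 x 15
```

Inverting three small blocks replaces one pseudo-inverse of the full matrix, which is what keeps the update steps inexpensive when the number of frames and regions grows.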

2.4 WLS formulation

The minimization problem (EQ 1) can be generalized as a WLS problem

$$
\min \phi = \frac{1}{2} \sum_{f,p} \left( \gamma_{f,p} \left( W_{f,p} - \mu_p - u_{f.}^T v_{p.} \right) \right)^2 \qquad \text{(EQ 20)}
$$

where $\gamma_{f,p}$ is a weighting factor for each measurement $W_{f,p}$.

In the previous discussion, we have assumed that all weights are either one when data is observable or zero when unobservable. However, in many cases we may prefer to assign weights other than ones or zeros to individual measurements. For example, in recovering the pose of a 3D plane, we can assign a confidence measure to each recovered surface normal according to its incidence angle with the viewing direction. Different sensor models can be applied to obtain a weighting matrix if necessary. In the following, we formulate principal component analysis with missing data as a WLS problem.

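We introduce two FP x FP diagonal weight matrices,

$$
\Gamma = \mathrm{diag}(\Gamma_1, \Gamma_2, \ldots, \Gamma_P) \qquad \text{(EQ 21)}
$$

and

$$
\Phi = \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_F) \qquad \text{(EQ 22)}
$$

where

$$
\Gamma_p = \mathrm{diag}(\gamma_{1,p}, \gamma_{2,p}, \ldots, \gamma_{F,p}), \quad p = 1, \ldots, P,
$$

and

$$
\Phi_f = \mathrm{diag}(\gamma_{f,1}, \gamma_{f,2}, \ldots, \gamma_{f,P}), \quad f = 1, \ldots, F.
$$

The minimization problem becomes

$$
\min \phi = \frac{1}{2} f_{\gamma 1}^T f_{\gamma 1} = \frac{1}{2} f_{\gamma 2}^T f_{\gamma 2} \qquad \text{(EQ 23)}
$$

where

$$
f_{\gamma 1} = \Phi f_1, \quad \text{and} \quad f_{\gamma 2} = \Gamma f_2,
$$

so that each weight matrix follows the ordering of the residual vector it multiplies ($f_1$ frame by frame, $f_2$ column by column).

The solution to the above problem is attained when the first order derivative of the objective function becomes zero. The derivative of the objective function is

$$
\begin{bmatrix} \partial\phi / \partial u \\ \partial\phi / \partial v \end{bmatrix}
= \begin{bmatrix} H_\gamma^T H_\gamma u - H_\gamma^T \Phi (w_1 - \mu_1) \\ K_\gamma^T K_\gamma v - K_\gamma^T \Gamma w_2 \end{bmatrix} \qquad \text{(EQ 24)}
$$

where

$$
H_\gamma = \Phi H = \mathrm{diag}(\Phi_1 V, \ldots, \Phi_F V) \qquad \text{(EQ 25)}
$$

and

$$
K_\gamma = \Gamma K = \mathrm{diag}(\Gamma_1 U, \ldots, \Gamma_P U). \qquad \text{(EQ 26)}
$$

Therefore, after computing the pseudo-inverses of $H_\gamma$ and $K_\gamma$,

$$
H_\gamma^+ = \mathrm{diag}\left( (V^T \Phi_f^2 V)^{-1} V^T \Phi_f \right), \quad f = 1, \ldots, F \qquad \text{(EQ 27)}
$$

$$
K_\gamma^+ = \mathrm{diag}\left( (U^T \Gamma_p^2 U)^{-1} U^T \Gamma_p \right), \quad p = 1, \ldots, P \qquad \text{(EQ 28)}
$$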

we can then use the PCAMD (principal component analysis with missing data) algorithm to solve the WLS problem. Our formulation is essentially a modified NIPALS Ruhe-Wiberg algorithm. The algorithm is as follows:

Algorithm PCAMD

(1) initialize $v$

(2) update $u = H_\gamma^+ \Phi (w_1 - \mu_1)$

(3) update $v = K_\gamma^+ \Gamma w_2$

(4) stop if the algorithm converges, or go back to (2).
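The four steps can be sketched as an alternating weighted least squares loop. This is a minimal sketch under our own assumptions, not the authors' implementation: the names `pcamd` and `weighted_error` are hypothetical, the weights are stored in an F x P array `G` (zero meaning missing), initialization is random, and a fixed iteration count stands in for the convergence test of step (4). Solving one small weighted problem per row and per column is equivalent to applying the block diagonal pseudo-inverses.

```python
import numpy as np

def weighted_error(W, G, U, V, mu):
    """Objective 0.5 * sum (gamma * (W - mu_p - u_f . v_p))^2."""
    return 0.5 * np.sum((G * (W - mu - U @ V.T)) ** 2)

def pcamd(W, G, r, n_iter=50, seed=0):
    """Alternating weighted least squares for W[f, p] ~ mu[p] + U[f] . V[p]."""
    F, P = W.shape
    rng = np.random.default_rng(seed)
    W = np.where(G > 0, W, 0.0)            # missing entries never influence the fit
    V = rng.standard_normal((P, r))        # step (1): initialize v (mu starts at 0)
    mu = np.zeros(P)
    U = np.zeros((F, r))
    for _ in range(n_iter):                # step (4) replaced by a fixed count
        for f in range(F):                 # step (2): weighted LS for each row of U
            A = G[f][:, None] * V
            b = G[f] * (W[f] - mu)
            U[f] = np.linalg.lstsq(A, b, rcond=None)[0]
        Ua = np.hstack([U, np.ones((F, 1))])
        for p in range(P):                 # step (3): weighted LS for (v_p, mu_p)
            A = G[:, p][:, None] * Ua
            b = G[:, p] * W[:, p]
            sol = np.linalg.lstsq(A, b, rcond=None)[0]
            V[p], mu[p] = sol[:r], sol[r]
    return U, V, mu

# Synthetic rank-2 data with one third of the entries missing.
rng = np.random.default_rng(4)
F, P, r = 8, 10, 2
W = rng.standard_normal((F, r)) @ rng.standard_normal((r, P)) + rng.standard_normal(P)
G = ((np.add.outer(np.arange(F), np.arange(P)) % 3) != 0).astype(float)
err_1 = weighted_error(W, G, *pcamd(W, G, r, n_iter=1))
err_20 = weighted_error(W, G, *pcamd(W, G, r, n_iter=20))
```

Because each update exactly minimizes the objective over one block of unknowns, the weighted error cannot increase from one iteration to the next; in practice one would stop when its decrease falls below a tolerance.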

Remarks:

(1) Ruhe [15] also suggested using Newton and Gauss methods to speed up the convergence of the NIPALS method. In our experiments, we found that the NIPALS method converges within the desired tolerance in several iterations.

(2) Ruhe [15] and Wiberg [22] also showed that the more data is missing, the worse the result will be. This is hardly surprising because the method is basically an interpolation of all observable elements. Statistically, this corresponds to decreasing robustness of the estimate for the principal components given the observations. Fortunately, in object modeling from multiple views, we can always take many views to form a well-constrained problem for our modeling purpose. Determining a minimally acceptable number of views can be regarded as a sensor planning problem.

(3) The missing data can also be extrapolated as long as we find some sub-blocks in the measurement matrix which satisfy Lemma 1. The issue of obtaining those blocks is non-trivial. Once the missing data have been augmented, a linear or nonlinear optimization method can be applied to solve the original problem. It should work well if data is noise-free, i.e., only the first r singular values of the reconstructed measurement matrix are non-zero. However, this method becomes of questionable utility when any result from sub-block computation is inaccurate.

(4) There are statistical ways to improve the solution, for example, the metrically Winsorised residuals method [18]. This method is based on the assumption that each measurement is corrupted by additive Gaussian noise. The metrically Winsorised residuals method adjusts the weight for each measurement depending on its residual error.

3 Merging Multiple Views

Principal component analysis with missing data was formulated as a WLS minimization problem in the previous section, and a PCAMD algorithm was proposed to solve it. From the motivational example of a dodecahedron, it is clear that object modeling from a sequence of views should be formulated as a WLS problem.

In this section, we show that multiple view merging can be formulated as a combination of two WLS problems. The first WLS problem involves rotation matrices and surface normals, and is independent of translation. Once the first problem is solved, the second WLS problem yields translation vectors and normal distances to the origin. The first WLS problem of determining rotation matrices and surface normals can be further simplified by representing the rotation matrix using the quaternion. A straightforward two-step iterative algorithm can be devised to solve these two problems using the PCAMD algorithm from the previous section.

3.1 Two WLS problems

Suppose that we have tracked $P$ planar regions over $F$ frames. We then have trajectories of plane coordinates $\{(v_{fp}, d_{fp}) \mid f = 1, \ldots, F,\ p = 1, \ldots, P\}$, where $v_{fp}$ is the surface normal of the $p$th patch in the $f$th frame, and $d_{fp}$ is the associated normal distance to the origin. To facilitate the decomposability of rotation and translation, instead of forming a $4F \times P$ measurement matrix as in section 2.1, we form the surface normals $v_{fp}$ into a $3F \times P$ matrix $W^{(v)}$ and the distances $d_{fp}$ into an $F \times P$ matrix $W^{(d)}$. $W^{(v)}$ and $W^{(d)}$ are called the normal measurement matrix and the distance measurement matrix, respectively.

It can be easily shown that $W^{(v)}$ has at most rank 3 and $W^{(d)}$ has at most rank 4 when noise-free; therefore, $W^{(v)}$ and $W^{(d)}$ are highly rank-deficient. We decompose $W^{(v)}$ into

$$W^{(v)} = R V \qquad \text{(EQ 29)}$$

where

$$R = \begin{bmatrix} R^{(1)} \\ \vdots \\ R^{(F)} \end{bmatrix}$$

is the rotation matrix of each view with respect to the world coordinate system, and $V = [v_1, \ldots, v_P]$ is the surface normal matrix in the world coordinate system. Since $R$ is a $3F \times 3$ matrix and $V$ is a $3 \times P$ matrix, the rank of $W^{(v)}$ is at most 3.

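Similarly, we can decompose $W^{(d)}$ into

$$W^{(d)} = T M \qquad \text{(EQ 30)}$$

where

$$M = \begin{bmatrix} v_1 & \cdots & v_P \\ d_1 & \cdots & d_P \end{bmatrix} \quad \text{and} \quad T = \begin{bmatrix} -t_1^T R^{(1)} & 1 \\ \vdots & \vdots \\ -t_F^T R^{(F)} & 1 \end{bmatrix}.$$

$t_f$ and $R^{(f)}$ are the translation vector and rotation matrix of view $f$ with respect to a fixed world coordinate system.

Note that the decomposition of $W^{(d)}$ depends on the decomposition of $W^{(v)}$. Since $M$ is $4 \times P$ and $T$ is $F \times 4$, the rank of $W^{(d)}$ is at most 4.

We can also decompose $W^{(d)}$ into

$$W^{(d)} = -\begin{bmatrix} t_1^T R^{(1)} \\ \vdots \\ t_F^T R^{(F)} \end{bmatrix} [\, v_1 \; \cdots \; v_P \,] + \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} [\, d_1 \; \cdots \; d_P \,] \qquad \text{(EQ 31)}$$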

When all elements in the two measurement matrices are known, we need to solve two least-squares problems. However, since only part of the planar regions are visible in each view, we end up with two WLS problems instead. The first least-squares problem, labeled WLS-1, is

$$\min \sum_{f = 1, \ldots, F,\; p = 1, \ldots, P} \left( \gamma^{(v)}_{fp} \left( W^{(v)}_{fp} - [RV]_{fp} \right) \right)^2 \qquad \text{(EQ 32)}$$

and the second one, denoted WLS-2, is

$$\min \sum_{f = 1, \ldots, F,\; p = 1, \ldots, P} \left( \gamma^{(d)}_{fp} \left( W^{(d)}_{fp} - [TM]_{fp} \right) \right)^2 \qquad \text{(EQ 33)}$$

where $\gamma_{fp} = 0$ if surface $p$ is invisible in frame $f$, and $\gamma_{fp} = 1$ otherwise. All weights can be any number between zero and one, depending on the significance or confidence of each measurement. A similar WLS formulation is also used in [18].
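To make the weighted formulation concrete, the following is a minimal sketch of a weighted low-rank factorization in which invisible entries receive zero weight. It uses plain alternating weighted least squares, which is a swapped-in technique and not the PCAMD algorithm itself. The function name `weighted_als`, the random initialization, and the synthetic rank-3 data are assumptions introduced for illustration only.

```python
import numpy as np

def weighted_als(W, G, rank, iters=50, seed=0):
    """Alternating weighted least squares for W ~= A @ B.

    Minimizes sum((G * (W - A @ B))**2). Entries with G == 0 (missing
    data) do not contribute to the cost.
    """
    rng = np.random.default_rng(seed)
    m, n = W.shape
    A = rng.standard_normal((m, rank))
    B = rng.standard_normal((rank, n))
    costs = []
    for _ in range(iters):
        # Fix A: one small weighted least-squares problem per column of B.
        for j in range(n):
            g = G[:, j]
            B[:, j] = np.linalg.lstsq(g[:, None] * A, g * W[:, j], rcond=None)[0]
        # Fix B: one small weighted least-squares problem per row of A.
        for i in range(m):
            g = G[i, :]
            A[i, :] = np.linalg.lstsq(g[:, None] * B.T, g * W[i, :], rcond=None)[0]
        costs.append(float(np.sum((G * (W - A @ B)) ** 2)))
    return A, B, costs

# Hypothetical test data: a noise-free rank-3 matrix with about 30% of
# its entries marked as unobserved (weight 0).
rng = np.random.default_rng(1)
W_true = rng.standard_normal((12, 3)) @ rng.standard_normal((3, 12))
G = (rng.random((12, 12)) < 0.7).astype(float)
A, B, costs = weighted_als(W_true, G, rank=3)
print("first cost:", costs[0], "last cost:", costs[-1])
```

Each half-step solves its subproblem exactly, so the weighted cost cannot increase from one iteration to the next; convergence to the global minimum is not guaranteed by this sketch.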

3.2 Quaternion WLS-1

It appears, from the last section, that we can devise a naive two-step algorithm which solves WLS-1 and subsequently WLS-2 by applying the PCAMD algorithm to both problems. However, in order to solve WLS-1, we would iterate on $R^{(f)}$ as if it had nine independent parameters, while it is in fact a nonlinear trigonometric function of three parameters. Although it is possible to normalize $R^{(f)}$ after every iteration, this may perform poorly in terms of robustness and efficiency.

In fact, several representations of rotation are often used in practice:

(1) An orthonormal rotation matrix R

(2) A rotation axis $a$ and a rotation angle $\theta$

(3) A unit quaternion q

A quaternion is a 4-tuple $(w, s)$ where $w$ is a 3-vector and $s$ is a scalar. The mapping between a unit quaternion and a rotation axis along with a rotation angle is given by $w = \sin(\theta/2)\, a$ and $s = \cos(\theta/2)$. The quaternion representation of the rotation matrix leads to a simple way of solving minimization problems of 3D point matching and surface normal matching, as demonstrated in [8].
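The mapping between the three representations can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the rotation matrix is the standard one for the action $x \mapsto q\, x\, \bar{q}$ of a unit quaternion on a 3-vector.

```python
import numpy as np

def quat_from_axis_angle(axis, theta):
    """Return (w, s) with w = sin(theta/2) * a and s = cos(theta/2)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)          # make sure the axis has unit length
    return np.sin(theta / 2.0) * a, np.cos(theta / 2.0)

def rotation_from_quat(w, s):
    """Rotation matrix of the unit quaternion (w, s), w = (x, y, z)."""
    x, y, z = w
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - s * z),     2 * (x * z + s * y)],
        [2 * (x * y + s * z),     1 - 2 * (x * x + z * z), 2 * (y * z - s * x)],
        [2 * (x * z - s * y),     2 * (y * z + s * x),     1 - 2 * (x * x + y * y)],
    ])

# A quarter turn about the z axis maps the x axis onto the y axis.
w, s = quat_from_axis_angle([0.0, 0.0, 1.0], np.pi / 2)
R = rotation_from_quat(w, s)
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))
```

Because the quaternion has unit norm by construction, the resulting matrix is orthonormal with determinant one, so no separate re-normalization of the nine matrix entries is needed.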

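The WLS-1 problem (EQ 32) can be decomposed into $F$ minimization problems

$$\min_{R^{(f)}} \sum_{p=1}^{P} \gamma_{fp}^2 \left\| w_{fp} - R^{(f)} v_p \right\|^2 \qquad \text{(EQ 34)}$$

where

$$W_f = [w_{f1}, \ldots, w_{fP}], \qquad \Gamma_f = \mathrm{diag}(\gamma_{f1}, \ldots, \gamma_{fP}).$$

The above problem can be reformulated using quaternions as

$$\min_{q} \sum_{p=1}^{P} \gamma_{fp}^2 \left\| w_{fp} - q\, v_p\, \bar{q} \right\|^2 \qquad \text{(EQ 35)}$$

subject to $|q| = 1$,

where $\bar{q}$ is the conjugate quaternion of $q$, and

$$\sum_{p=1}^{P} \gamma_{fp}^2 \left\| w_{fp} - q\, v_p\, \bar{q} \right\|^2 = \sum_{p=1}^{P} \gamma_{fp}^2 \left\| w_{fp}\, q - q\, v_p \right\|^2 = \sum_{p=1}^{P} q^T A_p\, q. \qquad \text{(EQ 36)}$$

The $A_p$ are symmetric matrices because $w_{fp}\, q - q\, v_p$ is a linear function of $q$. Obviously

$$B = \sum_{p=1}^{P} A_p \qquad \text{(EQ 37)}$$

is also symmetric, and the minimization problem (EQ 35) becomes

$$\min_{q}\; q^T B\, q. \qquad \text{(EQ 38)}$$

The solution to the above minimization problem is the eigenvector $q_{\min}$ corresponding to the minimum eigenvalue of the matrix $B$.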
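The eigenvector solution can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: quaternions are stored scalar-first as `(s, x, y, z)`, each term is weighted by the square of its weight, and the names `rotation_from_normals`, `rotation_from_quat`, and `skew` are hypothetical. For pure quaternions $w$ and $v$, the map $q \mapsto w q - q v$ is written as a 4 x 4 matrix `C`, so that each weighted term contributes `g**2 * C.T @ C` to $B$.

```python
import numpy as np

def skew(a):
    """Matrix form of the cross product: skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def rotation_from_quat(q):
    """Rotation matrix of a unit quaternion q = (s, x, y, z)."""
    s, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - s * z),     2 * (x * z + s * y)],
        [2 * (x * y + s * z),     1 - 2 * (x * x + z * z), 2 * (y * z - s * x)],
        [2 * (x * z - s * y),     2 * (y * z + s * x),     1 - 2 * (x * x + y * y)],
    ])

def rotation_from_normals(w, v, gamma):
    """Weighted fit of R with w_p ~= R v_p.

    w: (P, 3) normals observed in one view, v: (P, 3) world normals,
    gamma: (P,) weights, zero for invisible faces.
    """
    B = np.zeros((4, 4))
    for wp, vp, g in zip(w, v, gamma):
        C = np.zeros((4, 4))           # C @ q represents w_p q - q v_p
        C[0, 1:] = -(wp - vp)
        C[1:, 0] = wp - vp
        C[1:, 1:] = skew(wp + vp)
        B += (g ** 2) * (C.T @ C)
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    q = eigvecs[:, 0]                      # unit eigenvector, smallest eigenvalue
    return rotation_from_quat(q), q

# Hypothetical noise-free check: six normals, two of them invisible.
rng = np.random.default_rng(0)
q_true = rng.standard_normal(4)
q_true /= np.linalg.norm(q_true)
R_true = rotation_from_quat(q_true)
v = rng.standard_normal((6, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
w = v @ R_true.T
gamma = np.array([1.0, 1.0, 0.0, 0.5, 1.0, 0.0])
R_est, q_est = rotation_from_normals(w, v, gamma)
print(np.abs(R_est - R_true).max())
```

The sign of the returned eigenvector is arbitrary, but $q$ and $-q$ give the same rotation matrix. At least two non-parallel normals with non-zero weight are needed for the minimum eigenvalue to be simple.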

3.3 Iterative algorithm

We combine quaternion-based rotation matrix updating to form a two-step algorithm to

solve both the first and the second WLS problems. The algorithm is as follows:

Algorithm two-step WLS

Step 0 Initialization

(0.1) read in measurement matrices $W^{(v)}$, $W^{(d)}$

(0.2) read in weight matrices $\gamma^{(v)}$, $\gamma^{(d)}$

(0.3) initialize $R$, vectorize $R$ to $v$

Step 1 WLS-1

(1.1) vectorize $W^{(v)}$ to $w_{v1}$ and $w_{v2}$

(1.2) update $H_v$

(1.3) update $u = H_v^{+} w_{v1}$

(1.4) update $B^{(f)}$

(1.5) update $q^{(f)}$ and transform to $R$, vectorize to $v$

(1.6) go to (1.2) if not converged, otherwise advance to Step 2

Step 2 WLS-2

(2.1) vectorize $W^{(d)}$ to $w_{d1}$ and $w_{d2}$

(2.2) update $H_d$

(2.3) update $u = H_d^{+} w_{d1}$

(2.4) update $K_d$

(2.5) update $v = K_d^{+} w_{d2}$

(2.6) stop if converged, otherwise go to (2.2).

We have not explicitly discussed the normalization problem in our WLS approach. The normalization problem occurs because the measurement matrix is rank-deficient; hence, there are infinitely many solutions to the minimization problem (EQ 1) unless an additional constraint is imposed. This additional constraint is generally problem dependent; for example, the 2-norm of the factorized left matrix is constrained to be one [15]. Fortunately, we have implicitly constrained our rotation matrices through their quaternion representation. The remaining constraint in the first WLS problem is the normalization of the surface normal vectors, which are constrained to be of unit magnitude.

Prior to multiple view merging, we need to track surfaces so that the normal measurement matrix and the distance measurement matrix can be formed. The next section describes our surface tracking algorithm.


4 Surface Patch Tracking

In this section, we briefly overview each module of our surface patch tracking system: range

image segmentation, adjacency graph building, and two-view matching.

4.1 Range image segmentation

There are many different techniques for range image segmentation. By and large they can be

divided into feature-based and primitive-based approaches, although statistics-based

approaches have also been introduced recently. Feature-based approaches yield precise seg-

mentation but are sensitive to noise in practice. For example, Gaussian and mean curvatures

can be used to label different regions before region growing; however, this process is quite

sensitive to noise because of the second order derivative. Primitive-based approaches are

more robust to noise but constrained by the number of primitives. The higher the degree of

surface polynomial, the more difficult and the less robust the segmentation is likely to be.

We have used the primitive-based region growing segmentation method of [8]. The two

types of surface primitives used are the planar surface and the quadric surface. The regions

are established via region growing from seed points, i.e., the seed points are chosen from

points which are closest to their approximating primitives, and then merged with their

neighbors until the best-fit errors become unacceptable.

4.2 Adjacency graph

Once we have successfully segmented the range data for each view, the range image associated with view $i$ can be represented as a set of planar regions $I_i = \{v_{ij}, d_{ij}, c_{ij}\}$, where $v_{ij}$ and $d_{ij}$ are the normal and distance of the $j$th segmented planar surface, respectively, and $c_{ij}$ is the centroid of the $j$th segmented region.

From each view of the 3D object, we build an adjacency graph where every node in the

graph represents a visible planar region and each arc connects two adjacent nodes. The adja-

cency graph is updated whenever this view is matched with another. Eventually we have

adjacency information among all visible planar regions after tracking all of them for the

whole sequence. From the adjacency graph, all the object vertices can be located; thus, a 3D

object model is obtained. However, augmenting the adjacency graph is difficult for concave

objects because of occlusion. A better way of establishing spatial connectivity among all

surfaces is discussed in section 5.


We have implemented the planar surface patch tracking system which employs a wavefront

algorithm to generate the adjacency graph. The wavefront algorithm makes use of range

data because there is significant change in range data across an occluding edge.

4.3 Matching two views

Given two adjacent segmented images $I_1$ and $I_2$, we would like to find the correspondence between different regions in the two views, i.e., we want to find a mapping $\phi: I_1 \to I_2$ such that a certain distance measurement $d(I_1, I_2)$ is minimized.

Two questions arise in matching two views of planar regions. The first is how to establish correspondence between two views; the second is how to recover the transformation between them. Our solution to the first problem is to use adjacency information between two segmented patches and between segmented surface normals. If the displacement between two views is relatively small, there should be only linear shape change [11] within the same aspect: corresponding segmented regions are of similar size (number of points), centroid, and surface normal. When a new aspect appears, which signals a nonlinear shape change, there would be a significant change in these parameters. There may not always be a solution to the second problem because we need at least two corresponding non-parallel faces to determine rotation and three to determine translation. In practice, we make the assumption that we always have two non-parallel corresponding faces in two adjacent views.

In fact, solving the second problem can help with the first problem because we can then make use of the hypothesis-and-test approach. We iteratively select two pairs of non-parallel faces from the two images to be matched, estimate the corresponding rotation matrix, and then attempt to match the rest of the faces. We always choose two adjacent faces from both images, and match them based on the surface normal, distance, and centroid of the segmented regions. The number of faces matched and the consistency in face adjacency are used in the distance measure between two matches. The estimated transformation matrix is only used to help build the adjacency graph, while the precise transformation is robustly recovered by our WLS method.
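The hypothesis step, estimating a rotation from two pairs of non-parallel normals and then counting how many remaining faces agree with it, can be sketched as follows. The text does not say how the rotation is estimated from the two pairs; the sketch assumes a simple frame-alignment construction (often called the TRIAD method), and the function names and the 5 degree tolerance are hypothetical.

```python
import numpy as np

def triad(n1, n2):
    """Right-handed orthonormal frame built from two non-parallel vectors."""
    e1 = n1 / np.linalg.norm(n1)
    e3 = np.cross(n1, n2)
    norm = np.linalg.norm(e3)
    if norm < 1e-8:
        raise ValueError("normals are (nearly) parallel")
    e3 = e3 / norm
    e2 = np.cross(e3, e1)
    return np.column_stack([e1, e2, e3])

def rotation_from_two_pairs(a1, a2, b1, b2):
    """Rotation R with R @ a_k ~= b_k for two pairs of corresponding normals."""
    return triad(b1, b2) @ triad(a1, a2).T

def count_matches(R, normals_a, normals_b, max_angle_deg=5.0):
    """Number of normals in view A that land near some normal in view B."""
    cos_thr = np.cos(np.radians(max_angle_deg))
    rotated = normals_a @ R.T
    return int(np.sum(np.max(rotated @ normals_b.T, axis=1) >= cos_thr))

# Hypothetical example: view B is view A rotated by 20 degrees about z.
t = np.radians(20.0)
R_true = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0,        0.0,       1.0]])
normals_a = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [1.0, 1.0, 1.0] / np.sqrt(3.0)])
normals_b = normals_a @ R_true.T
R_hyp = rotation_from_two_pairs(normals_a[0], normals_a[1],
                                normals_b[0], normals_b[1])
matches = count_matches(R_hyp, normals_a, normals_b)
print(matches)
```

In a full tracker the match count would be combined with distance, centroid, and adjacency consistency, as described above; the rotation found here is only a rough hypothesis.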

Multiple view tracking is done by sequentially matching two adjacent views. Whenever a

new view is added, the adjacency graph and the weight matrix are automatically modified.

Because of the problems associated with updating adjacency graph, subsequent to surface

patch tracking and multiple view merging, we use another algorithm to establish the spatial

connectivity among surfaces.


5 Spatial Connectivity

Once we have extracted the equations of the planar surfaces of the object, we then need to establish the spatial connectivity relationship among these surfaces. One approach is to build an adjacency graph from a sequence of views, as discussed in the previous section. However, augmenting the adjacency graph whenever a new view is introduced is quite ad hoc. In this section, we present a new approach to recovering surface connectivity after all surface patches are recovered. We show that the problem of spatial connectivity of boundary surfaces can be reduced to one of connectivity of supporting lines of a simple polygon.

5.1 Half-space intersection and union

We assume that every planar patch P of the object model is a simple polygon. A simple polygon does not self-intersect. Every (infinite) plane divides the space into two parts, inside and outside, with the surface normal pointing towards the external side of the object. Given an unbounded planar surface, if we intersect all other planar surfaces with it, we obtain supporting lines as illustrated in Fig. 2. Each supporting line is directed so that the interior of P lies locally to its right. The right half-plane created by such a directed supporting line c is called the supporting half-plane, and is characterized as supporting the polygon [6]; however, a concave P might not lie entirely in the right half-plane, as indicated in Fig. 2.

Figure 2 A simple polygon and its supporting lines (stippled and solid lines)

For each point x in the plane, if we know which side of each supporting line x lies on, we

know if x is inside P. Therefore, the polygon P (and its interior) can be represented as a bool-

ean formula whose atoms are those supporting lines. In other words, a simple polygon can

be represented by intersection and union of its supporting line. For example, a boolean for-

mula for the polygon in Fig.2 can be ca Q ab E bd ( dc. This Guibas style [6] formula is

obtained by complementing the second supporting line at a convex angle, and the first sup-

porting line at a concave angle when we go around the polygon. Other boolean formulas

such as Peterson style are also possible [6]. Once spatial connectivity is established, the

Guibas style formula is straightforward.

5.2 Modified Jarvis' march

The problem of establishing spatial connectivity of supporting lines can be formulated as a modified convex hull-like problem which involves only vertices. This problem can also be regarded as one of cell decomposition which involves data points. We propose a modified Jarvis' march algorithm to reconstruct simple polygons from supporting lines and valid data points. The algorithm to recover spatial connectivity among 3D surfaces is discussed in section 5.3.

Definition 1

A point is defined as valid in a simple polygon if there exist sufficient valid data points in its neighborhood.

Lemma 2

The intersection point P of two supporting lines is valid in a simple polygon if and only if the intersection of the two corresponding half-planes is valid locally at P.

Proof:

When the intersection of two half-planes is valid locally at P, the intersection point of these two supporting lines is valid by definition.

Conversely, assume that the intersection point of two supporting lines is valid. Since two lines divide the plane into four regions, there must exist at least one region among the four around the intersection point that is a valid cell of the simple polygon. Therefore, the intersection of two half-planes is valid locally at P. (Q.E.D.)

Lemma 2 leads to a modified Jarvis' march algorithm for reconstructing a simple polygon from supporting lines and valid data points.

To construct the simple polygon from all supporting lines and valid data points, we first pre-

compute all intersection points which are candidates of vertices of the simple polygon. If we

march successive vertices with the least turning angle, we obtain their convex hull; this is

referred to as Jarvis' march algorithm [14]. The kernel of the simple polygon, if it exists, can

also be found by intersecting all half-spaces. Using Lemma 2, however, enables us to find

the correct simple polygon by marching all points whose local neighborhood is valid. We

call this algorithm the "modified Jarvis' march".

Assume that we have first found the lowest left point $p_1$ of the set of vertex candidates, which is certainly a convex hull vertex, but not necessarily a vertex of our simple polygon (unless it is valid locally). For example, in Fig. 3, $p_1$ is not a vertex of the simple polygon because $p_5 p_1 p_2$ is not a valid triangle cell (valid cells are the shaded areas, which contain valid data points). Since $p_6 p_2 p_3$ is a valid triangle cell, we start our algorithm from $p_2$.

Figure 3 Example of modified Jarvis' march and cell decomposition. Shaded area represents valid data points.

A data structure is defined for each intersection point P as follows:

    struct intersect_point {
        struct intersect_point *left, *right, *up, *down;
        struct intersect_point *previous, *next;
    };
    struct intersect_point *P;

Fig. 4 shows the relationship among the members of the data structure. We assume that each intersection point lies on only two supporting lines.

Figure 4 Illustration of data structure of intersection point (P->up = P4, P->right = P3, P->down = P2, P->previous = P1)

After the starting vertex is found, we march to the next vertex as illustrated in Fig. 4. If there are sufficient data points in cell PP2P3, the next valid vertex is P2; if P2 is not valid, we check whether P3 is valid; if P3 is also invalid, P4 must be valid, or an error will occur. The march ends when the next vertex is the starting vertex. The modified Jarvis' march (MJM) algorithm is given as follows:

Algorithm MJM

Step 1. initialize starting vertex

    START->previous = NULL; P = START->next; P->previous = START;

Step 2. march

    P->left = P1; P->down = P2; P->right = P3; P->up = P4;
    if cell PP2P3 valid, P->next = P2 (case 1)
    else if cell PP3P4 valid, P->next = P3 (case 2)
    else if cell PP4P1 valid, P->next = P4 (case 3)
    else error occurs;

Step 3. terminate

    if P->next = START.
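The decision made in Step 2 can be sketched as below. This is an illustration only: the text does not quantify "sufficient" data points, so the threshold `min_points` is an assumption, as are the function names and the representation of intersection points as 2D coordinate tuples.

```python
import numpy as np

def points_in_triangle(tri, pts):
    """Count data points lying inside or on the triangle tri = (a, b, c)."""
    a, b, c = (np.asarray(t, dtype=float) for t in tri)
    pts = np.asarray(pts, dtype=float)

    def side(p, q):
        # Sign of the 2D cross product (q - p) x (point - p) for every point.
        return (q[0] - p[0]) * (pts[:, 1] - p[1]) - (q[1] - p[1]) * (pts[:, 0] - p[0])

    d1, d2, d3 = side(a, b), side(b, c), side(c, a)
    has_neg = (d1 < 0) | (d2 < 0) | (d3 < 0)
    has_pos = (d1 > 0) | (d2 > 0) | (d3 > 0)
    return int(np.sum(~(has_neg & has_pos)))

def cell_is_valid(tri, pts, min_points=5):
    """A cell is taken as valid if it contains at least min_points data points."""
    return points_in_triangle(tri, pts) >= min_points

def next_vertex(P, P1, P2, P3, P4, pts, min_points=5):
    """One marching step: P1 = left, P2 = down, P3 = right, P4 = up."""
    if cell_is_valid((P, P2, P3), pts, min_points):
        return P2, 1
    if cell_is_valid((P, P3, P4), pts, min_points):
        return P3, 2
    if cell_is_valid((P, P4, P1), pts, min_points):
        return P4, 3
    raise ValueError("no valid cell around the current intersection point")

# Hypothetical configuration: data points only in the upper-right cell.
P, P1, P2, P3, P4 = (0.0, 0.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 0.0), (0.0, 1.0)
data = [(x, y) for x in (0.1, 0.2, 0.3) for y in (0.1, 0.2, 0.3)]
nxt, case = next_vertex(P, P1, P2, P3, P4, data)
print(nxt, case)
```

In a complete march, the left, down, right, and up neighbours would be re-derived at every step from the two supporting lines through the current point and the direction of arrival.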

A postprocessing step may be necessary to remove points which belong to case 2 in step 2 of the march algorithm. Each such point lies on the same line as its previous point and its next point. For example, in Fig. 3, p12 can be removed because p8 and p16 make it redundant.
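A minimal sketch of this postprocessing step is given below, assuming the marched vertices are available as an ordered list of 2D coordinate tuples for a closed polygon. The function name `remove_collinear` and the tolerance are hypothetical.

```python
def remove_collinear(vertices, tol=1e-9):
    """Drop vertices of a closed polygon that lie on the line through
    their previous and next vertices."""
    n = len(vertices)
    kept = []
    for i in range(n):
        x0, y0 = vertices[i - 1]          # previous vertex (wraps around)
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]    # next vertex (wraps around)
        # Cross product of the incoming and outgoing edge directions.
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if abs(cross) > tol:
            kept.append(vertices[i])
    return kept

# A square whose bottom edge carries one redundant midpoint.
polygon = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2)]
cleaned = remove_collinear(polygon)
print(cleaned)
```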

As can be seen from the above algorithm and from Fig. 3 and Fig. 4, the problem of simple polygon reconstruction from supporting lines and valid data points is one of cell decomposition. As we march around all supporting lines, the Guibas style boolean formula of the simple polygon can be readily formulated.

5.3 3D spatial connectivity

So far we have discussed the problem of recovering the connectivity of supporting lines of a

simple polygon. The approach uses information at both signal level (real data points) and

algebraic level (line equations). The same hybrid approach can be applied to the problem of

spatial connectivity of planar surfaces in 3D.

Indeed, the problem of connectivity of planar surfaces in 3D can be reduced to a set of problems of connectivity in 2D. Assume that we have recovered a set of $N$ face equations and the transformations among different views (e.g., from PCAMD). All valid data points from multiple views can be merged in the same world coordinate system. For each face $F_i$, if we intersect all other $N-1$ faces $F_j$ ($j = 1, \ldots, N$, $j \neq i$) with $F_i$ and project all these lines onto $F_i$, we get $M$ ($= N-1$) supporting lines on face $F_i$. We also project nearby 3D points onto this face $F_i$. Without loss of generality, we assume that no two supporting lines are parallel (alternatively, a threshold $d$ on the product of normals $v_i \cdot v_j$ can be set to reject nearly parallel faces). Intersecting each of the $M$ supporting lines with the remaining $M-1$ lines gives all possible candidates for vertices of the valid simple polygon which is the model of face $F_i$, as illustrated in Fig. 5. The modified Jarvis' march algorithm can then be applied to each of the $N$ faces. By connecting all the recovered polygons, we get the entire 3D object model boundary. A simple algorithm can thus be constructed to establish 3D spatial connectivity.
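The construction of candidate vertices on one face can be sketched as follows. The sketch assumes each face is given as a plane $n \cdot x + d = 0$ with unit normal $n$ (the sign convention for $d$ is an assumption), and the names `face_basis` and `candidate_vertices` are hypothetical. Each other plane is turned into a 2D line in coordinates on the chosen face, and every non-parallel pair of lines is intersected.

```python
import numpy as np
from itertools import combinations

def face_basis(n):
    """Two orthonormal vectors spanning the plane with unit normal n."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(n, helper)
    e1 = e1 / np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    return e1, e2

def candidate_vertices(i, normals, dists, parallel_tol=1e-6):
    """3D candidate vertices on face i from pairwise supporting-line intersections."""
    n_i, d_i = normals[i], dists[i]
    x0 = -d_i * n_i                        # a point on face i
    e1, e2 = face_basis(n_i)
    lines = []
    for j, (n_j, d_j) in enumerate(zip(normals, dists)):
        if j == i:
            continue
        a, b = n_j @ e1, n_j @ e2
        if np.hypot(a, b) < parallel_tol:  # face j is parallel to face i
            continue
        lines.append((a, b, n_j @ x0 + d_j))   # line a*s + b*t + c = 0 on face i
    verts = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < parallel_tol:        # parallel supporting lines
            continue
        s = (b1 * c2 - b2 * c1) / det
        t = (a2 * c1 - a1 * c2) / det
        verts.append(x0 + s * e1 + t * e2)
    return np.array(verts)

# Hypothetical example: the six faces of the cube [-1, 1]^3.
normals = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                    [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
dists = -np.ones(6)
verts = candidate_vertices(4, normals, dists)   # the face z = 1
print(verts)
```

For a convex object such as this cube, every candidate is a true vertex; for general objects the candidates must still be filtered by the validity test and the march described in section 5.2.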

Fig. 5 shows an example. It is a face of a toy house model. The complete house model is reconstructed and presented in the next section. Fig. 5(a) shows intersections of supporting lines and nearby data points projected on this face, while Fig. 5(b) superimposes a reconstructed simple polygon model of this face on Fig. 5(a).

Figure 5 Reconstruction of connectivity. The tiny dots represent projected nearby data points. Intersections of supporting lines are represented by black circles. Vertices of the reconstructed simple polygon are represented by small squares.

6 Experiments

In this section, we present results of applying our algorithm on synthetic data and on real

range image sequence of objects. We demonstrate the applicability and the robustness of our

approach using synthetic data, and present the recovered model from real range images.

6.1 Synthetic data

Our synthetic data set consists of a set of 12 planes as in the case of the dodecahedron in the

last section. A dodecahedron with 4 different views is shown in Fig. 1 in section 2.

6.1.1 Applicability

In this section we study the applicability of the proposed approach. In order to recover the

shape of a dodecahedron, given correspondence, how many views are necessary?

For example, we pick 4 distinct views from the viewing sphere so that there is no singularity. Singularity occurs when fewer than 6 faces are visible. We can then formulate two measurement matrices for surface normals and planar distances as follows:

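$$W^{(v)} = \begin{bmatrix}
v_1^{(1)} & v_2^{(1)} & v_3^{(1)} & v_4^{(1)} & v_5^{(1)} & v_6^{(1)} & * & * & * & * & * & * \\
v_1^{(2)} & v_2^{(2)} & v_3^{(2)} & v_4^{(2)} & * & * & v_7^{(2)} & v_8^{(2)} & * & * & * & * \\
v_1^{(3)} & v_2^{(3)} & * & * & * & v_6^{(3)} & * & v_8^{(3)} & v_9^{(3)} & v_{10}^{(3)} & * & * \\
* & * & * & * & * & * & v_7^{(4)} & v_8^{(4)} & v_9^{(4)} & v_{10}^{(4)} & v_{11}^{(4)} & v_{12}^{(4)}
\end{bmatrix} \qquad \text{(EQ 39)}$$

$$W^{(d)} = \begin{bmatrix}
d_1^{(1)} & d_2^{(1)} & d_3^{(1)} & d_4^{(1)} & d_5^{(1)} & d_6^{(1)} & * & * & * & * & * & * \\
d_1^{(2)} & d_2^{(2)} & d_3^{(2)} & d_4^{(2)} & * & * & d_7^{(2)} & d_8^{(2)} & * & * & * & * \\
d_1^{(3)} & d_2^{(3)} & * & * & * & d_6^{(3)} & * & d_8^{(3)} & d_9^{(3)} & d_{10}^{(3)} & * & * \\
* & * & * & * & * & * & d_7^{(4)} & d_8^{(4)} & d_9^{(4)} & d_{10}^{(4)} & d_{11}^{(4)} & d_{12}^{(4)}
\end{bmatrix} \qquad \text{(EQ 40)}$$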

In order to solve the first WLS problem uniquely for $F$ frames, we need

$$18F \ge 3F + 3P$$

since we have $18F$ equations, $3F$ unknowns for the rotation matrices, and $3P$ unknowns for the surface normals. For the second problem, we have $6F$ equations, but there are only $3F$ unknowns for the translation vectors and $P$ unknown plane distances after using the results from the first problem. Therefore, the necessary condition to uniquely solve the second problem is

$$6F \ge 3F + P$$

Since $P$ is 12, $F \ge 4$ satisfies both conditions. Again, we are not concerned with the normalization problem here.
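As a quick check of the counting argument, substituting $P = 12$ into the two conditions gives the bounds below. The conditions are read here as non-strict inequalities, which is an assumption made so that the four-view example is admissible.

```latex
\begin{align*}
18F \ge 3F + 3P &\;\Longrightarrow\; 15F \ge 36 \;\Longrightarrow\; F \ge 3 \quad (F \text{ an integer}),\\
6F \ge 3F + P   &\;\Longrightarrow\; 3F \ge 12 \;\Longrightarrow\; F \ge 4 .
\end{align*}
```

The second condition is the binding one, so four views is the smallest number that satisfies both when each view shows six faces.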

6.1.2 Robustness

We study the effectiveness of our approach when the data are corrupted by noise and when mismatching occurs. Our synthetic data consists of a set of 12 surface patches randomly distributed around all faces of a dodecahedron. Correspondence is assumed to be known. Only the first WLS problem is studied because of the similarity between the two WLS problems. The minimization of the weighted squared distance between the reconstructed and given measurement matrices leads to the recovery of surface equations and transformations.

To study the error sensitivity of the reconstruction, we take four nonsingular views of the dodecahedron where each component of every surface normal is corrupted by Gaussian noise of zero mean and variable standard deviation. As we have shown in the previous section, at least 4 views are required to recover the dodecahedron model. Fig. 6 shows that our algorithm converges in a few steps. The cases with standard deviation $\sigma$ of 0.05, 0.1, 0.2, and 0.5 are studied. Notice that the case with a standard deviation of 0.5 yields very noisy original data. As we take more views, the sum of weighted squares error is reduced. Fig. 7 plots the normalized weighted least squares error for 4 views, 8 views, 12 views, and 16 views, respectively, while Gaussian noise with 0.5 standard deviation is present. The squares errors are normalized because the number of observations increases as more views are introduced.

The more interesting case is when mismatching occurs. Obviously, if a face appears only once in the whole sequence, then its reconstruction depends on the amount of noise. When this face appears in more and more views, its reconstruction using our WLS method is averaged over these views. Fig. 8 gives the reconstruction errors of a face which appeared 12 times in 16 views. When only two views are matched, the reconstructed surface normal deviates from the true normal by 18.2 and 38.9 degrees when the standard deviation $\sigma$ is 0.1 and 0.2, respectively. When more views are added, the angle between the reconstructed surface normal and the true normal decreases to around 10 and 20 degrees, respectively.


When an observed surface normal is wrong in one particular view, the conventional sequential reconstruction method results in an erroneous recovered surface normal and transformation. The errors propagate as new views are introduced, regardless of the number of views in which this surface is visible. However, our WLS approach gives an appreciably smaller reconstruction error on this observed surface normal by distributing the errors over all views. In general, our approach cannot be worse than the sequential approach. Fig. 9 compares the reconstruction errors of the sequential method and WLS. There are 12 observations of this surface normal in 16 views, and its first observation is off by an angle between 0° and 40°. The reconstructed models of the sequential method and the WLS method are shown in Fig. 10 along with the original model, for the case of a 40° angle deviation of one surface normal in the first view. Fig. 10a shows a badly skewed model, which is the worst case for the sequential method since the error was introduced in the first frame. Fig. 10b shows the model reconstructed by the WLS method, while the original dodecahedron model is presented in Fig. 10c.

Figure 6 Effect of noise (sum of weighted squares error vs. number of iterations, for sigma = 0.05, 0.1, 0.2, and 0.5)

Figure 7 Effect of number of views (error vs. number of iterations, for 4, 8, 12, and 16 views)

Figure 8 Reconstructed error (degrees) vs. number of matched faces

Figure 9 Comparison between sequential reconstruction and WLS method (reconstruction error vs. input error, both in degrees)

Figure 10 Recovered and original dodecahedron models (a) worst case of sequential method; (b) our WLS method; (c) original model

6.2 Real range image sequence

We have applied our algorithm to a sequence of range images of a polyhedral object, using the planar region tracker described in section 4. Figures 11(a) and 11(b) show the whole sequence of 12 views and their corresponding segmentation results. Segmentation is not perfect in several views. Figure 12 shows the result of our system: two shaded views of the recovered object model. Figures 13 and 14 show another example, a toy house.

Figure 11 A sequence of images of a polyhedral object (a) original images; (b) after segmentation

Figure 12 Two views of shaded display of a recovered model

Figure 13 A sequence of images of a toy house (a) original images; (b) after segmentation

Figure 14 Four views of texture mapped display of a reconstructed house model

7 Concluding Remarks

An object modeling approach using multiple range images has been described in this paper.

The boundary representation object model is recovered and integrated from different views.

One significant contribution of this work is the application of principal component analysis with missing data to object modeling from a sequence of views. An inherent problem in

multiple view integration is that the information observed from each view is incomplete and

noisy. Based on Wiberg's formulation, we have generalized principal component analysis

with missing data as a WLS minimization problem and presented an efficient algorithm,

PCAMD, to solve it.

With zero weights assigned to the unobservable data, merging different views can be formu-

lated as a combination of two WLS minimization problems. By applying the PCAMD algo-

rithm to both problems, we get a straightforward two-step algorithm in which the first step

computes surface normals and rotation matrices by employing the quaternion representation

of the rotation matrix; the subsequent step recovers translation vectors and normal distances

to the origin. Experiments on synthetic data and real range images indicate that our

approach converges quickly and produces good models even in the presence of noise and

mismatching. An accurate polyhedral object model reconstructed from a sequence of real

range images is presented. A complex toy house model is also reconstructed.

When motion between two views is relatively small, we can track different segmented surface patches by making use of surface normals, distances, centroids, and adjacency information. An adjacency graph is built for each view and modified as the viewing direction

changes. A significant advantage of surface patch tracking, as opposed to other methods

such as point matching and line segment tracking, is that surface patches can be more reli-

ably extracted and tracked. One problem in our current system is that the composite graph

has to be accurately recovered to extract vertices and edges. This can be done as shown in

section 4 and section 5. However, we believe that the composite graph may not be neces-

sary. One possible solution, which we are currently working on, is to combine half-space

intersection and union of surface equations and range data to eliminate the use of the com-

posite graph.

Another contribution of this work is a hybrid approach to establishing spatial connectivity of boundary surfaces. The spatial connectivity of surfaces, and in particular, the supporting lines of a simple polygon, can be obtained by combining algebraic equations of surfaces and data points merged from multiple views once the transformation is recovered.

"One general principle in computer vision is, if surface information is not enough to determine each surface locally, use global constraints that constrain relative configuration of the surfaces so that the total degrees of freedom decrease." [17]. The object modeling technique presented in this paper is an example of this principle, where the algebraic structure of surface equations from multiple views is used as the global constraint. The recovered object model is statistically optimal because it is most consistent with all of the views in the sense of weighted least squares. By observing and employing different forms of input redundancy, our approach can be easily extended to other vision problems such as shape and motion from a sequence of intensity images. We are also working on applying our techniques to more complicated scene modeling.

Acknowledgment

We thank Sing Bing Kang for his many valuable comments which have significantly

improved the quality of this paper. We would also like to thank David Chan for helping with the segmentation of range images, and Mark Wheeler for proofreading early versions of this paper.

References

[1] N. Ahuja and J. Veenstra. Generating Octrees from Object Silhouettes in Ortho-

graphic Views. IEEE Trans. PAMI Vol. 11 No. 2, pp. 137-149, 1989.

[2] F. Arman and J.K. Aggarwal. Model-based Object Recognition in Dense-Range

Images - A Review. ACM Computing Surveys. Vol. 25, No. 1, pp. 5-43, 1993

[3] B. Bhanu. Representation and Shape Matching of 3-D Objects. IEEE Trans. PAMI

Vol. 6, pp. 340-351, 1984.

[4] Y. Chen and G. Medioni. Object Modeling by Registration of Multiple Range Images. Proc. IEEE Int. Conf. R & A, pp. 2724-2729, April 1991.

[5] C. Debrunner and N. Ahuja. Motion and Structure Factorization and Segmentation

of Long Multiple Motion Image Sequences. Proc. ECCV, pp. 217-221, 1992.


[6] D. Dobkin, L. Guibas, J. Hershberger, and J. Snoeyink. An Efficient Algorithm for

Finding the CSG Representation of a Simple Polygon. Algorithmica, 10: 1-23, 1993.

[7] Y. Dodge. Analysis of Experiments with Missing Data. Wiley. 1985.

[8] O.D. Faugeras and M. Hebert. The Representation, Recognition, and Localization of

3-D Objects. Int. J. Robotics Research. Vol. 5, No. 3, pp. 27-52, 1986.

[9] F.P. Ferrie and M.D. Levine. Integrating Information from Multiple Views. Proc.

IEEE Workshop on Computer Vision, pp. 117-122, 1987.

[10] G.H. Golub and C.F. Van Loan. Matrix Computations. 2nd Edition. The Johns Hopkins University Press, 1989.

[11] K. Ikeuchi. Generating an Interpretation Tree from a CAD Model for 3D-Object

Reconstruction in Bin-Picking. Int. J. Computer Vision. pp. 145-165, 1987.

[12] B. Parvin and G. Medioni. B-rep from Unregistered Multiple Range Images. Proc.

IEEE Int. Conf. R & A, pp. 1602-1607, May 1992.

[13] C.J. Poelman and T. Kanade. A Paraperspective Factorization Method for Shape and

Motion Recovery. CMU-CS-92-208, Oct 1992.

[14] F.P. Preparata and M.I. Shamos. Computational Geometry. Springer-Verlag. 1988.

[15] A. Ruhe. Numerical Computation of Principal Components when Several Observations are Missing. Tech. Rep. UMINF-48-74, Dept. Information Processing, Umeå Univ., Umeå, Sweden, 1974.

[16] M. Soucy and D. Laurendeau. Multi-Resolution Surface Modeling from Multiple

Range Views. Proc. IEEE CVPR, pp. 348-353, 1992.

[17] K. Sugihara. Machine Interpretation of Line Drawings. The MIT Press, 1986.

[18] R. Szeliski and S.B. Kang. Recovering 3D Shape and Motion from Image Streams

using Non-linear Least Squares. DEC CRL 93/3, 1993.

[19] C. Tomasi and T. Kanade. Shape and Motion from Image Streams under Orthography: A Factorization Method. Int. J. of Computer Vision, 9:2, pp. 137-154, 1992.

[20] S. Ullman. The Interpretation of Visual Motion. The MIT Press. 1979.

[21] B.C. Vemuri and J.K. Aggarwal. 3-D Model Construction from Multiple Views

Using Range and Intensity Data. Proc. CVPR, pp. 435-437, 1986.

[22] T. Wiberg. Computation of Principal Components when Data are Missing. Proc.

Second Symp. Computational Statistics, pp. 229-236, Berlin, 1976.
