Best Available Copy - DTIC · Bhanu [31 rotated the object through known angles. Ahuja and Veenstra...
Transcript of Best Available Copy - DTIC · Bhanu [31 rotated the object through known angles. Ahuja and Veenstra...
* ~Comp etorence,
4 41
4 , Principal Component Analysis with Missing Dataand Its Application to Object Modeling
Heung-yeung Shum Katsushi Ikeuchi Raj Reddy
L. .IDecember, 1993 0CMU-CS-93-213
DTICELECTEVjn1CQ MAR 1.0 19941
iCM aegieý00iii Ve oi
94-07791
94 3 9 047
BestAvailable
Copy
Principal Component Analysis with Missing Data
and Its Application to Object Modeling
Heung-yeung Shum Katsushi Ikeuchi Raj Reddy
December, 1993CMU-CS-93-213
School of Computer ScienceCarnegie Mellon University
Pittsburgh, Pennsylvania 15213
Accesion For
NTIS CRA&IDTIC TABUnannounced ,Justification rl.. _ .
By ............ X:
Distribution I t , . .- -
Availability Codes , . 199Avail and I or
Dist Special
This research is partially sponsored by the Department of the Army, Army Research Office under grantnumber DAAH04-94-G-0006, partially sponsored by the Avionics Laboratory, Wright Rcrch and Devel-opment Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, Ohio45433-6543 under Contract F33615-90-C-1465, ARPA Order No. 7597, and partially supported by NSFunder Contract IRI-9224521. The views and conclusions contained in this document are those of the authorsand should not be interpreted as representing the official policies or endorsements, either expressed orimplied, of the Department of the Army or the U.S. government.
S3 olei
Keywords: 3D object modeling, multiple view merging. principal component analysis.
AbstractObservation-based object modeling often requires integration of shape descriptions fromdifferent views. In current conventional methods, to sequentially merge multiple views,accurate description of each surface patch has to be precisely known in each view and trans-formation between any adjacent views needs to be accurately recovered. When noisy dataand mismatches are present, recovered transformation becomes erroneous. In addition, thetransformation error accumulates and propagates along the sequence, which results in aninaccurate object model. To overcome these problems, we have developed a weighted leastsquare (WLS) approach which simultaneously recovers object shape and transformationamong different views without recovering inter-frame motion as an intermediate step.
We show that object modeling from a sequence of range images is a problem of principalcomponent analysis with missing data (PCAMD), which can be generalized as a WLS mini-mization problem. Andefficient algorithm is devised to solve the problem of PCAMD. Afterwe have segmented surface regions in each view and tracked over all the sequence, we con-struct a 3F x P normal measurement matrix of surface normals, and an F x P distance mea-surement matrix of normal distances to the origin for all visible P regions appeared over thewhole sequence of F views, respectively. These two measurement matrices, which havemany missing elements due to noise, occlusion and mismatching, enable us to formulatemultiple view merging as a combination of two WLS problems. A two-step algorithm,which employs the quaternion representation of the rotation matrix, is presented to computesurface descriptions and transformations among different views simultaneously. After sur-face equations are extracted, spatial connectivity among these surfaces is established toenable the object model to be reconstructed.
Experiments using synthetic data and real range images show that our approach is robustagainst noise and mismatching and generates accurate object model by averaging over allvisible surfaces. Specifically, using a sequence of real range images, we illustrate the recon-struction of a toy house model.
1 Introduction 11.1 Past work ............................................................................................................................. 11.2 Our approach to m ultiple view m erging .............................................................................. 31.3 Organization of paper .......................................................................................................... 4
2 Principal Component Analysis with Missing Data 52.1 M otivational exam ple ..................................................................................................... 52.2 W iberg's form ulation ..................................................................................................... 72.3 M odified W iberg's form ulation .................................................................................... 92.4 W LS form ulation ............................................................................................................... 12
3 Merging Multiple Views 163.1 Two W LS problem s ...................................................................................................... 163.2 Quaternion W LS- 1 ............................................................................................................ 183.3 Iterative algorithm ........................................................................................................ 19
4 Surface Patch Tracking 214.1 Range im age segm entation .......................................................................................... 214.2 Adjacency graph ................................................................................................................ 214.3 M atching two views ..................................................................................................... 22
5 Spatial Connectivity 235.1 Half-space intersection and union ............................................................................... 235.2 M odified Jarvis' m arch ................................................................................................ 245.3 3D spatial connectivity ............................................................................................... 27
6 Experiments 296.1 Synthetic data .................................................................................................................... 29
6.1.1 Applicability ................................................................................................... 296.1.2 Robustness ..................................................................................................... 30
6.2 Real range im age sequence .......................................................................................... 33
7 Concluding Remarks 38
Acknowledgment 39
References 39
Figure 1 Distinct views of a dodecahedron ................................................................ 5Figure 2 A simple polygon and its supporting lines ........................................... 23Figure 3 Example of modified Jarvis' march and cell decomposition ............... 25Figure 4 Illustration of data structure of intersection point ................................. 26Figure 5 Reconstruction of connectivity ............................................................. 28
Figure 6 Effect of noise ...................................................................................... 31Figure 7 Effect of number of views ..................................................................... 32Figure 8 Reconstructed error vs. number of matched faces ............................... 32Figure 9 Comparison between sequential reconstruction and WLS method ....... 33Figure 10 Recovered and original dodecahedron models ................................... 33Figure 11 A sequence of images of a polyhedral object ...................................... 34Figure 12 Two views of shaded display of a recovered model ............................ 35Figure 13 A sequence of images of a toy house .................................................. 36Figure 14 Four views of a reconstructed house model .......................................... 37
1 Introduction
Solid modeling is a useful tool for tasks such as representing the virtual environment for
virtual reality systems; representing the real environment for robot programming; and
modeling real objects for object recognition. Currently, most object models are con-
structed by human operators [2]. It would be much better to have a system that can auto-
matically build models of real objects that it observes. If we can develop a reliable
technique to generate accurate 3D object models by observing real objects from multiple
views, we can reduce the effort and cost of model construction, and we can significantly
broaden the application areas of solid modeling.
Observation-based modeling systems usually work with a sequence of images of the
object(s), where the sequence spans a smoothly varying change in the positions of the sen-
sor and/or object(s). Most previous systems have attempted to apply inter-frame motion
estimates to successive pairs of views in a sequential manner [ 12]. Whenever a new view
is introduced, it is matched with the previous view, and the transformation between these
two successive views has to be recovered before the object model is updated. This sequen-
tial method does not work well in practice because local motion estimates are subject to
noise and missing data. Local mismatching errors accumulate and propagate along the
sequence, yielding erroneous object models.
Rather than sequentially integrating successive pairs of view, we should instead search for
the statistically optimal object model that is most consistent with all the views. Although
every single view provides only partial information of the object, it is likely that any part
of the object will be observed a number of times along the sequence. Object modeling
from this sequence of views can be formulated as an overdetermined minimization prob-
lem because significant redundancy exists among all the views.
1.1 Past work
Much work has been done in object modeling from a sequence of range images [4]. Most
work assumed that transformation between successive views is either known or can be
recovered, so that all data can be transformed to a fixed coordinate system. For example,
Bhanu [31 rotated the object through known angles. Ahuja and Veenstra [ I ] constructed an
octree object model from orthogonal views. Soucy and Laurendeau [ 16] proposed to trian-
gulate each view and merge multiple views via a Venn diagram when the transformation is
known. Because building a Venn diagram is combinatorial, only four-view merging is pre-
2
sented in their work. By finding the correspondences from intensity patterns in all eight
views, Vemuri and Aggarwal [21] derived the inter-frame motion and transformed all eight
range images to the first frame. Ferrie and Levine [9] merged multiple views using corre-
spondence points which are identified by correlation over the differential properties of the
surface. Parvin and Medioni [12] proposed to construct boundary representation (B-rep)
object models from unregistered multiple range images. Each view of the object is repre-
sented as an adjacency graph where nodes represent surface patches with attributes and arcs
representing adjacency between surfaces. To merge any two views, a rigid transformation
has to be computed accurately. Most of previous approaches to modeling from a sequence of
views are sequential. Thus, transformation errors accumulate and propagate from one
matching to another, which may result in imprecise object models.
Inferring scene geometry and camera motion from a sequence of intensity image is also pos-
sible in principle. For example, Tomasi and Kanade [19] proposed a factorization method to
simultaneously solve shape and motion under orthography, and Poelman and Kanade [13]
extend it to the case of paraperspective projection. Szeliski and Kang [ 18] proposed a non-
linear optimization method to solve shape and motion under perspective. However, in
[19][13], the task is formulated as a least squares problem where missing data due to occlu-
sion and mismatching is extrapolated from measured data and estimated motion. Although
three views of four points is theoretically sufficient in determining structure and motion
[201, it is difficult in practice to find a good submatrix to do "row-wise" and "column-wise"
extrapolation. Szeliski and Kang [ 18] proposed to assign a weight to each measurement and
incorporated an object-oriented perspective projection in a nonlinear least squares problem.
The very nature of the nonlinear least squares formulation requires standard techniques in
nonlinear optimization, e.g., Levenberg-Marquardt, in which convergence to a local mini-
mum may be a problem. In addition, most existing algorithms seem to be more useful for
determining camera motion than for building 3D object models because the recovered
object shape is defined by a collection of 3D points whose connectivity is not explicitly
known.
The factorization method [5][13][19], in essence, is principal component analysis of some
measurement matrix. Principal component analysis expresses the variance of the measure-
ment matrix in a compact and robust way and has been extensively studied in computational
statistics [7]. The singular value decomposition (SVD) method [10] is a straightforward
solution when the measurement matrix is complete. When data is incomplete or missing, as
often the case in practice, principal component analysis becomes much more complicated.
Ruhe [ 15] first proposed a Gauss-Newton algorithm to solve this problem, taking advantage
3
of the sparse, structured derivatives of the object function. Wiberg [22] generalized Ruhe's
work to the cases where the rank of the measurement matrix is known. However, Wiberg's
algorithm requires solving large pseudo-inverse matrices.
1.2 Our approach to multiple view merging
We propose to build object models from a sequence of range images. Our approach is to
recover bounding surfa'es and transformations simultaneously by employing principal com-
ponent analysis with missing data. After segmenting all range images and establishing cor-
respondences among different views, a composite graph is built. The object surface
description and transformations among different views are recovered by solving a combina-
tion of two weighted least squares (WLS) problems.
There are two key observations to our approach of object modeling from a sequence of
views. First, because of the redundancy in the sequence of images, we can get a reliable
solution from an overconstrained minimization problem even when data is missing. Because
only part of the object is visible in each view we cannot find correspondences among all sur-
faces between two views. Therefore, this is not a least squares (LS) problem but a WLS one
where the weights are zeros for invisible regions. The difficulty is how to formulate the
WLS problem properly and how to solve this problem without resorting to extrapolation of
the unknowns. We present an algorithm to iteratively update the surface description and
transformation so that the weighted least squares error is minimized.
The second observation lies in the first WLS problem of recovering surface normals and
rotation matrices. The modeling problem can be decomposed into two smaller problems
because recovering rotation is independent of translation. If we directly apply the WLS
algorithm, we have to explicitly update nine parameters of every rotation matrix. It is well-
known that the rotation matrix is a nonlinear function of only three independent parameters.
Therefore, updating nine parameters (even with proper normalization afterwards) is not the
best way to solve this problem. We solve this problem by representing rotations using
quaternions.
To make an object model from a recovered set of surface equations, spatial connectivity
among surfaces has also to be recovered. Spatial connectivity refers to the spatial relation-
ship among surfaces, i.e., for each surface, what other surfaces it is connected to. The prob-
lem of surface connectivity is reduced to one of connectivity of supporting lines of a simple
4
polygon, solved by a modified Jarvis' march algorithm that combines information on both
algebraic level and signal level.
1.3 Organization of paper
In section 2 we discuss principal component analysis when data is missing. From a motiva-
tional example of modeling 12-faced polyhedra from a sequence of views, we formulate the
multiple view merging as a problem of principal component analysis with missing data(PCAMD). Then we outline Wiberg's formulation of PCAMD, and modify the formulation
by proper indexing of the objective function. The modified formulation is then generalizedto a WLS minimization problem. An efficient PCAMD algorithm is presented to solve this
WLS problem. In section 3 we formulate modeling the object and recovering transforma-
tions as a combination of two WLS problems. We compute the surface description and
transformation by extracting the principal components of two highly rank-deficient mea-surement matrices with many missing elements, each of which forms a WLS problem. Thefirst WLS problem of recovering rotation matrices and surface normals is further simplified
using the quaternion representation of rotation. A two-step algorithm is presented to modelthe object from a sequence of segmented range images. Section 4 gives a brief description of
a surface patch tracking system. Different modules in the tracking system, range image seg-
mentation, adjacency graph building, and two-view merging are presented. In section 5 we
show that the problem of surface connectivity can be reduced to one of connectivity of sup-
porting lines of a simple polygon. Since the problem of establishing connectivity of support-ing lines can be regarded as both a modified convex hull-like problem and a cell
decomposition problem, we propose a modified Jarvis' march algorithm which successfullyreconstruct the simple polygon. We demonstrate applicability and robustness of proposedPCAMD method by applying our approach to synthetic data and real range images in Sec-
tion 6. We show that, given correspondence, four views are necessary to recover the shapeof an arbitrary dodecahedron (12-face polyhedrr). A study on synthetic data of a dodecahe-
dron shows that our approach is robust with respect to noise and surface mismatching. Our
method yields a statistically optimal model for a given set of views; this method improves as
more views are incorporated. From a sequence of real range images, a polyhedral objectmodel is precisely recovered using the proposed method. A complex toy house model is also
reconstructed from a sequence of range images. Final comments and concU sions are pre-
sented in Section 7.
5
2 Principal Component Analysis with Missing Data
2.1 Motivational example
Suppose that our task is to make a model for a dodecahedron (12-faced polyhedra) from a
sequence of segmented range images. A dodecahedron is a simple and interesting Platonic
solid. Assume that we have tracked 12 faces over 4 nonsingular views. The segmented
range images provide trajectories of plane coordinates {p f=rI....4,p=..12), where
p = (vT, d) represents a planar equation with surface normal v and normal distance to the
origin d. Then we may form a 16 x 12 measurement matrix as follows:
(1 )(1) p(I) p(l) (I) , * * , , ,
p1 p2 P3 P4 p5 P6(2) (2) (2) (2) , , (2) (2) , , * ,
w=- P1 P2 P3 P4 p7 p8(3) (3) (3) * P (3) P (3) (3) * *
P 1 p2 *6 8, 9(4) (4) (4) (4) (4) (4)7 * * * * * P7 Pi0 P 12
where every * indicates an unobservable face since there are only six visible faces from each
nonsingular view. Our modeling task is now to recover the poses of all the 12 faces in a
fixed coordinate system.
V2\ VI
1/-#1
V3 V4
Figure I Distinct views of a dodecahedron
6
If the measurement matrix were complete, our task would be to average all those 12 faces
over 4 views assuming data is noisy. In the absence of noise, any set of 12 faces from one of
4 views will do. The standard way to solve this problem is to apply SVD to this measurt-
ment matrix, whose rank is at most 4 (see section 3.1 for the argument). The measurement
matrix can subsequently be factorized, with proper normalization, into a left matrix Q of
transformation parameters and a right matrix P of plane coordinates
W=QP
where Q(I)IQ(2)
P = [P P2 ... P12] =
[(4)J
and Q 0 is the transformation offth view with respect to the fixed world coordinate system,
and pp is the pth plane equation in the same world coordinate system. Singular value
decomposition has also been successfully applied to shape and motion recovery from a
sequence of intensity images [191.
Unfortunately, the measurement matrix is often incomplete in practice; it is not unusual for a
large portion of the matrix to be unobservable. As we have seen in the above example, half
of the measurement matrix is unknown. When the percentage of missing data is very small,
it is possible to replace the missing eleiients with the mean or an extreme value; this is a
common strategy in multivariate statistics [7]. However, such an approach is no longer valid
when a significant portion of the measurement matrix is unknown.
One common practice in modeling from a sequence of images is to use extrapolation. For
example, we can recover the transformation between view I and view 2 if there are at least
three matched planar surfaces that are non-parallel [8]. Then we extrapolate the invisible
planar surfaces in view I from its corresponding surfaces in view 2 which are visible using
the transformation recovered. And apply the same extrapolation to invisible surfaces in view
1. By repeating this process we can in principle extrapolate the locations of all invisible sur-
faces from visible surfaces [ 121. A final step could be added to fine-tune the result by factor-
izing the extrapolated measurement matrix using SVD. A similar extrapolation approach"propagation method" [191 is used in motion and shape recovery from a sequence of inten-
sity images.
7
The major problem with the extrapolation method is that once the estimated transformation
is incorrect at any step, the extrapolated results will be erroneous. In sequential modeling
errors accumulate and propagate all the way. The fine-tuning process at the last step would
not improve the result dramatically since the extrapolated measurement matrix is inaccurate.To obviate this problem, we make use of more rigorous mathematical tools developed in
computational statistics that caters for missing data without resorting to error sensitive
extrapolation. We will demonstrate the formulation in this section and apply it to multipleview merging in the next section.
2.2 Wiberg's formulation
The problem of object modeling from a sequence of views shown in the previous sectioncan be formulated as a problem of principal component analysis with missing data
(PCAMD), which has been extensively studied in computational statistics. Ruhe [15] pro-
posed a minimization method to analyze one component model when observations are miss-
ing. One component model decomposes an F x P measurement matrix into an F x I left
matrix and a i x P right matrix. Wiberg [22] extended Ruhe's method to the more generalcase of arbitrary component model. We first outline Wiberg's formulation of principal com-
ponent analysis with missing data, before proposing a modified formulation by appropriateindexing, and generalizing the problem as a WLS problem.
Suppose that an F x P measurement matrix W consists of P individuals from an F-variate
normal distribution with mean gi and covariance 1. Let the rank of W be r. If the data is
complete and the measurement matrix filled, the problem of principal component analysis is
to determine u, s, and v such-that
is minimized, where U, and v are F x r and P x r matrices with orthogonal columns,
S = diag ((I,) is an r x r diagonal matrix, g. is the maximum likelihood approximation of the
mean vector and eT - (I.I) is an F-tuple vector with all ones. The solution to this problem
is essentially SVD of the centered (or registered) data matrix w-ep
If data is incomplete, we have the following minimization r blem:
min = p (EQ I)
I= {(fp): Wf, is observed}
where uf. and vP. are column vector notations defined by
1u.1... = &V (EQ 2)Lu T
and
[ T.. =i,2 (EQ 3)
Iv'. - -
Lemma 1
A necessary condition to uniquely solve this problem (EQ 1) is m > r(F+ P-r) + P
where m is the number of observable elements in W.
Proof:
It is trivially true that there are at most r (F + P - r) independent elements from LU
decomposition of an F x P matrix of rank r. Hence, to uniquely solve (EQ 1), the number of
equations (m) has to be no fewer than the number of unknowns (r(F+P-r)+P)). (Q.E.D.)
To sufficiently determine the problem (EQ 1) more constraints are needed to normalize
either the left matrix & or the right matrix V.
If we write the measurement matrix W as an m-dimensional vector w, the minimization
problem can be written as
min 4 - Tf (EQ4)
where
f = w-g-Fu = w-Gv (EQ5)
9
and
U= .. = [iP- . 1=P] , Ji.p(i)"
F and G are of dimension m x rF and m x (r + 1) P respectively, and are computed by
expanding every element f, of f
f= W 5 I., 1_1(i)_ Twi - . ) - 1"vp(i).
where the ith component of w is indexed to the (Jti), p(i))-th component of W, i.e.,
Wi -- Wf(i),p(i), and Wf, = wi(fP).
To solve the minimization problem stated by (EQ 4), the derivative of the objective function
(with respect to u and V,) should be zero, i.e.,
= [FTFu-F(w-li)j =0 (EQ6)T-TrGG-G w
Obviously (EQ 6) is nonlinear because F is a function of v and G is a function of u. In the-
ory, any appropriate nonlinear optimization method can be applied to solve it. However, the
dimensionality is so high in practice that we have to adapt the algorithm to make use of the
special structure of the problem. It can been observed that:
(1) For fixed u, we have a linear least squares problem of v; for fixed v, we have a linear
least problem of u;
(2) since (EQ 6) is also a bilinear problem of u and v, we can successively improve their
estimates by using the updating technique in the NIPALS algorithm [15], i.e., for a given
v, uis updated u = Fe (w- i); foragiven u, v is updated i = G+w. Fe and G+ are the
pseudo-inverses of F and G, respectively.
2.3 Modified Wiberg's formulation
In practice F and G are usually sparse matrices with many zeros. If we appropriately index
w as wP, such that
10
ft= w-I-Hu (EQ 7)
or
W 1, I [ t V1, I ... v,-, U1, I
Wi,22 I2 VI, 2 U 1, 2
S... ... ! OPxr ...
fl WI.P -- V1, P Vr, p U 1, r (EQ 8)
WF,1 I1l V 1, I V,, I UF, I
.... OPxrLWF, P ~L V1 ~ Vj,*PI UF, p
and similarly, if we index w as w2 , such that
f2 = w2 -Ki (EQ 9)
or,
V1, I7
W 1, I U1,I Ui,r 1 V2,1
W2, 1 U2, U2,r 1 OFx•(r+) ...
.... Vr, If= WF, I UF, I UF, r I (EQ 10)
W 1, p U,. I UI,, 1 V1,p
.. -- 0 Fx (r+ I) "--
LWF, Pi LFIt UF, r I Vr, P
Note that H and K are block diagonal matrices.
Because f1 and f2 contain the same observables as f,
2 -2 (EQ II)
and
11
_ HT(w- (EQ 12)
TK-i-K W2 J
since
H - K = . (EQI3)
If data is complete, K is a block diagonal matrix of dimension FP x (r + 1) P, whose block
elements are U matrices of dimension F x (r + 1) , replicated along the diagonal, i.e.,
K (EQ 14)
U1, I Ui1, IU = (EQ 15)
LUF, I UFrlj
when data is incomplete, the elements associated with the missing data are taken out, result-
ing in a matrix of dimension m x (r + 1) PUM.,x (r+, 1) 0 0 U, 0 0K 0 . 0 0 .0 (EQ 16)
0 0 U,px (r+ 0 0 up
whereP F
I MP = m,andmp =Y"f,p-=-I f=I
p = I when Wf, p is observed otherwise Yf, p = 0.
Similarly, when data is incomplete, we have the following matrix of dimension m x rF
H = - 00 = 'i I (EQ 17)o v0 o n~] 0 0.vF
12
whereF P
X ,= m, and n,= yf= Ip=l
The pseudo-inverse matrices of H and K can be easily computed because of their block
diagonal structure,
U10 01K 0= 0 (EQ 18)
V0 01U+
h' = o (EQ 19)
2.4 WLS formulation
The minimization problem (EQ 1) can be generalized as a WLS problem
1 2
min = f f (EQ2O)
where yf is weighting factor for each measurement Wf P.
In the previous discussion, we have assumed that all weights are either one when data is
observable or zero when unobservable. However, in many cases we may prefer to assign dif-ferent weights other than ones or zeros to individual measurement. For example, in recover-
ing the pose of a 3D plane, we can assign the confidence measurement to each recovered
surface normal by its incidence angle with the viewing direction. Different sensor models
can be applied to obtain a weighting matrix if necessary. In the following, we formulate
principal component analysis with missing data as a WLS problem.
We introduce two FP x FP diagonal weight matrices,
r = diag (r,, r 2, .... p) (EQ 21)
13
and
0 = diag (0D, 12, ..... OF) (EQ 22)
where
F = diag (ypl, 2',p,) = I,...,P,
and
'f = diag(yl,f y2,f ypf), f 1 F.
The minimization problem becomes
rain Y- 2 Y= Y2 2(EQ 23)
where
o= r f, and fy2 = 0 f2.
The solution to the above problem is when the first order derivative of the objective function
becomes zero. The derivative of the objective function is
=" Hyu- H; (w (EQ 24)T - T
where
Hy= KH =l(EQ 25)
and
Ky = OK =(EQ 26)
T[. of• F
Therefore, after computing the pseudo inverses of Hy, and K
" wy!.- : ... . .. . ... . . . .. -... ... . . .. .
14
VT (EQ 27)
( •.~U)' (U 41r)1
/=[- ... ]=[ (EQ 28)
(uT(TxU)-i (UT,*,)j
we can then use the PCAMD (principal component analysis with missing data) algorithm to
solve the WLS problem. Our formulation is essentially a modified NIPALS Ruhe-Wiberg
algorithm. The algorithm is as follows:
Algorithm PCAMD
(1) initialize
(2) update
u = H! (w -)(3) update
i = /•rW2
(4) stop if the algorithm converges, or go back to (2).
Remarks:
(1) Ruhe [15] also suggested using Newton and Gauss methods to speed up the
convergence of the NIPALS method. In practice, we found that the NIPALS method
converges within desired tolerance in several iterations in our experiments.
(2) Ruhe [ 15] and Wiberg [22] also showed that the more the missing data, the worse the
result will be. It is hardly surprising because the method is basically an interpolation
of all observable elements. Statistically this corresponds to decreasing robustness of
the estimate for the principal components given the observations. Fortunately object
15
modeling from multiple views, we can always take many views to form a well
constrained problem for our modeling purpose. Determining a minimally acceptable
number of views can be regarded as a sensor planning problem.
(3) The missing data can also be extrapolated as long as we find some sub-blocks in the
measurement matrix which satisfy Lemma 1. The issue of obtaining those blocks is
non-trivial. Once the missing data have been augmented, a linear or nonlinear
optimization method can be applied to solve the original problem. It should work
well if data is noise-free, i.e., only the first r singular values of the reconstructed
measurement matrix are non-zero. However, this method becomes of questionable
utility when any result from sub-block computation is inaccurate.
(4) There are statistical ways to improve the solution, for example, the metrically
Winsorised residuals method (181. This method is based on the assumption that each
measurement is corrupted by additive Gaussian noise. The metrically Winsorised
residuals method adjusts the weight for each measurement depending on its residual
error.
16
3 Merging Multiple Views
Principal component analysis with missing data has been formulated to a WLS minimiza-
tion problem in the previous section and a PCAMD algorithm is proposed to solve it. From
the motivational example of a dodecahedron it is clear that object modeling from a sequence
of views should be formulated as a WLS problem.
In this section, we show that multiple view merging can be formulated as a combination oftwo WLS problems. The first WLS problem involves rotation matrices and surface normals,
which is independent of translation. Once the first problem is solved, the second WLS prob-
lem yields translation vectors and normal distances to the origin. The first WLS problem ofdetermining rotation matrices and surface normals can be further simplified by representing
the rotation matrix using the quaternion. A straightforward two-step iterative algorithm can
be devised to solve these two problems using the PCAMD algorithm from the previous sec-
tion.
3.1 Two WLS problems
Suppose that we have tracked P planar regions over F frames. We then have trajectoriesof plane coordinates { (Vfp, df.) I f= I .. F,p = i_., P} where Vfp is the surface normal of thepth patch in thefth frame, and dfp is the associated normal distance to the origin. To facilitate
the decomposability of rotation and translation, instead of forming a 4F x P measurement
matrix as in section 2.1, we form surface normals vfp into a 3F x P matrix w"• and dis-
tances dfp into an F x P matrix w(d. w(') and w(d) are called the normal measurement
matrix and distance measurement matrix respectively.
It can be easily shown that w(v) has at most rank 3 and w"( has at most rank 4 when noise-free, therefore, w(" and w('4 are highly rank-deficient. We decompose w('" into
W(v) = R V (EQ29)
where
R R-
17
is the rotation matrix of each view with respect to the world coordinate system, andv - [v, ..., vpl is the surface normal matrix in the world coordinate system. Since R is a
3F x 3 matrix and V is an 3 x P matrix, the rank of w(') is at most 3.
Similarly, we can decompose w~d) into
W(d) = T M (EQ 30)
M= v] ... [vp ]andT= [ .:
[d 1] .. [ dp - 1t FR F I]
tf and Rf are the translation vector and rotation matrix of view f with respect to a fixedworld coordinate system.
Note that decomposition of w(dI depends on the decomposition of w('). Since M is 4 x P
and T is F x 4, the rank of w"0 ) is at most 4.
We can also decompose w~d' into
W " IV P] + [di dp](EQ 3 1)
When all elements in the two measurement matrices are known, we need to solve two least-
squares problems. However, since only part of the planar regions are visible in each view,
we end up with two WLS problems instead. The first least squares problem, labeled as
WLS- 1, is
ran .(V) t(V)2mi (y, pW ,p- IRV]fp)) (EQ 32)
f= i...F.pffI .. ,P
and the second one, denoted as WLS-2, is
min ( (d /, P [TM]f)1 ) (EQ 33)
= I..F, pf .P
where
18
4, 0 if surface p is invisible in frame f, and yf = 1 otherwise. All weights can be any
number between zero and one, depending on the significance or confidence of each mea-
surement. A similar WLS formulation is also used in [18].
3.2 Quaternion WLS-1
It appears, from the last section, that we can devise a naive two-step algorithm which solves
WLS-I and subsequently WLS-2 by applying the PCAMD algorithm to both problems.
However, in order to solve WLS- 1, we iterate R ( as if it had nine independent parameters,
while it is a nonlinear trigonometric function of three parameters. Although it is possible to
normalize RO after every iteration, it may perform poorly in terms of robustness and effi-
ciency.
In fact, several representations of rotation are often used in practice:
(1) An orthonormal rotation matrix R
(2) An rotation axis a and an rotation angle 0
(3) A unit quaternion q
A quaternion is a 4-tuple (ws) where w is a 3-vector and s is a scalar. The mapping between
a unit quaternion and a rotation axis along with a rotation angle is given by w = sin (0/2) a
and s = cos (0/2). The quaternion representation of the rotation matrix leads to a simple
way of solving minimization problems of 3D point matching and surface normal matching,
as demonstrated in [8].
The WLS-I problem (EQ 32) can be decomposed into F minimization problems
pmin l Jjwf-R'vPII (EQ 34)
p=lI
where
f = [W1,..., WFI., I f , ) = diag(yf1 .... yf.P)
The above problem can be reformulated using quaternion as
19
P
min Yy <jjwf-qU)v,,'JJ (EQ 35)P=l
subject to Iql = 1
where 4 is the conjugate quaternion of q, and
P P P£1Yi1 w -q 'yJ w~q -qUvpf =P q E 6P=1 P=I p=I
A P are symmetric matrices because wfq - q V p is a linear function of q Obviously
P
B = I Ap (EQ 37)P= I
is also symmetric, and the minimization problem 0 becomes
min q( B T qBU (EQ 38)
The solution to the above minimization problem is the eigenvector qmin corresponding to
the minimum eigenvalue of the matrix B
3.3 Iterative algorithm
We combine quaternion-based rotation matrix updating to form a two-step algorithm to
solve both the first and the second WLS problems. The algorithm is as follows:
Algorithm two-step WLS's
Step 0 Initialization
(0.1) read in measurement matrices W , W(d)M(d)(0.2) read in weight matrices y Y
(0.3) initialize R, vectorize R to v
20
Step 1 WLS-I
(1.1) vectorizeW(v) tow 1 and wv2
(1.2) update Hv
(1.3) update
u -HT wvy vi1(1.4) update
Bh
(1.5) update
q(f
and trsform to R, vectorize to v
(1.6) go to (1.2) if not converged, otherwise advance to Step 2
Step 2 WLS-2
(2.1) vectorize W(d) to wdl and wd2
(2.2) update H+
(2.3) update
u = H+TWdl(2.4) update
(2.5) update
v=K~dyWd2
(2.6.) stop if converged, otherwise go to (2.2).
We have not explicitly discussed the normalization problem in our WLS approach. The nor-
malization problem occurs because the measurement matrix is rank-deficient, hence, there
are infinite solutions to the minimization problem (EQ 1) unless an additional constraint is
imposed. This additional constraint is generally problem dependent; for example, the 2-
norm of the factorized left matrix is constrained to be one [ 15]. Fortunately, we have implic-
itly constrained our rotation matrices in its quaternion representation. The remaining con-
straint in the first WLS is the normalization of surface normal ve,.tors which are constrained
to be of unit magnitudes.
Prior to multiple view merging, we need to track surfaces so that normal measurement
matrix and distance measurement matrix can be formed. The next section describes our sur-
face tracking algorithm.
21
4 Surface Patch Tracking
In this section, we briefly overview each module of our surface patch tracking system: range
image segmentation, adjacency graph building, and two-view matching.
4.1 Range image segmentation
There are many different techniques for range image segmentation. By and large they can be
divided into feature-based and primitive-based approaches, although statistics-based
approaches have also been introduced recently. Feature-based approaches yield precise seg-
mentation but are sensitive to noise in practice. For example, Gaussian and mean curvatures
can be used to label different regions before region growing; however, this process is quite
sensitive to noise because of the second order derivative. Primitive-based approaches are
more robust to noise but constrained by the number of primitives. The higher the degree of
surface polynomial, the more difficult and the less robust the segmentation is likely to be.
We have used the primitive-based region growing segmentation method of [8]. The two
types of surface primitives used are the planar surface and the quadric surface. The regions
are established via region growing from seed points, i.e., the seed points are chosen from
points which are closest to their approximating primitives, and then merged with their
neighbors until the best-fit errors become unacceptable.
4.2 Adjacency graph
Once we have successfully segmented the range data for each view, the range image associ-
ated with view i can be represented as a set of planar regions Ii = { vi, di,, cij}, where vii
and dij are the normal ani distance of the jth segment planar surface respectively, and cii is
the centroid of jth segmented region.
From each view of the 3D object, we build an adjacency graph where every node in the
graph represents a visible planar region and each arc connects two adjacent nodes. The adja-
cency graph is updated whenever this view is matched with another. Eventually we have
adjacency information among all visible planar regions after tracking all of them for the
whole sequence. From the adjacency graph, all the object vertices can be located; thus, 3D
object model is obtained. However, augmenting adjacency graph is difficult for concave
objects because of occlusion. A better way of establishing spatial connectivity among all
surfaces is discussed in section 5.
22
We have implemented the planar surface patch tracking system which employs a wavefront
algorithm to generate the adjacency graph. The wavefront algorithm makes use of range
data because there is significant change in range data across an occluding edge.
4.3 Matching two views
Given two adjacent segmented images I, and 12, we would like to find correspondence
between different regions in two views, i.e., we want to find a mapping ý: (11 -- 12) such
that a certain distance measurement d (I1, 12) is minimized.
Two questions arise in matching two views of planar regions. The first is how to make corre-
spondence between two views; the second is how to recover the transformation between
them. Our solution to the first problem is to use adjacency information between two seg-
mented patches and between segmented surface normals. If displacement between two
views is relatively small, there should be only linear shape change [11] within the same
aspect, corresponding segmented regions are of similar size (number of points), centroid,
and surface normals. When a new aspect appears, which signals a nonlinear shape change,
there would be significant change in these parameters. There may not always be solutions to
the second problem because we need at least two corresponding non-parallel faces to deter-
mine rotation and three to determine translation. In practice, we can make the assumption
that we always have two non-parallel corresponding faces in two adjacent views.
In fact, solving the second problem can be of help to the first problem because we can then
make use of the hypothesis-and-test approach. We iteratively select two pairs of non-parallel
faces from the two images to be matched, estimate the corresponding rotation matrix, and
then attempt to match the rest of the faces. We always choose two adjacent faces from both
images, and match them based on surface normal, distance and centroid of segment regions.
The number of faces matched and consistency in face adjacency are used in the distance
measure between two matches. The estimated transformation matrix is only used to help
building the adjacency graph, while the precise transformation is robustly recovered from
our WLS method.
Multiple view tracking is done by sequentially matching two adjacent views. Whenever a
new view is added, the adjacency graph and the weight matrix are automatically modified.
Because of the problems associated with updating adjacency graph, subsequent to surface
patch tracking and multiple view merging, we use another algorithm to establish the spatial
connectivity among surfaces.
23
5 Spatial Connectivity
Once we have extracted the equations of planar surfaces of the object, we then need to
establish spatial connectivity relationship among these surfaces. One approach is to build
adjacency graph from a sequence of views, as discussed in previous section. However, aug-
menting adjacency graph whenever a new view is introduced, is quite ad-hoc. In this sec-
tion, we present a new approach of recovering surface connectivity after all surface patches
are recovered. We show that the problem of spatial connectivity of boundary surfaces can be
reduced to one of connectivity of supporting lines of a simple polygon.
5.1 Haft-space intersection and union
We assume that every planar patch P of object model is a simple polygon. A simple polygon
does not self-intersect. Every (infinite) plane divides the space into two parts, inside and out-
side, with surface normal pointing towards the external side of the object. Given an
unbounded planar surface, if we intersect all other planar surfaces on it, we obtain support-
ing lines as illustrated in Fig.2. Each supporting line is directed so that the interior of P lies
locally to its right. The right half-plane created by such a directed supporting line c is called
the supporting half-plane, and is characterized as supporting the polygon [61; however, a
concave P might not all lie in the right half-plane as indicated in Fig.2.
a
Jac
b
d
Figure 2 A simple polygon and its supporting lines (stippled and solid lines)
24
For each point x in the plane, if we know which side of each supporting line x lies on, we
know if x is inside P. Therefore, the polygon P (and its interior) can be represented as a bool-
ean formula whose atoms are those supporting lines. In other words, a simple polygon can
be represented by intersection and union of its supporting line. For example, a boolean for-
mula for the polygon in Fig.2 can be ca Q ab E bd ( dc. This Guibas style [6] formula is
obtained by complementing the second supporting line at a convex angle, and the first sup-
porting line at a concave angle when we go around the polygon. Other boolean formulas
such as Peterson style are also possible [6]. Once spatial connectivity is established, the
Guibas style formula is straightforward.
5.2 Modified Jarvis' march
The problem of establishing spatial connectivity of supporting lines can be formulated as a
mocdified convex hull-like problem which involves only vertices. This problem can also be
regarded as one of cell decomposition which involves data points. We propose a modified
Jarvis' march algorithm to reconstruct simple polygons from supporting lines and valid data
points. The algorithm to recover spatial connectivity among 3D surfaces is discussed in sec-
tion 5.3.
Definition 1
A point is defined as valid in a simple polygon if there exist sufficient valid data points aroundits neighborhood.
Lemma 2
The intersection point P of two supporting lines is valid in a simple polygon if and only if theintersection of two corresponding half-planes is valid locally at P.
Proof:
When the intersection of two half-planes is valid iocally at P, the intersection point of
these two supporting lines is valid by its definition.
Assume that the intersection point of two supporting lines is valid. Since two lines divide
the plane into four regions, there must exist at least one such region among four around the
25
intersection point that is a valid cell of the simple polygon. Therefore, the intersection of
two half-planes is valid locally at P. (Q.E.D.)
The Lemma 2 leads to a modified Jarvis' march algorithm of reconstructing simple polygon
from supporting lines and valid data points.
To construct the simple polygon from all supporting lines and valid data points, we first pre-
compute all intersection points which are candidates of vertices of the simple polygon. If we
march successive vertices with the least turning angle, we obtain their convex hull; this is
referred to as Jarvis' march algorithm [ 14]. The kernel of the simple polygon, if it exists, can
also be found by intersecting all half-spaces. Using Lemma 2, however, enables us to find
the correct simple polygon by marching all points whose local neighborhood is valid. We
call this algorithm the "modified Jarvis' march".
Assume that we have first found the lowest left point p] of the set of vertex candidates,
which is certainly a convex hull vertex, but not necessarily a vertex for our simple polygon
(unless it is valid locally). For example, in Fig.3, p] is not a simple vertex because pSplp2 is
not a valid triangle cell (valid cells are shaded areas which show valid data points). Since
p6p2p3 is a valid triangle cell, we start our algorithm from p2.
p13 p14 p151 lo P1p3* p
*pl p2 ip 3 p4
Figure 3 Example of modified Jarvis' march and cell decomposition. Shaded area represents valid datapoints.
26
A data smcure is defined for each intersection point P as follows
-{
interswt-polint left, right, up, down;inutctst-point previous, next;
Sinrsect-po•nt P;
Fig.4 shows the relationship among the members of the data structure. Assume that an inter-
section point is intersected by only two supporting lines.
P->up = P4 P->right = P3
P->next =
-eftPP->down = P2P->previous = P,
Figure 4 Illustration of data structure of intersection point
After the starting vertex is found, we march for the next vertex as illustrated in Fig.4. If
there are sufficient data points in cell PP2P3, next valid vertex is P2; If P2 is not valid, we
check if P3 is valid; If P3 is also invalid, P4 must be valid, or an error will occur. The march
ends when the next vertex is the starting vertex. The modified Jarvis' march (MJM) algo-
rithm is given as follows:
Algorithm MJM
Step 1. initialize starting vertex
START->previous = NULL,P = START->next,P->previous = START;
27
Step 2. march
P->left = PI; P->down = P2; P->right = P3; P->up = P4;ifceli PP2P3 valid, P->next = P2 (case 1)else if cel PP3P4 valid, P->next = P3 (case 2)else if cell PP4P, valid, P->next = P4 (case 3)ele error occurs;
Step 3. terminate
If P->next = START.
A postprocessing step may be necessary to remove points which belong to case 2 in step 2 of
the march algorithm. These points are on the same line with its previous point and its nextpoint. For example, in Fig.3, p12 can be removed because p8 and p16 make it redundant.
As can be seen from the above algorithm and Fig.3 as well as Fig.4, the problem of single
polygon reconstruction from supporting lines and valid data points is one of cell decomposi-
tion. As we march around all supporting lines, the Guibas style boolean formula of the sim-
ple polygon can be readily formulated.
5.3 3D spatial connectivity
So far we have discussed the problem of recovering the connectivity of supporting lines of a
simple polygon. The approach uses information at both signal level (real data points) and
algebraic level (line equations). The same hybrid approach can be applied to the problem of
spatial connectivity of planar surfaces in 3D.
Indeed, the problem of connectivity of planar surfaces in 3D can be reduced to a set of prob-
lems of connectivity in 2D. Assume that we have recovered a set of N face equations and
transformation among different views (e.g., from PCAMD). All valid data points from mul-
tiple views can be merged in the same world coordinate system. For each face Fi, if we
intersect all other N-I faces Fj (j = 1, ... , N- I, j * i). with Fi and project all these lines
onto Fi, we get M (=N-1) supporting lines on face Fi. We also project nearby 3D points onto
this face Fi. Without loss of generality, we assume that no two supporting lines are parallel
(or a normal threshold d can be set such that vivj Ž_ d). For any of the M supporting lines, weintersect it with the rest M-I lines, we get all possible candidates for vertices of the valid
simple polygon which is the model of face Fi, as illustrated in Fig.5. The modified Jarvis'
28
march algorithm can be then applied to each of the N faces accordingly. By connecting all
polygons recovered, we get the entire 3D object model boundary. A simple algorithm can be
accordingly constructed to cstablish 3D spatial connectivity then.
Fig. 5 shows an example. It is a face of a toy house model. The complete house model is
reconstructed and presented in next section. Fig.5(a) shows intersections of supporting linesand nearby data points projected on this face, while Fig.5(b) superimposes a reconstructed
simple polygon model of this face on Fig.5(a).
(a) (b)
Figure 5 Reconstruction of connectivity. The tiny dots represent projected nearby data points.Intersections of supporting lines are represented by black circles. Vertices of reconstructedsimple polygon are represented by small squares.
29
6 Experiments
In this section, we present results of applying our algorithm on synthetic data and on real
range image sequence of objects. We demonstrate the applicability and the robustness of our
approach using synthetic data, and present the recovered model from real range images.
6.1 Synthetic data
Our synthetic data set consists of a set of 12 planes as in the case of the dodecahedron in the
last section. A dodecahedron with 4 different views is shown in Fig. 1 in section 2.
6.1.1 Applicability
In this section we study the applicability of the proposed approach. In order to recover the
shape of a dodecahedron, given correspondence, how many views are necessary?
For example, we pick 4 distinct views from the viewing sphere so that there is no singular-
ity. Singularity occurs when less than 6 faces are visible. For example, we can formulate two
measurement matrices for surface normals and planar distances as follows:
v() (1) () (1) (1) l * , , ,V1 2 V3 V4 V5 v6
(2) v (2) •• (2)v(2) (2) (2)*_ VI 2 V3 V4 V7 8 (EQ 39)
v(3) V(3) *(3) * V (3) v(3) (3) * *1 2 68 V 9 V 10
, (4) (4) (4) . (4) (4) (4)V7 V8 V9 V10 vi V 12
d•') d(l) d•") dal) d•') d") * * * * * *
1 2 3 4 5 6
Sd(2 ) d.(2) d.(2) d(2) * * (2) d(2) * * * ,
1 2 3 4 7 8 (EQ 40)d(3 ) d( 3 ) d (3) d (3) d (3) d(3) *
1 2 6 8 9 to
* * * * * (* d04 d((4) d0( 4 d2(4)L6 8 9 10 11 12 _
In order to solve the first WLS problem uniquely for F frames, we need
18F> 3F+ 3P
since we have 18F equations and 3F unknowns for rotation matrices and 3P unknowns
for surface normals. For the second problem, we have 6F equations, but there are only F
30
unknown translation vectors and P unknown plane distances after using results from the
first problem. Therefore, the necessary condition to uniquely solve the second problem is
6F > 3F + P
Since P is 12, F a 4 is the unique solution for both problems. Again, we are not concerned
with the normalization problem here.
6.1.2 Robustness
We study the effectiveness of our approach when data is corrupted by noise and mismatch-
ing occurs. Our synthetic data consists of a set of 12 surface patches randomly distributed
around all faces of a dodecahedron. Correspondence is assumed to be known. Only the first
WLS problem is studied because of the similarity between those two WI S problems. Theminimization of weighted squares distance between reconstructed and given measurement
matrices leads to the recovery of surface equations and transformations.
To study the error sensitivity on reconstruction of our algorithm, we take four nonsingular
views of the dodecahedron where each component in every surface normal is corrupted by aGaussian noise of zero-mean and variable standard deviation. As we have shown in the pre-vious section, at least 4 views are required to recover the dodecahedron model. Fig 6 shows
that our algorithm converges in a few steps. The cases with standard deviation a of 0.05,
0.1, 0.2, and 0.5 are studied. Notice that the case with standard deviation of 0.5 yields verynoisy original data. As we take more views, the sum of weighted squares error is reduced.
Fig. 7 plots the normalized weighted least squares error for 4 views, 8 views, 12 views, and
16 views respectively, while gaussian noise with 0.5 standard deviation is present. The
squares errors are normalized because the number of observations increases as more views
are introduced.
The more interesting case is when mismatching occurs. Obviously if the face appears only
once in the whole sequence, then its reconstruction depends on the amount of noise. When
this face appears in more and more views, its reconstruction using our WLS method is aver-
aged over these views. Fig. 8 gives the reconstructed errors of a face which appeared 12times in 16 views. When only two views are matched, the reconstructed surface normal is
deviated from its normal by 18.2 and 38.9 degrees when standard deviation a is 0.1 and 0.2,respectively. When more views are added, the angle between the reconstructed surface nor-
mal and its normal decreases to around 10 and 20 degrees respectively.
31
When an observed surface normal is wrong in one particular view, the conventional sequen-
tial reconstruction method results in an erroneous recovered surface normal and transforma-
tion. The errors propagate as new views are introduced regardless of the number of views in
which this surface is visible. However, our WLS approach gives appreciably smaller recon-
struction error on this observed surface normal by distributing the errors in all views. In any
case, in general, our approach cannot be worse than the sequential approach. Fig. 9 com-
pares the reconstructed errors of sequential method and WLS. There are 12 observations of
this surface normal in 16 views and its first observation is off by an angle between 0' and
400. The reconstructed models of sequential method and WLS method are shown in Fig. 6
along with the original model, for the case of 400 angle deviation of one surface normal in
the first view. Fig. lOa shows a badly-skewed model, which is the worst case from the
sequential method since the error was introduced in the first frame. Fig. lOb shows the
reconstructed model by the WLS method while the original dodecahedron model is pre-
sented in Fig. 10c.
Sum of Weighted Squares Error
10.0 . sigma 0.05. sigma 0. 1
sigma 0.2sigma 0.5
Number of
0.0 0 10 Iterations
0 10
Figure 6 Effect of noise
32
Error10.0- 00 *-• 4 views
"--•-8 views-' 12 views"- 16 views
" "2 •"......• ...... • ..... ..... ......
0.0 Number of
0 10 Iteratons
Figure 7 Effect of number of views
Error (degree)40.0 !
sigma 0.2.
30.0 .
20.0 ......
1 0 .0. ... .. . .....
0.0 Number ofFaces matched
2 12
Figure 8 Reconstructed error vs. number of matched faces
33
Reconstruction Error (degree)
40.0sequential
30.0
20.0
0.0 •<' Input error(degree)0 10 20 30 40
FMgure 9 Comparison between sequential reconstruction and WLS method
(a) (b) (c)
Figure 10 Recovered and original dodecahedron models (a) worst case of sequential method; (b) ourWLS method; (c) original model
6.2 Real range image sequence
We have applied our algorithm to a sequence of range images of a polyhedral object, using
the planar region tracker described in section 3. Figures 11(a) and 11(b) show the whole
sequence of 12 views and their corresponding segmentation results. Segmentation is not
perfect in several views. Figure 12 shows the result of our system, two shaded views of
recovered object model. Figures 13 and 14 show another example of a toy house.
34
(b)
Figure 11 A sequence of images of a polyhedral object (a) original images; (h) after segmentation
35
Figure 12 Two views of shaded display of a recovered model
36
I.ii~ure I.? A se~quence o~f imlages ofI a toy house ta original images; 11) 1 after segmlentationl
37
Figure 14 Four views of texture mapped display of a reconstructed house model
38
7 Concluding Remarks
An object modeling approach using multiple range images has been described in this paper.
The boundary representation object model is recovered and integrated from different views.
One significant contribution of this work is the application of principal component analysiswith missing data to object modeling from a sequence of views. An inherent problem in
multiple view integration is that the information observed from each view is incomplete and
noisy. Based on Wiberg's formulation, we have generalized principal component analysis
with missing data as a WLS minimization problem and presented an efficient algorithm,
PCAMD, to solve it.
With zero weights assigned to the unobservable data, merging different views can be formu-
lated as a combination of two WLS minimization problems. By applying the PCAMD algo-
rithm to both problems, we get a straightforward two-step algorithm in which the first step
computes surface normals and rotation matrices by employing the quaternion representation
of the rotation matrix; the subsequent step recovers translation vectors and normal distances
to the origin. Experiments on synthetic data and real range images indicate that our
approach converges quickly and produces good models even in the presence of noise and
mismatching. An accurate polyhedral object model reconstructed from a sequence of real
range images is presented. A complex toy house model is also reconstructed.
When motion between two views is relatively small, we-can track different segmented sur-
face patches by making use of surface normals, distances, centroids, and adjacency informa-
tion. An adjacency graph is built for each view and modified as the viewing direction
changes. A significant advantage of surface patch tracking, as opposed to other methods
such as point matching and line segment tracking, is that surface patches can be more reli-
ably extracted and tracked. One problem in our current system is that the composite graph
has to be accurately recovered to extract vertices and edges. This can be done as shown in
section 4 and section 5. However, we believe that the composite graph may not be neces-
sary. One possible solution, which we are currently working on, is to combine half-space
intersection and union of surface equations and range data to eliminate the use of the com-
posite graph.
Another contribution of this work is a hybrid approach of establishing spatial connectivity
of boundary surfaces. The spatial connectivity of surfaces, and in particular, the supporting
39
lines of a simple polygon, can be obtained by combining algebraic equations of surfaces and
data points merged from multiple views once transformation is recovered.
"One general principle in computer vision is, if surface information is not enough to deter-mine each surface locally, use global constraints that constrain relative configuration of thesurfaces so that the total degrees of freedom decrease." [17]. The object modeling technique
presented in this paper is an example of this principle, where the algebraic structure of sur-
face equations from multiple views is used as the global constraint. The recovered objectmodel is statistically optimal because it is most consistent with all of the views in the sense
of weighted least squares. By observing and employing different forms of input redundancy,
our approach can be easily extended to other vision problems such as shape and motionfrom a sequence of intensity images. We are also working on applying our techniques to
more complicated scene modeling.
Acknowledgment
We thank Sing Bing Kang for his many valuable comments which have significantly
improved the quality of this paper. We would also like to thank David Chan for helping seg-mentation of range images, and Mark Wheeler for proofreading early versions of this paper.
References
[1] N. Ahuja and J. Veenstra. Generating Octrees from Object Silhouettes in Ortho-
graphic Views. IEEE Trans. PAMI Vol. 11 No. 2, pp. 137-149, 1989.
[2] F. Arman and J.K. Aggarwal. Model-based Object Recognition in Dense-Range
Images - A Review. ACM Computing Surveys. Vol. 25, No. 1, pp. 5-43, 1993
[3] B. Bhanu. Representation and Shape Matching of 3-D Objects. IEEE Trans. PAMI
Vol. 6, pp. 340-351, 1984.
[4] Y. Chen and G. Medioni. Object Modeling by Registration of Multiple RangeImages. Proc. IEEE Int. Conf. R & A, pp. 2724-2729, April 1991.
[5] C. Debrunner and N. Ahuja. Motion and Structure Factorization and Segmentation
of Long Multiple Motion Image Sequences. Proc. ECCV, pp. 217-221, 1992.
40
[6] D. Dobkin, L. Guibas, J. Hershberger, and J. Snoeyink. An Efficient Algorithm for
Finding the CSG Representation of a Simple Polygon. Algorithmica, 10: 1-23, 1993.
(7] Y. Dodge. Analysis of Experiments with Missing Data. Wiley. 1985.
[8] O.D. Faugeras and M. Hebert. The Representation, Recognition, and Localization of
3-D Objects. Int. J. Robotics Research. Vol. 5, No. 3, pp. 27-52, 1986.
[9] F.P. Ferrie and M.D. Levine. Integrating Information from Multiple Views. Proc.
IMEE Workshop on Computer Vision. pp. 117-122, 1987.
[10] G.H. Golub and C.F. Van Loan. Matrix Computation. 2nd Edition. The John Hop-
kins University Press. 1989.
[11] K. Ikeuchi. Generating an Interpretation Tree from a CAD Model for 3D-Object
Reconstruction in Bin-Picking. Int. J. Computer Vision. pp. 145-165, 1987.
[12] B. Parvin and G. Medioni. B-rep from Unregistered Multiple Range Images. Proc.
IEEE Int. Conf. R & A, pp. 1602-1607, May 1992.
[13] C.J. Poelman and T. Kanade. A Paraperspective Factorization Method for Shape and
Motion Recovery. CMU-CS-92-208, Oct 1992.
[14] F.P. Preparata and M.I. Shamos. Computational Geometry. Springer-Verlag. 1988.
[15] A. Ruhe. Numerical Computation of Principal components when Several Observa-
tions are Missing. Tech Rep. UMINF-48-74, Dept. Information Processing, Umea
Univ., Umea, Sweden, 1974.
[16] M. Soucy and D. Laurendeau. Multi-Resolution Surface Modeling from Multiple
Range Views. Proc. IEEE CVPR, pp. 348-353, 1992.
[17] K. Sugihara. Machine Interpretation of Line Drawings. The MIT Press, 1986.
[18] R. Szeliski and S.B. Kang. Recovering 3D Shape and Motion from Image Streams
using Non-linear Least Squares. DEC CRL 93/3, 1993.
[19] C. Tomasi and T. Kanade. Shape and Motion from Image Streams under Orthogra-
phy: A Factorization Method. Int. J. of Computer Vision, 9:2, pp. 137-154, 1992.
[20] S. Ullman. The Interpretation of Visual Motion. The MIT Press. 1979.
[21] B.C. Vemuri and J.K. Aggarwal. 3-D Model Construction from Multiple Views
Using Range and Intensity Data. Proc. CVPR, pp. 435-437, 1986.
[22] T. Wiberg. Computation of Principal Components when Data are Missing. Proc.
Second Symp. Cc,.nputational Statistics, pp. 229-236, Berlin, 1976.