
On the distortion of shape recovery from motion

Tao Xiang a, Loong-Fah Cheong b,*

a Department of Computer Science, Queen Mary, University of London, London E1 4NS, UK
b Department of Electrical and Computer Engineering, National University of Singapore, Singapore 119260, Singapore

Received 12 March 2003; received in revised form 26 November 2003; accepted 12 February 2004

Abstract

Given that most current structure from motion (SFM) algorithms cannot recover true motion estimates reliably, it is important to understand the impact such motion errors have on the shape reconstruction. In this paper, various robustness issues surrounding different types of second order shape estimates recovered from the motion cue are addressed. We present a theoretical model to understand the impact that errors in the motion estimates have on shape recovery. Using this model, we focus on the recovery of second order shape under different generic motions, each of which presents different degrees of error sensitivity. We also show that different shapes exhibit different degrees of robustness with respect to their recovery. Understanding such different distortion behaviour is important if we want to design better fusion strategies with other shape cues.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Shape recovery; Structure from motion; Error analysis; Iso-distortion framework

1. Introduction

In spite of the best efforts of a generation of computer vision researchers, we still do not have a practical and robust system for accurately reconstructing shape from a sequence of moving imagery under all motion-scene configurations. There have been a few theoretical analyses of current SFM algorithms in an attempt to understand these algorithms' error behaviour [18,23,24]. Questions have also been asked about the goals of shape reconstruction as well as the nature of shape representation appropriate for achieving these goals. Much of the intervening research emphasized the importance of a closer coupling with the tasks to be performed by the system [2] and/or, often correlatively, of a more qualitative representation of depth [9,13]. The view is that the physical world imposes complexity such that adequate interpretation is impossible outside the context of tasks. Accordingly, less metric forms of depth representation such as projective depth and ordinal depth¹ might be adequate for accomplishing these tasks.

Shape reconstruction typically occurs in a context where

different cues such as shading, contour, range data and

motion are available [7]. It has been stated that different

depth cues are suited for recovering different kinds of depth

representation. For instance, motion and stereo cues are

usually used to recover point depth values, whereas texture

and shading cues are more suited for recovering second

order depth properties such as curvatures. There have

indeed been a number of computational works which

exploited the complementary strengths of different depth

cues by using a proper fusion scheme [3,4,10,19]. There has

also been a significant increase in the number of

psychophysical studies that examine fusion strategies

utilized by human observers [3,11,21,25].

Second order shape properties such as concavity are important for many perceptual tasks. While it is well known that the recovery of second order shape using motion cues is sensitive to errors, are there certain types of second order shapes that are more amenable to recovery? What is the robustness of the recovered shapes with respect to the motion types? If the confidence level of the shape recovered from motion can be ascertained in detail against various motion-scene configurations, it then becomes possible to adopt a proper fusion mechanism to combine the depths recovered from different cues to produce better results.

0262-8856/$ - see front matter © 2004 Elsevier B.V. All rights reserved.

doi:10.1016/j.imavis.2004.02.005

Image and Vision Computing 22 (2004) 807–817

www.elsevier.com/locate/imavis

* Corresponding author. Tel.: +65-6874-2990; fax: +65-6779-1103. E-mail addresses: [email protected] (L.-F. Cheong), [email protected] (T. Xiang).
1 Ordinal depth representation is a shape representation which only allows us to order the values of the scene depths.


In the past, the sensitivity of depth recovery from structure from motion (SFM) algorithms has been investigated from a statistical perspective, in particular, the effects of noise [1,20,22]. However, given that true motion estimates cannot be recovered, we expect systematic distortion also to be present in the recovered structure. Not many researchers have addressed explicitly the systematic nature of the depth distortion given some errors in the 3D motion estimates. Our investigation seeks to clarify the impact such motion errors have on second order shape estimates. In addition, we attempt to understand the severity of the resultant shape distortion under different motion-scene configurations. Are there generic motion types that can render shape recovery more robust and reliable? As was shown in Refs. [6,23] using the iso-distortion framework, there exists a certain dichotomy between forward (perpendicular to the image plane) and lateral (in the image plane) motion in terms of both 3D motion and point depth recovery. For instance, useful information like depth order is preserved under a lateral motion in a small field of view, even though it is difficult to disambiguate translation and rotation in such a case, while under forward motion, a robust recovery of 3D structure is much more difficult. Evidently, not all motions are equal in terms of robust depth recovery, and these results have implications for the kinds of active vision strategies to be adopted for improving depth estimates.

In this paper, the iso-distortion framework [5] is again employed to analyze the errors in second order shape recovery. Our main contribution lies in the elucidation of the impact of errors in 3D motion estimates on second order shape descriptors such as normal curvatures and shape index [14]. In particular, our findings show that second order shape is recovered with varying degrees of uncertainty depending on the type of 3D motion executed. Furthermore, we also make clear that different shapes exhibit different sensitivities in their recovery to errors in the 3D motion estimates. These results flesh out the intuitive feeling that various SFM algorithms behave differently under different motion-scene configurations; this in turn might permit a better fusion strategy with other shape cues. Experiments are conducted to verify the various theoretical results obtained in this paper.

2. Background

2.1. Local shape representation

Consider the canonical form of a smooth (quadratic) surface patch:

$$Z = \frac{1}{2}\left(k_{min}X^2 + k_{max}Y^2\right) + d \qquad (1)$$

where k_min, k_max are the two principal curvatures, with k_min < k_max. This represents a canonical case whereby the observer is fixating straight ahead at the surface patch located at a distance of d units away, and the X-axis and the Y-axis are aligned with the principal directions (directions of principal curvature). Rotating the aforementioned surface patch around the Z-axis by θ in the anti-clockwise direction, we obtain a more general surface patch whose principal directions do not coincide with the X–Y frame:

$$Z = \frac{1}{2}\left(\cos^2\theta\,(k_{min}X^2 + k_{max}Y^2) + \sin^2\theta\,(k_{max}X^2 + k_{min}Y^2)\right) + \cos\theta\sin\theta\,(k_{min} - k_{max})XY + d \qquad (2)$$

Since the focus of this paper is on the recovery of local shape, our analysis will primarily center around the fixation point (X = Y = 0). The idea is that if the perceptual tasks require local shape information, the visual system is usually fixated at that point.
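Eq. (2) follows from Eq. (1) by expressing the canonical patch in coordinates rotated by θ about the Z-axis. A minimal sympy check of this step (the symbol names are ours):

```python
import sympy as sp

X, Y, th, kmin, kmax, d = sp.symbols('X Y theta k_min k_max d')

# Rotating the patch by theta (anti-clockwise) about the Z-axis is the same
# as evaluating the canonical patch of Eq. (1) in coordinates rotated by -theta.
Xr = sp.cos(th)*X + sp.sin(th)*Y
Yr = -sp.sin(th)*X + sp.cos(th)*Y
Z_rotated = sp.Rational(1, 2)*(kmin*Xr**2 + kmax*Yr**2) + d

# Eq. (2) as stated in the text
Z_eq2 = (sp.Rational(1, 2)*(sp.cos(th)**2*(kmin*X**2 + kmax*Y**2)
                            + sp.sin(th)**2*(kmax*X**2 + kmin*Y**2))
         + sp.cos(th)*sp.sin(th)*(kmin - kmax)*X*Y + d)

assert sp.simplify(Z_rotated - Z_eq2) == 0   # the two expressions agree
```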

Local representations of curved surfaces include the classical differential invariants of Gaussian and mean curvature, which are computed from the principal curvatures. A good shape descriptor should correspond to our intuitive idea of shape: shape is invariant under translation and rotation, and more importantly, independent of the scaling operation. The principal curvatures, as well as the Gaussian and mean curvatures, satisfy the former but not the latter condition, because they still encode the amount of curvature. Koenderink [14] proposed two measures of local shape, the shape index (S) and the curvedness (C), as alternatives to the classical differential shape invariants. S and C are defined, respectively, as follows:

$$S = \frac{2}{\pi}\arctan\left(\frac{k_{min} + k_{max}}{k_{min} - k_{max}}\right) \qquad (3)$$

$$C = \sqrt{\frac{k_{min}^2 + k_{max}^2}{2}} \qquad (4)$$

S is a number in the range [−1, +1] and is obviously scale invariant; C is a positive number with unit m⁻¹. Shape index and curvedness provide us with a description of 3D quadratic surfaces in terms of their type of shape and amount of curvature. Fig. 1 shows examples of quadratic surfaces with the corresponding shape index values.
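Eqs. (3) and (4) translate directly into code. A minimal sketch (the function names and numerical values are ours, purely for illustration) that also exhibits the scale behaviour just described:

```python
import numpy as np

def shape_index(k_min, k_max):
    """Shape index S of Eq. (3); S lies in [-1, +1] and is scale invariant."""
    return (2.0/np.pi)*np.arctan((k_min + k_max)/(k_min - k_max))

def curvedness(k_min, k_max):
    """Curvedness C of Eq. (4); C >= 0, with unit 1/m."""
    return np.sqrt((k_min**2 + k_max**2)/2.0)

# Scaling a surface by a factor of 10 divides both principal curvatures by 10:
print(shape_index(-0.5, 1.0), shape_index(-0.05, 0.1))   # equal: S is unchanged
print(curvedness(-0.5, 1.0), curvedness(-0.05, 0.1))     # C shrinks by 10x
```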

When shape is reconstructed based on the motion cue, it

is more appropriate to use shape index as the local

measurement of shape type rather than other differential

invariants. Due to the well-known ambiguity of SFM, the

scale of the recovered objects and the speed of translation

can only be determined up to a common factor. The

recovered scale information is thus meaningless unless

additional information is available. With shape index, we

know that the accuracy with which it can be estimated

depends on the accuracy of the 3D motion estimates.

2.2. General nature of distortion

This section derives the distortion in the recovered shape

given that the motion parameters are imprecise. In the most


general case, if the observer is moving rigidly with respect to its 3D world with a translation (U, V, W) and a rotation (α, β, γ), the resulting optical flow (u, v) at an image location (x, y) can be expressed by the well-known equations [15]:

$$u = u_{trans} + u_{rot} = (x - x_0)\frac{W}{Z} + \frac{\alpha xy}{f} - \beta\left(\frac{x^2}{f} + f\right) + \gamma y$$
$$v = v_{trans} + v_{rot} = (y - y_0)\frac{W}{Z} + \alpha\left(\frac{y^2}{f} + f\right) - \frac{\beta xy}{f} - \gamma x \qquad (5)$$

where (x_0, y_0) = (fU/W, fV/W) is the focus of expansion (FOE), Z is the depth of a scene point, u_trans, v_trans are the horizontal and vertical components of the flow due to translation, and u_rot, v_rot the horizontal and vertical components of the flow due to rotation, respectively.

Since the depth can only be derived up to a scale factor, we set W = 1 for the case of general motion; the case of pure lateral motion (W = 0) will be discussed separately where required. Then the scaled depth of a scene point recovered can be written as:

$$Z = \frac{(x - x_0,\; y - y_0)\cdot\mathbf{n}}{(u - u_{rot},\; v - v_{rot})\cdot\mathbf{n}}$$

where n is a unit vector which specifies a direction.

If there are some errors in the estimation of the intrinsic or extrinsic parameters, this will in turn cause errors in the estimation of the scaled depth, and thus a distorted version of the space will be computed. In this paper, we denote the estimated parameters with the hat symbol (^) and errors in the estimated parameters with the subscript e (where the error in any parameter p is defined as p_e = p − p̂). The estimated depth Ẑ can readily be shown to be related to the actual depth Z as follows:

$$\hat{Z} = Z\left(\frac{(x - \hat{x}_0,\; y - \hat{y}_0)\cdot\mathbf{n}}{(x - x_0,\; y - y_0)\cdot\mathbf{n} + (u_{rot_e},\; v_{rot_e})\cdot\mathbf{n}\,Z}\right) \qquad (6)$$

From Eq. (6) we can see that Ẑ is obtained from Z through multiplication by the factor given by the terms inside the bracket, which we denote by D and call the iso-distortion factor. The expression for D contains the term n, whose value depends on the scheme we use to recover depth. In the forthcoming analysis, we will adopt the 'epipolar reconstruction' scheme [5], i.e. n is along the estimated epipolar direction, with

$$\mathbf{n} = \frac{(x - \hat{x}_0,\; y - \hat{y}_0)}{\sqrt{(x - \hat{x}_0)^2 + (y - \hat{y}_0)^2}},$$

based on the intuition that the epipolar direction contains the strongest translational flow. The detailed statistical and geometrical reasons for choosing this scheme of reconstructing shape were explained in Ref. [6].
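The relation Ẑ = DZ of Eq. (6) is easy to verify numerically. The sketch below (all parameter values are ours, purely for illustration) synthesizes the flow of one point under the true motion, recovers the scaled depth by epipolar reconstruction with erroneous motion estimates, and compares the result against D·Z:

```python
import numpy as np

f = 1.0                      # focal length (arbitrary units)
Z = 4.0                      # true depth of the scene point
x, y = 0.3, 0.2              # image location

U, V, W = 0.2, 0.05, 1.0     # true motion, scaled so that W = 1
a, b, g = 0.01, -0.02, 0.0   # true rotation (alpha, beta, gamma)
ae, be = 0.004, -0.006       # rotation errors
Ue, Ve = 0.03, -0.01         # translation errors
Uh, Vh = U - Ue, V - Ve      # estimates, since p_e = p - p_hat
ah, bh = a - ae, b - be

def rot_flow(a, b, g):
    """Rotational flow components of Eq. (5) at (x, y)."""
    return np.array([a*x*y/f - b*(x*x/f + f) + g*y,
                     a*(y*y/f + f) - b*x*y/f - g*x])

x0, y0 = f*U/W, f*V/W                      # true FOE
x0h, y0h = f*Uh, f*Vh                      # estimated FOE (W_hat = 1)

flow = np.array([(x - x0)*W/Z, (y - y0)*W/Z]) + rot_flow(a, b, g)

n = np.array([x - x0h, y - y0h])
n = n/np.linalg.norm(n)                    # estimated epipolar direction

# Epipolar reconstruction of the scaled depth using the erroneous estimates
Z_hat = np.dot([x - x0h, y - y0h], n)/np.dot(flow - rot_flow(ah, bh, g), n)

# Iso-distortion factor of Eq. (6); the rotational flow is linear in the
# rotation parameters, so the error flow equals rot_flow(ae, be, 0)
D = np.dot([x - x0h, y - y0h], n)/(np.dot([x - x0, y - y0], n)
                                   + np.dot(rot_flow(ae, be, 0.0), n)*Z)
print(Z_hat, D*Z)                          # the two agree: Z_hat = D*Z
```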

If we denote, with slight abuse of notation, the homogeneous co-ordinates of a point P_3 by (X, Y, Z, W), and the estimated position P̂_3 by (X̂, Ŷ, Ẑ, Ŵ), the distortion transformation f : P_3 → P̂_3 can be written down. Note that to obtain the estimated X̂, we use the back-projection given by X̂ = xẐ/f = DxZ/f = DX; similarly, Ŷ = DY. In general, the transformation from physical to perceptual space belongs to the family of Cremona transformations [5]. The resulting transformation f is more complicated than a projective transformation, and few statements can be made about the nature of the distorted shape. Indeed, some readers may observe that at certain points, the distortion D is not defined. These points are known as the fundamental elements of the Cremona transformation and indeed characterize properties of the transformation. Despite the complex nature of the distortion, the recovered depths may nevertheless possess good properties under certain types of generic motions. For instance, ordinal depth information is preserved under lateral motion, even though it is difficult to disambiguate the translation and the rotation in such a case [6,23].

Fig. 1. Examples of quadratic surfaces with corresponding shape index values. S = ±1: umbilic shapes (spherical cap and cup); S = ±0.5: cylindrical shapes; S ∈ [−1.0, −0.5] or S ∈ [0.5, 1.0]: elliptic shapes; S ∈ [−0.5, 0.5]: hyperbolic shapes.

Many biological organisms often judge structure by making lateral motions, although the lateral translations are usually coupled with a certain amount of rotation. For an artificial active vision system, forward and lateral motions


are often purposefully executed for various tasks such as

depth recovery, which means that the agent executing such

motions is at least aware of the generic type of motion being

executed. Thus it is reasonable to assume the following:

$$\begin{cases} \hat{W} = W = 0 & \text{when lateral motion is executed} \\ \hat{U} = U = \hat{V} = V = 0 & \text{when forward motion is executed} \end{cases} \qquad (7)$$

Furthermore, rotation around the optical axis is usually not executed by an active visual system. Thus we can further assume that γ̂ = γ = 0.

3. Distortion of shape recovery under generic motions

3.1. Lateral motion

We consider an agent performing a lateral translational motion (U, V, 0), coupled with a rotation (α, β, γ). Under the aforementioned assumption of Ŵ = W = γ̂ = γ = 0, the estimated epipolar direction is a fixed direction for all the feature points, with

$$\mathbf{n} = \frac{(\hat{U},\; \hat{V})}{\sqrt{\hat{U}^2 + \hat{V}^2}}.$$

Without loss of generality, we can set the estimated epipolar direction to be along the positive X-axis direction, so we have V̂ = 0 (though V is not necessarily zero). We can easily show that for any other direction, the distortion expression will have an identical form, with only changes in the values of some parameters. Modifying Eq. (6) somewhat to take into account W = 0, the distortion factor can readily be shown to be:

$$D = \frac{(\hat{U}^2 + \hat{V}^2)Z}{(U\hat{U} + V\hat{V})Z + \hat{U}\beta_e(X^2 + Z^2) - \hat{V}\alpha_e(Y^2 + Z^2) + (\hat{V}\beta_e - \hat{U}\alpha_e)XY} \qquad (8)$$

from which the mapping from a point (X, Y, Z) in the physical space to a point (X̂, Ŷ, Ẑ) in the reconstructed space can be established.

Consider that a surface s(X, Y) = (X, Y, Z(X, Y)), parameterized² by X and Y, is transformed under the mapping to the reconstructed surface ŝ(X̂, Ŷ) = (X̂, Ŷ, Ẑ(X̂, Ŷ)), which is parameterized by X̂ and Ŷ. Using Eq. (8), the perceived surface can also be parameterized by X and Y as follows:

$$\hat{s}(X, Y) = (DX,\; DY,\; DZ) \qquad (9)$$

where D is defined in Eq. (8) and, for brevity, Z(X, Y) has been written as Z. Given this parameterization, Mathematica³ can be used to perform the algebraic manipulation to obtain the expressions for quantities such as k̂_min and k̂_max, as well as to visualize the distortion of the local shape in the following sections.

3.1.1. Curvatures

For the local surface patch given by Eq. (2), we obtain the perceived principal curvatures k̂_min and k̂_max at the fixation point as follows:

$$\hat{k}_{min} = \frac{1}{2}\left(\frac{(k_{min} + k_{max})U - 2\beta_e}{\hat{U}} - \sqrt{\frac{Q}{\hat{U}^2}}\right)$$
$$\hat{k}_{max} = \frac{1}{2}\left(\frac{(k_{min} + k_{max})U - 2\beta_e}{\hat{U}} + \sqrt{\frac{Q}{\hat{U}^2}}\right) \qquad (10)$$

where Q = U²(k_min − k_max)² + 4(α_e² + β_e²) − 4U(k_min − k_max)(β_e cos 2θ + α_e sin 2θ). Notice that the parameter d, which

indicates the distance between the surface patch and the observer, does not affect the distorted principal curvatures (or other shape measures based on the principal curvatures). From Eq. (10), it is obvious that the Gaussian and mean curvatures are not preserved under the distortion; even the signs of the principal curvatures are not necessarily preserved. The principal directions of the perceived surface patch can be obtained by computing the eigenvectors of the Weingarten matrix [8] of the parametric shape representation. It was found that generally the principal directions are not preserved. Only when the principal directions are aligned with the X- and Y-axes and α_e = 0 are they preserved, though the directions of the maximum curvature and of the minimum curvature may swap.

We are also interested in the distortion of the normal curvatures, especially the normal curvatures along the horizontal and vertical directions, which are denoted as Z_XX and Z_YY, respectively, and given by:

$$Z_{XX} = k_{min}\cos^2\theta + k_{max}\sin^2\theta$$
$$Z_{YY} = k_{min}\sin^2\theta + k_{max}\cos^2\theta \qquad (11)$$

The distorted normal curvatures along the horizontal and

vertical directions can be obtained as:

$$\hat{Z}_{XX} = \frac{U}{\hat{U}}Z_{XX} - \frac{2\beta_e}{\hat{U}}, \qquad \hat{Z}_{YY} = \frac{U}{\hat{U}}Z_{YY} \qquad (12)$$

Interestingly, Ẑ_XX and Ẑ_YY are not affected by α_e. Eq. (12) shows that normal curvatures along different directions have different distortion properties.
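Eqs. (10) and (12) can be exercised numerically, and they admit a simple consistency check: by Euler's theorem, the normal curvatures along two orthogonal directions sum to the sum of the principal curvatures. A sketch (all parameter values are illustrative):

```python
import numpy as np

def distorted_curvatures(k_min, k_max, theta, U, U_hat, a_e, b_e):
    """Perceived principal curvatures at the fixation point, Eq. (10)."""
    Q = (U**2*(k_min - k_max)**2 + 4*(a_e**2 + b_e**2)
         - 4*U*(k_min - k_max)*(b_e*np.cos(2*theta) + a_e*np.sin(2*theta)))
    trace = ((k_min + k_max)*U - 2*b_e)/U_hat
    return 0.5*(trace - np.sqrt(Q)/U_hat), 0.5*(trace + np.sqrt(Q)/U_hat)

k_min, k_max, theta = -0.2, 0.5, np.pi/6
U, U_hat, a_e, b_e = 0.2, 0.18, 0.005, -0.02

# Normal curvatures along the horizontal/vertical directions, Eqs. (11)-(12)
Zxx = k_min*np.cos(theta)**2 + k_max*np.sin(theta)**2
Zyy = k_min*np.sin(theta)**2 + k_max*np.cos(theta)**2
Zxx_hat = (U/U_hat)*Zxx - 2*b_e/U_hat
Zyy_hat = (U/U_hat)*Zyy

km_hat, kM_hat = distorted_curvatures(k_min, k_max, theta, U, U_hat, a_e, b_e)
# Consistency: the distorted principal curvatures and the distorted normal
# curvatures along two orthogonal directions must have the same sum
print(km_hat + kM_hat, Zxx_hat + Zyy_hat)    # equal
```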


2 Such a parametric representation may not be possible for more complex global shapes.
3 Mathematica is a registered trademark of Wolfram Research, Inc.


3.1.2. Shape index

From the expression of the distorted principal curvatures,

the distorted shape index at the fixation point can readily be

obtained by substituting Eq. (10) into Eq. (3):

$$\hat{S} = \frac{2}{\pi}\arctan\left(\frac{(k_{min} + k_{max})U - 2\beta_e}{\hat{U}\sqrt{Q/\hat{U}^2}}\right) \qquad (13)$$

As the expression for Ŝ is very complex, we have assembled some diagrams to graphically illustrate the error in the shape index estimate with respect to the true shape index. The curvedness is fixed for each diagram so as to factor out its influence. Fig. 2 shows the distortion obtained by varying the principal directions while fixing all the other parameters. Fig. 2(a)–(c) show the sensitivity of shape recovery to the motion estimation errors for three principal directions: θ = 0, θ = π/4 and θ = π/2. Fig. 2(d) shows the average sensitivity over different principal directions. Note that the sudden jump of S_e in some of the curves corresponds to the case where the direction of the maximum curvature and that of the minimum curvature have swapped. It can also be seen that the principal direction does affect the robustness of the shape perception. We can also infer from Fig. 2 that different shapes have different distortion properties. The average curve shown in Fig. 2(d) gives us a rough idea of the overall robustness of the perception of different shapes: the saddle-like shapes (also known as 'hyperbolic shapes', with k_min and k_max having different signs) are more sensitive to the errors in 3D motion estimation than the concave and convex shapes ('elliptic shapes' or 'umbilic shapes', with k_min and k_max having the same sign).

Other factors that may have an impact on the sensitivity of shape perception are studied in Fig. 3. β_e has a significant influence on the shape perception. Comparing Fig. 3(a) with Fig. 2(a), we can see that the error in the estimated shape index increases with increasing value of |β_e|. The sign of β_e determines whether the shape index is under-estimated or over-estimated for each shape type (compare Fig. 3(a) and (b)). On the contrary, α_e seems to have little influence on the recovered shape index, as shown in Fig. 3(c). This anisotropy between α_e and β_e is related to the directional anisotropy that exists in the distortion of the normal curvatures, derived earlier in Eq. (12). This anisotropy is of course a result of the estimated translation being in the horizontal direction. Finally, our analysis seems to support the view that more curved surfaces will be perceived with higher accuracy (see Fig. 3(d)).
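The shape-index sweep behind Fig. 2 can be reproduced in outline by inverting Eqs. (3) and (4) to obtain principal curvatures at fixed curvedness and then applying Eqs. (10) and (3) (equivalently, Eq. (13)). A sketch; the inversion convention enforcing k_min < k_max and the curvedness value are our own choices, while the motion parameters are those quoted for Fig. 2:

```python
import numpy as np

def shape_index(k1, k2):                        # Eq. (3), with k1 < k2
    return (2/np.pi)*np.arctan((k1 + k2)/(k1 - k2))

def curvatures_from(S, C):
    """Invert Eqs. (3)-(4) into principal curvatures with k_min < k_max."""
    p = -2*C*np.sin(np.pi*S/2)                  # k_min + k_max
    q = -2*C*np.cos(np.pi*S/2)                  # k_min - k_max  (< 0)
    return (p + q)/2, (p - q)/2

def distorted_shape_index(k1, k2, theta, U, U_hat, a_e, b_e):
    """Shape index perceived at the fixation point, Eqs. (10) and (13)."""
    Q = (U**2*(k1 - k2)**2 + 4*(a_e**2 + b_e**2)
         - 4*U*(k1 - k2)*(b_e*np.cos(2*theta) + a_e*np.sin(2*theta)))
    k1h = 0.5*(((k1 + k2)*U - 2*b_e)/U_hat - np.sqrt(Q)/U_hat)
    k2h = 0.5*(((k1 + k2)*U - 2*b_e)/U_hat + np.sqrt(Q)/U_hat)
    return shape_index(k1h, k2h)

U, U_hat, a_e, b_e, theta, C = 0.2, 0.18, 0.005, -0.02, 0.0, 10.0
for S in np.linspace(-0.9, 0.9, 7):             # sweep the true shape index
    S_hat = distorted_shape_index(*curvatures_from(S, C),
                                  theta, U, U_hat, a_e, b_e)
    print(f"S = {S:+.2f}   S_e = {S - S_hat:+.4f}")
```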

3.2. Distortion under forward motion

In this section, we assume that pure forward translation is performed. According to Eq. (7), we have U = Û = V = V̂ = 0. We also assume γ = γ̂ = 0, as we did before.

Fig. 2. Distortion of shape index with different principal directions, plotted in terms of S_e against S: θ = 0 for (a), θ = π/4 for (b) and θ = π/2 for (c); (d) was obtained by averaging the curves with θ = nπ/8, where n = 0, 1, …, 8. All the other parameters are identical: α_e = 0.005, β_e = −0.02, U = 0.2 and Û = 0.18.


Adopting the 'epipolar reconstruction' approach, the distortion factor can be expressed as:

$$D = \frac{X^2 + Y^2}{X^2 + Y^2 + \alpha_e Z(X^2Y + YZ^2 + Y^3) - \beta_e Z(XY^2 + XZ^2 + X^3)} \qquad (14)$$

Following the same procedure for deriving the distorted curvatures as in the lateral motion case, we can obtain the distorted principal and normal curvature expressions at any point of the distorted surface. The distorted shape index can also be computed thereafter. These expressions are very complex and are thus not presented here. It is noted that at the fixation point, the distorted local shape measurements are undefined because the distortion factor itself is undefined there (see Eq. (14)). Our analysis in Ref. [6] shows that the distortion factor varies wildly around the fixation point, which implies that the distorted surface patches are also not smooth. Therefore, the distorted local shape measurements at any particular point do not make much sense.
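A quick numerical probe makes this concrete: Eq. (14) (as reconstructed above) is a 0/0 form at the fixation point, and on a small circle around it the factor swings in both magnitude and sign. The values below are illustrative, borrowing β_e = 0.001 from Fig. 4:

```python
import numpy as np

def D_forward(X, Y, Z, a_e, b_e):
    """Iso-distortion factor of Eq. (14) for forward motion."""
    num = X**2 + Y**2
    den = (X**2 + Y**2 + a_e*Z*(X**2*Y + Y*Z**2 + Y**3)
           - b_e*Z*(X*Y**2 + X*Z**2 + X**3))
    return num/den

Z, a_e, b_e = 4.0, 0.0, 0.001
# Sample D on a small circle of radius 0.05 around the fixation point
for phi in np.linspace(0, 2*np.pi, 8, endpoint=False):
    X, Y = 0.05*np.cos(phi), 0.05*np.sin(phi)
    print(f"phi = {phi:4.2f}   D = {D_forward(X, Y, Z, a_e, b_e):+9.2f}")
```

Even with this tiny rotation error, D ranges over positive and negative values around the fixation point, which is why no smooth quadric is recovered there.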

3.3. Graphical illustration

To get an idea of how curved surface patches will be distorted under different generic motions, we use Mathematica to obtain the distorted shapes and graphically display them. Fig. 4 compares the original and recovered surfaces under lateral and forward motions. We consider a vertical cylinder (upper row) and a horizontal cylinder (lower row), given by the equations Z = ½X² + 4 and Z = ½Y² + 4, respectively. For the lateral motion case, and given the parameters stated in the caption of Fig. 4, it is easy to show from Eqs. (10) and (12) that at the fixation point a vertical cylinder will be perceived as a less curved vertical cylinder, while a horizontal cylinder will be perceived as a saddle-like shape. The principal directions remain unchanged. Fig. 4(b) and (e) show the full reconstructed surfaces over the entire cylinders; clearly the distorted surface patches are smooth, and the distortion behaviour at the fixation point seems to be qualitatively representative of the global shape distortion even with a large view angle.

recovered surface patches under forward motion, by

contrary, are not smooth and show large distortion. Even

when the errors in the rotation estimates are much smaller

than those in the case of lateral motion, the perceived

surface patches are obviously not quadrics with major

distortion occurring around the central region of attention.
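The saddle prediction for the horizontal cylinder can be checked directly from Eqs. (8) and (9): map the surface through (X, Y, Z) → (DX, DY, DZ) and estimate the perceived normal curvature along X̂ at the fixation point by finite differences. A sketch using the Fig. 4 lateral-motion parameters (the step size is ours):

```python
import numpy as np

def D_lateral(X, Y, Z, U, U_hat, V, V_hat, a_e, b_e):
    """Iso-distortion factor of Eq. (8)."""
    return (U_hat**2 + V_hat**2)*Z / ((U*U_hat + V*V_hat)*Z
            + U_hat*b_e*(X**2 + Z**2) - V_hat*a_e*(Y**2 + Z**2)
            + (V_hat*b_e - U_hat*a_e)*X*Y)

# Horizontal cylinder Z = Y^2/2 + 4 with the Fig. 4 lateral-motion parameters
pars = dict(U=0.9, U_hat=1.0, V=0.5, V_hat=0.0, a_e=0.0, b_e=0.05)
h = 1e-3                                   # finite-difference step

X = np.array([-h, 0.0, h])                 # points along the line Y = 0
Z = np.full_like(X, 4.0)                   # the cylinder is flat along X
D = D_lateral(X, 0.0, Z, **pars)
Xh, Zh = D*X, D*Z                          # distorted surface points, Eq. (9)

Zxx_hat = (Zh[0] - 2*Zh[1] + Zh[2])/(Xh[2] - Xh[1])**2
print(Zxx_hat)   # ~ -0.1 = -2*b_e/U_hat: the flat direction acquires negative
                 # curvature, so the horizontal cylinder is seen as a saddle
```

The value agrees with Eq. (12): Ẑ_XX = (U/Û)·0 − 2β_e/Û = −0.1, while Ẑ_YY = 0.9 stays positive, hence the saddle-like percept.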

4. Experiments

In this section, we report computational experiments on computer generated and real images that further verify the theoretical predictions.

Fig. 3. Other factors that affect the distortion of shape index (see text). β_e = −0.03 for (a), β_e = 0.02 for (b) and β_e = −0.02 for (c) and (d); α_e = −0.05 for (c) and α_e = 0.005 for (a), (b) and (d); C = 20 for (d) and C = 10 for (a), (b) and (c). θ = 0, U = 0.20 and Û = 0.18 for all the diagrams.



4.1. Computer generated image sequences

SOFA⁴ is a package of nine computer generated image sequences designed for testing research work in motion analysis. It includes full ground truth on the motion and camera parameters. Sequences 1 and 5 (henceforth abbreviated as SOFA1 and SOFA5) were chosen for our experiments, the former depicting a lateral motion and the latter a forward motion. Both have an image dimension of 256 × 256 pixels, a focal length of 309 pixels and a field of view of approximately 45°. The focal length and principal point of the camera were fixed for the whole sequence. Optical flow was obtained using Lucas and Kanade's method [16], with a temporal window of 15 frames. Depth was recovered for frame 9. We assumed all the intrinsic parameters of the camera were estimated accurately throughout the experiments and concentrated on the impact of errors in the extrinsic parameters.

The 3D scene for SOFA1 consisted of a cube resting on a cylinder (Fig. 5(a)). Clearly, all the feature points on the top face of the cylinder in SOFA1 (delineated in Fig. 5(a)) lie on a plane. The reconstructed shape of this plane was used to test our theoretical predictions for the lateral motion case. The camera trajectory for SOFA1 was a circular route on a plane perpendicular to the world Y-axis, with constant translational parameters (U, V, W) = (0.8137, 0.5812, 0) and constant rotational parameters (α, β, γ) = (−0.0203, 0.0284, 0). For the reasons discussed in Section 2, we assume that current SFM algorithms can estimate W accurately (i.e. Ŵ = 0). Given the 3D motion estimates, depth is estimated as:

$$\hat{Z} = \frac{-f(\hat{U},\; \hat{V})\cdot\mathbf{n}}{(u,\; v)\cdot\mathbf{n} - (\hat{u}_{rot},\; \hat{v}_{rot})\cdot\mathbf{n}} \qquad (15)$$
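As a sketch of how Eq. (15) would be applied per feature point (the function and the sample flow values are ours; only the 309-pixel focal length comes from SOFA):

```python
import numpy as np

def recover_depth_lateral(x, y, u, v, U_hat, V_hat, a_hat, b_hat, f=309.0):
    """Scaled depth from Eq. (15) under lateral motion (W = W_hat = 0).
    (x, y): image location; (u, v): measured flow; gamma is taken as zero."""
    n = np.array([U_hat, V_hat])/np.hypot(U_hat, V_hat)
    u_rot = a_hat*x*y/f - b_hat*(x*x/f + f)       # rotational flow, Eq. (5)
    v_rot = a_hat*(y*y/f + f) - b_hat*x*y/f
    return -f*np.dot([U_hat, V_hat], n)/np.dot([u - u_rot, v - v_rot], n)

# e.g. with the motion estimates used for the reconstruction of Fig. 5(c)
print(recover_depth_lateral(10.0, -5.0, -0.9, 0.1,
                            U_hat=-1.0, V_hat=0.0, a_hat=-0.03, b_hat=0.01))
```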

Different SFM algorithms may give different errors in the estimated 3D motion parameters. Instead of being constrained by one specific SFM algorithm, let us consider all the possible configurations of the 3D motion estimation errors.⁵ The resultant recovered depths are illustrated in Fig. 5 using 3D plots viewed from the side.

Fig. 4. Distortion of cylinders under lateral and forward motion: (a) and (d) are the original vertical and horizontal cylinders, respectively; (b) and (e) are the distorted vertical and horizontal cylinders under lateral motion, respectively, with U = 0.9, Û = 1.0, V = 0.5 and V̂ = 0; (c) and (f) are the distorted vertical and horizontal cylinders under forward motion, respectively. The errors in the estimated rotation are: α_e = 0 and β_e = 0.05 for (b) and (e), and α_e = 0 and β_e = 0.001 for (c) and (f). The field of view is 127° for all the diagrams.

4 Courtesy of the Computer Vision Group, Heriot-Watt University (http://www.cee.hw.ac.uk/~mtc/sofa).
5 For our experiments, if the subspace algorithm proposed by Heeger and Jepson [12] was used, we obtained the reconstruction shown in Fig. 5(c); if the algorithm proposed by Ma et al. [17] was adopted, we obtained the reconstruction shown in Fig. 5(d).


Fig. 5. Computer generated lateral motion sequence and shapes recovered when different 3D motion estimates were used for reconstruction. (a) SOFA1 frame 9 with the top face of the cylinder delineated; (b) reconstruction with true motion parameters; (c) reconstruction with (Û, V̂) = (−1, 0) and (α̂, β̂, γ̂) = (−0.03, 0.01, 0); (d) reconstruction with (Û, V̂) = (−1, 0) and (α̂, β̂, γ̂) = (−0.01, 0.04, 0); (e) reconstruction with (Û, V̂) = (−2, 0) and (α̂, β̂, γ̂) = (−0.01, 0.04, 0); (f) reconstruction with (Û, V̂) = (−1, 0) and (α̂, β̂, γ̂) = (−0.01, 0.06, 0).

Fig. 6. Computer generated forward motion sequence and shapes recovered when different 3D motion estimates were used for reconstruction. (a) SOFA5 frame 9; (b) reconstruction with true motion parameters; (c) reconstruction with (α̂, β̂, γ̂) = (−0.001, −0.001, 0); (d) reconstruction with (α̂, β̂, γ̂) = (0.001, 0.001, 0); (e) reconstruction with (α̂, β̂, γ̂) = (−0.001, 0.001, 0); (f) reconstruction with (α̂, β̂, γ̂) = (0.001, −0.001, 0).


In particular, Fig. 5(b) shows that, with the motion parameters accurately estimated, the plane (top of the cylinder) remained a plane, although the noise in the optical flow estimates made some points 'run away' from the plane. Fig. 5(c) depicts the shape recovered when β_e > 0 and Û < 0. According to Eq. (12), we then have Ẑ_XX > 0. It can be seen from Fig. 5(c) that the plane was indeed reconstructed as a convex surface. Conversely, when β_e < 0 and Û < 0, concave surfaces were perceived (Fig. 5(d)–(f)). Comparing Fig. 5(d) with Fig. 5(e), we find that a larger |Û| in general resulted in a smaller curvature distortion, whereas comparing Fig. 5(d) with Fig. 5(f) shows that a larger |β_e| resulted in a larger curvature distortion. Fig. 5(c)–(f) show that the reconstructed surfaces were not curved in the Y direction. All these results conform to our theoretical predictions.

The SOFA5 sequence was used in the next set of experiments to verify the predictions for the case of forward motion. The 3D scene for SOFA5 comprised a pile of four cylinders stacked upon each other in front of a fronto-parallel background (Fig. 6(a)). The camera trajectory for SOFA5 was parallel to the world Z-axis, and the corresponding translational and rotational parameters were (U, V, W) = (0, 0, 1) and (α, β, γ) = (0, 0, 0), respectively. Current SFM algorithms have no difficulty in estimating the translation accurately for the SOFA5 sequence [23]. Therefore, the equation for reconstructing the depth of each feature point would be:

$$\hat{Z} = \frac{x^2 + y^2}{(u,\; v)\cdot(x,\; y) - (\hat{u}_{rot},\; \hat{v}_{rot})\cdot(x,\; y)}$$

The resultant recovered depths, given different typical errors in the 3D motion estimates, are shown in Fig. 6. Fig. 6(b) depicts the case of no errors in the motion parameters. It can be seen that the background plane was roughly preserved. However, when there was a small amount of error in the rotation estimates, a complicated curved surface with significant distortion was reconstructed (shown in Fig. 6(c)). This is in accordance with our prediction that large depth distortion is expected when forward motion is performed.

Fig. 7. Real lateral motion sequence and shapes recovered when different 3D motion estimates were used for reconstruction. (a) BASKET frame 9; (b) reconstruction with (Û, V̂) = (−1, 0) and (α̂, β̂, γ̂) = (0.05, 0.05, 0); (c) reconstruction with (Û, V̂) = (−1, 0) and (α̂, β̂, γ̂) = (0.05, 0.1, 0); (d) reconstruction with (Û, V̂) = (−0.5, 0) and (α̂, β̂, γ̂) = (0.05, 0.05, 0).


4.2. Real image sequence

Shape recovery was also performed on a real image sequence, BASKET.⁶ It was taken by a stationary video camera viewing an upturned basket being rotated on an office chair (Fig. 7(a)). The true intrinsic and extrinsic parameters are not available. However, we know that the basket was rotating about a vertical axis which was approximately parallel to the Y-axis of the camera co-ordinate system and located along the Z-axis of the camera co-ordinate system. The equivalent egomotion can thus be expressed as (U, V, W) = (β₀Z₀, 0, 0) and (α, β, γ) = (0, −β₀, 0), where β₀ is the rotational velocity of the basket (β₀ > 0 in this case) and Z₀ the distance between the optical center of the camera and the centroid of the basket. Depth was recovered for frame 9 using Eq. (15). Notice that since the camera was not calibrated, we arbitrarily set the value of the focal length to f = 300. Fortunately, we have proven in Ref. [6] that under lateral motion the errors in the intrinsic parameters do not affect the depth recovery results qualitatively. The recovered shapes, given different errors in the 3D motion estimates, are shown in Fig. 7. Fig. 7(b) depicts the shape recovered when Û < 0 and β_e < 0 (β_e < 0 since β < 0 and β̂ > 0). Substituting the signs of these terms into Eq. (12), we obtain Ẑ_XX < Z_XX, which means that the recovered surface should be less convex than the true shape. Comparing the result of Fig. 7(b) with the true shape, which we measured manually, this is indeed the case. Furthermore, according to Eq. (12), a smaller β_e or a larger Û (with the signs of these terms unchanged) would result in an even less convex shape being recovered. Fig. 7(c) and (d) clearly support this prediction.

6 Courtesy of Dr Andrew Calway of the Department of Computer Science, University of Bristol.
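The equivalent egomotion quoted above is easy to verify, assuming the rigid-motion convention underlying Eq. (5) (scene-point velocity −T − Ω × P for camera translation T and rotation Ω); the numerical values are illustrative:

```python
import numpy as np

b0, Z0 = 0.1, 20.0                               # spin rate and axis distance
omega = np.array([0.0, b0, 0.0])                 # basket spin (vertical axis)
axis_pt = np.array([0.0, 0.0, Z0])               # the axis passes through here

P = np.array([1.3, -0.7, 22.0])                  # an arbitrary scene point
v_object = np.cross(omega, P - axis_pt)          # velocity on the rotating basket

T = np.array([b0*Z0, 0.0, 0.0])                  # claimed equivalent egomotion
Omega = np.array([0.0, -b0, 0.0])
v_egomotion = -T - np.cross(Omega, P)

print(np.allclose(v_object, v_egomotion))        # True
```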

5. Conclusions

This paper presented a geometric investigation of the reliability of shape recovery from the motion cue given errors in the estimates of the motion parameters. Distortion in the recovered local shape was considered under different generic motions. Specifically, our distortion model has shown that the types of motions executed are critical for the accuracy of the recovered shape. In the case of lateral motion, the accuracy of curvature estimates exhibits anisotropy with respect to the estimated translation direction, in that the surface curvature in the estimated translational direction is better perceived than that in the orthogonal direction. Correlatively, the recognition of different shapes (classified in terms of shape index) exhibits different degrees of sensitivity to noise. In the case of forward motion, the reconstructed shape experiences larger distortion compared with the case of lateral motion and is no longer visually smooth.

These findings suggest that shape estimates are recovered with varying degrees of uncertainty depending on the motion-scene configuration. When designing a visual system to perform shape recovery, these findings can help us choose the best motion strategy. In the meantime, if the confidence level of the shape estimates under various motion-scene configurations can be ascertained, the robustness of shape perception can be enhanced by including additional static cues to restrict the space of possible interpretations. The work presented in this paper has taken a step in this direction. More work needs to be done to understand the robustness of shape recovery under more general configurations. The investigation should also be extended to other cues. Practical and robust fusion algorithms will not be possible until a full understanding of the robustness of the shape recovered from the various cues is achieved.

References

[1] G. Adiv, Inherent ambiguities in recovering 3-D motion and structure from a noisy flow field, IEEE Trans. PAMI 11 (5) (1989) 477–489.
[2] J. Aloimonos, I. Weiss, A. Bandopadhay, Active vision, Int. J. Comput. Vision 2 (1988) 333–356.
[3] H.H. Bulthoff, H.A. Mallot, Integration of depth modules: stereo and shading, J. Opt. Soc. Am. A 5 (1988) 1749–1758.
[4] P. Cavanagh, Reconstructing the third dimension: interactions between color, texture, motion, binocular disparity, and shape, CVGIP 37 (2) (1987) 171–195.
[5] L.-F. Cheong, K. Ng, Geometry of distorted visual space and Cremona transformation, Int. J. Comput. Vision 32 (2) (1999) 195–212.
[6] L.-F. Cheong, T. Xiang, Characterizing depth distortion under different generic motions, Int. J. Comput. Vision 44 (3) (2001) 199–217.
[7] J.E. Cutting, P.M. Vishton, Perceiving layout and knowing distances: the integration, relative potency, and contextual use of different information about depth, in: W. Epstein, S. Rogers (Eds.), Perception of Space and Motion, Academic Press, San Diego, 1995.
[8] A. Davies, P. Samuels, An Introduction to Computational Geometry for Curves and Surfaces, Clarendon Press, Oxford, 1996.
[9] O.D. Faugeras, Stratification of three-dimensional vision: projective, affine, and metric representations, J. Opt. Soc. Am. A 12 (1995) 465–484.
[10] P.V. Fua, Y.G. Leclerc, Object-centered surface reconstruction: combining multi-image stereo and shading, Int. J. Comput. Vision 16 (1) (1995) 35–56.
[11] R.A. Jacobs, Optimal integration of texture and motion cues to depth, Vision Res. 39 (1999) 3621–3629.
[12] D.J. Heeger, A.D. Jepson, Subspace methods for recovering rigid motion I: algorithm and implementation, Int. J. Comput. Vision 7 (1992) 95–117.
[13] J.J. Koenderink, A.J. van Doorn, Affine structure from motion, J. Opt. Soc. Am. A 8 (2) (1991) 377–385.
[14] J.J. Koenderink, A.J. van Doorn, Surface shape and curvature scales, Image Vision Comput. 10 (8) (1992) 557–564.
[15] H.C. Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature 293 (1981) 133–135.
[16] B.D. Lucas, Generalized Image Matching by the Method of Differences, PhD Dissertation, Carnegie-Mellon University, 1984.
[17] Y. Ma, J. Kosecka, S. Sastry, Linear differential algorithm for motion recovery: a geometric approach, Int. J. Comput. Vision 36 (1) (2000) 71–89.
[18] J. Oliensis, A critique of structure-from-motion algorithms, Comput. Vision Image Understand. 80 (2000) 172–214.
[19] P.R. Schrater, D. Kersten, How optimal depth cue integration depends on the task, Int. J. Comput. Vision 40 (1) (2000) 71–89.
[20] J.I. Thomas, A. Hanson, J. Oliensis, Understanding noise: the critical role of motion error in scene reconstruction, in: Proceedings of the DARPA Image Understanding Workshop, 1993, pp. 691–695.
[21] J.S. Tittle, J.F. Norman, V.J. Perotti, F. Phillips, The perception of scale-dependent and scale-independent surface structure from binocular disparity, texture, and shading, Perception 26 (1997) 147–166.
[22] J. Weng, T.S. Huang, N. Ahuja, Motion and Structure from Image Sequences, Springer, Berlin, 1991.
[23] T. Xiang, L.-F. Cheong, Understanding the behavior of structure from motion algorithms: a geometric approach, Int. J. Comput. Vision 51 (2) (2003) 111–137.
[24] G.S. Young, R. Chellappa, Statistical analysis of inherent ambiguities in recovering 3-D motion from a noisy flow field, IEEE Trans. PAMI 14 (1992) 995–1013.
[25] M.J. Young, M.S. Landy, L.T. Maloney, A perturbation analysis of depth perception from combinations of texture and motion cues, Vision Res. 33 (1993) 2685–2696.
