A ROBUST ALGORITHM FOR FUSING NOISY DEPTH ESTIMATES USING STOCHASTIC APPROXIMATION

Amit K. Roy Chowdhury, R. Chellappa

Center for Automation Research
Department of Electrical and Computer Engineering
University of Maryland, College Park
MD 20742, USA

{amitrc,chella}@cfar.umd.edu

ABSTRACT

The problem of structure from motion (SFM) is to extract the three-dimensional model of a moving scene from a sequence of images. Most algorithms that work by fusing two-frame depth estimates (observations) assume an underlying statistical model for the observations and do not evaluate the quality of the individual observations. However, in real scenarios, it is often difficult to justify the statistical assumptions. Also, outliers are present in any observation sequence and need to be identified and removed from the fusion algorithm. In this paper, we present a recursive fusion algorithm using Robbins-Monro stochastic approximation (RMSA) which takes care of both these problems to provide an estimate of the real depth of the scene point. The estimate converges to the true value asymptotically. We also propose a method to evaluate the importance of the successive observations by computing the Fisher information (FI) recursively. Though we apply our algorithm to the SFM problem by modeling human faces, it can easily be adapted to other data fusion applications.

1. INTRODUCTION

The problem of structure from motion (SFM) is to extract the three-dimensional model of a moving scene from a sequence of images. Traditional SFM algorithms [1], [2] recover a 3D scene structure from two images. However, these algorithms often produce inaccurate reconstructions of the scene, mainly due to incorrect estimation of camera motion. Recently, techniques have been developed that use multiple images for scene reconstruction, achieving greater robustness and accuracy by fusing the two-frame estimates [3], [4], [5], [6].

Prepared through collaborative participation in the Advanced Sensors Consortium (ASC) sponsored by the U.S. Army Research Laboratory under the Federated Laboratory Program, Cooperative Agreement DAAL01-96-2-0001.

This paper describes a new data fusion algorithm applied to multi-frame structure from motion (MFSFM) using stochastic approximation (SA). The method can easily be extended to other data fusion applications. We assume that the two-frame estimates (observations) are available from a suitable two-frame SFM algorithm (footnote 1). The correspondence problem is not addressed in this paper.

We propose a recursive strategy for estimating the unknown true depth given the observations. Our observation model assumes additive noise on the true value but does not assume any particular distribution of the noise process. We also take care of eliminating outliers in the observation sequence by choosing an appropriate cost function. Our algorithm uses the RMSA technique. We show that the estimates converge to the true value asymptotically. We also propose a method for evaluating the importance of the successive observations (i.e. the number of frames to consider) by evaluating the Fisher information (FI) recursively. The results of our algorithm are demonstrated by applying it to image sequences of a human face.

2. OVERVIEW OF THE ALGORITHM

2.1. The Observation Model

Let $\mathbf{s}^{i}$ represent the depth value obtained by the two-frame SFM algorithm from the $i$-th and $(i+1)$-th frames and $s^{i}(j)$ the $j$-th position in that vector. We assume a linear observation model

$$\mathbf{s}^{i} = H\mathbf{s} + \mathbf{n}^{i}, \quad i = 1, \dots, N, \qquad (1)$$

where $\mathbf{n}^{i}$ is a noise process with unknown distribution. $H$ is assumed diagonal [4], implying that the depth at every point is treated independently. Thus, we can write, for every $j$,

$$y_i = u + n_i, \quad i = 1, \dots, N, \qquad (2)$$

where $y_i$ denotes the $i$-th observation of the depth at that point and $u$ the unknown true depth.

1 The particular two-frame algorithm chosen here was the one described in [2] because of its speed of computation, but our fusion algorithm is not bound by this particular method.

Fig. 1. The top six figures plot the estimates of the first six moments of the observation vector and the bottom four figures plot the first four cumulants. The horizontal axis represents the pixel number. The first column represents the odd central moments/cumulants and the second column the even ones.

If $u$ and $n_i$ are Gaussian random variables, the observation vector $\mathbf{y} = [y_1, \dots, y_N]$, conditioned on $u$, is also Gaussian. For Gaussian random variables, all odd central moments are identically zero (from symmetry) and all cumulants of order greater than two are zero. However, as seen from Fig. 1, the observations need not follow a normal distribution.
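The Gaussianity check described above can be illustrated with a small sketch. It computes sample central moments and the third and fourth cumulants, which should both be close to zero for Gaussian data. The function names and the depth values are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
def central_moment(xs, order):
    """Sample central moment of the given order."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)


def third_and_fourth_cumulants(xs):
    """Third and fourth cumulants written in terms of central moments.

    kappa_3 = mu_3 and kappa_4 = mu_4 - 3 * mu_2**2; both vanish for a
    Gaussian distribution.
    """
    m2 = central_moment(xs, 2)
    m3 = central_moment(xs, 3)
    m4 = central_moment(xs, 4)
    return m3, m4 - 3.0 * m2 ** 2


# Hypothetical depth values for one pixel across frames, with one outlier.
depths = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 60.0]
k3, k4 = third_and_fourth_cumulants(depths)
print("third cumulant:", k3)
print("fourth cumulant:", k4)
```

Cumulants far from zero, as produced here by the single outlier, indicate that a Gaussian model is a poor fit for the sequence.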

2.2. The Cost Function

The form of our observation model suggests solution by the classical linear regression model, whereby we try to minimize the mean square error $\sum_{i=1}^{N} (y_i - u)^2$, which yields the optimal $u$, denoted by $u^* = \frac{1}{N}\sum_{i=1}^{N} y_i$. However, sample means are sensitive to influential values. An analysis of the depth values across frames for any pixel shows that there are some values which can be characterized as outliers and should not be considered while fusing the estimates (see Fig. 2).

Fig. 2. A plot of the depth values across 50 frames for four randomly chosen points, panels (a) to (d). It can be seen that there are isolated outliers in all four cases.

It is well known that the median is less sensitive to outlying data points than the mean. The mean of a sample of size $N$ has breakdown $1/N$, since by changing just one data value we can force the sample mean to take any value whatsoever. The sample median has breakdown 50% (the best possible value), reflecting the fact that it is less sensitive to individual values. The least squares regression estimator inherits the sensitivity of the mean and has breakdown $1/N$, whereas the least median of squares (LMS) estimator has breakdown of roughly 50%. Thus the cost function we try to optimize is

$$u^* = \arg\min_{u} \operatorname{median}_{i}\, (y_i - u)^2. \qquad (3)$$

The disadvantage of this method is that we no longer have a closed-form solution for the optimal estimate, as we had for the mean-square error criterion.
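To make the contrast between the two criteria concrete, the following sketch compares the sample mean with a least median of squares estimate found by exhaustive search over a grid of candidate depths. The grid, the function name and the observation values are assumptions made for this illustration; the paper itself solves the problem recursively rather than by sorting.

```python
import statistics


def least_median_of_squares(observations, candidates):
    """Return the candidate depth with the smallest median squared residual."""
    def cost(u):
        return statistics.median([(y - u) ** 2 for y in observations])
    return min(candidates, key=cost)


# Hypothetical depth observations for one point; 250.0 is an outlier.
observations = [10.0, 10.2, 9.9, 10.1, 250.0, 10.0, 9.8]
candidates = [i / 10 for i in range(301)]  # assumed search grid, 0.0 to 30.0

mean_estimate = sum(observations) / len(observations)
lms_estimate = least_median_of_squares(observations, candidates)
print("sample mean:", mean_estimate)
print("LMS estimate:", lms_estimate)
```

A single outlier drags the sample mean far from the bulk of the data, while the LMS estimate stays with the majority of the observations, at the cost of a search instead of a closed-form solution.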

3. A RECURSIVE ALGORITHM USING STOCHASTIC APPROXIMATION

3.1. The Robbins-Monro Algorithm

The Robbins-Monro stochastic approximation algorithm is a stochastic search technique for finding the root $\theta^*$ of $g(\theta) = 0$ based on noisy measurements of $g(\theta)$, i.e. $Y_k(\theta) = g(\theta) + e_k(\theta)$, $k = 1, \dots, N$, where $e_k(\theta)$ is assumed to be the noise term and $N$ is the number of observations. The RMSA algorithm obtains the estimate by the following recursion,

$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k Y_k(\hat{\theta}_k), \qquad (4)$$

where $a_k$ is an appropriately chosen sequence. Details of the algorithm can be found in [7], [8]. We will outline the method for obtaining the solution for our specific problem. Suppose that $F_X(x)$ is the unknown distribution of a sequence of observations $X_0, X_1, \dots$ and we are interested in finding the root of the equation $g(\theta) = F_X(\theta) - 0.5 = 0$, i.e. the median of the distribution. For this problem, the Robbins-Monro (RM) recursion is as follows [8]:

$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \left( p_k(\hat{\theta}_k) - 0.5 \right), \qquad (5)$$

where

$$p_k(\hat{\theta}_k) = \begin{cases} 1 & \text{if } X_k \le \hat{\theta}_k, \\ 0 & \text{otherwise.} \end{cases} \qquad (6)$$
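A minimal sketch of the median recursion (5)-(6) follows. The gain constants, the starting value and the synthetic Gaussian samples are assumptions for this illustration only; the recursion itself never uses the form of the distribution.

```python
import random


def rm_median(samples, a=2.0, alpha=0.75, theta0=0.0):
    """Robbins-Monro estimate of the median of a stream of samples.

    Uses the assumed gain a_k = a / (k + 1)**alpha and the indicator
    measurement p_k = 1 if X_k <= theta_k else 0.
    """
    theta = theta0
    for k, x in enumerate(samples):
        gain = a / (k + 1) ** alpha
        indicator = 1.0 if x <= theta else 0.0
        theta -= gain * (indicator - 0.5)
    return theta


# Synthetic data with true median 5.0 (hypothetical, for demonstration).
rng = random.Random(0)
samples = [rng.gauss(5.0, 1.0) for _ in range(20000)]
estimate = rm_median(samples)
print("RM median estimate:", estimate)
```

Each update needs only the current estimate and one new observation, so no sorting or storage of the sequence is required.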


The choice of the gain sequence $a_k$ is determined by the convergence properties of the algorithm [9], [8] (footnote 2).

The sequence under consideration in our case is $X_i(u) = (y_i - u)^2$. The minimization is carried out over a predetermined search set $\mathcal{U}$ and the number of frames is determined by analyzing the Fisher information of the observations. Since the depth observations $\{y_i\}$ are the result of a two-frame SFM algorithm, they are corrupted by noise whose distribution is unknown in general. However, since RMSA solves for the estimate in the situation where the distribution of the observations is unknown, it is robust enough to deal with our particular problem.
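The overall fusion step for a single point can be sketched as follows: for every candidate depth in the search set, the median of the squared residuals is estimated with the Robbins-Monro recursion, and the candidate with the smallest estimated median is selected. This is a sketch under stated assumptions rather than the authors' implementation: the gain constants, the initialization from the first residual, the repeated cycling over a short observation sequence, the search grid and the synthetic observations are all hypothetical.

```python
def rm_median(values, a=1.0, alpha=0.75, passes=20):
    """Robbins-Monro median estimate of a finite sequence.

    Cycling several times over the data is an assumption of this sketch,
    made because the sequence here is short.
    """
    theta = values[0]
    k = 0
    for _ in range(passes):
        for x in values:
            gain = a / (k + 1) ** alpha
            theta -= gain * ((1.0 if x <= theta else 0.0) - 0.5)
            k += 1
    return theta


def fuse_depth(observations, search_set):
    """Pick the candidate whose squared residuals have the smallest median."""
    def cost(u):
        return rm_median([(y - u) ** 2 for y in observations])
    return min(search_set, key=cost)


# Hypothetical two-frame depth estimates around 10.0 with three outliers.
observations = [10.0 + 0.1 * ((7 * i) % 11 - 5) for i in range(50)]
for idx in (7, 23, 41):
    observations[idx] = 200.0

search_set = [0.5 * j for j in range(41)]  # assumed grid, 0.0 to 20.0
fused = fuse_depth(observations, search_set)
sample_mean = sum(observations) / len(observations)
print("fused depth:", fused)
print("sample mean:", sample_mean)
```

The sample mean is shown only for comparison; it is pulled toward the outliers, whereas the fused value is governed by the majority of the observations.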

3.2. Convergence Properties of the Algorithm

Given the sequence of observations $\{X_i(u)\}$, we wish to obtain the median of the sequence using RMSA for the particular value of the parameter $u$. The estimate of the median obtained by the RMSA recursion is strongly consistent. Also, the error in the estimate converges in distribution to a normal with zero mean and suitable covariance matrix.

The median is the solution of the equation $g(u) = F_X(u) - 0.5$. We choose the gain sequence $a_k = a/(k+1)^{\alpha}$, $a > 0$, such that $\sum_{k=0}^{\infty} a_k = \infty$ and $\sum_{k=0}^{\infty} a_k^2 < \infty$ (e.g. $1/2 < \alpha \le 1$). Also, from (4) and (5), $Y_k(\hat{u}_k) = p_k(\hat{u}_k) - 0.5$. Then

$$\begin{aligned}
E[Y_k(\hat{u}_k) \mid \hat{u}_k] &= E[p_k(\hat{u}_k) \mid \hat{u}_k] - 0.5 \\
&= E[I(X_k \le \hat{u}_k)] - 0.5 \\
&= P(X_k \le \hat{u}_k) - 0.5 \\
&= F_X(\hat{u}_k) - 0.5 = g(\hat{u}_k),
\end{aligned}$$

where $I$ is the indicator function and $E$ represents the expectation operator. From the above we see that $e_k$ is zero mean. With these conditions, along with a bound on the variance of the noise, it can be shown that $\hat{u}_k \to u^*$ almost surely as $k \to \infty$ [8]. Thus the estimate is strongly consistent. Also, it can be shown that under proper regularity conditions,

$$k^{\alpha/2} (\hat{u}_k - u^*) \to \mathcal{N}(0, \Sigma)$$

in distribution, as $k \to \infty$ [8], where $\Sigma$ is a suitable covariance matrix, which depends on the Jacobian of $g(u)$ and on $a_k$. This implies that $(\hat{u}_k - u^*)$ decays at the rate of $k^{-\alpha/2}$ and the rate of convergence of $\hat{u}$ to $u$ is maximized at $\alpha = 1$.
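As a short worked check, writing the gain as $a_k = a/(k+1)^{\alpha}$ with $a > 0$, both gain conditions reduce to standard p-series tests, which is why exponents in the range $1/2 < \alpha \le 1$ are admissible:

```latex
\sum_{k=0}^{\infty} a_k = a \sum_{m=1}^{\infty} m^{-\alpha} = \infty
\iff \alpha \le 1,
\qquad
\sum_{k=0}^{\infty} a_k^2 = a^2 \sum_{m=1}^{\infty} m^{-2\alpha} < \infty
\iff \alpha > \tfrac{1}{2}.
```

At the upper end of the range, $\alpha = 1$, the error scale $k^{-\alpha/2}$ becomes $k^{-1/2}$, the fastest decay the range allows.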

3.3. Estimating the Fisher Information

We evaluate the importance of the consecutive observations by estimating the Fisher information [10]. Given the observations denoted by $Z$, the Fisher information matrix is

$$F(\theta) = E_{\theta}\left[ \nabla_{\theta} \ln p(Z \mid \theta)\, \nabla_{\theta} \ln p(Z \mid \theta)^{T} \right] \qquad (7)$$

2We used the commonly chosen gain sequence � �F�����  S¡ ¢¤£�¥� S¦§ ¨©=ª .

where $\theta$ is the parameter to be estimated given the observations (footnote 3). We estimate the Fisher information using simultaneous perturbation [8] for the gradient approximation and averaging for the expectation operation [11]. For the observation model $Y = \theta + X$, $X \sim p_X(x)$, where $X$ is a random variable with density $p_X$, we can write

$$\frac{\partial}{\partial \theta} \log p_Y(y) = \frac{\partial}{\partial \theta} \log p_X(y - \theta) = \frac{\partial}{\partial z} \log p_X(z)\, \frac{\partial z}{\partial \theta} \bigg|_{z = y - \theta} = -\frac{1}{p_X(z)} \frac{\partial p_X(z)}{\partial z}.$$

The estimate of the gradient of $y(\theta)$ with respect to $\theta \in \mathbb{R}^{p}$ is

$$\hat{g}(\theta) = \frac{y(\theta + \Delta) - y(\theta - \Delta)}{2} \begin{bmatrix} \Delta_1^{-1} \\ \vdots \\ \Delta_p^{-1} \end{bmatrix} \qquad (8)$$

where $\Delta = (\Delta_1, \dots, \Delta_p)$ and the components of $\Delta$ are independent Bernoulli random variables. The steps in computing the Fisher information are:

Step 1 Given $\hat{\theta}_k$, generate a set of $k$ pseudo measurements according to the empirical distribution of the observations. Denote these by $x_{\text{pseudo}}(k)$. Calculate the gradient according to (8). It may be necessary to average several gradient estimates with independent values of $\Delta$. Compute the term within the expectation operator in the definition of Fisher information (7).

Step 2 Repeat Step 1 a large number of times, say $M$. Average the estimates obtained. This is the estimate of the Fisher information, $\hat{F}_k(\hat{\theta}_k)$.

We can evaluate the relative importance of the observations by looking at the increase in the Fisher information (see Fig. 3).
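The two steps above can be sketched for a scalar parameter as follows. This is an illustration under explicit assumptions, not the authors' code: a Gaussian log-likelihood is assumed only so that there is a concrete function to differentiate, the perturbation size `c` follows the usual simultaneous perturbation form, and the observations are synthetic. All function and variable names are hypothetical.

```python
import random
import statistics


def log_likelihood(theta, data, sigma):
    """Gaussian log-likelihood up to a constant (an assumption of this sketch)."""
    return -sum((z - theta) ** 2 for z in data) / (2.0 * sigma ** 2)


def sp_gradient(theta, data, sigma, c, rng):
    """Two-sided simultaneous perturbation gradient estimate, scalar case."""
    delta = rng.choice((-1.0, 1.0))  # symmetric Bernoulli perturbation
    diff = (log_likelihood(theta + c * delta, data, sigma)
            - log_likelihood(theta - c * delta, data, sigma))
    return diff / (2.0 * c * delta)


def fisher_information(observations, theta, sigma, reps=2000, c=0.01, seed=0):
    """Average the squared gradient over resampled pseudo measurements."""
    rng = random.Random(seed)
    n = len(observations)
    total = 0.0
    for _ in range(reps):
        # Step 1: pseudo measurements drawn from the empirical distribution.
        pseudo = [rng.choice(observations) for _ in range(n)]
        g = sp_gradient(theta, pseudo, sigma, c, rng)
        total += g * g
    # Step 2: average over the repetitions.
    return total / reps


# Synthetic observations (hypothetical), 30 frames.
data_rng = random.Random(1)
observations = [data_rng.gauss(10.0, 2.0) for _ in range(30)]
theta_hat = statistics.mean(observations)
sigma_hat = statistics.pstdev(observations)

fi = fisher_information(observations, theta_hat, sigma_hat)
reference = len(observations) / sigma_hat ** 2
print("estimated FI:", fi)
print("n / sigma^2 :", reference)
```

Under the assumed Gaussian model the estimate should be close to $n/\sigma^2$, which grows with the number of frames; tracking how quickly the estimate grows as frames are added is one way to judge whether further observations are worth fusing.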

4. RESULTS AND ANALYSIS

We applied our algorithm to 3D modeling of human faces from 2D images. Given a sequence of images, we used the two-frame algorithm described in [2] to obtain the depth map. In this method, a fast partial search is used to compute the motion and structure. The least squares error of the system is computed using Fourier techniques and the focus of expansion is estimated in $O(n^2 \log n)$ operations for an $n \times n$ flow field. The two-frame depths were then fused by the method described above. A 3D model was created by interpolating the values at the pixels at which the depth was not obtained. From this model, we synthesized views which are not part of the original image sequence (Fig. 4). To illustrate the point that fusion improves upon the individual observations, we plot the two-frame and fused depth maps in Fig. 5.

3 $E_{\theta}$ represents expectation with respect to $\theta$ and $\nabla_{\theta}$ represents the gradient with respect to $\theta$.

Fig. 3. The figure shows the variation of the Fisher information (FI) over increasing frames.

Fig. 4. The first two columns show the first and last frames used to compute the depth. The last two columns represent views from camera positions not part of the original sequence.

5. CONCLUSION

In this paper we have presented a recursive algorithm for fusing the two-frame depth estimates over multiple frames in the presence of noisy observations where we do not know the distribution of the noise process. We have also shown how to deal with outliers present in the observation sequence. We have demonstrated the optimality of the method by showing that the estimate is strongly consistent and asymptotically normal. We have also presented a method for computing the Fisher information using a stochastic gradient and applied it to compute the number of frames to consider for the fusion algorithm. The work was applied to the modeling of human faces and the results have been presented.

Fig. 5. The first two columns show the first and last frames used to compute the depth. The third column shows the depth map from two frames and the last figure represents the fused depth map.

6. REFERENCES

[1] J. Oliensis, “A critique of structure from motion algorithms,” NECI TR, 1997.

[2] S. Shridhar, “Extracting structure from optical flow using fast error search technique,” CfAR Technical Report, University of Maryland, CAR-TR-893, 1998.

[3] T.J. Broida and R. Chellappa, “Estimating the kinematics and structure of a rigid object from a sequence of monocular images,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 13(6), pp. 497–513, 1991.

[4] L. Matthies, T. Kanade, and R. Szeliski, “Kalman filter-based algorithms for estimating depth from image sequences,” International Journal of Computer Vision, vol. 3, pp. 209–236, 1989.

[5] R. Szeliski and S.B. Kang, “Recovering 3D shape and motion from image streams using non-linear least squares,” Journal of Visual Communication and Image Representation, vol. 5(1), pp. 10–28, 1994.

[6] J. Inigo Thomas and J. Oliensis, “Dealing with noise in multiframe structure from motion,” Computer Vision and Image Understanding, vol. 76(2), pp. 109–124, 1999.

[7] H. Robbins and S. Monro, “A stochastic approximation method,” Annals of Mathematical Statistics, vol. 22, pp. 400–407, 1951.

[8] J.C. Spall, Introduction to Stochastic Search and Optimization, Wiley, 2000.

[9] A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer-Verlag, 1987.

[10] J.A. O’Sullivan, R.E. Blahut, and D.L. Snyder, “Information theoretic image formation,” IEEE Trans. on Information Theory, vol. 44(6), 1998.

[11] J.C. Spall, “Resampling-based calculation of the information matrix for general identification problems,” in Proc. of the American Control Conf., PA, 1998.