
Fast Algorithms for Phase Diversity-Based Blind Deconvolution

Curtis R. Vogel (a), Tony Chan (b), and Robert Plemmons (c)

(a) Department of Mathematical Sciences, Montana State University, Bozeman, MT 59717-0240 USA
(b) Department of Mathematics, UCLA, Los Angeles, CA 90095-1555 USA
(c) Department of Mathematics and Computer Sciences, Wake Forest University, Winston-Salem, NC 27109 USA

ABSTRACT

Phase diversity is a technique for obtaining estimates of both the object and the phase by exploiting the simultaneous collection of two (or more) short-exposure optical images, one of which has been formed by further blurring the conventional image in some known fashion. This paper concerns a fast computational algorithm, based upon a regularized variant of the Gauss-Newton optimization method, for phase diversity-based estimation when a Gaussian likelihood fit-to-data criterion is applied. Simulation studies are provided to demonstrate that the method is remarkably robust and numerically efficient.

Keywords: phase diversity, blind deconvolution, phase retrieval, quasi-Newton methods

1. INTRODUCTION

Phase diversity-based blind deconvolution is a technique for obtaining estimates of both the object and the phase in optical image deblurring. It involves the simultaneous collection of two (or more) short-exposure images. One of these is the object that has been blurred by unknown aberrations; the other is collected in a separate channel by blurring the first image by a known amount, e.g., using a beam splitter and an out-of-focus lens. Using these two images as data, one can set up a mathematical optimization problem (typically using a maximum likelihood formulation) for recovering the object as well as the phase aberration. The mathematics of this recovery process was first described by Gonsalves.1 Later, Paxman et al.2 extended Gonsalves' results, allowing for more than two diversity measurements and for non-Gaussian likelihood functions.
The work presented here aims at improving the numerical efficiency of the restoration procedure when a Gaussian likelihood, or least squares, fit-to-data criterion is used. A variant of the Gauss-Newton method for nonlinear least squares3 is introduced to solve the optimization problem. Unlike first order (gradient-based) methods previously employed,2 this scheme makes direct use of second order (Hessian) information, thereby achieving much more rapid convergence rates. When combined with appropriate stabilization, or regularization, the method is remarkably robust, providing convergence even for poor initial guesses without the aid of a globalization technique, e.g., a line search, to guarantee convergence.

This paper is organized as follows. Section 2 begins with a mathematical description of image formation in the context of phase diversity. We assume spatial translation invariance, and we assume that light emanating from the object is incoherent.4 Discussion of the mathematical model for phase diversity-based blind deconvolution is followed by the formulation of a mathematical optimization problem whose solution yields both the unknown phase

Other author information: (Send correspondence to Curtis R. Vogel)
Curtis R. Vogel: e-mail: vogel@math.montana.edu; URL: http://www.math.montana.edu/~vogel
Tony Chan: e-mail: [email protected]; URL: http://www.math.ucla.edu/~chan
Robert Plemmons: e-mail: [email protected]; URL: http://www.mthcsc.wfu.edu/~plemmons

aberration and the unknown object. The cost functional to be minimized is of regularized least squares type, taking the form

    J[φ, f] = J_data[φ, f] + (γ/2) ||f||² + α J_reg[φ].   (1)

Here φ and f represent the phase and the object, respectively, and γ and α are positive parameters which quantify the trade-off between goodness of fit to the data and stability. The first term on the right hand side of (1) is a least squares fit-to-data term; the second term provides stability with respect to perturbations in the object; and the third term provides stability with respect to perturbations in the phase. Stabilization provides several advantages: it dampens spurious oscillations in the phase and the object, and it helps "convexify" the optimization problem, getting rid of spurious small-scale minima of the cost functional. A solution can then be computed in a fast, robust manner using local optimization techniques like Newton's method.

We select an object stabilization term γ ||f||²/2 which is quadratic. This, combined with the fact that the fit-to-data term is quadratic in the object, allows us to formulate a reduced cost functional1,2 whose only unknown is the phase φ. We select a parameterization of the phase which is based on prior knowledge of its statistics. This parameterization leads to a natural choice of the phase regularization functional J_reg[φ] in equation (1).

Section 3 addresses the computation of a minimizer for the reduced cost functional. We present a variant of the well-known Gauss-Newton method3 for nonlinear least squares minimization. Results of a numerical simulation are presented in Section 4. The mathematical model used in this simulation is based on the hardware configuration for a 3.5 meter telescope at the U.S. Air Force Starfire Optical Range (SOR) in New Mexico. See Ellerbroek et al.5 for details.
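The structure of (1) can be sketched discretely. This is our own illustrative translation, not the paper's code: the helper names, the grid size, and the use of periodic FFT convolution are assumptions.

```python
import numpy as np

def conv2_fft(a, b):
    """Periodic 2-D convolution via FFT, a discrete stand-in for the convolution product."""
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

def cost(phase_coeffs, eigvals, psfs, f, data, gamma, alpha):
    """Evaluate J = J_data + (gamma/2)||f||^2 + alpha*J_reg, cf. eq. (1).

    J_reg uses the coefficient form (1/2) sum |c_j|^2 / lambda_j, which the
    paper introduces later in eq. (19); psfs and data are lists of s_k and d_k.
    """
    j_data = 0.5 * sum(np.sum((conv2_fft(s, f) - d) ** 2)
                       for s, d in zip(psfs, data))
    j_obj = 0.5 * gamma * np.sum(f ** 2)
    j_reg = 0.5 * np.sum(phase_coeffs ** 2 / eigvals)
    return j_data + j_obj + alpha * j_reg

# Toy check: an identity PSF with noise-free data makes the data term vanish.
n = 8
f = np.ones((n, n))
delta = np.zeros((n, n)); delta[0, 0] = 1.0
d = conv2_fft(delta, f)
J = cost(np.array([1.0, 2.0]), np.array([1.0, 4.0]),
         [delta], f, [d], gamma=0.1, alpha=0.01)
# J = 0 + 0.05*64 + 0.01*0.5*(1/1 + 4/4) = 3.21
```

The three terms can be inspected separately, which is useful when tuning γ and α by trial and error as the paper later does.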
The functional (1) is minimized using three different techniques: (i) the Gauss-Newton variant; (ii) standard Newton's method; and (iii) a standard method known as BFGS.3 A comparison shows that for moderately high solution accuracy, the first method is superior when computational effort is measured in terms of iteration count. Finally, in Section 5 we present a summary and a discussion of future directions of our work.

2. MATHEMATICAL BACKGROUND

2.1. Mathematical Model

The kth diversity image is given by

    d_k = s[φ + θ_k] ⋆ f + η_k,   k = 1, ..., K,   (2)

where η_k represents noise in the data, f is the unknown object, s is the point spread function (PSF), φ is the unknown phase function, θ_k is the kth phase diversity function, and ⋆ denotes the convolution product,

    (s ⋆ f)(x) = ∫∫_{R²} s(x − y) f(y) dy,   x ∈ R².   (3)

Assuming that light emanating from the object is incoherent, the dependence of the PSF on the phase is given by

    s[φ] = |F(p e^{iφ})|²,   (4)

where p denotes the pupil, or aperture, function, i = √(−1), and F denotes the 2-D Fourier transform,

    (F h)(y) = ∫∫_{R²} h(x) e^{−i 2π x·y} dx,   y ∈ R².   (5)

The pupil function p = p(x1, x2) is determined by the extent of the telescope's primary mirror.

In atmospheric optics,6 the phase φ(x1, x2) quantifies the deviation of the wave front from a reference planar wave front. This deviation is caused by variations in the index of refraction (wave speed) along light ray paths, and is strongly dependent on air temperature. Because of turbulence, the phase varies with time and position in space and is often modeled as a stochastic process.

Additional changes in the phase φ can occur after the light is collected by the primary mirror, e.g., when adaptive optics are applied. This involves mechanical corrections obtained with a deformable mirror to restore φ to planarity. By placing beam splitters in the light path and modifying the phase differently in each of the resulting paths, one
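Equation (4) maps a pupil and a phase to an incoherent PSF with two FFT-sized arrays and one transform. A minimal sketch (the grid size and circular pupil radius are illustrative assumptions, and `psf_from_phase` is our name):

```python
import numpy as np

def psf_from_phase(pupil, phase):
    """Incoherent PSF s[phi] = |F(p e^{i phi})|^2, cf. eq. (4)."""
    H = np.fft.fft2(pupil * np.exp(1j * phase))
    return np.abs(H) ** 2

# Circular pupil; zero phase gives the diffraction-limited PSF.  By Parseval's
# relation, the PSF's total energy equals N^2 * sum(p^2) for an n x n grid
# (N = n*n), regardless of the phase, since |p e^{i phi}|^2 = p^2.
n = 64
yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
pupil = (xx**2 + yy**2 <= (n // 4) ** 2).astype(float)
s0 = psf_from_phase(pupil, np.zeros((n, n)))
s1 = psf_from_phase(pupil, 0.3 * xx)   # a tilted wave front, same total energy
```

The phase-invariance of the total energy is a quick sanity check when implementing (4): aberrations redistribute light in the PSF but cannot create or destroy it.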

can obtain more independent data. The phase diversity functions θ_k represent these deliberate phase modifications applied after light is collected by the primary mirror. Easiest to implement is defocus blur, modeled by a quadratic

    θ_k(x1, x2) = b_k (x1² + x2²),   (6)

where the parameters b_k are determined by defocus lengths. In practice, the number of diversity images is often quite small, e.g., K = 2 in the numerical simulations to follow. In addition, one of the images, which we will denote using index k = 1, is obtained with no deliberate phase distortion, i.e., θ_1 = 0 in (2).

2.2. Cost Functionals

To estimate the phase φ from phase diversity data (2), we minimize the regularized least squares cost functional

    J = J[φ, f] = (1/2) Σ_{k=1}^{K} ∫∫_{R²} [(s_k ⋆ f)(x) − d_k(x)]² dx + (γ/2) ∫∫_{R²} f(x)² dx + α J_reg[φ],   (7)

where s_k = s[φ + θ_k]. Here J_reg[φ] is a regularization functional, whose purpose is to establish stability with respect to perturbations in φ. Similarly, the term ∫∫_{R²} f(x)² dx establishes stability with respect to perturbations in f. γ and α are positive regularization parameters, which quantify the tradeoff between goodness of fit to the data and stability.

Note that J is quadratic in the object f. This enables us to solve for f by fixing φ and minimizing with respect to f. Let upper case letters denote Fourier transforms and hold φ constant. Using the Convolution Theorem and the fact that the Fourier transform is norm preserving, one obtains from (7)

    J = J[F] = (1/2) ∫∫_{R²} [ Σ_{k=1}^{K} |S_k(y) F(y) − D_k(y)|² + γ |F(y)|² ] dy + constant.   (8)

This has as its minimizer

    F(y) = Σ_{k=1}^{K} S_k*(y) D_k(y) / ( γ + Σ_{k=1}^{K} |S_k(y)|² ),   (9)

where the superscript * denotes complex conjugate. The estimated object is then

    f = F⁻¹(F) = ∫∫_{R²} F(y) e^{i 2π x·y} dy.   (10)

Note that when K = 1, (9)-(10) give the inverse Wiener filter estimate for the solution of the deconvolution problem s ⋆
f = d.

Substituting (9) back into (7), we obtain the reduced cost functional

    J = J[φ] = (1/2) ∫∫_{R²} t[φ](y) dy + α J_reg[φ],   (11)

where, suppressing the dependence on y, the integrand

    t[φ] = Σ_{k=1}^{K} |D_k|² − |Σ_{k=1}^{K} D_k* S_k[φ]|² / Q[φ]   (12)

         = ( Σ_{k=1}^{K} Σ_{j=1}^{k−1} |D_k S_j[φ] − D_j S_k[φ]|² + γ Σ_{k=1}^{K} |D_k|² ) / Q[φ],   (13)

with

    Q[φ] = γ + Σ_{k=1}^{K} |S_k[φ]|².   (14)

The derivation of (12) from (7) is very similar to the derivation in Paxman et al.2 See Appendix A for verification of the equality of (12) and (13).
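The closed-form minimizer (9)-(10) is inexpensive to apply on a discrete grid. A sketch with our own helper names, assuming periodic FFT convolutions stand in for (3):

```python
import numpy as np

def estimate_object(psfs, images, gamma):
    """Object estimate of eqs. (9)-(10):
    F = sum_k S_k* D_k / (gamma + sum_k |S_k|^2), then f = F^{-1}(F).
    For K = 1 this is the inverse Wiener filter mentioned in the text."""
    S = [np.fft.fft2(s) for s in psfs]
    D = [np.fft.fft2(d) for d in images]
    num = sum(Sk.conj() * Dk for Sk, Dk in zip(S, D))
    den = gamma + sum(np.abs(Sk) ** 2 for Sk in S)
    return np.real(np.fft.ifft2(num / den))

# Sanity check: with an identity PSF and noise-free data, the estimate
# recovers the object up to the 1/(1 + gamma) regularization factor.
n = 32
rng = np.random.default_rng(0)
f_true = rng.random((n, n))
delta = np.zeros((n, n)); delta[0, 0] = 1.0
d = np.real(np.fft.ifft2(np.fft.fft2(delta) * np.fft.fft2(f_true)))
f_est = estimate_object([delta], [d], gamma=1e-8)
```

Because (9) is evaluated pointwise in the Fourier domain, the object solve costs only K + 1 FFTs per phase iterate, which is why eliminating f from (7) is so effective.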

2.3. Parameterization of Phase

Because of atmospheric turbulence, the phase varies with both time and position in space.6 Adaptive optics systems use deformable mirrors to correct for these phase variations. Errors in this correction process arise from a variety of sources, e.g., errors in the measurement of phase, inability of the mirror to conform exactly to the phase shape, and lag time between phase measurement and mirror deformation. The phase aberrations resulting from an adaptive optics system are often modeled as realizations of a stochastic process. Assuming that this process is second order stationary with zero mean, it is characterized by its covariance function,

    A(x, y) = E( φ(x, ·) φ(y, ·) ),   (15)

where E denotes the expectation (temporal ensemble averaging) operator. This is a symmetric, positive function. Under certain mild conditions on A, the associated covariance operator

    (A f)(x) = ∫∫_{R²} A(x, y) f(y) dy   (16)

is compact and self-adjoint. Hence it has a sequence of eigenvalues λ_j which decay to zero and corresponding eigenvectors φ_j(x) which form an orthonormal basis for L²(R²). This eigendecomposition can be formally expressed as

    A = V diag(λ_j) V*,

where V : ℓ² → L²(R²) is given by

    V c = Σ_{j=1}^{∞} c_j φ_j(x),   c = (c1, c2, ...) ∈ ℓ².

This provides a means for generating simulated phase functions having the desired covariance structure. Letting w denote a simulated white noise vector, take

    φ(x) = A^{1/2} w = V diag(λ_j^{1/2}) V* w = Σ_j λ_j^{1/2} ⟨w, φ_j⟩ φ_j(x),   (17)

where pointed brackets denote the complex inner product on L²(R²),

    ⟨g, h⟩ = ∫∫_{R²} g(x) h*(x) dx.   (18)

This also provides a natural regularization, or stabilization, functional

    J_reg[φ] = (1/2) ⟨A⁻¹ φ, φ⟩ = (1/2) Σ_j |c_j|² / λ_j,   (19)

where the c_j's are the coefficients in the eigenvector expansion of φ,

    φ(x) = Σ_{j=1}^{m} c_j φ_j(x).   (20)

3. NUMERICAL OPTIMIZATION

3.1.
Gradient and Hessian Computations

Newton's method for the minimization of (11) takes the form of the iteration

    φ^{k+1} = φ^k − H[φ^k]⁻¹ g[φ^k],   k = 0, 1, ...,   (21)

where g[φ] is the gradient of J = J[φ], H[φ] denotes the Hessian, and φ⁰ is an initial guess. In Appendix B, we derive from (11)-(12) the following expression for the gradient of J,

    g[φ] = −2 Σ_{k=1}^{K} Imag( H_k*[φ] F( h_k[φ] Real( F⁻¹ V_k[φ] ) ) ) + α A⁻¹ φ,   (22)

where A is the covariance operator for the phase, cf. (16), Real(z) and Imag(z) denote the real and imaginary parts of a complex number z, respectively, and

    H_k[φ] = p e^{i(φ + θ_k)},   h_k[φ] = F⁻¹(H_k[φ]),   (23)
    s_k[φ] = |h_k[φ]|²,   S_k[φ] = F(s_k[φ]),   (24)
    V_k[φ] = F*[φ] D_k − |F[φ]|² S_k[φ],   (25)

with F[φ] the Fourier transform of the estimated object given in (9) and D_k the Fourier transform of the data. With the exception of the Real(·) in equation (22), when γ = 0, this reduces to formula (22)-(23) in Paxman et al.2

If the phase is represented by a linear combination (20), then by the chain rule the corresponding discrete gradient is a vector in R^m with entries

    [g]_j = ⟨g, φ_j⟩,   j = 1, ..., m,

and the (discrete) Hessian is an m × m matrix with ij-th entry

    H_ij = ⟨H[φ] φ_j, φ_i⟩,   1 ≤ i, j ≤ m.   (26)

The Hessian entries can be approximately computed from (26) using finite differences of the gradient, e.g.,

    H[φ] φ_j ≈ ( g[φ + ε φ_j] − g[φ] ) / ε

for ε relatively small. When combined with iteration (21), this approach is known as the finite difference Newton's method.3 When m is large, this approach is not practical. A commonly used alternative is the BFGS method.3 Requiring only values of the gradient, BFGS simultaneously constructs a sequence of approximate minimizers φ^k and a sequence of symmetric positive definite (SPD) approximations to the Hessian or its inverse. While this method can be rigorously proven to be asymptotically superlinearly convergent, practical experience has shown its performance to be disappointing for highly ill-conditioned problems3 like the one considered here.

Due to its robustness and good convergence properties, the method of choice for general nonlinear least squares problems is Levenberg-Marquardt.3 Levenberg-Marquardt consists of a Gauss-Newton approximation to the least squares Hessian, combined with a trust region globalization scheme.
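The finite difference Hessian approximation above is generic, so it can be sketched independently of the imaging problem. The function names are ours, and the sketch is verified on a quadratic whose Hessian is known exactly:

```python
import numpy as np

def fd_hessian(grad, phi, eps=1e-6):
    """Finite difference Hessian: column j is (g(phi + eps*e_j) - g(phi))/eps,
    then symmetrized, as used by the finite difference Newton's method.
    Requires m + 1 gradient evaluations for an m-dimensional phi."""
    m = phi.size
    g0 = grad(phi)
    H = np.empty((m, m))
    for j in range(m):
        e = np.zeros(m)
        e[j] = eps
        H[:, j] = (grad(phi + e) - g0) / eps
    return 0.5 * (H + H.T)   # enforce symmetry of the approximation

# For J(phi) = (1/2) phi^T Q phi the gradient is Q phi and the Hessian is Q,
# so the finite difference approximation should reproduce Q.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
H = fd_hessian(lambda p: Q @ p, np.array([0.2, -0.1]))
```

The m + 1 gradient evaluations per iteration are what makes this impractical for large m; they are, however, mutually independent, which is the source of the parallelism noted later in the paper.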
The key idea is that the integral of the first term in (13) can be written as a sum of functionals of "least squares" form

    (1/2) ||r[φ]||² = (1/2) ⟨r[φ], r[φ]⟩.

The Gauss-Newton approximation H_GN to the corresponding least squares Hessian is characterized by

    ⟨H_GN[φ] ψ, χ⟩ = ⟨r'[φ] ψ, r'[φ] χ⟩,

where prime denotes differentiation with respect to φ. H_GN is clearly symmetric and positive semidefinite. With the additional contribution α A⁻¹ obtained by twice differentiating the regularization term in (19), one obtains an approximation H_GN + α A⁻¹ which is SPD. Even when the true (not necessarily SPD) Hessian is available, an SPD approximation like this may be preferable. While the approximation will result in slower convergence near the exact minimizer, it is likely to be more robust in the sense that it yields convergent iterates for a much wider range of initial guesses.

In Appendix C, we derive a variant of the Gauss-Newton approximation which is based on (13). We do not derive an explicit formula for the matrix H_GN[φ]. Instead, we compute the action of this matrix on a given direction vector ψ ∈ L²(R²) as follows: first, for k = 1, ..., K and j = 1, ..., k − 1, compute

    Ũ_jk = D̃_j F( Imag[ h_k* F⁻¹(H_k ψ) ] ) − D̃_k F( Imag[ h_j* F⁻¹(H_j ψ) ] ),   (27)

and then take

    H_GN[φ] ψ = 4 Σ_{k=1}^{K} Σ_{j=1}^{k−1} Imag[ H_j* F( h_j F⁻¹( D̃_k* Ũ_jk ) ) − H_k* F( h_k F⁻¹( D̃_j* Ũ_jk ) ) ] + α A⁻¹ ψ.   (28)
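Matrix-free products like (27)-(28) pair naturally with conjugate gradient iteration, which the paper uses to solve the discrete Newton systems H Δc = −g. A minimal matrix-free CG sketch; the 3 × 3 SPD matrix merely stands in for H_GN + α A⁻¹, and the names are ours:

```python
import numpy as np

def cg_solve(hess_vec, b, tol=1e-10, maxiter=200):
    """Conjugate gradients for H x = b, given only the map v -> H v.
    H must be symmetric positive definite for CG to apply."""
    x = np.zeros_like(b)
    r = b - hess_vec(x)          # initial residual
    p = r.copy()                 # initial search direction
    rs = r @ r
    for _ in range(maxiter):
        Hp = hess_vec(p)
        alpha = rs / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

H = np.array([[5.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 3.0]])          # stand-in SPD Hessian
g = np.array([1.0, 2.0, 3.0])
dc = cg_solve(lambda v: H @ v, -g)       # Newton step solves H dc = -g
```

Each CG iteration costs one Hessian-vector product, so the dominant expense is the fixed number of FFTs in (27)-(28), never an explicit m × m matrix.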

Figure 1. Phase, phase diversity, and corresponding PSFs.

Given a phase representation (20), one can in principle compute an m × m Gauss-Newton-like matrix H_GN in a manner analogous to (26). Adding the contribution from the regularization (19), one obtains the (approximate) Hessian matrix

    H = H_GN + α A⁻¹,

where now A = diag(λ1, ..., λm). If m is relatively small, H can be computed explicitly and inverted. If m is large, linear systems H Δc = −g can be solved using conjugate gradient (CG) iteration. The cost of each CG iteration is dominated by one Hessian matrix-vector multiplication, Hv, per iteration, which is implemented using (27)-(28).

4. NUMERICAL SIMULATION AND PERFORMANCE

The first step in the simulation process is to generate a phase function φ(x1, x2), a phase diversity function θ(x1, x2), and a pupil, or aperture, function p(x1, x2). These are used to compute PSFs via equations (23)-(24). Corresponding to a telescope with a large circular primary mirror with a small circular secondary mirror at its center, the pupil function is taken to be the indicator function for an annulus. The computational domain is taken to be a 72 × 72 pixel array. n = 888 of these pixels lie in the support of the pupil function, which is the white region in the upper right subplot in Fig. 2. Values of the phase function φ at these n pixels are generated from an n × n covariance matrix in a manner analogous to (17), i.e., the phase is the product of a square root of a covariance matrix and a computer-generated discrete white noise vector. We use a covariance matrix which models phase aberrations from an adaptive optics system for a 3.5 meter telescope at the SOR. The phase used in the simulation appears in the upper left subplot in Fig. 1. The PSF s1 = s[φ] corresponding to this phase is shown in the upper right subplot. The quadratic phase diversity function θ appears in the lower left subplot. To its right is the PSF s2 = s[φ + θ] corresponding to the perturbed phase.
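The phase-generation step just described (a square root of a covariance matrix applied to white noise, cf. (17)) translates directly into matrix form. In this sketch the columns of `V` play the role of the φ_j, the white noise w is supplied explicitly, and the 6 × 6 covariance matrix is an illustrative stand-in for the SOR model:

```python
import numpy as np

def simulate_phase(eigvals, eigvecs, w):
    """phi = A^{1/2} w as in eq. (17): c_j = lambda_j^{1/2} <w, phi_j>,
    phi = sum_j c_j phi_j.  Returns (phi, c)."""
    c = np.sqrt(eigvals) * (eigvecs.T @ w)
    return eigvecs @ c, c

def j_reg(c, eigvals):
    """Regularization functional of eq. (19): (1/2) sum |c_j|^2 / lambda_j."""
    return 0.5 * np.sum(np.abs(c) ** 2 / eigvals)

# Build an SPD covariance matrix and check phi against A^{1/2} w directly.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)          # SPD, well away from singular
lam, V = np.linalg.eigh(A)
w = rng.standard_normal(6)
phi, c = simulate_phase(lam, V, w)
sqrtA = V @ np.diag(np.sqrt(lam)) @ V.T
# Since phi = A^{1/2} w, one has J_reg[phi] = (1/2)<A^{-1} phi, phi> = (1/2)||w||^2.
```

The last identity is a convenient unit test for an implementation: phases drawn via (17) have regularization value exactly half the squared norm of the driving white noise.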
In the notation of Section 2, θ = θ_2, while θ_1 is identically zero.

The next stage in the simulation process is to generate an object function f(x1, x2), convolve it with the two PSFs, and add noise to the resulting pair of images, cf. equation (2). The convolution integrals are approximated using 2-D FFTs, and the simulated noise vectors, η_k, k = 1, 2, are taken from a Gaussian distribution with mean zero and variances σ_k² equal to 5 percent of the signal strengths in each image, i.e., σ_k = 0.05 · ||s_k ⋆ f||. The object is a simulated binary star, shown at the upper left in Fig. 2. The lower left subplot shows the blurred noisy image corresponding to phase φ, while the image at the lower right corresponds to the perturbed phase φ + θ.

Given data consisting of the images d1 and d2 and the pupil function p(x1, x2) shown in Fig. 2, together with the diversity function θ(x1, x2) shown in Fig. 1, the minimization problem described previously is solved numerically. The regularization parameters are chosen to be γ = 10⁻⁷ and α = 10⁻⁵, and the dominant m = 40 eigenvectors are selected as the φ_j(x1, x2)'s in the representation (20) for the phase. The reconstructed phase and object are
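The data-generation pipeline can be sketched end to end. The annulus radii, defocus constant, and toy two-point object below are illustrative stand-ins, not the SOR values, and the noise scale follows the stated σ_k = 0.05 · ||s_k ⋆ f|| literally:

```python
import numpy as np

def annular_pupil(n, r_out, r_in):
    """Indicator function of an annulus on an n x n grid: a circular primary
    mirror with a central secondary obscuration."""
    yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
    r2 = xx * xx + yy * yy
    return ((r2 <= r_out**2) & (r2 >= r_in**2)).astype(float)

def diversity_image(pupil, phase, theta, f, noise_frac, rng):
    """d_k = s[phi + theta_k] * f + eta_k, cf. eq. (2)."""
    H = np.fft.fft2(pupil * np.exp(1j * (phase + theta)))
    s = np.abs(H) ** 2                                      # eq. (4)
    blurred = np.real(np.fft.ifft2(np.fft.fft2(s) * np.fft.fft2(f)))
    sigma = noise_frac * np.linalg.norm(blurred)            # sigma_k = 0.05 ||s_k * f||
    return blurred + sigma * rng.standard_normal(f.shape)

n = 72
pupil = annular_pupil(n, 18, 4)                  # radii are illustrative
yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
phase = np.zeros((n, n))                         # a real run would draw phi via (17)
theta2 = 1e-3 * (xx * xx + yy * yy)              # quadratic defocus diversity, eq. (6)
f = np.zeros((n, n)); f[30, 30] = 1.0; f[40, 42] = 0.6     # toy "binary star"
rng = np.random.default_rng(0)
d1 = diversity_image(pupil, phase, np.zeros((n, n)), f, 0.05, rng)   # theta_1 = 0
d2 = diversity_image(pupil, phase, theta2, f, 0.05, rng)
```

The pair (d1, d2), together with the pupil and θ, is exactly the input the minimization of (11) consumes.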

Figure 2. Negative grayscale plot of the simulated object f(x1, x2) at upper left; image d1 = s[φ] ⋆ f + η1 at lower left; and diversity image d2 = s[φ + θ] ⋆ f + η2 at lower right. At the upper right is a grayscale plot of the pupil, or aperture, function.

shown in Fig. 3. The parameters α, γ, and m were selected by trial and error to provide the best reconstructions from the given (noisy, discrete) data. If α and γ are decreased significantly, the reconstructions degrade because of amplification of error. On the other hand, increasing these parameters significantly results in overly smoothed reconstructions. Decreasing m significantly will result in loss of resolution. Increasing m will have no effect on the reconstructions, but it will increase the computational cost due to the increased size of the (discrete) Hessians.

It should be noted in Fig. 3 that the reconstructed phase is compared not against the exact phase (shown at the upper left in Fig. 1), but against the L² projection of the exact phase onto the span of the dominant m eigenfunctions of the covariance matrix. This can be interpreted as a projection onto a "low frequency" subspace. A visual comparison of the upper left and lower left subplots in Fig. 3 shows very good agreement; the norms of the two functions being plotted differ by about 10 percent. Similarly, the reconstructed object is compared against a diffraction limited version of the true object.

Numerical performance is summarized in Fig. 4. The Gauss-Newton (GN) method, the finite difference Newton's (FDN) method, and the BFGS method were each applied with initial guess φ0 = 0 and the same parameter values, γ = 10⁻⁷, α = 10⁻⁵, and m = 40. The latter two methods required a line search in order to converge, but GN did not. The linear convergence rate of GN and the quadratic rate for FDN can clearly be seen.
While FDN and BFGS have asymptotically better convergence properties, GN required only 5 iterations to achieve phase reconstructions visually indistinguishable from the reconstructions at much later iterations. FDN required 15 iterations to achieve comparable accuracy, while BFGS required about 35 iterations. Additional overhead is incurred by these two methods in the line search implementation. In terms of total computational cost (measured by the total number of FFTs, for example), BFGS is the most efficient of the three methods due to its low cost per iteration. On the other hand, BFGS is an inherently sequential algorithm, while GN and FDN have obvious parallel implementations. Given efficient parallel implementations, clock time should be proportional to the iteration count rather than the number of FFTs.

We also tested our algorithms on the simulated satellite image data considered by Tyler et al.7 Numerical performance is summarized in Fig. 5. Somewhat different regularization parameters were used in this case (α = 10⁻⁴ and γ = 10⁻¹), while m = 40 and the 5 percent noise level remain the same as in the binary star test problem. A comparison of Fig. 4 and Fig. 5 reveals the BFGS convergence history to be much the same. In the case of FDN,

Figure 3. True and reconstructed phases and objects.


Figure 4. Performance of various methods for the binary star test problem, as measured by the norm of the gradient vs. iteration count. Circles (o) indicate the Gauss-Newton method; stars (*) indicate the finite difference Newton's method; and pluses (+) indicate the BFGS method.

Figure 5. Performance of various methods for the satellite test problem, as measured by the norm of the gradient vs. iteration count. Circles (o) indicate the Gauss-Newton method; stars (*) indicate the finite difference Newton's method; and pluses (+) indicate the BFGS method.

rapid (quadratic) convergence occurs much earlier (at 8 or 9 iterations vs. at 18 or 19 iterations) for the satellite problem than for the binary star problem. As with the binary star test problem, due to the need to perform several line searches during the first few iterations, the actual cost of FDN is somewhat higher than what the 8 or 9 iterations might indicate.

For GN, we obtained a rapid decrease in the norm of the gradient for the first 3 or 4 iterations with both test problems. This rapid decrease continues for later iterations in the binary star case. For the satellite test problem, on the other hand, the GN convergence rate slows considerably at later iterations. However, after 3 iterations there
This project ispartially in conjunction with the AMOS/MHPCC R&D Consortium Satellite Image Reconstruction Study.7The gradient of our reduced cost functional as well as the Gauss-Newton Hessian-vector products can bee�ciently evaluated in parallel. Our objective is to achieve a parallel implementation where the run-time isessentially independent of the phase parameterization value m discussed in x3.1.� Phase-diversity based phase retrieval. The recovery of phase aberrations induced by atmospheric turbulenceusing phase diversity methods has recently been compared to classical Shack-Hartman wavefront sensing.8Here, the object itself is assumed known, e.g., a point source from a natural or laser guide star. The wavefrontphase aberrations estimates are used to drive a closed loop adaptive optics system for real-time control ofdeformable mirrors used in optical image reconstruction. Speed is of course essential in these phase retrievalcomputations, and we plan to consider this application in our future work.9

APPENDIX A. DERIVATION OF EQUATION (13)

    Σ_{k=1}^{K} Σ_{j=1}^{k−1} |D_k S_j − D_j S_k|²
        = (1/2) Σ_{k=1}^{K} Σ_{j=1}^{K} |D_k S_j − D_j S_k|²
        = Σ_{k=1}^{K} |D_k|² Σ_{j=1}^{K} |S_j|² − |Σ_{k=1}^{K} D_k* S_k|²
        = Q Σ_{k=1}^{K} |D_k|² − |Σ_{k=1}^{K} D_k* S_k|² − γ Σ_{k=1}^{K} |D_k|².

To establish the equality of (12) and (13), add γ Σ_{k=1}^{K} |D_k|² to both sides and divide by Q.

APPENDIX B. DERIVATION OF GRADIENT FORMULAS

The Gateaux derivative, or variation, of J in the direction ψ is given by

    δ_ψ J[φ] = lim_{τ→0} ( J[φ + τψ] − J[φ] ) / τ = (d/dτ) J[φ + τψ] |_{τ=0}.

The derivative J'[φ] (which is the gradient of J) is characterized by

    δ_ψ J[φ] = ⟨ψ, J'[φ]⟩

for any direction ψ. By the linearity of the operator δ_ψ, (11)-(12), and (19),

    δ_ψ J[φ] = −(1/2) δ_ψ ∫∫ P[φ] P*[φ] / Q[φ] + α ⟨ψ, A⁻¹ φ⟩,   (29)

where

    P[φ] = Σ_k D_k* S_k[φ],   Q[φ] = γ + Σ_k S_k[φ] S_k*[φ].   (30)

Now by the product and quotient rules for differentiation,

    δ_ψ ∫∫ P[φ] P*[φ] / Q[φ] = ∫∫ [ Q δ_ψ(P P*) − |P|² Σ_k δ_ψ(S_k S_k*) ] / Q²
        = ∫∫ Σ_k δ_ψ S_k[φ] ( P* Q D_k* − |P|² S_k* ) / Q² + c.c.
        = Σ_k ⟨δ_ψ S_k[φ], V_k⟩ + c.c.,   (31)

where "c.c." denotes the complex conjugate of the previous term, and

    V_k = P D_k / Q − |P|² S_k / Q² = F* D_k − |F|² S_k.

The second equality follows from (30) and (9). We now suppress the subscript k and make use of various properties of the δ_ψ derivative, as well as (23)-(24). We also make use of the fact that the Fourier transform preserves angles to obtain

    ⟨δ_ψ S[φ], V⟩ + c.c. = δ_ψ ⟨S[φ], V⟩ + c.c.   (32)
        = δ_ψ ⟨s[φ], v⟩ + c.c.,   v = F⁻¹(V)
        = δ_ψ ⟨h[φ] h*[φ], v⟩ + c.c.
        = δ_ψ ⟨h[φ], h[φ] v⟩ + c.c.

        = ⟨δ_ψ h[φ], h v⟩ + ⟨h, δ_ψ h[φ] v⟩ + c.c.
        = ⟨δ_ψ H[φ], F(h v)⟩ + ⟨F(h v*), δ_ψ H[φ]⟩ + c.c.
        = ⟨i ψ H, F(h v)⟩ + ⟨F(h v*), i ψ H⟩ + c.c.   (33)
        = ⟨ψ, −i H* F(h v) + i H F*(h v*)⟩ + c.c.
        = ⟨ψ, 4 Imag[ H* F( h Real[F⁻¹(V)] ) ]⟩.   (34)

In (33) we have used (23) to obtain δ_ψ H = i ψ H. Equating the left hand side of (32) with (34) and then applying (31) and (29), we obtain the derivative, or gradient, of J,

    J'[φ] = −2 Σ_k Imag[ H_k* F( h_k Real[F⁻¹(V_k)] ) ] + α A⁻¹ φ.

APPENDIX C. DERIVATION OF HESSIAN APPROXIMATIONS

From (13) and (18), the first term in (11) can be written as

    J_data[φ] = (1/2) Σ_{j<k} ||R_jk[φ]||² + (γ/2) ||D^{1/2}[φ]||²,   (35)

where

    R_jk[φ] = ( D_k S_j[φ] − D_j S_k[φ] ) / Q^{1/2}[φ],   D[φ] = Σ_{k=1}^{K} |D_k|² / Q[φ],   (36)

and Q[φ] is defined in (14). Perhaps the simplest positive definite Hessian approximations can be constructed by dropping the second term on the right hand side of (35), fixing the Q in the first term, and applying the Gauss-Newton idea. Proceeding in this manner, define

    R̃_jk[φ] = D̃_k S_j[φ] − D̃_j S_k[φ],   (37)

where

    D̃_k = D_k / Q^{1/2}.   (38)

Note that Q is fixed, so the D̃_k are independent of φ. Then the action of the resulting operator, which we denote by H̃, is given by

    ⟨H̃ ψ, χ⟩ = Σ_{j<k} ⟨U_jk, R̃'_jk[φ] χ⟩,   (39)

where now

    U_jk = R̃'_jk[φ] ψ = D̃_k S'_j[φ] ψ − D̃_j S'_k[φ] ψ.   (40)

We next derive an expression for S'_j[φ] ψ, where ψ ∈ L²(R²) is fixed. Drop the subscript j, and let W be arbitrary. Then, proceeding as in Appendix B with W playing the role of V,

    ⟨S'[φ] ψ, W⟩ = δ_ψ ⟨S[φ], W⟩
        = ⟨i ψ H, F(h w)⟩ + ⟨F(h w*), i ψ H⟩,   w = F⁻¹(W)
        = ⟨i h* F⁻¹(ψ H), w⟩ + ⟨−i h (F⁻¹(ψ H))*, w⟩
        = ⟨−2 Imag[ h* F⁻¹(ψ H) ], w⟩.

Taking Fourier transforms,

    S'[φ] ψ = −2 F( Imag[ h* F⁻¹(ψ H) ] ).   (41)

Consequently,

    U_jk = −2 D̃_k F( Imag[ h_j* F⁻¹(ψ H_j) ] ) + 2 D̃_j F( Imag[ h_k* F⁻¹(ψ H_k) ] ).   (42)

Examining the terms in (39) and applying computations similar to those above,

    ⟨U_jk, D̃_k S'_j[φ] χ⟩ = ⟨D̃_k* U_jk, S'_j[φ] χ⟩
        = δ_χ ⟨D̃_k* U_jk, S_j[φ]⟩
        = ⟨2 Imag[ H_j* F(h_j w_k) ], χ⟩,   (43)

provided

    w_k = F⁻¹( D̃_k* U_jk )   (44)

is real-valued. An analogous expression is obtained when D̃_k, S'_j are replaced by D̃_j, S'_k. Then combining (39) with (43),

    H̃[φ] ψ = 2 Σ_{j<k} Imag[ H_j* F(h_j w_k) − H_k* F(h_k w_j) ].   (45)

ACKNOWLEDGMENTS

The work of Curtis R. Vogel was supported by NSF under grant DMS-9622119 and AFOSR/DEPSCoR under grant F49620-96-1-0456. Tony Chan was supported by NSF under grant DMS-9626755 and ONR under grant N00014-19-1-0277. Support for Robert Plemmons came from NSF under grant CCR-9623356 and from AFOSR under grant F49620-97-1-0139. The authors wish to thank Brent Ellerbroek of the U.S. Air Force Starfire Optical Range for providing data, encouragement, and helpful advice for this study.

REFERENCES

1. R. A. Gonsalves, "Phase retrieval and diversity in adaptive optics," Optical Engineering, 21, pp. 829-832, 1982.
2. R. G. Paxman, T. J. Schulz, and J. R. Fienup, "Joint estimation of object and aberrations by using phase diversity," J. Optical Soc. Am., 9, pp. 1072-1085, 1992.
3. J. E. Dennis, Jr., and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM Classics in Applied Mathematics Series, 16, 1996.
4. J. W. Goodman, Introduction to Fourier Optics, 2nd Edition, McGraw-Hill, 1996.
5. B. L. Ellerbroek, D. C. Johnston, J. H. Seldin, M. F. Reiley, R. G. Paxman, and B. E. Stribling, "Space-object identification using phase-diverse speckle," in Mathematical Imaging, Proc. SPIE, 3170-01, 1997.
6. M. C. Roggemann and B. Welsh, Imaging Through Turbulence, CRC Press, 1996.
7. D. W. Tyler, M. C. Roggemann, T. J. Schulz, K. J. Schulz, J. H. Seldin, W. C. van Kampen, D. G. Sheppard, and B. E. Stribling, "Comparison of image estimation techniques using adaptive optics instrumentation," Proc. SPIE 3353, 1998 (these Proceedings).
8. B. L. Ellerbroek, B. J. Thelen, D. J. Lee, D. A.
Carrara, and R. G. Paxman, "Comparison of Shack-Hartmann wavefront sensing and phase-diverse phase retrieval," in Adaptive Optics and Applications, R. Tyson and R. Fugate, eds., Proc. SPIE 3126, pp. 307-320, 1997.
