
Inverse Problems in Science and Engineering, Vol. 19, No. 3, April 2011, 363–393

Convergence of a two-level ideal algorithm for a parametric shape inverse model problem

Benoît Chaigne* and Jean-Antoine Désidéri

INRIA Sophia Antipolis-Méditerranée, Project-Team Opale, 2004 route des Lucioles BP93, Sophia Antipolis cedex F-06902, France

(Received 18 June 2010; final version received 18 December 2010)

Similar to the discretization of ordinary or partial differential equations, the numerical approximation of the solution of an optimization problem is possibly subject to numerical stiffness. In the framework of parametric shape optimization, hierarchical representations of the shape can be used for preconditioning, following the idea of Multigrid (MG) methods. In this article, by analogy with the Poisson equation, which is the typical example for linear MG methods, we address a parametric shape inverse problem. We describe the ideal cycle of a two-level algorithm adapted to shape optimization problems that require appropriate transfer operators. With the help of a symbolic calculus software we show that the efficiency of an optimization MG-like strategy is ensured by a small dimension-independent convergence rate. Numerical examples are worked out and corroborate the theoretical results. Applications to antenna design are realized. Finally, some connections with the direct and inverse Broyden–Fletcher–Goldfarb–Shanno preconditioning methods are shown.

Keywords: parametric shape optimization; inverse problem; multigrid and multilevel methods; convergence rate

AMS Subject Classifications: 49Q10; 49M05; 49N45; 65B99; 65N55

1. Introduction

Many engineering problems are defined as inverse problems: one aims at optimizing a device to achieve or closely approach an ideal physically-relevant target. The corresponding objective function depends on a state variable u governed by the partial differential equations (PDE) of the underlying physics. When the control is on the geometry of the device, which is commonly the case, the inverse problem reads as the following shape optimization problem

\[
\min_{S} \mathcal{J}(S) = \frac{1}{2} \int_{\Gamma} \left| O(u(S, \omega)) - \bar{O}(\omega) \right|^2 \mathrm{d}\omega, \tag{1}
\]

where $S$ belongs to some feasible class of shapes, $O$ is some functional of the state, $\bar{O}$ the ideal target functional and $\Gamma$ an observation domain. In general, the study of a shape

*Corresponding author. Email: [email protected]

ISSN 1741–5977 print/ISSN 1741–5985 online

© 2011 Taylor & Francis

DOI: 10.1080/17415977.2011.551877

http://www.informaworld.com


functional with respect to a geometrical domain is a difficult mathematical problem. A formal analysis of such problems can be found for instance in [1,2]; a necessary optimality condition can be derived in an adequate functional space. However, in our framework, we consider a parametric shape optimization problem. The shape is assumed to have an explicit discrete definition, thus making strong assumptions about the set of feasible shapes. This is often the case in engineering applications when the software is based on Computer Aided Design (CAD) tools. Note that in the sequel the expressions shape optimization and shape inverse problem are meant in this parametric framework throughout.

It is well-known that discretized PDEs yield very ill-conditioned systems as the grid is refined. Similarly, the numerical treatment of optimization problems can be problematic when the design space dimension is large, due to numerical stiffness. Among preconditioning techniques, Multilevel (ML)/Multigrid (MG) methods can be used to overcome this difficulty in the discrete solution of PDEs. By analogy, several algorithms have been proposed to extend the ML concepts to the framework of PDE-constrained shape optimization problems: e.g. the consistent approximations [3–5], the one-shot methods [6–10], the MG/Opt algorithm [11–13], as well as parametric approaches [14]. A rather complete bibliography can be found in [15].

The consistent approximations technique provides a proof for the convergence of successive discrete solutions towards the continuous one under consistency hypotheses. The resulting algorithm is a gradient-based Nested Iteration algorithm. Assuming that an iterative method is used to compute the PDE and its adjoint, it can also take into account possible incomplete calculations (i.e. within a given finite number of iterations, regardless of the residual). This implies that the most accurate approximation of the state is not necessary to obtain a descent direction, thus reducing the computational cost (see [3, Chapter 6], or [5]).

The one-shot methods are devised to solve in a single iterative process both the system of state equations and the necessary optimality conditions. The overall system being nonlinear, a Full Approximation Scheme (FAS) or Newton-like strategy is applied. This strategy turns out to be irrelevant when the state is an explicit function of the geometry or when the state equation is a linear system solved by a direct method.

The MG/Opt algorithm is a shape optimization algorithm inspired by the nonlinear MG algorithm FAS. It is based on the hierarchy of approximation spaces of the state variable (not necessarily of the shape representation). The approximation spaces are usually defined together with the mesh (e.g. P1 elements on a triangulated surface). Since the cost function depends on the state, there exists a mapping between $h$, the characteristic mesh size, and $J_h$, the corresponding approximated cost function. In other words, a hierarchy in terms of approximation spaces of the state implies a hierarchy in terms of approximations of $J$. In order to ensure the consistency between the functions $J_h$ to be minimized, an FAS strategy is used: in the end, a descent direction on a coarse grid is a descent direction on the fine grid.

In a parametric representation, the design space and the discrete representation for the calculus of the cost function are independent. This approach is employed here, and we focus on the question of adequately defining the hierarchy of parametric design spaces. In that case, since the objective function to be minimized is of the same order of accuracy (same mesh size) for each level of the design space, the consistency between the 'grids' is achieved. In order to examine how MG-like algorithms compare in the framework of shape inverse/optimization problems we consider two-level ideal algorithms applied to a


simple parametric shape inverse problem. Using the terminology of the MG theory we propose different strategies and sketch a convergence proof. As ideal algorithms, we assume that the coarse problem is solved exactly. The relaxation method is a simple steepest descent method. For different transfer operators, the convergence rate of the ideal cycle is derived. Depending on the nature of these operators, the convergence rate turns out to be independent of the search space dimension or not, illustrating a fundamental difference between PDE and shape inverse problems regarding preconditioning. Besides, some connections with the direct and inverse Broyden–Fletcher–Goldfarb–Shanno (BFGS) methods are considered. Namely, we show that the efficiency of one or the other formulation depends on the nature of the Hessian operator, as for MG methods.

This article is organized as follows: in Section 2, we set the optimization problem and identify similarities and differences with the Poisson model equation; in Section 3, we recall important results on the convergence of linear iterations and the concept of smoothing; in Section 4 we deal with the ideal two-level algorithms and the transfer operators, and we illustrate the theoretical results with numerical examples; in Section 5 we extend the problem to CAD parameterization and in Section 6 we show some results of its application to computational electromagnetics; the relationship with the BFGS method is considered in Section 7; finally we draw some conclusions in Section 8.

2. Definition of the model problems

We consider the best approximation problem of a real-valued bounded function $\bar{u}$ in a finite-dimensional subspace. This problem will be referred to in the sequel as the shape optimization problem. It is closely related to the Poisson equation $-\Delta u = f$, which is the typical model problem for linear MG methods. In their discrete formulations, both problems lead to symmetric linear systems. We exhibit their diagonalizations since numerical properties of iterative methods (such as smoothing) are closely related to the eigenstructure of the underlying matrices.

2.1. Best approximation problem

In the sequel $H_0$ denotes the space of bounded functions defined on $[0\,1]$ such that $u(0) = u(1) = 0$, with the usual $L^2$ inner product $(u, v)$ and norm $\|u\| = \sqrt{(u, u)}$.

2.1.1. Shape functional

Let $\bar{u}$ be a function in $H_0$ and $F$ a subspace of $H_0$. The best approximation of $\bar{u}$ in $F$ reads as the minimization of the $L^2$ norm of the difference between $u \in F$ and the target function $\bar{u}$, that is,

\[
\min_{u \in F} \mathcal{J}(u) = \frac{1}{2} \left\| u - \bar{u} \right\|^2 = \frac{1}{2} \int_0^1 \left| u(t) - \bar{u}(t) \right|^2 \mathrm{d}t. \tag{2}
\]

2.1.2. Parametric functional

The parametric approach consists in approaching $\bar{u}$ in a finite-dimensional subspace $F$. Let $\{u_k\}_{k=1}^{N}$ be a free family of functions in $H_0$. The space $F = \operatorname{span}\{\ldots, u_k, \ldots\}$ is a subspace


of $H_0$ of dimension $N$ (or equivalently of degree $N-1$ for polynomial spaces). A function $u$ of $F$ is noted $u[x]$, where $x \in \mathbb{R}^N$ is the vector of coefficients in the basis $\{u_k\}_{k=1}^{N}$, that is

\[
u[x](t) = \sum_{k=1}^{N} x_k u_k(t). \tag{3}
\]

The parametric objective function is the following application

\[
J(x) \overset{\text{def.}}{=} \mathcal{J}(u[x]) = \frac{1}{2} \int_0^1 \Big| \sum_{k=1}^{N} x_k u_k(t) - \bar{u}(t) \Big|^2 \mathrm{d}t, \qquad x \in \mathbb{R}^N, \tag{4}
\]

which is merely the restriction of the shape functional (2) to the subspace $F$ in terms of the design parameters $x$. Note that the parametric functional essentially relies on the search space basis, which is not necessarily unique: for a given subspace the choice of the basis may be critical. Namely, it can yield bad numerical properties for the convergence of numerical optimization algorithms, such as slow convergence due to bad conditioning, which justifies the need for preconditioning methods.

The gradient $G$ and the Hessian matrix $H$ of $J$ are

\[
G(x) = \begin{pmatrix} \vdots \\ (u[x] - \bar{u},\, u_k) \\ \vdots \end{pmatrix}
\quad \text{and} \quad
H(x) = \begin{pmatrix} & \vdots & \\ \cdots & (u_k, u_j) & \cdots \\ & \vdots & \end{pmatrix}. \tag{5}
\]

In a simpler way, the parametric functional reads as the quadratic form

\[
J(x) = \frac{1}{2} x^T H x - b^T x + c, \tag{6}
\]

where $b_k = (\bar{u}, u_k)$ and $c = \frac{1}{2}\|\bar{u}\|^2$. Note that $H$ is s.p.d. In that case the optimality condition $G(x) = 0$ is sufficient and equivalent to solving the linear system $Hx = b$.

2.1.3. Application with P1 elements

Let us consider P1 elements as approximation space of $\bar{u}$. That is, we aim at approaching $\bar{u}$ with a piecewise linear function (this approach will be considered as CAD-free in the sense that it relies on a mesh $\mathcal{T}_h$, contrary to Bézier or other meshless representations, see Section 5).

Let $\mathcal{T}_h$ be a uniform discretization of the interval $[0\,1]$: $t_k = kh$, $h = \frac{1}{N+1}$, $k = 0, \ldots, N+1$. The P1 functions ('hat' functions, see Figure 1) are defined by

\[
u_k(t) = \begin{cases}
\dfrac{t - t_{k-1}}{h} & t \in [t_{k-1}\, t_k],\ k > 0, \\[6pt]
\dfrac{t_{k+1} - t}{h} & t \in [t_k\, t_{k+1}],\ k < N+1, \\[6pt]
0 & t \notin [t_{k-1}\, t_{k+1}].
\end{cases} \tag{7}
\]


The local support of the functions $u_k$ implies that the Hessian matrix, noted $H_h$, has a band structure. With linear approximation the bandwidth is 3:

\[
(H_h)_{jk} = \begin{cases}
\displaystyle \int_0^h \Big(\frac{t}{h}\Big)^2 \mathrm{d}t = \frac{h}{3} & j = k = 0,\ j = k = N+1, \\[8pt]
\displaystyle 2\int_0^h \Big(\frac{t}{h}\Big)^2 \mathrm{d}t = \frac{2h}{3} & 0 < j = k < N+1, \\[8pt]
\displaystyle \int_0^h \frac{t}{h}\Big(1 - \frac{t}{h}\Big) \mathrm{d}t = \frac{h}{6} & j = k+1,\ j = k-1.
\end{cases} \tag{8}
\]

Hence we have $H_h = \frac{h}{6} B$ with

\[
B = \begin{pmatrix}
4 & 1 & & & \\
1 & 4 & 1 & & \\
& \ddots & \ddots & \ddots & \\
& & 1 & 4 & 1 \\
& & & 1 & 4
\end{pmatrix} \in \mathbb{R}^{N \times N}, \tag{9}
\]

where we have applied Dirichlet boundary conditions (i.e. we have ignored the functions $u_0$ and $u_{N+1}$). The right-hand side (RHS) $b_h$ is given by $(b_h)_k = (\bar{u}, u_k)$.

$B$ is a real symmetric matrix. As such it admits an orthogonal diagonalization $B = \Sigma \Lambda \Sigma^T$ with real eigenvalues. Moreover $B$ is strictly diagonally dominant. According to the Gershgorin theorem the spectrum of $B$ is such that $\sigma(B) \subset [2\,6]$. Consequently, the condition number of $B$ is bounded, $\kappa_2 \le 3$, regardless of the mesh size $N$.
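These spectral bounds are easy to check numerically. The following sketch (Python/NumPy; the helper name is ours) assembles $B$ for several dimensions and verifies that the spectrum stays inside $[2\,6]$ and the condition number below 3:

```python
import numpy as np

def best_approx_matrix(N):
    """Tridiagonal Gram matrix B of the P1 hat basis: H_h = (h/6) B."""
    return 4.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)

# Gershgorin: every eigenvalue lies in [4 - 2, 4 + 2] = [2, 6],
# hence the 2-norm condition number is bounded by 3 for every N.
for N in (7, 15, 255):
    B = best_approx_matrix(N)
    eig = np.linalg.eigvalsh(B)
    assert 2.0 < eig.min() and eig.max() < 6.0
    assert np.linalg.cond(B) < 3.0
```

The bound is strict for every finite $N$; only in the limit $h \to 0$ do the extreme eigenvalues approach 2 and 6.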

2.2. Poisson equation

The 1D Poisson equation on the closed interval $[0\,1]$ with homogeneous Dirichlet boundary conditions reads

\[
\begin{cases}
-u''(t) = f(t) & t \in \,]0\,1[ \\
u(0) = u(1) = 0
\end{cases} \tag{10}
\]

for some given function $f$.

Figure 1. P1 elements of $\mathcal{T}_h$: the 'hat' functions $u_0, \ldots, u_{N+1}$ over the nodes $t_0, \ldots, t_{N+1}$.


Considering a centred finite differences scheme evaluated at the nodes $t_k$ of the uniform mesh $\mathcal{T}_h$ yields the linear system $A_h u_h = f_h$ where

\[
A_h = \frac{1}{h^2} A = \frac{1}{h^2} \begin{pmatrix}
2 & -1 & & & \\
-1 & 2 & -1 & & \\
& \ddots & \ddots & \ddots & \\
& & -1 & 2 & -1 \\
& & & -1 & 2
\end{pmatrix} \in \mathbb{R}^{N \times N}, \tag{11}
\]

$(u_h)_k = u(t_k)$ and $(f_h)_k = f(t_k)$. $A$ is a real symmetric matrix: it has an orthogonal diagonalization $A = S M S^T$ with real eigenvalues.

2.3. Spectral analysis

The previously defined model problems lead to the linear systems $H_h x = b_h$ and $A_h u_h = f_h$. Before discussing the numerical treatment for solving these equations, let us discuss the eigenstructure of $H_h$ and $A_h$.

We have seen that $H_h = \frac{h}{6} B$ and $A_h = \frac{1}{h^2} A$, where $A$ and $B$ do not depend on the mesh size $h$. It is easy to verify that both model problems are related by

\[
B = 6I - A. \tag{12}
\]

Hence the diagonalization of $B$ is related to that of $A$ as follows:

\[
\Sigma = S \quad \text{and} \quad \Lambda = 6I - M. \tag{13}
\]

Both matrices have the same eigenvectors, associated to eigenvalues which are inversely ordered (and shifted).

The diagonalization of $A$ is well known (see, e.g. [16,17]): let $S_k$ denote the $k$th eigenvector associated to $\mu_k$; we have

\[
S_k = \sqrt{2h} \begin{pmatrix} \vdots \\ \sin(jk\pi h) \\ \vdots \end{pmatrix}
\quad \text{and} \quad
\mu_k = 2 - 2\cos(k\pi h). \tag{14}
\]

Remark 2.1 $\Sigma$ is orthogonal and, in case of Dirichlet conditions, also symmetric, so that $\Sigma = \Sigma^T = \Sigma^{-1}$.

The eigenvectors (Figure 2) are discrete Fourier modes on $\mathcal{T}_h$. Each eigenvector is characterized by a frequency parameter $\theta_k = k\pi h$. We refer to the eigenvectors $S_k$ such that $\theta_k \ge \frac{\pi}{2}$ as high frequency (HF) modes, and to the remaining eigenvectors as low frequency (LF) modes.

Using the terminology introduced above, the two model problems can be distinguished according to their eigenstructure: for the discrete Poisson problem, the LF modes are associated with the smaller eigenvalues; for the best approximation problem, the pairing of modes with frequencies is inverse: the LF modes are associated with the larger eigenvalues. Vice versa, the HF modes are associated with the larger eigenvalues in the case of the Poisson equation, whereas they are associated with the smaller eigenvalues in the case of the best approximation problem (Figure 3).


Figure 2. Eigenvectors of $A$ and $B$: discrete Fourier modes $S_1, \ldots, S_{16}$ ($N = 16$).

Figure 3. Eigenvalues of $A$ and $B$: (a) $\sigma(A)$ (the $\mu_k$) and (b) $\sigma(B)$ (the $\lambda_k$), with the LF/HF pairing reversed between the two spectra.


In accordance with this structure the linear operators $A_h$ and $H_h$ have opposite smoothing properties. Indeed, a matrix-vector product amplifies the modes of largest eigenvalues: the LF or HF modes, depending on the problem.

The analytical knowledge of the diagonalization of $B$ will be useful for the convergence study of the two-grid ideal schemes. Anticipating the sequel, one can already guess that a classical MG strategy (in the sense of a PDE-like strategy) will fail in this framework since we lack a smoothing operator. In order to explain this argument in detail we will review in Section 3 the decay properties of basic iterative methods, focusing on the optimization point of view.

Remark 2.2 The MG methods are intended to solve stiff linear systems; the best approximation problem in the P1 parameterization is, however, well-conditioned ($\kappa_2 \le 3$). In Section 5 we will generalize this problem to a parametric optimization problem (mesh-independent parameterization, i.e. without using $\mathcal{T}_h$), where the condition number can become pathologically high as the space dimension increases. This technique is however preferred in engineering since there are fewer variables, the shape is smoother and it yields a CAD model.

2.4. Relevance of the shape optimization model problem

Our motivation for studying such a problem comes from the observation that for some PDE-constrained shape optimization problems in engineering, the Hessian matrix exhibits a spectral structure similar to that of the model problem [18]. Therefore, assuming enough regularity on the criterion such that there exists a vicinity around the solution where the Hessian remains positive definite, the local convergence towards this solution is expected to have similar properties to this model problem. Application to antenna design is illustrated in Section 6 of this article.

3. Decay factors of the basic iterative methods

In this section we consider basic iterative methods for solving the model problems from two points of view: as a linear system $Mx = b$ (Jacobi method) or, equivalently, as the minimization of the quadratic form $\frac{1}{2} x^T M x - b^T x$ when $M$ is symmetric positive definite (steepest descent method). We derive their amplification matrices.

3.1. Jacobi iteration and steepest descent method

The Jacobi method for solving a linear system $Mx = b$ reads

\[
x_{i+1} = \left(I - \tau D_M^{-1} M\right) x_i + \tau D_M^{-1} b, \tag{15}
\]

where $D_M$ is the diagonal part of $M$ and $\tau$ a relaxation parameter. Assuming, without loss of generality, that the system has already been scaled such that the diagonal of $M$ is the identity, the amplification matrix of the Jacobi iteration with relaxation parameter $\tau$ reads

\[
G_\tau = I - \tau M. \tag{16}
\]


The gradient of the quadratic form reads $J'(x) = Mx - b$, hence the steepest descent method consists in the following linear iteration

\[
x_{i+1} = (I - \tau M) x_i + \tau b, \tag{17}
\]

where the step $\tau$ can either be fixed or given by a line search along the gradient direction $Mx_i - b$. It appears that the steepest descent iteration with step $\tau$ can be seen as a Jacobi iteration with relaxation parameter $\tau$ if the problem has been properly scaled beforehand (i.e. the diagonal of $M$ is the identity). Consequently, the amplification matrix is equivalent to (16) for one descent iteration.

In terms of the iterative error $e_i = x_i - x$, where $x$ is the fixed point, the iteration reads $e_{i+1} = G_\tau e_i$. The absolute values of the eigenvalues $g_k$ of $G_\tau$ are called decay factors.
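As a concrete illustration (a sketch in Python/NumPy; the fixed-step variant and the function name are ours), the iteration (17) applied to the scaled best approximation matrix $M = B/4$ converges linearly, the error being contracted by $G_\tau$ at each step:

```python
import numpy as np

def steepest_descent_fixed_step(M, b, tau, iters):
    """Fixed-step steepest descent on J(x) = x^T M x / 2 - b^T x.
    With diag(M) = I this is exactly the relaxed Jacobi iteration (15)."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x - tau * (M @ x - b)   # i.e. x_{i+1} = (I - tau M) x_i + tau b
    return x

# Scaled best approximation problem: M = B/4, so G_tau = I - (tau/4) B.
N = 15
B = 4.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)
M = B / 4.0
b = np.random.default_rng(0).standard_normal(N)
x_star = np.linalg.solve(M, b)

# The error obeys e_{i+1} = G_tau e_i; with tau = 1 all decay factors
# are below 1/2, so 100 iterations reduce the error essentially to zero.
x = steepest_descent_fixed_step(M, b, tau=1.0, iters=100)
assert np.linalg.norm(x - x_star) < 1e-10
```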

3.2. Decay functions of the basic methods applied to the model problems

Let us apply this method to both model problems in order to derive the decay factors, thus providing useful information about the convergence rate.

Let us assume that both problems are scaled, hence the amplification matrices w.r.t. the relaxation parameter $\tau$ read

\[
G^A_\tau = I - \frac{\tau}{2} A \quad \text{and} \quad G^B_\tau = I - \frac{\tau}{4} B. \tag{18}
\]

Figure 4 illustrates that the decay functions are monotonous functions of the mode frequency: decreasing in the case of the Poisson equation and increasing in the case of the shape optimization problem.

Provided that all decay factors remain in $]{-}1\;1[$ to ensure convergence, the relaxation parameter $\tau$ can be set to adapt the decay function. Ideally, one would optimize $\tau$ such that the spectral radius is minimized. However, in a stiff problem such as the discrete Poisson equation, the decay factor associated with the smallest eigenvalue remains close to one, whatever the value of $\tau$. Alternatively, it may be more relevant to optimize $\tau$ on a subset of the space rather than globally. In the sequel we divide the search space into two complementary subspaces: the

Figure 4. Decay function of one Jacobi (steepest descent) iteration for some values of the relaxation parameter $\tau$ and for $N = 255$: (a) Poisson equation ($\tau = 1/2, 2/3, 1$) and (b) best approximation ($\tau = 2/3, 4/5, 1$).


subspace spanned by the LF eigenmodes and the subspace spanned by the HF eigenmodes, identified, respectively, by the subsets of indexes $I_{LF}$ and $I_{HF}$ (assuming that $N$ is odd)

\[
I_{LF} = \left\{1, \ldots, \frac{N-1}{2}\right\}, \qquad I_{HF} = \left\{\frac{N+1}{2}, \ldots, N\right\}.
\]

In the following sections we provide the optimal value of $\tau$ that minimizes one of the following criteria

\[
\rho = \max_{k \in I_{LF} \cup I_{HF}} |g_k|, \qquad \rho_{LF} = \max_{k \in I_{LF}} |g_k|, \qquad \rho_{HF} = \max_{k \in I_{HF}} |g_k|,
\]

according to each problem.

3.2.1. Optimal relaxation parameter for the Poisson equation

First note that the method converges only when $\tau$ lies in the interval $]0\;1]$. This ensures that all decay factors are such that $|g^A_k| < 1$. For all $\tau$ in $]0\;1]$, the decay factors are positive in the LF part. Since the decay function is monotonously decreasing w.r.t. $k$, the decay factor of maximum absolute value is $g^A_1 = 1 - \tau + \tau\cos(\pi h)$, which is minimized for $\tau = 1$, yielding $\rho_{LF} = \cos(\pi h)$. This means that the iteration cannot be efficient on the LF part, unless $h$ is large enough (i.e. $N$ must be small: a coarse parameterization/grid). Since $\rho$ is necessarily greater than or equal to $\rho_{LF}$ (here $\rho = \rho_{LF}$), it makes no sense to optimize $\tau$ globally.

On the contrary, $\tau$ can be set to minimize $\rho_{HF}$. In that sense the optimal value is $\tau = \frac{2}{3}$, for which the largest decay factor is obtained at $k = \frac{N+1}{2}$ and equal to $\frac{1}{3}$. Note that this value is independent of the mesh size. In the MG terminology this method is called a smoother since it efficiently reduces the HF part of the error. In the literature, the smoother is sometimes called the solution operator.
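These two regimes can be reproduced numerically; the sketch below (Python/NumPy, variable names ours) evaluates the decay factors of $G^A_{2/3}$ for $N = 255$ and checks that the HF decay is $1/3$ while the LF decay stays close to one:

```python
import numpy as np

N = 255
h = 1.0 / (N + 1)
k = np.arange(1, N + 1)
mu = 2.0 - 2.0 * np.cos(k * np.pi * h)        # eigenvalues of A, cf. (14)

def decay_factors(tau):
    # |eigenvalues| of G_tau^A = I - (tau/2) A on the scaled problem (18)
    return np.abs(1.0 - 0.5 * tau * mu)

g = decay_factors(2.0 / 3.0)
lf = k <= (N - 1) // 2
hf = k >= (N + 1) // 2

# tau = 2/3 is a smoother: the HF decay factor is 1/3, independently of h,
assert np.isclose(g[hf].max(), 1.0 / 3.0)
# ... while the LF decay factor stays close to one (relaxation alone stalls).
assert g[lf].max() > 0.99
```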

3.2.2. Optimal relaxation parameter for the shape inverse problem

In this case the convergence of the method requires that � lies in �0 43� (allowing � to be an

over-relaxation parameter). The main difference with the previous problem is that � can beset efficiently for either the LF or the HF part. This comes from the fact that no eigenvalueis close to zero. In other words this problem is well conditioned.

From a global point of view, the optimal value to minimize � is obtained at �¼ 1and yields � ¼ 1

2 cosð�hÞ, which is mesh dependent but bounded from above by 12. This

defines the best global solution operator. On the HF part, the optimal value is�HF ¼

13 ð2 cosð�hÞ � 1Þ and reached at � ¼ 4

3, which is also mesh dependent, thoughbounded from above by 1

3. Finally, on the LF part, the optimal value is mesh independent,namely �LF ¼

15 and reached at � ¼ 4

5.Keeping in mind that we aim at constructing a mesh-independent ML algorithm,

the only promising solution operator is the one that operates on the LF half-space(an anti-smoother). Besides, as illustrated in Section 5 or in [19], the Bezier–Bernsteinparameterization deteriorates the conditioning and there is no particular setting of theparameter resulting in a smoother. Inversely, an efficient solution operator is given only bysome � that minimizes the convergence rate of the LF subspace (�LF for the LF half-space,but situations exist for which it is necessary to define the solution operator on a subspaceof smaller dimension, see Section 5).


Nevertheless, in order to carry our analysis further, we will use the P1 parameterization (for which we do know a closed form of the spectrum) and assume that the solution operator is an anti-smoother.

3.2.3. Mesh size limit h! 0

Let $G^A_n$ and $G^B_n$ be the amplification matrices equivalent to $n$ Jacobi iterations with the respective optimal parameter $\tau$ as defined previously. When the mesh is refined and $h$ tends towards 0, the spectral radius of each problem has the following trend:

\[
\rho(G^A_n) \to \frac{(1 + 2)^n}{3^n} = 1 \quad \text{and} \quad \rho(G^B_n) \to \left(\frac{3}{5}\right)^n.
\]

In the next section our aim is to show how we can improve the convergence rate of the optimization problem using an MG strategy, as can be done with the Poisson equation.

4. MG-like methods for parametric shape optimization

Let us briefly sketch the basic ideas of the MG methods (we refer to [20] for a complete review). Assume that you are given a mesh, namely the fine mesh. The approximation space defined on this mesh can be seen as the direct sum of two complementary subspaces: an LF space and an HF space. The efficiency of MG methods for solving a PDE relies on the complementarity of two 'ingredients':

(1) a simple iterative method (Jacobi, Gauss–Seidel, SOR, etc.) easily reduces the HF components of the error: the smoother or solution operator;
(2) transfer operators between the fine mesh and coarser meshes and partial or complete reduction of the remaining LF content of the iterative error; this step is classically referred to as the Coarse Grid Correction (CGC).

These ingredients are assembled in MG cycles which are composed of relaxation phases (smoothing phases) and Coarse Grid Correction phases via transfer operators (Nested Iterations, V-, W-cycles, etc.).

On the unique fine grid we have seen that the spectral radius of the solution operator is mesh size dependent and close to one (a bad convergence rate on the LF subspace). In the MG strategy, the smoother is such that the convergence rate on the HF subspace is mesh size independent. Then the LF modes are well represented (if not exactly) on a coarser mesh. Relative to this coarse mesh, the modes of highest frequency within the LF modes become the HF modes. Again, the solution operator can be used on this new mesh to efficiently reduce (i.e. at a mesh size independent convergence rate) the error in the newly defined HF subspace. Moreover, the computational work is smaller on this coarser mesh. This can be repeated recursively on coarser meshes. On the coarsest mesh, the number of d.o.f. is assumed to be small enough to permit the exact solution of the coarse-grid problem (for instance with a direct method). In that case the MG algorithm is said to be ideal. In the end, under certain assumptions, one can prove the following properties: the convergence rate is mesh size independent; the computational work is proportional to the number of nodes $N$.

An MG cycle can be seen as a linear iteration. Rigorously, we need to investigate the spectrum of the amplification matrix of one MG cycle. An ideal two-level cycle is sufficient


to prove that the convergence is mesh size independent or not. This is the aim of this section.

4.1. Transfer operators

Let us consider two grids: the fine mesh $\mathcal{T}_h$ and the coarse mesh $\mathcal{T}_{2h}$ as defined in Section 2.1.3 with, respectively, $N$ and $N'$ interior nodes (Figure 5). Assume that the problem dimensions are $N = 2^p - 1$ on the fine grid and $N' = 2^{p-1} - 1$ on the coarse grid for some $p > 1$. In that case the relation $2N' + 1 = N$ holds, which is equivalent to saying that the grid interval on the coarse grid is twice as large: $h' = 2h$.

We first need to define transfer operators between these grids: a prolongation operator $P : \mathcal{T}_{2h} \to \mathcal{T}_h$ and a restriction operator $R : \mathcal{T}_h \to \mathcal{T}_{2h}$. Since we have assumed Dirichlet boundary conditions, we only need to define transfer operators between the interior nodes.

Once these operators are defined, two definitions of a coarse sub-problem are distinguished [20]:

• Galerkin Coarse grid Approximation (GCA): the matrix of the coarse problem $A'$ is obtained by projection of the matrix of the fine problem using the transfer operators, i.e. $A' = R A_h P$;
• Discrete Coarse grid Approximation (DCA): the matrix of the coarse problem $A'$ is obtained by discretization on the mesh $\mathcal{T}_{2h}$ of the original problem, i.e. $A' = A_{2h}$.

In the sequel we define $P$ as the linear interpolation operator. Regarding the Poisson equation, if the restriction operator is the arithmetic-average operator $R = \frac{1}{2} P^T$, then both definitions are equivalent ($A_{2h} = \frac{1}{2} P^T A_h P$). Regarding the shape optimization problem, the equivalence is achieved with $R = P^T$ ($H_{2h} = P^T H_h P$).


Let us introduce a few notations: let $r^h = A^h u^h - f^h$ (resp. $r^h = H^h x - b^h$) be the residual. By linearity, for all $u^h$ (resp. $x$) we have the equality $A^h e^h = A^h u^h - f^h = r^h$ (resp. $H^h e^h = H^h x - b^h = r^h$), where $\bar{u}$ (resp. $\bar{x}$) is the exact solution and $e^h = u^h - \bar{u}$ (resp. $e^h = x - \bar{x}$) the error.

4.2. Two-grid ideal algorithms

It is sufficient for a convergence proof of an MG cycle (V-cycle, W-cycle, saw-tooth, etc.) to consider only two grids: a coarse grid and a fine grid. Indeed, it is always possible to apply recursively a two-grid algorithm on the coarse grid (which then becomes the new fine grid).

Figure 5. Fine $\mathcal{T}_h$ and coarse $\mathcal{T}_{2h}$ discretizations.


Remark 4.1 We recall that we want to solve the linear systems $A^h u^h = f^h$ (Poisson equation) and $H^h x = b^h$ (shape optimization) on a fine grid $\mathcal{T}_h$. In the sequel, the context of the problem will be obvious; therefore, we skip the superscripts $A$ and $B$ on the amplification matrices for the sake of simplicity.

4.2.1. Classical formulation

Let $\mathcal{T}_h$ and $\mathcal{T}_{2h}$ be two grids as defined in Section 2.1.3. The initial approximation is noted $u_h^{(0)}$. A two-grid ideal algorithm involves three phases.

Algorithm 4.1

(1) a pre-relaxation phase on the fine grid: the error is smoothed with a few Jacobi iterations; another approximation $u_h^{(1)}$ is obtained:

$u_h^{(1)} = G^k u_h^{(0)} + f'_h; \quad (19)$

(2) a coarse grid correction phase: the residual

$r = A^h u_h^{(1)} - f^h \quad (20)$

verifies the following equation on the coarse grid

$R A^h P\, e^{2h} = R r, \quad (21)$

which is solved exactly. We deduce a second approximation of the solution on the fine grid by prolongation of the error $e^{2h}$ as a correction term

$u_h^{(2)} = u_h^{(1)} - P e^{2h} = u_h^{(1)} - P (R A^h P)^{-1} R r; \quad (22)$

(3) a post-relaxation phase: the final approximation is given with relaxation of $u_h^{(2)}$, that is

$u_h^{(3)} = G^k u_h^{(2)} + f'_h. \quad (23)$

For the convergence analysis of such an algorithm, it is sufficient to consider a single Jacobi iteration as relaxation phase. The amplification matrix equivalent to the ideal cycle thus reads

$G = G_\tau \left[ I - P (R A^h P)^{-1} R A^h \right] G_\tau. \quad (24)$

The efficiency of this ideal cycle can be proved using the spectral decomposition of G.

4.2.2. Spectral radius of the ideal cycle

The convergence proof can be found in [16,17] for instance. For the sake of comparison with the shape optimization problem, here is a sketch of the proof: first we show that $G$ is similar to a matrix of the form $\Gamma D^2$, where $D$ is diagonal and $\Gamma$ has non-zero entries on the diagonal and perdiagonal only. We deduce the spectrum of $G$ from that of $\Gamma D^2$. We have the following result:

• there are $N'$ eigenvalues equal to zero;
• there are $N' + 1$ eigenvalues equal to $\frac{1}{9}$;

thus $\rho(G) = \frac{1}{9}$. In short, the method converges ($\rho < 1$) and the convergence rate is mesh-independent. This result shows the optimal efficiency of the MG method.


To fix ideas, if the fine grid is such that $N = 31$ ($h = \frac{1}{32}$), we can compute the number of Jacobi iterations needed to reach the same convergence rate. That is, we look for the number of iterations $n$ such that

$\rho(G^n) \le \frac{1}{9}, \quad (25)$

which is

$n \ge \frac{\log 9}{\log 3 - \log(1 + 2\cos \pi h)} \approx 684. \quad (26)$

Based on this result, since two Jacobi iterations are used as pre- and post-relaxation phases, the computational work done for the CGC needs to be less than that of 682 Jacobi iterations for the cycle to be worthwhile. For time complexity results we refer to [16,20].
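This two-grid analysis is easy to verify numerically. The sketch below is illustrative (not the paper's code); it assumes the relaxation is weighted Jacobi with weight $\tau = 2/3$, which is consistent with the per-iteration rate $(1 + 2\cos\pi h)/3$ implicit in (26).

```python
import numpy as np

def poisson(N):
    """1-D Poisson matrix tridiag(-1, 2, -1)/h^2 with h = 1/(N+1)."""
    h = 1.0 / (N + 1)
    return (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2

def interpolation(Np):
    """Linear interpolation from N' coarse to N = 2N'+1 fine interior nodes."""
    P = np.zeros((2 * Np + 1, Np))
    for j in range(Np):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

def two_grid_rho(p, tau=2.0 / 3.0):
    """Spectral radius of the ideal cycle (24) for the 1-D Poisson problem."""
    N, Np = 2**p - 1, 2**(p - 1) - 1
    A = poisson(N)
    P = interpolation(Np)
    R = 0.5 * P.T                                            # arithmetic-average restriction
    G = np.eye(N) - tau * np.diag(1.0 / np.diag(A)) @ A      # weighted Jacobi smoother
    K = np.eye(N) - P @ np.linalg.solve(R @ A @ P, R @ A)    # exact coarse grid correction
    return max(abs(np.linalg.eigvals(G @ K @ G)))

print([round(two_grid_rho(p), 6) for p in (4, 5, 6)])
```

The computed spectral radius is $1/9 \approx 0.111$ at every level, in agreement with the mesh-independence result above.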

4.2.3. Optimization formulation

We are now interested in the shape optimization problem. We first apply the two-grid scheme straightforwardly, as described in Section 4.2.1. We still consider the P1 parameterization on $\mathcal{T}_h$ and $\mathcal{T}_{2h}$, as well as the linear interpolation operator $P$ as prolongation operator.

Let us sketch a similar ideal algorithm in the framework of the shape optimization problem. Let $x^{(0)}$ be an initial approximation of the optimal design parameters. Let us describe the different phases of an ideal algorithm in the optimization terminology:

• the relaxation phase is regarded as one steepest descent iteration on the scaled problem with step $\tau$;
• in terms of optimization, the CGC is still an optimization problem. Hence the ideal CGC returns the minimum of the objective on the coarse mesh:

$\min_{y \in \mathcal{T}_{2h}} J(x + Py) \quad (27)$

for a given $x$ obtained, for instance, by a previous relaxation. The objective function on the coarse grid remains quadratic and the domain is convex, hence the global minimum is fully characterized by its stationarity conditions.

With a pre-relaxation phase and a post-relaxation phase we have Algorithm 4.2.

Algorithm 4.2

(1) relaxation: $x^{(1)} = G_\tau x^{(0)} + b'_h$;
(2) CGC: $\Delta y = \arg\min_{y \in \mathcal{T}_{2h}} j(y) = J(x^{(1)} + Py)$, $x^{(2)} = x^{(1)} + P \Delta y$;
(3) relaxation: $x^{(3)} = G_\tau x^{(2)} + b'_h$.

When the objective is strictly convex and without constraints, the unique minimum is characterized by the stationarity conditions. In the case of the CGC, since the gradient is linear we have

$g(y) = P^T H^h P y + P^T r, \quad (28)$

where $r = H^h x^{(1)} - b^h$ is the residual. Given that the stationarity conditions on the coarse grid read $g(\Delta y) = 0$, we have

$x^{(2)} = x^{(1)} - P (P^T H^h P)^{-1} P^T r. \quad (29)$


Therefore the amplification matrix equivalent to the cycle reads

$G = G_\tau \left[ I - P (P^T H^h P)^{-1} P^T H^h \right] G_\tau, \quad (30)$

which has exactly the same structure as the amplification matrix (24). Note that the restriction operator must be the transpose of the prolongation operator, $R = P^T$.

4.2.4. Spectral radius of the ideal cycle

The proof sketched here follows the one given in [16,17] (for the detailed proof we refer to the appendix of [21] or [18]). As for the Poisson equation, we first apply similar transformations to simplify the diagonalization. We show that $G$ is also similar to a matrix of the form $\Gamma D^2$, where $D$ is diagonal and the structure of $\Gamma$ is illustrated in Figure 6. The eigenvalues are, however, more difficult to derive. Indeed, while the structure of $\Gamma$ is simple, it is harder to simplify its entries. We adopt the following strategy:

(a) an ansatz on the eigenvector structure (non-zero entries) is set;
(b) linear systems are deduced from this hypothesis;
(c) these linear systems are solved using the symbolic calculus software Maple™ [22];
(d) the solutions and their linear independence are verified.

In short, we obtain the following results:

• there are $N'$ eigenvalues $\lambda_k$ equal to zero;
• $\lambda_{N'+1} = \frac{1}{25}$ is an obvious eigenvalue;

Figure 6. Structure of the $\Gamma$ matrix: only entries in A and B (diagonal and perdiagonal) are non-zero.


• the remaining $N'$ eigenvalues given by Maple™ are

$\lambda_{N'+1+k} = \frac{1}{25} \cdot \frac{27 - 144\cos^2\frac{\theta_k}{2} + 304\cos^4\frac{\theta_k}{2} - 320\cos^6\frac{\theta_k}{2} + 160\cos^8\frac{\theta_k}{2}}{3 - 8\cos^2\frac{\theta_k}{2} + 8\cos^4\frac{\theta_k}{2}}.$

Hence the spectral radius of the ideal cycle is

$\rho(\Gamma D^2) = \max_{k=1\ldots N} |\lambda_k| = \max_{k=N'+1\ldots N} |\lambda_k|.$

At this stage one can show that $\lambda_{N'+1+k} > \frac{1}{25}$ for $k = 1 \ldots N'$, and that $\lambda$ is a monotonically decreasing function on the interval $\theta \in\, ]0, \frac{\pi}{2}[$. Thus the maximum is attained at $\theta = \theta_1$, i.e. $\rho(\Gamma D^2) = \lambda_{N'+2}$, which is mesh-dependent. When the mesh size tends to zero ($h \to 0$) we have $\lambda_{N'+2} \to \frac{9}{25} = \left(\frac{3}{5}\right)^2$. This is exactly the convergence rate of two Jacobi iterations (Section 3.2). In other words, the Coarse Grid Correction is useless.
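The failure of the classical scheme can be reproduced numerically. In the sketch below (an illustration; it assumes, as in Section 3, that $H^h$ is the diagonally scaled P1 mass matrix of the model problem), the spectral radius of (30) with $Q = P$ is mesh-dependent and approaches $9/25 = 0.36$ from below as the grid is refined.

```python
import numpy as np

def scaled_mass(N):
    """Diagonally scaled P1 mass matrix of the model problem: diag 1, off-diagonals 1/4."""
    return np.eye(N) + 0.25 * (np.eye(N, k=1) + np.eye(N, k=-1))

def interpolation(Np):
    """Linear interpolation from N' coarse to N = 2N'+1 fine interior nodes."""
    P = np.zeros((2 * Np + 1, Np))
    for j in range(Np):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P

def cycle_rho(p, tau=0.8):
    """Spectral radius of (30): G_tau [I - P (P^T H P)^{-1} P^T H] G_tau."""
    N, Np = 2**p - 1, 2**(p - 1) - 1
    H = scaled_mass(N)
    P = interpolation(Np)
    G = np.eye(N) - tau * H                                      # gradient step, tau = 4/5
    K = np.eye(N) - P @ np.linalg.solve(P.T @ H @ P, P.T @ H)    # exact CGC with Q = P
    return max(abs(np.linalg.eigvals(G @ K @ G)))

rhos = [cycle_rho(p) for p in (4, 5, 6)]
print(rhos)   # mesh-dependent, increasing towards 9/25 = 0.36
```

The radii grow with the problem size and stay just under $0.36$, i.e. the cycle is no better than two plain gradient iterations.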

4.3. Alternative transfer operators for shape optimization

The classical strategy fails in the case of the optimization problem because the fine grid and the coarse grid are not complementary: the relaxation phase behaves as an anti-smoother, while the restriction operator still behaves as a low-pass filter, so that only the modes whose errors have already been well damped, i.e. the LF modes, can be represented on the coarse grid. The zero eigenvalues of the amplification matrix correspond to the annihilated modes: the LF modes.

In this section we focus on the definition of new transfer operators for shape optimization. In this framework, the concepts of prolongation and restriction are extended to any kind of parameterization, such as Bézier curves, B-splines, etc., which are defined continuously on the domain, although the analysis is still done with the P1 CAD-free formulation. In addition, we guarantee that the new definition is consistent with the previously defined transfer operators between meshes (discrete structures).

Let $F$ be the reference search space (the fine 'grid') and $V$ a subspace of $F$ (the coarse 'grid'). $F$ has dimension $N$ and $V$ dimension $N'$ ($N' < N$). As seen in Section 2.1.2, the fine space relies on the basis $\{u_k\}_{k=1}^{N}$ (the so-called parameterization). As an element of the fine space, any element $v$ of $V \subset F$ can be written in this basis. In a general abstract formulation, a subspace $V$ is defined by

$V \equiv \left\{ v = \sum_{k=1}^{N} x_k u_k \in F \;\middle|\; x = Qy, \ \forall y \in \mathbb{R}^{N'} \right\}, \quad (31)$

where $Q$ is the matrix of the linear application from $\mathbb{R}^{N'}$ to $\mathbb{R}^{N}$ that maps the components of $v$ in $V$ to the parameterization basis of $F$ (the columns of $Q$ are a basis of $V$ in $F$): it can be seen as a prolongation operator.

The subspaces $V$ are regarded as correction spaces. In other words, for a given $\bar{x} \in F$, the coarse problem is a minimization problem on the affine subspace $\bar{x} + V$, i.e. $x = \bar{x} + \Delta x$, where $\Delta x = Qy$ for some $y \in \mathbb{R}^{N'}$. Thus the parametric objective function,


gradient and Hessian of the coarse problem read

$j(y) = J(\bar{x} + Qy), \quad (32)$

$g(y) = Q^T G(\bar{x} + Qy), \quad (33)$

$h(y) = Q^T H(\bar{x} + Qy)\, Q. \quad (34)$

The CGC of Algorithm 4.2 can be rewritten as

$\Delta y = \arg\min_{y \in V} j(y) = J(x^{(1)} + Qy), \qquad x^{(2)} = x^{(1)} + Q \Delta y.$

The amplification matrix equivalent to this new cycle reads

$G = G_\tau \left[ I - Q (Q^T H^h Q)^{-1} Q^T H^h \right] G_\tau. \quad (35)$

Note that the restriction operator is automatically taken as the transpose of the prolongation operator. We do not know anything about $Q^T H^h Q$ a priori. Hence the coarse sub-problem is defined in the sense of a Galerkin Coarse grid Approximation (GCA). In other words, the choice of the transfer operator $Q$ defines the sense of the 'coarse grid'. Here, coarse means fewer degrees of freedom but not necessarily a smoother approximation. This definition is consistent with the previous one when $Q$ is appropriately chosen. Three cases are considered:

(1) the subspaces are embedded parameterizations (referred to as the Y method);
(2) the subspaces are embedded parameterizations preconditioned by a perdiagonal permutation matrix (referred to as the Z method), as suggested in [19];
(3) the subspaces are eigenspaces (referred to as the $\Omega$ method).

For each of these cases the spectral radius of the ideal cycle is examined when the fine parameterization is composed of P1 elements.
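In code, the ideal CGC in the GCA sense reduces to a small coarse solve, whatever $Q$ is. The sketch below is a generic illustration (the operators are random stand-ins, and the function name is hypothetical); it applies the correction (29)/(35) and checks the coarse stationarity condition $Q^T (H x - b) = 0$.

```python
import numpy as np

def coarse_grid_correction(x, Q, H, b):
    """Ideal CGC in the GCA sense: minimize J over the affine subspace x + range(Q).

    For a quadratic J with gradient H x - b, the coarse stationarity condition
    Q^T (H (x + Q dy) - b) = 0 gives dy = -(Q^T H Q)^{-1} Q^T (H x - b).
    """
    r = H @ x - b                                  # fine-grid residual (gradient)
    dy = -np.linalg.solve(Q.T @ H @ Q, Q.T @ r)    # coarse sub-problem (GCA)
    return x + Q @ dy

# toy data: any s.p.d. H and full-rank Q act as 'fine' Hessian and transfer operator
rng = np.random.default_rng(0)
N, Np = 12, 5
M = rng.standard_normal((N, N))
H = M @ M.T + N * np.eye(N)                # s.p.d. Hessian
Q = rng.standard_normal((N, Np))           # hypothetical transfer operator
b = rng.standard_normal(N)
x1 = rng.standard_normal(N)

x2 = coarse_grid_correction(x1, Q, H, b)
print(np.abs(Q.T @ (H @ x2 - b)).max())    # coarse stationarity holds at round-off
```

The only ingredient that changes between the three methods below is the choice of the columns of `Q`.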

4.3.1. Embedded parameterizations

Assume that we are provided with embedded parameterization spaces $V_1 \subset V_2 \subset \cdots \subset V_N$. Each of them is considered with a basis $\{u_k^{N'}\}_{k=1}^{N'}$ (e.g. polynomial spaces of increasing degree with the Bernstein basis, B-splines of constant degree/order and an increasing number of splines with knot insertion, etc.). Let $F = V_N$. We assume that for any of these spaces there exists a linear application from $\mathbb{R}^{N'}$ to $\mathbb{R}^{N}$, noted $E^N_{N'}$, such that

$\forall y \in \mathbb{R}^{N'}, \quad x = E^N_{N'} y \implies v = \sum_{k=1}^{N'} y_k u_k^{N'} = \sum_{k=1}^{N} x_k u_k^{N} \in V_{N'}. \quad (36)$

In other words, $E^N_{N'}$ is the application that maps the components of $v$ in $V_{N'}$ into the basis of $F$. For polynomial spaces $\mathbb{P}_{N'}$ and $\mathbb{P}_N$ (of dimension $N'+1$ and $N+1$, resp.) in the Bernstein basis, this application results from the so-called degree elevation property [23]. Formally, the subspaces read

$V_{N'} \equiv \left\{ v = \sum_{k=1}^{N} x_k u_k^{N} \in F \;\middle|\; x = E^N_{N'} y, \ \forall y \in \mathbb{R}^{N'} \right\} \iff Q = E^N_{N'}. \quad (37)$


If the parameterization space is defined with P1 elements, then $Q$ is the classical linear interpolation operator $Q = P$. We have shown in this case that the algorithm is not efficient because a gradient iteration behaves as an anti-smoother operator. For other parameterizations, such as the Bézier parameterization, numerical experiments corroborate this result [24,25].

4.3.2. Preconditioning by spectrum permutation

In this second section we examine the method proposed in [19]. We keep the assumption under which we are provided with embedded parameterizations and their bases, together with the classical prolongation operators $E^N_{N'}$. We will use the fact that HF modes converge slowly. A new transfer operator is designed such that HF modes are projected on a 'coarse grid'.

Let $H$ be the Hessian matrix. It is a real s.p.d. matrix, therefore diagonalizable with orthogonal eigenvectors and real positive eigenvalues: $H = \Omega \Lambda \Omega^T$, $\Omega^T \Omega = \Omega \Omega^T = I_N$. We suppose (w.l.o.g.) that the eigenvalues $\lambda_k$ are ordered increasingly. Let us consider the following subspaces:

$V_{N'} \equiv \left\{ v = \sum_{k=1}^{N} x_k u_k \in F \;\middle|\; x = \Omega \Pi \Omega^T E^N_{N'} y, \ \forall y \in \mathbb{R}^{N'} \right\}, \quad (38)$

that is $Q = \Omega \Pi \Omega^T E^N_{N'}$, where $\Pi$ (noted $\Pi$ here to avoid confusion with the prolongation operator $P$) is the perdiagonal permutation matrix

$\Pi_{k,\,N+1-k} = 1, \quad \Pi_{kj} = 0 \ \text{otherwise} \quad (k, j = 1 \ldots N). \quad (39)$

The role of such a transfer operator is to reorganize the eigenpairs so that the relaxation operator becomes a smoother (in the search space, i.e. for the variable $y$, but an anti-smoother for the shape).

4.3.3. Spectral radius of the ideal cycle

We still consider the P1 parameterization on each level and $E^N_{N'} = P$. We conduct a spectral analysis of the amplification matrix (35) with $Q = \Omega \Pi \Omega^T P$. Again, we adopt the same methodology used to derive the spectral radius of the classical method:

(a) the coarse grid problem (GCA) $H' = Q^T H^h Q$ is simplified;
(b) we deduce a simpler form for $I - Q H'^{-1} Q^T H^h$ and similar transformations are applied to $G$; it follows that $G$ is again similar to a matrix of the form $\Gamma D^2$ where $D$ is diagonal;
(c) the entries of $\Gamma$ exhibit the structure illustrated in Figure 6;
(d) an ansatz is proposed for the eigenvector structure of $\Gamma D^2$ and the linear systems are solved using Maple™. An analytical formula is obtained, providing the spectral radius.

The complete proof, which is quite tedious, is done in the appendix of [21] or [18]. We find $\rho(G) = \frac{1}{25}$.

4.3.4. Eigenspace transfer operator

In this last method the embedded parameterizations are not necessary. It can be viewed as analogous to the algebraic version of the MG algorithm. Let us consider a single fine space $F$ with the basis $\{u_k\}$. Subspaces are directly deduced from the Hessian diagonalization: the subspace of dimension $N'$ is the space spanned by the last $N'$ eigenvectors (we have assumed that the eigenvectors are decreasingly ordered, to be consistent with Section 2.3), i.e.

$H = \Omega \Lambda \Omega^T = \begin{pmatrix} \Omega_1 & \Omega_2 \end{pmatrix} \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix} \begin{pmatrix} \Omega_1^T \\ \Omega_2^T \end{pmatrix}, \quad (40)$

where

$\Omega_1 = \begin{pmatrix} \omega_1 & \ldots & \omega_{N-N'} \end{pmatrix}, \qquad \Omega_2 = \begin{pmatrix} \omega_{N+1-N'} & \ldots & \omega_N \end{pmatrix} \quad (41)$

and

$V_{N'} \equiv \left\{ v = \sum_{k=1}^{N} x_k u_k \in F \;\middle|\; x = \Omega_2 y, \ \forall y \in \mathbb{R}^{N'} \right\} \iff Q = \Omega_2. \quad (42)$

The parametric Hessian reads

$h = \Omega_2^T H \Omega_2 = \Lambda_2, \quad (43)$

whose conditioning is necessarily better than that of the fine space Hessian, since

$\kappa_2(h) = \frac{\lambda_{N+1-N'}}{\lambda_N} < \frac{\lambda_1}{\lambda_N} = \kappa_2(H). \quad (44)$

Remark 4.2 One may consider linear equality constraints. In such a case $H$ is taken as the projected Hessian. The eigenvectors $\Omega$ belong to the feasible space. A coarse correction is transferred to the feasible 'fine grid' using the orthonormal basis $Z$ of the kernel of the constraints: $x = Z\Omega_2 y$ and $Q = Z\Omega_2$. Thus the coarse search spaces are not subject to any additional constraints.

4.3.5. Spectral radius of the ideal cycle

The proof is much simpler than for the previous method, using the fact that the coarse grid basis is exactly composed of eigenvectors. Thanks to orthogonality properties, $G$ is straightforwardly simplified. We deduce that $\rho(G) = \frac{1}{25}$ [18].

4.3.6. Comparison of the ideal algorithms with a single-level method

After permutation or eigenspace correction, the spectral radius of an ideal cycle is independent of the mesh size. With two Jacobi/gradient iterations (pre- and/or post-relaxation) with step $\tau = \frac{4}{5}$, this radius is

$\rho_{MG} = \frac{1}{25}. \quad (45)$

An $n$-step Jacobi/gradient method would give (Section 3.2)

$\rho_J < \left(\frac{3}{5}\right)^n \quad (46)$

for any mesh size. Hence we would only need $n \ge \frac{\log 25}{\log 5 - \log 3} \approx 6.30$ iterations to reach at least the same convergence rate (i.e. $n = 7$).


In other words, the work done on the coarse grid should not exceed the work done to run five Jacobi iterations for the MG algorithm to be more efficient. Again, this is due to the fact that the shape optimization problem is well conditioned with P1 elements. The conditioning often becomes very poor when using other types of parameterization (e.g. Bernstein polynomials, Bézier representation) or when physics comes into consideration through spatially variable coefficients.

4.4. Numerical experiments

In order to illustrate and confirm the theoretical results, let us carry out numerical experiments with the two-grid algorithms. The parameterization is piecewise linear (P1) on the grids $\mathcal{T}_{2^p h}$, where $p = 0, 1, \ldots, 7$ and $h = \frac{1}{N+1}$ is the mesh size with $N = 256$. The target function $\bar{u}$ is composed of sinusoidal and exponential terms (i.e. not exactly representable in a finite polynomial representation). One pre-relaxation iteration ($\nu_1 = 1$) and one post-relaxation iteration ($\nu_2 = 1$) with parameter $\tau = \frac{4}{5}$ are considered. Of the two alternative algorithms, only the algebraic approach (Section 4.3.4) will be considered in the numerical example, since they both have the same convergence rate.

The convergence is represented in terms of the discrete $L^2$ norm of the residual $r(x) \stackrel{\mathrm{def}}{=} \|H^h x - b^h\|_{L^2}$. The convergence threshold is set to $\varepsilon = 10^{-14} \cdot r_0$, where $r_0 = r(x^{(0)})$ is the residual of the initial approximation $x^{(0)} = 0$.

The results are illustrated in Figures 7 (shape approximation) and 8 (convergence). The convergence of the single-level and the ML algorithms is represented on the same graph w.r.t. the equivalent work of a relaxation iteration. That is, for ML algorithms, the geometric mean of the convergence rate is represented.

As expected, the convergence rate of the Jacobi method tends to $\frac{3}{5} = 0.6$. The threshold is reached after 40 iterations. The equivalent convergence rate per relaxation iteration of the classical MGV method also tends to $\frac{3}{5}$; the threshold is reached after about 24 cycles (equivalent to 48 Jacobi iterations on the fine grid). Finally, the computed convergence rate of the optimization MGV method is equal to $0.2 = (\frac{1}{25})^{\frac{1}{2}}$, as expected. The convergence is reached after 11 cycles (equivalent to 22 Jacobi iterations on the fine grid).

Generally, the convergence of MG-like algorithms has to be represented in terms of working units (WU; [20]), taking into account the work involved in the grid-to-grid transfers and not only the relaxation operations. In an efficient implementation of an MG method, this additional work should be balanced by a strong reduction of the convergence rate, thus requiring fewer iterations and, in the end, less computation time. However, when the problem is not stiff, such an evaluation is not totally relevant, since a single-level method also requires few iterations and converges rapidly; this is the case in our problem with a P1 parameterization, according to the very small condition number (Remark 4.2). We emphasize that in this experiment we just aim at confirming the theoretical work, i.e. illustrating the convergence rate w.r.t. a cycle iteration (the convergence rate here is a measure of the algorithm efficiency, independent of the amount of work done at each iteration). With this goal in mind, the transfer operations were not optimized; a real time measure would not be relevant here.

In the end, though, it is the real execution time which matters. We consider this first analysis as an argument for further strategies, which are applied to a stiff problem in Section 5.3 and to a real case of antenna optimization in Section 6. There, some quantification of the computation time is provided.
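A reduced version of this experiment fits in a few lines (an illustration: a random right-hand side stands in for the target data, and only the algebraic two-level scheme is run). The measured asymptotic rates match the analysis: about $3/5$ per Jacobi iteration and at most $1/25$ per ideal cycle.

```python
import numpy as np

def scaled_mass(N):
    """Diagonally scaled P1 mass matrix: diag 1, off-diagonals 1/4."""
    return np.eye(N) + 0.25 * (np.eye(N, k=1) + np.eye(N, k=-1))

p, tau = 6, 0.8
N, Np = 2**p - 1, 2**(p - 1) - 1
H = scaled_mass(N)
b = np.random.default_rng(1).standard_normal(N)   # stand-in for the target data
Q = np.linalg.eigh(H)[1][:, :Np]                  # algebraic coarse space (Section 4.3.4)

def jacobi_sweep(x):
    return x - tau * (H @ x - b)

def cycle(x):
    x = jacobi_sweep(x)                                        # pre-relaxation
    x -= Q @ np.linalg.solve(Q.T @ H @ Q, Q.T @ (H @ x - b))   # exact CGC
    return jacobi_sweep(x)                                     # post-relaxation

def asymptotic_rate(step, iters, window=5):
    """Geometric-mean residual reduction over the last `window` iterations."""
    x = np.zeros(N)
    res = [np.linalg.norm(H @ x - b)]
    for _ in range(iters):
        x = step(x)
        res.append(np.linalg.norm(H @ x - b))
    return (res[-1] / res[-1 - window]) ** (1.0 / window)

rate_jacobi = asymptotic_rate(jacobi_sweep, 30)
rate_cycle = asymptotic_rate(cycle, 8)
print(rate_jacobi, rate_cycle)   # bounded by 3/5 and by 1/25 = 0.04 respectively
```

Note that one cycle here costs two relaxations plus the coarse solve, so the per-relaxation equivalent of the cycle rate is close to $\sqrt{1/25} = 0.2$, as observed in Figure 8.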


5. Stiffness due to the parameterization

The stiffness of the optimization problem is closely related to the bad or ill conditioning of the Hessian matrix. Let us compute the condition number $\kappa_2 = \frac{\lambda_1}{\lambda_N}$ of the Hessian matrix for different parametric basis functions and w.r.t. the space dimension.

Figure 7. Approximation with P1 parameterization – target function and residual (panels: Target, Residual–Jacobi, Residual–MGV, Residual–MGV Opt).

Figure 8. Convergence with P1 parameterization – $L^2$-norm and rate (curves: Jacobi, MGV, MGV Opt).


5.1. Bezier parameterization

The best approximation problem can be formulated in a polynomial space. Using a Bézier representation of the graph of $u$ consists in using the Bernstein polynomials as basis functions, $u_k = B_k^N$. Compared to the class of piecewise linear shapes, a smooth shape is obtained with few parameters. For Bézier curve properties we refer to [23], and for application to ML shape optimization we refer to [19].

The projected Hessian on the feasible space is straightforwardly obtained by removing the first and the last lines and columns of the Hessian matrix. The elements of the Hessian matrix are given by

$h_{kj} = \int_0^1 u_k(t)\, u_j(t)\, \mathrm{d}t = \frac{C^k_{N+1}\, C^j_{N+1}}{C^{k+j}_{2(N+1)}} \cdot \frac{1}{2(N+1)+1}. \quad (47)$

We do not know any closed form for the eigenpairs of such a matrix; hence they are computed numerically.

To be consistent with the analysis conducted with the piecewise linear elements, we assume that the problem has been scaled beforehand (that is, preconditioned by the inverse of the diagonal of the Hessian matrix). In that form, the steepest descent method is equivalent to the Jacobi iterations. To investigate the decay factors we must compute the eigenpairs of $D_H^{-1} H$, where $D_H$ is the diagonal part of $H$, the projected Hessian. Figures 9 and 10 depict the eigenvectors, the eigenvalues and the decay factors corresponding to one iteration of a steepest descent method.

A spectral structure similar to the one obtained with the P1 elements can be observed: the eigenvectors are Fourier-like modes; the LF modes are associated with the largest eigenvalues and the HF modes with the smallest. The main difference is that the matrix is ill-conditioned: with $N = 16$ we have $\kappa_2(D_H^{-1} H) \approx 10^9$. This yields an amplification matrix for which no $\tau$ can be set to define a smoother.

According to the decay function of one descent iteration, for which the amplification matrix reads $G_\tau = I - \tau D_H^{-1} H$, convergence is obtained if $\tau \in\, ]0, \frac{2}{\lambda_{\max}}[$, where $\lambda_{\max} \approx 6$. In this interval, the convergence rate of the HF modes remains close to one. Even the maximum of the decay factor of the LF part is large. In other words, no solution operator can be efficient on the half-space spanned by the LF modes. Consequently, the coarse parameterization space of an ideal two-level algorithm must be redefined as a space of larger dimension than just half of the fine space. For an ML algorithm, many intermediate levels should be considered.
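The reported stiffness can be reproduced from (47). The sketch below is an illustration under an assumption: the graph is taken as a Bézier curve of degree $N+1$, so that removing the first and last basis functions (the Dirichlet ends) leaves $N$ interior degrees of freedom. It assembles the projected Gram matrix, scales it symmetrically by its diagonal (same spectrum as $D_H^{-1}H$) and evaluates the condition number.

```python
import numpy as np
from math import comb

def bernstein_projected_hessian(N):
    """L2 Gram matrix (47) of the interior Bernstein functions.

    Assumption: the curve has degree n = N + 1; dropping the end basis
    functions leaves the N interior degrees of freedom, k, j = 1..n-1.
    """
    n = N + 1
    return np.array([[comb(n, k) * comb(n, j) / comb(2 * n, k + j) / (2 * n + 1)
                      for j in range(1, n)] for k in range(1, n)])

N = 16
H = bernstein_projected_hessian(N)
d = np.sqrt(np.diag(H))
Hs = H / np.outer(d, d)          # symmetric scaling, same spectrum as D_H^{-1} H
lam = np.linalg.eigvalsh(Hs)
cond = lam[-1] / lam[0]
print(f"{cond:.2e}")             # enormous: the scaled Bernstein problem is very stiff
```

Even after diagonal scaling the spread of eigenvalues is huge, in line with the stiffness discussed above.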

5.2. Note on condition numbers

When it can be estimated, the condition number gives a measure of the stiffness, that is, an assessment of the difficulty involved in solving the numerical problem iteratively. In Figure 11 we represent the condition number of the Hessian matrix of (6) w.r.t. the space dimension using the Bernstein and other parameterizations. It shows that: (1) the Bernstein parameterization rapidly leads to very stiff problems; (2) orthogonal bases and B-splines of degree 3 are well conditioned (with B-splines, the condition number even seems bounded above).

By using a CAD representation, the smoothness of the generated shapes is well controlled, and the number of necessary degrees of freedom is usually small. On the other hand, it generally increases the condition number. However, as a pure geometrical problem,


Figure 9. Bernstein – eigenvectors – $N = 16$.

Figure 10. Bernstein – eigenvalues and decay factors – $N = 16$: (a) eigenvalues and (b) decay factors (for $\tau = 1/6, 1/4, 1/3$).


it is clear that the locally defined B-spline functions are more adequate. Hence we do not recommend using Bézier–Bernstein functions for parametric shape optimization. This will also be confirmed later in the application (Section 6). Nevertheless, we cannot guarantee that in a physically-relevant problem the B-spline functions still define a well-conditioned problem regardless of the space dimension.

In the next section, in order to demonstrate the efficiency of the present algorithm in the treatment of an ill-conditioned problem, we perform a numerical experiment on the geometrical model problem using the Bernstein basis.

5.3. Numerical experiments

We aim to approach the same graph as for the P1 basis (Figure 7) with a Bézier curve of degree $N$. The accuracy of the optimal value $J^*$ that can be achieved is given by the shape obtained by orthogonal projection of the target function onto the space of polynomials of degree $N$ (that is, in the Legendre basis, which is orthogonal for the usual scalar product in $L^2$). Thus, we can measure the convergence of the ML schemes in terms of the objective error $e \stackrel{\mathrm{def}}{=} J - J^*$.

The fine parameterization is of degree $N = 21$. First we realize a numerical optimization with a classical method on the unique fine level. Then the ML strategies are tested with a saw-tooth scheme (in this linear case, the convergence rate of this scheme is equal to that of a V-cycle) with two pre-relaxation iterations ($\nu_1 = 2$) of Conjugate Gradient (CG) and seven levels ($N = 3, 6, 9, 12, 15, 18, 21$). We consider that an algorithm has converged when the relative gradient norm or the relative fitness residual is less than $\varepsilon = 10^{-15}$.

The results are presented in Figures 12 (approximation) and 13 (convergence in terms of fitness evaluations and real time). The problem is effectively very stiff ($\kappa_2 \approx 8.4 \times 10^{12}$): the convergence of the single-level (SL) strategy is slow and stagnates after about 1900 evaluations (1 s); around 4500 evaluations (2.7 s), the algorithm converges with the performance $e/e_0 \approx 10^{-9}$. We observe that the classical strategy (MLY) is ineffective: the convergence is slightly slower than that of the single-level strategy; at convergence, around 6000 evaluations (3.8 s), the fitness performance is such that the relative error is about $e/e_0 \approx 10^{-8}$. On the contrary, the alternative strategies, with spectrum permutation (MLZ) or algebraic (MLO), are more efficient: the performance of the single-level strategy is

Figure 11. Condition number w.r.t. the space dimension: (a) Bernstein parameterization and (b) other parameterizations (B-spline, Legendre, Tchebychev).


reached after 150 iterations (0.12 s, i.e. about 20 times faster); then the spectrum permutation strategy fails to solve on the very coarse grid and reaches the performance $e/e_0 \approx 10^{-12}$ after 5000 evaluations (4 s); the algebraic strategy is the best, converging to machine precision $e/e_0 \approx 10^{-16}$ after 1000 evaluations (0.6 s). In terms of uniform error this difference is clearly observable in Figure 12.

Figure 12. Shape error at convergence with Bernstein parameterization (panels: conjugate gradient, MG classical (MLY), MG permutation (MLZ), MG algebraic (MLO)).

Figure 13. Convergence with Bernstein parameterization – relative fitness error (w.r.t. number of evaluations and real time; curves: SL, MLY, MLZ, MLO).


Note here that the representation of the convergence in real time takes into account the computational work done for the transfer operations. This shows that the ML algorithm is indeed globally efficient.

Remark 5.1 In the framework of optimization problems, an optimal relaxation parameter is seldom known in practice. Alternatively, it is computed by a line search procedure at each iteration. As a consequence, the convergence rate per iteration is quite inhomogeneous. The speed of convergence is better represented by the convergence of the fitness function.

6. Application to antenna design

A reflector antenna is a device that is widely used for satellite communication. It is schematically composed of a primary source of time-harmonic electromagnetic waves and a set of reflecting surfaces called reflectors [26]. The design of such devices is often addressed as follows: find the optimal shape of the reflectors such that an ideal power radiation pattern is achieved. We wish to minimize a shape functional of the form (1), where $u$ is the electric field and $O(u)$ the radiated power [27].

We have used the Free-Form Deformation technique [28] for the parametric representation of the reflectors. We first used a Bernstein basis (Bézier) and then a B-spline basis of degree 3, since the Bernstein basis itself contributes to the bad conditioning of the problem (Section 5.2 and [19]).

Without any knowledge of the discrete spectral structure, we define the intermediate and coarse levels using the model problem (geometric levels). To avoid difficulties due to multimodality, we have applied a robust optimization technique to provide a preliminary solution, shown in Figure 14; we then wish to improve its performance using ML strategies. The settings of the experiments are the following: 11 levels such that the fine level has dimension 30 and the coarse 3; two iterations of Conjugate Gradient as relaxation; a saw-tooth scheme (from fine to coarse).

Since we are interested here in the convergence speed of the ML algorithms, we show the convergence history of the objective function. For a more detailed description of the physical results, see [18,25]. In PDE-constrained problems, the computational time is

Figure 14. Axisymmetrical antenna – initial shape – wave guide (left) and reflector (right).

388 B. Chaigne and J.-A. Desideri


largely dominated by the numerical approximation of the differential equation (in our case, around 1 min per Finite Element Analysis), which is required for every fitness evaluation. Hence, for the purpose of comparison between algorithms, we can reasonably represent the global computational time in terms of the number of fitness evaluations. In this experiment the maximum number of fitness evaluations has been reached (400 evaluations) while none of the algorithms has converged. As can be seen in Figure 15, compared to a single-level strategy (SL CG), the classical MG method (MLY) fails to speed up the convergence, while the alternative strategy (MLO) clearly converges faster, even with the B-spline parameterization. Note that in this latter case the improvement is significant (38%), whereas it is only marginal with the Bernstein parameterization (3%). The deformations applied to the reflector are shown in Figures 16 and 17.

7. Quasi-Newton

The Quasi-Newton class of methods is very popular for parametric optimization problems since their convergence rate is asymptotically quadratic [29]. The descent direction d is given by an iteratively updated preconditioning of the gradient:

B_i d = −∇J.   (48)

The most popular method to update the preconditioner B_i is known as the BFGS method. In fact, there exist two formulations: the direct and the inverse (or DFP method). In the former, the Hessian matrix is approximated, hence the linear system (48) has to be solved at each iteration, whereas in the latter the inverse of the Hessian matrix is approximated and the descent direction is computed explicitly. While this second method looks attractive, it may not be relevant in the case of inverse problems.
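The two formulations can be sketched with the standard textbook update formulas (usual s, y notation for the step and gradient difference); the quadratic objective below is an illustrative stand-in, not the paper's antenna functional.

```python
import numpy as np

def bfgs_direct_update(B, s, y):
    """Direct form: update the Hessian approximation B; a linear solve
    B d = -grad J is then needed to get the descent direction."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

def bfgs_inverse_update(H, s, y):
    """Inverse form: update H, the approximation of the inverse Hessian;
    the direction d = -H grad J is then explicit."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# Minimize J(x) = 0.5 x^T A x - b^T x with the inverse form.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x, H = np.zeros(2), np.eye(2)
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:
        break
    d = -H @ g                        # explicit: no linear solve needed
    t = -(g @ d) / (d @ A @ d)        # exact line search for a quadratic
    s = t * d
    y = grad(x + s) - g
    H = bfgs_inverse_update(H, s, y)
    x = x + s

print("solution:", x)
```

Both updates satisfy the secant condition (B_new s = y for the direct form, H_new y = s for the inverse form); the point made in the text is about which operator, the Hessian or its inverse, is amenable to such finite-rank approximation for inverse problems.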

As for the MG strategies, the reason lies in the nature of the Hessian. Indeed, we have in (1) an integral operator, which is compact; as such, it can easily be approximated by finite-rank corrections (direct BFGS). The inverse operator is however unbounded (in an infinite-dimensional space; bounded but ill-conditioned after a discretization procedure). This is a classical result in functional analysis that can be found, for instance, in [30, Chapter 10] or [31, Chapters 3 and 4].
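This bounded/unbounded dichotomy is easy to observe numerically on a simple compact operator. The sketch below discretizes (Ku)(x) = ∫₀¹ k(x,y) u(y) dy by the midpoint rule with the kernel k(x,y) = min(x,y)(1 − max(x,y)), the Green's function of −d²/dx² (a standard example chosen here for illustration, not the paper's operator): the norm of the discrete operator stays bounded as the dimension grows, while the norm of its inverse blows up.

```python
import numpy as np

def green_matrix(n):
    """Midpoint-rule discretization of the integral operator with the
    Brownian-bridge kernel min(x,y)(1 - max(x,y)); symmetric positive
    definite for distinct nodes in (0, 1)."""
    x = (np.arange(n) + 0.5) / n              # midpoint quadrature nodes
    X, Y = np.meshgrid(x, x, indexing="ij")
    return np.minimum(X, Y) * (1.0 - np.maximum(X, Y)) / n

# ||K|| stays near 1/pi^2 as n grows (bounded, compact operator), while
# ||K^{-1}|| grows roughly like n^2 (unbounded inverse in the limit).
for n in (20, 40, 80):
    K = green_matrix(n)
    print(n, np.linalg.norm(K, 2), np.linalg.norm(np.linalg.inv(K), 2))
```

This is the discrete face of the statement above: the forward (integral) operator is well approximated by finite-rank corrections, whereas its inverse becomes arbitrarily ill-conditioned under refinement.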

Figure 15. Convergence of ML parametric shape optimization algorithms applied to the design of reflector antenna: (a) Bernstein and (b) B-splines parameterizations. (Normalized fitness vs. number of functional evaluations; SL CG, ML Y, ML O.)

Inverse Problems in Science and Engineering 389


Hence, while the direct BFGS may require a linear system to be solved, the preconditioner is efficient. On the contrary, the inverse BFGS fails to effectively precondition the problem. This is illustrated in Figures 18 (approximation) and 19 (convergence).

8. Conclusion

ML approaches to solving large systems arising from the discretization of stiff PDE problems are very popular since their convergence rates are independent of the

Figure 16. Deformation of the reflector – Bernstein: (a) single level CG; (b) ML classical MLY; and (c) ML algebraic MLO.

Figure 17. Deformation of the reflector – B-spline: (a) single level CG and (b) ML algebraic MLO.



problem dimension. In practice, this ideal property is difficult to demonstrate for real-world problems. Nevertheless, the ML algorithms remain very attractive since they generally converge faster than one-grid methods.

For shape optimization problems, the ML preconditioning strategies are not always clearly established. This is partly due to additional difficulties that are intrinsic to optimization problems (for instance, multimodality) or to the shape representation (CAD-free, parametric representation, etc.).

In this article, we have considered shape inverse problems such as (1) because we believe they cover a large class of realistic applications in engineering (known as inverse design). In this framework, and for parametric shape representations, we have shown that an ideal convergence rate can be derived for a simple but representative problem, thus providing a sound basis for fast parametric shape optimization ML methods. More precisely, we have shown that the classical MG strategies cannot be directly applied to shape inverse problems; indeed, the ideal convergence rate of geometric MG methods for PDE is closely related to the spectral properties of discrete differential operators, while the inverse problem is addressed here as an integral problem, which can be seen as the inverse operator of the differential one. Therefore, taking into account this fundamental difference,

Figure 18. Approximation – target function and error (direct vs. inverse BFGS).

Figure 19. Convergence – relative fitness error e_i/e_0 (direct vs. inverse BFGS), against the number of evaluations and against real time (s).



we have examined two-level ideal algorithms where the 'coarse-grid correction' was defined using non-classical transfer operators. Numerical examples have been successfully conducted, including a realistic non-linear problem in electromagnetics.

In addition, this fundamental difference must also be taken into account by other preconditioners, such as the BFGS method.

Acknowledgements

The authors are indebted to the R&D department of the Orange Labs in La Turbie, which provided the software SRSR for the numerical simulation of electromagnetic wave propagation in free space, used for the numerical experiments of Section 6.

Notes

1. In the CAGD terminology (Bézier, B-spline, NURBS, etc.), the coefficients x are also called control points.

2. In the style of [13], a simple way to show that a linear operator M is a smoother (or an 'anti-smoother') is to look at the discrete Fourier transform (DFT) of the Krylov vectors q_i = M^i x for some vector x that contains all frequencies (i.e. is non-zero in the direction of each eigenvector).

3. The study of the Jacobi method applied to the Poisson equation is a well-known problem. We recall it for the sake of comparison with the shape optimization problem.

4. These properties, A_{2h} = R A_h P together with the existence of σ > 0 such that R = σ P^T, are known as the variational properties.

References

[1] J. Sokolowski and J.P. Zolésio, Introduction to Shape Optimization, Springer-Verlag, Heidelberg, 1992.
[2] M.C. Delfour and J.P. Zolésio, Shapes and Geometries: Metrics, Analysis, Differential Calculus, and Optimization, Advances in Design and Control, SIAM, Philadelphia, PA, 2001.
[3] E. Polak, Optimization: Algorithms and Consistent Approximations, Springer-Verlag, New York, NY, 1997.
[4] B. Mohammadi and O. Pironneau, Applied Shape Optimization for Fluids, Oxford University Press, Oxford, 2001.
[5] O. Pironneau and E. Polak, Consistent approximations and approximate functions and gradients in optimal control, SIAM J. Control Optim. 41 (2002), pp. 487–510.
[6] N. Marco and F. Beux, Multilevel optimization: Application to one-shot shape optimum design, Research Report No. 2068, INRIA, Sophia-Antipolis, France, 1993.
[7] E. Arian and S. Ta'asan, Smoothers for optimization problems, in Seventh Copper Mountain Conference on Multigrid Methods, Vol. CP3339, N.D. Melson, T. Manteuffel, S. McCormick and C. Douglas, eds., NASA Conference Publication, Hampton, VA, 1996, pp. 15–30.
[8] N. Marco and A. Dervieux, Multilevel parametrization for aerodynamical optimization of 3D shapes, Finite Elem. Anal. Des. 26 (1997), pp. 259–277.
[9] V.H. Schulz, Solving discretized optimization problems by partially reduced SQP methods, Comput. Vis. Sci. 1 (1998), pp. 83–86.
[10] T. Dreyer, B. Maar, and V.H. Schulz, Multigrid optimization in applications, J. Comput. Appl. Math. 120 (2000), pp. 67–84.
[11] S.G. Nash, A multigrid approach to discretized optimization problems, Optim. Methods Softw. 14 (2000), pp. 99–116.
[12] R.M. Lewis and S.G. Nash, A multigrid approach to the optimization of systems governed by differential equations, in 8th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, AIAA, Long Beach, CA, 2000, Article ID AIAA-2000-4890.
[13] R.M. Lewis and S.G. Nash, Model problems for the multigrid optimization of systems governed by differential equations, SIAM J. Sci. Comput. 26 (2005), pp. 1811–1837.
[14] M. Martinelli and F. Beux, Multi-level gradient-based methods and parametrisation in aerodynamic shape design, Rev. Eur. Méc. Numér. 17 (2008), pp. 169–197.
[15] A. Borzì and V.H. Schulz, Multigrid methods for PDE optimization, SIAM Rev. 51 (2009), pp. 361–395.
[16] J.A. Désidéri, Modèles Discrets et Schémas Itératifs: Application aux Algorithmes Multigrilles et Multidomaines, Hermès, Paris, 1998.
[17] J.W. Demmel, Applied Numerical Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.
[18] B. Chaigne, Méthodes hiérarchiques pour l'optimisation géométrique de structures rayonnantes, Ph.D. thesis, Université de Nice, October 2009.
[19] J.A. Désidéri, Two-level ideal algorithm for parametric shape optimization, in Advances in Numerical Mathematics, Institute of Numerical Mathematics, Russian Academy of Sciences, Moscow, 2006, pp. 65–85.
[20] P. Wesseling, An Introduction to Multigrid Methods, John Wiley & Sons, Chichester, 1991.
[21] B. Chaigne and J.A. Désidéri, Convergence of a two-level ideal algorithm for a parametric shape optimization model problem, Research Report No. RR-7068, INRIA, Sophia-Antipolis, France, 2009.
[22] M.B. Monagan, K.O. Geddes, K.M. Heal, G. Labahn, S.M. Vorkoetter, J. McCarron, and P. DeMarco, Maple 10 Programming Guide, Maplesoft, Waterloo, ON, Canada, 2005.
[23] G. Farin, Curves and Surfaces for CAGD: A Practical Guide, 5th ed., Morgan Kaufmann Publishers, San Francisco, CA, USA, 2002.
[24] J.C. Zhao, J.A. Désidéri and B. Abou El Majd, Two-level correction algorithms for model problems, Research Report No. 6246, INRIA, Sophia-Antipolis, France, July 2007.
[25] B. Chaigne and J.A. Désidéri, Méthodes hiérarchiques pour la conception optimale de forme d'antenne à réflecteur, Research Report No. 6625, INRIA, Sophia-Antipolis, France, September 2008.
[26] P.F. Combes, Micro-ondes, Vol. 2, Dunod, Paris, France, 1997.
[27] T.S. Angell and A. Kirsch, Optimization Methods in Electromagnetic Radiation, Springer Monographs in Mathematics, Springer-Verlag, New York, 2004.
[28] T.W. Sederberg and S.R. Parry, Free-form deformation of solid geometric models, SIGGRAPH Comput. Graph. 20 (1986), pp. 151–160.
[29] C. Kelley, Iterative Methods for Optimization, SIAM, Raleigh, North Carolina, 1999.
[30] K. Yosida, Functional Analysis, Springer-Verlag, Berlin, 1965.
[31] G.W. Hanson and A.B. Yakolev, Operator Theory for Electromagnetics: An Introduction, Springer-Verlag, New York, 2002.


