Semi-Infinite Programming


Nonconvex Optimization and Its Applications

Volume 25

Managing Editors:

Panos Pardalos University of Florida, U.S.A.

Reiner Horst University of Trier, Germany

Advisory Board:

Ding-Zhu Du University of Minnesota, U.S.A.

C.A. Floudas Princeton University, U.S.A.

G. Infanger Stanford University, U.S.A.

J. Mockus Lithuanian Academy of Sciences, Lithuania

P.D. Panagiotopoulos Aristotle University, Greece

H.D. Sherali Virginia Polytechnic Institute and State University, U.S.A.

The titles published in this series are listed at the end of this volume.


Semi-Infinite Programming

Edited by

Rembert Reemtsen

Institute of Mathematics, Brandenburg Technical University of Cottbus

and

Jan-J. Rückmann

Institute for Applied Mathematics, University of Erlangen-Nuremberg

Springer-Science+Business Media, B.V.


A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-1-4419-4795-6 ISBN 978-1-4757-2868-2 (eBook) DOI 10.1007/978-1-4757-2868-2

Printed on acid-free paper

All Rights Reserved © 1998 Springer Science+Business Media Dordrecht

Originally published by Kluwer Academic Publishers in 1998.

Softcover reprint of the hardcover 1st edition 1998

No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.


CONTENTS

PREFACE  xi

CONTRIBUTORS  xv

Part I  THEORY  1

1  A COMPREHENSIVE SURVEY OF LINEAR SEMI-INFINITE OPTIMIZATION THEORY
   Miguel A. Goberna and Marco A. López  3
   1 Introduction  3
   2 Existence theorems for the LSIS  5
   3 Geometry of the feasible set  6
   4 Optimality  10
   5 Duality theorems and discretization  12
   6 Stability of the LSIS  14
   7 Stability and well-posedness of the LSIP problem  19
   8 Optimal solution unicity  23
   REFERENCES  25

2  ON STABILITY AND DEFORMATION IN SEMI-INFINITE OPTIMIZATION
   Hubertus Th. Jongen and Jan-J. Rückmann  29
   1 Introduction  29
   2 Structure of the feasible set  32
   3 Stability of the feasible set  40
   4 Stability of stationary points  44
   5 Global stability  53
   6 Global deformations  57
   REFERENCES  63

3  REGULARITY AND STABILITY IN NONLINEAR SEMI-INFINITE OPTIMIZATION
   Diethard Klatte and René Henrion  69
   1 Introduction  69
   2 Upper semicontinuity of stationary points  73
   3 Metric regularity of the feasible set mapping  83
   4 Stability of local minimizers  95
   5 Concluding remarks  98
   REFERENCES  99

4  FIRST AND SECOND ORDER OPTIMALITY CONDITIONS AND PERTURBATION ANALYSIS OF SEMI-INFINITE PROGRAMMING PROBLEMS
   Alexander Shapiro  103
   1 Introduction  103
   2 Duality and first order optimality conditions  106
   3 Second order optimality conditions  115
   4 Directional differentiability of the optimal value function  122
   5 Stability and sensitivity of optimal solutions  127
   REFERENCES  130

Part II  NUMERICAL METHODS  135

5  EXACT PENALTY FUNCTION METHODS FOR NONLINEAR SEMI-INFINITE PROGRAMMING
   Ian D. Coope and Christopher J. Price  137
   1 Introduction  137
   2 Exact penalty functions for semi-infinite programming  143
   3 Trust region versus line search algorithms  145
   4 The multi-local optimization subproblem  148
   5 Final comments  154
   REFERENCES  155

6  FEASIBLE SEQUENTIAL QUADRATIC PROGRAMMING FOR FINELY DISCRETIZED PROBLEMS FROM SIP
   Craig T. Lawrence and André L. Tits  159
   1 Introduction  159
   2 Algorithm  163
   3 Convergence analysis  167
   4 Extension to constrained minimax  177
   5 Implementation and numerical results  180
   6 Conclusions  186
   REFERENCES  186
   APPENDIX A  Proofs  189

7  NUMERICAL METHODS FOR SEMI-INFINITE PROGRAMMING: A SURVEY
   Rembert Reemtsen and Stephan Görner  195
   1 Introduction  195
   2 Fundamentals  196
   3 Linear problems  219
   4 Convex problems  234
   5 Nonlinear problems  243
   REFERENCES  262

8  CONNECTIONS BETWEEN SEMI-INFINITE AND SEMIDEFINITE PROGRAMMING
   Lieven Vandenberghe and Stephen Boyd  277
   1 Introduction  277
   2 Duality  280
   3 Ellipsoidal approximation  281
   4 Experiment design  285
   5 Problems involving power moments  289
   6 Positive-real lemma  291
   7 Conclusion  292
   REFERENCES  292

Part III  APPLICATIONS  295

9  RELIABILITY TESTING AND SEMI-INFINITE LINEAR PROGRAMMING
   İ. Kuban Altınel and Süleyman Özekici  297
   1 Introduction  297
   2 Testing systems with independent component failures  301
   3 Solution procedure  306
   4 Testing systems with dependent component failures  311
   5 A series system working in a random environment  318
   6 Conclusions  320
   REFERENCES  321

10  SEMI-INFINITE PROGRAMMING IN ORTHOGONAL WAVELET FILTER DESIGN
   Ken O. Kortanek and Pierre Moulin  323
   1 Quadrature mirror filters: a functional analysis view  324
   2 Design implications from the property of perfect reconstruction  332
   3 The perfect reconstruction semi-infinite optimization problem  339
   4 Characterization of optimal filters through SIP duality  342
   5 On some SIP algorithms for quadrature mirror filter design  346
   6 Numerical results  351
   7 Regularity constraints  353
   8 Conclusions  354
   REFERENCES  355

11  THE DESIGN OF NONRECURSIVE DIGITAL FILTERS VIA CONVEX OPTIMIZATION
   Alexander W. Potchinkov  361
   1 Introduction  361
   2 Characteristics of FIR filters  364
   3 Application fields  368
   4 Approximation problems  371
   5 The optimization problem  374
   6 Numerical examples  378
   7 Conclusion  385
   REFERENCES  386

12  SEMI-INFINITE PROGRAMMING IN CONTROL
   Ekkehard W. Sachs  389
   1 Optimal control problems  390
   2 Sterilization of food  395
   3 Flutter control  401
   REFERENCES  411


PREFACE

Semi-infinite programming (briefly: SIP) is an exciting part of mathematical programming. SIP problems include finitely many variables and, in contrast to finite optimization problems, infinitely many inequality constraints. Problems of this type arise naturally in approximation theory, optimal control, and in numerous engineering applications where the model contains at least one inequality constraint for each value of a parameter and the parameter, representing time, space, frequency, etc., varies in a given domain. The treatment of such problems requires particular theoretical and numerical techniques.

The theory of SIP, as well as the number of numerical SIP methods and applications, has expanded very rapidly in recent years. Therefore, the main goal of this monograph is to provide a collection of tutorial and survey-type articles which represent a substantial part of the contemporary body of knowledge in SIP. We are glad that leading researchers have contributed to this volume and that their articles cover a wide range of important topics in this subject. It is our hope that students as well as experienced scientists will be well advised to consult this volume.

We got the idea for this volume while organizing the semi-infinite programming workshop held in Cottbus, Germany, in September 1996. About forty scientists from fourteen countries participated in this workshop and presented surveys or new results concerning the field. At the same time, an up-to-date monograph on SIP was sorely lacking, so we invited several of the participants to contribute to such a volume. The result is the present collection of articles.

The volume is divided into the three parts Theory, Numerical Methods, and Applications, each of them consisting of four articles. Part I: Theory starts with a review by Goberna and López on fundamentals and properties of linear SIP, including optimality conditions, duality theory, well-posedness, and geometrical properties of the feasible and the optimal set. Subsequently, Jongen and Rückmann survey the structure and stability properties of SIP problems, where, in particular, the topological structure of the feasible set, the strong stability of stationary points, and one-parametric deformations are investigated. Finally, in the contributions by Klatte and Henrion and by Shapiro, SIP problems are considered which depend on additional parameters: Klatte and Henrion provide an overview of the interrelations between metric regularity, constraint qualifications, local boundedness of multipliers, and upper semicontinuity of the stationary solution mapping, while Shapiro discusses properties such as duality, optimality conditions, stability, and sensitivity of problems which are described by cone constraints.

In the first chapter of Part II: Numerical Methods, Coope and Price trace and study the application of exact penalty function methods to the solution of SIP problems. Next, in an article by Lawrence and Tits, the convergence and the effectiveness of a new sequential quadratic programming algorithm for the solution of finely discretized nonlinear SIP problems, or of other problems with many inequality constraints, are established. Afterwards, Reemtsen and Görner describe the fundamental ideas for the numerical solution of SIP problems and provide a comprehensive survey of existing methods. Connections between SIP and semidefinite programming are finally explored by Vandenberghe and Boyd, who especially study a number of applications, including some from signal processing, computational geometry, and statistics.

The last part, Part III: Applications, begins with an article by Altınel and Özekici, who investigate an approach to reliability testing of a complex system which leads to the solution of a parameterized linear SIP problem. It is followed by an article by Kortanek and Moulin, who focus on certain wavelet design problems that lead to linear SIP models and can be solved in this way. Afterwards, Potchinkov gives an introduction to FIR filter design and shows that the main design problems in this field, too, can best be modelled and solved as (convex) SIP programs. Finally, in the contribution by Sachs, several control problems are described and studied from the viewpoint of SIP; in particular, the problems of food sterilization and the flutter of aircraft wings are discussed.

We are very grateful to all authors of this book for their valuable contributions and to the referees of the articles for their qualified reports. We furthermore wish to express our sincere gratitude to John R. Martindale from Kluwer Academic Publishers for offering us the opportunity to edit this volume and for his practical help, encouragement, and understanding support. Finally, we would like to thank Jörg Biesold for the careful preparation of the camera-ready version of the manuscript.


We hope that this volume will give a substantial impetus to the further devel­opment of semi-infinite programming, and we invite the reader to participate in the research and application of this very interesting field.

Cottbus and Erlangen, December 1997

REMBERT REEMTSEN

JAN-J. RÜCKMANN


CONTRIBUTORS

İ. Kuban Altınel Department of Industrial Engineering, Boğaziçi University, Istanbul, Turkey

Stephen Boyd Electrical Engineering Department, Stanford University, Stanford, California, USA

Ian D. Coope Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand

Miguel A. Goberna Department of Statistics and Operations Research, University of Alicante, Alicante, Spain

Stephan Görner Department of Mathematics, Technical University of Berlin, Berlin, Germany

René Henrion Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany

Hubertus Th. Jongen Department of Mathematics, RWTH Aachen, Aachen, Germany

Diethard Klatte Institute for Operations Research, University of Zurich, Zurich, Switzerland

Ken O. Kortanek College of Business Administration, University of Iowa, Iowa City, Iowa, USA

Craig T. Lawrence Department of Electrical Engineering and Institute for Systems Research, University of Maryland, College Park, Maryland, USA

Marco A. López Department of Statistics and Operations Research, University of Alicante, Alicante, Spain


Pierre Moulin Beckman Institute, University of Illinois, Urbana, Illinois, USA

Süleyman Özekici Department of Industrial Engineering, Boğaziçi University, Istanbul, Turkey

Alexander W. Potchinkov Institute of Mathematics, Brandenburg Technical University of Cottbus, Cottbus, Germany

Christopher J. Price Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand

Rembert Reemtsen Institute of Mathematics, Brandenburg Technical University of Cottbus, Cottbus, Germany

Jan-J. Rückmann Institute of Applied Mathematics, University of Erlangen-Nuremberg, Erlangen, Germany

Ekkehard W. Sachs Department of Mathematics, University of Trier, Trier, Germany

Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA

André L. Tits Department of Electrical Engineering and Institute for Systems Research, University of Maryland, Maryland, USA

Lieven Vandenberghe Electrical Engineering Department, University of California, Los Angeles, California, USA


PART I

THEORY


1 A COMPREHENSIVE SURVEY OF LINEAR SEMI-INFINITE OPTIMIZATION THEORY

Miguel A. Goberna and Marco A. López

University of Alicante, Department of Statistics and Operations Research, 03071 Alicante, Spain
Email: [email protected], [email protected]

ABSTRACT

This paper reviews the linear semi-infinite optimization theory as well as its main foundations, namely, the theory of linear semi-infinite systems. The first part is devoted to existence theorems and geometrical properties of the solution set of a linear semi-infinite system. The second part concerns optimality conditions, geometrical properties of the optimal set and duality theory. Finally, the third part analyzes the well-posedness of the linear semi-infinite programming problem and the stability (or continuity properties) of the feasible set, the optimal set and the optimal value mappings when all the data are perturbed.

1 INTRODUCTION

Linear semi-infinite optimization (LSIP, in brief) primarily deals with linear optimization problems of finitely many unknowns and an arbitrary number of constraints; i.e., with problems of the form

$$(P)\qquad \mathrm{Inf}\ c'x \quad \text{s.t.}\quad a_t'x \ge b_t,\ t\in T,$$

with $T\ne\emptyset$ arbitrary, $c\in\mathbb{R}^n$, $a_t : T\to\mathbb{R}^n$ and $b_t : T\to\mathbb{R}$. Its value $v(P)$ is bounded from below by the value, $v(D)$, of its dual problem

$$(D)\qquad \mathrm{Sup}\ \sum_{t\in T}\lambda_t b_t \quad \text{s.t.}\quad \sum_{t\in T}\lambda_t a_t = c,\ \lambda\in\mathbb{R}_+^{(T)},$$


where $\mathbb{R}_+^{(T)}$ is the positive cone in the space of generalized finite sequences, $\mathbb{R}^{(T)}$, whose elements are functions $\lambda : T\to\mathbb{R}$ which vanish everywhere except on a finite set, called the supporting set of $\lambda$: $\mathrm{supp}(\lambda)=\{t\in T \mid \lambda_t\ne 0\}$.

Generally speaking, linear optimization problems with finitely many constraints and an arbitrary number of unknowns can also be considered semi-infinite, but only problems of the form $(D)$ will be considered here. From a modeling perspective, $(P)$ is more applicable in practice than $(D)$, which explains the greater attention that has been paid to the former, not only from a theoretical point of view but also on the numerical side. This paper surveys the LSIP theory, thoroughly expounded in [31], and, although it is primarily focused on $(P)$, we shall occasionally underline the existing symmetry between the properties of $(P)$ and $(D)$. In our opinion, LSIP theory is mainly based upon results for linear semi-infinite systems (LSISs), which have the general form $\sigma=\{a_t'x\ge b_t,\ t\in T\}$, such as existence theorems and geometrical properties of the solution set $F$, as well as its stability features. Although the closed convex set $F$ can be represented in a variety of ways by means of LSISs, we can only obtain precise information about $F$ from suitable representations (not always available in practice).

The first paper on LSISs was written in 1924 by Haar [36] in his attempt to extend the famous Farkas Lemma. Before the Second World War, Remes [50], in a work on best uniform approximation problems (which can be formulated as primal LSIP problems), and, independently, Dantzig, in his Ph.D. thesis on statistical inference (see [15]), conceived seminal exchange procedures for LSIP problems, so that LSIP is, to some extent, older than ordinary LP. Also, the cutting-plane method of Kelley [45] replaced the given convex problem, first, by a convex program with linear objective function, and then, linearizing its feasible set by means of subgradients, by an LSIP problem which is solved by discretization. During the sixties both the LSIS and the LSIP theories were developed, mainly by Zhu (existence theorems) and by Charnes, Cooper and Kortanek (duality theory). Since then many authors have contributed to the maturity of the theory, methods and applications of LSIP, whereas the role played by LSIS theory in the foundation of LSIP has been systematically ignored. At present, many non-specialists in this particular field consider LSIP as an interesting laboratory for both theory ([3]) and methods ([49], [53]) specific to other optimization problems.
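To make the primal format concrete (an illustration added here; it is not part of the original survey), the best uniform approximation of a continuous function $f$ on $[0,1]$ by a polynomial $p(s)=\sum_{i=1}^{n-1}x_i s^{i-1}$ can be written as the LSIP problem

$$\mathrm{Inf}\ x_n \quad \text{s.t.}\quad x_n + p(s) \ge f(s)\ \ \text{and}\ \ x_n - p(s) \ge -f(s),\quad s\in[0,1],$$

in the variables $x=(x_1,\dots,x_n)$: the scalar $x_n$ bounds the error $|f(s)-p(s)|$ from above, and each parameter value $s$ contributes two linear inequality constraints, so $T$ may be taken as two copies of $[0,1]$.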

Let us introduce the necessary notation. Given $\emptyset\ne X\subset\mathbb{R}^n$, by $|X|$, $\mathrm{aff}(X)$, $\mathrm{span}(X)$, $\mathrm{cone}(X)$, $\mathrm{conv}(X)$, $X^\perp$, $X^\circ$ and $\dim(X)$ we denote the cardinality of $X$, the affine hull of $X$, the linear hull of $X$, the conical convex hull of $X$, the convex hull of $X$, the subspace of vectors orthogonal to $X$, the dual cone of $X$ (i.e., $X^\circ=\{y\in\mathbb{R}^n \mid y'x\ge 0 \text{ for all } x\in X\}$) and the dimension of $X$ (i.e., the dimension of $\mathrm{aff}(X)$), respectively. Moreover, we define $\mathrm{aff}(\emptyset)=\mathrm{span}(\emptyset)=\mathrm{cone}(\emptyset)=\{0_n\}$. From the topological side, $\mathrm{cl}(X)$, $\mathrm{int}(X)$, $\mathrm{ext}(X)$, $\mathrm{bd}(X)$, $\mathrm{rint}(X)$ and $\mathrm{rbd}(X)$ represent the closure of $X$, the interior of $X$, the exterior of $X$, the boundary of $X$, the relative interior of $X$ and the relative boundary of $X$, respectively. The Euclidean and Chebyshev norms of $x\in\mathbb{R}^n$ will be $\|x\|$ and $\|x\|_\infty$, respectively, and the Euclidean distance from $x$ to $X\ne\emptyset$ will be denoted by $d(x,X)$. Moreover, we define $d(x,\emptyset)=\infty$. Finally, we represent by $B$ the open unit ball in $\mathbb{R}^n$ for the Euclidean norm.

2 EXISTENCE THEOREMS FOR THE LSIS

Most of the relevant properties of $\sigma=\{a_t'x\ge b_t,\ t\in T\}$ are related to four convex cones, three of them explicitly defined from the coefficients of $\sigma$:

First moment cone of $\sigma$: $M=\mathrm{cone}(\{a_t,\ t\in T\})$;

Second moment cone of $\sigma$: $N=\mathrm{cone}\left(\left\{\binom{a_t}{b_t},\ t\in T\right\}\right)$;

Characteristic cone of $\sigma$: $K=N+\mathrm{cone}\left(\left\{\binom{0_n}{-1}\right\}\right)$;

Cone of ascent rays of $\sigma$: the set, $A$, of vectors $a\in\mathbb{R}^n$ such that, for every $b\in\mathbb{R}$, there exists $S\subset T$, $|S|<\infty$, such that $a'x\ge b$ is satisfied by every solution $x$ of the subsystem $\{a_t'x\ge b_t,\ t\in S\}$.

Theorem 2.1 The following statements are equivalent:

(i) There exists $x\in\mathbb{R}^n$ such that $a_t'x\ge b_t$ for all $t\in T$;

(ii) $\binom{0_n}{1}\notin\mathrm{cl}(N)$;

(iii) $\binom{0_n}{1}\notin\mathrm{cl}(K)$;

(iv) $\mathrm{cl}(K)\ne\mathrm{cl}(M)\times\mathbb{R}$;

(v) $A=\emptyset$.

The characterization of the consistency of $\sigma$ (condition (i)) by means of (ii), (iii), (iv) and (v) appeared in [19], [56], [32] and [5], respectively.
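As a small illustration added here (it is not part of the original text), consider the scalar system $\sigma=\{tx\ge -1,\ t\in[0,1]\}$ in $\mathbb{R}$. Its solution set is $F=[-1,\infty)$, and $N=\mathrm{cone}\left(\left\{\binom{t}{-1},\ t\in[0,1]\right\}\right)$ does not contain $\binom{0}{1}$, in agreement with Theorem 2.1(ii). By contrast, for $\sigma'=\{tx\ge 1,\ t\in[0,1]\}$ the constraint at $t=0$ reads $0\ge 1$, so the system is inconsistent, and indeed $\binom{0}{1}$ belongs to $N'=\mathrm{cone}\left(\left\{\binom{t}{1},\ t\in[0,1]\right\}\right)$ (take $t=0$); hence $\sigma'$ is even strongly inconsistent (cf. Table 1).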

Theorem 2.2 The following statements are equivalent:

(i) There exists a sequence $\{x^r\}$ in $\mathbb{R}^n$ such that $\liminf_r (a_t'x^r)\ge b_t$ for all $t\in T$;

(ii) There exists $p(y)\in(\mathbb{R}[y])^n$ such that $a_t'p(y)>b_t$ (in the lexicographic sense) for all $t\in T$;

(iii) $\binom{0_n}{1}\notin N$;

(iv) $\binom{0_n}{1}\notin K$;

(v) $A\ne\mathbb{R}^n$;

(vi) Every finite subsystem of $\sigma$ is consistent.

(i)$\Leftrightarrow$(ii)$\Leftrightarrow$(vi) was proved in [40]; the sequences in (i) and the polynomial vectors in (ii) are called asymptotic solutions and polynomial solutions of $\sigma$; the sequences in (i) can be replaced by nets, and they coincide with the weak solutions of $\sigma$ in the sense of Holmes [39]. The remaining equivalent conditions appear in [32].

An inconsistent system having at least one asymptotic solution is called asymptotically inconsistent. Otherwise, it is called strongly inconsistent. If $N$ is closed and $K$ is non-closed, then $\sigma$ is strongly inconsistent [2]. Table 1 shows some classification criteria based upon the cones $N$, $K$ and $A$.

3 GEOMETRY OF THE FEASIBLE SET

The main theoretical tool is the existing duality between $\sigma$ and $K$, which could be seen as alternative representations of $F$.


Table 1  Consistency of the LSIS

| $\sigma$ consistent | $\sigma$ weakly inconsistent | $\sigma$ strongly inconsistent |
| $\binom{0_n}{1}\notin\mathrm{cl}(N)$ | $\binom{0_n}{1}\in\mathrm{cl}(N)\setminus N$ | $\binom{0_n}{1}\in N$ |
| $\binom{0_n}{1}\notin\mathrm{cl}(K)$ | $\binom{0_n}{1}\in\mathrm{cl}(K)\setminus K$ | $\binom{0_n}{1}\in K$ |
| $A=\emptyset$ | $\emptyset\ne A\ne\mathbb{R}^n$ | $A=\mathbb{R}^n$ |

Theorem 3.1 Let $\sigma=\{a_t'x\ge b_t,\ t\in T\}$ be a linear representation of $F\ne\emptyset$. Then the following statements hold:

(i)
$$\mathrm{cl}(K)=\left\{\binom{a}{b}\in\mathbb{R}^{n+1} \;\middle|\; a'x\ge b \text{ for all } x\in F\right\} \qquad (3.1)$$
and
$$F=\left\{x\in\mathbb{R}^n \;\middle|\; a'x\ge b \text{ for all } \binom{a}{b}\in\mathrm{cl}(K)\right\}; \qquad (3.2)$$

(ii) $\dim(F)=\dim(K^\circ)-1=n-\dim\{\mathrm{cl}(K)\cap[-\mathrm{cl}(K)]\}$;

(iii) $F$ is bounded if and only if $\binom{0_n}{-1}\in\mathrm{int}(K)$ or, equivalently,
$$\mathrm{int}(K)=\left\{\binom{a}{b}\in\mathbb{R}^{n+1} \;\middle|\; a'x>b \text{ for all } x\in F\right\}; \qquad (3.3)$$

(iv) $F$ is a polyhedral convex set (affine manifold, singleton) if and only if $\mathrm{cl}(K)$ is a polyhedral cone (half-line, half-space, respectively).

(3.1) characterizes those linear inequalities which are consequences of $\sigma$, and it is known as the non-homogeneous Farkas Lemma. It was proved in [36], for a particular case, and generalized in [56]. Obviously, (3.2) and (3.3) are a dual formula and an interior counterpart for (3.1), respectively. The two statements in (ii) were established in [56] and [28], whereas (iii) and (iv) were proved in [26].


To go further in the geometrical analysis of the feasible set, we must consider 'good linear representations' of $F$. With this aim, we introduce three important families of consistent LSISs.

A consistent system $\sigma$ is Farkas-Minkowski (FM, in brief) when every linear consequence of $\sigma$ is also a consequence of a finite subsystem or, according to (3.1), when $K$ is closed. $\sigma$ is continuous if $T$ is a compact Hausdorff space and the functions $a_t : T\to\mathbb{R}^n$ and $b_t : T\to\mathbb{R}$ are continuous. Many approximation problems can be formulated as LSIP problems with continuous constraint systems. If $\sigma$ is continuous and there exists $x\in\mathbb{R}^n$ such that $a_t'x>b_t$ for all $t\in T$ (Slater constraint qualification (Slater c.q., in short)), then $\sigma$ is FM ([9], [10], [11] and [12]).
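A simple system that fails to be FM (an example added here, not contained in the original text) is $\sigma=\{tx_1+x_2\ge -t^2,\ t\in(0,1]\}$ in $\mathbb{R}^2$. Every solution satisfies $x_2\ge 0$ (let $t\to 0^+$), but no finite subsystem implies this inequality, since $(\mu,-\varepsilon)$ with $\mu>0$ large enough solves any finite subsystem. Correspondingly, the vector $(0,1,0)'$, which represents the consequence $x_2\ge 0$, belongs to $\mathrm{cl}(K)\setminus K$, so the characteristic cone $K$ is not closed.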

The third class of LSISs will be defined through a local property involving the well-known feasible directions cone at $x\in F$,

$$D(F,x)=\{d\in\mathbb{R}^n \mid x+\theta d\in F \text{ for a certain } \theta>0\},$$

and the active cone at $x$,

$$A(x)=\mathrm{cone}(\{a_t \mid a_t'x=b_t,\ t\in T\}),$$

which play a central role in this theory.

A consistent system $\sigma$ is said to be locally polyhedral (LOP) when $A(x)^\circ=D(F,x)$ for all $x\in F$. A sufficient condition for $\sigma$ to be LOP is the inconsistency of the system $\sigma\cup\{a'x=b\}$ (the system obtained by aggregating $a'x=b$ to $\sigma$) for all $\binom{a}{b}\in D$, $D$ denoting the set of accumulation points of the projection, onto the unit sphere, of $\left\{\binom{a_t}{b_t} \;\middle|\; \binom{a_t}{b_t}\ne 0_{n+1},\ t\in T\right\}$. Both kinds of systems were introduced in [1] and [48], respectively.

We have to point out that any non-empty closed convex set $F$ admits an FM linear representation. Even more, if $\sigma$ is an arbitrary linear representation of $F$ and $\mathrm{int}(F)\ne\emptyset$, then $\sigma\cup\left\{a'x\ge b,\ \binom{a}{b}\in D\right\}$ is an FM representation of $F$. Nevertheless, $F$ admits an LOP linear representation if and only if the non-empty intersections of $F$ with polytopes are polytopes too (which justifies the name chosen for this class of systems).

Theorem 3.2 If $\sigma=\{a_t'x\ge b_t,\ t\in T\}$ is either FM or LOP, then the following statements hold:

(i) $\dim(F)=n-\dim(\mathrm{span}(\{a_t,\ t\in T_c\}))$, where $T_c$ is the set of carrier indices of $\sigma$; i.e., $T_c=\{t\in T \mid a_t'x=b_t \text{ for all } x\in F\}$;

(ii) $\mathrm{rbd}(F)=\bigcup\{F_t \mid t\in T\setminus T_c\}$, where $F_t:=\{x\in F \mid a_t'x=b_t\}$ is the exposed face of $F$ associated with $t\in T$;

(iii) $\mathrm{bd}(F)=\bigcup\{F_t \mid a_t\ne 0_n,\ t\in T\}$;

(iv) $A(x)$ is closed for all $x\in F$. In particular, if $\sigma$ is LOP, $A(x)$ will be polyhedral.

The LOP part of the last theorem was proved in [1], whereas its FM version can be found in [31]. Notice that this result also characterizes $\mathrm{aff}(F)$, $\mathrm{rint}(F)$ and $\mathrm{int}(F)$. The above properties can fail for continuous non-FM systems.

Now consider a dual pair $\{(P),(D)\}$ such that both feasible sets, $F$ and $\Lambda$, are non-empty. Let $O^+(F)$ and $O^+(\Lambda)$ be their respective recession cones. We shall denote by $\sigma_0$ the homogeneous system associated with $\sigma$, namely $\sigma_0=\{a_t'x\ge 0,\ t\in T\}$. Given a solution $u$ of $\sigma_0$, we represent by $A_0(u)$ the active cone at $u$ w.r.t. $\sigma_0$. Table 2 displays, for comparative purposes, some properties of $F$ and $\Lambda$; the latter is considered to be bounded if there exists $k>0$ such that $\lambda_t\le k$ for all $\lambda\in\Lambda$ and all $t\in T$, i.e., if $\Lambda$ is bounded for the uniform norm in $\mathbb{R}^{(T)}$.

The following remarks concern the statements in Table 2:

(i) The references for $F$ are [56], [24] and [26], and for $\Lambda$ the more representative ones are [18] and [24]. In the latter case, the boundedness of $\Lambda$ always implies the pointedness of $M$, which is incompatible with $M=\mathbb{R}^n$. Thus, $F$ and $\Lambda$ cannot be simultaneously bounded (the LP and convex-SIP versions appeared in [14] and [44], respectively). The boundedness conditions for $\Lambda$ are necessary and sufficient conditions provided that $\sigma_0$ is FM (or continuous) and $0_n\notin\mathrm{cl}(\{a_t,\ t\in T\})$.

(ii, iii) The characterization of the extreme points ([11]) and extreme directions ([25]) of $\Lambda$ allows a geometrical interpretation of the simplex method for $(D)$ ([25]). The primal conditions are always sufficient, and they are also necessary for LOP systems ([1]).

(iv) The version for $F$ is the well-known Klee's Theorem, and requires the existence of at least one extreme point; i.e., that $\dim(\mathrm{span}(\{a_t,\ t\in T\}))=n$.


Table 2  Geometrical properties of the feasible sets $F$ and $\Lambda$

| | $F$ | $\Lambda$ |
| boundedness conditions (i) | $O^+(F)=\{0_n\}$; $M=\mathbb{R}^n$; all the inequalities in $\sigma_0$ are carrier and $\dim(\mathrm{span}(\{a_t,\ t\in T\}))=n$ | $O^+(\Lambda)=\{0_T\}$; $M$ is pointed; $\sigma_0$ has no carrier inequalities, $\bigcup\{\mathrm{supp}(\lambda),\ \lambda\in\Lambda\}=T$, $\dim(O^+F)=n$ and $\dim(\mathrm{span}(\{a_t,\ t\in T\}))=n$ |
| extreme points (ii) | those $x\in F$ with $\dim(A(x))=n$ | those $\lambda\in\Lambda$ such that $\{a_t,\ t\in\mathrm{supp}(\lambda)\}$ is linearly independent |
| extreme directions (iii) | those $u\in(O^+F)\setminus\{0_n\}$ such that $\dim(A_0(u))=n-1$ | those $\delta\in(O^+\Lambda)\setminus\{0_T\}$ such that $\{a_t,\ t\in\mathrm{supp}(\delta)\}$ is affinely independent |
| representation theorem (iv) | the feasible set is the sum of the convex hull of its extreme points and the conical hull of its extreme directions (for both $F$ and $\Lambda$) | |

The dual version appeared in [11] and [46] for the bounded and the general cases, respectively. The last one suggests a purification algorithm for (D).

4 OPTIMALITY

Let us denote by $F^*$ and $\Lambda^*$ the optimal sets of $(P)$ and $(D)$. The boundedness of these problems does not imply solvability but, if they are solvable, then the corresponding optimal set is the sum of the convex hull of the optimal extreme points and the conical convex hull of those extreme directions along which the objective functional vanishes (assuming $\dim(\mathrm{span}(\{a_t,\ t\in T\}))=n$, for $F^*$), as a consequence of the representation theorem.

Theorem 4.1 Given $\bar x\in F$, the following statements are sufficient conditions for $\bar x\in F^*$:

(i) There exists $\bar\lambda\in\Lambda$ such that $\bar\lambda_t(a_t'\bar x-b_t)=0$ for all $t\in T$;

(ii) There exists $\bar\lambda\in\mathbb{R}_+^{(T)}$ such that $L(\bar x,\lambda)\le L(\bar x,\bar\lambda)\le L(x,\bar\lambda)$ for all $x\in\mathbb{R}^n$ and for all $\lambda\in\mathbb{R}_+^{(T)}$, where $L(x,\lambda)$ is the Lagrangian function
$$L(x,\lambda)=c'x+\sum_{t\in T}\lambda_t(b_t-a_t'x);$$

(iii) $c\in\mathrm{cl}(A(\bar x))$.

They are necessary optimality conditions too when $\sigma$ is either FM or LOP. In such a case $\dim(F^*)=n-\dim(A(\bar x))$.

The sufficiency of (i) is the complementary slackness lemma in [23], and $\bar\lambda$ is a complementary multiplier vector for $\bar x$. The point $(\bar x,\bar\lambda)$ in (ii) is a saddle point of the Lagrangian function $L(\cdot,\cdot)$. The sufficiency of (iii) implies the sufficiency of the Karush-Kuhn-Tucker condition $c\in A(\bar x)$, and both conditions are equivalent for FM and LOP systems, by Theorem 3.2 (iv). The equivalence between $c\in A(\bar x)$ and $\bar x\in F^*$ was proved in [6] for continuous systems satisfying the Slater c.q. (a particular class of FM systems). Extended versions of Theorem 4.1, which subsume many other optimality conditions, can be found in [29] and [31]. In [31] the dimensional equation for $F^*$ is proved, and optimality conditions for $(D)$ are given as well. Finally, in [31] and in [47] some optimality conditions for convex SIP are derived, emphasizing the importance of the FM property in relation to the proposed linearizing system (based on the subgradients of the convex functions which are involved in the problem).
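As a concrete illustration (added here; it does not appear in the original text), let $n=2$, $T=[0,2\pi]$, $a_t=(\cos t,\sin t)'$, $b_t=-1$ and $c=(1,0)'$. The feasible set is the closed Euclidean unit ball, and $\bar x=(-1,0)'$ is optimal: the active indices are $t=0$ and $t=2\pi$ (both with $a_t=(1,0)'$), so $A(\bar x)=\mathrm{cone}(\{(1,0)'\})$ and $c\in A(\bar x)$, which is condition (iii); the generalized finite sequence $\bar\lambda$ with $\bar\lambda_0=1$ and $\bar\lambda_t=0$ otherwise is a complementary multiplier vector as in (i).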

Once again we shall compare the properties of primal and dual objects, now the boundedness of the optimal sets, by means of a table. We assume, for $(D)$, that $\sigma$ is a continuous consistent system.

Table 3  Boundedness of the optimal sets

| $F^*$ | $\Lambda^*$ |
| the non-empty level sets are bounded | |
| the optimal set is still bounded under small perturbations of $c$ | the optimal set is still bounded under small perturbations of all the data, providing dual feasibility |
| $c\in\mathrm{int}(M)$ (superconsistency of $(D)$) | Slater c.q. (superconsistency of $(P)$) |


The equivalence between the conditions in the columns of Table 3, and some others, under the same or under different hypotheses for $(D)$, is established in [24], [31] and [38].

5 DUALITY THEOREMS AND DISCRETIZATION

If $|T|<\infty$ and both $(P)$ and $(D)$ are bounded, then the so-called duality gap $\delta(P,D):=v(P)-v(D)$ equals $0$. This well-known LP duality theorem fails for LSIP problems, so that two questions arise for a given dual pair $\{(P),(D)\}$:

1st. Does {(P), (D)} satisfy the duality theorem? If not,

2nd. Is it possible to avoid the duality gap by extending one of the coupled problems, in such a way that the optimal value is attainable, at least, for one of them?

A third question concerns the possibility of solving $(P)$ by appealing to some discretization strategy. We say that $(P)$ is discretizable if there exists a sequence $\{T_r\}$ of finite subsets of $T$ such that, denoting by $v_r$ the optimal value of the subproblem obtained by replacing $T$ by $T_r$ in $(P)$, we have $v(P)=\lim_r v_r$.
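The following minimal sketch (added here for illustration; it is not part of the survey, and it assumes SciPy's `linprog` as LP solver on an instance chosen for this purpose) shows the grid values $v_r$ of a discretizable problem increasing to $v(P)$.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative LSIP (not from the survey):
#   (P)  Inf x1   s.t.  (cos t) x1 + (sin t) x2 >= -1   for all t in [0, 2*pi],
# whose feasible set is the closed unit disk, so v(P) = -1.

c = np.array([1.0, 0.0])

def v_r(r):
    """Optimal value of the LP obtained from (P) with a finite grid T_r of r points."""
    # Offset grid, so the index t = 0 that is active at the optimum is never hit exactly.
    t = (np.arange(r) + 0.5) * 2 * np.pi / r
    # linprog solves min c'x s.t. A_ub @ x <= b_ub; rewrite a_t'x >= b_t as -a_t'x <= -b_t.
    A_ub = -np.column_stack([np.cos(t), np.sin(t)])
    b_ub = np.ones(r)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None), (None, None)])
    return res.fun

for r in (3, 6, 12, 48, 192):
    print(r, v_r(r), -1.0 / np.cos(np.pi / r))   # v_r = -1/cos(pi/r), increasing to v(P) = -1
```

For this instance the grid values are $v_r=-1/\cos(\pi/r)$, which increase monotonically to $v(P)=-1$ as the grid is refined.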

If $(P)$ is not discretizable, there are two outlets giving rise to the approximation of $v(P)$ through a typical diagonal process:

$(P)$ is always regularizable ([2], [31]) in the sense that there exists a polytope $C$, with $0_n\in\mathrm{int}(C)$, such that $v(P)=\lim_{\rho\to\infty}v(\rho)$, where $v(\rho)$ denotes the value of the discretizable LSIP problem obtained by aggregating to $(P)$ the additional constraint $x\in\rho C$, $\rho>0$.

$(P)$ is weakly discretizable (a concept due to [43]) if there exists $d\in\mathbb{R}^n$ such that $v(P)=\lim_{\alpha\downarrow 0}v(\alpha)$, where $v(\alpha)$ is the optimal value of the discretizable LSIP problem obtained by replacing the objective function in $(P)$ by the perturbed one $(c+\alpha d)'x$, with $\alpha>0$.

The third question, concerning the given problem (P), is:

3rd. Is (P) discretizable? If not, is (P), at least, weakly discretizable?


Table 4 gives an answer to the first and to the third questions:

Table 4  Duality states

| $\sigma$ | $c$ | $(P)$ | $\delta(P,D)$ |
| consistent | $c\in\mathrm{rint}(M)$ | discretizable | $0$ |
| | $c\in\mathrm{rbd}(M)$ | weakly discretizable, and discretizable $\Leftrightarrow C_1=C_2$ | $0 \Leftrightarrow C_1=C_2$ |
| | $c\in\mathrm{ext}(M)$ | discretizable | $0$ |
| asymptotically inconsistent | $c\in A$ | discretizable | $0$ |
| | $c\in\mathrm{cl}(M)\setminus A$ | weakly discretizable but non-discretizable | $+\infty$ |
| | $c\in\mathrm{ext}(M)$ | not even weakly discretizable | $+\infty$ |
| strongly inconsistent | | discretizable | $0 \Leftrightarrow c\in M$ |

$C_1:=\mathrm{cl}[(\{c\}\times\mathbb{R})\cap K]$; $\quad C_2:=(\{c\}\times\mathbb{R})\cap\mathrm{cl}(K)$

Notice that, if $\sigma$ is FM and, hence, $K$ is closed, the sets $C_1$ and $C_2$ coincide, thus $\delta(P,D)=0$. Analogously, if $(P)$ is consistent and $(D)$ superconsistent (i.e., $c\in\mathrm{int}(M)$), we also get $\delta(P,D)=0$. These are the classical duality theorems in LSIP (the first one was proved in [12], whereas the second one is discussed in [22]). The last theorem was extended in [52] to $c\in\mathrm{rint}(M)$. The statements about discretizability were analyzed in [27]. The discussion is completed in [31].
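A small example of the failure of LP duality (added here for illustration; it is not taken from the original text) is the consistent problem $\mathrm{Inf}\ x_2$ s.t. $tx_1+x_2\ge 0$, $t\in(0,1]$. Its feasible set is $\{x\in\mathbb{R}^2 \mid x_2\ge\max\{0,-x_1\}\}$, so $v(P)=0$, whereas the dual problem is inconsistent, since $\sum_t\lambda_t(t,1)'=(0,1)'$ forces $\lambda=0$. Here $c=(0,1)'\in\mathrm{rbd}(M)\setminus M$ and $C_1=\emptyset\ne C_2$, in accordance with Table 4.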

Finally in this section, we consider the second question. We say that a pair of optimization problems depending on a parameter $c\in\mathbb{R}^n$, $\{P(c),D(c)\}$, is a p-uniform (d-uniform) dual pair if for every $c\in\mathbb{R}^n$ exactly one of the following statements holds:

(a) P(c) and D(c) are both inconsistent;

(b) P(c) is inconsistent and D(c) is unbounded;

(c) P(c) is unbounded and D(c) is inconsistent;

(d) P(c) and D(c) are both consistent, have the same value and P(c) is solvable (D(c) is solvable, respectively).


Observe that, replacing the given vector $c$ in $(P)$ and $(D)$ by a parameter denoted in the same way, we obtain a pair of parameterized problems $\{P(c),D(c)\}$ which is a d-uniform dual pair provided that $\sigma$ is either FM ([16]) or strongly inconsistent ([31]). Let us denote by $\bar P(c)$ the optimization problem whose feasible set $\bar F$ is formed by all the asymptotic solutions of $\sigma$, and whose optimal value is
$$\bar v(c)=\inf\left\{\liminf_r c'x^r \;\middle|\; \{x^r\} \text{ is an asymptotic solution of } \sigma\right\}.$$

Similarly, denote by $\bar D(c)$ the result of replacing $\sigma$ in $D(c)$ by an arbitrary FM representation of $F$ (this can be achieved in practice, by adding constraints, if $\mathrm{int}(F)\ne\emptyset$; i.e., by enlarging the feasible set of the dual problem). Starting from a significant result in [40], the following result is proved in [31]:

Theorem 5.1 Given an LSIS $\sigma$, the following statements hold:

(i) $\{P(c),\bar D(c)\}$ is a d-uniform dual pair;

(ii) $\{\bar P(c),D(c)\}$ is a p-uniform dual pair.

6 STABILITY OF THE LSIS

This section approaches the stability of the LSIS $\sigma=\{a_t'x\ge b_t,\ t\in T\}$. Many LSIP problems have coefficients which are analytic functions, whose values have to be approximated in the computing process, or functions taking irrational values, which are rounded off. Then we actually solve a perturbed problem whose constraint system $\sigma_1=\{(a_t^1)'x\ge b_t^1,\ t\in T\}$ is close to $\sigma$. Throughout this section we analyze the stability properties of the solution set mapping $\mathcal{F}$, which assigns to each system $\sigma$ its (possibly empty) solution set $F$.

According to Robinson [51], $\sigma$ will be stable under small perturbations if $\mathcal{F}$ is lower semicontinuous at $\sigma$, provided that $\sigma$ ranges over a certain Banach space. This assertion motivated the study of the lower semicontinuity, as well as of many other stability properties of $\mathcal{F}$, in [29], [34] and [35], where $T$ is not endowed with any particular topological structure. Nevertheless, in order to build a stability theory, one always needs to define a measure of the size of the perturbations. Thus, if we denote by $\Theta$ the set of all the systems in $\mathbb{R}^n$ whose index set is $T$, a natural pseudometric will be introduced in $\Theta$ by means of the following pseudodistance
$$d(\sigma_1,\sigma):=\sup_{t\in T}\left\|\begin{pmatrix}a_t^1\\ b_t^1\end{pmatrix}-\begin{pmatrix}a_t\\ b_t\end{pmatrix}\right\|_\infty.$$
In this way, the so-called parameter space $(\Theta,d)$ becomes a pseudometric space, whose topology is Hausdorff, satisfies the first axiom of countability and describes the uniform convergence on $\Theta\equiv(\mathbb{R}^{n+1})^T$.

We shall apply many different stability criteria in relation to the solution set mapping. For instance, if $\lim_r\sigma_r=\sigma$ according to the pseudometric in $\Theta$, the uniform convergence entails the pointwise convergence of all the functions involved as coefficients of $\sigma_r$ to the corresponding coefficient functions of $\sigma$. So, for every sequence $\{x^r\}$, $x^r\in F_r$, converging to $\bar x$, one gets $\bar x\in F$. In other words, we have just realized that $\mathcal{F}$ is a closed mapping at $\sigma$.

The following theorem provides various characterizations of the lower semicontinuity of $\mathcal{F}$ at a consistent system $\sigma$. Hereafter we represent by $\Theta_c$ the subset of consistent systems of the parameter space $\Theta$, whereas $\Theta_i$ denotes the subset of inconsistent systems, which is itself split into two parts, $\Theta_w$ and $\Theta_s$, containing the weakly and the strongly inconsistent systems, respectively.

Theorem 6.1 If $\sigma=\{a_t'x\ge b_t,\ t\in T\}\in\Theta_c$, then the following statements are equivalent:

(i) $\mathcal{F}$ is lower semicontinuous (lsc, for short) at $\sigma$; i.e., for each open set $W$ such that $F\cap W\ne\emptyset$ there exists an open set $U$ in $\Theta$, containing $\sigma$, such that $F_1\cap W\ne\emptyset$ for every $\sigma_1\in U$;

(ii) $\sigma\in\mathrm{int}(\Theta_c)$;

(iii) The zero-function $0_T$ does not belong to $\mathrm{bd}\{G(\mathbb{R}^n)-\mathbb{R}_+^T\}$, where $G:\mathbb{R}^n\to\mathbb{R}^T$ is the slack function $G(x):=a_t'x-b_t$, and $\mathbb{R}_+^T$ represents the positive cone in $\mathbb{R}^T$;

(iv) $b(\cdot)\in\mathrm{int}\{J(\mathbb{R}^n)-\mathbb{R}_+^T\}$, where $J:\mathbb{R}^n\to\mathbb{R}^T$ is $J(x):=a_t'x$; i.e., sufficiently small perturbations of the right hand side (RHS) function do not affect the consistency;

(v) $0_{n+1}\notin\mathrm{cl}\left(\mathrm{conv}\left(\left\{\binom{a_t}{b_t},\ t\in T\right\}\right)\right)$;

(vi) $\sigma$ satisfies the strong Slater condition; i.e., there exist a positive scalar $\varepsilon$ and a point $\hat x\in\mathbb{R}^n$, called an SS-element, such that $a_t'\hat x\ge b_t+\varepsilon$ for all $t\in T$;

(vii) For each $x\in F$ there exists a pair of positive scalars, $\beta$ and $\varepsilon$, such that the inequality $d(x,F_1)\le\beta h(x,\sigma_1)$ holds for every $\sigma_1$ verifying $d(\sigma_1,\sigma)<\varepsilon$, where $h(x,\sigma_1)$ constitutes the following measure of the infeasibility of $x$ with respect to $\sigma_1$:
$$h(x,\sigma_1):=\max\{0,\ \sup_{t\in T}[b_t^1-(a_t^1)'x]\};$$

(viii) $\mathcal{F}$ is dimensionally stable at $\sigma$; i.e., $\dim(F_1)=\dim(F)$ for every system $\sigma_1$ in a certain neighborhood of $\sigma$;

(ix) For every sequence $\{\sigma_r\}\subset\Theta$ converging to $\sigma$, there exists $r_0$ such that $F=\lim_{r\ge r_0}F_r$ in the Painlevé-Kuratowski sense; i.e., $F=\liminf_{r\ge r_0}F_r=\limsup_{r\ge r_0}F_r$;

(x) $F=\mathrm{cl}\left(\bigcup_{\varepsilon>0}\{x\in\mathbb{R}^n \mid a_t'x\ge b_t+\varepsilon,\ t\in T\}\right)$; i.e., $F$ coincides with the closure of the set formed by all the SS-elements of $\sigma$.

If $0_n\notin\mathrm{bd}(\mathrm{conv}(\{a_t,\ t\in T\}))$, another condition can be added to the list:

(xi) $\mathcal{F}$ is topologically stable at $\sigma$; i.e., $F_1$ is homeomorphic to $F$ for every system $\sigma_1$ in a certain neighborhood of $\sigma$.

The previous theorem shows that many stability concepts taken from the literature, when applied to our problem, provide alternative characterizations of the lower semicontinuity property. More precisely, a system $\sigma$ satisfying (iii) is called non-critical in [55], which works in a more general setting; (iv) is called regularity in [51] by Robinson, who also proved the equivalence between (iv) and (vii) for a class of systems which does not include $\Theta_c$; topological stability (xi), and its relation with the extended Mangasarian-Fromovitz constraint qualification (EMFCQ), has been extensively studied in [41] and [42] in the context of non-linear (parametric) SIP with $C^1$ data; condition (vii), also called metric regularity, plays an important role in the stability and sensitivity analysis of many optimization problems, like $C^1$-parametric SIP, in which case its equivalence with EMFCQ is established in [37]. In [34] the equivalence of the first seven conditions is proved, whereas (viii) and (xi) are considered in [30]. Conditions (ix) and (x) are established in [8], and the almost complete independence between the set of SS-elements and $\mathrm{int}(F)$ is remarked upon in [31].
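A simple illustration of the failure of these conditions (an example added here, not contained in the original text) is the system $\sigma=\{tx\ge -t^2,\ t\in[0,1]\}$ in $\mathbb{R}$, whose solution set is $F=[0,\infty)$. No strong Slater element exists, because the constraint at $t=0$ reads $0\ge 0$ and cannot hold with a uniform slack $\varepsilon>0$; accordingly, $\mathcal{F}$ is not lsc at $\sigma$: perturbing only the right hand side at $t=0$ to any $\delta>0$ yields an inconsistent system arbitrarily close to $\sigma$, so $\sigma\notin\mathrm{int}(\Theta_c)$.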


Concerning the upper semicontinuity of $\mathcal{F}$ at $\sigma$, the standard Dolecki characterization of upper semicontinuity for mappings between metric spaces (see, for instance, [4], Lemma 2.2.2) has inspired the following result (observe that $(\Theta,d')$, with $d'(\sigma_1,\sigma):=\min\{1,d(\sigma_1,\sigma)\}$, is a complete metric space, locally equivalent to $(\Theta,d)$, providing the uniform convergence topology too):

Theorem 6.2 Given a system $\sigma\in\Theta_c$, the following assertions are equivalent:

(i) $\mathcal{F}$ is upper semicontinuous (usc, in brief) at $\sigma$; i.e., for each open set $W$ containing $F$, there exists an open set $U$, $\sigma\in U\subset\Theta$, such that, if $\sigma_1\in U$, its solution set satisfies $F_1\subset W$;

(ii) There exists a couple of positive scalars, $\varepsilon$ and $\rho$, such that $F_1\setminus\rho\,\mathrm{cl}(B)\subset F\setminus\rho\,\mathrm{cl}(B)$ for every $\sigma_1\in\Theta$ such that $d(\sigma_1,\sigma)<\varepsilon$; i.e., the solution sets of neighboring systems differ from the original one in some uniformly bounded manner.

This equivalence can be found in [35], where many consequences are derived. In particular, if $F$ is a non-empty bounded set, then $\mathcal{F}$ is usc at $\sigma$, but the boundedness of $F$ is also a necessary condition for the upper semicontinuity of $\mathcal{F}$ at $\sigma$ if $n\ge 2$ and $\{a_t,\ t\in T\}$ is bounded too (see [20] for the continuous case). In the very particular case $n=1$, $\mathcal{F}$ is always usc at $\sigma$. Finally, it deserves to be mentioned that, if $T$ is infinite, there always exists an LSIS $\sigma_1$, equivalent to $\sigma$, whose index set has the same cardinality as $T$, and such that $\mathcal{F}$ is continuous at $\sigma_1$ (in fact, we can take $\sigma_1=\{(ra_t)'x\ge rb_t-1,\ (t,r)\in T\times\mathbb{N}\}$).

The principal merit of the characterization given in the last theorem is that it does not require the boundedness of $F$, but its main drawback is that it is scarcely useful from a practical point of view, since it does not involve the representation of $F$. Some other, rather technical, conditions (necessary or sufficient for $\mathcal{F}$ to be usc at $\sigma$), supplied in [35], actually rely on the coefficients of $\sigma$ and exploit the properties of the asymptotic cone of $\{a_t,\ t\in T\}$. Nevertheless, they are also difficult to check in practice.

Next we approach the stability of inconsistent systems. Now the main problem is to characterize the topological interiors of $\Theta_s$ and $\Theta_w$ ($\mathcal{F}$ is trivially lsc at $\sigma\in\Theta_i$, whereas it is usc at such a system if and only if $\sigma\in\mathrm{int}(\Theta_i)$).


Theorem 6.3 ([34]) Given $\sigma\in\Theta$, the following statements are equivalent:

(i) $\sigma\in\mathrm{int}(\Theta_s)$;

(ii) $\binom{0_n}{1}\in\mathrm{int}(N)$;

(iii) $\binom{0_n}{1}\in\mathrm{int}(K)$;

(iv) $\sigma\in\Theta_i$ and $M=\mathbb{R}^n$.

Theorem 6.4 ([34]) Consider the following statements for $\sigma\in\Theta$:

(i) $0_n\notin\mathrm{cl}(\mathrm{conv}(\{a_t,\ t\in T\}))$ and $\{b_t\|a_t\|^{-1},\ t\in T\}$ is unbounded;

(ii) $\sigma\in\mathrm{int}(\Theta_w)$;

(iii) $b(\cdot)$ is unbounded on $T$.

Then, (i)$\Rightarrow$(ii)$\Rightarrow$(iii).

The following theorem deals with the relationships between the different sets considered above. The stability behaviour of the different classes clearly relies on the cardinality of $T$.

Theorem 6.5 ([34]) The following propositions hold:

(i) $\emptyset\ne\mathrm{int}(\Theta_c)\subsetneq\Theta_c$. Moreover, $\Theta_c\setminus\mathrm{int}(\Theta_c)\subset\mathrm{cl}(\Theta_s)$;

(ii) $\mathrm{int}(\Theta_i)\subsetneq\Theta_i$. Moreover, $\mathrm{int}(\Theta_i)\ne\emptyset$ if and only if $|T|\ge n+1$;

(iii) $\mathrm{int}(\Theta_s)\subsetneq\Theta_s$. In addition, $\mathrm{int}(\Theta_s)\ne\emptyset$ if and only if $|T|\ge n+1$;

(iv) $\emptyset\ne\mathrm{int}(\Theta_w)\subsetneq\Theta_w$ if $|T|=\infty$. Otherwise, $\Theta_w=\emptyset$;

(v) $\mathrm{int}(\Theta_s)\cup\mathrm{int}(\Theta_w)\subsetneq\mathrm{int}(\Theta_i)$ if and only if $|T|=\infty$.

As a consequence of this theorem, if $|T|=\infty$ the four proper subsets of $\Theta$ considered here are neither open nor closed. In [34] the existence of (non-trivial) highly unstable systems is illustrated, showing systems that belong to $\mathrm{bd}(\Theta_c)\cap\mathrm{bd}(\Theta_w)\cap\mathrm{bd}(\Theta_s)$.

Finally in this section, we introduce the set $\Theta_0\subset\Theta$ of all the continuous LSISs indexed by $T$, where only continuous perturbations of the coefficients are permitted. $d$ induces the uniform distance in $\Theta_0$, and $(\Theta_0,d)$ becomes a complete metric space. Our aim is to discuss the continuity properties of $\mathcal{F}$ restricted to $\Theta_0$. The sets of continuous consistent, continuous inconsistent, continuous weakly inconsistent and continuous strongly inconsistent systems are denoted by $\Theta_{0c}$, $\Theta_{0i}$, $\Theta_{0w}$ and $\Theta_{0s}$, respectively. The same sub-index is also used to distinguish the corresponding topological operators in $(\Theta_0,d)$.

In this continuous setting, Urysohn's Lemma turns out to be the main tool, providing the characterization of $\mathrm{int}_0(\Theta_{0c})$ as $\Theta_0\cap\mathrm{int}(\Theta_c)$, and giving rise to a large set of different characterizations of the lower semicontinuity of $\mathcal{F}|_{\Theta_0}$ at $\sigma\in\Theta_{0c}$ [34]. Most of them are nothing else but the continuous versions of the corresponding properties in the list supplied by Theorem 6.1. In precise terms, condition (ii) there now reads $\sigma\in\mathrm{int}_0(\Theta_{0c})$; the topological concepts involved in conditions (iii) and (iv) are those corresponding to $C(T)$, whose positive cone $C_+(T)$ replaces $\mathbb{R}_+^T$; in condition (v) the closure operator is not required; condition (vi) is now equivalent to the ordinary Slater c.q., which is additionally equivalent to the full-dimensionality of $F$ together with the absence of the trivial inequality $0_n'x\ge 0$ in $\sigma$; and the actual meaning of (x), in this context, is that $\mathrm{int}(F)$ coincides with the set of Slater points. Some of these equivalences can be traced out in [7], [13], [51] and [54].

As one can expect, the class $\Theta_{0w}$ is much more unstable than $\Theta_w$. In fact, $\mathrm{int}_0(\Theta_{0w})=\emptyset$, and $\Theta_{0w}$ is itself empty if, and only if, $|T|$ is finite. It can also be stated that $\mathrm{int}_0(\Theta_{0i})=\mathrm{int}_0(\Theta_{0s})$, and these sets are non-empty if and only if $|T|\ge n+1$.

7 STABILITY AND WELL-POSEDNESS OF THE LSIP PROBLEM

In a parametric setting, the primal LSIP problem, $\mathrm{Inf}\ c'x$ s.t. $a_t'x\ge b_t$, $t\in T$, will be represented by the pair $\pi:=(c,\sigma)$, where $\sigma=\{a_t'x\ge b_t,\ t\in T\}$. The parameter space, $\Pi$, is the set of problems $\pi:=(c,\sigma)$ whose constraint systems have the same index set and $c\ne 0_n$. We shall consider the following subsets: $\Pi_c$ is the set of consistent problems, i.e., those problems $\pi:=(c,\sigma)$ such that $\sigma\in\Theta_c$; $\Pi_b$ is the set of bounded problems, i.e., those having a finite optimal value; and $\Pi_s$ is the set of solvable problems, i.e., the problems which have optimal solutions. Obviously, $\Pi_s\subset\Pi_b\subset\Pi_c$. Now, $\Pi$ is endowed again with the topology of uniform convergence, derived from the pseudo-distance
$$\delta(\pi_1,\pi):=\max\{\|c^1-c\|_\infty,\ d(\sigma_1,\sigma)\},$$
where $\pi=(c,\sigma)$ and $\pi_1=(c^1,\sigma_1)$.

In this section we analyze the stability behaviour of the optimal value function $V$ ($V(\pi)=v$) and of the optimal set mapping $\mathcal{F}^*$ ($\mathcal{F}^*(\pi)=F^*$). The results presented in this section show that the lower semicontinuity of $\mathcal{F}$ at $\sigma$ and the boundedness of $F^*$, especially when they are simultaneously fulfilled, entail nice stability properties of $V$ and $\mathcal{F}^*$ at $\pi$.

Theorem 7.1 ([8], [31]) In relation to the problem $\pi=(c,\sigma)\in\Pi_c$ the following properties hold:

(i) $V$ is usc at $\pi$ if and only if $\mathcal{F}$ is lsc at $\sigma$;

(ii) If $F^*$ is a non-empty bounded set, $V$ will be lsc at $\pi$;

(iii) If the assumptions in (i) and (ii) are simultaneously satisfied, there will exist a pair of positive scalars, $\delta$ and $k$, such that, when $\delta(\pi_i,\pi)<\delta$, $i=1,2$, the following Lipschitzian inequality is satisfied:
$$|V(\pi_1)-V(\pi_2)|\le k\,\delta(\pi_1,\pi_2).$$

Theorem 7.2 ([8], [31]) For any $\pi=(c,\sigma)\in\Pi_s$, the following propositions hold:

(i) $\mathcal{F}^*$ is closed at $\pi$ if and only if either $\mathcal{F}$ is lsc at $\sigma$ or $F=F^*$;

(ii) If $\mathcal{F}^*$ is usc at $\pi$, then $\mathcal{F}^*$ is closed at $\pi$, and the converse statement holds if $F^*$ is bounded;

(iii) $\mathcal{F}^*$ is lsc at $\pi$ if and only if $\mathcal{F}$ is lsc at $\sigma$ and $F^*$ is a singleton.

Table 5 summarizes all the results contained in the last two theorems, emphasizing the subtle relationship existing between the studied properties, which are denoted as follows: I = $\mathcal{F}^*$ is lsc at $\pi$; II = $\mathcal{F}^*$ is usc at $\pi$; III = $V$ is lsc at $\pi$; IV = $V$ is usc at $\pi$; ($\neg$I) = I does not hold (etc.).

Table 5  Stability properties of $\mathcal{F}^*$ and $V$ (assuming $F^*\ne\emptyset$)

| | | $\mathcal{F}$ is lsc at $\sigma$ | $\mathcal{F}$ is not lsc at $\sigma$ |
| $F^*$ is a singleton | $F=F^*$ | I, II, III, IV | ($\neg$I), II, III, ($\neg$IV) |
| | $F\ne F^*$ | I, II, III, IV | ($\neg$I), ($\neg$II), III, ($\neg$IV) |
| $F^*$ is bounded, not a singleton | $F=F^*$ | ($\neg$I), II, III, IV | ($\neg$I), II, III, ($\neg$IV) |
| | $F\ne F^*$ | ($\neg$I), II, III, IV | ($\neg$I), ($\neg$II), III, ($\neg$IV) |
| $F^*$ is unbounded | $F=F^*$ | ($\neg$I), ($\neg$III), IV | ($\neg$I), ($\neg$III), ($\neg$IV) |
| | $F\ne F^*$ | ($\neg$I), ($\neg$III), IV | ($\neg$I), ($\neg$II), ($\neg$III), ($\neg$IV) |

The statements in Theorems 7.1 and 7.2 constitute extensions, to the general LSIP, of some results given in [7] and in [20], respectively, for the continuous LSIP.

In LSIP the continuous dependence of the optimal solutions on the problem data, expressed in the classical idea of well-posedness in physics due to Hadamard, gives rise to the following concept:

$\{x^r\}$ is said to be an asymptotically minimizing sequence for $\pi\in\Pi$ if there exists an associated sequence of perturbed problems, $\{\pi_r=(c^r,\sigma_r)\}\subset\Pi_b$, such that $\lim_r\pi_r=\pi$, $x^r\in F_r$ for all $r$, and
$$\lim_r\{(c^r)'x^r-v_r\}=0.$$

We propose a definition of well-posedness based on the strategy of approximately solving the perturbed problems, allowing the existence of more than one optimal solution. It is oriented towards the stability of the optimal value (in [17] it is called value Hadamard well-posedness).

The problem $\pi\in\Pi_s$ will be Hadamard well-posed (Hwp, shortly) if for each $x\in F^*$ and for each possible $\{\pi_r\}\subset\Pi_b$ converging to $\pi$, we can find some associated asymptotically minimizing sequence converging to $x$.

Theorem 7.3 ([8], [31]) Given $\pi=(c,\sigma)\in\Pi_s$, the following statements hold:

(i) If $\pi$ is Hwp, then $V|_{\Pi_b}$ is continuous at $\pi$. If $\mathcal{F}$ is lsc at $\sigma$, the converse statement is also true;

(ii) Assuming that $F^*$ is bounded, $\pi$ will be Hwp if and only if either $\mathcal{F}$ is lsc at $\sigma$ or $F$ is a singleton;

(iii) When $F^*$ is unbounded and $\pi$ is Hwp, $\mathcal{F}$ has to be lsc at $\sigma$;

(iv) If $F^*=\{\bar x\}$, then $\pi$ is Hwp if and only if for every sequence $\{\pi_r\}\subset\Pi_s$ converging to $\pi$, and for every sequence $\{x^r\}$, $x^r\in F_r^*$, $r=1,2,\dots$, we get $\lim_r x^r=\bar x$.

The condition formulated in (iv), together with the unicity of the optimal solution, constitutes, for many authors, a convenient definition of Hadamard well-posedness (see, for instance, [54]). So, we have established the equivalence of both concepts under the unicity of $F^*$. Table 6 allows an easy reading of all the partial results supplied by the last theorem.

Table 6  Hadamard well-posedness

| | | $\mathcal{F}$ is lsc at $\sigma$ | $\mathcal{F}$ is not lsc at $\sigma$ |
| $F^*$ is a singleton | $F=F^*$ | $\pi$ is Hwp | $\pi$ is Hwp |
| | $F\ne F^*$ | $\pi$ is Hwp | $\pi$ is not Hwp |
| $F^*$ is bounded, not a singleton | | $\pi$ is Hwp | $\pi$ is not Hwp |
| $F^*$ is unbounded | | | $\pi$ is not Hwp |

In the case corresponding to the blank cell, $\pi$ is Hwp if and only if $V|_{\Pi_b}$ is lsc at $\pi$. Moreover, the conditions "$\pi$ is Hwp", "$\mathcal{F}^*$ is usc at $\pi$" and "$F=F^*$" are mutually independent.

A different concept of well-posedness, not involving any perturbation, is Tykhonov well-posedness, which requires, for every sequence $\{x^r\}\subset F$ such that $\lim_r c'x^r=v$, the fulfilment of
$$\lim_r d(x^r,F^*)=0.$$

It can be proved ([31]) that the boundedness of $F^*$ is a sufficient condition for Tykhonov well-posedness, and LSIP problems can be given which do not enjoy this property. Moreover, in our setting, some other related concepts, like the so-called Levitin-Polyak well-posedness (the approximating points $x^r$ are not required to be feasible), as well as the validity of a certain conditioning inequality, are completely equivalent (see [31], Chapter 10, and references therein).


8 OPTIMAL SOLUTION UNICITY

$F^*$ is a singleton if and only if the diameter of the $\varepsilon$-optimal sets tends to zero as $\varepsilon$ approaches zero. Moreover, since numerical methods usually stop when a particular $\varepsilon$-optimal set is reached, in order to obtain accurate solutions at an early stage those sets should decrease fast in size. This is accomplished if the strong uniqueness property holds.

$\bar x\in F$ is strongly unique for $\pi\equiv(c,F)=\mathrm{Inf}\ c'x$ s.t. $x\in F$, if there exists an $\alpha>0$ such that, for all $x\in F$, we have
$$c'x\ge c'\bar x+\alpha\|x-\bar x\|.$$
Obviously, in this case, $F^*=\{\bar x\}$. Observe that this property is independent of the linear representation chosen for $F$.

Next, we give a local concept yielding a continuous transition from usual uniqueness to strong uniqueness. Given $\omega\ge 1$, $\bar x\in F$ is $\omega$-unique (or $\bar x$ satisfies the growth condition of order $\omega$) if there exist two positive scalars $\alpha$ and $\rho$ such that
$$c'x\ge c'\bar x+\alpha\|x-\bar x\|^\omega, \quad \text{for all } x\in F\cap(\bar x+\rho B)$$
(we note that $0^\omega=0$).

It can be proved that strong uniqueness is equivalent to $1$-uniqueness, and $\omega$-uniqueness implies $\omega'$-uniqueness for every $\omega'>\omega$. Actually, $\omega$-uniqueness depends on the curvature of $F$ at $\bar x$. For instance, given the problem $\pi=\mathrm{Inf}\ c'x$ s.t. $\|x\|\le k$, the point $\bar x=-k\|c\|^{-1}c$ is $2$-unique, but it is not $\omega$-unique for any other $\omega<2$. As $\omega$ increases, $\omega$-uniqueness constitutes an increasingly weaker uniqueness condition and, given any $\omega>1$, we can always build an optimization problem in $\mathbb{R}^n$, $\pi=(c,F)$, possessing an $\omega$-unique optimal solution which is not $\omega'$-unique for any $\omega'<\omega$. Moreover, one can find a problem $\pi=(c,F)$, $F\subset\mathbb{R}^n$, such that the boundary of $F$ is so flat at its unique optimal solution $\bar x$ that $\bar x$ is not $\omega$-unique for any $\omega\ge 1$.
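The $2$-uniqueness claim above can be verified directly (a short computation added here): for $\|x\|\le k$ and $\bar x=-k\|c\|^{-1}c$ one has $\|x-\bar x\|^2=\|x\|^2+k^2+2k\|c\|^{-1}c'x\le(2k/\|c\|)(c'x+k\|c\|)$, hence
$$c'x\ge c'\bar x+\frac{\|c\|}{2k}\|x-\bar x\|^2 \quad\text{for all } x\in F,$$
so the growth condition of order $2$ holds with $\alpha=\|c\|/(2k)$ and any $\rho>0$.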

The following theorem gives a couple of characterizations of the strong uniqueness property which are independent of the representation of $F$. The proof in [31] is based on the Polyak characterization of the strong uniqueness property for the unrestricted convex optimization problem.

Theorem 8.1 ([31], [33]) Given the minimization problem $\pi=(c,F)$, where $F$ is a non-empty closed convex set in $\mathbb{R}^n$, the following three statements, relative to a feasible point $\bar x$, are equivalent:

(i) $\bar x$ is strongly unique for $\pi$;

(ii) $c\in\mathrm{int}(D(F,\bar x)^\circ)$;

(iii) There exists $\rho>0$ such that $\bar x\in F_1^*$, the optimal set of the problem $\pi_1=(c^1,F)$, whichever $c^1$ we take in $c+\rho B$.

In addition, $\omega$-uniqueness, for $\omega>1$, is a necessary condition for strong uniqueness, and implies uniqueness. Finally, the closedness of $D(F,\bar x)$ makes all these notions of uniqueness mutually equivalent.

Hereafter we consider a certain linear representation of $F$; i.e., we deal with a problem $\pi=(c,\sigma)$ such that $F$ is the solution set of the LSIS $\sigma$. If $\pi\in\Pi_s$ and $\sigma$ is LOP, $D(F,x)$ is closed at any $x\in F$, and uniqueness and strong uniqueness are equivalent. The next result provides some reformulations of a sufficient condition for strong uniqueness, involving the active cone $A(\bar x)$.

Theorem 8.2 ([31], [33]) The following conditions are mutually equivalent and imply the strong uniqueness of $\bar x$ as optimal solution of $\pi=(c,\sigma)$, whereas the converse holds when $\sigma$ is either FM or LOP:

(i) $c\in\mathrm{int}(A(\bar x))$;

(ii) The system
$$\sigma(\bar x):=\{c'y\le 0;\ a_t'y\ge 0,\ t\in T(\bar x)\}$$
only admits the trivial solution $y=0_n$;

(iii) $\bar x\in F^*$, and there exists $\bar\lambda\in\Lambda^*$ such that $0_n$ is the unique solution of the system
$$\sigma(\bar x,\bar\lambda):=\left\{\begin{array}{ll} a_t'y=0, & t\in T(\bar x)\cap\mathrm{supp}(\bar\lambda)\\ a_t'y\ge 0, & t\in T(\bar x)\setminus\mathrm{supp}(\bar\lambda)\end{array}\right\}.$$

Moreover, if these conditions are satisfied and $|T(\bar x)|=n$, then the dual problem has a unique optimal solution too.

The reader will find related results in [21] and in [33].


REFERENCES

[1] E. J. Anderson, M. A. Goberna, and M. A. Lopez. Locally polyhedral linear semi-infinite programming. Linear Algebra Appl., to appear.

[2] N. N. Astaf'ev. On Infinite Systems of Linear Inequalities in Mathematical Programming (in Russian). Nauka, Moscow, 1991.

[3] A. Auslander, R. Cominetti, and J. P. Crouzeix. Convex functions with unbounded level sets and applications to duality theory. SIAM J. Control Optim., 3:669-87, 1993.

[4] B. Bank, J. Guddat, D. Klatte, and K. Tammer. Non-Linear Parametric Optimization. Akademie Verlag, Berlin, 1983.

[5] Ch. E. Blair. A note on infinite systems of linear inequalities in Rn. J. Math. Anal. Appl., 48:150-4, 1974.

[6] B. Brosowski. Parametric Semi-Infinite Optimization. Peter Lang, Frankfurt am Main, 1982.

[7] B. Brosowski. Parametric semi-infinite linear programming I. Continuity of the feasible set and of the optimal value. Math. Programming Study, 21:18-42, 1984.

[8] M. D. Canovas, M. A. Lopez, J. Parra, and M. I. Todorov. Stability and well­posedness in linear semi-infinite programming. Working paper, Dep. Stat. and Oper. Res., University of Alicante, 1996.

[9] S. N. Cernikov. Linear Inequalities (in Russian). Nauka, Moscow, 1968.

[10] A. Charnes, W. W. Cooper, and K. O. Kortanek. Duality, Haar programs and finite sequences spaces. Proc. Nat. Acad. Sci. USA, 48:783-6, 1962.

[11] A. Charnes, W. W. Cooper, and K. O. Kortanek. Duality in semi-infinite pro­grams and some works of Haar and Caratheodory. Management Sci., 9:209-28, 1963.

[12] A. Charnes, W. W. Cooper, and K. O. Kortanek. On representations of semi­infinite programs which have no duality gaps. Management Sci., 12:113-21, 1965.

[13] G. Christov and M. I. Todorov. Semi-infinite optimization. Existence and uniqueness of the solution. Math. Balcanica, 2:182-91, 1988.

[14] F. E. Clark. Remark on the constraint sets in linear programming. Amer. Math. Monthly, 58:351-2, 1961.

[15] G. B. Dantzig. Linear programming. In J. K. Lenstra, A. H. G. Rinnooy Kan, and A. Schrijver, editors, History of Mathematical Programming. A Collection of Personal Reminiscences, North Holland, Amsterdam, 1991.

[16] R. J. Duffin, R. G. Jeroslow, and L. A. Karlovitz. Duality in semi-infinite linear programming. In A. V. Fiacco and K. O. Kortanek, editors, Semi-Infinite Programming and Applications, Springer-Verlag, Berlin, 1983.

[17] A. L. Dontchev and T. Zolezzi. Well-Posed Optimization Problems. Springer-Verlag, Berlin, 1991.

Page 39: Semi-Infinite Programming

26 CHAPTER 1

[18] U. Eckhardt. Theorems on the dimension of convex sets. Linear Algebra Appl., 12:63-76, 1975.

[19] K. Fan. On infinite systems of linear inequalities. J. Math. Anal. Appl., 21:475-8, 1968.

[20] T. Fisher. Contributions to semi-infinite linear optimization. In B. Brosowski and E. Martensen, editors, Approximation and Optimization in Mathematical Physics, Peter Lang, Frankfurt am Main, 1983.

[21] T. Fisher. Strong unicity and alternation for linear optimization. J. Opt. Th. Appl., 69:251-67, 1991.

[22] K. Glashoff. Duality theory in semi-infinite programming. In R. Hettich, editor, Semi-infinite Programming, Springer-Verlag, Berlin, 1979.

[23] K. Glashoff and S. A. Gustafson. Linear Optimization and Approximation. Springer-Verlag, Berlin, 1983.

[24] M. A. Goberna. Boundedness relations in linear semi-infinite programming. Adv. Appl. Math., 8:53-68, 1987.

[25] M. A. Goberna and V. Jornet. Geometric fundamentals of the simplex method in semi-infinite programming. O. R. Spektrum, 10:145-52, 1988.

[26] M. A. Goberna and M. A. Lopez. A theory of linear inequality systems. Linear Algebra Appl., 106:77-115, 1988.

[27] M. A. Goberna and M. A. Lopez. Optimal value function in semi-infinite programming. J. Opt. Th. Appl., 59:261-79, 1988.

[28] M. A. Goberna and M. A. Lopez. Dimension and finite reduction in linear semi-infinite programming. Optimization, 25:143-60, 1992.

[29] M. A. Goberna and M. A. Lopez. Optimality theory for semi-infinite linear programming. Numer. Funct. Anal. Optim., 16:669-700, 1995.

[30] M. A. Goberna and M. A. Lopez. A note on topological stability of linear semi-infinite inequality systems. J. Opt. Th. Appl., 89:227-236, 1996.

[31] M. A. Goberna and M. A. Lopez. Linear semi-infinite optimization. John Wiley, New York, 1997.

[32] M. A. Goberna, M. A. Lopez, J. A. Mira, and J. Valls. On the existence of solutions for linear inequality systems. J. Math. Anal. Appl., 192:133-50, 1995.

[33] M. A. Goberna, M. A. Lopez, and M. I. Todorov. Unicity in linear optimization. J. Opt. Th. Appl., 86:37-56, 1995.

[34] M. A. Goberna, M. A. Lopez, and M. I. Todorov. Stability theory for linear inequality systems. SIAM J. Matrix Anal. Appl., 17:730-743, 1996.

[35] M. A. Goberna, M. A. Lopez, and M. I. Todorov. Stability theory for linear inequality systems II: upper semicontinuity of the solution set mapping. SIAM J. Optim., to appear, 1997.

[36] A. Haar. Über lineare Ungleichungen. Acta Math. Szeged, 2:1-14, 1924.

Page 40: Semi-Infinite Programming

Linear SIP Theory 27

[37] R. Henrion and D. Klatte. Metric regularity of the feasible set mapping in semi-infinite optimization. Appl. Math. Opt., 30:103-9, 1994.

[38] R. Hettich and K. O. Kortanek. Semi-infinite programming: Theory, methods and applications. SIAM Review, 35:380-429, 1993.

[39] R. B. Holmes. Geometric Functional Analysis and its Applications. Springer-Verlag, New York, 1975.

[40] R. G. Jeroslow and K. O. Kortanek. On semi-infinite systems of linear inequalities. Israel J. Math., 10:252-8, 1971.

[41] H. Th. Jongen, J.-J. Rückmann, and G. W. Weber. One-parametric semi-infinite optimization: on the stability of the feasible set. SIAM J. Optim., 4:637-48, 1994.

[42] H. Th. Jongen, F. Twilt, and G. W. Weber. Semi-infinite optimization: structure and stability of the feasible set. J. Opt. Th. Appl., 72:529-52, 1992.

[43] D. F. Karney. Duality gaps in semi-infinite linear programming: An approximation theory. Math. Programming, 20:129-43, 1981.

[44] D. F. Karney. Clark's theorem for semi-infinite convex programs. Adv. Appl. Math., 2:7-12, 1981.

[45] J. E. Kelley. The cutting-plane method for solving convex programs. SIAM J., 8:703-712, 1960.

[46] K. O. Kortanek and H. M. Strojwas. On constraint sets of infinite linear programs over ordered fields. Math. Programming, 33:146-61, 1985.

[47] M. A. Lopez and E. Vercher. Optimality conditions for nondifferentiable convex semi-infinite programming. Math. Programming, 27:307-19, 1983.

[48] E. Marchi, R. Puente, and V. N. Vera de Serio. Quasi-polyhedral sets in semi-infinite linear inequality systems. Linear Algebra Appl., 255:157-69, 1997.

[49] M. J. D. Powell. Karmarkar's algorithm: A view from non-linear programming. IMA Bulletin, 26:165-81, 1990.

[50] E. Remes. Sur le calcul effectif des polynômes d'approximation de Tchebycheff. C.R. Acad. Sci. Paris, 199:337-40, 1934.

[51] S. M. Robinson. Stability theory for systems of inequalities. Part I: linear systems. SIAM J. Numer. Anal., 12:754-69, 1975.

[52] G. S. Rubinstein. A comment on Voigt's paper "A duality theorem for linear semi-infinite programming" (in Russian). Optimization, 12:31-2, 1981.

[53] M. J. Todd. Interior point algorithms for semi-infinite programming. Math. Programming, 65:217-45, 1994.

[54] M. I. Todorov. Generic existence and uniqueness of the solution set to linear semi-infinite optimization problems. Numer. Funct. Anal. Optim., 8:27-39, 1985-86.

[55] H. Tuy. Stability property of a system of inequalities. Math. Oper. Statist. Series Opt., 8:27-39, 1977.

[56] Y. J. Zhu. Generalizations of some fundamental theorems on linear inequalities. Acta Math. Sinica, 16:25-40, 1966.

Page 41: Semi-Infinite Programming

2 ON STABILITY AND DEFORMATION IN SEMI-INFINITE OPTIMIZATION

Hubertus Th. Jongen1 and Jan-J. Rückmann2

1 RWTH-Aachen, Department of Mathematics, D-52056 Aachen, Germany,

Email: [email protected]

2 University of Erlangen-Nuremberg, Institute of Applied Mathematics, Martensstrasse 3, D-91058 Erlangen, Germany,

Email: [email protected]

ABSTRACT

In this tutorial paper we study finite dimensional optimization problems with infinitely many inequality constraints. We discuss the structure and stability of the feasible set, as well as stability of stationary points. Then, we consider global (or structural) stability of semi-infinite optimization problems and, finally, we focus on one-parametric deformations of them.

1 INTRODUCTION

In this tutorial paper we consider finite dimensional optimization problems of the following type:

(SIP): Minimize f(x) subject to x ∈ M[h, g],  (1.1)

where

M[h, g] = M = {x ∈ R^n | h_i(x) = 0, i ∈ A, g(x, y) ≥ 0 for all y ∈ Y},  (1.2)

Y = {y ∈ R^r | u_i(y) = 0, i ∈ I, v_j(y) ≥ 0, j ∈ J},

h = (h_i, i ∈ A), |A| < n, |I| < r, |J| < ∞ (where |·| denotes the cardinality).

The index set Y in (1.2) is assumed to be compact. All defining functions f, g, ..., are assumed to be real-valued and k-times continuously differentiable,

29 R. Reemtsen and J.-J. Rückmann (eds.), Semi-Infinite Programming, 29-67. © 1998 Kluwer Academic Publishers.

Page 42: Semi-Infinite Programming

30 CHAPTER 2

where k (k ≥ 1) will be specified later on; notation: f ∈ C^k(R^n, R), g ∈ C^k(R^n × R^r, R), etc.

The index set Y may contain infinitely many elements. That is why problems of the type (1.1) are called semi-infinite optimization problems (SIP). The striking difference between the cases |Y| < ∞ and |Y| = ∞ becomes clear from Figure 1. Let Y_0(x) denote the active index set at a point x:

Y_0(x) = {y ∈ Y | g(x, y) = 0}.

Figure 1

The feasible set M in Figure 1.a is described by means of a finite number of inequality constraints. At the point x̄ there are two inequality constraints active. In virtue of continuity we have Y_0(x) ⊂ Y_0(x̄) for all x in some neighbourhood of x̄. In Figure 1.b, the feasible set M is described by means of (the intersection of) the tangent halfspaces at all boundary points: each halfspace is defined by means of an affine inequality constraint (three of them are depicted in Figure 1.b). Note that the boundary of the feasible set M has a curvature, although each describing inequality constraint is affine (envelope-effect). Moreover, for feasible points arbitrarily near x̄, the active set Y_0(x) need not be a

Page 43: Semi-Infinite Programming

Stability and Deformation 31

subset of Y_0(x̄). In particular, the active set Y_0(x) changes from point to point along the boundary of the feasible set M. The control of the active set Y_0(x) is one of the main features in semi-infinite optimization.

Semi-infinite optimization has a wide range of applications, among them Chebyshev approximation, environmental problems and robotics; see [14] for a recent survey. As an illustration we shortly describe the problem of Chebyshev approximation in terms of semi-infinite optimization (cf. [15], [19]). In Chebyshev approximation, a continuous function w : R^r → R is to be approximated - uniformly on a compact set Y ⊂ R^r - by means of an n-parameter family W(x, ·). This yields the following optimization problem:

Minimize φ(x) := max_{y∈Y} |w(y) − W(x, y)|.  (1.3)

We may consider problem (1.3) in x-space and in y-space (see Figure 2, where the set Y is a compact interval).

Figure 2 (the error function w(·) − W(x, ·) in y-space and the corresponding picture in x-space)

Page 44: Semi-Infinite Programming

32 CHAPTER 2

In y-space, we are dealing with the error function w(·) − W(x, ·), where x is treated as a parameter. The problem consists in minimizing the maximal deviation of the error function. Note that for a specific x, the extremal points (i.e. points ȳ ∈ Y at which |w(ȳ) − W(x, ȳ)| = max_{y∈Y} |w(y) − W(x, y)|) are of particular interest. The set of extremal points may shift and even may bifurcate for x → x̄ (see Figure 2). In x-space, we treat y as a parameter and we are dealing with the envelope of all functions ±(w(y) − W(·, y)) as y ranges over the compact set Y. The latter envelope defines precisely the function φ. The minimization of φ may be replaced by the search for the "lowest point" (x̄, x̄_{n+1}) in the epigraph Epi(φ) := {(x, x_{n+1}) | x_{n+1} ≥ φ(x)}, where x_{n+1} is an artificial coordinate. The latter search is nothing else but minimizing the function x_{n+1} on Epi(φ). Note that (x, x_{n+1}) ∈ Epi(φ) iff the inequalities x_{n+1} ≥ w(y) − W(x, y) and x_{n+1} ≥ −w(y) + W(x, y) are fulfilled for all y ∈ Y. In this way, we have transformed the Chebyshev approximation problem into a semi-infinite optimization problem. In particular, the approximation region plays the role of the index set Y and the set of extremal points coincides with the active index set Y_0(x). Moreover, if the underlying functions w and W are differentiable, so is the associated semi-infinite problem (although the function φ in (1.3) is structurally nonsmooth).

The paper is organized as follows. In Section 2 we discuss the structure of the feasible set M (local smooth/continuous coordinates, etc.). In Section 3 we are concerned with stability of the feasible set M[h, g] with respect to perturbations of the defining functions (h, g), whereas in Section 4 we focus on stability of stationary points of (SIP). In Section 5 we discuss the global (or structural) stability of (SIP). Finally, Section 6 is devoted to one-parametric deformations of (SIP).

2 STRUCTURE OF THE FEASIBLE SET

In order to describe the local structure of the feasible set M, it is useful to study the behaviour of the active set Y_0(x) as x varies. The following obvious lemma will be crucial within this context.

Lemma 2.1 Let x belong to M and suppose that Y_0(x) ≠ ∅. Then, ȳ belongs to Y_0(x) iff ȳ is a global minimum for the restricted function g(x, ·)|_Y.
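A tiny numerical illustration of Lemma 2.1 (not from the text; the constraint function g(x, y) = y² − x and the grid standing in for Y = [−1, 1] are made up): for a feasible x, the active indices coincide with the global minimizers of g(x, ·)|_Y.

    import numpy as np

    def g(x, y):                       # made-up constraint function, Y = [-1, 1]
        return y**2 - x

    Y = np.linspace(-1.0, 1.0, 2001)
    x = 0.0                            # feasible: g(0, y) = y^2 >= 0 for all y
    vals = g(x, Y)
    active = Y[np.isclose(vals, 0.0, atol=1e-12)]             # Y_0(x)
    minimizers = Y[np.isclose(vals, vals.min(), atol=1e-12)]  # argmin of g(x, .)|_Y
    print(active, minimizers)          # both are {0.0}, as Lemma 2.1 predicts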

Page 45: Semi-Infinite Programming

Stability and Deformation 33

By means of the following constraint qualification, we will assume that the index set Y has a simple local structure, such as an interval, a rectangle, or a differentiable manifold with boundary.

Definition 2.2 The linear independence constraint qualification (LICQ) is said to hold at ȳ ∈ Y if the vectors Du_i(ȳ), i ∈ I, Dv_j(ȳ), j ∈ J_0(ȳ) are linearly independent.

In the preceding definition, Du_i(ȳ) stands for the row vector of partial derivatives of u_i, evaluated at ȳ, and J_0(ȳ) = {j ∈ J | v_j(ȳ) = 0}.

We will assume throughout this section that (LICQ) is satisfied at all points y ∈ Y.

Definition 2.3 Let ψ ∈ C¹(R^r, R). A point ȳ ∈ Y is called a critical point for ψ|_Y if there exist λ̄_i, i ∈ I, μ̄_j, j ∈ J_0(ȳ) (called Lagrange multipliers) such that

Dψ(ȳ) = Σ_{i∈I} λ̄_i Du_i(ȳ) + Σ_{j∈J_0(ȳ)} μ̄_j Dv_j(ȳ).  (2.1)

If, in addition, μ̄_j ≥ 0, j ∈ J_0(ȳ), then ȳ is called a stationary point (Karush-Kuhn-Tucker point).

Note that the Lagrange multipliers λ̄_i, μ̄_j in (2.1) are unique, in view of LICQ. Moreover, every local minimum for ψ|_Y is a stationary point. Next, we assume that all appearing functions are of class C².

Definition 2.4 Let ψ ∈ C²(R^r, R) and let ȳ ∈ Y be a critical point for ψ|_Y with Lagrange multipliers λ̄_i, i ∈ I, μ̄_j, j ∈ J_0(ȳ). Then, the critical point ȳ is called nondegenerate if the following two conditions are satisfied:

(ND1) μ̄_j ≠ 0, j ∈ J_0(ȳ) (strict complementarity);

(ND2) D²L(ȳ)|_{T_ȳY} is nonsingular,

where L = ψ − Σ_{i∈I} λ̄_i u_i − Σ_{j∈J_0(ȳ)} μ̄_j v_j (Lagrange function) and D²L is the matrix of second order partial derivatives. Moreover, T_ȳY = {ξ ∈ R^r | Du_i(ȳ)ξ = 0, i ∈ I, Dv_j(ȳ)ξ = 0, j ∈ J_0(ȳ)}, and D²L(ȳ)|_{T_ȳY} = V^T D²L(ȳ) V, where V is some matrix whose columns form a

Page 46: Semi-Infinite Programming

34 CHAPTER 2

basis for the tangent space T_ȳY. A local minimum ȳ of ψ|_Y is called nondegenerate if ȳ is nondegenerate as a critical point, otherwise it is called degenerate.

Note that a critical point ȳ of ψ|_Y is a nondegenerate local minimum iff both μ̄_j > 0, j ∈ J_0(ȳ), and D²L(ȳ)|_{T_ȳY} is positive definite.

Now we can discuss the so-called Reduction Ansatz:

Let x̄ ∈ M and suppose that every ȳ ∈ Y_0(x̄) is a nondegenerate local minimum for g(x̄, ·)|_Y. In particular, it follows that each ȳ ∈ Y_0(x̄) is an isolated critical point (cf. [19]). Consequently, the set Y_0(x̄) is a discrete, closed subset of the compact set Y. Hence, Y_0(x̄) is a finite set, say Y_0(x̄) = {ȳ_1, ..., ȳ_p} (see Figure 3).

Figure 3 (level lines of g(x̄, ·)|_Y)

In virtue of nondegeneracy we may apply the Implicit Function Theorem around x̄ and each of the points ȳ_1, ..., ȳ_p. In this way we locally obtain C¹-mappings y_1(x), ..., y_p(x), and each y_ℓ(x) is a local minimum for g(x, ·)|_Y, whereas

Page 47: Semi-Infinite Programming

Stability and Deformation 35

y_ℓ(x̄) = ȳ_ℓ and, moreover, the marginal functions

φ_ℓ(x) = g(x, y_ℓ(x)), ℓ = 1, ..., p,  (2.2)

are of class C² (cf. [12], [19]; compare also Section 4).

In order to decide if x belongs to M, we need to check that the graph of g(x, ·) "touches or stays above the set Y". For x near x̄, the touching points belong to the set {y_1(x), ..., y_p(x)}, since the set Y is compact. Consequently, there exists a neighbourhood O of x̄ such that

M ∩ O = {x ∈ O | h_i(x) = 0, i ∈ A, φ_ℓ(x) ≥ 0, ℓ = 1, ..., p}.  (2.3)

In particular, the description of the feasible set M by means of infinitely many constraints is locally reduced to an (implicit) description by means of a finite number of constraints.

From the strict complementarity condition (ND1) it follows that the active index set for y_ℓ(x) remains constant, i.e. J_0(y_ℓ(x)) ≡ J_0(ȳ_ℓ), ℓ = 1, ..., p. Referring to Figure 3 this means that y_1(x) remains in the interior, y_2(x) moves along the boundary, and y_3(x) stays fixed in the corner.

The movement of the local minimum y_ℓ(x) gives rise to a shift term of second order in the marginal function (cf. [56], [12], [32], [33]). In fact, suppose, for simplicity, that the local minimum y(x) is an interior point of Y (in particular, J_0(y(x)) = ∅). For the derivative of the marginal function φ(x) := g(x, y(x)) we obtain by means of the chain rule:

Dφ(x) = D_x g(x, y) + D_y g(x, y) · Dy(x)|_{y = y(x)}.

Since y(x) is a local minimum for g(x, ·), we have D_y g(x, y(x)) ≡ 0. It follows, putting y(x̄) = ȳ, that

Dφ(x̄) = D_x g(x̄, ȳ),  (2.4)

and D²φ(x̄) = D(D^T φ)|_{x̄} = [D_x²g + D_y D_x^T g · Dy(x)]|_{(x̄, ȳ)}.

The "velocity term" Dy(x) can be computed via differentiation of the critical point relation D_y g(x, y(x)) ≡ 0. Altogether, we obtain

D²φ(x̄) = [D_x²g − D_y D_x^T g · (D_y²g)^{-1} · D_x D_y^T g](x̄, ȳ).  (2.5)

The second term in the right-hand side of (2.5) is the mentioned shift term. If the local minimum ȳ gets sharper (i.e. if the eigenvalues of D_y²g become

Page 48: Semi-Infinite Programming

36 CHAPTER 2

larger), then the shift term will have smaller influence. Note, in particular, that the term D_y D_x^T g · [D_y²g]^{-1} · D_x D_y^T g in (2.5) is positive semi-definite (this will be used in Section 4, (SIP)-Case II). Furthermore, the right-hand side in (2.5) is nothing else but the Schur complement of the submatrix D_y²g in the total second derivative D²g.
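The following minimal sketch (an illustration added here, not part of the original text) checks formula (2.5) numerically for a made-up function g whose inner minimizer y(x) is interior (indeed unconstrained, so the compactness of Y plays no role): the finite-difference Hessian of the marginal function φ agrees with the Schur complement of D_y²g in D²g.

    import numpy as np

    # made-up g with an interior (here: unconstrained) inner minimizer y(x)
    def g(x, y):
        return (y - x[0])**2 + 0.5 * y**2 + x[1] * y + x[0] * x[1]

    def phi(x):                        # marginal function; inner minimizer in closed form
        y = (2.0 * x[0] - x[1]) / 3.0  # solves D_y g = 3y - 2 x0 + x1 = 0
        return g(x, y)

    xbar = np.array([0.3, -0.2])

    # exact second derivatives of g at (xbar, ybar) and the Schur complement of (2.5)
    Dxx = np.array([[2.0, 1.0], [1.0, 0.0]])
    Dxy = np.array([[-2.0], [1.0]])
    Dyy = np.array([[3.0]])
    schur = Dxx - Dxy @ np.linalg.inv(Dyy) @ Dxy.T

    # central finite-difference Hessian of the marginal function phi
    h, H = 1e-4, np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei, ej = np.eye(2)[i], np.eye(2)[j]
            H[i, j] = (phi(xbar + h*ei + h*ej) - phi(xbar + h*ei - h*ej)
                       - phi(xbar - h*ei + h*ej) + phi(xbar - h*ei - h*ej)) / (4*h*h)
    print(np.round(schur, 5))
    print(np.round(H, 5))              # the two matrices coincide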

Formula (2.4) also holds if y(x) is not an interior point. This leads to the following local structure of the feasible set. Suppose that the Reduction Ansatz is applicable and that the vectors Dh_i(x̄), i ∈ A, D_x g(x̄, ȳ_ℓ), ℓ = 1, ..., p, are linearly independent. Then, we may use the functions h_i(x), i ∈ A, g(x, y_ℓ(x)), ℓ = 1, ..., p, as a new local (partial) C²-coordinate system. In these coordinates, the feasible set M takes the following form:

M ∩ O ≅ (H^p × R^{n−|A|−p}) ∩ V,  (2.6)

where O and V are suitable neighbourhoods of x̄ ∈ M and 0 ∈ R^{n−|A|}, respectively, and H^p = {(ξ_1, ..., ξ_p) ∈ R^p | ξ_i ≥ 0, i = 1, ..., p}.

Unfortunately, the Reduction Ansatz is not always applicable. This can be seen from the following example (cf. [28]).

Example 2.5 Let Y be the interval [−1, 1], A = ∅, x ∈ R³ and g(x, y) = y⁴ + x₁y² + x₂y + x₃. The corresponding feasible set is convex, since the function g is linear in x (Figure 4.b). The boundary ∂M is the upper part of the set {x ∈ R³ | g = 0 and D_y g = 0 for some y}, the so-called "swallowtail" (Figure 4.a). In Figure 4.c the change of the function g(x, ·) is depicted as x turns around the origin (along ∂M).

The Reduction Ansatz is not applicable at the origin x̄ = 0, since the function y⁴ has a degenerate minimum. The non-applicability of the Reduction Ansatz is even stable under small (C⁴-)perturbations. To see this, consider the map T:

T : (x₁, x₂, x₃, y) ↦ (g, D_y g, D_y²g, D_y³g).  (2.7)

Note that T(0) = 0 and that DT(0) is nonsingular. Hence, basically in virtue of the Implicit Function Theorem, we may consider g itself as a parameter, and we obtain a mapping g̃ ↦ (x(g̃), y(g̃)) such that T(x(g̃), y(g̃)) ≡ 0. The point x(g̃) is exactly that point at which the Reduction Ansatz fails.

Page 49: Semi-Infinite Programming

Stability and Deformation 37

Figure 4 (a: the swallowtail ∂M; b: the convex feasible set; c: the change of g(x, ·) along ∂M)

The precise description of the feasible set M - in new differentiable coordinates - becomes quite complicated if the Reduction Ansatz fails. In fact, in that case we have to deal with degenerate local minima and their unfoldings (via the family x ↦ g(x, ·)|_Y). In case that the index set Y is a compact interval this has been studied in [31], [57]. Let us suppose that Y is the compact interval [a, b], A = ∅ and that g ∈ C^∞(R^n × R, R). Let M be the corresponding feasible set, x̄ ∈ M and let ȳ belong to Y_0(x̄) ⊂ [a, b]. At a degenerate critical point ȳ we have to define the order of degeneracy in some stable way (compare the degeneracy y⁴ in Example 2.5 and the associated mapping T in (2.7)). An interior point ȳ ∈ (a, b) is of order k (k odd), if at (x̄, ȳ):

D_y g = D_y²g = ... = D_y^k g = 0,  D_y^{k+1} g > 0.  (2.8)

A boundary point, say ȳ = a, is of order k, if at (x̄, a):

D_y g = ... = D_y^{k−1} g = 0,  D_y^k g > 0.  (2.9)

Note that ȳ is nondegenerate iff its order equals 1. In both cases (2.8), (2.9) we define the set S_k consisting of k vectors:

S_k = {D_x g, D_y D_x g, ..., D_y^{k−1} D_x g}.

Page 50: Semi-Infinite Programming

38 CHAPTER 2

The linear independence of the set S_k guarantees the stability of the degeneracy (cf. the mapping T in Example 2.5). For k = 1 we have S_1 = {D_x g} (compare (2.4)). According to Example 2.5 we introduce the following functions ψ_k¹, ψ_k²:

ψ_k¹(x₁, ..., x_k) = min_{y∈[−1,1]} (y^{k+1} + x₁y^{k−1} + ... + x_{k−1}y + x_k),

ψ_k²(x₁, ..., x_k) = min_{y∈[0,1]} (y^k + x₁y^{k−1} + ... + x_{k−1}y + x_k).

Now, the generic, stable case is the following: For each x̄ ∈ M the active set Y_0(x̄) is finite, say Y_0(x̄) = {ȳ_1, ..., ȳ_p}, every ȳ_ℓ, ℓ = 1, ..., p, is of finite order, say k_ℓ, and, moreover, dim(span(∪_ℓ S_{k_ℓ})) = Σ_ℓ k_ℓ. In the latter case, ideas from singularity theory become available (versal unfoldings, etc.; see e.g. [2]). It can be shown that there exists a local C^∞-coordinate transformation, sending x̄ onto the origin, such that M locally takes the following form:

{x ∈ R^n | η_{k_ℓ}(·) ≥ 0, ℓ = 1, ..., p},

where η_{k_ℓ} equals ψ_{k_ℓ}¹ (ψ_{k_ℓ}²) if ȳ_ℓ is an interior point (boundary point), and where the argument of η_{k_1} is the first k₁-block of coordinates x₁, ..., x_{k_1}, the argument of η_{k_2} is the next k₂-block of coordinates x_{k_1+1}, ..., x_{k_1+k_2}, etc.

In case that the Reduction Ansatz is applicable, we have k_ℓ = 1, ℓ = 1, ..., p. Note that ψ_1¹(x₁) = ψ_1²(x₁) = x₁. Consequently, the feasible set M locally takes the form H^p × R^{n−p}, as we already have seen in (2.6).
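For illustration only (not from the text), the normal forms ψ_k¹ and ψ_k² can be evaluated numerically on a grid; the coefficient values below are arbitrary, and for k = 1 both functions reduce to x₁, as stated above.

    import numpy as np

    def psi1(k, x):                    # interior normal form, Y = [-1, 1]
        y = np.linspace(-1.0, 1.0, 4001)
        return np.min(y**(k + 1) + sum(x[i] * y**(k - 1 - i) for i in range(k)))

    def psi2(k, x):                    # boundary normal form, Y = [0, 1]
        y = np.linspace(0.0, 1.0, 4001)
        return np.min(y**k + sum(x[i] * y**(k - 1 - i) for i in range(k)))

    print(psi1(1, [0.25]), psi2(1, [0.25]))   # k = 1: both equal x_1 = 0.25
    print(psi1(3, [0.1, -0.4, 0.2]))          # an order-3 (swallowtail-type) value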

For higher dimensional index sets Y, the precise local description of the feasible set M becomes extremely complicated, due to the classification of degenerate local minima. In fact, for 1-dimensional Y the local minima of finite order form a discrete list of so-called normal forms: y^{2k}, k = 1, 2, ..., and {y^k, y ≥ 0}, k = 1, 2, ... However, in higher dimensions, the normal forms may depend on additional parameters. A typical (unconstrained) example in (y₁, y₂)-space is given by the family {f_a | a > 1}, where f_a(y₁, y₂) = (y₁² + y₂²)(y₁² + ay₂²). In particular, for a₁ ≠ a₂ there exists no local C¹-coordinate transformation Φ, leaving the origin invariant, such that f_{a₂} = f_{a₁} ∘ Φ (see e.g. [57]).

Although a description of the feasible set M in differentiable local coordinates is complicated, its generic description in local Lipschitzian coordinates is quite easy to establish.

Definition 2.6 The extended Mangasarian-Fromovitz constraint qualification (EMFCQ) is said to hold at x̄ ∈ M if the following conditions are satisfied:

Page 51: Semi-Infinite Programming

Stability and Deformation 39

(EMFCQ 1) The vectors Dh_i(x̄), i ∈ A, are linearly independent.

(EMFCQ 2) There exists a vector ξ ∈ R^n satisfying:

Dh_i(x̄) · ξ = 0, i ∈ A,  (2.10)

D_x g(x̄, y) · ξ > 0 for all y ∈ Y_0(x̄).  (2.11)

A vector ξ satisfying (2.10) and (2.11) is called an EMF-vector at x̄.

In Definition 2.6 we used the adjective "extended" in order to distinguish semi-infinite optimization problems from those having only a finite number of constraints. In the latter case (EMFCQ) is denoted by (MFCQ) (cf. [44]). In [28] it is shown that (EMFCQ) is a generic property (with respect to the C¹_s-topology for the defining functions h, g; the C¹_s-topology is introduced in the next section).

Theorem 2.7 ([28]) Suppose that (EMFCQ) is satisfied at all points of the feasible set M. Then, M is a Lipschitzian manifold (with boundary) of dimension n − |A|, and the boundary ∂M equals {x ∈ R^n | h_i(x) = 0, i ∈ A, min_{y∈Y} g(x, y) = 0}.

Sketch of the proof. In virtue of (EMFCQ 1) we may locally use the functions h_i, i ∈ A, as new C¹-coordinates. Taking these coordinates, we may assume that A = ∅. Without loss of generality we now may assume that ξ̄ = (1, 0, ..., 0)^T. Consider the mapping Φ = (Φ_1, ..., Φ_n), where Φ_1(x) = min_{y∈Y} g(x, y), Φ_i(x) = x_i − x̄_i, i = 2, ..., n. Then, Φ is locally Lipschitzian. Using Clarke's subdifferential (cf. [3]), (EMFCQ 2) guarantees that Φ has a local Lipschitz inverse (around x̄). Consequently, in the new coordinates the set M takes the form {z ∈ R^n | z₁ ≥ 0} with boundary {z ∈ R^n | z₁ = 0}. □

Comments. The shift term (cf. (2.5)) was established in [56, 12] for C²-data. A similar result is obtained in [32, 33] under relaxed differentiability assumptions. In particular, second order necessary optimality conditions for (SIP) are investigated and a formula for the upper second order directional derivative of the sup-type function sup{−g(x, y), y ∈ Y} is given in terms of the first and second partial derivatives of g with respect to x. Relationships between [56], [12] and [32, 33] are presented in [34].

Page 52: Semi-Infinite Programming

40 CHAPTER 2

3 STABILITY OF THE FEASIBLE SET

In this section we consider global stability of the feasible set M[h, g] under small C¹-perturbations of the defining functions h = (h_i, i ∈ A) and g. The compact index set Y remains fixed, the defining functions u_i, v_j are assumed to be of class C^∞ and we assume throughout this section that (LICQ) is satisfied at all points of Y.

The space C^k(R^n, R) will be topologized by means of the strong (or Whitney-) C^k-topology, denoted by C^k_s (cf. [17], [20]). For finite k, the C^k_s-topology is generated by allowing perturbations of the functions and their derivatives up to order k which are controlled by means of continuous positive functions ε : R^n → R. The product space C^k(R^n, R^m) ≅ C^k(R^n, R) × ... × C^k(R^n, R) will be topologized with the corresponding product topology. In Section 2 we mentioned that the extended Mangasarian-Fromovitz constraint qualification (EMFCQ) is a generic property. Now, we can be more precise (cf. [28]): Let F denote the subset of C¹(R^n, R^{|A|}) × C¹(R^{n+r}, R) consisting of those pairs (h, g) for which (EMFCQ) holds at all points x ∈ M[h, g]. Then, F is C¹_s-open and C¹_s-dense.

Definition 3.1 The feasible set M[h, g] is called stable if there exists a C¹_s-neighbourhood U of (h, g) in C¹(R^n, R^{|A|}) × C¹(R^{n+r}, R) such that for every (h̃, g̃) ∈ U, the corresponding feasible set M[h̃, g̃] is homeomorphic with M[h, g].

Theorem 3.2 (Stability Theorem) Suppose that M[h, g] is compact. Then, the feasible set M[h, g] is stable if and only if (EMFCQ) holds at every point x ∈ M[h, g].

The Stability Theorem was firstly proved in [9] in case of finitely many inequality constraints. Then, in [28] it was extended to the semi-infinite case.

Sketch of the proof. Let us assume, for simplicity, that there are no equality constraints, i.e. A = ∅.

Necessity part. Suppose that (EMFCQ) is not satisfied at x̄ ∈ M[g]. Then, the following system is not solvable (cf. Definition 2.6):

D_x g(x̄, y) · ξ > 0, y ∈ Y_0(x̄).

Page 53: Semi-Infinite Programming

Stability and Deformation 41

Since Y_0(x̄) is compact, it follows that the origin belongs to the convex hull conv{D_x g(x̄, y) | y ∈ Y_0(x̄)}. Choose a minimal subset {ȳ_1, ..., ȳ_p} such that 0 ∈ conv{D_x g(x̄, ȳ_ℓ), ℓ = 1, ..., p} (cf. Figure 5; p = 3).

Figure 5 (original and perturbed situation)

Then, a local C¹-perturbation of the function g around {x̄} × Y is performed such that afterwards we have: Y_0(x̄) = {ȳ_1, ..., ȳ_p} and all ȳ_ℓ are nondegenerate local minima. Now, the Reduction Ansatz is available, and for some neighbourhood O of x̄ we have for the perturbed feasible set M:

M ∩ O = {x ∈ O | φ_ℓ(x) ≥ 0, ℓ = 1, ..., p} = {x ∈ O | max_ℓ(−φ_ℓ(x)) ≤ 0},

where φ_ℓ are marginal functions (cf. (2.2) and (2.3)). Recall that Dφ_ℓ(x̄) = D_x g(x̄, ȳ_ℓ). Consequently, at x̄ we have

Σ_{ℓ=1}^p λ_ℓ Dφ_ℓ(x̄) = 0, λ_ℓ > 0, Σ_ℓ λ_ℓ = 1.

If, in addition, the restricted Hessian Σ_ℓ λ_ℓ D²φ_ℓ(x̄)|_{∩_ℓ Ker Dφ_ℓ(x̄)} is nonsingular, then the point x̄ is a nondegenerate stationary point for the maximum

Page 54: Semi-Infinite Programming

42 CHAPTER 2

function max_ℓ(−φ_ℓ(x)) (cf. [19]). Here, Ker Dφ_ℓ(x̄) = {ξ ∈ R^n | Dφ_ℓ(x̄) · ξ = 0}.

If not, recall:

D²φ_1(x̄) = D_x²g(x̄, ȳ_1) + shift term (containing partial derivatives with respect to the variable y).

A local quadratic perturbation around (x̄, ȳ_1) of the type (x − x̄)^T B(x − x̄) does not depend on y and will make the mentioned restricted Hessian nonsingular. After all these perturbations we are in the situation that the feasible set M around x̄ is the lower level set of a function of maximum type at a nondegenerate stationary point. Outside a small neighbourhood of x̄ we perturb the function g further such that there (EMFCQ) holds. Hence, apart from the point x̄, the set M has become a (compact) topological manifold (see Figure 6). Finally, adding a small positive (negative) constant to the function g around {x̄} × Y, we obtain two compact topological manifolds M_1, M_2, which are not homeomorphic.

Figure 6

Page 55: Semi-Infinite Programming

Stability and Deformation 43

The latter comes from the fact that the lower level set of a function of maximum type changes its homotopy-type when passing a value corresponding to one nondegenerate stationary point (cf. [19]). For example, in Figure 6, the numbers of connected components of M_1 and M_2 are different.

Sufficiency part. Suppose that (EMFCQ) holds at every point of M[g] (recall that we omitted the equality constraints). Consequently, at every point x ∈ ∂M[g] an EMF-vector ξ(x) is available (cf. Definition 2.6). For x̃ sufficiently near x, the vector ξ(x) is an EMF-vector at x̃ as well. This opens the possibility (via a partition of unity) to construct a vector field ξ ∈ C¹(R^n, R^n) of compact support, such that ξ(x) is an EMF-vector at every x ∈ ∂M[g]. Let x be near the boundary ∂M[g]. Then, the integration time τ(x), along the vector field ξ, in order to reach ∂M[g] - starting at x - is a Lipschitzian function of x. In fact, in virtue of (2.11), the integration time to reach some specific constraint g(·, y) = 0 depends C¹ on (x, y); the time τ(x) is the minimum integration time (with respect to y), see Figure 7.a.

Figure 7 (a, b: the boundaries ∂M[g], ∂M[g̃] and the constraints g(·, y_j) = 0)

Now, for g̃ sufficiently C¹_s-near g we have: (EMFCQ) holds at every point of M[g̃], ∂M[g̃] is close to ∂M[g], and ξ(x) is an EMF-vector at every point x ∈ ∂M[g̃] as well (Figure 7.b). Next, we rescale the vector field ξ per integral curve such that the integration time from a point x ∈ ∂M[g]

Page 56: Semi-Infinite Programming

44 CHAPTER 2

towards the corresponding (uniquely determined) point in ∂M[g̃] is equal to one. The rescaled vector field ξ̃ is Lipschitzian and globally integrable. Let (t, x) ↦ ψ(t, x) denote the flow of the vector field ξ̃. Then, ψ is Lipschitzian and the "time one"-map ψ(1, ·) is a homeomorphism which sends M[g] onto M[g̃]. □

Comments. Theorem 3.2 was proved in [9] in case of finitely many inequality constraints and in [28] for the semi-infinite case under the assumption that h_i ∈ C²(R^n, R), i ∈ A. In [18, 49] it is shown that it is also valid in case that h_i ∈ C¹(R^n, R), i ∈ A. The compactness of M[h, g] is used in the necessity part of the proof of Theorem 3.2 in order to have a finite number of certain topological invariants (e.g. the number of connected components, etc.). In [18, 55] corresponding stability results for noncompact feasible sets are established, where the lack of compactness is substituted by considering stability properties of certain compact subsets of M[h, g]. An extension to feasible sets which depend on an additional real parameter is performed in [26]. Assuming (EMFCQ) and an appropriate generalization of the Condition C in [45], the stability of the feasible set is shown with respect to both perturbations of the function vector in the C¹_s-topology and variations of the additional parameter. Besides the considered (global) stability of the feasible set, there are several papers dealing with the equivalence of (EMFCQ) with some (local) topological properties; as an example we refer to the investigation on metric regularity in [11].

4 STABILITY OF STATIONARY POINTS

In this section, all defining functions are assumed to be of class C². The compact index set Y remains fixed and we assume throughout this section that (LICQ) is satisfied at all points of Y. Moreover, we assume that (EMFCQ) is satisfied at the points of M under consideration.

Definition 4.1 A point x̄ ∈ M is called a stationary point for (SIP) if there exist ȳ_1, ..., ȳ_p ∈ Y_0(x̄), reals β̄_i, i ∈ A, γ̄_j ≥ 0, j = 1, ..., p, such that

Df(x̄) = Σ_{i∈A} β̄_i Dh_i(x̄) + Σ_{j=1}^p γ̄_j D_x g(x̄, ȳ_j).  (4.1)

Page 57: Semi-Infinite Programming

Stability and Deformation 45

In virtue of (EMFCQ), a local minimum for (SIP) is necessarily a stationary point (cf. [15]). Strong stability of stationary points refers to existence, local uniqueness and continuous dependence on local C²-perturbations of the data (f, h, g) (cf. [39], [50]).

Strong stability of stationary points is closely related with the invertibility of certain associated mappings. Let us firstly consider the unconstrained case. Then, a point x̄ is stationary if Df(x̄) = 0. The latter is a system of n equations with n variables. The mapping x ↦ D^T f(x) is locally invertible at x̄ if the Jacobian matrix D(D^T f) = D²f is nonsingular at x̄. Then, we may regard the function f itself as a parameter and, basically in virtue of the Implicit Function Theorem, we locally get the existence and uniqueness of a stationary point x(f̃) for f̃ being C²-close to f; moreover, x(f̃) depends continuously on f̃ (cf. [20]). Consequently, the nonsingularity of D²f(x̄) is a sufficient condition for strong stability of the stationary point x̄; on the other hand, it is a necessary condition for strong stability as well (cf. [20]). Hence, we have established an equivalent algebraic condition (nonsingularity of the Hessian D²f) for strong stability.

For optimization problems with finitely many inequality constraints, an equivalent algebraic condition for strong stability of stationary points was presented by Kojima in [39]. The semi-infinite case was recently settled by Rückmann in [50]. In the sequel we will discuss some illustrative aspects from [39], [50]. We start with the case of finitely many constraints. Let g_j ∈ C²(R^n, R), j ∈ B, where |B| < ∞, and consider the associated feasible set

M* = {x ∈ R^n | h_i(x) = 0, i ∈ A, g_j(x) ≥ 0, j ∈ B}.

Case I. Let (LICQ) be satisfied at x̄ ∈ M*, where x̄ is a stationary point for f|_{M*}. Then, there exist (unique) reals β̄_i, i ∈ A, γ̄_j ≥ 0, j ∈ B_0(x̄), such that

Df(x̄) = Σ_{i∈A} β̄_i Dh_i(x̄) + Σ_{j∈B_0(x̄)} γ̄_j Dg_j(x̄),  (4.2)

where B_0(x̄) = {j ∈ B | g_j(x̄) = 0}. If all Lagrange multipliers γ̄_j in (4.2) are nonvanishing, we say that strict complementarity holds. However, some γ̄_j might vanish, and we put

B_+(x̄) = {j ∈ B_0(x̄) | γ̄_j > 0}.

Page 58: Semi-Infinite Programming

46 CHAPTER 2

Consider the following mapping F : R^n × R^{|A|} × R^{|B|} → R^n × R^{|A|} × R^{|B|}, introduced by Kojima in [39],

F(x, β, γ) = ( D^T f(x) − Σ_{i∈A} β_i D^T h_i(x) − Σ_{j∈B} γ_j^+ D^T g_j(x), (h_i(x))_{i∈A}, (g_j(x) + γ_j^−)_{j∈B} ),  (4.3)

where a^+ = max(0, a), a^− = min(0, a). Note that x̄ belongs to M* and is a stationary point for f|_{M*} iff F(x̄, β̄, γ̄) = 0 for some (β̄, γ̄). So, the mapping F in (4.3) is an appropriate generalization of the stationary point mapping x ↦ D^T f(x) from before. However, in general, the mapping F is not differentiable, but it is piecewise continuously differentiable (PC¹). In fact, F is a C¹-mapping in a neighbourhood of (x̄, β̄, γ̄) iff strict complementarity holds at x̄, i.e. iff B_+(x̄) = B_0(x̄). In the latter case, the Jacobian matrix DF(x̄, β̄, γ̄) is nonsingular iff D²L(x̄)|_{T_x̄M*} is nonsingular, where L is the associated Lagrange function, and T_x̄M* is the tangent space of M* at x̄ (compare the nondegeneracy condition (ND2) in Definition 2.4); in case of nonsingularity, F is locally invertible, and, regarding f, h, g as parameters, we obtain that x̄ is a strongly stable stationary point. In case that B_+(x̄) ≠ B_0(x̄), we may consider all possible C¹-extensions of F with respect to the pieces where F actually is of class C¹. Now, it follows that F is locally PC¹-invertible if the following condition holds (cf. [39]):

D²L(x̄)|_{T(B)} is nonsingular with constant sign of the determinant for all B_+(x̄) ⊂ B ⊂ B_0(x̄),  (4.4)

where T(B) = {ξ ∈ R^n | Dh_i(x̄)ξ = 0, i ∈ A, Dg_j(x̄)ξ = 0, j ∈ B}.

The constant sign of the determinants is a condition of coherent orientation (cf. [48]). It is the appropriate generalization of the following one-dimensional example (cf. Figure 8).

Page 59: Semi-Infinite Programming

Stability and Deformation 47

Figure 8

The mappings F₁, F₂ in Figure 8 are PC¹-mappings, both consisting of two pieces. The two piece-extensions of F₁ have different signs (of the determinant) of the derivatives. Note that F₁ is not locally invertible at the origin. On the other hand, the corresponding signs for F₂ are equal, and F₂ is locally invertible at the origin.

In Figure 9 (cf. [7]) two examples related to Condition (4.4) are sketched; one of them shows strong stability (Figure 9.a) and the other one exhibits non-stability (Figure 9.b). In each case, two typical perturbations are depicted.

Page 60: Semi-Infinite Programming

48 CHAPTER 2

Figure 9 (a: f(x) = −x₁² + x₂², det D²f < 0 and det D²f|_{x₂=0} < 0, strongly stable; b: f(x) = x₁² − x₂², det D²f < 0 and det D²f|_{x₂=0} > 0, not strongly stable; • marks the stationary point)

In [39] it is shown that Condition (4.4) is also necessary for strong stability. Hence, Condition (4.4) is the appropriate algebraic equivalence for strong stability in case that (LICQ) is satisfied.
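As an added illustration (not part of the original text), Condition (4.4) can be checked numerically for the two examples of Figure 9. The sketch below assumes the data of Figure 9 (a single active constraint g(x) = x₂ ≥ 0 with zero multiplier, so B_+(x̄) = ∅ and B_0(x̄) = {1}); the helper name is made up, and equality constraints are omitted for brevity.

    import numpy as np
    from itertools import combinations
    from scipy.linalg import null_space

    def check_condition_44(H, active_grads, B_plus):
        # H = D^2 L(xbar); active_grads = Dg_j(xbar) for j in B_0(xbar);
        # B_plus = indices with positive multiplier (equality constraints omitted).
        m = len(active_grads)
        rest = [j for j in range(m) if j not in B_plus]
        signs = []
        for k in range(len(rest) + 1):
            for extra in combinations(rest, k):
                B = sorted(set(B_plus) | set(extra))
                if B:
                    V = null_space(np.vstack([active_grads[j] for j in B]))
                else:
                    V = np.eye(H.shape[0])
                HV = V.T @ H @ V
                d = np.linalg.det(HV) if HV.size else 1.0
                if abs(d) < 1e-12:
                    return False              # singular restricted Hessian
                signs.append(np.sign(d))
        return len(set(signs)) == 1           # constant sign of the determinant

    Dg = [np.array([0.0, 1.0])]               # active constraint x2 >= 0, multiplier 0
    print(check_condition_44(np.diag([-2.0, 2.0]), Dg, []))   # Figure 9.a -> True
    print(check_condition_44(np.diag([2.0, -2.0]), Dg, []))   # Figure 9.b -> False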

Page 61: Semi-Infinite Programming

Stability and Deformation 49

Case II. Let (MFCQ), but not (LICQ), be satisfied at x̄ ∈ M*, where x̄ is a stationary point for f|_{M*}. Note that in this case the Lagrange multipliers β̄_i, γ̄_j in (4.2) need not be unique anymore. It turns out that in this case only local minima might be strongly stable. In particular, the equivalent algebraic condition for strong stability now reads (cf. [39]):

For all β̄_i, i ∈ A, γ̄_j ≥ 0, j ∈ B_0(x̄), satisfying (4.2): D²L(x̄)|_{T(B_+(x̄))} is positive definite.  (4.5)

In (4.5), B_+(x̄) corresponds to the chosen multipliers β̄_i, γ̄_j, and T(B_+(x̄)) is defined according to (4.4). The set of Lagrange multipliers β̄_i, γ̄_j satisfying (4.2) is a compact polyhedron (cf. [5]); it suffices to check (4.5) at the extreme points of the latter polyhedron. Condition (4.5) implies that x̄ is a strict local minimum for f|_{M*}. Violation of (4.5) is illustrated by Figure 10, where a saddlepoint generates an unstable situation (cf. [7]). The constraints g₁, g₂ in Figure 10.b are shifted and rotated w.r.t. Figure 10.a.

Figure 10 (f(x) = −x₁² + x₂², g₁(x) = g₂(x) = x₂ ≥ 0; a: original situation, b: shifted and rotated constraints; • marks the stationary point)

Page 62: Semi-Infinite Programming

50 CHAPTER 2

Now, we turn to a discussion of strong stability in the semi-infinite case (cf. [50] for details). Let x̄ ∈ M be a stationary point for (SIP); cf. Definition 4.1. In accordance with the outline above, we consider two cases:

(SIP)-Case I. Y_0(x̄) = {ȳ_1, ..., ȳ_p}, and all points ȳ_1, ..., ȳ_p are strongly stable stationary points for g(x̄, ·)|_Y.

Choose ȳ ∈ {ȳ_1, ..., ȳ_p}. Recall that (LICQ) is assumed to hold at all points of Y. Then, we have the stationary point relation (all μ̄_j ≥ 0):

D_y g(x̄, ȳ) = Σ_{i∈I} λ̄_i Du_i(ȳ) + Σ_{j∈J_0(ȳ)} μ̄_j Dv_j(ȳ).

Put J_+(ȳ) = {j ∈ J_0(ȳ) | μ̄_j > 0}. According to the discussion in Case I above, we may apply the Implicit Function Theorem for every J_+(ȳ) ⊂ 𝒥 ⊂ J_0(ȳ), and locally we get a C¹-mapping y_𝒥(x), with y_𝒥(x̄) = ȳ, such that y_𝒥(x) is a local minimum for g(x, ·)|_{Y(𝒥)}, where Y(𝒥) = {y ∈ R^r | u_i(y) = 0, i ∈ I, v_j(y) = 0, j ∈ 𝒥}. In virtue of strong stability of ȳ, we also have locally a continuous function y(x), with y(x̄) = ȳ, such that y(x) is a local minimum for g(x, ·)|_Y. Clearly, the function y(x) is a continuous selection of the C¹-functions y_𝒥(x), J_+(ȳ) ⊂ 𝒥 ⊂ J_0(ȳ), and, hence, y(x) is Lipschitz continuous. In this way we may generalize the Reduction Ansatz in the sense that the infinite set of inequality constraints {g(x, y) ≥ 0, y ∈ Y} can locally be replaced by means of finitely many inequality constraints φ_j(x) ≥ 0, j = 1, ..., p, where φ_j(x) is the marginal function g(x, y_j(x)) and y_j(x) the Lipschitz function as mentioned above. In analogy to the discussion of Cases I and II above, we distinguish two subcases:

Subcase I.a. The vectors Dh_i(x̄), i ∈ A, D_x g(x̄, ȳ_j), j = 1, ..., p, are linearly independent. Put B_0(x̄) = {1, ..., p} and B_+(x̄) = {j ∈ B_0(x̄) | γ̄_j > 0}, where the unique numbers γ̄_j refer to (4.1). Then, the stationary point x̄ is strongly stable iff the following condition is satisfied:

For all B_+(x̄) ⊂ B ⊂ B_0(x̄) and J_+(ȳ_j) ⊂ 𝒥_j ⊂ J_0(ȳ_j), j = 1, ..., p, the restricted Hessian D²ℒ^{(𝒥_1,...,𝒥_p)}(x̄)|_{T(B)} is nonsingular with constant sign of the determinant, where

ℒ^{(𝒥_1,...,𝒥_p)}(x) = f(x) − Σ_{i∈A} β̄_i h_i(x) − Σ_{j=1}^p γ̄_j g(x, y_{𝒥_j}(x))  (4.6)

Page 63: Semi-Infinite Programming

Stability and Deformation 51

and

T(B) = {ξ ∈ R^n | Dh_i(x̄)ξ = 0, i ∈ A, D_x g(x̄, ȳ_j)ξ = 0, j ∈ B}.

Subcase I.b. The vectors Dh_i(x̄), i ∈ A, D_x g(x̄, ȳ_j), j = 1, ..., p, are linearly dependent, but (EMFCQ) holds. Similarly as in Case II above, the Lagrange multipliers β̄_i, γ̄_j in (4.1) need not be unique. So, we have two kinds of choices: on the one hand a set of β̄_i, i ∈ A, γ̄_j ≥ 0, j = 1, ..., p, satisfying (4.1), and, in addition, for every j ∈ {1, ..., p} we may choose 𝒥_j, J_+(ȳ_j) ⊂ 𝒥_j ⊂ J_0(ȳ_j), as in Subcase I.a. Each selection defines a Lagrange function as in (4.6), and the equivalent algebraic condition for strong stability is similar to the positive definiteness condition in (4.5). In particular, in this subcase, a strongly stable stationary point is necessarily a strict local minimum for (SIP).

(SIP)-Case II. Not all points from Y_0(x̄) are strongly stable stationary points for g(x̄, ·)|_Y.

Note that the active set Y_0(x̄) now might be infinite. The equivalent algebraic condition for strong stability in this case is also a positive definiteness condition, however, a rather technical one. In particular, a strongly stable stationary point will be a strict local minimum for (SIP). We will not spell out the precise equivalent algebraic condition, but merely motivate the necessity of positive definiteness.

Suppose that there are no equality constraints (A = ∅) and that the index set Y is a square in R² with the origin as its center. Firstly, suppose that the active index set Y_0(x̄) is the singleton {ȳ} with ȳ = 0. We have the stationary point relation

Df(x̄) = γ̄ D_x g(x̄, ȳ).  (4.7)

Let γ̄ in (4.7) be positive, and suppose that ȳ is a nondegenerate minimum for g(x̄, ·)|_Y. The corresponding Lagrange function will be

L(x) = f(x) − γ̄ g(x, y(x)).

Recalling the shift term in (2.5) we obtain

D²L(x̄) = [D²f(x̄) − γ̄ D_x²g(x̄, ȳ)] + γ̄ D_y D_x^T g · [D_y²g]^{-1} · D_x D_y^T g |_{(x̄, ȳ)}.  (4.8)

Since γ̄ > 0 and D_y²g(x̄, ȳ) is positive definite, we see that the second term in the right-hand side of (4.8) is positive semidefinite.

Page 64: Semi-Infinite Programming

52 CHAPTER 2

Now, suppose that the function g does not depend on the variable y₂, and that g(x̄, y) = y₁² (note that Y_0(x̄) is an interval). In virtue of the nondegeneracy in the variable y₁, we obtain a similar expression as the right-hand side of (4.8), where D_y²g and D_y D_x^T g are to be replaced by D_{y₁}²g and D_{y₁} D_x^T g, respectively. Let us denote this new matrix by V. Next, we perturb g a little bit such that the variable y₂ comes into play in a dominant way. For example, we might think of a perturbation of the type ε³y₂² + εy₂ψ(x), where ψ is some affine function of x with ψ(x̄) = 0 and where ε > 0 is small. Then, suddenly, the origin becomes a nondegenerate minimum for g(x, ·)|_Y, and Formulas (4.7), (4.8) are available. However, the change in the second term in the right-hand side of (4.8) is drastic; in fact, it is of order ε^{-1}. Put T = {ξ ∈ R^n | D_x g(x̄, ȳ) · ξ = 0} and suppose that the restricted matrix V|_T ("the regular part") has a negative eigenvalue. Then, a specific perturbation as above, with the aid of the variable y₂, may yield a restricted Hessian D²L(x̄)|_T in which one of the negative eigenvalues of V|_T is turned into a positive one. Such a change in the distribution of eigenvalues, due to arbitrarily small perturbations, would make the stationary point x̄ unstable. The latter change, however, is not possible, if V|_T is already positive definite. This motivates a positive definiteness condition "on all possible regular parts", as presented in [50].
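A short numerical sketch of the order-ε^{-1} effect just described (illustration only, not from the text; the affine function ψ and its gradient a are arbitrary choices):

    import numpy as np

    # g(x, y) = y1^2 + eps^3*y2^2 + eps*y2*psi(x) with psi affine, psi(xbar) = 0,
    # Dpsi = a; the y2-part of the shift term equals a a'/(2 eps), i.e. order 1/eps.
    a = np.array([1.0, 2.0])
    for eps in (1e-1, 1e-2, 1e-3):
        Dyy = np.diag([2.0, 2.0 * eps**3])               # D_y^2 g at (xbar, 0)
        Dxy = np.column_stack([np.zeros(2), eps * a])    # D_y D_x^T g
        shift = Dxy @ np.linalg.inv(Dyy) @ Dxy.T
        print(eps, np.linalg.norm(shift), np.linalg.norm(np.outer(a, a)) / (2 * eps))
    # the last two numbers agree for each eps: the shift term blows up like 1/eps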

Comments. The concept of strong stability of a stationary point of a C²-problem with finitely many inequality constraints was introduced by Kojima [39]. On the other hand, Robinson [47] established the strong regularity of a solution of a C²-problem via the approach of generalized equations. Then, it was shown in [24] that both concepts are equivalent in case that (LICQ) is satisfied at the considered point; other equivalent characterizations of strong stability are given in [38]. Several papers [23, 41, 42] deal with stability properties of solutions of a system of Lipschitz continuous equations as well as applications to C^{1,1}-optimization problems (C^{1,1} means that the describing functions are continuously differentiable and their derivatives are Lipschitz continuous). We also refer to the related papers [16, 10, 35, 37, 53] and [36], where in the latter paper stable local minima in semi-infinite optimization are considered. Finally, we mention the basic book by Fiacco [4] on sensitivity and stability in nonlinear optimization as well as the recent works [1] and [43] on perturbation theory in nonlinear optimization.

Page 65: Semi-Infinite Programming

Stability and Deformation 53

5 GLOBAL STABILITY

In this section we consider global (or structural) stability of semi-infinite optimization problems under small perturbations of the defining functions (f, h, g), which are assumed to be of class C². The compact index set Y remains fixed, the defining functions u_i, v_j are assumed to be of class C^∞, and we assume throughout this section that (LICQ) is satisfied at all points of Y.

In Section 3 we discussed the stability of the feasible set M[h, g], whereas Section 4 was devoted to the stability of stationary points. Now, we put all these things together and consider structural stability of the whole optimization problem. Let P(f, h, g) denote a semi-infinite optimization problem of the type (SIP) generated by the data (f,h,g). For fixed t E JR, the (feasible) lower level set .c.t(f, h, g) is defined as follows:

.c}(f,h,g) = {x E M[h,g] I f(x) ~ t}.

Definition 5.1 Two problems P(f, h, g) and P(f̃, h̃, g̃) are called equivalent - notation: P(f, h, g) ∼ P(f̃, h̃, g̃) - if there exist continuous mappings φ : R × R^n → R^n, ψ : R → R with the following properties:

1. The mapping φ(t, ·) : R^n → R^n is a homeomorphism for each t ∈ R.

2. The mapping ψ is a homeomorphism and monotonically increasing.

3. For all t ∈ R we have:

φ_t[ℒ^t(f, h, g)] = ℒ^{ψ(t)}(f̃, h̃, g̃),

where φ_t := φ(t, ·).

The latter concept of equivalence was introduced in [8], and it was shown that ∼ is indeed an equivalence relation; see Figure 11.

Page 66: Semi-Infinite Programming

54 CHAPTER 2

"" (t)

M[h,gj

LV,h,g)

Figure 11

The homeomorphism φ(t, ·) in Definition 5.1 depends on the level t. It is not possible, in general, to take just one fixed homeomorphism for all t. This comes from the fact that strict complementarity at stationary points might not be satisfied. In Figure 12 we sketched some level lines of functions f, f̃; the feasible set is a square in R².

Page 67: Semi-Infinite Programming

Stability and Deformation 55

Figure 12

Each function has 3 stationary points: a minimum in the upper part, a maximum in the middle and a saddlepoint at the bottom of the square. The saddlepoint of f lies on the boundary of the feasible set and, consequently (Df vanishes), strict complementarity is not satisfied. A slight perturbation may cause the saddlepoint to shift into the interior (function f̃). Now, suppose that the two problems are equivalent, and that the homeomorphism φ does not depend on the level t. The homeomorphism φ will map M[h, g] onto M[h̃, g̃] and the stationary points of f onto the corresponding ones of f̃. So, if we reach the level of the saddlepoint of f, we have to map the connected remaining part of the feasible set (denoted by 1) onto the remaining part at the right-hand side. However, the latter consists of two connected components (denoted by 1

Page 68: Semi-Infinite Programming

56 CHAPTER 2

and 2). But, a continuous mapping sends connected sets onto connected sets. Consequently, the homeomorphism φ cannot be a fixed one.

Definition 5.2 The semi-infinite optimization problem P(f, h, g) is called structurally stable if there exists a C²_s-neighbourhood U of the defining triple (f, h, g) such that P(f̃, h̃, g̃) ∼ P(f, h, g) for all (f̃, h̃, g̃) ∈ U.

Theorem 5.3 (Structural Stability Theorem) Let the feasible set M[h, g] corresponding to P(f, h, g) be compact. Then, the semi-infinite optimization problem P(f, h, g) is structurally stable if and only if the conditions C1, C2, C3 are satisfied:

C1. (EMFCQ) holds at all points x ∈ M[h, g].

C2. Every stationary point is strongly stable.

C3. Different stationary points have different f-values.

In case of finitely many constraints, Theorem 5.3 was proved in [8] (necessity part) and [29] (sufficiency part). Then, in [55] the ideas of the proof were generalized quite far with respect to semi-infinite optimization. However, the complete algebraic characterization of strongly stable stationary points was not settled at that time. Now, with the characterization in [50], all key tools are available, and easy to build in, thereby adopting the techniques from [8], [29] and [55].

Sketch of the proof (Necessity part).

The validity of Condition C1 is a consequence of the Stability Theorem for feasible sets (Theorem 3.2); Condition C3 is easily seen to be necessary. Now, suppose that Condition C2 is not fulfilled. So, we have a stationary point x̄ for P(f, h, g) which is not strongly stable. By means of arbitrarily small C²_s-perturbations we may perturb f into f₁, f₂, and (h, g) into (h̃, g̃) such that:

1. The problem P(f₁, h̃, g̃) has k stationary points and all of them are strongly stable, except one (namely the point x̄).

2. The problem P(f₂, h̃, g̃) has at least (k + 1) stationary points and all of them are strongly stable.

3. Both in P(f₁, h̃, g̃) and in P(f₂, h̃, g̃), different stationary points have different f₁- (f₂-) values.

Page 69: Semi-Infinite Programming

Stability and Deformation 57

After the (technical) perturbation above, we have to bring in a topological idea: In fact, we have to decide when the homeomorphy type of a lower level set changes. Put ℒ_a^b(f, h, g) = {x ∈ M[h, g] | a ≤ f(x) ≤ b}. The following can be shown:

I. If ℒ_a^b(f, h, g) does not contain stationary points, then ℒ^a(f, h, g) and ℒ^b(f, h, g) are homeomorphic.

II. Let ℒ_a^b(f, h, g) contain exactly one stationary point x̄, and let a < f(x̄) < b. Moreover, suppose that x̄ is strongly stable. Then, ℒ^a(f, h, g) and ℒ^b(f, h, g) are not homeomorphic.

Now, suppose that P(f, h, g) is structurally stable. Then, the problems P(f₁, h̃, g̃) and P(f₂, h̃, g̃) are equivalent. Let the level t grow, starting with a level below min f₁|_{M[h̃,g̃]} and min f₂|_{M[h̃,g̃]}.

Each time we meet a change in the homeomorphy type of a lower level set of P(f₁, h̃, g̃), we must have a change of the homeomorphy type of the corresponding level set of P(f₂, h̃, g̃). From I, II above it follows that for the problem P(f₁, h̃, g̃) there are at most k changes in the homeomorphy type, whereas for P(f₂, h̃, g̃) there are at least (k + 1) changes. This, however, cannot be. □

Comment. As in case of stability of the feasible set (cf. Section 3) the assumption of compactness in Theorem 5.3 can be slightly relaxed (cf. [55]).

6 GLOBAL DEFORMATIONS

In this section we consider semi-infinite optimization problems (SIP)_t depending on a real parameter t ∈ R:

(SIP)_t : Minimize f(x, t) subject to x ∈ M(t),

where

M(t) = {x ∈ R^n | h_i(x, t) = 0, i ∈ A, g(x, t, y) ≥ 0 for all y ∈ Y(t)},

Y(t) = {y ∈ R^r | u_i(t, y) = 0, i ∈ I, v_j(t, y) ≥ 0, j ∈ J}.

All defining functions are assumed to be of class C³. In particular, we are interested in the evolution of stationary (critical, etc.) points as the parameter

Page 70: Semi-Infinite Programming

58 CHAPTER 2

t varies. One of the possible applications is the following. Let us think of connecting a given problem (SIP)₁ with an easy problem (SIP)₀. A solution of (SIP)₀ is assumed to be known. Then, one might try to solve (SIP)₁ by following a path of stationary points - starting at t = 0 at the known solution of (SIP)₀ - hopefully up to t = 1.
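A deliberately naive sketch of this pathfollowing idea (not from the text): the data f(x, t) = (x − t)² and g(x, t, y) = x − ty on Y(t) = [0, 1] are made up, the index set is discretized purely for illustration, and scipy's SLSQP solver stands in for a genuine SIP method; each problem (SIP)_t is warm-started from the solution of the previous one.

    import numpy as np
    from scipy.optimize import minimize

    # (SIP)_t with scalar x:  minimize (x - t)^2  s.t.  x - t*y >= 0 for all y in [0, 1]
    ygrid = np.linspace(0.0, 1.0, 101)        # grid standing in for Y(t)

    def solve_sip(t, x_start):
        cons = [{"type": "ineq", "fun": (lambda x, y=y: x[0] - t * y)} for y in ygrid]
        res = minimize(lambda x: (x[0] - t)**2, x0=[x_start],
                       constraints=cons, method="SLSQP")
        return res.x[0]

    x = solve_sip(0.0, 0.0)                   # the easy problem (SIP)_0
    for t in np.linspace(0.0, 1.0, 11)[1:]:
        x = solve_sip(t, x)                   # warm start from the previous solution
        print(round(float(t), 1), round(float(x), 4))   # here the path is x(t) = t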

We have to impose some natural restrictions on the functions u_i, v_j, which define the index set Y(t). The functions u_i, v_j are assumed to belong to the set CUSC ("compact upper semi-continuous"), where

CUSC = {(..., u_i, ..., v_j, ...) ∈ C³(R^{r+1}, R^{|I|+|J|}) | Y(t) is compact for all t, and the set-valued mapping t ↦ Y(t) is upper semi-continuous at each t}.

In [25] it is shown that CUSC is a C⁰_s-open set; hence, CUSC is also C³_s-open. As the parameter t varies, one cannot expect that (EMFCQ) is always satisfied. Consequently, for the definition of a critical (stationary, etc.) point, we have to relax the relation (4.1) since some Lagrange multipliers might tend to infinity. Therefore, we homogenize the relation (4.1), and this leads to the definition of a generalized critical point.

Definition 6.1 A point x̄ ∈ M(t̄) is called a generalized critical point (g.c. point) for (SIP)_t̄ if the vectors

D_x f(x̄, t̄), D_x h_i(x̄, t̄), i ∈ A, D_x g(x̄, t̄, y), y ∈ Y_0(x̄, t̄)

are linearly dependent, where Y_0(x̄, t̄) = {y ∈ Y(t̄) | g(x̄, t̄, y) = 0}.

Note, if x̄ is a g.c. point, then there exist a finite (perhaps empty) subset {ȳ_1, ..., ȳ_p} ⊂ Y_0(x̄, t̄) and real numbers κ̄, β̄_i, γ̄_j, such that

κ̄ D_x f(x̄, t̄) = Σ_{i∈A} β̄_i D_x h_i(x̄, t̄) + Σ_{j=1}^p γ̄_j D_x g(x̄, t̄, ȳ_j).  (6.1)

In particular, if κ̄ > 0 and γ̄_j ≥ 0, j = 1, ..., p, we may divide by κ̄ and, hence, x̄ is a stationary point. However, in some situations, a vanishing κ̄ is the only choice in order to fulfil (6.1).

Let Σ ⊂ R^{n+1} denote the generalized critical point set,

Σ = {(x, t) ∈ R^{n+1} | x ∈ M(t) and x is a g.c. point for (SIP)_t}.
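As an added illustration of Definition 6.1 (not from the text): whether a feasible point is a g.c. point amounts to a rank test on the matrix whose rows are D_x f, D_x h_i and the active gradients D_x g(·, ·, y); the helper below and its toy data are made up.

    import numpy as np

    def is_gc_point(Dxf, Dxh_rows, Dxg_active_rows, tol=1e-10):
        # Definition 6.1: the listed gradients are linearly dependent
        # iff the stacked matrix has rank smaller than its number of rows.
        G = np.vstack([Dxf] + list(Dxh_rows) + list(Dxg_active_rows))
        return np.linalg.matrix_rank(G, tol=tol) < G.shape[0]

    # toy data in R^2, no equality constraints, one active index:
    print(is_gc_point(np.array([1.0, 0.0]), [], [np.array([2.0, 0.0])]))   # True
    print(is_gc_point(np.array([1.0, 0.0]), [], [np.array([0.0, 1.0])]))   # False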

Page 71: Semi-Infinite Programming

Stability and Deformation 59

Theorem 6.2 ([13], [27]) There exists a C³_s-open and C³_s-dense subset 𝒢,

𝒢 ⊂ C³(R^{n+1}, R^{|A|+1}) × C³(R^{n+r+1}, R) × CUSC,

such that for (f, ..., h_i, ..., g, ..., u_i, ..., v_j, ...) ∈ 𝒢 we have: Each point of the corresponding set Σ is one of eight types.

We will briefly describe the eight types; for details we refer to the literature mentioned below. Assume that (x̄, t̄) ∈ Σ and put z̄ = (x̄, t̄). The eight types can be divided into three groups:

Group I: (LICQ) is satisfied at all y ∈ Y(t̄), and the Reduction Ansatz is applicable at x̄ ∈ M(t̄).

Via the Reduction Ansatz we may locally reduce the semi-infinite optimization problem to a problem with only a finite number of inequality constraints. In this case, Σ consists of five types (cf. [21, 22, 30, 7, 6]). For the local structure of Σ see Figure 13.

A point of Type 1 is a nondegenerate critical point (for the reduced problem); cf. Definition 2.4. In this case, we may apply the Implicit Function Theorem, and the set Σ is (locally) the graph of a C²-function of the variable t. At a point of Type 2, the strict complementarity ((ND1) in Definition 2.4) is violated. Then, Σ is composed of the feasible parts of two curves: one lying in the (relative) boundary and the other one in the (relative) interior of the feasible set. At a point of Type 3 the nondegeneracy condition (ND2) is violated: the restricted Hessian of the Lagrange function has exactly one vanishing eigenvalue. The set Σ exhibits a quadratic turning point at z̄; in fact, the restricted function t|_Σ has a nondegenerate local maximum (minimum) at z̄. The vanishing eigenvalue of the restricted Hessian changes sign when passing z̄ along Σ. The assumption that all defining functions are of class C³ is related with points of Type 3. At a point of Type 4 the gradients of the active constraints are linearly dependent ((LICQ) is violated) and the number of active constraints does not exceed the dimension n. Also in this case we have a quadratic turning point. However, the topological type is very different from that of points of Type 3; in particular, a local minimum switches into a local maximum when passing z̄ along Σ. At a point of Type 5 (LICQ) is also violated, but the number of active constraints equals n + 1. The set Σ consists of p half curves, emanating or terminating at z̄, where p = |Y_0(x̄, t̄)| (= n + 1 − |A|). For a detailed discussion of points of Types 1-5 see [21], [22]. Points of Types 4, 5 might cause severe problems in the pathfollowing of branches of local minima; see also [7] and [30].

Page 72: Semi-Infinite Programming

60 CHAPTER 2

Figure 13 (Types 1-5)

Group II: (LICQ) is satisfied at all y ∈ Y(t̄), but the Reduction Ansatz is not applicable at x̄ ∈ M(t̄).

This group consists of points of Type 6. The Reduction Ansatz fails in the sense that at exactly one point ȳ ∈ Y_0(x̄, t̄) only the strict complementarity (cf. (ND1) in Definition 2.4) is violated: precisely one Lagrange multiplier μ̄_j at the critical point ȳ for g(x̄, t̄, ·)|_{Y(t̄)} is vanishing.

As in the case of Type 2, there appear two curves: one of them treats the inequality constraint v_j (to which μ̄_j above belongs) as an equality constraint, and the other one deletes v_j as a constraint. The set Σ locally consists of the union of two pieces, each piece belonging to one of the two mentioned curves; see Figure 14. The corresponding restricted Hessians only differ with respect to the shift term. For a detailed discussion of a point of Type 6 we refer to [13]. We note that the nondegeneracy condition (ND2) from Definition 2.4 will


We note that the nondegeneracy condition (ND2) from Definition 2.4 will not be violated at a point ȳ ∈ Y₀(x̄, t̄). This comes from the fact that each ȳ ∈ Y₀(x̄, t̄) is a local minimum and that the parameter t is one-dimensional. The first degeneracy to appear would be of the type y⁴ which, however, needs two parameters for its universal unfolding (cf. [2]).

Figure 14 (local structure of Σ near a point of Type 6)

Group III: (LICQ) is violated at some point ȳ ∈ Y(t̄).

Points of Type 7 and Type 8 belong to this group. A point is of Type 7 (Type 8) if q ≤ r (q = r + 1), where q = |I| + |J₀(t̄, ȳ)| and J₀(t̄, ȳ) = {j ∈ J | v_j(t̄, ȳ) = 0}. At a point of Type 7, the g.c. point set Σ stops (or emanates), and it is the half of a branch at a quadratic turning point (Figure 15.a).

At a point of Type 8, the g.c. point set Σ stops (or emanates) if (MFCQ) is violated at ȳ ∈ Y(t̄) (Figure 15.b), whereas Σ only exhibits a nonsmoothness at z̄ if (MFCQ) is satisfied (Figure 15.c). In fact, Σ stops (emanates) at t̄ if the restricted function t|_𝒴 has a nondegenerate local maximum (minimum) at (t̄, ȳ) ∈ 𝒴, where 𝒴 is the unfolded index set

    𝒴 = {(t, y) ∈ ℝ^{r+1} | u_i(t, y) = 0, i ∈ I,  v_j(t, y) ≥ 0, j ∈ J}.

Consider Figure 15.a,b. When approaching the point z̄ along Σ, one of the points of the active index set Y₀ disappears as t passes t̄. Then, the expression for D_x f(x, t) as a linear combination of the (partial) derivatives of the active


constraints (cf. (4.1)) is no longer possible, since one of the necessary derivatives has disappeared. Consequently, the branch of Σ must cease to exist at z̄. For a detailed study of points of Types 7 and 8 we refer to [27].

Figure 15 (a: Type 7; b, c: Type 8)

Finally, we emphasize that the set Σ need not be closed. This is a phenomenon of global nature, due to the fact that components of Y(t) may (dis)appear. We explain the idea with the aid of Figure 16.

As the parameter t increases, a new component of the set Y(t) is born at t₁. Hence, suddenly a new inequality constraint becomes available. This constraint may cause infeasibility of corresponding points of the closure of Σ, and then the corresponding branch of Σ stops at t₁; in particular, the closure point does not belong to Σ. A similar situation may appear if a component of Y(t) disappears; this happens when passing the parameter value t₂. We further refer to [25], where generic topological changes of the feasible set M(t) for (SIP)_t are investigated.


Figure 16 (components of Y(t) appearing at t₁ and disappearing at t₂ as t varies)

Comments. In the case of finitely many inequality constraints Kojima and Hirabayashi [40] studied global deformations within a piecewise differentiable setting; basically, they assumed (MFCQ) and a regular value condition. Another approach to one-parametric problems is presented in [46, 54] by using ideas from bifurcation theory. A modification of the mentioned five types from [21, 22] (cf. Group I) is considered in [51, 52] for the construction of a pathfollowing method for the solution of one-parametric semi-infinite problems.

REFERENCES

[1] J. F. Bonnans and A. Shapiro. Optimization problems with perturbations: a guided tour. INRIA, Rapport de recherche No. 2872, 1996.

[2] Th. Bröcker and L. Lander. Differential Germs and Catastrophes. London Math. Soc. Lect. Notes Vol. 17, Cambridge University Press, 1975.

[3] F. H. Clarke. Optimization and Nonsmooth Analysis. Wiley, Chichester, 1983.


[4] A. V. Fiacco. Introduction to Sensitivity and Stability Analysis in Nonlinear Pro­gramming. Academic Press, New York, 1983.

[5] J. Gauvin. A necessary and sufficient regularity condition to have bounded mul­tipliers in nonconvex programming. Math. Programming, 12:136-138, 1977.

[6] W. Gómez, J. Guddat, H. Th. Jongen, J.-J. Rückmann, and C. Solano. Curvas críticas y saltos en optimización no lineal. In preparation.

[7] J. Guddat, F. Guerra, and H. Th. Jongen. Parametric Optimization: Singularities, Pathfollowing and Jumps. Wiley, Chichester, 1990.

[8] J. Guddat and H. Th. Jongen. Structural stability and nonlinear optimization. Optimization, 18:617-631, 1987.

[9] J. Guddat, H. Th. Jongen, and J.-J. Rückmann. On stability and stationary points in nonlinear optimization. J. Austral. Math. Soc., Ser. B 28:36-56, 1986.

[10] H. Günzel and M. Shida. On stability concepts in nonlinear programming. ZOR - Math. Methods of Oper. Research, 41:153-160, 1995.

[11] R. Henrion and D. Klatte. Metric regularity of the feasible set mapping in semi-infinite optimization. Appl. Math. Opt., 30:103-106, 1994.

[12] R. Hettich and H. Th. Jongen. Semi-infinite programming: conditions of optimality and applications. In J. Stoer, editor. Optimization Techniques 2, pages 1-11. Springer, Berlin-Heidelberg-New York, 1978.

[13] R. Hettich, H. Th. Jongen, and O. Stein. On continuous deformations of semi-infinite optimization problems. In M. Florenzano et al., editors. Approximation and Optimization in the Caribbean II, pages 406-424, Peter Lang, Frankfurt, 1995.

[14] R. Hettich and K. O. Kortanek. Semi-infinite programming: theory, methods and applications. SIAM Review 35(3):380-429, 1993.

[15] R. Hettich and P. Zencke. Numerische Methoden der Approximation und semi-infiniten Optimierung. Teubner, Stuttgart, 1982.

[16] R. Hirabayashi, H. Th. Jongen, and M. Shida. Stability for linearly constrained optimization problems. Math. Programming, 66:351-360, 1994.

[17] M. W. Hirsch. Differential Topology. Springer, Berlin-Heidelberg-New York, 1976.

[18] M. A. Jiménez and J.-J. Rückmann. On equivalent stability properties in semi-infinite optimization. ZOR - Math. Methods of Oper. Research, 41:175-190, 1995.

[19] H. Th. Jongen, P. Jonker, and F. Twilt. Nonlinear Optimization in ℝⁿ, I. Morse Theory, Chebyshev Approximation. Peter Lang, Frankfurt, 1983.

[20] H. Th. Jongen, P. Jonker, and F. Twilt. Nonlinear Optimization in ℝⁿ, II. Transversality, Flows, Parametric Aspects. Peter Lang, Frankfurt, 1986.

[21] H. Th. Jongen, P. Jonker, and F. Twilt. One-parameter families of optimization problems: equality constraints. J. Optim. Theory Appl., 48:141-161, 1986.


[22] H. Th. Jongen, P. Jonker, and F. Twilt. Critical sets in parametric optimization. Math. Programming, 34:333-353, 1986.

[23] H. Th. Jongen, D. Klatte, and K. Tammer. Implicit functions and sensitivity of stationary points. Math. Programming, 49:123-138, 1990.

[24] H. Th. Jongen, T. Möbert, J.-J. Rückmann, and K. Tammer. On inertia and Schur complement in optimization. Linear Algebra Appl., 95:97-109, 1987.

[25] H. Th. Jongen and J.-J. Rückmann. One-parameter families of feasible sets in semi-infinite optimization. University of Erlangen, Dept. for Appl. Mathematics, Preprint No. 203, 1996.

[26] H. Th. Jongen, J.-J. Rückmann, and G.-W. Weber. One-parametric semi-infinite optimization: on the stability of the feasible set. SIAM J. Opt., 4(3):637-648, 1994.

[27] H. Th. Jongen and O. Stein. On generic one-parametric semi-infinite optimization. University of Trier, Dept. of Mathematics, Preprint No. 95-06, 1995.

[28] H. Th. Jongen, F. Twilt, and G.-W. Weber. Semi-infinite optimization: structure and stability of the feasible set. J. Optim. Theory Appl., 72:529-552, 1992.

[29] H. Th. Jongen and G.-W. Weber. Nonlinear optimization: characterization of structural stability. J. Global Optimization, 1:47-64, 1991.

[30] H. Th. Jongen and G.-W. Weber. Nonconvex optimization and its structural frontiers. In W. Krabs and J. Zowe, editors. Proceedings of the Summer School 'Moderne Methoden der Optimierung', Lecture Notes in Econom. and Math. Systems, 378:151-203, 1992.

[31] H. Th. Jongen and G. Zwier. On the local structure of the feasible set in semi-infinite optimization. In B. Brosowski and F. Deutsch, editors. Parametric Optimization and Approximation, Int. Series of Num. Math., 72:185-202, Birkhäuser, Basel, 1985.

[32] H. Kawasaki. An envelope-like effect of infinitely many inequality constraints on second-order necessary conditions for minimization problems. Math. Programming, 41:73-96, 1988.

[33] H. Kawasaki. The upper and lower second order directional derivatives of a sup-type function. Math. Programming, 41:327-339, 1988.

[34] H. Kawasaki. Second-order necessary and sufficient optimality conditions for minimizing a sup-type function. Appl. Math. Opt., 26:195-220, 1992.

[35] D. Klatte. On regularity and stability in semi-infinite optimization. Set-Valued Analysis, 3:101-111, 1995.

[36] D. Klatte. Stable local minimizers in semi-infinite optimization: regularity and second-order conditions. J. Comp. Appl. Math., 56:137-157, 1994.

[37] D. Klatte. Stability of stationary solutions in semi-infinite optimization via the reduction approach. In W. Oettli, D. Pallaschke, editors. Advances in Optimization, pages 155-170, Springer, Berlin-Heidelberg-New York, 1992.


[38] D. Klatte and K. Tammer. Strong stability of stationary solutions and Karush-Kuhn-Tucker points in nonlinear optimization. Ann. Oper. Res., 27:285-308, 1990.

[39] M. Kojima. Strongly stable stationary solutions in nonlinear programs. In S.M. Robinson, editor. Analysis and Computation of Fixed Points, pages 93-138, Academic Press, New York, 1980.

[40] M. Kojima and R. Hirabayashi. Continuous deformation of nonlinear programs. Math. Progr. Study, 21:150-198, 1984.

[41] B. Kummer. An implicit-function theorem for C^{0,1}-equations and parametric C^{1,1}-optimization. J. Math. Analysis and Appl., 158(1):35-46, 1991.

[42] B. Kummer. Lipschitzian inverse functions, directional derivatives, and application in C^{1,1}-optimization. J. Optim. Theory Appl., 70:561-582, 1991.

[43] E. S. Levitin. Perturbation Theory in Mathematical Programming. Wiley, Chichester, 1994.

[44] O. L. Mangasarian and S. Fromovitz. The Fritz John necessary optimality conditions in the presence of equality and inequality constraints. J. Math. Anal. Appl., 17:37-47, 1967.

[45] R. S. Palais and S. Smale. A generalized Morse theory. Bull. Am. Math. Soc., 70:165-172, 1964.

[46] A. B. Poore and C. A. Tiahrt. Bifurcation problems in nonlinear parametric programming. Math. Programming, 39:189-205, 1987.

[47] S. M. Robinson. Strongly regular generalized equations. Math. Oper. Res., 5:43-62, 1980.

[48] S. M. Robinson. Normal maps induced by linear transformations. Math. Oper. Res., 17:691-714, 1992.

[49] J.-J. Rückmann. Stability of noncompact feasible sets in nonlinear optimization. In J. Guddat et al., editors. Parametric Optimization and Related Topics III, pages 467-502, Peter Lang, Frankfurt, 1993.

[50] J.-J. Rückmann. On existence and uniqueness of stationary points. RWTH Aachen, Dept. of Mathematics (C), Preprint No. 61, 1995.

[51] T. Rupp. Kontinuitätsmethoden zur Lösung einparametrischer semi-infiniter Optimierungsprobleme. Dissertation, University of Trier, Dept. of Mathematics, 1988.

[52] T. Rupp. Kuhn-Tucker curves for one-parametric semi-infinite programming. Optimization, 20(1):61-77, 1989.

[53] M. Shida. Stability of Nonlinear Optimization. Dissertation, Tokyo Institute of Technology, Dept. of Systems Science, 1994.

[54] C. A. Tiahrt and A. B. Poore. A bifurcation analysis of the nonlinear parametric programming problem. Mathematical Programming, 47:117-141, 1990.


[55] G.-W. Weber. Charakterisierung struktureller Stabilität in der nichtlinearen Optimierung. Dissertation, RWTH Aachen, Dept. of Mathematics (C), 1992.

[56] W. Wetterling. Definitheitsbedingungen für relative Extrema bei Optimierungs- und Approximationsaufgaben. Num. Math., 15:122-136, 1970.

[57] G. Zwier. Structural Analysis in Semi-infinite Programming. Dissertation, Uni­versity of Twente, 1987.


3 REGULARITY AND STABILITY IN

NONLINEAR SEMI-INFINITE OPTIMIZATION

Diethard Klatte¹ and René Henrion²

¹ Institut für Operations Research, Universität Zürich, CH-8044 Zürich, Switzerland, Email: [email protected]

² Weierstraß-Institut für Angewandte Analysis und Stochastik, Berlin, D-10117 Berlin, Germany, Email: [email protected]

ABSTRACT

The paper is concerned with semi-infinite C¹ programs parametrized in the objective function and in the constraint functions, where perturbations may also occur in the index set of the semi-infinite constraints. Our purpose is to give a self-contained presentation of the interrelations between metric regularity, extended Mangasarian-Fromovitz constraint qualification, local boundedness of multipliers and upper semicontinuity of stationary solutions. Moreover, we outline stability properties of perturbed local minimizers in the absence of second-order differentiability of the data.

1 INTRODUCTION

In this paper, we are concerned with parametric nonlinear programs involving finitely many equality and infinitely many inequality constraints in finite di­mensions. The parameters may occur both in the problem functions and in the index set associated with the inequalities. Our studies will essentially be restricted to the use of first-order information.


Consider a family of semi-infinite optimization problems

    P(t):   f(t, x) → min_x   s.t.  x ∈ M(t),

where t is a parameter varying over T, and M is a multifunction which assigns to each t ∈ T the solution set M(t) of the system

    h_i(t, x) = 0    (i = 1, ..., p),
    g(t, x, y) ≥ 0   (∀ y ∈ K(t)).                                   (1.1)

Throughout the paper we shall make the following

General Assumptions (GA). T is a metric space, K is a nonempty and compact subset of ℝˢ, and the functions f : T × ℝⁿ → ℝ, h = (h₁, ..., h_p) : T × ℝⁿ → ℝᵖ and g : T × ℝⁿ × ℝˢ → ℝ, as well as the multifunction t ↦ K(t), have the following properties:

    f, h and g are continuous,                                       (1.2)
    f, h and g are differentiable with respect to x,                 (1.3)
    ∇_x f(·,·), ∇_x h(·,·) and ∇_x g(·,·,·) are continuous,           (1.4)
    ∅ ≠ K(t) ⊂ K and K(t) is compact  (∀ t ∈ T),                     (1.5)
    t ↦ K(t) is closed and lower semicontinuous on T,                 (1.6)

where ∇_x f(·,·), ∇_x h(·,·) and ∇_x g(·,·,·) denote the gradient mappings of f, h, g with respect to x. For further notation and definitions we refer to the end of Section 1.

The purposes of our paper are, on the one hand, to study the close interrelations between metric regularity, extended Mangasarian-Fromovitz constraint qualification, local boundedness of multipliers and upper semicontinuity of stationary solutions, and, on the other hand, to discuss certain consequences for the stability behavior of perturbed local minimizers in the absence of second-order differentiability assumptions on the data. Since results of these types are scattered over the literature, we strive for a presentation which is as much as possible self-contained. For other views of perturbation and stability analysis in semi-infinite programming, we refer to the papers contributed by Jongen and Rückmann [23] and Shapiro [56] to the present volume.

Section 2, which essentially goes back to the first author's paper [34], is devoted to generalizations of Gauvin's theorem [14] and of the parametric version of Gauvin's theorem (Robinson [46]). These results were originally given for standard


nonlinear programs and say (in the parametric variant) that the Mangasarian­Fromovitz constraint qualification (MFCQ) holds at some stationary solution of the unperturbed problem if and only if the associated Lagrange multiplier set mapping of the parametric problem is locally bounded. Now, in the semi­infinite programming setting, MFCQ has to be replaced by a standard extension of it. We also outline the relations to constraint qualifications and Gauvin type theorems for other settings of semi-infinite programs.

Section 3 extends the authors' results [16], given for P(t) in the case of a fixed index set K(t) ≡ K, to a parametric index set K(t), t varying. We shall show that metric regularity of the constraints near some feasible point is equivalent to satisfying the extended MFCQ at this point. As an essential prerequisite, we use Cominetti's lemma [11] on the relation between the metric regularities under arbitrary perturbations and right-hand side perturbations of the constraints. We shall use the fact that the extended MFCQ allows an equivalent epigraphical representation of the constraints, see [15,16]. An alternative proof using the cone constraint approach could be given by combining results from [11,43,55]. Finally, in the case of absence of equations in (1.1), we characterize metric regularity by an explicit growth condition of the constraint function; here we follow an idea in [17].

Section 4 treats some consequences of Section 3 for the stability behavior of local minimizers. The approach is based - for the qualitative stability part -on [6,29,47] and - for the quantitative stability part - on an idea of Alt [1] and modifications of this idea given in [5,29,30,32]. Some Concluding Remarks (Section 5) will complete the paper.

We finish Section 1 by introducing some definitions and notation which will be used throughout the paper. Notation and definitions needed only locally will be found in the corresponding (sub)section. In ℝⁿ we denote by 0_n the zero vector, by ⟨x, y⟩ or xᵀy the Euclidean inner product, by ‖·‖ the Euclidean norm, by ‖·‖_∞ the maximum norm. By "dist(x, Z)" we mean the usual distance of a point x to a set Z induced by the norm ‖·‖ in ℝⁿ. We write ℝⁿ₊ for the nonnegative orthant of ℝⁿ, and use the abbreviation v_[r] to denote the projection of an n-vector v onto its first r coordinates (v₁, ..., v_r). By conv X and cone X we mean the convex and the conic hull of X ⊂ ℝⁿ, respectively, while dim X refers to the dimension of a convex set X. We use the symbol C(K, ℝ) for the linear space of continuous functions y ↦ b(y) from the compact set K ⊂ ℝˢ to ℝ, equipped with the norm ‖b‖_K := max_{y∈K} |b(y)|. The zero element of C(K, ℝ) will be denoted by θ. For each (t, x) ∈ T × ℝⁿ we define

    E(t, x) := {y ∈ K(t) | g(t, x, y) = 0},    Γ(t, x) := {∇_x g(t, x, y) | y ∈ E(t, x)}.


Both sets are compact due to the assumptions GA. Note that the set of active indices E(t, x) consists of all global minimizers of the function g(t, x, ·) on K(t), provided that E(t, x) is nonempty and g(t, x, y) ≥ 0 for all y ∈ K(t).
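Numerically, the active index set can only be approximated; a simple (assumed) scheme, added here for illustration and not part of the text, is to discretize K(t) and collect the near-minimizers of g(t, x, ·):

    import numpy as np

    def active_indices(g, x, K_grid, tol=1e-6):
        """Approximate E(x) = {y in K : g(x, y) = 0} on a finite grid,
        assuming g(x, .) >= 0 on K, by returning the near-minimizers."""
        vals = np.array([g(x, y) for y in K_grid])
        return K_grid[vals <= vals.min() + tol], vals.min()

    # illustrative data: g(x, y) = x1*cos(y) + x2*sin(y) + 1 on K = [0, 2*pi]
    g = lambda x, y: x[0] * np.cos(y) + x[1] * np.sin(y) + 1.0
    K_grid = np.linspace(0.0, 2.0 * np.pi, 2001)
    E, Gval = active_indices(g, np.array([0.0, -1.0]), K_grid)
    print(Gval, E)   # minimal value ~ 0, active index near y = pi/2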

When t ∈ T is given, a point x ∈ ℝⁿ is a (Karush-Kuhn-Tucker type) stationary solution of P(t), if there are a nonnegative integer r ≤ n + 1, a (possibly empty) subset {y¹, ..., y^r} of E(t, x) as well as multipliers u_i ∈ ℝ (i = 1, ..., p) and v_j ≥ 0 (j = 1, ..., r) such that

    x ∈ M(t),   ∇_x f(t, x) + Σ_{i=1}^{p} u_i ∇_x h_i(t, x) − Σ_{j=1}^{r} v_j ∇_x g(t, x, y^j) = 0_n.    (1.7)
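For finitely many (or sampled) active indices, condition (1.7) can be tested numerically. The sketch below is illustrative only and restricted to p = 0 (no equality constraints); it asks, via nonnegative least squares, whether ∇_x f lies in the cone spanned by the active gradients:

    import numpy as np
    from scipy.optimize import nnls

    def is_stationary(grad_f, active_grads, tol=1e-8):
        """Check (1.7) with p = 0: grad_f = sum_j v_j * grad_g_j, v_j >= 0."""
        if not active_grads:               # no active indices: need grad_f = 0
            return np.linalg.norm(grad_f) <= tol
        A = np.column_stack(active_grads)  # n x r matrix of active gradients
        v, residual = nnls(A, grad_f)      # min ||A v - grad_f||, v >= 0
        return residual <= tol

    # illustrative data: n = 2, one active gradient (1, 1)
    print(is_stationary(np.array([2.0, 2.0]), [np.array([1.0, 1.0])]))   # True
    print(is_stationary(np.array([1.0, -1.0]), [np.array([1.0, 1.0])]))  # False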

Given some x ∈ M(t⁰) for fixed t = t⁰, we shall say that x satisfies the Extended Mangasarian-Fromovitz Constraint Qualification (EMFCQ) with respect to (1.1) at t = t⁰, if

    ∇_x h(t⁰, x) has rank p, and there is some ξ ∈ ℝⁿ with
    ξᵀ∇_x h_i(t⁰, x) = 0  (i = 1, ..., p),   ξᵀ∇_x g(t⁰, x, y) > 0  (∀ y ∈ E(t⁰, x)).    (1.8)

The latter definition is due to Jongen, Twilt and Weber [26]; it is based on a condition introduced in [18,20]. Indeed, (1.8) is a constraint qualification assuring that a local minimizer of (P) is a stationary solution of (P), cf., e.g., [20, Satz 3.1.14]. Obviously, EMFCQ includes p ≤ n and, if E(t⁰, x) ≠ ∅, then ξ ≠ 0_n and p ≤ n − 1 hold under EMFCQ at x ∈ M(t⁰). If K(t⁰) is finite, Condition (1.8) is the standard Mangasarian-Fromovitz constraint qualification. We note that the case of having no equality constraints (p = 0) is not excluded from the following results. For simplicity we admit p = 0 when speaking of p-tuples, (p, n)-matrices, p-sums etc.; the corresponding term is then assumed to be vacuous.
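When the active index set is finite (or has been discretized), condition (1.8) can be tested by a small linear program: maximize a slack s subject to ξᵀ∇h_i = 0 and ξᵀ∇g(·,·,y) ≥ s on the sampled active indices, with ξ kept in a box so the LP is bounded. This is an added illustrative sketch, not a procedure from the text:

    import numpy as np
    from scipy.optimize import linprog

    def emfcq_holds(grad_h_rows, active_grad_g_rows, tol=1e-9):
        """LP test of (1.8) on finitely many (sampled) active indices.
        Variables (xi_1, ..., xi_n, s); maximize s subject to
        grad_h_i . xi = 0, grad_g(y) . xi >= s, -1 <= xi <= 1."""
        H = np.atleast_2d(np.array(grad_h_rows)) if grad_h_rows else None
        G = np.array(active_grad_g_rows)
        n = G.shape[1]
        if H is not None and np.linalg.matrix_rank(H) < H.shape[0]:
            return False                              # rank condition of (1.8) fails
        c = np.zeros(n + 1); c[-1] = -1.0             # maximize s
        A_ub = np.hstack([-G, np.ones((G.shape[0], 1))])   # s - G xi <= 0
        b_ub = np.zeros(G.shape[0])
        A_eq = b_eq = None
        if H is not None:
            A_eq = np.hstack([H, np.zeros((H.shape[0], 1))])
            b_eq = np.zeros(H.shape[0])
        bounds = [(-1.0, 1.0)] * n + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return bool(res.success and -res.fun > tol)

    # data of Example 2.5 below: no equalities, gradients (0,1), (1,-1), (-1,-1)
    print(emfcq_holds([], [[0, 1], [1, -1], [-1, -1]]))   # False: MFCQ fails
    print(emfcq_holds([], [[1, 0], [0, 1]]))              # True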

Often we consider unperturbed semi-infinite programs, i.e., we may suppose t⁰ ∈ T to be fixed. For simplicity, we write in this situation

    f(x) := f(t⁰, x),   h(x) := h(t⁰, x),   g(x, y) := g(t⁰, x, y),    (1.9)

as well as E(x) := E(t⁰, x), K := K(t⁰), and Γ(x) := Γ(t⁰, x).

Finally, we introduce the continuity notions for multifunctions as they will be used below; for more details we refer to [6, Section 2.2]. Given a multifunction Φ from a metric space S to ℝᵈ, Φ is called closed at s⁰ ∈ S if s^k → s⁰, w^k → w⁰ and w^k ∈ Φ(s^k) (∀k) imply that w⁰ ∈ Φ(s⁰). Φ is said to be


locally bounded at s⁰ ∈ S if for some neighborhood O of s⁰ and some nonempty bounded set Q ⊂ ℝᵈ and for all s ∈ O, Φ(s) is a subset of Q. Recall that for a closed multifunction, upper semicontinuity in Berge's sense is equivalent to local boundedness. So, in our context, it will be suitable to say that Φ is upper semicontinuous at s⁰ if Φ is closed and locally bounded at s⁰. If the metric space S is compact, then closedness and upper semicontinuity coincide. Moreover, Φ is said to be lower semicontinuous at (s⁰, w⁰) if, with w⁰ ∈ Φ(s⁰), for each sequence s^k → s⁰ there is some sequence w^k → w⁰ satisfying w^k ∈ Φ(s^k) for k sufficiently large. Φ is said to be lower semicontinuous at s⁰ if Φ is lower semicontinuous at any point of {s⁰} × Φ(s⁰). Any of these properties is said to hold on S if it holds at all s in S.

2 UPPER SEMICONTINUITY OF STATIONARY POINTS

In Section 2 we give extensions of Gauvin's theorem [14] and of Robinson's [46] parametric variant of this result. In particular, it turns out that EMFCQ is equivalent to the local boundedness of the multiplier set mapping. This implies the upper semicontinuity of the stationary solution set mapping.

2.1 Preliminaries

We begin by discussing some basic concepts which are related to unperturbed semi-infinite programs. Hence, we may suppose t⁰ ∈ T to be fixed and use the notation (1.9). The semi-infinite problem P(t⁰) then reads

    (P):   f(x) → min   s.t.  x ∈ M

where M is the solution set of the system

    h_i(x) = 0  (i = 1, ..., p),   g(x, y) ≥ 0  (∀ y ∈ K).    (2.1)

The following Gordan-type lemma (for a proof see, e.g., [34]) gives a dual form of the extended Mangasarian-Fromovitz condition. We note that the statement of the lemma is standard if no equations appear, cf., e.g., Cheney [9].

Lemma 2.1 Let Q be a nonempty compact subset of ℝⁿ, and let B be a (p, n)-matrix. The following are equivalent:


a. B has rank p, and there is some ξ ∈ ℝⁿ such that Bξ = 0_p, qᵀξ > 0 (∀ q ∈ Q).

b. For each set of ν = n − p + 1 points {q¹, ..., q^ν} in Q, the system Bᵀu − Dv = 0_n, 0_{p+ν} ≠ (u, v) ∈ ℝᵖ × ℝ^ν₊ has no solution, where D is the (n, ν)-matrix D := [q¹ ... q^ν].

c. For each r ≥ 0 and each set of r points {q¹, ..., q^r} in Q, the system Bᵀu − D̃v = 0_n, 0_{p+r} ≠ (u, v) ∈ ℝᵖ × ℝʳ₊ has no solution, where D̃ is the (n, r)-matrix D̃ := [q¹ ... q^r].

Now we recall two other approaches of modelling the semi-infinite program (P): the cone constraint formulation which leads to an optimization problem in the Banach space of continuous functions over a compactum, and the maximum constraint formulation which leads to a Lipschitz program in finite dimensions.

For each x ∈ ℝⁿ, let g(x, ·) be the function defined by

    g(x, ·)(y) := g(x, y)   (∀ y ∈ K),

hence, by the general assumptions, we have g(x, ·) ∈ C(K, ℝ) for all x. Let

    C := {0_p} × C₊(K, ℝ),   where C₊(K, ℝ) := {v ∈ C(K, ℝ) | v(y) ≥ 0  ∀ y ∈ K}.

Then the constraint system of the semi-infinite program (P) may be equivalently written as the cone constraint

    (h(x), g(x, ·)) ∈ C.    (2.2)

Note that. the cone constraint formulation (2.2) fits into the theory of general­ized equations handled by Robinson in [43].

By the General Assumptions GA, the function a : ℝⁿ → ℝᵖ × C(K, ℝ), defined by

    a(x) := (h(x), g(x, ·)),

is together with its first derivative Da(x) = (∇h(x), Dg(x, ·)) continuous on ℝⁿ, where ∇h(x) is the (p × n) Jacobian matrix of h at x. Further, one has for any z ∈ ℝⁿ,

    [Dg(x, ·) z](y) = zᵀ∇_x g(x, y)   ∀ y ∈ K.

Specializing in these terms Robinson's definition [43] of regularity, we arrive at Robinson's Constraint Qualification (RCQ), which is said to hold at x⁰ with respect to (2.2) if


    (0_p, θ) ∈ int {a(x⁰) + Da(x⁰) z − (a, b) | z ∈ ℝⁿ, (a, b) ∈ C},    (2.3)

where "int" means the interior and the set in braces is the set of all elements e = a(x⁰) + Da(x⁰) z − (a, b), z ∈ ℝⁿ, (a, b) ∈ C. By the results in [42,44,60], RCQ is equivalent to

    Da(x⁰) ℝⁿ − ℝ₊ (C − a(x⁰)) = Z,   where Z := ℝᵖ × C(K, ℝ).    (2.4)

The second way of modelling the constraint system of (P) is to formulate it as a system of smooth equations and one Lipschitzian inequality:

    h_i(x) = 0   (i = 1, ..., p),
    G(x) ≥ 0,
    G(x) := min_{y∈K} g(x, y).    (2.5)

Writing down Auslender's CQ [4,5] for the system (2.5), we obtain another extension of MFCQ: We shall say that EMFCQ* holds at a point x⁰ ∈ M if the set {∇h_i(x⁰)}_{i=1,...,p} is linearly independent and either G(x⁰) > 0, or G(x⁰) = 0 and

    ∇h(x⁰) ξ = 0_p   and   ξᵀe > 0   ∀ e ∈ ∂G(x⁰)    (2.6)

hold for some vector ξ ∈ ℝⁿ, where ∂G(x⁰) is Clarke's generalized gradient at x⁰ of the function G from (2.5). Of course, the definition of EMFCQ* is justified, since x ↦ G(x) := min_{y∈K} g(x, y) is locally Lipschitzian.

It is well known that the usual one-sided directional derivative G'(x; z) exists for all z and, for E(x) ≠ ∅ (i.e., if G(x) = 0), is given by the formula

    G'(x; z) = min_{y∈E(x)} zᵀ∇_x g(x, y).    (2.7)

Moreover, if E(x) ≠ ∅ then Clarke's generalized gradient is (cf. Rockafellar [50, Prop. 3H, Thm. 4C], [49])

    ∂G(x) = conv{∇_x g(x, y) | y ∈ E(x)}.    (2.8)
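Formula (2.7) can be checked against a finite-difference quotient on a discretized index set; the sketch below is illustrative only (the data g, the grid and the tolerances are assumptions, not part of the text):

    import numpy as np

    # toy data: G(x) = min_{y in K} g(x, y), g(x, y) = x1*cos(y) + x2*sin(y) + 1,
    # K = [0, 2*pi] discretized on a fine grid
    K = np.linspace(0.0, 2.0 * np.pi, 4001)
    g      = lambda x, y: x[0] * np.cos(y) + x[1] * np.sin(y) + 1.0
    grad_g = lambda x, y: np.array([np.cos(y), np.sin(y)])
    G      = lambda x: min(g(x, y) for y in K)

    x = np.array([0.6, -0.8])        # here G(x) = 1 - ||x|| = 0, so E(x) is nonempty
    z = np.array([1.0, 0.5])         # direction

    E = [y for y in K if g(x, y) <= G(x) + 1e-6]        # approximate active indices
    formula = min(z @ grad_g(x, y) for y in E)           # right-hand side of (2.7)
    fd = (G(x + 1e-6 * z) - G(x)) / 1e-6                  # one-sided difference quotient
    print(formula, fd)                                    # both are close to -0.2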

Proposition 2.2 (Equivalent constraint qualifications) Let x⁰ be a point satisfying the constraints (2.1) or, equivalently, (2.2) or (2.5). Then the following conditions are equivalent:


a. EMFCQ holds at x⁰.

b. RCQ holds at x⁰.

c. EMFCQ* holds at x⁰.

While the equivalence of EMFCQ and EMFCQ* is an immediate consequence of the definitions and the property (2.8), the equivalence of EMFCQ and RCQ was only recently proved by Shapiro [55] for the case that no equations appear (the extension to the case of equations and inequalities is straightforward, see [33]).

2.2 Gauvin's theorem

Throughout this subsection, we fix t⁰ ∈ T and use the notation (1.9).

Concerning standard (i.e., finite) C¹ programs, Gauvin [14] proved that for a stationary solution x⁰, the corresponding set of Lagrange multiplier vectors is bounded if and only if the Mangasarian-Fromovitz CQ (MFCQ) holds at x⁰. This theorem was extended to optimization problems in Banach spaces (see, e.g., Zowe and Kurcyusz [60] or Penot [42]), where MFCQ has to be replaced by Robinson's CQ. In particular, this result applies to the cone constraint model of the semi-infinite program (P), where by Proposition 2.2, Robinson's CQ may be replaced by EMFCQ. We shall follow a different approach which was proposed by the first author in [34]. This concept also allows a very simple proof of the upper semicontinuity of the stationary point mapping under perturbations, see [34] and Subsection 2.3 below.

In what follows, we prove the Gauvin theorem in the version of [34]. First we note that for a given stationary solution x, by Caratheodory's theorem for cones (cf., e.g., [48,58]), the second sum in the Kuhn-Tucker condition (1.7) could be restricted to n components (even to d := dim cone Γ(x) components). However, in view of our extended Gauvin theorem we shall work with (n + 1) multipliers associated with inequalities, see Example 2.5.
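Carathéodory's theorem for cones is constructive: a representation of a vector in cone{q¹, ..., q^m} ⊂ ℝⁿ with at most n positive weights can be obtained by repeatedly eliminating a generator along a linear dependence. The following sketch is an added illustration (names, data and tolerances are assumptions, not taken from the text):

    import numpy as np

    def caratheodory_reduce(Q, lam, tol=1e-12):
        """Given v = Q @ lam with lam >= 0 (Q is n x m), return weights with
        at most n positive entries representing the same v."""
        lam = lam.astype(float).copy()
        n = Q.shape[0]
        while True:
            idx = np.where(lam > tol)[0]
            if len(idx) <= n:
                return lam
            # more than n generators in R^n are dependent: Q[:, idx] @ mu = 0
            _, s, Vt = np.linalg.svd(Q[:, idx])
            mu = Vt[-1]                      # (approximate) null-space vector
            if not np.any(mu > tol):
                mu = -mu                     # ensure some positive component
            # largest step keeping lam >= 0; it zeroes at least one weight
            t = np.min(lam[idx][mu > tol] / mu[mu > tol])
            lam[idx] = lam[idx] - t * mu
            lam[np.abs(lam) <= tol] = 0.0

    # example: 4 generators in R^2, all initial weights equal to 1
    Q = np.array([[1.0, 0.0, 1.0, -1.0],
                  [0.0, 1.0, 1.0,  2.0]])
    lam = np.ones(4)
    v = Q @ lam
    lam2 = caratheodory_reduce(Q, lam)
    print(np.allclose(Q @ lam2, v), np.count_nonzero(lam2))   # True, at most 2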

Let x be any stationary solution of (P). For a given r ∈ {0, 1, ..., n + 1} and a given r-tuple Y in [E(x)]^r := E(x) × ... × E(x), with Y := ∅ when r = 0, we set

    A^Y(x) := {(u, v) ∈ ℝᵖ × ℝ^{n+1}₊ | (x, u, v_[r], Y) satisfies (1.7)_{t=t⁰},  v_j = 0 for r < j ≤ n + 1}.    (2.9)


The union of the sets A^Y(x), taken over all Y ∈ [E(x)]^r and all r ∈ {0, 1, ..., n + 1}, will be called the set of (ℝᵖ × ℝ^{n+1}₊)-multipliers associated with x and will be denoted by A(x). By definition, the multiplier set A(x) associated with a stationary solution x is always nonempty. Of course, even in the case of a finite index set K, the sets of (ℝᵖ × ℝ^{n+1}₊)-multipliers generally differ from the usual Lagrange multiplier sets.

The following statement and proof are essentially taken from Klatte [34]. Note that the proof of the necessity part even provides a simple bound for the set of (ℝᵖ × ℝ^{n+1}₊)-multipliers.

Theorem 2.3 (Extended Gauvin theorem) Let x⁰ be a stationary solution of (P). Then x⁰ satisfies EMFCQ with respect to (2.1) if and only if the set of (ℝᵖ × ℝ^{n+1}₊)-multipliers A(x⁰) is bounded.

Proof. For shortness, we set q(y) := ∇_x g(x⁰, y), y ∈ K.

Necessity. Suppose that (1.8) holds at x⁰ with some ξ. Let E(x⁰) ≠ ∅, otherwise the assertion is trivial. Further, assume p ≥ 1; for p = 0 the proof runs similarly. Since Γ(x⁰) is compact, there exist λ, μ > 0 such that ξᵀq ≥ λ and ‖H⁻q‖ ≤ μ for all q ∈ Γ(x⁰), where H⁻ is the pseudo-inverse of ∇h(x⁰). Consider any (u, v) ∈ A(x⁰). Thus there are r ∈ {0, 1, ..., n + 1} and y¹, ..., y^r in E(x⁰) such that

    ∇f(x⁰) + ∇h(x⁰) u − Σ_{j=1}^{r} v_j q(y^j) = 0_n    (2.10)

and v_j = 0 for j > r. If r = 0 then (u, v) = (−H⁻∇f(x⁰), 0_{n+1}). If r ≥ 1 then scalar multiplication by ξ in (2.10), v ≥ 0_{n+1} and (1.8) imply

    0 ≤ Σ_{j=1}^{n+1} v_j = Σ_{j=1}^{r} v_j ≤ α := λ⁻¹ ξᵀ∇f(x⁰),

and so, by (1.8) and (2.10), ‖u‖ ≤ β := ‖H⁻∇f(x⁰)‖ + αμ. Since α and β are independent of the choice of (u, v), A(x⁰) is hence bounded.

Sufficiency. We suppose that x⁰ does not satisfy EMFCQ. It will be proved that there is some r-tuple Y in [E(x⁰)]^r such that A^Y(x⁰) is unbounded. Let (u⁰, v⁰) be any (fixed) element of A(x⁰), i.e., there are y¹, ..., y^r ∈ E(x⁰), 0 ≤ r ≤ n + 1, such that

    ∇f(x⁰) + Σ_{i=1}^{p} u⁰_i ∇h_i(x⁰) − Σ_{j=1}^{r} v⁰_j q(y^j) = 0_n.    (2.11)


If rank ∇h(x⁰) < p then the system

    Σ_{i=1}^{p} u_i ∇h_i(x⁰) = 0_n

has a solution u* ≠ 0_p, hence (u⁰ + θu*, v⁰) ∈ A(x⁰) for all θ ∈ ℝ, and so A(x⁰) is unbounded in this case. Consider the other case that rank ∇h(x⁰) = p ≥ 1 or p = 0. Since x⁰ does not satisfy EMFCQ, we thus conclude from Lemma 2.1 that there are u* ∈ ℝᵖ, m ≥ 1, ȳ¹, ..., ȳ^m ∈ E(x⁰) and v*₁, ..., v*_m > 0 with

    Σ_{i=1}^{p} u*_i ∇h_i(x⁰) − Σ_{l=1}^{m} v*_l q(ȳ^l) = 0_n.    (2.12)

Now choose any positive real number θ > 0. Combining (2.11) and (2.12), we then obtain

    ∇f(x⁰) + Σ_{i=1}^{p} (u⁰_i + θu*_i) ∇h_i(x⁰) − Σ_{j=1}^{r} v⁰_j q(y^j) − θv*₁ q(ȳ¹) − Σ_{l=2}^{m} θv*_l q(ȳ^l) = 0_n.

Defining J := {y¹, ..., y^r, ȳ², ..., ȳ^m} and W := cone{q(y) | y ∈ J}, we apply Caratheodory's lemma for cones ([48, Corollary 17.1.2], [58, Thm. 2.2.11]) to the point

    ∇f(x⁰) − θv*₁ q(ȳ¹) + Σ_{i=1}^{p} (u⁰_i + θu*_i) ∇h_i(x⁰) ∈ W.

Since dim W ≤ n, there are N(θ) elements ỹ^j of J, 0 ≤ N(θ) ≤ n, such that

    ∇f(x⁰) + Σ_{i=1}^{p} (u⁰_i + θu*_i) ∇h_i(x⁰) − θv*₁ q(ȳ¹) − Σ_{j=1}^{N(θ)} v_j(θ) q(ỹ^j) = 0_n    (2.13)

with suitable quantities v_j(θ) ∈ ℝ₊. This is true for any θ > 0. Hence, there is an infinite sequence θ_k tending to +∞ as k → ∞ and satisfying N(θ_k) ≡ N, such that (2.13) holds with θ = θ_k (∀k) and with a constant selection {ỹ¹, ..., ỹ^N} ⊂ J, 0 ≤ N ≤ n. Obviously,

    (u⁰ + θ_k u*, θ_k v*₁, v₁(θ_k), ..., v_N(θ_k), 0_{n−N}) ∈ A^Y(x⁰)

with Y = [ȳ¹, ỹ¹, ..., ỹ^N]. This defines multiplier vectors for which at least the component θ_k v*₁ tends to +∞ as k → +∞. Hence, A(x⁰) is unbounded, which completes the proof by contraposition. □


Remark 2.4 By a suitable modification of the arguments in the proof of The­orem 2.3, one obtains also a proof of the standard Gauvin theorem in the case of a finite index set K.

As mentioned above, the application of the Caratheodory theorem for cones to (1.7)_{t=t⁰} suggests defining, instead of (2.9),

    Ã^Y(x) := {(u, v) ∈ ℝᵖ × ℝᵈ₊ | (x, u, v_[r], Y) satisfies (1.7)_{t=t⁰},  v_j = 0 for r < j ≤ d},

where d := dim cone Γ(x) ≤ n. Note that in this setting of (p + d)-dimensional multiplier vectors, the necessity part of Theorem 2.3 holds true, but the sufficiency part will fail, in general. This can be illustrated by

Example 2.5 Consider the linear program in ℝ²,

    minimize x₁   s.t.   x₂ ≥ 0,   x₁ − x₂ ≥ 0,   −x₁ − x₂ ≥ 0.

The unique optimal solution (and also stationary solution) is x⁰ = (0, 0). Starting from the left, we denote the indices of the constraints by j = 1, 2, 3. MFCQ does not hold at x⁰. Obviously E(x⁰) = {1, 2, 3}. The set LM(x⁰) of usual Lagrange multipliers is given by all v ∈ ℝ³ such that

    (1, 0) = v₁·(0, 1) + v₂·(1, −1) + v₃·(−1, −1),   v₁, v₂, v₃ ≥ 0,

hence LM(x⁰) = {(1, 1, 0) + τ·(2, 1, 1), τ ≥ 0}. By definition, the multiplier set A(x⁰) is a subset of ℝ³₊, i.e., we have

    A(x⁰) ⊇ LM(x⁰) = {(1, 1, 0) + τ·(2, 1, 1), τ ≥ 0}.

Hence, A(x⁰) is an unbounded set. This corresponds both to the classical Gauvin theorem and to Theorem 2.3 above. If we restricted ourselves to multipliers of dimension d := dim cone Γ(x⁰), where Γ(x⁰) is defined according to Section 1, we would have Γ(x⁰) = {(0, 1), (1, −1), (−1, −1)}, i.e., d = 2, and hence Ã^{[1,2]}(x⁰) = {(1, 1)} and Ã^{[1,3]}(x⁰) = Ã^{[2,3]}(x⁰) = ∅. This means that the set of "multipliers of minimal representation" associated with the stationary solution x⁰ is bounded though MFCQ is violated.
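The computations in Example 2.5 are easy to verify numerically; the following added check (illustrative only) confirms that every point of the ray (1, 1, 0) + τ(2, 1, 1) satisfies the multiplier equation, so LM(x⁰) is indeed unbounded, while the two-generator representation gives the bounded multiplier (1, 1):

    import numpy as np

    grad_f = np.array([1.0, 0.0])
    grad_g = np.array([[0.0, 1.0],      # constraint 1
                       [1.0, -1.0],     # constraint 2
                       [-1.0, -1.0]])   # constraint 3

    for tau in [0.0, 1.0, 10.0, 1e6]:
        v = np.array([1.0, 1.0, 0.0]) + tau * np.array([2.0, 1.0, 1.0])
        assert np.all(v >= 0)
        assert np.allclose(grad_g.T @ v, grad_f)    # (1,0) = sum_j v_j * grad_g_j
    print("ray of multipliers verified -> LM(x0) unbounded")

    # restricting to the two generators of the "minimal representation":
    v12 = np.linalg.solve(grad_g[[0, 1]].T, grad_f)
    print(v12)                                      # [1. 1.], nonnegative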

Remark 2.6 In the cone constraint formulation of the semi-infinite program (P),

(P) : minimize f(x) s.t. a(x) E C,


where a(x) := (h(x), g(x, ·)) and C := {0_p} × C₊(K, ℝ) according to Subsection 2.1, the following version of Gauvin's theorem was given by Zowe and Kurcyusz [60, Theorem 4.1] (see also Penot [42, Corollary 3.7]): If the set of Lagrange multipliers

    LM(x⁰) := {λ ∈ Z* | λ ∈ C°, ⟨λ, a(x⁰)⟩ = 0, ∇f(x⁰) + λ ∘ Da(x⁰) = 0}

is nonempty, then the boundedness of LM(x⁰) is equivalent to (2.4). Here Z* denotes the dual space of Z, and C° is the polar cone of C, i.e., C° := {λ ∈ Z* | ⟨λ, c⟩ ≤ 0 ∀ c ∈ C}. Hence, by Proposition 2.2 and the equivalence of (2.4) and RCQ, the boundedness of LM(x⁰) is equivalent to RCQ, EMFCQ and EMFCQ*. Note that one direction of the equivalence, namely the boundedness of LM(x⁰) under RCQ, also follows from [40,45]. For a relation between the Lagrange multiplier sets LM(x⁰) and A(x⁰), we refer to [34,55].

2.3 Semicontinuity results

Let us return to the parametric problem P(t), t ∈ T. By S(t) we shall denote the set of stationary solutions of P(t). For (t, x) ∈ T × ℝⁿ, the union of all sets A^Y(t, x) with Y in [E(t, x)]^r and r ∈ {0, 1, ..., n + 1} will be denoted by A(t, x), where

    A^Y(t, x) := {(u, v) ∈ ℝᵖ × ℝ^{n+1}₊ | (t, x, u, v_[r], Y) satisfies (1.7),  v_j = 0 for r < j ≤ n + 1}.    (2.14)

This defines a multifunction A from T × ℝⁿ to ℝᵖ × ℝ^{n+1}₊, where, by construction and by Caratheodory's lemma, we have (see the discussion in the unperturbed case)

    A(t, x) ≠ ∅  ⟺  x ∈ S(t),    (2.15)

i.e., for each stationary solution x of the problem P(t), A(t, x) is the associated set of (ℝᵖ × ℝ^{n+1}₊)-multipliers.

The following stability results will extend Robinson's Theorem 2.3 [46] given for finite nonlinear programs in terms of the usual Lagrange multiplier map­ping, and they will extend Theorem 2 in [34] given for parametric semi-infinite programs with fixed constraint index set. First we shall prove the closedness and local boundedness of multipliers.

Theorem 2.7 Suppose that the general assumptions GA are satisfied.

(a) The multifunction A is closed on T x R n •


(b) Let x⁰ be a stationary solution of P(t⁰). If x⁰ satisfies EMFCQ with respect to (1.1) at t = t⁰, then the multifunction A is locally bounded at (t⁰, x⁰).

Proof. Auxiliary construction. In order to unify the proofs of (a) and (b), let us consider any point (t̄, x̄) in T × ℝⁿ and arbitrary sequences {(t^k, x^k)} in T × ℝⁿ, {(u^k, v^k)} in ℝᵖ × ℝ^{n+1} with

    (t^k, x^k) → (t̄, x̄),   (u^k, v^k) ∈ A(t^k, x^k)   (∀k).    (2.16)

Hence, by definition of A(t^k, x^k), one has x^k ∈ M(t^k) for each k, and

    0_n = ∇_x f(t^k, x^k) + Σ_{i=1}^{p} u^k_i ∇_x h_i(t^k, x^k) − Σ_{j=1}^{r(k)} v^k_j ∇_x g(t^k, x^k, y^{jk})    (2.17)

with some r(k) ∈ {0, 1, ..., n + 1} and certain points

    y^{jk} ∈ E(t^k, x^k),   j = 1, ..., r(k).    (2.18)

Assumption (1.2), the compactness of K and the closedness of t ↦ K(t) on T imply that the intersection of multifunctions

    (t, x) ↦ E(t, x) = K(t) ∩ {y ∈ K | g(t, x, y) = 0}

is closed on T × ℝⁿ (cf., e.g., [6, Lemma 2.2.3, Thm. 3.1.2]). Further, since K is closed and lower semicontinuous on T, assumptions (1.2) and (1.5) imply that the function G(t, x) := min_{y∈K(t)} g(t, x, y) is continuous at (t̄, x̄); hence, together with the continuity of h, we obtain

    x̄ ∈ M(t̄).

Extracting subsequences in (2.18) if necessary, we may assume r(k) ≡ r. By the compactness of K, each of the r sequences {y^{jk}} has a subsequence converging to some element ȳ^j which belongs to E(t̄, x̄) due to the closedness of the index multifunction E(·, ·). Hence we may also assume

    y^{jk} → ȳ^j,   ȳ^j ∈ E(t̄, x̄)   (j = 1, ..., r).    (2.19)

(a) Closedness of A. Given (t̄, x̄) ∈ T × ℝⁿ, let {(t^k, x^k)}, {(u^k, v^k)} be as in (2.16), and suppose that (u^k, v^k) converges to some (ū, v̄). Proceeding as in the auxiliary construction, we have x̄ ∈ M(t̄), and we may assume that there are some integer r ∈ {0, 1, ..., n + 1}, sequences {y^{jk}} and


points ȳ^j, j = 1, ..., r, such that (2.17), (2.18) and (2.19) are satisfied. Thus, by applying (1.4) and by passing to the limit in (2.17), one has

    0_n = ∇_x f(t̄, x̄) + Σ_{i=1}^{p} ū_i ∇_x h_i(t̄, x̄) − Σ_{j=1}^{r} v̄_j ∇_x g(t̄, x̄, ȳ^j)

and v̄_j = 0 for r + 1 ≤ j ≤ n + 1. Thus (ū, v̄) ∈ A(t̄, x̄).

(b) Local boundedness of A. First consider the case E(t⁰, x⁰) = ∅. Since E(·, ·) is closed at (t⁰, x⁰) (cf. the argument in the auxiliary construction), E(t, x) is empty for all (t, x) in some neighborhood W of (t⁰, x⁰). Hence for x ∈ S(t) with (t, x) ∈ W, the condition g(t, x, y) > 0 (∀ y ∈ K(t)) will be automatically satisfied, and so the Karush-Kuhn-Tucker system (1.7) reduces to

    h(t, x) = 0_p,   ∇_x f(t, x) + Σ_{i=1}^{p} u_i ∇_x h_i(t, x) = 0_n.

Because of the linear independence of the gradients '\l xhi(tO, xO), i = 1, ... ,p, then the desired result is classical (cf. [43]).

Now suppose E(t⁰, x⁰) ≠ ∅. Recall that EMFCQ holds for the system (1.1) at (t⁰, x⁰). Then we have to show that there are an open neighborhood W of (t⁰, x⁰) and a bounded set Z ⊂ ℝᵖ × ℝ^{n+1}₊ with

    A(t, x) ⊂ Z   ∀ (t, x) ∈ W.    (2.20)

Assume that, on the contrary, there are sequences {(t^k, x^k)}, {(u^k, v^k)} satisfying

    (t^k, x^k) → (t⁰, x⁰),   (u^k, v^k) ∈ A(t^k, x^k)   (∀k)   and   a_k := ‖(u^k, v^k)‖ → +∞.

Extracting a subsequence, one may assume that a_k⁻¹ (u^k, v^k) converges to some (ū, v̄) with ‖(ū, v̄)‖ = 1. Reasoning as in the auxiliary construction, we see that, with no loss of generality, there are some r in {0, 1, ..., n + 1}, sequences {y^{jk}} and points ȳ^j ∈ E(t⁰, x⁰), j = 1, ..., r, such that (2.17), (2.18) and (2.19) hold. Dividing by a_k and passing to the limit in (2.17), we find that

    Σ_{i=1}^{p} ū_i ∇_x h_i(t⁰, x⁰) − Σ_{j=1}^{r} v̄_j ∇_x g(t⁰, x⁰, ȳ^j) = 0_n,
    ū_i ∈ ℝ (i = 1, ..., p),   v̄_j ≥ 0 (j = 1, ..., r),   (ū, v̄) ≠ 0_{p+r},    (2.21)

where again (1.4) was used. By Lemma 2.1, (2.21) contradicts EMFCQ for the system (1.1) at (t⁰, x⁰). Hence, (2.20) is shown. □


The preceding theorem particularly says that the sets of multipliers are closed at all events and compact under EMFCQ. As an immediate consequence, the upper semicontinuity of stationary solutions and multipliers follows.

Theorem 2.8 Suppose that the general assumptions GA are satisfied. Let x⁰ be a stationary solution of P(t⁰). If x⁰ satisfies EMFCQ with respect to (1.1) at t = t⁰, then there exist neighborhoods U of t⁰ and N of x⁰ such that the multifunctions A and t ↦ S(t) ∩ N are upper semicontinuous on U × N and on U, respectively.

Proof. Since Z in (2.20) may be supposed to be compact, Theorem 2.7(a) and (2.20) imply that A is upper semicontinuous on W, cf., e.g., [6, Lemma 2.2.3].

Let U and N be neighborhoods of t⁰ and x⁰, respectively, such that U × N ⊂ W. Without loss of generality, one may suppose that N is compact. Hence, it suffices to show that t ↦ S(t) ∩ N is closed at each t̄ ∈ U. Consider an arbitrary sequence {(t^k, x^k)} in W with t^k → t̄ ∈ U and x^k ∈ S(t^k) ∩ N (∀k). For each k there is then some (u^k, v^k) ∈ A(t^k, x^k) ⊂ Z. Since {x^k} has an accumulation point x̄ ∈ N, by the upper semicontinuity of A, {(u^k, v^k)} has an accumulation point (ū, v̄) ∈ A(t̄, x̄). Hence x̄ ∈ S(t̄), by (2.15), which completes the proof. □

It is a trivial fact from linear programming that under the assumptions of Theorem 2.7(b), S(t) may become empty for t near t⁰. However, if x⁰ is a strict local minimizer of P(t⁰) under EMFCQ, a slightly perturbed problem P(t) has a local minimizer near x⁰; moreover, EMFCQ persists, and so S(t) ≠ ∅ for t close to t⁰. This is an immediate consequence of parametric global optimization theory, cf., e.g., [6, Theorems 4.2.1, 4.2.2].

3 METRIC REGULARITY OF THE FEASIBLE SET MAPPING

Section 3 of our paper is devoted to the concept of metric regularity. The close relation between metric regularity of constraint systems and pseudo-Lipschitz continuity of the associated constraint set mapping (for a fundamental study we refer to Mordukhovich [41]) indicates the importance of metric regUlarity for sensitivity and stability analysis of perturbed optimization problems, see also


Section 4 below. Hence, characterizations of metric regularity are of great inter­est. Robinson [44,45] has clarified the close connections - in fact, equivalences under suitable assumptions - between constraint qualifications (Mangasarian­Fromovitz CQ, Slater CQ, Robinson CQ etc.) and metric regularity in the classical settings of nonlinear programming problems. Among the large num­ber of publications handling metric regularity of multifunctions on different levels of generality we also mention those of [2,5,8,11,21,27,42,51,60].

In the semi-infinite setting, we shall prove that metric regularity is equivalent to EMFCQ, and if no equations appear in (1.1), we shall give a characterization of metric regularity via an explicit growth condition of the function which defines the aggregated inequality constraint.

3.1 Basic concepts

In the first subsection we define metric regularity and apply Cominetti's basic lemma on the equivalence of metric regularity under right-hand side perturba­tions and metric regularity under arbitrary perturbations to our setting.

Consider the following parametric system of equations and inequalities, where additionally to (1.1) right-hand side perturbations are included:

    h_i(t, x) = a_i  (i = 1, ..., p)   and   g(t, x, y) ≥ b(y)  (∀ y ∈ K(t)),    (3.1)

where t ∈ T, a = (a₁, ..., a_p) ∈ ℝᵖ and b := b(·) ∈ C(K, ℝ) are viewed as parameters, and K, T, h, and g are given according to Section 1. For

    w := (t, a, b) ∈ Ω := T × ℝᵖ × C(K, ℝ),

M(w) denotes the solution set of the system (3.1).

Specializing the notions of metric regularity in [27, Def. 3.1], [11, Def. 2.1, parametric version] to our situation, we arrive at the following definition.

Let w⁰ = (t⁰, 0_p, θ) and x⁰ ∈ M(w⁰). We shall call the system (3.1) metrically regular at (w⁰, x⁰) if there exist a neighborhood U of (w⁰, x⁰) and a real number β > 0 such that for all (w, x) ∈ U with w = (t, a, b),

    dist(x, M(w)) ≤ β · max{‖a − h(t, x)‖_∞, max_{y∈K(t)} (b(y) − g(t, x, y))₊},    (3.2)

where α₊ := max{α, 0}. Note that (3.2) includes that M(w) is nonempty for all w in some neighborhood of w⁰. When fixing t = t⁰ in this definition, we shall


call the system (3.1)_{t=t⁰} metrically regular at (0_p, θ, x⁰) with respect to right-hand side (RHS) perturbations (briefly metrically RHS-regular). When fixing t = t⁰ in (3.2) and allowing right-hand side perturbations only of the form (a, c(·)) with c(y) ≡ c, c a positive scalar, we shall say that the system (3.1)_{t=t⁰} is metrically regular at (0_p, θ, x⁰) with respect to uniform right-hand side (URHS) perturbations (briefly metrically URHS-regular). Clearly, one has: metric regularity ⟹ metric RHS-regularity ⟹ metric URHS-regularity.
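A small numerical experiment can make the estimate (3.2) concrete. For the toy semi-infinite constraint g(x, y) = x₁ cos y + x₂ sin y + 1 ≥ 0, y ∈ [0, 2π] (no equations, unperturbed right-hand side), the feasible set is the closed unit ball, so distance and maximal violation are both easy to compute; their ratio stays bounded, in line with metric regularity. The example and the observed constant are assumptions added for illustration, not taken from the text:

    import numpy as np

    # g(x, y) = x1*cos(y) + x2*sin(y) + 1 >= 0 for all y  <=>  1 - ||x|| >= 0
    def violation(x, grid=np.linspace(0.0, 2.0 * np.pi, 2000)):
        vals = x[0] * np.cos(grid) + x[1] * np.sin(grid) + 1.0
        return max(0.0, -vals.min())          # max_y (0 - g(x, y))_+

    def dist_to_M(x):
        return max(0.0, np.linalg.norm(x) - 1.0)

    rng = np.random.default_rng(0)
    ratios = []
    for _ in range(1000):
        x = np.array([1.0, 0.0]) + 0.3 * rng.standard_normal(2)   # near x0 = (1, 0)
        if dist_to_M(x) > 1e-3:               # consider points outside M only
            ratios.append(dist_to_M(x) / violation(x))
    print(max(ratios))    # close to 1: a modulus beta slightly above 1 certifies (3.2) here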

Cominetti (11, Thm. 2.1] has shown that under mild conditions, metric regu­larity in its general form and metric RHS-regularity are equivalent. Now we present this result in a version which fits in our context.

Lemma 3.1 Suppose GA, and let (t⁰, x⁰) be some element of T × ℝⁿ. If the system (3.1)_{t=t⁰} is metrically regular with respect to RHS perturbations at (0_p, θ, x⁰), then the system (3.1) is metrically regular at (t⁰, 0_p, θ, x⁰).

Proof. Define for (t, x) ∈ T × ℝⁿ, G(t, x) := min_{y∈K(t)} g(t, x, y) and G(x) := min_{y∈K(t⁰)} g(t⁰, x, y). In order to apply Theorem 2.1 in Cominetti [11], one has to show that there are neighborhoods U_t of t⁰ and U_x := B(x⁰, ε) of x⁰ such that

a. G(·) and G(·, ·) are continuous on U_x and U_t × U_x, respectively,

b. for each t ∈ U_t, G(t, ·) − G(·) is Lipschitz on U_x with modulus β⁻¹, where β > 0 is the modulus of metric RHS-regularity of (3.1)_{t=t⁰}.

By the general assumptions GA and the continuity of t ↦ K(t), Property a. follows from [6, Chapter 4]. Since g is continuously differentiable with respect to x with continuous derivatives ∇_x g(·,·,·), there exist for any ȳ ∈ K(t⁰) neighborhoods U_t^(ȳ) of t⁰, U_x^(ȳ) of x⁰ and U_y^(ȳ) of ȳ such that

    ‖∇_x g(t, x, y) − ∇_x g(t, z, y)‖ ≤ ½ β⁻¹    (3.3)

holds for all (t, x, y) ∈ U_t^(ȳ) × U_x^(ȳ) × U_y^(ȳ). Hence, by standard arguments, the compactness of K(t⁰) and the continuity of K yield that (3.3) holds for all (t, x, y) with t ∈ U_t, x ∈ B(x⁰, ε) and y ∈ K(t), where U_t is some neighborhood of t⁰ and ε is some positive real number. Hence, for all (t, y) with t ∈ U_t and y ∈ K(t), the functions g(t, ·, y) are Lipschitz on B(x⁰, ε) with uniform modulus ½ β⁻¹. Then, we have for all t ∈ U_t and all x, z ∈ B(x⁰, ε),

    G(t, x) − G(x) − G(t, z) + G(z)
        ≤ min_{y∈K(t)} g(t, x, y) − min_{y∈K(t⁰)} g(t⁰, x, y) − min_{y∈K(t)} g(t, z, y) + min_{y∈K(t⁰)} g(t⁰, z, y)
        ≤ g(t, x, y_{t,z}) − g(t, z, y_{t,z}) − g(t⁰, x, y_{0,x}) + g(t⁰, z, y_{0,x})
        ≤ β⁻¹ ‖x − z‖,

where y_{t,z} solves min_{y∈K(t)} g(t, z, y), while y_{0,x} solves min_{y∈K(t⁰)} g(t⁰, x, y). By changing the roles of x and z, b. is hence shown. This completes the proof of the lemma. □

3.2 Epigraph representation

In this subsection, we shall present an auxiliary result which will be helpful in proving that EMFCQ implies metric regularity under right-hand side perturbations. This is a slightly strengthened version of Lemma 1 in [16]. The idea of our approach relies essentially on the following observation for a non-parametric inequality system: If one supposes that t⁰ is fixed, no equations appear, and for each y ∈ K(t⁰), g(t⁰, x, y) has the form x_n − g̃(x₁, ..., x_{n−1}, y), then the distance between some point and the feasible set can be trivially estimated along the x_n-axis. In the case of the general system (3.1), the estimate (3.2) reduces to a similarly simple form if a parametric epigraph representation exists. As the subsequent lemma will show, EMFCQ provides this after some local coordinate transformation.

Since the following lemma concerns P(t) for fixed t = t⁰, we again use the notation (1.9). Again, let M(a, b) := M(t⁰, a, b).

Lemma 3.2 Assume GA, and assume EMFCQ to hold at some x⁰ ∈ M(0_p, θ) with E(x⁰) ≠ ∅. Then there exist an open neighborhood U_a × U_b × U_x × U_z of (0_p, θ, x⁰, 0_n), an open set O ⊇ E(x⁰) and a local C¹-coordinate transformation ψ : U_x → U_z with ψ(x⁰) = 0_n such that for all (a, b, x, z) ∈ U_a × U_b × U_x × U_z, z = ψ(x), the following systems are equivalent:

    { h(x) = a,  g(x, y) ≥ b(y)  ∀ y ∈ K }   ⟺   { z_[p] = a,  z_n ≥ φ(b, z_[n−1], y)  ∀ y ∈ K* }.

Here, K* is a compact set with E(x⁰) ⊆ K* ⊆ K ∩ O, and φ : U_b × (U_z)_[n−1] × O → ℝ is continuous on its domain, differentiable with respect to (b, z_[n−1]) for every y ∈ O, and the derivative (b, z_[n−1], y) ↦ D_{(b, z_[n−1])} φ(b, z_[n−1], y) is continuous on U_b × U_z × O. Further, a positive real number α exists such that

    φ(b, z_[n−1], y) − z_n ≤ α · (b(y) − g(x, y))₊

holds for all (b, x, z, y) ∈ U_b × U_x × U_z × K* which satisfy z = ψ(x).

Proof. Since the proof is very similar to the arguments in proving Lemma 2.1 in Henrion [15], we only sketch the proof of the lemma. Using the idea from the proof of Lemma 2.1 in [15], where a nonparametric system was considered under EMFCQ, one can construct a local diffeomorphism ψ between some open neighborhoods V_x of x⁰ and V_z of 0_n with the following essential features:

a) z_[p] = h(x) (where z = ψ(x)), and

b) the n-th column of ∇ψ⁻¹(0_n) equals some EMFCQ vector ξ ∈ ℝⁿ, i.e., a vector satisfying (1.8) in the definition of EMFCQ. Here "∇" refers to the Jacobian. Then, equivalence of the equation parts formulated above is evident. We omit the details.

Concerning the remainder, let y' ∈ E(x⁰) be an arbitrary active index at x⁰. Define G₁(b, z, y) := g(ψ⁻¹(z), y) − b(y), leading to

    G₁(θ, 0_n, y') = 0   and   (∂G₁/∂z_n)(θ, 0_n, y') = ∇_x g(x⁰, y') · ξ > 0.    (3.4)

By the general assumptions GA and the construction of ψ, there is some neighborhood N⁰ of (θ, 0_n, y') such that G₁ is continuous on N⁰ and differentiable with respect to (b, z), where the derivative D_{(b,z)} G₁(·,·,·) is continuous on N⁰. Hence, there exist neighborhoods N_b of θ, N_z of 0_n and N_y of y' and exactly one map φ : N := N_b × (N_z)_[n−1] × N_y → (N_z)_n such that φ(θ, 0_[n−1], y') = 0 and G₁(b, z_[n−1], φ(b, z_[n−1], y), y) = 0 on N, and this map φ is continuous on N (cf., e.g., Deimling [12, Thm. 15.1]). Without loss of generality, assume the neighborhoods to be so small that each (b, z, y) in N_b × N_z × N_y fulfils

    (∂G₁/∂z_n)(b, z, y) = ∇_x g(x, y) · η > 0,

where x = ψ⁻¹(z) and η is the n-th column of ∇ψ⁻¹(z). The latter is possible by continuity of ∇g and (3.4). Thus, for each y ∈ N_y, the standard implicit function theorem for C¹ functions (see [12, Corollary 15.1]) may be applied to G₁(·,·,y), and it yields that φ(·,·,y) is continuously differentiable on a neighborhood of (θ, 0_{n−1}), which is (possibly strictly) contained in N_b × (N_z)_[n−1]. Arguing as in the proofs of Corollary 15.1 (Implicit Function Theorem) and Theorem 15.2 (Inverse Function Theorem) in [12], one may easily show that


the continuity of D_{(b,z)} G₁(·,·,·) near (θ, 0_n, y') carries over to the continuity of D_{(b, z_[n−1])} φ(·,·,·) with respect to some neighborhood of (θ, 0_{n−1}, y').

For all (b, z, y) sufficiently near to (θ, 0_n, y'), we thus have

    G₁(b, z, y) > 0 (= 0)   ⟺   z_n − φ(b, z_[n−1], y) > 0 (= 0).    (3.5)

Next, we define the function

    G₂(b, z, y) := G₁(b, z, y) + γ · (φ(b, z_[n−1], y) − z_n),

where

    γ := ½ · min_{y∈E(x⁰)} (∂G₁/∂z_n)(θ, 0_n, y) > 0,

hence

    (∂G₂/∂z_n)(θ, 0_n, y) = (∂G₁/∂z_n)(θ, 0_n, y) − γ ≥ γ > 0   (y ∈ E(x⁰)).

Again, the implicit function theorems apply, and, due to the uniqueness of the resolving function and to G₂(b, z_[n−1], φ(b, z_[n−1], y), y) = 0, one gets locally:

    G₂(b, z, y) > 0 (= 0)   ⟺   z_n − φ(b, z_[n−1], y) > 0 (= 0).    (3.6)

Similar equivalences like (3.5) and (3.6) may be established around all active indices y' ∈ E(x⁰). Due to compactness of E(x⁰) there exist a neighborhood U_b of θ, a neighborhood U_z of 0_n, an open set U_y ⊃ E(x⁰) as well as a real-valued function φ defined on U_b × (U_z)_[n−1] × U_y with the smoothness properties as above such that

    G₁(b, z, y) ≥ 0 (= 0)   ⟺   z_n − φ(b, z_[n−1], y) ≥ 0 (= 0)
                            ⟺   G₁(b, z, y) + γ · (φ(b, z_[n−1], y) − z_n) ≥ 0 (= 0)    (3.7)

holds for all variables in the indicated neighborhoods. Since one may find an arbitrarily small open set O, E(x⁰) ⊂ O ⊂ cl O ⊂ U_y, such that all indices y ∈ K \ O remain inactive for all x ∈ U_x (with U_x sufficiently small), the inequality part in the statement of the lemma is proved by the first equivalence in (3.7); one may take K* := K ∩ cl(O).

Concerning the last statement of the lemma, let (b, x, z, y) ∈ U_b × U_x × U_z × K* be arbitrary with z = ψ(x). If φ(b, z_[n−1], y) − z_n ≤ 0, then the statement


holds trivially. If φ(b, z_[n−1], y) − z_n > 0, then the second equivalence in (3.7) gives:

    0 < φ(b, z_[n−1], y) − z_n < (1/γ) · (−G₁(b, z, y)) = (1/γ) · (b(y) − g(x, y)) = (1/γ) · (b(y) − g(x, y))₊.

This completes the proof with α = 1/γ. □

3.3 Metric regularity and constraint qualifications


Now we are going to prove the main results of Section 3. Again t⁰ is fixed; we use the notation (1.9) and M(a, b) := M(t⁰, a, b).

Theorem 3.3 Consider the system (3.1) and suppose the general assumptions GA. Let w⁰ := (t⁰, 0_p, θ). If EMFCQ holds at x⁰ ∈ M(w⁰), then the system (3.1) is metrically regular at (w⁰, x⁰).

Proof. By Lemma 3.1 it suffices to show that (3.1) is metrically RHS-regular at (0_p, θ, x⁰). In the case E(x⁰) = ∅, Assumption (1.2) yields that in some neighborhood of (0_p, θ, x⁰), the system (1.1)_{t=t⁰} is exclusively described by the equations. But then one deals with classical MFCQ in finite optimization, and the stated proposition follows from [44]. For E(x⁰) ≠ ∅, Lemma 3.2 may be applied. We consider the system

    z_[p] = a,   z_n ≥ φ(b, z_[n−1], y)   (∀ y ∈ K*)    (3.8)

in the setting of Lemma 3.2. In order to verify metric regularity, let (a, b, x̂, ẑ) in U_a × U_b × U_x × U_z with ẑ = ψ(x̂) be arbitrarily chosen. Define M*(a, b) := {z ∈ U_z | z satisfies (3.8)}. Clearly M*(a, b) = ψ(U_x ∩ M(a, b)), by Lemma 3.2. Define a reference point z* ∈ M*(a, b) by

    z*_[n−1] := (a, ẑ_{p+1}, ..., ẑ_{n−1})   and   z*_n := max{ẑ_n, φ*(b, (z*)_[n−1])},

where φ*(b, z_[n−1]) := max_{y∈K*} φ(b, z_[n−1], y). Feasibility of z* follows from Lemma 3.2. Then

    dist(ẑ, M*(a, b)) ≤ c · ‖ẑ − z*‖_∞
                      = c · max{‖ẑ_[p] − a‖_∞, |ẑ_n − z*_n|}
                      = c · ‖ẑ_[p] − a‖_∞                                         if ẑ_n = z*_n,
                      = c · max{‖ẑ_[p] − a‖_∞, φ*(b, (z*)_[n−1]) − ẑ_n}            if ẑ_n < z*_n,    (3.9)

where c is some positive factor of norm equivalence. Recall that ψ⁻¹ is continuously differentiable on U_z and may hence be assumed, without loss of generality, to be Lipschitzian there with some modulus ℓ > 0. From (3.9) one concludes (with x̂ = ψ⁻¹(ẑ)):

    dist(x̂, M(a, b)) ≤ dist(x̂, U_x ∩ M(a, b)) = inf_{z∈M*(a,b)} ‖ψ⁻¹(ẑ) − ψ⁻¹(z)‖
                     ≤ ℓ · dist(ẑ, M*(a, b))
                     ≤ ℓ · c · ‖ẑ_[p] − a‖_∞                                       if ẑ_n = z*_n,
                     ≤ ℓ · c · max{‖ẑ_[p] − a‖_∞, φ*(b, (z*)_[n−1]) − ẑ_n}          if ẑ_n < z*_n.

Note that the functions φ(·,·,y), y ∈ K*, are differentiable on some neighborhood of (θ, ẑ_[n−1]) with continuous derivatives D_{(b,z)} φ(·,·,·), and so the Lipschitz continuity arguments from the proof of Lemma 3.1 apply to the maximum function φ*. Thus, without loss of generality, φ* may be assumed to be Lipschitzian on U_b × (U_z)_[n−1] with some modulus δ > 0. Taking account of the second assertion in Lemma 3.2, we thus obtain in the case ẑ_n < z*_n that for some y* ∈ argmax_{y∈K*} φ(b, ẑ_[n−1], y),

    φ*(b, (z*)_[n−1]) − ẑ_n
        ≤ φ*(b, (z*)_[n−1]) − φ*(b, ẑ_[n−1]) + φ*(b, ẑ_[n−1]) − ẑ_n
        ≤ δ · ‖a − ẑ_[p]‖_∞ + φ(b, ẑ_[n−1], y*) − ẑ_n
        ≤ δ · ‖a − ẑ_[p]‖_∞ + α · (b(y*) − g(x̂, y*))₊
        ≤ δ · ‖a − ẑ_[p]‖_∞ + α · max_{y∈K} (b(y) − g(x̂, y))₊.

Combining this with (3.9), we have

    dist(x̂, M(a, b)) ≤ c · ℓ · max{‖a − ẑ_[p]‖_∞, δ · ‖a − ẑ_[p]‖_∞ + α · max_{y∈K} (b(y) − g(x̂, y))₊}.

By taking ẑ_[p] = h(x̂) into account, this yields the required condition of metric regularity with modulus β := c · ℓ · max{1, α + δ} > 0. □


Theorem 3.4 If the system (3.1) is metrically regular at x⁰ ∈ M(w⁰) with respect to uniform right-hand side perturbations, then EMFCQ holds at x⁰.

Proof. For e ∈ ℝ let ē be the constant function defined by ē(y) = e (∀ y ∈ K). By hypothesis, there exist neighborhoods U of 0_p, V of x⁰ and positive real numbers β, τ such that for all (a, x) ∈ U × V and all e ∈ [0, τ],

    dist(x, M(a, −ē)) ≤ β · max{‖a − h(x)‖_∞, max_{y∈K} (−e − g(x, y))₊}.

The compactness of K and the continuity of g entail that for some neighborhood V₁ ⊆ V of x⁰, g(x, y) ≥ −τ (∀ (a, x) ∈ U × V₁). Hence, we have for the solution set mapping a ↦ M(a) of the system h(x) = a that if (a, x) ∈ U × V₁ then

    dist(x, M(a)) ≤ dist(x, M(a, −τ̄)) ≤ β · max{‖a − h(x)‖_∞, max_{y∈K} (−τ − g(x, y))₊} = β · ‖a − h(x)‖_∞,

which implies by classical regularity theory (cf., e.g., Robinson [44, Cor. 3]) that the linear independence requirement of EMFCQ is satisfied. Now we shall prove the inequality condition in EMFCQ. If E(x⁰) = ∅ then locally one deals with equality constraints only, but then the assertion follows from [44] again. Therefore let E(x⁰) ≠ ∅. Then min_{y∈K} g(x⁰, y) = 0. Consider a sequence e_k ↓ 0 with e_k > 0. Since (a, b) = (0_p, ē_k) are feasible right-hand side perturbations of (3.1), metric regularity implies that to each e_k there belongs a point x^k ∈ M(0_p, ē_k) fulfilling

    ‖x^k − x⁰‖ = dist(x⁰, M(0_p, ē_k)) ≤ β · max_{y∈K} (e_k − g(x⁰, y))₊ = β · e_k.    (3.11)

For all y ∈ E(x⁰) one has g(x⁰, y) = 0 and, due to feasibility of x^k, one has g(x^k, y) ≥ e_k. Taking account of (3.11), this leads to (∀ y ∈ E(x⁰)):

    1/β ≤ e_k / ‖x^k − x⁰‖ ≤ (g(x^k, y) − g(x⁰, y)) / ‖x^k − x⁰‖ = (∇_x g(x⁰, y)(x^k − x⁰) + o(x^k − x⁰)) / ‖x^k − x⁰‖.    (3.12)

By (3.11) it holds lim_{k→∞} ‖x^k − x⁰‖ = 0. Without loss of generality we may assume lim_{k→∞} ‖x^k − x⁰‖⁻¹ (x^k − x⁰) = ξ ∈ ℝⁿ. Transition to the limits on both sides of (3.12) provides ∇_x g(x⁰, y) · ξ ≥ 1/β > 0 (∀ y ∈ E(x⁰)). On the other hand, since x⁰ ∈ M(0_p, θ) and x^k ∈ M(0_p, ē_k), it follows that h(x⁰) = h(x^k) = 0_p, hence ∇h_i(x⁰)(x^k − x⁰) = o(x^k − x⁰). Dividing by ‖x^k − x⁰‖ and passing to the limits yields ∇h_i(x⁰) · ξ = 0 (i = 1, ..., p). □


Remark 3.5 By Theorem 3.3 and Theorem 3.4 we may conclude that for parametric systems of the type (3.1) and under the general assumptions GA, the concepts of metric regularity, metric RHS-regularity and metric URHS-regularity are mutually equivalent. Moreover, metric regularity is equivalent to the constraint qualification EMFCQ. This fact could alternatively be proved by applying results to the cone constraint form of (3.1): Taking the equivalence of EMFCQ and Robinson's CQ (Shapiro [55]) in the semi-infinite setting into account, we obtain from Robinson [43, Thm. 1] that EMFCQ (or, equivalently, RCQ) implies metric regularity. The converse is also true; this follows from Borwein [8, Thms. 6.1, 6.4] or Cominetti [11, Corollary 2.1]. Further, note that in the case of writing (3.1) in the parametric form of (2.4), the implication EMFCQ* ⟹ metric RHS-regularity is a special case of Auslender [5, Thm. 1.1] (combined with [4, Thm. 2.1]). This result can also be found in Levitin [39].

3.4 Explicit growth of constraint function

Usually, characterizations of metric regularity for constraint mappings are formulated as constraint qualifications in terms of subgradients or gradients of the functions involved (compare conditions EMFCQ, EMFCQ* and RCQ of the previous sections). Sometimes, however, explicit conditions on functional values may be easier to check, as was found, for instance, in the context of verifying metric regularity of chance constraints (see Henrion and Römisch [17]). In the following, we shall establish a growth condition for a continuous constraint function $F:\mathbb{R}^n\to\mathbb{R}$ describing a single inequality

$$F(x)\ge 0 \qquad (3.13)$$

and then apply it to the setting of semi-infinite programming. We make use of Mordukhovich's [41] complete characterization of metric regularity for multifunctions between finite dimensional spaces. First note that, in consistency with the previous definition, the constraint $F(x)\ge 0$ is said to be metrically regular with respect to right-hand side perturbations at some feasible $x^0$ if there exist some $\beta > 0$ and a neighborhood $U$ of $(0,x^0)$ such that

$$\operatorname{dist}(x, M(c)) \le \beta\cdot\big(c - F(x)\big)_+ \qquad \forall (c,x)\in U,$$

where $M(c) = \{x\in\mathbb{R}^n : F(x)\ge c\}$. Denote by $\partial_a$ the approximate subdifferential, which for a general lower semicontinuous function $\phi:\mathbb{R}^n\to\mathbb{R}$ is defined as

$$\partial_a\phi(z) = \limsup_{\substack{x\to z\\ \phi(x)\to\phi(z)}} \partial^-\phi(x),$$


where 'limsup' denotes the limit superior of multifunctions in the Kuratowski-Painlevé sense and

$$\partial^-\phi(x) = \big\{x^*\in\mathbb{R}^n : \langle x^*, h\rangle \le d^-\phi(x;h)\ \ \forall h\in\mathbb{R}^n\big\}$$

refers to the Dini subdifferential with lower Dini directional derivative

$$d^-\phi(x;h) = \liminf_{\substack{u\to h\\ t\downarrow 0}} t^{-1}\big(\phi(x+tu) - \phi(x)\big).$$

More explicitly, one has

$$x^*\in\partial_a\phi(z) \iff \text{there exist sequences } x^n\to z,\ (x^n)^*\to x^* \text{ such that } \phi(x^n)\to\phi(z) \text{ and } (x^n)^*\in\partial^-\phi(x^n)\ (\forall n).$$

For continuous $\phi$, the condition $\phi(x^n)\to\phi(z)$ may of course be omitted.

Specializing one of Mordukhovich's [41] characterizations of metric regularity to a single inequality $F(x)\ge 0$, we have

Lemma 3.6 Suppose that $F:\mathbb{R}^n\to\mathbb{R}$ is continuous, and let $x^0$ satisfy $F(x^0)\ge 0$. Then the following two statements are equivalent:

1. $F(x)\ge 0$ is metrically regular at $x^0$ with respect to right-hand side perturbations.

2. $F(x^0) > 0$ or $0\notin\partial_a(-F)(x^0)$.
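To make the criterion in statement 2 concrete, here is a small worked example (added for illustration; it is not part of the original text). Take $F(x) = |x|$ on $\mathbb{R}$ and $x^0 = 0$ (binding case). For $x > 0$ one has $\partial^-(-F)(x) = \{-1\}$, for $x < 0$ one has $\partial^-(-F)(x) = \{1\}$, and $\partial^-(-F)(0) = \emptyset$ since $d^-(-F)(0;h) = -|h|$. Hence

$$\partial_a(-F)(0) = \{-1,\,1\}\not\ni 0,$$

so Lemma 3.6 predicts metric regularity at $x^0 = 0$; indeed, $\operatorname{dist}(x, M(c)) = (c - |x|)_+$ for all $c$, so the defining inequality holds with $\beta = 1$.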

This allows us to prove the main result of the present subsection.

Theorem 3.7 Let $F:\mathbb{R}^n\to\mathbb{R}$ be continuous. The constraint $F(x)\ge 0$ is metrically regular at some feasible $x^0$ with respect to right-hand side perturbations if and only if $F(x^0) > 0$ (non-binding case) or the following growth condition is fulfilled: There exist some $p > 0$ and a neighborhood $U$ of $x^0$ such that arbitrarily close to any $x\in U$ one may find some $y$ with $F(y) > F(x) + p\,\|y - x\|$.

Proof. From continuity arguments it is obvious that the non-binding case entails metric regularity at $x^0$ (both sides become zero in the inequality of statement 1 of Lemma 3.6), so let the growth condition be fulfilled at $x^0$. Consider an arbitrary $x\in U$. Now, the growth condition provides the existence of a


sequence $y^n\to x$ such that $F(y^n) > F(x) + p\,\|y^n - x\|$. In particular, $y^n\ne x$, and, without loss of generality, one may assume that $h^n := \|y^n - x\|^{-1}(y^n - x)\to h$. It follows that

$$\begin{aligned}
d^-(-F)(x;h) &= \liminf_{\substack{u\to h\\ t\downarrow 0}} t^{-1}\big(F(x) - F(x+tu)\big)\\
&\le \liminf_{n\to\infty} \|y^n - x\|^{-1}\big(F(x) - F(x + \|y^n - x\|\,h^n)\big)\\
&= \liminf_{n\to\infty} \|y^n - x\|^{-1}\big(F(x) - F(y^n)\big) \le -p.
\end{aligned}$$

For $x^*\in\partial^-(-F)(x)$ one gets $\langle x^*, h\rangle\le -p$, hence $\|x^*\|\ge p$ due to $\|h\| = 1$. Since this is true for all $x\in U$, the explicit description of the approximate subdifferential given above yields $\|x^*\|\ge p > 0$ for all $x^*\in\partial_a(-F)(x^0)$. In particular, $0\notin\partial_a(-F)(x^0)$, so metric regularity follows from the equivalence of statements 1. and 2. of Lemma 3.6.

In order to show the reverse direction of the theorem, assume metric regularity and the binding case. According to the notation introduced above, there exist $\beta,\eta_1,\eta_2 > 0$ such that (with $B^\circ$ denoting open balls):

$$\operatorname{dist}(x, M(c)) \le \beta\cdot\big(c - F(x)\big)_+ \qquad \forall\, c\in(-\eta_2,\eta_2),\ \forall\, x\in B^\circ(x^0,\eta_1).$$

By continuity of $F$, one may choose $\eta_1$ small enough to meet

$$|F(x)| < \eta_2 \qquad \forall x\in B^\circ(x^0,\eta_1).$$

Now, consider any $x\in B^\circ(x^0,\eta_1)$ and any $\delta > 0$. Select some $\alpha > 0$ such that $\alpha\le\beta^{-1}\delta$ and $|F(x) + \alpha| < \eta_2$. With $c := F(x) + \alpha$, the distance estimate above applies to $x$ and $c$:

$$\operatorname{dist}(x, M(c)) \le \beta\cdot\big(c - F(x)\big)_+ = \beta\alpha.$$

In particular, the set $M(c)$ is nonempty, and it is also closed (by continuity of $F$). Consequently, there exists some $\tilde x\in M(c)$ with

$$\|\tilde x - x\| = \operatorname{dist}(x, M(c)) \le \beta\alpha \le \delta.$$

Furthermore, $\tilde x\in M(c)$ implies that $F(\tilde x) - F(x)\ge\alpha$, hence

$$\|\tilde x - x\| \le \beta\alpha \le \beta\,\big(F(\tilde x) - F(x)\big).$$

Since $x$ and $\delta$ were chosen arbitrarily, this last relation implies the asserted growth condition for $F$ (with $U := B^\circ(x^0,\eta_1)$ and $p := (2\beta)^{-1}$). □


We note that more general results in this direction have been obtained in [17]. In particular, a slightly generalized growth condition, compared to the one considered here, is sufficient to guarantee metric regularity of a constraint system consisting of a finite number of inequalities which are defined by upper semicontinuous (not just locally Lipschitzian) functions. Moreover, the equivalence stated in Theorem 3.7 also holds true when considering metric regularity with respect to some additional unperturbed subset $C\subseteq\mathbb{R}^n$.

As an application of Theorem 3.7 in the context of semi-infinite programming, we have

Corollary 3.8 Under the general assumptions GA, and with the notations of subsection 3.3, the constraint system (3.1) without equalities is metrically regular at some $(t^0, 0_p, 0, x^0)$ with some feasible $x^0$ if and only if $E(x^0) = \emptyset$ (non-binding case) or the following growth condition is fulfilled: There exist some $p > 0$ and a neighborhood $U$ of $x^0$ such that arbitrarily close to any $x\in U$ one may find some $x_1$ with

$$g(x_1,y) > g(x,\tilde y) + p\,\|x - x_1\| \qquad \text{for all } y\in K \text{ and for some } \tilde y\in K.$$

Proof. Using the minimum function $G(x) = \min\{g(x,y)\mid y\in K\}$, we see that $E(x^0) = \emptyset$ is equivalent to $G(x^0) > 0$, and the growth condition of the corollary is equivalent to the growth condition of Theorem 3.7 with $F$ replaced by $G$. Then the theorem provides the assertion of the corollary, taking into account the equivalences (cf. Remark 3.5)

metric regularity of (3.1) ⟺ URHS-regularity of (3.1) ⟺ metric regularity of $G(x)\ge 0$ with respect to right-hand side perturbations. □

4 STABILITY OF LOCAL MINIMIZERS

This section is devoted to the stability behavior of local minimizers of the parametric problem introduced in Section 1, as far as the stability properties follow from metric constraint regularity and known abstract stability results [1,5,6,29,32,47]. The proofs will be only sketched. We denote by Bn the


closed unit ball in $\mathbb{R}^n$, by $B(z,\varepsilon)$ the closed $\varepsilon$-neighborhood of $z$, both in $\mathbb{R}^n$ and in $T$, and by $\operatorname{cl} Q$ the closure of $Q\subseteq\mathbb{R}^n$. Given $t^0\in T$, a strict local minimizer $x^0$ of SIP($t^0$) is said to be of order $\kappa\ge 1$ if there are real numbers $\varrho > 0$ and $\varepsilon > 0$ such that

$$f(x,t^0)\ge f(x^0,t^0) + \varrho\,\|x - x^0\|^\kappa \qquad \forall x\in M(t^0)\cap B(x^0,\varepsilon).$$

Denote by $\psi_Q(t)$ the set of all global minimizers of $f(\cdot,t)$ on $M(t)\cap\operatorname{cl} Q$, $t\in T$, $Q\subseteq\mathbb{R}^n$, and by $\psi_{\mathrm{loc}}(t)$ the set of all local minimizers of $f(\cdot,t)$ with respect to $M(t)$.

Following concepts in [1,5,29,47], we shall say that a local minimizer $x^0$ of SIP($t^0$) is stable (w.r.t. SIP($t$), $t\in T$) if for some positive real numbers $\varepsilon'$, $\delta'$ and for each $\varepsilon\in(0,\varepsilon')$ there is some $\delta\in(0,\delta')$ such that, with $Q := B(x^0,\varepsilon')$,

$$\psi_Q(t^0) = \{x^0\}, \qquad (4.1)$$
$$\emptyset\ne\psi_Q(t)\subseteq\psi_{\mathrm{loc}}(t) \qquad \forall t\in B(t^0,\delta'), \qquad (4.2)$$
$$\psi_Q(t)\subseteq B(x^0,\varepsilon) \qquad \forall t\in B(t^0,\delta). \qquad (4.3)$$

A stable local minimizer is said to be stable with rate $r$, $r\in(0,1]$, if the relation between $\varepsilon$ and $\delta$ in (4.3) is quantified (with some $\beta > 0$) by

$$\|x - x^0\| \le \beta\,d(t,t^0)^r \qquad \forall t\in B(t^0,\delta')\ \ \forall x\in\psi_Q(t). \qquad (4.4)$$

Obviously (4.1)–(4.3) imply that the multivalued selection $\psi_Q$ of the multifunction $\psi_{\mathrm{loc}}$ is (upper and lower) continuous at $t^0$.

For both stability properties, we now recall sufficient conditions which specialize more abstract results to SIP(t). Note that by the general assumptions, M is a closed multifunction.

Proposition 4.1 Suppose that GA holds. Let $t^0\in T$, and let $x^0$ be a strict local minimizer of P($t^0$). If EMFCQ holds at $x^0$, then $x^0$ is stable.

Proof. (Outline) Since EMFCQ at $x^0$ implies metric regularity, it is not difficult to show that the general assumptions GA imply that $M$ is lower semicontinuous at $(t^0,x^0)$. A direct proof that EMFCQ implies lower semicontinuity may be found in [26]. Then the assertion is a specialization of Klatte [29, Thm. 1] or Robinson [47, Thm. 4.3]. □

Next we give the quantified version of the previous proposition. A multifunction $\Psi$ from $T$ to $\mathbb{R}^n$ is called Lipschitzian on $\widetilde T\subseteq T$ if there is some constant $c > 0$ such that $\Psi(t')\subseteq\Psi(t) + c\,d(t',t)\,B_n$ for all $t, t'\in\widetilde T$. Following Aubin [2] and Rockafellar [51], we shall say that $\Psi$ is pseudo-Lipschitzian at $(t^0,x^0)\in\operatorname{graph}\Psi$ with modulus $c > 0$ if there exist neighborhoods $W$ of $t^0$ and $X$ of $x^0$ with $\Psi(t')\cap X\subseteq\Psi(t) + c\,d(t',t)\,B_n$ for all $t, t'\in W$.

Proposition 4.2 Suppose that GA holds. Let $t^0\in T$, and let $x^0$ be a strict local minimizer of SIP($t^0$) of order $\kappa\ge 1$. Suppose that for some neighborhoods $U$ of $t^0$ and $V$ of $x^0$, the functions $h$ and $g$ are Lipschitz continuous on $U\times V$ and $U\times V\times K$, respectively, and the multifunction $K$ is Lipschitzian on $U$. Moreover, let for some $\gamma\in(0,1]$ and $c_f\in(0,+\infty)$

$$|f(x,t^0) - f(x',t)| \le c_f\,\big(\|x - x'\| + d(t,t^0)^\gamma\big) \qquad \forall t\in U\ \ \forall x, x'\in V.$$

If $x^0$ satisfies EMFCQ with respect to (1.1) at $t = t^0$, then $x^0$ is stable with rate $r = \gamma\kappa^{-1}$.

Proof. (Outline) The principal idea already appears in [1,5,30] in the context of standard nonlinear programs. In our context, the assumptions yield that $G(t,x) := \min_{y\in K(t)} g(t,x,y)$, $(t,x)\in T\times V$, is Lipschitz on $U\times V$, see, e.g., [3]. By Theorem 3.3, the Lipschitzian constraint system $(3.1)_{t=t^0}$ is metrically regular, hence $M$ is pseudo-Lipschitzian at $(t^0,x^0)$, see, e.g., [41]. Now the assertion follows from [30, Thm. 2] (see also Theorem 2.2 in [32]). □

For second-order sufficient conditions ensuring that $x^0$ is a strict local minimizer of order 2 of P($t^0$) under twice differentiability of the data with respect to $x$, we refer, e.g., to Hettich and Jongen [18], Shapiro [54], Hettich and Still [19], Kawasaki [28]. Under the reduction ansatz, conditions for strict local minimality of order $\kappa = 1$ (also called strong unicity or weak sharp minimality of order 1) can be found, e.g., in Hettich and Zencke [20] (see also [7]).

Proposition 4.2 provides (for $\gamma = 1$) Hölder stability of order $\kappa^{-1}$ for strict local minimizers of order $\kappa$. Simple examples show that in the case $\kappa = 2$, stability of order 1 (upper Lipschitz continuity) may not be expected [35]. Under certain constraint qualifications and second-order optimality conditions on P($t^0$), Shapiro [55] obtains stability of order 1 for global minimizers of P($t$) for $K(t)\equiv K$.
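The following one-dimensional, unconstrained example (ours, for illustration; EMFCQ is vacuous here) shows the typical Hölder, non-Lipschitz behavior covered by Proposition 4.2:

$$f(x,t) = x^4 - t\,x,\qquad x^0 = 0 \text{ is a strict (global) minimizer of } f(\cdot,0) \text{ of order } \kappa = 4,$$

and $f$ satisfies the Lipschitz-type condition of Proposition 4.2 with $\gamma = 1$ on bounded sets. The unique minimizer of $f(\cdot,t)$ for $t > 0$ is $x(t) = (t/4)^{1/3}$, so $\|x(t) - x^0\|$ behaves like $t^{1/3}$: the minimizer is Hölder stable with exponent $1/3\ge\kappa^{-1} = 1/4$, consistent with the proposition, while upper Lipschitz stability (rate 1) fails.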


5 CONCLUDING REMARKS

In the previous sections we concentrated on certain kinds of regularity and stability studies and shelved many other questions which might also be handled under the title of the paper. Let us mention a few of them.

A series of papers by Jongen, Twilt, Rückmann, Weber and others is devoted to the structural analysis and the topological stability of the feasible set of semi-infinite $C^2$-optimization problems; for recent progress in this topic we refer, e.g., to [23,25,26,52].

Essential refinements of Proposition 4.2 may be obtained when the data are twice differentiable. The use of second-order information allows one to derive (directional) differential stability as well as (upper) Hölder or Lipschitz stability properties of optimal values and optimal solutions of perturbed optimization problems in Banach spaces. These results are applicable to perturbed semi-infinite programs by using the cone constraint setting. An excellent survey has been given recently by Bonnans and Shapiro [7]; among the rich literature on these developments we would like to refer also to [13,22,28,39,40,46,54,56,57] and the references therein.

Directional stability of the optimal value function in linear and/or convex semi-infinite programming is studied, e.g., in Zencke and Hettich [59] and Shapiro [55,56].

Under the so-called reduction ansatz for the constraints, a perturbed semi-infinite program may be regarded (at least locally, i.e., in a neighborhood of a feasible point of interest) as a parametric finite $C^2$- or $C^{1,1}$-program; for this approach the reader may consult [31]. For example, the results on upper Lipschitz continuity of local minimizers apply; besides the above mentioned papers we also refer to [1,5,29-32,35]. Similarly, conditions for strong stability of Karush-Kuhn-Tucker points apply; for recent progress which also includes parametric $C^{1,1}$-programs, we refer to Kummer [36]-[38].

Concerning the subjects treated in the present paper, the question arises how to extend the results to the class of generalized semi-infinite programs, i.e., to programs in which the constraint index set may also depend on the state variable $x$. This question is still under investigation. Optimality conditions for this class of problems may be found in Hettich and Still [19], Jongen, Rückmann and Stein [24] and Rückmann and Shapiro [53].

Page 110: Semi-Infinite Programming

Regularity and Stability 99

Acknowledgement: The authors would like to thank Bernd Kummer (Humboldt University Berlin) for pointing out that Theorem 3.7 holds for an arbitrary continuous function F.

REFERENCES

[1] W. Alt. Lipschitzian perturbations of infinite optimization problems. In A. V. Fiacco, editor, Mathematical Programming with Data Perturbations, pages 7-21. M. Dekker, New York, 1983.

[2] J. P. Aubin. Lipschitz behaviour of solutions to convex minimization problems. Math. Oper. Res., 9:87-111, 1984.
[3] J. P. Aubin and A. Cellina. Differential Inclusions. Springer, Berlin, 1984.
[4] A. Auslender. Differentiable stability in nonconvex and nondifferentiable programming. Math. Programming Study, 10:29-41, 1979.
[5] A. Auslender. Stability in mathematical programming with nondifferentiable data. SIAM J. Control Optim., 22:239-254, 1984.
[6] B. Bank, J. Guddat, D. Klatte, B. Kummer, and K. Tammer. Non-Linear Parametric Optimization. Akademie-Verlag, Berlin, 1982.
[7] F. Bonnans and A. Shapiro. Optimization problems with perturbations, a guided tour. Preprint, Dept. of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, April 1996.
[8] J. M. Borwein. Stability and regular points of inequality systems. J. Optim. Theory Appl., 48:9-52, 1986.
[9] E. W. Cheney. Introduction to Approximation Theory. McGraw-Hill, New York, 1966.
[10] F. Clarke. Optimization and Nonsmooth Analysis. Wiley, New York, 1983.
[11] R. Cominetti. Metric regularity, tangent sets and second-order optimality conditions. Appl. Math. Optim., 21:265-287, 1990.
[12] K. Deimling. Nonlinear Functional Analysis. Springer, Berlin, 1985.
[13] A. Dontchev and R. T. Rockafellar. Characterizations of Lipschitz stability in nonlinear programming. In A. V. Fiacco, editor, Mathematical Programming with Data Perturbations, pages 65-82. Marcel Dekker, New York, 1997.
[14] J. Gauvin. A necessary and sufficient regularity condition to have bounded multipliers in nonconvex programming. Math. Programming, 12:136-138, 1977.
[15] R. Henrion. On constraint qualifications. J. Optim. Theory Appl., 72:187-197, 1992.
[16] R. Henrion and D. Klatte. Metric regularity of the feasible set mapping in semi-infinite optimization. Appl. Math. Optim., 30:103-109, 1994.


[17] R. Henrion and W. Römisch. Metric regularity and quantitative stability in stochastic programs with probabilistic constraints. Preprint Nr. 96-2, Institut für Mathematik, Humboldt-Universität zu Berlin, 1996.
[18] R. Hettich and H. Th. Jongen. Semi-infinite programming: conditions of optimality and applications. In J. Stoer, editor, Optimization Techniques, Part 2, Lecture Notes in Control and Information Sciences, volume 7, pages 1-11. Springer, Berlin, 1978.
[19] R. Hettich and G. Still. Second order optimality conditions for generalized semi-infinite programming problems. Optimization, 34:195-211, 1995.
[20] R. Hettich and P. Zencke. Numerische Methoden der Approximation und semi-infiniten Optimierung. B.G. Teubner, Stuttgart, 1982.
[21] A. Ioffe. Regular points of Lipschitz mappings. Trans. Amer. Math. Soc., 251:61-69, 1979.
[22] A. Ioffe. On sensitivity analysis of nonlinear programs in Banach spaces: the approach via composite unconstrained optimization. SIAM J. Optim., 4:1-43, 1994.
[23] H. Th. Jongen and J.-J. Rückmann. On stability and deformation in semi-infinite optimization. This volume.
[24] H. Th. Jongen, J.-J. Rückmann, and O. Stein. Generalized semi-infinite optimization: a first order optimality condition and examples. Forschungsbericht Nr. 95-24, Mathematik/Informatik, Universität Trier, December 1995.
[25] H. Th. Jongen, J.-J. Rückmann, and G. W. Weber. One-parametric semi-infinite optimization: On the stability of the feasible set. SIAM J. Optim., 4:637-648, 1994.
[26] H. Th. Jongen, F. Twilt, and G.-W. Weber. Semi-infinite optimization: structure and stability of the feasible set. J. Optim. Theory Appl., 72:529-552, 1992.
[27] A. Jourani and L. Thibault. Approximate subdifferential and metric regularity: the finite-dimensional case. Math. Programming, 47:203-218, 1990.
[28] H. Kawasaki. Second-order necessary and sufficient optimality conditions for minimizing a sup-type function. Appl. Math. Optim., 26:195-220, 1992.
[29] D. Klatte. On the stability of local and global optimal solutions in parametric problems of nonlinear programming. Part I: Basic results. In Seminarbericht Nr. 75 der Sektion Mathematik der Humboldt-Universität zu Berlin, pages 1-21, Berlin, 1985.
[30] D. Klatte. A note on quantitative stability results in nonlinear optimization. In Seminarbericht Nr. 90 der Sektion Mathematik der Humboldt-Universität zu Berlin, pages 77-86, Berlin, 1987.
[31] D. Klatte. Stability of stationary solutions in semi-infinite optimization via the reduction approach. In W. Oettli and D. Pallaschke, editors, Advances in Optimization, pages 155-170. Springer, Berlin, 1992.


[32] D. Klatte. On quantitative stability for non-isolated minima. Control and Cybernetics, 23:183-200, 1994.
[33] D. Klatte. Stable local minimizers in semi-infinite optimization: regularity and second-order conditions. J. Comput. Appl. Math., 56:137-157, 1994.
[34] D. Klatte. On regularity and stability in semi-infinite optimization. Set-Valued Analysis, 3:101-111, 1995.
[35] D. Klatte and B. Kummer. Stability properties of infima and optimal solutions of parametric optimization problems. In V. F. Demyanov and D. Pallaschke, editors, Nondifferentiable Optimization: Motivations and Applications, pages 215-229. Springer, Berlin, 1985.
[36] B. Kummer. Lipschitzian inverse functions, directional derivatives and application in C^{1,1} optimization. J. Optim. Theory Appl., 70:559-580, 1991.
[37] B. Kummer. An implicit function theorem for C^{0,1}-equations and parametric C^{1,1}-optimization. J. Math. Analysis Appl., 158:35-46, 1991.
[38] B. Kummer. Lipschitzian and pseudo-Lipschitzian inverse functions and applications to nonlinear programming. In A. V. Fiacco, editor, Mathematical Programming with Data Perturbations, pages 201-222. Marcel Dekker, New York, 1997.
[39] E. S. Levitin. Perturbation Theory in Mathematical Programming and its Applications. Wiley, Chichester-New York, 1994.
[40] H. Maurer and J. Zowe. First and second-order necessary and sufficient optimality conditions for infinite-dimensional programming problems. Math. Programming, 16:98-110, 1979.
[41] B. S. Mordukhovich. Complete characterization of openness, metric regularity, and Lipschitzian properties of multifunctions. Trans. Amer. Math. Soc., 340:1-35, 1993.
[42] J.-P. Penot. On regularity conditions in mathematical programming. Math. Programming Study, 19:167-199, 1982.
[43] S. M. Robinson. Stability theorems for systems of inequalities. Part II: Differentiable nonlinear systems. SIAM J. Numer. Anal., 13:497-513, 1976.
[44] S. M. Robinson. Regularity and stability for convex multivalued functions. Math. Oper. Res., 1:130-143, 1976.
[45] S. M. Robinson. First order conditions for general nonlinear optimization. SIAM J. Appl. Math., 30:597-607, 1976.
[46] S. M. Robinson. Generalized equations and their solutions. Part II: Applications to nonlinear programming. Math. Programming Study, 19:200-221, 1982.
[47] S. M. Robinson. Local epi-continuity and local optimization. Math. Programming, 37:208-223, 1987.
[48] R. T. Rockafellar. Convex Analysis. Princeton Univ. Press, Princeton, 1970.


[49] R. T. Rockafellar. Generalized directional derivatives and subgradients of nonconvex functions. Canad. J. Math., 32:257-280, 1980.
[50] R. T. Rockafellar. The Theory of Subgradients and its Application to Problems of Optimization. Convex and Nonconvex Functions. Heldermann-Verlag, Berlin, 1981.
[51] R. T. Rockafellar. Lipschitzian properties of multifunctions. Nonlin. Analysis: Theory, Meth. Appl., 9:867-885, 1985.
[52] J.-J. Rückmann. Topological stability of feasible sets in semi-infinite optimization: a tutorial. Bericht Nr. 123, Institut für Geometrie und Praktische Mathematik, RWTH Aachen, December 1995.
[53] J.-J. Rückmann and A. Shapiro. On first order optimality conditions in generalized semi-infinite programming. Preprint No. 216, Institut für Angewandte Mathematik, Universität Erlangen-Nürnberg, Erlangen, 1997.
[54] A. Shapiro. Second-order derivatives of extremal-value functions and optimality conditions for semi-infinite programs. Math. Oper. Res., 10:207-219, 1985.
[55] A. Shapiro. On Lipschitzian stability of optimal solutions of parametrized semi-infinite programs. Math. Oper. Res., 19:743-752, 1994.
[56] A. Shapiro. First and second order optimality conditions and perturbation analysis of semi-infinite programming problems. This volume.
[57] A. Shapiro and F. Bonnans. Sensitivity analysis of parametrized programs under cone constraints. SIAM J. Control Optim., 30:1409-1422, 1992.
[58] J. Stoer and C. Witzgall. Convexity and Optimization in Finite Dimensions I. Springer, Berlin, 1970.
[59] P. Zencke and R. Hettich. Directional derivatives for the value function in semi-infinite programming. Math. Programming, 38:323-340, 1987.
[60] J. Zowe and S. Kurcyusz. Regularity and stability for the mathematical programming problem in Banach spaces. Appl. Math. Optim., 5:49-62, 1979.


PART II

NUMERICAL METHODS


4 FIRST AND SECOND ORDER OPTIMALITY CONDITIONS AND PERTURBATION ANALYSIS OF SEMI-INFINITE PROGRAMMING PROBLEMS

Alexander Shapiro

School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA, Email: [email protected]

ABSTRACT

In this paper we discuss finite dimensional optimization problems subject to an infinite number of inequality constraints (semi-infinite programming problems). We study such problems in a general framework of optimization problems subject to constraints formulated in the form of cone inclusions. General results on duality, and first and second order optimality conditions, are presented and specified to the considered semi-infinite programming problems. Finally some recent results on quantitative stability and sensitivity analysis of parameterized semi-infinite programming problems are discussed.

1 INTRODUCTION

Consider the following optimization problem

$$(P)\qquad \min_{x\in\mathbb{R}^n}\ f(x)\ \ \text{subject to}\ \ g_\tau(x)\le 0,\ \ \tau\in T, \qquad (1.1)$$

where $T$ is a compact metric space and $f(\cdot)$, $g(\cdot,\tau) = g_\tau(\cdot)$ are real valued functions. In case the set $T$ is not finite, the feasible set

$$\Phi := \{x\in\mathbb{R}^n : g_\tau(x)\le 0,\ \tau\in T\}$$



of the above optimization problem is defined by an infinite number of inequality constraints, and hence (P) becomes a semi-infinite programming problem.
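Although this chapter is concerned with optimality and perturbation theory, it may help to keep a computational picture in mind: replacing the compact index set $T$ by a finite grid turns (1.1) into an ordinary nonlinear program. The following minimal sketch (our illustration only; the data are made up, and SciPy's `minimize` with the SLSQP method is assumed to be available) solves such a discretization.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data (not from the text): T = [0, 1],
#   f(x)     = -(x1 + x2),
#   g_tau(x) = tau*x1 + x2 - tau**2  <= 0   for all tau in T.
tau_grid = np.linspace(0.0, 1.0, 200)      # finite surrogate for T

def f(x):
    return -(x[0] + x[1])

def g(x, tau):
    return tau * x[0] + x[1] - tau ** 2

# SLSQP expects inequality constraints written as c(x) >= 0, so we pass
# -g_tau(x) >= 0, one constraint per grid point tau.
constraints = [{'type': 'ineq', 'fun': lambda x, t=t: -g(x, t)} for t in tau_grid]

result = minimize(f, x0=np.zeros(2), method='SLSQP', constraints=constraints)
print(result.x, result.fun)                # solution of the discretized problem
```

A grid solution is of course feasible only for the finitely many chosen $\tau$; the constraint qualifications and multiplier results developed below are exactly what controls how such approximations and perturbations behave.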

Suppose that for every $x$ the function $g(x,\cdot): T\to\mathbb{R}$ is continuous. Then the feasible set of (1.1) can be written in the following equivalent form

$$\Phi = \{x\in\mathbb{R}^n : G(x)\in K\}.$$

Here $G:\mathbb{R}^n\to C(T)$ is the mapping defined as $G: x\mapsto g(x,\cdot)$, $C(T)$ is the Banach space of continuous functions $\phi: T\to\mathbb{R}$ equipped with the sup-norm $\|\phi\| := \sup_{\tau\in T}|\phi(\tau)|$, and $K := C_-(T)$, where

$$C_-(T) := \{\phi\in C(T) : \phi(\tau)\le 0,\ \forall\tau\in T\}$$

is the cone of nonpositive valued continuous functions. Consequently problem (1.1) can be written in the form

$$(P)\qquad \min_{x\in\mathbb{R}^n}\ f(x)\ \ \text{subject to}\ \ G(x)\in K. \qquad (1.2)$$
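For orientation (our remark, not part of the original text): when $T$ is a finite set, the cone formulation (1.2) collapses to an ordinary nonlinear program,

$$T = \{\tau_1,\ldots,\tau_m\}\ \Longrightarrow\ C(T)\cong\mathbb{R}^m,\quad K = C_-(T)\cong\mathbb{R}^m_-,\quad G(x) = \big(g_{\tau_1}(x),\ldots,g_{\tau_m}(x)\big),$$

so that $G(x)\in K$ simply means $g_{\tau_i}(x)\le 0$, $i = 1,\ldots,m$.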

We discuss first and second order optimality conditions for optimization problems in the form (1.2) and subsequently specify those conditions to semi-infinite programming problems given in the form (1.1). We also discuss local behavior of the optimal value $v(u)$ and a corresponding optimal solution $x(u)$ of parameterized semi-infinite programs of the form

$$\min_{x\in\mathbb{R}^n}\ f(x,u)\ \ \text{subject to}\ \ g_\tau(x,u)\le 0,\ \ \tau\in T, \qquad (1.3)$$

where $u\in U$ is a parameter vector. The above parametrized semi-infinite program can also be formulated in the abstract form

$$\min_{x\in X}\ f(x,u)\ \ \text{subject to}\ \ G(x,u)\in K, \qquad (1.4)$$

where $X := \mathbb{R}^n$ and $K := C_-(T)$. We describe some recent results on stability and sensitivity analysis of parametrized problems in the form (1.4) and specify them to semi-infinite programming problems. We assume that the parameter space $U$ is a vector space equipped with a scalar product "$\cdot$" and study directional behavior of $v(u)$ and $x(u)$ at a point $u_0\in U$ along a given direction $d\in U$.

We use the following notation and terminology throughout the paper. We assume that $\mathbb{R}^n$ is equipped with the standard scalar product, denoted by "$\cdot$". By $Df(x)$ we denote the differential of $f$ at $x$, i.e. $Df(x)h = h\cdot\nabla f(x)$. Similarly, $D^2 f(x)(h,h) = h\cdot\nabla^2 f(x)h$,

$$Df(x,u)(h,d) = h\cdot\nabla_x f(x,u) + d\cdot\nabla_u f(x,u),$$

etc. For a Banach space $Y$ we denote by $B_Y$ its unit ball $B_Y := \{y\in Y : \|y\|\le 1\}$, and by $\operatorname{dist}(y,S) := \inf_{z\in S}\|y - z\|$ the distance from the point $y\in Y$ to the set $S\subseteq Y$. By $Y^*$ we denote the dual space of $Y$, formed by continuous linear functionals on $Y$ and equipped with the dual norm $\|y^*\| := \sup_{y\in B_Y}\langle y^*, y\rangle$, where $\langle y^*, y\rangle = y^*(y)$, $y^*\in Y^*$, $y\in Y$. For a linear operator $A: X\to Y$, from a Banach space $X$ into $Y$, we denote by $A^*: Y^*\to X^*$ its adjoint operator, that is $\langle A^*y^*, x\rangle = \langle y^*, Ax\rangle$.

For a cone $C\subseteq Y$ its polar cone is

$$C^- := \{y^*\in Y^* : \langle y^*, y\rangle\le 0,\ \forall y\in C\}.$$

Let $K$ be a closed convex set in $Y$. Then $T_K(y)$ denotes the tangent cone and $N_K(y) = [T_K(y)]^-$ is the normal cone to $K$ at $y\in K$. By definition the tangent and normal cones are empty if $y\notin K$. By $\operatorname{int}(K)$ we denote the interior of the set $K$ and by $K^\infty$ the recession cone

$$K^\infty := \{y\in Y : y + K\subseteq K\}.$$

Note that if $K$ is a closed convex cone, then $K^\infty = K$. By $\operatorname{core}(K)$ we denote the set of points $y\in K$ satisfying the following property: for any $w\in Y$ there exists a number $t > 0$ such that $y + tw\in K$.

Note that, since $T$ is compact, the mapping $G(x) := g(x,\cdot)$, from $\mathbb{R}^n$ into $C(T)$, is continuous if $g(x,\tau)$ is continuous on $\mathbb{R}^n\times T$, jointly in $x$ and $\tau$. Similarly, the mapping $G$ is continuously differentiable if $g_\tau(\cdot)$ is differentiable and $\nabla g_\tau(\cdot)$ is continuous on $\mathbb{R}^n\times T$, and similarly for higher order derivatives. Note also that the cone $K := C_-(T)$ has a non empty interior in $C(T)$.

For a function (mapping) $F(x)$ we denote by $F'(x,d)$ its directional derivative

$$F'(x,d) := \lim_{t\to 0^+}\frac{F(x+td) - F(x)}{t}.$$


2 DUALITY AND FIRST ORDER OPTIMALITY CONDITIONS

In this section we discuss some general duality results and first order optimality conditions for semi-infinite programming problems. We start our analysis by considering an optimization problem in the form

$$(P)\qquad \min_{x\in X}\ f(x)\ \ \text{subject to}\ \ G(x)\in K, \qquad (2.1)$$

where $X$ and $Y$ are Banach spaces, $G: X\to Y$ and $K$ is a closed convex subset of $Y$. Consequently we specify general results to the setting of semi-infinite programming.

Problem (2.1) can be embedded into the following parametric family of optimization problems

$$(P_y)\qquad \min_{x\in X}\ f(x)\ \ \text{subject to}\ \ G(x) + y\in K, \qquad (2.2)$$

where $y\in Y$ is viewed as a parameter vector. Denote by $v(y)$ the optimal value of the problem $(P_y)$. Note that, by definition, $v(y) = +\infty$ if the feasible set of $(P_y)$ is empty, and that $v(y)$ can be $-\infty$. That is, $v(y)$ is an extended real valued function of the parameter vector $y$. Clearly the optimal value $\operatorname{val}(P)$ of the problem $(P)$ is equal to $v(0)$. Consider the conjugate

$$v^*(y^*) := \sup_{y\in Y}\{\langle y^*, y\rangle - v(y)\}$$

of $v$ and $v^{**} := (v^*)^*$. It turns out that the value $v^{**}(0)$ coincides with the optimal value of the following optimization problem

$$(D)\qquad \max_{\lambda\in Y^*}\ \Big\{\psi(\lambda) := -\sigma(\lambda, K) + \inf_{x\in X} L(x,\lambda)\Big\}, \qquad (2.3)$$

where $\sigma(\lambda, K) := \sup_{y\in K}\langle\lambda, y\rangle$ is the support function of the set $K$ and

$$L(x,\lambda) := f(x) + \langle\lambda, G(x)\rangle$$

is the Lagrangian of $(P)$. It follows that the optimal value of $(D)$ is always less than or equal to the optimal value of $(P)$, i.e. $\operatorname{val}(D)\le\operatorname{val}(P)$. We refer to the above optimization problem $(D)$ as the parametric dual of $(P)$. The following basic duality results are developed in Laurent [36] and (in a somewhat more general framework) in Rockafellar [48] (see also [16]).


Let us observe at this point that if $K$ is a closed convex cone, then $\sigma(\lambda, K) = 0$ if $\lambda\in K^-$, and $\sigma(\lambda, K) = +\infty$ otherwise. Therefore in that case the dual problem takes the form

$$(D)\qquad \max_{\lambda\in K^-}\ \Big\{\psi(\lambda) := \inf_{x\in X} L(x,\lambda)\Big\}. \qquad (2.4)$$

Furthermore, if $(P)$ is linear of the form

$$(P)\qquad \min_{x\in X}\ \langle a, x\rangle\ \ \text{subject to}\ \ Ax + b\in K, \qquad (2.5)$$

where $a\in X^*$, $b\in Y$ and $A: X\to Y$ is a continuous linear operator, then the dual problem takes the form

$$(D)\qquad \max_{y^*\in K^-}\ \langle y^*, b\rangle\ \ \text{subject to}\ \ A^*y^* + a = 0. \qquad (2.6)$$

We say that the primal problem $(P)$ is convex if the function $f$ is convex and the mapping $G$ is convex with respect to the cone $C := -K^\infty$. Recall that $G$ is said to be convex with respect to a cone $C$ if for any $x_1, x_2\in X$ and any $t\in[0,1]$,

$$t\,G(x_1) + (1-t)\,G(x_2)\ \succeq_C\ G\big(tx_1 + (1-t)x_2\big), \qquad (2.7)$$

where $a\succeq_C b$ means that $a - b\in C$. It is possible to show that if $(P)$ is convex, then the corresponding optimal value function $v(y)$ is also convex. In particular, $(P)$ is convex if $f$ is linear and $G$ is affine, i.e. $(P)$ is linear of the form (2.5).

Suppose now that $(P)$ is convex and hence the optimal value function $v(y)$ is convex. Then, by the Fenchel-Moreau theorem, $v^{**} = \operatorname{cl} v$, where $\operatorname{cl} v$ denotes the closure of the convex function $v$, [48]. That is,

$$\operatorname{cl} v := \begin{cases} \operatorname{lsc} v, & \text{if } \operatorname{lsc} v(y) > -\infty \text{ for all } y\in Y,\\ -\infty, & \text{if } \operatorname{lsc} v(y) = -\infty \text{ for at least one } y\in Y,\end{cases}$$

where

$$\operatorname{lsc} v(y) := \min\Big\{v(y),\ \liminf_{y'\to y} v(y')\Big\}$$

denotes the lower semicontinuous hull of $v$. Note that if $\operatorname{lsc} v$ has a finite value at some point of $Y$, then $\operatorname{lsc} v(y) > -\infty$ for all $y\in Y$. It follows from the above discussion that if $(P)$ is convex, then $\operatorname{val}(D) = \operatorname{cl} v(0)$.

The quantity $\operatorname{lsc} v(0)$ is called the subvalue of the problem $(P)$. It is said that $(P)$ is subconsistent if its subvalue is less than $+\infty$ (we follow here the


terminology used in [1]). Clearly $(P)$ is subconsistent iff either $(P)$ is feasible, i.e. $v(0) < +\infty$, or there exists a sequence $y_n\to 0$ such that the corresponding optimal values $v(y_n)$ are bounded from above. Summarizing the above discussion we obtain the following result.

Theorem 2.1 Suppose that the problem $(P)$ is convex and subconsistent. Then

$$\operatorname{val}(D) = \min\Big\{\operatorname{val}(P),\ \liminf_{y\to 0} v(y)\Big\}. \qquad (2.8)$$

In particular, $\operatorname{val}(P) = \operatorname{val}(D)$ if and only if $v(y)$ is lower semicontinuous at $y = 0$, i.e.

$$v(0)\le\liminf_{y\to 0} v(y). \qquad (2.9)$$

The "no duality gap" condition (2.9) is a topological type condition and may not be easy to verify in particular situations. From convex analysis we know that if $v(y)$ is subdifferentiable at $y = 0$, i.e. its subdifferential $\partial v(0)$ is non empty, then $v^{**}(0) = v(0)$, [47,48]. In that case $\operatorname{val}(P) = \operatorname{val}(D)$ and the set of optimal solutions of $(D)$ coincides with $\partial v(0)$. It is possible to show that the converse is also true, that is, if $\operatorname{val}(P) = \operatorname{val}(D)$, then the set of optimal solutions of $(D)$ coincides with $\partial v(0)$, [48]. By convex analysis the next results follow (cf. [16,36,48]).

Theorem 2.2 Suppose that the problem $(P)$ is convex. Then: (i) If the subdifferential $\partial v(0)$ is non empty, then $\operatorname{val}(P) = \operatorname{val}(D)$ and the set of optimal solutions of $(D)$ coincides with $\partial v(0)$. (ii) If $\operatorname{val}(P) = \operatorname{val}(D)$, then the (possibly empty) set of optimal solutions of $(D)$ coincides with $\partial v(0)$. (iii) If $\operatorname{val}(P) = \operatorname{val}(D)$ and $\bar x$ and $\bar\lambda$ are optimal solutions of $(P)$ and $(D)$, respectively, then

$$\bar x\in\arg\min_{x\in X} L(x,\bar\lambda) \quad\text{and}\quad \bar\lambda\in N_K(G(\bar x)). \qquad (2.10)$$

Conversely, if condition (2.10) holds for some $\bar x$ and $\bar\lambda$, then $\bar x$ and $\bar\lambda$ are optimal solutions of $(P)$ and $(D)$, respectively, and $\operatorname{val}(P) = \operatorname{val}(D)$.

Note that the second condition of (2.10) implies that $G(\bar x)\in K$, i.e. $\bar x$ is a feasible point of $(P)$, since otherwise the normal cone $N_K(G(\bar x))$ is empty. Note also that if $K$ is a convex cone, the condition $\bar\lambda\in N_K(G(\bar x))$ is equivalent to

$$G(\bar x)\in K,\quad \bar\lambda\in K^- \quad\text{and}\quad \langle\bar\lambda, G(\bar x)\rangle = 0. \qquad (2.11)$$


The problem $(P)$ is said to be calm if $\operatorname{val}(P)$ is finite and the optimal value function $v(y)$ is subdifferentiable at $y = 0$, i.e. $\partial v(0)\ne\emptyset$. It is known from convex analysis that if $v(y)$ is subdifferentiable at $y = 0$, then

$$\liminf_{t\to 0^+}\frac{v(td) - v(0)}{t} > -\infty, \qquad \forall d\in Y. \qquad (2.12)$$

The converse is also true, i.e. (2.12) implies subdifferentiability of $v(y)$ at $y = 0$, if the space $Y$ is finite dimensional. We have then the following result [48].

Proposition 2.3 Suppose the problem $(P)$ is convex and calm. Then $\operatorname{val}(P) = \operatorname{val}(D)$ and the dual problem $(D)$ has a non empty set of optimal solutions.

Let us finally give conditions for continuity of the optimal value function $v(y)$. Suppose that for every $y$ in a neighborhood of zero in $Y$ there exists $x\in X$ such that $G(x) + y\in K$, i.e. the feasible set of the problem $(P_y)$ is non empty. Formally we can write this condition as

$$0\in\operatorname{int}\{G(X) - K\}, \qquad (2.13)$$

where $G(X)$ is the range of the mapping $G$, i.e. $G(X)$ is the set $\{G(x) : x\in X\}$. Clearly if $v(0)$ is finite, then condition (2.13) is necessary for continuity of $v(y)$ at $y = 0$. It turns out that the converse is also true, i.e. this condition is also sufficient [44, Corollary 1]. Recall that if the convex function $v(y)$ is continuous at $y = 0$, then $\partial v(0)$ is non empty and bounded, and hence $(P)$ is calm. These arguments lead to the following duality theorem.

Theorem 2.4 Suppose that the problem $(P)$ is convex, the function $f$ is lower semicontinuous and the mapping $G$ is continuous. We have then that if the regularity condition (2.13) holds, then $\operatorname{val}(P) = \operatorname{val}(D)$. Moreover, if $\operatorname{val}(P)$ is finite, then the set of optimal solutions of the dual problem $(D)$ is non empty, convex, closed and bounded.

Clearly condition (2.13) follows from the Slater condition: there exists a point $x^*\in X$ such that $G(x^*)\in\operatorname{int}(K)$. The converse is also true if the (convex) set $K$ has a non empty interior. If the mapping $G$ is continuously differentiable, then (2.13) is equivalent to the following constraint qualification, due to Robinson [45], which is obtained from (2.13) by linearization of $G$ at a feasible point $x_0$ of $(P)$:

$$0\in\operatorname{int}\{G(x_0) + DG(x_0)X - K\}. \qquad (2.14)$$


Moreover, if $K$ is a convex set with a non empty interior, then Robinson's constraint qualification (2.14) is equivalent to the corresponding Slater condition for the linearized system: there exists $h\in X$ such that

$$G(x_0) + DG(x_0)h\in\operatorname{int}(K). \qquad (2.15)$$

The conditions (2.10) can be viewed as (first order) optimality conditions for the problem $(P)$. Note that it follows from the second condition of (2.10) that $\bar\lambda\in(K^\infty)^-$. Let us also observe that if $(P)$ is convex, and hence $G$ is convex with respect to the cone $-K^\infty$, then for any $\lambda\in(K^\infty)^-$, $\langle\lambda, G(\cdot)\rangle$ is convex and hence $L(\cdot,\lambda)$ is convex. By convexity of $L(\cdot,\bar\lambda)$ we have that if, in addition, $L(\cdot,\bar\lambda)$ is continuously differentiable, then $\bar x$ is a minimizer of $L(\cdot,\bar\lambda)$ iff $\nabla_x L(\bar x,\bar\lambda) = 0$. Consequently, if $(P)$ is convex and $f$ and $G$ are continuously differentiable, then conditions (2.10) can be written in the following equivalent form

$$\nabla_x L(\bar x,\bar\lambda) = 0 \quad\text{and}\quad \bar\lambda\in N_K(G(\bar x)). \qquad (2.16)$$

We summarize the above discussion about first order optimality conditions in the following theorem.

Theorem 2.5 Suppose that the problem $(P)$ is convex, $f$ is lower semicontinuous and $G$ is continuous. Then the following results hold: (i) If $\bar x$ is an optimal solution of $(P)$ and $(P)$ is calm, then there exists a non empty set $\Lambda_0$ of Lagrange multipliers such that the optimality conditions (2.10) hold for any $\bar\lambda\in\Lambda_0$. Conversely, if there exists a pair $(\bar x,\bar\lambda)$ satisfying (2.10), then $\bar x$ is an optimal solution of $(P)$, $(P)$ is calm, and the set $\Lambda_0$ of Lagrange multipliers satisfying (2.10) coincides with the set of optimal solutions of the corresponding dual problem and is the same for any optimal solution of $(P)$. (ii) If $\bar x$ is an optimal solution of $(P)$ and the regularity condition (2.13) holds, then the set $\Lambda_0$ of Lagrange multipliers satisfying (2.10) is non empty and bounded.

Consider now the case where the problem $(P)$ is continuously differentiable (i.e. $f$ and $G$ are continuously differentiable), possibly non convex. In that case the first order necessary conditions (2.16) hold under Robinson's constraint qualification (2.14) [35,46,58].

Theorem 2.6 Suppose that the problem $(P)$ is continuously differentiable and let $\bar x$ be an optimal solution of $(P)$. Then the set $\Lambda_0$ of Lagrange multipliers,


satisfying (2.16), is non empty and bounded if Robinson's constraint qualification (2.14) holds. Conversely, if the set $\Lambda_0$ of Lagrange multipliers satisfying (2.16) is non empty and bounded, and either the space $Y$ is finite dimensional or the set $K$ has a non empty interior, then Robinson's constraint qualification (2.14) holds.

Note that in case the space $Y$ is infinite dimensional, it is essential for the implication "if the set $\Lambda_0$ of Lagrange multipliers is non empty and bounded, then Robinson's constraint qualification (2.14) holds" that the set $K$ has a non empty interior (see [58]).

The above results can be applied to semi-infinite programs formulated in the form (1.2). Of course, semi-infinite programming problems have a specific structure which we discuss now. Consider the Banach space $Y := C(T)$ and the cone $K := C_-(T)$ of nonpositive valued continuous functions. The dual space $Y^*$ of $Y = C(T)$ is the space of finite signed Borel measures on $T$, with the norm given by the total variation of the corresponding measure, and for $y\in C(T)$, $\mu\in C(T)^*$,

$$\langle\mu, y\rangle = \int_T y(\tau)\,\mu(d\tau).$$

The polar cone $K^-\subseteq Y^*$ of the cone $K$ is formed by the set of (nonnegative) Borel measures on $T$. For $y\in K$ the tangent cone $T_K(y)$ can be written in the form

$$T_K(y) = \{z\in C(T) : z(\tau)\le 0,\ \forall\tau\in\Delta(y)\},$$

where $\Delta(y) := \{\tau\in T : y(\tau) = 0\}$. The corresponding normal cone $N_K(y)$ is formed by the (nonnegative) Borel measures $\mu$ such that $\operatorname{supp}(\mu)\subseteq\Delta(y)$, where $\operatorname{supp}(\mu)$ denotes the support of the measure $\mu$, e.g. [53].
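A concrete instance may help (added for illustration): take $T = [0,1]$ and $y(\tau) = -(\tau - \tfrac12)^2\in K = C_-(T)$. Then

$$\Delta(y) = \{\tfrac12\},\qquad T_K(y) = \{z\in C([0,1]) : z(\tfrac12)\le 0\},\qquad N_K(y) = \{\alpha\,\delta(\tfrac12) : \alpha\ge 0\},$$

i.e. the normal cone consists of the nonnegative multiples of the Dirac measure at the single contact point.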

Suppose that the constraint function $g(x,\tau)$ is differentiable in $x$ and that $\nabla g(x,\tau)$ is continuous on $\mathbb{R}^n\times T$. (Unless stated otherwise, all gradients are taken with respect to $x$.) It follows then that the corresponding mapping $G(x) = g(x,\cdot)$ is continuously differentiable and

$$[DG(x)h](\cdot) = h\cdot\nabla g(x,\cdot).$$

Let $x_0$ be a feasible point of $(P)$ and consider the set

$$\Delta_0 := \{\tau\in T : g(x_0,\tau) = 0\}$$

of constraints active at $x_0$. Suppose that there exists a vector $h\in\mathbb{R}^n$ such that

$$h\cdot\nabla g(x_0,\tau) < 0, \qquad \forall\tau\in\Delta_0. \qquad (2.17)$$


In case the set $T$ is finite this is the Mangasarian-Fromovitz constraint qualification [40]. Therefore we refer to (2.17) as the (extended) MF constraint qualification. It was used in various studies of semi-infinite programs (e.g. [27,31,32]). It is not difficult to show that (2.17) is equivalent to Robinson's constraint qualification (2.14), [53].

Under the MF constraint qualification the first order optimality conditions (2.16) take the following form: there exists a (nonnegative) Borel measure $\mu$ on $T$ such that

$$\nabla f(x_0) + \int_T\nabla g(x_0,\tau)\,\mu(d\tau) = 0 \quad\text{and}\quad \operatorname{supp}(\mu)\subseteq\Delta_0, \qquad (2.18)$$

and the set $M(x_0)$ of such (nonnegative) Borel measures $\mu$, satisfying (2.18), is bounded (in the total variation norm). Since the cone $K$ has a non empty interior, the converse is also true. That is, if the set of (nonnegative) Borel measures satisfying (2.18) is non empty and bounded, then the MF constraint qualification follows. In case the set $T$ is finite, and hence (1.1) becomes a nonlinear programming problem, equivalence of the MF constraint qualification to nonemptiness and boundedness of the corresponding set of Lagrange multipliers is shown in [18].

Suppose that the measure $\mu$ in (2.18) has a finite support (discrete measure), i.e. $\mu = \sum_{i=1}^m\lambda_i\,\delta(\tau_i)$, where $\delta(\tau)$ denotes the measure of mass one at the point $\tau$ (Dirac measure). Note that the total variation of the discrete measure $\sum_{i=1}^m\lambda_i\,\delta(\tau_i)$ is equal to $\sum_{i=1}^m|\lambda_i|$. Since $\mu$ is nonnegative we have that $\lambda_i > 0$, and since $\operatorname{supp}(\mu)\subseteq\Delta_0$ we have that $\tau_i\in\Delta_0$. Then the optimality conditions (2.18) can be written in the form: there exist $\lambda_i$ and $\tau_i\in T$, $i = 1,\ldots,m$, such that

$$\nabla f(x_0) + \sum_{i=1}^m\lambda_i\,\nabla g(x_0,\tau_i) = 0,\quad \lambda_i > 0,\ \tau_i\in\Delta_0,\ i = 1,\ldots,m. \qquad (2.19)$$
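A one-dimensional example (ours, for illustration) shows the typical case in which a single Dirac measure suffices in (2.19). Take $n = 1$, $T = [0,1]$, $f(x) = x$ and $g(x,\tau) = \tau - x$, so that $\Phi = \{x\ge 1\}$, $x_0 = 1$ and $\Delta_0 = \{1\}$. Condition (2.19) with $m = 1$ reads

$$\nabla f(x_0) + \lambda_1\,\nabla g(x_0,\tau_1) = 1 + \lambda_1\cdot(-1) = 0,\qquad \lambda_1 = 1 > 0,\ \ \tau_1 = 1\in\Delta_0,$$

so $\mu = \delta(1)$ is a Lagrange multiplier measure, with $m = 1\le n$; the MF constraint qualification holds with $h = 1$.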

It is well known (e.g. [43]) that if $x_0$ is a locally optimal solution of $(P)$ then, under the MF constraint qualification, the optimality conditions (2.19) hold with $m\le n$. That is, it is always possible to choose a discrete measure $\mu$ satisfying (2.18). This result can be proved in various ways. For example, one can use the following arguments.

Suppose that the set $M(x_0)$ of (nonnegative) Borel measures satisfying (2.18) is non empty and let $\bar\mu\in M(x_0)$. Consider the set

$$\widehat M := \{\mu\in M(x_0) : \mu(T) = \bar\mu(T)\}.$$


Note that if $\mu$ is a nonnegative Borel measure, then $\mu(T)$ is equal to the total variation of $\mu$, i.e. $\|\mu\| = \mu(T)$. The set $\widehat M$ is non empty since $\bar\mu\in\widehat M$. Moreover, $\widehat M$ is convex, bounded and closed in the weak* topology of $C(T)^*$, and hence is weakly* compact. It follows then by the Krein-Milman theorem that $\widehat M$ coincides with the closure (in the weak* topology) of the convex hull of its extreme points. Also it is possible to show that the extreme points (measures) of $\widehat M$ are discrete with the number of support points less than or equal to $n + 1$; this is because $\mu(T) = \bar\mu(T)$ can be considered as a linear equation added to the $n$ equations defining the measures $\mu$ in (2.18) (see, e.g., [53]). Consequently we obtain that there exists a discrete measure $\mu = \sum_{i=1}^m\lambda_i\,\delta(\tau_i)$ satisfying (2.19) and such that $m\le n + 1$ and $\|\mu\| = \|\bar\mu\|$. If $m > n$ then the corresponding vectors $\nabla g(x_0,\tau_i)$, $i = 1,\ldots,m$, are linearly dependent and hence such a discrete measure can always be constructed with $m\le n$.

Let us denote by $M_k(x_0)$ the set of discrete measures $\mu = \sum_{i=1}^m\lambda_i\,\delta(\tau_i)$ satisfying (2.19) and such that $m\le k$. By the above arguments we have that the MF constraint qualification implies that the set $M_k(x_0)$ is bounded and $M_k(x_0)$ is non empty for any $k\ge n$. Moreover, if $\mu\in M(x_0)$, then there exists $\mu'\in M_{n+1}(x_0)$ such that $\|\mu\| = \|\mu'\|$. Therefore $M(x_0)$ is non empty and bounded iff $M_k(x_0)$ is non empty and bounded for any $k\ge n+1$. We obtain the following result.

Proposition 2.7 For any $k\ge n+1$ the set $M_k(x_0)$ is non empty and bounded if and only if the MF constraint qualification holds.

The above result is derived by direct methods in Klatte [33] (see also [34]). It is interesting to note that the set $M_n(x_0)$ can be bounded while the set $M(x_0)$ is unbounded and hence the MF constraint qualification does not hold, i.e. it is essential in the "only if" part of the above proposition that $k\ge n + 1$ and not just $k\ge n$. A counterexample demonstrating this is constructed in [33] (see also [34, example 2.5]).

Consider now the convex case, i.e. suppose that the functions $f(\cdot)$ and $g_\tau(\cdot)$, $\tau\in T$, are convex. The dual of $(P)$ can be written here in the form

$$(D)\qquad \max_{\mu\succeq 0}\ \Big\{\psi(\mu) := \inf_{x\in\mathbb{R}^n}\Big[f(x) + \int_T g(x,\tau)\,\mu(d\tau)\Big]\Big\}, \qquad (2.20)$$

where $\mu\succeq 0$ means that $\mu$ is a nonnegative Borel measure on $T$. The various sufficient conditions for the "no duality gap" property $\operatorname{val}(P) = \operatorname{val}(D)$, given in Theorems 2.1–2.4, can be applied here in a straightforward way.


Consider the following linear semi-infinite programming problem:

$$(P)\qquad \min_{x\in\mathbb{R}^n}\ c\cdot x\ \ \text{subject to}\ \ a(\tau)\cdot x + b(\tau)\le 0,\ \ \tau\in T, \qquad (2.21)$$

where $c\in\mathbb{R}^n$ and $a(\cdot): T\to\mathbb{R}^n$, $b(\cdot): T\to\mathbb{R}$ are continuous functions. This, of course, is a convex problem. Its dual is

$$(D)\qquad \max_{\mu\succeq 0}\ \int_T b(\tau)\,\mu(d\tau)\ \ \text{subject to}\ \ \int_T a(\tau)\,\mu(d\tau) + c = 0. \qquad (2.22)$$

If we restrict the above dual problem to discrete measures $\mu$ with the number of support points less than or equal to $m$, it takes the form

$$(D_m)\qquad \max_{\lambda,\tau}\ \sum_{i=1}^m\lambda_i\,b(\tau_i)\ \ \text{subject to}\ \ \sum_{i=1}^m\lambda_i\,a(\tau_i) + c = 0,\ \ \lambda_i\ge 0,\ i = 1,\ldots,m, \qquad (2.23)$$

where $\tau := (\tau_1,\ldots,\tau_m)\in T\times\cdots\times T$. We always have that $\operatorname{val}(P)\ge\operatorname{val}(D)$ and $\operatorname{val}(D)\ge\operatorname{val}(D_m)$.
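A discretized instance of the linear problem (2.21) is an ordinary linear program whose LP dual is exactly (2.23) restricted to the grid points. A minimal sketch (ours; the data are made up and SciPy's `linprog` is assumed available):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data (not from the text): T = [0, 1], discretized by a grid,
#   c = (-1, -1),  a(tau) = (tau, 1),  b(tau) = -tau**2,
# so the constraints a(tau).x + b(tau) <= 0 read  tau*x1 + x2 <= tau**2.
tau = np.linspace(0.0, 1.0, 100)
A_ub = np.column_stack([tau, np.ones_like(tau)])   # row i is a(tau_i)
b_ub = tau ** 2                                    # equals -b(tau_i)
c = np.array([-1.0, -1.0])

# linprog's default bounds are x >= 0; here the variables are free.
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
print(result.x, result.fun)                        # value of the discretized primal
```

The LP dual variables attached to the grid constraints play the role of the weights $\lambda_i$ in (2.23); typically only a few of them are nonzero at an optimal basic solution, in agreement with the discussion of discrete multiplier measures above.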

Let us observe that the optimal values of these dual problems, $\operatorname{val}(D)$ and $\operatorname{val}(D_m)$, are the same for $m = n + 1$. Indeed, let $\Psi$ be the feasible set of the problem $(D)$. If $\Psi$ is empty, then the feasible set of the problem $(D_m)$ is also empty and the optimal value of both problems is $-\infty$. Suppose that $\Psi$ is non empty and let $\bar\mu\in\Psi$. Consider the set

$$\widehat\Psi := \Big\{\mu\in\Psi : \int_T b(\tau)\,\mu(d\tau) = \int_T b(\tau)\,\bar\mu(d\tau),\ \ \mu(T) = \bar\mu(T)\Big\}.$$

The set $\widehat\Psi$ is non empty since $\bar\mu\in\widehat\Psi$. Moreover, $\widehat\Psi$ is convex, bounded and closed in the weak* topology of $C(T)^*$, and hence is weakly* compact. It follows then by the Krein-Milman theorem that $\widehat\Psi$ has an extreme point. Again we have that an extreme point of $\widehat\Psi$ is a discrete measure. We obtain that for any $\mu\in\Psi$ there is a discrete measure $\mu'\in\Psi$ such that the value of the objective function in (2.22) is the same for $\mu$ and $\mu'$. Moreover, since there are $n + 1$ (linear) equations defining such points, we can take $m = n + 1$.

The problem $(P)$ is subconsistent here iff either $(P)$ is feasible (i.e. its feasible set is non empty) or there exists a sequence $b_k\in C(T)$ such that $b_k$ converge, in the sup-norm, to $b$ and the optimal values $v(b_k)$ of the sequence of problems

$$\min_{x\in\mathbb{R}^n}\ c\cdot x\ \ \text{subject to}\ \ a(\tau)\cdot x + b_k(\tau)\le 0,\ \ \tau\in T, \qquad (2.24)$$


are bounded from above. By Theorem 2.1 we have that if $(P)$ is subconsistent, then there is no duality gap between $(P)$ and its dual $(D)$, and hence between $(P)$ and $(D_m)$ with $m = n + 1$, iff the inequality

$$\operatorname{val}(P)\le\liminf_{k\to\infty} v(b_k) \qquad (2.25)$$

holds for any sequence $b_k\in C(T)$ converging to $b$.

By Proposition 2.3 we have that if the primal problem $(P)$, given in (2.21), has a finite optimal value and is calm, then there is no duality gap between $(P)$ and its duals $(D)$ and $(D_{n+1})$, and the duals have non empty sets of optimal solutions. Recall that if $\operatorname{val}(P)$ is finite and the Slater condition for $(P)$ holds, then $(P)$ is calm, and in that case the sets of optimal solutions of the dual problems are non empty and bounded. It is not difficult to see that the Slater condition for $(P)$ holds iff there exists $x\in\mathbb{R}^n$ such that

$$a(\tau)\cdot x + b(\tau) < 0, \qquad \forall\tau\in T. \qquad (2.26)$$

For the dual problem $(D)$, in the form (2.22), the abstract constraint qualification (2.13) can be formulated in the form:

$$0\in\operatorname{int}\Big(\Big\{w\in\mathbb{R}^n : w = \int_T a(\tau)\,\mu(d\tau) + c,\ \ \mu\succeq 0\Big\}\Big). \qquad (2.27)$$

If $\operatorname{val}(D)$ is finite and the above condition (2.27) holds, then the corresponding optimal value function (defined on $\mathbb{R}^n$) is continuous at $0$ and hence its subdifferential at $0$ is non empty and bounded. It follows then that there is no duality gap between $(P)$ and $(D)$, and the set of optimal solutions of $(P)$ is non empty and bounded.

For a thorough survey of duality theory in linear semi-infinite programming we refer to Hettich and Kortanek [24, section 6]. Note that in [24] conditions (2.26) and (2.27) are referred to as superconsistency properties of the problems (P) and (D), respectively.

3 SECOND ORDER OPTIMALITY CONDITIONS

In this section we discuss second order necessary and sufficient optimality conditions. We start with a discussion of the general problem (2.1), where we


assume that the space $X$ is finite dimensional and $f(x)$ and $G(x)$ are twice continuously differentiable, and then consider semi-infinite programming problems. With a point $y\in K$ and a direction $d\in Y$ is associated the so-called second order tangent set

$$T_K^2(y,d) := \Big\{z\in Y : \operatorname{dist}\big(y + td + \tfrac{1}{2}t^2 z,\ K\big) = o(t^2),\ t\ge 0\Big\}. \qquad (3.1)$$

The set $T_K^2(y,d)$ is closed and convex and can be nonempty only if $d$ is a tangent direction to $K$ at $y$, i.e. $d\in T_K(y)$. Note that

$$T_K^2(y,d) + T_{T_K(y)}(d)\ \subseteq\ T_K^2(y,d)\ \subseteq\ T_{T_K(y)}(d) \qquad (3.2)$$

and that $T_K^2(y,d) = T_{T_K(y)}(d)$ if the set $K$ is polyhedral, [14]. In particular, (3.2) implies that $T_{T_K(y)}(d)$ is the recession cone of $T_K^2(y,d)$. Note also that $T_{T_K(y)}(d) = T_K(y) + [d]$, where $[d]$ denotes the linear space (one dimensional if $d\ne 0$) generated by $d$.
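The following two-dimensional computation (ours, for illustration) shows how curvature of $K$ makes $T_K^2$ strictly smaller than $T_{T_K(y)}(d)$ and produces a nontrivial sigma term. Take

$$K = \{(y_1,y_2)\in\mathbb{R}^2 : y_2\ge y_1^2\},\qquad y = (0,0),\qquad d = (1,0)\in T_K(y) = \{y_2\ge 0\}.$$

Since $y + td + \tfrac12 t^2 z = (t + \tfrac12 t^2 z_1,\ \tfrac12 t^2 z_2)$ stays within $o(t^2)$ of $K$ iff $\tfrac12 t^2 z_2\ge t^2 + o(t^2)$, one gets

$$T_K^2(y,d) = \{z : z_2\ge 2\}\ \subsetneq\ T_{T_K(y)}(d) = \{z : z_2\ge 0\},$$

and for the normal element $\lambda = (0,-1)\in N_K(y)$ (which satisfies $\langle\lambda, d\rangle = 0$) the sigma term equals $\sigma(\lambda, T_K^2(y,d)) = \sup_{z_2\ge 2}(-z_2) = -2 < 0$.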

Let $x_0$ be a feasible point of the problem $(P)$. The cone

$$C(x_0) := \{h\in X : DG(x_0)h\in T_K(G(x_0)),\ Df(x_0)h\le 0\}$$

is called the critical cone of the problem $(P)$ at the point $x_0$. If the point $x_0$ satisfies the first order necessary conditions, i.e. the set $\Lambda(x_0)$ of Lagrange multipliers satisfying (2.16) is non empty, then the inequality $Df(x_0)h\le 0$ in the definition of $C(x_0)$ can be replaced by the equation $Df(x_0)h = 0$. Therefore in that case we have that for any $\lambda\in\Lambda(x_0)$,

$$C(x_0) = \{h\in X : DG(x_0)h\in T_K(G(x_0)),\ \langle\lambda, DG(x_0)h\rangle = 0\}.$$

Clearly if $x_0$ is a locally optimal solution of the problem $(P)$, then for any feasible path of the form $x(t) = x_0 + th + \tfrac{1}{2}t^2 w + o(t^2)$ the inequality $f(x(t))\ge f(x_0)$ should hold for all $t\ge 0$ small enough. This implies that, under Robinson's constraint qualification (2.14), for every $h\in C(x_0)$ the optimal value of the following optimization problem

$$\begin{array}{ll}\displaystyle\min_{w\in X} & Df(x_0)w + D^2 f(x_0)(h,h)\\[4pt] \text{subject to} & DG(x_0)w + D^2 G(x_0)(h,h)\in T_K^2\big(G(x_0),\ DG(x_0)h\big)\end{array} \qquad (3.3)$$

is non negative. By calculating the parametric dual of (3.3) (see (2.3)), these second order necessary conditions can be written in the following dual form.

Proposition 3.1 Let $x_0$ be a locally optimal solution of the problem $(P)$ and suppose that Robinson's constraint qualification (2.14) holds. Then for all $h\in C(x_0)$,

$$\sup_{\lambda\in\Lambda(x_0)}\big\{D^2_{xx}L(x_0,\lambda)(h,h) - \sigma(\lambda,\mathcal{T}(h))\big\}\ge 0, \qquad (3.4)$$


where $\mathcal{T}(h) := T_K^2\big(G(x_0), DG(x_0)h\big)$ and $\Lambda(x_0)$ denotes the set of Lagrange multipliers satisfying the corresponding first order necessary conditions at the point $x_0$.

Apart from the second order expansion of the Lagrangian, the additional term $\sigma(\lambda,\mathcal{T}(h))$ appears in the above second order conditions, which is referred to as the "sigma term". Particular forms of that term were derived, by the so-called reduction method, already in earlier works on semi-infinite programming [57],[23],[3]. In an abstract form this term was introduced in Kawasaki [28,30]. In the form (3.4), second order necessary conditions were obtained by Cominetti [14]. The above idea of deriving second order optimality conditions via parabolic curves is due to Ben-Tal [4] (see also [5]).

Let $\lambda\in\Lambda(x_0)$ and $h\in C(x_0)$. Then $\lambda\in[T_K(G(x_0))]^-$ and $\langle\lambda, DG(x_0)h\rangle = 0$, and hence $\lambda\in\big(T_K(G(x_0)) + [DG(x_0)h]\big)^-$. Since $\mathcal{T}(h)$ is a subset of $T_K(G(x_0)) + [DG(x_0)h]$, it follows that the "sigma term" $\sigma(\lambda,\mathcal{T}(h))$ in (3.4) is less than or equal to zero. Consequently, by deleting the sigma term in (3.4) one obtains a condition which always implies (3.4). Note also that if the set $K$ is polyhedral, then $\mathcal{T}(h)$ coincides with $T_K(G(x_0)) + [DG(x_0)h]$ and hence in that case the sigma term in (3.4) vanishes. Note finally that the second order tangent set $\mathcal{T}(h)$ can be empty. In that case the sigma term is $-\infty$ and (3.4) trivially holds.

The corresponding second order sufficient conditions are more involved. In an abstract form, second order optimality conditions were investigated in [25],[26],[41],[42],[49], for example. The second order necessary conditions (3.4) are based on verification of local optimality along parabolic curves, and it is not true in general that by replacing the "$\ge$" sign in (3.4) with the strict inequality sign "$>$" one obtains a sufficient condition. That is, the second order tangent set $T_K^2(G(x_0), DG(x_0)h)$ can be "too small" for the purpose of lower approximation from which second order sufficient conditions are derived. Nevertheless, for the following important class of second order regular sets, introduced in [7,11], it suffices to consider variations along parabolic curves only.

Definition 3.2 The set $K$ is said to be second order regular at a point $y\in K$ in a direction $d\in T_K(y)$ and with respect to a linear mapping $M: X\to Y$, if for any sequence $y_n\in K$ of the form $y_n := y + t_n d + \tfrac{1}{2}t_n^2 r_n$, where $t_n\to 0^+$ and $r_n = M w_n + a_n$ with $\{a_n\}$ being a convergent sequence in $Y$ and $\{w_n\}$ being a sequence in $X$ satisfying $t_n w_n\to 0$, the following condition holds:

$$\lim_{n\to\infty}\operatorname{dist}\big(r_n,\ T_K^2(y,d)\big) = 0. \qquad (3.5)$$


If $K$ is second order regular at $y\in K$ in every direction $d\in T_K(y)$ and with respect to any $X$ and $M$, we say that $K$ is second order regular at $y$.

Note that $K$ is second order regular at a point $y\in K$ if for any $d\in T_K(y)$ and any sequence $y + t_n d + \tfrac{1}{2}t_n^2 r_n\in K$ such that $t_n r_n\to 0$, condition (3.5) holds. The additional complication of considering sequences of the form $r_n = M w_n + a_n$ in the above definition is needed for technical reasons. In case the set $K$ is second order regular at the point $y_0 = G(x_0)$, there is no gap between second order necessary and sufficient conditions in the sense of the following result obtained in [11]. Recall that $\Phi$ denotes the set of feasible points of $(P)$.

Theorem 3.3 Let $x_0$ be a feasible point of $(P)$ satisfying the first order optimality conditions. Suppose that Robinson's constraint qualification (2.14) holds and that for every $h\in C(x_0)$, the set $K$ is second order regular at the point $G(x_0)$ in the direction $DG(x_0)h$ and with respect to the linear mapping $M := DG(x_0)$. Then the following two conditions are equivalent: (i) (Second order growth condition) there exist a constant $c > 0$ and a neighborhood $N$ of $x_0$ such that for all $x\in\Phi\cap N$,

$$f(x)\ge f(x_0) + c\,\|x - x_0\|^2; \qquad (3.6)$$

(ii) (Second order sufficient condition) for every $h\in C(x_0)\setminus\{0\}$,

$$\sup_{\lambda\in\Lambda(x_0)}\big\{D^2_{xx}L(x_0,\lambda)(h,h) - \sigma(\lambda,\mathcal{T}(h))\big\} > 0, \qquad (3.7)$$

where $\mathcal{T}(h) := T_K^2\big(G(x_0), DG(x_0)h\big)$.

It turns out that second order regularity can be verified in many situations. In particular, it is shown in [11] that: (i) Every polyhedral convex set is second order regular. (ii) If $K$ is given in the form $K := \{y\in Y : g_i(y)\le 0,\ i = 1,\ldots,p\}$, where $g_i(y)$, $i = 1,\ldots,p$, are convex twice continuously differentiable functions and there exists $\bar y$ such that $g_i(\bar y) < 0$, $i = 1,\ldots,p$ (Slater condition), then $K$ is second order regular. (iii) If two convex closed sets $K_1$ and $K_2$ are second order regular and there exists a point $y\in K_2$ such that $y\in\operatorname{int}(K_1)$, then $K_1\cap K_2$ is also second order regular. (iv) The cones of positive and negative semi-definite matrices are second order regular.

In order to apply the above results to semi-infinite programming we study now the case of the space Y := C(T) and the cone K := C_+(T) of nonnegative valued functions. For the cone C_-(T) of nonpositive valued functions the analysis

Page 131: Semi-Infinite Programming


is similar. Suppose that T is a nonempty compact subset of ℝ^p defined in the form

T := {τ ∈ ℝ^p : 𝒢(τ) ∈ 𝒦},   (3.8)

where 𝒦 is a convex closed subset of a Banach space Z and 𝒢 : ℝ^p → Z is a twice continuously differentiable mapping. Let us observe that the cone C_+(T) can be written in the form

C_+(T) = { y ∈ C(T) : inf_{τ∈T} y(τ) ≥ 0 }.   (3.9)

Consider a function ȳ ∈ C_+(T). Suppose that ȳ(·) is twice continuously differentiable and that its set

Δ(ȳ) := {τ ∈ T : ȳ(τ) = 0}

of contact points is nonempty. It follows then that Δ(ȳ) is the set of minimizers of ȳ(·) over T, i.e. Δ(ȳ) is the set of optimal solutions of the problem

Min_τ ȳ(τ)  subject to  𝒢(τ) ∈ 𝒦.   (3.10)

Therefore, under a constraint qualification, to a point τ ∈ Δ(ȳ) corresponds a vector γ ∈ Z* of Lagrange multipliers satisfying the corresponding first order optimality conditions

∇_τ ℒ(τ, γ) = 0,  γ ∈ N_𝒦(𝒢(τ)),   (3.11)

where ℒ(τ, γ) := ȳ(τ) + ⟨γ, 𝒢(τ)⟩.

Denote by Γ(τ) the set of Lagrange multipliers satisfying (3.11) and by C(τ) the corresponding cone of critical directions. Recall that if Robinson's constraint qualification holds for (3.10) at τ ∈ Δ(ȳ), then Γ(τ) is nonempty and bounded.

Consider a direction d ∈ T_K(ȳ). Recall that d ∈ T_K(ȳ) iff d(τ) ≥ 0 for all τ ∈ Δ(ȳ). Denote

Δ_1(ȳ, d) := {τ ∈ Δ(ȳ) : d(τ) = 0}.

The following result is a slight modification of a result obtained in [12, Theorem 7.1].

Theorem 3.4 Suppose that T is a nonempty compact set given in the form (3.8), that ȳ ∈ K := C_+(T) is twice continuously differentiable, that d ∈ T_K(ȳ)

Page 132: Semi-Infinite Programming


is continuously differentiable, that the set Δ(ȳ) is finite, that for every τ ∈ Δ(ȳ) Robinson's constraint qualification holds and the set 𝒦 is second order regular at 𝒢(τ), and that the following second order growth condition holds: there exist c > 0 and a neighborhood N ⊂ T of Δ(ȳ) such that

ȳ(τ) ≥ c dist(τ, Δ(ȳ))^2,  ∀ τ ∈ T ∩ N.   (3.12)

Let M(x) := Σ_{i=1}^n x_i ψ_i(·) be a linear mapping from ℝ^n into C(T) such that the functions ψ_i(·), i = 1,...,n, are Lipschitz continuous on T. Then the cone K is second order regular at ȳ in the direction d with respect to M. Moreover, if the sets Δ(ȳ) and Δ_1(ȳ, d) are nonempty, then

T^2_K(ȳ, d) = { z ∈ C(T) : z(τ) + κ(τ, d) ≥ 0, ∀ τ ∈ Δ_1(ȳ, d) },   (3.13)

where κ(τ, d) is the optimal value of the problem

Min_{η∈C(τ)} Max_{γ∈Γ(τ)} { η^T ∇^2_{ττ}ℒ(τ, γ) η + 2 η^T ∇d(τ) - σ(γ, T^2_𝒦(𝒢(τ), ∇𝒢(τ)η)) }.   (3.14)

Otherwise T^2_K(ȳ, d) = C(T).

Note that if the set 𝒦 is polyhedral, then 𝒦 is second order regular and the sigma term in (3.14) vanishes. Note also that the second order growth condition (3.12) is implied by appropriate second order sufficient conditions applied to the problem (3.10) at every τ ∈ Δ(ȳ). A general formula for second order tangent sets of the cone C_+(T) is given in [15]. Under the assumptions of theorem 3.4, formulas (3.13)-(3.14) seem to be more direct and convenient for calculations, and moreover C_+(T) is second order regular at ȳ.

Consider now the semi-infinite programming problem (1.1). Suppose that the functions f(x) and g_τ(x), τ ∈ T, are twice continuously differentiable and that ∇^2 g_τ(x) is continuous on ℝ^n × T. Let x_0 be a feasible point of (1.1) satisfying the first order optimality conditions (2.18), and let M(x_0) be the corresponding set of nonnegative Borel measures satisfying (2.18), i.e. M(x_0) is the set of Lagrange multipliers for the problem (1.1). Suppose further that the MF constraint qualification holds at x_0 and that the cone C_-(T) is second order regular at ȳ(·) := g(x_0, ·) (this, of course, is equivalent to the condition that C_+(T) is second order regular at ȳ(·) := -g(x_0, ·)). Then condition (3.7) is necessary and sufficient for the second order growth condition (3.6). Recall that, under the MF constraint qualification, the set M(x_0) is compact in the weak* topology of C(T)* and hence by the Krein-Milman theorem is the closure (in the weak* topology) of the convex hull of its extreme points. Since extreme points of M(x_0) are discrete measures, we obtain that discrete measures in

Page 133: Semi-Infinite Programming


M(x_0) form a dense subset of M(x_0) in the weak* topology. Consequently the supremum in (3.7) can be taken with respect to discrete measures Σ_{i=1}^m λ_i δ(τ_i) satisfying (2.19).

In order to apply second order sufficient condition (3.7) one needs to verify the second order regularity and to calculate the corresponding sigma term. Let us consider, for example, the case described in theorem 3.4. That is, suppose that the set T is given in the form (3.8), that g(x, τ) is twice continuously differentiable jointly in x and τ, and that the assumptions of theorem 3.4 hold for ȳ(·) := -g(x_0, ·). Of course, the feasible set of (1.1) can be described by the constraints -g_τ(x) ≥ 0, τ ∈ T, as well. Let h ∈ C(x_0) and consider d(·) := -h·∇g(x_0, ·). Denote

Δ_0 := {τ ∈ T : g(x_0, τ) = 0}  and  Δ_1(h) := {τ ∈ Δ_0 : h·∇g(x_0, τ) = 0}.

We have here that ȳ is twice continuously differentiable, that d is continuously differentiable, and that M := DG(x_0) maps any x ∈ ℝ^n into a Lipschitz continuous function on T. Note that since h ∈ C(x_0), we have that h·∇g(x_0, τ) ≤ 0 for all τ ∈ Δ_0 and, for any measure μ ∈ M(x_0),

∫ h·∇g(x_0, τ) μ(dτ) = 0.

Moreover we have that supp(μ) ⊂ Δ_0, and hence supp(μ) ⊂ Δ_1(h). Consequently the sigma term (see (3.4), (3.7) and (3.13)) can be written here as (cf. [9])

σ(μ, T(h)) = -∫ κ(τ, d) μ(dτ).   (3.15)

It is interesting to note that the sigma term here is linear in μ. Therefore in the corresponding second order optimality conditions it suffices to take the supremum with respect to extreme points of M(x_0), which are discrete measures μ = Σ_{i=1}^m λ_i δ(τ_i) with m ≤ n.

Suppose, for example, that all points of Δ_0 are interior points of T. In that case the second order growth condition (3.12) is equivalent to negative definiteness of the matrices ∇^2_{ττ}g(x_0, τ) for every τ ∈ Δ_0, and

-κ(τ, d) = h^T ∇^2_{xτ}g(x_0, τ) [-∇^2_{ττ}g(x_0, τ)]^{-1} ∇^2_{τx}g(x_0, τ) h.

Therefore in that case second order sufficient condition (3.7) takes the form: to every h ∈ C(x_0) \ {0} corresponds a discrete measure Σ_{i=1}^m λ_i δ(τ_i), m ≤ n,

Page 134: Semi-Infinite Programming


satisfying (2.19) such that

(3.16)

where

The corresponding second order necessary condition (3.4) is obtained by replacing the strict inequality sign in (3.16) with the "≥" sign. Variants of the above second order optimality conditions were obtained (e.g. [3],[23],[50],[57]) by the so-called reduction method.

4 DIRECTIONAL DIFFERENTIABILITY OF THE OPTIMAL VALUE FUNCTION

Consider the parametric semi-infinite programming problem (P_u) given in (1.3). In this section we discuss (first order) differentiability properties of the optimal value v(u) of (P_u). We assume that for a given value u_0 of the parameter vector, the corresponding problem (P_{u_0}) coincides with the (unperturbed) problem (P), and study differential behavior of v(u) at u_0 in a given direction d ∈ U. We assume that the functions f(x, u) and g_τ(x, u) are continuously differentiable, jointly in x and u, and that ∇g_τ(x, u) is continuous on X × U × T, where X = ℝ^n.

Let x_0 be an optimal solution of (P) and consider the following linearization of (P_u) at (x_0, u_0) in the direction d:

(PL_d)   Min_{h∈X} Df(x_0, u_0)(h, d)  subject to  DG(x_0, u_0)(h, d) ∈ T_K(G(x_0, u_0)).   (4.1)

It is not difficult to verify that the dual of (PL_d) can be written in the form

(DL_d)   Max_{λ∈Λ(x_0,u_0)} D_u L(x_0, λ, u_0) d,   (4.2)

where L(x, λ, u) is the Lagrangian of (P_u) and Λ(x, u) is the set of Lagrange multipliers satisfying the corresponding (first order) optimality conditions at the feasible point x. The problem (PL_d) is linear and hence is convex. Note

Page 135: Semi-Infinite Programming


that if val(PL_d) = val(DL_d), then a pair of feasible points h and λ solves (PL_d) and (DL_d), respectively, iff

⟨λ, DG(x_0, u_0)(h, d)⟩ = 0.

Note also that if Λ(x_0, u_0) ≠ ∅, then for d = 0 the set of optimal solutions of (DL_d) is Λ(x_0, u_0) and the set of optimal solutions of (PL_d) is the critical cone C(x_0). In general, if (PL_d) and (DL_d) have optimal solutions, then C(x_0) is the recession cone of the set of optimal solutions of (PL_d).

Consider the following condition

0 ∈ int {G(x_0, u_0) + DG(x_0, u_0)(X × ℝ_+(d)) - K},   (4.3)

where ℝ_+(d) := {td : t ≥ 0}. This condition, introduced in [7], can be viewed as an extension of a condition due to Gollan [21]. It implies (and in fact, since K has a nonempty interior, is equivalent to) the condition

0 ∈ int {DG(x_0, u_0)(X × ℝ_+(d)) - T_K(G(x_0, u_0))}.   (4.4)

Moreover, since the set inside the brackets in the right hand side of (4.4) is a cone, the above condition is equivalent to

DG(x_0, u_0)(X × ℝ_+(d)) - T_K(G(x_0, u_0)) = Y.   (4.5)

It is also possible to show ([13]) that if the set ℝ_+(d) in (4.3)-(4.5) is replaced by the smaller set ℝ_+^*(d) := {td : t > 0}, then the obtained conditions are equivalent to the respective conditions (4.3)-(4.5).

The constraint qualification (2.13) for (PL_d) can be written in the form

0 ∈ core {D_x G(x_0, u_0)X + D_u G(x_0, u_0)d - T_K(G(x_0, u_0))}.   (4.6)

Note that the term "int" is replaced by the term "core" in the above condition. It is possible to show that both such conditions are equivalent (e.g. [13]). Since ℝ_+(d) in (4.5) can be replaced by ℝ_+^*(d), it is not difficult to see that condition (4.6) follows from (4.5). Therefore we have by theorem 2.4 that if (4.3) holds, then there is no duality gap between (PL_d) and (DL_d), and their common optimal value is finite iff the set Λ(x_0, u_0) is nonempty, in which case the set of optimal solutions of (DL_d) is nonempty and bounded. Conditions (4.3)-(4.5) depend on the chosen direction d. Therefore we refer to them as directional constraint qualifications. Note that conditions (4.3)-(4.5) follow from (but are not equivalent to) Robinson's constraint qualification (2.14), and in general do not imply existence of Lagrange multipliers.

Page 136: Semi-Infinite Programming


In the case of semi-infinite programming, (4.3)-(4.5) are equivalent to the condition: there exists h ∈ ℝ^n such that

h·∇_x g_τ(x_0, u_0) + d·∇_u g_τ(x_0, u_0) < 0  for all τ ∈ Δ_0.   (4.7)

The above condition can be viewed as the MF constraint qualification for the constraint function g_τ(x, u_0 + td) at (x_0, 0) ∈ ℝ^n × ℝ_+. The set of directions d for which the directional constraint qualification holds forms an open convex cone in the space U. In the case of semi-infinite programming this cone is given by the projection onto U of the cone of all vectors (h, d) ∈ ℝ^n × U satisfying (4.7). If the MF constraint qualification for the unperturbed problem (P) holds at x_0, then the directional constraint qualification (4.7) is satisfied for any d ∈ U.
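As a small computational aside (not part of the original development), strict feasibility of the linear system (4.7) can be tested by a linear program once the active gradients are available. The sketch below assumes SciPy; the arrays Gx, Gu and the helper name directional_cq_holds are hypothetical, with one row per point of Δ_0.

    import numpy as np
    from scipy.optimize import linprog

    def directional_cq_holds(Gx, Gu, d):
        # Rows of Gx are grad_x g_tau(x0, u0), rows of Gu are grad_u g_tau(x0, u0),
        # one row per active index tau in Delta_0.  Condition (4.7) asks for h with
        # Gx h + Gu d < 0 componentwise; maximize the slack s subject to
        # Gx h + s*1 <= -Gu d and s <= 1 (the bound keeps the LP from being unbounded).
        m, n = Gx.shape
        c = np.zeros(n + 1); c[-1] = -1.0                        # minimize -s
        A_ub = np.vstack([np.hstack([Gx, np.ones((m, 1))]),
                          np.r_[np.zeros(n), 1.0]])
        b_ub = np.r_[-Gu @ d, 1.0]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
        return res.success and -res.fun > 1e-10                  # positive slack means (4.7) holds

A positive optimal slack certifies (4.7) for the given direction d; running the same LP over a sample of directions traces out the open cone of directions mentioned above.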

Suppose that the directional constraint qualification holds and that the set Λ(x_0, u_0) is nonempty, and hence the set of optimal solutions of (DL_d) is nonempty and bounded. Since (DL_d) is linear in λ, it attains its optimal value at an extreme point of Λ(x_0, u_0). Therefore, in the case of semi-infinite programming, we have that the optimal value of (DL_d) is not changed if its feasible set Λ(x_0, u_0) is restricted to discrete measures μ = Σ_{i=1}^m λ_i δ(τ_i) satisfying the first order optimality conditions (2.19). Note also that in that case

D_u L(x_0, μ, u_0)d = d·∇_u f(x_0, u_0) + Σ_{i=1}^m λ_i d·∇_u g(x_0, u_0, τ_i).

By using a first order Taylor expansion of (P_u) at (x_0, u_0), and a directional variant of the Robinson-Ursescu ([44],[56]) stability theorem ([7]), it is possible to show that, under the directional constraint qualification, the upper directional derivative of v(u), at u_0 in the direction d, is less than or equal to the optimal value val(PL_d) of the linearized problem (PL_d). Since then val(PL_d) = val(DL_d), we obtain the following result ([7],[37],[38]).

Theorem 4.1 Suppose that the directional constraint qualification (4.3) holds. Then val(PL_d) = val(DL_d) and

limsup_{t→0+} [v(u_0 + td) - v(u_0)] / t ≤ val(DL_d).   (4.8)

Moreover, val(DL_d) > -∞ if and only if the set Λ(x_0, u_0) is nonempty, in which case the set of optimal solutions of (DL_d) is a nonempty bounded subset of Λ(x_0, u_0).

Page 137: Semi-Infinite Programming


Let S_0 be the set of optimal solutions of (P) and suppose that the directional constraint qualification holds at every x ∈ S_0. It follows then from (4.8) that

limsup_{t→0+} [v(u_0 + td) - v(u_0)] / t ≤ inf_{x∈S_0} sup_{λ∈Λ(x,u_0)} D_u L(x, λ, u_0) d.   (4.9)

It is interesting to note that if the set Λ(x_0, u_0) of Lagrange multipliers is empty for at least one optimal solution x_0 of (P) at which the directional constraint qualification holds, then the left hand side of (4.9) equals -∞, and hence v'(u_0, d) = -∞. It is remarkable that in that case it is possible to derive a directional expansion of the optimal value function of order t^{1/2}, [6],[12].

The natural question arises whether the upper bound (4.9) is tight, i.e. whether v'(u_0, d) is equal to the right hand side of (4.9). There are examples, however, showing that v'(u_0, d) can be strictly less than val(DL_d) even if T is finite, S_0 = {x_0} and the MF constraint qualification holds at x_0, [19]. It turns out that in some cases a formula for (first order) directional derivatives of v(u) involves second order information of (P_u), [20],[21]. In the convex case it is possible to give quite a complete description of the first order behavior of v(u). The following result is an extension of Gol'shtein's theorem [22] to the present setting, [7],[54]. Recall that if (P) is convex, x_0 is an optimal solution of (P) and the corresponding set Λ_0 of Lagrange multipliers is nonempty, then Λ_0 is the set of optimal solutions of the dual problem and hence is the same for any other optimal solution of (P).

Theorem 4.2 Suppose that the problem (P) is convex, that the optimal set S_0 of (P) is nonempty and compact, that the directional constraint qualification holds at every x_0 ∈ S_0, and that for t ≥ 0 small enough the problem (P_{u_0+td}) has an optimal solution x(t) such that dist(x(t), S_0) → 0 as t → 0+. Then the optimal value function v(u) is directionally differentiable at u_0 in the direction d and

v'(u_0, d) = inf_{x∈S_0} sup_{λ∈Λ_0} D_u L(x, λ, u_0) d.   (4.10)

Note that if Λ_0 in the above theorem is empty, then (4.10) still holds with v'(u_0, d) = -∞. Note also that if the Slater condition holds for (P), then Λ_0 is nonempty and bounded and the directional constraint qualification holds in every direction d.

Another case where it is possible to obtain directional derivatives of v(u) by using first order derivatives of the data, is when Lagrange multipliers are unique,

Page 138: Semi-Infinite Programming


i.e. Λ(x, u_0) = {λ(x)} is a singleton for every x ∈ S_0. In that case, under mild assumptions [37],

v'(u_0, d) = inf_{x∈S_0} D_u L(x, λ(x), u_0) d.

Uniqueness of Lagrange multipliers for optimization problems in the form (1.2) was studied in [55]. In particular it is shown there that in the case of the semi-infinite programming problem (1.1) a discrete measure μ = Σ_{i=1}^m λ_i δ(τ_i) satisfying the first order optimality conditions (2.19) is unique if and only if the following conditions hold: (i) the gradient vectors ∇g(x_0, τ_i), i = 1,...,m, are linearly independent, and (ii) for any neighborhood N of the set {τ_1,...,τ_m} there exists h ∈ ℝ^n such that

h·∇g(x_0, τ_i) = 0,  i = 1,...,m,
h·∇g(x_0, τ) < 0,  τ ∈ Δ_0 \ N.

Let us suppose now that f(x, u) and g_τ(x, u) are twice continuously differentiable. Let x_0 be an optimal solution of (P) and consider the following auxiliary problem which involves second order information of (P_u) at (x_0, u_0),

(PQ_{d,h})   Min_w 2Df(x_0, u_0)(w, d) + D^2_{xx}f(x_0, u_0)(h, h)
             s.t. 2DG(x_0, u_0)(w, d) + D^2_{xx}G(x_0, u_0)(h, h) ∈ T(h),   (4.11)

where T(h) := T^2_K(G(x_0, u_0), D_x G(x_0, u_0)h). Note that the above second order tangent set is the same as the one used in proposition 3.1 and theorem 3.3 for the unperturbed problem (P). Suppose that the second order tangent set T(h) is nonempty. Then the (parametric) dual (see (2.3)) of (PQ_{d,h}) is

(DQ_{d,h})   Max_{λ∈Λ(x_0,u_0)} { 2D_u L(x_0, λ, u_0)d + D^2_{xx}L(x_0, λ, u_0)(h, h) - σ(λ, T(h)) }.   (4.12)

Theorem 4.3 Let x_0 be an optimal solution of (P) and suppose that: the directional constraint qualification holds at x_0, the set Λ(x_0, u_0) of Lagrange multipliers is nonempty, for every h ∈ C(x_0) the set K is second order regular at G(x_0, u_0) in the direction D_x G(x_0, u_0)h with respect to D_x G(x_0, u_0), the set K has a nonempty interior, the second order sufficient condition (3.7) holds, and (P_{u_0+td}) has an optimal solution x(t) converging to x_0 as t → 0+. Then v'(u_0, d) exists and

v'(u_0, d) = ½ inf_{h∈C(x_0)} val(DQ_{d,h}).   (4.13)

Page 139: Semi-Infinite Programming


The above theorem is a combination of results obtained in [8] and [12]. It extends a result due to Gauvin and Janin [20] for nonlinear programming problems. In the case of semi-infinite programming the sigma term in (4.12) can be calculated and the second order regularity of K can be verified in the same way as in section 3. Note that it follows from the second order regularity of K that T(h) is nonempty.

Let us make the following observations. Suppose that the second order condition

sup_{λ∈S(DL_d)} { D^2_{xx}L(x_0, λ)(h, h) - σ(λ, T(h)) } ≥ 0,  ∀ h ∈ C(x_0),   (4.14)

holds, where S(DL_d) is the set of optimal solutions of (DL_d). Then the infimum in the right hand side of (4.13) is attained at h = 0, and hence in that case (4.13) reduces to v'(u_0, d) = val(DL_d). In the convex case condition (4.14) holds automatically. Under the assumptions of theorem 4.3, the optimal solution x(t) is Hölder stable of degree 1/2, i.e. ‖x(t) - x_0‖ = O(t^{1/2}) for t ≥ 0. In order to ensure Lipschitzian stability of x(t), i.e. that perturbations of x(t) are of the same order as t > 0, some additional assumptions are required. We discuss this in the next section.

5 STABILITY AND SENSITIVITY OF OPTIMAL SOLUTIONS

In this section we discuss directional behavior of an optimal solution x(u) of the parameterized problem (P_u) in a given direction d ∈ U. It turns out that first order behavior of x(u) is closely related to second order analysis of the optimal value function v(u). Let x_0 be an optimal solution of the unperturbed problem (P). In the following theorem we give sufficient conditions for directional Lipschitzian stability of x(u).

Theorem 5.1 Let x(t) be an optimal solution of (P_{u_0+td}) converging to x_0 as t → 0+. Suppose that: (i) the directional constraint qualification holds at x_0 in the direction d, (ii) the linearized problem (PL_d) has an optimal solution h such that for t ≥ 0,

dist(G(x_0, u_0) + t DG(x_0, u_0)(h, d), K) = O(t^2),   (5.1)

Page 140: Semi-Infinite Programming


(iii) for every h ∈ C(x_0) the set K is second order regular at G(x_0, u_0) in the direction D_x G(x_0, u_0)h with respect to the linear mapping

M(h, t) := DG(x_0, u_0)(h, td),

(iv) the following strong form of second order sufficient conditions holds

sup_{λ∈S(DL_d)} { D^2_{xx}L(x_0, λ)(h, h) - σ(λ, T(h)) } > 0,  ∀ h ∈ C(x_0) \ {0}.   (5.2)

Then x(t) is Lipschitz stable at x_0, i.e. for t ≥ 0,

‖x(t) - x_0‖ = O(t).   (5.3)

Lipschitzian stability of optimal solutions of parameterized problems was discussed in various publications (see, e.g., [10] and references therein). The above formulation is due to [12]. The sufficient conditions of theorem 5.1, in a sense, are minimal assumptions required to ensure Lipschitzian stability of x(t). Assumption (ii) was introduced in [53]. It turns out that existence of an optimal solution of (PL_d) is a necessary condition for (5.3), [7]. Second order condition (5.2) is also minimal in the sense that the corresponding condition (4.14) is necessary for (5.3), [12],[51]. The second order regularity assumption (iii) is needed in order to use the second order tangent set T(h) in the sigma term of (5.2). It can be relaxed at the expense of enlarging the sigma term and hence strengthening (5.2) (see [12]). In particular, condition (iii) can be omitted if the sigma term is deleted from (5.2). Note that it follows from (iii) and (iv), respectively, that the sets T(h) and Λ(x_0, u_0) are nonempty.

We specify now the assumptions of theorem 5.1 for the case of semi-infinite programming. The linearized problem (PL_d) can be written in the form

Min_{h∈ℝ^n}  h·∇_x f(x_0, u_0) + d·∇_u f(x_0, u_0)
subject to  h·∇_x g_τ(x_0, u_0) + d·∇_u g_τ(x_0, u_0) ≤ 0,  τ ∈ Δ_0,   (5.4)

where Δ_0 corresponds to the set of constraints active at (x_0, u_0). The corresponding directional constraint qualification is given in (4.7). The optimization problem (5.4) is a linear semi-infinite programming problem. It can happen that this problem does not possess an optimal solution even if its optimal value is finite. As we mentioned earlier, in that case the optimal solution x(t) cannot be Lipschitz stable. If, however, Δ_0 is finite, then (5.4) becomes a linear programming problem and in that case it has an optimal solution provided its optimal value is finite.

Page 141: Semi-Infinite Programming


Suppose that (5.4) has an optimal solution h. Then condition (5.1) can be written in the form

sup_{τ∈T} [ g(x_0, u_0, τ) + t ( h·∇_x g(x_0, u_0, τ) + d·∇_u g(x_0, u_0, τ) ) ]_+ ≤ c t^2,   (5.5)

for some c > 0 and all t ≥ 0 sufficiently small. Note that h should satisfy the feasibility constraints of (5.4), i.e. (h, d)·∇g_τ(x_0, u_0) ≤ 0 for all τ ∈ Δ_0. Therefore if Δ_0 = T, for example, then (5.5) holds automatically. Suppose now that T is a compact subset of a normed space and that ∇_x g(x_0, u_0, ·) and ∇_u g(x_0, u_0, ·) are Lipschitz continuous on T. Let τ(t) be a maximizer, over T, of the function inside the brackets in the left hand side of (5.5). Then (5.5) holds if dist(τ(t), Δ_0) = O(t) for t ≥ 0. This, in turn, can be ensured ([52, Lemma 1]) by the following second order growth condition: there exist k > 0 and a neighborhood N of Δ_0 such that

-g(x_0, u_0, τ) ≥ k [dist(τ, Δ_0)]^2,  ∀ τ ∈ T ∩ N.   (5.6)

Let us also note that if the directional constraint qualification (4.7) holds, and hence there is no duality gap between (5.4) and its dual, and μ = Σ_{i=1}^m λ_i δ(τ_i) is an optimal solution of the dual of (5.4), then the set S(PL_d) of optimal solutions of (5.4) can be written in the form

S(PL_d) = { h feasible for (5.4) : h·∇_x g_{τ_i}(x_0, u_0) + d·∇_u g_{τ_i}(x_0, u_0) = 0, i = 1,...,m }.

In the following theorem we give sufficient conditions for directional differentiability of Lipschitz stable optimal solutions x(t).

Theorem 5.2 Let x(t) be an optimal solution of (P_{u_0+td}). Suppose that x(t) is Lipschitz stable, i.e. condition (5.3) is satisfied, that the directional constraint qualification holds at x_0 in the direction d, and that for every h ∈ S(PL_d) the set K is second order regular at G(x_0, u_0) in the direction DG(x_0, u_0)(h, d) with respect to the linear mapping D_x G(x_0, u_0). Then: (i) for t ≥ 0,

where val(Qd) is the optimal value of the problem

Page 142: Semi-Infinite Programming


and T_1(h, d) := T^2_K(G(x_0, u_0), DG(x_0, u_0)(h, d)).

(ii) every accumulation point h̄ of (x(t) - x_0)/t, as t → 0+, is an optimal solution of (Q_d); (iii) if, in addition, (Q_d) has a unique optimal solution h̄, then

lim_{t→0+} (x(t) - x_0) / t = h̄.   (5.8)

The above results were first obtained for nonlinear parametrized programs without the sigma term ([2],[51]), and then extended to semi-infinite programming in [9],[12]. It follows from (5.8) that, under the assumptions of the above theorem, an optimal solution of (P_u) is directionally differentiable at u_0 in the direction d provided the corresponding problem (Q_d) has a unique optimal solution. The sigma term of the problem (Q_d) can be calculated and the second order regularity of K can be verified in the same way as in section 3. Also in calculating the maximum with respect to the set S(DL_d) (of optimal solutions of the dual problem), one can consider the discrete measures only.

REFERENCES

[1] E. J. Anderson and P. Nash. Linear Programming in Infinite-Dimensional Spaces. Wiley, New York, 1987.

[2] A. Auslender and R. Cominetti. First and second order sensitivity analysis of nonlinear programs under directional constraint qualification conditions. Optimization, 21:351-363, 1990.

[3] A. Ben-Tal, M. Teboulle, and J. Zowe. Second order necessary optimality conditions for semi-infinite programming problems. In R. Hettich, editor, Lecture Notes in Control and Information Sciences 15, pages 17-30. Springer Verlag, Berlin, 1979.

[4] A. Ben-Tal. Second order and related extremality conditions in nonlinear pro­gramming. Journal of Optimization Theory and Applications, 31:143-165, 1980.

[5] A. Ben-Tal and J. Zowe. A unified theory of first and second order conditions for extremum problems in topological vector spaces. Mathematical Programming Study, 19:39-76, 1982.

[6] J. F. Bonnans. Directional derivatives of optimal solutions in smooth nonlinear programming. Journal Optimization Theory and Applications, 73:27-45, 1992.

[7] J. F. Bonnans and R. Cominetti. Perturbed optimization in Banach spaces, part I: a general theory based on a weak directional constraint qualification. SIAM J. Control and Optimization, 34:1151-1171, 1996.

Page 143: Semi-Infinite Programming

Optimality Conditions and Perturbation Analysis 131

[8] J. F. Bonnans and R. Cominetti. Perturbed optimization in Banach spaces II: a theory based on a strong directional qualification. SIAM J. Control and Opti­mization, 34: 1172-1189, 1996.

[9] J. F. Bonnans and R. Cominetti. Perturbed optimization in Banach spaces III: semi-infinite optimization. SIAM J. Control and Optimization, 34:1555-1567, 1996.

[10] J. F. Bonnans and A. Shapiro. Optimization problems with perturbations, a guided tour. SIAM Review, to appear.

[11] J. F. Bonnans, R. Cominetti, and A. Shapiro. Second order necessary and suf­ficient optimality conditions under abstract constraints. Rapport de Recherche INRIA 2952.

[12] J. F. Bonnans, R. Cominetti, and A. Shapiro. Sensitivity analysis of optimization problems under second order regular constraints. Rapport de Recherche INRIA 2989, 1996.

[13] J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. in preparation.

[14] R. Cominetti. Metric regularity, tangent sets and second order optimality condi­tions. Applied Mathematics and Optimization, 21:265-287, 1990.

[15] R. Cominetti and J.-P. Penot. Tangent sets to unilateral convex sets. C. R. Acad. Sci. Paris Ser. I Math. 321, 12:1631-1636, 1995.

[16] I. Ekeland and R. Temam. Analyse convexe et problemes variationnels. Collection Etudes Mathematiques. Dunod; Gauthier-Villars, Paris-Brussels-Montreal, Que., 1974.

[17] A. V. Fiacco. Introduction to Sensitivity and Stability Analysis in Nonlinear Pro­gramming. New York: Academic Press, 1983.

[18] J. Gauvin. A necessary and sufficient regularity condition to have bounded multi­pliers in nonconvex programming. Mathematical Programming, 12:136-138, 1977.

[19] J. Gauvin and J. W. Tolle. Differential stability in nonlinear programming. SIAM J. Control and Optimization, 15:294-311, 1977.

[20] J. Gauvin and R. Janin. Directional behaviour of optimal solutions in nonlinear mathematical programming. Mathematics of Operations Research, 13:629-649, 1988.

[21] B. Gollan. On the marginal function in nonlinear programming. Mathematics of Operations Research, 9:208-221, 1984.

[22] E. G. Gol'shtein. Theory of Convex Programming. Translations of Mathematical Monographs 36, American Mathematical Society, Providence, RI, 1972.

[23] R. Hettich and H. Th. Jongen. Semi-infinite programming: conditions of opti­mality and applications. In J. Stoer, editor. Optimization Techniques 2, pages 1-11. Springer, Berlin-Heidelberg-New York, 1978.

[24] R. Hettich and K. O. Kortanek. Semi-infinite programming: theory, methods and applications. SIAM Review, 35:380-429, 1993.

Page 144: Semi-Infinite Programming

132 CHAPTER 4

[25] A. D. Ioffe. On some recent developments in the theory of second order optimality conditions. In S. Dolecki editor. Optimization, Lecture Notes in Mathematics, pages 55-68, vol. 1405. Springer Verlag, Berlin, 1989.

[26] A. D. Ioffe. Variational analysis of a composite function: A formula for the lower second order epi-derivative. J. of Mathematical Analysis and Applications, 160:379-405, 1990.

[27] H. Th. Jongen and G. Zwier. On the local structure of the feasible set in semi-infinite optimization. In B. Brosowski, F. Deutsch, editors. Parametric Op­timization and Approximation, Int. Series of Num. Math., 72:185-202. Basel: Birkhauser, 1985.

[28] H. Kawasaki. An envelope-like effect of infinitely many inequality constraints on second-order necessary conditions for minimization problems. Mathematical Programming, 41:73-96, 1988.

[29] H. Kawasaki. The upper and lower second order directional derivatives of a sup­type function. Mathematical Programming, 41:327-339, 1988.

[30] H. Kawasaki. Second-order necessary and sufficient optimality conditions for min­imizing a sup-type function. Appl. Math. Opt., 26:195-220, 1992.

[31] D. Klatte. Stability of stationary solutions in semi-infinite optimization via the reduction approach. In W. Oettli, D. Pallaschke, editors. Advances in Optimiza­tion, pages 155-170. Springer, Berlin-Heidelberg-New York, 1992.

[32] D. Klatte. Stable local minimizers in semi-infinite optimization: regularity and second-order conditions. J. Comp. Appl. Math., 56:137-157, 1994.

[33] D. Klatte. On regularity and stability in semi-infinite optimization. Set- Valued Analysis, 3:101-111, 1995.

[34] D. Klatte and R. Henrion. Regularity and stability in nonlinear semi-infinite optimization. This volume.

[35] S. Kurcyusz. On the existence and nonexistence of Lagrange multipliers in Ba­nach spaces. Journal of Optimization Theory and Applications, 20:81-110, 1976.

[36] P.-J. Laurent. Approximation et Optimisation. Collection Enseignement des Sci­ences, No. 13. Hermann, Paris, 1972.

[37] F. Lempio and H. Maurer. Differential stability in infinite-dimensional nonlinear programming. Applied Mathematics and Optimization, 6:139-152, 1980.

[38] E. S. Levitin. Differentiability with respect to a parameter of the optimal value in parametric problems of mathematical programming. Kibernetika, pp. 44-59, 1976.

[39] E. S. Levitin. Perturbation Theory in Mathematical Programming. Wiley, Chich­ester, 1994.

[40] O. L. Mangasarian and S. Fromovitz. The Fritz John necessary optimality condi­tions in the presence of equality and inequality constraints. J. Math. Anal. Appl., 17:37-47, 1967.

Page 145: Semi-Infinite Programming

Optimality Conditions and Perturbation Analysis 133

[41] J. P. Penot. Optimality conditions for minimax problems, semi-infinite program­ming problems and their relatives. Report 92/16, UPRA, Laboratoire de Math. Appl., Av. de I'Universite, 64000 Pau, France.

[42] J. P. Penot. Optimality conditions in mathematical programming and composite optimization. Mathematical Programming, 67:225-245, 1994.

[43] B. N. Pshenichnyi. Necessary Conditions for an Extremum. Marcel Dekker, New York, 1971.

[44] S. M. Robinson. Regularity and stability for convex multivalued functions. Math­ematics of Operations Research, 1:130-143, 1976.

[45] S. M. Robinson. Stability theorems for systems of inequalities, Part II: differen­tiable nonlinear systems. SIAM J. Numerical Analysis, 13:497-513, 1976.

[46] S. M. Robinson. First order conditions for general nonlinear optimization. SIAM Journal on Applied Mathematics, 30:597-607, 1976.

[47] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.

[48] R. T. Rockafellar. Conjugate Duality and Optimization. Regional Conference Se­ries in Applied Mathematics, SIAM, Philadelphia, 1974.

[49] R. T. Rockafellar. Second-order optimality conditions in nonlinear programming obtained by way of epi-derivatives. Mathematics of Operations Research, 14:462-484, 1989.

[50] A. Shapiro. Second-order derivatives of extremal-value functions and optimal­ity conditions for semi-infinite programs. Mathematics of Operations Research, 10:207-219, 1985.

[51] A. Shapiro. Sensitivity analysis of nonlinear programs and differentiability prop­erties of metric projections. SIAM J. Control and Optimization, 26:628-645,1988.

[52] A. Shapiro. Perturbation analysis of optimization problems in Banach spaces. Numerical Functional Analysis and Optimization, 13:97-116, 1992.

[53] A. Shapiro. On Lipschitzian stability of optimal solutions of parametrized semi­infinite programs. Mathematics of Operations Research, 19:743-752, 1994.

[54] A. Shapiro. Directional differentiability of the optimal value function in con­vex semi-infinite programming. Mathematical Programming, Series A, 70:149-157, 1995.

[55] A. Shapiro. On uniqueness of Lagrange multipliers in optimization problems subject to cone constraints. SIAM J. Optimization, 7:508-518, 1997.

[56] C. Ursescu. Multifunctions with convex closed graph. Czechoslovak Mathematical Journal, 25:438-441, 1975.

[57] W. Wetterling. Definitheitsbedingungen für relative Extrema bei Optimierungs- und Approximationsaufgaben. Numer. Math., 15:122-136, 1970.

[58] J. Zowe and S. Kurcyusz. Regularity and stability for the mathematical programming problem in Banach spaces. Applied Mathematics and Optimization, 5:49-62, 1979.

Page 146: Semi-Infinite Programming

5 EXACT PENALTY FUNCTION METHODS FOR NONLINEAR SEMI-INFINITE PROGRAMMING

Ian D. Coope and Christopher J. Price

Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
Email: [email protected], [email protected]

ABSTRACT

Exact penalty function algorithms have been employed with considerable success to solve nonlinear semi-infinite programming problems over the last sixteen years. The development of these methods is traced from the perspective of standard nonlinear programming algorithms. The extension of standard theory to the semi-infinite case is illustrated through simple examples and some of the theoretical and computational difficulties are highlighted.

1 INTRODUCTION

In this paper, the development of Exact Penalty Function (EPF) algorithms for solving nonlinear semi-infinite programming (SIP) problems is traced from the perspective of standard nonlinear programming algorithms and theory. Exact penalty functions (particularly the L1-Penalty function) are frequently recom­mended for finite Non-Linear Programming (NLP) problems but there are some difficulties in applying the now standard techniques of NLP to the SIP case. The extension of standard theory to the semi-infinite case is illustrated through simple examples and some of the theoretical and computational difficulties are highlighted. Some of the more successful penalty function algorithms for SIP are contrasted in Sections 2 and 3 and the substantial computational task of computing values of the appropriate penalty function is addressed in Section 4.



Page 147: Semi-Infinite Programming


1.1 The SIP problem

The problem considered in this paper is

min_x f(x)  subject to  g(x, t) ≤ 0,  ∀ t ∈ T ⊂ ℝ^p.   (1.1)

Here the objective function, f : ℝ^n → ℝ, and the constraint function g, mapping ℝ^n × T into ℝ, are both continuously differentiable in all arguments. The set T is compact, connected, and defined by a finite number of continuously differentiable constraints which satisfy an appropriate constraint qualification. Frequently T is a Cartesian product of intervals, e.g. T = [0,1] × [0,1] for p = 2, but in general it may be more complex. For convenience the problem has been restricted to one semi-infinite constraint, and auxiliary constraints have been omitted.

Example 1.1 (Circle constraint) In this example, the semi-infinite constraint is simply a parametric form of the unit circle.

min{x_1 + 2x_2}  subject to  x_1 cos t + x_2 sin t ≤ 1,  ∀ t ∈ [0, 2π].   (1.2)

Here x occurs linearly so this corresponds to a linear semi-infinite programming problem. If the set T is approximated by discretization, T = {t_1, t_2, ..., t_m}, where t_j = 2πj/(m+1), j = 1, 2, ..., m, then the following approximating (finite) linear programming problem is obtained.

min{x_1 + 2x_2}  subject to

[ cos t_1  sin t_1 ]         [ 1 ]
[ cos t_2  sin t_2 ]   x  ≤  [ 1 ]
[   ...      ...   ]         [...]
[ cos t_m  sin t_m ]         [ 1 ]

For example, when m = 100, the solution to this linear program is x = [-0.473, -0.882]^T, where two of the linear constraints are active. However, simple algebra reveals that the true solution is x* = [-cos t*, -sin t*]^T, where tan t* = 2 (i.e. x* = [-0.447, -0.894]^T to 3 d.p.). The point t* = -2.0344 (= atan2(-2, -1) in Fortran notation) is the only active point for this problem, where active points are defined to be values of t for which g(x*, t) = 0. An important point to note is that, even for this very simple problem, where the only active point is easily identified, the semi-infinite constraint cannot be replaced by the single linear constraint g(x, t*) = 0 because the resulting linear

Page 148: Semi-Infinite Programming


programming problem has infinitely many solutions, only one of which solves the original problem. The semi-infinite constraint furnishes (implicitly) a second equation g_t(x, t*) = 0 which determines the solution uniquely.
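As a small aside (not part of the original example), the discretized linear program above is easy to reproduce numerically; the sketch below assumes NumPy and SciPy are available and simply feeds the m = 100 grid constraints to a generic LP solver.

    import numpy as np
    from scipy.optimize import linprog

    m = 100
    t = 2 * np.pi * np.arange(1, m + 1) / (m + 1)          # grid t_j = 2*pi*j/(m+1)
    A = np.column_stack([np.cos(t), np.sin(t)])            # rows: cos t_j, sin t_j
    res = linprog(c=[1.0, 2.0], A_ub=A, b_ub=np.ones(m),
                  bounds=[(None, None)] * 2)               # min x1 + 2*x2 on the grid
    print(res.x)                                           # close to [-0.473, -0.882]
    print([-1 / np.sqrt(5), -2 / np.sqrt(5)])              # true solution [-0.447, -0.894]

Refining the grid moves the discretized solution toward the true one but, as the text emphasizes, no finite grid reproduces the implicit equation g_t(x, t*) = 0 exactly.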

This simple problem illustrates that even linear semi-infinite programming problems are inherently non-linear. Clearly problem (1.2) is equivalent to

min{x_1 + 2x_2}  subject to  h(x) ≤ 0,

where h(x) ≡ max_{t∈[0,2π]}{x_1 cos t + x_2 sin t} - 1. Also, g(x, t) achieves its maximum value (for given x) when g_t(x, t) = 0, which occurs when tan t = x_2/x_1 (t = atan2(x_2, x_1)). Algebraically this then leads to the nonlinear constraint

h(x) ≡ √(x_1^2 + x_2^2) - 1 ≤ 0,

which shows the equivalence of this linear SIP to the non-linear programming problem

min{x_1 + 2x_2}  subject to  √(x_1^2 + x_2^2) - 1 ≤ 0.   (1.3)

This example shows that when viewed from the perspective of nonlinear programming there may not be much to be gained by treating linear SIPs differently from nonlinear SIPs. However, the next example shows that linearity can sometimes be helpful in determining the number of active points in a SIP.
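For completeness (again an editorial illustration rather than material from the chapter), the equivalent nonlinear program (1.3) can be handed to any standard NLP solver; the sketch below assumes SciPy's SLSQP method and a feasible starting point.

    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: x[0] + 2 * x[1]                              # objective of (1.3)
    con = {'type': 'ineq',
           'fun': lambda x: 1.0 - np.hypot(x[0], x[1])}        # h(x) <= 0, written as 1 - ||x|| >= 0
    res = minimize(f, x0=np.array([-0.5, -0.5]),
                   constraints=[con], method='SLSQP')
    print(res.x)                                               # approx [-0.447, -0.894]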

Example 1.2 (One-sided approximation) One-sided approximation of functions by polynomials, p(x, t) = Σ_{i=1}^n x_i t^{i-1}, provides a useful source of SIP problems; the following example results from seeking such an approximation to tan t.

min_x ∫_0^1 |p(x, t) - tan t| dt   subject to   tan t - p(x, t) ≤ 0,  t ∈ [0, 1].

In this example the semi-infinite constraint can be used to remove the modulus signs and the resulting integral is then easily evaluated to give the equivalent problem:

min_x Σ_{i=1}^n x_i / i ≡ f(x)   (1.4)

Page 149: Semi-Infinite Programming


subject to g(x, t) ≡ tan t - Σ_{i=1}^n x_i t^{i-1} ≤ 0.

This example is also linear in x. Letting x* denote the solution, suppose that there are m* active points {t_j*}_{j=1}^{m*} at x*. Then g(x*, t_j*) = 0, j = 1, 2, ..., m*, and if these m* linear constraints are linearly independent then m* ≤ n. Since g(x*, t) is maximized at t_j*, either ∇_t g(x*, t_j*) = 0 or t_j* is a boundary point of T. For each interior active point t_j*, in addition to g(x, t_j*) = 0, there is an extra nonlinear equation, ∇_t g(x, t_j*) = 0, that x* must satisfy because each t_j* is a global maximizer of g. Thus if x* is unique it can be expected that there will be at least n/2 active points at the solution to this problem (supposing also that ∇_{tt} g(x*, t_j*) is invertible). In fact, when n is even m* = n/2 + 1 (with both boundary points active) and when n is odd m* = (n + 1)/2 (with one boundary point active), and the active points are related to shifted Lobatto or Radau abscissas [4]. This makes this problem particularly useful as a test problem for SIP algorithms. This problem is also a linear SIP problem and, as in the case of Example 1.1, replacing the semi-infinite constraint by a finite set of constraints at the appropriate active points would yield a linear programming problem with insufficient constraints to define a unique solution.

The two examples above serve to show that even if the active points for a given problem were known in advance it would still not be enough to replace the SIP with the non-linear program

min f(x) subject to g(x, t_j*) ≤ 0, j = 1, 2, ..., m*.

In order to make further progress via standard nonlinear programming theory it is useful to explore the consequences of optimality conditions for (regular) SIP problems.

1.2 Optimality conditions

Let x* ∈ ℝ^n be feasible and let there exist points t_i* ∈ ℝ^p, i = 1, 2, ..., m*, satisfying

g(x*, t_i*) = 0,  i = 1, 2, ..., m*,

and non-negative multipliers λ_i, i = 1, 2, ..., m*, such that

∇f(x*) + Σ_{i=1}^{m*} λ_i ∇g(x*, t_i*) = 0.

Page 150: Semi-Infinite Programming


Then x* is defined to be a Stationary Point of the SIP.

Now let x* be a local minimizer of SIP and let A(x, η) denote the set

A(x, η) = {t ∈ T : g(x, t) ≥ -η and t is a local maximiser of g(x, t) over T}   (1.5)

then the following assumption is made:

Assumption 1.3 The gradients {∇_x g(x*, t), ∀ t ∈ A(x*, 0)} are linearly independent.

Under this assumption the set A(x*, 0) is necessarily finite and x* is necessarily a stationary point of SIP. Of course, any local minimizer of a SIP problem is necessarily feasible. Hence either the set A(x*, 0) (of active points) is also the set of global maximizers of g(x*, t) for t ∈ T or the semi-infinite constraint is redundant.

Optimality conditions can be derived under much weaker constraint qualifications than assumed here (see, for example, [16] and the references therein), but from an algorithmic point of view there are disadvantages that occur when the optimality conditions assumed above do not hold at a local minimizer of SIP. At best this may be manifested in slow convergence of an algorithm and at worst false convergence or even the breakdown of an algorithm.

The optimality conditions described under the assumptions of this section allow a new interpretation of a SIP problem, at least locally, as a finite non-linear programming problem as follows.

1.3 Local approximation of a SIP problem

Let x* denote a solution to problem (1.1) with A(x*, 0) = {t_1*, t_2*, ..., t_{m*}*} the set of active points. If each global maximizer t_i* of g(x*, ·) satisfies second order sufficiency conditions then there exists ε > 0 and continuous functions t_i(x) such that t_i(x*) = t_i* for all i, and each t_i(x) ∈ A(x, ∞) for all x satisfying ‖x - x*‖ < ε. In this case the semi-infinite constraint can be replaced locally by the finite (but in general non-linear) constraints h_i(x) ≤ 0, i = 1, 2, ..., m*, where

h_i(x) ≡ g(x, t_i(x)),  i = 1, 2, ..., m*.   (1.6)

For the case of Example 1.1 there is only one active point t_1* and t_1(x) =

Page 151: Semi-Infinite Programming


tan^{-1}(x_2/x_1), and the SIP problem (1.2) is certainly locally equivalent to the NLP problem (1.3). At any other point x^u where each member of A(x^u, η) satisfies second order sufficiency conditions for some η > 0, functions t_i^u(x) satisfying appropriate conditions can be found which allow the semi-infinite constraint to be locally replaced in a manner similar to (1.6).

The introduction of constraint functions defined in (1.6) makes it possible to write optimality conditions for SIP problems in terms of a Lagrangian Function,

L(x, λ) = f(x) + Σ_{i=1}^{m*} λ_i h_i(x).   (1.7)

Here the h_i(x) functions are defined in (1.6). The conditions for a stationary point can be written

∇_x L(x*, λ*) = 0;
λ_i* ≥ 0,  h_i(x*) ≤ 0,  λ_i* h_i(x*) = 0,  i = 1, 2, ..., m*,   (1.8)

which compares with the optimality conditions for a more standard NLP problem.

The above arguments show that if m* is known then, at least locally, the SIP problem can be reduced to solving the nonlinear equations

∇f(x) + Σ_{i=1}^{m*} λ_i ∇h_i(x) = 0,
h_i(x) = 0,  i = 1, 2, ..., m*,

which is a system of n + m* nonlinear algebraic equations in n + m* unknowns, (x, λ). Given a sufficiently accurate starting point, these equations can then be solved by Newton's Method with (usually) a second order rate of convergence. This is essentially the approach taken in [11], [15], but it depends on having good initial approximations (at least good enough that the number of local maximizers of g corresponds to the number of active points m* at the solution). Of course, good starting points are rarely easily available and successful implementations usually form the last phase of a two or even three phase approach [12].
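For Example 1.1 this stationarity system has n + m* = 3 unknowns (x_1, x_2, λ_1), and the sketch below (an editorial illustration, using SciPy's general-purpose root finder rather than a hand-coded Newton iteration) solves it from a nearby starting point.

    import numpy as np
    from scipy.optimize import fsolve

    def stationarity(z):
        # Equations grad f + lam*grad h = 0 and h = 0 for Example 1.1,
        # with f(x) = x1 + 2*x2 and h(x) = sqrt(x1^2 + x2^2) - 1.
        x1, x2, lam = z
        r = np.hypot(x1, x2)
        return [1.0 + lam * x1 / r,
                2.0 + lam * x2 / r,
                r - 1.0]

    z0 = np.array([-0.5, -0.8, 1.0])          # a "sufficiently accurate starting point"
    x1, x2, lam = fsolve(stationarity, z0)
    print(x1, x2, lam)                        # approx -0.447, -0.894, 2.236 (= sqrt(5))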

Page 152: Semi-Infinite Programming


2 EXACT PENALTY FUNCTIONS FOR SEMI-INFINITE PROGRAMMING


In NLP the dependence on requiring good initial approximations is often successfully removed by employing exact penalty functions to force convergence from remote starting points. The basic underlying difficulty in extending the successful EPF algorithms of NLP to the semi-infinite programming case is to replace the infinitely many constraints of the latter by an appropriate finite number of constraints. For technical and practical reasons (outlined in [32], [33] and discussed further in Section 4) it is convenient to focus attention on all local maximizers of g(x, t) that exceed a slightly negative threshold -η, rather than just the members of A(x, 0), where A(x, η) is defined in equation (1.5) as the set of local maximisers of g which take values not less than -η. For a given η > 0 let A(x^u, η) = {t_1^u, t_2^u, ..., t_m^u}, and assume that a finite set of functions {t_i^u(x)} exists such that each t_i^u(x) is continuous at x^u. Also assume that t_i^u(x^u) = t_i^u for all i and that t_i^u(x) ∈ A(x, ∞) for all i and all x near x^u. Strictly, t_i^u(x) and m(x^u) also depend on η but this dependence is kept implicit for compactness of notation. The semi-infinite constraint, g(x, t) ≤ 0, ∀ t ∈ T, can now be replaced by the (finitely many) constraints:

h_i^u(x) ≡ g(x, t_i^u(x)) ≤ 0,  i = 1, 2, ..., m(x^u).   (2.1)

Provided that the complications that may arise when m(x^u) ≠ m* can be handled, this allows the successful exact penalty function methods of NLP to be applied to the SIP problem. Although the h_i^u(x) functions are important for theoretical purposes, practical algorithms almost always evaluate each h_i^u(x) and its derivatives only at the point x = x^u. Accordingly x ≡ x^u has been used in the remainder of the paper, and the 'u' superscripts have been suppressed for compactness of notation.

2.1 L1 penalty functions

One of the most recommended penalty functions for NLP is that based on the L1 norm of the constraint violations. The analogue for the SIP case is the penalty function

P_1(x, μ) = f(x) + μ Σ_{i=1}^{m(x)} [h_i(x)]_+,   (2.2)

where [h]_+ denotes the maximum of h and 0. This penalty function has been used successfully [7], [33], to solve problems with dim T ≤ 2.

Page 153: Semi-Infinite Programming


An unfortunate disadvantage with this penalty function is that it may be discontinuous! Although this potential disadvantage caused no difficulties on the problems reported in [7], [33], examples of SIP problems for which discontinuities may cause difficulties can be found in [23], [30]. These discontinuities can only occur at infeasible points, but the consequent invalidity of convergence results for such problems makes this penalty function inadvisable for general SIP problems.

An alternative L1 penalty function is considered in [5] (see also the earlier work [21]),

P̂_1(x, μ) = f(x) + μ [ ∫_{t∈T} [g(x, t)]_+ dt ] / [ ∫_{t∈T} sgn[g(x, t)]_+ dt ].   (2.3)

In principle this is just a generalization of the L1 exact penalty function for NLP, but the normalising denominator integral is essential if P̂_1 is to be an exact penalty function [5]. In practice, the accurate evaluation of the integrals makes this less attractive than the L∞ penalty functions that follow, but it does have the advantage of not assuming a finite number of global maximizers of g at each iterate. Unfortunately, it suffers the same disadvantage as P_1 of having potential discontinuities at infeasible points.

2.2 L∞ penalty functions

An exact penalty function for SIP which does not suffer the disadvantage of possible discontinuities at infeasible points is the L∞ EPF

P_∞(x, μ) = f(x) + μ max_{t∈T} [g(x, t)]_+.   (2.4)

Although less popular in the NLP context this EPF is certainly preferable for SIP problems. A variation on the EPF (2.4) is considered in [23], [24],

P_∞(x, μ, ν) = f(x) + μ θ(x) + ½ ν θ^2(x),   (2.5)

where θ(x) = max_{t∈T} [g(x, t)]_+ and ν ≥ 0 is an extra penalty parameter. Clearly, when ν = 0 the two penalty functions (2.4) and (2.5) are identical. The main advantage in using ν > 0 is that it allows smaller choices of μ to be chosen on many problems. Other advantages are discussed in [6], [24].
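As an editorial illustration of a single evaluation of (2.4) or (2.5), the sketch below evaluates the L∞ EPF for Example 1.1 by approximating max_{t∈T} [g(x, t)]_+ on a fine grid; the grid evaluation is only a stand-in for the multi-local optimization discussed in Section 4, and the function name p_inf is made up for the example.

    import numpy as np

    def p_inf(x, mu, nu=0.0, ngrid=2001):
        # L-infinity EPF (2.4)/(2.5) for Example 1.1, with theta(x) = max_t [g(x,t)]_+
        # approximated on a uniform grid over T = [0, 2*pi].
        t = np.linspace(0.0, 2 * np.pi, ngrid)
        g = x[0] * np.cos(t) + x[1] * np.sin(t) - 1.0
        theta = max(g.max(), 0.0)
        return x[0] + 2 * x[1] + mu * theta + 0.5 * nu * theta**2

    x_star = np.array([-1.0, -2.0]) / np.sqrt(5)
    print(p_inf(x_star, mu=3.0))                   # feasible: equals f(x*) = -sqrt(5)
    print(p_inf(np.array([-1.0, -2.0]), mu=3.0))   # infeasible: the penalty term is active

Here μ = 3 exceeds the multiplier λ* = √5 ≈ 2.24 computed earlier for this example, in keeping with the exactness requirement on μ discussed in the next subsection.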

Page 154: Semi-Infinite Programming


2.3 EPF properties

Each of the penalty functions (2.2), (2.3), (2.4), (2.5) is exact in the sense that for sufficiently large values of μ local minimizers of the SIP are local minimizers of the penalty function. For example, the penalty functions (2.4), (2.5), are exact provided that

μ ≥ Σ_{i=1}^{m*} λ_i*,   (2.6)

where λ* is the vector of Lagrange multipliers defined by (1.8). Thus it is convenient to replace the SIP problem with that of finding a minimizer of an appropriate EPF, P(x, μ). Of course, standard methods for unconstrained optimization are not appropriate because P(x, μ) is not differentiable. Derivative discontinuities occur where h_i(x) = 0 (e.g. at the solution). However, directional derivatives exist wherever f and h are differentiable and this is enough to form a framework for forcing convergence in appropriate line search or trust region algorithms.

3 TRUST REGION VERSUS LINE SEARCH ALGORITHMS

In unconstrained optimization the choice between line search or trust region algorithms often depends on the availability of derivatives. If second derivatives of the objective function are easily available then trust region algorithms are often the preferred choice because stronger convergence results have been established in this case. For SIP problems the most time consuming part of any EPF algorithm is the evaluation of the penalty function and this important aspect is considered in Section 4. Because second derivative methods usually require considerably fewer iterations than first derivative methods they naturally require fewer evaluations of the EPF. However, second derivatives are tedious to provide. This is especially the case for SIP problems because it is the second derivative matrix of the Lagrangian function (1.7) that is important for fast convergence. Unfortunately, h_i(x) depends implicitly on t_i(x) through equations (2.1) and this implicit dependence carries through to the derivatives. This presents no difficulty for first derivatives because ∇h_i(x) = ∇_x g(x, t_i) evaluated at t_i = t_i(x), but second derivatives ∇_{xx}h_i(x) depend on ∇_{xt}g(x, t_i) and ∇_{tt}g(x, t_i). Moreover, the dependence on these latter derivatives is complicated by the type of local maximizer that t_i(x) represents (strictly interior of T or on one or more boundaries of T). Formulas for the cases dim T = 1 and dim T = 2 are given in [33]. Because of these extra complications, which become more severe as dim T increases, it is much more convenient to apply

Page 155: Semi-Infinite Programming


first derivative methods, and in particular Quasi-Newton methods can automatically take account of the implicit dependence. Specific EPF algorithms are considered in the next three subsections.

3.1 Line search algorithms

In algorithms of this type, the first step is to compute a direction of descent, d, for the EPF at the current approximation. Thus d is chosen to satisfy D_d P(x, μ) < 0, where

D_d P(x, μ) = lim_{γ→0+} [P(x + γd, μ) - P(x, μ)] / γ

is the directional derivative of the EPF in the direction d. Then a line search is performed in order to find a new approximation, x_+ = x + αd, which sufficiently decreases the value of the EPF by satisfying a condition of the form:

P(x + αd, μ) ≤ P(x, μ) + c α D_d P(x, μ),  c ∈ (0, 1).

Typically the search direction d solves the quadratic programming (QP) subproblem

min_d  d^T ∇f(x) + ½ d^T H d   subject to   h_i(x) + d^T ∇h_i(x) ≤ 0,  i = 1 : m(x).

Early second derivative algorithms based on L1 EPFs can be found in [5], [7], and [33].

For a superlinear rate of convergence the matrix H should be the Lagrangian Hessian ∇^2 L(x, λ) defined from (1.7), or an appropriate approximation. However, if H is not positive definite then d may not be a descent direction. This can be countered by using, for example,

H ≈ ∇^2 L(x, λ) + τ I,

where τ ≥ 0 is chosen to force positive definiteness as in [33], or more subtly as in [7]. Sometimes the line search is replaced by a quadratic arc search, x_+ = x + αd + α^2 s, where s is a projection step to prevent loss of the superlinear rate of convergence through the so-called Maratos Effect as in [23], [24].
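The following sketch (an editorial illustration only, reusing the single-constraint data of Example 1.1 and SciPy's SLSQP solver for the QP subproblem) shows the basic loop of a line search EPF method with H fixed to the identity; a practical code would update H toward the Lagrangian Hessian and use a more careful line search.

    import numpy as np
    from scipy.optimize import minimize

    # Example 1.1 data: f(x) = x1 + 2*x2, single constraint h(x) = ||x|| - 1.
    f      = lambda x: x[0] + 2 * x[1]
    grad_f = lambda x: np.array([1.0, 2.0])
    h      = lambda x: np.hypot(x[0], x[1]) - 1.0
    grad_h = lambda x: x / np.hypot(x[0], x[1])
    epf    = lambda x, mu: f(x) + mu * max(h(x), 0.0)          # (2.4) with one local maximizer

    def descent_direction(x, H):
        # QP subproblem: min 0.5 d'Hd + d'grad_f  s.t.  h(x) + d'grad_h(x) <= 0.
        obj = lambda d: 0.5 * d @ H @ d + d @ grad_f(x)
        con = {'type': 'ineq', 'fun': lambda d: -(h(x) + d @ grad_h(x))}
        return minimize(obj, np.zeros(2), constraints=[con], method='SLSQP').x

    x, mu, H = np.array([-0.2, -0.3]), 3.0, np.eye(2)
    for _ in range(40):
        d = descent_direction(x, H)
        alpha = 1.0
        while epf(x + alpha * d, mu) > epf(x, mu) and alpha > 1e-12:
            alpha *= 0.5                                       # crude backtracking on the EPF
        x = x + alpha * d
    print(x)                                                   # drifts toward [-0.447, -0.894]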

Page 156: Semi-Infinite Programming


3.2 Trust region algorithms

As an alternative to line search algorithms, trust region methods also force convergence by requiring descent of the EPF, but now x_+ = x + d where, typically, for the L1 EPF (2.2) (see e.g. [32]), d minimizes:

d^T ∇f(x) + ½ d^T H d + μ Σ_{i=1}^{m(x)} [h_i(x) + d^T ∇h_i(x)]_+

subject to

‖d‖ ≤ Δ.   (3.1)

For the case of the L∞ EPF (2.4) this subproblem can be replaced by (see e.g. [30])

d^T ∇f(x) + ½ d^T H d + μ max_i [h_i(x) + d^T ∇h_i(x)]_+

subject to the trust region constraint (3.1). These subproblems defining the correction d are similar in complexity to the QP subproblems of line search algorithms above. In algorithms of this type, descent of the EPF is forced by adjusting the size of the trust region (3.1) appropriately. As with line search methods the condition H ≈ ∇^2 L(x, λ) is required for a superlinear rate of convergence, but now there is no longer a requirement on positive definiteness of H to guarantee descent.

3.3 Quasi-Newton algorithms

More recent EPF based algorithms for SIP employ quasi-Newton (Q-N) approximations to ∇^2 L(x, λ). As previously mentioned this avoids some of the complications that arise in second derivative methods but with some loss of efficiency. The now well-known deficiency of the L1 EPF (2.2) for SIP problems has focussed more attention on the L∞ EPFs (2.4), (2.5). Examples of these methods can be found in [2], [22] and [24] where the BFGS updating formula is used to provide Q-N approximations for ∇^2 L. In [24] a descent direction is calculated by solving an L∞QP subproblem of the form

min_d  d^T ∇f(x) + ½ d^T H d + μ ζ(d) + ½ ν ζ^2(d),

where ζ(d) = max_{t∈E(x)} [g(x, t) + d^T ∇g(x, t)]_+.

This is equivalent to a strictly convex QP subproblem and the solution is guaranteed to define a descent direction for the EPF (2.5) provided that μ is sufficiently large [22]. This is ensured automatically by adjustment of the penalty parameters, if necessary. Numerical results can be found in [24], which includes examples where dim T = 6.

Page 157: Semi-Infinite Programming


4 THE MULTI-LOCAL OPTIMIZATION SUBPROBLEM

Exact penalty function methods such as [7], [23], [29], [30], [32], [33], [34], require that the EPF be evaluated at each iterate and at each trial point. The calculation of an L∞ EPF at a point x_0 requires the solution of the Global Optimization Problem (GOP)

max_{t∈T} [g(x_0, t)]_+.   (4.1)

In order to ensure convergence under the appropriate conditions, algorithms based on an L∞ EPF must do more than solve (4.1). All members of the set

B(x_0, η) = { t ∈ M(x_0) : g(x_0, t) ≥ max_{t∈T} [g(x_0, t)]_+ - η }   (4.2)

must be found for some pre-specified positive η, where M(x_0) is the set of local maximizers of g(x_0, t) over T. If an L1 EPF is employed then the additional condition η > max_{t∈T} [g(x_0, t)]_+ is needed to ensure convergence. The problem (4.2) of finding all the global and near global maximizers is referred to as the Multi-Local Optimization Problem (MLOP). This section discusses the solution of the sequence of MLOPs generated during the solution of a SIP problem by an EPF based SIP algorithm.

One common method, [7], [29], [30], [33], [34], for solving the MLOP is to find coarse approximations to the maxima by comparing function values on a uniform mesh, and then to refine these approximations using a local search method such as Newton's method. In the context of global optimization, this method has a number of areas in which improvements are often sought. First, there is no attempt made to search more thoroughly those parts of T most likely to contain members of B. Second, the number of function evaluations is exponential in p. Third, if g(x, t) is nearly constant along some axes of the mesh in T, then most function evaluations will be redundant. Finally, there is no guarantee that all, or indeed any, of the global and near global maximizers will be found.
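A minimal sketch of this mesh-and-refine strategy for a one-dimensional T is given below (an editorial illustration; the routine name mlop, the sample slice g_slice and its two-peak shape are made up for the example, and SciPy's bounded scalar minimizer stands in for the Newton refinement).

    import numpy as np
    from scipy.optimize import minimize_scalar

    def mlop(g, a, b, eta, nmesh=200):
        # Approximate B(x0, eta) of (4.2) for T = [a, b]: compare g on a uniform mesh,
        # refine each coarse peak locally, and keep maximizers within eta of the global value.
        t = np.linspace(a, b, nmesh)
        v = np.array([g(ti) for ti in t])
        peaks = [i for i in range(nmesh)
                 if (i == 0 or v[i] >= v[i - 1]) and (i == nmesh - 1 or v[i] >= v[i + 1])]
        refined = []
        for i in peaks:
            lo, hi = t[max(i - 1, 0)], t[min(i + 1, nmesh - 1)]
            res = minimize_scalar(lambda s: -g(s), bounds=(lo, hi), method='bounded')
            refined.append((res.x, -res.fun))
        gmax = max(val for _, val in refined)
        return [(ti, val) for ti, val in refined if val >= max(gmax, 0.0) - eta]

    g_slice = lambda s: 0.3 * np.sin(6 * np.pi * s) - 0.2 * (s - 0.5) ** 2   # hypothetical slice g(x0, .)
    print(mlop(g_slice, 0.0, 1.0, eta=0.1))        # reports the three interior peaks

The exponential growth of mesh points with p and the possibility of a peak slipping between mesh nodes are exactly the weaknesses listed above.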

Guaranteed solutions are obviously desirable, but, as is shown in the following subsection, these are even more difficult to obtain for the MLOP than for the corresponding GOP.

Page 158: Semi-Infinite Programming


4.1 Methods which give bounds on the global maximum

Several methods exist which use Lipschitz conditions [8], [17], [31], [35], or inclusion functions [18], [25], to provide guaranteed error bounds on the global maximal value of g(x, ·) on T. These methods explicitly or implicitly provide a sequence of piecewise continuous upper bounds {uₙ(t)} on g(x, t) and either a sequence of lower bounds {sₙ} on the maximal value of g(x, t) on T, or a sequence of piecewise continuous lower bounds {lₙ(t)} on g(x, t). If the bounds are to be of some use then lₙ(t) < uₙ(t) must hold for some t ∈ T; otherwise lₙ(t) = uₙ(t) = g(x, t), which provides no useful information. Usually lₙ(t) < uₙ(t) holds for every t ∈ T except values of t at which g has been explicitly calculated. Such bounds do not provide any guarantees on the number and values of the local maxima. The situation is unchanged if the sequence of bounds {sₙ} is used in place of {lₙ(t)} unless sₙ = maxₜ uₙ(t) for some n; an event which is exceptionally rare in practice.

Guarantees about the local maximizers could be found, for example, by using one of these methods to find the stationary points of g by minimizing ‖∇_t g‖², but this is usually much more difficult than finding the global maximum of g(x, t), because every stationary point of g(x, t) is a global minimizer of ‖∇_t g‖².

These methods are generally slower than types which do not provide an upper bound on the global maximum. Moreover, an appropriate inclusion function or Lipschitz constant is not always available for each g(x(k), t). In the absence of a guarantee, other more efficient types of global optimization algorithms appeal.

4.2 Advantages of solving a sequence of MLOPs

If the method used to solve the MLOP does not guarantee that all members of B(x, η) will be found, then the reliability of the MLOP algorithm may be increased by making use of information from previous multi-local optimizations. One simple method of doing so is to track the peaks of g(x, ·) using, say, local searches starting from each global or near global maximizer from the previous MLOP call. Alternatively, adapting an idea from [30], a first order prediction of the position t + δt, at the point x + δx, of a maximizer t of g(x, ·) can be found using δt = −(∇_tt g)⁻¹(∇_xt g) δx. A local search could then be started from the predicted position of each maximizer of g at x + δx.
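A hedged sketch of this first-order tracking prediction (adapted from the idea attributed to [30]); here g_tt and g_xt stand for the Hessian of g with respect to t and the mixed second derivative, both evaluated at the current (x, t), and are assumed to be supplied by the caller.

```python
import numpy as np

def predict_maximizer_shift(g_tt, g_xt, dx):
    """Return dt ~ -(g_tt)^{-1} (g_xt dx) for a maximizer t of g(x, .)."""
    return -np.linalg.solve(g_tt, g_xt @ dx)
```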

Page 159: Semi-Infinite Programming


The use of a tracking process increases the reliability of the SIP algorithm in two ways. First, it reduces the chance that a peak will occasionally be missed by the MLOP algorithm, a situation which can easily lead to a failure of the SIP algorithm. Second, once a strictly positive maximum is detected, the SIP algorithm will try and force it to become non-positive in subsequent iterations, usually making the peak smaller and more difficult to detect if a tracking process is not used.

The use of a tracking process is especially advantageous for a stochastic MLOP algorithm. In later iterations when g is not changing much from iteration to iteration, the probability of detecting the peak increases with the number of multi-local optimizations. One can, in effect, amortize a multi-local optimization with kN sample points over k multi-local optimizations, each using N sample points.

The fact that x and hence g(x, ·) are only changing slightly in later iterations offers scope to reduce the computational effort put into each multi-local optimization by ignoring parts of T on which g(x, ·) is large and negative. This carries some risk in that trying to minimize f when the semi-infinite constraint is active often tends to increase g in parts of T which are not being monitored.

4.3 Clustering methods

Clustering methods are a natural choice for solving the MLOP, because, unlike many other algorithms, they solve a global optimization problem by treating it as a multi-local problem. Clustering methods typically have two phases: the first concentrates the sample points around the maximizers, and the second groups the sample points into clusters. Two sample points are placed in the same cluster iff they are believed to lie in the region of attraction of the same local maximizer. A local search is then performed from the highest sample point in each cluster, and the highest local maximizer found by these searches is taken as a global maximizer.

One clustering algorithm that has been shown to be particularly effective is Multi-Level Single Linkage (MLSL) [27], [28]. This algorithm calculates g at a number of randomly selected sample points. These sample points are concentrated around the maximizers by clustering only a subset ℋ of the sample points, where ℋ contains the highest 20% of sample points. There is no need to form the clusters explicitly; it is sufficient to link the members of ℋ. A point t₁ ∈ ℋ is linked to a point t₂ ∈ ℋ iff g(x, t₁) < g(x, t₂) and ‖t₂ − t₁‖ ≤ l_max.

Page 160: Semi-Infinite Programming


Members of ℋ which cannot be linked to a higher point are used as the initial points for the local searches. Here

l_max = (1/√π) [ Γ(p/2 + 1) (σ log(N)/N) ∫_{t∈T} dt ]^{1/p}    (4.3)

where Γ is the Gamma function, N is the number of sample points, and σ is a parameter which governs the resolution of the algorithm [27], [28]. This choice of l_max(N, σ) means that the volume of the set of points within a distance l_max of an arbitrary given point is proportional to |T| log(N)/N, where |T| denotes the volume of T.
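A minimal sketch, under the notation of (4.3) as reconstructed above, of the MLSL critical distance and linking rule. Here sigma is the resolution parameter, vol_T the volume of T, and points/values are the (assumed) top fraction ℋ of sample points with their g-values; a point is a candidate start for a local search if no higher sample point lies within distance l_max of it.

```python
import numpy as np
from scipy.special import gamma

def critical_distance(p, n_samples, vol_T, sigma=2.0):
    """l_max from (4.3)."""
    return (1.0 / np.sqrt(np.pi)) * (
        gamma(p / 2.0 + 1.0) * sigma * np.log(n_samples) / n_samples * vol_T
    ) ** (1.0 / p)

def local_search_starts(points, values, l_max):
    """Members of H not linked to a higher point within distance l_max."""
    starts = []
    for i, (t1, v1) in enumerate(zip(points, values)):
        linked_to_higher = any(
            v2 > v1 and np.linalg.norm(t2 - t1) <= l_max
            for j, (t2, v2) in enumerate(zip(points, values)) if j != i)
        if not linked_to_higher:
            starts.append(t1)
    return starts
```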

Normally either σ = 2 or σ = 4 is used. With the former, the probability that any local searches are started at each iteration tends to zero as the iteration number increases, whereas with the latter only a finite number of local searches will ever be performed with probability 1. Decreasing σ increases the ability of the algorithm to resolve two maximizers in close proximity, at the expense of an increase in the expected number of local searches.

Stopping conditions [3], [27], have been derived which are based on the assump­tion that, a priori, any number of local maximizers is equally likely for g, and that the relative sizes of the regions of attraction of these local maximizers have a uniform distribution on the unit simplex.

4.4 An implementation of a clustering method

A modified form of MLSL (hereafter referred to as MLSLR) using a reliability estimate for each link has been implemented in [22], [24]. These reliability estimates can be used to construct a stopping rule without invoking the assumptions used by [27], [28], and also allow local searches to be ranked according to the expectation that they will lead to a new maximizer. Along the line segment between each pair of linked sample points t₁ and t₂, the constraint function was modelled by a Brownian motion process with mean drift h/ℓ and a given variance c, where h = g(x, t₂) − g(x, t₁) and ℓ = ‖t₂ − t₁‖. The reliability of the link is defined as the probability that p(a) = 0 for some a ∈ [0, ℓ], where p(a) denotes the modelled value of g at distance a along the segment. Simple dimensional arguments show that the reliability is an increasing function of h²/(ℓ³c).

Page 161: Semi-Infinite Programming


Estimating numerical values for the reliabilities is rather more involved [22]. If it is assumed that c is the same for all links, then the links can easily be placed in increasing order of reliability.

The algorithm generates the sample points in pairs. The first of each pair is drawn from a Halton sequence [13]. Such quasi-random sequences are preferred to randomly generated points as they have better uniformity properties [1], [19]. The nearest existing sample point is then located, and the second member of the pair is chosen to lie on the line through the first member of the pair and its nearest sample point. If c is assumed to be the same for all links then it may be estimated from these collinear triples of points. The highest fraction (typically 20%) of sample points are linked, where two points are linked if the distance between them does not exceed a bound similar to (4.3) and they satisfy the additional restriction that the reliability exceeds a minimum value. The algorithm terminates if the average reliability exceeds a prespecified value. MLSLR was used as the multi-local optimization subroutine for the quasi-Newton SIP algorithm of [23]. The MLSLR subroutine attempted to track maximizers by starting a local search from the position of each maximizer found with the previous call of MLSLR.
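A small illustrative helper (not part of MLSLR itself) showing how the quasi-random Halton points mentioned above can be generated: the k-th Halton point is built from the radical inverse of k in the first p prime bases [13].

```python
def radical_inverse(k, base):
    """Radical-inverse of the integer k in the given base."""
    inv, f = 0.0, 1.0 / base
    while k > 0:
        inv += (k % base) * f
        k //= base
        f /= base
    return inv

def halton_point(k, dim, primes=(2, 3, 5, 7, 11, 13)):
    """k-th Halton point in [0,1]^dim (dim <= len(primes) here)."""
    return [radical_inverse(k, primes[i]) for i in range(dim)]
```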

SIP's have been solved [22], [24] using the SIP algorithm described in [23] with values of p up to six, and the results are presented in [24]. Three problems with p > 2 were looked at.

4.4.1 Problem S

Problem S for p = 6 is as follows:

f(x) = x₁x₂ + x₂x₃ + x₃x₄

g(x, t) = 2(x₁² + x₂² + x₃² + x₄²) − 6 − 2p
          + sin(t₁ − x₁ − x₄) + sin(t₂ − x₂ − x₃)
          + sin(t₃ − x₁) + sin(2t₄ − x₂) + sin(t₅ − x₃) + sin(2t₆ − x₄)

T = [0, 1]ᵖ,   x⁽⁰⁾ = (1, 1, 1, 1)ᵀ

When p is less than 6, all terms in g involving any t_i with i > p are deleted. This problem has well spaced local maxima and the algorithm had no difficulty in locating them for 3 ≤ p ≤ 6.
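For concreteness, a sketch transcribing the reconstructed formulas of Problem S above, including the rule that terms involving t_i with i > p are dropped (the reconstruction of g is an assumption drawn from the garbled source text).

```python
import numpy as np

def f_S(x):
    # objective of Problem S
    return x[0]*x[1] + x[1]*x[2] + x[2]*x[3]

def g_S(x, t, p):
    # constraint of Problem S; only the first p sine terms are kept
    t = np.concatenate([np.asarray(t, dtype=float), np.zeros(6)])[:6]
    terms = [np.sin(t[0] - x[0] - x[3]), np.sin(t[1] - x[1] - x[2]),
             np.sin(t[2] - x[0]), np.sin(2*t[3] - x[1]),
             np.sin(t[4] - x[2]), np.sin(2*t[5] - x[3])]
    return 2*float(np.dot(x, x)) - 6 - 2*p + sum(terms[:p])
```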

Page 162: Semi-Infinite Programming


4.4.2 Problem T

This problem was solved for values of p ranging from 3 to 6.

f(x) = Σ_{i=1}^{4} (x_i² − x_i)

g(x, t) = − Σ_{i=1}^{4} x_i² + Σ_{i=1}^{4} 1/(1 + w_i)

T = [−3, 3]ᵖ,   x⁽⁰⁾ = (−2.25, −2.5, −2.75, −3.0)ᵀ

where, using ⌊j⌋ to denote the greatest integer not larger than j,

w₁ = Σ_{j=1}^{p} (t_j − x₁)²

w₂ = Σ_{j=1}^{p} (t_j − x₂(−1)^j)²

w₃ = Σ_{j=1}^{p} (t_j − x₃(−1)^{⌊j/2⌋})²

w₄ = Σ_{j=1}^{p} (t_j − x₄(−1)^{⌊(j+1)/2⌋})²


The constraint function consists of the −Σ x_i² term plus the sum of four identically shaped humps, each of maximum height 1. Each hump moves along a ray emanating from the origin, where the apex of the ith hump lies at a distance x_i√p from the origin. For all sufficiently large ‖x‖, g is non-positive on T. Roughly, the infeasible region is a roundish shape centred on the origin with the initial point on one side and the SIP's solution x* on the other. At the initial point g has four peaks. As the SIP algorithm proceeds these four merge into one peak as the sequence of iterates moves into the infeasible region, and then part again in stages as the sequence of iterates leaves the infeasible region and proceeds towards the SIP's solution x*. At x*, g is almost flat between the four global maxima. For p = 6 the four global maximizers at x* lie within a hypercube of volume 0.000086|T|. Lagrange multiplier estimates indicate that two of the four maximizers are sufficient to satisfy the first order KKT conditions at x*. Hence even if a peak is not located, it will grow no larger than the peaks which have been located. In spite of this, the use of a tracking process enabled MLSLR to locate all four maximizers.

Page 163: Semi-Infinite Programming


4.4.3 Problem U

This problem tests an algorithm's ability to exploit any lack of curvature of the constraint function along certain directions. It illustrates very nicely the advantages of using a multi-local optimization algorithm based on a quasi­random set of points rather than a grid.

f(x) = Σ_{i=1}^{4} (x_i²/10 − x_i)

g(x, t) = ¼ sin(30t₁ sin(x₁) + 30t₂ cos(x₂)) + ¼ sin(t₁t₂) + t₃x₁ + t₄x₂ + t₅x₃ + t₆x₄ − 4

T = [−1, 1]⁶,   x⁽⁰⁾ = (3, 2, 1, 0)ᵀ

The constraint function g(x,.) is oscillatory in the first two dimensions of t, and linear in the last four. The period of the oscillations is just over one tenth of the length of one edge of T. If these oscillations are to be reliably detected without exploiting the partial linearity of g, then there needs to be about 20 sample points along each axis of the grid, or about 64,000,000 sample points per MLOP. In contrast MLSLR used at most 2400 sample points per MLOP, and found all maxima at the SIP's solution.

The use of an algorithm specifically designed to efficiently solve a sequence of related MLOP's can dramatically improve the performance of an EPF based SIP algorithm. Passing information between multi-local optimizations can sig­nificantly enhance the ability to locate maxima.

5 FINAL COMMENTS

It should be clear from Section 4 that the major difficulty in implementing EPF methods for SIP lies in the subproblem of determining global and (some) local maximizers of g on the set T. Of course, convergence theorems on EPF methods assume that the penalty function can always be evaluated exactly and so EPF based algorithms are sometimes referred to as conceptual algorithms. In spite of this the algorithms described in Sections 2 and 3 have proved to be highly successful in solving some difficult linear and nonlinear SIP problems.

Page 164: Semi-Infinite Programming


It is also worth noting that the global optimization of g is to some extent an inevitable part of any algorithm for SIP, if only to establish feasibility.

Discretization methods such as [10], [14], [20], [26], [36], [37], essentially replace the SIP by a sequence of finite NLPs. In general the semi-infinite constraint will be violated for some values of t in T but careful choices of the successive discretizations can lead to convergence under mild assumptions. In practice, small infeasibilities can usually be tolerated. Discretization methods can also be used to advantage as a first phase to speed up EPF methods for SIP as the results of [22], [34], demonstrate. Nevertheless, exact penalty function meth­ods for semi-infinite programming are theoretically sound and computationally efficient with or without a first phase.

REFERENCES

[1] M. M. Ali and C. Storey. Topographical multi-level single linkage. J. Global Optimization, 5:349-358, 1994.

[2] B. M. Bell. Global convergence of a semi-infinite optimisation method. Appl. Math. Opt., 21(1):89-110, 1990.

[3] C. G. E. Boender and A. H. G. Rinnooy Kan. Bayesian stopping rules for multistart global optimization methods. Math. Prog., 37:59-80, 1987.

[4] R. Bojanic and R. Devore. On polynomials of best one-sided approximation. L'Enseignement Mathématique, 12(2):139-164, 1966.

[5] A. R. Conn and N. I. M. Gould. An exact penalty function for semi-infinite programming. Math. Prog., 37:19-40, 1987.

[6] I. D. Coope and C. J. Price. A two-parameter exact penalty function for nonlin­ear programming. Journal of Optimization Theory & Applications, 83(1):49-61, 1994.

[7] I. D. Coope and G. A. Watson. A projected Lagrangian algorithm for semi-infinite programming. Math. Prog., 32:337-356, 1985.

[8] Y. J. Evtushenko. Numerical Optimisation Techniques. Optimization Software inc. Publications Division, 1985.

[9] A. C. Fiacco and K. O. Kortanek, editors. Semi-Infinite Programming and Applications, Lecture Notes in Economics and Mathematical Systems, no. 215. Springer-Verlag, Berlin, 1983.

[10] C. E. Gonzaga, E. Polak, and R. Trahan. An improved algorithm for optimisation problems with functional inequality constraints. IEEE Trans. Automatic Control, 25(1):49-54, 1980.

Page 165: Semi-Infinite Programming


[11] S.-A. Gustafson. On the computational solution of a class of generalized moment problems. SIAM J. Numer. Anal., 7:343-357, 1970.

[12] S.-A. Gustafson. A three phase algorithm for semi-infinite programmes, pages 138-157. In Fiacco and Kortanek [9], 1983.

[13] J. H. Halton. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numerische Mathematik, 2:84-90, 1960.

[14] R. Hettich. An implementation of a discretization method for semi-infinite pro­gramming. Math. Prog., 34:354-361, 1986.

[15] R. Hettich and W. van Honstede. On quadratically convergent methods for semi­infinite programming. In R. Hettich, editor, Semi-Infinite Programming, pages 97-111. Springer-Verlag, Berlin, 1979.

[16] H. T. Jongen, J.-J. Rückmann, and O. Stein. Generalized semi-infinite optimization: A first order optimality condition and example. Technical Report Nr. 95-24, Universität Trier, Mathematik/Informatik, 1995.

[17] R. H. Mladineo. An algorithm for finding the global maximum of a multimodal, multivariate function. Math. Prog., 34:188-200, 1986.

[18] R. E. Moore and H. Ratschek. Inclusion functions and global optimisation II. Math. Prog., 41:341-356, 1988.

[19] H. Niederreiter. Quasi Monte Carlo methods and pseudo-random numbers. Bulletin of the American Math. Soc., 84(6):957-1041, 1978.

[20] E. R. Panier and A. L. Tits. A globally convergent algorithm with adaptively re­fined discretization for semi-infinite optimisation problems arising in engineering design. IEEE Trans. Automatic Control, 34(8):903-908, 1989.

[21] T. Pietrzkowski. The potential method for conditional maxima in locally compact metric spaces. Numer. Math., 14:325-329, 1970.

[22] C. J. Price. Non-Linear Semi-Infinite Programming. PhD thesis, University of Canterbury, Christchurch, New Zealand, 1992.

[23] C. J. Price and I. D. Coope. An exact penalty function algorithm for semi-infinite programmes. BIT, 30:723-734, 1990.

[24] C. J. Price and I. D. Coope. Numerical experiments in semi-infinite programming. Computational Optimisation and Applications, 6(2):169-189, 1996.

[25] H. Ratschek. Inclusion functions and global optimisation. Math. Prog., 33:300-317, 1985.

[26] R. Reemtsen. Discretization methods for the solution of semi-infinite programming problems. Journal of Optimization Theory & Applications, 23:141-153, 1988.

[27] A. H. G. Rinnooy Kan and G. T. Timmer. Stochastic global optimisation meth­ods part I: Clustering methods. Math. Prog., 39:27-56, 1987.

[28] A. H. G. Rinnooy Kan and G. T. Timmer. Stochastic global optimisation meth­ods part II: Multi-level methods. Math. Prog., 39:57-78, 1987.

Page 166: Semi-Infinite Programming


[29] Y. Tanaka, M. Fukushima, and T. Hasegawa. Implementable L∞ penalty function method for semi-infinite optimisation. Int. J. Systems Sci., 18(8):1563-1568, 1987.

[30] Y. Tanaka, M. Fukushima, and T. Ibaraki. A globally convergent SQP method for semi-infinite non-linear optimization. Journal of Computational and Applied Mathematics, 23:141-153, 1988.

[31] A. Torn and A. Zilinskas. Global Optimisation, Lecture Notes in Computer Science No. 350. Springer-Verlag, Berlin-Heidelberg, 1989.

[32] G. A. Watson. Globally convergent methods for semi-infinite programming. BIT, 21:362-373, 1981.

[33] G. A. Watson. Numerical experiments with globally convergent methods for semi-infinite programming problems, pages 193-205. In Fiacco and Kortanek [9], 1983.

[34] G. A. Watson. Lagrangian methods for semi-infinite programming problems. In E. J. Anderson and A. B. Philpott, editors, Infinite Dimensional Linear Programming, Lecture Notes in Economics and Mathematical Systems no. 259, pages 90-107. Springer-Verlag, 1985.

[35] G. R. Wood. Multidimensional bisection applied to global optimization. Computers and Mathematics with Applications, 21:161-172, 1991.

[36] J. L. Zhou and A. L. Tits. Erratum to "An SQP algorithm for finely discretized continuous minimax problems and other minimax problems with many objective functions". SIAM J. Optimization, to appear.

[37] J. L. Zhou and A. L. Tits. An SQP algorithm for finely discretized continuous minimax problems and other minimax problems with many objective functions. SIAM J. Optimization, 6:461-487, 1996.

Page 167: Semi-Infinite Programming

6
FEASIBLE SEQUENTIAL QUADRATIC PROGRAMMING FOR FINELY DISCRETIZED PROBLEMS FROM SIP

Craig T. Lawrence and André L. Tits

Department of Electrical Engineering and Institute for Systems Research, University of Maryland, College Park, College Park, Maryland 20742, USA
Email: [email protected], [email protected]

ABSTRACT

A Sequential Quadratic Programming algorithm designed to efficiently solve nonlinear optimization problems with many inequality constraints, e.g. problems arising from finely discretized Semi-Infinite Programming, is described and analyzed. The key features of the algorithm are (i) that only a few of the constraints are used in the QP sub-problems at each iteration, and (ii) that every iterate satisfies all constraints.

1 INTRODUCTION

Consider the Semi-Infinite Programming (SIP) problem

minimize f(x) subject to Φ(x) ≤ 0,    (SI)

where f : ℝⁿ → ℝ is continuously differentiable, and Φ : ℝⁿ → ℝ is defined by

Φ(x) ≜ sup_{ξ∈[0,1]} φ(x, ξ),

with φ : ℝⁿ × [0,1] → ℝ continuously differentiable in the first argument. For an excellent survey of the theory behind the problem (SI), in addition to some algorithms and applications, see [9] as well as the other papers in the present


Page 168: Semi-Infinite Programming


volume. Many globally convergent algorithms designed to solve (SI) rely on approximating Φ(x) by using progressively finer discretizations of [0,1] (see, e.g. [5, 7, 8, 16, 18-20, 23]). Specifically, such algorithms generate a sequence of problems of the form

minimize f(x) subject to φ(x, ξ) ≤ 0, ∀ξ ∈ Ξ,    (DSI)

where Ξ ⊂ [0,1] is a (presumably large) finite set. For example, given q ∈ ℕ, one could use the uniform discretization

Ξ ≜ {0, 1/q, 2/q, ..., (q−1)/q, 1}.

Clearly these algorithms are crucially dependent upon being able to efficiently solve problem (DSI).

Of course, (DSI) involves only a finite number of smooth constraints, and thus could in principle be solved via classical constrained optimization techniques. Note however that when |Ξ| is large compared to the number of variables n, it is likely that only a small subset of the constraints is active at a solution. A scheme which exploits this fact by cleverly using an appropriate small subset of the constraints at each step should, in most cases, enjoy substantial savings in computational effort without sacrificing global and local convergence properties.

Early efforts at employing such a scheme appear in [16, 19] in the context of first order methods of feasible directions. In [19], at iteration k, a search direction is computed based on the method of Zoutendijk [28] using only the gradients of those constraints satisfying φ(x_k, ξ) ≥ −ε, where ε > 0 is small. Clearly, close to a solution, such "ε-active" constraints are sufficient to ensure convergence. However, if the discretization is very fine, such an approach may still produce sub-problems with an unduly large number of constraints. It was shown in [16] that, by means of a scheme inspired by the bundle-type methods of nondifferentiable optimization (see, e.g. [11, 13]), the number of constraints used in the sub-problems can be further reduced without jeopardizing global convergence. Specifically, in [16], the constraints to be used in the computation of the search direction d_{k+1} at iteration k + 1 are chosen as follows. Let Ξ_k ⊆ Ξ be the set of constraints used to compute the search direction d_k, and let x_{k+1} be the next iterate. Then Ξ_{k+1} includes:

• All ξ ∈ Ξ such that φ(x_{k+1}, ξ) = 0 (i.e. the "active" constraints),

• All ξ ∈ Ξ_k which affected the computation of the search direction d_k, and

Page 169: Semi-Infinite Programming


• A ξ ∈ Ξ, if it exists, which caused a step reduction in the line search at iteration k.

While the former is obviously needed to ensure that dk is a feasible direction, it is argued in [16] that the latter two are necessary to avoid zig-zagging or other jamming phenomena.

The number of constraints required to compute the search direction is thus typically small compared to |Ξ|, hence each iteration of such a method is computationally less costly. Unfortunately, for a fixed level of discretization, the algorithms in [16, 19] converge at best at a linear rate.

Sequential Quadratic Programming (SQP)-type algorithms exhibit fast local convergence and are well-suited for problems in which the number of variables is not too large but the evaluation of objective/constraint functions and their gradients is costly. In such algorithms, quadratic programs (QPs) are used as models to construct the search direction. For an excellent recent survey of SQP algorithms, see [2]. A number of attempts at applying the SQP scheme to problems with a large number of constraints, e.g. our discretized problem from SIP, have been documented in the literature. In [1], Biggs treats all active inequality constraints as equality constraints in the QP sub-problem, while ignoring all constraints which are not active. Polak and Tits [20], and Mine et al. [14], apply to the SQP framework an ε-active scheme similar to that used in [19]. Similar to the ε-active idea, Powell proposes a "tolerant" algorithm for linearly constrained problems in [22]. Finally, in [26], Schittkowski proposes another modification of the SQP scheme for problems with many constraints, but does not prove convergence. In practice, the algorithm in [26] may or may not converge, depending upon the heuristics applied to choose the constraints for the QP sub-problem.

In this paper, the scheme introduced in [16] in the context of first-order feasible direction methods is extended to the SQP framework, specifically, the Feasible SQP (FSQP) framework introduced in [17] (the qualifier "feasible" signifies that all iterates x_k satisfy the constraints, i.e. φ(x_k, ξ) ≤ 0 for all ξ ∈ Ξ). Our presentation and analysis significantly borrow from [27], where an important special case of (DSI) is considered, the unconstrained minimax problem.

Let the feasible set be denoted X ≜ {x ∈ ℝⁿ | φ(x, ξ) ≤ 0, ∀ξ ∈ Ξ}.

Page 170: Semi-Infinite Programming


For x ∈ X, Ξ̂ ⊆ Ξ, and H ∈ ℝⁿˣⁿ with H = Hᵀ > 0, let d⁰(x, H, Ξ̂) be the corresponding SQP direction, i.e. the unique solution of the QP

minimize ½⟨d⁰, Hd⁰⟩ + ⟨∇f(x), d⁰⟩
subject to φ(x, ξ) + ⟨∇_x φ(x, ξ), d⁰⟩ ≤ 0, ∀ξ ∈ Ξ̂.

At iteration k, given an estimate x_k ∈ X of the solution, a constraint index set Ξ_k ⊆ Ξ, and a symmetric positive definite estimate H_k of the Hessian of the Lagrangian, first compute d⁰_k = d⁰(x_k, H_k, Ξ_k). Note that d⁰_k may not be a feasible search direction, as required in the FSQP context, but at worst it is tangent to the feasible set. Since all iterates are to remain in the feasible set, following [17], an essentially arbitrary feasible descent direction d¹_k is computed and the search direction is taken to be the convex combination d_k = (1 − ρ_k)d⁰_k + ρ_k d¹_k. The coefficient ρ_k = ρ(d⁰_k) ∈ [0, 1] goes to zero fast enough, as x_k approaches a solution, to ensure that the fast convergence rate of the standard SQP scheme is preserved. An Armijo-type line search is then performed along the direction d_k, yielding a step-size t_k ∈ (0, 1]. The next iterate is taken to be x_{k+1} = x_k + t_k d_k. Finally, H_k is updated, yielding H_{k+1}, and a new constraint index set Ξ_{k+1} is constructed following the ideas of [16].
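A schematic sketch of the direction computation just described. The callables solve_qp0 and solve_qp1 are assumptions standing in for solvers of QP0(x_k, H_k, Ξ_k) and QP1(x_k, Ξ_k); the particular rho used below is one admissible choice discussed in Section 2.

```python
import numpy as np

def rho(d0, kappa=2.0):
    # one admissible choice, rho(d0) = min{1, ||d0||^kappa} with kappa >= 2
    return min(1.0, float(np.linalg.norm(d0)) ** kappa)

def fsqp_direction(x, H, Xi_k, solve_qp0, solve_qp1):
    d0 = solve_qp0(x, H, Xi_k)        # SQP direction (possibly only tangent to X)
    d1, gamma = solve_qp1(x, Xi_k)    # feasible descent direction
    r = rho(d0)
    return (1.0 - r) * d0 + r * d1    # convex combination d_k
```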

As is pointed out in [27], the construction of [16] cannot be used meaningfully in the SQP framework without modifying the update rule for the new metric H_{k+1}. The reason is as follows. As noted above, following [16], if t_k < 1, Ξ_{k+1} is to include, among others, the index ξ̄ ∈ Ξ of a constraint which was infeasible for the last trial point in the line search.¹ The rationale for including ξ̄ in Ξ_{k+1} is that if ξ̄ had been in Ξ_k, then it is likely that the computed search direction would have allowed a longer step. Such reasoning is clearly justified in the context of first-order search directions as used in [16], but it is not clear that ξ̄ is the right constraint to include under the new metric H_{k+1}. To overcome this difficulty, it is proposed in [27] that H_k not be updated whenever t_k < δ, δ a prescribed small positive number, and ξ̄ ∉ Ξ_k. We will show in Section 3 that, as is the case for the minimax algorithm of [27], for k large enough, ξ̄ will always be in Ξ_k, thus normal updating will take place eventually, preserving the local convergence rate properties of the SQP scheme.

There is an important additional complication, with the update of Ξ_k, which was not present in the minimax case considered in [27]. As just pointed out, any ξ ∈ Ξ_k which affected the search direction is to be included in Ξ_{k+1}. In [27] (unconstrained minimax problem) this is accomplished by including those objectives whose multipliers are non-zero in the QP used to compute the search

1 Assuming that it was a constraint, and not the objective function, which caused a failure in the line search.

Page 171: Semi-Infinite Programming


direction (analogous to QP0(x_k, H_k, Ξ_k) above), i.e. the "binding" objectives. In our case, in addition to the binding constraints from QP0(x_k, H_k, Ξ_k), we must also include those constraints which affect the computation of the feasible descent direction d¹_k. If this is not done, convergence is not ensured and a "zig-zagging" phenomenon as discussed in [16] could result.

As a final matter on the update rule for Ξ_k, following [27], we allow for additional constraint indices to be added to the set Ξ_k. While not necessary for global convergence, cleverly choosing additional constraints can significantly improve performance, especially in early iterations. In the context of discretized SIP, exploiting the possible regularity properties of the SIP constraints with respect to the independent parameter can give useful heuristics for choosing additional constraints.

In order to guarantee fast (superlinear) local convergence, it is necessary that, for k large enough, the line search always accept the step-size tk = 1. It is well-known in the SQP context that the line search could truncate the step size arbitrarily close to a solution (the so-called Maratos effect), thus preventing superlinear convergence. Various schemes have been devised to overcome such a situation. We will argue that a second-order correction, as used in [17], will overcome the Maratos effect without sacrificing global convergence.

The balance of the paper is organized as follows. In Section 2 we introduce the algorithm and present some preliminary material. Next, in Section 3, we give a complete convergence analysis of the algorithm proposed in Section 2. The local convergence analysis assumes the just mentioned second-order correction is used. To improve the continuity of the development, a few of the proofs are deferred to an appendix. In Section 4, the algorithm is extended to handle the constrained minimax case. Some implementation details, in addition to numerical results, are provided in Section 5. Finally, in Section 6, we offer some concluding remarks.

2 ALGORITHM

We begin by making a few assumptions that will be in force throughout. The first is a standard regularity assumption, while the second ensures that the set of active constraint gradients is always linearly independent.

Page 172: Semi-Infinite Programming


Assumption 1: The functions f : ℝⁿ → ℝ and φ(·, ξ) : ℝⁿ → ℝ, ξ ∈ Ξ, are continuously differentiable.

Define the set of active constraints for a point x ∈ X as

Ξ_act(x) ≜ {ξ ∈ Ξ | φ(x, ξ) = 0}.

Assumption 2: For all x ∈ X with Ξ_act(x) ≠ ∅, the set {∇_x φ(x, ξ) | ξ ∈ Ξ_act(x)} is linearly independent.

A point x* ∈ ℝⁿ is called a Karush-Kuhn-Tucker (KKT) point for the problem (DSI) if there exist KKT multipliers λ_ξ, ξ ∈ Ξ, satisfying

∇f(x*) + Σ_{ξ∈Ξ} λ_ξ ∇_x φ(x*, ξ) = 0,
φ(x*, ξ) ≤ 0, ∀ξ ∈ Ξ,
λ_ξ φ(x*, ξ) = 0 and λ_ξ ≥ 0, ∀ξ ∈ Ξ.    (2.1)

Under our assumptions, any local minimizer x* for (DSI) is a KKT point. Thus, (2.1) provides a set of first-order necessary conditions of optimality.

Throughout our analysis, we will often make use of the KKT conditions for QP0(x, H, Ξ̂). Specifically, given x ∈ X, H = Hᵀ > 0, and Ξ̂ ⊆ Ξ, d⁰ is a KKT point for QP0(x, H, Ξ̂) if there exist multipliers λ⁰_ξ, ξ ∈ Ξ̂, satisfying

Hd⁰ + ∇f(x) + Σ_{ξ∈Ξ̂} λ⁰_ξ ∇_x φ(x, ξ) = 0,
φ(x, ξ) + ⟨∇_x φ(x, ξ), d⁰⟩ ≤ 0, ∀ξ ∈ Ξ̂,
λ⁰_ξ (φ(x, ξ) + ⟨∇_x φ(x, ξ), d⁰⟩) = 0 and λ⁰_ξ ≥ 0, ∀ξ ∈ Ξ̂.    (2.2)

In fact, since the objective for QP0(x, H, Ξ̂) is strictly convex, such a d⁰ is the unique KKT point, as well as the unique global minimizer (stated formally in Lemma 3.1 below).

As noted above, d⁰ need not be a feasible direction. The search direction d will be taken as a convex combination of d⁰ and a first-order feasible descent direction d¹. For x ∈ X and Ξ̂ ⊆ Ξ, we compute d¹ = d¹(x, Ξ̂) and γ = γ(x, Ξ̂)

Page 173: Semi-Infinite Programming


as the solution of the QP

minimize ½‖d¹‖² + γ
subject to ⟨∇f(x), d¹⟩ ≤ γ,
           φ(x, ξ) + ⟨∇_x φ(x, ξ), d¹⟩ ≤ γ, ∀ξ ∈ Ξ̂.
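A minimal sketch of this feasible-direction QP, QP1(x, Ξ̂), using cvxpy (an assumption: any convex QP solver would do). Here grad_f is ∇f(x), and phi_vals[i], phi_grads[i] are φ(x, ξ_i) and ∇_x φ(x, ξ_i) for the constraints in the working set.

```python
import cvxpy as cp
import numpy as np

def solve_qp1(grad_f, phi_vals, phi_grads):
    """Return (d1, gamma) solving QP1 for the supplied working constraints."""
    n = grad_f.shape[0]
    d1 = cp.Variable(n)
    gam = cp.Variable()
    cons = [grad_f @ d1 <= gam]
    cons += [phi_vals[i] + phi_grads[i] @ d1 <= gam for i in range(len(phi_vals))]
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(d1) + gam), cons)
    prob.solve()
    return d1.value, gam.value
```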

The notation ‖·‖ will be used throughout to denote the standard Euclidean norm. The pair (d¹, γ) is a KKT point for QP1(x, Ξ̂) if there exist multipliers μ¹ and λ¹_ξ, ξ ∈ Ξ̂, satisfying

(d¹, 1) + μ¹ (∇f(x), −1) + Σ_{ξ∈Ξ̂} λ¹_ξ (∇_x φ(x, ξ), −1) = 0,
⟨∇f(x), d¹⟩ ≤ γ,
φ(x, ξ) + ⟨∇_x φ(x, ξ), d¹⟩ ≤ γ, ∀ξ ∈ Ξ̂,
μ¹ (⟨∇f(x), d¹⟩ − γ) = 0 and μ¹ ≥ 0,
λ¹_ξ (φ(x, ξ) + ⟨∇_x φ(x, ξ), d¹⟩ − γ) = 0 and λ¹_ξ ≥ 0, ∀ξ ∈ Ξ̂.    (2.3)

In Section 1 we stated that the feasible descent direction d¹ was essentially arbitrary. In the subsequent analysis we assume that d¹ is chosen specifically as the solution of QP1(x, Ξ̂), though it can be shown that the results still hold if some minor variation is used. To be precise, following [17], we require that d¹ = d¹(x, Ξ̂) satisfy:

• d¹(x, Ξ̂) = 0 if x is a KKT point,

• ⟨∇f(x), d¹(x, Ξ̂)⟩ < 0 if x is not a KKT point,

• ⟨∇_x φ(x, ξ), d¹(x, Ξ̂)⟩ < 0 for all ξ ∈ Ξ_act(x) if x is not a KKT point, and

• for Ξ̂ fixed, d¹(x, Ξ̂) is bounded over bounded subsets of X.

It will be shown in Lemma 3.2 that the solution of QP1(x, Ξ̂) satisfies these requirements. In our context, d¹ must fulfill one additional property, which is essentially captured by Lemma A.0.1 in the appendix.

Thus, at iteration k, the search direction d_k is taken as a convex combination of d⁰_k and d¹_k, i.e. d_k ≜ (1 − ρ_k)d⁰_k + ρ_k d¹_k, ρ_k ∈ [0, 1]. In order to guarantee a fast local rate of convergence while providing a suitably feasible search direction, we require the coefficient of the convex combination ρ_k = ρ(d⁰_k) to satisfy certain properties. Namely, ρ(·) : ℝⁿ → [0, 1] must satisfy

Page 174: Semi-Infinite Programming


• ρ(d⁰) is bounded away from zero outside every neighborhood of zero, and

For example, we could take ρ(d⁰) = min{1, ‖d⁰‖^κ}, where κ ≥ 2.

It remains to explicitly specify the key feature of the proposed algorithm: the update rule for Ξ_k. As discussed above, following [16], Ξ_{k+1} will include (in addition to possible heuristics) three crucial components. The first one is the set Ξ_act(x_{k+1}) of indices of active constraints at the new iterate. The second component of Ξ_{k+1} is the set Ξᵇ_k ⊆ Ξ_k of indices of constraints that affected d_k. In particular, Ξᵇ_k will include all indices of constraints in QP0(x_k, H_k, Ξ_k) and QP1(x_k, Ξ_k) which have positive multipliers, i.e. the binding constraints for these QPs. Specifically, let λ⁰_{k,ξ} and λ¹_{k,ξ}, for ξ ∈ Ξ_k, be the QP multipliers from QP0(x_k, H_k, Ξ_k) and QP1(x_k, Ξ_k), respectively. Defining

Ξᵇ'⁰_k ≜ {ξ ∈ Ξ_k | λ⁰_{k,ξ} > 0}  and  Ξᵇ'¹_k ≜ {ξ ∈ Ξ_k | λ¹_{k,ξ} > 0},

we let Ξᵇ_k ≜ Ξᵇ'⁰_k ∪ Ξᵇ'¹_k.

Finally, the third component of Ξ_{k+1} is the index ξ̄ of one constraint, if any exists, that forced a reduction of the step in the previous line search. While the exact type of line search we employ is not critical to our analysis, we assume from this point onward that it is an Armijo-type search. That is, given constants α ∈ (0, 1/2) and β ∈ (0, 1), the step-size t_k is taken as the first number t in the set {1, β, β², ...} such that

f(x_k + t d_k) ≤ f(x_k) + α t ⟨∇f(x_k), d_k⟩    (2.4)

and

φ(x_k + t d_k, ξ) ≤ 0, ∀ξ ∈ Ξ.    (2.5)

Thus, t_k < 1 implies that either (2.4) or (2.5) is violated at x_k + (t_k/β) d_k. In the event that (2.5) is violated, there exists ξ̄ ∈ Ξ such that

φ(x_k + (t_k/β) d_k, ξ̄) > 0,    (2.6)

and in such a case we will include ξ̄ in Ξ_{k+1}.
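A sketch of the Armijo-type backtracking of Step 2 under the conditions (2.4)-(2.5) as reconstructed above; f, grad_f, phi and the working set Xi are placeholders supplied by the caller.

```python
def armijo_step(x, d, f, grad_f, phi, Xi, alpha=0.1, beta=0.5, max_backtracks=30):
    """Backtrack over {1, beta, beta^2, ...} until (2.4) and (2.5) hold."""
    fx, slope = f(x), grad_f(x) @ d
    t = 1.0
    for _ in range(max_backtracks):
        x_trial = x + t * d
        decrease_ok = f(x_trial) <= fx + alpha * t * slope        # (2.4)
        feasible_ok = all(phi(x_trial, xi) <= 0.0 for xi in Xi)   # (2.5)
        if decrease_ok and feasible_ok:
            return t, x_trial
        t *= beta
    return t, x + t * d   # a real implementation would signal line-search failure
```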

Page 175: Semi-Infinite Programming


Algorithm FSQP-MC

Parameters. α ∈ (0, ½), β ∈ (0, 1), and 0 < δ ≪ 1.

Data. x₀ ∈ X, 0 < H₀ = H₀ᵀ ∈ ℝⁿˣⁿ.

Step 0 - Initialization. Set k ← 0 and choose Ξ₀ ⊇ Ξ_act(x₀).

Step 1 - Computation of search direction.

(i). Compute d⁰_k = d⁰(x_k, H_k, Ξ_k). If d⁰_k = 0, stop.

(ii). Compute d¹_k = d¹(x_k, Ξ_k).

(iii). Compute ρ_k = ρ(d⁰_k) and set d_k ← (1 − ρ_k)d⁰_k + ρ_k d¹_k.


Step 2 - Line search. Compute t_k, the first number t in the sequence {1, β, β², ...} satisfying (2.4) and (2.5).

Step 3 - Updates.

(i). Set x_{k+1} ← x_k + t_k d_k.

(ii). If t_k < 1 and (2.5) was violated at x_k + (t_k/β) d_k, then let ξ̄ be such that (2.6) holds.

(iii). Pick Ξ_{k+1} ⊇ Ξ_act(x_{k+1}) ∪ Ξᵇ_k. If t_k < 1 and (2.6) holds for some ξ̄ ∈ Ξ, then set Ξ_{k+1} ← Ξ_{k+1} ∪ {ξ̄}.

(iv). If t_k ≤ δ and ξ̄ ∉ Ξ_k, set H_{k+1} ← H_k. Otherwise, obtain a new symmetric positive definite estimate H_{k+1} of the Hessian of the Lagrangian.

(v). Set k ← k + 1 and go back to Step 1.
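A small sketch of the working-set update of Step 3(iii); the arguments (the active set at the new iterate, the binding sets of both QPs, the culprit index from the line search, and any heuristic extras) are all assumed to be computed elsewhere.

```python
def update_working_set(Xi_act_new, Xi_binding, xi_bar=None, extra=()):
    """Xi_{k+1} = active constraints at x_{k+1}, binding constraints, culprit, heuristics."""
    Xi_next = set(Xi_act_new) | set(Xi_binding) | set(extra)
    if xi_bar is not None:
        Xi_next.add(xi_bar)
    return Xi_next
```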

3 CONVERGENCE ANALYSIS

While there are some critical differences, the analysis in this section closely parallels that of [27]. We begin by establishing that, under a few additional assumptions, algorithm FSQP-MC generates a sequence which converges to a KKT point for (DSI). Then, upon strengthening our assumptions slightly, we show that the rate of convergence is two-step super linear.

Page 176: Semi-Infinite Programming


3.1 Global convergence

The following will be assumed to hold throughout our analysis.

Assumption 3: The level set {x ∈ ℝⁿ | f(x) ≤ f(x₀)} ∩ X is compact.

Assumption 4: There exist scalars 0 < σ₁ ≤ σ₂ such that, for all k, σ₁‖h‖² ≤ ⟨h, H_k h⟩ ≤ σ₂‖h‖² for all h ∈ ℝⁿ.

Given the scalars 0 < σ₁ ≤ σ₂ from Assumption 4, define ℋ ≜ {H ∈ ℝⁿˣⁿ | H = Hᵀ, σ₁‖h‖² ≤ ⟨h, Hh⟩ ≤ σ₂‖h‖² ∀h ∈ ℝⁿ}.

First, we derive some properties of d⁰(x, H, Ξ̂) and d¹(x, Ξ̂).

Lemma 3.1 For all x ∈ X, H ∈ ℋ, and Ξ̂ ⊆ Ξ such that Ξ_act(x) ⊆ Ξ̂, the search direction d⁰ = d⁰(x, H, Ξ̂) is well-defined, is continuous in x and H for Ξ̂ fixed, and is the unique KKT point for QP0(x, H, Ξ̂). Furthermore, d⁰ = 0 if, and only if, x is a KKT point for the problem (DSI). If x is not a KKT point for this problem, then d⁰ satisfies

⟨∇f(x), d⁰⟩ < 0,    (3.1)

⟨∇_x φ(x, ξ), d⁰⟩ ≤ 0, ∀ξ ∈ Ξ_act(x).    (3.2)

Proof: H > 0 implies that QP0(x, H, Ξ̂) is strictly convex. Further, d⁰ = 0 is always feasible for the QP constraints. It follows that the solution d⁰ is well-defined and the unique KKT point for the QP. As the set ℋ is uniformly positive definite, continuity in x and H for fixed Ξ̂ is a direct consequence of Theorem 4.4 in [4]. Now suppose d⁰ = 0 and let {λ⁰_ξ | ξ ∈ Ξ̂} be the QP multipliers. In view of the KKT conditions (2.2) for QP0(x, H, Ξ̂), since d⁰ = 0 and x ∈ X, we see that x satisfies the KKT conditions (2.1) for (DSI) with multipliers

λ_ξ = λ⁰_ξ, ξ ∈ Ξ̂,   λ_ξ = 0, ξ ∉ Ξ̂.

The converse is proved similarly, appealing to the uniqueness of the KKT point for QP0(x, H, Ξ̂) and the fact that Ξ_act(x) ⊆ Ξ̂. Finally, since Ξ_act(x) ⊆ Ξ̂, (3.1) and (3.2) follow from Proposition 3.1 in [17]. □

Page 177: Semi-Infinite Programming


Lemma 3.2 For all x ∈ X and Ξ̂ ⊆ Ξ such that Ξ_act(x) ⊆ Ξ̂, the direction d¹ = d¹(x, Ξ̂) is well-defined and the pair (d¹, γ), where γ = γ(x, Ξ̂), is the unique KKT point of QP1(x, Ξ̂). Furthermore, for given Ξ̂, d¹(x, Ξ̂) is bounded over bounded subsets of X. In addition, d¹ = 0 if, and only if, x is a KKT point for the problem (DSI). If x is not a KKT point for this problem, then d¹ satisfies

⟨∇f(x), d¹⟩ < 0,    (3.3)

⟨∇_x φ(x, ξ), d¹⟩ < 0, ∀ξ ∈ Ξ_act(x),    (3.4)

and γ satisfies γ < 0.

Proof: We begin by noting that (d¹, γ) solves QP1(x, Ξ̂) if, and only if, d¹ solves

minimize ½‖d¹‖² + max{ ⟨∇f(x), d¹⟩, max_{ξ∈Ξ̂} [φ(x, ξ) + ⟨∇_x φ(x, ξ), d¹⟩] }    (3.5)

and

γ = max{ ⟨∇f(x), d¹⟩, max_{ξ∈Ξ̂} [φ(x, ξ) + ⟨∇_x φ(x, ξ), d¹⟩] }.

Since the objective function in (3.5) is strictly convex and radially unbounded, it follows that QP1(x, Ξ̂) has (d¹, γ) = (d¹(x, Ξ̂), γ(x, Ξ̂)) as its unique global minimizer. Since QP1(x, Ξ̂) is convex, (d¹, γ) is also its unique KKT point. Boundedness of d¹(x, Ξ̂) over bounded subsets of X follows from the first equation of the optimality conditions (2.3), noting that the QP multipliers must all lie in [0, 1]. Now suppose d¹ = 0. Since x ∈ X, it is clear that γ = 0. Substitute d¹ = 0 and γ = 0 into (2.3) and let μ¹ ∈ ℝ and λ¹ ∈ ℝ^{|Ξ̂|} be the corresponding multipliers. Note that, in view of Assumption 2, μ¹ > 0. Since x ∈ X, it follows that x satisfies (2.1) with multipliers

λ_ξ = λ¹_ξ/μ¹, ξ ∈ Ξ̂,   λ_ξ = 0, ξ ∉ Ξ̂.

Therefore, x is a KKT point for (DSI). The converse is proved similarly, appealing to uniqueness of the KKT point for QP1(x, Ξ̂), and the fact that Ξ_act(x) ⊆ Ξ̂. To prove (3.3) and (3.4), note that if x is not a KKT point for (DSI), then as just shown d¹ ≠ 0, hence γ < 0 (since d¹ = 0 and γ = 0 form a feasible pair, the optimal value of QP1(x, Ξ̂) must be non-positive). The result then follows directly from the form of the QP constraints and the fact that Ξ_act(x) ⊆ Ξ̂. □

Next, we establish that the line search is well-defined.

Page 178: Semi-Infinite Programming


Lemma 3.3 For each k, the line search in Step 2 yields a step t_k = β^j for some finite j = j(k).

Proof: If x_k is a KKT point, then Step 2 is not reached, hence assume x_k is not KKT. In view of Lemma 3.1 and the properties of ρ(·), d⁰_k ≠ 0 and ρ(d⁰_k) > 0. Lemmas 3.1 and 3.2 imply, since Ξ_act(x_k) ⊆ Ξ_k, that

⟨∇f(x_k), d_k⟩ < 0,

⟨∇_x φ(x_k, ξ), d_k⟩ < 0, ∀ξ ∈ Ξ_act(x_k).

Further, feasibility of x_k requires φ(x_k, ξ) < 0 for all ξ ∈ Ξ \ Ξ_act(x_k). The result then follows by considering first order expansions of f(x_k + t_k d_k) and φ(x_k + t_k d_k, ξ), ξ ∈ Ξ, about x_k and by appealing to our regularity assumptions. □

The previous three lemmas imply that the algorithm is well-defined. In addition, Lemma 3.1 shows that if Algorithm FSQP-MC generates a finite sequence terminating at the point x_N, then x_N is a KKT point for the problem (DSI). We now concentrate on the case in which the algorithm never satisfies the termination condition in Step 1(i) and generates an infinite sequence {x_k}. Given an infinite index set K, we use the notation

x_k →^K x*

to mean

x_k → x* as k → ∞, k ∈ K.

Lemma 3.4 Let K be an infinite index set such that Ξ_k = Ξ* for all k ∈ K, x_k →^K x*, H_k →^K H*, d⁰_k →^K d⁰'*, d¹_k →^K d¹'*, and γ_k →^K γ*. Then, (i) d⁰'* is the unique solution of QP0(x*, H*, Ξ*), and (ii) (d¹'*, γ*) is the unique solution of QP1(x*, Ξ*).

Proof: Part (i) follows from continuity of d⁰(x, H, Ξ̂) for fixed Ξ̂ (Lemma 3.1). To prove part (ii), recall that, in view of Lemma 3.2, (d¹_k, γ_k) is the unique KKT point for QP1(x_k, Ξ*), i.e. is the unique solution of (2.3) with corresponding multipliers μ¹_k ≥ 0 and λ¹_{k,ξ} ≥ 0, ξ ∈ Ξ*. Note that the multipliers satisfy

μ¹_k + Σ_{ξ∈Ξ*} λ¹_{k,ξ} = 1,

for all k, hence are bounded. Let K' ⊆ K be an infinite index set such that μ¹_k →^{K'} μ¹'* and λ¹_{k,ξ} →^{K'} λ¹'*_ξ, ξ ∈ Ξ*. Taking limits in the optimality conditions (2.3) shows that (d¹'*, γ*) is a KKT point for QP1(x*, Ξ*) with multipliers μ¹'* and λ¹'*_ξ, ξ ∈ Ξ*. Uniqueness of such points proves the result. □

Page 179: Semi-Infinite Programming


The following lemma establishes a few basic properties of some of the sequences generated by the algorithm.

Lemma 3.5 (i) The sequences {x_k}, {d⁰_k}, and {d_k} are bounded; (ii) {f(x_k)} converges; (iii) t_k d_k → 0.

Proof: In view of Assumption 3 and the fact that the line search guarantees that {f(x_k)} is a monotonically decreasing sequence, it follows that {x_k} is bounded, and since f(·) is continuous, that {f(x_k)} converges. Boundedness of {d⁰_k} follows from boundedness of {x_k}, Assumption 4, continuity of d⁰(x, H, Ξ̂) for fixed Ξ̂, and the fact that there are only finitely many subsets Ξ̂ of Ξ. Boundedness of {d¹_k} follows from Lemma 3.2 and boundedness of {x_k}. Since ρ_k ∈ [0, 1], {d_k} is bounded as well. Finally, suppose {t_k d_k} ↛ 0. Then there exists an infinite index set K ⊆ ℕ such that t_k d_k is bounded away from zero on K. Since all sequences of interest are bounded and Ξ is finite, we may suppose without loss of generality that x_k →^K x*, H_k →^K H*, d⁰_k →^K d⁰'*, d¹_k →^K d¹'*, γ_k →^K γ*, ρ_k →^K ρ*, and Ξ_k = Ξ* for all k ∈ K. Lemma 3.4 tells us that d⁰'* is the unique solution of QP0(x*, H*, Ξ*) and (d¹'*, γ*) is the unique solution pair for QP1(x*, Ξ*). Furthermore, since t_k d_k is bounded away from zero on K, there exists t̲ > 0 such that t_k ≥ t̲ for all k ∈ K, and since {t_k} is bounded (t_k ∈ [0, 1]), it follows that either d⁰'* ≠ 0 or d¹'* ≠ 0. Applying Lemmas 3.1 and 3.2 in both directions shows that x* is not a KKT point for (DSI) and both d⁰'* ≠ 0 and d¹'* ≠ 0. In addition, γ* < 0 and γ_k < 0 for all k ∈ K (from Lemma 3.2), and ρ_k is bounded away from zero on K (from the assumptions on ρ(·) as given in Section 2 and the fact that d⁰_k is bounded away from zero on K). As a consequence, there exists ρ̲ > 0 such that ρ_k = ρ(d⁰_k) ≥ ρ̲ for all k ∈ K, and there exists γ̄ < 0 such that γ_k ≤ γ̄ for all k ∈ K. Now, since ⟨∇f(x_k), d¹_k⟩ ≤ γ_k for all k,

t_k⟨∇f(x_k), d_k⟩ = t_k(1 − ρ_k)⟨∇f(x_k), d⁰_k⟩ + t_k ρ_k⟨∇f(x_k), d¹_k⟩ < t_k ρ_k γ_k ≤ t̲ ρ̲ γ̄ < 0,

for all k ∈ K, where we have used (3.1). Thus, by the line search criterion of Step 2,

f(x_{k+1}) ≤ f(x_k) + α t̲ ρ̲ γ̄

for all k ∈ K. Since f(x_k) is monotone non-increasing, it follows that {f(x_k)} diverges, which contradicts (ii). □

In order to establish convergence to a KKT point, it will be convenient to consider the value functions for the search direction QPs, QP0(x, H, Ξ̂) and

Page 180: Semi-Infinite Programming


QP1(x, Ξ̂). In particular, given the solutions d⁰ = d⁰(x, H, Ξ̂) and (d¹, γ) = (d¹(x, Ξ̂), γ(x, Ξ̂)), define

v⁰(x, H, Ξ̂) ≜ −(½⟨d⁰, Hd⁰⟩ + ⟨∇f(x), d⁰⟩),

v¹(x, Ξ̂) ≜ −(½‖d¹‖² + γ).

Further, let v(x, H, Ξ̂) ≜ v⁰(x, H, Ξ̂) + v¹(x, Ξ̂), and, at iteration k, define v⁰_k = v⁰(x_k, H_k, Ξ_k), v¹_k = v¹(x_k, Ξ_k), and v_k = v(x_k, H_k, Ξ_k). Note that, since 0 is always feasible for both QPs, v⁰_k ≥ 0 and v¹_k ≥ 0 for all k.
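For concreteness, a tiny sketch of these QP value functions, assuming the QP solutions d0, (d1, gamma) and the data H, ∇f(x) are available as numpy arrays.

```python
def v0(d0, H, grad_f):
    # v0(x, H, Xi) = -(1/2 <d0, H d0> + <grad f(x), d0>)
    return -(0.5 * d0 @ (H @ d0) + grad_f @ d0)

def v1(d1, gamma):
    # v1(x, Xi) = -(1/2 ||d1||^2 + gamma)
    return -(0.5 * d1 @ d1 + gamma)
```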

Lemma 3.6 Let K be an infinite index set. Then (i) d⁰_k →^K 0 if and only if v⁰_k →^K 0, (ii) (d¹_k, γ_k) →^K (0, 0) if and only if v¹_k →^K 0, and (iii) if d⁰_k →^K 0, then all accumulation points of {x_k}_{k∈K} are KKT points for (DSI).

Proof: First, if d⁰_k →^K 0 then it is clear from the definition of v⁰_k that v⁰_k →^K 0. Next, from (2.2) and the last statement in Lemma 3.1, it follows that ⟨∇f(x_k), d⁰_k⟩ ≤ −⟨d⁰_k, H_k d⁰_k⟩. Thus, using again the definition of v⁰_k, we get

v⁰_k = −(½⟨d⁰_k, H_k d⁰_k⟩ + ⟨∇f(x_k), d⁰_k⟩) ≥ −½⟨d⁰_k, H_k d⁰_k⟩ + ⟨d⁰_k, H_k d⁰_k⟩ ≥ (σ₁/2)‖d⁰_k‖²,

where we have used Assumption 4. Thus, if v⁰_k →^K 0, then d⁰_k →^K 0. If (d¹_k, γ_k) →^K (0, 0), then from the definition of v¹_k we see that v¹_k →^K 0. Now suppose v¹_k →^K 0. To show d¹_k →^K 0, note that from the optimality conditions (2.3),

‖d¹_k‖² = −γ_k + Σ_{ξ∈Ξ_k} λ¹_{k,ξ} φ(x_k, ξ) ≤ −γ_k.

Thus, again using the definition of v¹_k,

v¹_k = −½‖d¹_k‖² − γ_k ≥ ½‖d¹_k‖²,

and it immediately follows that d¹_k →^K 0 and γ_k →^K 0. To prove (iii), suppose K is such that d⁰_k →^K 0. Let x* be an accumulation point of {x_k}_{k∈K} and

Page 181: Semi-Infinite Programming


let K' ⊆ K be an infinite index set such that x_k →^{K'} x* and, for some Ξ̄, Ξ_k = Ξ̄ ⊆ Ξ for all k ∈ K'. Let λ⁰_k ∈ ℝ^{|Ξ̄|} be the multiplier vector from QP0(x_k, H_k, Ξ̄) and define

Ξ̃⁰_k ≜ {ξ ∈ Ξ̄ | φ(x_k, ξ) + ⟨∇_x φ(x_k, ξ), d⁰_k⟩ = 0}.

Suppose, without loss of generality, Ξ̃⁰_k = Ξ̃⁰ for all k ∈ K'. As d⁰_k →^{K'} 0, it is clear that Ξ̃⁰ ⊆ Ξ_act(x*) and, in view of Assumption 2, the set {∇_x φ(x_k, ξ) | ξ ∈ Ξ̃⁰} is linearly independent for k large enough. Thus, from (2.2), a unique expression for the QP multipliers (for k large enough) is given by

λ⁰_k = −(R(x_k)ᵀ R(x_k))⁻¹ R(x_k)ᵀ (H_k d⁰_k + ∇f(x_k)),

where R(x_k) ≜ [∇_x φ(x_k, ξ) | ξ ∈ Ξ̃⁰] ∈ ℝ^{n×|Ξ̃⁰|}. In view of Assumption 4, boundedness of {x_k}, the regularity assumptions, and the fact that d⁰_k →^{K'} 0, we see that

λ⁰_k →^{K'} λ⁰'* = −(R(x*)ᵀ R(x*))⁻¹ R(x*)ᵀ ∇f(x*).

Taking limits in the optimality conditions (2.2) for QP0(x_k, H_k, Ξ̄) shows that x* is a KKT point for (DSI) with multipliers

λ*_ξ = λ⁰'*_ξ, ξ ∈ Ξ̃⁰,   λ*_ξ = 0, ξ ∉ Ξ̃⁰. □

We are now in a position to show that there exists an accumulation point of {x_k} which is a KKT point for (DSI). This result is, in fact, weaker than that obtained in [27] for the unconstrained minimax case, where under similar assumptions, but with a more involved argument, it is shown that all accumulation points are KKT. The price to be paid is the introduction of Assumption 5 below for proving Theorem 3.9.

The proof of the following result is inspired by that of Theorem T in [16].

Proposition 3.7 lim inf_k v_k = 0.

Corollary 3.8 There exists an accumulation point x* of {Xk} which is a KKT point for (DSI).

Page 182: Semi-Infinite Programming


Proof: Since v⁰_k ≥ 0 and v¹_k ≥ 0 for all k, Proposition 3.7 implies lim inf_k v⁰_k = 0, i.e. there exists an infinite index set K such that v⁰_k →^K 0. In view of Lemma 3.6, all accumulation points of {x_k}_{k∈K} are KKT points. Finally, boundedness of {x_k} implies at least one such point exists. □

Define the Lagrangian function for (DSI) as

L(x, λ) ≜ f(x) + Σ_{ξ∈Ξ} λ_ξ φ(x, ξ).

In order to show that the entire sequence converges to a KKT point x*, we strengthen our assumptions as follows.

Assumption 1': The functions f : ℝⁿ → ℝ and φ(·, ξ) : ℝⁿ → ℝ, ξ ∈ Ξ, are twice continuously differentiable.

Assumption 5: Some accumulation point x* of {x_k} which is a KKT point for (DSI) also satisfies the second order sufficiency conditions with strict complementary slackness, i.e. there exists λ* ∈ ℝ^{|Ξ|} satisfying (2.1) as well as

• ∇²_{xx}L(x*, λ*) is positive definite on the subspace

{h | ⟨∇_x φ(x*, ξ), h⟩ = 0, ∀ξ ∈ Ξ_act(x*)},

• and λ*_ξ > 0 for all ξ ∈ Ξ_act(x*).

It is well-known that such an assumption implies that x* is an isolated KKT point for (DSI) as well as an isolated local minimizer. The following theorem is the main result of this section.

Theorem 3.9 The sequence {x_k} generated by algorithm FSQP-MC converges to a strict local minimizer x* of (DSI).

Proof: First we show that there exists a neighborhood of x* in which no other accumulation points of {x_k} can exist, KKT points or not. As x* is a strict local minimizer, there exists ε > 0 such that f(x) > f(x*) for all x ≠ x*, x ∈ S ≜ B(x*, ε) ∩ X, where B(x*, ε) is the open ball of radius ε centered at x*. Proceeding by contradiction, suppose x' ∈ B(x*, ε), x' ≠ x*, is another accumulation point of {x_k}. Feasibility of the iterates implies that x' ∈ S. Thus f(x') > f(x*), which is in contradiction with Lemma 3.5(ii). Next, in view of

Page 183: Semi-Infinite Programming


Lemma 3.5(iii), (x_{k+1} − x_k) → 0. Suppose K is an infinite index set such that x_k →^K x*. Then there exists k₁ such that ‖x_k − x*‖ < ε/4 for all k ∈ K, k ≥ k₁. Further, there exists k₂ such that ‖x_{k+1} − x_k‖ < ε/4 for all k > k₂. Therefore, if there were another accumulation point outside of B(x*, ε), then the sequence would have to pass through the compact set B(x*, ε) \ B(x*, ε/4) infinitely many times. This contradicts the established fact that there are no accumulation points of {x_k}, other than x*, in B(x*, ε). □

3.2 Local convergence

We have thus shown that, with a likely dramatically reduced amount of work per iteration, global convergence can be preserved. This would be of little interest, though, if the speed of convergence were to suffer significantly. In this section we establish that, under a few additional assumptions, the sequence {x_k} generated by a slightly modified version of algorithm FSQP-MC (to avoid the Maratos effect) exhibits 2-step superlinear convergence. To do this, the bulk of our effort is focussed on showing that for k large the set of constraints Ξᵇ_k which affect the search direction is precisely the set of active constraints at the solution, i.e. Ξ_act(x*). In addition, we show that, eventually, no constraints outside of Ξ_act(x*) affect the line search, and that H_k is updated normally at every iteration. Thus, for k large enough, the algorithm behaves as if it were solving the problem

minimize f(x) subject to φ(x, ξ) ≤ 0, ∀ξ ∈ Ξ_act(x*),    (P*)

using all constraints at every iteration. Establishing this allows us to apply known results concerning local convergence rates.

The following is proved in the appendix.

Proposition 3.10 For k large enough,

(i) Ξᵇ'⁰_k = Ξᵇ'¹_k = Ξ_act(x*), and

(ii) φ(x_k + t d_k, ξ) ≤ 0 for all t ∈ [0, 1], ξ ∈ Ξ \ Ξ_act(x*).

In order to achieve super linear convergence, it is crucial that a unit step, i.e. tk = 1, always be accepted for all k sufficiently large. Several techniques have been introduced to avoid the so-called Maratos effect. We chose to include a

Page 184: Semi-Infinite Programming


second order correction such as that used in [17]. Specifically, at iteration k, let d̃(x, d, H, Ξ̂) be the unique solution of the QP defined for τ ∈ (2, 3) as follows

minimize ½⟨d + d̃, H(d + d̃)⟩ + ⟨∇f(x), d + d̃⟩
subject to φ(x + d, ξ) + ⟨∇_x φ(x, ξ), d + d̃⟩ ≤ ‖d‖^τ, ∀ξ ∈ Ξ̂,        QP(x, d, H, Ξ̂)

if it exists and has norm less than min{‖d‖, C}, where C is a large number. Otherwise, set d̃(x, d, H, Ξ̂) = 0. The following step is added to algorithm FSQP-MC:

Step 1 (iv). Compute d̃_k = d̃(x_k, d_k, H_k, Ξ_k).

In addition, the line search criteria (2.4) and (2.5) are replaced with

f(x_k + t d_k + t² d̃_k) ≤ f(x_k) + α t ⟨∇f(x_k), d_k⟩    (3.6)

and

φ(x_k + t d_k + t² d̃_k, ξ) ≤ 0, ∀ξ ∈ Ξ,    (3.7)

and (2.6) is replaced with

φ(x_k + (t_k/β) d_k + (t_k/β)² d̃_k, ξ̄) > 0.    (3.8)

With some effort, it can be shown that these modifications do not affect any of the results obtained to this point. Further, for k sufficiently large, the set of binding constraints in QP(x_k, d_k, H_k, Ξ_k) is again Ξ_act(x*). Hence, it is established that for k large enough, the modified algorithm FSQP-MC behaves identically to that given in [17], applied to (P*).

Assumption 1 is now further strengthened and a new assumption concerning the Hessian approximations Hk is given. These assumptions allow us to use the local convergence rate result from [17].

Assumption 1'': The functions f : ℝⁿ → ℝ and φ(·, ξ) : ℝⁿ → ℝ, ξ ∈ Ξ, are three times continuously differentiable.

Assumption 6: As a result of the update rule chosen for Step 3(iv), H_k approaches the Hessian of the Lagrangian in the sense that

lim_{k→∞} ‖P_k(H_k − ∇²_{xx}L(x*, λ*))P_k d_k‖ / ‖d_k‖ = 0,

Page 185: Semi-Infinite Programming


where λ* is the KKT multiplier vector associated with x* and P_k denotes the orthogonal projection onto the subspace {h | ⟨∇_x φ(x_k, ξ), h⟩ = 0, ∀ξ ∈ Ξ_act(x*)}.

Theorem 3.11 For all k sufficiently large, the unit step t_k = 1 is accepted in Step 2. Further, the sequence {x_k} converges to x* 2-step superlinearly, i.e.

lim_{k→∞} ‖x_{k+2} − x*‖ / ‖x_k − x*‖ = 0.

4 EXTENSION TO CONSTRAINED MINIMAX

The algorithm we have discussed may be extended following the scheme of [27] to handle problems with many objective functions, i.e. large-scale constrained minimax. Specifically, consider the problem

minimize max_{ω∈Ω} f(x, ω)
subject to φ(x, ξ) ≤ 0, ∀ξ ∈ Ξ,

where Ω and Ξ are finite (again, presumably large) sets, and f : ℝⁿ × Ω → ℝ and φ : ℝⁿ × Ξ → ℝ are both three times continuously differentiable with respect to their first argument. Given Ω̂ ⊆ Ω, define

F_Ω̂(x) ≜ max_{ω∈Ω̂} f(x, ω).

Given a direction d ∈ ℝⁿ, define a first-order approximation of F_Ω̂(x + d) − F_Ω̂(x) by

F'_Ω̂(x, d) ≜ max_{ω∈Ω̂} {f(x, ω) + ⟨∇_x f(x, ω), d⟩} − F_Ω̂(x),

and, finally, given a direction d̃ ∈ ℝⁿ, let

F̃_Ω̂(x, d, d̃) ≜ max_{ω∈Ω̂} {f(x + d, ω) + ⟨∇_x f(x, ω), d̃⟩} − F_Ω̂(x + d).

Page 186: Semi-Infinite Programming


Let Ω_k be the set of objective functions used to compute the search direction at iteration k. The modified QPs follow. To compute d⁰(x, H, Ω̂, Ξ̂), we solve

minimize ½⟨d⁰, Hd⁰⟩ + F'_Ω̂(x, d⁰)
subject to φ(x, ξ) + ⟨∇_x φ(x, ξ), d⁰⟩ ≤ 0, ∀ξ ∈ Ξ̂,

and to compute d¹(x, Ω̂, Ξ̂), we solve

minimize ½‖d¹‖² + γ
subject to F'_Ω̂(x, d¹) ≤ γ,
           φ(x, ξ) + ⟨∇_x φ(x, ξ), d¹⟩ ≤ γ, ∀ξ ∈ Ξ̂.

Finally, to compute d̃(x, d, H, Ω̂, Ξ̂), we solve

minimize ½⟨d + d̃, H(d + d̃)⟩ + F̃_Ω̂(x, d, d̃)
subject to φ(x + d, ξ) + ⟨∇_x φ(x, ξ), d + d̃⟩ ≤ ‖d‖^τ, ∀ξ ∈ Ξ̂,        QP(x, d, H, Ω̂, Ξ̂)

where again, if the QP has no solution or if the solution has norm greater than min{‖d‖, C}, we set d̃(x, d, H, Ω̂, Ξ̂) = 0.

In order to describe the update rules for Ω_k, following [27], we define a few index sets for the objectives (in direct analogy with the index sets for the constraints as introduced in Section 2). The set of indices of "maximizing" objectives is defined in the obvious manner as

Ω_max(x) ≜ {ω ∈ Ω | f(x, ω) = F_Ω(x)}.

At iteration k, let μ⁰_{k,ω}, ω ∈ Ω_k, be the multipliers from QP0(x_k, H_k, Ω_k, Ξ_k) associated with the objective functions. Likewise, let μ¹_{k,ω}, ω ∈ Ω_k, be the multipliers from QP1(x_k, Ω_k, Ξ_k) associated with the objective functions. The set of indices of objective functions which affected the computation of the search direction d_k is given by²

Ωᵇ_k ≜ {ω ∈ Ω_k | μ⁰_{k,ω} > 0 or μ¹_{k,ω} > 0}.

The line search criterion (3.6) is replaced with

F_Ω(x_k + t d_k + t² d̃_k) ≤ F_Ω(x_k) + α t F'_{Ω_k}(x_k, d_k).    (4.1)

²QP1(x, Ω̂, Ξ̂) is not needed in the unconstrained case. Accordingly, in [27], Ωᵇ_k is defined based on a single set of multipliers.

Page 187: Semi-Infinite Programming

FSQP for Finely Discretized SIP 179

If tk < 1 and the truncation is due to an objective function, then define w E 0 such that

We are now in a position to state the extended algorithm.

Algorithm FSQP-MOC

Parameters. 0: E (0, !), (3 E (0,1), and 0 < 0 « 1.

Data. Xo E IRn, 0 < Ho = Hi[ E IRnxn.

(4.2)

Step 0 - Initialization. Set k +- O. Choose 0 0 ;2 Omax(xo), 3 0 ;2 3 act (xo).

Step 1 - Computation of search direction.

(i). Compute df = dO(Xk,Hk,Ok,3k). If df = 0, stop.

(ii). Compute dl = d1 (Xk,Ok,3k).

(iii). Compute Pk = p(df) and set dk +- (1 - Pk)df + Pkdl.

(iv). Compute dk = d(Xk, dk, Hk, Ok, 3k).

Step 2 - Line search. Compute tk, the first number t in the sequence {1,(3,(32, ... } satisfying (4.1) and (3.7).

Step 3 - Updates.

(i). Set Xk+l +- Xk + tkdk + t~dk.

(ii). If tk < 1 and (4.1) was violated at Xk+l = Xk + ~dk + (~ ) 2 dk,

then let w be such that (4.2) holds. If (3.7) was violated at Xk+l, then let ~ be such that (3.8) holds.

(iii). Pick Ok+l ;2 Omax(Xk+d U ot, and

3 k+l ;2 3 act (xk+d U 3t·

Iftk < 1 and (4.2) holds for some wE 0, then set Ok+l +- Ok+l U{w}. Iftk < 1 and (3.8) holds for some ~ E 3, then set 3k+l +- 3k+l U{~}.

(iv). If tk ~ fJ and w f/. Ok or ~ f/. 3 k set Hk+l +- Hk. Otherwise, obtain a new symmetric positive definite estimate Hk+l to the Hessian of the Lagrangian.

(v). Set k +- k + 1 and go back to Step 1.

Page 188: Semi-Infinite Programming

180 CHAPTER 6

5 IMPLEMENTATION AND NUMERICAL RESULTS

Algorithm FSQP-MOC has been implemented as part of the code CFSQP3

[12]. The numerical test results reported in this section were obtained with a modified copy of CFSQP Version 2.4 (the relevant changes will be included in subsequent releases, beginning with Version 2.5). All test problems we consider here are instances of (DS!). Thus in this section we only discuss implemen­tation details relevant to solving such problems, i.e. implementation details of algorithm FSQP-MC modified to include the second order correction dk. 4

The implementation allows for multiple discretized SIP constraints and contains special provisions for those which are affine in x. Specifically, problem (DSI) is generalized to

minimize f (x)

subject to ¢j(x,~) ~ (Cj(~),x) - dj(~) ~ 0, V~ E :=:(j), j = 1, ... ,mi,

v~ E :=:(j), j = mi + 1, ... ,m,

where C • dj) ----' IRn J. - 1 m d . dj) ----' IR J. - 1 m j . ~i ---, , - , ... , i, j. ~i ---, , - , ... , i,

and :=:(j) is finite, j = 1, ... , m. The assumptions and algorithm statement are generalized in the obvious manner. As far as the analysis of Section 3 is concerned, such a formulation could readily be adapted to the format of (DSI) by grouping all constraints together, i.e. letting :=: = Uj=12(j) , and ignoring the fact that some may be affine in x.

In the case that the initial point Xo provided is not feasible for the affine con­straints, CFSQP first computes v E IRn as the solution of the strictly convex QP

minimize (v, v) subject to (Cj(~),xo+v)-dj(~)~O, V~E:=:(j), j=l, ... ,mi,

for v such that Xo + v is feasible for linear constraints. If the new initial point is not feasible for all nonlinear constraints, then CFSQP iterates, using the algorithm FSQP-MOC, on the constrained minimax problem

minimize . max m~ {¢j(x,~)} J=ml+l, ... ,m ~E=,<J)

subject to (Cj(~),x) - dj(~) ~ 0, V~ E 3~j), j = 1, ... ,mi,

3 Available from the authors. See http://www . isr. umd. edu/Labs/CACSE/FSQP /fsqp. html 4That is, we will not discuss the implementation details specific to the minimax algorithm,

even though in the case that the initial guess is infeasible for nonlinear constraints, the minimax algorithm is used to generate a feasible point.

Page 189: Semi-Infinite Programming

FSQP for Finely Discretized SIP 181

until. max max {cPj (x,~)} ~ 0 is achieved. The final iterate will be J=ml+l, ... ,m ~ES(')

feasible for all constraints, allowing the algorithm to be applied to the original problem.

Recall that it is only required that Sk contain certain subsets of S. The algo­rithm allows for additional elements of S to be included in order to speed up initial convergence. Of course, there is a trade-off between speeding up initial convergence and increasing (i) the number of gradient evaluations and (ii) the size of the QPs. In the implementation, heuristics are applied to add poten­tially useful elements to Sk (see, e.g. [26] for a discussion of such heuristics). In the case of discretized SIP, one may wish to exploit the knowledge that ad­jacent discretization points are likely to be closely related. Following [6,16,27], for some t > 0, the CFSQP implementation includes in Sk the set s~lm(Xk) of t-active "left local maximizers" at Xk. At a point x E X, for j = 1, ... ,m, define the €-active discretization points as

Such a discretization point ~1j) E SU) = {~~j), . .. , ~g~j) I} is a left local maxi­

mizer if it satisfies one of the following three conditions: (i) i E {2, ... , ISU) I-I} and

(5.1)

and (5.2)

(ii) i = 1 and (5.2); (iii) i = ISU)I and (5.1). The set s~lm(x) is the set of all left local maximizers in S,(x) = UJ!=lS~j)(X). The first part of the update (Le. before updates due to line search violations) in Step 3(iii) of the algorithm becomes

'= - '= (x) U ,=b U ,=Um (x ) ~k+l - ~act k ~k~, k .

Finally, we have found that in practice, including the end-points (whether or not they are close to being active) during the first iteration often leads to a better initial search direction. Thus we set

A few other specifics of the CFSQP implementation are as follows. First, as was discussed in Section 2, it is not required to use Q pl (x, S) to compute our

Page 190: Semi-Infinite Programming

182 CHAPTER 6

feasible descent direction d1 • In fact, at iteration k, CFSQP uses the following QP, which is a function of the SQP direction ef/.,

minimize ¥ lief/. - d1 112 + ')' subject to ('V'j(xk),d1 )::; ,)"

¢(Xk,') + ('V'x¢(xk,,),d1 ) ::; ,)" V, E 3k,

where"., was set to 0.1 for our numerical tests. Using such an objective function encourages d~ to be "close" to df, a condition we have found to be beneficial in practice. It can be verified that the arguments given in Section 3 go through with little modification if we disable the inclusion of ef/. in the QP objective function when the step size tk-l from the previous iteration is less than a given small threshold. 5 The expression used for Pk is given by

~ Ild°ll'< Pk = IIdfll~ + Vk '

where Vk = max{0.5, IId~IIT}, with f<i, = 2.1 and T = 2.5 for our numerical experiments. The matrices Hk are updated using the BFGS formula with Powell's modification [21]. While it is not clear that the sequence {Hd will satisfy Assumption 6, such an update scheme is known to perform well in practice. The multiplier estimates used for the updates are those obtained from QPO(Xk, Hk, 3 k), with all multipliers corresponding to discretization points outside of 3k set to zero. The QP sub-problems were solved using the routine QLD due to Powell and Schittkowski [25). Finally, the following parameter values were used for all numerical testing: a = 0.1, f3 = 0.5, f = 1, and 8 was set to the square root of the machine precision.

In order to judge the efficiency of algorithm FSQP-MOC, we ran the same numerical tests with two other algorithms differing only in the manner in which 3 k is updated. The results are given in Tables 1 and 2. All test algorithms were implemented by making the appropriate modifications to CFSQP Version 2.4. In the tables, the implementation of FSQP-MOC just discussed is denoted NEW. A simple f-active strategy was employed in the algorithm we call f-ACT, i.e. we set 3k = 3,(Xk) for all k, where f = 0.1. The standard FSQP scheme of [17] was applied in algorithm FULL by simply setting 3 k = 3, for all k. All three algorithms were set to stop when lid!:" ::; 1 X 10-4 •

We report the numerical results for 13 discretized SIP test problems with dis­cretization levels of 101 and 501. A uniform discretization was used in all cases. Problems cw-2, cw_3, cw_5, and cw_6 are borrowed from [3]. Problems with the

SIn the numerical experiments reported here, tk remained bounded away from O.

Page 191: Semi-Infinite Programming

FSQP for Finely Discretized SIP 183

prefix oet are from [15]. The sixth problem from [15] was run with m = 1, 2, and 3 (where m is as defined in [15]). These three cases are labelled oet_B .i, oet_B.2, and oet_B. 3 in our tables. Finally, hz_i is from [10]. Since [10] is not widely available, we give the problem hz_i explicitly:

minimize X2 subject to -(1 - y2) + (~x~ - 2X1Y) ~ X2,

(1 - y2) - (~x~ - 2X1Y) ~ X2, Vy E [-1,1],

Vy E [-1,1].

In all cases except for oet3, the initial guess Xo is the same as that given in the reference from which the problem was taken. For oet3, €-ACT and FULL had difficulty generating a feasible point (while NEW did not), thus we used the feasible initial guess Xo = (0,0,0,-7,-3,-1,3). The first two columns of the tables are self-explanatory. A description of the remaining columns is as follows. The third column, n, indicates the number of variables, while m( and mn in the next two columns indicate the number of linear SIP constraints and nonlinear SIP constraints (mn = m - ml), respectively. Next, NF is the number of objective function evaluations, NG is the number of "scalar" constraint function evaluations (i.e. evaluation of some ¢j (x,~) for a given x and ~), and IT indicates the number of iterations required before the stopping criterion was satisfied. Finally, L 12k I is the sum over all iterations of the size of 2k (it is equal to the number of gradient evaluations in the case of NEW and FULL), 12* I is the size of 2k at the final iterate, and TIME is the time of execution in seconds on a Sun Sparc 4 workstation. For all test problems and for all three algorithms, the value of the objective function at the final iterate agreed (to within four significant figures) with the optimal value as given in the original references.

A few conclusions may be drawn from the results. In general, NEW requires the most iterations to "converge" to a solution, whereas FULL requires the least. Typically, though, the difference is not large. Of course, such behavior is expected since NEW uses a simpler QP model at each iteration. It is clear from comparing the results for L 12k I that NEW provides significant savings in the number of gradient evaluations and the size of the QP sub-problems. The savings for €-ACT are not as dramatic. In almost all cases, comparing- TIME of execution confirms that, indeed, NEW requires far less computational effort than either of the other two approaches. FUrther, note that 12*1 remains, in general, unchanged for NEW when the discretization level is increased and is typically equal to, or less than, n. This is not the case for €-ACT and FULL, as would be expected. Such behavior suggests that computational effort does not increase with respect to discretization level for NEW at the same rate as it does for €-ACT and FULL. This conclusion is supported by the increase in execution TIME when discretization level increases.

Page 192: Semi-Infinite Programming

184 CHAPTER 6

Table 1 Numerical results for problems with IS(j) I = 101.

PROB ALGO n mt mn NF NG IT EIBkl IB*I TIME cw_2 NEW 2 0 1 5 634 5 11 2 0.30

€-ACT 10 1416 10 357 76 0.56 FULL 5 599 5 505 101 0.76

cw_3 NEW 3 0 1 10 1625 14 22 2 0.43 €-ACT 14 2358 18 25 3 0.43 FULL 20 1866 16 1616 101 0.86

cw_5 NEW 3 1 0 13 13 38 3 0.20 €-ACT 5 5 347 101 0.19 FULL 4 4 404 101 0.53

cw_6 NEW 2 0 1 14 1858 15 15 1 0.33 €-ACT 15 1534 12 32 8 0.26 FULL 14 1752 15 1515 101 0.71

oet_l NEW 4 2 0 12 12 62 4 0.34 €-ACT 8 8 759 202 0.40 FULL 6 6 1212 202 0.60

oet_2 NEW 3 2 0 12 12 57 4 0.32 €-ACT 7 7 292 46 0.29 FULL 6 6 1212 202 0.62

oetA NEW 4 0 2 19 5711 21 91 4 0.88 €-ACT 14 5895 16 920 202 1.19 FULL 12 3147 14 2828 202 2.19

oet_5 NEW 5 0 2 23 8067 24 106 4 1.19 €-ACT 31 10930 27 3839 202 4.27 FULL 31 10777 29 5858 202 6.22

oet_6.1 NEW 3 0 2 6 1283 6 26 3 0.27 €-ACT 9 3132 9 412 111 0.49 FULL 4 817 4 808 202 1.23

oet_6.2 NEW 5 0 2 23 7099 21 111 6 1.18 €-ACT 17 6236 18 2359 202 3.15 FULL 15 4479 15 3030 202 3.46

oet_6.3 NEW 7 0 2 48 11744 29 188 7 2.36 €-ACT 647 61831 113 21552 202 50.44 FULL 372 39721 77 15554 202 37.82

hz_l NEW 2 0 2 4 1206 6 8 1 0.23 €-ACT 4 1206 6 68 31 0.24 FULL 8 1887 10 2020 202 1.03

Page 193: Semi-Infinite Programming

FSQP for Finely Discretized SIP 185

Table 2 Numerical results for problems with ISCi) I = 501.

PROB ALGO n ml mn NF NG IT EIBkl IB*I TIME ClL2 NEW 2 0 1 5 3130 5 12 2 0.71

f-ACT 10 6947 10 1718 378 2.12 FULL 5 2965 5 2505 501 3.33

cw_3 NEW 3 0 1 10 7972 14 22 2 1.08 f-ACT 15 13599 19 86 12 1.37 FULL 20 9216 16 8016 501 3.75

ClL5 NEW 3 1 0 47 47 142 2 1.28 f-ACT 6 6 2213 501 0.71 FULL 5 5 2505 501 1.17

ClLS NEW 2 0 1 14 9012 15 15 1 0.88 f-ACT 15 7509 12 144 40 0.79 FULL 14 8585 15 7515 501 2.98

oet_l NEW 4 2 0 15 15 86 4 1.21 f-ACT 8 8 3734 1002 1.55 FULL 6 6 12024 1002 2.62

oet_2 NEW 3 2 0 18 18 89 4 1.34 f-ACT 8 8 1582 224 1.02 FULL 6 6 6012 1002 2.58

oetA NEW 4 0 2 19 26511 21 95 4 2.71 f-ACT 14 29145 16 4508 1002 5.07 FULL 12 15531 14 14028 1002 9.80

oet_5 NEW 5 0 2 24 39314 23 102 4 3.91 f-ACT 22 43177 24 15987 1002 16.9 FULL 31 52255 29 29058 1002 28.4

oet_S.l NEW 3 0 2 6 6270 6 26 3 0.76 f-ACT 9 15415 9 2010 557 1.83 FULL 4 4017 4 4008 1002 2.67

oet_S.2 NEW 5 0 2 23 35073 21 118 7 3.70 f-ACT 19 33602 19 12688 1002 16.3 FULL - 15 22114 15 15030 1002 15.7

oet_S.3 NEW 7 0 2 109 149623 73 483 9 17.8 f-ACT 647 305237 113 106860 1002 250 FULL 376 196081 77 77154 1002 172

hz_l NEW 2 0 2 4 5968 6 8 1 0.69 f-ACT 4 5968 6 320 159 0.88 FULL 10 11386 12 12024 1002 5.60

Page 194: Semi-Infinite Programming

186 CHAPTER 6

6 CONCLUSIONS

We have presented and analyzed a feasible SQP algorithm for tackling smooth nonlinear programming problems with a large number of constraints, e.g. those arising from discretization of SIP problems. At each iteration, only a small subset of the constraints are used in the QP sub-problems. Thus, fewer gra­dient evaluations are required and the computational effort to solve the QP sub-problems is decreased. We showed that the scheme for choosing which constraints are to be included in the QP sub-problems preserves global and fast local convergence. Numerical results obtained from the CFSQP implemen­tation show that, indeed, the algorithm performs favorably.

Acknowledgments. The authors wish to thank Stephan Gomer for his many valuable comments on a preliminary draft of the paper. This research was supported in part by NSF grant No. DMI-93-13286 and by the NSF Engineering Research Centers Program NSFD-CDR-88-03012.

REFERENCES

[IJ M. C. Biggs. Constrained minimization using recursive equality quadratic pro­gramming. In F. A. Lootsma, editor, Numerical Methods for Non-Linear Opti­mization, pages 411-428. Academic Press, New York, 1972.

[2J P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, pages 1-51, 1995.

[3J I. D. Coope and G. A. Watson. A projected Lagrangian algorithm for semi­infinite programming. Math. Programming, 32:337-356, 1985.

[4] J. W. Daniel. Stability of the solution of definite quadratic programs. Math. Programming, 5:41-53, 1973.

[5J C. Gonzaga and E. Polak. On constraint dropping schemes and optimality func­tions for a class of outer approximation algorithms. SIAM J. Control Optim., 17:477-493, 1979.

[6] C. Gonzaga, E. Polak, and R. Trahan. An improved algorithm for optimization problems with functional inequality constraints. IEEE Transactions on Auto­matic Control, AC-25:49-54, 1980.

[7J S. A. Gustafson. A three-phase algorithm for semi-infinite programs. In A. V. Fi­acco and K. O. Kortanek, editors, Semi-Infinite Programming and Applications, Lecture Notes in Control and Information Sciences 215, pages 138-157. Springer Verlag, 1983.

[8J R. Hettich. An implementation of a discretization method for semi-infinite pro­gramming. Math. Programming, 34:354-361, 1986.

Page 195: Semi-Infinite Programming

FSQP for Finely Discretized SIP 187

(9) R. Hettich and K. O. Kortanek. Semi-infinite programming: theory, methods, and applications. SIAM Rev., 35:380-429, 1993.

(10) R. Hettich and P. Zencke. Numerische Methoden der Approximation und Semi­Infiniten Optimierung. Teubner Studienbiicher Mathematik, Stuttgart, Germany, 1982.

(11) K. C. Kiwiel. Methods of Descent in Nondifferentiable Optimization, Lecture Notes in Mathematics No. 1183. Springer-Verlag, Berlin, 1985.

(12) C. T. Lawrence, J. L. Zhou, and A. L. Tits. User's Guide for CFSQP Version 2.4: A C Code for Solving (Large Scale) Constrained Nonlinear (Minimax) Op­timization Problems, Generating Iterates Satisfying All Inequality Constraints, 1996. ISR TR-94-16r1, Institute for Systems Research, University of Maryland (College Park, MD).

[13] C. Lemarechal. Nondifferentiable optimization. In G. Nemhauser, A. Rinooy­Kan, and M. Todd, editors, Optimization, Handbooks in Operations Research and Management Science. Elsevier Science, North Holland, 1989.

(14) H. Mine, M. Fukushima, and Y. Tanaka. On the use of €-most active constraints in an exact penalty function method for nonlinear optimization. IEEE Transac­tions on Automatic Control, AC-29:1040-1042, 1984.

(15) K. Oettershagen. Ein superlinear konvergenter Algorithmus zur Losung semi­infiniter Optimierungsprobleme. PhD thesis, Bonn University, 1982.

(16) E. R. Panier and A. L. Tits. A globally convergent algorithm with adaptively re­fined discretization for semi-infinite optimization problems arising in engineering design. IEEE Transactions on Automatic Control, AC-34(8):903-908, 1989.

(17) E. R. Panier and A. L. Tits. On combining feasibility, descent and superlinear convergence in inequality constrained optimization. Math. Programming, 59:261-276, 1993.

(18) E. Polak and L. He. Rate preserving discretization strategies for semi-infinite pro­gramming and optimal control. SIAM J. Control and Optimization, 30(3):548-572, 1992.

(19) E. Polak and D. Q. Mayne. An algorithm for optimization problems with func­tional inequality constraints. IEEE Transactions on Automatic Control, AC-21:184-193, 1976.

(20) E. Polak and A. L. Tits. A recursive quadratic programming algorithm for semi­infinite optimization problems. Appl. Math. Optim., 8:325-349, 1982.

(21) M. J. D. Powell. A fast algorithm for nonlinearly constrained optimization cal­culations. In G. A. Watson, editor, Numerical Analysis, Dundee, 1977, Lecture Notes in Mathematics 630, pages 144-157. Springer Verlag, 1978.

(22) M. J. D. Powell. A tolerant algorithm for linearly constrained optimization calculations. Math. Programming, 45:547-566, 1989.

(23) R. Reemtsen. Discretization methods for the solution of semi-infinite program­ming problems. J. Optim. Theory Appl., 71:85-103, 1991.

Page 196: Semi-Infinite Programming

188 CHAPTER 6

[24] S. M. Robinson. Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear-programming algorithms. Math. Programming, 7:1-16, 1974.

[25] K. Schittkowski. QLD: A Fortran Code for Quadratic Programming, User's Guide. Mathematisches Institut, Universitiit Bayreuth, Germany, 1986.

[26] K. Schittkowski. Solving nonlinear programming problems with very many con­straints. Optimization, 25:179-196, 1992.

[27] J. L. Zhou and A. L. Tits. An SQP algorithm for finely discretized continuous minimax problems and other minimax problems with many objective functions. SIAM J. on Optimization, pages 461-487, May 1996.

[28] G. Zoutendijk. Methods of Feasible Directions. Elsevier Science, Amsterdam, The Netherlands, 1960.

Page 197: Semi-Infinite Programming

FSQP for Finely Discretized SIP 189

APPENDIX A

PROOFS

The following lemmas will be used in the proof of Proposition 3.7.

Lemma A.O.l Given x E IRn and H > 0, suppose 2' C 2" ~ 2. (i) If dO(x, H, 2') is not feasible for QPO(x, H, 2"), then vO(x, H, 2") < vO(x, H, 2'), and (ii) if dl (x, 2') is not feasible for QP I (x, 2"), then vI (x, 2") < vI (x, 2').

Proof: First dO(x, H, 2") :f dO(x, H, 2'), since by assumption dO(x, H, 2') is not feasible for QPO(x, H, 2"). On the other hand, since 2' C 2", dO(x, H, 2") is feasible for QPO(x, H, 2'). Uniqueness of the solution of QPO(x, H, 2') then implies the claim. Part (ii) is proved similarly. 0

Lemma A.O.2 Suppose K is an infinite index set such that

kEIC * H kEIC H* JO kEIC dO * dl hEIC dl * hEX; * Xk ~ x, k ~ 'Uk ~ " k ~ " "{k ---t "( ,

where x* is not a KKT point for (DSI), and suppose 2k = 3 for all k E K. Then there exists! > 0 such that for all t E [O,!], fjJ(Xk + tdk'~) ~ 0, for all { E 3, and for all k E K sufficiently large.

Proof: By definition of d~ and "{k, for all k E K, fjJ(Xk, 0 + (\7 ",fjJ(Xk, ~), dl) ~ "{k, for all ~ E 3. Since Xk is not a KKT point, d~ :f 0 and "{k < 0, for all k E K (Lemma 3.2). Further, in view of Lemma 3.4 (dl ,*, "(*) solves Qpl(X*, 3), and since x* is not a KKT point, dl ,* :f 0 and "{* < 0 (Lemma 3.2). Thus, there ex­ists 1 < 0 (e.g. 1 = "(* /2) such that for all k E K, fjJ(Xk'~) + (\7 ",fjJ(Xk, ~), dl) ~ 1, for all ~ E 3. It follows that there exists 0 > 0 and Is. such that for all k E K, k? Is.,

(\7",fjJ(xk,~),dD ~ -0, v~ E 3n2act (x*)

fjJ(Xk,~) ~ -J, v~ E 3 \ (3 n 2 act (x*)).

Next, in view of Lemma 3.1, 4c :f 0 and (\7",fjJ(xk,~),d2) ~ 0, for all ~ E 2act(Xk), for all k E K. On the other hand, applying Lemma 3.4 allows us to conclude dO,* solves QP(x*, H*, 3). Hence, from Lemma 3.6, since x* is not

Page 198: Semi-Infinite Programming

190 CHAPTER 6

a KKT point for (DS!), d O,* :/:- O. Since p(.) is assumed to be bounded away from zero outside every neighborhood of zero, there exists p > 0 such that Pk = p(d2) 2: f!., for all k E /C. It follows that -

(1- Pk)(\1x¢(Xk,e),d2) + Pk(\1x¢(xk,O,dl)

< -f!.0' Ve E 3 n 3 act (x*),

~ ~ for all k E /C, k 2: k. Now let Q = {Xk\ k E /C} u {x*}, V = {dk\ k E /C} U {d*} and define

which is well-defined and continuous in t for all e E 3, since Q and V are compact. Now for all k E /C, e E 3 we have

¢(Xk + tdk, e) - ¢(Xk' e)

11 (\1 x¢(Xk + trydk' e), dk)dry

t {11 (\1 x¢(Xk + trydk' e) - \1 x¢(Xk, e), dk)dry + (\1 x¢(Xk, e), dk)}

< t{ sup IIVx<l>(Xk+tTJdk,O-Vx<l>(Xk'~)II.lIdkll+(Vx<l>(Xk,~),dk)} 1/E[O,1]

< t{M(t,e) + (\1x¢(xk,e),dk)}. (0.1)

Further note that M(O,e) = 0, for all e E 3. For e E 3 n 3 act(x*), define t.t, such that M(t,O < f!.0 for all t E [O,t.t,l. For all e E 3 \ (3 n 3 act (x*», our regularity assumptions and boundedness of {Xk} and {dd imply there exist M 1 ,e > 0 and M2 ,e > 0 such that

For such e, define t.t, = 0/(M1,e+M2,e)· Then t{M(t, e)+(\1 x¢(Xk, e), dk)} ~ 0, for all t E [O,t.t,j, e E 3 \ (3 n 3 act (x*». Finally, set t. = maxeES t.t,. From (0.1) it is easily verified that t. is as claimed. 0

Proof of Proposition 3.7. We argue by contradiction. Suppose that

v* ~ liminf Vk > O. k

(0.2)

Page 199: Semi-Infinite Programming

FSQP for Finely Discretized SIP 191

As all sequences of interest are bounded, there exists an infinite index set K such that

Vk ~ v*, Xk ~ x*, Hk ~H*, ~ * Pk P ,

dl. ~ dO,*, &. ~ dO,* k+l + ' dk ~ d1,*, d1 ~ d1,* k+l + '

° kEIC ° ° ~ 0,* VI ~ v1,* 1 ~ 1,* Vk ~ V '*, Vk+1 V+ ' k , Vk+1 V+ '

~ * 'Yk 'Y , ~ * 'Yk+l 'Y+.

Further, since the number of possible subsets of 3 is finite, we may assume that on K, the sets 3:,0 and 3:,1 are constant and equal to Sb,O and Sb,1 , respectively. Thus, for all k, dl. solves QPO(xk,Hk,Sb,O) and dk solves QP1(Xk,Sb,I). Note that in view of the definition of 3:,0 and 3:,1, the sequences constructed by the algorithm are identical to those that would have been constructed with 3' ~ Sb,O U Sb,1 substituted for 3 k for all k. Without loss of generality we thus assume that 3 k = 3', for all k. Finally, define d* = (1 - p*)do,* + p*d1,*.

In view of Lemma 3.4, dO,* and d1,* are the unique solutions of QPO(x*, H*, 3') and QP1 (x*, 3'). Now, of course, x* is not a KKT point for (DS!), other­wise, in view of Lemmas 3.1 and 3.2, dO,* = d1,* = 0, which would imply v* = vO,* + v1,* = 0, contradicting (0.2). Hence, dO,* =I- 0 and d1,* =I- 0, and both are directions of descent for f(·) at x*. This further implies d* =I- 0 and (V' f(x*), d*) < O. Therefore, applying Lemma 3.5(iii), we conclude that tk ~ O. Without loss of generality, assume that tk < min{<5,t}, for all k E K, where t is as given by Lemma A.0.2 and <5 > 0 is as in the algorithm. The fact that tk < <5 < 1 implies that for all k E K the line search criterion of Step 2 is not satisfied at Xk+1 = Xk + o/Jdk. Since a < 1 (indeed, a < 1/2) using a standard argument it follows that (2.4) is violated at Xk+1 only finitely many times. Thus, without loss of generality, assume (2.6) holds for all k E K, i.e.

¢(Xk+l'~) > 0, Vk E K.

Further, we have assumed (since there are only a finite number of constraints) that the violation is caused by the same constraint, with index ~, for all k E K. In view of Lemma A.0.2, we may conclude that ~ ~ 3'. Thus, according to Step 3(iv) , Hk+l = Hk, for all k E K.

Since, for all k E K, Hk+1 = Hk and 3 k = 3' = 3:,0 u 3:,1, the directions df+1 and dk+l solve QPO(xk+1,Hk,3k+d and QP1(Xk+l,3k+1), for all k E K, where, for some 3",

~ ~" """\ ~, U {i} '::'k+l =.::. :::!..::. ...

Page 200: Semi-Infinite Programming

192 CHAPTER 6

Without loss of generality, since the number of constraints is finite, we may assume that the set of indices in 3 k +1 and not in 3' U {~} is constant for all k E JC. Further, in view of Lemma 3.5(iii), Xk+1 ~ x*. It follows that, in view of Lemma 3.4, the limits d~'* and d~* are the unique solutions of QPO(x*,H*,3/1) and QP1(x*,3/1). Since, ¢(Xk+1'~) > 0 and ¢(Xk+l'~) ~ 0, for all k E JC, and since Lemma 3.5(iii) also implies Xk+l ~ x*, we see that ¢(x*,~) = o. By considering a first-order expansion of ¢(Xk+1'~) - ¢(Xk+1, ~), and taking limits, we see that ('\1 x¢(x*, ~), d*) 2: O. Note that since dO,* "I 0 and d2 "10, for all k E JC, d2 is bounded away from zero. By our assumptions on p(.), Pk is thus bounded away from zero and p* > O. This implies that either ('\1 x¢(x*, ~), d1,*) 2: 0, or ('\1 x¢(x*, ~), dO,*) > O. If the first inequality holds, then (d1,*,,*) is infeasible for QP1(x*,3/1) (recall that ¢(x*,~) = 0 and, from Lemmas 3.4 and 3.2, ,* < 0) and, in view of Lemma A.0.1, v~* < v1,*. Simi­larly, if the second inequality holds, then dO,* is infeasible for QPO(x*, H*, 3/1), and v~* < vO,*. In view of (0.2), in both cases we have a contradiction. 0

The following sequence of Lemmas will be used in the proof of Proposition 3.10.

Lemma A.0.3 There exists an infinite index set JC such that, for all k E JC, ( .) ~ (*) C ~b,O d ( .. ) ~ (*) C ~b,l ~ '::'act x -'::'k' an n '::'act x -'::'k·

Proof: In view of Proposition 3.7, since for all k v~ 2: 0 and vk 2: 0, there exists an infinite index set JC such that both vZ ~ 0 and vk ~ O. By Lemma 3.6, eft, ~ 0 and dl ~ o. To prove (i), let AZ,(, ~ E 3k, be the multipliers from QPO(xk,Hk,3k) and let AZ,( = 0, for all ~ f/. 3k. Assume, without loss of

generality, that 3~'0 = 2° for all k E JC and Hk ~ H*. Since 11. is compact, and in view of Assumptions 2 and 5, we may apply Theorem 2.1 of [24) to show that AZ,( ~ A~'*, ~ E 3, the KKT multipliers for QPO(x*, H*, 2°). Note

that the KKT conditions (2.2) for QPO(x*, H*, 20J are equivalent to the KKT conditions (2.1) for (DS!) at x* with multipliers At, ~ E 3. Uniqueness of the

multipliers at x* (Assumption 2) and strict complementarity imply A~'* > 0 if

~ E 3 act (x*). Therefore, 3act (x*) ~ 2b,0, which means 3 act(x*) ~ 3~'0, for all k E JC. Part (ii) is proved similarly. 0

Lemma A.O.4 Given f > 0, there exists 8 > 0 such that for every x E X satisfying Ilx - x*11 < 8, every H E 11., and every 2 ~ 3 with 3 act (x*) ~ 2,

(i) all ~ E 3 act(x*) are binding for QPO(x,H,2) and IIdO(x,H,2)11 < f, and

(ii) all ~ E 3 act(x*) are binding for QP1(x,2) and Ild1 (x,2)11 < f.

Page 201: Semi-Infinite Programming

FSQP for Finely Discretized SIP 193

Proof: Given H E 1£ and 3 ~ 3 such that 3 act(x*) ~ 3, Lemmas 3.1 and 3.2 imply that dO(x*, H, 3) = d1(x*, 3) = O. Since 1£ is compact, Assumptions 2 and 5 allow us to apply Theorem 2.1 of [24] to conclude that, given f > 0, there exists 83 > 0 such that for all x satisfying Ilx - x* II < 83 and all H E 1£, the QP multipliers from QPO(x,H,3) and QP1(x,3) are positive for all ~ E 3 act (x*), IIdO(x,H,3)1I < f, and Ild1 (x,3)11 < f. As 3 is a finite set, 8 may be chosen independently of 3. 0

Lemma A.0.5 For k sufficiently large 3 act{x*) ~ 3~'o and 3 act{x*) ~ 3~,1.

Proof: For an arbitrary f > 0, let 8 > 0 be as given by Lemma A.OA. In view of Theorem 3.9, there exists Is. such that Ilxk - x* II < 8 for all k 2:: Is.. By Lemma A.0.3, there exists an infinite index set K such that 3 act {x*) ~ 3~'o, and 3 act (x*) ~ 3~,1, for all k E K. Choose Is.' 2:: Is., Is.' E K. It follows that 3 act (x*) ~ 3k,+!. The result follows by induction and Lemma A.OA. 0

Lemma A.0.6 ~ -----+ 0 and d~ -----+ O.

Proof: Follows immediately from Lemma A.0.5, Step 3(iii) of algorithm FSQP-MC, Assumption 4, and Lemma A.OA. 0

Proof of Proposition 3.10. For (i), in view of Lemma A.0.5, it suffices to show that, for k sufficiently large, 3~'o ~ 3 act {x*) and 3~,1 ~ 3 act {x*).

Suppose t E 3 \ 3 act (x*), i.e. ¢(x*, t) < O. Since Xk -----+ x*, by continuity we have ¢(Xk'~) < 0 for all k sufficiently large. In view of Lemma A.0.6, for k sufficiently large we have

Therefore, A~,t = 0 (hence ~ f/. 3~'o) for all k sufficiently large. The argument

is identical for 3~,1. Part (ii) follows from Theorem 3.9, Lemma A.0.6, and our regularity assumptions. 0

Page 202: Semi-Infinite Programming

7 NUMERICAL METHODS FOR

SEMI-INFINITE PROGRAMMING: A SURVEY

Rembert Reemtsen1 and Stephan Gorner2

ABSTRACT

1 Brandenburgische Technische Universitiit Cottbus, Fakultiit 1, Post/ach 101344, 03013 Cottbus, Germany,

Email: [email protected]

2 Technische Universitiit Berlin, Fachbereich Mathematik, StrafJe des 17. Juni 136, 10623 Berlin, Germany,

Email: [email protected]

This paper provides a review of numerical methods for the solution of smooth semi­infinite programming problems. Fundamental and partly new results on level sets, discretization, and local reduction are presented in a primary section. References to algorithms for real and complex continuous Chebyshev approximation are given for historical reasons and in order to point out connections.

1 INTRODUCTION

The first book in which semi-infinite programming (SIP) problems were treated systematically seems to be the book by Glashoff and Gustafson [53,54] on linear programming. Later on, the proceedings volumes [5,45,83] provided important collections of articles on this subject. A milestone in SIP eventually was the book by Hettich and Zencke [91], which contains the fundamentals and basic ideas for the numerical solution of Chebyshev approximation (CAP) and general SIP problems.

More recently, Polak [148] and Hettich and Kortanek [88] gave surveys of SIP from the viewpoint of nondifferentiable and smooth optimization, respectively. Polak also devoted a chapter in his book [150] to SIP problems. While the authors of [88, 150] concentrate on certain types of methods, the purpose of

195

R. Reemtsen and J.-J. Riickmann (eds.), Semi-Infinite Programming, 195-275. © 1998 Kluwer Academic Publishers.

Page 203: Semi-Infinite Programming

196 CHAPTER 7

this paper is to describe the state-of-the-art of numerical methods for general smooth SIP problems. Simultaneously as much background and help are pro­vided for the practitioner as this is possible in such framework.

One motivation for investigating SIP problems originally was to unify and to extend numerical methods for the solution of CAP problems. Earlier, such methods had typically been developed by techniques from approximation the­ory. By today's understanding, however, they often own few or no peculiarities, compared with general SIP methods. Even the second Remes algorithm for lin­ear real CAP problems, which is based on a special optimality criterion [134], is only slightly more efficient than an exchange algorithm for general SIP prob­lems (cf. Section 3.1). Thus, when we refer to work on CAP, we do this for historical and illustrative purposes. Yet we do not strive for a complete list of references concerning CAP, since much of the work on such problems has been absorbed in SIP and since, in our opinion, CAP problems should normally be solved by SIP methods nowadays.

The paper is organized as follows. Introductions into the SIP and CAP prob­lems and into the fundamentals for their numerical solution are provided in Section 2. Subsections deal with level sets, discretization, local reduction, and a classification of methods. The study of numerical methods begins with meth­ods for linear SIP problems in Section 3, where dual exchange, primal exchange, interior-point, and further methods are considered in turn. In Section 4, the review is continued with a discussion of cutting plane methods and other al­gorithms for convex problems. The final Section 5 is devoted to methods for nonlinear problems, which are classified by discretization methods, methods based on local reduction, and other methods comprising hybrid methods.

2 FUNDAMENTALS

2.1 Notations

No means the set N U {O}, OC the field of real or complex numbers, and IAI the cardinality of the set A. For X ~ IRn and Z ~ IRffl, the space denoted by Cr,S(X x Z, OC) is the space of all OC-valued functions on X x Z which have r partial derivatives with respect to the first and s partial derivatives with respect to the second argument where the respective derivatives are continuous on X x Z. The definition of the space Cr(X, OC) then is obvious. C(· x ".) and C(',·) stand for CO,O(.,') and CO(" .), respectively.

Page 204: Semi-Infinite Programming

Numerical Methods 197

The vector (x, z) with x E mp and z E m.q is considered to be an element of the product space m.p x m.q or to be a column vector in JW+q, depending on the circumstances. 11·11 is an arbitrary norm, and 11'lIp ' 1 :::; p :::; 00, is the usual lP -norm in m.s , sEN. Sequences of vectors xk and sets Yk are written as {xk} and {Yd and begin with XO resp. Yo. The density of a set Y.,. ~ Y in Y ~ m.m is defined by

dist(Y,,., Y) := sup inf Ily - y.,. 1100 . YEyy"EY"

"w.l.o.g." means "without loss of generality". "LP", "QP", and "SQP" stand for "linear programming" , "quadratic programming" , and "sequential quadratic programming" , respectively.

2.2 The semi-infinite programming problem

We consider the optimization problem

Minimize f(x) subject to gj(x,y):::; 0, Y E yj (j = 1, ... ,p),

(2.1)

where yj ~ m.mj is a compact set with cardinality jYj I = 1 or Iyj I = 00, and f E c(m.n, m.) and gj E c(m.n x yj, m.) are given functions. We speak of a finite optimization problem if Iyjl = 1 for j = 1, ... ,p, in which case we can write gj(x) := g(x, y) with y E yj, and we speak of a semi-infinite programming (SIP) problem when Iyjl = 00 for at least one j. The number n is said to be the dimension of the problem. For the sake of simplicity, all functions here are normally defined on the entire space m.n .

Problem (2.1) is said to be linear if f and gj("Y)' y E yj, are affine-linear and to be convex if these functions are convex. We speak of a nonlinear problem if neither linearity nor convexity of f and gj("Y)' y E yj, are required. W.l.o.g. the objective function f of (2.1) can be assumed to be linear when requested, since (2.1) can be equivalently converted into the following (n + I)-dimensional problem of the same type:

Minimize Xn+l subject to gj(x,y) :::; 0, Y E yj (j = 1, ... ,p), f(x):::; xn+l'

An (inequality) constraint gj(x,y) :::; 0, Y E yj, with constraint function gj is denoted as finite when jYj I = 1 and as semi-infinite when Iyj I = 00. For brevity, an equality constraint hj(x) = ° is assumed to be included in (2.1)

Page 205: Semi-Infinite Programming

198 CHAPTER 7

by the two finite inequality constraints ±hj(x) ::; O. Sometimes, however, it is required that the feasible set of (2.1) has a nonempty interior, which excludes equality constraints.

Instead of (2.1), we usually relate in this paper to the SIP problem

P[Y]: Minimize f(x) subject to g(x, y) ::; 0, y E Y, (2.2)

with given functions f E C(JRn , JR) and 9 E C(JRn x Y, JR) and a compact set Y c;:; JRm. In fact, by proper definition of g, problem (2.1) can be written in this form so that results for (2.2) are also valid for problem (2.1). (Let m := maxl~j~p mj, embed the set yj in JRm, redefine some or all gj(x,') and yj, if necessary, such that all sets yj are disjoint, and let g(x,·) be the function defined on Y := U;=l yj which equals gj(x,') on yj.) Normally the set Y is believed to have cardinality WI = 00 here, but this does not need to be assumed. Many algorithms for SIP problems are also applicable to finite programs and especially suited for programs with many constraints.

For each subset Yo- £; Y, we define the minimization problem

P[Yo-]: Minimize f(x) aver F(Yo-)

with the feasible set resp. the set of feasible points

F(Yo-):= {x E JRn I g(x,y)::; 0, Y E Yo-}.

We call the number J,t(Yo-):= inf f(x)

xEF(Y,,)

(2.3)

(2.4)

the minimal value of P[Yo-], and we say that XO- E F(Yo-) is a (global) solution of P[Yo-] if f(xo-) = J,t(Yo-). For Yo- := Y, problems (2.2) and (2.3) coincide. If Yo- c;:; Y is a finite and Y an infinite set, problem P[Yo-] is said to be a discretized SIP problem.

In regard to P[Yo-], the inequality constraint g(x, y) ::; 0 with y E Yo- is said to be active at x E F(Yo-) if g(x,y) = 0 and inactive if g(x,y) < O. It is violated at x E JRn if g(x,y) > O. For x E F(Yo-) a point y E Yo- with g(x,y) = 0 is also called an active point. A feasible point x E F(Yo-) with g(x, y) < 0 for all y E Yo- is a Slater point or an interior (feasible) point.

Most feasible or interior-point methods require the knowledge of a feasible or an interior starting point. Provided that the method converges and a Slater

Page 206: Semi-Infinite Programming

Numerical Methods 199

point exists, such a point can be found for P[Yu ) (in "phase I") by finitely many iterations of the respective method applied to the (n + I)-dimensional problem

Minimize Xn+1 subject to g(x,y) ::; Xn+1, Y E Yu . (2.5)

Obviously, an interior feasible point for problem (2.5) can be given easily.

Evidently, x E F(Y) is true if and only if maxyEY g(x, y) ::; o. Thus, in order to check whether a given point x E IRn is feasible or not, a (continuous) global maximizer and hence all (continuous) local maximizers of g(x,·) on Y have to be determined. Up to now, however, there does not exist an algorithm which is able to detect a global maximizer of an arbitrary continuous function with certainty. Therefore, in practice, normally all candidates for local maximizers first have to be bracketed by comparison of function values of g(x,·) on a sufficiently dense finite set Yu ~ Y and to be computed by an iterative procedure afterwards.

There exist many methods for the computation of the unique local maximizer in a specific subregion of Y if Y is a one-dimensional set. (The authors use a safeguarded quadratic interpolation method by Powell, which, as routine DUVMIF, is included in [98).) In case Y has a dimension greater than one, all zeros of 'V' yg(x, .) in the interior of Y have to be computed, for example, by means of the BFGS quasi-Newton method (e.g. [7,195)), and all maximizers on the boundary of Y have to be specified separately (e.g. [63)). In applications, the set Y almost always had the dimension one or two until now. (See [176) for an example with m = 3 and [172,174) for examples with up to m = 6, where in [172, 174) the maximizers were determined by a stochastic procedure.) Computing time can be saved by parallel computation of maximizers, especially when the problem includes several semi-infinite constraints [137).

The described ideas for checking feasibility of a point include the risk that not all local maximizers and therefore possibly not a true global maximizer are found. (One bracketed area may contain more than one local maximizer, if Yu

was not chosen dense enough, or the used algorithm may not always converge.) Difficulties caused by that, however, have not been reported but may occur when g(x,·) is (almost) constant on Y or on parts of Y (cf. Section 2.5).

There is little known about how accurately the local maximizers of g(xk ,.)

on Y have to be computed at an iterate xk in practice so that convergence of a method is preserved. (See e.g. [64,65,96] for results at use of SQP type methods). For the sake of simplicity, it is assumed throughout this paper that all needed local maximizers can be determined with sufficient accuracy and that hence algorithms which employ maximizers are always implementable in this respect.

Page 207: Semi-Infinite Programming

200 CHAPTER 7

2.3 The Chebyshev approximation problem

A standard example of an SIP problem is the Chebyshev approximation (CAP) problem

Tn-l:= inf max Id(w) - F(x, w W , xEKn - 1 wEn

(2.6)

where K n - 1 ~ JRn-l is nonempty, n ~ JRm is nonempty and compact, q = 1 or q = 2, and d E C(n, lK) and F E C(Kn-l x n, lK) are given lK-valued functions. Problem (2.6) is said to be real when lK := JR and complex if lK := C. It is unconstrained if K n- 1 := JRn-l and constrained otherwise. Problem (2.6) is said to be linear when Vk E C(n, lK) exist such that F has the form

n-l F(x,w) := " XkVk(W) = v(wf x (2.7)

~k=l

where v(w) := (Vl(W), ... ,Vn_l(w))T E JRn-l for wEn. The linear problem has a solution if K n - 1 is closed. (E.g. modify Theorem 1 in [134].) Especially, if lK := lR, K n- 1 := JRn- 1 , and Vl, ... , Vn-l satisfy the Haar condition, the solution is unique and characterized by the alternation condition [134].

For complex CAP problems, it is no restriction to let K n - 1 be a subset of JRn-l, since a problem with e complex variables Xj can be transcribed into a problem with the 2£:= n -1 real variables Re{xj} and Im{xj}. The set n is assumed to be real also for lK := C here, since n is real at all applications of which we know, at least after use of the maximum principle. An extension of the below given results to compact sets n ~ em is straightforward, as also is an extension (with obvious definitions) to real or complex CAP problems of type

inf max max Idj(w) - Fj(x,wW . xEKn - 1 l~J~r wEnJ

Problem (2.6) can be equivalently stated as the following optimization problem with n variables and linear objective function:

Minimize f(x) := Xn with x:= (x,xn) subject to x E K n- 1 , Xn E JR, g(x,w) := Id(w) - F(x,wW - Xn ~ 0, wEn.

(2.8) Evidently, each point (x, Xn) with X E K n- 1 and sufficiently large Xn is a Slater point of this problem and hence may serve as a starting point for a method which needs such point.

Typically, for real CAP problems, one chooses q = 1. In this case the semi­infinite constraint of problem (2.8) can be equivalently replaced by

g(x, (w, a)) := a[d(w) - F(x,w)]- Xn ~ 0, (w, a) En x {-I, I}. (2.9)

Page 208: Semi-Infinite Programming

Numerical Methods 201

One possibility to handle the complex case (cf. [55]) is to let q = 1, to employ the identity Izl = maxo E[O,21rj Re{ze- io } for z E C, and to rewrite the semi­infinite constraint in (2.8) in the form

g(x, (w, a» := Re { e- io [d(w) - F(x, w)]} - Xn ::; 0, (w, a) E n x [0,27r]. (2.10)

If especially Kn - 1 is describable by linear constraints and F is given as in (2.7), problem (2.6) is converted into a linear SIP problem of the form (2.2) in these ways.

For complex problems, a proper alternative is to choose q = 2 in (2.6) resp. (2.8). In that case, the semi-infinite constraint in (2.8) is everywhere smooth and quadratic with respect to x, and the dimension of the region generating it does not need to be augmented by one as at use of (2.10). For linear F, problem (2.8) then is a convex SIP problem if K n - 1 is given by finitely many linear equality and/or convex (semi-) infinite inequality constraints. This latter approach was used in [118,160,177]. In the complex case, exclusively such linear CAP problems seem to have been solved as SIP problems up to now (see [177] for a survey).

2.4 Level sets

For each feasible point x F E F(Y) of pry] (if such point exists) and Yo- ~ Y, we define the level set

(2.11)

By continuity of the involved functions, this set is closed.

For x F E F(Y), one has

inf f(x) = inf f(x), xEF(Y) xEA(x F ,y)

(2.12)

and a solution of problem pry] (if it exists) is a solution of the right-hand problem in (2.12) and conversely. Henceforth pry] can be replaced by the latter problem. Moreover, by Weierstrass' theorem, either problem has a solution when F(Y) is bounded or, what is less restrictive, when

A(xF , Y) is bounded for some xF E F(Y). (2.13)

Condition (2.13) is a classical assumption in finite optimization.

Page 209: Semi-Infinite Programming

202 CHAPTER 7

Boundedness of F(Y) is often enforced by the requirement that a bounded set X ~ JRn is known with X ;2 F(Y). This not only ensures the existence of a solution of the SIP problem, but also the existence of an accumulation point for each sequence {xk} of points in X, generated, for example, by an algorithm. Both is also guaranteed if a bounded set X exists with X ;2 A(xF , Y) and if {xk} is contained in X and satisfies f(x k ) ~ f(x F ). Note that F(Y) can be unbounded and simultaneously A(xF , Y) be bounded, as it is the case for linear CAP problems with linearly independent basis functions VI, ... , Vn-I.

Occasionally, a bounded set X as described is needed explicitly and to be defined by linear inequalities (cf. Section 4.1). At numerical examples, in such case, X is typically given by box constraints, i.e. in the form

X:= {x E JRn 1 Clj ~ Xj ~ {3j, j = 1, ... ,n} (2.14)

with given reals Clj and {3j. In practice, a box (2.14) containing a solution of the given problem, however, may not be known a priori, and the optimal Xj

may vary in several orders of magnitudes, which can make the choice of trial constants Clj and {3j difficult. (Both is the normal situation for CAP problems.) In particular, if Clj and {3j are selected tentatively, a too large box may lead to many additional iterations of an algorithm, as for certain cutting plane methods (e.g. [31]), whereas, in case of a too small box, the problem has to be solved repeatedly with adapted constants when the obtained solution lies on the boundary and therefore the solution of pry] is possibly located outside of X. In such cases, knowledge of a proper set X which tightly encloses A(xF, Y) is relevant. (See also Section 4.1 on this.)

For SIP problems, the set A(xF , Y) is given by infinitely many inequalities and hence does not provide a suitable choice for X. Therefore, one often starts from the subsequent assumption (e.g. [91,178]).

Assumption 2.1 There exist x F E F(Y) and a finite set Yo ~ Y such that the level set A(xF , Yo) is bounded.

Remark 2.2 If Assumption 2.1 is satisfied with respect to some xF E F(Y) and finite Yo ~ Y and if problem pry] is linear, the constraints g(x, y) ~ 0, Y E Yo, can be written in the form A(Yo)x ~ b(Yo) with some A(Yo) E JRIYolxn and b(Yo) E JRI Yo I, and one can easily conclude from geometric considerations that IYoI 2': nand mnk(A(Yo)) = n.

Page 210: Semi-Infinite Programming

Numerical Methods 203

For each xF E F(Y) and finite Yo ~ Y, one has A(xF, Y) ~ A(xF, Yo). Conse­quently, Assumption 2.1 in particular implies condition (2.13) and the following theorem. (But see (2.12) and note that infxEA(xF,Yo) f(x) :S infxEF(Y) f(x).)

Theorem 2.3 If Assumption 2.1 is satisfied, problem PlY] has a solution.

For convex problems, Assumption 2.1, in fact, is equivalent with condition (2.13), as the following result shows.

Lemma 2.4 Let f and 9 (., y), y E Y, be convex. The following conditions are equivalent:

(a/ The set of solutions of PlY] is nonempty and compact.

(b) A(xF, Y) is bounded for at least one x F E F(Y).

(c) A(xF, Y) is bounded for each xF E F(Y).

(d) For each xF E F(Y) there exists a 8 > 0 such that A(xF, Yo) is bounded for each closed set Yo ~ Y with dist(Yo, Y) :S 8.

(e) There exist x F E F(Y) and a finite set Yo ~ Y such that A(xF, Yo) is bounded.

Proof. (a) implies that A(x, Y) is bounded for each solution x of PlY]. Let now (b) be valid and assume that, for some x E F(Y), the set A(x, Y) is unbounded. Then {xF} ~ A(xF, Y) ~ A(x, Y) is true, and xk E A(x, Y) exist with Ilxkll -+ 00 for k -+ 00. We define zk := xk - x F and w.l.o.g. assume zk j Ilzk II -+ z for k -+ 00 for some z E IRn with Ilzll = 1. Since A(x, Y) is convex, x F + Ak(Xk - xF) lies in A(x, Y) for Ak := min(l, oj Ilzk II) with 0 > O. Thus we obtain g(xF + Ak(Xk - xF),y) :S Akg(Xk,y) + (1- Ak)g(XF,y) :S 0, Y E Y. Observing that Ak = oj Ilzk II is true for all sufficiently large k and taking the limit for k -+ 00, we reach g(xF + oz,y) :S 0, y E Y. Owing to f(x k) :S f(x), we similarly arrive at f(x F + oz) :S f(x F). Since 0 > 0 is arbitrary, both results together contradict (b).

Now assume that (c) holds true but (d) not. Then there exist x F E F(Y) and a sequence {Yd of closed subsets of Y satisfying limk-+oo dist(Yk, Y) = 0

Page 211: Semi-Infinite Programming

204 CHAPTER 7

so that A(xF , Yk ) is unbounded for each kEN. Thus, for y E Y, we can find yk E Yk such that yk -+ y for k -+ 00, and we can find xk E A(xF , Yk) such that Ilxk II -+ 00 for k -+ 00. We particularly have g(xk, yk) :::; 0 and f(x k ) :::; f(x F). Next, we define zk, z, and Ak with a > 0 in the same way as before. Using g(xF + Adxk - xF), yk) :::; Akg(xk, yk) + (1 - Ak)g(XF, yk) :::; 0, we similarly infer g(xF + az, y) :::; 0, y E Y, and f(x F + az) :::; f(x F) for all a > O. This contradicts condition (c).

Condition (d) trivially implies condition (e). Finally, (a) follows from (e) since the solution set of pry] is closed and contained in A(xF , Y) resp. A(xF , Yo).

o

The equivalence of (a), (b), and (c) of Lemma 2.4 can be found in a somewhat different form e.g. in [91, p.71].

Corollary 2.5 Let f and g(., y), y E Y, be convex.

(a) If pry] has a unique solution, Assumption 2.1 is satisfied.

(b) If (2.13) is true for some x F E F(Y) and {Yi} is a sequence of finite subsets in Y with limi-too dist(Yi, Y) = 0, then, for all sufficiently large i, the set A(xF, Yi) is compact, and problem P[Yi] has a solution.

Other than in finite optimization, the point x F E F(Y) from Assumption 2.1 is usually not needed in SIP and hence does not have to be specified. For the con­vergence proof of several SIP algorithms, however, it is required that A(xF , Yo) is bounded, in particular, for that finite subset Yo of Y which is needed to initialize the algorithm. It is shown in the last part of this subsection that, for linear real and complex CAP problems, a set with the requested property can usually be determined easily. For that we consider problem (2.8) and let K n - 1 := IRn - 1 and Y := 0 in order to reach consistency with the definitions of F(Y) and A(xF , Y) in Section 2.1. Evidently, a result corresponding to the following theorem is also true for each nonempty set K n - 1 ~ IRn - 1 .

Theorem 2.6 Let problem (2.8) be given with F(x,·) := v(·)Tx, X E IRn - 1 ,

and assume that each function v(·)Tx with x =I 0 has at most "I E N zeros on O. Then, for this problem, the level set A(xF , 0 0 ) is bounded for each x F E F(O) and each set no ~ n with Inol 2:: "I + 1.

Page 212: Semi-Infinite Programming

Numerical Methods 205

Proof. Let x F E F(n) and x E A(xF,no). Then one has 0 ::; Xn ::; x~. For Inol > 'Y one furthermore obtains ~ := minl!xll=l maxwEno Iv(w)Txl > 0 so that

IIxll ~ ::; Ilxll max Iv(Wf II~III = max Iv(wfxl ::; max Id(w)1 + (x;) ~. 0 wEno X wEno wEn

Consequently, if q = 1 in (2.8) and Inol ~ 'Y + 1, the requested level set is also bounded for Yo := no x {-I, I} at use of (2.9) for real problems and for Yo := no x Ao with an obvious choice of Ao ~ [0, 21l"] such that IAol = Inol at use of (2.10) for complex problems. (The latter, together with Remark 2.2, answers open questions in [46,199].) If the lK-valued functions VI, ... , Vn-I satisfy the Haar condition, one especially has 'Y = n - 2 [134].

2.5 Discretization

One approach to the solution of an SIP problem suggesting itself is to minimize its objective function subject to only a finite subset of the infinite set of con­straints and to possibly repeat the procedure for an enlarged set when higher precision is requested or when, from consideration of a sequence of such solu­tions, an estimate of their accuracy is to be obtained. More precisely, the idea is to successively compute an "(approximate) solution" of the discretized SIP problem P[Yi] for i = 0,1,2, '" by an algorithm from finite optimization where {Yi} is a sequence of finite subsets of Y such that limHoo dist(Yi, Y) = O. A procedure of this type is denoted as a discretization method. The grid sequence {Yi} needed for that is usually prescribed a priori. Occasionally it is also suc­cessively defined a posteriori (or "adaptively"), where information obtained on the i-th discretization level is utilized to define YiH' (The latter case is the exception and therefore always emphasized here if it has been chosen.)

For the convergence proof of a discretization method it has to be ensured that each accumulation point of the sequence of "solutions" xi* of P[Yi] is a "so­lution" of the SIP problem pry]' where a "solution" of P[Yi] is that point to which the respective algorithm for P[Yi] converges according to its convergence proof. (That is also meant by a "solution" in the following discussion.) Two such stability results are given below. The first one states convergence of global minimizers and hence relates to linear and convex problems mainly. The second result employs solutions as they can be realistically obtained by algorithms for finite nonlinear optimization problems.

Page 213: Semi-Infinite Programming

206 CHAPTER 7

Stability theorems of this type obviously entail that also each accumulation point of a sequence of approximate solutions xi of P[Yi] solves problem pry] if Ilxi - x i* II -+ a is true for i -+ 00. The inner algorithm used for the solution of P[Yi] generates such point xi after finitely many iterations, provided that it converges. When xi has been found, the grid index is increased by one. Thus the total procedure is practicable in principle, where optimality functions provide an implement able tool to measure the degree of accuracy of a point in regard to optimality for P[Yi]. (The usefulness of such optimality functions in optimization has primarily been explored by Polak, e.g. in [150].)

In regard to the solution of a discretized problem P[Yi]' it is not advisable to employ any method from finite programming. Such methods often require the solution of subproblems, which have the same number of constraints IYi I » n as the problem itself, and hence do not use to advantage that the constraints of the problem originate from a continuous function. For the all-over efficiency of a discretization method it is furthermore important that the information about the solution of the problem obtained on one discretization level is transported to the next level, which means that the solution xi* resp. Xi of P[Yi] can be exploited as a starting point for the solution of P[Yi+l]' Such starting point, however, is usually infeasible for P[Yi+d since, compared to P[Yi], problem P[Yi+d includes additional or different constraints. Therefore many methods which require a feasible starting point and consequently a two-phase procedure to solve P[Yi+l] are too costly for these purposes (cf. problem (2.5)).

Discretization methods have the advantage to internally work with finite subsets of Y only. In particular feasibility with respect to the finite program P[Yi] can normally be checked easily and accurately. Therefore a discretization method is especially suited for problems with a solution x· at which g(x·,·) is (almost) constant on Y or on parts of Y. Almost constancy is a phenomenon which, for instance, can occur for the constraint function in (2.8) at complex CAP [215]. (See [177] for examples which were solved with high accuracy by a discretization method.)

Discretization methods, however, are very costly numerically. The time, needed to verify feasibility with respect to P[Yi] and to solve this problem, increases dramatically with growing cardinality of Yi. Therefore, in practice, only grids with a limited number of points can be used. Typical are grids with at most 50, 000 to 100, 000 points for problems with less than 100 variables.

Another characteristic of a discretization method stressed in the literature is that it normally produces outer approximations of a solution of the SIP prob­lem, i.e. approximate solutions which are not feasible for pry]. Observe that,

Page 214: Semi-Infinite Programming


for Y_i ⊆ Y, a global solution x^{i*} of P[Y_i] that is feasible for P[Y] solves P[Y] since

$$f(x^{i*}) = \inf_{x \in F(Y_i)} f(x) \;\le\; \inf_{x \in F(Y)} f(x) \;\le\; f(x^{i*}).$$

In practice, however, many other methods also yield solutions which are feasible for P[Y] only up to a certain accuracy. (Computing times of a discretization method and a "semi-continuous" method are compared in [160].)

An approximate solution of an SIP problem which has been obtained by a discretization method may have to be improved by a method based on local reduction when the discretization method becomes too inefficient. A possible difficulty connected with that is that the obtained solution may not be close enough to a solution of the SIP problem and hence may not lie in the convergence region of such a method (cf. Section 5.2). Also, a reduction based algorithm evidently is not applicable when g(x*,·) is constant on parts of Y at a solution x* of the SIP problem. In many cases, however, the solution reached by a discretization procedure suffices for practical purposes.

We next derive the two stability results mentioned above. These ensure, under the respective assumptions, that a solution of a sufficiently finely discretized problem represents an approximate solution of the SIP problem. Indeed, this approximation property cannot always be expected, as the following example demonstrates.

Example 2.7 The nonlinear CAP problem

$$\mu(Y) := \inf_{x \in \mathbb{R}} \max_{y \in Y} \left| (c + x^2) \exp\left[ -x^2 \left( y - 1/\sqrt{2} \right)^2 \right] \right|$$

with Y := [0,1] and some arbitrary constant c > 0 has the minimal value μ(Y) = inf_{x∈ℝ}(c + x²) = c and the unique global solution x̄ = 0. If Y_i ⊆ Y is a finite set with arbitrary density in Y such that 1/√2 ∉ Y_i, one has μ(Y_i) = inf_{x∈ℝ}(c + x²)exp(−x²ε²) = 0 with ε := dist({1/√2}, Y_i), and the discretized problem does not possess a solution. Linear SIP problems with similar properties can be found in [109, 111].
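The failure mode of Example 2.7 is easy to reproduce numerically. The following Python sketch is a small illustration with arbitrary choices (c := 1 and a uniform 101-point grid Y_i that misses 1/√2): it evaluates the discretized objective along growing values of x and shows the values eventually decaying towards 0, whereas the continuous problem has minimal value c at x = 0.

```python
import numpy as np

c = 1.0                                  # arbitrary constant c > 0 (an assumption of this sketch)
Y_i = np.linspace(0.0, 1.0, 101)         # uniform grid on [0, 1]; 1/sqrt(2) is not a grid point

def discretized_objective(x):
    # max over the finite grid Y_i of |(c + x^2) exp(-x^2 (y - 1/sqrt(2))^2)|
    return np.max(np.abs((c + x**2) * np.exp(-x**2 * (Y_i - 1.0 / np.sqrt(2))**2)))

for x in [0.0, 1.0e2, 1.0e3, 1.0e4, 1.0e5]:
    print(f"x = {x:10.1f}   max_y |g| = {discretized_objective(x):.6e}")
# For large x the printed values decay towards 0, so mu(Y_i) = 0 and no minimizer exists,
# while the continuous problem has minimal value c attained at x = 0.
```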

In order to obtain the first stability result, we consider the following algorithm.

Algorithm 1 Step 0. Choose a sequence {Y_i} of compact sets such that |Y_0| < ∞, Y_i ⊆ Y_{i+1} ⊆ Y, and lim_{i→∞} dist(Y_i, Y) = 0. Set D_0 := Y_0 and i := 0.

Page 215: Semi-Infinite Programming


Step 1. Find a (global) solution x^i ∈ ℝ^n of the finite problem P[D_i].

Step 2. Compute y^i ∈ Y_{i+1} such that g(x^i, y^i) = max_{y ∈ Y_{i+1}} g(x^i, y).

Step 3. Choose a set D_{i+1} with

$$D_i \cup \{y^i\} \subseteq D_{i+1} \subseteq Y_{i+1}. \tag{2.15}$$

Step 4. Set i := i + 1 and go to Step 1.
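The outer loop of Algorithm 1 is easy to organize in code. The following Python sketch is a conceptual illustration under simplifying assumptions: Y := [0, 1] is replaced by nested uniform grids Y_i, the finite subproblems P[D_i] are solved by a hypothetical routine solve_finite_sip (not specified in the text), and the rule (2.15) is realized by the simplest choice D_{i+1} := D_i ∪ {y^i}.

```python
import numpy as np

def algorithm_1(f, g, solve_finite_sip, n_outer=10, n0=5):
    """Conceptual sketch of Algorithm 1 for Y = [0, 1].

    f, g             : objective f(x) and constraint function g(x, y).
    solve_finite_sip : hypothetical solver returning a (global) solution of
                       min f(x)  s.t.  g(x, y) <= 0 for all y in D
                       (an assumption of this sketch, not part of the text).
    """
    grid = lambda i: np.linspace(0.0, 1.0, (n0 - 1) * 2**i + 1)    # nested grids Y_i
    D = list(grid(0))                               # Step 0: D_0 := Y_0, i := 0
    x = None
    for i in range(n_outer):
        x = solve_finite_sip(f, g, D)               # Step 1: solve the finite problem P[D_i]
        Y_next = grid(i + 1)                        # the next (finer) grid Y_{i+1}
        y_new = max(Y_next, key=lambda y: g(x, y))  # Step 2: maximizer of g(x^i, .) over Y_{i+1}
        if y_new not in D:                          # Step 3: simplest choice D_{i+1} := D_i ∪ {y^i}
            D.append(y_new)
        # Step 4: i := i + 1 (next loop pass)
    return x, D
```

With the choice D_{i+1} := Y_{i+1} instead, the same loop becomes the pure discretization method addressed in Corollary 2.9 below.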

Theorem 2.8 Let Assumption 2.1 be satisfied with respect to the initial set Y_0 ⊆ Y. Then P[D_i], i ∈ ℕ_0, has a solution, and Algorithm 1 generates an infinite sequence {x^i} such that {x^i} has an accumulation point and each such point solves P[Y]. Moreover, the sequence {μ(D_i)} converges monotonically increasing to μ(Y) for i → ∞.

Proof. Let x^F ∈ F(Y). From Y_{i+1} ⊆ Y and (2.15) we have

$$\{x^F\} \subseteq A(x^F, Y) \subseteq A(x^F, Y_{i+1}) \subseteq A(x^F, D_{i+1}) \subseteq A(x^F, D_i) \subseteq A(x^F, Y_0). \tag{2.16}$$

Hence, by assumption, A(x^F, D_i) is compact and P[D_i] has a solution x^i lying in the compact set A(x^F, Y_0). By (2.16), one also has f(x^i) ≤ f(x^{i+1}) ≤ μ(Y). Thus, if {x^{i_j}} converges to some x̄ ∈ ℝ^n for j → ∞, we have f(x̄) ≤ μ(Y), and, if also max_{y∈Y} g(x̄, y) ≤ 0 is true, x̄ solves P[Y].

Due to lim_{i→∞} dist(Y_i, Y) = 0, for fixed y ∈ Y we can find η^{i_j} ∈ Y_{i_j+1} such that η^{i_j} → y for j → ∞. Moreover, there exists ȳ ∈ Y such that w.l.o.g. y^{i_j} → ȳ for j → ∞. Hence, from g(x^{i_j}, η^{i_j}) ≤ g(x^{i_j}, y^{i_j}) we obtain g(x̄, y) ≤ g(x̄, ȳ). Since we have g(x^{i_l}, y^{i_j}) ≤ 0 for i_l > i_j, we finally conclude that g(x̄, ȳ) ≤ 0 and hence g(x̄, y) ≤ 0. □

Theorem 2.8 was proved in [176] and, in a more general form, in [178] (e.g. f in P[D_i] can be replaced by f_i ∈ C(ℝ^n, ℝ) if {f_i} converges uniformly to f on A(x^F, Y_0) for i → ∞). With special choices of Y_{i+1} and D_{i+1} (usually Y_{i+1} := Y and D_{i+1} := D_i ∪ {y^i}), results similar to Theorem 2.8 were given for linear problems e.g. in [20,74,91], for certain convex problems in [48], and for nonlinear ones in [10,94]. Proper stopping criteria for Algorithm 1 can be found in the literature (see e.g. Algorithm 2 below in this respect).

If we in particular choose D_{i+1} := Y_{i+1} in (2.15) and let Y_{i+1} be finite, we arrive at the following requested stability result.

Page 216: Semi-Infinite Programming


Corollary 2.9 Let {Y_i} be a sequence of finite sets with Y_i ⊆ Y_{i+1} ⊆ Y and lim_{i→∞} dist(Y_i, Y) = 0, and let Assumption 2.1 be satisfied with respect to Y_0 ⊆ Y. Then P[Y_i], i ∈ ℕ_0, has a solution x^i ∈ ℝ^n. Moreover, the sequence {x^i} possesses an accumulation point, and each such point solves P[Y].

Under various assumptions, results similar to Corollary 2.9 were proved for linear problems in [69,72,91], for convex problems in [205], and for nonlinear ones in [73]. Furthermore, for linear problems, the general possibility of discretization was studied in [57-59]. Note in this connection that some of the given theorems (like also Theorem 2.13 below) do not require the inclusion Y_i ⊆ Y_{i+1} for all i ∈ ℕ_0, but that efficient use of the obtained solution of P[Y_i] as a starting vector for solving P[Y_{i+1}] normally suggests that Y_{i+1} contains at least those points of Y_i which are active at this solution.

Corollary 2.9 is essentially useful only for linear and convex SIP problems. In [61,133] and [150, p.464] convergence of variants of Algorithm 1 is shown in which x^i does not need to be a global solution but only a certain (approximate) stationary point of P[D_i], which can be computed also for nonlinear problems. It is in particular suggested to choose D_{i+1} := D_i ∪ {y^i} and to let Y_i := Y for all i ∈ ℕ [61,133] or to let {Y_i} be a sequence of finite sets satisfying some weak assumption [150]. Furthermore, in [61,133], rules are given which allow one to drop certain constraints in P[D_i]. (Numerical experiments exploring the gain obtained by such rules are not known to the authors.)

Another stability result for nonlinear SIP problems, which especially applies to SQP type methods for the solution of P[Y_i], is derived next. For that we let f ∈ C³(ℝ^n, ℝ) and g ∈ C^{3,0}(ℝ^n × Y, ℝ).

Definition 2.10 For each x ∈ ℝ^n, p > 0, and a compact set Y_σ ⊆ Y, we define the exact L∞-penalty function

$$L_\infty(x, p, Y_\sigma) := f(x) + p\,\phi^+(x, Y_\sigma), \qquad \phi^+(x, Y_\sigma) := \max_{y \in Y_\sigma} \max\{ g(x, y), 0 \}. \tag{2.17}$$

It can be shown (use e.g. [27,150]) that the directional derivative D_d L∞(·, p, Y_σ) of L∞(·, p, Y_σ) at x in direction d exists and is given by

$$D_d L_\infty(x, p, Y_\sigma) = \nabla f(x)^T d + p \max_{y \in Y_{\sigma,0}(x)} \max\{ \nabla_x g(x, y)^T d, 0 \}, \qquad Y_{\sigma,0}(x) := \{\, y \in Y_\sigma \mid g(x, y) = \max_{\eta \in Y_\sigma} g(x, \eta) \,\}.$$
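For a finite set Y_σ, both the penalty value (2.17) and the directional derivative above can be evaluated directly. The following Python sketch is a minimal illustration; the callables f, grad_f, g, grad_x_g and the tolerance used to identify the active set Y_{σ,0}(x) are assumptions of the sketch.

```python
import numpy as np

def linf_penalty(x, p, Y_sigma, f, g):
    """Exact L_inf penalty (2.17) for a finite set Y_sigma."""
    phi_plus = max(max(g(x, y), 0.0) for y in Y_sigma)
    return f(x) + p * phi_plus

def linf_directional_derivative(x, d, p, Y_sigma, grad_f, g, grad_x_g, tol=1e-10):
    """Directional derivative of L_inf(., p, Y_sigma) at x in direction d, following
    the formula above; tol identifies the active set Y_{sigma,0}(x) numerically."""
    g_vals = np.array([g(x, y) for y in Y_sigma])
    g_max = g_vals.max()
    active = [y for y, gy in zip(Y_sigma, g_vals) if gy >= g_max - tol]   # Y_{sigma,0}(x)
    inner = max(max(np.dot(grad_x_g(x, y), d), 0.0) for y in active)
    return np.dot(grad_f(x), d) + p * inner
```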

Page 217: Semi-Infinite Programming


Definition 2.11 Given p > 0 and a compact Y_σ ⊆ Y, a point x* ∈ ℝ^n is a stationary point of L∞(·, p, Y_σ) if D_d L∞(x*, p, Y_σ) ≥ 0 is true for all d ∈ ℝ^n.

We say that x* ∈ F(Y_σ) is a Karush-Kuhn-Tucker (KKT) point of P[Y_σ] if multipliers λ*(Y_σ) ∈ ℝ^{|Y_σ|} exist such that the KKT (first order necessary) conditions are satisfied at x* for P[Y_σ]. (See [88,91] for a form of the KKT conditions which includes SIP problems.) In this connection the subsequent well-known result is of importance (e.g. [8,63]).

Lemma 2.12 Let Y_σ ⊆ Y be finite.

(a) If x* ∈ F(Y_σ) together with λ*(Y_σ) ∈ ℝ^{|Y_σ|} satisfies the KKT conditions for P[Y_σ], then x* is a stationary point of L∞(·, p, Y_σ) for each p ≥ ||λ*(Y_σ)||₁.

(b) If x* ∈ ℝ^n is a stationary point of L∞(·, p, Y_σ) for some p > 0 and if x* ∈ F(Y_σ), then x* is a KKT point of P[Y_σ].

After these preliminaries, we can prove the following stability result.

Theorem 2.13 Let {Y_i} be a sequence of finite sets Y_i ⊆ Y which satisfies lim_{i→∞} dist(Y_i, Y) = 0. For each i ∈ ℕ_0 let there exist a stationary point x^{i*}

of L∞(·, p_i, Y_i) for some p_i > 0 and let p_i ≤ p* with some p*. Then each accumulation point x* of {x^{i*}} is a stationary point of L∞(·, p*, Y). If {x^{i_j*}} converges to x* and lim_{j→∞} φ⁺(x^{i_j*}, Y_{i_j}) = 0, then x* also is a KKT point of P[Y].

Proof.¹ Let x ∈ ℝ^n, p > 0, and Y_σ ⊆ Y be a compact set. We consider the (n + 1)-dimensional QP problem

$$Q[x, p, Y_\sigma]: \quad \text{Minimize } \tfrac{1}{2} d^T d + \nabla f(x)^T d + p\,\xi \quad \text{subject to } g(x, y) + \nabla_x g(x, y)^T d \le \xi, \; y \in Y_\sigma, \quad \xi \ge 0, \tag{2.18}$$

¹The theorem was proved in [63] in a different way and under the additional assumption that P[x*] is regular (cf. Definition 2.15). The latter assumption was needed to gain control of the possibly varying number of terms in the sum of the KKT conditions for P[Y_i] with i → ∞. The impetus for the proof given here came through a comment by Professor E. Polak in a recent conversation with the first author, in which he pointed out the advantages of the use of optimality functions in such respects. Clearly, our proof here again reveals these advantages.

Page 218: Semi-Infinite Programming


and the unconstrained minimization problem

U[x,p,Y".) : ((x, P, Y".):= inf O(d, x, P, Y".) dElR n

with optimality function ( where

O(d, x,p, Y".) := ~dT d + \7f(xf d + pmaxmax {g(x,y) + \7 xg(x,yf d, o} . 2 yEYO'

It can be proved that both problems are uniquely solvable and that the following three assertions are equivalent: (i) x_σ^* is a stationary point of L∞(·, p_σ, Y_σ), (ii) (0, φ⁺(x_σ^*, Y_σ)) ∈ ℝ^{n+1} is the solution of the QP problem Q[x_σ^*, p_σ, Y_σ], and (iii) d* := 0 is the minimizer of U[x_σ^*, p_σ, Y_σ]. Obviously, if (iii) is true, one has ζ(x_σ^*, p_σ, Y_σ) = p_σ φ⁺(x_σ^*, Y_σ). For finite Y_σ, a proof of "(i) ⇔ (ii)" is found in [8, p.185] or [63, p.118], while the equivalence "(ii) ⇔ (iii)" can be shown easily. Both results can be straightforwardly extended to an infinite compact set Y_σ ⊆ Y. (In regard to the use of the KKT conditions for problem (2.18) with infinite Y_σ, see e.g. [91] and note that (2.18) has a Slater point.)

We let d* be the unique solution of problem U[x*, p*, Y] and B ⊆ ℝ^n be a closed ball around the origin containing d*. Then, in the formulations of U[x^{i*}, p_i, Y_i] and U[x*, p*, Y], we can equivalently replace the infimum over ℝ^n by that over B. W.l.o.g. we moreover have x^{i*} → x* and p_i → p̄ for i → ∞ with some p̄ ≤ p*, and hence ζ(x^{i*}, p_i, Y_i) → p̄ φ⁺(x*, Y) and θ(d, x^{i*}, p_i, Y_i) → θ(d, x*, p̄, Y) for each d ∈ B. Therefore, each accumulation point of the sequence {d^{i*}} of solutions d^{i*} = 0 of U[x^{i*}, p_i, Y_i] solves U[x*, p̄, Y] (use e.g. [68, Lemma 7, Theorem 14]). Consequently, problem U[x*, p̄, Y] has the unique solution d* = 0. Since 0 < p̄ ≤ p*, the point x* also is a stationary point of L∞(·, p*, Y). The last statement in the theorem eventually follows easily by extension of Lemma 2.12 to infinite Y. □
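For finite Y_σ, the optimality function value ζ(x, p, Y_σ) appearing in the proof can be computed by solving the small QP (2.18) over (d, ξ), since minimizing over ξ for fixed d recovers θ(d, x, p, Y_σ). The following Python sketch is an illustration under assumptions: it uses scipy.optimize.minimize with the SLSQP method, and the callables grad_f, g and grad_x_g (returning NumPy arrays/scalars) are hypothetical model functions not given in the text.

```python
import numpy as np
from scipy.optimize import minimize

def optimality_function(x, p, Y_sigma, grad_f, g, grad_x_g):
    """Compute zeta(x, p, Y_sigma) by solving the QP (2.18) over z = (d, xi) for finite Y_sigma."""
    n = len(x)
    gf = grad_f(x)
    g_vals = [g(x, y) for y in Y_sigma]
    g_grads = [grad_x_g(x, y) for y in Y_sigma]

    def objective(z):                       # 0.5 d^T d + grad f(x)^T d + p * xi
        d, xi = z[:n], z[n]
        return 0.5 * d.dot(d) + gf.dot(d) + p * xi

    constraints = [{'type': 'ineq',         # xi - g(x,y) - grad_x g(x,y)^T d >= 0
                    'fun': (lambda z, gy=gy, gr=gr: z[n] - gy - gr.dot(z[:n]))}
                   for gy, gr in zip(g_vals, g_grads)]
    constraints.append({'type': 'ineq', 'fun': lambda z: z[n]})   # xi >= 0

    res = minimize(objective, np.zeros(n + 1), method='SLSQP', constraints=constraints)
    d_opt, xi_opt = res.x[:n], res.x[n]
    return res.fun, d_opt, xi_opt           # res.fun approximates zeta(x, p, Y_sigma)
```

By the equivalences (i)-(iii) stated in the proof, x is a stationary point of L∞(·, p, Y_σ) exactly when the computed minimizer is d = 0, i.e. when ζ(x, p, Y_σ) = p φ⁺(x, Y_σ).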

Several SQP type methods for the solution of a discretized SIP problem P[Y_i] converge to a point x^{i*} which is (at least) a stationary point of L∞(·, p_i, Y_i) (e.g. [63]) or a KKT point of P[Y_i] (e.g. [121]), respectively. Under certain conditions, which especially are satisfied when P[Y_i] is solved by the method in [63] (see Section 5.1), the existence of an accumulation point of {x^{i*}} is guaranteed if a certain level set of the exact L∞-penalty function is compact [63].

There exist only few rate of convergence results for the sequence of solutions of discretized SIP problems resp. for the entire sequence of solutions of all subproblems which are generated in the solution of the discretized problems. One result of this type has been given for unconstrained linear CAP problems, i.e. when f and g are defined as in (2.8), Y := Ω, and K_{n−1} := ℝ^{n−1}. Then the

Page 219: Semi-Infinite Programming


estimate

$$0 \le \mu(Y) - \mu(Y_i) \le C\,[\operatorname{dist}(Y_i, Y)]^2$$

with some constant C ≥ 0 can be proved under weak assumptions. If the CAP problem is real and v_1, ..., v_{n−1} in addition satisfy the Haar condition, a corresponding bound can be verified for the distance of the, in this case, unique solutions of P[Y] and P[Y_i] [32,177]. In regard to such results, however, it has to be noted that the numerical costs for solving discretized SIP problems normally tend to infinity when the grid densities in Y converge to zero.

In connection with the discretization of SIP problems, Polak [149,150] has developed a theory of consistent approximations that provides conditions under which (local) minimizers and certain stationary points of the discretized problems converge to (local) minimizers and related stationary points of the SIP problem. The theory includes conditions which imply convergence of the entire sequence of iterates generated by a discretization algorithm (rather than only of the sequence of outer iterates as e.g. in Corollary 2.9), and it contains conditions on the rate of discretization which ensure that the entire sequence converges with the same rate as the algorithm used for the solution of the finite subproblems. The further study and application of these results constitute a worthwhile task for the future. (Note that, in order to gain accuracy with regard to a solution of the SIP problem, it may, for example, be more efficient to solve a discretized SIP problem with high accuracy by a fast algorithm rather than to proceed to a refined grid at an earlier stage.)

2.6 Local reduction

Given a feasible point x ∈ ℝ^n of a finite optimization problem, there is a neighborhood of this point in which the feasible set can be described by the (usually at most n) constraints which are active at x. Especially, if x is a (local) solution of the problem, it is also a (local) solution of the problem in which all constraints inactive at x are dropped, and conversely (cf. Figure 1).

In general, both statements are not true for SIP problems. To see this, consider the linear SIP problem of minimizing a linear function over a disk in ℝ², where the disk is defined as the intersection of the infinitely many linear tangential halfspaces which contain the disk (cf. Figure 2). Then, at a boundary point of the disk, only one constraint is active and there is no neighborhood of this point in which the related halfspace describes the feasible set of the problem. Moreover, the SIP problem has only one solution, while the reduced problem obtained by cancellation of all inactive constraints at this solution has infinitely many

Page 220: Semi-Infinite Programming


solutions. (The minimal values of both problems are identical. Note that for convex problems which possess a Slater point there is always a subset Y_a ⊆ Y with |Y_a| ≤ n such that μ(Y_a) = μ(Y) [125].)

Figure 1    Figure 2

Thus, the feasible set of an SIP problem normally cannot be locally represented by the (usually finitely many) active constraints only. Under proper assumptions, however, for x̄ ∈ ℝ^n (not necessarily feasible) there exist a finite number of certain implicitly defined inequality constraints and a neighborhood in which the feasible set defined by these constraints coincides with the feasible set of the SIP problem. Hence, under such assumptions, the SIP program can be locally reduced to a finite program, at least conceptually. This principle of local reduction goes back to Hettich and Jongen [87] and is developed in the sequel.

Let f ∈ C³(ℝ^n, ℝ) and g ∈ C^{3,3}(ℝ^n × Y, ℝ) and assume that the compact set Y is defined by functions c_r ∈ C³(ℝ^m, ℝ) in the form

$$Y := \{\, y \in \mathbb{R}^m \mid c_r(y) \le 0, \; r = 1, \dots, \bar{r} \,\}. \tag{2.19}$$

Moreover, let c(y) := (c_1(y), ..., c_{r̄}(y))^T and R(y) := {r ∈ {1, ..., r̄} | c_r(y) = 0} for y ∈ Y. For given x ∈ ℝ^n, we consider the parametric optimization problem

P[x]: Maximize g(x,y) over Y (2.20)

and define the Lagrange function

$$L(x, y, v) := g(x, y) - v^T c(y). \tag{2.21}$$

If ∇c_r(y), r ∈ R(y), are linearly independent, we say that the finite linear independence constraint qualification (FLICQ) is satisfied for y ∈ Y.

Page 221: Semi-Infinite Programming


Definition 2.14 ([101]) (a) y ∈ Y is called a critical point of P[x] for x ∈ ℝ^n if 1. the FLICQ is satisfied for y, 2. there exists v ∈ ℝ^{r̄} such that ∇_y L(x, y, v) = 0, v^T c(y) = 0.

(b) y ∈ Y is called a nondegenerate critical point of P[x] for x ∈ ℝ^n if y is a critical point of P[x] and if, in addition, 1. v_r ≠ 0, r ∈ R(y), 2. ξ^T ∇²_{yy} L(x, y, v) ξ ≠ 0, ξ ∈ T(y)\{0}, with

$$T(y) := \{\, \xi \in \mathbb{R}^m \mid \nabla c_r(y)^T \xi = 0, \; r \in R(y) \,\}.$$

(c) y ∈ Y is called a Karush-Kuhn-Tucker (KKT) point of P[x] for x ∈ ℝ^n if there exists v ∈ ℝ^{r̄} such that ∇_y L(x, y, v) = 0, v^T c(y) = 0, v ≥ 0.

Clearly, a local maximizer y ∈ Y of P[x] for which the FLICQ condition holds true is a KKT point and hence a critical point of P[x] (e.g. [7]). In general, however, a KKT point does not necessarily fulfill the FLICQ condition and therefore may not be a critical point. Conversely, the multiplier vector v ∈ ℝ^{r̄} associated with a critical point y ∈ Y of P[x] does not need to be nonnegative, so that y does not have to be a KKT point.

Definition 2.15 Problem P[x] with x ∈ ℝ^n is said to be regular (weakly regular) if 1. the FLICQ is satisfied for all y ∈ Y, 2. all critical points (all global maximizers) of P[x] are nondegenerate.

Regularity was defined in this way in [102], while weak regularity is introduced here for convenience. As is easily seen, regularity of P[x] implies that, for each local maximizer y ∈ Y of P[x], the strong second order sufficiency condition (SSOSC) and the strict complementary slackness condition (SCS) are satisfied, i.e. that ξ^T ∇²_{yy} L(x, y, v) ξ < 0, ξ ∈ T(y)\{0}, for some v ∈ ℝ^{r̄}, and v_r > 0, r ∈ R(y), are true, respectively.

We next introduce the following sets:

Y^g(x) := {y ∈ Y | y is a global maximizer of P[x]},

Y^l(x) := {y ∈ Y | y is a local maximizer of P[x]}.

Page 222: Semi-Infinite Programming


In particular, we call a maximizer of an optimization problem an isolated local maximizer if it is a strict local maximizer and if a neighborhood exists in which it is the only local maximizer. An isolated local minimizer is defined accordingly. (See [180] for an example of a strict local minimizer which is not an isolated local minimizer.)

Assertions (a) and (b) of the following theorem and the subsequent corollary were first proved, in a somewhat different form, by Hettich and Jongen [87] (see Remark 2.18), where the proof of (b) is based on the implicit function theorem. The proof of the theorem as it is stated here can be found in [63].

Theorem 2.16 Let x̄ ∈ ℝ^n and problem P[x̄] be regular. Then the following is true:

(a) Y^l(x̄) has a finite cardinality r_l(x̄) so that Y^l(x̄) = {ȳ^1, ..., ȳ^{r_l(x̄)}}.

(b) For each local maximizer ȳ^j ∈ Y^l(x̄) of problem P[x̄] there exist open neighborhoods V^j(x̄) of x̄ and W^j(ȳ^j) of ȳ^j as well as an implicitly defined function y^j ∈ C¹(V^j(x̄), W^j(ȳ^j) ∩ Y) such that 1. y^j(x̄) = ȳ^j and, 2. for all x ∈ V^j(x̄), the vector y^j := y^j(x) is a nondegenerate and isolated local maximizer of P[x].

(c) There is an open neighborhood U(x̄) ⊆ V(x̄) of x̄, with V(x̄) := ∩_{j=1}^{r_l(x̄)} V^j(x̄), such that for all x ∈ U(x̄) one has

$$Y^l(x) = \{\, y^1(x), \dots, y^{r_l(\bar{x})}(x) \,\}. \tag{2.22}$$

The following corollary eventually states that, in a neighborhood of a point x̄ for which P[x̄] is regular, the feasible set F(Y) of P[Y] can be represented by finitely many inequality constraints. These, however, are in general not known explicitly.

Corollary 2.17 Let problem P[x̄] be regular for x̄ ∈ ℝ^n and, using Theorem 2.16, let g^j(x) := g(x, y^j(x)), x ∈ U(x̄). Then there is an open neighborhood Ũ(x̄) ⊆ U(x̄) of x̄ with the following properties:

(a) for x ∈ Ũ(x̄) one has: x ∈ F(Y) ⇔ g^j(x) ≤ 0, j = 1, ..., r_l(x̄).

Page 223: Semi-Infinite Programming


(b) x* ∈ Ũ(x̄) is a (strict, isolated) local minimizer of P[Y] ⇔ x* ∈ Ũ(x̄) is a (strict, isolated) local minimizer of the problem

$$P_{\mathrm{red}}[\bar{x}]: \quad \text{Minimize } f(x) \quad \text{subject to } x \in \tilde{U}(\bar{x}) \text{ and } g^j(x) \le 0, \; j = 1, \dots, r_l(\bar{x}). \tag{2.23}$$

Clearly, x̄ in Corollary 2.17 can coincide with x*. Problem P_red[x̄] is called the problem locally reduced at x̄ and has the feasible set

$$F_{\mathrm{red}}(\bar{x}) := \{\, x \in \tilde{U}(\bar{x}) \mid g^j(x) \le 0, \; j = 1, \dots, r_l(\bar{x}) \,\}.$$

Provided that x is an element of Ũ(x̄) (which, in practice, normally cannot be verified), x lies in F_red(x̄) if and only if g(x, y^j(x)) ≤ 0 is true for all local maximizers y^j(x) of P[x].
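Since the reduced constraints g^j(x) = g(x, y^j(x)) are only defined implicitly, a numerical realization has to locate the local maximizers of g(x, ·) over Y. The following Python sketch is a crude illustration for Y = [0, 1] under assumptions: it detects candidate maximizers on a fine grid and refines each of them with scipy.optimize.minimize_scalar; the grid size and the tolerance are arbitrary choices of the sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reduced_constraints(g, x, n_grid=201, tol=1e-8):
    """Approximate the pairs (y^j(x), g^j(x)) of the locally reduced problem (2.23)
    for Y = [0, 1]: detect local maximizers of g(x, .) on a fine grid, then refine them."""
    ys = np.linspace(0.0, 1.0, n_grid)
    vals = np.array([g(x, y) for y in ys])
    result = []
    for k in range(n_grid):
        left = vals[k - 1] if k > 0 else -np.inf
        right = vals[k + 1] if k < n_grid - 1 else -np.inf
        if vals[k] >= left and vals[k] >= right:           # grid-local maximum
            a, b = ys[max(k - 1, 0)], ys[min(k + 1, n_grid - 1)]
            res = minimize_scalar(lambda y: -g(x, y), bounds=(a, b), method='bounded')
            y_j, g_j = res.x, -res.fun                      # refined maximizer and its value
            if not any(abs(y_j - y) < tol for y, _ in result):
                result.append((y_j, g_j))
    return result   # feasibility for the reduced problem means g_j <= 0 for all pairs
```

In an actual reduction based method one would rather track the maximizers y^j(x) from one iterate to the next than redetect them from scratch, which is exactly the difficulty discussed in Remark 2.18 (2) below.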

Remark 2.18 (1) When P[x̄] is only assumed to be weakly regular, one can show that Y^g(x̄) has a finite cardinality r_g(x̄) and that assertion (b) of Theorem 2.16 remains valid in the same way for "each global maximizer ȳ^j ∈ Y^g(x̄)" instead of "each local maximizer ȳ^j ∈ Y^l(x̄)". Moreover, Corollary 2.17 then is true analogously when r_l(x̄) is exchanged for r_g(x̄). This was originally shown in [87,91] for x̄ ∈ F(Y) and used by some authors to describe the feasible region F(Y) in a neighborhood of a solution x̄ of P[Y] by finitely many constraints. The earlier results can be straightforwardly extended to the situation considered here. The new result in [63] is essentially assertion (c) of Theorem 2.16.

(2) In practice, and especially for locally convergent reduction based methods (see Section 5.2), those implicit functions y^j, j ∈ {1, ..., r_l(x̄)}, which belong to global maximizers of P[x̄] normally cannot be identified from the knowledge of the local maximizers of some x ≠ x̄ alone, not even if x is sufficiently close to x̄. (Note that, in a neighborhood of x̄, the number of all local maximizers can be constant, but that possibly not all local maximizers also are global ones at x̄.) Furthermore, if all global maximizers ȳ^j ∈ Y^g(x̄) or some or all local maximizers ȳ^j ∈ Y^l(x̄) are known at some x̄ (for example, at x̄ := x^k as in the globalized reduction based methods in [25,64,77,96,154,198]), then, in case P[x̄] is not regular, one only has information about the corresponding functions y^j and not about all local maximizers in a neighborhood of x̄, so that it is difficult in practice to identify the correct local maximizers y^j(x) for x close to x̄. (P[x] may have more local maximizers than P[x̄].) For practical purposes, therefore, a representation (2.22) for all x ∈ U(x̄) and a local reduction of the form (2.23) with the set Y^l(x̄) instead of Y^g(x̄) are needed.

Page 224: Semi-Infinite Programming


(3) In order to obtain a representation (2.22) for all x ∈ U(x̄), nondegeneracy has to be assumed for all critical points, and it does not suffice to assume that all local maximizers of P[x̄] are nondegenerate. If only the latter holds, P[x̄] may have infinitely many local maximizers, as in the case Y := [−1,1] and

$$g(x, y) := \begin{cases} y^6 \cos(1/y), & y \ne 0, \\ 0, & y = 0. \end{cases}$$

Contrary to what is claimed in [65], furthermore, in that case a representation (2.22) may not be possible in a neighborhood of x̄, as the following example shows [63, p.59]: if Y := [−1,3] and

$$g(x, y) := \begin{cases} (1 - x)y^3 - xy^2, & y \in [-1, 1), \\ -\tfrac{91}{13}xy^3 + (12 - 17x)y^2 + (-12 + 16x)y + \tfrac{123}{16}x, & y \in [1, 3], \end{cases}$$

then y_0 := 0 is a degenerate critical point and y_1 := 2 is the only nondegenerate local maximizer of P[0], whereas P[x] with x ∈ (0, 1/2] has nondegenerate local maximizers at y_0 and y_1.

(4) Assume that problem P[x̄] is regular for x̄ ∈ ℝ^n and let L(x, y, v) be defined as in (2.21). Moreover, for j ∈ {1, ..., r_l(x̄)} and x ∈ U(x̄), let v^j := v^j(x) ∈ ℝ^{r̄} be the multiplier vector associated with the local maximizer y^j := y^j(x) ∈ ℝ^m

and let v̂^j(x) := (v^j_r(x))_{r∈R(ȳ^j)} ∈ ℝ^{|R(ȳ^j)|}. Then one has

$$\nabla g^j(x) = \nabla_x g(x, y^j(x)),$$

and the matrices ∇y^j(x) ∈ ℝ^{m×n} and ∇v̂^j(x) ∈ ℝ^{|R(ȳ^j)|×n} are uniquely determined by the system of equations

$$\begin{pmatrix} \nabla^2_{yy} L(x, y^j, v^j) & -\nabla \hat{c}(y^j) \\ \nabla \hat{c}(y^j)^T & 0 \end{pmatrix} \begin{pmatrix} \nabla y^j(x) \\ \nabla \hat{v}^j(x) \end{pmatrix} = - \begin{pmatrix} \nabla^2_{yx} g(x, y^j) \\ 0 \end{pmatrix},$$

where ĉ(y^j) := (c_r(y^j))_{r∈R(ȳ^j)} and the matrix ∇²_{yx} g(x, y^j) ∈ ℝ^{m×n} has the ij-th entry ∂/∂x_j (∂/∂y_i g(x, y^j)) [87,91]. Thus, for y^j and v^j given, ∇g^j(x) and ∇²g^j(x) can be computed.

2.7 Types of methods

Discretization methods are computationally costly, and their costs per iteration even grow with an increasing accuracy demand. Globally convergent reduction

Page 225: Semi-Infinite Programming


based methods, on the other hand, require strong assumptions and are often conceptual methods which can be implemented only in a rather simplified form. Discretization methods therefore are often used only in a first stage to generate an approximate solution of the SIP problem, whereas reduction based methods are typically employed only in a final stage, as local methods, to improve such a solution (cf. Section 5).

Beyond discretization and reduction based methods, there exist other methods which also determine a solution of an SIP problem by solution of a sequence of finite subproblems. In extension of [81], we call such methods semi-continuous methods. More precisely, we speak of a semi-continuous method if it is not based on the reduction principle and if it works continuously with respect to the second argument of g(·,·), i.e. when it requires, for example, the knowledge of all continuous local maximizers of g(x,·) with respect to Y at x or when it involves the solution of integrals over Y. It is remarkable that such methods have been given for linear and convex problems (see Sections 3 and 4) but, except for the penalty type approach in [99,202,203] (cf. Section 5.3), not yet for nonlinear SIP problems.

Thus we follow Hettich [81] and divide SIP methods into discretization methods, semi-continuous methods, and methods based on local reduction (called continuous methods in [81]). This classification differs from that of Hettich and Kortanek [88] who, instead of semi-continuous methods, consider exchange methods as a special class. In our opinion, however, discretization and exchange methods in the sense of [88] are closely connected outer approximation methods. (Consider Algorithm 1, which allows the choice Y_i := Y, i ∈ ℕ, and see Sections 3 and 4 for modifications of this algorithm which include rules for dropping points at each iteration.) Also, the classification in [88] does not make it possible to include interior-point methods. We therefore speak of an exchange method only in case of certain semi-continuous methods for linear problems (see Section 3) and relate the term "exchange" to the exchange (of columns) in the simplex algorithm rather than to the general exchange of points.

Like a discretization method, a semi-continuous method should make use of the special nature of the constraints in an SIP problem. Therefore, many semi-continuous methods are also suitable for the solution of discretized SIP problems and, in combination with a stability result of Section 2.5, provide the basis for a discretization method.

Each semi-continuous and discretization method can be used as a first phase method only and, under proper assumptions, be combined with a method based on local reduction, which usually has good local convergence properties. We speak

Page 226: Semi-Infinite Programming


of two-phase or hybrid methods when such combined procedures have been suggested explicitly. Because of the nonlinear nature of the local reduction approach, we discuss such methods in connection with methods for nonlinear problems (see Section 5.3).

We consider methods for linear, convex, and nonlinear SIP problems separately. Since discretization and reduction based methods are emphasized as such, it is obvious, and therefore not mentioned explicitly, that all other methods are semi-continuous methods.

3 LINEAR PROBLEMS

In this section we associate with P[Y] the linear SIP problem

$$P[Y]: \quad \text{Minimize } f(x) := c^T x \quad \text{subject to } g(x, y) := a(y)^T x - b(y) \le 0, \; y \in Y,$$

where a(y) := (a_1(y), ..., a_n(y))^T ∈ ℝ^n for y ∈ Y and c ∈ ℝ^n, b ∈ C(Y, ℝ), and a_j ∈ C(Y, ℝ) are given. Problem P[Y] is denoted as the primal problem. It is related to the dual problem [54]

$$S[Y]: \quad \text{Maximize } -\sum_{i=1}^{n+1} u_i b(y_i) \quad \text{subject to } -\sum_{i=1}^{n+1} u_i a(y_i) = c, \quad y_i \in Y, \; u_i \ge 0, \; i = 1, \dots, n+1. \tag{3.1}$$

If S[Y] has a solution, "n + 1" in the problem can be replaced by "n" [54]. Problem S[Y], for example, has a solution when P[Y] has one and when P[Y] possesses a Slater point [54].

We make use of classical results on finite linear programming (LP) and the simplex algorithm (e.g. [37,130]). In Sections 3.3 and 3.4 we also assume a basic knowledge of interior-point LP methods. A simple and concise introduction to these can be found, for example, in [37]. For a deeper analysis of interior-point methods (with different emphases) we refer the reader to the recent monographs [182,204,219,230].

Page 227: Semi-Infinite Programming


3.1 Dual exchange methods

3.1.1 Description

If Y_σ ⊆ Y is finite and |Y_σ| ≥ n, we can write the LP problem P[Y_σ] with some b(Y_σ) ∈ ℝ^{|Y_σ|} and A(Y_σ) ∈ ℝ^{|Y_σ|×n} in the form

$$P[Y_\sigma]: \quad \text{Minimize } c^T x \quad \text{subject to } A(Y_\sigma)\,x \le b(Y_\sigma).$$

By LP theory the dual problem related to P[Y_σ] is

$$S[Y_\sigma]: \quad \text{Maximize } -b(Y_\sigma)^T u \quad \text{subject to } -A(Y_\sigma)^T u = c, \; u \ge 0. \tag{3.2}$$

If one of the two problems P[Y_σ] and S[Y_σ] has a solution, then the other one also has a solution and their optimal objective function values are identical.

The simplex algorithm starts from the standard form of an LP problem, which only includes equality constraints and nonnegative variables. In order to transform problem P[Y_σ] into this standard form, one would have to introduce |Y_σ| slack variables and n additional variables, which would enlarge the number of variables of the problem considerably. Therefore it is usually more efficient to determine a solution of P[Y_σ] from the solution of S[Y_σ], which already has the requested form. The simplex algorithm applied to the dual problem S[Y_σ] is also called the dual simplex algorithm for P[Y_σ].

If Y_σ := Y is an infinite set and S[Y] has a solution, the dual (3.1) of P[Y] can be imagined as a problem of the form (3.2), where −b(Y)^T has infinitely many coefficients, −A(Y)^T has infinitely many columns of length n, and u varies over all column vectors having infinite length and at most n nonzero components. The matrix −A(Y)^T then has infinitely many (n × n)-submatrices which provide candidates for "vertices" of the feasible set of S[Y]. Thus the LP simplex algorithm can be formally applied to the infinite dual S[Y]. (In [56] the fundamentals of the simplex algorithm are characterized for this problem.) These observations explain the background of a number of algorithms for linear SIP problems. To make this clearer, we consider the following algorithm.

Algorithm 2 Step 0. Choose a finite set D_0 ⊆ Y with |D_0| ≥ n and set k := 0.

Step 1. Find a solution x^k ∈ ℝ^n of P[D_k] (by solving the dual S[D_k]).

Page 228: Semi-Infinite Programming


Step 2. Find y^k ∈ Y such that g(x^k, y^k) = max_{y∈Y} g(x^k, y).

Step 3. If g(x^k, y^k) ≤ 0, stop! (Then x^k solves P[Y].) Else choose a set D_{k+1} ⊆ Y with D_{k+1} ⊇ {y^k}.

Step 4. Set k := k + 1 and go to Step 1.
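For finite D_k the LP subproblem P[D_k] (or its dual S[D_k]) can be solved by any LP code. The following Python sketch implements the outer loop of Algorithm 2 for Y = [0, 1] using scipy.optimize.linprog; the one-point rule D_{k+1} := D_k ∪ {y^k}, the stopping tolerance eps, and the fine grid used to approximate the global maximization in Step 2 are assumptions of the sketch. For simplicity the sketch solves P[D_k] directly rather than via its dual S[D_k].

```python
import numpy as np
from scipy.optimize import linprog

def algorithm_2(c, a, b, D0, eps=1e-8, max_iter=100, y_grid=None):
    """One-point dual exchange loop (Algorithm 2) for the linear SIP problem
    min c^T x  s.t.  a(y)^T x - b(y) <= 0 for all y in Y = [0, 1]."""
    if y_grid is None:
        y_grid = np.linspace(0.0, 1.0, 10001)        # fine grid approximating Y in Step 2
    D = list(D0)
    for k in range(max_iter):
        A_ub = np.array([a(y) for y in D])           # constraint matrix A(D_k)
        b_ub = np.array([b(y) for y in D])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * len(c))
        if not res.success:
            raise RuntimeError("P[D_k] could not be solved: " + res.message)
        x = res.x                                    # Step 1: solution of P[D_k]
        viol = np.array([a(y) @ x - b(y) for y in y_grid])
        y_new = y_grid[np.argmax(viol)]              # Step 2: (approximate) maximizer over Y
        if viol.max() <= eps:                        # Step 3: stopping test g(x^k, y^k) <= eps
            return x, D
        D.append(y_new)                              # simplest choice D_{k+1} := D_k ∪ {y^k}
        # Step 4: k := k + 1
    return x, D
```

With the exchange rule (3.3) discussed below one would instead keep only the points of D_k that are active at x^k, together with y^k and possibly further violating local maximizers.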

For finite Y, phase I of the simplex algorithm detects whether the feasible set of the problem is empty. Since we do not know of a finite procedure which accomplishes the same goal for infinite Y, we assume that the feasible set of S[Y] is nonempty. Furthermore, it cannot generally be excluded that, for |Y| = ∞, all problems P[D_k] and S[D_k] generated by Algorithm 2 have a solution, while the objective function of S[Y] is unbounded on its feasible set or, equivalently (see [54]), P[Y] has an empty feasible set. Therefore, we also assume F(Y) ≠ ∅, which implies F(D_k) ≠ ∅ for all k ∈ ℕ_0. We finally need the standard assumption for phase II of the simplex algorithm that some set D_0 ⊆ Y exists with |D_0| = n and rank(A(D_0)) = n. Obviously, these three assumptions are especially fulfilled when S[Y] has a nonempty feasible set, when Assumption 2.1 is satisfied with respect to some Y_0 := D_0 with |D_0| = n, and when F(Y) contains a Slater point (cf. Remark 2.2). The latter assumptions also imply that both problems P[Y] and S[Y] possess a solution (see above).

With certain specifications, the simplex algorithm applied to S[Y] implicitly just performs Steps 1 to 4 of Algorithm 2, where no distinction has to be made between finite and infinite Y. To see that, let rank(A(D_k)) = n and |D_k| = n, which by our assumptions in particular holds true for k = 0. Then solution of S[D_k] means solving the (n × n)-system −A(D_k)^T u = c. (A solution of P[D_k] may also be directly obtained from A(D_k)x = b(D_k) here.) If g(x^k, y^k) ≤ 0 resp. x^k ∈ F(Y) is true, it follows that

$$\mu(D_k) \le \mu(Y) \le c^T x^k = \mu(D_k)$$

and hence that x^k is optimal for P[Y]. Otherwise one column is added to −A(D_k)^T which is related to the most violated constraint of the primal problem, and one column of −A(D_k)^T resp. some y ∈ D_k is taken out according to a certain rule (see [54,92] for its extension to infinite Y). The exchange rule in particular guarantees that |D_{k+1}| = n and rank(A(D_{k+1})) = n. The new problem S[D_{k+1}] then is solved by a proper update of the previous problem. In this way g(x^k, y^k) ≤ 0 is achieved after finitely many iterations in case Y is finite. For infinite Y, however, this procedure usually does not terminate, and

Page 229: Semi-Infinite Programming


its convergence to a solution of S[Y] (if such exists) has not yet been proved to our knowledge.

Following [84], we denote the described specification of Algorithm 2 as a one-point explicit exchange algorithm, since one point of D_k resp. one column of −A(D_k)^T is exchanged at the k-th iteration and this is done explicitly by a rule which ensures |D_{k+1}| = n and rank(A(D_{k+1})) = n. Naturally the question arises whether it is possible to exchange more than one column at the same time in order to make more progress towards a solution of the problem. For finite programs such a multiple exchange algorithm can indeed be found in [103, p.506], but the exchange there is performed in a quite complicated manner. It is shown in [181] that a multiple exchange in fact leads to the solution of an LP problem P[D_{k+1}] resp. S[D_{k+1}] where, apart from y^k and possibly other points from Y, the set D_{k+1} especially includes those (usually n) points which are active at x^k for P[D_k] and hence define the optimal basis matrix at the solution of S[D_k]. Henceforth D_{k+1} satisfies

$$\{\, y \in D_k \mid g(x^k, y) = 0 \,\} \cup \{ y^k \} \subseteq D_{k+1} \subseteq Y \tag{3.3}$$

and |D_{k+1}| > n, where typically some or all of those local maximizers of g(x^k, ·) on Y are added to D_{k+1} at which the corresponding constraints of P[Y] are violated at x^k. The optimal basis solution of S[D_{k+1}] then is the basis solution related to the multiple exchange. The following lemma provides assumptions under which Algorithm 2 is practicable for this choice.

Lemma 3.1 Let Assumption 2.1 be satisfied with respect to some Y_0, choose D_0 := Y_0, and let D_{k+1}, k ∈ ℕ_0, in Algorithm 2 satisfy (3.3). Then P[D_k] has a solution x^k for each k ∈ ℕ_0.

Proof. The assumptions ensure that P[D_0] is solvable and rank(A(D_0)) = n (cf. Remark 2.2). They furthermore guarantee that F(Y) and hence F(D_{k+1}) are not empty. Since D_{k+1} contains all points of D_k that are active at x^k with respect to P[D_k], one easily verifies that μ(D_k) ≤ μ(D_{k+1}) ≤ μ(Y). Consequently, with P[D_k], also P[D_{k+1}] possesses a solution. □

The assumptions of the lemma ensure that rank(A(D_0)) = n and that each problem S[D_k] has a solution. If rank(A(D_k)) = n and if the LP problem S[D_k] is solved by the simplex algorithm, then also the optimal basis matrix of S[D_k] has rank n. Consequently, rank(A(D_{k+1})) = n is also true for D_{k+1} satisfying (3.3). It can furthermore be shown easily that the optimal basis matrix of

Page 230: Semi-Infinite Programming


S[D_k] is a submatrix of −A(D_{k+1})^T and hence provides a feasible basis solution of S[D_{k+1}] (which, at convergence of Algorithm 2, is already close to a solution of this problem). Therefore one can immediately proceed with phase II of the simplex algorithm to solve the finite program S[D_{k+1}]. In practice one should always act in this way, since solution of the problems S[D_k] by the two-phase simplex algorithm without provision of feasible vertices increases computing times by a large factor.

Algorithm 2 with (3.3) works well in practice [181]. It usually generates an infinite sequence of iterates and hence has to be terminated. Obviously, for some prescribed ε > 0, the "feasibility check" g(x^k, y^k) ≤ ε provides a stopping criterion if no other criterion is available.

The described procedure implicitly yields a new basis solution by solution of the LP problem S[D_k]. Generalizing [84], we speak for this reason of an implicit one-point or multiple-point exchange algorithm, depending on whether only y^k or also additional points enter the new set D_{k+1}, where it is irrelevant whether members of D_k are dropped or not. While, for finite programs of normal size, a multiple exchange results in only little or no gain compared to one-point exchange, in SIP it avoids the, in this case costly, computation of y^k after each simplex step. The considerable saving in computing time by multiple exchange in SIP was already demonstrated in [52,181] for low-dimensional problems and is even larger for higher-dimensional ones (see also [84]). Therefore neither one-point explicit nor one-point implicit exchange algorithms should be used for larger SIP problems.

Unfortunately, exchange algorithms and discretization methods based on them typically run into numerical instabilities. To see that, consider the linear SIP problem of minimizing a linear function over a disk, which was discussed in Section 2.6. Obviously, only one constraint is active at the solution of the SIP problem, whereas two constraints are active at a solution of each finite subproblem which has at least two constraints and a finite minimal value. Thus, typically, when the x^k approach the solution of the SIP program, the angle between the corresponding tangents converges to 180° and the condition numbers of the related optimal basis matrices tend to infinity (cf. Figure 3). This is the usual situation in SIP, so that the internal use of stable matrix decompositions in Algorithm 2 is a necessity. (The authors of [117,118,136] highly recommend the code in [49] for this purpose.) Only in special circumstances, such as for real CAP problems under the Haar condition, can the number of active constraints at a solution of the SIP problem be proved to be at least n [91].
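The growth of the condition numbers can be observed directly on the disk example. The following Python sketch is an illustration with hypothetical data: the unit disk is described by m tangential halfspaces, the objective c := (0, −1)^T is an arbitrary choice, and the LP subproblems are solved with scipy.optimize.linprog; the condition number of the 2 × 2 matrix of the two constraints active at the computed solution is printed for increasingly fine discretizations.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([0.0, -1.0])                     # minimize -x_2 over the (discretized) unit disk

for m in [8, 32, 128, 512, 2048]:             # number of tangential halfspaces
    angles = (np.arange(m) + 0.5) * 2 * np.pi / m
    A = np.column_stack([np.cos(angles), np.sin(angles)])   # a(y)^T x <= 1 for each tangent
    b = np.ones(m)
    res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)
    slack = b - A @ res.x
    active = np.argsort(slack)[:2]            # the two constraints active at the solution
    cond = np.linalg.cond(A[active])          # condition number of the optimal basis matrix
    print(f"m = {m:5d}   cond(basis) = {cond:.3e}")
# The condition numbers grow as the discretization is refined, since the two active
# tangents become nearly parallel (their angle approaches 180 degrees).
```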

Page 231: Semi-Infinite Programming


Figure 3

3.1.2 History

Three kinds of exchange rules have been suggested. The first and simplest one is to add s points from Y to D_k and to drop none, i.e. to choose D_{k+1} ⊇ D_k ∪ {y^k} and |D_{k+1}\D_k| = s. Convergence of this implicit s-point exchange algorithm is simple to prove and follows here from Theorem 2.8 (with Y_0 := D_0 and Y_i := Y, i ∈ ℕ). Especially for s = 1 resp. D_{k+1} := D_k ∪ {y^k}, this algorithm was given, for example, in [20,74,76,91,94]. If it is applied to the unconstrained linear real CAP problem (2.8) (with q = 1 and the semi-infinite constraint (2.9)) and if, with a global maximizer y^k := (w^k, α^k), also (w^k, −α^k) is added to D_k so that s = 2, it is just the first algorithm of Remes [19], which also was suggested in a multiple exchange version [223] and similarly for linear complex CAP problems [16]. On account of the usually slow convergence of exchange algorithms, however, these algorithms suffer in practice from the monotonic growth of the constraint set and, in case of one-point resp. two-point exchange, from the large number of costly computations of global maximizers. Therefore these algorithms are only useful for small problems.

The second and ideal way of exchange is that of an explicit exchange. For the special problem of unconstrained linear real CAP under the Haar condition (and certain constrained ones [52]), such an exchange rule can be given and goes back to work of Remes in 1934 [134]. Algorithm 2 in this case is known as the

Page 232: Semi-Infinite Programming


second algorithm of Remes. Its convergence was proved for s-point exchange with s ∈ {1, ..., n} (e.g. [91,134]) and was shown to be quadratic for s := n [134,220]. Generalizations of this explicit exchange algorithm and compromises between the first and the second Remes algorithm for the situation where the Haar condition is not satisfied were suggested as one-point resp. multiple-point algorithms in many papers, but partly without convergence proofs (see e.g. [9,185,186,212,223] and, for a more general framework, [11,17,120]).

Explicit exchange procedures were also suggested for general linear SIP problems. In particular, explicit one-point exchange was investigated in [54,92] and, in a general functional analytic framework, in [183]. But a convergence proof of this infinite dual simplex algorithm is difficult and is also not given in these references. The authors of [92] refer to a convergence proof which, however, does not seem to have been published, and the convergence proof in [183] is not complete for the case of interest here.

Under strong assumptions, including that P[Y] has a unique solution at which exactly n points are active (the exceptional case), local convergence with a superlinear rate is proved for the explicit n-point exchange algorithm where D_{k+1} is formed by n local maximizers of g(x^k, ·) on Y at each step [90,91]. This result in particular implies the superlinear rate of convergence of the second Remes algorithm and hence extends this property to certain SIP programs. For SIP problems, however, it is in general not known a priori whether the needed assumptions are satisfied or not. But the result suggests including in the set D_{k+1} all those current local maximizers of the constraint function which could be candidates for active points at a solution of the SIP problem.

The third kind of exchange is to drop all points of D_k that are inactive at x^k with respect to P[D_k], so that D_{k+1} consists of the (usually n) active points of D_k, the global maximizer y^k, and possibly additional points from Y (cf. (3.3)). This implicit exchange was first suggested by Roleff [181] (without convergence proof), where Roleff adds all violating local maximizers to D_{k+1}. In his (difficult to read) thesis [104], where several exchange procedures are explored, Jurgens claims to have proved convergence of an algorithm which, in the normal case, coincides with Roleff's algorithm. Convergence of Roleff's algorithm for s = 1 is also shown in [119], but under assumptions which, in general, cannot be verified a priori in practice.

As was essentially observed already in [91, p.152], it can be concluded from the properties of the alternation theorem [134] and the second Remes algorithm that, for unconstrained real CAP problems satisfying the Haar condition, the

Page 233: Semi-Infinite Programming


one-point second algorithm of Remes yields the same iterates as the one-point implicit exchange algorithm of Roleff, applied to the related SIP program with q = 1 and constraint (2.9). Moreover, the numerical labor to solve the subproblems in both algorithms is nearly identical if the subproblems in the Roleff algorithm are solved by the dual simplex algorithm in the efficient way described above. Therefore (at least for the one-point version) the Roleff algorithm can be considered as the direct extension of the second Remes algorithm to linear SIP problems (without guaranteed convergence).

Tang [199-201] (see also [46]) succeeded in proving convergence of the one-point Roleff method for the special problem of unconstrained linear complex CAP and thereby generalized the one-point Remes algorithm to general complex problems. Tang's result of n-step superlinear convergence of his algorithm can be considered as a particular case of the general result on explicit exchange in [90,91] mentioned above. For complex CAP problems, however, the assumptions for that are normally not satisfied.

It is shown in Section 4 that convergence of the implicit exchange algorithm which is given by Algorithm 2 with choice (3.3) can be easily verified when the linear objective function c^T x is replaced by the strictly convex function c^T x + δ x^T x with some tiny δ > 0 and when the initial set D_0 is added to D_{k+1} at each iteration. (In practice D_0 can be dropped as soon as for the first time none of the points of D_0 is active any more.) This latter modification has provided good results for linear continuous approximation problems with up to 1000 variables [157] and has the advantage of being extendable to general convex SIP problems (cf. Section 4). It especially has turned out to converge quadratically in practice (as could also be expected) when it is applied to real CAP problems satisfying the Haar condition.

3.1.3 Discretization methods

As was pointed out in Section 2.7, a semi-continuous method, and hence Algorithm 2 (with Y := Y_i), may also be a suitable tool for the solution of the finite problem P[Y_i] in a discretization method. It was used in this way for the discretization methods in [85,91,176] (see [175] with respect to CAP problems), which are distinguished by the choice of D_{i,k+1} (the index "i" refers to the current grid), where always, with some ϑ_{i,k},

$$D_{i,k+1} \supseteq \{\, y \in Y_i \mid g(x^{i,k}, y) \ge \vartheta_{i,k} \,\}. \tag{3.4}$$

Especially in [85,91], ϑ_{i,k} equals some small negative number resp. zero for all i and k, and (as an option in [91]) D_{i,k+1} is a proper superset of the

Page 234: Semi-Infinite Programming


right-hand set in (3.4), i.e. other points from Y_i or from previous grids enter D_{i,k+1} according to certain rules. In [176], ϑ_{i,k} ≤ 0 is adapted properly at each iteration and D_{i,k+1} equals the right-hand set in (3.4). Experiments in [176] seem to indicate that the latter choice is the most efficient.

The assumptions of Corollary 2.9 (one is Y_i ⊆ Y_{i+1} ⊆ Y) imply convergence of solutions of P[Y_i] to a solution of P[Y] for i → ∞. Thus, under these assumptions, the methods in [85,91,176] converge when a solution of the linear problem P[Y_i] is obtained after finitely many iterations of an inner algorithm. Especially for the choice (3.4) with ϑ_{i,k} ≤ 0, convergence of Algorithm 2, applied to the finite program P[Y_i], can be proved when precautions are taken to avoid a certain cycling of the algorithm which is possible theoretically. In [85], for this reason, under certain conditions the set D_{i,k} is augmented by at least one point of Y_i at each inner iteration k (theoretically until D_{i,k} equals Y_i, which is not wanted and also does normally not happen in practice), while in [176] it is shown that the cycling does not occur when each subproblem P[D_{i,k}] has a unique solution (which is almost always the case on a computer) or, more generally, when x^{i,k} is the solution of P[D_{i,k}] with minimal l₂-norm (which can be obtained by solution of a quadratic program).

Clearly, the grid index is increased when g(x^{i,k}, y^{i,k}) ≤ 0 or g(x^{i,k}, y^{i,k}) ≤ ε_i is true, where {ε_i} is a zero sequence of positive numbers. In that case, the choice Y_i ⊆ Y_{i+1} allows one to proceed as at an inner iteration for the solution of P[Y_i] resp. S[Y_i], i.e. to drop the columns in the last problem S[D_{i,k}] related to inactive constraints at x^{i,k} and to add new columns, which in this case are generated by points from Y_{i+1}.

A difficulty with the selection (3.4) is that the cardinality of D_{i,k+1} can become quite large, so that, in practice, a subset of this set may have to be selected by a cumbersome management. An alternative choice, motivated by the success of Roleff's algorithm for the continuous problem and the methods in [62,63,143,160] for nonlinear problems, would be to let D_{i,k+1} be the total set or a subset of all violating discrete local maximizers of g(x^{i,k}, ·) on Y_i (cf. Section 5.1). But such a choice does not yet seem to have been tried in connection with exchange methods.

3.2 A primal exchange method

Another approach to the solution of a linear SIP program is to solve the primal problem P[Y] directly by an extension of the simplex algorithm to infinite-

Page 235: Semi-Infinite Programming


dimensional problems. In analogy to the finite situation, one starts for that from its infinite-dimensional standard form

$$P_s[Y]: \quad \text{Minimize } f(x) := c^T x \quad \text{subject to } x \in \mathbb{R}^n, \; z \in C(Y, \mathbb{R}), \quad a(y)^T x + z(y) = b(y), \; y \in Y, \quad z(y) \ge 0, \; y \in Y.$$

The approach is based on studies of Nash [139], who characterized the fundamental objects and operations of the simplex algorithm in a general algebraic framework, and of Anderson [2], who interpreted these characterizations for problem P_s[Y] (see also [4,183]). An extension of the simplex algorithm to P_s[Y], working with extreme points of the feasible set, was outlined in [2] and, at least for the non-degenerate case (see below), completed in [3]. The main ideas of the method are summarized in the following.

Let Y := [0,1] for the sake of simplicity and let b and the components a_j of a be members of C^∞(Y, ℝ). Assume that, for each feasible point (x, z) of P_s[Y], the set Z := {y ∈ Y | z(y) = 0} of active points resp. zeroes of z is finite, i.e. that Z = {y_1, ..., y_κ}. Furthermore assume that each zero y_i has a finite order d(i), i.e. that z^{(l)}(y_i) = 0 for l = 0, 1, ..., d(i) and z^{(d(i)+1)}(y_i) ≠ 0. Finally,

define t := κ + d(1) + ... + d(κ), let a^{(l)}(·) := (a_1^{(l)}(·), ..., a_n^{(l)}(·))^T, and let Â be the (t × n)-matrix

$$\hat{A} := \left( a(y_1), a'(y_1), \dots, a^{(d(1))}(y_1), \;\dots, \; a(y_\kappa), a'(y_\kappa), \dots, a^{(d(\kappa))}(y_\kappa) \right)^T.$$

(Note that the defined quantities depend on (x, z).) Then one has the following characterizations.

Theorem 3.2 ([3]) Let (x, z) be feasible for P_s[Y].

(a) (x, z) is an extreme point of the feasible set ⇔ the columns of Â are linearly independent (⟹ t ≥ n).

(b) (x, z) is a non-degenerate extreme point of the feasible set ⇔ Â is square and invertible.

(c) A non-degenerate extreme point (x, z) of the feasible set is optimal ⇔ λ_{i,0} ≥ 0 and λ_{i,r} = 0, r = 1, ..., d(i), for i = 1, ..., κ, where

$$\lambda^T = \left( \lambda_{1,0}, \dots, \lambda_{1,d(1)}, \;\dots, \; \lambda_{\kappa,0}, \dots, \lambda_{\kappa,d(\kappa)} \right) := c^T \hat{A}^{-1}.$$

Page 236: Semi-Infinite Programming


If a feasible point of P_s[Y] is known, an extreme point of the feasible set with smaller or equal objective function value is obtained after finitely many iterations of a purification algorithm. Such algorithms are described in [3,123], where in [3] the boundedness of the feasible set F(Y) is assumed, while in [123] only the linear independence of a_1, ..., a_n is needed. The algorithm in [123] can be considered as a method of feasible directions which either finds an extreme point or discovers that the objective of P[Y] is unbounded on F(Y).

If degeneracy does not occur, an iteration of a conceptual primal exchange algorithm now is derived from Theorem 3.2 as follows [3], where (x, z) is a feasible point of P_s[Y] (for the initialization, such a point can be found as described in Section 2.2): If (x, z) is not extreme, use a purification algorithm to find an extreme point; next determine Z and Â for this extreme point, compute λ by solving the system Â^T λ = c, and check λ with regard to optimality. If the extreme point is not optimal, determine a new feasible point (x, z) such that the objective function value is not increased and proceed as before. For the latter step, different precautions have to be taken, depending on which of the conditions on the coefficients of λ is violated. Usable descent procedures for a non-optimal extreme point were suggested in [3,124], where those in [124] ensure in any case that the new point is again an extreme point.

In contrast to finite linear programming, however, degeneracy is unfortunately a frequently occurring phenomenon for this problem. It is noted in [3] that an optimal solution, for instance, of a standard CAP problem is always degenerate. An example where degeneracy does not occur is given when a_j(y) := y^{j−1} and b is such that b^{(n)} has no roots in [0,1] [3].

By Theorem 3.2, in particular the optimality test in the computational scheme described above is not possible if (x, z) is a degenerate extreme point. An optimality test which also applies to an arbitrary feasible point is given in [122]: if P[Y] possesses a Slater point and (x, z) is feasible for P_s[Y], then x is optimal for P[Y] if and only if the minimal value of the following LP problem is zero:

$$\text{Minimize } c^T d \quad \text{subject to } a(y_i)^T d \le 0, \; i = 1, \dots, s, \qquad \pm d_j \le 1, \; j = 1, \dots, n.$$
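This test amounts to a small LP and is easy to carry out numerically. The following Python sketch is a minimal illustration; the function a, the list of active points and the tolerance are assumptions, and scipy.optimize.linprog is used as the LP solver.

```python
import numpy as np
from scipy.optimize import linprog

def primal_optimality_test(c, a, active_points, tol=1e-9):
    """LP-based optimality test from the text: x is optimal for P[Y] iff
    min c^T d  s.t.  a(y_i)^T d <= 0 (y_i active), |d_j| <= 1, has minimal value 0."""
    n = len(c)
    A_ub = np.array([a(y) for y in active_points])     # a(y_i)^T d <= 0
    b_ub = np.zeros(len(active_points))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(-1.0, 1.0)] * n)
    if not res.success:
        raise RuntimeError("optimality-test LP failed: " + res.message)
    return res.fun >= -tol            # minimal value 0 (up to tol) <=> x optimal
```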

Moreover, procedures to obtain a descent direction and, by that, a new feasible point when (x, z) is a degenerate extreme point and x is not optimal for P[Y] are developed in [124].

Convergence of an algorithm which employs the ideas described above has not yet been proved. Another drawback of such an algorithm is that it demands a

Page 237: Semi-Infinite Programming


large number of numerically costly feasibility checks and of computations of maximal step-sizes. Verification of feasibility also has to be carried out with high accuracy in order to preserve feasibility. A primal exchange algorithm, however, generates points x^k which are feasible for P[Y] and does not seem to meet the stability problems which are typical for dual exchange methods. (Note that the very ill-conditioned tangent problem from [25] was successfully solved in [123,124].)

3.3 Interior-point methods

Until now, interior-point methods from finite linear programming have been extended to SIP mainly in order to study theoretically their behavior when the number of constraints is very large or tends to infinity, rather than in the interest of developing new and possibly competitive SIP algorithms. Therefore only few and small numerical experiments seem to have been carried out with such extensions up to now. The insights obtained from these, however, may be of importance for future developments in SIP and hence are worth summarizing. Especially, feasible regions of finely discretized linear SIP problems have many vertices, so that, for example, discretization methods as discussed in Section 3.1 are normally very costly, while interior-point methods, shown to be superior to the simplex algorithm for many large-scale LP problems, stay away from the boundary and hence could be expected to be only slightly influenced by discretization refinements with regard to efficiency. The latter hope, however, is disappointed at least for some methods.

The development of interior-point methods in LP began with Karmarkar's algorithm [112]. Karmarkar started from a certain form of an LP problem which requires the introduction of slack variables for inequality constraints. Powell [168] showed that the possibly considerable increase of the number of variables caused by that (as for discretized SIP problems) can be avoided and applied the algorithm to discretizations of the simple problem displayed in Figure 1. The outcome of these experiments is that the worst case number of iterations of Karmarkar's algorithm is bounded by a constant multiple of the number of inequality constraints resp. discretization points and that the algorithm can be, and also is likely to be, as inefficient as the simplex method when the feasible set has many vertices [169,171]. A natural extension of Karmarkar's algorithm to linear SIP problems (leading to a potential function with an integral over Y) even fails to converge to a solution of this particular problem in theory and practice (see also [211,216] for discussions of that). Eventually, comparisons in [170] for some discretized SIP problems indicate that certain so-called small-step

Page 238: Semi-Infinite Programming


logarithmic barrier methods are similarly inefficient as Karmarkar's algorithm and that long-step logarithmic barrier methods behave more satisfactory, but that all of these methods are surpassed by the SQP type method in [166, 167] which, generating feasible points, avoids early encounters with the boundary of the feasible region.

A variant of Karmarkar's algorithm is the rescaling resp. affine-scaling approach which was first suggested by Dikin [30]. Kortanek [117] (basing on a technical report from 1987) used Dikin's primal affine-scaling algorithm for the solution of the duals of discretized SIP problems. Experiments indicate that, for such problems, the affine-scaling method yields somewhat less accurate solutions in occasionally much less time than the LINOP code of the simplex algorithm in [49] when the discretized problems are approached as large LP problems. (Alternatively a discretization method as discussed in Section 3.1 could be em­ployed, by which a finely discretized SIP problem is split into smaller subprob­lems.) Further numerical results related to the collapse state of rigid plastic material were presented in [21] and later improved in [1]. A version of Dikin's algorithm was moreover applied to the duals of discretized SIP problems with bounds on the variables and empirically studied with respect to the refinement of the discretization [126].

An extension of the LP primal affine-scaling algorithm to the primal linear SIP problem Ps[Y] in standard form is studied by Ferris and Philpott [43] and shown to perform poorly on some test examples. An implementation of this method requires the computation of O(n2 ) integrals and hence the implicit dis­cretization of Y. The authors therefore applied the LP affine-scaling algorithm to the related primal discretized SIP problem directly and showed that, inde­pendent of the degree of discretization, such procedure is more efficient than the proposed SIP affine-scaling algorithm in case of employment of Simpson's integration rule. Convergence of the algorithms for both kinds of discretization is verified. Eventually in [44], an alternative generalization of the affine-scaling method for Pa[Y] is proposed which is superior to the previous one and, with Simpson's rule, needs the same number of iterations as the related LP method applied to the respective discretized SIP problem.

Vanderbei [218] uses an inequality version derived from the standard form of the primal affine-scaling algorithm and studies an extension of it to the primal SIP problem P[Y] for the above-mentioned example by Powell. In particular, he proved that the continuous trajectories converge to the solution of the problem, but that for every positive step-length there exist starting points such that the sequence of iterates does not converge to the solution.

Generalizations of various interior-point methods to SIP are thoroughly investigated by Todd [211]. Todd first studies a certain invariance property of interior-point methods in the finite case, which indicates which methods are least influenced by grid refinements when discretized SIP problems are solved, and he then extends the invariant algorithms to the continuous case. By this concept the above discussed affine-scaling methods, projective-scaling methods such as Karmarkar's algorithm, and some potential-reduction methods can be extended, whereas most path-following methods cannot. Convergence and implementations of the methods extended in this way are not considered since the main interest of the author lies in obtaining information on the behavior of interior-point methods for large LP problems. The convergence of conceptual extensions of two primal-dual path-following methods for SIP was investigated by Tuncel and Todd [216] who, under certain assumptions, also provided the first polynomial-iteration complexity bounds for SIP.

Closely related to the logarithmic barrier approach is the method of (analytic) centers (e.g. [182,204]). It employs a logarithmic barrier function which, instead of a penalty parameter, involves an upper bound on the minimal value of the problem. The natural extension of this function to the primal SIP problem P[Y] is the function

Φ(x, λ) := −M ln(λ − c^T x) − ∫_Y ln(b(y) − a(y)^T x) dy    (3.5)

where M ∈ ℕ and λ > μ(Y) are given constants and Φ(·, λ) is defined on the interior of F(λ) := F(Y) ∩ {x ∈ ℝ^n | c^T x ≤ λ} (see also (66) in [211]). The method of analytic centers then consists of successively reducing this bound and performing one or more Newton steps for minimizing the barrier function with respect to x.
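As an illustration of this scheme only, the following sketch implements a bare-bones method of centers for a linear SIP problem with Y = [0,1]: the integral in (3.5) is approximated by a simple quadrature rule, the bound λ is reduced by a fixed fraction of the current gap, and a damped Newton step on Φ(·, λ) is taken. All names, parameter values and the quadrature rule are illustrative assumptions and do not reproduce the implementations of [187,188] discussed below.

import numpy as np

def analytic_center_step(c, a, b, x0, lam0, M=1, quad_pts=200, outer=20, newton=3, shrink=0.5):
    # Bare-bones method of centers for  min c^T x  s.t.  a(y)^T x <= b(y), y in [0,1].
    # a(y) returns an n-vector, b(y) a scalar; x0 must satisfy b(y) - a(y)^T x0 > 0 on the
    # quadrature nodes and c^T x0 < lam0.  The integral in (3.5) is approximated by a simple
    # rectangle rule (this is what Radau/Lobatto rules would refine near the boundary).
    c = np.asarray(c, dtype=float)
    ys = np.linspace(0.0, 1.0, quad_pts)
    w = 1.0 / quad_pts
    A = np.array([a(y) for y in ys])              # rows a(y)^T on the quadrature grid
    bb = np.array([b(y) for y in ys])
    x, lam = np.array(x0, dtype=float), float(lam0)
    for _ in range(outer):
        lam = c @ x + shrink * (lam - c @ x)      # reduce the upper bound on the optimal value
        for _ in range(newton):
            s0 = lam - c @ x                      # slack of the bound constraint c^T x <= lam
            s = bb - A @ x                        # slacks of the semi-infinite constraints
            grad = M * c / s0 + w * (A.T @ (1.0 / s))
            hess = M * np.outer(c, c) / s0**2 + (A.T * (w / s**2)) @ A
            d = np.linalg.solve(hess, -grad)      # Newton direction for Phi(., lam)
            t = 1.0
            while np.min(bb - A @ (x + t * d)) <= 0 or lam - c @ (x + t * d) <= 0:
                t *= 0.5                          # damp the step to stay in the interior of F(lam)
            x = x + t * d
    return x

The crude rectangle rule used here is precisely what the more careful integration rules mentioned below are meant to replace when the iterates approach the boundary of F(Y).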

Using (3.5), Schattler [187,188] has proved convergence of a conceptual adaptation of the method of centers to P[Y] (without a complexity estimate), where in his algorithm a conceptual update of λ is followed by a Newton step with step-size one. He has moreover implemented a related algorithm where, in a "predictor step", an approximation of the new center (the unique minimizer of Φ(·, λ) in the interior of F(λ)) is obtained by linear extrapolation which, in a "corrector step", is used as a starting point for Newton's method to compute the next center. Concerning the implementation of the algorithm, special attention is paid to the computation of the integrals defining ∇_x Φ(x, λ) and ∇²_xx Φ(x, λ). Typically for SIP interior-point methods, these integrals become nearly singular close to the boundary of F(Y) and hence necessitate the use of proper integration rules like Radau's or Lobatto's rule [26], as suggested by Schattler. Some promising numerical experiments are presented, but these are
too small to provide satisfactory information on the general performance of the algorithm. Numerical experiments with extensions of the method of centers to general convex SIP programs are also reported by Sonnevend [192] (based on earlier work in [191]), who, requiring a high degree of smoothness of the objective and the constraint function, alternatively uses high-order extrapolations of the central path. Neither the algorithms nor the numerical results in [192], however, are stated explicitly.

3.4 Other methods

Some authors use interior-point methods for the solution of the LP subproblems as they occur, for example, in a dual exchange method. In particular, in [128,190], the authors consider P[Y] in a form including bounds x ≥ 0 and start from the implicit one-point exchange algorithm in which no constraints are dropped (Algorithm 2 with D_{k+1} := D_k ∪ {y^k}). Then, in [128], it is suggested to compute a solution of P[D_k] by the dual affine-scaling method. Typically for interior-point methods, this solution is exterior to the feasible set of P[D_{k+1}]. In regard to numerical efficiency, the SIP problem P[Y] should therefore have such structure that an interior feasible starting point of P[D_{k+1}] is easily generated from a solution of P[D_k]. Some small numerical experiments are presented in [128]. In [190] it is shown theoretically that the primal-dual infeasible interior-point algorithm of [116] is instead well suited for the solution of the LP subproblems.

Including the condition x ≥ 0 in P[Y], Fang and Wu [41] add, for x > 0, the entropic barrier function ν Σ_{j=1}^n x_j ln(x_j) with some ν > 0 to the linear objective function of P[Y] and solve this perturbed SIP problem up to some accuracy by the original KCG algorithm (cf. Section 4.1), which can be interpreted as the convex generalization of Algorithm 2 with D_{k+1} := D_k ∪ {y^k}. (The objective function of the perturbed problem is strictly convex so that inactive constraints of the subproblems could be dropped according to Theorem 4.1 below.) Afterwards ν is reduced and the process is repeated. Modifications given in [42,129] allow the simultaneous augmentation of D_k and reduction of ν. (Convergence of the methods in [42,129] also follows under weaker assumptions from the above-mentioned generalization in [178] of Theorem 2.8.) Numerical experiments with up to 50 variables are discussed, for which the occurring finite convex subproblems were solved by the method given in [39]. The algorithm in [129] is also compared with Algorithm 2 above for the choice D_{k+1} := D_k ∪ {y^k}, where this algorithm was applied to P[Y] directly. The outcomes seem to favor the new method. (The authors do not comment on why
they used an interior-point method for the solution of the LP subproblems in the implementation of the latter approach rather than the simple and efficient dual procedure outlined in Section 3.1.) Finally, in [127], a solution of P[Y] is approximated by the unique minimizer of an unconstrained convex function involving an integral penalty term, and convergence is proved under certain assumptions in case the penalty parameter converges to zero. Except for a one-dimensional example, however, the numerical behavior of the method is not explored there.
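To fix the idea behind the entropic perturbation of [41], the following sketch forms the objective c^T x + ν Σ_j x_j ln x_j on a fixed finite discretization of Y, solves the resulting finite convex program, and then reduces ν. The inner KCG exchange scheme of [41] is replaced here by a direct solution of the discretized problem, and all names and parameter values are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

def entropic_barrier_outer_loop(c, g, Y_grid, x0, nu0=1.0, shrink=0.1, outer=5):
    # Outer loop of an entropic-barrier scheme for  min c^T x  s.t.  g(x, y) <= 0 (y in Y), x >= 0.
    # The perturbed objective c^T x + nu * sum_j x_j ln x_j is minimized on a fixed grid Y_grid
    # (instead of by the KCG exchange scheme described in the text), then nu is reduced.
    c = np.asarray(c, dtype=float)
    x, nu = np.array(x0, dtype=float), nu0
    for _ in range(outer):
        obj = lambda z, nu=nu: c @ z + nu * np.sum(z * np.log(np.maximum(z, 1e-12)))
        cons = [{'type': 'ineq', 'fun': (lambda z, y=y: -g(z, y))} for y in Y_grid]
        res = minimize(obj, x, method='SLSQP',
                       bounds=[(1e-12, None)] * len(x), constraints=cons)
        x = res.x
        nu *= shrink
    return x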

An unconventional method for linear SIP problems was recently proposed by Hu in [95]. The given example with n = 5 variables, however, reveals that the method needs an unusually high number of costly computations of (nearly) global maximizers.

4 CONVEX PROBLEMS

In this section we generally assume that the functions f and g(·, y), y ∈ Y, in problem P[Y] are convex and that f ∈ C¹(ℝ^n, ℝ) and g ∈ C^{1,0}(ℝ^n × Y, ℝ).

4.1 Cutting plane methods

In the beginnings of nonlinear programming, cutting plane methods were suggested for the solution of finite convex optimization problems. In particular the methods of Kelley-Cheney-Goldstein (KCG) [20,113], Veinott [221], and Elzinga and Moore [35] were popular and were modified in various ways (see e.g. [33,93,178] for references on cutting plane methods). These procedures can be straightforwardly extended to convex SIP problems, which for the KCG and the Elzinga-Moore algorithm was done first in [10,12,34] resp. [207]. Applied to linear SIP problems, especially the KCG type algorithms extended in this way turn out to be identical with or to be modifications of dual exchange methods as they were discussed in Section 3.1 (see below). Therefore these cutting plane methods may be considered as generalizations of dual exchange methods to convex problems.

Consequently, cutting plane methods also have similar properties as dual exchange methods. They converge globally under weak assumptions, and the duals of the finite subproblems in the algorithms can be solved very efficiently. Only rows or columns have to be added and dropped in the current subproblem,
and its solution can be used as a starting point for the solution of the subsequent problem. Cutting plane methods, however, also typically encounter the same stability problems which are observed for dual exchange methods, so that the employment of a numerically stable solver for the finite subproblems in the algorithms is crucial.

The numerical effort for solving the subproblems in a cutting plane algorithm is relatively small compared with other algorithms. This has to be kept in mind when the usually slow convergence of such methods is pointed out. Indeed, comparisons of the cutting plane method in [160] and the (under proper assumptions) locally superlinearly convergent method of Haaren-Retagne [77], which uses an SQP method for the solution of the subproblems (cf. Section 5.2), showed that the cutting plane method quite often is superior to that method with respect to computing time when normal accuracy is required. (Mrs. Haaren-Retagne had kindly made her code available to the first author.) Clearly, if requested, a cutting plane method may be used in a first phase only and be followed by a locally convergent method with a higher convergence rate (cf. Section 5.2).

In the next two subsections, a KCG type method and the Elzinga-Moore cutting plane method are specified for SIP problems. In particular the KCG type method of [158,160] has turned out to be a very reliable method for the solution of linear and convex SIP problems and has provided highly accurate solutions of a large number of problems with up to 1000 variables [156-161]. Numerical examples of nearly that size and variety have not yet been published for any other method.

4.1.1 A KCG type method

Cutting plane methods require the knowledge of a compact set X which encloses the feasible region of the problem or, since this may not be bounded, a level set. Also, for practical purposes, X needs to be a polyhedral set, i.e. to be defined by finitely many linear inequality constraints. Provided that Assumption 2.1 is satisfied with respect to some x^F ∈ F(Y) and a finite Y_0 ⊆ Y, which we assume throughout this subsection, it is therefore reasonable to search for a polyhedral set X with X ⊇ A(x^F, Y_0). In particular, if P[Y] is a linear problem, X := A(x^F, Y_0) is a suitable choice. The constraint f(x) ≤ f(x^F) can always be ignored since it only becomes active if x^F is a solution of the problem [177].

Applied to linear SIP problems and with X := A(x^F, Y_0), the original KCG method extended to infinitely many constraints is just the one-point implicit dual exchange Algorithm 2 with the choice D_{k+1} := D_k ∪ {y^k}, which again is
closely related to the first algorithm of Remes for linear real CAP problems. Like these, the KCG method suffers from the monotonic growth of the number of constraints in the subproblems. It has therefore been of major interest to verify the convergence of modifications of this method which include rules to also drop constraints (e.g. [33,93]). The following algorithm given in [178] is such a modification for SIP problems (see also [34,208]). For finite problems its convergence was proved in [33,213,214].

Algorithm 3 Step 0. Choose a finite set Y_0 ⊆ Y with |Y_0| ≥ n and a set X ⊇ A(x^F, Y_0) for some x^F ∈ F(Y). Set M_0 := X, N_0 := ℝ^n, and k := 0.

Step 1. Find a solution x^k ∈ ℝ^n of the problem

p_k := min {f(x) | x ∈ (M_k ∩ X)}.    (4.1)

If k > 0, let N_k be the solution set of those inequality constraints which define M_k and are active at x^k.

Step 2. Find a global maximizer y^k ∈ Y such that g(x^k, y^k) = max_{y∈Y} g(x^k, y).

Step 3. If g(x^k, y^k) ≤ 0, stop! (Then x^k solves P[Y].) Else choose a set D_{k+1} ⊆ Y with D_{k+1} ⊇ {y^k} and let

M_{k+1} := N_k ∩ {x ∈ ℝ^n | g(x^k, y) + ∇_x g(x^k, y)^T (x − x^k) ≤ 0, y ∈ D_{k+1}}.

Step 4. Set k := k + 1 and go to Step 1.

Theorem 4.1 Let Assumption 2.1 be satisfied with respect to the initial set Y_0 ⊆ Y and x^F ∈ F(Y), and let X ⊇ A(x^F, Y_0) be convex and compact. Furthermore, let f be strictly convex. Then Algorithm 3 either stops after finitely many iterations with a solution of P[Y] or it generates an infinite sequence {x^k} which converges to the unique solution x̄ of P[Y]. Moreover, {p_k} converges monotonically increasing to μ(Y) for k → ∞.

Proof. Due to the assumptions, problem P[Y] has a unique solution. By convexity of g(·, y) one has g(x^k, y) + ∇_x g(x^k, y)^T (x − x^k) ≤ g(x, y), y ∈ Y, for x ∈ ℝ^n. Noting that A(x^F, Y) ⊆ A(x^F, Y_0) ⊆ X, we therefore obtain A(x^F, Y) ⊆ (M_{k+1} ∩ X) and hence p_{k+1} ≤ μ(Y).

Obviously, there exist a subsequence {x^{k_j}} of {x^k} and points x̄ and x̄⁺ such that x^{k_j} → x̄ and x^{k_j+1} → x̄⁺ for j → ∞. Furthermore, one can easily show that p_k = min{f(x) | x ∈ (N_k ∩ X)}. Thus, since M_{k+1} ⊆ N_k, we have

f(x^k) ≤ f(x^{k+1})    (4.2)

and, since x^k ∈ N_k and x^{k+1} ∈ N_k, we have

f(x^k) ≤ f(λ x^k + (1 − λ) x^{k+1}),  λ ∈ [0,1].    (4.3)

(4.2) and (4.3) imply f(x̄) = f(x̄⁺) and f(x̄) ≤ f(λ x̄ + (1 − λ) x̄⁺), λ ∈ [0,1]. Since f is strictly convex, both are possible only for x̄ = x̄⁺. Consequently, if x̄ is in F(Y), the assertion follows with f(x̄) ≤ μ(Y).

There exists ȳ ∈ Y such that w.l.o.g. y^{k_j} → ȳ for j → ∞. Since g(x^{k_j}, y) ≤ g(x^{k_j}, y^{k_j}), we get g(x̄, y) ≤ g(x̄, ȳ) for y ∈ Y. Moreover we have g(x^{k_j}, y^{k_j}) + ∇_x g(x^{k_j}, y^{k_j})^T (x^{k_j+1} − x^{k_j}) ≤ 0 which, for j → ∞, implies g(x̄, ȳ) ≤ 0. □
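For a linear SIP problem with simple bounds as the enclosing set X, the core loop of such a cutting plane method can be sketched as follows. The global maximization of Step 2 is replaced by a search over a fine grid, the choice D_{k+1} := D_k ∪ {y^k} is used, and no constraints are dropped, so the sketch corresponds to the original KCG scheme rather than to Algorithm 3 with constraint dropping; names and tolerances are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

def kcg_cutting_plane(c, g, grad_g, box, Y_grid, tol=1e-8, max_iter=200):
    # Cutting plane loop for  min c^T x  s.t.  g(x, y) <= 0 for all y in Y, with g(., y) convex.
    # Cuts g(x^k, y^k) + grad_g(x^k, y^k)^T (x - x^k) <= 0 are accumulated (no constraints are
    # dropped); the global maximization of Step 2 is replaced by a search over the grid Y_grid,
    # and the enclosing polyhedron X is given by the simple bounds 'box'.
    A, b = [], []
    x = None
    for _ in range(max_iter):
        res = linprog(c, A_ub=np.array(A) if A else None,
                      b_ub=np.array(b) if b else None, bounds=box, method='highs')
        x = res.x
        vals = np.array([g(x, y) for y in Y_grid])
        j = int(np.argmax(vals))
        if vals[j] <= tol:                       # feasibility check: stop with an approximate solution
            return x
        grad = grad_g(x, Y_grid[j])
        A.append(grad)                           # cut: grad^T z <= grad^T x - g(x, y^k)
        b.append(grad @ x - vals[j])
    return x

For convex nonlinear f, the LP solver would be replaced by a convex QP or NLP solver for (4.1), and the constraint dropping via the sets N_k could be added on top of this loop.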

Likewise the Veinott supporting hyperplane method [221] can be generalized to SIP problems, and its convergence can be proved for the same constraint dropping rule as used in Algorithm 3. Provided that a Slater point x^S ∈ ℝ^n

exists and is known, such an algorithm, however, requires the computation of the unique zero of z(λ) := max_{y∈Y} g(λ x^S + (1 − λ) x^k, y) in [0,1], which is numerically very costly in SIP. Moreover, experiments for finite problems indicated that the KCG algorithm with cuts for all violated constraints (compare our choice below) is similarly efficient as the Veinott method [31].

In applications of cutting plane methods it is typically assumed that a set X containing a solution of the problem is known and given by box constraints (e.g. [118,207,208]). For many problems, however, such a set is not available. Numerical experiments have also shown that the size of X considerably influences the number of iterations needed to find a solution with prescribed accuracy (e.g. [31]). It is therefore profitable to search for a polyhedral set X ⊇ A(x^F, Y_0) which does not use a priori bounds on a solution and simultaneously encloses the level set A(x^F, Y) tightly. For the real or complex CAP problem (2.8) with F(x, ·) := v(·)^T x, such a set, fulfilling the assumptions of Theorem 4.1, is determined by the following procedure, which replaces Step 0 in Algorithm 3 [177]. Note that it suffices to find a polyhedral set X which encloses a level set for the unconstrained CAP problem since X then also surrounds the corresponding set for each constrained problem.

Step 0. (i) Let Y := Ω and choose a finite set Y_0 := Ω_0 with |Y_0| ≥ γ + 1 where γ is defined as in Theorem 2.6.

(ii) Compute a solution x^0 := (x̄^0, x_n^0) ∈ ℝ^n of the LP problem

Minimize f(x) := x_n with x := (x̄, x_n), subject to
    x̄ ∈ ℝ^{n−1}, x_n ∈ ℝ,
    ± Re{d(y) − v(y)^T x̄} − x_n ≤ 0,  y ∈ Y_0,    (4.4)
    ± Im{d(y) − v(y)^T x̄} − x_n ≤ 0,  y ∈ Y_0.

(Drop the last 2|Y_0| constraints for real problems.) If x_n^0 = 0, choose Y_1 ⊆ Y with dist(Y_1, Y) < dist(Y_0, Y), set Y_0 := Y_1, and go to the beginning of (ii).

(iii) Let a_k(x) ≤ 0, k ∈ I, denote those of the 4|Y_0| resp. 2|Y_0| constraints of problem (4.4) which are active at x^0 and let x^F ∈ F(Y). For q = 1 in (2.8), set

X := {x ∈ ℝ^n | a_k(x) ≤ 0, k ∈ I;  f(x) ≤ f(x^F)},

define p_0 := x_n^0, and use x^0. If q = 2, set

X := {x ∈ ℝ^n | a_k(x) + (1 − 1/(2 x_n^0)) x_n − x_n^0/2 ≤ 0, k ∈ I;  f(x) ≤ f(x^F)}.    (4.5)

(iv) Let M_0 := X, N_0 := ℝ^n, and k := 0. (Then x^0 solves problem (4.1) for k := 0 with minimal value p_0.) Go to Step 2 of Algorithm 3.

Remark 4.2 (1) For each choice Y_0 ⊆ Y, problem (4.4) has a solution.

(2) If d ∈ C(Y, 𝕂) does not lie in the span of v_1, ..., v_{n−1} ∈ C(Y, 𝕂) and if dist(Y_0, Y) is sufficiently small, then one has x_n^0 > 0.

(3) The constraint f(x) ≤ f(x^F) and hence x^F are not needed in practice [177].

(4) The derivation of (4.5) requires that problem (4.4) has a unique solution, which on a computer is almost always the case. If (4.4) is not uniquely solvable, then the set of active indices I has to be exchanged for the total index set, which has cardinality 4|Y_0| resp. 2|Y_0|.
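For a real CAP problem (where the imaginary-part constraints are dropped), problem (4.4) is a small LP which can be assembled directly from the values v(y) and d(y) on the initial grid Y_0. The following sketch is one possible way of doing this; the matrix and function names are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

def initial_lp_real(V, d):
    # LP (4.4) for a real CAP problem: minimize x_n subject to
    #   +/- (d(y) - v(y)^T xbar) - x_n <= 0,  y in Y_0.
    # V is the |Y_0| x (n-1) matrix with rows v(y)^T, d the vector of values d(y).
    m, n1 = V.shape
    c = np.zeros(n1 + 1)
    c[-1] = 1.0
    A_ub = np.vstack([np.hstack([-V, -np.ones((m, 1))]),     #  d - V xbar - x_n <= 0
                      np.hstack([ V, -np.ones((m, 1))])])    # -d + V xbar - x_n <= 0
    b_ub = np.concatenate([-d, d])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n1 + 1), method='highs')
    return res.x[:-1], res.x[-1]                              # (xbar^0, x_n^0)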

In order to reduce the size of problem (4.1), it is often suggested to take the constraints defining X out of the program as soon as none of them is active any more at the solution of (4.1). For the set D_{k+1} in Algorithm 3 one typically finds the choice D_{k+1} := {y^k} in the literature (e.g. [10,208] for SIP). As for exchange methods, however, the inclusion of additional points in D_{k+1} is
essential in regard to efficiency if higher-dimensional problems are to be solved (e.g. [31]). Therefore D_{k+1} should contain all violating local maximizers, as suggested for linear problems in [181], or all ε-global violating local maximizers, as used in [160].

Obviously, the "feasibility check" g(xk, yk) ::; c for some prescribed c > 0 may serve as a simple stopping criterion for Algorithm 3. A more meaningful criterion can be given, for example, in case of linear CAP problems. If especially q = 2 is chosen in problem (2.6) resp. (2.8), one has

(√η_{n−1} − √p_k)/√p_k ≤ (√x_n − √p_k)/√p_k    (4.6)

for each point (x̄, x_n) ∈ K^{n−1} × ℝ which is feasible for (2.8). In particular for K^{n−1} := ℝ^{n−1}, the vector (x̄^k, p_k + g(x^k, y^k)) is feasible, so that the relative error of √p_k with respect to the continuous approximation error √η_{n−1} is less than or equal to some prescribed ε when this is true for the right-hand side of (4.6) with x_n := p_k + g(x^k, y^k).

Convergence of Algorithm 3 is only proven if the objective function of P[Y] is (almost) strictly convex [178]. In case it is not strictly convex, we can w.l.o.g. start from a problem of type P[Y] with linear f (see Section 2.2) and reach strict convexity by adding an artificial quadratic term δ x^T x with tiny δ > 0 to f. The problems (4.1) and (4.4) then become QP problems. A proper tool for the solution of such problems is the dual algorithm of Goldfarb-Idnani-Powell [60,165], for which a code is given in [164] and implemented as routine DQPROG in [98]. (The authors of [160] altered the code in [164] in order to use the k-th solution as a starting point for the (k+1)-th problem.)

In practice, however, Algorithm 3 seems to converge in general also for linear f. If it is applied to linear problems and if X := A(x^F, Y_0) and D_{k+1} is the set of all violating local maximizers, it just becomes the well working algorithm of Roleff after the constraints defining X have been dropped (cf. Section 3.1). Note also that the authors of [156-161] have usually chosen δ := 10^{-40} in the above mentioned quadratic term (a δ > 0 is needed for DQPROG) and have always obtained highly accurate solutions with Algorithm 3 in the above specified form, where the algorithm has been applied to a large variety of complex Chebyshev and least-squares filter design problems with up to 1000 variables.

The rate of convergence of the KCG and the Veinott algorithm, where inactive constraints are dropped and only one constraint is added at each iteration, was proved for finite problems to be at least arithmetic [214]. (Arithmetic convergence is slower than linear convergence.) It is not clear what the convergence rate is if, for example, all violated constraints are included in the subproblem or,
in case of SIP, all violating local maximizers are added to D_{k+1}. The results in [90,91] and the experiences of the authors of [157,160] suggest that, in such a case, the rate is superlinear if the problem has a strongly unique solution. The latter is given for linear real CAP problems under the Haar condition [19,91].

A discretization method, which internally employs Algorithm 3 for the solution of a discretized problem P[Y_i], was developed in [177,178] and applied to large filter design problems in [159]. It can be considered as an extension of the discretization method [176] (described in Section 3.1) to the convex case. This method requires significantly more numerical effort than its semi-continuous counterpart, as comparisons in [160] show. But it is in particular capable of yielding accurate solutions when g(x̄, ·) is almost constant on Y at a solution x̄ of P[Y] [177]. See also [86] in this connection, which contains a generalization of the discretization method for linear problems from [85] to convex quadratic problems.

4.1.2 A central cutting plane method

The central cutting plane method of Elzinga and Moore [35] was generalized to linear SIP problems by Gribik [66] (see also [76]) and to convex SIP problems independently by Tichatschke and Lohse [207] and Kortanek and No [118]. The method is described as follows, where w.l.o.g. f is assumed to be linear.

Algorithm 4 Step 0. Choose κ > μ(Y), β ∈ (0,1), and a polyhedral set X ⊇ A(x^F, Y) for some x^F ∈ F(Y). Set k := 0.

Step 1. Determine a solution (x^k, σ_k) ∈ ℝ^n × ℝ of the LP problem

S_k:  Maximize σ subject to f(x) + σ ≤ κ,  x ∈ X.

Step 2. If σ_k = 0, stop! (Then x^k solves P[Y].) Otherwise, delete constraints in problem S_k according to either or both of the following rules (if requested) and call the resulting program S_k again.

Rule 1: If x^k ∈ F(Y), delete the constraint f(x) + σ ≤ κ or any constraint ever generated at Step 3 (i).

Rule 2: Delete an inactive constraint from S_k if it was generated at Step 3 (ii) of the j-th iteration for j < k and if σ_k ≤ β σ_j.

Step 3. (i) If x^k ∈ F(Y), add the constraint f(x) + σ ≤ f(x^k) to S_k.

(ii) If x^k ∉ F(Y), determine y^k ∈ Y with g(x^k, y^k) > 0 and add to S_k the constraint g(x^k, y^k) + ∇_x g(x^k, y^k)^T (x − x^k) + σ ‖∇_x g(x^k, y^k)‖ ≤ 0.

Step 4. Call the resulting program S_{k+1}, set k := k + 1, and go to Step 1.
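A bare-bones realization of Algorithm 4 for linear f(x) = c^T x and a box set X might look as follows. The most violated constraint is located by a grid search, the cut in Step 3 (ii) is taken in the standard Elzinga-Moore form with the gradient norm as normalization (an assumption here), and the constraint dropping rules 1 and 2 are omitted; all names and tolerances are illustrative.

import numpy as np
from scipy.optimize import linprog

def central_cutting_plane(c, g, grad_g, box, Y_grid, kappa, tol=1e-8, max_iter=200):
    # Central cutting plane loop for  min c^T x  s.t.  g(x, y) <= 0 (y in Y), with linear f.
    # Decision variables are (x, sigma); constraint dropping (Rules 1 and 2) is omitted.
    c = np.asarray(c, dtype=float)
    n = len(c)
    A = [np.append(c, 1.0)]                      # initial constraint c^T x + sigma <= kappa
    b = [kappa]
    obj = np.zeros(n + 1)
    obj[-1] = -1.0                               # maximize sigma
    bounds = list(box) + [(0, None)]
    x = None
    for _ in range(max_iter):
        res = linprog(obj, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds, method='highs')
        x, sigma = res.x[:-1], res.x[-1]
        if sigma <= tol:
            return x
        vals = np.array([g(x, y) for y in Y_grid])
        j = int(np.argmax(vals))
        if vals[j] > 0:                          # Step 3 (ii): central cut through a violated constraint
            gr = grad_g(x, Y_grid[j])
            A.append(np.append(gr, np.linalg.norm(gr)))
            b.append(gr @ x - vals[j])
        else:                                    # Step 3 (i): x is feasible, cut on the objective value
            A.append(np.append(c, 1.0))
            b.append(c @ x)
    return x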

By (2.12) and X ⊇ A(x^F, Y), we have inf_{x∈F(Y)} f(x) = inf_{x∈(F(Y)∩X)} f(x). Hence we can translate the results from [118,207] into the present form and obtain the following theorem.

Theorem 4.3 ([118,207]) Let f be linear and P[Y] have a Slater point. Moreover, let Assumption 2.1 be satisfied and let X in Step 1 be bounded. Then Algorithm 4 either terminates after finitely many iterations with a solution of P[Y] or the following is true:

(a) It generates an infinite sequence {x^k} such that {x^k} has an accumulation point and each such point solves P[Y].

(b) There exists a subsequence {x^{k_j}} of {x^k} with x^{k_j} ∈ F(Y), and for each such subsequence one has 0 ≤ f(x^{k_j}) − μ(Y) ≤ c (f(x^{k_{j−1}}) − μ(Y)) with some c < 1.

Algorithm 4 seems to require only the knowledge of a violated constraint rather than of the currently most violated constraint. For most problems, however, checking feasibility of x^k in Step 3 just means computing the latter. It is also not clear whether the choice of the most violated constraint improves the performance of the algorithm. The number of constraints in the subproblems here depends on the choice of β and hence cannot be controlled as easily as in Algorithm 3, where convergence is always obtained as long as the currently most violated constraint is included in the program. But, other than for Algorithm 3, strict convexity of f is not needed here for the convergence proof.

Algorithm 4 has been applied to the linear SIP formulation of geometric programs [67], to convex SIP programs including complex CAP problems [118], and to certain filter design problems [136]. The presented numerical examples, however, are low-dimensional and do not allow a comparison of the capabilities
of both Algorithms 3 and 4. Improvements of Algorithm 4 seem to be possible and were discussed in [35] for the finite case.

4.2 Other methods

In [206,209], Tichatschke and Schwartz developed methods which combine an adaptively defined discretization scheme with the feasible directions approach known from finite optimization. These methods are essentially conceptual methods since the grid widths are based on the knowledge of a Lipschitz constant of g(x^k, ·) on Y for the current iterate x^k. Convergence to a solution of the SIP problem, however, can be shown for the entire sequence of iterates over all grids.

Kaplan and Tichatschke [106-111] suggested several variants of a numerical approach to convex ill-posed SIP problems, partially in a general functional analytic framework. A definition of ill-posedness is presented in [107] and includes the situation shown by Example 2.7 for a nonlinear problem. The methods combine a penalty technique with discretization and iterative prox-regularization. Numerical experiments with these methods have not been reported.

An interior-point approach of Sonnevend [191,192] was already mentioned in Section 3.3. A logarithmic barrier interior-point method for finite convex programs, which is of particular interest for SIP but cannot be described in this limited framework, was proposed by den Hertog et al. [28]. The method combines a logarithmic barrier function approach with a cutting plane technique, which allows constraints to be added and deleted, and hence is capable of solving finely discretized convex SIP problems. The performance of the method was displayed for the ill-conditioned tangent problem from [25] with up to n = 30 variables and 10^6 discretization points. Kaliski et al. [105] modified this method and incorporated it into a "dynamic" heuristic discretization procedure for SIP problems. They also present a large number of test examples with up to n = 137 variables, which are the outcomes of implementations on sequential as well as parallel computers. (The accuracy of the results in [105], however, seems to be low, possibly due to an error in the code. Most given optimal objective function values are distinctly below those obtained by other authors but should, in contrast to the interpretation of the authors, normally be larger than these according to the claimed fineness of discretization. Compare, for instance, the "exact" solutions of Examples 4.1 and 4.2 in [176] with those in [105], which are said to relate to a grid that is 8 times finer than the finest grid used in [176].)

The method of [41] discussed in Section 3.4 can be extended to convex SIP problems with linear constraints, which is done by Fang et al. [36,40] for a quadratic and an entropic objective function f(x) := Σ_{j=1}^n x_j ln(x_j), respectively (see also [38]). The method in [40] is applied to two (quite special) problems for various n up to n = 5000. The method of [36] is also compared with the original KCG algorithm, where again the linear subproblems were solved by an interior-point method rather than by an update of the dual as usually proposed.

5 NONLINEAR PROBLEMS

Many numerical methods in SIP are founded on results from constrained finite programming, which cannot be given here in detail. The reader who is not familiar with nonlinear programming and especially with SQP type methods is advised to consult e.g. [7,8,13,47,195] in this connection. We especially make use of the Karush-Kuhn-Tucker (KKT) conditions and KKT points (cf. Definition 2.14) and distinguish between r-convergence and usual (q-)convergence in case of rate of convergence results (e.g. [142,150]). Throughout this section we assume that we have f ∈ C³(ℝ^n, ℝ) and g ∈ C^{3,3}(ℝ^n × Y, ℝ), where, for Section 5.1, g ∈ C^{3,0}(ℝ^n × Y, ℝ) suffices.

5.1 Discretization methods

In Section 2.5 the general concept of a discretization method was described, and two stability results for convex and nonlinear SIP problems, essential for such an approach, were given. In the following we discuss special discretization methods as well as methods which yield an (approximate) solution of one particular discretized SIP problem P[Y_i] and take advantage of the special nature of the constraints in such a finite optimization problem. We mention again in this connection that it is usually very inefficient to solve discretized SIP problems by general purpose finite programming algorithms.

We assume that {Y_i} is a sequence of finite subsets of Y with dist(Y_i, Y) → 0 for i → ∞. For x ∈ ℝ^n and ε ≥ 0 given, we denote the set

Y_{i,ε}(x) := {y ∈ Y_i | g(x, y) ≥ max_{ȳ∈Y_i} g(x, ȳ) − ε}
as the set of ε-global points at x with respect to Y_i. Some authors have alternatively used the set of ε-most active points

Ŷ_{i,ε}(x) := {y ∈ Y_i | g(x, y) ≥ max_{ȳ∈Y_i} max{g(x, ȳ), 0} − ε},

[62,135,154,155], which equals Y_{i,ε}(x) for each point x ∈ ℝ^n outside or on the boundary of F(Y_i). Note that in particular Y_{i,0}(x) represents the set of discrete global maximizers of g(x, ·) on the grid Y_i. We furthermore introduce the discrete neighborhood U_i(y) of y ∈ Y_i with respect to Y_i, which consists of y and neighboring points of y (cf. [63] for a precise definition). In particular, if Y is an interval or a rectangle and Y_i ⊆ Y is an equispaced grid, the neighborhood U_i(y) of an interior point y ∈ Y_i consists of y and the two resp. eight adjacent discretization points. Related to U_i(y), we moreover define the set of discrete ε-global local maximizers

Y^ℓ_{i,ε}(x) := {y ∈ Y_{i,ε}(x) | g(x, y) ≥ g(x, ȳ), ȳ ∈ U_i(y)}.    (5.1)
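For an equispaced one-dimensional grid, the sets Y_{i,ε}(x) and Y^ℓ_{i,ε}(x) can be computed by a single sweep over the grid values of g(x, ·), as in the following sketch; the function and variable names are illustrative assumptions.

import numpy as np

def discrete_maximizer_sets(g, x, Y_grid, eps):
    # Y_grid: equispaced one-dimensional numpy array, so U_i(y) consists of y and its two neighbours.
    # Returns the points of Y_{i,eps}(x) and the discrete eps-global local maximizers of (5.1).
    vals = np.array([g(x, y) for y in Y_grid])
    eps_global = vals >= vals.max() - eps                     # membership in Y_{i,eps}(x)
    local = []
    for j in np.flatnonzero(eps_global):
        left = vals[j - 1] if j > 0 else -np.inf
        right = vals[j + 1] if j < len(vals) - 1 else -np.inf
        if vals[j] >= left and vals[j] >= right:              # local maximizer condition in (5.1)
            local.append(Y_grid[j])
    return Y_grid[eps_global], np.array(local)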

Methods of feasible directions (e.g. [232]) generate feasible points and hence require the solution of an auxiliary problem to obtain a feasible starting point if such a point is not available (cf. problem (2.5)). In the seventies, Polak and coauthors developed methods for finite optimization problems which solve the auxiliary and the given problem simultaneously rather than successively and can start from an arbitrary point in ℝ^n (e.g. [155])². Such a combined approach naturally led to the introduction of the set of ε-most active points [155], which, for a feasible point, reduces to the set of ε-active constraints that is often used in connection with methods of feasible directions.

For SIP problems and ε > 0, the set of ε-(most) active points is normally an infinite set, so that a direct transcription of (combined) feasible direction methods for finite problems into SIP methods would lead to subproblems which themselves have infinitely many constraints [62,153]. For SIP problems, therefore, Polak and Mayne [153], Gonzaga et al. [62], and Panier and Tits [143] embedded such combined feasible direction methods for finite problems into a discretization scheme, where in [153] the subproblems on the i-th discretization level contain one constraint for each member of the entire set Ŷ_{i,ε}(x), while in [62] the special structure of a discretized SIP problem is exploited and constraints are needed only for the usually much smaller set of discrete "left" local maximizers in Ŷ_{i,ε}(x). In [143], further points were added to the latter set to improve the behavior of the algorithm in [62]. (The respective subproblems are LP [153] resp. QP problems [62,143].) Convergence of certain (approximate) stationary points of P[Y_i] to a related stationary point resp. KKT point of P[Y] is proven under relatively weak assumptions [62,143,153], where the inner algorithms to solve the discretized problems use only first order information and hence normally have at best a linear rate of convergence. A numerical comparison of the method in [62] with other methods can be found in [197].

²Much of the work by Polak and his coauthors to which we refer here is contained in [150].

Later, Polak and He [151] slightly improved the aforementioned method for finite problems from [155] and showed the r-linear convergence of this new method to a stationary point of the problem under proper assumptions. (The new method is also stated conceptually for SIP problems.) In this method all constraints have to be respected in the construction of the QP subproblems. A modification, in which only the ε-most active constraints have to be considered, is given in [150, p. 279] but without a rate of convergence result. Polak and He have furthermore incorporated the method from [151] into an adaptive discretization procedure such that, for certain convex problems, the entire sequence of iterates has the same r-linear rate of convergence as the inner method for the finite subproblems [152]. Finally, see [150, p. 479] for a combined discretization and exact penalty function method using first order information.

Asic and Kovacevic-Vujcic [6] suggested an unconventional discretization procedure for problems with convex constraints, which generates feasible points with respect to the SIP problem. The method requires the knowledge of a Lipschitz constant, which determines the density of the first mesh, and the provision of some "reach function", which controls the selection of discretization points in the algorithm. Furthermore, Huth and Tichatschke [96,97] proposed a hybrid method in which, in the first phase, successively refined discretized problems P[Y_i] are solved by an (at best linearly convergent) linearization method, where the subproblems contain constraints for all elements of Y_{i,ε}(x) with a certain ε ≥ 0 (cf. also Sections 5.2 and 5.3).

Some authors have developed special algorithms for the solution of finite nonlinear optimization problems with large numbers of constraints, as they occur, for example, in discretized SIP problems. The purpose of such developments is to considerably reduce the size of the arising QP subproblems and the number of gradient evaluations, compared to standard nonlinear programming methods. An SQP trust-region algorithm of this type, which uses second derivatives and incorporates constraints for all ε-most active points into the QP subproblem at the current iterate, was presented by Mine et al. [135]. In [189], moreover, Schittkowski proposed some modifications of SQP type algorithms in this respect (without convergence proofs) and gave results of numerous numerical experiments. Finally, an efficient SQP algorithm, which is applicable to finely
discretized SIP problems with linear constraints, was developed by Powell [166], who also provided a code for his algorithm [167].

Lawrence and Tits [121] suggest an SQP algorithm for the solution of P[Y_i] in which normally only few constraints enter the finite subproblems. The authors present a large number of well documented numerical examples, together with a highly reliable code contained in the software package CFSQP. A characteristic feature of this algorithm is that it needs a feasible starting point and that the iterates remain feasible with respect to P[Y_i]. As we mentioned in Section 2.5, it is not clear, however, what effect the enforcement of feasibility has on the efficiency of the method when feasible points are not readily available and a sequence of discretized SIP problems is solved on progressively refined grids. (A solution of P[Y_i] is usually not feasible for P[Y_{i+1}], e.g. when |Y_{i+1}| > |Y_i|.)

In his PhD thesis [63], the second author suggested a two-phase hybrid method for the solution of P[Y], which, in the first phase, uses an r-superlinearly convergent SQP type algorithm to solve a (finite) sequence of discretized SIP problems. Convergence of "solutions" of the discretized problems P[Y_i], i = 0,1,2,..., to a "solution" of the SIP problem, and hence theoretical convergence of the total two-phase procedure, are guaranteed (cf. Theorems 2.13 and 5.1 and consult Sections 5.2 and 5.3 for more information). In particular, the algorithm for the solution of P[Y_i] reads as follows, where, for the sake of simplicity, we delete the index "i", relating to the current grid Y_i, at all iterated values. (The algorithm is stated here in a rudimentary form. The reader is referred to [63] for more details.)

Algorithm 5 Step 0. Select α ∈ (0, ½), β ∈ (0,1), p ∈ [0,1), and ε > 0. Choose ρ_0 > 0, x^0 ∈ ℝ^n, a symmetric, positive definite matrix H_0 ∈ ℝ^{n×n}, and a subset Y_0 ⊆ Y_i with Y_0 ⊇ Y^ℓ_{i,ε}(x^0). Set k := 0.

Step 1. Compute the unique solution (d^k, ξ^k) ∈ ℝ^n × ℝ of the QP problem

Minimize ½ d^T H_k d + ∇f(x^k)^T d + ρ_k ξ
subject to g(x^k, y) + ∇_x g(x^k, y)^T d ≤ ξ,  y ∈ Y_k,    (5.2)
           ξ ≥ 0,

and associated Lagrange multipliers (λ^k, λ̄^k) ∈ ℝ^{|Y_k|} × ℝ. If ‖d^k‖ = 0, stop!

Step 3. Let ℓ ∈ ℕ_0 be the smallest number such that, for L_∞ as in (2.17), t_k := β^ℓ satisfies the Armijo-type descent condition (5.3), and set x^{k+1} := x^k + t_k d^k.

Step 4. Compute H_{k+1} by Powell's modification of the BFGS update with respect to the Hessian of the Lagrangian L_i(x, λ(Y_i)) := f(x) + Σ_{y∈Y_i} λ(y) g(x, y) of P[Y_i] [162], where λ(Y_i) := (λ(y))_{y∈Y_i}, and choose a set Y_{k+1} ⊆ Y_i such that Y_{k+1} ⊇ Y^ℓ_{i,ε}(x^{k+1}). Let k := k + 1 and go to Step 1.

For i = 0, arbitrary quantities with the required properties can be chosen to initialize the algorithm, while, for i ≥ 1 and e.g. Y_i ⊇ Y_{i−1}, the respective quantities obtained on the preceding discretization level can in general be employed completely and directly. Furthermore, one can show that, for each k ∈ ℕ_0, the quadratic problem (5.2) has a unique solution [63, p.113] and inequality (5.3) holds true for all t ∈ (0, t̄_k] with some t̄_k ∈ (0,1] [63, p.125]. Consequently, an Armijo type step-size t_k satisfying (5.3) exists. Different stopping criteria for SQP type methods are discussed e.g. in [51,63]. Thus Algorithm 5 is implementable and, according to the following theorem, convergent.
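The subproblem (5.2) is a small QP in the variables (d, ξ). The following sketch sets it up for a given working set Y_k; a general-purpose solver is used here merely for illustration, in place of a dedicated QP code, and all names are assumptions.

import numpy as np
from scipy.optimize import minimize

def qp_subproblem_52(x_k, H_k, grad_f, g, grad_g, Y_k, rho_k):
    # QP (5.2) in the variables (d, xi):
    #   min 0.5 d^T H_k d + grad_f(x_k)^T d + rho_k * xi
    #   s.t. g(x_k, y) + grad_g(x_k, y)^T d <= xi  for y in Y_k,   xi >= 0.
    n = len(x_k)
    gf = grad_f(x_k)
    def obj(z):
        d, xi = z[:n], z[n]
        return 0.5 * d @ H_k @ d + gf @ d + rho_k * xi
    cons = [{'type': 'ineq',
             'fun': (lambda z, y=y: z[n] - g(x_k, y) - grad_g(x_k, y) @ z[:n])}
            for y in Y_k]
    cons.append({'type': 'ineq', 'fun': lambda z: z[n]})      # xi >= 0
    res = minimize(obj, np.zeros(n + 1), method='SLSQP', constraints=cons)
    return res.x[:n], res.x[n]                                # (d^k, xi^k)

The multipliers required for the BFGS update in Step 4 would additionally be extracted from the solver; this is omitted here.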

Theorem 5.1 ([63]) Let Algorithm 5 not terminate after finitely many iterations, so that it generates infinite sequences {x^k}, {ρ_k}, and {H_k}. Assume that there exist positive numbers γ ∈ ℝ and Γ ∈ ℝ such that γ ‖z‖² ≤ z^T H_k z ≤ Γ ‖z‖² for all z ∈ ℝ^n and k ∈ ℕ_0,

and that there exist k_0 ∈ ℕ_0 and ρ* > 0 such that ρ_k = ρ* for all k ≥ k_0. Furthermore assume that, for each accumulation point x* of {x^k}, the discrete local maximizers of g(x*, ·) on Y_i are strict, i.e. that, for each ȳ ∈ Y^ℓ_{i,ε}(x*),

g(x*, ȳ) > g(x*, y),  y ∈ U_i(ȳ)\{ȳ}.

(i) Then each accumulation point x* of the sequence {x^k} is a stationary point of L_∞(·, ρ*, Y_i).

(ii) If a subsequence {x^{k_j}} of {x^k} converges to x* for j → ∞ and simultaneously ξ^{k_j} = 0 is valid for j ∈ ℕ_0, then x* is a KKT point of problem P[Y_i].

(iii) If there exists a compact set B ⊆ ℝ^n such that x^k ∈ B is true for all k ∈ ℕ_0 and the gradients ∇_x g(x, y), y ∈ Y_{i,0}(x), are linearly independent for all x ∈ B, then the sequence {x^k} possesses an accumulation point x* ∈ B and each such point is a KKT point of problem P[Y_i].

Bertsekas [8, Section 4.2] has proved convergence of an algorithm for general finite optimization problems which is similar to Algorithm 5 and uses the larger set Y_{i,ε}(x^k) instead of Y^ℓ_{i,ε}(x^k). Theorem 5.1 reveals that, for discretized SIP

problems, Y_{i,ε}(x^k) (or Ŷ_{i,ε}(x^k)) can be replaced by the normally much smaller

set Y^ℓ_{i,ε}(x^k). The theorem can be proved along the lines of [8] if it is first established that x^{k_j} → x* for j → ∞ implies Y_{i,0}(x*) ⊆ Y^ℓ_{i,ε}(x^{k_j}) for all sufficiently large j ∈ ℕ_0 [63]. If, in addition, the Maratos effect avoiding scheme from [132] is properly incorporated into Algorithm 5, then, under some additional assumptions usually required in this context (e.g. [13,121]), the sequence {x^k} also converges r-superlinearly to x*.

Thus, the QP subproblems to be solved in Algorithm 5 are normally rather small, since constraints have to be included only for the discrete ε-global local maximizers on Y_i (choice Y_k := Y^ℓ_{i,ε}(x^k)). The formulation of the algorithm, however, allows the incorporation of further constraints without jeopardizing global convergence. Such constraints can be selected according to a strategy suggested in [189] or any other strategy which the user considers to be advantageous. Note that, on one hand, more constraints are likely to lead to more suitable search directions and hence fewer iterations to satisfy a suitable stopping threshold, but that, on the other hand, larger QP subproblems then have to be solved at each iteration and more gradient evaluations become necessary. In the experience of the second author, a profitable choice for the overall performance of the algorithm is the one suggested in [121,231]. Results on the numerical labor, evaluated by function and gradient evaluations, can be found in [63] for a set of 37 test examples. Some of these results show that, for the solution of a finely discretized SIP problem (to obtain e.g. a starting point for a phase two method), it is more efficient to start with a coarse grid and to progressively refine the grid than to proceed to a fine grid immediately.

We finally mention that stochastic discretization procedures have also been developed which provide quasi-optimal solutions for nonlinear SIP problems [222]. The algorithm in [222] incorporates stochastic strategies to drop irrelevant and to add relevant constraints to the subproblems and is shown for some linear SIP problems to yield decent results.

5.2 Methods based on local reduction

In this subsection we employ the notations and results of Section 2.6. In particular, we let Y be defined by functions c_r ∈ C³(Y, ℝ) as in (2.19).

If problem P[x̄] in (2.20) is regular for x̄ ∈ ℝ^n, then, by Theorem 2.16 and Corollary 2.17, the sets y^ℓ(x̄) and y^g(x̄) of local resp. global maximizers at x̄ have finite cardinalities r_l(x̄) resp. r_g(x̄), and there is a neighborhood U(x̄) of x̄ such that x ∈ U(x̄) is a member of the feasible set F(Y) of the SIP problem if and only if x ∈ U(x̄) satisfies the finitely many inequality constraints

g^j(x) := g(x, y^j(x)) ≤ 0,  j = 1, ..., r_l(x̄),    (5.4)

where the y^j(x) are the local maximizers of g(x, ·) with respect to Y. (Note that, for x̄ ∉ F(Y), the set U(x̄) ∩ F(Y) may be empty.) While the functions g^j themselves usually cannot be given explicitly, the function values g^j(x) obviously are computable for x ∈ U(x̄).
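For a one-dimensional index set Y = [a, b], the values g^j(x) can be computed by locally re-maximizing g(x, ·) in a small bracket around the previously found maximizers, as in the following sketch. The bracket radius and the use of a scalar search are illustrative assumptions; for multidimensional Y one would instead apply a local Newton method to the optimality system of P[x̄], and the gradients ∇g^j(x) = ∇_x g(x, y^j(x)) (cf. Remark 2.18) are not computed here.

import numpy as np
from scipy.optimize import minimize_scalar

def reduced_constraint_values(g, x, y_prev, a, b, h=0.1):
    # Local adaptation for Y = [a, b]: each previous local maximizer in y_prev is re-maximized
    # in a small bracket of radius h, and the reduced constraint values g^j(x) = g(x, y^j(x))
    # of (5.4) are returned together with the updated maximizers.
    y_new, g_vals = [], []
    for y0 in y_prev:
        lo, hi = max(a, y0 - h), min(b, y0 + h)
        res = minimize_scalar(lambda y: -g(x, y), bounds=(lo, hi), method='bounded')
        y_new.append(res.x)
        g_vals.append(g(x, res.x))
    return np.array(y_new), np.array(g_vals)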

Methods in which the semi-infinite constraint g(x, y) ≤ 0, y ∈ Y, of the SIP problem P[Y] is locally replaced at x̄ by the finitely many constraints in (5.4) are called reduction based methods. We distinguish between locally convergent methods, where x̄ is assumed to be a KKT point of the SIP problem and the reduction to the finite problem is performed only once, and globalized methods, in which a reduction is also executed for the iterates of an, as one hopes, globally convergent iterative procedure. The advantage of such methods certainly lies in the fact that they only deal with relatively small finite programs internally. These programs are convex for linear and convex SIP problems, since, in these cases, the functions g^j are strictly convex on U(x̄) [91, p.90], and they are usually nonlinear for all other problems. The drawbacks of reduction based methods are connected with the fact that the set U(x̄) and the functions g^j are not known explicitly.

In the following we discuss locally convergent and globalized reduction based methods separately. (Several papers are referenced in both parts since they contain local as well as global convergence results.) Reduction based methods have been reviewed before in [65,88]. Indeed, we repeat here many arguments from [65,88] for the sake of completeness of this article, but we also provide additional information on such methods.

5.2.1 Locally convergent reduction based methods

Related to the reduced problem P_red[x̄] (2.23), we define the Lagrange function

",rl(x) . Lred(X, u) := f(x) + L.Jj=l UjgJ (x), (x, u) E U(x) x IRr, (x)

for x̄ ∈ ℝ^n and let J(x̄) := {j ∈ {1, ..., r_l(x̄)} | y^j ∈ y^g(x̄)}. In regard to the computation of the gradient ∇_x L_red(x, u) and the Hessian ∇²_xx L_red(x, u) we refer to Remark 2.18. Next, we make the following assumption.

Assumption 5.2 1. x* ∈ F(Y) is a KKT point of P[Y] resp. P_red[x*], i.e. there exists u* ∈ ℝ^{r_l(x*)} such that ∇_x L_red(x*, u*) = 0, u* ≥ 0, u*_j = 0, j ∉ J(x*).

2. The strict complementary slackness condition (SCS) holds at x*, i.e. one has u*_j > 0, j ∈ J(x*).

3. The linear independence constraint qualification (LICQ) is satisfied at x*, i.e. the gradients ∇_x g(x*, y), y ∈ y^g(x*), are linearly independent.

4. The strong second order sufficiency condition (SSOSC) is valid at x*, i.e. one has ξ^T ∇²_xx L_red(x*, u*) ξ > 0 for all ξ ∈ {ξ ∈ ℝ^n | ξ^T ∇g^j(x*) = 0, j ∈ J(x*)}\{0}.

5. Problem P[x*] is regular.

Assumption 5.2 implies that x* is an isolated local minimizer of P[Y] [180,233]. Some of the succeeding results can be proved under weaker conditions which are founded on results in [100,115,179] and summarized e.g. in [63,64,233], but these are quite sophisticated and less practicable than Assumption 5.2.5.

The methods regarded in this subsection start from an approximation x^0 ∈ ℝ^n of an isolated local minimizer x* ∈ F(Y) of the SIP problem for which Assumption 5.2 is believed to be fulfilled. The vector x^0 is usually obtained in phase I by one of the methods treated in the earlier sections and is assumed to be a member of U(x*), so that in particular the number r_l(x^0) of local maximizers of P[x^0] is identical with the number r_l(x*) of local maximizers of P[x*]. Consequently problem P_red[x*] can be considered instead of P[Y] (see also Section 5.3 in this regard).

The reduced problem P_red[x*] is approached in phase II by a locally convergent nonlinear programming method, where the condition x ∈ U(x*) is ignored. It
is assumed that this method converges for each x^0 ∈ U(x*) to x*, i.e. to the center of U(x*), and that hence all iterates generated by it remain in U(x*). We say in this case that the method is locally convergent. Thus w.l.o.g. it is assumed in the following that U(x*) is a subset of the convergence region of the respective method. Obviously, such a two-phase approach is only meaningful if the method employed for the second phase is more efficient than the one for the first phase.

The most efficient methods for solving finite nonlinear optimization problems to date are SQP methods (e.g. [7,13,195]). Naturally, such methods have also been adapted to solve the reduced problem P_red[x*]. A conceptual, locally convergent SQP type algorithm for this problem has the following form.

Algorithm 6 Step 0. Provide x^0 ∈ U(x*), y^j(x^0), j = 1, ..., r_l(x*), and u^0 ∈ ℝ^{r_l(x*)} as well as a symmetric matrix H_0 ∈ ℝ^{n×n}. Set k := 0.

Step 1. Compute a solution d^k ∈ ℝ^n and an associated Lagrange multiplier vector u^{k+1} ∈ ℝ^{r_l(x*)} of the QP problem

Minimize ½ d^T H_k d + ∇f(x^k)^T d
subject to g^j(x^k) + ∇g^j(x^k)^T d ≤ 0,  j = 1, ..., r_l(x*).

Step 2. If ‖d^k‖ = 0, stop! Else set x^{k+1} := x^k + d^k.

Step 3. Use the y^j(x^k) as starting values to compute the local maximizers y^j(x^{k+1}) for all j ∈ {1, ..., r_l(x*)} by an iterative method and update H_k properly.

Step 4. Set k := k + 1 and go to Step 1.

A suitable choice of u^0 is provided by a solution of the least-squares problem (5.7) below for k = 0 and r_l(x^k) = r_l(x*) [13,50,51]. Owing to Assumption 5.2.3, this problem has a unique solution if x^0 is sufficiently close to x*. We furthermore suggest generally letting H_0 := ∇²_xx L_red(x^0, u^0). The finite QP subproblems in Step 1 of the algorithm can be solved, for example, by the reliable codes in [98,138]. After termination of the algorithm, a feasibility check of the last iterate with respect to F(Y) is advisable.

The local maximizers y^j(x^{k+1}), j = 1, ..., r_l(x*), of P[x^{k+1}] in Step 3 have to be uniquely assigned to the local maximizers y^j(x^k). If this is not possible or
if Algorithm 6 does not converge, then Assumption 5.2 may not be satisfied or x^0 may not be an element of U(x*). In the latter case, a better approximation of x* has to be computed by a return to the phase I method. Also, the convergence region of Algorithm 6 can normally be enlarged when, in addition, a step-size is introduced (see the methods of the next subsection). For the special methods discussed in the following, transcriptions into such a form can be carried out straightforwardly and have also been applied in practice (e.g. [63]).

Algorithm 6 with H_k := ∇²_xx L_red(x^k, u^k), k ∈ ℕ_0, is just the original version of an SQP method by Wilson [229] stated for problem P_red[x*] and was suggested in [89,154,210,217] (see also [81,84]). For all sufficiently large k, in this case, the QP problem in Step 1 has a unique solution d^k and associated multiplier vector u^{k+1} [8, p.252]. Moreover, for that choice, the method yields the same sequence {(x^k, u^k)} with u_j = 0, j ∉ J(x*), which is obtained when Newton's method is applied to the equations g(x, y^j(x)) = 0, j ∈ J(x*), and ∇_x L_red(x, u) = 0 with u_j = 0, j ∉ J(x*), of the KKT conditions for P[Y] resp. P_red[x*] and when the local maximizers are adapted as in Step 3 of Algorithm 6 [14,87,89]. Consequently, since our assumptions guarantee the local convergence of Newton's method and since w.l.o.g. U(x*) was assumed to belong to the convergence region of the respective method, one arrives at the following theorem.

Theorem 5.3 ([14,89]) Let H_k := ∇²_xx L_red(x^k, u^k), k ∈ ℕ_0, and let Assumption 5.2 be satisfied. If Algorithm 6 does not stop after finitely many iterations, then it generates an infinite sequence {(x^k, u^k)} which converges quadratically to (x*, u*) for k → ∞.

The specification of the set J(x*) may not be possible in practice (cf. Remark 2.18), so that the solution of the QP problem in Step 1 above is to be preferred to the application of Newton's method in the aforementioned sense. The difficulty of identifying the index set of global maximizers is likewise encountered when, in the approach with Newton's method mentioned above, the local maximizers are not adapted at each iteration as in Step 3, but Newton's method is applied to the augmented system of equations which is obtained when the corresponding equations of the KKT conditions for all local maximizers of P[x^k] which are believed to belong to global maximizers of P[x*] are incorporated additionally. In that case, however, these local maximizers are updated only by a Newton step and not computed "exactly", which may destroy the superlinear rate of convergence [91].

In fact, local reduction was suggested for the first time in the latter way, namely for linear SIP problems, by Gustafson [70] (see also [54]). It was furthermore used for linearly constrained convex problems in [71,74,75]. Earlier, Wetterling [228] (see also [134]) had shown that, for linear real CAP problems under the Haar condition, Newton's method applied to such a system is locally equivalent to the second Remes algorithm, which explains its superlinear rate of convergence. Later, this approach was investigated for linear complex CAP problems by Spagl [193] (together with similar approaches) and Tang [201], for nonlinear real CAP problems by Watson [224] and Hettich [80,82], and eventually for general nonlinear SIP problems by Hettich et al. [81,89,90].

Apart from second derivatives, the outlined Newton type approaches require the solution of the system of linear equations (2.24) at x^k for the evaluation of the matrix H_k. It was therefore suggested in [63,64,77,96,141,172] to update the matrix H_k according to the quasi-Newton BFGS formula for the Hessian of the Lagrangian L_red (e.g. [7,195]), where in [63,64,77] Powell's modification [162] of this formula was used. With Powell's formula, the sequence {x^k} converges r-superlinearly to x* under Assumption 5.2 [162], while, for the original BFGS update, the sequences {(x^k, u^k)} and {x^k} generated by Algorithm 6 converge superlinearly to (x*, u*) and x* if, in addition, ξ^T ∇²_xx L_red(x*, u*) ξ > 0 is true for all ξ ∈ ℝ^n\{0} [13,14]. The latter type of convergence is also guaranteed under Assumption 5.2 if the augmented Lagrange function is employed [65,88], but for such a function an additional parameter has to be adjusted properly, which can be difficult [140]. In connection with these methods, also the analysis in [64,65,96] is of importance, which shows that the local maximizers of P[x^k], k ∈ ℕ_0, have to be computed with high accuracy so that the superlinear rate of convergence is preserved (see also [88]).
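Powell's modification of the BFGS formula, as it is usually stated, damps the difference of Lagrangian gradients towards H_k s so that positive definiteness of the update is preserved. The following sketch shows this update; the damping threshold 0.2 and the safeguards actually used in [63,64,77] are assumptions here.

import numpy as np

def powell_bfgs_update(H, s, y, damping=0.2):
    # Damped (Powell-modified) BFGS update of the Hessian approximation H of the Lagrangian:
    # s = x^{k+1} - x^k,  y = grad_x L(x^{k+1}, u^{k+1}) - grad_x L(x^k, u^{k+1}).
    # The gradient difference y is pulled towards H s whenever s^T y is too small, so that
    # the updated matrix remains positive definite.
    Hs = H @ s
    sHs = s @ Hs
    sy = s @ y
    if sy >= damping * sHs:
        theta = 1.0
    else:
        theta = (1.0 - damping) * sHs / (sHs - sy)
    r = theta * y + (1.0 - theta) * Hs
    return H - np.outer(Hs, Hs) / sHs + np.outer(r, r) / (s @ r)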

5.2.2 Globalized reduction based methods

The methods discussed in the preceding subsection necessitate that the approached KKT point x* of the SIP problem satisfies Assumption 5.2 and that the starting point x^0 lies in U(x*), i.e. is sufficiently close to x*. Then w.l.o.g. it can be assumed that the iterates of the locally convergent method used also remain in U(x*). Thus, at a solution point of the SIP problem, local reduction is a reasonable approach despite its inherent uncertainties. It has been successfully employed in numerous circumstances (see also Section 5.3).

Local reduction can also be carried out for points x^k which are not KKT points of the SIP problem and at which problem P[x^k] is regular. If P[x^k] is regular, the idea is to perform some inner iterations towards the solution of the finite
reduced problem P_red[x^k] rather than of the SIP problem directly. In view of the subsequent conceptual algorithm suggested in [65,88], we therefore need the following assumption. (Clearly, the methods discussed in the following can also be used to solve problem P_red[x*] of the preceding subsection.)

Assumption 5.4 Problem P[x^k] is regular for each x^k, k ∈ ℕ_0.

Algorithm 7 Step 0. Choose x^0 ∈ ℝ^n and set k := 0.

Step 1. Compute all local maximizers y^j(x^k), j = 1, ..., r_l(x^k), of problem P[x^k] via a global search on Y.

Step 2. Starting with x^{k,0} := x^k, apply i_k ∈ ℕ inner iterations of an algorithm solving P_red[x^k]. For each inner iterate x^{k,i}, i ∈ {1, ..., i_k − 1}, compute the local maximizers y^j(x^{k,i}) of problem P[x^{k,i}] by a local adaptation.

Step 3. Set x^{k+1} := x^{k,i_k} and k := k + 1. Go to Step 1.
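Structurally, Algorithm 7 alternates an expensive global search for all local maximizers with a few cheap inner iterations on the reduced problem, as the following sketch indicates. The three callables stand for the components discussed in the text and are placeholders, not concrete implementations.

import numpy as np

def globalized_reduction(x0, global_maximizers, inner_step, local_adaptation, outer=20, i_k=1):
    # Structural sketch of Algorithm 7: 'global_maximizers(x)' performs the global search of
    # Step 1, 'inner_step(x, ys)' performs one iteration of a method for the reduced problem,
    # and 'local_adaptation(x, ys)' recomputes the maximizers locally as in Step 2.
    x = np.array(x0, dtype=float)
    for _ in range(outer):
        ys = global_maximizers(x)                 # Step 1: expensive global search on Y
        z = x
        for i in range(i_k):                      # Step 2: i_k inner iterations
            z = inner_step(z, ys)
            if i < i_k - 1:
                ys = local_adaptation(z, ys)      # cheap local update of the maximizers
        x = z                                     # Step 3
    return x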

The difference between the computation of all local maximizers by a global search and by local adaptation in Steps 1 and 2 is crucial in practice [65,88]. The difficulties and the numerical labor involved in finding all local maximizers of P[x^k] by a global search were already addressed in Section 2.1. In contrast to that, local adaptation only requires the computation of the maximizers y^j(x^{k,i}) by a locally convergent method like Newton's method, where the vectors y^j(x^{k,i−1}) can be employed as starting points. Thus a choice i_k > 1 is desirable since it avoids the time-consuming global search for all local maximizers in every iteration.

The gain in computing time obtained by a choice i_k > 1 over the choice i_k := 1 for all k was demonstrated in [64,77], where usually at most i_k = 5 iterations had been executed. It has to be respected, however, that the inner iterates x^{k,i} move away from x^k and hence may leave U(x^k) when x^k is not a KKT point of the SIP problem. Therefore, it is theoretically not justified to fix i_k, as was done in [64], and different actions have to be considered at each inner iterate, depending on whether x^{k,i} is in U(x^k) and/or in Ū(x^k) (cf. Theorem 2.16) and whether x^{k,i} satisfies the constraints of P_red[x^k] or not. These actions can be described theoretically [65,77] but, since U(x^k) and Ū(x^k) are unknown in general, they cannot be implemented in practice without serious simplifications [77]. Moreover, for i_k > 1 (and similarly in Algorithm 6), the local maximizers
y^j(x^{k,i+1}) have to be correctly assigned to the local maximizers y^j(x^{k,i}), which, even if r_l(x^{k,i+1}) = r_l(x^{k,i}), can be a difficult task when the distance between the local maximizers at x^{k,i} is small. (See [77,198] for a quite subtle "matching process" and [184] for a path-following algorithm in this connection.) Thus, for i_k > 1, there exists a considerable gap between the conceptual method and its practical realization. Nevertheless some highly nonlinear SIP problems with up to 67 variables, arising in connection with the planning of robot trajectories, were successfully solved with such a method [77].

Similarly, if i_k := 1 for all k ∈ ℕ_0, there is no guarantee that x^{k+1} remains in the neighborhood U(x^k) of x^k for each k. But global convergence can be proved in that case when a decrease is obtained with respect to an exact penalty function for the SIP problem (see below). We therefore recommend setting i_k := 1, k ∈ ℕ_0, in Algorithm 7. In the following discussion, we assume this particular choice and hence only have to deal with iterates x^k.

Numerical methods fitting the conceptual Algorithm 7 were given in [23-25,64,77,172-174,196-198,225-227]. Such methods require the solution of a QP subproblem, providing a new search direction, and the computation of a step-size for this direction. A specific method, and hence the choice of the QP subproblem and step-size rule, can essentially be characterized by five ingredients discussed below. As in finite programming (e.g. [7,8,51,195]), various possibilities for these ingredients have been suggested. In Table 1 we specify these for the globalized reduction based methods of Coope and Watson [25], Gramlich [64], Haaren-Retagne [77], Price [172], and Tanaka, Fukushima, and Ibaraki [198]. One reason for the choice of just these methods is the fact that their performance was demonstrated by numerical results. Moreover, the algorithms in [25] and [77] can be regarded as further developments of the work in [225-227] and [64] respectively, while the algorithms in [172] and [198] are closely connected with those in [23,24,173,174] resp. [135,196,197]. (A numerical comparison of variants of the method in [198] and a method from [226] is found in [197].)

1. Use of a specific penalty resp. merit function. SQP type methods employ a penalty function. It either serves to replace the constrained problem under consideration by an unconstrained problem, in which case minimization of this penalty function is the aim of the algorithm, or it is used only as a merit function for the computation of a step-length for the search direction. One speaks of an exact penalty function when, for sufficiently large and fixed penalty parameters, there is a correspondence between the local minimizers of the optimization problem and those of the penalty function [79].


For a globalized reduction based method in SIP, the penalty function has to be globally defined for all relevant x ∈ R^n. Typically, for P_red[x], the exact L_1-penalty function

",rICz) . L1(x,p) := f(x) + L.Jj==1 Pj max {gJ(x),O} (5.5)

or the exact L_∞-penalty function

L_∞(x,p) := f(x) + p · max_{j=1,...,r_1(x)} max{g_j(x), 0}    (5.6)

have been chosen, where p_j > 0 resp. p > 0. Coope and Price [23,24,172-174] alternatively suggested use of the augmented exact L_∞-penalty function

L̃_∞(x,p,α) := L_∞(x,p) + (1/(2α)) · ( max_{j=1,...,r_1(x)} max{g_j(x), 0} )^2

with p > 0 and α > 0 and pointed out its advantages [23,174]. (The algorithm in [173] is not presented as a local reduction approach but is closely related to such an approach under our assumptions.)
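For illustration, the three penalty functions above can be evaluated as follows once the values g_j(x) = g(x, y^j(x)) at the local maximizers are available. This is only a sketch with hypothetical function names; p and alpha denote the fixed penalty parameters.

```python
import numpy as np

def l1_penalty(f_val, g_vals, p):
    """L_1 penalty (5.5): f(x) + sum_j p_j * max{g_j(x), 0}."""
    g_plus = np.maximum(np.asarray(g_vals, dtype=float), 0.0)
    return f_val + float(np.dot(np.asarray(p, dtype=float), g_plus))

def linf_penalty(f_val, g_vals, p):
    """L_inf penalty (5.6): f(x) + p * max_j max{g_j(x), 0}."""
    viol = float(np.max(np.maximum(np.asarray(g_vals, dtype=float), 0.0), initial=0.0))
    return f_val + p * viol

def augmented_linf_penalty(f_val, g_vals, p, alpha):
    """Augmented variant: L_inf(x,p) + (1/(2*alpha)) * (max_j max{g_j(x), 0})^2."""
    viol = float(np.max(np.maximum(np.asarray(g_vals, dtype=float), 0.0), initial=0.0))
    return linf_penalty(f_val, g_vals, p) + viol ** 2 / (2.0 * alpha)
```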

2. Incorporation of the penalty resp. merit function. In algorithms in which a penalty function is minimized, the penalty function is properly incorporated into the QP search direction problem, and the QP problem reflects the desire to minimize (a quadratic approximation of) this function. The penalty parameters are chosen before the search direction is computed, and the resulting direction is a direction of descent for the penalty function (e.g. [7]). If, on the other hand, the penalty function serves as a merit function for the computation of a step-size, the QP subproblem at x^k may be considered as a direct quadratic approximation of P_red[x^k] itself. The penalty parameters are then fixed after calculation of the search direction such that this is a direction of descent for the penalty function (e.g. [7]). While the QP subproblems of the first approach always have a solution, a subproblem of the second approach may have an empty feasible set. Several possibilities for dealing with the latter situation, i.e. how to set up an appropriate QP problem to compute the search direction, can be found e.g. in [15,130,146,194,195].
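For orientation, the search direction of such a merit function variant is typically obtained from a QP of the standard SQP form (our notation; the subproblems of the cited methods differ in details):

minimize over d ∈ R^n    ∇f(x^k)^T d + (1/2) d^T H_k d
subject to               g_j(x^k) + ∇g_j(x^k)^T d ≤ 0,   j = 1, ..., r_1(x^k),

where H_k denotes the Hessian or a quasi-Newton approximation of the reduced Lagrangian L_red (see the next ingredient).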

3. Use of the Hessian matrix or a quasi-Newton update. In both the penalty and the merit function approach, the Lagrangian L_red of P_red[x^k] plays a crucial role. As for Algorithm 6, either the Hessian matrix of L_red or Powell's version of the BFGS quasi-Newton update [162] with respect to L_red can be used [13]. While second derivatives may not be available or may be costly to compute, the BFGS update requires only first order information.
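A minimal sketch of Powell's damped BFGS update [162], in the form commonly stated for SQP methods, is given below; the variable names are ours, and B approximates the Hessian of L_red.

```python
import numpy as np

def powell_bfgs_update(B, s, y, damping=0.2):
    """Powell's damped BFGS update of an approximation B to the Hessian of L_red.

    s = x_new - x_old, y = grad_Lred(x_new) - grad_Lred(x_old).
    The damping keeps B positive definite even if s^T y <= 0.
    """
    Bs = B @ s
    sBs = float(s @ Bs)
    sy = float(s @ y)
    if sy >= damping * sBs:
        theta = 1.0
    else:
        theta = (1.0 - damping) * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * Bs          # damped difference vector
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / float(s @ r)
```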


Table 1 Classification of some globalized reduction based methods

#   Ingredient                        [25]   [64]   [77]   [172]     [198]
1.  Used penalty function             L_1    L_1    L_1    L̃_∞       L_∞
2.  Penalty/Merit function            Mer    Mer    Mer    Pen       Pen
3.  Powell Update/Hessian Matrix      HM     PU     PU     PU        HM
4.  Computation of multipliers        LS     QP     QP     QP        QP
5.  Computation of step-size          Ar     QI     Ar     Ar (TR)   TR

LS: Computation of Lagrange multipliers via a least-squares problem
QP: Computation of Lagrange multipliers via a QP subproblem
Ar: Use of Armijo type step-sizes
QI: Use of a quadratic interpolation technique for step-sizes
TR: Use of a trust-region technique

4. Calculation of the Lagrange multipliers. An SQP type method needs an approximation u^k of the Lagrange multipliers of a KKT point of the problem. Typically (and in the methods in Table 1), u^k is the Lagrange multiplier vector of the QP subproblem at the k-th iteration or a solution of the least-squares problem

(5.7)

Other choices are possible. An analysis concerning the computation of Lagrange multipliers in SQP type algorithms is given e.g. in [13,50,51].
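One common choice for a least-squares multiplier estimate of this kind determines u^k as a minimizer of the norm of the gradient of the Lagrangian. A minimal sketch, assuming the gradients are available as numpy arrays and ignoring possible sign constraints on the multipliers:

```python
import numpy as np

def least_squares_multipliers(grad_f, grad_g):
    """Least-squares multiplier estimate: minimize ||grad_f + G u||_2 over u,
    where the columns of G are the gradients of the (active) reduced constraints."""
    G = np.column_stack(grad_g)              # n x r_1(x^k) matrix of constraint gradients
    u, *_ = np.linalg.lstsq(G, -grad_f, rcond=None)
    return u
```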

5. Calculation of the step-size. The computation of a step-length which ensures a reduction of the value of the penalty or merit function is usually the numerically most costly component of an SIP algorithm since, for that, the penalty or merit function and hence all local maximizers of problem P[x^{k,try}] have to be computed at each trial iterate x^{k,try} for the determination of x^{k+1}. Therefore schemes are desirable which keep the total number of trial iterates of the algorithm small and simultaneously provide a sufficiently large decrease of the penalty function.

Various step-size rules are known from finite programming (e.g. [7,51]). In the methods listed in Table 1, the step was either determined by an Armijo type rule or by an interpolation technique. In this connection the numerical experiments in [63] are of interest, which show that a certain interpolation technique, presented in [29] for finite problems, is usually superior to an Armijo type rule. In connection with a step-size rule, also a scheme for avoiding the Maratos effect has to be incorporated to ensure the superlinear rate of convergence of the resulting algorithm (cf. [8,18,131,132,144]).
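To illustrate why the step-size computation dominates the cost, a bare-bones Armijo backtracking loop on a merit function might look as follows; the function names are hypothetical and no safeguard against the Maratos effect is included.

```python
def armijo_step(x, d, merit, directional_derivative, sigma=1e-4, beta=0.5, t0=1.0):
    """Backtracking Armijo rule on the penalty/merit function.

    Each evaluation of merit(x + t*d) requires all local maximizers of P[x + t*d],
    which is what makes the line search expensive in SIP.
    """
    m0 = merit(x)
    slope = directional_derivative          # predicted decrease per unit step (negative)
    t = t0
    while merit(x + t * d) > m0 + sigma * t * slope:
        t *= beta
        if t < 1e-12:                       # give up on pathologically small steps
            break
    return t
```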

Instead of the computation of a step-size, occasionally also a trust-region strategy is used, i.e. a bound on the ||·||_∞-norm of the proposed search direction is included in the QP subproblem and updated at each iteration. For details on trust-region approaches we refer e.g. to [47,135,198].

Algorithm 7 with i_k := 1, k ∈ N_0, and an SQP type iteration in Step 2 may be denoted as an SQP method for the SIP problem P[Y]. Naturally, one has tried to prove global convergence of such methods by adaptation of the corresponding proofs from finite programming. In particular for the global convergence proofs in [64,77,198], the authors refer to the finite equivalents in [47,78,163], respectively. Proofs, however, are often only outlined, and the use of the reduction principle is not always clear (cf. Remarks 2.18). Therefore verification of proofs in this connection turns out to be a quite unpleasant task. In the opinion of the authors, at least the needed assumptions have not always been stated correctly.

Assumption 5.4 allows use of the reduced problem P_red[x^k] at x^k and guarantees the continuity of the L_1-penalty function (5.5) at x^k, if this has been chosen. We moreover observe that

max_{j=1,...,r_1(x)} max{g_j(x), 0} = max_{y ∈ Y} max{g(x,y), 0}

and that hence the L_∞- and L̃_∞-penalty functions are continuous at each x ∈ R^n (thus the L_∞-function in (5.6) equals L_∞(x,p,Y) defined by (2.17)), whereas the L_1-penalty function can have discontinuities for x ∉ F(Y) (see [172,198] for examples of such discontinuities and [22,147] for continuous L_1-penalty functions). Apart from Assumption 5.4, in the opinion of the authors, it furthermore has to be assumed that a reduction in the requested sense is also possible for each accumulation point x* of {x^k} (which is not required in [64,77]). In the following let therefore P[x*] be regular at each such point x*. Certainly it would be more desirable to require regularity of problem P[x] for all x of a compact set B and to assume x^k ∈ B. But such an assumption implies that there exists a single finite problem of type (2.23) which is equivalent to the SIP problem P[Y] on the whole set B. (Note that, in this case, B can be covered by finitely many sets U(x), x ∈ B, as defined in Corollary 2.17, and that the local maximizers are uniquely defined in the intersection of two such sets.) Therefore, in consideration of Remark 2.18, reduction assumptions on a whole set [25] and on the path of iterates [64] have to be viewed critically.


Now assume that the considered globalized SQP algorithm does not terminate with a "solution" of problem P[Y] (in the sense of Section 2.5) after a finite number of iterations. Assume furthermore that the usual assumptions of an SQP method for the generated infinite sequences of iterates, matrices, and penalty parameters are satisfied which, in the finite case, guarantee the convergence of a subsequence of iterates to a "solution" of the problem (see e.g. Theorem 5.1), and that, at each iteration, a descent is ensured for the penalty or merit function used for the SIP problem. Due to possible discontinuities (if these are not excluded by strong assumptions), such a decrease is quite difficult to prove for the L_1-penalty function. Also, for the L_1-penalty function, the assumption needed for the penalty parameters seems to entail the choice p_j := p, j = 1, ..., r_1(x), in (5.5) for some p > 0, since, in our opinion, one otherwise would have to assume that the set of functions g_j and hence the number r_1(x^k) of maximizers do not change any more after finitely many iterations. An assumption of the latter type would, namely, yield only local convergence, which seems to apply to the proof in [77].

Then global convergence can be proved along the following lines, e.g. for the method in [63]. Let there exist a subsequence {x^{k_j}} of {x^k} converging to some x* ∈ R^n. By our assumptions, problem P[x*] is regular so that we have x^{k_j} ∈ U(x*) for all j ≥ j_0 with some j_0 ∈ N. Since {x^k} satisfies a step-size acceptance condition of type (5.3) and the function values of the penalty function at {x^k} are monotonically decreasing, it can be shown that {x^{k_j}} similarly satisfies such a condition. If furthermore the values of the penalty function at {x^{k_j}} converge to some number (for the L_∞-penalty function this follows from its continuity and the required boundedness of the sequence of penalty parameters), then one can enter the convergence proof of the finite case and show that x* is a "solution" of P_red[x*] and hence of P[Y] (e.g. [78,162]). Also, under stronger assumptions on x* (e.g. [145]), the entire sequence {x^k} can be proved to converge to x* and the usual results on superlinear convergence can be transferred to the present situation.

5.3 Hybrid methods and other methods

We speak of a hybrid method when several methods are combined into one method. Already Gustafson and Kortanek had suggested in their early papers [70,71,74] on linear SIP problems to connect a globally convergent method with a locally convergent, reduction based method so that the total procedure converges globally and has a local superlinear rate of convergence. Later on, Hettich and Zencke [91] recommended the combination of a robust (first order) discretization technique with a fast, locally convergent, reduction based method for the solution of nonlinear SIP problems (see also [55,193] for linear complex and [82,224] for nonlinear real CAP problems). For linear and convex SIP problems, alternatively a globally convergent semi-continuous method can be applied in the first phase (e.g. [118,136]). A two-phase method with a discretization technique and a reduction based method, however, still seems to be the most reliable and efficient approach to the solution of small and medium-sized nonlinear SIP problems.

An intrinsic difficulty of such two-phase approaches is the determination of the proper time for the switch from the first to the second phase, if the method for the second phase only converges locally, as is normally the case. A switch may be performed too early so that the obtained approximate solution does not lie in the convergence region of the second method. In this case the hybrid method should allow a return to the first phase. A return to the first phase, on the other hand, may mean a termination of the total procedure since a continuation of the first method may not be reasonable because of too high costs, as in the case of discretization methods, or because of numerical instabilities, as they may occur for some outer approximation methods. In this connection it has often been observed that already the solution of a discretized problem on a coarse grid provides a good starting point for the second phase (e.g. [63,91,172,174]). But such observations have always been made on the basis of experiments with low-dimensional problems. For higher dimensional SIP problems, whose solutions have several hundreds of maximizers (e.g. [156-161]), it may be quite difficult to guess the correct number of maximizers at a solution from the number of maximizers of a point close to it.

In practice, a switch from the first to the second phase may be tried, in the case of a semi-continuous method, if the number of local maximizers does not change any more over several iterations, and, in the case of a discretization method, if the number of the discrete local maximizers remains constant over at least two discretization levels. As an additional criterion, one may require that the norm of the gradient of the respective Lagrangian be sufficiently small. Several suggestions in this direction are made in [63] and accompanied by numerical experiments.
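Such a switching test can be stated very compactly; the following sketch is our own illustration of the two criteria just mentioned, with hypothetical inputs and thresholds.

```python
def ready_to_switch(num_maximizers_history, grad_lagrangian_norm,
                    stable_iterations=3, tol=1e-2):
    """Heuristic switch test for a two-phase (hybrid) method.

    Switch to the locally convergent reduction phase if the number of (discrete)
    local maximizers has been constant over the last few iterations/levels and
    the norm of the Lagrangian gradient is already small.
    """
    recent = num_maximizers_history[-stable_iterations:]
    stable = len(recent) == stable_iterations and len(set(recent)) == 1
    return stable and grad_lagrangian_norm <= tol
```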

In the following we briefly discuss the special hybrid methods which have been suggested for the solution of nonlinear SIP problems. In [141], Oettershagen reported on numerical experiments with a three-phase algorithm. He used the BFGS quasi-Newton method for the unconstrained problem of minimizing the augmented Lagrangian to solve a single discretized as well as a reduced SIP problem in the first and the final phase, respectively. He furthermore included an additional phase to improve the approximate solution obtained in the first phase. Global convergence of the total procedure is in general not guaranteed since only one discretized SIP problem is solved.

Polak and Tits [154,210] proposed the combination of the first order method from [62] with a locally convergent method based on the reduction approach. (Both methods were considered above.) The authors turn their special attention to the questions connected with the switches between the two phases. At the same time, Gustafson [73] suggested using a discretization approach for the first phase, which he motivated by a convergence result of the type of Corollary 2.9. (In [73] the formulation of the correct system of equations for the final phase is regarded as the second phase, so that there are altogether three phases.) Numerical results were not given in [73,154,210].

The linearly convergent algorithm from [114] for finite problems was suitably modified by Huth and Tichatschke [96,97] for the solution of progressively refined discretized SIP problems and linked with a locally convergent SQP type method based on local reduction. (See the previous subsections for both approaches.) The authors also provided some small numerical examples for the method.

Price and Coope [172,174] report on some numerical experiments which show that the solution of one relatively coarse discretized problem serves as a good initial iterate for their globally convergent SQP type method discussed in the preceding subsection. In doing so, they do not exploit the special structure of the discretized problem. The latter was recently done by Görner [63], who uses the discretization method described in Section 5.1 and combines it with a locally convergent BFGS quasi-Newton method for the reduced problem. The theoretical convergence of the total procedure is guaranteed under suitable assumptions by Theorems 2.13 and 5.1, and its efficiency is demonstrated by results for 37 test examples from the literature, which are documented in detail and hence may serve for future comparisons of numerical algorithms in the area. A clear advantage of the approach in [63] over the earlier approaches is that it uses the same superlinearly convergent SQP type method for the solution of the discretized problems and the reduced problem of the second phase.

Apart from all these methods, we are only aware of a penalty type approach for nonlinear problems, developed by Teo et al. [99,202,203]. In [203] the authors replace the semi-infinite constraint g(x,y) ≤ 0, y ∈ Y, by the equivalent, non-smooth equality constraint ∫_Y max{g(x,y), 0} dy = 0 and approximate the integrand by a smooth, ε-dependent function g_ε(·,·). Then they consider minimization of the penalty function f(x) + γ_ε ∫_Y g_ε(x,y) dy and show that the minimal values of the latter function converge to that of the SIP problem for ε → 0, if γ_ε > 0 is chosen large enough for fixed ε > 0. The penalty function is minimized by an SQP method, where the authors apply a fixed quadrature scheme for the evaluation of the integrals so that, in fact, only a discretized SIP problem is solved. An example from [62] is investigated numerically.
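To indicate how such an integral penalty may be evaluated in practice, the sketch below combines a fixed quadrature rule on Y with one standard quadratic smoothing of max{·, 0}; both the particular smoothing g_ε and the function names are our own choices and need not coincide with those used in [203].

```python
import numpy as np

def smoothed_max(t, eps):
    """Smooth, eps-dependent approximation of max{t, 0} (one typical choice)."""
    t = np.asarray(t, dtype=float)
    out = np.where(t >= eps, t, (t + eps) ** 2 / (4.0 * eps))
    return np.where(t <= -eps, 0.0, out)

def integral_penalty(f, g, x, nodes, weights, gamma, eps):
    """P(x) = f(x) + gamma * integral over Y of g_eps(x, y) dy, by quadrature.

    nodes, weights: a fixed quadrature rule on Y, so that effectively only a
    discretized SIP problem is treated (as noted in the text).
    """
    vals = np.array([g(x, y) for y in nodes])
    return f(x) + gamma * float(np.dot(weights, smoothed_max(vals, eps)))
```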

REFERENCES

[1] K. D. Andersen and E. Christiansen. Limit analysis with the dual affine scaling algorithm. J. Comp. Appl. Math., 59:233-243, 1995.

[2] E. J. Anderson. A new primal algorithm for semi-infinite linear programming. In [5], pages 108-122, 1985.

[3] E. J. Anderson and A. S. Lewis. An extension of the simplex algorithm for semi-infinite linear programming. Math. Programming, 44:247-269, 1989.

[4] E. J. Anderson and P. Nash. Linear Programming in Infinite-Dimensional Spaces. John Wiley & Sons, Chichester-New York-Brisbane-Toronto-Singapore, 1987.

[5] E. J. Anderson and A. B. Philpott, editors. Infinite Programming. Lecture Notes in Econom. and Math. Systems 259. Springer, Berlin-Heidelberg-New York-Tokyo, 1985.

[6] M. D. Asic and V. V. Kovačević-Vujčić. An interior semi-infinite programming method. J. Optim. Theory Appl., 59:353-367, 1988.

[7] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Massa­chusetts, 1995.

[8] D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Belmont, Massachusetts, 1996.

[9] L. Bittner. Das Austauschverfahren der linearen Tschebyscheff-Approximation bei nicht erfiillter Haarscher Bedingung. Z. Angew. Math. Mech., 41:238-256, 1961.

[10] J. W. Blankenship and J. E. Falk. Infinitely constrained optimization problems. J. Optim. Theory Appl., 19:261-281, 1976.

[11] H.-P. Blatt, U. Kaiser, and B. Ruffer-Beedgen. A multiple exchange algorithm in convex programming. In J.-B. Hiriart-Urruty, W. Oettli, and J. Stoer, edi­tors, Optimization: Theory and Applications, pages 113-130. Marcel Dekker, New York-Basel, 1983.

[12] E. Blum and W. Oettli. Mathematische Optimierung. Springer, Berlin, 1975.

[13] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, 1-51, 1995.


[14] P. T. Boggs, J. W. Tolle, and P. Wang. On the local convergence of quasi-Newton methods for constrained optimization. SIAM J. Control Optim., 20:161-171, 1982.

[15] J. Burke and S.-P. Han. A robust sequential quadratic programming method. Math. Programming, 43:277-303, 1989.

[16] D. Burnside and T. W. Parks. Optimal design of FIR filters with the complex Chebyshev error criteria. IEEE Trans. on Signal Processing, 43:605-616, 1995.

[17] C. Carasso and P. J. Laurent. Un algorithme de minimisation en chaine en optimisation convexe. SIAM J. Control Optim., 16:209-235, 1978.

[18] R. M. Chamberlain, M. J. D. Powell, C. Lemaréchal, and H. C. Pedersen. The watchdog technique for forcing convergence in algorithms for constrained optimization. Math. Programming Study, 16:1-17, 1982.

[19] E. W. Cheney. Introduction to Approximation Theory. Chelsea, New York, NY, 2nd edition, 1982.

[20] E. W. Cheney and A. A. Goldstein. Newton's method for convex programming and Tchebycheff approximation. Numer. Math., 1:253-268, 1959.

[21] E. Christiansen and K. O. Kortanek. Computing material collapse displacement fields on a Cray X -MP / 48 by the LP primal affine scaling algorithm. Annals Oper. Res., 22:355-376, 1990.

[22] A. R. Conn and N. I. M. Gould. An exact penalty function for semi-infinite programming. Math. Programming, 37:19-40, 1987.

[23] I. D. Coope and C. J. Price. A two parameter exact penalty function for nonlinear programming. J. Optim. Theory Appl., 83:49-61, 1994.

[24] I. D. Coope and C. J. Price. Exact penalty function methods for nonlinear semi-infinite programming. This volume.

[25] I. D. Coope and G. A. Watson. A projected Lagrangian algorithm for semi­infinite programming. Math. Programming, 32:337-356, 1985.

[26] P. J. Davis and P. Rabinowitz. Methods of Numerical Integration. Academic Press, New York, 1975.

[27] V. F. Dem'yanov and V. N. Malozemov. Introduction to Minimax. John Wiley & Sons, 1974.

[28] D. Den Hertog, J. Kaliski, C. Roos, and T. Terlaky. A logarithmic barrier cutting plane method for convex programming. Annals Oper. Res., 58:69-98, 1995.

[29] J. E. Dennis Jr. and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice Hall, Englewood Cliffs, NJ, 1983.

[30] I. I. Dikin. Iterative solution of problems of linear and quadratic programming. Soviet Math. Doklady, 8:674-675, 1967.

[31] J. J. Dinkel, W. H. Elliott, and G. A. Kochenberger. Computational aspects of cutting-plane algorithms for geometric programming problems. Math. Program­ming, 13:200-220, 1977.


[32] C. B. Dunham and J. Williams. Rate of convergence of discretization in Chebyshev approximation. Math. Comp., 37:135-139, 1981.

[33] B. C. Eaves and W. I. Zangwill. Generalized cutting plane algorithms. SIAM J. Control, 9:529-542, 1971.

[34] U. Eckhardt. Semi-infinite quadratic programming. OR Spektrum, 1:51-55, 1979.

[35] J. Elzinga and T. G. Moore. A central cutting plane algorithm for the convex programming problem. Math. Programming, 8:134-145, 1975.

[36] S.-C. Fang, C.-J. Lin, and S.-Y. Wu. On solving convex quadratic semi-infinite programming problems. Optimization, 31:107-125, 1994.

[37] S.-C. Fang and S. Puthenpura. Linear Optimization and Extensions. Prentice Hall, Englewood Cliffs, NJ, 1993.

[38] S.-C. Fang, J. R. Rajasekera, and H.-S. J. Tsao. Entropy Optimization and Mathematical Programming. Kluwer, Boston-London-Dordrecht, 1997.

[39] S.-C. Fang and H. S. J. Tsao. Linear programming with entropic perturbation. ZOR, 37: 171-186, 1993.

[40] S.-C. Fang and H.-S. J. Tsao. An efficient computational procedure for solving entropy optimization problems with infinitely many linear constraints. J. Compo Appl. Math., 72:127-139, 1996.

[41] S.-C. Fang and S.-Y. Wu. An entropic path-following approach for linear semi­infinite programming problems. In Mathematics Today Vol. XII-A, pages 1-16. 1994.

[42] S.-C. Fang and S.-Y. Wu. An inexact approach to solving linear semi-infinite programming problems. Optimization, 28:291-299, 1994.

[43] M. C. Ferris and A. B. Philpott. An interior point algorithm for semi-infinite linear programming. Math. Programming, 43:257-276, 1989.

[44] M. C. Ferris and A. B. Philpott. On affine scaling and semi-infinite programming. Math. Programming, 56:361-364, 1992.

[45] A. V. Fiacco and K. O. Kortanek, editors. Semi-Infinite Programming and Ap­plications. Lecture Notes in Econom. and Math. Systems 215. Springer, Berlin­Heidelberg-New York-Tokyo, 1983.

[46] B. Fischer and J. Modersitzki. An algorithm for complex linear approximation based on semi-infinite programming. Numerical Algorithms, 5:287-297, 1993.

[47] R. Fletcher. Practical Methods of Optimization, volume 2, Constrained Opti­mization. John Wiley & Sons, Chichester-New York-Brisbane-Toronto, 1981.

[48] J. Fiilop. A semi-infinite programming method for approximating load duration curves by polynomials. Computing, 49:201-212, 1992.

[49] K. Georg and R. Hettich. On the numerical stability of the simplex algorithm: The package LINOP. Technical report, Universitiit Trier, Trier, Germany, 1985.

[50] P. E. Gill and W. Murray. The computation of Lagrange multiplier estimates for constrained optimization. Math. Programming, 17:32-60, 1979.


[51J P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, New York, 1981.

[52J D. R. Gimlin, R. K. Cavin, and M. C. Budge. A multiple exchange algorithm for calculation of best restricted approximations. SIAM J. Numer. Anal., 11:219-231, 1974.

[53J K. Glashoff and s.-A. Gustafson. Einfuhrung in die lineare Optimierung. Wis­senschaftliche Buchgesellschaft, Darmstadt, 1978.

[54J K. Glashoff and s.-A. Gustafson. Linear Optimization and Approximation. Springer, New York-Heidelberg-Berlin, 1983.

[55J K. Glashoff and K. Roleff. A new method for Chebyshev approximation of complex-valued functions. Math. Comp., 36:233-239, 1981.

[56J M. A. Goberna and V. Jornet. Geometric fundamentals of the simplex method in semi-infinite programming. OR Spektrum, 10:145-152, 1988.

[57J M. A. Goberna and M. A. Lopez. Reduction and discrete approximation in linear semi-infinite programming. Optimization, 18:643-658, 1987.

[58J M. A. Goberna and M. A. Lopez. Optimal value function in semi-infinite pro­gramming. J. Optim. Theory Appl., 59:261-279, 1988.

[59J M. A. Goberna and M. A. Lopez. Linear Semi-Infinite Optimization. John Wiley & Sons, Chichester-New York-Brisbane-Toronto-Singapore, 1998.

[60J D. Goldfarb and A. Idnani. A numerically stable dual method for solving strictly convex quadratic programs. Math. Programming, 27:1-33, 1983.

[61J C. Gonzaga and E. Polak. On constraint dropping schemes and optimality func­tions for a class of outer approximations algorithms. SIAM J. Control Optim., 17:477-493, 1979.

[62J C. Gonzaga, E. Polak, and R. Trahan. An improved algorithm for optimization problems with functional inequality constraints. IEEE Trans. Automat. Contr., AC-25:49-54, 1980.

[63] S. Görner. Ein Hybridverfahren zur Lösung nichtlinearer semi-infiniter Optimierungsprobleme. PhD thesis, Technische Universität Berlin, Berlin, Germany, 1997.

[64] G. Gramlich. SQP-Methoden für semiinfinite Optimierungsprobleme. PhD thesis, Universität Trier, Trier, Germany, 1990.

[65J G. Gramlich, R. Hettich, and E. W. Sachs. Local convergence of SQP-methods in semi-infinite programming. SIAM J. Optim., 5:641-658, 1995.

[66J P. R. Gribik. A central-cutting plane algorithm for semi-infinite programming problems. In [83], pages 66-82, 1979.

[67J P. R. Gribik and D. N. Lee. A comparison of two central-cutting-plane algorithms for prototype geometric programming problems. In W. Oettli and F. Steffens, edi­tors, Methods of Operations Research 31, pages 275-287. Anton Hain, Mannheim, Germany, 1978.


[68J R. Grigorieff and R. Reemtsen. Discrete approximations of minimization prob­lems. I. Theory. Numer. Funct. Anal. Optim., 11:701-719, 1990.

[69J R. Grigorieff and R. Reemtsen. Discrete approximations of minimization prob­lems. II. Applications. Numer. Funct. Anal. Optim., 11:721-761, 1990.

[70J s.-A. Gustafson. On the computational solution of a class of generalized moment problems. SIAM J. Numer. Anal., 7:343-357, 1970.

[71J s.-A. Gustafson. Nonlinear systems in semi-infinite programming. In G. B. Byrne and C. A. Hall, editors, Numerical Solution of Nonlinear Algebraic Systems, pages 63-99. Academic Press, London-New York, 1973.

[72J s.-A. Gustafson. On numerical analysis in semi-infinite programming. In [83J, pages 51-65, 1979.

[73J s.-A. Gustafson. A three-phase algorithm for semi-infinite programs. In [45J, pages 138-157, 1983.

[74J s.-A. Gustafson and K. Kortanek. Numerical treatment of a class of semi-infinite programming problems. Nav. Res. Log. Quart., 20:477-504, 1973.

[75J s.-A. Gustafson and K. O. Kortanek. Numerical solution of a class of convex programs. Meth. Oper. Res., 16:138-149, 1973.

[76J s.-A. Gustafson and K. O. Kortanek. Semi-infinite programming and applica­tions. In A. Bachem, M. Grotschel, and B. Korte, editors, Mathematical Pro­gramming. The State of the Art, pages 132-157. Springer, Berlin-Heidelberg-New York,1982.

[77] E. Haaren-Retagne. A Semi-Infinite Programming Algorithm for Robot Trajectory Planning. PhD thesis, Universität Trier, Trier, Germany, 1992.

[78J S.-P. Han. A globally convergent method for nonlinear programming. J. Optim. Theory Appl., 22:297-309, 1977.

[79J S.-P. Han and O. L. Mangasarian. Exact penalty functions in nonlinear pro­gramming. Math. Programming, 17:251-269, 1979.

[80J R. Hettich. A Newton method for nonlinear Chebyshev approximation. In R. Sch­aback and K. Scherer, editors, Approximation Theory, pages 222-236. Springer, Berlin-Heidelberg-New York, 1976.

[81J R. Hettich. A comparison of some numerical methods for semi-infinite program­ming. In [83], pages 112-125, 1979.

[82J R. Hettich. Numerical methods for nonlinear Chebyshev approximation. In G. Meinardus, editor, Approximation in Theorie und Praxis, pages 139-156. B.I.­Wissenschaftsverlag, Mannheim-Wien-Ziirich, 1979.

[83] R. Hettich, editor. Semi-Infinite Programming. Lecture Notes in Contr. and Inform. Sci. 15. Springer, Berlin-Heidelberg-New York, 1979.

[84] R. Hettich. A review of numerical methods for semi-infinite optimization. In [45], pages 158-178, 1983.


[85] R. Hettich. An implementation of a discretization method for semi-infinite pro­gramming. Math. Programming, 34:354-361, 1986.

[86] R. Hettich and G. Gramlich. A note on an implementation of a method for quadratic semi-infinite programming. Math. Programming, 46:249-254, 1990.

[87] R. Hettich and H. Jongen. Semi-infinite programming: conditions of optimality and applications. Springer Lecture Notes in Control and Information Science, 7:1-11, 1978.

[88) R. Hettich and K O. Kortanek. Semi-infinite programming: theory, methods, and applications. SIAM Review, 35:380-429, 1993.

[89) R. Hettich and W. van Honstede. On quadratically convergent methods for semi-infinite programming. In [83), pages 97-111, 1979.

[90) R. Hettich and P. Zencke. Superlinear konvergente Verfahren fUr semi-infinite Optimierungsprobleme im stark eindeutigen Fall. In L. Collatz, G. Meinardus, and W. Wetterling, editors, Konstruktive Methoden der finiten nichtlinearen Op­timierung, pages 103-120. Birkhiiuser, Basel-Stuttgart, 1980.

[91) R. Hettich and P. Zencke. Numerische Methoden der Approximation und semi­infiniten Optimierung. Teubner, Stuttgart, 1982.

[92) K-H. Hoffmann and A. Klostermair. A semi-infinite linear programming proce­dure and applications to approximation problems in optimal control. In G. G. Lorentz et al., editors, Approximation Theory II, pages 379-389. Academic Press, New York-San Francisco-London, 1976.

[93) R. Horst. Deterministic methods in constrained global optimization: Some recent advances and new fields of application. Nav. Res. Log., 37:433-471, 1990.

[94) H. Hu. A one-phase algorithm for semi-infinite linear programming. Math. Programming, 46:85-103, 1990.

[95) H. Hu. A globally convergent method for semi-infinite linear programming. J. Global Optim., 8:189-199, 1996.

[96] M. Huth. Superlinear konvergente Verfahren zur Lösung semi-infiniter Optimierungsaufgaben - Eine Hybridmethode. Dissertation (A), Pädagogische Hochschule Halle "N. K. Krupskaja", Halle, DDR, 1987.

[97) M. Huth and R. Tichatschke. A hybrid method for semi-infinite programming problems. In U. Rieder et al., editors, Methods of Operations Research 62, pages 79-90. Anton Hain, Frankfurt, 1990.

[98) IMSL Math/Library. IMSL Inc., Houston, TX, 1989.

[99] L. S. Jennings and K L. Teo. A computational algorithm for functional inequality constrained optimization problems. Automatica, 26:371-375, 1990.

(100) K Jittorntrum. Solution point differentiability without strict complementarity in nonlinear programming. Math. Programming, 21:127-138, 1984.

(101) H. T. Jongen, P. Jonker, and F. Twilt. Critical sets in parametric optimization. Math. Programming, 34:333-353, 1986.


[102] H. T. Jongen, P. Jonker, and F. Twilt. One-parameter families of optimization problems: Equality constraints. J. Optim. Theory Appl., 48:141-161, 1986.

[103] D. B. Judin and E. G. Golstein. Lineare Optimierung 1. Akademieverlag, Berlin, 1968.

[104] U. Jurgens. Zur Konvergenz semi-infiniter Mehrfachaustauschalgorithmen. PhD thesis, Universitat Hamburg, Hamburg, Germany, 1986.

[105] J. Kaliski, D. Haglin, C. Roos, and T. Terlaky. Logarithmic barrier decom­position methods for semi-infinite programming. Intern. Trans. Oper. Res. (in print).

[106] A. Kaplan and R. Tichatschke. Adaptive methods of solving ill-posed semi­infinite convex optimization problems. Soviet Math. Doklady, 45:119-123, 1992.

[107] A. Kaplan and R. Tichatschke. A regularized penalty method for solving convex semi-infinite programs. Optimization, 26:215-228, 1992.

[108] A. Kaplan and R. Tichatschke. Variational inequalities and convex semi-infinite programming problems. Optimization, 26:187-214, 1992.

[109] A. Kaplan and R. Tichatschke. Iterative processes for solving incorrect convex variational problems. J. Global Optim., 3:243-255, 1993.

[110] A. Kaplan and R. Tichatschke. Regularized penalty methods for semi-infinite programming problems. In B. Brosowski, F. Deutsch, and J. Guddat, editors, Approximation fj Optimization, pages 341-356. Peter Lang, Frankfurt, 1993.

[111] A. Kaplan and R. Tichatschke. Stable Methods for Ill-Posed Variational Prob­lems. Akademie Verlag, Berlin, 1994.

[112] N. Karmarkar. A new polynomial-time algorithm for linear programming. Com­binatorica, 4:373-395, 1984.

[113] J. E. Kelley. The cutting-plane method for solving convex programs. J. Soc. Industr. Appl. Math., 8:703-712, 1960.

[114] H. Kleinmichel. Überlinear konvergente Verfahren der nichtlinearen Optimierung. In Proceedings X. IKM, Wiss. Z. der Hochsch. f. Arch. und Bauw., pages 73-76, Weimar, 1984.

[115] M. Kojima. Strongly stable stationary solutions in nonlinear programs. In S. M. Robinson, editor, Analysis and Computation of Fixed Points, pages 93-138. Academic Press, New York, 1980.

[116] M. Kojima, N. Meggido, and S. Mizuno. A primal-dual infeasible-interior-point algorithm for linear programming. Math. Programming, 61:263-280, 1991.

[117] K. O. Kortanek. Vector-supercomputer experiments with the primal affine linear programming scaling algorithm. SIAM J. Sci. Comput., 14:279-294, 1993.

[118] K. O. Kortanek and H. No. A central cutting plane algorithm for convex semi­infinite programming problems. SIAM J. Optim., 3:901-918, 1993.

[119] H.-C. Lai and S.-y' Wu. On linear semi-infinite programming problems: An algorithm. Numer. Funct. Anal. Optim., 13:287-304, 1992.


[120] P. J. Laurent and C. Carasso. An algorithm of successive minimization in convex programming. R.A.I.R.O. Numer. Anal., 12:377-400, 1978.

[121] C. T. Lawrence and A. L. Tits. Feasible sequential quadratic programming for finely discretized problems from SIP. This volume.

[122] T. Leon and E. Vercher. An optimality test for semi-infinite linear programming. Optimization, 26:51-60, 1992.

[123] T. Leon and E. Vercher. A purification algorithm for semi-infinite programming. Europ. J. Oper. Res., 57:412-420, 1992.

[124] T. Leon and E. Vercher. New descent rules for solving the linear semi-infinite programming problem. Oper. Res. Letters, 15:105-114, 1994.

[125] V. L. Levin. Application of E. Helly's theorem to convex programming, prob­lems of best approximation and related questions. Math. USSR Sbornik, 8:235-247, 1969.

[126] A. S. Lewis and A. B. Philpott. Experiments with affine scaling and semi-infinite programming. New Zealand J. Math., 24:49-71, 1995.

[127] C.-J. Lin, S.-C. Fang, and S.-y' Wu. An unconstrained convex programming approach for linear semi-infinite programming. SIAM J. Optim. (in print).

[128] C.-J. Lin, S.-C. Fang, and S.-Y. Wu. A dual affine scaling based algorithm for solving linear semi-infinite programming problems. In D.-Z. Du and J. Sun, editors, Advances in Optimization and Approximation, pages 217-233. Kluwer, Dordrecht-Boston-London, 1994.

[129] C.-J. Lin, E. K. Yang, S.-C. Fang, and S.-Y. Wu. Implementation of an inexact approach to solving linear semi-infinite programming problems. J. Compo Appl. Math., 61:87-103, 1995.

[130] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, Read­ing, Massachusetts, 2nd edition, 1989.

[131] N. Maratos. Exact Penalty Function Algorithms for Finite Dimensional and Control Optimization Problems. PhD thesis, Imperial College Sci. Tech, Univer­sity of London, London, 1978.

[132] D. Q. Mayne and E. Polak. A superlinearly convergent algorithm for constrained optimization problems. Math. Programming Study, 16:45-61, 1982.

[133] D. Q. Mayne, E. Polak, and R. Trahan. An outer approximations algorithm for computer-aided design problems. J. Optim. Theory Appl., 28:331-352, 1979.

[134) G. Meinardus. Approximation of Functions: Theory and Numerical Methods. Springer, Berlin-Heidelberg-New York, 1967.

[135] H. Mine, M. Fukushima, and Y. Tanaka. On the use of ε-most-active constraints in an exact penalty function method for nonlinear optimization. IEEE Trans. Automat. Contr., AC-29:1040-1042, 1984.

[136] P. Moulin, M. Anitescu, K. O. Kortanek, and F. Potra. The role of linear semi-infinite programming in signal-adapted QMF bank design. IEEE 7rans. on Signal Process., 45:2160-2174, 1997.


[137J N. Miiller and M. Ries. Parallel computing using DECnet. Technical Report 32, Universitiit Trier, Trier, Germany, 1991.

[138J NAG Ltd, Oxford, England UK. NAG Fortran Library, Mark 16, 1993.

[139J P. Nash. Algebraic fundamentals of linear programming. In [5], pages 37-52, 1985.

[140J J. Nocedal and M. Overton. Projected Hessian updating algorithms for nonlin­early constrained optimization. SIAM J. Numer. Anal., 22:821-850, 1985.

[141] K. Oettershagen. Ein superlinear konvergenter Algorithmus zur Lösung semi-infiniter Optimierungsprobleme. PhD thesis, Universität Bonn, Bonn, Germany, 1982.

[142J J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, 1970.

[143J E. R. Panier and A. L. Tits. A globally convergent algorithm with adaptively re­fined discretization for semi-infinite optimization problems arising in engineering design. IEEE Trans. Automat. Contr., 34:903-908, 1989.

[144J E. R. Panier and A. L. Tits. Avoiding the Maratos effect by means of a non­monotone line search 1. SIAM J. Numer. Analysis, 28:1183-1195, 1991.

[145J E. R. Panier, A. L. Tits, and J. N. Herskovits. A QP-free globally convergent, lo­cally superlinearly convergent algorithm for inequality constrained optimization. SIAM J. Control Optim., 26:788-811, 1988.

[146J E. R. Pantoja and D. Q. Mayne. Exact penalty function algorithm with simple updating of the penalty parameter. J. Optim. Theory Appl., 69:441-467, 1991.

[147] T. Pietrzykowski. The potential method for conditional maxima in the locally compact metric spaces. Numer. Math., 14:325-329, 1970.

[148] E. Polak. On the mathematical foundations of nondifferentiable optimization in engineering design. SIAM Review, 29:21-89, 1987.

[149] E. Polak. On the use of consistent approximations in the solution of semi-infinite optimization and optimal control problems. Math. Programming, 62:385-414, 1993.

[150J E. Polak. Optimization. Algorithms and Consistent Approximations. Springer, Berlin-Heidelberg-New York, 1997.

[151J E. Polak and L. He. Unified steerable phase I-phase II method of feasible directions for semi-infinite optimization. J. Optim. Theory Appl., 69:83-107, 1991.

[152J E. Polak and L. He. Rate-preserving discretization strategies for semi-infinite programming and optimal control. SIAM J. Control Optim., 30:548-572, 1992.

[153] E. Polak and D. Q. Mayne. An algorithm for optimization problems with functional inequality constraints. IEEE Trans. Automat. Contr., AC-21:184-193, 1976.


[154] E. Polak and A. L. Tits. A recursive quadratic programming algorithm for semi-infinite optimization problems. Appl. Math. Optim., 8:325-349, 1982.

[155] E. Polak, R. Trahan, and D. Q. Mayne. Combined phase I-phase II methods of feasible directions. Math. Programming, 17:61-73, 1979.

[156] A. Potchinkov. Der Entwurf digitaler FIR-Filter mit Methoden der konvexen semi-infiniten Optimierung. PhD thesis, Technische Universitat Berlin, Berlin, Germany, 1994.

[157] A. Potchinkov. Design of optimal linear phase FIR filters by a semi-infinite programming technique. Signal Processing, 58:165-180, 1997.

[158] A. Potchinkov and R. Reemtsen. A globally most violated cutting plane method for complex minimax problems with application to digital filter design. Numerical Algorithms, 5:611-620, 1993.

[159] A. Potchinkov and R. Reemtsen. FIR filter design in the complex do­main by a semi-infinite programming technique. Archiv fur Elektronik und Ubertragungstechnik, 48:1. The method: 135-144, II. Numerical results: 200-209, 1994.

[160] A. Potchinkov and R. Reemtsen. The design of FIR filters in the complex plane by convex optimization. Signal Processing, 46:127-146, 1995.

[161] A. Potchinkov and R. Reemtsen. The simultaneous approximation of magnitude and phase by FIR digital filters. Intern. J. Circuit Theory and Appl., 25:1. A new approach: 167-177, II. Methods and examples: 179-197, 1997.

[162] M. J. D. Powell. The convergence of variable metric methods for nonlinearly constrained optimization calculations. In O. L. Mangasarian, R. R. Meyer, and S. M. Robinson, editors, Nonlinear Programming 3, pages 27-63. Academic Press, New York, 1978.

[163] M. J. D. Powell. Variable metric methods for constrained optimization. In A. Bachem, M. Grotschel, and B. Korte, editors, Mathematical Programming: The state of the art, pages 288-311. Springer, Berlin-Heidelberg-New York, 1983.

[164] M. J. D. Powell. ZQPCVX a Fortran subroutine for convex quadratic program­ming. Technical Report DAMTP /1983/NA17, Dept. of Appl. Math. and Theor. Phys., Univ. of Cambridge, Cambridge, UK, 1983.

[165] M. J. D. Powell. On the quadratic programming algorithm of Goldfarb and Idnani. Math. Programming Study, 25:46-61, 1985.

[166] M. J. D. Powell. A tolerant algorithm for linearly constrained optimization calculations. Math. Programming, 45:547-566, 1989.

[167] M. J. D. Powell. TOLMIN: a Fortran package for linearly constrained opti" mization calculations. Technical Report DAMTP 1989/NA2, University of Cam­bridge, Cambridge, UK, 1989.

[168] M. J. D. Powell. Karmarkar's algorithm: A view from nonlinear programming. IMA Bulletin, 26:165-181, 1990.


[169] M. J. D. Powell. The complexity of Karmarkar's algorithm for linear program­ming. In D. F. Griffiths and G. A. Watson, editors, Numerical Analysis 1991, pages 142-163. Longman Scientific & Technical, Burnt Mill, England, 1992.

[170] M. J. D. Powell. Log barrier methods for semi-infinite programming calcula­tions. In E. A. Lipitakis, editor, Advances on Computer Mathematics and its Applications, pages 1-21. World Scientific, Singapore, 1993.

[171] M. J. D. Powell. On the number of iterations of Karmarkar's algorithm for linear programming. Math. Programming, 62:153-197, 1993.

[172] C. J. Price. Non-Linear Semi-Infinite Programming. PhD thesis, University of Canterbury, Christchurch, New Zealand, 1992.

[173] C. J. Price and I. D. Coope. An exact penalty function algorithm for semi­infinite programmes. BIT, 30:723-734, 1990.

[174] C. J. Price and I. D. Coope. Numerical experiments in semi-infinite program­ming. Compo Optim .. Appl., 6:169-189, 1996.

[175] R. Reemtsen. Modifications of the first Remez algorithm. SIAM J. Numer. Anal., 27:507-518, 1990.

[176] R. Reemtsen. Discretization methods for the solution of semi-infinite program­ming problems. J. Optim. Theory Appl., 71:85-103, 1991.

[177] R. Reemtsen. A cutting plane method for solving minimax problems in the complex plane. Numerical Algorithms, 2:409-436, 1992.

[178] R. Reemtsen. Some outer approximation methods for semi-infinite optimization problems. J. Compo Appl. Math., 53:87-108, 1994.

[179] S. M. Robinson. Strongly regular generalized equations. Math. Oper. Res., 5:43-62, 1980.

[180] S. M. Robinson. Generalized equations and their solutions, part II: applications to nonlinear programming. Math. Programming Study, 19:200-221, 1982.

[181] K. Roleff. A stable multiple exchange algorithm for linear SIP. In [83], pages 83-96, 1979.

[182] C. Roos, T. Terlaky, and J.-P. Vial. Theory and Algorithms for Linear Opti­mization. John Wiley & Sons, Chichester, 1997.

[183] H. Rudolph. Der Simplexalgorithmus der semiinfiniten linearen Optimierung. Wiss. Z. TH Leuna-Merseburg, 29:782-806, 1987.

[184] T. Rupp. Kontinuitiitsmethoden zur Losung einparametrischer semi-infiniter Optimierungsprobleme. PhD thesis, Universitat Trier, Trier, Germany, 1988.

[185] R. Schaback and D. Braess. Eine Losungsmethode fiir die lineare Tschebyscheff­Approximation bei nicht erfiillter Haarscher Bedingung. Computing, 6:289-294, 1970.

[186] E. Schiifer. Ein Konstruktionsverfahren bei allgemeiner linearer Approximation. Numer. Math., 18:113-126, 1971.


[187] U. Schiittler. An Interior-Point-Method for Semi-Infinite Programming Prob­lems. PhD thesis, Universitiit Wiirzburg, Wiirzburg, Germany, 1992.

[188] U. Schiittler. An interior-point method for semi-infinite programming problems. Ann. Oper. Res., 62:277-301, 1996.

[189] K. Schittkowski. Solving nonlinear programming problems with very many constraints. Optimization, 25:179-196, 1992.

[190] R.-L. Sheu, S.-Y. Wu, and S.-C. Fang. A primal-dual infeasible-interior-point algorithm for linear semi-infinite programming. Computers Math. Applic., 29:7-18, 1995.

[191] G. Sonnevend. Applications of analytic centers for the numerical solution of semiinfinite, convex programs arising in control theory. In H.-J. Sebastian and K. Tammer, editors, System Modelling and Optimization, Lecture Notes in Contr. and Inform. Sci. 143, pages 413-422. Springer, Berlin-Heidelberg-New York, 1990.

[192] G. Sonnevend. A new class of a high order interior point method for the solu­tion of convex semiinfinite optimization problems. In R. Bulirsch and D. Kraft, editors, Computational and Optimal Control, pages 193-211. Birkhiiuser, Basel, 1994.

[193] C. Spagl. Charakterisierung und Numerik in der linearen komplexen Tschebyscheff-Approximation. PhD thesis, Univ. Eichstiitt, Eichstiitt, Germany, 1988.

[194] P. Spellucci. Sequential quadratic programming: theory, implementation, prob­lems. Meth. Oper. Res., 53:183-213, 1985.

[195] P. Spellucci. Numerische Verfahren der nichtlinearen Optimierung. Birkhiiuser, Basel-Boston-Berlin, 1993.

[196] Y. Tanaka, M. Fukushima, and T. Hasegawa. Implementable Loo penalty function-method for semi-infinite optimization. Int. J. Systems Sci., 18:1563-1568, 1987.

[197] Y. Tanaka, M. Fukushima, and T. Ibaraki. A comparative study of several semi-infinite nonlinear programming algorithms. Europ. J. Oper. Res., 36:92-100, 1988;

[198] Y. Tanaka, M. Fukushima, and T. Ibaraki. A globally convergent SQP method for semi-infinite nonlinear optimization. J. Compo Appl. Math., 23:141-153,1988.

[199] P. T. P. Tang. Chebyshev Approximation on the Complex Plane. PhD thesis, University of California, BerkeleY',CA, 1987.

[200] P. T. P. Tang. A fast algorithm for linear complex Chebyshev approximation. Math. Comp., 51:721-739, 1988.

[201] P. T. P. Tang. A fast algorithm for linear complex Chebyshev approximation. III J. C. Mason and M. G. Cox, editors, Algorithms for Approximation II, pages 265-273. Chapman and Hill, London-New York, 1990.

[202] K. L. Teo and C. J. Goh. A simple computational procedure for optimization problems with functional inequality constraints. IEEE Trans. Automat. Contr., AC-32:940-941, 1987.


[203] K. L. Teo, V. Rehbock, and L. S. Jennings. A new computational algorithm for functional inequality constrained optimization problems. Automatica, 29:789-792, 1993.

[204] T. Terlaky, editor. Interior Point Methods of Mathematical Programming. Kluwer, Dordrecht-Boston-London, 1996.

[205] R. Tichatschke. Stetigkeitseigenschaften und Konvergenz von Folgen diskretisierter semi-infiniter konvexer Optimierungsaufgaben. Wiss. Z. TH Karl­Marx-Stadt, 21:577-586, 1979.

[206] R. Tichatschke. Semi-infinite programming problems. Banach Center Publ., 14:543-554, 1985.

[207] R. Tichatschke and T. Lohse. Eine verallgemeinerte Schnittmethode fur konvexe semi-infinite Optimierungsaufgaben. Wiss. Z. TH Karl-Marx-Stadt, 24:332-338, 1982.

[208] R. Tichatschke and V. Nebeling. A cutting plane method for quadratic semi­infinite programming problems. Optimization, 19:803-817, 1988.

[209] R. Tichatschke and B. Schwartz. Methods of feasible directions for semi-infinite programming problems. Wiss. Inform. TH Karl-Marx-Stadt, 33:Part I: 1-15, Part II: 16-23, 1982.

[210] A. L. Tits. Lagrangian Based Superlinearly Convergent Algorithms for Ordinary and Semi-Infinite Optimization Problems. PhD thesis, University of California, Berkeley, CA, 1980.

[211] M. J. Todd. Interior-point algorithms for semi-infinite programming. Math. Programming, 65:217-245, 1994.

[212] H.-J. T6pfer. Tschebyscheff-Approximation und Austauschverfahren bei nicht erfiillter Haarscher Bedingung. In L. Collatz, G. Meinardus, and H. Unger, edi­tors, Funktionalanalysis, Approximationstheorie, Numerische Mathematik, pages 71-89. Birkhiiuser, Basel-Stuttgart, 1967.

[213] D. M. Topkis. Cutting-plane methods without nested constraint sets. Oper. Res., 18:404-413, 1970.

[214] D. M. Topkis. A note on cutting-plane methods without nested constraint sets. Oper. Res., 18:1216-1220, 1970.

[215] L. N. Trefethen. Near-circularity of the error curve in complex Chebyshev approximation. J. Approx. Theory, 31:344-367, 1981.

[216] L. Tuncel and M. J. Todd. Asymptotic behavior of interior point methods: A view from semi-infinite programming. Math. Oper. Res., 21:354-381, 1996.

[217] W. van Honstede. An approximation method for semi-infinite problems. In [83], pages 126-136,1979.

[218] R. J. Vanderbei. Affine scaling trajectories associated with a linear semi-infinite program. Math. Oper. Res., 20:163-174, 1995.

[219] R. J. Vanderbei. Linear Programming: Foundations and Extensions. Kluwer, Dordrecht-Boston-London, 1997.


[220] L. Veidinger. On the numerical determination of the best approximations in the Chebyshev sense. Numer. Math., 2:99-105, 1960.

[221] A. F. Veinott. The supporting hyperplane method for unimodal programming. Oper. Res., 15:147-152, 1967.

[222] Y. V. Volkov and S. K. Zavriev. A general stochastic outer approximations method. SIAM J. Control Optim., 35:1387-1421, 1997.

[223] G. A. Watson. A multiple exchange algorithm for multivariate Chebyshev ap­proximation. SIAM J. Numer. Anal., 12:46-52, 1975.

[224] G. A. Watson. A method for calculating best non-linear Chebyshev approxi­mations. J. Inst. Maths. Applies., 18:351-360, 1976.

[225] G. A. Watson. Globally convergent methods for semi-infinite programming. BIT, 21:362-373, 1981.

[226] G. A. Watson. Numerical experiments with globally convergent methods for semi-infinite programming problems. In [45], pages 193-205, 1983.

[227] G. A. Watson. Lagrangian methods for semi-infinite programming problems. In [5], pages 90-107, 1985.

[228] W. Wetterling. Anwendung des Newtonschen Iterationsverfahrens bei der Tschebyscheff-Approximation, insbesondere mit nichtlinear auftretenden Para­metern. MTW, pages 61-63 (Teil I), 112-115 (Teil II), 1963.

[229] R. B. Wilson. A Simplicial Algorithm for Concave Programming. PhD thesis, Harvard University, Boston, 1963.

[230] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, 1997.

[231] J. L. Zhou and A. L. Tits. An SQP algorithm for finely discretized continuous minimax problems and other minimax problems with many objective functions. SIAM J. Optim., 6:461-487, 1996.

[232] G. Zoutendijk. Methods of Feasible Directions. Elsevier, Amsterdam-New York­Oxford, 1960.

[233] G. Zwier. Structural Analysis in Semi-Infinite Programming. PhD thesis, Uni­versiteit Twente, Enschede, Netherlands, 1987.


8 CONNECTIONS BETWEEN SEMI-INFINITE AND SEMIDEFINITE PROGRAMMING

Lieven Vandenberghe¹ and Stephen Boyd²

¹ Electrical Engineering Department, University of California at Los Angeles, 68-119 Engineering IV, Los Angeles, CA 90095-1594, USA, Email: [email protected]

² Information Systems Laboratory, Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA, Email: [email protected]

ABSTRACT

Some interesting semi-infinite optimization problems can be reduced to semidefinite optimization problems, and hence solved efficiently using recent interior-point methods. In this paper we discuss semidefinite optimization from this perspective and illustrate the connections between semidefinite optimization and semi-infinite programming with examples and applications from computational geometry, statistics, and systems and control.

1 INTRODUCTION

We consider convex optimization problems with linear matrix inequality (LMI) constraints, i.e., constraints of the form

F(x) = F_0 + x_1 F_1 + ... + x_m F_m ≥ 0,    (1.1)

where the matrices F_i = F_i^T ∈ R^{n×n} are given, and the inequality F(x) ≥ 0 means F(x) is positive semidefinite. The LMI (1.1) is a convex constraint in the variable x ∈ R^m. Conversely, many nonlinear convex constraints can be expressed as LMIs (see the recent surveys by Alizadeh [2], Boyd, El Ghaoui, Feron and Balakrishnan [5], Lewis and Overton [17], Nesterov and Nemirovsky [18] and Vandenberghe and Boyd [28]).

The purpose of the paper is to explore some connections between optimization with LMI constraints and semi-infinite programming. We immediately note that the LMI (1.1) is equivalent to an infinite set of linear inequalities: F(x) ≥ 0 if and only if

v^T F(x) v = v^T F_0 v + Σ_{i=1}^{m} x_i (v^T F_i v) ≥ 0

for all v in the compact set {v ∈ R^n : ||v|| = 1}. It is therefore clear that convex optimization problems with LMI constraints can be studied as special cases of semi-infinite programming. Perhaps more interestingly, we will see that some important semi-infinite optimization problems can be formulated in terms of linear matrix inequalities. Such a reduction, if possible, has important practical consequences: it means that those SIPs can be solved efficiently with recent interior-point methods for LMI problems. The emphasis of the paper will be on illustrating this point with examples from systems and control, signal processing, computational geometry, and statistics.
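Numerically, the equivalence just used amounts to checking the smallest eigenvalue of the symmetric matrix F(x): it is nonnegative exactly when v^T F(x) v ≥ 0 for all unit vectors v. A small numpy sketch (our own illustration):

```python
import numpy as np

def lmi_value(x, F):
    """Assemble F(x) = F_0 + x_1 F_1 + ... + x_m F_m from a list F = [F_0, ..., F_m]."""
    A = np.array(F[0], dtype=float)
    for xi, Fi in zip(x, F[1:]):
        A = A + xi * np.asarray(Fi, dtype=float)
    return A

def lmi_feasible(x, F, tol=1e-10):
    """F(x) >= 0 iff its minimum eigenvalue is >= 0 iff v^T F(x) v >= 0 for all unit v."""
    eigvals = np.linalg.eigvalsh(lmi_value(x, F))   # F(x) is symmetric by assumption
    return eigvals.min() >= -tol
```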

The examples in this paper will fall in two categories. The first is known as the semidefinite programming problem or SDP. In an SDP we minimize a linear function of a variable x ∈ R^m subject to an LMI:

minimize    c^T x
subject to  F(x) = F_0 + x_1 F_1 + ... + x_m F_m ≥ 0.    (1.2)

Semidefinite programming can be regarded as an extension of linear programming where the componentwise inequalities between vectors are replaced by matrix inequalities, or, equivalently, the first orthant is replaced by the cone of positive semidefinite matrices. Although the SDP (1.2) looks very specialized, it is much more general than a (finite-dimensional) linear program, and it has many applications in engineering and combinatorial optimization [2,5,17,18,28]. Most interior-point methods for linear programming have been generalized to semidefinite programs. As in linear programming, these methods have polynomial worst-case complexity, and perform very well in practice.
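As a hypothetical illustration of how compactly such an SDP can be posed with a modern modelling package (here the open-source cvxpy library, assuming symmetric data matrices F_0, ..., F_m; this is only a sketch, not part of the text):

```python
import cvxpy as cp
import numpy as np

def solve_sdp(c, F):
    """Solve the SDP (1.2): minimize c^T x  s.t.  F_0 + x_1 F_1 + ... + x_m F_m >= 0.

    c: 1-D numpy array of length m; F: list [F_0, F_1, ..., F_m] of symmetric matrices.
    """
    m = len(c)
    x = cp.Variable(m)
    Fx = F[0] + sum(x[i] * F[i + 1] for i in range(m))
    Fx = (Fx + Fx.T) / 2                    # symmetrize so the PSD constraint is well posed
    prob = cp.Problem(cp.Minimize(c @ x), [Fx >> 0])
    prob.solve()
    return x.value, prob.value
```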

We can express the SDP as a semi-infinite linear program

minimize    c^T x
subject to  v^T F(x) v ≥ 0   for all v.

Page 285: Semi-Infinite Programming


Lasserre [16] and Pataki [19] have exploited this fact to formulate Simplex-like algorithms for SDP. The observation is also interesting for theoretical purposes since it allows us to apply, for example, duality results from SIP to SDP.

The second problem that we will encounter is the problem of maximizing the determinant of a matrix subject to LMI constraints, i.e.,

maximize    det G(x)
subject to  G(x) = G_0 + x_1 G_1 + ... + x_m G_m > 0
            F(x) = F_0 + x_1 F_1 + ... + x_m F_m ≥ 0.

We call this the determinant maximization or maxdet-problem. The matrices G_i = G_i^T ∈ R^{l x l} are given. The problem is equivalent to minimizing the convex function log det G(x)^{-1} subject to the LMI constraints. The max-det objective arises naturally in applications in computational geometry, control, information theory, and statistics.

A unified form that includes both the SDP and the determinant maximization problem is

minimize    c^T x + log det G(x)^{-1}
subject to  G(x) > 0                                         (1.3)
            F(x) ≥ 0.

This problem was studied in detail in Vandenberghe, Boyd and Wu [29].

The basic facts about these two optimization problems, and of the unified form (1.3), can be summarized as follows.

• Both problems are convex.

• There is an extensive and useful duality theory for the problems.

• Very efficient interior-point methods for the problems have been developed recently [18].

• The problems look very specialized, but include a wide variety of convex optimization problems, with many applications in engineering.

Page 286: Semi-Infinite Programming


2 DUALITY

In [29] it was shown that we can associate with (1.3) the dual problem

maximize    log det W - Tr(G_0 W) - Tr(F_0 Z) + l
subject to  Tr(G_i W) + Tr(F_i Z) = c_i,   i = 1, ..., m,    (2.1)
            W = W^T > 0,   Z = Z^T ≥ 0.

The variables are W ∈ R^{l x l} and Z ∈ R^{n x n}. We say W and Z are dual feasible if they satisfy the constraints in (2.1), and strictly dual feasible if in addition Z > 0. We also refer to (1.3) as the primal problem and say x is primal feasible if F(x) ≥ 0 and G(x) > 0, and strictly primal feasible if F(x) > 0 and G(x) > 0.

Let p* and d* be the optimal values of problem (1.3) and (2.1), respectively (with the convention that p* = +00 if the primal problem is infeasible, and d* = -00 if the dual problem is infeasible). The following theorem follows from standard results in convex analysis (Rockafellar [24], see also [29]).

Theorem 2.1 p* ≥ d*. If (1.3) is strictly feasible, the dual optimum is achieved; if (2.1) is strictly feasible, the primal optimum is achieved. In both cases, p* = d*.

As an illustration, we derive the dual problem for the SDP (1.2). Substituting G_0 = 1, G_i = 0, l = 1 in (2.1) yields

maximize    log W - W - Tr(F_0 Z) + 1
subject to  Tr(F_i Z) = c_i,   i = 1, ..., m,
            W > 0,   Z ≥ 0.

The optimal value of W is one, so the dual problem reduces to

maximize    -Tr(F_0 Z)
subject to  Tr(F_i Z) = c_i,   i = 1, ..., m,                (2.2)
            Z ≥ 0,

which is the dual SDP (in the notation used in [28]). Applying the duality result of Theorem 2.1 we see that the optimal values of (1.2) and (2.2) are equal if at least one of the problems is strictly feasible.
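To make the primal-dual pair concrete, the following sketch (again CVXPY, hypothetical data, not from the original text) solves a tiny instance of the primal SDP (1.2) and of the dual (2.2) separately and compares the optimal values. The data encode a small linear program as a diagonal LMI; both problems are strictly feasible, so the two values agree up to solver tolerance, as Theorem 2.1 predicts.

import cvxpy as cp
import numpy as np

# A tiny LP written as an SDP (diagonal LMI): feasible set {x >= 0, x1 + x2 <= 1}.
F0 = np.diag([0.0, 0.0, 1.0])
F1 = np.diag([1.0, 0.0, -1.0])
F2 = np.diag([0.0, 1.0, -1.0])
c = np.array([-1.0, -2.0])

# Primal SDP (1.2)
x = cp.Variable(2)
primal = cp.Problem(cp.Minimize(c @ x), [F0 + x[0] * F1 + x[1] * F2 >> 0])
primal.solve()

# Dual SDP (2.2): maximize -Tr(F0 Z) s.t. Tr(Fi Z) = ci, Z >= 0
Z = cp.Variable((3, 3), symmetric=True)
dual = cp.Problem(cp.Maximize(-cp.trace(F0 @ Z)),
                  [cp.trace(F1 @ Z) == c[0], cp.trace(F2 @ Z) == c[1], Z >> 0])
dual.solve()

print(primal.value, dual.value)   # both close to -2: no duality gap here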

Examples of primal and dual problems with nonzero optimal duality gap are well known in the semi-infinite programming literature, and also arise in SDP (see [28] for an example).

Page 287: Semi-Infinite Programming


3 ELLIPSOIDAL APPROXIMATION

Our first class of examples is ellipsoidal approximation problems. We can distinguish two basic forms. The first is the problem of finding the minimum-volume ellipsoid around a given set C. The second problem is the problem of finding the maximum-volume ellipsoid contained in a given convex set C. Both can be formulated as convex semi-infinite programming problems.

To solve the first problem, it is convenient to parametrize the ellipsoid as the pre-image of a unit ball under an affine transformation, i.e.,

E = {v | ||Av + b|| ≤ 1}.

It can be assumed without loss of generality that A = A^T > 0, in which case the volume of E is proportional to det A^{-1}. The problem of computing the minimum-volume ellipsoid containing C can be written as

minimize    log det A^{-1}
subject to  A = A^T > 0                                      (3.1)
            ||Av + b|| ≤ 1   for all v ∈ C,

where the variables are A and b. For general C, this is a semi-infinite programming problem. Note that both the objective function and the constraints are convex in A and b.

For the second problem, where we maximize the volume of ellipsoids enclosed in a convex set C, it is more convenient to represent the ellipsoid as the image of the unit ball under an affine transformation, i.e., as

E = {By + d | ||y|| ≤ 1}.

Again it can be assumed that B = B^T > 0. The volume is proportional to det B, so we can find the maximum volume ellipsoid inside C by solving the convex optimization problem

maximize    log det B
subject to  B = B^T > 0                                      (3.2)
            By + d ∈ C   for all y with ||y|| ≤ 1,

in the variables B and d. For general convex C, this is again a convex semi-infinite optimization problem.

The ellipsoid of least volume containing a set is often called the Löwner ellipsoid (after Danzer, Grünbaum, and Klee [10, p.139]), or the Löwner-John ellipsoid

Page 288: Semi-Infinite Programming


(Grötschel, Lovász and Schrijver [12, p.69]). John in [13] has shown that if we shrink the minimum volume outer ellipsoid of a convex set C ⊆ R^n by a factor n about its center, we obtain an ellipsoid contained in C. Thus the Löwner-John ellipsoid serves as an ellipsoidal approximation of a convex set, with bounds that depend only on the dimension of the ambient space, and not in any other way on the set C.

3.1 Minimum volume ellipsoid containing given points

The best known example is the problem of determining the minimum volume ellipsoid that contains given points x^1, ..., x^K in R^n, i.e., the set C = {x^1, ..., x^K} (or, equivalently, the convex hull Co{x^1, ..., x^K}). This problem has applications in cluster analysis (Rosen [25], Barnes [4]), robust statistics (in ellipsoidal peeling methods for outlier detection, Rousseeuw and Leroy [23, §7]), and robotics (Rimon and Boyd [22]).

Applying (3.1), we can write this problem as

minimize    log det A^{-1}
subject to  ||A x^i + b|| ≤ 1,   i = 1, ..., K               (3.3)
            A = A^T > 0,

where the variables are A = A^T ∈ R^{n x n} and b ∈ R^n. The norm constraints ||A x^i + b|| ≤ 1, which are just convex quadratic inequalities in the variables A and b, can be expressed as LMIs

[      I          A x^i + b    ]
[ (A x^i + b)^T       1        ]  ≥ 0,

so (3.3) is a maxdet-problem in the variables A and b.
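A minimal numerical sketch of (3.3), assuming CVXPY and a hypothetical random point set; the norm constraints are entered directly and the maxdet objective is handled through log_det.

import cvxpy as cp
import numpy as np

# Hypothetical data: K random points in the plane.
np.random.seed(0)
pts = np.random.randn(20, 2)

A = cp.Variable((2, 2), PSD=True)      # A = A^T > 0
b = cp.Variable(2)
constraints = [cp.norm(A @ pts[i] + b) <= 1 for i in range(pts.shape[0])]
prob = cp.Problem(cp.Minimize(-cp.log_det(A)), constraints)
prob.solve()

# The Loewner-John ellipsoid is {v : ||A v + b|| <= 1}; its volume is proportional to det(A)^{-1}.
print("log det A^{-1} =", -np.log(np.linalg.det(A.value)))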

3.2 Maximum volume ellipsoid in polytope

Assume the set C is a polytope described by a set of linear inequalities:

C = {x | a_i^T x ≤ b_i,  i = 1, ..., L}

Page 289: Semi-Infinite Programming


Figure 1 Maximum volume ellipsoid contained in a polyhedron.

(see Figure 1). To apply (3.2) we first work out the last constraint:

By + d ∈ C for all ||y|| ≤ 1
    ⟺  a_i^T (By + d) ≤ b_i  for all ||y|| ≤ 1,   i = 1, ..., L
    ⟺  max_{||y|| ≤ 1} a_i^T B y + a_i^T d ≤ b_i,   i = 1, ..., L
    ⟺  ||B a_i|| + a_i^T d ≤ b_i,   i = 1, ..., L.


This is a set of L convex constraints in B and d, and equivalent to the L LMIs

[ (b_i - a_i^T d) I       B a_i        ]
[    (B a_i)^T        b_i - a_i^T d    ]  ≥ 0,   i = 1, ..., L.

We can therefore formulate (3.2) as a maxdet-problem in the variables B and d:

minimize    log det B^{-1}
subject to  B > 0
            [ (b_i - a_i^T d) I       B a_i        ]
            [    (B a_i)^T        b_i - a_i^T d    ]  ≥ 0,   i = 1, ..., L.

3.3 Minimum volume ellipsoid containing ellipsoids

These techniques extend to several interesting cases where C is not finite or polyhedral, but is defined as a combination (the sum, union, or intersection) of ellipsoids. In particular, it is possible to compute the optimal inner ap­proximation of the intersection or the sum of ellipsoids, and the optimal outer approximation of the union or sum of ellipsoids, by solving a maxdet problem. We refer to [5] and Chernousko [8] for details.

Page 290: Semi-Infinite Programming


Figure 2 Minimum volume ellipsoid containing five given ellipsoids. Finding such an ellipsoid can be cast as a maxdet-problem, hence efficiently solved.

As an example, consider the problem of finding the minimum volume ellipsoid E_0 containing K given ellipsoids E_1, ..., E_K. For this problem we describe the ellipsoids as sublevel sets of convex quadratic functions:

E_i = {x | x^T A_i x + 2 b_i^T x + c_i ≤ 0},   i = 0, ..., K.

The solution can be found by solving the following maxdet-problem in the variables A_0 = A_0^T, b_0, and K scalar variables τ_i:

minimize    log det A_0^{-1}
subject to  A_0 = A_0^T > 0,   τ_1 ≥ 0, ..., τ_K ≥ 0
            [ A_0     b_0      0    ]         [ A_i     b_i    0 ]
            [ b_0^T   -1      b_0^T ]  - τ_i  [ b_i^T   c_i    0 ]  ≤ 0,   i = 1, ..., K.
            [ 0       b_0    -A_0   ]         [ 0       0      0 ]

(c_0 is given by c_0 = b_0^T A_0^{-1} b_0 - 1.) See [5, p.43] for details. Figure 2 shows an instance of the problem.

Page 291: Semi-Infinite Programming


4 EXPERIMENT DESIGN

As a second group of examples, we consider problems in optimal experiment design. We consider the problem of estimating a vector x from a measurement y = Ax + w, where w ~ N(0, I) is measurement noise. We assume A has full column rank. The minimum-variance estimator is x̂ = A^+ y, where A^+ is the pseudo-inverse of A, i.e., A^+ = (A^T A)^{-1} A^T. The error covariance of the minimum-variance estimator is equal to A^+ (A^+)^T = (A^T A)^{-1}. We suppose that the rows of the matrix A = [a_1 ... a_q]^T can be chosen among M possible test vectors v^(i) ∈ R^p, i = 1, ..., M:

a_i ∈ {v^(1), ..., v^(M)},   i = 1, ..., q.

The goal of experiment design is to choose the vectors a_i so that the error covariance (A^T A)^{-1} is 'small'. We can interpret each component of y as the result of an experiment or measurement that can be chosen from a fixed menu of possible experiments; our job is to find a set of measurements that (together) are maximally informative.

We can write A^T A = q Σ_{i=1}^M λ_i v^(i) v^(i)T, where λ_i is the fraction of rows a_k equal to the vector v^(i). We ignore the fact that the numbers λ_i are integer multiples of 1/q, and instead treat them as continuous variables, which is justified in practice when q is large. (Alternatively, we can imagine that we are designing a random experiment: each experiment a_i has the form v^(k) with probability λ_k.)

Many different criteria for measuring the size of the matrix (A^T A)^{-1} have been proposed. For example, in D-optimal design, we minimize the determinant of the error covariance (A^T A)^{-1}, which leads to the maxdet-problem

minimize    log det ( Σ_{i=1}^M λ_i v^(i) v^(i)T )^{-1}
subject to  λ_i ≥ 0,   i = 1, ..., M                          (4.1)
            Σ_{i=1}^M λ_i = 1.

An example is shown in Figure 3.

Fedorov [11], Atkinson and Donev [1], and Pukelsheim [20] give surveys and additional references on optimal experiment design. The formulation of D-optimal design as a maxdet-problem has the advantage that we can easily incorporate additional useful convex constraints. See [29] for examples.
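A small sketch of the D-optimal design problem (4.1), with hypothetical random test vectors (again CVXPY, not part of the original text); as in Figure 3, typically only a few of the weights λ_i are nonzero at the optimum.

import cvxpy as cp
import numpy as np

# Hypothetical test vectors v^(1), ..., v^(M) in R^p.
np.random.seed(0)
M, p = 30, 2
V = np.random.randn(M, p)

lam = cp.Variable(M, nonneg=True)
S = sum(lam[i] * np.outer(V[i], V[i]) for i in range(M))   # sum_i lambda_i v^(i) v^(i)^T
prob = cp.Problem(cp.Minimize(-cp.log_det(S)), [cp.sum(lam) == 1])
prob.solve()

support = np.where(lam.value > 1e-5)[0]
print("test vectors actually used:", support)               # usually only a few indices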

Page 292: Semi-Infinite Programming



Figure 3 A D-optimal experiment design involving 50 test vectors in R^2. The circle is the origin; the dots are the test vectors that are not used in the experiment (i.e., have a weight λ_i = 0); the crosses are the test vectors that are used (i.e., have a weight λ_i > 0). The D-optimal design allocates all measurements to only two test vectors.

There is an interesting relation between optimal experiment design and ellipsoidal approximation. We first derive the dual of the experiment design problem (4.1), applying (2.1). After a few simplifications we obtain

maximize    log det W + p - z
subject to  W = W^T > 0                                       (4.2)
            v^(i)T W v^(i) ≤ z,   i = 1, ..., M,

where the variables are the matrix W and the scalar variable z. Problem (4.2) can be further simplified. The constraints are homogeneous in W and z, so for each dual feasible W, z we have a ray of dual feasible solutions tW, tz, t > 0. It turns out that we can analytically optimize over t: replacing W by tW and z by tz changes the objective to log det W + p log t + p - tz, which is maximized for t = p/z. After this simplification, and with the rescaled variable W := (p/z)W, problem (4.2) becomes

maximize    log det W
subject to  W > 0                                             (4.3)
            v^(i)T W v^(i) ≤ p,   i = 1, ..., M.

Page 293: Semi-Infinite Programming


Problem (4.3) has an interesting geometrical meaning: the constraints state that W determines an ellipsoid {x | x^T W x ≤ p}, centered at the origin, that contains the points v^(i), i = 1, ..., M; the objective is to maximize det W, i.e., to minimize the volume of the ellipsoid.

There is an interesting connection between the optimal primal variables λ_i and the points v^(i) that lie on the boundary of the optimal ellipsoid E. First note that the duality gap associated with a primal feasible λ and a dual feasible W is equal to

log det ( Σ_{i=1}^M λ_i v^(i) v^(i)T )^{-1}  -  log det W,

and is zero (hence, λ is optimal) if and only if W = ( Σ_{i=1}^M λ_i v^(i) v^(i)T )^{-1}. Hence, λ is optimal if

E = { x ∈ R^p | x^T ( Σ_{i=1}^M λ_i v^(i) v^(i)T )^{-1} x ≤ p }

is the minimum-volume ellipsoid, centered at the origin, that contains the points v^(j), j = 1, ..., M. We also have (in fact, for any feasible λ)

Σ_{j=1}^M λ_j ( p - v^(j)T ( Σ_{i=1}^M λ_i v^(i) v^(i)T )^{-1} v^(j) )
    = p - Tr ( ( Σ_{j=1}^M λ_j v^(j) v^(j)T ) ( Σ_{i=1}^M λ_i v^(i) v^(i)T )^{-1} ) = 0.

If λ is optimal, then each term in the sum on the left-hand side is nonnegative (since E contains all vectors v^(j)), and therefore the sum can only be zero if each term is zero:

λ_j ( p - v^(j)T ( Σ_{i=1}^M λ_i v^(i) v^(i)T )^{-1} v^(j) ) = 0,   j = 1, ..., M.

Geometrically, λ_j is nonzero only if v^(j) lies on the boundary of the minimum volume ellipsoid. This makes more precise the intuitive idea that an optimal experiment only uses 'extreme' test vectors. Figure 4 shows the optimal ellipsoid for the experiment design example of Figure 3.
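The boundary property can be checked numerically. The sketch below (hypothetical data, same construction as the D-optimal sketch above) solves (4.1), forms W = ( Σ_i λ_i v^(i) v^(i)T )^{-1}, and verifies that v^(j)T W v^(j) ≤ p for all test vectors, with near-equality exactly on the support of λ.

import cvxpy as cp
import numpy as np

np.random.seed(0)
M, p = 30, 2
V = np.random.randn(M, p)

lam = cp.Variable(M, nonneg=True)
S = sum(lam[i] * np.outer(V[i], V[i]) for i in range(M))
cp.Problem(cp.Minimize(-cp.log_det(S)), [cp.sum(lam) == 1]).solve()

W = np.linalg.inv(sum(lam.value[i] * np.outer(V[i], V[i]) for i in range(M)))
vals = np.array([V[i] @ W @ V[i] for i in range(M)])        # v^(j)^T W v^(j)
print(np.all(vals <= p + 1e-4))                             # all points inside {x : x^T W x <= p}
print(vals[lam.value > 1e-5])                               # approx equal to p on the support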

The duality between D-optimal experiment designs and minimum-volume ellipsoids also extends to non-finite compact sets (Titterington [27], Pronzato

Page 294: Semi-Infinite Programming



" / ..... .... . ~/ ..... - - --Figure 4 In the dual of the D-optimal experiment design problem we com­pute the minimum-volume ellipsoid, centered at the origin, that contains the test vectors. The test vectors with a nonzero weight lie on the boundary of the optimal ellipsoid. Same data and notation as in Figure 3.

and Walter [21]). The D-optimal experiment design problem on a compact set C ⊆ R^p is

maximize   log det E vv^T                                     (4.4)

over all probability measures on C. This is a convex but semi-infinite optimization problem, with dual ([27])

maximize    log det W
subject to  W > 0                                             (4.5)
            v^T W v ≤ p,   v ∈ C.

Again, we see that the dual is the problem of computing the minimum volume ellipsoid, centered at the origin, and covering the set C.

General methods for solving the semi-infinite optimization problems (4.4) and (4.5) fall outside the scope of this paper. In particular cases, however, these problems can be solved as maxdet-problems. One interesting example arises when C is the union of a finite number of ellipsoids. In this case, the dual (4.5) can be cast as a maxdet-problem (see §3) and hence efficiently solved; by du­ality, we can recover from the dual solution the probability distribution that solves (4.4).

Page 295: Semi-Infinite Programming


5 PROBLEMS INVOLVING POWER MOMENTS

5.1 Bounds on expected values via semidefinite programming


Let t be a random real variable. The expected values E t^k are called the (power) moments of the distribution of t. The following classical result gives a characterization of a moment sequence: There exists a probability distribution on R such that x_k = E t^k, k = 0, ..., 2n, if and only if x_0 = 1 and

                        [ x_0      x_1      x_2      ...   x_{n-1}   x_n      ]
                        [ x_1      x_2      x_3      ...   x_n       x_{n+1}  ]
H(x_0, ..., x_{2n}) =   [ x_2      x_3      x_4      ...   x_{n+1}   x_{n+2}  ]   ≥ 0.        (5.1)
                        [ ...                                                 ]
                        [ x_{n-1}  x_n      x_{n+1}  ...   x_{2n-2}  x_{2n-1} ]
                        [ x_n      x_{n+1}  x_{n+2}  ...   x_{2n-1}  x_{2n}   ]

It is easy to see that the condition is necessary: let x_i = E t^i, i = 0, ..., 2n, be the moments of some distribution, and let y = [y_0 y_1 ... y_n]^T ∈ R^{n+1}. Then we have

y^T H(x_0, ..., x_{2n}) y = Σ_{i,j=0}^n y_i y_j E t^{i+j} = E ( y_0 + y_1 t + ... + y_n t^n )^2 ≥ 0.

Sufficiency is less obvious. The proof is classical (and based on convexity ar­guments); see e.g., Krein and Nudelman [14, p.182] or Karlin and Studden [15, p.189-199]. There are similar conditions for distributions on finite or semi­infinite intervals.

Note that condition (5.1) is an LMI in the variables x_k, i.e., the condition that x_0, ..., x_{2n} be the moments of some distribution on R can be expressed as an LMI in x. Using this fact, we can cast some interesting moment problems as SDPs and maxdet-problems.

Suppose t is a random variable on R. We do not know its distribution, but we do know some bounds on the moments, i.e.,

μ_k ≤ E t^k ≤ μ̄_k

(which includes, as a special case, knowing exact values of some of the moments). Let p(t) = c_0 + c_1 t + ... + c_{2n} t^{2n} be a given polynomial in t. The

Page 296: Semi-Infinite Programming


expected value of p(t) is linear in the moments E t^i:

E p(t) = Σ_{i=0}^{2n} c_i E t^i = Σ_{i=0}^{2n} c_i x_i.

We can compute upper and lower bounds for Ep(t),

minimize (maximize)   E p(t)
subject to            μ_k ≤ E t^k ≤ μ̄_k,   k = 1, ..., 2n,

over all probability distributions that satisfy the given moment bounds, by solving the SDPs

minimize (maximize)   c_1 x_1 + ... + c_{2n} x_{2n}
subject to            μ_k ≤ x_k ≤ μ̄_k,   k = 1, ..., 2n
                      H(1, x_1, ..., x_{2n}) ≥ 0

over the variables x_1, ..., x_{2n}. This gives bounds on E p(t), over all probability distributions that satisfy the known moment constraints. The bounds are sharp in the sense that there are distributions, whose moments satisfy the given moment bounds, for which E p(t) takes on the upper and lower bounds found by these SDPs.
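A sketch of these moment-bound SDPs for n = 2 (moments up to order four), with hypothetical bounds and the polynomial p(t) = t - t^2 (CVXPY again; all data are illustrative). The Hankel matrix H(1, x_1, ..., x_4) is assembled entrywise.

import cvxpy as cp
import numpy as np

n = 2
mu_lo = np.array([0.0, 0.5, -1.0, 0.0])    # hypothetical lower bounds on E t^1, ..., E t^4
mu_hi = np.array([0.5, 1.5,  1.0, 3.0])    # hypothetical upper bounds on E t^1, ..., E t^4
c = np.array([1.0, -1.0, 0.0, 0.0])        # coefficients of t, t^2, t^3, t^4 in p(t)

x = cp.Variable(2 * n)                      # x[k-1] plays the role of x_k = E t^k
moms = [1.0] + [x[k] for k in range(2 * n)] # (x_0, x_1, ..., x_2n) with x_0 = 1
H = cp.bmat([[moms[i + j] for j in range(n + 1)] for i in range(n + 1)])
constraints = [H >> 0, x >= mu_lo, x <= mu_hi]

lower = cp.Problem(cp.Minimize(c @ x), constraints); lower.solve()
upper = cp.Problem(cp.Maximize(c @ x), constraints); upper.solve()
print(lower.value, "<= E p(t) <=", upper.value)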

A related problem was considered by Dahlquist, Eisenstat and Golub [9], who analytically compute bounds on E t^{-1} and E t^{-2}, given the moments E t^i, i = 1, ..., n. (Here t is a random variable in a finite interval.) Using semidefinite programming we can solve more general problems where upper and lower bounds on E t^i, i = 1, ..., n, (or the expected value of some polynomials) are known.

Another application arises in the optimal control of queuing networks (see Bertsimas et al. [6,7] and Schwerer [26]).

5.2 Upper bound on the variance via semidefinite programming

As another example, we can maximize the variance of t, over all probability distributions that satisfy the moment constraints (to obtain a sharp upper bound on the variance of t):

maximize    E t^2 - (E t)^2
subject to  μ_k ≤ E t^k ≤ μ̄_k,   k = 1, ..., 2n,

Page 297: Semi-Infinite Programming


which is equivalent to the SDP

maximize    y
subject to  [ x_2 - y    x_1 ]
            [ x_1        1   ]  ≥ 0
            μ_k ≤ x_k ≤ μ̄_k,   k = 1, ..., 2n
            H(1, x_1, ..., x_{2n}) ≥ 0

with variables y, x_1, ..., x_{2n}. The 2 × 2 LMI is equivalent to y ≤ x_2 - x_1^2. More generally, we can compute an upper bound on the variance of a given polynomial, E p(t)^2 - (E p(t))^2. Thus we can compute an upper bound on the variance of a polynomial p(t), given some bounds on the moments.

5.3 A robust estimate of the moments

Another interesting problem is the maxdet-problem

maximize    log det H(1, x_1, ..., x_{2n})
subject to  μ_k ≤ x_k ≤ μ̄_k,   k = 1, ..., 2n               (5.2)
            H(1, x_1, ..., x_{2n}) > 0.

The solution can serve as a 'robust' solution to the feasibility problem of finding a probability distribution that satisfies given bounds on the moments. While the SDPs provide lower and upper bounds on Ep(t), the maxdet-problem should provide a reasonable guess of Ep(t).

Note that the maxdet-problem (5.2) is equivalent to

maximize    log det E f(t) f(t)^T
subject to  μ ≤ E f(t) ≤ μ̄                                   (5.3)

over all probability distributions on R, where f(t) = [1 t t^2 ... t^n]^T. We can interpret this as the problem of designing a random experiment to estimate the coefficients of a polynomial p(t) = c_0 + c_1 t + ... + c_n t^n.

6 POSITIVE-REAL LEMMA

Linear system theory provides numerous examples of semi-infinite constraints that can be cast as LMIs (see [5] for an extensive survey). One of the fundamental theorems, the positive-real lemma, can be interpreted in this light.

Page 298: Semi-Infinite Programming


The positive-real lemma [3] gives a condition that guarantees that a rational function H : C → R^{m x m}, defined as

H(s) = C (sI - A)^{-1} B + D

where A ∈ R^{n x n} (and of minimal dimension), C ∈ R^{m x n}, B ∈ R^{n x m}, D ∈ R^{m x m}, satisfies certain inequalities in the complex plane. The theorem states that

H(s) + H(s)* ≥ 0   for all Re s > 0                           (6.1)

if and only if there exists a P = P^T such that

P > 0,     [ A^T P + P A      P B - C^T  ]
           [ B^T P - C       -D - D^T    ]  ≤ 0.              (6.2)

In other words, the infinite set of inequalities (6.1) is equivalent to the finite matrix inequality (6.2) with the auxiliary variable P.

Assume, for example, that A and B are given, and that the matrices C and D depend affinely on certain parameters θ ∈ R^p. Then (6.1) is an infinite set of LMIs in θ, while (6.2) is a finite LMI in θ and P.
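A feasibility check of (6.2) can itself be posed as an LMI problem in P. The sketch below (CVXPY, a small hypothetical (A, B, C, D) that is minimal and positive real, so a feasible P should exist) approximates the strict inequality P > 0 by P ≥ 10^{-6} I.

import cvxpy as cp
import numpy as np

# Hypothetical state-space data (n = 2, m = 1).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[1.0]])

P = cp.Variable((2, 2), symmetric=True)
lmi = cp.bmat([[A.T @ P + P @ A, P @ B - C.T],
               [B.T @ P - C,    -D - D.T   ]])
prob = cp.Problem(cp.Minimize(0), [P >> 1e-6 * np.eye(2), lmi << 0])
prob.solve()
print(prob.status, P.value)   # 'optimal' means (6.2) is feasible, hence (6.1) holds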

Other examples in systems and control theory include the bounded-real lemma, and the Nevanlinna-Pick problem [5]. An application of the positive-real lemma in filter design is described in [31].

7 CONCLUSION

We have discussed examples of semi-infinite optimization problems that can be reduced to semidefinite programming or determinant maximization problems. It is clear that a reduction of SIPs to SDPs or maxdet-problems is not always possible. It is important, however, to recognize when such a reduction is possible, since it implies that the problems can be solved efficiently using interior-point methods.

Acknowledgment. We thank Shao-Po Wu for his help with the numerical examples in the paper, which were generated using the codes SDPSOL [30] and MAXDET [33].

REFERENCES

[1] A. C. Atkinson and A. N. Donev. Optimum Experiment Designs. Oxford Statis­tical Science Series. Oxford University Press, 1992.

Page 299: Semi-Infinite Programming


[2] F. Alizadeh. Interior point methods in semidefinite programming with applica­tions to combinatorial optimization. SIAM Journal on Optimization, 5(1):13-51, February 1995.

[3] B. Anderson and S. Vongpanitlerd. Network Analysis and Synthesis: A Modern Systems Theory Approach. Prentice-Hall, 1973.

[4] E. R. Barnes. An algorithm for separating patterns by ellipsoids. IBM Journal of Research and Development, 26:759-764, 1982.

[5] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory, volume 15 of Studies in Applied Mathematics. SIAM, Philadelphia, PA, June 1994.

[6] D. Bertsimas. The achievable region method in the optimal control of queuing systems; formulations, bounds and policies. Queueing Systems, 21:337-389, 1995.

[7] D. Bertsimas, I. C. Paschalidis, and J. N. Tsitsiklis. Optimization of multiclass queueing networks:polyhedral and nonlinear characterizations of achievable per­formance. Ann. Appl. Prob., 4(1):43-75, 1994.

[8] F. L. Chernousko. State Estimation for Dynamic Systems. CRC Press, Boca Raton, Florida, 1994.

[9] G. Dahlquist, S. C. Eisenstat, and G. H. Golub. Bounds for the error of linear systems of equations using the theory of moments. Journal of Mathematical Analysis and Applications, 37:151-166, 1972.

[10] L. Danzer, B. Grünbaum, and V. Klee. Helly's theorem and its relatives. In V. L. Klee, editor, Convexity, volume 7 of Proceedings of Symposia in Pure Mathematics, pages 101-180. American Mathematical Society, 1963.

[11] V. V. Fedorov. Theory of Optimal Experiments. Academic Press, 1971.

[12] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, volume 2 of Algorithms and Combinatorics. Springer-Verlag, 1988.

[13] F. John. Extremum problems with inequalities as subsidiary conditions. In J. Moser, editor, Fritz John, Collected Papers, pages 543-560. Birkhauser, Boston, Massachussetts, 1985.

[14] M. G. Krein and A. A. Nudelman. The Markov Moment Problem and Extremal Problems, volume 50 of Translations of Mathematical Monographs. American Mathematical Society, Providence, Rhode Island, 1977.

[15] S. Karlin and W. J. Studden. Tchebycheff Systems: With Applications in Analysis and Statistics. Wiley-Interscience, 1966.

[16] J. B. Lasserre. Linear programming with positive semi-definite matrices. Techni­cal Report LAAS-94099, Laboratoire d'Analyse et d'Architecture des Systemes du CNRS, 1995.

[17] A. S. Lewis and M. L. Overton. Eigenvalue optimization. Acta Numerica, pages 149-190, 1996.

Page 300: Semi-Infinite Programming


[18] Yu. Nesterov and A. Nemirovsky. Interior-Point Polynomial Methods in Convex Programming, volume 13 of Studies in Applied Mathematics. SIAM, Philadelphia, PA,1994.

[19] G. Pataki. Cone-LP's and semi-definite programs: facial structure, basic solu­tions, and the simplex method. Technical report, GSIA, Carnegie-Mellon Uni­versity, 1995.

[20] F. Pukelsheim. Optimal Design of Experiments. Wiley, 1993.

[21] L. Pronzato and E. Walter. Minimum-volume ellipsoids containing compact sets: Application to parameter bounding. Automatica, 30(11):1731-1739, 1994.

[22] E. Rimon and S. Boyd. Obstacle collision detection using best ellipsoid fit. Journal of Intelligent and Robotic Systems, pages 1-22, December 1996.

[23] P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. Wiley, 1987.

[24] R. T. Rockafellar. Convex Analysis. Princeton Univ. Press, Princeton, second edition, 1970.

[25] J. B. Rosen. Pattern separation by convex programming. Journal of Mathemat­ical Analysis and Applications, 10:123-134, 1965.

[26] E. Schwerer. A Linear Programming Approach to the Steady-State Analysis of Markov Processes. PhD thesis, Graduate School of Business, Stanford University, 1996. Draft.

[27] D. M. Titterington. Optimal design: some geometric aspects of D-optimality. Biometrika, 62:313-320, 1975.

[28] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49-95, March 1996.

[29] L. Vandenberghe, S. Boyd, and S.-P. Wu. Determinant maximization with linear matrix inequality constraints. SIAM J. on Matrix Analysis and Applications, April 1998. To appear.

[30] S.-P. Wu and S. Boyd. SDPSOL: A Parser/Solver for Semidefinite Programming and Determinant Maximization Problems with Matrix Structure. User's Guide, Version Beta. Stanford University, June 1996.

[31] S.-P. Wu, S. Boyd, and L. Vandenberghe. FIR filter design via semidefinite programming and spectral factorization. In Proc. IEEE Conf. on Decision and Control, pages 271-276, 1996.

[32] S.-P. Wu, S. Boyd, and L. Vandenberghe. Magnitude filter design via spectral fac­torization and convex optimization. Applied and Computational Control, Signals and Circuits, 1997. To appear.

[33] S.-P. Wu, L. Vandenberghe, and S. Boyd. MAXDET: Software for Determinant Maximization Problems. User's Guide, Alpha Version. Stanford University, April 1996.

Page 301: Semi-Infinite Programming

PART III

APPLICATIONS

Page 302: Semi-Infinite Programming

9 RELIABILITY TESTING AND SEMI-INFINITE LINEAR PROGRAMMING

İ. Kuban Altınel and Süleyman Özekici

Boğaziçi University, Department of Industrial Engineering, 80815 Bebek, İstanbul, Türkiye,

Email: [email protected], [email protected]

ABSTRACT

A typical approach in reliability testing of a complex system is to assign to the components an allocated level of reliability, and then figure out the number of component tests which guarantees component reliabilities with a certain level of confidence. Another approach is to test the system as a whole, and base the test plans on the desired value of system reliability. Both approaches have advantages and disadvantages. There is also a third method which is known as system based component testing. It is based on the idea of combining the advantages of component and system tests. The determination of minimum cost component test plans according to this new approach can be formulated as a parameterized semi-infinite linear programming problem. In this paper we explain the mathematical model and describe the solution procedure, which is based on the well known cutting plane idea and column generation technique.

1 INTRODUCTION

Test plans are essential parts in the design, development, and production of a complex system which has to function with a certain level of reliability. Frequently, such a program consumes a large portion of the total system budget. Therefore, the efficiency of the test plans becomes an important issue. An efficiently designed test plan should satisfy three main objectives:


Page 303: Semi-Infinite Programming


1. It should guarantee with a high level of assurance that the system's true reliability conditions are determined.

2. It should be capable of identifying problems that must be removed before the system can meet its reliability goal.

3. It should have the minimum possible cost.

The first one of these objectives points out that, at the end of the reliability tests accomplished according to an efficiently designed test plan, we expect having a clear idea about the system's working conditions. In other words, the information obtained during the tests should enable us to make a statement about the reliability of a newly produced system with the same specifications as the ones used in the tests. The second objective is related to lower level information. At the end of the tests the weaknesses of the system should be identified so that they can be removed before the assembly. Third objective is related to the budget conditions. At sum, an efficiently designed test plan leads to correct inferences about the system's true conditions based on the tests of its elements at minimum possible cost.

Reliability tests for a complex system can be conducted at two main levels, which are component and system, and under different conditions, e.g. different temperatures, different pressures, etc. A typical and often incorrect approach in the design of test plans is to assign to the components an allocated level of reliability, and then figure out the number of component tests which gurantees component reliabilities with a certain level of confidence. Another approach is to test the system as a whole, and base the test plans on the desired value of system reliability. Clearly, in this case a direct allocation of component relia­bilities is not necessary. Even though the second approach seems to be more accurate, because the ultimate aim is to design efficient test plans which guar­antee a desired value for system reliability, component tests have the following four advantages over system tests:

1. They are more economical, both in terms of the cost of the item being tested, and in the cost of test fixtures and facilities.

2. They enable the collection of more information about the components than a system test does.

3. Because it is possible to test each component separately and independently in the case of limited testing facilities, components can be tested in any desired sequence, and in different locations.

Page 304: Semi-Infinite Programming


4. They are more timely in providing information about system reliability because they can be done without having to assemble a system.

Motivated by these points, the desire to combine the advantages of component and system tests gave birth to a different approach in the test of system reliability: system based component tests. The new approach basically suggests a hypothesis test for system reliability based on independent experimentation over components. In other words, it says, experiment with the components, and then accept or reject the system based upon the component test information.

The new approach is particularly relevant from a practical viewpoint because it assures the first two of the objectives that should be satisfied by an efficiently designed test plan. On the other hand, as it will be seen in the following sections, the combination of this reliability test methodology with the third objective yield a parameterized semi-infinite linear programming model whose solvability depends on the system structure, the environment, and the prior information on individual component reliabilities.

The determination of optimal component test plans by using this new approach aims at the efficient allocation of the available resources among the tests of dif­ferent component types so that a certain level of inference on system reliability is obtained. This problem, which is the experimental design converse of using component test data to estimate system reliability, is called the system based component test problem.

The system based component test problem was first addressed by Gal [10]. He studied a situation where a certain unacceptable reliability level, R_0, needs to be demonstrated at a specified confidence 1 - α. He assumed exponential life distributions for components and derived a general solution procedure to compute the optimum component test times which minimize the total test cost

Σ_{j=1}^k c_j t_j                                             (1.1)

while satisfying the probability requirement

P[accept the system | R_s ≤ R_0] ≤ α                          (1.2)

Here R_s is the system reliability and is a function of individual component reliabilities, c_j is the cost per unit time of testing component j, and t_j is the time spent for the test of component j. The system is accepted if and only if there are no component failures during the test.

Page 305: Semi-Infinite Programming


Mazumdar extended Gal's model to the situation where, in addition to an unacceptable system reliability level R_0, a certain acceptable reliability level R_1 needs to be demonstrated at a specific confidence 1 - β [14]. Namely he added

P[reject the system | R_s ≥ R_1] ≤ β                          (1.3)

to (1.1) and (1.2) as the new constraint. They are actually type I and type II probability restrictions respectively of a typical hypothesis testing problem in classical statistics. The null hypothesis states that the system is unacceptable, i.e., H_0 : R_s ≤ R_0, while the alternative states that it is acceptable, i.e., H_1 : R_s ≥ R_1. It is clear that R_0 is the unacceptable reliability level while R_1 denotes the acceptable reliability level for the system. Moreover, the probability of accepting an unacceptable system is bounded by α in (1.2) and the probability of rejecting an acceptable system is bounded by β in (1.3). The specific form of R_s depends on the structure of the system and it is a function of the unknown parameters of the component life distribution. We will provide ample examples in the following sections.

The acceptance criterion plays a critical role in the component testing problem. Mazumdar used the following decision rule to accept a system: "Test (with replacement) each component j for a total of t_j time units. Observe the number of failures, N_j, for component j. If the total number of failures N = Σ_{j=1}^k N_j ≤ m, accept the system; otherwise reject it." This rule for accepting a system is referred to as the sum rule in the sequel. Note that it is a generalization of Gal's rule since Gal considered only the situation where m = 0. Previously in the literature, Gnedenko et al. have used the sum of component failures as a means for providing system reliability confidence limits [12]. This is referred to as the M-method by Gertsbakh [11]. In addition, Easterling et al. provide a justification for using the sum rule for a series system [9].

Within the framework of their formulations, both Gal and Mazumdar showed that for a series system, the optimum component test times are independent of component test costs, and are in fact identical [10,14]. Mazumdar, based on the ideas he used for series systems, later developed a procedure to compute optimum component test times for a series system with redundant subsystems [15]. He used again the sum rule. None of these works, which assume that component lifetimes are independently and exponentially distributed, considers the case where prior information is available on component reliabilities. The first work which considers prior information is due to Altınel [3]. In this work he developed a solution procedure to compute optimum component test times for a series system where upper bounds on component failure rates are given as the free prior information. He treats the system based component test problem

Page 306: Semi-Infinite Programming


as a mathematical programming problem. His approach, which is explained in detail in his early work [1], is important not only because it is the first mathematical programming view of the problem, but also it leads to solution procedures for more general cases [4-6].

This paper consists of six sections. In the next section we present a semi­infinite linear programming model to compute optimum test plans for coherent systems under the assumption that component lifetimes are independently and exponentially distributed. The model assumes also that some prior information on component failure rates is provided. Section three includes a dual solution procedure based on the classical column generation idea. The model presented in section four generalizes furthermore the one of section two to systems with dependent component lifetimes. A test plan for a series system working in a ran­domly changing environment is computed to illustrate the solution procedure in section five. Finally, conclusions and pointers for new research directions are given in the last section.

2 TESTING SYSTEMS WITH INDEPENDENT COMPONENT FAILURES

To formalize the problem of determining an optimal test plan for a system, let K = {1, 2, ..., k} be the set of components and c_j ≥ 0 the cost per unit time of testing component j. We denote the prior information on component failure rates by I; we assume that it is a nonempty and compact subset of nonnegative real numbers, and there is not any cost associated with it. An example of I is {λ ∈ R^k_+ : λ_j ≤ u_j, j ∈ K}. In other words, upper bounds on component failure rates are given. All of this information is assumed to be known, as well as the system reliability levels R_0, R_1, and significance levels α and β; we assume that R_0, R_1, α, and β are chosen from the interval (0,1) and α + β < 1. The unknowns are m, a nonnegative integer parameter bounding the total number of failures which occur during component tests from above, t_{j,m}, the test time of component j for a given m, and λ_j, the constant failure rate for component j. However, only the values of component test times and the upper bound m are primarily important in the solution. Their optimal values provide a minimum cost component test plan by deciding on component test times, and the maximum total number of component failures which should be allowed during the tests, in order to obtain the minimum total test cost.

Page 307: Semi-Infinite Programming


We assume that components have exponential life times with constant failure rates λ_j, and fail independently. Moreover, for a given m, component j is tested for t_{j,m} time units with replacement, namely when a component fails during the test it is replaced promptly by an identical one and the test continues with the new one, and the mission time, the time period [0, s] during which we require the system to perform without failure, is equal to the time unit. The first impact of these assumptions is on the system reliability function R_s. Since the mission time for which the reliability of the system needs to be demonstrated is equal to the time unit, i.e. s = 1, we can consider the system reliability function R_s = R(λ, s) as a function of component failure rates only and denote it by R(λ). Hence the system reliability constraints become equivalent to R(λ) ≤ R_0 and R(λ) ≥ R_1. The second impact is on the acceptance probability of the system. According to the sum rule used to combine the information obtained through component tests,

P[accept the system] = P[ Σ_{j∈K} N_j ≤ m ]

where N_j is the number of component j failures during its test, which takes t_{j,m} time units for a given m. Since each component j has an exponential life with failure rate λ_j, components are tested with replacement, and component failures are mutually independent, failures form a Poisson process and thus N_j, j ∈ K, are independent Poisson random variables with parameter λ_j t_{j,m}. Consequently, the total number of failures N = Σ_{j∈K} N_j has a Poisson distribution with parameter Σ_{j∈K} λ_j t_{j,m}. In other words, once component test times are known, the system acceptance probability is also a function of component failure rates.
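A one-line numerical illustration (hypothetical rates and test times, not from the original text): under the sum rule the acceptance probability is simply the Poisson cumulative distribution function evaluated at m with mean Σ_{j∈K} λ_j t_{j,m}.

import numpy as np
from scipy.stats import poisson

# Hypothetical data: 3 components, failure rates lambda_j and test times t_{j,m}, sum rule with m = 1.
lam = np.array([0.02, 0.05, 0.01])      # failures per time unit
t = np.array([30.0, 10.0, 40.0])        # test times
m = 1

mean_failures = lam @ t                 # N ~ Poisson(sum_j lambda_j t_{j,m})
p_accept = poisson.cdf(m, mean_failures)
print(mean_failures, p_accept)          # acceptance probability P[N <= m]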

Let us define Λ_I = {λ ∈ R^k_+ : R(λ) ≤ R_0}, Λ_II = {λ ∈ R^k_+ : R(λ) ≥ R_1}, p(R_0) = Λ_I ∩ I, and p(R_1) = Λ_II ∩ I. Clearly Λ_I and Λ_II denote sets of k-dimensional failure rate vectors which are feasible with respect to the system reliability constraints. Hence, p(R_0) and p(R_1) describe the sets of failure rates which are feasible with respect to both the system reliability constraints and the prior information. For example, for a series system R(λ) = Π_{j∈K} e^{-λ_j} = exp(- Σ_{j∈K} λ_j). Then the system reliability constraints become equivalent to Σ_{j∈K} λ_j ≥ -ln R_0 and Σ_{j∈K} λ_j ≤ -ln R_1. Consequently,

p(R_0) = { λ ∈ R^k : Σ_{j∈K} λ_j ≥ -ln R_0,   0 ≤ λ_j ≤ u_j,  j ∈ K }

p(R_1) = { λ ∈ R^k : Σ_{j∈K} λ_j ≤ -ln R_1,   0 ≤ λ_j ≤ u_j,  j ∈ K }

Page 308: Semi-Infinite Programming


where the prior information on component reliabilities is in the form of upper bounds on component failure rates, i.e. I = {λ : λ_j ≤ u_j, j ∈ K}. Note that for the situation when there is no prior information on component failure rates, namely nothing is known on them a priori, the only restriction is due to the system reliability constraints. Formally speaking, I = R^k_+, implying that everything is possible for the failure rates and thus p(R_0) = Λ_I and p(R_1) = Λ_II.

Assuming that p(R_0) and p(R_1) are nonempty there can be more than one feasible failure rate vector and thus more than one value for the system acceptance probability P[ Σ_{j∈K} N_j ≤ m ]. Then, the probability constraints (1.2) and (1.3) are surely guaranteed for all feasible λ vectors if they are modified as follows:

max_{λ ∈ p(R_0)}  P[ Σ_{j∈K} N_j ≤ m ]  ≤  α                  (2.1)

min_{λ ∈ p(R_1)}  P[ Σ_{j∈K} N_j ≤ m ]  ≥  1 - β               (2.2)

Suppose that Y is a random variable that has a Poisson distribution with parameter y. Define λ_{γ,m} to be the value of y for which P[Y ≤ m] = γ, and ψ_m(y) to be equivalent to P[Y ≤ m]. In other words, ψ_m(y) = Σ_{x=0}^m y^x e^{-y} / x!,

and ψ_m(λ_{γ,m}) = γ. Then, P[ Σ_{j∈K} N_j ≤ m ] = ψ_m( Σ_{j∈K} λ_j t_{j,m} ), and the inequalities (2.1) and (2.2) become

max_{λ ∈ p(R_0)}  ψ_m( Σ_{j∈K} λ_j t_{j,m} )  ≤  α             (2.3)

min_{λ ∈ p(R_1)}  ψ_m( Σ_{j∈K} λ_j t_{j,m} )  ≥  1 - β          (2.4)

Let us consider ψ_m( Σ_{j∈K} λ_j t_{j,m} ) ≤ α and assume t_{j,m} ≥ 0, j ∈ K. Then, this inequality is equivalent to ψ_m( Σ_{j∈K} λ_j t_{j,m} ) ≤ ψ_m(λ_{α,m}) because ψ_m(λ_{α,m}) = α. ψ_m(y) is a strictly decreasing and continuous function of y ≥ 0. It is also invertible with respect to y. Hence by taking the inverse of both sides of the last inequality we can write Σ_{j∈K} λ_j t_{j,m} ≥ λ_{α,m}. Similarly Σ_{j∈K} λ_j t_{j,m} ≤ λ_{1-β,m}. Thus the inequalities (2.3) and (2.4) result respectively in (2.6) and (2.7) of the mathematical programming problem GP(m), whose formulation is given below. Here "G" stands for the word "general" and "P" stands for the

Page 309: Semi-Infinite Programming


word " problem" . m, the upper bound on the number of component failures, is the positive integer parameter.

GP(m):

z*(m) =  min    Σ_{j∈K} c_j t_{j,m}                                        (2.5)
         s.t.   min_{λ ∈ p(R_0)}  Σ_{j∈K} λ_j t_{j,m}  ≥  λ_{α,m}          (2.6)
                max_{λ ∈ p(R_1)}  Σ_{j∈K} λ_j t_{j,m}  ≤  λ_{1-β,m}        (2.7)
                t_{j,m} ≥ 0,   j ∈ K.                                      (2.8)

As it can be observed, (2.6) is a minimization in λ and (2.7) is a maximization in λ now. This change follows from the inversion of ψ_m(y) with respect to y, and our desire of forcing the constraints (1.2) and (1.3) to hold for any component failure rate vector, which forces also Σ_{j∈K} λ_j t_{j,m} ≥ λ_{α,m} and Σ_{j∈K} λ_j t_{j,m} ≤ λ_{1-β,m} to hold for all λ vectors in p(R_0) and in p(R_1) respectively. In this formulation we assume that p(R_0) and p(R_1) are nonempty. Otherwise, the formulation is infeasible.

We denote the solution of GP(m) by (t*_{1,m}, t*_{2,m}, ..., t*_{k,m}). These are the minimum cost component test times for a given value of m, and z*(m) is the associated total test cost. As a result, the minimum total test cost is z* = z*(m*) = min{ z*(m) : m ∈ N }; and it is obtained by solving GP(m) parametrically with respect to m. Then the optimal component test times, which are referred to as (t*_1, t*_2, ..., t*_k), are a solution of GP(m*). Note that t*_{j,m*} = t*_j for any component j by definition.
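The thresholds λ_{α,m} and λ_{1-β,m} appearing in GP(m) are obtained by inverting the strictly decreasing function ψ_m. A small sketch (hypothetical α, β; SciPy used as the root finder, not part of the original text):

import numpy as np
from scipy.optimize import brentq
from scipy.stats import poisson

def lam_gamma(gamma, m):
    # Solve psi_m(y) = P[Poisson(y) <= m] = gamma for y (psi_m is strictly decreasing in y).
    return brentq(lambda y: poisson.cdf(m, y) - gamma, 1e-12, 1e3)

alpha, beta = 0.05, 0.10                 # hypothetical significance levels
for m in range(4):
    la = lam_gamma(alpha, m)             # lambda_{alpha,m}
    lb = lam_gamma(1 - beta, m)          # lambda_{1-beta,m}
    print(m, la, lb, lb / la)            # the ratio lb/la increases toward 1 as m grows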

We refer to inequalities (2.6) and (2.7) as type I and type II inequalities and the two optimization problems describing their left-hand sides as type I and type II problems. Clearly p(R_0) and p(R_1) denote the feasible solution sets of these problems. As it can be seen, type I and type II problems in their forms given as the left-hand side of inequalities (2.6) and (2.7) are two optimization problems in component failure rates once component test times are known as the coefficients of their objective functions. Although they both have linear objective functions with λ_j, j ∈ K, as the decision variables, the feasible solution sets p(R_0) and p(R_1) may not have nice structures. They depend heavily on the reliability function R(λ) and the prior information. In their most general form type I and type II problems are nonconvex optimization problems with

Page 310: Semi-Infinite Programming


a linear objective function. Let us consider the following equivalent formulation of GP(m), which is called P(m).

P(m):

z*(m) =  min    Σ_{j∈K} c_j t_{j,m}                                        (2.9)
         s.t.   Σ_{j∈K} λ_j t_{j,m}  ≥  λ_{α,m}      for all λ ∈ p(R_0)    (2.10)
                Σ_{j∈K} λ_j t_{j,m}  ≤  λ_{1-β,m}    for all λ ∈ p(R_1)    (2.11)
                t_{j,m} ≥ 0,   j ∈ K.                                      (2.12)

This is a semi-infinite linear program, since it has infinitely many constraints, and finitely many, k, variables. The sets of constraints (2.10) and (2.11) describe two cones each of which has infinitely many inequalities. In other words, the feasible solution set of P(m), or equivalently the feasible solution set of GP(m), consists of the intersection of two cones described by infinitely many linear inequalities, and the positive orthant.

P(m), or equivalently GP(m), is not feasible for all m. The existence of a feasible m, namely an m for which P(m) is solvable with respect to test times, and m*, depends on the values of R_0 and R_1. A sufficient condition is M_II / M_I < 1, where M_I = min_{λ∈p(R_0)} Σ_{j∈K} λ_j and M_II = max_{λ∈p(R_1)} Σ_{j∈K} λ_j. Namely, once α, β > 0 and α + β < 1 are selected, optimal component test times exist for any R_0 and R_1 in the interval (0,1) such that M_II < M_I. This sufficient condition uses the fact that {λ_{1-β,m} / λ_{α,m}}_{m=0}^∞ is a strictly increasing sequence converging to 1 from the left when α, β > 0, and α + β < 1 [2]. Details, including the proofs and counter examples for the necessity of this condition, can be found in an earlier work [4]. Another result on the feasibility of P(m) is the stability of the feasibility with respect to m: it is true that once P(m) has a solution then P(m') has also a solution for any m' such that m' > m [4].

It is clear that the system reliability R(λ) is decreasing as λ is increasing, where λ'' ≥ λ', say, if λ''_j ≥ λ'_j for all j ∈ K. Moreover, R(0) = 1 and R(+∞) = 0 whenever λ equals 0 or +∞ identically. This is quite intuitive for coherent systems since as the failure rate increases, the reliability of that component as well as the system reliability decreases. As a consequence the semi-infinite linear programming

Page 311: Semi-Infinite Programming


problem whose solution is necessary to compute optimum component test plans is applicable to coherent systems with any topology.

As we have already mentioned, for the situation where we do not know anything about component failure rates a priori, the only restriction is due to the system reliability constraints. When this is true and the sum rule is used as the acceptance criterion, the existence of a test plan is not guaranteed for every system topology; for example, for parallel systems no test plan exists in these conditions. This is because the type II problem is unbounded and thus GP(m) is infeasible for any value of m [1]. However, this drawback vanishes when the prior information is given as a non-empty compact subset of nonnegative real numbers, simple upper bounds on component failure rates for example. This forces p(R_1), which is the intersection of Λ_II and I, to be a closed and bounded, namely compact, subset of nonnegative real numbers. This prevents any of the failure rates from becoming arbitrarily large and guarantees a finite optimum value for the type II problem.

3 SOLUTION PROCEDURE

In the previous section we formulated the problem of determining an optimal test plan as a parametric semi-infinite linear programming problem. To achieve an optimal solution procedure we first describe an algorithm which computes minimum cost component test times for a given value of the integer parameter m. This will become clear in the next section where the new model is explained. Since details on the proofs of correctness and convergence can be found in an earlier paper [4], we only present the idea and give a formal listing of the procedure. Although this solution procedure is for systems with independent components, it can be generalized easily for systems with dependent components.

Let us define two index sets, F_I and F_II, in order to label feasible failure rates from p(R_0) and p(R_1). In other words, any failure rate vector with an index from F_I is in p(R_0), and any failure rate vector with an index from F_II is in p(R_1). Hence, if these labelled failure rate vectors are denoted by f^I_i and f^II_i, then f^I_i for any index i from F_I, and f^II_i for any index i from F_II, are feasible solutions for the type I and type II problems respectively. Let us now consider the following primal linear program PP(m) and its dual DP(m).

Page 312: Semi-Infinite Programming


PP(m):

z_P(m) =  min    Σ_{j∈K} c_j t_{j,m}                                               (3.1)
          s.t.   Σ_{j∈K} f^I_{ij} t_{j,m}  ≥  λ_{α,m},    i ∈ F_I    (dual variable π_i)    (3.2)
                 Σ_{j∈K} f^II_{ij} t_{j,m} ≤  λ_{1-β,m},  i ∈ F_II   (dual variable δ_i)    (3.3)
                 t_{j,m} ≥ 0,   j ∈ K                                              (3.4)

DP(m):

z_D(m) =  max    λ_{α,m} Σ_{i∈F_I} π_i  -  λ_{1-β,m} Σ_{i∈F_II} δ_i                (3.5)
          s.t.   Σ_{i∈F_I} f^I_{ij} π_i  -  Σ_{i∈F_II} f^II_{ij} δ_i  ≤  c_j,   j ∈ K       (3.6)
                 π_i ≥ 0,   i ∈ F_I                                                (3.7)
                 δ_i ≥ 0,   i ∈ F_II                                               (3.8)

If F_I and F_II can be chosen in such a way that a set of optimal component test times which solves P(m) to optimality is contained in the solution set of PP(m), then PP(m), or its dual DP(m), can be solved to compute these test times instead of solving GP(m) given by (2.5)-(2.8).

The algorithmic idea is based on this argument and combines the well-known cutting plane method with the well-known column generation technique. Starting with empty F_I and F_II, or equivalently an unconstrained PP(m), we keep generating new linear inequalities and solving PP(m) until a near optimal, or more precisely an arbitrarily close to optimal, solution is obtained. Since the addition of a new constraint to the constraint set of PP(m) is equivalent to the addition of a new variable to its dual DP(m), and PP(m) can have a very large constraint set, instead of solving PP(m) from scratch we prefer updating the solution of DP(m) by using the revised simplex algorithm in order to save on the computational effort. Recall that c_j ≥ 0 for any component j. Then, by letting s_j denote the slack variable for the j-th row of DP(m) it can be seen that s_j = c_j for all j, π_i = 0 for any index i ∈ F_I, δ_i = 0 for any index i ∈ F_II

Page 313: Semi-Infinite Programming


is a basic feasible solution for any given value of m. In other words, DP(m) is feasible for any m, and the |K| × |K| identity matrix can be a starting basis for any given value of m.

For a feasible maximizing type linear programming problem the simplex algorithm stops if and only if the reduced costs are all non-positive. In other words, if the linear program is max{ h^T x : Ax ≤ b, x ≥ 0 } (here h is used to denote the cost vector in order to prevent any confusion with the unit test cost vector c used in the previous formulations), then the simplex algorithm stops if and only if min{ z_j - h_j : for every nonbasic j } ≥ 0. Note that h_j - z_j is the reduced cost associated with the j-th nonbasic variable, namely the j-th nonbasic column.

Let us consider DP(m) for a given set of columns with indices F_I and F_II and assume that it is bounded. We observe that the index of a nonbasic column can be either in F_I or in F_II. In addition, h_j = λ_{α,m} for all j ∈ F_I and h_j = -λ_{1-β,m} for all j ∈ F_II. Then, by denoting an optimal dual solution of DP(m) by (w*_{1,m}, w*_{2,m}, ..., w*_{k,m}) and using the fact that w*_{1,m} = t*_{1,m}, w*_{2,m} = t*_{2,m}, ..., w*_{k,m} = t*_{k,m}, we can write the stopping condition as

Σ_{j∈K} f^I_{ij} t*_{j,m}  ≥  λ_{α,m}   for all i ∈ F_I     and     Σ_{j∈K} f^II_{ij} t*_{j,m}  ≤  λ_{1-β,m}   for all i ∈ F_II.

Hence, if we slightly modify this stopping condition in order to consider not only the nonbasic columns from F_I and F_II but also all possible nonbasic columns, which are to be generated from the two feasible failure rate sets p(R_0) and p(R_1), the simplex algorithm stops if and only if the type I and type II constraints given as inequalities (2.6)-(2.7) in the original formulation of GP(m) are satisfied, or equivalently, if and only if

min_{λ ∈ p(R_0)}  Σ_{j∈K} t_{j,m} λ_j  ≥  λ_{α,m}     and     max_{λ ∈ p(R_1)}  Σ_{j∈K} t_{j,m} λ_j  ≤  λ_{1-β,m}.

This condition requires the solution of two optimization problems with respect to failure rates, whose objective coefficients are the component test times which currently solve PP(m) to optimality.

We present this procedure more formally as the algorithm given below (see Figure 1). We define PP_i(m) and DP_i(m) as the linear programs PP(m) and its dual DP(m) at iteration i of the algorithm, and z_{PP,i}(m) and z_{DP,i}(m) their optimum objective values. We also define the optimum objective values of the

Page 314: Semi-Infinite Programming


type I and type II problems at iteration i as z*_{I,i}(m) and z*_{II,i}(m) and their objective coefficients as w_{i,m} = (w_{i1,m}, w_{i2,m}, ..., w_{ik,m}), which is an optimal dual solution of DP_i(m) or equivalently the component test times solving PP_i(m) to optimality. We denote these test times by t_{i,m} = (t_{i1,m}, t_{i2,m}, ..., t_{ik,m}). Finally we let f^I_i and f^II_i be optimal solutions of the type I and type II problems. They are the new columns generated at iteration i in order to update B_i^{-1}, the inverse of the basis matrix of DP_i(m). Then, based on these definitions

z*_{I,i}(m) = min_{λ ∈ p(R_0)} Σ_{j∈K} w_{ij,m} λ_j = min_{λ ∈ p(R_0)} Σ_{j∈K} t_{ij,m} λ_j = Σ_{j∈K} t_{ij,m} f^I_{ij}

and

z*_{II,i}(m) = max_{λ ∈ p(R_1)} Σ_{j∈K} w_{ij,m} λ_j = max_{λ ∈ p(R_1)} Σ_{j∈K} t_{ij,m} λ_j = Σ_{j∈K} t_{ij,m} f^II_{ij}.

At iteration i, if DP_i(m) is bounded and the condition z*_{I,i}(m) ≥ λ_{α,m} and z*_{II,i}(m) ≤ λ_{1-β,m} holds, then t_{i,m} = (t_{i1,m}, t_{i2,m}, ..., t_{ik,m}) is an optimal solution of PP(m) and GP(m); thus, the algorithm stops with z_{DP,i}(m) = z_{PP,i}(m) = z*(m). If DP_i(m) is bounded and either z*_{I,i}(m) < λ_{α,m} or

z*_{II,i}(m) > λ_{1-β,m}, or both are true, then either Σ_{j∈K} f^I_{ij} t_{ij,m} ≥ λ_{α,m} or Σ_{j∈K} f^II_{ij} t_{ij,m} ≤ λ_{1-β,m}, or both, are violated, and the basis inverse B_i^{-1} is updated by pivoting on f^I_i or on f^II_i, or on both. Consequently it is possible to see that if the algorithm stops in finitely many iterations either the infeasibility of P(m) is detected or an optimal solution is computed. It can be shown that if the algorithm does not stop in finitely many steps then the sequence {w_{i,m}}_{i=1}^∞, generated by solving the dual problem DP_i(m) at each iteration i, converges to an optimal solution of GP(m), i.e. to t*_m [4]. Therefore, it is possible to stop the algorithm in finitely many iterations by replacing the stopping condition of step 4 with the following ε-perturbed one, which is referred to as the "ε-stopping condition":

If ( z*_{I,i}(m) ≥ λ_{α,m} - ε  and  z*_{II,i}(m) ≤ λ_{1-β,m} + ε ) then STOP.

Here ε is an arbitrarily small positive number.

In his work, Hu has provided a cutting plane algorithm to solve a class of semi-infinite linear programs [13]. His algorithm solves a primal linear program at each iteration to generate the sequence of primal solutions, which we do in our algorithm by solving DP_i(m).

Page 315: Semi-Infinite Programming


Algorithm:
Input:  R_0, R_1, λ_{α,m}, λ_{1-β,m}, c, ε
Output: t_{j,m}, j ∈ K, or "INFEASIBLE m" as a message
begin
  1. w_{1j,m} ← 0, j ∈ K;  B_1^{-1} ← I_{k,k};  z*(m) ← 0;  i ← 1;
  2. z*_{I,i}(m) ← min_{λ ∈ p(R_0)} Σ_{j∈K} w_{ij,m} λ_j, and call the opt. solution f^I_i;
  3. z*_{II,i}(m) ← max_{λ ∈ p(R_1)} Σ_{j∈K} w_{ij,m} λ_j, and call the opt. solution f^II_i;
  4. If (z*_{I,i}(m) ≥ λ_{α,m} and z*_{II,i}(m) ≤ λ_{1-β,m}) then
       STOP, w_{ij,m}, j ∈ K, are the optimum test times and z_{DP,i}(m) is the
       minimum total test cost for this value of m;
  5. else
     begin
       update B_i^{-1} with f^I_i, f^II_i as two new columns;
       update dual solution w_{ij,m}, j ∈ K;
       if DP_i(m) is UNBOUNDED
         then STOP, and output "INFEASIBLE m"
         else i ← i + 1, go to 2;
     end;
end;

Figure 1 Column Generation Algorithm to Solve P(m)
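Below is a compact sketch of this procedure for the special case of a series system with box prior I = {0 ≤ λ_j ≤ u_j}, where the type I and type II pricing problems are themselves small linear programs. It departs from the listing above in one respect, named explicitly: the restricted primal PP(m) is re-solved directly at every iteration (with SciPy's linprog) instead of updating the dual basis by a revised simplex step. All data are hypothetical.

import numpy as np
from scipy.optimize import linprog, brentq
from scipy.stats import poisson

def lam_gamma(gamma, m):
    return brentq(lambda y: poisson.cdf(m, y) - gamma, 1e-12, 1e3)

def solve_Pm(m, c, u, R0, R1, alpha, beta, eps=1e-6, max_iter=100):
    k = len(c)
    la, lb = lam_gamma(alpha, m), lam_gamma(1 - beta, m)
    FI, FII = [], []                       # generated failure-rate vectors (cuts)
    t = np.zeros(k)                        # current test times (unconstrained master)
    for _ in range(max_iter):
        # Type I pricing:  min  t^T lambda  s.t.  sum lambda >= -ln R0,  0 <= lambda <= u
        pI = linprog(t, A_ub=[-np.ones(k)], b_ub=[np.log(R0)], bounds=[(0, uj) for uj in u])
        # Type II pricing: max  t^T lambda  s.t.  sum lambda <= -ln R1,  0 <= lambda <= u
        pII = linprog(-t, A_ub=[np.ones(k)], b_ub=[-np.log(R1)], bounds=[(0, uj) for uj in u])
        if (not pI.success) or (not pII.success):
            return None, None              # a pricing problem is infeasible or unbounded
        zI, zII = pI.fun, -pII.fun
        if zI >= la - eps and zII <= lb + eps:
            return t, c @ t                # epsilon-stopping condition satisfied
        if zI < la - eps:
            FI.append(pI.x)
        if zII > lb + eps:
            FII.append(pII.x)
        # Master PP(m):  min c^T t  s.t.  f^T t >= la (f in FI),  f^T t <= lb (f in FII),  t >= 0
        A_ub = [-np.asarray(f) for f in FI] + [np.asarray(f) for f in FII]
        b_ub = [-la] * len(FI) + [lb] * len(FII)
        master = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(0, None)] * k)
        if not master.success:
            return None, None              # PP(m) infeasible: this m is infeasible
        t = master.x
    return t, c @ t                        # max_iter reached (should not happen here)

# Hypothetical series system with 3 components.
c = np.array([1.0, 2.0, 1.5])              # unit test costs
u = np.array([0.2, 0.2, 0.2])              # prior upper bounds on failure rates
R0, R1, alpha, beta = 0.90, 0.98, 0.05, 0.10
for m in range(5):                          # crude outer search over m
    t, z = solve_Pm(m, c, u, R0, R1, alpha, beta)
    print(m, "infeasible" if t is None else (z, np.round(t, 2)))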

The last step in the development of a solution procedure for the computation of optimum component test plans is the search for the best value of m. It is known from Charnes and Cooper that the optimal objective value of a problem in the form max{ z = c^T x : Ax ≤ b, x ≥ 0 } is a convex function of the cost vector c and a concave function of the requirement vector b [7]. Moreover it is finitely piecewise. Hence, the optimal objective value of DP(m) is a convex function in λ_{α,m} and λ_{1-β,m} when m is fixed. Note that at each iteration of the column generation procedure, a new value for z_{DP}(m) is obtained. In other words, if it is not stopped, the column generation procedure eventually generates a sequence {z_{DP,i}(m)}_{i=1}^∞ where i counts the iterations. This sequence is nondecreasing and eventually converges to z*(m) [4]. Since each z_{DP}(m) is a convex function in λ_{α,m} and λ_{1-β,m} when m is fixed, z*(m) can be interpreted as the pointwise supremum of an arbitrary collection of convex functions, which is known to be convex [16]. Therefore, z*(m) is also a convex function in λ_{α,m} and λ_{1-β,m} when m is fixed.


Unfortunately, this does not imply that the points of the set {z*(m) : m ∈ N} can be fit by a convex function of m, because λ_{α,m} and λ_{1-β,m} can be any discrete functions of m. In fact, one of them can behave mostly as a convex function while the other behaves as a concave function of m. However, based on the reported experimental results [1], λ_{α,m} and λ_{1-β,m} can be efficiently approximated by two linear functions of m, of the form p_1 + q_1 m and p_2 + q_2 m with q_1 > 0 and q_2 > 0. Consequently, we can state that z*(m) can be approximately fit by a convex function, and thus the search for m*, the value of m for which z*(m) < z*(m+1) holds for the first time, starting from m = 0 and using the column generation procedure for computing the z*(m) values, is a reasonable approach. By convention, we set z*(m) = ∞ for any value of m for which DP(m) is unbounded, or equivalently PP(m) is infeasible. Although it does not guarantee the optimum solution, stopping at the first minimizing m is certainly not a bad strategy.
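The search over m just described is simple enough to state as a short sketch. Here `z_star` is a hypothetical function wrapping the column generation procedure above and returning the minimum total test cost for a given m (math.inf when P(m) is infeasible); relying on the approximate convexity of z*(m), we stop at the first m with z*(m) < z*(m+1).

```python
# A small sketch of the search for m*, under the assumptions stated above.
import math

def search_m_star(z_star, m_max=100):
    z_next = z_star(0)
    for m in range(m_max + 1):
        z_m, z_next = z_next, z_star(m + 1)
        if math.isfinite(z_m) and z_m < z_next:
            return m, z_m                  # first (approximate) minimizer
    return None, math.inf                  # no feasible plan found up to m_max
```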

4 TESTING SYSTEMS WITH DEPENDENT COMPONENT FAILURES

A rather restrictive and unrealistic assumption of all the models introduced in the previous sections is the stochastic independence of the lifetimes of the components that make up the system. This assumption is hardly true in most cases. An interesting model of stochastic component dependence is due to Çınlar and Özekici, where stochastic dependence is introduced by a randomly changing common environment to which all components of the system are subjected [8]. This model is based on the simple observation that the aging or deterioration process of any component depends very much on the environment in which the component is operating. Consider, for example, a jet engine which is subjected to different sets of environmental conditions, such as mechanical vibrations, pressure, temperature and other atmospheric conditions, during takeoff, cruising and landing. It is quite clear that the stochastic structure of aging and deterioration changes as the environment changes, and it does not make much sense to measure the age of the component with respect to real time.

This fact can be formalized by constructing an intrinsic clock that ticks at different rates under different environmental conditions to measure the intrinsic age of the component. The intrinsic age can be substantially different from the real age, depending on the environments in which the component has operated. A natural choice in this regard is to measure the intrinsic age by the total random hazard accumulated in time, so that the component fails as soon as its


intrinsic age exceeds an exponentially distributed threshold with mean 1. The only complication is that the replacement age now depends on the environment.

In our present setting, we suppose that the components are stochastically dependent due only to their common environmental process. We first provide the mathematical formulation involving intrinsic aging and the random environmental process. Then we explain the results pertaining to optimal component test plans.

Consider a complex system consisting of k components and let L_j denote the lifetime of the j'th component while L represents the lifetime of the system. We assume that the system operates in a randomly changing environment depicted by X = {X_t; t ≥ 0}, where X_t is the state of the environment at time t. The environmental process X is an arbitrary stochastic process with some state space E, which is assumed to be discrete without loss of generality to simplify the notation. The probability law of X is given by P_t(x) = P[X_t = x] for x ∈ E and t ≥ 0.

If the environment is at some state x ∈ E, then component j fails exponentially with rate λ_j(x). In other words,

P[L_j > u | X_t = x, 0 ≤ t ≤ u] = exp(−λ_j(x) u)   (4.1)

for any component j. Note that the life distribution corresponding to L_j is not necessarily exponential, because the environment does not necessarily remain fixed at some state over time. Rather, it changes randomly in time, which in turn changes the failure rates of the components. We further suppose that the stochastic dependence among the components is due to the random environment only, and that the components are otherwise independent. This means

P[L_1 > u, L_2 > u, ..., L_k > u | X_t = x, 0 ≤ t ≤ u] = exp(−Σ_{j=1}^{k} λ_j(x) u)   (4.2)

so that the lifetimes are independent as long as the environment is fixed.

Using the terminology of Çınlar and Özekici [8], component j ages intrinsically with rate λ_j(x) in environment x, and its intrinsic age A_j(t) is given by the total hazard

A_j(t) = ∫_0^t λ_j(X_s) ds   (4.3)


and the component fails as soon as an exponential threshold with parameter 1 is exceeded; i.e.,

P[L_j > u | X] = exp(−A_j(u)) = exp(−∫_0^u λ_j(X_s) ds).   (4.4)

Let D_t(x) denote the total amount of time that the environment has been in state x up to time t, i.e., D_t(x) = ∫_0^t 1_x(X_u) du, where 1_z(y) is the indicator function which is equal to 1 if and only if z = y. Then equations (4.3) and (4.4) can be rewritten as

A_j(t) = Σ_{x∈E} λ_j(x) D_t(x)   (4.5)

and

P[L_j > u | X] = exp(−A_j(u)) = exp(−Σ_{x∈E} λ_j(x) D_u(x)),   (4.6)

respectively.

The conditional joint distribution is now given by

P[L_1 > u_1, L_2 > u_2, ..., L_k > u_k | X] = exp(−Σ_{j∈K} ∫_0^{u_j} λ_j(X_s) ds) = exp(−Σ_{j∈K} Σ_{x∈E} λ_j(x) D_{u_j}(x))   (4.7)

and this clearly explains the dependence of the lifetimes on the environment and, thus, among themselves.

Recall that we want the system to function during a mission in the time interval [0, s], which we take to be our unit time without loss of generality. Letting λ = {λ_j(x); j ∈ K, x ∈ E}, it follows from (4.7) that the system reliability function is again a function of λ, which can be denoted by R(λ) = P[L > 1] after rescaling the parameters of the environmental process and the failure rates accordingly, so that λ_j(x) is the failure rate per mission, during [0, s], of component j if the environment is fixed at x. Similarly, we let D(x) = D_s(x) be the total amount of time that the environment will be in state x during our mission that lasts for one time unit. Note that the form of R(λ) depends not only on the structure function of the system, but also on the probability law of the environmental process. So far, our formulation has been quite general in these regards. We now provide some specific examples to give a better idea of the form of the reliability function of systems whose components are dependent through their common operating environment.

EXAMPLE 1. Series system in a fixed environment. This is the model with independent components in some fixed environment x, where D(x) = 1 and

R(λ) = exp(−Σ_{j∈K} λ_j(x)).   (4.8)

EXAMPLE 2. Series system in a deterministically changing environment. Since all components must function throughout the mission, we have

R(λ) = exp(−Σ_{j∈K} Σ_{x∈E} λ_j(x) s_x)   (4.9)

where s_x = D(x) is the amount of time spent in environment x.

EXAMPLE 3. Series system in a randomly changing environment. This is the general case, which extends the previous example so that

R(λ) = E[exp(−Σ_{j∈K} Σ_{x∈E} λ_j(x) D(x))] = ∫···∫ P[D(x) ∈ ds_x; x ∈ E] exp(−Σ_{j∈K} Σ_{x∈E} λ_j(x) s_x).   (4.10)

EXAMPLE 4. Coherent system in a randomly changing environment. If the reliability system is coherent with structure function Φ and reliability function h(p_1, p_2, ..., p_k) = E[Φ(Y_1, Y_2, ..., Y_k)], where each Y_j is a binary random variable with P[Y_j = 1] = p_j for j ∈ K, then

R(λ) = E[h(exp(−Σ_{x∈E} λ_1(x) D(x)), ..., exp(−Σ_{x∈E} λ_k(x) D(x)))]
     = ∫···∫ P[D(x) ∈ ds_x; x ∈ E] · h(exp(−Σ_{x∈E} λ_1(x) s_x), ..., exp(−Σ_{x∈E} λ_k(x) s_x)).   (4.11)

Note that the system in Example 4 generalizes the first three. Similar to the statement made about the reliability function in the second section, it is possible to say that R(λ) is decreasing as λ is increasing, where we say λ'' ≥ λ' if λ''_j(x) ≥ λ'_j(x) for all j ∈ K and x ∈ E. Moreover, R(λ) = 1 and R(λ) = 0 whenever λ equals 0 or +∞ identically. Our discussion up to now sets the stage for designing optimal component test plans for complex systems with stochastically dependent component lifetimes.
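For a series system, R(λ) in Example 3 can be approximated by straightforward Monte Carlo simulation. The sketch below is illustrative only; `sample_occupation` is an assumed user-supplied routine returning one random draw of the occupation times {D(x), x ∈ E} over the unit-length mission, and the two-state sampler at the end is the setting used in Section 5.

```python
# Monte Carlo sketch of R(lambda) = E[exp(-sum_j sum_x lambda_j(x) D(x))].
import numpy as np

def estimate_series_reliability(lam, sample_occupation, n_samples=100_000,
                                seed=None):
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam)                     # shape (k, e): lambda_j(x)
    total_rate = lam.sum(axis=0)              # summed over components, per state
    acc = 0.0
    for _ in range(n_samples):
        D = np.asarray(sample_occupation(rng))    # occupation times, shape (e,)
        acc += np.exp(-np.dot(total_rate, D))
    return acc / n_samples

# Two-state environment that switches from state 1 to state 2 at an
# exponential time U with rate mu (cf. Section 5).
def two_state_occupation(mu):
    def sample(rng):
        u = min(rng.exponential(1.0 / mu), 1.0)
        return np.array([u, 1.0 - u])
    return sample
```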

The formulation of the testing problem is very similar to GP(m) of the second section. Here the set of environments E = {1, 2, ..., e} enters into the semi-infinite linear programming model as the index set of a second level of summation.


In other words, we take a snapshot of the system in each environment by duplicating, in a sense, the model GP(m). In addition to the set of components K and the set of environments E, let c_j(x) ≥ 0 be the cost per unit time of testing component j in environment x. We denote the prior information on component failure rates by I, which is this time the union of the environmental failure rate information sets. Namely,

I = ∪_{x∈E} I(x)

where I(x) is the prior information on component failure rates in environment x; we assume that it is a nonempty and compact subset of the nonnegative real vectors, and that there is no cost associated with it. An example of I(x) is {λ(x) ∈ ℝ^k_+ : λ_j(x) ≤ u_j(x), j ∈ K}, namely upper bounds on component failure rates in environment x are given. All of this information is assumed to be known, as well as the system reliability levels R_0, R_1 and the significance levels α and β. As can be guessed, the unknowns are m, the upper bound on the total number of failures occurring during component tests, t_{j,m}(x), the test time of component j in environment x for a given m, and λ_j(x), the constant failure rate of component j in environment x. However, only the optimum values of m and t_{j,m}(x), j ∈ K, x ∈ E, form an optimal test plan.

We assume that components have exponential lifetimes in environment x with constant failure rates λ_j(x) and fail independently within an environment. For a given m, component j is tested in environment x for t_{j,m}(x) time units with replacement, and the mission time is equal to the time unit. These assumptions first lead to the formulation of system reliability as a function of component failure rates, as shown in the first part of this section. They also play an important role in the formulation of the acceptance probability of the system.

P[accept the system] = P[Σ_{j∈K} N_j ≤ m]

where N_j is the number of component j failures. If we let N_j(x) denote the number of component j failures during its test in environment x, which takes t_{j,m}(x) time units for a given m, N_j can be represented as

N_j = Σ_{x∈E} N_j(x).

Since each component j has an exponential life in environment x with failure rate λ_j(x), components are tested with replacement, and component failures


are mutually independent as long as the environment is fixed, the numbers of failures in each environment x are independent random variables, each of which has a Poisson distribution with parameter λ_j(x) t_{j,m}(x). Thus N_j = Σ_{x∈E} N_j(x) has a Poisson distribution with parameter Σ_{x∈E} λ_j(x) t_{j,m}(x), and therefore N = Σ_{j∈K} N_j has a Poisson distribution with parameter Σ_{j∈K} Σ_{x∈E} λ_j(x) t_{j,m}(x). Briefly, once component test times are known, the system acceptance probability is a function of the component failure rates. Note that the dimension of the failure rate vectors is this time not k but k × e, the number of components multiplied by the number of possible environments. This is a natural consequence of the copies taken for each environment, which causes the use of a second summation in the mathematical model.
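Since the total number of failures N is Poisson, the acceptance probability is just a Poisson cumulative distribution function evaluated at m; a minimal sketch (the function name is ours) is:

```python
# P[sum_j N_j <= m] for given per-mission failure rates and test times.
import numpy as np
from scipy.stats import poisson

def acceptance_probability(lam, t, m):
    """lam, t: arrays of shape (k, e) with failure rates lambda_j(x) and
    test times t_j(x); returns the Poisson CDF at m."""
    mean_failures = float(np.sum(np.asarray(lam) * np.asarray(t)))
    return poisson.cdf(m, mean_failures)
```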

The rest of the formulation uses exactly the principles which we have applied and the steps we have followed while developing the semi-infinite linear programming model GP(m) of the second section. The new forms of GP(m) and P(m) are given below.

GP(m) :

z*(m) = min Σ_{x∈E} Σ_{j∈K} c_j(x) t_{j,m}(x)   (4.12)

s.t.

min_{λ∈p(R_0)} Σ_{x∈E} Σ_{j∈K} λ_j(x) t_{j,m}(x) ≥ λ_{α,m}   (4.13)

max_{λ∈p(R_1)} Σ_{x∈E} Σ_{j∈K} λ_j(x) t_{j,m}(x) ≤ λ_{1-β,m}   (4.14)

t_{j,m}(x) ≥ 0,  j ∈ K, x ∈ E.   (4.15)

P(m) :

z*(m) = min Σ_{x∈E} Σ_{j∈K} c_j(x) t_{j,m}(x)   (4.16)

s.t.

Σ_{x∈E} Σ_{j∈K} λ_j(x) t_{j,m}(x) ≥ λ_{α,m}   ∀ λ ∈ p(R_0)   (4.17)

Σ_{x∈E} Σ_{j∈K} λ_j(x) t_{j,m}(x) ≤ λ_{1-β,m}   ∀ λ ∈ p(R_1)   (4.18)

t_{j,m}(x) ≥ 0,  j ∈ K, x ∈ E.   (4.19)
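One simple way to approximate P(m) numerically, complementary to the column generation procedure, is to impose the constraints only on finite samples of p(R_0) and p(R_1) and solve the resulting LP. The sketch below is our own discretization illustration (not the chapter's algorithm); `samples_R0` and `samples_R1` are assumed finite arrays of failure-rate vectors of dimension d = k·e drawn from the two index sets.

```python
# Discretization sketch for the semi-infinite program P(m).
import numpy as np
from scipy.optimize import linprog

def solve_discretized_P(c, samples_R0, samples_R1, lam_alpha, lam_1mbeta):
    c = np.asarray(c, dtype=float)               # costs c_j(x), flattened
    A0 = np.asarray(samples_R0, dtype=float)     # rows: lambda in p(R0)
    A1 = np.asarray(samples_R1, dtype=float)     # rows: lambda in p(R1)
    # lambda^T t >= lam_alpha  is rewritten as  -lambda^T t <= -lam_alpha
    A_ub = np.vstack([-A0, A1])
    b_ub = np.concatenate([-lam_alpha * np.ones(len(A0)),
                           lam_1mbeta * np.ones(len(A1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(c))
    return (res.x, res.fun) if res.success else (None, np.inf)
```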


A test plan consists of k × e component test times, since each component is tested in e different environments. p(R_0) and p(R_1) are compact subsets of ℝ^d, where d = k × e, whereas previously d = k, and the column generation algorithm of the second section can easily be adapted to solve this new semi-infinite linear program.

We denote the solution of P(m) by (t*_{1,m}(1), t*_{1,m}(2), ..., t*_{1,m}(e); t*_{2,m}(1), t*_{2,m}(2), ..., t*_{2,m}(e); ...; t*_{k,m}(1), t*_{k,m}(2), ..., t*_{k,m}(e)). These are the minimum cost component test times for a given value of m, and z*(m) is the associated total test cost. As a result, the minimum total test cost is z* = z*(m*) = min {z*(m) : m ∈ N}, and it is obtained by solving P(m) parametrically with respect to m. This can be done by using the same search procedure, which is based on the approximate convexity of z*(m). Then the optimal component test times, which are referred to as (t*_1(1), t*_1(2), ..., t*_1(e); t*_2(1), t*_2(2), ..., t*_2(e); ...; t*_k(1), t*_k(2), ..., t*_k(e)), are given by a solution of P(m*).

We would like to close this section by providing the new formulations of the primal problem PP(m) and its dual DP(m). Recall that they are used in the development of the column generation method. F_I and F_II denote the index sets associated with the feasible solutions of the type I and type II problems, and f^I_i, i ∈ F_I, and f^{II}_i, i ∈ F_II, the feasible failure rate vectors. Their dimension is d = k × e, which becomes clear in their formulations below.

PP(m) :

min Σ_{x∈E} Σ_{j∈K} c_j(x) t_{j,m}(x)   (4.20)

s.t.

Σ_{x∈E} Σ_{j∈K} f^I_{ij}(x) t_{j,m}(x) ≥ λ_{α,m},  i ∈ F_I   (dual var. π_i)   (4.21)

Σ_{x∈E} Σ_{j∈K} f^{II}_{ij}(x) t_{j,m}(x) ≤ λ_{1-β,m},  i ∈ F_II   (dual var. θ_i)   (4.22)

t_{j,m}(x) ≥ 0,  j ∈ K, x ∈ E   (4.23)


DP(m) :

z_D(m) = max λ_{α,m} Σ_{i∈F_I} π_i − λ_{1-β,m} Σ_{i∈F_II} θ_i   (4.24)

s.t.

Σ_{i∈F_I} f^I_{ij}(x) π_i − Σ_{i∈F_II} f^{II}_{ij}(x) θ_i ≤ c_j(x),  j ∈ K, x ∈ E   (4.25)

π_i ≥ 0,  i ∈ F_I   (4.26)

θ_i ≥ 0,  i ∈ F_II   (4.27)

5 A SERIES SYSTEM WORKING IN A RANDOM ENVIRONMENT

Let us consider a series system of k components working within a two-state random environment changing according to an exponential distribution with parameter μ. In other words, X_t = 1 for 0 ≤ t < min(1, U) and X_t = 2 for min(1, U) ≤ t ≤ 1, where U is an exponentially distributed random variable with mean 1/μ. This is clearly a special case of Example 3 in the previous section. Then the system reliability function becomes:

R(λ) = exp[−(μ + Σ_{j∈K} λ_j(1))] + [μ exp(−Σ_{j∈K} λ_j(2)) / (μ + Σ_{j∈K}(λ_j(1) − λ_j(2)))] (1 − exp[−(μ + Σ_{j∈K}(λ_j(1) − λ_j(2)))]) .

Recall that the reliability function for a series system with independent component lifetimes is R(λ) = exp(−Σ_{j∈K} λ_j). The complicating effect of the environmental process is clear.
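The two-state reliability expression reconstructed above is easy to evaluate numerically; the sketch below implements it (the handling of the degenerate case μ + Σ(λ_j(1) − λ_j(2)) = 0 is our own choice) and can be cross-checked against the Monte Carlo sketch of the previous section.

```python
# Closed-form reliability of a series system in the two-state exponential
# environment, as reconstructed above.
import numpy as np

def series_two_state_reliability(lam1, lam2, mu):
    """lam1, lam2: per-mission failure rates lambda_j(1), lambda_j(2)."""
    s1, s2 = float(np.sum(lam1)), float(np.sum(lam2))
    d = mu + s1 - s2
    if abs(d) < 1e-12:                       # limiting case of equal exponents
        return np.exp(-(mu + s1)) + mu * np.exp(-s2)
    return np.exp(-(mu + s1)) + mu * np.exp(-s2) * (1.0 - np.exp(-d)) / d
```

As a sanity check, if Σ_j λ_j(1) = Σ_j λ_j(2) = C the function returns exp(−C), in agreement with the observation made below.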

We assume that the a priori information includes upper bounds on the failure rates of the components for each environment, and the additional information that the sum of component failure rates in each environment is equal to a constant C. This can be formalized as follows:

I = I(1) ∪ I(2)

where

I(x) = {(λ_1(x), λ_2(x), ..., λ_k(x)) : 0 ≤ λ_j(x) ≤ u_j(x), j ∈ K, Σ_{j∈K} λ_j(x) = C}.


This provides us with the computational simplicity that

R(λ) = exp(−C) .

In other words, when the equality of the sum of failure rates is also considered as part of the prior information, column generation becomes simple. However, when the prior information consists of only simple upper bounds on component failure rates, the type I and type II problems are both non-convex optimization problems and column generation is considerably more difficult.

Since the column generation algorithm works under mild assumptions on the solution sets p(R_0) and p(R_1), such as their compactness, the computation of component test times depends only on the existence of solution procedures for the type I and type II problems. As a consequence, the column generation algorithm also works without the constant total failure rate prior information, as long as the type I and type II problems are solvable.

As a numerical illustration we consider a series system of three components working in a two-state exponential environment. We arbitrarily choose α = 0.05, β = 0.05, R_0 = 0.8, R_1 = 0.9, component test costs (c_1(1), c_1(2); c_2(1), c_2(2); c_3(1), c_3(2)) = (3.25, 15.35; 10.39, 1720; 250, 2298), and upper bounds (u_1(1), u_1(2); u_2(1), u_2(2); u_3(1), u_3(2)) = (0.15, 0.1; 0.2, 0.09; 0.05, 0.21). Then we compute total test costs and component test plans for at most 19 and 20 total component failures. These costs are z*(19) = 536463.4 and z*(20) = 557713.0. For 0 ≤ m < 19 we halt with unboundedness at some iteration of the algorithm, implying that there is no feasible component test plan. Based on the search strategy, which uses the approximate convexity of z*(m) with respect to m, we can say that m* = 19, since it guarantees the lowest total test cost. Consequently, the optimum component test times are (t*_1(1), t*_1(2); t*_2(1), t*_2(2); t*_3(1), t*_3(2)) = (125.76, 125.77; 125.43, 124.30; 123.33, 125.41); they are computed for m = 19.

Note that the optimum component test times are insensitive to unit test costs. This has been observed for series systems before, in the case where there is no prior information available on component failures [9,14]. The reason is mainly the symmetry of the feasible solution sets of the type I and type II problems. However, in the situation when only upper bounds on component failure rates are available as prior information, test times become sensitive to unit costs; in fact, not only to unit costs but also to the upper bounds. Numerical examples where the optimum component test times have highly variable values are reported for series systems working in a single environment in [1]. In our case here the additional information that the sum of component failure rates in each environment is equal to a constant C, which eases column generation, causes a remarkable amount


of symmetry in the structure of the feasible solution sets of the type I and type II problems. As a consequence, the optimum component test times have close values and they are insensitive to unit test costs, as expected.

6 CONCLUSIONS

In this work, we first explained our semi-infinite linear programming formulation of the reliability testing problem for systems with independent component lifetimes and presented a solution procedure which combines the well-known cutting plane idea with the well-known column generation technique. Then, we have shown that both the formulation and the solution procedure can be easily extended to cover situations where systems have dependent component lifetimes.

The final semi-infinite linear program P(m) has a special structure. The feasible solution set consists of the intersection of two convex cones, each of which is described by infinitely many inequalities. In other words, the constraints can be grouped into two sets according to their right-hand sides, since there are exactly two distinct values. As a consequence, the use of the column generation technique on the dual problem resulted in a solution procedure which generates an infinite sequence of optimum primal solutions (dual solutions of the dual problem) converging asymptotically to the same point as the sequence of primal optimum solutions. This idea can be applied to solving semi-infinite linear programs whose feasible solution sets consist of the intersection of finitely many convex cones, each of which is described by infinitely many inequalities. Besides, an approximation scheme based on the number of distinct right-hand sides also deserves further investigation.

The column generation procedure, which solves the semi-infinite linear program P(m) to compute a set of minimum cost test times for a given m, is general in the sense that working on the dual DP(m) guarantees convergence to the optimum solution of GP(m) as long as p(R_0) and p(R_1) are both nonempty and the a priori information is available in the form of a compact subset. Nevertheless, generality does not necessarily mean applicability. Columns can be generated only when the type I and type II problems are solvable. This is possible only for systems whose reliability functions can be expressed analytically as a function of component failure rates, e.g., series systems or serial connections of redundant systems, because they have simpler reliability functions which provide nice forms for p(R_0) and p(R_1), namely the feasible solution sets of the type I and type II


problems. In short, for any type of prior information which guarantees that p(R_0) and p(R_1) are nonempty compact subsets of the nonnegative real vectors, the column generation procedure computes component test times for any coherent system whose reliability function has a closed analytical form providing solvable formulations for the type I and type II problems. How to generate the new columns when the system has a more general structure, and/or works in more general environments, is still an open research question.

REFERENCES

[1] İ. K. Altınel. System Based Component Test Problem: The Design of Optimum Component Test Plans. PhD thesis, University of Pittsburgh, Pittsburgh, PA 15261, December 1990.

[2] İ. K. Altınel. A note on the parameter of Poisson probability distribution. Technical Report 90-21, Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, PA 15261, September 1990.

[3] İ. K. Altınel. The design of optimum component test plans in the demonstration of a series system reliability. Computational Statistics and Data Analysis, 14:281-292, 1992.

[4] İ. K. Altınel. The design of optimum component test plans in the demonstration of system reliability. European Journal of Operational Research, 78:97-115, 1994.

[5] İ. K. Altınel and S. Özekici. Optimum component test plans for systems with dependent components. Technical Report FBE-IE-01/95-03, Department of Industrial Engineering, Boğaziçi University, Bebek, İstanbul 80815, May 1995.

[6] İ. K. Altınel and S. Özekici. A dynamic model for component testing. Naval Research Logistics, 44:187-197, 1997.

[7] A. Charnes and W. Cooper. System evaluation and repricing theorems. Management Science, 9:209-228, 1962.

[8] E. Çınlar and S. Özekici. Reliability of complex devices in random environments. Probability in the Engineering and Informational Sciences, 1:97-115, 1987.

[9] R. G. Easterling, M. Mazumdar, F. W. Spencer, and K. V. Diegert. System based component test plans and operating characteristics: binomial data. Technometrics, 33:287-298, 1991.

[10] S. Gal. Optimal test design for reliability demonstration. Operations Research, 22:1236-1242, 1974.

[11] I. B. Gertsbakh. Statistical Reliability Theory. New York: Marcel Dekker, 1989.

[12] B. V. Gnedenko, Y. K. Belyayev, and A. D. Solovyev. Mathematical Methods of Reliability Theory. New York: Academic Press, 1969.


[13] H. Hu. A one-phase algorithm for semi-infinite linear programming. Mathematical Programming, 46:85-103, 1990.

[14] M. Mazumdar. An optimum procedure for component testing in the demonstration of series system reliability. IEEE Transactions on Reliability, R-26:342-345, 1977.

[15] M. Mazumdar. An optimum component testing procedure for a series system with redundant subsystems. Technometrics, 22:23-27, 1980.

[16] R. T. Rockafellar. Convex Analysis, 2nd printing. Princeton, NJ: Princeton University Press, 1972.


10 SEMI-INFINITE PROGRAMMING IN ORTHOGONAL WAVELET FILTER DESIGN

K. O. Kortanek1 and Pierre Moulin2

1 University of Iowa, College of Business Administration, Iowa City, IA 52242, USA,
Email: [email protected]

2 University of Illinois, Beckman Institute, 405 N. Mathews Ave., Urbana, IL 61801, USA,
Email: [email protected]

ABSTRACT

Quadrature mirror filters can be introduced by studying algebraic properties of discrete input signals, viewed as square summable infinite sequences, together with linear operators, termed filters, which act linearly through convolution on the input signals. The algebra is readily revealed through operations on complex variable transfer functions which are formally associated with the filters. By using the space of transfer functions, one can define how a signal can be decomposed into two subsidiary signals which can then be combined, using only certain linear mappings throughout the entire process. Ideally, the original signal may be recovered, and in this case the term perfect reconstruction is used. In addition to ideal recovery, it is desired to impose orthogonality conditions on the subsidiary signals. These constraints imply that one need only concentrate on the optimality of one of the subsidiary signals to guarantee the optimality of the other. The orthogonality properties are associated with the term quadrature mirror filters.

We describe what optimality can mean within this special structure by making a transition to a description of statistical properties which are reasonably representative of the actual transmission of discrete signals and their recovery after transmission. The concept of coding gain is reviewed, and we show how this objective function may be combined with constraining relations on the coefficients of the (chosen) primary filter necessary to formally guarantee perfect reconstruction. The constraints lead to a nonlinear transformation of the original filter coefficients to variables that appear in an equivalent linear semi-infinite programming problem developed by the second author. We show how an optimal solution in the original filter-variables may be obtained


R. Reemtsen and J.-J. Rückmann (eds.), Semi-Infinite Programming, 323-360. © 1998 Kluwer Academic Publishers.


from an LSIP optimal solution by spectral decomposition. Finally, we review some elementary duality-based sensitivity analysis and present some previously published numerical results (by us with other co-authors).

1 QUADRATURE MIRROR FILTERS: A FUNCTIONAL ANALYSIS VIEW

We refer to the books of Vaidyanathan [30], Yves Meyer [18], and Strang and Nguyen [27] for much more detail than we can present here on the theory of wavelets. We focus our limited review on those constructions occurring in wavelet theory that provide reasons why linear semi-infinite programming models are useful for studying and solving some wavelet design problems.

We shall define a discrete input signal x ≡ {x(n)}_{n∈Z} to be an arbitrary infinite sequence which is square summable, i.e., a member of the linear space l²(Z), where

Σ_{n=−∞}^{∞} |x(n)|² < ∞.

A simple linear operator on inputs is the decimation operator, which corresponds to downsampling. This means that only the terms of x(n) with even index are retained. Denoted by D, it maps l²(Z) → l²(2Z). D is also conveniently denoted by ↓2. The adjoint operator, D*, corresponds to upsampling. It starts with the sequence {x(2n)}_{n∈Z} and inserts zeros at the odd indices, giving the sequence

(..., x(−2), 0, x(0), 0, x(2), 0, x(4), 0, ...).

Thus, D* : l²(2Z) → l²(Z). In practice these operators have meaning with respect to the reduction in data to be transmitted. However, for purposes of this exposition, an input signal, x, is filtered through two filters, described as follows.
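A tiny numerical illustration of D and D* on finite sequences (indexing from 0, which is a convention chosen here for the sketch):

```python
# Downsampling by 2 (keep even-indexed samples) and its adjoint
# (upsampling by zero insertion).
import numpy as np

def downsample(x):
    return np.asarray(x)[::2]

def upsample(y):
    out = np.zeros(2 * len(y))
    out[::2] = np.asarray(y)
    return out

x = np.arange(8.0)
print(downsample(x))             # [0. 2. 4. 6.]
print(upsample(downsample(x)))   # [0. 0. 2. 0. 4. 0. 6. 0.]
```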

A filter is a linear operator that acts on an input signal, x, to produce an output vector through convolution,

y(n) = Σ_{i=−∞}^{∞} h(i) x(n − i).   (1.1)


Actually, (1.1) also describes the action of a linear time-invariant system, or LTI for short. This means that a shifted input signal gives an equally shifted output signal, for all signals x. A filter is identified with an infinite vector {h(n)}_{n∈Z}. We shall use uppercase letters to denote the transfer function of a given filter, itself denoted by lowercase letters. For filter h the transfer function in the complex z domain is H(z) = Σ_n h(n) z^{−n}, and similarly its frequency response is given by H(ω) = Σ_n h(n) e^{−inω}, where i = √−1. When the filter has only finitely many nonzero components, it is termed finite impulse response, FIR.

Our interest is in filters of finite length, namely FIRs having only 2N nonzero components.

We are interested in pairs of signals x_h, x_g identified with filters h and g, which are defined by the following convolutions:

x_h(n) = Σ_{i=0}^{2N−1} h(i) x(n − i)  and  x_g(n) = Σ_{i=0}^{2N−1} g(i) x(n − i),  n ∈ Z.   (1.2)

Using our notational convention, the transfer functions for these signals are:

X_h(z) = Σ_{n=−∞}^{∞} x_h(n) z^{−n}  and  X_g(z) = Σ_{n=−∞}^{∞} x_g(n) z^{−n}.   (1.3)

The convolutions (1.2) are equivalent to multiplication of the corresponding transfer functions,

Xh(z) = H(z)X(z) and Xg(z) = G(z)X(z). (1.4)

When two or more filters are combined, one has a filter bank.

In linear operator terminology, see [18], we introduce linear maps L_H (L_G) which generate x_h (x_g), respectively, as follows:

(L_H(x))(n) = x_h(n)  and  (L_G(x))(n) = x_g(n).   (1.5)

We use the standard terminology (see Strang/Nguyen [27]) that x_h is Lowpass, in the sense of taking averages to smooth out variations, while x_g is Highpass, in the sense of picking out the bumps or high frequencies in the signal. For example, taking moving averages would be Lowpass, while taking differences would be a Highpass filter. A classical example nicely illustrating the properties of Lowpass and Highpass is the following one.


Example 1.1

Lowpass: x_h(n) = (1/√2)(x(n) + x(n − 1))  and  Highpass: x_g(n) = (1/√2)(x(n) − x(n − 1)).   (1.6)

The filters in this example are H(z) = (1/√2)(1 + z^{−1}) and G(z) = (1/√2)(1 − z^{−1}), also known as Haar filters.
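Perfect reconstruction with the Haar pair can be checked numerically. The sketch below is illustrative only: the periodic boundary handling and the use of correlation to realize the adjoint (synthesis) filters are choices made here, not prescribed by the text.

```python
# Numerical check of perfect reconstruction for the Haar filter bank.
import numpy as np

def periodic_conv(x, h):
    # y(n) = sum_i h(i) x(n - i), with x extended periodically
    n = len(x)
    return np.array([sum(h[i] * x[(t - i) % n] for i in range(len(h)))
                     for t in range(n)])

def periodic_corr(x, h):
    # adjoint filtering: y(n) = sum_i h(i) x(n + i)
    n = len(x)
    return np.array([sum(h[i] * x[(t + i) % n] for i in range(len(h)))
                     for t in range(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
h = np.array([1.0, 1.0]) / np.sqrt(2)     # Haar Lowpass
g = np.array([1.0, -1.0]) / np.sqrt(2)    # Haar Highpass

uh, ug = np.zeros_like(x), np.zeros_like(x)
uh[::2] = periodic_conv(x, h)[::2]        # analysis, downsample, upsample
ug[::2] = periodic_conv(x, g)[::2]
x_rec = periodic_corr(uh, h) + periodic_corr(ug, g)
print(np.max(np.abs(x_rec - x)))          # ~1e-16: perfect reconstruction
```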

These two filters split the signal x into two parts, each of which can be compressed and coded separately for transmission. Of special interest are those filters for which the original signal x can be recovered perfectly. Basically, each signal x_h, x_g is downsampled, then upsampled, followed by the respective adjoint operator L*_H (L*_G). The question is how the sum of these two recovered signals compares to the original signal x, bringing us to a key definition for filter banks.

1.1 Perfect reconstruction

Definition 1.2 The term Perfect Reconstruction is used whenever H and G have the property that any original signal x can be recovered. The requirement, stated in terms of the above four operators and their adjoints, is

L*_H D* D L_H + L*_G D* D L_G = I (the identity operator),   (1.7)

i.e., the input and output of the system diagrammed in Fig. 1 are identical. It is convenient to express the left-hand side of (1.7) in terms of transfer functions and decimation. We first observe that the linear operator L*_H is related to paraconjugation of complex polynomials, see [30, Section 2.3.3], namely

H̃(z) = H_*(z^{−1}), where H_* means only the coefficients are conjugated.

Hence, in our case, H̃(z) = Σ_n h̃(n) z^{−n}, where h̃(n) = h(−n) for all n. It then follows that (L*_H(x))(n) = Σ_{i=0}^{2N−1} h̃(i) x(n − i). Returning to the left-hand side of (1.7) we obtain

(DH)(z) = X_h(z)↓2  and  (DG)(z) = X_g(z)↓2,   (1.8)

where

X_h(z)↓2 = ½[X_h(z^{1/2}) + X_h(−z^{1/2})]  and  X_g(z)↓2 = ½[X_g(z^{1/2}) + X_g(−z^{1/2})].


Figure 1 Perfect reconstruction Quadrature Mirror Filter Bank (analysis and synthesis stages).

Hence,

D*((DH)(z)) = (DH)(z²) = ½[X_h(z) + X_h(−z)]

and D*((DG)(z)) = ½[X_g(z) + X_g(−z)].

Finally, the transfer function for the reconstructed signal is

X̂(z) = H̃(z) D*((DH)(z)) + G̃(z) D*((DG)(z))
     = ½[H(z)H̃(z) + G(z)G̃(z)] X(z) + ½[H(−z)H̃(z) + G(−z)G̃(z)] X(−z).   (1.9)

The components of (1.9) have a specific technical meaning in signal transmission, and the nomenclature is the following one (see [30]).

X̂(z) = T(z)X(z) + A(z)X(−z), where

T(z) = ½[H(z)H̃(z) + G(z)G̃(z)]  and

A(z) = ½[H(−z)H̃(z) + G(−z)G̃(z)].

(1.10)

T(z) is termed the distortion transfer function, while A(z) is called the aliasing transfer function. Clearly, for Perfect Reconstruction, X̂(z) = X(z), which is equivalent to T(z) ≡ 1 and A(z) ≡ 0.


In matrix notation (1.9) becomes:

2X̂(z) = [X(z)  X(−z)] [H(z)  G(z); H(−z)  G(−z)] [H̃(z); G̃(z)].   (1.11)

Example 1.3 The well-known trivial example illustrates an instance of perfect reconstruction, see Fig. 2.

Figure 2 Trivial Perfect Reconstruction (analysis and synthesis stages).

Here h(n) is the unit impulse (as is h̃(n)), and g(n) is a delay of one time unit:

h(k) = 1 if k = 0, 0 otherwise,
g(k) = 1 if k = 1, 0 otherwise.

The adjoint of g is:

g̃(k) = 1 if k = −1, 0 otherwise.

For this example (1.11) specializes to:

2X̂(z) = [X(z)  X(−z)] [1  z^{−1}; 1  −z^{−1}] [1; z] = [X(z)  X(−z)] [2; 0] = 2X(z).   (1.12)


It is easily verified that Example 1.1 is another instance of Perfect Reconstruction. There is another important property that both Examples 1.1 and 1.3 illustrate, regarding the 2 × 2 matrices in (1.11) and (1.12).

Definition 1.4 Let H(z) be a polynomial in z having real number coefficients, and H̃(z) its paraconjugate. Let ℋ refer to the 2 × 2 matrix in (1.11) having real number coefficients. Let ℋ̃(z) denote ℋ^T(z^{−1}). ℋ is paraunitary ⟺

ℋ(z)ℋ̃(z) = ℋ̃(z)ℋ(z) = dI for some d > 0, for all z.   (1.13)

There is a simple argument appearing in [30, Chap. 6, pp. 299-300] that justifies the following special form of G in (1.11) when ℋ is paraunitary,

G(z) = c z^{−L} H̃(−z),  |c| = 1,  L odd,   (1.14)

whence

g(n) = −c(−1)^n h(L − n).   (1.15)

The argument uses the form of the polynomial products in the matrix multiplication ℋ(z)ℋ̃(z) = dI, namely

H(z)H̃(z) + G(z)G̃(z) = d  and  H(z)H̃(−z) + G(z)G̃(−z) = 0.   (1.16)

Basically, it follows that H(z) and G(z) can have no common factors, nor can H(−z), G(−z) have common factors. By using this property within (1.16) one can justify the sought-for form (1.14) above. The result (1.16) implies that the reconstructed signal x̂(n) is proportional to the input signal x(n), see (1.10).

Similarly, from ℋ̃(z)ℋ(z) = dI, we find

H(z)H̃(z) + H(−z)H̃(−z) = d as polynomials,

which in the notation of (1.8) becomes

H(z)H̃(z)↓2 = ½ d.   (1.17)

Definition 1.5 H is a half-band filter if and only if (1.17) holds.


1.2 Relation to orthogonality and a nonlinear constraint set

Using the decimation operation, D, we can express the perfect reconstruction occurring in Example 1.3 as follows (where the summation notation denotes the sum over all integers):

x(n) = Σ_{m=−∞}^{∞} Dx_h(m) φ_m(n) + Σ_{m=−∞}^{∞} Dx_g(m) ψ_m(n),   (1.18)

where φ_m(n) = h̃(n − 2m) and ψ_m(n) = g̃(n − 2m). Furthermore, in this simple example the basis is orthogonal because

Σ_n φ_m(n) φ_l(n) = δ_{ml},  Σ_n ψ_m(n) ψ_l(n) = δ_{ml},   (1.19)

and Σ_n φ_m(n) ψ_l(n) = 0 for any m, l.

The orthogonality is also seen from the matching of a zero component in the output of the first reconstruction filter in Fig. 2 with (possibly) nonzero entries in the output of the second reconstruction filter, and vice-versa. The example provides an illustration of perfect reconstruction with an orthonormal basis.

The orthogonality conditions (1.19) also hold for more general PR polynomials. We begin by rewriting the halfband condition (1.17) for d = 2:

Definition 1.6 The product filter, P, is defined by

P(z) = H(z)H(z^{−1}), which equals P(z^{−1}).

The perfect reconstruction condition is then

P(z) + P(−z) = 2,   (1.20)

and a transfer function H satisfying (1.20) is termed a PR polynomial.

Example 1.7

H(z) = (1/√2)(z^{2N−1} + 1) is a PR polynomial.

Moreover, P(z) has zeros e^{iπ(2k+1)/(2N−1)}, 0 ≤ k < 2N − 1, on the unit circle.


To analyze the structure of P(z) we simply expand H(z)H(z^{−1}):

P(z) = Σ_{n=0}^{2N−1} h_n² + Σ_{l=1}^{N−1} Σ_{n=0}^{2N−1−2l} h_n h_{n+2l} (z^{2l} + z^{−2l}) + Σ_{n=0}^{N−1} Σ_{l=0}^{2N−2n−2} h_l h_{l+2n+1} (z^{2n+1} + z^{−2n−1}).   (1.21)

When forming the sum P(z) + P(−z) we find that the odd powers will automatically cancel. Hence, in order for a choice of h's to satisfy (1.20), we must require that all of the even coefficients of P(z) (apart from the constant term) be zero and that Σ_{n=0}^{2N−1} h_n² = 1. These conditions can be conveniently written as:

Σ_{n=0}^{2N−1−2l} h_n h_{n+2l} = δ_l,  0 ≤ l < N.   (1.22)

Definition 1.8 The nonlinear constraint set for perfect reconstruction for finite impulse response LTI filters, denoted by C_PR, is the set of h's satisfying (1.22).

Remark 1.9 The constraint set C_PR (1.22) has appeared in the literature in conjunction with several choices of objective functions, see [5, 32] and [27, pages 354-359]. In the latter reference a constraint set similar to C_PR arises in the Quadratic Least Squares Algorithm described there.

Remark 1.10 There also exist unconstrained parameterizations of the set of wavelet filters in terms of rotation angles [30]. Such unconstrained parameterizations do not necessarily facilitate the design problem, as discussed in Sec. 2.2.

The basis functions φ_m(n) are orthogonal because the constraints (1.22) are satisfied. Due to (1.15), the filter g(n) also satisfies constraints of the form (1.22), so the basis functions ψ_m(n) are also orthogonal. It can be shown that φ_m(n) and ψ_m(n) are mutually orthogonal.
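Membership in C_PR is easy to verify numerically for a given finite filter. The sketch below (function name ours) checks (1.22) directly; the Haar filter and the classical length-4 Daubechies filter both pass, while an unnormalized filter does not.

```python
# Check whether a length-2N filter h satisfies the C_PR conditions (1.22).
import numpy as np

def in_C_PR(h, tol=1e-10):
    h = np.asarray(h, dtype=float)
    N = len(h) // 2
    ok = True
    for l in range(N):
        s = np.dot(h[: len(h) - 2 * l], h[2 * l:])    # sum_n h_n h_{n+2l}
        ok &= abs(s - (1.0 if l == 0 else 0.0)) < tol
    return ok

print(in_C_PR([1.0, 1.0]))                              # False (not normalized)
print(in_C_PR(np.array([1.0, 1.0]) / np.sqrt(2)))       # True  (Haar)
d4 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
               3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
print(in_C_PR(d4))                                       # True  (Daubechies D4)
```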


1.3 Cascaded filter banks and discrete wavelets

The signal Dx_g(n), denoted here as Dx_g^{(0)} for convenience, contains details of the original signal x(n). The signal Dx_h(n), denoted here as Dx_h^{(1)}, is a smooth version of x(n) and is obtained using the filtering operations pictured in Fig. 1. The signal Dx_h^{(1)} may be further decomposed into a smooth signal Dx_h^{(2)} and a detailed signal Dx_g^{(1)}, using the same filtering operations again. We may keep going and compute signals Dx_h^{(3)} and Dx_g^{(2)}, etc. All these signals are obtained using a cascade of filter banks, with the system in Fig. 1 as the basic building block.

The cascaded system, with an arbitrary finite number I of stages, still enjoys the orthogonality and perfect reconstruction properties. In particular, x(n) satisfies

x(n) = Σ_{m=−∞}^{∞} Dx_h^{(I)}(m) φ_m^{I}(n) + Σ_{i=0}^{I−1} Σ_{m=−∞}^{∞} Dx_g^{(i)}(m) ψ_m^{(i)}(n),   (1.23)

where the basis functions φ_m^{I}(n) and ψ_m^{(i)}(n) are mutually orthogonal. The functions ψ_m^{(i)}(n) are the discrete wavelets at scale i, and the functions φ_m^{I}(n) are the discrete scaling functions at scale I. Basis functions at the same scale are translates of each other. Under certain technical conditions on the filter h [4], a certain continuous-time version of the basic discrete wavelet at scale i tends to a limit as i tends to infinity; this limit is the mother wavelet. Likewise, a certain continuous-time version of the basic discrete scaling function at scale i tends to a limit as i tends to infinity; this limit is the scaling function, also known as the father wavelet.

2 DESIGN IMPLICATIONS FROM THE PROPERTY OF PERFECT RECONSTRUCTION

Once the filter h has been determined, all other filters can be constructed from (1.14). From the viewpoint of mathematical programming, the first question is how to construct the constraint set for the 2N variables {h_n}_{n=0}^{2N−1}, while the second question is how to select an objective function.


2.1 Towards an objective function for subband coding design


The decimated signals Dx_h(n) and Dx_g(n) are encoded and transmitted. In a coding system, the decoder does not have access to the original subband signals and can only decode approximations x'_h(n) and x'_g(n). These signals are processed by the synthesis system in Fig. 1 so as to reconstruct an approximation x'(n) to the original signal x(n). The signal x'(n) − x(n) is termed the reconstruction error and is due to the subband quantization errors x'_h(n) − x_h(n) and x'_g(n) − x_g(n). These quantization error signals are modeled as random processes. The underlying stochastic process assumptions are usually simplified ones, but nevertheless widely accepted for noise analysis.

For example, let us assume that a signal x is subject to certain arithmetic rules, called a quantization, to obtain a fixed point binary representation. An excellent description of how stochastic processes are introduced into the functional analysis development of wavelet theory appears in Appendix C of [30], which we are using to aid this portion of our survey.

Figure 3 Binary Representation (a b-bit fixed point quantizer Q(x)).

The integer l in Fig. 3 governs the range over which Q(x) can vary, −2^{l+1} < Q(x) < 2^{l+1}, while the number of bits b determines the accuracy of Q(x). The quantity Δ = 2^{−(b−1)} 2^l is termed the quantization step. Assume that (i) the quantization error (or noise) q = Q(x) − x has a uniform probability distribution over −Δ/2 ≤ q ≤ Δ/2, where the division by 2 comes from the action of rounding off. Then the variance of q has a simple form,

σ_q² = var(q) = Δ²/12 = 2^{2(l+1)} 2^{−2b} / 12.   (2.1)

Also assume that the quantization errors are (ii) statistically independent and (iii) statistically independent of the signal x.

Assumptions (i), (ii) and (iii) are valid in the limit as the quantizer step size tends to zero, if it can be assumed that the input x is itself a random variable,


with variance σ_x² [12]. Note that this assumption is routinely made in wavelet analysis [17]. For a given bit budget, the quantizer step size Δ and hence the quantizer noise variance are related to the input variance in a way that depends on the statistical properties of x. Two classical examples illustrate the elegant connection.

Example 2.1 Uniform or Rectangular Distribution on x. For x uniformly distributed in the range −2^{l+1} < x < 2^{l+1}, we see that σ_x² = 2^{2(l+1)}/3, implying that the quantization error variance is σ_q² = 0.25 × 2^{−2b} σ_x². The probability density function for x is also written as

p(x) = (2 × 2^{l+1})^{−1} rect_{[−2^{l+1}, 2^{l+1})}(x),

a notation which we shall use later for describing one type of input signal.

Example 2.2 Normal Distribution on x. Assume that x has zero mean. We restrict the otherwise unlimited range of x to the same finite range as above, −2^{l+1} < x < 2^{l+1}. Using the classical "three sigma" rule for limiting the probability of being outside the range to 0.0026, we determine l (up to an integer) from the requirement σ_x = 2^{l+1}/3. In this case the quantization error variance is σ_q² = 0.75 × 2^{−2b} σ_x².

Both examples illustrate the elegant connection between the variance of the input signal σ_x² and the quantization error variance σ_q², namely

σ_q² = c × 2^{−2b} σ_x²,   (2.2)

where c is a constant which depends on the statistical properties of the signal x.

In the previous section H(z) = Σ_n h(n) z^{−n} denoted the transfer function of a FIR filter with real response. To analyze the statistical properties of the quantization error further, we assume that the input signal x(n) is now a random variable for each n, with rather nice properties. It is assumed to be Wide Sense Stationary (WSS), which means that the means m_x = E x(n) and autocorrelations R_xx(k) = E[x(n)x(n − k)] are independent of n. The WSS assumption is a standard initial assumption for noise analysis in LTI systems.

By passing the random process x(n) through an LTI system represented by transfer function H(z), we mean that the output y(n) is the convolution

y(n) = Σ_k x(k) h(n − k).   (2.3)


It follows that the following statistical relationships hold.

1. m_y = m_x Σ_n h(n);

2. R_yy(k) = Σ_n R_hh(n) R_xx(k − n), where R_hh(n) = Σ_l h(l) h(l − n) is termed the deterministic autocorrelation of h(n).

We apply these results to subband coding. For simplicity we restrict our review to the two-subband case, denoting by H_1(z) and H_2(z) the transfer functions of the synthesis filters. The reconstruction error at the output of the subband decoder is the sum of the quantization error signals q_1(n) and q_2(n), upsampled and filtered by the synthesis filters H_1(z) and H_2(z), respectively. Denote the corresponding filter output errors by e_1(n) and e_2(n), respectively. Under assumptions (i) and (ii), the quantization errors are independent with zero mean and common variance σ_q². Then

R_qq(k) = σ_q² δ(k)  and  R_{e_i e_i}(k) = σ_q² R_{h_i h_i}(2k) = σ_q² δ(k),  i = 1, 2,   (2.4)

by virtue of (1.22). It follows that the filter output errors e_1 and e_2 each have variance σ_q².

From the independence of q_1 and q_2, it follows that e_1 and e_2 are independent; hence we obtain the variance of the reconstruction error as

σ²_SBC = σ²_{e_1} + σ²_{e_2}.   (2.5)

A scalar stochastic process x(n) leads to a vector process X(n) by partitioning x(n) into successive blocks of M samples

X(n) = {x((n − 1)M), x((n − 1)M + 1), ..., x(nM − 1)}

where for our application M = 2N.

We are interested in the case having a finite number of random variables

X_{2N} = {x(0), ..., x(2N − 1)}.

The vector X(n) is said to be WSS if the expectation vector E(X(n)) and the autocorrelation matrix R_XX(m) = E(X(n) X(n − m)^T) are both independent


of n. Setting R_{2N} = E(X_{2N} X_{2N}^T), it follows that R_{2N} is Toeplitz, namely

R_{2N} =
[ R(0)      R(1)      ...   R(2N−2)   R(2N−1)
  R(1)      R(0)      ...   R(2N−3)   R(2N−2)
  ...       ...       ...   ...       ...
  R(2N−1)   R(2N−2)   ...   R(1)      R(0)     ]   (2.6)

Denote by ρ the transmission rate of x in a single digital channel, in bits/sample. (This coding system has been termed Pulse Code Modulation, PCM, in the literature; see [12].) For the case of this exposition, namely two subband channels, the transmission rate of the subband system is the average of the transmission rates ρ_H and ρ_G in the Lowpass and Highpass channels,

ρ_H + ρ_G = 2ρ,   (2.7)

where, consistent with the Lowpass/Highpass interpretation, we assume ρ_H > ρ_G. The Toeplitz matrix (2.6), denoted R for short, arises when one takes the variance of the linear form h^T X, namely

σ²_{Dx_h} = var(h^T X_{2N}) = h^T R h.   (2.8)
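In practice r_0, ..., r_{2N−1} are estimated from a finite signal (cf. Remark 2.6), and the subband variance (2.8) is then a quadratic form in h. A minimal sketch (periodic autocorrelation estimates are an assumption made here):

```python
# Estimate autocorrelations, build the Toeplitz matrix (2.6), and evaluate
# the subband variance h^T R h of (2.8).
import numpy as np
from scipy.linalg import toeplitz

def autocorr_periodic(x, lags):
    x = np.asarray(x, dtype=float)
    P = len(x)
    return np.array([np.dot(x, np.roll(x, -n)) / P for n in range(lags)])

def subband_variance(x, h):
    h = np.asarray(h, dtype=float)
    r = autocorr_periodic(x, len(h))     # r_0, ..., r_{2N-1}
    R = toeplitz(r)                       # the Toeplitz matrix (2.6)
    return float(h @ R @ h)
```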

The error variances. Following our discussion of (2.2), the reconstruction error variance of the full-band PCM becomes

σ²_PCM = c × 2^{−2ρ} σ_x²,

while the reconstruction error variance of the Lowpass subband is

c × 2^{−2ρ_H} σ²_{Dx_h},

and the reconstruction error variance of the Highpass subband is

c × 2^{−2ρ_G} σ²_{Dx_g},

for some constant c.

We have ||x||² = ||Dx_h||² + ||Dx_g||², due to the orthogonality implied by the choice of the filter h(n). Taking expectations, we see that the sum of the subband input variances is equal to the variance of the full band input x, namely

σ²_{Dx_h} + σ²_{Dx_g} = σ_x².   (2.9)


Definition 2.3 The Coding Gain for a two-band subband system is the ratio of the original error variance (for the PCM channel) to the error variance obtained with the two subband channels, namely:

G_SBC = σ²_PCM / σ²_SBC = 2^{−2ρ} σ_x² / (2^{−2ρ_H} σ²_{Dx_h} + 2^{−2ρ_G} σ²_{Dx_g}).   (2.10)

We are interested in how σ²_{Dx_h} can be changed so as to maximize the coding gain (2.10). This is basically a classical result, which we now state.

Theorem 2.4 [12, (11.12)]. The maximal value of the coding gain G_SBC is the following arithmetic mean / geometric mean ratio,

G*_SBC = [(σ²_{Dx_h} + σ²_{Dx_g})/2] / (σ²_{Dx_h} σ²_{Dx_g})^{1/2}.   (2.11)

Proof: As is well known, e.g., [12], the gain G_SBC in (2.10) can be maximized over the variables ρ_H and ρ_G subject to the constraint (2.7). Because the numerator 2^{−2ρ} σ_x² is fixed, maximizing the coding gain is equivalent to the following strictly convex minimization problem:

minimize_{ρ_H, ρ_G}  2^{−2ρ_H} σ²_{Dx_h} + 2^{−2ρ_G} σ²_{Dx_g}   (2.12)

subject to ρ_H + ρ_G ≤ 2ρ and ρ_H > 0, ρ_G > 0,

where there is no loss of generality in using the inequality instead of the equality.

Applying the KKT conditions determines a unique optimum ρ*_H, ρ*_G satisfying

2^{−2ρ*_H} σ²_{Dx_h} = 2^{−2ρ*_G} σ²_{Dx_g},  with ρ*_H + ρ*_G = 2ρ.   (2.13)

From (2.13) we see that the optimal value of the convex objective function in (2.12) becomes

2^{−2ρ*_H} σ²_{Dx_h} + 2^{−2ρ*_G} σ²_{Dx_g} = 2 × 2^{−2ρ} (σ²_{Dx_h} σ²_{Dx_g})^{1/2}.

Substituting the rightmost term in the above equality for the denominator in (2.10) yields the classical arithmetic mean / geometric mean ratio (2.11). □


Corollary 2.5 In a Highpass/Lowpass filter bank the coding gain is maximized by maximizing σ²_{Dx_h}.

Proof: Note that σ²_{Dx_h} and σ²_{Dx_g} appear symmetrically in (2.11). By maximizing σ²_{Dx_h}, we obtain the filter for the "high-energy channel." We could as well have minimized σ²_{Dx_h}, in which case we would have obtained the filter for the "low-energy channel." □
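The optimal rate allocation (2.13) and the resulting gain (2.11) are easy to compute. The sketch below uses the convention (2.9) of the text (subband variances summing to σ_x²) and verifies that the achieved gain equals the arithmetic-mean/geometric-mean ratio.

```python
# Optimal two-band rate allocation and coding gain (Theorem 2.4).
import numpy as np

def coding_gain(var_h, var_g, rho):
    # rho_H = rho + delta, rho_G = rho - delta with delta = 0.25*log2(var_h/var_g)
    # equalizes the two error contributions in (2.13).
    delta = 0.25 * np.log2(var_h / var_g)
    rho_H, rho_G = rho + delta, rho - delta
    var_x = var_h + var_g                         # convention (2.9)
    gain = (2.0 ** (-2 * rho) * var_x) / (2.0 ** (-2 * rho_H) * var_h
                                          + 2.0 ** (-2 * rho_G) * var_g)
    am_gm = 0.5 * (var_h + var_g) / np.sqrt(var_h * var_g)
    return gain, am_gm                            # the two values coincide

print(coding_gain(3.0, 1.0, rho=4))               # (1.1547..., 1.1547...)
```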

Remark 2.6 In practice, the signals x(n) to be encoded are finite sequences. Assume the length of x(n) is an even integer P > 2N. Then filtering as in (1.2) followed by downsampling produces two subsequences Dx_h(n) and Dx_g(n) of length P/2, if x(n) is extended beyond its end points by periodic replication. The coding gain analysis above still applies. The optimization problem is solved using the estimates

σ̂_x² = (1/P) Σ_{p=0}^{P−1} |x(p)|²,

together with analogous sample estimates of the autocorrelations r_n.

2.2 Nonlinear nonconvex optimization formulation

Adjoining the objective function (2.8) with the constraint set C_PR (1.22) yields the following difficult nonlinear optimization problem, which has been the standard approach for FIR LTI filter design; see for example [5, 32] and [27, pages 354-359]:

maximize_h  h^T R h
subject to  Σ_{n=0}^{2N−1−2l} h_n h_{n+2l} = δ_l,  0 ≤ l < N.   (2.14)

The optimization problem maximizes a convex function over a subset of a high dimensional sphere, which is a very difficult problem to solve because of its nonconvexity and its usually many local optima which are not globally optimal. Transformation of the optimization problem into an unconstrained optimization problem, using the rotation-angle parameterization of wavelet filters [30], results in a highly nonlinear objective function, and does not necessarily make the


optimization problem simpler. In the next section we offer an alternate optimization based on an implicit transformation that yields a linear semi-infinite programming problem as the first step toward eventually solving (2.14).

3 THE PERFECT RECONSTRUCTION SEMI-INFINITE OPTIMIZATION PROBLEM

3.1 Constructing the linear SIP problems

It was straightforward to derive the particular form of the product filter in (1.20), namely, in z-variables,

P(z) = 1 + Σ_{n=0}^{N−1} a_n (z^{−2n−1} + z^{2n+1}),   (3.1)

where

a_n ≜ Σ_{l=0}^{2N−2n−2} h_l h_{l+2n+1},  0 ≤ n < N.   (3.2)

Under the change of variable z = e^{2πif}, 0 ≤ f ≤ 0.5, let us denote P(e^{2πif}) by P(f), so that

P(f) = 1 + 2 Σ_{n=0}^{N−1} a_n cos(2πf(2n + 1))   (3.3)

and P(f) = |H(f)|².

Observe that should P(f) < 0 for some f, 0 ≤ f ≤ 0.5, then P cannot be a product filter. This leads to specifying conditions on the vector a to guarantee that P really is a product filter. These conditions, being a linear system of inequalities, determine a convex set with regular properties that we list below.

Definition 3.1 The feasible set A is the set of real vectors {a_n}_{n=0}^{N−1} satisfying

1 + 2 Σ_{n=0}^{N−1} a_n cos(2πf(2n + 1)) ≥ 0 for all f, 0 ≤ f ≤ 0.5.   (3.4)
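The map (3.2) from h to a and the feasibility condition (3.4) can be checked numerically; the sketch below tests (3.4) only on a fine frequency grid, which is a discretized approximation of the semi-infinite constraint rather than an exact test.

```python
# Map a length-2N filter h to the vector a of (3.2) and check (3.4) on a grid.
import numpy as np

def h_to_a(h):
    h = np.asarray(h, dtype=float)
    N = len(h) // 2
    return np.array([np.dot(h[: len(h) - 2 * n - 1], h[2 * n + 1:])
                     for n in range(N)])

def feasible(a, n_grid=2001, tol=1e-9):
    a = np.asarray(a, dtype=float)
    f = np.linspace(0.0, 0.5, n_grid)
    n = np.arange(len(a))
    P = 1.0 + 2.0 * np.cos(2 * np.pi * np.outer(f, 2 * n + 1)) @ a
    return bool(np.all(P >= -tol))

h_haar = np.array([1.0, 1.0]) / np.sqrt(2)
print(h_to_a(h_haar), feasible(h_to_a(h_haar)))   # [0.5] True
```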


Using the Toeplitz property of R, we observe that the objective function in (2.8) is simply a linear function of {a_n} in (3.2) and write it as

ℒ(a) = r_0/2 + Σ_{n=0}^{N−1} a_n r_{2n+1},   (3.5)

where {r_n, 0 ≤ n < 2N} is the first row of R. In other words, the coding gain is a function of the product filter.

The desired filter H(z) may be obtained from the product filter P(z) by spectral factorization [33, pp. 128-131]. There are up to 2^{N−1} different solutions, corresponding to different groupings of zeroes of P(z) [4, p. 174]. Each of these filters corresponds to a local maximum of the criterion (2.8). All of these local maxima have the same value and are thus global maxima as well, thereby providing an elementary estimate of the number of local optima. For the special case that the optimal P(z) is unique, it follows that these solutions differ only by their phase. A solution can be designed to have minimum phase (all zeroes on or inside the unit circle) or close-to-linear phase (by suitable alternation of zeroes inside and outside the unit circle [26]). Phase design has no bearing on the coding gain.

Together with the constraint set given in Definition 3.1, see (3.4), we finally obtain the following dual pair of linear semi-infinite programming problems.

(PSIP)  maximize  ℒ(a) = r_0/2 + Σ_{n=0}^{N−1} a_n r_{2n+1}   (3.6)

        subject to  1 + 2 Σ_{n=0}^{N−1} a_n cos(2πf(2n + 1)) ≥ 0,  0 ≤ f ≤ 0.5.

The linear semi-infinite dual problem can be stated as follows [8,11].

(DSIP)  minimize  l(λ) = r_0/2 + Σ_f λ(f)

        subject to  −2 Σ_f λ(f) cos(2π(2n + 1)f) = r_{2n+1},  0 ≤ n < N,   (3.7)

        and  λ(f) ≥ 0,  0 ≤ f ≤ 0.5,

where λ(f) are generalized finite sequences.


3.2 On the duality of the PR semi-infinite programming problems


In the following we use the notation P(f; a) to refer to the product filter associated with a filter a = {a_n}. When no ambiguity is possible, we revert to the original notation P(f). The feasible set A in (3.4) satisfies the following properties:

(P1) A is contained in the hypercube {a | ∀n: |a_n| ≤ 1} and is thus bounded;

(P2) A contains the l_1 hypersphere {a | Σ_n |a_n| ≤ 1/2}; in particular PSIP is superconsistent, see [11];

(P3) A is closed and convex;

(P4) A is symmetric around 0.

Property (P1) is obtained by application of (1.22) and the Cauchy-Schwarz inequality to (3.2). Properties (P2) and (P4) are verified by inspection of the constraining inequalities (3.4) in Definition 3.1, while Property (P3) is a standard result in SIP, see [8, 11].

We denote the maximum value of the objective function over A by V(PSIP), and derive bounds on ℒ(a) and V(PSIP). Bounds (3.9) and (3.10) follow from (P1) and (P2), respectively.

∀a ∈ A:  0 ≤ ℒ(a) ≤ r_0   (3.8)

∀a ∈ A:  r_0/2 − Σ_{n=0}^{N−1} |r_{2n+1}| ≤ ℒ(a) ≤ r_0/2 + Σ_{n=0}^{N−1} |r_{2n+1}|   (3.9)

V(PSIP) ≥ r_0(1 + λ)/2   (3.10)

where λ = max_n |r_{2n+1}/r_0| ≤ 1. The lower bound (3.10) is attained by a_n ≡ ½δ_{nm}, where the index m satisfies |r_{2m+1}| ≥ |r_{2n+1}| ∀n. Since V(PSIP) is sandwiched between r_0(1 + λ)/2 and r_0, this formula provides a useful lower bound in the case of signals with mostly low-frequency content. Indeed, when r_n decays slowly, λ is close to 1, and the lower bound is attained by the Haar filter (a_n ≡ ½δ_{n0}).

We denote the optimal values of the primal and dual problems respectively by V(PSIP) and V(DSIP). The classical duality gap in this context is


V(DSIP) − V(PSIP). In general, it is nonnegative, but it is zero here due to the regularity properties of the PR problems.

The most general stability results depend on properties of the so-called level sets of both PSIP and DSIP, see Theorem 6.11 in [11]. The assumptions of this theorem, namely (a) the index set, 0 ≤ f ≤ 0.5, is compact, and (b) the coefficient functions of (3.4) are continuous, are automatically satisfied throughout this paper.

Theorem 6.11 in [11] implies that upon taking finer and finer discretizations of the index set [0, 0.5], a finite dual pair of linear programming solutions will converge to an optimal pair of dual semi-infinite programming solutions, regardless of any kind of degeneracy. Precise statements of this result have appeared in [10, Lemma 3.6] and [11, Theorem 4.4]. These natural regularity conditions for the QMF bank design problem are the reason why discretization and cutting plane methods perform fairly well numerically. In particular, V(PSIP) = V(DSIP) [11, Theorem 6.9], and both values are attained, i.e., both problems have optimal solutions.
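A minimal discretization sketch of PSIP is shown below: the frequency index set is replaced by a uniform grid and the resulting finite LP is solved with scipy; the dual multipliers of the grid constraints then approximate λ(f) of DSIP. This is our own illustration under the discretization idea above, not the authors' code; dual values are reported by the HiGHS backend, and their sign convention is solver-dependent.

```python
# Discretized PSIP: maximize r_0/2 + sum_n a_n r_{2n+1}
# subject to 1 + 2 sum_n a_n cos(2 pi f (2n+1)) >= 0 on a frequency grid.
import numpy as np
from scipy.optimize import linprog

def solve_discretized_PSIP(r, n_grid=513):
    r = np.asarray(r, dtype=float)            # r_0, ..., r_{2N-1}
    N = len(r) // 2
    f = np.linspace(0.0, 0.5, n_grid)
    C = np.cos(2 * np.pi * np.outer(f, 2 * np.arange(N) + 1))
    res = linprog(c=-r[1::2],                 # minimize the negated objective
                  A_ub=-2.0 * C, b_ub=np.ones(n_grid),
                  bounds=[(None, None)] * N, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    a = res.x
    value = r[0] / 2.0 + float(a @ r[1::2])
    duals = res.ineqlin.marginals             # grid approximation of lambda(f)
    return a, value, duals
```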

4 CHARACTERIZATION OF OPTIMAL FILTERS THROUGH SIP DUALITY

The analysis of DSIP leads to a characterization of the optimal filters in terms of their roots on the unit circle. Uniqueness of the solution and sensitivity issues are also examined.

4.1 Roots on unit circle

Theorem 4.1 Assume that ∃n : r_{2n+1} ≠ 0. Then there exist an optimal filter P(f; a), an integer K ∈ {1, ..., N}, and certain constants p_k, f_k such that p_k > 0, f_k ≠ f_l if k ≠ l,

r_{2n+1} = 2 Σ_{k=1}^{K} p_k cos(2π(2n + 1)(0.5 − f_k)),  0 ≤ n < N,   (4.1)

and

P(f_k; a) = 0,  1 ≤ k ≤ K.   (4.2)

Conditions (4.1) and (4.2) are necessary and sufficient for optimality of a ∈ A.


Proof: Theorem 6.11 in [11] implies that V(DSIP) = V(PSIP), so DSIP admits a solution λ with at most N nonzero components [8]. Assume momentarily that λ ≠ 0, and let f_k, 1 ≤ k ≤ K, be the indices of the nonzero components. Since the solution λ(f) of DSIP satisfies (3.7), we have

−2 Σ_{k=1}^{K} λ(f_k) cos(2π(2n + 1)f_k) = r_{2n+1},  0 ≤ n < N,   (4.3)

so (4.1) holds with Pk = A(fk) > O. In (4.3), A = 0 is possible only ifr2nH == O. This case is excluded by the hypothesis, so the assumption A=/;O above was justified. Finally by the optimality of A we have

V(DSIP) − £(a) = Σ_{k=1}^{K} λ(f_k) P(f_k; a)    (4.4)

for all a ∈ A. By [11, Theorem 6.11] there exists a ∈ A such that £(a) = V(PSIP), so the right-hand side of (4.4) is zero. The matching of the solutions {λ(f_k)} of DSIP with a of PSIP in this particular way is known as complementary slackness [8, p.95], and we shall return to this measure of optimality later when we report numerical results. The strict positivity of λ(f_k) in (4.4) implies (4.2)¹.

Conversely, if (4.1) holds, then λ taking values ρ_k at f = f_k and zero otherwise is feasible for DSIP. If in addition (4.2) holds, then Σ_f λ(f) P(f; a) = 0. Hence we have complementary slackness, and a must be optimal. □

In order to put Theorem 4.1 in perspective, consider {ρ_k, f_k} satisfying (4.1). Algebraic manipulations similar to those used in establishing the classical duality show that

£(a) = r_0/2 + Σ_{k=1}^{K} ρ_k (P(0.5 − f_k; a) − 1).    (4.5)

Any a satisfying (4.2) also attains the upper bound in (4.5),

£(a) = r_0/2 + Σ_{k=1}^{K} ρ_k.    (4.6)

However, with an arbitrary choice of {ρ_k, f_k} there may not exist a feasible polynomial with roots at f_k, i.e., the condition a ∈ A may not be satisfied. For

¹Notice that (4.2) does not preclude the existence of roots of P at locations other than f_k.

Page 349: Semi-Infinite Programming


instance, in [1] ρ_k, f_k are chosen using a Gaussian quadrature technique, and P(f; a) with zeroes at f_k is computed. It has been observed in [1] that this polynomial may be far from the feasible set.

Although Theorem 6.11 in [11] shows that a solution exists (and Theorem 4.1 partially characterizes it), this solution is not necessarily unique. The solution set may have infinitely many elements, in which case we adopt the nonconventional terminology of referring to such optimization problems as degenerate, solely for convenience. There are optimization problems for which the dimension of the degeneracy is as large as N, e.g., if r_{2n+1} ≡ 0 then £(a) ≡ r_0/2, so all filters are optimal. Other examples of degeneracy motivated by practical problems are presented below. First notice that if Σ_{k=1}^{K} ρ_k = r_0/2 in (4.6), then £(a) attains the maximum possible value r_0 in (3.8), and the energy in the second channel is zero.

Examples of degeneracy.

1. In the special case K = 1, ρ_1 = r_0/2 (for example, sinusoidal input, r_n = r_0 cos(2π(0.5 − f_1)n)), any filter with a zero at f_1 is optimal. The dimension of the degeneracy is N − 1.

2. In particular, for the special case f_1 = 0.5 (r_n ≡ r_0, "maximally correlated" process) any filter with a zero at f = 0.5 is optimal.

Although degeneracy is often a pathological case and occurs for a narrow class of processes, understanding its effects is useful due to the common occurrence of near-degenerate problems. Examples include near-sinusoidal inputs and lowpass processes, of which Examples 1 and 2 above are limiting cases; also see §6.

4.2 Sensitivity of perfect reconstruction designs through a duality-based analysis

We have established that the optimal filters are characterized by the location of their zeroes on the unit circle. We are, however, also interested in the per­formance of filters whose zeroes on the unit circle are obtained by perturbation of the "optimal" zeroes. How crucial is the exact location of zeroes to coding performance?

Page 350: Semi-Infinite Programming


Consider the solution λ of DSIP with nonzero components at f_k, 1 ≤ k ≤ K. By (4.4), the coding performance of an arbitrary filter P(f; a) is

£(a) = V(PSIP) − Σ_{k=1}^{K} λ(f_k) P(f_k; a).    (4.7)

The cost of using a nonoptimal filter with P(f_k; a) > 0 is proportional to P(f_k; a), with proportionality constant λ(f_k). The λ(f_k)'s can be viewed as sensitivity parameters. Looking at the solution of DSIP, it is possible to assess which zeroes contribute significantly to coding performance. In near-degenerate problems (see §4.1), one or several λ(f_k)'s are near zero.

Assume that the filter P(f; a) has zeroes at frequencies f̂_k = f_k + δf_k, where δf_k ≪ 1 and all zeroes are interior and have multiplicity two. Using a Taylor series approximation we have P(f_k; a) ≈ ½ P''(f̂_k; a) |δf_k|².

Plugging into (4.7), we obtain

£(a) ≈ V(PSIP) − ½ Σ_{k=1}^{K} λ(f_k) P''(f̂_k; a) |δf_k|².    (4.8)

The coding performance is insensitive to first order to errors δf_k and is linear in |δf_k|². This result has a favorable impact on some of the numerical algorithms of §5, which aim at explicitly identifying the optimal zeroes but inevitably incur numerical errors. It also suggests an implementation of the optimal filter bank in cascade form.

It is even possible to derive a good bound for (4.8) using the following result.

Lemma 4.2 If P(a, f) is a feasible polynomial and λ_i, 1 ≤ i ≤ K, is an optimal solution of the dual, then: (i) P'(a, f) ≤ 2π(2N − 1), 0 ≤ f ≤ 0.5; (ii) P''(a, f) ≤ 4π²(2N − 1)², 0 ≤ f ≤ 0.5.

Proof: The trigonometric polynomial Q(a, f) = P(a, f) − 1 has degree 2N − 1 and satisfies |Q(a, f)| ≤ 1. Applying Bernstein's theorem [35, p.11] yields the tight bound |Q'(a, f)| ≤ 2π(2N − 1), hence (i). In order to prove (ii), let Q(a, f) = [2π(2N − 1)]^{-1} P'(a, f) and apply the same argument again. □

Page 351: Semi-Infinite Programming


By application of Lemma 4.2(ii) we have

V(PSIP) − £(a) ≤ 2π²(2N − 1)² Σ_{k=1}^{K} λ(f_k) |δf_k|²
             ≤ 2π²(2N − 1)² ( Σ_{k=1}^{K} λ(f_k) ) max_k |δf_k|².

On the other hand, Σ_{k=1}^{K} λ(f_k) = V(PSIP) − r_0/2, by application of (3.7) and optimality of λ. This yields a simple bound on the normalized error due to perturbation of the optimal zeroes,

(V(PSIP) − £(a)) / (V(PSIP) − r_0/2) ≤ 2π²(2N − 1)² max_k |δf_k|².    (4.9)

5 ON SOME SIP ALGORITHMS FOR QUADRATURE MIRROR FILTER DESIGN

5.1 Discretization methods


A simple and intuitive idea for solving the SIP problem is based on discretization of the frequency axis. Define a set of M + 1 frequencies ℱ_M = {f_i, 0 ≤ i ≤ M} on the interval [0, 0.5]. Let P_M be the optimization problem (3.4)-(3.5) with the positivity constraints enforced at f ∈ ℱ_M only. Since P_M has a finite number of constraints, it is a linear program (LP) and may be solved using the standard simplex algorithm [8]. The solution ā of P_M is generally not feasible for the SIP problem, so a small modification of ā is needed to produce a feasible solution a. The next section provides more details on these operations.
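As an illustration, here is a minimal sketch setting up P_M directly as a finite LP (the algorithms of the text actually work with its dual, given below). It assumes the product-filter parameterization P(f; a) = 1 + 2 Σ_n a_n cos(2π(2n+1)f) and the linear objective a^T r of §5.2; the function name and the box |a_n| ≤ 1 used to keep the relaxed LP bounded are ours.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discretized_primal(r_odd, M):
    """Solve P_M on the uniform grid f_i = 0.5*i/M (a sketch).

    r_odd : odd-lag autocorrelations (r_1, r_3, ..., r_{2N-1}).
    Assumes P(f; a) = 1 + 2 * sum_n a_n * cos(2*pi*(2n+1)*f) >= 0 on the grid
    and maximizes sum_n a_n * r_{2n+1} (the constant r_0/2 of the coding-gain
    objective is dropped).  Returns the LP solution a_bar, which is generally
    infeasible for the SIP problem and still has to be modified.
    """
    N = len(r_odd)
    f = 0.5 * np.arange(M + 1) / M                       # grid F_M
    C = np.cos(2 * np.pi * np.outer(f, 2 * np.arange(N) + 1))
    A_ub, b_ub = -2.0 * C, np.ones(M + 1)                # P(f_i; a) >= 0
    res = linprog(c=-np.asarray(r_odd, dtype=float),     # linprog minimizes
                  A_ub=A_ub, b_ub=b_ub,
                  bounds=[(-1.0, 1.0)] * N, method="highs")
    return res.x
```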

5.1.1 Algorithms

We consider the two discretization algorithms summarized in Tables 1 and 2. The first step in each algorithm consists in choosing a uniform discretization, f_i = 0.5i/M. Then the dual LP [8]

Page 352: Semi-Infinite Programming


min_λ  Σ_{i=0}^{M} λ_i   subject to

−2 Σ_{i=0}^{M} λ_i cos(2π(2n + 1) f_i) = r_{2n+1},   0 ≤ n < N,
λ_i ≥ 0,   0 ≤ i ≤ M,


is solved using a double-precision version of the standard simplex algorithm in [23]. The standard substitution x_i = cos(2π f_i) allows the trigonometric functions cos(2π f_i (2n + 1)) to be efficiently evaluated by recursive computation of the Chebyshev polynomials T_{2n+1}(x_i), 0 ≤ n < N [22]. By the fundamental theorem of LP, λ has Z ≤ N nonzero components with indices in a set A_Z. These components are typically clustered in pairs.
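For completeness, a short sketch of that evaluation trick (the function name is ours):

```python
import numpy as np

def odd_cosines(x, N):
    """Evaluate cos(2*pi*f*(2n+1)) = T_{2n+1}(x), 0 <= n < N, with x = cos(2*pi*f),
    using the Chebyshev three-term recurrence T_{k+1}(x) = 2*x*T_k(x) - T_{k-1}(x)."""
    x = np.asarray(x, dtype=float)
    T_prev, T_curr = np.ones_like(x), x.copy()      # T_0, T_1
    rows = [x.copy()]                               # T_1(x) = cos(2*pi*f)
    for k in range(1, 2 * N - 1):
        T_prev, T_curr = T_curr, 2 * x * T_curr - T_prev
        if (k + 1) % 2 == 1:                        # keep the odd orders only
            rows.append(T_curr.copy())
    return np.stack(rows)                           # row n holds cos(2*pi*f*(2n+1))
```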

The second step is to compute a feasible a for the SIP problem (3.4)-(3.5). The two algorithms use a different technique to attain this goal. Using the complementary slackness property (see (4.2)), Algorithm I computes the (infeasible) solution ā of the LP problem as a solution to the linear system

P(f_i; ā) = 0,   i ∈ A_Z.    (5.1)

See [20] for a characterization of uniqueness of solution.

Next, the minimum −δ ≤ 0 of {P(f; ā), 0 ≤ f ≤ 0.5} is computed. A feasible a is then obtained as

a = ā/(1 + δ).    (5.2)

This is a classical technique [26]. In contrast with Algorithm II below, it generally produces only one double zero on the unit circle. The third and final step of the algorithm consists in performing a spectral factorization of the feasible P(z; a), producing the desired filter H(z).
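A compact sketch of the feasibility restoration (5.2), under the same product-filter parameterization assumed above; the dense verification grid is an implementation choice of ours, not from the text:

```python
import numpy as np

def restore_feasibility(a_bar, dense_M=4096):
    """Step 2 of Algorithm I (a sketch): scale the LP solution onto the feasible set.

    Evaluates P(f; a_bar) = 1 + 2*sum_n a_bar[n]*cos(2*pi*(2n+1)*f) on a dense grid,
    finds its minimum -delta <= 0, and returns a = a_bar/(1 + delta) as in (5.2)."""
    a_bar = np.asarray(a_bar, dtype=float)
    N = len(a_bar)
    f = 0.5 * np.arange(dense_M + 1) / dense_M
    P = 1.0 + 2.0 * np.cos(2 * np.pi * np.outer(f, 2 * np.arange(N) + 1)) @ a_bar
    delta = max(0.0, -P.min())
    return a_bar / (1.0 + delta)
```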

Algorithm II does not compute ā but uses the dual solution to seek directly a polynomial P(z; a) with roots on the unit circle. Given {f_i, i ∈ A_Z}, the zeroes {f̂_k, 0 ≤ k < K} of P(f; a) and their multiplicities {μ_k} are computed in Step 2 using a clustering technique. There is at most one k such that f̂_k ∈ {0, 0.5}; if one such k exists, let μ_k ← μ_k/2. Assuming that the zeroes come in pairs but not in quadruples, etc., the clustering technique simply consists in assigning f̂_k to the center of gravity of the pair (f_{2k}, f_{2k+1}). Finally, the following system of Z linear equations is solved for the N unknowns {a_n}:

(5.3)
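A sketch of the clustering step just described; the handling of boundary points follows Table 2, the function name is ours, and the sketch assumes (as the text does) that the interior active frequencies pair up:

```python
def cluster_active_frequencies(f_active):
    """Step 2 of Algorithm II (a sketch): merge paired active grid points.

    Interior pairs are replaced by their center of gravity and assigned
    multiplicity 2; a point on the boundary {0, 0.5} keeps multiplicity 1
    (i.e., its multiplicity is halved, as in Table 2).
    Returns (zeros, multiplicities)."""
    f = sorted(f_active)
    boundary = [x for x in f if x in (0.0, 0.5)]
    interior = [x for x in f if x not in (0.0, 0.5)]
    zeros = list(boundary)
    mult = [1] * len(boundary)
    for j in range(0, len(interior) - 1, 2):          # adjacent pairs
        zeros.append(0.5 * (interior[j] + interior[j + 1]))
        mult.append(2)
    return zeros, mult
```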

Page 353: Semi-Infinite Programming


Here the final step of the algorithm (spectral factorization of P(z; a)) is aided by the fact that the zeroes of P(z; a) on the unit circle are available. Indeed, P(z; a) may be deflated as indicated in Table 2 and spectral factorization performed on a reduced-order polynomial. Additionally, since spectral factorization is an ill-conditioned problem when zeroes are of multiplicity greater than one (as is the case for our zeroes on the unit circle) or in close proximity [4, p.174][16], a potential source of instability is eliminated.

In [20,21], Lang and Frenzel's spectral factorization software [15,16] was used, which provides an estimate for the relative accuracy of the zeroes in the complex plane.

In contrast with Algorithm I, there is unfortunately no guarantee that Step 2 of Algorithm II will produce a feasible a. However, the method succeeds provided that the discretization is sufficiently fine [8, pp.141-142]. A further refinement of the method, not implemented here, consists in applying a local optimization algorithm to optimize {f̂_k} [8].

5.1.2 A brief analysis of elementary discretization algorithms

The discretization step is Δf = 0.5/M. It is easy to compute bounds on V(PSIP) in terms of £(a) and £(ā). Indeed, we have

£(a) ≤ V(PSIP) ≤ £(ā),    (5.4)

where the first inequality follows from the definition of V(PSIP) and the second comes from the fact that the discretized problem has fewer constraints than the SIP problem [8, p.15]. By the duality result in [11, Theorem 6.11] and continuity of the trigonometric functions in (3.4), £(ā) = V(P_M) can be made arbitrarily close to V(PSIP) by choosing M large enough [8, p.113] [24].

Of particular interest is the performance of the solution a relative to the optimal solution V(PSIP). For Algorithm I, Theorem 5.1 below provides an upper bound on the error due to discretization. This bound tends to zero as (N/M)². For M = 20N the error is less than 1.3%. The numerical experiments in §6 confirm that M ≈ 20N is a reasonable design.

Theorem 5.1 The normalized error incurred by discretization,

e = (V(PSIP) − £(a)) / (V(PSIP) − r_0/2),

is upper-bounded by (π²/2)(N/M)².
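Before turning to the proof, a quick check of the figure quoted above: for the design rule M = 20N the bound evaluates to

\[
e \;\le\; \frac{\pi^2}{2}\left(\frac{N}{M}\right)^2 \;=\; \frac{\pi^2}{2}\cdot\frac{1}{400} \;=\; \frac{\pi^2}{800} \;\approx\; 0.0123,
\]

i.e., a normalized discretization error below 1.3%.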

Page 354: Semi-Infinite Programming


Proof: Using (3.5) and (5.2) we obtain

£(ā) − r_0/2 = (1 + δ)(£(a) − r_0/2).    (5.5)

Plugging £(ā) from (5.5) into (5.4) yields

(V(PSIP) − £(a)) / (V(PSIP) − r_0/2) ≤ δ/(1 + δ),    (5.6)

where δ is defined above (5.2). The term δ/(1 + δ) may be bounded as follows. Let f_min be such that P(f_min; ā) = −δ. The grid point f_i closest to f_min satisfies |f_min − f_i| ≤ 0.25/M. Clearly, P'(f_min; ā) = 0 and P(f_i; ā) ≥ 0. By Taylor's remainder theorem, there exists f* ∈ [f_min, f_i] such that

P(f_i; ā) = −δ + ½ P''(f*; ā)(f_min − f_i)².    (5.7)

Since |P(f; ā) − 1| ≤ 1 + δ, applying the arguments from Lemma 4.2 yields P''(f; ā) ≤ 4π²(2N − 1)²(1 + δ). Replacing P''(f*; ā) and |f_min − f_i| in (5.7) by their bounds derived above yields

δ/(1 + δ) ≤ (π²/2)(N/M)².    (5.8)

The statement of the theorem follows from (5.6) and (5.8)². □

To briefly compare the performance of Algorithm II relative to Algorithm I, let

Δ(f) ≜ P(f; a_I) − P(f; a_II).    (5.9)

By construction we have 0 ≤ Δ(f̂_k) ≤ δ. Let λ be an optimal solution of the dual with nonzero components at f_k. Then by (4.7)

£(a_II) − £(a_I) = Σ_{k=1}^{K} λ(f_k) Δ(f_k).    (5.10)

It can be shown that f̂_k → f_k as M/N → ∞. Since Δ(·) is a continuous function (trigonometric polynomial), it is reasonable to expect that for M/N large enough

£(a_II) − £(a_I) ≈ Σ_{k=1}^{K} λ(f_k) Δ(f̂_k) ≥ 0,    (5.11)

²The expression for e is similar to (4.9), but the present derivation does not require explicit identification of the optimal zeroes.

Page 355: Semi-Infinite Programming


namely, the performance of Algorithm II is at least as good as that of Algorithm I. An experimental verification of this conjecture appears in §6.1; a rigorous proof would entail considerable technical difficulties.

When using either of the discretization algorithms, it is recommended to also compute the bounds (5.4) on the optimal coding gain which provide a measure of confidence in the result of the algorithm. If the bounds are not close enough, the discretization should be made finer. This observation points out to the use of more sophisticated discretization techniques. There is ample literature on this subject, see [24] for a discussion of adaptive discretization techniques.

When M/N is very large, the simplex algorithm experiences near-degeneracy. Indeed, clustered zeroes give rise to nearly dependent linear constraints [8, pp.136-137] [24]. For Algorithm I, clustered zeroes have the additional consequence that the linear system (5.1) is numerically ill-conditioned.

The simplex algorithm terminates after a finite number of steps, typically O(M) [23]. Computation of the solution to the linear system (5.1) (resp. (5.3)) for Algorithm I (resp. II) requires O(N³) operations.

5.2 Central cutting plane SIP

We present a brief description of the Central Cutting Plane SIP (CCPSIP) method [6,9,14]. For convenience we rewrite part of (3.5) as a^T r. In general, the maximization of a^T r subject to the infinite system (3.4) using CCPSIP requires additional finite polyhedral constraints defining a compact polyhedron, a ∈ K. For the case at hand

(5.12)

The idea of a cutting plane method, also termed column generation, for maximizing a^T r subject to (3.4) appeared in the literature at least 45 years ago. At any stage of a cutting plane method, a finite number of cuts have been generated as a finite subset of the full system (3.4), and one solves the finite ordinary LP problem. Now CCPSIP is one of those methods which permits the dropping of cuts in order to keep the LP problem size manageable. However, unlike many other methods, CCPSIP can generate a cut from any violated constraint, while guaranteeing convergence of both primal and dual SIP optimal solutions. (The method additionally does assume existence of an interior point, namely a point where all constraints can be satisfied strictly.) This particular feature about convergence is useful for obtaining good starting solutions for solving the

Page 356: Semi-Infinite Programming


nonlinear system of equations derived from the Karush-Kuhn-Tucker (KKT) first-order necessary optimality conditions [14].

The method is actually an interior point method because each finite LP problem gives the largest sphere which can be drawn within all of the cuts generated so far and the upper bound on the objective function a^T r, whose center lies within the convex polyhedron K. As additional cuts are added, see [C], the size of the sphere shrinks. The process continues until, within a finite number of iterations, the current sphere lies within the interior of the constraint set (3.4) and the set K. At this point, if the feasible point is not optimal, then the objective function line is moved outwards to increase its level, i.e., a^T r increases.
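A sketch of the Chebyshev-center ("largest sphere") LP subproblem that this paragraph describes, in the spirit of Elzinga and Moore [6]; this is not the LINOP/CCPSIP implementation, and the cut representation, the box stand-in for K, and the function name are ours:

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_center_step(cuts, level, r, box=1.0):
    """One central cutting plane LP (a sketch).

    cuts  : list of pairs (g, c) encoding linear inequalities g @ a <= c
            (each cut comes from a violated constraint of (3.4)).
    level : current best value of the objective a^T r.
    box   : |a_n| <= box, a simple stand-in for the compact polyhedron K.

    Solves   max sigma   s.t.  g @ a + ||g||*sigma <= c        for every cut,
                               -r @ a + ||r||*sigma <= -level,
    i.e. the largest sphere inside all cuts and the objective half-space.
    Returns the sphere center a and its radius sigma."""
    r = np.asarray(r, dtype=float)
    N = len(r)
    A_ub, b_ub = [], []
    for g, c in cuts:
        g = np.asarray(g, dtype=float)
        A_ub.append(np.append(g, np.linalg.norm(g))); b_ub.append(c)
    A_ub.append(np.append(-r, np.linalg.norm(r))); b_ub.append(-level)
    res = linprog(c=np.append(np.zeros(N), -1.0),        # maximize sigma
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(-box, box)] * N + [(0, None)], method="highs")
    return res.x[:N], res.x[N]
```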

It is always recommended that any cutting plane method be combined with a nonlinear equations solver on the KKT first order system, where now the "locations" f, the "masses" λ(f), and the primal variables a are all variables in the nonlinear system. But only relatively recently have good starting solutions been possible via CCPSIP methods [14]. The latter extension applies to problems in complex approximation, where infinitely many convex inequalities arise naturally.

The LP subproblems are solved with the Georg and Hettich LINOP package [7], where the authors have shown that the most stable simplex method implementation occurs when the orthogonal Q-matrix is retained and itself updated through successive iterations. Among previously reported numerical experiments on solving discretized linear SIP problems using LINOP is the 1993 paper [13].

The convex SIP implementation [14] employs the nonlinear Krylov solver NKSOL of Brown and Saad [2], where the Newton equations are solved in an approximate sense by a linear Krylov iteration. The Newton equations themselves stem from first order necessary optimality conditions of KKT type.

6 NUMERICAL RESULTS

The algorithm was tested on three types of input signal. In all three cases we specify the stochastic characteristics by specifying the corresponding Toeplitz matrix of the design problem, see (2.6). The first two cases can be viewed as "near-degenerate" processes, in the sense of §4.1.

Page 357: Semi-Infinite Programming


1. AR(1) process with correlation coefficient ρ = 0.95 (simple image model [3,28,29]). In this case r_n = ρ^n.

2. AR(2) process with poles at z_± = ρ e^{±jθ}, with ρ = 0.975 and θ = π/3. (Models certain types of image texture.) Then r_n = 2ρ cos θ · r_{n−1} − ρ² r_{n−2}, with r_0 = 1 and r_1 = 2ρ cos θ/(1 + ρ²).

3. Lowpass process with box spectrum, i.e., rectangular distribution S(f) = (2f_s)^{−1} rect_{[−f_s, f_s]}(f), with f_s = 0.225. The optimization criterion is then equivalent to minimization of the energy in the stopband [0.5 − f_s, 0.5] of H(f) [31]. Then r_n = sin(2π f_s n)/(2π f_s n). (The sketch following this list generates these three autocorrelation sequences.)
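A short sketch generating the three test autocorrelation sequences; function names and default arguments are ours, following the parameters stated above:

```python
import numpy as np

def ar1_autocorr(n_max, rho=0.95):
    """AR(1) test process: r_n = rho^n."""
    return rho ** np.arange(n_max + 1)

def ar2_autocorr(n_max, rho=0.975, theta=np.pi / 3):
    """AR(2) test process with poles at rho*exp(+-j*theta), using the recursion
    r_n = 2*rho*cos(theta)*r_{n-1} - rho^2*r_{n-2}, r_0 = 1, r_1 = 2*rho*cos(theta)/(1+rho^2)."""
    r = np.empty(n_max + 1)
    r[0] = 1.0
    r[1] = 2 * rho * np.cos(theta) / (1 + rho ** 2)
    for k in range(2, n_max + 1):
        r[k] = 2 * rho * np.cos(theta) * r[k - 1] - rho ** 2 * r[k - 2]
    return r

def box_autocorr(n_max, fs=0.225):
    """Box-spectrum (ideal lowpass) test process: r_n = sin(2*pi*fs*n)/(2*pi*fs*n), r_0 = 1."""
    n = np.arange(n_max + 1)
    return np.where(n == 0, 1.0,
                    np.sin(2 * np.pi * fs * n) / (2 * np.pi * fs * np.where(n == 0, 1, n)))
```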

Our numerical results are summarized below; for more details, see [20].

6.1 Discretization algorithms

Results are shown in Tables 3 and 4 for 8-tap (N = 4) and 20-tap (N = 10) filters, respectively. The coding gain computed by Algorithms I and II is a lower bound on the optimal coding gain, see (5.4). For comparison the upper bound in (5.4) is also listed. When M/N is too small, the discretized problem may be an unacceptably poor approximation to the SIP problem, e.g., in the AR(2) example with N = 4 and M = 10, the feasible set is unbounded, and the "approximate" solution tends to infinity! The same artifact occurs in the AR(2) and box-spectrum examples with N = 10 and M = 20. As M / N increases the brackets on the optimal coding gain get tighter, as predicted by theory. Even with double-precision arithmetic, numerical instabilities occur when M / N is too large (not shown here).

In most of the examples studied Algorithm II yielded a larger coding gain than Algorithm I. However, the difference was minute for large M / N. The results obtained with M ~ 20N were almost identical to those obtained with the more sophisticated CCPSIP method.

6.2 Central cutting plane SIP algorithms

For the case N = 4 we applied the linear CCPSIP implementation and then automatically clustered the mass points by simply taking centers of gravities, in essentially the same way as done in Discretization Algorithm II. We then input a clustered distribution to the convex SIP solver, which was "accepted"

Page 358: Semi-Infinite Programming


by NKSOL since the norm of the KKT system, together with a "matching" of first derivatives (see (5.3)), was below a user-specified threshold.

For N = 10 we applied the linear CCPSIP implementation but could not readily fine tune the NLE solver as was the case for N = 4. The parameters are now set for the default mode. We obtained extremely accurate optimal solutions for the non-clustered dual solution, but we wished to test the effects of clustering the mass points, as we did for the case N = 4. Therefore, after doing the clustering we checked the clustered dual solution for KKT accuracy, and found that the accuracy was sufficient. Clustering mass points of course, has no effect on the accuracy of the primal solution to PSI P.

These results for the clustered dual solutions are presented in Tables 5 and 7. The location f_k of the zeroes and the corresponding value λ(f_k) of the dual variable (or sensitivity parameter) are displayed in the first two columns. Also shown in the third column is the complementary slackness, i.e., the value of the right-hand side of (4.4). A perfect match between solutions of the primal and dual problems would imply that this value is exactly zero. The product filter coefficients for the N = 4 design are displayed in Table 6.

7 REGULARITY CONSTRAINTS

In image coding applications using tree-structured QMF banks it is desirable for the lowpass filter H(f) to have at least one zero at f = 0.5, in order to enhance the visual quality of the reconstruction [33, p.414]. (The technical argument in [33, p.414] uses wavelet filters but can be extended to the case of scale-dependent filter banks.) Here we outline a method for forcing L zeroes of H(f) at f = 0.5 and maximizing the coding gain over the remaining degrees of freedom. The regularity constraint is expressed as L linear constraints on a,

(7.1)

The new optimization problem is (3.4)-(3.5) together with the L additional constraints (7.1). Then L variables, say a_{N−L}, ..., a_{N−1}, may be eliminated. The remaining variables are the solution of a new (N − L)-dimensional SIP problem. Related elimination techniques have been reported in [1,25].
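Since the explicit form of (7.1) is not reproduced above, the following is only an illustration of how such constraints can be imposed, derived under the product-filter parameterization P(f; a) = 1 + 2 Σ_n a_n cos(2π(2n+1)f) assumed in the earlier sketches; the equality system below is our reconstruction under that assumption, not a quotation of (7.1).

```python
import numpy as np

def regularity_constraints(N, L):
    """L linear equality constraints on a forcing a zero of order 2L of the
    product filter P at f = 0.5 (a hedged sketch).  Under the assumed
    parameterization these read
        sum_n (2n+1)^(2l) * a_n = (1/2) * delta_{l0},   l = 0, ..., L-1
    (odd derivatives of P vanish at f = 0.5 automatically by symmetry).
    Returns (C, d) such that the constraints are C @ a == d."""
    n = np.arange(N)
    l = np.arange(L)
    C = (2 * n + 1)[None, :] ** (2 * l)[:, None]    # shape (L, N)
    d = np.zeros(L)
    d[0] = 0.5
    return C, d
```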

Page 359: Semi-Infinite Programming


8 CONCLUSIONS

We have shown that designing an orthogonal FIR-QMF bank so as to maximize the coding gain is a much simpler operation than was previously recognized. The coding gain depends on the coefficients (3.2) of the product filter only, and not at all on the phase of the QMFs. The transformation (3.2) leads to a linear SIP problem. Some of the useful properties of this formulation are: (I) every locally optimal solution is also globally optimal; (2) SIP algorithms are numerically stable; (3) degeneracy and ill-conditioning are much easier to control than in standard nonlinear optimization; (4) the solution is matched to a solution of the dual SIP problem. The dual solution is automatically supplied by the numerical algorithms and conveys useful information about the properties of the input signal relevant to the coding application. In particular, the sensitivity parameters are also indicators of degeneracy.

The first class of algorithms studied was based on discretization. The main advantage here is simplicity: the core of the method is the widespread simplex algorithm. Theorem 5.1 has shown that the solution has arbitrary accuracy when the discretization index M / N is made large enough. Convergence is quadratic, but care must be taken not to make the discretization overly fine, due to potential ill-conditioning problems. The second class of algorithms uses cutting-plane methods. These methods are very effective as a general-purpose tool for solving SIP problems. They have very high accuracy, but require more elaborate software.

The methods described here have the flexibility to be extended to a variety of optimization problems with an infinite number of constraints of the form (3.4). For instance, in §7 we showed how a finite number of additional linear constraints may be handled. Additionally, SIP techniques apply directly to any linear objective function of the form (3.5): {r_{2n+1}} need not be restricted to be one of the autocorrelation sequences defined in §2.1. Other possible extensions of the method even include problems with a nonlinear objective function. Cutting-plane methods have been extensively applied to such problems [11]. Of special interest is the case of convex objective functions, for which globally optimal solutions can still be guaranteed. Another extension of the problem consists in designing signal-adapted paraunitary M-band QMF banks, under an energy compaction criterion tailored to progressive transmission and coding applications [19,21]. Finally, it should be noted that the methods developed in this paper are not directly applicable to the design of signal-adapted multidimensional (M-D) FIR filter banks, due to the lack of a spectral factorization theorem for M-D polynomials. However, numerical solutions have been found

Page 360: Semi-Infinite Programming


for some constrained designs [34], and the solution for unconstrained-length filters is a simple extension of known 1-D results.

Acknowledgement The authors are grateful for the very helpful constructive comments made by a referee.

REFERENCES

[1] K. C. Aas, K. A. Duell, and C. T. Mullis. Synthesis of extremal wavelet-generating filters using Gaussian quadrature. IEEE Trans. Sig. Proc., 43:1045-1057, 1995.

[2] P. N. Brown and Y. Saad. Hybrid Krylov methods for nonlinear systems of equations. SIAM J. Sci. Comput., 11:450-481, 1990.
[3] H. Caglar, Y. Liu, and A. N. Akansu. Statistically optimized PR-QMF design. SPIE, 1605:86-94, 1991.
[4] I. Daubechies. Ten Lectures on Wavelets. Number 61 in CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, PA, 1992.
[5] P. Delsarte, B. Macq, and D. T. M. Slock. Signal-adapted multiresolution transform for image coding. IEEE Trans. Info. Theory, 38:897-904, 1992.
[6] J. Elzinga and T. G. Moore. A central cutting plane algorithm for the convex programming problem. Math. Programming, 8:134-145, 1975.
[7] K. Georg and R. Hettich. On the numerical stability of the simplex algorithm: The package LINOP. Technical report, The University of Trier, Trier, Germany, April 1985.
[8] K. Glashoff and S.-A. Gustafson. Linear Optimization and Approximation. Number 45 in Applied Mathematical Sciences. Springer-Verlag, Berlin-Heidelberg-New York, 1983.
[9] P. R. Gribik. A central cutting plane algorithm for semi-infinite programming problems. In R. Hettich, editor, Semi-Infinite Programming, number 15 in Lecture Notes in Control and Information Sciences, pages 66-82. Springer-Verlag, 1979.
[10] S. Gustafson and K. O. Kortanek. Numerical treatment of a class of semi-infinite programming problems. Naval Res. Logistics Quart., 20:477-504, 1973.
[11] R. Hettich and K. O. Kortanek. Semi-infinite programming: theory, methods, and applications. SIAM Review, 35:380-429, 1993.
[12] N. S. Jayant and P. Noll. Digital Coding of Waveforms. Prentice-Hall, 1984.
[13] K. O. Kortanek. Vector-supercomputer experiments with the linear programming primal affine scaling algorithm. SIAM J. Scientific and Statistical Computing, 14:279-294, 1993.

Page 361: Semi-Infinite Programming


[14] K. O. Kortanek and H. No. A central cutting plane algorithm for convex semi-infinite programming problems. SIAM J. Optimization, 3:901-918, 1993.
[15] M. Lang and B.-C. Frenzel. Software available by anonymous ftp from cml.rice.edu: /pub/markus/software, 1992. ©1992-4 LNT.
[16] M. Lang and B.-C. Frenzel. Polynomial root finding. IEEE Sig. Proc. Lett., 1:141-143, 1994.
[17] S. G. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 11:674-693, 1989.
[18] Y. Meyer. Wavelets: Algorithms & Applications. SIAM, Philadelphia, PA, 1993. Translated and revised by Robert D. Ryan.
[19] P. Moulin, M. Anitescu, K. O. Kortanek, and F. Potra. Design of signal-adapted FIR paraunitary filter banks. In Proc. ICASSP, volume 3, pages 1519-1522, Atlanta, GA, 1996.
[20] P. Moulin, M. Anitescu, K. O. Kortanek, and F. Potra. The role of linear semi-infinite programming in signal-adapted QMF bank design. IEEE Transactions on Signal Processing, 45:2160-2174, 1997.
[21] P. Moulin and K. M. Mihcak. Theory and design of signal-adapted FIR paraunitary filter banks. Technical report, The University of Illinois Beckman Institute, Champaign/Urbana, IL, 1997. To appear in IEEE Transactions on Signal Processing, Special Issue on Applications of Wavelets and Filter Banks, 1998.
[22] T. W. Parks and C. S. Burrus. Digital Filter Design. J. Wiley & Sons, 1987.
[23] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1988.
[24] R. Reemtsen. Discretization methods for the solution of semi-infinite programming problems. J. Opt. Theory and Appl., 71:85-103, 1991.
[25] O. Rioul and P. Duhamel. A Remez exchange algorithm for orthonormal wavelets. IEEE Trans. Circ. and Syst. II: Anal. and Dig. Sig. Proc., 41:550-560, 1994.
[26] M. J. T. Smith and T. P. Barnwell III. Exact reconstruction techniques for tree-structured subband coders. IEEE Trans. ASSP, 34:434-441, 1986.
[27] G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, Wellesley, MA, 1996.
[28] M. Unser. An extension of the Karhunen-Loeve transform for wavelets and perfect-reconstruction filterbanks. SPIE, 2034:45-56, 1993.
[29] B. Usevitch and M. T. Orchard. Smooth wavelets, transform coding, and Markov-1 processes. In Proc. ISCAS'93, pages 527-530, 1993.

Page 362: Semi-Infinite Programming


[30] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice-Hall, 1993.
[31] P. P. Vaidyanathan and P.-Q. Hoang. Lattice structures for optimal design and robust implementation of two-channel perfect-reconstruction QMF banks. IEEE Trans. ASSP, 36:81-94, 1988.
[32] L. Vandendorpe. CQF filter banks matched to signal statistics. Signal Processing, 29:237-249, 1992.
[33] M. Vetterli and J. Kovacevic. Wavelets and Subband Coding. Prentice-Hall, 1995.
[34] B. Xuan and R. H. Bamberger. Multi-dimensional, paraunitary principal component filter banks. In Proc. ICASSP'95, pages 1488-1491, Detroit, 1995.
[35] A. Zygmund. Trigonometric Series. Cambridge University Press, 1959.

Page 363: Semi-Infinite Programming


Table 1 Summary of Discretization Algorithm I

Step 1 Define {f_i = 0.5i/M, 0 ≤ i ≤ M}. Solve dual LP using simplex algorithm → zeroes {f_i, i ∈ A_Z}.

Step 2 Compute solution ā of linear system (5.1). Find minimum −δ ≤ 0 of {P(f; ā), 0 ≤ f ≤ 0.5}. Let a = ā/(1 + δ).

Step 3 Compute spectral factors of P(z; a) = H(z)H(z^{-1}).

Table 2 Summary of Discretization Algorithm II

Step 1 Define {f_i = 0.5i/M, 0 ≤ i ≤ M}. Solve dual LP using simplex algorithm → zeroes {f_i, i ∈ A_Z}.

Step 2 Cluster {f_i, i ∈ A_Z} → zeroes {f̂_k} of P(f; a) with multiplicities {μ_k}. If f̂_k ∈ {0, 0.5} let μ_k = μ_k/2. Compute solution a of linear system (5.3).

Step 3 Let z_k = e^{j2π f̂_k} and compute V(z) = P(z) / Π_k (z − z_k)^{μ_k} (z − z̄_k)^{μ_k}. Compute spectral factors of V(z) = H_v(z)H_v(z^{-1}). Let H(z) = H_v(z) Π_k (z − z_k)^{μ_k/2} (z − z̄_k)^{μ_k/2}.

Page 364: Semi-Infinite Programming


Table 3 Coding gain in dB for 8-tap filters (N = 4): adapted filters using Algorithms I and II and CCPSIP, and Daubechies' non adapted D8 filter. UB is the upper bound (5.4) on the optimal coding gain.

Process         M    Algo I   Algo II   UB      CCPSIP   D8
AR(1)           10   5.449    5.860     5.892   5.862    5.810
                20   5.660    5.855     5.867
                50   5.845    5.862     5.863
                90   5.857    5.862     5.862
AR(2)           10   -        -         ∞       6.070    2.632
                20   5.939    5.921     6.796
                50   6.058    6.056     6.170
                90   6.042    6.040     6.072
box-spectrum    10   3.616    4.220     6.308   4.885    3.431
                20   4.738    4.855     5.118
                50   4.853    4.872     4.915
                90   4.847    4.878     4.891

Table 4 Coding gain in dB for 20-tap filters (N = 10): adapted filters using Algorithms I and II and CCPSIP, and Daubechies' nonadapted D20 filter. UB is the upper bound (5.4) on the optimal coding gain.

Process         M     Algo I   Algo II   UB       CCPSIP   D20
AR(1)           20    3.782    5.924     5.962    5.945    5.872
                50    5.686    5.943     5.945
                90    5.878    5.945     5.945
                190   5.932    5.945     5.945
AR(2)           20    -        -         ∞        6.835    3.402
                50    6.567    6.803     6.852
                90    6.805    6.831     6.838
                190   6.831    6.835     6.837
box-spectrum    20    -        -         ∞        9.879    4.316
                50    8.997    9.838     10.252
                90    9.358    9.794     9.944
                190   9.779    9.869     9.899


Page 365: Semi-Infinite Programming


Table 5 Results for Dual Solution by the CCPSIP Method : N = 4.

Process         k    f_k       λ(f_k)    compl. slackness
box-spectrum    2    .44549    .96741    .67372E-09

Table 6 Results for primal Solution by the CCPSIP Method : N = 4

Columns: product filter coefficients a_0, ..., a_3 for the AR(1), AR(2), and box-spectrum designs.

Table 7 Results for Dual Solution by the CCPSIP Method: N = 10.

Columns: zero locations f_k, dual variables λ(f_k), and complementary slackness (right-hand side of (4.4)) for the AR(1), AR(2), and box-spectrum designs.

Page 366: Semi-Infinite Programming

11 THE DESIGN OF NONRECURSIVE

DIGITAL FILTERS VIA CONVEX OPTIMIZATION

Alexander W. Potchinkov

Brandenburgische Technische Universität Cottbus, Fakultät 1, Postfach 101344, D-03013 Cottbus, Germany,

Email: [email protected]

ABSTRACT

The advantages of optimization in filter design over strongly specialized methods based upon approximation theory have been well known for some years, first of all in the area of constrained linear-phase filter design. Mainly finite linear optimization has been used, which requires discretization w.r.t. the frequency variable and, if necessary, the linearization of important nonlinear filter characteristics. The work here is founded on convex finite and semi-infinite optimization. The approach avoids the discretization step and thereby, in particular, enables the design of large filters on personal computers. Moreover, convex functions such as the magnitude response, the magnitude of the complex approximation error, and some quadratic error functions can be used in their original form.

1 INTRODUCTION

An interesting and above all profitable application of nonlinear and especially semi-infinite optimization is the design of digital filters w.r.t. the frequency domain, where here nonrecursive (FIR) digital filters¹ are considered. It is

¹There are two types of digital filters, recursive and nonrecursive digital filters. In principle, both types of filters can be used for the same technical applications. But some differences w.r.t. the realization have to be considered. The paper is restricted to nonrecursive filters (polynomial approximation), which differ from recursive filters (rational approximation) by normally significantly larger filter lengths (up to about 2000 in contrast to about 50), such that their design leads to correspondingly high dimensional design problems. Some cases, however, are known where the nonrecursive filter needs a smaller number of coefficients than the recursive filter or a reduced computational demand in the implementation, if comparable technical qualities are to be reached. Furthermore, some characteristics can be obtained exactly by nonrecursive (linear phase) and recursive filters (all pass characteristic) only. The number of filter coefficients nowadays has lost importance since modern digital signal processors offer high computational power at low costs, in combination with intelligent and high speed interfaces.


R. Reemtsen and J.-J. Rückmann (eds.), Semi-Infinite Programming, 361-387. © 1998 Kluwer Academic Publishers.

Page 367: Semi-Infinite Programming


profitable since, by that, it has significant advantages over the numerous already existing design methods in regard to the variety of design goals, the handling of large numbers of variables, and reliability of computer programs.

Optimization is a known tool for filter design. Semi-infinite optimization methods, however, have not been applied before, although important cases of minimax designs or designs w.r.t. a priori tolerance schemes are semi-infinite programming problems. On the other hand, classical finite optimization methods can be applied to discretized design problems only. And, more momentously, semi-infinite optimization makes it possible to design filters of high degree by means of the widely used personal computers.

After an active phase of optimization in filter design during the late sixties and seventies, where optimization first and foremost was employed for the design of analog and recursive digital filters, it does not play the adequate role presently. In fact, the flexibility with regard to constrained design is known and appre­ciated, but, at the same time, inefficiency in view of speed, waste of computer resources, and difficulties at high degree designs are supposed. The advances in optimization (see for example SQP methods, interior point methods and, most important, semi-infinite optimization) have been appreciated by the filter designers only partially. Papers on minimax design of FIR filters, which are based on a fixed discretization and application of the simplex method of linear programming and thereby feed the above mentioned prejudices, can be found even in the nineties. On the other hand, heuristic methods are still suggested nowadays. For example, several heuristic methods result from the desire to knock the well established Remez exchange method of linear real Chebyshev approximation together with complex approximation, where even malfunction is accepted if only a couple of designs can be completed more or less success­fully. In contrast to that, semi-infinite optimization now offers the flexibility of optimization in combination with robust methods for high degree designs which are founded on a rigorous mathematical basis and have proved convergence. To say it shortly, there is no reason to furthermore construct ad hoc methods for different problem classes which leave the users in the dark, if they want to get information about their ability to find reliable solutions.

Potchinkov and Reemtsen introduced convex semi-infinite optimization into FIR filter design. As results of this cooperation, papers have been published


Page 368: Semi-Infinite Programming


about complex Chebyshev approximation [16, 17], design of linear phase FIR filters [14], and the problems of simultaneous approximation of magnitude and phase [18] or magnitude and group delay responses [13]. Convex optimization fits exactly the design of linear phase FIR filters and complex approximation. The nonlinear simultaneous approximation problems can be solved by that in good approximation. Furthermore, Potchinkov and Reemtsen developed a method which has been used successfully for numerous filter designs including also commercial ones [11]. By that, in particular, large filters can be computed by personal computers since their design can be performed by a sequence of relatively small subproblems. Up to now, nonlinear phase filters with up to 1000 coefficients and linear-phase filters with up to 2000 coefficients were designed, showing, for example, 5 to 8 significant decimals in case of classical minimax designs.

The earlier filter design methods were based on multiple exchange algorithms for linear phase FIR filter design [9,12], on single exchange techniques for complex Chebyshev approximation [2,3,24], on least squares techniques (e.g. [1]), and on finite linear programming (e.g. [29]). In addition there exist many heuristic methods. Especially the exchange algorithms were founded on approximation theory and hence only have limited scopes. To give an example, the Remez-II method became very popular for the important linear phase FIR filter minimax design, since Parks and McClellan [10] published a computer program in the early days of digital filter design. Derivatives of this program were incorporated into many software environments. But the program does not allow constrained minimax designs, an important feature which does not cause any principal difficulties for optimization techniques.

The paper contains six sections. The second section settles some basic ideas of the theory of digital FIR filters. For this, the digital signal is introduced as a discrete time signal, which arises from the sampling of time continuous physical signals and is suited for being processed by digital computers. The digital filter then is a system which processes digital signals. In the third section some appli­cation fields of digital filters are indicated. The design aspects and design goals are classified in the fourth section by four main problems, which, in combina­tion with error valuations, lead to mathematical approximation problems. The topic of the fifth section is the convex optimization problem. The presentation of the relations and analogies between optimization and filter design connects both worlds. Two numerical examples, based on semi-infinite optimization, are the contents of the sixth section. The examples are chosen such that the tech­nical application can be seen without difficulties and show the large variety of filter design problems. A short conclusion terminates the paper.

Page 369: Semi-Infinite Programming


2 CHARACTERISTICS OF FIR FILTERS

The characteristics of digital filters can be given here in concentrated form only. Beside many others, the books of Rabiner and Gold [20], Taylor [30], Schussler [25], and Parks and Burrus [11] discuss the digital filter in detail, beginning from theory over design up to its realization. This section contains a short description of FIR filters w.r.t. time and frequency domain and is intended to give an impression of a filter realization as a minimal system of modern signal processing electronics.

2.1 Signals and filters

Commonly, signals are functions of one or more independent variables. Here a single, independent variable is used, namely the time t. Signals represent time dependent characteristic functions of physics such as currents and voltages in electrical networks. The argument t indicates time continuous signals denoted as x(t), for example. Mathematically, a digital signal is a sequence consisting of quantized and coded numbers, which can be processed or filtered by a (possibly specialized) digital computer such as a digital signal processor. Analog to digital converters (ADCs) and digital to analog converters (DACs) change time continuous signals to digital signals and conversely. A digital signal is a sequence (x(n)) where the integer argument n or the index includes the value 0 and corresponds to a (discrete) instant t_n. It is assumed that t_n = nT, where T = 1/F indicates the sampling (time) interval and F the sampling frequency.

Obviously, the duties of an ADC are to take samples of the bandlimited time continuous function x(t) at instants t = tn, to quantize and code them, and to finally place the flow of digital numbers to the disposal of the digital com­puter for the purpose of being processed there. On the other hand, the DAC together with a reconstruction filter forms a bandlimited time continuous sig­nal from a digital signal. Under the condition of sufficiently bandlimited time continuous signals, the error of conversion depends on the quantization or the digital wordlength, respectively. But modern converters in the area of audio signal processing, corresponding to a bandwidth of more than 20kHz, offer wordlengths up to 20 bit (even 24 bits in the near future) which cause a negli­gible, inevitable quantization error at many technical applications.

A system is a mathematical model of a (physical) object with inputs and out­puts, where the output signal is related to the input signal through the system transformation. Some suitable mathematical methods as, for example, the z-

Page 370: Semi-Infinite Programming


and the Fourier transform are used to describe the input-output relations of the system. In the following, the digital input and output signals of the time domain are written as (x(n)) and (y(n)) and those of the frequency domain as X(ω) and Y(ω).

A length N nonrecursive digital filter is a system which processes the digital signal (x(n)) by computation of the discrete convolution

y(n) = Σ_{k=0}^{N−1} h(k) x(n − k),    (2.1)

which provides the digital signal (y(n)) without having to leave the time domain. The coefficients h(k), k = 0, ..., N − 1, called filter coefficients, are usually and here assumed to be real-valued since data converters work with real-valued signals only. The finite sequence (h(n)) corresponds to the unit impulse response of the filter, an important characteristic in system theory which is obtained after excitation of the filter by the unit impulse δ(n). If n indicates the present sample of the signal x(n) at the time instant t = nT, then x(n − k) is the sample which lies k ∈ ℕ instants backwards in time. The filters of this paper are "linear time invariant causal nonrecursive digital filters²" which are shortly denoted as "FIR filters" in the following.
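A direct (if slow) sketch of the convolution (2.1); samples with negative index are taken as zero, i.e., the filter starts at rest:

```python
import numpy as np

def fir_filter(h, x):
    """Direct-form FIR filter: y(n) = sum_{k=0}^{N-1} h(k) * x(n-k), as in (2.1)."""
    N = len(h)
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(N):
            if n - k >= 0:
                y[n] += h[k] * x[n - k]
    return y

# equivalently: y = np.convolve(x, h)[:len(x)]
```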

2.2 Time domain description

Equation (2.1) is the time domain computational scheme of the filter. The finite duration impulse response is given by

",N-l i(n) = ~k=O h(k)8(n - k) (2.2)

where δ(n) denotes the unit pulse. Equation (2.2) characterizes the FIR filter completely in the time domain.

2 A filter is said to be linear, if the response to a sum of two signals is the sum of the two responses and scaling the input by a constant results in the output scaled by the same constant. A filter is said to be time invariant, if a time shift in the input signal causes a time shift in the output signal. A filter is said to be causal, if the output at any time depends only on values of the input at present time and from the past. A filter is said to be nonrecursive, if only input samples are used to compute the output samples.

Page 371: Semi-Infinite Programming


2.3 Frequency domain description

Application of the single sided z-transform to the sequences (x(n)), (y(n)), and (h(n)) leads to the complex-valued transfer function H(h,z) of the FIR filter (z-transform of the impulse response) defined by

Y(z) ",N-l -n H(h, z) = x(z) = L."n=O h(n)z .

Evaluation of the transfer function on the unit circle leads to the complex-valued frequency response of the filter

H(h, ω) = Σ_{n=0}^{N−1} h(n) e^{−jωn},    (2.3)

where j² = −1. The frequency response completely characterizes the digital filter in the frequency domain, like the impulse response (2.2) does in the time domain. Specifications of filters preferably are drawn up in the frequency domain. Therefore the frequency response is a more important characteristic function than the impulse response w.r.t. specification and design. This coarsely describes filtering, namely the change of spectra of signals or the spectral separation of (parts of) signals.

Since linearity of the filter has been assumed, a harmonic input signal of the form

x(n, ω) = sin(ωn)    (2.4)

leads also to a harmonic output signal of the form

y(n, h, ω) = M(h, ω) sin(ωn + φ(h, ω)).    (2.5)

The output (2.5) corresponds to the input (2.4), weighted by M(h, ω) and phase shifted by φ(h, ω). The (angular) frequency ω of the signal remains unchanged, for which reason the harmonic functions defined by f(n, ω) = exp(jωn) are eigenfunctions of the filter. The functions M and φ in (2.5) are further important characteristic functions in the frequency domain, which are called the magnitude and phase response and which can be written as

M(h, ω) = |H(h, ω)|,   φ(h, ω) = arg(H(h, ω))

or, correspondingly to (2.3), as

H(h, ω) = M(h, ω) e^{jφ(h,ω)}.

Page 372: Semi-Infinite Programming


Finally, the group delay response τ(h, ω) is often needed. In case it is (piecewise) constant, this response describes a time delay which belongs to groups of frequencies of the signal. The group delay is computed by

τ(h, ω) = −(∂/∂ω) φ(h, ω) = Re( Σ_{n=0}^{N−1} n h(n) e^{−jωn} / Σ_{n=0}^{N−1} h(n) e^{−jωn} ).
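A small numerical sketch evaluating these characteristic functions on a frequency grid (the function name is ours; the group delay uses the quotient formula above):

```python
import numpy as np

def frequency_characteristics(h, w):
    """Frequency response H, magnitude M, phase phi, and group delay tau of an
    FIR filter h at the angular frequencies w, following (2.3)."""
    h = np.asarray(h, dtype=float)
    n = np.arange(len(h))
    E = np.exp(-1j * np.outer(w, n))       # e^{-j*w*n}, shape (len(w), len(h))
    H = E @ h                              # frequency response
    M = np.abs(H)                          # magnitude response
    phi = np.angle(H)                      # phase response (principal value)
    tau = np.real((E @ (n * h)) / H)       # group delay
    return H, M, phi, tau
```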

A special characteristic function is the amplitude response, which is the relevant characteristic function of linear phase FIR filters. Frequency response H(h, ω) and amplitude response A(h, ω) are related by

H(h, ω) = A(h, ω) e^{jϕ(h,ω)}.

The function ϕ(h, ω) is the continuous version of φ(h, ω) [11]. The equation

A(h, ω) = ±M(h, ω)

relates amplitude and magnitude responses. The affine linear functions of phase responses of linear phase filters are determined by the symmetry properties of the impulse responses or the filter coefficients, respectively. Therefore, linear phase FIR filter design means to approximate amplitude responses only, which gives solutions of magnitude response designs. Hence the linear phase FIR filter design is considered as a magnitude response design.

2.4 Realized FIR filters

A minimal low cost system is considered, consisting of a codec, a digital signal processor, a read only memory (e.g. an EPROM), and a clock generator. The system as it is can be used for high quality audio signal processing. The 66MHz signal processor DSP 56002 of Motorola and the 20-Bit codec3 CS4222 of Crystal Semiconductors allow the realization of two length N = 300 FIR filters at a sampling frequency of F=48kHz. The blockdiagram of the system is shown in Figure 1. The filters can be used as parts of a hifi stereo equipment, performing, for example, an effective frequency response equalization of both loudspeaker boxes.

³A codec integrates ADC and DAC and many peripheral components necessary for conditioning analog signals for digital processing.

Page 373: Semi-Infinite Programming



Figure 1 A minimal system of digital audio processing.

3 APPLICATION FIELDS

In the following, some application fields of digital filters are listed.

1. Ideal selective filters Frequency selective filters let pass parts of the frequency band of a signal (passbands) as unaltered as possible and suppress other parts (stopbands) as completely as possible. Passbands and stopbands are labeled as design intervals. The ideal bandlimiting system or filter shows ideal characteristics, i.e. piecewisely constant magnitude response and group delay response. The latter characteristic can be exactly reached only by the linear phase FIR filter. Nonlinear phase FIR filters can approximate a constant group delay which can be chosen smaller than the corresponding group delay of the linear phase filter of same length. The customary selective filters are lowpass, highpass, bandpass, bandstop, and multi band filters, where these names refer to frequency domain applications.

2. Antialiasing and reconstruction filters Data acquisition systems with ADCs and DACs use so-called antialiasing and reconstruction filters in order to bandlimit the time continuous signal before sampling and to reconstruct the desired time continuous signal behind the DAC. These filters can be realized as partly digital filters, where the digital filters significantly improve the char­acteristics of single applied analog filters.

3. Numerical integration and differentiation Numerical integration and differentiation techniques for functions of the time are usually formulated in the time domain. But it is possible to describe these operations in the fre­quency domain, which leads to digital filters. This way is advantageous if noise

Page 374: Semi-Infinite Programming


of known spectrum has to be eliminated in addition to the primarily desired integration or differentiation. Application fields of such filters, for example, are the vibration analysis of mechanical systems, if one is interested, for instance, in the movement and only the acceleration can be measured, or the analysis of human motional mechanics, if velocities and accelerations have to be acquired. Digital first and higher order lowpass differentiators can be used for the latter case, which can be realized as linear phase FIR filters after they have been spec­ified in the frequency domain. The lowpass characteristic gives data smoothing and improves the signal to noise ratio. It should be noted that integrators are preferably to be designed as recursive filters.

4. Simulation and modeling Linear systems are often used along with computer simulations of physical systems [23]. To give an example, the sound level dependent frequency response of the human hearing is simulated by some special frequency weighting filters together with sound level meters or audio signal analyzers.

5. Data windows The digital Fourier analysis of converted analog or digital signals works with data windows which are employed to extract the necessar­ily finite partial sequences of the signal. Windowing affects the selectivity of the analysis w.r.t. the frequency and the amplitude resolution. Beside the nu­merous known cosine series windows, linear phase minimax FIR lowpass filters provide the advantage of the smallest possible maximum sidelobe amplitude of all possible windows.

6. Equalization filters The transfer characteristics of physical systems can be equalized with digital filters. The equalization improves magnitude and phase or magnitude and group delay responses. In contrast to selective filters, the goal now is to form a frequency characteristic and is not to suppress frequency bands.

7. Interpolation Computing a fixed number of equidistant in-between ele­ments (samples) of a sequence is a special task of interpolation. This interpola­tion can be considered in the frequency domain, where the ideal lowpass filter corresponds to the desired time domain interpolator. Since recursive interpo­lators cannot let the input sequence pass unaltered, nonrecursive linear phase FIR lowpass filters have to be used.

8. Altering the sampling frequency Lowpass filters, combined with up­and/or downsamplers, are needed to reduce or increase the sampling frequency of a signal by an integer or rational factor. The computational demand of sam­pling frequency alteration can often be decreased by cascaded low factor blocks.

Page 375: Semi-Infinite Programming


If several blocks are in use, it can be necessary to impose flatness conditions on the passband magnitude response in order to diminish the resulting passband magnitude error of the cascade.

9. Analysis filters Many tasks of signal analysis like, for example, a third octave analysis as an important part of acoustical measurements, need filter banks, consisting of bandpass filters which realize the desired signal separation w.r.t. third octave passbands. The frequency contents of the signal are split into several bands and analyzed separately.

10. Noise suppression The signal energies in time and frequency domain are connected by the Euclidean norm. Filter design w.r.t. different error measures (bounded least squares error, for example) can reach the desired frequency dependent compromise between noise energy suppression and minimax charac­teristics.

11. Special filters An example of a special filter is the Hilbert transformer which can be used to form analytical signals. The time delay spectrometry, for example, can analyze magnitude and phase responses only by employing a single real valued frequency modulated test signal and a Hilbert transformer in order to obtain the desired analytical signal.

12. Tolerance scheme Technical specifications and standards are given as a priori tolerance schemes. A tolerance scheme is the graph of the frequency dependent tolerance limits of frequency domain characteristic functions. The lowpass magnitude response tolerance scheme is given by the inequalities

1 − d_p ≤ M(h, ω) ≤ 1 + d_p,   ω ∈ B^P = [0, 2πf_p],
M(h, ω) ≤ 1 + d_p,             ω ∈ B^T = (2πf_p, 2πf_s),    (3.1)
M(h, ω) ≤ d_s,                 ω ∈ B^S = [2πf_s, π],

which, with some numbers f_p and f_s, specify the passband B^P, transition band B^T, and stopband B^S (see below). The numbers d_p and d_s are the tolerances or the so-called ripple parameters. A priori and a posteriori tolerance schemes are distinguished. The a priori tolerance scheme design problem is to find the minimum length filter fulfilling the tolerance scheme, which corresponds to single or double sided approximation. The a posteriori tolerance scheme is evaluated after the filter has been designed. These schemes are used to compare different filters. (A small a posteriori check of (3.1) is sketched below.)
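A sketch of such an a posteriori check of (3.1) on sampled data; the function name and the grid handling are ours:

```python
import numpy as np

def meets_tolerance_scheme(M_samples, w, wp, ws, dp, ds):
    """Check the lowpass tolerance scheme (3.1) on samples M_samples of M(h, w)
    over a grid w in [0, pi]; wp = 2*pi*f_p and ws = 2*pi*f_s are the band edges."""
    M_samples, w = np.asarray(M_samples), np.asarray(w)
    pb = w <= wp                       # passband
    tb = (w > wp) & (w < ws)           # transition band
    sb = w >= ws                       # stopband
    ok_pass = np.all((M_samples[pb] >= 1 - dp) & (M_samples[pb] <= 1 + dp))
    ok_trans = np.all(M_samples[tb] <= 1 + dp)
    ok_stop = np.all(M_samples[sb] <= ds)
    return ok_pass and ok_trans and ok_stop
```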

Page 376: Semi-Infinite Programming


4 APPROXIMATION PROBLEMS

The approximation problems of filter design consist of elements of five sets of ingredients, namely the sets of functions, design intervals, error valuations, main partial problems, and constraints.

Functions The functions in the approximation problems are the already mentioned characteristic functions as approximating functions, the fixed desired functions⁴, and the error functions, defined as difference functions of desired and approximating functions. A list of these functions is given in Table 1.

Table 1 The set of functions.

Name                    desired function   approximating function   error function
frequency response      D(ω)               H(h, ω)                  ε_C(h, ω) = D(ω) − H(h, ω)
magnitude response      M_D(ω)             M(h, ω)                  ε_M(h, ω) = M_D(ω) − M(h, ω)
phase response          φ_D(ω)             φ(h, ω)                  ε_φ(h, ω) = φ_D(ω) − φ(h, ω)
group delay response    τ_D(ω)             τ(h, ω)                  ε_τ(h, ω) = τ_D(ω) − τ(h, ω)
amplitude response      A_D(ω)             A(h, ω)                  ε_A(h, ω) = A_D(ω) − A(h, ω)
impulse response        i_D(n)             i(n)                     ε_i(h, n) = i_D(n) − i(h, n)
time domain signal      a_D(n)             a(n)                     ε_a(h, n) = a_D(n) − a(h, n)

Errors can be weighted and bounded by real-valued positive weighting functions W(\omega) or W(n) and real-valued positive bounding functions U(\omega) or U(n). This leads to the weighted errors W(\omega)\varepsilon(h,\omega) or W(n)\varepsilon(h,n) and the bounded errors |\varepsilon(h,\omega)| \le U(\omega) or |\varepsilon(h,n)| \le U(n).

Design intervals The frequency domain desired functions are defined on the design intervals. The interval bounds are assumed to be a priori given w.r.t. an individual filter design^5. The unions of closed intervals representing the passbands and the stopbands are denoted as B^P = B^{P1} \cup B^{P2} \cup \ldots and B^S = B^{S1} \cup B^{S2} \cup \ldots Between every passband and stopband, there is an open interval, the so-called transition band. The union of the transition bands is defined as B^T = B^{T1} \cup B^{T2} \cup \ldots The widths of transition bands are used to adjust the approximation error w.r.t. a given filter length N. The bands can be empty sets with the exception of B^P which consists of at least one interval. Thus, at least

^4 Cortelazzo [6] proposed the design of quasi-linear phase digital filters where the desired constant group delay was a further variable of the optimization problem.

^5 It can sometimes be necessary to change one or more interval bounds (transition bands) to complete a tolerance scheme design. Burrus [4], for example, proposed filters where variable bounds of design intervals were used to find a desirable ratio of least squares and minimax error.

Page 377: Semi-Infinite Programming


for one passband, a desired frequency response has to be defined. The union of passbands and stopbands is given by B = B^P \cup B^S. The union of passbands, stopbands, and transition bands, B^P \cup B^S \cup B^T, equals the complete frequency band [0, \pi]. The types of frequency bands are specified by the function D, where passbands correspond to frequencies with D(\omega) \ne 0, stopbands to those with D(\omega) = 0, and transition bands to nonspecified D(\omega). Stopbands require magnitude response approximations only. Phase and group delay responses naturally cannot be specified in stopbands. Further time domain constraints are formulated w.r.t. the discrete sampling time instants [13, 20].

Error valuations Assuming here an error function of the frequency domain, the error valuations are the maximum norm (L_\infty-norm)

\|\varepsilon(h,\cdot)\|_\infty = \max_{\omega \in B} |\varepsilon(h,\omega)|,

the L_p-norm, 1 \le p < \infty,

\|\varepsilon(h,\cdot)\|_p = \left( \int_B |\varepsilon(h,\omega)|^p \, d\omega \right)^{1/p},

the bound by a function U(\omega)

|\varepsilon(h,\omega)| \le U(\omega),   \omega \in B,

and the bounded L_p-norm with bound d as a combination of approximation and a priori tolerance scheme

Minimize \|\varepsilon(h,\cdot)\|_p   subject to   \max_{\omega \in B} |\varepsilon(h,\omega)| \le d.      (4.1)

The case p = 2 (least squares design) has gained attraction because the design problems are normally easier to solve than related minimax problems and the p = 2 approximation error is less sensitive to outliers arising in measured data. The case p = 1 has so far played only a negligible role in linear phase FIR filter design [31]. Burrus [4] discussed designs for L_p-norms, 3 \le p < \infty, on the basis of an iterative reweighted least squares method.

Main problems of unconstrained approximation Four distinct main problem types build up the classification scheme of the frequency domain filter design. Each filter design problem has to be associated with exactly one main problem for every design interval. The main problem types are:

Page 378: Semi-Infinite Programming


Type C: approximation of a complex-valued frequency response D(\omega) by H(h,\omega) and/or fulfillment of an a priori tolerance scheme for \varepsilon_C(h,\omega).

Type M: approximation of a magnitude response M_D(\omega) by M(h,\omega) and/or fulfillment of an a priori tolerance scheme for \varepsilon_M(h,\omega).

Type MP: simultaneous approximation of a magnitude response M_D(\omega) by M(h,\omega) and a phase response \beta_D(\omega) by \beta(h,\omega) on passbands and/or simultaneous fulfillment of the a priori tolerance schemes of \varepsilon_M(h,\omega) and \varepsilon_\beta(h,\omega) on passbands.

Type MD: simultaneous approximation of a magnitude response M_D(\omega) by M(h,\omega) and a group delay response \tau_D(\omega) by \tau(h,\omega) on passbands and/or simultaneous fulfillment of the a priori tolerance schemes of \varepsilon_M(h,\omega) and \varepsilon_\tau(h,\omega) on passbands.

The error valuations of the main problem types w.r.t. the L_p-norms, 1 \le p < \infty, are

Type C:    \| D(\cdot) - H(h,\cdot) \|_p ,
Type M:    \| M_D(\cdot) - M(h,\cdot) \|_p ,                                                   (4.2)
Type MP:   \| M_D(\cdot) - M(h,\cdot) \|_p + \| \beta_D(\cdot) - \beta(h,\cdot) \|_p ,
Type MD:   \| M_D(\cdot) - M(h,\cdot) \|_p + \| \tau_D(\cdot) - \tau(h,\cdot) \|_p .

The error valuations of the main problem types w.r.t. the L_\infty-norm are

Type C:    \| D(\cdot) - H(h,\cdot) \|_\infty ,
Type M:    \| M_D(\cdot) - M(h,\cdot) \|_\infty ,                                              (4.3)
Type MP:   \max \{ \| M_D(\cdot) - M(h,\cdot) \|_\infty , \| \beta_D(\cdot) - \beta(h,\cdot) \|_\infty \} ,
Type MD:   \max \{ \| M_D(\cdot) - M(h,\cdot) \|_\infty , \| \tau_D(\cdot) - \tau(h,\cdot) \|_\infty \} .

The L_p-norm designs can be combined with a priori tolerance schemes, which corresponds to the error valuation of the bounded L_p-norm (4.1). The main problems w.r.t. error bounds or w.r.t. the bounded L_p-norm can easily be formulated (see (4.2) and (4.3)). Errors can be weighted if the widths of a posteriori tolerance regions of minimax filters are to depend on the frequency or if different minimal defects are to be reached at simultaneous approximation. Approximations of phase response and group delay can be performed w.r.t. passbands only. Reemtsen [21] gave mathematically correct formulations and systematic investigations of the mathematical characteristics of the above defined four main design problems w.r.t. arbitrary L_p-norms, 1 \le p \le \infty.

Page 379: Semi-Infinite Programming


Constraints The main designs can be constrained. Constraints often originate from the requirements of fixed error bounds, symmetry and pattern conditions of the filter coefficients (as e.g. for linear phase and halfband FIR filters), integer coefficients, conditions on derivatives such as flatness, monotonicity, convexity and concavity, and point conditions on characteristic functions.

With the exception of constraints (some filter designs are unconstrained), elements of all five sets of ingredients are mixed by certain rules. Sometimes a composition changes from one design interval to the next, as the examples of the sixth section show. As a rule: given a design interval, a main problem type of one or two characteristic functions is associated with an error valuation. It should be noted that such rules help to a priori check the plausibility of design problems and are an indispensable prerequisite of filter design software.

5 THE OPTIMIZATION PROBLEM

Next, a convex optimization problem is posed. The filter design problems of the last section can be transferred into this form either exactly or at least in good approximation. For that, let x \in \mathbb{R}^L be the vector of variables, f : \mathbb{R}^L \to \mathbb{R} be the objective function, g_i : \mathbb{R}^L \times T_i \to \mathbb{R} and v_j : \mathbb{R}^L \to \mathbb{R} be semi-infinite resp. finite constraint functions, which are convex w.r.t. the variable vector x, and let h_k : \mathbb{R}^L \to \mathbb{R} be affine linear constraint functions, where T_i := [a_i, b_i] with a_i < b_i. Then the problem OP(I,J,K,L) is given by

Minimize    f(x)
subject to  g_i(x,t) \le 0,   t \in T_i,   i = 1, \ldots, I,
            v_j(x) \le 0,                  j = 1, \ldots, J,
            h_k(x) = 0,                    k = 1, \ldots, K.

An overview of numerical methods for the solution of such problems can be found in [22].

In filter design, the variables correspond to filter coefficients and single errors or to distances to given bounds. The objective function consists of the total error, which is composed of single errors or of variable bounds of maximum functions and maximum norms. The semi-infinite parameter t represents the (angular) frequency \omega. Semi-infinite constraints express maximum functions or maximum norms of characteristic (error) functions of the frequency domain and other functions of the frequency domain such as derivatives of the magnitude response w.r.t. the frequency variable. Finite inequality constraints express maximum

Page 380: Semi-Infinite Programming


functions or maximum norms of characteristic (error) functions w.r.t. the time domain or measured frequency domain data. Finally, equality constraints stand for point conditions. Some representative examples are listed in the following.

Example 1 (I > 0, J \ge 0, K \ge 0) Frequency domain designs in combination with the error valuations of the maximum norm (minimax designs) or the maximum function (a priori tolerance scheme) lead to problems of semi-infinite optimization. An example is the unconstrained continuous type C problem

\min_{h \in \mathbb{R}^N} \max_{\omega \in B} |\varepsilon_C(h,\omega)|,

which can be written as a problem of type OP(1,0,0,N+1):

Minimize \delta subject to |\varepsilon_C(h,\omega)| - \delta \le 0,   \omega \in B.

In the solution, the additional variable \delta provides the minimal defect of the approximation problem. In case of real-valued error functions, as for a minimax type M problem

\min_{h \in \mathbb{R}^N} \max_{\omega \in B} |\varepsilon_M(h,\omega)|,

the related optimization problem OP(2,0,0,N+1) reads as

Minimize \delta subject to  +\varepsilon_M(h,\omega) - \delta \le 0,   \omega \in B,
                            -\varepsilon_M(h,\omega) - \delta \le 0,   \omega \in B.
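To make the discretization idea concrete, the following sketch (not part of the original text) approximates the continuous type C minimax problem on a dense frequency grid and solves the resulting smooth problem in the variables (h, \delta) with a general-purpose SQP solver; the desired response, grid, and filter length are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

# Illustrative data (assumptions): approximate a pure delay on B = [0, 0.8*pi]
N = 16
tau = (N - 1) / 2
wgrid = np.linspace(0.0, 0.8 * np.pi, 400)     # dense grid replacing B
D = np.exp(-1j * tau * wgrid)                  # desired frequency response

def H(h, w):
    # frequency response H(h, w) = sum_n h(n) exp(-j w n)
    return np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h

def constraints(z):
    # epigraph form: delta - |eps_C(h, w_j)| >= 0 on the grid
    h, delta = z[:N], z[N]
    return delta - np.abs(D - H(h, wgrid))

z0 = np.concatenate([np.zeros(N), [1.0]])
res = minimize(lambda z: z[N], z0, method='SLSQP',
               constraints=[{'type': 'ineq', 'fun': constraints}])
h_opt, delta_opt = res.x[:N], res.x[N]
print("minimal defect on the grid:", delta_opt)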

Example 2 (I = 0, J > 0, K \ge 0) Frequency domain designs based on measured data in combination with the error valuations of the maximum function (a priori tolerance scheme) or the maximum norm (minimax designs) lead to problems of finite optimization. Finite problems occur also at time domain design. An example is the equalization of the measured frequency response H_E(\omega_j), j = 1, \ldots, J, J > N, by means of a minimax design. The desired values D(\omega_j) stand for the resulting frequency response of the equalized system. The discrete minimax type C problem

\min_{h \in \mathbb{R}^N} \max_{j=1,\ldots,J} |D(\omega_j) - H_E(\omega_j) H(h,\omega_j)|

becomes the following problem of type OP(0,J,0,N+1)

Minimize \delta subject to |D(\omega_j) - H_E(\omega_j) H(h,\omega_j)| - \delta \le 0,   j = 1, \ldots, J,

where the optimal \delta gives the minimal defect.

Page 381: Semi-Infinite Programming


Example 3 (I = 0, J = 0, K > 0) Least squares or L_p-norm design problems for p < \infty in combination with point constraints are problems of equality constrained optimization. An example is a least squares type M problem of a linear phase lowpass filter with flatness conditions in the passband and a least squares approximation in the stopband. The problem reads as

\min_{h \in X} \int_{B^S} |A(h,\omega)|^2 \, d\omega,
X = \left\{ h \in \mathbb{R}^N : \frac{\partial^k}{\partial \omega^k} A(h,\omega) \Big|_{\omega=0} = 0, \ k = 1, \ldots, K, \quad \sum_{n=0}^{N-1} h(n) = 1 \right\},

or is written as a problem of type OP(0,0,K+1,N):

Minimize    \int_{B^S} |A(h,\omega)|^2 \, d\omega
subject to  \frac{\partial^k}{\partial \omega^k} A(h,\omega) \Big|_{\omega=0} = 0,   k = 1, \ldots, K,
            \sum_{n=0}^{N-1} h(n) - 1 = 0.
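A sketch (not from the text) of how such an equality constrained least squares problem can be solved after discretizing the stopband integral: the problem becomes min ||F a||^2 subject to C a = e, which is solved below via its KKT system; the type I linear phase parametrization and all data are illustrative assumptions.

import numpy as np

def eq_constrained_ls(F, C, e):
    """min_a ||F a||^2  s.t.  C a = e, solved via the KKT block system."""
    n, m = F.shape[1], C.shape[0]
    K = np.block([[2.0 * F.T @ F, C.T],
                  [C, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), e])
    return np.linalg.solve(K, rhs)[:n]

# Illustrative data (assumptions).  A type I linear phase filter of length
# N = 2M+1 is parametrized by a_0..a_M with A(a, w) = a_0 + sum_k a_k cos(k w),
# where h(M) = a_0 and h(M +/- k) = a_k / 2.
M = 10
k = np.arange(M + 1)
w_stop = np.linspace(0.6 * np.pi, np.pi, 200)        # stopband grid (assumption)
F = np.cos(np.outer(w_stop, k))                      # rows approximate A(a, w_j)

# Flatness at w = 0: d^{2m}A/dw^{2m}|_0 = (-1)^m sum_k a_k k^{2m} = 0 (odd orders
# vanish automatically), here for orders 2 and 4, plus the point condition A(0) = 1.
C = np.vstack([(-1.0) ** m * k ** (2 * m) for m in (1, 2)] + [np.ones(M + 1)])
e = np.array([0.0, 0.0, 1.0])

a = eq_constrained_ls(F, C, e)
print("A(0) =", a.sum())                             # should be 1 up to rounding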

Example 4 (I = 0, J = 0, K = 0) Unconstrained least squares or L_p-norm designs for p < \infty yield problems of unconstrained optimization. An example is the unconstrained least squares type C problem of finding

\min_{h \in \mathbb{R}^N} \int_B |\varepsilon_C(h,\omega)|^2 \, d\omega,

which can be written as a problem of type OP(0,0,0,N)

Minimize \int_B |\varepsilon_C(h,\omega)|^2 \, d\omega.
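A corresponding sketch (again with assumed grid, desired response, and filter length) for the discretized unconstrained least squares type C problem; stacking real and imaginary parts keeps the coefficients real.

import numpy as np

N = 31
tau = (N - 1) / 2
w = np.linspace(0.0, np.pi, 512)                            # grid approximating B
D = np.where(w <= 0.4 * np.pi, np.exp(-1j * tau * w), 0.0)  # ideal delayed lowpass (assumption)

# H(h, w_j) = sum_n h(n) exp(-j w_j n); stack real/imag parts so that the
# least squares solution is taken over real coefficients h.
E = np.exp(-1j * np.outer(w, np.arange(N)))
A = np.vstack([E.real, E.imag])
b = np.concatenate([D.real, D.imag])
h = np.linalg.lstsq(A, b, rcond=None)[0]
print("discrete L2 error:", np.linalg.norm(E @ h - D))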

Some nonlinear frequency domain error functions have to be 'convexified' so that one gets convex optimization problems. The occurring convex and convexified functions, together with the error valuation and further conditions, are the following.

Frequency response The error function |\varepsilon_C(h,\omega)| is convex.

Magnitude response Three applications have to be distinguished. (i) Minimax type MP and MD designs: for M(h,\omega) \ge M_D(\omega) the function -\varepsilon_M(h,\omega) = M(h,\omega) - M_D(\omega) is convex; for M(h,\omega) < M_D(\omega) the function (1/M_D(\omega)) \sum_{n=0}^{N-1} h(n) \cos(\omega n - \beta_D(\omega)) is a proper convexification of \varepsilon_M(h,\omega) under the conditions |\varepsilon_\beta(h,\omega)| < \pi/2 and M_D(\omega) > 0 [18]. (ii) L_p-norm type MP and MD designs: convexification of \pm\varepsilon_M(h,\omega) leads to

Page 382: Semi-Infinite Programming


\pm(1/M_D(\omega)) \sum_{n=0}^{N-1} h(n) \cos(\omega n - \beta_D(\omega)) under the conditions |\varepsilon_\beta(h,\omega)| < \pi/2 and M_D(\omega) > 0 [18]. (iii) Type M designs: linear approximation problems are obtained via the autocorrelation coefficients of the impulse response and spectral factorization [26] or via direct solution of the nonlinear problems by nonlinear programming [15].

Phase response Two applications have to be distinguished. (i) For approximations: convexification of \pm\varepsilon_\beta(h,\omega) leads to \pm(1/M_D(\omega)) \sum_{n=0}^{N-1} h(n) \sin(\omega n - \beta_D(\omega)) and one has the conditions |\varepsilon_\beta(h,\omega)| < \pi/2 and M_D(\omega) > 0 [28]. (ii) For error bounds |\varepsilon_\beta(h,\omega)| \le U(\omega): these are convex and one has the conditions U(\omega) < \pi/2 and M_D(\omega) > 0.

Group delay response A convexification of \pm\varepsilon_\tau(h,\omega) is \pm(1/M_D(\omega)) \sum_{n=0}^{N-1} (n - \tau_D(\omega)) h(n) \cos(\omega n - \beta_D(\omega)) and one has the conditions |\varepsilon_\beta(h,\omega)| < \pi/2 and M_D(\omega) > 0 [5].

Amplitude response The error functions \pm\varepsilon_A(h,\omega) are convex.

A final note concerns a priori tolerance scheme designs. Obviously, such problems may not possess a solution (empty feasible set) for a fixed filter length N. This fact turns out to be problematic if a minimum length solution has to be found via increasing lengths. In such cases, it is practicable to state a related, always solvable distance problem instead of the requested problem [14]. Let, for example,

|H(h,\omega)| \le U(\omega),   \omega \in B,

be the tolerance scheme inequality. Then the related semi-infinite distance problem of type OP(1,0,0,N+1) reads as

Minimize \delta subject to |H(h,\omega)| - U(\omega) - \delta \le 0,   \omega \in B.

Such a distance problem maximizes the distance of the approximating (or error) functions to the bounds (by minimizing the distance variable \delta) and also allows negative values of \delta. A negative value \delta^* in the solution (h^*, \delta^*) corresponds to fulfilled inequalities, a positive value to violated ones. In some cases, the magnitude of \delta^* gives an idea of how much the filter length should be changed such that the requested tolerance scheme can be fulfilled.

Page 383: Semi-Infinite Programming


6 NUMERICAL EXAMPLES

This section contains an equalization filter (application fields 2, 6, and 12) and filters which are parts of systems for altering the sampling frequencies of signals (application fields 8 and 12).

6.1 Partly digital antialiasing and reconstruction filters

A standard AD conversion as part of a digital signal processing system consists of the antialiasing filter, the sampler, and the ADC. The antialiasing filter performs the bandlimiting of the analog time continuous signals to a frequency band whose upper edge is smaller than half of the sampling frequency. A standard DA conversion consists of the DAC, the sample and hold stage, and the reconstruction filter. The reconstruction filter smoothes the spectrally unbounded signal, which follows the sample and hold stage, to the desired time continuous bandlimited signal. Figure 2 shows a block diagram of a digital processing of analog signals. Commonly, both filters are analog filters. Analog filters of high quality are not easy to construct since, assuming a moderate filter length, the desired high magnitude selectivity stands in contrast to the also desired linear phase response. Furthermore, tolerances of electronic components, temperature drift, long time stability etc. complicate the design of higher degree analog filters.

[Figure 2: Partly digital antialiasing and reconstruction filters. Block diagram of the signal chain: analog input x(l), analog partial antialiasing filter, sampling, ADC, digital partial antialiasing filter, downsampling by factor 4, digital signal processing (F = 50 kHz), upsampling by factor 4, digital partial reconstruction filter, DAC with sample and hold stage, analog partial reconstruction filter, output r(l); the converters operate at 200 kHz.]

Page 384: Semi-Infinite Programming


A possibility to significantly improve the transfer characteristics of the filters w.r.t. the frequency and also the time domain is their partly digital realization in combination with oversampling. In this way filters can be realized such that, if comparable analog filters of the same high quality could be constructed at all, they would be much more expensive than the partly digital filters. Such partly digital filters were already proposed in 1982 by Schüssler et al. [27], in 1987 by Parks and Burrus [11], and in 1989 by Preuss [19], who already used the oversampling. In short, partly digital filters show advantages w.r.t. the relaxation of the requirements on the analog partial filters, the extension of the passbands, the equalization of the magnitude response, and the equalization of the phase or group delay response. The specifications of the solution proposed here, based on 18 bit ADCs and DACs of Burr-Brown, are the following:

• oversampling by q = 4, aliasing and imaging free filters (stopband B^S = [\pi/q, \pi]), effective bandwidth about 90% which corresponds to the passband B^P = [0, \pi/4.5], FIR filter in polyphase structure with N = 256 which corresponds to an effective computational effort of N^* = 64 w.r.t. the sampling frequency of the converters,

• passband magnitude and phase response a priori tolerance scheme with tolerances of d_p = 10^{-4}, least squares stopband approximation of M_D(\omega) = 0, and no overshoot in the transition band,

• passband desiderata of constant magnitude response M_D(\omega) = 1 and linear phase response \beta_D(\omega) = -\tau_D \omega, \tau_D = 120, where the desired constant group delay \tau_D has been chosen w.r.t. a small approximation defect,

• analog partial filter with Nyquist attenuation of 80 dB and frequency response H_E, given in terms of \Omega := j\omega/\omega_g.

The passband edge frequency has been set to \omega_g = \pi/7.02832 in order to reach a Nyquist attenuation of 80 dB, corresponding to |H_E(\pi)| = 10^{-4}.

The analog filter of degree 6 has been proposed by Burr-Brown [8], who use this filter for high quality audio applications. The digital reconstruction filter additionally has to equalize the magnitude response of the sample and hold-device following the DAC.

The approximation problem of the antialiasing filter consists of a passband B^P (magnitude and phase response, type MP, bounded error) and a stopband B^S


(magnitude response, type M, Euclidean norm). After convexification it results in the semi-infinite optimization problem of type OP(4,0,0,257)

Minimize    \int_{B^S} |H_E(\omega) H(h,\omega)|^2 \, d\omega
subject to  M_E(\omega) M(h,\omega) - M_D(\omega) - d_p \le 0,                                                   \omega \in B^P,
            M_D(\omega) - M_E(\omega) \sum_{n=0}^{N-1} h(n) \cos(\varphi_n(\omega)) - d_p \le 0,                 \omega \in B^P,
            \sum_{n=0}^{N-1} h(n) \{ M_E(\omega) \sin(\varphi_n(\omega)) - \tan(d_p) \cos(\varphi_n(\omega)) \} \le 0,   \omega \in B^P,
            \sum_{n=0}^{N-1} h(n) \{ -M_E(\omega) \sin(\varphi_n(\omega)) - \tan(d_p) \cos(\varphi_n(\omega)) \} \le 0,  \omega \in B^P,

where M_E(\omega) = |H_E(\omega)| and \beta_E(\omega) = \arg(H_E(\omega)), and the functions \varphi_n(\omega) = \omega n - \beta_D(\omega) + \beta_E(\omega) contain the phase response which has to be equalized. In case of the reconstruction filter, the magnitude response M_E(\omega) has to be multiplied by the magnitude response of the sample and hold stage. The resulting characteristics are illustrated in Figure 3. The second constraint function uses the above mentioned linear approximation of the magnitude response. Obviously, the convexification error is very small, especially in case of a small passband ripple parameter (cf. (3.1) and the figures).

[Figure 3: Characteristics of the partial digital antialiasing filter. Panels show the magnitude response in dB as well as the passband magnitude error and phase error, each plotted over normalized frequency.]

Page 386: Semi-Infinite Programming


6.2 Reduction of the sampling frequency

The reduction of the sampling frequency F_x = 1/T_x of a digital signal x(n) by the integer factor q_R requires a bandlimiting filter followed by a downsampling stage. Reduction here means to obtain a digital signal y(m) with the sampling frequency F_y = F_x/q_R = 1/T_y such that the spectral components of both signals do not differ w.r.t. the frequency band B_y = [0, F_y/2]. In the following, the interval from zero up to the Nyquist or critical frequency F/2 of a signal is called the baseband. Let now x(n) have spectral components in the whole baseband B_x = [0, F_x/2], where the parts in [0, F_y/2] are the relevant parts. Next, the critical frequency of the desired signal y(m) is F_y/2 = F_x/(2 q_R). Clearly, if the sampling frequency of a signal has to be reduced by the factor q_R, two processes are necessary: firstly, to filter the signal x(n) by a filter with the ideal frequency response H(\omega) = 1 for |\omega| \le \pi/q_R and H(\omega) = 0 otherwise, and secondly, to eliminate (q_R - 1) samples of the filtered signal q(n) (downsampling stage), located between each two consecutive samples of y(m). The two signal processing blocks must not be interchanged. Figure 4 shows the blocks with the usual graphical elements. Figure 5 illustrates the sampling frequency reduction for q_R = 3. It shows a block diagram pointing to time and frequency domain plots, where B_x, B_q, and B_y are the frequency domain basebands of the signals x(n), q(n), and y(m), respectively.

[Figure 4: Downsampling by integer factor q_R. Block diagram: x(n), filter, q(n), downsampler by factor q_R, y(m).]

Each real minimax lowpass filter causes approximation errors in passbands and stopbands and requires transition bands. The approximation errors are evaluated by means of the a posteriori tolerance scheme (3.1). Phase errors or distortions can be avoided by linear phase FIR filters. In order to qualify transition bands, a useful parameter N_B, the relative effective bandwidth, is introduced as the quotient of the passband edge frequency f_p and the critical frequency f_c. This parameter plays a crucial role for the design of cascaded filters. Both effects, approximation errors and transition bands, can cause aliasing in sampling frequency altering applications. Undesired aliasing depends on the stopband edge frequency f_s (aliasing stands for spectral parts of a signal folded back in the frequency domain, caused by sampling and insufficient filtering or bandlimiting). More precisely, the aliasing distortions are caused by the signal spectral components which lie in the filter transition bands between f_c and f_s and by the finite stopband signal suppression. Such artifacts should be avoided,

Page 387: Semi-Infinite Programming


[Figure 5: Downsampling in time and frequency domain (q_R = 3). Block diagram of x(n), q(n), and y(m) with the corresponding time domain plots and frequency domain basebands B_x, B_q, and B_y.]

especially since these distortions are nonharmonic distortions. The filter then has to be designed under the condition f_s = f_c and should reach sufficient stopband attenuation. (Lower length N filters with f_s > f_c can be used for increasing the sampling frequency of bandlimited signals without causing the corresponding imaging distortions.)

For the purpose of medical data processing, filters were designed for different factors q_R and N_B. The objective of the special application was the construction of a filter set which can be universally used for sampling frequency alteration, where the criteria for the choice of the individual filters were required to be as simple as possible. Here, a subset of these filters is proposed, which is designed for q_R = 2 and some stepped effective bandwidths. The specifications f_p = f_c N_{Bk}, N_{Bk} = 0.1 k, k = 2, \ldots, 9, f_s = f_c, f_c = 0.5/q_R = 0.25, d_p = 0.001, and d_s = 0.0001 define eight a priori tolerance schemes. The design problems of finding the minimum length N filters, which fulfill the a priori tolerance schemes, consist of passbands B^P_k (amplitude response, type M, maximum norm) and stopbands B^S_k (amplitude response, type M, bounded error). The related semi-infinite optimization problems of type OP(4,0,0,L_k) with d_s = 10^{-4} and parameter N_{Bk} = k/10, k = 2, 3, \ldots, 9, read as

Minimize d subject to  \pm(1 - A(h,\omega)) - d \le 0,   \omega \in [0, N_{Bk} \pi/2],
                       \pm A(h,\omega) - d_s \le 0,      \omega \in [\pi/2, \pi].
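For illustration (not from the chapter), discretizing the two semi-infinite constraint pairs on frequency grids turns this design into a linear program in (a, d); the sketch below assumes a type I (odd length) linear phase parametrization A(a, \omega) = a_0 + \sum_k a_k \cos(k\omega), which differs from the even filter lengths reported in Table 2.

import numpy as np
from scipy.optimize import linprog

M, NB, ds = 15, 0.4, 1e-4                       # illustrative values (assumptions)
k = np.arange(M + 1)
wp = np.linspace(0.0, NB * np.pi / 2, 200)      # passband grid
ws = np.linspace(np.pi / 2, np.pi, 200)         # stopband grid
Cp = np.cos(np.outer(wp, k))                    # A(a, w_j) on the passband grid
Cs = np.cos(np.outer(ws, k))                    # A(a, w_j) on the stopband grid

# variables z = (a_0, ..., a_M, d); minimize d
c = np.zeros(M + 2); c[-1] = 1.0
ones = np.ones((len(wp), 1))
zeros = np.zeros((len(ws), 1))
A_ub = np.vstack([np.hstack([-Cp, -ones]),      #  (1 - A) - d <= 0
                  np.hstack([ Cp, -ones]),      # -(1 - A) - d <= 0
                  np.hstack([ Cs, zeros]),      #  A <= ds
                  np.hstack([-Cs, zeros])])     # -A <= ds
b_ub = np.concatenate([-np.ones(len(wp)), np.ones(len(wp)),
                       ds * np.ones(len(ws)), ds * np.ones(len(ws))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (M + 2))
a, d = res.x[:-1], res.x[-1]
print("passband defect d =", d)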

Page 388: Semi-Infinite Programming


Table 2 Parameters of the "downsampling" filters.

filter    k   N    f_p/f_c   f_p     f_c    f_s    d              d_s
q2nb02    2   22   0.2       0.05    0.25   0.25   1.569 x 10^-4  10^-4
q2nb03    3   26   0.3       0.075   0.25   0.25   1.487 x 10^-4  10^-4
q2nb04    4   30   0.4       0.1     0.25   0.25   2.280 x 10^-4  10^-4
q2nb05    5   34   0.5       0.125   0.25   0.25   5.518 x 10^-4  10^-4
q2nb06    6   44   0.6       0.15    0.25   0.25   6.202 x 10^-4  10^-4
q2nb07    7   56   0.7       0.175   0.25   0.25   6.997 x 10^-4  10^-4
q2nb08    8   82   0.8       0.2     0.25   0.25   6.758 x 10^-4  10^-4
q2nb09    9   160  0.9       0.225   0.25   0.25   9.363 x 10^-4  10^-4

The filter lengths N_k have to be the minimum lengths fulfilling the conditions d_k \le d_p, where (h_k, d_k) is the solution of the k-th problem. A good estimate for N is given by the empirical formula of Kaiser in the book by Parks and Burrus [11, p. 95-97]. Table 2 lists the a posteriori tolerance scheme parameters of the designed filters.
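Kaiser's estimate can be evaluated directly; the sketch below uses the commonly quoted form of the formula, whose exact constants should be checked against [11].

import math

def kaiser_length(dp, ds, fp, fs):
    """Kaiser's empirical FIR length estimate (common form; exact constants in [11]).
    fp and fs are band edges normalized to the sampling frequency."""
    df = fs - fp
    return (-20.0 * math.log10(math.sqrt(dp * ds)) - 13.0) / (14.6 * df) + 1.0

# e.g. the specification of filter q2nb08: dp = 0.001, ds = 0.0001, fp = 0.2, fs = 0.25
print(round(kaiser_length(1e-3, 1e-4, 0.2, 0.25)))   # about 79, close to the designed N = 82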

Figure 6 plots the magnitude responses of the filters. The filters fully exploit the stopband tolerance regions of d_s = 0.0001, where the stopbands start exactly at each critical frequency. In the transition bands, the filters for N_B = 0.2 up to N_B = 0.9 can be distinguished by visual inspection from left to right.

In the following, an application example is discussed. The sampling frequency F_x = 48 kHz of the signal x(n) should be reduced by q_R = 8 down to F_y = 6 kHz. The signal x(n) consists of relevant spectral components in the band [0, 2.4 kHz] and further, uninteresting components in the remaining part of the baseband up to 24 kHz. The effective bandwidth after the sampling frequency reduction should remain at [0, 2.4 kHz], corresponding to a relative effective bandwidth factor N_B = 2.4 kHz / 3 kHz = 0.8 of the filter, where f_c = 3 kHz is the critical frequency w.r.t. the final sampling frequency F_y = 6 kHz. Now two possibilities of applying filters are contrasted and compared. Firstly, a single length N = 328 filter with parameters q_R = 8, N_B = 0.8, d_p = 0.001, and d_s = 0.0001 is considered (Figure 7). Secondly, a cascade of three q_R = 2 filters (Figure 8) with parameters d_p = 0.001 and d_s = 0.0001 and different factors N_B is contrasted to the single filter solution. The computational demands of both filter arrangements are compared w.r.t. MAC(s), which are multiplications and accumulations (per second).

Page 389: Semi-Infinite Programming

[Figure 6: Magnitude responses of the "downsampling" filters for q_R = 2 and different factors N_B; magnitude in dB (0 down to -90 dB) over normalized frequency.]

[Figure 7: The single q_R = 8 filter: x(n) at F_x = 48 kHz, filter, q(n), downsampling by factor 8, y(m) at F_y = 6 kHz.]

[Figure 8: The triple q_R = 2 filters: cascade of three filter/downsampling-by-2 stages reducing the sampling frequency from F_x = 48 kHz via 24 kHz and 12 kHz down to F_y = 6 kHz.]

The single q_R = 8 filter: The efficient polyphase structure realization of the single filter [7] requires a computational demand of 328 MAC x 6 kHz = 1.968 MMACs.

Page 390: Semi-Infinite Programming


The triple cascaded q_R = 2 filters: The three q_R = 2 filter polyphase structure solution (complete prime factor decomposition) leads to a reduction of the computational demand. Figure 8 shows the block diagram of the cascade structure. The first filter is a q_R = 2 alias free filter, whose stopband frequency is the critical frequency f_c = 12 kHz. The desired minimum effective bandwidth of y(m) is [0, 2.4 kHz] which, at this stage, corresponds to a minimum factor N_{Bmin} = 0.2. The length N = 22 filter q2nb02 reaches N_B = 0.2. The second q_R = 2 filter is necessary to reach a sampling frequency of 12 kHz. The stopband of this filter then has to start at the respective critical frequency of 6 kHz. The minimum factor is N_{Bmin} = 2.4 kHz / 6 kHz = 0.4. The length N = 30 filter q2nb04 reaches N_B = 0.4. The third filter allows the sampling frequency to be reduced from 12 kHz down to F_y = 6 kHz. The stopband of the filter has to start at f_c = 3 kHz and the minimum factor is N_{Bmin} = 2.4 kHz / 3 kHz = 0.8. The length N = 82 filter q2nb08 reaches N_B = 0.8. If the efficient polyphase structure [7] is used, the computational demand is 22 MAC x 24 kHz + 30 MAC x 12 kHz + 82 MAC x 6 kHz = 1.380 MMACs.

Cascading three filters reduces the computational demand from 1.968 MMACs down to 1.38 MMACs, corresponding to a reduction of about 30%. The maximum passband ripple could increase to the hypothetical value d_p = 1.569 x 10^-4 + 2.280 x 10^-4 + 6.758 x 10^-4 = 1.0607 x 10^-3, against d_p = 6.758 x 10^-4 in case of the single filter solution. If only the filter q2nb08 were available (and not the proposed set of filters for different values N_B), then the cascade structure would lead to a demand of 82 MAC x (24 kHz + 12 kHz + 6 kHz) = 3.444 MMACs, which is significantly more than for the single filter solution.
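The comparison of the computational demands can be verified with a few lines of plain arithmetic (the MAC counts and rates are those quoted above):

# MAC/s demand of the two arrangements (values quoted in the text)
single = 328 * 6_000                                  # single q_R = 8 filter
cascade = 22 * 24_000 + 30 * 12_000 + 82 * 6_000      # q2nb02, q2nb04, q2nb08
only_q2nb08 = 82 * (24_000 + 12_000 + 6_000)          # cascade built from q2nb08 only

print(single, cascade, only_q2nb08)        # 1968000, 1380000, 3444000
print(1 - cascade / single)                # ~0.30, i.e. about 30 percent savings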

7 CONCLUSION

It has been shown that a large variety of constrained and unconstrained filter design problems is of the type of the convex semi-infinite optimization problem OP. This problem can be solved in particular by the methods in [16,17]. With these methods, the authors have designed a large number of FIR filters, where the algorithm never failed.

The optimization approach is a unified approach, which means that ad hoc methods, often heuristic, should not be used for filter design, even though they may comprise a higher efficiency (lower computation time) in case of single designs or special cases. The robustness of the algorithms in [16, 17] especially allows high degree filter designs, which become more and more important since

Page 391: Semi-Infinite Programming


the computational power of digital signal processors grows from one chip generation to the next. Up to now, the highest degree designs can be found in [13, 17, 18]. Optimization is an active field so that more efficient methods, especially for solving nonlinear semi-infinite optimization problems, can be expected in the future.

REFERENCES

[1] J. W. Adams. FIR digital filters with least-squares stopbands subject to peak-gain constraints. IEEE Trans. on Circuits and Systems, 39:376-388, 1991.

[2] A. S. Alkhairy, K. G. Christian, and J. S. Lim. Design and characterization of optimal FIR filters with arbitrary phase. IEEE Trans. on Signal Processing, 41(2):559-572, 1993.

[3] D. Burnside and Th. W. Parks. Optimal design of FIR filters with the complex Chebyshev error criteria. IEEE Trans. on Signal Processing, 43:605-616, 1995.

[4] C. S. Burrus and I. W. Selesnick. New results in digital filter design. Mini-Workshop on Filter Design, LNT Erlangen, H. W. Schüssler (ed.), 1997.

[5] X. Chen and T. W. Parks. Design of FIR filters in the complex domain. IEEE Trans. Acoust., Speech, and Signal Processing, ASSP-35:144-153, 1987.

[6] G. Cortelazzo and M. R. Lightner. Simultaneous design in both magnitude and group-delay of IIR and FIR filters based on multiple criterion optimization. IEEE Trans. Acoust., Speech, and Signal Processing, ASSP-32:949-967, 1984.

[7] R. E. Crochiere and L. R. Rabiner. Multirate Digital Signal Processing. Prentice Hall, New Jersey, 1983.

[8] Rick Downs. A low noise, low distortion design for anti-aliasing and anti-imaging filters. Burr-Brown Application Bulletin AB-026A, Burr-Brown Corporation, Tucson, Arizona.

[9] F. Grenez. Design of linear or minimum-phase FIR filters by constrained Chebyshev approximation. Signal Processing, 5:325-332, 1983.

[10] J. H. McClellan, T. W. Parks, and L. R. Rabiner. A computer program for designing optimum FIR linear phase digital filters. IEEE Trans., AU-21, 1973.

[11] T. W. Parks and C. S. Burrus. Digital Filter Design. J. Wiley, 1987.

[12] Thomas W. Parks and James H. McClellan. Chebyshev approximation for nonrecursive digital filters with linear phase. IEEE Trans. on Circuit Theory, CT-19(2):189-194, 1972.

[13] A. Potchinkov. Der Entwurf digitaler FIR-Filter mit Methoden der konvexen semi-infiniten Optimierung. PhD thesis, Techn. Univ. Berlin, 1994.

[14] A. Potchinkov. Design of optimal linear phase FIR filters by a semi-infinite programming technique. Signal Processing, 58:165-180, 1997.

Page 392: Semi-Infinite Programming


[15] A. Potchinkov. Entwurf minimalphasiger nichtrekursiver digitaler Filter mit Verfahren der nichtlinearen Optimierung. Frequenz, 51:132-137, 1997.

[16] A. Potchinkov and R. Reemtsen. FIR filter design in the complex domain by a semi-infinite programming technique. Archiv für Elektronik und Übertragungstechnik, 48: I. The method: 135-144, II. Numerical results: 200-209, 1994.

[17] A. Potchinkov and R. Reemtsen. The design of FIR filters in the complex plane by convex optimization. Signal Processing, 46:127-146, 1995.

[18] A. Potchinkov and R. Reemtsen. The simultaneous approximation of magnitude and phase by FIR digital filters. Intern. J. Circuit Theory and Appl., 25: I. A new approach: 167-177, II. Methods and examples: 179-197, 1997.

[19] K. Preuss. On the design of FIR filters by complex Chebyshev approximation. IEEE Trans. Acoust., Speech, and Signal Processing, ASSP-37:702-712, 1989.

[20] L. R. Rabiner and B. Gold. Theory and Application of Digital Signal Processing. Prentice Hall, London, 1975.

[21] R. Reemtsen. Design problems for nonrecursive digital filters. Technical Report M-04/1997, BTU Cottbus, 1997.

[22] R. Reemtsen and S. Görner. Numerical methods for semi-infinite programming: a survey. This volume.

[23] J. S. Rosko. Digital Simulation of Physical Systems. Addison Wesley, 1972.

[24] M. Schulist. Ein Beitrag zum Entwurf nichtrekursiver Filter. PhD thesis, Univ. Erlangen-Nürnberg, Germany, 1992.

[25] H. W. Schüssler. Digitale Signalverarbeitung. Springer Verlag, 1988.

[26] H. W. Schüssler and P. Steffen. Some advanced topics in filter design. In J. S. Lim and A. Oppenheim, editors, Advanced Topics in Signal Processing, pages 416-491. Prentice Hall, NJ, 1988.

[27] H. W. Schüssler, P. Mohringer, and P. Steffen. On partly digital anti-aliasing filters. Archiv für Elektronik und Übertragungstechnik, 36:349-355, 1982.

[28] K. Steiglitz. Design of FIR digital phase networks. IEEE Trans. Acoust., Speech, and Signal Processing, ASSP-29:171-176, 1981.

[29] K. Steiglitz, T. W. Parks, and J. F. Kaiser. METEOR: a constraint-based FIR filter design program. IEEE Trans. on Signal Processing, ASSP-40:1901-1909, 1992.

[30] F. J. Taylor. Digital Filter Design Handbook. Marcel Dekker, Inc., 1983.

[31] W.-S. Yu, I-K. Fong, and K.-C. Chang. An l1-approximation based method for synthesizing FIR filters. IEEE Trans. on Circuits and Systems, 39:578-581, 1992.

Page 393: Semi-Infinite Programming

12
SEMI-INFINITE PROGRAMMING IN CONTROL

Ekkehard W. Sachs
Universität Trier, FB IV - Mathematik, D-54286 Trier, Germany, Email: [email protected]

ABSTRACT

Optimal control problems represent a special class of optimization problems which describe dynamical processes. There is a wide range of applications for optimal control problems in engineering and economics. A few of these problems are described and it is shown how they are related and lead to semi-infinite programming problems.

Among the vast literature in the area of semi-infinite programming we want to point out two review articles which also include the aspect of optimal control problems: Polak [14] reviews engineering applications of semi-infinite programming and Hettich and Kortanek [10] give an overview of applications, algorithms and theory in this area.

The dynamical system can be given by a system of difference equations, ordinary differential equations or partial differential equations. With a proper objective function this often leads to optimization problems in function spaces.

A general class of optimal control problems for ordinary differential equations is presented. The controls and the states have to satisfy pointwise bounds. It is shown that a discretization of the control space leads to a semi-infinite programming problem.

A heat conduction process in the food industry, coupled with the decay of microorganisms, is discussed as an important application of optimal control problems. This process, however, also leads to a reduction in the vitamins. The optimal control problem consists in numerically finding a control which minimizes the loss of vitamins and satisfies the requirement of achieved sterility for the product. The mathematical model is described by a system of nonlinear partial differential equations. After discretization one obtains a semi-infinite programming problem with nonlinear constraints.

R. Reemtsen and J.-J. Rückmann (eds.), Semi-Infinite Programming, 389-411.
© 1998 Kluwer Academic Publishers.

Page 394: Semi-Infinite Programming


Another application of semi-infinite programming occurs in the control of flutter of an aircraft wing. The avoidance of flutter has to be guaranteed over a certain velocity range at which the aircraft operates. A proper formulation of this problem leads to a semi-infinite programming problem. The dependence of the functions on the indices describing the constraints, however, is not necessarily differentiable and therefore computationally difficult to handle. Therefore, a new approach is presented using the characterization of stability with the Lyapunov equation. This leads to an interesting and novel combination of positive-definite and semi-infinite programming. It is shown how this problem can be solved by a barrier function technique.*

* This research was supported in part by the Air Force Office of Scientific Research under grants F49620-93-1-0280 and F49620-96-1-0329 while the author was visiting the Center for Optimal Design and Control, Virginia Tech, Blacksburg, VA 24061, U.S.A., and Nestlé Research and Development.

1 OPTIMAL CONTROL PROBLEMS

1.1 General framework

In this paper we present three different control problems - two optimal control problems, one with ordinary differential equations, another one with partial differential equations, and one control problem from optimal design.

All of these problems are related to real applications in various areas of engineering. We show how these lead to semi-infinite programming problems and point out the advantages of treating these problems by semi-infinite programming. Two of the three problems have been solved numerically by other methods. It is a topic of future research to make full use of the advantages of semi-infinite programming in the numerical solution of these two problems as well.

An optimal control problem consists of an objective function and an input-output map which is usually described by a dynamical system. We refer to the examples in the later sections of the paper.

Let u \in L_\infty^m[0,T] denote the control of the system. We assume that with each u there is associated a unique output of the system which we denote by S(u). The corresponding map, often a solution operator of a differential equation, is defined as

S : L_\infty^m[0,T] \to C^n(\Omega \times [0,T]),

Page 395: Semi-Infinite Programming


where \Omega is a subset of \mathbb{R}^k. The objective function depends on the control u and the state S(u) in the following way. Let

L : \mathbb{R}^n \times \mathbb{R}^m \times [0,T] \to \mathbb{R}      (1.1)

and a restriction map

r_1 : C^n(\Omega \times [0,T]) \to C^n[0,T]

be defined.

Often the controls or states have to be restricted to a certain range of values. Here we impose simple box constraints of the type

a(t) \le u(t) \le b(t),   c(x,t) \le S(u)(x,t) \le d(x,t),   t \in [0,T], \ x \in \Omega

for given functions a, b \in L_\infty^m[0,T] and c, d \in C^n(\Omega \times [0,T]).

Then the optimal control problem is given by

General Optimal Control Problem

Minimize    F(u) = \int_0^T L(r_1(S(u))(t), u(t), t) \, dt,   u \in L_\infty^m[0,T]
subject to  a(t) \le u(t) \le b(t),   t \in [0,T]
and         c(x,t) \le S(u)(x,t) \le d(x,t),   t \in [0,T], \ x \in \Omega

In the following we show how this problem leads to a semi-infinite programming problem.

1.2 Discretized optimal control problem

In order to compute the optimal control numerically it is necessary to discretize the problem. We introduce a finite-dimensional subspace of piecewise constant controls.

u^N(a)(t) = \sum_{i=1}^{N} a_i u_i(t),   a \in \mathbb{R}^N

The discretized problem is

Page 396: Semi-Infinite Programming


Semi-Infinite Programming Problem

Minimize    F^N(a) = F(u^N(a)) = \int_0^T L(r_1(S(u^N(a)))(t), u^N(a)(t), t) \, dt,   a \in \mathbb{R}^N
subject to  a(t) \le u^N(a)(t) \le b(t),   t \in [0,T]
and         c(x,t) \le S(u^N(a))(x,t) \le d(x,t),   t \in [0,T], \ x \in \Omega

This problem is a semi-infinite programming problem. The variable is a vector in \mathbb{R}^N, but the bound constraints have to hold a.e. on the whole interval [0,T] for the control and on \Omega \times [0,T] for the state.

1.3 Ordinary differential equations

We will consider a control system governed by an ordinary differential equation. Let a dynamical system be described by a system of first order differential equations with initial conditions. We have as variables

state    x(t) \in \mathbb{R}^n,
control  u(t) \in \mathbb{R}^m.

The system is described by a nonlinear function

f : \mathbb{R}^{n+m+1} \to \mathbb{R}^n

and the system of ordinary differential equations

\dot{x}(t) = f(x(t), u(t), t),   x(0) = x_0,   t \in [0,T]      (1.2)

for some initial value x_0 \in \mathbb{R}^n. We assume that for any u \in L_\infty^m[0,T] there is a unique solution x \in W_\infty^1[0,T] which solves (1.2). This defines the map

S : L_\infty^m[0,T] \to C^n[0,T]

which is in this case independent of the set \Omega.

The objective function given by a nonlinear function L as defined in (1.1) is given as

\int_0^T L(x(t), u(t), t) \, dt.

Page 397: Semi-Infinite Programming


The optimal control problem can be formulated as follows:

Optimal Control Problem

Minimize    \int_0^T L(x(t), u(t), t) \, dt,   u \in L_\infty^m[0,T]
subject to  \dot{x}(t) = f(x(t), u(t), t),   x(0) = x_0
and         a(t) \le u(t) \le b(t),   t \in [0,T]
            c(t) \le x(t) \le d(t),   t \in [0,T]

This is a special case of the General Optimal Control Problem by omitting \Omega and setting r_1 equal to the identity.

There are various ways to treat this optimal control problem numerically. It is possible to discretize this problem by finite differences or collocation schemes and then apply a finite dimensional optimization routine. Another approach that can be used is to derive the necessary optimality conditions which result in a two or multipoint boundary value problem. These problems can be solved numerically by multiple shooting techniques.

All of these methods have the disadvantage that special care has to be taken when really all constraints on the state have to be satisfied at every point in \Omega \times [0,T]. In the general case, any kind of discretization enforces the constraints only at the grid points or at another discretization parameter. In between the grid points the constraints could be violated. Methods from semi-infinite programming have the advantage that they avoid this kind of behavior since they are especially designed to deal with infinitely many constraints.
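As a small illustration of the plain discretization approach (not of the semi-infinite treatment), the sketch below parametrizes the control as piecewise constant, integrates a toy scalar system with an explicit Euler scheme, and imposes the state bound only at the grid points, which is exactly the limitation discussed above; the system and all data are assumptions.

import numpy as np
from scipy.optimize import minimize

# Toy problem (assumed data): steer x'(t) = -x(t) + u(t), x(0) = 0, towards 1
# while respecting -1 <= u <= 1 and the state bound x(t) <= 0.8, minimizing
# int_0^T ((x(t)-1)^2 + 0.1 u(t)^2) dt.
T, N, M = 2.0, 20, 200                 # horizon, control intervals, Euler steps
tgrid = np.linspace(0.0, T, M + 1)
dt = T / M
idx = np.minimum((tgrid / T * N).astype(int), N - 1)   # active control interval per node

def simulate(a):
    """Piecewise constant control u^N(a); explicit Euler for the state."""
    u = a[idx]
    x = np.empty(M + 1)
    x[0] = 0.0
    for j in range(M):
        x[j + 1] = x[j] + dt * (-x[j] + u[j])
    return x, u

def objective(a):
    x, u = simulate(a)
    return np.trapz((x - 1.0) ** 2 + 0.1 * u ** 2, tgrid)

# NOTE: the state bound is enforced only at the Euler grid points, precisely
# the limitation of plain discretization discussed in the text.
cons = [{'type': 'ineq', 'fun': lambda a: 0.8 - simulate(a)[0]}]
res = minimize(objective, x0=np.zeros(N), bounds=[(-1.0, 1.0)] * N,
               constraints=cons, method='SLSQP')
x_opt, _ = simulate(res.x)
print("objective:", res.fun, " max state on grid:", x_opt.max())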

1.4 Robot trajectory planning

Examples for applications of semi-infinite programming in optimal control can be found in two review articles by Polak [14] and Hettich and Kortanek [10]. An application to robot trajectory planning is worked out in detail mathematically and numerically in [8]. The problem is described as follows. A robot arm should follow a prescribed path subject to certain constraints. There are bounds on the joint velocity and jerk as well as on the torque.

Page 398: Semi-Infinite Programming


Let \theta(y), y \in [0,1], denote the coordinates describing the position of the robot depending on a parameter y. A transformation onto a fixed time interval - which is typical for time optimal control problems - is given as

t = r(y) = \int_0^y u(a) \, da

where u(a) > 0 for all a \in [0,1]. If the time of the process is to be minimized, then the objective function is

r(1) = \int_0^1 u(a) \, da.

The transformed coordinates are denoted by a tilde,

\tilde{\theta}_j(t),   j = 1, \ldots, l, \ t \in [0,T].

The constraints on the joint velocity, acceleration and jerk have to hold for all times

t \in [0,T],   i = 1, 2, 3.

In addition one has constraints on the torque \psi which depends in a nonlinear way on \tilde{\theta}:

|\psi(t)| = |g(\tilde{\theta})(t)| \le c_0,   t \in [0,T].

This is a special form of an optimal control problem as discussed previously. In [8] the control u was replaced by a cubic spline

u^N(a)(t) = \sum_{j=1}^{N} a_j B_j(t).

Then the variables a = (a_1, \ldots, a_N)^T lie in the finite dimensional space \mathbb{R}^N, whereas the constraints are posed on an interval [0,T], leading to a semi-infinite programming problem.

Page 399: Semi-Infinite Programming


2 STERILIZATION OF FOOD

2.1 Mathematical model

In this section we present another complex example. It arises in the sterilization of food and is described by a system of a partial and an ordinary differential equation. For a review of mathematical methods to model and solve this problem we refer to [16].

The problem is to sterilize food which is typically packed in a can. The cans are placed in a large container which is called an autoclave. The autoclave is filled with hot water or steam which heats the cans. After the product is sterile, the autoclave is filled with cold water to start the cooling down phase.

The hot water treatment kills the bacteria in the food, but it also affects the nutrients in the food. Since the heating process is slowest at the coldest point in the can, which is often the geometric center, one has to monitor the bacteria at this spot. In contrast, the nutrients are most damaged at the surface close to the water. This leads to an optimization problem where the nutrients are maximized under a given constraint on sterility. In the following we describe this process mathematically, see also [12].

Let \Omega \subset \mathbb{R}^2 be a domain in space and let [0,T] be the time interval in which the process takes place. We denote by

C(x,t)         concentration of microorganisms,
\theta(x,t)    temperature in location x at time t,
u(t)           temperature in the autoclave surrounding the can.

The system which describes the sterilization process is given by

\rho(\theta(x,t)) \, c(\theta(x,t)) \, \frac{\partial \theta(x,t)}{\partial t} = \nabla \cdot ( k(\theta(x,t)) \nabla \theta(x,t) )   in \Omega \times (0,T),

\frac{\partial C(x,t)}{\partial t} = -K(\theta(x,t)) \, C(x,t)   in \Omega \times (0,T),

k(\theta(\xi,t)) \, \frac{\partial \theta(\xi,t)}{\partial n} = \alpha ( u(t) - \theta(\xi,t) )   in \Gamma \times (0,T),

\theta(x,0) = \theta_0(x)   in \Omega,
C(x,0) = C_0(x)   in \Omega.

Page 400: Semi-Infinite Programming


The function K(\theta) is given by the Arrhenius law:

K(\theta) = K_r \exp\left( -\frac{E_a}{R} \left( \frac{1}{\theta} - \frac{1}{\theta_r} \right) \right)

where the constants and initial data are given by

\rho(\theta)    density,
c(\theta)       heat capacity,
k(\theta)       thermal conductivity,
\alpha          heat transfer coefficient,
\theta_0(x)     initial temperature,
C_0(x)          initial concentration,
\theta_r        constant reference temperature,
K_r             K(\theta_r),
E_a             activation energy,
R               universal gas constant.


Note that the differential equation for the concentration C is a family of ordinary differential equations parametrized by x.

The requirement for sterility is to reduce the level of concentration by a factor of 10^{-\beta}, where \beta depends on the product. Given \theta, the differential equation for C can be solved analytically. In the engineering literature a simplification is carried out and the following solution is used.

C(x,t) = C(x,0) \exp\left( -\int_0^t K_r \, 10^{(\theta(x,\tau) - \theta_r)/Z} \, d\tau \right),

where

Z = \frac{R \, \theta_r^2 \, \ln 10}{E_a}.

Then the sterility requirement

C(x,T) \le 10^{-\beta} C(x,0)

is equivalent to the inequality

\int_0^T 10^{(\theta(x,\tau) - \theta_r)/Z} \, d\tau \ge \beta \, \frac{\ln 10}{K_r}.

The following function

\phi : C(\Omega \times [0,T]) \to C(\Omega)

Page 401: Semi-Infinite Programming


has values defined on the domain \Omega, and its value is called the F-value.

\phi(\theta)(x) := \int_0^T 10^{(\theta(x,\tau) - \theta_r)/Z} \, d\tau.

The sterility requirement can be rewritten as

\phi(\theta)(x) \ge \phi_0   for all x \in \Omega

with the appropriate constant \phi_0 = \beta \ln 10 / K_r.
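A small sketch of how the F-value and the sterility check can be evaluated numerically from a computed temperature history at a fixed point x (time grid, temperature curve, and constants below are assumptions):

import numpy as np

def f_value(theta, t, theta_r, Z):
    """phi(theta)(x) = int_0^T 10**((theta(x,tau) - theta_r)/Z) dtau,
    approximated by the trapezoidal rule on the time grid t."""
    return np.trapz(10.0 ** ((theta - theta_r) / Z), t)

def is_sterile(theta, t, theta_r, Z, K_r, beta):
    """Check phi(theta)(x) >= phi_0 = beta * ln(10) / K_r."""
    return f_value(theta, t, theta_r, Z) >= beta * np.log(10.0) / K_r

# Illustrative data (assumptions): a temperature history at the can centre
t = np.linspace(0.0, 6000.0, 601)                     # seconds
theta = 20.0 + 100.0 * (1.0 - np.exp(-t / 1500.0))    # degrees Celsius
# theta_r and Z are typical F0 reference values; K_r and beta are purely illustrative
print(f_value(theta, t, theta_r=121.1, Z=10.0),
      is_sterile(theta, t, theta_r=121.1, Z=10.0, K_r=0.4, beta=12.0))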

2.2 Optimal control problem

Assume that for each u \in L_\infty[0,T] there exists a unique solution \theta \in C(\Omega \times [0,T]). Correspondingly we define a solution operator

S : L_\infty[0,T] \to C(\Omega \times [0,T])

by S(u) = \theta.

The process is controlled by the temperature u(t) surrounding the can. There are various ways to heat the product such that the required sterility, the F-value, is achieved. The goal is to control the heating process in such a way that a maximal amount of nutrients is retained. The concentration C_n of the nutrients is described by a differential equation similar to the one for C

\frac{\partial C_n(x,t)}{\partial t} = -K_n(\theta(x,t)) \, C_n(x,t)   in \Omega \times (0,T)

with the initial condition

C_n(x,0) = C_{n0}(x),   x \in \Omega.

Similar to the case of C we can solve the differential equation for C_n analytically. The nutrients are damaged most on the surface of the product. The remaining amount of nutrients at a point x_s on the surface is described by

\exp\left( -\int_0^T K_q \, 10^{(S(u)(x_s,t) - \theta_q)/Z_q} \, dt \right).

We can write this objective function also in the form (1.1). Note that it is equivalent to minimize

F(u) = \int_0^T K_q \, 10^{(S(u)(x_s,t) - \theta_q)/Z_q} \, dt.

Page 402: Semi-Infinite Programming


Hence we obtain the objective function by setting r_1(S(u))(t) = S(u)(x_s, t) and choosing L appropriately.

Note that we have constraints on the temperature control u at the boundary.

u_{\min}(t) \le u(t) \le u_{\max}(t),   t \in [0,T].

Furthermore, at the end of the process, the temperature of the product inside the can should not be too high. It is required to lie below a certain value

S(u)(x,T) \le \theta_{end}   in \Omega.

Next we formulate the optimal control problem.

Optimal Control Problem

Minimize    F(u)
such that   u_{\min}(t) \le u(t) \le u_{\max}(t),   t \in [0,T],
            \phi(S(u))(x) \ge \phi_0,               x \in \Omega,
            S(u)(x,T) \le \theta_{end},             x \in \Omega.

Note that we have pointwise constraints on the controls and additional con­straints on the state.

The following figures from [12] show that the optimal control is quite different from the control law typically used in industry. First we display in Figure 1 a typical control used in practice and the temperature at the center of the can.

The temperature control computed from the optimal control problem shows a slower increase in the initial phase. If one monitors the F-value, then one can see in Figure 2 that both values are attained at the end of the process. However, for the optimal control the increase in the F-value as a function of time is less steep and therefore better for the product quality. The vitamin thiamine is reduced less for the optimal control during the sterilization process. At the end of the process about 7% more vitamins are retained compared to an industrial control, see [12].

Page 403: Semi-Infinite Programming

[Figure 1: Industrial control: temperature outside (control) and in the center of the can over time for a typical sterilization process; temperature in degrees C versus time in sec.]

[Figure 2: Optimal control: temperature outside (control) and in the center of the can over time for the optimized sterilization process; temperature in degrees C versus time in sec.]

Page 404: Semi-Infinite Programming


2.3 Numerical solution of optimal control problem

The previous results were obtained for a finite-dimensional discretized problem; however, the problem can also be treated in a semi-infinite programming context with the advantage that the state constraints are satisfied at every point in the domain.

If we discretize the space of controls in a similar way as above by setting

u^N(a)(t) = \sum_{i=1}^{N} a_i u_i(t),   a \in \mathbb{R}^N

then we arrive again at a semi-infinite programming problem.

Semi-Infinite Programming Problem

Minimize    F^N(a) = F(u^N(a)) = \int_0^T K_q \, 10^{(S(u^N(a))(x_s,t) - \theta_q)/Z_q} \, dt   over a \in \mathbb{R}^N
such that   u_{\min}(t) \le u^N(a)(t) \le u_{\max}(t),   t \in [0,T],
            \phi(S(u^N(a)))(x) \ge \phi_0,               x \in \Omega,
            S(u^N(a))(x,T) \le \theta_{end},             x \in \Omega.

This is a minimization problem over a finite dimensional space. The constraints on u are called box constraints. For given constant upper and lower bounds u_{\min}, u_{\max} the constraints on u^N for all t \in [0,T] can be replaced by finitely many constraints. However, the other inequality constraints depend in a nonlinear way on the control and on a. Therefore these bounds have to hold on an interval, like the one-dimensional domain \Omega.

The following simplification of the model used by food technologists can also be interpreted as a reduction technique. It is obvious, although not easy to prove, that at least during the heating phase the coldest point of the food occurs in the center of the can. Therefore this is the point where the fewest bacteria are

Page 405: Semi-Infinite Programming


destroyed, and it is the only point where the F-value is computed. In other words, the semi-infinite constraint

\phi(S(u^N(a)))(x) \ge \phi_0,   x \in \Omega,

is replaced by a single constraint at the center x_c of the cylindrical can,

\phi(S(u^N(a)))(x_c) \ge \phi_0.

From a mathematical point of view, it has to be checked whether this point is the coldest for all times, in particular also for the cooling phase. For a discussion of this aspect see [11].

3 FLUTTER CONTROL

3.1 Mathematical model

[Figure 3: Model experiment: wing segment in an airstream of velocity v, with vertical displacement y(t) and torsion \alpha(t).]

The problem of flutter of aircraft wings is studied in aeroelasticity; two basic references are the monographs [2] and [7] where the physical background is explained. In this section we consider a simple model of an aircraft wing segment which hinges on two springs, see e.g. [4]. The degrees of freedom are such that up and down movements y(t) and torsions \alpha(t) can occur. The movement of the point Z is measured with respect to its distance z from a fixed reference point R. The displacements can be described by a system of linear ordinary

Page 406: Semi-Infinite Programming


differential equations of second order with constant coefficients

M \frac{d^2 x(t)}{dt^2} + D \frac{d x(t)}{dt} + C x(t) = F(t).      (3.1)

The coefficients represent

M \in \mathbb{R}^{n \times n}   the mass matrix,
D \in \mathbb{R}^{n \times n}   the damping matrix,
C \in \mathbb{R}^{n \times n}   the stiffness matrix

for a fixed design of the wing. The function F(t) \in \mathbb{R}^n denotes the vector of external forces.

If quasi-stationary aerodynamic forces are assumed, (3.1) can be transformed into a homogeneous system of the form

A_2 \frac{d^2 x(t)}{dt^2} + A_1 \frac{d x(t)}{dt} + A_0 x(t) = 0

with matrices A_i \in \mathbb{R}^{n \times n}. It is known that the system is stable if the eigenvalues of the quadratic matrix eigenvalue problem

(\lambda^2 A_2 + \lambda A_1 + A_0) x = 0      (3.2)

satisfy Re(\lambda_j) < 0, j = 1, \ldots, 2n.

In a flutter analysis of a wing we consider the matrices in (3.2) as dependent on the inflow velocity v and the vector of design variables d \in \mathbb{R}^K. For a fixed design d the eigenvalues of the flutter equation

(\lambda^2 A_2(v,d) + \lambda A_1(v,d) + A_0(v,d)) x = 0      (3.3)

are computed for all v in a given velocity interval V = [v_{\min}, v_{\max}], where v_{\max} is the maximal velocity under which the wing has to operate safely.

The smallest velocity where the real part of some eigenvalue vanishes is denoted as the flutter speed v_{fl}, and the corresponding eigenvalue is called the flutter mode.

Definition 3.1 Let \lambda_j(v,d), j = 1, \ldots, 2n, denote the eigenvalues of (3.3). The flutter speed is defined as

v_{fl}(d) = \min\{ v \ge 0 : \exists \, j \in \{1, \ldots, 2n\} \ \text{with} \ Re(\lambda_j(v,d)) = 0 \}.
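A sketch (not from the chapter) of how the flutter speed can be located numerically for a fixed design: the quadratic eigenvalue problem is linearized to a standard 2n x 2n eigenvalue problem and the velocity interval is scanned for the first non-negative real part; the matrix functions below are placeholders.

import numpy as np

def max_real_eig(A0, A1, A2):
    """Largest real part of the eigenvalues of (lam^2 A2 + lam A1 + A0) x = 0,
    via the standard 2n x 2n linearization."""
    n = A0.shape[0]
    A2inv = np.linalg.inv(A2)
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-A2inv @ A0, -A2inv @ A1]])
    return np.linalg.eigvals(A).real.max()

def flutter_speed(A0f, A1f, A2f, v_min, v_max, num=400):
    """Smallest scanned v in [v_min, v_max] with a non-negative real part;
    A0f, A1f, A2f are callables v -> matrix (placeholder model)."""
    for v in np.linspace(v_min, v_max, num):
        if max_real_eig(A0f(v), A1f(v), A2f(v)) >= 0.0:
            return v
    return None   # stable on the whole interval

# Toy single degree of freedom data (assumption): damping decreases with v
A2f = lambda v: np.array([[1.0]])
A1f = lambda v: np.array([[0.2 - 0.01 * v]])
A0f = lambda v: np.array([[4.0]])
print(flutter_speed(A0f, A1f, A2f, 0.0, 40.0))   # about 20, where the damping vanishes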

Page 407: Semi-Infinite Programming


The optimal design problem consists of minimizing some objective function such as the weight subject to the constraint that the flutter speed is larger than the maximal speed in V. This can be formulated as

Optimal Design Problem

Minimize f(d), d \in \mathbb{R}^K, subject to v_{fl}(d) \ge v_{\max}

3.2 Semi-infinite programming problem

We rewrite this problem in an equivalent form as a semi-infinite programming problem.

Semi-Infinite Programming Problem

Minimize f(d) over d \in \mathbb{R}^K
subject to Re(\lambda_j(v,d)) < 0,   v \in V = [v_{\min}, v_{\max}],   j = 1, \ldots, 2n,

where \lambda_j(v,d), j = 1, \ldots, 2n, solve

(\lambda^2 A_2(v,d) + \lambda A_1(v,d) + A_0(v,d)) x = 0,   v \in V.

This is an example of a semi-infinite programming problem. Sometimes, in order to achieve a certain level of stability, one imposes an upper bound on the real part of the eigenvalues such that the stability constraint can be rewritten as

Re(\lambda_j(v,d)) \le -\epsilon < 0,   v \in V = [v_{\min}, v_{\max}],   j = 1, \ldots, 2n.      (3.4)

This leads to the consideration of so-called hump modes in the next paragraph.

3.3 Hump modes and method of reduction

In the aerodynamic design of wings, eigenvalues which, plotted as functions of the inflow velocity, come close to the instability region are called hump modes. For a treatment of these in connection with optimization we refer to [9].

Page 408: Semi-Infinite Programming


The concept of hump modes is closely related to a well known numerical technique in semi-infinite programming, the method of local reduction (see [10]). We formulate it for problems of the form

\min_{d \in \mathbb{R}^K} f(d)   s.t.   g(v,d) \le -\epsilon,   v \in V,      (3.5)

of which the previous semi-infinite programming problem with the hump mode constraints is a special case.

For fixed d we have to consider the subproblem

\max_{v \in V} g(v,d).

Let V^{\max} be the set of local maxima of g(v,d) on V (for fixed d). If |V^{\max}| < \infty, the semi-infinite programming problem is sometimes replaced by a finite optimization problem

\min_{d \in \mathbb{R}^K} f(d)   s.t.   g(v_j, d) \le -\epsilon,   v_j \in V^{\max}.      (3.6)

If for a fixed design d one has identified a hump mode, then one monitors the movement of this maximum when design changes occur. This can be described using derivatives of the constraint functions. The differentiability of the constraints of (SIP) can be shown under the following conditions (see [13] and [5]).

Theorem 3.2 Let A_i(t) \in C_1^{n \times n}(T), i = 0, 1, 2, with T \subset \mathbb{R}^m open. Let t^* \in T and \lambda_{j^*} be an eigenvalue of

(\lambda^2 A_2(t^*) + \lambda A_1(t^*) + A_0(t^*)) x = 0

with multiplicity 1. Then there exist \delta > 0, a neighborhood U_\delta(t^*), and a map \lambda^* : U_\delta(t^*) \to \mathbb{C} such that \lambda_{j^*} = \lambda^*(t^*) and \lambda^*(t) is an eigenvalue of multiplicity 1 of

(\lambda^2 A_2(t) + \lambda A_1(t) + A_0(t)) x = 0

for t \in U_\delta(t^*). Furthermore, \lambda^* is continuously differentiable on U_\delta(t^*).


If we normalize the eigenvector by

$$e^T x = 1,$$

we can rewrite the flutter equation as $F(x, \lambda, v, d) = 0$ as follows. This is the same equation as the one for the eigenvalues and eigenvectors in the previous theorem, except that we distinguish between the real and imaginary parts of the eigenvalues and eigenvectors. This reads as

$$F(x, \lambda, v, d) = \begin{pmatrix}
\bigl((\mathrm{Re}(\lambda)^2 - \mathrm{Im}(\lambda)^2) A_2(v,d) + \mathrm{Re}(\lambda) A_1(v,d) + A_0(v,d)\bigr)\,\mathrm{Re}(x) - \bigl(2\,\mathrm{Re}(\lambda)\mathrm{Im}(\lambda) A_2(v,d) + \mathrm{Im}(\lambda) A_1(v,d)\bigr)\,\mathrm{Im}(x) \\[4pt]
\bigl((\mathrm{Re}(\lambda)^2 - \mathrm{Im}(\lambda)^2) A_2(v,d) + \mathrm{Re}(\lambda) A_1(v,d) + A_0(v,d)\bigr)\,\mathrm{Im}(x) + \bigl(2\,\mathrm{Re}(\lambda)\mathrm{Im}(\lambda) A_2(v,d) + \mathrm{Im}(\lambda) A_1(v,d)\bigr)\,\mathrm{Re}(x) \\[4pt]
e^T \mathrm{Re}(x) - 1 \\[4pt]
e^T \mathrm{Im}(x)
\end{pmatrix}.$$

Denoting by $J_F$ the Jacobian of $F$ with respect to $(x, \lambda)$, we obtain the following characterization of hump modes.

Corollary 3.3 Let $A_i(v,d) \in C_{n \times n}(V \times D)$, $i = 0, 1, 2$, be continuously differentiable with respect to $v$ on $V \subset \mathbb{R}$ and $D \subset \mathbb{R}^K$, open. Let, for fixed $d \in D$, all eigenvalues $\lambda_j(v,d)$ of the quadratic eigenvalue problem

$$\left(\lambda^2 A_2(v,d) + \lambda A_1(v,d) + A_0(v,d)\right) x = 0$$

with eigenvectors $x_j(v,d)$ have algebraic multiplicity 1 for all $v \in V$.

Then the hump mode for mode $j$ and fixed design $d$ satisfies locally the condition

$$\frac{\partial}{\partial v} g_j(v_j^h, d) = \Bigl( -J_F(x_j, \lambda_j, v_j^h, d)^{-1}\, \frac{\partial F}{\partial v}(x_j, \lambda_j, v_j^h, d) \Bigr)_{2n+1} = 0, \qquad (3.7)$$

where $g_j(v,d) = \mathrm{Re}(\lambda_j(v,d))$ and $v_j^h$ denotes the velocity $v$ at which the hump mode $j$ attains its local maximum.
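Read as an application of the implicit function theorem to $F(x, \lambda, v, d) = 0$, condition (3.7) is simply the stationarity of $\mathrm{Re}(\lambda_j)$ with respect to $v$: assuming the unknowns are ordered as $(\mathrm{Re}(x), \mathrm{Im}(x), \mathrm{Re}(\lambda), \mathrm{Im}(\lambda))$ (an ordering suggested by, but not stated with, the index $2n+1$ above), one has

$$\frac{\partial}{\partial v}
\begin{pmatrix} \mathrm{Re}(x) \\ \mathrm{Im}(x) \\ \mathrm{Re}(\lambda) \\ \mathrm{Im}(\lambda) \end{pmatrix}
= - J_F^{-1}\, \frac{\partial F}{\partial v},$$

and the $(2n+1)$-th component of this vector is $\partial\,\mathrm{Re}(\lambda_j)/\partial v = \partial g_j/\partial v$, whose vanishing is the stationarity condition at the local maximum of the hump mode.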

3.4 Semi-definite semi-infinite programming problem

The problem formulation used above has the disadvantage that the restrictions of the optimization problem can be nondifferentiable. Furthermore, the eigenvalues have to be computed explicitly in this model formulation.


In the following we present an approach where the differentiability of the constraints can be guaranteed easily. Instead of posing restrictions on the real parts of the eigenvalues, we characterize stability by linear matrix equations and their solutions. As a reference to this topic in control see the monograph [3]. This approach avoids the explicit computation of eigenvalues and makes the problem amenable to methods of unconstrained optimization from the area of positive definite programming. More details can be found in the paper [6]. If one uses only finitely many velocities, one obtains an ordinary finite-dimensional optimization problem, which can also be transformed into a positive definite programming problem, see [15].

It is well known that the quadratic eigenvalue problem (3.2) is equivalent to the standard eigenvalue problem in $2n$ dimensions

$$\left(A(v,d) - \lambda I\right) y = 0$$

with

$$A(v,d) := \begin{pmatrix} 0 & I \\ -A_2(v,d)^{-1} A_0(v,d) & -A_2(v,d)^{-1} A_1(v,d) \end{pmatrix} \in \mathbb{R}^{2n \times 2n},$$

if $A_2(v,d)$ is regular. The following theorem is instrumental for the reformulation.

Theorem 3.4 (Lyapunov) ([1], Theorem 3.19) Let $A \in \mathbb{R}^{m \times m}$. For an arbitrary symmetric negative definite matrix $Q \in \mathbb{R}^{m \times m}$ there exists a unique symmetric positive definite matrix $P \in \mathbb{R}^{m \times m}$ such that

$$A^T P + P A = Q \qquad (3.8)$$

if and only if for all eigenvalues $\lambda$ of $A$ we have $\mathrm{Re}(\lambda) < 0$.

The linear matrix equation (3.8) is called the Lyapunov equation.
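A minimal sketch of the resulting stability test (an illustration, not part of the chapter): solve the Lyapunov equation (3.8) with $Q = -I$ via SciPy and check positive definiteness of the solution.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def is_stable(A, tol=1e-12):
    """Lyapunov test for Re(lambda) < 0 for all eigenvalues of A:
    solve A^T P + P A = -I and check that P is symmetric positive definite."""
    m = A.shape[0]
    # solve_continuous_lyapunov(a, q) solves a @ X + X @ a.conj().T = q,
    # so passing a = A.T gives A^T P + P A = -I.
    P = solve_continuous_lyapunov(A.T, -np.eye(m))
    P = 0.5 * (P + P.T)  # symmetrize against round-off
    return bool(np.all(np.linalg.eigvalsh(P) > tol))
```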

The optimization problem can be written as a positive definite programming problem with $Q = -I$:

Positive-Definite Infinite Programming Problem

Minimize $f(d)$ over $d \in \mathbb{R}^K$, $P(\cdot) \in C_{m \times m}(V)$
subject to $A(\cdot, d)^T P(\cdot) + P(\cdot) A(\cdot, d) + I = 0$,
$P(v)$ positive definite for $v \in V$.


With this version of the original problem we have obtained an infinite and positive-definite programming problem. The constraint

$$P(v) \text{ positive definite for all } v \in V = [v_{\min}, v_{\max}]$$

demonstrates the infinite and positive-definite aspect.

Alternatively, instead of minimizing with respect to $d$ and $P(v)$ subject to the equality constraint defined through the Lyapunov equation, it is also possible to use only $d$ as a free variable and to have $P(v)$ defined through the Lyapunov equation. Since in this case $P$ depends on $v$ and $d$, we express this by writing $P(v, d)$. Then the corresponding optimization problem is

Positive-Definite Semi-Infinite Programming Problem

Minimize $f(d)$ over $d \in \mathbb{R}^K$
subject to $P(v, d) > 0$ for all $v \in V$,
where $P(\cdot, d)$ solves

$$A(\cdot, d)^T P(\cdot, d) + P(\cdot, d) A(\cdot, d) + I = 0.$$

This optimization problem is in general non-convex. Therefore, methods from semi-definite programming cannot be applied directly. However, we have had good experience in several test cases with a nonlinear barrier method approach.

For each $p > 0$ we define the barrier or penalty function

$$B(d, p) = f(d) + p \int_V \log\det P(v, d)\, dv. \qquad (3.9)$$

For given $p_k > 0$ we solve the minimization problem

$$(BP)_k \qquad \min_{d \in \mathbb{R}^K}\ f(d) + p_k \int_V \log\det P(v, d)\, dv. \qquad (3.10)$$
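A sketch of how the barrier (3.9) might be evaluated numerically (under assumed interfaces, not the authors' code): for each quadrature node $v$ the Lyapunov solution $P(v, d)$ is computed and $\log\det P(v, d)$ is accumulated with the trapezoidal rule; a non-positive-definite $P$ signals a design outside the barrier domain.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov
from scipy.integrate import trapezoid

def barrier(f, A, d, p, v_min, v_max, num=41):
    """Evaluate B(d, p) = f(d) + p * int_V log det P(v, d) dv   (cf. (3.9)),
    where P(v, d) solves A(v, d)^T P + P A(v, d) + I = 0.

    f: callable d -> float and A: callable (v, d) -> (m x m) ndarray are
    assumed interfaces; the integral is approximated by the trapezoidal rule.
    """
    vs = np.linspace(v_min, v_max, num)
    logdets = np.empty(num)
    for i, v in enumerate(vs):
        Av = A(v, d)
        P = solve_continuous_lyapunov(Av.T, -np.eye(Av.shape[0]))
        sign, logdet = np.linalg.slogdet(0.5 * (P + P.T))
        if sign <= 0:          # P not positive definite: outside the barrier domain
            return np.inf
        logdets[i] = logdet
    return f(d) + p * trapezoid(logdets, vs)
```

The outer loop then minimizes $B(\cdot, p_k)$ for a decreasing sequence of parameters $p_k$, for instance with a standard unconstrained optimizer started from the previous iterate.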

We decrease the penalty parameter as the iteration progresses. In order to solve (3.10), the gradient and possibly the Hessian of the barrier function have to be computed. The differentiability of the Lyapunov solution with respect to the parameter $d$ can be shown if the matrix $A(v, d)$ depends smoothly on $d$.

We obtain the following form of the gradient and the Hessian of the barrier function.


Theorem 3.5 Let $f(d) \in C^2(D)$ and $P(v,d) \in C^2_{m \times m}(V \times D)$ with $V = [v_{\min}, v_{\max}]$ and $D \subset \mathbb{R}^K$ convex and compact. Let there exist a $d^* \in \mathrm{int}(D)$ such that for all $v \in V$ the matrix $P(v, d^*)$ is symmetric and positive definite.

Then there is an $\varepsilon > 0$ such that the gradient of the barrier function (3.9) can be written as

$$\frac{\partial}{\partial d_j} B(d, p) = \frac{\partial}{\partial d_j} f(d) + p \int_V \mathrm{tr}\Bigl[ P^{-1}(v,d)\, \frac{\partial}{\partial d_j} P(v,d) \Bigr]\, dv \qquad (3.11)$$

for all $d \in U_\varepsilon(d^*)$ and $j = 1, \ldots, K$. The Hessian of the barrier function (3.9) can be represented for all $d \in U_\varepsilon(d^*)$ by

$$\frac{\partial^2}{\partial d_i \partial d_j} B(d, p) = \frac{\partial^2}{\partial d_i \partial d_j} f(d)
- p \int_V \Bigl\{ \mathrm{tr}\Bigl[ P^{-1}(v,d) \Bigl(\frac{\partial}{\partial d_i} P(v,d)\Bigr) P^{-1}(v,d) \Bigl(\frac{\partial}{\partial d_j} P(v,d)\Bigr) \Bigr]
- \mathrm{tr}\Bigl[ P^{-1}(v,d)\, \frac{\partial^2}{\partial d_i \partial d_j} P(v,d) \Bigr] \Bigr\}\, dv$$

for $i, j = 1, \ldots, K$.
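To indicate how the integrand of (3.11) can be obtained, note that differentiating the Lyapunov equation $A^T P + P A + I = 0$ with respect to $d_j$ yields another Lyapunov equation for $\partial P/\partial d_j$ with the same coefficient matrix. The sketch below (assumed interfaces as before, with $\partial A/\partial d_j$ replaced by a central finite difference for illustration) evaluates $\mathrm{tr}[P^{-1}\,\partial P/\partial d_j]$ at a single velocity $v$.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def grad_integrand(A, v, d, j, h=1e-6):
    """Integrand tr[P(v,d)^{-1} dP/dd_j(v,d)] of the gradient formula (3.11).

    A is an assumed callable (v, d) -> (m x m) ndarray; d is an ndarray.
    """
    Av = A(v, d)
    m = Av.shape[0]
    P = solve_continuous_lyapunov(Av.T, -np.eye(m))
    # Central finite-difference approximation of dA/dd_j.
    e = np.zeros_like(d)
    e[j] = h
    dA = (A(v, d + e) - A(v, d - e)) / (2.0 * h)
    # Differentiating A^T P + P A + I = 0 with respect to d_j gives
    #   A^T dP + dP A = -(dA^T P + P dA),   dP = dP/dd_j,
    # i.e. one more Lyapunov solve with the same coefficient matrix.
    dP = solve_continuous_lyapunov(Av.T, -(dA.T @ P + P @ dA))
    return float(np.trace(np.linalg.solve(P, dP)))
```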

3.5 Numerical results

The mass minimization of a wing segment was carried out using the barrier approach with the logarithmic barrier function for a discretized version of the model (3.10). These are preliminary numerical results to obtain an indication of the feasibility of the whole approach. An application of the reduction method of semi-infinite programming is part of future research.

The data of the flutter equation (3.3), which describes the vibration of the structure, is given through the matrices [4]

$$A_2 = \begin{pmatrix} 1 & -2.5 \\ -2.5 & 41.27 \end{pmatrix}, \qquad
A_1(v,d) = \frac{v}{d} \begin{pmatrix} 0.002248 & -0.012803 \\ 0.011523 & 0 \end{pmatrix},$$

$$A_0(v,d) = \begin{pmatrix} 24.68/d - (v^2/d)\cdot 0.002248 & 0 \\ 0 & 97.515/d^2 - (v^2/d)\cdot 0.0115 \end{pmatrix},$$


where d is the total mass of the wing segment.

For an initial design with a mass of $d = 0.14$ kg we obtain the eigenvalues shown in Figure 4, depending on the velocity $v$.

[Figure 4   Real Parts of the Eigenvalues for the Initial Design; horizontal axis: velocity v]

The flutter speed is given by $v_{fl} = 246$ cm/s. The critical area for $d \in D = [0.11, 0.14]$ is $V = [v_{\min}, v_{\max}] = [240, 260]$. Let the objective function of the optimization problem (3.10) be given by $f(d) = d$. Using a sequence of barrier parameters

$$p_k = 10^{1-k}, \qquad k = 1, 2, \ldots,$$

yields the optimal value of the design variable $d^* = 0.1199366$ kg. Table 1 lists the iteration history, where the discretized barrier term is

$$b_{dis}(d) = -\sum_{j=0}^{N} \log\det P(v_j, d)^{-1}.$$

In this example we have $N = 20$ and

$$v_j = 240 + j, \qquad j = 0, \ldots, 20.$$
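For reference, the discretized barrier term reported in Table 1 could be computed along the following lines (same assumed interface for $A(v,d)$ as in the sketches above).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def b_dis(A, d, v_nodes):
    """Discretized barrier term  b_dis(d) = -sum_j log det P(v_j, d)^{-1}."""
    total = 0.0
    for v in v_nodes:
        Av = A(v, d)
        P = solve_continuous_lyapunov(Av.T, -np.eye(Av.shape[0]))
        # -log det P^{-1} equals +log det P
        total += np.linalg.slogdet(0.5 * (P + P.T))[1]
    return total

# Grid of the example: v_j = 240 + j, j = 0, ..., 20.
# b_dis(A, 0.14, np.arange(240.0, 261.0))
```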


Table 1 Result of the Optimal Design Problem Using the Barrier Method

k | No. of inner it. | f(d_k)    | b_dis(d_k) | B_dis(d_k, p_k)
1 |        14        | 0.1250247 |  275.7646  |  275.8897
2 |        20        | 0.1250244 |  275.7646  |   27.7015
3 |        15        | 0.1250221 |  275.7648  |    2.8827
4 |        12        | 0.1249948 |  275.7805  |    0.4008
5 |         6        | 0.1228071 |  291.3359  |    0.1519
6 |        22        | 0.1199756 |  346.1465  |    0.1234
7 |        14        | 0.1199462 |  350.1482  |    0.1203
8 |        26        | 0.1199371 |  356.4079  |    0.1199
9 |        23        | 0.1199366 |  356.4079  |    0.1199

In Figure 5 we show the eigenvalues of the inverse of the solution of the Lyapunov equation for various design values obtained during the iteration of the optimization routine. The computed Lyapunov solutions obviously satisfy the positive definiteness condition. A phenomenon similar to hump modes also occurs.

[Figure 5   The Eigenvalues of the Inverse of the Solution of the Lyapunov Equation; panels over the velocity v ∈ [240, 260] for several designs encountered during the optimization]


REFERENCES

[1] S. Barnett. Polynomials and Linear Control Systems. New York, 1983.

[2] R. L. Bisplinghoff, H. Ashley, and R. L. Halfman. Aeroelasticity. Reading, 1955.

[3] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. SIAM Studies in Applied Mathematics, 1994.

[4] P. Dierolf. Aeroelastische Phänomene, Flattern. Vorlesung, Universität München, 1980.

[5] M. Fahl. Zur Strukturoptimierung unter Flatterrestriktionen. Diploma Thesis, Universität Trier, 1996.

[6] M. Fahl and E. W. Sachs. Modern optimization methods for structural design under flutter constraints. Technical report, Universität Trier, Germany, 1997.

[7] H. W. Försching. Grundlagen der Aeroelastik. Springer, Berlin-Heidelberg-New York, 1974.

[8] E. Haaren-Retagne. Semi-Infinite Programming Algorithm for Robot Trajectory Planning. PhD thesis, Universität Trier, 1992.

[9] R. T. Haftka. Parametric constraints with application to optimization using a continuous flutter constraint. AIAA Journal, 13:471-475, 1975.

[10] R. Hettich and K. O. Kortanek. Semi-infinite programming: theory, methods, and applications. SIAM Review, 35:380-429, 1993.

[11] D. Kleis. Augmented Lagrange SQP Methods and Application to the Sterilization of Prepackaged Food. PhD thesis, Universität Trier, 1997.

[12] D. Kleis and E. W. Sachs. Optimal control of the sterilization of prepackaged food. Technical report, Universität Trier, Germany, 1997.

[13] H.-D. Kothe. Störungstheorie des Eigenwertproblems. Diploma Thesis, Universität Trier, 1988.

[14] E. Polak. On the mathematical foundations of nondifferentiable optimization in engineering design. SIAM Review, 29:21-89, 1987.

[15] U. T. Ringertz. Eigenvalues in optimum structural design. Technical Report 96-8, KTH Stockholm, Department of Aeronautics, 1996.

[16] C. L. M. Silva, F. A. R. Oliveira, and M. Hendrickx. Modelling optimum processing conditions for the sterilization of prepackaged foods. Food Control, 4(2):67-78, 1993.
