
    Technical Report LBNL-54199

    GenOpt(R)

    Generic Optimization Program

    User Manual

    Version 2.0.0

    Simulation Research Group

Building Technologies Department, Environmental Energy Technologies Division

Lawrence Berkeley National Laboratory, Berkeley, CA 94720

http://SimulationResearch.lbl.gov

Michael Wetter ([email protected])

    January 5, 2004

    Notice:

    This work was supported by the U.S. Department of Energy (DOE), by the Swiss Academy

    of Engineering Sciences (SATW), and by the Swiss National Energy Fund (NEFF).

Copyright (c) 1998-2003 The Regents of the University of California (through Lawrence Berkeley National Laboratory), subject to receipt of any required approvals from the U.S. Department of Energy.


    Contents

1 Abstract 5

2 Notation 6

3 Introduction 7

4 Optimization Problems 10
  4.1 Classification of Optimization Problems 10
    4.1.1 Problems with Continuous Variables 10
    4.1.2 Problems with Discrete Variables 10
    4.1.3 Problems with Continuous and Discrete Variables 11
    4.1.4 Problems whose Cost Function is Evaluated by a Building Simulation Program 11
  4.2 Algorithm Selection 12
    4.2.1 Problem Pc with n > 1 12
    4.2.2 Problem Pcg with n > 1 13
    4.2.3 Problem Pc with n = 1 14
    4.2.4 Problem Pcg with n = 1 14
    4.2.5 Problem Pd 14
    4.2.6 Problem Pcd and Pcdg 14
    4.2.7 Functions with Several Local Minima 14

5 Algorithms for Multi-Dimensional Optimization 15
  5.1 Generalized Pattern Search Methods (Analysis) 15
    5.1.1 Assumptions 16
    5.1.2 Characterization of Generalized Pattern Search Algorithms 17
    5.1.3 Model Adaptive Precision GPS Algorithm 18
    5.1.4 Convergence Results 19
      a) Unconstrained Minimization 19
      b) Box-Constrained Minimization 20
  5.2 Generalized Pattern Search Methods (Implementations) 20
    5.2.1 Coordinate Search Algorithm 21
      a) Algorithm Parameters 21
      b) Global Search 21
      c) Local Search 21
      d) Parameter Update 22
      e) Keywords 22
    5.2.2 Hooke-Jeeves Algorithm 23


      a) Algorithm Parameters 23
      b) Map for Exploratory Moves 23
      c) Global Search Set Map 23
      d) Local Search Direction Map 24
      e) Parameter Update 24
      f) Keywords 24
    5.2.3 Multi-Start GPS Algorithms 25
  5.3 Discrete Armijo Gradient 27
    5.3.1 Keywords 29
  5.4 Particle Swarm Optimization 31
    5.4.1 PSO for Continuous Variables 31
      a) Neighborhood Topology 32
      b) Model PSO Algorithm 33
      c) Particle Update Equation 34
        (i) Version with Inertia Weight 34
        (ii) Version with Constriction Coefficient 35
    5.4.2 PSO for Discrete Variables 35
    5.4.3 PSO for Continuous and Discrete Variables 36
    5.4.4 PSO on a Mesh 37
    5.4.5 Population Size and Number of Generations 37
    5.4.6 Keywords 38
  5.5 Hybrid Generalized Pattern Search Algorithm with Particle Swarm Optimization Algorithm 41
    5.5.1 Hybrid Algorithm for Continuous Variables 41
    5.5.2 Hybrid Algorithm for Continuous and Discrete Variables 42
    5.5.3 Keywords 42
  5.6 Hooke-Jeeves 44
    5.6.1 Modifications to the Original Algorithm 44
    5.6.2 Algorithm Description 45
    5.6.3 Keywords 48
  5.7 Simplex Algorithm of Nelder and Mead with the Extension of O'Neill 49
    5.7.1 Main Operations 49
    5.7.2 Basic Algorithm 51
    5.7.3 Stopping Criteria 53
    5.7.4 O'Neill's Modification 54
    5.7.5 Modification of Stopping Criteria 54
    5.7.6 Benchmark Tests 56
    5.7.7 Keywords 59


6 Algorithms for One-Dimensional Optimization 60
  6.1 Interval Division Algorithms 60
    6.1.1 General Interval Division 60
    6.1.2 Golden Section Interval Division 61
    6.1.3 Fibonacci Division 62
    6.1.4 Comparison of Efficiency 63
    6.1.5 Master Algorithm for Interval Division 63
    6.1.6 Keywords 64

7 Algorithms for Parametric Runs 66
  7.1 Parametric Runs by Single Variation 66
    7.1.1 Algorithm Description 66
    7.1.2 Keywords 67
  7.2 Parametric Runs on a Mesh 67
    7.2.1 Algorithm Description 67
    7.2.2 Keywords 68

8 Constraints 69
  8.1 Constraints on Independent Variables 69
    8.1.1 Box Constraints 69
    8.1.2 Coupled Linear Constraints 70
  8.2 Constraints on Dependent Variables 70
    8.2.1 Barrier Functions 71
    8.2.2 Penalty Functions 71
    8.2.3 Implementation of Barrier and Penalty Functions 72

9 Program 73
  9.1 Interface to the Simulation Program 73
  9.2 Interface to the Optimization Algorithm 74
  9.3 Package genopt.algorithm 74
  9.4 Implementing a New Optimization Algorithm 76

10 Installing and Running GenOpt 78
  10.1 Installing GenOpt 78
  10.2 System Configuration for JDK Installation 78
    10.2.1 Linux/Unix 78
    10.2.2 Microsoft Windows 79
  10.3 Starting an Optimization with JDK Installation 79
  10.4 System Configuration for JRE Installation 80
  10.5 Starting an Optimization with JRE Installation 80


11 Setting Up an Optimization Problem 81
  11.1 File Specification 81
    11.1.1 Initialization File 82
    11.1.2 Configuration File 87
    11.1.3 Command File 89
      a) Specification of a Continuous Parameter 89
      b) Specification of a Discrete Parameter 90
      c) Specification of Input Function Objects 91
      d) Structure of the Command File 92
    11.1.4 Log File 93
    11.1.5 Output File 93
  11.2 Pre-Processing and Post-Processing 93
      a) Function Objects 93
      b) Pre-Processing 94
      c) Post-Processing 95
  11.3 Truncation of Digits of the Cost Function Value 96

12 Conclusion 98

13 Acknowledgment 99

14 Notice 100

A Benchmark Tests 101
  A.1 Rosenbrock 101
  A.2 Function 2D1 102
  A.3 Function Quad 103

Product and company names mentioned herein may be the trademarks of their respective owners. Any rights not expressly granted herein are reserved.


    1 Abstract

GenOpt is an optimization program for the minimization of a cost function that is evaluated by an external simulation program. It has been developed for optimization problems where the cost function is computationally expensive and its derivatives are not available or may not even exist. GenOpt can be coupled to any simulation program that reads its input from text files and writes its output to text files. The independent variables can be continuous variables (possibly with lower and upper bounds), discrete variables, or both continuous and discrete variables. Constraints on dependent variables can be implemented using penalty or barrier functions.

GenOpt has a library with local and global multi-dimensional and one-dimensional optimization algorithms, and algorithms for doing parametric runs. An algorithm interface allows adding new minimization algorithms without knowing the details of the program structure.

GenOpt is written in Java so that it is platform independent. The platform independence and the general interface make GenOpt applicable to a wide range of optimization problems.

GenOpt has not been designed for linear programming problems, quadratic programming problems, and problems where the gradient of the cost function is available. For such problems, as well as for other problems, specially tailored software exists that is more efficient.


    2 Notation

1. We use the notation $a \triangleq b$ to denote that $a$ is equal to $b$ by definition. We use the notation $a \leftarrow b$ to denote that $a$ is assigned the value of $b$.

2. $\mathbb{R}^n$ denotes the Euclidean space of $n$-tuplets of real numbers. Vectors $x \in \mathbb{R}^n$ are always column vectors, and their elements are denoted by superscripts. The inner product in $\mathbb{R}^n$ is denoted by $\langle \cdot, \cdot \rangle$ and for $x, y \in \mathbb{R}^n$ defined by $\langle x, y \rangle \triangleq \sum_{i=1}^{n} x^i\, y^i$. The norm in $\mathbb{R}^n$ is denoted by $\|\cdot\|$ and for $x \in \mathbb{R}^n$ defined by $\|x\| \triangleq \langle x, x \rangle^{1/2}$.

3. We denote by $\mathbb{Z}$ the set of integers, by $\mathbb{Q}$ the set of rational numbers, and by $\mathbb{N} \triangleq \{0, 1, \ldots\}$ the set of natural numbers. The set $\mathbb{N}_+$ is defined as $\mathbb{N}_+ \triangleq \{1, 2, \ldots\}$. Similarly, vectors in $\mathbb{R}^n$ with strictly positive elements are denoted by $\mathbb{R}^n_+ \triangleq \{x \in \mathbb{R}^n \mid x^i > 0,\ i \in \{1, \ldots, n\}\}$, and the set $\mathbb{Q}_+$ is defined as $\mathbb{Q}_+ \triangleq \{q \in \mathbb{Q} \mid q > 0\}$.

4. Let $\mathbb{W}$ be a set containing a sequence $\{w_i\}_{i=0}^{k}$. Then, we denote by $\underline{w}_k$ the sequence $\{w_i\}_{i=0}^{k}$ and by $\underline{\mathbb{W}}_k$ the set of all $k+1$ element sequences in $\mathbb{W}$.

5. If $\mathbb{A}$ and $\mathbb{B}$ are sets, we denote by $\mathbb{A} \cup \mathbb{B}$ the union of $\mathbb{A}$ and $\mathbb{B}$ and by $\mathbb{A} \cap \mathbb{B}$ the intersection of $\mathbb{A}$ and $\mathbb{B}$.

6. If $\mathbb{S}$ is a set, we denote by $\overline{\mathbb{S}}$ the closure of $\mathbb{S}$ and by $2^{\mathbb{S}}$ the set of all nonempty subsets of $\mathbb{S}$.

7. If $D \in \mathbb{Q}^{n \times q}$ is a matrix, we will use the notation $d \in D$ to denote the fact that $d \in \mathbb{Q}^n$ is a column vector of the matrix $D$. Similarly, by $\underline{D} \subset D$ we mean that $\underline{D} \in \mathbb{Q}^{n \times p}$ ($1 \le p \le q$) is a matrix containing only columns of $D$. Further, $\mathrm{card}(D)$ denotes the number of columns of $D$.

8. $f(\cdot)$ denotes a function where $(\cdot)$ stands for the undesignated variables. $f(x)$ denotes the value of $f(\cdot)$ at the point $x$. $f\colon \mathbb{A} \to \mathbb{B}$ indicates that the domain of $f(\cdot)$ is in the space $\mathbb{A}$ and its range in the space $\mathbb{B}$.

9. We say that a function $f\colon \mathbb{R}^n \to \mathbb{R}$ is once continuously differentiable if $f(\cdot)$ is defined on $\mathbb{R}^n$, and if $f(\cdot)$ has continuous derivatives on $\mathbb{R}^n$.

10. For $x^* \in \mathbb{R}^n$ and $f\colon \mathbb{R}^n \to \mathbb{R}$ continuously differentiable, we say that $x^*$ is stationary if $\nabla f(x^*) = 0$.

11. We denote by $\{e_i\}_{i=1}^{n}$ the unit vectors in $\mathbb{R}^n$.

12. We denote by $\rho \sim U(0, 1)$ that $\rho \in \mathbb{R}$ is a uniformly distributed random number, with $0 \le \rho \le 1$.


    3 Introduction

The use of system simulation for analyzing complex engineering problems is increasing. Such problems typically involve many independent variables¹, and can only be optimized by means of numerical optimization. Many designers use parametric studies to achieve better performance of such systems, even though such studies typically yield only partial improvement while requiring high labor time. In such parametric studies, one usually fixes all but one variable and tries to optimize a cost function² with respect to the non-fixed variable. The procedure is repeated iteratively by varying another variable. However, every time a variable is varied, all other variables typically become non-optimal and hence also need to be adjusted. It is clear that such a manual procedure is very time-consuming and often impractical for more than two or three independent variables.

GenOpt, a generic optimization program, has been developed to find, with less labor time, the independent variables that yield better performance of such systems. GenOpt performs the optimization of a user-supplied cost function, using a user-selected optimization algorithm.

In the most general form, the optimization problems addressed by GenOpt can be stated as follows: Let $X$ be a user-specified constraint set, and let $f\colon X \to \mathbb{R}$ be a user-defined cost function that is bounded from below. The constraint set $X$ consists of all possible design options, and the cost function $f(\cdot)$ measures the system performance. GenOpt tries to find a solution to the problem³

$$\min_{x \in X} f(x). \qquad (3.1)$$

This problem is usually solved by iterative methods, which construct infinite sequences of progressively better approximations to a solution, i.e., a point that satisfies an optimality condition. If $X \subset \mathbb{R}^n$, with some $n \in \mathbb{N}$, and $X$ or $f(\cdot)$ is not convex, we do not have a test for global optimality, and the most one can obtain is a point that satisfies a local optimality condition. Furthermore, for $X \subset \mathbb{R}^n$, tests for optimality are based on differentiability assumptions of the cost function. Consequently, optimization algorithms can fail, possibly far from a solution, if $f(\cdot)$ is not differentiable in the continuous independent variables. Some optimization algorithms are more likely to fail at

¹The independent variables are the variables that are varied by the optimization algorithm from one iteration to the next. They are also called design parameters or free parameters.

²The cost function is the function being optimized. The cost function measures a quantity that should be minimized, such as a building's annual operation cost, a system's energy consumption, or a norm between simulated and measured values in a data fitting process. The cost function is also called the objective function.

³If $f(\cdot)$ is discontinuous, it may only have an infimum (i.e., a greatest lower bound) but no minimum even if the constraint set $X$ is compact. Thus, to be correct, (3.1) should be replaced by $\inf_{x \in X} f(x)$. For simplicity, we will not make this distinction.


... in Section 7 the algorithms for parametric runs. In Section 8, we discuss how constraints on independent variables are implemented, and how constraints on dependent variables can be implemented. In Section 9, we explain the structure of the GenOpt software, the interface for the simulation program, and the interface for the optimization algorithms. How to install and start GenOpt is described in Section 10. Section 11 shows how to set up the configuration and input files, and how to use GenOpt's pre- and post-processing capabilities.


    4 Optimization Problems

    4.1 Classification of Optimization Problems

We will now classify some optimization problems that can be solved with GenOpt's optimization algorithms. The classification will be used in Section 4.2 to recommend suitable optimization algorithms.

We distinguish between problems whose design parameters are continuous variables¹, discrete variables², or both. In addition, we distinguish between problems with and without inequality constraints on the dependent variables.

    4.1.1 Problems with Continuous Variables

To denote box-constraints on independent continuous variables, we will use the notation

$$X \triangleq \{ x \in \mathbb{R}^n \mid l^i \le x^i \le u^i,\ i \in \{1, \ldots, n\} \}, \qquad (4.1)$$

where $l^i < u^i$ for $i \in \{1, \ldots, n\}$.

We will consider optimization problems of the form

$$P_c \qquad \min_{x \in X} f(x), \qquad (4.2)$$

where $f\colon \mathbb{R}^n \to \mathbb{R}$ is a once continuously differentiable cost function.

Now, we add inequality constraints on the dependent variables to (4.2) and obtain

$$P_{cg} \qquad \min_{x \in X} f(x), \qquad (4.3a)$$
$$\hphantom{P_{cg} \qquad} g(x) \le 0, \qquad (4.3b)$$

where everything is as in (4.2) and, in addition, $g\colon \mathbb{R}^n \to \mathbb{R}^m$ is a once continuously differentiable constraint function (for some $m \in \mathbb{N}$). We will assume that there exists an $x^* \in X$ that satisfies $g(x^*) < 0$.

    4.1.2 Problems with Discrete Variables

Next, we will discuss the situation where all design parameters can only take on user-specified discrete values.

Let $X_d \subset \mathbb{Z}^{n_d}$ denote the constraint set with a finite, non-zero number of integers for each variable.

¹Continuous variables can take on any value on the real line, possibly between lower and upper bounds.

²Discrete variables can take on only integer values.


We will consider integer programming problems of the form

$$P_d \qquad \min_{x \in X_d} f(x). \qquad (4.4)$$

4.1.3 Problems with Continuous and Discrete Variables

Next, we will allow for continuous and discrete independent variables.

We will use the notation

$$X \triangleq X_c \times X_d, \qquad (4.5a)$$
$$X_c \triangleq \{ x \in \mathbb{R}^{n_c} \mid l^i \le x^i \le u^i,\ i \in \{1, \ldots, n_c\} \}, \qquad (4.5b)$$

where the bounds on the continuous independent variables satisfy $l^i < u^i$ for $i \in \{1, \ldots, n_c\}$.

4.2 Algorithm Selection

4.2.1 Problem Pc with n > 1

To solve Pc with $n > 1$, the hybrid algorithm (Section 5.5, page 41) or the GPS implementation of the Hooke-Jeeves algorithm (Section 5.2.2, page 23) can be used, possibly with multiple starting points (Section 5.2.3, page 25). If $f(\cdot)$ is once continuously differentiable and has bounded level sets (or if the constraint set $X$ defined in (4.1) is compact), then these algorithms construct for problem (4.2) a sequence of iterates with stationary accumulation points (see Theorem 5.1.13).

Alternatively, the Discrete Armijo Gradient algorithm (Section 5.3, page 27)


can be used. Every accumulation point of the Discrete Armijo Gradient algorithm is a feasible stationary point.

If $f(\cdot)$ is not continuously differentiable, or if $f(\cdot)$ must be approximated by an approximating cost function $f^*(\epsilon, \cdot)$ where the approximation error cannot be controlled, as described in Section 4.1.4, then Pc can only be solved heuristically. We recommend using the hybrid algorithm (Section 5.5, page 41), the GPS implementation of the Hooke-Jeeves algorithm (Section 5.2.2, page 23), possibly with multiple starting points (Section 5.2.3, page 25), or a Particle Swarm Optimization algorithm (Section 5.4, page 31).

We do not recommend using the Nelder-Mead Simplex algorithm (Section 5.7, page 49) or the Discrete Armijo Gradient algorithm (Section 5.3, page 27).

The following approaches reduce the risk of failing at a point which is non-optimal and far from a minimizer of $f(\cdot)$:

1. Selecting large values for the parameter Step in the optimization command file (see page 90).

2. Selecting different initial iterates.

3. Using the hybrid algorithm of Section 5.5, the GPS implementation of the Hooke-Jeeves algorithm, possibly with multiple starting points (Section 5.2.3, page 25), and/or a Particle Swarm Optimization algorithm, and selecting the best of the solutions.

4. Doing a parametric study around the solution that has been obtained by any of the above optimization algorithms. The parametric study can be done using the algorithms Parametric (Section 7.1, page 66) and/or EquMesh (Section 7.2, page 67). If the parametric study yields a further reduction in cost, then the optimization failed at a non-optimal point. In this situation, one may want to try another optimization algorithm.

If $f(\cdot)$ is continuously differentiable but must be approximated by approximating cost functions $f^*(\epsilon, \cdot)$ where the approximation error can be controlled as described in Section 4.1.4, then Pc can be solved using the hybrid algorithm (Section 5.5, page 41) or the GPS implementation of the Hooke-Jeeves algorithm (Section 5.2.2, page 23), both with the error control scheme described in the Model GPS Algorithm 5.1.8 (page 18). The GPS implementation of the Hooke-Jeeves algorithm can be used with multiple starting points (Section 5.2.3, page 25). The error control scheme can be implemented using the value of GenOpt's variable stepNumber (page 72) and GenOpt's pre-processing capabilities (Section 11.2, page 93). A more detailed description of how to use the error control scheme can be found in [PW03, WP03].

4.2.2 Problem Pcg with n > 1

To solve Pcg, the hybrid algorithm (Section 5.5, page 41) or the GPS implementation of the Hooke-Jeeves algorithm (Section 5.2.2, page 23) can be used,


possibly with multiple starting points (Section 5.2.3, page 25). Constraints $g(\cdot) \le 0$ can be implemented using barrier and penalty functions (Section 8, page 69).

If $f(\cdot)$ or $g(\cdot)$ are not continuously differentiable, we recommend using the hybrid algorithm (Section 5.5, page 41) or the GPS implementation of the Hooke-Jeeves algorithm (Section 5.2.2, page 23), possibly with multiple starting points (Section 5.2.3, page 25), and implementing the constraints $g(\cdot) \le 0$ using barrier and penalty functions (Section 8, page 69). To reduce the risk of terminating far from a minimum point of $f(\cdot)$, we recommend the same measures as for solving Pc.
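To illustrate the penalty-function approach referred to above, the following sketch (ours, not GenOpt's implementation; the function names, the quadratic form, and the weighting factor mu are illustrative assumptions) converts Pcg into an unconstrained problem that any of the recommended algorithms can minimize. The exact penalty and barrier formulations that GenOpt uses are described in Section 8.2.

# Sketch only: a generic quadratic penalty for problem Pcg,
# i.e. min f(x) subject to g(x) <= 0, turned into an unconstrained problem.
# The weight mu and the quadratic form are illustrative; GenOpt's own
# penalty and barrier formulations are described in Section 8.2.
def penalized_cost(f, g, mu):
    """Return a cost function x -> f(x) + mu * sum(max(0, g_j(x))**2)."""
    def fp(x):
        return f(x) + mu * sum(max(0.0, gj) ** 2 for gj in g(x))
    return fp

if __name__ == "__main__":
    # Hypothetical cost and constraint, for demonstration only.
    f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
    g = lambda x: [x[0] + x[1] - 1.0]        # enforces x0 + x1 <= 1
    fp = penalized_cost(f, g, mu=1.0e3)
    print(fp([0.5, 0.2]), fp([2.0, 2.0]))    # feasible point vs. penalized point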

4.2.3 Problem Pc with n = 1

To solve Pc with $n = 1$, any of the interval division algorithms can be used (Section 6.1, page 60). Since only a few function evaluations are required for parametric studies in one dimension, the algorithm Parametric can also be used for this problem (Section 7.1, page 66). We recommend doing a parametric study if $f(\cdot)$ is expected to have several local minima.

4.2.4 Problem Pcg with n = 1

To solve Pcg with $n = 1$, the same applies as for Pc with $n = 1$. Constraints $g(\cdot) \le 0$ can be implemented by setting the penalty weighting factor in (8.8) to a large value. This may still cause small constraint violations, but it is easy to check whether the violation is acceptable.

4.2.5 Problem Pd

To solve Pd, a Particle Swarm Optimization algorithm can be used (Section 5.4, page 31).

4.2.6 Problem Pcd and Pcdg

To solve Pcd or Pcdg, the hybrid algorithm (Section 5.5, page 41) or a Particle Swarm Optimization algorithm can be used (Section 5.4, page 31).

4.2.7 Functions with Several Local Minima

If the problem has several local minima, we recommend using the GPS implementation of the Hooke-Jeeves algorithm with multiple starting points (Section 5.2.3, page 25), the hybrid algorithm (Section 5.5, page 41), or a Particle Swarm Optimization algorithm (Section 5.4, page 31).


5 Algorithms for Multi-Dimensional Optimization

5.1 Generalized Pattern Search Methods (Analysis)

Generalized Pattern Search (GPS) algorithms are derivative-free optimization algorithms for the minimization of problems Pc and Pcg, defined in (4.2) and (4.3), respectively. We will present the GPS algorithms for the case where the function $f(\cdot)$ cannot be evaluated exactly, but can be approximated by functions $f^*\colon \mathbb{R}^q_+ \times \mathbb{R}^n \to \mathbb{R}$, where the first argument $\epsilon \in \mathbb{R}^q_+$ is the precision parameter of PDE, ODE, and algebraic equation solvers. Obviously, the explanations are similar for problems where $f(\cdot)$ can be evaluated exactly, except that the scheme to control $\epsilon$ is not applicable, and that the approximate functions $f^*(\epsilon, \cdot)$ are replaced by $f(\cdot)$.

Under the assumption that the cost function is continuously differentiable, all the accumulation points constructed by the GPS algorithms are stationary.

What GPS algorithms have in common is that they define the construction of a mesh $M_k$ in $\mathbb{R}^n$, which is then explored according to some rules that differ among the various members of the family of GPS algorithms. If no decrease in cost is obtained on mesh points around the current iterate, then the distance between the mesh points is reduced, and the process is repeated.

We will now explain the framework of GPS algorithms that will be used to implement different instances of GPS algorithms in GenOpt. The discussion follows the more detailed description in [PW03].


    5.1.1 Assumptions

We will assume that $f(\cdot)$ and its approximating functions $\{f^*(\epsilon, \cdot)\}_{\epsilon \in \mathbb{R}^q_+}$ have the following properties.

Assumption 5.1.1

1. There exists an error bound function $\varphi\colon \mathbb{R}^q_+ \to \mathbb{R}_+$ such that for any bounded set $S \subset X$, there exists an $\epsilon_S \in \mathbb{R}^q_+$ and a scalar $K_S \in (0, \infty)$ such that for all $x \in S$ and for all $\epsilon \in \mathbb{R}^q_+$, with $\epsilon \le \epsilon_S$,¹

$$| f^*(\epsilon, x) - f(x) | \le K_S\, \varphi(\epsilon). \qquad (5.1)$$

Furthermore,

$$\lim_{\epsilon \to 0} \varphi(\epsilon) = 0. \qquad (5.2)$$

2. The function $f\colon \mathbb{R}^n \to \mathbb{R}$ is once continuously differentiable.

Remark 5.1.2

1. The functions $\{f^*(\epsilon, \cdot)\}_{\epsilon \in \mathbb{R}^q_+}$ may be discontinuous.

2. See [PW03] for the situation where $f(\cdot)$ is only locally Lipschitz continuous.

Next, we state an assumption on the level sets of the family of approximate functions. To do so, we first define the notion of a level set.

Definition 5.1.3 (Level Set) Given a function $f\colon \mathbb{R}^n \to \mathbb{R}$ and an $\alpha \in \mathbb{R}$, such that $\alpha > \inf_{x \in \mathbb{R}^n} f(x)$, we will say that the set $L_\alpha(f) \subset \mathbb{R}^n$, defined as

$$L_\alpha(f) \triangleq \{ x \in \mathbb{R}^n \mid f(x) \le \alpha \}, \qquad (5.3)$$

is a level set of $f(\cdot)$, parametrized by $\alpha$.

Assumption 5.1.4 (Compactness of Level Sets) Let $\{f^*(\epsilon, \cdot)\}_{\epsilon \in \mathbb{R}^q_+}$ be as in Assumption 5.1.1 and let $X \subset \mathbb{R}^n$ be the constraint set. Let $x_0 \in X$ be the initial iterate and $\epsilon_0 \in \mathbb{R}^q_+$ be the initial precision setting of the numerical solvers. Then, we assume that there exists a compact set $C \subset \mathbb{R}^n$ such that

$$L_{f^*(\epsilon_0, x_0)}(f^*(\epsilon, \cdot)) \cap X \subset C, \qquad \forall\, \epsilon \le \epsilon_0. \qquad (5.4)$$

¹For $\epsilon \in \mathbb{R}^q_+$, by $\epsilon \le \epsilon_S$ we mean that $0 < \epsilon^i \le \epsilon_S^i$, for all $i \in \{1, \ldots, q\}$.


5.1.2 Characterization of Generalized Pattern Search Algorithms

There exist different geometrical explanations for pattern search algorithms, and a generalization is given in the review [TGKT03]. We will use a simple implementation of the pattern search algorithms in [PW03] where we restrict the search directions to be the positive and negative coordinate directions. Thus, the search directions are the columns of the matrix

$$D \triangleq [-e_1, +e_1, \ldots, -e_n, +e_n] \in \mathbb{Z}^{n \times 2n}, \qquad (5.5)$$

which suffices for box-constrained problems. Furthermore, we construct the sequence of mesh size parameters that parametrizes the minimum distance between iterates such that it satisfies the following assumption.

Assumption 5.1.5 (k-th Mesh Size Parameter) Let $r, s_0, k \in \mathbb{N}$, with $r > 1$, and $\{t_i\}_{i=0}^{k-1} \subset \mathbb{N}$. We will assume that the sequence of mesh size parameters satisfies

$$\Delta_k \triangleq \frac{1}{r^{s_k}}, \qquad (5.6a)$$

where, for $k > 0$,

$$s_k \triangleq s_0 + \sum_{i=0}^{k-1} t_i. \qquad (5.6b)$$
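For illustration (using the common keyword settings quoted in Section 5.2.1 e), namely $r = 2$, $s_0 = 0$, and $t_i = 1$ for iterations that do not reduce the cost, with $t_i = 0$ otherwise), the mesh size parameter evolves as

$$\Delta_0 = \frac{1}{2^0} = 1, \qquad \Delta_k = \frac{1}{2^{s_k}}, \quad s_k = \text{number of unsuccessful iterations among the first } k,$$

so $\Delta_k$ is halved each time no point of lower cost is found around the current iterate; after three such reductions, $\Delta = 1/8$.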

With this construction, all iterates lie on a rational mesh of the form

$$M_k \triangleq \{ x_0 + \Delta_k\, D\, m \mid m \in \mathbb{N}^{2n} \}. \qquad (5.7)$$

We will now characterize the set-valued maps that determine the mesh points for the global and local searches. Note that the images of these maps may depend on the entire history of the computation.

Definition 5.1.6 Let $\underline{X}_k$ and $\underline{\Delta}_k$ be the sets of all $k+1$ element sequences in $\mathbb{R}^n$ and $\mathbb{Q}_+$, respectively, let $M_k$ be the current mesh, and let $\epsilon \in \mathbb{R}^q_+$ be the solver tolerance.

1. We define the global search set map to be any set-valued map

$$\gamma_k\colon \underline{X}_k \times \underline{\Delta}_k \times \mathbb{R}^q_+ \to 2^{M_k \cap X} \cup \emptyset \qquad (5.8a)$$

whose image $\gamma_k(\underline{x}_k, \underline{\Delta}_k, \epsilon)$ contains only a finite number of mesh points.

2. We will call $G_k \triangleq \gamma_k(\underline{x}_k, \underline{\Delta}_k, \epsilon)$ the global search set.

3. We define the directions for the local search as

$$D \triangleq [-e_1, +e_1, \ldots, -e_n, +e_n]. \qquad (5.8b)$$


4. We will call

$$L_k \triangleq \{ x_k + \Delta_k\, D\, e_i \mid i \in \{1, \ldots, 2n\} \} \cap X \qquad (5.8c)$$

the local search set.

Remark 5.1.7

1. The map $\gamma_k(\cdot, \cdot, \cdot)$ can be dynamic in the sense that if $\{x_{k_i}\}_{i=0}^{I} \triangleq \gamma_k(\underline{x}_k, \underline{\Delta}_k, \epsilon)$, then the rule for selecting $x_{k_i}$, $1 \le i \le I$, can depend on $\{x_{k_i}\}_{i=0}^{i-1}$ and $\{f^*(\epsilon, x_{k_i})\}_{i=0}^{i-1}$. It is only important that the global search terminates after a finite number of computations, and that $G_k \in 2^{M_k \cap X} \cup \emptyset$.

2. As we shall see, the global search affects only the efficiency of the algorithm but not its convergence properties. Any heuristic procedure that leads to a finite number of function evaluations can be used for $\gamma_k(\cdot, \cdot, \cdot)$.

3. The empty set is included in the range of $\gamma_k(\cdot, \cdot, \cdot)$ to allow omitting the global search.

    5.1.3 Model Adaptive Precision GPS Algorithm

We will now present our model GPS algorithm with adaptive precision cost function evaluations.

Algorithm 5.1.8 (Model GPS Algorithm)

Data:   Initial iterate $x_0 \in X$;
        mesh size divider $r \in \mathbb{N}$, with $r > 1$;
        initial mesh size exponent $s_0 \in \mathbb{N}$.
Maps:   Global search set map $\gamma_k\colon \underline{X}_k \times \underline{\Delta}_k \times \mathbb{R}^q_+ \to 2^{M_k \cap X} \cup \emptyset$;
        function $\rho\colon \mathbb{R}_+ \to \mathbb{R}^q_+$ (to assign $\epsilon$), such that the composition $\varphi \circ \rho\colon \mathbb{R}_+ \to \mathbb{R}_+$ is strictly monotone decreasing and satisfies $\varphi(\rho(\Delta))/\Delta \to 0$, as $\Delta \to 0$.
Step 0: Initialize $k = 0$, $\Delta_0 = 1/r^{s_0}$, and $\epsilon = \rho(1)$.
Step 1: Global Search
        Construct the global search set $G_k = \gamma_k(\underline{x}_k, \underline{\Delta}_k, \epsilon)$.
        If $f^*(\epsilon, x') - f^*(\epsilon, x_k) < 0$ for any $x' \in G_k$, go to Step 3; else, go to Step 2.
Step 2: Local Search
        Evaluate $f^*(\epsilon, \cdot)$ for any $x' \in L_k$ until some $x' \in L_k$ satisfying $f^*(\epsilon, x') - f^*(\epsilon, x_k) < 0$ is obtained, or until all points in $L_k$ are evaluated.
Step 3: Parameter Update
        If there exists an $x' \in G_k \cup L_k$ satisfying $f^*(\epsilon, x') - f^*(\epsilon, x_k) < 0$, set $x_{k+1} = x'$, $s_{k+1} = s_k$, $\Delta_{k+1} = \Delta_k$, and do not change $\epsilon$;
        else, set $x_{k+1} = x_k$, $s_{k+1} = s_k + t_k$, with $t_k \in \mathbb{N}_+$ arbitrary, $\Delta_{k+1} = 1/r^{s_{k+1}}$, and $\epsilon = \rho(\Delta_{k+1}/\Delta_0)$.
Step 4: Replace $k$ by $k + 1$, and go to Step 1.
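To make the control flow of Algorithm 5.1.8 concrete, the following sketch (ours, not GenOpt's Java implementation; all names are assumptions) shows the special case of exact cost evaluations with an empty global search, so that each iteration reduces to a local search on the 2n coordinate directions followed by the mesh-size update of Step 3:

import itertools

# Minimal sketch of the Model GPS loop of Algorithm 5.1.8 for exact cost
# evaluations (no precision control, empty global search).  Not GenOpt's
# implementation; all names are ours.
def gps_minimize(f, x0, r=2, s0=0, t=1, max_step_reductions=4):
    n = len(x0)
    x, s = list(x0), s0
    while s < s0 + max_step_reductions * t:
        delta = 1.0 / r ** s                      # mesh size parameter, cf. (5.6a)
        improved = False
        # Local search on the 2n coordinate directions, cf. L_k in (5.8c).
        for i, sign in itertools.product(range(n), (+1, -1)):
            trial = list(x)
            trial[i] += sign * delta
            if f(trial) < f(x):                   # Step 2: accept the first decrease
                x, improved = trial, True
                break
        if not improved:                          # Step 3: no decrease, refine the mesh
            s += t
    return x

if __name__ == "__main__":
    # Usage on a smooth test function.
    print(gps_minimize(lambda v: (v[0] - 3.0) ** 2 + (v[1] + 1.0) ** 2, [0.0, 0.0]))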


Remark 5.1.9

1. To ensure that $\epsilon$ does not depend on the scaling of $\Delta_0$, we normalized the argument of $\rho(\cdot)$. In particular, we want to decouple $\epsilon$ from the user's choice of the initial mesh parameter.

2. In Step 2, once a decrease of the cost function is obtained, one can proceed to Step 3. However, one is allowed to evaluate $f^*(\epsilon, \cdot)$ at more points in $L_k$ in an attempt to obtain a bigger reduction in cost; one may proceed to Step 3 only after either a cost decrease has been found, or after all points in $L_k$ are tested.

3. In Step 3, we are not restricted to accepting the $x' \in G_k \cup L_k$ that gives the lowest cost value. But the mesh size parameter $\Delta_k$ is reduced only if there exists no $x' \in G_k \cup L_k$ satisfying $f^*(\epsilon, x') - f^*(\epsilon, x_k) < 0$.

4. To simplify the explanations, we do not increase the mesh size parameter if the cost has been reduced. However, our global search allows searching on a coarser mesh $M \subset M_k$, and hence, our algorithm can easily be extended to include a rule for increasing $\Delta_k$ for a finite number of iterations.

5. Audet and Dennis [AD03] update the mesh size parameter using the formula $\Delta_{k+1} = \tau^m\, \Delta_k$, where $\tau \in \mathbb{Q}$, $\tau > 1$, and $m$ is any element of $\mathbb{Z}$. Thus, our update rule for $\Delta_k$ is a special case of Audet's and Dennis' construction, obtained by setting the reduction ratio to $1/r$, with $r \in \mathbb{N}_+$, $r \ge 2$.


5.1.4 Convergence Results

a) Unconstrained Minimization

Theorem 5.1.11 (Convergence to a Stationary Point) Suppose that Assumptions 5.1.1 and 5.1.4 are satisfied and that $X = \mathbb{R}^n$. Let $x^* \in \mathbb{R}^n$ be an accumulation point of the refining subsequence $\{x_k\}_{k \in K}$ constructed by Model GPS Algorithm 5.1.8. Then,

$$\nabla f(x^*) = 0. \qquad (5.9)$$

b) Box-Constrained Minimization

We now present the convergence results for the box-constrained problem (4.2). See [AD03, PW03, TGKT03] for the more general case of linearly constrained problems and for the convergence proofs.

First, we introduce the notion of a tangent cone and a normal cone, which are defined as follows:

Definition 5.1.12 (Tangent and Normal Cone)

1. Let $X \subset \mathbb{R}^n$. Then, we define the tangent cone to $X$ at a point $x^* \in X$ by

$$T_X(x^*) \triangleq \{ \mu\,(x - x^*) \mid \mu \ge 0,\ x \in X \}. \qquad (5.10a)$$

2. Let $T_X(x^*)$ be as above. Then, we define the normal cone to $X$ at $x^* \in X$ by

$$N_X(x^*) \triangleq \{ v \in \mathbb{R}^n \mid \forall\, t \in T_X(x^*),\ \langle v, t \rangle \le 0 \}. \qquad (5.10b)$$

We now state that the accumulation points generated by Model GPS Algorithm 5.1.8 are feasible stationary points of problem (4.2).

Theorem 5.1.13 (Convergence to a Feasible Stationary Point) Suppose Assumptions 5.1.1 and 5.1.4 are satisfied. Let $x^* \in X$ be an accumulation point of a refining subsequence $\{x_k\}_{k \in K}$ constructed by Model GPS Algorithm 5.1.8 in solving problem (4.2). Then,

$$\langle \nabla f(x^*), t \rangle \ge 0, \qquad \forall\, t \in T_X(x^*), \qquad (5.11a)$$

and

$$-\nabla f(x^*) \in N_X(x^*). \qquad (5.11b)$$

5.2 Generalized Pattern Search Methods (Implementations)

We will now present different implementations of the Generalized Pattern Search (GPS) algorithms. They all use the Model GPS Algorithm 5.1.8 to solve problem Pc defined in (4.2). The problem Pcg defined in (4.3) can be solved


by using penalty functions as described in Section 8.2.

We will discuss the implementations for the case where the function $f(\cdot)$ cannot be evaluated exactly, but will be approximated by functions $f^*\colon \mathbb{R}^q_+ \times \mathbb{R}^n \to \mathbb{R}$, where the first argument $\epsilon \in \mathbb{R}^q_+$ is the precision parameter of the PDE, ODE, and algebraic equation solvers. This includes the case where $\epsilon$ is not varied during the optimization, in which case the explanations are identical, except that the scheme to control $\epsilon$ is not applicable, and that the approximate functions $f^*(\epsilon, \cdot)$ are replaced by $f(\cdot)$.

If the cost function $f(\cdot)$ is approximated by functions $\{f^*(\epsilon, \cdot)\}_{\epsilon \in \mathbb{R}^q_+}$ with adaptive precision $\epsilon$, then the function $\rho\colon \mathbb{R}_+ \to \mathbb{R}^q_+$ (to assign $\epsilon$) can be implemented by using GenOpt's pre-processing capability (see Section 11.2).

    5.2.1 Coordinate Search Algorithm

We will now present the implementation of the Coordinate Search algorithm with adaptive precision function evaluations using the Model GPS Algorithm 5.1.8. To simplify the implementation, we assign $f^*(\epsilon, x) = \infty$ for all $x \notin X$, where $X$ is defined in (4.1).

    a) Algorithm Parameters

The search direction matrix is defined as

$$D \triangleq [+s^1\, e_1, -s^1\, e_1, \ldots, +s^n\, e_n, -s^n\, e_n], \qquad (5.12)$$

where $s^i \in \mathbb{R}$, $i \in \{1, \ldots, n\}$, is a scaling for each parameter (specified by GenOpt's parameter Step).

The parameter $r \in \mathbb{N}$, $r > 1$, which is used to compute the mesh size parameter $\Delta_k$, is defined by the parameter MeshSizeDivider; the initial value for the mesh size exponent $s_0 \in \mathbb{N}$ is defined by the parameter InitialMeshSizeExponent; and the mesh size exponent increment $t_k$ is, for the iterations that do not reduce the cost, defined by the parameter MeshSizeExponentIncrement.

    b) Global Search

In the Coordinate Search Algorithm, there is no global search. Thus, $G_k = \emptyset$ for all $k \in \mathbb{N}$.

c) Local Search

The local search set $L_k$ is constructed using the set-valued map $E_k\colon \mathbb{R}^n \times \mathbb{Q}_+ \times \mathbb{R}^q_+ \to 2^{M_k}$, which is defined as follows:


Algorithm 5.2.1 (Map $E_k\colon \mathbb{R}^n \times \mathbb{Q}_+ \times \mathbb{R}^q_+ \to 2^{M_k}$ for Coordinate Search)

Parameter: Search direction matrix $D = [+s^1\, e_1, -s^1\, e_1, \ldots, +s^n\, e_n, -s^n\, e_n]$;
           vector $\mu \in \mathbb{N}^n$.
Input:     Iteration number $k \in \mathbb{N}$;
           base point $x \in \mathbb{R}^n$;
           mesh divider $\Delta_k \in \mathbb{Q}_+$.
Output:    Set of trial points $T$.
Step 0:    Initialize $T = \emptyset$.
           If $k = 0$, initialize $\mu^i = 0$ for all $i \in \{1, \ldots, n\}$.
Step 1:    For $i = 1, \ldots, n$
             Set $\tilde{x} = x + \Delta_k\, D\, e_{2i-1+\mu^i}$ and $T \leftarrow T \cup \{\tilde{x}\}$.
             If $f^*(\epsilon, \tilde{x}) < f^*(\epsilon, x)$
               Set $x = \tilde{x}$.
             else
               If $\mu^i = 0$, set $\mu^i = 1$, else set $\mu^i = 0$.
               Set $\tilde{x} = x + \Delta_k\, D\, e_{2i-1+\mu^i}$ and $T \leftarrow T \cup \{\tilde{x}\}$.
               If $f^*(\epsilon, \tilde{x}) < f^*(\epsilon, x)$
                 Set $x = \tilde{x}$.
               else
                 If $\mu^i = 0$, set $\mu^i = 1$, else set $\mu^i = 0$.
               end if.
             end if.
           end for.
Step 2:    Return $T$.

Thus, $E_k(x, \Delta_k, \epsilon) = T$ for all $k \in \mathbb{N}$.

Remark 5.2.2 In Algorithm 5.2.1, the vector $\mu \in \mathbb{N}^n$ contains for each coordinate direction an integer 0 or 1 that indicates whether a step in the positive or in the negative coordinate direction yielded a decrease in cost in the previous iteration. This reduces the number of exploration steps.
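For concreteness, a compact sketch of these exploratory moves follows (ours, not GenOpt's Java implementation; the fixed-precision cost f, the per-coordinate step sizes, and all names are assumptions). It returns the set of trial points T, the possibly improved base point, and the direction memory mu described in Remark 5.2.2.

# Sketch of the exploratory moves of Algorithm 5.2.1 (Coordinate Search).
# `f` plays the role of f*(eps, .) with fixed precision; `step` is the
# scaled mesh size per coordinate; `mu` is the 0/1 memory vector of
# Remark 5.2.2.  Names are ours, not GenOpt's.
def exploratory_moves(f, x, step, mu):
    x = list(x)
    trials = []
    for i in range(len(x)):
        for attempt in range(2):                  # remembered sign first, then the other
            sign = +1 if mu[i] == 0 else -1
            trial = list(x)
            trial[i] += sign * step[i]
            trials.append(trial)
            if f(trial) < f(x):
                x = trial                         # accept and keep mu[i]
                break
            mu[i] = 1 - mu[i]                     # flip the remembered direction
    return trials, x, mu

if __name__ == "__main__":
    # Usage with a hypothetical cost function.
    f = lambda v: (v[0] - 1.0) ** 2 + v[1] ** 2
    T, x_new, mu = exploratory_moves(f, [0.0, 0.0], [0.5, 0.5], [0, 0])
    print(x_new, mu)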

    d) Parameter Update

The point $x'$ in Step 3 of the GPS Model Algorithm 5.1.8 corresponds to $x' \triangleq \arg\min_{x \in E_k(x_k, \Delta_k, \epsilon)} f^*(\epsilon, x)$ in the Coordinate Search algorithm.

    e) Keywords

For the GPS implementation of the Coordinate Search Algorithm, the command file (see page 89) must only contain continuous parameters.

To invoke the algorithm, the Algorithm section of the GenOpt command file must have the following form:

    Algorithm{

    Main = GPSCoordinateSearch;


MeshSizeDivider = Integer;           // 1 < MeshSizeDivider
InitialMeshSizeExponent = Integer;   // 0 <= InitialMeshSizeExponent
MeshSizeExponentIncrement = Integer; // 0 < MeshSizeExponentIncrement
NumberOfStepReduction = Integer;     // 0 < NumberOfStepReduction
}

The keywords have the following meaning:

MeshSizeDivider The value for $r \in \mathbb{N}$, $r > 1$, used to compute $\Delta_k \triangleq 1/r^{s_k}$ (see equation (5.6a)). A common value is $r = 2$.

InitialMeshSizeExponent The value for $s_0 \in \mathbb{N}$ in (5.6b). A common value is $s_0 = 0$.

MeshSizeExponentIncrement The value for $t_i \in \mathbb{N}$ (for the iterations that do not yield a decrease in cost) in (5.6b). A common value is $t_i = 1$.

NumberOfStepReduction The maximum number of step reductions before the algorithm stops. Thus, if we use the notation $m \triangleq$ NumberOfStepReduction, then for the last iterations we have $\Delta_k = 1/r^{s_0 + m\, t_k}$. A common value is $m = 4$.

    5.2.2 Hooke-Jeeves Algorithm

We will now present the implementation of the Hooke-Jeeves algorithm [HJ61] with adaptive precision function evaluations using the Model GPS Algorithm 5.1.8. The modifications of Smith [Smi69], Bell and Pike [BP66], and De Vogelaere [DV68] are implemented in this algorithm.

To simplify the implementation, we assign $f^*(\epsilon, x) = \infty$ for all $x \notin X$, where $X$ is defined in (4.1).

a) Algorithm Parameters

The algorithm parameters $D$, $r$, $s_0$, and $t_k$ are defined as in the Coordinate Search algorithm (see page 21).

b) Map for Exploratory Moves

To facilitate the algorithm explanation, we use the set-valued map $E_k\colon \mathbb{R}^n \times \mathbb{Q}_+ \times \mathbb{R}^q_+ \to 2^{M_k}$, as defined in Algorithm 5.2.1. The map $E_k(\cdot, \cdot, \cdot)$ defines the exploratory moves in [HJ61], and will be used in Section c) to define the global search set map and, under conditions to be seen in Section d), the local search direction map as well.

c) Global Search Set Map

The global search set map $\gamma_k(\cdot, \cdot, \cdot)$ is defined as follows. Because $\gamma_0(\cdot, \cdot, \cdot)$ depends on $x_{-1}$, we need to introduce $x_{-1}$, which we define as $x_{-1} \triangleq x_0$.


Algorithm 5.2.3 (Global Search Set Map $\gamma_k\colon \underline{X}_k \times \underline{\Delta}_k \times \mathbb{R}^q_+ \to 2^{M_k}$)

Map:    Map for exploratory moves $E_k\colon \mathbb{R}^n \times \mathbb{Q}_+ \times \mathbb{R}^q_+ \to 2^{M_k}$.
Input:  Previous and current iterate, $x_{k-1} \in \mathbb{R}^n$ and $x_k \in \mathbb{R}^n$;
        mesh divider $\Delta_k \in \mathbb{Q}_+$;
        solver precision $\epsilon \in \mathbb{R}^q_+$.
Output: Global search set $G_k$.
Step 1: Set $x = x_k + (x_k - x_{k-1})$.
Step 2: Compute $G_k = E_k(x, \Delta_k, \epsilon)$.
Step 3: If $\min_{x \in G_k} f^*(\epsilon, x) > f^*(\epsilon, x_k)$,
          set $G_k \leftarrow G_k \cup E_k(x_k, \Delta_k, \epsilon)$.
        end if.
Step 4: Return $G_k$.

Thus, $\gamma_k(\underline{x}_k, \underline{\Delta}_k, \epsilon) = G_k$.
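The pattern move of Algorithm 5.2.3 can be sketched as follows (a simplification with exact cost evaluations; the explore callback stands for the exploratory map E_k of Algorithm 5.2.1, and all names are ours):

# Sketch of the Hooke-Jeeves global search set map of Algorithm 5.2.3.
# `explore(f, base)` stands for the exploratory map E_k of Algorithm 5.2.1
# and must return the list of trial points it evaluated.  Names are ours.
def hooke_jeeves_global_search(f, x_prev, x_curr, explore):
    # Step 1: pattern move, extrapolating the last successful step.
    x_pattern = [xc + (xc - xp) for xc, xp in zip(x_curr, x_prev)]
    # Step 2: exploratory moves around the extrapolated point.
    G = explore(f, x_pattern)
    # Step 3: if nothing beats the current iterate, also explore around it.
    if min(f(x) for x in G) > f(x_curr):
        G = G + explore(f, x_curr)
    return G

if __name__ == "__main__":
    # Trivial explore callback that probes +/- 0.5 per coordinate.
    def explore(f, base):
        pts = []
        for i in range(len(base)):
            for s in (+0.5, -0.5):
                p = list(base); p[i] += s; pts.append(p)
        return pts
    f = lambda v: (v[0] - 2.0) ** 2 + (v[1] - 1.0) ** 2
    print(hooke_jeeves_global_search(f, [0.0, 0.0], [0.5, 0.5], explore))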

    d) Local Search Direction Map

If the global search, as defined by Algorithm 5.2.3, has failed in reducing $f^*(\epsilon, \cdot)$, then Algorithm 5.2.3 has constructed a set $G_k$ that contains the set $\{x_k + \Delta_k\, D\, e_i \mid i = 1, \ldots, 2n\}$. This is because in the evaluation of $E_k(x_k, \Delta_k, \epsilon)$, defined in Algorithm 5.2.1, all "If $f^*(\epsilon, \tilde{x}) < f^*(\epsilon, x)$" statements yield false, and, hence, one has constructed $\{x_k + \Delta_k\, D\, e_i \mid i = 1, \ldots, 2n\} = E_k(x_k, \Delta_k, \epsilon)$.

Because the columns of $D$ span $\mathbb{R}^n$ positively, it follows that the search on the set $\{x_k + \Delta_k\, D\, e_i \mid i = 1, \ldots, 2n\}$ is a local search. Hence, the constructed set

$$L_k \triangleq \{ x_k + \Delta_k\, D\, e_i \mid i = 1, \ldots, 2n \} \subset G_k \qquad (5.13)$$

is a local search set. Consequently, $f^*(\epsilon, \cdot)$ has already been evaluated at all points of $L_k$ (during the construction of $G_k$) and, hence, one does not need to evaluate $f^*(\epsilon, \cdot)$ again in a local search.

e) Parameter Update

The point $x'$ in Step 3 of the GPS Model Algorithm 5.1.8 corresponds to $x' \triangleq \arg\min_{x \in G_k} f^*(\epsilon, x)$ in the Hooke-Jeeves algorithm. (Note that $L_k \subset G_k$ if a local search has been done as explained in the above paragraph.)

    f) Keywords

For the GPS implementation of the Hooke-Jeeves algorithm, the command file (see page 89) must only contain continuous parameters.

To invoke the algorithm, the Algorithm section of the GenOpt command file must have the following form:

    Algorithm{

Main = GPSHookeJeeves;
MeshSizeDivider = Integer;           // 1 < MeshSizeDivider
InitialMeshSizeExponent = Integer;   // 0 <= InitialMeshSizeExponent
MeshSizeExponentIncrement = Integer; // 0 < MeshSizeExponentIncrement
NumberOfStepReduction = Integer;     // 0 < NumberOfStepReduction
}


5.2.3 Multi-Start GPS Algorithms

Seed This value is used to initialize the random number generator.

NumberOfInitialPoint The number of initial points.

The other entries are the same as for the Coordinate Search algorithm, and are explained on page 22.

To use the GPSHookeJeeves algorithm with multiple starting points, the Algorithm section of the GenOpt command file must have the following form:

Algorithm{
Main = GPSHookeJeeves;
MultiStart = Uniform;
Seed = Integer;
NumberOfInitialPoint = Integer;      // 0 < NumberOfInitialPoint
MeshSizeDivider = Integer;           // 1 < MeshSizeDivider
InitialMeshSizeExponent = Integer;   // 0 <= InitialMeshSizeExponent
MeshSizeExponentIncrement = Integer; // 0 < MeshSizeExponentIncrement
NumberOfStepReduction = Integer;     // 0 < NumberOfStepReduction
}


    5.3 Discrete Armijo Gradient

The Discrete Armijo Gradient algorithm can be used to solve problem Pc defined in (4.2) where $f(\cdot)$ is continuously differentiable.

The Discrete Armijo Gradient algorithm approximates gradients by finite differences. It can be used for problems where the cost function is evaluated by computer code that defines a continuously differentiable function but for which obtaining analytical expressions for the gradients is impractical or impossible.

Since the Discrete Armijo Gradient algorithm is sensitive to discontinuities in the cost function, we recommend not using this algorithm if the simulation program contains adaptive solvers with loose precision settings, such as EnergyPlus [CLW+01]. On such functions, the algorithm is likely to fail. In Section 4.2, we recommend algorithms that are better suited for such situations.

We will now present the Discrete Armijo Gradient algorithm and the Armijo step-size subprocedure.

Algorithm 5.3.1 (Discrete Armijo Gradient Algorithm)

Data:   Initial iterate $x_0 \in X$;
        $\alpha, \beta \in (0, 1)$, $\gamma \in (0, \infty)$, $k^*, k_0 \in \mathbb{Z}$,
        $l_{\max}, \kappa \in \mathbb{N}$ (for resetting the step-size calculation);
        termination criteria $\epsilon_m, \epsilon_x \in \mathbb{R}_+$, $i_{\max} \in \mathbb{N}$.
Step 0: Initialize $i = 0$ and $m = 0$.
Step 1: Compute the search direction $h_i$.
        If $\beta^m < \epsilon_m$, stop.
        Else, set $\epsilon = \beta^{k_0 + m}$ and compute, for $j \in \{1, \ldots, n\}$,
        $h_i^j = -\bigl(f(x_i + \epsilon\, e_j) - f(x_i)\bigr)/\epsilon$.
Step 2: Check descent.
        Compute $\bar{\phi}(x_i; h_i) = \bigl(f(x_i + \epsilon\, h_i) - f(x_i)\bigr)/\epsilon$.
        If $\bar{\phi}(x_i; h_i) < 0$, go to Step 3.
        Else, replace $m$ by $m + 1$ and go to Step 1.
Step 3: Line search.
        Use Algorithm 5.3.2 (which requires $k^*$, $l_{\max}$ and $\kappa$) to compute $k_i$. Set

        $$\lambda_i = \arg\min_{\lambda \in \{\beta^{k_i},\, \beta^{k_i - 1}\}} f(x_i + \lambda\, h_i). \qquad (5.14)$$

Step 4: If $f(x_i + \lambda_i\, h_i) - f(x_i) > -\gamma\, \epsilon$, replace $m$ by $m + 1$ and go to Step 1.
Step 5: Set $x_{i+1} = x_i + \lambda_i\, h_i$.
        If $\|\lambda_i\, h_i\| < \epsilon_x$, stop. Else, replace $i$ by $i + 1$ and go to Step 1.


Algorithm 5.3.2 (Armijo Step-Size Subprocedure)

Data:   Iteration number $i \in \mathbb{N}$, iterate $x_i \in \mathbb{R}^n$, search direction $h_i \in \mathbb{R}^n$,
        $k^*, k_{i-1} \in \mathbb{Z}$, $\alpha, \beta \in (0, 1)$, and $\bar{\phi}(x_i; h_i) \in \mathbb{R}$ with $\bar{\phi}(x_i; h_i) < 0$;
        parameters for restart $l_{\max}, \kappa \in \mathbb{N}$.
Step 0: Initialize $l = 0$.
        If $i = 0$, set $k = k^*$, else set $k = k_{i-1}$.
Step 1: Replace $l$ by $l + 1$, and test the conditions

        $$f(x_i + \beta^k\, h_i) - f(x_i) \le \beta^k\, \alpha\, \bar{\phi}(x_i; h_i), \qquad (5.15a)$$
        $$f(x_i + \beta^{k-1}\, h_i) - f(x_i) > \beta^{k-1}\, \alpha\, \bar{\phi}(x_i; h_i). \qquad (5.15b)$$

Step 2: If $k$ satisfies (5.15a) and (5.15b), return $k$.
Step 3: If $k$ satisfies (5.15b) but not (5.15a), replace $k$ by $k + 1$;
        else, replace $k$ by $k - 1$.
        If $l < l_{\max}$ or $k_{i-1} \le k^* + \kappa$, go to Step 1. Else, go to Step 4.
Step 4: Set $K \triangleq \{ k \in \mathbb{Z} \mid k \ge k^* \}$, and compute
        $k \leftarrow \min_{k \in K} \{ k \mid f(x_i + \beta^k\, h_i) - f(x_i) \le \beta^k\, \alpha\, \bar{\phi}(x_i; h_i) \}$.
        Return $k$.

Note that in Algorithm 5.3.2, as $\beta \to 1$, the number of tries to compute the Armijo step-size is likely to go to infinity. Under appropriate assumptions one can show that $\beta = 1/2$ yields fastest convergence [Pol97].

The step-size Algorithm 5.3.2 often requires only a small number of function evaluations. However, occasionally, once a very small step-size has occurred, Algorithm 5.3.2 can trap the Discrete Armijo Gradient algorithm into using a very small step-size for all subsequent iterations. Hence, if $k_{i-1} > k^* + \kappa$, we reset the step-size by computing Step 4.
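As a plain illustration of the Armijo conditions (5.15a)-(5.15b) (ours; it omits the reset logic of Steps 3-4 and any iteration-count bookkeeping, and all names are assumptions), the following sketch searches for an integer k such that the step beta**k gives sufficient decrease while beta**(k-1) does not:

# Sketch of the Armijo step-size conditions (5.15a)/(5.15b): find an integer
# k such that beta**k satisfies the sufficient-decrease test while
# beta**(k-1) violates it.  The reset logic of Algorithm 5.3.2 is omitted.
# Names are ours.
def armijo_step(f, x, h, phi_bar, alpha=0.5, beta=0.5, k=0, k_max=60):
    fx = f(x)

    def sufficient_decrease(kk):
        lam = beta ** kk
        return f([xi + lam * hi for xi, hi in zip(x, h)]) - fx <= lam * alpha * phi_bar

    while not sufficient_decrease(k) and k < k_max:
        k += 1                       # step too long: shrink it, (5.15a) violated
    while k > -k_max and sufficient_decrease(k - 1):
        k -= 1                       # step could be longer, (5.15b) violated
    return k, beta ** k

if __name__ == "__main__":
    # Usage on a quadratic, with a finite-difference descent direction as in
    # Step 1 of Algorithm 5.3.1 and a hypothetical epsilon.
    f = lambda v: (v[0] - 1.0) ** 2 + 4.0 * v[1] ** 2
    x, eps = [0.0, 1.0], 1e-6
    h = [-(f([x[0] + eps, x[1]]) - f(x)) / eps, -(f([x[0], x[1] + eps]) - f(x)) / eps]
    phi_bar = (f([xi + eps * hi for xi, hi in zip(x, h)]) - f(x)) / eps
    print(armijo_step(f, x, h, phi_bar))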

Algorithm 5.3.1 together with the step-size Algorithm 5.3.2 has the following convergence properties [Pol97].

Theorem 5.3.3 Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable and bounded below.

1. If Algorithm 5.3.1 jams at $x_i$, cycling indefinitely in the loop defined by Steps 1-2 or in the loop defined by Steps 1-4, then $\nabla f(x_i) = 0$.

2. If $\{x_i\}_{i=0}^{\infty}$ is an infinite sequence constructed by Algorithm 5.3.1 and Algorithm 5.3.2 in solving (4.2), then every accumulation point $\hat{x}$ of $\{x_i\}_{i=0}^{\infty}$ satisfies $\nabla f(\hat{x}) = 0$.

Note that $h_i$ has the same units as the cost function, and the algorithm evaluates $x_i + \lambda\, h_i$ for some $\lambda \in \mathbb{R}_+$. Thus, the algorithm is sensitive to the


scaling of the problem variables, a rather undesirable effect. Therefore, in the implementation of Algorithm 5.3.1 and Algorithm 5.3.2, we normalize the cost function values by replacing, for all $x \in \mathbb{R}^n$, $f(x)$ by $f(x)/f(x_0)$, where $x_0$ is the initial iterate. Furthermore, we set $x_0 = 0$ and evaluate the cost function at the values $v^j + x^j\, s^j$, $j \in \{1, \ldots, n\}$, where $x^j \in \mathbb{R}$ is the $j$-th component of the design parameter computed in Algorithm 5.3.1 or Algorithm 5.3.2, and $v^j \in \mathbb{R}$ and $s^j \in \mathbb{R}$ are the settings of the parameters Ini and Step, respectively, for the $j$-th design parameter in the optimization command file (see page 89).

In view of the sensitivity of the Discrete Armijo Gradient algorithm to the scaling of the problem variables and the cost function values, the implementation of penalty and barrier functions may cause numerical problems if the penalty is large compared to the unpenalized cost function value.

If box-constraints for the independent parameters are specified, then the transformations (8.2) are used.

    5.3.1 Keywords

For the Discrete Armijo Gradient algorithm, the command file (see page 89) must only contain continuous parameters.

To invoke the algorithm, the Algorithm section of the GenOpt command file must have the following form:

    Algorithm{

    Main = DiscreteArmijoGradient;

    Alpha = Double; // 0 < Alpha < 1

    Beta = Double; // 0 < Beta < 1

    Gamma = Double; // 0 < Gamma

K0 = Integer;
KStar = Integer;
LMax = Integer;      // 0 <= LMax
Kappa = Integer;     // 0 <= Kappa
EpsilonM = Double;
EpsilonX = Double;
}


KStar The variable $k^*$ used to initialize the line search.

LMax The variable $l_{\max}$ used in Step 3 of Algorithm 5.3.2 to determine whether the line search needs to be reinitialized.

Kappa The variable $\kappa$ used in Step 3 of Algorithm 5.3.2 to determine whether the line search needs to be reinitialized.

EpsilonM The variable $\epsilon_m$ used in the termination criterion $\beta^m < \epsilon_m$ in Step 1 of Algorithm 5.3.1.

EpsilonX The variable $\epsilon_x$ used in the termination criterion $\|\lambda_i\, h_i\| < \epsilon_x$ in Step 5 of Algorithm 5.3.1.


    5.4 Particle Swarm Optimization

    Particle Swarm Optimization (PSO) algorithms are population-based prob-abilistic optimization algorithms first proposed by Kennedy and Eberhart [EK95,KE95] to solve problem Pc defined in (4.2) with possibly discontinuous costfunction f: Rn R. In Section 5.4.2, we will present a PSO algorithm fordiscrete independent variables to solve problem Pd defined in (4.4), and inSection 5.4.3 we will present a PSO algorithm for continuous and discrete in-dependent variables to solve problemPcddefined in (4.6). To avoid ambiguousnotation, we always denote the dimension of the continuous independent vari-able by nc N and the dimension of the discrete independent variable bynd N.

PSO algorithms exploit a set of potential solutions to the optimization problem. Each potential solution is called a particle, and the set of potential solutions in each iteration step is called a population. PSO algorithms are global optimization algorithms and do not require nor approximate gradients of the cost function. The first population is typically initialized using a random number generator to spread the particles uniformly in a user-defined hypercube. A particle update equation, which is modeled on the social behavior of members of bird flocks or fish schools, determines the location of each particle in the next generation.

A survey of PSO algorithms can be found in Eberhart and Shi [ES01]. Laskari et al. present a PSO algorithm for minimax problems [LPV02b] and for integer programming [LPV02a]. In [PV02a], Parsopoulos and Vrahatis discuss the implementation of inequality and equality constraints to solve problem Pcg defined in (4.3).

We first discuss the case where the independent variable is continuous, i.e., the case of problem Pc defined in (4.2).

    5.4.1 PSO for Continuous Variables

We will first present the initial version of the PSO algorithm, which is the easiest to understand.

In the initial version of the PSO algorithm [EK95, KE95], the update equation for the particle location is as follows: Let k ∈ N denote the generation number, let nP ∈ N denote the number of particles in each generation, let x_i(k) ∈ R^nc, i ∈ {1, ..., nP}, denote the i-th particle of the k-th generation, let v_i(k) ∈ R^nc denote its velocity, let c1, c2 ∈ R+ and let ρ1(k), ρ2(k) ~ U(0, 1) be uniformly distributed random numbers between 0 and 1. Then, the update equation is, for all i ∈ {1, ..., nP} and all k ∈ N,

    v_i(k+1) = v_i(k) + c1 ρ1(k) (p_{l,i}(k) − x_i(k)) + c2 ρ2(k) (p_{g,i}(k) − x_i(k)),   (5.16a)
    x_i(k+1) = x_i(k) + v_i(k+1),                                                          (5.16b)


where v_i(0) ≜ 0 and

    p_{l,i}(k) ∈ arg min_{x ∈ {x_i(j)}_{j=0}^{k}} f(x),                  (5.17a)
    p_{g,i}(k) ∈ arg min_{x ∈ {{x_i(j)}_{j=0}^{k}}_{i=1}^{nP}} f(x).     (5.17b)

Thus, p_{l,i}(k) is the location that for the i-th particle yields the lowest cost over all generations, and p_{g,i}(k) is the location of the best particle over all generations. The term c1 ρ1(k) (p_{l,i}(k) − x_i(k)) is associated with cognition since it takes into account the particle's own experience, and the term c2 ρ2(k) (p_{g,i}(k) − x_i(k)) is associated with social interaction between the particles. In view of this similarity, c1 is called cognitive acceleration constant and c2 is called social acceleration constant.

    a) Neighborhood Topology

The minimum in (5.17b) need not be taken over all points in the population. The set of points over which the minimum is taken is defined by the neighborhood topology. In PSO, the neighborhood topologies are usually defined using the particle index, and not the particle location. We will use the lbest, gbest, and the von Neumann neighborhood topology, which we will now define.

In the lbest topology of size l ∈ N, with l > 1, the neighborhood of a particle with index i ∈ {1, ..., nP} consists of all particles whose index is in the set

    N_i ≜ {i − l, ..., i, ..., i + l},   (5.18a)

where we assume that the indices wrap around, i.e., we replace −1 by nP − 1, replace −2 by nP − 2, etc.

In the gbest topology, the neighborhood contains all points of the population, i.e.,

    N_i ≜ {1, ..., nP},   (5.18b)

for all i ∈ {1, ..., nP}.

For the von Neumann topology, consider a 2-dimensional lattice, with the lattice points enumerated as shown in Figure 5.1. We will use the von Neumann topology of range 1, which is defined, for i, j ∈ Z, as the set of points whose indices belong to the set

    N^v_{(i,j)} ≜ { (k, l) | |k − i| + |l − j| ≤ 1,  k, l ∈ Z }.   (5.18c)

The gray points in Figure 5.1 are N^v_{(1,2)}. For simplicity, we round in GenOpt the user-specified number of particles nP ∈ N to the next biggest integer ñP such that √ñP ∈ N and ñP ≥ nP.² Then, we can wrap the indices by replacing, for k ∈ Z, (0, k) by (√ñP, k), (√ñP + 1, k) by (1, k), and similarly by replacing (k, 0) by (k, √ñP) and (k, √ñP + 1) by (k, 1). Then, a particle with indices (k, l), with 1 ≤ k ≤ √ñP and 1 ≤ l ≤ √ñP, has in the PSO algorithm the index i = (k − 1) √ñP + l, and hence i ∈ {1, ..., ñP}.

[Figure 5.1: Section of a 2-dimensional lattice of particles with √ñP = 3. The particles belonging to the von Neumann neighborhood N^v_{(1,2)} with range 1, defined in (5.18c), are colored gray. Indicated by dashes are the particles that are generated by wrapping the indices.]
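The following sketch (illustrative only; not taken from GenOpt) computes the von Neumann neighborhood of range 1 with index wrapping on a square lattice, and maps lattice coordinates to particle indices as described above:

    def von_neumann_neighborhood(k, l, side):
        """Particle indices of the von Neumann neighbors of lattice point (k, l), cf. (5.18c).

        Lattice coordinates run from 1 to side (side plays the role of sqrt(ñP));
        the particle index is i = (k - 1) * side + l.
        """
        def wrap(m):
            return ((m - 1) % side) + 1   # maps 0 -> side and side + 1 -> 1

        offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # |dk| + |dl| <= 1
        return sorted({(wrap(k + dk) - 1) * side + wrap(l + dl) for dk, dl in offsets})

    # Example: on a 3 x 3 lattice, the neighbors of lattice point (1, 2)
    print(von_neumann_neighborhood(1, 2, 3))   # -> [1, 2, 3, 5, 8]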

Kennedy and Mendes [KM02] show that greater connectivity of the particles speeds up convergence, but it does not tend to improve the population's ability to discover the global optimum. Best performance has been achieved with the von Neumann topology, whereas neither the gbest nor the lbest topology seemed especially good in comparison with other topologies.

Carlisle and Dozier [CD01] achieve better results with the gbest topology than with the lbest topology on unimodal and multi-modal functions.

    b) Model PSO Algorithm

We will now present the Model PSO Algorithm that is implemented in GenOpt.

² In principle, the lattice need not be a square, but we do not see any computational disadvantage of selecting a square lattice.


    Algorithm 5.4.1 (Model PSO Algorithm for Continuous Variables)

Data:    Constraint set X, as defined in (4.1), but with finite lower and upper bound for each independent variable.
         Initial iterate x0 ∈ X.
         Number of particles nP ∈ N and number of generations nG ∈ N.
Step 0:  Initialize k = 0, x1(0) = x0 and the neighborhoods {N_i}_{i=1}^{nP}.
Step 1:  Initialize {x_i(0)}_{i=2}^{nP} ⊂ X, randomly distributed.
Step 2:  For i ∈ {1, ..., nP}, determine the local best particles

             p_{l,i}(k) ∈ arg min_{x ∈ {x_i(m)}_{m=0}^{k}} f(x)                 (5.19a)

         and the global best particle

             p_{g,i}(k) ∈ arg min_{x ∈ {x_j(m) | j ∈ N_i}_{m=0}^{k}} f(x).      (5.19b)

Step 3:  Update the particle locations {x_i(k+1)}_{i=1}^{nP} ⊂ X.
Step 4:  If k = nG, stop. Else, go to Step 5.
Step 5:  Replace k by k + 1, and go to Step 2.

We will now discuss the different implementations of the Model PSO Algorithm 5.4.1 in GenOpt.
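A compact sketch of the control flow of Model Algorithm 5.4.1 is shown below (illustrative only; the particle update in Step 3 is left abstract because GenOpt implements the several variants discussed in the following paragraphs, and constraint handling is omitted for brevity):

    import numpy as np

    def model_pso(f, x0, lower, upper, n_particles, n_generations, neighborhoods,
                  update_particles, rng=np.random.default_rng()):
        """Skeleton of Model PSO Algorithm 5.4.1; update_particles is a hypothetical
        callback standing for any of the update equations (5.16), (5.20) or (5.21)."""
        nc = len(x0)
        x = np.empty((n_particles, nc))
        x[0] = x0                                             # Step 0: first particle is the initial iterate
        x[1:] = rng.uniform(lower, upper, size=(n_particles - 1, nc))   # Step 1: random initialization
        v = np.zeros_like(x)

        p_local = x.copy()                                    # best point seen by each particle
        f_local = np.array([f(xi) for xi in x])

        for k in range(n_generations):                        # Steps 2 to 5
            # Step 2: neighborhood ("global") best particle for each particle
            p_global = np.array([p_local[min(neighborhoods[i], key=lambda j: f_local[j])]
                                 for i in range(n_particles)])
            # Step 3: update particle locations (variant-specific)
            x, v = update_particles(x, v, p_local, p_global, k)
            f_x = np.array([f(xi) for xi in x])
            better = f_x < f_local
            p_local[better], f_local[better] = x[better], f_x[better]

        return p_local[np.argmin(f_local)]                    # best point found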

    c) Particle Update Equation

(i) Version with Inertia Weight   Eberhart and Shi [SE98, SE99] introduced an inertia weight w(k) which improves the performance of the original PSO algorithm. In the version with inertia weight, the particle update equation is, for all i ∈ {1, ..., nP}, for k ∈ N and x_i(k) ∈ R^nc, with v_i(0) = 0,

    v̂_i(k+1) = w(k) v_i(k) + c1 ρ1(k) (p_{l,i}(k) − x_i(k)) + c2 ρ2(k) (p_{g,i}(k) − x_i(k)),   (5.20a)
    v_i^j(k+1) = sign(v̂_i^j(k+1)) min{ |v̂_i^j(k+1)|, v_max^j },   j ∈ {1, ..., nc},              (5.20b)
    x_i(k+1) = x_i(k) + v_i(k+1),                                                                 (5.20c)

where

    v_max^j ≜ λ (u^j − l^j),   (5.20d)

with λ ∈ R+, for all j ∈ {1, ..., nc}, and l, u ∈ R^nc are the lower and upper bound of the independent variable. A common value is λ = 1/2. In GenOpt, if λ ≤ 0, then no velocity clamping is used, and hence, v_i^j(k+1) = v̂_i^j(k+1), for all k ∈ N, all i ∈ {1, ..., nP} and all j ∈ {1, ..., nc}.

We compute the inertia weight as

    w(k) = w0 − (k/K) (w0 − w1),   (5.20e)


where w0 ∈ R is the initial inertia weight, w1 ∈ R is the inertia weight for the last generation, with 0 ≤ w1 ≤ w0, and K ∈ N is the maximum number of generations. w0 = 1.2 and w1 = 0 can be considered as good choices [PV02b].
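A minimal sketch of the inertia-weight update (5.20), assuming velocity clamping with a gain lam > 0 (names are illustrative, not GenOpt identifiers):

    import numpy as np

    def inertia_weight(k, k_max, w0=1.2, w1=0.0):
        """Linearly decreasing inertia weight, equation (5.20e)."""
        return w0 - (k / k_max) * (w0 - w1)

    def update_inertia(x, v, p_local, p_global, k, k_max, lower, upper,
                       c1=2.0, c2=2.0, lam=0.5, rng=np.random.default_rng()):
        """Particle update with inertia weight and velocity clamping, equations (5.20a)-(5.20d)."""
        w = inertia_weight(k, k_max)
        rho1, rho2 = rng.uniform(), rng.uniform()
        v_hat = w * v + c1 * rho1 * (p_local - x) + c2 * rho2 * (p_global - x)   # (5.20a)
        v_max = lam * (np.asarray(upper) - np.asarray(lower))                    # (5.20d)
        v_new = np.sign(v_hat) * np.minimum(np.abs(v_hat), v_max)                # (5.20b)
        return x + v_new, v_new                                                  # (5.20c)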

(ii) Version with Constriction Coefficient   Clerc and Kennedy [CK02] introduced a version with a constriction coefficient that reduces the velocity. In their Type 1 implementation, the particle update equation is, for all i ∈ {1, ..., nP}, for k ∈ N and x_i(k) ∈ R^nc, with v_i(0) = 0,

    v̂_i(k+1) = χ(κ, φ) ( v_i(k) + c1 ρ1(k) (p_{l,i}(k) − x_i(k)) + c2 ρ2(k) (p_{g,i}(k) − x_i(k)) ),   (5.21a)
    v_i^j(k+1) = sign(v̂_i^j(k+1)) min{ |v̂_i^j(k+1)|, v_max^j },   j ∈ {1, ..., nc},                     (5.21b)
    x_i(k+1) = x_i(k) + v_i(k+1),                                                                        (5.21c)

where

    v_max^j ≜ λ (u^j − l^j)   (5.21d)

is as in (5.20d). In (5.21a), χ(κ, φ) is called constriction coefficient, defined as

    χ(κ, φ) ≜ { 2κ / | 2 − φ − √(φ² − 4φ) |,  if φ > 4,
                √κ,                           otherwise,   (5.21e)

where φ ≜ c1 + c2 and κ ∈ (0, 1] control how fast the population collapses into a point. If κ = 1, the space is thoroughly searched, which yields slower convergence.
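A one-line realization of the constriction coefficient (5.21e), shown here as an illustrative sketch:

    import math

    def constriction_coefficient(kappa, phi):
        """Constriction coefficient chi(kappa, phi) of equation (5.21e), with 0 < kappa <= 1."""
        if phi > 4.0:
            return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
        return math.sqrt(kappa)

    # Example with the settings of Clerc and Kennedy for velocity clamping (phi = 4.1):
    print(constriction_coefficient(kappa=1.0, phi=4.1))   # approximately 0.729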

Equation (5.21) can be used with or without velocity clamping (5.21b). If velocity clamping (5.21b) is used, Clerc and Kennedy use φ = 4.1, otherwise they use φ = 4. In either case, they set c1 = c2 = φ/2 and a population size of nP = 20.

Carlisle and Dozier [CD01] recommend the settings nP = 30, no velocity clamping, κ = 1, c1 = 2.8 and c2 = 1.3.

Kennedy and Eberhart [KES01] report that using velocity clamping (5.21b) and a constriction coefficient shows faster convergence for some test problems compared to using an inertia weight, but the algorithm tends to get stuck in local minima.

    5.4.2 PSO for Discrete Variables

Kennedy and Eberhart [KE97] introduced a binary version of the PSO algorithm to solve problem Pd defined in (4.4).

The binary PSO algorithm encodes the discrete independent variables in a string of binary numbers and then operates with this binary string. For some i ∈ {1, ..., nd}, let x_i ∈ N be the component of a discrete independent variable, and let ψ_i ∈ {0, 1}^{m_i} be its binary representation (with m_i ∈ N+ bits), obtained using Gray encoding [PFTV93], and let ψ_{l,i}(k) and ψ_{g,i}(k) be the binary representations of p_{l,i}(k) and p_{g,i}(k), respectively, where p_{l,i}(k) and p_{g,i}(k) are defined in (5.19).

Then, for i ∈ {1, ..., nd} and j ∈ {1, ..., m_i} we initialize randomly ψ_i^j(0) ∈ {0, 1}, and compute, for k ∈ N,

    v̂_i^j(k+1) = v_i^j(k) + c1 ρ1(k) (ψ_{l,i}^j(k) − ψ_i^j(k)) + c2 ρ2(k) (ψ_{g,i}^j(k) − ψ_i^j(k)),   (5.22a)
    v_i^j(k+1) = sign(v̂_i^j(k+1)) min{ |v̂_i^j(k+1)|, v_max },                                          (5.22b)
    ψ_i^j(k+1) = { 0, if ρ_{i,j}(k) ≥ s(v_i^j(k+1)),
                   1, otherwise,                                                                         (5.22c)

where

    s(v) ≜ 1 / (1 + e^{−v})   (5.22d)

is the sigmoid function shown in Fig. 5.2 and ρ_{i,j}(k) ~ U(0, 1), for all i ∈ {1, ..., nd} and for all j ∈ {1, ..., m_i}.

[Figure 5.2: Sigmoid function s(v) = 1/(1 + e^{−v}).]

In (5.22b), v_max ∈ R+ is often set to 4 to prevent a saturation of the sigmoid function, and c1, c2 ∈ R+ are often such that c1 + c2 = 4 (see [KES01]).

Notice that s(v) → 0.5, as v → 0, and consequently the probability of flipping a bit goes to 0.5. Thus, in the binary PSO, a small v_max causes a large exploration, whereas in the continuous PSO, a small v_max causes a small exploration of the search space.
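An illustrative sketch of the sigmoid-based bit update (5.22a)-(5.22d); the Gray encoding step is omitted and all names are hypothetical:

    import numpy as np

    def sigmoid(v):
        """Sigmoid function s(v) of equation (5.22d)."""
        return 1.0 / (1.0 + np.exp(-v))

    def update_bits(psi, v, psi_local, psi_global, c1=2.0, c2=2.0, v_max=4.0,
                    rng=np.random.default_rng()):
        """One binary PSO update of a bit string psi, following (5.22a)-(5.22c)."""
        rho1, rho2 = rng.uniform(), rng.uniform()
        v_hat = v + c1 * rho1 * (psi_local - psi) + c2 * rho2 * (psi_global - psi)   # (5.22a)
        v_new = np.sign(v_hat) * np.minimum(np.abs(v_hat), v_max)                    # (5.22b)
        rho = rng.uniform(size=psi.shape)
        psi_new = np.where(rho >= sigmoid(v_new), 0, 1)                              # (5.22c)
        return psi_new, v_new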

Any of the above neighborhood topologies can be used, and Model Algorithm 5.4.1 applies if we replace the constraint set X by the user-specified set X_d ⊂ Z^nd.

    5.4.3 PSO for Continuous and Discrete Variables

For problem Pcd defined in (4.6), we treat the continuous independent variables as in (5.20) or (5.21), and the discrete independent variables as in (5.22). Any of the above neighborhood topologies can be used, and Model Algorithm 5.4.1 applies if we define the constraint set X as in (4.5).

    5.4.4 PSO on a Mesh

We now present a modification to the previously discussed PSO algorithms. For evaluating the cost function, we will modify the continuous independent variables such that they belong to a fixed mesh in R^nc. Since the iterates of PSO algorithms typically cluster during the last iterations, this reduces in many cases the number of simulation calls during the optimization. The modification is done by replacing the cost function f: R^nc × Z^nd → R in Model Algorithm 5.4.1 as follows: Let x0 ≜ (x_{c,0}, x_{d,0}) ∈ R^nc × Z^nd denote the initial iterate, let X_c be the feasible set for the continuous independent variables defined in (4.5b), let r, s ∈ N, with r > 1, be user-specified parameters, let

    Δ ≜ 1 / r^s   (5.23)

and let the mesh be defined as

    M(x_{c,0}, Δ, s) ≜ { x_{c,0} + Δ Σ_{i=1}^{nc} m_i s_i e_i | m ∈ Z^nc },   (5.24)

where s ∈ R^nc is equal to the value defined by the variable Step in GenOpt's command file (see page 89). Then, we replace f(·, ·) by f̂ : R^nc × Z^nd × R^nc × R × R^nc → R, defined by

    f̂(x_c, x_d; x_{c,0}, Δ, s) ≜ f(γ(x_c), x_d),   (5.25)

where γ : R^nc → R^nc is the projection of the continuous independent variable to the closest feasible mesh point, i.e., γ(x_c) ∈ M(x_{c,0}, Δ, s) ∩ X_c. Thus, for evaluating the cost function, the continuous independent variables are replaced by the closest feasible mesh point, and the discrete independent variables remain unchanged.

Good numerical results have been obtained by selecting s ∈ R^nc and r, s ∈ N such that about 50 to 100 mesh points are located along each coordinate direction.
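The following sketch (hypothetical names; feasibility with respect to X_c is not enforced for brevity) illustrates the mesh of (5.23)-(5.25): the continuous variables are snapped to the nearest mesh point before the simulation is called, which allows previously computed results to be reused once the particles cluster.

    import numpy as np

    def snap_to_mesh(xc, xc0, step, r=2, s=4):
        """Project xc onto the mesh M(xc0, Delta, s) of (5.24), with Delta = 1 / r**s."""
        delta = 1.0 / r ** s                              # equation (5.23)
        spacing = delta * np.asarray(step, dtype=float)   # mesh spacing per coordinate
        m = np.round((np.asarray(xc) - np.asarray(xc0)) / spacing)
        return np.asarray(xc0) + m * spacing

    def meshed_cost(f, xc0, step, r=2, s=4):
        """Return a cost function that evaluates f at the closest mesh point, as in (5.25)."""
        cache = {}                                        # reuse results for repeated mesh points
        def f_hat(xc, xd):
            key = (tuple(snap_to_mesh(xc, xc0, step, r, s)), tuple(xd))
            if key not in cache:
                cache[key] = f(np.array(key[0]), np.array(key[1]))
            return cache[key]
        return f_hat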

    5.4.5 Population Size and Number of Generations

Parsopoulos and Vrahatis [PV02b] use for x ∈ R^nc a population size of about 5 n up to n = 15. For n ≈ 10 ... 20, they use nP ≈ 10 n. They set the number of generations to nG = 1000 up to n = 20 and to nG = 2000 for n = 30.

Van den Bergh and Engelbrecht [vdBE01] recommend using more than 20 particles and 2000 to 5000 generations.

Kennedy and Eberhart [KES01] use, for test cases with the lbest neighborhood topology of size l = 2 and n = 2 and n = 30, a population size of nP = 20 ... 30. They report that 10 ... 50 particles usually work well. As a rule of thumb, they recommend for the lbest neighborhood to select the neighborhood size such that each neighborhood consists of 10 ... 20% of the population.

    5.4.6 Keywords

For the Particle Swarm algorithm, the command file (see page 89) can contain continuous and discrete independent variables.

The different specifications for the Algorithm section of the GenOpt command file are as follows:

    PSO algorithm with inertia weight:

Algorithm{
  Main = PSOIW;
  NeighborhoodTopology = gbest | lbest | vonNeumann;
  NeighborhoodSize = Integer; // 0 < NeighborhoodSize
  NumberOfParticle = Integer;
  NumberOfGeneration = Integer;
  Seed = Integer;
  CognitiveAcceleration = Double; // 0 < CognitiveAcceleration
  SocialAcceleration = Double; // 0 < SocialAcceleration
  MaxVelocityGainContinuous = Double;
  MaxVelocityDiscrete = Double; // 0 < MaxVelocityDiscrete
  InitialInertiaWeight = Double; // 0 < InitialInertiaWeight
  FinalInertiaWeight = Double; // 0 < FinalInertiaWeight
}

    PSO algorithm with constriction coefficient:

Algorithm{
  Main = PSOCC;
  NeighborhoodTopology = gbest | lbest | vonNeumann;
  NeighborhoodSize = Integer; // 0 < NeighborhoodSize
  NumberOfParticle = Integer;
  NumberOfGeneration = Integer;
  Seed = Integer;
  CognitiveAcceleration = Double; // 0 < CognitiveAcceleration
  SocialAcceleration = Double; // 0 < SocialAcceleration
  MaxVelocityGainContinuous = Double;
  MaxVelocityDiscrete = Double; // 0 < MaxVelocityDiscrete
  ConstrictionGain = Double; // 0 < ConstrictionGain
}


PSO algorithm on a mesh:

Algorithm{
  Main = PSOCCMesh;
  NeighborhoodTopology = gbest | lbest | vonNeumann;
  NeighborhoodSize = Integer; // 0 < NeighborhoodSize
  NumberOfParticle = Integer;
  NumberOfGeneration = Integer;
  Seed = Integer;
  CognitiveAcceleration = Double; // 0 < CognitiveAcceleration
  SocialAcceleration = Double; // 0 < SocialAcceleration
  MaxVelocityGainContinuous = Double;
  MaxVelocityDiscrete = Double; // 0 < MaxVelocityDiscrete
  ConstrictionGain = Double; // 0 < ConstrictionGain
  MeshSizeDivider = Integer; // 1 < MeshSizeDivider
  InitialMeshSizeExponent = Integer; // 0 <= InitialMeshSizeExponent
}


For the PSOCC implementation, the following additional entry must be specified:

ConstrictionGain   This is equal to κ ∈ (0, 1] in (5.21e).

Notice that for discrete independent variables, the entries of InitialInertiaWeight, FinalInertiaWeight, and ConstrictionGain are ignored.

For the PSOCCMesh implementation, the following additional entries must be specified:

MeshSizeDivider   This is equal to r ∈ N, with r > 1, used in (5.23).

InitialMeshSizeExponent   This is equal to s ∈ N used in (5.23).


5.5 Hybrid Generalized Pattern Search Algorithm with Particle Swarm Optimization Algorithm

This hybrid global optimization algorithm can be used to solve problem Pc defined in (4.2) and problem Pcd defined in (4.6). Problem Pcg defined in (4.3) and problem Pcdg defined in (4.7) can be solved if the constraint functions g(·) are implemented as described in Section 8.2.

This hybrid global optimization algorithm starts by doing a Particle Swarm Optimization (PSO) on a mesh, as described in Section 5.4.4, for a user-specified number of generations nG ∈ N. Afterwards, it initializes the Hooke-Jeeves Generalized Pattern Search (GPS) algorithm, described in Section 5.2.2, using the continuous independent variables of the particle with the lowest cost function value. If the optimization problem has continuous and discrete independent variables, then the discrete independent variables will, for the GPS algorithm, be fixed at the value of the particle with the lowest cost function value.

We will now explain the hybrid algorithm for the case where all independent variables are continuous, and then for the case with mixed continuous and discrete independent variables. Throughout this section, we will denote the dimension of the continuous independent variables by nc ∈ N and the dimension of the discrete independent variables by nd ∈ N.

    5.5.1 Hybrid Algorithm for Continuous Variables

We will now discuss the hybrid algorithm to solve problem Pc defined in (4.2). However, we require the constraint set X ⊂ R^nc defined in (4.1) to have finite lower and upper bounds l_i, u_i ∈ R, for all i ∈ {1, ..., nc}.

First, we run the PSO algorithm 5.4.1, with user-specified initial iterate x0 ∈ X, for a user-specified number of generations nG ∈ N on the mesh defined in (5.24). Afterwards, we run the GPS algorithm 5.1.8 where the initial iterate x0 is equal to the location of the particle with the lowest cost function value, i.e.,

    x0 ≜ p ∈ arg min_{x ∈ {x_j(k) | j ∈ {1,...,nP}, k ∈ {1,...,nG}}} f(x),   (5.26)

where nP ∈ N denotes the number of particles and x_j(k), j ∈ {1, ..., nP}, k ∈ {1, ..., nG}, are as in Algorithm 5.4.1.

Since the PSO algorithm terminates after a finite number of iterations, all convergence results of the GPS algorithm hold. In particular, if the cost function is once continuously differentiable, then the hybrid algorithm constructs accumulation points that are feasible stationary points of problem (4.2) (see Theorem 5.1.13).
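A schematic sketch of the hybrid strategy (illustrative Python pseudocode; pso_on_mesh and hooke_jeeves_gps are hypothetical stand-ins for the algorithms of Sections 5.4.4 and 5.2.2):

    # Hybrid PSO / GPS strategy: a global PSO search on a mesh, followed by a local
    # Hooke-Jeeves GPS refinement started from the best particle found, cf. (5.26).
    def hybrid_pso_gps(f, x0, lower, upper, n_generations, pso_on_mesh, hooke_jeeves_gps):
        # Phase 1: particle swarm optimization on the mesh for n_generations generations.
        all_particles = pso_on_mesh(f, x0, lower, upper, n_generations)  # list of (x, f(x)) pairs

        # Phase 2: start the GPS algorithm from the particle with the lowest cost, equation (5.26).
        x_best, _ = min(all_particles, key=lambda p: p[1])
        return hooke_jeeves_gps(f, x_best, lower, upper)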


  CognitiveAcceleration = Double; // 0 < CognitiveAcceleration
  SocialAcceleration = Double; // 0 < SocialAcceleration
  MaxVelocityGainContinuous = Double;
  MaxVelocityDiscrete = Double; // 0 < MaxVelocityDiscrete
  ConstrictionGain = Double; // 0 < ConstrictionGain


    5.6 Hooke-Jeeves

This algorithm is implemented for compatibility with previous GenOpt versions and is no longer supported. We recommend using the implementation of the Hooke-Jeeves algorithm described in Section 5.2.2 on page 23.

The Hooke-Jeeves pattern search algorithm [HJ61] is a derivative-free optimization algorithm that can be used to solve problem Pc defined in (4.2) for n > 1. Problem Pcg defined in (4.3) can be solved by implementing constraints on the dependent parameters as described in Section 8.

For problem (4.2), if the cost function is continuously differentiable and has bounded level sets, then the Hooke-Jeeves algorithm converges to a point x* ∈ R^n that satisfies ∇f(x*) = 0 (see [Tor97, AD03, TGKT03]).

Hooke and Jeeves found empirically that the number of function evaluations increases only linearly with the number of independent variables [HJ61].

    5.6.1 Modifications to the Original Algorithm

Now, we explain modifications to the original algorithm of [HJ61] which are implemented in GenOpt.

Smith [Smi69] reports that applying the same step size for each variable causes some parameters to be essentially ignored during much of the search process. Therefore, Smith proposes to initialize the step size for each variable by

    Δx_i = λ |x_i^0|,   (5.27)

where λ > 0 is a fraction of the initial step length and x^0 ∈ R^n is the initial iterate. In GenOpt's implementation, Δx_i is set equal to the value of the parameter Step, which is specified in the command file (see page 89). This allows taking the scaling of the components of the independent parameter into account.

In [HJ61], the search of the exploration move is always done first in the positive, then in the negative direction along the coordinate vectors, e_i ∈ R^n, i ∈ {1, ..., n}. Bell and Pike [BP66] proposed searching first in the direction that led in the last exploration move to a reduction of the cost function. This increases the probability of reducing the cost function already with the first exploration move, and thus allows skipping the second trial.

De Vogelaere [DV68] proposed changing the algorithm such that the maximum number of function evaluations cannot be exceeded, which can be the case in the original implementation.

    All three modifications are implemented.


To implement the box constraints of problem Pc and Pcg, defined in (4.2) and (4.3), respectively, we assign f(x) = ∞ for all x ∉ X, where X is defined in (4.1).

    5.6.2 Algorithm Description

Hooke and Jeeves divide the algorithm into an initial exploration (I), a basic iteration (II), and a step size reduction (III). (I) and (II) make use of so-called exploratory moves to get local information about the direction in which the cost function decreases.

The exploratory moves are executed as follows (see Fig. 5.3): Let Δx_i ∈ R be the step size of the i-th independent parameter, and e_i ∈ R^n the unit vector along the i-th coordinate direction. Assume we are given a base point, called the resulting base point x_r, and its function value, say f_p ≜ f(x_r). Then we make a sequence of orthogonal exploratory moves. To do so, we set i = 0 and assign

    x_r ← x_r + Δx_i e_i.   (5.28)

Provided that x_r is feasible, that is x_r ∈ X, we evaluate the cost function and assign f_r ← f(x_r). If f_r < f_p, then the new point becomes the resulting base point, and we assign

    f_p ← f_r.   (5.29)

Otherwise, we assign

    Δx_i ← −Δx_i,              (5.30)
    x_r ← x_r + 2 Δx_i e_i,    (5.31)

evaluate f(x_r) and assign f_r ← f(x_r). If this exploration reduced the cost function, we apply (5.29). Otherwise, we reset the resulting base point by assigning

    x_r ← x_r − Δx_i e_i,   (5.32)

so that the resulting base point has not been altered by the exploration in the direction along e_i. Therefore, if any of the exploration moves have been successful, we have a new resulting base point x_r and a new function value f_p = f(x_r). Using the (possibly new) resulting base point x_r, the same procedure is repeated along the next coordinate direction (i.e., along e_{i+1}) until an exploration along all coordinate vectors e_i, i ∈ {1, ..., n}, has been done. Note that according to (5.30) and (5.31), Δx_i has in the next exploration move along e_i the sign that led in the last exploration to a reduction of the cost function (if any reduction was achieved).

At the end of the n exploratory moves, we have a new resulting base point x_r if and only if at least one of the exploratory moves led to a reduction of the cost function.
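An illustrative sketch of one sweep of exploratory moves as described above (not the GenOpt implementation; feasibility handling is reduced to a simple box check):

    import numpy as np

    def exploratory_moves(f, x_r, f_p, dx, lower, upper):
        """One sweep of Hooke-Jeeves exploratory moves, cf. (5.28)-(5.32).

        x_r : current resulting base point, modified coordinate by coordinate
        f_p : cost at x_r
        dx  : per-coordinate step sizes; their signs are kept between sweeps
        """
        def feasible(x):
            return np.all(x >= lower) and np.all(x <= upper)

        x_r = np.array(x_r, dtype=float)
        for i in range(len(x_r)):
            x_r[i] += dx[i]                                   # (5.28) trial in the stored direction
            if feasible(x_r) and (f_r := f(x_r)) < f_p:
                f_p = f_r                                     # (5.29) success: keep the point
                continue
            dx[i] = -dx[i]                                    # (5.30) flip direction
            x_r[i] += 2 * dx[i]                               # (5.31) trial in the other direction
            if feasible(x_r) and (f_r := f(x_r)) < f_p:
                f_p = f_r                                     # success in the other direction
            else:
                x_r[i] -= dx[i]                               # (5.32) reset this coordinate
        return x_r, f_p, dx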

    (I) Initial Iteration


[Figure 5.3: Flow chart of the exploratory moves. Given x_r and f_p = f(x_r), each coordinate direction i = 1, ..., n is checked first by the trial x_r ← x_r + Δx_i e_i, f_r ← f(x_r); on failure, the other direction is checked (Δx_i ← −Δx_i, x_r ← x_r + 2 Δx_i e_i, f_r ← f(x_r)); on success, f_p ← f_r is assigned, otherwise the coordinate is reset (x_r ← x_r − Δx_i e_i).]


[Flow chart of the Hooke-Jeeves algorithm (start): set m = 0, evaluate the function at the initial base point (f_c ← f(x_c)), assign the base point (f_p ← f_c, x_r ← x_c), and perform the exploratory moves (II).]