
The parallel tunneling method

Susana Gómez a,*, Nelson del Castillo a, Longina Castellanos b, Julio Solano a

a IIMAS, UNAM, Apdo. Postal 20-726, México D.F. 01000, México
b ICIMAF, Calle 15, Vedado, Cd. de la Habana, Cuba

Received 4 September 2002; received in revised form 4 November 2002

Abstract

This paper describes the parallel version of the tunneling methods for the global optimization of bound-constrained problems, taking advantage of the stochastic element of these methods that allows a smart exploration of the space. When the search for a point in another valley, starting from points in a neighbourhood of the last (and best) local minimum along random directions, is performed simultaneously on several processors, a more efficient and faster global optimization method is obtained. The performance of the parallel method is illustrated on several academic problems and on the especially difficult Lennard-Jones molecular structures problem.

© 2003 Elsevier Science B.V. All rights reserved.

Keywords: Global optimization; Tunneling methods; Parallel implementation; Lennard-Jones clusters

1. Introduction

The tunneling methods [11,12,1,7] are deterministic methods in the sense that they find a sequence of local minima with monotonically decreasing objective function values. This is accomplished by a two-phase process: a local minimization to find $x^*$, followed by a tunnelization to find a point $x_{tun}$ in another valley with a lower or equal value of the objective function, $f(x_{tun}) \le f(x^*)$, which will serve as the initial point for the next local minimization.

This work has been partially supported by UNAM-PAPIIT grant no. IN-112999, by CONACYT grants nos. 35458-A, 37913-A and 37017-A, and by the OPTIMA project.
* Corresponding author.

E-mail addresses: [email protected] (S. Gómez), [email protected] (N. del Castillo), [email protected] (L. Castellanos), [email protected] (J. Solano).

0167-8191/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved.

doi:10.1016/S0167-8191(03)00020-6

www.elsevier.com/locate/parco

Parallel Computing 29 (2003) 523–533

Page 2: The parallel tunneling method

They also have a stochastic element, as they start the search for points in another valley (in the tunnelization phase) along random directions. It is this stochastic element that can be exploited to perform a smart exploration of the feasible space on parallel processors, to improve the performance and the speed of the method.

In order to guarantee convergence to the global minimum, the tunneling method can be embedded in a branch and bound setting [15]. However, this branch and bound hybrid method is not efficient for practical applications: although the global solution is found very fast by the tunneling method, which determines the upper bound, the check for global optimality, when the lower and upper bounds become equal, is extremely slow. Furthermore, the algorithms to find the lower bounds are limited to certain kinds of functions and are not yet available for general objective functions. In the cases where the objective function is a program or a simulator, as in complex parameter estimation problems such as [9], the branch and bound methodology cannot yet be applied. We then use the tunneling methods without this mathematical check for global optimality and perform it computationally: at the global solution, a point in another valley with a lower or equal value of the objective function does not exist, and thus the tunneling search will have no solution. The global optimality test then consists in performing the tunneling search from a prefixed number of initial points, both from a neighbourhood of the candidate global solution and within the whole feasible region.

The sequential tunneling methods have proved very successful in solving academic problems [6,10], molecular Lennard-Jones structures [8], industrial problems [13,14] and parameter identification problems [9].

A new version of the method has been developed in the code TUNNEL [6], taking into account the scales of the problem for the design of the different tolerances of the method: to consider minima at the same level, to define a neighbourhood of the local minima, to start the search for points in different valleys (the tunneling phase), and to decide whether a tunneling search is successful or not.

Some of the applications cited above do not necessarily need the global solution, but rather a set of local minima with sufficiently low objective function values. The ability of the tunneling methods to find several minima (local or global) at the same level (within a prefixed tolerance) makes it possible to find several ‘‘sufficiently good’’ solutions. This property is especially useful for parameter identification problems, where the uncertainty in the problem formulation requires alternative local solutions (forecast scenarios) with a good match to the data [9]. The evaluation of the objective function in these problems is generally very expensive, hence the need to develop efficient and inexpensive global optimization methods. The use of parallel processors is a step in this direction.

2. The parallel tunneling method

A detailed description of the original sequential methods can be found in

[11,12,1]. Here we only give a brief description.


We want to solve the general bound-constrained global optimization problem: find the global minimum $x^*_G$ of

$$\min f(x) \quad \text{subject to } x \in B, \qquad B = \{x \mid x_{min} \le x \le x_{max}\} \tag{1}$$

At the global solution, $f(x) \ge f(x^*_G)$ for all $x \in B$.

The basic idea of this method is to tunnel from one valley of the objective function to another, in order to find a sequence of local minima with decreasing function values, $f(x^*_1) \ge f(x^*_2) \ge \cdots \ge f(x^*_G)$, ignoring all the local minima with objective function values larger than the best already found. The tunneling method has a minimization phase in which, starting from an initial point $x_0$, a local minimum $x^*$ is found with $f^* = f(x^*)$, using any local bounded optimization method.

From $x^*$, the tunneling phase obtains a feasible point $x_{tun}$ in another valley, that is,

$$f(x_{tun}) \le f^*$$

This point is taken as the initial point $x_0$ for the next minimization phase. These two phases are repeated alternately until convergence is achieved.
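The alternation of the two phases can be sketched in a few lines. This is an illustrative stand-in, not the authors' TUNNEL code: the local minimizer is a crude derivative-free descent (in place of the quasi-Newton method of Section 2.1), and the tunnelization is replaced by random probing for a point with a lower function value.

```python
import random

def local_minimize(f, x0, lo, hi, step=0.1, tol=1e-6):
    """Crude derivative-free descent on [lo, hi] (stand-in for a local method)."""
    x = min(max(x0, lo), hi)
    while step > tol:
        for cand in (x - step, x + step):
            if lo <= cand <= hi and f(cand) < f(x):
                x = cand
                break
        else:
            step /= 2.0          # no improving neighbour: refine the step
    return x

def tunneling_sketch(f, x0, lo, hi, n_probes=200, max_cycles=20, seed=1):
    """Alternate minimization and (simplified) tunnelization phases."""
    rng = random.Random(seed)
    x_star = local_minimize(f, x0, lo, hi)              # minimization phase
    for _ in range(max_cycles):
        f_star = f(x_star)
        # tunnelization stand-in: search for any point with f(x) < f*
        x_tun = next((p for p in (rng.uniform(lo, hi) for _ in range(n_probes))
                      if f(p) < f_star), None)
        if x_tun is None:        # no lower valley found: putative global minimum
            return x_star
        x_star = local_minimize(f, x_tun, lo, hi)       # next minimization phase
    return x_star
```

For example, for $f(x) = x^4 - x^2 + 0.2x$ on $[-2, 2]$, starting in the basin of the higher minimum near $x \approx 0.65$, the loop tunnels to the lower minimum near $x \approx -0.76$.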

2.1. Minimization phase

Starting from an initial approximation $x_0$, we have to find a local solution to the problem

$$\text{Find } x^* = \arg\min_{x \in B} f(x) \tag{2}$$

In this work we use a limited-memory quasi-Newton local optimization method [18], but a code is being written using a truncated Gauss–Newton local optimization method.

2.2. Tunneling phase

Once a local minimum has been obtained, to be able to tunnel from one valley to another using gradient-type methods, it is necessary to destroy the minimum, placing a pole at the minimum point $x^*$ and generating directions that move the iterates away from it. As we are trying to find a point $x_{tun}$ in another valley with a value less than or equal to $f^*$, we need to solve the following inequality:

$$T(x) = f(x) - f(x^*) \le 0 \tag{3}$$

We then place a pole at $x^*$ using the exponential tunneling function 1 [1],

$$T_e(x) = (f(x) - f^*)\exp\left(\frac{\lambda^*}{\|x - x^*\|}\right)$$

1 $\|\cdot\|$ is the squared Euclidean norm.


or the classical tunneling function [11,12],

$$T_c(x) = \frac{f(x) - f^*}{\|x - x^*\|^{\lambda^*}}$$

Solving problem (3) now consists in finding $x_{tun}$ such that

$$T_e(x_{tun}) \le 0 \quad \text{or} \quad T_c(x_{tun}) \le 0 \tag{4}$$

We can take descent directions to solve the inequality problem, and thus we use the same algorithm used to solve problem (2), with appropriate stopping conditions to check convergence for problem (4).

As the original objective function $f$ is a general non-linear function, only assumed to belong to $C^2$ for $x \in B$, it could have many local and global minima, and convergence to minima at the same level is possible, that is, $f(x^*_1) = f(x^*_2) = \cdots = f(x^*_t)$, as the $x^*_i$ would all be acceptable solutions of problem (4). Then, in order to avoid cycling and going back to these minima at the same level already found, it is important to preserve the poles used to destroy them until a better minimum $x^*_{t+1}$ with a lower value of the objective function is found. To achieve this goal it is necessary to modify the definition of the tunneling function in the following fashion:

$$T_e(x) = (f(x) - f^*)\prod_{i=1}^{t}\exp\left(\frac{\lambda^*_i}{\|x - x^*_i\|}\right)$$

setting $t = 1$ as soon as an $x^*_{t+1}$ is found with a strictly smaller value than $f(x^*)$.
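The two tunneling functions transcribe directly into code; this is a sketch that omits the scale-dependent tolerance machinery of TUNNEL [6]. Following the footnote above, the norm used is the squared Euclidean norm.

```python
import math

def T_exp(x, f, f_star, poles, lambdas):
    """Exponential tunneling function with poles kept at all minima
    x_1*, ..., x_t* found at the same level f*."""
    prod = 1.0
    for x_i, lam_i in zip(poles, lambdas):
        d = sum((a - b) ** 2 for a, b in zip(x, x_i))   # squared Euclidean norm
        prod *= math.exp(lam_i / d)                      # pole: blows up as x -> x_i*
    return (f(x) - f_star) * prod

def T_classic(x, f, f_star, x_star, lam):
    """Classical tunneling function of [11,12], with a single pole at x*."""
    d = sum((a - b) ** 2 for a, b in zip(x, x_star))
    return (f(x) - f_star) / d ** lam
```

Since the exponential product and the denominator are strictly positive away from the poles, both functions are negative exactly where $f(x) < f^*$, so a descent method driving them below zero solves inequality (4).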

2.3. Strength of the pole

If a large value of $\lambda^*$ is taken, the tunneling function will be smoother and the danger of encountering critical points during the search will be reduced. However, experience has shown that the search for a point with $T(x_{tun}) \le 0$ will be more expensive for large values of $\lambda^*$. We therefore take the smallest value that gives a descent direction, and use mobile poles to deal with critical points.

2.4. Mobile poles

It can happen that a critical point (i.e. a local minimum or a saddle point) of the tunneling function is encountered before a negative value of $T(x)$ is found, because the tunneling function $T(x)$ may inherit the non-convex nature of the original objective function $f$. To be able to continue the search, we place an additional mobile pole at the critical point $x_m$. The tunneling function then becomes

$$T'(x) = T(x)\cdot\exp\left(\frac{\lambda_m}{\|x - x_m\|}\right)$$

However, this mobile pole is turned off once the iterate is out of the zone of attraction of the critical point.
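The mobile-pole modification, in the same squared-norm convention (a sketch; how $\lambda_m$ is chosen and when the pole is switched off are part of the search logic not shown here):

```python
import math

def T_mobile(x, T, x_m, lam_m):
    """T'(x) = T(x) * exp(lam_m / ||x - x_m||): an extra pole at the critical
    point x_m, removed again once the iterate leaves its zone of attraction."""
    d = sum((a - b) ** 2 for a, b in zip(x, x_m))   # squared Euclidean norm
    return T(x) * math.exp(lam_m / d)
```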


2.5. Stopping conditions for the tunneling phase

Once a local minimum $x^*$ has been found, we generate an initial point $x^{tun}_0$ to start the tunneling search. This point is located at a given distance from $x^*$ along a random direction. Details are given in the parallelization section.

The tunneling search is not successful when:

• A corner of the admissible region has been reached.

• The strength of the pole is greater than a maximum value given, without having

obtained a descent direction.

• No further precision for x is possible.

• The maximum number of function evaluations allowed for this phase has been

reached.

2.6. General stopping conditions

When the algorithm has found the global minimum, the solution to problem (4) does not exist, and because there is no mathematical condition to check for global optimality, either the value of $f(x^*_G)$ is known and has been attained, or the search for a solution of $T(x) \le 0$ has to be exhaustive. The test to check for global optimality is related to the amount of computation, through the number of function evaluations and through the number of initial points allowed to start the tunneling search. This is described in more detail in the next section.

In our implementation [6], the algorithm stops when any of the following criteria

are satisfied:

1. In the tunneling phase, the given maximum number of initial points to start the search for $x_{tun}$ has been reached. The last minimum found is the putative global minimum.

2. The maximum allowed number of function evaluations has been reached.

3. A lower bound FMIN of the objective function, given by the user, has been attained. The last minimum found is the putative global minimum.

4. All the global minima at the same level FMIN required by the user have been

found.

2.7. Parallelization of the method

In the tunneling phase, a key aspect of the behaviour of the method is how one chooses the initial point from which the tunneling search starts. First, and for a prefixed number of points, the search starts from a point $x$ in a neighbourhood of the last (and best) local minimum found, taken along a random direction,

$$x = x^* + \epsilon r$$


where $\epsilon$ is a scalar that depends on the scale of the problem (see [6]), and $r$ is a random vector with components in $[-1, 1]$. From this initial point the tunneling search starts looking for a point in another valley, solving the inequality problem (4) using the same local optimization method used in the minimization phase to generate descent directions and step lengths. If the search from an initial point is not successful (as explained in the stopping conditions for the tunneling phase), another point in the neighbourhood is generated along a random direction. When a prefixed number of points in the neighbourhood have not been successful, another prefixed number of random points are taken in the whole feasible region.
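A sketch of how this sequence of starting points could be produced. The scalar $\epsilon$ and both point counts are problem-dependent tolerances [6]; clipping the perturbed point back into the box is an assumption of this sketch, not something stated in the paper.

```python
import random

def tunnel_start_points(x_star, x_min, x_max, eps,
                        n_neighbourhood, n_global, seed=0):
    """First n_neighbourhood points x* + eps*r with r uniform in [-1, 1]^n,
    then n_global points uniform over the whole box [x_min, x_max]."""
    rng = random.Random(seed)
    n = len(x_star)
    clip = lambda v, lo, hi: min(max(v, lo), hi)    # keep the point feasible
    for _ in range(n_neighbourhood):
        r = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        yield [clip(x_star[i] + eps * r[i], x_min[i], x_max[i]) for i in range(n)]
    for _ in range(n_global):
        yield [rng.uniform(x_min[i], x_max[i]) for i in range(n)]
```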

The maximum number of initial points allowed for the tunneling phase also serves to control the amount of computing effort devoted to checking for global optimality.

This search can be done in parallel to explore the space in an efficient fashion. The parallel method has been designed as follows:

parallel method has been designed as follows:

• There is a central processor P0 that controls the process and broadcasts the initial data to all processors Pi. It finds the first local minimum and sends this information to all Pi.
• Each processor carries out both phases, tunnelization and minimization, and has a different seed for the random number generator.
• The tunnelization phase starts from different initial points (along different random directions) at each processor.
• When a processor finds a point in another valley (successful tunnelization), it proceeds to find a local minimum.
• When a processor finds a local minimum, it sends the result immediately to processor P0.
• Processor P0 checks whether the new minimum is the best found so far, in which case it sends a message with this information to all processors. It also keeps this minimum in memory, both as a local minimum and as a candidate to be the global one (checking first that the minimum is different from the other minima found so far).
• Each processor checks for messages from the central processor only during the tunneling phase. If it is already in a minimization phase, it continues that phase until it finds a local minimum, without checking messages from the central processor. It is not worth interrupting the search for the local minimum once in the minimization phase, as this valley may be the global one.
• When a processor checks for a message during the tunneling phase and receives a new minimum, it restarts the tunneling search from this new minimum.
• The central processor has the information on all the minima found so far and on the best optimal solution, and fills the output files with the information required by the user.
• The central processor checks the general stopping conditions, prints the output and stops the whole process.
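The master/worker scheme above can be mimicked schematically. This stand-in uses threads and a lock-protected shared record instead of MPI messages, and random probing in place of the full tunneling search, just to show the message flow between the workers and the central process P0.

```python
import random
import threading

def parallel_tunnel(f, lo, hi, n_workers=3, cycles=50, probes=40):
    """Master/worker sketch: 'best' plays the role of P0's record of the best
    minimum, and reading/writing it under the lock mimics the MPI messages."""
    best = {"x": None, "f": float("inf")}
    lock = threading.Lock()

    def worker(seed):
        rng = random.Random(seed)               # a different seed per processor
        for _ in range(cycles):
            with lock:                          # "check for a message from P0"
                f_best = best["f"]
            # tunnelization stand-in: look for any point with f(x) < f_best
            x_tun = next((x for x in (rng.uniform(lo, hi) for _ in range(probes))
                          if f(x) < f_best), None)
            if x_tun is None:
                continue
            # (a local minimization starting from x_tun would run here)
            with lock:                          # "send the result to P0"
                if f(x_tun) < best["f"]:
                    best["x"], best["f"] = x_tun, f(x_tun)

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return best["x"], best["f"]
```

Each worker only consults P0's record between tunneling attempts, matching the design point that a minimization in progress is never interrupted.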


As all the processors search for points in another valley (in the tunneling phase) from different initial points (first in a neighbourhood of the last local minimum and in the whole feasible region afterwards), they efficiently explore several regions of the feasible space simultaneously.

3. Numerical results

Results obtained by the traditional tunneling method have so far been good for different applications. Nevertheless, the development of recent computer systems has made it possible to exploit intrinsic features of the method to create the parallel version described in the previous section, making it more efficient (in terms of convergence) and reducing the computational time. These two are essential characteristics for the operation of many industrial processes [16,17].

This section presents a comparison between the results obtained using only one processor (the sequential process) and several processors (the parallel process). We solve some academic problems, Rastrigin-46, Rastrigin-49 and Shaffer [10], and the Lennard-Jones molecular structure problem for 38 and 40 atoms [4,8].

The algorithm has been implemented on a cluster of workstations consisting of four 1.2 GHz PCs running Linux 2.4.18. They are connected through a 100 Mb/s Ethernet switch, and the system uses the standard message passing interface MPI.

Fig. 1 shows the star topology used for this implementation. As described previously, the initial data is provided to the system by processor P0, which is also in charge of controlling the communications between processors.

[Figure: star topology with the central processor P0 connected to processors P1, P2, P3, P4, …, Pn.]

Fig. 1. A star topology.


For this implementation of the system, three processors are used in addition to processor P0. The performance evaluation for each case study is obtained from an average of 20 runs of the parallel algorithm. All the results presented here were obtained using the exponential tunneling method.

Table 1 shows the results for the academic problems Rastrigin-46, Rastrigin-49 and Shaffer. The time improvement obtained for the Rastrigin cases when using several processors is clear. Although the complexity of Rastrigin-49 is greater than that of Rastrigin-46, the response is almost the same. For the Shaffer case, we observe the same effect in terms of time performance, but we can notice a greater improvement when going from 2 to 3 processors.

Table 1
Results for academic problems

Problem        Processors   Time (s)   Speed-up   Efficiency
Rastrigin-46   1            11.8779
               2            6.3615     1.8671     0.9336
               3            5.1673     2.2987     0.7662
Rastrigin-49   1            11.1996
               2            6.8625     1.6320     0.8160
               3            5.0457     2.2196     0.7399
Shaffer        1            18.0963
               2            16.7247    1.0820     0.5410
               3            2.6178     6.9129     2.3043
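The Speed-up and Efficiency columns are the standard ratios $S_p = T_1/T_p$ and $E_p = S_p/p$, as the Rastrigin-46 rows of Table 1 illustrate:

```python
def speedup(t1, tp):
    """Ratio of sequential time T_1 to parallel time T_p."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Speed-up divided by the number of processors p."""
    return speedup(t1, tp) / p

# Rastrigin-46 with 2 processors (Table 1): S_2 ~ 1.8671, E_2 ~ 0.9336
s2 = speedup(11.8779, 6.3615)
e2 = efficiency(11.8779, 6.3615, 2)
```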


4. Molecular structure problems

The problem of finding the optimal structure of small molecules (clusters), modelled using the Lennard-Jones potential, is one of the most challenging global optimization problems, due to the number of local minima ($\exp(n^2)$). The objective function is

$$f(x) = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left(r_{ij}^{-12} - r_{ij}^{-6}\right)$$

where $r_{ij} = ((x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2)^{1/2}$, and $(x, y, z)$ are the coordinates of each atom in $\mathbb{R}^3$.
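The objective above in code (reduced units, no gradients; the production codes of [8] are far more elaborate). For a single pair, this reduced form has its minimum value $-1/4$ at $r = 2^{1/6}$.

```python
def lj_energy(coords):
    """Lennard-Jones energy: sum over atom pairs of r_ij**-12 - r_ij**-6,
    where coords is a list of (x, y, z) positions."""
    n = len(coords)
    e = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            r2 = sum((coords[i][k] - coords[j][k]) ** 2 for k in range(3))
            e += r2 ** -6 - r2 ** -3    # r**-12 and r**-6 via the squared distance
    return e
```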

Here we test the parallel method on a ‘‘normal’’ case of 40 atoms (120 variables), and on the especially difficult case of 38 atoms. It is known that most of the structures in the range 13–147 atoms are icosahedral, with a few exceptions (see [2–5,8]). The first non-icosahedral structure found was that of 38 atoms [8]. Taking into account the fact that most structures are icosahedral, we usually take the structure of $n-1$ atoms as the initial point for the $n$-atom case (plus one atom at some arbitrary point). The 40-atom case is normal in the sense that the structure does not change from 39 to 40 atoms (it remains icosahedral), whereas the 38-atom case implies a large movement of the atoms from the 37-atom structure to form an exceptional face-centered cubic (fcc) structure, with many local minima. This case represents a real challenge for a global optimization method.

Table 2 shows the results. Significant figures can be observed for 38 atoms, as the system presents a super speed-up rarely obtained in multiprocessing systems; that is, it shows a speed-up greater than 258 for 3 processors. This behaviour can be explained by the use of different seeds for the random number generator at each processor, which produces a diverse set of starting points for the tunneling search, giving as a result an efficient exploration of the feasible space.

Table 2
Lennard-Jones for 38 atoms

Processors   Time (s)    Speed-up   Efficiency
1            1370.8756
2            78.3758     17.4911    8.7455
3            5.3027      258.5241   86.1747

Table 3
Lennard-Jones for 40 atoms

Processors   Time (s)   Speed-up   Efficiency
1            5.0218
2            0.9407     5.3384     2.6692
3            0.8865     5.6647     1.8882


It is important to point out that the migration of information conducted by the system gives, as observed, a better probability of improving local minima and of finding a point in the valley of the global minimum.

For Lennard-Jones-40 (Table 3), the results are similar, although this time the levels of speed-up are less spectacular.

5. Conclusions

The sequential tunneling methods have proved in the past to be successful in solving several complex applications, but to be able to solve problems with particularly expensive objective functions, it becomes important to make them faster and more efficient. The stochastic side of these methods, and recent developments in computer architectures and communication tools that permit the use of clusters, allow the development of the parallel tunneling method described in this paper, which has the following features:

• It is more efficient in terms of search exploration.

• It has better execution times.

The efficient exploitation of the intrinsic characteristics of the tunneling search allows the development of an elegant parallel approach which can produce good results on simple topologies, due to the stochastic parallel search, the migration of minimum values through the processes in the system, and the low communications overhead between processors.

The numerical results obtained on the academic examples presented show considerable reductions in execution times due simply to the use of a multiprocessing system, with the well-known limitations in terms of speed-up (i.e. asymptotic reduction of speed-up when increasing the number of processors).

In the Lennard-Jones 38 example, we observe a rare super speed-up effect (not very common in multiprocessing systems) due to its complexity. The reason for this behaviour is the efficient mechanism provided by the parallel approach, which explores the search space in many regions simultaneously, with local minima information available to all processes in the system. This has been essential to finding the global minimum in less time than the sequential tunneling algorithm.

In summary, for all the examples tested, it has been shown that the parallel version is better than the sequential method. We conclude by saying that the parallel method is a practical tool for solving real and complex problems.

Acknowledgements

We would like to thank Francisco Cárdenas Flores for the cluster set-up and his valuable advice on the implementation. José Luis Gordillo Ruiz was also very helpful with the use of the MPI protocol.


References

[1] C. Barrón, S. Gómez, The exponential tunneling method, Reporte de Investigación IIMAS 1 (3) (1991) 1–23.
[2] C. Barrón, S. Gómez, D. Romero, Archimedean polyhedron structure yields a lower energy atomic cluster, Applied Mathematics Letters 9 (5) (1996) 75–78.
[3] C. Barrón, S. Gómez, D. Romero, Lower energy icosahedral atomic cluster with incomplete core, Applied Mathematics Letters 10 (4) (1997) 25–28.
[4] C. Barrón, S. Gómez, D. Romero, A. Saavedra, A genetic algorithm for Lennard-Jones atomic clusters, Applied Mathematics Letters 12 (7) (1999) 85–90.
[5] C. Barrón, S. Gómez, D. Romero, The optimal geometry of Lennard-Jones clusters: 148–309, Computer Physics Communications 123 (1999) 87–96.
[6] L. Castellanos, S. Gómez, A new implementation of the tunneling methods for bound constrained global optimization, Reporte de Investigación IIMAS 10 (59) (2000) 1–18.
[7] S. Gómez, A.V. Levy, The tunneling method for solving the constrained global optimization problem with several non-connected feasible regions, Lecture Notes in Mathematics 909, Springer-Verlag (1982) 34–47.
[8] S. Gómez, D. Romero, Two global methods for molecular geometry optimization, Progress in Mathematics 121, Birkhäuser (1994) 503–509.
[9] S. Gómez, O. Gosselin, J. Barker, Gradient-based history-matching with a global optimization method, Society of Petroleum Engineers Journal (June 2001) 200–208.
[10] S. Gómez, J. Solórzano, L. Castellanos, M.I. Quintana, Tunneling and genetic algorithms for global optimization, in: N. Hadjisavvas, P. Pardalos (Eds.), Advances in Convex Analysis and Global Optimization, Non-convex Optimization and its Applications, Kluwer Academic Publishers, 2001, pp. 553–567.
[11] A.V. Levy, A. Montalvo, The tunneling algorithm for the global minimization of functions, SIAM Journal on Scientific and Statistical Computing 6 (1) (1985) 15–29.
[12] A.V. Levy, S. Gómez, The tunneling method applied to global optimization, in: P.T. Boggs, R.H. Byrd, R.B. Schnabel (Eds.), Numerical Optimization, SIAM, 1985, pp. 213–244.
[13] D.V. Nichita, S. Gómez, E. Luna, Multiphase equilibria calculation by direct minimization of Gibbs free energy with a global optimization method, Computers and Chemical Engineering, in press.
[14] D.V. Nichita, S. Gómez, E. Luna, Phase stability analysis with cubic equations of state using a global optimization method, Fluid Phase Equilibria 4943, Special Issue (May 2002) 1–27.
[15] C. Pantelides, B. Keeping, S. Gómez, A hybrid branch and bound-tunneling method for the global optimization of bounded problems, presented at Optimization 98, Coimbra, Portugal, 19–22 July 1998.
[16] C.B. Pettey, M.R. Leuze, A theoretical investigation of parallel genetic algorithms, in: Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, 1989, pp. 398–405.
[17] J. Solano González, K. Rodríguez Vázquez, D.F. García Nocetti, Model-based spectral estimation of Doppler signals using parallel genetic algorithms, Artificial Intelligence in Medicine 19 (2000) 75–89.
[18] C. Zhu, R.H. Byrd, P. Liu, J. Nocedal, Algorithm 778: L-BFGS-B Fortran subroutines for large-scale bound constrained optimization, ACM TOMS 23 (4) (1997) 550–560.
