The parallel tunneling method
Susana Gómez a,*, Nelson del Castillo a, Longina Castellanos b, Julio Solano a

a IIMAS, UNAM, Apdo Postal 20-726, Mexico D.F. 01000, Mexico
b ICIMAF, Calle 15, Vedado, Cd. de la Habana, Cuba
Received 4 September 2002; received in revised form 4 November 2002
Abstract
This paper describes the parallel version of the tunneling methods for global optimization of bound constrained problems, taking advantage of the stochastic element of these methods, which allows a smart exploration of the space. When the search for a point in another valley, starting along random directions from points in a neighbourhood of the last (and best) local minimum, is performed simultaneously on several processors, a more efficient and faster global optimization method is obtained. The performance of the parallel method is illustrated on several academic problems and on the especially difficult Lennard-Jones molecular structures problem.
© 2003 Elsevier Science B.V. All rights reserved.
Keywords: Global optimization; Tunneling methods; Parallel implementation; Lennard-Jones clusters
1. Introduction
The tunneling methods [11,12,1,7] are deterministic methods in the sense that they find a sequence of local minima with monotonically decreasing objective function values. This is accomplished by a two-phase process: a local minimization to find x*, followed by a tunnelization to find a point x_tun in another valley with lower or equal objective function value, f(x_tun) ≤ f(x*), that will serve as the initial point for the next local minimization.
This work has been partially supported by UNAM-PAPIIT, grant no. IN-112999, by CONACYT grants nos. 35458-A, 37913-A and 37017-A, and by the OPTIMA project.
* Corresponding author.
E-mail addresses: [email protected] (S. Gómez), [email protected] (N. del Castillo), [email protected] (L. Castellanos), [email protected] (J. Solano).
0167-8191/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved.
doi:10.1016/S0167-8191(03)00020-6
www.elsevier.com/locate/parco
Parallel Computing 29 (2003) 523–533
They also have a stochastic element, as they start the search for points in another valley (in the tunnelization phase) along random directions. It is this stochastic element that can be exploited to perform a smart exploration of the feasible space on parallel processors, improving the performance and the speed of the method.
In order to guarantee convergence to the global minimum, the tunneling method can be embedded in a branch and bound setting [15]. However, this branch and bound hybrid method is not efficient for practical applications: although the global solution is found very fast by the tunneling method, and this determines the upper bound, the check for global optimality, when the lower and upper bounds become equal, is extremely slow. Furthermore, the algorithms to find the lower bounds are limited to certain kinds of functions and are not yet available for general objective functions. In the cases when the objective function is a program or a simulator, as is the case in complex parameter estimation problems such as [9], the branch and bound methodology cannot yet be applied. We then use the tunneling methods without this mathematical check for global optimality and perform it computationally: at the global solution, a point in another valley with lower or equal value of the global optimum objective function does not exist, and thus the tunneling search will have no solution. The global optimality test then consists in performing the tunneling search from a prefixed number of initial points, both from a neighbourhood of the candidate global solution and within the whole feasible region.

The sequential tunneling methods have proven to be very successful in solving academic problems [6,10], molecular Lennard-Jones structures [8], industrial problems [13,14] and parameter identification problems [9].
A new version of the method has been developed in the code TUNNEL [6], taking into account the scales of the problem for the design of the different tolerances of the method: to consider minima at the same level, to define a neighbourhood of the local minima, to start the search for points in different valleys (the tunneling phase), and to decide whether a tunneling search is successful or not.

Some of the applications cited above do not necessarily need the global solution, but rather a set of local minima with sufficiently low objective function values. The ability of the tunneling methods to find several minima (local or global) at the same level (within a prefixed tolerance) makes it possible to find several ''sufficiently good'' solutions. This property is especially useful for parameter identification problems, where the uncertainty in the problem formulation requires alternative local solutions (forecast scenarios) with a good match to the data [9]. The evaluation of the objective function in these problems is generally very expensive, hence the need to develop efficient and inexpensive global optimization methods. The use of parallel processors is a step in this direction.
2. The parallel tunneling method
A detailed description of the original sequential methods can be found in
[11,12,1]. Here we only give a brief description.
We want to solve the general global bound constrained optimization problem: find the global minimum x*_G of

min f(x)  subject to  x ∈ B,  B = {x | x_min ≤ x ≤ x_max}     (1)
At the global solution, f(x) ≥ f(x*_G) for all x ∈ B.

The basic idea of this method is to tunnel from one valley of the objective function to another, so as to find a sequence of local minima with decreasing function values, f(x*_1) ≥ f(x*_2) ≥ ... ≥ f(x*_G), ignoring all the local minima with objective function values larger than the best already found. The tunneling method has a minimization phase in which, starting from an initial point x_0, a local minimum x* is found with f* = f(x*), using any local bound constrained optimization method.

From x*, the tunneling phase obtains a feasible point x_tun in another valley, that is

f(x_tun) ≤ f*

This point will be taken as the initial point x_0 for the next minimization phase. These two phases alternate until convergence is achieved.
2.1. Minimization phase
Starting from an initial approximation x_0, we have to find a local solution to the problem

Find x* = arg min_{x ∈ B} f(x)     (2)
In this work we use a limited-memory quasi-Newton local optimization method [18], but a code using a truncated Gauss–Newton local optimization method is being written.
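As a concrete sketch, the minimization phase can be run with any off-the-shelf bounded local optimizer; the snippet below uses SciPy's L-BFGS-B (the algorithm of ref. [18]) on a hypothetical quadratic objective, not the authors' TUNNEL code.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Hypothetical smooth objective; its unconstrained minimum (2, -3)
    # lies outside the box, so the bounds become active.
    return (x[0] - 2.0) ** 2 + (x[1] + 3.0) ** 2

x0 = np.array([0.0, 0.0])              # initial approximation
bounds = [(-1.0, 1.0), (-1.0, 1.0)]    # the box B of problem (1)
res = minimize(f, x0, method="L-BFGS-B", bounds=bounds)
x_star, f_star = res.x, res.fun        # local solution of problem (2)
```

The local solution lands on the boundary of B, at (1, -1) with f* = 5.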
2.2. Tunneling phase
Once a local minimum has been obtained, to be able to tunnel from one valley to another using gradient-type methods, it is necessary to destroy the minimum, placing a pole at the minimum point x* and generating directions that move the iterates away from it. As we are trying to find a point x_tun in another valley with lower or equal value than f*, we need to solve the following inequality

T(x) = f(x) - f(x*) ≤ 0     (3)

We then place a pole at x* using the exponential tunneling function (1) [1]

T_e(x) = (f(x) - f*) exp(λ* / ‖x - x*‖)

(1) ‖·‖ is the squared Euclidean norm.
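In code, the exponential pole is a one-liner; the sketch below uses an illustrative λ* value and follows the paper's footnote that ‖·‖ denotes the squared Euclidean norm.

```python
import numpy as np

def T_e(x, x_star, f, f_star, lam=1.0):
    """Exponential tunneling function: (f(x) - f*) * exp(lam / ||x - x*||),
    where ||.|| is the squared Euclidean norm (footnote 1 of the paper)."""
    d = x - x_star
    dist = float(np.dot(d, d))          # squared Euclidean norm
    return (f(x) - f_star) * np.exp(lam / dist)
```

The exponential factor blows up as x approaches x*, so a descent method is pushed out of the destroyed valley; the sign of T_e still tells whether a lower valley has been reached.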
Or the classical tunneling function [11,12],

T_c(x) = (f(x) - f*) / ‖x - x*‖^{λ*}

Solving problem (3) now consists in finding x_tun such that

T_e(x_tun) ≤ 0  or  T_c(x_tun) ≤ 0     (4)
We can take descent directions to solve the inequality problem, and thus we use the
same algorithm used to solve problem (2) with appropriate stopping conditions to
check convergence for problem (4).
As the original objective function f is a general non-linear function, only assumed to belong to C² for x ∈ B, it may have many local and global minima, and convergence to minima at the same level is possible, that is f(x*_1) = f(x*_2) = ... = f(x*_t), as each x*_i would be an acceptable solution of problem (4). In order to avoid cycling and going back to these minima at the same level already found, it is important to preserve the poles used to destroy them, until a better minimum x*_{t+1} with a lower value of the objective function is found. To achieve this goal it is necessary to modify the definition of the tunneling function in the following fashion

T_e(x) = (f(x) - f*) ∏_{i=1}^{t} exp(λ*_i / ‖x - x*_i‖)

setting t = 1 as soon as an x*_{t+1} is found with strictly smaller value than f(x*).
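The multi-pole version is the same construction with one exponential factor per preserved minimum; a minimal sketch (illustrative λ values, not the TUNNEL code):

```python
import numpy as np

def T_e_multi(x, minima, f, f_star, lams):
    """Tunneling function with one pole per minimum at the current level:
    (f(x) - f*) * prod_i exp(lam_i / ||x - x_i||), ||.|| squared Euclidean."""
    val = f(x) - f_star
    for x_i, lam_i in zip(minima, lams):
        d = x - x_i
        val *= np.exp(lam_i / float(np.dot(d, d)))  # pole at each x_i
    return val
```

With a single preserved minimum this reduces exactly to the single-pole T_e above.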
2.3. Strength of the pole
If a large value of λ* is taken, the tunneling function will be smoother and the danger of encountering critical points during the search will be reduced. However, experience has shown that the search for a point with T(x_tun) ≤ 0 becomes more expensive for large values of λ*. We therefore take the smallest value that gives a descent direction, and use mobile poles to deal with critical points.
2.4. Mobile poles
It can happen that a critical point (i.e. a local minimum or a saddle point) of the tunneling function is encountered before a negative value of T(x) is found, because the tunneling function T(x) may inherit the non-convex nature of the original objective function f. To be able to continue the search, we place an additional mobile pole at the critical point x_m. The tunneling function then becomes

T'(x) = T(x) exp(λ_m / ‖x - x_m‖)

However, this mobile pole is turned off once the iterate is out of the zone of attraction of the critical point.
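In the same spirit, a mobile pole can be sketched as one more multiplicative exponential factor attached to the critical point x_m, to be discarded ("turned off") once the iterate leaves its zone of attraction (λ_m here is illustrative):

```python
import numpy as np

def with_mobile_pole(T, x_m, lam_m=1.0):
    """Wrap a tunneling function T with a mobile pole at the critical
    point x_m; the wrapper is dropped once the iterate is out of the
    zone of attraction of x_m."""
    def T_prime(x):
        d = x - x_m
        return T(x) * np.exp(lam_m / float(np.dot(d, d)))
    return T_prime
```

Because the exponential factor is strictly positive, the pole repels the iterates from x_m without changing the sign of T, so the success test T(x) ≤ 0 is unaffected.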
2.5. Stopping conditions for the tunneling phase
Once a local minimum x* has been found, we generate an initial point x_tun,0 to start the tunneling search. This point is located at a given distance from x* along a random direction. Details about this will be given in the parallelization section.
The tunneling search is not successful when:
• A corner of the admissible region has been reached.
• The strength of the pole is greater than a maximum value given, without having
obtained a descent direction.
• No further precision for x is possible.
• The maximum number of function evaluations allowed for this phase has been
reached.
2.6. General stopping conditions
When the algorithm has found the global minimum, the solution to problem (4) does not exist. Because there is no mathematical condition to check for global optimality, either the value of f(x*_G) is known and has been attained, or the search for a solution of T(x) ≤ 0 has to be exhaustive. The test to check for global optimality is related to the amount of computation, through the number of function evaluations and through the number of initial points allowed to start the tunneling search. This is described in more detail in the next section.
In our implementation [6], the algorithm stops when any of the following criteria
are satisfied:
1. In the tunneling phase, the given maximum number of initial points to start the search for x_tun has been reached. The last minimum found is the putative global minimum.
2. The maximum allowed number of function evaluations has been reached.
3. A lower bound of the objective function FMIN given by the user has been attained. The last minimum found is the putative global minimum.
4. All the global minima at the same level FMIN required by the user have been
found.
2.7. Parallelization of the method
In the tunneling phase, a key aspect for the behaviour of the method is how one takes the initial point from which the tunneling search starts. First, and for a prefixed number of points, the search will start from a point x in a neighbourhood of the last (and best) local minimum found, taken along a random direction,

x = x* + ε r
where ε is a scalar that depends on the scale of the problem (see [6]), and r is a random vector within [-1, 1]. From this initial point the tunneling search starts looking for a point in another valley, solving the inequality problem (4), using the same local optimization method we used for the minimization phase to generate descent directions and step lengths. If the search from an initial point is not successful (as explained in the stopping conditions for the tunneling phase), another point in the neighbourhood is generated along a random direction. When a prefixed number of points in the neighbourhood have not been successful, another prefixed number of random points are then taken in the whole feasible region.
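Generating the two kinds of tunneling start points can be sketched as follows (ε and the box bounds here are illustrative, not the scale rules of [6]):

```python
import numpy as np

def tunneling_start(x_star, eps, rng):
    """Point in a neighbourhood of x*: x = x* + eps * r, r uniform in [-1, 1]^n."""
    r = rng.uniform(-1.0, 1.0, size=x_star.shape)
    return x_star + eps * r

def global_start(x_min, x_max, rng):
    """Fallback: a random point in the whole feasible box B."""
    return rng.uniform(x_min, x_max)

rng = np.random.default_rng(0)
x_star = np.zeros(3)
x0 = tunneling_start(x_star, eps=0.1, rng=rng)  # stays within eps of x*
```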
The maximum number of initial points allowed for the tunneling phase also serves to control the amount of computing effort devoted to checking for global optimality.

This search can be done in parallel to explore the space in an efficient fashion. The parallel method has been designed as follows:
• There is a central processor P0 that controls the process and broadcasts the initial data to all processors Pi. It finds the first local minimum and sends this information to all Pi.
• Each processor carries out both phases, tunnelization and minimization, and has a different seed for the random number generator.
• The tunnelization phase starts from different initial points (along different random directions) at each processor.
• When a processor finds a point in another valley (successful tunnelization), it proceeds to find a local minimum.
• When a processor finds a local minimum, it sends the result immediately to processor P0.
• Processor P0 checks whether the new minimum is the best found so far, in which case it sends a message with this information to all processors. It also keeps this minimum in memory, both as a local minimum and as a candidate to be the global one (checking first that the minimum is different from the other minima found so far).
• Each processor checks whether there is a message from the central processor, but only during the tunneling phase. If it is already in a minimization phase, it continues that phase until it finds a local minimum, without checking messages from the central processor. It is not worth interrupting the search for the local minimum once in the minimization phase, as this valley may be the global one.
• When a processor checks for a message in the tunneling phase and receives a new minimum, it restarts the tunneling search from this new minimum.
• The central processor keeps the information on all the minima found so far and on the best optimal solution, and fills the output files with the information required by the user.
• The central processor checks the general stopping conditions, prints the output and stops the whole process.
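The scheme above is master/worker over MPI; as a minimal shared-memory illustration of the seed-diversification idea alone, one can let each "processor" explore from its own random starting points (threads stand in for the processors, and a toy restart search stands in for the tunneling phase; the objective and seeds are hypothetical):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def f(x):
    # Toy 1-D multimodal objective standing in for the real problem;
    # its global minimum lies near x = 1.
    return (x - 1.0) ** 2 * (x + 1.0) ** 2 - 0.5 * x

def worker(seed, n_starts=200):
    """Each 'processor' explores from its own random starts (own seed)."""
    rng = random.Random(seed)
    best_x = best_f = None
    for _ in range(n_starts):
        x = rng.uniform(-2.0, 2.0)
        fx = f(x)
        if best_f is None or fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Different seeds per worker give a diverse set of starting points,
# mimicking the per-processor random number generator seeds.
with ThreadPoolExecutor(max_workers=3) as ex:
    results = list(ex.map(worker, [11, 22, 33]))
best_x, best_f = min(results, key=lambda t: t[1])
```

The real method replaces the toy restart search by the tunneling/minimization phases and the thread pool by MPI processes with message passing through P0.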
As all the processors are searching for points in another valley (at the tunneling
phase), from different initial points (first in a neighbourhood of the last local mini-
mum and in the whole feasible region afterwards), they explore efficiently several
regions of the feasible space simultaneously.
3. Numerical results
Results obtained by the traditional tunneling method have been good so far for different applications. Nevertheless, the development of recent computer systems has made it possible to exploit intrinsic features of the method, to create the parallel version described in the previous section, making it more efficient (in terms of convergence) and reducing the computational time. These two are the essential characteristics for the operation of many industrial processes [16,17].
This section compares the results obtained using only one processor (the sequential process) with those obtained using several processors (the parallel process). We solve some academic problems, Rastrigin-46, Rastrigin-49 and Shaffer [10], and the Lennard-Jones molecular structure problem for 38 and 40 atoms [4,8].
The algorithm has been implemented on a cluster of workstations consisting of 4 1.2-GHz PCs running Linux 2.4.18. They are connected through a 100 Mb/s Ethernet switch, and the system uses the standard message passing interface MPI.

Fig. 1 shows the star topology used for this implementation. As described previously, the initial data is provided to the system by processor P0, which is also in charge of controlling the communications between processors.
Fig. 1. A star topology: central processor P0 connected to worker processors P1, P2, ..., Pn.
For this implementation of the system, 3 processors are used in addition to processor P0. The performance evaluation for each case study is obtained from an average of 20 runs of the parallel algorithm. All the results presented here were obtained using the exponential tunneling method.

Table 1 shows the results for the academic problems Rastrigin-46, Rastrigin-49 and Shaffer. The time improvement obtained for the Rastrigin cases when using several processors is clear. Although the complexity of Rastrigin-49 is greater than that of Rastrigin-46, the response is almost the same. For the Shaffer case, we observe the same effect in terms of time performance, but we can notice a greater improvement when going from 2 to 3 processors.
Table 1
Results for academic problems

Problem        Processors   Time (s)   Speed-up   Efficiency
Rastrigin-46   1            11.8779
               2             6.3615    1.8671     0.9336
               3             5.1673    2.2987     0.7662
Rastrigin-49   1            11.1996
               2             6.8625    1.6320     0.8160
               3             5.0457    2.2196     0.7399
Shaffer        1            18.0963
               2            16.7247    1.0820     0.5410
               3             2.6178    6.9129     2.3043
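The speed-up and efficiency columns follow the usual definitions S_p = T_1 / T_p and E_p = S_p / p, which can be checked against the table, e.g. for Rastrigin-46 on 2 processors:

```python
def speedup(t1, tp):
    # S_p = T_1 / T_p: sequential time over parallel time.
    return t1 / tp

def efficiency(t1, tp, p):
    # E_p = S_p / p: speed-up per processor.
    return speedup(t1, tp) / p

s = speedup(11.8779, 6.3615)        # Table 1, Rastrigin-46, p = 2
e = efficiency(11.8779, 6.3615, 2)
```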
4. Molecular structure problems
The problem of finding the optimal structure of small molecules (clusters),
modelled using the Lennard-Jones potential, is one of the most challenging global
optimization problems, due to the number of local minima ðexpðn2ÞÞ. The objectivefunction is
f ðxÞ ¼Xn�1
i¼1
Xn
j¼iþ1
ðr�12ij � r�6
ij Þ
where ri;j ¼ ððxi � xjÞ2 þ ðyi � yjÞ2 þ ðzi � zjÞ2Þ1=2, and ðx; y; zÞ are the coordinates of
each atom in R3.
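The objective above is straightforward to code directly; a simple O(n²) pairwise sketch:

```python
import numpy as np

def lj_energy(coords):
    """Lennard-Jones potential: sum over atom pairs of r^-12 - r^-6;
    coords is an (n, 3) array of atomic positions."""
    n = len(coords)
    energy = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            r2 = float(np.sum((coords[i] - coords[j]) ** 2))
            energy += r2 ** -6 - r2 ** -3  # r^-12 - r^-6 via r^2
    return energy

# Two atoms at the pair-equilibrium distance 2^(1/6) give energy -1/4,
# the minimum of the pair potential r^-12 - r^-6.
pair = np.array([[0.0, 0.0, 0.0], [2.0 ** (1.0 / 6.0), 0.0, 0.0]])
```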
Here we will test the parallel method on a ''normal'' test case of 40 atoms (120 variables), and on the especially difficult case of 38 atoms. It is known that most of the structures in the range 13–147 atoms are icosahedral, with a few exceptions (see [2–5,8]). The first non-icosahedral structure found was the 38-atom one [8]. Taking into account the fact that most structures are icosahedral, we usually take the structure of n-1 atoms as the initial point for the n-atom case (plus one atom at some arbitrary point). The 40-atom case is normal in the sense that the structure does not change from 39 to 40 atoms (it remains icosahedral), whereas the 38-atom case implies a large movement of the atoms from the 37-atom structure to form an exceptional face centered cubic (fcc) structure, with many local minima. This case represents a real challenge for a global optimization method.
Table 2 shows the results. The figures for 38 atoms are remarkable, as the system presents a super speed-up rarely obtained in multiprocessing systems: a speed-up greater than 258 for 3 processors. This behaviour can be explained by the use of different seeds for the random number generator at each processor, which produces a diverse set of starting points for the tunneling search, giving as a result an efficient exploration of the feasible space. It is important to point out that the migration of information conducted by the system gives, as observed, a better probability of improving local minima and finding a point in the global valley.

Table 2
Lennard-Jones for 38 atoms

Processors   Time (s)    Speed-up   Efficiency
1            1370.8756
2              78.3758   17.4911    8.7455
3               5.3027   258.5241   86.1747

Table 3
Lennard-Jones for 40 atoms

Processors   Time (s)   Speed-up   Efficiency
1            5.0218
2            0.9407     5.3384     2.6692
3            0.8865     5.6647     1.8882
For Lennard-Jones-40 (Table 3), results are similar although this time the levels of
speed-up are less spectacular.
5. Conclusions
The sequential tunneling methods have proved in the past to be successful in solving several complex applications, but to be able to solve problems with particularly expensive objective functions, it becomes important to make them faster and more efficient. The stochastic side of these methods, together with recent developments in computer architectures and communication tools that permit the use of clusters, allowed the development of the parallel tunneling method described in this paper, which has the following features:

• It is more efficient in terms of search exploration.
• It has better execution times.

The efficient exploitation of the intrinsic characteristics of the tunneling search allows the development of an elegant parallel approach which can produce good results on simple topologies, due to the stochastic parallel search, the migration of minimum values through the processes in the system, and the low communications overhead between processors.

The numerical results obtained on the academic examples presented show considerable reductions in execution times due simply to the use of a multiprocessing system, with the well known limitations in terms of speed-up (i.e. asymptotic reduction of speed-up when increasing the number of processors).

In the Lennard-Jones 38 example, we observe a rare super speed-up effect (not very common in multiprocessing systems) due to its complexity. The reason for this behaviour is the efficient mechanism provided by the parallel approach, which explores the search space in many regions simultaneously, with local minima information available to all processes in the system. This has been essential to find the global minimum in less time than the sequential tunneling algorithm.

In summary, for all the examples tested, it has been shown that the parallel version is better than the sequential method. We conclude that the parallel method is a practical tool to solve real and complex problems.
Acknowledgements
We would like to thank Francisco Cárdenas Flores for the cluster set-up and his valuable advice on the implementation. José Luis Gordillo Ruiz was also very helpful with the use of the MPI protocol.
References
[1] C. Barrón, S. Gómez, The exponential tunneling method, Reporte de Investigación IIMAS 1 (3) (1991) 1–23.
[2] C. Barrón, S. Gómez, D. Romero, Archimedean polyhedron structure yields a lower energy atomic cluster, Applied Mathematics Letters 9 (5) (1996) 75–78.
[3] C. Barrón, S. Gómez, D. Romero, Lower energy icosahedral atomic cluster with incomplete core, Applied Mathematics Letters 10 (4) (1997) 25–28.
[4] C. Barrón, S. Gómez, D. Romero, A. Saavedra, A genetic algorithm for Lennard-Jones atomic clusters, Applied Mathematics Letters 12 (7) (1999) 85–90.
[5] C. Barrón, S. Gómez, D. Romero, The optimal geometry of Lennard-Jones clusters: 148–309, Computer Physics Communications 123 (1999) 87–96.
[6] L. Castellanos, S. Gómez, A new implementation of the tunneling methods for bound constrained global optimization, Reporte de Investigación IIMAS 10 (59) (2000) 1–18.
[7] S. Gómez, A.V. Levy, The tunneling method for solving the constrained global optimization problem with several non-connected feasible regions, Lecture Notes in Mathematics, Springer-Verlag 909 (1982) 34–47.
[8] S. Gómez, D. Romero, Two global methods for molecular geometry optimization, Progress in Mathematics, Birkhäuser 121 (1994) 503–509.
[9] S. Gómez, O. Gosselin, J. Barker, Gradient-based history-matching with a global optimization method, Society of Petroleum Engineers Journal (June 2001) 200–208.
[10] S. Gómez, J. Solórzano, L. Castellanos, M.I. Quintana, Tunneling and genetic algorithms for global optimization, in: N. Hadjisavvas, P. Pardalos (Eds.), Advances in Convex Analysis and Global Optimization. Non Convex Optimization and its Applications, Kluwer Academic Publishers, 2001, pp. 553–567.
[11] A.V. Levy, A. Montalvo, The tunneling algorithm for the global minimization of functions, SIAM Journal on Scientific and Statistical Computing 6 (1) (1985) 15–29.
[12] A.V. Levy, S. Gómez, The tunneling method applied to global optimization, in: P.T. Boggs, R.H. Byrd, R.B. Schnabel (Eds.), Numerical Optimization, SIAM, 1985, pp. 213–244.
[13] D.V. Nichita, S. Gómez, E. Luna, Multiphase equilibria calculation by direct minimization of Gibbs free energy with a global optimization method, Computers and Chemical Engineering, in press.
[14] D.V. Nichita, S. Gómez, E. Luna, Phase stability analysis with cubic equations of state using a global optimization method, Fluid Phase Equilibria 4943, Special Issue (May 2002) 1–27.
[15] C. Pantelides, B. Keeping, S. Gómez, A hybrid branch and bound-tunneling method for the global optimization of bounded problems, presented at the conference Optimization 98, Coimbra, Portugal, 19–22 July 1998.
[16] C.B. Pettey, M.R. Leuze, A theoretical investigation of parallel genetic algorithms, in: Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, 1989, pp. 398–405.
[17] J. Solano González, K. Rodríguez Vázquez, D.F. García Nocetti, Model-based spectral estimation of Doppler signals using parallel genetic algorithms, Artificial Intelligence in Medicine 19 (2000) 75–89.
[18] C. Zhu, R.H. Byrd, P. Lu, J. Nocedal, Algorithm 778: L-BFGS-B Fortran subroutines for large-scale bound constrained optimization, TOMS 23 (4) (1997) 550–560.