Nature-inspired metaheuristic algorithms for optimization and computational intelligence
An Introduction to Nature-inspired
Metaheuristic Algorithms
Dr P. N. Suganthan School of EEE, NTU, Singapore
Workshop on Particle Swarm Optimization and
Evolutionary Computation
Institute for Mathematical Sciences, NUS
Feb 20th, 2018
Some Slides Contributed by Dr Swagatam Das, ISI,
Kolkata, and Dr Kaizhou Gao, NTU.
Benchmark Functions & Surveys: Resources available from
http://www.ntu.edu.sg/home/epnsugan
IEEE SSCI 2018, Bengaluru, in Nov. 2018
EMO-2019, Evolutionary Multi-Criterion Optimization
10-13 Mar 2019, MSU, USA
https://www.coin-laboratory.com/emo2019
Randomization-Based ANN, Pseudo-Inverse Based Solutions, Kernel Ridge Regression, Random Forest and Related Topics:
http://www.ntu.edu.sg/home/epnsugan/index_files/RNN-Moore-Penrose.htm
http://www.ntu.edu.sg/home/epnsugan/index_files/publications.htm
Consider submitting to the Swarm and Evolutionary Computation (SWEVO) journal, dedicated to the EC-SI fields. SCI indexed from Vol. 1, Issue 1.
2-Year IF = 3.8; 5-Year IF = 7.7
Overview
I. General Introduction
II. Introduction to Metaheuristics
III. Genetic Algorithms (GA)
IV. Evolution Strategy (ES), Differential Evolution (DE)
V. Particle Swarm Optimization (PSO)
VI. Harmony Search, Water Cycle, Jaya Algorithm
VII. Artificial Bee Colony (ABC), Group Search Algorithm, Cuckoo
Search (CSA) Firefly Algorithm (FA)
VIII.Conceptual Similarities & Hybridization
IX. Theoretical Studies
X. General Thoughts
XI. Future Directions
• What is the practical goal of (global)
optimization?
– “There exists a goal (e.g. to find as small a
value of f() as possible), there exist resources
(e.g. some number of trials), and the problem is
how to use these resources in an optimal way.”
– A. Torn and A. Zilinskas, Global Optimisation.
Springer-Verlag, 1989. Lecture Notes in
Computer Science, Vol. 350.
Global Optimization
Hard Optimization Problems
• Goal: Find x* such that f(x*) ≤ f(x), for all x ∈ S
– where S is often multi-dimensional; real/integer valued or binary (S ⊆ R^n or S ⊆ {0,1}^n)
– and subject to g_j(x) ≤ 0, j = 1,…,M and A_i ≤ x_i ≤ B_i, i = 1,…,N
– Many classes of optimization problems (and algorithms) exist.
– When might it be worthwhile to consider metaheuristic approaches?
• A candidate solution to an optimization problem specifies the values of the decision variables, and therefore also the value of the objective function.
• A feasible solution satisfies all constraints.
• An optimal solution is feasible and provides the best objective function value.
• A near-optimal solution is feasible and provides an objective function value close to the best, but not necessarily the best.
Types of Solutions
• Optimization problems can be continuous (an
infinite number of feasible solutions) or
combinatorial (a finite number of feasible solutions).
• Continuous problems generally maximize or minimize a function of continuous variables, such as min 4x + 5y where x and y are real numbers.
• Combinatorial problems generally maximize or minimize a function of discrete variables, such as min 4x + 5y where x and y are countable items (e.g. integers only).
• Mixed decision variables are also possible.
Continuous vs Combinatorial
• Combinatorial optimization is the mathematical study of finding an optimal arrangement, grouping, ordering, or selection of discrete objects, usually finite in number.
- Lawler, 1976
• In practice, combinatorial problems are often more difficult because there is no derivative information and the surfaces are not smooth.
• In general, nature-inspired metaheuristics are not effective for solving combinatorial problems compared to heuristics.
• However, hybrids are very effective.
Definition of Combinatorial Optimization
• Constraints can be hard (must be satisfied) or soft (desirable to satisfy).
Example: In your course schedule a hard constraint
is that no classes overlap. A soft constraint is that no
class be before 10 AM.
Constraints may be formulated as objectives and
multi-objective evolutionary algorithms can be used.
• Constraints can be explicit (stated in the problem) or implicit (implied by the problem).
Constraints
• TSP (Traveling Salesman Problem)
Given the coordinates of n cities, find the shortest closed tour which visits each city exactly once.
Example: In the TSP, an implicit constraint is that
all cities be visited once and only once.
Constraints
– Combinatorial, continuous problem
– Bound constrained problem
– Constrained problem
– Single / Multi-objective problem
– Static / Dynamic optimization problem
– Expensive Problem
– Large Scale problem
– With/out noise
– Multimodal problem (niching, crowding, etc.)
Aspects of an Optimization Problem
• Constructive search techniques work by constructing a
solution step by step, evaluating that solution for (a) feasibility
and (b) objective function.
• Improvement search techniques obtain a solution by moving
to a neighboring solution, evaluating that solution for (a)
feasibility and (b) objective function.
---------------------------------------
• Search techniques may be deterministic (always arrive at the
same final solution through the same sequence of solutions,
although they may depend on the initial solution). Examples
are LP (simplex method), tabu search, simple heuristics, and
greedy heuristics.
• Search techniques may be stochastic where the solutions
considered and their order are different depending on random
variables. Examples are metaheuristic methods.
Solution Approaches
Simple:
• Few decision variables
• Differentiable
• Unimodal
• Objective easy to calculate
• No or light constraints
• Feasibility easy to determine
• Single objective
• Deterministic

Hard:
• Many decision variables
• Discontinuous, combinatorial
• Multimodal
• Objective difficult to calculate
• Severely constrained
• Feasibility difficult to determine
• Multiple objectives
• Stochastic
• For Simple problems, enumeration or exact
methods such as differentiation or mathematical
programming or branch and bound will work the
best.
• For Hard problems, differentiation is not possible
and enumeration and other exact methods such as
math programming are not computationally
practical. For these problems, heuristics are used.
Search: the term used for constructing/improving solutions to obtain the optimum or near-optimum.
Solution: the encoding (representation) of a candidate solution.
Neighborhood: nearby solutions (in the encoding or solution space).
Move: transforming the current solution into another (usually neighboring) solution.
Evaluation: the solution's feasibility and objective function value.
Search Basics
Metaheuristics
A few points from: https://en.wikipedia.org/wiki/Metaheuristic
I. A heuristic (from Greek εὑρίσκω, "I find, discover") is a technique designed to find an approximate solution more quickly when classic methods are too slow, or to find any solution when classic methods fail to find an exact one. This is achieved by trading optimality, completeness, accuracy, or precision for speed.
II. Meta – beyond, for example inspiration from nature.
III. Metaheuristics may make few assumptions about the optimization
problem being solved, and so they may be usable for a variety of
problems.
IV. Many metaheuristic methods have been published with claims of
novelty and practical efficacy. While the field also features high-quality
research, unfortunately many of the publications have been of poor
quality; flaws include vagueness, lack of conceptual elaboration, poor
experiments, and ignorance of previous literature.
V. If you wish to be known as the "father of an invention", nature-inspired metaheuristics is a good field … ☺
Metaheuristics
I. When to use: only when traditional methods with proofs, etc. cannot offer a satisfactory solution within the available time.
– As these two research fields have diverged, cross-checking is not done frequently, i.e. recent developments in traditional methods are ignored …
II. Proofs for metaheuristics: a little contradictory, as metaheuristics are developed to complement mathematical programming methods with proofs, etc.
– Metaheuristics claim to make no assumptions about the problem being solved, while the mathematical theories developed for these metaheuristics make several assumptions, etc.
III. From "Metaheuristics – the Metaphor Exposed", by Kenneth Sorensen:
https://web.archive.org/web/20131102075645/http://antor.ua.ac.be/system/files/mme.pdf
Some of the Metaheuristics
https://en.wikipedia.org/wiki/List_of_metaphor-based_metaheuristics
Evolution strategy (1960s), Genetic algorithms (1970s)
– 1.1 Simulated annealing (Kirkpatrick et al. 1983)
– 1.2 Ant colony optimization (Dorigo, 1992)
– 1.3 Particle swarm optimization (Kennedy &
Eberhart 1995)
– 1.4 Harmony search (Geem, Kim & Loganathan
2001)
– 1.5 Artificial bee colony algorithm (Karaboga 2005)
– 1.6 Bees algorithm (Pham 2005)
– 1.7 Glowworm swarm optimization (Krishnanand &
Ghose 2005)
– 1.8 Shuffled frog leaping algorithm (Eusuff, Lansey
& Pasha 2006)
– 1.9 Imperialist competitive algorithm (Atashpaz-
Gargari & Lucas 2007)
– 1.10 River formation dynamics
(Rabanal, Rodríguez & Rubio 2007)
– 1.11 Intelligent water drops algorithm
(Shah-Hosseini 2007)
– 1.12 Gravitational search algorithm
(Rashedi, Nezamabadi-pour &
Saryazdi 2009)
– 1.13 Cuckoo search (Yang & Deb
2009)
– 1.14 Bat algorithm (Yang 2010)
– 1.15 Spiral optimization (SPO)
algorithm (Tamura & Yasuda
2011,2016-2017)
– 1.16 Flower pollination algorithm (Yang
2012)
– 1.17 Cuttlefish optimization
algorithm (Eesa, Mohsin, Brifcani &
Orman 2013)
– 1.18 Artificial swarm intelligence
(Rosenberg 2014)
– 1.19 Duelist Algorithm (Biyanto
2016)
– 1.20 Killer Whale Algorithm (Biyanto
2016)
– 1.21 Rain Water Algorithm (Biyanto
2017)
– 1.22 Mass and Energy Balances
Algorithm (Biyanto 2017)
– 1.23 Hydrological Cycle Algorithm
(Wedyan et al. 2017)
Never possible to have a complete list of metaheuristics …
More on Metaheuristics
• Dr Xin She Yang's Scholar page:
https://scholar.google.co.uk/citations?user=fA6aTlAAAAAJ
• Dr Seyedali Mirjalili's Scholar page:
https://scholar.google.com/citations?user=TJHmrREAAAAJ&hl=en
• Cultural Algorithms, Brainstorming, Fireworks,
Spider Monkey, Egyptian Vulture, Wolf, Grey Wolf,
…
The Genetic Algorithm (GA)
• Directed search algorithms based on the
mechanics of biological evolution
• Developed by John Holland and Ken De Jong, University of Michigan (1970s)
– To understand the adaptive processes of natural
systems
– To design artificial simulation software that
retains the robustness of natural systems
Evolution in the real world
• Each cell of a living thing contains chromosomes - strings of DNA
• Each chromosome contains a set of genes - blocks of DNA
• Each gene determines some aspect of the organism (like eye colour)
• A collection of genes is sometimes called a genotype
• A collection of aspects (like eye colour) is sometimes called a phenotype
• Reproduction of offspring involves recombination (or crossover) of genes from parents, followed by small amounts of mutation (errors) in copying
• The fitness of an organism is how much it can reproduce before it dies
• Evolution based on "survival of the fittest" is realized by the selection operation.
Emulating Evolution: GA
Generate a population of random chromosomes (i.e. potential solutions to the problem)
Repeat (each generation)
    Calculate fitness of each chromosome
    Repeat
        Select pairs of parents
        Generate offspring with crossover and mutation
    Until a new population has been produced
Until best solution is good enough
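The loop above can be sketched in Python as follows. The OneMax objective (count of 1-bits), binary tournament selection, and all parameter values are illustrative choices, not from the slides.

```python
import random

def run_ga(fitness, n_bits=20, pop_size=30, cx_rate=0.9, mut_rate=0.02,
           generations=100, seed=0):
    """Minimal generational GA on fixed-length bitstrings (maximisation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = [(fitness(c), c) for c in pop]          # fitness of each chromosome
        new_pop = []
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(scored, 2))[1]           # binary tournament selection
            p2 = max(rng.sample(scored, 2))[1]
            c1, c2 = p1[:], p2[:]
            if rng.random() < cx_rate:                   # one-point crossover
                cut = rng.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for c in (c1, c2):                           # bit-flip mutation
                for j in range(n_bits):
                    if rng.random() < mut_rate:
                        c[j] ^= 1
            new_pop.extend([c1, c2])
        pop = new_pop[:pop_size]
        best = max(pop + [best], key=fitness)            # remember the best-so-far
    return best

best = run_ga(sum)   # OneMax: fitness = number of 1-bits
```

On OneMax this sketch reliably reaches near-all-ones strings within a few dozen generations.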
How do you encode a solution?
• Obviously this depends on the problem!
• GAs often encode solutions as fixed-length bitstrings (e.g. 101110, 111111, …), although real-coded GAs now exist
• For the GA to work, we need to be able to “test” any string
and get a “score” (fitness) indicating how “good” that
solution is
• GA is a competitive method if the problem naturally has
binary solution. In this case, each bit represents a decision
variable.
• However, it is also possible to encode integers and floating-point values as bitstrings, using quantization with a prespecified range for each decision variable.
Search Space
• By encoding several decision variables into the chromosome, many dimensions can be searched, e.g. two dimensions f(x,y)
• The search space can be visualised as a surface or fitness landscape in which fitness dictates height
• Each possible genotype is a point in the space
• A GA tries to move the points to better places (higher fitness) in the space
• Obviously, the nature of the search space dictates how a GA will perform
– A completely random space would be bad for a GA
– Also, GAs can get stuck in local maxima if search spaces contain lots of these
– Generally, spaces in which small improvements get closer to the global optimum are good
Fitness landscapes
The figures are for 2-dimensional real decision
variables where vertical height represents fitness
of each point.
The above two landscapes are suitable for a GA to search, while the third is not.
Parent Selection, Crossover & Mutation
• Two highly fit parent bit strings are selected and
randomly combined to produce two new offspring
(bit strings).
• Many schemes are possible, so long as better-scoring chromosomes/parents are more likely to be selected
• "Roulette Wheel" selection can be used:
– Add up the fitnesses of all chromosomes
– Generate a random number R in that range
– Select the first chromosome in the population that, when all previous fitnesses are added, gives you at least the value R
• A few bits in each offspring may then be changed randomly (mutation)
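The roulette-wheel steps above translate directly into code; fitness values are assumed non-negative, a common precondition for this scheme.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)                    # add up the fitnesses
    r = rng.uniform(0, total)                 # random number R in that range
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit                        # accumulate previous fitnesses
        if running >= r:                      # first chromosome reaching at least R
            return chrom
    return population[-1]                     # guard against floating-point round-off
```

With fitnesses [1, 1, 8], the third chromosome should be picked roughly 80% of the time.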
Recombination by One Point Crossover
Parent1:    101000000
Parent2:    100101111
→ (crossover at a single random point) →
Offspring1: 1011011111
Offspring2: 100000000
With some high probability (the crossover rate) apply crossover to the parents (typical values are 0.8 to 0.95).
Other alternatives are 2-point crossover and uniform crossover:
– 2-point crossover is a simple extension of one-point crossover
– Uniform crossover randomly mixes bits from the two parents to obtain 2 offspring
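One-point and uniform crossover as described above can be sketched as follows, assuming a list-of-bits representation.

```python
import random

def one_point_crossover(p1, p2, rng=random):
    """Cut both parents at one random point and swap the tails."""
    cut = rng.randrange(1, len(p1))     # cut point strictly inside the string
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def uniform_crossover(p1, p2, rng=random):
    """Each position is independently inherited from either parent."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < 0.5:          # randomly mix bits from the two parents
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return c1, c2
```

Either operator is applied with probability equal to the crossover rate (0.8 to 0.95 above); otherwise the parents are copied unchanged.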
Mutation
Original offspring:
1011011111
100000000
Mutated offspring (mutated bits underlined in the slide):
1011001111
101000000
With some small probability (the mutation rate) flip each bit in the offspring (typical values between 0.1 and 0.001, or just about a couple of bits).
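Bit-flip mutation is a one-liner; the 5% rate below is purely illustrative (typical GA rates are much lower, as noted above).

```python
import random

def mutate(bits, rate=0.05, rng=random):
    """Flip each bit independently with probability `rate` (the mutation rate)."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]

child = mutate([0] * 1000)   # expect roughly 5% of the bits to flip
```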
Many Variants of GA
• Different parent selection methods to generate offspring
– Tournament
– Elitism
– Roulette wheel, etc.
• Different recombination
– Multi-point crossover
– Multi-parent crossover, etc.
• Different kinds of encoding other than bitstring
– Integer values
– Floating-point values
– Categorical, symbolic, etc.
• Different kinds of mutation (for non-binary encoding)
Many parameters/Operators to set
• Any GA implementation needs to decide on a number of parameters: Population size (N), mutation rate (m), crossover rate (c)
• Typical parameter (arbitrary) values might be: N = 50, m = 0.05, c = 0.9
• Selection of operators such as 1-point / 2-point / uniform crossover, roulette wheel / tournament selection, etc.
• Often these have to be “tuned” based on results obtained - no general theory to deduce good values
• Adaptive methods with ensemble of operators and parameter values can also be used.
Why does GA work?
• Some theories about this and some controversy
• Holland introduced Schema theory
• The idea is that crossover preserves “good bits” from different parents, combining them to produce better solutions
• A good encoding scheme would therefore try to preserve “good bits” during crossover and mutation
Evolution Strategy (ES, from 1960s)
• Real variable optimizer
1. Randomly generate an initial population of M solutions and compute their fitness values.
2. Use all vectors as parents to create nb offspring vectors by Gaussian mutation: X_new = X_old + N(0, σ²).
3. Calculate the fitness of the nb offspring vectors, and prune the combined population to the M fittest vectors.
4. Stop if the stopping condition is satisfied; otherwise, go to Step 2.
5. Choose the fittest one in the population of the last generation as the optimal solution.
• CMA-ES is a competitive state of the art variant of ES.
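A minimal (M + nb) ES with a fixed mutation step σ, following the numbered steps above; the sphere objective, initialization bounds, and parameter values are illustrative assumptions.

```python
import random

def evolution_strategy(f, dim, m=10, nb=40, sigma=0.3, generations=200, seed=0):
    """(M + nb) ES: Gaussian mutation X_new = X_old + N(0, sigma^2), keep M fittest."""
    rng = random.Random(seed)
    # Step 1: random initial population of M solutions
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(m)]
    for _ in range(generations):
        # Step 2: all members act as parents for nb Gaussian-mutated offspring
        offspring = [[x + rng.gauss(0, sigma) for x in rng.choice(pop)]
                     for _ in range(nb)]
        # Step 3: prune parents + offspring down to the M fittest (minimisation)
        pop = sorted(pop + offspring, key=f)[:m]
    return pop[0]   # Step 5: fittest member of the final generation

best = evolution_strategy(lambda x: sum(v * v for v in x), dim=3)
```

Modern variants such as CMA-ES adapt the full mutation distribution instead of using a fixed σ.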
Differential Evolution
• A stochastic population-based algorithm for continuous function optimization (Storn and Price, 1995)
• Finished 3rd at the First International Contest on Evolutionary Computation, Nagoya, 1996 (icsi.berkley.edu/~storn)
• Outperformed several variants of GA and PSO over a wide variety of numerical benchmarks over the past several years.
• Continually exhibited remarkable performance in competitions on different kinds of optimization problems, such as dynamic, multi-objective, constrained, and multi-modal problems, held under the IEEE Congress on Evolutionary Computation (CEC) conference series.
• Very easy to implement in any programming language.
• Very few control parameters (typically three for a standard DE), and their effects on the performance have been well studied.
• Complexity is very low compared to some of the most competitive continuous optimizers like CMA-ES.
DE is an Evolutionary Algorithm
This class also includes GA, Evolutionary Programming, and Evolution Strategies.
Basic steps of an Evolutionary Algorithm: Initialization → Mutation → Recombination → Selection
Representation
Solutions are represented as vectors X = (x_1, x_2, …, x_{D-1}, x_D) of size D, with each value taken from some domain.
We may wish to constrain the values taken in each dimension above (Max) and below (Min).
Population Size – NP
We will maintain a population of size NP:
X_1 = (x_{1,1}, x_{2,1}, …, x_{D-1,1}, x_{D,1})
X_2 = (x_{1,2}, x_{2,2}, …, x_{D-1,2}, x_{D,2})
…
X_NP = (x_{1,NP}, x_{2,NP}, …, x_{D-1,NP}, x_{D,NP})
Initialization Mutation Recombination Selection
Different values are instantiated for each i and j, uniformly within the bounds [Min, Max]:
x_{j,i,0} = x_{j,min} + rand_{i,j}[0,1] · (x_{j,max} − x_{j,min})
where rand_{i,j}[0,1] is a uniform random number in [0,1].
Initialization Mutation Recombination Selection
➢ For each vector, select three other parameter vectors randomly.
➢ Add the weighted difference of two of the parameter vectors to the third to form a donor vector (most commonly seen form of DE-mutation):
V_{i,G} = X_{r1,G} + F · (X_{r2,G} − X_{r3,G})
➢ The scaling factor F is a constant from (0, 2)
➢ Self-referential mutation
Initialization Mutation Recombination Selection
Components of the donor vector enter into the trial offspring vector via binomial (uniform) crossover. Let j_rand be a randomly chosen integer between 1,…,D:
u_{j,i,G} = v_{j,i,G} if rand_{i,j}[0,1] ≤ Cr or j = j_rand; otherwise u_{j,i,G} = x_{j,i,G}
Initialization Mutation Recombination Selection
➢ "Survival of the fitter" principle in selection: the trial offspring vector is compared with the target (parent) vector, and the one with the better fitness is admitted to the next-generation population:
X_{i,G+1} = U_{i,G} if f(U_{i,G}) ≤ f(X_{i,G}); otherwise X_{i,G+1} = X_{i,G}
➢ Importance of parent-mutant crossover & parent-offspring competition-based selection
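Putting the four stages together gives a compact DE/rand/1/bin sketch; bounds, F, Cr, and the population size NP below are illustrative choices within the ranges discussed above.

```python
import random

def de_rand_1_bin(f, dim, bounds=(-5.0, 5.0), np_=20, F=0.5, Cr=0.9,
                  generations=300, seed=0):
    """DE/rand/1 mutation + binomial crossover + greedy selection (minimisation)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[lo + rng.random() * (hi - lo) for _ in range(dim)]   # initialization
           for _ in range(np_)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(np_):
            r1, r2, r3 = rng.sample([k for k in range(np_) if k != i], 3)
            # mutation: donor V = X_r1 + F * (X_r2 - X_r3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(dim)]
            # binomial crossover; j_rand guarantees one donor component survives
            j_rand = rng.randrange(dim)
            u = [v[j] if (rng.random() <= Cr or j == j_rand) else pop[i][j]
                 for j in range(dim)]
            fu = f(u)
            if fu <= fit[i]:                 # parent-offspring greedy selection
                pop[i], fit[i] = u, fu
    b = min(range(np_), key=lambda i: fit[i])
    return pop[b], fit[b]

x, fx = de_rand_1_bin(lambda x: sum(v * v for v in x), dim=5)
```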
Particle Swarm Optimizer
• Introduced by Kennedy and Eberhart in 1995
• Emulates flocking behavior of birds to solve optimization problems
• Each solution in the landscape is a particle
• All particles have fitness values and velocities
Particle Swarm Optimizer
• Two versions of PSO
– Global version (may not be used alone to solve multimodal problems): learning from the personal best (pbest) and the best position achieved by the whole population (gbest):
V_i^d = ω·V_i^d + c1·rand1_i^d·(pbest_i^d − X_i^d) + c2·rand2_i^d·(gbest^d − X_i^d)
X_i^d = X_i^d + V_i^d
– Local version: learning from the pbest and the best position achieved in the particle's neighborhood population (lbest):
V_i^d = ω·V_i^d + c1·rand1_i^d·(pbest_i^d − X_i^d) + c2·rand2_i^d·(lbest_k^d − X_i^d)
X_i^d = X_i^d + V_i^d
• The random numbers (rand1 & rand2) should be generated for each dimension of each particle in every iteration.
(i – particle counter, d – dimension counter)
Particle Swarm Optimizer
• c1 and c2 in the equations are the acceleration constants; rand1_i^d and rand2_i^d are two random numbers in the range [0,1];
• X_i = (X_i^1, X_i^2, …, X_i^D) represents the position of the i-th particle;
• V_i = (V_i^1, V_i^2, …, V_i^D) represents the rate of position change (velocity) of particle i;
• pbest_i = (pbest_i^1, pbest_i^2, …, pbest_i^D) represents the best previous position (the position giving the best fitness value) of the i-th particle;
• gbest = (gbest^1, gbest^2, …, gbest^D) represents the best previous position of the population;
• lbest_k = (lbest_k^1, lbest_k^2, …, lbest_k^D) represents the best previous position achieved by the neighborhood of the particle.
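The global-best update can be sketched as follows, drawing rand1 and rand2 afresh for each dimension of each particle as required above; the bounds and parameter values (ω = 0.7, c1 = c2 = 1.5) are illustrative assumptions.

```python
import random

def pso(f, dim, n_particles=20, w=0.7, c1=1.5, c2=1.5, iters=200, seed=0):
    """Global-best PSO for minimisation."""
    rng = random.Random(seed)
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_f = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()      # fresh per dimension
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            fx = f(X[i])
            if fx < pbest_f[i]:                          # update personal best
                pbest[i], pbest_f[i] = X[i][:], fx
                if fx < gbest_f:                         # update global best
                    gbest, gbest_f = X[i][:], fx
    return gbest, gbest_f

x, fx = pso(lambda x: sum(v * v for v in x), dim=5)
```

Swapping gbest for a neighborhood-wide lbest turns this into the local version.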
Harmony Search Algorithm
The following work suggests that HSA can be derived from ES:
"Metaheuristics – the Metaphor Exposed", by Kenneth Sorensen:
https://web.archive.org/web/20131102075645/http://antor.ua.ac.be/system/files/mme.pdf
Geem, Zong Woo, Joong Hoon Kim, and G. V. Loganathan (2001). A new heuristic optimization
algorithm: harmony search. Simulation 76.2: pp. 60-68.
Kai-Zhou Gao, Ponnuthurai N Suganthan, Quan-Ke Pan. Pareto-based grouping discrete
harmony search algorithm for multi-objective flexible job shop scheduling. Information sciences,
289: 76-90, 2014
Water Cycle Algorithm: Basic Concept
(Figure: streams flow into rivers, and rivers flow into the sea.)
Hadi Eskandar, Ali Sadollah, Ardeshir Bahreininejad, Mohd Hamdi. Water cycle algorithm – A novel metaheuristic optimization method for solving constrained engineering optimization problems. Computers & Structures, 110-111: 151-166, 2012.
Kaizhou Gao, Peiyong Duan, Rong Su, Junqing Li. Bi-objective Water Cycle Algorithm for Solving Remanufacturing Rescheduling Problem. Proceedings of Asia-Pacific Conference on Simulated Evolution and Learning (SEAL 2017), 671-683.
Schematic View of the Water Cycle Process
Nature → Water Cycle Algorithm
Precipitation → Initial population
Stream(s) → Individual(s) of the population
River(s) → Second-best solutions (a number of best solutions)
Sea → Best solution (optimum solution)
Surface runoff → Moving streams to rivers, and rivers to the sea
Evaporation → Evaporation condition
Water cycle process → Iteration
Water Cycle Algorithm: Movement Strategy
X_Stream(t+1) = X_Stream(t) + rand · C · (X_River(t) − X_Stream(t))
X_Stream(t+1) = X_Stream(t) + rand · C · (X_Sea(t) − X_Stream(t))
X_River(t+1) = X_River(t) + rand · C · (X_Sea(t) − X_River(t))
The total population of size N_pop is sorted by quality: the best solution is the sea, the next N_SR − 1 solutions are the rivers (so N_SR = 1 sea + number of rivers), and the remaining N_Streams = N_pop − N_SR solutions are the streams.
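All three movement rules above share the same form, so they reduce to a single helper; C = 2 and the two-dimensional example are illustrative choices (values of C slightly greater than 1 are commonly used).

```python
import random

C = 2.0   # movement constant, typically chosen slightly greater than 1

def move_towards(follower, leader, rng=random):
    """X(t+1) = X(t) + rand * C * (X_leader(t) - X(t)), applied per dimension."""
    return [x + rng.random() * C * (l - x) for x, l in zip(follower, leader)]

# a stream moving towards the sea (best solution)
moved = move_towards([0.0, 0.0], [1.0, 1.0])
```

Since rand·C lies in [0, 2), a follower can overshoot its leader, which helps exploration.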
Jaya Algorithm
• The definition of Jaya is victory in Sanskrit
• In the Jaya algorithm, the applied strategy always tries to become victorious by reaching the best solution; hence the name Jaya. It is reported as a simple and applicable optimization approach in the literature:
X_i(t+1) = X_i(t) + r1·(X_Best(t) − X_i(t)) − r2·(X_Worst(t) − X_i(t)), i = 1, 2, …, N_Pop
R. Venkata Rao, Jaya: A simple and new optimization algorithm for solving constrained and unconstrained optimization problems. International Journal of Industrial Engineering Computations, 7: 19–34, 2016.
K Gao, Y Zhang, A Sadollah, A Lentzakis, R Su. Jaya, harmony search and water cycle algorithms for solving large-scale real-life urban traffic light scheduling problem. Swarm and Evolutionary Computation 37, 58-72, 2017.
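A minimal Jaya sketch based on the update equation above, with greedy acceptance of improving candidates. Note that Rao's original formulation applies absolute values to the current position inside both difference terms, omitted here for simplicity; the sphere objective and parameter values are illustrative.

```python
import random

def jaya(f, dim, pop_size=20, iters=300, bounds=(-5.0, 5.0), seed=0):
    """Jaya: move towards the best and away from the worst (minimisation)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(iters):
        b = min(range(pop_size), key=lambda i: fit[i])
        w = max(range(pop_size), key=lambda i: fit[i])
        best, worst = pop[b], pop[w]
        for i in range(pop_size):
            r1, r2 = rng.random(), rng.random()
            cand = [x + r1 * (xb - x) - r2 * (xw - x)   # towards best, away from worst
                    for x, xb, xw in zip(pop[i], best, worst)]
            fc = f(cand)
            if fc < fit[i]:                             # keep only improvements
                pop[i], fit[i] = cand, fc
    b = min(range(pop_size), key=lambda i: fit[i])
    return pop[b], fit[b]

x, fx = jaya(lambda x: sum(v * v for v in x), dim=2)
```

Jaya's selling point is that it has no algorithm-specific control parameters beyond population size and iteration count.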
Bees in Nature
ABC Algorithm: Artificial Bee Colony Algorithm
D. Karaboga, An Idea Based on Honey Bee Swarm for Numerical Optimization, Technical Report TR06, Erciyes University, Engineering Faculty, Computer Engineering Department, 2005.
Bees in Nature
1. A colony of honey bees can extend itself over long distances in multiple directions (more than 10 km).
2. Flower patches with plentiful amounts of nectar or pollen that can be collected with less effort should be visited by more bees, whereas patches with less nectar or pollen should receive fewer bees.
Bees in Nature
3. The bees who return to the hive evaluate the different patches against a certain quality threshold (measured as a combination of some elements, such as sugar content).
Bees in Nature
4. They deposit their nectar or pollen and go to the "dance floor" to perform a "waggle dance".
Bees in Nature
5- Bees communicate through this waggle
dance which contains the following information:
1. The direction of flower patches
(angle between the sun and the
patch)
2. The distance from the hive
(duration of the dance)
3. The quality rating (fitness)
(frequency of the dance)
Bees in Nature
This information helps the colony to send its bees precisely.
6. Follower bees go after the dancer bee to the patch to gather food efficiently and quickly.
7. The same patch will be advertised in the waggle dance again upon returning to the hive if it is still good enough as a food source (depending on the food level), and more bees will be recruited to that source.
8. More bees visit flower patches with plentiful amounts of nectar or pollen.
Bees in Nature
Thus, according to the fitness, patches can
be visited by more bees or may be
abandoned
Artificial Bee Colony Algorithm
Initialization
• The food sources, indicating probable solutions to the objective function, are represented as real vectors indexed by the running index i, which can take values {1, 2, …, SN} (SN being the number of food sources), and the cycle number c. The j-th component of the i-th food source is initialized as
x_{i,j}(0) = x_j^min + rand(0,1) · (x_j^max − x_j^min)
where j ∈ {1, 2, …, D} for a D-dimensional problem.
The food sources continue to improve upon their previous values till the termination criteria are met.
Artificial Bee Colony Algorithm
Employed Bee Phase
• Each food source has an employed forager associated with it, and their population size equals the number of food sources. An employed bee modifies the position of the food source and searches an adjacent food source with respect to a single dimension as:
v_{i,j} = x_{i,j} + φ_{i,j} · (x_{i,j} − x_{k,j})
where φ_{i,j} is a random number in [−1, 1] and k ≠ i is a randomly chosen source index. Now the fitness of the newly produced site is calculated:
fitness_i = 1 / (1 + f_i) if f_i ≥ 0; fitness_i = 1 + |f_i| if f_i < 0
where f(·) denotes the objective function. The fitter one is selected by greedy selection.
Artificial Bee Colony Algorithm
Onlooker Bee Phase
• The onlooker bees select a food source based on its nectar content. The chance of selecting food source i is determined by its probability value, computed as
p_i = fitness_i / Σ_{i=1}^{SN} fitness_i
A random number is generated in the range [0, 1]; if this value is less than the probability p_i for a given food source, that source is selected and its positional modification is done, identically to the employed bee phase. This is carried on till all the onlookers have been allotted a food source.
Artificial Bee Colony Algorithm
Scout Bee Phase
• The scout phase was designed to prevent the wastage of
computational resources.
• If the trial counter for a food source exceeds a predefined limit, the source is deemed exhausted, such that no further improvement is possible.
• In such a case, the source is randomly reinitialized, its fitness value is re-evaluated, and its counter is reset to 0.
• This is motivated from the fact that repeated exploitation of
nectar by a forager causes the exhaustion of the nectar content.
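The three phases combine into a compact ABC sketch; the sphere objective, bounds, SN, limit, and cycle count are illustrative assumptions.

```python
import random

def abc_fitness(fv):
    """The fitness mapping above: 1/(1+f) if f >= 0, else 1 + |f| (maximised)."""
    return 1.0 / (1.0 + fv) if fv >= 0 else 1.0 + abs(fv)

def abc(f, dim, sn=10, limit=20, cycles=200, bounds=(-5.0, 5.0), seed=0):
    """Compact ABC with employed, onlooker and scout phases (minimisation)."""
    rng = random.Random(seed)
    lo, hi = bounds
    foods = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(sn)]
    fvals = [f(x) for x in foods]
    trials = [0] * sn

    def try_improve(i):
        """Perturb one random dimension relative to a random other source."""
        k = rng.choice([s for s in range(sn) if s != i])
        j = rng.randrange(dim)
        v = foods[i][:]
        v[j] = foods[i][j] + rng.uniform(-1, 1) * (foods[i][j] - foods[k][j])
        fv = f(v)
        if abc_fitness(fv) > abc_fitness(fvals[i]):      # greedy selection
            foods[i], fvals[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(sn):                              # employed bee phase
            try_improve(i)
        fits = [abc_fitness(fv) for fv in fvals]         # onlooker phase (roulette)
        total = sum(fits)
        for _ in range(sn):
            r, acc, i = rng.uniform(0, total), 0.0, 0
            for i, ft in enumerate(fits):
                acc += ft
                if acc >= r:
                    break
            try_improve(i)
        for i in range(sn):                              # scout phase
            if trials[i] > limit:                        # exhausted source: reset
                foods[i] = [rng.uniform(lo, hi) for _ in range(dim)]
                fvals[i], trials[i] = f(foods[i]), 0

    b = min(range(sn), key=lambda i: fvals[i])
    return foods[b], fvals[b]

x, fx = abc(lambda x: sum(v * v for v in x), dim=3)
```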
Cuckoo Search Algorithm
• Cuckoos have a parasitic breeding behaviour: they engage in obligate brood parasitism by laying their eggs in the nests of other host birds (often other species).
• Three basic types of brood parasitism: intra-specific brood parasitism, cooperative breeding, and nest takeover.
• If a host bird discovers the eggs are not its own, it will either throw these alien eggs away or simply abandon the nest and build a new nest elsewhere.
• Some cuckoo species, such as the New World brood-parasitic Tapera, have evolved in such a way that female parasitic cuckoos are often very specialized in mimicking the color and pattern of the eggs of a few chosen host species.
• The timing of egg-laying of some species is also amazing.
X.-S. Yang; S. Deb (December 2009). "Cuckoo search via Lévy flights". World Congress
on Nature & Biologically Inspired Computing (NaBIC 2009). IEEE Publications. pp. 210–
214.
Cuckoo Search Algorithm
• In the CS algorithm, each egg in a nest represents a solution, and a cuckoo egg represents a new solution; the aim is to use the new and potentially better solutions (cuckoos) to replace not-so-good solutions in the nests.
• The basic steps of the CS algorithm involve:
1) Each cuckoo lays one egg at a time, and dumps its egg in a randomly chosen nest;
2) The best nests with high-quality eggs will carry over to the next generations;
3) The number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability in [0, 1].
• A Lévy flight is used in generating new cuckoo solutions:
x_i(t+1) = x_i(t) + α ⊕ Lévy(λ)
where α > 0 is the step size, related to the scales of the problem of interest (in most cases α = 1), and the product ⊕ means entry-wise multiplication. Lévy flights essentially provide a random walk whose random steps are drawn from a Lévy distribution:
Lévy ~ u = t^(−λ)
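Cuckoo search needs Lévy-distributed steps; Mantegna's algorithm is one common way to draw them (β = 1.5 is a typical choice, and the helper names here are my own, not from the slides).

```python
import math
import random

def levy_step(beta=1.5, rng=random):
    """Draw one Levy-distributed step via Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)
    u = rng.gauss(0, sigma_u)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)       # heavy-tailed step length

def new_cuckoo(x, alpha=1.0, rng=random):
    """x(t+1) = x(t) + alpha * Levy step, applied entry-wise."""
    return [xi + alpha * levy_step(rng=rng) for xi in x]

x1 = new_cuckoo([0.0, 0.0, 0.0])
```

The heavy tail means most steps are small (local search), while occasional long jumps help escape local optima.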
Group Search Optimization (GSO)
• GSO is inspired by the food-searching behavior
and group-living theory of social animals, such as
birds, fish and lions.
• The foraging strategies of these animals
mainly include: producing, i.e., searching
for food; and joining (scrounging), i.e.,
joining resources uncovered by others.
• GSO also employs ‘rangers’ which
perform random walks to avoid entrapment
in local minima.
S. He, Q.H. Wu, J.R. Saunders, Group Search Optimizer: an optimization algorithm
Inspired by animal searching behavior, IEEE Transactions on Evolutionary
Computation 13 (October (5)) (2009) 973–990. 74
Members of GSO
In GSO, a group consists of three kinds of members:
• Producers
• Scroungers
• Rangers
The producers, scroungers and rangers do not differ in their relevant phenotypic characteristics; therefore, they can switch among the three roles.
The framework mainly follows the Producer–Scrounger (PS) model, which states that group members search either for “finding” (producer) or for “joining” (scrounger) opportunities.
75
• At each iteration, the group member located in the most promising area,
i.e., conferring the best fitness value, is chosen as the producer. It stays
at its position in this most promising area.
• The other group members are randomly selected as scroungers or
rangers. Then, each scrounger makes a random walk towards the producer,
and each ranger makes a random walk in an arbitrary direction.
Individual Representation in GSO
The i-th member at the k-th iteration is located at a position X_i^k ∈ R^n.
The associated head angle is φ_i^k = (φ_i1^k, ..., φ_i(n−1)^k) ∈ R^(n−1).
The search direction associated with the i-th member is represented as a
unit vector D_i^k(φ_i^k) = (d_i1^k, ..., d_in^k) ∈ R^n.
This vector can be obtained from the head angle via a polar-to-Cartesian
coordinate transformation:

d_i1^k = ∏_{q=1}^{n−1} cos(φ_iq^k)

d_ij^k = sin(φ_i(j−1)^k) ∏_{q=j}^{n−1} cos(φ_iq^k),  j = 2, 3, ..., n−1

d_in^k = sin(φ_i(n−1)^k)
76
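The transformation above can be sketched directly in Python; `head_angle_to_direction` is a hypothetical helper name, and the code simply evaluates the three formulas to produce a unit vector.

```python
import numpy as np

def head_angle_to_direction(phi):
    """Convert an (n-1)-dimensional head angle into an n-dimensional
    unit search direction via the polar-to-Cartesian transformation:
      d_1 = prod_{q=1}^{n-1} cos(phi_q)
      d_j = sin(phi_{j-1}) * prod_{q=j}^{n-1} cos(phi_q),  j = 2..n-1
      d_n = sin(phi_{n-1})
    """
    n = len(phi) + 1
    d = np.empty(n)
    d[0] = np.prod(np.cos(phi))
    for j in range(2, n):              # j = 2..n-1 in 1-based indexing
        d[j - 1] = np.sin(phi[j - 2]) * np.prod(np.cos(phi[j - 1:]))
    d[n - 1] = np.sin(phi[-1])
    return d
```

Because this is the standard hyperspherical parameterization, the resulting vector always has unit length, so only the (n−1) angles need to be stored per member.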
Producer Activities
At the k-th iteration, the producer X_p behaves as follows:
• The producer scans at zero degree and then scans
laterally by randomly sampling three points in the
scanning field. One point at zero degree:

X_z = X_p^k + r_1 l_max D_p^k(φ^k)

• One point in the left-hand-side hypercube:

X_l = X_p^k + r_1 l_max D_p^k(φ^k − r_2 θ_max / 2)

• And one point in the right-hand-side hypercube:

X_r = X_p^k + r_1 l_max D_p^k(φ^k + r_2 θ_max / 2)

where r_1 is a normally distributed random number with
mean 0 and standard deviation 1, r_2 ∈ R^(n−1) is a uniformly
distributed random sequence in the range (0, 1), l_max is the
maximum pursuit distance, and θ_max is the maximum pursuit angle.
77
Contd.
• The producer will then find the point with the best
resource (fitness value). If that point has a better
resource than its current position, it will fly to this
point; otherwise it will stay in its current position and
turn its head to a new randomly generated angle:

φ^(k+1) = φ^k + r_2 α_max

where α_max is the maximum turning angle and r_2 ∈ R^(n−1)
is a uniformly distributed random sequence in the range (0, 1).
• If the producer cannot find a better area after a iterations,
it will turn its head back to zero degree:

φ^(k+a) = φ^k

where a is a constant.
78
Scrounger Dynamics
• At the k-th iteration, the area-copying behavior of the i-th
scrounger can be modeled as a random walk towards
the producer:

X_i^(k+1) = X_i^k + r_3 ∘ (X_p^k − X_i^k)

where r_3 ∈ R^n is a uniform random sequence in the
range (0, 1) and ∘ denotes entry-wise multiplication.
79
Ranger Movements
• Besides the producer and the scroungers, a small number of
rangers have also been introduced into the GSO algorithm.
• Random walks, which are thought to be the most efficient
searching method for randomly distributed resources, are
employed by rangers.
• If the i-th member is selected as a ranger, at the k-th
iteration, it generates a random head angle:

φ^(k+1) = φ^k + r_2 α_max

• then it chooses a random distance:

l_i = a · r_1 · l_max

and moves to the point:

X_i^(k+1) = X_i^k + l_i D_i^k(φ^(k+1))
80
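The scrounger and ranger updates above can be sketched together in a minimal Python illustration (function names are hypothetical; `direction` repeats the polar-to-Cartesian transformation of the head angle given earlier):

```python
import numpy as np

def direction(phi):
    """Unit search direction from an (n-1)-dim head angle
    (polar-to-Cartesian transformation)."""
    n = len(phi) + 1
    d = np.empty(n)
    d[0] = np.prod(np.cos(phi))
    for j in range(2, n):
        d[j - 1] = np.sin(phi[j - 2]) * np.prod(np.cos(phi[j - 1:]))
    d[-1] = np.sin(phi[-1])
    return d

def scrounger_move(x_i, x_p, rng):
    # X_i^(k+1) = X_i^k + r3 o (X_p^k - X_i^k), r3 ~ U(0,1)^n
    r3 = rng.uniform(0.0, 1.0, x_i.shape)
    return x_i + r3 * (x_p - x_i)

def ranger_move(x_i, phi, a, l_max, alpha_max, rng):
    # random head turn, random distance l_i = a * r1 * l_max,
    # then a step along the new search direction
    r1 = rng.standard_normal()
    r2 = rng.uniform(0.0, 1.0, phi.shape)
    phi_new = phi + r2 * alpha_max
    l_i = a * r1 * l_max
    return x_i + l_i * direction(phi_new), phi_new
```

Note that the scrounger always lands somewhere on the segment between its position and the producer, while the ranger's step length is unbounded in sign, which is what gives GSO its escape mechanism from local minima.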
Behavior of Fireflies
• There are about two thousand firefly species, and most
fireflies produce short, rhythmic flashes which are unique
to a particular species.
• The flashing light is produced by a process of
bioluminescence.
• The fundamental functions of such flashes are:
– to attract mating partners (communication),
– to attract potential prey,
– flashing may also serve as a protective warning mechanism.
• Both sexes of fireflies are brought together via the rhythmic flash; the rate of flashing and the amount of time form part of the signal system.
81
• The flashing light produced by fireflies can be
formulated to be associated with the objective
function to be optimized, which makes it possible
to formulate new optimization algorithms.
83
Firefly Algorithm
• For simplicity in describing the Firefly Algorithm (FA),
we use the following three idealized rules:
– All fireflies are unisex, so that one firefly will be attracted to other fireflies
regardless of their sex;
– Attractiveness is proportional to brightness, thus for any two
flashing fireflies, the less bright one will move towards the brighter
one. The attractiveness is proportional to the brightness and both
decrease as the distance between them increases. If there is no firefly
brighter than a particular firefly, it will move randomly;
– The brightness of a firefly is affected or determined by the landscape of
the objective function.
• Other forms of brightness can be defined in a similar
way to the fitness function in genetic algorithms.
84
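The three rules above can be turned into a single update step. The sketch below uses the standard attractiveness form β0·exp(−γr²) from Yang's FA formulation; since the slides state only the rules, that exact expression (and the function name) is an assumption here, not taken from the slides.

```python
import numpy as np

def firefly_move(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2, rng=None):
    """Move firefly i towards a brighter firefly j. Attractiveness
    beta0 * exp(-gamma * r^2) decays with distance r, so both rules
    hold: brighter attracts, and attraction fades with distance.
    The last term is a small random walk (rule for when no brighter
    firefly exists, and general exploration)."""
    if rng is None:
        rng = np.random.default_rng()
    r2 = np.sum((x_i - x_j) ** 2)
    beta = beta0 * np.exp(-gamma * r2)
    return x_i + beta * (x_j - x_i) + alpha * (rng.random(x_i.shape) - 0.5)
```

With γ → 0 every firefly sees every other at full brightness and FA degenerates to a variant of PSO; with γ large, movement becomes almost entirely local, which is why γ controls the exploration/exploitation balance.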
OverviewI. General Introduction
II. Introduction to Metaheuristics
III. Genetic Algorithms (GA)
IV. Evolution Strategy (ES), Differential Evolution (DE)
V. Particle Swarm Optimization (PSO)
VI. Harmony Search, Water Cycle, Jaya Algorithm
VII. Artificial Bee Colony (ABC), Group Search Algorithm, Cuckoo
Search (CSA), Firefly Algorithm (FA)
VIII.Conceptual Similarities & Hybridization
IX. Theoretical Studies
X. General Thoughts
XI. Future Directions 92
• The similarities often stem from the very objective of designing
good, efficient metaheuristics, which:
– Should have a good trade-off between exploitation and exploration
– Must be able to rapidly converge to the global optimum
– Should not impose serious computational overheads
– Should have very few (or no) algorithmic control parameters except
the general ones (population size, total no. of iterations, problem
dimensions, etc.)
• The nature-inspired story lines are different. When the story
is converted to equations, they become too similar!!
97
Hybridization Aspects
• Synergy of two or more SI algorithms may lead to
new global optimizers with great performance on a
set of problems.
• The biggest challenge is to determine
– which algorithms to combine?
– which components should be taken?
• Hybridization must perform better than each
component algorithm (state-of-the-art variants).
• Successful in combinatorial cases: nature-inspired
algorithms + local search heuristics.
• Limited research community & less successful in
real-variable optimization. 98
Genetic Programming (from the 1960s/70s)
• When the chromosome encodes an entire
program or function itself, this is called
genetic programming (GP).
• In order to make this work, encoding is often
done in the form of a tree representation.
• Crossover entails swapping subtrees
between parents.
• Successful in symbolic regression tasks.
Genetic Programming
It is possible to evolve whole programs like this, but
only small ones. Large programs with complex
functions present big problems.
OverviewI. General Introduction
II. Introduction to Metaheuristics
III. Genetic Algorithms (GA)
IV. Evolution Strategy (ES), Differential Evolution (DE)
V. Particle Swarm Optimization (PSO)
VI. Harmony Search, Water Cycle, Jaya Algorithm
VII. Artificial Bee Colony (ABC), Group Search Algorithm, Cuckoo
Search (CSA) Firefly Algorithm (FA)
VIII.Conceptual Similarities & Hybridization
IX. Theoretical Studies
X. General Thoughts
XI. Future Directions 101
Theoretical Studies
• Some works address population-based methods, though they rely
on simplifications such as a single global minimum, a smooth
objective function, or constant coefficients instead of stochastic
ones. Bacterial Foraging analyses assume a small population.
• Other works are mostly on (1+1) EAs and very easy binary
problems (like OneMax: maximize the sum of bits in a bit string).
• My standard review question: “Can you demonstrate that your
theory can overall improve performance of a state-of-the-art
algorithm?”
• The answer is always “no”, as theories are for simplified models
while state-of-the-art methods are too complicated for exact
analysis.
• Proofs & rigorous analysis for metaheuristics: a little contradictory, as
metaheuristics were developed to complement mathematical
programming methods that come with proofs, etc. 102
Theoretical Studies
• Sayan Ghosh, Swagatam Das, Athanasios V. Vasilakos, Kaushik
Suresh: On Convergence of Differential Evolution Over a Class of
Continuous Functions With Unique Global Optimum. IEEE Trans.
Systems, Man, and Cybernetics, Part B 42(1): 107-124 (2012).
• Swagatam Das, Arpan Mukhopadhyay, Anwit Roy, Ajith Abraham,
Bijaya K. Panigrahi: Exploratory Power of the Harmony Search
Algorithm: Analysis and Improvements for Global Numerical
Optimization. IEEE Trans. Systems, Man, and Cybernetics, Part B
41(1): 89-106 (2011).
• Sayan Ghosh, Swagatam Das, Debarati Kundu, Kaushik Suresh, Ajith
Abraham: Inter-particle communication and search-dynamics of lbest
particle swarm optimizers: An analysis. Inf. Sci. 182(1): 156-168
(2012).
• Swagatam Das, Sambarta Dasgupta, Arijit Biswas, Ajith Abraham,
Amit Konar: On Stability of the Chemotactic Dynamics in Bacterial-
Foraging Optimization Algorithm. IEEE Trans. Systems, Man, and
Cybernetics, Part A 39(3): 670-679 (2009).
103
Theoretical Studies
• Swagatam Das, Sambarta Dasgupta, Arijit Biswas, Ajith Abraham, Amit
Konar: On Stability of the Chemotactic Dynamics in Bacterial-Foraging
Optimization Algorithm. IEEE Trans. Systems, Man, and Cybernetics, Part
A 39(3): 670-679 (2009).
• Sambarta Dasgupta, Swagatam Das, Arijit Biswas, Ajith Abraham: On
stability and convergence of the population-dynamics in differential
evolution. AI Commun. 22(1): 1-20 (2009).
• Jun He, Xin Yao: Average Drift Analysis and Population Scalability. IEEE
Trans. Evolutionary Computation 21(3): 426-439 (2017).
• J. He, T. Chen, and X. Yao, “On the easiest and hardest fitness functions,”
IEEE Trans. Evol. Comput., vol. 19, no. 2, pp. 295–305, Apr. 2015
• T. Friedrich, P. S. Oliveto, D. Sudholt, and C. Witt, “Analysis of diversity-
preserving mechanisms for global exploration,” Evol. Comput., vol. 17, no.
4, pp. 455–476, 2009.
• Arijit Biswas, Swagatam Das, Ajith Abraham, Sambarta Dasgupta: Stability
analysis of the reproduction operator in bacterial foraging optimization.
Theoretical Computer Science, 411(21): 2127-2139 (2010).104
Theoretical Studies on PSO
• Maurice Clerc’s works, e.g., “The particle swarm - explosion, stability,
and convergence in a multidimensional complex space,” IEEE Trans.
Evolutionary Computation, 2002.
• V Kadirkamanthan:
https://scholar.google.com/citations?user=JN9QqjEAAAAJ&
hl=en
• Prof Andries Engelbrecht:
https://scholar.google.com/citations?user=h9pOfj0AAAAJ&hl
=en
105
OverviewI. General Introduction
II. Introduction to Metaheuristics
III. Genetic Algorithms (GA)
IV. Evolution Strategy (ES), Differential Evolution (DE)
V. Particle Swarm Optimization (PSO)
VI. Harmony Search, Water Cycle, Jaya Algorithm
VII. Artificial Bee Colony (ABC), Group Search Algorithm, Cuckoo
Search (CSA), Firefly Algorithm (FA)
VIII.Conceptual Similarities & Hybridization
IX. Theoretical Studies
X. General Thoughts
XI. Future Directions 106
General Thoughts: NFL
(No Free Lunch Theorem)
• A Glamorous Name for Common Sense?
– Over a large set of problems, it is impossible to find a single best algorithm.
– DE with Cr=0.90 & Cr=0.91 are two different algorithms → infinitely many algorithms.
– Practical Relevance: Is it common for a practicing engineer to solve several
practical problems at the same time? NO.
– Academic Relevance: Very high; if our algorithm is not the best on all
problems, NFL can rescue us!!
Other NFL-Like Common-Sense Scenarios
Panacea: a medicine to cure all diseases; Amrita: nectar of immortal, perfect life
Silver bullet: in politics … (you can search these on the internet)
Jack of all trades, but master of none
If you have a hammer, all problems look like nails 107
General Thoughts: Convergence
• What is exactly convergence in the context of EAs & SAs ?
– The whole population reaching an optimum point (within a tolerance)…
– Single point search methods & convergence …
• In the context of real world problem solving, are we going to reject a
good solution because the population hasn’t converged ?
• Good to have all population members converging to the global
solution OR good to have high diversity even after finding the
global optimum ? (Fixed Computational budget Scenario)
What we do not want to have:
For example, in the context of PSO, we do not want chaotic oscillations
(c1 + c2 > 4.1).
108
General Thoughts: Algorithmic Parameters
• Good to have many algorithmic parameters / operators ?
• Good to be robust against parameter / operator variations ? (NFL?)
• What are Reviewers’ preferences on the 2 issues above?
• Or good to have several parameters/operators that can be tuned
to achieve top performance on diverse problems? YES
• If NFL says that a single algorithm is not the best for a very large set
of problems, then good to have many algorithmic parameters &
operators to be adapted for different problems !!
CEC 2015 Competitions: “Learning-Based Optimization”
Similar Literature: Thomas Stützle, Holger Hoos, …109
General Thoughts: Nature-Inspired Methods
• Is it good to mimic natural phenomena too closely? It leaves little
freedom to introduce heuristics that conflict with the natural phenomenon.
• Honey bees solve only one problem (gathering honey). Can
ABC/BCO be the best approach for solving all practical problems?
• NFL & nature-inspired methods.
• Swarm-inspired methods and some nature-inspired methods do not
have a crossover operator.
• Dynamics-based methods such as PSO versus survival-of-the-fitter
methods: PSO always moves to a new position, while DE moves
only after checking fitness. 110
OverviewI. General Introduction
II. Introduction to Metaheuristics
III. Genetic Algorithms (GA)
IV. Evolution Strategy (ES), Differential Evolution (DE)
V. Particle Swarm Optimization (PSO)
VI. Harmony Search, Water Cycle, Jaya Algorithm
VII. Artificial Bee Colony (ABC), Group Search Algorithm, Cuckoo
Search (CSA), Firefly Algorithm (FA)
VIII.Conceptual Similarities & Hybridization
IX. Theoretical Studies
X. General Thoughts
XI. Future Directions 111
-112-
I - Population Topologies
• In population-based algorithms, population members exchange
information between themselves.
• A single population topology permits all members to exchange
information among themselves – the most commonly used.
• Other population topologies place restrictions on information
exchange between members – the oldest is the island model.
• Restrictions on information exchange can slow down the
propagation of information from the best member in the population
to the other members (i.e., in single-objective global optimization).
• Hence, this approach
– slows down movement of other members towards the current best
member(s)
– enhances the exploration of the search space
– is beneficial when solving multi-modal problems
As the global version of PSO converges fast, many topologies were
introduced to slow down PSO …
I - PSO with Euclidean Neighborhoods
Presumed to be the oldest paper to consider distance based
dynamic neighborhoods for real-parameter optimization.
Lbest is selected from the members that are closer (w.r.t.
Euclidean distance) to the member being updated.
Initially only a few members are within the neighborhood (small
distance threshold) and finally all members are in the n’hood.
Island model and other static/dynamic neighborhoods did not
make use of Euclidean distances, instead just the indexes of
population members.
Our recent works are extensively making use of distance based
neighborhoods to solve many classes of problems.
113
P. N. Suganthan, “Particle swarm optimizer with neighborhood
operator,” in Proc. Congr. Evol. Comput., Washington, DC, pp.1958–
1962, 1999.
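A distance-based lbest selection of this kind can be sketched as follows. This is an illustrative reconstruction, not the 1999 paper's code: the hypothetical parameter `frac` plays the role of the growing distance threshold, starting small and reaching 1.0 (the gbest topology) by the end of the run.

```python
import numpy as np

def lbest_index(positions, fitness, i, frac):
    """Pick the local best for member i from its Euclidean-nearest
    neighbours. `frac` in (0, 1] is the fraction of the swarm inside
    the neighbourhood; growing it towards 1.0 over the run recovers
    the gbest topology at the end. (Minimization assumed.)"""
    dists = np.linalg.norm(positions - positions[i], axis=1)
    k = max(2, int(np.ceil(frac * len(positions))))
    nbrs = np.argsort(dists)[:k]      # includes i itself (distance 0)
    return nbrs[np.argmin(fitness[nbrs])]
```

Because the neighbourhood is recomputed from current positions, it is dynamic: members that drift close to each other begin to share information, unlike index-based ring or island topologies.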
II - Ensemble Methods
• Ensemble methods are commonly used for pattern
recognition (PR), forecasting, and prediction, e.g. multiple
predictors.
• Not commonly used in Evolutionary algorithms ...
There are two advantages in EA (compared to PR):
1. In PR, we have no idea if a predicted value is correct or
not. In EA, we can look at the objective values and make
some conclusions.
2. Sharing of function evaluations among ensemble members is possible.
114
III - Adaptations
• Self-adaptation: parameters and operators are evolved by
coding them together with decision vectors
• Separate adaptation based on performance: operators
and parameter values yielding improved solutions are
recorded and rewarded.
• 2nd approach is more successful and frequently used with
population-based numerical optimizers.
115
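The second, performance-based approach can be illustrated with a minimal sketch (all names here are hypothetical): candidate parameter settings are selected with probability proportional to the number of improvements they recently produced, with a small floor so no setting dies out.

```python
import random

class OperatorPool:
    """Minimal success-based adaptation: candidate parameter settings
    are chosen with probability proportional to their recorded success
    counts (improvements they produced), plus a small floor weight."""
    def __init__(self, settings, floor=1.0):
        self.settings = list(settings)
        self.success = [0.0] * len(settings)
        self.floor = floor

    def pick(self, rng=random):
        # roulette-wheel selection over success counts
        weights = [s + self.floor for s in self.success]
        return rng.choices(range(len(self.settings)), weights=weights)[0]

    def reward(self, idx, improved):
        # credit a setting whenever it yields a better offspring
        if improved:
            self.success[idx] += 1.0

    def setting(self, idx):
        return self.settings[idx]
```

In practice the success counts are usually kept over a sliding window of recent generations so the adaptation can track changes in the search landscape.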
Two Subpopulations with Heterogeneous
Ensembles & Topologies
▪ Proposed for balancing exploration and exploitation capabilities
▪ Population is divided into exploration / exploitation subpopulations
➢ The exploration subpopulation uses an exploration-oriented ensemble
of parameters and operators
➢ The exploitation subpopulation uses an exploitation-oriented ensemble
of parameters and operators.
• The topology allows information exchange only from the explorative
subpopulation to the exploitative subpopulation. Hence, the diversity of the
exploration population is not affected even if the exploitation population converges.
• The need for memetic algorithms in real-parameter optimization: memetic
algorithms were developed because we were not able to have an EA or SI method
perform both exploitation and exploration simultaneously. This two-population
topology allows this via heterogeneous information exchange.
116
Two Subpopulations with Heterogeneous
Ensembles & Topologies
▪ Sa.EPSDE realization (for single objective Global):
N. Lynn, R Mallipeddi, P. N. Suganthan, “Differential Evolution with Two
Subpopulations," LNCS 8947, SEMCCO 2014.
▪ 2-Subpopulation CLPSO (for single objective Global):
N. Lynn, P. N. Suganthan, “Comprehensive Learning Particle Swarm Optimization with
Heterogeneous Population Topologies for Enhanced Exploration and Exploitation,”
Swarm and Evolutionary Computation, 2015.
▪ Neighborhood-Based Niching-DE: Distance based neighborhood forms local
topologies while within each n’hood, we employ exploration-exploitation
ensemble of parameters and operators.
S. Hui, P N Suganthan, “Ensemble and Arithmetic Recombination-Based Speciation Differential
Evolution for Multimodal Optimization,” IEEE T. Cybernetics, pp. 64-74 Jan 2016.
10.1109/TCYB.2015.2394466
B-Y Qu, P N Suganthan, J J Liang, "Differential Evolution with Neighborhood Mutation for
Multimodal Optimization," IEEE Trans on Evolutionary Computation, DOI:
10.1109/TEVC.2011.2161873. (Supplementary file), Oct 2012. (Codes Available: 2012-TEC-
DE-niching)117
IV - Population Size Reduction
• Evolutionary algorithms are expected to explore the
search space in the early stages
• In the final stages of search, exploitation of previously
found good regions takes place.
• For exploration of the whole search space, we need a
large population, while for exploitation, we need a small
population size.
• Hence, population size reduction will be effective for
evolutionary algorithms.
118
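One concrete realization of this idea is linear population size reduction, as used for example in L-SHADE; the sketch below (with hypothetical helper names) shrinks the population towards a minimum size as function evaluations are spent, discarding the worst members at each resize.

```python
def population_size(nfes, max_nfes, n_init, n_min):
    """Linear population size reduction: start large for exploration,
    shrink linearly towards n_min as evaluations are spent."""
    return round(n_init + (n_min - n_init) * nfes / max_nfes)

def shrink(population, fitness, target_size):
    """Drop the worst members until the target size is reached
    (minimization assumed)."""
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    keep = order[:target_size]
    return [population[i] for i in keep], [fitness[i] for i in keep]
```

Called once per generation, this keeps the early search broad while concentrating the remaining evaluation budget on the best regions near the end.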