NNs Evolving NNs Adapting Weights Adapting Architectures Adapting Learning Rules Yao’s Framework Conclusions Bibliography
COMS M0305: Learning in Autonomous Systems
6: Evolving Artificial Neural Networks
Tim Kovacs
6: Evolving ANNs 1 of 25
Today
Artificial Neural Networks
Adapting:
Weights
Architectures
Learning Rules
Yao’s Framework for Evolving NNs
Artificial Neural Networks
NNs are function approximators
They approximate some input/output function
Useful because:
they can generalise from training data to new cases
they can represent big functions with little memory
Inspired by brains
Rough analogy to how networks of real neurons work
Widely used in engineering applications
Specialised versions also used to model brain functions
See Computational Neuroscience unit
Artificial Neural Networks
A typical NN consists of:
A set of nodes
In layers: input, output and hidden
A set of directed connections between nodes
Each connection has a weight
Nodes compute by:
Integrating their inputs using an activation function
Passing on their activation as output
NNs compute by:
Accepting external inputs at input nodes
Computing the activation of each node in turn
Node Activation
A node integrates inputs with:
y_i = f_i\left( \sum_{j=1}^{n} w_{ij} x_{ij} - \theta_i \right)

y_i is the output of node i
f_i is the activation function (typically a sigmoid)
n is the number of inputs to the node
w_ij is the connection weight between nodes i and j
x_ij is the jth input to node i
θ_i is a threshold (or bias)
From the universal approximation theorem for neural networks: any continuous function can be approximated arbitrarily well by a NN with one hidden layer and a sigmoid activation function.
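The activation above can be sketched in a few lines of Python (a sketch: the logistic sigmoid is one common choice of f_i, and the function name is assumed):

```python
import math

def node_output(weights, inputs, theta):
    """y_i = f_i(sum_j w_ij * x_ij - theta_i), with a logistic sigmoid as f_i."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1.0 / (1.0 + math.exp(-net))
```

With a net input of zero the sigmoid returns 0.5, its midpoint.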
Learning Weights
Weights are usually learned with a supervised learning process
They learn from a training set of input/output examples
E.g. a set of animal pictures, labeled as either cats or dogs
They generalise to new pictures, which were not in the training set
Repeat
present input to NN
compute output
compute error of output compared to known correct answer
update weights based on error
Most NN learning algorithms are based on gradient descent
Including the best known: backpropagation (BP)
Many successful applications, but often get trapped in local minima [15, 17]
Require a continuous and differentiable error function
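For a single sigmoid node the repeat loop above might look like this (a sketch of gradient descent on one node, not full backpropagation; names and the learning rate are assumptions):

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train_step(weights, inputs, target, lr=0.5):
    """One pass of the loop: compute output, compute error, update weights."""
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    delta = (target - y) * y * (1.0 - y)   # error scaled by the sigmoid gradient
    return [w + lr * delta * x for w, x in zip(weights, inputs)]
```

Repeating the step over the training set moves the output toward the known correct answer.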
Evolving Neural Networks
Evolution has been applied at 3 levels:
Weights
Architecture
connectivity: which nodes are connected
activation functions: how nodes compute outputs
plasticity: which nodes can be updated
Learning rules
Evolve new supervised learning rules to learn weights
I.e. evolve alternatives to backpropagation
Representations for Evolving NNs
Direct encoding [18, 6]
all details (connections and nodes) specified
Indirect encoding [18, 6]
only key details (e.g. number of hidden layers and nodes)
a learning process determines the rest
Developmental encoding [6]
a developmental process is genetically encoded [10, 7, 12, 8, 13, 16]
Uses:
Indirect and developmental representations are more flexible
tend to be used for evolving architectures
Direct representations tend to be used for evolving weights alone
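As an illustration of direct encoding (a sketch; the 2-3-1 layer shape and function name are assumptions), the genome lists every connection weight explicitly and is decoded into one weight matrix per layer:

```python
def decode(genome, shape=(2, 3, 1)):
    """Slice a flat genome of weights into per-layer weight matrices."""
    layers, i = [], 0
    for n_in, n_out in zip(shape, shape[1:]):
        chunk = genome[i : i + n_in * n_out]
        layers.append([chunk[r * n_out : (r + 1) * n_out] for r in range(n_in)])
        i += n_in * n_out
    return layers
```

Because every weight is specified, mutation and crossover act on the network directly; no learning process fills in missing details.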
Evolving Weights
An alternative to supervised learning of weights
EC forms an outer loop to the NN
EC generates weights
Present many inputs to NN, compute outputs and overall error
Use error as fitness in EC
In figure:
I_{g,t} – Input at generation g and time t
O_{g,t} – Output
F_{g,t} – Feedback (either NN error or fitness)
EC doesn’t rely on gradients and can work on discrete fitness functions
Much research has been done on evolution of weights
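The outer loop can be sketched as a minimal (1+1) evolution strategy (an assumption for illustration; any EC method fits), where fitness would be the negated NN error and no gradient is used:

```python
import random

def evolve_weights(fitness, n_weights, generations=1000, sigma=0.1):
    """Mutate the weight vector; keep the child whenever fitness improves."""
    parent = [random.gauss(0.0, 1.0) for _ in range(n_weights)]
    best = fitness(parent)
    for _ in range(generations):
        child = [w + random.gauss(0.0, sigma) for w in parent]
        f = fitness(child)
        if f >= best:                      # selection on fitness alone, no gradient
            parent, best = child, f
    return parent, best
```

The fitness function here is a black box, so it may be discrete or non-differentiable.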
Fitness Functions for Evolving NNs
Fitness functions typically penalise: NN error and complexity (number of hidden nodes)
The expressive power of a NN depends on the number of hidden nodes
Fewer nodes = less expressive = fits training data less
More nodes = more expressive = fits data more
Too few nodes: NN underfits data
Too many nodes: NN overfits data
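A penalty-based fitness of this kind might be sketched as (alpha is an assumed trade-off coefficient, not from the source):

```python
def fitness(nn_error, n_hidden, alpha=0.01):
    """Higher fitness = lower error and fewer hidden nodes."""
    return -(nn_error + alpha * n_hidden)
```

Two networks with equal error then differ in fitness only by hidden-node count, steering evolution away from over-large, overfitting-prone architectures.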
Evolving weights vs. gradient descent
Evolution has advantages [18]:
Does not require continuous differentiable functions
Same method can be used for different types of network (feedforward, recurrent, higher order)
Which is faster?
No clear winner overall – depends on problem [18]
Evolving weights AND architecture is better than weights alone (we’ll see why later)
Evolution better for RL and recurrent networks [18]
[6] suggests evolution is better for dynamic networks
Happily we don’t have to choose between them . . .
Evolving AND learning weights
Evolution:
Good at finding a good basin of attraction
Bad at finding optimum
Gradient descent:
Opposite of above
To get the best of both: [18]
Evolve initial weights, then train with gradient descent
2 orders of magnitude faster than random initial weights [6]
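On a toy one-dimensional loss (a stand-in for NN training error; all names and parameters are assumptions) the hybrid looks like:

```python
import random

def loss(w):
    return (w - 3.0) ** 2                 # toy error surface, minimum at w = 3

def evolve_initial(pop_size=20, generations=30):
    """Coarse evolutionary search: find a good basin of attraction."""
    pop = [random.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=loss)                # truncation selection on loss
        parents = pop[: pop_size // 2]
        pop = parents + [p + random.gauss(0.0, 0.5) for p in parents]
    return min(pop, key=loss)

def refine(w, lr=0.1, steps=100):
    """Gradient descent from the evolved start: find the basin's optimum."""
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)         # analytic gradient of the toy loss
    return w
```

Evolution supplies a good starting point; gradient descent then converges quickly inside that basin.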
Evolving NN Architectures
Arch. has important impact on results: can determine whether NN under- or over-fits
Designing by hand is a tedious, expert trial-and-error process
Alternative 1:
Constructive NNs grow from a minimal network
Destructive NNs shrink from a maximal network
Both can get stuck in local optima and can only generate certain architectures [1]
Alternative 2:
Evolve them!
Reasons EC is suitable for architecture search space
1 “The surface is infinitely large since the number of possible nodes and connections is unbounded;
2 the surface is nondifferentiable since changes in the number of nodes or connections are discrete and can have a discontinuous effect on EANN’s [Evolutionary Artificial NN] performance;
3 the surface is complex and noisy since the mapping from an architecture to its performance is indirect, strongly epistatic¹, and dependent on the evaluation method used;
4 the surface is deceptive² since similar architectures may have quite different performance;
5 the surface is multimodal³ since different architectures may have similar performance.” [11]
¹ fitness is not a linear function of genes
² slope of the fitness landscape leads away from the optimum
³ landscape has multiple basins of attraction
Reasons to evolve architectures and weights simultaneously
Learning with gradient descent:
Many-to-1 mapping from NN genotypes to phenotypes [20]
Random initial weights and stochastic learning lead to different results
Result is noisy fitness evaluations
Averaging needed – slow
Evolving arch. and weights simultaneously:
1-to-1 genotype to phenotype mapping avoids above problem
Result: faster learning
Can co-optimise other parameters of the network: [6]
[2] found best networks had very high learning rate
may have been optimal due to many factors: initial weights, training order, amount of training
Evolving Learning Rules [18]
There’s no one best learning rule for all architectures or problems
Selecting rules by hand is difficult
If we evolve the architecture (and even problem) then we don’t know what it will be a priori
Solution: evolve the learning rule
Note: training architectures and problems must represent the test set
To get general rules: train on general problems/architectures, not just one kind
To get a rule for a specific arch./problem type, just train on that
Evolving Learning Rule Parameters [18]
E.g. learning rate and momentum in backpropagation
Adapts standard learning rule to arch/problem at hand
Non-evolutionary methods of adapting them also exist
[3] found evolving architecture, initial weights and rule parameters together was as good as or better than evolving only the first two, or only the third (for multi-layer perceptrons)
Evolving learning rules [18, 14]
Open-ended evolution of rules was initially considered impractical
Instead a generic update rule was given and its parameters evolved [4]
Generic update is a linear function of 10 terms
4 terms represent local information about the node being updated
6 terms are the pairwise products of the first 4
The weight on each term is evolved as a vector of reals
Can outperform human-designed rules e.g. [5]
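A sketch of such a rule (variable names are assumptions; four local terms plus their six pairwise products, with evolved coefficients, following the description above):

```python
def generic_update(coeffs, w, x, y, t):
    """Weight change as an evolved linear function of 10 local terms:
    weight w, input x, output y, training signal t, and their pairwise products."""
    terms = [w, x, y, t,
             w * x, w * y, w * t, x * y, x * t, y * t]
    return sum(c * term for c, term in zip(coeffs, terms))
```

The delta rule is one point in this space: zero coefficients everywhere except +η on x·t and −η on x·y gives Δw = η(t − y)x.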
Later Genetic Programming used to evolve novel rule types [14]
GP uses a set of mathematical functions
Result consistently outperformed standard BP
Whereas architectures are fixed, rules could change over lifetime (e.g. learning rate)
But evolving dynamic rules is more complex
Yao’s Framework for Evolving NNs [18]
Architectures, rules and weights can evolve as nested processes
Weight evolution is innermost (fastest time scale)
Either rules or architectures are outermost
If we have prior knowledge, or are interested in a specific class of either, this constrains the search space
Outermost should be the one which constrains the search space most
Can be thought of as a 3D space of evolutionary NNs
The dimensions are the choice of algorithm for learning:
weights
learning rules
architectures
A particular way of evolving weights, learning rules and architectures is a point in this space
If we remove references to EC and NNs it becomes a general framework for adaptive systems
Evolving NNs – conclusions [6]
Most studies of neural robots in real environments use some form of evolution
Evolving NNs can be used to study “brain development and dynamics because it can encompass multiple temporal and spatial scales along which an organism evolves, such as genetic, developmental, learning, and behavioral phenomena.”
“The possibility to co-evolve both the neural system and the morphological properties of agents . . . adds an additional valuable perspective to the evolutionary approach that cannot be matched by any other approach.” p. 59
Reading
Reading on evolving NNs:
Yao’s classic 1999 survey [18]
Kasabov’s 2007 book [9]
Floreano et al.’s 2008 survey [6]
includes evolving dynamic and neuromodulatory NNs
Yao and Islam’s 2008 survey of evolving NN ensembles [19]
[1] P.J. Angeline, G.M. Saunders, and J.B. Pollack. An evolutionary algorithm that constructs recurrent neural networks. IEEE Trans. Neural Networks, 5:54–65, 1994.
[2] R.K. Belew, J. McInerney, and N.N. Schraudolph. Evolving networks: using the genetic algorithm with connectionist learning. In C.G. Langton, C. Taylor, J.D. Farmer, and S. Rasmussen, editors, Proceedings of the 2nd Conference on Artificial Life, pages 51–548. Addison-Wesley, 1992.
[3] P.A. Castillo, J.J. Merelo, M.G. Arenas, and G. Romero. Comparing evolutionary hybrid systems for design and optimization of multilayer perceptron structure along training parameters. Information Sciences, 177(14):2884–2905, 2007.
[4] D. Chalmers. The evolution of learning: An experiment in genetic connectionism. In D.S. Touretzky, editor, Proc. 1990 Connectionist Models Summer School, pages 81–90. Morgan Kaufmann, 1990.
[5] A. Dasdan and K. Oflazer. Genetic synthesis of unsupervised learning algorithms. Technical Report BU-CEIS-9306, Department of Computer Engineering and Information Science, Bilkent University, Ankara, 1993.
[6] Dario Floreano, Peter Durr, and Claudio Mattiussi. Neuroevolution: from architectures to learning. Evolutionary Intelligence, 1(1):47–62, 2008.
[7] F. Gruau. Automatic definition of modular neural networks. Adaptive Behavior, 3(2):151–183, 1995.
[8] P. Husbands, I. Harvey, D. Cliff, and G. Miller. The use of genetic algorithms for the development of sensorimotor control systems. In P. Gaussier and J.-D. Nicoud, editors, From perception to action, pages 110–121. IEEE Press, 1994.
[9] N. Kasabov. Evolving Connectionist Systems: The Knowledge Engineering Approach. Springer, 2007.
[10] H. Kitano. Designing neural networks by genetic algorithms using graph generation system. Complex Systems, 4:461–476, 1990.
[11] G.F. Miller, P.M. Todd, and S.U. Hegde. Designing neural networks using genetic algorithms. In J.D. Schaffer, editor, Proc. 3rd Int. Conf. Genetic Algorithms and Their Applications, pages 379–384. Morgan Kaufmann, 1989.
[12] S. Nolfi, O. Miglino, and D. Parisi. Phenotypic plasticity in evolving neural networks. In P. Gaussier and J.-D. Nicoud, editors, From perception to action, pages 146–157. IEEE Press, 1994.
[13] S. Pal and D. Bhandari. Genetic algorithms with fuzzy fitness function for object extraction using cellular networks. Fuzzy Sets and Systems, 65(2–3):129–139, 1994.
[14] Amr Radi and Riccardo Poli. Discovering efficient learning rules for feedforward neural networks using genetic programming. In Ajith Abraham, Lakhmi Jain, and Janusz Kacprzyk, editors, Recent Advances in Intelligent Paradigms and Applications, pages 133–159. Springer Verlag, 2003.
[15] R.S. Sutton. Two problems with backpropagation and other steepest-descent learning procedures for networks. In Proc. 8th Annual Conf. Cognitive Science Society, pages 823–831. Erlbaum, 1986.
[16] T. Sziranyi. Robustness of cellular neural networks in image deblurring and texture segmentation. Int. J. Circuit Theory App., 24(3):381–396, 1996.
[17] D. Whitley, T. Starkweather, and C. Bogart. Genetic algorithms and neural networks: Optimizing connections and connectivity. Parallel Comput., 14(3):347–361, 1990.
[18] X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423–1447, 1999.
[19] X. Yao and M.M. Islam. Evolving artificial neural network ensembles. IEEE Computational Intelligence Magazine, 3(1):31–42, 2008.
[20] X. Yao and Y. Liu. A new evolutionary system for evolving artificial neural networks. IEEE Trans. Neural Networks, 8:694–713, 1997.