
Statistical Model of Evolutionary Algorithm for Feed-Forward ANN Architecture Optimization

G.V.R. Sagar, [email protected], Assoc. Professor, G.P.R. Engg. College, Kurnool, AP, 518007, India.
Dr. S. Venkata Chalam, [email protected], Professor, CVR Engg. College, Hyderabad, AP, India.

Journal: Journal of Experimental & Theoretical Artificial Intelligence
Manuscript ID: TETA-2013-0034
Manuscript Type: Original Article
Keywords: Artificial neural network, topology mutation, schema theory, crossover


ABSTRACT: Optimizing the feed-forward architecture is central to the evolution of Artificial Neural Networks (ANNs). There is no systematic procedure for designing a near-optimal architecture for a given application or task; pattern classification methods and constructive and destructive algorithms can be used for architecture design. The proposed work develops a statistical model of an Evolutionary Algorithm (EA) to optimize the architecture. Single-point crossover is applied with selective schemas on the network space, and evolution is introduced in the mutation stage, so that optimized ANNs are achieved.

Keywords: Artificial neural network, topology mutation, schema theory, crossover.

1 INTRODUCTION: Genetic algorithms were developed by John Holland [1], [2], [3], [4]. Owing to a growing number of everyday applications combined with hardware enhancements, a variety of EAs are becoming more and more popular. A family of subsets of the search space and an appropriate process of re-encoding are two notions analogous to familiar facts relating continuous maps to families of open sets or measurable functions. In order to apply an EA to a typical optimization problem, we need to model the problem in a suitable manner, i.e. to construct a search space together with a positive-valued fitness function and a family of mating and mutation transformations. An EA can therefore be represented by a 4-tuple $(\Omega, f, \mathcal{F}, \mathcal{M})$, where $\Omega$ is the search space, $f$ the fitness function, $\mathcal{F}$ the family of mating transformations, and $\mathcal{M}$ the family of unary (mutation) transformations on $\Omega$. The total search space is divided into invariant subsets [3] and a crossover operation is performed on $\Omega$, while $\mathcal{M}$, the family of mutations on $\Omega$, is ergodic, i.e. it ensures that the Markov process [5] modeling the algorithm is irreducible. The schemata correspond to invariant subsets of the search space, and the schema theorem can be reformulated in this general framework. The invariant subsets of the search space give an encoding process relating continuous maps and families of open sets, measurable functions, and sigma-algebras. The classical Geiringer theorem is extended to represent a class of evolutionary computation techniques with crossover and mutation.
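To make the 4-tuple view concrete, the following sketch instantiates $(\Omega, f, \mathcal{F}, \mathcal{M})$ for a toy bit-string space. The names and the particular transforms are illustrative assumptions, not the paper's construction.

```python
import random

# A minimal sketch of the 4-tuple (Omega, f, F, M): a toy bit-string
# search space, a positive-valued fitness function, one binary mating
# transform, and one unary mutation transform.

N = 8                                    # chromosome length

def random_individual():                 # a point of the search space Omega
    return tuple(random.randint(0, 1) for _ in range(N))

def fitness(x):                          # f : Omega -> (0, inf)
    return 1 + sum(x)                    # "+1" keeps f strictly positive

def one_point_crossover(x, y):           # a mating transform T in F
    cut = random.randrange(1, N)
    return x[:cut] + y[cut:]

def point_mutation(x):                   # a mutation transform in M; point
    i = random.randrange(N)              # mutation keeps every state reachable,
    return x[:i] + (1 - x[i],) + x[i + 1:]  # i.e. the chain stays irreducible

EA = (random_individual, fitness, [one_point_crossover], [point_mutation])
```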

2.0 Representation of Evolutionary Algorithm:

Building on the mathematical foundation of the evolutionary-algorithm representation given in Section 1, we exploit the language of category theory [6]. To apply an evolutionary algorithm to a specific


optimization problem, we need to model the problem in a suitable manner. This requires building a search space $\Omega$ which contains all possible solutions to the problem, a computable positive-valued fitness function $f : \Omega \to (0, \infty)$, and a suitable family of mating (crossover) and mutation transformations.

The category of heuristic 3-tuples: All the families $\mathcal{F}$ give invariant subsets [3] of $\Omega$; we characterize all families in set-theoretic and sigma-algebra terms.

Let $\mathcal{F}$ denote a nonempty family of transformations from $\Omega^m$ to $\Omega$ for a fixed $m \geq 1$ ($m$-ary families). We then denote the family of invariant subsets of $\Omega$ under the family $\mathcal{F}$ by

$$\Lambda_{\mathcal{F}} = \{\, S \mid S \subseteq \Omega,\ T(S^m) \subseteq S\ \ \forall\, T \in \mathcal{F} \,\} \qquad (3.18)$$

It follows that for every element $x \in \Omega$ there is a unique smallest element of $\Lambda_{\mathcal{F}}$ containing $x$.

A heuristic 3-tuple is $\Omega = (\Omega, \mathcal{F}, \mathcal{M})$ with $\mathcal{F} \cup \mathcal{M} \neq \emptyset$. Let $x \in \Omega$; for a single heuristic 3-tuple $\Omega = (\Omega, \mathcal{F}, \mathcal{M})$, the smallest element of the family of invariant subsets $\Lambda_{\mathcal{F}}$ containing $x$ is denoted by $S_x$.

In a similar manner, given two heuristic 3-tuples $\Omega_1 = (\Omega_1, \mathcal{F}_1, \mathcal{M}_1)$ and $\Omega_2 = (\Omega_2, \mathcal{F}_2, \mathcal{M}_2)$, we define a function $\delta : \Omega_1 \to \Omega_2$ representing the reproduction transformation, called a morphism: for $x, y \in \Omega_1$ and $T \in \mathcal{F}_1$ there exists $F_{x,y} \in \mathcal{F}_2$ such that

$$\delta(T(x, y)) = F_{x,y}(\delta(x), \delta(y)) \qquad (3.2)$$

Similarly, for $M \in \mathcal{M}_1$ there exists $H_x \in \mathcal{M}_2$ such that $\delta(M(x)) = H_x(\delta(x))$. The collection of all morphisms from $\Omega_1$ into $\Omega_2$ is denoted by $\mathcal{M}(\Omega_1, \Omega_2)$.

A Generalization of Geiringer's theorem for EAs:

A family of recombination operators (see also [7]) of a given evolutionary algorithm changes the frequency with which various elements of the search space are sampled [1], [8]. To illustrate this point, let $\Omega = \prod_{i=1}^{n} A_i$ denote the search space of a given evolutionary algorithm, first discussed in [9]. Fix a population $P$ consisting of $m$ individuals, with $m$ an even number. $P$ can be thought of as an $m \times n$ matrix whose rows are the individuals of the population:

$$P = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \qquad (1.0)$$

The elements of the $i$th column of $P$ are members of $A_i$. The general Geiringer theorem [10] tells us the limiting frequency with which certain elements of the search space are sampled in the long run, provided one uses the crossover operator [19] alone. Let $\Phi(h, P, i)$, for $h \in A_i$, denote the proportion of rows $j$ of $P$ for which $a_{ji} = h$. If one starts with a population $P$ of individuals and runs the evolutionary algorithm in the absence of selection and mutation (crossover being the only operator involved), then, in the long run, the frequency of occurrence of the individual $(h_1, h_2, \ldots, h_n)$ before time $t$, written $\Phi(h_1, h_2, \ldots, h_n, t)$, satisfies

$$\lim_{t \to \infty} \Phi(h_1, h_2, \ldots, h_n, t) = \prod_{i=1}^{n} \Phi(h_i, P, i) \qquad (1.1)$$
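As a quick numerical check of (1.1), the column-wise proportions and their product can be computed directly from a population matrix. The function names below are ours, and $\Phi$ itself is a reconstructed symbol.

```python
def phi(h, P, i):
    """Proportion of rows j of the population matrix P with a_ji == h."""
    return sum(1 for row in P if row[i] == h) / len(P)

def limiting_frequency(h_tuple, P):
    """Right-hand side of (1.1): the product of column-wise proportions."""
    result = 1.0
    for i, h in enumerate(h_tuple):
        result *= phi(h, P, i)
    return result

# m = 4 individuals over n = 3 loci
P = [(0, 1, 1),
     (1, 1, 0),
     (0, 0, 1),
     (1, 1, 1)]
print(limiting_frequency((1, 1, 1), P))   # 0.5 * 0.75 * 0.75 = 0.28125
```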

The limiting distributions of the frequency of occurrence of individuals belonging to a certain schema under these algorithms have been computed; see also [11], [12], [13]. The classical Geiringer theorem and the proposed or modified Geiringer algorithms are established from basic facts about Markov chains [5] and random walks on groups. This is mainly a matter of formulating the statement of the theorem in a slightly different manner. This new point of view not only recovers the existing variants of Geiringer's theorem applied to EAs, but also extends the process to evolutionary algorithms. Below we give a more formal description of an EA than the one given in Section 1.

Framework:

A population $P$ of size $m$ is simply an element of $\Omega^m$ (a column vector). An elementary step is a probabilistic rule which takes one population as input and produces another population of the same size as output. We consider the following types of elementary steps.

Selection: Consider a given population

$$P = (x_1, x_2, \ldots, x_m)^{T}, \quad x_i \in \Omega \qquad (1.2)$$

as input.

The individuals of $P$ are evaluated:

$$(x_1, x_2, \ldots, x_m)^{T} \mapsto (f(x_1), f(x_2), \ldots, f(x_m))^{T} \qquad (1.3)$$

and a new population

$$P^1 = (y_1, y_2, \ldots, y_m)^{T} \qquad (1.4)$$

is obtained, where the $y_i$ are chosen independently $m$ times from the individuals of $P$, with

$$y_i = x_j \ \text{ with probability } \ \frac{f(x_j)}{\sum_{l=1}^{m} f(x_l)}$$

This means that the individuals of $P^1$ are among those of $P$, and the expected number of occurrences of any individual of $P$ in $P^1$ is proportional to the number of occurrences of that individual in $P$ times the individual's fitness value. In particular, the fitter an individual is, the more copies of it are likely to be present in $P^1$; on the other hand, individuals having relatively small fitness values are not likely to enter $P^1$ at all. This imitates the natural survival-of-the-fittest principle.
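A minimal sketch of this fitness-proportional selection step, assuming individuals are plain values and $f$ returns a positive float:

```python
import random

def proportional_selection(P, f):
    """One selection step (1.3)-(1.4): draw m individuals independently,
    taking y_i = x_j with probability f(x_j) / sum_l f(x_l)."""
    weights = [f(x) for x in P]          # evaluate the population, as in (1.3)
    return random.choices(P, weights=weights, k=len(P))

# example: fitter bit-strings (more ones) are sampled more often
pop = [(0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(proportional_selection(pop, lambda x: 1 + sum(x)))
```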

Crossover: The population $P^1$ is the output of the selection process. Now let the search space $\Omega$ be a set, and fix an ordered $k$-tuple of integers $q = (q_1, q_2, \ldots, q_k)$ with $q_1 \leq q_2 \leq \cdots \leq q_k$. Let $K$ denote a partition of the set $\{1, 2, \ldots, m\}$. We say that the partition $K$ is $q$-fit if $K = \{p_1, p_2, \ldots, p_k\}$ with $|p_i| = q_i$, and we denote by $\partial_q^m$ the family of all $q$-fit partitions of $\{1, 2, \ldots, m\}$.

Let $\mathcal{F}_{q_1}, \mathcal{F}_{q_2}, \ldots, \mathcal{F}_{q_k}$ be fixed families of $q_i$-ary operations on $\Omega$, and let $P_1, P_2, \ldots, P_k$ be probability distributions on $(\mathcal{F}_{q_1})^{q_1}, (\mathcal{F}_{q_2})^{q_2}, \ldots, (\mathcal{F}_{q_k})^{q_k}$ respectively. Let $P_m$ be the probability distribution on the collection $\partial_q^m$ of partitions of $\{1, 2, \ldots, m\}$, so that there exists a $2(k+1)$-tuple $\langle \mathcal{F}_{q_1}, \ldots, \mathcal{F}_{q_k}, P_1, P_2, \ldots, P_k, P_m \rangle$, the given reproduction $k$-tuple. The individuals of $P$ are partitioned into pairwise disjoint tuples for mating according to $P_m$: if


$$K = \big\{ (i_1^1, i_2^1, \ldots, i_{q_1}^1),\ (i_1^2, i_2^2, \ldots, i_{q_2}^2),\ \ldots,\ (i_1^j, i_2^j, \ldots, i_{q_j}^j),\ \ldots \big\}$$

then the corresponding tuples are given by

$$Q_1 = (x_{i_1^1}, x_{i_2^1}, \ldots, x_{i_{q_1}^1})^{T}, \quad Q_2 = (x_{i_1^2}, x_{i_2^2}, \ldots, x_{i_{q_2}^2})^{T}, \quad \ldots, \quad Q_j = (x_{i_1^j}, x_{i_2^j}, \ldots, x_{i_{q_j}^j})^{T} \qquad (1.5)$$

Having selected the partition, replace every one of the selected $q_j$-tuples

$$(x_{i_1^j}, x_{i_2^j}, \ldots, x_{i_{q_j}^j})^{T} \qquad (1.6)$$

with the $q_j$-tuple

$$\big( T_1(x_{i_1^j}, \ldots, x_{i_{q_j}^j}),\ T_2(x_{i_1^j}, \ldots, x_{i_{q_j}^j}),\ \ldots,\ T_{q_j}(x_{i_1^j}, \ldots, x_{i_{q_j}^j}) \big)^{T} \qquad (1.7)$$

for a $q_j$-tuple of transformations $(T_1, T_2, \ldots, T_{q_j}) \in (\mathcal{F}_{q_j})^{q_j}$ selected randomly according to the probability $P_j$ on $(\mathcal{F}_{q_j})^{q_j}$. This gives a new population

$$P^1 = (y_1, y_2, \ldots, y_m)^{T} \qquad (1.8)$$

Notice that a single child does not have to be produced by exactly two parents; it is possible for a child to have more than two parents. Asexual reproduction (mutation) is also allowed.
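The elementary recombination step (1.5)-(1.8) can be sketched as follows. The uniform draws for the partition and for the transforms are stand-ins for $P_m$ and $P_j$, which the paper leaves general.

```python
import random

def recombination_step(P, q_sizes, transform_families):
    """Draw a random q-fit partition of {0,...,m-1} (a stand-in for P_m),
    then replace each selected q_j-tuple (x_1,...,x_qj) with
    (T_1(x_1,...,x_qj), ..., T_qj(x_1,...,x_qj)) as in (1.6)-(1.7)."""
    assert sum(q_sizes) == len(P)
    indices = list(range(len(P)))
    random.shuffle(indices)                  # random q-fit partition
    offspring = list(P)
    start = 0
    for family, qj in zip(transform_families, q_sizes):
        block = indices[start:start + qj]
        start += qj
        parents = [P[i] for i in block]
        for slot in block:
            T = random.choice(family)        # stand-in for drawing from P_j
            offspring[slot] = T(*parents)    # each coordinate gets its own T
    return offspring

# example: pairs (q = (2, 2)) recombined by one-point crossover on 4-bit strings
def cross(x, y):
    cut = random.randrange(1, 4)
    return x[:cut] + y[cut:]

pop = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0)]
print(recombination_step(pop, [2, 2], [[cross], [cross]]))
```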

A general evolutionary search algorithm works as follows. Fix a cycle, say $C = \{S_n\}_{n=1}^{j}$, where the $S_n$ form a finite sequence of elementary steps. Start the algorithm with an initial population $P$ as above, which may be selected randomly. To run the algorithm with cycle $C = \{S_n\}$, simply input $P$ into $S_1$, run $S_1$, feed the output of $S_1$ into $S_2$, and so on, feeding the output of $S_{j-1}$ into $S_j$ to produce the new output, say $P^1$. Now take $P^1$ as the initial population and run the cycle $C$ again. Continue this loop finitely many times, depending on the circumstances. A recombination sub-algorithm is defined by a sequence of elementary steps consisting of reproduction only.
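A sketch of this cycle structure, assuming each elementary step is simply a function from populations to populations (the step names in the comment are placeholders):

```python
def run_cycles(P, cycle, n_cycles):
    """Feed the population through S_1, ..., S_j in order; the output of one
    pass through the cycle C becomes the input of the next pass."""
    for _ in range(n_cycles):
        for step in cycle:       # input of S_n is the output of S_(n-1)
            P = step(P)
    return P

# e.g. run_cycles(P0, [selection_step, recombination_step, mutation_step], 100)
```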

Modified evolutionary algorithm model: The general structure of the EA proposed in [14] uses the following operators:

a. Initialization
b. Recombination or crossover
c. Mutation
d. Selection

The framework of the EA approach requires a floating architecture and a fixed population size. The population size, the maximum size and structure of the network, and the genetic parameters are user-specified. The weight population is initialized with a user-defined number of hidden nodes for each individual in order to create a new population, and the weights are generated randomly, matching the size of the population.
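A sketch of this initialization under one plausible reading: each individual stores its layer sizes and randomly generated weight matrices between consecutive layers. The dictionary layout is our assumption.

```python
import random

def init_population(n_pop, n_inputs, n_outputs, hidden_sizes):
    """Create n_pop individuals, each with the user-defined hidden-node
    counts and random weights between every pair of consecutive layers."""
    population = []
    for _ in range(n_pop):
        sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
        weights = [[[random.gauss(0.0, 0.5) for _ in range(sizes[k + 1])]
                    for _ in range(sizes[k])]
                   for k in range(len(sizes) - 1)]
        population.append({"sizes": sizes, "weights": weights})
    return population

pop = init_population(20, 2, 1, [3, 2])   # e.g. the [2 3 2 1 2] parity setup
```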


ANN recombination or crossover: In the proposed method, following the discussion above, consider a search space set $\Omega$ and a family of transformations $\mathcal{F}_q$ from $\Omega^q$ into $\Omega$, and fix an ordered $q$-tuple of transformations $T_1, T_2, \ldots, T_q \in \mathcal{F}_q$. Now consider the transformation $(T_1, T_2, \ldots, T_q) : \Omega^q \to \Omega^q$ sending any given element $(x_1, x_2, \ldots, x_q)^{T} \in \Omega^q$ into

$$\big( T_1(x_1, x_2, \ldots, x_q),\ T_2(x_1, x_2, \ldots, x_q),\ \ldots,\ T_q(x_1, x_2, \ldots, x_q) \big)^{T} \qquad (1.9)$$

Let the subsequence $C = \{S_n\}_{n=1}^{j}$ (each elementary step $S_n$ being a recombination) be the recombination sub-algorithm of the proposed EA, reproducing the $k$-tuple $\langle \mathcal{F}_{q_1}, \ldots, \mathcal{F}_{q_k}, P_1, P_2, \ldots, P_k, P_m \rangle$. This heuristic search algorithm yields a Markov process whose state space is the set of populations $P \in \Omega^m$ of fixed size $m$. The transition probability $p_{x \to y}$ is simply the probability that the population $y \in \Omega^m$ is obtained from the population $x$ by going through the recombination cycle once. These transition probabilities have been computed, but the Markov chain obtained is difficult to analyze.

Fix an EA $\mathcal{A}$, and let $P^n_{x \to y} > 0$ be the probability that a population $y$ is obtained from the population $x$ upon the completion of $n$ complete cycles of recombination. We write $x \rightsquigarrow_{\mathcal{A}} y$ for "$x$ leads to $y$", and for a population $P \in \Omega^m$ we write $[P]_{\mathcal{A}}$ for the equivalence class of the population $P$ under the equivalence relation $\rightsquigarrow_{\mathcal{A}}$.

Therefore the Markov chain initiated at some population $P \in \Omega^m$ is irreducible, and its unique stationary distribution is the uniform distribution on $[P]_{\mathcal{A}}$.

Now fix a partition $K = (P_1, P_2, \ldots, P_k) \in \partial_q^m$, where $q = (q_1, q_2, \ldots, q_k)$, and fix a particular choice of tuples of transformations

$$(T_1^i, T_2^i, \ldots, T_{q_i}^i) \in (\mathcal{F}_{q_i})^{q_i} \qquad (2.0)$$

such that $P_i(T_1^i, T_2^i, \ldots, T_{q_i}^i) > 0$. First notice that we can identify $\Omega^m$ with the set $\Omega^{q_1} \times \Omega^{q_2} \times \cdots \times \Omega^{q_k}$ via the partition $K = (P_1, P_2, \ldots, P_k)$ as follows: given $x = (x_1, x_2, \ldots, x_m) \in \Omega^m$, identify $x$ with the one-point-crossover element $\vec{u}_x = (u_x^1, u_x^2, \ldots, u_x^k)$, where $u_x^i = (x_{a_1}, x_{a_2}, \ldots, x_{a_{q_i}})$ with $a_1, a_2, \ldots, a_{q_i} \in P_i$ and $a_1 < a_2 < \cdots < a_{q_i}$.


The set of all such transformations is denoted by

$$H_K = \left\{ T_{K,(T_1, T_2, \ldots, T_k)} \;\middle|\; K \text{ is a partition in } \partial_q^m \text{ and } T_1, T_2, \ldots, T_k \text{ are chosen for recombination} \right\} \qquad (2.2)$$

Now consider the set of transformations $H$ from $\Omega^m$ into itself defined as follows:

$$H = \{\, T : \Omega^m \to \Omega^m \mid T = F_1 \circ F_2 \circ \cdots \circ F_n,\ F_j \in H_K \,\} \qquad (2.3)$$

Therefore any transformation $T \in H$ is a composition of bijections, hence itself a bijection, so we can say that $H \subseteq S_{\Omega^m}$, where $S_{\Omega^m}$ is the group of permutations of $\Omega^m$. Let $G$ denote the subgroup of $S_{\Omega^m}$ generated by $H$. Now, when an EA $\mathcal{A}$ runs a cycle on the input $x$, this amounts to selecting transformations from $H$ independently and applying them consecutively, so that the output of the cycle $C$ on the input $x$ is $T(x)$ for some $T \in H$ chosen with some positive probability.

We now proceed to define the random walk associated to a group action. Let $\Omega$ be a finite set and $G$ a finite group generated by $H$ ($H \subseteq G$), and let $e$ denote the identity of the group $G$ ($e \in H$). Let $\mu$ be a probability distribution on $G$ which is concentrated on $H$ ($\mu(g) > 0 \iff g \in H$), and let $P^n_{x \to y}$ denote the probability that the state $y$ is reached from the state $x$ in exactly $n$ steps. The random walk on the action of the group $G$ on the set $\Omega$ is the Markov process with transition probabilities

$$p_{x \to y} = \mu(\{\, g \mid g(x) = y \,\}) \qquad (2.4)$$

Since $H$ generates $G$, for $n$ large enough and for every $g \in G$ we can ensure $P^n_{x \to g(x)} > 0$: write $g = m_1^g m_2^g \cdots m_{n_g}^g$ with each $m_i^g \in H$, and let $n = \max\{\, n_g \mid g \in G \,\}$.

Therefore, by the definition of the group action, eq. (2.4) can be chained along a path of $n$ steps: padding the word $m_1^g m_2^g \cdots m_{n_g}^g$ with $n - n_g$ copies of the identity $e$ and multiplying the corresponding transition probabilities from $x$ to $g(x)$ yields

$$P^n_{x \to g(x)} \geq \left( \prod_{i=1}^{n_g} \mu(m_i^g) \right) \mu(e)^{\,n - n_g} > 0 \qquad (2.5)$$

Equation (2.5) shows that this is an irreducible Markov chain with a finite state space, so it has a unique stationary distribution, denoted by $\pi$. Indeed, the uniform distribution on the state space $X$,

$$\pi(x) = \frac{1}{|X|} \qquad (2.6)$$

is stationary: the distribution in the next generation, say $\pi'$, is given by

$$\pi'(x) = \sum_{m \in H} \mu(m)\, \pi\big(m^{-1}(x)\big) \qquad (2.7)$$

$$= \frac{1}{|X|} \sum_{m \in H} \mu(m) \qquad (2.8)$$

$$= \frac{1}{|X|} = \pi(x) \qquad (2.9)$$


since $\sum_{m \in H} \mu(m) = 1$ and $\mu$ is concentrated on $H$. The Markov chain modeling an EA $\mathcal{A}$ is thus a random walk associated to the action of the finite group $G$ on $\Omega^m$: it generates new populations via the moves in $H$ and, in the long run, samples with the uniform distribution $\pi$.
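The convergence claim can be illustrated numerically: a random walk driven by a generating set of bijections visits every state in the orbit with limiting frequency $1/|X|$. The toy action below (a 6-element set, two generators) is our example, not the paper's.

```python
import random
from collections import Counter

# Generators acting on {0,...,5}: a cyclic shift and a transposition.
shift = lambda x: (x + 1) % 6
swap01 = lambda x: {0: 1, 1: 0}.get(x, x)
H = [shift, swap01]                  # H generates a transitive group action

x = 0
visits = Counter()
for _ in range(60_000):
    g = random.choice(H)             # mu concentrated (uniformly) on H
    x = g(x)
    visits[x] += 1
print({state: round(c / 60_000, 3) for state, c in sorted(visits.items())})
# each of the 6 states is visited with frequency close to 1/6
```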

The proposed evolutionary algorithm given above improves the behavior between parents and offspring. Single-point crossover, given in section xxx, uses different cutting points for each of the two parents in the population. The cutting points are extracted independently for each parent because the genotype lengths of individuals are variable. The cutting point is taken only between one layer and the next (for two hidden layers, between the second layers of the two network parents); this means that a new evolutionary weight matrix is created to make the connection between the two layers at the cutting points in the parents, producing two offspring, so that the population size is kept constant. In each offspring, node or layer creation and deletion is possible based on the predefined genetic parameters.
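A sketch of this layer-boundary single-point crossover, reusing the dictionary individuals from the initialization sketch above. The bridging-weight scheme is an assumption where the paper only says a new weight matrix is created at the cut.

```python
import random

def layer_crossover(parent_a, parent_b):
    """Cut each parent independently at a layer boundary (never inside a
    layer) and splice the fragments, creating a fresh bridging weight
    matrix at each junction; two offspring keep the population constant."""
    cut_a = random.randrange(1, len(parent_a["sizes"]) - 1)
    cut_b = random.randrange(1, len(parent_b["sizes"]) - 1)

    def splice(front, f_cut, back, b_cut):
        sizes = front["sizes"][:f_cut] + back["sizes"][b_cut:]
        # new random weight matrix bridging the two fragments at the cut
        bridge = [[random.gauss(0.0, 0.5) for _ in range(sizes[f_cut])]
                  for _ in range(sizes[f_cut - 1])]
        weights = front["weights"][:f_cut - 1] + [bridge] + back["weights"][b_cut:]
        return {"sizes": sizes, "weights": weights}

    return (splice(parent_a, cut_a, parent_b, cut_b),
            splice(parent_b, cut_b, parent_a, cut_a))
```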

3.3 Topology Mutation: The mutation transformations consist of the transformations

$$M_a : \Omega \to \Omega \qquad (3.0)$$

where $a \in \bigcup_{i \in S} A_i$ for $S \subseteq \{1, 2, \ldots, n\}$. Therefore $a = (a_{i_1}, a_{i_2}, \ldots, a_{i_k})$ for $i_1 < i_2 < \cdots < i_k \in S \subseteq \{1, 2, \ldots, n\}$, defined as follows: for $x = (x_1, x_2, \ldots, x_n) \in \Omega$ we have

$$M_a(x) = y = (y_1, y_2, \ldots, y_n) \qquad (3.1)$$

where

$$y_q = \begin{cases} a_q & \text{if } q = i_j \text{ for some } j \\ x_q & \text{otherwise} \end{cases}$$
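Concretely, (3.1) replaces the alleles at the selected positions and copies the rest. A minimal sketch (the dict-of-replacements interface is ours):

```python
def topology_mutation(x, replacements):
    """M_a as in (3.1): y_q = a_q at each selected position q = i_j,
    and y_q = x_q otherwise."""
    return tuple(replacements.get(q, x_q) for q, x_q in enumerate(x))

print(topology_mutation((0, 1, 1, 0, 1), {1: 0, 4: 0}))   # (0, 0, 1, 0, 0)
```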

A way to study the global behavior of evolutionary algorithms is to consider a group or family of subsets of the search space and to predict which of these subsets (say $Q$) satisfy the property that the expected number of occurrences of elements of $Q$ increases from one generation to the next. Each such subset is called a schema. If the chromosome length is fixed to $n$, the search space is $\Omega = \prod_{i=1}^{n} A_i$, where $A_i$ is the set of all possible alleles which may occur at the $i$th position in the chromosome. The next section gives the selection of offspring based on the fitness function.

3.4 Selection: A tournament is performed by choosing a group of offspring selected at random and reproducing the best individual from this group. Pick $P$ challengers as a group, where $P$ is 10% of the population size, arrange a tournament with respect to fitness between the $P$ challengers and the $r$th solution, and define the score of the $r$th solution. The scores are determined by the minimum-distance method using the fitness function [18]; this is called $P$-tournament selection. Arrange the scores of all the solutions in ascending order and pick the best half of the score positions; the best half of the scores are carried to the next generation. Repeat the process $r$ times, where $r$ is twice the population size, obtaining the scores of $r$ $P$-tournaments. The selection probabilities for $P$-tournament selection, together with further selection pressures and their comparison, are given in [15], [16].
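A sketch of this P-tournament scheme under stated assumptions: the challenger group is 10% of the population, $2m$ tournaments are played, and the best-scoring half survives. We score by wins against the challengers, where the paper uses a minimum-distance score via the fitness function [18].

```python
import random

def p_tournament_selection(population, fitness, frac=0.10):
    """Play r = 2m tournaments; in each, a random candidate is scored
    against a challenger group of size P = 10% of the population, and the
    best-scoring half of the tournament entrants survive."""
    p = max(1, int(frac * len(population)))
    scored = []
    for _ in range(2 * len(population)):
        candidate = random.choice(population)
        challengers = random.sample(population, p)
        # score: how many challengers the candidate beats on fitness
        score = sum(fitness(candidate) >= fitness(c) for c in challengers)
        scored.append((score, candidate))
    scored.sort(key=lambda sc: sc[0], reverse=True)   # best scores first
    return [ind for _, ind in scored[: len(population)]]
```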

6. EXPERIMENTAL SETUP
The idea proposed in this work emphasizes evolving ANNs: a new evolutionary system for evolving feed-forward ANNs from an architecture space. In this context, the evolutionary process attempts to crossover and mutate weights before performing any structural or topology crossover and mutation; that is, weight mutation is carried out before structural or topology mutation. The population size in the EA is taken as 20, and 10 independent trials are run to obtain the generalized behavior. The termination criterion is a fixed iteration count, equal to 100 for the EA. Table 5.1 gives all the parameters of the algorithm; default settings are taken for the problems considered. All the experiments are run by specifying the parameters and by tuning the genetic parameters to obtain the best solution.

Table 5.1 Default parameters.

Symbol | Parameter | Default value
N | Population size | 20
Seed | Previously saved population | none
  | Probability of inserting a hidden layer | 0.1
  | Probability of deleting a hidden layer | 0.05
  | Probability of inserting a neuron in a hidden layer | 0.05
  | Probability of deleting a neuron in a hidden layer | 0.05
  | Probability of crossover | 0.1
  | Number of network inputs | Problem specific
  | Number of network outputs | Problem specific
K | MSE in the range | 10

In this work, four benchmark problems are used to check the ANN optimization:

a) N-bit (2- and 4-bit) even-parity classifier
b) Pima Indians diabetes classifier
c) SPECT heart disease classifier
d) Breast cancer classifier

Performance on the N-bit parity (XOR) classification problem:
In the simultaneous evolution of architecture and connection weights, we consider only 2-bit and 4-bit parity encoders; results with different network sizes are given in this section.

FIGURE 5.8 Performance of the evolutionary ANN for 2-bit parity with initial size [2 3 2 1 2].

FIGURE 5.8 Performance of the evolutionary ANN for 2-bit parity with initial size [2 2 2 1 2].

For parity 2/4, all networks in the space have a maximum of 10 nodes, comprising 2/4 inputs, the number of hidden nodes in layer one, the number of hidden nodes in layer two, 1 output node, and two hidden layers, i.e. the size is [2/4 2/3 2 1 2]. This allows hidden-layer configurations of up to 5 nodes to be evolved. The average and best generation over all runs that found a solution for parity-2 using the accuracy fitness function, and the smallest architecture size found, were recorded. The mean square error (MSE) for the 10 trial runs is given in Table 5.2, and the performance of 5 runs is shown in Fig. 5.8; runs 3, 4 and 5 completed in 50 generations and runs 1 and 2 in 20 generations. The average number of hidden nodes over the 10 successful trial runs is 2.1, and the average number of connections is 7.9. For the ten runs,


the best individuals for the N-bit parity problems were found with genetic parameter settings of 0.05, 0.05, 0.01 and 0.01.

FIGURE 5.10 Performance of the evolutionary ANN for 4-bit parity with initial size [4 5 4 1 2].

FIGURE 5.10 Performance of the evolutionary ANN for 4-bit parity with initial size [4 4 5 1 2].

Table 5.2 Performance of the ANN shown by the EA for different trials.

Trial No. | MSE ([2 3 2 1 2]) | MSE ([4 5 4 1 2])
1  | 9.0084e-003 | 3.2548e-006
2  | 2.1219e-026 | 1.3548e-002
3  | 2.0416e-014 | 6.3254e-011
4  | 1.3406e-003 | 5.4856e-019
5  | 2.1219e-026 | 9.2154e-026
6  | 9.0084e-003 | 9.3554e-004
7  | 2.1219e-026 | 2.8754e-014
8  | 2.0416e-014 | 9.2365e-013
9  | 1.3406e-003 | 3.4587e-001
10 | 3.2323e-022 | 8.2657e-016

Performance on real-world dataset classification problems:
For the real-world datasets, all the data used for the training and test sets are acquired from the UCI Machine Learning Repository [17]. Each input variable should be preprocessed so that its mean value, averaged over the entire training set, is close to zero, or else is small compared to its standard deviation.
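A dependency-free sketch of that preprocessing, standardizing each variable with training-set statistics (the function name is ours):

```python
def standardize(train, test):
    """Shift each input variable to zero mean and unit variance, computing
    the statistics on the training set only and reusing them on the test set."""
    n = len(train[0])
    means = [sum(row[i] for row in train) / len(train) for i in range(n)]
    stds = [(sum((row[i] - means[i]) ** 2 for row in train) / len(train)) ** 0.5
            or 1.0                       # guard against constant variables
            for i in range(n)]
    scale = lambda rows: [[(r[i] - means[i]) / stds[i] for i in range(n)]
                          for r in rows]
    return scale(train), scale(test)
```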

i) Pima Indians diabetes dataset
ii) SPECT heart disease dataset

The Pima Indians diabetes dataset is composed of 8 attributes plus a binary class value indicating signs of diabetes, which corresponds to the target classification value, and includes 768 instances, as shown in Table 4.8. The dataset is divided into two sets, using 500 instances for training and 268 for testing.

For the Single Proton Emission Computed Tomography (SPECT) heart dataset, only 13 attributes are used as input parameters to classify the problem, with a total of 267 instances; the target value is stored as the 14th parameter in the dataset. These datasets are normalized before being applied to the network. The dataset is divided into two sets, using 200 instances for training and 67 for testing.

The evolutionary process is initialized with all the networks in the architecture space given a defined architecture size, for example [x y z 1 n]: x inputs, y hidden nodes in the 1st hidden layer, z hidden nodes in the 2nd hidden layer, one output layer with one node, and n the number of layers. After the evolutionary ANN process, the optimized network consists of only 2 hidden nodes in a single hidden layer with a unimodal sigmoid activation function; the results of the real-data classification problems are shown in the figures and tables below.
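One plausible reading of the [x y z 1 n] genotype, decoded into a layer-size list; the handling of n (taken here as the number of hidden layers actually in use) is our assumption.

```python
def decode_genotype(genotype):
    """Interpret [x, y, z, 1, n]: x inputs, y and z hidden-layer node
    counts, one output node, n hidden layers actually in use."""
    x, y, z, out, n = genotype
    hidden = [y, z][:n]           # n = 1 keeps only the first hidden layer
    return {"inputs": x, "hidden": hidden, "outputs": out}

print(decode_genotype([9, 4, 5, 1, 2]))
# {'inputs': 9, 'hidden': [4, 5], 'outputs': 1}
```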


FIGURE 5.12 Performance of the evolutionary ANN for the Pima Indians diabetes dataset with initial size [9 4 5 1 2].

Table 5.4 Results for the Pima Indians diabetes dataset.

Parameter | Experimental results
Number of runs | 10 | 10
Number of generations | 40 | 61
Number of training patterns used | 500 | 500
Average training set accuracy | 76.0 | 76.5
Number of test patterns used | 268 | 268
Average test set accuracy | 81.5 | 83.5
Initial number of hidden layers / nodes | 2 / [4 5] | 2 / [5 4]
Final number of hidden layers / nodes (resulting NN) | 1 / [2] | 1 / [3]
Population size | 50 | 50
Number of inputs | 09 | 09
Number of outputs | 01 | 01

FIGURE 5.13 Performance of the evolutionary ANN for the SPECT heart dataset with initial size [14 4 5 1 2].

FIGURE 5.14 Performance of the evolutionary ANN for the breast cancer dataset with initial size [11 4 5 1 2].

Table 5.6 Results for the SPECT heart dataset.

Parameter | Experimental results
Number of runs | 10 | 10
Number of generations | 90 | 103
Number of training patterns used | 200 | 200
Average training set accuracy | 86.0 | 87.2
Number of test patterns used | 67 | 67
Average test set accuracy | 85.2 | 86.5
Initial number of hidden layers / nodes | 2 / [4 5] | 2 / [5 4]
Final number of hidden layers / nodes (resulting NN) | 1 / [3] | 1 / [3]
Population size | 50 | 50
Number of inputs | 14 | 14
Number of outputs | 01 | 01

For the Pima Indians classification, the average mean square error is 8.6214e-3. During training the network is adjusted according to its error, whereas the test process provides an independent measure of network performance during and after training. The best solution was found in fewer than 50 generations with genetic parameter settings of 0.1, 0.05, 0.1 and 0.1. Another network size, [9 5 4 1 2], is also shown in Table 5.4, with a minimum of 3 hidden nodes in a single hidden layer. The results for the heart dataset are shown in Table 5.6, and a comparison with the literature is shown in Table 5.7. Ten runs are executed, and the


average percentage error values of the training and test processes are summarized in Table 5.6; 5 trial runs are shown in Fig. 5.13, with an average mean square error of 7.7264e-3. The best solutions were reached in fewer than 90 generations with genetic parameter settings of 0.1, 0.1, 0.1 and 0.1. Another network size, [14 5 4 1 2], is also shown in the table, with a minimum of 3 hidden nodes in a single hidden layer. The results for the breast cancer dataset are shown in Table 5.8, and a comparison with the literature is shown in Table 5.9. Ten runs are executed, and the average percentage error values of the training and test processes are summarized in Table 5.8; 5 trial runs are shown in Fig. 5.14, with an average mean square error of 5.3614e-3. The best solution was reached in fewer than 45 generations with genetic parameter settings of 0.05, 0.05, 0.05 and 0.05 in all runs. Another network size, [11 5 4 1 2], is also shown in the table, with a minimum of 3 hidden nodes in a single hidden layer.

Table 5.8 Results for the breast cancer dataset.

Parameter | Experimental results
Number of runs | 10 | 10
Number of generations | 45 | 52
Number of training patterns used | 400 | 400
Average training set accuracy | 97.0 | 97.0
Number of test patterns used | 240 | 240
Average test set accuracy | 98.5 | 98.5
Initial number of hidden layers / nodes | 2 / [4 5] | 2 / [5 4]
Final number of hidden layers / nodes (resulting NN) | 1 / [2] | 1 / [2]
Population size | 50 | 50
Number of inputs | 11 | 11
Number of outputs | 01 | 01

CONCLUSION:
The optimal architecture and weights of the ANN are obtained in the learning phase by using the concepts of an evolutionary genetic algorithm. The proposed method of adjusting both architecture and weights outperforms fixed-network back-propagation at every level for 2-bit and 4-bit parity, and on the real dataset classification problems it reaches an excellent percentage of accuracy with an optimized network having fewer hidden nodes and layers at low probability settings.

    REFERENCES:

[1] Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 1996.
[2] Mühlenbein, H. and Mahnig, T. Evolutionary computation and beyond. In Y. Uesaka, P. Kanerva, and H. Asoh, editors, Foundations of Real-World Intelligence, CSLI Publications, pp. 123-188, 2001.
[3] Mitavskiy, B. Crossover invariant subsets of the search space for evolutionary algorithms. Evolutionary Computation. http://www.math.lsa.umich.edu/vbmitavsk/
[4] Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. of Michigan Press, 1975.
[5] Coffey, S. An Applied Probabilist's Guide to Genetic Algorithms. Thesis submitted to The University of Dublin for the degree of Master in Science, 1999.
[6] Mac Lane, S. Categories for the Working Mathematician. Graduate Texts in Mathematics 5, Springer-Verlag, 1971.
[7] Poli, R., Stephens, C., Wright, A., and Rowe, J. A schema-theory-based extension of Geiringer's theorem for linear GP and variable-length GAs under homologous crossover, 2002.
[8] Vose, M. Generalizing the notion of a schema in genetic algorithms. Artificial Intelligence, 50(3):385-396, 1991.


[9] Radcliffe, N. The algebra of genetic algorithms. Annals of Mathematics and Artificial Intelligence, 10:339-384, 1994. http://users.breathemail.net/njr/papers/amai94.pdf
[10] Geiringer, H. On the probability theory of linkage in Mendelian heredity. Annals of Mathematical Statistics, 15:25-57, 1944.
[11] Vose, M. and Wright, A. The simple genetic algorithm and the Walsh transform: Part II, the inverse. Evolutionary Computation, 6(3):275-289, 1998.
[12] Stephens, C. and Waelbroeck, H. Schemata evolution and building blocks. Evolutionary Computation, 7(2):109-124, 1999.
[13] Stephens, C. The renormalization group and the dynamics of genetic systems. To be published in Acta Physica Slovaca, 2002. http://arXiv.org/abs/cond-mat/0210271
[14] Wright, A., Rowe, J., Poli, R., and Stephens, C. A fixed point analysis of a gene pool GA with mutation. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Morgan Kaufmann, 2002. http://www.cs.umt.edu/u/wright/
[15] He, J. and Yao, X. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127:57-85, 2001.
[16] Chen, T., He, J., Sun, G., Chen, G., and Yao, X. A new approach to analyzing average time complexity of population-based evolutionary algorithms on unimodal problems. IEEE Trans. Syst., Man, and Cybern., Part B, 39(5):1092-1106, 2009.
[17] Newman, D.J., Hettich, S., Blake, C.L., and Merz, C.J. UCI repository of machine learning databases, 1998.
[18] Hutter, M. and Legg, S. Fitness uniform optimization. IEEE Trans. Evol. Comput., 10(5):568-589, 2006.
[19] Liepins, G. and Vose, M. Characterizing crossover in genetic algorithms. Annals of Mathematics and Artificial Intelligence, 5:27-34, 1992.
