
Statistical Model of Evolutionary Algorithm for Feed-Forward ANN Architecture Optimization

G.V.R. Sagar, [email protected], Assoc. Professor, G.P.R. Engg. College, Kurnool, AP, 518007, India.
Dr. S. Venkata Chalam, [email protected], Professor, CVR Engg. College, Hyderabad, AP, India.

Journal: Journal of Experimental & Theoretical Artificial Intelligence
Manuscript ID: TETA-2013-0034
Manuscript Type: Original Article
Keywords: Artificial neural network, topology mutation, schema theory, crossover


ABSTRACT: Optimizing the feed-forward architecture is central to the evolution of Artificial Neural Networks (ANNs). There is no systematic procedure for designing a near-optimal architecture for a given application or task; pattern classification methods and constructive and destructive algorithms can be used for architecture design. The proposed work develops a statistical model of an Evolutionary Algorithm (EA) to optimize the architecture. Single-point crossover is applied with selective schemas on the network space, and evolution is introduced in the mutation stage, so that optimized ANNs are achieved.

Keywords: Artificial neural network, topology mutation, schema theory, crossover.

1 INTRODUCTION: Genetic algorithms were developed by John Holland [1], [2], [3], [4]. Owing to a growing number of everyday applications combined with hardware enhancements, a variety of EAs are becoming more and more popular. A family of subsets of the search space and an appropriate process of re-encoding are two notions analogous to familiar facts relating continuous maps to families of open sets or measurable functions. In order to apply an EA to a typical optimization problem, we need to model the problem in a suitable manner, i.e. to construct a search space together with a positive-valued fitness function and a family of mating and mutation transformations. An EA can therefore be represented by a 4-tuple $(\Omega, f, \mathcal{F}, \mathcal{M})$, where $\Omega$ is the search space, $f$ the fitness function, $\mathcal{F}$ the family of mating transformations, and $\mathcal{M}$ the family of unary (mutation) transformations on $\Omega$. The total search space is divided into invariant subsets [3] and a crossover operation is performed on $\Omega$, while $\mathcal{M}$, the family of mutations on $\Omega$, is ergodic, i.e. it ensures that the Markov process [5] modeling the algorithm is irreducible. The schemata correspond to invariant subsets of the search space, and the schema theorem can be reformulated in this general framework. The invariant subsets of the search space give an encoding process relating continuous maps and families of open sets, measurable functions, and sigma-algebras. The classical Geiringer theorem is extended to represent a class of evolutionary computation techniques with crossover and mutation.
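To make the 4-tuple view concrete, the following sketch instantiates $(\Omega, f, \mathcal{F}, \mathcal{M})$ for a toy bit-string space. The names and the particular transforms are illustrative assumptions, not the paper's construction.

```python
import random

# A minimal sketch of the 4-tuple (Omega, f, F, M): a toy bit-string
# search space, a positive-valued fitness function, one binary mating
# transform, and one unary mutation transform.

N = 8                                    # chromosome length

def random_individual():                 # a point of the search space Omega
    return tuple(random.randint(0, 1) for _ in range(N))

def fitness(x):                          # f : Omega -> (0, inf)
    return 1 + sum(x)                    # "+1" keeps f strictly positive

def one_point_crossover(x, y):           # a mating transform T in F
    cut = random.randrange(1, N)
    return x[:cut] + y[cut:]

def point_mutation(x):                   # a mutation transform in M; point
    i = random.randrange(N)              # mutation keeps every state reachable,
    return x[:i] + (1 - x[i],) + x[i + 1:]  # i.e. the chain stays irreducible

EA = (random_individual, fitness, [one_point_crossover], [point_mutation])
```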

2.0 Representation of Evolutionary Algorithm:

Building on the mathematical foundation of the evolutionary-algorithm representation given in Section 1, we exploit the language of category theory [6]. To apply an evolutionary algorithm to a specific


optimization problem, we need to model the problem in a suitable manner. This requires building a search space $\Omega$ which contains all possible solutions to the problem, a computable positive-valued fitness function $f : \Omega \to (0, \infty)$, and a suitable family of mating (crossover) and mutation transformations.

The category of heuristic 3-tuples: All the families $\mathcal{F}$ give invariant subsets [3] of $\Omega$; we characterize all families in set-theoretic and sigma-algebra terms.

Let $\mathcal{F}$ denote a nonempty family of transformations from $\Omega^m$ to $\Omega$ for a fixed $m \geq 1$ ($m$-ary families). We then denote the family of invariant subsets of $\Omega$ under the family $\mathcal{F}$ by

$$\Lambda_{\mathcal{F}} = \{\, S \mid S \subseteq \Omega,\ T(S^m) \subseteq S\ \ \forall\, T \in \mathcal{F} \,\} \qquad (3.18)$$

It follows that for every element $x \in \Omega$ there is a unique smallest element of $\Lambda_{\mathcal{F}}$ containing $x$.

A heuristic 3-tuple is $\Omega = (\Omega, \mathcal{F}, \mathcal{M})$ with $\mathcal{F} \cup \mathcal{M} \neq \emptyset$. Let $x \in \Omega$; for a single heuristic 3-tuple $\Omega = (\Omega, \mathcal{F}, \mathcal{M})$, the smallest element of the family of invariant subsets $\Lambda_{\mathcal{F}}$ containing $x$ is denoted by $S_x$.

In a similar manner, given two heuristic 3-tuples $\Omega_1 = (\Omega_1, \mathcal{F}_1, \mathcal{M}_1)$ and $\Omega_2 = (\Omega_2, \mathcal{F}_2, \mathcal{M}_2)$, we define a function $\delta : \Omega_1 \to \Omega_2$ representing the reproduction transformation, called a morphism: for $x, y \in \Omega_1$ and $T \in \mathcal{F}_1$ there exists $F_{x,y} \in \mathcal{F}_2$ such that

$$\delta(T(x, y)) = F_{x,y}(\delta(x), \delta(y)) \qquad (3.2)$$

Similarly, for $M \in \mathcal{M}_1$ there exists $H_x \in \mathcal{M}_2$ such that $\delta(M(x)) = H_x(\delta(x))$. The collection of all morphisms from $\Omega_1$ into $\Omega_2$ is denoted by $\mathcal{M}(\Omega_1, \Omega_2)$.

A Generalization of Geiringer's theorem for EAs:

A family of recombination operators (see also [7]) of a given evolutionary algorithm changes the frequency with which various elements of the search space are sampled [1], [8]. To illustrate this point, let $\Omega = \prod_{i=1}^{n} A_i$ denote the search space of a given evolutionary algorithm, first discussed in [9]. Fix a population $P$ consisting of $m$ individuals, with $m$ an even number. $P$ can be thought of as an $m \times n$ matrix whose rows are the individuals of the population:

$$P = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \qquad (1.0)$$

The elements of the $i$th column of $P$ are members of $A_i$. The general Geiringer theorem [10] tells us the limiting frequency with which certain elements of the search space are sampled in the long run, provided one uses the crossover operator [19] alone. Let $\Phi(h, P, i)$, for $h \in A_i$, denote the proportion of rows $j$ of $P$ for which $a_{ji} = h$. If one starts with a population $P$ of individuals and runs the evolutionary algorithm in the absence of selection and mutation (crossover being the only operator involved), then, in the long run, the frequency of occurrence of the individual $(h_1, h_2, \ldots, h_n)$ before time $t$, written $\Phi(h_1, h_2, \ldots, h_n, t)$, satisfies

$$\lim_{t \to \infty} \Phi(h_1, h_2, \ldots, h_n, t) = \prod_{i=1}^{n} \Phi(h_i, P, i) \qquad (1.1)$$
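As a quick numerical check of (1.1), the column-wise proportions and their product can be computed directly from a population matrix. The function names below are ours, and $\Phi$ itself is a reconstructed symbol.

```python
def phi(h, P, i):
    """Proportion of rows j of the population matrix P with a_ji == h."""
    return sum(1 for row in P if row[i] == h) / len(P)

def limiting_frequency(h_tuple, P):
    """Right-hand side of (1.1): the product of column-wise proportions."""
    result = 1.0
    for i, h in enumerate(h_tuple):
        result *= phi(h, P, i)
    return result

# m = 4 individuals over n = 3 loci
P = [(0, 1, 1),
     (1, 1, 0),
     (0, 0, 1),
     (1, 1, 1)]
print(limiting_frequency((1, 1, 1), P))   # 0.5 * 0.75 * 0.75 = 0.28125
```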

The limiting distributions of the frequency of occurrence of individuals belonging to a certain schema under these algorithms have been computed; see also [11], [12], [13]. The classical Geiringer theorem and the proposed or modified Geiringer algorithms are established from basic facts about Markov chains [5] and random walks on groups. This is mainly a matter of formulating the statement of the theorem in a slightly different manner. This new point of view not only recovers the existing variants of Geiringer's theorem applied to EAs, but also extends the process to evolutionary algorithms. Below we give a more formal description of an EA than the one given in Section 1.

Framework:

A population $P$ of size $m$ is simply an element of $\Omega^m$ (a column vector). An elementary step is a probabilistic rule which takes one population as input and produces another population of the same size as output. We consider the following types of elementary steps.

Selection: Consider a given population

$$P = (x_1, x_2, \ldots, x_m)^{T}, \quad x_i \in \Omega \qquad (1.2)$$

as input.

The individuals of $P$ are evaluated:

$$(x_1, x_2, \ldots, x_m)^{T} \mapsto (f(x_1), f(x_2), \ldots, f(x_m))^{T} \qquad (1.3)$$

and a new population

$$P^1 = (y_1, y_2, \ldots, y_m)^{T} \qquad (1.4)$$

is obtained, where the $y_i$ are chosen independently $m$ times from the individuals of $P$, with

$$y_i = x_j \ \text{ with probability } \ \frac{f(x_j)}{\sum_{l=1}^{m} f(x_l)}$$

This means that the individuals of $P^1$ are among those of $P$, and the expected number of occurrences of any individual of $P$ in $P^1$ is proportional to the number of occurrences of that individual in $P$ times the individual's fitness value. In particular, the fitter an individual is, the more copies of it are likely to be present in $P^1$; on the other hand, individuals having relatively small fitness values are not likely to enter $P^1$ at all. This imitates the natural survival-of-the-fittest principle.
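A minimal sketch of this fitness-proportional selection step, assuming individuals are plain values and $f$ returns a positive float:

```python
import random

def proportional_selection(P, f):
    """One selection step (1.3)-(1.4): draw m individuals independently,
    taking y_i = x_j with probability f(x_j) / sum_l f(x_l)."""
    weights = [f(x) for x in P]          # evaluate the population, as in (1.3)
    return random.choices(P, weights=weights, k=len(P))

# example: fitter bit-strings (more ones) are sampled more often
pop = [(0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(proportional_selection(pop, lambda x: 1 + sum(x)))
```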

Crossover: The population $P^1$ is the output of the selection process. Now let the search space $\Omega$ be a set, and fix an ordered $k$-tuple of integers $q = (q_1, q_2, \ldots, q_k)$ with $q_1 \leq q_2 \leq \cdots \leq q_k$. Let $K$ denote a partition of the set $\{1, 2, \ldots, m\}$. We say that the partition $K$ is $q$-fit if $K = \{p_1, p_2, \ldots, p_k\}$ with $|p_i| = q_i$, and we denote by $\partial_q^m$ the family of all $q$-fit partitions of $\{1, 2, \ldots, m\}$.

Let $\mathcal{F}_{q_1}, \mathcal{F}_{q_2}, \ldots, \mathcal{F}_{q_k}$ be fixed families of $q_i$-ary operations on $\Omega$, and let $P_1, P_2, \ldots, P_k$ be probability distributions on $(\mathcal{F}_{q_1})^{q_1}, (\mathcal{F}_{q_2})^{q_2}, \ldots, (\mathcal{F}_{q_k})^{q_k}$ respectively. Let $P_m$ be the probability distribution on the collection $\partial_q^m$ of partitions of $\{1, 2, \ldots, m\}$, so that there exists a $2(k+1)$-tuple $\langle \mathcal{F}_{q_1}, \ldots, \mathcal{F}_{q_k}, P_1, P_2, \ldots, P_k, P_m \rangle$, the given reproduction $k$-tuple. The individuals of $P$ are partitioned into pairwise disjoint tuples for mating according to $P_m$: if


$$K = \big\{ (i_1^1, i_2^1, \ldots, i_{q_1}^1),\ (i_1^2, i_2^2, \ldots, i_{q_2}^2),\ \ldots,\ (i_1^j, i_2^j, \ldots, i_{q_j}^j),\ \ldots \big\}$$

then the corresponding tuples are given by

$$Q_1 = (x_{i_1^1}, x_{i_2^1}, \ldots, x_{i_{q_1}^1})^{T}, \quad Q_2 = (x_{i_1^2}, x_{i_2^2}, \ldots, x_{i_{q_2}^2})^{T}, \quad \ldots, \quad Q_j = (x_{i_1^j}, x_{i_2^j}, \ldots, x_{i_{q_j}^j})^{T} \qquad (1.5)$$

Having selected the partition, replace every one of the selected $q_j$-tuples

$$(x_{i_1^j}, x_{i_2^j}, \ldots, x_{i_{q_j}^j})^{T} \qquad (1.6)$$

with the $q_j$-tuple

$$\big( T_1(x_{i_1^j}, \ldots, x_{i_{q_j}^j}),\ T_2(x_{i_1^j}, \ldots, x_{i_{q_j}^j}),\ \ldots,\ T_{q_j}(x_{i_1^j}, \ldots, x_{i_{q_j}^j}) \big)^{T} \qquad (1.7)$$

for a $q_j$-tuple of transformations $(T_1, T_2, \ldots, T_{q_j}) \in (\mathcal{F}_{q_j})^{q_j}$ selected randomly according to the probability $P_j$ on $(\mathcal{F}_{q_j})^{q_j}$. This gives a new population

$$P^1 = (y_1, y_2, \ldots, y_m)^{T} \qquad (1.8)$$

Notice that a single child does not have to be produced by exactly two parents; it is possible for a child to have more than two parents. Asexual reproduction (mutation) is also allowed.
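The elementary recombination step (1.5)-(1.8) can be sketched as follows. The uniform draws for the partition and for the transforms are stand-ins for $P_m$ and $P_j$, which the paper leaves general.

```python
import random

def recombination_step(P, q_sizes, transform_families):
    """Draw a random q-fit partition of {0,...,m-1} (a stand-in for P_m),
    then replace each selected q_j-tuple (x_1,...,x_qj) with
    (T_1(x_1,...,x_qj), ..., T_qj(x_1,...,x_qj)) as in (1.6)-(1.7)."""
    assert sum(q_sizes) == len(P)
    indices = list(range(len(P)))
    random.shuffle(indices)                  # random q-fit partition
    offspring = list(P)
    start = 0
    for family, qj in zip(transform_families, q_sizes):
        block = indices[start:start + qj]
        start += qj
        parents = [P[i] for i in block]
        for slot in block:
            T = random.choice(family)        # stand-in for drawing from P_j
            offspring[slot] = T(*parents)    # each coordinate gets its own T
    return offspring

# example: pairs (q = (2, 2)) recombined by one-point crossover on 4-bit strings
def cross(x, y):
    cut = random.randrange(1, 4)
    return x[:cut] + y[cut:]

pop = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0)]
print(recombination_step(pop, [2, 2], [[cross], [cross]]))
```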

A general evolutionary search algorithm works as follows. Fix a cycle, say $C = \{S_n\}_{n=1}^{j}$, where the $S_n$ form a finite sequence of elementary steps. Start the algorithm with an initial population $P$ as above, which may be selected randomly. To run the algorithm with cycle $C = \{S_n\}$, simply input $P$ into $S_1$, run $S_1$, feed the output of $S_1$ into $S_2$, and so on, feeding the output of $S_{j-1}$ into $S_j$ to produce the new output, say $P^1$. Now take $P^1$ as the initial population and run the cycle $C$ again. Continue this loop finitely many times, depending on the circumstances. A recombination sub-algorithm is defined by a sequence of elementary steps consisting of reproduction only.
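A sketch of this cycle structure, assuming each elementary step is simply a function from populations to populations (the step names in the comment are placeholders):

```python
def run_cycles(P, cycle, n_cycles):
    """Feed the population through S_1, ..., S_j in order; the output of one
    pass through the cycle C becomes the input of the next pass."""
    for _ in range(n_cycles):
        for step in cycle:       # input of S_n is the output of S_(n-1)
            P = step(P)
    return P

# e.g. run_cycles(P0, [selection_step, recombination_step, mutation_step], 100)
```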

Modified evolutionary algorithm model: The general structure of the EA proposed in [14] uses the following operators:

a. Initialization
b. Recombination or crossover
c. Mutation
d. Selection

The framework of the EA approach requires a floating architecture and a fixed population size. The population size, the maximum size and structure of the network, and the genetic parameters are user-specified. The weight population is initialized with a user-defined number of hidden nodes for each individual in order to create a new population, and the weights are generated randomly, matching the size of the population.
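A sketch of this initialization under one plausible reading: each individual stores its layer sizes and randomly generated weight matrices between consecutive layers. The dictionary layout is our assumption.

```python
import random

def init_population(n_pop, n_inputs, n_outputs, hidden_sizes):
    """Create n_pop individuals, each with the user-defined hidden-node
    counts and random weights between every pair of consecutive layers."""
    population = []
    for _ in range(n_pop):
        sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
        weights = [[[random.gauss(0.0, 0.5) for _ in range(sizes[k + 1])]
                    for _ in range(sizes[k])]
                   for k in range(len(sizes) - 1)]
        population.append({"sizes": sizes, "weights": weights})
    return population

pop = init_population(20, 2, 1, [3, 2])   # e.g. the [2 3 2 1 2] parity setup
```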


ANN recombination or crossover: In the proposed method, following the discussion above, consider a search space set $\Omega$ and a family of transformations $\mathcal{F}_q$ from $\Omega^q$ into $\Omega$, and fix an ordered $q$-tuple of transformations $T_1, T_2, \ldots, T_q \in \mathcal{F}_q$. Now consider the transformation $(T_1, T_2, \ldots, T_q) : \Omega^q \to \Omega^q$ sending any given element $(x_1, x_2, \ldots, x_q)^{T} \in \Omega^q$ into

$$\big( T_1(x_1, x_2, \ldots, x_q),\ T_2(x_1, x_2, \ldots, x_q),\ \ldots,\ T_q(x_1, x_2, \ldots, x_q) \big)^{T} \qquad (1.9)$$

Let the subsequence $C = \{S_n\}_{n=1}^{j}$ (each elementary step $S_n$ being a recombination) be the recombination sub-algorithm of the proposed EA, reproducing the $k$-tuple $\langle \mathcal{F}_{q_1}, \ldots, \mathcal{F}_{q_k}, P_1, P_2, \ldots, P_k, P_m \rangle$. This heuristic search algorithm yields a Markov process whose state space is the set of populations $P \in \Omega^m$ of fixed size $m$. The transition probability $p_{x \to y}$ is simply the probability that the population $y \in \Omega^m$ is obtained from the population $x$ by going through the recombination cycle once. These transition probabilities have been computed, but the Markov chain obtained is difficult to analyze.

Fix an EA $\mathcal{A}$, and let $P^n_{x \to y} > 0$ be the probability that a population $y$ is obtained from the population $x$ upon the completion of $n$ complete cycles of recombination. We write $x \rightsquigarrow_{\mathcal{A}} y$ for "$x$ leads to $y$", and for a population $P \in \Omega^m$ we write $[P]_{\mathcal{A}}$ for the equivalence class of the population $P$ under the equivalence relation $\rightsquigarrow_{\mathcal{A}}$.

Therefore the Markov chain initiated at some population $P \in \Omega^m$ is irreducible, and its unique stationary distribution is the uniform distribution on $[P]_{\mathcal{A}}$.

Now fix a partition $K = (P_1, P_2, \ldots, P_k) \in \partial_q^m$, where $q = (q_1, q_2, \ldots, q_k)$, and fix a particular choice of tuples of transformations

$$(T_1^i, T_2^i, \ldots, T_{q_i}^i) \in (\mathcal{F}_{q_i})^{q_i} \qquad (2.0)$$

such that $P_i(T_1^i, T_2^i, \ldots, T_{q_i}^i) > 0$. First notice that we can identify $\Omega^m$ with the set $\Omega^{q_1} \times \Omega^{q_2} \times \cdots \times \Omega^{q_k}$ via the partition $K = (P_1, P_2, \ldots, P_k)$ as follows: given $x = (x_1, x_2, \ldots, x_m) \in \Omega^m$, identify $x$ with the one-point-crossover element $\vec{u}_x = (u_x^1, u_x^2, \ldots, u_x^k)$, where $u_x^i = (x_{a_1}, x_{a_2}, \ldots, x_{a_{q_i}})$ with $a_1, a_2, \ldots, a_{q_i} \in P_i$ and $a_1 < a_2 < \cdots < a_{q_i}$.


The set of all such transformations is denoted by

$$H_K = \left\{ T_{K,(T_1, T_2, \ldots, T_k)} \;\middle|\; K \text{ is a partition in } \partial_q^m \text{ and } T_1, T_2, \ldots, T_k \text{ are chosen for recombination} \right\} \qquad (2.2)$$

Now consider the set of transformations $H$ from $\Omega^m$ into itself defined as follows:

$$H = \{\, T : \Omega^m \to \Omega^m \mid T = F_1 \circ F_2 \circ \cdots \circ F_n,\ F_j \in H_K \,\} \qquad (2.3)$$

Therefore any transformation $T \in H$ is a composition of bijections, hence itself a bijection, so we can say that $H \subseteq S_{\Omega^m}$, where $S_{\Omega^m}$ is the group of permutations of $\Omega^m$. Let $G$ denote the subgroup of $S_{\Omega^m}$ generated by $H$. Now, when an EA $\mathcal{A}$ runs a cycle on the input $x$, this amounts to selecting transformations from $H$ independently and applying them consecutively, so that the output of the cycle $C$ on the input $x$ is $T(x)$ for some $T \in H$ chosen with some positive probability.

We now proceed to define the random walk associated to a group action. Let $\Omega$ be a finite set and $G$ a finite group generated by $H$ ($H \subseteq G$), and let $e$ denote the identity of the group $G$ ($e \in H$). Let $\mu$ be a probability distribution on $G$ which is concentrated on $H$ ($\mu(g) > 0 \iff g \in H$), and let $P^n_{x \to y}$ denote the probability that the state $y$ is reached from the state $x$ in exactly $n$ steps. The random walk on the action of the group $G$ on the set $\Omega$ is the Markov process with transition probabilities

$$p_{x \to y} = \mu(\{\, g \mid g(x) = y \,\}) \qquad (2.4)$$

Since $H$ generates $G$, for $n$ large enough and for every $g \in G$ we can ensure $P^n_{x \to g(x)} > 0$: write $g = m_1^g m_2^g \cdots m_{n_g}^g$ with each $m_i^g \in H$, and let $n = \max\{\, n_g \mid g \in G \,\}$.

Therefore, by the definition of the group action, eq. (2.4) can be chained along a path of $n$ steps: padding the word $m_1^g m_2^g \cdots m_{n_g}^g$ with $n - n_g$ copies of the identity $e$ and multiplying the corresponding transition probabilities from $x$ to $g(x)$ yields

$$P^n_{x \to g(x)} \geq \left( \prod_{i=1}^{n_g} \mu(m_i^g) \right) \mu(e)^{\,n - n_g} > 0 \qquad (2.5)$$

Equation (2.5) shows that this is an irreducible Markov chain with a finite state space, so it has a unique stationary distribution, denoted by $\pi$. Indeed, the uniform distribution on the state space $X$,

$$\pi(x) = \frac{1}{|X|} \qquad (2.6)$$

is stationary: the distribution in the next generation, say $\pi'$, is given by

$$\pi'(x) = \sum_{m \in H} \mu(m)\, \pi\big(m^{-1}(x)\big) \qquad (2.7)$$

$$= \frac{1}{|X|} \sum_{m \in H} \mu(m) \qquad (2.8)$$

$$= \frac{1}{|X|} = \pi(x) \qquad (2.9)$$


since $\sum_{m \in H} \mu(m) = 1$ and $\mu$ is concentrated on $H$. The Markov chain modeling an EA $\mathcal{A}$ is thus a random walk associated to the action of the finite group $G$ on $\Omega^m$: it generates new populations via the moves in $H$ and, in the long run, samples with the uniform distribution $\pi$.
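The convergence claim can be illustrated numerically: a random walk driven by a generating set of bijections visits every state in the orbit with limiting frequency $1/|X|$. The toy action below (a 6-element set, two generators) is our example, not the paper's.

```python
import random
from collections import Counter

# Generators acting on {0,...,5}: a cyclic shift and a transposition.
shift = lambda x: (x + 1) % 6
swap01 = lambda x: {0: 1, 1: 0}.get(x, x)
H = [shift, swap01]                  # H generates a transitive group action

x = 0
visits = Counter()
for _ in range(60_000):
    g = random.choice(H)             # mu concentrated (uniformly) on H
    x = g(x)
    visits[x] += 1
print({state: round(c / 60_000, 3) for state, c in sorted(visits.items())})
# each of the 6 states is visited with frequency close to 1/6
```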

The proposed evolutionary algorithm given above improves the behavior between parents and offspring. Single-point crossover, given in section xxx, uses different cutting points for each of the two parents in the population. The cutting points are extracted independently for each parent because the genotype lengths of individuals are variable. The cutting point is taken only between one layer and the next (for two hidden layers, between the second layers of the two network parents); this means that a new evolutionary weight matrix is created to make the connection between the two layers at the cutting points in the parents, producing two offspring, so that the population size is kept constant. In each offspring, node or layer creation and deletion is possible based on the predefined genetic parameters.
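A sketch of this layer-boundary single-point crossover, reusing the dictionary individuals from the initialization sketch above. The bridging-weight scheme is an assumption where the paper only says a new weight matrix is created at the cut.

```python
import random

def layer_crossover(parent_a, parent_b):
    """Cut each parent independently at a layer boundary (never inside a
    layer) and splice the fragments, creating a fresh bridging weight
    matrix at each junction; two offspring keep the population constant."""
    cut_a = random.randrange(1, len(parent_a["sizes"]) - 1)
    cut_b = random.randrange(1, len(parent_b["sizes"]) - 1)

    def splice(front, f_cut, back, b_cut):
        sizes = front["sizes"][:f_cut] + back["sizes"][b_cut:]
        # new random weight matrix bridging the two fragments at the cut
        bridge = [[random.gauss(0.0, 0.5) for _ in range(sizes[f_cut])]
                  for _ in range(sizes[f_cut - 1])]
        weights = front["weights"][:f_cut - 1] + [bridge] + back["weights"][b_cut:]
        return {"sizes": sizes, "weights": weights}

    return (splice(parent_a, cut_a, parent_b, cut_b),
            splice(parent_b, cut_b, parent_a, cut_a))
```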

3.3 Topology Mutation: The mutation transformations consist of the transformations

$$M_a : \Omega \to \Omega \qquad (3.0)$$

where $a \in \bigcup_{i \in S} A_i$ for $S \subseteq \{1, 2, \ldots, n\}$. Therefore $a = (a_{i_1}, a_{i_2}, \ldots, a_{i_k})$ for $i_1 < i_2 < \cdots < i_k \in S \subseteq \{1, 2, \ldots, n\}$, defined as follows: for $x = (x_1, x_2, \ldots, x_n) \in \Omega$ we have

$$M_a(x) = y = (y_1, y_2, \ldots, y_n) \qquad (3.1)$$

where

$$y_q = \begin{cases} a_q & \text{if } q = i_j \text{ for some } j \\ x_q & \text{otherwise} \end{cases}$$
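Concretely, (3.1) replaces the alleles at the selected positions and copies the rest. A minimal sketch (the dict-of-replacements interface is ours):

```python
def topology_mutation(x, replacements):
    """M_a as in (3.1): y_q = a_q at each selected position q = i_j,
    and y_q = x_q otherwise."""
    return tuple(replacements.get(q, x_q) for q, x_q in enumerate(x))

print(topology_mutation((0, 1, 1, 0, 1), {1: 0, 4: 0}))   # (0, 0, 1, 0, 0)
```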

A way to study the global behavior of evolutionary algorithms is to consider a group or family of subsets of the search space and to predict which of these subsets (say $Q$) satisfy the property that the expected number of occurrences of elements of $Q$ increases from one generation to the next. Each such subset is called a schema. If the chromosome length is fixed to $n$, the search space is $\Omega = \prod_{i=1}^{n} A_i$, where $A_i$ is the set of all possible alleles which may occur at the $i$th position in the chromosome. The next section gives the selection of offspring based on the fitness function.

3.4 Selection: A tournament is performed by choosing a group of offspring selected at random and reproducing the best individual from this group. Pick $P$ challengers as a group, where $P$ is 10% of the population size, arrange a tournament with respect to fitness between the $P$ challengers and the $r$th solution, and define the score of the $r$th solution. The scores are determined by the minimum-distance method using the fitness function [18]; this is called $P$-tournament selection. Arrange the scores of all the solutions in ascending order and pick the best half of the score positions; the best half of the scores are carried to the next generation. Repeat the process $r$ times, where $r$ is twice the population size, obtaining the scores of $r$ $P$-tournaments. The selection probabilities for $P$-tournament selection, together with further selection pressures and their comparison, are given in [15], [16].
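A sketch of this P-tournament scheme under stated assumptions: the challenger group is 10% of the population, $2m$ tournaments are played, and the best-scoring half survives. We score by wins against the challengers, where the paper uses a minimum-distance score via the fitness function [18].

```python
import random

def p_tournament_selection(population, fitness, frac=0.10):
    """Play r = 2m tournaments; in each, a random candidate is scored
    against a challenger group of size P = 10% of the population, and the
    best-scoring half of the tournament entrants survive."""
    p = max(1, int(frac * len(population)))
    scored = []
    for _ in range(2 * len(population)):
        candidate = random.choice(population)
        challengers = random.sample(population, p)
        # score: how many challengers the candidate beats on fitness
        score = sum(fitness(candidate) >= fitness(c) for c in challengers)
        scored.append((score, candidate))
    scored.sort(key=lambda sc: sc[0], reverse=True)   # best scores first
    return [ind for _, ind in scored[: len(population)]]
```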

6. EXPERIMENTAL SETUP
The idea proposed in this work emphasizes evolving ANNs: a new evolutionary system for evolving feed-forward ANNs from an architecture space. In this context, the evolutionary process attempts to crossover and mutate weights before performing any structural or topology crossover and mutation; that is, weight mutation is carried out before structural or topology mutation. The population size in the EA is taken as 20, and 10 independent trials are run to obtain the generalized behavior. The termination criterion is a fixed iteration count, equal to 100 for the EA. Table 5.1 gives all the parameters of the algorithm; default settings are taken for the problems considered. All the experiments are run by specifying the parameters and by tuning the genetic parameters to obtain the best solution.

Table 5.1 Default parameters.

Symbol | Parameter | Default value
N | Population size | 20
Seed | Previously saved population | none
  | Probability of inserting a hidden layer | 0.1
  | Probability of deleting a hidden layer | 0.05
  | Probability of inserting a neuron in a hidden layer | 0.05
  | Probability of deleting a neuron in a hidden layer | 0.05
  | Probability of crossover | 0.1
  | Number of network inputs | Problem specific
  | Number of network outputs | Problem specific
K | MSE in the range | 10

In this work, four benchmark problems are used to check the ANN optimization:

a) N-bit (2- and 4-bit) even-parity classifier
b) Pima Indians diabetes classifier
c) SPECT heart disease classifier
d) Breast cancer classifier

Performance on the N-bit parity (XOR) classification problem:
In the simultaneous evolution of architecture and connection weights, we consider only 2-bit and 4-bit parity encoders; results with different network sizes are given in this section.

FIGURE 5.8 Performance of the evolutionary ANN for 2-bit parity with initial size [2 3 2 1 2].

FIGURE 5.8 Performance of the evolutionary ANN for 2-bit parity with initial size [2 2 2 1 2].

For parity 2/4, all networks in the space have a maximum of 10 nodes, comprising 2/4 inputs, the number of hidden nodes in layer one, the number of hidden nodes in layer two, 1 output node, and two hidden layers, i.e. the size is [2/4 2/3 2 1 2]. This allows hidden-layer configurations of up to 5 nodes to be evolved. The average and best generation over all runs that found a solution for parity-2 using the accuracy fitness function, and the smallest architecture size found, were recorded. The mean square error (MSE) for the 10 trial runs is given in Table 5.2, and the performance of 5 runs is shown in Fig. 5.8; runs 3, 4 and 5 completed in 50 generations and runs 1 and 2 in 20 generations. The average number of hidden nodes over the 10 successful trial runs is 2.1, and the average number of connections is 7.9. For the ten runs,


the best individuals for the N-bit parity problems were found with genetic parameter settings of 0.05, 0.05, 0.01 and 0.01.

FIGURE 5.10 Performance of the evolutionary ANN for 4-bit parity with initial size [4 5 4 1 2].

FIGURE 5.10 Performance of the evolutionary ANN for 4-bit parity with initial size [4 4 5 1 2].

Table 5.2 Performance of the ANN shown by the EA for different trials.

Trial No. | MSE ([2 3 2 1 2]) | MSE ([4 5 4 1 2])
1  | 9.0084e-003 | 3.2548e-006
2  | 2.1219e-026 | 1.3548e-002
3  | 2.0416e-014 | 6.3254e-011
4  | 1.3406e-003 | 5.4856e-019
5  | 2.1219e-026 | 9.2154e-026
6  | 9.0084e-003 | 9.3554e-004
7  | 2.1219e-026 | 2.8754e-014
8  | 2.0416e-014 | 9.2365e-013
9  | 1.3406e-003 | 3.4587e-001
10 | 3.2323e-022 | 8.2657e-016

Performance on real-world dataset classification problems:
For the real-world datasets, all the data used for the training and test sets are acquired from the UCI Machine Learning Repository [17]. Each input variable should be preprocessed so that its mean value, averaged over the entire training set, is close to zero, or else is small compared to its standard deviation.
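A dependency-free sketch of that preprocessing, standardizing each variable with training-set statistics (the function name is ours):

```python
def standardize(train, test):
    """Shift each input variable to zero mean and unit variance, computing
    the statistics on the training set only and reusing them on the test set."""
    n = len(train[0])
    means = [sum(row[i] for row in train) / len(train) for i in range(n)]
    stds = [(sum((row[i] - means[i]) ** 2 for row in train) / len(train)) ** 0.5
            or 1.0                       # guard against constant variables
            for i in range(n)]
    scale = lambda rows: [[(r[i] - means[i]) / stds[i] for i in range(n)]
                          for r in rows]
    return scale(train), scale(test)
```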

i) Pima Indians diabetes dataset
ii) SPECT heart disease dataset

The Pima Indians diabetes dataset is composed of 8 attributes plus a binary class value indicating signs of diabetes, which corresponds to the target classification value, and includes 768 instances, as shown in Table 4.8. The dataset is divided into two sets, using 500 instances for training and 268 for testing.

For the Single Proton Emission Computed Tomography (SPECT) heart dataset, only 13 attributes are used as input parameters to classify the problem, with a total of 267 instances; the target value is stored as the 14th parameter in the dataset. These datasets are normalized before being applied to the network. The dataset is divided into two sets, using 200 instances for training and 67 for testing.

The evolutionary process is initialized with all the networks in the architecture space given a defined architecture size, for example [x y z 1 n]: x inputs, y hidden nodes in the 1st hidden layer, z hidden nodes in the 2nd hidden layer, one output layer with one node, and n the number of layers. After the evolutionary ANN process, the optimized network consists of only 2 hidden nodes in a single hidden layer with a unimodal sigmoid activation function; the results of the real-data classification problems are shown in the figures and tables below.
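One plausible reading of the [x y z 1 n] genotype, decoded into a layer-size list; the handling of n (taken here as the number of hidden layers actually in use) is our assumption.

```python
def decode_genotype(genotype):
    """Interpret [x, y, z, 1, n]: x inputs, y and z hidden-layer node
    counts, one output node, n hidden layers actually in use."""
    x, y, z, out, n = genotype
    hidden = [y, z][:n]           # n = 1 keeps only the first hidden layer
    return {"inputs": x, "hidden": hidden, "outputs": out}

print(decode_genotype([9, 4, 5, 1, 2]))
# {'inputs': 9, 'hidden': [4, 5], 'outputs': 1}
```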


FIGURE 5.12 Performance of the evolutionary ANN for the Pima Indians diabetes dataset with initial size [9 4 5 1 2].

Table 5.4 Results for the Pima Indians diabetes dataset.

Parameter | Experimental results
Number of runs | 10 | 10
Number of generations | 40 | 61
Number of training patterns used | 500 | 500
Average training set accuracy | 76.0 | 76.5
Number of test patterns used | 268 | 268
Average test set accuracy | 81.5 | 83.5
Initial number of hidden layers / nodes | 2 / [4 5] | 2 / [5 4]
Final number of hidden layers / nodes (resulting NN) | 1 / [2] | 1 / [3]
Population size | 50 | 50
Number of inputs | 09 | 09
Number of outputs | 01 | 01

FIGURE 5.13 Performance of the evolutionary ANN for the SPECT heart dataset with initial size [14 4 5 1 2].

FIGURE 5.14 Performance of the evolutionary ANN for the breast cancer dataset with initial size [11 4 5 1 2].

Table 5.6 Results for the SPECT heart dataset.

Parameter | Experimental results
Number of runs | 10 | 10
Number of generations | 90 | 103
Number of training patterns used | 200 | 200
Average training set accuracy | 86.0 | 87.2
Number of test patterns used | 67 | 67
Average test set accuracy | 85.2 | 86.5
Initial number of hidden layers / nodes | 2 / [4 5] | 2 / [5 4]
Final number of hidden layers / nodes (resulting NN) | 1 / [3] | 1 / [3]
Population size | 50 | 50
Number of inputs | 14 | 14
Number of outputs | 01 | 01

For the Pima Indians classification, the average mean square error is 8.6214e-3. During training the network is adjusted according to its error, whereas the test process provides an independent measure of network performance during and after training. The best solution was found in fewer than 50 generations with genetic parameter settings of 0.1, 0.05, 0.1 and 0.1. Another network size, [9 5 4 1 2], is also shown in Table 5.4, with a minimum of 3 hidden nodes in a single hidden layer. The results for the heart dataset are shown in Table 5.6, and a comparison with the literature is shown in Table 5.7. Ten runs are executed, and the


average percentage error values of the training and test processes are summarized in Table 5.6; 5 trial runs are shown in Fig. 5.13, with an average mean square error of 7.7264e-3. The best solutions were reached in fewer than 90 generations with genetic parameter settings of 0.1, 0.1, 0.1 and 0.1. Another network size, [14 5 4 1 2], is also shown in the table, with a minimum of 3 hidden nodes in a single hidden layer. The results for the breast cancer dataset are shown in Table 5.8, and a comparison with the literature is shown in Table 5.9. Ten runs are executed, and the average percentage error values of the training and test processes are summarized in Table 5.8; 5 trial runs are shown in Fig. 5.14, with an average mean square error of 5.3614e-3. The best solution was reached in fewer than 45 generations with genetic parameter settings of 0.05, 0.05, 0.05 and 0.05 in all runs. Another network size, [11 5 4 1 2], is also shown in the table, with a minimum of 3 hidden nodes in a single hidden layer.

Table 5.8 Results for the breast cancer dataset.

Parameter | Experimental results
Number of runs | 10 | 10
Number of generations | 45 | 52
Number of training patterns used | 400 | 400
Average training set accuracy | 97.0 | 97.0
Number of test patterns used | 240 | 240
Average test set accuracy | 98.5 | 98.5
Initial number of hidden layers / nodes | 2 / [4 5] | 2 / [5 4]
Final number of hidden layers / nodes (resulting NN) | 1 / [2] | 1 / [2]
Population size | 50 | 50
Number of inputs | 11 | 11
Number of outputs | 01 | 01

CONCLUSION:
The optimal architecture and weights of the ANN are obtained in the learning phase by using the concepts of an evolutionary genetic algorithm. The proposed method of adjusting both architecture and weights outperforms fixed-network back-propagation at every level for 2-bit and 4-bit parity, and on the real dataset classification problems it reaches an excellent percentage of accuracy with an optimized network having fewer hidden nodes and layers at low probability settings.

    REFERENCES:

[1] Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 1996.
[2] Mühlenbein, H. and Mahnig, T. Evolutionary computation and beyond. In Y. Uesaka, P. Kanerva, and H. Asoh, editors, Foundations of Real-World Intelligence, CSLI Publications, pp. 123-188, 2001.
[3] Mitavskiy, B. Crossover invariant subsets of the search space for evolutionary algorithms. Evolutionary Computation. http://www.math.lsa.umich.edu/vbmitavsk/
[4] Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. of Michigan Press, 1975.
[5] Coffey, S. An Applied Probabilist's Guide to Genetic Algorithms. Thesis submitted to The University of Dublin for the degree of Master in Science, 1999.
[6] Mac Lane, S. Categories for the Working Mathematician. Graduate Texts in Mathematics 5, Springer-Verlag, 1971.
[7] Poli, R., Stephens, C., Wright, A., and Rowe, J. A schema-theory-based extension of Geiringer's theorem for linear GP and variable-length GAs under homologous crossover, 2002.
[8] Vose, M. Generalizing the notion of a schema in genetic algorithms. Artificial Intelligence, 50(3):385-396, 1991.


[9] Radcliffe, N. The algebra of genetic algorithms. Annals of Mathematics and Artificial Intelligence, 10:339-384, 1994. http://users.breathemail.net/njr/papers/amai94.pdf
[10] Geiringer, H. On the probability theory of linkage in Mendelian heredity. Annals of Mathematical Statistics, 15:25-57, 1944.
[11] Vose, M. and Wright, A. The simple genetic algorithm and the Walsh transform: Part II, the inverse. Evolutionary Computation, 6(3):275-289, 1998.
[12] Stephens, C. and Waelbroeck, H. Schemata evolution and building blocks. Evolutionary Computation, 7(2):109-124, 1999.
[13] Stephens, C. The renormalization group and the dynamics of genetic systems. To be published in Acta Physica Slovaca, 2002. http://arXiv.org/abs/cond-mat/0210271
[14] Wright, A., Rowe, J., Poli, R., and Stephens, C. A fixed point analysis of a gene pool GA with mutation. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Morgan Kaufmann, 2002. http://www.cs.umt.edu/u/wright/
[15] He, J. and Yao, X. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127:57-85, 2001.
[16] Chen, T., He, J., Sun, G., Chen, G., and Yao, X. A new approach to analyzing average time complexity of population-based evolutionary algorithms on unimodal problems. IEEE Trans. Syst., Man, and Cybern., Part B, 39(5):1092-1106, 2009.
[17] Newman, D.J., Hettich, S., Blake, C.L., and Merz, C.J. UCI repository of machine learning databases, 1998.
[18] Hutter, M. and Legg, S. Fitness uniform optimization. IEEE Trans. Evol. Comput., 10(5):568-589, 2006.
[19] Liepins, G. and Vose, M. Characterizing crossover in genetic algorithms. Annals of Mathematics and Artificial Intelligence, 5:27-34, 1992.
