
Engineering Applications of Artificial Intelligence 16 (2003) 453–463

    Recurrent radial basis function network for time-series prediction

    Ryad Zemouri*, Daniel Racoceanu, Noureddine Zerhouni

Laboratoire d'Automatique de Besançon, Groupe Maintenance et Sûreté de Fonctionnement, 25, Rue Alain Savary, 25 000 Besançon, France

    Abstract

This paper proposes a Recurrent Radial Basis Function network (RRBFN) that can be applied to dynamic monitoring and prognosis. Based on the architecture of conventional Radial Basis Function networks, the RRBFN has looped input neurons with sigmoid activation functions. These looped neurons represent the dynamic memory of the RRBF, and the Gaussian neurons represent the static one. The dynamic memory enables the network to learn temporal patterns without an input buffer to hold the recent elements of an input sequence. To test the dynamic memory of the network, we have applied the RRBFN to two time-series prediction benchmarks (MacKey-Glass and Logistic Map). The third application concerns an industrial prognosis problem: nonlinear system identification using the Box and Jenkins gas furnace data. A two-step training algorithm is used: the RCE training algorithm for the prototype parameters, and multivariate linear regression for the output connection weights. The network is able to predict the two temporal series and gives good results for the nonlinear system identification. The advantage of the proposed RRBF network is to combine the learning flexibility of the RBF network with the dynamic performance of the local recurrence given by the looped neurons.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Neural network; Radial basis function; Dynamic neural networks; Recurrent neural networks; Neural predictive model; Time series prediction

    1. Introduction

Modern industrial monitoring requires the processing of a number of sensor signals. It essentially concerns the detection of any deviation from a working reference, by generating an alarm, and the failure diagnosis. The diagnosis operation has two main functions: the location of the failing system or sub-system and the identification of the primary cause of this failure (Lefebvre, 2000). Monitoring methods can be classified into two categories (Dash and Venkatasubramanian, 2000): model-based monitoring methodologies and model-free monitoring. The first class essentially contains control-system techniques based on the difference between the system model's outputs and the equipment's outputs (Combacau, 1991). The major disadvantage of these techniques is the difficulty of obtaining a formal model, especially for complex or re-configurable equipment. The second class of monitoring techniques is not sensitive to this problem. These techniques are the probabilistic ones and the Artificial Intelligence ones. The AI techniques are essentially based on a training process that gives a certain adaptability to the monitoring application (Rengaswamy and Venkatasubramanian, 1995).

The use of Artificial Neural Networks (ANN) in a monitoring task can be viewed as a pattern recognition application. The pattern to recognize is the measurable or observable equipment data. The output classes are the different working and failure modes of the equipment (Koivo, 1994). Radial Basis Function networks are well adapted to this kind of application. Because the history database of the equipment operation is not exhaustive, RBF networks are able to detect new operation or failure modes thanks to their local generalization. This locality is obtained by the Gaussian basis functions, which are maximal at the core and decrease monotonically with the distance. The second advantage of the RBF network is the flexibility of its training process.

The problem with static classification methods is that the dynamic behavior of the process is not considered (Koivo, 1994). For example, distinguishing between a true degradation and a false alarm requires a dynamic processing of the sensor signals (Zemouri et al., 2002a).

    *Corresponding author.

    URL: http://www.lab.cnrs.fr


    doi:10.1016/S0952-1976(03)00063-0


In our previous work, we have demonstrated that a dynamic RBF is able to distinguish between a peak of variation and a continuous variation of a sensor signal. This can be interpreted as a distinction between a false alarm and a true degradation. The prognosis function also depends strongly on the dynamic behavior of the process. The aim of the prognosis function is to predict the evolution of a sensor signal. This can be achieved either by a priori knowledge of the laws governing the evolution of the ageing phenomena, or by a training process on the signal evolution. In this way, the prognosis can identify degradations or predict the time remaining before breakdown (Brunet et al., 1990).

For this purpose, we introduce a new Recurrent Radial Basis Function network (RRBF) architecture that is able to learn temporal sequences. The RRBF network builds on the advantages of Radial Basis Function networks in terms of training time. The recurrent or dynamic aspect is obtained by cascading looped neurons on the first layer. This layer represents the dynamic memory of the RRBF network, which permits the learning of temporal data. The proposed network combines the ease of use of the RBF network with the dynamic performance of the Locally Recurrent Globally Feedforward network (Tsoi and Back, 1994).

The prognosis function can be seen as a time-series prediction problem. In order to validate the prediction capability of the RRBFN, we test the network on two standard time-series prediction benchmarks: the MacKey-Glass series and the Logistic Map. The prognosis validation is made on a nonlinear system identification using the Box & Jenkins gas furnace data.

The paper is organized as follows: a brief survey of RBF networks, their applications and their training algorithms is presented in the second section. The third section describes the architecture of the RRBF network for time-series prediction. Finally, we present the results obtained on the three benchmarks.

    2. Radial basis function network overview

2.1. RBF network definition

Radial Basis Function networks are able to provide a local representation of an N-dimensional space. This locality comes from the restricted influence zone of the basis functions. The parameters of a basis function are given by a reference vector (core or prototype) μ_j and the dimension σ_j of its influence field. The response of the basis function depends on the Euclidean distance between the input vector x and the prototype vector μ_j, and also on the size of the influence field:

\phi_j(x) = \exp\left( -\frac{\| x - \mu_j \|^2}{2 \sigma_j^2} \right).   (1)
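As an illustration, here is a minimal Python (NumPy) sketch of Eq. (1); the function and variable names are ours, not the paper's:

import numpy as np

def gaussian_basis(x, mu, sigma):
    # Response of one Gaussian basis function, Eq. (1)
    d2 = np.sum((np.asarray(x) - np.asarray(mu)) ** 2)  # squared Euclidean distance
    return np.exp(-d2 / (2.0 * sigma ** 2))

# The response is maximal (1.0) at the core and decreases monotonically with distance:
print(gaussian_basis([0.0, 0.0], [0.0, 0.0], 1.0))  # 1.0
print(gaussian_basis([1.0, 1.0], [0.0, 0.0], 1.0))  # exp(-1) = 0.3679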

For a given input, a restricted number of basis functions contributes to the computation of the output. RBF networks can be classified into two categories according to the type of output neuron: normalized and non-normalized (Mak and Kung, 2000; Moody and Darken, 1989; Xu, 1998; Ghosh and Nag, 2000). Moreover, RBF networks can be used in two kinds of applications: regression and classification.

    2.2. RBF training techniques

The parameters of an RBF network are the centers and influence fields of the radial functions and the output weights (between the intermediate-layer neurons and those of the output layer). These parameters are obtained by the training process. The training techniques can be classified into the following three groups:

    2.2.1. Supervised techniques

The principle of these techniques is to minimize the quadratic error (Ghosh et al., 1992):

E = \sum_n E^n.   (2)

At each step of the training process, we consider the variations Δw_ij of the weights, Δμ_jk of the centers and Δσ_j of the influence fields. The update law is obtained by gradient descent on E^n (Rumelhart et al., 1986; Le Cun, 1985).
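As a hedged sketch of one such update for a single-output RBF network, take E = ½(y − d)², with derivatives following from Eq. (1); all names below are illustrative, not the paper's:

import numpy as np

def gradient_step(x, d, mu, sigma, w, lr=0.01):
    # One supervised update of (w, mu, sigma) by descent on E = 0.5 * (y - d)^2
    phi = np.exp(-np.sum((x - mu) ** 2, axis=1) / (2.0 * sigma ** 2))  # Eq. (1)
    err = w @ phi - d                                  # output error y - d
    grad_w = err * phi
    grad_mu = err * (w * phi)[:, None] * (x - mu) / sigma[:, None] ** 2
    grad_sigma = err * (w * phi) * np.sum((x - mu) ** 2, axis=1) / sigma ** 3
    return w - lr * grad_w, mu - lr * grad_mu, sigma - lr * grad_sigma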

    2.2.2. Heuristic techniques

The principle of these techniques is to determine the network parameters in an iterative way. Generally, the training process starts by initializing the network on one center with an initial influence field (μ_0, σ_0). The prototype centers are then created progressively as the training vectors are presented. The aim of the next step is to modify the influence radii and the connection weights (only the weights between the intermediate layer and the output one). Some of the heuristic techniques used for RBF training are presented below.

2.2.2.1. RCE algorithm (Restricted Coulomb Energy) (Hudak, 1992). The RCE algorithm was inspired by the theory of charged particles. The principle of the training algorithm is to modify the network architecture in a dynamic way: intermediate neurons are added only when necessary. The influence fields are then adjusted to minimize conflicting zones according to a threshold θ (Fig. 1).

2.2.2.2. Dynamic Decay Adjustment algorithm (Berthold and Diamond, 1995). This technique, partially derived from the RCE algorithm, is used for classification (discrimination) applications.


The principle of this technique is to introduce two thresholds θ⁺ and θ⁻ in order to reduce the conflicting zones between prototypes. To ensure the convergence of the training algorithm, the neural network must satisfy the two inequalities (3) for each vector x of class c from the training set (Fig. 2):

\exists i : \phi_i^c(x) \geq \theta^+ \quad \text{and} \quad \forall k \neq c, \; \forall j : \phi_j^k(x) < \theta^-.   (3)
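A small Python check of condition (3) may make it concrete; phi is assumed here to map each class label to the list of its prototype activations for one training vector (our own convention):

def dda_condition_holds(phi, c, theta_plus, theta_minus):
    # Inequality (3): one prototype of the correct class c fires at or above
    # theta_plus, and every prototype of every other class stays below theta_minus
    return (any(p >= theta_plus for p in phi[c]) and
            all(p < theta_minus
                for k, ps in phi.items() if k != c
                for p in ps))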

2.2.3. Two-phase training techniques
These techniques estimate the RBF parameters in two phases. A first phase determines the centers and the radii of the basis functions; in this step only the input vectors are used (unsupervised training). The second phase computes the connection weights between the hidden layer and the output layer (supervised training). Some of these techniques are presented below.

2.2.3.1. First phase (unsupervised). The k-means algorithm: The prototype centers and the variance matrices can be calculated in two steps. In the first step, the k-means clustering algorithm determines the centers of the clusters of points with the same class. These centers are obtained by segmenting the training space χ^k of class k into J_k disjoint groups {χ_j^k}, j = 1, …, J_k. The population of group j is N_j^k points. The center μ_j of the function is then estimated by the average

\mu_j = \frac{1}{N_j^k} \sum_{x \in \chi_j^k} x.   (4)

The second step calculates the variance of the Gaussian function (influence field), using the following expression:

\sigma_j = \frac{1}{N_j^k} \sum_{x \in \chi_j^k} (x - \mu_j)(x - \mu_j)^T.   (5)
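A minimal NumPy sketch of this first phase, assuming a plain k-means and a scalar width per cluster (a simplification of the full variance matrix of Eq. (5)); names are ours, and empty clusters are not handled:

import numpy as np

def kmeans_rbf_params(X, J, iters=50, seed=0):
    # Estimate RBF centers (Eq. (4)) and scalar widths (Eq. (5), simplified)
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), J, replace=False)]       # initial centers
    for _ in range(iters):
        labels = ((X[:, None] - mu) ** 2).sum(-1).argmin(axis=1)   # nearest center
        mu = np.array([X[labels == j].mean(axis=0) for j in range(J)])  # Eq. (4)
    sigma = np.array([np.sqrt(np.mean(np.sum((X[labels == j] - mu[j]) ** 2, axis=1)))
                      for j in range(J)])
    return mu, sigma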

Expectation Maximization (EM) method (Dempster et al., 1977): This technique is based on the analogy between the RBF network and Gaussian mixture models. The Expectation Maximization algorithm determines, in an iterative way, the parameters of a Gaussian mixture (by maximum likelihood). The RBF parameters are obtained in two steps: step E, which computes the expectation of the unknown data given the known data, and step M, which maximizes over the parameter vector of step E.

2.2.3.2. Second phase (supervised). Maximum of membership (Hernandez, 1999): This technique, used in classification applications, considers the most significant basis function value among the φ_i(x):

\phi_{\max} = \max_{i=1}^{N} \phi_i,   (6)

where N is the number of basis functions over all the classes. The output of the neural network is then given by

y = \mathrm{classe}(\phi_{\max}).   (7)

Algorithm of least squares: Suppose that an empirical risk function to minimize (R_emp) is fixed. As for the Multi-Layer Perceptron, the parameters can then be determined in a supervised way by a gradient descent method. If the selected cost function is quadratic, with fixed basis functions Φ, the weight matrix W is obtained by solving a simple linear system. The solution is the weight matrix W that minimizes the empirical risk R_emp. By cancelling the derivative of this risk with respect to the weights, we obtain the optimality conditions, which can be written in the following matrix form:

\Phi^T \Phi W^T = \Phi^T Y,   (8)

where Y represents the desired output vector. If the matrix Φ^T Φ is square and non-singular (Michelli condition (Michelli, 1986)), the optimal solution for the weights, with fixed basis functions, can be written as

W^T = (\Phi^T \Phi)^{-1} \Phi^T Y = \Phi^{-1} Y.   (9)
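In code, Eqs. (8)–(9) reduce to one linear least-squares solve; a sketch with NumPy (np.linalg.lstsq also covers the case where Φ^T Φ is singular):

import numpy as np

def output_weights(Phi, Y):
    # Minimize ||Phi @ W - Y||^2, i.e. solve the normal equations (8)-(9)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W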

Fig. 1. Influence field adjustment by the RCE algorithm, for an input vector x_n of category B. Only one threshold θ is used. The reduction of the conflicting zone must respect the following relations: φ_B(x_A) < θ, φ_A(x_n) < θ, φ_A(x_B) < θ. No new prototype is added for the input vector x_n.

Fig. 2. Influence field adjustment by the DDA algorithm, for an input vector of category B. Two thresholds θ⁺ and θ⁻ are used for the conflict reduction, according to the expressions φ_B(x_A) < θ⁻, φ_A(x_n) < θ⁻, φ_A(x_B) < θ⁻. No prototype is added for the input vector since φ_B(x_n) > θ⁺.


    3. The recurrent radial basis function network

The proposed recurrent RBF neural network considers time as an internal representation (Chappelier, 1996; Elman, 1990). The dynamic aspect is obtained by the use of an additional self-connection on the input neurons, which have a sigmoid activation function. These looped neurons are a special case of the Locally Recurrent Globally Feedforward architecture, called local output feedback (Tsoi and Back, 1994). The RRBF network can thus take into account a certain past of the input signal (Fig. 3).

    3.1. Looped neuron

Each neuron of the input layer computes, at instant t, the sum of its input I_i and its previous output weighted by a self-connection w_ii. The output of its activation function is

a_i(t) = w_{ii} x_i(t-1) + I_i(t),   (10)

x_i(t) = f(a_i(t)),   (11)

where a_i(t) and x_i(t) represent respectively the neuron activation and its output at instant t, and f is the sigmoid activation function:

f(x) = \frac{1 - \exp(-kx)}{1 + \exp(-kx)}.   (12)
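A direct Python transcription of Eqs. (10)–(12), under our own naming:

import numpy as np

def looped_neuron_step(x_prev, I_t, w_ii, k):
    # One time step of the looped neuron
    a_t = w_ii * x_prev + I_t                               # Eq. (10)
    return (1 - np.exp(-k * a_t)) / (1 + np.exp(-k * a_t))  # Eqs. (11)-(12)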

To highlight the influence of this self-connection, we let the neuron evolve without any external influence (Frasconi et al., 1995; Bernauer, 1996). The initial conditions are the input I_i(t_0) = 0 and the output x_i(t_0) = 1. The output of the neuron then evolves according to the following expression:

x(t) = \frac{1 - \exp(-k w_{ii} x(t-1))}{1 + \exp(-k w_{ii} x(t-1))}.   (13)

Fig. 4 shows the temporal evolution of the neuron output.

This evolution depends on the slope of the straight line D, which depends on two parameters: the self-connection weight w_ii and the value of the activation-function parameter k. The equilibrium points of the looped neuron satisfy the following equation:

a(t) = w_{ii} f(a(t-1)).   (14)

The point a_0 = 0 is a first obvious solution of this equation. The other solutions are obtained by studying the variations of the function

g(a) = a - w_{ii} f(a).   (15)

According to the value of k w_ii, the looped neuron has one or more equilibrium points:

* If k w_ii ≤ 2, the neuron has only one equilibrium point, a_0 = 0.
* If k w_ii > 2, the neuron has three equilibrium points: a_0 = 0, a⁺ > 0 and a⁻ < 0.

To study the stability of these points, we study the variations of a Lyapunov function (Frasconi et al., 1995; Bernauer, 1996). In the case where k w_ii ≤ 2, this function is defined by V(a) = a². We obtain

\Delta V = (w_{ii} f(a))^2 - a^2 = -g(a) (w_{ii} f(a) + a).   (16)

If a > 0, then f(a) > 0 and g(a) > 0; with w_ii > 0 we thus have ΔV < 0. If a < 0, then f(a) < 0 and g(a) < 0; with w_ii > 0 we again have ΔV < 0. The point a_0 = 0 is thus a stable equilibrium point if k w_ii ≤ 2 with w_ii > 0.

In the case where k w_ii > 2, the looped neuron has three equilibrium points: a_0 = 0, a⁺ > 0 and a⁻ < 0. To study the stability of the point a⁺, we define the Lyapunov function V(a) = (a - a⁺)² (see Frasconi et al., 1995; Bernauer, 1996). We obtain

\Delta V = (w_{ii} f(a) - a^+)^2 - (a - a^+)^2 = g(a) \left( g(a) - 2(a - a^+) \right).

If a > a⁺, then g(a) > 0 and g(a) - 2(a - a⁺) < 0, so ΔV < 0. The calculation is the same in the case a < a⁺. The point a⁺ is therefore a stable equilibrium point. In the same way, one can prove that a⁻ is another stable equilibrium point, and that a_0 = 0 is an unstable equilibrium point.

Fig. 3. RRBF network (recurrent network with radial basis functions): the inputs I_1, I_2, I_3 feed self-connected sigmoid neurons, followed by radial basis function neurons and the output neurons.

Fig. 4. Equilibrium points of the looped neuron: (a) the forgetting behavior (k w_ii ≤ 2) and (b) the temporal memorizing behavior (k w_ii > 2).


The looped neuron can thus exhibit two behaviors according to k w_ii: a forgetting behavior (k w_ii ≤ 2) and a temporal memory behavior (k w_ii > 2). Fig. 5 shows the influence of the self-connection weight on the behavior of the looped neuron with k = 0.05. The self-connection gives the neuron the capacity to memorize a certain past of the input data. The weight of this self-connection can be obtained by training, but the easier way is to fix it a priori. We will see in the next section how this looped neuron makes it possible for the RRBF network to treat dynamic data, whereas traditional RBF networks treat only static data.
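Both regimes can be checked numerically with the step function sketched above: with k = 0.05, the boundary k w_ii = 2 corresponds to w_ii = 40, so weights just below and above it reproduce the forgetting and memorizing curves of Fig. 5 (a sketch under our naming):

# Free evolution from x(0) = 1 with no external input (I = 0), as in Eq. (13)
for w_ii in (30.0, 41.0):          # k*w_ii = 1.5 (forget) and 2.05 (memorize)
    x = 1.0
    for t in range(200):
        x = looped_neuron_step(x, 0.0, w_ii, k=0.05)
    print(w_ii, round(x, 4))       # tends to 0 for w_ii = 30, to x+ > 0 for w_ii = 41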

    3.2. RRBF for the prognosis

Having shown the effect of the self-connection on the dynamic behavior of the RRBF network, we present in this section the topology of the RRBF network and its training algorithm for time-series prediction applications (Fig. 6).

The cascade of looped neurons represents the dynamic memory of the neural network; the network therefore treats the data dynamically. The output vector of the looped neurons is the input vector of the RBF nodes. The neural network output is defined by

y(t) = \sum_{i=1}^{n} w_i \phi_i(\mu_i, \sigma_i),   (17)

where w_i represents the connection weight between the ith radial neuron and the output neuron. The output of the RBF nodes has the following expression:

\phi_i(\mu_i, \sigma_i) = \exp\left( -\frac{\sum_{j=1}^{m} (x^j(t) - \mu_i^j)^2}{\sigma_i^2} \right),   (18)

where μ_i = (μ_i^j), j = 1, …, m, and σ_i represent respectively the center and the dimension of the influence radius of the ith prototype. These radial neurons are the static memory of the network. The output x^j(t) of the jth looped neuron constitutes the dynamic memory of the network, with the following expression:

x^j(t) = \frac{1 - \exp\left( -k (\varpi x^j(t-1) + x^{j-1}(t)) \right)}{1 + \exp\left( -k (\varpi x^j(t-1) + x^{j-1}(t)) \right)},   (19)

where ϖ is the self-connection weight and j = 1, …, m, with m the number of neurons of the input layer. The first neuron of this layer has a linear activation function: x^1(t) = x(t).
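Putting Eqs. (17)–(19) together, the following is a sketch of one forward pass of the RRBF, assuming m looped input neurons (the first one linear), n Gaussian prototypes, and our own variable names:

import numpy as np

def rrbf_forward(u_t, x_prev, w_self, k, mu, sigma, w_out):
    # u_t: current input sample; x_prev: looped-neuron outputs at t-1 (length m)
    # mu: (n, m) prototype centers; sigma: (n,) radii; w_out: (n,) output weights
    m = len(x_prev)
    x = np.empty(m)
    x[0] = u_t                                   # first neuron is linear: x^1(t) = x(t)
    for j in range(1, m):                        # cascade of looped neurons, Eq. (19)
        a = w_self * x_prev[j] + x[j - 1]
        x[j] = (1 - np.exp(-k * a)) / (1 + np.exp(-k * a))
    phi = np.exp(-np.sum((x - mu) ** 2, axis=1) / sigma ** 2)   # Eq. (18)
    return float(w_out @ phi), x                 # Eq. (17) and the updated memory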

Fig. 7 shows the relation between the number of looped neurons and the length of the signal past the network can perceive. We introduced a variation Δ at instant t = 50 in a signal (Figs. 7(a) and (b)). The aim is to highlight the dynamic memory length of the RRBF shown in Fig. 6. A four looped-neuron RRBF is stimulated by the signal of Fig. 7(a). Figs. 7(c)–(f) show the output error of each looped neuron caused by this variation Δ.

The network parameters are determined by a two-stage training process. During the first stage, an unsupervised learning algorithm is used to determine the parameters of the RBF nodes (the centers and the influence radii). In the second stage, linear regression is used to determine the weights between the hidden layer and the output layer.

    3.3. Training process of the RRBF

3.3.1. The prototype parameters
The first step of the training process consists of determining the centers and the influence radii of the prototypes (the static memory). These prototypes are extracted from the outputs of the looped neurons (the dynamic memory). Each temporal signal is thus characterized by a cluster of points whose coordinates are the outputs of the looped neurons at every instant t. We have adopted the RCE training algorithm for this first stage of the training process. The influence radii are adjusted according to a threshold θ.

Fig. 5. Influence of the self-connection on the behavior of the looped neuron with k = 0.05, for w_ii = 30, 39, 40 and 41.

Fig. 6. Topology of the RRBF. The self-connection of the input neurons procures to the network a dynamic processing of the input data.


A complete iteration of this algorithm is as follows:

// Training iteration
// Creation of a new prototype
for all training vectors x do:
    add a new prototype p_{n+1} with:
        μ_{n+1} = x
        n = n + 1
end
// Adjusting the influence radii
for all prototypes μ_i do:
    σ_i = max { σ : φ_i(μ_j) < θ for all 1 ≤ j ≤ n, j ≠ i }
end
// End
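A hedged Python rendering of this iteration, assuming the Gaussian of Eq. (18) with a scalar width; the names are ours, and at least two training vectors are required:

import numpy as np

def rce_train(X, theta):
    # Step 1: every training vector becomes a prototype center
    mu = np.asarray(X, dtype=float)
    n = len(mu)
    sigma = np.empty(n)
    # Step 2: for each prototype, take the largest radius such that its
    # activation at every other center stays below theta (0 < theta < 1):
    # exp(-d2/sigma^2) < theta  <=>  sigma^2 < d2 / (-ln theta)
    for i in range(n):
        d2 = np.sum((mu - mu[i]) ** 2, axis=1)
        d2_min = np.min(d2[np.arange(n) != i])    # closest conflicting prototype
        sigma[i] = np.sqrt(d2_min / -np.log(theta))
    return mu, sigma

Since every training vector is stored as a prototype, the number of hidden nodes grows with the training set; this is the overtraining behavior discussed in Section 5.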

3.3.2. Connection weights
The time-series prediction can be seen as an interpolation problem. The output of the RBF network is

h(x) = \sum_{i=1}^{N} w_i \phi_i(\| x - \mu_i \|),   (20)

where N represents the number of basis functions, centered at the N input points. The solution of this problem is to solve the N linear equations that give the weight coefficients:

\begin{bmatrix} \phi_{11} & \phi_{12} & \cdots & \phi_{1N} \\ \phi_{21} & \phi_{22} & \cdots & \phi_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{N1} & \phi_{N2} & \cdots & \phi_{NN} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix},   (21)

where y_i is the desired output and

\phi_{ij} = \phi(\| \mu_i - \mu_j \|), \quad i, j = 1, 2, \ldots, N.   (22)

The equation can be written as

\Phi w = Y.   (23)

The weight vector is then

w = \Phi^{-1} Y.   (24)
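A direct NumPy sketch of Eqs. (21)–(24), under our own naming:

import numpy as np

def interpolation_weights(mu, y, phi=lambda r: np.exp(-r ** 2)):
    # Build Phi with phi_ij = phi(||mu_i - mu_j||) (Eq. (22)) and solve Phi w = Y
    r = np.linalg.norm(mu[:, None] - mu[None, :], axis=-1)   # pairwise distances
    Phi = phi(r)
    return np.linalg.solve(Phi, y)               # w = Phi^{-1} Y, Eq. (24)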

    4. Application in prediction

We have tested the RRBF network on three time-series prediction applications. In these three applications, the goal is to predict the evolution of the input data from the knowledge of their past. The training process uses a part of the data set, and the network is then tested on the totality of the data. For each application, we give two average prediction errors and two error standard deviations, according to whether the network is tested on the test population only or on both the test and training populations.

4.1. MacKey-Glass chaotic time series
The MacKey-Glass chaotic time series is generated by the following differential equation:

\frac{dx(t)}{dt} = -b x(t) + \frac{a x(t - \tau)}{1 + x^{10}(t - \tau)}.   (25)

x(t) is quasi-periodic and chaotic for the parameters a = 0.2, b = 0.1 and τ = 17 (Jang, 1993; Chiu, 1994). The simulated data were obtained by applying the fourth-order Runge-Kutta method to Eq. (25), with initial conditions x(0) = 1.2 and x(t − τ) = 0 for 0 ≤ t < τ. The simulation step is 1. The data of this series are available at http://neural.cs.nthu.edu.tw/jang/benchmark.
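For reference, here is a simplified Python sketch of this generation; the delayed term is held constant over each unit step, which is a simplification of the full fourth-order Runge-Kutta scheme, and the names are ours:

import numpy as np

def mackey_glass(n, a=0.2, b=0.1, tau=17, x0=1.2):
    # Generate n samples of Eq. (25) with unit time step
    x = np.zeros(n + tau)
    x[tau] = x0                                  # x(0) = 1.2; x(t) = 0 for t < 0
    for t in range(tau, n + tau - 1):
        xd = x[t - tau]                          # delayed term, frozen over the step
        f = lambda v: -b * v + a * xd / (1 + xd ** 10)
        k1 = f(x[t]); k2 = f(x[t] + k1 / 2); k3 = f(x[t] + k2 / 2); k4 = f(x[t] + k3)
        x[t + 1] = x[t] + (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x[tau:]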

We have tested the RRBF network presented previously on the MacKey-Glass prediction. To obtain good results, we used six looped neurons. The parameters of these looped neurons are set so as to obtain the longest dynamic memory (Fig. 5). This characteristic is obtained with the self-connection value ϖ = 40 and the sigmoid-function parameter k = 0.05. The parameters of the Gaussian functions as well as the connection weights are given by the training algorithms presented previously, with θ = 0.8.

Fig. 7. Influence of the number of looped neurons on the length of the dynamic memory of the network: (a) signal evolution, (b) signal with variation Δ, (c) first looped neuron error, (d) second looped neuron error, (e) third looped neuron error, and (f) fourth looped neuron error.

Table 1 presents the results obtained by the RRBF network for different numbers of training points (Nb), taken from the 118th data point onwards. The prediction errors between the network output and the real values of the series are presented in the columns of the table, together with the percentage of each error. This percentage is calculated with respect to the amplitude of the series (0.9). The network is able to predict the series evolution with a minimum of 50 training points, with a mean error of 19% and an error standard deviation of 27%. This error decreases as the number of training points increases, down to 2%. The training time corresponds to one iteration. Fig. 8 shows the results of the test with 500 training points.

    4.2. Logistic map

The Logistic Map series is defined by the expression below:

x(t+1) = 4 x(t) \left( 1 - x(t) \right).   (26)

This series is chaotic in the interval [0, 1], with x(0) = 0.2. The goal of this application is to predict the target value x(t+1); the input value of the RRBF network is x(t).
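Generating the series and the one-step-ahead training pairs is straightforward; a sketch with our own names:

def logistic_map(n, x0=0.2):
    # Iterate x(t+1) = 4 x(t) (1 - x(t)), Eq. (26)
    xs = [x0]
    for _ in range(n - 1):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

series = logistic_map(200)
inputs, targets = series[:-1], series[1:]   # network input x(t), target x(t+1)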

The best prediction results are obtained with one looped neuron having the parameters ϖ = 40 for the self-connection and k = 0.05 for the sigmoid function. The parameter θ = 0.999 was used for the first-stage training process. Table 2 shows the test results of the RRBF network for different numbers of training points (Nb). The network gives good results with only 10 training points. Fig. 9 shows the results of the test with 100 training data points.

4.3. Nonlinear system prediction
The third application relates to a nonlinear prediction system, using the Box and Jenkins (1970) gas furnace database, which is available at http://neural.cs.nthu.edu.tw/jang/benchmark. These data represent a time series of a gas furnace process, where u(t) represents the input gas and y(t) represents the output CO2 concentration.
Table 1
Results of the RRBF test on the MacKey-Glass series prediction

Nb    Min                  Max             Moy1           Moy2           Dev Std1      Dev Std2
50    3.90e-4 (0.043%)     1.1669 (129%)   0.1862 (20%)   0.1776 (19%)   0.251 (27%)   0.2482 (27%)
100   3.27e-5 (0.0036%)    1.1632 (129%)   0.0969 (10%)   0.0879 (9%)    0.184 (20%)   0.1778 (19%)
150   4.13e-5 (0.00458%)   0.7129 (79%)    0.0655 (7%)    0.0564 (6%)    0.103 (11%)   0.0982 (11%)
200   2.60e-5 (0.00288%)   0.3915 (43%)    0.0502 (5%)    0.0408 (4%)    0.058 (6%)    0.0559 (6%)
250   4.54e-5 (0.00504%)   0.3000 (33%)    0.0480 (5%)    0.0369 (4%)    0.054 (6%)    0.0518 (5%)
300   1.46e-5 (0.00162%)   0.2727 (30%)    0.0441 (5%)    0.0318 (3%)    0.048 (5%)    0.0456 (5%)
350   2.45e-6 (0.00027%)   0.2874 (31%)    0.0439 (4%)    0.0296 (3%)    0.048 (5%)    0.0445 (5%)
400   3.35e-5 (0.0037%)    0.3114 (34%)    0.0375 (4%)    0.0236 (2%)    0.042 (4%)    0.0382 (4%)
450   9.56e-5 (0.01062%)   0.2893 (32%)    0.0360 (4%)    0.0209 (2%)    0.042 (4%)    0.0368 (4%)
500   1.50e-5 (0.00166%)   0.2789 (31%)    0.0380 (4%)    0.0203 (2%)    0.043 (4%)    0.0371 (4%)

Nb is the number of training points. Min and Max are the minimal and maximal prediction errors. Moy1 is the average prediction error on the data outside the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the corresponding standard deviations without and with the training data. The percentages are given with respect to the amplitude of the signal (0.9).

Fig. 8. Prediction results: (a) neural network output and the MacKey-Glass series values and (b) error of the neural network prediction.


The goal of this application is to predict the y(t) value from the knowledge of y(t − 1) and u(t − 1).

The RRBF network used contains two inputs: one for y(t) and another for u(t). The past of each input signal is taken into account by a looped neuron. The output of the neural network gives the y(t + 1) value. The network is composed of four input neurons (a linear neuron and a looped neuron for each input signal) and one output neuron. The intermediate neurons are determined by the first-stage training process described previously. The first 145 points of the database are used for the training process. The second-stage training algorithm determines the connection weights. The best results were obtained with ϖ = 500 and k = 0.05 for the sigmoid function, and θ = 0.84 for the training of the influence radii.
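With the looped-neuron step of Section 3.1, the four-dimensional input vector of this network can be sketched as follows (the series y and u, and all names, are placeholders for the gas furnace data):

# Build the 4-dimensional RRBF input: a linear and a looped neuron per signal
mem_y, mem_u, features = 0.0, 0.0, []
for y_t, u_t in zip(y, u):
    mem_y = looped_neuron_step(mem_y, y_t, w_ii=500.0, k=0.05)
    mem_u = looped_neuron_step(mem_u, u_t, w_ii=500.0, k=0.05)
    features.append([y_t, mem_y, u_t, mem_u])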

Table 3 shows the results of the network test on this application. The RRBF neural network gives a prediction result with an average error estimated at 8%. The training process takes one iteration.

    5. Discussion

The Recurrent Radial Basis Function network presented in this article was successfully validated on the two time-series prediction problems. Figs. 8 and 9 show the results and the prediction errors of the RRBF for the MacKey-Glass series and the Logistic Map series. This dynamic aspect is obtained thanks to the looped input nodes (Fig. 3). This local output feedback procures to the neuron a dynamic memory (Fig. 5).

Table 2
Results of the RRBF test on the Logistic Map series prediction

Nb    Moy1                  Moy2                  Dev Std1              Dev Std2
10    0.0945 (9%)           0.0898 (9%)           0.0636 (6%)           0.0652 (6%)
20    7.26e-4 (7.26e-2%)    6.53e-4 (6.53e-2%)    5.11e-4 (5.11e-2%)    5.32e-4 (5.32e-2%)
30    1.59e-6 (1.59e-4%)    1.35e-6 (1.35e-4%)    1.69e-6 (1.69e-4%)    1.66e-6 (1.66e-4%)
40    4.69e-8 (4.69e-6%)    3.75e-8 (3.75e-6%)    3.66e-8 (3.66e-6%)    3.77e-8 (3.77e-6%)
50    1.33e-9 (1.33e-7%)    1.00e-9 (1.00e-7%)    1.64e-9 (1.64e-7%)    1.53e-9 (1.53e-7%)
60    4.29e-10 (4.29e-8%)   3.02e-10 (3.02e-8%)   8.06e-10 (8.06e-8%)   7.00e-10 (7.00e-8%)
70    7.11e-11 (7.11e-9%)   5.10e-11 (5.10e-9%)   1.90e-10 (1.90e-8%)   1.55e-10 (1.55e-8%)
80    4.23e-12 (4.23e-10%)  3.25e-12 (3.25e-10%)  9.86e-12 (9.86e-10%)  7.74e-12 (7.74e-10%)
90    1.51e-11 (1.51e-9%)   1.32e-11 (1.32e-9%)   1.23e-11 (1.23e-9%)   1.45e-11 (1.45e-9%)
100   2.14e-11 (2.14e-9%)   1.55e-11 (1.55e-9%)   1.68e-11 (1.68e-9%)   1.38e-11 (1.38e-9%)

Nb is the number of training points. Moy1 is the average prediction error on the data outside the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the corresponding standard deviations without and with the training data. The percentages are given with respect to the amplitude of the signal.

Fig. 9. (a) Comparison of the prediction results of the network with the values of the Logistic Map series and (b) prediction error of the neural network.


We therefore do not have to use temporal windows to store or block the input data, as some neural architectures do: NETtalk, introduced by Sejnowski and Rosenberg (1986), the TDNN by Waibel et al. (1989) and the TDRBF by Berthold (1994). These temporal-window techniques have several disadvantages (Elman, 1990). First, the data must be blocked by an external mechanism: when should the data be presented to the network? The second disadvantage is the limitation of the temporal window dimension. Recurrent networks are not affected by these points. We have shown in Fig. 7 that the RRBF with four looped neurons is sensitive to a past of about 100 time steps.

A second advantage of the RRBF is the flexibility of the training process. A two-stage learning algorithm was used: the first stage concerns the determination of the RBF parameters, and the second stage the calculation of the output weights. Only a few seconds are required to train the RRBF on a personal computer with a 700 MHz processor.

The main difficulty is to find the best parameters that optimize the output result. These parameters are: the number of input looped neurons N > 0, the self-connection value w_ii > 0, the sigmoid-function parameter k > 0, and the parameter of the first-stage training algorithm, 0 < θ < 1. In most cases, good results can be obtained with only one looped neuron (N = 1). This input neuron is configured to have the longest memory, obtained with k w_ii = 2 (Fig. 5). The k parameter is chosen so as to give a quasi-linear aspect to the sigmoid function around the initial point (k ≈ 0.05). The last parameter to adjust is the first-stage training threshold θ.

The results obtained by the RRBF show that the RCE algorithm does not rigorously calculate the parameters of the Gaussian nodes: the neural network is overtrained. This result is completely coherent, because all the data of the training set are stored as prototypes. Clustering techniques like the k-means algorithm, which minimizes the sum of squared errors (SSE) between the inputs and the hidden node centers, will certainly give better results than the RCE algorithm. However, these techniques can also have some disadvantages. We have presented in our previous work an example which highlights these disadvantages (Zemouri et al., 2002b):

* There is no formal method for specifying the number of hidden nodes.
* These nodes are initialized randomly, so several runs are needed to obtain the best result.

Our future work will concern the development of a new method which boosts the performance of the k-means algorithm (Figs. 10-12).

Fig. 10. (a) CO2 output concentration of the gas furnace and (b) input gas of the furnace.

Table 3
Results of the RRBF test on the nonlinear system prediction

Nb    Min              Max              Moy1           Moy2          Dev Std1       Dev Std2
145   0.0067 (0.04%)   18.0235 (120%)   1.5274 (10%)   1.2441 (8%)   2.3267 (15%)   3.4950 (23%)

Nb is the number of training points. Min and Max are the minimal and maximal prediction errors. Moy1 is the average prediction error on the data outside the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the corresponding standard deviations without and with the training data. The percentages are given with respect to the amplitude of the signal.

Fig. 11. Comparison of the test results of the CO2 concentration prediction of the gas furnace with the real values, over the training and test populations.

Fig. 12. Prediction error of the RRBF network.


    6. Conclusion

We have presented in this article an application of the RRBF network to three time-series prediction problems: MacKey-Glass, Logistic Map and the Box & Jenkins gas furnace data. Thanks to its dynamic memory, the RRBF network is able to learn temporal sequences. This dynamic memory is obtained by a self-connection of the input neurons. The input data are not blocked by an external mechanism, but are memorized by the input neurons. The training process time is relatively short: one iteration for the RBF parameter calculation and one matrix computation for the output weight calculation. In the three examples, all the training data were correctly tested.

The results obtained in the three time-series prediction applications represent a validation of the dynamic data treatment by the RRBF network.

    References

Bernauer, E., 1996. Les réseaux de neurones et l'aide au diagnostic: un modèle de neurones bouclés pour l'apprentissage de séquences temporelles. Ph.D. Thesis, LAAS, France.
Berthold, M.R., 1994. A time delay radial basis function network for phoneme recognition. Proceedings of the International Conference on Neural Networks, Orlando, Vol. 7, pp. 4470–4473.
Berthold, M.R., Diamond, J., 1995. Boosting the performance of RBF networks with dynamic decay adjustment. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (Eds.), Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, pp. 521–528.
Box, G.E.P., Jenkins, G.M., 1970. Time Series Analysis, Forecasting and Control. Holden Day, San Francisco, pp. 532–533.
Brunet, J., Jaume, D., Labarrère, M., Rault, A., Vergé, M., 1990. Détection et diagnostic de pannes, approche par modélisation. Traitement des nouvelles technologies, série diagnostic et maintenance. Éditions Hermès, France.
Chappelier, J.C., 1996. RST: une architecture connexionniste pour la prise en compte de relations spatiales et temporelles. Ph.D. Thesis, École Nationale Supérieure des Télécommunications, France.
Chiu, S., 1994. Fuzzy model identification based on cluster estimation. Journal of Intelligent & Fuzzy Systems 2 (3), 267–278.
Combacau, M., 1991. Commande et surveillance des systèmes à événements discrets complexes: application aux ateliers flexibles. Ph.D. Thesis, Université Paul Sabatier, Toulouse, France.
Dash, S., Venkatasubramanian, V., 2000. Challenges in the industrial applications of fault diagnostic systems. Proceedings of the Conference on Process Systems Engineering, Computers and Chemical Engineering 24 (2–7), Keystone, Colorado, pp. 785–791.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.
Elman, J.L., 1990. Finding structure in time. Cognitive Science 14, 179–211.
Frasconi, P., Gori, M., Maggini, M., Soda, G., 1995. Unified integration of explicit knowledge and learning by example in recurrent networks. IEEE Transactions on Knowledge and Data Engineering 7 (2), 340–346.
Ghosh, J., Nag, A., 2000. Radial basis function neural network theory and applications. In: Howlett, R.J., Jain, L.C. (Eds.). Physica-Verlag, Würzburg.
Ghosh, J., Beck, S., Deuser, L., 1992. A neural network based hybrid system for detection, characterization and classification of short-duration oceanic signals. IEEE Journal of Ocean Engineering 17 (4), 351–363.
Hernandez, N.G., 1999. Système de diagnostic par réseaux de neurones et statistiques: application à la détection d'hypovigilance d'un conducteur automobile. Ph.D. Thesis, LAAS, France.
Hudak, M.J., 1992. RCE classifiers: theory and practice. Cybernetics and Systems 23, 483–515.
Jang, J.-S.R., 1993. ANFIS: adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics 23, 665–685.
Koivo, H.N., 1994. Artificial neural networks in fault diagnosis and control. Control Engineering Practice 2 (1), 89–101.
Le Cun, Y., 1985. Une procédure d'apprentissage pour réseau à seuil asymétrique. Cognitiva 85, 599–604.
Lefebvre, D., 2000. Contribution à la modélisation des systèmes dynamiques à événements discrets pour la commande et la surveillance. Habilitation à Diriger des Recherches, Université de Franche-Comté/IUT de Belfort, Montbéliard, France.
Mak, M.W., Kung, S.Y., 2000. Estimation of elliptical basis function parameters by the EM algorithm with application to speaker verification. IEEE Transactions on Neural Networks 11 (4), 961–969.
Michelli, C.A., 1986. Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constructive Approximation 2, 11–22.
Moody, J., Darken, J., 1989. Fast learning in networks of locally tuned processing units. Neural Computation 1, 281–294.
Rengaswamy, R., Venkatasubramanian, V., 1995. A syntactic pattern recognition approach for process monitoring and fault diagnosis. Engineering Applications of Artificial Intelligence Journal 8 (1), 35–51.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning internal representation by error propagation. In: Rumelhart, D.E., McClelland, J.L. (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. The MIT Press, Bradford Books, Cambridge, MA, pp. 318–362.


Sejnowski, T.J., Rosenberg, C.R., 1986. NETtalk: a parallel network that learns to read aloud. Electrical Engineering and Computer Science Technical Report, The Johns Hopkins University.
Tsoi, A.C., Back, A.D., 1994. Locally recurrent globally feedforward networks: a critical review of architectures. IEEE Transactions on Neural Networks 5 (2), 229–239.
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K., 1989. Phoneme recognition using time delay neural networks. IEEE Transactions on Acoustics, Speech and Signal Processing 37 (3), 328–339.
Xu, L., 1998. RBF nets, mixture experts, and Bayesian Ying-Yang learning. Neurocomputing 19 (1–3), 223–257.
Zemouri, R., Racoceanu, D., Zerhouni, N., 2002a. Application of the dynamic RBF network in a monitoring problem of the production systems. 15th IFAC World Congress on Automatic Control, Barcelona, Spain.
Zemouri, R., Racoceanu, D., Zerhouni, N., 2002b. Réseaux de neurones récurrents à fonctions de base radiales RRFR: application au pronostic. Revue d'Intelligence Artificielle, RSTI série RIA 16 (03), 307–338.
