
Engineering Applications of Artificial Intelligence 16 (2003) 453–463

    Recurrent radial basis function network for time-series prediction

    Ryad Zemouri*, Daniel Racoceanu, Noureddine Zerhouni

Laboratoire d'Automatique de Besançon, Groupe Maintenance et Sûreté de Fonctionnement, 25, Rue Alain Savary, 25 000 Besançon, France

    Abstract

This paper proposes a Recurrent Radial Basis Function network (RRBFN) that can be applied to dynamic monitoring and prognosis. Based on the architecture of conventional Radial Basis Function networks, the RRBFN has looped input neurons with sigmoid activation functions. These looped neurons represent the dynamic memory of the RRBF, and the Gaussian neurons represent the static one. The dynamic memory enables the network to learn temporal patterns without an input buffer to hold the recent elements of an input sequence. To test the dynamic memory of the network, we have applied the RRBFN to two time-series prediction benchmarks (MacKey-Glass and Logistic Map). The third application concerns an industrial prognosis problem: nonlinear system identification using the Box and Jenkins gas furnace data. A two-step training algorithm is used: the RCE training algorithm for the prototype parameters, and multivariate linear regression for the output connection weights. The network is able to predict the two temporal series and gives good results for the nonlinear system identification. The advantage of the proposed RRBF network is to combine the learning flexibility of the RBF network with the dynamic performance of the local recurrence given by the looped neurons.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Neural network; Radial basis function; Dynamic neural networks; Recurrent neural networks; Neural predictive model; Time series prediction

    1. Introduction

Modern industrial monitoring requires the processing of a number of sensor signals. It essentially concerns the detection of any deviation from a working reference, by generating an alarm, and the failure diagnosis. The diagnosis operation has two main functions: the location of the failing system or sub-system and the identification of the primary cause of this failure (Lefebvre, 2000). Monitoring methods can be classified into two categories (Dash and Venkatasubramanian, 2000): model-based monitoring methodologies and model-free monitoring. The first class essentially contains control-system techniques based on the difference between the system model's outputs and the equipment's outputs (Combacau, 1991). The major disadvantage of these techniques is the difficulty of obtaining a formal model, especially for complex or re-configurable equipment. The second class of monitoring techniques is not sensitive to this problem. These techniques are the probabilistic ones and the Artificial Intelligence ones. The AI techniques are essentially based on a training process that gives a certain adaptability to the monitoring application (Rengaswamy and Venkatasubramanian, 1995).

The use of Artificial Neural Networks (ANN) in a monitoring task can be viewed as a pattern recognition application. The pattern to recognize is the measurable or observable equipment data. The output classes are the different working and failure modes of the equipment (Koivo, 1994). Radial Basis Function networks are well adapted to this kind of application. Because the history database of the equipment operation is not exhaustive, RBF networks are able to detect new operation or failure modes thanks to their local generalization. This locality is obtained by the Gaussian basis functions, which are maximal at the core and decrease monotonically with the distance. The second advantage of the RBF network is the flexibility of its training process.

The problem with static classification methods is that the dynamic behavior of the process is not considered (Koivo, 1994). For example, distinguishing between a true degradation and a false alarm requires a dynamic processing of the sensor signals (Zemouri et al., 2002a).

    *Corresponding author.

    URL: http://www.lab.cnrs.fr


    doi:10.1016/S0952-1976(03)00063-0


In our previous work, we have demonstrated that a dynamic RBF is able to distinguish between a peak of variation and a continuous variation of a sensor signal. This can be interpreted as a distinction between a false alarm and a true degradation. The prognosis function also depends strongly on the dynamic behavior of the process. The aim of the prognosis function is to predict the evolution of a sensor signal. This can be achieved either by a priori knowledge of the laws governing the evolution of the ageing phenomena, or by a training process on the signal evolution. In this way, the prognosis can identify degradations or predict the time remaining before breakdown (Brunet et al., 1990).

For this purpose, we introduce a new Recurrent Radial Basis Function network (RRBF) architecture that is able to learn temporal sequences. The RRBF network builds on the advantages of Radial Basis Function networks in terms of training time. The recurrent or dynamic aspect is obtained by cascading looped neurons on the first layer. This layer represents the dynamic memory of the RRBF network, which permits the learning of temporal data. The proposed network combines the ease of use of the RBF network with the dynamic performance of the Locally Recurrent Globally Feedforward network (Tsoi and Back, 1994).

The prognosis function can be seen as a time-series prediction problem. In order to validate the prediction capability of the RRBFN, we test the network on two standard time-series prediction benchmarks: the MacKey-Glass series and the Logistic Map. The prognosis validation is made on a nonlinear system identification using the Box & Jenkins gas furnace data.

The paper is organized as follows: a brief survey of RBF networks, their applications and their training algorithms is presented in the second section. The third section describes the architecture of the RRBF network for time-series prediction. Finally, we present the results obtained on the three benchmarks.

    2. Radial basis function network overview

2.1. RBF network definition

Radial Basis Function networks are able to provide a local representation of an N-dimensional space. This locality comes from the restricted influence zone of the basis functions. The parameters of a basis function are given by a reference vector (core or prototype) μ_j and the dimension σ_j of its influence field. The response of the basis function depends on the Euclidean distance between the input vector x and the prototype vector μ_j, and also on the size of the influence field:

\phi_j(x) = \exp\left( -\frac{\| x - \mu_j \|^2}{2 \sigma_j^2} \right).   (1)
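As an illustration, here is a minimal Python (NumPy) sketch of Eq. (1); the function and variable names are ours, not the paper's:

import numpy as np

def gaussian_basis(x, mu, sigma):
    # Response of one Gaussian basis function, Eq. (1)
    d2 = np.sum((np.asarray(x) - np.asarray(mu)) ** 2)  # squared Euclidean distance
    return np.exp(-d2 / (2.0 * sigma ** 2))

# The response is maximal (1.0) at the core and decreases monotonically with distance:
print(gaussian_basis([0.0, 0.0], [0.0, 0.0], 1.0))  # 1.0
print(gaussian_basis([1.0, 1.0], [0.0, 0.0], 1.0))  # exp(-1) = 0.3679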

For a given input, a restricted number of basis functions contributes to the computation of the output. RBF networks can be classified into two categories according to the type of output neuron: normalized and non-normalized (Mak and Kung, 2000; Moody and Darken, 1989; Xu, 1998; Ghosh and Nag, 2000). Moreover, RBF networks can be used in two kinds of applications: regression and classification.

    2.2. RBF training techniques

The parameters of an RBF network are the centers and influence fields of the radial functions and the output weights (between the intermediate-layer neurons and those of the output layer). These parameters are obtained by the training process. The training techniques can be classified into the following three groups:

    2.2.1. Supervised techniques

The principle of these techniques is to minimize the quadratic error (Ghosh et al., 1992):

E = \sum_n E^n.   (2)

At each step of the training process, we consider the variations Δw_ij of the weights, Δμ_jk of the centers and Δσ_j of the influence fields. The update law is obtained by gradient descent on E^n (Rumelhart et al., 1986; Le Cun, 1985).
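As a hedged sketch of one such update for a single-output RBF network, take E = ½(y − d)², with derivatives following from Eq. (1); all names below are illustrative, not the paper's:

import numpy as np

def gradient_step(x, d, mu, sigma, w, lr=0.01):
    # One supervised update of (w, mu, sigma) by descent on E = 0.5 * (y - d)^2
    phi = np.exp(-np.sum((x - mu) ** 2, axis=1) / (2.0 * sigma ** 2))  # Eq. (1)
    err = w @ phi - d                                  # output error y - d
    grad_w = err * phi
    grad_mu = err * (w * phi)[:, None] * (x - mu) / sigma[:, None] ** 2
    grad_sigma = err * (w * phi) * np.sum((x - mu) ** 2, axis=1) / sigma ** 3
    return w - lr * grad_w, mu - lr * grad_mu, sigma - lr * grad_sigma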

    2.2.2. Heuristic techniques

The principle of these techniques is to determine the network parameters in an iterative way. Generally, the training process starts by initializing the network on one center with an initial influence field (μ_0, σ_0). The prototype centers are then created progressively as the training vectors are presented. The aim of the next step is to modify the influence radii and the connection weights (only the weights between the intermediate layer and the output one). Some of the heuristic techniques used for RBF training are presented below.

2.2.2.1. RCE algorithm (Restricted Coulomb Energy) (Hudak, 1992). The RCE algorithm was inspired by the theory of charged particles. The principle of the training algorithm is to modify the network architecture in a dynamic way: intermediate neurons are added only when necessary. The influence fields are then adjusted to minimize conflicting zones according to a threshold θ (Fig. 1).

2.2.2.2. Dynamic Decay Adjustment algorithm (Berthold and Diamond, 1995). This technique, partially derived from the RCE algorithm, is used for classification (discrimination) applications.


The principle of this technique is to introduce two thresholds θ⁺ and θ⁻ in order to reduce the conflicting zones between prototypes. To ensure the convergence of the training algorithm, the neural network must satisfy the two inequalities (3) for each vector x of class c from the training set (Fig. 2):

\exists i : \phi_i^c(x) \geq \theta^+ \quad \text{and} \quad \forall k \neq c, \; \forall j : \phi_j^k(x) < \theta^-.   (3)
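A small Python check of condition (3) may make it concrete; phi is assumed here to map each class label to the list of its prototype activations for one training vector (our own convention):

def dda_condition_holds(phi, c, theta_plus, theta_minus):
    # Inequality (3): one prototype of the correct class c fires at or above
    # theta_plus, and every prototype of every other class stays below theta_minus
    return (any(p >= theta_plus for p in phi[c]) and
            all(p < theta_minus
                for k, ps in phi.items() if k != c
                for p in ps))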

2.2.3. Two-phase training techniques
These techniques estimate the RBF parameters in two phases. A first phase determines the centers and the radii of the basis functions; in this step only the input vectors are used (unsupervised training). The second phase computes the connection weights between the hidden layer and the output layer (supervised training). Some of these techniques are presented below.

2.2.3.1. First phase (unsupervised). The k-means algorithm: The prototype centers and the variance matrices can be calculated in two steps. In the first step, the k-means clustering algorithm determines the centers of the clusters of points with the same class. These centers are obtained by segmenting the training space χ^k of class k into J_k disjoint groups {χ_j^k}, j = 1, …, J_k. The population of group j is N_j^k points. The center μ_j of the function is then estimated by the average

\mu_j = \frac{1}{N_j^k} \sum_{x \in \chi_j^k} x.   (4)

The second step calculates the variance of the Gaussian function (influence field), using the following expression:

\sigma_j = \frac{1}{N_j^k} \sum_{x \in \chi_j^k} (x - \mu_j)(x - \mu_j)^T.   (5)
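A minimal NumPy sketch of this first phase, assuming a plain k-means and a scalar width per cluster (a simplification of the full variance matrix of Eq. (5)); names are ours, and empty clusters are not handled:

import numpy as np

def kmeans_rbf_params(X, J, iters=50, seed=0):
    # Estimate RBF centers (Eq. (4)) and scalar widths (Eq. (5), simplified)
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), J, replace=False)]       # initial centers
    for _ in range(iters):
        labels = ((X[:, None] - mu) ** 2).sum(-1).argmin(axis=1)   # nearest center
        mu = np.array([X[labels == j].mean(axis=0) for j in range(J)])  # Eq. (4)
    sigma = np.array([np.sqrt(np.mean(np.sum((X[labels == j] - mu[j]) ** 2, axis=1)))
                      for j in range(J)])
    return mu, sigma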

Expectation Maximization (EM) method (Dempster et al., 1977): This technique is based on the analogy between the RBF network and Gaussian mixture models. The Expectation Maximization algorithm determines, in an iterative way, the parameters of a Gaussian mixture (by maximum likelihood). The RBF parameters are obtained in two steps: step E, which computes the expectation of the unknown data given the known data, and step M, which maximizes over the parameter vector of step E.

2.2.3.2. Second phase (supervised). Maximum of membership (Hernandez, 1999): This technique, used in classification applications, considers the most significant basis function value among the φ_i(x):

\phi_{\max} = \max_{i=1}^{N} \phi_i,   (6)

where N is the number of basis functions over all the classes. The output of the neural network is then given by

y = \mathrm{classe}(\phi_{\max}).   (7)

Algorithm of least squares: Suppose that an empirical risk function to minimize (R_emp) is fixed. As for the Multi-Layer Perceptron, the parameters can then be determined in a supervised way by a gradient descent method. If the selected cost function is quadratic, with fixed basis functions Φ, the weight matrix W is obtained by solving a simple linear system. The solution is the weight matrix W that minimizes the empirical risk R_emp. By cancelling the derivative of this risk with respect to the weights, we obtain the optimality conditions, which can be written in the following matrix form:

\Phi^T \Phi W^T = \Phi^T Y,   (8)

where Y represents the desired output vector. If the matrix Φ^T Φ is square and non-singular (Michelli condition (Michelli, 1986)), the optimal solution for the weights, with fixed basis functions, can be written as

W^T = (\Phi^T \Phi)^{-1} \Phi^T Y = \Phi^{-1} Y.   (9)
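In code, Eqs. (8)–(9) reduce to one linear least-squares solve; a sketch with NumPy (np.linalg.lstsq also covers the case where Φ^T Φ is singular):

import numpy as np

def output_weights(Phi, Y):
    # Minimize ||Phi @ W - Y||^2, i.e. solve the normal equations (8)-(9)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W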

Fig. 1. Influence field adjustment by the RCE algorithm, for an input vector x_n of category B. Only one threshold θ is used. The reduction of the conflicting zone must respect the following relations: φ_B(x_A) < θ, φ_A(x_n) < θ, φ_A(x_B) < θ. No new prototype is added for the input vector x_n.

Fig. 2. Influence field adjustment by the DDA algorithm, for an input vector of category B. Two thresholds θ⁺ and θ⁻ are used for the conflict reduction, according to the expressions φ_B(x_A) < θ⁻, φ_A(x_n) < θ⁻, φ_A(x_B) < θ⁻. No prototype is added for the input vector since φ_B(x_n) > θ⁺.


    3. The recurrent radial basis function network

The proposed recurrent RBF neural network considers time as an internal representation (Chappelier, 1996; Elman, 1990). The dynamic aspect is obtained by the use of an additional self-connection on the input neurons, which have a sigmoid activation function. These looped neurons are a special case of the Locally Recurrent Globally Feedforward architecture, called local output feedback (Tsoi and Back, 1994). The RRBF network can thus take into account a certain past of the input signal (Fig. 3).

    3.1. Looped neuron

Each neuron of the input layer computes, at instant t, the sum of its input I_i and its previous output weighted by a self-connection w_ii. The output of its activation function is

a_i(t) = w_{ii} x_i(t-1) + I_i(t),   (10)

x_i(t) = f(a_i(t)),   (11)

where a_i(t) and x_i(t) represent respectively the neuron activation and its output at instant t, and f is the sigmoid activation function:

f(x) = \frac{1 - \exp(-kx)}{1 + \exp(-kx)}.   (12)
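A direct Python transcription of Eqs. (10)–(12), under our own naming:

import numpy as np

def looped_neuron_step(x_prev, I_t, w_ii, k):
    # One time step of the looped neuron
    a_t = w_ii * x_prev + I_t                               # Eq. (10)
    return (1 - np.exp(-k * a_t)) / (1 + np.exp(-k * a_t))  # Eqs. (11)-(12)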

To highlight the influence of this self-connection, we let the neuron evolve without any external influence (Frasconi et al., 1995; Bernauer, 1996). The initial conditions are the input I_i(t_0) = 0 and the output x_i(t_0) = 1. The output of the neuron then evolves according to the following expression:

x(t) = \frac{1 - \exp(-k w_{ii} x(t-1))}{1 + \exp(-k w_{ii} x(t-1))}.   (13)

Fig. 4 shows the temporal evolution of the neuron output.

This evolution depends on the slope of the straight line D, which depends on two parameters: the self-connection weight w_ii and the value of the activation-function parameter k. The equilibrium points of the looped neuron satisfy the following equation:

a(t) = w_{ii} f(a(t-1)).   (14)

The point a_0 = 0 is a first obvious solution of this equation. The other solutions are obtained by studying the variations of the function

g(a) = a - w_{ii} f(a).   (15)

According to the value of k w_ii, the looped neuron has one or more equilibrium points:

* If k w_ii ≤ 2, the neuron has only one equilibrium point, a_0 = 0.
* If k w_ii > 2, the neuron has three equilibrium points: a_0 = 0, a⁺ > 0 and a⁻ < 0.

To study the stability of these points, we study the variations of a Lyapunov function (Frasconi et al., 1995; Bernauer, 1996). In the case where k w_ii ≤ 2, this function is defined by V(a) = a². We obtain

\Delta V = (w_{ii} f(a))^2 - a^2 = -g(a) (w_{ii} f(a) + a).   (16)

If a > 0, then f(a) > 0 and g(a) > 0; with w_ii > 0 we thus have ΔV < 0. If a < 0, then f(a) < 0 and g(a) < 0; with w_ii > 0 we again have ΔV < 0. The point a_0 = 0 is thus a stable equilibrium point if k w_ii ≤ 2 with w_ii > 0.

In the case where k w_ii > 2, the looped neuron has three equilibrium points: a_0 = 0, a⁺ > 0 and a⁻ < 0. To study the stability of the point a⁺, we define the Lyapunov function V(a) = (a - a⁺)² (see Frasconi et al., 1995; Bernauer, 1996). We obtain

\Delta V = (w_{ii} f(a) - a^+)^2 - (a - a^+)^2 = g(a) \left( g(a) - 2(a - a^+) \right).

If a > a⁺, then g(a) > 0 and g(a) - 2(a - a⁺) < 0, so ΔV < 0. The calculation is the same in the case a < a⁺. The point a⁺ is therefore a stable equilibrium point. In the same way, one can prove that a⁻ is another stable equilibrium point, and that a_0 = 0 is an unstable equilibrium point.

Fig. 3. RRBF network (recurrent network with radial basis functions): the inputs I_1, I_2, I_3 feed self-connected sigmoid neurons, followed by radial basis function neurons and the output neurons.

Fig. 4. Equilibrium points of the looped neuron: (a) the forgetting behavior (k w_ii ≤ 2) and (b) the temporal memorizing behavior (k w_ii > 2).


The looped neuron can thus exhibit two behaviors according to k w_ii: a forgetting behavior (k w_ii ≤ 2) and a temporal memory behavior (k w_ii > 2). Fig. 5 shows the influence of the self-connection weight on the behavior of the looped neuron with k = 0.05. The self-connection gives the neuron the capacity to memorize a certain past of the input data. The weight of this self-connection can be obtained by training, but the easier way is to fix it a priori. We will see in the next section how this looped neuron makes it possible for the RRBF network to treat dynamic data, whereas traditional RBF networks treat only static data.
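Both regimes can be checked numerically with the step function sketched above: with k = 0.05, the boundary k w_ii = 2 corresponds to w_ii = 40, so weights just below and above it reproduce the forgetting and memorizing curves of Fig. 5 (a sketch under our naming):

# Free evolution from x(0) = 1 with no external input (I = 0), as in Eq. (13)
for w_ii in (30.0, 41.0):          # k*w_ii = 1.5 (forget) and 2.05 (memorize)
    x = 1.0
    for t in range(200):
        x = looped_neuron_step(x, 0.0, w_ii, k=0.05)
    print(w_ii, round(x, 4))       # tends to 0 for w_ii = 30, to x+ > 0 for w_ii = 41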

    3.2. RRBF for the prognosis

Having shown the effect of the self-connection on the dynamic behavior of the RRBF network, we present in this section the topology of the RRBF network and its training algorithm for time-series prediction applications (Fig. 6).

The cascade of looped neurons represents the dynamic memory of the neural network; the network therefore treats the data dynamically. The output vector of the looped neurons is the input vector of the RBF nodes. The neural network output is defined by

y(t) = \sum_{i=1}^{n} w_i \phi_i(\mu_i, \sigma_i),   (17)

where w_i represents the connection weight between the ith radial neuron and the output neuron. The output of the RBF nodes has the following expression:

\phi_i(\mu_i, \sigma_i) = \exp\left( -\frac{\sum_{j=1}^{m} (x^j(t) - \mu_i^j)^2}{\sigma_i^2} \right),   (18)

where μ_i = (μ_i^j), j = 1, …, m, and σ_i represent respectively the center and the dimension of the influence radius of the ith prototype. These radial neurons are the static memory of the network. The output x^j(t) of the jth looped neuron constitutes the dynamic memory of the network, with the following expression:

x^j(t) = \frac{1 - \exp\left( -k (\varpi x^j(t-1) + x^{j-1}(t)) \right)}{1 + \exp\left( -k (\varpi x^j(t-1) + x^{j-1}(t)) \right)},   (19)

where ϖ is the self-connection weight and j = 1, …, m, with m the number of neurons of the input layer. The first neuron of this layer has a linear activation function: x^1(t) = x(t).
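Putting Eqs. (17)–(19) together, the following is a sketch of one forward pass of the RRBF, assuming m looped input neurons (the first one linear), n Gaussian prototypes, and our own variable names:

import numpy as np

def rrbf_forward(u_t, x_prev, w_self, k, mu, sigma, w_out):
    # u_t: current input sample; x_prev: looped-neuron outputs at t-1 (length m)
    # mu: (n, m) prototype centers; sigma: (n,) radii; w_out: (n,) output weights
    m = len(x_prev)
    x = np.empty(m)
    x[0] = u_t                                   # first neuron is linear: x^1(t) = x(t)
    for j in range(1, m):                        # cascade of looped neurons, Eq. (19)
        a = w_self * x_prev[j] + x[j - 1]
        x[j] = (1 - np.exp(-k * a)) / (1 + np.exp(-k * a))
    phi = np.exp(-np.sum((x - mu) ** 2, axis=1) / sigma ** 2)   # Eq. (18)
    return float(w_out @ phi), x                 # Eq. (17) and the updated memory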

Fig. 7 shows the relation between the number of looped neurons and the length of the signal past the network can perceive. We introduced a variation Δ at instant t = 50 in a signal (Figs. 7(a) and (b)). The aim is to highlight the dynamic memory length of the RRBF shown in Fig. 6. A four looped-neuron RRBF is stimulated by the signal of Fig. 7(a). Figs. 7(c)–(f) show the output error of each looped neuron caused by this variation Δ.

The network parameters are determined by a two-stage training process. During the first stage, an unsupervised learning algorithm is used to determine the parameters of the RBF nodes (the centers and the influence radii). In the second stage, linear regression is used to determine the weights between the hidden layer and the output layer.

    3.3. Training process of the RRBF

3.3.1. The prototype parameters
The first step of the training process consists of determining the centers and the influence radii of the prototypes (the static memory). These prototypes are extracted from the outputs of the looped neurons (the dynamic memory). Each temporal signal is thus characterized by a cluster of points whose coordinates are the outputs of the looped neurons at every instant t. We have adopted the RCE training algorithm for this first stage of the training process. The influence radii are adjusted according to a threshold θ.

Fig. 5. Influence of the self-connection on the behavior of the looped neuron with k = 0.05, for w_ii = 30, 39, 40 and 41.

Fig. 6. Topology of the RRBF. The self-connection of the input neurons procures to the network a dynamic processing of the input data.


A complete iteration of this algorithm is as follows:

// Training iteration
// Creation of a new prototype
for all training vectors x do:
    add a new prototype p_{n+1} with:
        μ_{n+1} = x
        n = n + 1
end
// Adjusting the influence radii
for all prototypes μ_i do:
    σ_i = max { σ : φ_i(μ_j) < θ for all 1 ≤ j ≤ n, j ≠ i }
end
// End
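A hedged Python rendering of this iteration, assuming the Gaussian of Eq. (18) with a scalar width; the names are ours, and at least two training vectors are required:

import numpy as np

def rce_train(X, theta):
    # Step 1: every training vector becomes a prototype center
    mu = np.asarray(X, dtype=float)
    n = len(mu)
    sigma = np.empty(n)
    # Step 2: for each prototype, take the largest radius such that its
    # activation at every other center stays below theta (0 < theta < 1):
    # exp(-d2/sigma^2) < theta  <=>  sigma^2 < d2 / (-ln theta)
    for i in range(n):
        d2 = np.sum((mu - mu[i]) ** 2, axis=1)
        d2_min = np.min(d2[np.arange(n) != i])    # closest conflicting prototype
        sigma[i] = np.sqrt(d2_min / -np.log(theta))
    return mu, sigma

Since every training vector is stored as a prototype, the number of hidden nodes grows with the training set; this is the overtraining behavior discussed in Section 5.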

3.3.2. Connection weights
The time-series prediction can be seen as an interpolation problem. The output of the RBF network is

h(x) = \sum_{i=1}^{N} w_i \phi_i(\| x - \mu_i \|),   (20)

where N represents the number of basis functions, centered at the N input points. The solution of this problem is to solve the N linear equations that give the weight coefficients:

\begin{bmatrix} \phi_{11} & \phi_{12} & \cdots & \phi_{1N} \\ \phi_{21} & \phi_{22} & \cdots & \phi_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{N1} & \phi_{N2} & \cdots & \phi_{NN} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix},   (21)

where y_i is the desired output and

\phi_{ij} = \phi(\| \mu_i - \mu_j \|), \quad i, j = 1, 2, \ldots, N.   (22)

The equation can be written as

\Phi w = Y.   (23)

The weight vector is then

w = \Phi^{-1} Y.   (24)
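A direct NumPy sketch of Eqs. (21)–(24), under our own naming:

import numpy as np

def interpolation_weights(mu, y, phi=lambda r: np.exp(-r ** 2)):
    # Build Phi with phi_ij = phi(||mu_i - mu_j||) (Eq. (22)) and solve Phi w = Y
    r = np.linalg.norm(mu[:, None] - mu[None, :], axis=-1)   # pairwise distances
    Phi = phi(r)
    return np.linalg.solve(Phi, y)               # w = Phi^{-1} Y, Eq. (24)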

    4. Application in prediction

We have tested the RRBF network on three time-series prediction applications. In these three applications, the goal is to predict the evolution of the input data from the knowledge of their past. The training process uses a part of the data set, and the network is then tested on the totality of the data. For each application, we give two average prediction errors and two error standard deviations, according to whether the network is tested on the test population only or on both the test and training populations.

4.1. MacKey-Glass chaotic time series
The MacKey-Glass chaotic time series is generated by the following differential equation:

\frac{dx(t)}{dt} = -b x(t) + \frac{a x(t - \tau)}{1 + x^{10}(t - \tau)}.   (25)

x(t) is quasi-periodic and chaotic for the parameters a = 0.2, b = 0.1 and τ = 17 (Jang, 1993; Chiu, 1994). The simulated data were obtained by applying the fourth-order Runge-Kutta method to Eq. (25), with initial conditions x(0) = 1.2 and x(t − τ) = 0 for 0 ≤ t < τ. The simulation step is 1. The data of this series are available at http://neural.cs.nthu.edu.tw/jang/benchmark.
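For reference, here is a simplified Python sketch of this generation; the delayed term is held constant over each unit step, which is a simplification of the full fourth-order Runge-Kutta scheme, and the names are ours:

import numpy as np

def mackey_glass(n, a=0.2, b=0.1, tau=17, x0=1.2):
    # Generate n samples of Eq. (25) with unit time step
    x = np.zeros(n + tau)
    x[tau] = x0                                  # x(0) = 1.2; x(t) = 0 for t < 0
    for t in range(tau, n + tau - 1):
        xd = x[t - tau]                          # delayed term, frozen over the step
        f = lambda v: -b * v + a * xd / (1 + xd ** 10)
        k1 = f(x[t]); k2 = f(x[t] + k1 / 2); k3 = f(x[t] + k2 / 2); k4 = f(x[t] + k3)
        x[t + 1] = x[t] + (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x[tau:]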

We have tested the RRBF network presented previously on the MacKey-Glass prediction. To obtain good results, we used six looped neurons. The parameters of these looped neurons are set so as to obtain the longest dynamic memory (Fig. 5). This characteristic is obtained with the self-connection value ϖ = 40 and the sigmoid-function parameter k = 0.05. The parameters of the Gaussian functions as well as the connection weights are given by the training algorithms presented previously, with θ = 0.8.

Fig. 7. Influence of the number of looped neurons on the length of the dynamic memory of the network: (a) signal evolution, (b) signal with variation Δ, (c) first looped neuron error, (d) second looped neuron error, (e) third looped neuron error, and (f) fourth looped neuron error.

Table 1 presents the results obtained by the RRBF network for different numbers of training points (Nb), taken from the 118th data point onwards. The prediction errors between the network output and the real values of the series are presented in the columns of the table, together with the percentage of each error. This percentage is calculated with respect to the amplitude of the series (0.9). The network is able to predict the series evolution with a minimum of 50 training points, with a mean error of 19% and an error standard deviation of 27%. This error decreases as the number of training points increases, down to 2%. The training time corresponds to one iteration. Fig. 8 shows the results of the test with 500 training points.

    4.2. Logistic map

The Logistic Map series is defined by the expression below:

x(t+1) = 4 x(t) \left( 1 - x(t) \right).   (26)

This series is chaotic in the interval [0, 1], with x(0) = 0.2. The goal of this application is to predict the target value x(t+1); the input value of the RRBF network is x(t).
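Generating the series and the one-step-ahead training pairs is straightforward; a sketch with our own names:

def logistic_map(n, x0=0.2):
    # Iterate x(t+1) = 4 x(t) (1 - x(t)), Eq. (26)
    xs = [x0]
    for _ in range(n - 1):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

series = logistic_map(200)
inputs, targets = series[:-1], series[1:]   # network input x(t), target x(t+1)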

The best prediction results are obtained with one looped neuron having the parameters ϖ = 40 for the self-connection and k = 0.05 for the sigmoid function. The parameter θ = 0.999 was used for the first-stage training process. Table 2 shows the test results of the RRBF network for different numbers of training points (Nb). The network gives good results with only 10 training points. Fig. 9 shows the results of the test with 100 training data points.

4.3. Nonlinear system prediction
The third application relates to a nonlinear prediction system, using the Box and Jenkins (1970) gas furnace database, which is available at http://neural.cs.nthu.edu.tw/jang/benchmark. These data represent a time series of a gas furnace process, where u(t) represents the input gas and y(t) represents the output CO2 concentration.
Table 1
Results of the RRBF test on the MacKey-Glass series prediction

Nb    Min                  Max             Moy1           Moy2           Dev Std1      Dev Std2
50    3.90e-4 (0.043%)     1.1669 (129%)   0.1862 (20%)   0.1776 (19%)   0.251 (27%)   0.2482 (27%)
100   3.27e-5 (0.0036%)    1.1632 (129%)   0.0969 (10%)   0.0879 (9%)    0.184 (20%)   0.1778 (19%)
150   4.13e-5 (0.00458%)   0.7129 (79%)    0.0655 (7%)    0.0564 (6%)    0.103 (11%)   0.0982 (11%)
200   2.60e-5 (0.00288%)   0.3915 (43%)    0.0502 (5%)    0.0408 (4%)    0.058 (6%)    0.0559 (6%)
250   4.54e-5 (0.00504%)   0.3000 (33%)    0.0480 (5%)    0.0369 (4%)    0.054 (6%)    0.0518 (5%)
300   1.46e-5 (0.00162%)   0.2727 (30%)    0.0441 (5%)    0.0318 (3%)    0.048 (5%)    0.0456 (5%)
350   2.45e-6 (0.00027%)   0.2874 (31%)    0.0439 (4%)    0.0296 (3%)    0.048 (5%)    0.0445 (5%)
400   3.35e-5 (0.0037%)    0.3114 (34%)    0.0375 (4%)    0.0236 (2%)    0.042 (4%)    0.0382 (4%)
450   9.56e-5 (0.01062%)   0.2893 (32%)    0.0360 (4%)    0.0209 (2%)    0.042 (4%)    0.0368 (4%)
500   1.50e-5 (0.00166%)   0.2789 (31%)    0.0380 (4%)    0.0203 (2%)    0.043 (4%)    0.0371 (4%)

Nb is the number of training points. Min and Max are the minimal and maximal prediction errors. Moy1 is the average prediction error on the data outside the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the corresponding standard deviations without and with the training data. The percentages are given with respect to the amplitude of the signal (0.9).

Fig. 8. Prediction results: (a) neural network output and the MacKey-Glass series values and (b) error of the neural network prediction.


The goal of this application is to predict the y(t) value from the knowledge of y(t − 1) and u(t − 1).

The RRBF network used contains two inputs: one for y(t) and another for u(t). The past of each input signal is taken into account by a looped neuron. The output of the neural network gives the y(t + 1) value. The network is composed of four input neurons (a linear neuron and a looped neuron for each input signal) and one output neuron. The intermediate neurons are determined by the first-stage training process described previously. The first 145 points of the database are used for the training process. The second-stage training algorithm determines the connection weights. The best results were obtained with ϖ = 500 and k = 0.05 for the sigmoid function, and θ = 0.84 for the training of the influence radii.
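With the looped-neuron step of Section 3.1, the four-dimensional input vector of this network can be sketched as follows (the series y and u, and all names, are placeholders for the gas furnace data):

# Build the 4-dimensional RRBF input: a linear and a looped neuron per signal
mem_y, mem_u, features = 0.0, 0.0, []
for y_t, u_t in zip(y, u):
    mem_y = looped_neuron_step(mem_y, y_t, w_ii=500.0, k=0.05)
    mem_u = looped_neuron_step(mem_u, u_t, w_ii=500.0, k=0.05)
    features.append([y_t, mem_y, u_t, mem_u])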

Table 3 shows the results of the network test on this application. The RRBF neural network gives a prediction result with an average error estimated at 8%. The training process takes one iteration.

    5. Discussion

The Recurrent Radial Basis Function network presented in this article was successfully validated on the two time-series prediction problems. Figs. 8 and 9 show the results and the prediction errors of the RRBF for the MacKey-Glass series and the Logistic Map series. This dynamic aspect is obtained thanks to the looped input nodes (Fig. 3). This local output feedback procures to the neuron a dynamic memory (Fig. 5).

Table 2
Results of the RRBF test on the Logistic Map series prediction

Nb    Moy1                  Moy2                  Dev Std1              Dev Std2
10    0.0945 (9%)           0.0898 (9%)           0.0636 (6%)           0.0652 (6%)
20    7.26e-4 (7.26e-2%)    6.53e-4 (6.53e-2%)    5.11e-4 (5.11e-2%)    5.32e-4 (5.32e-2%)
30    1.59e-6 (1.59e-4%)    1.35e-6 (1.35e-4%)    1.69e-6 (1.69e-4%)    1.66e-6 (1.66e-4%)
40    4.69e-8 (4.69e-6%)    3.75e-8 (3.75e-6%)    3.66e-8 (3.66e-6%)    3.77e-8 (3.77e-6%)
50    1.33e-9 (1.33e-7%)    1.00e-9 (1.00e-7%)    1.64e-9 (1.64e-7%)    1.53e-9 (1.53e-7%)
60    4.29e-10 (4.29e-8%)   3.02e-10 (3.02e-8%)   8.06e-10 (8.06e-8%)   7.00e-10 (7.00e-8%)
70    7.11e-11 (7.11e-9%)   5.10e-11 (5.10e-9%)   1.90e-10 (1.90e-8%)   1.55e-10 (1.55e-8%)
80    4.23e-12 (4.23e-10%)  3.25e-12 (3.25e-10%)  9.86e-12 (9.86e-10%)  7.74e-12 (7.74e-10%)
90    1.51e-11 (1.51e-9%)   1.32e-11 (1.32e-9%)   1.23e-11 (1.23e-9%)   1.45e-11 (1.45e-9%)
100   2.14e-11 (2.14e-9%)   1.55e-11 (1.55e-9%)   1.68e-11 (1.68e-9%)   1.38e-11 (1.38e-9%)

Nb is the number of training points. Moy1 is the average prediction error on the data outside the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the corresponding standard deviations without and with the training data. The percentages are given with respect to the amplitude of the signal.

Fig. 9. (a) Comparison of the prediction results of the network with the values of the Logistic Map series and (b) prediction error of the neural network.


We therefore do not have to use temporal windows to store or block the input data, as some neural architectures do: NETtalk, introduced by Sejnowski and Rosenberg (1986), the TDNN by Waibel et al. (1989) and the TDRBF by Berthold (1994). These temporal-window techniques have several disadvantages (Elman, 1990). First, the data must be blocked by an external mechanism: when should the data be presented to the network? The second disadvantage is the limitation of the temporal window dimension. Recurrent networks are not affected by these points. We have shown in Fig. 7 that the RRBF with four looped neurons is sensitive to a past of about 100 time steps.

A second advantage of the RRBF is the flexibility of the training process. A two-stage learning algorithm was used: the first stage concerns the determination of the RBF parameters, and the second stage the calculation of the output weights. Only a few seconds are required to train the RRBF on a personal computer with a 700 MHz processor.

The main difficulty is to find the best parameters that optimize the output result. These parameters are: the number of input looped neurons N > 0, the self-connection value w_ii > 0, the sigmoid-function parameter k > 0, and the parameter of the first-stage training algorithm, 0 < θ < 1. In most cases, good results can be obtained with only one looped neuron (N = 1). This input neuron is configured to have the longest memory, obtained with k w_ii = 2 (Fig. 5). The k parameter is chosen so as to give a quasi-linear aspect to the sigmoid function around the initial point (k ≈ 0.05). The last parameter to adjust is the first-stage training threshold θ.

The results obtained by the RRBF show that the RCE algorithm does not rigorously calculate the parameters of the Gaussian nodes: the neural network is overtrained. This result is completely coherent, because all the data of the training set are stored as prototypes. Clustering techniques like the k-means algorithm, which minimizes the sum of squared errors (SSE) between the inputs and the hidden node centers, will certainly give better results than the RCE algorithm. However, these techniques can also have some disadvantages. We have presented in our previous work an example which highlights these disadvantages (Zemouri et al., 2002b):

* There is no formal method for specifying the number of hidden nodes.
* These nodes are initialized randomly, so several runs are needed to obtain the best result.

Our future work will concern the development of a new method which boosts the performance of the k-means algorithm (Figs. 10-12).

Fig. 10. (a) CO2 output concentration of the gas furnace and (b) input gas of the furnace.

Table 3
Results of the RRBF test on the nonlinear system prediction

Nb    Min              Max              Moy1           Moy2          Dev Std1       Dev Std2
145   0.0067 (0.04%)   18.0235 (120%)   1.5274 (10%)   1.2441 (8%)   2.3267 (15%)   3.4950 (23%)

Nb is the number of training points. Min and Max are the minimal and maximal prediction errors. Moy1 is the average prediction error on the data outside the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the corresponding standard deviations without and with the training data. The percentages are given with respect to the amplitude of the signal.

Fig. 11. Comparison of the test results of the CO2 concentration prediction of the gas furnace with the real values, over the training and test populations.

Fig. 12. Prediction error of the RRBF network.


    6. Conclusion

We have presented in this article an application of the RRBF network to three time-series prediction problems: MacKey-Glass, Logistic Map and the Box & Jenkins gas furnace data. Thanks to its dynamic memory, the RRBF network is able to learn temporal sequences. This dynamic memory is obtained by a self-connection of the input neurons. The input data are not blocked by an external mechanism, but are memorized by the input neurons. The training process time is relatively short: one iteration for the RBF parameter calculation and one matrix computation for the output weight calculation. In the three examples, all the training data were correctly tested.

The results obtained in the three time-series prediction applications represent a validation of the dynamic data treatment by the RRBF network.

    References

Bernauer, E., 1996. Les réseaux de neurones et l'aide au diagnostic: un modèle de neurones bouclés pour l'apprentissage de séquences temporelles. Ph.D. Thesis, LAAS, France.
Berthold, M.R., 1994. A time delay radial basis function network for phoneme recognition. Proceedings of the International Conference on Neural Networks, Orlando, Vol. 7, pp. 4470–4473.
Berthold, M.R., Diamond, J., 1995. Boosting the performance of RBF networks with dynamic decay adjustment. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (Eds.), Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, pp. 521–528.
Box, G.E.P., Jenkins, G.M., 1970. Time Series Analysis, Forecasting and Control. Holden Day, San Francisco, pp. 532–533.
Brunet, J., Jaume, D., Labarrère, M., Rault, A., Vergé, M., 1990. Détection et diagnostic de pannes, approche par modélisation. Traitement des nouvelles technologies, série diagnostic et maintenance. Éditions Hermès, France.
Chappelier, J.C., 1996. RST: une architecture connexionniste pour la prise en compte de relations spatiales et temporelles. Ph.D. Thesis, École Nationale Supérieure des Télécommunications, France.
Chiu, S., 1994. Fuzzy model identification based on cluster estimation. Journal of Intelligent & Fuzzy Systems 2 (3), 267–278.
Combacau, M., 1991. Commande et surveillance des systèmes à événements discrets complexes: application aux ateliers flexibles. Ph.D. Thesis, Université Paul Sabatier, Toulouse, France.
Dash, S., Venkatasubramanian, V., 2000. Challenges in the industrial applications of fault diagnostic systems. Proceedings of the Conference on Process Systems Engineering, Computers and Chemical Engineering 24 (2–7), Keystone, Colorado, pp. 785–791.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.
Elman, J.L., 1990. Finding structure in time. Cognitive Science 14, 179–211.
Frasconi, P., Gori, M., Maggini, M., Soda, G., 1995. Unified integration of explicit knowledge and learning by example in recurrent networks. IEEE Transactions on Knowledge and Data Engineering 7 (2), 340–346.
Ghosh, J., Nag, A., 2000. Radial basis function neural network theory and applications. In: Howlett, R.J., Jain, L.C. (Eds.). Physica-Verlag, Würzburg.
Ghosh, J., Beck, S., Deuser, L., 1992. A neural network based hybrid system for detection, characterization and classification of short-duration oceanic signals. IEEE Journal of Ocean Engineering 17 (4), 351–363.
Hernandez, N.G., 1999. Système de diagnostic par réseaux de neurones et statistiques: application à la détection d'hypovigilance d'un conducteur automobile. Ph.D. Thesis, LAAS, France.
Hudak, M.J., 1992. RCE classifiers: theory and practice. Cybernetics and Systems 23, 483–515.
Jang, J.-S.R., 1993. ANFIS: adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics 23, 665–685.
Koivo, H.N., 1994. Artificial neural networks in fault diagnosis and control. Control Engineering Practice 2 (1), 89–101.
Le Cun, Y., 1985. Une procédure d'apprentissage pour réseau à seuil asymétrique. Cognitiva 85, 599–604.
Lefebvre, D., 2000. Contribution à la modélisation des systèmes dynamiques à événements discrets pour la commande et la surveillance. Habilitation à Diriger des Recherches, Université de Franche-Comté/IUT de Belfort, Montbéliard, France.
Mak, M.W., Kung, S.Y., 2000. Estimation of elliptical basis function parameters by the EM algorithm with application to speaker verification. IEEE Transactions on Neural Networks 11 (4), 961–969.
Michelli, C.A., 1986. Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constructive Approximation 2, 11–22.
Moody, J., Darken, J., 1989. Fast learning in networks of locally tuned processing units. Neural Computation 1, 281–294.
Rengaswamy, R., Venkatasubramanian, V., 1995. A syntactic pattern recognition approach for process monitoring and fault diagnosis. Engineering Applications of Artificial Intelligence Journal 8 (1), 35–51.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning internal representation by error propagation. In: Rumelhart, D.E., McClelland, J.L. (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. The MIT Press, Bradford Books, Cambridge, MA, pp. 318–362.


Sejnowski, T.J., Rosenberg, C.R., 1986. NETtalk: a parallel network that learns to read aloud. Electrical Engineering and Computer Science Technical Report, The Johns Hopkins University.
Tsoi, A.C., Back, A.D., 1994. Locally recurrent globally feedforward networks: a critical review of architectures. IEEE Transactions on Neural Networks 5 (2), 229–239.
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K., 1989. Phoneme recognition using time delay neural networks. IEEE Transactions on Acoustics, Speech and Signal Processing 37 (3), 328–339.
Xu, L., 1998. RBF nets, mixture experts, and Bayesian Ying-Yang learning. Neurocomputing 19 (1–3), 223–257.
Zemouri, R., Racoceanu, D., Zerhouni, N., 2002a. Application of the dynamic RBF network in a monitoring problem of the production systems. 15th IFAC World Congress on Automatic Control, Barcelona, Spain.
Zemouri, R., Racoceanu, D., Zerhouni, N., 2002b. Réseaux de neurones récurrents à fonctions de base radiales RRFR: application au pronostic. Revue d'Intelligence Artificielle, RSTI série RIA 16 (03), 307–338.
