A novel self-constructing Radial Basis Function Neural-Fuzzy System

A

YCa

b

c

d

a

ARRAA

KR(SLNCO

1

ap[skNAm

UR

sdc

1h

Applied Soft Computing 13 (2013) 2390–2404

Contents lists available at SciVerse ScienceDirect

Applied Soft Computing

j ourna l ho me p age: www.elsev ier .com/ l ocate /asoc

novel self-constructing Radial Basis Function Neural-Fuzzy System

ing-Kuei Yanga,∗, Tsung-Ying Sunb, Chih-Li Huob, Yu-Hsiang Yub, Chan-Cheng Liuc,heng-Han Tsaid

Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, ROCDepartment of Electrical Engineering, National Dong Hwa University, Hualien, Taiwan, ROCQNAP Systems, Inc., Taipei, Taiwan, ROCGetac Technology Corporation, Taipei, Taiwan, ROC

r t i c l e i n f o

rticle history:eceived 23 February 2012eceived in revised form 23 October 2012ccepted 28 January 2013vailable online 26 February 2013

eywords:adial Basis Function Neural-Fuzzy SystemRBFNFS)elf-constructingeast-Wilcoxon normonlinear function approximation

a b s t r a c t

This paper proposes a novel self-constructing least-Wilcoxon generalized Radial Basis Function Neural-Fuzzy System (LW-GRBFNFS) and its applications to non-linear function approximation and chaos timesequence prediction. In general, the hidden layer parameters of the antecedent part of most traditionalRBFNFS are decided in advance and the output weights of the consequent part are evaluated by leastsquare estimation. The hidden layer structure of the RBFNFS is lack of flexibility because the structure isfixed and cannot be adjusted effectively according to the dynamic behavior of the system. Furthermore,the resultant performance of using least square estimation for output weights is often weakened by thenoise and outliers.

This paper creates a self-constructing scenario for generating antecedent part of RBFNFS with particleswarm optimizer (PSO). For training the consequent part of RBFNFS, instead of traditional least square(LS) estimation, least-Wilcoxon (LW) norm is employed in the proposed approach to do the estimation. As

haotic time series predictionutliers

is well known in statistics, the resulting linear function by using the rank-based LW norm approximationto linear function problems is usually robust against (or insensitive to) noises and outliers and thereforeincreases the accuracy of the output weights of RBFNFS. Several nonlinear functions approximation andchaotic time series prediction problems are used to verify the efficiency of self-constructing LW-GRBFNISproposed in this paper. The experimental results show that the proposed method not only creates optimalhidden nodes but also effectively mitigates the noise and outliers problems.

. Introduction

Both of artificial neural networks (ANNs) and fuzzy system (FS)re two popular techniques for modeling systems and have beenroved as universal approximator in many research literatures1–6]. ANNs have ability of learning from data, but cannot createtructural knowledge for a target system. FS can exhibit structuralnowledge but cannot provide learning mechanism. Therefore,eural-Fuzzy System (NFS), a scenario combining the features of
NNs and FS, was proposed as a new trend for nonlinear systemodeling. In general, a NFS is a combination of fuzzy inference
∗ Corresponding author at: Department of Electrical Engineering, National Taiwanniversity of Science and Technology, No. 43, Sec. 4, Keelung Rd., Taipei 106, Taiwan,OC. Tel.: +886 2 27376681; fax: +886 2 27376699.

E-mail addresses: [email protected] (Y.-K. Yang),[email protected], [email protected] (T.-Y. Sun),[email protected] (C.-L. Huo), [email protected] (Y.-H. Yu),[email protected] (C.-C. Liu), [email protected] (C.-H. Tsai).

568-4946/$ – see front matter © 2013 Elsevier B.V. All rights reserved.ttp://dx.doi.org/10.1016/j.asoc.2013.01.023

© 2013 Elsevier B.V. All rights reserved.

mechanism and multiple-layer ANN with network parameterstrained by proper learning algorithm.

Radial basis function neural network (RBFN) is a combinationof learning vector quantizer (LVQ-I) and gradient descent. RBFNwas first proposed by Broomhead and Lowe [7], and its interpo-lation and generalization properties are thoroughly investigatedin [8,9]. Since the mid-1980s, RBFN has been applied on manyapplications, such as pattern classification, system identification,nonlinear function approximation, adaptive control, speech recog-nition, and time-series prediction, and so on. In contrast to thewell-known Multilayer Perceptron (MLP) Networks, the RBFN util-izes a radial construction mechanism. MLP networks are trainedby the error back propagation (BP) algorithm. Since the RBFN hasa substantially faster training procedure and adopts typical two-stage training scheme, its resultant solution can avoid the problemof falling into local optima. In addition, RBFN presents featuressimilar to those of Fuzzy Rule-Based System (FRBS). FRBS charac-
terizes the dynamic behavior of a system by a set of linguistic fuzzyIF-THEN rules. Because the outputs of RBFN are calculated by theweighted summation of inputs, the number of hidden node of RBFNcould be equal to the number of IF-THEN rules of FRBS. The function
dx.doi.org/10.1016/j.asoc.2013.01.023

http://www.sciencedirect.com/science/journal/15684946

www.elsevier.com/locate/asoc

http://crossmark.dyndns.org/dialog/?doi=10.1016/j.asoc.2013.01.023&domain=pdf

mailto:[email protected]







dx.doi.org/10.1016/j.asoc.2013.01.023

Y.-K. Yang et al. / Applied Soft Comp

Nomenclature

ANNs artificial neural networksBP back propagationEDBD enhanced version of delta-bar-deltaFIS Fuzzy Inference SystemFRBS Fuzzy Rule-Based SystemFS Fuzzy SystemIID independently and identically distributedLMS Least Mean SquareLVQ-I learning vector quantizerLW-GRBFNFS least Wilcoxon-generalized RBFNFSMLP Multilayer PerceptronNFS Neural-Fuzzy SystemNRBFN normalized radial basis function neural networkPSO particle swarm optimizerRAN resource allocation networkRBFN radial basis function neural networkRBFNFS Radial Basis Function Neural-Fuzzy SystemRLS Recursive Least-Square

ebwssFith

doe[flbebst

tvhmptstsfow[trbcttri

SSE sum of squared errors

quivalence between RBFN and FIS (fuzzy inference system) haseen proven by [10], indicating that the hidden nodes and outputeights of RBFN are regarded as the antecedent part and the con-

equent part of fuzzy rules respectively. Therefore, a combinationcenario of RBFN and FRBS, forming Radial Basis Function Neural-uzzy System (RBFNFS), has been developed for nonlinear systemdentification [11]. By following the learning procedure of RBFN,he equivalent structural knowledge for RBFNFS can be derived toandle target systems.

In conventional RBFN training approach, the number of hid-en nodes is usually decided according to the statistic propertiesf input data. The center and its corresponding spread width forach hidden nodes is determined by K-means clustering algorithm12]. However, such fixed hidden layer structure of the RBFN lacksexibility of being adjusted effectively according to the dynamicehavior of the system. In general, the output weights of RBFN arevaluated by least square (LS) estimation [13]. However, the draw-ack is the accuracy of output weights could be affected while dataet is corrupted by outliers or noises. Hence, there are two impor-ant issues of RBFN to be considered during learning procedure.

The first one is how to decide a proper number of hidden nodes. Ifhe hidden node number of RBFN is too small, the generated outputectors could be in low accuracy. On the contrary, a large number ofidden nodes may cause over-fitting for the input data and under-ines its global generalization performance. That is, the network

erformance of fixed hidden layer RBFN is greatly dependent onhe pre-selected number of hidden nodes. To solve this problem, theelf-growing RBFN approaches were proposed in [14,15]. However,he need of predefined parameters and local searching on solutionpace in these approaches causes the inaccuracy of approximationrom a sub-solution, and it is also not suitable for online learning. Tovercome above problems, the resource allocation network (RAN)hich is the first of sequential learning algorithm was proposed in

16]. The RAN constructed the hidden layer sequentially based onhe novelty of a new sample. Hence, many sequential learning algo-ithms, such as [17–19], were developed to decide self-regulationsased on growing/pruning criterion. Further, the meta-cognitiveoncept was also considered to decide self-regulation according to
he characteristics of the sample information [20,21]. The advan-ages of sequential learning algorithm are: (1) it does not need toe-train the network when a new training sample arrives; and (2)t does not require predefined training sample number [22]. The
uting 13 (2013) 2390–2404 2391

second issue is how to evaluate the output weights of RBFN, sincethe accuracy of the weight values derived by LS estimation couldbe affected by the noise and outliers existing in a nonlinear func-tion. An outlier is a data pattern that deviates substantially fromthe data distribution. Due to the large deviation from the norm,outliers have serious effects on accuracy, especially when LS or gra-dient descent is used with sum of squared errors (SSE) as objectivefunction [23]. Hence, the approximation precision could be conse-quently damaged for RBFN when noise and outliers are included indata set.

The purpose of this paper is to develop a self-constructing leastWilcoxon-generalized RBFNFS (LW-GRBFNFS) and to apply it tononlinear function approximation and chaotic time series predic-tion problems. The proposed approach presents a novel scenariousing PSO-based self-constructing concept based on our prelimi-nary works [24,25] and least Wilcoxon norm proposed in Wilcoxonlearning machine [26] for learning. The advantage of PSO-basedself-constructing concept is that it can avoid falling into local opti-mal which causes the training error not being able to be reduceddeeply. Furthermore, PSO also exhibits the rapid convergence, bet-ter optimal search ability [25] and less training iteration [27]. Theproposed LW-GRBFNFS can provide flexible and dynamic ability forgenerating more accurate RBFNFS with less sensitivity to noise andoutliers. The experimental results show that the proposed approacheffectively solves the problems caused by noise and outliers. Sincemassively ranking and iteration procedures are needed in LW norm,fixed learning rate could slow down the convergent rate. There-fore, an enhanced version of delta-bar-delta (EDBD) algorithm [28]with an adaptive learning rate is introduced in the proposed LW-GRBFNFS to improve the convergent rate.

The remainder of this paper is organized as follows: Section 2describes many related works to the proposed approach. Section 3describes the proposed approach which includes PSO-based self-constructing RBFNFS and its training by LW norm. Section 4 showsthe experimental results and Section 5 is conclusions.

2. Related works

This section presents many related works to the proposedapproach in this paper, including radial basis function neural net-work (RBFN), Radial Basis function Neural-Fuzzy System (RBFNFS),particle swarm optimization (PSO), and linear regression with leastWilcoxon (LW) norm.

2.1. Radial basis function neural network

Generally, a RBFN consists of three layers: the input layer, theRBF layer (hidden layer) and the output layer. The inputs of hiddenlayer are the linear combinations of scalar weights and input vec-tor, where the scalar weights are usually assigned as unity values.Thus the whole input vector appears to each neuron in the hiddenlayer. The incoming vectors are being mapped by the radial basisfunctions in each hidden node. The output layer yields a vector bylinear combination of the outputs of the hidden nodes to producethe final output. The structure of an n inputs and m outputs RBFNis depicted as Fig. 1. The network outputs can be obtained by

yj = fj(x) =G∑

k=1

wkj �k(x), for j = 1, . . . , m (1)

where x = [x1, x2, . . ., xn]T and y = [y1, y2, . . ., ym] denote the
input vector for n inputs and the output vector for m outputsrespectively. The final output of the jth output node, fj(x), is thelinear combination of all hidden nodes, where �k(·) denotes theradial basis function of the kth hidden node, wk
jdenotes the weight

2392 Y.-K. Yang et al. / Applied Soft Comp

ci

bt

y

ddfr

�

wdk

(

(

tIR

2

msonk

R

The past best solution Pbi of the ith particle is updated by:

Fig. 1. The structure of a RBFN.

orresponding to the kth hidden node and jth output node, and Gs the total number of hidden nodes.

A normalized radial basis function neural network (NRBFN) cane derived from Eq. (1) by dividing the summation of �k to producehe jth output yj as

j = fj(x) =∑G

k=1wkj�k(x)∑G

k=1�k(x), j = 1, . . . , m; k = 1, . . . , G (2)

A radial basis function is a multi-dimensional function thatescribes the distance between a given input vector and a pre-efined center vector. There are different types of radial basisunction. A normalized Gaussian function is usually used as theadial basis function that is

k(x) = exp

(−∥∥x − �k

∥∥2

2�2k

), for k = 1, . . . , G (3)

here �k = [ �1 �2 · · · �n ]k

and �k = [ �1 �2 · · · �n ]k

enote the n-dimensional vector of center and spread width of theth node respectively.

Generally, the RBFN training can be divided into two stages:

1) Determine the parameters of radial basis functions, i.e., Gauss-ian center and spread width. In general, K-means clusteringmethod is commonly used here.

2) Determine the output weight w by supervised learning method.Usually Least-Mean-Square (LMS) or Recursive Least-Square(RLS) are used.

The first stage is crucial, since the number and locations of cen-ers in the hidden layer influence the performance of RBFN directly.n next section, the principle and procedure of a self-constructingBF algorithm will be described.

.2. Radial Basis Function Neural-Fuzzy System

The TS-fuzzy model was introduced in [29,30] as a hybridodel, integrating both fuzzy conditions and functional relation-

hips between the input and output spaces. For an n inputs and mutputs system, the kth rule can be defined as Eq. (4), where i = 1, . . .,

and j = 1, . . ., m denote the index of input and output respectively, = 1, . . ., r denote the index of rules.

k : IF x1 is A1k and x2 is A2k· · ·xn is Ank THEN y1k

= f1k(x) and y2k = f2k(x)· · ·ymk = fmk(x) (4)

uting 13 (2013) 2390–2404

where Aik presents the membership function of antecedent part ofthe kth rule for ith input variable. The mth output of the kth ruleis denoted as ymk = fmk(x), where fmk(x) is the corresponding linearcombination function of input space. If a TS-fuzzy model is reducedto a singleton rule-based system, then Eq. (4) can be written asbelow.

Rk : IF x1 is A1k and x2 is A2k· · ·xn is Ank THEN y1k

= ˇ1k and y2k = ˇ2k· · ·ymk = ˇmk (5)

where ˇjk denotes the jth output weight in the kth rule. If the mem-bership functions Aik uses radial basis function such as Gaussiankernel, then it has been proven that the function equivalence existsbetween normalized RBFN and singleton rule-based TS-fuzzy [10].That is, the structure of Eq. (5) can be regarded as RBFN TS-FuzzySystem (RBFN TS-FS), or RBFN Fuzzy System (RBFNFS). Therefore,the jth output of RBFNFS can be defined as Eq. (6), where x = [x1, x2,. . ., xn] presents the system input, ˇjk denote the weight betweenkth rule and jth output. In Eq. (6), �k(x) denote the antecedent partof the kth rule which is a multi-dimensional Gaussian function.

yj =∑G

k=1ˇjk�k(x)∑Gk=1�k(x)

=∑G

k=1ˇjk · exp [−(||x − �||)/(2�2)]k∑Gk=1 exp [−(||x − �||)/(2�2)]k

(6)

2.3. Particle swarm optimization

The particle swarm optimization (PSO) is a population-basedoptimization technique proposed by Kennedy and Eberhart [31],in which the population is referred to as a swarm. The particlesexpress the ability of fast convergence to local and/or global optimalposition(s) over a small number of generations.

A swarm of PSO consists of a number of particles. Each particlerepresents a potential solution of the optimization task. All of theparticles iteratively discover the probable solution. Each particlegenerates a position according to the new velocity and the previ-ous positions of the particle, and the newly generated position iscompared with the best position generated so far by previous par-ticles according to a defined objective function. The best solutionof the comparison is then kept; i.e., each particle accelerates in thedirections of not only the local best solution but also the globalbest position. If a particle discovers a new probable solution, otherparticles will move closer to it so as to explore the region morecompletely in the process [27]. Suppose that PSO has sz particles(or swarm size) and objective function f is given to be minimized.Then, the three attributes of the particles’ current position pi, cur-rent velocity vi, and local best position Pbi for particles in the searchspace present their features. Each particle in the swarm is itera-tively updated according to the aforementioned attributes. For adimension consists of n particles, its new velocity of every particleis updated by

vij(t + 1) = wvij(t) + c1r1,j(t)[Pbij(t) − aij(t)]

+ c2r2,j(t)[Gbi(t) − aij(t)] (7)

where vij and aij denote the velocity and position of the jth dimen-sion of the ith particle respectively for all i ∈ 1, . . ., sz. w is the inertiaweight of velocity, c1 and c2 denote the acceleration coefficients, r1and r2 are two uniform random values falling in the range between(0, 1), and t is the number of generations. The new position of theith particle is calculated as follows:

aij(t + 1) = aij(t) + vij(t + 1) (8)

Pbi(t + 1) ={

Pbi(t), if f (ai(t + 1)) ≥ f (Pbi(t))

ai(t + 1), otherwise(9)

t Computing 13 (2013) 2390–2404 2393

p

G

u

mmsgsrtpwTa

2

saat{xl

wautt

(oIptt

‖

wsa

y

b

ˇ

wsim

Y.-K. Yang et al. / Applied Sof

The global best solution Gb found from all of particles duringrevious three steps are defined as:

b(t + 1) = argminPbi

f (Pbi(t + 1)), 1 ≤ i ≤ n (10)

The evolutionary process will continuously repeat Eqs. (7)–(10)ntil some terminative conditions are satisfied.

Since initial particles are generated randomly, their distributionay not be uniform enough over the solution space. Therefore, itay trap particles into local optimal solution inevitably. To avoid

olution falling into the local minimal and jumping out to find thelobal minimal, this paper proposes a mutation-like disturbancetrategy into the PSO process [32]. The disturbance mechanism isandomly activated under a disturbance probability. While the dis-urbance mechanism is active, the selected particle is randomlylaced at a new position (ε value in this paper), and this particleill then keep following the PSO process to search a better solution.

he other non-selected particles keep following the PSO iterations usual and trying to find a new solution.

.4. Linear regression with least Wilcoxon norm

Linear regression is one of the most widely used models intatistics to model the relationship between a scalar variable ynd one or more variables denoted as x. In linear regression, datare modeled using linear functions, and unknown model parame-ers are estimated from the data. Given a set of n statistical data(xi, yi)}ni=1, where yi is an observed univariate response variable,

i = (x1i, . . . , xmi)T is a m × 1 vector of explanatory m variables, the

inear model is typically written as

yi = + ˇ1x1i + ˇ2x2i + · · · + ˇmxmi + vi = xTi � + vi,

i = 1, 2, . . . , n (11)

here � = (ˇ1, . . . , ˇm)T is a m×1 regression parameters vector,nd its elements are called regression coefficients, is a constantsually included as the intercept of the linear model, vi is an errorerm which captures all other factors which influence yi other thanhe xi.

Linear regression models are often fitted using the least squareLS) norm approach. The goal of LS norm is to find a good estimationf parameters that fit a function, f(x), of a set of data, x1, . . ., xn.t requires that the estimated function has to deviate as little asossible from f(x) in the sense of a 2-norm. To fit a set of data best,he LS norm ||v||LS minimizes the sum of squared errors (SSE) withhe error vi by

v‖LS =n∑

i=1

v2i , where vi = yi − f (xi) (12)

hich is the difference between the actual points and the regres-ion line. Following the situation mentioned above, an example for

linear model with one explanatory variable is presented as

i = + ˇxi, i = 1, . . . , n (13)

The statistical estimation and inference in linear regression cane derived by

ˆ =∑n

i=1(xi − x)(y − y)∑ni=1(xi − x)2

and = y− ˆ xi (14)

∑ ∑
here x = n
i=1xi/n and y = ni=1yi/n. In this parametric regres-

ion method, it is assumed that the error term vi = yi− f(xi) isndependently and identically distributed (iid) according to a nor-

al distribution function.

Fig. 2. The distribution of associated score value.

Least Wilcoxon norm (LW norm) was originally used in linearregression problems. It is well-known that LW norm is a rank-based regression method and outperforms LS norm when givendata deviate from normality and/or contain outliers [33]. A scorefunction, ϕ(u) : [0, 1] → �, is required to define the LW norm. Thescore associated with ϕ(·) is defined as Eq. (15), where n is thenumber of data presented by a fixed positive integer. Many scorefunctions can be used for different requirements, among them, theϕ(u) =

√12(u − 0.5) is selected to be employed in this paper. Fol-

lowing the score function, the associated score value of aϕ(1) ≤aϕ(2) ≤ · · · ≤ aϕ(n) therefore exhibits a monotonically increasingtendency shown as Fig. 2.

aϕ(i) = ϕ(

i

n + 1

), i = 1, . . . , n (15)

In order to estimate a linear model as Eq. (13) by LW norm, weconsider without the parameter � first. The LW norm of a vectorv = [ v1 v2 · · · vn ] which is regarded as the difference vectorbetween y and �x can be presented as the following pseudo normon �n.

‖v‖LW =∥∥y − ˇx

∥∥LW=

n∑i=1

aϕ(R(vi))vi =n∑

i=1

aϕ(i)v(i)

for v = [v1, . . . , vn]T ∈ �n (16)

where y = [ y1 y2 · · · yn ] and x = [ x1 x2 · · · xn ] are outputvector and input vector of linear function respectively, R(vi) denotesthe rank among v1, . . ., vn. Therefore v(1) ≤ · · · ≤ v(n) are the orderedvalues of v1, . . ., vn and v can be regarded as a linear estimationfunction. To fit a set of data best, the LW norm ||v||LW should beminimized by means of partial differential to obtain the estimated

via following

n∑i=1

aϕ(R(yi − ˇxi))xi = 0 (17)

The obtained final can be substituted into Eq. (13), and theestimated output yi can be obtained as

yi = ˇxi, for i = 1, . . . , n (18)

The linear functions of Eq. (13) and Eq. (18) are in parallel. There-
fore, the intercept parameter can be calculated by subtracting Eq.(18) from Eq. (13) as
vi = yi − yi = ( + ˇxi) − ˇxi = ˛, for i = 1, . . . , n (19)

2394 Y.-K. Yang et al. / Applied Soft Computing 13 (2013) 2390–2404

i

y

apmsfmvc

˛

cbHe

3R

nl

3

aoLoa

d

wG

Fig. 3. LS-method estimate and LW-method estimate.

Following the viewpoint of Eq. (19), the solution of the approx-mated function can easily obtain by

i = + yi = + ˇxi, for i = 1, . . . , n (20)

In real-world applications, data may be corrupted with outliersnd/or measurement noise during collecting processing. Since thearameter could be influenced by extreme values of outliers, theethod described above could be not adequate to obtain its correct

olution. In general, finding the mean value from the ranked vi is aeasible approach to avoid the influence of extreme values. Further-

ore, the summation of the absolute difference between the meanalue and each vi is minimal. Therefore, the intercept coefficient ˛an be obtained by

= med1≤i≤n

(yi − yi) (21)

Example for a given linear function y = 3x + 2 with 24 data pointsorrupted by 7 outliers is shown in Fig. 3. The estimated resulty LS-method is y = 3.6124x + 4.0815, which is biased by outliers.owever, the estimated result by LW-method is y = 3x + 2 which isxactly the same as the true function.

. The self-constructing least Wilcoxon-generalizedBFNFS (LW-GRBFNFS)

This section introduces the architecture, self-constructing sce-ario, and learning algorithm of the proposed self-constructing

east Wilcoxon-generalized RBFNFS (LW-GRBFNFS).

.1. Architecture overview

A single-input single-output (SISO) LW-GRBFNFS is depicteds Fig. 4, where x, y, and b are input, expected output, and theutput offset weight of LW-GRBFNFS respectively. The proposedW-GRBFNFS has three-layer architecture with input, hidden, andutput layers. Each node of input layer and hidden layer is regardeds input variable and linguistic term of input variable respectively.

The kernel function of each hidden node denoted as �j(x) isefined by

�j(x) = �Aj (x)∑Mj=1�Aj (x)

, where �Aj (x) = exp

[− (x − cj)

2(�j)2

],

forj = 1, . . . , M (22)

here M denotes the number of fuzzy rules. One-dimensionalaussian-shaped membership function, �Aj (x), is defined by cj and

Fig. 4. The architecture of LW-GRBFNFS.

�j. In this study, cj and �j is created via PSO-based self-constructingscenario which will be introduced in next subsection.

The objective of output layer is to obtain y shown as Eq. (23)which is the expected output vector of self-constructing LW-GRBFNFS for function approximation. The value b is the offsetweight to modify y.

y = f + b, where f =M∑

j=1

ˇj�j(x) (23)

The scalar weight ˇj between jth hidden node and output node isestimated by LW norm during learning procedure of the proposedLW-GRBFNFS.

Several following subsections will describe the process of self-constructing the antecedent part and the procedure of training theconsequent part of LW-GRBFNFS.

3.2. PSO-based self-constructing antecedent part of LW-GRBFNFS

The hidden layer of an LW-GRBFNFS acts as a receptive fieldoperating in the input data space. The number of hidden node isbased on the distribution of the training data set. The proposedapproach performs this task by defining a cluster distance factor,ε, which is the maximum distance between an input sample and aspecific LW-GRBFNFS node center. The number of basis function isallowed to increase iteratively according to this factor.

The rationale of this learning is described as follows: the hid-den layer starts with no hidden node and ε is predetermined byPSO to control the creation of clusters. The first node center of LW-GRBFNFS, c1, is set by randomly choosing one data, x1, from NT inputdata samples. The value of Euclidean 2-norm distance between c1and next input sample, x2, is compared against ε. If it is greater, anew cluster with center location x2 is created as c2; otherwise, theelements of c1 are updated as

c1i(new) = c1i(old) + ˛∥∥x2i − c1i(old)

∥∥ , i = 1, 2, . . . , N (24)

where c1i and x2i are the ith component of vectors c1 and x2,respectively, ||·|| denotes the Euclidean distance and 0 < < 1 is theupdating ratio. Thus, this procedure is carried out on the remainingtraining samples. The number of clusters grows, or in other wordsthe center of LW-GRBFNFS nodes self-adjust, continuously untilall of samples are processed. The proposed self-constructing LW-
GRBFNFS algorithm can be summarized as follows:
(1) Assuming that there are p clusters with their centers, c1, . . ., cp,being generated from previous iterations. Taking a new input

t Computing 13 (2013) 2390–2404 2395

(

(

((

�

ooumamoppbae

rcy

f

witoo

pdap0tpabbficd

om

(

(

‖v‖LW = ∥y − ˇx∥LW= a(R(vi))vi = a(i)v(i),


sample xn to calculate the distances with each cluster ||xn− ci||,where i = 1, . . ., p.

2) The cluster whose center cq is argminci

(||xn − ci||, where i =1, . . . , p) will be the one to be processed.

3) Comparing ||xn− cq|| with the distance criterion parameter ε. Ifit is greater than ε, then a new cluster center, cp+1, is created atthe position of the sample point xn. Otherwise the elements ofcp are updated by Eq. (24).

4) Repeating above steps until all of the samples are processed.5) For L clusters, a global spread width � can be derived by the

average of Euclidean distance between each cluster center andits nearest neighbor as Eq. (25), where 〈·〉 denotes the expressionof the average value for 1 ≤ i, j ≤ L and i /= j.

= 〈||ci − cj||〉 (25)

As discussed in [34], the cluster distance factor, ε ∈ (0, ∞), isbviously a critical factor to determine input space partitioning andbtain the hidden node number and locations in LW-GRBFNFS. Annduly large value of ε cannot reflect enough number of clusters, itay cause a poor-generalized precision solution. On the contrary,

n unduly small value of ε will create redundant clusters, whichay cause overlap between RBF neurons. Moreover, a small value

f ε may lead to poor accuracy and slow convergence either. Thisaper proposes a PSO-based searching approach to determine aroper value of ε so that the optimal structure of LW-GRBFNFS cane obtained. An objective function to evaluate the effectiveness ofpplying PSO is also proposed. This section will describe how tomploy PSO technique to search a potential optimal value ε.

For searching a suitable ε value for RBFNFS training, a function ofoot mean squared error (RMSE) is applied to evaluate discrepan-ies between the sampling data output yn and the predictive output

ˆn. Thus, the objective function for NT sample is defined as

(ε, yn) = RMSE(y) =

√∑NTk=1(yn(k) − yn(k))

NT(26)

here yn(k) is the predictive output of the kth sample data whichs obtained by ε value during training. If Eq. (26) can be reducedo a sufficiently small value, it means a suitable value of ε can bebtained to train the structure of RBFNFS and the predictive RBFNFSutput will be close to the sampling data output.

The goal is to minimize the value of f (ε, yn). Since only onearameter of cluster distance factor ε is searched by PSO, theimension number j is equal to 1. The particle number i is defineds 1 ≤ i ≤ m where m is swarm size. In the initial state of PSO, all thearticles’ positions ai (i.e., initial cluster distance factor ε) are set as.02, velocity of particle i, vi is set as 0, and the Pbi and Gb are ini-ialized by a random number generator in the range of [0, 1]. Afterositions of particles are adjusted by Eq. (8), each particle will find

potential solution. Consequently, the new past best position wille updated by Eq. (9) and the global best position will be updatedy Eq. (10). The positions of particles are adjusted continuously tond a better solution until the goal is reached or the terminationondition is satisfied. The pseudo code of our PSO-based clusteristance factor searching approach is presented in Fig. 5.

Based on the algorithm presented in Fig. 5, the antecedent partf LW-GRBFNS can be obtained for a non-linear function approxi-ation problem by following steps.

1) Firstly, assume a pair of data (xi, yi) is given for a non-linearfunction, where i = 1, . . ., n.

2) After using PSO to search the suitable cluster radial factor for(xi, yi), the adequate number of hidden nodes of LW-GRBFNFS

Fig. 5. The pseudo code of PSO-based cluster distance factor searching.

and its corresponding Gaussian modeling defined as Eq. (27)can be obtained.

� =

⎡⎢⎢⎣

�A1 (x1) . . . �AM (x1)

.... . .

...

�A1 (xn) · · · �AM (xn)

⎤⎥⎥⎦

T

(27)

(3) The kernel matrix of LW-GRBFNFS hidden layer nodes can beobtained by substituting Eq. (27) into Eq. (22) and is shown asbelow

˚ =

⎡⎢⎢⎣

�1(x1) . . . �M(x1)

.... . .

...

�1(xn) · · · �M(xn)

⎤⎥⎥⎦

T

(28)

The antecedent part of the LW-GRBFNFS is obtained after exe-cuting the step 1 to step 3. The consequent part of the LW-GRBFNFScan be obtained by LW method described in Section 3.3.

3.3. Consequent part of LW-GRBFNFS with least Wilcoxon norm

For training the consequent part of LW-GRBFNFS, the outputlayer is obtained as following

F = � (29)

where � is the kernel matrix obtained from self-constructingprocess as Eq. (28), F = [ f1 f2 · · · fM ] denotes the vector of out-put function fi mapped from input data xi, � = [ ˇ1 ˇ2 · · · ˇM ]denotes the best weight vector between hidden nodes and out-put node. In order to find a suitable ˇj, LW norm is used toestimate � and rewrite Eq. (29) as below with difference vectorv = [ v1 v2 · · · vn ]

v = y − F = y − � (30)

where y = [ y1 y2 · · · yn ] is the output vector mapped frominput data x = [ x1 x2 · · · xn ]. Following the definition of LWnorm, Eq. (16) can be substituted by Eq. (30) as

∥ ∥ n∑ n∑
i=1 i=1
where vi = yi −M∑

j=1

ˇj�j(x) (31)

2 t Comp

mi

∑

fu

wfasougDplrr

�

wv�

E

d

�ceE

ETb

b

TT

396 Y.-K. Yang et al. / Applied Sof

To estimate a suitable ˇj, the LW norm should be minimized,eaning � = arg min‖v‖LW , by partial differential with ˇj as follow-

ng

n

i=1

a

⎛⎝R

⎛⎝yi −

M∑j=1

ˇj�j(xi)

⎞⎠⎞⎠ · �j(xi) = 0 (32)

It is difficult to calculate directly for all ˇj from Eq. (32). There-ore, an incremental gradient-descent approach is introduced forpdating all of the ˇj as following

ˇjk+1 ← ˇj

k+ �j

k+1ˇjk,

where ˇjk=

n∑i=1

a

⎛⎝R

⎛⎝yi −

M∑j=1

ˇjk�j(xi)

⎞⎠⎞⎠�j(xi) (33)

here k denotes the iterative number and �jk+1 is the learning rate

or jth rule at k + 1th iteration. Because massively ranking and iter-tion procedures are needed in Eq. (33), a fixed learning rate couldlow down the convergent rate. Therefore, based on the conceptf delta-bar-delta (DBD) algorithm [28], an adaptive learning ratepdating approach is introduced in the proposed LW-GRBFNFS. Ineneral, fixed adjustment strategy is employed in the traditionalBD algorithm. However, different Gaussian parameters are map-ing to different fuzzy rules, requiring different adjustment for

earning rate. Therefore, the enhanced version of DBD (EDBD) algo-ithm has been proposed to modify the fixed strategy for learningate adjustment in DBD. The proposed EDBD is shown as following

jk+1 =

⎧⎪⎪⎨⎪⎪⎩

�jk+ , if �ˇj

k· ˇj

k+1 > 0

�jk· �, if ˇj

k· ˇj

k+1 < 0

�jk, otherwise

(34)

here ˇjk

and ˇjk+1 denote the incremental weighted mean

alue of jth rule at kth and k + 1th iterations respectively, and respectively denote as increase factor and decrease factor. Inq. (34), the updating strategy of �j

k+1 is based on the gradient

irection of ˇjk

and ˇjk+1. When both are in same direction,

jk+1 is increased by ; otherwise, �j

k+1 is decreased by � or nohange. In next section, the advantages of fast convergent rate andnhanced robustness for network weights gained by the proposedDBD approach are shown by some example simulations.

The output weight � = [ ˇ1 ˇ2 · · · ˇM ] can be obtained byq. (33), and then F = [ f1 f2 · · · fn ] can be obtained by Eq. (29).herefore, following Eq. (21), the output offset weight calculates as
elow
= med1≤i≤n

{yi − fi} (35)

able 1he six test nonlinear functions.

Function number Description

F1 s-type non-linear function

F2 Sinc non-linear function

F3 Hermite non-linear function

F4 Trapezoidal non-linear function

F5 Special sine non-linear function

F6 Continuous oscillation function

uting 13 (2013) 2390–2404

Finally, following Eq. (23), the expected output vector of LW-GRBFNFS, Y, is calculated by

Y = F+ b (36)

3.4. Algorithm overview

The algorithm overview of the proposed LW-GRBFNFS isdescribed here in detail. The algorithm shown as Fig. 6 is composedof the process of self-constructing the antecedent part and that oftraining the consequent part of LW-GRBFNFS.

Fig. 6. The algorithm overview of LW-GRBFNFS.

4. Simulations and results

Some simulations are designed in order to demonstrate the twomajor contributions of this study:

(1) In order to evaluate the efficiency of PSO-based self-constructing scenario for RBFN, six nonlinear functions withdifferent complexities are tested here. To confirm the advan-tages of the proposed approach, the K-means algorithm [12]and GA-based self-growing RBFNN training algorithm [34] arealso carried out against these tested functions for comparison.

(2) To estimate the accuracy and robustness of the proposedmethod during the output weight training procedure, it wouldbe interesting to know what happens if the noise is progres-sively increased and if the number of outliers is increased.Therefore, the performances of the least Wilcoxon norm is com-pared against the least square norm for non-linear functionsapproximation problem and chaotic time series prediction withdifferent degrees of corrupted data.

4.1. PSO-based self-constructing GRBFNFS simulations

The six nonlinear functions to be tested are listed as Table 1. For
every simulation, the training data set consists of 50 input–outputdata samples taken at random, and the testing data set includes75 samples different from the training data set. For the definitionof PSO parameters in the proposed approach [24], w, c1 and c2 are
Equation expression Range of x

y = (x−2)(2x−1)(1+x2)

x ∈ [−8, 12]

y ={

1, x = 0sin(x)/x, x /= 0

x ∈ [−10, 10]

y = 1.1 · (1 − x + 2x2) · e−x2/2 x ∈ [−5, 5]

y = 0.5 · sin(

x2

)+ 1+cos x

2 x ∈ [−4.5, 10.8]

y = sin(2 x) + sin(

2 x10

)x ∈ [0, 20]

y = − sin(

10 x)+ 0.5 · cos

(25 x)

x ∈ [–10, 10]

t Comp

gbPwtiiaitr


iven 0.12, 0.25 and 0.25 respectively, and the search range of ε isounded between 0.2 and 1, the particle number is 10. Actually,SO parameters are not sensitive to the final result during net-ork training process by several independently tested runs. In [34],

he Simple Genetic Algorithm (SGA) is adopted to use binary cod-ng to train GRBFNFS structure for saving computation time, butt will lose some accuracy. That is, this method may not present
n optimal solution. So the Real-value Genetic Algorithm (RGA)s implemented in this paper to obtain more accurate results. Forhe GA-based self-growing GRBFNFS training algorithm, the searchange of ε in the input space is also in the range from 0.2 to 1, the
Fig. 7. Nonlinear function approximation without

uting 13 (2013) 2390–2404 2397

crossover rate Pc is given 0.8, and mutation rate Pm is given 0.01,the population size is 10. For the K-means method [12], the optimalnumber of GRBFNFS neurons in the hidden layer is chosen to be 30by experience. Other parameter setting of K-means can be referredto [16].

To evaluate the result, we use the root mean square error (RMSE)shown as below

RMSE =

√√√√1n

n∑i=1

(y(xi) − F(xi))2 (37)

outliers by LW-GRBFNFS and LS-GRBFNFS.


Table 2Self-constructing GRBFNFS comparison between the three approaches in six test functions.

Test function Performance index Methods

PSO-based GA-based K-means

F1

RMSE for training data 0.0332 0.0584 0.1552RMSE for testing data 0.0520 0.0786 0.1962Maximal error 0.3852 0.2355 0.9508Number of hidden node 33 29 30

F2


F3


F4


F5


F6


Fig. 8. Test function F1 approximation for corrupted data.

Y.-K. Yang et al. / Applied Soft Computing 13 (2013) 2390–2404 2399

Table 3Performance comparison between LW-GRBFNFS and LS-GRBFNFS.

Corrupted percent RMSE F1 F2 F3 F4 F5 F6

A B+ A B+ A B+ A B+ A B+ A B+

0%Avg. 0.0843 0.0087 0.0245 0.0016 0.0339 0.0012 0.0449 0.0008 0.0663 0.0008 0.0588 0.0005Std. 0 0 0 0 0 0 0 0 0 0 0 0

10%Avg. 0.1180 0.0572 0.0799 0.0362 0.0704 0.0173 0.0835 0.0262 0.1054 0.0710 0.0945 0.0354Std. 0.0388 0.0247 0.0357 0.0282 0.0291 0.0285 0.0371 0.0257 0.0467 0.0295 0.0280 0.0251

20%Avg. 0.1816 0.0925 0.1160 0.0802 0.1010 0.0444 0.1174 0.0692 0.1358 0.1118 0.1230 0.0775Std. 0.0479 0.0288 0.0529 0.0361 0.0406 0.0304 0.0601 0.0366 0.0527 0.0360 0.0442 0.0319

30%Avg. 0.2665 0.1343 0.2489 0.1309 0.1216 0.0687 0.1449 0.0814 0.1867 0.1613 0.1555 0.1238Std. 0.0592 0.0379 0.0581 0.0440 0.0560 0.0428 0.0694 0.0366 0.0565 0.0405 0.0725 0.0493

40%Avg. 0.4020 0.1769 0.3807 0.1692 0.2519 0.1002 0.1790 0.1491 0.2364 0.2077 0.1880 0.1707Std. 0.0699 0.0428 0.0593 0.0440 0.0569 0.0432 0.0731 0.0479 0.0572 0.0471 0.0787 0.0628

50%Avg. 0.4482 0.2226 0.4150 0.2108 0.3762 0.1362 0.2127 0.1877 0.2667 0.2354 0.2202 0.2035Std. 0.0863 0.0857 0.0898 0.0724 0.0876 0.0653 0.0933 0.0588 0.0652 0.0624 0.0945 0.0880

NN

wafoaa[to

ote 1: A is LS-based method.ote 2: B+ is the proposed method with EDBD.

here y(xi) denotes the output of a nonlinear function for input xi,nd F(xi) denotes the derived output following the RBFN approachor input xi. After simulations, the RMSE of training data, the RMSEf testing data, the maximal error and the number of hidden nodesre presented in Table 2 for each case. In Table 2, the three involved
lgorithms are denoted as PSO-based, GA-based [34] and K-means12]. In the simulation results, PSO-based approach has lower RMSEhan others for both training data and testing data. It means thatver fitting does not happen in the proposed approach. The RBFN
Fig. 9. Test function F2 approxim

is known to need different number of hidden nodes and clus-ter radius for different complexities. K-means approach usuallyresults in a larger error because it is not able to decide a suit-able number of hidden nodes. Though GA-based approach decidesa suitable number of hidden nodes, its cluster radius is not good
enough to classify whole data. The proposed approach is able tofind out the optimal cluster radius to further decide a numberof hidden nodes because PSO is more capable of global searchingthan GA.
ation for corrupted data.


Table 4Performance comparison between with and without enhanced DBD.

Corrupted percent RMSE & CS F1 F2 F3 F4 F5 F6

B B+ B B+ B B+ B B+ B B+ B B+

0%Avg. 0.0158 0.0087 0.0018 0.0016 0.0046 0.0012 0.0019 0.0008 0.0019 0.0008 0.0053 0.0005Std. 0 0 0 0 0 0 0 0 0 0 0 0CS 6.1587 0.5275 17.1749 0.4455 16.4492 0.4283 17.1120 0.4950 18.0426 0.5270 17.1333 0.4833

10%Avg. 0.0607 0.0572 0.0407 0.0362 0.0195 0.0173 0.0293 0.0262 0.0718 0.0710 0.0374 0.0354Std. 0.0376 0.0247 0.0418 0.0282 0.0302 0.0285 0.0357 0.0257 0.0420 0.0295 0.0277 0.0251CS 6.0508 0.5317 16.9200 0.4396 16.2903 0.4279 17.3534 0.4873 18.3451 0.5439 17.0051 0.4837

20%Avg. 0.0988 0.0925 0.0864 0.0802 0.0427 0.0444 0.0814 0.0692 0.1159 0.1118 0.0790 0.0775Std. 0.0465 0.0288 0.0504 0.0361 0.0423 0.0304 0.0481 0.0366 0.0434 0.0360 0.0394 0.0319CS 6.0688 0.5358 17.1502 0.4429 16.4101 0.4582 17.3379 0.4842 18.0810 0.5477 17.0051 0.4726

30%Avg. 0.1431 0.1343 0. 1372 0.1309 0.0691 0.0687 0.1034 0.0814 0.1754 0.1613 0.1273 0.1238Std. 0.0605 0.0379 0.0468 0.0440 0.0557 0.0428 0.0576 0.0366 0.0435 0.0405 0.0671 0.0493CS 6.0703 0.5409 16.8663 0.4428 16.3422 0.4397 17.5313 0.4888 17.9242 0.5432 17.4508 0.4799

40%Avg. 0.1980 0.1769 0.1692 0.1629 0.1016 0.1002 0.1719 0.1491 0.2144 0.2077 0.1880 0.1707Std. 0.0447 0.0428 0.0508 0.0440 0.0545 0.0432 0.0588 0.0479 0.0500 0.0471 0.0798 0.0628CS 6.0607 0.5457 17.0333 0.4429 16.5645 0.4526 17.6221 0.4946 17.8668 0.5397 17.4498 0.4722

50%Avg. 0.2339 0.2226 0.2116 0.2108 0.1329 0.1362 0.2036 0.1877 0.2411 0.2354 0.2132 0.2035Std. 0.0860 0.0857 0.0765 0.0724 0.0594 0.0653 0.0665 0.0588 0.0630 0.0624 0.0882 0.0880CS 6.0618 0.5462 17.2517 0.4453 16.8757 0.4453 17.3452 0.4885 18.2691 0.5351 17.0145 0.4699

N the pN ion ofN

4a

c

ote 1: Method B denotes the proposed method without EDBD, Method B+ denotesote 2: avg. denotes the average of 30 Runs’ RMSE, std. denotes the standard deviatote 3: CS denotes convergence speed in second.

.2. Simulations of LW-GRBFNFS for nonlinear function
pproximations with outliers
The objective of this section is to approximate outliers-orrupted nonlinear functions by means of the proposed

Fig. 10. Test function F3 approxi

roposed method with EDBD. 30 Runs’ RMSE.

self-constructing GRBFNFS with LW norm and LS norm. The ability
of mitigating the influence from noise and outliers is comparedbetween the two approaches. The six test nonlinear functions,shown as Table 1, generate uniformly distribution input-outputsamples for approximating each function [35–37]. The training
mation for corrupted data.



dtfowuii1ddE

oaoIaLprdH

cd

ata set consists of 50 randomly chosen x-points (training pat-erns) with the corresponding y-values (target values) evaluatedrom the underlying true function. The outliers set is composedf the same x-points as the corresponding uncorrupted ones butith randomly chosen y-values corrupted by adding random val-es from an uniform distribution range in [–2, 2]. It would be

nteresting to know what happens if the noise is progressivelyncreased and if the number of outliers is increased. To this end,0%, 20%, 30% and 40% randomly chosen y-values of the trainingata points will be outliers for the PSO-based self-constructingynamic Gaussian modeling matrix of hidden nodes shown asq. (28).

For PSO-based self-constructing process of the antecedent partf GRBFNFS, the related PSO parameters are assigned as [24]: w, c1nd c2 are given 0.12, 0.25 and 0.25 respectively, the search rangef ε is bounded between 0.2 and 1, and the particle number is 10.n each experiment, the output layer weight of initial value is sets 0. The training process of the antecedent part of GRBFNFS withW norm is more complex than that with LS norm. Therefore, theroposed EDBD is employed in LW-GRBFNFS with initial learningate set as �j

0 = 0.3 and 1000 iterative times. The increase factor andecrease factor shown in Eq. (34) are set as 0.2 and 0.6 respectively.
owever, the learning rate is fixed as 0.0006 in LS-GRBFNFS.
For each test function with different number of outliers-orrupted training data, GRBFNFS is self-constructed by PSO. Afterynamical Gaussian models are calculated, the output data that

approaches a specified test function can be obtained when weightsare trained by LW and LS respectively. For the case without outliers,the output data resulted by LS-based method and the proposedLW-based method are shown in Fig. 7. Meanwhile, their RMSE andstandard deviation (SD) are presented in the first row of Table 3.From the table, performance of the proposed LW-based methodis better than that of LS-based method. Some cases with variantnumber of outliers are also tested by 30 independent runs, wherepercentage of outliers is 10%, 20%, 30%, 40%, and 50% respectively.The output data are shown in Figs. 8–13, and the average of RMSEsand SDs are listed in lower rows of Table 3. From these results, itis emerged that LS-based method is sensitive to number of out-liers but LW-based method is not. The case with 50% outliers isquite difficult for function approximation since SD is increasedsuddenly.

In order to speed up the convergence speed (CS), the enhancedDBD (EDBD) algorithm is applied for adapting learning rate ofthe proposed LW-based method. Simulations for above casesare conducted again and their results are presented in Table 4.Since gradient-decedent is applied to train the output weightsin LW-based method without EDBD, the simulation is conductedover 12,000 iterations. However, it only requires 1000 itera-
tions in LW-based method with EDBD. The results shown inTable 4 clearly indicate the proposed LW-based method withEDBD is able to greatly improve convergence speed with goodaccuracy.


proxi

4

[

wcFgtE

nf[dbsmLarfF

i

of Table 5. The case with 50% corrupted outliers is quite diffi-cult for function approximation since RMSE and SD are increasedsuddenly.

Table 5Mackey–Glass process prediction result.

Corrupted percent RMSE Mackey–Glassprocess prediction

A B+

0%Avg. 0.1600 0.0107Std. 0 0

10%Avg. 0.1603 0.0107Std. 0.0007 0

20%Avg. 0.1744 0.0110Std. 0.032 0.0003

30%Avg. 0.1800 0.0110Std. 0.0040 0.0007

40%Avg. 0.1800 0.0112Std. 0.0041 0.0010

Fig. 12. Test function F5 ap

.3. Chaotic time series prediction

In this section, signal prediction for Mackey–Glass time series38] shown as Eq. (38) is simulated by LW-GRBFNFS.

dx(t)d(t)

= 0.1x(t) + 0.2 · x(t − 17)

1 + x(t − 17)10(38)

here x(t) denotes the sampled time series of input space. Theorresponding output signal by following Eq. (38) is shown asig. 14(a). In Fig. 14(a), the first 17 output signals are randomlyenerated in the range of 0 and 1, while the samples from 18tho 1000th of the Mackey–Glass time series are generated byq. (38).

To process the prediction, assume that the first 500 output sig-als are known for training and the last 500 points are the objective

or predicting. Therefore, 500 random data pairs are selected from0, 500] sample space for self-constructing GRBFNFIS with 376 hid-en nodes by PSO. After dynamical Gaussian models are calculatedy Eq. (28), output data approximate to that of Mackey–Glass timeeries can be obtained when weights are trained by the LW-basedethod and LS-based method respectively. The predicted result by

S-GRBFNFS and LW-GRBFNFS are shown in Fig. 14(b), and theirssociated RMSE and standard deviation are presented in the firstow of Table 5. From the table, it is obvious to observe that the per-
ormance of the proposed LW-GRBFNFS is better than LS-GRBFNFS.ig. 14(b) is the error comparison of two methods shown as Fig. 15.
Some cases with variant number of outliers are tested by 30ndependent runs, where percentage of outliers is 10%, 20%, 30%,

mation for corrupted data.

40%, and 50% respectively. The established GRBFNFS are withaverage about 382 hidden nodes under 30 independent runs.After dynamical Gaussian models are calculated by Eq. (28), theoutput data that approach to Eq. (38) can be obtained whenweights are trained by LW-based method and LS-based methodrespectively. The average of RMSEs and SDs are listed in rows

50%Avg. 0.4135 0.2809Std. 0.0609 0.0179

Note 1: A is LS-based method.Note 2: B+ is the proposed method with EDBD.



Fig. 14. Mackey–Glass process time series signal.

2404 Y.-K. Yang et al. / Applied Soft Comp

5

gflPTetafabf

A

tsC2

R

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

Fig. 15. The error comparison between LW-GRBFNFS and LS-GRBFNFS.

. Conclusion

This paper addresses on self-constructing least Wilcoxon-eneralized RBFNFS (LW-GRBFNFS) and its application to nonlinearunction approximation and chaotic time series prediction prob-ems. The proposed LW-GRBFNFS presents a novel scenario usingSO-based self-constructing and least-Wilcoxon norm learning.he LW-GRBFNFS can provide flexible and dynamic ability of gen-rating RBFNFS with better accuracy and much less sensitivityo outliers. The experimental results have shown the proposedpproach not only can effectively solve the problem of nonlinearunction approximation with outliers, but also can be successfullypplied to chaotic time series predictions. As a final remark, it is ourelief that the proposed method provides a promising methodologyor many nonlinear function approximation problems.

cknowledgements

The authors would like to thank the anonymous reviewers forheir insightful comments and suggestions. This work was in partupported by the National Science Council, Taiwan, Republic ofhina, under grant numbers NSC98-2221-E-011-118 and NSC99-221-E-259-002-MY3.

eferences

[1] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks areuniversal approximators, Neural Networks 2 (1989) 359–366.

[2] L.X. Wang, J.M. Mendel, Fuzzy basis functions, universal approximation andorthogonal least squares learning, IEEE Transactions on Neural Networks 3(1992) 807–813.

[3] B. Kosko, Fuzzy systems are universal approximators, IEEE Transactions onComputers 43 (1994) 1329–1333.

[4] T. Chen, H. Chen, Universal approximation to nonlinear operators by neuralnetworks with arbitrary activation functions and its application to dynamicalsystems, IEEE Transactions on Neural Networks 6 (1995) 911–917.

[5] J.L. Castro, Fuzzy systems with defuzzification are universal approximators,IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 26(1996) 149–152.

[6] F. Scarselli, A.C. Tsoi, Universal approximation using feedforward neuralnetworks: a survey of some existing methods, and some new results, NeuralNetworks 11 (1998) 15–37.

[7] D.S. Broomhead, D. Lowe, Multivariable functional interpolation and adaptivenetworks, Complex Systems 2 (1988) 321–355.

[8] D. Lowe, Adaptive radial basis function nonlinearities, and the problem of gen-eralization, in: Proceedings of IEE International Conference on Artificial NeuralNetworks, 1989, pp. 171–175.

[

uting 13 (2013) 2390–2404

[9] J.A.S. Freeman, D. Saad, Learning and generalization in radial basis functionnetworks, Neural Computation 9 (1995) 1601–1622.

10] J.S. Jang, C.T. Sun, Function equivalence between radial basis function networksand fuzzy inference systems, IEEE Transactions on Neural Networks 4 (1993)156–159.

11] S. Wu, M.J. Er, Dynamic fuzzy neural networks – a novel approach to functionapproximation, IEEE Transactions on Systems, Man, and Cybernetics Part B:Cybernetics 30 (2000) 358–364.

12] J. Moody, C.J. Darken, Fast learning in networks of locally-tuned processingunits, Neural Computation 1 (1989) 281–294.

13] J. Park, I.W. Sandberg, Universal approximation using radial-basis-functionnetworks, Neural Computation 3 (1991) 246–257.

14] N.B. Karayiannis, G.W. Mi, Growing radial basis neural networks: mergingsupervised and unsupervised learning with network growth techniques, IEEETransactions on Neural Networks 8 (1997) 1506–1942.

15] N. Zheng, Z. Zhang, G. Shi, Y. Qiao, Self-creating and adaptive learning of RBFnetworks: merging soft-completion clustering algorithm with network growthtechnique, Proceedings: International Joint Conference on Neural Networks 2(1999) 1131–1135.

16] J.C. Platt, A resource allocation network for function interpolation, Neural Com-putation 2 (1991) 213–225.

17] L. Yingwei, N. Sundararajan, P. Saratchandran, A sequential learning scheme forfunction approximation using minimal radial basis function neural networks,Neural Computation 9 (1997) 461–478.

18] Y. Li, N. Sundararajan, P. Saratchandran, Analysis of minimal radial basisfunction network algorithm for real-time identification of nonlinear dynamicsystems, IEE Proceedings – Control Theory and Applications 147 (2000)476–484.

19] G.B. Huang, N. Sundararajan, P. Saratchandran, An efficient sequential learningalgorithm for growing and pruning RBF (GAP-RBF) networks, IEEE Transactionson Systems, Man, and Cybernetics Part B: Cybernetics 34 (2004) 2284–2292.

20] G.B. Babu, S. Suresh, Meta-cognitive neural network for classification problemsin a sequential learning framework, Neurocomputing 81 (2012) 86–96.

21] S. Suresh, K. Dong, H.J. Kim, A sequential learning algorithm for self-adaptiveresource allocation network classifier, Neurocomputing 73 (2010) 3012–3019.

22] S. Suresh, N. Sundararajan, P. Saratchandran, A sequential multi-categoryclassifier using radial basis function networks, Neurocomputing 71 (2008)1345–1358.

23] A.P. Engelbrecht, Computational Intelligence: An Introduction, 2nd ed., JohnWiley & Sons Ltd., Chichester, West Sussex, England, 2007, pp. 100–101.

24] C.L. Lin, S.T. Hsieh, T.Y. Sun, C.C. Liu, Cluster distance factor searching by ParticleSwarm Optimization for self-growing radial basis function neural network, in:Proceedings: International Joint Conference on Neural Networks, Vancouver,BC, Canada, 2006, pp. 4825–4830.

25] T.Y. Sun, C.C. Liu, C.L. Lin, S.T. Hsieh, C.S. Huang, in: A. Zemliak (Ed.), A RadialBasis Function Neural Network With Adaptive Structure Via Particle SwarmOptimization, I-Tech, 2009, pp. 423–436 (Chapter 26).

26] J.G. Hsieh, Y.L. Lin, J.H. Jeng, Preliminary study on Wilcoxon learning machines,IEEE Transactions on Neural Networks 19 (2008) 201–211.

27] V.G. Gudise, G.K. Venayagamoorthy, Comparison of particle swarm optimiza-tion and backpropagation as training algorithms for neural networks, in:Proceedings: IEEE Swarm Intelligence Symposium, 2003, pp. 110–117.

28] R.A. Jacobs, Increased rates of convergence through learning rate adaptation,Neural Networks 1 (1988) 295–307.

29] T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications tomodeling and control, IEEE Transactions on Systems, Man, and Cybernetics,Part. B: Cybernetics 15 (1985) 116–132.

30] M. Sugeno, G.T. Kang, Structure identification of fuzzy model, Fuzzy Sets andSystems 28 (1988) 15–33.

31] R.C. Eberhart, J. Kennedy, A new optimizer using particle swarm theory, in:Proceedings: 6th Int. Symp. Micro Machine and Human Science, 1995, pp.39–43.

32] S.T. Hsieh, T.Y. Sun, C.L. Lin, C.C. Liu, Effective learning rate adjustment of blindsource separation based on an improved particle swarm optimizer, IEEE Trans-actions on Evolutionary Computation 12 (2008) 242–251.

33] R.V. Hogg, J.W. McKean, A.T. Craig, Introduction to Mathematical Statistics, 6thed., Prentice-Hall, Englewood Cliffs, NJ, 2005.

34] B. Yunfei, L. Zhang, Genetic algorithm based self-growing training for RBFneural network, in: Proceedings: International Joint Conference on NeuralNetworks, 2002, pp. 840–845.

35] C.C. Lee, P.C. Chung, J.R. Tsai, C.I. Chang, Robust radial basis function neuralnetworks, IEEE Transactions on Systems, Man, and Cybernetics Part B: Cyber-netics 29 (1999) 674–685.

36] C.C. Chuang, J.T. Jeng, P.T. Lin, Annealing robust radial basis function networksfor function approximation with outliers, Neurocomputing 56 (2004) 123–139.

37] T.Y. Sun, S.J. Tsai, C.H. Tsai, C.L. Huo, Nonlinear function approximation based on
Least Wilcoxon Takagi-Sugeno fuzzy model, in: Proceedings: the eighth Inter-national Conference on Intelligent Systems Design and Applications, KaohsiungCity, Taiwan, 2008, pp. 312–317.
38] M.C. Mackey, L. Glass, Oscillation and chaos in physiological control systems,Science 197 (1997) 287–289.

A novel self-constructing Radial Basis Function Neural-Fuzzy System

Documents

Transcript of A novel self-constructing Radial Basis Function Neural-Fuzzy System