
RESEARCH ARTICLE

An enhanced swarm intelligence clustering-based RBFNN classifier and its application in deep Web sources classification

Yong FENG (✉), Zhongfu WU, Jiang ZHONG, Chunxiao YE, Kaigui WU

College of Computer Science, Chongqing University, Chongqing 400030, China

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2010

Abstract  The central problem in training a radial basis function neural network (RBFNN) is the selection of hidden layer neurons, which includes the selection of the center and width of those neurons. In this paper, we propose an enhanced swarm intelligence clustering (ESIC) method to select hidden layer neurons, and then train a cosine RBFNN based on the gradient descent learning process. We also apply this new method to the classification of deep Web sources. Experimental results show that the average Precision, Recall and F of our ESIC-based RBFNN classifier achieve higher performance than BP, Support Vector Machines (SVM) and OLS RBF on our deep Web sources classification problems.

Keywords  swarm intelligence, clustering, radial basis function neural network (RBFNN), deep Web sources classification, classifier

1 Introduction

Radial basis function neural networks (RBFNNs) are often trained in practice by hybrid learning algorithms. Such learning algorithms employ a supervised scheme for updating the weights that connect the radial basis functions (RBFs) with the output units, and an unsupervised clustering algorithm for determining the centers of the RBFs, which remain fixed during the supervised learning process. Alternative learning algorithms relied on forward subset selection methods, such as the orthogonal least squares (OLS) algorithm [1]. The relationship between the performance of RBFNNs and their size motivated the development of network construction and/or pruning procedures for autonomously selecting the number of RBFs [2–5]. The problems of determining the number, shapes, and locations of the RBFs are essentially related to and interact with each other. Solving these problems simultaneously was attempted by developing a multi-objective evolutionary algorithm [6].

An alternative set of approaches to training RBFNNs relied on gradient descent to update all their free parameters [7]. This approach reduces the development of reformulated RBFNNs to the selection of admissible generator functions that determine the form of the RBFs. Linear generator functions of a special form produced cosine RBFNNs, that is, a special class of reformulated RBFNNs constructed by cosine RBFs. Cosine RBFs have some attractive sensitivity properties, which make them more suitable for gradient descent learning than Gaussian RBFs [8]. Training RBFNNs by a fully supervised learning algorithm based on gradient descent is sensitively dependent on the properties of the RBFs.

In this paper, we focus on a new clustering method based on swarm intelligence clustering (SIC) and apply it to the construction of the hidden layer of an RBFNN. Our method first uses a self-organizing ant colony to find the candidate hidden neurons; it then refines the RBF neural network with all candidate hidden neurons and employs a preserving criterion to remove redundant hidden neurons.

Received July 6, 2009; accepted January 20, 2010

E-mail: [email protected]

Front. Comput. Sci. China 2010, 4(4): 560–570. DOI 10.1007/s11704-010-0104-5


This new algorithm takes full advantage of the class label information and starts with a small neural network; hence it is likely to be more efficient and is expected to generalize well. SIC is a heuristic clustering algorithm derived from ethological simulations of real ants' collective behavior. SIC has characteristic properties comparable with alternative clustering methods: sample data are collected into a few piles on a 2-D grid (which can be regarded as a SOM-like feature map) by ant-like agents according to dissimilarity, without knowing the number of clusters.

One of the first studies using the metaphor of ant colonies in the clustering domain was conducted by Deneubourg [9], where a population of ant-like agents moving randomly on a 2D grid is allowed to maneuver basic objects so as to cluster them. This method was then further generalized by Lumer and Faieta [10] (hereafter the LF model), who applied it to exploratory data analysis for the first time. In 1995, the two authors went beyond this simple example and applied their algorithm to interactive exploratory database analysis, where a human observer can probe the contents of each represented point (sample, image, item) and alter the characteristics of the clusters. They showed that their model provides a way of exploring complex information spaces, such as document or relational databases, because it allows information access based on exploration from various perspectives. However, this last work, entitled "Exploratory Database Analysis via Self-Organization", according to [11], was never published due to commercial applications. They applied the algorithm to a database containing the "profiles" of 1650 bank customers. Attributes of the profiles included marital status, gender, residential status, age, a list of banking services used by the customer, etc. Given the variety of attributes, some of them qualitative and others quantitative, they had to define several dissimilarity measures for the different classes of attributes and to combine them into a global dissimilarity measure.

More recently, a clustering algorithm based on swarm intelligence (CSI) was systematically proposed by Wu and Shi [12]. Ramos et al. [13,14] presented a novel strategy (ACLUSTER) to tackle unsupervised clustering as well as data retrieval problems, avoiding both the short-term-memory-based strategies and the use of several artificial ant types (using different speeds) present in the approaches proposed initially by Lumer. Other works in this area include those of Han and Shi [15], Runkler [16], Kuo et al. [17], and Chen and Chen [18].

The rest of the paper is organized as follows. In Section 2, the architecture of the RBFNN model is provided. In Section 3, our enhanced swarm intelligence clustering (ESIC) algorithm is developed to obtain candidate hidden neurons, and the ESIC-based cosine RBF neural network training process is presented. Experimental results are reported in Sections 4 and 5, and some conclusions are provided at the end.

2 Architecture of the RBFNN model

The RBFNN has the structure shown in Fig. 1.

It consists of three layers: an input layer, a hidden layer and an output layer. The input nodes deliver the input signals to the hidden nodes. The activation function of a hidden neuron adopts a radiate function, called the radial basis function, which has local response characteristics to the input signals. The Gaussian kernel function is the most widely used.

The response of the jth hidden neuron to the input $X = [x_1, x_2, \ldots, x_n]^T$ can be expressed as

$$u_j = \exp\left[-\frac{(X - c_j)^{T}(X - c_j)}{2\sigma_j^{2}}\right], \quad j = 1,2,\ldots,N_h, \qquad (1)$$

where $u_j$ is the output of the jth hidden neuron, $c_j$ is the center of the jth hidden neuron, $\sigma_j$ is the generalized constant (probability divergence) and $N_h$ is the number of hidden nodes. The radial basis function has the local response characteristic shown in Eq. (1). This means that a hidden node produces greater output when the input signal is near the central scope of the kernel function. The generalized constant $\sigma_j$ limits the scope of input space in which the radial basis function neuron can respond, so

Fig. 1 Architecture of RBFNN



the RBFNN can also be called a local perception network. The output nodes usually use simple linear functions, which are linear combinations of the outputs of the hidden nodes. The output of each node in the output layer is defined by

$$y_i = \sum_{j=1}^{N_h} w_{ij}\, u_j - \theta_i = W_i^{T} U, \quad i = 1,2,\ldots,M, \qquad (2)$$

where $W_i = [w_{i1}, w_{i2}, \ldots, w_{iN_h}, -\theta_i]$ and $U = [u_1, u_2, \ldots, u_{N_h}, 1]^T$. $W_i$ is the weight vector from the hidden nodes to the ith output node, and $\theta_i$ is the respective threshold value.
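To make the two layers concrete, here is a minimal Python sketch of the forward pass of Eqs. (1) and (2); it assumes NumPy, and every identifier in it is ours rather than the paper's.

```python
import numpy as np

def rbfnn_forward(x, centers, sigmas, W, theta):
    """Evaluate the Gaussian RBFNN of Eqs. (1)-(2) for one input vector.

    x       : (n,) input vector
    centers : (Nh, n) hidden-neuron centers c_j
    sigmas  : (Nh,) generalized constants sigma_j
    W       : (M, Nh) hidden-to-output weights w_ij
    theta   : (M,) output thresholds theta_i
    """
    # Eq. (1): u_j = exp(-(x - c_j)^T (x - c_j) / (2 sigma_j^2))
    sq_dist = np.sum((x - centers) ** 2, axis=1)
    u = np.exp(-sq_dist / (2.0 * sigmas ** 2))
    # Eq. (2): y_i = sum_j w_ij u_j - theta_i
    return W @ u - theta
```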

3 ESIC-based RBFNN

The learning of an RBF network is divided into two stages. First, according to the input samples, determine the center value $c_j$ and the generalized constant $\sigma_j$ of the Gaussian function of each hidden-layer node. Second, employ the gradient descent learning process to train a cosine RBFNN, remove redundant neurons and adjust the weights between the hidden neurons and the output units. Once the center $c_j$ of each radial basis function has been determined, the output weight and threshold values can be solved by the gradient descent learning process. Therefore, determining the centers of the radial basis functions from the given training samples is the most important step in constructing the RBF network.

3.1 Choosing the center of the radial basis function of RBFNN using ESIC

Because the activation function of the RBF network is local, the nearer the input sample is to the center of the kernel function, the greater the output value the hidden node produces. Therefore, it is very important to choose an accurate center for the kernel function so as to improve the efficiency of the RBF network. This paper presents a self-adaptive clustering algorithm that determines the centers of the radial basis functions with our enhanced swarm intelligence clustering (ESIC) algorithm. ESIC is a dynamically adaptive clustering algorithm that need not determine the number of clusters in advance; it dispenses with the iterative process executed for different clustering modes, and it avoids the complicated process of determining the optimal number of hidden neurons by gradually increasing clustering modes. ESIC thus increases the processing speed of clustering.

ESIC has three main steps. First, data objects are randomly projected onto a plane. Second, each ant chooses an object at random, and picks up, moves or drops the object according to the picking-up or dropping probability. Finally, clusters are collected from the plane.

Definition 1  Swarm similarity is the integrated similarity of a data object with other data objects within its neighborhood. A basic formula for measuring the swarm similarity is

shown in Eq. (3):

$$f(o_i) = \sum_{o_j \in Neigh(r)} \left[ 1 - \frac{d(o_i, o_j)}{\beta} \right], \qquad (3)$$

where Neigh(r) denotes the local region, usually a circular area with radius r, and $d(o_i, o_j)$ denotes the distance of data object $o_i$ from $o_j$ in the attribute space, usually the Euclidean or city-block distance. The parameter β is defined as the swarm similarity coefficient. It is a key coefficient that directly affects the number of clusters and the convergence of the algorithm. If β is too large, dissimilar data objects will be clustered together and the algorithm converges quickly; if β is too small, similar data objects will not be clustered together and the algorithm converges slowly.

Definition 2  The probability conversion function is a function of $f(o_i)$ that converts the swarm similarity of a data object into a picking-up or dropping probability for a simple agent.

The picking-up probability for a randomly moving ant that is not currently carrying an object to pick up an object is given by

$$P_p = \frac{1}{2} - \frac{1}{\pi}\arctan\!\left(\frac{f(o_i)}{\alpha}\right). \qquad (4)$$

The dropping probability for a randomly moving loaded ant to deposit an object is given by

$$P_d = \frac{1}{2} + \frac{1}{\pi}\arctan\!\left(\frac{f(o_i)}{\alpha}\right), \qquad (5)$$

where α is a positive constant; decreasing α increases the rate of convergence.

Instead of using the linear segmentation function of the CSI model or the complex probability conversion function of the LF model, we propose a simple nonlinear probability conversion function for ESIC. According to experiments on UCI and KDD'99 (our previous work) [19], ESIC can help to overcome the linearly inseparable problems of the CSI model and the slow convergence of the LF model. Fig. 2 shows the changing trend of $P_p$ and $P_d$.

Fig. 2  Changing trend of Pp (left) and Pd (right)

From the above formulae we find that the smaller the similarity of an object (i.e., there are few objects belonging to the same cluster in its neighborhood), the higher the picking-up probability and the lower the dropping probability; conversely, the larger the similarity, the lower the picking-up probability (i.e., objects are unlikely to be removed from dense clusters) and the higher the dropping probability.

The ESIC algorithm is described as follows:

Input: sample modes X
Output: the candidate hidden neurons H (clusters), where k is the pattern clustering number and $c_j$ ($j = 1,2,\ldots,k$) are the clustering centers.

1) Initialize β, ant-number, the maximum number of iterations n, α, and other parameters.
2) Project the data objects onto a plane at random, i.e., randomly give a pair of coordinates (x, y) to each data object.
3) Give each ant an initial object; the initial state of each ant is unloaded.
4) for i = 1,2,...,n  // while not satisfying the stop criteria
   for j = 1,2,...,ant-number
   a) Compute f(o_i) within a local region with radius r by Eq. (3).
   b) If the ant is unloaded, compute P_p by Eq. (4). Compare P_p with a random probability P_r: if P_p < P_r, the ant does not pick up this object and another data object is randomly given to the ant; else the ant picks up this object and its state is changed to loaded.
   c) If the ant is loaded, compute P_d by Eq. (5). Compare P_d with P_r: if P_d > P_r, the ant drops the object, the ant's pair of coordinates is given to the object, the ant's state is changed to unloaded and another data object is randomly given to the ant; else the ant continues moving, loaded with the object.
5) for i = 1,2,...,pattern-num  // for all patterns
   a) If an object is isolated, i.e., its number of neighbors is less than a given constant, label it as an outlier. Else label this pattern with cluster serial number serial-num, and recursively apply the label to those patterns whose distance to this pattern is smaller than a short distance dist, i.e., collect the patterns belonging to the same cluster on the agent-work plane.
   b) serial-num++.
6) Compute the cluster means of the serial-num clusters as initial cluster centers.
7) Repeat
   a) (Re)assign each pattern to the cluster to which it is most similar, based on the mean value of the patterns in the cluster.
   b) Update the cluster means, i.e., calculate the mean value of the patterns for each cluster.
8) Until no change.
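The following Python sketch makes steps 2)-4) concrete, wiring Eqs. (3)-(5) into the ant loop. It is a simplified illustration under our own assumptions (grid size, movement step, default parameters and all identifiers are ours); steps 5)-8), which collect the piles and refine them by k-means, are omitted.

```python
import numpy as np

def swarm_similarity(o, data, pos, r, beta):
    """Eq. (3): integrated similarity of object o within its grid neighborhood Neigh(r)."""
    neigh = np.where(np.linalg.norm(pos - pos[o], axis=1) <= r)[0]
    neigh = neigh[neigh != o]
    d = np.linalg.norm(data[neigh] - data[o], axis=1)  # attribute-space distance
    return float(np.sum(1.0 - d / beta))

def esic_ant_loop(data, beta=0.5, alpha=10.0, r=12.0, n=1000, ant_number=15,
                  grid=100.0, seed=0):
    """Steps 2)-4) of the ESIC algorithm (steps 5)-8) omitted for brevity)."""
    rng = np.random.default_rng(seed)
    m = len(data)
    pos = rng.uniform(0.0, grid, size=(m, 2))      # step 2: random projection
    carried = rng.integers(0, m, size=ant_number)  # step 3: one object per ant,
    loaded = np.zeros(ant_number, dtype=bool)      #         initially unloaded
    for _ in range(n):                             # step 4
        for a in range(ant_number):
            o = carried[a]
            f = swarm_similarity(o, data, pos, r, beta)
            p_r = rng.random()                     # random probability P_r
            if not loaded[a]:
                p_p = 0.5 - np.arctan(f / alpha) / np.pi   # Eq. (4)
                if p_p < p_r:
                    carried[a] = rng.integers(0, m)  # refuse: inspect another object
                else:
                    loaded[a] = True                 # pick the object up
            else:
                p_d = 0.5 + np.arctan(f / alpha) / np.pi   # Eq. (5)
                if p_d > p_r:
                    loaded[a] = False                # drop the object at this spot
                    carried[a] = rng.integers(0, m)
                else:
                    pos[o] += rng.uniform(-1.0, 1.0, size=2)  # keep carrying it
    return pos
```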

3.2 Training Cosine RBFNN

After choosing the centers of the RBFs, we can employ the gradient descent learning process to train a cosine RBFNN and remove redundant neurons.

Consider an RBFNN with inputs from $R^n$, c RBFs and M output units. Let $v_j \in R^n$ be the prototype that is the center of the jth RBF, and let $w_i = [w_{i1}, w_{i2}, \ldots, w_{ic}]^T$ be the vector containing the weights that connect the ith output unit to the RBFs. Define the sets $V = \{v_j\}$ and $W = \{w_i\}$, and let $A = \{a_j\}$ be a set of free parameters associated with the RBFs. An RBFNN is defined as the function $N: R^n \to R^M$ that maps $X \in R^n$ to $N(V, W, A; X)$, such that

$$Y_i\big(N(V, W, A; X)\big) = f\left(\sum_{j=1}^{c} w_{ij}\, g_j\!\left(\|X - v_j\|^2\right) + w_{i0}\right), \qquad (6)$$

where $f(x) = 1/(1 + e^{-x})$ is used in this paper and $g_j$ represents the response of the RBF centered at the prototype $v_j$. Using this notation, the response of the ith output unit to the input $x_k$ is

$$\tilde{y}_{i,k} = Y_i\big(N(x_k)\big) = f\left(\sum_{j=1}^{c} w_{ij}\, g_{j,k} + w_{i0}\right), \qquad (7)$$

where $g_{j,k}$ represents the response of the RBF centered at the prototype $v_j$ to the input vector $x_k$. Unlike the traditional RBFNN, which uses exponential functions, in this paper we use the following cosine function for $g_{j,k}$:

$$g_{j,k} = \frac{a_j}{\left(\|x_k - v_j\|^2 + a_j^2\right)^{1/2}}. \qquad (8)$$

Cosine RBFNNs can be trained by the original learning algorithm, which was developed by using "stochastic" gradient descent to minimize

$$E_k = \frac{1}{2}\sum_{i=1}^{M} \left(\tilde{y}_{i,k} - y_{i,k}\right)^2, \quad k = 1,2,\ldots,n. \qquad (9)$$

For sufficiently small values of the learning rate, sequential minimization of $E_k$ leads to a minimum of the total error $E = \sum_{k=1}^{n} E_k$. After an example $(x_k, y_k)$ is presented to the RBFNN, the new estimate $w_{i,k}$ of each weight vector $w_i$ is obtained by incrementing its current estimate by the amount $\Delta w_{i,k} = -\eta \nabla_{w_i} E_k$, where $\eta$ is the learning rate:

$$w_{i,k} = w_{i,k-1} + \Delta w_{i,k} = w_{i,k-1} + \eta\, \tilde{y}_{i,k}\left(1 - \tilde{y}_{i,k}\right)\left(y_{i,k} - \tilde{y}_{i,k}\right) g_k, \qquad (10)$$

where $g_k = [g_{1,k}, g_{2,k}, \ldots, g_{c,k}]^T$ collects the RBF responses to $x_k$.

The new estimate $a_{j,k}$ of each reference distance $a_j$ can be obtained by incrementing its current estimate by the amount $\Delta a_{j,k} = -\eta\, \partial E_k / \partial a_j$:

$$a_{j,k} = a_{j,k-1} + \Delta a_{j,k} = a_{j,k-1} + \eta\, g_{j,k}\left(1 - g_{j,k}^2\right) \varepsilon_{j,k}^{h} \big/ a_{j,k-1}, \qquad (11)$$

where the hidden error term is $\varepsilon_{j,k}^{h} = \sum_{i=1}^{M} f'\!\left(\tilde{y}_{i,k}\right)\left(y_{i,k} - \tilde{y}_{i,k}\right) w_{ij}$.

According to Eq. (8), the jth cosine RBF can be eliminated during the training process if its reference distance $a_j$ approaches zero.
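The update rules compact into a few lines. The sketch below performs one stochastic-gradient epoch over Eqs. (7)-(11), taking f to be the logistic function of Eq. (6) so that $f'(\tilde{y}) = \tilde{y}(1 - \tilde{y})$; all names are ours, and the $a_j$ update follows our reading of Eq. (11).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_epoch(X, Y, V, W, a, lr=0.01):
    """One stochastic gradient descent pass for a cosine RBFNN.

    X : (n, dim) inputs x_k       Y : (n, M) targets y_k
    V : (c, dim) prototypes v_j   a : (c,) reference distances a_j
    W : (M, c + 1) output weights, column 0 holding the bias w_i0
    """
    for x, y in zip(X, Y):
        g = a / np.sqrt(np.sum((x - V) ** 2, axis=1) + a ** 2)  # Eq. (8)
        u = np.concatenate(([1.0], g))                          # bias input first
        y_hat = sigmoid(W @ u)                                  # Eq. (7)
        delta = y_hat * (1.0 - y_hat) * (y - y_hat)             # f'(net) * error
        W += lr * np.outer(delta, u)                            # Eq. (10)
        eps_h = W[:, 1:].T @ delta                              # hidden error of Eq. (11)
        a += lr * g * (1.0 - g ** 2) * eps_h / a                # Eq. (11)
    # RBFs whose a_j approaches zero can then be pruned, per the remark above.
    return W, a
```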

3.3 ESIC-based RBFNN Classifier

The ESIC-based RBFNN classifier, which includes training and testing processes, is shown in Fig. 3.

The new training algorithm for the RBF classifier thus uses the ESIC algorithm initially to obtain the candidate hidden neurons and subsequently trains the neural network with the gradient descent learning process described in this section.
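A hypothetical end-to-end flow, composing the sketches above, could look as follows; `collect_cluster_centers` stands in for steps 5)-8) of the ESIC algorithm and is assumed rather than shown, and the epoch count and pruning threshold are our choices.

```python
import numpy as np

# Stage 1: ESIC supplies the candidate hidden neurons (prototypes).
plane = esic_ant_loop(X_train)               # ant-based projection (sketch above)
V = collect_cluster_centers(plane, X_train)  # hypothetical: steps 5)-8) of ESIC
# Stage 2: gradient descent training of the cosine RBFNN, Eqs. (7)-(11).
M, c = Y_train.shape[1], len(V)
W = np.zeros((M, c + 1))                     # output weights incl. bias column
a = np.ones(c)                               # initial reference distances a_j
for epoch in range(50):                      # epoch count is an assumption
    W, a = train_epoch(X_train, Y_train, V, W, a, lr=0.01)
keep = a > 1e-3                              # prune RBFs whose a_j collapsed
V, a = V[keep], a[keep]
W = W[:, np.concatenate(([True], keep))]
```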

4 Experiments on KDD’99 and UCI

In this section, we use the 1999 KDD intrusion detection datasets and UCI datasets to test the ESIC-based RBFNN classifier.

Fig. 3 ESIC-based RBFNN classifier



Several different kinds of classification methods are compared with our ESIC-based RBFNN classifier on the KDD'99 and UCI datasets. In the experiments, we compare our approach with the traditional OLS RBF classifier, the SVM (Support Vector Machines) classifier and the BP neural network classifier.

4.1 Experiments on KDD’99

The 1999 KDD intrusion detection contest used the 1998 DARPA intrusion detection dataset to construct connection records and extract object features. The 1998 DARPA dataset was acquired from nine weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN, peppered with four main categories of attacks: DoS, Probe, U2R and R2L. A connection record is a sequence of TCP packets starting and ending at well-defined times, between which data flows to and from a source IP address to a target IP address under a well-defined protocol. Each connection is labeled as either normal or as an attack, with exactly one specific attack type. For each TCP/IP connection, 41 quantitative and qualitative features were extracted.

Five testing datasets (TDS1–TDS5) are prepared to perform the algorithms. Each dataset contains 6000 instances (consisting of 1% to 1.5% intrusion instances and 98.5% to 99% normal instances), all of which are selected at random from the normal dataset and abnormal dataset, respectively. Fig. 4 shows the number of attacks in the datasets.

To evaluate the algorithms, we observe two major indicators of performance: DR (detection rate) and FPR (false positive rate), both measured as percentages.

Parameter settings are as follows. The BP networks were trained with $n_h$ hidden neurons, where $n_h$ was varied from 10 to 50, and the maximum number of training cycles is 6000. The width parameter of the radial function is the most important for the OLS RBF classifier; it was varied between 1 and 4 in this paper, and the maximum number of RBFs is 100. In the SVM classifier experiment, the kernel function is $K(u,v) = [(u \cdot v) + 1]^d$, where d is the order of the kernel; since a larger d gives the SVM classifier higher performance, we adjust d from 1 to 5 (interval 1). For the ESIC-based RBFNN classifier, the learning rate used for updating the output weights and prototypes of the cosine RBFNNs was $\eta = 0.01$, the swarm similarity coefficient β was varied from 0.85 to 0.20 (interval 0.05), and the other parameters are set as follows: ant-number = 15, r = 12, α = 10, n = 50–1000.

The results in Table 1 show that our ESIC-based RBFNN maintains good performance across all 5 testing datasets, with an average DR greater than 98.4% (BP: 92.1%, SVM: 95.1%, OLS RBF: 82.3%).

Fig. 4 Number of attacks in the data sets

Table 1  Performance comparison across 5 different testing datasets of KDD'99

Dataset   ESIC-based RBFNN    BP            SVM           OLS RBF
          DR      FPR         DR     FPR    DR     FPR    DR     FPR
TDS1      98.3    0.14        97.6   0.27   98.1   0.19   87.6   0.31
TDS2      97.7    0.21        96.4   0.33   96.4   0.26   80.5   0.36
TDS3      98.7    0.17        86.5   0.36   89.1   0.29   86.5   0.38
TDS4      99.2    0.09        82.1   0.41   93.6   0.17   77.8   0.42
TDS5      98.4    0.19        97.8   0.26   98.3   0.24   79.3   0.59



The average FPR is lower than 0.16% (BP: 0.33%, SVM: 0.23%, OLS RBF: 0.41%). We can also see that our ESIC-based RBFNN classifier performs more consistently across the datasets and is therefore more adaptable to the different datasets of KDD'99 than the BP, SVM and OLS RBF models.

4.2 Experiments on UCI

Three testing datasets ('ecoli', 'glass' and 'wine') are selected from UCI to perform the algorithms. Table 2 shows the properties of 'ecoli', 'glass' and 'wine'. Parameter settings are as in the experiments on KDD'99.

The results in Table 3 show that our ESIC-based RBFNN keeps good performance on the 3 different testing datasets of UCI: the average classification accuracy is higher than 95.1% (BP: 84.6%, SVM: 87.7%, OLS RBF: 80.7%). It is also shown that the ESIC-based RBFNN classifier is more adaptable to the different datasets of UCI than the BP, SVM and OLS RBF models.

5 Application in deep Web sources classification

Searching on the Internet today can be compared to dragging a net across the surface of the ocean. While a great deal may be caught in the net, there is still a wealth of information that is deep and therefore missed. The reason behind this is simple: most of the Web's information is buried far down on dynamic sites, and standard search engines never find it. Information held in deep Web sources can only be accessed through their query interfaces [20]. A study [21] in April 2004 estimated 450000 such Web data-sources. With a myriad of data-sources on the Web, users may be unable to find the sources satisfying their information requirements. For instance, suppose we want to buy a CD on the Internet. There are many deep Web sources providing such services, but we may not be aware of their existence, and may not know which sources can satisfy our demands regarding brand, price, discount, payment, delivery cost, etc. It is therefore of great importance to build a system to integrate the myriad deep Web sources on the Internet.

Such integration faces great challenges [22]. First, the scale of deep Web pages is between 400 and 500 times larger than that of static pages, and they are still proliferating rapidly. Second, the query conditions are variable, and there are no rules for building unified query interfaces. Facing this challenge, a compromise strategy is inevitable: using domain-based integration [23]. Query interfaces of deep Web sources in the same domain can have much in common, so the classification of deep Web sources is very important in the integration phase. Furthermore, all deep Web sources are inside autonomous systems. Finally, classification also provides a way to organize the myriad deep Web sources, similar to the category directory provided by Yahoo.

We have conducted experiments to test the performance of our method on datasets (Airfares.xml, Movies.xml, MusicRecords.xml, Books.xml) [24], which relate to four different domains of deep Web sources: Airfares, Movies, Music and Books; 260 different query interface schemas are involved. In this paper, we organized the types of attributes following the method described in Ref. [25]. We select all the distinct attributes from the training samples. According to the deep Web model [23], we filter out concepts with probability $P(f) \le 5\%$ when selecting features:

$$P(f) = \frac{N_f}{N}, \qquad (12)$$

where f is a feature, $N_f$ is the number of deep Web sources in the training samples in which f appears, and N is the total number of deep Web sources in the training samples.
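Read this way (our interpretation: attributes appearing in at most 5% of the training sources are discarded), the filter is a short routine; the function and argument names below are hypothetical.

```python
from collections import Counter

def select_features(source_attrs, threshold=0.05):
    """Keep attributes f with P(f) = N_f / N above the threshold, reading
    Eq. (12) as the fraction of training sources in which f appears.
    source_attrs: one iterable of attribute names per deep Web source."""
    n = len(source_attrs)
    counts = Counter(f for attrs in source_attrs for f in set(attrs))
    return {f for f, nf in counts.items() if nf / n > threshold}
```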

Several different kinds of classification methods are compared with our ESIC-based RBFNN classifier on these datasets.

Table 2  Properties of 'ecoli', 'glass' and 'wine'

Dataset   Number of samples   Number of features   Number of classes
ecoli     336                 8                    8
glass     214                 10                   7
wine      178                 13                   3

Table 3  Classification accuracy comparison on 3 different testing datasets of UCI (%)

Dataset   ESIC-based RBFNN   BP     SVM    OLS RBF
ecoli     93.7               81.5   83.3   75.8
glass     94.1               79.8   82.6   77.9
wine      97.5               92.5   97.2   88.3



In the experiment, we compare our approach with the traditional OLS RBF classifier, the SVM classifier and the BP neural network classifier.

To evaluate the algorithms we are interested in 3 major indicators of performance: Precision, Recall and F, where F is a function of Precision and Recall.

Suppose there are n deep Web sources dw1, dw2, …, dwn with domain labels l1, l2, …, lm. Let TP denote the number of deep Web sources that are correctly classified into li, FN the number of deep Web sources that are incorrectly excluded from li, and FP the number of deep Web sources that belong to lj but are incorrectly classified into li. Then

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad (13)$$

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad (14)$$

$$F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \qquad (15)$$
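For reference, Eqs. (13)-(15) transcribe directly into code; the example counts in the comment are invented for illustration.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, Recall and F of Eqs. (13)-(15) for one domain label l_i."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# e.g., 90 correct, 6 false positives, 4 misses:
# precision_recall_f(90, 6, 4) -> approx (0.9375, 0.9574, 0.9474)
```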

Parameter settings are the same as in the experiments on KDD'99 and UCI. The experimental results for the ESIC-based RBFNN classifier are reported in Fig. 5. Results for BP, SVM and OLS RBF are reported in Figs. 6, 7 and 8, respectively.

Fig. 5 Experimental results for ESIC-based RBFNN classifier

Fig. 6  Experimental results for BP classifier

Fig. 7  Experimental results for SVM classifier

Fig. 8  Experimental results for OLS RBF classifier



The results in Fig. 5 show that the variation of the swarm similarity coefficient influences the number of the original RBFNNs, but the ESIC-based RBFNN classifier keeps good performance: the average Precision is higher than 92.13%, the average Recall is higher than 95.4% and the average F is higher than 93.73%.

According to the testing results, the performance of the ESIC-based RBFNN classifier improves markedly on the traditional RBF network classifier and BP. Its average Precision, Recall and F are also higher than those of BP (82.09%, 89.71% and 85.73%), SVM (91.56%, 94.61% and 93.06%) and OLS RBF (76.96%, 82.28% and 79.52%).

The average performance of the ESIC-based RBFNN classifier and the other classifiers is presented in Fig. 9.

Fig. 9  Average performance comparison

We can see from Fig. 9 that there is little difference between the performance of the ESIC-based RBFNN classifier and SVM. However, when the environment changes because of an external perturbation, the ants respond appropriately to that perturbation as if it were a modification of the environment caused by the ants' own activities. According to our testing experiments on the UCI, KDD'99 and deep Web source datasets, the ESIC-based RBFNN classifier is more adaptable to different datasets than the SVM model.

6 Conclusions

This paper proposes an enhanced swarm intelligence clustering-based RBF neural network classifier, which contains two main stages: i) employing ESIC to find the candidate hidden neurons; and ii) employing a preserving criterion to remove redundant neurons and adjusting the weights between hidden neurons and output units. Experimental results indicate that our ESIC-based RBFNN classifier has more desirable classification ability than BP, SVM and OLS RBF on the UCI, KDD'99 and deep Web source datasets.

Acknowledgements  This work was supported by the Natural Science Foundation Project of CQ CSTC (Grant No. 2008BB2183), the China Postdoctoral Science Foundation (No. 20080440699), the National Social Sciences Fund Special Projects in Educational Science (No. ACA07004-08) and Key Projects in the National Science & Technology Pillar Program (No. 2008BAH37B04).

References

1. Chen S, Cowan C N, Grant P M. Orthogonal least squares learning algorithm for radial basis function networks. IEEE Transactions on Neural Networks, 1991, 2(2): 302–309

2. Mao K Z, Huang G B. Neuron selection for RBF neural network classifier based on data structure preserving criterion. IEEE Transactions on Neural Networks, 2005, 16(6): 1531–1540

3. Huang G B, Saratchandran P, Sundararajan N. A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Transactions on Neural Networks, 2005, 16(1): 57–67

4. Lee S J, Hou C L. An ART-based construction of RBF networks. IEEE Transactions on Neural Networks, 2002, 13(6): 1308–1321

5. Lee H M, Chen C M, Lu Y F. A self-organizing HCMAC neural-network classifier. IEEE Transactions on Neural Networks, 2003, 14(1): 15–27

6. Gonzalez J, Rojas I, Ortega J, Pomares H, Fernandez F J, Diaz A F. Multiobjective evolutionary optimization of the size, shape, and position parameters of radial basis function networks for function approximation. IEEE Transactions on Neural Networks, 2003, 14(6): 1478–1495

7. Karayiannis N B. Reformulated radial basis neural networks trained by gradient descent. IEEE Transactions on Neural Networks, 1999, 10(3): 657–671

8. Randolph-Gips M M, Karayiannis N B. Cosine radial basis function neural networks. In: Proceedings of the International Joint Conference on Neural Networks. New York: IEEE Press, 2003, 96–101

9. Deneubourg J L, Goss S, Franks N. The dynamics of collective sorting: robot-like ants and ant-like robots. In: Proceedings of the 1st International Conference on Simulation of Adaptive Behavior: From Animals to Animats. Cambridge: MIT Press, 1991, 356–365

10. Lumer E, Faieta B. Diversity and adaptation in populations of clustering ants. In: Proceedings of the 3rd International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3. Cambridge: MIT Press, 1994, 499–508

11. Bonabeau E, Dorigo M, Theraulaz G. Swarm Intelligence: From Natural to Artificial Systems. New York: Oxford University Press, 1999

12. Wu B, Shi Z Z. A clustering algorithm based on swarm intelligence. In: Proceedings of the International Conferences on Info-tech & Info-net. New York: IEEE Press, 2001, 58–66

13. Ramos V, Pina P, Muge F. Self-organized data and image retrieval as a consequence of inter-dynamic synergistic relationships in artificial ant colonies. In: Frontiers in Artificial Intelligence and Applications, Soft Computing Systems: Design, Management and Applications. Amsterdam: IOS Press, 2002, 500–509

14. Ramos V, Merelo J J. Self-organized stigmergic document maps: environment as a mechanism for context learning. In: Proceedings of the 1st Spanish Conference on Evolutionary and Bio-Inspired Algorithms. Merida: Merida University Press, 2002, 284–293

15. Han Y F, Shi P F. An improved ant colony algorithm for fuzzy clustering in image segmentation. Neurocomputing, 2007, 70(4–6): 665–671

16. Runkler T A. Ant colony optimization of clustering models. International Journal of Intelligent Systems, 2005, 20(12): 1233–1251

17. Kuo R J, Wang H S, Hu T L, Chou S H. Application of ant k-means on clustering analysis. Computers & Mathematics with Applications, 2005, 50(10–12): 1709–1724

18. Chen A P, Chen C C. A new efficient approach for data clustering in electronic library using ant colony clustering algorithm. Electronic Library, 2006, 24(4): 548–559

19. Feng Y, Zhong J, Xiong Z Y, Ye C X, Wu K G. Network anomaly detection based on DSOM and ACO clustering. In: Advances in Neural Networks – ISNN 2007. Lecture Notes in Computer Science, 2007, 4492(Part 2): 947–955

20. He B, Patel M, Zhang Z, Chang K C C. Accessing the deep Web. Communications of the ACM, 2007, 50(5): 94–101

21. Ghanem T M, Aref W G. Databases deepen the Web. Computer, 2004, 37(1): 116–117

22. Liu W, Meng X F, Meng W Y. A survey of deep Web data integration. Chinese Journal of Computers, 2007, 30(9): 1475–1489 (in Chinese)

23. Wang Y, Zuo W L, Peng T, He F L. Domain-specific deep Web sources discovery. In: Proceedings of the 4th International Conference on Natural Computation. New York: IEEE Press, 2008, 202–206

24. He B. UIUC Web integration repository. Computer Science Department, University of Illinois at Urbana-Champaign. http://metaquerier.cs.uiuc.edu/repository, 2003

25. He B, Chang K C C. Automatic complex schema matching across Web query interfaces: a correlation mining approach. ACM Transactions on Database Systems, 2006, 31(1): 346–395
