Fuzzy Signature Neural Network
Kun He
1st June 2012
A report submitted for the degree of Master of Computing of
Australian National University
Supervisor: Prof. Tom Gedeon
Acknowledgements
Thanks to my supervisor Tom Gedeon and to Dingyun Zhu for their recommendations and kind
support on this project. Thanks also to the course coordinator Weifa Liang for his guidance on
report writing, and to Wei Fan for his suggestions on fuzzy signature neural networks. Finally,
thanks to my family and my friends for their encouragement.
Abstract
In this report we first introduce the background of neural networks and fuzzy signatures,
and then focus on the fuzzy signature neural network. Fuzzy signatures are used in fuzzy rule
based systems to reduce the rule explosion issue. The neural networks we consider use Radial
Basis Functions as the activation function; these are real-valued functions whose value depends
only on the distance from the centroid of the function.
We modify the previous approach to improve the method of choosing aggregation functions,
and the method of creating the structure of the fuzzy signature neural network. The new
method reduces the risk that the results depend heavily on the manual choice of aggregation
function and the manual choice of fuzzy signature structure.
Finally, this report presents an experimental evaluation of the fuzzy signature neural network
through three experiments. The first and second experiments compare fuzzy signature based
RBF neural networks, and show that our approach outperforms the previous fuzzy signature
based RBF neural network when the datasets have significant numbers of missing values. The
third experiment compares our work with other neural networks. The results demonstrate that
our approach is viable and worth further investigation.
List of Abbreviations
FSNN Fuzzy Signature Neural Network
NN Neural Networks
ANN Artificial Neural Networks
RBF Radial Basis Function
Cascor Cascade Correlation Neural Network
sNN Symmetric Nearest Neighbor
Contents
Acknowledgements .................................................................................................................................. 1
Abstract ................................................................................................................................................... 2
List of Abbreviations................................................................................................................................. 3
List of Figures .......................................................................................................................................... 6
List of Tables ............................................................................................................................................ 7
1. Introduction...................................................................................................................................... 8
1.1 Motivation .......................................................................................................................... 8
1.2 Objectives ........................................................................................................................... 8
1.3 Contribution ....................................................................................................................... 8
1.4 Preview .............................................................................................................................. 9
2. Background and relevant knowledge .................................................................................................. 9
2.1 Neural Networks ................................................................................................................. 9
2.2 Radial Basis Function......................................................................................................... 11
2.3 Fuzzy Rules Based system .................................................................................................. 12
2.4 Fuzzy Signature ................................................................................................................. 13
3. Fuzzy Signature Neural Network ...................................................................................................... 14
3.1 Basic structure and process of fuzzy signature neural network ............................................ 14
3.2 Improvement of fuzzy signature neural network ................................................................. 15
4. Design and Implementation of Fuzzy signature neural network ......................................................... 15
4.1 Description ....................................................................................................................... 16
4.2 Construction of fuzzy signature neural network .................................................................. 17
4.2.1 Damaged data ............................................................................................................ 17
4.2.2 Clustering .................................................................................................................. 18
4.2.3 Create fuzzy signature ................................................................................................ 19
4.2.3.1 Structure of fuzzy signature .................................................................................... 19
4.2.3.2 Obtain fuzzy signature information .......................................................................... 20
4.2.3.3 Aggregation .......................................................................................................... 21
4.2.4 Create neural network ................................................................................................ 22
4.2.5 Training the Fuzzy Signature Neural Network .............................................................. 22
4.3 Testing .............................................................................................................................. 24
4.3.1 Testing network ......................................................................................................... 24
4.3.2 Extracting network information .................................................................................. 25
5. Experiments and Evaluation............................................................................................................. 25
5.1 Description of the dataset .................................................................................................. 26
5.2 Hardware and software environment information ............................................................... 27
5.3 Experiment 1: Datasets experiments with no missing data................................................... 28
5.3.1 Purpose of the experiment .......................................................................................... 28
5.3.2 Description of the experiment ..................................................................................... 28
5.3.3 Experiment Process and Discussion of Results ............................................................. 29
5.4 Experiment 2: Datasets experiments with missing data ....................................................... 30
5.4.1 Purpose of the experiment .......................................................................................... 30
5.4.2 Description of the experiment ..................................................................................... 30
5.4.3 Experiment Process and Discussion of Results ............................................................. 31
5.5 Experiment 3: Benchmarks comparison between Fuzzy signature neural network and other
approaches ..................................................................................................................................... 32
5.5.1 Purpose of the experiment .......................................................................................... 32
5.5.2 Description of the experiment ..................................................................................... 32
5.5.3 Experiment Process and Discussion of Results ............................................................. 33
6. Conclusion and Future Works........................................................................................................... 33
6.1 Conclusion ........................................................................................................................ 33
6.2 Future Work...................................................................................................................... 33
Reference ............................................................................................................................................... 35
Appendix A ............................................................................................................................................ 37
List of Figures
Figure 1: Example of a basic NN .......................................................................................................... 10
Figure 2: Example of a neural network ................................................................................................. 11
Figure 3: Example of structure of fuzzy signature of SARS patient ............................................................ 13
Figure 4: Example of aggregation of SARS patient.................................................................................. 14
Figure 5: Example of Fuzzy signature based radial basis function neural network ...................... 15
Figure 6: Construction of fuzzy signature neural network & testing suit ................................................... 16
Figure 7: Example of Agglomerative hierarchical clustering..................................................................... 18
Figure 8: Examples of structure of fuzzy signature for SARS .................................................................... 20
Figure 9: The example of structures of fuzzy signature according to Figure 8 ............................................ 20
Figure 10: Example of aggregation function selection ............................................................................ 21
Figure 11: Manhattan distance ........................................................................................................... 23
List of Tables
Table 1: Information of files................................................................................................................ 25
Table 2: Datasets details .................................................................................................................... 26
Table 3: the information of Hardware and Software environment .................................................. 28
Table 4: the detail of number of cluster selected ................................................................................... 29
Table 5: the benchmark for our approach............................................................................................. 29
Table 6: the benchmark for Fan's approach .......................................................................................... 30
Table 7: the benchmark for our approach with missing value .................................................................. 31
Table 8: the benchmark for Fan's approach with missing value ............................................................... 31
Table 9: the information of our approach that will be used..................................................................... 32
Table 10: the results of 3 different neural networks ............................................................................... 33
1. Introduction
1.1 Motivation
Human decision making is a comprehensible hierarchical process, in which cognitive processes
lead to the selection of a set of actions among many alternatives, so bio-inspired techniques are
used as a foundation for modelling human decision making. The design of intelligent systems,
in Artificial Intelligence or its descendant Computational Intelligence, is a problem of identifying
approximate models to describe a real world scenario [1]. Nowadays, AI is applied in a wide
variety of fields, such as business, mathematics, industry, medical science, and so on. However,
if those systems involve very complex structured high dimensional data, sometimes with
interdependent features and missing components, conventional AI systems are not adequate
[1]. Therefore, how to handle such datasets correctly and effectively, and how to design
structures that reflect the real world, have become key issues in decision-making under
uncertainty.
An efficiency issue called rule explosion affects fuzzy rule based systems, a kind of conventional
AI system [1]: the number of rules grows exponentially with the number of input dimensions.
To address this, the fuzzy signature method introduced by Gedeon et al. [2] can be used in
fuzzy rule based systems while mitigating the rule explosion issue.
Neural networks have very strong nonlinear fitting ability, can represent arbitrarily complex
nonlinear mappings, and also have strong robustness, memory and learning ability [3].
However, the neural network approach does not produce easily comprehensible results,
especially for networks trained on very complex structured high dimensional data [4]. The
fuzzy signature based neural network approach has been used to combine the benefits of
neural networks and fuzzy signatures. In our project, we choose the Radial Basis Function as
the activation function, and each fuzzy signature is treated as a hidden neuron of the RBF
network. The fuzzy signature neural network thus gains the advantages of neural networks
while, at the same time, solving the rule explosion efficiency issue of fuzzy rule based systems.
1.2 Objectives
The aim of this project is to implement an improved fuzzy signature neural network based on
Fan's code [3], to provide a detailed investigation, to evaluate this approach on test data, and
to compare it with the previous approach.
1.3 Contribution
The contribution of this project involves the three following areas. Firstly, the FSNN code
written by Fan is simplified and modified in order to implement and improve the FSNN.
Secondly, sequence training code is designed for the FSNN and used to compare it with other
neural networks, including the previous FSNN. Finally, the feasibility of such a neural network
and future work are discussed based on the results of the experiments.
1.4 Preview
Chapter 2 gives an overview of the relevant techniques and basic concepts, including neural
networks, radial basis function neural networks, fuzzy rule based systems and fuzzy signatures,
which are helpful for understanding the FSNN and its evaluation. Following that, Chapter 3
introduces and focuses on the fuzzy signature neural network. Chapter 4 demonstrates the
techniques and methodologies of basic fuzzy signature neural networks, describes the test
suite for the implementation, and then presents the design, implementation and testing.
Chapter 5 evaluates this approach based on the implementation in Chapter 4; we used three
different experiments to evaluate our approach in various aspects. Finally, Chapter 6 concludes
the report and indicates the weaknesses as well as suggestions for future work.
2. Background and relevant knowledge
2.1 Neural Networks
An artificial neural network, also simply called a neural network, is a mathematical or
computational model inspired by the structure and/or functional aspects of biological neural
networks. It consists of an interconnected group of artificial neurons and processes information
using a connectionist approach to computation [5]. Modern neural networks are non-linear
statistical data modelling tools, which means they can solve some problems that do not have a
known statistical model [3]. They are usually used to model complex relationships between
inputs and outputs or to find patterns in data. The following figure shows the basic principle
of NNs.
[Figure: inputs X1, X2, …, XN, weighted by W1, W2, …, WN, feed a summation ∑WX and an activation function that produces the output Y.]
Figure 1: Example of a basic NN
As Figure 1 shows, each X represents an input, either from the original data or from the output
of other neurons; the strengths of the connections between the inputs and the neuron are
called weights. Finally, the activation function converts the neuron's weighted input into its
output activation. The activation function can be a non-linear function such as the Gaussian
or sigmoid function. The activation function formula is shown below [5].
y = f( Σ_{i=1}^{n} w_i x_i )

where y is the output of the neuron
n is the number of inputs
x_i is the value of input i of the neuron
w_i is the weight on input i of the neuron
f is the activation function, e.g. the sigmoid f(x) = 1 / (1 + e^{-x})

Equation 1: basic equation of a neural network
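The computation in Equation 1 can be sketched as follows (the report's implementation is in Matlab; this illustrative Python snippet assumes a sigmoid activation):

```python
import math

def neuron_output(inputs, weights):
    """Compute y = f(sum(w_i * x_i)) with a sigmoid activation f."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-s))

# A neuron whose weighted inputs cancel has net input 0, so it outputs sigmoid(0):
print(neuron_output([1.0, -1.0], [0.5, 0.5]))  # 0.5
```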
However, a single layer neural network cannot handle complex problems because its structure
is too simple, so we need to create multi-layer neural networks. Such a network consists of
many neurons and is therefore able to solve more complex problems. It contains three different
kinds of neurons, namely input, hidden and output neurons, located in layers called the input
layer, hidden layer and output layer respectively [5]. Figure 2 shows an example neural
network.
[Figure: an input layer x1, x2, …, xn connects through a weight matrix to a layer of hidden neurons, which connects through a second weight matrix to an output layer y1, y2.]
Figure 2: Example of a neural network
In order to create a neural network model, we need two procedures: training and testing. The
first necessary procedure is training the neural network. Through training, the neural network
finds suitable values for the weight matrix that bring the actual outputs closer to the desired
outputs. During this process, the neurons learn the weights iteratively from a number of
training examples, and the process finishes when the network has stabilized. The second
necessary procedure is testing the neural network: using the testing data and the trained
weight matrix, we compare the actual outputs with the desired outputs. From the resulting
testing accuracy, we can judge whether our neural network model is successful.
2.2 Radial Basis Function
A radial basis function (RBF) is a real-valued function whose value depends only on the
distance between a point x and some other point c, the centre [6]. A radial basis function is
written as:

∅(x, c) = ∅(‖x − c‖)

Any function ∅ satisfying the property below is also called a radial function:

∅(x) = ∅(‖x‖)

Different distance measures can be used in the radial basis function, such as the Euclidean
distance, the Lukaszyk-Karmowski metric and the taxicab distance. In our project, we choose
the Euclidean distance as the distance measure for the RBF. The definition of Euclidean
distance is below:
d(q, p) = √( Σ_{i=1}^{n} (q_i − p_i)² )

where p = (p1, p2, …, pn) and q = (q1, q2, …, qn) are two points in Euclidean n-space.

Equation 1.1: Euclidean distance
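Equation 1.1, together with a Gaussian RBF (one common choice of ∅; the width parameter sigma is an illustrative assumption), can be sketched as:

```python
import math

def euclidean(q, p):
    """Equation 1.1: Euclidean distance between points q and p in n-space."""
    return math.sqrt(sum((qi - pi) ** 2 for qi, pi in zip(q, p)))

def gaussian_rbf(x, c, sigma=1.0):
    """A common radial basis function: phi(x, c) = exp(-||x - c||^2 / (2*sigma^2)).
    Its value depends only on the distance from x to the centre c."""
    r = euclidean(x, c)
    return math.exp(-(r ** 2) / (2 * sigma ** 2))

print(euclidean([0, 0], [3, 4]))            # 5.0
print(gaussian_rbf([0.0, 0.0], [0.0, 0.0])) # 1.0 at the centre
```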
RBF neural networks have two obvious advantages: firstly, they train faster than other
multi-layer neural networks; secondly, their hidden layer is claimed to be easier to interpret
than the hidden layer of an MLP.
2.3 Fuzzy Rule Based systems
Fuzzy rule based systems are linguistic IF-THEN constructions of the general form "IF A THEN
B", where A and B are (collections of) propositions containing linguistic variables; A is called
the premise and B the consequence of the rule. In effect, the use of linguistic variables and
fuzzy IF-THEN rules exploits the tolerance for imprecision and uncertainty. In this respect,
fuzzy logic mimics the crucial ability of the human mind to summarize data and focus on
decision-relevant information.
Fuzzy rule based systems are very successful and popular in control system applications. They
outperform the conventional method of modelling non-linear control systems, which is based
on solving high order partial differential equations, through the simplicity of their inference.
However, a conventional fuzzy rule based system is also called a dense fuzzy rule based system,
because it suffers from a serious issue called rule explosion [1]. Rule explosion is the
exponential growth of the number of rules needed with respect to the number of fuzzy sets per
input dimension and the number of inputs. Equation 1.2 below gives the number of rules
required for a system with k input variables and T fuzzy sets per input dimension [7]:

|R| = O(T^k)

Equation 1.2: Rule explosion

As k or T increases, the number of rules rises sharply. There are four possible solutions for
modelling systems with a high number of inputs and/or fuzzy subsets within those inputs:
sparse fuzzy rule based systems, hierarchical fuzzy rule based systems, sparse hierarchical
fuzzy rule based systems, and fuzzy signatures. In our approach, we choose fuzzy signatures to
solve this issue.
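A quick illustration of Equation 1.2: for T = 3 fuzzy sets per input, the rule count explodes as the number of inputs k grows. A sketch in Python (the report's own code is Matlab):

```python
def rule_count(T, k):
    """Equation 1.2: a dense fuzzy rule base needs on the order of T**k
    rules for k input variables with T fuzzy sets per input dimension."""
    return T ** k

# Growth is exponential in the number of inputs k:
for k in (2, 5, 10):
    print(k, rule_count(3, k))  # 9, 243, 59049 for T = 3
```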
2.4 Fuzzy Signature
Computational Intelligence research focuses mainly on identifying approximate models for
decision support or classification where analytically unknown systems exist, especially systems
with very complex structured and/or high dimensional data, possibly with interdependent
features [3]. Traditional fuzzy logic approaches such as fuzzy rule based systems have become
popular in Computational Intelligence research because of their ability to assign linguistic
labels and to model uncertainty in many decision making and classification problems. However,
conventional fuzzy rule based systems suffer from high computational time complexity, so in
most cases applications of fuzzy rule based systems remain limited to problems with few input
dimensions and relatively simple structured data, even if the system being modelled exhibits
complex behaviour [1]. Aggregation of information in rule based fuzzy systems, including
sparse hierarchical fuzzy rule based systems, is generally done by min, max and average. This
is a restriction on conventional fuzzy systems, as it neglects the other membership values of the
same input.
A Fuzzy Signature is a Vector Valued Fuzzy Set (VVFS), where each vector component is
another VVFS (branch) or an atomic value (leaf). It can be described as below [7]:
A : X → [a_i]_{i=1}^{k}

where a_i = [a_ij]_{j=1}^{k_i} if a_i is a branch, and a_i ∈ [0, 1] if a_i is a leaf.

Equation 1.3: VVFS
If every fuzzy signature has the same structure and aggregation method, then a fuzzy signature
can be described as a vector; normally we use the min, max and average methods as
aggregation functions [2]. Figure 3 below shows an example fuzzy signature for a SARS patient,
and Figure 4 shows an aggregation based on max, then min, then average. These aggregation
functions transform the membership values of the fuzzy signature into a single fuzzy signature
and eventually a single value.
Figure 3: Example of structure of fuzzy signature of SARS patient
Figure 4: Example of aggregation of SARS patient
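The aggregation illustrated in Figures 3 and 4 can be sketched as a bottom-up reduction of a nested structure. The signature values and per-level functions below are hypothetical, chosen only to show the mechanism (the report's implementation is in Matlab):

```python
def aggregate(signature, agg_funcs, depth=0):
    """Reduce a nested-list fuzzy signature to a single value by applying
    one aggregation function per level, bottom-up."""
    if not isinstance(signature, list):
        return signature  # a leaf membership value in [0, 1]
    children = [aggregate(s, agg_funcs, depth + 1) for s in signature]
    return agg_funcs[depth](children)

mean = lambda values: sum(values) / len(values)

# Hypothetical two-branch signature: aggregate each branch with max,
# then combine the branch results with the average at the root:
sig = [[0.25, 0.75], [0.5, 0.25]]
print(aggregate(sig, [mean, max]))  # 0.625
```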
Fuzzy signatures bring three important advantages to fuzzy rule based systems. Firstly, they
reduce the high computation cost. Secondly, fuzzy signatures can handle noisy and missing
values by using specific aggregation functions. Thirdly, new information or features can be
added without redesigning the structure of the data representation.
However, in actual use, defining the structure and choosing the aggregation function rely on
professional knowledge, and in the presence of uncertain data these are difficult to define and
choose. To solve this issue, in our project we try to automate, and hence improve, the definition
of the structure and the selection of the aggregation function, to make fuzzy signatures more
general. In the next chapter, we discuss the improvements to the definition of the structure and
the selection of aggregation functions in more detail.
3. Fuzzy Signature Neural Network
3.1 Basic structure and process of fuzzy signature neural network
The fuzzy signature neural network is a type of neural network; in our project, the radial basis
function is treated as the activation function in the neurons [8]. Firstly, a number of input data,
either from the original data or from the outputs of other neurons, serve as inputs, and the
Euclidean distance is calculated from the evaluation point (each input is a point in a
multi-dimensional input space) to the sample point in each neuron. Additionally, each hidden
neuron (fuzzy signature neuron) in the neural network has a specific fuzzy signature associated
with it, and the output of the hidden neuron is the similarity between the input vector and the
fuzzy signature based on the
specific aggregation function. Thirdly, we calculate the output through the activation function,
and then use the equation below to modify the weights.

Δw_ij = α (t_j − y_j) x_i

where α is a constant with a small value, called the learning rate
t_j is the value of the desired output at dimension j
y_j is the value of the actual output at dimension j
x_i is the value of input i to the output layer

Equation 2.1: Weight update
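Equation 2.1 can be sketched as follows; the matrix shapes and learning rate are illustrative assumptions (the report's implementation is in Matlab):

```python
def update_weights(weights, hidden_outputs, desired, actual, alpha=0.1):
    """Equation 2.1 sketch: w_ij += alpha * (t_j - y_j) * x_i for the
    weight matrix between hidden and output neurons (alpha = learning rate)."""
    for i, x_i in enumerate(hidden_outputs):
        for j, (t_j, y_j) in enumerate(zip(desired, actual)):
            weights[i][j] += alpha * (t_j - y_j) * x_i
    return weights

w = [[0.0], [0.0]]  # 2 hidden neurons, 1 output neuron
w = update_weights(w, hidden_outputs=[1.0, 0.5], desired=[1.0], actual=[0.0])
print(w)  # [[0.1], [0.05]]
```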
The strengths between the input neurons and the hidden neurons are constants, which means
they do not change with training. The whole training task is performed only by the weight
matrix between the hidden neurons (fuzzy signature neurons) and the output neurons.
Therefore, the time to train these neural networks should be reduced thanks to the smaller
weight matrix and the reduced number of layers. Figure 5 shows the general architecture of the
fuzzy signature based radial basis function neural network.
[Figure: input neurons connect with fixed strengths to fuzzy signature (hidden) neurons, which connect through trainable weights to the output neurons.]
Figure 5: Example of Fuzzy signature based radial basis function neural network
3.2 Improvement of fuzzy signature neural network
In our project, we work on a network similar to the one presented by Fan; however, Fan's
application has some limitations. Firstly, the users must choose the aggregation function
themselves, so the results depend heavily on their choice. Secondly, each fuzzy signature
structure must be set by the users; in actual use, this choice must be based on professional or
practical knowledge, so selecting the structure of a fuzzy signature is quite difficult for common
users. The third weakness is that, once chosen, the same fuzzy signature structure and
aggregation function are used in constructing all fuzzy signatures in the network, so some
features in the data are likely to be missed if there are significant differences in the important
data in different clusters, as is likely for highly complex data with significantly different
sub-structures. Therefore, this implementation significantly extends the previous work,
re-designing and constructing the fuzzy signature based RBF network in a different and more
general way.
4. Design and Implementation of Fuzzy signature neural
network
4.1 Description
This chapter describes the techniques and methodology used to implement fuzzy signature
neural networks. The programming language used to implement the FSNN is Matlab. The
implementation of a fuzzy signature neural network can be divided into five modules, and the
testing into four modules. The fundamental architecture of the implementation and testing is
shown in Figure 6. The construction of the fuzzy signature neural network consists of damaging
the data with missing values, clustering, obtaining the fuzzy signature information, and creating
and training the network. The testing suite contains the remaining modules: extracting the
network information, testing the network, and collating the results. The construction of the
fuzzy signature neural network is embedded in the test suite, i.e. it forms part of the test suite.
[Figure: the construction pipeline (damaged input → clustering input → obtain fuzzy signature → create neural network → train neural network) is embedded in the testing suite (extracting network information → clustering input from testing data → testing neural network → collect the results).]
Figure 6: Construction of fuzzy signature neural network & testing suite
4.2 Construction of fuzzy signature neural network
This section describes the procedure to construct fuzzy signature neural networks in detail. The whole network procedure is divided into five different modules and will be introduced as
follows:
4.2.1 Dealing with data
In the real world, recorded data is not always perfect; missing data commonly occurs under
uncertain conditions. We add this module in order to simulate that situation. The damaged
data module is optional: we use it only when we need to simulate data with missing values.
The pseudocode below demonstrates the algorithm and process.
The damaged function rearranges the data so that it contains some missing values.
Input: data (the data we need to handle), rate (proportion of missing values in the whole data)
Output: damageddata

damaged(data, rate)
    inputCol ← number of columns of data
    inputRow ← number of rows of data
    total ← round(rate * inputCol * inputRow)      // total number of missing values in the data
    col ← total random integers in [1, inputCol]   // column positions
    row ← total random integers in [1, inputRow]   // row positions
    for i ← 1 to total
        data(row(i), col(i)) ← NaN                 // set those positions to a non-number
    end for
    damageddata ← data
end
In the above pseudocode, we first set the rate of missing values with which to damage the data.
We then calculate the total number of missing values, randomly choose that many positions in
the matrix, and set the entries at those positions to non-numbers. The damaged data set is then
ready.
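A possible Python counterpart to the pseudocode above (the report's code is Matlab; unlike the pseudocode, this sketch samples distinct positions, so exactly `total` entries are damaged rather than possibly fewer):

```python
import math
import random

def damaged(data, rate, seed=None):
    """Set round(rate * rows * cols) randomly chosen entries of a
    row-major list-of-lists matrix to NaN, simulating missing values."""
    rng = random.Random(seed)
    rows, cols = len(data), len(data[0])
    total = round(rate * rows * cols)
    # sample distinct flattened positions so exactly `total` entries are hit
    for pos in rng.sample(range(rows * cols), total):
        data[pos // cols][pos % cols] = float("nan")
    return data

d = damaged([[1.0, 2.0], [3.0, 4.0]], rate=0.5, seed=0)
print(sum(math.isnan(v) for row in d for v in row))  # 2
```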
4.2.2 Clustering
Clustering is the pre-processing method used to obtain the fuzzy signature neurons in this
implementation. Clustering is a credible unsupervised method; using a clustering technique
means that the users do not need to work out how to construct the fuzzy signatures themselves.
The second reason is that extracting fuzzy signatures manually is time consuming and can be
difficult, especially when the input data set contains large numbers of records.
There are many different clustering methods. In our approach, we use agglomerative
hierarchical clustering, which has some obvious advantages over other methods: first, users do
not need to specify the number of clusters; second, the algorithm is deterministic, which is
useful for research purposes; and third, the outputs are more informative than those of flat
clustering [9].
Agglomerative hierarchical clustering is a bottom-up hierarchical clustering method in which
each cluster can be another cluster's sub-cluster. It begins with each single object in a separate
cluster, then agglomerates similar clusters based on a similarity criterion until all data merge
into one cluster. The figure below shows an example of hierarchical clustering.
Figure 7: Example of Agglomerative hierarchical clustering
In the hierarchical clustering, the similarity criterion is based on the Ward linkage method
since it is efficient. It uses the incremental sum of squares to calculate the distance; normally
distance means the Euclidean distance. The sum of squares measure is defined as in the
formula below:
d(r, s) = \sqrt{\frac{2 n_r n_s}{n_r + n_s}} \, \lVert \bar{x}_r - \bar{x}_s \rVert_2

where \bar{x}_r and \bar{x}_s are the centroids of clusters r and s,
\lVert \bar{x}_r - \bar{x}_s \rVert_2 is the Euclidean distance between \bar{x}_r and \bar{x}_s,
n_r and n_s are the numbers of elements in clusters r and s.
Equation 2.2: Ward linkage distance (incremental sum of squares)
In our application, Matlab provides a package in the Statistics Toolbox for agglomerative hierarchical clustering with the Ward linkage method. The function linkage(inputdata, 'ward', 'euclidean') takes the input data set inputdata and the method name as arguments, then creates a hierarchical cluster tree. In this case, the formatted data is passed through this function together with a string parameter indicating the Ward linkage method. Furthermore, the function cluster(Z, 'maxclust', n) constructs clusters from the hierarchical cluster tree Z and the number of clusters n. The clustering module in this implementation uses both functions to cluster the formatted data set.
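As an illustration of this clustering step, the bottom-up merging with the Ward linkage distance can be sketched in plain Python (a simplified, illustrative version — the actual implementation uses the Matlab Statistics Toolbox, and the stopping rule here is a fixed cluster count):

```python
import math

def ward_distance(c1, c2):
    """Ward linkage: sqrt(2*n1*n2/(n1+n2)) times the Euclidean centroid distance."""
    n1, n2 = len(c1), len(c2)
    cen1 = [sum(p[d] for p in c1) / n1 for d in range(len(c1[0]))]
    cen2 = [sum(p[d] for p in c2) / n2 for d in range(len(c2[0]))]
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(cen1, cen2)))
    return math.sqrt(2.0 * n1 * n2 / (n1 + n2)) * euclid

def agglomerate(points, n_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: ward_distance(clusters[ab[0]], clusters[ab[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge the two closest clusters
        del clusters[j]
    return clusters
```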
4.2.3 Create fuzzy signature
Before creating a fuzzy signature, we need to obtain its structure. In our application, each fuzzy signature has a different structure, which is created automatically. Additionally, the fuzzy signature neurons are based on the selected clusters (as described in Section 4.2.2); we then compute the membership values as described above, and finally our code chooses the best aggregation function to produce the final membership value of each hidden neuron (fuzzy signature neuron). The next three sections explain the application in detail.
4.2.3.1 Structure of fuzzy signature
Each fuzzy signature has a different structure, which has the advantage of reducing the risk of the results being affected by manual construction. In our project, we use a function called CreateS(n, input) to handle this part. Below is the pseudo code description of the algorithm and the process for creating the structure of the fuzzy signature.
The function to create the structure of a fuzzy signature
Input: input (the number of input dimensions), n (the number of fuzzy signatures we need to create)
Output: structure (a matrix which contains the structure of each fuzzy signature)
structure = CreateS(n, input)
1. For i = 1 to n // handle one fuzzy signature at a time
2.     For level = 1 to 10 // our application limits the structure to at most 10 levels
3.         Create a structure at this level, until the number of input dimensions is reached
4.     End loop
5. End loop
6. Return the structure
In our project, we divide this into three parts: each of the first two functions involves a loop, and the last function creates the structure of one fuzzy signature. Once creation is done, we obtain the structure. The matrix below is an example of the structure of fuzzy signatures for SARS, which has 8 input dimensions; here we create 2 fuzzy signatures.
Figure 8: Examples of structure of fuzzy signature for SARS
The first fuzzy signature has two levels: the first level covers the second to the seventh dimensions, and the second level runs from the 1st dimension to the end. The second fuzzy signature has 3 levels: the first covers the 1st to 3rd dimensions; after the aggregation function is applied to the first level, the second level covers the 1st to 6th dimensions, and the third level covers the 1st to 8th dimensions. The two vectors below present the detailed structures of the above two fuzzy signatures.
Figure 9: The example of structures of fuzzy signature according to Figure 8
4.2.3.2 Obtain fuzzy signature information
Each fuzzy signature demonstrates the similarity between input and the centroid of a specific
cluster in a data set. Therefore, a matrix would be extracted after the hierarchical clustering
method is applied on the formatted data set. It is essential to create a neural network that
contains the cluster’s information such as the centroid. The detailed structure information of
this matrix is shown below:
\begin{bmatrix}
c_{11} & \cdots & c_{1n} \\
c_{21} & \cdots & c_{2n} \\
\vdots &        & \vdots \\
c_{j1} & \cdots & c_{jn}
\end{bmatrix}

where j is the number of clusters,
n is the number of dimensions of the input data,
c_{jn} is the coordinate of the centroid point at dimension n in cluster j.
Equation 3.1: Information of the cluster
4.2.3.3 Aggregation
Before we do the aggregation, we need to select the aggregation function for each fuzzy signature. In our project, we have 3 aggregation functions: max, min, and average. In Fan's application, the function must be selected manually, which carries the risk of a poor choice of aggregation function. Therefore, in our application we improve this: following Mendis' work [10], we use a standard deviation comparison to select the aggregation function.
The standard deviation comparison has 4 steps. Firstly, we calculate all possible membership values of the fuzzy signature. Secondly, we find the average of all membership values. Thirdly, we find the standard deviation with respect to the average membership value. Finally, we find the position of the smallest standard deviation, which identifies the aggregation function we select. The figure below shows the process of finding the aggregation function for a 2-level fuzzy signature structure.
Figure 10: Example of aggregation function selection
(The figure shows candidate membership values of a two-level fuzzy signature aggregated by ave, max and min; the overall average is 0.42, the smallest standard deviation is 0.04, and the selected aggregation function is min/average.)
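The four-step selection can be sketched in Python (an illustrative simplification that assumes each candidate function aggregates the same rows of membership values; in the real implementation the candidates are applied across the levels of the fuzzy signature structure):

```python
def select_aggregation(membership_rows):
    """Pick the aggregation function whose aggregated results deviate least from their mean."""
    candidates = {
        "max": max,
        "min": min,
        "ave": lambda vals: sum(vals) / len(vals),
    }
    best_name, best_sd = None, None
    for name, fun in candidates.items():
        results = [fun(row) for row in membership_rows]   # step 1: all membership values
        mean = sum(results) / len(results)                # step 2: average membership value
        sd = (sum((r - mean) ** 2 for r in results) / len(results)) ** 0.5  # step 3
        if best_sd is None or sd < best_sd:               # step 4: smallest standard deviation
            best_name, best_sd = name, sd
    return best_name, best_sd
```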
4.2.4 Create neural network
Before the neural network is created, we need to set some parameters to satisfy initial
conditions. The following parameters need to be specified by the creation process:
1. Number of Fuzzy Signature neurons
2. Number of training epochs
Firstly, the number of fuzzy signature neurons and the number of training epochs are
determined by the user at the beginning of the program. The second parameter is the number
of training times for a dataset. The approach also provides some parameters automatically. The
size of the weight matrix is based on the number of hidden neurons and the number of output
dimensions. Furthermore, the neural network weight matrix values are initially set to small
random values. This provides the training procedure with a good starting stage to work
towards a solution. In this case, a function in the Matlab Statistics Toolbox called random is embedded into the implementation. It takes the number of columns and rows combined with a mean and standard deviation as arguments, then generates a matrix of random values. After all possible parameters have been determined, one data container called handles packs all of them and sends the package to the training module.
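As a rough Python illustration of this initialization step (the report uses Matlab's random-number function; the mean, standard deviation, and seed here are assumed values):

```python
import random

def init_weights(n_hidden, n_output, mean=0.0, sd=0.01, seed=42):
    """Build a weight matrix of small normally distributed random values."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, sd) for _ in range(n_output)]
            for _ in range(n_hidden)]
```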
4.2.5 Training the Fuzzy Signature Neural Network
The training procedure helps us to find suitable values of the weight matrix which can make the
actual outputs more closely model the desired outputs. During this process, neurons learn the
weights iteratively by being given a number of training data. Therefore, providing a suitable
training method is an essential pre-condition for solving a specific task such as a classification
problem. In our approach, we use the training module to handle it.
The training module starts the neural network when training data is input. After receiving the training data, we calculate the Manhattan distance between the centroid of a cluster and the given input at each dimension, as in Figure 11, and the Gaussian function is then applied to those distances. The Gaussian function constrains the results to the range between 0 and 1, which represents the similarity between the given input and the centroid of the cluster. A value of 1 means the given input is the same as or very close to the centroid of the cluster; on the other hand, a value of 0 means the given input is far from the centroid. After the hidden neurons finalize the similarity
computations in all input dimensions, the results are encapsulated by the structure, that is, a
fuzzy signature.
Figure 11: Manhattan distance between the input data point and the centroid of a cluster
Manhattan distance = horizontal distance + vertical distance
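The distance-and-similarity computation can be sketched in Python (illustrative; the Gaussian width sigma is an assumed parameter, not a value given in the report):

```python
import math

def manhattan(x, centroid):
    """Sum of absolute coordinate differences between input and cluster centroid."""
    return sum(abs(a - b) for a, b in zip(x, centroid))

def gaussian_similarity(x, centroid, sigma=1.0):
    """Map the distance into (0, 1]: 1 means the input sits on the centroid."""
    d = manhattan(x, centroid)
    return math.exp(-(d ** 2) / (2.0 * sigma ** 2))
```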
In order to get a membership value of the fuzzy signature, a specific function called bestaggreatedata(f, s) takes the fuzzy signature f and structure s as arguments and produces the membership value.
After the above process, each membership value is used to compute the actual output. The formula below demonstrates the calculation of the actual output:
y = \sum_{i=1}^{n} w_i h_i

where y is the output of the neural network,
w_i is the weight of the i-th hidden neuron,
h_i is the value of the i-th hidden neuron,
n is the number of hidden neurons.
After we calculate the actual values, the neural network compares the difference between the actual values and the desired outputs, and then updates the weight matrix values based on the delta rule. The delta rule is a gradient descent update rule for tuning the weight matrix of the neurons. The training module in this implementation uses the delta rule to minimize the error between the actual output and the desired output. The learning rate can be determined by the user, with a default value of 0.01. The formula below represents a simplified form of the update function:
\Delta w_i = \alpha (t_i - y_i) x_i

where \alpha is a small constant called the learning rate,
t_i is the value of the desired output at dimension i,
y_i is the value of the actual output at dimension i,
x_i is the value of the input at dimension i.
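A single delta rule update over all weights might be sketched in Python as follows (illustrative; the weight layout and variable names are assumptions):

```python
def delta_rule_step(weights, hidden, desired, alpha=0.01):
    """Update each weight w[i][j] (hidden i -> output j) by alpha*(t_j - y_j)*h_i."""
    # actual outputs: weighted sums of the hidden-neuron values
    outputs = [sum(weights[i][j] * hidden[i] for i in range(len(hidden)))
               for j in range(len(desired))]
    for i in range(len(hidden)):
        for j in range(len(desired)):
            weights[i][j] += alpha * (desired[j] - outputs[j]) * hidden[i]
    return weights, outputs
```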
When all the training data has been used in this module, one training epoch is complete, but that is not the end of the training phase; training is not complete until all the epochs have finished.
4.3 Testing
In the testing part, we have three modules: testing the network, collecting results, and extracting network information. The two sections below describe them in detail.
4.3.1 Testing network
Before testing the network, the dataset needs to be reorganized to create more accurate benchmarks, since the training set and testing set are kept separate. The reorganization performed in this test suite follows the k-fold cross validation scheme. For example, if the variable k is set to 4 by the user, both the input data set and the desired data set are divided into 4 subsets, where three of them are randomly treated as the training set and the remaining one is treated as the test set; the total number of iterations is 4. The advantage of this is that the user is able to choose how large each test set is and how many independent iterations are averaged over.
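The k-fold reorganization can be sketched in Python (illustrative; the real test suite shuffles and splits in Matlab):

```python
import random

def kfold_splits(n_samples, k, seed=0):
    """Split sample indices into k folds; each fold serves once as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test                          # one train/test iteration
```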
After re-organizing the raw data, the network starts the testing part. In the dataset, each desired output is a vector of binary values that represents the class to which the observation belongs. On the other hand, the actual result from the neural network is usually a vector of decimal numbers. Therefore, a specific mapping function is performed to produce an optimized output. Firstly, we obtain the actual values from the neural network. Secondly, we find the position of the maximum value. Thirdly, we create a new vector of the same size as the output vector and set all its elements to 0. Fourthly, we set the element at the position found in the second step to 1. Finally, we return this vector as the output. Once the optimized output has been generated, the module compares it with the desired output and produces the accuracy and mean squared error values, one by one. Furthermore, the final benchmarks, including the average accuracy rate and average mean squared error, are generated at the end of the procedure. This module also shows the trend of the benchmarks in diagrams, so users can easily judge the quality of the network from those benchmarks.
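The five mapping steps above amount to finding the argmax and one-hot encoding it; a minimal Python sketch (function name illustrative):

```python
def optimize_output(actual):
    """Turn a vector of decimal outputs into a binary class vector (steps 1-5 above)."""
    winner = max(range(len(actual)), key=lambda i: actual[i])  # position of the maximum
    return [1 if i == winner else 0 for i in range(len(actual))]
```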
4.3.2 Extracting network information
Once the neural network has been trained and tested, all information that directly relates to the network has been determined. This module collects the network information as a kind of benchmark and stores it in files. There are 5 files: Cluster_detail.txt, Traning_detail.txt, Testing_detail.txt, Final_result.txt, and Structure.txt. Table 1 below describes the files and the benchmarks associated with them. Appendix A at the end of this report also shows an example of network information based on the "Wine" data set that is associated with the experiments and evaluation chapter.
File name Benchmarks Description
Cluster_detail.txt This file describes cluster information in detail. It
includes the class distribution, centroid value, minimum
value and maximum value for each cluster.
Traning_detail.txt This file contains training information
Testing_detail.txt This file shows testing information based on the trained
network. It includes the desired output, optimized
output and actual output information for unmatched
results.
Structure.txt This file shows detailed information about each cluster.
E.g.: 8 inputs dimensions. 1,8; 1,2 3,7 1 8;
Final_result.txt This file shows the general result benchmarks. It
includes accuracy rate and mean square error by
iterations. Moreover, the average accuracy rate and
mean squared error is included as well.
Table 1: Information of files
5. Experiments and Evaluation
This chapter introduces three experiments, their evaluation, and comparison with other results. Experiments 1 and 2 use the same data sets; however, in experiment 2 we damage the dataset with some missing values. We wish to evaluate the advantages of our approach when the dataset has missing values. In experiment 3, our approach is compared with other neural networks, whose results are taken from the related academic literature. Before we experiment and evaluate, we need to describe the basic information for the experiments.
5.1 Description of the dataset
This section describes 10 data sets from the University of California Irvine (UCI) machine learning data set repository. Table 2 below shows the general information for these data sets [12].
Data set  Number of input dimensions  Number of output dimensions  Number of observations
Wine 13 3 178
Thyroid 21 3 7200
SARS 8 4 4000
Ionosphere 34 2 351
Horse 58 3 364
Heart 35 2 920
Heartc 35 2 303
Diabetes 8 2 768
Card 51 2 690
Cancer 8 2 699
Table 2: Datasets details
Descriptions for all data sets except SARS are listed at the UCI website. SARS is from Wong [2].
A brief synopsis of each data set is as follows:
Wine
This data set is about a chemical classification of three different types of wine grown in the same region of Italy. It consists of 13 input dimensions which describe the properties of the wine and three output dimensions which indicate the three types of wine.
Thyroid
This data set refers to a classification task that diagnoses patients’ thyroid condition. There are
21 inputs, 7200 examples. The desired output has 3 categories which indicate whether the
patient’s thyroid has over function, normal function, or under function.
SARS
This data set describes the severe acute respiratory syndrome (SARS) patients’ information
such as fever temperature at different time, blood pressure, nausea and abdominal pain by 8
input dimensions. The 4 outputs indicate 4 types of patients. They are SARS, normal,
pneumonia and hypertension. The total number of observations is 4,000 (Wong et al [2]).
Ionosphere
This data set refers to radar information that was obtained by a system in Goose Bay, Labrador.
It is used to determine whether some type of structure exists in the ionosphere. There are 34
input dimensions that indicate values of the electromagnetic signal, and 2 desired output
dimensions show whether the structure exists or not. Therefore, this is a binary classification
task. Total number of observations is 351.
Horse
This data set is about a classification task which predicts the fate of a colic horse. It demonstrates whether the horse will survive, will die or will be euthanized based on the result
of veterinary examination.
Heart
This data set represents the prediction of heart disease. There are 35 input dimensions associated with personal data such as age, sex, smoking habits, subjective patient pain
descriptions and results of various medical examinations such as blood pressure and electro
cardiogram result. The 2 output dimensions determine whether at least one of four vessels is
reduced in diameter by more than 50%.
Heartc
This is an alternate version of the “Heart” data set. The structure of input and output is the
same as the heart data set. The difference between “Heart” and “Heartc” is the “Heartc” dataset
comes from a different source, the Cleveland Clinic Foundation.
Diabetes
This data set is about a classification task that diagnoses Pima Indians' diabetes. The aim of the data set is to determine whether a Pima Indian individual is diabetes positive or not.
Card
This data set refers to a binary classification task that predicts whether a customer’s credit card
should be approved or not. There are 51 inputs that represent a real credit card application.
The 2 outputs show the decision on the credit card.
Cancer
This data set is about a classification task that diagnoses patients’ breast cancer. The main task
indicates a tumor as either benign or malignant.
5.2 Hardware and software environment information
We use the same Matlab version and computer to test our approach and Fan's, because we need to avoid the situation where different hardware and software leads to different results. Table 3 below shows the details of the hardware and software environment:
MATLAB version 7.8.0 (R2009) x64
Operating system Win7 Ultimate x64
CPU Intel® Core™ i7-2630QM CPU @ 2.00GHz
Installed memory (RAM) 6.00 GB
Hard disk 750 GB
Table 3: Information on the hardware and software environment
5.3 Experiment 1: Datasets experiments with no missing data
5.3.1 Purpose of the experiment
The aim of this experiment is to discover the feasibility and performance of fuzzy signature based neural networks on a range of data sets. This experiment obtains the results of our approach and Fan's on datasets with no missing data; through the experiment and the comparison between ours and Fan's, we try to find the advantages and disadvantages of our fuzzy signature neural network. Finally, we analyze the reasons for the differences in the results.
5.3.2 Description of the experiment
This experiment produces related benchmarks with the data sets described in section 5.1 and compares them with Fan's classification approach. To reduce the risk that each result differs due to the initial conditions, we run the experiment 5 times for each dataset, and we use the same number of hidden neurons (fuzzy signature neurons) each time. Additionally, in Fan's approach, we use the average method as the aggregation function. Both approaches are limited to 100 training epochs, with 20% of the data as the testing dataset and 80% as the training dataset. Table 4 below shows the number of fuzzy signature neurons.
Data set  Number of input dimensions  Number of hidden neurons
Wine 13 5-10
Thyroid 21 13-18
SARS 8 2-6
Ionosphere 34 18-25
Horse 58 20-25
Heart 35 17-23
Heartc 35 17-23
Diabetes 8 2-6
Card 51 20-25
Cancer 8 2-7
Table 4: details of the number of clusters (hidden neurons) selected
5.3.3 Experiment Process and Discussion of Results
As mentioned in the last section, various numbers of fuzzy neurons have been used to find the maximum accuracy rate and minimum mean squared error for each dataset. The locally optimized results for all data sets are listed in the table below. More specifically, each row in table 5 represents one specific data set benchmark, and includes the mean accuracy rate (the result of 5-fold cross validation) and the mean squared error for both the training and testing data sets. For each data set benchmark, we use the average over the 5 folds.
Data set Training data set Testing data set
Mean (%) MSE Mean (%) MSE
Wine 93 0.0679 95 0.0714
Thyroid 92.7 0.042 92.8 0.043
SARS 94.13 0.086 94.08 0.085
Ionosphere 87.28 0.115 87.42 0.126
Horse 64.9 0.166 62.7 0.167
Heart 80.76 0.144 79.96 0.149
Heartc 79.97 0.148 78.70 0.156
Diabetes 72.80 0.178 73.37 0.175
Card 82.10 0.131 83.04 0.133
Cancer 96.3506 0.041 96.285 0.042
Average 84.4 0.11 84.34 0.11
Table 5: the benchmark for our approach
In the table above, there is no generally optimum number of fuzzy neurons in this experiment;
moreover, each data set’s accuracy rate is different. Table 6 below shows Fan’s results which are
not very different.
Data set Training data set Testing data set
Mean (%) MSE Mean (%) MSE
Wine 91.64 0.0635 92.22 0.0662
Thyroid 92.6 0.0421 92.48 0.0424
SARS 93.317 0.0822 93.57 0.0829
Ionosphere 90.39 0.0949 90.85 0.0913
Horse 63.43 0.1575 63.58 0.1542
Heart 81.9 0.131 80 0.14
Heartc 80.49 0.133 80.98 0.138
Diabetes 70.78 0.204 68.31 0.191
Card 80.65 0.152 78.84 0.153
Cancer 96.14 0.038 96.42 0.0376
Average 84.13 0.11 83.73 0.096
Table 6: the benchmark for Fan's approach
From the two tables above, we can see that the overall results differ only a little between our fuzzy signature neural network and Fan's fuzzy signature RBF neural network. Although the accuracy rates of our training and testing datasets are higher than Fan's, the mean squared error of our approach on the testing dataset is also higher than Fan's. So we do not have any obvious evidence that our algorithm has advantages over Fan's approach. Additionally, for specific datasets, the accuracy rates on the wine, thyroid, SARS, diabetes, card, and cancer datasets are a little higher than Fan's, while the accuracy rates on the other datasets are a little lower. So there are 6 specific datasets where our result is higher than Fan's and 4 specific datasets where our result is slightly lower. Based on the benchmarks above, we still cannot conclude which algorithm is better. Therefore, we performed a second experiment in which datasets with missing values are used to evaluate the differences in the effects between our algorithm and Fan's.
5.4 Experiment 2: Datasets experiments with missing data
5.4.1 Purpose of the experiment
In experiment 1, we did not find any significant difference between our approach and Fan's from the benchmarks. Therefore, we test our approach and Fan's on datasets with missing data. The aim of this experiment is still to discover the feasibility and performance of fuzzy signature based neural networks on a range of data sets. Through the experiment and the comparison between ours and Fan's, we try to find the advantages and disadvantages of our fuzzy signature neural network.
5.4.2 Description of the experiment
In this experiment, we damage the datasets with 20% missing values; the method for introducing the missing values was described in section 4.2.1. For Fan's approach, we also need to handle the missing values: Fan's approach uses the average value of the dimension in place of each missing value. After dealing with the missing values, we use otherwise exactly the same data and numbers of hidden neurons as in experiment 1. All the information can be seen in section 5.3.2.
5.4.3 Experiment Process and Discussion of Results
As in experiment 1, various numbers of fuzzy neurons have been used to find the maximum accuracy rate and minimum mean squared error for each dataset. Table 7 below shows the average values of the accuracy rate and mean squared error for our approach.
Data set Training data set Testing data set
Mean (%) MSE Mean (%) MSE
Wine 93.024 0.07 91.776 0.08
Thyroid 92.556 0.042 92.94 0.041
SARS 86.436 0.1 85.6 0.1
Ionosphere 87.5982 0.12 86.91 0.11
Horse 64.662 0.15 63.966 0.18
Heart 79.466 0.14 79.626 0.14
Heartc 78.342 0.17 77.532 0.18
Diabetes 69.64 0.19 69.9 0.19
Card 77.068 0.155 75.6 0.151
Cancer 95.56 0.04 96 0.05
Average 82.43522 0.117 81.985 0.122
Table 7: the benchmark for our approach with missing value
Because the datasets have missing values, all the benchmark results have decreased slightly compared with experiment 1. However, the decrease is slight and the results remain generally stable; therefore, our approach works quite well on datasets with missing values. After that, we tested Fan's approach under the same conditions, with 20% missing values. Table 8 below shows the benchmark for Fan's approach.
Data set Training data set Testing data set
Mean (%) MSE Mean (%) MSE
Wine 57.316 0.2 55.046 0.21
Thyroid 92.538 0.05 92.84 0.05
SARS 74.56 0.1 74.28 0.1
Ionosphere 88.94 0.08 88.76 0.09
Horse 65.02 0.16 64.24 0.16
Heart 82.52 0.15 82.42 0.14
Heartc 79.94 0.15 79.22 0.15
Diabetes 71.08 0.19 70.74 0.19
Card 78.86 0.15 78.88 0.16
Cancer 96.22 0.04 96.36 0.04
Average 78.6994 0.127 78.2786 0.129
Table 8: the benchmark for Fan's approach with missing value
From Fan's benchmark, the accuracy rates of both the training datasets and the testing datasets show a larger decrease, with the accuracy rate of the training datasets going from 84.13 to 78.70, and the accuracy rate of the testing datasets from 83.73 to 78.28. Compared with our approach, this represents worse results in two aspects. The first aspect is that all of the benchmarks of our approach are higher than those of Fan's approach. Secondly, the effect of missing values on our approach is smaller than on Fan's. On two datasets, wine and SARS, the results of Fan's approach decreased sharply: the accuracy rate on the wine dataset dropped from about 92% to 57%, while in our approach the effect is slight and the result stays almost the same at about 93%. The same effect happens on the SARS dataset.
According to the results above, we conclude that Fan's fuzzy signature RBF neural network has an obvious limitation: it depends highly on data integrity. On the other hand, our approach is less affected when the datasets have missing values. Because the aggregation method of our approach and the structure of the fuzzy signature are suited to more extreme cases, it is less affected when the data is incomplete. In future work we should examine eliminating 20% of the data values and using those datasets to construct the fuzzy signatures for both our approach and Fan's.
5.5 Experiment 3: Benchmarks comparison between Fuzzy
signature neural network and other approaches
5.5.1 Purpose of the experiment
The purpose of this experiment is to evaluate the performance of our approach compared with other neural networks. We can then find ways to improve our approach.
5.5.2 Description of the experiment
In this experiment, our approach is compared with two other neural networks, Cascor and k-sNN. In order to make this comparison, we first find the benchmarks on which both Cascor and k-sNN have been tested by other researchers [13, 14]. The heart, horse, diabetes and cancer datasets have been tested with Cascor and k-sNN. After that, we find the most optimized results for each approach. Table 9 below shows our parameter values, namely the number of fuzzy neurons and the number of epochs.
Dataset Number of fuzzy neurons Epochs
Heart 30 200
Horse 35 200
Diabetes 16 400
Cancer 18 400
Table 9: the information of our approach that will be used
5.5.3 Experiment Process and Discussion of Results
We used the parameters from the last section to test our approach 5 times, found the accuracy rates of the training and testing datasets, calculated the average accuracy rates, and then obtained the other neural networks' accuracy rates. Table 10 below shows this information.
Dataset Fuzzy signature
neural network
Cascor K-sNN
Heart 80.6 80.1 75.1
Horse 68 73.6 70.9
Diabetes 76.7 76.5 69.8
Cancer 96.3 98.1 62.5
Table 10: the results of 3 different neural networks
From the table above, the best performer is Cascor; our approach is slightly lower than Cascor, while k-sNN is the worst. On the horse dataset, our approach has the worst performance; on the other hand, on the heart and diabetes datasets our approach has the best accuracy rates. From the viewpoint of processing speed, the fuzzy signature neural network is the fastest, and the others are slower.
6. Conclusion and Future Work
6.1 Conclusion
We have implemented our approach (fuzzy signature neural network) and Fan’s approach
(fuzzy signature RBF neural network), and then compared their advantages and disadvantages.
The techniques and methodologies to implement such a network, along with suitable test cases, have been demonstrated. We used real-world data sets from UCI; the benchmarks were compared across different parameters of the network and different classification methods. Our approach achieved stable and good results; especially when using datasets with missing values, it still provides stable and good benchmarks. Therefore the approach is viable and worth further investigation.
6.2 Future Work
In this report, our approach (the fuzzy signature neural network) shows good performance on classification datasets; however, this approach still has room for improvement. Therefore we give 3 major suggestions for future work, beyond the few minor suggestions already mentioned in various sections earlier in this report.
Firstly, for the aggregation function, we only choose among 3 methods, max, min, and average, to handle the aggregation process. This is somewhat simple; we could add a weight matrix during the aggregation process. For the weighted method, we may use the weights learning introduced by Mendis, 2008 [1]. It is a more complex aggregation method, and we would compare benchmarks to see the result, which we expect would improve.
Secondly, our approach is a 3-layer neural network, which means only one hidden layer is used. This has the disadvantage that it may lose some features in the fuzzy signature neural network. Based on this, we suggest adding a hidden layer and then using the back propagation algorithm to modify the weights, which may find new features in the dataset.
Finally, this approach has a limitation for general real-world tasks, because it is suited to classification datasets and has not been tested on regression datasets. In order to extend the types of dataset, for the fuzzy part we could use the polymorphic fuzzy signature instead of our fuzzy signature. At the same time, the neural network could use the Cascor neural network instead of the RBF neural network [15]. Because the polymorphic fuzzy signature and the Cascor neural network each have better performance individually, we consider that combining these two methods into our approach would give better performance [1, 15].
Reference
[1] B.S.U. Mendis, 2008, Fuzzy Signatures: Hierarchical Fuzzy Systems and Applications, PhD thesis, Department of Computer Science, The Australian National University, Australia, March
2008.
[2] K.W. Wong, T.D. Gedeon, and L.T. Kóczy, “Construction of Fuzzy Signature from Data: An Example of SARS Pre-clinical Diagnosis System”, in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2004), 2004, pp. 1353.
[3] W. Fan, Fuzzy Signature based Radial Basis Neural Network, Master thesis, Department of Computer Science, The Australian National University, Australia, Nov 2011.
[4] T. Gedeon, K. Wong, and D. Tikk, “Constructing hierarchical fuzzy rule bases for classification”, in Proceedings of the 10th IEEE International Conference on Fuzzy Systems, 2001, pp. 1388-1391.
[5] M. Minsky and S. Papert, Perceptrons, Cambridge, MA: MIT Press, 1969, pp. 115-138.
[6] M.D. Buhmann, Radial Basis Functions: Theory and Implementations, 1st edn, Cambridge University Press, 2003.
[7] B.S.U. Mendis, T.D. Gedeon, and L.T. Kóczy, “Flexibility and robustness of hierarchical fuzzy signature structures with perturbed input data”, in Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), 2006, pp. 2552-2559.
[8] P. Strumillo and W. Kaminski, “Radial Basis Function Neural Networks: Theory and Applications”, in Proceedings of the Sixth International Conference on Neural Networks and Soft Computing, Zakopane, Poland, 2006, pp. 107-119.
[9] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd edn, New York: Springer, 2009, pp. 520-528.
[10] B.S.U. Mendis and T.D. Gedeon, “Complex Structured Decision Making Model: A hierarchical framework for complex structured data”, Information Sciences, vol. 194, pp. 85-106, July 2011.
[11] G. Karypis, E.-H. Han, and V. Kumar, “Chameleon: Hierarchical Clustering Using Dynamic Modeling”, Computer, vol. 32, no. 8, pp. 68-75, 1999.
[12] The UCI Machine Learning Repository, 1987, Center for Machine Learning and Intelligent Systems, viewed 15 Feb 2011, <http://archive.ics.uci.edu/ml/datasets.html>.
[13] N.K. Treadgold and T.D. Gedeon, “Exploring constructive cascade networks”, in Proceedings of the 27th Annual Conference of the IEEE Industrial Electronics Society (IECON '01), vol. 1, 2001, pp. 25-30.
[14] R. Nock, M. Sebban, and D. Bernard, “A Simple Locally Adaptive Nearest Neighbor Rule with Application to Pollution Forecasting”, International Journal of Pattern Recognition and Artificial Intelligence, 2003, pp. 1369-1382.
[15] S. Fahlman and C. Lebiere, “The Cascade-Correlation Learning Architecture”, in Advances in Neural Information Processing Systems 2, D.S. Touretzky, ed., 1990, pp. 524-532.
Appendix A
The files below show an example of the text benchmarks for the wine dataset, with 6 fuzzy neurons, 100 epochs, and 20% missing values.
File: cluster_detail.txt
Details for 6 clusters
Details for cluster number 1 (14 elements):
Members in class 1: 7 (50.000000%)
Members in class 2: 2 (14.285714%)
Members in class 3: 5 (35.714286%)
Cluster centroid -
13.030 2.438 2.476 20.036 108.214 2.258 1.885 0.356 1.554 5.494 0.891
2.637 865.429
Minimum values of each argument
11.960 1.090 2.100 15.200 96.000 1.100 0.550 0.130 1.140 3.050 0.590
1.560 830.000
Maximum values of each argument
13.870 4.280 3.220 27.000 124.000 3.380 3.030 0.630 2.030 10.200 1.250
3.820 920.000
Details for cluster number 2 (44 elements):
Members in class 1: 6 (13.636364%)
Members in class 2: 16 (36.363636%)
Members in class 3: 22 (50.000000%)
Cluster centroid -
12.940 2.513 2.374 19.800 100.932 2.086 1.496 0.397 1.490 5.850 0.873
2.296 688.568
Minimum values of each argument
11.450 0.940 1.700 13.200 70.000 1.350 0.470 0.140 0.410 1.740 0.540
1.270 615.000
Maximum values of each argument
14.340 5.650 2.870 25.000 151.000 3.520 3.750 0.630 2.760 13.000 1.310
3.710 795.000
Details for cluster number 3 (28 elements):
Members in class 1: 0 (0.000000%)
Members in class 2: 26 (92.857143%)
Members in class 3: 2 (7.142857%)
Cluster centroid -
12.350 2.254 2.241 20.411 90.321 2.371 2.177 0.350 1.607 3.078 1.024
2.772 376.321
Minimum values of each argument
11.030 0.740 1.710 16.000 80.000 0.980 0.340 0.170 0.680 1.900 0.580
1.330 278.000
Maximum values of each argument
13.880 5.800 2.750 26.500 116.000 3.500 3.150 0.660 2.910 7.100 1.710
3.640 438.000
Details for cluster number 4 (44 elements):
Members in class 1: 0 (0.000000%)
Members in class 2: 25 (56.818182%)
Members in class 3: 19 (43.181818%)
Cluster centroid -
12.619 2.683 2.344 21.086 94.273 1.852 1.457 0.418 1.314 4.755 0.887
2.280 520.182
Minimum values of each argument
11.460 0.890 1.360 10.600 78.000 1.150 0.470 0.170 0.420 1.280 0.480
1.290 450.000
Maximum values of each argument
14.130 5.510 3.230 28.500 123.000 3.180 5.080 0.630 3.580 10.800 1.360
3.690 607.000
Details for cluster number 5 (28 elements):
Members in class 1: 26 (92.857143%)
Members in class 2: 2 (7.142857%)
Members in class 3: 0 (0.000000%)
Cluster centroid -
13.674 1.952 2.367 16.968 106.714 2.825 2.940 0.279 1.961 5.149 1.054
3.172 1067.571
Minimum values of each argument
12.470 1.350 2.040 11.200 90.000 2.350 2.270 0.170 1.250 2.600 0.870
2.510 937.000
Maximum values of each argument
14.830 4.040 2.670 30.000 162.000 3.880 3.740 0.430 3.280 7.220 1.310
4.000 1195.000
Details for cluster number 6 (20 elements):
Members in class 1: 20 (100.000000%)
Members in class 2: 0 (0.000000%)
Members in class 3: 0 (0.000000%)
Cluster centroid -
13.921 1.769 2.498 17.200 106.650 2.908 3.082 0.295 1.908 6.323 1.117
3.008 1360.850
Minimum values of each argument
13.290 1.430 2.140 12.000 89.000 2.200 2.190 0.190 1.250 3.950 0.860
2.650 1235.000
Maximum values of each argument
14.390 2.160 2.720 22.500 132.000 3.850 3.930 0.500 2.960 8.900 1.280
3.580 1680.000
File: training_detail.txt
weight matrix learnt after run number 1
-0.0146 0.0066 0.0026 -0.0197 -0.0083 0.0163
-0.0146 0.0262 0.0096 -0.0197 0.0243 -0.0298
0.0136 0.0072 0.0068 0.0290 -0.0124 0.0058
weight matrix learnt after run number 2
-0.0368 -0.0072 -0.0028 -0.0195 -0.0077 -0.0107
0.0119 0.0037 0.0171 -0.0132 0.0007 -0.0203
-0.0044 -0.0013 -0.0086 0.0118 0.0119 0.0060
weight matrix learnt after run number 3
-0.0068 -0.0083 0.0200 0.0310 -0.0165 -0.0082
0.0346 -0.0050 -0.0272 0.0054 -0.0136 -0.0315
0.0020 0.0079 -0.0154 -0.0263 -0.0315 0.0198
weight matrix learnt after run number 4
0.0197 -0.0085 -0.0177 -0.0008 0.0456 0.0014
-0.0131 -0.0109 0.0040 0.0124 -0.0182 0.0088
-0.0052 0.0116 0.0035 0.0044 0.0219 0.0009
weight matrix learnt after run number 5
0.0149 0.0092 -0.0074 0.0064 -0.0058 -0.0034
-0.0086 -0.0037 -0.0222 -0.0125 -0.0143 -0.0187
0.0122 -0.0001 -0.0286 -0.0120 -0.0124 0.0176
File: testing_detail.txt
Failed results in run number 1
Observation 69 does not match the desired result
Actual classification - 0 1 0
Neural net classification (rounded) - 0 0 1
Neural net classification (actual) - 0.03 0.28 0.52
Observation 82 does not match the desired result
Actual classification - 0 1 0
Neural net classification (rounded) - 0 0 1
Neural net classification (actual) - 0.30 0.46 0.56
Observation 89 does not match the desired result
Actual classification - 0 1 0
Neural net classification (rounded) - 0 0 1
Neural net classification (actual) - 0.21 0.43 0.52
Observation 138 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - -0.07 0.39 0.33
Observation 145 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - 0.15 0.60 0.23
Observation 172 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - -0.14 0.57 0.48
Failed results in run number 2
Observation 39 does not match the desired result
Actual classification - 1 0 0
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - 0.44 0.46 0.11
Observation 62 does not match the desired result
Actual classification - 0 1 0
Neural net classification (rounded) - 0 0 1
Neural net classification (actual) - -0.11 0.55 0.67
Observation 147 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - 0.21 0.27 0.24
Observation 159 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - 0.32 0.42 0.16
Observation 161 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - -0.38 0.87 0.59
Failed results in run number 3
Observation 36 does not match the desired result
Actual classification - 1 0 0
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - 0.40 0.42 0.17
Observation 69 does not match the desired result
Actual classification - 0 1 0
Neural net classification (rounded) - 0 0 1
Neural net classification (actual) - 0.10 0.31 0.48
Observation 145 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - 0.06 0.67 0.24
Observation 171 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - -0.09 0.68 0.45
Observation 172 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - -0.18 0.62 0.55
Failed results in run number 4
Observation 22 does not match the desired result
Actual classification - 1 0 0
Neural net classification (rounded) - 0 0 1
Neural net classification (actual) - 0.36 0.30 0.37
Observation 26 does not match the desired result
Actual classification - 1 0 0
Neural net classification (rounded) - 0 0 1
Neural net classification (actual) - 0.43 0.02 0.50
Observation 28 does not match the desired result
Actual classification - 1 0 0
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - 0.54 0.64 -0.02
Observation 63 does not match the desired result
Actual classification - 0 1 0
Neural net classification (rounded) - 0 0 1
Neural net classification (actual) - 0.30 0.22 0.48
Observation 64 does not match the desired result
Actual classification - 0 1 0
Neural net classification (rounded) - 1 0 0
Neural net classification (actual) - 0.50 0.48 -0.03
Observation 147 does not match the desired result
Actual classification - 0 0 1
Neural net classification (rounded) - 1 0 0
Neural net classification (actual) - 0.26 0.07 0.25
Failed results in run number 5
Observation 24 does not match the desired result
Actual classification - 1 0 0
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - 0.34 0.42 0.28
Observation 36 does not match the desired result
Actual classification - 1 0 0
Neural net classification (rounded) - 0 1 0
Neural net classification (actual) - 0.38 0.40 0.19
Observation 97 does not match the desired result
Actual classification - 0 1 0
Neural net classification (rounded) - 0 0 1
Neural net classification (actual) - -0.05 0.32 0.43
File: structure.txt
There are 6 clusters in it.
Cluster 1 :
2,5
6,11
1,13,2
Cluster 2 :
2,4
5,12
1,13,2
Cluster 3 :
2,3
4,7
8,12
1,13
Cluster 4 :
2,4
5,11
1,13
Cluster 5 :
2,7
8,10
11,12
1,13
Cluster 6 :
2,5
6,11
1,13
File final_result.txt
Final result for iteration 1
training data set success rate = 0.880282
testing data set success rate = 0.833333
training data set mean square error = 0.083936
testing data set mean square error = 0.099394
Final result for iteration 2
training data set success rate = 0.908451
testing data set success rate = 0.861111
training data set mean square error = 0.077219
testing data set mean square error = 0.098365
Final result for iteration 3
training data set success rate = 0.901408
testing data set success rate = 0.861111
training data set mean square error = 0.081429
testing data set mean square error = 0.102125
Final result for iteration 4
training data set success rate = 0.901408
testing data set success rate = 0.833333
training data set mean square error = 0.086662
testing data set mean square error = 0.084951
Final result for iteration 5
training data set success rate = 0.866197
testing data set success rate = 0.916667
training data set mean square error = 0.085341
testing data set mean square error = 0.081400
Final benchmarks
Average training success rate = 0.891549
Average testing success rate = 0.861111
Average training mean square error = 0.082917
Average testing mean square error = 0.093247
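As a check, the "Final benchmarks" above are simply the arithmetic means of the five per-iteration values; the following sketch reproduces the two average success rates from the per-iteration figures in final_result.txt:

```python
# Per-iteration success rates copied from final_result.txt above.
train_rates = [0.880282, 0.908451, 0.901408, 0.901408, 0.866197]
test_rates = [0.833333, 0.861111, 0.861111, 0.833333, 0.916667]

avg_train = sum(train_rates) / len(train_rates)
avg_test = sum(test_rates) / len(test_rates)
print(f"Average training success rate = {avg_train:.6f}")  # 0.891549
print(f"Average testing success rate = {avg_test:.6f}")    # 0.861111
```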
The screenshot for the example: