
International Journal of Advance Foundation and Research in Computer (IJAFRC)
Volume 1, Issue 5, May 2014. ISSN 2348-4853. © 2014 IJAFRC, All Rights Reserved. www.ijafrc.org

Enhancing Performance of KNN Classifier by Means of Genetic Algorithm and Particle Swarm Optimization

Asha Gowda Karegowda*, Kishore B.
Department of MCA, Siddaganga Institute of Technology, Tumkur, India
[email protected]

ABSTRACT

KNN is susceptible to noise because it is based on the distance between the test and the training samples. Feature weighting and significant feature selection are ways to surmount this limitation of the KNN classifier. This paper proposes three methods, namely binary encoded Genetic Algorithm (GA) for identifying significant features, and real encoded GA and Particle Swarm Optimization (PSO) for identifying feature weights, for enhancing the performance of the KNN classifier. The outcomes of the proposed methods proved to be better when compared to KNN performance with weights provided by the information gain, gain ratio and Relief methods. Further, the results of the proposed work also proved to be superior when compared to the results of prominent classifiers like radial basis function, support vector machine, decision tree, Bayesian and Naïve Bayes classifiers. The binary encoded GA identified significant features and the real encoded GA and PSO provided weights also proved to augment the performance of the fuzzy KNN classifier. Computational work has been carried out on seven different datasets obtained from the UCI machine learning repository.

Index Terms: Crisp KNN, Fuzzy KNN, Genetic Algorithm, Particle Swarm Optimization, feature subset selection, feature weights

    I. INTRODUCTION

Classification is a supervised model which maps or classifies a data item into one of several predefined classes. Data classification is a two-step process. In the first step, a model is built describing a predetermined set of data classes or concepts; typically the learned model is represented in the form of classification rules, decision trees, or mathematical formulae. In the second step, the model is used for classification.

Classifiers are of two types: instance-based (lazy) learners and eager learners. Eager learners (decision tree, Bayesian classifier, SVM, back propagation neural network), when given a training set, construct a classifier model and use the constructed model to classify test samples (previously unseen samples). In contrast, instance-based or lazy learners (k-nearest neighbor classifier and case-based reasoning classifier) store all of the training samples and do not build a classifier until a new sample with no class label needs to be classified. A lazy learner does less work when the training samples are presented and more work when making a classification or prediction for a test sample [1].

Feature subset selection is of immense importance in the field of data mining. Mining on a reduced set of attributes not only reduces computation time but also helps to make the patterns easier to understand. The wrapper model approach uses the classification method itself to measure the significance of a feature set; hence the features selected depend on the classifier model used, in contrast to the filter approach, which is independent of the learning induction algorithm [2]. In this paper, binary encoded


Genetic Algorithm (GA) has been used for feature subset selection and is wrapped with the KNN classifier. In addition to feature subset selection, the performance of the KNN classifier can be enhanced by finding weights for each feature, which measure the relevance of the feature for the classification task [3]. Feature subset selection is a special case of feature weighting, where weight one is assigned to significant features and weight zero to non-significant features. Binary encoded GA has previously been used to identify significant features for k-means clustering [4]. In this paper, real encoded GA [5] and Particle Swarm Optimization (PSO) [6] have been used to find feature weights for enhancing the accuracy of the KNN classifier. For the sake of completeness, crisp and fuzzy KNN classifiers are briefed in Section II, followed by a discussion of the proposed binary encoded GA for feature selection and real encoded GA for feature weighting in Section III. The proposed PSO-generated weights adopted for enhancing the performance of the KNN classifier are briefed in Section IV. The computational results are presented in Section V, followed by conclusions and future enhancements in Section VI.

    II. CRISP AND FUZZY K-NEAREST NEIGHBOR ALGORITHM

Crisp KNN is a simple supervised classification technique which belongs to the instance-based or lazy learning family of methods [1, 7]. It delays modeling the training data until it is needed to classify test samples. The training samples are described by n numeric attributes and are stored in an n-dimensional space. When a test sample (with unknown class label) is given, the k-nearest neighbor classifier searches for the k training samples closest to the unknown sample and then applies majority voting for classification. Closeness is usually defined in

terms of the Euclidean distance. The Euclidean distance between two points P = (p_1, p_2, ..., p_n) and Q = (q_1, q_2, ..., q_n) is given by equation 1:

d(P, Q) = \sqrt{ \sum_{i=1}^{n} (p_i - q_i)^2 }    eq. (1)
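For concreteness, a minimal Python sketch of this distance-plus-majority-vote procedure follows (ours, not the authors' implementation; array names and shapes are assumptions):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Distance of the test sample to every training sample, per eq. (1):
    # d(P, Q) = sqrt(sum_i (p_i - q_i)^2)
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]               # k closest training samples
    votes = Counter(y_train[i] for i in nearest)  # majority voting
    return votes.most_common(1)[0][0]
```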

In spite of its simplicity, KNN suffers from quite a few drawbacks: it requires large memory, proportional to the size of the training set; it has a high computation cost, since it needs to compute the distance of each test instance to all training samples; it has a low accuracy rate in multidimensional data sets with irrelevant features; and there is no rule of thumb to determine the value of the parameter k (the number of nearest neighbors).

The accuracy of the KNN classifier can be improved by identifying the optimal value of k neighbors, in addition to identifying the significant inputs for KNN. The golden section search has been used in combination with Akaike's Information Criterion (AIC) to find the optimal number of k nearest neighbors [8]. In addition, prototype generation and prototype selection have been used to enhance the nearest neighbor classifier through data reduction [9, 10]. Further, KNN performance can be improved by identifying the significant features and finding the feature weights. Weighted KNN is an extension of the KNN classifier which incorporates weights for individual attributes, in contrast to the KNN classifier, which assumes equal weights for all attributes. The authors have used six different methods, namely information gain, gain ratio, one-rule classifier, significance feature evaluator, Relief and KNNFP with one attribute, for assigning weights to enhance the performance of the KNN classifier [11].

In the case of the KNN classifier, once an input vector is assigned to a class, there is no indication of its strength of membership in that class [12]. The fuzzy KNN algorithm assigns class memberships to the test record rather than assigning the test record to one particular class. It assigns the memberships based on the test record's distance from its k nearest neighbors and those neighbors' memberships in the possible classes. The following properties must hold for the membership matrix of size c x n, where c and n are the number of class labels and training samples respectively, subject to the conditions given by equations 2 and 3, where \mu_{ik} is the membership of the kth training record in the ith class:


\sum_{i=1}^{c} \mu_{ik} = 1    eq. (2)

\mu_{ik} \in [0, 1]    eq. (3)

The working of the fuzzy K-nearest neighbor algorithm [11] is as follows (a minimal sketch is given after the listing):

For each test sample x, repeat steps a-d:

a) Compute the distance between test sample x and each of the training samples.

b) Find the k nearest neighbors of test sample x.

c) Compute the membership of test sample x in each class ci (i = 1 to number of class labels), i.e. \mu_i(x), using equation (4):

\mu_i(x) = \dfrac{ \sum_{j=1}^{k} \mu_{ij} \left( 1 / \| x - x_j \|^{2/(m-1)} \right) }{ \sum_{j=1}^{k} \left( 1 / \| x - x_j \|^{2/(m-1)} \right) }    eq. (4)

where \mu_{ij} is the membership of the jth neighbor of test sample x in class ci and m is the fuzzifier value, usually set to 2.

d) The result of the fuzzy classification for test sample x is specified using a simple crisp partition, where the test sample is assigned to the class of maximum membership.

Endfor
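The following Python sketch (ours, not the paper's code) computes the class memberships of equation 4 for one test sample; U_train, a per-sample membership matrix, and the eps guard are assumptions:

```python
import numpy as np

def fuzzy_knn_membership(X_train, U_train, x_test, k=3, m=2.0, eps=1e-9):
    # U_train holds the training memberships mu_ij: one row per training
    # sample, one column per class (assigned by one of the three
    # techniques discussed below).
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Inverse-distance terms 1 / ||x - x_j||^(2/(m-1)) from eq. (4);
    # eps guards against a zero distance (our assumption, not in the paper).
    w = 1.0 / (dists[nearest] ** (2.0 / (m - 1.0)) + eps)
    mu = (U_train[nearest] * w[:, None]).sum(axis=0) / w.sum()
    return mu          # step (d): predicted class is np.argmax(mu)
```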

There are basically three different techniques for assigning the memberships \mu_{ij} used in equation 4 to the training samples [12]. The first method uses a crisp labeling, assigning each training sample complete membership (one) in its known class and zero membership in all other classes. The second technique works only on two-class data sets; it assigns membership in the known class based on the sample's distance from the mean of its training class. The third method assigns memberships to the training samples according to a k-nearest neighbor rule using equation 5. The k nearest neighbors of each training sample x (say x belongs to class ci) are found, and then membership in each class is assigned according to the following equation:

\mu_j(x) = \begin{cases} 0.51 + (n_j / k) \cdot 0.49, & \text{if } j = i \\ (n_j / k) \cdot 0.49, & \text{if } j \neq i \end{cases} \quad (j = 1 \text{ to } c)    eq. (5)

where n_j is the number of neighbors belonging to class cj and k is the total number of neighbors of sample x.
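A sketch of this third assignment technique in Python (ours; integer class labels 0..c-1 are an assumption) is:

```python
import numpy as np

def init_memberships(X_train, y_train, n_classes, k=3):
    # Assign each training sample fuzzy memberships by the K-NN rule of eq. (5).
    n = len(X_train)
    U = np.zeros((n, n_classes))
    for s in range(n):
        dists = np.sqrt(((X_train - X_train[s]) ** 2).sum(axis=1))
        neighbors = np.argsort(dists)[1:k + 1]        # skip the sample itself
        for j in range(n_classes):
            n_j = np.sum(y_train[neighbors] == j)     # neighbors in class c_j
            U[s, j] = (n_j / k) * 0.49
        U[s, y_train[s]] += 0.51                      # extra share for the known class
    return U
```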

III. GA FOR IDENTIFYING SIGNIFICANT FEATURES AND FEATURE WEIGHTS FOR CRISP AND FUZZY KNN CLASSIFIER


GA is a stochastic general search method capable of effectively exploring large search spaces. The basic techniques of GAs follow Charles Darwin's principle of "survival of the fittest". The reproduction, crossover and mutation operations are applied to parent chromosomes to generate the next-generation offspring. The authors have previously used GA for optimizing the connection weights of a feed forward neural network, for finding significant features for various classifiers using both filter and wrapper approaches, and for finding optimal centroids for k-means and fuzzy k-means clustering [4, 13-16].

The authors have applied binary encoded GA for identifying the significant features. In binary encoded GA, each gene of the chromosome is either 0 or 1; a 1 indicates that the corresponding feature is significant and a 0 that it is not. In the proposed method, the length of the chromosome is equal to the total number of features F. For example, the diabetes dataset has F = 8 features, so the chromosome length is 8; for the binary encoded chromosome 10011001, the 1st, 4th, 5th and 8th features are significant and the 2nd, 3rd, 6th and 7th features are not. The working of the binary encoded GA for finding the significant feature subset for the K-nearest neighbor classifier is as follows (a sketch is given after the listing):

i. Initialize the chromosome population randomly using binary encoding (each chromosome's length is equal to the total number of features F for the given dataset).

ii. Repeat steps a-d till the terminating condition (maximum number of generations) is reached:

a. Apply K-nearest neighbor using each chromosome as the set of significant features and take the classification accuracy as the fitness of the chromosome.

b. Select the chromosome resulting in the highest classification accuracy of the KNN classifier as the fittest chromosome and replace the least-fit chromosome with it (reproduction).

c. Select any two chromosomes randomly and apply a one-point crossover operation.

d. Apply mutation by selecting a random chromosome and randomly flipping one bit from 1 to 0 or 0 to 1.

The positions of bit 1 in the best-fit chromosome are taken as the significant attributes for both the fuzzy and crisp K-nearest neighbor classifiers.
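A compact Python sketch of this wrapper loop (ours; the fitness callback standing in for KNN test accuracy is an assumption) is:

```python
import numpy as np

def binary_ga(fitness, n_features, pop_size=20, generations=50, seed=0):
    # fitness(mask) is assumed to return the KNN accuracy obtained when
    # only the features with mask == 1 are used.
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        pop[scores.argmin()] = pop[scores.argmax()]  # reproduction: fittest replaces least fit
        a, b = rng.choice(pop_size, size=2, replace=False)
        p = rng.integers(1, n_features)              # one-point crossover
        pop[a, p:], pop[b, p:] = pop[b, p:].copy(), pop[a, p:].copy()
        c, g = rng.integers(pop_size), rng.integers(n_features)
        pop[c, g] ^= 1                               # mutation: flip one random bit
    scores = np.array([fitness(c) for c in pop])
    return pop[scores.argmax()]                      # bit 1 marks a significant feature
```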

In addition to significant feature selection using binary encoded GA, the authors have applied real encoded GA for finding feature weights for the KNN classifier. With real encoded GA, chromosomes represent the weights of the features, in contrast to binary encoded GA, which is used to find a subset of significant features from the original feature set. Binary encoded GA can be considered a special case of feature weighting, where weight zero is assigned to non-significant features and weight one to significant features. A repair algorithm is used with the real encoded GA to guarantee the feasibility of chromosomes (i.e. the sum of the weights of all features must equal one). This is done by finding the sum of all feature weights and dividing each feature weight by the total.
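As a sketch, the repair step amounts to a one-line normalization (assuming non-negative weights with a positive sum):

```python
import numpy as np

def repair(chromosome):
    # Normalize so the feature weights sum to one, exactly as described
    # above: divide each weight by the total weight.
    return chromosome / chromosome.sum()
```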

The functioning of the real encoded GA for finding feature weights for the K-nearest neighbor classifier is as follows (a sketch of the real-encoded operators follows the listing):

i. Initialize the chromosome population randomly using real encoding (each chromosome's length is equal to the total number of features F).

ii. Repeat steps a-d till the terminating condition (maximum number of generations) is reached:


a) Apply weighted K-nearest neighbor using each chromosome as the feature weights and take the classification accuracy as the fitness of the chromosome.

b) Select the chromosome resulting in the highest classification accuracy of weighted KNN as the fittest chromosome and replace the least-fit chromosome with it.

c) Select any two chromosomes randomly and apply a one-point crossover operation (call the repair algorithm if the sum of the weights of all features exceeds one).

d) Apply mutation by selecting a random chromosome and altering one of its weights by multiplying it with a random real number (call the repair algorithm if the sum of the weights of all features exceeds one).

iii. The weights in the fittest chromosome are taken as the feature weights for both the crisp and fuzzy K-nearest neighbor classifiers.
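The loop itself mirrors the binary GA sketched in the previous section; reusing the repair function sketched above, the real-encoded initialization and mutation steps might look as follows (the mutation factor's range is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_real_population(pop_size, n_features):
    # Random non-negative weight vectors, each repaired to sum to one.
    return np.array([repair(c) for c in rng.random((pop_size, n_features))])

def mutate_weights(chromosome):
    # Step (d): scale one randomly chosen weight by a random real factor,
    # then repair so the weights again sum to one.
    out = chromosome.copy()
    out[rng.integers(len(out))] *= 2 * rng.random()
    return repair(out)
```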

IV. PSO FOR FINDING FEATURE WEIGHTS FOR CRISP AND FUZZY KNN CLASSIFIER

PSO is a population-based search method inspired by the social behavior of a flock of migrating birds. A particle is analogous to a chromosome (population member) in GA. In GA, the next-generation chromosomes are generated from parent chromosomes using the crossover, mutation and reproduction operations. As opposed to GA, the evolutionary process in PSO does not create new birds from parent ones; instead, each particle flies through the search space with a velocity adjusted by its own knowledge, Pbest (local search), and the best knowledge among its companions, Gbest (global search) [6]. The authors have previously applied PSO to optimize the connection weights of a feed forward neural network [16] and to find k-means centroids [17].

This paper proposes a real encoded PSO algorithm to find the feature weights for the KNN classifier. The dimension of each particle is equal to the total number of features for a given dataset. The working of the real encoded PSO for finding the feature weights for the K-nearest neighbor classifier is as follows (a sketch follows the parameter description below):

i. Initialize the particle population randomly using real encoding (each particle's length is equal to the total number of features F).

ii. Repeat steps a-e till the terminating condition (maximum number of iterations) is reached:

a) Apply weighted K-nearest neighbor using each particle as the feature weights and find the classification accuracy.

b) Find the local best for each particle (the best accuracy of the individual particle).

c) Find the global best from the population of particles (the best accuracy among all particles).

d) Compute the new velocity for each particle using its local best and the global best, using equation (6):

vij(t + 1) = w vij(t) + c1 R1 (pbestij - xij(t)) + c2 R2 (gbestij - xij(t))    eq. (6)

e) Update each particle's position using its old position and new velocity, using equation (7):

xij(t + 1) = xij(t) + vij(t + 1)    eq. (7)

(Call the repair algorithm, as described in Section III, if the sum of the weights of all features of a particle exceeds one.)

iii. The weights represented by the global best particle are taken as the final feature weights for both the crisp and fuzzy K-nearest neighbor classifiers.


In equation 6, vij(t + 1) is the new velocity, w is the inertia weight, vij(t) is the old velocity, c1 and c2 are constants usually set to 2, R1 and R2 are randomly generated numbers, pbestij is the particle's local best, gbestij is the population's global best, and xij is the old position of the particle.
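A Python sketch of this PSO loop follows (ours; the inertia weight value and the non-negativity clamp are assumptions not stated in the paper):

```python
import numpy as np

def pso_weights(fitness, n_features, n_particles=20, iters=50,
                w=0.7, c1=2.0, c2=2.0, seed=0):
    # fitness(x) is assumed to return the weighted-KNN accuracy for the
    # weight vector x; w = 0.7 is our choice of inertia weight.
    rng = np.random.default_rng(seed)
    x = rng.random((n_particles, n_features))
    x /= x.sum(axis=1, keepdims=True)            # repair: weights sum to one
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # eq. (6)
        x = x + v                                                   # eq. (7)
        x = np.clip(x, 0.0, None)                # keep weights non-negative (assumption)
        over = x.sum(axis=1) > 1.0               # repair particles whose weights exceed one
        x[over] = x[over] / x[over].sum(axis=1, keepdims=True)
        vals = np.array([fitness(p) for p in x])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest                                 # final feature weights
```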

    V. EXPERIMENTAL RESULTS

Experiments have been carried out using seven different datasets, namely Heart-statlog, diabetes, wine, Indian liver, vehicle, iris and ionosphere, obtained from the UCI machine learning repository. The data has been partitioned by means of the holdout method with a 60-40 ratio into training set and test set, and normalized using the min-max normalization method (a sketch of this preprocessing is given below). To enhance the performance of the KNN classifier, binary encoded GA has been used to discover the significant features as explained in Section III. In addition, real encoded GA and real encoded PSO have been used to identify the feature weights for the KNN classifier. The results of crisp KNN are compared against the proposed binary encoded GA identified significant features, and against weights identified by GA and PSO versus weights identified by the information gain, gain ratio and Relief methods, as shown in Figure 1. Experiments with GA and PSO have been carried out by varying the population size and the number of generations, in addition to changing the value of k (1-10), for the crisp KNN and fuzzy KNN classifiers.
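A minimal sketch of this preprocessing (ours, not the authors' code) is:

```python
import numpy as np

def min_max_normalize(X):
    # Scale every feature to [0, 1]: (x - min) / (max - min).
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)

def holdout_split(X, y, train_ratio=0.6, seed=0):
    # 60-40 holdout partition into training and test sets.
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(train_ratio * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```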

Figure 1 illustrates that the classification accuracy of crisp KNN is improved by GA identified features and by GA and PSO identified weights for all seven datasets. For the Heart-statlog and vehicle datasets, the crisp KNN performance was best with weights identified by real encoded GA. PSO identified weights proved best for the crisp KNN classifier on the diabetes and Indian liver datasets. For the iris dataset, the binary encoded GA features proved best for the crisp KNN classifier. For the wine and ionosphere datasets, binary encoded GA features and real encoded GA and PSO weights resulted in the same, highest accuracy of the crisp KNN classifier.

Further, GA identified features and GA and PSO identified weights have been used to enhance the performance of the fuzzy KNN classifier. For the fuzzy KNN classifier, the membership of the training samples is computed using two methods: (a) the crisp assignment method (assign each training sample complete membership in its known class and zero membership in all other classes), named F1WKNN, and (b) assignment using equation 5, named F2WKNN. The relative results of the F1WKNN and F2WKNN classifiers using significant features identified by GA and weights identified by GA and PSO, versus weights identified by the information gain, gain ratio and Relief methods, are shown in Figure 2 and Figure 3 respectively, with the fuzzifier value m equal to 2. With the F1WKNN classifier, weights identified by real encoded GA proved best for the Heart-statlog, wine, Indian liver and iris datasets. For the diabetes dataset, F1WKNN showed an accuracy higher by 2-3% with weights identified by the gain ratio method when compared to the proposed methods. For the vehicle and ionosphere datasets, weights given by information gain proved to be somewhat better for F1WKNN when compared to the proposed methods.

The classification accuracy of F2WKNN was highest with real encoded GA identified weights for the Heart-statlog, Indian liver, vehicle and iris datasets when compared to the other methods. Both binary encoded GA features and real encoded GA weights resulted in the same classification accuracy of F2WKNN for the wine and ionosphere datasets. For the diabetes dataset, the binary encoded GA provided features resulted in better accuracy of F2WKNN when compared to the weights proposed by real encoded GA and PSO; however, the weights identified by information gain and gain ratio resulted in slightly improved performance for the diabetes dataset when compared to GA identified features. Overall, the results of F2WKNN are found to be better than those of F1WKNN.


In addition, the results of the proposed methods for improving crisp and fuzzy KNN classifier accuracy using GA identified features and PSO and GA identified weights are compared with five well-known classifiers (available in the WEKA tool), namely radial basis function network (RBF), support vector machine (SVM), decision tree (C4.5), Bayesian and Naïve Bayes, as shown in Figure 4 to Figure 6.

Figure 4 illustrates that both binary encoded GA features and real encoded GA and PSO weights resulted in the same classification accuracy of the crisp KNN classifier for the wine and ionosphere datasets. For the iris dataset, GA identified features resulted in the top accuracy for the crisp KNN classifier when compared to the other classifiers. PSO identified weights proved best for both the diabetes and Indian liver datasets, and GA identified weights proved best for the Heart-statlog and vehicle datasets, for crisp KNN when compared to the other classifiers.

Figure 5 depicts that PSO identified weights proved best for F1WKNN on the diabetes dataset when compared to the other classifiers, while SVM proved best for the Heart-statlog dataset. Further, GA identified weights with F1WKNN showed better performance when compared to the other classifiers for the wine, Indian liver, vehicle and iris datasets. Figure 5 also illustrates that both binary encoded GA features and real encoded GA and PSO weights resulted in the same classification accuracy of F1WKNN for the ionosphere dataset. However, the GA identified features showed a decline in performance for both the vehicle and iris datasets with the F1WKNN classifier.

Figure 6 depicts that GA identified weights for F2WKNN proved best for all the datasets excluding diabetes when compared with the other classifiers. For the diabetes dataset, the Bayesian classifier showed a negligible improvement over F2WKNN with GA identified features. On the other hand, GA identified features showed a decline in performance for the iris dataset with the F2WKNN classifier. GA and PSO identified weights proved effective with F2WKNN and resulted in the same accuracy for the ionosphere dataset, whereas GA identified features and weights proved best with F2WKNN for the wine dataset when compared to the other classifiers.

    VI. CONCLUSIONS

This paper proposed binary encoded GA for identifying significant features, and real encoded GA and PSO for finding feature weights, for enhancing the performance of both the crisp and fuzzy KNN classifiers. Among the three proposed methods for improving the performance of the KNN classifier, no single method is best for all seven datasets. Overall, the two evolutionary methods, GA and PSO, have enhanced the performance of the KNN classifier on all seven experimented datasets when compared to the outcomes of well-known classifiers like radial basis function, support vector machine, decision tree, Bayesian and Naïve Bayes, as well as to the feature weights identified by the information gain, gain ratio and Relief methods. As future enhancements, the authors would like to extend the work on significant feature selection using binary PSO and binary cuckoo search algorithms. In addition to PSO and GA, feature weights identified by the basic and modified versions of the cuckoo search algorithm can be applied to further improve the performance of the KNN classifier.

    VII. REFERENCES

[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, San Francisco: Morgan Kaufmann Publishers, 2001.


[2] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Feature Subset Selection Problem using Wrapper Approach in Supervised Learning", International Journal on Computer Applications (IJCA), Vol. 1, pp. 13-17, 2010.

[3] S. Cost, S. Salzberg, "A weighted nearest neighbor algorithm for learning with symbolic features", Machine Learning, Vol. 10, No. 1, pp. 57-78, Jan. 1993.

[4] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, Vidya T., Shama, "Genetic Algorithm based Dimensionality Reduction for Improving Performance of k-means and fuzzy k-means clustering: A Case Study for Categorization of Medical Dataset", Proceedings of the Seventh International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2012), Gwalior, India, Advances in Intelligent Systems and Computing, Vol. 201, pp. 169-180, 2013.

[5] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.

[6] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory", Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, pp. 39-43, 1995.

[7] T.M. Cover, P.E. Hart, "Nearest neighbor pattern classification", IEEE Transactions on Information Theory, Vol. IT-13, pp. 21-27, 1967.

[8] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Combining Akaike's Information Criterion (AIC) and the Golden-Section Search Technique to find Optimal Numbers of Nearest Neighbors", International Journal of Computer Applications, Vol. 2, pp. 80-67, May 2010.

[9] Isaac Triguero, Joaquín Derrac, Salvador García, Francisco Herrera, "A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification", IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 42(1), pp. 86-100, 2012.

[10] Salvador García, Joaquín Derrac, José Ramón Cano, Francisco Herrera, "Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34(3), pp. 417-435, 2012.

[11] Asha Gowda Karegowda, Rakesh Kumar Singh, M.A. Jayaram, A.S. Manjunath, "Improving Weighted K-Nearest Neighbor Feature Projections Performance with Different Weights Assigning Methods", International Conference on Computational Intelligence (ICCI 2010), December 9-11, 2010, Coimbatore, India.

[12] James M. Keller, Michael R. Gray, James A. Givens Jr., "A fuzzy K-nearest neighbor algorithm", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-15, No. 4, pp. 580-585, 1985.

[13] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Application of Genetic Algorithm Optimized Neural Network Connection Weights for Medical Diagnosis of PIMA Indian Diabetes", International Journal of Soft Computing, Vol. 2, No. 2, pp. 15-22, May 2011.

[14] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Feature Subset Selection using Cascaded GA and CFS: A Filter Approach in Supervised Learning", International Journal on Computer Applications, Vol. 23(2), pp. 1-10, 2011.


[15] Asha Gowda Karegowda, Shama, Vidya T., M.A. Jayaram, A.S. Manjunath, "Improving Performance of K-Means Clustering by Initializing Cluster Centers Using Genetic Algorithm and Entropy Based Fuzzy Clustering for Categorization of Diabetic Patients", Proceedings of the International Conference on Advances in Computing, MSRIT, Bangalore, Advances in Intelligent Systems and Computing, Vol. 174, July 4-6, pp. 899-904, 2012.

[16] Asha Gowda Karegowda, M.A. Jayaram, "Significant Feature Set Driven, Optimized FFN for Enhanced Classification", International Journal of Computational Intelligence and Informatics, ISSN: 2231-0258, Vol. 2, No. 4, Mar 2013.

[17] Asha Gowda Karegowda, Seema Kumari, "Particle Swarm Optimization Algorithm Based k-means and Fuzzy c-means Clustering", International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, Issue 7, pp. 448-451, July 2013.

[Figure 1 appears here as a bar chart of classification accuracy per dataset (Heart-statlog, Diabetes, Wine, Indian Liver, Vehicle, Iris, Ionosphere) for KNN, Binary GA-WKNN, Real GA-WKNN, Real PSO-WKNN, Information Gain-WKNN, Gain Ratio-WKNN and Relief-WKNN.]

Figure 1. Comparative performance of crisp KNN classifier using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. information gain, gain ratio and Relief method identified weights for different datasets.


[Figure 2 appears here as a chart of classification accuracy per dataset for F1KNN, Binary GA-F1WKNN, Real GA-F1WKNN, Real PSO-F1WKNN, Information Gain-F1WKNN, Gain Ratio-F1WKNN and Relief-F1WKNN.]

Figure 2. Comparative performance of F1WKNN classifier (with crisp method for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. information gain, gain ratio and Relief method for different datasets.

[Figure 3 appears here as a chart of classification accuracy per dataset for F2KNN, Binary GA-F2WKNN, Real GA-F2WKNN, Real PSO-F2WKNN, Information Gain-F2WKNN, Gain Ratio-F2WKNN and Relief-F2WKNN.]

Figure 3. Comparative performance of F2WKNN classifier (with equation 5 for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. information gain, gain ratio and Relief method for different datasets.


[Figure 4 appears here as a chart of classification accuracy per dataset for KNN, Binary GA-WKNN, Real GA-WKNN, Real PSO-WKNN, RBF, SVM, Decision Tree, Bayesian and Naïve Bayes.]

Figure 4. Comparative performance of crisp KNN classifier using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. SVM, RBF, Decision Tree, Bayesian and Naïve Bayes classifiers for different datasets.

[Figure 5 appears here as a chart of classification accuracy per dataset for F1KNN, Binary GA-F1WKNN, Real GA-F1WKNN, Real PSO-F1WKNN, RBF, SVM, Decision Tree, Bayesian and Naïve Bayes.]

Figure 5. Comparative performance of F1WKNN classifier (with crisp method for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified weights vs. SVM, RBF, Decision Tree, Bayesian and Naïve Bayes classifiers for different datasets.


[Figure 6 appears here as a chart of classification accuracy per dataset for F2KNN, Binary GA-F2WKNN, Real GA-F2WKNN, Real PSO-F2WKNN, RBF, SVM, Decision Tree, Bayesian and Naïve Bayes.]

Figure 6. Comparative performance of F2WKNN classifier (with equation 5 for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified weights vs. SVM, RBF, Decision Tree, Bayesian and Naïve Bayes classifiers for different datasets.