
International Journal of Advance Foundation and Research in Computer (IJAFRC)
Volume 1, Issue 5, May 2014. ISSN 2348-4853. © 2014 IJAFRC, All Rights Reserved. www.ijafrc.org

Enhancing Performance of KNN Classifier by Means of Genetic Algorithm and Particle Swarm Optimization

Asha Gowda Karegowda*, Kishore B.
Department of MCA, Siddaganga Institute of Technology, Tumkur, India
[email protected]

ABSTRACT

KNN is susceptible to noise because it is based on the distance between the test and the training samples. Feature weighting and significant feature selection are ways to surmount this limitation of the KNN classifier. This paper proposes three methods, namely binary encoded Genetic Algorithm (GA) for identifying significant features, and real encoded GA and Particle Swarm Optimization (PSO) for identifying feature weights, for enhancing the performance of the KNN classifier. The outcomes of the proposed methods proved to be better when compared to KNN performance with weights provided by the information gain, gain ratio and Relief methods. Further, the results of the proposed work also proved to be superior when compared to the results of prominent classifiers like radial basis function, support vector machine, decision tree, Bayesian and Naïve Bayes classifiers. The binary encoded GA identified significant features and the real encoded GA and PSO provided weights also proved to augment the performance of the fuzzy KNN classifier. Computational work has been carried out on seven different datasets obtained from the UCI machine learning repository.

Index Terms: Crisp KNN, Fuzzy KNN, Genetic Algorithm, Particle Swarm Optimization, feature subset selection, feature weights

    I. INTRODUCTION

Classification is a supervised model which maps or classifies a data item into one of several predefined classes. Data classification is a two-step process. In the first step, a model is built describing a predetermined set of data classes or concepts; typically the learned model is represented in the form of classification rules, decision trees, or mathematical formulae. In the second step, the model is used for classification.

Classifiers are of two types: instance-based (lazy) learners and eager learners. Eager learners (decision tree, Bayesian classifier, SVM, back propagation neural network), when given a training set, construct a classifier model and use the constructed model to classify test samples (previously unseen samples). In contrast, instance-based or lazy learners (k-nearest neighbor classifier and case-based reasoning classifier) store all of the training samples and do not build a classifier until a new sample with no class label needs to be classified. A lazy learner does less work when the training samples are presented and more work when making a classification or prediction for a test sample [1].

Feature subset selection is of immense importance in the field of data mining. Mining on a reduced set of attributes not only reduces computation time but also helps to make the patterns easier to understand. The wrapper model approach uses the classification method itself to measure the significance of a feature set; hence the features selected depend on the classifier model used, in contrast to the filter approach, which is independent of the learning induction algorithm [2]. In this paper, binary encoded


Genetic Algorithm (GA) has been used for feature subset selection and is wrapped with the KNN classifier. In addition to feature subset selection, the performance of the KNN classifier can be enhanced by finding weights for each feature, which measure the relevance of the feature for the classification task [3]. Feature subset selection is a special case of feature weighting, where weight one is assigned to significant features and weight zero to non-significant features. Binary encoded GA has previously been used to identify significant features for k-means clustering [4]. In this paper, real encoded GA [5] and Particle Swarm Optimization (PSO) [6] have been used to find feature weights for enhancing the accuracy of the KNN classifier. For the sake of completeness, crisp and fuzzy KNN classifiers are briefed in Section II, followed by a discussion of the proposed binary encoded GA for feature selection and real encoded GA for feature weighting in Section III. The proposed PSO-generated weights adopted for enhancing the performance of the KNN classifier are briefed in Section IV. The computational results are presented in Section V, followed by conclusions and future enhancements in Section VI.

    II. CRISP AND FUZZY K-NEAREST NEIGHBOR ALGORITHM

Crisp KNN is a simple supervised classification technique which belongs to the instance-based or lazy learning family of methods [1, 7]. It delays modeling the training data until it is needed to classify test samples. The training samples are described by n numeric attributes and are stored in an n-dimensional space. When a test sample (with unknown class label) is given, the k-nearest neighbor classifier searches for the k training samples closest to the unknown sample and then applies majority voting for classification. Closeness is usually defined in

terms of the Euclidean distance. The Euclidean distance between two points P = (p_1, p_2, ..., p_n) and Q = (q_1, q_2, ..., q_n) is given by equation 1:

d(P, Q) = \sqrt{ \sum_{i=1}^{n} (p_i - q_i)^2 }    eq. (1)
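For concreteness, a minimal Python sketch of this distance-plus-majority-vote procedure follows (ours, not the authors' implementation; array names and shapes are assumptions):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Distance of the test sample to every training sample, per eq. (1):
    # d(P, Q) = sqrt(sum_i (p_i - q_i)^2)
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]               # k closest training samples
    votes = Counter(y_train[i] for i in nearest)  # majority voting
    return votes.most_common(1)[0][0]
```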

In spite of its simplicity, KNN suffers from quite a few drawbacks: it requires large memory, proportional to the size of the training set; it has a high computation cost, since it needs to compute the distance of each test instance to all training samples; it has a low accuracy rate in multidimensional data sets with irrelevant features; and there is no rule of thumb to determine the value of the parameter k (the number of nearest neighbors).

The accuracy of the KNN classifier can be improved by identifying the optimal value of k neighbors, in addition to identifying the significant inputs for KNN. The golden section search has been used in combination with Akaike's Information Criterion (AIC) to find the optimal number of k nearest neighbors [8]. In addition, prototype generation and prototype selection have been used to enhance the nearest neighbor classifier through data reduction [9, 10]. Further, KNN performance can be improved by identifying the significant features and finding the feature weights. Weighted KNN is an extension of the KNN classifier which incorporates weights for individual attributes, in contrast to the KNN classifier, which assumes equal weights for all attributes. The authors have used six different methods, namely information gain, gain ratio, one-rule classifier, significance feature evaluator, Relief and KNNFP with one attribute, for assigning weights to enhance the performance of the KNN classifier [11].

In the case of the KNN classifier, once an input vector is assigned to a class, there is no indication of its strength of membership in that class [12]. The fuzzy KNN algorithm assigns class memberships to the test record rather than assigning the test record to one particular class. It assigns the memberships based on the test record's distance from its k nearest neighbors and those neighbors' memberships in the possible classes. The following properties must hold for the membership matrix of size c x n, where c and n are the number of class labels and training samples respectively, subject to the conditions given by equations 2 and 3, where \mu_{ik} is the membership of the kth training record in the ith class:


\sum_{i=1}^{c} \mu_{ik} = 1    eq. (2)

\mu_{ik} \in [0, 1]    eq. (3)

The working of the fuzzy K-nearest neighbor algorithm [11] is as follows (a minimal sketch is given after the listing):

For each test sample x, repeat steps a-d:

a) Compute the distance between test sample x and each of the training samples.

b) Find the k nearest neighbors of test sample x.

c) Compute the membership of test sample x in each class ci (i = 1 to number of class labels), i.e. \mu_i(x), using equation (4):

\mu_i(x) = \dfrac{ \sum_{j=1}^{k} \mu_{ij} \left( 1 / \| x - x_j \|^{2/(m-1)} \right) }{ \sum_{j=1}^{k} \left( 1 / \| x - x_j \|^{2/(m-1)} \right) }    eq. (4)

where \mu_{ij} is the membership of the jth neighbor of test sample x in class ci and m is the fuzzifier value, usually set to 2.

d) The result of the fuzzy classification for test sample x is specified using a simple crisp partition, where the test sample is assigned to the class of maximum membership.

Endfor
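The following Python sketch (ours, not the paper's code) computes the class memberships of equation 4 for one test sample; U_train, a per-sample membership matrix, and the eps guard are assumptions:

```python
import numpy as np

def fuzzy_knn_membership(X_train, U_train, x_test, k=3, m=2.0, eps=1e-9):
    # U_train holds the training memberships mu_ij: one row per training
    # sample, one column per class (assigned by one of the three
    # techniques discussed below).
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Inverse-distance terms 1 / ||x - x_j||^(2/(m-1)) from eq. (4);
    # eps guards against a zero distance (our assumption, not in the paper).
    w = 1.0 / (dists[nearest] ** (2.0 / (m - 1.0)) + eps)
    mu = (U_train[nearest] * w[:, None]).sum(axis=0) / w.sum()
    return mu          # step (d): predicted class is np.argmax(mu)
```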

There are basically three different techniques for assigning the memberships \mu_{ij} used in equation 4 to the training samples [12]. The first method uses a crisp labeling, assigning each training sample complete membership (one) in its known class and zero membership in all other classes. The second technique works only on two-class data sets; it assigns membership in the known class based on the sample's distance from the mean of its training class. The third method assigns memberships to the training samples according to a k-nearest neighbor rule using equation 5. The k nearest neighbors of each training sample x (say x belongs to class ci) are found, and then membership in each class is assigned according to the following equation:

\mu_j(x) = \begin{cases} 0.51 + (n_j / k) \cdot 0.49, & \text{if } j = i \\ (n_j / k) \cdot 0.49, & \text{if } j \neq i \end{cases} \quad (j = 1 \text{ to } c)    eq. (5)

where n_j is the number of neighbors belonging to class cj and k is the total number of neighbors of sample x.
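A sketch of this third assignment technique in Python (ours; integer class labels 0..c-1 are an assumption) is:

```python
import numpy as np

def init_memberships(X_train, y_train, n_classes, k=3):
    # Assign each training sample fuzzy memberships by the K-NN rule of eq. (5).
    n = len(X_train)
    U = np.zeros((n, n_classes))
    for s in range(n):
        dists = np.sqrt(((X_train - X_train[s]) ** 2).sum(axis=1))
        neighbors = np.argsort(dists)[1:k + 1]        # skip the sample itself
        for j in range(n_classes):
            n_j = np.sum(y_train[neighbors] == j)     # neighbors in class c_j
            U[s, j] = (n_j / k) * 0.49
        U[s, y_train[s]] += 0.51                      # extra share for the known class
    return U
```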

III. GA FOR IDENTIFYING SIGNIFICANT FEATURES AND FEATURE WEIGHTS FOR CRISP AND FUZZY KNN CLASSIFIER


GA is a stochastic general search method capable of effectively exploring large search spaces. The basic techniques of GAs follow Charles Darwin's principle of "survival of the fittest". The reproduction, crossover and mutation operations are applied to parent chromosomes to generate the next-generation offspring. The authors have previously used GA for optimizing the connection weights of a feed forward neural network, for finding significant features for various classifiers using both filter and wrapper approaches, and for finding optimal centroids for k-means and fuzzy k-means clustering [4, 13-16].

The authors have applied binary encoded GA for identifying the significant features. In binary encoded GA, each gene of the chromosome is either 0 or 1; a 1 indicates that the corresponding feature is significant and a 0 that it is not. In the proposed method, the length of the chromosome is equal to the total number of features F. For example, the diabetes dataset has F = 8 features, so the chromosome length is 8; for the binary encoded chromosome 10011001, the 1st, 4th, 5th and 8th features are significant and the 2nd, 3rd, 6th and 7th features are not. The working of the binary encoded GA for finding the significant feature subset for the K-nearest neighbor classifier is as follows (a sketch is given after the listing):

i. Initialize the chromosome population randomly using binary encoding (each chromosome's length is equal to the total number of features F for the given dataset).

ii. Repeat steps a-d till the terminating condition (maximum number of generations) is reached:

a. Apply K-nearest neighbor using each chromosome as the set of significant features and take the classification accuracy as the fitness of the chromosome.

b. Select the chromosome resulting in the highest classification accuracy of the KNN classifier as the fittest chromosome and replace the least-fit chromosome with it (reproduction).

c. Select any two chromosomes randomly and apply a one-point crossover operation.

d. Apply mutation by selecting a random chromosome and randomly flipping one bit from 1 to 0 or 0 to 1.

The positions of bit 1 in the best-fit chromosome are taken as the significant attributes for both the fuzzy and crisp K-nearest neighbor classifiers.
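A compact Python sketch of this wrapper loop (ours; the fitness callback standing in for KNN test accuracy is an assumption) is:

```python
import numpy as np

def binary_ga(fitness, n_features, pop_size=20, generations=50, seed=0):
    # fitness(mask) is assumed to return the KNN accuracy obtained when
    # only the features with mask == 1 are used.
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        pop[scores.argmin()] = pop[scores.argmax()]  # reproduction: fittest replaces least fit
        a, b = rng.choice(pop_size, size=2, replace=False)
        p = rng.integers(1, n_features)              # one-point crossover
        pop[a, p:], pop[b, p:] = pop[b, p:].copy(), pop[a, p:].copy()
        c, g = rng.integers(pop_size), rng.integers(n_features)
        pop[c, g] ^= 1                               # mutation: flip one random bit
    scores = np.array([fitness(c) for c in pop])
    return pop[scores.argmax()]                      # bit 1 marks a significant feature
```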

In addition to significant feature selection using binary encoded GA, the authors have applied real encoded GA for finding feature weights for the KNN classifier. With real encoded GA, chromosomes represent the weights of the features, in contrast to binary encoded GA, which is used to find a subset of significant features from the original feature set. Binary encoded GA can be considered a special case of feature weighting, where weight zero is assigned to non-significant features and weight one to significant features. A repair algorithm is used with the real encoded GA to guarantee the feasibility of chromosomes (i.e. the sum of the weights of all features must equal one). This is done by finding the sum of all feature weights and dividing each feature weight by the total.
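As a sketch, the repair step amounts to a one-line normalization (assuming non-negative weights with a positive sum):

```python
import numpy as np

def repair(chromosome):
    # Normalize so the feature weights sum to one, exactly as described
    # above: divide each weight by the total weight.
    return chromosome / chromosome.sum()
```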

The functioning of the real encoded GA for finding feature weights for the K-nearest neighbor classifier is as follows (a sketch of the real-encoded operators follows the listing):

i. Initialize the chromosome population randomly using real encoding (each chromosome's length is equal to the total number of features F).

ii. Repeat steps a-d till the terminating condition (maximum number of generations) is reached:


a) Apply weighted K-nearest neighbor using each chromosome as the feature weights and take the classification accuracy as the fitness of the chromosome.

b) Select the chromosome resulting in the highest classification accuracy of weighted KNN as the fittest chromosome and replace the least-fit chromosome with it.

c) Select any two chromosomes randomly and apply a one-point crossover operation (call the repair algorithm if the sum of the weights of all features exceeds one).

d) Apply mutation by selecting a random chromosome and altering one of its weights by multiplying it with a random real number (call the repair algorithm if the sum of the weights of all features exceeds one).

iii. The weights in the fittest chromosome are taken as the feature weights for both the crisp and fuzzy K-nearest neighbor classifiers.
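The loop itself mirrors the binary GA sketched in the previous section; reusing the repair function sketched above, the real-encoded initialization and mutation steps might look as follows (the mutation factor's range is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_real_population(pop_size, n_features):
    # Random non-negative weight vectors, each repaired to sum to one.
    return np.array([repair(c) for c in rng.random((pop_size, n_features))])

def mutate_weights(chromosome):
    # Step (d): scale one randomly chosen weight by a random real factor,
    # then repair so the weights again sum to one.
    out = chromosome.copy()
    out[rng.integers(len(out))] *= 2 * rng.random()
    return repair(out)
```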

IV. PSO FOR FINDING FEATURE WEIGHTS FOR CRISP AND FUZZY KNN CLASSIFIER

PSO is a population-based search method inspired by the social behavior of a flock of migrating birds. A particle is analogous to a chromosome (population member) in GA. In GA, the next-generation chromosomes are generated from parent chromosomes using the crossover, mutation and reproduction operations. As opposed to GA, the evolutionary process in PSO does not create new birds from parent ones; instead, each particle flies through the search space with a velocity adjusted by its own knowledge, Pbest (local search), and the best knowledge among its companions, Gbest (global search) [6]. The authors have previously applied PSO to optimize the connection weights of a feed forward neural network [16] and to find k-means centroids [17].

This paper proposes a real encoded PSO algorithm to find the feature weights for the KNN classifier. The dimension of each particle is equal to the total number of features for a given dataset. The working of the real encoded PSO for finding the feature weights for the K-nearest neighbor classifier is as follows (a sketch follows the parameter description below):

i. Initialize the particle population randomly using real encoding (each particle's length is equal to the total number of features F).

ii. Repeat steps a-e till the terminating condition (maximum number of iterations) is reached:

a) Apply weighted K-nearest neighbor using each particle as the feature weights and find the classification accuracy.

b) Find the local best for each particle (the best accuracy of the individual particle).

c) Find the global best from the population of particles (the best accuracy among all particles).

d) Compute the new velocity for each particle using its local best and the global best, using equation (6):

vij(t + 1) = w vij(t) + c1 R1 (pbestij - xij(t)) + c2 R2 (gbestij - xij(t))    eq. (6)

e) Update each particle's position using its old position and new velocity, using equation (7):

xij(t + 1) = xij(t) + vij(t + 1)    eq. (7)

(Call the repair algorithm, as described in Section III, if the sum of the weights of all features of a particle exceeds one.)

iii. The weights represented by the global best particle are taken as the final feature weights for both the crisp and fuzzy K-nearest neighbor classifiers.


In equation 6, vij(t + 1) is the new velocity, w is the inertia weight, vij(t) is the old velocity, c1 and c2 are constants usually set to 2, R1 and R2 are randomly generated numbers, pbestij is the particle's local best, gbestij is the population's global best, and xij is the old position of the particle.
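A Python sketch of this PSO loop follows (ours; the inertia weight value and the non-negativity clamp are assumptions not stated in the paper):

```python
import numpy as np

def pso_weights(fitness, n_features, n_particles=20, iters=50,
                w=0.7, c1=2.0, c2=2.0, seed=0):
    # fitness(x) is assumed to return the weighted-KNN accuracy for the
    # weight vector x; w = 0.7 is our choice of inertia weight.
    rng = np.random.default_rng(seed)
    x = rng.random((n_particles, n_features))
    x /= x.sum(axis=1, keepdims=True)            # repair: weights sum to one
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # eq. (6)
        x = x + v                                                   # eq. (7)
        x = np.clip(x, 0.0, None)                # keep weights non-negative (assumption)
        over = x.sum(axis=1) > 1.0               # repair particles whose weights exceed one
        x[over] = x[over] / x[over].sum(axis=1, keepdims=True)
        vals = np.array([fitness(p) for p in x])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest                                 # final feature weights
```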

    V. EXPERIMENTAL RESULTS

Experiments have been carried out using seven different datasets, namely Heart-statlog, diabetes, wine, Indian liver, vehicle, iris and ionosphere, obtained from the UCI machine learning repository. The data has been partitioned by means of the holdout method with a 60-40 ratio into training set and test set, and normalized using the min-max normalization method (a sketch of this preprocessing is given below). To enhance the performance of the KNN classifier, binary encoded GA has been used to discover the significant features as explained in Section III. In addition, real encoded GA and real encoded PSO have been used to identify the feature weights for the KNN classifier. The results of crisp KNN are compared against the proposed binary encoded GA identified significant features, and against weights identified by GA and PSO versus weights identified by the information gain, gain ratio and Relief methods, as shown in Figure 1. Experiments with GA and PSO have been carried out by varying the population size and the number of generations, in addition to changing the value of k (1-10), for the crisp KNN and fuzzy KNN classifiers.
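A minimal sketch of this preprocessing (ours, not the authors' code) is:

```python
import numpy as np

def min_max_normalize(X):
    # Scale every feature to [0, 1]: (x - min) / (max - min).
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)

def holdout_split(X, y, train_ratio=0.6, seed=0):
    # 60-40 holdout partition into training and test sets.
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(train_ratio * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```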

Figure 1 illustrates that the classification accuracy of crisp KNN is improved by GA identified features and by GA and PSO identified weights for all seven datasets. For the Heart-statlog and vehicle datasets, the crisp KNN performance was best with weights identified by real encoded GA. PSO identified weights proved best for the crisp KNN classifier on the diabetes and Indian liver datasets. For the iris dataset, the binary encoded GA features proved best for the crisp KNN classifier. For the wine and ionosphere datasets, binary encoded GA features and real encoded GA and PSO weights resulted in the same, highest accuracy of the crisp KNN classifier.

Further, GA identified features and GA and PSO identified weights have been used to enhance the performance of the fuzzy KNN classifier. For the fuzzy KNN classifier, the membership of the training samples is computed using two methods: (a) the crisp assignment method (assign each training sample complete membership in its known class and zero membership in all other classes), named F1WKNN, and (b) assignment using equation 5, named F2WKNN. The relative results of the F1WKNN and F2WKNN classifiers using significant features identified by GA and weights identified by GA and PSO, versus weights identified by the information gain, gain ratio and Relief methods, are shown in Figure 2 and Figure 3 respectively, with the fuzzifier value m equal to 2. With the F1WKNN classifier, weights identified by real encoded GA proved best for the Heart-statlog, wine, Indian liver and iris datasets. For the diabetes dataset, F1WKNN showed an accuracy higher by 2-3% with weights identified by the gain ratio method when compared to the proposed methods. For the vehicle and ionosphere datasets, weights given by information gain proved to be somewhat better for F1WKNN when compared to the proposed methods.

The classification accuracy of F2WKNN was highest with real encoded GA identified weights for the Heart-statlog, Indian liver, vehicle and iris datasets when compared to the other methods. Both binary encoded GA features and real encoded GA weights resulted in the same classification accuracy of F2WKNN for the wine and ionosphere datasets. For the diabetes dataset, the binary encoded GA provided features resulted in better accuracy of F2WKNN when compared to the weights proposed by real encoded GA and PSO; however, the weights identified by information gain and gain ratio resulted in slightly improved performance for the diabetes dataset when compared to GA identified features. Overall, the results of F2WKNN are found to be better than those of F1WKNN.


In addition, the results of the proposed methods for improving crisp and fuzzy KNN classifier accuracy using GA identified features and PSO and GA identified weights are compared with five well-known classifiers (available in the WEKA tool), namely radial basis function network (RBF), support vector machine (SVM), decision tree (C4.5), Bayesian and Naïve Bayes, as shown in Figure 4 to Figure 6.

Figure 4 illustrates that both binary encoded GA features and real encoded GA and PSO weights resulted in the same classification accuracy of the crisp KNN classifier for the wine and ionosphere datasets. For the iris dataset, GA identified features resulted in the top accuracy for the crisp KNN classifier when compared to the other classifiers. PSO identified weights proved best for both the diabetes and Indian liver datasets, and GA identified weights proved best for the Heart-statlog and vehicle datasets, for crisp KNN when compared to the other classifiers.

Figure 5 depicts that PSO identified weights proved best for F1WKNN on the diabetes dataset when compared to the other classifiers, while SVM proved best for the Heart-statlog dataset. Further, GA identified weights with F1WKNN showed better performance when compared to the other classifiers for the wine, Indian liver, vehicle and iris datasets. Figure 5 also illustrates that both binary encoded GA features and real encoded GA and PSO weights resulted in the same classification accuracy of F1WKNN for the ionosphere dataset. However, the GA identified features showed a decline in performance for both the vehicle and iris datasets with the F1WKNN classifier.

Figure 6 depicts that GA identified weights for F2WKNN proved best for all the datasets excluding diabetes when compared with the other classifiers. For the diabetes dataset, the Bayesian classifier showed a negligible improvement over F2WKNN with GA identified features. On the other hand, GA identified features showed a decline in performance for the iris dataset with the F2WKNN classifier. GA and PSO identified weights proved effective with F2WKNN and resulted in the same accuracy for the ionosphere dataset, whereas GA identified features and weights proved best with F2WKNN for the wine dataset when compared to the other classifiers.

    VI. CONCLUSIONS

This paper proposed binary encoded GA for identifying significant features, and real encoded GA and PSO for finding feature weights, for enhancing the performance of both the crisp and fuzzy KNN classifiers. Among the three proposed methods for improving the performance of the KNN classifier, no single method is best for all seven datasets. Overall, the two evolutionary methods, GA and PSO, have enhanced the performance of the KNN classifier on all seven experimented datasets when compared to the outcomes of well-known classifiers like radial basis function, support vector machine, decision tree, Bayesian and Naïve Bayes, as well as to the feature weights identified by the information gain, gain ratio and Relief methods. As future enhancements, the authors would like to extend the work on significant feature selection using binary PSO and binary cuckoo search algorithms. In addition to PSO and GA, feature weights identified by the basic and modified versions of the cuckoo search algorithm can be applied to further improve the performance of the KNN classifier.

    VII. REFERENCES

[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, San Francisco: Morgan Kaufmann Publishers, 2001.


[2] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Feature Subset Selection Problem using Wrapper Approach in Supervised Learning", International Journal on Computer Applications (IJCA), Vol. 1, pp. 13-17, 2010.

[3] S. Cost, S. Salzberg, "A weighted nearest neighbor algorithm for learning with symbolic features", Machine Learning, Vol. 10, No. 1, pp. 57-78, Jan. 1993.

[4] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, Vidya T., Shama, "Genetic Algorithm based Dimensionality Reduction for Improving Performance of k-means and fuzzy k-means clustering: A Case Study for Categorization of Medical Dataset", Proceedings of the Seventh International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2012), Gwalior, India, Advances in Intelligent Systems and Computing, Vol. 201, pp. 169-180, 2013.

[5] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.

[6] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory", Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, pp. 39-43, 1995.

[7] T.M. Cover, P.E. Hart, "Nearest neighbor pattern classification", IEEE Transactions on Information Theory, Vol. IT-13, pp. 21-27, 1967.

[8] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Combining Akaike's Information Criterion (AIC) and the Golden-Section Search Technique to find Optimal Numbers of Nearest Neighbors", International Journal of Computer Applications, Vol. 2, pp. 80-67, May 2010.

[9] Isaac Triguero, Joaquín Derrac, Salvador García, Francisco Herrera, "A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification", IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 42(1), pp. 86-100, 2012.

[10] Salvador García, Joaquín Derrac, José Ramón Cano, Francisco Herrera, "Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34(3), pp. 417-435, 2012.

[11] Asha Gowda Karegowda, Rakesh Kumar Singh, M.A. Jayaram, A.S. Manjunath, "Improving Weighted K-Nearest Neighbor Feature Projections Performance with Different Weights Assigning Methods", International Conference on Computational Intelligence (ICCI 2010), December 9-11, 2010, Coimbatore, India.

[12] James M. Keller, Michael R. Gray, James A. Givens Jr., "A fuzzy K-nearest neighbor algorithm", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-15, No. 4, pp. 580-585, 1985.

[13] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Application of Genetic Algorithm Optimized Neural Network Connection Weights for Medical Diagnosis of PIMA Indian Diabetes", International Journal of Soft Computing, Vol. 2, No. 2, pp. 15-22, May 2011.

[14] Asha Gowda Karegowda, M.A. Jayaram, A.S. Manjunath, "Feature Subset Selection using Cascaded GA and CFS: A Filter Approach in Supervised Learning", International Journal on Computer Applications, Vol. 23(2), pp. 1-10, 2011.


[15] Asha Gowda Karegowda, Shama, Vidya T., M.A. Jayaram, A.S. Manjunath, "Improving Performance of K-Means Clustering by Initializing Cluster Centers Using Genetic Algorithm and Entropy Based Fuzzy Clustering for Categorization of Diabetic Patients", Proceedings of the International Conference on Advances in Computing, MSRIT, Bangalore, Advances in Intelligent Systems and Computing, Vol. 174, July 4-6, pp. 899-904, 2012.

[16] Asha Gowda Karegowda, M.A. Jayaram, "Significant Feature Set Driven, Optimized FFN for Enhanced Classification", International Journal of Computational Intelligence and Informatics, ISSN: 2231-0258, Vol. 2, No. 4, Mar 2013.

[17] Asha Gowda Karegowda, Seema Kumari, "Particle Swarm Optimization Algorithm Based k-means and Fuzzy c-means Clustering", International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, Issue 7, pp. 448-451, July 2013.

[Figure 1 appears here as a bar chart of classification accuracy per dataset (Heart-statlog, Diabetes, Wine, Indian Liver, Vehicle, Iris, Ionosphere) for KNN, Binary GA-WKNN, Real GA-WKNN, Real PSO-WKNN, Information Gain-WKNN, Gain Ratio-WKNN and Relief-WKNN.]

Figure 1. Comparative performance of crisp KNN classifier using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. information gain, gain ratio and Relief method identified weights for different datasets.


[Figure 2 appears here as a chart of classification accuracy per dataset for F1KNN, Binary GA-F1WKNN, Real GA-F1WKNN, Real PSO-F1WKNN, Information Gain-F1WKNN, Gain Ratio-F1WKNN and Relief-F1WKNN.]

Figure 2. Comparative performance of F1WKNN classifier (with crisp method for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. information gain, gain ratio and Relief method for different datasets.

[Figure 3 appears here as a chart of classification accuracy per dataset for F2KNN, Binary GA-F2WKNN, Real GA-F2WKNN, Real PSO-F2WKNN, Information Gain-F2WKNN, Gain Ratio-F2WKNN and Relief-F2WKNN.]

Figure 3. Comparative performance of F2WKNN classifier (with equation 5 for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. information gain, gain ratio and Relief method for different datasets.


[Figure 4 appears here as a chart of classification accuracy per dataset for KNN, Binary GA-WKNN, Real GA-WKNN, Real PSO-WKNN, RBF, SVM, Decision Tree, Bayesian and Naïve Bayes.]

Figure 4. Comparative performance of crisp KNN classifier using binary encoded GA identified features and real encoded GA and PSO identified feature weights vs. SVM, RBF, Decision Tree, Bayesian and Naïve Bayes classifiers for different datasets.

[Figure 5 appears here as a chart of classification accuracy per dataset for F1KNN, Binary GA-F1WKNN, Real GA-F1WKNN, Real PSO-F1WKNN, RBF, SVM, Decision Tree, Bayesian and Naïve Bayes.]

Figure 5. Comparative performance of F1WKNN classifier (with crisp method for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified weights vs. SVM, RBF, Decision Tree, Bayesian and Naïve Bayes classifiers for different datasets.


[Figure 6 appears here as a chart of classification accuracy per dataset for F2KNN, Binary GA-F2WKNN, Real GA-F2WKNN, Real PSO-F2WKNN, RBF, SVM, Decision Tree, Bayesian and Naïve Bayes.]

Figure 6. Comparative performance of F2WKNN classifier (with equation 5 for membership assignment) using binary encoded GA identified features and real encoded GA and PSO identified weights vs. SVM, RBF, Decision Tree, Bayesian and Naïve Bayes classifiers for different datasets.